* [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs
@ 2023-02-02 18:57 Eric Dumazet
  2023-02-02 18:57 ` [PATCH net-next 1/4] net: add SKB_HEAD_ALIGN() helper Eric Dumazet
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Eric Dumazet @ 2023-02-02 18:57 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh,
	Eric Dumazet

Our profile data show that using kmalloc(non_const_size)/kfree(ptr)
has a measurable cost, because kfree(ptr) has to pull a 'struct page'
into cpu caches.

Using a dedicated kmem_cache for TCP skb->head allocations makes
a difference, both in cpu cycles and memory savings.

This kmem_cache could also be used for GRO skb allocations;
this is left as a future exercise.

Eric Dumazet (4):
  net: add SKB_HEAD_ALIGN() helper
  net: remove osize variable in __alloc_skb()
  net: factorize code in kmalloc_reserve()
  net: add dedicated kmem_cache for typical/small skb->head

 include/linux/skbuff.h |  8 ++++
 net/core/skbuff.c      | 95 +++++++++++++++++++++++++++---------------
 2 files changed, 70 insertions(+), 33 deletions(-)

-- 
2.39.1.456.gfc5497dd1b-goog


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH net-next 1/4] net: add SKB_HEAD_ALIGN() helper
  2023-02-02 18:57 [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Eric Dumazet
@ 2023-02-02 18:57 ` Eric Dumazet
  2023-02-02 20:06   ` Soheil Hassas Yeganeh
  2023-02-02 18:57 ` [PATCH net-next 2/4] net: remove osize variable in __alloc_skb() Eric Dumazet
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2023-02-02 18:57 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh,
	Eric Dumazet

We have many places using this expression:

 SKB_DATA_ALIGN(sizeof(struct skb_shared_info))

Use of SKB_HEAD_ALIGN() will allow us to clean them up.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/skbuff.h |  8 ++++++++
 net/core/skbuff.c      | 18 ++++++------------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 5ba12185f43e311e37c9045763c3ee0efc274f2a..f2141b7e3940cee060e8443dbaa147b843eb43a0 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -255,6 +255,14 @@
 #define SKB_DATA_ALIGN(X)	ALIGN(X, SMP_CACHE_BYTES)
 #define SKB_WITH_OVERHEAD(X)	\
 	((X) - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+
+/* For X bytes available in skb->head, what is the minimal
+ * allocation needed, knowing struct skb_shared_info needs
+ * to be aligned.
+ */
+#define SKB_HEAD_ALIGN(X) (SKB_DATA_ALIGN(X) + \
+	SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+
 #define SKB_MAX_ORDER(X, ORDER) \
 	SKB_WITH_OVERHEAD((PAGE_SIZE << (ORDER)) - (X))
 #define SKB_MAX_HEAD(X)		(SKB_MAX_ORDER((X), 0))
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index bb79b4cb89db344d23609f93b2bcca5103f1e92d..b73de8fb0756c02cf9ba4b7e90854c9c17728463 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -558,8 +558,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
 	 * Both skb->head and skb_shared_info are cache line aligned.
 	 */
-	size = SKB_DATA_ALIGN(size);
-	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	size = SKB_HEAD_ALIGN(size);
 	osize = kmalloc_size_roundup(size);
 	data = kmalloc_reserve(osize, gfp_mask, node, &pfmemalloc);
 	if (unlikely(!data))
@@ -632,8 +631,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
 		goto skb_success;
 	}
 
-	len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
-	len = SKB_DATA_ALIGN(len);
+	len = SKB_HEAD_ALIGN(len);
 
 	if (sk_memalloc_socks())
 		gfp_mask |= __GFP_MEMALLOC;
@@ -732,8 +730,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 		data = page_frag_alloc_1k(&nc->page_small, gfp_mask);
 		pfmemalloc = NAPI_SMALL_PAGE_PFMEMALLOC(nc->page_small);
 	} else {
-		len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
-		len = SKB_DATA_ALIGN(len);
+		len = SKB_HEAD_ALIGN(len);
 
 		data = page_frag_alloc(&nc->page, len, gfp_mask);
 		pfmemalloc = nc->page.pfmemalloc;
@@ -1936,8 +1933,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 	if (skb_pfmemalloc(skb))
 		gfp_mask |= __GFP_MEMALLOC;
 
-	size = SKB_DATA_ALIGN(size);
-	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	size = SKB_HEAD_ALIGN(size);
 	size = kmalloc_size_roundup(size);
 	data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
 	if (!data)
@@ -6288,8 +6284,7 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
 	if (skb_pfmemalloc(skb))
 		gfp_mask |= __GFP_MEMALLOC;
 
-	size = SKB_DATA_ALIGN(size);
-	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	size = SKB_HEAD_ALIGN(size);
 	size = kmalloc_size_roundup(size);
 	data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
 	if (!data)
@@ -6407,8 +6402,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
 	if (skb_pfmemalloc(skb))
 		gfp_mask |= __GFP_MEMALLOC;
 
-	size = SKB_DATA_ALIGN(size);
-	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	size = SKB_HEAD_ALIGN(size);
 	size = kmalloc_size_roundup(size);
 	data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
 	if (!data)
-- 
2.39.1.456.gfc5497dd1b-goog



* [PATCH net-next 2/4] net: remove osize variable in __alloc_skb()
  2023-02-02 18:57 [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Eric Dumazet
  2023-02-02 18:57 ` [PATCH net-next 1/4] net: add SKB_HEAD_ALIGN() helper Eric Dumazet
@ 2023-02-02 18:57 ` Eric Dumazet
  2023-02-02 20:07   ` Soheil Hassas Yeganeh
  2023-02-02 18:58 ` [PATCH net-next 3/4] net: factorize code in kmalloc_reserve() Eric Dumazet
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2023-02-02 18:57 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh,
	Eric Dumazet

This is a cleanup patch, preparing for the following change.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/skbuff.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b73de8fb0756c02cf9ba4b7e90854c9c17728463..a82df5289208d69716e60c5c1f201ec3ca50a258 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -533,7 +533,6 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 {
 	struct kmem_cache *cache;
 	struct sk_buff *skb;
-	unsigned int osize;
 	bool pfmemalloc;
 	u8 *data;
 
@@ -559,16 +558,15 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	 * Both skb->head and skb_shared_info are cache line aligned.
 	 */
 	size = SKB_HEAD_ALIGN(size);
-	osize = kmalloc_size_roundup(size);
-	data = kmalloc_reserve(osize, gfp_mask, node, &pfmemalloc);
+	size = kmalloc_size_roundup(size);
+	data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
 	if (unlikely(!data))
 		goto nodata;
 	/* kmalloc_size_roundup() might give us more room than requested.
 	 * Put skb_shared_info exactly at the end of allocated zone,
 	 * to allow max possible filling before reallocation.
 	 */
-	size = SKB_WITH_OVERHEAD(osize);
-	prefetchw(data + size);
+	prefetchw(data + SKB_WITH_OVERHEAD(size));
 
 	/*
 	 * Only clear those fields we need to clear, not those that we will
@@ -576,7 +574,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	 * the tail pointer in struct sk_buff!
 	 */
 	memset(skb, 0, offsetof(struct sk_buff, tail));
-	__build_skb_around(skb, data, osize);
+	__build_skb_around(skb, data, size);
 	skb->pfmemalloc = pfmemalloc;
 
 	if (flags & SKB_ALLOC_FCLONE) {
-- 
2.39.1.456.gfc5497dd1b-goog



* [PATCH net-next 3/4] net: factorize code in kmalloc_reserve()
  2023-02-02 18:57 [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Eric Dumazet
  2023-02-02 18:57 ` [PATCH net-next 1/4] net: add SKB_HEAD_ALIGN() helper Eric Dumazet
  2023-02-02 18:57 ` [PATCH net-next 2/4] net: remove osize variable in __alloc_skb() Eric Dumazet
@ 2023-02-02 18:58 ` Eric Dumazet
  2023-02-02 20:09   ` Soheil Hassas Yeganeh
  2023-02-02 18:58 ` [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head Eric Dumazet
  2023-02-06 18:16 ` [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Paolo Abeni
  4 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2023-02-02 18:58 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh,
	Eric Dumazet

All kmalloc_reserve() callers have to make the same computation;
we can factorize it into kmalloc_reserve(), to prepare for the
following patch in the series.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/skbuff.c | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a82df5289208d69716e60c5c1f201ec3ca50a258..ae0b2aa1f01e8060cc4fe69137e9bd98e44280cc 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -478,17 +478,20 @@ EXPORT_SYMBOL(napi_build_skb);
  * may be used. Otherwise, the packet data may be discarded until enough
  * memory is free
  */
-static void *kmalloc_reserve(size_t size, gfp_t flags, int node,
+static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
 			     bool *pfmemalloc)
 {
-	void *obj;
 	bool ret_pfmemalloc = false;
+	unsigned int obj_size;
+	void *obj;
 
+	obj_size = SKB_HEAD_ALIGN(*size);
+	*size = obj_size = kmalloc_size_roundup(obj_size);
 	/*
 	 * Try a regular allocation, when that fails and we're not entitled
 	 * to the reserves, fail.
 	 */
-	obj = kmalloc_node_track_caller(size,
+	obj = kmalloc_node_track_caller(obj_size,
 					flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
 					node);
 	if (obj || !(gfp_pfmemalloc_allowed(flags)))
@@ -496,7 +499,7 @@ static void *kmalloc_reserve(size_t size, gfp_t flags, int node,
 
 	/* Try again but now we are using pfmemalloc reserves */
 	ret_pfmemalloc = true;
-	obj = kmalloc_node_track_caller(size, flags, node);
+	obj = kmalloc_node_track_caller(obj_size, flags, node);
 
 out:
 	if (pfmemalloc)
@@ -557,9 +560,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
 	 * Both skb->head and skb_shared_info are cache line aligned.
 	 */
-	size = SKB_HEAD_ALIGN(size);
-	size = kmalloc_size_roundup(size);
-	data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
+	data = kmalloc_reserve(&size, gfp_mask, node, &pfmemalloc);
 	if (unlikely(!data))
 		goto nodata;
 	/* kmalloc_size_roundup() might give us more room than requested.
@@ -1931,9 +1932,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 	if (skb_pfmemalloc(skb))
 		gfp_mask |= __GFP_MEMALLOC;
 
-	size = SKB_HEAD_ALIGN(size);
-	size = kmalloc_size_roundup(size);
-	data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
+	data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL);
 	if (!data)
 		goto nodata;
 	size = SKB_WITH_OVERHEAD(size);
@@ -6282,9 +6281,7 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
 	if (skb_pfmemalloc(skb))
 		gfp_mask |= __GFP_MEMALLOC;
 
-	size = SKB_HEAD_ALIGN(size);
-	size = kmalloc_size_roundup(size);
-	data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
+	data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL);
 	if (!data)
 		return -ENOMEM;
 	size = SKB_WITH_OVERHEAD(size);
@@ -6400,9 +6397,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
 	if (skb_pfmemalloc(skb))
 		gfp_mask |= __GFP_MEMALLOC;
 
-	size = SKB_HEAD_ALIGN(size);
-	size = kmalloc_size_roundup(size);
-	data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
+	data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL);
 	if (!data)
 		return -ENOMEM;
 	size = SKB_WITH_OVERHEAD(size);
-- 
2.39.1.456.gfc5497dd1b-goog



* [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
  2023-02-02 18:57 [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Eric Dumazet
                   ` (2 preceding siblings ...)
  2023-02-02 18:58 ` [PATCH net-next 3/4] net: factorize code in kmalloc_reserve() Eric Dumazet
@ 2023-02-02 18:58 ` Eric Dumazet
  2023-02-02 20:14   ` Soheil Hassas Yeganeh
                     ` (3 more replies)
  2023-02-06 18:16 ` [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Paolo Abeni
  4 siblings, 4 replies; 17+ messages in thread
From: Eric Dumazet @ 2023-02-02 18:58 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh,
	Eric Dumazet

Recent removal of ksize() in alloc_skb() increased
performance because we no longer read
the associated struct page.

We have an equivalent cost at kfree_skb() time.

kfree(skb->head) has to access a struct page,
often cold in cpu caches, to get the owning
struct kmem_cache.

Considering that many allocations are small,
we can have our own kmem_cache to avoid the cache line miss.

This also saves memory because these small heads
are no longer padded to 1024 bytes.

CONFIG_SLUB=y
$ grep skbuff_small_head /proc/slabinfo
skbuff_small_head   2907   2907    640   51    8 : tunables    0    0    0 : slabdata     57     57      0

CONFIG_SLAB=y
$ grep skbuff_small_head /proc/slabinfo
skbuff_small_head    607    624    640    6    1 : tunables   54   27    8 : slabdata    104    104      5

Note: after Kees Cook's patches and this one, we might
be able to revert commit
dbae2b062824 ("net: skb: introduce and use a single page frag cache")
because GRO_MAX_HEAD is also small.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
---
 net/core/skbuff.c | 52 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 47 insertions(+), 5 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index ae0b2aa1f01e8060cc4fe69137e9bd98e44280cc..3e540b4924701cc57b6fbd1b668bab3b652ee94c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -89,6 +89,19 @@ static struct kmem_cache *skbuff_fclone_cache __ro_after_init;
 #ifdef CONFIG_SKB_EXTENSIONS
 static struct kmem_cache *skbuff_ext_cache __ro_after_init;
 #endif
+static struct kmem_cache *skb_small_head_cache __ro_after_init;
+
+#define SKB_SMALL_HEAD_SIZE SKB_HEAD_ALIGN(MAX_TCP_HEADER)
+
+/* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two. */
+#define SKB_SMALL_HEAD_CACHE_SIZE					\
+	(is_power_of_2(SKB_SMALL_HEAD_SIZE) ?			\
+		(SKB_SMALL_HEAD_SIZE + L1_CACHE_BYTES) :	\
+		SKB_SMALL_HEAD_SIZE)
+
+#define SKB_SMALL_HEAD_HEADROOM						\
+	SKB_WITH_OVERHEAD(SKB_SMALL_HEAD_CACHE_SIZE)
+
 int sysctl_max_skb_frags __read_mostly = MAX_SKB_FRAGS;
 EXPORT_SYMBOL(sysctl_max_skb_frags);
 
@@ -486,6 +499,21 @@ static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
 	void *obj;
 
 	obj_size = SKB_HEAD_ALIGN(*size);
+	if (obj_size <= SKB_SMALL_HEAD_CACHE_SIZE &&
+	    !(flags & KMALLOC_NOT_NORMAL_BITS)) {
+
+		/* skb_small_head_cache has non power of two size,
+		 * likely forcing SLUB to use order-3 pages.
+		 * We deliberately attempt a NOMEMALLOC allocation only.
+		 */
+		obj = kmem_cache_alloc_node(skb_small_head_cache,
+				flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
+				node);
+		if (obj) {
+			*size = SKB_SMALL_HEAD_CACHE_SIZE;
+			goto out;
+		}
+	}
 	*size = obj_size = kmalloc_size_roundup(obj_size);
 	/*
 	 * Try a regular allocation, when that fails and we're not entitled
@@ -805,6 +833,14 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data)
 	return page_pool_return_skb_page(virt_to_page(data));
 }
 
+static void skb_kfree_head(void *head, unsigned int end_offset)
+{
+	if (end_offset == SKB_SMALL_HEAD_HEADROOM)
+		kmem_cache_free(skb_small_head_cache, head);
+	else
+		kfree(head);
+}
+
 static void skb_free_head(struct sk_buff *skb)
 {
 	unsigned char *head = skb->head;
@@ -814,7 +850,7 @@ static void skb_free_head(struct sk_buff *skb)
 			return;
 		skb_free_frag(head);
 	} else {
-		kfree(head);
+		skb_kfree_head(head, skb_end_offset(skb));
 	}
 }
 
@@ -1995,7 +2031,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 	return 0;
 
 nofrags:
-	kfree(data);
+	skb_kfree_head(data, size);
 nodata:
 	return -ENOMEM;
 }
@@ -4633,6 +4669,12 @@ void __init skb_init(void)
 						0,
 						SLAB_HWCACHE_ALIGN|SLAB_PANIC,
 						NULL);
+	skb_small_head_cache = kmem_cache_create("skbuff_small_head",
+						SKB_SMALL_HEAD_CACHE_SIZE,
+						0,
+						SLAB_HWCACHE_ALIGN | SLAB_PANIC,
+						NULL);
+
 	skb_extensions_init();
 }
 
@@ -6297,7 +6339,7 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
 	if (skb_cloned(skb)) {
 		/* drop the old head gracefully */
 		if (skb_orphan_frags(skb, gfp_mask)) {
-			kfree(data);
+			skb_kfree_head(data, size);
 			return -ENOMEM;
 		}
 		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
@@ -6405,7 +6447,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
 	memcpy((struct skb_shared_info *)(data + size),
 	       skb_shinfo(skb), offsetof(struct skb_shared_info, frags[0]));
 	if (skb_orphan_frags(skb, gfp_mask)) {
-		kfree(data);
+		skb_kfree_head(data, size);
 		return -ENOMEM;
 	}
 	shinfo = (struct skb_shared_info *)(data + size);
@@ -6441,7 +6483,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
 		/* skb_frag_unref() is not needed here as shinfo->nr_frags = 0. */
 		if (skb_has_frag_list(skb))
 			kfree_skb_list(skb_shinfo(skb)->frag_list);
-		kfree(data);
+		skb_kfree_head(data, size);
 		return -ENOMEM;
 	}
 	skb_release_data(skb, SKB_CONSUMED);
-- 
2.39.1.456.gfc5497dd1b-goog



* Re: [PATCH net-next 1/4] net: add SKB_HEAD_ALIGN() helper
  2023-02-02 18:57 ` [PATCH net-next 1/4] net: add SKB_HEAD_ALIGN() helper Eric Dumazet
@ 2023-02-02 20:06   ` Soheil Hassas Yeganeh
  0 siblings, 0 replies; 17+ messages in thread
From: Soheil Hassas Yeganeh @ 2023-02-02 20:06 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	eric.dumazet, Alexander Duyck

On Thu, Feb 2, 2023 at 1:58 PM Eric Dumazet <edumazet@google.com> wrote:
>
> We have many places using this expression:
>
>  SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
>
> Use of SKB_HEAD_ALIGN() will allow us to clean them up.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

> ---
>  include/linux/skbuff.h |  8 ++++++++
>  net/core/skbuff.c      | 18 ++++++------------
>  2 files changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 5ba12185f43e311e37c9045763c3ee0efc274f2a..f2141b7e3940cee060e8443dbaa147b843eb43a0 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -255,6 +255,14 @@
>  #define SKB_DATA_ALIGN(X)      ALIGN(X, SMP_CACHE_BYTES)
>  #define SKB_WITH_OVERHEAD(X)   \
>         ((X) - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
> +
> +/* For X bytes available in skb->head, what is the minimal
> + * allocation needed, knowing struct skb_shared_info needs
> + * to be aligned.
> + */
> +#define SKB_HEAD_ALIGN(X) (SKB_DATA_ALIGN(X) + \
> +       SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
> +
>  #define SKB_MAX_ORDER(X, ORDER) \
>         SKB_WITH_OVERHEAD((PAGE_SIZE << (ORDER)) - (X))
>  #define SKB_MAX_HEAD(X)                (SKB_MAX_ORDER((X), 0))
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index bb79b4cb89db344d23609f93b2bcca5103f1e92d..b73de8fb0756c02cf9ba4b7e90854c9c17728463 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -558,8 +558,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>          * aligned memory blocks, unless SLUB/SLAB debug is enabled.
>          * Both skb->head and skb_shared_info are cache line aligned.
>          */
> -       size = SKB_DATA_ALIGN(size);
> -       size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +       size = SKB_HEAD_ALIGN(size);
>         osize = kmalloc_size_roundup(size);
>         data = kmalloc_reserve(osize, gfp_mask, node, &pfmemalloc);
>         if (unlikely(!data))
> @@ -632,8 +631,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
>                 goto skb_success;
>         }
>
> -       len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> -       len = SKB_DATA_ALIGN(len);
> +       len = SKB_HEAD_ALIGN(len);
>
>         if (sk_memalloc_socks())
>                 gfp_mask |= __GFP_MEMALLOC;
> @@ -732,8 +730,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
>                 data = page_frag_alloc_1k(&nc->page_small, gfp_mask);
>                 pfmemalloc = NAPI_SMALL_PAGE_PFMEMALLOC(nc->page_small);
>         } else {
> -               len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> -               len = SKB_DATA_ALIGN(len);
> +               len = SKB_HEAD_ALIGN(len);
>
>                 data = page_frag_alloc(&nc->page, len, gfp_mask);
>                 pfmemalloc = nc->page.pfmemalloc;
> @@ -1936,8 +1933,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>         if (skb_pfmemalloc(skb))
>                 gfp_mask |= __GFP_MEMALLOC;
>
> -       size = SKB_DATA_ALIGN(size);
> -       size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +       size = SKB_HEAD_ALIGN(size);
>         size = kmalloc_size_roundup(size);
>         data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
>         if (!data)
> @@ -6288,8 +6284,7 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
>         if (skb_pfmemalloc(skb))
>                 gfp_mask |= __GFP_MEMALLOC;
>
> -       size = SKB_DATA_ALIGN(size);
> -       size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +       size = SKB_HEAD_ALIGN(size);
>         size = kmalloc_size_roundup(size);
>         data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
>         if (!data)
> @@ -6407,8 +6402,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
>         if (skb_pfmemalloc(skb))
>                 gfp_mask |= __GFP_MEMALLOC;
>
> -       size = SKB_DATA_ALIGN(size);
> -       size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +       size = SKB_HEAD_ALIGN(size);
>         size = kmalloc_size_roundup(size);
>         data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
>         if (!data)
> --
> 2.39.1.456.gfc5497dd1b-goog
>


* Re: [PATCH net-next 2/4] net: remove osize variable in __alloc_skb()
  2023-02-02 18:57 ` [PATCH net-next 2/4] net: remove osize variable in __alloc_skb() Eric Dumazet
@ 2023-02-02 20:07   ` Soheil Hassas Yeganeh
  0 siblings, 0 replies; 17+ messages in thread
From: Soheil Hassas Yeganeh @ 2023-02-02 20:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	eric.dumazet, Alexander Duyck

On Thu, Feb 2, 2023 at 1:58 PM Eric Dumazet <edumazet@google.com> wrote:
>
> This is a cleanup patch, preparing for the following change.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

> ---
>  net/core/skbuff.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index b73de8fb0756c02cf9ba4b7e90854c9c17728463..a82df5289208d69716e60c5c1f201ec3ca50a258 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -533,7 +533,6 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>  {
>         struct kmem_cache *cache;
>         struct sk_buff *skb;
> -       unsigned int osize;
>         bool pfmemalloc;
>         u8 *data;
>
> @@ -559,16 +558,15 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>          * Both skb->head and skb_shared_info are cache line aligned.
>          */
>         size = SKB_HEAD_ALIGN(size);
> -       osize = kmalloc_size_roundup(size);
> -       data = kmalloc_reserve(osize, gfp_mask, node, &pfmemalloc);
> +       size = kmalloc_size_roundup(size);
> +       data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
>         if (unlikely(!data))
>                 goto nodata;
>         /* kmalloc_size_roundup() might give us more room than requested.
>          * Put skb_shared_info exactly at the end of allocated zone,
>          * to allow max possible filling before reallocation.
>          */
> -       size = SKB_WITH_OVERHEAD(osize);
> -       prefetchw(data + size);
> +       prefetchw(data + SKB_WITH_OVERHEAD(size));
>
>         /*
>          * Only clear those fields we need to clear, not those that we will
> @@ -576,7 +574,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>          * the tail pointer in struct sk_buff!
>          */
>         memset(skb, 0, offsetof(struct sk_buff, tail));
> -       __build_skb_around(skb, data, osize);
> +       __build_skb_around(skb, data, size);
>         skb->pfmemalloc = pfmemalloc;
>
>         if (flags & SKB_ALLOC_FCLONE) {
> --
> 2.39.1.456.gfc5497dd1b-goog
>


* Re: [PATCH net-next 3/4] net: factorize code in kmalloc_reserve()
  2023-02-02 18:58 ` [PATCH net-next 3/4] net: factorize code in kmalloc_reserve() Eric Dumazet
@ 2023-02-02 20:09   ` Soheil Hassas Yeganeh
  0 siblings, 0 replies; 17+ messages in thread
From: Soheil Hassas Yeganeh @ 2023-02-02 20:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	eric.dumazet, Alexander Duyck

On Thu, Feb 2, 2023 at 1:58 PM Eric Dumazet <edumazet@google.com> wrote:
>
> All kmalloc_reserve() callers have to make the same computation;
> we can factorize it into kmalloc_reserve(), to prepare for the
> following patch in the series.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

> ---
>  net/core/skbuff.c | 27 +++++++++++----------------
>  1 file changed, 11 insertions(+), 16 deletions(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index a82df5289208d69716e60c5c1f201ec3ca50a258..ae0b2aa1f01e8060cc4fe69137e9bd98e44280cc 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -478,17 +478,20 @@ EXPORT_SYMBOL(napi_build_skb);
>   * may be used. Otherwise, the packet data may be discarded until enough
>   * memory is free
>   */
> -static void *kmalloc_reserve(size_t size, gfp_t flags, int node,
> +static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
>                              bool *pfmemalloc)
>  {
> -       void *obj;
>         bool ret_pfmemalloc = false;
> +       unsigned int obj_size;
> +       void *obj;
>
> +       obj_size = SKB_HEAD_ALIGN(*size);
> +       *size = obj_size = kmalloc_size_roundup(obj_size);
>         /*
>          * Try a regular allocation, when that fails and we're not entitled
>          * to the reserves, fail.
>          */
> -       obj = kmalloc_node_track_caller(size,
> +       obj = kmalloc_node_track_caller(obj_size,
>                                         flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
>                                         node);
>         if (obj || !(gfp_pfmemalloc_allowed(flags)))
> @@ -496,7 +499,7 @@ static void *kmalloc_reserve(size_t size, gfp_t flags, int node,
>
>         /* Try again but now we are using pfmemalloc reserves */
>         ret_pfmemalloc = true;
> -       obj = kmalloc_node_track_caller(size, flags, node);
> +       obj = kmalloc_node_track_caller(obj_size, flags, node);
>
>  out:
>         if (pfmemalloc)
> @@ -557,9 +560,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>          * aligned memory blocks, unless SLUB/SLAB debug is enabled.
>          * Both skb->head and skb_shared_info are cache line aligned.
>          */
> -       size = SKB_HEAD_ALIGN(size);
> -       size = kmalloc_size_roundup(size);
> -       data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
> +       data = kmalloc_reserve(&size, gfp_mask, node, &pfmemalloc);
>         if (unlikely(!data))
>                 goto nodata;
>         /* kmalloc_size_roundup() might give us more room than requested.
> @@ -1931,9 +1932,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>         if (skb_pfmemalloc(skb))
>                 gfp_mask |= __GFP_MEMALLOC;
>
> -       size = SKB_HEAD_ALIGN(size);
> -       size = kmalloc_size_roundup(size);
> -       data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
> +       data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL);
>         if (!data)
>                 goto nodata;
>         size = SKB_WITH_OVERHEAD(size);
> @@ -6282,9 +6281,7 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
>         if (skb_pfmemalloc(skb))
>                 gfp_mask |= __GFP_MEMALLOC;
>
> -       size = SKB_HEAD_ALIGN(size);
> -       size = kmalloc_size_roundup(size);
> -       data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
> +       data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL);
>         if (!data)
>                 return -ENOMEM;
>         size = SKB_WITH_OVERHEAD(size);
> @@ -6400,9 +6397,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
>         if (skb_pfmemalloc(skb))
>                 gfp_mask |= __GFP_MEMALLOC;
>
> -       size = SKB_HEAD_ALIGN(size);
> -       size = kmalloc_size_roundup(size);
> -       data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
> +       data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL);
>         if (!data)
>                 return -ENOMEM;
>         size = SKB_WITH_OVERHEAD(size);
> --
> 2.39.1.456.gfc5497dd1b-goog
>


* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
  2023-02-02 18:58 ` [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head Eric Dumazet
@ 2023-02-02 20:14   ` Soheil Hassas Yeganeh
  2023-02-03  5:11   ` Jakub Kicinski
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Soheil Hassas Yeganeh @ 2023-02-02 20:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	eric.dumazet, Alexander Duyck

On Thu, Feb 2, 2023 at 1:58 PM Eric Dumazet <edumazet@google.com> wrote:
>
> Recent removal of ksize() in alloc_skb() increased
> performance because we no longer read
> the associated struct page.
>
> We have an equivalent cost at kfree_skb() time.
>
> kfree(skb->head) has to access a struct page,
> often cold in cpu caches, to get the owning
> struct kmem_cache.
>
> Considering that many allocations are small,
> we can have our own kmem_cache to avoid the cache line miss.
>
> This also saves memory because these small heads
> are no longer padded to 1024 bytes.
>
> CONFIG_SLUB=y
> $ grep skbuff_small_head /proc/slabinfo
> skbuff_small_head   2907   2907    640   51    8 : tunables    0    0    0 : slabdata     57     57      0
>
> CONFIG_SLAB=y
> $ grep skbuff_small_head /proc/slabinfo
> skbuff_small_head    607    624    640    6    1 : tunables   54   27    8 : slabdata    104    104      5
>
> Note: after Kees Cook patches and this one, we might
> be able to revert commit
> dbae2b062824 ("net: skb: introduce and use a single page frag cache")
> because GRO_MAX_HEAD is also small.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Paolo Abeni <pabeni@redhat.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

Very nice!

> ---
>  net/core/skbuff.c | 52 ++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 47 insertions(+), 5 deletions(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index ae0b2aa1f01e8060cc4fe69137e9bd98e44280cc..3e540b4924701cc57b6fbd1b668bab3b652ee94c 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -89,6 +89,19 @@ static struct kmem_cache *skbuff_fclone_cache __ro_after_init;
>  #ifdef CONFIG_SKB_EXTENSIONS
>  static struct kmem_cache *skbuff_ext_cache __ro_after_init;
>  #endif
> +static struct kmem_cache *skb_small_head_cache __ro_after_init;
> +
> +#define SKB_SMALL_HEAD_SIZE SKB_HEAD_ALIGN(MAX_TCP_HEADER)
> +
> +/* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two. */
> +#define SKB_SMALL_HEAD_CACHE_SIZE                                      \
> +       (is_power_of_2(SKB_SMALL_HEAD_SIZE) ?                   \
> +               (SKB_SMALL_HEAD_SIZE + L1_CACHE_BYTES) :        \
> +               SKB_SMALL_HEAD_SIZE)
> +
> +#define SKB_SMALL_HEAD_HEADROOM                                                \
> +       SKB_WITH_OVERHEAD(SKB_SMALL_HEAD_CACHE_SIZE)
> +
>  int sysctl_max_skb_frags __read_mostly = MAX_SKB_FRAGS;
>  EXPORT_SYMBOL(sysctl_max_skb_frags);
>
> @@ -486,6 +499,21 @@ static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
>         void *obj;
>
>         obj_size = SKB_HEAD_ALIGN(*size);
> +       if (obj_size <= SKB_SMALL_HEAD_CACHE_SIZE &&
> +           !(flags & KMALLOC_NOT_NORMAL_BITS)) {
> +
> +               /* skb_small_head_cache has non power of two size,
> +                * likely forcing SLUB to use order-3 pages.
> +                * We deliberately attempt a NOMEMALLOC allocation only.
> +                */
> +               obj = kmem_cache_alloc_node(skb_small_head_cache,
> +                               flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
> +                               node);
> +               if (obj) {
> +                       *size = SKB_SMALL_HEAD_CACHE_SIZE;
> +                       goto out;
> +               }
> +       }
>         *size = obj_size = kmalloc_size_roundup(obj_size);
>         /*
>          * Try a regular allocation, when that fails and we're not entitled
> @@ -805,6 +833,14 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data)
>         return page_pool_return_skb_page(virt_to_page(data));
>  }
>
> +static void skb_kfree_head(void *head, unsigned int end_offset)
> +{
> +       if (end_offset == SKB_SMALL_HEAD_HEADROOM)
> +               kmem_cache_free(skb_small_head_cache, head);
> +       else
> +               kfree(head);
> +}
> +
>  static void skb_free_head(struct sk_buff *skb)
>  {
>         unsigned char *head = skb->head;
> @@ -814,7 +850,7 @@ static void skb_free_head(struct sk_buff *skb)
>                         return;
>                 skb_free_frag(head);
>         } else {
> -               kfree(head);
> +               skb_kfree_head(head, skb_end_offset(skb));
>         }
>  }
>
> @@ -1995,7 +2031,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>         return 0;
>
>  nofrags:
> -       kfree(data);
> +       skb_kfree_head(data, size);
>  nodata:
>         return -ENOMEM;
>  }
> @@ -4633,6 +4669,12 @@ void __init skb_init(void)
>                                                 0,
>                                                 SLAB_HWCACHE_ALIGN|SLAB_PANIC,
>                                                 NULL);
> +       skb_small_head_cache = kmem_cache_create("skbuff_small_head",
> +                                               SKB_SMALL_HEAD_CACHE_SIZE,
> +                                               0,
> +                                               SLAB_HWCACHE_ALIGN | SLAB_PANIC,
> +                                               NULL);
> +
>         skb_extensions_init();
>  }
>
> @@ -6297,7 +6339,7 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
>         if (skb_cloned(skb)) {
>                 /* drop the old head gracefully */
>                 if (skb_orphan_frags(skb, gfp_mask)) {
> -                       kfree(data);
> +                       skb_kfree_head(data, size);
>                         return -ENOMEM;
>                 }
>                 for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
> @@ -6405,7 +6447,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
>         memcpy((struct skb_shared_info *)(data + size),
>                skb_shinfo(skb), offsetof(struct skb_shared_info, frags[0]));
>         if (skb_orphan_frags(skb, gfp_mask)) {
> -               kfree(data);
> +               skb_kfree_head(data, size);
>                 return -ENOMEM;
>         }
>         shinfo = (struct skb_shared_info *)(data + size);
> @@ -6441,7 +6483,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
>                 /* skb_frag_unref() is not needed here as shinfo->nr_frags = 0. */
>                 if (skb_has_frag_list(skb))
>                         kfree_skb_list(skb_shinfo(skb)->frag_list);
> -               kfree(data);
> +               skb_kfree_head(data, size);
>                 return -ENOMEM;
>         }
>         skb_release_data(skb, SKB_CONSUMED);
> --
> 2.39.1.456.gfc5497dd1b-goog
>


* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
  2023-02-02 18:58 ` [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head Eric Dumazet
  2023-02-02 20:14   ` Soheil Hassas Yeganeh
@ 2023-02-03  5:11   ` Jakub Kicinski
  2023-02-03  7:00     ` Eric Dumazet
  2023-02-03  7:59   ` Paolo Abeni
  2023-02-03 19:37   ` kernel test robot
  3 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2023-02-03  5:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Paolo Abeni, netdev, eric.dumazet,
	Alexander Duyck, Soheil Hassas Yeganeh

On Thu,  2 Feb 2023 18:58:01 +0000 Eric Dumazet wrote:
> +/* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two. */

Why is that?  Is it to prevent potential mixing up of objects from 
the cache with objects from general slabs (since we only do an
end_offset == SKB_SMALL_HEAD_HEADROOM check)?


* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
  2023-02-03  5:11   ` Jakub Kicinski
@ 2023-02-03  7:00     ` Eric Dumazet
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2023-02-03  7:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S . Miller, Paolo Abeni, netdev, eric.dumazet,
	Alexander Duyck, Soheil Hassas Yeganeh

On Fri, Feb 3, 2023 at 6:11 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu,  2 Feb 2023 18:58:01 +0000 Eric Dumazet wrote:
> > +/* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two. */
>
> Why is that?  Is it to prevent potential mixing up of objects from
> the cache with objects from general slabs (since we only do an
> end_offset == SKB_SMALL_HEAD_HEADROOM check)?

Good question.

Some alloc_skb() callers use GFP_DMA (or __GFP_ACCOUNT);
we cannot use the dedicated kmem_cache for them.

They could get an object of size 512 or 1024.

Since I chose not to add yet another
skb->head_has_been_allocated_from_small_head_cache flag,
we want to make sure we will not have issues in the future if
SKB_HEAD_ALIGN(MAX_TCP_HEADER)
becomes a power of two (for example, if some of us increase MAX_SKB_FRAGS).

An alternative would be to add a check at boot time, making sure
no standard cache has the same object size.

This might have an issue with CONFIG_SLOB=y; I wish that was gone already...


* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
  2023-02-02 18:58 ` [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head Eric Dumazet
  2023-02-02 20:14   ` Soheil Hassas Yeganeh
  2023-02-03  5:11   ` Jakub Kicinski
@ 2023-02-03  7:59   ` Paolo Abeni
  2023-02-03  8:17     ` Eric Dumazet
  2023-02-03 19:37   ` kernel test robot
  3 siblings, 1 reply; 17+ messages in thread
From: Paolo Abeni @ 2023-02-03  7:59 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski
  Cc: netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh

On Thu, 2023-02-02 at 18:58 +0000, Eric Dumazet wrote:
> Note: after Kees Cook patches and this one, we might
> be able to revert commit
> dbae2b062824 ("net: skb: introduce and use a single page frag cache")
> because GRO_MAX_HEAD is also small.

I guess I'll need some time to do the relevant benchmarks, but I'm not
able to schedule them very soon.

> @@ -486,6 +499,21 @@ static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
>  	void *obj;
>  
>  	obj_size = SKB_HEAD_ALIGN(*size);
> +	if (obj_size <= SKB_SMALL_HEAD_CACHE_SIZE &&
> +	    !(flags & KMALLOC_NOT_NORMAL_BITS)) {
> +
> +		/* skb_small_head_cache has non power of two size,
> +		 * likely forcing SLUB to use order-3 pages.
> +		 * We deliberately attempt a NOMEMALLOC allocation only.
> +		 */
> +		obj = kmem_cache_alloc_node(skb_small_head_cache,
> +				flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
> +				node);
> +		if (obj) {
> +			*size = SKB_SMALL_HEAD_CACHE_SIZE;
> +			goto out;
> +		}

In case of kmem_cache allocation failure, should we skip the 2nd 
__GFP_NOMEMALLOC attempt below?

I *think* a non-power-of-two size is also required to avoid an issue
with plain (no GFP_DMA nor __GFP_ACCOUNT) allocations in case of fallback to
kmalloc(), to prevent skb_kfree_head() from mis-interpreting skb->head as
kmem_cache allocated.

Thanks!

Paolo



* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
  2023-02-03  7:59   ` Paolo Abeni
@ 2023-02-03  8:17     ` Eric Dumazet
  2023-02-03 19:47       ` Jakub Kicinski
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2023-02-03  8:17 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: David S . Miller, Jakub Kicinski, netdev, eric.dumazet,
	Alexander Duyck, Soheil Hassas Yeganeh

On Fri, Feb 3, 2023 at 8:59 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Thu, 2023-02-02 at 18:58 +0000, Eric Dumazet wrote:
> > Note: after Kees Cook patches and this one, we might
> > be able to revert commit
> > dbae2b062824 ("net: skb: introduce and use a single page frag cache")
> > because GRO_MAX_HEAD is also small.
>
> I guess I'll need some time to do the relevant benchmarks, but I'm not
> able to schedule them very soon.

No worries, this can be done later.
Note that the results might depend on the SLUB/SLAB choice.

>
> > @@ -486,6 +499,21 @@ static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
> >       void *obj;
> >
> >       obj_size = SKB_HEAD_ALIGN(*size);
> > +     if (obj_size <= SKB_SMALL_HEAD_CACHE_SIZE &&
> > +         !(flags & KMALLOC_NOT_NORMAL_BITS)) {
> > +
> > +             /* skb_small_head_cache has non power of two size,
> > +              * likely forcing SLUB to use order-3 pages.
> > +              * We deliberately attempt a NOMEMALLOC allocation only.
> > +              */
> > +             obj = kmem_cache_alloc_node(skb_small_head_cache,
> > +                             flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
> > +                             node);
> > +             if (obj) {
> > +                     *size = SKB_SMALL_HEAD_CACHE_SIZE;
> > +                     goto out;
> > +             }
>
> In case of kmem_cache allocation failure, should we skip the 2nd
> __GFP_NOMEMALLOC attempt below?

We could, but my reasoning was that we might find an object in the
other kmem_cache freelist.

kmalloc-1 objects tend to use smaller page orders, so they are more
likely to succeed if there is no order-3 page available in the buddy
allocator. (I mentioned this in the comment.)


>
> I *think* non power of two size is also required to avoid an issue
> plain (no GFP_DMA nor __GFP_ACCOUNT) allocations in case of fallback to
> kmalloc(), to prevent skb_kfree_head() mis-interpreting skb->head as
> kmem_cache allocated.

Indeed, there are multiple reasons why SKB_SMALL_HEAD_CACHE_SIZE
needs to be a non-power-of-two.


* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
  2023-02-02 18:58 ` [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head Eric Dumazet
                     ` (2 preceding siblings ...)
  2023-02-03  7:59   ` Paolo Abeni
@ 2023-02-03 19:37   ` kernel test robot
  2023-02-03 19:42     ` Jakub Kicinski
  3 siblings, 1 reply; 17+ messages in thread
From: kernel test robot @ 2023-02-03 19:37 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: oe-kbuild-all, netdev, eric.dumazet, Alexander Duyck,
	Soheil Hassas Yeganeh, Eric Dumazet

Hi Eric,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/net-add-SKB_HEAD_ALIGN-helper/20230203-031126
patch link:    https://lore.kernel.org/r/20230202185801.4179599-5-edumazet%40google.com
patch subject: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
config: arc-randconfig-r014-20230204 (https://download.01.org/0day-ci/archive/20230204/202302040329.E10xZHbY-lkp@intel.com/config)
compiler: arceb-elf-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/002bb9238e98f3fbeb0402c97f711420c3321b93
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Eric-Dumazet/net-add-SKB_HEAD_ALIGN-helper/20230203-031126
        git checkout 002bb9238e98f3fbeb0402c97f711420c3321b93
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arc olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arc SHELL=/bin/bash net/core/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   net/core/skbuff.c: In function 'kmalloc_reserve':
>> net/core/skbuff.c:503:23: error: 'KMALLOC_NOT_NORMAL_BITS' undeclared (first use in this function)
     503 |             !(flags & KMALLOC_NOT_NORMAL_BITS)) {
         |                       ^~~~~~~~~~~~~~~~~~~~~~~
   net/core/skbuff.c:503:23: note: each undeclared identifier is reported only once for each function it appears in


vim +/KMALLOC_NOT_NORMAL_BITS +503 net/core/skbuff.c

   486	
   487	/*
   488	 * kmalloc_reserve is a wrapper around kmalloc_node_track_caller that tells
   489	 * the caller if emergency pfmemalloc reserves are being used. If it is and
   490	 * the socket is later found to be SOCK_MEMALLOC then PFMEMALLOC reserves
   491	 * may be used. Otherwise, the packet data may be discarded until enough
   492	 * memory is free
   493	 */
   494	static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
   495				     bool *pfmemalloc)
   496	{
   497		bool ret_pfmemalloc = false;
   498		unsigned int obj_size;
   499		void *obj;
   500	
   501		obj_size = SKB_HEAD_ALIGN(*size);
   502		if (obj_size <= SKB_SMALL_HEAD_CACHE_SIZE &&
 > 503		    !(flags & KMALLOC_NOT_NORMAL_BITS)) {
   504	
   505			/* skb_small_head_cache has non power of two size,
   506			 * likely forcing SLUB to use order-3 pages.
   507			 * We deliberately attempt a NOMEMALLOC allocation only.
   508			 */
   509			obj = kmem_cache_alloc_node(skb_small_head_cache,
   510					flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
   511					node);
   512			if (obj) {
   513				*size = SKB_SMALL_HEAD_CACHE_SIZE;
   514				goto out;
   515			}
   516		}
   517		*size = obj_size = kmalloc_size_roundup(obj_size);
   518		/*
   519		 * Try a regular allocation, when that fails and we're not entitled
   520		 * to the reserves, fail.
   521		 */
   522		obj = kmalloc_node_track_caller(obj_size,
   523						flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
   524						node);
   525		if (obj || !(gfp_pfmemalloc_allowed(flags)))
   526			goto out;
   527	
   528		/* Try again but now we are using pfmemalloc reserves */
   529		ret_pfmemalloc = true;
   530		obj = kmalloc_node_track_caller(obj_size, flags, node);
   531	
   532	out:
   533		if (pfmemalloc)
   534			*pfmemalloc = ret_pfmemalloc;
   535	
   536		return obj;
   537	}
   538	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests


* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
  2023-02-03 19:37   ` kernel test robot
@ 2023-02-03 19:42     ` Jakub Kicinski
  0 siblings, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2023-02-03 19:42 UTC (permalink / raw)
  To: kernel test robot
  Cc: Eric Dumazet, David S . Miller, Paolo Abeni, oe-kbuild-all,
	netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh

On Sat, 4 Feb 2023 03:37:10 +0800 kernel test robot wrote:
> All errors (new ones prefixed by >>):
> 
>    net/core/skbuff.c: In function 'kmalloc_reserve':
> >> net/core/skbuff.c:503:23: error: 'KMALLOC_NOT_NORMAL_BITS' undeclared (first use in this function)  
>      503 |             !(flags & KMALLOC_NOT_NORMAL_BITS)) {
>          |                       ^~~~~~~~~~~~~~~~~~~~~~~
>    net/core/skbuff.c:503:23: note: each undeclared identifier is reported only once for each function it appears in

diff --git a/net/Kconfig b/net/Kconfig
index 27bc49c7ad73..41e27a4805a8 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -8,6 +8,7 @@ menuconfig NET
        select NLATTR
        select GENERIC_NET_UTILS
        select BPF
+       depends on !SLOB
        help
          Unless you really know what you are doing, you should say Y here.
          The reason is that some programs need kernel networking support even

? :)


* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
  2023-02-03  8:17     ` Eric Dumazet
@ 2023-02-03 19:47       ` Jakub Kicinski
  0 siblings, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2023-02-03 19:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Paolo Abeni, David S . Miller, netdev, eric.dumazet,
	Alexander Duyck, Soheil Hassas Yeganeh

On Fri, 3 Feb 2023 09:17:55 +0100 Eric Dumazet wrote:
> > I *think* a non-power-of-two size is also required to avoid an issue
> > with plain (no GFP_DMA nor __GFP_ACCOUNT) allocations in case of fallback to
> > kmalloc(), to prevent skb_kfree_head() from mis-interpreting skb->head as
> > kmem_cache allocated.  
> 
> Indeed, there are multiple reasons why SKB_SMALL_HEAD_CACHE_SIZE
> needs to be a non-power-of-two.

Since you may need to respin, would you mind spelling this out a bit
more in the comment? Maybe: 

-/* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two. */
+/* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two.
+ * This should ensure that SKB_SMALL_HEAD_HEADROOM is a unique
+ * size, and we can differentiate heads from skb_small_head_cache
+ * vs system slabs by looking at their size (skb_end_offset()).
+ */


* Re: [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs
  2023-02-02 18:57 [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Eric Dumazet
                   ` (3 preceding siblings ...)
  2023-02-02 18:58 ` [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head Eric Dumazet
@ 2023-02-06 18:16 ` Paolo Abeni
  4 siblings, 0 replies; 17+ messages in thread
From: Paolo Abeni @ 2023-02-06 18:16 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski
  Cc: netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh

On Thu, 2023-02-02 at 18:57 +0000, Eric Dumazet wrote:
> Our profile data show that using kmalloc(non_const_size)/kfree(ptr)
> has a certain cost, because kfree(ptr) has to pull a 'struct page'
> in cpu caches.
> 
> Using a dedicated kmem_cache for TCP skb->head allocations makes
> a difference, both in cpu cycles and memory savings.
> 
> This kmem_cache could also be used for GRO skb allocations,
> this is left as a future exercise.
> 
> Eric Dumazet (4):
>   net: add SKB_HEAD_ALIGN() helper
>   net: remove osize variable in __alloc_skb()
>   net: factorize code in kmalloc_reserve()
>   net: add dedicated kmem_cache for typical/small skb->head
> 
>  include/linux/skbuff.h |  8 ++++
>  net/core/skbuff.c      | 95 +++++++++++++++++++++++++++---------------
>  2 files changed, 70 insertions(+), 33 deletions(-)

LGTM,

Acked-by: Paolo Abeni <pabeni@redhat.com>

Thanks!

Paolo



end of thread, other threads:[~2023-02-06 18:17 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-02 18:57 [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Eric Dumazet
2023-02-02 18:57 ` [PATCH net-next 1/4] net: add SKB_HEAD_ALIGN() helper Eric Dumazet
2023-02-02 20:06   ` Soheil Hassas Yeganeh
2023-02-02 18:57 ` [PATCH net-next 2/4] net: remove osize variable in __alloc_skb() Eric Dumazet
2023-02-02 20:07   ` Soheil Hassas Yeganeh
2023-02-02 18:58 ` [PATCH net-next 3/4] net: factorize code in kmalloc_reserve() Eric Dumazet
2023-02-02 20:09   ` Soheil Hassas Yeganeh
2023-02-02 18:58 ` [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head Eric Dumazet
2023-02-02 20:14   ` Soheil Hassas Yeganeh
2023-02-03  5:11   ` Jakub Kicinski
2023-02-03  7:00     ` Eric Dumazet
2023-02-03  7:59   ` Paolo Abeni
2023-02-03  8:17     ` Eric Dumazet
2023-02-03 19:47       ` Jakub Kicinski
2023-02-03 19:37   ` kernel test robot
2023-02-03 19:42     ` Jakub Kicinski
2023-02-06 18:16 ` [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Paolo Abeni
