* [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs
@ 2023-02-02 18:57 Eric Dumazet
2023-02-02 18:57 ` [PATCH net-next 1/4] net: add SKB_HEAD_ALIGN() helper Eric Dumazet
` (4 more replies)
0 siblings, 5 replies; 17+ messages in thread
From: Eric Dumazet @ 2023-02-02 18:57 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh,
Eric Dumazet
Our profiling data show that the kmalloc(non_const_size)/kfree(ptr)
pair carries a measurable cost, because kfree(ptr) has to pull a
'struct page' into cpu caches.
Using a dedicated kmem_cache for TCP skb->head allocations makes
a difference, both in cpu cycles and memory savings.
This kmem_cache could also be used for GRO skb allocations;
this is left as a future exercise.
Eric Dumazet (4):
net: add SKB_HEAD_ALIGN() helper
net: remove osize variable in __alloc_skb()
net: factorize code in kmalloc_reserve()
net: add dedicated kmem_cache for typical/small skb->head
include/linux/skbuff.h | 8 ++++
net/core/skbuff.c | 95 +++++++++++++++++++++++++++---------------
2 files changed, 70 insertions(+), 33 deletions(-)
--
2.39.1.456.gfc5497dd1b-goog
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH net-next 1/4] net: add SKB_HEAD_ALIGN() helper
2023-02-02 18:57 [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Eric Dumazet
@ 2023-02-02 18:57 ` Eric Dumazet
2023-02-02 20:06 ` Soheil Hassas Yeganeh
2023-02-02 18:57 ` [PATCH net-next 2/4] net: remove osize variable in __alloc_skb() Eric Dumazet
` (3 subsequent siblings)
4 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2023-02-02 18:57 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh,
Eric Dumazet
Many places use this expression:
SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
Using SKB_HEAD_ALIGN() will allow us to clean them up.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/skbuff.h | 8 ++++++++
net/core/skbuff.c | 18 ++++++------------
2 files changed, 14 insertions(+), 12 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 5ba12185f43e311e37c9045763c3ee0efc274f2a..f2141b7e3940cee060e8443dbaa147b843eb43a0 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -255,6 +255,14 @@
#define SKB_DATA_ALIGN(X) ALIGN(X, SMP_CACHE_BYTES)
#define SKB_WITH_OVERHEAD(X) \
((X) - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+
+/* For X bytes available in skb->head, what is the minimal
+ * allocation needed, knowing struct skb_shared_info needs
+ * to be aligned.
+ */
+#define SKB_HEAD_ALIGN(X) (SKB_DATA_ALIGN(X) + \
+ SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+
#define SKB_MAX_ORDER(X, ORDER) \
SKB_WITH_OVERHEAD((PAGE_SIZE << (ORDER)) - (X))
#define SKB_MAX_HEAD(X) (SKB_MAX_ORDER((X), 0))
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index bb79b4cb89db344d23609f93b2bcca5103f1e92d..b73de8fb0756c02cf9ba4b7e90854c9c17728463 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -558,8 +558,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
* aligned memory blocks, unless SLUB/SLAB debug is enabled.
* Both skb->head and skb_shared_info are cache line aligned.
*/
- size = SKB_DATA_ALIGN(size);
- size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+ size = SKB_HEAD_ALIGN(size);
osize = kmalloc_size_roundup(size);
data = kmalloc_reserve(osize, gfp_mask, node, &pfmemalloc);
if (unlikely(!data))
@@ -632,8 +631,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
goto skb_success;
}
- len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
- len = SKB_DATA_ALIGN(len);
+ len = SKB_HEAD_ALIGN(len);
if (sk_memalloc_socks())
gfp_mask |= __GFP_MEMALLOC;
@@ -732,8 +730,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
data = page_frag_alloc_1k(&nc->page_small, gfp_mask);
pfmemalloc = NAPI_SMALL_PAGE_PFMEMALLOC(nc->page_small);
} else {
- len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
- len = SKB_DATA_ALIGN(len);
+ len = SKB_HEAD_ALIGN(len);
data = page_frag_alloc(&nc->page, len, gfp_mask);
pfmemalloc = nc->page.pfmemalloc;
@@ -1936,8 +1933,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
if (skb_pfmemalloc(skb))
gfp_mask |= __GFP_MEMALLOC;
- size = SKB_DATA_ALIGN(size);
- size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+ size = SKB_HEAD_ALIGN(size);
size = kmalloc_size_roundup(size);
data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
if (!data)
@@ -6288,8 +6284,7 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
if (skb_pfmemalloc(skb))
gfp_mask |= __GFP_MEMALLOC;
- size = SKB_DATA_ALIGN(size);
- size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+ size = SKB_HEAD_ALIGN(size);
size = kmalloc_size_roundup(size);
data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
if (!data)
@@ -6407,8 +6402,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
if (skb_pfmemalloc(skb))
gfp_mask |= __GFP_MEMALLOC;
- size = SKB_DATA_ALIGN(size);
- size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+ size = SKB_HEAD_ALIGN(size);
size = kmalloc_size_roundup(size);
data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
if (!data)
--
2.39.1.456.gfc5497dd1b-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH net-next 2/4] net: remove osize variable in __alloc_skb()
2023-02-02 18:57 [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Eric Dumazet
2023-02-02 18:57 ` [PATCH net-next 1/4] net: add SKB_HEAD_ALIGN() helper Eric Dumazet
@ 2023-02-02 18:57 ` Eric Dumazet
2023-02-02 20:07 ` Soheil Hassas Yeganeh
2023-02-02 18:58 ` [PATCH net-next 3/4] net: factorize code in kmalloc_reserve() Eric Dumazet
` (2 subsequent siblings)
4 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2023-02-02 18:57 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh,
Eric Dumazet
This is a cleanup patch, preparing for the following change.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/skbuff.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b73de8fb0756c02cf9ba4b7e90854c9c17728463..a82df5289208d69716e60c5c1f201ec3ca50a258 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -533,7 +533,6 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
{
struct kmem_cache *cache;
struct sk_buff *skb;
- unsigned int osize;
bool pfmemalloc;
u8 *data;
@@ -559,16 +558,15 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
* Both skb->head and skb_shared_info are cache line aligned.
*/
size = SKB_HEAD_ALIGN(size);
- osize = kmalloc_size_roundup(size);
- data = kmalloc_reserve(osize, gfp_mask, node, &pfmemalloc);
+ size = kmalloc_size_roundup(size);
+ data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
if (unlikely(!data))
goto nodata;
/* kmalloc_size_roundup() might give us more room than requested.
* Put skb_shared_info exactly at the end of allocated zone,
* to allow max possible filling before reallocation.
*/
- size = SKB_WITH_OVERHEAD(osize);
- prefetchw(data + size);
+ prefetchw(data + SKB_WITH_OVERHEAD(size));
/*
* Only clear those fields we need to clear, not those that we will
@@ -576,7 +574,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
* the tail pointer in struct sk_buff!
*/
memset(skb, 0, offsetof(struct sk_buff, tail));
- __build_skb_around(skb, data, osize);
+ __build_skb_around(skb, data, size);
skb->pfmemalloc = pfmemalloc;
if (flags & SKB_ALLOC_FCLONE) {
--
2.39.1.456.gfc5497dd1b-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH net-next 3/4] net: factorize code in kmalloc_reserve()
2023-02-02 18:57 [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Eric Dumazet
2023-02-02 18:57 ` [PATCH net-next 1/4] net: add SKB_HEAD_ALIGN() helper Eric Dumazet
2023-02-02 18:57 ` [PATCH net-next 2/4] net: remove osize variable in __alloc_skb() Eric Dumazet
@ 2023-02-02 18:58 ` Eric Dumazet
2023-02-02 20:09 ` Soheil Hassas Yeganeh
2023-02-02 18:58 ` [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head Eric Dumazet
2023-02-06 18:16 ` [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Paolo Abeni
4 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2023-02-02 18:58 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh,
Eric Dumazet
All kmalloc_reserve() callers have to make the same computation;
factorizing it prepares for the following patch in the series.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/skbuff.c | 27 +++++++++++----------------
1 file changed, 11 insertions(+), 16 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a82df5289208d69716e60c5c1f201ec3ca50a258..ae0b2aa1f01e8060cc4fe69137e9bd98e44280cc 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -478,17 +478,20 @@ EXPORT_SYMBOL(napi_build_skb);
* may be used. Otherwise, the packet data may be discarded until enough
* memory is free
*/
-static void *kmalloc_reserve(size_t size, gfp_t flags, int node,
+static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
bool *pfmemalloc)
{
- void *obj;
bool ret_pfmemalloc = false;
+ unsigned int obj_size;
+ void *obj;
+ obj_size = SKB_HEAD_ALIGN(*size);
+ *size = obj_size = kmalloc_size_roundup(obj_size);
/*
* Try a regular allocation, when that fails and we're not entitled
* to the reserves, fail.
*/
- obj = kmalloc_node_track_caller(size,
+ obj = kmalloc_node_track_caller(obj_size,
flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
node);
if (obj || !(gfp_pfmemalloc_allowed(flags)))
@@ -496,7 +499,7 @@ static void *kmalloc_reserve(size_t size, gfp_t flags, int node,
/* Try again but now we are using pfmemalloc reserves */
ret_pfmemalloc = true;
- obj = kmalloc_node_track_caller(size, flags, node);
+ obj = kmalloc_node_track_caller(obj_size, flags, node);
out:
if (pfmemalloc)
@@ -557,9 +560,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
* aligned memory blocks, unless SLUB/SLAB debug is enabled.
* Both skb->head and skb_shared_info are cache line aligned.
*/
- size = SKB_HEAD_ALIGN(size);
- size = kmalloc_size_roundup(size);
- data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
+ data = kmalloc_reserve(&size, gfp_mask, node, &pfmemalloc);
if (unlikely(!data))
goto nodata;
/* kmalloc_size_roundup() might give us more room than requested.
@@ -1931,9 +1932,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
if (skb_pfmemalloc(skb))
gfp_mask |= __GFP_MEMALLOC;
- size = SKB_HEAD_ALIGN(size);
- size = kmalloc_size_roundup(size);
- data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
+ data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL);
if (!data)
goto nodata;
size = SKB_WITH_OVERHEAD(size);
@@ -6282,9 +6281,7 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
if (skb_pfmemalloc(skb))
gfp_mask |= __GFP_MEMALLOC;
- size = SKB_HEAD_ALIGN(size);
- size = kmalloc_size_roundup(size);
- data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
+ data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL);
if (!data)
return -ENOMEM;
size = SKB_WITH_OVERHEAD(size);
@@ -6400,9 +6397,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
if (skb_pfmemalloc(skb))
gfp_mask |= __GFP_MEMALLOC;
- size = SKB_HEAD_ALIGN(size);
- size = kmalloc_size_roundup(size);
- data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL);
+ data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL);
if (!data)
return -ENOMEM;
size = SKB_WITH_OVERHEAD(size);
--
2.39.1.456.gfc5497dd1b-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
2023-02-02 18:57 [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Eric Dumazet
` (2 preceding siblings ...)
2023-02-02 18:58 ` [PATCH net-next 3/4] net: factorize code in kmalloc_reserve() Eric Dumazet
@ 2023-02-02 18:58 ` Eric Dumazet
2023-02-02 20:14 ` Soheil Hassas Yeganeh
` (3 more replies)
2023-02-06 18:16 ` [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Paolo Abeni
4 siblings, 4 replies; 17+ messages in thread
From: Eric Dumazet @ 2023-02-02 18:58 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh,
Eric Dumazet
Recent removal of ksize() in alloc_skb() increased
performance because we no longer read
the associated struct page.
We have an equivalent cost at kfree_skb() time:
kfree(skb->head) has to access a struct page, often cold
in cpu caches, to get the owning struct kmem_cache.
Considering that many allocations are small,
we can have our own kmem_cache to avoid the cache line miss.
This also saves memory because these small heads
are no longer padded to 1024 bytes.
CONFIG_SLUB=y
$ grep skbuff_small_head /proc/slabinfo
skbuff_small_head 2907 2907 640 51 8 : tunables 0 0 0 : slabdata 57 57 0
CONFIG_SLAB=y
$ grep skbuff_small_head /proc/slabinfo
skbuff_small_head 607 624 640 6 1 : tunables 54 27 8 : slabdata 104 104 5
Note: after Kees Cook's patches and this one, we might
be able to revert commit
dbae2b062824 ("net: skb: introduce and use a single page frag cache"),
because GRO_MAX_HEAD is also small.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
---
net/core/skbuff.c | 52 ++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 47 insertions(+), 5 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index ae0b2aa1f01e8060cc4fe69137e9bd98e44280cc..3e540b4924701cc57b6fbd1b668bab3b652ee94c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -89,6 +89,19 @@ static struct kmem_cache *skbuff_fclone_cache __ro_after_init;
#ifdef CONFIG_SKB_EXTENSIONS
static struct kmem_cache *skbuff_ext_cache __ro_after_init;
#endif
+static struct kmem_cache *skb_small_head_cache __ro_after_init;
+
+#define SKB_SMALL_HEAD_SIZE SKB_HEAD_ALIGN(MAX_TCP_HEADER)
+
+/* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two. */
+#define SKB_SMALL_HEAD_CACHE_SIZE \
+ (is_power_of_2(SKB_SMALL_HEAD_SIZE) ? \
+ (SKB_SMALL_HEAD_SIZE + L1_CACHE_BYTES) : \
+ SKB_SMALL_HEAD_SIZE)
+
+#define SKB_SMALL_HEAD_HEADROOM \
+ SKB_WITH_OVERHEAD(SKB_SMALL_HEAD_CACHE_SIZE)
+
int sysctl_max_skb_frags __read_mostly = MAX_SKB_FRAGS;
EXPORT_SYMBOL(sysctl_max_skb_frags);
@@ -486,6 +499,21 @@ static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
void *obj;
obj_size = SKB_HEAD_ALIGN(*size);
+ if (obj_size <= SKB_SMALL_HEAD_CACHE_SIZE &&
+ !(flags & KMALLOC_NOT_NORMAL_BITS)) {
+
+ /* skb_small_head_cache has non power of two size,
+ * likely forcing SLUB to use order-3 pages.
+ * We deliberately attempt a NOMEMALLOC allocation only.
+ */
+ obj = kmem_cache_alloc_node(skb_small_head_cache,
+ flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
+ node);
+ if (obj) {
+ *size = SKB_SMALL_HEAD_CACHE_SIZE;
+ goto out;
+ }
+ }
*size = obj_size = kmalloc_size_roundup(obj_size);
/*
* Try a regular allocation, when that fails and we're not entitled
@@ -805,6 +833,14 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data)
return page_pool_return_skb_page(virt_to_page(data));
}
+static void skb_kfree_head(void *head, unsigned int end_offset)
+{
+ if (end_offset == SKB_SMALL_HEAD_HEADROOM)
+ kmem_cache_free(skb_small_head_cache, head);
+ else
+ kfree(head);
+}
+
static void skb_free_head(struct sk_buff *skb)
{
unsigned char *head = skb->head;
@@ -814,7 +850,7 @@ static void skb_free_head(struct sk_buff *skb)
return;
skb_free_frag(head);
} else {
- kfree(head);
+ skb_kfree_head(head, skb_end_offset(skb));
}
}
@@ -1995,7 +2031,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
return 0;
nofrags:
- kfree(data);
+ skb_kfree_head(data, size);
nodata:
return -ENOMEM;
}
@@ -4633,6 +4669,12 @@ void __init skb_init(void)
0,
SLAB_HWCACHE_ALIGN|SLAB_PANIC,
NULL);
+ skb_small_head_cache = kmem_cache_create("skbuff_small_head",
+ SKB_SMALL_HEAD_CACHE_SIZE,
+ 0,
+ SLAB_HWCACHE_ALIGN | SLAB_PANIC,
+ NULL);
+
skb_extensions_init();
}
@@ -6297,7 +6339,7 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
if (skb_cloned(skb)) {
/* drop the old head gracefully */
if (skb_orphan_frags(skb, gfp_mask)) {
- kfree(data);
+ skb_kfree_head(data, size);
return -ENOMEM;
}
for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
@@ -6405,7 +6447,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
memcpy((struct skb_shared_info *)(data + size),
skb_shinfo(skb), offsetof(struct skb_shared_info, frags[0]));
if (skb_orphan_frags(skb, gfp_mask)) {
- kfree(data);
+ skb_kfree_head(data, size);
return -ENOMEM;
}
shinfo = (struct skb_shared_info *)(data + size);
@@ -6441,7 +6483,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
/* skb_frag_unref() is not needed here as shinfo->nr_frags = 0. */
if (skb_has_frag_list(skb))
kfree_skb_list(skb_shinfo(skb)->frag_list);
- kfree(data);
+ skb_kfree_head(data, size);
return -ENOMEM;
}
skb_release_data(skb, SKB_CONSUMED);
--
2.39.1.456.gfc5497dd1b-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH net-next 1/4] net: add SKB_HEAD_ALIGN() helper
2023-02-02 18:57 ` [PATCH net-next 1/4] net: add SKB_HEAD_ALIGN() helper Eric Dumazet
@ 2023-02-02 20:06 ` Soheil Hassas Yeganeh
0 siblings, 0 replies; 17+ messages in thread
From: Soheil Hassas Yeganeh @ 2023-02-02 20:06 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet, Alexander Duyck
On Thu, Feb 2, 2023 at 1:58 PM Eric Dumazet <edumazet@google.com> wrote:
>
> We have many places using this expression:
>
> SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
>
> Use of SKB_HEAD_ALIGN() will allow to clean them.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> [diff snipped; identical to the patch quoted above]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net-next 2/4] net: remove osize variable in __alloc_skb()
2023-02-02 18:57 ` [PATCH net-next 2/4] net: remove osize variable in __alloc_skb() Eric Dumazet
@ 2023-02-02 20:07 ` Soheil Hassas Yeganeh
0 siblings, 0 replies; 17+ messages in thread
From: Soheil Hassas Yeganeh @ 2023-02-02 20:07 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet, Alexander Duyck
On Thu, Feb 2, 2023 at 1:58 PM Eric Dumazet <edumazet@google.com> wrote:
>
> This is a cleanup patch, to prepare following change.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> [diff snipped; identical to the patch quoted above]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net-next 3/4] net: factorize code in kmalloc_reserve()
2023-02-02 18:58 ` [PATCH net-next 3/4] net: factorize code in kmalloc_reserve() Eric Dumazet
@ 2023-02-02 20:09 ` Soheil Hassas Yeganeh
0 siblings, 0 replies; 17+ messages in thread
From: Soheil Hassas Yeganeh @ 2023-02-02 20:09 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet, Alexander Duyck
On Thu, Feb 2, 2023 at 1:58 PM Eric Dumazet <edumazet@google.com> wrote:
>
> All kmalloc_reserve() callers have to make the same computation,
> we can factorize them, to prepare following patch in the series.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> [diff snipped; identical to the patch quoted above]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
2023-02-02 18:58 ` [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head Eric Dumazet
@ 2023-02-02 20:14 ` Soheil Hassas Yeganeh
2023-02-03 5:11 ` Jakub Kicinski
` (2 subsequent siblings)
3 siblings, 0 replies; 17+ messages in thread
From: Soheil Hassas Yeganeh @ 2023-02-02 20:14 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet, Alexander Duyck
On Thu, Feb 2, 2023 at 1:58 PM Eric Dumazet <edumazet@google.com> wrote:
>
> Recent removal of ksize() in alloc_skb() increased
> performance because we no longer read
> the associated struct page.
>
> We have an equivalent cost at kfree_skb() time.
>
> kfree(skb->head) has to access a struct page,
> often cold in cpu caches to get the owning
> struct kmem_cache.
>
> Considering that many allocations are small,
> we can have our own kmem_cache to avoid the cache line miss.
>
> This also saves memory because these small heads
> are no longer padded to 1024 bytes.
>
> CONFIG_SLUB=y
> $ grep skbuff_small_head /proc/slabinfo
> skbuff_small_head 2907 2907 640 51 8 : tunables 0 0 0 : slabdata 57 57 0
>
> CONFIG_SLAB=y
> $ grep skbuff_small_head /proc/slabinfo
> skbuff_small_head 607 624 640 6 1 : tunables 54 27 8 : slabdata 104 104 5
>
> Note: after Kees Cook patches and this one, we might
> be able to revert commit
> dbae2b062824 ("net: skb: introduce and use a single page frag cache")
> because GRO_MAX_HEAD is also small.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Paolo Abeni <pabeni@redhat.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Very nice!
> [diff snipped; identical to the patch quoted above]
> return -ENOMEM;
> }
> skb_release_data(skb, SKB_CONSUMED);
> --
> 2.39.1.456.gfc5497dd1b-goog
>
* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
2023-02-02 18:58 ` [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head Eric Dumazet
2023-02-02 20:14 ` Soheil Hassas Yeganeh
@ 2023-02-03 5:11 ` Jakub Kicinski
2023-02-03 7:00 ` Eric Dumazet
2023-02-03 7:59 ` Paolo Abeni
2023-02-03 19:37 ` kernel test robot
3 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2023-02-03 5:11 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Paolo Abeni, netdev, eric.dumazet,
Alexander Duyck, Soheil Hassas Yeganeh
On Thu, 2 Feb 2023 18:58:01 +0000 Eric Dumazet wrote:
> +/* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two. */
Why is that? Is it to prevent potential mixing up of objects from
the cache with objects from general slabs (since we only do an
end_offset == SKB_SMALL_HEAD_HEADROOM check)?
* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
2023-02-03 5:11 ` Jakub Kicinski
@ 2023-02-03 7:00 ` Eric Dumazet
0 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2023-02-03 7:00 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S . Miller, Paolo Abeni, netdev, eric.dumazet,
Alexander Duyck, Soheil Hassas Yeganeh
On Fri, Feb 3, 2023 at 6:11 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 2 Feb 2023 18:58:01 +0000 Eric Dumazet wrote:
> > +/* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two. */
>
> Why is that? Is it to prevent potential mixing up of objects from
> the cache with objects from general slabs (since we only do an
> end_offset == SKB_SMALL_HEAD_HEADROOM check)?
Good question.
Some alloc_skb() callers use GFP_DMA (or __GFP_ACCOUNT); we can not use
the dedicated kmem_cache for them.
They could get an object of size 512 or 1024.
Since I chose not to add yet another
skb->head_has_been_allocated_from_small_head_cache bit, we want to make
sure we will not have issues in the future if
SKB_HEAD_ALIGN(MAX_TCP_HEADER)
becomes a power of two (for example if some of us increase MAX_SKB_FRAGS).
An alternative would be to add a check at boot time, making sure
no standard cache has the same object size.
That might have an issue with CONFIG_SLOB=y; I wish SLOB was gone already...
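The power-of-two avoidance described above can be sketched in user-space C.
Everything below is an illustrative model, not kernel code: the cache-line
size, the alignment rule, and the helper names are simplified stand-ins for
the macros quoted in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel constants; values are illustrative. */
#define L1_CACHE_BYTES  64
#define SMP_CACHE_BYTES L1_CACHE_BYTES

static bool is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Simplified model of SKB_HEAD_ALIGN(): round the head size up to a
 * cache-line boundary (the real macro also reserves room for
 * struct skb_shared_info). */
static unsigned long skb_head_align(unsigned long size)
{
	return (size + SMP_CACHE_BYTES - 1) & ~(SMP_CACHE_BYTES - 1UL);
}

/* Mirror of the SKB_SMALL_HEAD_CACHE_SIZE logic from the patch: if the
 * aligned size happens to be a power of two, bump it by one cache line,
 * so the dedicated cache's object size can never match a power-of-two
 * kmalloc slab's object size. */
static unsigned long small_head_cache_size(unsigned long head_size)
{
	unsigned long sz = skb_head_align(head_size);

	return is_power_of_2(sz) ? sz + L1_CACHE_BYTES : sz;
}
```

In this model a 500-byte head aligns to 512, which is a power of two, so the
cache size becomes 576; a 700-byte head aligns to 704 and is left alone.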
* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
2023-02-02 18:58 ` [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head Eric Dumazet
2023-02-02 20:14 ` Soheil Hassas Yeganeh
2023-02-03 5:11 ` Jakub Kicinski
@ 2023-02-03 7:59 ` Paolo Abeni
2023-02-03 8:17 ` Eric Dumazet
2023-02-03 19:37 ` kernel test robot
3 siblings, 1 reply; 17+ messages in thread
From: Paolo Abeni @ 2023-02-03 7:59 UTC (permalink / raw)
To: Eric Dumazet, David S . Miller, Jakub Kicinski
Cc: netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh
On Thu, 2023-02-02 at 18:58 +0000, Eric Dumazet wrote:
> Note: after Kees Cook patches and this one, we might
> be able to revert commit
> dbae2b062824 ("net: skb: introduce and use a single page frag cache")
> because GRO_MAX_HEAD is also small.
I guess I'll need some time to do the relevant benchmarks, but I'm not
able to schedule them very soon.
> @@ -486,6 +499,21 @@ static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
> void *obj;
>
> obj_size = SKB_HEAD_ALIGN(*size);
> + if (obj_size <= SKB_SMALL_HEAD_CACHE_SIZE &&
> + !(flags & KMALLOC_NOT_NORMAL_BITS)) {
> +
> + /* skb_small_head_cache has non power of two size,
> + * likely forcing SLUB to use order-3 pages.
> + * We deliberately attempt a NOMEMALLOC allocation only.
> + */
> + obj = kmem_cache_alloc_node(skb_small_head_cache,
> + flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
> + node);
> + if (obj) {
> + *size = SKB_SMALL_HEAD_CACHE_SIZE;
> + goto out;
> + }
In case of kmem_cache allocation failure, should we skip the 2nd
__GFP_NOMEMALLOC attempt below?
I *think* a non-power-of-two size is also required to avoid an issue with
plain (no GFP_DMA nor __GFP_ACCOUNT) allocations in case of a fallback to
kmalloc(), to prevent skb_kfree_head() from mis-interpreting skb->head as
kmem_cache allocated.
Thanks!
Paolo
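Paolo's point — that the dedicated object size must not collide with any
size kmalloc() could hand back — can be illustrated with a small user-space
model. The bucket list below is an assumption (a typical 64-bit SLUB layout;
the real set depends on the kernel config), and the helpers model, rather
than reproduce, kmalloc_size_roundup() and the end_offset check in
skb_kfree_head().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed kmalloc size classes for a typical 64-bit SLUB configuration. */
static const unsigned long kmalloc_buckets[] = {
	8, 16, 32, 64, 96, 128, 192, 256, 512,
	1024, 2048, 4096, 8192,
};

#define NBUCKETS (sizeof(kmalloc_buckets) / sizeof(kmalloc_buckets[0]))

/* Simplified model of kmalloc_size_roundup(): round up to the next bucket. */
static unsigned long kmalloc_size_roundup_model(unsigned long size)
{
	size_t i;

	for (i = 0; i < NBUCKETS; i++)
		if (kmalloc_buckets[i] >= size)
			return kmalloc_buckets[i];
	return size; /* would be a higher-order page allocation in the kernel */
}

/* The end_offset check in skb_kfree_head() can only work if no kmalloc
 * fallback can return an object whose size equals the dedicated cache's
 * object size; otherwise the two head origins are indistinguishable. */
static bool small_head_size_is_ambiguous(unsigned long cache_size)
{
	size_t i;

	for (i = 0; i < NBUCKETS; i++)
		if (kmalloc_buckets[i] == cache_size)
			return true;
	return false;
}
```

Note that within this model the assumed buckets above 256 bytes are all
powers of two, so any non-power-of-two size in the MAX_TCP_HEADER range
(such as 576 or 640) is unambiguous.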
* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
2023-02-03 7:59 ` Paolo Abeni
@ 2023-02-03 8:17 ` Eric Dumazet
2023-02-03 19:47 ` Jakub Kicinski
0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2023-02-03 8:17 UTC (permalink / raw)
To: Paolo Abeni
Cc: David S . Miller, Jakub Kicinski, netdev, eric.dumazet,
Alexander Duyck, Soheil Hassas Yeganeh
On Fri, Feb 3, 2023 at 8:59 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Thu, 2023-02-02 at 18:58 +0000, Eric Dumazet wrote:
> > Note: after Kees Cook patches and this one, we might
> > be able to revert commit
> > dbae2b062824 ("net: skb: introduce and use a single page frag cache")
> > because GRO_MAX_HEAD is also small.
>
> I guess I'll need some time to do the relevant benchmarks, but I'm not
> able to schedule them very soon.
No worries, this can be done later.
Note the results might depend on SLUB/SLAB choice.
>
> > @@ -486,6 +499,21 @@ static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
> > void *obj;
> >
> > obj_size = SKB_HEAD_ALIGN(*size);
> > + if (obj_size <= SKB_SMALL_HEAD_CACHE_SIZE &&
> > + !(flags & KMALLOC_NOT_NORMAL_BITS)) {
> > +
> > + /* skb_small_head_cache has non power of two size,
> > + * likely forcing SLUB to use order-3 pages.
> > + * We deliberately attempt a NOMEMALLOC allocation only.
> > + */
> > + obj = kmem_cache_alloc_node(skb_small_head_cache,
> > + flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
> > + node);
> > + if (obj) {
> > + *size = SKB_SMALL_HEAD_CACHE_SIZE;
> > + goto out;
> > + }
>
> In case kmem allocation failure, should we try to skip the 2nd
> __GFP_NOMEMALLOC attempt below?
We could, but my reasoning was that we might still find an object in the
other kmem_cache freelists.
kmalloc-1 objects tend to use smaller page orders, so they are more likely
to succeed if there is no order-3 page available in the buddy allocator.
(I mentioned this in the comment.)
>
> I *think* a non-power-of-two size is also required to avoid an issue with
> plain (no GFP_DMA nor __GFP_ACCOUNT) allocations in case of a fallback to
> kmalloc(), to prevent skb_kfree_head() from mis-interpreting skb->head as
> kmem_cache allocated.
Indeed, there are multiple reasons why SKB_SMALL_HEAD_CACHE_SIZE
needs to be a non-power-of-two.
* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
2023-02-02 18:58 ` [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head Eric Dumazet
` (2 preceding siblings ...)
2023-02-03 7:59 ` Paolo Abeni
@ 2023-02-03 19:37 ` kernel test robot
2023-02-03 19:42 ` Jakub Kicinski
3 siblings, 1 reply; 17+ messages in thread
From: kernel test robot @ 2023-02-03 19:37 UTC (permalink / raw)
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: oe-kbuild-all, netdev, eric.dumazet, Alexander Duyck,
Soheil Hassas Yeganeh, Eric Dumazet
Hi Eric,
I love your patch! Yet something to improve:
[auto build test ERROR on net-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/net-add-SKB_HEAD_ALIGN-helper/20230203-031126
patch link: https://lore.kernel.org/r/20230202185801.4179599-5-edumazet%40google.com
patch subject: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
config: arc-randconfig-r014-20230204 (https://download.01.org/0day-ci/archive/20230204/202302040329.E10xZHbY-lkp@intel.com/config)
compiler: arceb-elf-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/002bb9238e98f3fbeb0402c97f711420c3321b93
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Eric-Dumazet/net-add-SKB_HEAD_ALIGN-helper/20230203-031126
git checkout 002bb9238e98f3fbeb0402c97f711420c3321b93
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arc olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arc SHELL=/bin/bash net/core/
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
net/core/skbuff.c: In function 'kmalloc_reserve':
>> net/core/skbuff.c:503:23: error: 'KMALLOC_NOT_NORMAL_BITS' undeclared (first use in this function)
503 | !(flags & KMALLOC_NOT_NORMAL_BITS)) {
| ^~~~~~~~~~~~~~~~~~~~~~~
net/core/skbuff.c:503:23: note: each undeclared identifier is reported only once for each function it appears in
vim +/KMALLOC_NOT_NORMAL_BITS +503 net/core/skbuff.c
486
487 /*
488 * kmalloc_reserve is a wrapper around kmalloc_node_track_caller that tells
489 * the caller if emergency pfmemalloc reserves are being used. If it is and
490 * the socket is later found to be SOCK_MEMALLOC then PFMEMALLOC reserves
491 * may be used. Otherwise, the packet data may be discarded until enough
492 * memory is free
493 */
494 static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
495 bool *pfmemalloc)
496 {
497 bool ret_pfmemalloc = false;
498 unsigned int obj_size;
499 void *obj;
500
501 obj_size = SKB_HEAD_ALIGN(*size);
502 if (obj_size <= SKB_SMALL_HEAD_CACHE_SIZE &&
> 503 !(flags & KMALLOC_NOT_NORMAL_BITS)) {
504
505 /* skb_small_head_cache has non power of two size,
506 * likely forcing SLUB to use order-3 pages.
507 * We deliberately attempt a NOMEMALLOC allocation only.
508 */
509 obj = kmem_cache_alloc_node(skb_small_head_cache,
510 flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
511 node);
512 if (obj) {
513 *size = SKB_SMALL_HEAD_CACHE_SIZE;
514 goto out;
515 }
516 }
517 *size = obj_size = kmalloc_size_roundup(obj_size);
518 /*
519 * Try a regular allocation, when that fails and we're not entitled
520 * to the reserves, fail.
521 */
522 obj = kmalloc_node_track_caller(obj_size,
523 flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
524 node);
525 if (obj || !(gfp_pfmemalloc_allowed(flags)))
526 goto out;
527
528 /* Try again but now we are using pfmemalloc reserves */
529 ret_pfmemalloc = true;
530 obj = kmalloc_node_track_caller(obj_size, flags, node);
531
532 out:
533 if (pfmemalloc)
534 *pfmemalloc = ret_pfmemalloc;
535
536 return obj;
537 }
538
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
2023-02-03 19:37 ` kernel test robot
@ 2023-02-03 19:42 ` Jakub Kicinski
0 siblings, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2023-02-03 19:42 UTC (permalink / raw)
To: kernel test robot
Cc: Eric Dumazet, David S . Miller, Paolo Abeni, oe-kbuild-all,
netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh
On Sat, 4 Feb 2023 03:37:10 +0800 kernel test robot wrote:
> All errors (new ones prefixed by >>):
>
> net/core/skbuff.c: In function 'kmalloc_reserve':
> >> net/core/skbuff.c:503:23: error: 'KMALLOC_NOT_NORMAL_BITS' undeclared (first use in this function)
> 503 | !(flags & KMALLOC_NOT_NORMAL_BITS)) {
> | ^~~~~~~~~~~~~~~~~~~~~~~
> net/core/skbuff.c:503:23: note: each undeclared identifier is reported only once for each function it appears in
diff --git a/net/Kconfig b/net/Kconfig
index 27bc49c7ad73..41e27a4805a8 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -8,6 +8,7 @@ menuconfig NET
select NLATTR
select GENERIC_NET_UTILS
select BPF
+ depends on !SLOB
help
Unless you really know what you are doing, you should say Y here.
The reason is that some programs need kernel networking support even
? :)
* Re: [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
2023-02-03 8:17 ` Eric Dumazet
@ 2023-02-03 19:47 ` Jakub Kicinski
0 siblings, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2023-02-03 19:47 UTC (permalink / raw)
To: Eric Dumazet
Cc: Paolo Abeni, David S . Miller, netdev, eric.dumazet,
Alexander Duyck, Soheil Hassas Yeganeh
On Fri, 3 Feb 2023 09:17:55 +0100 Eric Dumazet wrote:
> > I *think* a non-power-of-two size is also required to avoid an issue with
> > plain (no GFP_DMA nor __GFP_ACCOUNT) allocations in case of a fallback to
> > kmalloc(), to prevent skb_kfree_head() from mis-interpreting skb->head as
> > kmem_cache allocated.
>
> Indeed, there are multiple reasons why SKB_SMALL_HEAD_CACHE_SIZE
> needs to be a non-power-of-two.
Since you may need to respin, would you mind spelling this out a bit
more in the comment? Maybe:
-/* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two. */
+/* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two.
+ * This should ensure that SKB_SMALL_HEAD_HEADROOM is a unique
+ * size, and we can differentiate heads from skb_small_head_cache
+ * vs system slabs by looking at their size (skb_end_offset()).
+ */
* Re: [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs
2023-02-02 18:57 [PATCH net-next 0/4] net: core: use a dedicated kmem_cache for skb head allocs Eric Dumazet
` (3 preceding siblings ...)
2023-02-02 18:58 ` [PATCH net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head Eric Dumazet
@ 2023-02-06 18:16 ` Paolo Abeni
4 siblings, 0 replies; 17+ messages in thread
From: Paolo Abeni @ 2023-02-06 18:16 UTC (permalink / raw)
To: Eric Dumazet, David S . Miller, Jakub Kicinski
Cc: netdev, eric.dumazet, Alexander Duyck, Soheil Hassas Yeganeh
On Thu, 2023-02-02 at 18:57 +0000, Eric Dumazet wrote:
> Our profile data show that using kmalloc(non_const_size)/kfree(ptr)
> has a certain cost, because kfree(ptr) has to pull a 'struct page'
> in cpu caches.
>
> Using a dedicated kmem_cache for TCP skb->head allocations makes
> a difference, both in cpu cycles and memory savings.
>
> This kmem_cache could also be used for GRO skb allocations,
> this is left as a future exercise.
>
> Eric Dumazet (4):
> net: add SKB_HEAD_ALIGN() helper
> net: remove osize variable in __alloc_skb()
> net: factorize code in kmalloc_reserve()
> net: add dedicated kmem_cache for typical/small skb->head
>
> include/linux/skbuff.h | 8 ++++
> net/core/skbuff.c | 95 +++++++++++++++++++++++++++---------------
> 2 files changed, 70 insertions(+), 33 deletions(-)
LGTM,
Acked-by: Paolo Abeni <pabeni@redhat.com>
Thanks!
Paolo