* [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing
@ 2021-01-11 18:27 Alexander Lobakin
  2021-01-11 18:28 ` [PATCH net-next 1/5] skbuff: rename fields of struct napi_alloc_cache to be more intuitive Alexander Lobakin
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Alexander Lobakin @ 2021-01-11 18:27 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Eric Dumazet, Edward Cree, Jonathan Lemon, Willem de Bruijn,
	Miaohe Lin, Alexander Lobakin, Steffen Klassert, Guillaume Nault,
	Yadu Kishore, Al Viro, netdev, linux-kernel

Inspired by the cpu_map_kthread_run() and _kfree_skb_defer() logic.

Currently, all sorts of skb allocation always allocate
skbuff_heads one by one via kmem_cache_alloc().
On the other hand, we have the percpu napi_alloc_cache to store
skbuff_heads queued up for freeing and flush them in bulk.

We can use this struct to cache and bulk not only the freeing, but also
the allocation of new skbuff_heads, as well as to reuse cached-to-free
heads instead of allocating new ones.
As accessing napi_alloc_cache implies NAPI softirq context, do this
only for __napi_alloc_skb() and its derivatives (napi_alloc_skb()
and napi_get_frags()). They have roughly 69 call sites, which is
quite a number.
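
For reference, the slab bulk API this series builds on, in a minimal
usage sketch (not taken from the patches; error handling elided):

	void *heads[NAPI_SKB_CACHE_SIZE];
	int n;

	/* grab up to NAPI_SKB_CACHE_SIZE skbuff_heads in one slab call;
	 * returns the number actually allocated, which may be 0
	 */
	n = kmem_cache_alloc_bulk(skbuff_head_cache, GFP_ATOMIC,
				  NAPI_SKB_CACHE_SIZE, heads);

	/* ... hand out heads[0..n-1] from the percpu cache ... */

	/* return a whole batch to the slab allocator in one call */
	kmem_cache_free_bulk(skbuff_head_cache, n, heads);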

iperf3 showed a nice bump from 910 to 935 Mbit/s while performing
UDP VLAN NAT on a 1.2 GHz MIPS board. The boost is likely to be
way bigger on more powerful hosts and NICs with tens of Mpps.

Patches 1-2 are preparation steps, while 3-5 do the real work.

Alexander Lobakin (5):
  skbuff: rename fields of struct napi_alloc_cache to be more intuitive
  skbuff: open-code __build_skb() inside __napi_alloc_skb()
  skbuff: reuse skbuff_heads from flush_skb_cache if available
  skbuff: allocate skbuff_heads by bulks instead of one by one
  skbuff: refill skb_cache early from deferred-to-consume entries

 net/core/skbuff.c | 62 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 48 insertions(+), 14 deletions(-)

-- 
2.30.0




* [PATCH net-next 1/5] skbuff: rename fields of struct napi_alloc_cache to be more intuitive
  2021-01-11 18:27 [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing Alexander Lobakin
@ 2021-01-11 18:28 ` Alexander Lobakin
  2021-01-11 18:28   ` [PATCH net-next 2/5] skbuff: open-code __build_skb() inside __napi_alloc_skb() Alexander Lobakin
                     ` (4 more replies)
  2021-01-12  8:20 ` [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing Eric Dumazet
  2021-01-12  9:54 ` Edward Cree
  2 siblings, 5 replies; 21+ messages in thread
From: Alexander Lobakin @ 2021-01-11 18:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Eric Dumazet, Edward Cree, Jonathan Lemon, Willem de Bruijn,
	Miaohe Lin, Alexander Lobakin, Steffen Klassert, Guillaume Nault,
	Yadu Kishore, Al Viro, netdev, linux-kernel

The skb_cache and skb_count fields are used to store skbuff_heads
queued for freeing so they can be flushed in bulk, and aren't related
to the allocation path. Give them more obvious names to improve code
understanding and to allow expanding this struct with more
allocation-related elements.

Misc: indent the struct napi_alloc_cache declaration for better
readability.

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 net/core/skbuff.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7626a33cce59..17ae5e90f103 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -366,9 +366,9 @@ EXPORT_SYMBOL(build_skb_around);
 #define NAPI_SKB_CACHE_SIZE	64
 
 struct napi_alloc_cache {
-	struct page_frag_cache page;
-	unsigned int skb_count;
-	void *skb_cache[NAPI_SKB_CACHE_SIZE];
+	struct page_frag_cache	page;
+	u32			flush_skb_count;
+	void			*flush_skb_cache[NAPI_SKB_CACHE_SIZE];
 };
 
 static DEFINE_PER_CPU(struct page_frag_cache, netdev_alloc_cache);
@@ -860,11 +860,11 @@ void __kfree_skb_flush(void)
 {
 	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
 
-	/* flush skb_cache if containing objects */
-	if (nc->skb_count) {
-		kmem_cache_free_bulk(skbuff_head_cache, nc->skb_count,
-				     nc->skb_cache);
-		nc->skb_count = 0;
+	/* flush flush_skb_cache if containing objects */
+	if (nc->flush_skb_count) {
+		kmem_cache_free_bulk(skbuff_head_cache, nc->flush_skb_count,
+				     nc->flush_skb_cache);
+		nc->flush_skb_count = 0;
 	}
 }
 
@@ -876,18 +876,18 @@ static inline void _kfree_skb_defer(struct sk_buff *skb)
 	skb_release_all(skb);
 
 	/* record skb to CPU local list */
-	nc->skb_cache[nc->skb_count++] = skb;
+	nc->flush_skb_cache[nc->flush_skb_count++] = skb;
 
 #ifdef CONFIG_SLUB
 	/* SLUB writes into objects when freeing */
 	prefetchw(skb);
 #endif
 
-	/* flush skb_cache if it is filled */
-	if (unlikely(nc->skb_count == NAPI_SKB_CACHE_SIZE)) {
+	/* flush flush_skb_cache if it is filled */
+	if (unlikely(nc->flush_skb_count == NAPI_SKB_CACHE_SIZE)) {
 		kmem_cache_free_bulk(skbuff_head_cache, NAPI_SKB_CACHE_SIZE,
-				     nc->skb_cache);
-		nc->skb_count = 0;
+				     nc->flush_skb_cache);
+		nc->flush_skb_count = 0;
 	}
 }
 void __kfree_skb_defer(struct sk_buff *skb)
-- 
2.30.0




* [PATCH net-next 2/5] skbuff: open-code __build_skb() inside __napi_alloc_skb()
  2021-01-11 18:28 ` [PATCH net-next 1/5] skbuff: rename fields of struct napi_alloc_cache to be more intuitive Alexander Lobakin
@ 2021-01-11 18:28   ` Alexander Lobakin
  2021-01-11 18:29   ` [PATCH net-next 3/5] skbuff: reuse skbuff_heads from flush_skb_cache if available Alexander Lobakin
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Alexander Lobakin @ 2021-01-11 18:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Eric Dumazet, Edward Cree, Jonathan Lemon, Willem de Bruijn,
	Miaohe Lin, Alexander Lobakin, Steffen Klassert, Guillaume Nault,
	Yadu Kishore, Al Viro, netdev, linux-kernel

In preparation for skbuff_heads caching and reusing, open-code
__build_skb() inside __napi_alloc_skb(), factoring out the
skbuff_head allocation itself.
Note that the return value of __build_skb_around() is not checked,
since it never returns anything except the given skb.

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 net/core/skbuff.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 17ae5e90f103..3c904c29efbb 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -485,6 +485,11 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
 }
 EXPORT_SYMBOL(__netdev_alloc_skb);
 
+static struct sk_buff *__napi_decache_skb(struct napi_alloc_cache *nc)
+{
+	return kmem_cache_alloc(skbuff_head_cache, GFP_ATOMIC);
+}
+
 /**
  *	__napi_alloc_skb - allocate skbuff for rx in a specific NAPI instance
  *	@napi: napi instance this buffer was allocated for
@@ -525,12 +530,15 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 	if (unlikely(!data))
 		return NULL;
 
-	skb = __build_skb(data, len);
+	skb = __napi_decache_skb(nc);
 	if (unlikely(!skb)) {
 		skb_free_frag(data);
 		return NULL;
 	}
 
+	memset(skb, 0, offsetof(struct sk_buff, tail));
+	__build_skb_around(skb, data, len);
+
 	if (nc->page.pfmemalloc)
 		skb->pfmemalloc = 1;
 	skb->head_frag = 1;
-- 
2.30.0




* [PATCH net-next 3/5] skbuff: reuse skbuff_heads from flush_skb_cache if available
  2021-01-11 18:28 ` [PATCH net-next 1/5] skbuff: rename fields of struct napi_alloc_cache to be more intuitive Alexander Lobakin
  2021-01-11 18:28   ` [PATCH net-next 2/5] skbuff: open-code __build_skb() inside __napi_alloc_skb() Alexander Lobakin
@ 2021-01-11 18:29   ` Alexander Lobakin
  2021-01-11 18:29   ` [PATCH net-next 4/5] skbuff: allocate skbuff_heads by bulks instead of one by one Alexander Lobakin
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Alexander Lobakin @ 2021-01-11 18:29 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Eric Dumazet, Edward Cree, Jonathan Lemon, Willem de Bruijn,
	Miaohe Lin, Alexander Lobakin, Steffen Klassert, Guillaume Nault,
	Yadu Kishore, Al Viro, netdev, linux-kernel

Instead of unconditionally allocating a new skbuff_head and
unconditionally flushing flush_skb_cache, reuse the heads queued
up for flushing if there are any.
skbuff_heads stored in flush_skb_cache are already unreferenced
from any pages or extensions and almost ready for use. We perform
the zeroing in __napi_alloc_skb() anyway, regardless of where our
skbuff_head came from.

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 net/core/skbuff.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3c904c29efbb..0e8c597ff6ce 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -487,6 +487,9 @@ EXPORT_SYMBOL(__netdev_alloc_skb);
 
 static struct sk_buff *__napi_decache_skb(struct napi_alloc_cache *nc)
 {
+	if (nc->flush_skb_count)
+		return nc->flush_skb_cache[--nc->flush_skb_count];
+
 	return kmem_cache_alloc(skbuff_head_cache, GFP_ATOMIC);
 }
 
-- 
2.30.0




* [PATCH net-next 4/5] skbuff: allocate skbuff_heads by bulks instead of one by one
  2021-01-11 18:28 ` [PATCH net-next 1/5] skbuff: rename fields of struct napi_alloc_cache to be more intuitive Alexander Lobakin
  2021-01-11 18:28   ` [PATCH net-next 2/5] skbuff: open-code __build_skb() inside __napi_alloc_skb() Alexander Lobakin
  2021-01-11 18:29   ` [PATCH net-next 3/5] skbuff: reuse skbuff_heads from flush_skb_cache if available Alexander Lobakin
@ 2021-01-11 18:29   ` Alexander Lobakin
  2021-01-11 18:29   ` [PATCH net-next 5/5] skbuff: refill skb_cache early from deferred-to-consume entries Alexander Lobakin
  2021-01-11 18:49   ` [PATCH net-next 1/5] skbuff: rename fields of struct napi_alloc_cache to be more intuitive Jonathan Lemon
  4 siblings, 0 replies; 21+ messages in thread
From: Alexander Lobakin @ 2021-01-11 18:29 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Eric Dumazet, Edward Cree, Jonathan Lemon, Willem de Bruijn,
	Miaohe Lin, Alexander Lobakin, Steffen Klassert, Guillaume Nault,
	Yadu Kishore, Al Viro, netdev, linux-kernel

Use the same napi_alloc_cache struct and the same approach as used
for bulk-freeing skbuff_heads to allocate them for new skbs.
The new skb_cache will store up to NAPI_SKB_CACHE_SIZE skbuff_heads
(currently 64, which equals NAPI_POLL_WEIGHT, enough to serve one
polling cycle) and will be refilled in bulk on full depletion or
after completing network softirqs.

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 net/core/skbuff.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 0e8c597ff6ce..57a7307689f3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -367,6 +367,8 @@ EXPORT_SYMBOL(build_skb_around);
 
 struct napi_alloc_cache {
 	struct page_frag_cache	page;
+	u32			skb_count;
+	void			*skb_cache[NAPI_SKB_CACHE_SIZE];
 	u32			flush_skb_count;
 	void			*flush_skb_cache[NAPI_SKB_CACHE_SIZE];
 };
@@ -490,7 +492,15 @@ static struct sk_buff *__napi_decache_skb(struct napi_alloc_cache *nc)
 	if (nc->flush_skb_count)
 		return nc->flush_skb_cache[--nc->flush_skb_count];
 
-	return kmem_cache_alloc(skbuff_head_cache, GFP_ATOMIC);
+	if (unlikely(!nc->skb_count))
+		nc->skb_count = kmem_cache_alloc_bulk(skbuff_head_cache,
+						      GFP_ATOMIC,
+						      NAPI_SKB_CACHE_SIZE,
+						      nc->skb_cache);
+	if (unlikely(!nc->skb_count))
+		return NULL;
+
+	return nc->skb_cache[--nc->skb_count];
 }
 
 /**
@@ -870,6 +880,7 @@ void __consume_stateless_skb(struct sk_buff *skb)
 void __kfree_skb_flush(void)
 {
 	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	u32 num;
 
 	/* flush flush_skb_cache if containing objects */
 	if (nc->flush_skb_count) {
@@ -877,6 +888,13 @@ void __kfree_skb_flush(void)
 				     nc->flush_skb_cache);
 		nc->flush_skb_count = 0;
 	}
+
+	num = NAPI_SKB_CACHE_SIZE - nc->skb_count;
+	if (num)
+		nc->skb_count += kmem_cache_alloc_bulk(skbuff_head_cache,
+						       GFP_ATOMIC, num,
+						       nc->skb_cache +
+						       nc->skb_count);
 }
 
 static inline void _kfree_skb_defer(struct sk_buff *skb)
-- 
2.30.0




* [PATCH net-next 5/5] skbuff: refill skb_cache early from deferred-to-consume entries
  2021-01-11 18:28 ` [PATCH net-next 1/5] skbuff: rename fields of struct napi_alloc_cache to be more intuitive Alexander Lobakin
                     ` (2 preceding siblings ...)
  2021-01-11 18:29   ` [PATCH net-next 4/5] skbuff: allocate skbuff_heads by bulks instead of one by one Alexander Lobakin
@ 2021-01-11 18:29   ` Alexander Lobakin
  2021-01-11 18:49   ` [PATCH net-next 1/5] skbuff: rename fields of struct napi_alloc_cache to be more intuitive Jonathan Lemon
  4 siblings, 0 replies; 21+ messages in thread
From: Alexander Lobakin @ 2021-01-11 18:29 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: Eric Dumazet, Edward Cree, Jonathan Lemon, Willem de Bruijn,
	Miaohe Lin, Alexander Lobakin, Steffen Klassert, Guillaume Nault,
	Yadu Kishore, Al Viro, netdev, linux-kernel

Instead of unconditionally queueing ready-to-consume skbuff_heads
to flush_skb_cache, feed them to skb_cache if it isn't full yet.
This greatly reduces the frequency of kmem_cache_alloc_bulk() calls.
This greatly reduces the frequency of kmem_cache_alloc_bulk() calls.

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 net/core/skbuff.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 57a7307689f3..ba0d5611635e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -904,6 +904,11 @@ static inline void _kfree_skb_defer(struct sk_buff *skb)
 	/* drop skb->head and call any destructors for packet */
 	skb_release_all(skb);
 
+	if (nc->skb_count < NAPI_SKB_CACHE_SIZE) {
+		nc->skb_cache[nc->skb_count++] = skb;
+		return;
+	}
+
 	/* record skb to CPU local list */
 	nc->flush_skb_cache[nc->flush_skb_count++] = skb;
 
-- 
2.30.0




* Re: [PATCH net-next 1/5] skbuff: rename fields of struct napi_alloc_cache to be more intuitive
  2021-01-11 18:28 ` [PATCH net-next 1/5] skbuff: rename fields of struct napi_alloc_cache to be more intuitive Alexander Lobakin
                     ` (3 preceding siblings ...)
  2021-01-11 18:29   ` [PATCH net-next 5/5] skbuff: refill skb_cache early from deferred-to-consume entries Alexander Lobakin
@ 2021-01-11 18:49   ` Jonathan Lemon
  2021-01-11 21:03     ` Alexander Lobakin
  4 siblings, 1 reply; 21+ messages in thread
From: Jonathan Lemon @ 2021-01-11 18:49 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: David S. Miller, Jakub Kicinski, Eric Dumazet, Edward Cree,
	Willem de Bruijn, Miaohe Lin, Steffen Klassert, Guillaume Nault,
	Yadu Kishore, Al Viro, netdev, linux-kernel

On Mon, Jan 11, 2021 at 06:28:21PM +0000, Alexander Lobakin wrote:
> The skb_cache and skb_count fields are used to store skbuff_heads
> queued for freeing so they can be flushed in bulk, and aren't related
> to the allocation path. Give them more obvious names to improve code
> understanding and to allow expanding this struct with more
> allocation-related elements.

I don't think prefixing these with flush_ is the correct approach;
flush is just an operation on the structure, not a property of the
structure itself.  It especially becomes confusing in the later 
patches when the cache is used on the allocation path.
-- 
Jonathan


* Re: [PATCH net-next 1/5] skbuff: rename fields of struct napi_alloc_cache to be more intuitive
  2021-01-11 18:49   ` [PATCH net-next 1/5] skbuff: rename fields of struct napi_alloc_cache to be more intuitive Jonathan Lemon
@ 2021-01-11 21:03     ` Alexander Lobakin
  0 siblings, 0 replies; 21+ messages in thread
From: Alexander Lobakin @ 2021-01-11 21:03 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: Alexander Lobakin, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Edward Cree, Willem de Bruijn, Miaohe Lin, Steffen Klassert,
	Guillaume Nault, Yadu Kishore, Al Viro, netdev, linux-kernel

From: Jonathan Lemon <jonathan.lemon@gmail.com>
Date: Mon, 11 Jan 2021 10:49:45 -0800

> On Mon, Jan 11, 2021 at 06:28:21PM +0000, Alexander Lobakin wrote:
>> The skb_cache and skb_count fields are used to store skbuff_heads
>> queued for freeing so they can be flushed in bulk, and aren't related
>> to the allocation path. Give them more obvious names to improve code
>> understanding and to allow expanding this struct with more
>> allocation-related elements.
>
> I don't think prefixing these with flush_ is the correct approach;
> flush is just an operation on the structure, not a property of the
> structure itself.  It especially becomes confusing in the later
> patches when the cache is used on the allocation path.

Agreed, but I didn't come up with anything more fitting. Any
suggestions, maybe?

> --
> Jonathan

Thanks,
Al



* Re: [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing
  2021-01-11 18:27 [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing Alexander Lobakin
  2021-01-11 18:28 ` [PATCH net-next 1/5] skbuff: rename fields of struct napi_alloc_cache to be more intuitive Alexander Lobakin
@ 2021-01-12  8:20 ` Eric Dumazet
  2021-01-12 10:56   ` Alexander Lobakin
  2021-01-12  9:54 ` Edward Cree
  2 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2021-01-12  8:20 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: David S. Miller, Jakub Kicinski, Edward Cree, Jonathan Lemon,
	Willem de Bruijn, Miaohe Lin, Steffen Klassert, Guillaume Nault,
	Yadu Kishore, Al Viro, netdev, LKML

On Mon, Jan 11, 2021 at 7:27 PM Alexander Lobakin <alobakin@pm.me> wrote:
>
> Inspired by the cpu_map_kthread_run() and _kfree_skb_defer() logic.
>
> Currently, all sorts of skb allocation always allocate
> skbuff_heads one by one via kmem_cache_alloc().
> On the other hand, we have the percpu napi_alloc_cache to store
> skbuff_heads queued up for freeing and flush them in bulk.
>
> We can use this struct to cache and bulk not only the freeing, but also
> the allocation of new skbuff_heads, as well as to reuse cached-to-free
> heads instead of allocating new ones.
> As accessing napi_alloc_cache implies NAPI softirq context, do this
> only for __napi_alloc_skb() and its derivatives (napi_alloc_skb()
> and napi_get_frags()). They have roughly 69 call sites, which is
> quite a number.
>
> iperf3 showed a nice bump from 910 to 935 Mbit/s while performing
> UDP VLAN NAT on a 1.2 GHz MIPS board. The boost is likely to be
> way bigger on more powerful hosts and NICs with tens of Mpps.

What is the latency cost of these bulk allocations, and for TCP traffic,
on which GRO is the norm?

Adding caches increases the cache footprint when the cache is populated.

I wonder if your iperf3 numbers are simply wrong because of lack of
GRO in this UDP VLAN NAT case.

We are adding a lot of additional code, and thus icache pressure, that
iperf3 tests cannot really measure.

Most Linux devices simply handle one packet at a time (one packet per
interrupt).


* Re: [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing
  2021-01-11 18:27 [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing Alexander Lobakin
  2021-01-11 18:28 ` [PATCH net-next 1/5] skbuff: rename fields of struct napi_alloc_cache to be more intuitive Alexander Lobakin
  2021-01-12  8:20 ` [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing Eric Dumazet
@ 2021-01-12  9:54 ` Edward Cree
  2021-01-12 11:08   ` Alexander Lobakin
  2 siblings, 1 reply; 21+ messages in thread
From: Edward Cree @ 2021-01-12  9:54 UTC (permalink / raw)
  To: Alexander Lobakin, David S. Miller, Jakub Kicinski
  Cc: Eric Dumazet, Edward Cree, Jonathan Lemon, Willem de Bruijn,
	Miaohe Lin, Steffen Klassert, Guillaume Nault, Yadu Kishore,
	Al Viro, netdev, linux-kernel

Without wishing to weigh in on whether this caching is a good idea...
Wouldn't it be simpler, rather than having two separate "alloc" and "flush"
 caches, to have a single larger cache, such that whenever it becomes full
 we bulk flush the top half, and when it's empty we bulk alloc the bottom
 half?  That should mean fewer branches, fewer instructions etc. than
 having to decide which cache to act upon every time.
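
Something like the below, perhaps (an untested sketch just to
illustrate the shape of it; the helper names are made up, and
skb_cache/skb_count are the fields from patch 4):

	static struct sk_buff *napi_skb_cache_get(struct napi_alloc_cache *nc)
	{
		/* empty: bulk-alloc the bottom half in one go */
		if (unlikely(!nc->skb_count))
			nc->skb_count = kmem_cache_alloc_bulk(skbuff_head_cache,
							      GFP_ATOMIC,
							      NAPI_SKB_CACHE_SIZE / 2,
							      nc->skb_cache);
		if (unlikely(!nc->skb_count))
			return NULL;

		return nc->skb_cache[--nc->skb_count];
	}

	static void napi_skb_cache_put(struct napi_alloc_cache *nc,
				       struct sk_buff *skb)
	{
		nc->skb_cache[nc->skb_count++] = skb;

		/* full: bulk-flush the top half, keep the rest cached */
		if (unlikely(nc->skb_count == NAPI_SKB_CACHE_SIZE)) {
			kmem_cache_free_bulk(skbuff_head_cache,
					     NAPI_SKB_CACHE_SIZE / 2,
					     nc->skb_cache + NAPI_SKB_CACHE_SIZE / 2);
			nc->skb_count = NAPI_SKB_CACHE_SIZE / 2;
		}
	}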

-ed


* Re: [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing
  2021-01-12  8:20 ` [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing Eric Dumazet
@ 2021-01-12 10:56   ` Alexander Lobakin
  2021-01-12 12:32     ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Alexander Lobakin @ 2021-01-12 10:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexander Lobakin, David S. Miller, Jakub Kicinski, Edward Cree,
	Jonathan Lemon, Willem de Bruijn, Miaohe Lin, Steffen Klassert,
	Guillaume Nault, Yadu Kishore, Al Viro, netdev, LKML

From: Eric Dumazet <edumazet@google.com>
Date: Tue, 12 Jan 2021 09:20:39 +0100

> On Mon, Jan 11, 2021 at 7:27 PM Alexander Lobakin <alobakin@pm.me> wrote:
>>
>> Inspired by the cpu_map_kthread_run() and _kfree_skb_defer() logic.
>>
>> Currently, all sorts of skb allocation always allocate
>> skbuff_heads one by one via kmem_cache_alloc().
>> On the other hand, we have the percpu napi_alloc_cache to store
>> skbuff_heads queued up for freeing and flush them in bulk.
>>
>> We can use this struct to cache and bulk not only the freeing, but also
>> the allocation of new skbuff_heads, as well as to reuse cached-to-free
>> heads instead of allocating new ones.
>> As accessing napi_alloc_cache implies NAPI softirq context, do this
>> only for __napi_alloc_skb() and its derivatives (napi_alloc_skb()
>> and napi_get_frags()). They have roughly 69 call sites, which is
>> quite a number.
>>
>> iperf3 showed a nice bump from 910 to 935 Mbit/s while performing
>> UDP VLAN NAT on a 1.2 GHz MIPS board. The boost is likely to be
>> way bigger on more powerful hosts and NICs with tens of Mpps.
>
> What is the latency cost of these bulk allocations, and for TCP traffic,
> on which GRO is the norm?
>
> Adding caches increases the cache footprint when the cache is populated.
>
> I wonder if your iperf3 numbers are simply wrong because of lack of
> GRO in this UDP VLAN NAT case.

Ah, I should've mentioned that I use UDP GRO Fraglists, so these
numbers are for GRO.

My board has been giving the full 1 Gbps (link speed) for TCP for more
than a year, so I can't really rely on TCP passthrough to measure the
gains or regressions.

> We are adding a lot of additional code, and thus icache pressure, that
> iperf3 tests cannot really measure.

Not sure if the MIPS arch can provide enough debug information to
measure icache pressure, but I'll try to check this.

> Most Linux devices simply handle one packet at a time (one packet per
> interrupt).

I disagree here: most modern NICs usually handle thousands of packets
per interrupt thanks to NAPI, hardware interrupt moderation and so on,
at least under high traffic load.

Al



* Re: [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing
  2021-01-12  9:54 ` Edward Cree
@ 2021-01-12 11:08   ` Alexander Lobakin
  2021-01-12 12:23     ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Alexander Lobakin @ 2021-01-12 11:08 UTC (permalink / raw)
  To: Edward Cree
  Cc: Alexander Lobakin, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Edward Cree, Jonathan Lemon, Willem de Bruijn, Miaohe Lin,
	Steffen Klassert, Guillaume Nault, Yadu Kishore, Al Viro, netdev,
	linux-kernel

From: Edward Cree <ecree.xilinx@gmail.com>
Date: Tue, 12 Jan 2021 09:54:04 +0000

> Without wishing to weigh in on whether this caching is a good idea...

Well, we already have a cache to bulk flush "consumed" skbs, although
kmem_cache_free() is generally lighter than kmem_cache_alloc(), and
a page frag cache to allocate skb->head that also bulks the
operations, since it contains a (compound) page with the size of
max(SZ_32K, PAGE_SIZE).
If they didn't give any visible boosts, I don't think they would
have hit mainline.

> Wouldn't it be simpler, rather than having two separate "alloc" and "flush"
>  caches, to have a single larger cache, such that whenever it becomes full
>  we bulk flush the top half, and when it's empty we bulk alloc the bottom
>  half?  That should mean fewer branches, fewer instructions etc. than
>  having to decide which cache to act upon every time.

I thought about a unified cache, but couldn't decide whether to flush
or to allocate heads, and how many to process. Your suggestion answers
these questions and generally seems great. I'll try that one, thanks!

> -ed

Al



* Re: [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing
  2021-01-12 11:08   ` Alexander Lobakin
@ 2021-01-12 12:23     ` Eric Dumazet
  2021-01-13  1:02       ` Jakub Kicinski
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2021-01-12 12:23 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Edward Cree, David S. Miller, Jakub Kicinski, Edward Cree,
	Jonathan Lemon, Willem de Bruijn, Miaohe Lin, Steffen Klassert,
	Guillaume Nault, Yadu Kishore, Al Viro, netdev, LKML

On Tue, Jan 12, 2021 at 12:08 PM Alexander Lobakin <alobakin@pm.me> wrote:
>
> From: Edward Cree <ecree.xilinx@gmail.com>
> Date: Tue, 12 Jan 2021 09:54:04 +0000
>
> > Without wishing to weigh in on whether this caching is a good idea...
>
> Well, we already have a cache to bulk flush "consumed" skbs, although
> kmem_cache_free() is generally lighter than kmem_cache_alloc(), and
> a page frag cache to allocate skb->head that also bulks the
> operations, since it contains a (compound) page with the size of
> max(SZ_32K, PAGE_SIZE).
> If they didn't give any visible boosts, I don't think they would
> have hit mainline.
>
> > Wouldn't it be simpler, rather than having two separate "alloc" and "flush"
> >  caches, to have a single larger cache, such that whenever it becomes full
> >  we bulk flush the top half, and when it's empty we bulk alloc the bottom
> >  half?  That should mean fewer branches, fewer instructions etc. than
> >  having to decide which cache to act upon every time.
>
> I thought about a unified cache, but couldn't decide whether to flush
> or to allocate heads, and how many to process. Your suggestion answers
> these questions and generally seems great. I'll try that one, thanks!
>


The thing is: kmalloc() is supposed to have batches already, and nice
per-cpu caches.

This looks like an mm issue; are we sure we want to work around it?

I would like a full analysis of why SLAB/SLUB does not work well for
your test workload.

More details, more numbers... before we accept yet another
'networking optimization' adding more code to the 'fast' path.

More code means more latency when all of it needs to be brought into
CPU caches.


* Re: [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing
  2021-01-12 10:56   ` Alexander Lobakin
@ 2021-01-12 12:32     ` Eric Dumazet
  2021-01-12 18:26       ` Alexander Lobakin
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2021-01-12 12:32 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: David S. Miller, Jakub Kicinski, Edward Cree, Jonathan Lemon,
	Willem de Bruijn, Miaohe Lin, Steffen Klassert, Guillaume Nault,
	Yadu Kishore, Al Viro, netdev, LKML

On Tue, Jan 12, 2021 at 11:56 AM Alexander Lobakin <alobakin@pm.me> wrote:
>

>
> Ah, I should've mentioned that I use UDP GRO Fraglists, so these
> numbers are for GRO.
>

Right, this suggests UDP GRO fraglist is a pathological case of GRO,
not saving memory.

Real GRO (TCP in most cases) will consume one skb, and have page
fragments for each segment.

Having skbs linked together is not cache friendly.

So I would try first to make this case better, instead of trying to
work around the real issue.


* Re: [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing
  2021-01-12 12:32     ` Eric Dumazet
@ 2021-01-12 18:26       ` Alexander Lobakin
  2021-01-12 19:19         ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Alexander Lobakin @ 2021-01-12 18:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexander Lobakin, David S. Miller, Jakub Kicinski, Edward Cree,
	Jonathan Lemon, Willem de Bruijn, Miaohe Lin, Steffen Klassert,
	Guillaume Nault, Yadu Kishore, Al Viro, netdev, LKML

From: Eric Dumazet <edumazet@google.com>
Date: Tue, 12 Jan 2021 13:32:56 +0100

> On Tue, Jan 12, 2021 at 11:56 AM Alexander Lobakin <alobakin@pm.me> wrote:
>>
>
>>
>> Ah, I should've mentioned that I use UDP GRO Fraglists, so these
>> numbers are for GRO.
>>
>
> Right, this suggests UDP GRO fraglist is a pathological case of GRO,
> not saving memory.
>
> Real GRO (TCP in most cases) will consume one skb, and have page
> fragments for each segment.
>
> Having skbs linked together is not cache friendly.

OK, so I reworked the test setup a bit to clear things up.

I disabled fraglists and the GRO/GSO fraglists support advertisement
in the driver to exclude any "pathological" cases, and switched it
from napi_get_frags() + napi_gro_frags() to napi_alloc_skb() +
napi_gro_receive() to disable local skb reusing (napi_reuse_skb()).
I also enabled GSO UDP L4 (the "classic" one: one skbuff_head + frags)
for forwarding, not only local traffic, and disabled NF flow offload
to increase CPU load and drop performance below link speed so I
could see the changes.

So, the traffic flows looked like:
 - TCP GRO (one head + frags) -> NAT -> hardware TSO;
 - UDP GRO (one head + frags) -> NAT -> driver-side GSO.

Baseline 5.11-rc3:
 - 865 Mbps TCP, 866 Mbps UDP.

This patch (both separate caches and Edward's unified cache):
 - 899 Mbps TCP, 893 Mbps UDP.

So that's clearly *not* only a "pathological" UDP GRO fraglists
"problem", as TCP also got ~35 Mbps from this, as well as
non-fraglisted UDP.

Regarding latencies: I remember there were talks about latencies when
Edward introduced batched GRO (using linked lists to pass skbs from
the GRO layer to the core stack instead of passing them one by one),
so I think it's a perennial question when it comes to batching/caching.

Thanks for the feedback, I will post v2 soon.
The question of whether this caching is reasonable isn't closed anyway,
but I don't see significant "cons" for now.

> So I would try first to make this case better, instead of trying to
> work around the real issue.

Al



* Re: [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing
  2021-01-12 18:26       ` Alexander Lobakin
@ 2021-01-12 19:19         ` Eric Dumazet
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Dumazet @ 2021-01-12 19:19 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: David S. Miller, Jakub Kicinski, Edward Cree, Jonathan Lemon,
	Willem de Bruijn, Miaohe Lin, Steffen Klassert, Guillaume Nault,
	Yadu Kishore, Al Viro, netdev, LKML

On Tue, Jan 12, 2021 at 7:26 PM Alexander Lobakin <alobakin@pm.me> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
> Date: Tue, 12 Jan 2021 13:32:56 +0100
>
> > On Tue, Jan 12, 2021 at 11:56 AM Alexander Lobakin <alobakin@pm.me> wrote:
> >>
> >
> >>
> >> Ah, I should've mentioned that I use UDP GRO Fraglists, so these
> >> numbers are for GRO.
> >>
> >
> > Right, this suggests UDP GRO fraglist is a pathological case of GRO,
> > not saving memory.
> >
> > Real GRO (TCP in most cases) will consume one skb, and have page
> > fragments for each segment.
> >
> > Having skbs linked together is not cache friendly.
>
> OK, so I reworked the test setup a bit to clear things up.
>
> I disabled fraglists and the GRO/GSO fraglists support advertisement
> in the driver to exclude any "pathological" cases, and switched it
> from napi_get_frags() + napi_gro_frags() to napi_alloc_skb() +
> napi_gro_receive() to disable local skb reusing (napi_reuse_skb()).
> I also enabled GSO UDP L4 (the "classic" one: one skbuff_head + frags)
> for forwarding, not only local traffic, and disabled NF flow offload
> to increase CPU load and drop performance below link speed so I
> could see the changes.
>
> So, the traffic flows looked like:
>  - TCP GRO (one head + frags) -> NAT -> hardware TSO;
>  - UDP GRO (one head + frags) -> NAT -> driver-side GSO.
>
> Baseline 5.11-rc3:
>  - 865 Mbps TCP, 866 Mbps UDP.
>
> This patch (both separate caches and Edward's unified cache):
>  - 899 Mbps TCP, 893 Mbps UDP.
>
> So that's clearly *not* only a "pathological" UDP GRO fraglists
> "problem", as TCP also got ~35 Mbps from this, as well as
> non-fraglisted UDP.
>
> Regarding latencies: I remember there were talks about latencies when
> Edward introduced batched GRO (using linked lists to pass skbs from
> the GRO layer to the core stack instead of passing them one by one),
> so I think it's a perennial question when it comes to batching/caching.
>
> Thanks for the feedback, I will post v2 soon.
> The question of whether this caching is reasonable isn't closed anyway,
> but I don't see significant "cons" for now.
>

Also it would be nice to have KASAN support.

We do not want to recycle stuff unconditionally, since this might
hide use-after-free bugs.
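
Something along these lines could keep KASAN useful here (a rough,
untested sketch, assuming the skb_cache layout from patch 4):

	/* when stashing a head in the percpu cache for reuse */
	kasan_poison_object_data(skbuff_head_cache, skb);
	nc->skb_cache[nc->skb_count++] = skb;

	...

	/* when handing a cached head out again */
	skb = nc->skb_cache[--nc->skb_count];
	kasan_unpoison_object_data(skbuff_head_cache, skb);

So any touch of a recycled skbuff_head while it sits in the cache
would still be reported.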


* Re: [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing
  2021-01-12 12:23     ` Eric Dumazet
@ 2021-01-13  1:02       ` Jakub Kicinski
  2021-01-13  4:46         ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Jakub Kicinski @ 2021-01-13  1:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexander Lobakin, Edward Cree, David S. Miller, Edward Cree,
	Jonathan Lemon, Willem de Bruijn, Miaohe Lin, Steffen Klassert,
	Guillaume Nault, Yadu Kishore, Al Viro, netdev, LKML

On Tue, 12 Jan 2021 13:23:16 +0100 Eric Dumazet wrote:
> On Tue, Jan 12, 2021 at 12:08 PM Alexander Lobakin <alobakin@pm.me> wrote:
> >
> > From: Edward Cree <ecree.xilinx@gmail.com>
> > Date: Tue, 12 Jan 2021 09:54:04 +0000
> >  
> > > Without wishing to weigh in on whether this caching is a good idea...  
> >
> > Well, we already have a cache to bulk flush "consumed" skbs, although
> > kmem_cache_free() is generally lighter than kmem_cache_alloc(), and
> > a page frag cache to allocate skb->head that also bulks the
> > operations, since it contains a (compound) page with the size of
> > max(SZ_32K, PAGE_SIZE).
> > If they didn't give any visible boosts, I don't think they would
> > have hit mainline.
> >  
> > > Wouldn't it be simpler, rather than having two separate "alloc" and "flush"
> > >  caches, to have a single larger cache, such that whenever it becomes full
> > >  we bulk flush the top half, and when it's empty we bulk alloc the bottom
> > >  half?  That should mean fewer branches, fewer instructions etc. than
> > >  having to decide which cache to act upon every time.  
> >
> > I thought about a unified cache, but couldn't decide whether to flush
> > or to allocate heads, and how many to process. Your suggestion answers
> > these questions and generally seems great. I'll try that one, thanks!
>  
> The thing is: kmalloc() is supposed to have batches already, and nice
> per-cpu caches.
>
> This looks like an mm issue; are we sure we want to work around it?
> 
> I would like a full analysis of why SLAB/SLUB does not work well for
> your test workload.

+1, it does feel like we're getting into mm territory

> More details, more numbers... before we accept yet another
> 'networking optimization' adding more code to the 'fast' path.
>
> More code means more latency when all of it needs to be brought into
> CPU caches.



* Re: [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing
  2021-01-13  1:02       ` Jakub Kicinski
@ 2021-01-13  4:46         ` Eric Dumazet
  2021-01-13 17:03           ` Jakub Kicinski
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2021-01-13  4:46 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexander Lobakin, Edward Cree, David S. Miller, Edward Cree,
	Jonathan Lemon, Willem de Bruijn, Miaohe Lin, Steffen Klassert,
	Guillaume Nault, Yadu Kishore, Al Viro, netdev, LKML

On Wed, Jan 13, 2021 at 2:02 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 12 Jan 2021 13:23:16 +0100 Eric Dumazet wrote:
> > On Tue, Jan 12, 2021 at 12:08 PM Alexander Lobakin <alobakin@pm.me> wrote:
> > >
> > > From: Edward Cree <ecree.xilinx@gmail.com>
> > > Date: Tue, 12 Jan 2021 09:54:04 +0000
> > >
> > > > Without wishing to weigh in on whether this caching is a good idea...
> > >
> > > Well, we already have a cache to bulk flush "consumed" skbs, although
> > > kmem_cache_free() is generally lighter than kmem_cache_alloc(), and
> > > a page frag cache to allocate skb->head that also bulks the
> > > operations, since it contains a (compound) page with the size of
> > > max(SZ_32K, PAGE_SIZE).
> > > If they didn't give any visible boosts, I don't think they would
> > > have hit mainline.
> > >
> > > > Wouldn't it be simpler, rather than having two separate "alloc" and "flush"
> > > >  caches, to have a single larger cache, such that whenever it becomes full
> > > >  we bulk flush the top half, and when it's empty we bulk alloc the bottom
> > > >  half?  That should mean fewer branches, fewer instructions etc. than
> > > >  having to decide which cache to act upon every time.
> > >
> > > I thought about a unified cache, but couldn't decide whether to flush
> > > or to allocate heads, and how many to process. Your suggestion answers
> > > these questions and generally seems great. I'll try that one, thanks!
> >
> > The thing is: kmalloc() is supposed to have batches already, and nice
> > per-cpu caches.
> >
> > This looks like an mm issue; are we sure we want to work around it?
> >
> > I would like a full analysis of why SLAB/SLUB does not work well for
> > your test workload.
>
> +1, it does feel like we're getting into mm territory

I read the existing code, and with Edward Cree's idea of reusing the
existing cache (storage of pointers), this now all makes sense, since
there will not be much added code (and new storage of 64 pointers).

The remaining issue is to make sure KASAN will still work; we need
this to detect old and new bugs.

Thanks !


* Re: [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing
  2021-01-13  4:46         ` Eric Dumazet
@ 2021-01-13 17:03           ` Jakub Kicinski
  2021-01-13 17:15             ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Jakub Kicinski @ 2021-01-13 17:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexander Lobakin, Edward Cree, David S. Miller, Edward Cree,
	Jonathan Lemon, Willem de Bruijn, Miaohe Lin, Steffen Klassert,
	Guillaume Nault, Yadu Kishore, Al Viro, netdev, LKML

On Wed, 13 Jan 2021 05:46:05 +0100 Eric Dumazet wrote:
> On Wed, Jan 13, 2021 at 2:02 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Tue, 12 Jan 2021 13:23:16 +0100 Eric Dumazet wrote:  
> > > On Tue, Jan 12, 2021 at 12:08 PM Alexander Lobakin <alobakin@pm.me> wrote:  
> > > >
> > > > From: Edward Cree <ecree.xilinx@gmail.com>
> > > > Date: Tue, 12 Jan 2021 09:54:04 +0000
> > > >  
> > > > > Without wishing to weigh in on whether this caching is a good idea...  
> > > >
> > > > Well, we already have a cache to bulk flush "consumed" skbs, although
> > > > kmem_cache_free() is generally lighter than kmem_cache_alloc(), and
> > > > a page frag cache to allocate skb->head that also bulks the
> > > > operations, since it contains a (compound) page with the size of
> > > > max(SZ_32K, PAGE_SIZE).
> > > > If they didn't give any visible boosts, I don't think they would
> > > > have hit mainline.
> > > >  
> > > > > Wouldn't it be simpler, rather than having two separate "alloc" and "flush"
> > > > >  caches, to have a single larger cache, such that whenever it becomes full
> > > > >  we bulk flush the top half, and when it's empty we bulk alloc the bottom
> > > > >  half?  That should mean fewer branches, fewer instructions etc. than
> > > > >  having to decide which cache to act upon every time.  
> > > >
> > > > I thought about a unified cache, but couldn't decide whether to flush
> > > > or to allocate heads, and how many to process. Your suggestion answers
> > > > these questions and generally seems great. I'll try that one, thanks!
> > >
> > > The thing is: kmalloc() is supposed to have batches already, and nice
> > > per-cpu caches.
> > >
> > > This looks like an mm issue; are we sure we want to work around it?
> > >
> > > I would like a full analysis of why SLAB/SLUB does not work well for
> > > your test workload.  
> >
> > +1, it does feel like we're getting into mm territory  
> 
> I read the existing code, and with Edward Cree's idea of reusing the
> existing cache (storage of pointers), this now all makes sense, since
> there will not be much added code (and new storage of 64 pointers).
> 
> The remaining issue is to make sure KASAN will still work; we need
> this to detect old and new bugs.

IDK much about MM, but we already have a kmem_cache for skbs and now
we're building a cache on top of a cache.  Shouldn't MM take care of
providing a per-CPU BH-only lockless cache?


* Re: [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing
  2021-01-13 17:03           ` Jakub Kicinski
@ 2021-01-13 17:15             ` Eric Dumazet
  2021-01-13 18:12               ` Jakub Kicinski
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2021-01-13 17:15 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexander Lobakin, Edward Cree, David S. Miller, Edward Cree,
	Jonathan Lemon, Willem de Bruijn, Miaohe Lin, Steffen Klassert,
	Guillaume Nault, Yadu Kishore, Al Viro, netdev, LKML

On Wed, Jan 13, 2021 at 6:03 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 13 Jan 2021 05:46:05 +0100 Eric Dumazet wrote:
> > On Wed, Jan 13, 2021 at 2:02 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Tue, 12 Jan 2021 13:23:16 +0100 Eric Dumazet wrote:
> > > > On Tue, Jan 12, 2021 at 12:08 PM Alexander Lobakin <alobakin@pm.me> wrote:
> > > > >
> > > > > From: Edward Cree <ecree.xilinx@gmail.com>
> > > > > Date: Tue, 12 Jan 2021 09:54:04 +0000
> > > > >
> > > > > > Without wishing to weigh in on whether this caching is a good idea...
> > > > >
> > > > > Well, we already have a cache to bulk flush "consumed" skbs, although
> > > > > kmem_cache_free() is generally lighter than kmem_cache_alloc(), and
> > > > > a page frag cache to allocate skb->head that also bulks the
> > > > > operations, since it contains a (compound) page with the size of
> > > > > max(SZ_32K, PAGE_SIZE).
> > > > > If they didn't give any visible boosts, I don't think they would
> > > > > have hit mainline.
> > > > >
> > > > > > Wouldn't it be simpler, rather than having two separate "alloc" and "flush"
> > > > > >  caches, to have a single larger cache, such that whenever it becomes full
> > > > > >  we bulk flush the top half, and when it's empty we bulk alloc the bottom
> > > > > >  half?  That should mean fewer branches, fewer instructions etc. than
> > > > > >  having to decide which cache to act upon every time.
> > > > >
> > > > > I thought about a unified cache, but couldn't decide whether to flush
> > > > > or to allocate heads, and how many to process. Your suggestion answers
> > > > > these questions and generally seems great. I'll try that one, thanks!
> > > >
> > > > The thing is: kmalloc() is supposed to have batches already, and nice
> > > > per-cpu caches.
> > > >
> > > > This looks like an mm issue; are we sure we want to work around it?
> > > >
> > > > I would like a full analysis of why SLAB/SLUB does not work well for
> > > > your test workload.
> > >
> > > +1, it does feel like we're getting into mm territory
> >
> > I read the existing code, and with Edward Cree's idea of reusing the
> > existing cache (storage of pointers), this now all makes sense, since
> > there will not be much added code (and new storage of 64 pointers).
> >
> > The remaining issue is to make sure KASAN will still work; we need
> > this to detect old and new bugs.
>
> IDK much about MM, but we already have a kmem_cache for skbs and now
> we're building a cache on top of a cache.  Shouldn't MM take care of
> providing a per-CPU BH-only lockless cache?

I think part of the improvement comes from the bulk operations, which
are provided by the mm layer.

I also note Alexander made no provision for NUMA awareness.
Reusing an skb located on a remote node will probably not be ideal.
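
Something like the check below could gate the reuse (a rough sketch,
not from the series):

	/* only recycle skbuff_heads that live on the local node */
	if (page_to_nid(virt_to_page(skb)) == numa_mem_id())
		nc->skb_cache[nc->skb_count++] = skb;
	else
		kmem_cache_free(skbuff_head_cache, skb);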


* Re: [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing
  2021-01-13 17:15             ` Eric Dumazet
@ 2021-01-13 18:12               ` Jakub Kicinski
  0 siblings, 0 replies; 21+ messages in thread
From: Jakub Kicinski @ 2021-01-13 18:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexander Lobakin, Edward Cree, David S. Miller, Edward Cree,
	Jonathan Lemon, Willem de Bruijn, Miaohe Lin, Steffen Klassert,
	Guillaume Nault, Yadu Kishore, Al Viro, netdev, LKML

On Wed, 13 Jan 2021 18:15:20 +0100 Eric Dumazet wrote:
> > IDK much about MM, but we already have a kmem_cache for skbs and now
> > we're building a cache on top of a cache.  Shouldn't MM take care of
> > providing a per-CPU BH-only lockless cache?  
> 
> I think part of the improvement comes from the bulk operations, which
> are provided by the mm layer.
>
> I also note Alexander made no provision for NUMA awareness.
> Reusing an skb located on a remote node will probably not be ideal.

I was wondering about that yesterday, but couldn't really think
of a legitimate reason not to have XPS set up right. Do you have a
particular config in mind, or are we talking "default config"?

Also can't the skb _itself_ be pfmemalloc?

My main point is that I'm wondering whether this sort of cache would
be useful when allocating skbs for sockets. Assuming that the network
stack is not isolated to its own cores, won't fronting alloc_skb()
with

	bh_disable() 
	try the cache
	bh_enable()

potentially help? In that sense fronting kmem_cache would feel cleaner
than our own little ring buffer.
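
I.e. roughly this, with a hypothetical helper name just to illustrate
(napi_skb_cache_get() here stands for whatever pops a head from the
per-CPU cache):

	struct sk_buff *skb = NULL;

	local_bh_disable();
	/* try the per-CPU NAPI cache first */
	skb = napi_skb_cache_get();	/* may return NULL when empty */
	local_bh_enable();

	/* fall back to the regular slab path */
	if (!skb)
		skb = kmem_cache_alloc(skbuff_head_cache, gfp_mask);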


Thread overview: 21+ messages
2021-01-11 18:27 [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing Alexander Lobakin
2021-01-11 18:28 ` [PATCH net-next 1/5] skbuff: rename fields of struct napi_alloc_cache to be more intuitive Alexander Lobakin
2021-01-11 18:28   ` [PATCH net-next 2/5] skbuff: open-code __build_skb() inside __napi_alloc_skb() Alexander Lobakin
2021-01-11 18:29   ` [PATCH net-next 3/5] skbuff: reuse skbuff_heads from flush_skb_cache if available Alexander Lobakin
2021-01-11 18:29   ` [PATCH net-next 4/5] skbuff: allocate skbuff_heads by bulks instead of one by one Alexander Lobakin
2021-01-11 18:29   ` [PATCH net-next 5/5] skbuff: refill skb_cache early from deferred-to-consume entries Alexander Lobakin
2021-01-11 18:49   ` [PATCH net-next 1/5] skbuff: rename fields of struct napi_alloc_cache to be more intuitive Jonathan Lemon
2021-01-11 21:03     ` Alexander Lobakin
2021-01-12  8:20 ` [PATCH net-next 0/5] skbuff: introduce skbuff_heads bulking and reusing Eric Dumazet
2021-01-12 10:56   ` Alexander Lobakin
2021-01-12 12:32     ` Eric Dumazet
2021-01-12 18:26       ` Alexander Lobakin
2021-01-12 19:19         ` Eric Dumazet
2021-01-12  9:54 ` Edward Cree
2021-01-12 11:08   ` Alexander Lobakin
2021-01-12 12:23     ` Eric Dumazet
2021-01-13  1:02       ` Jakub Kicinski
2021-01-13  4:46         ` Eric Dumazet
2021-01-13 17:03           ` Jakub Kicinski
2021-01-13 17:15             ` Eric Dumazet
2021-01-13 18:12               ` Jakub Kicinski
