netdev.vger.kernel.org archive mirror
* [PATCH net-next 0/2] net: use kmem_cache_free_bulk in kfree_skb_list
@ 2023-01-05 15:42 Jesper Dangaard Brouer
  2023-01-05 15:42 ` [PATCH net-next 1/2] net: fix call location in kfree_skb_list_reason Jesper Dangaard Brouer
  2023-01-05 15:42 ` [PATCH net-next 2/2] net: kfree_skb_list use kmem_cache_free_bulk Jesper Dangaard Brouer
  0 siblings, 2 replies; 13+ messages in thread
From: Jesper Dangaard Brouer @ 2023-01-05 15:42 UTC (permalink / raw)
  To: netdev
  Cc: Jesper Dangaard Brouer, Jakub Kicinski, David S. Miller,
	edumazet, pabeni

The kfree_skb_list function walks the SKB list (via skb->next) and frees
the SKBs individually to the SLUB/SLAB allocator (kmem_cache). It is more
efficient to bulk free them via the kmem_cache_free_bulk API.

Netstack NAPI fastpath already uses kmem_cache bulk alloc and free
APIs for SKBs.
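
For reference, the bulk-free API in question (declared in
include/linux/slab.h; quoted from memory, so double-check against the
tree) looks like:

	void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p);

It frees 'size' objects from the pointer array 'p' back to the cache 's'
in a single call, amortizing the per-object freeing overhead.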

The kfree_skb_list call got an interesting optimization in commit
520ac30f4551 ("net_sched: drop packets after root qdisc lock is
released") that can create a list of SKBs "to_free", e.g. when qdisc
enqueue fails or deliberately chooses to drop. It isn't a normal data
fastpath, but the situation is likely to occur when the system/qdisc is
under heavy workload, thus it makes sense to use a faster API for
freeing the SKBs.

E.g. the (often distro default) qdisc fq_codel will drop batches of
packets from the fattest elephant flow, capped at 64 packets by default
(but adjustable via the tc argument drop_batch).

---

Jesper Dangaard Brouer (2):
      net: fix call location in kfree_skb_list_reason
      net: kfree_skb_list use kmem_cache_free_bulk


 net/core/skbuff.c | 67 +++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 56 insertions(+), 11 deletions(-)

--


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net-next 1/2] net: fix call location in kfree_skb_list_reason
  2023-01-05 15:42 [PATCH net-next 0/2] net: use kmem_cache_free_bulk in kfree_skb_list Jesper Dangaard Brouer
@ 2023-01-05 15:42 ` Jesper Dangaard Brouer
  2023-01-06 19:54   ` Saeed Mahameed
  2023-01-05 15:42 ` [PATCH net-next 2/2] net: kfree_skb_list use kmem_cache_free_bulk Jesper Dangaard Brouer
  1 sibling, 1 reply; 13+ messages in thread
From: Jesper Dangaard Brouer @ 2023-01-05 15:42 UTC (permalink / raw)
  To: netdev
  Cc: Jesper Dangaard Brouer, Jakub Kicinski, David S. Miller,
	edumazet, pabeni

The SKB drop reason uses __builtin_return_address(0) to give the call
"location" to trace_kfree_skb() tracepoint skb:kfree_skb.

To keep this stable across compilers, kfree_skb_reason() is annotated
with __fix_address (noinline __noclone), as fixed in commit c205cc7534a9
("net: skb: prevent the split of kfree_skb_reason() by gcc").

The function kfree_skb_list_reason() invokes kfree_skb_reason(), which
causes the __builtin_return_address(0) "location" to report the
unexpected address of kfree_skb_list_reason.

Example output from 'perf script':
 kpktgend_0  1337 [000]    81.002597: skb:kfree_skb: skbaddr=0xffff888144824700 protocol=2048 location=kfree_skb_list_reason+0x1e reason: QDISC_DROP

This patch creates an __always_inline __kfree_skb_reason() helper that
is called from both kfree_skb_reason() and kfree_skb_list_reason().
Suggestions for solutions that share code better are welcome.

As preparation for the next patch, the __kfree_skb() invocation is moved
out of this helper function.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/skbuff.c |   34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4a0eb5593275..007a5fbe284b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -932,6 +932,21 @@ void __kfree_skb(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(__kfree_skb);
 
+static __always_inline
+bool __kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason)
+{
+	if (unlikely(!skb_unref(skb)))
+		return false;
+
+	DEBUG_NET_WARN_ON_ONCE(reason <= 0 || reason >= SKB_DROP_REASON_MAX);
+
+	if (reason == SKB_CONSUMED)
+		trace_consume_skb(skb);
+	else
+		trace_kfree_skb(skb, __builtin_return_address(0), reason);
+	return true;
+}
+
 /**
  *	kfree_skb_reason - free an sk_buff with special reason
  *	@skb: buffer to free
@@ -944,26 +959,19 @@ EXPORT_SYMBOL(__kfree_skb);
 void __fix_address
 kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason)
 {
-	if (unlikely(!skb_unref(skb)))
-		return;
-
-	DEBUG_NET_WARN_ON_ONCE(reason <= 0 || reason >= SKB_DROP_REASON_MAX);
-
-	if (reason == SKB_CONSUMED)
-		trace_consume_skb(skb);
-	else
-		trace_kfree_skb(skb, __builtin_return_address(0), reason);
-	__kfree_skb(skb);
+	if (__kfree_skb_reason(skb, reason))
+		__kfree_skb(skb);
 }
 EXPORT_SYMBOL(kfree_skb_reason);
 
-void kfree_skb_list_reason(struct sk_buff *segs,
-			   enum skb_drop_reason reason)
+void __fix_address
+kfree_skb_list_reason(struct sk_buff *segs, enum skb_drop_reason reason)
 {
 	while (segs) {
 		struct sk_buff *next = segs->next;
 
-		kfree_skb_reason(segs, reason);
+		if (__kfree_skb_reason(segs, reason))
+			__kfree_skb(segs);
 		segs = next;
 	}
 }



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 2/2] net: kfree_skb_list use kmem_cache_free_bulk
  2023-01-05 15:42 [PATCH net-next 0/2] net: use kmem_cache_free_bulk in kfree_skb_list Jesper Dangaard Brouer
  2023-01-05 15:42 ` [PATCH net-next 1/2] net: fix call location in kfree_skb_list_reason Jesper Dangaard Brouer
@ 2023-01-05 15:42 ` Jesper Dangaard Brouer
  2023-01-06 20:09   ` Saeed Mahameed
  2023-01-06 22:33   ` Jakub Kicinski
  1 sibling, 2 replies; 13+ messages in thread
From: Jesper Dangaard Brouer @ 2023-01-05 15:42 UTC (permalink / raw)
  To: netdev
  Cc: Jesper Dangaard Brouer, Jakub Kicinski, David S. Miller,
	edumazet, pabeni

The kfree_skb_list function walks the SKB list (via skb->next) and frees
the SKBs individually to the SLUB/SLAB allocator (kmem_cache). It is more
efficient to bulk free them via the kmem_cache_free_bulk API.

This patch creates a stack-local array of SKBs to bulk free while
walking the list. The bulk array size is limited to 16 SKBs to trade off
stack usage against efficiency. The SLUB kmem_cache "skbuff_head_cache"
uses an object size of 256 bytes, usually in an order-1 page of 8192
bytes, which gives 32 objects per slab (this can vary across
architectures and due to SLUB sharing). Thus, for SLUB the optimal
bulk-free case is 32 objects belonging to the same slab, but at runtime
this is unlikely to occur.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/skbuff.c |   39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 007a5fbe284b..e6fa667174d5 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -964,16 +964,53 @@ kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason)
 }
 EXPORT_SYMBOL(kfree_skb_reason);
 
+#define KFREE_SKB_BULK_SIZE	16
+
+struct skb_free_array {
+	unsigned int skb_count;
+	void *skb_array[KFREE_SKB_BULK_SIZE];
+};
+
+static void kfree_skb_defer_local(struct sk_buff *skb,
+				  struct skb_free_array *sa,
+				  enum skb_drop_reason reason)
+{
+	/* if SKB is a clone, don't handle this case */
+	if (unlikely(skb->fclone != SKB_FCLONE_UNAVAILABLE)) {
+		__kfree_skb(skb);
+		return;
+	}
+
+	skb_release_all(skb, reason);
+	sa->skb_array[sa->skb_count++] = skb;
+
+	if (unlikely(sa->skb_count == KFREE_SKB_BULK_SIZE)) {
+		kmem_cache_free_bulk(skbuff_head_cache, KFREE_SKB_BULK_SIZE,
+				     sa->skb_array);
+		sa->skb_count = 0;
+	}
+}
+
 void __fix_address
 kfree_skb_list_reason(struct sk_buff *segs, enum skb_drop_reason reason)
 {
+	struct skb_free_array sa;
+	sa.skb_count = 0;
+
 	while (segs) {
 		struct sk_buff *next = segs->next;
 
+		skb_mark_not_on_list(segs);
+
 		if (__kfree_skb_reason(segs, reason))
-			__kfree_skb(segs);
+			kfree_skb_defer_local(segs, &sa, reason);
+
 		segs = next;
 	}
+
+	if (sa.skb_count)
+		kmem_cache_free_bulk(skbuff_head_cache, sa.skb_count,
+				     sa.skb_array);
 }
 EXPORT_SYMBOL(kfree_skb_list_reason);
 



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 1/2] net: fix call location in kfree_skb_list_reason
  2023-01-05 15:42 ` [PATCH net-next 1/2] net: fix call location in kfree_skb_list_reason Jesper Dangaard Brouer
@ 2023-01-06 19:54   ` Saeed Mahameed
  2023-01-06 19:57     ` Saeed Mahameed
  0 siblings, 1 reply; 13+ messages in thread
From: Saeed Mahameed @ 2023-01-06 19:54 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Jakub Kicinski, David S. Miller, edumazet, pabeni

On 05 Jan 16:42, Jesper Dangaard Brouer wrote:
>The SKB drop reason uses __builtin_return_address(0) to give the call
>"location" to trace_kfree_skb() tracepoint skb:kfree_skb.
>
>To keep this stable for compilers kfree_skb_reason() is annotated with
>__fix_address (noinline __noclone) as fixed in commit c205cc7534a9
>("net: skb: prevent the split of kfree_skb_reason() by gcc").
>
>The function kfree_skb_list_reason() invoke kfree_skb_reason(), which
>cause the __builtin_return_address(0) "location" to report the
>unexpected address of kfree_skb_list_reason.
>
>Example output from 'perf script':
> kpktgend_0  1337 [000]    81.002597: skb:kfree_skb: skbaddr=0xffff888144824700 protocol=2048 location=kfree_skb_list_reason+0x1e reason: QDISC_DROP
>
>Patch creates an __always_inline __kfree_skb_reason() helper call that
>is called from both kfree_skb_list() and kfree_skb_list_reason().
>Suggestions for solutions that shares code better are welcome.
>
>As preparation for next patch move __kfree_skb() invocation out of
>this helper function.
>
>Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>---
> net/core/skbuff.c |   34 +++++++++++++++++++++-------------
> 1 file changed, 21 insertions(+), 13 deletions(-)
>
>diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>index 4a0eb5593275..007a5fbe284b 100644
>--- a/net/core/skbuff.c
>+++ b/net/core/skbuff.c
>@@ -932,6 +932,21 @@ void __kfree_skb(struct sk_buff *skb)
> }
> EXPORT_SYMBOL(__kfree_skb);
>
>+static __always_inline
>+bool __kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason)
>+{
>+	if (unlikely(!skb_unref(skb)))
>+		return false;
>+
>+	DEBUG_NET_WARN_ON_ONCE(reason <= 0 || reason >= SKB_DROP_REASON_MAX);
>+
>+	if (reason == SKB_CONSUMED)
>+		trace_consume_skb(skb);
>+	else
>+		trace_kfree_skb(skb, __builtin_return_address(0), reason);
>+	return true;

why not just call __kfree_skb(skb); here instead of the boolean return?
If it's because __kfree_skb() makes a call to
skb_release_all()->..->kfree_skb_list_reason(),
then it's already too deep and the return address in that case isn't
predictable, so you're not avoiding any breakage by keeping
direct calls to __kfree_skb() from kfree_skb_reason and kfree_skb_list_reason.

>+}
>+
> /**
>  *	kfree_skb_reason - free an sk_buff with special reason
>  *	@skb: buffer to free
>@@ -944,26 +959,19 @@ EXPORT_SYMBOL(__kfree_skb);
> void __fix_address
> kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason)
> {
>-	if (unlikely(!skb_unref(skb)))
>-		return;
>-
>-	DEBUG_NET_WARN_ON_ONCE(reason <= 0 || reason >= SKB_DROP_REASON_MAX);
>-
>-	if (reason == SKB_CONSUMED)
>-		trace_consume_skb(skb);
>-	else
>-		trace_kfree_skb(skb, __builtin_return_address(0), reason);
>-	__kfree_skb(skb);
>+	if (__kfree_skb_reason(skb, reason))
>+		__kfree_skb(skb);
> }
> EXPORT_SYMBOL(kfree_skb_reason);
>
>-void kfree_skb_list_reason(struct sk_buff *segs,
>-			   enum skb_drop_reason reason)
>+void __fix_address
>+kfree_skb_list_reason(struct sk_buff *segs, enum skb_drop_reason reason)
> {
> 	while (segs) {
> 		struct sk_buff *next = segs->next;
>
>-		kfree_skb_reason(segs, reason);
>+		if (__kfree_skb_reason(segs, reason))
>+			__kfree_skb(segs);
> 		segs = next;
> 	}
> }
>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 1/2] net: fix call location in kfree_skb_list_reason
  2023-01-06 19:54   ` Saeed Mahameed
@ 2023-01-06 19:57     ` Saeed Mahameed
  0 siblings, 0 replies; 13+ messages in thread
From: Saeed Mahameed @ 2023-01-06 19:57 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Jakub Kicinski, David S. Miller, edumazet, pabeni

On 06 Jan 11:54, Saeed Mahameed wrote:
>On 05 Jan 16:42, Jesper Dangaard Brouer wrote:
>>The SKB drop reason uses __builtin_return_address(0) to give the call
>>"location" to trace_kfree_skb() tracepoint skb:kfree_skb.
>>
>>To keep this stable for compilers kfree_skb_reason() is annotated with
>>__fix_address (noinline __noclone) as fixed in commit c205cc7534a9
>>("net: skb: prevent the split of kfree_skb_reason() by gcc").
>>
>>The function kfree_skb_list_reason() invoke kfree_skb_reason(), which
>>cause the __builtin_return_address(0) "location" to report the
>>unexpected address of kfree_skb_list_reason.
>>
>>Example output from 'perf script':
>>kpktgend_0  1337 [000]    81.002597: skb:kfree_skb: skbaddr=0xffff888144824700 protocol=2048 location=kfree_skb_list_reason+0x1e reason: QDISC_DROP
>>
>>Patch creates an __always_inline __kfree_skb_reason() helper call that
>>is called from both kfree_skb_list() and kfree_skb_list_reason().
>>Suggestions for solutions that shares code better are welcome.
>>
>>As preparation for next patch move __kfree_skb() invocation out of
>>this helper function.
>>
>>Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>>---
>>net/core/skbuff.c |   34 +++++++++++++++++++++-------------
>>1 file changed, 21 insertions(+), 13 deletions(-)
>>
>>diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>index 4a0eb5593275..007a5fbe284b 100644
>>--- a/net/core/skbuff.c
>>+++ b/net/core/skbuff.c
>>@@ -932,6 +932,21 @@ void __kfree_skb(struct sk_buff *skb)
>>}
>>EXPORT_SYMBOL(__kfree_skb);
>>
>>+static __always_inline
>>+bool __kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason)
>>+{
>>+	if (unlikely(!skb_unref(skb)))
>>+		return false;
>>+
>>+	DEBUG_NET_WARN_ON_ONCE(reason <= 0 || reason >= SKB_DROP_REASON_MAX);
>>+
>>+	if (reason == SKB_CONSUMED)
>>+		trace_consume_skb(skb);
>>+	else
>>+		trace_kfree_skb(skb, __builtin_return_address(0), reason);
>>+	return true;
>
>why not just call __kfree_skb(skb); here instead of the boolean return 
>? if it because __kfree_skb() makes a call to
>skb_release_all()->..->kfree_skb_list_reason()
>then it's already too deep and the return address in that case isn't
>predictable, so you're not avoiding any breakage by keeping
>direct calls to __kfree_skb() from kfree_skb_reason and+kfree_skb_list_reason
>

I see now, it's because of the next patch, you will use a different
deallocator in kfree_skb_list_reason.

Reviewed-by: Saeed Mahameed <saeed@kernel.org>



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 2/2] net: kfree_skb_list use kmem_cache_free_bulk
  2023-01-05 15:42 ` [PATCH net-next 2/2] net: kfree_skb_list use kmem_cache_free_bulk Jesper Dangaard Brouer
@ 2023-01-06 20:09   ` Saeed Mahameed
  2023-01-06 22:33   ` Jakub Kicinski
  1 sibling, 0 replies; 13+ messages in thread
From: Saeed Mahameed @ 2023-01-06 20:09 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Jakub Kicinski, David S. Miller, edumazet, pabeni

On 05 Jan 16:42, Jesper Dangaard Brouer wrote:
>The kfree_skb_list function walks SKB (via skb->next) and frees them
>individually to the SLUB/SLAB allocator (kmem_cache). It is more
>efficient to bulk free them via the kmem_cache_free_bulk API.
>
>This patches create a stack local array with SKBs to bulk free while
>walking the list. Bulk array size is limited to 16 SKBs to trade off
>stack usage and efficiency. The SLUB kmem_cache "skbuff_head_cache"
>uses objsize 256 bytes usually in an order-1 page 8192 bytes that is
>32 objects per slab (can vary on archs and due to SLUB sharing). Thus,
>for SLUB the optimal bulk free case is 32 objects belonging to same
>slab, but runtime this isn't likely to occur.
>
>Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

any performance numbers ? 

LGTM,
Reviewed-by: Saeed Mahameed <saeed@kernel.org>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 2/2] net: kfree_skb_list use kmem_cache_free_bulk
  2023-01-05 15:42 ` [PATCH net-next 2/2] net: kfree_skb_list use kmem_cache_free_bulk Jesper Dangaard Brouer
  2023-01-06 20:09   ` Saeed Mahameed
@ 2023-01-06 22:33   ` Jakub Kicinski
  2023-01-09 12:24     ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2023-01-06 22:33 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: netdev, David S. Miller, edumazet, pabeni

Hi!

Would it not be better to try to actually defer them (queue to 
the deferred free list and try to ship back to the NAPI cache of 
the allocating core)? Is the spin lock on the defer list problematic
for forwarding cases (which I'm assuming is your target)?

Also the lack of perf numbers is a bit of a red flag.

On Thu, 05 Jan 2023 16:42:47 +0100 Jesper Dangaard Brouer wrote:
> +static void kfree_skb_defer_local(struct sk_buff *skb,
> +				  struct skb_free_array *sa,
> +				  enum skb_drop_reason reason)

If we wanna keep the implementation as is - I think we should rename
the thing to say "bulk" rather than "defer" to avoid confusion with 
the TCP's "defer to allocating core" scheme..

kfree_skb_list_bulk() ?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 2/2] net: kfree_skb_list use kmem_cache_free_bulk
  2023-01-06 22:33   ` Jakub Kicinski
@ 2023-01-09 12:24     ` Jesper Dangaard Brouer
  2023-01-09 19:34       ` Jakub Kicinski
  0 siblings, 1 reply; 13+ messages in thread
From: Jesper Dangaard Brouer @ 2023-01-09 12:24 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: brouer, netdev, David S. Miller, edumazet, pabeni


On 06/01/2023 23.33, Jakub Kicinski wrote:
> Hi!
> 
> Would it not be better to try to actually defer them (queue to
> the deferred free list and try to ship back to the NAPI cache of
> the allocating core)? 
> Is the spin lock on the defer list problematic
> for fowarding cases (which I'm assuming your target)?

We might be talking past each other.  The NAPI cache for me
is the per-CPU napi_alloc_cache (this_cpu_ptr(&napi_alloc_cache)).

This napi_alloc_cache doesn't use a spin_lock, but depends on being
protected by the NAPI context.  The code in this patch closely resembles
how the napi_alloc_cache works.  See code: napi_consume_skb() and
__kfree_skb_defer().
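
For readers following along, that per-CPU cache looks roughly like this
(paraphrased from net/core/skbuff.c of this kernel generation; treat the
exact field names and cache size as a sketch from memory):

	struct napi_alloc_cache {
		struct page_frag_cache page;
		unsigned int skb_count;
		void *skb_cache[NAPI_SKB_CACHE_SIZE];	/* 64 entries */
	};

napi_consume_skb()/__kfree_skb_defer() push SKB heads into skb_cache and,
when it fills up, flush a batch with kmem_cache_free_bulk(), which is the
same pattern this patch applies, just with a stack-local array instead of
the per-CPU one.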


> Also the lack of perf numbers is a bit of a red flag.
>

I have run performance tests, but as I tried to explain in the
cover letter, for the qdisc use-case this code path is only activated
when we have overflow at enqueue.  Thus, this doesn't translate directly
into performance numbers, as the TX-qdisc is 100% full because the
hardware device is backed up, and this patch makes us use less time on
freeing memory.

I have been using the pktgen script ./pktgen_bench_xmit_mode_queue_xmit.sh,
which can inject packets at the qdisc layer (invoking __dev_queue_xmit),
and then used perf-record to see that the overhead of SLUB (__slab_free is
top #4) is reduced.


> On Thu, 05 Jan 2023 16:42:47 +0100 Jesper Dangaard Brouer wrote:
>> +static void kfree_skb_defer_local(struct sk_buff *skb,
>> +				  struct skb_free_array *sa,
>> +				  enum skb_drop_reason reason)
> 
> If we wanna keep the implementation as is - I think we should rename
> the thing to say "bulk" rather than "defer" to avoid confusion with
> the TCP's "defer to allocating core" scheme..

I named it "defer" because the NAPI cache uses "defer" specifically func
name __kfree_skb_defer() why I choose kfree_skb_defer_local(), as this
patch uses similar scheme.

I'm not sure what is meant by 'TCP's "defer to allocating core" scheme'.
Looking at the code I guess you are referring to skb_attempt_defer_free()
and skb_defer_free_flush().

It would be too costly to call skb_attempt_defer_free() for every SKB
because of the expensive spin_lock_irqsave() (+ restore).  I see that
skb_defer_free_flush() could be improved to use spin_lock_irq() (avoiding
mangling CPU flags).  And skb_defer_free_flush() (which gets called from
RX-NAPI/net_rx_action) ends up calling napi_consume_skb(), which ends up
calling kmem_cache_free_bulk() (which I also do, just more directly).

> 
> kfree_skb_list_bulk() ?

Hmm, IMHO not really worth changing the function name.  The function
kfree_skb_list() is called in more places (than the qdisc enqueue-overflow
case), which automatically benefit if we keep the function name
kfree_skb_list().

--Jesper


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 2/2] net: kfree_skb_list use kmem_cache_free_bulk
  2023-01-09 12:24     ` Jesper Dangaard Brouer
@ 2023-01-09 19:34       ` Jakub Kicinski
  2023-01-09 22:10         ` Alexander H Duyck
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2023-01-09 19:34 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: brouer, netdev, David S. Miller, edumazet, pabeni

On Mon, 9 Jan 2023 13:24:54 +0100 Jesper Dangaard Brouer wrote:
> > Also the lack of perf numbers is a bit of a red flag.
> >  
> 
> I have run performance tests, but as I tried to explain in the
> cover letter, for the qdisc use-case this code path is only activated
> when we have overflow at enqueue.  Thus, this doesn't translate directly
> into a performance numbers, as TX-qdisc is 100% full caused by hardware
> device being backed up, and this patch makes us use less time on freeing
> memory.

I guess it's quite subjective, so it'd be good to get a third opinion.
To me that reads like a premature optimization. Saeed asked for perf
numbers, too.

Does anyone on the list want to cast the tie-break vote?

> I have been using pktgen script ./pktgen_bench_xmit_mode_queue_xmit.sh
> which can inject packets at the qdisc layer (invoking __dev_queue_xmit).
> And then used perf-record to see overhead of SLUB (__slab_free is top#4)
> is reduced.

Right, pktgen wasting time while still delivering line rate is not of
practical importance.

> > kfree_skb_list_bulk() ?  
> 
> Hmm, IMHO not really worth changing the function name.  The
> kfree_skb_list() is called in more places, (than qdisc enqueue-overflow
> case), which automatically benefits if we keep the function name
> kfree_skb_list().

To be clear - I was suggesting a simple
  s/kfree_skb_defer_local/kfree_skb_list_bulk/
on the patch, just renaming the static helper.

IMO now that we have multiple freeing optimizations using "defer" for
the TCP scheme and "bulk" for your prior slab bulk optimizations would
improve clarity.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 2/2] net: kfree_skb_list use kmem_cache_free_bulk
  2023-01-09 19:34       ` Jakub Kicinski
@ 2023-01-09 22:10         ` Alexander H Duyck
  2023-01-10 14:52           ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander H Duyck @ 2023-01-09 22:10 UTC (permalink / raw)
  To: Jakub Kicinski, Jesper Dangaard Brouer
  Cc: brouer, netdev, David S. Miller, edumazet, pabeni

On Mon, 2023-01-09 at 11:34 -0800, Jakub Kicinski wrote:
> On Mon, 9 Jan 2023 13:24:54 +0100 Jesper Dangaard Brouer wrote:
> > > Also the lack of perf numbers is a bit of a red flag.
> > >  
> > 
> > I have run performance tests, but as I tried to explain in the
> > cover letter, for the qdisc use-case this code path is only activated
> > when we have overflow at enqueue.  Thus, this doesn't translate directly
> > into a performance numbers, as TX-qdisc is 100% full caused by hardware
> > device being backed up, and this patch makes us use less time on freeing
> > memory.
> 
> I guess it's quite subjective, so it'd be good to get a third opinion.
> To me that reads like a premature optimization. Saeed asked for perf
> numbers, too.
> 
> Does anyone on the list want to cast the tie-break vote?

I'd say there is some value to be gained by this. Basically it means
less overhead for dropping packets if we picked a backed-up Tx path.

> > I have been using pktgen script ./pktgen_bench_xmit_mode_queue_xmit.sh
> > which can inject packets at the qdisc layer (invoking __dev_queue_xmit).
> > And then used perf-record to see overhead of SLUB (__slab_free is top#4)
> > is reduced.
> 
> Right, pktgen wasting time while still delivering line rate is not of
> practical importance.

I suspect there are probably more real-world use cases out there,
although to test it you would probably have to have a congested network
to really be able to show much of a benefit.

With the pktgen I would be interested in seeing the Qdisc dropped
numbers for with vs without this patch. I would consider something like
that comparable to us doing an XDP_DROP test since all we are talking
about is a synthetic benchmark.

> 
> > > kfree_skb_list_bulk() ?  
> > 
> > Hmm, IMHO not really worth changing the function name.  The
> > kfree_skb_list() is called in more places, (than qdisc enqueue-overflow
> > case), which automatically benefits if we keep the function name
> > kfree_skb_list().
> 
> To be clear - I was suggesting a simple
>   s/kfree_skb_defer_local/kfree_skb_list_bulk/
> on the patch, just renaming the static helper.
> 
> IMO now that we have multiple freeing optimizations using "defer" for
> the TCP scheme and "bulk" for your prior slab bulk optimizations would
> improve clarity.

Rather than defer_local would it maybe make more sense to look at
naming it something like "kfree_skb_add_bulk"? Basically we are
building onto the list of buffers to free so I figure something like an
"add" or "append" would make sense.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 2/2] net: kfree_skb_list use kmem_cache_free_bulk
  2023-01-09 22:10         ` Alexander H Duyck
@ 2023-01-10 14:52           ` Jesper Dangaard Brouer
  2023-01-10 20:20             ` Jakub Kicinski
  0 siblings, 1 reply; 13+ messages in thread
From: Jesper Dangaard Brouer @ 2023-01-10 14:52 UTC (permalink / raw)
  To: Alexander H Duyck, Jakub Kicinski, Jesper Dangaard Brouer
  Cc: brouer, netdev, David S. Miller, edumazet, pabeni


On 09/01/2023 23.10, Alexander H Duyck wrote:
> On Mon, 2023-01-09 at 11:34 -0800, Jakub Kicinski wrote:
>> On Mon, 9 Jan 2023 13:24:54 +0100 Jesper Dangaard Brouer wrote:
>>>> Also the lack of perf numbers is a bit of a red flag.
>>>>   
>>>
>>> I have run performance tests, but as I tried to explain in the
>>> cover letter, for the qdisc use-case this code path is only activated
>>> when we have overflow at enqueue.  Thus, this doesn't translate directly
>>> into a performance numbers, as TX-qdisc is 100% full caused by hardware
>>> device being backed up, and this patch makes us use less time on freeing
>>> memory.
>>
>> I guess it's quite subjective, so it'd be good to get a third opinion.
>> To me that reads like a premature optimization. Saeed asked for perf
>> numbers, too.
>>
>> Does anyone on the list want to cast the tie-break vote?
> 
> I'd say there is some value to be gained by this. Basically it means
> less overhead for dropping packets if we picked a backed up Tx path.
> 

Thanks.

I have microbenchmarks[1] of kmem_cache bulking, which I use to assess
the (best-case) expected gain of using the bulk APIs.

The module 'slab_bulk_test01' results at bulk 16 element:

  kmem-in-loop Per elem: 109 cycles(tsc) 30.532 ns (step:16)
  kmem-bulk    Per elem: 64 cycles(tsc) 17.905 ns (step:16)

Thus, the best-case expected gain is: 45 cycles(tsc) 12.627 ns.
  - With the usual microbenchmark caveats
  - Notice this covers both bulk alloc and free

[1] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm

>>> I have been using pktgen script ./pktgen_bench_xmit_mode_queue_xmit.sh
>>> which can inject packets at the qdisc layer (invoking __dev_queue_xmit).
>>> And then used perf-record to see overhead of SLUB (__slab_free is top#4)
>>> is reduced.
>>
>> Right, pktgen wasting time while still delivering line rate is not of
>> practical importance.
> 

Let me better explain how I cause the push-back without hitting the
10Gbit/s line rate (as we/Linux cannot allocate SKBs fast enough for this).

I'm testing this on a 10Gbit/s interface (driver ixgbe). The challenge
is that I need to overload the qdisc enqueue layer, as that is what
triggers the call to kfree_skb_list().

Linux with SKBs and qdisc injection via pktgen is limited to producing
packets at (measured) 2,205,588 pps with a single TX-queue (and, when
scaling up, 1,951,771 pps per queue or 512 ns per pkt). Reminder: 10Gbit/s
at 64-byte packets is 14.8 Mpps (or 67.2 ns per pkt).
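
(For reference on where the 14.8 Mpps number comes from: the minimum
on-wire frame is 64 bytes of frame + 20 bytes of preamble/IFG = 84 bytes
= 672 bits, and 10^10 bits/s / 672 bits ~= 14.88 Mpps, i.e. ~67.2 ns per
packet.)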

The trick to trigger the qdisc push-back way earlier is Ethernet
flow-control (which is on by default).

I was a bit surprised to see it, but using pktgen_bench_xmit_mode_queue_xmit.sh
on my testlab the remote host was pushing back a lot, resulting in only
256 Kpps actually being sent on the wire, monitored with an ethtool stats
script[2].


[2] 
https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl

> I suspect there are probably more real world use cases out there.
> Although to test it you would probably have to have a congested network
> to really be able to show much of a benefit.
> 
> With the pktgen I would be interested in seeing the Qdisc dropped
> numbers for with vs without this patch. I would consider something like
> that comparable to us doing an XDP_DROP test since all we are talking
> about is a synthetic benchmark.

The pktgen script outputs how many packets it has transmitted, but from
the above we know that most of these packets are actually getting
dropped, as only 256 Kpps are reaching the wire.

Result line from pktgen script: count 100000000 (60byte,0frags)
  - Unpatched kernel: 2396594pps 1150Mb/sec (1150365120bps) errors: 1417469
  - Patched kernel  : 2479970pps 1190Mb/sec (1190385600bps) errors: 1422753

Difference:
  * +83376 pps faster (2479970-2396594)
  * -14 nanosec faster (1/2479970-1/2396594)*10^9

The patched kernel is faster, around the expected gain from using the
kmem_cache bulking API (slightly more, actually).

More raw data and notes for this email are available in [3]:

  [3] 
https://github.com/xdp-project/xdp-project/blob/master/areas/mem/kfree_skb_list01.org


>>
>>>> kfree_skb_list_bulk() ?
>>>
>>> Hmm, IMHO not really worth changing the function name.  The
>>> kfree_skb_list() is called in more places, (than qdisc enqueue-overflow
>>> case), which automatically benefits if we keep the function name
>>> kfree_skb_list().
>>
>> To be clear - I was suggesting a simple
>>    s/kfree_skb_defer_local/kfree_skb_list_bulk/
>> on the patch, just renaming the static helper.
>>

Okay, I get it now.  But I disagree, for the same reason Alex gives below.

>> IMO now that we have multiple freeing optimizations using "defer" for
>> the TCP scheme and "bulk" for your prior slab bulk optimizations would
>> improve clarity.
> 
> Rather than defer_local would it maybe make more sense to look at
> naming it something like "kfree_skb_add_bulk"? Basically we are
> building onto the list of buffers to free so I figure something like an
> "add" or "append" would make sense.
> 

I agree with Alex that we are building up buffers to be freed *later*,
and thus we should somehow reflect that in the naming.

--Jesper


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 2/2] net: kfree_skb_list use kmem_cache_free_bulk
  2023-01-10 14:52           ` Jesper Dangaard Brouer
@ 2023-01-10 20:20             ` Jakub Kicinski
  2023-01-13 13:42               ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2023-01-10 20:20 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Alexander H Duyck, brouer, netdev, David S. Miller, edumazet, pabeni

On Tue, 10 Jan 2023 15:52:48 +0100 Jesper Dangaard Brouer wrote:
> > Rather than defer_local would it maybe make more sense to look at
> > naming it something like "kfree_skb_add_bulk"? Basically we are
> > building onto the list of buffers to free so I figure something like an
> > "add" or "append" would make sense.
> 
> I agree with Alex

Alex's suggestion (kfree_skb_add_bulk) sgtm.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 2/2] net: kfree_skb_list use kmem_cache_free_bulk
  2023-01-10 20:20             ` Jakub Kicinski
@ 2023-01-13 13:42               ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 13+ messages in thread
From: Jesper Dangaard Brouer @ 2023-01-13 13:42 UTC (permalink / raw)
  To: Jakub Kicinski, Jesper Dangaard Brouer
  Cc: brouer, Alexander H Duyck, netdev, David S. Miller, edumazet, pabeni



On 10/01/2023 21.20, Jakub Kicinski wrote:
> On Tue, 10 Jan 2023 15:52:48 +0100 Jesper Dangaard Brouer wrote:
>>> Rather than defer_local would it maybe make more sense to look at
>>> naming it something like "kfree_skb_add_bulk"? Basically we are
>>> building onto the list of buffers to free so I figure something like an
>>> "add" or "append" would make sense.
>>
>> I agree with Alex
> 
> Alex's suggestion (kfree_skb_add_bulk) sgtm.

Okay, great I'll use that and send a V2.

--Jesper


^ permalink raw reply	[flat|nested] 13+ messages in thread
