* [PATCH net-next 0/4] net: polish skb defer freeing
@ 2022-05-16  4:24 Eric Dumazet
  2022-05-16  4:24 ` [PATCH net-next 1/4] net: fix possible race in skb_attempt_defer_free() Eric Dumazet
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Eric Dumazet @ 2022-05-16  4:24 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

While testing this recently added feature on a variety
of platforms/configurations, I found the following issues:

1) A race leading to concurrent calls to smp_call_function_single_async()

2) Missed opportunity to use napi_consume_skb()

3) Need to limit the max length of the per-cpu lists.

4) Process the per-cpu list more frequently, for the
   (unusual) case where net_rx_action() has multiple
   napi_poll() invocations to process per round.

Eric Dumazet (4):
  net: fix possible race in skb_attempt_defer_free()
  net: use napi_consume_skb() in skb_defer_free_flush()
  net: add skb_defer_max sysctl
  net: call skb_defer_free_flush() before each napi_poll()

 Documentation/admin-guide/sysctl/net.rst |  8 ++++++++
 include/linux/netdevice.h                |  1 +
 net/core/dev.c                           | 15 ++++++++++-----
 net/core/dev.h                           |  2 +-
 net/core/skbuff.c                        | 18 ++++++++++--------
 net/core/sysctl_net_core.c               |  8 ++++++++
 6 files changed, 38 insertions(+), 14 deletions(-)

-- 
2.36.0.550.gb090851708-goog


* [PATCH net-next 1/4] net: fix possible race in skb_attempt_defer_free()
  2022-05-16  4:24 [PATCH net-next 0/4] net: polish skb defer freeing Eric Dumazet
@ 2022-05-16  4:24 ` Eric Dumazet
  2022-05-16 18:15   ` Jakub Kicinski
  2022-05-16  4:24 ` [PATCH net-next 2/4] net: use napi_consume_skb() in skb_defer_free_flush() Eric Dumazet
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2022-05-16  4:24 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

A cpu can observe sd->defer_count reaching 128
and call smp_call_function_single_async().

The problem is that the remote CPU can clear sd->defer_count
before the IPI is run/acknowledged.

Other cpus can queue more packets and also decide
to call smp_call_function_single_async() while the pending
IPI has not yet been delivered.

This is a common issue with smp_call_function_single_async().
Callers must ensure correct synchronization and serialization.

I triggered this issue while experimenting with a smaller threshold.
Performing the call to smp_call_function_single_async()
under sd->defer_lock protection did not solve the problem.

Commit 5a18ceca6350 ("smp: Allow smp_call_function_single_async()
to insert locked csd") replaced an informative WARN_ON_ONCE()
with a return of -EBUSY, which is often ignored.
Testing for the presence of CSD_FLAG_LOCK is racy anyway.
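
For reference, a condensed view of the serialization pattern adopted in
the diff below (comments added for illustration):

	/* IPI handler on the remote cpu: raise NET_RX_SOFTIRQ, then allow
	 * a new IPI to be requested.
	 */
	static void trigger_rx_softirq(void *data)
	{
		struct softnet_data *sd = data;

		__raise_softirq_irqoff(NET_RX_SOFTIRQ);
		smp_store_release(&sd->defer_ipi_scheduled, 0);
	}

	/* Requesting side: only the cpu that flips defer_ipi_scheduled
	 * from 0 to 1 sends the IPI; concurrent callers back off.
	 */
	if (unlikely(kick) && !cmpxchg(&sd->defer_ipi_scheduled, 0, 1))
		smp_call_function_single_async(cpu, &sd->defer_csd);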

Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h | 1 +
 net/core/dev.c            | 7 +++++--
 net/core/skbuff.c         | 5 ++---
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 536321691c725ebd311088f4654dd04b9abbaaef..89699a299ba1b0b544b7abb782708d827abe55ac 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3127,6 +3127,7 @@ struct softnet_data {
 	/* Another possibly contended cache line */
 	spinlock_t		defer_lock ____cacheline_aligned_in_smp;
 	int			defer_count;
+	int			defer_ipi_scheduled;
 	struct sk_buff		*defer_list;
 	call_single_data_t	defer_csd;
 };
diff --git a/net/core/dev.c b/net/core/dev.c
index a601da3b4a7c800801f763f097f00f3a3b591107..d708f95356e0b03a61e8211adcf6d272dfa322b5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4581,9 +4581,12 @@ static void rps_trigger_softirq(void *data)
 #endif /* CONFIG_RPS */
 
 /* Called from hardirq (IPI) context */
-static void trigger_rx_softirq(void *data __always_unused)
+static void trigger_rx_softirq(void *data)
 {
+	struct softnet_data *sd = data;
+
 	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
+	smp_store_release(&sd->defer_ipi_scheduled, 0);
 }
 
 /*
@@ -11381,7 +11384,7 @@ static int __init net_dev_init(void)
 		INIT_CSD(&sd->csd, rps_trigger_softirq, sd);
 		sd->cpu = i;
 #endif
-		INIT_CSD(&sd->defer_csd, trigger_rx_softirq, NULL);
+		INIT_CSD(&sd->defer_csd, trigger_rx_softirq, sd);
 		spin_lock_init(&sd->defer_lock);
 
 		init_gro_hash(&sd->backlog);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index bd16e158b3668f496a9d2c8e8b6f3433a326314c..1e2180682f2e94c45e3f26059af6d18be2d9f9d3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -6514,8 +6514,7 @@ void skb_attempt_defer_free(struct sk_buff *skb)
 	sd->defer_count++;
 
 	/* kick every time queue length reaches 128.
-	 * This should avoid blocking in smp_call_function_single_async().
-	 * This condition should hardly be bit under normal conditions,
+	 * This condition should hardly be hit under normal conditions,
 	 * unless cpu suddenly stopped to receive NIC interrupts.
 	 */
 	kick = sd->defer_count == 128;
@@ -6525,6 +6524,6 @@ void skb_attempt_defer_free(struct sk_buff *skb)
 	/* Make sure to trigger NET_RX_SOFTIRQ on the remote CPU
 	 * if we are unlucky enough (this seems very unlikely).
 	 */
-	if (unlikely(kick))
+	if (unlikely(kick) && !cmpxchg(&sd->defer_ipi_scheduled, 0, 1))
 		smp_call_function_single_async(cpu, &sd->defer_csd);
 }
-- 
2.36.0.550.gb090851708-goog


* [PATCH net-next 2/4] net: use napi_consume_skb() in skb_defer_free_flush()
  2022-05-16  4:24 [PATCH net-next 0/4] net: polish skb defer freeing Eric Dumazet
  2022-05-16  4:24 ` [PATCH net-next 1/4] net: fix possible race in skb_attempt_defer_free() Eric Dumazet
@ 2022-05-16  4:24 ` Eric Dumazet
  2022-05-16  4:24 ` [PATCH net-next 3/4] net: add skb_defer_max sysctl Eric Dumazet
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2022-05-16  4:24 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

skb_defer_free_flush() runs from softirq context,
so we have the opportunity to refill the napi_alloc_cache
and/or use kmem_cache_free_bulk() when this cache is full.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index d708f95356e0b03a61e8211adcf6d272dfa322b5..35b6d79b0c51412534dc3b3374b8d797d212f2d8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6632,7 +6632,7 @@ static void skb_defer_free_flush(struct softnet_data *sd)
 
 	while (skb != NULL) {
 		next = skb->next;
-		__kfree_skb(skb);
+		napi_consume_skb(skb, 1);
 		skb = next;
 	}
 }
-- 
2.36.0.550.gb090851708-goog


* [PATCH net-next 3/4] net: add skb_defer_max sysctl
  2022-05-16  4:24 [PATCH net-next 0/4] net: polish skb defer freeing Eric Dumazet
  2022-05-16  4:24 ` [PATCH net-next 1/4] net: fix possible race in skb_attempt_defer_free() Eric Dumazet
  2022-05-16  4:24 ` [PATCH net-next 2/4] net: use napi_consume_skb() in skb_defer_free_flush() Eric Dumazet
@ 2022-05-16  4:24 ` Eric Dumazet
  2022-05-16 20:39   ` Jakub Kicinski
  2022-05-16  4:24 ` [PATCH net-next 4/4] net: call skb_defer_free_flush() before each napi_poll() Eric Dumazet
  2022-05-16 10:40 ` [PATCH net-next 0/4] net: polish skb defer freeing patchwork-bot+netdevbpf
  4 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2022-05-16  4:24 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

Commit 68822bdf76f1 ("net: generalize skb freeing
deferral to per-cpu lists") added another per-cpu
cache of skbs. It was expected to be small,
and an IPI was forced whenever the list reached 128
skbs.

We might need to control queue capacity and added
latency more precisely.

An IPI is now generated whenever the queue reaches half capacity.

The default value of the new limit is 64.
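
A condensed view of the resulting enqueue logic (extracted from the diff
below, comments added for illustration):

	defer_max = READ_ONCE(sysctl_skb_defer_max);
	if (READ_ONCE(sd->defer_count) >= defer_max)
		goto nodefer;	/* list is full: free the skb right away */

	/* ... skb is queued on sd->defer_list ... */
	sd->defer_count++;

	/* Send an IPI every time queue reaches half capacity. */
	kick = sd->defer_count == (defer_max >> 1);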

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 Documentation/admin-guide/sysctl/net.rst |  8 ++++++++
 net/core/dev.c                           |  1 +
 net/core/dev.h                           |  2 +-
 net/core/skbuff.c                        | 15 +++++++++------
 net/core/sysctl_net_core.c               |  8 ++++++++
 5 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
index f86b5e1623c6922b070fd7c62e83271ee9aee46c..fa4dcdb283cf8937df8414906b57949cc7a3c2bc 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -322,6 +322,14 @@ a leaked reference faster. A larger value may be useful to prevent false
 warnings on slow/loaded systems.
 Default value is 10, minimum 1, maximum 3600.
 
+skb_defer_max
+-------------
+
+Max size (in skbs) of the per-cpu list of skbs being freed
+by the cpu which allocated them. Used by TCP stack so far.
+
+Default: 64
+
 optmem_max
 ----------
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 35b6d79b0c51412534dc3b3374b8d797d212f2d8..ac22fedfeaf72dc0d46f4793bbd9b2d5dd301730 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4329,6 +4329,7 @@ int netdev_max_backlog __read_mostly = 1000;
 EXPORT_SYMBOL(netdev_max_backlog);
 
 int netdev_tstamp_prequeue __read_mostly = 1;
+unsigned int sysctl_skb_defer_max __read_mostly = 64;
 int netdev_budget __read_mostly = 300;
 /* Must be at least 2 jiffes to guarantee 1 jiffy timeout */
 unsigned int __read_mostly netdev_budget_usecs = 2 * USEC_PER_SEC / HZ;
diff --git a/net/core/dev.h b/net/core/dev.h
index 328b37af90ba9465d83c833dffd18547ddef4028..cbb8a925175a257f8ce2a27eebb02e03041eebb8 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -39,7 +39,7 @@ void dev_addr_check(struct net_device *dev);
 /* sysctls not referred to from outside net/core/ */
 extern int		netdev_budget;
 extern unsigned int	netdev_budget_usecs;
-
+extern unsigned int	sysctl_skb_defer_max;
 extern int		netdev_tstamp_prequeue;
 extern int		netdev_unregister_timeout_secs;
 extern int		weight_p;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 1e2180682f2e94c45e3f26059af6d18be2d9f9d3..ecb8fe7045a2f9c080cd0299ff7c0c1ea88d996b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -80,6 +80,7 @@
 #include <linux/user_namespace.h>
 #include <linux/indirect_call_wrapper.h>
 
+#include "dev.h"
 #include "sock_destructor.h"
 
 struct kmem_cache *skbuff_head_cache __ro_after_init;
@@ -6494,16 +6495,21 @@ void skb_attempt_defer_free(struct sk_buff *skb)
 	int cpu = skb->alloc_cpu;
 	struct softnet_data *sd;
 	unsigned long flags;
+	unsigned int defer_max;
 	bool kick;
 
 	if (WARN_ON_ONCE(cpu >= nr_cpu_ids) ||
 	    !cpu_online(cpu) ||
 	    cpu == raw_smp_processor_id()) {
-		__kfree_skb(skb);
+nodefer:	__kfree_skb(skb);
 		return;
 	}
 
 	sd = &per_cpu(softnet_data, cpu);
+	defer_max = READ_ONCE(sysctl_skb_defer_max);
+	if (READ_ONCE(sd->defer_count) >= defer_max)
+		goto nodefer;
+
 	/* We do not send an IPI or any signal.
 	 * Remote cpu will eventually call skb_defer_free_flush()
 	 */
@@ -6513,11 +6519,8 @@ void skb_attempt_defer_free(struct sk_buff *skb)
 	WRITE_ONCE(sd->defer_list, skb);
 	sd->defer_count++;
 
-	/* kick every time queue length reaches 128.
-	 * This condition should hardly be hit under normal conditions,
-	 * unless cpu suddenly stopped to receive NIC interrupts.
-	 */
-	kick = sd->defer_count == 128;
+	/* Send an IPI every time queue reaches half capacity. */
+	kick = sd->defer_count == (defer_max >> 1);
 
 	spin_unlock_irqrestore(&sd->defer_lock, flags);
 
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 195ca5c2877125bf420128d3c6465ac216f459e5..ca8d38325e1e1d7775d61893fab94ff9499ef5f8 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -578,6 +578,14 @@ static struct ctl_table net_core_table[] = {
 		.extra1		= SYSCTL_ONE,
 		.extra2		= &int_3600,
 	},
+	{
+		.procname	= "skb_defer_max",
+		.data		= &sysctl_skb_defer_max,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+	},
 	{ }
 };
 
-- 
2.36.0.550.gb090851708-goog


* [PATCH net-next 4/4] net: call skb_defer_free_flush() before each napi_poll()
  2022-05-16  4:24 [PATCH net-next 0/4] net: polish skb defer freeing Eric Dumazet
                   ` (2 preceding siblings ...)
  2022-05-16  4:24 ` [PATCH net-next 3/4] net: add skb_defer_max sysctl Eric Dumazet
@ 2022-05-16  4:24 ` Eric Dumazet
  2022-05-16 18:21   ` Jakub Kicinski
  2022-05-16 10:40 ` [PATCH net-next 0/4] net: polish skb defer freeing patchwork-bot+netdevbpf
  4 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2022-05-16  4:24 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

skb_defer_free_flush() can consume cpu cycles,
so it seems better to call it in the inner loop
(a condensed sketch follows the list below):

- Potentially frees page/skb that will be reallocated while hot.

- Account for the cpu cycles in the @time_limit determination.

- Keep softnet_data.defer_count small to reduce chances for
  skb_attempt_defer_free() to send an IPI.
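
A condensed sketch of the resulting net_rx_action() loop (simplified from
the diff below; only the placement of the flush matters here):

	for (;;) {
		struct napi_struct *n;

		skb_defer_free_flush(sd);	/* now runs before each napi_poll() */

		if (list_empty(&list)) {
			if (!sd_has_rps_ipi_waiting(sd) && list_empty(&repoll))
				goto end;
			break;
		}

		n = list_first_entry(&list, struct napi_struct, poll_list);
		budget -= napi_poll(n, &repoll);
		/* ... budget and time_limit checks ... */
	}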

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/dev.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index ac22fedfeaf72dc0d46f4793bbd9b2d5dd301730..aabb695e25f35402bbc85602e1bb9c86d2fa209e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6654,6 +6654,8 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
 	for (;;) {
 		struct napi_struct *n;
 
+		skb_defer_free_flush(sd);
+
 		if (list_empty(&list)) {
 			if (!sd_has_rps_ipi_waiting(sd) && list_empty(&repoll))
 				goto end;
@@ -6683,8 +6685,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
 		__raise_softirq_irqoff(NET_RX_SOFTIRQ);
 
 	net_rps_action_and_irq_enable(sd);
-end:
-	skb_defer_free_flush(sd);
+end:;
 }
 
 struct netdev_adjacent {
-- 
2.36.0.550.gb090851708-goog


* Re: [PATCH net-next 0/4] net: polish skb defer freeing
  2022-05-16  4:24 [PATCH net-next 0/4] net: polish skb defer freeing Eric Dumazet
                   ` (3 preceding siblings ...)
  2022-05-16  4:24 ` [PATCH net-next 4/4] net: call skb_defer_free_flush() before each napi_poll() Eric Dumazet
@ 2022-05-16 10:40 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 14+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-05-16 10:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, kuba, pabeni, netdev, edumazet

Hello:

This series was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:

On Sun, 15 May 2022 21:24:52 -0700 you wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> While testing this recently added feature on a variety
> of platforms/configurations, I found the following issues:
> 
> 1) A race leading to concurrent calls to smp_call_function_single_async()
> 
> [...]

Here is the summary with links:
  - [net-next,1/4] net: fix possible race in skb_attempt_defer_free()
    https://git.kernel.org/netdev/net-next/c/97e719a82b43
  - [net-next,2/4] net: use napi_consume_skb() in skb_defer_free_flush()
    https://git.kernel.org/netdev/net-next/c/2db60eed1a95
  - [net-next,3/4] net: add skb_defer_max sysctl
    https://git.kernel.org/netdev/net-next/c/39564c3fdc66
  - [net-next,4/4] net: call skb_defer_free_flush() before each napi_poll()
    https://git.kernel.org/netdev/net-next/c/909876500251

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



* Re: [PATCH net-next 1/4] net: fix possible race in skb_attempt_defer_free()
  2022-05-16  4:24 ` [PATCH net-next 1/4] net: fix possible race in skb_attempt_defer_free() Eric Dumazet
@ 2022-05-16 18:15   ` Jakub Kicinski
  2022-05-16 18:24     ` Eric Dumazet
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2022-05-16 18:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S . Miller, Paolo Abeni, netdev, Eric Dumazet

On Sun, 15 May 2022 21:24:53 -0700 Eric Dumazet wrote:
> A cpu can observe sd->defer_count reaching 128,
> and call smp_call_function_single_async()
> 
> Problem is that the remote CPU can clear sd->defer_count
> before the IPI is run/acknowledged.
> 
> Other cpus can queue more packets and also decide
> to call smp_call_function_single_async() while the pending
> IPI was not yet delivered.
> 
> This is a common issue with smp_call_function_single_async().
> Callers must ensure correct synchronization and serialization.
> 
> I triggered this issue while experimenting smaller threshold.
> Performing the call to smp_call_function_single_async()
> under sd->defer_lock protection did not solve the problem.
> 
> Commit 5a18ceca6350 ("smp: Allow smp_call_function_single_async()
> to insert locked csd") replaced an informative WARN_ON_ONCE()
> with a return of -EBUSY, which is often ignored.
> Test of CSD_FLAG_LOCK presence is racy anyway.

If I'm reading this right, this is useful for backports, but in net-next
it really is a noop? The -EBUSY would be perfectly safe to ignore?
Just checking.


* Re: [PATCH net-next 4/4] net: call skb_defer_free_flush() before each napi_poll()
  2022-05-16  4:24 ` [PATCH net-next 4/4] net: call skb_defer_free_flush() before each napi_poll() Eric Dumazet
@ 2022-05-16 18:21   ` Jakub Kicinski
  2022-05-16 18:26     ` Eric Dumazet
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2022-05-16 18:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S . Miller, Paolo Abeni, netdev, Eric Dumazet

On Sun, 15 May 2022 21:24:56 -0700 Eric Dumazet wrote:
> -end:
> -	skb_defer_free_flush(sd);
> +end:;

Sorry for the nitpick, but can I remove this and just return like we
did before f3412b3879b4? Is there a reason why "label:;}" is good?


* Re: [PATCH net-next 1/4] net: fix possible race in skb_attempt_defer_free()
  2022-05-16 18:15   ` Jakub Kicinski
@ 2022-05-16 18:24     ` Eric Dumazet
  2022-05-16 18:54       ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2022-05-16 18:24 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Eric Dumazet, David S . Miller, Paolo Abeni, netdev

On Mon, May 16, 2022 at 11:16 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 15 May 2022 21:24:53 -0700 Eric Dumazet wrote:
> > A cpu can observe sd->defer_count reaching 128,
> > and call smp_call_function_single_async()
> >
> > Problem is that the remote CPU can clear sd->defer_count
> > before the IPI is run/acknowledged.
> >
> > Other cpus can queue more packets and also decide
> > to call smp_call_function_single_async() while the pending
> > IPI was not yet delivered.
> >
> > This is a common issue with smp_call_function_single_async().
> > Callers must ensure correct synchronization and serialization.
> >
> > I triggered this issue while experimenting smaller threshold.
> > Performing the call to smp_call_function_single_async()
> > under sd->defer_lock protection did not solve the problem.
> >
> > Commit 5a18ceca6350 ("smp: Allow smp_call_function_single_async()
> > to insert locked csd") replaced an informative WARN_ON_ONCE()
> > with a return of -EBUSY, which is often ignored.
> > Test of CSD_FLAG_LOCK presence is racy anyway.
>
> If I'm reading this right this is useful for backports but in net-next
> it really is a noop? The -EBUSY would be perfectly safe to ignore?
> Just checking.

Not sure I understand the question.

trigger_rx_softirq() and friends were only in net-next, so there is no
backport needed.

Are you talking about calls from net_rps_send_ipi()?
These are fine, because we own an atomic bit (NAPI_STATE_SCHED).


* Re: [PATCH net-next 4/4] net: call skb_defer_free_flush() before each napi_poll()
  2022-05-16 18:21   ` Jakub Kicinski
@ 2022-05-16 18:26     ` Eric Dumazet
  2022-05-16 18:56       ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2022-05-16 18:26 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Eric Dumazet, David S . Miller, Paolo Abeni, netdev

On Mon, May 16, 2022 at 11:21 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 15 May 2022 21:24:56 -0700 Eric Dumazet wrote:
> > -end:
> > -     skb_defer_free_flush(sd);
> > +end:;
>
> Sorry for the nit pick but can I remove this and just return like we
> did before f3412b3879b4? Is there a reason such "label:;}" is good?

I thought that having a return in the middle of this function would
hurt us at some point.


* Re: [PATCH net-next 1/4] net: fix possible race in skb_attempt_defer_free()
  2022-05-16 18:24     ` Eric Dumazet
@ 2022-05-16 18:54       ` Jakub Kicinski
  0 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2022-05-16 18:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, David S . Miller, Paolo Abeni, netdev

On Mon, 16 May 2022 11:24:40 -0700 Eric Dumazet wrote:
> On Mon, May 16, 2022 at 11:16 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > If I'm reading this right this is useful for backports but in net-next
> > it really is a noop? The -EBUSY would be perfectly safe to ignore?
> > Just checking.  
> 
> Not sure I understand the question.
> 
> trigger_rx_softirq() and friends were only in net-next, so there is no
> backport needed.
> 
> Are you talking of calls from net_rps_send_ipi() ?
> These are fine, because we own an atomic bit (NAPI_STATE_SCHED).

Ah, I think I get it now. It was unclear what problem this patch is
solving; this part of the commit message really is key:

> This is a common issue with smp_call_function_single_async().
> Callers must ensure correct synchronization and serialization.

smp_call_function_single_async() does not protect its own internal state,
so we need to wrap it in our own locking (of some form).

Sorry for the noise.


* Re: [PATCH net-next 4/4] net: call skb_defer_free_flush() before each napi_poll()
  2022-05-16 18:26     ` Eric Dumazet
@ 2022-05-16 18:56       ` Jakub Kicinski
  0 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2022-05-16 18:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, David S . Miller, Paolo Abeni, netdev

On Mon, 16 May 2022 11:26:14 -0700 Eric Dumazet wrote:
> On Mon, May 16, 2022 at 11:21 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Sun, 15 May 2022 21:24:56 -0700 Eric Dumazet wrote:  
> > > -end:
> > > -     skb_defer_free_flush(sd);
> > > +end:;  
> >
> > Sorry for the nit pick but can I remove this and just return like we
> > did before f3412b3879b4? Is there a reason such "label:;}" is good?  
> 
> I thought that having a return in the middle of this function would
> hurt us at some point.

I guess personal preference. Let's leave it unless someone else shares
my disregard for pointlessly jumping to the closing bracket :)


* Re: [PATCH net-next 3/4] net: add skb_defer_max sysctl
  2022-05-16  4:24 ` [PATCH net-next 3/4] net: add skb_defer_max sysctl Eric Dumazet
@ 2022-05-16 20:39   ` Jakub Kicinski
  2022-05-16 20:43     ` Eric Dumazet
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2022-05-16 20:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S . Miller, Paolo Abeni, netdev, Eric Dumazet

On Sun, 15 May 2022 21:24:55 -0700 Eric Dumazet wrote:
> @@ -6494,16 +6495,21 @@ void skb_attempt_defer_free(struct sk_buff *skb)
>  	int cpu = skb->alloc_cpu;
>  	struct softnet_data *sd;
>  	unsigned long flags;
> +	unsigned int defer_max;
>  	bool kick;
>  
>  	if (WARN_ON_ONCE(cpu >= nr_cpu_ids) ||
>  	    !cpu_online(cpu) ||
>  	    cpu == raw_smp_processor_id()) {
> -		__kfree_skb(skb);
> +nodefer:	__kfree_skb(skb);
>  		return;
>  	}
>  
>  	sd = &per_cpu(softnet_data, cpu);
> +	defer_max = READ_ONCE(sysctl_skb_defer_max);
> +	if (READ_ONCE(sd->defer_count) >= defer_max)
> +		goto nodefer;
> +
>  	/* We do not send an IPI or any signal.
>  	 * Remote cpu will eventually call skb_defer_free_flush()
>  	 */
> @@ -6513,11 +6519,8 @@ void skb_attempt_defer_free(struct sk_buff *skb)
>  	WRITE_ONCE(sd->defer_list, skb);
>  	sd->defer_count++;
>  
> -	/* kick every time queue length reaches 128.
> -	 * This condition should hardly be hit under normal conditions,
> -	 * unless cpu suddenly stopped to receive NIC interrupts.
> -	 */
> -	kick = sd->defer_count == 128;
> +	/* Send an IPI every time queue reaches half capacity. */
> +	kick = sd->defer_count == (defer_max >> 1);

nit: it will behave a little strangely for defer_max == 1: we'll let one
skb get onto the list and free the subsequent skbs directly, but we'll
never kick the IPI (with defer_max == 1, defer_max >> 1 is 0, so the
kick condition never matches).

Moving the sd->defer_count++; should fix it and have no significant
side effects. I think.


* Re: [PATCH net-next 3/4] net: add skb_defer_max sysctl
  2022-05-16 20:39   ` Jakub Kicinski
@ 2022-05-16 20:43     ` Eric Dumazet
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2022-05-16 20:43 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Eric Dumazet, David S . Miller, Paolo Abeni, netdev

On Mon, May 16, 2022 at 1:39 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 15 May 2022 21:24:55 -0700 Eric Dumazet wrote:
> > @@ -6494,16 +6495,21 @@ void skb_attempt_defer_free(struct sk_buff *skb)
> >       int cpu = skb->alloc_cpu;
> >       struct softnet_data *sd;
> >       unsigned long flags;
> > +     unsigned int defer_max;
> >       bool kick;
> >
> >       if (WARN_ON_ONCE(cpu >= nr_cpu_ids) ||
> >           !cpu_online(cpu) ||
> >           cpu == raw_smp_processor_id()) {
> > -             __kfree_skb(skb);
> > +nodefer:     __kfree_skb(skb);
> >               return;
> >       }
> >
> >       sd = &per_cpu(softnet_data, cpu);
> > +     defer_max = READ_ONCE(sysctl_skb_defer_max);
> > +     if (READ_ONCE(sd->defer_count) >= defer_max)
> > +             goto nodefer;
> > +
> >       /* We do not send an IPI or any signal.
> >        * Remote cpu will eventually call skb_defer_free_flush()
> >        */
> > @@ -6513,11 +6519,8 @@ void skb_attempt_defer_free(struct sk_buff *skb)
> >       WRITE_ONCE(sd->defer_list, skb);
> >       sd->defer_count++;
> >
> > -     /* kick every time queue length reaches 128.
> > -      * This condition should hardly be hit under normal conditions,
> > -      * unless cpu suddenly stopped to receive NIC interrupts.
> > -      */
> > -     kick = sd->defer_count == 128;
> > +     /* Send an IPI every time queue reaches half capacity. */
> > +     kick = sd->defer_count == (defer_max >> 1);
>
> nit: it will behave a little strangely for defer_max == 1
> we'll let one skb get onto the list and free the subsequent
> skbs directly but we'll never kick the IPI

Yes, I was aware of this, but decided it was not a big deal.

Presumably people will be interested in disabling the thing completely;
I am not sure about defer_max == 1.

>
> Moving the sd->defer_count++; should fix it and have no significant
> side effects. I think.

SGTM, thanks !
