* [PATCH] net: make unregister netdev warning timeout configurable
From: Dmitry Vyukov @ 2021-03-20 14:28 UTC
  To: davem, edumazet; +Cc: Dmitry Vyukov, netdev, linux-kernel

netdev_wait_allrefs() issues a warning if the refcount does not drop to 0
after 10 seconds. While a 10 second wait generally should not happen
under a normal workload in a normal environment, the warning seems to
fire falsely very often during fuzzing and/or in qemu emulation (~10x
slower). At the very least, it is not possible to tell whether a given
warning is a false positive or not. Automated testing generally bumps
all timeouts to very high values to avoid flaky failures.
Make the timeout configurable for automated testing systems.
Lowering the timeout may also be useful, e.g. for manual bisection.
The default value matches the current behavior.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=211877
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 net/Kconfig    | 12 ++++++++++++
 net/core/dev.c |  4 +++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/net/Kconfig b/net/Kconfig
index 8cea808ad9e8d..ebb9cc00ac81d 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -461,6 +461,18 @@ config ETHTOOL_NETLINK
 	  netlink. It provides better extensibility and some new features,
 	  e.g. notification messages.
 
+config UNREGISTER_NETDEV_TIMEOUT
+	int "Unregister network device timeout in seconds"
+	default 10
+	range 0 3600
+	help
+	  This option controls the timeout (in seconds) used to issue
+	  a warning while waiting for a network device refcount to drop to 0
+	  during device unregistration.
+	  A lower value may be useful during bisection to detect a leaked
+	  reference faster. A larger value may be useful to prevent false
+	  warnings on slow/loaded systems.
+
 endif   # if NET
 
 # Used by archs to tell that they support BPF JIT compiler plus which flavour.
diff --git a/net/core/dev.c b/net/core/dev.c
index 0f72ff5d34ba0..ca03ee407133b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10405,7 +10405,9 @@ static void netdev_wait_allrefs(struct net_device *dev)
 
 		refcnt = netdev_refcnt_read(dev);
 
-		if (refcnt && time_after(jiffies, warning_time + 10 * HZ)) {
+		if (refcnt &&
+		    time_after(jiffies, warning_time +
+			       CONFIG_UNREGISTER_NETDEV_TIMEOUT * HZ)) {
 			pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d\n",
 				 dev->name, refcnt);
 			warning_time = jiffies;

base-commit: 5aa3c334a449bab24519c4967f5ac2b3304c8dcf
-- 
2.31.0.291.g576ba9dcdaf-goog
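
For readers who want to see how the patched condition behaves, here is a
minimal userspace sketch of the check in netdev_wait_allrefs(). It is not
kernel code: HZ is fixed at an assumed 250 ticks per second, time_after_ul()
is a hypothetical stand-in modelled on the kernel's time_after(), and
timeout_to_ticks() and should_warn() are illustrative names that do not
appear in the patch. The timeout is passed as a parameter where the patch
uses CONFIG_UNREGISTER_NETDEV_TIMEOUT.

```c
#include <assert.h>
#include <stdbool.h>

#define HZ 250UL                 /* assumed tick rate; the real HZ is build-dependent */
#define TIMEOUT_MAX_SECS 3600UL  /* upper bound from the Kconfig "range 0 3600" */

/* Wrap-safe "a is after b", modelled on the kernel's time_after(). */
static bool time_after_ul(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Convert a timeout in seconds to ticks. Kconfig enforces the range at
 * build time; this sketch checks it at run time instead.
 */
static unsigned long timeout_to_ticks(unsigned long secs)
{
	assert(secs <= TIMEOUT_MAX_SECS);
	return secs * HZ;
}

/* Mirrors the patched condition: references remain and the deadline has passed. */
static bool should_warn(int refcnt, unsigned long now,
			unsigned long warning_time, unsigned long timeout_secs)
{
	return refcnt != 0 &&
	       time_after_ul(now, warning_time + timeout_to_ticks(timeout_secs));
}
```

With these assumptions, should_warn(1, 2501, 0, 10) is true, while the same
call with a timeout of 60 is false until now exceeds 15000 ticks. With a
timeout of 0 the condition holds as soon as the tick counter has advanced
past warning_time. The signed subtraction keeps the comparison correct when
the tick counter wraps around.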


* Re: [PATCH] net: make unregister netdev warning timeout configurable
From: Leon Romanovsky @ 2021-03-21  8:28 UTC
  To: Dmitry Vyukov; +Cc: davem, edumazet, netdev, linux-kernel

On Sat, Mar 20, 2021 at 03:28:51PM +0100, Dmitry Vyukov wrote:
> netdev_wait_allrefs() issues a warning if refcount does not drop to 0
> after 10 seconds. While 10 second wait generally should not happen
> under normal workload in normal environment, it seems to fire falsely
> very often during fuzzing and/or in qemu emulation (~10x slower).
> At least it's not possible to understand if it's really a false
> positive or not. Automated testing generally bumps all timeouts
> to very high values to avoid flake failures.
> Make the timeout configurable for automated testing systems.
> Lowering the timeout may also be useful for e.g. manual bisection.
> The default value matches the current behavior.
> 
> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=211877
> Cc: netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  net/Kconfig    | 12 ++++++++++++
>  net/core/dev.c |  4 +++-
>  2 files changed, 15 insertions(+), 1 deletion(-)
> 

Our verification team would like to see this change too.

Thanks,
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>

* Re: [PATCH] net: make unregister netdev warning timeout configurable
From: David Miller @ 2021-03-22 19:26 UTC
  To: dvyukov; +Cc: edumazet, netdev, linux-kernel

From: Dmitry Vyukov <dvyukov@google.com>
Date: Sat, 20 Mar 2021 15:28:51 +0100

> netdev_wait_allrefs() issues a warning if refcount does not drop to 0
> after 10 seconds. While 10 second wait generally should not happen
> under normal workload in normal environment, it seems to fire falsely
> very often during fuzzing and/or in qemu emulation (~10x slower).
> At least it's not possible to understand if it's really a false
> positive or not. Automated testing generally bumps all timeouts
> to very high values to avoid flake failures.
> Make the timeout configurable for automated testing systems.
> Lowering the timeout may also be useful for e.g. manual bisection.
> The default value matches the current behavior.
> 
> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=211877
> Cc: netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org

I'd say a sysctl knob is much better than a compile time setting for this.
That way stock kernels can be used in these testing scenarios.

Thanks.

* Re: [PATCH] net: make unregister netdev warning timeout configurable
From: Dmitry Vyukov @ 2021-03-23  6:51 UTC
  To: David Miller; +Cc: Eric Dumazet, netdev, LKML

On Mon, Mar 22, 2021 at 8:26 PM David Miller <davem@davemloft.net> wrote:
>
> From: Dmitry Vyukov <dvyukov@google.com>
> Date: Sat, 20 Mar 2021 15:28:51 +0100
>
> > netdev_wait_allrefs() issues a warning if refcount does not drop to 0
> > after 10 seconds. While 10 second wait generally should not happen
> > under normal workload in normal environment, it seems to fire falsely
> > very often during fuzzing and/or in qemu emulation (~10x slower).
> > At least it's not possible to understand if it's really a false
> > positive or not. Automated testing generally bumps all timeouts
> > to very high values to avoid flake failures.
> > Make the timeout configurable for automated testing systems.
> > Lowering the timeout may also be useful for e.g. manual bisection.
> > The default value matches the current behavior.
> >
> > Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> > Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=211877
> > Cc: netdev@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
>
> I'd say a sysctl knob is much better than a compile time setting for this.
> That way stock kernels can be used in these testing scenarios.

FTR, I've mailed v2 with a sysctl:
https://lore.kernel.org/netdev/20210323064923.2098711-1-dvyukov@google.com/T/#u
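
To illustrate the difference between the compile-time option and a runtime
knob, here is a small userspace sketch. It is not the v2 patch and does not
use the kernel sysctl API: unregister_timeout_secs,
set_unregister_timeout_secs() and should_warn_runtime() are hypothetical
names, HZ is an assumed 250 ticks per second, and the 0..3600 range is
carried over from the Kconfig option in v1. The point is only that the
timeout is read from a variable on every check, so a new value takes effect
without rebuilding.

```c
#include <assert.h>
#include <stdbool.h>

#define HZ 250UL  /* assumed tick rate; the real HZ is build-dependent */

/* Hypothetical runtime-tunable timeout; the default matches v1's "default 10". */
static int unregister_timeout_secs = 10;

/* Reject values outside the 0..3600 range that v1 used in Kconfig. */
static int set_unregister_timeout_secs(int secs)
{
	if (secs < 0 || secs > 3600)
		return -1;
	unregister_timeout_secs = secs;
	return 0;
}

/* Same condition as v1, but the timeout is read at check time. */
static bool should_warn_runtime(int refcnt, unsigned long now,
				unsigned long warning_time)
{
	unsigned long timeout_ticks;

	assert(unregister_timeout_secs >= 0);
	timeout_ticks = (unsigned long)unregister_timeout_secs * HZ;

	/* Wrap-safe "now is after warning_time + timeout_ticks". */
	return refcnt != 0 &&
	       (long)(warning_time + timeout_ticks - now) < 0;
}
```

In this sketch, calling set_unregister_timeout_secs(60) changes the outcome
of the next should_warn_runtime() call right away, which is the property
that lets a stock kernel be reused across testing setups.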
