linux-kernel.vger.kernel.org archive mirror
* [PATCH] net: remove static inline from dev_put/dev_hold
@ 2019-11-11 14:05 Tony Lu
  2019-11-11 16:56 ` Stephen Hemminger
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Tony Lu @ 2019-11-11 14:05 UTC (permalink / raw)
  To: davem; +Cc: shemminger, netdev, linux-kernel

This patch removes static inline from dev_put/dev_hold in order to help
trace the pcpu_refcnt leak of net_device.

We have suffered from this kind of issue several times while
moving NICs between different net namespaces. It prints this
log in dmesg:

  unregister_netdevice: waiting for eth0 to become free. Usage count = 1

However, it is hard to find out in time who took and leaked the refcnt.
The leak leaves a crime scene but little evidence. Once leaked, it is not
safe to fix it up on the running host. We can't trace dev_put/dev_hold
directly, because the functions are inlined and used widely among modules.
This issue is also common: there are tens of patches that fix net_device
refcnt leaks with various causes.

To trace the refcnt manipulation, this patch removes static inline from
dev_put/dev_hold. We can then use handy tools, such as eBPF with kprobes,
to find out who holds but forgets to put a refcnt. These functions are not
called frequently, so the overhead is limited.

Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
---
 include/linux/netdevice.h | 24 ++++--------------------
 net/core/dev.c            | 24 ++++++++++++++++++++++++
 2 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c20f190b4c18..872d266c6da5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3720,27 +3720,11 @@ extern unsigned int	netdev_budget_usecs;
 /* Called by rtnetlink.c:rtnl_unlock() */
 void netdev_run_todo(void);
 
-/**
- *	dev_put - release reference to device
- *	@dev: network device
- *
- * Release reference to device to allow it to be freed.
- */
-static inline void dev_put(struct net_device *dev)
-{
-	this_cpu_dec(*dev->pcpu_refcnt);
-}
+/* Release reference to device to allow it to be freed. */
+void dev_put(struct net_device *dev);
 
-/**
- *	dev_hold - get reference to device
- *	@dev: network device
- *
- * Hold reference to device to keep it from being freed.
- */
-static inline void dev_hold(struct net_device *dev)
-{
-	this_cpu_inc(*dev->pcpu_refcnt);
-}
+/* Hold reference to device to keep it from being freed. */
+void dev_hold(struct net_device *dev);
 
 /* Carrier loss detection, dial on demand. The functions netif_carrier_on
  * and _off may be called from IRQ context, but it is caller
diff --git a/net/core/dev.c b/net/core/dev.c
index 99ac84ff398f..620fb3d6718a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1294,6 +1294,30 @@ void netdev_notify_peers(struct net_device *dev)
 }
 EXPORT_SYMBOL(netdev_notify_peers);
 
+/**
+ *	dev_put - release reference to device
+ *	@dev: network device
+ *
+ * Release reference to device to allow it to be freed.
+ */
+void dev_put(struct net_device *dev)
+{
+	this_cpu_dec(*dev->pcpu_refcnt);
+}
+EXPORT_SYMBOL(dev_put);
+
+/**
+ *	dev_hold - get reference to device
+ *	@dev: network device
+ *
+ * Hold reference to device to keep it from being freed.
+ */
+void dev_hold(struct net_device *dev)
+{
+	this_cpu_inc(*dev->pcpu_refcnt);
+}
+EXPORT_SYMBOL(dev_hold);
+
 static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
-- 
2.24.0
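
For readers unfamiliar with pcpu_refcnt, the following is a minimal userspace sketch of why only the sum over all CPUs matters and how a single unbalanced hold yields "Usage count = 1". It is a model, not kernel code: struct model_dev, NR_MODEL_CPUS and the model_* helpers are hypothetical names, and the CPU is passed explicitly instead of using this_cpu_inc()/this_cpu_dec().

```c
#define NR_MODEL_CPUS 4	/* hypothetical CPU count for this model */

/* Userspace stand-in for struct net_device; only the counter is modeled. */
struct model_dev {
	int pcpu_refcnt[NR_MODEL_CPUS];
};

/* Models dev_hold(): bump the counter of the CPU the caller runs on. */
static void model_dev_hold(struct model_dev *dev, int cpu)
{
	dev->pcpu_refcnt[cpu]++;
}

/* Models dev_put(): drop the counter of the CPU the caller runs on. */
static void model_dev_put(struct model_dev *dev, int cpu)
{
	dev->pcpu_refcnt[cpu]--;
}

/* A single per-CPU slot may be negative; only the sum is meaningful. */
static int model_refcnt_read(const struct model_dev *dev)
{
	int cpu, sum = 0;

	for (cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		sum += dev->pcpu_refcnt[cpu];
	return sum;
}

/* One balanced pair on different CPUs plus one hold that is never put. */
static int model_leak_demo(void)
{
	struct model_dev dev = { { 0 } };

	model_dev_hold(&dev, 0);
	model_dev_put(&dev, 2);		/* slot 2 is now -1, which is fine */
	model_dev_hold(&dev, 1);	/* leaked: no matching put */

	return model_refcnt_read(&dev);
}
```

In this model model_leak_demo() returns 1, and nothing in the counters themselves records which caller took the reference that was never dropped.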



* Re: [PATCH] net: remove static inline from dev_put/dev_hold
  2019-11-11 14:05 [PATCH] net: remove static inline from dev_put/dev_hold Tony Lu
@ 2019-11-11 16:56 ` Stephen Hemminger
  2019-11-12  7:18   ` Tony Lu
  2019-11-11 17:21 ` Eric Dumazet
  2019-11-11 21:26 ` Cong Wang
  2 siblings, 1 reply; 7+ messages in thread
From: Stephen Hemminger @ 2019-11-11 16:56 UTC (permalink / raw)
  To: Tony Lu; +Cc: davem, shemminger, netdev, linux-kernel

On Mon, 11 Nov 2019 22:05:03 +0800
Tony Lu <tonylu@linux.alibaba.com> wrote:

> This patch removes static inline from dev_put/dev_hold in order to help
> trace the pcpu_refcnt leak of net_device.
> 
> We have suffered from this kind of issue several times while
> moving NICs between different net namespaces. It prints this
> log in dmesg:
> 
>   unregister_netdevice: waiting for eth0 to become free. Usage count = 1
> 
> However, it is hard to find out in time who took and leaked the refcnt.
> The leak leaves a crime scene but little evidence. Once leaked, it is not
> safe to fix it up on the running host. We can't trace dev_put/dev_hold
> directly, because the functions are inlined and used widely among modules.
> This issue is also common: there are tens of patches that fix net_device
> refcnt leaks with various causes.
> 
> To trace the refcnt manipulation, this patch removes static inline from
> dev_put/dev_hold. We can then use handy tools, such as eBPF with kprobes,
> to find out who holds but forgets to put a refcnt. These functions are not
> called frequently, so the overhead is limited.
> 
> Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>

In the past dev_hold/dev_put was in the hot path for several
operations. What is the performance implication of doing this?


* Re: [PATCH] net: remove static inline from dev_put/dev_hold
  2019-11-11 14:05 [PATCH] net: remove static inline from dev_put/dev_hold Tony Lu
  2019-11-11 16:56 ` Stephen Hemminger
@ 2019-11-11 17:21 ` Eric Dumazet
  2019-11-12  9:48   ` Tony Lu
  2019-11-11 21:26 ` Cong Wang
  2 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2019-11-11 17:21 UTC (permalink / raw)
  To: Tony Lu, davem; +Cc: shemminger, netdev, linux-kernel



On 11/11/19 6:05 AM, Tony Lu wrote:
> This patch removes static inline from dev_put/dev_hold in order to help
> trace the pcpu_refcnt leak of net_device.
> 
> We have suffered from this kind of issue several times while
> moving NICs between different net namespaces. It prints this
> log in dmesg:
> 
>   unregister_netdevice: waiting for eth0 to become free. Usage count = 1
> 
> However, it is hard to find out in time who took and leaked the refcnt.
> The leak leaves a crime scene but little evidence. Once leaked, it is not
> safe to fix it up on the running host. We can't trace dev_put/dev_hold
> directly, because the functions are inlined and used widely among modules.
> This issue is also common: there are tens of patches that fix net_device
> refcnt leaks with various causes.
> 
> To trace the refcnt manipulation, this patch removes static inline from
> dev_put/dev_hold. We can then use handy tools, such as eBPF with kprobes,
> to find out who holds but forgets to put a refcnt. These functions are not
> called frequently, so the overhead is limited.
>

This looks like a first step.

But I would rather get a full set of scripts/debugging features,
instead of something that most people can not use right now.

Please share the whole thing.



* Re: [PATCH] net: remove static inline from dev_put/dev_hold
  2019-11-11 14:05 [PATCH] net: remove static inline from dev_put/dev_hold Tony Lu
  2019-11-11 16:56 ` Stephen Hemminger
  2019-11-11 17:21 ` Eric Dumazet
@ 2019-11-11 21:26 ` Cong Wang
  2019-11-12  8:47   ` Tony Lu
  2 siblings, 1 reply; 7+ messages in thread
From: Cong Wang @ 2019-11-11 21:26 UTC (permalink / raw)
  To: Tony Lu; +Cc: David Miller, shemminger, Linux Kernel Network Developers, LKML

On Mon, Nov 11, 2019 at 6:12 AM Tony Lu <tonylu@linux.alibaba.com> wrote:
>
> This patch removes static inline from dev_put/dev_hold in order to help
> trace the pcpu_refcnt leak of net_device.
>
> We have suffered from this kind of issue several times while
> moving NICs between different net namespaces. It prints this
> log in dmesg:
>
>   unregister_netdevice: waiting for eth0 to become free. Usage count = 1

I debugged a nasty dst refcnt leak in TCP a long time ago, so I can
feel your pain.


>
> However, it is hard to find out in time who took and leaked the refcnt.
> The leak leaves a crime scene but little evidence. Once leaked, it is not
> safe to fix it up on the running host. We can't trace dev_put/dev_hold
> directly, because the functions are inlined and used widely among modules.
> This issue is also common: there are tens of patches that fix net_device
> refcnt leaks with various causes.
>
> To trace the refcnt manipulation, this patch removes static inline from
> dev_put/dev_hold. We can then use handy tools, such as eBPF with kprobes,
> to find out who holds but forgets to put a refcnt. These functions are not
> called frequently, so the overhead is limited.

I think tracepoint serves the purpose of tracking function call history,
you can add tracepoint for each of dev_put()/dev_hold(), which could
also inherit the trace filter and trigger too.

The netdev refcnt itself is not changed very frequently, but it is
refcnt'ed by other things like dst too which is changed frequently.
This is why usually when you see the netdev refcnt leak warning,
the problem is probably somewhere else, like dst refcnt leak.

Hope this helps.

Thanks.
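
To make the point about call history concrete, here is a small userspace sketch of what one could do with a stream of traced hold/put events: keep a balance per owner and report the owner whose holds and puts do not cancel. Everything here is hypothetical (struct ref_trace, the owner labels, the event stream); a real trace would carry stack traces or caller addresses, and grouping them by owner is an extra step this model assumes has already been done.

```c
#include <stddef.h>
#include <string.h>

#define MAX_OWNERS 8	/* arbitrary limit for this model */

struct owner_balance {
	const char *owner;	/* hypothetical label for the code taking the ref */
	long balance;		/* holds minus puts recorded for this owner */
};

struct ref_trace {
	struct owner_balance owners[MAX_OWNERS];
	size_t nr_owners;
};

/* Record one traced event: delta is +1 for a hold, -1 for a put. */
static void trace_event(struct ref_trace *t, const char *owner, int delta)
{
	size_t i;

	for (i = 0; i < t->nr_owners; i++) {
		if (strcmp(t->owners[i].owner, owner) == 0) {
			t->owners[i].balance += delta;
			return;
		}
	}
	/* First event for this owner; silently dropped if the table is full. */
	if (t->nr_owners < MAX_OWNERS) {
		t->owners[t->nr_owners].owner = owner;
		t->owners[t->nr_owners].balance = delta;
		t->nr_owners++;
	}
}

/* Return the first owner whose holds and puts do not cancel, or NULL. */
static const char *trace_first_leak(const struct ref_trace *t)
{
	size_t i;

	for (i = 0; i < t->nr_owners; i++)
		if (t->owners[i].balance != 0)
			return t->owners[i].owner;
	return NULL;
}

/* Example stream: the "dst" owner takes two references but drops only one. */
static const char *trace_demo(void)
{
	struct ref_trace t = { 0 };

	trace_event(&t, "rtnetlink", +1);
	trace_event(&t, "dst", +1);
	trace_event(&t, "dst", +1);
	trace_event(&t, "rtnetlink", -1);
	trace_event(&t, "dst", -1);

	return trace_first_leak(&t);
}
```

In this made-up stream the netdev usage count stays at 1, and the per-owner balance points at "dst" rather than at the code that unregisters the device, which mirrors the observation above that the root cause is often somewhere else.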


* Re: [PATCH] net: remove static inline from dev_put/dev_hold
  2019-11-11 16:56 ` Stephen Hemminger
@ 2019-11-12  7:18   ` Tony Lu
  0 siblings, 0 replies; 7+ messages in thread
From: Tony Lu @ 2019-11-12  7:18 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, shemminger, netdev, linux-kernel

On Mon, Nov 11, 2019 at 08:56:32AM -0800, Stephen Hemminger wrote:
> On Mon, 11 Nov 2019 22:05:03 +0800
> Tony Lu <tonylu@linux.alibaba.com> wrote:
> 
> > This patch removes static inline from dev_put/dev_hold in order to help
> > trace the pcpu_refcnt leak of net_device.
> > 
> > We have suffered from this kind of issue several times while
> > moving NICs between different net namespaces. It prints this
> > log in dmesg:
> > 
> >   unregister_netdevice: waiting for eth0 to become free. Usage count = 1
> > 
> > However, it is hard to find out in time who took and leaked the refcnt.
> > The leak leaves a crime scene but little evidence. Once leaked, it is not
> > safe to fix it up on the running host. We can't trace dev_put/dev_hold
> > directly, because the functions are inlined and used widely among modules.
> > This issue is also common: there are tens of patches that fix net_device
> > refcnt leaks with various causes.
> > 
> > To trace the refcnt manipulation, this patch removes static inline from
> > dev_put/dev_hold. We can then use handy tools, such as eBPF with kprobes,
> > to find out who holds but forgets to put a refcnt. These functions are not
> > called frequently, so the overhead is limited.
> > 
> > Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
> 
> In the past dev_hold/dev_put was in the hot path for several
> operations. What is the performance implication of doing this?

From code analysis, there should be a small performance regression.
I don't have benchmark data yet. I will write a kernel module to
run a series of benchmarks for dev_put/dev_hold. There is actually a plan
to provide a complete solution for this issue. The benchmarks will be done
after that.

Cheers
Tony Lu
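
As a rough illustration of the kind of comparison being discussed, here is a userspace sketch that times an inlined increment against an out-of-line call doing the same work. It is an analogy only: it assumes GCC or Clang attributes and a POSIX clock, it does not model per-CPU operations, exported symbols or retpolines, and its numbers say nothing definitive about dev_put()/dev_hold() in the kernel. All names are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L	/* for clock_gettime() under strict modes */
#endif
#include <stdint.h>
#include <time.h>

/* volatile keeps the compiler from collapsing the loops into one add. */
static volatile long counter_inline;
static volatile long counter_call;

/* Stand-in for the current static inline helper. */
static inline __attribute__((always_inline)) void hold_inline(void)
{
	counter_inline++;
}

/* Stand-in for the proposed out-of-line function. */
static __attribute__((noinline)) void hold_call(void)
{
	counter_call++;
}

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Returns elapsed nanoseconds for iters inlined increments. */
static uint64_t bench_inline(long iters)
{
	uint64_t start = now_ns();
	long i;

	for (i = 0; i < iters; i++)
		hold_inline();
	return now_ns() - start;
}

/* Returns elapsed nanoseconds for iters out-of-line calls. */
static uint64_t bench_call(long iters)
{
	uint64_t start = now_ns();
	long i;

	for (i = 0; i < iters; i++)
		hold_call();
	return now_ns() - start;
}
```

Calling bench_inline(n) and bench_call(n) with the same large n and comparing the returned times gives a first impression of the call overhead on one machine; a kernel-side benchmark would still be needed to answer the hot-path question.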


* Re: [PATCH] net: remove static inline from dev_put/dev_hold
  2019-11-11 21:26 ` Cong Wang
@ 2019-11-12  8:47   ` Tony Lu
  0 siblings, 0 replies; 7+ messages in thread
From: Tony Lu @ 2019-11-12  8:47 UTC (permalink / raw)
  To: Cong Wang; +Cc: David Miller, shemminger, Linux Kernel Network Developers, LKML

On Mon, Nov 11, 2019 at 01:26:13PM -0800, Cong Wang wrote:
> On Mon, Nov 11, 2019 at 6:12 AM Tony Lu <tonylu@linux.alibaba.com> wrote:
> >
> > This patch removes static inline from dev_put/dev_hold in order to help
> > trace the pcpu_refcnt leak of net_device.
> >
> > We have suffered from this kind of issue several times while
> > moving NICs between different net namespaces. It prints this
> > log in dmesg:
> >
> >   unregister_netdevice: waiting for eth0 to become free. Usage count = 1
> 
> I debugged a nasty dst refcnt leak in TCP a long time ago, so I can
> feel your pain.
> 
> 
> >
> > However, it is hard to find out in time who took and leaked the refcnt.
> > The leak leaves a crime scene but little evidence. Once leaked, it is not
> > safe to fix it up on the running host. We can't trace dev_put/dev_hold
> > directly, because the functions are inlined and used widely among modules.
> > This issue is also common: there are tens of patches that fix net_device
> > refcnt leaks with various causes.
> >
> > To trace the refcnt manipulation, this patch removes static inline from
> > dev_put/dev_hold. We can then use handy tools, such as eBPF with kprobes,
> > to find out who holds but forgets to put a refcnt. These functions are not
> > called frequently, so the overhead is limited.
> 
> I think tracepoint serves the purpose of tracking function call history,
> you can add tracepoint for each of dev_put()/dev_hold(), which could
> also inherit the trace filter and trigger too.

Thanks for your advice. I have already made a patch set that adds a pair of
tracepoints to trace dev_hold()/dev_put() as an alternative solution. I
wanted to offer a flexible approach so that people can choose.
I will send it out later.

> 
> The netdev refcnt itself is not changed very frequently, but it is
> refcnt'ed by other things like dst too which is changed frequently.
> This is why usually when you see the netdev refcnt leak warning,
> the problem is probably somewhere else, like dst refcnt leak.

We have also suffered from dst refcnt leaks before. They are really hard to
investigate. I will look into this area carefully.

> 
> Hope this helps.
> 
> Thanks.


Thanks.
Tony Lu


* Re: [PATCH] net: remove static inline from dev_put/dev_hold
  2019-11-11 17:21 ` Eric Dumazet
@ 2019-11-12  9:48   ` Tony Lu
  0 siblings, 0 replies; 7+ messages in thread
From: Tony Lu @ 2019-11-12  9:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, shemminger, netdev, linux-kernel

On Mon, Nov 11, 2019 at 09:21:58AM -0800, Eric Dumazet wrote:
> 
> 
> On 11/11/19 6:05 AM, Tony Lu wrote:
> > This patch removes static inline from dev_put/dev_hold in order to help
> > trace the pcpu_refcnt leak of net_device.
> > 
> > We have suffered from this kind of issue several times while
> > moving NICs between different net namespaces. It prints this
> > log in dmesg:
> > 
> >   unregister_netdevice: waiting for eth0 to become free. Usage count = 1
> > 
> > However, it is hard to find out in time who took and leaked the refcnt.
> > The leak leaves a crime scene but little evidence. Once leaked, it is not
> > safe to fix it up on the running host. We can't trace dev_put/dev_hold
> > directly, because the functions are inlined and used widely among modules.
> > This issue is also common: there are tens of patches that fix net_device
> > refcnt leaks with various causes.
> > 
> > To trace the refcnt manipulation, this patch removes static inline from
> > dev_put/dev_hold. We can then use handy tools, such as eBPF with kprobes,
> > to find out who holds but forgets to put a refcnt. These functions are not
> > called frequently, so the overhead is limited.
> >
> 
> This looks like a first step.

Yes, I wanted to offer a flexible approach so that people could
choose the tools they want. I have also already made a patch that puts a
pair of tracepoints into dev_put()/dev_hold() to trace that. I will send it
out later.

> 
> But I would rather get a full set of scripts/debugging features,
> instead of something that most people can not use right now.
> 
> Please share the whole thing.

Thanks.
Tony Lu
