netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 1/2] net: remove static inline from dev_put()/dev_hold()
@ 2019-11-12 13:05 Tony Lu
  2019-11-12 13:05 ` [PATCH v2 2/2] net: add trace events for net_device refcnt Tony Lu
  0 siblings, 1 reply; 4+ messages in thread
From: Tony Lu @ 2019-11-12 13:05 UTC (permalink / raw)
  To: davem; +Cc: xiyou.wangcong, eric.dumazet, shemminger, netdev, linux-kernel

This patch removes static inline from dev_put()/dev_hold() in order to help
trace the pcpu_refcnt leak of net_device.

We have sufferred this kind of issue for several times during
manipulating NIC between different net namespaces. It prints this
log in dmesg:

  unregister_netdevice: waiting for eth0 to become free. Usage count = 1

However, it is hard to find out who called and leaked refcnt in time. It
only left the crime scene but few evidence. Once leaked, it is not
safe to fix it up on the running host. We can't trace dev_put/dev_hold
directly, for the functions are inlined and used wildly amoung modules.
And this issue is common, there are tens of patches fixes net_device
refcnt leak for various causes.

To trace the refcnt manipulating, this patch removes static inline from
dev_put()/dev_hold(). We can use tools, such as eBPF with kprobe, to
find out who holds but forgets to put refcnt. This will not be called
frequently, so the overhead is limited.

Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
---
v1->v2:
- provides two trace events and tracepoitns together in patch #2
- fix some typos in commit message
---
 include/linux/netdevice.h | 24 ++++--------------------
 net/core/dev.c            | 24 ++++++++++++++++++++++++
 2 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c20f190b4c18..872d266c6da5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3720,27 +3720,11 @@ extern unsigned int	netdev_budget_usecs;
 /* Called by rtnetlink.c:rtnl_unlock() */
 void netdev_run_todo(void);
 
-/**
- *	dev_put - release reference to device
- *	@dev: network device
- *
- * Release reference to device to allow it to be freed.
- */
-static inline void dev_put(struct net_device *dev)
-{
-	this_cpu_dec(*dev->pcpu_refcnt);
-}
+/* Release reference to device to allow it to be freed. */
+void dev_put(struct net_device *dev);
 
-/**
- *	dev_hold - get reference to device
- *	@dev: network device
- *
- * Hold reference to device to keep it from being freed.
- */
-static inline void dev_hold(struct net_device *dev)
-{
-	this_cpu_inc(*dev->pcpu_refcnt);
-}
+/* Hold reference to device to keep it from being freed. */
+void dev_hold(struct net_device *dev);
 
 /* Carrier loss detection, dial on demand. The functions netif_carrier_on
  * and _off may be called from IRQ context, but it is caller
diff --git a/net/core/dev.c b/net/core/dev.c
index 99ac84ff398f..620fb3d6718a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1294,6 +1294,30 @@ void netdev_notify_peers(struct net_device *dev)
 }
 EXPORT_SYMBOL(netdev_notify_peers);
 
+/**
+ *	dev_put - release reference to device
+ *	@dev: network device
+ *
+ * Release reference to device to allow it to be freed.
+ */
+void dev_put(struct net_device *dev)
+{
+	this_cpu_dec(*dev->pcpu_refcnt);
+}
+EXPORT_SYMBOL(dev_put);
+
+/**
+ *	dev_hold - get reference to device
+ *	@dev: network device
+ *
+ * Hold reference to device to keep it from being freed.
+ */
+void dev_hold(struct net_device *dev)
+{
+	this_cpu_inc(*dev->pcpu_refcnt);
+}
+EXPORT_SYMBOL(dev_hold);
+
 static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v2 2/2] net: add trace events for net_device refcnt
  2019-11-12 13:05 [PATCH v2 1/2] net: remove static inline from dev_put()/dev_hold() Tony Lu
@ 2019-11-12 13:05 ` Tony Lu
  2019-11-12 15:27   ` David Ahern
  0 siblings, 1 reply; 4+ messages in thread
From: Tony Lu @ 2019-11-12 13:05 UTC (permalink / raw)
  To: davem; +Cc: xiyou.wangcong, eric.dumazet, shemminger, netdev, linux-kernel

The net_device refcnt leak is hard to trace and debug for now. We need
the ability to know when and who manipulated the refcnt.

Adding the trace events for net_device pcpu_refcnt and also tracepoints
in dev_put()/dev_hold(), provides the history of net_device refcnt inc
and desc. With trace logs analysis, paring the put and hold history, we
can find out who leaked.

Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
---
 include/trace/events/net.h | 41 ++++++++++++++++++++++++++++++++++++++
 net/core/dev.c             |  4 ++++
 2 files changed, 45 insertions(+)

diff --git a/include/trace/events/net.h b/include/trace/events/net.h
index 3b28843652d2..3bf6dd738882 100644
--- a/include/trace/events/net.h
+++ b/include/trace/events/net.h
@@ -326,6 +326,47 @@ DEFINE_EVENT(net_dev_rx_exit_template, netif_receive_skb_list_exit,
 	TP_ARGS(ret)
 );
 
+DECLARE_EVENT_CLASS(net_dev_refcnt_template,
+
+	TP_PROTO(struct net_device *dev, void *location),
+
+	TP_ARGS(dev, location),
+
+	TP_STRUCT__entry(
+		__string(	name,		dev->name	)
+		__field(	int,		refcnt		)
+		__field(	void *,		location	)
+	),
+
+	TP_fast_assign(
+		int i, refcnt = 0;
+
+		for_each_possible_cpu(i)
+			refcnt += *per_cpu_ptr(dev->pcpu_refcnt, i);
+
+		__assign_str(name, dev->name);
+		__entry->refcnt = refcnt;
+		__entry->location = location;
+	),
+
+	TP_printk("dev=%s refcnt=%d location=%p",
+		__get_str(name), __entry->refcnt, __entry->location)
+);
+
+DEFINE_EVENT(net_dev_refcnt_template, net_dev_put,
+
+	TP_PROTO(struct net_device *dev, void *location),
+
+	TP_ARGS(dev, location)
+);
+
+DEFINE_EVENT(net_dev_refcnt_template, net_dev_hold,
+
+	TP_PROTO(struct net_device *dev, void *location),
+
+	TP_ARGS(dev, location)
+);
+
 #endif /* _TRACE_NET_H */
 
 /* This part must be outside protection */
diff --git a/net/core/dev.c b/net/core/dev.c
index 620fb3d6718a..163870a09984 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1302,6 +1302,8 @@ EXPORT_SYMBOL(netdev_notify_peers);
  */
 void dev_put(struct net_device *dev)
 {
+	trace_net_dev_put(dev, __builtin_return_address(0));
+
 	this_cpu_dec(*dev->pcpu_refcnt);
 }
 EXPORT_SYMBOL(dev_put);
@@ -1314,6 +1316,8 @@ EXPORT_SYMBOL(dev_put);
  */
 void dev_hold(struct net_device *dev)
 {
+	trace_net_dev_hold(dev, __builtin_return_address(0));
+
 	this_cpu_inc(*dev->pcpu_refcnt);
 }
 EXPORT_SYMBOL(dev_hold);
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 2/2] net: add trace events for net_device refcnt
  2019-11-12 13:05 ` [PATCH v2 2/2] net: add trace events for net_device refcnt Tony Lu
@ 2019-11-12 15:27   ` David Ahern
  2019-11-16 19:47     ` Cong Wang
  0 siblings, 1 reply; 4+ messages in thread
From: David Ahern @ 2019-11-12 15:27 UTC (permalink / raw)
  To: Tony Lu, davem
  Cc: xiyou.wangcong, eric.dumazet, shemminger, netdev, linux-kernel

On 11/12/19 6:05 AM, Tony Lu wrote:
> diff --git a/include/trace/events/net.h b/include/trace/events/net.h
> index 3b28843652d2..3bf6dd738882 100644
> --- a/include/trace/events/net.h
> +++ b/include/trace/events/net.h
> @@ -326,6 +326,47 @@ DEFINE_EVENT(net_dev_rx_exit_template, netif_receive_skb_list_exit,
>  	TP_ARGS(ret)
>  );
>  
> +DECLARE_EVENT_CLASS(net_dev_refcnt_template,
> +
> +	TP_PROTO(struct net_device *dev, void *location),
> +
> +	TP_ARGS(dev, location),
> +
> +	TP_STRUCT__entry(
> +		__string(	name,		dev->name	)
> +		__field(	int,		refcnt		)
> +		__field(	void *,		location	)
> +	),
> +
> +	TP_fast_assign(
> +		int i, refcnt = 0;
> +
> +		for_each_possible_cpu(i)
> +			refcnt += *per_cpu_ptr(dev->pcpu_refcnt, i);

Rather than copying the definition of netdev_refcnt_read here, so just
use it.

> +
> +		__assign_str(name, dev->name);
> +		__entry->refcnt = refcnt;
> +		__entry->location = location;
> +	),
> +
> +	TP_printk("dev=%s refcnt=%d location=%p",
> +		__get_str(name), __entry->refcnt, __entry->location)
> +);
> +
> +DEFINE_EVENT(net_dev_refcnt_template, net_dev_put,
> +
> +	TP_PROTO(struct net_device *dev, void *location),
> +
> +	TP_ARGS(dev, location)
> +);
> +
> +DEFINE_EVENT(net_dev_refcnt_template, net_dev_hold,
> +
> +	TP_PROTO(struct net_device *dev, void *location),
> +
> +	TP_ARGS(dev, location)
> +);
> +
>  #endif /* _TRACE_NET_H */
>  

The location alone does nothing for resolving reference count leaks; you
really have to use stack traces to pair up the hold and put and to give
context for what are repetitive locations for the hold and put.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 2/2] net: add trace events for net_device refcnt
  2019-11-12 15:27   ` David Ahern
@ 2019-11-16 19:47     ` Cong Wang
  0 siblings, 0 replies; 4+ messages in thread
From: Cong Wang @ 2019-11-16 19:47 UTC (permalink / raw)
  To: David Ahern
  Cc: Tony Lu, David Miller, Eric Dumazet, shemminger,
	Linux Kernel Network Developers, LKML

On Tue, Nov 12, 2019 at 7:27 AM David Ahern <dsahern@gmail.com> wrote:
> The location alone does nothing for resolving reference count leaks; you
> really have to use stack traces to pair up the hold and put and to give
> context for what are repetitive locations for the hold and put.

+1

And, as you are using tracepoint, you get trace filter and trigger for free,
which already includes stacktrace.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-11-16 19:47 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-12 13:05 [PATCH v2 1/2] net: remove static inline from dev_put()/dev_hold() Tony Lu
2019-11-12 13:05 ` [PATCH v2 2/2] net: add trace events for net_device refcnt Tony Lu
2019-11-12 15:27   ` David Ahern
2019-11-16 19:47     ` Cong Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).