netdev.vger.kernel.org archive mirror
* [Patch net-next v2] net: add a generic tracepoint for TX queue timeout
@ 2019-05-02  2:56 Cong Wang
  2019-05-02  7:06 ` Jiri Pirko
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Cong Wang @ 2019-05-02  2:56 UTC
  To: netdev; +Cc: Cong Wang, Eran Ben Elisha, Jiri Pirko

Although devlink health reporting does a nice job of reporting TX
timeouts and other NIC errors, it requires driver support, and
currently only mlx5 implements it. Until other drivers catch up,
it is useful to have a generic tracepoint for monitoring this kind
of TX timeout. We have been hitting TX timeouts with several
different drivers and plan to monitor them with rasdaemon, which
just needs a new tracepoint.

Sample output:

  ksoftirqd/1-16    [001] ..s2   144.043173: net_dev_xmit_timeout: dev=ens3 driver=e1000 queue=0

Cc: Eran Ben Elisha <eranbe@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 include/trace/events/net.h | 23 +++++++++++++++++++++++
 net/sched/sch_generic.c    |  2 ++
 2 files changed, 25 insertions(+)

diff --git a/include/trace/events/net.h b/include/trace/events/net.h
index 1efd7d9b25fe..2399073c3afc 100644
--- a/include/trace/events/net.h
+++ b/include/trace/events/net.h
@@ -95,6 +95,29 @@ TRACE_EVENT(net_dev_xmit,
 		__get_str(name), __entry->skbaddr, __entry->len, __entry->rc)
 );
 
+TRACE_EVENT(net_dev_xmit_timeout,
+
+	TP_PROTO(struct net_device *dev,
+		 int queue_index),
+
+	TP_ARGS(dev, queue_index),
+
+	TP_STRUCT__entry(
+		__string(	name,		dev->name	)
+		__string(	driver,		netdev_drivername(dev))
+		__field(	int,		queue_index	)
+	),
+
+	TP_fast_assign(
+		__assign_str(name, dev->name);
+		__assign_str(driver, netdev_drivername(dev));
+		__entry->queue_index = queue_index;
+	),
+
+	TP_printk("dev=%s driver=%s queue=%d",
+		__get_str(name), __get_str(driver), __entry->queue_index)
+);
+
 DECLARE_EVENT_CLASS(net_dev_template,
 
 	TP_PROTO(struct sk_buff *skb),
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 848aab3693bd..cce1e9ee85af 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -32,6 +32,7 @@
 #include <net/pkt_sched.h>
 #include <net/dst.h>
 #include <trace/events/qdisc.h>
+#include <trace/events/net.h>
 #include <net/xfrm.h>
 
 /* Qdisc to use by default */
@@ -441,6 +442,7 @@ static void dev_watchdog(struct timer_list *t)
 			}
 
 			if (some_queue_timedout) {
+				trace_net_dev_xmit_timeout(dev, i);
 				WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit queue %u timed out\n",
 				       dev->name, netdev_drivername(dev), i);
 				dev->netdev_ops->ndo_tx_timeout(dev);
-- 
2.20.1
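
For readers who want to consume this event from a plain-text trace
stream, here is a minimal userspace sketch in C that picks the three
fields out of a line shaped like the sample output above. It is an
illustration only, not part of the patch, and not how rasdaemon reads
events. The names struct xmit_timeout_event and parse_xmit_timeout()
are hypothetical, and the 64-byte bound on the driver name is an
assumption.

```c
#include <stdio.h>
#include <string.h>

/* IFNAMSIZ in the kernel is 16; the driver-name bound is an assumption. */
#define DEV_NAME_MAX    16
#define DRIVER_NAME_MAX 64

/* Hypothetical container for the fields printed by the tracepoint. */
struct xmit_timeout_event {
	char dev[DEV_NAME_MAX];
	char driver[DRIVER_NAME_MAX];
	int queue;
};

/*
 * Parse one text trace line. Returns 0 on success, -1 if the line is
 * not a net_dev_xmit_timeout event or does not match the expected
 * "dev=%s driver=%s queue=%d" layout. An empty driver name makes the
 * match fail, so such a line is reported as malformed.
 */
static int parse_xmit_timeout(const char *line, struct xmit_timeout_event *ev)
{
	static const char marker[] = "net_dev_xmit_timeout: ";
	const char *p = strstr(line, marker);

	if (!p)
		return -1;
	p += sizeof(marker) - 1;

	/* Field widths are one less than the buffer sizes above. */
	if (sscanf(p, "dev=%15s driver=%63s queue=%d",
		   ev->dev, ev->driver, &ev->queue) != 3)
		return -1;

	return 0;
}
```

A caller would read lines from the trace output, pass each one to
parse_xmit_timeout(), and act only on those that return 0.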



* Re: [Patch net-next v2] net: add a generic tracepoint for TX queue timeout
  2019-05-02  2:56 [Patch net-next v2] net: add a generic tracepoint for TX queue timeout Cong Wang
@ 2019-05-02  7:06 ` Jiri Pirko
  2019-05-02  8:44 ` Eran Ben Elisha
  2019-05-04  4:43 ` David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: Jiri Pirko @ 2019-05-02  7:06 UTC
  To: Cong Wang; +Cc: netdev, Eran Ben Elisha, Jiri Pirko

Thu, May 02, 2019 at 04:56:59AM CEST, xiyou.wangcong@gmail.com wrote:
>Although devlink health report does a nice job on reporting TX
>timeout and other NIC errors, unfortunately it requires drivers
>to support it but currently only mlx5 has implemented it.
>Before other drivers could catch up, it is useful to have a
>generic tracepoint to monitor this kind of TX timeout. We have
>been suffering TX timeout with different drivers, we plan to
>start to monitor it with rasdaemon which just needs a new tracepoint.
>
>Sample output:
>
>  ksoftirqd/1-16    [001] ..s2   144.043173: net_dev_xmit_timeout: dev=ens3 driver=e1000 queue=0
>
>Cc: Eran Ben Elisha <eranbe@mellanox.com>
>Cc: Jiri Pirko <jiri@mellanox.com>
>Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Useful. Thanks!

Acked-by: Jiri Pirko <jiri@mellanox.com>


* Re: [Patch net-next v2] net: add a generic tracepoint for TX queue timeout
  2019-05-02  2:56 [Patch net-next v2] net: add a generic tracepoint for TX queue timeout Cong Wang
  2019-05-02  7:06 ` Jiri Pirko
@ 2019-05-02  8:44 ` Eran Ben Elisha
  2019-05-04  4:43 ` David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: Eran Ben Elisha @ 2019-05-02  8:44 UTC
  To: Cong Wang, netdev; +Cc: Jiri Pirko



On 5/2/2019 5:56 AM, Cong Wang wrote:
> Although devlink health report does a nice job on reporting TX
> timeout and other NIC errors, unfortunately it requires drivers
> to support it but currently only mlx5 has implemented it.
> Before other drivers could catch up, it is useful to have a
> generic tracepoint to monitor this kind of TX timeout. We have
> been suffering TX timeout with different drivers, we plan to
> start to monitor it with rasdaemon which just needs a new tracepoint.
> 
> Sample output:
> 
>    ksoftirqd/1-16    [001] ..s2   144.043173: net_dev_xmit_timeout: dev=ens3 driver=e1000 queue=0
> 
> Cc: Eran Ben Elisha <eranbe@mellanox.com>
> Cc: Jiri Pirko <jiri@mellanox.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>


* Re: [Patch net-next v2] net: add a generic tracepoint for TX queue timeout
  2019-05-02  2:56 [Patch net-next v2] net: add a generic tracepoint for TX queue timeout Cong Wang
  2019-05-02  7:06 ` Jiri Pirko
  2019-05-02  8:44 ` Eran Ben Elisha
@ 2019-05-04  4:43 ` David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: David Miller @ 2019-05-04  4:43 UTC
  To: xiyou.wangcong; +Cc: netdev, eranbe, jiri

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Wed,  1 May 2019 19:56:59 -0700

> Although devlink health report does a nice job on reporting TX
> timeout and other NIC errors, unfortunately it requires drivers
> to support it but currently only mlx5 has implemented it.
> Before other drivers could catch up, it is useful to have a
> generic tracepoint to monitor this kind of TX timeout. We have
> been suffering TX timeout with different drivers, we plan to
> start to monitor it with rasdaemon which just needs a new tracepoint.
> 
> Sample output:
> 
>   ksoftirqd/1-16    [001] ..s2   144.043173: net_dev_xmit_timeout: dev=ens3 driver=e1000 queue=0
> 
> Cc: Eran Ben Elisha <eranbe@mellanox.com>
> Cc: Jiri Pirko <jiri@mellanox.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied, thanks Cong.

