All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v3 0/5] netdev: show a process of packets
@ 2010-07-20  0:43 Koki Sanagi
  2010-07-20  0:45 ` [RFC PATCH v3 1/5] irq: add tracepoint to softirq_raise Koki Sanagi
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: Koki Sanagi @ 2010-07-20  0:43 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt,
	eric.dumazet, fweisbec, mathieu.desnoyers

CHANGE-LOG since v2:
    1) let all tracepoints of softirq use DECLARE_EVENT_CLASS
    2) let tracepoint of netdev_queue and netdev_receive use DECLARE_EVENT_CLASS
    3) add tracepoint to skb_free_datagram_locked
    4) show function and time when received packet is freed

These patch-set adds tracepoints to show us a process of packets.
Using these tracepoints and existing points, we can get the time when
packet passes through some points in transmit or receive sequence.
For example, this is an output of perf script which is attached by patch 5/5.

79074.756672832sec cpu=1
irq_entry(+0.000000msec,irq=77:eth3)
         |------------softirq_raise(+0.001277msec)
irq_exit (+0.002278msec)     |
                             |
                      softirq_entry(+0.003562msec)
                             |
                             |---netif_receive_skb(+0.006279msec,len=100)
                             |            |
                             |   skb_copy_datagram_iovec(+0.038778msec, 2285:sshd)
                             |
                      napi_poll_exit(+0.017160msec, eth3)
                             |
                      softirq_exit(+0.018248msec)

The above is a receive side. Like this, it can show receive sequence from
interrupt(irq_entry) to application(skb_copy_datagram_iovec). There are eight
points in this side. All events except for skb_copy_datagram_iovec and
freeing skb events can be associated with each other by CPU number.
skb_copy_datagram_iovec and freeing skb events can be associated with
netif_receive_skb by skbaddr.
This script shows one NET_RX softirq and events related to it. All relative
time bases on first irq_entry which raise NET_RX softirq.

   dev    len      Qdisc               netdevice             free
   eth3   114  79044.417123332sec     0.005242msec          0.103843msec
   eth3   114  79044.580090422sec     0.002306msec          0.103632msec
   eth3   114  79044.719078251sec     0.002288msec          0.104093msec

The above is a transmit side. There are three tracepoints in this side.
Point1 is before putting a packet to Qdisc. point2 is after ndo_start_xmit in
dev_hard_start_xmit. It indicates finishing putting a packet to driver.
point3 is in consume_skb and dev_kfree_skb_irq. It indicates freeing a
transmitted packet.
Values of this script are, from left, device name, length of a packet, a time of
point1, an interval time between point1 and point2 and an interval time between
point2 and point3.

These times are useful to analyze a performance or to detect a point where
packet delays. For example,
- NET_RX softirq calling is late.
- Application is late to take a packet.
- It takes much time to put a transmitting packet to driver
  (It may be caused by packed queue)

And also, these tracepoint help us to investigate a network driver's trouble
from memory dump because ftrace records it to memory. And ftrace is so light
even if always trace on. So, in a case investigating a problem which doesn't
reproduce, it is useful.

Thanks,
Koki Sanagi.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH v3 1/5] irq: add tracepoint to softirq_raise
  2010-07-20  0:43 [RFC PATCH v3 0/5] netdev: show a process of packets Koki Sanagi
@ 2010-07-20  0:45 ` Koki Sanagi
  2010-07-20 11:04   ` Neil Horman
  2010-07-20  0:46 ` [RFC PATCH v3 2/5] napi: convert trace_napi_poll to TRACE_EVENT Koki Sanagi
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Koki Sanagi @ 2010-07-20  0:45 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt,
	eric.dumazet, fweisbec, mathieu.desnoyers

From: Lai Jiangshan <laijs@cn.fujitsu.com>

Add a tracepoint for tracing when softirq action is raised.

It and the existed tracepoints complete softirq's tracepoints:
softirq_raise, softirq_entry and softirq_exit.

And when this tracepoint is used in combination with
the softirq_entry tracepoint we can determine
the softirq raise latency.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

[ factorize softirq events with DECLARE_EVENT_CLASS ]
Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 include/linux/interrupt.h  |    8 +++++-
 include/trace/events/irq.h |   57 ++++++++++++++++++++++++++-----------------
 kernel/softirq.c           |    4 +-
 3 files changed, 43 insertions(+), 26 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index c233113..1cb5726 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -18,6 +18,7 @@
 #include <asm/atomic.h>
 #include <asm/ptrace.h>
 #include <asm/system.h>
+#include <trace/events/irq.h>
 
 /*
  * These correspond to the IORESOURCE_IRQ_* defines in
@@ -402,7 +403,12 @@ asmlinkage void do_softirq(void);
 asmlinkage void __do_softirq(void);
 extern void open_softirq(int nr, void (*action)(struct softirq_action *));
 extern void softirq_init(void);
-#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
+static inline void __raise_softirq_irqoff(unsigned int nr)
+{
+	trace_softirq_raise(nr);
+	or_softirq_pending(1UL << nr);
+}
+
 extern void raise_softirq_irqoff(unsigned int nr);
 extern void raise_softirq(unsigned int nr);
 extern void wakeup_softirqd(void);
diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
index 0e4cfb6..717744c 100644
--- a/include/trace/events/irq.h
+++ b/include/trace/events/irq.h
@@ -5,7 +5,9 @@
 #define _TRACE_IRQ_H
 
 #include <linux/tracepoint.h>
-#include <linux/interrupt.h>
+
+struct irqaction;
+struct softirq_action;
 
 #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
 #define show_softirq_name(val)				\
@@ -84,56 +86,65 @@ TRACE_EVENT(irq_handler_exit,
 
 DECLARE_EVENT_CLASS(softirq,
 
-	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
+	TP_PROTO(unsigned int nr),
 
-	TP_ARGS(h, vec),
+	TP_ARGS(nr),
 
 	TP_STRUCT__entry(
-		__field(	int,	vec			)
+		__field(	unsigned int,	vec	)
 	),
 
 	TP_fast_assign(
-		__entry->vec = (int)(h - vec);
+		__entry->vec	= nr;
 	),
 
 	TP_printk("vec=%d [action=%s]", __entry->vec,
-		  show_softirq_name(__entry->vec))
+		show_softirq_name(__entry->vec))
+);
+
+/**
+ * softirq_raise - called immediately when a softirq is raised
+ * @nr: softirq vector number
+ *
+ * Tracepoint for tracing when softirq action is raised.
+ * Also, when used in combination with the softirq_entry tracepoint
+ * we can determine the softirq raise latency.
+ */
+DEFINE_EVENT(softirq, softirq_raise,
+
+	TP_PROTO(unsigned int nr),
+
+	TP_ARGS(nr)
 );
 
 /**
  * softirq_entry - called immediately before the softirq handler
- * @h: pointer to struct softirq_action
- * @vec: pointer to first struct softirq_action in softirq_vec array
+ * @nr: softirq vector number
  *
- * The @h parameter, contains a pointer to the struct softirq_action
- * which has a pointer to the action handler that is called. By subtracting
- * the @vec pointer from the @h pointer, we can determine the softirq
- * number. Also, when used in combination with the softirq_exit tracepoint
+ * Tracepoint for tracing when softirq action starts.
+ * Also, when used in combination with the softirq_exit tracepoint
  * we can determine the softirq latency.
  */
 DEFINE_EVENT(softirq, softirq_entry,
 
-	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
+	TP_PROTO(unsigned int nr),
 
-	TP_ARGS(h, vec)
+	TP_ARGS(nr)
 );
 
 /**
  * softirq_exit - called immediately after the softirq handler returns
- * @h: pointer to struct softirq_action
- * @vec: pointer to first struct softirq_action in softirq_vec array
+ * @nr: softirq vector number
  *
- * The @h parameter contains a pointer to the struct softirq_action
- * that has handled the softirq. By subtracting the @vec pointer from
- * the @h pointer, we can determine the softirq number. Also, when used in
- * combination with the softirq_entry tracepoint we can determine the softirq
- * latency.
+ * Tracepoint for tracing when softirq action ends.
+ * Also, when used in combination with the softirq_entry tracepoint
+ * we can determine the softirq latency.
  */
 DEFINE_EVENT(softirq, softirq_exit,
 
-	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
+	TP_PROTO(unsigned int nr),
 
-	TP_ARGS(h, vec)
+	TP_ARGS(nr)
 );
 
 #endif /*  _TRACE_IRQ_H */
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 825e112..6790599 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -215,9 +215,9 @@ restart:
 			int prev_count = preempt_count();
 			kstat_incr_softirqs_this_cpu(h - softirq_vec);
 
-			trace_softirq_entry(h, softirq_vec);
+			trace_softirq_entry(h - softirq_vec);
 			h->action(h);
-			trace_softirq_exit(h, softirq_vec);
+			trace_softirq_exit(h - softirq_vec);
 			if (unlikely(prev_count != preempt_count())) {
 				printk(KERN_ERR "huh, entered softirq %td %s %p"
 				       "with preempt_count %08x,"


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC PATCH v3 2/5] napi: convert trace_napi_poll to TRACE_EVENT
  2010-07-20  0:43 [RFC PATCH v3 0/5] netdev: show a process of packets Koki Sanagi
  2010-07-20  0:45 ` [RFC PATCH v3 1/5] irq: add tracepoint to softirq_raise Koki Sanagi
@ 2010-07-20  0:46 ` Koki Sanagi
  2010-07-20 11:09   ` Neil Horman
  2010-07-20  0:47 ` [RFC PATCH v3 3/5] netdev: add tracepoints to netdev layer Koki Sanagi
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Koki Sanagi @ 2010-07-20  0:46 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt,
	eric.dumazet, fweisbec, mathieu.desnoyers

From: Neil Horman <nhorman@tuxdriver.com>

This patch converts trace_napi_poll from DECLARE_EVENT to TRACE_EVENT to improve
the usability of napi_poll tracepoint.

          <idle>-0     [001] 241302.750777: napi_poll: napi poll on napi struct f6acc480 for device eth3
          <idle>-0     [000] 241302.852389: napi_poll: napi poll on napi struct f5d0d70c for device eth1

An original patch is below.
http://marc.info/?l=linux-kernel&m=126021713809450&w=2
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

And add a fix by Steven Rostedt.
http://marc.info/?l=linux-kernel&m=126150506519173&w=2

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 include/trace/events/napi.h |   25 +++++++++++++++++++++++--
 1 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/napi.h b/include/trace/events/napi.h
index 188deca..8fe1e93 100644
--- a/include/trace/events/napi.h
+++ b/include/trace/events/napi.h
@@ -6,10 +6,31 @@
 
 #include <linux/netdevice.h>
 #include <linux/tracepoint.h>
+#include <linux/ftrace.h>
+
+#define NO_DEV "(no_device)"
+
+TRACE_EVENT(napi_poll,
 
-DECLARE_TRACE(napi_poll,
 	TP_PROTO(struct napi_struct *napi),
-	TP_ARGS(napi));
+
+	TP_ARGS(napi),
+
+	TP_STRUCT__entry(
+		__field(	struct napi_struct *,	napi)
+		__string(	dev_name, napi->dev ? napi->dev->name : NO_DEV)
+	),
+
+	TP_fast_assign(
+		__entry->napi = napi;
+		__assign_str(dev_name, napi->dev ? napi->dev->name : NO_DEV);
+	),
+
+	TP_printk("napi poll on napi struct %p for device %s",
+		__entry->napi, __get_str(dev_name))
+);
+
+#undef NO_DEV
 
 #endif /* _TRACE_NAPI_H_ */
 


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC PATCH v3 3/5] netdev: add tracepoints to netdev layer
  2010-07-20  0:43 [RFC PATCH v3 0/5] netdev: show a process of packets Koki Sanagi
  2010-07-20  0:45 ` [RFC PATCH v3 1/5] irq: add tracepoint to softirq_raise Koki Sanagi
  2010-07-20  0:46 ` [RFC PATCH v3 2/5] napi: convert trace_napi_poll to TRACE_EVENT Koki Sanagi
@ 2010-07-20  0:47 ` Koki Sanagi
  2010-07-20 11:41   ` Neil Horman
  2010-07-20  0:49 ` [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb Koki Sanagi
  2010-07-20  0:50 ` [RFC PATCH v3 5/5] perf:add a script shows a process of packet Koki Sanagi
  4 siblings, 1 reply; 25+ messages in thread
From: Koki Sanagi @ 2010-07-20  0:47 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt,
	eric.dumazet, fweisbec, mathieu.desnoyers

This patch adds tracepoint to dev_queue_xmit, dev_hard_start_xmit and
netif_receive_skb. These tracepoints help you to monitor network driver's
input/output.

            sshd-4445  [001] 241367.066046: net_dev_queue: dev=eth3 skbaddr=dd6b2538 len=114
            sshd-4445  [001] 241367.066047: net_dev_xmit: dev=eth3 skbaddr=dd6b2538 len=114 rc=0
          <idle>-0     [001] 241367.067472: net_dev_receive: dev=eth3 skbaddr=f5e59000 len=52

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 include/trace/events/net.h |   75 ++++++++++++++++++++++++++++++++++++++++++++
 net/core/dev.c             |    5 +++
 net/core/net-traces.c      |    1 +
 3 files changed, 81 insertions(+), 0 deletions(-)

diff --git a/include/trace/events/net.h b/include/trace/events/net.h
new file mode 100644
index 0000000..8a21361
--- /dev/null
+++ b/include/trace/events/net.h
@@ -0,0 +1,75 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM net
+
+#if !defined(_TRACE_NET_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_NET_H
+
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/ip.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(net_dev_xmit,
+
+	TP_PROTO(struct sk_buff *skb,
+		 int rc),
+
+	TP_ARGS(skb, rc),
+
+	TP_STRUCT__entry(
+		__field(	void *,		skbaddr		)
+		__field(	unsigned int,	len		)
+		__field(	int,		rc		)
+		__string(	name,		skb->dev->name	)
+	),
+
+	TP_fast_assign(
+		__entry->skbaddr = skb;
+		__entry->len = skb->len;
+		__entry->rc = rc;
+		__assign_str(name, skb->dev->name);
+	),
+
+	TP_printk("dev=%s skbaddr=%p len=%u rc=%d",
+		__get_str(name), __entry->skbaddr, __entry->len, __entry->rc)
+);
+
+DECLARE_EVENT_CLASS(net_dev_template,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb),
+
+	TP_STRUCT__entry(
+		__field(	void *,		skbaddr		)
+		__field(	unsigned int,	len		)
+		__string(	name,		skb->dev->name	)
+	),
+
+	TP_fast_assign(
+		__entry->skbaddr = skb;
+		__entry->len = skb->len;
+		__assign_str(name, skb->dev->name);
+	),
+
+	TP_printk("dev=%s skbaddr=%p len=%u",
+		__get_str(name), __entry->skbaddr, __entry->len)
+)
+
+DEFINE_EVENT(net_dev_template, net_dev_queue,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb)
+);
+
+DEFINE_EVENT(net_dev_template, net_dev_receive,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb)
+);
+#endif /* _TRACE_NET_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/net/core/dev.c b/net/core/dev.c
index 93b8929..4acfec6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -130,6 +130,7 @@
 #include <linux/jhash.h>
 #include <linux/random.h>
 #include <trace/events/napi.h>
+#include <trace/events/net.h>
 #include <linux/pci.h>
 
 #include "net-sysfs.h"
@@ -1955,6 +1956,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 		}
 
 		rc = ops->ndo_start_xmit(skb, dev);
+		trace_net_dev_xmit(skb, rc);
 		if (rc == NETDEV_TX_OK)
 			txq_trans_update(txq);
 		return rc;
@@ -1975,6 +1977,7 @@ gso:
 			skb_dst_drop(nskb);
 
 		rc = ops->ndo_start_xmit(nskb, dev);
+		trace_net_dev_xmit(nskb, rc);
 		if (unlikely(rc != NETDEV_TX_OK)) {
 			if (rc & ~NETDEV_TX_MASK)
 				goto out_kfree_gso_skb;
@@ -2165,6 +2168,7 @@ int dev_queue_xmit(struct sk_buff *skb)
 #ifdef CONFIG_NET_CLS_ACT
 	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
 #endif
+	trace_net_dev_queue(skb);
 	if (q->enqueue) {
 		rc = __dev_xmit_skb(skb, q, dev, txq);
 		goto out;
@@ -2939,6 +2943,7 @@ int netif_receive_skb(struct sk_buff *skb)
 	if (netdev_tstamp_prequeue)
 		net_timestamp_check(skb);
 
+	trace_net_dev_receive(skb);
 #ifdef CONFIG_RPS
 	{
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
diff --git a/net/core/net-traces.c b/net/core/net-traces.c
index afa6380..7f1bb2a 100644
--- a/net/core/net-traces.c
+++ b/net/core/net-traces.c
@@ -26,6 +26,7 @@
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/skb.h>
+#include <trace/events/net.h>
 #include <trace/events/napi.h>
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kfree_skb);


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb
  2010-07-20  0:43 [RFC PATCH v3 0/5] netdev: show a process of packets Koki Sanagi
                   ` (2 preceding siblings ...)
  2010-07-20  0:47 ` [RFC PATCH v3 3/5] netdev: add tracepoints to netdev layer Koki Sanagi
@ 2010-07-20  0:49 ` Koki Sanagi
  2010-07-20  4:54   ` Eric Dumazet
  2010-07-20 11:50   ` Neil Horman
  2010-07-20  0:50 ` [RFC PATCH v3 5/5] perf:add a script shows a process of packet Koki Sanagi
  4 siblings, 2 replies; 25+ messages in thread
From: Koki Sanagi @ 2010-07-20  0:49 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt,
	eric.dumazet, fweisbec, mathieu.desnoyers

[RFC PATCH v3 4/5] skb: add tracepoints to freeing skb
This patch adds tracepoint to consume_skb, dev_kfree_skb_irq and
skb_free_datagram_locked. Combinating with tracepoint on dev_hard_start_xmit,
we can check how long it takes to free transmited packets. And using it, we can
calculate how many packets driver had at that time. It is useful when a drop of
transmited packet is a problem.

          <idle>-0     [001] 241409.218333: consume_skb: skbaddr=dd6b2fb8
          <idle>-0     [001] 241409.490555: dev_kfree_skb_irq: skbaddr=f5e29840

        udp-recv-302   [001] 515031.206008: skb_free_datagram_locked: skbaddr=f5b1d900


Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 include/trace/events/skb.h |   42 ++++++++++++++++++++++++++++++++++++++++++
 net/core/datagram.c        |    1 +
 net/core/dev.c             |    2 ++
 net/core/skbuff.c          |    1 +
 4 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 4b2be6d..84c9041 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -35,6 +35,48 @@ TRACE_EVENT(kfree_skb,
 		__entry->skbaddr, __entry->protocol, __entry->location)
 );
 
+DECLARE_EVENT_CLASS(free_skb,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb),
+
+	TP_STRUCT__entry(
+		__field(	void *,	skbaddr	)
+	),
+
+	TP_fast_assign(
+		__entry->skbaddr = skb;
+	),
+
+	TP_printk("skbaddr=%p", __entry->skbaddr)
+
+);
+
+DEFINE_EVENT(free_skb, consume_skb,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb)
+
+);
+
+DEFINE_EVENT(free_skb, dev_kfree_skb_irq,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb)
+
+);
+
+DEFINE_EVENT(free_skb, skb_free_datagram_locked,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb)
+
+);
+
 TRACE_EVENT(skb_copy_datagram_iovec,
 
 	TP_PROTO(const struct sk_buff *skb, int len),
diff --git a/net/core/datagram.c b/net/core/datagram.c
index f5b6f43..1ea32a0 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -231,6 +231,7 @@ void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
 {
 	bool slow;
 
+	trace_skb_free_datagram_locked(skb);
 	if (likely(atomic_read(&skb->users) == 1))
 		smp_rmb();
 	else if (likely(!atomic_dec_and_test(&skb->users)))
diff --git a/net/core/dev.c b/net/core/dev.c
index 4acfec6..d979847 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -131,6 +131,7 @@
 #include <linux/random.h>
 #include <trace/events/napi.h>
 #include <trace/events/net.h>
+#include <trace/events/skb.h>
 #include <linux/pci.h>
 
 #include "net-sysfs.h"
@@ -1581,6 +1582,7 @@ void dev_kfree_skb_irq(struct sk_buff *skb)
 		struct softnet_data *sd;
 		unsigned long flags;
 
+		trace_dev_kfree_skb_irq(skb);
 		local_irq_save(flags);
 		sd = &__get_cpu_var(softnet_data);
 		skb->next = sd->completion_queue;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 34432b4..a7b4036 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -466,6 +466,7 @@ void consume_skb(struct sk_buff *skb)
 		smp_rmb();
 	else if (likely(!atomic_dec_and_test(&skb->users)))
 		return;
+	trace_consume_skb(skb);
 	__kfree_skb(skb);
 }
 EXPORT_SYMBOL(consume_skb);


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC PATCH v3 5/5] perf:add a script shows a process of packet
  2010-07-20  0:43 [RFC PATCH v3 0/5] netdev: show a process of packets Koki Sanagi
                   ` (3 preceding siblings ...)
  2010-07-20  0:49 ` [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb Koki Sanagi
@ 2010-07-20  0:50 ` Koki Sanagi
  4 siblings, 0 replies; 25+ messages in thread
From: Koki Sanagi @ 2010-07-20  0:50 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt,
	eric.dumazet, fweisbec, mathieu.desnoyers

Add a perf script which shows a process of packets and processed time.
It helps us to investigate networking or network device.

If you want to use it, install perf and record perf.data like following.

#perf trace record netdev-times [script]

If you set script, perf gathers records until it ends.
If not, you must Ctrl-C to stop recording.

And if you want a report from record,

#perf trace report netdev-times [options]

If you use some options, you can limit an output.
Option is below.

tx: show only process of tx packets
rx: show only process of rx packets
dev=: show a process specified with this option
debug: work with debug mode. It shows buffer status.

For example, if you want to show a process of received packets associated
with eth3,

#perf trace report netdev-times rx dev=eth3
79074.756672832sec cpu=1
irq_entry(+0.000000msec,irq=77:eth3)
         |------------softirq_raise(+0.001277msec)
irq_exit (+0.002278msec)     |
                      softirq_entry(+0.003562msec
                             |
                             |---netif_receive_skb(+0.006279msec,len=100)
                             |            |
                             |   skb_copy_datagram_iovec(+0.038778msec, 2285:sshd)
                             |
                      napi_poll_exit(+0.017160msec, eth3)
                             |
                      softirq_exit(+0.018248msec)


This perf script helps us to analyze a process time of transmit/receive
sequence.

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 tools/perf/scripts/python/bin/netdev-times-record |    8 +
 tools/perf/scripts/python/bin/netdev-times-report |    5 +
 tools/perf/scripts/python/netdev-times.py         |  478 +++++++++++++++++++++
 3 files changed, 491 insertions(+), 0 deletions(-)

diff --git a/tools/perf/scripts/python/bin/netdev-times-record b/tools/perf/scripts/python/bin/netdev-times-record
new file mode 100644
index 0000000..12da07e
--- /dev/null
+++ b/tools/perf/scripts/python/bin/netdev-times-record
@@ -0,0 +1,8 @@
+#!/bin/bash
+perf record -c 1 -f -R -a -e net:net_dev_xmit -e net:net_dev_queue	\
+		-e net:net_dev_receive -e skb:consume_skb		\
+		-e skb:kfree_skb -e skb:skb_free_datagram_locked	\
+		-e skb:dev_kfree_skb_irq -e napi:napi_poll		\
+		-e irq:irq_handler_entry -e irq:irq_handler_exit	\
+		-e irq:softirq_entry -e irq:softirq_exit		\
+		-e irq:softirq_raise -e skb:skb_copy_datagram_iovec $@
diff --git a/tools/perf/scripts/python/bin/netdev-times-report b/tools/perf/scripts/python/bin/netdev-times-report
new file mode 100644
index 0000000..c3d0a63
--- /dev/null
+++ b/tools/perf/scripts/python/bin/netdev-times-report
@@ -0,0 +1,5 @@
+#!/bin/bash
+# description: display a process of packet and processing time
+# args: [tx] [rx] [dev=] [debug]
+
+perf trace -s ~/libexec/perf-core/scripts/python/netdev-times.py $@
diff --git a/tools/perf/scripts/python/less b/tools/perf/scripts/python/less
new file mode 100644
index 0000000..e69de29
diff --git a/tools/perf/scripts/python/netdev-times.py b/tools/perf/scripts/python/netdev-times.py
new file mode 100644
index 0000000..486f16e
--- /dev/null
+++ b/tools/perf/scripts/python/netdev-times.py
@@ -0,0 +1,478 @@
+# Display a process of packets and processed time.
+# It helps us to investigate networking or network device.
+#
+# options
+# tx: show only tx chart
+# rx: show only rx chart
+# dev=: show only thing related to specified device
+# debug: work with debug mode. It shows buffer status.
+
+import os
+import sys
+
+sys.path.append(os.environ['PERF_EXEC_PATH'] + \
+	'/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
+
+from perf_trace_context import *
+from Core import *
+from Util import *
+
+all_event_list = []; # insert all tracepoint event related with this script
+irq_dic = {}; # key is cpu and value is a list which stacks irqs
+              # which raise NET_RX softirq
+net_rx_dic = {}; # key is cpu and value include time of NET_RX softirq-entry
+		 # and a list which stacks receive
+receive_hunk_list = []; # a list which include a sequence of receive events
+rx_skb_list = []; # received packet list for matching
+		       # skb_copy_datagram_iovec
+
+buffer_budget = 65536; # the budget of rx_skb_list, tx_queue_list and
+		       # tx_xmit_list
+of_count_rx_skb_list = 0; # overflow count
+
+tx_queue_list = []; # list of packets which pass through dev_queue_xmit
+of_count_tx_queue_list = 0; # overflow count
+
+tx_xmit_list = [];  # list of packets which pass through dev_hard_start_xmit
+of_count_tx_xmit_list = 0; # overflow count
+
+tx_free_list = [];  # list of packets which is freed
+
+# options
+show_tx = 0;
+show_rx = 0;
+dev = 0; # store a name of device specified by option "dev="
+debug = 0;
+
+# indices of event_info tuple
+EINFO_IDX_NAME=   0
+EINFO_IDX_CONTEXT=1
+EINFO_IDX_CPU=    2
+EINFO_IDX_TIME=   3
+EINFO_IDX_PID=    4
+EINFO_IDX_COMM=   5
+
+# Calculate a time interval(msec) from src(nsec) to dst(nsec)
+def diff_msec(src, dst):
+	return (dst - src) / 1000000.0
+
+# Display a process of transmitting a packet
+def print_transmit(hunk):
+	if dev != 0 and hunk['dev'].find(dev) < 0:
+		return
+	print "%7s %5d %6d.%09dsec %12.6fmsec      %12.6fmsec" % \
+		(hunk['dev'], hunk['len'],
+		nsecs_secs(hunk['queue_t']),
+		nsecs_nsecs(hunk['queue_t']),
+		diff_msec(hunk['queue_t'], hunk['xmit_t']),
+		diff_msec(hunk['xmit_t'], hunk['free_t']))
+
+PF_IRQ_ENTRY= "irq_entry(+%fmsec,irq=%d:%s)"
+PF_IRQ_EXIT=  "irq_exit (+%fmsec)     |"
+PF_SOFT_RAISE="         |------------softirq_raise(+%fmsec)"
+PF_SOFT_ENTRY="                      softirq_entry(+%fmsec)"
+PF_SOFT_EXIT= "                      softirq_exit (+%fmsec)\n"
+PF_NAPI_POLL= "                      napi_poll_exit(+%fmsec, %s)"
+PF_JOINT=     "                             |"
+PF_WJOINT=    "                             |            |"
+PF_NET_RECV=  "                             |---netif_receive_skb" \
+				"(+%fmsec,len=%d)"
+PF_CPY_DGRAM= "                             |   skb_copy_datagram_iovec" \
+				"(+%fmsec, %d:%s)"
+PF_FREE_DGRAM="                             |   skb_free_datagram_locked" \
+				"(+%fmsec)"
+PF_KFREE_SKB= "                             |   kfree_skb" \
+				"(+%fmsec)"
+PF_CONS_SKB=  "                             |   consume_skb" \
+				"(+%fmsec)"
+
+# Display a process of received packets and interrputs associated with
+# a NET_RX softirq
+def print_receive(hunk):
+	show_hunk = 0
+	irq_list = hunk['irq_list']
+	cpu = irq_list[0]['cpu']
+	base_t = irq_list[0]['irq_ent_t']
+	# check if this hunk should be showed
+	if dev != 0:
+		for i in range(len(irq_list)):
+			if irq_list[i]['name'].find(dev) >= 0:
+				show_hunk = 1
+				break
+	else:
+		show_hunk = 1
+	if show_hunk == 0:
+		return
+
+	print "%d.%09dsec cpu=%d" % \
+		(nsecs_secs(base_t), nsecs_nsecs(base_t), cpu)
+	for i in range(len(irq_list)):
+		print PF_IRQ_ENTRY % \
+			(diff_msec(base_t, irq_list[i]['irq_ent_t']),
+			irq_list[i]['irq'], irq_list[i]['name'])
+
+		if 'sirq_raise_t' in irq_list[i].keys():
+			print PF_SOFT_RAISE % \
+				diff_msec(base_t, irq_list[i]['sirq_raise_t'])
+
+		if 'irq_ext_t' in irq_list[i].keys():
+			print PF_IRQ_EXIT % \
+				diff_msec(base_t, irq_list[i]['irq_ext_t'])
+	if 'sirq_ent_t' not in hunk.keys():
+		print 'maybe softirq_entry is dropped'
+		return
+	print PF_SOFT_ENTRY % \
+		diff_msec(base_t, hunk['sirq_ent_t'])
+	print PF_JOINT
+	event_list = hunk['event_list']
+	for i in range(len(event_list)):
+		event = event_list[i]
+		if event['event_name'] == 'napi_poll':
+			print PF_NAPI_POLL % \
+			    (diff_msec(base_t, event['event_t']), event['dev'])
+		else:
+			print PF_NET_RECV % \
+			    (diff_msec(base_t, event['event_t']), event['len'])
+			if 'comm' in event.keys():
+				print PF_WJOINT
+				print PF_CPY_DGRAM % \
+					(diff_msec(base_t, event['comm_t']),
+					event['pid'], event['comm'])
+			elif 'handle' in event.keys():
+				print PF_WJOINT
+				if event['handle'] == \
+				    "skb_free_datagram_locked":
+					print PF_FREE_DGRAM % \
+						diff_msec(base_t,
+							event['comm_t'])
+				elif event['handle'] == "kfree_skb":
+					print PF_KFREE_SKB % \
+						diff_msec(base_t,
+							event['comm_t'])
+				elif event['handle'] == "consume_skb":
+					print PF_CONS_SKB % \
+						diff_msec(base_t,
+							event['comm_t'])
+		print PF_JOINT
+	print PF_SOFT_EXIT % diff_msec(base_t, hunk['sirq_ext_t'])
+
+def trace_begin():
+	global show_tx
+	global show_rx
+	global dev
+	global debug
+
+	for i in range(len(sys.argv)):
+		if i == 0:
+			continue
+		arg = sys.argv[i]
+		if arg == 'tx':
+			show_tx = 1
+		elif arg =='rx':
+			show_rx = 1
+		elif arg.find('dev=',0, 4) >= 0:
+			dev = arg[4:]
+		elif arg == 'debug':
+			debug = 1
+	if show_tx == 0  and show_rx == 0:
+		show_tx = 1
+		show_rx = 1
+
+def trace_end():
+	# order all events in time
+	all_event_list.sort(lambda a,b :cmp(a[EINFO_IDX_TIME],
+					    b[EINFO_IDX_TIME]))
+	# process all events
+	for i in range(len(all_event_list)):
+		event_info = all_event_list[i]
+		name = event_info[EINFO_IDX_NAME]
+		if name == 'irq__softirq_exit':
+			handle_irq_softirq_exit(event_info)
+		elif name == 'irq__softirq_entry':
+			handle_irq_softirq_entry(event_info)
+		elif name == 'irq__softirq_raise':
+			handle_irq_softirq_raise(event_info)
+		elif name == 'irq__irq_handler_entry':
+			handle_irq_handler_entry(event_info)
+		elif name == 'irq__irq_handler_exit':
+			handle_irq_handler_exit(event_info)
+		elif name == 'napi__napi_poll':
+			handle_napi_poll(event_info)
+		elif name == 'net__net_dev_receive':
+			handle_net_dev_receive(event_info)
+		elif name == 'skb__skb_copy_datagram_iovec':
+			handle_skb_copy_datagram_iovec(event_info)
+		elif name == 'net__net_dev_queue':
+			handle_net_dev_queue(event_info)
+		elif name == 'net__net_dev_xmit':
+			handle_net_dev_xmit(event_info)
+		elif name == 'skb__kfree_skb':
+			handle_kfree_skb(event_info)
+		elif name == 'skb__dev_kfree_skb_irq':
+			handle_dev_kfree_skb_irq(event_info)
+		elif name == 'skb__consume_skb':
+			handle_consume_skb(event_info)
+		elif name == 'skb__skb_free_datagram_locked':
+			handle_skb_free_datagram_locked(event_info)
+	# display receive hunks
+	if show_rx:
+		for i in range(len(receive_hunk_list)):
+			print_receive(receive_hunk_list[i])
+	# display transmit hunks
+	if show_tx:
+		print "   dev    len      Qdisc        " \
+			"       netdevice             free"
+		for i in range(len(tx_free_list)):
+			print_transmit(tx_free_list[i])
+	if debug:
+		print "debug buffer status"
+		print "----------------------------"
+		print "xmit Qdisc:remain:%d overflow:%d" % \
+			(len(tx_queue_list), of_count_tx_queue_list)
+		print "xmit netdevice:remain:%d overflow:%d" % \
+			(len(tx_xmit_list), of_count_tx_xmit_list)
+		print "receive:remain:%d overflow:%d" % \
+			(len(rx_skb_list), of_count_rx_skb_list)
+
+# called from perf, when it finds a correspoinding event
+def irq__softirq_entry(name, context, cpu, sec, nsec, pid, comm, vec):
+	if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
+		return
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
+	all_event_list.append(event_info)
+
+def irq__softirq_exit(name, context, cpu, sec, nsec, pid, comm, vec):
+	if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
+		return
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
+	all_event_list.append(event_info)
+
+def irq__softirq_raise(name, context, cpu, sec, nsec, pid, comm, vec):
+	if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
+		return
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
+	all_event_list.append(event_info)
+
+def irq__irq_handler_entry(name, context, cpu, sec, nsec, pid, comm,
+			irq, irq_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			irq, irq_name)
+	all_event_list.append(event_info)
+
+def irq__irq_handler_exit(name, context, cpu, sec, nsec, pid, comm, irq, ret):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, irq, ret)
+	all_event_list.append(event_info)
+
+def napi__napi_poll(name, context, cpu, sec, nsec, pid, comm, napi, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			napi, dev_name)
+	all_event_list.append(event_info)
+
+def net__net_dev_receive(name, context, cpu, sec, nsec, pid, comm, skbaddr,
+			skblen, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen, dev_name)
+	all_event_list.append(event_info)
+
+def net__net_dev_queue(name, context, cpu, sec, nsec, pid, comm,
+			skbaddr, skblen, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen, dev_name)
+	all_event_list.append(event_info)
+
+def net__net_dev_xmit(name, context, cpu, sec, nsec, pid, comm,
+			skbaddr, skblen, rc, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen, rc ,dev_name)
+	all_event_list.append(event_info)
+
+def skb__kfree_skb(name, context, cpu, sec, nsec, pid, comm,
+			skbaddr, protocol, location):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, protocol, location)
+	all_event_list.append(event_info)
+
+def skb__skb_free_datagram_locked(name, context, cpu, sec, nsec, pid, comm,
+			skbaddr):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr)
+	all_event_list.append(event_info)
+
+def skb__dev_kfree_skb_irq(name, context, cpu, sec, nsec, pid, comm, skbaddr):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr)
+	all_event_list.append(event_info)
+
+def skb__consume_skb(name, context, cpu, sec, nsec, pid, comm, skbaddr):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr)
+	all_event_list.append(event_info)
+
+def skb__skb_copy_datagram_iovec(name, context, cpu, sec, nsec, pid, comm,
+	skbaddr, skblen):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen)
+	all_event_list.append(event_info)
+
+def handle_irq_softirq_exit(event_info):
+	(name, context, cpu, time, pid, comm, vec) = event_info
+	irq_list = []
+	event_list = 0
+	if cpu in irq_dic.keys():
+		irq_list = irq_dic[cpu]
+		del irq_dic[cpu]
+	if cpu in net_rx_dic.keys():
+		sirq_ent_t = net_rx_dic[cpu]['sirq_ent_t']
+		event_list = net_rx_dic[cpu]['event_list']
+		del net_rx_dic[cpu]
+	if irq_list == [] or event_list == 0:
+		return
+	rec_data = {'sirq_ent_t':sirq_ent_t, 'sirq_ext_t':time,
+		    'irq_list':irq_list, 'event_list':event_list}
+	# merge information realted to a NET_RX softirq
+	receive_hunk_list.append(rec_data)
+
+def handle_irq_softirq_entry(event_info):
+	(name, context, cpu, time, pid, comm, vec) = event_info
+	net_rx_dic[cpu] = {'sirq_ent_t':time, 'event_list':[]}
+
+def handle_irq_softirq_raise(event_info):
+	(name, context, cpu, time, pid, comm, vec) = event_info
+	if cpu not in irq_dic.keys() \
+	or len(irq_dic[cpu]) == 0:
+		return
+	irq = irq_dic[cpu].pop()
+	# put a time to prev irq on the same cpu
+	irq.update({'sirq_raise_t':time})
+	irq_dic[cpu].append(irq)
+
+def handle_irq_handler_entry(event_info):
+	(name, context, cpu, time, pid, comm, irq, irq_name) = event_info
+	if cpu not in irq_dic.keys():
+		irq_dic[cpu] = []
+	irq_record = {'irq':irq, 'name':irq_name, 'cpu':cpu, 'irq_ent_t':time}
+	irq_dic[cpu].append(irq_record)
+
+def handle_irq_handler_exit(event_info):
+	(name, context, cpu, time, pid, comm, irq, ret) = event_info
+	if cpu not in irq_dic.keys():
+		return
+	irq_record = irq_dic[cpu].pop()
+	if irq != irq_record['irq']:
+		return
+	irq_record.update({'irq_ext_t':time})
+	# if an irq doesn't include NET_RX softirq, drop.
+	if 'sirq_raise_t' in irq_record.keys():
+		irq_dic[cpu].append(irq_record)
+
+def handle_napi_poll(event_info):
+	(name, context, cpu, time, pid, comm, napi, dev_name) = event_info
+	if cpu in net_rx_dic.keys():
+		event_list = net_rx_dic[cpu]['event_list']
+		rec_data = {'event_name':'napi_poll',
+				'dev':dev_name, 'event_t':time}
+		event_list.append(rec_data)
+
+def handle_net_dev_receive(event_info):
+	global of_count_rx_skb_list
+
+	(name, context, cpu, time, pid, comm,
+		skbaddr, skblen, dev_name) = event_info
+	if cpu in net_rx_dic.keys():
+		rec_data = {'event_name':'netif_receive_skb',
+			    'event_t':time, 'skbaddr':skbaddr, 'len':skblen}
+		event_list = net_rx_dic[cpu]['event_list']
+		event_list.append(rec_data)
+		rx_skb_list.insert(0, rec_data)
+		if len(rx_skb_list) > buffer_budget:
+			rx_skb_list.pop()
+			of_count_rx_skb_list += 1
+
+def handle_net_dev_queue(event_info):
+	global of_count_tx_queue_list
+
+	(name, context, cpu, time, pid, comm,
+		skbaddr, skblen, dev_name) = event_info
+	skb = {'dev':dev_name, 'skbaddr':skbaddr, 'len':skblen, 'queue_t':time}
+	tx_xmit_list.insert(0, skb)
+	if len(tx_xmit_list) > buffer_budget:
+		tx_xmit_list.pop()
+		of_count_tx_xmit_list += 1
+
+def handle_net_dev_xmit(event_info):
+	global of_count_tx_queue_list
+
+	(name, context, cpu, time, pid, comm,
+		skbaddr, skblen, rc, dev_name) = event_info
+	if rc == 0: # NETDEV_TX_OK
+		for i in range(len(tx_xmit_list)):
+			skb = tx_xmit_list[i]
+			if skb['skbaddr'] == skbaddr:
+				skb['xmit_t'] = time
+				tx_queue_list.insert(0, skb)
+				del tx_xmit_list[i]
+				if len(tx_queue_list) > buffer_budget:
+					tx_queue_list.pop()
+					of_count_tx_queue_list += 1
+				return
+
+def handle_kfree_skb(event_info):
+	(name, context, cpu, time, pid, comm,
+		skbaddr, protocol, location) = event_info
+	for i in range(len(tx_queue_list)):
+		skb = tx_queue_list[i]
+		if skb['skbaddr'] == skbaddr:
+			del tx_queue_list[i]
+			return
+	for i in range(len(tx_xmit_list)):
+		skb = tx_xmit_list[i]
+		if skb['skbaddr'] == skbaddr:
+			del tx_xmit_list[i]
+			return
+	for i in range(len(rx_skb_list)):
+		rec_data = rx_skb_list[i]
+		if rec_data['skbaddr'] == skbaddr:
+			rec_data.update({'handle':"kfree_skb",
+					'comm':comm, 'pid':pid, 'comm_t':time})
+			del rx_skb_list[i]
+			return
+
+def handle_dev_kfree_skb_irq(event_info):
+	(name, context, cpu, time, pid, comm, skbaddr) = event_info
+	for i in range(len(tx_queue_list)):
+		skb = tx_queue_list[i]
+		if skb['skbaddr'] == skbaddr:
+			skb['free_t'] = time
+			tx_free_list.append(skb)
+			del tx_queue_list[i]
+			return
+
+def handle_consume_skb(event_info):
+	(name, context, cpu, time, pid, comm, skbaddr) = event_info
+	for i in range(len(tx_queue_list)):
+		skb = tx_queue_list[i]
+		if skb['skbaddr'] == skbaddr:
+			skb['free_t'] = time
+			tx_free_list.append(skb)
+			del tx_queue_list[i]
+			return
+
+def handle_skb_free_datagram_locked(event_info):
+	(name, context, cpu, time, pid, comm, skbaddr) = event_info
+	for i in range(len(rx_skb_list)):
+		rec_data = rx_skb_list[i]
+		if skbaddr == rec_data['skbaddr']:
+			rec_data.update({'handle':"skb_free_datagram_locked",
+					'comm':comm, 'pid':pid, 'comm_t':time})
+			del rx_skb_list[i]
+			return
+
+def handle_skb_copy_datagram_iovec(event_info):
+	(name, context, cpu, time, pid, comm, skbaddr, skblen) = event_info
+	for i in range(len(rx_skb_list)):
+		rec_data = rx_skb_list[i]
+		if skbaddr == rec_data['skbaddr']:
+			rec_data.update({'handle':"skb_copy_datagram_iovec",
+					'comm':comm, 'pid':pid, 'comm_t':time})
+			del rx_skb_list[i]
+			return


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb
  2010-07-20  0:49 ` [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb Koki Sanagi
@ 2010-07-20  4:54   ` Eric Dumazet
  2010-07-20  6:47     ` Koki Sanagi
  2010-07-20 11:50   ` Neil Horman
  1 sibling, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2010-07-20  4:54 UTC (permalink / raw)
  To: Koki Sanagi
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt,
	fweisbec, mathieu.desnoyers

Le mardi 20 juillet 2010 à 09:49 +0900, Koki Sanagi a écrit :
> [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb
> This patch adds tracepoint to consume_skb, dev_kfree_skb_irq and
> skb_free_datagram_locked. Combinating with tracepoint on dev_hard_start_xmit,
> we can check how long it takes to free transmited packets. And using it, we can
> calculate how many packets driver had at that time. It is useful when a drop of
> transmited packet is a problem.
> 
>           <idle>-0     [001] 241409.218333: consume_skb: skbaddr=dd6b2fb8
>           <idle>-0     [001] 241409.490555: dev_kfree_skb_irq: skbaddr=f5e29840
> 
>         udp-recv-302   [001] 515031.206008: skb_free_datagram_locked: skbaddr=f5b1d900
> 
> 
> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> ---
>  include/trace/events/skb.h |   42 ++++++++++++++++++++++++++++++++++++++++++
>  net/core/datagram.c        |    1 +
>  net/core/dev.c             |    2 ++
>  net/core/skbuff.c          |    1 +
>  4 files changed, 46 insertions(+), 0 deletions(-)
> 
> diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> index 4b2be6d..84c9041 100644
> --- a/include/trace/events/skb.h
> +++ b/include/trace/events/skb.h
> @@ -35,6 +35,48 @@ TRACE_EVENT(kfree_skb,
>  		__entry->skbaddr, __entry->protocol, __entry->location)
>  );
>  
> +DECLARE_EVENT_CLASS(free_skb,
> +
> +	TP_PROTO(struct sk_buff *skb),
> +
> +	TP_ARGS(skb),
> +
> +	TP_STRUCT__entry(
> +		__field(	void *,	skbaddr	)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->skbaddr = skb;
> +	),
> +
> +	TP_printk("skbaddr=%p", __entry->skbaddr)
> +
> +);
> +
> +DEFINE_EVENT(free_skb, consume_skb,
> +
> +	TP_PROTO(struct sk_buff *skb),
> +
> +	TP_ARGS(skb)
> +
> +);
> +
> +DEFINE_EVENT(free_skb, dev_kfree_skb_irq,
> +
> +	TP_PROTO(struct sk_buff *skb),
> +
> +	TP_ARGS(skb)
> +
> +);
> +
> +DEFINE_EVENT(free_skb, skb_free_datagram_locked,
> +
> +	TP_PROTO(struct sk_buff *skb),
> +
> +	TP_ARGS(skb)
> +
> +);
> +
>  TRACE_EVENT(skb_copy_datagram_iovec,
>  
>  	TP_PROTO(const struct sk_buff *skb, int len),
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index f5b6f43..1ea32a0 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -231,6 +231,7 @@ void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
>  {
>  	bool slow;
>  
> +	trace_skb_free_datagram_locked(skb);

Here you unconditionally trace before the test on skb->users

>  	if (likely(atomic_read(&skb->users) == 1))
>  		smp_rmb();
>  	else if (likely(!atomic_dec_and_test(&skb->users)))
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 4acfec6..d979847 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -131,6 +131,7 @@
>  #include <linux/random.h>
>  #include <trace/events/napi.h>
>  #include <trace/events/net.h>
> +#include <trace/events/skb.h>
>  #include <linux/pci.h>
>  
>  #include "net-sysfs.h"
> @@ -1581,6 +1582,7 @@ void dev_kfree_skb_irq(struct sk_buff *skb)
>  		struct softnet_data *sd;
>  		unsigned long flags;
>  
> +		trace_dev_kfree_skb_irq(skb);
>  		local_irq_save(flags);
>  		sd = &__get_cpu_var(softnet_data);
>  		skb->next = sd->completion_queue;
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 34432b4..a7b4036 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -466,6 +466,7 @@ void consume_skb(struct sk_buff *skb)
>  		smp_rmb();
>  	else if (likely(!atomic_dec_and_test(&skb->users)))
>  		return;

While here you trace _after_ the test on skb->users - and a "return;" ,
so you miss some consume_skb() calls


> +	trace_consume_skb(skb);
>  	__kfree_skb(skb);
>  }
>  EXPORT_SYMBOL(consume_skb);
> 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb
  2010-07-20  4:54   ` Eric Dumazet
@ 2010-07-20  6:47     ` Koki Sanagi
  0 siblings, 0 replies; 25+ messages in thread
From: Koki Sanagi @ 2010-07-20  6:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt,
	fweisbec, mathieu.desnoyers

(2010/07/20 13:54), Eric Dumazet wrote:
> Le mardi 20 juillet 2010 à 09:49 +0900, Koki Sanagi a écrit :
>> [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb
>> This patch adds tracepoint to consume_skb, dev_kfree_skb_irq and
>> skb_free_datagram_locked. Combinating with tracepoint on dev_hard_start_xmit,
>> we can check how long it takes to free transmited packets. And using it, we can
>> calculate how many packets driver had at that time. It is useful when a drop of
>> transmited packet is a problem.
>>
>>           <idle>-0     [001] 241409.218333: consume_skb: skbaddr=dd6b2fb8
>>           <idle>-0     [001] 241409.490555: dev_kfree_skb_irq: skbaddr=f5e29840
>>
>>         udp-recv-302   [001] 515031.206008: skb_free_datagram_locked: skbaddr=f5b1d900
>>
>>
>> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
>> ---
>>  include/trace/events/skb.h |   42 ++++++++++++++++++++++++++++++++++++++++++
>>  net/core/datagram.c        |    1 +
>>  net/core/dev.c             |    2 ++
>>  net/core/skbuff.c          |    1 +
>>  4 files changed, 46 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
>> index 4b2be6d..84c9041 100644
>> --- a/include/trace/events/skb.h
>> +++ b/include/trace/events/skb.h
>> @@ -35,6 +35,48 @@ TRACE_EVENT(kfree_skb,
>>  		__entry->skbaddr, __entry->protocol, __entry->location)
>>  );
>>  
>> +DECLARE_EVENT_CLASS(free_skb,
>> +
>> +	TP_PROTO(struct sk_buff *skb),
>> +
>> +	TP_ARGS(skb),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(	void *,	skbaddr	)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->skbaddr = skb;
>> +	),
>> +
>> +	TP_printk("skbaddr=%p", __entry->skbaddr)
>> +
>> +);
>> +
>> +DEFINE_EVENT(free_skb, consume_skb,
>> +
>> +	TP_PROTO(struct sk_buff *skb),
>> +
>> +	TP_ARGS(skb)
>> +
>> +);
>> +
>> +DEFINE_EVENT(free_skb, dev_kfree_skb_irq,
>> +
>> +	TP_PROTO(struct sk_buff *skb),
>> +
>> +	TP_ARGS(skb)
>> +
>> +);
>> +
>> +DEFINE_EVENT(free_skb, skb_free_datagram_locked,
>> +
>> +	TP_PROTO(struct sk_buff *skb),
>> +
>> +	TP_ARGS(skb)
>> +
>> +);
>> +
>>  TRACE_EVENT(skb_copy_datagram_iovec,
>>  
>>  	TP_PROTO(const struct sk_buff *skb, int len),
>> diff --git a/net/core/datagram.c b/net/core/datagram.c
>> index f5b6f43..1ea32a0 100644
>> --- a/net/core/datagram.c
>> +++ b/net/core/datagram.c
>> @@ -231,6 +231,7 @@ void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
>>  {
>>  	bool slow;
>>  
>> +	trace_skb_free_datagram_locked(skb);
> 
> Here you unconditionally trace before the test on skb->users
> 
>>  	if (likely(atomic_read(&skb->users) == 1))
>>  		smp_rmb();
>>  	else if (likely(!atomic_dec_and_test(&skb->users)))
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 4acfec6..d979847 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -131,6 +131,7 @@
>>  #include <linux/random.h>
>>  #include <trace/events/napi.h>
>>  #include <trace/events/net.h>
>> +#include <trace/events/skb.h>
>>  #include <linux/pci.h>
>>  
>>  #include "net-sysfs.h"
>> @@ -1581,6 +1582,7 @@ void dev_kfree_skb_irq(struct sk_buff *skb)
>>  		struct softnet_data *sd;
>>  		unsigned long flags;
>>  
>> +		trace_dev_kfree_skb_irq(skb);
>>  		local_irq_save(flags);
>>  		sd = &__get_cpu_var(softnet_data);
>>  		skb->next = sd->completion_queue;
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 34432b4..a7b4036 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -466,6 +466,7 @@ void consume_skb(struct sk_buff *skb)
>>  		smp_rmb();
>>  	else if (likely(!atomic_dec_and_test(&skb->users)))
>>  		return;
> 
> While here you trace _after_ the test on skb->users - and a "return;" ,
> so you miss some consume_skb() calls
> 
Yeah, I'll move trace_consume_skb() before the test.

Thanks,
Koki Sanagi.

> 
>> +	trace_consume_skb(skb);
>>  	__kfree_skb(skb);
>>  }
>>  EXPORT_SYMBOL(consume_skb);
>>
> 
> 
> 
> 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v3 1/5] irq: add tracepoint to softirq_raise
  2010-07-20  0:45 ` [RFC PATCH v3 1/5] irq: add tracepoint to softirq_raise Koki Sanagi
@ 2010-07-20 11:04   ` Neil Horman
  2010-07-21  6:57     ` Koki Sanagi
  0 siblings, 1 reply; 25+ messages in thread
From: Neil Horman @ 2010-07-20 11:04 UTC (permalink / raw)
  To: Koki Sanagi
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, laijs, scott.a.mcmillan, rostedt, eric.dumazet,
	fweisbec, mathieu.desnoyers

On Tue, Jul 20, 2010 at 09:45:31AM +0900, Koki Sanagi wrote:
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
> 
> Add a tracepoint for tracing when softirq action is raised.
> 
> It and the existed tracepoints complete softirq's tracepoints:
> softirq_raise, softirq_entry and softirq_exit.
> 
> And when this tracepoint is used in combination with
> the softirq_entry tracepoint we can determine
> the softirq raise latency.
> 
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
> 
> [ factorize softirq events with DECLARE_EVENT_CLASS ]
> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> ---
>  include/linux/interrupt.h  |    8 +++++-
>  include/trace/events/irq.h |   57 ++++++++++++++++++++++++++-----------------
>  kernel/softirq.c           |    4 +-
>  3 files changed, 43 insertions(+), 26 deletions(-)
> 
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index c233113..1cb5726 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -18,6 +18,7 @@
>  #include <asm/atomic.h>
>  #include <asm/ptrace.h>
>  #include <asm/system.h>
> +#include <trace/events/irq.h>
>  
>  /*
>   * These correspond to the IORESOURCE_IRQ_* defines in
> @@ -402,7 +403,12 @@ asmlinkage void do_softirq(void);
>  asmlinkage void __do_softirq(void);
>  extern void open_softirq(int nr, void (*action)(struct softirq_action *));
>  extern void softirq_init(void);
> -#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
> +static inline void __raise_softirq_irqoff(unsigned int nr)
> +{
> +	trace_softirq_raise(nr);
> +	or_softirq_pending(1UL << nr);
> +}
> +
We already have tracepoints in irq_enter and irq_exit.  If the goal here is to
detect latency during packet processing, cant the delta in time between those
two points be used to determine interrupt handling latency?


>  extern void raise_softirq_irqoff(unsigned int nr);
>  extern void raise_softirq(unsigned int nr);
>  extern void wakeup_softirqd(void);
> diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
> index 0e4cfb6..717744c 100644
> --- a/include/trace/events/irq.h
> +++ b/include/trace/events/irq.h
> @@ -5,7 +5,9 @@
>  #define _TRACE_IRQ_H
>  
>  #include <linux/tracepoint.h>
> -#include <linux/interrupt.h>
> +
> +struct irqaction;
> +struct softirq_action;
>  
>  #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
>  #define show_softirq_name(val)				\
> @@ -84,56 +86,65 @@ TRACE_EVENT(irq_handler_exit,
>  
>  DECLARE_EVENT_CLASS(softirq,
>  
> -	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
> +	TP_PROTO(unsigned int nr),
>  
> -	TP_ARGS(h, vec),
> +	TP_ARGS(nr),
>  
>  	TP_STRUCT__entry(
> -		__field(	int,	vec			)
> +		__field(	unsigned int,	vec	)
>  	),
>  
>  	TP_fast_assign(
> -		__entry->vec = (int)(h - vec);
> +		__entry->vec	= nr;
>  	),
>  
>  	TP_printk("vec=%d [action=%s]", __entry->vec,
> -		  show_softirq_name(__entry->vec))
> +		show_softirq_name(__entry->vec))
> +);
> +
> +/**
> + * softirq_raise - called immediately when a softirq is raised
> + * @nr: softirq vector number
> + *
> + * Tracepoint for tracing when softirq action is raised.
> + * Also, when used in combination with the softirq_entry tracepoint
> + * we can determine the softirq raise latency.
> + */
> +DEFINE_EVENT(softirq, softirq_raise,
> +
> +	TP_PROTO(unsigned int nr),
> +
> +	TP_ARGS(nr)
>  );
>  
>  /**
>   * softirq_entry - called immediately before the softirq handler
> - * @h: pointer to struct softirq_action
> - * @vec: pointer to first struct softirq_action in softirq_vec array
> + * @nr: softirq vector number
>   *
> - * The @h parameter, contains a pointer to the struct softirq_action
> - * which has a pointer to the action handler that is called. By subtracting
> - * the @vec pointer from the @h pointer, we can determine the softirq
> - * number. Also, when used in combination with the softirq_exit tracepoint
> + * Tracepoint for tracing when softirq action starts.
> + * Also, when used in combination with the softirq_exit tracepoint
>   * we can determine the softirq latency.
>   */
>  DEFINE_EVENT(softirq, softirq_entry,
>  
> -	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
> +	TP_PROTO(unsigned int nr),
>  
> -	TP_ARGS(h, vec)
> +	TP_ARGS(nr)
>  );
>  
>  /**
>   * softirq_exit - called immediately after the softirq handler returns
> - * @h: pointer to struct softirq_action
> - * @vec: pointer to first struct softirq_action in softirq_vec array
> + * @nr: softirq vector number
>   *
> - * The @h parameter contains a pointer to the struct softirq_action
> - * that has handled the softirq. By subtracting the @vec pointer from
> - * the @h pointer, we can determine the softirq number. Also, when used in
> - * combination with the softirq_entry tracepoint we can determine the softirq
> - * latency.
> + * Tracepoint for tracing when softirq action ends.
> + * Also, when used in combination with the softirq_entry tracepoint
> + * we can determine the softirq latency.
>   */
>  DEFINE_EVENT(softirq, softirq_exit,
>  
> -	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
> +	TP_PROTO(unsigned int nr),
>  
> -	TP_ARGS(h, vec)
> +	TP_ARGS(nr)
>  );
>  
>  #endif /*  _TRACE_IRQ_H */
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 825e112..6790599 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -215,9 +215,9 @@ restart:
>  			int prev_count = preempt_count();
>  			kstat_incr_softirqs_this_cpu(h - softirq_vec);
>  
> -			trace_softirq_entry(h, softirq_vec);
> +			trace_softirq_entry(h - softirq_vec);
>  			h->action(h);
> -			trace_softirq_exit(h, softirq_vec);
> +			trace_softirq_exit(h - softirq_vec);

You're loosing information here by reducing the numbers of parameters in this
tracepoint.  How many other tracepoint scripts rely on having both pointers
handy?  Why not just do the pointer math inside your tracehook instead?

>  			if (unlikely(prev_count != preempt_count())) {
>  				printk(KERN_ERR "huh, entered softirq %td %s %p"
>  				       "with preempt_count %08x,"
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v3 2/5] napi: convert trace_napi_poll to TRACE_EVENT
  2010-07-20  0:46 ` [RFC PATCH v3 2/5] napi: convert trace_napi_poll to TRACE_EVENT Koki Sanagi
@ 2010-07-20 11:09   ` Neil Horman
  2010-07-21  7:00     ` Koki Sanagi
  0 siblings, 1 reply; 25+ messages in thread
From: Neil Horman @ 2010-07-20 11:09 UTC (permalink / raw)
  To: Koki Sanagi
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, laijs, scott.a.mcmillan, rostedt, eric.dumazet,
	fweisbec, mathieu.desnoyers

On Tue, Jul 20, 2010 at 09:46:51AM +0900, Koki Sanagi wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> 
> This patch converts trace_napi_poll from DECLARE_EVENT to TRACE_EVENT to improve
> the usability of napi_poll tracepoint.
> 
>           <idle>-0     [001] 241302.750777: napi_poll: napi poll on napi struct f6acc480 for device eth3
>           <idle>-0     [000] 241302.852389: napi_poll: napi poll on napi struct f5d0d70c for device eth1
> 
> An original patch is below.
> http://marc.info/?l=linux-kernel&m=126021713809450&w=2
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> 
> And add a fix by Steven Rostedt.
> http://marc.info/?l=linux-kernel&m=126150506519173&w=2
> 
> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> ---
>  include/trace/events/napi.h |   25 +++++++++++++++++++++++--
>  1 files changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/include/trace/events/napi.h b/include/trace/events/napi.h
> index 188deca..8fe1e93 100644
> --- a/include/trace/events/napi.h
> +++ b/include/trace/events/napi.h
> @@ -6,10 +6,31 @@
>  
>  #include <linux/netdevice.h>
>  #include <linux/tracepoint.h>
> +#include <linux/ftrace.h>
> +
> +#define NO_DEV "(no_device)"
> +
> +TRACE_EVENT(napi_poll,
>  
> -DECLARE_TRACE(napi_poll,
>  	TP_PROTO(struct napi_struct *napi),
> -	TP_ARGS(napi));
> +
> +	TP_ARGS(napi),
> +
> +	TP_STRUCT__entry(
> +		__field(	struct napi_struct *,	napi)
> +		__string(	dev_name, napi->dev ? napi->dev->name : NO_DEV)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->napi = napi;
> +		__assign_str(dev_name, napi->dev ? napi->dev->name : NO_DEV);
> +	),
> +
> +	TP_printk("napi poll on napi struct %p for device %s",
> +		__entry->napi, __get_str(dev_name))
> +);
> +
> +#undef NO_DEV
>  
>  #endif /* _TRACE_NAPI_H_ */
>  
> 
NAK, This change will create a build break in the drop monitor code.  You'll
need to fix that up if you want to make this change.
Neil


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v3 3/5] netdev: add tracepoints to netdev layer
  2010-07-20  0:47 ` [RFC PATCH v3 3/5] netdev: add tracepoints to netdev layer Koki Sanagi
@ 2010-07-20 11:41   ` Neil Horman
  2010-07-21  7:01     ` Koki Sanagi
  0 siblings, 1 reply; 25+ messages in thread
From: Neil Horman @ 2010-07-20 11:41 UTC (permalink / raw)
  To: Koki Sanagi
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, laijs, scott.a.mcmillan, rostedt, eric.dumazet,
	fweisbec, mathieu.desnoyers

On Tue, Jul 20, 2010 at 09:47:59AM +0900, Koki Sanagi wrote:
> This patch adds tracepoint to dev_queue_xmit, dev_hard_start_xmit and
> netif_receive_skb. These tracepoints help you to monitor network driver's
> input/output.
> 
>             sshd-4445  [001] 241367.066046: net_dev_queue: dev=eth3 skbaddr=dd6b2538 len=114
>             sshd-4445  [001] 241367.066047: net_dev_xmit: dev=eth3 skbaddr=dd6b2538 len=114 rc=0
>           <idle>-0     [001] 241367.067472: net_dev_receive: dev=eth3 skbaddr=f5e59000 len=52
> 
> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> ---
>  include/trace/events/net.h |   75 ++++++++++++++++++++++++++++++++++++++++++++
>  net/core/dev.c             |    5 +++
>  net/core/net-traces.c      |    1 +
>  3 files changed, 81 insertions(+), 0 deletions(-)
> 
> diff --git a/include/trace/events/net.h b/include/trace/events/net.h
> new file mode 100644
> index 0000000..8a21361
> --- /dev/null
> +++ b/include/trace/events/net.h
> @@ -0,0 +1,75 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM net
> +
> +#if !defined(_TRACE_NET_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_NET_H
> +
> +#include <linux/skbuff.h>
> +#include <linux/netdevice.h>
> +#include <linux/ip.h>
> +#include <linux/tracepoint.h>
> +
> +TRACE_EVENT(net_dev_xmit,
> +
> +	TP_PROTO(struct sk_buff *skb,
> +		 int rc),
> +
> +	TP_ARGS(skb, rc),
> +
> +	TP_STRUCT__entry(
> +		__field(	void *,		skbaddr		)
> +		__field(	unsigned int,	len		)
> +		__field(	int,		rc		)
> +		__string(	name,		skb->dev->name	)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->skbaddr = skb;
> +		__entry->len = skb->len;
> +		__entry->rc = rc;
> +		__assign_str(name, skb->dev->name);
> +	),
> +
> +	TP_printk("dev=%s skbaddr=%p len=%u rc=%d",
> +		__get_str(name), __entry->skbaddr, __entry->len, __entry->rc)
> +);
> +
> +DECLARE_EVENT_CLASS(net_dev_template,
> +
> +	TP_PROTO(struct sk_buff *skb),
> +
> +	TP_ARGS(skb),
> +
> +	TP_STRUCT__entry(
> +		__field(	void *,		skbaddr		)
> +		__field(	unsigned int,	len		)
> +		__string(	name,		skb->dev->name	)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->skbaddr = skb;
> +		__entry->len = skb->len;
> +		__assign_str(name, skb->dev->name);
> +	),
> +
> +	TP_printk("dev=%s skbaddr=%p len=%u",
> +		__get_str(name), __entry->skbaddr, __entry->len)
> +)
> +
> +DEFINE_EVENT(net_dev_template, net_dev_queue,
> +
> +	TP_PROTO(struct sk_buff *skb),
> +
> +	TP_ARGS(skb)
> +);
> +
> +DEFINE_EVENT(net_dev_template, net_dev_receive,
> +
> +	TP_PROTO(struct sk_buff *skb),
> +
> +	TP_ARGS(skb)
> +);
> +#endif /* _TRACE_NET_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 93b8929..4acfec6 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -130,6 +130,7 @@
>  #include <linux/jhash.h>
>  #include <linux/random.h>
>  #include <trace/events/napi.h>
> +#include <trace/events/net.h>
>  #include <linux/pci.h>
>  
>  #include "net-sysfs.h"
> @@ -1955,6 +1956,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>  		}
>  
>  		rc = ops->ndo_start_xmit(skb, dev);
> +		trace_net_dev_xmit(skb, rc);
>  		if (rc == NETDEV_TX_OK)
>  			txq_trans_update(txq);
>  		return rc;
> @@ -1975,6 +1977,7 @@ gso:
>  			skb_dst_drop(nskb);
>  
>  		rc = ops->ndo_start_xmit(nskb, dev);
> +		trace_net_dev_xmit(nskb, rc);
>  		if (unlikely(rc != NETDEV_TX_OK)) {
>  			if (rc & ~NETDEV_TX_MASK)
>  				goto out_kfree_gso_skb;
> @@ -2165,6 +2168,7 @@ int dev_queue_xmit(struct sk_buff *skb)
>  #ifdef CONFIG_NET_CLS_ACT
>  	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
>  #endif
> +	trace_net_dev_queue(skb);
>  	if (q->enqueue) {
>  		rc = __dev_xmit_skb(skb, q, dev, txq);
>  		goto out;
> @@ -2939,6 +2943,7 @@ int netif_receive_skb(struct sk_buff *skb)
>  	if (netdev_tstamp_prequeue)
>  		net_timestamp_check(skb);
>  
> +	trace_net_dev_receive(skb);

I imagine for completeness you'll want to make a call to this in netif_rx and
netif_rx_ni as well.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb
  2010-07-20  0:49 ` [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb Koki Sanagi
  2010-07-20  4:54   ` Eric Dumazet
@ 2010-07-20 11:50   ` Neil Horman
  2010-07-21  7:02     ` Koki Sanagi
  1 sibling, 1 reply; 25+ messages in thread
From: Neil Horman @ 2010-07-20 11:50 UTC (permalink / raw)
  To: Koki Sanagi
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, laijs, scott.a.mcmillan, rostedt, eric.dumazet,
	fweisbec, mathieu.desnoyers

On Tue, Jul 20, 2010 at 09:49:10AM +0900, Koki Sanagi wrote:
> [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb
> This patch adds tracepoint to consume_skb, dev_kfree_skb_irq and
> skb_free_datagram_locked. Combinating with tracepoint on dev_hard_start_xmit,
> we can check how long it takes to free transmited packets. And using it, we can
> calculate how many packets driver had at that time. It is useful when a drop of
> transmited packet is a problem.
> 
>           <idle>-0     [001] 241409.218333: consume_skb: skbaddr=dd6b2fb8
>           <idle>-0     [001] 241409.490555: dev_kfree_skb_irq: skbaddr=f5e29840
> 
>         udp-recv-302   [001] 515031.206008: skb_free_datagram_locked: skbaddr=f5b1d900
> 
> 
> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> ---
>  include/trace/events/skb.h |   42 ++++++++++++++++++++++++++++++++++++++++++
>  net/core/datagram.c        |    1 +
>  net/core/dev.c             |    2 ++
>  net/core/skbuff.c          |    1 +
>  4 files changed, 46 insertions(+), 0 deletions(-)
> 
> diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> index 4b2be6d..84c9041 100644
> --- a/include/trace/events/skb.h
> +++ b/include/trace/events/skb.h
> @@ -35,6 +35,48 @@ TRACE_EVENT(kfree_skb,
>  		__entry->skbaddr, __entry->protocol, __entry->location)
>  );
>  
> +DECLARE_EVENT_CLASS(free_skb,
> +
> +	TP_PROTO(struct sk_buff *skb),
> +
> +	TP_ARGS(skb),
> +
> +	TP_STRUCT__entry(
> +		__field(	void *,	skbaddr	)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->skbaddr = skb;
> +	),
> +
> +	TP_printk("skbaddr=%p", __entry->skbaddr)
> +
> +);
> +
> +DEFINE_EVENT(free_skb, consume_skb,
> +
> +	TP_PROTO(struct sk_buff *skb),
> +
> +	TP_ARGS(skb)
> +
> +);
> +
> +DEFINE_EVENT(free_skb, dev_kfree_skb_irq,
> +
> +	TP_PROTO(struct sk_buff *skb),
> +
> +	TP_ARGS(skb)
> +
> +);
> +
> +DEFINE_EVENT(free_skb, skb_free_datagram_locked,
> +
> +	TP_PROTO(struct sk_buff *skb),
> +
> +	TP_ARGS(skb)
> +
> +);
> +

Why create these last two tracepoints at all?  dev_kfree_skb_irq will eventually
pass through kfree_skb anyway, getting picked up by the tracepoint there, the
while the latter won't (since it uses __kfree_skb instead), I think that could
be fixed up by add a call to trace_kfree_skb there directly, saving you two
tracepoints.

Neil

> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v3 1/5] irq: add tracepoint to softirq_raise
  2010-07-20 11:04   ` Neil Horman
@ 2010-07-21  6:57     ` Koki Sanagi
  2010-07-21 11:14       ` Neil Horman
  0 siblings, 1 reply; 25+ messages in thread
From: Koki Sanagi @ 2010-07-21  6:57 UTC (permalink / raw)
  To: Neil Horman
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, laijs, scott.a.mcmillan, rostedt, eric.dumazet,
	fweisbec, mathieu.desnoyers

(2010/07/20 20:04), Neil Horman wrote:
> On Tue, Jul 20, 2010 at 09:45:31AM +0900, Koki Sanagi wrote:
>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>>
>> Add a tracepoint for tracing when softirq action is raised.
>>
>> It and the existed tracepoints complete softirq's tracepoints:
>> softirq_raise, softirq_entry and softirq_exit.
>>
>> And when this tracepoint is used in combination with
>> the softirq_entry tracepoint we can determine
>> the softirq raise latency.
>>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
>> Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
>>
>> [ factorize softirq events with DECLARE_EVENT_CLASS ]
>> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
>> ---
>>  include/linux/interrupt.h  |    8 +++++-
>>  include/trace/events/irq.h |   57 ++++++++++++++++++++++++++-----------------
>>  kernel/softirq.c           |    4 +-
>>  3 files changed, 43 insertions(+), 26 deletions(-)
>>
>> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
>> index c233113..1cb5726 100644
>> --- a/include/linux/interrupt.h
>> +++ b/include/linux/interrupt.h
>> @@ -18,6 +18,7 @@
>>  #include <asm/atomic.h>
>>  #include <asm/ptrace.h>
>>  #include <asm/system.h>
>> +#include <trace/events/irq.h>
>>  
>>  /*
>>   * These correspond to the IORESOURCE_IRQ_* defines in
>> @@ -402,7 +403,12 @@ asmlinkage void do_softirq(void);
>>  asmlinkage void __do_softirq(void);
>>  extern void open_softirq(int nr, void (*action)(struct softirq_action *));
>>  extern void softirq_init(void);
>> -#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
>> +static inline void __raise_softirq_irqoff(unsigned int nr)
>> +{
>> +	trace_softirq_raise(nr);
>> +	or_softirq_pending(1UL << nr);
>> +}
>> +
> We already have tracepoints in irq_enter and irq_exit.  If the goal here is to
> detect latency during packet processing, cant the delta in time between those
> two points be used to determine interrupt handling latency?

Certainly, the time between irq_entry and irq_exit is not directly related to
latency during packet processing. But it's indirectly related it.
Because softirq_entry isn't passed until irq exits and softirq_entry time is
related to packet processing latency. So I show it as a reference.

> 
> 
>>  extern void raise_softirq_irqoff(unsigned int nr);
>>  extern void raise_softirq(unsigned int nr);
>>  extern void wakeup_softirqd(void);
>> diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
>> index 0e4cfb6..717744c 100644
>> --- a/include/trace/events/irq.h
>> +++ b/include/trace/events/irq.h
>> @@ -5,7 +5,9 @@
>>  #define _TRACE_IRQ_H
>>  
>>  #include <linux/tracepoint.h>
>> -#include <linux/interrupt.h>
>> +
>> +struct irqaction;
>> +struct softirq_action;
>>  
>>  #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
>>  #define show_softirq_name(val)				\
>> @@ -84,56 +86,65 @@ TRACE_EVENT(irq_handler_exit,
>>  
>>  DECLARE_EVENT_CLASS(softirq,
>>  
>> -	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
>> +	TP_PROTO(unsigned int nr),
>>  
>> -	TP_ARGS(h, vec),
>> +	TP_ARGS(nr),
>>  
>>  	TP_STRUCT__entry(
>> -		__field(	int,	vec			)
>> +		__field(	unsigned int,	vec	)
>>  	),
>>  
>>  	TP_fast_assign(
>> -		__entry->vec = (int)(h - vec);
>> +		__entry->vec	= nr;
>>  	),
>>  
>>  	TP_printk("vec=%d [action=%s]", __entry->vec,
>> -		  show_softirq_name(__entry->vec))
>> +		show_softirq_name(__entry->vec))
>> +);
>> +
>> +/**
>> + * softirq_raise - called immediately when a softirq is raised
>> + * @nr: softirq vector number
>> + *
>> + * Tracepoint for tracing when softirq action is raised.
>> + * Also, when used in combination with the softirq_entry tracepoint
>> + * we can determine the softirq raise latency.
>> + */
>> +DEFINE_EVENT(softirq, softirq_raise,
>> +
>> +	TP_PROTO(unsigned int nr),
>> +
>> +	TP_ARGS(nr)
>>  );
>>  
>>  /**
>>   * softirq_entry - called immediately before the softirq handler
>> - * @h: pointer to struct softirq_action
>> - * @vec: pointer to first struct softirq_action in softirq_vec array
>> + * @nr: softirq vector number
>>   *
>> - * The @h parameter, contains a pointer to the struct softirq_action
>> - * which has a pointer to the action handler that is called. By subtracting
>> - * the @vec pointer from the @h pointer, we can determine the softirq
>> - * number. Also, when used in combination with the softirq_exit tracepoint
>> + * Tracepoint for tracing when softirq action starts.
>> + * Also, when used in combination with the softirq_exit tracepoint
>>   * we can determine the softirq latency.
>>   */
>>  DEFINE_EVENT(softirq, softirq_entry,
>>  
>> -	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
>> +	TP_PROTO(unsigned int nr),
>>  
>> -	TP_ARGS(h, vec)
>> +	TP_ARGS(nr)
>>  );
>>  
>>  /**
>>   * softirq_exit - called immediately after the softirq handler returns
>> - * @h: pointer to struct softirq_action
>> - * @vec: pointer to first struct softirq_action in softirq_vec array
>> + * @nr: softirq vector number
>>   *
>> - * The @h parameter contains a pointer to the struct softirq_action
>> - * that has handled the softirq. By subtracting the @vec pointer from
>> - * the @h pointer, we can determine the softirq number. Also, when used in
>> - * combination with the softirq_entry tracepoint we can determine the softirq
>> - * latency.
>> + * Tracepoint for tracing when softirq action ends.
>> + * Also, when used in combination with the softirq_entry tracepoint
>> + * we can determine the softirq latency.
>>   */
>>  DEFINE_EVENT(softirq, softirq_exit,
>>  
>> -	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
>> +	TP_PROTO(unsigned int nr),
>>  
>> -	TP_ARGS(h, vec)
>> +	TP_ARGS(nr)
>>  );
>>  
>>  #endif /*  _TRACE_IRQ_H */
>> diff --git a/kernel/softirq.c b/kernel/softirq.c
>> index 825e112..6790599 100644
>> --- a/kernel/softirq.c
>> +++ b/kernel/softirq.c
>> @@ -215,9 +215,9 @@ restart:
>>  			int prev_count = preempt_count();
>>  			kstat_incr_softirqs_this_cpu(h - softirq_vec);
>>  
>> -			trace_softirq_entry(h, softirq_vec);
>> +			trace_softirq_entry(h - softirq_vec);
>>  			h->action(h);
>> -			trace_softirq_exit(h, softirq_vec);
>> +			trace_softirq_exit(h - softirq_vec);
> 
> You're loosing information here by reducing the numbers of parameters in this
> tracepoint.  How many other tracepoint scripts rely on having both pointers
> handy?  Why not just do the pointer math inside your tracehook instead?

In __raise_softirq_irqoff macro there is no method to refer softirq_vec, so it
can't use softirq DECLARE_EVENT_CLASS as is.
Currently,  there is no script using softirq_entry or softirq_exit.

Thanks,
Koki Sanagi.

> 
>>  			if (unlikely(prev_count != preempt_count())) {
>>  				printk(KERN_ERR "huh, entered softirq %td %s %p"
>>  				       "with preempt_count %08x,"
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 
> 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v3 2/5] napi: convert trace_napi_poll to TRACE_EVENT
  2010-07-20 11:09   ` Neil Horman
@ 2010-07-21  7:00     ` Koki Sanagi
  2010-07-21 11:24       ` Neil Horman
  0 siblings, 1 reply; 25+ messages in thread
From: Koki Sanagi @ 2010-07-21  7:00 UTC (permalink / raw)
  To: Neil Horman
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, laijs, scott.a.mcmillan, rostedt, eric.dumazet,
	fweisbec, mathieu.desnoyers

(2010/07/20 20:09), Neil Horman wrote:
> On Tue, Jul 20, 2010 at 09:46:51AM +0900, Koki Sanagi wrote:
>> From: Neil Horman <nhorman@tuxdriver.com>
>>
>> This patch converts trace_napi_poll from DECLARE_EVENT to TRACE_EVENT to improve
>> the usability of napi_poll tracepoint.
>>
>>           <idle>-0     [001] 241302.750777: napi_poll: napi poll on napi struct f6acc480 for device eth3
>>           <idle>-0     [000] 241302.852389: napi_poll: napi poll on napi struct f5d0d70c for device eth1
>>
>> An original patch is below.
>> http://marc.info/?l=linux-kernel&m=126021713809450&w=2
>> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>>
>> And add a fix by Steven Rostedt.
>> http://marc.info/?l=linux-kernel&m=126150506519173&w=2
>>
>> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
>> ---
>>  include/trace/events/napi.h |   25 +++++++++++++++++++++++--
>>  1 files changed, 23 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/trace/events/napi.h b/include/trace/events/napi.h
>> index 188deca..8fe1e93 100644
>> --- a/include/trace/events/napi.h
>> +++ b/include/trace/events/napi.h
>> @@ -6,10 +6,31 @@
>>  
>>  #include <linux/netdevice.h>
>>  #include <linux/tracepoint.h>
>> +#include <linux/ftrace.h>
>> +
>> +#define NO_DEV "(no_device)"
>> +
>> +TRACE_EVENT(napi_poll,
>>  
>> -DECLARE_TRACE(napi_poll,
>>  	TP_PROTO(struct napi_struct *napi),
>> -	TP_ARGS(napi));
>> +
>> +	TP_ARGS(napi),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(	struct napi_struct *,	napi)
>> +		__string(	dev_name, napi->dev ? napi->dev->name : NO_DEV)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->napi = napi;
>> +		__assign_str(dev_name, napi->dev ? napi->dev->name : NO_DEV);
>> +	),
>> +
>> +	TP_printk("napi poll on napi struct %p for device %s",
>> +		__entry->napi, __get_str(dev_name))
>> +);
>> +
>> +#undef NO_DEV
>>  
>>  #endif /* _TRACE_NAPI_H_ */
>>  
>>
> NAK, This change will create a build break in the drop monitor code.  You'll
> need to fix that up if you want to make this change.
> Neil
> 
I built a kernel with CONFIG_NET_DROP_MONITOR=y.
But build break did not occur.

Thanks,
Koki Sanagi.

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v3 3/5] netdev: add tracepoints to netdev layer
  2010-07-20 11:41   ` Neil Horman
@ 2010-07-21  7:01     ` Koki Sanagi
  0 siblings, 0 replies; 25+ messages in thread
From: Koki Sanagi @ 2010-07-21  7:01 UTC (permalink / raw)
  To: Neil Horman
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, laijs, scott.a.mcmillan, rostedt, eric.dumazet,
	fweisbec, mathieu.desnoyers

(2010/07/20 20:41), Neil Horman wrote:
> On Tue, Jul 20, 2010 at 09:47:59AM +0900, Koki Sanagi wrote:
>> This patch adds tracepoint to dev_queue_xmit, dev_hard_start_xmit and
>> netif_receive_skb. These tracepoints help you to monitor network driver's
>> input/output.
>>
>>             sshd-4445  [001] 241367.066046: net_dev_queue: dev=eth3 skbaddr=dd6b2538 len=114
>>             sshd-4445  [001] 241367.066047: net_dev_xmit: dev=eth3 skbaddr=dd6b2538 len=114 rc=0
>>           <idle>-0     [001] 241367.067472: net_dev_receive: dev=eth3 skbaddr=f5e59000 len=52
>>
>> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
>> ---
>>  include/trace/events/net.h |   75 ++++++++++++++++++++++++++++++++++++++++++++
>>  net/core/dev.c             |    5 +++
>>  net/core/net-traces.c      |    1 +
>>  3 files changed, 81 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/trace/events/net.h b/include/trace/events/net.h
>> new file mode 100644
>> index 0000000..8a21361
>> --- /dev/null
>> +++ b/include/trace/events/net.h
>> @@ -0,0 +1,75 @@
>> +#undef TRACE_SYSTEM
>> +#define TRACE_SYSTEM net
>> +
>> +#if !defined(_TRACE_NET_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#define _TRACE_NET_H
>> +
>> +#include <linux/skbuff.h>
>> +#include <linux/netdevice.h>
>> +#include <linux/ip.h>
>> +#include <linux/tracepoint.h>
>> +
>> +TRACE_EVENT(net_dev_xmit,
>> +
>> +	TP_PROTO(struct sk_buff *skb,
>> +		 int rc),
>> +
>> +	TP_ARGS(skb, rc),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(	void *,		skbaddr		)
>> +		__field(	unsigned int,	len		)
>> +		__field(	int,		rc		)
>> +		__string(	name,		skb->dev->name	)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->skbaddr = skb;
>> +		__entry->len = skb->len;
>> +		__entry->rc = rc;
>> +		__assign_str(name, skb->dev->name);
>> +	),
>> +
>> +	TP_printk("dev=%s skbaddr=%p len=%u rc=%d",
>> +		__get_str(name), __entry->skbaddr, __entry->len, __entry->rc)
>> +);
>> +
>> +DECLARE_EVENT_CLASS(net_dev_template,
>> +
>> +	TP_PROTO(struct sk_buff *skb),
>> +
>> +	TP_ARGS(skb),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(	void *,		skbaddr		)
>> +		__field(	unsigned int,	len		)
>> +		__string(	name,		skb->dev->name	)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->skbaddr = skb;
>> +		__entry->len = skb->len;
>> +		__assign_str(name, skb->dev->name);
>> +	),
>> +
>> +	TP_printk("dev=%s skbaddr=%p len=%u",
>> +		__get_str(name), __entry->skbaddr, __entry->len)
>> +)
>> +
>> +DEFINE_EVENT(net_dev_template, net_dev_queue,
>> +
>> +	TP_PROTO(struct sk_buff *skb),
>> +
>> +	TP_ARGS(skb)
>> +);
>> +
>> +DEFINE_EVENT(net_dev_template, net_dev_receive,
>> +
>> +	TP_PROTO(struct sk_buff *skb),
>> +
>> +	TP_ARGS(skb)
>> +);
>> +#endif /* _TRACE_NET_H */
>> +
>> +/* This part must be outside protection */
>> +#include <trace/define_trace.h>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 93b8929..4acfec6 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -130,6 +130,7 @@
>>  #include <linux/jhash.h>
>>  #include <linux/random.h>
>>  #include <trace/events/napi.h>
>> +#include <trace/events/net.h>
>>  #include <linux/pci.h>
>>  
>>  #include "net-sysfs.h"
>> @@ -1955,6 +1956,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>>  		}
>>  
>>  		rc = ops->ndo_start_xmit(skb, dev);
>> +		trace_net_dev_xmit(skb, rc);
>>  		if (rc == NETDEV_TX_OK)
>>  			txq_trans_update(txq);
>>  		return rc;
>> @@ -1975,6 +1977,7 @@ gso:
>>  			skb_dst_drop(nskb);
>>  
>>  		rc = ops->ndo_start_xmit(nskb, dev);
>> +		trace_net_dev_xmit(nskb, rc);
>>  		if (unlikely(rc != NETDEV_TX_OK)) {
>>  			if (rc & ~NETDEV_TX_MASK)
>>  				goto out_kfree_gso_skb;
>> @@ -2165,6 +2168,7 @@ int dev_queue_xmit(struct sk_buff *skb)
>>  #ifdef CONFIG_NET_CLS_ACT
>>  	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
>>  #endif
>> +	trace_net_dev_queue(skb);
>>  	if (q->enqueue) {
>>  		rc = __dev_xmit_skb(skb, q, dev, txq);
>>  		goto out;
>> @@ -2939,6 +2943,7 @@ int netif_receive_skb(struct sk_buff *skb)
>>  	if (netdev_tstamp_prequeue)
>>  		net_timestamp_check(skb);
>>  
>> +	trace_net_dev_receive(skb);
> 
> I imagine for completeness you'll want to make a call to this in netif_rx and
> netif_rx_ni as well.
> 
It might be better to add tracepoint to netif_rx to determine a time between
netif_rx and netif_receive_skb.

Thanks,
Koki Sanagi.

> 
> 
> 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb
  2010-07-20 11:50   ` Neil Horman
@ 2010-07-21  7:02     ` Koki Sanagi
  2010-07-21 10:56       ` Neil Horman
  0 siblings, 1 reply; 25+ messages in thread
From: Koki Sanagi @ 2010-07-21  7:02 UTC (permalink / raw)
  To: Neil Horman
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, laijs, scott.a.mcmillan, rostedt, eric.dumazet,
	fweisbec, mathieu.desnoyers

(2010/07/20 20:50), Neil Horman wrote:
> On Tue, Jul 20, 2010 at 09:49:10AM +0900, Koki Sanagi wrote:
>> [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb
>> This patch adds tracepoint to consume_skb, dev_kfree_skb_irq and
>> skb_free_datagram_locked. Combinating with tracepoint on dev_hard_start_xmit,
>> we can check how long it takes to free transmited packets. And using it, we can
>> calculate how many packets driver had at that time. It is useful when a drop of
>> transmited packet is a problem.
>>
>>           <idle>-0     [001] 241409.218333: consume_skb: skbaddr=dd6b2fb8
>>           <idle>-0     [001] 241409.490555: dev_kfree_skb_irq: skbaddr=f5e29840
>>
>>         udp-recv-302   [001] 515031.206008: skb_free_datagram_locked: skbaddr=f5b1d900
>>
>>
>> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
>> ---
>>  include/trace/events/skb.h |   42 ++++++++++++++++++++++++++++++++++++++++++
>>  net/core/datagram.c        |    1 +
>>  net/core/dev.c             |    2 ++
>>  net/core/skbuff.c          |    1 +
>>  4 files changed, 46 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
>> index 4b2be6d..84c9041 100644
>> --- a/include/trace/events/skb.h
>> +++ b/include/trace/events/skb.h
>> @@ -35,6 +35,48 @@ TRACE_EVENT(kfree_skb,
>>  		__entry->skbaddr, __entry->protocol, __entry->location)
>>  );
>>  
>> +DECLARE_EVENT_CLASS(free_skb,
>> +
>> +	TP_PROTO(struct sk_buff *skb),
>> +
>> +	TP_ARGS(skb),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(	void *,	skbaddr	)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->skbaddr = skb;
>> +	),
>> +
>> +	TP_printk("skbaddr=%p", __entry->skbaddr)
>> +
>> +);
>> +
>> +DEFINE_EVENT(free_skb, consume_skb,
>> +
>> +	TP_PROTO(struct sk_buff *skb),
>> +
>> +	TP_ARGS(skb)
>> +
>> +);
>> +
>> +DEFINE_EVENT(free_skb, dev_kfree_skb_irq,
>> +
>> +	TP_PROTO(struct sk_buff *skb),
>> +
>> +	TP_ARGS(skb)
>> +
>> +);
>> +
>> +DEFINE_EVENT(free_skb, skb_free_datagram_locked,
>> +
>> +	TP_PROTO(struct sk_buff *skb),
>> +
>> +	TP_ARGS(skb)
>> +
>> +);
>> +
> 
> Why create these last two tracepoints at all?  dev_kfree_skb_irq will eventually
> pass through kfree_skb anyway, getting picked up by the tracepoint there, the
> while the latter won't (since it uses __kfree_skb instead), I think that could
> be fixed up by add a call to trace_kfree_skb there directly, saving you two
> tracepoints.
> 
> Neil
> 
I think dev_kfree_skb_irq isn't chased by trace_kfree_skb or trace_consume_skb
completely. Because net_tx_action frees skb by __kfree_skb. So it is better to
add trace_kfree_skb before it. skb_free_datagram_locked is same.

Thanks,
Koki Sanagi.
>>
> 
> 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb
  2010-07-21  7:02     ` Koki Sanagi
@ 2010-07-21 10:56       ` Neil Horman
  2010-07-22  8:39         ` Koki Sanagi
  0 siblings, 1 reply; 25+ messages in thread
From: Neil Horman @ 2010-07-21 10:56 UTC (permalink / raw)
  To: Koki Sanagi
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, laijs, scott.a.mcmillan, rostedt, eric.dumazet,
	fweisbec, mathieu.desnoyers

On Wed, Jul 21, 2010 at 04:02:57PM +0900, Koki Sanagi wrote:
> (2010/07/20 20:50), Neil Horman wrote:
> > On Tue, Jul 20, 2010 at 09:49:10AM +0900, Koki Sanagi wrote:
> >> [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb
> >> This patch adds tracepoint to consume_skb, dev_kfree_skb_irq and
> >> skb_free_datagram_locked. Combinating with tracepoint on dev_hard_start_xmit,
> >> we can check how long it takes to free transmited packets. And using it, we can
> >> calculate how many packets driver had at that time. It is useful when a drop of
> >> transmited packet is a problem.
> >>
> >>           <idle>-0     [001] 241409.218333: consume_skb: skbaddr=dd6b2fb8
> >>           <idle>-0     [001] 241409.490555: dev_kfree_skb_irq: skbaddr=f5e29840
> >>
> >>         udp-recv-302   [001] 515031.206008: skb_free_datagram_locked: skbaddr=f5b1d900
> >>
> >>
> >> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> >> ---
> >>  include/trace/events/skb.h |   42 ++++++++++++++++++++++++++++++++++++++++++
> >>  net/core/datagram.c        |    1 +
> >>  net/core/dev.c             |    2 ++
> >>  net/core/skbuff.c          |    1 +
> >>  4 files changed, 46 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> >> index 4b2be6d..84c9041 100644
> >> --- a/include/trace/events/skb.h
> >> +++ b/include/trace/events/skb.h
> >> @@ -35,6 +35,48 @@ TRACE_EVENT(kfree_skb,
> >>  		__entry->skbaddr, __entry->protocol, __entry->location)
> >>  );
> >>  
> >> +DECLARE_EVENT_CLASS(free_skb,
> >> +
> >> +	TP_PROTO(struct sk_buff *skb),
> >> +
> >> +	TP_ARGS(skb),
> >> +
> >> +	TP_STRUCT__entry(
> >> +		__field(	void *,	skbaddr	)
> >> +	),
> >> +
> >> +	TP_fast_assign(
> >> +		__entry->skbaddr = skb;
> >> +	),
> >> +
> >> +	TP_printk("skbaddr=%p", __entry->skbaddr)
> >> +
> >> +);
> >> +
> >> +DEFINE_EVENT(free_skb, consume_skb,
> >> +
> >> +	TP_PROTO(struct sk_buff *skb),
> >> +
> >> +	TP_ARGS(skb)
> >> +
> >> +);
> >> +
> >> +DEFINE_EVENT(free_skb, dev_kfree_skb_irq,
> >> +
> >> +	TP_PROTO(struct sk_buff *skb),
> >> +
> >> +	TP_ARGS(skb)
> >> +
> >> +);
> >> +
> >> +DEFINE_EVENT(free_skb, skb_free_datagram_locked,
> >> +
> >> +	TP_PROTO(struct sk_buff *skb),
> >> +
> >> +	TP_ARGS(skb)
> >> +
> >> +);
> >> +
> > 
> > Why create these last two tracepoints at all?  dev_kfree_skb_irq will eventually
> > pass through kfree_skb anyway, getting picked up by the tracepoint there, the
> > while the latter won't (since it uses __kfree_skb instead), I think that could
> > be fixed up by add a call to trace_kfree_skb there directly, saving you two
> > tracepoints.
> > 
> > Neil
> > 
> I think dev_kfree_skb_irq isn't chased by trace_kfree_skb or trace_consume_skb
> completely. Because net_tx_action frees skb by __kfree_skb. So it is better to
> add trace_kfree_skb before it. skb_free_datagram_locked is same.
> 
It isn't, you're right, but that was the point I made above.  Those missed areas
could be easily handled by adding calls to trace_kfree_skb which already exists,
to the missed areas.  Then you don't need to create those new tracepoints.  The
way your doing this, if someone wants to trace all skb frees in debugfs, they
would have to enable three tracepoints, not just one.  Not that thats the point
of your patch, but its something to consider, and it simplifies your code.
Neil

> Thanks,
> Koki Sanagi.
> >>
> > 
> > 
> 
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v3 1/5] irq: add tracepoint to softirq_raise
  2010-07-21  6:57     ` Koki Sanagi
@ 2010-07-21 11:14       ` Neil Horman
  2010-07-21 13:01         ` KOSAKI Motohiro
  2010-07-22  8:41         ` Koki Sanagi
  0 siblings, 2 replies; 25+ messages in thread
From: Neil Horman @ 2010-07-21 11:14 UTC (permalink / raw)
  To: Koki Sanagi
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, laijs, scott.a.mcmillan, rostedt, eric.dumazet,
	fweisbec, mathieu.desnoyers

On Wed, Jul 21, 2010 at 03:57:05PM +0900, Koki Sanagi wrote:
> (2010/07/20 20:04), Neil Horman wrote:
> > On Tue, Jul 20, 2010 at 09:45:31AM +0900, Koki Sanagi wrote:
> >> From: Lai Jiangshan <laijs@cn.fujitsu.com>
> >>
> >> Add a tracepoint for tracing when softirq action is raised.
> >>
> >> It and the existed tracepoints complete softirq's tracepoints:
> >> softirq_raise, softirq_entry and softirq_exit.
> >>
> >> And when this tracepoint is used in combination with
> >> the softirq_entry tracepoint we can determine
> >> the softirq raise latency.
> >>
> >> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> >> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> >> Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
> >>
> >> [ factorize softirq events with DECLARE_EVENT_CLASS ]
> >> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> >> ---
> >>  include/linux/interrupt.h  |    8 +++++-
> >>  include/trace/events/irq.h |   57 ++++++++++++++++++++++++++-----------------
> >>  kernel/softirq.c           |    4 +-
> >>  3 files changed, 43 insertions(+), 26 deletions(-)
> >>
> >> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> >> index c233113..1cb5726 100644
> >> --- a/include/linux/interrupt.h
> >> +++ b/include/linux/interrupt.h
> >> @@ -18,6 +18,7 @@
> >>  #include <asm/atomic.h>
> >>  #include <asm/ptrace.h>
> >>  #include <asm/system.h>
> >> +#include <trace/events/irq.h>
> >>  
> >>  /*
> >>   * These correspond to the IORESOURCE_IRQ_* defines in
> >> @@ -402,7 +403,12 @@ asmlinkage void do_softirq(void);
> >>  asmlinkage void __do_softirq(void);
> >>  extern void open_softirq(int nr, void (*action)(struct softirq_action *));
> >>  extern void softirq_init(void);
> >> -#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
> >> +static inline void __raise_softirq_irqoff(unsigned int nr)
> >> +{
> >> +	trace_softirq_raise(nr);
> >> +	or_softirq_pending(1UL << nr);
> >> +}
> >> +
> > We already have tracepoints in irq_enter and irq_exit.  If the goal here is to
> > detect latency during packet processing, cant the delta in time between those
> > two points be used to determine interrupt handling latency?
> 
> Certainly, the time between irq_entry and irq_exit is not directly related to
> latency during packet processing. But it's indirectly related it.
> Because softirq_entry isn't passed until irq exits and softirq_entry time is
> related to packet processing latency. So I show it as a reference.
> 
Its not directly related no, but look at it, the amount of processing between
irq_exit and softirq_entry is minimal.  The information you are trying to
extract by computing the delta from irq_entry to softirq_entry is almost exactly
the same as that from irq_entry to irq_exit.  For that matter, since you're
trying to guage lantency for packet processing, I expect you could get the same
delta by measuring irq_entry to napi_poll tracepoint time, and save the hassle
of needing to filter on softirq processing that doesn't relate to packet
processing.

> > 
> > 
> >>  extern void raise_softirq_irqoff(unsigned int nr);
> >>  extern void raise_softirq(unsigned int nr);
> >>  extern void wakeup_softirqd(void);
> >> diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
> >> index 0e4cfb6..717744c 100644
> >> --- a/include/trace/events/irq.h
> >> +++ b/include/trace/events/irq.h
> >> @@ -5,7 +5,9 @@
> >>  #define _TRACE_IRQ_H
> >>  
> >>  #include <linux/tracepoint.h>
> >> -#include <linux/interrupt.h>
> >> +
> >> +struct irqaction;
> >> +struct softirq_action;
> >>  
> >>  #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
> >>  #define show_softirq_name(val)				\
> >> @@ -84,56 +86,65 @@ TRACE_EVENT(irq_handler_exit,
> >>  
> >>  DECLARE_EVENT_CLASS(softirq,
> >>  
> >> -	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
> >> +	TP_PROTO(unsigned int nr),
> >>  
> >> -	TP_ARGS(h, vec),
> >> +	TP_ARGS(nr),
> >>  
> >>  	TP_STRUCT__entry(
> >> -		__field(	int,	vec			)
> >> +		__field(	unsigned int,	vec	)
> >>  	),
> >>  
> >>  	TP_fast_assign(
> >> -		__entry->vec = (int)(h - vec);
> >> +		__entry->vec	= nr;
> >>  	),
> >>  
> >>  	TP_printk("vec=%d [action=%s]", __entry->vec,
> >> -		  show_softirq_name(__entry->vec))
> >> +		show_softirq_name(__entry->vec))
> >> +);
> >> +
> >> +/**
> >> + * softirq_raise - called immediately when a softirq is raised
> >> + * @nr: softirq vector number
> >> + *
> >> + * Tracepoint for tracing when softirq action is raised.
> >> + * Also, when used in combination with the softirq_entry tracepoint
> >> + * we can determine the softirq raise latency.
> >> + */
> >> +DEFINE_EVENT(softirq, softirq_raise,
> >> +
> >> +	TP_PROTO(unsigned int nr),
> >> +
> >> +	TP_ARGS(nr)
> >>  );
> >>  
> >>  /**
> >>   * softirq_entry - called immediately before the softirq handler
> >> - * @h: pointer to struct softirq_action
> >> - * @vec: pointer to first struct softirq_action in softirq_vec array
> >> + * @nr: softirq vector number
> >>   *
> >> - * The @h parameter, contains a pointer to the struct softirq_action
> >> - * which has a pointer to the action handler that is called. By subtracting
> >> - * the @vec pointer from the @h pointer, we can determine the softirq
> >> - * number. Also, when used in combination with the softirq_exit tracepoint
> >> + * Tracepoint for tracing when softirq action starts.
> >> + * Also, when used in combination with the softirq_exit tracepoint
> >>   * we can determine the softirq latency.
> >>   */
> >>  DEFINE_EVENT(softirq, softirq_entry,
> >>  
> >> -	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
> >> +	TP_PROTO(unsigned int nr),
> >>  
> >> -	TP_ARGS(h, vec)
> >> +	TP_ARGS(nr)
> >>  );
> >>  
> >>  /**
> >>   * softirq_exit - called immediately after the softirq handler returns
> >> - * @h: pointer to struct softirq_action
> >> - * @vec: pointer to first struct softirq_action in softirq_vec array
> >> + * @nr: softirq vector number
> >>   *
> >> - * The @h parameter contains a pointer to the struct softirq_action
> >> - * that has handled the softirq. By subtracting the @vec pointer from
> >> - * the @h pointer, we can determine the softirq number. Also, when used in
> >> - * combination with the softirq_entry tracepoint we can determine the softirq
> >> - * latency.
> >> + * Tracepoint for tracing when softirq action ends.
> >> + * Also, when used in combination with the softirq_entry tracepoint
> >> + * we can determine the softirq latency.
> >>   */
> >>  DEFINE_EVENT(softirq, softirq_exit,
> >>  
> >> -	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
> >> +	TP_PROTO(unsigned int nr),
> >>  
> >> -	TP_ARGS(h, vec)
> >> +	TP_ARGS(nr)
> >>  );
> >>  
> >>  #endif /*  _TRACE_IRQ_H */
> >> diff --git a/kernel/softirq.c b/kernel/softirq.c
> >> index 825e112..6790599 100644
> >> --- a/kernel/softirq.c
> >> +++ b/kernel/softirq.c
> >> @@ -215,9 +215,9 @@ restart:
> >>  			int prev_count = preempt_count();
> >>  			kstat_incr_softirqs_this_cpu(h - softirq_vec);
> >>  
> >> -			trace_softirq_entry(h, softirq_vec);
> >> +			trace_softirq_entry(h - softirq_vec);
> >>  			h->action(h);
> >> -			trace_softirq_exit(h, softirq_vec);
> >> +			trace_softirq_exit(h - softirq_vec);
> > 
> > You're loosing information here by reducing the numbers of parameters in this
> > tracepoint.  How many other tracepoint scripts rely on having both pointers
> > handy?  Why not just do the pointer math inside your tracehook instead?
> 
> In __raise_softirq_irqoff macro there is no method to refer softirq_vec, so it
> can't use softirq DECLARE_EVENT_CLASS as is.
> Currently,  there is no script using softirq_entry or softirq_exit.
> 
That shouldn't matter, just pass in NULL for softirq_vec in
__raise_softirq_irqoff as the second argument to the trace function.  You may
need to fix up the class definition so that the assignment or printk doesn't try
to dereference that pointer when its NULL, but thats easy enough, and it avoids
breaking any other perf scripts floating out there.
Neil

> Thanks,
> Koki Sanagi.
> 
> > 
> >>  			if (unlikely(prev_count != preempt_count())) {
> >>  				printk(KERN_ERR "huh, entered softirq %td %s %p"
> >>  				       "with preempt_count %08x,"
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> > 
> > 
> 
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v3 2/5] napi: convert trace_napi_poll to TRACE_EVENT
  2010-07-21  7:00     ` Koki Sanagi
@ 2010-07-21 11:24       ` Neil Horman
  0 siblings, 0 replies; 25+ messages in thread
From: Neil Horman @ 2010-07-21 11:24 UTC (permalink / raw)
  To: Koki Sanagi
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, laijs, scott.a.mcmillan, rostedt, eric.dumazet,
	fweisbec, mathieu.desnoyers

On Wed, Jul 21, 2010 at 04:00:15PM +0900, Koki Sanagi wrote:
> (2010/07/20 20:09), Neil Horman wrote:
> > On Tue, Jul 20, 2010 at 09:46:51AM +0900, Koki Sanagi wrote:
> >> From: Neil Horman <nhorman@tuxdriver.com>
> >>
> >> This patch converts trace_napi_poll from DECLARE_EVENT to TRACE_EVENT to improve
> >> the usability of napi_poll tracepoint.
> >>
> >>           <idle>-0     [001] 241302.750777: napi_poll: napi poll on napi struct f6acc480 for device eth3
> >>           <idle>-0     [000] 241302.852389: napi_poll: napi poll on napi struct f5d0d70c for device eth1
> >>
> >> An original patch is below.
> >> http://marc.info/?l=linux-kernel&m=126021713809450&w=2
> >> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> >>
> >> And add a fix by Steven Rostedt.
> >> http://marc.info/?l=linux-kernel&m=126150506519173&w=2
> >>
> >> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> >> ---
> >>  include/trace/events/napi.h |   25 +++++++++++++++++++++++--
> >>  1 files changed, 23 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/include/trace/events/napi.h b/include/trace/events/napi.h
> >> index 188deca..8fe1e93 100644
> >> --- a/include/trace/events/napi.h
> >> +++ b/include/trace/events/napi.h
> >> @@ -6,10 +6,31 @@
> >>  
> >>  #include <linux/netdevice.h>
> >>  #include <linux/tracepoint.h>
> >> +#include <linux/ftrace.h>
> >> +
> >> +#define NO_DEV "(no_device)"
> >> +
> >> +TRACE_EVENT(napi_poll,
> >>  
> >> -DECLARE_TRACE(napi_poll,
> >>  	TP_PROTO(struct napi_struct *napi),
> >> -	TP_ARGS(napi));
> >> +
> >> +	TP_ARGS(napi),
> >> +
> >> +	TP_STRUCT__entry(
> >> +		__field(	struct napi_struct *,	napi)
> >> +		__string(	dev_name, napi->dev ? napi->dev->name : NO_DEV)
> >> +	),
> >> +
> >> +	TP_fast_assign(
> >> +		__entry->napi = napi;
> >> +		__assign_str(dev_name, napi->dev ? napi->dev->name : NO_DEV);
> >> +	),
> >> +
> >> +	TP_printk("napi poll on napi struct %p for device %s",
> >> +		__entry->napi, __get_str(dev_name))
> >> +);
> >> +
> >> +#undef NO_DEV
> >>  
> >>  #endif /* _TRACE_NAPI_H_ */
> >>  
> >>
> > NAK, This change will create a build break in the drop monitor code.  You'll
> > need to fix that up if you want to make this change.
> > Neil
> > 
> I built a kernel with CONFIG_NET_DROP_MONITOR=y.
> But build break did not occur.
> 
My fault, sorry.  I misunderstood what the macro change was doing here.  It
should be fine.

Thanks
Neil

> Thanks,
> Koki Sanagi.
> 
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > 
> 
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v3 1/5] irq: add tracepoint to softirq_raise
  2010-07-21 11:14       ` Neil Horman
@ 2010-07-21 13:01         ` KOSAKI Motohiro
  2010-07-21 13:56           ` Neil Horman
  2010-07-22  8:41         ` Koki Sanagi
  1 sibling, 1 reply; 25+ messages in thread
From: KOSAKI Motohiro @ 2010-07-21 13:01 UTC (permalink / raw)
  To: Neil Horman
  Cc: kosaki.motohiro, Koki Sanagi, netdev, linux-kernel, davem,
	kaneshige.kenji, izumi.taku, laijs, scott.a.mcmillan, rostedt,
	eric.dumazet, fweisbec, mathieu.desnoyers

> > >>  #endif /*  _TRACE_IRQ_H */
> > >> diff --git a/kernel/softirq.c b/kernel/softirq.c
> > >> index 825e112..6790599 100644
> > >> --- a/kernel/softirq.c
> > >> +++ b/kernel/softirq.c
> > >> @@ -215,9 +215,9 @@ restart:
> > >>  			int prev_count = preempt_count();
> > >>  			kstat_incr_softirqs_this_cpu(h - softirq_vec);
> > >>  
> > >> -			trace_softirq_entry(h, softirq_vec);
> > >> +			trace_softirq_entry(h - softirq_vec);
> > >>  			h->action(h);
> > >> -			trace_softirq_exit(h, softirq_vec);
> > >> +			trace_softirq_exit(h - softirq_vec);
> > > 
> > > You're loosing information here by reducing the numbers of parameters in this
> > > tracepoint.  How many other tracepoint scripts rely on having both pointers
> > > handy?  Why not just do the pointer math inside your tracehook instead?
> > 
> > In __raise_softirq_irqoff macro there is no method to refer softirq_vec, so it
> > can't use softirq DECLARE_EVENT_CLASS as is.
> > Currently,  there is no script using softirq_entry or softirq_exit.
> > 
> That shouldn't matter, just pass in NULL for softirq_vec in
> __raise_softirq_irqoff as the second argument to the trace function.  You may
> need to fix up the class definition so that the assignment or printk doesn't try
> to dereference that pointer when its NULL, but thats easy enough, and it avoids
> breaking any other perf scripts floating out there.

please see 5 lines above. we already have 'h - softirq_vec' calculation in
this function. so, Sanagi-san's change don't makes any overhead.

So, if the overhead is zero, I'd prefer simplest tracepoint definition :)




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v3 1/5] irq: add tracepoint to softirq_raise
  2010-07-21 13:01         ` KOSAKI Motohiro
@ 2010-07-21 13:56           ` Neil Horman
  2010-07-23  5:34             ` KOSAKI Motohiro
  0 siblings, 1 reply; 25+ messages in thread
From: Neil Horman @ 2010-07-21 13:56 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Koki Sanagi, netdev, linux-kernel, davem, kaneshige.kenji,
	izumi.taku, laijs, scott.a.mcmillan, rostedt, eric.dumazet,
	fweisbec, mathieu.desnoyers

On Wed, Jul 21, 2010 at 10:01:34PM +0900, KOSAKI Motohiro wrote:
> > > >>  #endif /*  _TRACE_IRQ_H */
> > > >> diff --git a/kernel/softirq.c b/kernel/softirq.c
> > > >> index 825e112..6790599 100644
> > > >> --- a/kernel/softirq.c
> > > >> +++ b/kernel/softirq.c
> > > >> @@ -215,9 +215,9 @@ restart:
> > > >>  			int prev_count = preempt_count();
> > > >>  			kstat_incr_softirqs_this_cpu(h - softirq_vec);
> > > >>  
> > > >> -			trace_softirq_entry(h, softirq_vec);
> > > >> +			trace_softirq_entry(h - softirq_vec);
> > > >>  			h->action(h);
> > > >> -			trace_softirq_exit(h, softirq_vec);
> > > >> +			trace_softirq_exit(h - softirq_vec);
> > > > 
> > > > You're loosing information here by reducing the numbers of parameters in this
> > > > tracepoint.  How many other tracepoint scripts rely on having both pointers
> > > > handy?  Why not just do the pointer math inside your tracehook instead?
> > > 
> > > In __raise_softirq_irqoff macro there is no method to refer softirq_vec, so it
> > > can't use softirq DECLARE_EVENT_CLASS as is.
> > > Currently,  there is no script using softirq_entry or softirq_exit.
> > > 
> > That shouldn't matter, just pass in NULL for softirq_vec in
> > __raise_softirq_irqoff as the second argument to the trace function.  You may
> > need to fix up the class definition so that the assignment or printk doesn't try
> > to dereference that pointer when its NULL, but thats easy enough, and it avoids
> > breaking any other perf scripts floating out there.
> 
> please see 5 lines above. we already have 'h - softirq_vec' calculation in
> this function. so, Sanagi-san's change don't makes any overhead.
> 
> So, if the overhead is zero, I'd prefer simplest tracepoint definition :)
> 
I never complained about performance here, I complained about information loss.
You have a tracepoint that provides two arguments here, and you're eliminating
one of them.  That will potentially break other users of this tracepoint.  I
understand we don't normally care about that with tracepoints as much, but if we
can avoid it, why don't we?
Neil

> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb
  2010-07-21 10:56       ` Neil Horman
@ 2010-07-22  8:39         ` Koki Sanagi
  2010-07-22 14:57           ` Neil Horman
  0 siblings, 1 reply; 25+ messages in thread
From: Koki Sanagi @ 2010-07-22  8:39 UTC (permalink / raw)
  To: Neil Horman
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, laijs, scott.a.mcmillan, rostedt, eric.dumazet,
	fweisbec, mathieu.desnoyers

(2010/07/21 19:56), Neil Horman wrote:
> On Wed, Jul 21, 2010 at 04:02:57PM +0900, Koki Sanagi wrote:
>> (2010/07/20 20:50), Neil Horman wrote:
>>> On Tue, Jul 20, 2010 at 09:49:10AM +0900, Koki Sanagi wrote:
>>>> [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb
>>>> This patch adds tracepoint to consume_skb, dev_kfree_skb_irq and
>>>> skb_free_datagram_locked. Combinating with tracepoint on dev_hard_start_xmit,
>>>> we can check how long it takes to free transmited packets. And using it, we can
>>>> calculate how many packets driver had at that time. It is useful when a drop of
>>>> transmited packet is a problem.
>>>>
>>>>           <idle>-0     [001] 241409.218333: consume_skb: skbaddr=dd6b2fb8
>>>>           <idle>-0     [001] 241409.490555: dev_kfree_skb_irq: skbaddr=f5e29840
>>>>
>>>>         udp-recv-302   [001] 515031.206008: skb_free_datagram_locked: skbaddr=f5b1d900
>>>>
>>>>
>>>> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
>>>> ---
>>>>  include/trace/events/skb.h |   42 ++++++++++++++++++++++++++++++++++++++++++
>>>>  net/core/datagram.c        |    1 +
>>>>  net/core/dev.c             |    2 ++
>>>>  net/core/skbuff.c          |    1 +
>>>>  4 files changed, 46 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
>>>> index 4b2be6d..84c9041 100644
>>>> --- a/include/trace/events/skb.h
>>>> +++ b/include/trace/events/skb.h
>>>> @@ -35,6 +35,48 @@ TRACE_EVENT(kfree_skb,
>>>>  		__entry->skbaddr, __entry->protocol, __entry->location)
>>>>  );
>>>>  
>>>> +DECLARE_EVENT_CLASS(free_skb,
>>>> +
>>>> +	TP_PROTO(struct sk_buff *skb),
>>>> +
>>>> +	TP_ARGS(skb),
>>>> +
>>>> +	TP_STRUCT__entry(
>>>> +		__field(	void *,	skbaddr	)
>>>> +	),
>>>> +
>>>> +	TP_fast_assign(
>>>> +		__entry->skbaddr = skb;
>>>> +	),
>>>> +
>>>> +	TP_printk("skbaddr=%p", __entry->skbaddr)
>>>> +
>>>> +);
>>>> +
>>>> +DEFINE_EVENT(free_skb, consume_skb,
>>>> +
>>>> +	TP_PROTO(struct sk_buff *skb),
>>>> +
>>>> +	TP_ARGS(skb)
>>>> +
>>>> +);
>>>> +
>>>> +DEFINE_EVENT(free_skb, dev_kfree_skb_irq,
>>>> +
>>>> +	TP_PROTO(struct sk_buff *skb),
>>>> +
>>>> +	TP_ARGS(skb)
>>>> +
>>>> +);
>>>> +
>>>> +DEFINE_EVENT(free_skb, skb_free_datagram_locked,
>>>> +
>>>> +	TP_PROTO(struct sk_buff *skb),
>>>> +
>>>> +	TP_ARGS(skb)
>>>> +
>>>> +);
>>>> +
>>>
>>> Why create these last two tracepoints at all?  dev_kfree_skb_irq will eventually
>>> pass through kfree_skb anyway, getting picked up by the tracepoint there, the
>>> while the latter won't (since it uses __kfree_skb instead), I think that could
>>> be fixed up by add a call to trace_kfree_skb there directly, saving you two
>>> tracepoints.
>>>
>>> Neil
>>>
>> I think dev_kfree_skb_irq isn't chased by trace_kfree_skb or trace_consume_skb
>> completely. Because net_tx_action frees skb by __kfree_skb. So it is better to
>> add trace_kfree_skb before it. skb_free_datagram_locked is same.
>>
> It isn't, you're right, but that was the point I made above.  Those missed areas
> could be easily handled by adding calls to trace_kfree_skb which already exists,
> to the missed areas.  Then you don't need to create those new tracepoints.  The
> way your doing this, if someone wants to trace all skb frees in debugfs, they
> would have to enable three tracepoints, not just one.  Not that thats the point
> of your patch, but its something to consider, and it simplifies your code.
> Neil
> 

O.K. I've re-made a patch to use trace_kfree_skb instead of
trace_dev_kfree_skb_irq and trace_skb_free_datagram_locked.
But I've got a problem.
I should use not __builtin_return_address, but macro or function which returns
current address. But I don't know any macro like that. Do you know any solution ?

Koki Sanagi.
---
 include/trace/events/skb.h |   17 +++++++++++++++++
 net/core/datagram.c        |    1 +
 net/core/dev.c             |    2 ++
 net/core/skbuff.c          |    1 +
 4 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 4b2be6d..75ce9d5 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -35,6 +35,23 @@ TRACE_EVENT(kfree_skb,
 		__entry->skbaddr, __entry->protocol, __entry->location)
 );
 
+TRACE_EVENT(consume_skb,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb),
+
+	TP_STRUCT__entry(
+		__field(	void *,	skbaddr	)
+	),
+
+	TP_fast_assign(
+		__entry->skbaddr = skb;
+	),
+
+	TP_printk("skbaddr=%p", __entry->skbaddr)
+);
+
 TRACE_EVENT(skb_copy_datagram_iovec,
 
 	TP_PROTO(const struct sk_buff *skb, int len),
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 251997a..96dab4f 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -243,6 +243,7 @@ void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
 	unlock_sock_fast(sk, slow);
 
 	/* skb is now orphaned, can be freed outside of locked section */
+	trace_kfree_skb(skb, __builtin_return_address(0));
 	__kfree_skb(skb);
 }
 EXPORT_SYMBOL(skb_free_datagram_locked);
diff --git a/net/core/dev.c b/net/core/dev.c
index e6a911f..faded6f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -131,6 +131,7 @@
 #include <linux/random.h>
 #include <trace/events/napi.h>
 #include <trace/events/net.h>
+#include <trace/events/skb.h>
 #include <linux/pci.h>
 
 #include "net-sysfs.h"
@@ -2577,6 +2578,7 @@ static void net_tx_action(struct softirq_action *h)
 			clist = clist->next;
 
 			WARN_ON(atomic_read(&skb->users));
+			trace_kfree_skb(skb, __builtin_return_address(0));
 			__kfree_skb(skb);
 		}
 	}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 76d33ca..ce0bc36 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -466,6 +466,7 @@ void consume_skb(struct sk_buff *skb)
 		smp_rmb();
 	else if (likely(!atomic_dec_and_test(&skb->users)))
 		return;
+	trace_consume_skb(skb);
 	__kfree_skb(skb);
 }
 EXPORT_SYMBOL(consume_skb);



 



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v3 1/5] irq: add tracepoint to softirq_raise
  2010-07-21 11:14       ` Neil Horman
  2010-07-21 13:01         ` KOSAKI Motohiro
@ 2010-07-22  8:41         ` Koki Sanagi
  1 sibling, 0 replies; 25+ messages in thread
From: Koki Sanagi @ 2010-07-22  8:41 UTC (permalink / raw)
  To: Neil Horman
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, laijs, scott.a.mcmillan, rostedt, eric.dumazet,
	fweisbec, mathieu.desnoyers

(2010/07/21 20:14), Neil Horman wrote:
> On Wed, Jul 21, 2010 at 03:57:05PM +0900, Koki Sanagi wrote:
>> (2010/07/20 20:04), Neil Horman wrote:
>>> On Tue, Jul 20, 2010 at 09:45:31AM +0900, Koki Sanagi wrote:
>>>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>>>>
>>>> Add a tracepoint for tracing when softirq action is raised.
>>>>
>>>> It and the existed tracepoints complete softirq's tracepoints:
>>>> softirq_raise, softirq_entry and softirq_exit.
>>>>
>>>> And when this tracepoint is used in combination with
>>>> the softirq_entry tracepoint we can determine
>>>> the softirq raise latency.
>>>>
>>>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>>> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
>>>> Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
>>>>
>>>> [ factorize softirq events with DECLARE_EVENT_CLASS ]
>>>> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
>>>> ---
>>>>  include/linux/interrupt.h  |    8 +++++-
>>>>  include/trace/events/irq.h |   57 ++++++++++++++++++++++++++-----------------
>>>>  kernel/softirq.c           |    4 +-
>>>>  3 files changed, 43 insertions(+), 26 deletions(-)
>>>>
>>>> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
>>>> index c233113..1cb5726 100644
>>>> --- a/include/linux/interrupt.h
>>>> +++ b/include/linux/interrupt.h
>>>> @@ -18,6 +18,7 @@
>>>>  #include <asm/atomic.h>
>>>>  #include <asm/ptrace.h>
>>>>  #include <asm/system.h>
>>>> +#include <trace/events/irq.h>
>>>>  
>>>>  /*
>>>>   * These correspond to the IORESOURCE_IRQ_* defines in
>>>> @@ -402,7 +403,12 @@ asmlinkage void do_softirq(void);
>>>>  asmlinkage void __do_softirq(void);
>>>>  extern void open_softirq(int nr, void (*action)(struct softirq_action *));
>>>>  extern void softirq_init(void);
>>>> -#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
>>>> +static inline void __raise_softirq_irqoff(unsigned int nr)
>>>> +{
>>>> +	trace_softirq_raise(nr);
>>>> +	or_softirq_pending(1UL << nr);
>>>> +}
>>>> +
>>> We already have tracepoints in irq_enter and irq_exit.  If the goal here is to
>>> detect latency during packet processing, cant the delta in time between those
>>> two points be used to determine interrupt handling latency?
>>
>> Certainly, the time between irq_entry and irq_exit is not directly related to
>> latency during packet processing. But it's indirectly related it.
>> Because softirq_entry isn't passed until irq exits and softirq_entry time is
>> related to packet processing latency. So I show it as a reference.
>>
> Its not directly related no, but look at it, the amount of processing between
> irq_exit and softirq_entry is minimal.  The information you are trying to
> extract by computing the delta from irq_entry to softirq_entry is almost exactly
> the same as that from irq_entry to irq_exit.  For that matter, since you're
> trying to guage lantency for packet processing, I expect you could get the same
> delta by measuring irq_entry to napi_poll tracepoint time, and save the hassle
> of needing to filter on softirq processing that doesn't relate to packet
> processing.

Yeah, to determine interrput latency, we need either one irq_exit or
softirq_entry, not both.
And I think softirq_entry should be left because there is a possibility that
softirq isn't triggered immidiately after irq_exit.
softirq_exit isn't needed because it is not related to packet processing.
softirq_raise is needed because it connects irq_entry and softirq_entry but
there is no need to show it. Currently, my idea is like the following.

irq_entry(+0.000000msec,irq=77:eth3)
    |
softirq_entry(+0.003562msec)
    |
    |---netif_receive_skb(+0.006279msec,len=100)
    |            |
    |   skb_copy_datagram_iovec(+0.038778msec, 2285:sshd)
    |
napi_poll_exit(+0.017160msec, eth3)

> 
>>>
>>>
>>>>  extern void raise_softirq_irqoff(unsigned int nr);
>>>>  extern void raise_softirq(unsigned int nr);
>>>>  extern void wakeup_softirqd(void);
>>>> diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
>>>> index 0e4cfb6..717744c 100644
>>>> --- a/include/trace/events/irq.h
>>>> +++ b/include/trace/events/irq.h
>>>> @@ -5,7 +5,9 @@
>>>>  #define _TRACE_IRQ_H
>>>>  
>>>>  #include <linux/tracepoint.h>
>>>> -#include <linux/interrupt.h>
>>>> +
>>>> +struct irqaction;
>>>> +struct softirq_action;
>>>>  
>>>>  #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
>>>>  #define show_softirq_name(val)				\
>>>> @@ -84,56 +86,65 @@ TRACE_EVENT(irq_handler_exit,
>>>>  
>>>>  DECLARE_EVENT_CLASS(softirq,
>>>>  
>>>> -	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
>>>> +	TP_PROTO(unsigned int nr),
>>>>  
>>>> -	TP_ARGS(h, vec),
>>>> +	TP_ARGS(nr),
>>>>  
>>>>  	TP_STRUCT__entry(
>>>> -		__field(	int,	vec			)
>>>> +		__field(	unsigned int,	vec	)
>>>>  	),
>>>>  
>>>>  	TP_fast_assign(
>>>> -		__entry->vec = (int)(h - vec);
>>>> +		__entry->vec	= nr;
>>>>  	),
>>>>  
>>>>  	TP_printk("vec=%d [action=%s]", __entry->vec,
>>>> -		  show_softirq_name(__entry->vec))
>>>> +		show_softirq_name(__entry->vec))
>>>> +);
>>>> +
>>>> +/**
>>>> + * softirq_raise - called immediately when a softirq is raised
>>>> + * @nr: softirq vector number
>>>> + *
>>>> + * Tracepoint for tracing when softirq action is raised.
>>>> + * Also, when used in combination with the softirq_entry tracepoint
>>>> + * we can determine the softirq raise latency.
>>>> + */
>>>> +DEFINE_EVENT(softirq, softirq_raise,
>>>> +
>>>> +	TP_PROTO(unsigned int nr),
>>>> +
>>>> +	TP_ARGS(nr)
>>>>  );
>>>>  
>>>>  /**
>>>>   * softirq_entry - called immediately before the softirq handler
>>>> - * @h: pointer to struct softirq_action
>>>> - * @vec: pointer to first struct softirq_action in softirq_vec array
>>>> + * @nr: softirq vector number
>>>>   *
>>>> - * The @h parameter, contains a pointer to the struct softirq_action
>>>> - * which has a pointer to the action handler that is called. By subtracting
>>>> - * the @vec pointer from the @h pointer, we can determine the softirq
>>>> - * number. Also, when used in combination with the softirq_exit tracepoint
>>>> + * Tracepoint for tracing when softirq action starts.
>>>> + * Also, when used in combination with the softirq_exit tracepoint
>>>>   * we can determine the softirq latency.
>>>>   */
>>>>  DEFINE_EVENT(softirq, softirq_entry,
>>>>  
>>>> -	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
>>>> +	TP_PROTO(unsigned int nr),
>>>>  
>>>> -	TP_ARGS(h, vec)
>>>> +	TP_ARGS(nr)
>>>>  );
>>>>  
>>>>  /**
>>>>   * softirq_exit - called immediately after the softirq handler returns
>>>> - * @h: pointer to struct softirq_action
>>>> - * @vec: pointer to first struct softirq_action in softirq_vec array
>>>> + * @nr: softirq vector number
>>>>   *
>>>> - * The @h parameter contains a pointer to the struct softirq_action
>>>> - * that has handled the softirq. By subtracting the @vec pointer from
>>>> - * the @h pointer, we can determine the softirq number. Also, when used in
>>>> - * combination with the softirq_entry tracepoint we can determine the softirq
>>>> - * latency.
>>>> + * Tracepoint for tracing when softirq action ends.
>>>> + * Also, when used in combination with the softirq_entry tracepoint
>>>> + * we can determine the softirq latency.
>>>>   */
>>>>  DEFINE_EVENT(softirq, softirq_exit,
>>>>  
>>>> -	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
>>>> +	TP_PROTO(unsigned int nr),
>>>>  
>>>> -	TP_ARGS(h, vec)
>>>> +	TP_ARGS(nr)
>>>>  );
>>>>  
>>>>  #endif /*  _TRACE_IRQ_H */
>>>> diff --git a/kernel/softirq.c b/kernel/softirq.c
>>>> index 825e112..6790599 100644
>>>> --- a/kernel/softirq.c
>>>> +++ b/kernel/softirq.c
>>>> @@ -215,9 +215,9 @@ restart:
>>>>  			int prev_count = preempt_count();
>>>>  			kstat_incr_softirqs_this_cpu(h - softirq_vec);
>>>>  
>>>> -			trace_softirq_entry(h, softirq_vec);
>>>> +			trace_softirq_entry(h - softirq_vec);
>>>>  			h->action(h);
>>>> -			trace_softirq_exit(h, softirq_vec);
>>>> +			trace_softirq_exit(h - softirq_vec);
>>>
>>> You're loosing information here by reducing the numbers of parameters in this
>>> tracepoint.  How many other tracepoint scripts rely on having both pointers
>>> handy?  Why not just do the pointer math inside your tracehook instead?
>>
>> In __raise_softirq_irqoff macro there is no method to refer softirq_vec, so it
>> can't use softirq DECLARE_EVENT_CLASS as is.
>> Currently,  there is no script using softirq_entry or softirq_exit.
>>
> That shouldn't matter, just pass in NULL for softirq_vec in
> __raise_softirq_irqoff as the second argument to the trace function.  You may
> need to fix up the class definition so that the assignment or printk doesn't try
> to dereference that pointer when its NULL, but thats easy enough, and it avoids
> breaking any other perf scripts floating out there.
> Neil
> 
>> Thanks,
>> Koki Sanagi.
>>
>>>
>>>>  			if (unlikely(prev_count != preempt_count())) {
>>>>  				printk(KERN_ERR "huh, entered softirq %td %s %p"
>>>>  				       "with preempt_count %08x,"
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>>
>>
>>
>>
> 
> 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb
  2010-07-22  8:39         ` Koki Sanagi
@ 2010-07-22 14:57           ` Neil Horman
  0 siblings, 0 replies; 25+ messages in thread
From: Neil Horman @ 2010-07-22 14:57 UTC (permalink / raw)
  To: Koki Sanagi
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, laijs, scott.a.mcmillan, rostedt, eric.dumazet,
	fweisbec, mathieu.desnoyers

On Thu, Jul 22, 2010 at 05:39:15PM +0900, Koki Sanagi wrote:
> (2010/07/21 19:56), Neil Horman wrote:
> > On Wed, Jul 21, 2010 at 04:02:57PM +0900, Koki Sanagi wrote:
> >> (2010/07/20 20:50), Neil Horman wrote:
> >>> On Tue, Jul 20, 2010 at 09:49:10AM +0900, Koki Sanagi wrote:
> >>>> [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb
> >>>> This patch adds tracepoint to consume_skb, dev_kfree_skb_irq and
> >>>> skb_free_datagram_locked. Combinating with tracepoint on dev_hard_start_xmit,
> >>>> we can check how long it takes to free transmited packets. And using it, we can
> >>>> calculate how many packets driver had at that time. It is useful when a drop of
> >>>> transmited packet is a problem.
> >>>>
> >>>>           <idle>-0     [001] 241409.218333: consume_skb: skbaddr=dd6b2fb8
> >>>>           <idle>-0     [001] 241409.490555: dev_kfree_skb_irq: skbaddr=f5e29840
> >>>>
> >>>>         udp-recv-302   [001] 515031.206008: skb_free_datagram_locked: skbaddr=f5b1d900
> >>>>
> >>>>
> >>>> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> >>>> ---
> >>>>  include/trace/events/skb.h |   42 ++++++++++++++++++++++++++++++++++++++++++
> >>>>  net/core/datagram.c        |    1 +
> >>>>  net/core/dev.c             |    2 ++
> >>>>  net/core/skbuff.c          |    1 +
> >>>>  4 files changed, 46 insertions(+), 0 deletions(-)
> >>>>
> >>>> diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> >>>> index 4b2be6d..84c9041 100644
> >>>> --- a/include/trace/events/skb.h
> >>>> +++ b/include/trace/events/skb.h
> >>>> @@ -35,6 +35,48 @@ TRACE_EVENT(kfree_skb,
> >>>>  		__entry->skbaddr, __entry->protocol, __entry->location)
> >>>>  );
> >>>>  
> >>>> +DECLARE_EVENT_CLASS(free_skb,
> >>>> +
> >>>> +	TP_PROTO(struct sk_buff *skb),
> >>>> +
> >>>> +	TP_ARGS(skb),
> >>>> +
> >>>> +	TP_STRUCT__entry(
> >>>> +		__field(	void *,	skbaddr	)
> >>>> +	),
> >>>> +
> >>>> +	TP_fast_assign(
> >>>> +		__entry->skbaddr = skb;
> >>>> +	),
> >>>> +
> >>>> +	TP_printk("skbaddr=%p", __entry->skbaddr)
> >>>> +
> >>>> +);
> >>>> +
> >>>> +DEFINE_EVENT(free_skb, consume_skb,
> >>>> +
> >>>> +	TP_PROTO(struct sk_buff *skb),
> >>>> +
> >>>> +	TP_ARGS(skb)
> >>>> +
> >>>> +);
> >>>> +
> >>>> +DEFINE_EVENT(free_skb, dev_kfree_skb_irq,
> >>>> +
> >>>> +	TP_PROTO(struct sk_buff *skb),
> >>>> +
> >>>> +	TP_ARGS(skb)
> >>>> +
> >>>> +);
> >>>> +
> >>>> +DEFINE_EVENT(free_skb, skb_free_datagram_locked,
> >>>> +
> >>>> +	TP_PROTO(struct sk_buff *skb),
> >>>> +
> >>>> +	TP_ARGS(skb)
> >>>> +
> >>>> +);
> >>>> +
> >>>
> >>> Why create these last two tracepoints at all?  dev_kfree_skb_irq will eventually
> >>> pass through kfree_skb anyway, getting picked up by the tracepoint there, the
> >>> while the latter won't (since it uses __kfree_skb instead), I think that could
> >>> be fixed up by add a call to trace_kfree_skb there directly, saving you two
> >>> tracepoints.
> >>>
> >>> Neil
> >>>
> >> I think dev_kfree_skb_irq isn't chased by trace_kfree_skb or trace_consume_skb
> >> completely. Because net_tx_action frees skb by __kfree_skb. So it is better to
> >> add trace_kfree_skb before it. skb_free_datagram_locked is same.
> >>
> > It isn't, you're right, but that was the point I made above.  Those missed areas
> > could be easily handled by adding calls to trace_kfree_skb which already exists,
> > to the missed areas.  Then you don't need to create those new tracepoints.  The
> > way your doing this, if someone wants to trace all skb frees in debugfs, they
> > would have to enable three tracepoints, not just one.  Not that thats the point
> > of your patch, but its something to consider, and it simplifies your code.
> > Neil
> > 
> 
> O.K. I've re-made a patch to use trace_kfree_skb instead of
> trace_dev_kfree_skb_irq and trace_skb_free_datagram_locked.
> But I've got a problem.
> I should use not __builtin_return_address, but macro or function which returns
> current address. But I don't know any macro like that. Do you know any solution ?
> 
Since the trace call is the first thing in the function, why not just pass in
skb_free_datagram_locked as the pointer.  That should work out properly
Neil

> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH v3 1/5] irq: add tracepoint to softirq_raise
  2010-07-21 13:56           ` Neil Horman
@ 2010-07-23  5:34             ` KOSAKI Motohiro
  0 siblings, 0 replies; 25+ messages in thread
From: KOSAKI Motohiro @ 2010-07-23  5:34 UTC (permalink / raw)
  To: Neil Horman
  Cc: kosaki.motohiro, Koki Sanagi, netdev, linux-kernel, davem,
	kaneshige.kenji, izumi.taku, laijs, scott.a.mcmillan, rostedt,
	eric.dumazet, fweisbec, mathieu.desnoyers

> On Wed, Jul 21, 2010 at 10:01:34PM +0900, KOSAKI Motohiro wrote:
> > > > >>  #endif /*  _TRACE_IRQ_H */
> > > > >> diff --git a/kernel/softirq.c b/kernel/softirq.c
> > > > >> index 825e112..6790599 100644
> > > > >> --- a/kernel/softirq.c
> > > > >> +++ b/kernel/softirq.c
> > > > >> @@ -215,9 +215,9 @@ restart:
> > > > >>  			int prev_count = preempt_count();
> > > > >>  			kstat_incr_softirqs_this_cpu(h - softirq_vec);
> > > > >>  
> > > > >> -			trace_softirq_entry(h, softirq_vec);
> > > > >> +			trace_softirq_entry(h - softirq_vec);
> > > > >>  			h->action(h);
> > > > >> -			trace_softirq_exit(h, softirq_vec);
> > > > >> +			trace_softirq_exit(h - softirq_vec);
> > > > > 
> > > > > You're loosing information here by reducing the numbers of parameters in this
> > > > > tracepoint.  How many other tracepoint scripts rely on having both pointers
> > > > > handy?  Why not just do the pointer math inside your tracehook instead?
> > > > 
> > > > In __raise_softirq_irqoff macro there is no method to refer softirq_vec, so it
> > > > can't use softirq DECLARE_EVENT_CLASS as is.
> > > > Currently,  there is no script using softirq_entry or softirq_exit.
> > > > 
> > > That shouldn't matter, just pass in NULL for softirq_vec in
> > > __raise_softirq_irqoff as the second argument to the trace function.  You may
> > > need to fix up the class definition so that the assignment or printk doesn't try
> > > to dereference that pointer when its NULL, but thats easy enough, and it avoids
> > > breaking any other perf scripts floating out there.
> > 
> > please see 5 lines above. we already have 'h - softirq_vec' calculation in
> > this function. so, Sanagi-san's change don't makes any overhead.
> > 
> > So, if the overhead is zero, I'd prefer simplest tracepoint definition :)
> > 
> I never complained about performance here, I complained about information loss.
> You have a tracepoint that provides two arguments here, and you're eliminating
> one of them.  That will potentially break other users of this tracepoint.  I
> understand we don't normally care about that with tracepoints as much, but if we
> can avoid it, why don't we?

I see. I have no objection.

Thanks.




^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2010-07-23  5:35 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-20  0:43 [RFC PATCH v3 0/5] netdev: show a process of packets Koki Sanagi
2010-07-20  0:45 ` [RFC PATCH v3 1/5] irq: add tracepoint to softirq_raise Koki Sanagi
2010-07-20 11:04   ` Neil Horman
2010-07-21  6:57     ` Koki Sanagi
2010-07-21 11:14       ` Neil Horman
2010-07-21 13:01         ` KOSAKI Motohiro
2010-07-21 13:56           ` Neil Horman
2010-07-23  5:34             ` KOSAKI Motohiro
2010-07-22  8:41         ` Koki Sanagi
2010-07-20  0:46 ` [RFC PATCH v3 2/5] napi: convert trace_napi_poll to TRACE_EVENT Koki Sanagi
2010-07-20 11:09   ` Neil Horman
2010-07-21  7:00     ` Koki Sanagi
2010-07-21 11:24       ` Neil Horman
2010-07-20  0:47 ` [RFC PATCH v3 3/5] netdev: add tracepoints to netdev layer Koki Sanagi
2010-07-20 11:41   ` Neil Horman
2010-07-21  7:01     ` Koki Sanagi
2010-07-20  0:49 ` [RFC PATCH v3 4/5] skb: add tracepoints to freeing skb Koki Sanagi
2010-07-20  4:54   ` Eric Dumazet
2010-07-20  6:47     ` Koki Sanagi
2010-07-20 11:50   ` Neil Horman
2010-07-21  7:02     ` Koki Sanagi
2010-07-21 10:56       ` Neil Horman
2010-07-22  8:39         ` Koki Sanagi
2010-07-22 14:57           ` Neil Horman
2010-07-20  0:50 ` [RFC PATCH v3 5/5] perf:add a script shows a process of packet Koki Sanagi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.