* [PATCH v4 0/5] netdev: show a process of packets
@ 2010-08-23  9:41 Koki Sanagi
  2010-08-23  9:42 ` [PATCH v4 1/5] irq: add tracepoint to softirq_raise Koki Sanagi
                   ` (5 more replies)
  0 siblings, 6 replies; 93+ messages in thread
From: Koki Sanagi @ 2010-08-23  9:41 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt,
	eric.dumazet, fweisbec, mathieu.desnoyers

Rebased onto the latest net-next.

CHANGE-LOG since v3:
    1) Change the arguments of the softirq tracepoints back to the original ones.
    2) Remove the tracepoints from dev_kfree_skb_irq and skb_free_datagram_locked,
       and call trace_kfree_skb before __kfree_skb instead.
    3) Add a tracepoint to netif_rx and display it in the netdev-times script.

This patch set adds tracepoints that show how packets are processed.
Using these tracepoints together with the existing ones, we can get the time
at which a packet passes through certain points in the transmit or receive
sequence. For example, this is the output of the perf script added by
patch 5/5.

106133.171439sec cpu=0
  irq_entry(+0.000msec irq=24:eth4)
         |
  softirq_entry(+0.006msec)
         |
         |---netif_receive_skb(+0.010msec skb=f2d15900 len=100)
         |            |
         |      skb_copy_datagram_iovec(+0.039msec 10291::10291)
         |
  napi_poll_exit(+0.022msec eth4)

106134.175634sec cpu=1
  irq_entry(+0.000msec irq=28:eth1)
         |
         |---netif_rx(+0.009msec skb=f3ef0a00)
         |
  softirq_entry(+0.018msec)
         |
         |---netif_receive_skb(+0.021msec skb=f3ef0a00 len=84)
         |            |
         |      skb_copy_datagram_iovec(+0.033msec 0:swapper)
         |
  napi_poll_exit(+0.035msec (no_device))

The above is the receive side (eth4 is NAPI, eth1 is non-NAPI). As shown, the
script can display the receive sequence from the interrupt (irq_entry) to the
application (skb_copy_datagram_iovec).
Each block shows one NET_RX softirq and the events related to it. All relative
times are based on the first irq_entry that raised the NET_RX softirq.
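
As a rough illustration of how the "+N.NNNmsec" offsets relate to the first
irq_entry, here is a minimal sketch. The helper name relative_msec and the
sample timestamps are made up for this example; it only assumes that event
times are available in nanoseconds.

```python
# Hypothetical helper, not part of the patch: express each event time as an
# offset in msec from a base time (the first irq_entry of the softirq).
def relative_msec(base_ns, events):
    """events is a list of (name, time_in_ns) pairs."""
    return [(name, (t_ns - base_ns) / 1_000_000.0) for name, t_ns in events]


if __name__ == "__main__":
    base = 106133171439000  # illustrative time of the first irq_entry, in ns
    events = [
        ("irq_entry", base),
        ("softirq_entry", base + 6_000),
        ("netif_receive_skb", base + 10_000),
        ("napi_poll_exit", base + 22_000),
    ]
    for name, offset in relative_msec(base, events):
        print("%s(+%.3fmsec)" % (name, offset))
```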

   dev    len      Qdisc               netdevice             free
   eth4    74 106125.030004sec        0.006msec             0.009msec
   eth4    87 106125.041020sec        0.007msec             0.023msec
   eth4    66 106125.042291sec        0.003msec             0.012msec
   eth4    66 106125.043274sec        0.006msec             0.004msec
   eth4   850 106125.044283sec        0.007msec             0.018msec

The above is the transmit side. There are three time check points.
Point1 is just before a packet is put into the Qdisc. Point2 is just after
ndo_start_xmit in dev_hard_start_xmit, which indicates that the packet has been
handed to the driver. Point3 is in consume_skb and kfree_skb, which indicates
that the transmitted packet has been freed.
The columns are, from left to right: the device name, the packet length, the
time of point1, the interval between point1 and point2, and the interval
between point2 and point3.
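
The three check points can be turned into one row of the table roughly as
follows. This is a sketch only: format_tx_row is a hypothetical name, the
timestamps are assumed to be in nanoseconds, and the sample values are
chosen to resemble the first row above.

```python
NSEC_PER_SEC = 1_000_000_000


def diff_msec(src_ns, dst_ns):
    """Interval in msec from src_ns to dst_ns (both in nanoseconds)."""
    return (dst_ns - src_ns) / 1_000_000.0


# Hypothetical helper: build one row from the three time check points.
def format_tx_row(dev, length, queue_ns, xmit_ns, free_ns):
    secs, rem_ns = divmod(queue_ns, NSEC_PER_SEC)
    return "%7s %5d %6d.%06dsec %12.3fmsec      %12.3fmsec" % (
        dev, length, secs, rem_ns // 1000,
        diff_msec(queue_ns, xmit_ns),   # point1 -> point2
        diff_msec(xmit_ns, free_ns))    # point2 -> point3


if __name__ == "__main__":
    queue_ns = 106125 * NSEC_PER_SEC + 30_004_000  # illustrative point1 time
    print(format_tx_row("eth4", 74, queue_ns,
                        queue_ns + 6_000, queue_ns + 15_000))
```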

These times are useful for analyzing performance and for finding the point
where a packet is delayed. For example:
- The NET_RX softirq is called late.
- The application is late in taking a packet.
- It takes a long time to hand a transmitted packet to the driver
  (this may be caused by a full queue).

These tracepoints also help us investigate network driver trouble from a
memory dump, because ftrace records events in memory. And ftrace is light
enough to leave tracing always on, so it is useful when investigating a problem
that cannot be reproduced.

Thanks,
Koki Sanagi.



* [PATCH v4 1/5] irq: add tracepoint to softirq_raise
  2010-08-23  9:41 [PATCH v4 0/5] netdev: show a process of packets Koki Sanagi
@ 2010-08-23  9:42 ` Koki Sanagi
  2010-09-03 15:29   ` Frederic Weisbecker
  2010-09-08  8:33   ` [tip:perf/core] irq: Add " tip-bot for Lai Jiangshan
  2010-08-23  9:43 ` [PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT Koki Sanagi
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 93+ messages in thread
From: Koki Sanagi @ 2010-08-23  9:42 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt,
	eric.dumazet, fweisbec, mathieu.desnoyers

From: Lai Jiangshan <laijs@cn.fujitsu.com>

Add a tracepoint for tracing when a softirq action is raised.

Together with the existing tracepoints, it completes the set of softirq
tracepoints: softirq_raise, softirq_entry and softirq_exit.

When this tracepoint is used in combination with the softirq_entry
tracepoint, we can determine the softirq raise latency.
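
One possible way to derive the raise latency from recorded events is sketched
below. It is not part of this patch: raise_latencies is a hypothetical helper,
events are assumed to be (name, cpu, vec, time_ns) tuples sorted by time, and
a raise is assumed to be serviced on the CPU where it was raised. When a
softirq is raised several times before it is entered, this sketch measures
from the first raise.

```python
# Hypothetical helper: pair each softirq_entry with the earliest unserviced
# softirq_raise for the same (cpu, vec).
def raise_latencies(events):
    pending = {}   # (cpu, vec) -> time of the first unserviced raise, in ns
    latencies = []
    for name, cpu, vec, t_ns in events:
        key = (cpu, vec)
        if name == "softirq_raise":
            # Keep only the first raise; later ones are already pending.
            pending.setdefault(key, t_ns)
        elif name == "softirq_entry" and key in pending:
            latencies.append((cpu, vec, t_ns - pending.pop(key)))
    return latencies


if __name__ == "__main__":
    sample = [
        ("softirq_raise", 0, 3, 1_000),
        ("softirq_raise", 0, 3, 1_500),   # coalesced with the first raise
        ("softirq_entry", 0, 3, 4_000),
    ]
    for cpu, vec, lat_ns in raise_latencies(sample):
        print("cpu=%d vec=%d raise latency=%dns" % (cpu, vec, lat_ns))
```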

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

[ factorize softirq events with DECLARE_EVENT_CLASS ]
Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 include/linux/interrupt.h  |    8 +++++++-
 include/trace/events/irq.h |   26 ++++++++++++++++++++++++--
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index a0384a4..d3e8e90 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -18,6 +18,7 @@
 #include <asm/atomic.h>
 #include <asm/ptrace.h>
 #include <asm/system.h>
+#include <trace/events/irq.h>
 
 /*
  * These correspond to the IORESOURCE_IRQ_* defines in
@@ -407,7 +408,12 @@ asmlinkage void do_softirq(void);
 asmlinkage void __do_softirq(void);
 extern void open_softirq(int nr, void (*action)(struct softirq_action *));
 extern void softirq_init(void);
-#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
+static inline void __raise_softirq_irqoff(unsigned int nr)
+{
+	trace_softirq_raise((struct softirq_action *)&nr, NULL);
+	or_softirq_pending(1UL << nr);
+}
+
 extern void raise_softirq_irqoff(unsigned int nr);
 extern void raise_softirq(unsigned int nr);
 extern void wakeup_softirqd(void);
diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
index 0e4cfb6..3ddda02 100644
--- a/include/trace/events/irq.h
+++ b/include/trace/events/irq.h
@@ -5,7 +5,9 @@
 #define _TRACE_IRQ_H
 
 #include <linux/tracepoint.h>
-#include <linux/interrupt.h>
+
+struct irqaction;
+struct softirq_action;
 
 #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
 #define show_softirq_name(val)				\
@@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq,
 	),
 
 	TP_fast_assign(
-		__entry->vec = (int)(h - vec);
+		if (vec)
+			__entry->vec = (int)(h - vec);
+		else
+			__entry->vec = *((int *)h);
 	),
 
 	TP_printk("vec=%d [action=%s]", __entry->vec,
@@ -136,6 +141,23 @@ DEFINE_EVENT(softirq, softirq_exit,
 	TP_ARGS(h, vec)
 );
 
+/**
+ * softirq_raise - called immediately when a softirq is raised
+ * @h: pointer to struct softirq_action
+ * @vec: pointer to first struct softirq_action in softirq_vec array
+ *
+ * The @h parameter contains a pointer to the softirq vector number which is
+ * raised. @vec is NULL and it means @h includes vector number not
+ * softirq_action. When used in combination with the softirq_entry tracepoint
+ * we can determine the softirq raise latency.
+ */
+DEFINE_EVENT(softirq, softirq_raise,
+
+	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
+
+	TP_ARGS(h, vec)
+);
+
 #endif /*  _TRACE_IRQ_H */
 
 /* This part must be outside protection */



* [PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT
  2010-08-23  9:41 [PATCH v4 0/5] netdev: show a process of packets Koki Sanagi
  2010-08-23  9:42 ` [PATCH v4 1/5] irq: add tracepoint to softirq_raise Koki Sanagi
@ 2010-08-23  9:43 ` Koki Sanagi
  2010-08-24  3:52   ` David Miller
  2010-09-08  8:34   ` [tip:perf/core] napi: Convert " tip-bot for Neil Horman
  2010-08-23  9:45 ` [PATCH v4 3/5] netdev: add tracepoints to netdev layer Koki Sanagi
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 93+ messages in thread
From: Koki Sanagi @ 2010-08-23  9:43 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt,
	eric.dumazet, fweisbec, mathieu.desnoyers

From: Neil Horman <nhorman@tuxdriver.com>

This patch converts trace_napi_poll from DECLARE_TRACE to TRACE_EVENT to improve
the usability of the napi_poll tracepoint.

          <idle>-0     [001] 241302.750777: napi_poll: napi poll on napi struct f6acc480 for device eth3
          <idle>-0     [000] 241302.852389: napi_poll: napi poll on napi struct f5d0d70c for device eth1

The original patch is here:
http://marc.info/?l=linux-kernel&m=126021713809450&w=2
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

This version also includes a fix by Steven Rostedt:
http://marc.info/?l=linux-kernel&m=126150506519173&w=2

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 include/trace/events/napi.h |   25 +++++++++++++++++++++++--
 1 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/napi.h b/include/trace/events/napi.h
index 188deca..8fe1e93 100644
--- a/include/trace/events/napi.h
+++ b/include/trace/events/napi.h
@@ -6,10 +6,31 @@
 
 #include <linux/netdevice.h>
 #include <linux/tracepoint.h>
+#include <linux/ftrace.h>
+
+#define NO_DEV "(no_device)"
+
+TRACE_EVENT(napi_poll,
 
-DECLARE_TRACE(napi_poll,
 	TP_PROTO(struct napi_struct *napi),
-	TP_ARGS(napi));
+
+	TP_ARGS(napi),
+
+	TP_STRUCT__entry(
+		__field(	struct napi_struct *,	napi)
+		__string(	dev_name, napi->dev ? napi->dev->name : NO_DEV)
+	),
+
+	TP_fast_assign(
+		__entry->napi = napi;
+		__assign_str(dev_name, napi->dev ? napi->dev->name : NO_DEV);
+	),
+
+	TP_printk("napi poll on napi struct %p for device %s",
+		__entry->napi, __get_str(dev_name))
+);
+
+#undef NO_DEV
 
 #endif /* _TRACE_NAPI_H_ */
 



* [PATCH v4 3/5] netdev: add tracepoints to netdev layer
  2010-08-23  9:41 [PATCH v4 0/5] netdev: show a process of packets Koki Sanagi
  2010-08-23  9:42 ` [PATCH v4 1/5] irq: add tracepoint to softirq_raise Koki Sanagi
  2010-08-23  9:43 ` [PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT Koki Sanagi
@ 2010-08-23  9:45 ` Koki Sanagi
  2010-08-24  3:53   ` David Miller
  2010-09-08  8:34   ` [tip:perf/core] netdev: Add " tip-bot for Koki Sanagi
  2010-08-23  9:46 ` [PATCH v4 4/5] skb: add tracepoints to freeing skb Koki Sanagi
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 93+ messages in thread
From: Koki Sanagi @ 2010-08-23  9:45 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt,
	eric.dumazet, fweisbec, mathieu.desnoyers

This patch adds tracepoints to dev_queue_xmit, dev_hard_start_xmit, netif_rx and
netif_receive_skb. These tracepoints help you monitor a network driver's
input/output.

          <idle>-0     [001] 112447.902030: netif_rx: dev=eth1 skbaddr=f3ef0900 len=84
          <idle>-0     [001] 112447.902039: netif_receive_skb: dev=eth1 skbaddr=f3ef0900 len=84
            sshd-6828  [000] 112447.903257: net_dev_queue: dev=eth4 skbaddr=f3fca538 len=226
            sshd-6828  [000] 112447.903260: net_dev_xmit: dev=eth4 skbaddr=f3fca538 len=226 rc=0
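
For post-processing such trace lines outside of perf, a small parser might
look like the sketch below. It is illustrative only: parse_trace_line and the
regular expression are assumptions that match the sample layout shown above,
and other trace output formats (for example, ones with extra flag columns)
would need a different pattern.

```python
import re

# Assumed layout: "<comm>-<pid>  [<cpu>] <secs>.<usecs>: <event>: <fields>"
TRACE_LINE = re.compile(
    r"^\s*(?P<comm>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s+(?P<fields>.*)$"
)


# Hypothetical helper: return a dict for one trace line, or None if the
# line does not match the assumed layout.
def parse_trace_line(line):
    m = TRACE_LINE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["pid"] = int(rec["pid"])
    rec["cpu"] = int(rec["cpu"])
    # Split "key=value" tokens; tokens without "=" are ignored.
    rec["fields"] = dict(
        kv.split("=", 1) for kv in rec["fields"].split() if "=" in kv)
    return rec


if __name__ == "__main__":
    line = ("          <idle>-0     [001] 112447.902030: "
            "netif_rx: dev=eth1 skbaddr=f3ef0900 len=84")
    print(parse_trace_line(line))
```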

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 include/trace/events/net.h |   82 ++++++++++++++++++++++++++++++++++++++++++++
 net/core/dev.c             |    6 +++
 net/core/net-traces.c      |    1 +
 3 files changed, 89 insertions(+), 0 deletions(-)

diff --git a/include/trace/events/net.h b/include/trace/events/net.h
new file mode 100644
index 0000000..5f247f5
--- /dev/null
+++ b/include/trace/events/net.h
@@ -0,0 +1,82 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM net
+
+#if !defined(_TRACE_NET_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_NET_H
+
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/ip.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(net_dev_xmit,
+
+	TP_PROTO(struct sk_buff *skb,
+		 int rc),
+
+	TP_ARGS(skb, rc),
+
+	TP_STRUCT__entry(
+		__field(	void *,		skbaddr		)
+		__field(	unsigned int,	len		)
+		__field(	int,		rc		)
+		__string(	name,		skb->dev->name	)
+	),
+
+	TP_fast_assign(
+		__entry->skbaddr = skb;
+		__entry->len = skb->len;
+		__entry->rc = rc;
+		__assign_str(name, skb->dev->name);
+	),
+
+	TP_printk("dev=%s skbaddr=%p len=%u rc=%d",
+		__get_str(name), __entry->skbaddr, __entry->len, __entry->rc)
+);
+
+DECLARE_EVENT_CLASS(net_dev_template,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb),
+
+	TP_STRUCT__entry(
+		__field(	void *,		skbaddr		)
+		__field(	unsigned int,	len		)
+		__string(	name,		skb->dev->name	)
+	),
+
+	TP_fast_assign(
+		__entry->skbaddr = skb;
+		__entry->len = skb->len;
+		__assign_str(name, skb->dev->name);
+	),
+
+	TP_printk("dev=%s skbaddr=%p len=%u",
+		__get_str(name), __entry->skbaddr, __entry->len)
+)
+
+DEFINE_EVENT(net_dev_template, net_dev_queue,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb)
+);
+
+DEFINE_EVENT(net_dev_template, netif_receive_skb,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb)
+);
+
+DEFINE_EVENT(net_dev_template, netif_rx,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb)
+);
+#endif /* _TRACE_NET_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/net/core/dev.c b/net/core/dev.c
index 7cd5237..c9b026a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -128,6 +128,7 @@
 #include <linux/jhash.h>
 #include <linux/random.h>
 #include <trace/events/napi.h>
+#include <trace/events/net.h>
 #include <linux/pci.h>
 
 #include "net-sysfs.h"
@@ -1978,6 +1979,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 		}
 
 		rc = ops->ndo_start_xmit(skb, dev);
+		trace_net_dev_xmit(skb, rc);
 		if (rc == NETDEV_TX_OK)
 			txq_trans_update(txq);
 		return rc;
@@ -1998,6 +2000,7 @@ gso:
 			skb_dst_drop(nskb);
 
 		rc = ops->ndo_start_xmit(nskb, dev);
+		trace_net_dev_xmit(nskb, rc);
 		if (unlikely(rc != NETDEV_TX_OK)) {
 			if (rc & ~NETDEV_TX_MASK)
 				goto out_kfree_gso_skb;
@@ -2186,6 +2189,7 @@ int dev_queue_xmit(struct sk_buff *skb)
 #ifdef CONFIG_NET_CLS_ACT
 	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
 #endif
+	trace_net_dev_queue(skb);
 	if (q->enqueue) {
 		rc = __dev_xmit_skb(skb, q, dev, txq);
 		goto out;
@@ -2525,6 +2529,7 @@ int netif_rx(struct sk_buff *skb)
 	if (netdev_tstamp_prequeue)
 		net_timestamp_check(skb);
 
+	trace_netif_rx(skb);
 #ifdef CONFIG_RPS
 	{
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
@@ -2841,6 +2846,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
 	if (!netdev_tstamp_prequeue)
 		net_timestamp_check(skb);
 
+	trace_netif_receive_skb(skb);
 	if (vlan_tx_tag_present(skb) && vlan_hwaccel_do_receive(skb))
 		return NET_RX_SUCCESS;
 
diff --git a/net/core/net-traces.c b/net/core/net-traces.c
index afa6380..7f1bb2a 100644
--- a/net/core/net-traces.c
+++ b/net/core/net-traces.c
@@ -26,6 +26,7 @@
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/skb.h>
+#include <trace/events/net.h>
 #include <trace/events/napi.h>
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kfree_skb);



* [PATCH v4 4/5] skb: add tracepoints to freeing skb
  2010-08-23  9:41 [PATCH v4 0/5] netdev: show a process of packets Koki Sanagi
                   ` (2 preceding siblings ...)
  2010-08-23  9:45 ` [PATCH v4 3/5] netdev: add tracepoints to netdev layer Koki Sanagi
@ 2010-08-23  9:46 ` Koki Sanagi
  2010-08-24  3:53   ` David Miller
  2010-09-08  8:35   ` [tip:perf/core] skb: Add " tip-bot for Koki Sanagi
  2010-08-23  9:47 ` [PATCH v4 5/5] perf:add a script shows a process of packet Koki Sanagi
  2010-08-30 23:50 ` [PATCH v4 0/5] netdev: show a process of packets Steven Rostedt
  5 siblings, 2 replies; 93+ messages in thread
From: Koki Sanagi @ 2010-08-23  9:46 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt,
	eric.dumazet, fweisbec, mathieu.desnoyers

This patch adds a tracepoint to consume_skb and adds trace_kfree_skb before
__kfree_skb in skb_free_datagram_locked and net_tx_action.
Combined with the tracepoint in dev_hard_start_xmit, it lets us check how long
it takes to free transmitted packets. Using that, we can also calculate how many
packets the driver held at a given time. This is useful when dropped transmit
packets are a problem.
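
The count of packets held by the driver could be estimated along these lines.
This is a sketch under assumptions, not code from the patch: in_flight_counts
is a hypothetical helper, events are assumed to be (name, skbaddr, rc) tuples
in time order, and only a net_dev_xmit with rc == 0 (NETDEV_TX_OK) is treated
as handed to the driver.

```python
# Hypothetical helper: track which skbs the driver holds after each event.
def in_flight_counts(events):
    outstanding = set()   # skb addresses handed to the driver and not freed
    counts = []
    for name, skbaddr, rc in events:
        if name == "net_dev_xmit":
            if rc == 0:   # assumed: rc 0 means the driver accepted the packet
                outstanding.add(skbaddr)
        elif name in ("consume_skb", "kfree_skb"):
            outstanding.discard(skbaddr)
        counts.append(len(outstanding))
    return counts


if __name__ == "__main__":
    sample = [
        ("net_dev_xmit", 0xf3fca538, 0),
        ("net_dev_xmit", 0xf2d99bb8, 0),
        ("consume_skb", 0xf3fca538, None),
    ]
    print(in_flight_counts(sample))
```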

            sshd-6828  [000] 112689.258154: consume_skb: skbaddr=f2d99bb8

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 include/trace/events/skb.h |   17 +++++++++++++++++
 net/core/datagram.c        |    1 +
 net/core/dev.c             |    2 ++
 net/core/skbuff.c          |    1 +
 4 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 4b2be6d..75ce9d5 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -35,6 +35,23 @@ TRACE_EVENT(kfree_skb,
 		__entry->skbaddr, __entry->protocol, __entry->location)
 );
 
+TRACE_EVENT(consume_skb,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb),
+
+	TP_STRUCT__entry(
+		__field(	void *,	skbaddr	)
+	),
+
+	TP_fast_assign(
+		__entry->skbaddr = skb;
+	),
+
+	TP_printk("skbaddr=%p", __entry->skbaddr)
+);
+
 TRACE_EVENT(skb_copy_datagram_iovec,
 
 	TP_PROTO(const struct sk_buff *skb, int len),
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 251997a..282806b 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -243,6 +243,7 @@ void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
 	unlock_sock_fast(sk, slow);
 
 	/* skb is now orphaned, can be freed outside of locked section */
+	trace_kfree_skb(skb, skb_free_datagram_locked);
 	__kfree_skb(skb);
 }
 EXPORT_SYMBOL(skb_free_datagram_locked);
diff --git a/net/core/dev.c b/net/core/dev.c
index c9b026a..48f7977 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -129,6 +129,7 @@
 #include <linux/random.h>
 #include <trace/events/napi.h>
 #include <trace/events/net.h>
+#include <trace/events/skb.h>
 #include <linux/pci.h>
 
 #include "net-sysfs.h"
@@ -2589,6 +2590,7 @@ static void net_tx_action(struct softirq_action *h)
 			clist = clist->next;
 
 			WARN_ON(atomic_read(&skb->users));
+			trace_kfree_skb(skb, net_tx_action);
 			__kfree_skb(skb);
 		}
 	}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 99ef721..ef4ffa8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -466,6 +466,7 @@ void consume_skb(struct sk_buff *skb)
 		smp_rmb();
 	else if (likely(!atomic_dec_and_test(&skb->users)))
 		return;
+	trace_consume_skb(skb);
 	__kfree_skb(skb);
 }
 EXPORT_SYMBOL(consume_skb);



* [PATCH v4 5/5] perf:add a script shows a process of packet
  2010-08-23  9:41 [PATCH v4 0/5] netdev: show a process of packets Koki Sanagi
                   ` (3 preceding siblings ...)
  2010-08-23  9:46 ` [PATCH v4 4/5] skb: add tracepoints to freeing skb Koki Sanagi
@ 2010-08-23  9:47 ` Koki Sanagi
  2010-08-24  3:53   ` David Miller
                     ` (2 more replies)
  2010-08-30 23:50 ` [PATCH v4 0/5] netdev: show a process of packets Steven Rostedt
  5 siblings, 3 replies; 93+ messages in thread
From: Koki Sanagi @ 2010-08-23  9:47 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt,
	eric.dumazet, fweisbec, mathieu.desnoyers

Add a perf script which shows how packets are processed and how long each
step takes. It helps us investigate networking or a network device.

To use it, install perf and record perf.data as follows.

#perf trace record netdev-times [script]

If you specify a script, perf gathers records until it ends.
If not, you must press Ctrl-C to stop recording.

To get a report from the record:

#perf trace report netdev-times [options]

The options below limit the output.

tx: show only the processing of tx packets
rx: show only the processing of rx packets
dev=: show only the processing related to the specified device
debug: run in debug mode, which shows the buffer status

For example, to show how received packets associated with eth4 are
processed:

#perf trace report netdev-times rx dev=eth4
106133.171439sec cpu=0
  irq_entry(+0.000msec irq=24:eth4)
         |
  softirq_entry(+0.006msec)
         |
         |---netif_receive_skb(+0.010msec skb=f2d15900 len=100)
         |            |
         |      skb_copy_datagram_iovec(+0.039msec 10291::10291)
         |
  napi_poll_exit(+0.022msec eth4)

This perf script helps us analyze the processing time of the transmit/receive
sequence.

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 tools/perf/scripts/python/bin/netdev-times-record |    8 +
 tools/perf/scripts/python/bin/netdev-times-report |    5 +
 tools/perf/scripts/python/netdev-times.py         |  464 +++++++++++++++++++++
 3 files changed, 477 insertions(+), 0 deletions(-)

diff --git a/tools/perf/scripts/python/bin/netdev-times-record b/tools/perf/scripts/python/bin/netdev-times-record
new file mode 100644
index 0000000..2b59511
--- /dev/null
+++ b/tools/perf/scripts/python/bin/netdev-times-record
@@ -0,0 +1,8 @@
+#!/bin/bash
+perf record -c 1 -f -R -a -e net:net_dev_xmit -e net:net_dev_queue	\
+		-e net:netif_receive_skb -e net:netif_rx		\
+		-e skb:consume_skb -e skb:kfree_skb			\
+		-e skb:skb_copy_datagram_iovec -e napi:napi_poll	\
+		-e irq:irq_handler_entry -e irq:irq_handler_exit	\
+		-e irq:softirq_entry -e irq:softirq_exit		\
+		-e irq:softirq_raise $@
diff --git a/tools/perf/scripts/python/bin/netdev-times-report b/tools/perf/scripts/python/bin/netdev-times-report
new file mode 100644
index 0000000..c3d0a63
--- /dev/null
+++ b/tools/perf/scripts/python/bin/netdev-times-report
@@ -0,0 +1,5 @@
+#!/bin/bash
+# description: display a process of packet and processing time
+# args: [tx] [rx] [dev=] [debug]
+
+perf trace -s ~/libexec/perf-core/scripts/python/netdev-times.py $@
diff --git a/tools/perf/scripts/python/netdev-times.py b/tools/perf/scripts/python/netdev-times.py
new file mode 100644
index 0000000..9aa0a32
--- /dev/null
+++ b/tools/perf/scripts/python/netdev-times.py
@@ -0,0 +1,464 @@
+# Display a process of packets and processed time.
+# It helps us to investigate networking or network device.
+#
+# options
+# tx: show only tx chart
+# rx: show only rx chart
+# dev=: show only thing related to specified device
+# debug: work with debug mode. It shows buffer status.
+
+import os
+import sys
+
+sys.path.append(os.environ['PERF_EXEC_PATH'] + \
+	'/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
+
+from perf_trace_context import *
+from Core import *
+from Util import *
+
+all_event_list = []; # insert all tracepoint event related with this script
+irq_dic = {}; # key is cpu and value is a list which stacks irqs
+              # which raise NET_RX softirq
+net_rx_dic = {}; # key is cpu and value include time of NET_RX softirq-entry
+		 # and a list which stacks receive
+receive_hunk_list = []; # a list which include a sequence of receive events
+rx_skb_list = []; # received packet list for matching
+		       # skb_copy_datagram_iovec
+
+buffer_budget = 65536; # the budget of rx_skb_list, tx_queue_list and
+		       # tx_xmit_list
+of_count_rx_skb_list = 0; # overflow count
+
+tx_queue_list = []; # list of packets which pass through dev_queue_xmit
+of_count_tx_queue_list = 0; # overflow count
+
+tx_xmit_list = [];  # list of packets which pass through dev_hard_start_xmit
+of_count_tx_xmit_list = 0; # overflow count
+
+tx_free_list = [];  # list of packets which is freed
+
+# options
+show_tx = 0;
+show_rx = 0;
+dev = 0; # store a name of device specified by option "dev="
+debug = 0;
+
+# indices of event_info tuple
+EINFO_IDX_NAME=   0
+EINFO_IDX_CONTEXT=1
+EINFO_IDX_CPU=    2
+EINFO_IDX_TIME=   3
+EINFO_IDX_PID=    4
+EINFO_IDX_COMM=   5
+
+# Calculate a time interval(msec) from src(nsec) to dst(nsec)
+def diff_msec(src, dst):
+	return (dst - src) / 1000000.0
+
+# Display a process of transmitting a packet
+def print_transmit(hunk):
+	if dev != 0 and hunk['dev'].find(dev) < 0:
+		return
+	print "%7s %5d %6d.%06dsec %12.3fmsec      %12.3fmsec" % \
+		(hunk['dev'], hunk['len'],
+		nsecs_secs(hunk['queue_t']),
+		nsecs_nsecs(hunk['queue_t'])/1000,
+		diff_msec(hunk['queue_t'], hunk['xmit_t']),
+		diff_msec(hunk['xmit_t'], hunk['free_t']))
+
+# Format for displaying rx packet processing
+PF_IRQ_ENTRY= "  irq_entry(+%.3fmsec irq=%d:%s)"
+PF_SOFT_ENTRY="  softirq_entry(+%.3fmsec)"
+PF_NAPI_POLL= "  napi_poll_exit(+%.3fmsec %s)"
+PF_JOINT=     "         |"
+PF_WJOINT=    "         |            |"
+PF_NET_RECV=  "         |---netif_receive_skb(+%.3fmsec skb=%x len=%d)"
+PF_NET_RX=    "         |---netif_rx(+%.3fmsec skb=%x)"
+PF_CPY_DGRAM= "         |      skb_copy_datagram_iovec(+%.3fmsec %d:%s)"
+PF_KFREE_SKB= "         |      kfree_skb(+%.3fmsec location=%x)"
+PF_CONS_SKB=  "         |      consume_skb(+%.3fmsec)"
+
+# Display a process of received packets and interrupts associated with
+# a NET_RX softirq
+def print_receive(hunk):
+	show_hunk = 0
+	irq_list = hunk['irq_list']
+	cpu = irq_list[0]['cpu']
+	base_t = irq_list[0]['irq_ent_t']
+	# check if this hunk should be showed
+	if dev != 0:
+		for i in range(len(irq_list)):
+			if irq_list[i]['name'].find(dev) >= 0:
+				show_hunk = 1
+				break
+	else:
+		show_hunk = 1
+	if show_hunk == 0:
+		return
+
+	print "%d.%06dsec cpu=%d" % \
+		(nsecs_secs(base_t), nsecs_nsecs(base_t)/1000, cpu)
+	for i in range(len(irq_list)):
+		print PF_IRQ_ENTRY % \
+			(diff_msec(base_t, irq_list[i]['irq_ent_t']),
+			irq_list[i]['irq'], irq_list[i]['name'])
+		print PF_JOINT
+		irq_event_list = irq_list[i]['event_list']
+		for j in range(len(irq_event_list)):
+			irq_event = irq_event_list[j]
+			if irq_event['event'] == 'netif_rx':
+				print PF_NET_RX % \
+					(diff_msec(base_t, irq_event['time']),
+					irq_event['skbaddr'])
+				print PF_JOINT
+	print PF_SOFT_ENTRY % \
+		diff_msec(base_t, hunk['sirq_ent_t'])
+	print PF_JOINT
+	event_list = hunk['event_list']
+	for i in range(len(event_list)):
+		event = event_list[i]
+		if event['event_name'] == 'napi_poll':
+			print PF_NAPI_POLL % \
+			    (diff_msec(base_t, event['event_t']), event['dev'])
+			if i == len(event_list) - 1:
+				print ""
+			else:
+				print PF_JOINT
+		else:
+			print PF_NET_RECV % \
+			    (diff_msec(base_t, event['event_t']), event['skbaddr'],
+				event['len'])
+			if 'comm' in event.keys():
+				print PF_WJOINT
+				print PF_CPY_DGRAM % \
+					(diff_msec(base_t, event['comm_t']),
+					event['pid'], event['comm'])
+			elif 'handle' in event.keys():
+				print PF_WJOINT
+				if event['handle'] == "kfree_skb":
+					print PF_KFREE_SKB % \
+						(diff_msec(base_t,
+						event['comm_t']),
+						event['location'])
+				elif event['handle'] == "consume_skb":
+					print PF_CONS_SKB % \
+						diff_msec(base_t,
+							event['comm_t'])
+			print PF_JOINT
+
+def trace_begin():
+	global show_tx
+	global show_rx
+	global dev
+	global debug
+
+	for i in range(len(sys.argv)):
+		if i == 0:
+			continue
+		arg = sys.argv[i]
+		if arg == 'tx':
+			show_tx = 1
+		elif arg =='rx':
+			show_rx = 1
+		elif arg.find('dev=',0, 4) >= 0:
+			dev = arg[4:]
+		elif arg == 'debug':
+			debug = 1
+	if show_tx == 0  and show_rx == 0:
+		show_tx = 1
+		show_rx = 1
+
+def trace_end():
+	# order all events in time
+	all_event_list.sort(lambda a,b :cmp(a[EINFO_IDX_TIME],
+					    b[EINFO_IDX_TIME]))
+	# process all events
+	for i in range(len(all_event_list)):
+		event_info = all_event_list[i]
+		name = event_info[EINFO_IDX_NAME]
+		if name == 'irq__softirq_exit':
+			handle_irq_softirq_exit(event_info)
+		elif name == 'irq__softirq_entry':
+			handle_irq_softirq_entry(event_info)
+		elif name == 'irq__softirq_raise':
+			handle_irq_softirq_raise(event_info)
+		elif name == 'irq__irq_handler_entry':
+			handle_irq_handler_entry(event_info)
+		elif name == 'irq__irq_handler_exit':
+			handle_irq_handler_exit(event_info)
+		elif name == 'napi__napi_poll':
+			handle_napi_poll(event_info)
+		elif name == 'net__netif_receive_skb':
+			handle_netif_receive_skb(event_info)
+		elif name == 'net__netif_rx':
+			handle_netif_rx(event_info)
+		elif name == 'skb__skb_copy_datagram_iovec':
+			handle_skb_copy_datagram_iovec(event_info)
+		elif name == 'net__net_dev_queue':
+			handle_net_dev_queue(event_info)
+		elif name == 'net__net_dev_xmit':
+			handle_net_dev_xmit(event_info)
+		elif name == 'skb__kfree_skb':
+			handle_kfree_skb(event_info)
+		elif name == 'skb__consume_skb':
+			handle_consume_skb(event_info)
+	# display receive hunks
+	if show_rx:
+		for i in range(len(receive_hunk_list)):
+			print_receive(receive_hunk_list[i])
+	# display transmit hunks
+	if show_tx:
+		print "   dev    len      Qdisc        " \
+			"       netdevice             free"
+		for i in range(len(tx_free_list)):
+			print_transmit(tx_free_list[i])
+	if debug:
+		print "debug buffer status"
+		print "----------------------------"
+		print "xmit Qdisc:remain:%d overflow:%d" % \
+			(len(tx_queue_list), of_count_tx_queue_list)
+		print "xmit netdevice:remain:%d overflow:%d" % \
+			(len(tx_xmit_list), of_count_tx_xmit_list)
+		print "receive:remain:%d overflow:%d" % \
+			(len(rx_skb_list), of_count_rx_skb_list)
+
+# called from perf, when it finds a corresponding event
+def irq__softirq_entry(name, context, cpu, sec, nsec, pid, comm, vec):
+	if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
+		return
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
+	all_event_list.append(event_info)
+
+def irq__softirq_exit(name, context, cpu, sec, nsec, pid, comm, vec):
+	if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
+		return
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
+	all_event_list.append(event_info)
+
+def irq__softirq_raise(name, context, cpu, sec, nsec, pid, comm, vec):
+	if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
+		return
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
+	all_event_list.append(event_info)
+
+def irq__irq_handler_entry(name, context, cpu, sec, nsec, pid, comm,
+			irq, irq_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			irq, irq_name)
+	all_event_list.append(event_info)
+
+def irq__irq_handler_exit(name, context, cpu, sec, nsec, pid, comm, irq, ret):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, irq, ret)
+	all_event_list.append(event_info)
+
+def napi__napi_poll(name, context, cpu, sec, nsec, pid, comm, napi, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			napi, dev_name)
+	all_event_list.append(event_info)
+
+def net__netif_receive_skb(name, context, cpu, sec, nsec, pid, comm, skbaddr,
+			skblen, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen, dev_name)
+	all_event_list.append(event_info)
+
+def net__netif_rx(name, context, cpu, sec, nsec, pid, comm, skbaddr,
+			skblen, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen, dev_name)
+	all_event_list.append(event_info)
+
+def net__net_dev_queue(name, context, cpu, sec, nsec, pid, comm,
+			skbaddr, skblen, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen, dev_name)
+	all_event_list.append(event_info)
+
+def net__net_dev_xmit(name, context, cpu, sec, nsec, pid, comm,
+			skbaddr, skblen, rc, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen, rc ,dev_name)
+	all_event_list.append(event_info)
+
+def skb__kfree_skb(name, context, cpu, sec, nsec, pid, comm,
+			skbaddr, protocol, location):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, protocol, location)
+	all_event_list.append(event_info)
+
+def skb__consume_skb(name, context, cpu, sec, nsec, pid, comm, skbaddr):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr)
+	all_event_list.append(event_info)
+
+def skb__skb_copy_datagram_iovec(name, context, cpu, sec, nsec, pid, comm,
+			skbaddr, skblen):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen)
+	all_event_list.append(event_info)
+
+def handle_irq_handler_entry(event_info):
+	(name, context, cpu, time, pid, comm, irq, irq_name) = event_info
+	if cpu not in irq_dic.keys():
+		irq_dic[cpu] = []
+	irq_record = {'irq':irq, 'name':irq_name, 'cpu':cpu, 'irq_ent_t':time}
+	irq_dic[cpu].append(irq_record)
+
+def handle_irq_handler_exit(event_info):
+	(name, context, cpu, time, pid, comm, irq, ret) = event_info
+	if cpu not in irq_dic.keys() \
+	or len(irq_dic[cpu]) == 0:
+		return
+	irq_record = irq_dic[cpu].pop()
+	if irq != irq_record['irq']:
+		return
+	irq_record.update({'irq_ext_t':time})
+	# if an irq doesn't include NET_RX softirq, drop.
+	if 'event_list' in irq_record.keys():
+		irq_dic[cpu].append(irq_record)
+
+def handle_irq_softirq_raise(event_info):
+	(name, context, cpu, time, pid, comm, vec) = event_info
+	if cpu not in irq_dic.keys() \
+	or len(irq_dic[cpu]) == 0:
+		return
+	irq_record = irq_dic[cpu].pop()
+	if 'event_list' in irq_record.keys():
+		irq_event_list = irq_record['event_list']
+	else:
+		irq_event_list = []
+	irq_event_list.append({'time':time, 'event':'sirq_raise'})
+	irq_record.update({'event_list':irq_event_list})
+	irq_dic[cpu].append(irq_record)
+
+def handle_irq_softirq_entry(event_info):
+	(name, context, cpu, time, pid, comm, vec) = event_info
+	net_rx_dic[cpu] = {'sirq_ent_t':time, 'event_list':[]}
+
+def handle_irq_softirq_exit(event_info):
+	(name, context, cpu, time, pid, comm, vec) = event_info
+	irq_list = []
+	event_list = 0
+	if cpu in irq_dic.keys():
+		irq_list = irq_dic[cpu]
+		del irq_dic[cpu]
+	if cpu in net_rx_dic.keys():
+		sirq_ent_t = net_rx_dic[cpu]['sirq_ent_t']
+		event_list = net_rx_dic[cpu]['event_list']
+		del net_rx_dic[cpu]
+	if irq_list == [] or event_list == 0:
+		return
+	rec_data = {'sirq_ent_t':sirq_ent_t, 'sirq_ext_t':time,
+		    'irq_list':irq_list, 'event_list':event_list}
+	# merge information related to a NET_RX softirq
+	receive_hunk_list.append(rec_data)
+
+def handle_napi_poll(event_info):
+	(name, context, cpu, time, pid, comm, napi, dev_name) = event_info
+	if cpu in net_rx_dic.keys():
+		event_list = net_rx_dic[cpu]['event_list']
+		rec_data = {'event_name':'napi_poll',
+				'dev':dev_name, 'event_t':time}
+		event_list.append(rec_data)
+
+def handle_netif_rx(event_info):
+	(name, context, cpu, time, pid, comm,
+		skbaddr, skblen, dev_name) = event_info
+	if cpu not in irq_dic.keys() \
+	or len(irq_dic[cpu]) == 0:
+		return
+	irq_record = irq_dic[cpu].pop()
+	if 'event_list' in irq_record.keys():
+		irq_event_list = irq_record['event_list']
+	else:
+		irq_event_list = []
+	irq_event_list.append({'time':time, 'event':'netif_rx',
+		'skbaddr':skbaddr, 'skblen':skblen, 'dev_name':dev_name})
+	irq_record.update({'event_list':irq_event_list})
+	irq_dic[cpu].append(irq_record)
+
+def handle_netif_receive_skb(event_info):
+	global of_count_rx_skb_list
+
+	(name, context, cpu, time, pid, comm,
+		skbaddr, skblen, dev_name) = event_info
+	if cpu in net_rx_dic.keys():
+		rec_data = {'event_name':'netif_receive_skb',
+			    'event_t':time, 'skbaddr':skbaddr, 'len':skblen}
+		event_list = net_rx_dic[cpu]['event_list']
+		event_list.append(rec_data)
+		rx_skb_list.insert(0, rec_data)
+		if len(rx_skb_list) > buffer_budget:
+			rx_skb_list.pop()
+			of_count_rx_skb_list += 1
+
+def handle_net_dev_queue(event_info):
+	global of_count_tx_queue_list
+
+	(name, context, cpu, time, pid, comm,
+		skbaddr, skblen, dev_name) = event_info
+	skb = {'dev':dev_name, 'skbaddr':skbaddr, 'len':skblen, 'queue_t':time}
+	tx_queue_list.insert(0, skb)
+	if len(tx_queue_list) > buffer_budget:
+		tx_queue_list.pop()
+		of_count_tx_queue_list += 1
+
+def handle_net_dev_xmit(event_info):
+	global of_count_tx_xmit_list
+
+	(name, context, cpu, time, pid, comm,
+		skbaddr, skblen, rc, dev_name) = event_info
+	if rc == 0: # NETDEV_TX_OK
+		for i in range(len(tx_queue_list)):
+			skb = tx_queue_list[i]
+			if skb['skbaddr'] == skbaddr:
+				skb['xmit_t'] = time
+				tx_xmit_list.insert(0, skb)
+				del tx_queue_list[i]
+				if len(tx_xmit_list) > buffer_budget:
+					tx_xmit_list.pop()
+					of_count_tx_xmit_list += 1
+				return
+
+def handle_kfree_skb(event_info):
+	(name, context, cpu, time, pid, comm,
+		skbaddr, protocol, location) = event_info
+	for i in range(len(tx_queue_list)):
+		skb = tx_queue_list[i]
+		if skb['skbaddr'] == skbaddr:
+			del tx_queue_list[i]
+			return
+	for i in range(len(tx_xmit_list)):
+		skb = tx_xmit_list[i]
+		if skb['skbaddr'] == skbaddr:
+			skb['free_t'] = time
+			tx_free_list.append(skb)
+			del tx_xmit_list[i]
+			return
+	for i in range(len(rx_skb_list)):
+		rec_data = rx_skb_list[i]
+		if rec_data['skbaddr'] == skbaddr:
+			rec_data.update({'handle':"kfree_skb",
+					'comm':comm, 'pid':pid, 'comm_t':time})
+			del rx_skb_list[i]
+			return
+
+def handle_consume_skb(event_info):
+	(name, context, cpu, time, pid, comm, skbaddr) = event_info
+	for i in range(len(tx_xmit_list)):
+		skb = tx_xmit_list[i]
+		if skb['skbaddr'] == skbaddr:
+			skb['free_t'] = time
+			tx_free_list.append(skb)
+			del tx_xmit_list[i]
+			return
+
+def handle_skb_copy_datagram_iovec(event_info):
+	(name, context, cpu, time, pid, comm, skbaddr, skblen) = event_info
+	for i in range(len(rx_skb_list)):
+		rec_data = rx_skb_list[i]
+		if skbaddr == rec_data['skbaddr']:
+			rec_data.update({'handle':"skb_copy_datagram_iovec",
+					'comm':comm, 'pid':pid, 'comm_t':time})
+			del rx_skb_list[i]
+			return
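
The transmit-side handlers above move each skb through three lists (queued, handed to the driver, freed) and cap the first two at buffer_budget entries. A minimal sketch of that bookkeeping follows, assuming nanosecond timestamps; BoundedSkbList, on_queue, on_xmit, on_free and diff_msec are hypothetical names used only for illustration, and the sketch leaves out the receive side and the kfree_skb check against the queue list.

```python
from collections import deque


class BoundedSkbList(object):
    """Most-recent-first list that drops its oldest entry when over budget."""

    def __init__(self, budget):
        self.budget = budget
        self.items = deque()
        self.overflow_count = 0

    def push(self, skb):
        # Newest entry goes to the front, like list.insert(0, skb).
        self.items.appendleft(skb)
        if len(self.items) > self.budget:
            self.items.pop()  # discard the oldest entry
            self.overflow_count += 1

    def take(self, skbaddr):
        """Remove and return the newest entry matching skbaddr, or None."""
        for i, skb in enumerate(self.items):
            if skb['skbaddr'] == skbaddr:
                del self.items[i]
                return skb
        return None


def diff_msec(start_ns, end_ns):
    """Interval between two nanosecond timestamps, in milliseconds."""
    return (end_ns - start_ns) / 1000000.0


tx_queue = BoundedSkbList(budget=3)  # budget value is an assumption
tx_xmit = BoundedSkbList(budget=3)
tx_free = []


def on_queue(skbaddr, skblen, time_ns):
    # Point 1: packet is about to be put on the Qdisc.
    tx_queue.push({'skbaddr': skbaddr, 'len': skblen, 'queue_t': time_ns})


def on_xmit(skbaddr, rc, time_ns):
    # Point 2: driver accepted the packet (rc == 0 means NETDEV_TX_OK).
    if rc != 0:
        return
    skb = tx_queue.take(skbaddr)
    if skb is not None:
        skb['xmit_t'] = time_ns
        tx_xmit.push(skb)


def on_free(skbaddr, time_ns):
    # Point 3: transmitted packet is freed.
    skb = tx_xmit.take(skbaddr)
    if skb is not None:
        skb['free_t'] = time_ns
        tx_free.append(skb)


on_queue(0xf3fca538, 226, 1000000)
on_xmit(0xf3fca538, 0, 1006000)
on_free(0xf3fca538, 1015000)

for skb in tx_free:
    print("%5d %.3fmsec %.3fmsec" % (
        skb['len'],
        diff_msec(skb['queue_t'], skb['xmit_t']),
        diff_msec(skb['xmit_t'], skb['free_t'])))
```

A deque is used here only because dropping the oldest entry is cheap; the script's plain lists with insert(0, ...) and pop() behave the same way for this purpose.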


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT
  2010-08-23  9:43 ` [PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT Koki Sanagi
@ 2010-08-24  3:52   ` David Miller
  2010-09-08  8:34   ` [tip:perf/core] napi: Convert " tip-bot for Neil Horman
  1 sibling, 0 replies; 93+ messages in thread
From: David Miller @ 2010-08-24  3:52 UTC (permalink / raw)
  To: sanagi.koki
  Cc: netdev, linux-kernel, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt,
	eric.dumazet, fweisbec, mathieu.desnoyers

From: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Date: Mon, 23 Aug 2010 18:43:51 +0900

> From: Neil Horman <nhorman@tuxdriver.com>
> 
> This patch converts trace_napi_poll from DECLARE_EVENT to TRACE_EVENT to improve
> the usability of napi_poll tracepoint.
> 
>           <idle>-0     [001] 241302.750777: napi_poll: napi poll on napi struct f6acc480 for device eth3
>           <idle>-0     [000] 241302.852389: napi_poll: napi poll on napi struct f5d0d70c for device eth1
> 
> An original patch is below.
> http://marc.info/?l=linux-kernel&m=126021713809450&w=2
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> 
> And add a fix by Steven Rostedt.
> http://marc.info/?l=linux-kernel&m=126150506519173&w=2
> 
> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 3/5] netdev: add tracepoints to netdev layer
  2010-08-23  9:45 ` [PATCH v4 3/5] netdev: add tracepoints to netdev layer Koki Sanagi
@ 2010-08-24  3:53   ` David Miller
  2010-09-08  8:34   ` [tip:perf/core] netdev: Add " tip-bot for Koki Sanagi
  1 sibling, 0 replies; 93+ messages in thread
From: David Miller @ 2010-08-24  3:53 UTC (permalink / raw)
  To: sanagi.koki
  Cc: netdev, linux-kernel, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt,
	eric.dumazet, fweisbec, mathieu.desnoyers

From: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Date: Mon, 23 Aug 2010 18:45:02 +0900

> This patch adds tracepoint to dev_queue_xmit, dev_hard_start_xmit, netif_rx and
> netif_receive_skb. These tracepoints help you to monitor network driver's
> input/output.
> 
>           <idle>-0     [001] 112447.902030: netif_rx: dev=eth1 skbaddr=f3ef0900 len=84
>           <idle>-0     [001] 112447.902039: netif_receive_skb: dev=eth1 skbaddr=f3ef0900 len=84
>             sshd-6828  [000] 112447.903257: net_dev_queue: dev=eth4 skbaddr=f3fca538 len=226
>             sshd-6828  [000] 112447.903260: net_dev_xmit: dev=eth4 skbaddr=f3fca538 len=226 rc=0
> 
> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 4/5] skb: add tracepoints to freeing skb
  2010-08-23  9:46 ` [PATCH v4 4/5] skb: add tracepoints to freeing skb Koki Sanagi
@ 2010-08-24  3:53   ` David Miller
  2010-09-08  8:35   ` [tip:perf/core] skb: Add " tip-bot for Koki Sanagi
  1 sibling, 0 replies; 93+ messages in thread
From: David Miller @ 2010-08-24  3:53 UTC (permalink / raw)
  To: sanagi.koki
  Cc: netdev, linux-kernel, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt,
	eric.dumazet, fweisbec, mathieu.desnoyers

From: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Date: Mon, 23 Aug 2010 18:46:12 +0900

> This patch adds a tracepoint to consume_skb and adds trace_kfree_skb before
> __kfree_skb in skb_free_datagram_locked and net_tx_action.
> Combined with the tracepoint on dev_hard_start_xmit, we can check how long it
> takes to free transmitted packets. Using that, we can calculate how many
> packets the driver held at that time. This is useful when drops of transmitted
> packets are a problem.
> 
>             sshd-6828  [000] 112689.258154: consume_skb: skbaddr=f2d99bb8
> 
> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 5/5] perf:add a script shows a process of packet
  2010-08-23  9:47 ` [PATCH v4 5/5] perf:add a script shows a process of packet Koki Sanagi
@ 2010-08-24  3:53   ` David Miller
  2010-09-07 16:57   ` Frederic Weisbecker
  2010-09-08  8:35   ` [tip:perf/core] perf: Add a script to show packets processing tip-bot for Koki Sanagi
  2 siblings, 0 replies; 93+ messages in thread
From: David Miller @ 2010-08-24  3:53 UTC (permalink / raw)
  To: sanagi.koki
  Cc: netdev, linux-kernel, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt,
	eric.dumazet, fweisbec, mathieu.desnoyers

From: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Date: Mon, 23 Aug 2010 18:47:09 +0900

> Add a perf script which shows a process of packets and processed time.
> It helps us to investigate networking or network device.
 ...
> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 0/5] netdev: show a process of packets
  2010-08-23  9:41 [PATCH v4 0/5] netdev: show a process of packets Koki Sanagi
                   ` (4 preceding siblings ...)
  2010-08-23  9:47 ` [PATCH v4 5/5] perf:add a script shows a process of packet Koki Sanagi
@ 2010-08-30 23:50 ` Steven Rostedt
  2010-09-03  2:10   ` Koki Sanagi
  5 siblings, 1 reply; 93+ messages in thread
From: Steven Rostedt @ 2010-08-30 23:50 UTC (permalink / raw)
  To: Koki Sanagi
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet,
	fweisbec, mathieu.desnoyers

On Mon, 2010-08-23 at 18:41 +0900, Koki Sanagi wrote:
> Rebase to the latest net-next.
> 
> CHANGE-LOG since v3:
>     1) change arguments of softirq tracepoint into original one.
>     2) remove tracepoint of dev_kfree_skb_irq and skb_free_datagram_locked
>        and add trace_kfree_skb before __kfree_skb instead of them.
>     3) add tracepoint to netif_rx and display it by netdev-times script.
> 
> These patch-set adds tracepoints to show us a process of packets.
> Using these tracepoints and existing points, we can get the time when
> packet passes through some points in transmit or receive sequence.
> For example, this is an output of perf script which is attached by patch 5/5.
> 
> 106133.171439sec cpu=0
>   irq_entry(+0.000msec irq=24:eth4)
>          |
>   softirq_entry(+0.006msec)
>          |
>          |---netif_receive_skb(+0.010msec skb=f2d15900 len=100)
>          |            |
>          |      skb_copy_datagram_iovec(+0.039msec 10291::10291)
>          |
>   napi_poll_exit(+0.022msec eth4)
> 
> 106134.175634sec cpu=1
>   irq_entry(+0.000msec irq=28:eth1)
>          |
>          |---netif_rx(+0.009msec skb=f3ef0a00)
>          |
>   softirq_entry(+0.018msec)
>          |
>          |---netif_receive_skb(+0.021msec skb=f3ef0a00 len=84)
>          |            |
>          |      skb_copy_datagram_iovec(+0.033msec 0:swapper)
>          |
>   napi_poll_exit(+0.035msec (no_device))
> 
> The above is a receive side(eth4 is NAPI. eth1 is non-NAPI). Like this, it can
> show receive sequence from interrupt(irq_entry) to application
> (skb_copy_datagram_iovec). 
> This script shows one NET_RX softirq and events related to it. All relative
> time bases on first irq_entry which raise NET_RX softirq.
> 
>    dev    len      Qdisc               netdevice             free
>    eth4    74 106125.030004sec        0.006msec             0.009msec
>    eth4    87 106125.041020sec        0.007msec             0.023msec
>    eth4    66 106125.042291sec        0.003msec             0.012msec
>    eth4    66 106125.043274sec        0.006msec             0.004msec
>    eth4   850 106125.044283sec        0.007msec             0.018msec
> 
> The above is a transmit side. There are three check-time-points.
> Point1 is before putting a packet to Qdisc. point2 is after ndo_start_xmit in
> dev_hard_start_xmit. It indicates finishing putting a packet to driver.
> point3 is in consume_skb and kfree_skb. It indicates freeing a transmitted packet.
> Values of this script are, from left, device name, length of a packet, a time of
> point1, an interval time between point1 and point2 and an interval time between
> point2 and point3.
> 
> These times are useful to analyze a performance or to detect a point where
> packet delays. For example,
> - NET_RX softirq calling is late.
> - Application is late to take a packet.
> - It takes much time to put a transmitting packet to driver
>   (It may be caused by packed queue)
> 
> And also, these tracepoint help us to investigate a network driver's trouble
> from memory dump because ftrace records it to memory. And ftrace is so light
> even if always trace on. So, in a case investigating a problem which doesn't
> reproduce, it is useful.
> 

The entire series:

Acked-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve



^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 0/5] netdev: show a process of packets
  2010-08-30 23:50 ` [PATCH v4 0/5] netdev: show a process of packets Steven Rostedt
@ 2010-09-03  2:10   ` Koki Sanagi
  2010-09-03  2:17     ` David Miller
  0 siblings, 1 reply; 93+ messages in thread
From: Koki Sanagi @ 2010-09-03  2:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet,
	fweisbec, mathieu.desnoyers

(2010/08/31 8:50), Steven Rostedt wrote:
> On Mon, 2010-08-23 at 18:41 +0900, Koki Sanagi wrote:
>> Rebase to the latest net-next.
>>
>> CHANGE-LOG since v3:
>>     1) change arguments of softirq tracepoint into original one.
>>     2) remove tracepoint of dev_kfree_skb_irq and skb_free_datagram_locked
>>        and add trace_kfree_skb before __kfree_skb instead of them.
>>     3) add tracepoint to netif_rx and display it by netdev-times script.
>>
>> These patch-set adds tracepoints to show us a process of packets.
>> Using these tracepoints and existing points, we can get the time when
>> packet passes through some points in transmit or receive sequence.
>> For example, this is an output of perf script which is attached by patch 5/5.
>>
>> 106133.171439sec cpu=0
>>   irq_entry(+0.000msec irq=24:eth4)
>>          |
>>   softirq_entry(+0.006msec)
>>          |
>>          |---netif_receive_skb(+0.010msec skb=f2d15900 len=100)
>>          |            |
>>          |      skb_copy_datagram_iovec(+0.039msec 10291::10291)
>>          |
>>   napi_poll_exit(+0.022msec eth4)
>>
>> 106134.175634sec cpu=1
>>   irq_entry(+0.000msec irq=28:eth1)
>>          |
>>          |---netif_rx(+0.009msec skb=f3ef0a00)
>>          |
>>   softirq_entry(+0.018msec)
>>          |
>>          |---netif_receive_skb(+0.021msec skb=f3ef0a00 len=84)
>>          |            |
>>          |      skb_copy_datagram_iovec(+0.033msec 0:swapper)
>>          |
>>   napi_poll_exit(+0.035msec (no_device))
>>
>> The above is a receive side(eth4 is NAPI. eth1 is non-NAPI). Like this, it can
>> show receive sequence from interrupt(irq_entry) to application
>> (skb_copy_datagram_iovec). 
>> This script shows one NET_RX softirq and events related to it. All relative
>> time bases on first irq_entry which raise NET_RX softirq.
>>
>>    dev    len      Qdisc               netdevice             free
>>    eth4    74 106125.030004sec        0.006msec             0.009msec
>>    eth4    87 106125.041020sec        0.007msec             0.023msec
>>    eth4    66 106125.042291sec        0.003msec             0.012msec
>>    eth4    66 106125.043274sec        0.006msec             0.004msec
>>    eth4   850 106125.044283sec        0.007msec             0.018msec
>>
>> The above is a transmit side. There are three check-time-points.
>> Point1 is before putting a packet to Qdisc. point2 is after ndo_start_xmit in
>> dev_hard_start_xmit. It indicates finishing putting a packet to driver.
>> point3 is in consume_skb and kfree_skb. It indicates freeing a transmitted packet.
>> Values of this script are, from left, device name, length of a packet, a time of
>> point1, an interval time between point1 and point2 and an interval time between
>> point2 and point3.
>>
>> These times are useful to analyze a performance or to detect a point where
>> packet delays. For example,
>> - NET_RX softirq calling is late.
>> - Application is late to take a packet.
>> - It takes much time to put a transmitting packet to driver
>>   (It may be caused by packed queue)
>>
>> And also, these tracepoint help us to investigate a network driver's trouble
>> from memory dump because ftrace records it to memory. And ftrace is so light
>> even if always trace on. So, in a case investigating a problem which doesn't
>> reproduce, it is useful.
>>
> 
> The entire series:
> 
> Acked-by: Steven Rostedt <rostedt@goodmis.org>
> 
> -- Steve
> 

Thanks for the many acks. I have one question.

These patches touch several components.

Patch 1 is a kernel component, but patches 2-5 are netdev components.
Which tree should they be included in?
If it is not net-next, I must rebase to another tree.

Thanks,
Koki Sanagi.


^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 0/5] netdev: show a process of packets
  2010-09-03  2:10   ` Koki Sanagi
@ 2010-09-03  2:17     ` David Miller
  2010-09-03  2:55       ` Koki Sanagi
  0 siblings, 1 reply; 93+ messages in thread
From: David Miller @ 2010-09-03  2:17 UTC (permalink / raw)
  To: sanagi.koki
  Cc: rostedt, netdev, linux-kernel, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet,
	fweisbec, mathieu.desnoyers

From: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Date: Fri, 03 Sep 2010 11:10:51 +0900

> Thanks many acks. and I have one question.
> 
> These patches have several component.
> 
> Patch1 is kernel component, but patch2-5 are netdev component.
> What tree is good to be included ?
> If it is not net-next, I must rebase to another tree.

I would prefer it goes into the tracing tree or whatever is the most appropriate
for patch #1.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 0/5] netdev: show a process of packets
  2010-09-03  2:17     ` David Miller
@ 2010-09-03  2:55       ` Koki Sanagi
  2010-09-03  4:46         ` Frederic Weisbecker
  0 siblings, 1 reply; 93+ messages in thread
From: Koki Sanagi @ 2010-09-03  2:55 UTC (permalink / raw)
  To: David Miller
  Cc: rostedt, netdev, linux-kernel, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet,
	fweisbec, mathieu.desnoyers

(2010/09/03 11:17), David Miller wrote:
> From: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> Date: Fri, 03 Sep 2010 11:10:51 +0900
> 
>> Thanks many acks. and I have one question.
>>
>> These patches have several component.
>>
>> Patch1 is kernel component, but patch2-5 are netdev component.
>> What tree is good to be included ?
>> If it is not net-next, I must rebase to another tree.
> 
> I would prefer it goes into the tracing tree or whatever is the most appropriate
> for patch #1.
> 

O.K. I'll rebase to linux-2.6-tip.

Thanks,
Koki Sanagi.


^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 0/5] netdev: show a process of packets
  2010-09-03  2:55       ` Koki Sanagi
@ 2010-09-03  4:46         ` Frederic Weisbecker
  2010-09-03  5:12           ` Koki Sanagi
  0 siblings, 1 reply; 93+ messages in thread
From: Frederic Weisbecker @ 2010-09-03  4:46 UTC (permalink / raw)
  To: Koki Sanagi
  Cc: David Miller, rostedt, netdev, linux-kernel, kaneshige.kenji,
	izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan,
	eric.dumazet, mathieu.desnoyers

On Fri, Sep 03, 2010 at 11:55:04AM +0900, Koki Sanagi wrote:
> (2010/09/03 11:17), David Miller wrote:
> > From: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> > Date: Fri, 03 Sep 2010 11:10:51 +0900
> > 
> >> Thanks many acks. and I have one question.
> >>
> >> These patches have several component.
> >>
> >> Patch1 is kernel component, but patch2-5 are netdev component.
> >> What tree is good to be included ?
> >> If it is not net-next, I must rebase to another tree.
> > 
> > I would prefer it goes into the tracing tree or whatever is the most appropriate
> > for patch #1.
> > 
> 
> O.K. I'll rebase to linux-2.6-tip.


No need, they apply very well :)

I'll push that to -tip soon.

Thanks.


^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 0/5] netdev: show a process of packets
  2010-09-03  4:46         ` Frederic Weisbecker
@ 2010-09-03  5:12           ` Koki Sanagi
  0 siblings, 0 replies; 93+ messages in thread
From: Koki Sanagi @ 2010-09-03  5:12 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: David Miller, rostedt, netdev, linux-kernel, kaneshige.kenji,
	izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan,
	eric.dumazet, mathieu.desnoyers

(2010/09/03 13:46), Frederic Weisbecker wrote:
> On Fri, Sep 03, 2010 at 11:55:04AM +0900, Koki Sanagi wrote:
>> (2010/09/03 11:17), David Miller wrote:
>>> From: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
>>> Date: Fri, 03 Sep 2010 11:10:51 +0900
>>>
>>>> Thanks many acks. and I have one question.
>>>>
>>>> These patches have several component.
>>>>
>>>> Patch1 is kernel component, but patch2-5 are netdev component.
>>>> What tree is good to be included ?
>>>> If it is not net-next, I must rebase to another tree.
>>>
>>> I would prefer it goes into the tracing tree or whatever is the most appropriate
>>> for patch #1.
>>>
>>
>> O.K. I'll rebase to linux-2.6-tip.
> 
> 
> No need, they apply very well :)
> 
> I'll push that to -tip soon.
> 
> Thanks.
> 

O.K. Thanks!

Koki Sanagi.



^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 1/5] irq: add tracepoint to softirq_raise
  2010-08-23  9:42 ` [PATCH v4 1/5] irq: add tracepoint to softirq_raise Koki Sanagi
@ 2010-09-03 15:29   ` Frederic Weisbecker
  2010-09-03 15:39     ` Steven Rostedt
  2010-09-03 15:43     ` Steven Rostedt
  2010-09-08  8:33   ` [tip:perf/core] irq: Add " tip-bot for Lai Jiangshan
  1 sibling, 2 replies; 93+ messages in thread
From: Frederic Weisbecker @ 2010-09-03 15:29 UTC (permalink / raw)
  To: Koki Sanagi
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt,
	eric.dumazet, mathieu.desnoyers

On Mon, Aug 23, 2010 at 06:42:48PM +0900, Koki Sanagi wrote:
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
> 
> Add a tracepoint for tracing when softirq action is raised.
> 
> It and the existed tracepoints complete softirq's tracepoints:
> softirq_raise, softirq_entry and softirq_exit.
> 
> And when this tracepoint is used in combination with
> the softirq_entry tracepoint we can determine
> the softirq raise latency.
> 
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
> 
> [ factorize softirq events with DECLARE_EVENT_CLASS ]
> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> ---
>  include/linux/interrupt.h  |    8 +++++++-
>  include/trace/events/irq.h |   26 ++++++++++++++++++++++++--
>  2 files changed, 31 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index a0384a4..d3e8e90 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -18,6 +18,7 @@
>  #include <asm/atomic.h>
>  #include <asm/ptrace.h>
>  #include <asm/system.h>
> +#include <trace/events/irq.h>
>  
>  /*
>   * These correspond to the IORESOURCE_IRQ_* defines in
> @@ -407,7 +408,12 @@ asmlinkage void do_softirq(void);
>  asmlinkage void __do_softirq(void);
>  extern void open_softirq(int nr, void (*action)(struct softirq_action *));
>  extern void softirq_init(void);
> -#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
> +static inline void __raise_softirq_irqoff(unsigned int nr)
> +{
> +	trace_softirq_raise((struct softirq_action *)&nr, NULL);
> +	or_softirq_pending(1UL << nr);
> +}
> +
>  extern void raise_softirq_irqoff(unsigned int nr);
>  extern void raise_softirq(unsigned int nr);
>  extern void wakeup_softirqd(void);
> diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
> index 0e4cfb6..3ddda02 100644
> --- a/include/trace/events/irq.h
> +++ b/include/trace/events/irq.h
> @@ -5,7 +5,9 @@
>  #define _TRACE_IRQ_H
>  
>  #include <linux/tracepoint.h>
> -#include <linux/interrupt.h>
> +
> +struct irqaction;
> +struct softirq_action;
>  
>  #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
>  #define show_softirq_name(val)				\
> @@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq,
>  	),
>  
>  	TP_fast_assign(
> -		__entry->vec = (int)(h - vec);
> +		if (vec)
> +			__entry->vec = (int)(h - vec);
> +		else
> +			__entry->vec = *((int *)h);
>  	),



It seems that this will break softirq_entry/exit tracepoints.
__entry->vec will deref vec->action() for these two, which is not
what we want.

If you can't have the same tracepoint signature for the three, just
split the new one in a separate TRACE_EVENT().

Thanks.


^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 1/5] irq: add tracepoint to softirq_raise
  2010-09-03 15:29   ` Frederic Weisbecker
@ 2010-09-03 15:39     ` Steven Rostedt
  2010-09-03 15:42       ` Frederic Weisbecker
  2010-09-03 15:43     ` Steven Rostedt
  1 sibling, 1 reply; 93+ messages in thread
From: Steven Rostedt @ 2010-09-03 15:39 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Koki Sanagi, netdev, linux-kernel, davem, kaneshige.kenji,
	izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan,
	eric.dumazet, mathieu.desnoyers

On Fri, 2010-09-03 at 17:29 +0200, Frederic Weisbecker wrote:

> >  #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
> >  #define show_softirq_name(val)				\
> > @@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq,
> >  	),
> >  
> >  	TP_fast_assign(
> > -		__entry->vec = (int)(h - vec);
> > +		if (vec)
> > +			__entry->vec = (int)(h - vec);
> > +		else
> > +			__entry->vec = *((int *)h);
> >  	),
> 
> 
> 
> It seems that this will break softirq_entry/exit tracepoints.
> __entry->vec will deref vec->action() for these two, which is not
> what we want.

But for trace_softirq_entry and trace_softirq_exit, vec will not be
NULL.


> 
> If you can't have the same tracepoint signature for the three, just
> split the new one in a separate TRACE_EVENT().

It may be a bit of a hack, and questionable about adding another
TRACE_EVENT(). There still is a pretty good space savings in using
DEFINE_EVENT() over TRACE_EVENT() though.

-- Steve




^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 1/5] irq: add tracepoint to softirq_raise
  2010-09-03 15:39     ` Steven Rostedt
@ 2010-09-03 15:42       ` Frederic Weisbecker
  0 siblings, 0 replies; 93+ messages in thread
From: Frederic Weisbecker @ 2010-09-03 15:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Koki Sanagi, netdev, linux-kernel, davem, kaneshige.kenji,
	izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan,
	eric.dumazet, mathieu.desnoyers

On Fri, Sep 03, 2010 at 11:39:36AM -0400, Steven Rostedt wrote:
> On Fri, 2010-09-03 at 17:29 +0200, Frederic Weisbecker wrote:
> 
> > >  #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
> > >  #define show_softirq_name(val)				\
> > > @@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq,
> > >  	),
> > >  
> > >  	TP_fast_assign(
> > > -		__entry->vec = (int)(h - vec);
> > > +		if (vec)
> > > +			__entry->vec = (int)(h - vec);
> > > +		else
> > > +			__entry->vec = *((int *)h);
> > >  	),
> > 
> > 
> > 
> > It seems that this will break softirq_entry/exit tracepoints.
> > __entry->vec will deref vec->action() for these two, which is not
> > what we want.
> 
> But for trace_softirq_entry and trace_softirq_exit, vec will not be
> NULL.


Oh right...

/me slaps his forehead


 
> 
> > 
> > If you can't have the same tracepoint signature for the three, just
> > split the new one in a separate TRACE_EVENT().
> 
> It may be a bit of a hack, and questionable about adding another
> TRACE_EVENT(). There still is a pretty good space savings in using
> DEFINE_EVENT() over TRACE_EVENT() though.


Yeah, let's keep it as is.


^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 1/5] irq: add tracepoint to softirq_raise
  2010-09-03 15:29   ` Frederic Weisbecker
  2010-09-03 15:39     ` Steven Rostedt
@ 2010-09-03 15:43     ` Steven Rostedt
  2010-09-03 15:50       ` Frederic Weisbecker
  1 sibling, 1 reply; 93+ messages in thread
From: Steven Rostedt @ 2010-09-03 15:43 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Koki Sanagi, netdev, linux-kernel, davem, kaneshige.kenji,
	izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan,
	eric.dumazet, mathieu.desnoyers

On Fri, 2010-09-03 at 17:29 +0200, Frederic Weisbecker wrote:

> >  /*
> >   * These correspond to the IORESOURCE_IRQ_* defines in
> > @@ -407,7 +408,12 @@ asmlinkage void do_softirq(void);
> >  asmlinkage void __do_softirq(void);
> >  extern void open_softirq(int nr, void (*action)(struct softirq_action *));
> >  extern void softirq_init(void);
> > -#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
> > +static inline void __raise_softirq_irqoff(unsigned int nr)
> > +{
> > +	trace_softirq_raise((struct softirq_action *)&nr, NULL);

Perhaps doing:

	trace_softirq_raise((struct softirq_action *)((unsigned long)nr),
				NULL);

and ...

> > +	or_softirq_pending(1UL << nr);
> > +}
> > +
> >  extern void raise_softirq_irqoff(unsigned int nr);
> >  extern void raise_softirq(unsigned int nr);
> >  extern void wakeup_softirqd(void);
> > diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
> > index 0e4cfb6..3ddda02 100644
> > --- a/include/trace/events/irq.h
> > +++ b/include/trace/events/irq.h
> > @@ -5,7 +5,9 @@
> >  #define _TRACE_IRQ_H
> >  
> >  #include <linux/tracepoint.h>
> > -#include <linux/interrupt.h>
> > +
> > +struct irqaction;
> > +struct softirq_action;
> >  
> >  #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
> >  #define show_softirq_name(val)				\
> > @@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq,
> >  	),
> >  
> >  	TP_fast_assign(
> > -		__entry->vec = (int)(h - vec);
> > +		if (vec)
> > +			__entry->vec = (int)(h - vec);
> > +		else
> > +			__entry->vec = *((int *)h);

			__entry->vec = (int)h;

would be better.


> >  	),
> 
> 
> 
> It seems that this will break softirq_entry/exit tracepoints.
> __entry->vec will deref vec->action() for these two, which is not
> what we want.
> 
> If you can't have the same tracepoint signature for the three, just
> split the new one in a separate TRACE_EVENT().

Doing the above will at least be a bit safer.

-- Steve



^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 1/5] irq: add tracepoint to softirq_raise
  2010-09-03 15:43     ` Steven Rostedt
@ 2010-09-03 15:50       ` Frederic Weisbecker
  2010-09-06  1:46         ` Koki Sanagi
  0 siblings, 1 reply; 93+ messages in thread
From: Frederic Weisbecker @ 2010-09-03 15:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Koki Sanagi, netdev, linux-kernel, davem, kaneshige.kenji,
	izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan,
	eric.dumazet, mathieu.desnoyers

On Fri, Sep 03, 2010 at 11:43:12AM -0400, Steven Rostedt wrote:
> On Fri, 2010-09-03 at 17:29 +0200, Frederic Weisbecker wrote:
> 
> > >  /*
> > >   * These correspond to the IORESOURCE_IRQ_* defines in
> > > @@ -407,7 +408,12 @@ asmlinkage void do_softirq(void);
> > >  asmlinkage void __do_softirq(void);
> > >  extern void open_softirq(int nr, void (*action)(struct softirq_action *));
> > >  extern void softirq_init(void);
> > > -#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
> > > +static inline void __raise_softirq_irqoff(unsigned int nr)
> > > +{
> > > +	trace_softirq_raise((struct softirq_action *)&nr, NULL);
> 
> Perhaps doing:
> 
> 	trace_softirq_raise((struct softirq_action *)((unsigned long)nr),
> 				NULL);
> 
> and ...
> 
> > > +	or_softirq_pending(1UL << nr);
> > > +}
> > > +
> > >  extern void raise_softirq_irqoff(unsigned int nr);
> > >  extern void raise_softirq(unsigned int nr);
> > >  extern void wakeup_softirqd(void);
> > > diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
> > > index 0e4cfb6..3ddda02 100644
> > > --- a/include/trace/events/irq.h
> > > +++ b/include/trace/events/irq.h
> > > @@ -5,7 +5,9 @@
> > >  #define _TRACE_IRQ_H
> > >  
> > >  #include <linux/tracepoint.h>
> > > -#include <linux/interrupt.h>
> > > +
> > > +struct irqaction;
> > > +struct softirq_action;
> > >  
> > >  #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
> > >  #define show_softirq_name(val)				\
> > > @@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq,
> > >  	),
> > >  
> > >  	TP_fast_assign(
> > > -		__entry->vec = (int)(h - vec);
> > > +		if (vec)
> > > +			__entry->vec = (int)(h - vec);
> > > +		else
> > > +			__entry->vec = *((int *)h);
> 
> 			__entry->vec = (int)h;
> 
> would be better.
> 
> 
> > >  	),
> > 
> > 
> > 
> > It seems that this will break softirq_entry/exit tracepoints.
> > __entry->vec will deref vec->action() for these two, which is not
> > what we want.
> > 
> > If you can't have the same tracepoint signature for the three, just
> > > split the new one in a separate TRACE_EVENT().
> 
> Doing the above will at least be a bit safer.


Agreed, I'm going to change that in the patch.

Thanks.


^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 1/5] irq: add tracepoint to softirq_raise
  2010-09-03 15:50       ` Frederic Weisbecker
@ 2010-09-06  1:46         ` Koki Sanagi
  0 siblings, 0 replies; 93+ messages in thread
From: Koki Sanagi @ 2010-09-06  1:46 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Steven Rostedt, netdev, linux-kernel, davem, kaneshige.kenji,
	izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan,
	eric.dumazet, mathieu.desnoyers

(2010/09/04 0:50), Frederic Weisbecker wrote:
> On Fri, Sep 03, 2010 at 11:43:12AM -0400, Steven Rostedt wrote:
>> On Fri, 2010-09-03 at 17:29 +0200, Frederic Weisbecker wrote:
>>
>>>>  /*
>>>>   * These correspond to the IORESOURCE_IRQ_* defines in
>>>> @@ -407,7 +408,12 @@ asmlinkage void do_softirq(void);
>>>>  asmlinkage void __do_softirq(void);
>>>>  extern void open_softirq(int nr, void (*action)(struct softirq_action *));
>>>>  extern void softirq_init(void);
>>>> -#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
>>>> +static inline void __raise_softirq_irqoff(unsigned int nr)
>>>> +{
>>>> +	trace_softirq_raise((struct softirq_action *)&nr, NULL);
>>
>> Perhaps doing:
>>
>> 	trace_softirq_raise((struct softirq_action *)((unsigned long)nr),
>> 				NULL);
>>
>> and ...
>>
>>>> +	or_softirq_pending(1UL << nr);
>>>> +}
>>>> +
>>>>  extern void raise_softirq_irqoff(unsigned int nr);
>>>>  extern void raise_softirq(unsigned int nr);
>>>>  extern void wakeup_softirqd(void);
>>>> diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
>>>> index 0e4cfb6..3ddda02 100644
>>>> --- a/include/trace/events/irq.h
>>>> +++ b/include/trace/events/irq.h
>>>> @@ -5,7 +5,9 @@
>>>>  #define _TRACE_IRQ_H
>>>>  
>>>>  #include <linux/tracepoint.h>
>>>> -#include <linux/interrupt.h>
>>>> +
>>>> +struct irqaction;
>>>> +struct softirq_action;
>>>>  
>>>>  #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
>>>>  #define show_softirq_name(val)				\
>>>> @@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq,
>>>>  	),
>>>>  
>>>>  	TP_fast_assign(
>>>> -		__entry->vec = (int)(h - vec);
>>>> +		if (vec)
>>>> +			__entry->vec = (int)(h - vec);
>>>> +		else
>>>> +			__entry->vec = *((int *)h);
>>
>> 			__entry->vec = (int)h;
>>
>> would be better.
>>
>>
>>>>  	),
>>>
>>>
>>>
>>> It seems that this will break softirq_entry/exit tracepoints.
>>> __entry->vec will deref vec->action() for these two, which is not
>>> what we want.
>>>
>>> If you can't have the same tracepoint signature for the three, just
>>> split the new one in a separate TRACE_EVENT().
>>
>> Doing the above will at least be a bit safer.
> 
> 
> Agreed, I'm going to change that in the patch.
> 
> Thanks.
> 

I agree.

Thanks,
Koki Sanagi.


^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH v4 5/5] perf:add a script shows a process of packet
  2010-08-23  9:47 ` [PATCH v4 5/5] perf:add a script shows a process of packet Koki Sanagi
  2010-08-24  3:53   ` David Miller
@ 2010-09-07 16:57   ` Frederic Weisbecker
  2010-09-08  8:35   ` [tip:perf/core] perf: Add a script to show packets processing tip-bot for Koki Sanagi
  2 siblings, 0 replies; 93+ messages in thread
From: Frederic Weisbecker @ 2010-09-07 16:57 UTC (permalink / raw)
  To: Koki Sanagi
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt,
	eric.dumazet, mathieu.desnoyers

On Mon, Aug 23, 2010 at 06:47:09PM +0900, Koki Sanagi wrote:
> Add a perf script which shows a process of packets and processed time.
> It helps us to investigate networking or network device.
> 
> If you want to use it, install perf and record perf.data like following.
> 
> #perf trace record netdev-times [script]
> 
> If you set script, perf gathers records until it ends.
> If not, you must Ctrl-C to stop recording.
> 
> And if you want a report from record,
> 
> #perf trace report netdev-times [options]
> 
> If you use some options, you can limit an output.
> Option is below.
> 
> tx: show only process of tx packets
> rx: show only process of rx packets
> dev=: show a process specified with this option
> debug: work with debug mode. It shows buffer status.
> 
> For example, if you want to show a process of received packets associated
> with eth4,
> 
> #perf trace report netdev-times rx dev=eth4
> 106133.171439sec cpu=0
>   irq_entry(+0.000msec irq=24:eth4)
>          |
>   softirq_entry(+0.006msec)
>          |
>          |---netif_receive_skb(+0.010msec skb=f2d15900 len=100)
>          |            |
>          |      skb_copy_datagram_iovec(+0.039msec 10291::10291)
>          |
>   napi_poll_exit(+0.022msec eth4)
> 
> This perf script helps us to analyze a process time of transmit/receive
> sequence.
> 
> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> ---
>  tools/perf/scripts/python/bin/netdev-times-record |    8 +
>  tools/perf/scripts/python/bin/netdev-times-report |    5 +
>  tools/perf/scripts/python/netdev-times.py         |  464 +++++++++++++++++++++
>  3 files changed, 477 insertions(+), 0 deletions(-)
> 
> diff --git a/tools/perf/scripts/python/bin/netdev-times-record b/tools/perf/scripts/python/bin/netdev-times-record
> new file mode 100644
> index 0000000..2b59511
> --- /dev/null
> +++ b/tools/perf/scripts/python/bin/netdev-times-record
> @@ -0,0 +1,8 @@
> +#!/bin/bash
> +perf record -c 1 -f -R -a -e net:net_dev_xmit -e net:net_dev_queue	\


Nano-nits:

-c 1 and -R are now default settings for tracepoints and -f is not
needed anymore. I've removed them.



> +all_event_list = []; # insert all tracepoint event related with this script



Ah I didn't know ";" works with python :)


> +def trace_end():
> +	# order all events in time
> +	all_event_list.sort(lambda a,b :cmp(a[EINFO_IDX_TIME],
> +					    b[EINFO_IDX_TIME]))



Events already arrive in time order to the scripts.
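For reference, a minimal sketch (Python 3, not part of the patch) of the
ordering step quoted above. In Python 3, list.sort() no longer takes a cmp
function, so the sort is keyed on the time field instead. The tuple layout
and EINFO_IDX_TIME = 3 follow the script's index constants; the helper
names and the sample events are made up for illustration. If events
already arrive in time order, the sort leaves the list unchanged.

```python
from operator import itemgetter

EINFO_IDX_TIME = 3  # index of the timestamp (nsec) in an event_info tuple


def order_events(events):
    """Return a new list of events sorted by timestamp (stable sort)."""
    return sorted(events, key=itemgetter(EINFO_IDX_TIME))


def is_time_ordered(events):
    """True if timestamps never decrease from one event to the next."""
    times = [e[EINFO_IDX_TIME] for e in events]
    return all(a <= b for a, b in zip(times, times[1:]))


# Hypothetical sample: (name, context, cpu, time, pid, comm)
events = [
    ("irq__softirq_entry", None, 0, 2000, 0, "swapper"),
    ("irq__irq_handler_entry", None, 0, 1000, 0, "swapper"),
]
ordered = order_events(events)
```

Checking is_time_ordered() first and skipping the sort is one way to keep
the ordering as a safety net without paying for it on ordered input.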

Thanks!


^ permalink raw reply	[flat|nested] 93+ messages in thread

* [tip:perf/core] irq: Add tracepoint to softirq_raise
  2010-08-23  9:42 ` [PATCH v4 1/5] irq: add tracepoint to softirq_raise Koki Sanagi
  2010-09-03 15:29   ` Frederic Weisbecker
@ 2010-09-08  8:33   ` tip-bot for Lai Jiangshan
  2010-09-08 11:25     ` [sparc build bug] " Ingo Molnar
  1 sibling, 1 reply; 93+ messages in thread
From: tip-bot for Lai Jiangshan @ 2010-09-08  8:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, mathieu.desnoyers, sanagi.koki, fweisbec, rostedt,
	nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel,
	eric.dumazet, kaneshige.kenji, davem, izumi.taku,
	kosaki.motohiro

Commit-ID:  2bf2160d8805de64308e2e7c3cd97813cb58ed2f
Gitweb:     http://git.kernel.org/tip/2bf2160d8805de64308e2e7c3cd97813cb58ed2f
Author:     Lai Jiangshan <laijs@cn.fujitsu.com>
AuthorDate: Mon, 23 Aug 2010 18:42:48 +0900
Committer:  Frederic Weisbecker <fweisbec@gmail.com>
CommitDate: Tue, 7 Sep 2010 17:49:34 +0200

irq: Add tracepoint to softirq_raise

Add a tracepoint for tracing when softirq action is raised.

This and the existing tracepoints complete softirq's tracepoints:
softirq_raise, softirq_entry and softirq_exit.

And when this tracepoint is used in combination with
the softirq_entry tracepoint we can determine
the softirq raise latency.
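
As an illustration only (not part of this patch), the raise latency could
be derived from a time-ordered event stream roughly as follows. The event
tuple layout (name, cpu, vec, time in nsec) and the function name are
assumptions made for this sketch, and it assumes a softirq is serviced on
the CPU that raised it.

```python
NET_RX = 3  # NET_RX_SOFTIRQ vector number in the kernel's softirq enum


def softirq_raise_latency(events):
    """Pair each softirq_raise with the next softirq_entry on the same
    (cpu, vec) and return a list of (cpu, vec, latency_nsec)."""
    pending = {}  # (cpu, vec) -> time of the earliest unserviced raise
    result = []
    for name, cpu, vec, t in events:
        key = (cpu, vec)
        if name == "softirq_raise":
            # Raises that arrive before the entry coalesce; keep the first.
            pending.setdefault(key, t)
        elif name == "softirq_entry" and key in pending:
            result.append((cpu, vec, t - pending.pop(key)))
    return result


# Hypothetical sample events: (name, cpu, vec, time_nsec)
sample = [
    ("softirq_raise", 0, NET_RX, 1000),
    ("softirq_raise", 0, NET_RX, 1500),  # coalesces with the first raise
    ("softirq_entry", 0, NET_RX, 7000),
    ("softirq_entry", 1, NET_RX, 8000),  # no raise seen on cpu 1: skipped
]
latencies = softirq_raise_latency(sample)
```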

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: David Miller <davem@davemloft.net>
Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com>
Cc: Izumo Taku <izumi.taku@jp.fujitsu.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Scott Mcmillan <scott.a.mcmillan@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
LKML-Reference: <4C724298.4050509@jp.fujitsu.com>
[ factorize softirq events with DECLARE_EVENT_CLASS ]
Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/interrupt.h  |    8 +++++++-
 include/trace/events/irq.h |   26 ++++++++++++++++++++++++--
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index a0384a4..531495d 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -18,6 +18,7 @@
 #include <asm/atomic.h>
 #include <asm/ptrace.h>
 #include <asm/system.h>
+#include <trace/events/irq.h>
 
 /*
  * These correspond to the IORESOURCE_IRQ_* defines in
@@ -407,7 +408,12 @@ asmlinkage void do_softirq(void);
 asmlinkage void __do_softirq(void);
 extern void open_softirq(int nr, void (*action)(struct softirq_action *));
 extern void softirq_init(void);
-#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
+static inline void __raise_softirq_irqoff(unsigned int nr)
+{
+	trace_softirq_raise((struct softirq_action *)(unsigned long)nr, NULL);
+	or_softirq_pending(1UL << nr);
+}
+
 extern void raise_softirq_irqoff(unsigned int nr);
 extern void raise_softirq(unsigned int nr);
 extern void wakeup_softirqd(void);
diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
index 0e4cfb6..6fa7cba 100644
--- a/include/trace/events/irq.h
+++ b/include/trace/events/irq.h
@@ -5,7 +5,9 @@
 #define _TRACE_IRQ_H
 
 #include <linux/tracepoint.h>
-#include <linux/interrupt.h>
+
+struct irqaction;
+struct softirq_action;
 
 #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
 #define show_softirq_name(val)				\
@@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq,
 	),
 
 	TP_fast_assign(
-		__entry->vec = (int)(h - vec);
+		if (vec)
+			__entry->vec = (int)(h - vec);
+		else
+			__entry->vec = (int)(long)h;
 	),
 
 	TP_printk("vec=%d [action=%s]", __entry->vec,
@@ -136,6 +141,23 @@ DEFINE_EVENT(softirq, softirq_exit,
 	TP_ARGS(h, vec)
 );
 
+/**
+ * softirq_raise - called immediately when a softirq is raised
+ * @h: pointer to struct softirq_action
+ * @vec: pointer to first struct softirq_action in softirq_vec array
+ *
+ * The @h parameter carries the number of the raised softirq vector cast to
+ * a pointer. @vec is NULL, which means @h holds a vector number rather than
+ * a softirq_action pointer. When used in combination with the softirq_entry
+ * tracepoint we can determine the softirq raise latency.
+ */
+DEFINE_EVENT(softirq, softirq_raise,
+
+	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
+
+	TP_ARGS(h, vec)
+);
+
 #endif /*  _TRACE_IRQ_H */
 
 /* This part must be outside protection */

^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [tip:perf/core] napi: Convert trace_napi_poll to TRACE_EVENT
  2010-08-23  9:43 ` [PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT Koki Sanagi
  2010-08-24  3:52   ` David Miller
@ 2010-09-08  8:34   ` tip-bot for Neil Horman
  1 sibling, 0 replies; 93+ messages in thread
From: tip-bot for Neil Horman @ 2010-09-08  8:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, mathieu.desnoyers, sanagi.koki, fweisbec, rostedt,
	nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel,
	eric.dumazet, kaneshige.kenji, davem, izumi.taku,
	kosaki.motohiro

Commit-ID:  3e4b10d7a4d2a78af64f8096dc7cdb3bebd65adb
Gitweb:     http://git.kernel.org/tip/3e4b10d7a4d2a78af64f8096dc7cdb3bebd65adb
Author:     Neil Horman <nhorman@tuxdriver.com>
AuthorDate: Mon, 23 Aug 2010 18:43:51 +0900
Committer:  Frederic Weisbecker <fweisbec@gmail.com>
CommitDate: Tue, 7 Sep 2010 17:51:01 +0200

napi: Convert trace_napi_poll to TRACE_EVENT

This patch converts trace_napi_poll from DECLARE_TRACE to TRACE_EVENT
to improve the usability of the napi_poll tracepoint.

          <idle>-0     [001] 241302.750777: napi_poll: napi poll on napi struct f6acc480 for device eth3
          <idle>-0     [000] 241302.852389: napi_poll: napi poll on napi struct f5d0d70c for device eth1

The original patch is below:
http://marc.info/?l=linux-kernel&m=126021713809450&w=2

[ sanagi.koki@jp.fujitsu.com: And add a fix by Steven Rostedt:
http://marc.info/?l=linux-kernel&m=126150506519173&w=2 ]

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com>
Cc: Izumo Taku <izumi.taku@jp.fujitsu.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Scott Mcmillan <scott.a.mcmillan@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
LKML-Reference: <4C7242D7.4050009@jp.fujitsu.com>
Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/trace/events/napi.h |   25 +++++++++++++++++++++++--
 1 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/napi.h b/include/trace/events/napi.h
index 188deca..8fe1e93 100644
--- a/include/trace/events/napi.h
+++ b/include/trace/events/napi.h
@@ -6,10 +6,31 @@
 
 #include <linux/netdevice.h>
 #include <linux/tracepoint.h>
+#include <linux/ftrace.h>
+
+#define NO_DEV "(no_device)"
+
+TRACE_EVENT(napi_poll,
 
-DECLARE_TRACE(napi_poll,
 	TP_PROTO(struct napi_struct *napi),
-	TP_ARGS(napi));
+
+	TP_ARGS(napi),
+
+	TP_STRUCT__entry(
+		__field(	struct napi_struct *,	napi)
+		__string(	dev_name, napi->dev ? napi->dev->name : NO_DEV)
+	),
+
+	TP_fast_assign(
+		__entry->napi = napi;
+		__assign_str(dev_name, napi->dev ? napi->dev->name : NO_DEV);
+	),
+
+	TP_printk("napi poll on napi struct %p for device %s",
+		__entry->napi, __get_str(dev_name))
+);
+
+#undef NO_DEV
 
 #endif /* _TRACE_NAPI_H_ */
 

^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [tip:perf/core] netdev: Add tracepoints to netdev layer
  2010-08-23  9:45 ` [PATCH v4 3/5] netdev: add tracepoints to netdev layer Koki Sanagi
  2010-08-24  3:53   ` David Miller
@ 2010-09-08  8:34   ` tip-bot for Koki Sanagi
  1 sibling, 0 replies; 93+ messages in thread
From: tip-bot for Koki Sanagi @ 2010-09-08  8:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, mathieu.desnoyers, sanagi.koki, fweisbec, rostedt,
	nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel,
	eric.dumazet, kaneshige.kenji, davem, izumi.taku,
	kosaki.motohiro

Commit-ID:  cf66ba58b5cb8b1526e9dd2fb96ff8db048d4d44
Gitweb:     http://git.kernel.org/tip/cf66ba58b5cb8b1526e9dd2fb96ff8db048d4d44
Author:     Koki Sanagi <sanagi.koki@jp.fujitsu.com>
AuthorDate: Mon, 23 Aug 2010 18:45:02 +0900
Committer:  Frederic Weisbecker <fweisbec@gmail.com>
CommitDate: Tue, 7 Sep 2010 17:51:33 +0200

netdev: Add tracepoints to netdev layer

This patch adds tracepoints to dev_queue_xmit, dev_hard_start_xmit,
netif_rx and netif_receive_skb. These tracepoints help you to monitor
a network driver's input/output.

          <idle>-0     [001] 112447.902030: netif_rx: dev=eth1 skbaddr=f3ef0900 len=84
          <idle>-0     [001] 112447.902039: netif_receive_skb: dev=eth1 skbaddr=f3ef0900 len=84
            sshd-6828  [000] 112447.903257: net_dev_queue: dev=eth4 skbaddr=f3fca538 len=226
            sshd-6828  [000] 112447.903260: net_dev_xmit: dev=eth4 skbaddr=f3fca538 len=226 rc=0

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com>
Cc: Izumo Taku <izumi.taku@jp.fujitsu.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Scott Mcmillan <scott.a.mcmillan@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
LKML-Reference: <4C72431E.3000901@jp.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/trace/events/net.h |   82 ++++++++++++++++++++++++++++++++++++++++++++
 net/core/dev.c             |    6 +++
 net/core/net-traces.c      |    1 +
 3 files changed, 89 insertions(+), 0 deletions(-)

diff --git a/include/trace/events/net.h b/include/trace/events/net.h
new file mode 100644
index 0000000..5f247f5
--- /dev/null
+++ b/include/trace/events/net.h
@@ -0,0 +1,82 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM net
+
+#if !defined(_TRACE_NET_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_NET_H
+
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/ip.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(net_dev_xmit,
+
+	TP_PROTO(struct sk_buff *skb,
+		 int rc),
+
+	TP_ARGS(skb, rc),
+
+	TP_STRUCT__entry(
+		__field(	void *,		skbaddr		)
+		__field(	unsigned int,	len		)
+		__field(	int,		rc		)
+		__string(	name,		skb->dev->name	)
+	),
+
+	TP_fast_assign(
+		__entry->skbaddr = skb;
+		__entry->len = skb->len;
+		__entry->rc = rc;
+		__assign_str(name, skb->dev->name);
+	),
+
+	TP_printk("dev=%s skbaddr=%p len=%u rc=%d",
+		__get_str(name), __entry->skbaddr, __entry->len, __entry->rc)
+);
+
+DECLARE_EVENT_CLASS(net_dev_template,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb),
+
+	TP_STRUCT__entry(
+		__field(	void *,		skbaddr		)
+		__field(	unsigned int,	len		)
+		__string(	name,		skb->dev->name	)
+	),
+
+	TP_fast_assign(
+		__entry->skbaddr = skb;
+		__entry->len = skb->len;
+		__assign_str(name, skb->dev->name);
+	),
+
+	TP_printk("dev=%s skbaddr=%p len=%u",
+		__get_str(name), __entry->skbaddr, __entry->len)
+)
+
+DEFINE_EVENT(net_dev_template, net_dev_queue,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb)
+);
+
+DEFINE_EVENT(net_dev_template, netif_receive_skb,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb)
+);
+
+DEFINE_EVENT(net_dev_template, netif_rx,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb)
+);
+#endif /* _TRACE_NET_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/net/core/dev.c b/net/core/dev.c
index 3721fbb..5a4fbc7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -128,6 +128,7 @@
 #include <linux/jhash.h>
 #include <linux/random.h>
 #include <trace/events/napi.h>
+#include <trace/events/net.h>
 #include <linux/pci.h>
 
 #include "net-sysfs.h"
@@ -1978,6 +1979,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 		}
 
 		rc = ops->ndo_start_xmit(skb, dev);
+		trace_net_dev_xmit(skb, rc);
 		if (rc == NETDEV_TX_OK)
 			txq_trans_update(txq);
 		return rc;
@@ -1998,6 +2000,7 @@ gso:
 			skb_dst_drop(nskb);
 
 		rc = ops->ndo_start_xmit(nskb, dev);
+		trace_net_dev_xmit(nskb, rc);
 		if (unlikely(rc != NETDEV_TX_OK)) {
 			if (rc & ~NETDEV_TX_MASK)
 				goto out_kfree_gso_skb;
@@ -2186,6 +2189,7 @@ int dev_queue_xmit(struct sk_buff *skb)
 #ifdef CONFIG_NET_CLS_ACT
 	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
 #endif
+	trace_net_dev_queue(skb);
 	if (q->enqueue) {
 		rc = __dev_xmit_skb(skb, q, dev, txq);
 		goto out;
@@ -2512,6 +2516,7 @@ int netif_rx(struct sk_buff *skb)
 	if (netdev_tstamp_prequeue)
 		net_timestamp_check(skb);
 
+	trace_netif_rx(skb);
 #ifdef CONFIG_RPS
 	{
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
@@ -2828,6 +2833,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
 	if (!netdev_tstamp_prequeue)
 		net_timestamp_check(skb);
 
+	trace_netif_receive_skb(skb);
 	if (vlan_tx_tag_present(skb) && vlan_hwaccel_do_receive(skb))
 		return NET_RX_SUCCESS;
 
diff --git a/net/core/net-traces.c b/net/core/net-traces.c
index afa6380..7f1bb2a 100644
--- a/net/core/net-traces.c
+++ b/net/core/net-traces.c
@@ -26,6 +26,7 @@
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/skb.h>
+#include <trace/events/net.h>
 #include <trace/events/napi.h>
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kfree_skb);

^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [tip:perf/core] skb: Add tracepoints to freeing skb
  2010-08-23  9:46 ` [PATCH v4 4/5] skb: add tracepoints to freeing skb Koki Sanagi
  2010-08-24  3:53   ` David Miller
@ 2010-09-08  8:35   ` tip-bot for Koki Sanagi
  1 sibling, 0 replies; 93+ messages in thread
From: tip-bot for Koki Sanagi @ 2010-09-08  8:35 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, mathieu.desnoyers, sanagi.koki, fweisbec, rostedt,
	nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel,
	eric.dumazet, kaneshige.kenji, davem, izumi.taku,
	kosaki.motohiro

Commit-ID:  07dc22e7295f25526f110d704655ff0ea7687420
Gitweb:     http://git.kernel.org/tip/07dc22e7295f25526f110d704655ff0ea7687420
Author:     Koki Sanagi <sanagi.koki@jp.fujitsu.com>
AuthorDate: Mon, 23 Aug 2010 18:46:12 +0900
Committer:  Frederic Weisbecker <fweisbec@gmail.com>
CommitDate: Tue, 7 Sep 2010 17:51:53 +0200

skb: Add tracepoints to freeing skb

This patch adds a tracepoint to consume_skb and calls trace_kfree_skb
before __kfree_skb in skb_free_datagram_locked and net_tx_action.
Combined with the tracepoint in dev_hard_start_xmit, it lets us measure
how long it takes to free transmitted packets, and from that how many
packets the driver was holding at a given time. This is useful when
dropped transmit packets are a problem.

            sshd-6828  [000] 112689.258154: consume_skb: skbaddr=f2d99bb8
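
As an illustration only (not part of this patch), the xmit-to-free time
and the number of packets held by the driver could be computed from a
time-ordered event stream roughly as follows. The event tuple layout
(name, skbaddr, time in nsec) and the function name are assumptions made
for this sketch, and it ignores the rc field of net_dev_xmit, so a failed
transmit would be counted as held.

```python
def tx_hold_times(events):
    """events: iterable of (name, skbaddr, time_nsec) in time order.
    Returns (hold_times, max_in_flight)."""
    in_flight = {}   # skbaddr -> time of net_dev_xmit
    hold_times = []  # (skbaddr, nsec between xmit and free)
    max_in_flight = 0
    for name, skbaddr, t in events:
        if name == "net_dev_xmit":
            in_flight[skbaddr] = t
            max_in_flight = max(max_in_flight, len(in_flight))
        elif name in ("consume_skb", "kfree_skb") and skbaddr in in_flight:
            # Only frees of packets we saw being transmitted are counted.
            hold_times.append((skbaddr, t - in_flight.pop(skbaddr)))
    return hold_times, max_in_flight


# Hypothetical sample events
sample = [
    ("net_dev_xmit", 0xf3fca538, 100),
    ("net_dev_xmit", 0xf2d99bb8, 150),
    ("consume_skb", 0xf3fca538, 400),
    ("kfree_skb", 0xf2d99bb8, 900),
    ("consume_skb", 0xf2d15900, 950),  # never transmitted: ignored
]
hold, peak = tx_hold_times(sample)
```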

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com>
Cc: Izumo Taku <izumi.taku@jp.fujitsu.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Scott Mcmillan <scott.a.mcmillan@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
LKML-Reference: <4C724364.50903@jp.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/trace/events/skb.h |   17 +++++++++++++++++
 net/core/datagram.c        |    1 +
 net/core/dev.c             |    2 ++
 net/core/skbuff.c          |    1 +
 4 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 4b2be6d..75ce9d5 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -35,6 +35,23 @@ TRACE_EVENT(kfree_skb,
 		__entry->skbaddr, __entry->protocol, __entry->location)
 );
 
+TRACE_EVENT(consume_skb,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb),
+
+	TP_STRUCT__entry(
+		__field(	void *,	skbaddr	)
+	),
+
+	TP_fast_assign(
+		__entry->skbaddr = skb;
+	),
+
+	TP_printk("skbaddr=%p", __entry->skbaddr)
+);
+
 TRACE_EVENT(skb_copy_datagram_iovec,
 
 	TP_PROTO(const struct sk_buff *skb, int len),
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 251997a..282806b 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -243,6 +243,7 @@ void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
 	unlock_sock_fast(sk, slow);
 
 	/* skb is now orphaned, can be freed outside of locked section */
+	trace_kfree_skb(skb, skb_free_datagram_locked);
 	__kfree_skb(skb);
 }
 EXPORT_SYMBOL(skb_free_datagram_locked);
diff --git a/net/core/dev.c b/net/core/dev.c
index 5a4fbc7..2308cce 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -129,6 +129,7 @@
 #include <linux/random.h>
 #include <trace/events/napi.h>
 #include <trace/events/net.h>
+#include <trace/events/skb.h>
 #include <linux/pci.h>
 
 #include "net-sysfs.h"
@@ -2576,6 +2577,7 @@ static void net_tx_action(struct softirq_action *h)
 			clist = clist->next;
 
 			WARN_ON(atomic_read(&skb->users));
+			trace_kfree_skb(skb, net_tx_action);
 			__kfree_skb(skb);
 		}
 	}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3a2513f..12e61e3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -466,6 +466,7 @@ void consume_skb(struct sk_buff *skb)
 		smp_rmb();
 	else if (likely(!atomic_dec_and_test(&skb->users)))
 		return;
+	trace_consume_skb(skb);
 	__kfree_skb(skb);
 }
 EXPORT_SYMBOL(consume_skb);

^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [tip:perf/core] perf: Add a script to show packets processing
  2010-08-23  9:47 ` [PATCH v4 5/5] perf:add a script shows a process of packet Koki Sanagi
  2010-08-24  3:53   ` David Miller
  2010-09-07 16:57   ` Frederic Weisbecker
@ 2010-09-08  8:35   ` tip-bot for Koki Sanagi
  2 siblings, 0 replies; 93+ messages in thread
From: tip-bot for Koki Sanagi @ 2010-09-08  8:35 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, mathieu.desnoyers, sanagi.koki, fweisbec, rostedt,
	nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel,
	eric.dumazet, tzanussi, kaneshige.kenji, davem, izumi.taku,
	kosaki.motohiro

Commit-ID:  359d5106a2ff4ffa2ba129ec8f54743c341dabfc
Gitweb:     http://git.kernel.org/tip/359d5106a2ff4ffa2ba129ec8f54743c341dabfc
Author:     Koki Sanagi <sanagi.koki@jp.fujitsu.com>
AuthorDate: Mon, 23 Aug 2010 18:47:09 +0900
Committer:  Frederic Weisbecker <fweisbec@gmail.com>
CommitDate: Tue, 7 Sep 2010 18:43:32 +0200

perf: Add a script to show packets processing

Add a perf script which shows how packets are processed and how
long each step takes. It helps to investigate the networking stack
or network devices.

To use it, install perf and record perf.data as follows:

    # perf trace record netdev-times [script]

If you specify a script, perf records until it ends.
If not, press Ctrl-C to stop recording.

To get a report from the recorded data:

    # perf trace report netdev-times [options]

The options below limit the output.

tx: show only tx packets processing
rx: show only rx packets processing
dev=: show processing on this device
debug: work with debug mode. It shows buffer status.
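
A hedged sketch of how such options might be parsed (Python 3; this is
not the code from netdev-times.py). The function name and the returned
dictionary are made up for illustration, and showing both directions when
neither tx nor rx is given is an assumption.

```python
def parse_options(args):
    """Parse report options of the form: [tx] [rx] [dev=NAME] [debug]."""
    opts = {"show_tx": False, "show_rx": False, "dev": None, "debug": False}
    for arg in args:
        if arg == "tx":
            opts["show_tx"] = True
        elif arg == "rx":
            opts["show_rx"] = True
        elif arg.startswith("dev="):
            opts["dev"] = arg[len("dev="):]
        elif arg == "debug":
            opts["debug"] = True
    # Assumption: with neither tx nor rx given, show both directions.
    if not opts["show_tx"] and not opts["show_rx"]:
        opts["show_tx"] = opts["show_rx"] = True
    return opts


example = parse_options(["rx", "dev=eth4"])
```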

For example, to show the processing of received packets associated
with eth4:

    # perf trace report netdev-times rx dev=eth4

106133.171439sec cpu=0
  irq_entry(+0.000msec irq=24:eth4)
         |
  softirq_entry(+0.006msec)
         |
         |---netif_receive_skb(+0.010msec skb=f2d15900 len=100)
         |            |
         |      skb_copy_datagram_iovec(+0.039msec 10291::10291)
         |
  napi_poll_exit(+0.022msec eth4)

This perf script helps us to analyze the processing time of a
transmit/receive sequence.

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com>
Cc: Izumo Taku <izumi.taku@jp.fujitsu.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Scott Mcmillan <scott.a.mcmillan@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <4C72439D.3040001@jp.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 tools/perf/scripts/python/bin/netdev-times-record |    8 +
 tools/perf/scripts/python/bin/netdev-times-report |    5 +
 tools/perf/scripts/python/netdev-times.py         |  464 +++++++++++++++++++++
 3 files changed, 477 insertions(+), 0 deletions(-)

diff --git a/tools/perf/scripts/python/bin/netdev-times-record b/tools/perf/scripts/python/bin/netdev-times-record
new file mode 100644
index 0000000..d931a82
--- /dev/null
+++ b/tools/perf/scripts/python/bin/netdev-times-record
@@ -0,0 +1,8 @@
+#!/bin/bash
+perf record -a -e net:net_dev_xmit -e net:net_dev_queue		\
+		-e net:netif_receive_skb -e net:netif_rx		\
+		-e skb:consume_skb -e skb:kfree_skb			\
+		-e skb:skb_copy_datagram_iovec -e napi:napi_poll	\
+		-e irq:irq_handler_entry -e irq:irq_handler_exit	\
+		-e irq:softirq_entry -e irq:softirq_exit		\
+		-e irq:softirq_raise $@
diff --git a/tools/perf/scripts/python/bin/netdev-times-report b/tools/perf/scripts/python/bin/netdev-times-report
new file mode 100644
index 0000000..c3d0a63
--- /dev/null
+++ b/tools/perf/scripts/python/bin/netdev-times-report
@@ -0,0 +1,5 @@
+#!/bin/bash
+# description: display a process of packet and processing time
+# args: [tx] [rx] [dev=] [debug]
+
+perf trace -s ~/libexec/perf-core/scripts/python/netdev-times.py $@
diff --git a/tools/perf/scripts/python/netdev-times.py b/tools/perf/scripts/python/netdev-times.py
new file mode 100644
index 0000000..9aa0a32
--- /dev/null
+++ b/tools/perf/scripts/python/netdev-times.py
@@ -0,0 +1,464 @@
+# Display a process of packets and processed time.
+# It helps us to investigate networking or network device.
+#
+# options
+# tx: show only tx chart
+# rx: show only rx chart
+# dev=: show only thing related to specified device
+# debug: work with debug mode. It shows buffer status.
+
+import os
+import sys
+
+sys.path.append(os.environ['PERF_EXEC_PATH'] + \
+	'/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
+
+from perf_trace_context import *
+from Core import *
+from Util import *
+
+all_event_list = []; # insert all tracepoint event related with this script
+irq_dic = {}; # key is cpu and value is a list which stacks irqs
+              # which raise NET_RX softirq
+net_rx_dic = {}; # key is cpu and value include time of NET_RX softirq-entry
+		 # and a list which stacks receive
+receive_hunk_list = []; # a list which include a sequence of receive events
+rx_skb_list = []; # received packet list for matching
+		       # skb_copy_datagram_iovec
+
+buffer_budget = 65536; # the budget of rx_skb_list, tx_queue_list and
+		       # tx_xmit_list
+of_count_rx_skb_list = 0; # overflow count
+
+tx_queue_list = []; # list of packets which pass through dev_queue_xmit
+of_count_tx_queue_list = 0; # overflow count
+
+tx_xmit_list = [];  # list of packets which pass through dev_hard_start_xmit
+of_count_tx_xmit_list = 0; # overflow count
+
+tx_free_list = [];  # list of packets which are freed
+
+# options
+show_tx = 0;
+show_rx = 0;
+dev = 0; # store a name of device specified by option "dev="
+debug = 0;
+
+# indices of event_info tuple
+EINFO_IDX_NAME=   0
+EINFO_IDX_CONTEXT=1
+EINFO_IDX_CPU=    2
+EINFO_IDX_TIME=   3
+EINFO_IDX_PID=    4
+EINFO_IDX_COMM=   5
+
+# Calculate a time interval(msec) from src(nsec) to dst(nsec)
+def diff_msec(src, dst):
+	return (dst - src) / 1000000.0
+
+# Display a process of transmitting a packet
+def print_transmit(hunk):
+	if dev != 0 and hunk['dev'].find(dev) < 0:
+		return
+	print "%7s %5d %6d.%06dsec %12.3fmsec      %12.3fmsec" % \
+		(hunk['dev'], hunk['len'],
+		nsecs_secs(hunk['queue_t']),
+		nsecs_nsecs(hunk['queue_t'])/1000,
+		diff_msec(hunk['queue_t'], hunk['xmit_t']),
+		diff_msec(hunk['xmit_t'], hunk['free_t']))
+
+# Format for displaying rx packet processing
+PF_IRQ_ENTRY= "  irq_entry(+%.3fmsec irq=%d:%s)"
+PF_SOFT_ENTRY="  softirq_entry(+%.3fmsec)"
+PF_NAPI_POLL= "  napi_poll_exit(+%.3fmsec %s)"
+PF_JOINT=     "         |"
+PF_WJOINT=    "         |            |"
+PF_NET_RECV=  "         |---netif_receive_skb(+%.3fmsec skb=%x len=%d)"
+PF_NET_RX=    "         |---netif_rx(+%.3fmsec skb=%x)"
+PF_CPY_DGRAM= "         |      skb_copy_datagram_iovec(+%.3fmsec %d:%s)"
+PF_KFREE_SKB= "         |      kfree_skb(+%.3fmsec location=%x)"
+PF_CONS_SKB=  "         |      consume_skb(+%.3fmsec)"
+
+# Display a process of received packets and interrupts associated with
+# a NET_RX softirq
+def print_receive(hunk):
+	show_hunk = 0
+	irq_list = hunk['irq_list']
+	cpu = irq_list[0]['cpu']
+	base_t = irq_list[0]['irq_ent_t']
+	# check if this hunk should be shown
+	if dev != 0:
+		for i in range(len(irq_list)):
+			if irq_list[i]['name'].find(dev) >= 0:
+				show_hunk = 1
+				break
+	else:
+		show_hunk = 1
+	if show_hunk == 0:
+		return
+
+	print "%d.%06dsec cpu=%d" % \
+		(nsecs_secs(base_t), nsecs_nsecs(base_t)/1000, cpu)
+	for i in range(len(irq_list)):
+		print PF_IRQ_ENTRY % \
+			(diff_msec(base_t, irq_list[i]['irq_ent_t']),
+			irq_list[i]['irq'], irq_list[i]['name'])
+		print PF_JOINT
+		irq_event_list = irq_list[i]['event_list']
+		for j in range(len(irq_event_list)):
+			irq_event = irq_event_list[j]
+			if irq_event['event'] == 'netif_rx':
+				print PF_NET_RX % \
+					(diff_msec(base_t, irq_event['time']),
+					irq_event['skbaddr'])
+				print PF_JOINT
+	print PF_SOFT_ENTRY % \
+		diff_msec(base_t, hunk['sirq_ent_t'])
+	print PF_JOINT
+	event_list = hunk['event_list']
+	for i in range(len(event_list)):
+		event = event_list[i]
+		if event['event_name'] == 'napi_poll':
+			print PF_NAPI_POLL % \
+			    (diff_msec(base_t, event['event_t']), event['dev'])
+			if i == len(event_list) - 1:
+				print ""
+			else:
+				print PF_JOINT
+		else:
+			print PF_NET_RECV % \
+			    (diff_msec(base_t, event['event_t']), event['skbaddr'],
+				event['len'])
+			if 'comm' in event.keys():
+				print PF_WJOINT
+				print PF_CPY_DGRAM % \
+					(diff_msec(base_t, event['comm_t']),
+					event['pid'], event['comm'])
+			elif 'handle' in event.keys():
+				print PF_WJOINT
+				if event['handle'] == "kfree_skb":
+					print PF_KFREE_SKB % \
+						(diff_msec(base_t,
+						event['comm_t']),
+						event['location'])
+				elif event['handle'] == "consume_skb":
+					print PF_CONS_SKB % \
+						diff_msec(base_t,
+							event['comm_t'])
+			print PF_JOINT
+
+def trace_begin():
+	global show_tx
+	global show_rx
+	global dev
+	global debug
+
+	for i in range(len(sys.argv)):
+		if i == 0:
+			continue
+		arg = sys.argv[i]
+		if arg == 'tx':
+			show_tx = 1
+		elif arg =='rx':
+			show_rx = 1
+		elif arg.find('dev=',0, 4) >= 0:
+			dev = arg[4:]
+		elif arg == 'debug':
+			debug = 1
+	if show_tx == 0  and show_rx == 0:
+		show_tx = 1
+		show_rx = 1
+
+def trace_end():
+	# order all events in time
+	all_event_list.sort(lambda a,b :cmp(a[EINFO_IDX_TIME],
+					    b[EINFO_IDX_TIME]))
+	# process all events
+	for i in range(len(all_event_list)):
+		event_info = all_event_list[i]
+		name = event_info[EINFO_IDX_NAME]
+		if name == 'irq__softirq_exit':
+			handle_irq_softirq_exit(event_info)
+		elif name == 'irq__softirq_entry':
+			handle_irq_softirq_entry(event_info)
+		elif name == 'irq__softirq_raise':
+			handle_irq_softirq_raise(event_info)
+		elif name == 'irq__irq_handler_entry':
+			handle_irq_handler_entry(event_info)
+		elif name == 'irq__irq_handler_exit':
+			handle_irq_handler_exit(event_info)
+		elif name == 'napi__napi_poll':
+			handle_napi_poll(event_info)
+		elif name == 'net__netif_receive_skb':
+			handle_netif_receive_skb(event_info)
+		elif name == 'net__netif_rx':
+			handle_netif_rx(event_info)
+		elif name == 'skb__skb_copy_datagram_iovec':
+			handle_skb_copy_datagram_iovec(event_info)
+		elif name == 'net__net_dev_queue':
+			handle_net_dev_queue(event_info)
+		elif name == 'net__net_dev_xmit':
+			handle_net_dev_xmit(event_info)
+		elif name == 'skb__kfree_skb':
+			handle_kfree_skb(event_info)
+		elif name == 'skb__consume_skb':
+			handle_consume_skb(event_info)
+	# display receive hunks
+	if show_rx:
+		for i in range(len(receive_hunk_list)):
+			print_receive(receive_hunk_list[i])
+	# display transmit hunks
+	if show_tx:
+		print "   dev    len      Qdisc        " \
+			"       netdevice             free"
+		for i in range(len(tx_free_list)):
+			print_transmit(tx_free_list[i])
+	if debug:
+		print "debug buffer status"
+		print "----------------------------"
+		print "xmit Qdisc:remain:%d overflow:%d" % \
+			(len(tx_queue_list), of_count_tx_queue_list)
+		print "xmit netdevice:remain:%d overflow:%d" % \
+			(len(tx_xmit_list), of_count_tx_xmit_list)
+		print "receive:remain:%d overflow:%d" % \
+			(len(rx_skb_list), of_count_rx_skb_list)
+
+# called from perf, when it finds a corresponding event
+def irq__softirq_entry(name, context, cpu, sec, nsec, pid, comm, vec):
+	if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
+		return
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
+	all_event_list.append(event_info)
+
+def irq__softirq_exit(name, context, cpu, sec, nsec, pid, comm, vec):
+	if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
+		return
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
+	all_event_list.append(event_info)
+
+def irq__softirq_raise(name, context, cpu, sec, nsec, pid, comm, vec):
+	if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
+		return
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
+	all_event_list.append(event_info)
+
+def irq__irq_handler_entry(name, context, cpu, sec, nsec, pid, comm,
+			irq, irq_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			irq, irq_name)
+	all_event_list.append(event_info)
+
+def irq__irq_handler_exit(name, context, cpu, sec, nsec, pid, comm, irq, ret):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, irq, ret)
+	all_event_list.append(event_info)
+
+def napi__napi_poll(name, context, cpu, sec, nsec, pid, comm, napi, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			napi, dev_name)
+	all_event_list.append(event_info)
+
+def net__netif_receive_skb(name, context, cpu, sec, nsec, pid, comm, skbaddr,
+			skblen, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen, dev_name)
+	all_event_list.append(event_info)
+
+def net__netif_rx(name, context, cpu, sec, nsec, pid, comm, skbaddr,
+			skblen, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen, dev_name)
+	all_event_list.append(event_info)
+
+def net__net_dev_queue(name, context, cpu, sec, nsec, pid, comm,
+			skbaddr, skblen, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen, dev_name)
+	all_event_list.append(event_info)
+
+def net__net_dev_xmit(name, context, cpu, sec, nsec, pid, comm,
+			skbaddr, skblen, rc, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen, rc ,dev_name)
+	all_event_list.append(event_info)
+
+def skb__kfree_skb(name, context, cpu, sec, nsec, pid, comm,
+			skbaddr, protocol, location):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, protocol, location)
+	all_event_list.append(event_info)
+
+def skb__consume_skb(name, context, cpu, sec, nsec, pid, comm, skbaddr):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr)
+	all_event_list.append(event_info)
+
+def skb__skb_copy_datagram_iovec(name, context, cpu, sec, nsec, pid, comm,
+	skbaddr, skblen):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen)
+	all_event_list.append(event_info)
+
+def handle_irq_handler_entry(event_info):
+	(name, context, cpu, time, pid, comm, irq, irq_name) = event_info
+	if cpu not in irq_dic.keys():
+		irq_dic[cpu] = []
+	irq_record = {'irq':irq, 'name':irq_name, 'cpu':cpu, 'irq_ent_t':time}
+	irq_dic[cpu].append(irq_record)
+
+def handle_irq_handler_exit(event_info):
+	(name, context, cpu, time, pid, comm, irq, ret) = event_info
+	if cpu not in irq_dic.keys() or len(irq_dic[cpu]) == 0:
+		return
+	irq_record = irq_dic[cpu].pop()
+	if irq != irq_record['irq']:
+		return
+	irq_record.update({'irq_ext_t':time})
+	# if an irq doesn't include NET_RX softirq, drop.
+	if 'event_list' in irq_record.keys():
+		irq_dic[cpu].append(irq_record)
+
+def handle_irq_softirq_raise(event_info):
+	(name, context, cpu, time, pid, comm, vec) = event_info
+	if cpu not in irq_dic.keys() \
+	or len(irq_dic[cpu]) == 0:
+		return
+	irq_record = irq_dic[cpu].pop()
+	if 'event_list' in irq_record.keys():
+		irq_event_list = irq_record['event_list']
+	else:
+		irq_event_list = []
+	irq_event_list.append({'time':time, 'event':'sirq_raise'})
+	irq_record.update({'event_list':irq_event_list})
+	irq_dic[cpu].append(irq_record)
+
+def handle_irq_softirq_entry(event_info):
+	(name, context, cpu, time, pid, comm, vec) = event_info
+	net_rx_dic[cpu] = {'sirq_ent_t':time, 'event_list':[]}
+
+def handle_irq_softirq_exit(event_info):
+	(name, context, cpu, time, pid, comm, vec) = event_info
+	irq_list = []
+	event_list = 0
+	if cpu in irq_dic.keys():
+		irq_list = irq_dic[cpu]
+		del irq_dic[cpu]
+	if cpu in net_rx_dic.keys():
+		sirq_ent_t = net_rx_dic[cpu]['sirq_ent_t']
+		event_list = net_rx_dic[cpu]['event_list']
+		del net_rx_dic[cpu]
+	if irq_list == [] or event_list == 0:
+		return
+	rec_data = {'sirq_ent_t':sirq_ent_t, 'sirq_ext_t':time,
+		    'irq_list':irq_list, 'event_list':event_list}
+	# merge information related to a NET_RX softirq
+	receive_hunk_list.append(rec_data)
+
+def handle_napi_poll(event_info):
+	(name, context, cpu, time, pid, comm, napi, dev_name) = event_info
+	if cpu in net_rx_dic.keys():
+		event_list = net_rx_dic[cpu]['event_list']
+		rec_data = {'event_name':'napi_poll',
+				'dev':dev_name, 'event_t':time}
+		event_list.append(rec_data)
+
+def handle_netif_rx(event_info):
+	(name, context, cpu, time, pid, comm,
+		skbaddr, skblen, dev_name) = event_info
+	if cpu not in irq_dic.keys() \
+	or len(irq_dic[cpu]) == 0:
+		return
+	irq_record = irq_dic[cpu].pop()
+	if 'event_list' in irq_record.keys():
+		irq_event_list = irq_record['event_list']
+	else:
+		irq_event_list = []
+	irq_event_list.append({'time':time, 'event':'netif_rx',
+		'skbaddr':skbaddr, 'skblen':skblen, 'dev_name':dev_name})
+	irq_record.update({'event_list':irq_event_list})
+	irq_dic[cpu].append(irq_record)
+
+def handle_netif_receive_skb(event_info):
+	global of_count_rx_skb_list
+
+	(name, context, cpu, time, pid, comm,
+		skbaddr, skblen, dev_name) = event_info
+	if cpu in net_rx_dic.keys():
+		rec_data = {'event_name':'netif_receive_skb',
+			    'event_t':time, 'skbaddr':skbaddr, 'len':skblen}
+		event_list = net_rx_dic[cpu]['event_list']
+		event_list.append(rec_data)
+		rx_skb_list.insert(0, rec_data)
+		if len(rx_skb_list) > buffer_budget:
+			rx_skb_list.pop()
+			of_count_rx_skb_list += 1
+
+def handle_net_dev_queue(event_info):
+	global of_count_tx_queue_list
+
+	(name, context, cpu, time, pid, comm,
+		skbaddr, skblen, dev_name) = event_info
+	skb = {'dev':dev_name, 'skbaddr':skbaddr, 'len':skblen, 'queue_t':time}
+	tx_queue_list.insert(0, skb)
+	if len(tx_queue_list) > buffer_budget:
+		tx_queue_list.pop()
+		of_count_tx_queue_list += 1
+
+def handle_net_dev_xmit(event_info):
+	global of_count_tx_xmit_list
+
+	(name, context, cpu, time, pid, comm,
+		skbaddr, skblen, rc, dev_name) = event_info
+	if rc == 0: # NETDEV_TX_OK
+		for i in range(len(tx_queue_list)):
+			skb = tx_queue_list[i]
+			if skb['skbaddr'] == skbaddr:
+				skb['xmit_t'] = time
+				tx_xmit_list.insert(0, skb)
+				del tx_queue_list[i]
+				if len(tx_xmit_list) > buffer_budget:
+					tx_xmit_list.pop()
+					of_count_tx_xmit_list += 1
+				return
+
+def handle_kfree_skb(event_info):
+	(name, context, cpu, time, pid, comm,
+		skbaddr, protocol, location) = event_info
+	for i in range(len(tx_queue_list)):
+		skb = tx_queue_list[i]
+		if skb['skbaddr'] == skbaddr:
+			del tx_queue_list[i]
+			return
+	for i in range(len(tx_xmit_list)):
+		skb = tx_xmit_list[i]
+		if skb['skbaddr'] == skbaddr:
+			skb['free_t'] = time
+			tx_free_list.append(skb)
+			del tx_xmit_list[i]
+			return
+	for i in range(len(rx_skb_list)):
+		rec_data = rx_skb_list[i]
+		if rec_data['skbaddr'] == skbaddr:
+			rec_data.update({'handle':"kfree_skb",
+					'comm':comm, 'pid':pid, 'comm_t':time})
+			del rx_skb_list[i]
+			return
+
+def handle_consume_skb(event_info):
+	(name, context, cpu, time, pid, comm, skbaddr) = event_info
+	for i in range(len(tx_xmit_list)):
+		skb = tx_xmit_list[i]
+		if skb['skbaddr'] == skbaddr:
+			skb['free_t'] = time
+			tx_free_list.append(skb)
+			del tx_xmit_list[i]
+			return
+
+def handle_skb_copy_datagram_iovec(event_info):
+	(name, context, cpu, time, pid, comm, skbaddr, skblen) = event_info
+	for i in range(len(rx_skb_list)):
+		rec_data = rx_skb_list[i]
+		if skbaddr == rec_data['skbaddr']:
+			rec_data.update({'handle':"skb_copy_datagram_iovec",
+					'comm':comm, 'pid':pid, 'comm_t':time})
+			del rx_skb_list[i]
+			return
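
The three bounded lists in netdev-times.py (rx_skb_list, tx_queue_list and
tx_xmit_list) all follow the same pattern: insert the newest record at the
front, drop the oldest one once buffer_budget is exceeded, and count the drop
as an overflow. Below is a minimal standalone sketch of that pattern, written
for Python 3 rather than the Python 2 used by the script. The class name
BoundedSkbList and its method names are hypothetical and not part of the patch.

```python
class BoundedSkbList:
    """Newest-first list of skb records with a fixed budget (illustrative)."""

    def __init__(self, budget):
        self.budget = budget      # maximum number of records kept
        self.records = []         # index 0 holds the newest record
        self.overflow = 0         # number of records dropped for lack of room

    def push(self, record):
        # Newest record goes to the front, as with list.insert(0, ...) above.
        self.records.insert(0, record)
        if len(self.records) > self.budget:
            # Over budget: discard the oldest record and count the overflow.
            self.records.pop()
            self.overflow += 1

    def take(self, skbaddr):
        # Remove and return the newest record with this skb address, if any.
        for i, record in enumerate(self.records):
            if record['skbaddr'] == skbaddr:
                del self.records[i]
                return record
        return None
```

Searching from index 0 means a lookup by skbaddr finds the most recently seen
packet first, which is presumably the intent, since an skb address can be
reused once the earlier skb has been freed.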


* [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise
  2010-09-08  8:33   ` [tip:perf/core] irq: Add " tip-bot for Lai Jiangshan
@ 2010-09-08 11:25     ` Ingo Molnar
  2010-09-08 12:26       ` [PATCH] irq: Fix circular headers dependency Frederic Weisbecker
  2010-10-18  9:44       ` [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise Peter Zijlstra
  0 siblings, 2 replies; 93+ messages in thread
From: Ingo Molnar @ 2010-09-08 11:25 UTC (permalink / raw)
  To: mingo, mathieu.desnoyers, sanagi.koki, fweisbec, rostedt,
	nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel,
	eric.dumazet, kaneshige.kenji, davem, izumi.taku,
	kosaki.motohiro
  Cc: linux-tip-commits


* tip-bot for Lai Jiangshan <laijs@cn.fujitsu.com> wrote:

> Commit-ID:  2bf2160d8805de64308e2e7c3cd97813cb58ed2f
> Gitweb:     http://git.kernel.org/tip/2bf2160d8805de64308e2e7c3cd97813cb58ed2f
> Author:     Lai Jiangshan <laijs@cn.fujitsu.com>
> AuthorDate: Mon, 23 Aug 2010 18:42:48 +0900
> Committer:  Frederic Weisbecker <fweisbec@gmail.com>
> CommitDate: Tue, 7 Sep 2010 17:49:34 +0200
> 
> irq: Add tracepoint to softirq_raise
> 
> Add a tracepoint for tracing when softirq action is raised.
> 
> This and the existing tracepoints complete softirq's tracepoints:
> softirq_raise, softirq_entry and softirq_exit.
> 
> And when this tracepoint is used in combination with
> the softirq_entry tracepoint we can determine
> the softirq raise latency.
> 
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> Cc: David Miller <davem@davemloft.net>
> Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com>
> Cc: Izumo Taku <izumi.taku@jp.fujitsu.com>
> Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
> Cc: Scott Mcmillan <scott.a.mcmillan@intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> LKML-Reference: <4C724298.4050509@jp.fujitsu.com>
> [ factorize softirq events with DECLARE_EVENT_CLASS ]
> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> ---
>  include/linux/interrupt.h  |    8 +++++++-
>  include/trace/events/irq.h |   26 ++++++++++++++++++++++++--
>  2 files changed, 31 insertions(+), 3 deletions(-)

FYI, this commit broke the Sparc build:

In file included from /home/mingo/tip/arch/sparc/include/asm/irq_32.h:11,
                 from /home/mingo/tip/arch/sparc/include/asm/irq.h:6,
                 from /home/mingo/tip/include/linux/irqnr.h:10,
                 from /home/mingo/tip/include/linux/irq.h:22,
                 from /home/mingo/tip/include/asm-generic/hardirq.h:6,
                 from /home/mingo/tip/arch/sparc/include/asm/hardirq_32.h:11,
                 from /home/mingo/tip/arch/sparc/include/asm/hardirq.h:6,
                 from /home/mingo/tip/include/linux/hardirq.h:10,
                 from /home/mingo/tip/include/linux/ftrace_event.h:7,
                 from /home/mingo/tip/include/trace/syscall.h:6,
                 from /home/mingo/tip/include/linux/syscalls.h:76,
                 from /home/mingo/tip/init/initramfs.c:9:
/home/mingo/tip/include/linux/interrupt.h: In function '__raise_softirq_irqoff':
/home/mingo/tip/include/linux/interrupt.h:414: error: implicit declaration of function 'local_softirq_pending'
/home/mingo/tip/include/linux/interrupt.h:414: error: lvalue required as left operand of assignment
make[2]: *** [init/initramfs.o] Error 1
make[2]: *** Waiting for unfinished jobs....
In file included from /home/mingo/tip/arch/sparc/include/asm/irq_32.h:11,

	Ingo
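
The quoted commit message says that softirq_raise combined with softirq_entry
gives the softirq raise latency. A rough Python 3 sketch of that calculation
over already-collected events follows. The tuple layout (name, cpu, time_ns,
vec) and the function name are assumptions made for illustration; they are not
the callback signature perf passes to a script. The sketch also assumes that a
raise is serviced on the CPU that raised it, and that repeated raises before an
entry collapse into a single pending softirq.

```python
def softirq_raise_latencies(events):
    """Return (cpu, vec, latency_ns) for each raise -> entry pair.

    events: iterable of (name, cpu, time_ns, vec) tuples in any order
    (hypothetical layout, chosen for this example only).
    """
    pending = {}    # (cpu, vec) -> time of the earliest unserviced raise
    latencies = []
    # Process events in time order, as trace_end() does in netdev-times.py.
    for name, cpu, time_ns, vec in sorted(events, key=lambda e: e[2]):
        key = (cpu, vec)
        if name == 'softirq_raise':
            # Keep the first raise; later raises before entry add nothing.
            pending.setdefault(key, time_ns)
        elif name == 'softirq_entry' and key in pending:
            latencies.append((cpu, vec, time_ns - pending.pop(key)))
    return latencies
```

An entry with no matching raise (for example because tracing started after the
raise) is simply skipped rather than reported with a made-up latency.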


* [PATCH] irq: Fix circular headers dependency
  2010-09-08 11:25     ` [sparc build bug] " Ingo Molnar
@ 2010-09-08 12:26       ` Frederic Weisbecker
  2010-09-09 19:54         ` [tip:perf/core] " tip-bot for Frederic Weisbecker
  2010-10-18  9:44       ` [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise Peter Zijlstra
  1 sibling, 1 reply; 93+ messages in thread
From: Frederic Weisbecker @ 2010-09-08 12:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: mingo, mathieu.desnoyers, sanagi.koki, rostedt, nhorman,
	scott.a.mcmillan, tglx, laijs, hpa, linux-kernel, eric.dumazet,
	kaneshige.kenji, davem, izumi.taku, kosaki.motohiro,
	linux-tip-commits

On Wed, Sep 08, 2010 at 01:25:29PM +0200, Ingo Molnar wrote:
> 
> * tip-bot for Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> 
> > Commit-ID:  2bf2160d8805de64308e2e7c3cd97813cb58ed2f
> > Gitweb:     http://git.kernel.org/tip/2bf2160d8805de64308e2e7c3cd97813cb58ed2f
> > Author:     Lai Jiangshan <laijs@cn.fujitsu.com>
> > AuthorDate: Mon, 23 Aug 2010 18:42:48 +0900
> > Committer:  Frederic Weisbecker <fweisbec@gmail.com>
> > CommitDate: Tue, 7 Sep 2010 17:49:34 +0200
> > 
> > irq: Add tracepoint to softirq_raise
> > 
> > Add a tracepoint for tracing when softirq action is raised.
> > 
> > This and the existing tracepoints complete softirq's tracepoints:
> > softirq_raise, softirq_entry and softirq_exit.
> > 
> > And when this tracepoint is used in combination with
> > the softirq_entry tracepoint we can determine
> > the softirq raise latency.
> > 
> > Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> > Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Acked-by: Neil Horman <nhorman@tuxdriver.com>
> > Cc: David Miller <davem@davemloft.net>
> > Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com>
> > Cc: Izumo Taku <izumi.taku@jp.fujitsu.com>
> > Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
> > Cc: Scott Mcmillan <scott.a.mcmillan@intel.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Eric Dumazet <eric.dumazet@gmail.com>
> > LKML-Reference: <4C724298.4050509@jp.fujitsu.com>
> > [ factorize softirq events with DECLARE_EVENT_CLASS ]
> > Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > ---
> >  include/linux/interrupt.h  |    8 +++++++-
> >  include/trace/events/irq.h |   26 ++++++++++++++++++++++++--
> >  2 files changed, 31 insertions(+), 3 deletions(-)
> 
> FYI, this commit broke the Sparc build:
> 
> In file included from /home/mingo/tip/arch/sparc/include/asm/irq_32.h:11,
>                  from /home/mingo/tip/arch/sparc/include/asm/irq.h:6,
>                  from /home/mingo/tip/include/linux/irqnr.h:10,
>                  from /home/mingo/tip/include/linux/irq.h:22,
>                  from /home/mingo/tip/include/asm-generic/hardirq.h:6,
>                  from /home/mingo/tip/arch/sparc/include/asm/hardirq_32.h:11,
>                  from /home/mingo/tip/arch/sparc/include/asm/hardirq.h:6,
>                  from /home/mingo/tip/include/linux/hardirq.h:10,
>                  from /home/mingo/tip/include/linux/ftrace_event.h:7,
>                  from /home/mingo/tip/include/trace/syscall.h:6,
>                  from /home/mingo/tip/include/linux/syscalls.h:76,
>                  from /home/mingo/tip/init/initramfs.c:9:
> /home/mingo/tip/include/linux/interrupt.h: In function '__raise_softirq_irqoff':
> /home/mingo/tip/include/linux/interrupt.h:414: error: implicit declaration of function 'local_softirq_pending'
> /home/mingo/tip/include/linux/interrupt.h:414: error: lvalue required as left operand of assignment
> make[2]: *** [init/initramfs.o] Error 1
> make[2]: *** Waiting for unfinished jobs....
> In file included from /home/mingo/tip/arch/sparc/include/asm/irq_32.h:11,
> 
> 	Ingo


Yeah, there is a circular dependency.
Does this fix the issue (and if so, does it look sane)?

Thanks.

---
From fc21eaa02d4a6f0af396af6a106587e61515cd86 Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker <fweisbec@gmail.com>
Date: Wed, 8 Sep 2010 14:17:31 +0200
Subject: [PATCH] irq: Fix circular headers dependency

asm-generic/hardirq.h needs asm/irq.h which might include
linux/interrupt.h as in the sparc 32 case. At this point
we need irq_cpustat generic definitions, but those are
included later in asm-generic/hardirq.h.

So delay the inclusion of linux/irq.h in asm-generic/hardirq.h
a bit; it doesn't need to be included early.

This fixes:

In file included from arch/sparc/include/asm/irq_32.h:11,
                  from arch/sparc/include/asm/irq.h:6,
                  from include/linux/irqnr.h:10,
                  from include/linux/irq.h:22,
                  from include/asm-generic/hardirq.h:6,
                  from arch/sparc/include/asm/hardirq_32.h:11,
                  from arch/sparc/include/asm/hardirq.h:6,
                  from include/linux/hardirq.h:10,
                  from include/linux/ftrace_event.h:7,
                  from include/trace/syscall.h:6,
                  from include/linux/syscalls.h:76,
                  from init/initramfs.c:9:
include/linux/interrupt.h: In function '__raise_softirq_irqoff':
include/linux/interrupt.h:414: error: implicit declaration of function 'local_softirq_pending'
include/linux/interrupt.h:414: error: lvalue required as left operand of assignment

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 include/asm-generic/hardirq.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/asm-generic/hardirq.h b/include/asm-generic/hardirq.h
index 62f5908..04d0a97 100644
--- a/include/asm-generic/hardirq.h
+++ b/include/asm-generic/hardirq.h
@@ -3,13 +3,13 @@
 
 #include <linux/cache.h>
 #include <linux/threads.h>
-#include <linux/irq.h>
 
 typedef struct {
 	unsigned int __softirq_pending;
 } ____cacheline_aligned irq_cpustat_t;
 
 #include <linux/irq_cpustat.h>	/* Standard mappings for irq_cpustat_t above */
+#include <linux/irq.h>
 
 #ifndef ack_bad_irq
 static inline void ack_bad_irq(unsigned int irq)
-- 
1.6.2.3




* [tip:perf/core] irq: Fix circular headers dependency
  2010-09-08 12:26       ` [PATCH] irq: Fix circular headers dependency Frederic Weisbecker
@ 2010-09-09 19:54         ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 93+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2010-09-09 19:54 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, sanagi.koki, fweisbec, tglx, laijs, mingo

Commit-ID:  3b8fad3e2f5f69bfd8e42d099ca8582fb2342edf
Gitweb:     http://git.kernel.org/tip/3b8fad3e2f5f69bfd8e42d099ca8582fb2342edf
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Wed, 8 Sep 2010 14:26:00 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 9 Sep 2010 21:28:58 +0200

irq: Fix circular headers dependency

asm-generic/hardirq.h needs asm/irq.h which might include
linux/interrupt.h as in the sparc 32 case. At this point
we need irq_cpustat generic definitions, but those are
included later in asm-generic/hardirq.h.

So delay the inclusion of linux/irq.h in asm-generic/hardirq.h
a bit; it doesn't need to be included early.

This fixes:

 include/linux/interrupt.h: In function '__raise_softirq_irqoff':
 include/linux/interrupt.h:414: error: implicit declaration of function 'local_softirq_pending'
 include/linux/interrupt.h:414: error: lvalue required as left operand of assignment

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Cc: mathieu.desnoyers@efficios.com
Cc: rostedt@goodmis.org
Cc: nhorman@tuxdriver.com
Cc: scott.a.mcmillan@intel.com
Cc: eric.dumazet@gmail.com
Cc: kaneshige.kenji@jp.fujitsu.com
Cc: davem@davemloft.net
Cc: izumi.taku@jp.fujitsu.com
Cc: kosaki.motohiro@jp.fujitsu.com
LKML-Reference: <20100908122557.GA5310@nowhere>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/asm-generic/hardirq.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/asm-generic/hardirq.h b/include/asm-generic/hardirq.h
index 62f5908..04d0a97 100644
--- a/include/asm-generic/hardirq.h
+++ b/include/asm-generic/hardirq.h
@@ -3,13 +3,13 @@
 
 #include <linux/cache.h>
 #include <linux/threads.h>
-#include <linux/irq.h>
 
 typedef struct {
 	unsigned int __softirq_pending;
 } ____cacheline_aligned irq_cpustat_t;
 
 #include <linux/irq_cpustat.h>	/* Standard mappings for irq_cpustat_t above */
+#include <linux/irq.h>
 
 #ifndef ack_bad_irq
 static inline void ack_bad_irq(unsigned int irq)


* Re: [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise
  2010-09-08 11:25     ` [sparc build bug] " Ingo Molnar
  2010-09-08 12:26       ` [PATCH] irq: Fix circular headers dependency Frederic Weisbecker
@ 2010-10-18  9:44       ` Peter Zijlstra
  2010-10-18 10:11         ` Peter Zijlstra
  2010-10-18 10:48         ` Peter Zijlstra
  1 sibling, 2 replies; 93+ messages in thread
From: Peter Zijlstra @ 2010-10-18  9:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: mingo, mathieu.desnoyers, sanagi.koki, fweisbec, rostedt,
	nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel,
	eric.dumazet, kaneshige.kenji, davem, izumi.taku,
	kosaki.motohiro, linux-tip-commits, Heiko Carstens

On Wed, 2010-09-08 at 13:25 +0200, Ingo Molnar wrote:
> * tip-bot for Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> 
> > Commit-ID:  2bf2160d8805de64308e2e7c3cd97813cb58ed2f
> > Gitweb:     http://git.kernel.org/tip/2bf2160d8805de64308e2e7c3cd97813cb58ed2f
> > Author:     Lai Jiangshan <laijs@cn.fujitsu.com>
> > AuthorDate: Mon, 23 Aug 2010 18:42:48 +0900
> > Committer:  Frederic Weisbecker <fweisbec@gmail.com>
> > CommitDate: Tue, 7 Sep 2010 17:49:34 +0200
> > 
> > irq: Add tracepoint to softirq_raise
> > 
> > Add a tracepoint for tracing when softirq action is raised.
> > 
> > This and the existing tracepoints complete softirq's tracepoints:
> > softirq_raise, softirq_entry and softirq_exit.
> > 
> > And when this tracepoint is used in combination with
> > the softirq_entry tracepoint we can determine
> > the softirq raise latency.
> > 
> > Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> > Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Acked-by: Neil Horman <nhorman@tuxdriver.com>
> > Cc: David Miller <davem@davemloft.net>
> > Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com>
> > Cc: Izumo Taku <izumi.taku@jp.fujitsu.com>
> > Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
> > Cc: Scott Mcmillan <scott.a.mcmillan@intel.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Eric Dumazet <eric.dumazet@gmail.com>
> > LKML-Reference: <4C724298.4050509@jp.fujitsu.com>
> > [ factorize softirq events with DECLARE_EVENT_CLASS ]
> > Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > ---
> >  include/linux/interrupt.h  |    8 +++++++-
> >  include/trace/events/irq.h |   26 ++++++++++++++++++++++++--
> >  2 files changed, 31 insertions(+), 3 deletions(-)
> 
> FYI, this commit broke the Sparc build:
> 
> In file included from /home/mingo/tip/arch/sparc/include/asm/irq_32.h:11,
>                  from /home/mingo/tip/arch/sparc/include/asm/irq.h:6,
>                  from /home/mingo/tip/include/linux/irqnr.h:10,
>                  from /home/mingo/tip/include/linux/irq.h:22,
>                  from /home/mingo/tip/include/asm-generic/hardirq.h:6,
>                  from /home/mingo/tip/arch/sparc/include/asm/hardirq_32.h:11,
>                  from /home/mingo/tip/arch/sparc/include/asm/hardirq.h:6,
>                  from /home/mingo/tip/include/linux/hardirq.h:10,
>                  from /home/mingo/tip/include/linux/ftrace_event.h:7,
>                  from /home/mingo/tip/include/trace/syscall.h:6,
>                  from /home/mingo/tip/include/linux/syscalls.h:76,
>                  from /home/mingo/tip/init/initramfs.c:9:
> /home/mingo/tip/include/linux/interrupt.h: In function '__raise_softirq_irqoff':
> /home/mingo/tip/include/linux/interrupt.h:414: error: implicit declaration of function 'local_softirq_pending'
> /home/mingo/tip/include/linux/interrupt.h:414: error: lvalue required as left operand of assignment
> make[2]: *** [init/initramfs.o] Error 1
> make[2]: *** Waiting for unfinished jobs....
> In file included from /home/mingo/tip/arch/sparc/include/asm/irq_32.h:11,

I could build sparc64_defconfig, but s390 is broken for me by this...




* Re: [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise
  2010-10-18  9:44       ` [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise Peter Zijlstra
@ 2010-10-18 10:11         ` Peter Zijlstra
  2010-10-18 10:26           ` Heiko Carstens
  2010-10-18 10:48         ` Peter Zijlstra
  1 sibling, 1 reply; 93+ messages in thread
From: Peter Zijlstra @ 2010-10-18 10:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ingo Molnar, mathieu.desnoyers, sanagi.koki, fweisbec, rostedt,
	nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel,
	eric.dumazet, kaneshige.kenji, davem, izumi.taku,
	kosaki.motohiro, linux-tip-commits, Heiko Carstens

On Mon, 2010-10-18 at 11:44 +0200, Peter Zijlstra wrote:
> On Wed, 2010-09-08 at 13:25 +0200, Ingo Molnar wrote:
> > * tip-bot for Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> > 
> > > Commit-ID:  2bf2160d8805de64308e2e7c3cd97813cb58ed2f
> > > Gitweb:     http://git.kernel.org/tip/2bf2160d8805de64308e2e7c3cd97813cb58ed2f
> > > Author:     Lai Jiangshan <laijs@cn.fujitsu.com>
> > > AuthorDate: Mon, 23 Aug 2010 18:42:48 +0900
> > > Committer:  Frederic Weisbecker <fweisbec@gmail.com>
> > > CommitDate: Tue, 7 Sep 2010 17:49:34 +0200
> > > 
> > > irq: Add tracepoint to softirq_raise
> > > 
> > > Add a tracepoint for tracing when softirq action is raised.
> > > 
> > > This and the existing tracepoints complete softirq's tracepoints:
> > > softirq_raise, softirq_entry and softirq_exit.
> > > 
> > > And when this tracepoint is used in combination with
> > > the softirq_entry tracepoint we can determine
> > > the softirq raise latency.
> > > 
> > > Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> > > Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > Acked-by: Neil Horman <nhorman@tuxdriver.com>
> > > Cc: David Miller <davem@davemloft.net>
> > > Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com>
> > > Cc: Izumo Taku <izumi.taku@jp.fujitsu.com>
> > > Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
> > > Cc: Scott Mcmillan <scott.a.mcmillan@intel.com>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Cc: Eric Dumazet <eric.dumazet@gmail.com>
> > > LKML-Reference: <4C724298.4050509@jp.fujitsu.com>
> > > [ factorize softirq events with DECLARE_EVENT_CLASS ]
> > > Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > ---
> > >  include/linux/interrupt.h  |    8 +++++++-
> > >  include/trace/events/irq.h |   26 ++++++++++++++++++++++++--
> > >  2 files changed, 31 insertions(+), 3 deletions(-)
> > 

> I could build sparc64_defconfig, but s390 is broken for me by this...


the below makes s390 build again, not sure it's completely safe for all
configs though...

Heiko?

---
 arch/s390/include/asm/hardirq.h |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/s390/include/asm/hardirq.h b/arch/s390/include/asm/hardirq.h
index 498bc38..9558a71 100644
--- a/arch/s390/include/asm/hardirq.h
+++ b/arch/s390/include/asm/hardirq.h
@@ -15,7 +15,6 @@
 #include <linux/threads.h>
 #include <linux/sched.h>
 #include <linux/cache.h>
-#include <linux/interrupt.h>
 #include <asm/lowcore.h>
 
 #define local_softirq_pending() (S390_lowcore.softirq_pending)
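
Note on the failure mode: the two errors quoted earlier in the thread ("implicit
declaration of function 'local_softirq_pending'" followed by "lvalue required as
left operand of assignment") are what a C compiler typically reports when an
inline function that uses a function-like macro is parsed before that macro has
been defined. A minimal user-space sketch of the working order follows. It
assumes a stand-in `fake_lowcore` struct, a simplified `or_softirq_pending()`,
and a hypothetical `raise_softirq_sketch()`; none of these are the real kernel
definitions.

```c
/* Stand-in for the per-CPU lowcore area; hypothetical, not the s390 layout. */
struct fake_lowcore {
	unsigned int softirq_pending;
};

static struct fake_lowcore fake_lowcore;

/*
 * This macro must be visible before any inline function that uses it is
 * parsed. If the inline function below came first, the compiler would treat
 * local_softirq_pending() as a call to an undeclared function, and the
 * result of a function call is not an lvalue, so "|=" would be rejected.
 */
#define local_softirq_pending() (fake_lowcore.softirq_pending)

/* Simplified helper (assumption): expands to an assignment on an lvalue. */
#define or_softirq_pending(x) (local_softirq_pending() |= (x))

/* Hypothetical counterpart of an inline "raise" helper living in a header. */
static inline void raise_softirq_sketch(unsigned int nr)
{
	or_softirq_pending(1U << nr);
}
```

In this sketch, dropping the early include has the same effect as moving the
`#define` above the inline function: by the time the function body is parsed,
the macro expands to a plain struct member.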



* Re: [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise
  2010-10-18 10:11         ` Peter Zijlstra
@ 2010-10-18 10:26           ` Heiko Carstens
  0 siblings, 0 replies; 93+ messages in thread
From: Heiko Carstens @ 2010-10-18 10:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, mathieu.desnoyers, sanagi.koki, fweisbec, rostedt,
	nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel,
	eric.dumazet, kaneshige.kenji, davem, izumi.taku,
	kosaki.motohiro, linux-tip-commits

On Mon, Oct 18, 2010 at 12:11:47PM +0200, Peter Zijlstra wrote:
> > I could build sparc64_defconfig, but s390 is broken for me by this...
> 
> 
> the below makes s390 build again, not sure it's completely safe for all
> configs though...
> 
> Heiko?

We have a similar patch in our git tree to fix this issue:

http://git390.marist.edu/cgi-bin/gitweb.cgi?p=linux-2.6.git;a=commitdiff;h=b722c7e6ce9b52b58a5488aacfd90936a2720dd9

(link valid until the next rebase ;)
 
> ---
>  arch/s390/include/asm/hardirq.h |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/s390/include/asm/hardirq.h b/arch/s390/include/asm/hardirq.h
> index 498bc38..9558a71 100644
> --- a/arch/s390/include/asm/hardirq.h
> +++ b/arch/s390/include/asm/hardirq.h
> @@ -15,7 +15,6 @@
>  #include <linux/threads.h>
>  #include <linux/sched.h>
>  #include <linux/cache.h>
> -#include <linux/interrupt.h>
>  #include <asm/lowcore.h>
>  
>  #define local_softirq_pending() (S390_lowcore.softirq_pending)
> 


* Re: [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise
  2010-10-18  9:44       ` [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise Peter Zijlstra
  2010-10-18 10:11         ` Peter Zijlstra
@ 2010-10-18 10:48         ` Peter Zijlstra
  2010-10-19 10:58           ` Koki Sanagi
  1 sibling, 1 reply; 93+ messages in thread
From: Peter Zijlstra @ 2010-10-18 10:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ingo Molnar, mathieu.desnoyers, sanagi.koki, fweisbec, rostedt,
	nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel,
	eric.dumazet, kaneshige.kenji, davem, izumi.taku,
	kosaki.motohiro, linux-tip-commits, Heiko Carstens, Luck, Tony

On Mon, 2010-10-18 at 11:44 +0200, Peter Zijlstra wrote:
> On Wed, 2010-09-08 at 13:25 +0200, Ingo Molnar wrote:
> > * tip-bot for Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> > 
> > > Commit-ID:  2bf2160d8805de64308e2e7c3cd97813cb58ed2f
> > > Gitweb:     http://git.kernel.org/tip/2bf2160d8805de64308e2e7c3cd97813cb58ed2f
> > > Author:     Lai Jiangshan <laijs@cn.fujitsu.com>
> > > AuthorDate: Mon, 23 Aug 2010 18:42:48 +0900
> > > Committer:  Frederic Weisbecker <fweisbec@gmail.com>
> > > CommitDate: Tue, 7 Sep 2010 17:49:34 +0200
> > > 
> > > irq: Add tracepoint to softirq_raise
> > > 
> > > Add a tracepoint for tracing when softirq action is raised.
> > > 
> > > This and the existing tracepoints complete softirq's tracepoints:
> > > softirq_raise, softirq_entry and softirq_exit.
> > > 
> > > And when this tracepoint is used in combination with
> > > the softirq_entry tracepoint we can determine
> > > the softirq raise latency.
> > > 

> > In file included from /home/mingo/tip/arch/sparc/include/asm/irq_32.h:11,
> >                  from /home/mingo/tip/arch/sparc/include/asm/irq.h:6,
> >                  from /home/mingo/tip/include/linux/irqnr.h:10,
> >                  from /home/mingo/tip/include/linux/irq.h:22,
> >                  from /home/mingo/tip/include/asm-generic/hardirq.h:6,
> >                  from /home/mingo/tip/arch/sparc/include/asm/hardirq_32.h:11,
> >                  from /home/mingo/tip/arch/sparc/include/asm/hardirq.h:6,
> >                  from /home/mingo/tip/include/linux/hardirq.h:10,
> >                  from /home/mingo/tip/include/linux/ftrace_event.h:7,
> >                  from /home/mingo/tip/include/trace/syscall.h:6,
> >                  from /home/mingo/tip/include/linux/syscalls.h:76,
> >                  from /home/mingo/tip/init/initramfs.c:9:
> > /home/mingo/tip/include/linux/interrupt.h: In function '__raise_softirq_irqoff':
> > /home/mingo/tip/include/linux/interrupt.h:414: error: implicit declaration of function 'local_softirq_pending'
> > /home/mingo/tip/include/linux/interrupt.h:414: error: lvalue required as left operand of assignment
> > make[2]: *** [init/initramfs.o] Error 1
> > make[2]: *** Waiting for unfinished jobs....
> > In file included from /home/mingo/tip/arch/sparc/include/asm/irq_32.h:11,
> 
> I could build sparc64_defconfig, but s390 is broken for me by this...

/me being very grumpy @ Lai.. ia64 is broken too!

/me goes revert this shite


* Re: [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise
  2010-10-18 10:48         ` Peter Zijlstra
@ 2010-10-19 10:58           ` Koki Sanagi
  2010-10-19 11:25             ` Peter Zijlstra
  2010-10-19 13:00             ` [PATCH] tracing: Cleanup the convoluted softirq tracepoints Thomas Gleixner
  0 siblings, 2 replies; 93+ messages in thread
From: Koki Sanagi @ 2010-10-19 10:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, mathieu.desnoyers, fweisbec, rostedt, nhorman,
	scott.a.mcmillan, tglx, laijs, hpa, linux-kernel, eric.dumazet,
	kaneshige.kenji, davem, izumi.taku, kosaki.motohiro,
	linux-tip-commits, Heiko Carstens, Luck, Tony

(2010/10/18 19:48), Peter Zijlstra wrote:
> On Mon, 2010-10-18 at 11:44 +0200, Peter Zijlstra wrote:
>> On Wed, 2010-09-08 at 13:25 +0200, Ingo Molnar wrote:
>>> * tip-bot for Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>>>
>>>> Commit-ID:  2bf2160d8805de64308e2e7c3cd97813cb58ed2f
>>>> Gitweb:     http://git.kernel.org/tip/2bf2160d8805de64308e2e7c3cd97813cb58ed2f
>>>> Author:     Lai Jiangshan <laijs@cn.fujitsu.com>
>>>> AuthorDate: Mon, 23 Aug 2010 18:42:48 +0900
>>>> Committer:  Frederic Weisbecker <fweisbec@gmail.com>
>>>> CommitDate: Tue, 7 Sep 2010 17:49:34 +0200
>>>>
>>>> irq: Add tracepoint to softirq_raise
>>>>
>>>> Add a tracepoint for tracing when softirq action is raised.
>>>>
>>>> This and the existing tracepoints complete softirq's tracepoints:
>>>> softirq_raise, softirq_entry and softirq_exit.
>>>>
>>>> And when this tracepoint is used in combination with
>>>> the softirq_entry tracepoint we can determine
>>>> the softirq raise latency.
>>>>
> 
>>> In file included from /home/mingo/tip/arch/sparc/include/asm/irq_32.h:11,
>>>                  from /home/mingo/tip/arch/sparc/include/asm/irq.h:6,
>>>                  from /home/mingo/tip/include/linux/irqnr.h:10,
>>>                  from /home/mingo/tip/include/linux/irq.h:22,
>>>                  from /home/mingo/tip/include/asm-generic/hardirq.h:6,
>>>                  from /home/mingo/tip/arch/sparc/include/asm/hardirq_32.h:11,
>>>                  from /home/mingo/tip/arch/sparc/include/asm/hardirq.h:6,
>>>                  from /home/mingo/tip/include/linux/hardirq.h:10,
>>>                  from /home/mingo/tip/include/linux/ftrace_event.h:7,
>>>                  from /home/mingo/tip/include/trace/syscall.h:6,
>>>                  from /home/mingo/tip/include/linux/syscalls.h:76,
>>>                  from /home/mingo/tip/init/initramfs.c:9:
>>> /home/mingo/tip/include/linux/interrupt.h: In function '__raise_softirq_irqoff':
>>> /home/mingo/tip/include/linux/interrupt.h:414: error: implicit declaration of function 'local_softirq_pending'
>>> /home/mingo/tip/include/linux/interrupt.h:414: error: lvalue required as left operand of assignment
>>> make[2]: *** [init/initramfs.o] Error 1
>>> make[2]: *** Waiting for unfinished jobs....
>>> In file included from /home/mingo/tip/arch/sparc/include/asm/irq_32.h:11,
>>
>> I could build sparc64_defconfig, but s390 is broken for me by this...
> 
> /me being very grumpy @ Lai.. ia64 is broken too!
> 
> /me goes revert this shite
> 

Now it is fixed.

http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commitdiff;h=9f081ce5da2c8af297a0a7d15a57fb4beeed374b;hp=43e3bf203456c4f06bdd6060426976ad2bed9081



* Re: [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise
  2010-10-19 10:58           ` Koki Sanagi
@ 2010-10-19 11:25             ` Peter Zijlstra
  2010-10-19 13:00             ` [PATCH] tracing: Cleanup the convoluted softirq tracepoints Thomas Gleixner
  1 sibling, 0 replies; 93+ messages in thread
From: Peter Zijlstra @ 2010-10-19 11:25 UTC (permalink / raw)
  To: Koki Sanagi
  Cc: Ingo Molnar, mathieu.desnoyers, fweisbec, rostedt, nhorman,
	scott.a.mcmillan, tglx, laijs, hpa, linux-kernel, eric.dumazet,
	kaneshige.kenji, davem, izumi.taku, kosaki.motohiro,
	linux-tip-commits, Heiko Carstens, Luck, Tony

On Tue, 2010-10-19 at 19:58 +0900, Koki Sanagi wrote:
> >> I could build sparc64_defconfig, but s390 is broken for me by this...
> > 
> > /me being very grumpy @ Lai.. ia64 is broken too!
> > 
> > /me goes revert this shite
> > 
> 
> Now it is fixed.
> 
> http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commitdiff;h=9f081ce5da2c8af297a0a7d15a57fb4beeed374b;hp=43e3bf203456c4f06bdd6060426976ad2bed9081

No, it's not; -tip is still not buildable on s390 and ia64.

Please don't ever pull crap like this again.


* [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 10:58           ` Koki Sanagi
  2010-10-19 11:25             ` Peter Zijlstra
@ 2010-10-19 13:00             ` Thomas Gleixner
  2010-10-19 13:08               ` Peter Zijlstra
                                 ` (2 more replies)
  1 sibling, 3 replies; 93+ messages in thread
From: Thomas Gleixner @ 2010-10-19 13:00 UTC (permalink / raw)
  To: Koki Sanagi
  Cc: Peter Zijlstra, Ingo Molnar, mathieu.desnoyers,
	Frederic Weisbecker, Steven Rostedt, nhorman, scott.a.mcmillan,
	laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

With the addition of trace_softirq_raise() the softirq tracepoint got
even more convoluted. Why the tracepoints take two pointers to assign
an integer is beyond my comprehension.

But adding an extra case which treats the first pointer as an unsigned
long when the second pointer is NULL including the back and forth
type casting is just horrible.

Convert the softirq tracepoints to take a single unsigned int argument
for the softirq vector number and fix the call sites.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/interrupt.h  |    2 -
 include/trace/events/irq.h |   54 ++++++++++++++++-----------------------------
 kernel/softirq.c           |   14 ++++++-----
 3 files changed, 29 insertions(+), 41 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 531495d..0ac1949 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -410,7 +410,7 @@ extern void open_softirq(int nr, void (*action)(struct softirq_action *));
 extern void softirq_init(void);
 static inline void __raise_softirq_irqoff(unsigned int nr)
 {
-	trace_softirq_raise((struct softirq_action *)(unsigned long)nr, NULL);
+	trace_softirq_raise(nr);
 	or_softirq_pending(1UL << nr);
 }
 
diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
index 6fa7cba..1c09820 100644
--- a/include/trace/events/irq.h
+++ b/include/trace/events/irq.h
@@ -86,76 +86,62 @@ TRACE_EVENT(irq_handler_exit,
 
 DECLARE_EVENT_CLASS(softirq,
 
-	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
+	TP_PROTO(unsigned int vec_nr),
 
-	TP_ARGS(h, vec),
+	TP_ARGS(vec_nr),
 
 	TP_STRUCT__entry(
-		__field(	int,	vec			)
+		__field(	unsigned int,	vec	)
 	),
 
 	TP_fast_assign(
-		if (vec)
-			__entry->vec = (int)(h - vec);
-		else
-			__entry->vec = (int)(long)h;
+		__entry->vec = vec_nr;
 	),
 
-	TP_printk("vec=%d [action=%s]", __entry->vec,
+	TP_printk("vec=%u [action=%s]", __entry->vec,
 		  show_softirq_name(__entry->vec))
 );
 
 /**
  * softirq_entry - called immediately before the softirq handler
- * @h: pointer to struct softirq_action
- * @vec: pointer to first struct softirq_action in softirq_vec array
+ * @vec_nr:  softirq vector number
  *
- * The @h parameter, contains a pointer to the struct softirq_action
- * which has a pointer to the action handler that is called. By subtracting
- * the @vec pointer from the @h pointer, we can determine the softirq
- * number. Also, when used in combination with the softirq_exit tracepoint
- * we can determine the softirq latency.
+ * When used in combination with the softirq_exit tracepoint
+ * we can determine the softirq handler runtime.
  */
 DEFINE_EVENT(softirq, softirq_entry,
 
-	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
+	TP_PROTO(unsigned int vec_nr),
 
-	TP_ARGS(h, vec)
+	TP_ARGS(vec_nr)
 );
 
 /**
  * softirq_exit - called immediately after the softirq handler returns
- * @h: pointer to struct softirq_action
- * @vec: pointer to first struct softirq_action in softirq_vec array
+ * @vec_nr:  softirq vector number
  *
- * The @h parameter contains a pointer to the struct softirq_action
- * that has handled the softirq. By subtracting the @vec pointer from
- * the @h pointer, we can determine the softirq number. Also, when used in
- * combination with the softirq_entry tracepoint we can determine the softirq
- * latency.
+ * When used in combination with the softirq_entry tracepoint
+ * we can determine the softirq handler runtime.
  */
 DEFINE_EVENT(softirq, softirq_exit,
 
-	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
+	TP_PROTO(unsigned int vec_nr),
 
-	TP_ARGS(h, vec)
+	TP_ARGS(vec_nr)
 );
 
 /**
  * softirq_raise - called immediately when a softirq is raised
- * @h: pointer to struct softirq_action
- * @vec: pointer to first struct softirq_action in softirq_vec array
+ * @vec_nr:  softirq vector number
  *
- * The @h parameter contains a pointer to the softirq vector number which is
- * raised. @vec is NULL and it means @h includes vector number not
- * softirq_action. When used in combination with the softirq_entry tracepoint
- * we can determine the softirq raise latency.
+ * When used in combination with the softirq_entry tracepoint
+ * we can determine the softirq raise to run latency.
  */
 DEFINE_EVENT(softirq, softirq_raise,
 
-	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
+	TP_PROTO(unsigned int vec_nr),
 
-	TP_ARGS(h, vec)
+	TP_ARGS(vec_nr)
 );
 
 #endif /*  _TRACE_IRQ_H */
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 07b4f1b..c0a9ea5 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -212,18 +212,20 @@ restart:
 
 	do {
 		if (pending & 1) {
+			unsigned int vec_nr = h - softirq_vec;
 			int prev_count = preempt_count();
-			kstat_incr_softirqs_this_cpu(h - softirq_vec);
 
-			trace_softirq_entry(h, softirq_vec);
+			kstat_incr_softirqs_this_cpu(vec_nr);
+
+			trace_softirq_entry(vec_nr);
 			h->action(h);
-			trace_softirq_exit(h, softirq_vec);
+			trace_softirq_exit(vec_nr);
 			if (unlikely(prev_count != preempt_count())) {
 				printk(KERN_ERR "huh, entered softirq %td %s %p"
 				       "with preempt_count %08x,"
-				       " exited with %08x?\n", h - softirq_vec,
-				       softirq_to_name[h - softirq_vec],
-				       h->action, prev_count, preempt_count());
+				       " exited with %08x?\n", vec_nr,
+				       softirq_to_name[vec_nr], h->action,
+				       prev_count, preempt_count());
 				preempt_count() = prev_count;
 			}
 


* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 13:00             ` [PATCH] tracing: Cleanup the convoluted softirq tracepoints Thomas Gleixner
@ 2010-10-19 13:08               ` Peter Zijlstra
  2010-10-19 13:22               ` Mathieu Desnoyers
  2010-10-21 14:52               ` [tip:perf/core] " tip-bot for Thomas Gleixner
  2 siblings, 0 replies; 93+ messages in thread
From: Peter Zijlstra @ 2010-10-19 13:08 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Koki Sanagi, Ingo Molnar, mathieu.desnoyers, Frederic Weisbecker,
	Steven Rostedt, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin,
	LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku,
	kosaki.motohiro, Heiko Carstens, Luck, Tony

On Tue, 2010-10-19 at 15:00 +0200, Thomas Gleixner wrote:
> 
> With the addition of trace_softirq_raise() the softirq tracepoint got
> even more convoluted. Why the tracepoints take two pointers to assign
> an integer is beyond my comprehension.
> 
> But adding an extra case which treats the first pointer as an unsigned
> long when the second pointer is NULL including the back and forth
> type casting is just horrible.
> 
> Convert the softirq tracepoints to take a single unsigned int argument
> for the softirq vector number and fix the call sites.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> 

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

A much needed cleanup indeed!
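
For reference, the interface change described in the patch can be sketched in
plain user-space C. This is only an illustration under stated assumptions:
`struct softirq_action_sketch`, `softirq_vec_sketch`, `old_style_vec()` and
`new_style_vec()` are hypothetical names, and the functions model just the
argument handling, not the kernel's TRACE_EVENT machinery.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct softirq_action. */
struct softirq_action_sketch {
	void (*action)(struct softirq_action_sketch *);
};

#define NR_SOFTIRQS_SKETCH 10
static struct softirq_action_sketch softirq_vec_sketch[NR_SOFTIRQS_SKETCH];

/*
 * Old style: two pointers. If vec is non-NULL the vector number is the
 * pointer difference; if vec is NULL, h is not a pointer at all but a
 * vector number smuggled through a cast. The sketch uses uintptr_t for
 * portability where the removed kernel code cast through long.
 */
static unsigned int old_style_vec(struct softirq_action_sketch *h,
				  struct softirq_action_sketch *vec)
{
	if (vec)
		return (unsigned int)(h - vec);
	return (unsigned int)(uintptr_t)h;
}

/* New style: the caller passes the vector number directly. */
static unsigned int new_style_vec(unsigned int vec_nr)
{
	return vec_nr;
}
```

With the single-argument form, the entry/exit call sites compute the index once
and the raise call site passes `nr` as is, so no cast is needed in either place.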


* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 13:00             ` [PATCH] tracing: Cleanup the convoluted softirq tracepoints Thomas Gleixner
  2010-10-19 13:08               ` Peter Zijlstra
@ 2010-10-19 13:22               ` Mathieu Desnoyers
  2010-10-19 13:41                 ` Thomas Gleixner
  2010-10-21 14:52               ` [tip:perf/core] " tip-bot for Thomas Gleixner
  2 siblings, 1 reply; 93+ messages in thread
From: Mathieu Desnoyers @ 2010-10-19 13:22 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker,
	Steven Rostedt, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin,
	LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku,
	kosaki.motohiro, Heiko Carstens, Luck, Tony

* Thomas Gleixner (tglx@linutronix.de) wrote:
> With the addition of trace_softirq_raise() the softirq tracepoint got
> even more convoluted. Why the tracepoints take two pointers to assign
> an integer is beyond my comprehension.
> 
> But adding an extra case which treats the first pointer as an unsigned
> long when the second pointer is NULL including the back and forth
> type casting is just horrible.
> 
> Convert the softirq tracepoints to take a single unsigned int argument
> for the softirq vector number and fix the call sites.

Well, there was originally a reason for this oddness. In __do_softirq(), the
"h - softirq_vec" computation was not needed outside of the tracepoint handler
in the past, but it now seems to be required with the new inlined
"kstat_incr_softirqs_this_cpu()".

So yes, thanks to this recent change, it now makes sense to pull this
computation out of the tracepoints and do it unconditionally in the kernel code.

Feel free to put my:
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

Thanks,

Mathieu

> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  include/linux/interrupt.h  |    2 -
>  include/trace/events/irq.h |   54 ++++++++++++++++-----------------------------
>  kernel/softirq.c           |   14 ++++++-----
>  3 files changed, 29 insertions(+), 41 deletions(-)
> 
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 531495d..0ac1949 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -410,7 +410,7 @@ extern void open_softirq(int nr, void (*action)(struct softirq_action *));
>  extern void softirq_init(void);
>  static inline void __raise_softirq_irqoff(unsigned int nr)
>  {
> -	trace_softirq_raise((struct softirq_action *)(unsigned long)nr, NULL);
> +	trace_softirq_raise(nr);
>  	or_softirq_pending(1UL << nr);
>  }
>  
> diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
> index 6fa7cba..1c09820 100644
> --- a/include/trace/events/irq.h
> +++ b/include/trace/events/irq.h
> @@ -86,76 +86,62 @@ TRACE_EVENT(irq_handler_exit,
>  
>  DECLARE_EVENT_CLASS(softirq,
>  
> -	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
> +	TP_PROTO(unsigned int vec_nr),
>  
> -	TP_ARGS(h, vec),
> +	TP_ARGS(vec_nr),
>  
>  	TP_STRUCT__entry(
> -		__field(	int,	vec			)
> +		__field(	unsigned int,	vec	)
>  	),
>  
>  	TP_fast_assign(
> -		if (vec)
> -			__entry->vec = (int)(h - vec);
> -		else
> -			__entry->vec = (int)(long)h;
> +		__entry->vec = vec_nr;
>  	),
>  
> -	TP_printk("vec=%d [action=%s]", __entry->vec,
> +	TP_printk("vec=%u [action=%s]", __entry->vec,
>  		  show_softirq_name(__entry->vec))
>  );
>  
>  /**
>   * softirq_entry - called immediately before the softirq handler
> - * @h: pointer to struct softirq_action
> - * @vec: pointer to first struct softirq_action in softirq_vec array
> + * @vec_nr:  softirq vector number
>   *
> - * The @h parameter, contains a pointer to the struct softirq_action
> - * which has a pointer to the action handler that is called. By subtracting
> - * the @vec pointer from the @h pointer, we can determine the softirq
> - * number. Also, when used in combination with the softirq_exit tracepoint
> - * we can determine the softirq latency.
> + * When used in combination with the softirq_exit tracepoint
> + * we can determine the softirq handler runtime.
>   */
>  DEFINE_EVENT(softirq, softirq_entry,
>  
> -	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
> +	TP_PROTO(unsigned int vec_nr),
>  
> -	TP_ARGS(h, vec)
> +	TP_ARGS(vec_nr)
>  );
>  
>  /**
>   * softirq_exit - called immediately after the softirq handler returns
> - * @h: pointer to struct softirq_action
> - * @vec: pointer to first struct softirq_action in softirq_vec array
> + * @vec_nr:  softirq vector number
>   *
> - * The @h parameter contains a pointer to the struct softirq_action
> - * that has handled the softirq. By subtracting the @vec pointer from
> - * the @h pointer, we can determine the softirq number. Also, when used in
> - * combination with the softirq_entry tracepoint we can determine the softirq
> - * latency.
> + * When used in combination with the softirq_entry tracepoint
> + * we can determine the softirq handler runtime.
>   */
>  DEFINE_EVENT(softirq, softirq_exit,
>  
> -	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
> +	TP_PROTO(unsigned int vec_nr),
>  
> -	TP_ARGS(h, vec)
> +	TP_ARGS(vec_nr)
>  );
>  
>  /**
>   * softirq_raise - called immediately when a softirq is raised
> - * @h: pointer to struct softirq_action
> - * @vec: pointer to first struct softirq_action in softirq_vec array
> + * @vec_nr:  softirq vector number
>   *
> - * The @h parameter contains a pointer to the softirq vector number which is
> - * raised. @vec is NULL and it means @h includes vector number not
> - * softirq_action. When used in combination with the softirq_entry tracepoint
> - * we can determine the softirq raise latency.
> + * When used in combination with the softirq_entry tracepoint
> + * we can determine the softirq raise to run latency.
>   */
>  DEFINE_EVENT(softirq, softirq_raise,
>  
> -	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
> +	TP_PROTO(unsigned int vec_nr),
>  
> -	TP_ARGS(h, vec)
> +	TP_ARGS(vec_nr)
>  );
>  
>  #endif /*  _TRACE_IRQ_H */
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 07b4f1b..c0a9ea5 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -212,18 +212,20 @@ restart:
>  
>  	do {
>  		if (pending & 1) {
> +			unsigned int vec_nr = h - softirq_vec;
>  			int prev_count = preempt_count();
> -			kstat_incr_softirqs_this_cpu(h - softirq_vec);
>  
> -			trace_softirq_entry(h, softirq_vec);
> +			kstat_incr_softirqs_this_cpu(vec_nr);
> +
> +			trace_softirq_entry(vec_nr);
>  			h->action(h);
> -			trace_softirq_exit(h, softirq_vec);
> +			trace_softirq_exit(vec_nr);
>  			if (unlikely(prev_count != preempt_count())) {
>  				printk(KERN_ERR "huh, entered softirq %td %s %p"
>  				       "with preempt_count %08x,"
> -				       " exited with %08x?\n", h - softirq_vec,
> -				       softirq_to_name[h - softirq_vec],
> -				       h->action, prev_count, preempt_count());
> +				       " exited with %08x?\n", vec_nr,
> +				       softirq_to_name[vec_nr], h->action,
> +				       prev_count, preempt_count());
>  				preempt_count() = prev_count;
>  			}
>  

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 13:22               ` Mathieu Desnoyers
@ 2010-10-19 13:41                 ` Thomas Gleixner
  2010-10-19 13:54                   ` Steven Rostedt
  2010-10-19 14:00                   ` Mathieu Desnoyers
  0 siblings, 2 replies; 93+ messages in thread
From: Thomas Gleixner @ 2010-10-19 13:41 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker,
	Steven Rostedt, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin,
	LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku,
	kosaki.motohiro, Heiko Carstens, Luck, Tony

On Tue, 19 Oct 2010, Mathieu Desnoyers wrote:

> * Thomas Gleixner (tglx@linutronix.de) wrote:
> > With the addition of trace_softirq_raise() the softirq tracepoint got
> > even more convoluted. Why the tracepoints take two pointers to assign
> > an integer is beyond my comprehension.
> > 
> > But adding an extra case which treats the first pointer as an unsigned
> > long when the second pointer is NULL including the back and forth
> > type casting is just horrible.
> > 
> > Convert the softirq tracepoints to take a single unsigned int argument
> > for the softirq vector number and fix the call sites.
> 
> Well, there was originally a reason for this oddness. In __do_softirq(), the
> "h - softirq_vec" computation was not needed outside of the tracepoint handler
> in the past, but it now seems to be required with the new inlined
> "kstat_incr_softirqs_this_cpu()".

Dudes, a vector computation is hardly a performance problem in that
function and definitely not an excuse for designing such horrible
interfaces.

Thanks,

	tglx


* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 13:41                 ` Thomas Gleixner
@ 2010-10-19 13:54                   ` Steven Rostedt
  2010-10-19 14:07                     ` Thomas Gleixner
  2010-10-19 14:00                   ` Mathieu Desnoyers
  1 sibling, 1 reply; 93+ messages in thread
From: Steven Rostedt @ 2010-10-19 13:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs,
	H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Tue, 2010-10-19 at 15:41 +0200, Thomas Gleixner wrote:
> On Tue, 19 Oct 2010, Mathieu Desnoyers wrote:

> Dudes, a vector computation is hardly a performance problem in that
> function and definitely not an excuse for designing such horrible
> interfaces.

Yes, now we can be a bit more liberal. But when these tracepoints were
going in, people were watching to make sure they had practically zero
impact when tracing was disabled.

Now that people are more used to tracepoints, they are more willing to
accept cleaner code at the cost of a few extra lines of machine code in
the fast path.

-- Steve
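
The trade-off discussed here can be illustrated with a small user-space sketch.
It assumes a plain boolean `trace_enabled_sketch` in place of the kernel's real
enable mechanism, and the helper names (`index_of()`, `trace_entry_deferred()`,
`trace_entry_eager()`) are hypothetical. The deferred style passes a pointer and
does the subtraction only when tracing is on; the eager style has the caller
compute the index unconditionally.

```c
#include <stdbool.h>

static bool trace_enabled_sketch;       /* hypothetical on/off switch */
static unsigned int last_traced_vec;    /* what the "handler" recorded */
static unsigned int index_computations; /* counts pointer subtractions */

struct action_sketch {
	int unused;
};

static struct action_sketch vec_sketch[10];

/* Compute the array index of h and count how often that happens. */
static unsigned int index_of(struct action_sketch *h)
{
	index_computations++;
	return (unsigned int)(h - vec_sketch);
}

/* Deferred style: the index is computed only if tracing is enabled. */
static void trace_entry_deferred(struct action_sketch *h)
{
	if (trace_enabled_sketch)
		last_traced_vec = index_of(h);
}

/* Eager style: the caller has already computed the index. */
static void trace_entry_eager(unsigned int vec_nr)
{
	if (trace_enabled_sketch)
		last_traced_vec = vec_nr;
}
```

Calling `trace_entry_eager(index_of(h))` pays for the subtraction even with
tracing off, which is the extra fast-path cost being weighed against the
cleaner single-argument interface.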




* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 13:41                 ` Thomas Gleixner
  2010-10-19 13:54                   ` Steven Rostedt
@ 2010-10-19 14:00                   ` Mathieu Desnoyers
  1 sibling, 0 replies; 93+ messages in thread
From: Mathieu Desnoyers @ 2010-10-19 14:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker,
	Steven Rostedt, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin,
	LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku,
	kosaki.motohiro, Heiko Carstens, Luck, Tony

* Thomas Gleixner (tglx@linutronix.de) wrote:
> On Tue, 19 Oct 2010, Mathieu Desnoyers wrote:
> 
> > * Thomas Gleixner (tglx@linutronix.de) wrote:
> > > With the addition of trace_softirq_raise() the softirq tracepoint got
> > > even more convoluted. Why the tracepoints take two pointers to assign
> > > an integer is beyond my comprehension.
> > > 
> > > But adding an extra case which treats the first pointer as an unsigned
> > > long when the second pointer is NULL including the back and forth
> > > type casting is just horrible.
> > > 
> > > Convert the softirq tracepoints to take a single unsigned int argument
> > > for the softirq vector number and fix the call sites.
> > 
> > Well, there was originally a reason for this oddness. In the past, the
> > "h - softirq_vec" computation in __do_softirq() was not needed outside of
> > the tracepoint handler, but it now seems to be required with the new inlined
> > "kstat_incr_softirqs_this_cpu()".
> 
> Dudes, a vector computation is hardly a performance problem in that
> function and definitely not an excuse for designing such horrible
> interfaces.

In this specific case, I think you are right. But things are not that trivial,
and you know it. We have to consider:

- Extra instruction cache footprint
- Added register pressure
- Added computation overhead of the added subtraction
- Frequency of code execution

for all target architectures when we add tracepoints to performance sensitive
code paths. As a general policy, we try to keep these at the lowest possible
level, so that all tracepoints will be compiled into distro kernels without
perceivable _overall_ performance overhead. It's not something that should be
looked at only on a tracepoint-by-tracepoint overhead basis, but rather by
looking at the overall system degradation that adding 300 tracepoints would
cause.

So I agree with you that it's a trade-off between interface cleanness and
performance. When they were introduced, tracepoint handlers were barely seen as
citizens of the kernel code base, so all that mattered was to keep the
"tracepoints off" case clean and fast. Now that tracepoint handlers seem to be
increasingly accepted as part of the kernel code base, I agree that taking into
account oddness performed in this handler becomes more important. It ends up
being a question of balance between oddness inside the tracepoint handler and
performance overhead in the off-case. The increased acceptance of the tracepoint
code-base has shifted this balance slightly in favor of cleanness.
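
For reference, the "vector computation" being argued about is plain pointer
subtraction. The following is a minimal userspace sketch, not kernel code: the
struct is a simplified stand-in, the array size is chosen for the sketch, and
the two helper names are hypothetical.

```c
/* Simplified stand-ins for the kernel definitions (illustration only). */
#define NR_SOFTIRQS 10	/* array size chosen for this sketch */

struct softirq_action {
	void (*action)(struct softirq_action *);
};

static struct softirq_action softirq_vec[NR_SOFTIRQS];

/*
 * Old-style interface (hypothetical helper): the handler is handed two
 * pointers and derives the vector number itself.
 */
static unsigned int vec_nr_from_pointers(const struct softirq_action *h,
					 const struct softirq_action *vec)
{
	/* Pointer subtraction yields the element index, not a byte offset. */
	return (unsigned int)(h - vec);
}

/*
 * Proposed interface (hypothetical helper): the call site does the
 * subtraction and passes a plain vector number.
 */
static unsigned int vec_nr_direct(unsigned int vec_nr)
{
	return vec_nr;
}
```

Both variants produce the same number; the difference is only whether the
subtraction is emitted at the call site or inside the handler.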

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 13:54                   ` Steven Rostedt
@ 2010-10-19 14:07                     ` Thomas Gleixner
  2010-10-19 14:28                       ` Mathieu Desnoyers
  2010-10-19 14:46                       ` Steven Rostedt
  0 siblings, 2 replies; 93+ messages in thread
From: Thomas Gleixner @ 2010-10-19 14:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs,
	H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Tue, 19 Oct 2010, Steven Rostedt wrote:

> On Tue, 2010-10-19 at 15:41 +0200, Thomas Gleixner wrote:
> > On Tue, 19 Oct 2010, Mathieu Desnoyers wrote:
> 
> > Dudes, a vector computation is hardly a performance problem in that
> > function and definitely not an excuse for designing such horrible
> > interfaces.
> 
> Yes, now we can be a bit more liberal. But when these tracepoints were
> going in, people were watching to make sure they have practically zero
> impact when tracing was disabled.
> 
> Now that people are more used to tracepoints, they are more willing to
> accept cleaner code in exchange for a few extra lines of machine code in
> the fast path.

Compared to the extra tracing-induced jumps, the vector computation is
probably not even measurable. Stop defending horrible coding with
handwavy performance and impact arguments.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 14:07                     ` Thomas Gleixner
@ 2010-10-19 14:28                       ` Mathieu Desnoyers
  2010-10-19 19:49                         ` Thomas Gleixner
  2010-10-19 14:46                       ` Steven Rostedt
  1 sibling, 1 reply; 93+ messages in thread
From: Mathieu Desnoyers @ 2010-10-19 14:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Steven Rostedt, Koki Sanagi, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs,
	H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

* Thomas Gleixner (tglx@linutronix.de) wrote:
> On Tue, 19 Oct 2010, Steven Rostedt wrote:
> 
> > On Tue, 2010-10-19 at 15:41 +0200, Thomas Gleixner wrote:
> > > On Tue, 19 Oct 2010, Mathieu Desnoyers wrote:
> > 
> > > Dudes, a vector computation is hardly a performance problem in that
> > > function and definitely not an excuse for designing such horrible
> > > interfaces.
> > 
> > Yes, now we can be a bit more liberal. But when these tracepoints were
> > going in, people were watching to make sure they have practically zero
> > impact when tracing was disabled.
> > 
> > Now that people are more used to tracepoints, they are more willing to
> > accept cleaner code in exchange for a few extra lines of machine code in
> > the fast path.
> 
> Compared to the extra tracing-induced jumps, the vector computation is
> probably not even measurable. Stop defending horrible coding with
> handwavy performance and impact arguments.

From the moment the markers and tracepoints infrastructures were merged, the
performance overhead target has assumed that we would eventually merge
"asm goto jump labels", which replace the load+test+branch with a no-op.

So compared to a 5 bytes no-op added to the fast path, this vector computation
can be expected to have a higher performance impact, because skipping a no-op on
modern architectures (x86 at least) adds technically zero cycles. Agreed, there
is still the impact on I$, extra register pressure, some leaf functions becoming
non-leaf, and an added function call (which implies external side-effects, thus acting
like a barrier()). But saying that all we do is to provide handwavy performance
and impact arguments is a bit much.

Until asm goto is more widely deployed and gcc 4.5 is more widely used, there
are some sites I am reluctant to instrument with tracepoints (e.g. all system
call entry/exit sites). However,
we should not use the cost of the current load+test+branch tracepoint behavior
as an excuse for adding extra performance impact to kernel code, because when it
will be replaced by asm gotos, all that will be left is the performance impact
inappropriately justified as insignificant compared to the impact of the old
tracepoint scheme.

Thanks,

Mathieu

> 
> Thanks,
> 
> 	tglx

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 14:07                     ` Thomas Gleixner
  2010-10-19 14:28                       ` Mathieu Desnoyers
@ 2010-10-19 14:46                       ` Steven Rostedt
  1 sibling, 0 replies; 93+ messages in thread
From: Steven Rostedt @ 2010-10-19 14:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs,
	H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Tue, 2010-10-19 at 16:07 +0200, Thomas Gleixner wrote:

> Compared to the extra tracing-induced jumps, the vector computation is
> probably not even measurable. Stop defending horrible coding with
> handwavy performance and impact arguments.

Yes this was crappy code, I'm not defending it. But this code was from
the original tracepoints. I just looked at when this code was added, and
it was back when TRACE_EVENT() was still in major flux. Heck, the
code resided in include/trace/irq.h and not include/trace/events/irq.h.
And yes, a lot of decisions back then were based on handwaving about
performance and impact, and it was not just coming from us.

I admit I should have cleaned it up, but I did not want to touch it
until it actually broke ;-)

-- Steve



^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 14:28                       ` Mathieu Desnoyers
@ 2010-10-19 19:49                         ` Thomas Gleixner
  2010-10-19 20:55                           ` Steven Rostedt
                                             ` (2 more replies)
  0 siblings, 3 replies; 93+ messages in thread
From: Thomas Gleixner @ 2010-10-19 19:49 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Steven Rostedt, Koki Sanagi, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs,
	H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Tue, 19 Oct 2010, Mathieu Desnoyers wrote:
> * Thomas Gleixner (tglx@linutronix.de) wrote:
> > On Tue, 19 Oct 2010, Steven Rostedt wrote:
> as an excuse for adding extra performance impact to kernel code, because when it
> will be replaced by asm gotos, all that will be left is the performance impact
> inappropriately justified as insignificant compared to the impact of the old
> tracepoint scheme.

Can you at one point just stop your tracing lectures and look at the
facts ?

The impact of a sensible tracepoint design on the code in question
before kstat_incr_softirqs_this_cpu() was added would have been a mere
_FIVE_ bytes of text. But the original tracepoint code itself is
_TWENTY_ bytes of text larger.

So we trade horrible code plus 20 bytes text against 5 bytes of text
in the hotpath. And you tell me that these _FIVE_ bytes are impacting
performance so much that it's significant.

Now with kstat_incr_softirqs_this_cpu() the impact is zero, it even
removes code.

And talking about the non-impact of disabled tracepoints: the tracepoint
in question, which made me look at the code, results in deinlining
__raise_softirq_irqsoff() in net/core/dev.c. There goes your theory.

So no, you _cannot_ tell what impact a tracepoint has in reality
except by looking at the assembly output.

And what scares me way more is the size of a single tracepoint in a
code file.

Just adding "trace_softirq_entry(nr);" adds 88 bytes of text. So
that's optimized tracing code ?

All it's supposed to do is:

    if (enabled)
	trace_foo(nr);

Replace "if (enabled)" with your favourite code patching jump label
whatever magic. The above stupid version takes about 28 bytes, but the
"optimized" tracing code makes that 88. Brilliant. That's inlining
utter shite for no good reason. WTF is it necessary to inline all that
gunk ?
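
As a point of comparison, the "stupid version" can be sketched in plain C
as follows. This is a userspace illustration under stated assumptions, not
the kernel's tracepoint implementation: the flag, the counters and the
function names are all hypothetical, and the noinline attribute is a
GCC/clang extension.

```c
#include <stdbool.h>

/* Hypothetical names throughout; this is not the kernel tracepoint API. */
static bool softirq_entry_enabled;	/* the "if (enabled)" flag */
static unsigned int last_traced_nr;	/* last value seen by the probe */
static unsigned int probe_calls;	/* how often the probe actually ran */

/* Out-of-line probe: the expensive part lives here, not at the call site. */
static void __attribute__((noinline)) trace_foo(unsigned int nr)
{
	last_traced_nr = nr;
	probe_calls++;
}

/* Call site: a load, a test and a branch when tracing is disabled. */
static inline void trace_softirq_entry_sketch(unsigned int nr)
{
	if (softirq_entry_enabled)
		trace_foo(nr);
}
```

With this shape, only the flag check and a call are emitted per call site;
whatever the probe does is compiled once in trace_foo().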

Please spare me the "jump label will make this less intrusive"
lecture. I'm not interested at all.

Let's instead look at some more facts:

#include <linux/interrupt.h>
#include <linux/module.h>

#include <trace/events/irq.h>

static struct softirq_action softirq_vec[NR_SOFTIRQS];

void test(struct softirq_action *h)
{
	trace_softirq_entry(h - softirq_vec);

	h->action(h);
}

Compile this code with GCC 4.5 with and without jump labels (zap the
select HAVE_ARCH_JUMP_LABEL line in arch/x86/Kconfig)

So now the !jumplabel case gives us:

../build/kernel/soft.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <test>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	41 55                	push   %r13
   6:	49 89 fd             	mov    %rdi,%r13
   9:	49 81 ed 00 00 00 00 	sub    $0x0,%r13
  10:	41 54                	push   %r12
  12:	49 c1 ed 03          	shr    $0x3,%r13
  16:	49 89 fc             	mov    %rdi,%r12
  19:	53                   	push   %rbx
  1a:	48 83 ec 08          	sub    $0x8,%rsp
  1e:	83 3d 00 00 00 00 00 	cmpl   $0x0,0x0(%rip)        # 25 <test+0x25>
  25:	74 4d                	je     74 <test+0x74>
  27:	65 48 8b 04 25 00 00 	mov    %gs:0x0,%rax
  2e:	00 00 
  30:	ff 80 44 e0 ff ff    	incl   -0x1fbc(%rax)
  36:	48 8b 1d 00 00 00 00 	mov    0x0(%rip),%rbx        # 3d <test+0x3d>
  3d:	48 85 db             	test   %rbx,%rbx
  40:	74 13                	je     55 <test+0x55>
  42:	48 8b 7b 08          	mov    0x8(%rbx),%rdi
  46:	44 89 ee             	mov    %r13d,%esi
  49:	ff 13                	callq  *(%rbx)
  4b:	48 83 c3 10          	add    $0x10,%rbx
  4f:	48 83 3b 00          	cmpq   $0x0,(%rbx)
  53:	eb eb                	jmp    40 <test+0x40>
  55:	65 48 8b 04 25 00 00 	mov    %gs:0x0,%rax
  5c:	00 00 
  5e:	ff 88 44 e0 ff ff    	decl   -0x1fbc(%rax)
  64:	48 8b 80 38 e0 ff ff 	mov    -0x1fc8(%rax),%rax
  6b:	a8 08                	test   $0x8,%al
  6d:	74 05                	je     74 <test+0x74>
  6f:	e8 00 00 00 00       	callq  74 <test+0x74>
  74:	4c 89 e7             	mov    %r12,%rdi
  77:	41 ff 14 24          	callq  *(%r12)
  7b:	58                   	pop    %rax
  7c:	5b                   	pop    %rbx
  7d:	41 5c                	pop    %r12
  7f:	41 5d                	pop    %r13
  81:	c9                   	leaveq 
  82:	c3                   	retq   

The jumplabel=y case gives:

../build/kernel/soft.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <test>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	41 55                	push   %r13
   6:	49 89 fd             	mov    %rdi,%r13
   9:	49 81 ed 00 00 00 00 	sub    $0x0,%r13
  10:	41 54                	push   %r12
  12:	49 c1 ed 03          	shr    $0x3,%r13
  16:	49 89 fc             	mov    %rdi,%r12
  19:	53                   	push   %rbx
  1a:	48 83 ec 08          	sub    $0x8,%rsp
  1e:	e9 00 00 00 00       	jmpq   23 <test+0x23>
  23:	eb 4d                	jmp    72 <test+0x72>
  25:	65 48 8b 04 25 00 00 	mov    %gs:0x0,%rax
  2c:	00 00 
  2e:	ff 80 44 e0 ff ff    	incl   -0x1fbc(%rax)
  34:	48 8b 1d 00 00 00 00 	mov    0x0(%rip),%rbx        # 3b <test+0x3b>
  3b:	48 85 db             	test   %rbx,%rbx
  3e:	74 13                	je     53 <test+0x53>
  40:	48 8b 7b 08          	mov    0x8(%rbx),%rdi
  44:	44 89 ee             	mov    %r13d,%esi
  47:	ff 13                	callq  *(%rbx)
  49:	48 83 c3 10          	add    $0x10,%rbx
  4d:	48 83 3b 00          	cmpq   $0x0,(%rbx)
  51:	eb eb                	jmp    3e <test+0x3e>
  53:	65 48 8b 04 25 00 00 	mov    %gs:0x0,%rax
  5a:	00 00 
  5c:	ff 88 44 e0 ff ff    	decl   -0x1fbc(%rax)
  62:	48 8b 80 38 e0 ff ff 	mov    -0x1fc8(%rax),%rax
  69:	a8 08                	test   $0x8,%al
  6b:	74 05                	je     72 <test+0x72>
  6d:	e8 00 00 00 00       	callq  72 <test+0x72>
  72:	4c 89 e7             	mov    %r12,%rdi
  75:	41 ff 14 24          	callq  *(%r12)
  79:	58                   	pop    %rax
  7a:	5b                   	pop    %rbx
  7b:	41 5c                	pop    %r12
  7d:	41 5d                	pop    %r13
  7f:	c9                   	leaveq 
  80:	c3                   	retq   

So that saves _TWO_ bytes of text and replaces:

-  1e:	83 3d 00 00 00 00 00 	cmpl   $0x0,0x0(%rip)        # 25 <test+0x25>
-  25:	74 4d                	je     74 <test+0x74>
+  1e:	e9 00 00 00 00       	jmpq   23 <test+0x23>
+  23:	eb 4d                	jmp    72 <test+0x72>

So it trades a conditional vs. two jumps ? WTF ??

I thought that jumplabel magic was supposed to get rid of the jump
over the tracing code ? In fact it adds another jump. What for ?

Now even worse, when you NOP out the jmpq then your tracepoint is
still not enabled. Brilliant !

Did you guys ever look at the assembly output of that insane shite you
are advertising with lengthy explanations ? 

Obviously _NOT_

Come back when you can show me a clean implementation of all this crap
which reproduces with my jumplabel enabled stock compiler. And please
just send me a patch w/o the blurb.

And sane looks like:

    jmpq   2f  <---- This gets noped out 
1:
    mov    %r12,%rdi
    callq  *(%r12)
    [whatever cleanup it takes ]
    leaveq 
    retq   

2f:
    [tracing gunk]
    jmp    1b

And further I want to see the tracing gunk in a minimal size so the
net/core/dev.c deinlining does not happen.
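
A rough C-level approximation of that layout, assuming GCC or clang, is to
mark the tracing path as unlikely and move its body into a cold,
non-inlined function. This sketch does no code patching at all (it still
uses an ordinary conditional branch), and every name in it is hypothetical;
it only illustrates keeping the tracing code small and out of the
straight-line path.

```c
/* Hypothetical names; a layout illustration only, not a jump label. */
static int tracing_enabled;
static unsigned int traced;

/* The "tracing gunk": kept out of line and marked cold. */
static void __attribute__((cold, noinline)) tracing_gunk(unsigned int nr)
{
	traced += nr + 1;
}

/* Fast path: the untraced case should run straight through to the call. */
static int run_action(unsigned int nr, int (*action)(unsigned int))
{
	if (__builtin_expect(tracing_enabled, 0))
		tracing_gunk(nr);

	return action(nr);
}

/* Trivial action used to exercise the sketch. */
static int double_it(unsigned int nr)
{
	return (int)(nr * 2);
}
```

Whether the compiler really places the cold call away from the hot path
still has to be checked in the assembly output for the compiler in use.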

Thanks,

	tglx

P.S.: It might be helpful and polite if you'd take off your tracing
      blinkers from time to time.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 19:49                         ` Thomas Gleixner
@ 2010-10-19 20:55                           ` Steven Rostedt
  2010-10-19 21:07                             ` Thomas Gleixner
  2010-10-19 21:45                             ` Thomas Gleixner
  2010-10-19 21:16                           ` David Daney
  2010-10-19 21:28                           ` Jason Baron
  2 siblings, 2 replies; 93+ messages in thread
From: Steven Rostedt @ 2010-10-19 20:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs,
	H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Tue, 2010-10-19 at 21:49 +0200, Thomas Gleixner wrote:

> Let's instead look at some more facts:

Sure, this is where it gets fun :-)

> 
> #include <linux/interrupt.h>
> #include <linux/module.h>
> 
> #include <trace/events/irq.h>
> 
> static struct softirq_action softirq_vec[NR_SOFTIRQS];
> 
> void test(struct softirq_action *h)
> {
> 	trace_softirq_entry(h - softirq_vec);
> 
> 	h->action(h);
> }

Since I don't have your patch yet, I used the original:


void test(struct softirq_action *h)
{
        trace_softirq_entry(h, softirq_vec);
        h->action(h);
}


> Compile this code with GCC 4.5 with and without jump labels (zap the
> select HAVE_ARCH_JUMP_LABEL line in arch/x86/Kconfig)
> 
> So now the !jumplabel case gives us:
> 
> ../build/kernel/soft.o:     file format elf64-x86-64
> 
> Disassembly of section .text:
> 
> 0000000000000000 <test>:
>    0:	55                   	push   %rbp
>    1:	48 89 e5             	mov    %rsp,%rbp
>    4:	41 55                	push   %r13
>    6:	49 89 fd             	mov    %rdi,%r13
>    9:	49 81 ed 00 00 00 00 	sub    $0x0,%r13
>   10:	41 54                	push   %r12
>   12:	49 c1 ed 03          	shr    $0x3,%r13
>   16:	49 89 fc             	mov    %rdi,%r12
>   19:	53                   	push   %rbx
>   1a:	48 83 ec 08          	sub    $0x8,%rsp
>   1e:	83 3d 00 00 00 00 00 	cmpl   $0x0,0x0(%rip)        # 25 <test+0x25>
>   25:	74 4d                	je     74 <test+0x74>
>   27:	65 48 8b 04 25 00 00 	mov    %gs:0x0,%rax
>   2e:	00 00 
>   30:	ff 80 44 e0 ff ff    	incl   -0x1fbc(%rax)
>   36:	48 8b 1d 00 00 00 00 	mov    0x0(%rip),%rbx        # 3d <test+0x3d>
>   3d:	48 85 db             	test   %rbx,%rbx
>   40:	74 13                	je     55 <test+0x55>
>   42:	48 8b 7b 08          	mov    0x8(%rbx),%rdi
>   46:	44 89 ee             	mov    %r13d,%esi
>   49:	ff 13                	callq  *(%rbx)
>   4b:	48 83 c3 10          	add    $0x10,%rbx
>   4f:	48 83 3b 00          	cmpq   $0x0,(%rbx)
>   53:	eb eb                	jmp    40 <test+0x40>
>   55:	65 48 8b 04 25 00 00 	mov    %gs:0x0,%rax
>   5c:	00 00 
>   5e:	ff 88 44 e0 ff ff    	decl   -0x1fbc(%rax)
>   64:	48 8b 80 38 e0 ff ff 	mov    -0x1fc8(%rax),%rax
>   6b:	a8 08                	test   $0x8,%al
>   6d:	74 05                	je     74 <test+0x74>
>   6f:	e8 00 00 00 00       	callq  74 <test+0x74>
>   74:	4c 89 e7             	mov    %r12,%rdi
>   77:	41 ff 14 24          	callq  *(%r12)
>   7b:	58                   	pop    %rax
>   7c:	5b                   	pop    %rbx
>   7d:	41 5c                	pop    %r12
>   7f:	41 5d                	pop    %r13
>   81:	c9                   	leaveq 
>   82:	c3                   	retq   
> 
> The jumplabel=y case gives:
> 
> ../build/kernel/soft.o:     file format elf64-x86-64
> 
> Disassembly of section .text:
> 
> 0000000000000000 <test>:
>    0:	55                   	push   %rbp
>    1:	48 89 e5             	mov    %rsp,%rbp
>    4:	41 55                	push   %r13
>    6:	49 89 fd             	mov    %rdi,%r13
>    9:	49 81 ed 00 00 00 00 	sub    $0x0,%r13
>   10:	41 54                	push   %r12
>   12:	49 c1 ed 03          	shr    $0x3,%r13
>   16:	49 89 fc             	mov    %rdi,%r12
>   19:	53                   	push   %rbx
>   1a:	48 83 ec 08          	sub    $0x8,%rsp
>   1e:	e9 00 00 00 00       	jmpq   23 <test+0x23>
>   23:	eb 4d                	jmp    72 <test+0x72>
>   25:	65 48 8b 04 25 00 00 	mov    %gs:0x0,%rax
>   2c:	00 00 
>   2e:	ff 80 44 e0 ff ff    	incl   -0x1fbc(%rax)
>   34:	48 8b 1d 00 00 00 00 	mov    0x0(%rip),%rbx        # 3b <test+0x3b>
>   3b:	48 85 db             	test   %rbx,%rbx
>   3e:	74 13                	je     53 <test+0x53>
>   40:	48 8b 7b 08          	mov    0x8(%rbx),%rdi
>   44:	44 89 ee             	mov    %r13d,%esi
>   47:	ff 13                	callq  *(%rbx)
>   49:	48 83 c3 10          	add    $0x10,%rbx
>   4d:	48 83 3b 00          	cmpq   $0x0,(%rbx)
>   51:	eb eb                	jmp    3e <test+0x3e>
>   53:	65 48 8b 04 25 00 00 	mov    %gs:0x0,%rax
>   5a:	00 00 
>   5c:	ff 88 44 e0 ff ff    	decl   -0x1fbc(%rax)
>   62:	48 8b 80 38 e0 ff ff 	mov    -0x1fc8(%rax),%rax
>   69:	a8 08                	test   $0x8,%al
>   6b:	74 05                	je     72 <test+0x72>
>   6d:	e8 00 00 00 00       	callq  72 <test+0x72>
>   72:	4c 89 e7             	mov    %r12,%rdi
>   75:	41 ff 14 24          	callq  *(%r12)
>   79:	58                   	pop    %rax
>   7a:	5b                   	pop    %rbx
>   7b:	41 5c                	pop    %r12
>   7d:	41 5d                	pop    %r13
>   7f:	c9                   	leaveq 
>   80:	c3                   	retq   
> 
> So that saves _TWO_ bytes of text and replaces:
> 
> -  1e:	83 3d 00 00 00 00 00 	cmpl   $0x0,0x0(%rip)        # 25 <test+0x25>
> -  25:	74 4d                	je     74 <test+0x74>
> +  1e:	e9 00 00 00 00       	jmpq   23 <test+0x23>
> +  23:	eb 4d                	jmp    72 <test+0x72>
> 
> So it trades a conditional vs. two jumps ? WTF ??

Well, the one jmpq is noped out, and the jmp is non conditional. I've
always thought a non conditional jmp was faster than a conditional one,
since there's no need to go into the branch prediction logic. The CPU
can simply skip ahead to the jump target. Of course, this pollutes the
I$.


> 
> I thought that jumplabel magic was supposed to get rid of the jump
> over the tracing code ? In fact it adds another jump. What for ?

Because you do the h - softirq_vec computation in the tracepoint parameter? I got a
different result:

Here's the diff. I did a cut -c10- to get rid of the line numbers so I
have a better diff. There are still differences due to jump locations, but
those are easy to figure out:

I diffed nojump vs jump. The '-' is with nojump, the '+' is with jumps.

--- /tmp/s2	2010-10-19 16:40:19.000000000 -0400
+++ /tmp/s1	2010-10-19 16:40:23.000000000 -0400
@@ -1,38 +1,33 @@
-00026f0 <test>:
+00027a0 <test>:
 	55                   	push   %rbp
 	48 89 e5             	mov    %rsp,%rbp
-	48 83 ec 10          	sub    $0x10,%rsp
-	48 89 1c 24          	mov    %rbx,(%rsp)
-	4c 89 64 24 08       	mov    %r12,0x8(%rsp)
-	e8 00 00 00 00       	callq  2706 <test+0x16>
+	41 54                	push   %r12
+	53                   	push   %rbx
+	e8 00 00 00 00       	callq  27ac <test+0xc>
 R_X86_64_PC32	mcount-0x4
-	8b 15 00 00 00 00    	mov    0x0(%rip),%edx        # 270c <test+0x1c>
-R_X86_64_PC32	__tracepoint_softirq_entry+0x4
 	48 89 fb             	mov    %rdi,%rbx

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
-	85 d2                	test   %edx,%edx
-	75 10                	jne    2723 <test+0x33>
+	e9 00 00 00 00       	jmpq   27b4 <test+0x14>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There's the difference with this code. We replaced a test and jump
conditional with a single jump that will later be nop'd out.


 	48 89 df             	mov    %rbx,%rdi
 	ff 13                	callq  *(%rbx)
-	48 8b 1c 24          	mov    (%rsp),%rbx
-	4c 8b 64 24 08       	mov    0x8(%rsp),%r12
+	5b                   	pop    %rbx
+	41 5c                	pop    %r12
 	c9                   	leaveq 
 	c3                   	retq

^^^^^^^^^^^^^^^^^^^
end of the fast path, below is the code that does the tracepoint.



   
+	66 90                	xchg   %ax,%ax
 	65 48 8b 04 25 00 00 	mov    %gs:0x0,%rax
 	00 00 
 R_X86_64_32S	kernel_stack
 	83 80 44 e0 ff ff 01 	addl   $0x1,-0x1fbc(%rax)
-	e8 00 00 00 00       	callq  2738 <test+0x48>
+	e8 00 00 00 00       	callq  27d5 <test+0x35>
 R_X86_64_PC32	debug_lockdep_rcu_enabled-0x4
 	85 c0                	test   %eax,%eax
-	74 09                	je     2745 <test+0x55>
-	80 3d 00 00 00 00 00 	cmpb   $0x0,0x0(%rip)        # 2743 <test+0x53>
-R_X86_64_PC32	.bss-0x1
-	74 53                	je     2798 <test+0xa8>
-	4c 8b 25 00 00 00 00 	mov    0x0(%rip),%r12        # 274c <test+0x5c>
+	75 57                	jne    2830 <test+0x90>
+	4c 8b 25 00 00 00 00 	mov    0x0(%rip),%r12        # 27e0 <test+0x40>
 R_X86_64_PC32	__tracepoint_softirq_entry+0x1c
 	4d 85 e4             	test   %r12,%r12
-	74 22                	je     2773 <test+0x83>
+	74 29                	je     280e <test+0x6e>
 	49 8b 04 24          	mov    (%r12),%rax
+	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
 	49 8b 7c 24 08       	mov    0x8(%r12),%rdi
 	49 83 c4 10          	add    $0x10,%r12
 	48 c7 c2 00 00 00 00 	mov    $0x0,%rdx
@@ -41,49 +36,52 @@
 	ff d0                	callq  *%rax
 	49 8b 04 24          	mov    (%r12),%rax
 	48 85 c0             	test   %rax,%rax
-	75 e2                	jne    2755 <test+0x65>
+	75 e2                	jne    27f0 <test+0x50>
 	65 48 8b 04 25 00 00 	mov    %gs:0x0,%rax
 	00 00 
 R_X86_64_32S	kernel_stack
 	83 a8 44 e0 ff ff 01 	subl   $0x1,-0x1fbc(%rax)
 	48 8b 80 38 e0 ff ff 	mov    -0x1fc8(%rax),%rax
 	a8 08                	test   $0x8,%al
-	74 85                	je     2713 <test+0x23>
-	e8 00 00 00 00       	callq  2793 <test+0xa3>
+	74 8b                	je     27b4 <test+0x14>
+	e8 00 00 00 00       	callq  282e <test+0x8e>
 R_X86_64_PC32	preempt_schedule-0x4
-	e9 7b ff ff ff       	jmpq   2713 <test+0x23>
-	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
-	00 
-	e8 00 00 00 00       	callq  27a5 <test+0xb5>
+	eb 84                	jmp    27b4 <test+0x14>
+	80 3d 00 00 00 00 00 	cmpb   $0x0,0x0(%rip)        # 2837 <test+0x97>
+R_X86_64_PC32	.bss-0x1
+	75 a0                	jne    27d9 <test+0x39>
+	e8 00 00 00 00       	callq  283e <test+0x9e>
 R_X86_64_PC32	debug_lockdep_rcu_enabled-0x4
 	85 c0                	test   %eax,%eax
-	74 9c                	je     2745 <test+0x55>
-	83 3d 00 00 00 00 00 	cmpl   $0x0,0x0(%rip)        # 27b0 <test+0xc0>
-R_X86_64_PC32	debug_locks-0x5
-	75 3f                	jne    27f1 <test+0x101>
+	74 97                	je     27d9 <test+0x39>
+	8b 35 00 00 00 00    	mov    0x0(%rip),%esi        # 2848 <test+0xa8>
+R_X86_64_PC32	debug_locks-0x4
+	85 f6                	test   %esi,%esi
+	75 44                	jne    2890 <test+0xf0>
 	65 48 8b 04 25 00 00 	mov    %gs:0x0,%rax
 	00 00 
 R_X86_64_32S	kernel_stack
-	83 b8 44 e0 ff ff 00 	cmpl   $0x0,-0x1fbc(%rax)
-	75 81                	jne    2745 <test+0x55>
+	8b 88 44 e0 ff ff    	mov    -0x1fbc(%rax),%ecx
+	85 c9                	test   %ecx,%ecx
+	0f 85 76 ff ff ff    	jne    27d9 <test+0x39>
 	ff 14 25 00 00 00 00 	callq  *0x0
 R_X86_64_32S	pv_irq_ops
 	f6 c4 02             	test   $0x2,%ah
-	0f 84 71 ff ff ff    	je     2745 <test+0x55>
+	0f 84 66 ff ff ff    	je     27d9 <test+0x39>
 	be 7c 00 00 00       	mov    $0x7c,%esi
 	48 c7 c7 00 00 00 00 	mov    $0x0,%rdi
 R_X86_64_32S	.rodata.str1.1
-	c6 05 00 00 00 00 01 	movb   $0x1,0x0(%rip)        # 27e7 <test+0xf7>
+	c6 05 00 00 00 00 01 	movb   $0x1,0x0(%rip)        # 2886 <test+0xe6>
 R_X86_64_PC32	.bss-0x1
-	e8 00 00 00 00       	callq  27ec <test+0xfc>
+	e8 00 00 00 00       	callq  288b <test+0xeb>
 R_X86_64_PC32	lockdep_rcu_dereference-0x4
-	e9 54 ff ff ff       	jmpq   2745 <test+0x55>
+	e9 49 ff ff ff       	jmpq   27d9 <test+0x39>
 	48 c7 c7 00 00 00 00 	mov    $0x0,%rdi
 R_X86_64_32S	rcu_sched_lock_map
-	e8 00 00 00 00       	callq  27fd <test+0x10d>
+	e8 00 00 00 00       	callq  289c <test+0xfc>
 R_X86_64_PC32	lock_is_held-0x4
 	85 c0                	test   %eax,%eax
-	0f 85 40 ff ff ff    	jne    2745 <test+0x55>
-	eb ab                	jmp    27b2 <test+0xc2>
-	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
-	00 00 
+	0f 85 35 ff ff ff    	jne    27d9 <test+0x39>
+	eb a6                	jmp    284c <test+0xac>
+	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
+	00 00 00 


> 
> Now even worse, when you NOP out the jmpq then your tracepoint is
> still not enabled. Brilliant !
> 
> Did you guys ever look at the assembly output of that insane shite you
> are advertising with lengthy explanations ? 
> 
> Obviously _NOT_

Perhaps so, but as Peter Zijlstra has said, compiling with gcc is a
random number generator. Your mileage may vary.

> 
> Come back when you can show me a clean implementation of all this crap
> which reproduces with my jumplabel enabled stock compiler. And please
> just send me a patch w/o the blurb.
> 
> And sane looks like:
> 
>     jmpq   2f  <---- This gets noped out 
> 1:
>     mov    %r12,%rdi
>     callq  *(%r12)
>     [whatever cleanup it takes ]
>     leaveq 
>     retq   
> 
> 2f:
>     [tracing gunk]
>     jmp    1b

The above looks like what I have.

-- Steve


> 
> And further I want to see the tracing gunk in a minimal size so the
> net/core/dev.c deinlining does not happen.
> 
> Thanks,
> 
> 	tglx
> 
> P.S.: It might be helpful and polite if you'd take off your tracing
>       blinkers from time to time.



^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 20:55                           ` Steven Rostedt
@ 2010-10-19 21:07                             ` Thomas Gleixner
  2010-10-19 21:23                               ` Steven Rostedt
  2010-10-19 21:45                             ` Thomas Gleixner
  1 sibling, 1 reply; 93+ messages in thread
From: Thomas Gleixner @ 2010-10-19 21:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs,
	H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Tue, 19 Oct 2010, Steven Rostedt wrote:
> On Tue, 2010-10-19 at 21:49 +0200, Thomas Gleixner wrote:
> > So that saves _TWO_ bytes of text and replaces:
> > 
> > -  1e:	83 3d 00 00 00 00 00 	cmpl   $0x0,0x0(%rip)        # 25 <test+0x25>
> > -  25:	74 4d                	je     74 <test+0x74>
> > +  1e:	e9 00 00 00 00       	jmpq   23 <test+0x23>
> > +  23:	eb 4d                	jmp    72 <test+0x72>
> > 
> > So it trades a conditional vs. two jumps ? WTF ??
> 
> Well, the one jmpq is noped out, and the jmp is non conditional. I've

What are you smoking ?

In case the trace point is enabled the jmpq is there, so it jumps to
23 and jumps from there to 72.

In case the trace point is disabled the jmpq is noped out, so it jumps
to 72 directly.

> always thought a non conditional jmp was faster than a conditional one,

I always thought, that at least some of the stuff which comes from
tracing folks makes some sense.

> since there's no need to go into the branch prediction logic. The CPU
> can simply skip to the code to jump next. Of course, this pollutes the
> I$.

We might consult Mathieu for further useless blurb on how CPUs work
around broken code.

Thanks,

	tglx


* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 19:49                         ` Thomas Gleixner
  2010-10-19 20:55                           ` Steven Rostedt
@ 2010-10-19 21:16                           ` David Daney
  2010-10-19 21:32                             ` Jason Baron
  2010-10-19 21:47                             ` Steven Rostedt
  2010-10-19 21:28                           ` Jason Baron
  2 siblings, 2 replies; 93+ messages in thread
From: David Daney @ 2010-10-19 21:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathieu Desnoyers, Steven Rostedt, Koki Sanagi, Peter Zijlstra,
	Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan,
	laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On 10/19/2010 12:49 PM, Thomas Gleixner wrote:
[...]
> So that saves _TWO_ bytes of text and replaces:
>
> -  1e:	83 3d 00 00 00 00 00 	cmpl   $0x0,0x0(%rip)        # 25 <test+0x25>
> -  25:	74 4d                	je     74 <test+0x74>
> +  1e:	e9 00 00 00 00       	jmpq   23 <test+0x23>
> +  23:	eb 4d                	jmp    72 <test+0x72>
>
> So it trades a conditional vs. two jumps ? WTF ??
>
> I thought that jumplabel magic was supposed to get rid of the jump
> over the tracing code ? In fact it adds another jump. Whatfor ?

The 'asm goto' construct in GCC-4.5 is deficient in this area.

GCC assumes that all exit paths from an 'asm goto' are equally likely, 
so the tracing (or dynamic printk etc.) code is assumed to be hot and is 
emitted inline.  Since they are inline like this, there are all these 
jumps around them and they pollute the I-Cache.

I was looking at fixing it, but I think a true general purpose fix would
require enhancing GCC's grammar to allow specifying the 'likeliness'
of each exit path from 'asm goto'.

David Daney

>
> Now even worse, when you NOP out the jmpq then your tracepoint is
> still not enabled. Brilliant !
>
> Did you guys ever look at the assembly output of that insane shite you
> are advertising with lengthy explanations ?
>
> Obviously _NOT_
>
> Come back when you can show me a clean implementation of all this crap
> which reproduces with my jumplabel enabled stock compiler. And please
> just send me a patch w/o the blurb.
>
> And sane looks like:
>
>      jmpq   2f  <---- This gets noped out
> 1:
>      mov    %r12,%rdi
>      callq  *(%r12)
>      [whatever cleanup it takes ]
>      leaveq
>      retq
>
> 2:
>      [tracing gunk]
>      jmp    1b
>
> And further I want to see the tracing gunk in a minimal size so the
> net/core/dev.c deinlining does not happen.
>
> Thanks,
>
> 	tglx
>
> P.S.: It might be helpful and polite if you'd take off your tracing
>        blinkers from time to time.



* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 21:07                             ` Thomas Gleixner
@ 2010-10-19 21:23                               ` Steven Rostedt
  2010-10-19 21:48                                 ` H. Peter Anvin
  2010-10-19 22:04                                 ` Thomas Gleixner
  0 siblings, 2 replies; 93+ messages in thread
From: Steven Rostedt @ 2010-10-19 21:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs,
	H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Tue, 2010-10-19 at 23:07 +0200, Thomas Gleixner wrote:
> On Tue, 19 Oct 2010, Steven Rostedt wrote:
> > On Tue, 2010-10-19 at 21:49 +0200, Thomas Gleixner wrote:
> > > So that saves _TWO_ bytes of text and replaces:
> > > 
> > > -  1e:	83 3d 00 00 00 00 00 	cmpl   $0x0,0x0(%rip)        # 25 <test+0x25>
> > > -  25:	74 4d                	je     74 <test+0x74>
> > > +  1e:	e9 00 00 00 00       	jmpq   23 <test+0x23>
> > > +  23:	eb 4d                	jmp    72 <test+0x72>
> > > 
> > > So it trades a conditional vs. two jumps ? WTF ??
> > 
> > Well, the one jmpq is noped out, and the jmp is non conditional. I've
> 
> What are you smoking ?

What? Are you saying that conditional jumps are just as fast as non
conditional ones?

> 
> In case the trace point is enabled the jmpq is there, so it jumps to
> 23 and jumps from there to 72.

No, when we dynamically enable the tracepoint, it will jump to 25, not
23. That's what the goto part is about. We add the do_trace label to the
table, and we make it point to that location. If we did it as you say,
then tracepoints would never be enabled.
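
The targets in the disassembly can be checked by hand: an x86 'jmp rel32'
(opcode e9) is 5 bytes long and its displacement is relative to the address
of the next instruction. A small sketch of that arithmetic, where
encode_jmp_rel32() is a made-up helper name and not kernel code:

```c
#include <stdint.h>

/*
 * Encode "jmp rel32" (opcode 0xe9) placed at address `site` so that it
 * lands on `target`.  The displacement counts from the end of the 5-byte
 * instruction and is stored little-endian.
 */
void encode_jmp_rel32(uint64_t site, uint64_t target, uint8_t out[5])
{
	uint32_t rel = (uint32_t)(target - (site + 5));

	out[0] = 0xe9;
	out[1] = (uint8_t)(rel & 0xff);
	out[2] = (uint8_t)((rel >> 8) & 0xff);
	out[3] = (uint8_t)((rel >> 16) & 0xff);
	out[4] = (uint8_t)((rel >> 24) & 0xff);
}
```

With site 0x1e and target 0x23 this gives e9 00 00 00 00, the unpatched
instruction in the listing. With target 0x25 it gives e9 02 00 00 00, a jump
over the 2-byte 'jmp 72' and into the tracepoint body.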

This is not unlike what we do with the function tracer. The original
code points to mcount which simply is:

	mcount:
		retq

And when we enable the callers, we have it jump to a different function.

> 
> In case the trace point is disabled the jmpq is noped out, so it jumps
> to 72 directly.

That is correct.

> 
> > always thought a non conditional jmp was faster than a conditional one,
> 
> I always thought, that at least some of the stuff which comes from
> tracing folks makes some sense.

Is it still not making sense?

> 
> > since there's no need to go into the branch prediction logic. The CPU
> > > can simply skip to the code to jump next. Of course, this pollutes the
> > I$.
> 
> We might consult Mathieu for further useless blurb on how CPUs work
> around broken code.

The code worked fine before, it just was not very pretty.

But it seems that gcc inlined the code in the wrong spot for you.
Perhaps it's not a good idea to have something like h - softirq_vec
as a tracepoint parameter. I'm not saying that your change is not
worth it. It is, because h - softirq_vec is now used by others too.


-- Steve




* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 19:49                         ` Thomas Gleixner
  2010-10-19 20:55                           ` Steven Rostedt
  2010-10-19 21:16                           ` David Daney
@ 2010-10-19 21:28                           ` Jason Baron
  2010-10-19 21:55                             ` Thomas Gleixner
  2 siblings, 1 reply; 93+ messages in thread
From: Jason Baron @ 2010-10-19 21:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathieu Desnoyers, Steven Rostedt, Koki Sanagi, Peter Zijlstra,
	Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan,
	laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Tue, Oct 19, 2010 at 09:49:45PM +0200, Thomas Gleixner wrote:
> > > On Tue, 19 Oct 2010, Steven Rostedt wrote:
> > as an excuse for adding extra performance impact to kernel code, because when it
> > will be replaced by asm gotos, all that will be left is the performance impact
> > inappropriately justified as insignificant compared to the impact of the old
> > tracepoint scheme.
> 
> Can you at one point just stop your tracing lectures and look at the
> facts ?
> 
> The impact of a sensible tracepoint design on the code in question
> before kstat_incr_softirqs_this_cpu() was added would have been a mere
> _FIVE_ bytes of text. But the original tracepoint code itself is
> _TWENTY_ bytes of text larger.
> 
> So we trade horrible code plus 20 bytes text against 5 bytes of text
> in the hotpath. And you tell me that these _FIVE_ bytes are impacting
> performance so much that it's significant.
> 
> Now with kstat_incr_softirqs_this_cpu() the impact is zero, it even
> removes code.
> 
> And talking about non impact of disabled trace points. The tracepoint
> in question which made me look at the code results in deinlining
> __raise_softirq_irqsoff() in net/core/dev.c. There goes your theory.
> 
> So no, you _cannot_ tell what impact a tracepoint has in reality
> except by looking at the assembly output.
> 
> And what scares me way more is the size of a single tracepoint in a
> code file.
> 
> Just adding "trace_softirq_entry(nr);" adds 88 bytes of text. So
> that's optimized tracing code ?
> 
> All it's supposed to do is:
> 
>     if (enabled)
> 	trace_foo(nr);
> 
> Replace "if (enabled)" with your favourite code patching jump label
> whatever magic. The above stupid version takes about 28, but the
> "optimized" tracing code makes that 88. Brilliant. That's inlining
> utter shite for no good reason. WTF is it necessary to inline all that
> gunk ?
> 
> Please spare me the "jump label will make this less intrusive"
> lecture. I'm not interested at all.
> 
> Let's instead look at some more facts:
> 
> #include <linux/interrupt.h>
> #include <linux/module.h>
> 
> #include <trace/events/irq.h>
> 
> static struct softirq_action softirq_vec[NR_SOFTIRQS];
> 
> void test(struct softirq_action *h)
> {
> 	trace_softirq_entry(h - softirq_vec);
> 
> 	h->action(h);
> }
> 
> Compile this code with GCC 4.5 with and without jump labels (zap the
> select HAVE_ARCH_JUMP_LABEL line in arch/x86/Kconfig)
> 
> So now the !jumplabel case gives us:
> 
> ../build/kernel/soft.o:     file format elf64-x86-64
> 
> Disassembly of section .text:
> 
> 0000000000000000 <test>:
>    0:	55                   	push   %rbp
>    1:	48 89 e5             	mov    %rsp,%rbp
>    4:	41 55                	push   %r13
>    6:	49 89 fd             	mov    %rdi,%r13
>    9:	49 81 ed 00 00 00 00 	sub    $0x0,%r13
>   10:	41 54                	push   %r12
>   12:	49 c1 ed 03          	shr    $0x3,%r13
>   16:	49 89 fc             	mov    %rdi,%r12
>   19:	53                   	push   %rbx
>   1a:	48 83 ec 08          	sub    $0x8,%rsp
>   1e:	83 3d 00 00 00 00 00 	cmpl   $0x0,0x0(%rip)        # 25 <test+0x25>
>   25:	74 4d                	je     74 <test+0x74>
>   27:	65 48 8b 04 25 00 00 	mov    %gs:0x0,%rax
>   2e:	00 00 
>   30:	ff 80 44 e0 ff ff    	incl   -0x1fbc(%rax)
>   36:	48 8b 1d 00 00 00 00 	mov    0x0(%rip),%rbx        # 3d <test+0x3d>
>   3d:	48 85 db             	test   %rbx,%rbx
>   40:	74 13                	je     55 <test+0x55>
>   42:	48 8b 7b 08          	mov    0x8(%rbx),%rdi
>   46:	44 89 ee             	mov    %r13d,%esi
>   49:	ff 13                	callq  *(%rbx)
>   4b:	48 83 c3 10          	add    $0x10,%rbx
>   4f:	48 83 3b 00          	cmpq   $0x0,(%rbx)
>   53:	eb eb                	jmp    40 <test+0x40>
>   55:	65 48 8b 04 25 00 00 	mov    %gs:0x0,%rax
>   5c:	00 00 
>   5e:	ff 88 44 e0 ff ff    	decl   -0x1fbc(%rax)
>   64:	48 8b 80 38 e0 ff ff 	mov    -0x1fc8(%rax),%rax
>   6b:	a8 08                	test   $0x8,%al
>   6d:	74 05                	je     74 <test+0x74>
>   6f:	e8 00 00 00 00       	callq  74 <test+0x74>
>   74:	4c 89 e7             	mov    %r12,%rdi
>   77:	41 ff 14 24          	callq  *(%r12)
>   7b:	58                   	pop    %rax
>   7c:	5b                   	pop    %rbx
>   7d:	41 5c                	pop    %r12
>   7f:	41 5d                	pop    %r13
>   81:	c9                   	leaveq 
>   82:	c3                   	retq   
> 
> The jumplabel=y case gives:
> 
> ../build/kernel/soft.o:     file format elf64-x86-64
> 
> Disassembly of section .text:
> 
> 0000000000000000 <test>:
>    0:	55                   	push   %rbp
>    1:	48 89 e5             	mov    %rsp,%rbp
>    4:	41 55                	push   %r13
>    6:	49 89 fd             	mov    %rdi,%r13
>    9:	49 81 ed 00 00 00 00 	sub    $0x0,%r13
>   10:	41 54                	push   %r12
>   12:	49 c1 ed 03          	shr    $0x3,%r13
>   16:	49 89 fc             	mov    %rdi,%r12
>   19:	53                   	push   %rbx
>   1a:	48 83 ec 08          	sub    $0x8,%rsp
>   1e:	e9 00 00 00 00       	jmpq   23 <test+0x23>
>   23:	eb 4d                	jmp    72 <test+0x72>
>   25:	65 48 8b 04 25 00 00 	mov    %gs:0x0,%rax
>   2c:	00 00 
>   2e:	ff 80 44 e0 ff ff    	incl   -0x1fbc(%rax)
>   34:	48 8b 1d 00 00 00 00 	mov    0x0(%rip),%rbx        # 3b <test+0x3b>
>   3b:	48 85 db             	test   %rbx,%rbx
>   3e:	74 13                	je     53 <test+0x53>
>   40:	48 8b 7b 08          	mov    0x8(%rbx),%rdi
>   44:	44 89 ee             	mov    %r13d,%esi
>   47:	ff 13                	callq  *(%rbx)
>   49:	48 83 c3 10          	add    $0x10,%rbx
>   4d:	48 83 3b 00          	cmpq   $0x0,(%rbx)
>   51:	eb eb                	jmp    3e <test+0x3e>
>   53:	65 48 8b 04 25 00 00 	mov    %gs:0x0,%rax
>   5a:	00 00 
>   5c:	ff 88 44 e0 ff ff    	decl   -0x1fbc(%rax)
>   62:	48 8b 80 38 e0 ff ff 	mov    -0x1fc8(%rax),%rax
>   69:	a8 08                	test   $0x8,%al
>   6b:	74 05                	je     72 <test+0x72>
>   6d:	e8 00 00 00 00       	callq  72 <test+0x72>
>   72:	4c 89 e7             	mov    %r12,%rdi
>   75:	41 ff 14 24          	callq  *(%r12)
>   79:	58                   	pop    %rax
>   7a:	5b                   	pop    %rbx
>   7b:	41 5c                	pop    %r12
>   7d:	41 5d                	pop    %r13
>   7f:	c9                   	leaveq 
>   80:	c3                   	retq   
> 
> So that saves _TWO_ bytes of text and replaces:
> 
> -  1e:	83 3d 00 00 00 00 00 	cmpl   $0x0,0x0(%rip)        # 25 <test+0x25>
> -  25:	74 4d                	je     74 <test+0x74>
> +  1e:	e9 00 00 00 00       	jmpq   23 <test+0x23>
> +  23:	eb 4d                	jmp    72 <test+0x72>
> 
> So it trades a conditional vs. two jumps ? WTF ??
> 

Right, so on x86 the 'jmpq' gets patched at boot with a 5-byte no-op
sequence. So in the disabled case we have a no-op followed by a jump
around the disabled code.

> I thought that jumplabel magic was supposed to get rid of the jump
> over the tracing code ? In fact it adds another jump. Whatfor ?
> 

yes, that is the plan. gcc does not yet support hot/cold labels. Once
it does, the second jump will go away and the entire tracepoint code will
be moved to a 'cold' section. It's not completely optimal yet, but
we are getting there.

> Now even worse, when you NOP out the jmpq then your tracepoint is
> still not enabled. Brilliant !
> 

The 'jmpq' in the enabled case is patched with a jmpq to the body of the
tracepoint itself.

> Did you guys ever look at the assembly output of that insane shite you
> are advertising with lengthy explanations ? 
> 
> Obviously _NOT_
> 
> Come back when you can show me a clean implementation of all this crap
> which reproduces with my jumplabel enabled stock compiler. And please
> just send me a patch w/o the blurb.
> 
> And sane looks like:
> 
>     jmpq   2f  <---- This gets noped out 
> 1:
>     mov    %r12,%rdi
>     callq  *(%r12)
>     [whatever cleanup it takes ]
>     leaveq 
>     retq   
> 
> 2:
>     [tracing gunk]
>     jmp    1b
> 

yes, this is what the code should look like when we get support for
hot/cold labels. I've discussed this support with the gcc folks, and it's the
next step here. So yes, this is exactly where we are headed.

thanks,

-Jason


* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 21:16                           ` David Daney
@ 2010-10-19 21:32                             ` Jason Baron
  2010-10-19 21:38                               ` David Daney
  2010-10-19 21:47                             ` Steven Rostedt
  1 sibling, 1 reply; 93+ messages in thread
From: Jason Baron @ 2010-10-19 21:32 UTC (permalink / raw)
  To: David Daney
  Cc: Thomas Gleixner, Mathieu Desnoyers, Steven Rostedt, Koki Sanagi,
	Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman,
	scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet,
	kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro,
	Heiko Carstens, Luck, Tony

On Tue, Oct 19, 2010 at 02:16:54PM -0700, David Daney wrote:
> On 10/19/2010 12:49 PM, Thomas Gleixner wrote:
> [...]
> >So that saves _TWO_ bytes of text and replaces:
> >
> >-  1e:	83 3d 00 00 00 00 00 	cmpl   $0x0,0x0(%rip)        # 25 <test+0x25>
> >-  25:	74 4d                	je     74 <test+0x74>
> >+  1e:	e9 00 00 00 00       	jmpq   23 <test+0x23>
> >+  23:	eb 4d                	jmp    72 <test+0x72>
> >
> >So it trades a conditional vs. two jumps ? WTF ??
> >
> >I thought that jumplabel magic was supposed to get rid of the jump
> >over the tracing code ? In fact it adds another jump. Whatfor ?
> 
> The 'asm goto' construct in GCC-4.5 is deficient in this area.
> 
> GCC assumes that all exit paths from an 'asm goto' are equally
> likely, so the tracing (or dynamic printk etc.) code is assumed to
> be hot and is emitted inline.  Since they are inline like this,
> there are all these jumps around them and they pollute the I-Cache.
> 
> I was looking at fixing it, but I think a true general purpose fix
> would require enhancing GCC's grammar to allow specifying of the
> 'likelyness' of each exit path from 'asm goto'.
> 
> David Daney
> 

right, the next step is adding support for hot/cold labels, so the
tracing code will be annotated with a 'cold' label. That avoids the
'jmp' at offset 23 above and in fact moves the tracing code
out-of-line. Maybe I haven't been clear on this.

thanks,

-Jason


* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 21:32                             ` Jason Baron
@ 2010-10-19 21:38                               ` David Daney
  0 siblings, 0 replies; 93+ messages in thread
From: David Daney @ 2010-10-19 21:38 UTC (permalink / raw)
  To: Jason Baron
  Cc: Thomas Gleixner, Mathieu Desnoyers, Steven Rostedt, Koki Sanagi,
	Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman,
	scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet,
	kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro,
	Heiko Carstens, Luck, Tony

On 10/19/2010 02:32 PM, Jason Baron wrote:
> On Tue, Oct 19, 2010 at 02:16:54PM -0700, David Daney wrote:
>> On 10/19/2010 12:49 PM, Thomas Gleixner wrote:
>> [...]
>>> So that saves _TWO_ bytes of text and replaces:
>>>
>>> -  1e:	83 3d 00 00 00 00 00 	cmpl   $0x0,0x0(%rip)        # 25 <test+0x25>
>>> -  25:	74 4d                	je     74 <test+0x74>
>>> +  1e:	e9 00 00 00 00       	jmpq   23 <test+0x23>
>>> +  23:	eb 4d                	jmp    72 <test+0x72>
>>>
>>> So it trades a conditional vs. two jumps ? WTF ??
>>>
>>> I thought that jumplabel magic was supposed to get rid of the jump
>>> over the tracing code ? In fact it adds another jump. Whatfor ?
>>
>> The 'asm goto' construct in GCC-4.5 is deficient in this area.
>>
>> GCC assumes that all exit paths from an 'asm goto' are equally
>> likely, so the tracing (or dynamic printk etc.) code is assumed to
>> be hot and is emitted inline.  Since they are inline like this,
>> there are all these jumps around them and they pollute the I-Cache.
>>
>> I was looking at fixing it, but I think a true general purpose fix
>> would require enhancing GCC's grammar to allow specifying of the
>> 'likelyness' of each exit path from 'asm goto'.
>>
>> David Daney
>>
>
> right, the next step is adding support for hot/cold labels, so the
> tracing code will be annotated with a 'cold' label. That avoids the
> 'jmp' at offset 23 above and in fact moves the tracing code
> out-of-line. Maybe I haven't been clear on this.
>

Ok, so is anybody working on doing that?  GCC-4.6 stage 1 (the time when 
a change like this could be merged) closes in 8 days.

It is unfortunate that we have this shiny new feature that can't really 
be used because the infrastructure is only half baked.

David Daney


* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 20:55                           ` Steven Rostedt
  2010-10-19 21:07                             ` Thomas Gleixner
@ 2010-10-19 21:45                             ` Thomas Gleixner
  2010-10-19 22:14                               ` Steven Rostedt
  1 sibling, 1 reply; 93+ messages in thread
From: Thomas Gleixner @ 2010-10-19 21:45 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs,
	H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Tue, 19 Oct 2010, Steven Rostedt wrote:
> On Tue, 2010-10-19 at 21:49 +0200, Thomas Gleixner wrote:
> 
> Because you do the h - softvec in the tracepoint parameter? I got a
> different result:

I guess some serious whacking is due.

The compiler adds two jumps when the parameter changes due to
 (h - softvec) instead of (h, softvec) ????

Dude, you can't be serious.

If you had asked about the compiler version I'm using and told
me about the compiler version you are using, then I could take that
answer somewhat seriously.

It still would miss the "Uhhhh, your compiler creates crap code"
alert, because that double jump is seriously broken and braindead.

And I tell you more about this. You are going to piss off a lot of
users of distro compilers because they will set CC_HAVE_ASM_GOTO
happily and create the code I posted. Which will break the tracer no
matter what.

So you tracer maniacs happily played with some experimental compiler
stuff w/o even testing your crap against something which ships with
distros or is the reference 4.5 compiler on kernel.org ?

I prefer you sending a patch to disable this, until it's sorted out,
unless you want me to add some really outrageous changelog to the
patch I'm going to put into tip tomorrow night, ok ?

Thanks,

	tglx


* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 21:16                           ` David Daney
  2010-10-19 21:32                             ` Jason Baron
@ 2010-10-19 21:47                             ` Steven Rostedt
  1 sibling, 0 replies; 93+ messages in thread
From: Steven Rostedt @ 2010-10-19 21:47 UTC (permalink / raw)
  To: David Daney
  Cc: Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra,
	Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan,
	laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Tue, 2010-10-19 at 14:16 -0700, David Daney wrote:
> On 10/19/2010 12:49 PM, Thomas Gleixner wrote:
> [...]
> > So that saves _TWO_ bytes of text and replaces:
> >
> > -  1e:	83 3d 00 00 00 00 00 	cmpl   $0x0,0x0(%rip)        # 25 <test+0x25>
> > -  25:	74 4d                	je     74 <test+0x74>
> > +  1e:	e9 00 00 00 00       	jmpq   23 <test+0x23>
> > +  23:	eb 4d                	jmp    72 <test+0x72>
> >
> > So it trades a conditional vs. two jumps ? WTF ??
> >
> > I thought that jumplabel magic was supposed to get rid of the jump
> > over the tracing code ? In fact it adds another jump. Whatfor ?
> 
> The 'asm goto' construct in GCC-4.5 is deficient in this area.
> 
> GCC assumes that all exit paths from an 'asm goto' are equally likely, 
> so the tracing (or dynamic printk etc.) code is assumed to be hot and is 
> emitted inline.  Since they are inline like this, there are all these 
> jumps around them and they pollute the I-Cache.
> 

Interesting. I thought the driving force for asm goto was for
tracepoints, as the documentation seems to reference them. One would
think that the default would have been to make it the unlikely case, as
it may be the only user of that code so far.

> I was looking at fixing it, but I think a true general purpose fix would 
> require enhancing GCC's grammar to allow specifying of the 'likelyness' 
> of each exit path from 'asm goto'.

That would be nice.

Thanks,

-- Steve




* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 21:23                               ` Steven Rostedt
@ 2010-10-19 21:48                                 ` H. Peter Anvin
  2010-10-19 22:23                                   ` Steven Rostedt
  2010-10-19 22:41                                   ` Mathieu Desnoyers
  2010-10-19 22:04                                 ` Thomas Gleixner
  1 sibling, 2 replies; 93+ messages in thread
From: H. Peter Anvin @ 2010-10-19 21:48 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra,
	Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan,
	laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller,
	izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony

On 10/19/2010 02:23 PM, Steven Rostedt wrote:
> 
> But it seemed that gcc for you inlined the code in the wrong spot.
> Perhaps it's not a good idea to have the something like h - softirq_vec
> in the parameter of the tracepoint. Not saying that your change is not
> worth it. It is, because h - softirq_vec is used by others now too.
> 

OK, first of all, there are some serious WTFs here:

# define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t"

A jump instruction is one of the worst possible NOPs.  Why are we doing
this?

The second thing that I found when implementing static_cpu_has() was
that it is actually better to encapsulate the asm goto in a small inline
which returns bool (true/false) -- gcc will happily optimize out the
variable and only see it as a flow of control thing.  I would be very
curious if that wouldn't make gcc generate better code in cases like that.

gcc 4.5.0 has a bug in that there must be a flowthrough case in the asm
goto (you can't have it unconditionally branch one way or the other), so
that should be the likely case and accordingly it should be annotated
likely() so that gcc doesn't reorder.  I suspect in the end one ends up
with code like this:

static __always_inline __pure bool __switch_point(...)
{
	asm goto("1: " JUMP_LABEL_INITIAL_NOP
		 /* ... patching stuff */
		: : : : t_jump);
	return false;
t_jump:
	return true;
}

#define SWITCH_POINT(x) unlikely(__switch_point(x))

I *suspect* this will resolve the need for hot/cold labels just fine.

	-hpa



* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 21:28                           ` Jason Baron
@ 2010-10-19 21:55                             ` Thomas Gleixner
  2010-10-19 22:17                               ` Thomas Gleixner
  2010-10-19 22:38                               ` Jason Baron
  0 siblings, 2 replies; 93+ messages in thread
From: Thomas Gleixner @ 2010-10-19 21:55 UTC (permalink / raw)
  To: Jason Baron
  Cc: Mathieu Desnoyers, Steven Rostedt, Koki Sanagi, Peter Zijlstra,
	Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan,
	laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Tue, 19 Oct 2010, Jason Baron wrote:
> On Tue, Oct 19, 2010 at 09:49:45PM +0200, Thomas Gleixner wrote:
> > > > On Tue, 19 Oct 2010, Steven Rostedt wrote:
> > 
> > So it trades a conditional vs. two jumps ? WTF ??
> > 
> 
> Right, so on x86 the 'jmpq' gets patched at boot with a 5-byte no-op
> sequence. So in the disabled case we have a no-op followed by a jump
> around the disabled code.

And that's supposed to be useful ? We do _NOT_ want to jump around
disabled stuff. The noped out case should fall through into the non
traced code. Otherwise that whole jumplabel thing is completely
useless.

> > I thought that jumplabel magic was supposed to get rid of the jump
> > over the tracing code ? In fact it adds another jump. Whatfor ?
> > 
> 
> yes, that is the plan. gcc does not yet support hot/cold labels. Once
> it does, the second jump will go away and the entire tracepoint code will
> be moved to a 'cold' section. It's not completely optimal yet, but
> we are getting there.

Then do not advertise it as the brilliant solution for all tracing
matters.

> > Now even worse, when you NOP out the jmpq then your tracepoint is
> > still not enabled. Brilliant !
> > 
> 
> The 'jmpq' in the enabled case is patched with a jmpq to the body of the
> tracepoint itself.

Brilliant.
 
> > Did you guys ever look at the assembly output of that insane shite you
> > are advertising with lengthy explanations ? 
> > 
> > Obviously _NOT_
> > 
> > Come back when you can show me a clean implementation of all this crap
> > which reproduces with my jumplabel enabled stock compiler. And please
> > just send me a patch w/o the blurb.
> > 
> > And sane looks like:
> > 
> >     jmpq   2f  <---- This gets noped out 
> > 1:
> >     mov    %r12,%rdi
> >     callq  *(%r12)
> >     [whatever cleanup it takes ]
> >     leaveq 
> >     retq   
> > 
> > 2:
> >     [tracing gunk]
> >     jmp    1b
> > 
> 
> yes, this is what the code should look like when we get support for
> hot/cold labels. I've discussed this support with the gcc folks, and it's the
> next step here. So yes, this is exactly where we are headed.

And at the same time the whole tracing crowd tells me that this is
already a done deal. See previous advertisements from DrTracing. I'm
seriously grumpy about this, especially in the context of a patch which
fixes one of the worst interfaces I've seen in years.

Thanks,

	tglx


* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 21:23                               ` Steven Rostedt
  2010-10-19 21:48                                 ` H. Peter Anvin
@ 2010-10-19 22:04                                 ` Thomas Gleixner
  2010-10-19 22:33                                   ` Steven Rostedt
  1 sibling, 1 reply; 93+ messages in thread
From: Thomas Gleixner @ 2010-10-19 22:04 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs,
	H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Tue, 19 Oct 2010, Steven Rostedt wrote:
> On Tue, 2010-10-19 at 23:07 +0200, Thomas Gleixner wrote:
> > On Tue, 19 Oct 2010, Steven Rostedt wrote:
> > > On Tue, 2010-10-19 at 21:49 +0200, Thomas Gleixner wrote:
> > > > So that saves _TWO_ bytes of text and replaces:
> > > > 
> > > > -  1e:	83 3d 00 00 00 00 00 	cmpl   $0x0,0x0(%rip)        # 25 <test+0x25>
> > > > -  25:	74 4d                	je     74 <test+0x74>
> > > > +  1e:	e9 00 00 00 00       	jmpq   23 <test+0x23>
> > > > +  23:	eb 4d                	jmp    72 <test+0x72>
> > > > 
> > > > So it trades a conditional vs. two jumps ? WTF ??
> > > 
> > > Well, the one jmpq is noped out, and the jmp is non conditional. I've
> > 
> > What are you smoking ?
> 
> What? Are you saying that conditional jumps are just as fast as non
> conditional ones?
> 
> > 
> > In case the trace point is enabled the jmpq is there, so it jumps to
> > 23 and jumps from there to 72.
> 
> No, when we dynamically enable the tracepoint, it will jump to 25, not
> 23. That's what the goto part is about. We add the do_trace label to the
> table, and we make it point to that location. If we did it as you say,
> then tracepoints would never be enabled.
> 
> This is not unlike what we do with the function tracer. The original
> code points to mcount which simply is:
> 
> 	mcount:
> 		retq
> 
> And when we enable the callers, we have it jump to a different function.
> 
> > 
> > In case the trace point is disabled the jmpq is noped out, so it jumps
> > to 72 directly.
> 
> That is correct.
> 
> > 
> > > always thought a non conditional jmp was faster than a conditional one,
> > 
> > I always thought, that at least some of the stuff which comes from
> > tracing folks makes some sense.
> 
> Is it still not making sense?
> 
> > 
> > > since there's no need to go into the branch prediction logic. The CPU
> > > > can simply skip to the code to jump next. Of course, this pollutes the
> > > I$.
> > 
> > We might consult Mathieu for further useless blurb on how CPUs work
> > around broken code.
> 
> The code worked fine before, it just was not very pretty.
> 
> But it seemed that gcc for you inlined the code in the wrong spot.
> Perhaps it's not a good idea to have the something like h - softirq_vec
> in the parameter of the tracepoint. Not saying that your change is not
> worth it. It is, because h - softirq_vec is used by others now too.

Crap, crap, crap. This has nothing to do with the arguments of that
trace point, it's a compiler problem and you are just hoping that GCC
will do the right thing.

That's the completely wrong assumption and, as Jason confirmed, GCC is not
up to it at all.

hpa just posted code which does the _RIGHT_ _THING_ independent of any
compiler madness and you tracer folks just missed it.

Your jump label optimization made code even worse for today's common
compilers. Just admit it and fix that mess you created or simply
disable it.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 21:45                             ` Thomas Gleixner
@ 2010-10-19 22:14                               ` Steven Rostedt
  0 siblings, 0 replies; 93+ messages in thread
From: Steven Rostedt @ 2010-10-19 22:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs,
	H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Tue, 2010-10-19 at 23:45 +0200, Thomas Gleixner wrote:
> On Tue, 19 Oct 2010, Steven Rostedt wrote:
> > On Tue, 2010-10-19 at 21:49 +0200, Thomas Gleixner wrote:
> > 
> > Because you do the h - softvec in the tracepoint parameter? I got a
> > different result:
> 
> I guess some serious whacking is due.
> 
> The compiler adds two jumps when the parameter changes due to
>  (h -softvec) instead of (h, softvec) ????
> 
> Dude, you can't be serious.
> 
> If you would have asked about the compiler version I'm using and told
> me about the compiler version you are using, then I could take that
> answer somehow serious.

Heh, gcc has always been a bit of black magic in what it decides. But,
anyway, I'm using a self built version (vanilla from gcc.gnu.org) of
4.5.1. What are you using?

> 
> It still would miss the "Uhhhh, your compiler creates crap code"
> alert, because that double jump is seriously broken and braindead.
> 
> And I tell you more about this. You are going to piss off a lot of
> users of distro compilers because they will set CC_HAVE_ASM_GOTO
> happily and create the code I posted. Which will break the tracer no
> matter what.
> 
> So you tracer maniacs happily played with some experimental compiler
> stuff w/o even testing your crap against something which ships with
> distros or is the reference 4.5 compiler on kernel.org ?
> 
> I prefer you sending a patch to disable this, until it's sorted out,
> unless you want me to add some really outrageous changelog to the
> patch I'm going to put into tip tomorrow night, ok ?

Then let's just compare the crap versions you posted.

> -  1e:        83 3d 00 00 00 00 00    cmpl   $0x0,0x0(%rip)        # 25 <test+0x25>
> -  25:        74 4d                   je     74 <test+0x74>
> +  1e:        e9 00 00 00 00          jmpq   23 <test+0x23>
> +  23:        eb 4d                   jmp    72 <test+0x72>

Yes, gcc replaced a cmp and conditional jump with two unconditional
jumps. One of these jumps on boot up will be converted to a nop. Thus
the jump label code just converted a compare and conditional jump into a
nop and a non-conditional jump.

This still sounds like a win to me, although we can do better. I guess
those poor sobs using a distro kernel compiled with a distro gcc that
has CC_HAVE_ASM_GOTO enabled will still be doing better than if it was
doing the if (enable) code.
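
The branch targets in the disassembly quoted above can be double-checked
with a small sketch. rel_jump_target() is a hypothetical helper written
for this illustration, not a kernel function; it only encodes the x86
rule that a relative jump's displacement is added to the address of the
next instruction.

```c
#include <stdint.h>

/*
 * x86 relative jumps (eb rel8, 74 rel8, e9 rel32) store a signed
 * displacement that is added to the address of the *next* instruction,
 * i.e. insn_addr + insn_len.
 *
 * Hypothetical helper for illustration only.
 */
static uint64_t rel_jump_target(uint64_t insn_addr, unsigned int insn_len,
				int32_t disp)
{
	return insn_addr + insn_len + (int64_t)disp;
}
```

With the bytes from the listing: "74 4d" at 0x25 gives 0x25 + 2 + 0x4d =
0x74, "e9 00 00 00 00" at 0x1e gives 0x1e + 5 + 0 = 0x23 (the very next
instruction), and "eb 4d" at 0x23 gives 0x23 + 2 + 0x4d = 0x72.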

-- Steve



^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 21:55                             ` Thomas Gleixner
@ 2010-10-19 22:17                               ` Thomas Gleixner
  2010-10-20  1:36                                 ` Steven Rostedt
  2010-10-19 22:38                               ` Jason Baron
  1 sibling, 1 reply; 93+ messages in thread
From: Thomas Gleixner @ 2010-10-19 22:17 UTC (permalink / raw)
  To: Jason Baron
  Cc: Mathieu Desnoyers, Steven Rostedt, Koki Sanagi, Peter Zijlstra,
	Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan,
	laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Tue, 19 Oct 2010, Thomas Gleixner wrote:
> On Tue, 19 Oct 2010, Jason Baron wrote:
> > > Now even worse, when you NOP out the jmpq then your tracepoint is
> > > still not enabled. Brilliant !
> > > 
> > 
> > The 'jmpq' in the enabled case is patched with a jmpq to the body of the
> > tracepoint itself.
> 
> Brilliant.

IOW, we now jump around the jump which jumps around the disabled code.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 21:48                                 ` H. Peter Anvin
@ 2010-10-19 22:23                                   ` Steven Rostedt
  2010-10-19 22:26                                     ` H. Peter Anvin
  2010-10-19 22:27                                     ` Peter Zijlstra
  2010-10-19 22:41                                   ` Mathieu Desnoyers
  1 sibling, 2 replies; 93+ messages in thread
From: Steven Rostedt @ 2010-10-19 22:23 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra,
	Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan,
	laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller,
	izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony

On Tue, 2010-10-19 at 14:48 -0700, H. Peter Anvin wrote:
> On 10/19/2010 02:23 PM, Steven Rostedt wrote:
> > 
> > But it seemed that gcc for you inlined the code in the wrong spot.
> > Perhaps it's not a good idea to have the something like h - softirq_vec
> > in the parameter of the tracepoint. Not saying that your change is not
> > worth it. It is, because h - softirq_vec is used by others now too.
> > 
> 
> OK, first of all, there are some serious WTFs here:
> 
> # define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t"
> 
> A jump instruction is one of the worst possible NOPs.  Why are we doing
> this?


Good question. Safety?  Jason?

These are the initial jumps, and they are converted on boot-up to a better nop.

> 
> The second thing that I found when implementing static_cpu_has() was
> that it is actually better to encapsulate the asm goto in a small inline
> which returns bool (true/false) -- gcc will happily optimize out the
> variable and only see it as a flow of control thing.  I would be very
> curious if that wouldn't make gcc generate better code in cases like that.
>
> gcc 4.5.0 has a bug in that there must be a flowthrough case in the asm
> goto (you can't have it unconditionally branch one way or the other), so
> that should be the likely case and accordingly it should be annotated
> likely() so that gcc doesn't reorder.  I suspect in the end one ends up
> with code like this:
> 
> static __always_inline __pure bool __switch_point(...)
> {
> 	asm goto("1: " JUMP_LABEL_INITIAL_NOP
> 		 /* ... patching stuff */
> 		: : : : t_jump);
> 	return false;
> t_jump:
> 	return true;
> }
> 
> #define SWITCH_POINT(x) unlikely(__switch_point(x))
> 
> I *suspect* this will resolve the need for hot/cold labels just fine.

Interesting, we could try this.

Thanks!

-- Steve



^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 22:23                                   ` Steven Rostedt
@ 2010-10-19 22:26                                     ` H. Peter Anvin
  2010-10-19 22:27                                     ` Peter Zijlstra
  1 sibling, 0 replies; 93+ messages in thread
From: H. Peter Anvin @ 2010-10-19 22:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra,
	Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan,
	laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller,
	izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony

On 10/19/2010 03:23 PM, Steven Rostedt wrote:
> On Tue, 2010-10-19 at 14:48 -0700, H. Peter Anvin wrote:
>> On 10/19/2010 02:23 PM, Steven Rostedt wrote:
>>>
>>> But it seemed that gcc for you inlined the code in the wrong spot.
>>> Perhaps it's not a good idea to have the something like h - softirq_vec
>>> in the parameter of the tracepoint. Not saying that your change is not
>>> worth it. It is, because h - softirq_vec is used by others now too.
>>>
>>
>> OK, first of all, there are some serious WTFs here:
>>
>> # define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t"
>>
>> A jump instruction is one of the worst possible NOPs.  Why are we doing
>> this?
> 
> Good question. Safety?  Jason?
> 
> This is the initial jumps and are converted on boot up to a better nop.
> 

But it makes absolutely no sense to insert an instruction that
suboptimal and then convert it.  Start out with a reasonable,
universally acceptable, instruction, e.g. LEA on 32 bits and NOPL on 64
bits.

>>
>> The second thing that I found when implementing static_cpu_has() was
>> that it is actually better to encapsulate the asm goto in a small inline
>> which returns bool (true/false) -- gcc will happily optimize out the
>> variable and only see it as a flow of control thing.  I would be very
>> curious if that wouldn't make gcc generate better code in cases like that.
>>
>> gcc 4.5.0 has a bug in that there must be a flowthrough case in the asm
>> goto (you can't have it unconditionally branch one way or the other), so
>> that should be the likely case and accordingly it should be annotated
>> likely() so that gcc doesn't reorder.  I suspect in the end one ends up
>> with code like this:
>>
>> static __always_inline __pure bool __switch_point(...)
>> {
>> 	asm goto("1: " JUMP_LABEL_INITIAL_NOP
>> 		 /* ... patching stuff */
>> 		: : : : t_jump);
>> 	return false;
>> t_jump:
>> 	return true;
>> }
>>
>> #define SWITCH_POINT(x) unlikely(__switch_point(x))
>>
>> I *suspect* this will resolve the need for hot/cold labels just fine.
> 
> Interesting, we could try this.
> 

It of course also has the nice property that it syntactically looks
exactly like any other C conditional.
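
A minimal userspace sketch of that shape, under loud assumptions: the asm
body is empty and nothing ever patches it, so the fall-through (false)
path is always taken. switch_point(), SWITCH_POINT(), hot_path() and
slow_path_hits are hypothetical names for this illustration; the real
thing would emit the patchable nop and a jump-table entry inside the asm.

```c
#include <stdbool.h>

/* Hypothetical counter standing in for the out-of-line tracing code. */
static int slow_path_hits;

static inline __attribute__((always_inline)) bool switch_point(void)
{
	/*
	 * Empty asm body: nothing is emitted and nothing is patched at
	 * runtime, so control always falls through here. The label is
	 * listed only so the compiler knows the asm *may* branch to it.
	 */
	__asm__ goto("" : : : : t_jump);
	return false;
t_jump:
	return true;
}

/* Same idea as unlikely(): hint that the true branch is the cold one. */
#define SWITCH_POINT() __builtin_expect(switch_point(), 0)

static void hot_path(void)
{
	if (SWITCH_POINT())
		slow_path_hits++;	/* cold, normally skipped */
}
```

Calling hot_path() here never bumps slow_path_hits, since no jump is ever
patched in. Whether gcc then lays out the cold branch out of line is
exactly the open question, so check the object code rather than trusting
the sketch.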

	-hpa

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 22:23                                   ` Steven Rostedt
  2010-10-19 22:26                                     ` H. Peter Anvin
@ 2010-10-19 22:27                                     ` Peter Zijlstra
  2010-10-19 23:39                                       ` H. Peter Anvin
  1 sibling, 1 reply; 93+ messages in thread
From: Peter Zijlstra @ 2010-10-19 22:27 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: H. Peter Anvin, Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi,
	Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman,
	scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Wed, October 20, 2010 12:23 am, Steven Rostedt wrote:

>> static __always_inline __pure bool __switch_point(...)
>> {
>> 	asm goto("1: " JUMP_LABEL_INITIAL_NOP
>> 		 /* ... patching stuff */
>> 		: : : : t_jump);
>> 	return false;
>> t_jump:
>> 	return true;
>> }
>>
>> #define SWITCH_POINT(x) unlikely(__switch_point(x))
>>
>> I *suspect* this will resolve the need for hot/cold labels just fine.
>
> Interesting, we could try this.

Due to not actually having a sane key type the above is not easy to
implement, but I tried:

#define _SWITCH_POINT(x)\
({                                                              \
        __label__ jl_enabled;                                   \
        bool ret = true;                                        \
        JUMP_LABEL(x, jl_enabled);                              \
        ret = false;                                            \
jl_enabled:                                                     \
        ret;            })

#define SWITCH_POINT(x) unlikely(_SWITCH_POINT(x))

#define COND_STMT(key, stmt)                                    \
do {                                                            \
        if (SWITCH_POINT(key)) {                                \
                stmt;                                           \
        }                                                       \
} while (0)


and that's still generating these double jumps.


^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 22:04                                 ` Thomas Gleixner
@ 2010-10-19 22:33                                   ` Steven Rostedt
  2010-10-21 16:18                                     ` Thomas Gleixner
  0 siblings, 1 reply; 93+ messages in thread
From: Steven Rostedt @ 2010-10-19 22:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs,
	H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Wed, 2010-10-20 at 00:04 +0200, Thomas Gleixner wrote:

> hpa just posted code which does the _RIGHT_ _THING_ independent of any
> compiler madness and you tracer folks just missed it.

Thomas,

Can you try this patch and see if it makes the object code better?

-- Steve


diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index a4a90b6..6264bd3 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -144,14 +144,19 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin,
  */
 #define __DECLARE_TRACE(name, proto, args, data_proto, data_args)	\
 	extern struct tracepoint __tracepoint_##name;			\
-	static inline void trace_##name(proto)				\
+	static __always_inline int __trace_##name(proto)		\
 	{								\
 		JUMP_LABEL(&__tracepoint_##name.state, do_trace);	\
-		return;							\
+		return 0;						\
 do_trace:								\
 			__DO_TRACE(&__tracepoint_##name,		\
 				TP_PROTO(data_proto),			\
 				TP_ARGS(data_args));			\
+			return 1;					\
+	}								\
+	static inline void trace_##name(proto)				\
+	{								\
+		unlikely(__trace_##name(args));				\
 	}								\
 	static inline int						\
 	register_trace_##name(void (*probe)(data_proto), void *data)	\



^ permalink raw reply related	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 21:55                             ` Thomas Gleixner
  2010-10-19 22:17                               ` Thomas Gleixner
@ 2010-10-19 22:38                               ` Jason Baron
  2010-10-19 22:44                                 ` H. Peter Anvin
  1 sibling, 1 reply; 93+ messages in thread
From: Jason Baron @ 2010-10-19 22:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathieu Desnoyers, Steven Rostedt, Koki Sanagi, Peter Zijlstra,
	Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan,
	laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Tue, Oct 19, 2010 at 11:55:19PM +0200, Thomas Gleixner wrote:
> On Tue, 19 Oct 2010, Jason Baron wrote:
> > On Tue, Oct 19, 2010 at 09:49:45PM +0200, Thomas Gleixner wrote:
> > > > > On Tue, 19 Oct 2010, Steven Rostedt wrote:
> > > 
> > > So it trades a conditional vs. two jumps ? WTF ??
> > > 
> > 
> > right, so the 'jmpq' on boot on x86 gets patched with 5 byte no-op
> > sequence. So in the disabled case we have no-op followed by a jump
> > around the disabled code.
> 
> And that's supposed to be useful ? We do _NOT_ want to jump around
> disabled stuff. The noped out case should fall through into the non
> traced code. Otherwise that whole jumplabel thing is completely
> useless.
> 
> > > I thought that jumplabel magic was supposed to get rid of the jump
> > > over the tracing code ? In fact it adds another jump. Whatfor ?
> > > 
> > 
> > yes, that is the plan. gcc does not yet support hot/cold labels...once
> > it does the second jump will go away and the entire tracepoint code will
> > be moved to a 'cold' section. It's not quite completely optimal yet, but
> > we are getting there.
> 
> Then do not advertise it as the brilliant solution for all tracing
> matters.
> 

I'm not sure I did, the documentation says that we have nop followed by
a jmp:

+The new code is a 'nopl' followed by a 'jmp'. Thus:
+
+nopl - 0f 1f 44 00 00 - 5 bytes
+jmp  - eb 3e          - 2 bytes


http://marc.info/?l=linux-kernel&m=128717355231182&w=2
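
To make the two states of the 5-byte patch site concrete, here is a
sketch that only fills a byte buffer. It is not the kernel's patching
code; encode_disabled(), encode_enabled() and PATCH_SITE_LEN are
hypothetical names, and the addresses passed in are up to the caller.
The nopl bytes are the ones from the documentation excerpt above.

```c
#include <stdint.h>
#include <string.h>

#define PATCH_SITE_LEN 5	/* hypothetical name: size of the patch site */

/* 5-byte nopl, as listed in the documentation excerpt. */
static const uint8_t nop5[PATCH_SITE_LEN] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* Disabled state: the site holds the nop and execution falls through. */
static void encode_disabled(uint8_t site[PATCH_SITE_LEN])
{
	memcpy(site, nop5, PATCH_SITE_LEN);
}

/*
 * Enabled state: "jmpq target", i.e. opcode e9 followed by a 32-bit
 * little-endian displacement relative to the end of the instruction.
 */
static void encode_enabled(uint8_t site[PATCH_SITE_LEN], uint64_t site_addr,
			   uint64_t target)
{
	uint32_t disp = (uint32_t)(target - (site_addr + PATCH_SITE_LEN));

	site[0] = 0xe9;
	site[1] = (uint8_t)(disp & 0xff);
	site[2] = (uint8_t)((disp >> 8) & 0xff);
	site[3] = (uint8_t)((disp >> 16) & 0xff);
	site[4] = (uint8_t)((disp >> 24) & 0xff);
}
```

For a site at 0x1e and a target at 0x25, encode_enabled() writes
e9 02 00 00 00, because 0x25 - (0x1e + 5) = 2.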

> > > Now even worse, when you NOP out the jmpq then your tracepoint is
> > > still not enabled. Brilliant !
> > > 
> > 
> > The 'jmpq' in the enabled case is patched with a jmpq to the body of the
> > tracepoint itself.
> 
> Brilliant.
>  
> > > Did you guys ever look at the assembly output of that insane shite you
> > > are advertising with lengthy explanations ? 
> > > 
> > > Obviously _NOT_
> > > 
> > > > Come back when you can show me a clean implementation of all this crap
> > > which reproduces with my jumplabel enabled stock compiler. And please
> > > just send me a patch w/o the blurb.
> > > 
> > > And sane looks like:
> > > 
> > >     jmpq   2f  <---- This gets noped out 
> > > 1:
> > >     mov    %r12,%rdi
> > >     callq  *(%r12)
> > >     [whatever cleanup it takes ]
> > >     leaveq 
> > >     retq   
> > > 
> > > 2f:
> > >     [tracing gunk]
> > >     jmp    1b
> > > 
> > 
> > yes, this is what the code should look like when we get support for
> > > hot/cold labels. I've discussed this support with gcc folk, and it's the
> > > next step here. So yes, this is exactly where we are headed.
> 
> So and at the same time the whole tracing crowd tells me, that this is
> already a done deal. See previous advertisements from DrTracing. I'm
> seriously grumpy about this especially in the context of a patch which
> fixes one of the worst interfaces I've seen in years.
> 
> Thanks,
> 
> 	tglx

Sorry if I misled anybody about the current state of 'jump labels'.
But we have the same goal in mind, and a clear path to get there. If you
don't agree with the approach - I'm all ears. And you are right - the code is
not where it should be yet.

thanks,

-Jason

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 21:48                                 ` H. Peter Anvin
  2010-10-19 22:23                                   ` Steven Rostedt
@ 2010-10-19 22:41                                   ` Mathieu Desnoyers
  2010-10-19 22:49                                     ` H. Peter Anvin
  1 sibling, 1 reply; 93+ messages in thread
From: Mathieu Desnoyers @ 2010-10-19 22:41 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Steven Rostedt, Thomas Gleixner, Koki Sanagi, Peter Zijlstra,
	Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan,
	laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller,
	izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony,
	Jason Baron

* H. Peter Anvin (hpa@zytor.com) wrote:
> On 10/19/2010 02:23 PM, Steven Rostedt wrote:
> > 
> > But it seemed that gcc for you inlined the code in the wrong spot.
> > Perhaps it's not a good idea to have the something like h - softirq_vec
> > in the parameter of the tracepoint. Not saying that your change is not
> > worth it. It is, because h - softirq_vec is used by others now too.
> > 
> 
> OK, first of all, there are some serious WTFs here:
> 
> # define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t"
> 
> A jump instruction is one of the worst possible NOPs.  Why are we doing
> this?

This code is dynamically patched at boot time (and module load time) with a
better nop, just like the function tracer does.

> 
> The second thing that I found when implementing static_cpu_has() was
> that it is actually better to encapsulate the asm goto in a small inline
> which returns bool (true/false) -- gcc will happily optimize out the
> variable and only see it as a flow of control thing.  I would be very
> curious if that wouldn't make gcc generate better code in cases like that.
> 
> gcc 4.5.0 has a bug in that there must be a flowthrough case in the asm
> goto (you can't have it unconditionally branch one way or the other), so
> that should be the likely case and accordingly it should be annotated
> likely() so that gcc doesn't reorder.  I suspect in the end one ends up
> with code like this:
> 
> static __always_inline __pure bool __switch_point(...)
> {
> 	asm goto("1: " JUMP_LABEL_INITIAL_NOP
> 		 /* ... patching stuff */
> 		: : : : t_jump);
> 	return false;
> t_jump:
> 	return true;
> }
> 
> #define SWITCH_POINT(x) unlikely(__switch_point(x))
> 
> I *suspect* this will resolve the need for hot/cold labels just fine.

Thanks for the hint! We'll make sure to try it out. Having the ability to force
gcc to put the tracepoint in an unlikely branch is deeply needed here.

I'm a bit curious about the nop vs jump overhead comparison you are referring
to. Is it an instruction latency benchmark or a throughput benchmark?

Intel's manual "Intel 64 and IA-32 Architectures Optimization Reference Manual"

http://www.intel.com/Assets/PDF/manual/248966.pdf

Page C-33 (or 577 in the pdf)

"7. Selection of conditional jump instructions should be based on the
    recommendation of section Section 3.4.1, “Branch Prediction Optimization,” to
    improve the predictability of branches. When branches are predicted
    successfully, the latency of jcc is effectively zero."

So it mentions "jcc", but not jmp. Is there any reason for jmp to have a higher
latency than jcc ?

In this manual, the latency of predicted jcc is therefore 0 cycle, and its
throughput is 0.5 cycle/insn.

NOP (page C-29) is stated to have a latency of 0.5 to 1 cycle/insn (depending on
the exact HW), and throughput of 0.5 cycle/insn.

However, I have not found "jmp" explicitly in this listing.

So if we were executing tracepoints in a maze of jumps, we could argue that
instruction throughput is the most important there. However, if we expect the
common case to be surrounded by some non-ALU instructions, latency tends to
become the most important criterion.

But I feel I might be missing something important that distinguishes "jcc" from
"jmp".

Thanks,

Mathieu


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 22:38                               ` Jason Baron
@ 2010-10-19 22:44                                 ` H. Peter Anvin
  2010-10-19 22:56                                   ` Steven Rostedt
  0 siblings, 1 reply; 93+ messages in thread
From: H. Peter Anvin @ 2010-10-19 22:44 UTC (permalink / raw)
  To: Jason Baron
  Cc: Thomas Gleixner, Mathieu Desnoyers, Steven Rostedt, Koki Sanagi,
	Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman,
	scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On 10/19/2010 03:38 PM, Jason Baron wrote:
> 
> I'm not sure I did, the documentation says that we have nop followed by
> a jmp:
> 
> +The new code is a 'nopl' followed by a 'jmp'. Thus:
> +
> +nopl - 0f 1f 44 00 00 - 5 bytes
> +jmp  - eb 3e          - 2 bytes
> 

There is no excuse for needing the second jump here, obviously...

	-hpa

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 22:41                                   ` Mathieu Desnoyers
@ 2010-10-19 22:49                                     ` H. Peter Anvin
  2010-10-19 23:05                                       ` Steven Rostedt
  0 siblings, 1 reply; 93+ messages in thread
From: H. Peter Anvin @ 2010-10-19 22:49 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Steven Rostedt, Thomas Gleixner, Koki Sanagi, Peter Zijlstra,
	Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan,
	laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller,
	izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony,
	Jason Baron

On 10/19/2010 03:41 PM, Mathieu Desnoyers wrote:
>>
>> OK, first of all, there are some serious WTFs here:
>>
>> # define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t"
>>
>> A jump instruction is one of the worst possible NOPs.  Why are we doing
>> this?
> 
> This code is dynamically patched at boot time (and module load time) with a
> better nop, just like the function tracer does.
> 

That's just ridiculous... start out with something sane and you at least
have the chance of not having to patch it.

> Intel's manual "Intel 64 and IA-32 Architectures Optimization Reference Manual"
> 
> http://www.intel.com/Assets/PDF/manual/248966.pdf
> 
> Page C-33 (or 577 in the pdf)
> 
> "7. Selection of conditional jump instructions should be based on the
>     recommendation of section Section 3.4.1, “Branch Prediction Optimization,” to
>     improve the predictability of branches. When branches are predicted
>     successfully, the latency of jcc is effectively zero."
> 
> So it mentions "jcc", but not jmp. Is there any reason for jmp to have a higher
> latency than jcc ?
> 
> In this manual, the latency of predicted jcc is therefore 0 cycle, and its
> throughput is 0.5 cycle/insn.
> 
> NOP (page C-29) is stated to have a latency of 0.5 to 1 cycle/insn (depending on
> the exact HW), and throughput of 0.5 cycle/insn.
> 
> However, I have not found "jmp" explicitly in this listing.
> 
> So if we were executing tracepoints in a maze of jumps, we could argue that
> instruction throughput is the most important there. However, if we expect the
> common case to be surrounded by some non-ALU instructions, latency tends to
> become the most important criterion.
> 
> But I feel I might be missing something important that distinguish "jcc" from
> "jmp".

NOP has a latency of 0.5-1.0 cycle/insns, *but has no consumers*.

JMP/Jcc does have a consumer -- the IP -- and actually measuring shows
that it is much, much worse than NOP and other dummy instructions.

	-hpa

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 22:44                                 ` H. Peter Anvin
@ 2010-10-19 22:56                                   ` Steven Rostedt
  2010-10-19 22:57                                     ` H. Peter Anvin
  0 siblings, 1 reply; 93+ messages in thread
From: Steven Rostedt @ 2010-10-19 22:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jason Baron, Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi,
	Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman,
	scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Tue, 2010-10-19 at 15:44 -0700, H. Peter Anvin wrote:
> On 10/19/2010 03:38 PM, Jason Baron wrote:
> > 
> > I'm not sure I did, the documentation says that we have nop followed by
> > a jmp:
> > 
> > +The new code is a 'nopl' followed by a 'jmp'. Thus:
> > +
> > +nopl - 0f 1f 44 00 00 - 5 bytes
> > +jmp  - eb 3e          - 2 bytes
> > 
> 
> There is no excuse for needing the second jump here, obviously...

Now the trick is to tell gcc that.

-- Steve



^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 22:56                                   ` Steven Rostedt
@ 2010-10-19 22:57                                     ` H. Peter Anvin
  0 siblings, 0 replies; 93+ messages in thread
From: H. Peter Anvin @ 2010-10-19 22:57 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jason Baron, Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi,
	Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman,
	scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On 10/19/2010 03:56 PM, Steven Rostedt wrote:
>>
>> There is no excuse for needing the second jump here, obviously...
> 
> Now the trick is to tell gcc that.
> 

Yes.

	-hpa

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 22:49                                     ` H. Peter Anvin
@ 2010-10-19 23:05                                       ` Steven Rostedt
  2010-10-19 23:09                                         ` H. Peter Anvin
  2010-10-20 15:27                                         ` Jason Baron
  0 siblings, 2 replies; 93+ messages in thread
From: Steven Rostedt @ 2010-10-19 23:05 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Mathieu Desnoyers, Thomas Gleixner, Koki Sanagi, Peter Zijlstra,
	Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan,
	laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller,
	izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony,
	Jason Baron

On Tue, 2010-10-19 at 15:49 -0700, H. Peter Anvin wrote:
> On 10/19/2010 03:41 PM, Mathieu Desnoyers wrote:
> >>
> >> OK, first of all, there are some serious WTFs here:
> >>
> >> # define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t"
> >>
> >> A jump instruction is one of the worst possible NOPs.  Why are we doing
> >> this?
> > 
> > This code is dynamically patched at boot time (and module load time) with a
> > better nop, just like the function tracer does.
> > 
> 
> That's just ridiculous... start out with something sane and you at least
> have the chance of not having to patch it.

Yep we can fix this. Jason?



> > So if we were executing tracepoints in a maze of jumps, we could argue that
> > instruction throughput is the most important there. However, if we expect the
> > common case to be surrounded by some non-ALU instructions, latency tends to
> > become the most important criterion.
> > 
> > But I feel I might be missing something important that distinguish "jcc" from
> > "jmp".
> 
> NOP has a latency of 0.5-1.0 cycle/insns, *but has no consumers*.
> 
> JMP/Jcc does have a consumer -- the IP -- and actually measuring shows
> that it is much, much worse than NOP and other dummy instructions.

But how does JMP vs Jcc compare?

-- Steve



^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 23:05                                       ` Steven Rostedt
@ 2010-10-19 23:09                                         ` H. Peter Anvin
  2010-10-20 15:27                                         ` Jason Baron
  1 sibling, 0 replies; 93+ messages in thread
From: H. Peter Anvin @ 2010-10-19 23:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Thomas Gleixner, Koki Sanagi, Peter Zijlstra,
	Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan,
	laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller,
	izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony,
	Jason Baron

On 10/19/2010 04:05 PM, Steven Rostedt wrote:
>>
>> JMP/Jcc does have a consumer -- the IP -- and actually measuring shows
>> that it is much, much worse than NOP and other dummy instructions.
> 
> But how does JMP vs Jcc compare?
> 

*As far as I know* they're the same, except of course that a direct JMP
never mispredicts.

	-hpa

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 22:27                                     ` Peter Zijlstra
@ 2010-10-19 23:39                                       ` H. Peter Anvin
  2010-10-19 23:45                                         ` Steven Rostedt
  2010-10-20  0:43                                         ` Jason Baron
  0 siblings, 2 replies; 93+ messages in thread
From: H. Peter Anvin @ 2010-10-19 23:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi,
	Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman,
	scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On 10/19/2010 03:27 PM, Peter Zijlstra wrote:
> 
> Due to not actually having a sane key type the above is not easy to
> implement, but I tried:
> 
> #define _SWITCH_POINT(x)\
> ({                                                              \
>         __label__ jl_enabled;                                   \
>         bool ret = true;                                        \
>         JUMP_LABEL(x, jl_enabled);                              \
>         ret = false;                                            \
> jl_enabled:                                                     \
>         ret;            })
> 
> #define SWITCH_POINT(x) unlikely(_SWITCH_POINT(x))
> 
> #define COND_STMT(key, stmt)                                    \
> do {                                                            \
>         if (SWITCH_POINT(key)) {                                \
>                 stmt;                                           \
>         }                                                       \
> } while (0)
> 
> 
> and that's still generating these double jumps.
> 

I just experimented with it, and the ({...}) construct doesn't work,
because it looks like a merged flow of control to gcc.

Replacing the ({ ... }) with an inline does indeed remove the double
jumps.

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index b67cb18..2ff829d 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -61,12 +61,22 @@ static inline int jump_label_text_reserved(void *start, void *end)

 #endif

+static __always_inline __pure bool _SWITCH_POINT(void *x)
+{
+       asm goto("# SWITCH_POINT %0\n\t"
+                ".byte 0x66,0x66,0x66,0x66,0x90\n"
+                "1:"
+                : : "i" (x) : : jl_enabled);
+       return false;
+jl_enabled:
+       return true;
+}
+
+#define SWITCH_POINT(x)        unlikely(_SWITCH_POINT(x))
+
 #define COND_STMT(key, stmt)                                   \
 do {                                                           \
-       __label__ jl_enabled;                                   \
-       JUMP_LABEL(key, jl_enabled);                            \
-       if (0) {                                                \
-jl_enabled:                                                    \
+       if (SWITCH_POINT(key)) {                                \
                stmt;                                           \
        }                                                       \
 } while (0)


The key here seems to be to not use the JUMP_LABEL macro as implemented;
I have utterly failed to make JUMP_LABEL() do the right thing.


	-hpa

^ permalink raw reply related	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 23:39                                       ` H. Peter Anvin
@ 2010-10-19 23:45                                         ` Steven Rostedt
  2010-10-20  0:43                                         ` Jason Baron
  1 sibling, 0 replies; 93+ messages in thread
From: Steven Rostedt @ 2010-10-19 23:45 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi,
	Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman,
	scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Tue, 2010-10-19 at 16:39 -0700, H. Peter Anvin wrote:

> The key here seems to be to not use the JUMP_LABEL macro as implemented;
> I have utterly failed to make JUMP_LABEL() do the right thing.

What happens if you remove the do { } while (0) from JUMP_LABEL, since
it now just makes it into an asm()?

-- Steve



^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 23:39                                       ` H. Peter Anvin
  2010-10-19 23:45                                         ` Steven Rostedt
@ 2010-10-20  0:43                                         ` Jason Baron
  1 sibling, 0 replies; 93+ messages in thread
From: Jason Baron @ 2010-10-20  0:43 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML,
	eric.dumazet, kaneshige.kenji, David Miller, izumi.taku,
	kosaki.motohiro, Heiko Carstens, Luck, Tony

On Tue, Oct 19, 2010 at 04:39:07PM -0700, H. Peter Anvin wrote:
> On 10/19/2010 03:27 PM, Peter Zijlstra wrote:
> > 
> > Due to not actually having a sane key type the above is not easy to
> > implement, but I tried:
> > 
> > #define _SWITCH_POINT(x)\
> > ({                                                              \
> >         __label__ jl_enabled;                                   \
> >         bool ret = true;                                        \
> >         JUMP_LABEL(x, jl_enabled);                              \
> >         ret = false;                                            \
> > jl_enabled:                                                     \
> >         ret;            })
> > 
> > #define SWITCH_POINT(x) unlikely(_SWITCH_POINT(x))
> > 
> > #define COND_STMT(key, stmt)                                    \
> > do {                                                            \
> >         if (SWITCH_POINT(key)) {                                \
> >                 stmt;                                           \
> >         }                                                       \
> > } while (0)
> > 
> > 
> > and that's still generating these double jumps.
> > 
> 
> I just experimented with it, and the ({...}) construct doesn't work,
> because it looks like a merged flow of control to gcc.
> 
> Replacing the ({ ... }) with an inline does indeed remove the double
> jumps.
> 
> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
> index b67cb18..2ff829d 100644
> --- a/include/linux/jump_label.h
> +++ b/include/linux/jump_label.h
> @@ -61,12 +61,22 @@ static inline int jump_label_text_reserved(void *start, void *end)
> 
>  #endif
> 
> +static __always_inline __pure bool _SWITCH_POINT(void *x)
> +{
> +       asm goto("# SWITCH_POINT %0\n\t"
> +                ".byte 0x66,0x66,0x66,0x66,0x90\n"
> +                "1:"
> +                : : "i" (x) : : jl_enabled);
> +       return false;
> +jl_enabled:
> +       return true;
> +}
> +
> +#define SWITCH_POINT(x)        unlikely(_SWITCH_POINT(x))
> +
>  #define COND_STMT(key, stmt)                                   \
>  do {                                                           \
> -       __label__ jl_enabled;                                   \
> -       JUMP_LABEL(key, jl_enabled);                            \
> -       if (0) {                                                \
> -jl_enabled:                                                    \
> +       if (SWITCH_POINT(key)) {                                \
>                 stmt;                                           \
>         }                                                       \
>  } while (0)
> 
> 
> The key here seems to be to not use the JUMP_LABEL macro as implemented;
> I have utterly failed to make JUMP_LABEL() do the right thing.
> 

ok, I tried this out for the tracepoint code, but I still seem to be
getting the double jump.

patch:


diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 1947a12..7bc2537 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -66,12 +66,22 @@ static inline void jump_label_unlock(void) {}
 
 #endif
 
+static __always_inline __pure bool _SWITCH_POINT(void *x)
+{
+	asm goto("# SWITCH_POINT %0\n\t"
+		 ".byte 0x66,0x66,0x66,0x66,0x90\n"
+		 "1:"
+		 : : "i" (x) : : jl_enabled);
+	return false;
+jl_enabled:
+	return true;
+}
+
+#define SWITCH_POINT(x)        unlikely(_SWITCH_POINT(x))
+
 #define COND_STMT(key, stmt)					\
 do {								\
-	__label__ jl_enabled;					\
-	JUMP_LABEL(key, jl_enabled);				\
-	if (0) {						\
-jl_enabled:							\
+	if (SWITCH_POINT(key)) {                                \
 		stmt;						\
 	}							\
 } while (0)
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index a4a90b6..1f8d14f 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -146,12 +146,7 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin,
 	extern struct tracepoint __tracepoint_##name;			\
 	static inline void trace_##name(proto)				\
 	{								\
-		JUMP_LABEL(&__tracepoint_##name.state, do_trace);	\
-		return;							\
-do_trace:								\
-			__DO_TRACE(&__tracepoint_##name,		\
-				TP_PROTO(data_proto),			\
-				TP_ARGS(data_args));			\
+		COND_STMT(&__tracepoint_##name.state, __DO_TRACE(&__tracepoint_##name, TP_PROTO(data_proto), TP_ARGS(data_args)));						     \
 	}								\
 	static inline int						\
 	register_trace_##name(void (*probe)(data_proto), void *data)	\


disassembly:

ffffffff810360a6 <set_task_cpu>:
ffffffff810360a6:       55                      push   %rbp
ffffffff810360a7:       48 89 e5                mov    %rsp,%rbp
ffffffff810360aa:       41 55                   push   %r13
ffffffff810360ac:       41 54                   push   %r12
ffffffff810360ae:       41 89 f4                mov    %esi,%r12d
ffffffff810360b1:       53                      push   %rbx
ffffffff810360b2:       48 89 fb                mov    %rdi,%rbx
ffffffff810360b5:       48 81 ec b8 00 00 00    sub    $0xb8,%rsp
ffffffff810360bc:       66 66 66 66 90          data32 data32 data32 xchg %ax,%ax
ffffffff810360c1:       eb 19                   jmp    ffffffff810360dc <set_task_cpu+0x36>
ffffffff810360c3:       49 8b 7d 08             mov    0x8(%r13),%rdi
ffffffff810360c7:       44 89 e2                mov    %r12d,%edx
ffffffff810360ca:       48 89 de                mov    %rbx,%rsi
ffffffff810360cd:       41 ff 55 00             callq  *0x0(%r13)
ffffffff810360d1:       49 83 c5 10             add    $0x10,%r13
ffffffff810360d5:       49 83 7d 00 00          cmpq   $0x0,0x0(%r13)
ffffffff810360da:       eb 6c                   jmp    ffffffff81036148 <set_task_cpu+0xa2>
ffffffff810360dc:       48 8b 43 08             mov    0x8(%rbx),%rax
ffffffff810360e0:       44 39 60 18             cmp    %r12d,0x18(%rax)
ffffffff810360e4:       74 37                   je     ffffffff8103611d <set_task_cpu+0x77>
ffffffff810360e6:       48 ff 83 98 00 00 00    incq   0x98(%rbx)
ffffffff810360ed:       e9 00 00 00 00          jmpq   ffffffff810360f2 <set_task_cpu+0x4c>
ffffffff810360f2:       eb 29                   jmp    ffffffff8103611d <set_task_cpu+0x77>
ffffffff810360f4:       4c 8d ad 30 ff ff ff    lea    -0xd0(%rbp),%r13
ffffffff810360fb:       4c 89 ef                mov    %r13,%rdi
ffffffff810360fe:       e8 c7 94 ff ff          callq  ffffffff8102f5ca <perf_fetch_caller_regs>
ffffffff81036103:       45 31 c0                xor    %r8d,%r8d
ffffffff81036106:       4c 89 e9                mov    %r13,%rcx
ffffffff81036109:       ba 01 00 00 00          mov    $0x1,%edx
ffffffff8103610e:       be 01 00 00 00          mov    $0x1,%esi
ffffffff81036113:       bf 04 00 00 00          mov    $0x4,%edi
ffffffff81036118:       e8 67 19 07 00          callq  ffffffff810a7a84 <__perf_sw_event>
ffffffff8103611d:       44 89 e6                mov    %r12d,%esi
ffffffff81036120:       48 89 df                mov    %rbx,%rdi
ffffffff81036123:       e8 2f 75 ff ff          callq  ffffffff8102d657 <set_task_rq>
ffffffff81036128:       48 8b 43 08             mov    0x8(%rbx),%rax
ffffffff8103612c:       44 89 60 18             mov    %r12d,0x18(%rax)
ffffffff81036130:       48 81 c4 b8 00 00 00    add    $0xb8,%rsp
ffffffff81036137:       5b                      pop    %rbx
ffffffff81036138:       41 5c                   pop    %r12
ffffffff8103613a:       41 5d                   pop    %r13
ffffffff8103613c:       c9                      leaveq
ffffffff8103613d:       c3                      retq
ffffffff8103613e:       4c 8b 2d 3b 26 a9 00    mov    0xa9263b(%rip),%r13        # ffffffff81ac8780 <__tracepoint_sched_migrate_task+0x20>
ffffffff81036145:       4d 85 ed                test   %r13,%r13
ffffffff81036148:       0f 85 75 ff ff ff       jne    ffffffff810360c3 <set_task_cpu+0x1d>
ffffffff8103614e:       eb 8c                   jmp    ffffffff810360dc <set_task_cpu+0x36>
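
As a reading aid for the listing above: the 5-byte NOP at ...60bc is followed
directly by a short jmp at ...60c1 that skips to ...60dc, and the jmpq at
...60ed has a zero displacement, so it lands on the jmp at ...60f2. Those
pairs are the double jumps in question. The targets can be checked by hand
with a small sketch like the one below; the helper names are made up for
illustration and are not part of any tool.

```c
#include <stdint.h>

/*
 * Target of a relative x86 jump: the address of the next instruction
 * (insn_addr + insn_len) plus the sign-extended displacement.
 */
uint64_t rel_jump_target(uint64_t insn_addr, unsigned int insn_len,
			 int32_t disp)
{
	return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}

/* Sign-extend the 8-bit displacement of a short jump ("eb xx", "74 xx"). */
int32_t disp8(uint8_t byte)
{
	return (int8_t)byte;
}
```

For example, "eb 19" at ffffffff810360c1 is two bytes long with displacement
0x19, so rel_jump_target(0xffffffff810360c1ULL, 2, disp8(0x19)) gives
ffffffff810360dc, matching the disassembler's annotation.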



I'm using gcc (GCC) 4.5.1 20100812

is my patch wrong?

thanks,

-Jason

^ permalink raw reply related	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 22:17                               ` Thomas Gleixner
@ 2010-10-20  1:36                                 ` Steven Rostedt
  2010-10-20  1:52                                   ` Jason Baron
  0 siblings, 1 reply; 93+ messages in thread
From: Steven Rostedt @ 2010-10-20  1:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jason Baron, Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra,
	Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan,
	laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Wed, 2010-10-20 at 00:17 +0200, Thomas Gleixner wrote:
> On Tue, 19 Oct 2010, Thomas Gleixner wrote:
> > On Tue, 19 Oct 2010, Jason Baron wrote:
> > > > Now even worse, when you NOP out the jmpq then your tracepoint is
> > > > still not enabled. Brilliant !
> > > > 
> > > 
> > > The 'jmpq' in the enabled case is patched with a jmpq to the body of the
> > > tracepoint itself.
> > 
> > Brilliant.
> 
> IOW, We now jump around the jump which jumps around the disabled code.
> 


Do you happen to have CONFIG_CC_OPTIMIZE_FOR_SIZE set? If so, then this
is a known issue. We even originally had jump label enabled _only_ if
CC_OPTIMIZE_FOR_SIZE was not set, but hpa NAK'd it.

http://lkml.org/lkml/2010/9/22/482

http://lkml.org/lkml/2010/9/20/488

http://lkml.org/lkml/2010/9/24/259

-- Steve



^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-20  1:36                                 ` Steven Rostedt
@ 2010-10-20  1:52                                   ` Jason Baron
  2010-10-25 22:32                                     ` H. Peter Anvin
  0 siblings, 1 reply; 93+ messages in thread
From: Jason Baron @ 2010-10-20  1:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra,
	Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan,
	laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Tue, Oct 19, 2010 at 09:36:30PM -0400, Steven Rostedt wrote:
> 
> On Wed, 2010-10-20 at 00:17 +0200, Thomas Gleixner wrote:
> > On Tue, 19 Oct 2010, Thomas Gleixner wrote:
> > > On Tue, 19 Oct 2010, Jason Baron wrote:
> > > > > Now even worse, when you NOP out the jmpq then your tracepoint is
> > > > > still not enabled. Brilliant !
> > > > > 
> > > > 
> > > > The 'jmpq' in the enabled case is patched with a jmpq to the body of the
> > > > tracepoint itself.
> > > 
> > > Brilliant.
> > 
> > IOW, We now jump around the jump which jumps around the disabled code.
> > 
> 
> 
> Do you happen to have CONFIG_CC_OPTIMIZE_FOR_SIZE set? If so, then this
> is a known issue. We even originally had jump label enabled _only_ if
> CC_OPTIMIZE_FOR_SIZE was not set, but hpa NAK'd it.
> 
> http://lkml.org/lkml/2010/9/22/482
> 
> http://lkml.org/lkml/2010/9/20/488
> 
> http://lkml.org/lkml/2010/9/24/259
> 
> -- Steve

thanks Steve. I was about to say this. When CONFIG_CC_OPTIMIZE_FOR_SIZE
is not set we don't get the double 'jmp' and the tracepoint code is
moved out of line. It was mentioned that a number of distros ship with
CONFIG_CC_OPTIMIZE_FOR_SIZE not set, and as Steve mentioned my original
patch set was conditional on !CONFIG_CC_OPTIMIZE_FOR_SIZE.

Using hot/cold labels gcc can fix the CONFIG_CC_OPTIMIZE_FOR_SIZE case,
but it's a non-trivial amount of work for gcc. I was hoping that if jump
labels are included, we could make the gcc work happen.

thanks,

-Jason

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 23:05                                       ` Steven Rostedt
  2010-10-19 23:09                                         ` H. Peter Anvin
@ 2010-10-20 15:27                                         ` Jason Baron
  2010-10-20 15:41                                           ` Mathieu Desnoyers
  2010-10-25 21:54                                           ` H. Peter Anvin
  1 sibling, 2 replies; 93+ messages in thread
From: Jason Baron @ 2010-10-20 15:27 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: H. Peter Anvin, Mathieu Desnoyers, Thomas Gleixner, Koki Sanagi,
	Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman,
	scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Tue, Oct 19, 2010 at 07:05:15PM -0400, Steven Rostedt wrote:
> On Tue, 2010-10-19 at 15:49 -0700, H. Peter Anvin wrote:
> > On 10/19/2010 03:41 PM, Mathieu Desnoyers wrote:
> > >>
> > >> OK, first of all, there are some serious WTFs here:
> > >>
> > >> # define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t"
> > >>
> > >> A jump instruction is one of the worst possible NOPs.  Why are we doing
> > >> this?
> > > 
> > > This code is dynamically patched at boot time (and module load time) with a
> > > better nop, just like the function tracer does.
> > > 
> > 
> > That's just ridiculous... start out with something sane and you at least
> > have the chance of not having to patch it.
> 
> Yep we can fix this. Jason?
> 

sure. The idea of the 'jmp 0' was simply to be an lcd for x86, if
there's a better lcd for x86, I'll update it. But note, that since the
'jmp 0' is patched to a better nop at boot, we wouldn't see much gain.
And in the boot path we are using 'text_poke_early()', so avoiding that
isn't going to improve things much.

I've got a few fixup patches in the queue that I'm going to post first,
and then I'll take a look at this change.

thanks,

-Jason


^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-20 15:27                                         ` Jason Baron
@ 2010-10-20 15:41                                           ` Mathieu Desnoyers
  2010-10-25 21:54                                           ` H. Peter Anvin
  1 sibling, 0 replies; 93+ messages in thread
From: Mathieu Desnoyers @ 2010-10-20 15:41 UTC (permalink / raw)
  To: Jason Baron
  Cc: Steven Rostedt, H. Peter Anvin, Thomas Gleixner, Koki Sanagi,
	Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman,
	scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

* Jason Baron (jbaron@redhat.com) wrote:
> On Tue, Oct 19, 2010 at 07:05:15PM -0400, Steven Rostedt wrote:
> > On Tue, 2010-10-19 at 15:49 -0700, H. Peter Anvin wrote:
> > > On 10/19/2010 03:41 PM, Mathieu Desnoyers wrote:
> > > >>
> > > >> OK, first of all, there are some serious WTFs here:
> > > >>
> > > >> # define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t"
> > > >>
> > > >> A jump instruction is one of the worst possible NOPs.  Why are we doing
> > > >> this?
> > > > 
> > > > This code is dynamically patched at boot time (and module load time) with a
> > > > better nop, just like the function tracer does.
> > > > 
> > > 
> > > That's just ridiculous... start out with something sane and you at least
> > > have the chance of not having to patch it.
> > 
> > Yep we can fix this. Jason?
> > 
> 
> sure. The idea of the 'jmp 0' was simply to be an lcd for x86, if
> there's a better lcd for x86, I'll update it. But note, that since the
> 'jmp 0' is patched to a better nop at boot, we wouldn't see much gain.
> And in the boot path we are using 'text_poke_early()', so avoiding that
> isn't going to improve things much.
> 
> I've got a few fixup patches in the queue that I'm going to post first,
> and then I'll take a look at this change.

One thing to consider here is that some nops are not compatible across all
architectures. It would also be safer to use an atomic nop (a single
instruction). For example, GENERIC_NOP5 in arch/x86/include/asm/nops.h is
really 2 instructions, which can cause problems if a concurrent thread is
preempted between the two instructions while we patch.
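
To make the hazard concrete, here is a minimal sketch (plain C, not kernel
code) that checks whether an existing byte sequence has an instruction
boundary strictly inside the patch site. The function and array names are
hypothetical, and the 1-byte plus 4-byte split used for the two-instruction
case is an assumption for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if the old sequence, described by the lengths of its
 * instructions, has an instruction boundary strictly inside the patch
 * site. A task can be preempted with its instruction pointer at such a
 * boundary; once the site is rewritten as a single instruction, that
 * address points into the middle of it.
 */
bool has_interior_boundary(const size_t *insn_len, size_t n_insns)
{
	size_t total = 0, offset = 0, i;

	for (i = 0; i < n_insns; i++)
		total += insn_len[i];

	/* Walk every boundary except the one at the very end of the site. */
	for (i = 0; i + 1 < n_insns; i++) {
		offset += insn_len[i];
		if (offset > 0 && offset < total)
			return true;
	}
	return false;
}

/* Assumed layouts: a 5-byte site built from two insns vs. a single insn. */
const size_t sketch_two_insn_nop[] = { 1, 4 };
const size_t sketch_one_insn_nop[] = { 5 };
```

With these layouts the two-instruction nop reports an interior boundary and
the single-instruction nop does not, which is why the latter is the safer
thing to patch over.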

arch_init_ideal_nop5() is actually doing the task of finding the best nop, and
it falls back on a 5-byte nop (just like you do).

HPA, do you have any recommendation for a 5-byte single-instruction nop that is
efficient enough and will work on all x86 (Intel, AMD and other variants) ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [tip:perf/core] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 13:00             ` [PATCH] tracing: Cleanup the convoluted softirq tracepoints Thomas Gleixner
  2010-10-19 13:08               ` Peter Zijlstra
  2010-10-19 13:22               ` Mathieu Desnoyers
@ 2010-10-21 14:52               ` tip-bot for Thomas Gleixner
  2 siblings, 0 replies; 93+ messages in thread
From: tip-bot for Thomas Gleixner @ 2010-10-21 14:52 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, fweisbec, rostedt, peterz, tglx

Commit-ID:  f4bc6bb2d562703eafc895c37e7be20906de139d
Gitweb:     http://git.kernel.org/tip/f4bc6bb2d562703eafc895c37e7be20906de139d
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Tue, 19 Oct 2010 15:00:13 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 21 Oct 2010 16:50:29 +0200

tracing: Cleanup the convoluted softirq tracepoints

With the addition of trace_softirq_raise() the softirq tracepoint got
even more convoluted. Why the tracepoints take two pointers to assign
an integer is beyond my comprehension.

But adding an extra case which treats the first pointer as an unsigned
long when the second pointer is NULL including the back and forth
type casting is just horrible.

Convert the softirq tracepoints to take a single unsigned int argument
for the softirq vector number and fix the call sites.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <alpine.LFD.2.00.1010191428560.6815@localhost6.localdomain6>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: mathieu.desnoyers@efficios.com
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>

---
 include/linux/interrupt.h  |    2 +-
 include/trace/events/irq.h |   54 ++++++++++++++++---------------------------
 kernel/softirq.c           |   16 +++++++-----
 3 files changed, 30 insertions(+), 42 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 531495d..0ac1949 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -410,7 +410,7 @@ extern void open_softirq(int nr, void (*action)(struct softirq_action *));
 extern void softirq_init(void);
 static inline void __raise_softirq_irqoff(unsigned int nr)
 {
-	trace_softirq_raise((struct softirq_action *)(unsigned long)nr, NULL);
+	trace_softirq_raise(nr);
 	or_softirq_pending(1UL << nr);
 }
 
diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
index 6fa7cba..1c09820 100644
--- a/include/trace/events/irq.h
+++ b/include/trace/events/irq.h
@@ -86,76 +86,62 @@ TRACE_EVENT(irq_handler_exit,
 
 DECLARE_EVENT_CLASS(softirq,
 
-	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
+	TP_PROTO(unsigned int vec_nr),
 
-	TP_ARGS(h, vec),
+	TP_ARGS(vec_nr),
 
 	TP_STRUCT__entry(
-		__field(	int,	vec			)
+		__field(	unsigned int,	vec	)
 	),
 
 	TP_fast_assign(
-		if (vec)
-			__entry->vec = (int)(h - vec);
-		else
-			__entry->vec = (int)(long)h;
+		__entry->vec = vec_nr;
 	),
 
-	TP_printk("vec=%d [action=%s]", __entry->vec,
+	TP_printk("vec=%u [action=%s]", __entry->vec,
 		  show_softirq_name(__entry->vec))
 );
 
 /**
  * softirq_entry - called immediately before the softirq handler
- * @h: pointer to struct softirq_action
- * @vec: pointer to first struct softirq_action in softirq_vec array
+ * @vec_nr:  softirq vector number
  *
- * The @h parameter, contains a pointer to the struct softirq_action
- * which has a pointer to the action handler that is called. By subtracting
- * the @vec pointer from the @h pointer, we can determine the softirq
- * number. Also, when used in combination with the softirq_exit tracepoint
- * we can determine the softirq latency.
+ * When used in combination with the softirq_exit tracepoint
+ * we can determine the softirq handler runtime.
  */
 DEFINE_EVENT(softirq, softirq_entry,
 
-	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
+	TP_PROTO(unsigned int vec_nr),
 
-	TP_ARGS(h, vec)
+	TP_ARGS(vec_nr)
 );
 
 /**
  * softirq_exit - called immediately after the softirq handler returns
- * @h: pointer to struct softirq_action
- * @vec: pointer to first struct softirq_action in softirq_vec array
+ * @vec_nr:  softirq vector number
  *
- * The @h parameter contains a pointer to the struct softirq_action
- * that has handled the softirq. By subtracting the @vec pointer from
- * the @h pointer, we can determine the softirq number. Also, when used in
- * combination with the softirq_entry tracepoint we can determine the softirq
- * latency.
+ * When used in combination with the softirq_entry tracepoint
+ * we can determine the softirq handler runtime.
  */
 DEFINE_EVENT(softirq, softirq_exit,
 
-	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
+	TP_PROTO(unsigned int vec_nr),
 
-	TP_ARGS(h, vec)
+	TP_ARGS(vec_nr)
 );
 
 /**
  * softirq_raise - called immediately when a softirq is raised
- * @h: pointer to struct softirq_action
- * @vec: pointer to first struct softirq_action in softirq_vec array
+ * @vec_nr:  softirq vector number
  *
- * The @h parameter contains a pointer to the softirq vector number which is
- * raised. @vec is NULL and it means @h includes vector number not
- * softirq_action. When used in combination with the softirq_entry tracepoint
- * we can determine the softirq raise latency.
+ * When used in combination with the softirq_entry tracepoint
+ * we can determine the softirq raise to run latency.
  */
 DEFINE_EVENT(softirq, softirq_raise,
 
-	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
+	TP_PROTO(unsigned int vec_nr),
 
-	TP_ARGS(h, vec)
+	TP_ARGS(vec_nr)
 );
 
 #endif /*  _TRACE_IRQ_H */
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 07b4f1b..b3cb1dc 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -212,18 +212,20 @@ restart:
 
 	do {
 		if (pending & 1) {
+			unsigned int vec_nr = h - softirq_vec;
 			int prev_count = preempt_count();
-			kstat_incr_softirqs_this_cpu(h - softirq_vec);
 
-			trace_softirq_entry(h, softirq_vec);
+			kstat_incr_softirqs_this_cpu(vec_nr);
+
+			trace_softirq_entry(vec_nr);
 			h->action(h);
-			trace_softirq_exit(h, softirq_vec);
+			trace_softirq_exit(vec_nr);
 			if (unlikely(prev_count != preempt_count())) {
-				printk(KERN_ERR "huh, entered softirq %td %s %p"
+				printk(KERN_ERR "huh, entered softirq %u %s %p"
 				       "with preempt_count %08x,"
-				       " exited with %08x?\n", h - softirq_vec,
-				       softirq_to_name[h - softirq_vec],
-				       h->action, prev_count, preempt_count());
+				       " exited with %08x?\n", vec_nr,
+				       softirq_to_name[vec_nr], h->action,
+				       prev_count, preempt_count());
 				preempt_count() = prev_count;
 			}
 

^ permalink raw reply related	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 22:33                                   ` Steven Rostedt
@ 2010-10-21 16:18                                     ` Thomas Gleixner
  2010-10-21 17:05                                       ` Steven Rostedt
  0 siblings, 1 reply; 93+ messages in thread
From: Thomas Gleixner @ 2010-10-21 16:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs,
	H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony



On Tue, 19 Oct 2010, Steven Rostedt wrote:

> On Wed, 2010-10-20 at 00:04 +0200, Thomas Gleixner wrote:
> 
> > hpa just posted code which does the _RIGHT_ _THING_ independent of any
> > compiler madness and you tracer folks just missed it.
> 
> Thomas,
> 
> Can you try this patch and see if it makes the object code better?

Nope, same result.

Thanks,

	tglx

> -- Steve
> 
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index a4a90b6..6264bd3 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -144,14 +144,19 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin,
>   */
>  #define __DECLARE_TRACE(name, proto, args, data_proto, data_args)	\
>  	extern struct tracepoint __tracepoint_##name;			\
> -	static inline void trace_##name(proto)				\
> +	static __always_inline int __trace_##name(proto)		\
>  	{								\
>  		JUMP_LABEL(&__tracepoint_##name.state, do_trace);	\
> -		return;							\
> +		return 0;						\
>  do_trace:								\
>  			__DO_TRACE(&__tracepoint_##name,		\
>  				TP_PROTO(data_proto),			\
>  				TP_ARGS(data_args));			\
> +			return 1;					\
> +	}								\
> +	static inline void trace_##name(proto)				\
> +	{								\
> +		unlikely(__trace_##name(args));				\
>  	}								\
>  	static inline int						\
>  	register_trace_##name(void (*probe)(data_proto), void *data)	\
> 
> 

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-21 16:18                                     ` Thomas Gleixner
@ 2010-10-21 17:05                                       ` Steven Rostedt
  2010-10-21 19:56                                         ` Thomas Gleixner
  0 siblings, 1 reply; 93+ messages in thread
From: Steven Rostedt @ 2010-10-21 17:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs,
	H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Thu, 2010-10-21 at 18:18 +0200, Thomas Gleixner wrote:
> 
> On Tue, 19 Oct 2010, Steven Rostedt wrote:
> 
> > On Wed, 2010-10-20 at 00:04 +0200, Thomas Gleixner wrote:
> > 
> > > hpa just posted code which does the _RIGHT_ _THING_ independent of any
> > > compiler madness and you tracer folks just missed it.
> > 
> > Thomas,
> > 
> > Can you try this patch and see if it makes the object code better?
> 
> Nope, same result.

Yeah, I figured. Do you have CC_OPTIMIZE_FOR_SIZE set? And if you do,
what happens if you disable it?

-- Steve



* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-21 17:05                                       ` Steven Rostedt
@ 2010-10-21 19:56                                         ` Thomas Gleixner
  2010-10-25 22:31                                           ` H. Peter Anvin
  0 siblings, 1 reply; 93+ messages in thread
From: Thomas Gleixner @ 2010-10-21 19:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs,
	H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Thu, 21 Oct 2010, Steven Rostedt wrote:

> On Thu, 2010-10-21 at 18:18 +0200, Thomas Gleixner wrote:
> > 
> > On Tue, 19 Oct 2010, Steven Rostedt wrote:
> > 
> > > On Wed, 2010-10-20 at 00:04 +0200, Thomas Gleixner wrote:
> > > 
> > > > hpa just posted code which does the _RIGHT_ _THING_ independent of any
> > > > compiler madness and you tracer folks just missed it.
> > > 
> > > Thomas,
> > > 
> > > Can you try this patch and see if it makes the object code better?
> > 
> > Nope, same result.
> 
> Yeah, I figured. Do you have CC_OPTIMIZE_FOR_SIZE set? And if you do,
> what happens if you disable it?

Hmm. Indeed. That gets rid of the double jump.

Thanks,

	tglx

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-20 15:27                                         ` Jason Baron
  2010-10-20 15:41                                           ` Mathieu Desnoyers
@ 2010-10-25 21:54                                           ` H. Peter Anvin
  2010-10-25 22:01                                             ` Mathieu Desnoyers
  1 sibling, 1 reply; 93+ messages in thread
From: H. Peter Anvin @ 2010-10-25 21:54 UTC (permalink / raw)
  To: Jason Baron
  Cc: Steven Rostedt, Mathieu Desnoyers, Thomas Gleixner, Koki Sanagi,
	Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman,
	scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On 10/20/2010 08:27 AM, Jason Baron wrote:
> 
> sure. The idea of the 'jmp 0' was simply to be an lcd for x86, if
> there's a better lcd for x86, I'll update it. But note, that since the
> 'jmp 0' is patched to a better nop at boot, we wouldn't see much gain.
> And in the boot path we are using 'text_poke_early()', so avoiding that
> isn't going to improve things much.
> 

It's still a completely unnecessary waste of startup time some
potentially significant fraction of the time.  Startup time matters,
especially as the number of tracepoints grows.

	-hpa

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-25 21:54                                           ` H. Peter Anvin
@ 2010-10-25 22:01                                             ` Mathieu Desnoyers
  2010-10-25 22:12                                               ` H. Peter Anvin
  0 siblings, 1 reply; 93+ messages in thread
From: Mathieu Desnoyers @ 2010-10-25 22:01 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jason Baron, Steven Rostedt, Thomas Gleixner, Koki Sanagi,
	Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman,
	scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

* H. Peter Anvin (hpa@zytor.com) wrote:
> On 10/20/2010 08:27 AM, Jason Baron wrote:
> > 
> > sure. The idea of the 'jmp 0' was simply to be an lcd for x86, if
> > there's a better lcd for x86, I'll update it. But note, that since the
> > 'jmp 0' is patched to a better nop at boot, we wouldn't see much gain.
> > And in the boot path we are using 'text_poke_early()', so avoiding that
> > isn't going to improve things much.
> > 
> 
> It's still a completely unnecessary waste of startup time some
> potentially significant fraction of the time.  Startup time matters,
> especially as the number of tracepoints grow.

We're still waiting for input on the best single-instruction 5-byte nop that
will work on all x86 variants. Please note that GENERIC_NOP5 is actually two
instructions back to back, which is not appropriate here.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-25 22:01                                             ` Mathieu Desnoyers
@ 2010-10-25 22:12                                               ` H. Peter Anvin
  2010-10-25 22:19                                                 ` H. Peter Anvin
  2010-10-25 22:55                                                 ` Mathieu Desnoyers
  0 siblings, 2 replies; 93+ messages in thread
From: H. Peter Anvin @ 2010-10-25 22:12 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Jason Baron, Steven Rostedt, Thomas Gleixner, Koki Sanagi,
	Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman,
	scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On 10/25/2010 03:01 PM, Mathieu Desnoyers wrote:
> * H. Peter Anvin (hpa@zytor.com) wrote:
>> On 10/20/2010 08:27 AM, Jason Baron wrote:
>>>
>>> sure. The idea of the 'jmp 0' was simply to be an lcd for x86, if
>>> there's a better lcd for x86, I'll update it. But note, that since the
>>> 'jmp 0' is patched to a better nop at boot, we wouldn't see much gain.
>>> And in the boot path we are using 'text_poke_early()', so avoiding that
>>> isn't going to improve things much.
>>>
>>
>> It's still a completely unnecessary waste of startup time some
>> potentially significant fraction of the time.  Startup time matters,
>> especially as the number of tracepoints grow.
> 
> We're still waiting for input for the best single-5-byte-instruction nop that
> will work on all x86 variants. Please note that the GENERIC_NOP5 is actually two
> instructions one next to each other, which is not appropriate here.
> 

On 64 bits, use P6_NOP5; it seems to not suck on any platform.

On 32 bits, 3E 8D 74 26 00 (i.e. DS: + GENERIC_NOP4) seems to at least
do okay.

I can't say these are the *best* (in fact, they are guaranteed not the
best on some significant number of chips), but they haven't sucked on
any chips I have been able to measure -- and are way faster than JMP.

	-hpa
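
For reference, the encodings discussed above can be written out as byte
arrays. This is only a sketch: the 32-bit sequence is the one quoted in the
message, while the P6_NOP5 and GENERIC_NOP5 bytes are given as recalled from
arch/x86/include/asm/nops.h of that era and should be checked against the
tree. The helper names (pick_nop5, tail_matches_generic_nop4) are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOP_LEN 5

/* P6_NOP5 (assumed encoding): nopl 0x0(%eax,%eax,1), one 5-byte instruction. */
static const uint8_t nop5_p6[NOP_LEN] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* DS prefix (0x3e) + GENERIC_NOP4, as quoted in the thread: one instruction. */
static const uint8_t nop5_ds_lea[NOP_LEN] = { 0x3e, 0x8d, 0x74, 0x26, 0x00 };

/* GENERIC_NOP5 (assumed encoding): 0x90 nop, then GENERIC_NOP4: two instructions. */
static const uint8_t nop5_generic[NOP_LEN] = { 0x90, 0x8d, 0x74, 0x26, 0x00 };

/* Hypothetical helper: choose the suggested single-instruction nop. */
const uint8_t *pick_nop5(int is_64bit)
{
	return is_64bit ? nop5_p6 : nop5_ds_lea;
}

/* Return 1 if the last four bytes are the same lea used by GENERIC_NOP5. */
int tail_matches_generic_nop4(const uint8_t *nop)
{
	assert(nop != NULL);
	return memcmp(nop + 1, nop5_generic + 1, NOP_LEN - 1) == 0;
}
```

The 32-bit suggestion differs from GENERIC_NOP5 only in its first byte:
replacing the standalone 0x90 with a 0x3e prefix is what turns two
instructions into one.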

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-25 22:12                                               ` H. Peter Anvin
@ 2010-10-25 22:19                                                 ` H. Peter Anvin
  2010-10-25 22:55                                                 ` Mathieu Desnoyers
  1 sibling, 0 replies; 93+ messages in thread
From: H. Peter Anvin @ 2010-10-25 22:19 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Jason Baron, Steven Rostedt, Thomas Gleixner, Koki Sanagi,
	Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman,
	scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On 10/25/2010 03:12 PM, H. Peter Anvin wrote:
> 
> On 64 bits, use P6_NOP5; it seems to not suck on any platform.
> 
> On 32 bits, 3E 8D 74 26 00 (i.e. DS: + GENERIC_NOP4) seems to at least
> do okay.
> 
> I can't say these are the *best* (in fact, they are guaranteed not the
> best on some significant number of chips), but they haven't sucked on
> any chips I have been able to measure -- and are way faster than JMP.
> 

This is pure conjecture, I have not measured it, but I suspect in fact
that we could just change the composite nops in nops.h to use a 3E
prefix instead of a separate 90 nop.  Some platforms will take a penalty
on the prefix, but that would be balanced against handling two instructions.

The P5 core and others of the same generation might suffer, as they might
have been able to do U+V pipe pairing on two instructions, which they
wouldn't for prefixes.

	-hpa


* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-21 19:56                                         ` Thomas Gleixner
@ 2010-10-25 22:31                                           ` H. Peter Anvin
  0 siblings, 0 replies; 93+ messages in thread
From: H. Peter Anvin @ 2010-10-25 22:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Steven Rostedt, Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra,
	Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan,
	laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller,
	izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony

On 10/21/2010 12:56 PM, Thomas Gleixner wrote:
> On Thu, 21 Oct 2010, Steven Rostedt wrote:
> 
>> On Thu, 2010-10-21 at 18:18 +0200, Thomas Gleixner wrote:
>>>
>>> On Tue, 19 Oct 2010, Steven Rostedt wrote:
>>>
>>>> On Wed, 2010-10-20 at 00:04 +0200, Thomas Gleixner wrote:
>>>>
>>>>> hpa just posted code which does the _RIGHT_ _THING_ independent of any
>>>>> compiler madness and you tracer folks just missed it.
>>>>
>>>> Thomas,
>>>>
>>>> Can you try this patch and see if it makes the object code better?
>>>
>>> Nope, same result.
>>
>> Yeah, I figured. Do you have CC_OPTIMIZE_FOR_SIZE set? And if you do,
>> what happens if you disable it?
> 
> Hmm. Indeed. That gets rid of the double jump.
> 

-Os unfortunately drops a bunch of optimizations.

With gcc 4.5.1 there is actually a way to guarantee getting rid of double
jumps: tell gcc that it is branching to one of two targets:

		asm goto("1: .byte 0xe9 ; .long %l[t_no]-2f\n"
			 "2:\n"
			 /* patching infrastructure goes here */
			 : : "i" (bit) : : t_no, t_yes);
		__builtin_unreachable();
t_no:
		return false;
t_yes:
		return true;

[The open-coding of the jump is necessary to force the 5-byte form
instead of the 2-byte form.]

The patching machinery can recognize the case where the jump offset is
zero and patch in a NOP instead.

There do, however, seem to be a couple of problems:

a) gcc 4.5.1 is required due to a bug in previous versions of gcc when
an asm goto doesn't have a fallthrough case.

b) it seems to encourage gcc to actively jump around as it reorders
blocks, since gcc no longer sees a fallthrough case at all.

Not sure I have a good solution for this, at least not with current gcc.

	-hpa
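
The zero-offset check mentioned in the message above could look roughly like
the following. It is a user-space sketch operating on a plain byte buffer,
not kernel code: a real implementation would go through the kernel's
text-patching helpers rather than memcpy(), and the function names here
(jmp_rel32, patch_zero_jump) are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JMP_REL32_OPCODE 0xe9
#define PATCH_LEN 5

/* Decode the 32-bit little-endian displacement that follows the opcode. */
static int32_t jmp_rel32(const uint8_t *insn)
{
	uint32_t v = (uint32_t)insn[1] | ((uint32_t)insn[2] << 8) |
		     ((uint32_t)insn[3] << 16) | ((uint32_t)insn[4] << 24);

	return (int32_t)v;
}

/*
 * If insn is a 5-byte "jmp rel32" whose displacement is zero (a jump to the
 * very next instruction), overwrite it with the given 5-byte nop and return 1.
 * Otherwise leave the bytes untouched and return 0.
 */
int patch_zero_jump(uint8_t *insn, const uint8_t nop[PATCH_LEN])
{
	assert(insn != NULL && nop != NULL);

	if (insn[0] != JMP_REL32_OPCODE || jmp_rel32(insn) != 0)
		return 0;

	memcpy(insn, nop, PATCH_LEN);
	return 1;
}
```

A jump with a non-zero displacement is left alone, since it really does
transfer control somewhere else.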

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-20  1:52                                   ` Jason Baron
@ 2010-10-25 22:32                                     ` H. Peter Anvin
  0 siblings, 0 replies; 93+ messages in thread
From: H. Peter Anvin @ 2010-10-25 22:32 UTC (permalink / raw)
  To: Jason Baron
  Cc: Steven Rostedt, Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi,
	Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman,
	scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On 10/19/2010 06:52 PM, Jason Baron wrote:
> 
> thanks Steve. I was about to say this. When CONFIG_CC_OPTIMIZE_FOR_SIZE
> is not set we don't get the double 'jmp' and the tracepoint code is
> moved out of line. It was mentioned that a number of distros ship with
> CONFIG_CC_OPTIMIZE_FOR_SIZE not set, and as Steve mentioned my original
> patch set was conditional on !CONFIG_CC_OPTIMIZE_FOR_SIZE.
> 
> using hot/cold labels gcc can fix the CONFIG_CC_OPTIMIZE_FOR_SIZE case,
> but it's a non-trivial amount of work for gcc. I was hoping that if jump
> labels are included, we could make the gcc work happen.
> 

That's fair.  I think jump labels are still a win even in the
double-jump case (especially if the tracepoint turns into a NOP
rather than a JMP.)

Code generated with -Os has a bunch of other problems, too.

	-hpa

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-25 22:12                                               ` H. Peter Anvin
  2010-10-25 22:19                                                 ` H. Peter Anvin
@ 2010-10-25 22:55                                                 ` Mathieu Desnoyers
  2010-10-26  0:39                                                   ` Steven Rostedt
  1 sibling, 1 reply; 93+ messages in thread
From: Mathieu Desnoyers @ 2010-10-25 22:55 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jason Baron, Steven Rostedt, Thomas Gleixner, Koki Sanagi,
	Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman,
	scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

* H. Peter Anvin (hpa@zytor.com) wrote:
> On 10/25/2010 03:01 PM, Mathieu Desnoyers wrote:
> > * H. Peter Anvin (hpa@zytor.com) wrote:
> >> On 10/20/2010 08:27 AM, Jason Baron wrote:
> >>>
> >>> sure. The idea of the 'jmp 0' was simply to be an lcd for x86, if
> >>> there's a better lcd for x86, I'll update it. But note, that since the
> >>> 'jmp 0' is patched to a better nop at boot, we wouldn't see much gain.
> >>> And in the boot path we are using 'text_poke_early()', so avoiding that
> >>> isn't going to improve things much.
> >>>
> >>
> >> It's still a completely unnecessary waste of startup time some
> >> potentially significant fraction of the time.  Startup time matters,
> >> especially as the number of tracepoints grow.
> > 
> > We're still waiting for input for the best single-5-byte-instruction nop that
> > will work on all x86 variants. Please note that the GENERIC_NOP5 is actually two
> > instructions one next to each other, which is not appropriate here.
> > 
> 
> On 64 bits, use P6_NOP5; it seems to not suck on any platform.
> 
> On 32 bits, 3E 8D 74 26 00 (i.e. DS: + GENERIC_NOP4) seems to at least
> do okay.
> 
> I can't say these are the *best* (in fact, they are guaranteed not the
> best on some significant number of chips), but they haven't sucked on
> any chips I have been able to measure -- and are way faster than JMP.

Cool, thanks for the info! Steven and Jason should probably update their
respective infrastructure to use the 32-bit 5-byte nop you propose rather than
the 5-byte jump.

Mathieu


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-25 22:55                                                 ` Mathieu Desnoyers
@ 2010-10-26  0:39                                                   ` Steven Rostedt
  2010-10-26  1:14                                                     ` Mathieu Desnoyers
  0 siblings, 1 reply; 93+ messages in thread
From: Steven Rostedt @ 2010-10-26  0:39 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: H. Peter Anvin, Jason Baron, Thomas Gleixner, Koki Sanagi,
	Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman,
	scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

On Mon, 2010-10-25 at 18:55 -0400, Mathieu Desnoyers wrote:
> * H. Peter Anvin (hpa@zytor.com) wrote:
> > On 10/25/2010 03:01 PM, Mathieu Desnoyers wrote:
> > > * H. Peter Anvin (hpa@zytor.com) wrote:
> > >> On 10/20/2010 08:27 AM, Jason Baron wrote:
> > >>>
> > >>> sure. The idea of the 'jmp 0' was simply to be an lcd for x86, if
> > >>> there's a better lcd for x86, I'll update it. But note, that since the
> > >>> 'jmp 0' is patched to a better nop at boot, we wouldn't see much gain.
> > >>> And in the boot path we are using 'text_poke_early()', so avoiding that
> > >>> isn't going to improve things much.
> > >>>
> > >>
> > >> It's still a completely unnecessary waste of startup time some
> > >> potentially significant fraction of the time.  Startup time matters,
> > >> especially as the number of tracepoints grow.
> > > 
> > > We're still waiting for input for the best single-5-byte-instruction nop that
> > > will work on all x86 variants. Please note that the GENERIC_NOP5 is actually two
> > > instructions one next to each other, which is not appropriate here.
> > > 
> > 
> > On 64 bits, use P6_NOP5; it seems to not suck on any platform.
> > 
> > On 32 bits, 3E 8D 74 26 00 (i.e. DS: + GENERIC_NOP4) seems to at least
> > do okay.
> > 
> > I can't say these are the *best* (in fact, they are guaranteed not the
> > best on some significant number of chips), but they haven't sucked on
> > any chips I have been able to measure -- and are way faster than JMP.
> 
> Cool, thanks for the info! Steven and Jason should probably update their
> respective infrastructure to use the 32-bit 5-byte nop you propose rather than
> the 5-byte jump.

Actually, I was thinking that we could take any 5 byte nop. The
alternate code is executed _before_ SMP is enabled. Thus we should not
have any cases where something could be executing in midstream.

-- Steve



* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-26  0:39                                                   ` Steven Rostedt
@ 2010-10-26  1:14                                                     ` Mathieu Desnoyers
  0 siblings, 0 replies; 93+ messages in thread
From: Mathieu Desnoyers @ 2010-10-26  1:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: H. Peter Anvin, Jason Baron, Thomas Gleixner, Koki Sanagi,
	Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman,
	scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji,
	David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck,
	Tony

* Steven Rostedt (rostedt@goodmis.org) wrote:
> On Mon, 2010-10-25 at 18:55 -0400, Mathieu Desnoyers wrote:
> > * H. Peter Anvin (hpa@zytor.com) wrote:
> > > On 10/25/2010 03:01 PM, Mathieu Desnoyers wrote:
> > > > * H. Peter Anvin (hpa@zytor.com) wrote:
> > > >> On 10/20/2010 08:27 AM, Jason Baron wrote:
> > > >>>
> > > >>> sure. The idea of the 'jmp 0' was simply to be an lcd for x86, if
> > > >>> there's a better lcd for x86, I'll update it. But note, that since the
> > > >>> 'jmp 0' is patched to a better nop at boot, we wouldn't see much gain.
> > > >>> And in the boot path we are using 'text_poke_early()', so avoiding that
> > > >>> isn't going to improve things much.
> > > >>>
> > > >>
> > > >> It's still a completely unnecessary waste of startup time some
> > > >> potentially significant fraction of the time.  Startup time matters,
> > > >> especially as the number of tracepoints grow.
> > > > 
> > > > We're still waiting for input for the best single-5-byte-instruction nop that
> > > > will work on all x86 variants. Please note that the GENERIC_NOP5 is actually two
> > > > instructions one next to each other, which is not appropriate here.
> > > > 
> > > 
> > > On 64 bits, use P6_NOP5; it seems to not suck on any platform.
> > > 
> > > On 32 bits, 3E 8D 74 26 00 (i.e. DS: + GENERIC_NOP4) seems to at least
> > > do okay.
> > > 
> > > I can't say these are the *best* (in fact, they are guaranteed not the
> > > best on some significant number of chips), but they haven't sucked on
> > > any chips I have been able to measure -- and are way faster than JMP.
> > 
> > Cool, thanks for the info! Steven and Jason should probably update their
> > respective infrastructure to use the 32-bit 5-byte nop you propose rather than
> > the 5-byte jump.
> 
> Actually, I was thinking that we could take any 5 byte nop. The
> alternate code is executed _before_ SMP is enabled. Thus we should not
> have any cases where something could be executing in midstream.

Nay, absolutely not. See, the goal here is to find a no-op that is good enough
to be left there *without* init-time dynamic patching on a range of
architectures, so we can diminish the boot-time delay. This implies that we have
to select a no-op that can be patched in SMP context, thus it must be a single
5-byte instruction. We could even create an EMBEDDED config option that lets one
specify that the built-in nop should be left there for embedded systems that
care about boot time.

Moreover, even if that were not the case, I'd be tempted to still use a
single-instruction 5-byte no-op just in case interrupts of any sort (standard
interrupts, NMIs, MCEs or whatnot) happened to be enabled earlier than this
boot-time nop patching.

IOW, you'd need a _very_ strong argument to support using the fragile
two-instruction nops there.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
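
The single-instruction requirement argued above can be illustrated with a
deliberately simplified model. A 5-byte patch site is described only by the
lengths of the instructions it currently holds; if any instruction boundary
lies strictly inside the site, a CPU that was interrupted there could resume
in the middle of whatever 5-byte instruction is patched in later. The names
below are hypothetical, and real live patching has further requirements that
this model ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SITE_LEN 5

/*
 * Return 1 if byte offset `off` (1..SITE_LEN-1) is the start of an
 * instruction inside the site, i.e. an address a CPU could resume at.
 */
int is_interior_boundary(const uint8_t *insn_len, size_t n, size_t off)
{
	size_t pos = 0;

	for (size_t i = 0; i < n; i++) {
		if (pos != 0 && pos == off)
			return 1;
		pos += insn_len[i];
	}
	return 0;
}

/*
 * Return 1 if the site holds exactly one instruction spanning all
 * SITE_LEN bytes, so no CPU can be parked in the middle of it.
 */
int site_safe_for_live_patch(const uint8_t *insn_len, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += insn_len[i];
	assert(total == SITE_LEN); /* the model only covers full 5-byte sites */

	for (size_t off = 1; off < SITE_LEN; off++)
		if (is_interior_boundary(insn_len, n, off))
			return 0;
	return 1;
}
```

Under this model a GENERIC_NOP5-style layout of lengths {1, 4} is rejected
because offset 1 is a resume point, while a single 5-byte nop of length {5}
is accepted.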

Thread overview: 93+ messages
2010-08-23  9:41 [PATCH v4 0/5] netdev: show a process of packets Koki Sanagi
2010-08-23  9:42 ` [PATCH v4 1/5] irq: add tracepoint to softirq_raise Koki Sanagi
2010-09-03 15:29   ` Frederic Weisbecker
2010-09-03 15:39     ` Steven Rostedt
2010-09-03 15:42       ` Frederic Weisbecker
2010-09-03 15:43     ` Steven Rostedt
2010-09-03 15:50       ` Frederic Weisbecker
2010-09-06  1:46         ` Koki Sanagi
2010-09-08  8:33   ` [tip:perf/core] irq: Add " tip-bot for Lai Jiangshan
2010-09-08 11:25     ` [sparc build bug] " Ingo Molnar
2010-09-08 12:26       ` [PATCH] irq: Fix circular headers dependency Frederic Weisbecker
2010-09-09 19:54         ` [tip:perf/core] " tip-bot for Frederic Weisbecker
2010-10-18  9:44       ` [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise Peter Zijlstra
2010-10-18 10:11         ` Peter Zijlstra
2010-10-18 10:26           ` Heiko Carstens
2010-10-18 10:48         ` Peter Zijlstra
2010-10-19 10:58           ` Koki Sanagi
2010-10-19 11:25             ` Peter Zijlstra
2010-10-19 13:00             ` [PATCH] tracing: Cleanup the convoluted softirq tracepoints Thomas Gleixner
2010-10-19 13:08               ` Peter Zijlstra
2010-10-19 13:22               ` Mathieu Desnoyers
2010-10-19 13:41                 ` Thomas Gleixner
2010-10-19 13:54                   ` Steven Rostedt
2010-10-19 14:07                     ` Thomas Gleixner
2010-10-19 14:28                       ` Mathieu Desnoyers
2010-10-19 19:49                         ` Thomas Gleixner
2010-10-19 20:55                           ` Steven Rostedt
2010-10-19 21:07                             ` Thomas Gleixner
2010-10-19 21:23                               ` Steven Rostedt
2010-10-19 21:48                                 ` H. Peter Anvin
2010-10-19 22:23                                   ` Steven Rostedt
2010-10-19 22:26                                     ` H. Peter Anvin
2010-10-19 22:27                                     ` Peter Zijlstra
2010-10-19 23:39                                       ` H. Peter Anvin
2010-10-19 23:45                                         ` Steven Rostedt
2010-10-20  0:43                                         ` Jason Baron
2010-10-19 22:41                                   ` Mathieu Desnoyers
2010-10-19 22:49                                     ` H. Peter Anvin
2010-10-19 23:05                                       ` Steven Rostedt
2010-10-19 23:09                                         ` H. Peter Anvin
2010-10-20 15:27                                         ` Jason Baron
2010-10-20 15:41                                           ` Mathieu Desnoyers
2010-10-25 21:54                                           ` H. Peter Anvin
2010-10-25 22:01                                             ` Mathieu Desnoyers
2010-10-25 22:12                                               ` H. Peter Anvin
2010-10-25 22:19                                                 ` H. Peter Anvin
2010-10-25 22:55                                                 ` Mathieu Desnoyers
2010-10-26  0:39                                                   ` Steven Rostedt
2010-10-26  1:14                                                     ` Mathieu Desnoyers
2010-10-19 22:04                                 ` Thomas Gleixner
2010-10-19 22:33                                   ` Steven Rostedt
2010-10-21 16:18                                     ` Thomas Gleixner
2010-10-21 17:05                                       ` Steven Rostedt
2010-10-21 19:56                                         ` Thomas Gleixner
2010-10-25 22:31                                           ` H. Peter Anvin
2010-10-19 21:45                             ` Thomas Gleixner
2010-10-19 22:14                               ` Steven Rostedt
2010-10-19 21:16                           ` David Daney
2010-10-19 21:32                             ` Jason Baron
2010-10-19 21:38                               ` David Daney
2010-10-19 21:47                             ` Steven Rostedt
2010-10-19 21:28                           ` Jason Baron
2010-10-19 21:55                             ` Thomas Gleixner
2010-10-19 22:17                               ` Thomas Gleixner
2010-10-20  1:36                                 ` Steven Rostedt
2010-10-20  1:52                                   ` Jason Baron
2010-10-25 22:32                                     ` H. Peter Anvin
2010-10-19 22:38                               ` Jason Baron
2010-10-19 22:44                                 ` H. Peter Anvin
2010-10-19 22:56                                   ` Steven Rostedt
2010-10-19 22:57                                     ` H. Peter Anvin
2010-10-19 14:46                       ` Steven Rostedt
2010-10-19 14:00                   ` Mathieu Desnoyers
2010-10-21 14:52               ` [tip:perf/core] " tip-bot for Thomas Gleixner
2010-08-23  9:43 ` [PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT Koki Sanagi
2010-08-24  3:52   ` David Miller
2010-09-08  8:34   ` [tip:perf/core] napi: Convert " tip-bot for Neil Horman
2010-08-23  9:45 ` [PATCH v4 3/5] netdev: add tracepoints to netdev layer Koki Sanagi
2010-08-24  3:53   ` David Miller
2010-09-08  8:34   ` [tip:perf/core] netdev: Add " tip-bot for Koki Sanagi
2010-08-23  9:46 ` [PATCH v4 4/5] skb: add tracepoints to freeing skb Koki Sanagi
2010-08-24  3:53   ` David Miller
2010-09-08  8:35   ` [tip:perf/core] skb: Add " tip-bot for Koki Sanagi
2010-08-23  9:47 ` [PATCH v4 5/5] perf:add a script shows a process of packet Koki Sanagi
2010-08-24  3:53   ` David Miller
2010-09-07 16:57   ` Frederic Weisbecker
2010-09-08  8:35   ` [tip:perf/core] perf: Add a script to show packets processing tip-bot for Koki Sanagi
2010-08-30 23:50 ` [PATCH v4 0/5] netdev: show a process of packets Steven Rostedt
2010-09-03  2:10   ` Koki Sanagi
2010-09-03  2:17     ` David Miller
2010-09-03  2:55       ` Koki Sanagi
2010-09-03  4:46         ` Frederic Weisbecker
2010-09-03  5:12           ` Koki Sanagi