* [PATCH v4 0/5] netdev: show a process of packets
@ 2010-08-23  9:41 Koki Sanagi
  2010-08-23  9:42 ` [PATCH v4 1/5] irq: add tracepoint to softirq_raise Koki Sanagi
  ` (5 more replies)
  0 siblings, 6 replies; 93+ messages in thread
From: Koki Sanagi @ 2010-08-23 9:41 UTC (permalink / raw)
To: netdev
Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec, mathieu.desnoyers

Rebase to the latest net-next.

CHANGE-LOG since v3:
1) Change the arguments of the softirq tracepoint back to the original ones.
2) Remove the tracepoints from dev_kfree_skb_irq and skb_free_datagram_locked, and add trace_kfree_skb before __kfree_skb instead.
3) Add a tracepoint to netif_rx and display it in the netdev-times script.

This patch set adds tracepoints that show the progress of packets. Using these tracepoints together with existing ones, we can get the time when a packet passes through certain points of the transmit or receive sequence. For example, this is output of the perf script added by patch 5/5.

106133.171439sec cpu=0
irq_entry(+0.000msec irq=24:eth4)
           |
softirq_entry(+0.006msec)
           |
           |---netif_receive_skb(+0.010msec skb=f2d15900 len=100)
           |            |
           |            skb_copy_datagram_iovec(+0.039msec 10291::10291)
           |
napi_poll_exit(+0.022msec eth4)

106134.175634sec cpu=1
irq_entry(+0.000msec irq=28:eth1)
           |
           |---netif_rx(+0.009msec skb=f3ef0a00)
           |
softirq_entry(+0.018msec)
           |
           |---netif_receive_skb(+0.021msec skb=f3ef0a00 len=84)
           |            |
           |            skb_copy_datagram_iovec(+0.033msec 0:swapper)
           |
napi_poll_exit(+0.035msec (no_device))

The above is the receive side (eth4 uses NAPI; eth1 does not). As shown, it traces the receive sequence from interrupt (irq_entry) to application (skb_copy_datagram_iovec). The script shows one NET_RX softirq and the events related to it. All relative times are based on the first irq_entry that raised the NET_RX softirq.
dev    len  Qdisc             netdevice  free
eth4    74  106125.030004sec  0.006msec  0.009msec
eth4    87  106125.041020sec  0.007msec  0.023msec
eth4    66  106125.042291sec  0.003msec  0.012msec
eth4    66  106125.043274sec  0.006msec  0.004msec
eth4   850  106125.044283sec  0.007msec  0.018msec

The above is the transmit side. There are three check points. Point 1 is before a packet is put into the Qdisc. Point 2 is after ndo_start_xmit in dev_hard_start_xmit; it indicates that the packet has been handed to the driver. Point 3 is in consume_skb and kfree_skb; it indicates that the transmitted packet has been freed. The columns of this script are, from left: device name, packet length, the time of point 1, the interval between point 1 and point 2, and the interval between point 2 and point 3.

These times are useful for analyzing performance or locating where a packet is delayed. For example:
- The NET_RX softirq is raised late.
- The application is late to pick up a packet.
- It takes a long time to hand a transmitted packet to the driver (which may be caused by a packed queue).

These tracepoints also help us investigate network driver trouble from a memory dump, because ftrace records events in memory, and ftrace is lightweight even when tracing is always on. So they are useful when investigating a problem that does not reproduce.

Thanks,
Koki Sanagi.

^ permalink raw reply	[flat|nested] 93+ messages in thread
* [PATCH v4 1/5] irq: add tracepoint to softirq_raise
  2010-08-23  9:41 [PATCH v4 0/5] netdev: show a process of packets Koki Sanagi
@ 2010-08-23  9:42 ` Koki Sanagi
  2010-09-03 15:29   ` Frederic Weisbecker
  2010-09-08  8:33   ` [tip:perf/core] irq: Add " tip-bot for Lai Jiangshan
  2010-08-23  9:43 ` [PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT Koki Sanagi
  ` (4 subsequent siblings)
  5 siblings, 2 replies; 93+ messages in thread
From: Koki Sanagi @ 2010-08-23 9:42 UTC (permalink / raw)
To: netdev
Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec, mathieu.desnoyers

From: Lai Jiangshan <laijs@cn.fujitsu.com>

Add a tracepoint for tracing when a softirq action is raised.

This and the existing tracepoints complete softirq's tracepoints:
softirq_raise, softirq_entry and softirq_exit.

And when this tracepoint is used in combination with
the softirq_entry tracepoint we can determine
the softirq raise latency.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

[ factorize softirq events with DECLARE_EVENT_CLASS ]
Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 include/linux/interrupt.h  |    8 +++++++-
 include/trace/events/irq.h |   26 ++++++++++++++++++++++++--
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index a0384a4..d3e8e90 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -18,6 +18,7 @@
 #include <asm/atomic.h>
 #include <asm/ptrace.h>
 #include <asm/system.h>
+#include <trace/events/irq.h>
 
 /*
  * These correspond to the IORESOURCE_IRQ_* defines in
@@ -407,7 +408,12 @@ asmlinkage void do_softirq(void);
 asmlinkage void __do_softirq(void);
 extern void open_softirq(int nr, void (*action)(struct softirq_action *));
 extern void softirq_init(void);
-#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
+static inline void __raise_softirq_irqoff(unsigned int nr)
+{
+	trace_softirq_raise((struct softirq_action *)&nr, NULL);
+	or_softirq_pending(1UL << nr);
+}
+
 extern void raise_softirq_irqoff(unsigned int nr);
 extern void raise_softirq(unsigned int nr);
 extern void wakeup_softirqd(void);
diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
index 0e4cfb6..3ddda02 100644
--- a/include/trace/events/irq.h
+++ b/include/trace/events/irq.h
@@ -5,7 +5,9 @@
 #define _TRACE_IRQ_H
 
 #include <linux/tracepoint.h>
-#include <linux/interrupt.h>
+
+struct irqaction;
+struct softirq_action;
 
 #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
 #define show_softirq_name(val) \
@@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq,
 	),
 
 	TP_fast_assign(
-		__entry->vec = (int)(h - vec);
+		if (vec)
+			__entry->vec = (int)(h - vec);
+		else
+			__entry->vec = *((int *)h);
 	),
 
 	TP_printk("vec=%d [action=%s]",
 		  __entry->vec,
@@ -136,6 +141,23 @@ DEFINE_EVENT(softirq, softirq_exit,
 	TP_ARGS(h, vec)
 );
 
+/**
+ * softirq_raise - called immediately when a softirq is raised
+ * @h: pointer to struct softirq_action
+ * @vec: pointer to first struct softirq_action in softirq_vec array
+ *
+ * When @vec is NULL, @h is a pointer to the raised softirq vector number
+ * rather than to a softirq_action. When used in combination with the
+ * softirq_entry tracepoint we can determine the softirq raise latency.
+ */
+DEFINE_EVENT(softirq, softirq_raise,
+
+	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
+
+	TP_ARGS(h, vec)
+);
+
 #endif /* _TRACE_IRQ_H */
 
 /* This part must be outside protection */

^ permalink raw reply related	[flat|nested] 93+ messages in thread
* Re: [PATCH v4 1/5] irq: add tracepoint to softirq_raise
  2010-08-23  9:42 ` [PATCH v4 1/5] irq: add tracepoint to softirq_raise Koki Sanagi
@ 2010-09-03 15:29 ` Frederic Weisbecker
  2010-09-03 15:39   ` Steven Rostedt
  2010-09-03 15:43   ` Steven Rostedt
  2010-09-08  8:33 ` [tip:perf/core] irq: Add " tip-bot for Lai Jiangshan
  1 sibling, 2 replies; 93+ messages in thread
From: Frederic Weisbecker @ 2010-09-03 15:29 UTC (permalink / raw)
To: Koki Sanagi
Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, mathieu.desnoyers

On Mon, Aug 23, 2010 at 06:42:48PM +0900, Koki Sanagi wrote:
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>
> Add a tracepoint for tracing when softirq action is raised.
>
> It and the existed tracepoints complete softirq's tracepoints:
> softirq_raise, softirq_entry and softirq_exit.
>
> And when this tracepoint is used in combination with
> the softirq_entry tracepoint we can determine
> the softirq raise latency.
>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
>
> [ factorize softirq events with DECLARE_EVENT_CLASS ]
> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> ---
>  include/linux/interrupt.h  |    8 +++++++-
>  include/trace/events/irq.h |   26 ++++++++++++++++++++++++--
>  2 files changed, 31 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index a0384a4..d3e8e90 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -18,6 +18,7 @@
>  #include <asm/atomic.h>
>  #include <asm/ptrace.h>
>  #include <asm/system.h>
> +#include <trace/events/irq.h>
>
>  /*
>   * These correspond to the IORESOURCE_IRQ_* defines in
> @@ -407,7 +408,12 @@ asmlinkage void do_softirq(void);
>  asmlinkage void __do_softirq(void);
>  extern void open_softirq(int nr, void (*action)(struct softirq_action *));
>  extern void softirq_init(void);
> -#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
> +static inline void __raise_softirq_irqoff(unsigned int nr)
> +{
> +	trace_softirq_raise((struct softirq_action *)&nr, NULL);
> +	or_softirq_pending(1UL << nr);
> +}
> +
>  extern void raise_softirq_irqoff(unsigned int nr);
>  extern void raise_softirq(unsigned int nr);
>  extern void wakeup_softirqd(void);
> diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
> index 0e4cfb6..3ddda02 100644
> --- a/include/trace/events/irq.h
> +++ b/include/trace/events/irq.h
> @@ -5,7 +5,9 @@
>  #define _TRACE_IRQ_H
>
>  #include <linux/tracepoint.h>
> -#include <linux/interrupt.h>
> +
> +struct irqaction;
> +struct softirq_action;
>
>  #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
>  #define show_softirq_name(val) \
> @@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq,
>  	),
>
>  	TP_fast_assign(
> -		__entry->vec = (int)(h - vec);
> +		if (vec)
> +			__entry->vec = (int)(h - vec);
> +		else
> +			__entry->vec = *((int *)h);
>  	),

It seems that this will break softirq_entry/exit tracepoints.
__entry->vec will deref vec->action() for these two, which is not
what we want.

If you can't have the same tracepoint signature for the three, just
split the new one in a separate TRACE_EVENT().

Thanks.

^ permalink raw reply	[flat|nested] 93+ messages in thread
* Re: [PATCH v4 1/5] irq: add tracepoint to softirq_raise
  2010-09-03 15:29 ` Frederic Weisbecker
@ 2010-09-03 15:39 ` Steven Rostedt
  2010-09-03 15:42   ` Frederic Weisbecker
  2010-09-03 15:43 ` Steven Rostedt
  1 sibling, 1 reply; 93+ messages in thread
From: Steven Rostedt @ 2010-09-03 15:39 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Koki Sanagi, netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet, mathieu.desnoyers

On Fri, 2010-09-03 at 17:29 +0200, Frederic Weisbecker wrote:
> >  #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
> >  #define show_softirq_name(val) \
> > @@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq,
> >  	),
> >
> >  	TP_fast_assign(
> > -		__entry->vec = (int)(h - vec);
> > +		if (vec)
> > +			__entry->vec = (int)(h - vec);
> > +		else
> > +			__entry->vec = *((int *)h);
> >  	),
>
> It seems that this will break softirq_entry/exit tracepoints.
> __entry->vec will deref vec->action() for these two, which is not
> what we want.

But for trace_softirq_entry and trace_softirq_exit, vec will not be
NULL.

> If you can't have the same tracepoint signature for the three, just
> split the new one in a separate TRACE_EVENT().

It may be a bit of a hack, and adding another TRACE_EVENT() is
questionable. There still is a pretty good space savings in using
DEFINE_EVENT() over TRACE_EVENT() though.

-- Steve

^ permalink raw reply	[flat|nested] 93+ messages in thread
* Re: [PATCH v4 1/5] irq: add tracepoint to softirq_raise
  2010-09-03 15:39 ` Steven Rostedt
@ 2010-09-03 15:42 ` Frederic Weisbecker
  0 siblings, 0 replies; 93+ messages in thread
From: Frederic Weisbecker @ 2010-09-03 15:42 UTC (permalink / raw)
To: Steven Rostedt
Cc: Koki Sanagi, netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet, mathieu.desnoyers

On Fri, Sep 03, 2010 at 11:39:36AM -0400, Steven Rostedt wrote:
> On Fri, 2010-09-03 at 17:29 +0200, Frederic Weisbecker wrote:
>
> > >  #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
> > >  #define show_softirq_name(val) \
> > > @@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq,
> > >  	),
> > >
> > >  	TP_fast_assign(
> > > -		__entry->vec = (int)(h - vec);
> > > +		if (vec)
> > > +			__entry->vec = (int)(h - vec);
> > > +		else
> > > +			__entry->vec = *((int *)h);
> > >  	),
> >
> > It seems that this will break softirq_entry/exit tracepoints.
> > __entry->vec will deref vec->action() for these two, which is not
> > what we want.
>
> But for trace_softirq_entry and trace_softirq_exit, vec will not be
> NULL.

Oh right... /me slaps his forehead

> > If you can't have the same tracepoint signature for the three, just
> > split the new one in a separate TRACE_EVENT().
>
> It may be a bit of a hack, and questionable about adding another
> TRACE_EVENT(). There still is a pretty good space savings in using
> DEFINE_EVENT() over TRACE_EVENT() though.

Yeah, let's keep it as is.

^ permalink raw reply	[flat|nested] 93+ messages in thread
* Re: [PATCH v4 1/5] irq: add tracepoint to softirq_raise
  2010-09-03 15:29 ` Frederic Weisbecker
  2010-09-03 15:39 ` Steven Rostedt
@ 2010-09-03 15:43 ` Steven Rostedt
  2010-09-03 15:50   ` Frederic Weisbecker
  1 sibling, 1 reply; 93+ messages in thread
From: Steven Rostedt @ 2010-09-03 15:43 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Koki Sanagi, netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet, mathieu.desnoyers

On Fri, 2010-09-03 at 17:29 +0200, Frederic Weisbecker wrote:
> >  /*
> >   * These correspond to the IORESOURCE_IRQ_* defines in
> > @@ -407,7 +408,12 @@ asmlinkage void do_softirq(void);
> >  asmlinkage void __do_softirq(void);
> >  extern void open_softirq(int nr, void (*action)(struct softirq_action *));
> >  extern void softirq_init(void);
> > -#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
> > +static inline void __raise_softirq_irqoff(unsigned int nr)
> > +{
> > +	trace_softirq_raise((struct softirq_action *)&nr, NULL);

Perhaps doing:

	trace_softirq_raise((struct softirq_action *)((unsigned long)nr),
			    NULL);

and ...

> > +	or_softirq_pending(1UL << nr);
> > +}
> > +
> >  extern void raise_softirq_irqoff(unsigned int nr);
> >  extern void raise_softirq(unsigned int nr);
> >  extern void wakeup_softirqd(void);
> > diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
> > index 0e4cfb6..3ddda02 100644
> > --- a/include/trace/events/irq.h
> > +++ b/include/trace/events/irq.h
> > @@ -5,7 +5,9 @@
> >  #define _TRACE_IRQ_H
> >
> >  #include <linux/tracepoint.h>
> > -#include <linux/interrupt.h>
> > +
> > +struct irqaction;
> > +struct softirq_action;
> >
> >  #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
> >  #define show_softirq_name(val) \
> > @@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq,
> >  	),
> >
> >  	TP_fast_assign(
> > -		__entry->vec = (int)(h - vec);
> > +		if (vec)
> > +			__entry->vec = (int)(h - vec);
> > +		else
> > +			__entry->vec = *((int *)h);

	__entry->vec = (int)h;

would be better.

> >  	),
>
> It seems that this will break softirq_entry/exit tracepoints.
> __entry->vec will deref vec->action() for these two, which is not
> what we want.
>
> If you can't have the same tracepoint signature for the three, just
> split the new one in a separate TRACE_EVENT().

Doing the above will at least be a bit safer.

-- Steve

^ permalink raw reply	[flat|nested] 93+ messages in thread
* Re: [PATCH v4 1/5] irq: add tracepoint to softirq_raise
  2010-09-03 15:43 ` Steven Rostedt
@ 2010-09-03 15:50 ` Frederic Weisbecker
  2010-09-06  1:46   ` Koki Sanagi
  0 siblings, 1 reply; 93+ messages in thread
From: Frederic Weisbecker @ 2010-09-03 15:50 UTC (permalink / raw)
To: Steven Rostedt
Cc: Koki Sanagi, netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet, mathieu.desnoyers

On Fri, Sep 03, 2010 at 11:43:12AM -0400, Steven Rostedt wrote:
> On Fri, 2010-09-03 at 17:29 +0200, Frederic Weisbecker wrote:
>
> > >  /*
> > >   * These correspond to the IORESOURCE_IRQ_* defines in
> > > @@ -407,7 +408,12 @@ asmlinkage void do_softirq(void);
> > >  asmlinkage void __do_softirq(void);
> > >  extern void open_softirq(int nr, void (*action)(struct softirq_action *));
> > >  extern void softirq_init(void);
> > > -#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
> > > +static inline void __raise_softirq_irqoff(unsigned int nr)
> > > +{
> > > +	trace_softirq_raise((struct softirq_action *)&nr, NULL);
>
> Perhaps doing:
>
> 	trace_softirq_raise((struct softirq_action *)((unsigned long)nr),
> 			    NULL);
>
> and ...
>
> > > +	or_softirq_pending(1UL << nr);
> > > +}
> > > +
> > >  extern void raise_softirq_irqoff(unsigned int nr);
> > >  extern void raise_softirq(unsigned int nr);
> > >  extern void wakeup_softirqd(void);
> > > diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
> > > index 0e4cfb6..3ddda02 100644
> > > --- a/include/trace/events/irq.h
> > > +++ b/include/trace/events/irq.h
> > > @@ -5,7 +5,9 @@
> > >  #define _TRACE_IRQ_H
> > >
> > >  #include <linux/tracepoint.h>
> > > -#include <linux/interrupt.h>
> > > +
> > > +struct irqaction;
> > > +struct softirq_action;
> > >
> > >  #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
> > >  #define show_softirq_name(val) \
> > > @@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq,
> > >  	),
> > >
> > >  	TP_fast_assign(
> > > -		__entry->vec = (int)(h - vec);
> > > +		if (vec)
> > > +			__entry->vec = (int)(h - vec);
> > > +		else
> > > +			__entry->vec = *((int *)h);
>
> 	__entry->vec = (int)h;
>
> would be better.
>
> > >  	),
> >
> > It seems that this will break softirq_entry/exit tracepoints.
> > __entry->vec will deref vec->action() for these two, which is not
> > what we want.
> >
> > If you can't have the same tracepoint signature for the three, just
> > split the new one in a separate TRACE_EVENT().
>
> Doing the above will at least be a bit safer.

Agreed, I'm going to change that in the patch.

Thanks.

^ permalink raw reply	[flat|nested] 93+ messages in thread
* Re: [PATCH v4 1/5] irq: add tracepoint to softirq_raise
  2010-09-03 15:50 ` Frederic Weisbecker
@ 2010-09-06  1:46 ` Koki Sanagi
  0 siblings, 0 replies; 93+ messages in thread
From: Koki Sanagi @ 2010-09-06 1:46 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Steven Rostedt, netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet, mathieu.desnoyers

(2010/09/04 0:50), Frederic Weisbecker wrote:
> On Fri, Sep 03, 2010 at 11:43:12AM -0400, Steven Rostedt wrote:
>> On Fri, 2010-09-03 at 17:29 +0200, Frederic Weisbecker wrote:
>>
>>>> /*
>>>>  * These correspond to the IORESOURCE_IRQ_* defines in
>>>> @@ -407,7 +408,12 @@ asmlinkage void do_softirq(void);
>>>> asmlinkage void __do_softirq(void);
>>>> extern void open_softirq(int nr, void (*action)(struct softirq_action *));
>>>> extern void softirq_init(void);
>>>> -#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
>>>> +static inline void __raise_softirq_irqoff(unsigned int nr)
>>>> +{
>>>> +	trace_softirq_raise((struct softirq_action *)&nr, NULL);
>>
>> Perhaps doing:
>>
>> 	trace_softirq_raise((struct softirq_action *)((unsigned long)nr),
>> 			    NULL);
>>
>> and ...
>>
>>>> +	or_softirq_pending(1UL << nr);
>>>> +}
>>>> +
>>>> extern void raise_softirq_irqoff(unsigned int nr);
>>>> extern void raise_softirq(unsigned int nr);
>>>> extern void wakeup_softirqd(void);
>>>> diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
>>>> index 0e4cfb6..3ddda02 100644
>>>> --- a/include/trace/events/irq.h
>>>> +++ b/include/trace/events/irq.h
>>>> @@ -5,7 +5,9 @@
>>>> #define _TRACE_IRQ_H
>>>>
>>>> #include <linux/tracepoint.h>
>>>> -#include <linux/interrupt.h>
>>>> +
>>>> +struct irqaction;
>>>> +struct softirq_action;
>>>>
>>>> #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
>>>> #define show_softirq_name(val) \
>>>> @@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq,
>>>> ),
>>>>
>>>> TP_fast_assign(
>>>> -	__entry->vec = (int)(h - vec);
>>>> +	if (vec)
>>>> +		__entry->vec = (int)(h - vec);
>>>> +	else
>>>> +		__entry->vec = *((int *)h);
>>
>> __entry->vec = (int)h;
>>
>> would be better.
>>
>>>> ),
>>>
>>> It seems that this will break softirq_entry/exit tracepoints.
>>> __entry->vec will deref vec->action() for these two, which is not
>>> what we want.
>>>
>>> If you can't have the same tracepoint signature for the three, just
>>> split the new one in a separate TRACE_EVENT().
>>
>> Doing the above will at least be a bit safer.
>
> Agreed, I'm going to change that in the patch.
>
> Thanks.

I agree.

Thanks,
Koki Sanagi.

^ permalink raw reply	[flat|nested] 93+ messages in thread
* [tip:perf/core] irq: Add tracepoint to softirq_raise
  2010-08-23  9:42 ` [PATCH v4 1/5] irq: add tracepoint to softirq_raise Koki Sanagi
  2010-09-03 15:29 ` Frederic Weisbecker
@ 2010-09-08  8:33 ` tip-bot for Lai Jiangshan
  2010-09-08 11:25   ` [sparc build bug] " Ingo Molnar
  1 sibling, 1 reply; 93+ messages in thread
From: tip-bot for Lai Jiangshan @ 2010-09-08 8:33 UTC (permalink / raw)
To: linux-tip-commits
Cc: mingo, mathieu.desnoyers, sanagi.koki, fweisbec, rostedt, nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel, eric.dumazet, kaneshige.kenji, davem, izumi.taku, kosaki.motohiro

Commit-ID:  2bf2160d8805de64308e2e7c3cd97813cb58ed2f
Gitweb:     http://git.kernel.org/tip/2bf2160d8805de64308e2e7c3cd97813cb58ed2f
Author:     Lai Jiangshan <laijs@cn.fujitsu.com>
AuthorDate: Mon, 23 Aug 2010 18:42:48 +0900
Committer:  Frederic Weisbecker <fweisbec@gmail.com>
CommitDate: Tue, 7 Sep 2010 17:49:34 +0200

irq: Add tracepoint to softirq_raise

Add a tracepoint for tracing when a softirq action is raised.

This and the existing tracepoints complete softirq's tracepoints:
softirq_raise, softirq_entry and softirq_exit.

And when this tracepoint is used in combination with
the softirq_entry tracepoint we can determine
the softirq raise latency.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: David Miller <davem@davemloft.net>
Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com>
Cc: Izumo Taku <izumi.taku@jp.fujitsu.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Scott Mcmillan <scott.a.mcmillan@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
LKML-Reference: <4C724298.4050509@jp.fujitsu.com>
[ factorize softirq events with DECLARE_EVENT_CLASS ]
Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/interrupt.h  |    8 +++++++-
 include/trace/events/irq.h |   26 ++++++++++++++++++++++++--
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index a0384a4..531495d 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -18,6 +18,7 @@
 #include <asm/atomic.h>
 #include <asm/ptrace.h>
 #include <asm/system.h>
+#include <trace/events/irq.h>
 
 /*
  * These correspond to the IORESOURCE_IRQ_* defines in
@@ -407,7 +408,12 @@ asmlinkage void do_softirq(void);
 asmlinkage void __do_softirq(void);
 extern void open_softirq(int nr, void (*action)(struct softirq_action *));
 extern void softirq_init(void);
-#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
+static inline void __raise_softirq_irqoff(unsigned int nr)
+{
+	trace_softirq_raise((struct softirq_action *)(unsigned long)nr, NULL);
+	or_softirq_pending(1UL << nr);
+}
+
 extern void raise_softirq_irqoff(unsigned int nr);
 extern void raise_softirq(unsigned int nr);
 extern void wakeup_softirqd(void);
diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
index 0e4cfb6..6fa7cba 100644
--- a/include/trace/events/irq.h
+++ b/include/trace/events/irq.h
@@ -5,7 +5,9 @@
 #define _TRACE_IRQ_H
 
 #include <linux/tracepoint.h>
-#include <linux/interrupt.h>
+
+struct irqaction;
+struct softirq_action;
 
 #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
 #define show_softirq_name(val) \
@@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq,
 	),
 
 	TP_fast_assign(
-		__entry->vec = (int)(h - vec);
+		if (vec)
+			__entry->vec = (int)(h - vec);
+		else
+			__entry->vec = (int)(long)h;
 	),
 
 	TP_printk("vec=%d [action=%s]",
 		  __entry->vec,
@@ -136,6 +141,23 @@ DEFINE_EVENT(softirq, softirq_exit,
 	TP_ARGS(h, vec)
 );
 
+/**
+ * softirq_raise - called immediately when a softirq is raised
+ * @h: pointer to struct softirq_action
+ * @vec: pointer to first struct softirq_action in softirq_vec array
+ *
+ * When @vec is NULL, @h carries the raised softirq vector number rather
+ * than a softirq_action pointer. When used in combination with the
+ * softirq_entry tracepoint we can determine the softirq raise latency.
+ */
+DEFINE_EVENT(softirq, softirq_raise,
+
+	TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
+
+	TP_ARGS(h, vec)
+);
+
 #endif /* _TRACE_IRQ_H */
 
 /* This part must be outside protection */

^ permalink raw reply related	[flat|nested] 93+ messages in thread
* [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise
  2010-09-08  8:33 ` [tip:perf/core] irq: Add " tip-bot for Lai Jiangshan
@ 2010-09-08 11:25 ` Ingo Molnar
  2010-09-08 12:26   ` [PATCH] irq: Fix circular headers dependency Frederic Weisbecker
  2010-10-18  9:44   ` [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise Peter Zijlstra
  0 siblings, 2 replies; 93+ messages in thread
From: Ingo Molnar @ 2010-09-08 11:25 UTC (permalink / raw)
To: mingo, mathieu.desnoyers, sanagi.koki, fweisbec, rostedt, nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel, eric.dumazet, kaneshige.kenji, davem, izumi.taku, kosaki.motohiro
Cc: linux-tip-commits

* tip-bot for Lai Jiangshan <laijs@cn.fujitsu.com> wrote:

> Commit-ID:  2bf2160d8805de64308e2e7c3cd97813cb58ed2f
> Gitweb:     http://git.kernel.org/tip/2bf2160d8805de64308e2e7c3cd97813cb58ed2f
> Author:     Lai Jiangshan <laijs@cn.fujitsu.com>
> AuthorDate: Mon, 23 Aug 2010 18:42:48 +0900
> Committer:  Frederic Weisbecker <fweisbec@gmail.com>
> CommitDate: Tue, 7 Sep 2010 17:49:34 +0200
>
> irq: Add tracepoint to softirq_raise
>
> Add a tracepoint for tracing when softirq action is raised.
>
> This and the existing tracepoints complete softirq's tracepoints:
> softirq_raise, softirq_entry and softirq_exit.
>
> And when this tracepoint is used in combination with
> the softirq_entry tracepoint we can determine
> the softirq raise latency.
>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> Cc: David Miller <davem@davemloft.net>
> Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com>
> Cc: Izumo Taku <izumi.taku@jp.fujitsu.com>
> Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
> Cc: Scott Mcmillan <scott.a.mcmillan@intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> LKML-Reference: <4C724298.4050509@jp.fujitsu.com>
> [ factorize softirq events with DECLARE_EVENT_CLASS ]
> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> ---
>  include/linux/interrupt.h  |    8 +++++++-
>  include/trace/events/irq.h |   26 ++++++++++++++++++++++++--
>  2 files changed, 31 insertions(+), 3 deletions(-)

FYI, this commit broke the Sparc build:

In file included from /home/mingo/tip/arch/sparc/include/asm/irq_32.h:11,
                 from /home/mingo/tip/arch/sparc/include/asm/irq.h:6,
                 from /home/mingo/tip/include/linux/irqnr.h:10,
                 from /home/mingo/tip/include/linux/irq.h:22,
                 from /home/mingo/tip/include/asm-generic/hardirq.h:6,
                 from /home/mingo/tip/arch/sparc/include/asm/hardirq_32.h:11,
                 from /home/mingo/tip/arch/sparc/include/asm/hardirq.h:6,
                 from /home/mingo/tip/include/linux/hardirq.h:10,
                 from /home/mingo/tip/include/linux/ftrace_event.h:7,
                 from /home/mingo/tip/include/trace/syscall.h:6,
                 from /home/mingo/tip/include/linux/syscalls.h:76,
                 from /home/mingo/tip/init/initramfs.c:9:
/home/mingo/tip/include/linux/interrupt.h: In function '__raise_softirq_irqoff':
/home/mingo/tip/include/linux/interrupt.h:414: error: implicit declaration of function 'local_softirq_pending'
/home/mingo/tip/include/linux/interrupt.h:414: error: lvalue required as left operand of assignment
make[2]: *** [init/initramfs.o] Error 1
make[2]: *** Waiting for unfinished jobs....
In file included from /home/mingo/tip/arch/sparc/include/asm/irq_32.h:11,

	Ingo

^ permalink raw reply	[flat|nested] 93+ messages in thread
* [PATCH] irq: Fix circular headers dependency 2010-09-08 11:25 ` [sparc build bug] " Ingo Molnar @ 2010-09-08 12:26 ` Frederic Weisbecker 2010-09-09 19:54 ` [tip:perf/core] " tip-bot for Frederic Weisbecker 2010-10-18 9:44 ` [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise Peter Zijlstra 1 sibling, 1 reply; 93+ messages in thread From: Frederic Weisbecker @ 2010-09-08 12:26 UTC (permalink / raw) To: Ingo Molnar Cc: mingo, mathieu.desnoyers, sanagi.koki, rostedt, nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel, eric.dumazet, kaneshige.kenji, davem, izumi.taku, kosaki.motohiro, linux-tip-commits On Wed, Sep 08, 2010 at 01:25:29PM +0200, Ingo Molnar wrote: > > * tip-bot for Lai Jiangshan <laijs@cn.fujitsu.com> wrote: > > > Commit-ID: 2bf2160d8805de64308e2e7c3cd97813cb58ed2f > > Gitweb: http://git.kernel.org/tip/2bf2160d8805de64308e2e7c3cd97813cb58ed2f > > Author: Lai Jiangshan <laijs@cn.fujitsu.com> > > AuthorDate: Mon, 23 Aug 2010 18:42:48 +0900 > > Committer: Frederic Weisbecker <fweisbec@gmail.com> > > CommitDate: Tue, 7 Sep 2010 17:49:34 +0200 > > > > irq: Add tracepoint to softirq_raise > > > > Add a tracepoint for tracing when softirq action is raised. > > > > This and the existing tracepoints complete softirq's tracepoints: > > softirq_raise, softirq_entry and softirq_exit. > > > > And when this tracepoint is used in combination with > > the softirq_entry tracepoint we can determine > > the softirq raise latency. 
> > > > Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> > > Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > > Acked-by: Neil Horman <nhorman@tuxdriver.com> > > Cc: David Miller <davem@davemloft.net> > > Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com> > > Cc: Izumo Taku <izumi.taku@jp.fujitsu.com> > > Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com> > > Cc: Lai Jiangshan <laijs@cn.fujitsu.com> > > Cc: Scott Mcmillan <scott.a.mcmillan@intel.com> > > Cc: Steven Rostedt <rostedt@goodmis.org> > > Cc: Eric Dumazet <eric.dumazet@gmail.com> > > LKML-Reference: <4C724298.4050509@jp.fujitsu.com> > > [ factorize softirq events with DECLARE_EVENT_CLASS ] > > Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> > > --- > > include/linux/interrupt.h | 8 +++++++- > > include/trace/events/irq.h | 26 ++++++++++++++++++++++++-- > > 2 files changed, 31 insertions(+), 3 deletions(-) > > FYI, this commit broke the Sparc build: > > In file included from /home/mingo/tip/arch/sparc/include/asm/irq_32.h:11, > from /home/mingo/tip/arch/sparc/include/asm/irq.h:6, > from /home/mingo/tip/include/linux/irqnr.h:10, > from /home/mingo/tip/include/linux/irq.h:22, > from /home/mingo/tip/include/asm-generic/hardirq.h:6, > from /home/mingo/tip/arch/sparc/include/asm/hardirq_32.h:11, > from /home/mingo/tip/arch/sparc/include/asm/hardirq.h:6, > from /home/mingo/tip/include/linux/hardirq.h:10, > from /home/mingo/tip/include/linux/ftrace_event.h:7, > from /home/mingo/tip/include/trace/syscall.h:6, > from /home/mingo/tip/include/linux/syscalls.h:76, > from /home/mingo/tip/init/initramfs.c:9: > /home/mingo/tip/include/linux/interrupt.h: In function '__raise_softirq_irqoff': > /home/mingo/tip/include/linux/interrupt.h:414: error: implicit declaration of function 'local_softirq_pending' > /home/mingo/tip/include/linux/interrupt.h:414: error: lvalue required as left operand of assignment > make[2]: *** 
> [init/initramfs.o] Error 1
> make[2]: *** Waiting for unfinished jobs....
> In file included from /home/mingo/tip/arch/sparc/include/asm/irq_32.h:11,
>
> 	Ingo

Yeah, there is a circular dependency. Does that fix the issue (and if so, does it look sane)?

Thanks.

---
>From fc21eaa02d4a6f0af396af6a106587e61515cd86 Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker <fweisbec@gmail.com>
Date: Wed, 8 Sep 2010 14:17:31 +0200
Subject: [PATCH] irq: Fix circular headers dependency

asm-generic/hardirq.h needs asm/irq.h, which might include
linux/interrupt.h, as in the sparc 32 case. At this point we need the
generic irq_cpustat definitions, but those are included later in
asm-generic/hardirq.h.

So delay the inclusion of irq.h in asm-generic/hardirq.h a bit; it
doesn't need to be included early.

This fixes:

In file included from arch/sparc/include/asm/irq_32.h:11,
                 from arch/sparc/include/asm/irq.h:6,
                 from include/linux/irqnr.h:10,
                 from include/linux/irq.h:22,
                 from include/asm-generic/hardirq.h:6,
                 from arch/sparc/include/asm/hardirq_32.h:11,
                 from arch/sparc/include/asm/hardirq.h:6,
                 from include/linux/hardirq.h:10,
                 from include/linux/ftrace_event.h:7,
                 from include/trace/syscall.h:6,
                 from include/linux/syscalls.h:76,
                 from init/initramfs.c:9:
include/linux/interrupt.h: In function '__raise_softirq_irqoff':
include/linux/interrupt.h:414: error: implicit declaration of function 'local_softirq_pending'
include/linux/interrupt.h:414: error: lvalue required as left operand of assignment

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 include/asm-generic/hardirq.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/asm-generic/hardirq.h b/include/asm-generic/hardirq.h
index 62f5908..04d0a97 100644
--- a/include/asm-generic/hardirq.h
+++ b/include/asm-generic/hardirq.h
@@ -3,13 +3,13 @@
 
 #include <linux/cache.h>
 #include <linux/threads.h>
-#include <linux/irq.h>
 
 typedef struct {
 	unsigned int __softirq_pending;
 } ____cacheline_aligned irq_cpustat_t;
 
 #include <linux/irq_cpustat.h>	/* Standard mappings for irq_cpustat_t above */
+#include <linux/irq.h>
 
 #ifndef ack_bad_irq
 static inline void ack_bad_irq(unsigned int irq)
-- 
1.6.2.3

^ permalink raw reply related	[flat|nested] 93+ messages in thread
* [tip:perf/core] irq: Fix circular headers dependency 2010-09-08 12:26 ` [PATCH] irq: Fix circular headers dependency Frederic Weisbecker @ 2010-09-09 19:54 ` tip-bot for Frederic Weisbecker 0 siblings, 0 replies; 93+ messages in thread From: tip-bot for Frederic Weisbecker @ 2010-09-09 19:54 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, hpa, mingo, sanagi.koki, fweisbec, tglx, laijs, mingo Commit-ID: 3b8fad3e2f5f69bfd8e42d099ca8582fb2342edf Gitweb: http://git.kernel.org/tip/3b8fad3e2f5f69bfd8e42d099ca8582fb2342edf Author: Frederic Weisbecker <fweisbec@gmail.com> AuthorDate: Wed, 8 Sep 2010 14:26:00 +0200 Committer: Ingo Molnar <mingo@elte.hu> CommitDate: Thu, 9 Sep 2010 21:28:58 +0200 irq: Fix circular headers dependency asm-generic/hardirq.h needs asm/irq.h which might include linux/interrupt.h as in the sparc 32 case. At this point we need irq_cpustat generic definitions, but those are included later in asm-generic/hardirq.h. Then delay a bit the inclusion of irq.h from asm-generic/hardirq.h, it doesn't need to be included early. 
This fixes: include/linux/interrupt.h: In function '__raise_softirq_irqoff': include/linux/interrupt.h:414: error: implicit declaration of function 'local_softirq_pending' include/linux/interrupt.h:414: error: lvalue required as left operand of assignment Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Cc: mathieu.desnoyers@efficios.com Cc: rostedt@goodmis.org Cc: nhorman@tuxdriver.com Cc: scott.a.mcmillan@intel.com Cc: eric.dumazet@gmail.com Cc: kaneshige.kenji@jp.fujitsu.com Cc: davem@davemloft.net Cc: izumi.taku@jp.fujitsu.com Cc: kosaki.motohiro@jp.fujitsu.com LKML-Reference: <20100908122557.GA5310@nowhere> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- include/asm-generic/hardirq.h | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/include/asm-generic/hardirq.h b/include/asm-generic/hardirq.h index 62f5908..04d0a97 100644 --- a/include/asm-generic/hardirq.h +++ b/include/asm-generic/hardirq.h @@ -3,13 +3,13 @@ #include <linux/cache.h> #include <linux/threads.h> -#include <linux/irq.h> typedef struct { unsigned int __softirq_pending; } ____cacheline_aligned irq_cpustat_t; #include <linux/irq_cpustat.h> /* Standard mappings for irq_cpustat_t above */ +#include <linux/irq.h> #ifndef ack_bad_irq static inline void ack_bad_irq(unsigned int irq) ^ permalink raw reply related [flat|nested] 93+ messages in thread
* Re: [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise 2010-09-08 11:25 ` [sparc build bug] " Ingo Molnar 2010-09-08 12:26 ` [PATCH] irq: Fix circular headers dependency Frederic Weisbecker @ 2010-10-18 9:44 ` Peter Zijlstra 2010-10-18 10:11 ` Peter Zijlstra 2010-10-18 10:48 ` Peter Zijlstra 1 sibling, 2 replies; 93+ messages in thread From: Peter Zijlstra @ 2010-10-18 9:44 UTC (permalink / raw) To: Ingo Molnar Cc: mingo, mathieu.desnoyers, sanagi.koki, fweisbec, rostedt, nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel, eric.dumazet, kaneshige.kenji, davem, izumi.taku, kosaki.motohiro, linux-tip-commits, Heiko Carstens On Wed, 2010-09-08 at 13:25 +0200, Ingo Molnar wrote: > * tip-bot for Lai Jiangshan <laijs@cn.fujitsu.com> wrote: > > > Commit-ID: 2bf2160d8805de64308e2e7c3cd97813cb58ed2f > > Gitweb: http://git.kernel.org/tip/2bf2160d8805de64308e2e7c3cd97813cb58ed2f > > Author: Lai Jiangshan <laijs@cn.fujitsu.com> > > AuthorDate: Mon, 23 Aug 2010 18:42:48 +0900 > > Committer: Frederic Weisbecker <fweisbec@gmail.com> > > CommitDate: Tue, 7 Sep 2010 17:49:34 +0200 > > > > irq: Add tracepoint to softirq_raise > > > > Add a tracepoint for tracing when softirq action is raised. > > > > This and the existing tracepoints complete softirq's tracepoints: > > softirq_raise, softirq_entry and softirq_exit. > > > > And when this tracepoint is used in combination with > > the softirq_entry tracepoint we can determine > > the softirq raise latency. 
> > > > Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> > > Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > > Acked-by: Neil Horman <nhorman@tuxdriver.com> > > Cc: David Miller <davem@davemloft.net> > > Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com> > > Cc: Izumo Taku <izumi.taku@jp.fujitsu.com> > > Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com> > > Cc: Lai Jiangshan <laijs@cn.fujitsu.com> > > Cc: Scott Mcmillan <scott.a.mcmillan@intel.com> > > Cc: Steven Rostedt <rostedt@goodmis.org> > > Cc: Eric Dumazet <eric.dumazet@gmail.com> > > LKML-Reference: <4C724298.4050509@jp.fujitsu.com> > > [ factorize softirq events with DECLARE_EVENT_CLASS ] > > Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> > > --- > > include/linux/interrupt.h | 8 +++++++- > > include/trace/events/irq.h | 26 ++++++++++++++++++++++++-- > > 2 files changed, 31 insertions(+), 3 deletions(-) > > FYI, this commit broke the Sparc build: > > In file included from /home/mingo/tip/arch/sparc/include/asm/irq_32.h:11, > from /home/mingo/tip/arch/sparc/include/asm/irq.h:6, > from /home/mingo/tip/include/linux/irqnr.h:10, > from /home/mingo/tip/include/linux/irq.h:22, > from /home/mingo/tip/include/asm-generic/hardirq.h:6, > from /home/mingo/tip/arch/sparc/include/asm/hardirq_32.h:11, > from /home/mingo/tip/arch/sparc/include/asm/hardirq.h:6, > from /home/mingo/tip/include/linux/hardirq.h:10, > from /home/mingo/tip/include/linux/ftrace_event.h:7, > from /home/mingo/tip/include/trace/syscall.h:6, > from /home/mingo/tip/include/linux/syscalls.h:76, > from /home/mingo/tip/init/initramfs.c:9: > /home/mingo/tip/include/linux/interrupt.h: In function '__raise_softirq_irqoff': > /home/mingo/tip/include/linux/interrupt.h:414: error: implicit declaration of function 'local_softirq_pending' > /home/mingo/tip/include/linux/interrupt.h:414: error: lvalue required as left operand of assignment > make[2]: *** 
> [init/initramfs.o] Error 1
> make[2]: *** Waiting for unfinished jobs....
> In file included from /home/mingo/tip/arch/sparc/include/asm/irq_32.h:11,

I could build sparc64_defconfig, but s390 is broken for me by this...

^ permalink raw reply	[flat|nested] 93+ messages in thread
* Re: [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise 2010-10-18 9:44 ` [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise Peter Zijlstra @ 2010-10-18 10:11 ` Peter Zijlstra 2010-10-18 10:26 ` Heiko Carstens 2010-10-18 10:48 ` Peter Zijlstra 1 sibling, 1 reply; 93+ messages in thread From: Peter Zijlstra @ 2010-10-18 10:11 UTC (permalink / raw) To: Ingo Molnar Cc: Ingo Molnar, mathieu.desnoyers, sanagi.koki, fweisbec, rostedt, nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel, eric.dumazet, kaneshige.kenji, davem, izumi.taku, kosaki.motohiro, linux-tip-commits, Heiko Carstens On Mon, 2010-10-18 at 11:44 +0200, Peter Zijlstra wrote: > On Wed, 2010-09-08 at 13:25 +0200, Ingo Molnar wrote: > > * tip-bot for Lai Jiangshan <laijs@cn.fujitsu.com> wrote: > > > > > Commit-ID: 2bf2160d8805de64308e2e7c3cd97813cb58ed2f > > > Gitweb: http://git.kernel.org/tip/2bf2160d8805de64308e2e7c3cd97813cb58ed2f > > > Author: Lai Jiangshan <laijs@cn.fujitsu.com> > > > AuthorDate: Mon, 23 Aug 2010 18:42:48 +0900 > > > Committer: Frederic Weisbecker <fweisbec@gmail.com> > > > CommitDate: Tue, 7 Sep 2010 17:49:34 +0200 > > > > > > irq: Add tracepoint to softirq_raise > > > > > > Add a tracepoint for tracing when softirq action is raised. > > > > > > This and the existing tracepoints complete softirq's tracepoints: > > > softirq_raise, softirq_entry and softirq_exit. > > > > > > And when this tracepoint is used in combination with > > > the softirq_entry tracepoint we can determine > > > the softirq raise latency. 
> > > > > > Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> > > > Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > > > Acked-by: Neil Horman <nhorman@tuxdriver.com> > > > Cc: David Miller <davem@davemloft.net> > > > Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com> > > > Cc: Izumo Taku <izumi.taku@jp.fujitsu.com> > > > Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com> > > > Cc: Lai Jiangshan <laijs@cn.fujitsu.com> > > > Cc: Scott Mcmillan <scott.a.mcmillan@intel.com> > > > Cc: Steven Rostedt <rostedt@goodmis.org> > > > Cc: Eric Dumazet <eric.dumazet@gmail.com> > > > LKML-Reference: <4C724298.4050509@jp.fujitsu.com> > > > [ factorize softirq events with DECLARE_EVENT_CLASS ] > > > Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> > > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> > > > --- > > > include/linux/interrupt.h | 8 +++++++- > > > include/trace/events/irq.h | 26 ++++++++++++++++++++++++-- > > > 2 files changed, 31 insertions(+), 3 deletions(-) > > > I could build sparc64_defconfig, but s390 is broken for me by this... the below makes s390 build again, not sure its completely safe for all configs though... Heiko? --- arch/s390/include/asm/hardirq.h | 1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/arch/s390/include/asm/hardirq.h b/arch/s390/include/asm/hardirq.h index 498bc38..9558a71 100644 --- a/arch/s390/include/asm/hardirq.h +++ b/arch/s390/include/asm/hardirq.h @@ -15,7 +15,6 @@ #include <linux/threads.h> #include <linux/sched.h> #include <linux/cache.h> -#include <linux/interrupt.h> #include <asm/lowcore.h> #define local_softirq_pending() (S390_lowcore.softirq_pending) ^ permalink raw reply related [flat|nested] 93+ messages in thread
* Re: [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise 2010-10-18 10:11 ` Peter Zijlstra @ 2010-10-18 10:26 ` Heiko Carstens 0 siblings, 0 replies; 93+ messages in thread From: Heiko Carstens @ 2010-10-18 10:26 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, mathieu.desnoyers, sanagi.koki, fweisbec, rostedt, nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel, eric.dumazet, kaneshige.kenji, davem, izumi.taku, kosaki.motohiro, linux-tip-commits On Mon, Oct 18, 2010 at 12:11:47PM +0200, Peter Zijlstra wrote: > > I could build sparc64_defconfig, but s390 is broken for me by this... > > > the below makes s390 build again, not sure its completely safe for all > configs though... > > Heiko? We have a similar patch in our git tree to fix this issue: http://git390.marist.edu/cgi-bin/gitweb.cgi?p=linux-2.6.git;a=commitdiff;h=b722c7e6ce9b52b58a5488aacfd90936a2720dd9 (link valid until the next rebase ;) > --- > arch/s390/include/asm/hardirq.h | 1 - > 1 files changed, 0 insertions(+), 1 deletions(-) > > diff --git a/arch/s390/include/asm/hardirq.h > b/arch/s390/include/asm/hardirq.h > index 498bc38..9558a71 100644 > --- a/arch/s390/include/asm/hardirq.h > +++ b/arch/s390/include/asm/hardirq.h > @@ -15,7 +15,6 @@ > #include <linux/threads.h> > #include <linux/sched.h> > #include <linux/cache.h> > -#include <linux/interrupt.h> > #include <asm/lowcore.h> > > #define local_softirq_pending() (S390_lowcore.softirq_pending) > ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise 2010-10-18 9:44 ` [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise Peter Zijlstra 2010-10-18 10:11 ` Peter Zijlstra @ 2010-10-18 10:48 ` Peter Zijlstra 2010-10-19 10:58 ` Koki Sanagi 1 sibling, 1 reply; 93+ messages in thread From: Peter Zijlstra @ 2010-10-18 10:48 UTC (permalink / raw) To: Ingo Molnar Cc: Ingo Molnar, mathieu.desnoyers, sanagi.koki, fweisbec, rostedt, nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel, eric.dumazet, kaneshige.kenji, davem, izumi.taku, kosaki.motohiro, linux-tip-commits, Heiko Carstens, Luck, Tony On Mon, 2010-10-18 at 11:44 +0200, Peter Zijlstra wrote: > On Wed, 2010-09-08 at 13:25 +0200, Ingo Molnar wrote: > > * tip-bot for Lai Jiangshan <laijs@cn.fujitsu.com> wrote: > > > > > Commit-ID: 2bf2160d8805de64308e2e7c3cd97813cb58ed2f > > > Gitweb: http://git.kernel.org/tip/2bf2160d8805de64308e2e7c3cd97813cb58ed2f > > > Author: Lai Jiangshan <laijs@cn.fujitsu.com> > > > AuthorDate: Mon, 23 Aug 2010 18:42:48 +0900 > > > Committer: Frederic Weisbecker <fweisbec@gmail.com> > > > CommitDate: Tue, 7 Sep 2010 17:49:34 +0200 > > > > > > irq: Add tracepoint to softirq_raise > > > > > > Add a tracepoint for tracing when softirq action is raised. > > > > > > This and the existing tracepoints complete softirq's tracepoints: > > > softirq_raise, softirq_entry and softirq_exit. > > > > > > And when this tracepoint is used in combination with > > > the softirq_entry tracepoint we can determine > > > the softirq raise latency. 
> > > > > In file included from /home/mingo/tip/arch/sparc/include/asm/irq_32.h:11, > > from /home/mingo/tip/arch/sparc/include/asm/irq.h:6, > > from /home/mingo/tip/include/linux/irqnr.h:10, > > from /home/mingo/tip/include/linux/irq.h:22, > > from /home/mingo/tip/include/asm-generic/hardirq.h:6, > > from /home/mingo/tip/arch/sparc/include/asm/hardirq_32.h:11, > > from /home/mingo/tip/arch/sparc/include/asm/hardirq.h:6, > > from /home/mingo/tip/include/linux/hardirq.h:10, > > from /home/mingo/tip/include/linux/ftrace_event.h:7, > > from /home/mingo/tip/include/trace/syscall.h:6, > > from /home/mingo/tip/include/linux/syscalls.h:76, > > from /home/mingo/tip/init/initramfs.c:9: > > /home/mingo/tip/include/linux/interrupt.h: In function '__raise_softirq_irqoff': > > /home/mingo/tip/include/linux/interrupt.h:414: error: implicit declaration of function 'local_softirq_pending' > > /home/mingo/tip/include/linux/interrupt.h:414: error: lvalue required as left operand of assignment > > make[2]: *** [init/initramfs.o] Error 1 > > make[2]: *** Waiting for unfinished jobs.... > > In file included from /home/mingo/tip/arch/sparc/include/asm/irq_32.h:11, > > I could build sparc64_defconfig, but s390 is broken for me by this... /me being very grumpy @ Lai.. ia64 is broken too! /me goes revert this shite ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise 2010-10-18 10:48 ` Peter Zijlstra @ 2010-10-19 10:58 ` Koki Sanagi 2010-10-19 11:25 ` Peter Zijlstra 2010-10-19 13:00 ` [PATCH] tracing: Cleanup the convoluted softirq tracepoints Thomas Gleixner 0 siblings, 2 replies; 93+ messages in thread From: Koki Sanagi @ 2010-10-19 10:58 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, mathieu.desnoyers, fweisbec, rostedt, nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel, eric.dumazet, kaneshige.kenji, davem, izumi.taku, kosaki.motohiro, linux-tip-commits, Heiko Carstens, Luck, Tony (2010/10/18 19:48), Peter Zijlstra wrote: > On Mon, 2010-10-18 at 11:44 +0200, Peter Zijlstra wrote: >> On Wed, 2010-09-08 at 13:25 +0200, Ingo Molnar wrote: >>> * tip-bot for Lai Jiangshan <laijs@cn.fujitsu.com> wrote: >>> >>>> Commit-ID: 2bf2160d8805de64308e2e7c3cd97813cb58ed2f >>>> Gitweb: http://git.kernel.org/tip/2bf2160d8805de64308e2e7c3cd97813cb58ed2f >>>> Author: Lai Jiangshan <laijs@cn.fujitsu.com> >>>> AuthorDate: Mon, 23 Aug 2010 18:42:48 +0900 >>>> Committer: Frederic Weisbecker <fweisbec@gmail.com> >>>> CommitDate: Tue, 7 Sep 2010 17:49:34 +0200 >>>> >>>> irq: Add tracepoint to softirq_raise >>>> >>>> Add a tracepoint for tracing when softirq action is raised. >>>> >>>> This and the existing tracepoints complete softirq's tracepoints: >>>> softirq_raise, softirq_entry and softirq_exit. >>>> >>>> And when this tracepoint is used in combination with >>>> the softirq_entry tracepoint we can determine >>>> the softirq raise latency. 
>>>> > >>> In file included from /home/mingo/tip/arch/sparc/include/asm/irq_32.h:11, >>> from /home/mingo/tip/arch/sparc/include/asm/irq.h:6, >>> from /home/mingo/tip/include/linux/irqnr.h:10, >>> from /home/mingo/tip/include/linux/irq.h:22, >>> from /home/mingo/tip/include/asm-generic/hardirq.h:6, >>> from /home/mingo/tip/arch/sparc/include/asm/hardirq_32.h:11, >>> from /home/mingo/tip/arch/sparc/include/asm/hardirq.h:6, >>> from /home/mingo/tip/include/linux/hardirq.h:10, >>> from /home/mingo/tip/include/linux/ftrace_event.h:7, >>> from /home/mingo/tip/include/trace/syscall.h:6, >>> from /home/mingo/tip/include/linux/syscalls.h:76, >>> from /home/mingo/tip/init/initramfs.c:9: >>> /home/mingo/tip/include/linux/interrupt.h: In function '__raise_softirq_irqoff': >>> /home/mingo/tip/include/linux/interrupt.h:414: error: implicit declaration of function 'local_softirq_pending' >>> /home/mingo/tip/include/linux/interrupt.h:414: error: lvalue required as left operand of assignment >>> make[2]: *** [init/initramfs.o] Error 1 >>> make[2]: *** Waiting for unfinished jobs.... >>> In file included from /home/mingo/tip/arch/sparc/include/asm/irq_32.h:11, >> >> I could build sparc64_defconfig, but s390 is broken for me by this... > > /me being very grumpy @ Lai.. ia64 is broken too! > > /me goes revert this shite > Now it is fixed. http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commitdiff;h=9f081ce5da2c8af297a0a7d15a57fb4beeed374b;hp=43e3bf203456c4f06bdd6060426976ad2bed9081 ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise
  2010-10-19 10:58 ` Koki Sanagi
@ 2010-10-19 11:25 ` Peter Zijlstra
  2010-10-19 13:00 ` [PATCH] tracing: Cleanup the convoluted softirq tracepoints Thomas Gleixner
  1 sibling, 0 replies; 93+ messages in thread
From: Peter Zijlstra @ 2010-10-19 11:25 UTC (permalink / raw)
To: Koki Sanagi
Cc: Ingo Molnar, mathieu.desnoyers, fweisbec, rostedt, nhorman,
    scott.a.mcmillan, tglx, laijs, hpa, linux-kernel, eric.dumazet,
    kaneshige.kenji, davem, izumi.taku, kosaki.motohiro,
    linux-tip-commits, Heiko Carstens, Luck, Tony

On Tue, 2010-10-19 at 19:58 +0900, Koki Sanagi wrote:
> >> I could build sparc64_defconfig, but s390 is broken for me by
> >> this...
> >
> > /me being very grumpy @ Lai.. ia64 is broken too!
> >
> > /me goes revert this shite
>
> Now it is fixed.
>
> http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commitdiff;h=9f081ce5da2c8af297a0a7d15a57fb4beeed374b;hp=43e3bf203456c4f06bdd6060426976ad2bed9081

No it's not, -tip is still not buildable on s390 and ia64. Please
don't ever pull crap like this again.

^ permalink raw reply	[flat|nested] 93+ messages in thread
* [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 10:58 ` Koki Sanagi 2010-10-19 11:25 ` Peter Zijlstra @ 2010-10-19 13:00 ` Thomas Gleixner 2010-10-19 13:08 ` Peter Zijlstra ` (2 more replies) 1 sibling, 3 replies; 93+ messages in thread From: Thomas Gleixner @ 2010-10-19 13:00 UTC (permalink / raw) To: Koki Sanagi Cc: Peter Zijlstra, Ingo Molnar, mathieu.desnoyers, Frederic Weisbecker, Steven Rostedt, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony With the addition of trace_softirq_raise() the softirq tracepoint got even more convoluted. Why the tracepoints take two pointers to assign an integer is beyond my comprehension. But adding an extra case which treats the first pointer as an unsigned long when the second pointer is NULL including the back and forth type casting is just horrible. Convert the softirq tracepoints to take a single unsigned int argument for the softirq vector number and fix the call sites. 
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- include/linux/interrupt.h | 2 - include/trace/events/irq.h | 54 ++++++++++++++++----------------------------- kernel/softirq.c | 14 ++++++----- 3 files changed, 29 insertions(+), 41 deletions(-) diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index 531495d..0ac1949 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -410,7 +410,7 @@ extern void open_softirq(int nr, void (*action)(struct softirq_action *)); extern void softirq_init(void); static inline void __raise_softirq_irqoff(unsigned int nr) { - trace_softirq_raise((struct softirq_action *)(unsigned long)nr, NULL); + trace_softirq_raise(nr); or_softirq_pending(1UL << nr); } diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h index 6fa7cba..1c09820 100644 --- a/include/trace/events/irq.h +++ b/include/trace/events/irq.h @@ -86,76 +86,62 @@ TRACE_EVENT(irq_handler_exit, DECLARE_EVENT_CLASS(softirq, - TP_PROTO(struct softirq_action *h, struct softirq_action *vec), + TP_PROTO(unsigned int vec_nr), - TP_ARGS(h, vec), + TP_ARGS(vec_nr), TP_STRUCT__entry( - __field( int, vec ) + __field( unsigned int, vec ) ), TP_fast_assign( - if (vec) - __entry->vec = (int)(h - vec); - else - __entry->vec = (int)(long)h; + __entry->vec = vec_nr; ), - TP_printk("vec=%d [action=%s]", __entry->vec, + TP_printk("vec=%u [action=%s]", __entry->vec, show_softirq_name(__entry->vec)) ); /** * softirq_entry - called immediately before the softirq handler - * @h: pointer to struct softirq_action - * @vec: pointer to first struct softirq_action in softirq_vec array + * @vec_nr: softirq vector number * - * The @h parameter, contains a pointer to the struct softirq_action - * which has a pointer to the action handler that is called. By subtracting - * the @vec pointer from the @h pointer, we can determine the softirq - * number. Also, when used in combination with the softirq_exit tracepoint - * we can determine the softirq latency. 
+ * When used in combination with the softirq_exit tracepoint + * we can determine the softirq handler runtine. */ DEFINE_EVENT(softirq, softirq_entry, - TP_PROTO(struct softirq_action *h, struct softirq_action *vec), + TP_PROTO(unsigned int vec_nr), - TP_ARGS(h, vec) + TP_ARGS(vec_nr) ); /** * softirq_exit - called immediately after the softirq handler returns - * @h: pointer to struct softirq_action - * @vec: pointer to first struct softirq_action in softirq_vec array + * @vec_nr: softirq vector number * - * The @h parameter contains a pointer to the struct softirq_action - * that has handled the softirq. By subtracting the @vec pointer from - * the @h pointer, we can determine the softirq number. Also, when used in - * combination with the softirq_entry tracepoint we can determine the softirq - * latency. + * When used in combination with the softirq_entry tracepoint + * we can determine the softirq handler runtine. */ DEFINE_EVENT(softirq, softirq_exit, - TP_PROTO(struct softirq_action *h, struct softirq_action *vec), + TP_PROTO(unsigned int vec_nr), - TP_ARGS(h, vec) + TP_ARGS(vec_nr) ); /** * softirq_raise - called immediately when a softirq is raised - * @h: pointer to struct softirq_action - * @vec: pointer to first struct softirq_action in softirq_vec array + * @vec_nr: softirq vector number * - * The @h parameter contains a pointer to the softirq vector number which is - * raised. @vec is NULL and it means @h includes vector number not - * softirq_action. When used in combination with the softirq_entry tracepoint - * we can determine the softirq raise latency. + * When used in combination with the softirq_entry tracepoint + * we can determine the softirq raise to run latency. 
*/ DEFINE_EVENT(softirq, softirq_raise, - TP_PROTO(struct softirq_action *h, struct softirq_action *vec), + TP_PROTO(unsigned int vec_nr), - TP_ARGS(h, vec) + TP_ARGS(vec_nr) ); #endif /* _TRACE_IRQ_H */ diff --git a/kernel/softirq.c b/kernel/softirq.c index 07b4f1b..c0a9ea5 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -212,18 +212,20 @@ restart: do { if (pending & 1) { + unsigned int vec_nr = h - softirq_vec; int prev_count = preempt_count(); - kstat_incr_softirqs_this_cpu(h - softirq_vec); - trace_softirq_entry(h, softirq_vec); + kstat_incr_softirqs_this_cpu(vec_nr); + + trace_softirq_entry(vec_nr); h->action(h); - trace_softirq_exit(h, softirq_vec); + trace_softirq_exit(vec_nr); if (unlikely(prev_count != preempt_count())) { printk(KERN_ERR "huh, entered softirq %td %s %p" "with preempt_count %08x," - " exited with %08x?\n", h - softirq_vec, - softirq_to_name[h - softirq_vec], - h->action, prev_count, preempt_count()); + " exited with %08x?\n", vec_nr, + softirq_to_name[vec_nr], h->action, + prev_count, preempt_count()); preempt_count() = prev_count; } ^ permalink raw reply related [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 13:00 ` [PATCH] tracing: Cleanup the convoluted softirq tracepoints Thomas Gleixner @ 2010-10-19 13:08 ` Peter Zijlstra 2010-10-19 13:22 ` Mathieu Desnoyers 2010-10-21 14:52 ` [tip:perf/core] " tip-bot for Thomas Gleixner 2 siblings, 0 replies; 93+ messages in thread From: Peter Zijlstra @ 2010-10-19 13:08 UTC (permalink / raw) To: Thomas Gleixner Cc: Koki Sanagi, Ingo Molnar, mathieu.desnoyers, Frederic Weisbecker, Steven Rostedt, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, 2010-10-19 at 15:00 +0200, Thomas Gleixner wrote: > > With the addition of trace_softirq_raise() the softirq tracepoint got > even more convoluted. Why the tracepoints take two pointers to assign > an integer is beyond my comprehension. > > But adding an extra case which treats the first pointer as an unsigned > long when the second pointer is NULL including the back and forth > type casting is just horrible. > > Convert the softirq tracepoints to take a single unsigned int argument > for the softirq vector number and fix the call sites. > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> A much needed cleanup indeed! ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints
  2010-10-19 13:00 ` [PATCH] tracing: Cleanup the convoluted softirq tracepoints Thomas Gleixner
  2010-10-19 13:08 ` Peter Zijlstra
@ 2010-10-19 13:22 ` Mathieu Desnoyers
  2010-10-19 13:41 ` Thomas Gleixner
  2010-10-21 14:52 ` [tip:perf/core] " tip-bot for Thomas Gleixner
  2 siblings, 1 reply; 93+ messages in thread
From: Mathieu Desnoyers @ 2010-10-19 13:22 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker,
    Steven Rostedt, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin,
    LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku,
    kosaki.motohiro, Heiko Carstens, Luck, Tony

* Thomas Gleixner (tglx@linutronix.de) wrote:
> With the addition of trace_softirq_raise() the softirq tracepoint got
> even more convoluted. Why the tracepoints take two pointers to assign
> an integer is beyond my comprehension.
>
> But adding an extra case which treats the first pointer as an unsigned
> long when the second pointer is NULL including the back and forth
> type casting is just horrible.
>
> Convert the softirq tracepoints to take a single unsigned int argument
> for the softirq vector number and fix the call sites.

Well, there was originally a reason for this oddness. In __do_softirq(),
the "h - softirq_vec" computation was not needed outside of the
tracepoint handler in the past, but it now seems to be required with the
new inlined kstat_incr_softirqs_this_cpu(). So yes, thanks to this
recent change, it now makes sense to pull this computation out of the
tracepoints and do it unconditionally in the kernel code.
Feel free to put my: Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Thanks, Mathieu > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > --- > include/linux/interrupt.h | 2 - > include/trace/events/irq.h | 54 ++++++++++++++++----------------------------- > kernel/softirq.c | 14 ++++++----- > 3 files changed, 29 insertions(+), 41 deletions(-) > > diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h > index 531495d..0ac1949 100644 > --- a/include/linux/interrupt.h > +++ b/include/linux/interrupt.h > @@ -410,7 +410,7 @@ extern void open_softirq(int nr, void (*action)(struct softirq_action *)); > extern void softirq_init(void); > static inline void __raise_softirq_irqoff(unsigned int nr) > { > - trace_softirq_raise((struct softirq_action *)(unsigned long)nr, NULL); > + trace_softirq_raise(nr); > or_softirq_pending(1UL << nr); > } > > diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h > index 6fa7cba..1c09820 100644 > --- a/include/trace/events/irq.h > +++ b/include/trace/events/irq.h > @@ -86,76 +86,62 @@ TRACE_EVENT(irq_handler_exit, > > DECLARE_EVENT_CLASS(softirq, > > - TP_PROTO(struct softirq_action *h, struct softirq_action *vec), > + TP_PROTO(unsigned int vec_nr), > > - TP_ARGS(h, vec), > + TP_ARGS(vec_nr), > > TP_STRUCT__entry( > - __field( int, vec ) > + __field( unsigned int, vec ) > ), > > TP_fast_assign( > - if (vec) > - __entry->vec = (int)(h - vec); > - else > - __entry->vec = (int)(long)h; > + __entry->vec = vec_nr; > ), > > - TP_printk("vec=%d [action=%s]", __entry->vec, > + TP_printk("vec=%u [action=%s]", __entry->vec, > show_softirq_name(__entry->vec)) > ); > > /** > * softirq_entry - called immediately before the softirq handler > - * @h: pointer to struct softirq_action > - * @vec: pointer to first struct softirq_action in softirq_vec array > + * @vec_nr: softirq vector number > * > - * The @h parameter, contains a pointer to the struct softirq_action > - * which has a pointer to the action handler 
that is called. By subtracting > - * the @vec pointer from the @h pointer, we can determine the softirq > - * number. Also, when used in combination with the softirq_exit tracepoint > - * we can determine the softirq latency. > + * When used in combination with the softirq_exit tracepoint > + * we can determine the softirq handler runtine. > */ > DEFINE_EVENT(softirq, softirq_entry, > > - TP_PROTO(struct softirq_action *h, struct softirq_action *vec), > + TP_PROTO(unsigned int vec_nr), > > - TP_ARGS(h, vec) > + TP_ARGS(vec_nr) > ); > > /** > * softirq_exit - called immediately after the softirq handler returns > - * @h: pointer to struct softirq_action > - * @vec: pointer to first struct softirq_action in softirq_vec array > + * @vec_nr: softirq vector number > * > - * The @h parameter contains a pointer to the struct softirq_action > - * that has handled the softirq. By subtracting the @vec pointer from > - * the @h pointer, we can determine the softirq number. Also, when used in > - * combination with the softirq_entry tracepoint we can determine the softirq > - * latency. > + * When used in combination with the softirq_entry tracepoint > + * we can determine the softirq handler runtine. > */ > DEFINE_EVENT(softirq, softirq_exit, > > - TP_PROTO(struct softirq_action *h, struct softirq_action *vec), > + TP_PROTO(unsigned int vec_nr), > > - TP_ARGS(h, vec) > + TP_ARGS(vec_nr) > ); > > /** > * softirq_raise - called immediately when a softirq is raised > - * @h: pointer to struct softirq_action > - * @vec: pointer to first struct softirq_action in softirq_vec array > + * @vec_nr: softirq vector number > * > - * The @h parameter contains a pointer to the softirq vector number which is > - * raised. @vec is NULL and it means @h includes vector number not > - * softirq_action. When used in combination with the softirq_entry tracepoint > - * we can determine the softirq raise latency. 
> + * When used in combination with the softirq_entry tracepoint > + * we can determine the softirq raise to run latency. > */ > DEFINE_EVENT(softirq, softirq_raise, > > - TP_PROTO(struct softirq_action *h, struct softirq_action *vec), > + TP_PROTO(unsigned int vec_nr), > > - TP_ARGS(h, vec) > + TP_ARGS(vec_nr) > ); > > #endif /* _TRACE_IRQ_H */ > diff --git a/kernel/softirq.c b/kernel/softirq.c > index 07b4f1b..c0a9ea5 100644 > --- a/kernel/softirq.c > +++ b/kernel/softirq.c > @@ -212,18 +212,20 @@ restart: > > do { > if (pending & 1) { > + unsigned int vec_nr = h - softirq_vec; > int prev_count = preempt_count(); > - kstat_incr_softirqs_this_cpu(h - softirq_vec); > > - trace_softirq_entry(h, softirq_vec); > + kstat_incr_softirqs_this_cpu(vec_nr); > + > + trace_softirq_entry(vec_nr); > h->action(h); > - trace_softirq_exit(h, softirq_vec); > + trace_softirq_exit(vec_nr); > if (unlikely(prev_count != preempt_count())) { > printk(KERN_ERR "huh, entered softirq %td %s %p" > "with preempt_count %08x," > - " exited with %08x?\n", h - softirq_vec, > - softirq_to_name[h - softirq_vec], > - h->action, prev_count, preempt_count()); > + " exited with %08x?\n", vec_nr, > + softirq_to_name[vec_nr], h->action, > + prev_count, preempt_count()); > preempt_count() = prev_count; > } > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 93+ messages in thread
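The `vec_nr = h - softirq_vec` computation the patch moves into `__do_softirq()` is plain C pointer subtraction. A minimal user-space sketch of just that arithmetic (the struct layout, array size, and helper name here are illustrative stand-ins, not the kernel's definitions):

```c
#include <assert.h>

/* Stand-in for the kernel's struct; the real one lives in
 * include/linux/interrupt.h and this sketch only mirrors its shape. */
struct softirq_action {
	void (*action)(struct softirq_action *);
};

#define NR_SOFTIRQS 10

static struct softirq_action softirq_vec[NR_SOFTIRQS];

/* What "unsigned int vec_nr = h - softirq_vec;" computes: subtracting
 * two pointers into the same array yields the element index, with the
 * scaling by sizeof(struct softirq_action) done by the compiler. */
static unsigned int vec_nr_of(struct softirq_action *h)
{
	return (unsigned int)(h - softirq_vec);
}
```

Because the scaling is implicit, the tracepoints can take the resulting index as a plain `unsigned int` instead of carrying two pointers around.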
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 13:22 ` Mathieu Desnoyers @ 2010-10-19 13:41 ` Thomas Gleixner 2010-10-19 13:54 ` Steven Rostedt 2010-10-19 14:00 ` Mathieu Desnoyers 0 siblings, 2 replies; 93+ messages in thread From: Thomas Gleixner @ 2010-10-19 13:41 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, Steven Rostedt, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, 19 Oct 2010, Mathieu Desnoyers wrote: > * Thomas Gleixner (tglx@linutronix.de) wrote: > > With the addition of trace_softirq_raise() the softirq tracepoint got > > even more convoluted. Why the tracepoints take two pointers to assign > > an integer is beyond my comprehension. > > > > But adding an extra case which treats the first pointer as an unsigned > > long when the second pointer is NULL including the back and forth > > type casting is just horrible. > > > > Convert the softirq tracepoints to take a single unsigned int argument > > for the softirq vector number and fix the call sites. > > Well, there was originally a reason for this oddness. In __do_softirq(), the > "h - softirq_vec" computation was not needed outside of the tracepoint handler > in the past, but it now seems to be required with the new inlined > "kstat_incr_softirqs_this_cpu()". Dudes, a vector computation is hardly a performance problem in that function and definitely not an excuse for designing such horrible interfaces. Thanks, tglx ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 13:41 ` Thomas Gleixner @ 2010-10-19 13:54 ` Steven Rostedt 2010-10-19 14:07 ` Thomas Gleixner 2010-10-19 14:00 ` Mathieu Desnoyers 1 sibling, 1 reply; 93+ messages in thread From: Steven Rostedt @ 2010-10-19 13:54 UTC (permalink / raw) To: Thomas Gleixner Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, 2010-10-19 at 15:41 +0200, Thomas Gleixner wrote: > On Tue, 19 Oct 2010, Mathieu Desnoyers wrote: > Dudes, a vector computation is hardly a performance problem in that > function and definitely not an excuse for designing such horrible > interfaces. Yes, now we can be a bit more liberal. But when these tracepoints were going in, people were watching to make sure they have practically zero impact when tracing was disabled. Now that people are more used to tracepoints, they are more understanding to have cleaner code over that extra few more lines of machine code in the fast path. -- Steve ^ permalink raw reply [flat|nested] 93+ messages in thread
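The "practically zero impact when tracing was disabled" property comes from guarding each tracepoint call site with a branch on a per-tracepoint enable state. A toy user-space model of that guard (the flag and probe names are invented for illustration; the real kernel hides this behind the TRACE_EVENT/DECLARE_TRACE machinery and, later, jump labels):

```c
#include <assert.h>
#include <stdbool.h>

/* One enable flag per tracepoint, flipped when a tracer attaches. */
static bool softirq_entry_enabled;
static int events_recorded;

static void probe_softirq_entry(unsigned int vec_nr)
{
	(void)vec_nr;
	events_recorded++;	/* stand-in for writing an event to the ring buffer */
}

static void do_softirq_step(unsigned int vec_nr)
{
	/* Disabled cost on the fast path is one load+test+branch;
	 * the jump-label work discussed later in this thread replaces
	 * it with a patchable no-op. */
	if (softirq_entry_enabled)
		probe_softirq_entry(vec_nr);
	/* ... h->action(h) would run here ... */
}
```

When the flag is clear the probe body is never reached, which is why the disabled overhead debate centers on the few bytes of guard code rather than the probe itself.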
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 13:54 ` Steven Rostedt @ 2010-10-19 14:07 ` Thomas Gleixner 2010-10-19 14:28 ` Mathieu Desnoyers 2010-10-19 14:46 ` Steven Rostedt 0 siblings, 2 replies; 93+ messages in thread From: Thomas Gleixner @ 2010-10-19 14:07 UTC (permalink / raw) To: Steven Rostedt Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, 19 Oct 2010, Steven Rostedt wrote: > On Tue, 2010-10-19 at 15:41 +0200, Thomas Gleixner wrote: > > On Tue, 19 Oct 2010, Mathieu Desnoyers wrote: > > > Dudes, a vector computation is hardly a performance problem in that > > function and definitely not an excuse for designing such horrible > > interfaces. > > Yes, now we can be a bit more liberal. But when these tracepoints were > going in, people were watching to make sure they have practically zero > impact when tracing was disabled. > > Now that people are more use to tracepoints, they are more understanding > to have cleaner code over that extra few more lines of machine code in > the fast path. The vector computation is compared to the extra tracing induced jumps probably not even measurable. Stop defending horrible coding with handwavy performance and impact arguments. Thanks, tglx ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 14:07 ` Thomas Gleixner @ 2010-10-19 14:28 ` Mathieu Desnoyers 2010-10-19 19:49 ` Thomas Gleixner 2010-10-19 14:46 ` Steven Rostedt 1 sibling, 1 reply; 93+ messages in thread From: Mathieu Desnoyers @ 2010-10-19 14:28 UTC (permalink / raw) To: Thomas Gleixner Cc: Steven Rostedt, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony * Thomas Gleixner (tglx@linutronix.de) wrote: > On Tue, 19 Oct 2010, Steven Rostedt wrote: > > > On Tue, 2010-10-19 at 15:41 +0200, Thomas Gleixner wrote: > > > On Tue, 19 Oct 2010, Mathieu Desnoyers wrote: > > > > > Dudes, a vector computation is hardly a performance problem in that > > > function and definitely not an excuse for designing such horrible > > > interfaces. > > > > Yes, now we can be a bit more liberal. But when these tracepoints were > > going in, people were watching to make sure they have practically zero > > impact when tracing was disabled. > > > > Now that people are more use to tracepoints, they are more understanding > > to have cleaner code over that extra few more lines of machine code in > > the fast path. > > The vector computation is compared to the extra tracing induced jumps > probably not even measurable. Stop defending horrible coding with > handwavy performance and impact arguments. From the moment markers and tracepoints infrastructures were merged, the performance overhead target has been assuming we would eventually be merging "asm goto jump labels", which replace the load+test+branch with a no-op. So compared to a 5 bytes no-op added to the fast path, this vector computation can be expected to have a higher performance impact, because skipping a no-op on modern architectures (x86 at least) adds technically zero cycles.
Agreed, there is still the impact on I$, extra register pressure, some leaf functions becoming non-leaf, and an added function call (which implies an external side-effect, thus acting like a barrier()). But saying that all we do is to provide handwavy performance and impact arguments is a bit much. Until asm goto is more widely deployed and until gcc 4.5 is more widely used, there are some instrumentation sites I am reluctant to consider instrumenting with tracepoints (e.g. all system call entry/exit sites). However, we should not use the cost of the current load+test+branch tracepoint behavior as an excuse for adding extra performance impact to kernel code, because when it will be replaced by asm gotos, all that will be left is the performance impact inappropriately justified as insignificant compared to the impact of the old tracepoint scheme. Thanks, Mathieu > > Thanks, > > tglx -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 14:28 ` Mathieu Desnoyers @ 2010-10-19 19:49 ` Thomas Gleixner 2010-10-19 20:55 ` Steven Rostedt ` (2 more replies) 0 siblings, 3 replies; 93+ messages in thread From: Thomas Gleixner @ 2010-10-19 19:49 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Steven Rostedt, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, 19 Oct 2010, Mathieu Desnoyers wrote: > * Thomas Gleixner (tglx@linutronix.de) wrote: > > On Tue, 19 Oct 2010, Steven Rostedt wrote: > as an excuse for adding extra performance impact to kernel code, because when it > will be replaced by asm gotos, all that will be left is the performance impact > inappropriately justified as insignificant compared to the impact of the old > tracepoint scheme. Can you at one point just stop your tracing lectures and look at the facts ? The impact of a sensible tracepoint design on the code in question before kstat_incr_softirqs_this_cpu() was added would have been a mere _FIVE_ bytes of text. But the original tracepoint code itself is _TWENTY_ bytes of text larger. So we trade horrible code plus 20 bytes text against 5 bytes of text in the hotpath. And you tell me that these _FIVE_ bytes are impacting performance so much that it's significant. Now with kstat_incr_softirqs_this_cpu() the impact is zero, it even removes code. And talking about non impact of disabled trace points. The tracepoint in question which made me look at the code results in deinlining __raise_softirq_irqsoff() in net/dev/core.c. There goes your theory. So no, you _cannot_ tell what impact a tracepoint has in reality except by looking at the assembly output. And what scares me way more is the size of a single tracepoint in a code file. Just adding "trace_softirq_entry(nr);" adds 88 bytes of text. 
So that's optimized tracing code ? All it's supposed to do is: if (enabled) trace_foo(nr); Replace "if (enabled)" with your favourite code patching jump label whatever magic. The above stupid version takes about 28, but the "optimized" tracing code makes that 88. Brilliant. That's inlining utter shite for no good reason. WTF is it necessary to inline all that gunk ? Please spare me the "jump label will make this less intrusive" lecture. I'm not interested at all. Let's instead look at some more facts: #include <linux/interrupt.h> #include <linux/module.h> #include <trace/events/irq.h> static struct softirq_action softirq_vec[NR_SOFTIRQS]; void test(struct softirq_action *h) { trace_softirq_entry(h - softirq_vec); h->action(h); } Compile this code with GCC 4.5 with and without jump labels (zap the select HAVE_ARCH_JUMP_LABEL line in arch/x86/Kconfig) So now the !jumplabel case gives us: ../build/kernel/soft.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 <test>: 0: 55 push %rbp 1: 48 89 e5 mov %rsp,%rbp 4: 41 55 push %r13 6: 49 89 fd mov %rdi,%r13 9: 49 81 ed 00 00 00 00 sub $0x0,%r13 10: 41 54 push %r12 12: 49 c1 ed 03 shr $0x3,%r13 16: 49 89 fc mov %rdi,%r12 19: 53 push %rbx 1a: 48 83 ec 08 sub $0x8,%rsp 1e: 83 3d 00 00 00 00 00 cmpl $0x0,0x0(%rip) # 25 <test+0x25> 25: 74 4d je 74 <test+0x74> 27: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax 2e: 00 00 30: ff 80 44 e0 ff ff incl -0x1fbc(%rax) 36: 48 8b 1d 00 00 00 00 mov 0x0(%rip),%rbx # 3d <test+0x3d> 3d: 48 85 db test %rbx,%rbx 40: 74 13 je 55 <test+0x55> 42: 48 8b 7b 08 mov 0x8(%rbx),%rdi 46: 44 89 ee mov %r13d,%esi 49: ff 13 callq *(%rbx) 4b: 48 83 c3 10 add $0x10,%rbx 4f: 48 83 3b 00 cmpq $0x0,(%rbx) 53: eb eb jmp 40 <test+0x40> 55: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax 5c: 00 00 5e: ff 88 44 e0 ff ff decl -0x1fbc(%rax) 64: 48 8b 80 38 e0 ff ff mov -0x1fc8(%rax),%rax 6b: a8 08 test $0x8,%al 6d: 74 05 je 74 <test+0x74> 6f: e8 00 00 00 00 callq 74 <test+0x74> 74: 4c 89 e7 mov %r12,%rdi 77: 41 
ff 14 24 callq *(%r12) 7b: 58 pop %rax 7c: 5b pop %rbx 7d: 41 5c pop %r12 7f: 41 5d pop %r13 81: c9 leaveq 82: c3 retq The jumplabel=y case gives: ../build/kernel/soft.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 <test>: 0: 55 push %rbp 1: 48 89 e5 mov %rsp,%rbp 4: 41 55 push %r13 6: 49 89 fd mov %rdi,%r13 9: 49 81 ed 00 00 00 00 sub $0x0,%r13 10: 41 54 push %r12 12: 49 c1 ed 03 shr $0x3,%r13 16: 49 89 fc mov %rdi,%r12 19: 53 push %rbx 1a: 48 83 ec 08 sub $0x8,%rsp 1e: e9 00 00 00 00 jmpq 23 <test+0x23> 23: eb 4d jmp 72 <test+0x72> 25: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax 2c: 00 00 2e: ff 80 44 e0 ff ff incl -0x1fbc(%rax) 34: 48 8b 1d 00 00 00 00 mov 0x0(%rip),%rbx # 3b <test+0x3b> 3b: 48 85 db test %rbx,%rbx 3e: 74 13 je 53 <test+0x53> 40: 48 8b 7b 08 mov 0x8(%rbx),%rdi 44: 44 89 ee mov %r13d,%esi 47: ff 13 callq *(%rbx) 49: 48 83 c3 10 add $0x10,%rbx 4d: 48 83 3b 00 cmpq $0x0,(%rbx) 51: eb eb jmp 3e <test+0x3e> 53: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax 5a: 00 00 5c: ff 88 44 e0 ff ff decl -0x1fbc(%rax) 62: 48 8b 80 38 e0 ff ff mov -0x1fc8(%rax),%rax 69: a8 08 test $0x8,%al 6b: 74 05 je 72 <test+0x72> 6d: e8 00 00 00 00 callq 72 <test+0x72> 72: 4c 89 e7 mov %r12,%rdi 75: 41 ff 14 24 callq *(%r12) 79: 58 pop %rax 7a: 5b pop %rbx 7b: 41 5c pop %r12 7d: 41 5d pop %r13 7f: c9 leaveq 80: c3 retq So that saves _TWO_ bytes of text and replaces: - 1e: 83 3d 00 00 00 00 00 cmpl $0x0,0x0(%rip) # 25 <test+0x25> - 25: 74 4d je 74 <test+0x74> + 1e: e9 00 00 00 00 jmpq 23 <test+0x23> + 23: eb 4d jmp 72 <test+0x72> So it trades a conditional vs. two jumps ? WTF ?? I thought that jumplabel magic was supposed to get rid of the jump over the tracing code ? In fact it adds another jump. Whatfor ? Now even worse, when you NOP out the jmpq then your tracepoint is still not enabled. Brilliant ! Did you guys ever look at the assembly output of that insane shite you are advertising with lengthy explanations ? 
Obviously _NOT_ Come back when you can show me a clean implementation of all this crap which reproduces with my jumplabel enabled stock compiler. And please just send me a patch w/o the blurb. And sane looks like: jmpq 2f <---- This gets noped out 1: mov %r12,%rdi callq *(%r12) [whatever cleanup it takes ] leaveq retq 2f: [tracing gunk] jmp 1b And further I want to see the tracing gunk in a minimal size so the net/core/dev.c deinlining does not happen. Thanks, tglx P.S.: It might be helpful and polite if you'd take off your tracing blinkers from time to time. ^ permalink raw reply [flat|nested] 93+ messages in thread
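The layout Thomas asks for, fast path straight through with the tracing gunk moved out of line, can be approximated in plain C with GCC branch hints and the cold attribute. This is a sketch assuming a GCC-compatible compiler, not the actual jump-label implementation, and all identifiers here are invented for illustration:

```c
#include <assert.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

static int tracepoint_key;	/* models the patched jump site: 0 = "noped out" */
static int trace_calls;

/* cold + noinline asks the compiler to keep this body, and ideally its
 * call site, out of the hot text: the "2f: [tracing gunk]; jmp 1b" part. */
static void __attribute__((cold, noinline)) trace_gunk(unsigned int vec_nr)
{
	(void)vec_nr;
	trace_calls++;
}

static void hot_path(unsigned int vec_nr)
{
	if (unlikely(tracepoint_key))
		trace_gunk(vec_nr);
	/* 1: fast path continues here with no tracing code in between */
}
```

The real jump-label mechanism goes one step further by patching the branch itself at runtime, so the disabled fast path contains a no-op rather than a live conditional.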
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 19:49 ` Thomas Gleixner @ 2010-10-19 20:55 ` Steven Rostedt 2010-10-19 21:07 ` Thomas Gleixner 2010-10-19 21:45 ` Thomas Gleixner 2010-10-19 21:16 ` David Daney 2010-10-19 21:28 ` Jason Baron 2 siblings, 2 replies; 93+ messages in thread From: Steven Rostedt @ 2010-10-19 20:55 UTC (permalink / raw) To: Thomas Gleixner Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, 2010-10-19 at 21:49 +0200, Thomas Gleixner wrote: > Let's instead look at some more facts: Sure, this is where it gets fun :-) > > #include <linux/interrupt.h> > #include <linux/module.h> > > #include <trace/events/irq.h> > > static struct softirq_action softirq_vec[NR_SOFTIRQS]; > > void test(struct softirq_action *h) > { > trace_softirq_entry(h - softirq_vec); > > h->action(h); > } Since I don't have your patch yet, I used the original: void test(struct softirq_action *h) { trace_softirq_entry(h, softirq_vec); h->action(h); } > Compile this code with GCC 4.5 with and without jump labels (zap the > select HAVE_ARCH_JUMP_LABEL line in arch/x86/Kconfig) > > So now the !jumplabel case gives us: > > ../build/kernel/soft.o: file format elf64-x86-64 > > Disassembly of section .text: > > 0000000000000000 <test>: > 0: 55 push %rbp > 1: 48 89 e5 mov %rsp,%rbp > 4: 41 55 push %r13 > 6: 49 89 fd mov %rdi,%r13 > 9: 49 81 ed 00 00 00 00 sub $0x0,%r13 > 10: 41 54 push %r12 > 12: 49 c1 ed 03 shr $0x3,%r13 > 16: 49 89 fc mov %rdi,%r12 > 19: 53 push %rbx > 1a: 48 83 ec 08 sub $0x8,%rsp > 1e: 83 3d 00 00 00 00 00 cmpl $0x0,0x0(%rip) # 25 <test+0x25> > 25: 74 4d je 74 <test+0x74> > 27: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax > 2e: 00 00 > 30: ff 80 44 e0 ff ff incl -0x1fbc(%rax) > 36: 48 8b 1d 00 00 00 00 mov 0x0(%rip),%rbx # 3d <test+0x3d> > 3d: 48 
85 db test %rbx,%rbx > 40: 74 13 je 55 <test+0x55> > 42: 48 8b 7b 08 mov 0x8(%rbx),%rdi > 46: 44 89 ee mov %r13d,%esi > 49: ff 13 callq *(%rbx) > 4b: 48 83 c3 10 add $0x10,%rbx > 4f: 48 83 3b 00 cmpq $0x0,(%rbx) > 53: eb eb jmp 40 <test+0x40> > 55: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax > 5c: 00 00 > 5e: ff 88 44 e0 ff ff decl -0x1fbc(%rax) > 64: 48 8b 80 38 e0 ff ff mov -0x1fc8(%rax),%rax > 6b: a8 08 test $0x8,%al > 6d: 74 05 je 74 <test+0x74> > 6f: e8 00 00 00 00 callq 74 <test+0x74> > 74: 4c 89 e7 mov %r12,%rdi > 77: 41 ff 14 24 callq *(%r12) > 7b: 58 pop %rax > 7c: 5b pop %rbx > 7d: 41 5c pop %r12 > 7f: 41 5d pop %r13 > 81: c9 leaveq > 82: c3 retq > > The jumplabel=y case gives: > > ../build/kernel/soft.o: file format elf64-x86-64 > > Disassembly of section .text: > > 0000000000000000 <test>: > 0: 55 push %rbp > 1: 48 89 e5 mov %rsp,%rbp > 4: 41 55 push %r13 > 6: 49 89 fd mov %rdi,%r13 > 9: 49 81 ed 00 00 00 00 sub $0x0,%r13 > 10: 41 54 push %r12 > 12: 49 c1 ed 03 shr $0x3,%r13 > 16: 49 89 fc mov %rdi,%r12 > 19: 53 push %rbx > 1a: 48 83 ec 08 sub $0x8,%rsp > 1e: e9 00 00 00 00 jmpq 23 <test+0x23> > 23: eb 4d jmp 72 <test+0x72> > 25: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax > 2c: 00 00 > 2e: ff 80 44 e0 ff ff incl -0x1fbc(%rax) > 34: 48 8b 1d 00 00 00 00 mov 0x0(%rip),%rbx # 3b <test+0x3b> > 3b: 48 85 db test %rbx,%rbx > 3e: 74 13 je 53 <test+0x53> > 40: 48 8b 7b 08 mov 0x8(%rbx),%rdi > 44: 44 89 ee mov %r13d,%esi > 47: ff 13 callq *(%rbx) > 49: 48 83 c3 10 add $0x10,%rbx > 4d: 48 83 3b 00 cmpq $0x0,(%rbx) > 51: eb eb jmp 3e <test+0x3e> > 53: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax > 5a: 00 00 > 5c: ff 88 44 e0 ff ff decl -0x1fbc(%rax) > 62: 48 8b 80 38 e0 ff ff mov -0x1fc8(%rax),%rax > 69: a8 08 test $0x8,%al > 6b: 74 05 je 72 <test+0x72> > 6d: e8 00 00 00 00 callq 72 <test+0x72> > 72: 4c 89 e7 mov %r12,%rdi > 75: 41 ff 14 24 callq *(%r12) > 79: 58 pop %rax > 7a: 5b pop %rbx > 7b: 41 5c pop %r12 > 7d: 41 5d pop %r13 > 7f: c9 leaveq > 80: c3 retq > > So that saves 
_TWO_ bytes of text and replaces: > > - 1e: 83 3d 00 00 00 00 00 cmpl $0x0,0x0(%rip) # 25 <test+0x25> > - 25: 74 4d je 74 <test+0x74> > + 1e: e9 00 00 00 00 jmpq 23 <test+0x23> > + 23: eb 4d jmp 72 <test+0x72> > > So it trades a conditional vs. two jumps ? WTF ?? Well, the one jmpq is noped out, and the jmp is non conditional. I've always thought a non conditional jmp was faster than a conditional one, since there's no need to go into the branch prediction logic. The CPU can simply skip to the code to jump next. Of counse, this pollutes the I$. > > I thought that jumplabel magic was supposed to get rid of the jump > over the tracing code ? In fact it adds another jump. Whatfor ? Because you do the h - softvec in the tracepoint parameter? I got a different result: Here's the diff. I did a cut -c10- to get rid of the line numbers so I have a better diff. There's still differences due to jump locations, but those are easy to figure out: I diffed nojump vs jump. The '-' is with nojump, the '+' is with jumps. --- /tmp/s2 2010-10-19 16:40:19.000000000 -0400 +++ /tmp/s1 2010-10-19 16:40:23.000000000 -0400 @@ -1,38 +1,33 @@ -00026f0 <test>: +00027a0 <test>: 55 push %rbp 48 89 e5 mov %rsp,%rbp - 48 83 ec 10 sub $0x10,%rsp - 48 89 1c 24 mov %rbx,(%rsp) - 4c 89 64 24 08 mov %r12,0x8(%rsp) - e8 00 00 00 00 callq 2706 <test+0x16> + 41 54 push %r12 + 53 push %rbx + e8 00 00 00 00 callq 27ac <test+0xc> R_X86_64_PC32 mcount-0x4 - 8b 15 00 00 00 00 mov 0x0(%rip),%edx # 270c <test+0x1c> -R_X86_64_PC32 __tracepoint_softirq_entry+0x4 48 89 fb mov %rdi,%rbx vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv - 85 d2 test %edx,%edx - 75 10 jne 2723 <test+0x33> + e9 00 00 00 00 jmpq 27b4 <test+0x14> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ There's the difference with this code. We replaced a test and jump conditional with a single jump that will later be nop'd out. 
48 89 df mov %rbx,%rdi ff 13 callq *(%rbx) - 48 8b 1c 24 mov (%rsp),%rbx - 4c 8b 64 24 08 mov 0x8(%rsp),%r12 + 5b pop %rbx + 41 5c pop %r12 c9 leaveq c3 retq ^^^^^^^^^^^^^^^^^^^ end of the fast path, below is the code that does the tracepoint. + 66 90 xchg %ax,%ax 65 48 8b 04 25 00 00 mov %gs:0x0,%rax 00 00 R_X86_64_32S kernel_stack 83 80 44 e0 ff ff 01 addl $0x1,-0x1fbc(%rax) - e8 00 00 00 00 callq 2738 <test+0x48> + e8 00 00 00 00 callq 27d5 <test+0x35> R_X86_64_PC32 debug_lockdep_rcu_enabled-0x4 85 c0 test %eax,%eax - 74 09 je 2745 <test+0x55> - 80 3d 00 00 00 00 00 cmpb $0x0,0x0(%rip) # 2743 <test+0x53> -R_X86_64_PC32 .bss-0x1 - 74 53 je 2798 <test+0xa8> - 4c 8b 25 00 00 00 00 mov 0x0(%rip),%r12 # 274c <test+0x5c> + 75 57 jne 2830 <test+0x90> + 4c 8b 25 00 00 00 00 mov 0x0(%rip),%r12 # 27e0 <test+0x40> R_X86_64_PC32 __tracepoint_softirq_entry+0x1c 4d 85 e4 test %r12,%r12 - 74 22 je 2773 <test+0x83> + 74 29 je 280e <test+0x6e> 49 8b 04 24 mov (%r12),%rax + 0f 1f 80 00 00 00 00 nopl 0x0(%rax) 49 8b 7c 24 08 mov 0x8(%r12),%rdi 49 83 c4 10 add $0x10,%r12 48 c7 c2 00 00 00 00 mov $0x0,%rdx @@ -41,49 +36,52 @@ ff d0 callq *%rax 49 8b 04 24 mov (%r12),%rax 48 85 c0 test %rax,%rax - 75 e2 jne 2755 <test+0x65> + 75 e2 jne 27f0 <test+0x50> 65 48 8b 04 25 00 00 mov %gs:0x0,%rax 00 00 R_X86_64_32S kernel_stack 83 a8 44 e0 ff ff 01 subl $0x1,-0x1fbc(%rax) 48 8b 80 38 e0 ff ff mov -0x1fc8(%rax),%rax a8 08 test $0x8,%al - 74 85 je 2713 <test+0x23> - e8 00 00 00 00 callq 2793 <test+0xa3> + 74 8b je 27b4 <test+0x14> + e8 00 00 00 00 callq 282e <test+0x8e> R_X86_64_PC32 preempt_schedule-0x4 - e9 7b ff ff ff jmpq 2713 <test+0x23> - 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) - 00 - e8 00 00 00 00 callq 27a5 <test+0xb5> + eb 84 jmp 27b4 <test+0x14> + 80 3d 00 00 00 00 00 cmpb $0x0,0x0(%rip) # 2837 <test+0x97> +R_X86_64_PC32 .bss-0x1 + 75 a0 jne 27d9 <test+0x39> + e8 00 00 00 00 callq 283e <test+0x9e> R_X86_64_PC32 debug_lockdep_rcu_enabled-0x4 85 c0 test %eax,%eax - 74 9c je 2745 
<test+0x55> - 83 3d 00 00 00 00 00 cmpl $0x0,0x0(%rip) # 27b0 <test+0xc0> -R_X86_64_PC32 debug_locks-0x5 - 75 3f jne 27f1 <test+0x101> + 74 97 je 27d9 <test+0x39> + 8b 35 00 00 00 00 mov 0x0(%rip),%esi # 2848 <test+0xa8> +R_X86_64_PC32 debug_locks-0x4 + 85 f6 test %esi,%esi + 75 44 jne 2890 <test+0xf0> 65 48 8b 04 25 00 00 mov %gs:0x0,%rax 00 00 R_X86_64_32S kernel_stack - 83 b8 44 e0 ff ff 00 cmpl $0x0,-0x1fbc(%rax) - 75 81 jne 2745 <test+0x55> + 8b 88 44 e0 ff ff mov -0x1fbc(%rax),%ecx + 85 c9 test %ecx,%ecx + 0f 85 76 ff ff ff jne 27d9 <test+0x39> ff 14 25 00 00 00 00 callq *0x0 R_X86_64_32S pv_irq_ops f6 c4 02 test $0x2,%ah - 0f 84 71 ff ff ff je 2745 <test+0x55> + 0f 84 66 ff ff ff je 27d9 <test+0x39> be 7c 00 00 00 mov $0x7c,%esi 48 c7 c7 00 00 00 00 mov $0x0,%rdi R_X86_64_32S .rodata.str1.1 - c6 05 00 00 00 00 01 movb $0x1,0x0(%rip) # 27e7 <test+0xf7> + c6 05 00 00 00 00 01 movb $0x1,0x0(%rip) # 2886 <test+0xe6> R_X86_64_PC32 .bss-0x1 - e8 00 00 00 00 callq 27ec <test+0xfc> + e8 00 00 00 00 callq 288b <test+0xeb> R_X86_64_PC32 lockdep_rcu_dereference-0x4 - e9 54 ff ff ff jmpq 2745 <test+0x55> + e9 49 ff ff ff jmpq 27d9 <test+0x39> 48 c7 c7 00 00 00 00 mov $0x0,%rdi R_X86_64_32S rcu_sched_lock_map - e8 00 00 00 00 callq 27fd <test+0x10d> + e8 00 00 00 00 callq 289c <test+0xfc> R_X86_64_PC32 lock_is_held-0x4 85 c0 test %eax,%eax - 0f 85 40 ff ff ff jne 2745 <test+0x55> - eb ab jmp 27b2 <test+0xc2> - 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1) - 00 00 + 0f 85 35 ff ff ff jne 27d9 <test+0x39> + eb a6 jmp 284c <test+0xac> + 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) + 00 00 00 > > Now even worse, when you NOP out the jmpq then your tracepoint is > still not enabled. Brilliant ! > > Did you guys ever look at the assembly output of that insane shite you > are advertising with lengthy explanations ? > > Obviously _NOT_ Perhaps so, but as Peter Zijlsta has said, compiling with gcc is a random number generator. Your mileage may vary. 
> > Come back when you can show me a clean imlementation of all this crap > which reproduces with my jumplabel enabled stock compiler. And please > just send me a patch w/o the blurb. > > And sane looks like: > > jmpq 2f <---- This gets noped out > 1: > mov %r12,%rdi > callq *(%r12) > [whatever cleanup it takes ] > leaveq > retq > > 2f: > [tracing gunk] > jmp 1b The above looks like what I have. -- Steve > > And further I want to see the tracing gunk in a minimal size so the > net/core/dev.c deinlining does not happen. > > Thanks, > > tglx > > P.S.: It might be helpful and polite if you'd take off your tracing > blinkers from time to time. ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 20:55 ` Steven Rostedt @ 2010-10-19 21:07 ` Thomas Gleixner 2010-10-19 21:23 ` Steven Rostedt 2010-10-19 21:45 ` Thomas Gleixner 1 sibling, 1 reply; 93+ messages in thread From: Thomas Gleixner @ 2010-10-19 21:07 UTC (permalink / raw) To: Steven Rostedt Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, 19 Oct 2010, Steven Rostedt wrote: > On Tue, 2010-10-19 at 21:49 +0200, Thomas Gleixner wrote: > > So that saves _TWO_ bytes of text and replaces: > > > > - 1e: 83 3d 00 00 00 00 00 cmpl $0x0,0x0(%rip) # 25 <test+0x25> > > - 25: 74 4d je 74 <test+0x74> > > + 1e: e9 00 00 00 00 jmpq 23 <test+0x23> > > + 23: eb 4d jmp 72 <test+0x72> > > > > So it trades a conditional vs. two jumps ? WTF ?? > > Well, the one jmpq is noped out, and the jmp is non conditional. I've What are you smoking ? In case the trace point is enabled the jmpq is there, so it jumps to 23 and jumps from there to 72. In case the trace point is disabled the jmpq is noped out, so it jumps to 72 directly. > always thought a non conditional jmp was faster than a conditional one, I always thought, that at least some of the stuff which comes from tracing folks makes some sense. > since there's no need to go into the branch prediction logic. The CPU > can simply skip to the code to jump next. Of counse, this pollutes the > I$. We might consult Mathieu for further useless blurb on how CPUs work around broken code. Thanks, tglx ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 21:07 ` Thomas Gleixner @ 2010-10-19 21:23 ` Steven Rostedt 2010-10-19 21:48 ` H. Peter Anvin 2010-10-19 22:04 ` Thomas Gleixner 0 siblings, 2 replies; 93+ messages in thread From: Steven Rostedt @ 2010-10-19 21:23 UTC (permalink / raw) To: Thomas Gleixner Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, 2010-10-19 at 23:07 +0200, Thomas Gleixner wrote: > On Tue, 19 Oct 2010, Steven Rostedt wrote: > > On Tue, 2010-10-19 at 21:49 +0200, Thomas Gleixner wrote: > > > So that saves _TWO_ bytes of text and replaces: > > > > > > - 1e: 83 3d 00 00 00 00 00 cmpl $0x0,0x0(%rip) # 25 <test+0x25> > > > - 25: 74 4d je 74 <test+0x74> > > > + 1e: e9 00 00 00 00 jmpq 23 <test+0x23> > > > + 23: eb 4d jmp 72 <test+0x72> > > > > > > So it trades a conditional vs. two jumps ? WTF ?? > > > > Well, the one jmpq is noped out, and the jmp is non conditional. I've > > What are you smoking ? What? Are you saying that conditional jumps are just as fast as non conditional ones? > > In case the trace point is enabled the jmpq is there, so it jumps to > 23 and jumps from there to 72. No, when we dynamically enable the tracepoint, it will jump to 25, not 23. That's what the goto part is about. We add the do_trace label to the table, and we make it point to that location. If we did it as you say, then tracepoints would never be enabled. This is not unlike what we do with the function tracer. The original code points to mcount which simply is: mcount: retq And when we enable the callers, we have it jump to a different function. > > In case the trace point is disabled the jmpq is noped out, so it jumps > to 72 directly. That is correct. 
> > > always thought a non conditional jmp was faster than a conditional one, > > I always thought, that at least some of the stuff which comes from > tracing folks makes some sense. Is it still not making sense? > > > since there's no need to go into the branch prediction logic. The CPU > > can simply skip to the code to jump next. Of course, this pollutes the > > I$. > > We might consult Mathieu for further useless blurb on how CPUs work > around broken code. The code worked fine before, it just was not very pretty. But it seemed that gcc for you inlined the code in the wrong spot. Perhaps it's not a good idea to have something like h - softirq_vec in the parameter of the tracepoint. Not saying that your change is not worth it. It is, because h - softirq_vec is used by others now too. -- Steve ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 21:23 ` Steven Rostedt @ 2010-10-19 21:48 ` H. Peter Anvin 2010-10-19 22:23 ` Steven Rostedt 2010-10-19 22:41 ` Mathieu Desnoyers 2010-10-19 22:04 ` Thomas Gleixner 1 sibling, 2 replies; 93+ messages in thread From: H. Peter Anvin @ 2010-10-19 21:48 UTC (permalink / raw) To: Steven Rostedt Cc: Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On 10/19/2010 02:23 PM, Steven Rostedt wrote: > > But it seemed that gcc for you inlined the code in the wrong spot. > Perhaps it's not a good idea to have the something like h - softirq_vec > in the parameter of the tracepoint. Not saying that your change is not > worth it. It is, because h - softirq_vec is used by others now too. > OK, first of all, there are some serious WTFs here: # define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t" A jump instruction is one of the worst possible NOPs. Why are we doing this? The second thing that I found when implementing static_cpu_has() was that it is actually better to encapsulate the asm goto in a small inline which returns bool (true/false) -- gcc will happily optimize out the variable and only see it as a flow of control thing. I would be very curious if that wouldn't make gcc generate better code in cases like that. gcc 4.5.0 has a bug in that there must be a flowthrough case in the asm goto (you can't have it unconditionally branch one way or the other), so that should be the likely case and accordingly it should be annotated likely() so that gcc doesn't reorder. I suspect in the end one ends up with code like this: static __always_inline __pure bool __switch_point(...) { asm goto("1: " JUMP_LABEL_INITIAL_NOP /* ... 
patching stuff */ : : : : t_jump); return false; t_jump: return true; } #define SWITCH_POINT(x) unlikely(__switch_point(x)) I *suspect* this will resolve the need for hot/cold labels just fine. -hpa ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 21:48 ` H. Peter Anvin @ 2010-10-19 22:23 ` Steven Rostedt 2010-10-19 22:26 ` H. Peter Anvin 2010-10-19 22:27 ` Peter Zijlstra 2010-10-19 22:41 ` Mathieu Desnoyers 1 sibling, 2 replies; 93+ messages in thread From: Steven Rostedt @ 2010-10-19 22:23 UTC (permalink / raw) To: H. Peter Anvin Cc: Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, 2010-10-19 at 14:48 -0700, H. Peter Anvin wrote: > On 10/19/2010 02:23 PM, Steven Rostedt wrote: > > > > But it seemed that gcc for you inlined the code in the wrong spot. > > Perhaps it's not a good idea to have the something like h - softirq_vec > > in the parameter of the tracepoint. Not saying that your change is not > > worth it. It is, because h - softirq_vec is used by others now too. > > > > OK, first of all, there are some serious WTFs here: > > # define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t" > > A jump instruction is one of the worst possible NOPs. Why are we doing > this? Good question. Safety? Jason? This is the initial jumps and are converted on boot up to a better nop. > > The second thing that I found when implementing static_cpu_has() was > that it is actually better to encapsulate the asm goto in a small inline > which returns bool (true/false) -- gcc will happily optimize out the > variable and only see it as a flow of control thing. I would be very > curious if that wouldn't make gcc generate better code in cases like that. > > gcc 4.5.0 has a bug in that there must be a flowthrough case in the asm > goto (you can't have it unconditionally branch one way or the other), so > that should be the likely case and accordingly it should be annotated > likely() so that gcc doesn't reorder. 
I suspect in the end one ends up > with code like this: > > static __always_inline __pure bool __switch_point(...) > { > asm goto("1: " JUMP_LABEL_INITIAL_NOP > /* ... patching stuff */ > : : : : t_jump); > return false; > t_jump: > return true; > } > > #define SWITCH_POINT(x) unlikely(__switch_point(x)) > > I *suspect* this will resolve the need for hot/cold labels just fine. Interesting, we could try this. Thanks! -- Steve ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 22:23 ` Steven Rostedt @ 2010-10-19 22:26 ` H. Peter Anvin 2010-10-19 22:27 ` Peter Zijlstra 1 sibling, 0 replies; 93+ messages in thread From: H. Peter Anvin @ 2010-10-19 22:26 UTC (permalink / raw) To: Steven Rostedt Cc: Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On 10/19/2010 03:23 PM, Steven Rostedt wrote: > On Tue, 2010-10-19 at 14:48 -0700, H. Peter Anvin wrote: >> On 10/19/2010 02:23 PM, Steven Rostedt wrote: >>> >>> But it seemed that gcc for you inlined the code in the wrong spot. >>> Perhaps it's not a good idea to have the something like h - softirq_vec >>> in the parameter of the tracepoint. Not saying that your change is not >>> worth it. It is, because h - softirq_vec is used by others now too. >>> >> >> OK, first of all, there are some serious WTFs here: >> >> # define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t" >> >> A jump instruction is one of the worst possible NOPs. Why are we doing >> this? > > Good question. Safety? Jason? > > This is the initial jumps and are converted on boot up to a better nop. > But it makes absolutely no sense to insert an instruction that suboptimal and then convert it. Start out with a reasonable, universally acceptable, instruction, e.g. LEA on 32 bits and NOPL on 64 bits. >> >> The second thing that I found when implementing static_cpu_has() was >> that it is actually better to encapsulate the asm goto in a small inline >> which returns bool (true/false) -- gcc will happily optimize out the >> variable and only see it as a flow of control thing. I would be very >> curious if that wouldn't make gcc generate better code in cases like that. 
>> >> gcc 4.5.0 has a bug in that there must be a flowthrough case in the asm >> goto (you can't have it unconditionally branch one way or the other), so >> that should be the likely case and accordingly it should be annotated >> likely() so that gcc doesn't reorder. I suspect in the end one ends up >> with code like this: >> >> static __always_inline __pure bool __switch_point(...) >> { >> asm goto("1: " JUMP_LABEL_INITIAL_NOP >> /* ... patching stuff */ >> : : : : t_jump); >> return false; >> t_jump: >> return true; >> } >> >> #define SWITCH_POINT(x) unlikely(__switch_point(x)) >> >> I *suspect* this will resolve the need for hot/cold labels just fine. > > Interesting, we could try this. > It of course also has the nice property that it syntactically looks exactly like any other C conditional. -hpa ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 22:23 ` Steven Rostedt 2010-10-19 22:26 ` H. Peter Anvin @ 2010-10-19 22:27 ` Peter Zijlstra 2010-10-19 23:39 ` H. Peter Anvin 1 sibling, 1 reply; 93+ messages in thread From: Peter Zijlstra @ 2010-10-19 22:27 UTC (permalink / raw) To: Steven Rostedt Cc: H. Peter Anvin, Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Wed, October 20, 2010 12:23 am, Steven Rostedt wrote: >> static __always_inline __pure bool __switch_point(...) >> { >> asm goto("1: " JUMP_LABEL_INITIAL_NOP >> /* ... patching stuff */ >> : : : : t_jump); >> return false; >> t_jump: >> return true; >> } >> >> #define SWITCH_POINT(x) unlikely(__switch_point(x)) >> >> I *suspect* this will resolve the need for hot/cold labels just fine. > > Interesting, we could try this. Due to not actually having a sane key type the above is not easy to implement, but I tried: #define _SWITCH_POINT(x)\ ({ \ __label__ jl_enabled; \ bool ret = true; \ JUMP_LABEL(x, jl_enabled); \ ret = false; \ jl_enabled: \ ret; }) #define SWITCH_POINT(x) unlikely(_SWITCH_POINT(x)) #define COND_STMT(key, stmt) \ do { \ if (SWITCH_POINT(key)) { \ stmt; \ } \ } while (0) and that's still generating these double jumps. ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 22:27 ` Peter Zijlstra @ 2010-10-19 23:39 ` H. Peter Anvin 2010-10-19 23:45 ` Steven Rostedt 2010-10-20 0:43 ` Jason Baron 0 siblings, 2 replies; 93+ messages in thread From: H. Peter Anvin @ 2010-10-19 23:39 UTC (permalink / raw) To: Peter Zijlstra Cc: Steven Rostedt, Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On 10/19/2010 03:27 PM, Peter Zijlstra wrote: > > Due to not actually having a sane key type the above is not easy to > implement, but I tried: > > #define _SWITCH_POINT(x)\ > ({ \ > __label__ jl_enabled; \ > bool ret = true; \ > JUMP_LABEL(x, jl_enabled); \ > ret = false; \ > jl_enabled: \ > ret; }) > > #define SWITCH_POINT(x) unlikely(_SWITCH_POINT(x)) > > #define COND_STMT(key, stmt) \ > do { \ > if (SWITCH_POINT(key)) { \ > stmt; \ > } \ > } while (0) > > > and that's still generating these double jumps. > I just experimented with it, and the ({...}) construct doesn't work, because it looks like a merged flow of control to gcc. Replacing the ({ ... }) with an inline does indeed remove the double jumps. 
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h index b67cb18..2ff829d 100644 --- a/include/linux/jump_label.h +++ b/include/linux/jump_label.h @@ -61,12 +61,22 @@ static inline int jump_label_text_reserved(void *start, void *end) #endif +static __always_inline __pure bool _SWITCH_POINT(void *x) +{ + asm goto("# SWITCH_POINT %0\n\t" + ".byte 0x66,0x66,0x66,0x66,0x90\n" + "1:" + : : "i" (x) : : jl_enabled); + return false; +jl_enabled: + return true; +} + +#define SWITCH_POINT(x) unlikely(_SWITCH_POINT(x)) + #define COND_STMT(key, stmt) \ do { \ - __label__ jl_enabled; \ - JUMP_LABEL(key, jl_enabled); \ - if (0) { \ -jl_enabled: \ + if (SWITCH_POINT(key)) { \ stmt; \ } \ } while (0) The key here seems to be to not use the JUMP_LABEL macro as implemented; I have utterly failed to make JUMP_LABEL() do the right thing. -hpa ^ permalink raw reply related [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 23:39 ` H. Peter Anvin @ 2010-10-19 23:45 ` Steven Rostedt 2010-10-20 0:43 ` Jason Baron 1 sibling, 0 replies; 93+ messages in thread From: Steven Rostedt @ 2010-10-19 23:45 UTC (permalink / raw) To: H. Peter Anvin Cc: Peter Zijlstra, Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, 2010-10-19 at 16:39 -0700, H. Peter Anvin wrote: > The key here seems to be to not use the JUMP_LABEL macro as implemented; > I have utterly failed to make JUMP_LABEL() do the right thing. What happens if you remove the do { } while (0) from JUMP_LABEL, since it now just makes it into an asm()? -- Steve ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 23:39 ` H. Peter Anvin 2010-10-19 23:45 ` Steven Rostedt @ 2010-10-20 0:43 ` Jason Baron 1 sibling, 0 replies; 93+ messages in thread From: Jason Baron @ 2010-10-20 0:43 UTC (permalink / raw) To: H. Peter Anvin Cc: Peter Zijlstra, Steven Rostedt, Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, Oct 19, 2010 at 04:39:07PM -0700, H. Peter Anvin wrote: > On 10/19/2010 03:27 PM, Peter Zijlstra wrote: > > > > Due to not actually having a sane key type the above is not easy to > > implement, but I tried: > > > > #define _SWITCH_POINT(x)\ > > ({ \ > > __label__ jl_enabled; \ > > bool ret = true; \ > > JUMP_LABEL(x, jl_enabled); \ > > ret = false; \ > > jl_enabled: \ > > ret; }) > > > > #define SWITCH_POINT(x) unlikely(_SWITCH_POINT(x)) > > > > #define COND_STMT(key, stmt) \ > > do { \ > > if (SWITCH_POINT(key)) { \ > > stmt; \ > > } \ > > } while (0) > > > > > > and that's still generating these double jumps. > > > > I just experimented with it, and the ({...}) construct doesn't work, > because it looks like a merged flow of control to gcc. > > Replacing the ({ ... }) with an inline does indeed remove the double > jumps. 
> > diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h > index b67cb18..2ff829d 100644 > --- a/include/linux/jump_label.h > +++ b/include/linux/jump_label.h > @@ -61,12 +61,22 @@ static inline int jump_label_text_reserved(void > *start, void *end) > > #endif > > +static __always_inline __pure bool _SWITCH_POINT(void *x) > +{ > + asm goto("# SWITCH_POINT %0\n\t" > + ".byte 0x66,0x66,0x66,0x66,0x90\n" > + "1:" > + : : "i" (x) : : jl_enabled); > + return false; > +jl_enabled: > + return true; > +} > + > +#define SWITCH_POINT(x) unlikely(_SWITCH_POINT(x)) > + > #define COND_STMT(key, stmt) \ > do { \ > - __label__ jl_enabled; \ > - JUMP_LABEL(key, jl_enabled); \ > - if (0) { \ > -jl_enabled: \ > + if (SWITCH_POINT(key)) { \ > stmt; \ > } \ > } while (0) > > > The key here seems to be to not use the JUMP_LABEL macro as implemented; > I have utterly failed to make JUMP_LABEL() do the right thing. > ok, I tried this out for the tracepoint code, but I still seem to be getting the double jump. 
patch: diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h index 1947a12..7bc2537 100644 --- a/include/linux/jump_label.h +++ b/include/linux/jump_label.h @@ -66,12 +66,22 @@ static inline void jump_label_unlock(void) {} #endif +static __always_inline __pure bool _SWITCH_POINT(void *x) +{ + asm goto("# SWITCH_POINT %0\n\t" + ".byte 0x66,0x66,0x66,0x66,0x90\n" + "1:" + : : "i" (x) : : jl_enabled); + return false; +jl_enabled: + return true; +} + +#define SWITCH_POINT(x) unlikely(_SWITCH_POINT(x)) + #define COND_STMT(key, stmt) \ do { \ - __label__ jl_enabled; \ - JUMP_LABEL(key, jl_enabled); \ - if (0) { \ -jl_enabled: \ + if (SWITCH_POINT(key)) { \ stmt; \ } \ } while (0) diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index a4a90b6..1f8d14f 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -146,12 +146,7 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin, extern struct tracepoint __tracepoint_##name; \ static inline void trace_##name(proto) \ { \ - JUMP_LABEL(&__tracepoint_##name.state, do_trace); \ - return; \ -do_trace: \ - __DO_TRACE(&__tracepoint_##name, \ - TP_PROTO(data_proto), \ - TP_ARGS(data_args)); \ + COND_STMT(&__tracepoint_##name.state, __DO_TRACE(&__tracepoint_##name, TP_PROTO(data_proto), TP_ARGS(data_args))); \ } \ static inline int \ register_trace_##name(void (*probe)(data_proto), void *data) \ disassemly: ffffffff810360a6 <set_task_cpu>: ffffffff810360a6: 55 push %rbp ffffffff810360a7: 48 89 e5 mov %rsp,%rbp ffffffff810360aa: 41 55 push %r13 ffffffff810360ac: 41 54 push %r12 ffffffff810360ae: 41 89 f4 mov %esi,%r12d ffffffff810360b1: 53 push %rbx ffffffff810360b2: 48 89 fb mov %rdi,%rbx ffffffff810360b5: 48 81 ec b8 00 00 00 sub $0xb8,%rsp ffffffff810360bc: 66 66 66 66 90 data32 data32 data32 xchg %ax,%ax ffffffff810360c1: eb 19 jmp ffffffff810360dc <set_task_cpu+0x36> ffffffff810360c3: 49 8b 7d 08 mov 0x8(%r13),%rdi ffffffff810360c7: 44 89 e2 mov 
%r12d,%edx ffffffff810360ca: 48 89 de mov %rbx,%rsi ffffffff810360cd: 41 ff 55 00 callq *0x0(%r13) ffffffff810360d1: 49 83 c5 10 add $0x10,%r13 ffffffff810360d5: 49 83 7d 00 00 cmpq $0x0,0x0(%r13) ffffffff810360da: eb 6c jmp ffffffff81036148 <set_task_cpu+0xa2> ffffffff810360dc: 48 8b 43 08 mov 0x8(%rbx),%rax ffffffff810360e0: 44 39 60 18 cmp %r12d,0x18(%rax) ffffffff810360e4: 74 37 je ffffffff8103611d <set_task_cpu+0x77> ffffffff810360e6: 48 ff 83 98 00 00 00 incq 0x98(%rbx) ffffffff810360ed: e9 00 00 00 00 jmpq ffffffff810360f2 <set_task_cpu+0x4c> ffffffff810360f2: eb 29 jmp ffffffff8103611d <set_task_cpu+0x77> ffffffff810360f4: 4c 8d ad 30 ff ff ff lea -0xd0(%rbp),%r13 ffffffff810360fb: 4c 89 ef mov %r13,%rdi ffffffff810360fe: e8 c7 94 ff ff callq ffffffff8102f5ca <perf_fetch_caller_regs> ffffffff81036103: 45 31 c0 xor %r8d,%r8d ffffffff81036106: 4c 89 e9 mov %r13,%rcx ffffffff81036109: ba 01 00 00 00 mov $0x1,%edx ffffffff8103610e: be 01 00 00 00 mov $0x1,%esi ffffffff81036113: bf 04 00 00 00 mov $0x4,%edi ffffffff81036118: e8 67 19 07 00 callq ffffffff810a7a84 <__perf_sw_event> ffffffff8103611d: 44 89 e6 mov %r12d,%esi ffffffff81036120: 48 89 df mov %rbx,%rdi ffffffff81036123: e8 2f 75 ff ff callq ffffffff8102d657 <set_task_rq> ffffffff81036128: 48 8b 43 08 mov 0x8(%rbx),%rax ffffffff8103612c: 44 89 60 18 mov %r12d,0x18(%rax) ffffffff81036130: 48 81 c4 b8 00 00 00 add $0xb8,%rsp ffffffff81036137: 5b pop %rbx ffffffff81036138: 41 5c pop %r12 ffffffff8103613a: 41 5d pop %r13 ffffffff8103613c: c9 leaveq ffffffff8103613d: c3 retq ffffffff8103613e: 4c 8b 2d 3b 26 a9 00 mov 0xa9263b(%rip),%r13 # ffffffff81ac8780 <__tracepoint_sched_migrate_task+0x20> ffffffff81036145: 4d 85 ed test %r13,%r13 ffffffff81036148: 0f 85 75 ff ff ff jne ffffffff810360c3 <set_task_cpu+0x1d> ffffffff8103614e: eb 8c jmp ffffffff810360dc <set_task_cpu+0x36> I'm using gcc (GCC) 4.5.1 20100812 is my patch wrong? thanks, -Jason ^ permalink raw reply related [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 21:48 ` H. Peter Anvin 2010-10-19 22:23 ` Steven Rostedt @ 2010-10-19 22:41 ` Mathieu Desnoyers 2010-10-19 22:49 ` H. Peter Anvin 1 sibling, 1 reply; 93+ messages in thread From: Mathieu Desnoyers @ 2010-10-19 22:41 UTC (permalink / raw) To: H. Peter Anvin Cc: Steven Rostedt, Thomas Gleixner, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony, Jason Baron * H. Peter Anvin (hpa@zytor.com) wrote: > On 10/19/2010 02:23 PM, Steven Rostedt wrote: > > > > But it seemed that gcc for you inlined the code in the wrong spot. > > Perhaps it's not a good idea to have the something like h - softirq_vec > > in the parameter of the tracepoint. Not saying that your change is not > > worth it. It is, because h - softirq_vec is used by others now too. > > > > OK, first of all, there are some serious WTFs here: > > # define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t" > > A jump instruction is one of the worst possible NOPs. Why are we doing > this? This code is dynamically patched at boot time (and module load time) with a better nop, just like the function tracer does. > > The second thing that I found when implementing static_cpu_has() was > that it is actually better to encapsulate the asm goto in a small inline > which returns bool (true/false) -- gcc will happily optimize out the > variable and only see it as a flow of control thing. I would be very > curious if that wouldn't make gcc generate better code in cases like that. > > gcc 4.5.0 has a bug in that there must be a flowthrough case in the asm > goto (you can't have it unconditionally branch one way or the other), so > that should be the likely case and accordingly it should be annotated > likely() so that gcc doesn't reorder. 
I suspect in the end one ends up > with code like this: > > static __always_inline __pure bool __switch_point(...) > { > asm goto("1: " JUMP_LABEL_INITIAL_NOP > /* ... patching stuff */ > : : : : t_jump); > return false; > t_jump: > return true; > } > > #define SWITCH_POINT(x) unlikely(__switch_point(x)) > > I *suspect* this will resolve the need for hot/cold labels just fine. Thanks for the hint! We'll make sure to try it out. Having the ability to force gcc to put the tracepoint in an unlikely branch is deeply needed here. I'm a bit curious about the nop vs jump overhead comparison you are referring to. It is an instruction latency benchmark or throughput benchmark ? Intel's manual "Intel 64 and IA-32 Architectures Optimization Reference Manual" http://www.intel.com/Assets/PDF/manual/248966.pdf Page C-33 (or 577 in the pdf) "7. Selection of conditional jump instructions should be based on the recommendation of section Section 3.4.1, “Branch Prediction Optimization,” to improve the predictability of branches. When branches are predicted successfully, the latency of jcc is effectively zero." So it mentions "jcc", but not jmp. Is there any reason for jmp to have a higher latency than jcc ? In this manual, the latency of predicted jcc is therefore 0 cycle, and its throughput is 0.5 cycle/insn. NOP (page C-29) is stated to have a latency of 0.5 to 1 cycle/insn (depending on the exact HW), and throughput of 0.5 cycle/insn. However, I have not found "jmp" explicitly in this listing. So if we were executing tracepoints in a maze of jumps, we could argue that instruction throughput is the most important there. However, if we expect the common case to be surrounded by some non-ALU instructions, latency tends to become the most important criterion. But I feel I might be missing something important that distinguish "jcc" from "jmp". Thanks, Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. 
http://www.efficios.com ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 22:41 ` Mathieu Desnoyers @ 2010-10-19 22:49 ` H. Peter Anvin 2010-10-19 23:05 ` Steven Rostedt 0 siblings, 1 reply; 93+ messages in thread From: H. Peter Anvin @ 2010-10-19 22:49 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Steven Rostedt, Thomas Gleixner, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony, Jason Baron On 10/19/2010 03:41 PM, Mathieu Desnoyers wrote: >> >> OK, first of all, there are some serious WTFs here: >> >> # define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t" >> >> A jump instruction is one of the worst possible NOPs. Why are we doing >> this? > > This code is dynamically patched at boot time (and module load time) with a > better nop, just like the function tracer does. > That's just ridiculous... start out with something sane and you at least have the chance of not having to patch it. > Intel's manual "Intel 64 and IA-32 Architectures Optimization Reference Manual" > > http://www.intel.com/Assets/PDF/manual/248966.pdf > > Page C-33 (or 577 in the pdf) > > "7. Selection of conditional jump instructions should be based on the > recommendation of section Section 3.4.1, “Branch Prediction Optimization,” to > improve the predictability of branches. When branches are predicted > successfully, the latency of jcc is effectively zero." > > So it mentions "jcc", but not jmp. Is there any reason for jmp to have a higher > latency than jcc ? > > In this manual, the latency of predicted jcc is therefore 0 cycle, and its > throughput is 0.5 cycle/insn. > > NOP (page C-29) is stated to have a latency of 0.5 to 1 cycle/insn (depending on > the exact HW), and throughput of 0.5 cycle/insn. > > However, I have not found "jmp" explicitly in this listing. 
> > So if we were executing tracepoints in a maze of jumps, we could argue that > instruction throughput is the most important there. However, if we expect the > common case to be surrounded by some non-ALU instructions, latency tends to > become the most important criterion. > > But I feel I might be missing something important that distinguish "jcc" from > "jmp". NOP has a latency of 0.5-1.0 cycle/insns, *but has no consumers*. JMP/Jcc does have a consumer -- the IP -- and actually measuring shows that it is much, much worse than NOP and other dummy instructions. -hpa ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 22:49 ` H. Peter Anvin @ 2010-10-19 23:05 ` Steven Rostedt 2010-10-19 23:09 ` H. Peter Anvin 2010-10-20 15:27 ` Jason Baron 0 siblings, 2 replies; 93+ messages in thread From: Steven Rostedt @ 2010-10-19 23:05 UTC (permalink / raw) To: H. Peter Anvin Cc: Mathieu Desnoyers, Thomas Gleixner, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony, Jason Baron On Tue, 2010-10-19 at 15:49 -0700, H. Peter Anvin wrote: > On 10/19/2010 03:41 PM, Mathieu Desnoyers wrote: > >> > >> OK, first of all, there are some serious WTFs here: > >> > >> # define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t" > >> > >> A jump instruction is one of the worst possible NOPs. Why are we doing > >> this? > > > > This code is dynamically patched at boot time (and module load time) with a > > better nop, just like the function tracer does. > > > > That's just ridiculous... start out with something sane and you at least > have the chance of not having to patch it. Yep we can fix this. Jason? > > So if we were executing tracepoints in a maze of jumps, we could argue that > > instruction throughput is the most important there. However, if we expect the > > common case to be surrounded by some non-ALU instructions, latency tends to > > become the most important criterion. > > > > But I feel I might be missing something important that distinguish "jcc" from > > "jmp". > > NOP has a latency of 0.5-1.0 cycle/insns, *but has no consumers*. > > JMP/Jcc does have a consumer -- the IP -- and actually measuring shows > that it is much, much worse than NOP and other dummy instructions. But how does JMP vs Jcc compare? -- Steve ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 23:05 ` Steven Rostedt @ 2010-10-19 23:09 ` H. Peter Anvin 2010-10-20 15:27 ` Jason Baron 1 sibling, 0 replies; 93+ messages in thread From: H. Peter Anvin @ 2010-10-19 23:09 UTC (permalink / raw) To: Steven Rostedt Cc: Mathieu Desnoyers, Thomas Gleixner, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony, Jason Baron On 10/19/2010 04:05 PM, Steven Rostedt wrote: >> >> JMP/Jcc does have a consumer -- the IP -- and actually measuring shows >> that it is much, much worse than NOP and other dummy instructions. > > But how does JMP vs Jcc compare? > *As far as I know* they're the same, except of course that a direct JMP never mispredicts. -hpa ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 23:05 ` Steven Rostedt 2010-10-19 23:09 ` H. Peter Anvin @ 2010-10-20 15:27 ` Jason Baron 2010-10-20 15:41 ` Mathieu Desnoyers 2010-10-25 21:54 ` H. Peter Anvin 1 sibling, 2 replies; 93+ messages in thread From: Jason Baron @ 2010-10-20 15:27 UTC (permalink / raw) To: Steven Rostedt Cc: H. Peter Anvin, Mathieu Desnoyers, Thomas Gleixner, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, Oct 19, 2010 at 07:05:15PM -0400, Steven Rostedt wrote: > On Tue, 2010-10-19 at 15:49 -0700, H. Peter Anvin wrote: > > On 10/19/2010 03:41 PM, Mathieu Desnoyers wrote: > > >> > > >> OK, first of all, there are some serious WTFs here: > > >> > > >> # define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t" > > >> > > >> A jump instruction is one of the worst possible NOPs. Why are we doing > > >> this? > > > > > > This code is dynamically patched at boot time (and module load time) with a > > > better nop, just like the function tracer does. > > > > > > > That's just ridiculous... start out with something sane and you at least > > have the chance of not having to patch it. > > Yep we can fix this. Jason? > sure. The idea of the 'jmp 0' was simply to be an lcd for x86, if there's a better lcd for x86, I'll update it. But note, that since the 'jmp 0' is patched to a better nop at boot, we wouldn't see much gain. And in the boot path we are using 'text_poke_early()', so avoiding that isn't going to improve things much. I've got a few fixup patches in the queue that I'm going to post first, and then I'll take a look at this change. thanks, -Jason ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-20 15:27 ` Jason Baron @ 2010-10-20 15:41 ` Mathieu Desnoyers 2010-10-25 21:54 ` H. Peter Anvin 1 sibling, 0 replies; 93+ messages in thread From: Mathieu Desnoyers @ 2010-10-20 15:41 UTC (permalink / raw) To: Jason Baron Cc: Steven Rostedt, H. Peter Anvin, Thomas Gleixner, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony * Jason Baron (jbaron@redhat.com) wrote: > On Tue, Oct 19, 2010 at 07:05:15PM -0400, Steven Rostedt wrote: > > On Tue, 2010-10-19 at 15:49 -0700, H. Peter Anvin wrote: > > > On 10/19/2010 03:41 PM, Mathieu Desnoyers wrote: > > > >> > > > >> OK, first of all, there are some serious WTFs here: > > > >> > > > >> # define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t" > > > >> > > > >> A jump instruction is one of the worst possible NOPs. Why are we doing > > > >> this? > > > > > > > > This code is dynamically patched at boot time (and module load time) with a > > > > better nop, just like the function tracer does. > > > > > > > > > > That's just ridiculous... start out with something sane and you at least > > > have the chance of not having to patch it. > > > > Yep we can fix this. Jason? > > > > sure. The idea of the 'jmp 0' was simply to be an lcd for x86, if > there's a better lcd for x86, I'll update it. But note, that since the > 'jmp 0' is patched to a better nop at boot, we wouldn't see much gain. > And in the boot path we are using 'text_poke_early()', so avoiding that > isn't going to improve things much. > > I've got a few fixup patches in the queue that I'm going to post first, > and then I'll take a look at this change. One thing to consider here is that some nops are not compatible across all architectures. And it would be safer to use an atomic nops (a single instruction) too. e.g. 
GENERIC_NOP5 in arch/x86/include/asm/nops.h is really 2 instructions, which can cause problems if a concurrent thread is preempted between the two instructions while we patch. arch_init_ideal_nop5() is actually doing the task of finding the best nop, and it falls-back on a 5-byte nop (just like you do). HPA, do you have any recommendation for a 5-byte single-instruction nop that is efficient enough and will work on all x86 (Intel, AMD and other variants) ? Thanks, Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-20 15:27 ` Jason Baron 2010-10-20 15:41 ` Mathieu Desnoyers @ 2010-10-25 21:54 ` H. Peter Anvin 2010-10-25 22:01 ` Mathieu Desnoyers 1 sibling, 1 reply; 93+ messages in thread From: H. Peter Anvin @ 2010-10-25 21:54 UTC (permalink / raw) To: Jason Baron Cc: Steven Rostedt, Mathieu Desnoyers, Thomas Gleixner, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On 10/20/2010 08:27 AM, Jason Baron wrote: > > sure. The idea of the 'jmp 0' was simply to be an lcd for x86, if > there's a better lcd for x86, I'll update it. But note, that since the > 'jmp 0' is patched to a better nop at boot, we wouldn't see much gain. > And in the boot path we are using 'text_poke_early()', so avoiding that > isn't going to improve things much. > It's still a completely unnecessary waste of startup time, and potentially a significant fraction of it. Startup time matters, especially as the number of tracepoints grows. -hpa ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-25 21:54 ` H. Peter Anvin @ 2010-10-25 22:01 ` Mathieu Desnoyers 2010-10-25 22:12 ` H. Peter Anvin 0 siblings, 1 reply; 93+ messages in thread From: Mathieu Desnoyers @ 2010-10-25 22:01 UTC (permalink / raw) To: H. Peter Anvin Cc: Jason Baron, Steven Rostedt, Thomas Gleixner, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony * H. Peter Anvin (hpa@zytor.com) wrote: > On 10/20/2010 08:27 AM, Jason Baron wrote: > > > > sure. The idea of the 'jmp 0' was simply to be an lcd for x86, if > > there's a better lcd for x86, I'll update it. But note, that since the > > 'jmp 0' is patched to a better nop at boot, we wouldn't see much gain. > > And in the boot path we are using 'text_poke_early()', so avoiding that > > isn't going to improve things much. > > > > It's still a completely unnecessary waste of startup time some > potentially significant fraction of the time. Startup time matters, > especially as the number of tracepoints grow. We're still waiting for input for the best single-5-byte-instruction nop that will work on all x86 variants. Please note that the GENERIC_NOP5 is actually two instructions one next to each other, which is not appropriate here. Thanks, Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-25 22:01 ` Mathieu Desnoyers @ 2010-10-25 22:12 ` H. Peter Anvin 2010-10-25 22:19 ` H. Peter Anvin 2010-10-25 22:55 ` Mathieu Desnoyers 0 siblings, 2 replies; 93+ messages in thread From: H. Peter Anvin @ 2010-10-25 22:12 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Jason Baron, Steven Rostedt, Thomas Gleixner, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On 10/25/2010 03:01 PM, Mathieu Desnoyers wrote: > * H. Peter Anvin (hpa@zytor.com) wrote: >> On 10/20/2010 08:27 AM, Jason Baron wrote: >>> >>> sure. The idea of the 'jmp 0' was simply to be an lcd for x86, if >>> there's a better lcd for x86, I'll update it. But note, that since the >>> 'jmp 0' is patched to a better nop at boot, we wouldn't see much gain. >>> And in the boot path we are using 'text_poke_early()', so avoiding that >>> isn't going to improve things much. >>> >> >> It's still a completely unnecessary waste of startup time some >> potentially significant fraction of the time. Startup time matters, >> especially as the number of tracepoints grow. > > We're still waiting for input for the best single-5-byte-instruction nop that > will work on all x86 variants. Please note that the GENERIC_NOP5 is actually two > instructions one next to each other, which is not appropriate here. > On 64 bits, use P6_NOP5; it seems to not suck on any platform. On 32 bits, 3E 8D 74 26 00 (i.e. DS: + GENERIC_NOP4) seems to at least do okay. I can't say these are the *best* (in fact, they are guaranteed not the best on some significant number of chips), but they haven't sucked on any chips I have been able to measure -- and are way faster than JMP. -hpa ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-25 22:12 ` H. Peter Anvin @ 2010-10-25 22:19 ` H. Peter Anvin 2010-10-25 22:55 ` Mathieu Desnoyers 1 sibling, 0 replies; 93+ messages in thread From: H. Peter Anvin @ 2010-10-25 22:19 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Jason Baron, Steven Rostedt, Thomas Gleixner, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On 10/25/2010 03:12 PM, H. Peter Anvin wrote: > > On 64 bits, use P6_NOP5; it seems to not suck on any platform. > > On 32 bits, 3E 8D 74 26 00 (i.e. DS: + GENERIC_NOP4) seems to at least > do okay. > > I can't say these are the *best* (in fact, they are guaranteed not the > best on some significant number of chips), but they haven't sucked on > any chips I have been able to measure -- and are way faster than JMP. > This is pure conjecture, I have not measured it, but I suspect in fact that we could just change the composite nops in nops.h to use a 3E prefix instead of a separate 90 nop. Some platforms will take a penalty on the prefix, but that would be balanced against handling two instructions. The P5 core and others of the same generation might suffer, as it might have been able to do U+V pipe pairing on two instructions which it wouldn't for prefixes. -hpa ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-25 22:12 ` H. Peter Anvin 2010-10-25 22:19 ` H. Peter Anvin @ 2010-10-25 22:55 ` Mathieu Desnoyers 2010-10-26 0:39 ` Steven Rostedt 1 sibling, 1 reply; 93+ messages in thread From: Mathieu Desnoyers @ 2010-10-25 22:55 UTC (permalink / raw) To: H. Peter Anvin Cc: Jason Baron, Steven Rostedt, Thomas Gleixner, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony * H. Peter Anvin (hpa@zytor.com) wrote: > On 10/25/2010 03:01 PM, Mathieu Desnoyers wrote: > > * H. Peter Anvin (hpa@zytor.com) wrote: > >> On 10/20/2010 08:27 AM, Jason Baron wrote: > >>> > >>> sure. The idea of the 'jmp 0' was simply to be an lcd for x86, if > >>> there's a better lcd for x86, I'll update it. But note, that since the > >>> 'jmp 0' is patched to a better nop at boot, we wouldn't see much gain. > >>> And in the boot path we are using 'text_poke_early()', so avoiding that > >>> isn't going to improve things much. > >>> > >> > >> It's still a completely unnecessary waste of startup time some > >> potentially significant fraction of the time. Startup time matters, > >> especially as the number of tracepoints grow. > > > > We're still waiting for input for the best single-5-byte-instruction nop that > > will work on all x86 variants. Please note that the GENERIC_NOP5 is actually two > > instructions one next to each other, which is not appropriate here. > > > > On 64 bits, use P6_NOP5; it seems to not suck on any platform. > > On 32 bits, 3E 8D 74 26 00 (i.e. DS: + GENERIC_NOP4) seems to at least > do okay. > > I can't say these are the *best* (in fact, they are guaranteed not the > best on some significant number of chips), but they haven't sucked on > any chips I have been able to measure -- and are way faster than JMP. Cool, thanks for the info! 
Steven and Jason should probably update their respective infrastructure to use the 32-bit 5-byte nop you propose rather than the 5-byte jump. Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-25 22:55 ` Mathieu Desnoyers @ 2010-10-26 0:39 ` Steven Rostedt 2010-10-26 1:14 ` Mathieu Desnoyers 0 siblings, 1 reply; 93+ messages in thread From: Steven Rostedt @ 2010-10-26 0:39 UTC (permalink / raw) To: Mathieu Desnoyers Cc: H. Peter Anvin, Jason Baron, Thomas Gleixner, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Mon, 2010-10-25 at 18:55 -0400, Mathieu Desnoyers wrote: > * H. Peter Anvin (hpa@zytor.com) wrote: > > On 10/25/2010 03:01 PM, Mathieu Desnoyers wrote: > > > * H. Peter Anvin (hpa@zytor.com) wrote: > > >> On 10/20/2010 08:27 AM, Jason Baron wrote: > > >>> > > >>> sure. The idea of the 'jmp 0' was simply to be an lcd for x86, if > > >>> there's a better lcd for x86, I'll update it. But note, that since the > > >>> 'jmp 0' is patched to a better nop at boot, we wouldn't see much gain. > > >>> And in the boot path we are using 'text_poke_early()', so avoiding that > > >>> isn't going to improve things much. > > >>> > > >> > > >> It's still a completely unnecessary waste of startup time some > > >> potentially significant fraction of the time. Startup time matters, > > >> especially as the number of tracepoints grow. > > > > > > We're still waiting for input for the best single-5-byte-instruction nop that > > > will work on all x86 variants. Please note that the GENERIC_NOP5 is actually two > > > instructions one next to each other, which is not appropriate here. > > > > > > > On 64 bits, use P6_NOP5; it seems to not suck on any platform. > > > > On 32 bits, 3E 8D 74 26 00 (i.e. DS: + GENERIC_NOP4) seems to at least > > do okay. 
> > > > I can't say these are the *best* (in fact, they are guaranteed not the > > best on some significant number of chips), but they haven't sucked on > > any chips I have been able to measure -- and are way faster than JMP. > > Cool, thanks for the info! Steven and Jason should probably update their > respective infrastructure to use the 32-bit 5-byte nop you propose rather than > the 5-byte jump. Actually, I was thinking that we could take any 5 byte nop. The alternate code is executed _before_ SMP is enabled. Thus we should not have any cases where something could be executing in midstream. -- Steve ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-26 0:39 ` Steven Rostedt @ 2010-10-26 1:14 ` Mathieu Desnoyers 0 siblings, 0 replies; 93+ messages in thread From: Mathieu Desnoyers @ 2010-10-26 1:14 UTC (permalink / raw) To: Steven Rostedt Cc: H. Peter Anvin, Jason Baron, Thomas Gleixner, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony * Steven Rostedt (rostedt@goodmis.org) wrote: > On Mon, 2010-10-25 at 18:55 -0400, Mathieu Desnoyers wrote: > > * H. Peter Anvin (hpa@zytor.com) wrote: > > > On 10/25/2010 03:01 PM, Mathieu Desnoyers wrote: > > > > * H. Peter Anvin (hpa@zytor.com) wrote: > > > >> On 10/20/2010 08:27 AM, Jason Baron wrote: > > > >>> > > > >>> sure. The idea of the 'jmp 0' was simply to be an lcd for x86, if > > > >>> there's a better lcd for x86, I'll update it. But note, that since the > > > >>> 'jmp 0' is patched to a better nop at boot, we wouldn't see much gain. > > > >>> And in the boot path we are using 'text_poke_early()', so avoiding that > > > >>> isn't going to improve things much. > > > >>> > > > >> > > > >> It's still a completely unnecessary waste of startup time some > > > >> potentially significant fraction of the time. Startup time matters, > > > >> especially as the number of tracepoints grow. > > > > > > > > We're still waiting for input for the best single-5-byte-instruction nop that > > > > will work on all x86 variants. Please note that the GENERIC_NOP5 is actually two > > > > instructions one next to each other, which is not appropriate here. > > > > > > > > > > On 64 bits, use P6_NOP5; it seems to not suck on any platform. > > > > > > On 32 bits, 3E 8D 74 26 00 (i.e. DS: + GENERIC_NOP4) seems to at least > > > do okay. 
> > > > > > I can't say these are the *best* (in fact, they are guaranteed not the > > > best on some significant number of chips), but they haven't sucked on > > > any chips I have been able to measure -- and are way faster than JMP. > > > > Cool, thanks for the info! Steven and Jason should probably update their > > respective infrastructure to use the 32-bit 5-byte nop you propose rather than > > the 5-byte jump. > > Actually, I was thinking that we could take any 5 byte nop. The > alternate code is executed _before_ SMP is enabled. Thus we should not > have any cases where something could be executing in midstream. Nay, absolutely not. See, the goal here is to find a no-op that is good enough to be left there *without* init-time dynamic patching on a range of architectures, so we can diminish the boot-time delay. This implies that we have to select a no-op that can be patched in SMP context, thus it must be a single 5-byte instruction. We could even create an EMBEDDED config option that lets one specify that the built-in nop should be left there for embedded systems that care about boot time. Moreover, even if that were not the case, I'd be tempted to still use a single-instruction 5-byte no-op just in case interrupts of any sort (standard interrupts, NMIs, MCEs or whatnot) happen to be enabled earlier than this boot-time nop patching. IOW, you'd need a _very_ strong argument to support using the fragile two-instruction nops there. Thanks, Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 21:23 ` Steven Rostedt 2010-10-19 21:48 ` H. Peter Anvin @ 2010-10-19 22:04 ` Thomas Gleixner 2010-10-19 22:33 ` Steven Rostedt 1 sibling, 1 reply; 93+ messages in thread From: Thomas Gleixner @ 2010-10-19 22:04 UTC (permalink / raw) To: Steven Rostedt Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, 19 Oct 2010, Steven Rostedt wrote: > On Tue, 2010-10-19 at 23:07 +0200, Thomas Gleixner wrote: > > On Tue, 19 Oct 2010, Steven Rostedt wrote: > > > On Tue, 2010-10-19 at 21:49 +0200, Thomas Gleixner wrote: > > > > So that saves _TWO_ bytes of text and replaces: > > > > > > > > - 1e: 83 3d 00 00 00 00 00 cmpl $0x0,0x0(%rip) # 25 <test+0x25> > > > > - 25: 74 4d je 74 <test+0x74> > > > > + 1e: e9 00 00 00 00 jmpq 23 <test+0x23> > > > > + 23: eb 4d jmp 72 <test+0x72> > > > > > > > > So it trades a conditional vs. two jumps ? WTF ?? > > > > > > Well, the one jmpq is noped out, and the jmp is non conditional. I've > > > > What are you smoking ? > > What? Are you saying that conditional jumps are just as fast as non > conditional ones? > > > > > In case the trace point is enabled the jmpq is there, so it jumps to > > 23 and jumps from there to 72. > > No, when we dynamically enable the tracepoint, it will jump to 25, not > 23. That's what the goto part is about. We add the do_trace label to the > table, and we make it point to that location. If we did it as you say, > then tracepoints would never be enabled. > > This is not unlike what we do with the function tracer. The original > code points to mcount which simply is: > > mcount: > retq > > And when we enable the callers, we have it jump to a different function. 
> > > > > In case the trace point is disabled the jmpq is noped out, so it jumps > > to 72 directly. > > That is correct. > > > > > > always thought a non conditional jmp was faster than a conditional one, > > > > I always thought, that at least some of the stuff which comes from > > tracing folks makes some sense. > > Is it still not making sense? > > > > > > since there's no need to go into the branch prediction logic. The CPU > > > can simply skip to the code to jump next. Of counse, this pollutes the > > > I$. > > > > We might consult Mathieu for further useless blurb on how CPUs work > > around broken code. > > The code worked fine before, it just was not very pretty. > > But it seemed that gcc for you inlined the code in the wrong spot. > Perhaps it's not a good idea to have the something like h - softirq_vec > in the parameter of the tracepoint. Not saying that your change is not > worth it. It is, because h - softirq_vec is used by others now too. Crap, crap, crap. This has nothing to do with the arguments of that trace point, it's a compiler problem and you are just hoping that GCC will do the right thing. That's the complete wrong assumption and as Jason confirmed GCC is not up to it at all. hpa just posted code which does the _RIGHT_ _THING_ independent of any compiler madness and you tracer folks just missed it. Your jump label optimization made code even worse for todays common compilers. Just admit it and fix that mess you created or simply disable it. Thanks, tglx ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 22:04 ` Thomas Gleixner @ 2010-10-19 22:33 ` Steven Rostedt 2010-10-21 16:18 ` Thomas Gleixner 0 siblings, 1 reply; 93+ messages in thread From: Steven Rostedt @ 2010-10-19 22:33 UTC (permalink / raw) To: Thomas Gleixner Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Wed, 2010-10-20 at 00:04 +0200, Thomas Gleixner wrote: > hpa just posted code which does the _RIGHT_ _THING_ independent of any > compiler madness and you tracer folks just missed it. Thomas, Can you try this patch and see if it makes the object code better? -- Steve diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index a4a90b6..6264bd3 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -144,14 +144,19 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin, */ #define __DECLARE_TRACE(name, proto, args, data_proto, data_args) \ extern struct tracepoint __tracepoint_##name; \ - static inline void trace_##name(proto) \ + static __always_inline int __trace_##name(proto) \ { \ JUMP_LABEL(&__tracepoint_##name.state, do_trace); \ - return; \ + return 0; \ do_trace: \ __DO_TRACE(&__tracepoint_##name, \ TP_PROTO(data_proto), \ TP_ARGS(data_args)); \ + return 1; \ + } \ + static inline void trace_##name(proto) \ + { \ + unlikely(__trace_##name(args)); \ } \ static inline int \ register_trace_##name(void (*probe)(data_proto), void *data) \ ^ permalink raw reply related [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 22:33 ` Steven Rostedt @ 2010-10-21 16:18 ` Thomas Gleixner 2010-10-21 17:05 ` Steven Rostedt 0 siblings, 1 reply; 93+ messages in thread From: Thomas Gleixner @ 2010-10-21 16:18 UTC (permalink / raw) To: Steven Rostedt Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, 19 Oct 2010, Steven Rostedt wrote: > On Wed, 2010-10-20 at 00:04 +0200, Thomas Gleixner wrote: > > > hpa just posted code which does the _RIGHT_ _THING_ independent of any > > compiler madness and you tracer folks just missed it. > > Thomas, > > Can you try this patch and see if it makes the object code better? Nope, same result. Thanks, tglx > -- Steve > > > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h > index a4a90b6..6264bd3 100644 > --- a/include/linux/tracepoint.h > +++ b/include/linux/tracepoint.h > @@ -144,14 +144,19 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin, > */ > #define __DECLARE_TRACE(name, proto, args, data_proto, data_args) \ > extern struct tracepoint __tracepoint_##name; \ > - static inline void trace_##name(proto) \ > + static __always_inline int __trace_##name(proto) \ > { \ > JUMP_LABEL(&__tracepoint_##name.state, do_trace); \ > - return; \ > + return 0; \ > do_trace: \ > __DO_TRACE(&__tracepoint_##name, \ > TP_PROTO(data_proto), \ > TP_ARGS(data_args)); \ > + return 1; \ > + } \ > + static inline void trace_##name(proto) \ > + { \ > + unlikely(__trace_##name(args)); \ > } \ > static inline int \ > register_trace_##name(void (*probe)(data_proto), void *data) \ > > ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-21 16:18 ` Thomas Gleixner @ 2010-10-21 17:05 ` Steven Rostedt 2010-10-21 19:56 ` Thomas Gleixner 0 siblings, 1 reply; 93+ messages in thread From: Steven Rostedt @ 2010-10-21 17:05 UTC (permalink / raw) To: Thomas Gleixner Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Thu, 2010-10-21 at 18:18 +0200, Thomas Gleixner wrote: > > On Tue, 19 Oct 2010, Steven Rostedt wrote: > > > On Wed, 2010-10-20 at 00:04 +0200, Thomas Gleixner wrote: > > > > > hpa just posted code which does the _RIGHT_ _THING_ independent of any > > > compiler madness and you tracer folks just missed it. > > > > Thomas, > > > > Can you try this patch and see if it makes the object code better? > > Nope, same result. Yeah, I figured. Do you have CC_OPTIMIZE_FOR_SIZE set? And if you do, what happens if you disable it? -- Steve ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-21 17:05 ` Steven Rostedt @ 2010-10-21 19:56 ` Thomas Gleixner 2010-10-25 22:31 ` H. Peter Anvin 0 siblings, 1 reply; 93+ messages in thread From: Thomas Gleixner @ 2010-10-21 19:56 UTC (permalink / raw) To: Steven Rostedt Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Thu, 21 Oct 2010, Steven Rostedt wrote: > On Thu, 2010-10-21 at 18:18 +0200, Thomas Gleixner wrote: > > > > On Tue, 19 Oct 2010, Steven Rostedt wrote: > > > > > On Wed, 2010-10-20 at 00:04 +0200, Thomas Gleixner wrote: > > > > > > > hpa just posted code which does the _RIGHT_ _THING_ independent of any > > > > compiler madness and you tracer folks just missed it. > > > > > > Thomas, > > > > > > Can you try this patch and see if it makes the object code better? > > > > Nope, same result. > > Yeah, I figured. Do you have CC_OPTIMIZE_FOR_SIZE set? And if you do, > what happens if you disable it? Hmm. Indeed. That gets rid of the double jump. Thanks, tglx ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-21 19:56 ` Thomas Gleixner @ 2010-10-25 22:31 ` H. Peter Anvin 0 siblings, 0 replies; 93+ messages in thread From: H. Peter Anvin @ 2010-10-25 22:31 UTC (permalink / raw) To: Thomas Gleixner Cc: Steven Rostedt, Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On 10/21/2010 12:56 PM, Thomas Gleixner wrote: > On Thu, 21 Oct 2010, Steven Rostedt wrote: > >> On Thu, 2010-10-21 at 18:18 +0200, Thomas Gleixner wrote: >>> >>> On Tue, 19 Oct 2010, Steven Rostedt wrote: >>> >>>> On Wed, 2010-10-20 at 00:04 +0200, Thomas Gleixner wrote: >>>> >>>>> hpa just posted code which does the _RIGHT_ _THING_ independent of any >>>>> compiler madness and you tracer folks just missed it. >>>> >>>> Thomas, >>>> >>>> Can you try this patch and see if it makes the object code better? >>> >>> Nope, same result. >> >> Yeah, I figured. Do you have CC_OPTIMIZE_FOR_SIZE set? And if you do, >> what happens if you disable it? > > Hmm. Indeed. That gets rid of the double jump. > -Os unfortunately drops a bunch of optimizations. With gcc 4.5.1 there is actually a way to guarantee to get rid of double jumps, which is that you tell gcc that it is branching to one of two targets: asm goto("1: .byte 0xe9 ; .long %l[t_no]-2f\n" "2:\n" /* patching infrastructure goes here */ : : "i" (bit) : : t_no, t_yes); __builtin_unreachable(); t_no: return false; t_yes: return true; [The open-coding of the jump is necessary to force the 5-byte form instead of the 2-byte form.] The patching machinery can recognize the case where the jump offset is zero and patch in a NOP instead. There does, however, seem to be a couple of problems: a) gcc 4.5.1 is required due to a bug in previous versions of gcc when an asm goto doesn't have a fallthrough case. 
b) it seems to encourage gcc to actively jump around as it reorders blocks, since gcc no longer sees a fallthrough case at all. Not sure I have a good solution for this, at least not with current gcc. -hpa ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 20:55 ` Steven Rostedt 2010-10-19 21:07 ` Thomas Gleixner @ 2010-10-19 21:45 ` Thomas Gleixner 2010-10-19 22:14 ` Steven Rostedt 1 sibling, 1 reply; 93+ messages in thread From: Thomas Gleixner @ 2010-10-19 21:45 UTC (permalink / raw) To: Steven Rostedt Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, 19 Oct 2010, Steven Rostedt wrote: > On Tue, 2010-10-19 at 21:49 +0200, Thomas Gleixner wrote: > > Because you do the h - softvec in the tracepoint parameter? I got a > different result: I guess some serious whacking is due. The compiler adds two jumps when the parameter changes due to (h -softvec) instead of (h, softvec) ???? Dude, you can't be serious. If you would have asked about the compiler version I'm using and told me about the compiler version you are using, then I could take that answer somehow serious. It still would miss the "Uhhhh, your compiler creates crap code" alert, because that double jump is seriously broken and braindead. And I tell you more about this. You are going to piss off a lot of users of distro compilers because they will set CC_HAVE_ASM_GOTO happily and create the code I posted. Which will break the tracer no matter what. So you tracer maniacs happily played with some experimental compiler stuff w/o even testing your crap against something which ships with distros or is the reference 4.5 compiler on kernel.org ? I prefer you sending a patch to disable this, until it's sorted out, unless you want me to add some really outrageous changelog to the patch I'm going to put into tip tomorrow night, ok ? Thanks, tglx ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 21:45 ` Thomas Gleixner @ 2010-10-19 22:14 ` Steven Rostedt 0 siblings, 0 replies; 93+ messages in thread From: Steven Rostedt @ 2010-10-19 22:14 UTC (permalink / raw) To: Thomas Gleixner Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, 2010-10-19 at 23:45 +0200, Thomas Gleixner wrote: > On Tue, 19 Oct 2010, Steven Rostedt wrote: > > On Tue, 2010-10-19 at 21:49 +0200, Thomas Gleixner wrote: > > > > Because you do the h - softvec in the tracepoint parameter? I got a > > different result: > > I guess some serious whacking is due. > > The compiler adds two jumps when the parameter changes due to > (h -softvec) instead of (h, softvec) ???? > > Dude, you can't be serious. > > If you would have asked about the compiler version I'm using and told > me about the compiler version you are using, then I could take that > answer somehow serious. Heh, gcc has always been of a black magic for what it decided. But, anyway, I'm using a self built version (vanilla from gcc.gnu.org) of 4.5.1. What are you using? > > It still would miss the "Uhhhh, your compiler creates crap code" > alert, because that double jump is seriously broken and braindead. > > And I tell you more about this. You are going to piss off a lot of > users of distro compilers because they will set CC_HAVE_ASM_GOTO > happily and create the code I posted. Which will break the tracer no > matter what. > > So you tracer maniacs happily played with some experimental compiler > stuff w/o even testing your crap against something which ships with > distros or is the reference 4.5 compiler on kernel.org ? 
> > I prefer you sending a patch to disable this, until it's sorted out, > unless you want me to add some really outrageous changelog to the > patch I'm going to put into tip tomorrow night, ok ? Then lets just compare the crap versions you posted. > - 1e: 83 3d 00 00 00 00 00 cmpl $0x0,0x0(%rip) # 25 <test+0x25> > - 25: 74 4d je 74 <test+0x74> > + 1e: e9 00 00 00 00 jmpq 23 <test+0x23> > + 23: eb 4d jmp 72 <test+0x72> Yes, gcc replaced a cmp and conditional jump with two unconditional jumps. One of these jumps on boot up will be converted to a nop. Thus the jump label code just converted a compare and conditional jump with a nop and a non conditional jump. This still sounds like a win to me, although we can do better. I guess those poor sobs using a distro kernel compiled with a distro gcc that has CC_HAVE_ASM_GOTO enabled will still be doing better than if it was doing the if (enable) code. -- Steve ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 19:49 ` Thomas Gleixner 2010-10-19 20:55 ` Steven Rostedt @ 2010-10-19 21:16 ` David Daney 2010-10-19 21:32 ` Jason Baron 2010-10-19 21:47 ` Steven Rostedt 2010-10-19 21:28 ` Jason Baron 2 siblings, 2 replies; 93+ messages in thread From: David Daney @ 2010-10-19 21:16 UTC (permalink / raw) To: Thomas Gleixner Cc: Mathieu Desnoyers, Steven Rostedt, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On 10/19/2010 12:49 PM, Thomas Gleixner wrote: [...] > So that saves _TWO_ bytes of text and replaces: > > - 1e: 83 3d 00 00 00 00 00 cmpl $0x0,0x0(%rip) # 25<test+0x25> > - 25: 74 4d je 74<test+0x74> > + 1e: e9 00 00 00 00 jmpq 23<test+0x23> > + 23: eb 4d jmp 72<test+0x72> > > So it trades a conditional vs. two jumps ? WTF ?? > > I thought that jumplabel magic was supposed to get rid of the jump > over the tracing code ? In fact it adds another jump. Whatfor ? The 'asm goto' construct in GCC-4.5 is deficient in this area. GCC assumes that all exit paths from an 'asm goto' are equally likely, so the tracing (or dynamic printk etc.) code is assumed to be hot and is emitted inline. Since they are inline like this, there are all these jumps around them and they pollute the I-Cache. I was looking at fixing it, but I think a true general purpose fix would require enhancing GCC's grammar to allow specifying of the 'likelyness' of each exit path from 'asm goto'. David Daney > > Now even worse, when you NOP out the jmpq then your tracepoint is > still not enabled. Brilliant ! > > Did you guys ever look at the assembly output of that insane shite you > are advertising with lengthy explanations ? 
> > Obviously _NOT_ > > Come back when you can show me a clean imlementation of all this crap > which reproduces with my jumplabel enabled stock compiler. And please > just send me a patch w/o the blurb. > > And sane looks like: > > jmpq 2f<---- This gets noped out > 1: > mov %r12,%rdi > callq *(%r12) > [whatever cleanup it takes ] > leaveq > retq > > 2f: > [tracing gunk] > jmp 1b > > And further I want to see the tracing gunk in a minimal size so the > net/core/dev.c deinlining does not happen. > > Thanks, > > tglx > > P.S.: It might be helpful and polite if you'd take off your tracing > blinkers from time to time. > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 21:16 ` David Daney @ 2010-10-19 21:32 ` Jason Baron 2010-10-19 21:38 ` David Daney 0 siblings, 1 reply; 93+ messages in thread From: Jason Baron @ 2010-10-19 21:32 UTC (permalink / raw) To: David Daney Cc: Thomas Gleixner, Mathieu Desnoyers, Steven Rostedt, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, Oct 19, 2010 at 02:16:54PM -0700, David Daney wrote: > On 10/19/2010 12:49 PM, Thomas Gleixner wrote: > [...] > >So that saves _TWO_ bytes of text and replaces: > > > >- 1e: 83 3d 00 00 00 00 00 cmpl $0x0,0x0(%rip) # 25<test+0x25> > >- 25: 74 4d je 74<test+0x74> > >+ 1e: e9 00 00 00 00 jmpq 23<test+0x23> > >+ 23: eb 4d jmp 72<test+0x72> > > > >So it trades a conditional vs. two jumps ? WTF ?? > > > >I thought that jumplabel magic was supposed to get rid of the jump > >over the tracing code ? In fact it adds another jump. Whatfor ? > > The 'asm goto' construct in GCC-4.5 is deficient in this area. > > GCC assumes that all exit paths from an 'asm goto' are equally > likely, so the tracing (or dynamic printk etc.) code is assumed to > be hot and is emitted inline. Since they are inline like this, > there are all these jumps around them and they pollute the I-Cache. > > I was looking at fixing it, but I think a true general purpose fix > would require enhancing GCC's grammar to allow specifying of the > 'likelyness' of each exit path from 'asm goto'. > > David Daney > right, the next step is adding support for hot/cold labels, so the tracing code will be annotated with a 'cold' label. Thus, not adding the 'jmp' above on line '23', and in fact moving the tracing code out-of-line. Maybe I haven't been clear on this. 
thanks, -Jason ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 21:32 ` Jason Baron @ 2010-10-19 21:38 ` David Daney 0 siblings, 0 replies; 93+ messages in thread From: David Daney @ 2010-10-19 21:38 UTC (permalink / raw) To: Jason Baron Cc: Thomas Gleixner, Mathieu Desnoyers, Steven Rostedt, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On 10/19/2010 02:32 PM, Jason Baron wrote: > On Tue, Oct 19, 2010 at 02:16:54PM -0700, David Daney wrote: >> On 10/19/2010 12:49 PM, Thomas Gleixner wrote: >> [...] >>> So that saves _TWO_ bytes of text and replaces: >>> >>> - 1e: 83 3d 00 00 00 00 00 cmpl $0x0,0x0(%rip) # 25<test+0x25> >>> - 25: 74 4d je 74<test+0x74> >>> + 1e: e9 00 00 00 00 jmpq 23<test+0x23> >>> + 23: eb 4d jmp 72<test+0x72> >>> >>> So it trades a conditional vs. two jumps ? WTF ?? >>> >>> I thought that jumplabel magic was supposed to get rid of the jump >>> over the tracing code ? In fact it adds another jump. Whatfor ? >> >> The 'asm goto' construct in GCC-4.5 is deficient in this area. >> >> GCC assumes that all exit paths from an 'asm goto' are equally >> likely, so the tracing (or dynamic printk etc.) code is assumed to >> be hot and is emitted inline. Since they are inline like this, >> there are all these jumps around them and they pollute the I-Cache. >> >> I was looking at fixing it, but I think a true general purpose fix >> would require enhancing GCC's grammar to allow specifying of the >> 'likelyness' of each exit path from 'asm goto'. >> >> David Daney >> > > right, the next step is adding support for hot/cold labels, so the > tracing code will be annotaed with a 'cold' label. Thus, not adding the > 'jmp' above on line '23', and in fact moving the tracing code > out-of-line. Maybe I haven't been clear on this. > Ok, so is anybody working on doing that? 
GCC-4.6 stage 1 (the time when a change like this could be merged) closes in 8 days. It is unfortunate that we have this shiny new feature that can't really be used because the infrastructure is only half baked. David Daney ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 21:16 ` David Daney 2010-10-19 21:32 ` Jason Baron @ 2010-10-19 21:47 ` Steven Rostedt 1 sibling, 0 replies; 93+ messages in thread From: Steven Rostedt @ 2010-10-19 21:47 UTC (permalink / raw) To: David Daney Cc: Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, 2010-10-19 at 14:16 -0700, David Daney wrote: > On 10/19/2010 12:49 PM, Thomas Gleixner wrote: > [...] > > So that saves _TWO_ bytes of text and replaces: > > > > - 1e: 83 3d 00 00 00 00 00 cmpl $0x0,0x0(%rip) # 25<test+0x25> > > - 25: 74 4d je 74<test+0x74> > > + 1e: e9 00 00 00 00 jmpq 23<test+0x23> > > + 23: eb 4d jmp 72<test+0x72> > > > > So it trades a conditional vs. two jumps ? WTF ?? > > > > I thought that jumplabel magic was supposed to get rid of the jump > > over the tracing code ? In fact it adds another jump. Whatfor ? > > The 'asm goto' construct in GCC-4.5 is deficient in this area. > > GCC assumes that all exit paths from an 'asm goto' are equally likely, > so the tracing (or dynamic printk etc.) code is assumed to be hot and is > emitted inline. Since they are inline like this, there are all these > jumps around them and they pollute the I-Cache. > Interesting. I thought the driving force for asm goto was for tracepoints, as the documentation seems to reference them. One would think that the default would have been to make it the unlikely case, as it may be the only user of that code so far. > I was looking at fixing it, but I think a true general purpose fix would > require enhancing GCC's grammar to allow specifying of the 'likelyness' > of each exit path from 'asm goto'. That would be nice. Thanks, -- Steve ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 19:49 ` Thomas Gleixner 2010-10-19 20:55 ` Steven Rostedt 2010-10-19 21:16 ` David Daney @ 2010-10-19 21:28 ` Jason Baron 2010-10-19 21:55 ` Thomas Gleixner 2 siblings, 1 reply; 93+ messages in thread From: Jason Baron @ 2010-10-19 21:28 UTC (permalink / raw) To: Thomas Gleixner Cc: Mathieu Desnoyers, Steven Rostedt, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, Oct 19, 2010 at 09:49:45PM +0200, Thomas Gleixner wrote: > > > On Tue, 19 Oct 2010, Steven Rostedt wrote: > > as an excuse for adding extra performance impact to kernel code, because when it > > will be replaced by asm gotos, all that will be left is the performance impact > > inappropriately justified as insignificant compared to the impact of the old > > tracepoint scheme. > > Can you at one point just stop your tracing lectures and look at the > facts ? > > The impact of a sensible tracepoint design on the code in question > before kstat_incr_softirqs_this_cpu() was added would have been a mere > _FIVE_ bytes of text. But the original tracepoint code itself is > _TWENTY_ bytes of text larger. > > So we trade horrible code plus 20 bytes text against 5 bytes of text > in the hotpath. And you tell me that these _FIVE_ bytes are impacting > performance so much that it's significant. > > Now with kstat_incr_softirqs_this_cpu() the impact is zero, it even > removes code. > > And talking about non impact of disabled trace points. The tracepoint > in question which made me look at the code results in deinlining > __raise_softirq_irqsoff() in net/core/dev.c. There goes your theory. > > So no, you _cannot_ tell what impact a tracepoint has in reality > except by looking at the assembly output. 
> > And what scares me way more is the size of a single tracepoint in a > code file. > > Just adding "trace_softirq_entry(nr);" adds 88 bytes of text. So > that's optimized tracing code ? > > All it's supposed to do is: > > if (enabled) > trace_foo(nr); > > Replace "if (enabled)" with your favourite code patching jump label > whatever magic. The above stupid version takes about 28, but the > "optimized" tracing code makes that 88. Brilliant. That's inlining > utter shite for no good reason. WTF is it necessary to inline all that > gunk ? > > Please spare me the "jump label will make this less intrusive" > lecture. I'm not interested at all. > > Let's instead look at some more facts: > > #include <linux/interrupt.h> > #include <linux/module.h> > > #include <trace/events/irq.h> > > static struct softirq_action softirq_vec[NR_SOFTIRQS]; > > void test(struct softirq_action *h) > { > trace_softirq_entry(h - softirq_vec); > > h->action(h); > } > > Compile this code with GCC 4.5 with and without jump labels (zap the > select HAVE_ARCH_JUMP_LABEL line in arch/x86/Kconfig) > > So now the !jumplabel case gives us: > > ../build/kernel/soft.o: file format elf64-x86-64 > > Disassembly of section .text: > > 0000000000000000 <test>: > 0: 55 push %rbp > 1: 48 89 e5 mov %rsp,%rbp > 4: 41 55 push %r13 > 6: 49 89 fd mov %rdi,%r13 > 9: 49 81 ed 00 00 00 00 sub $0x0,%r13 > 10: 41 54 push %r12 > 12: 49 c1 ed 03 shr $0x3,%r13 > 16: 49 89 fc mov %rdi,%r12 > 19: 53 push %rbx > 1a: 48 83 ec 08 sub $0x8,%rsp > 1e: 83 3d 00 00 00 00 00 cmpl $0x0,0x0(%rip) # 25 <test+0x25> > 25: 74 4d je 74 <test+0x74> > 27: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax > 2e: 00 00 > 30: ff 80 44 e0 ff ff incl -0x1fbc(%rax) > 36: 48 8b 1d 00 00 00 00 mov 0x0(%rip),%rbx # 3d <test+0x3d> > 3d: 48 85 db test %rbx,%rbx > 40: 74 13 je 55 <test+0x55> > 42: 48 8b 7b 08 mov 0x8(%rbx),%rdi > 46: 44 89 ee mov %r13d,%esi > 49: ff 13 callq *(%rbx) > 4b: 48 83 c3 10 add $0x10,%rbx > 4f: 48 83 3b 00 cmpq $0x0,(%rbx) > 53: eb eb jmp 
40 <test+0x40> > 55: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax > 5c: 00 00 > 5e: ff 88 44 e0 ff ff decl -0x1fbc(%rax) > 64: 48 8b 80 38 e0 ff ff mov -0x1fc8(%rax),%rax > 6b: a8 08 test $0x8,%al > 6d: 74 05 je 74 <test+0x74> > 6f: e8 00 00 00 00 callq 74 <test+0x74> > 74: 4c 89 e7 mov %r12,%rdi > 77: 41 ff 14 24 callq *(%r12) > 7b: 58 pop %rax > 7c: 5b pop %rbx > 7d: 41 5c pop %r12 > 7f: 41 5d pop %r13 > 81: c9 leaveq > 82: c3 retq > > The jumplabel=y case gives: > > ../build/kernel/soft.o: file format elf64-x86-64 > > Disassembly of section .text: > > 0000000000000000 <test>: > 0: 55 push %rbp > 1: 48 89 e5 mov %rsp,%rbp > 4: 41 55 push %r13 > 6: 49 89 fd mov %rdi,%r13 > 9: 49 81 ed 00 00 00 00 sub $0x0,%r13 > 10: 41 54 push %r12 > 12: 49 c1 ed 03 shr $0x3,%r13 > 16: 49 89 fc mov %rdi,%r12 > 19: 53 push %rbx > 1a: 48 83 ec 08 sub $0x8,%rsp > 1e: e9 00 00 00 00 jmpq 23 <test+0x23> > 23: eb 4d jmp 72 <test+0x72> > 25: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax > 2c: 00 00 > 2e: ff 80 44 e0 ff ff incl -0x1fbc(%rax) > 34: 48 8b 1d 00 00 00 00 mov 0x0(%rip),%rbx # 3b <test+0x3b> > 3b: 48 85 db test %rbx,%rbx > 3e: 74 13 je 53 <test+0x53> > 40: 48 8b 7b 08 mov 0x8(%rbx),%rdi > 44: 44 89 ee mov %r13d,%esi > 47: ff 13 callq *(%rbx) > 49: 48 83 c3 10 add $0x10,%rbx > 4d: 48 83 3b 00 cmpq $0x0,(%rbx) > 51: eb eb jmp 3e <test+0x3e> > 53: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax > 5a: 00 00 > 5c: ff 88 44 e0 ff ff decl -0x1fbc(%rax) > 62: 48 8b 80 38 e0 ff ff mov -0x1fc8(%rax),%rax > 69: a8 08 test $0x8,%al > 6b: 74 05 je 72 <test+0x72> > 6d: e8 00 00 00 00 callq 72 <test+0x72> > 72: 4c 89 e7 mov %r12,%rdi > 75: 41 ff 14 24 callq *(%r12) > 79: 58 pop %rax > 7a: 5b pop %rbx > 7b: 41 5c pop %r12 > 7d: 41 5d pop %r13 > 7f: c9 leaveq > 80: c3 retq > > So that saves _TWO_ bytes of text and replaces: > > - 1e: 83 3d 00 00 00 00 00 cmpl $0x0,0x0(%rip) # 25 <test+0x25> > - 25: 74 4d je 74 <test+0x74> > + 1e: e9 00 00 00 00 jmpq 23 <test+0x23> > + 23: eb 4d jmp 72 <test+0x72> > > So it trades a 
conditional vs. two jumps ? WTF ?? > right, so the 'jmpq' on boot on x86 gets patched with a 5-byte no-op sequence. So in the disabled case we have no-op followed by a jump around the disabled code. > I thought that jumplabel magic was supposed to get rid of the jump > over the tracing code ? In fact it adds another jump. Whatfor ? > yes, that is the plan. gcc does not yet support hot/cold labels...once it does the second jump will go away and the entire tracepoint code will be moved to a 'cold' section. It's not quite completely optimal yet, but we are getting there. > Now even worse, when you NOP out the jmpq then your tracepoint is > still not enabled. Brilliant ! > The 'jmpq' in the enabled case is patched with a jmpq to the body of the tracepoint itself. > Did you guys ever look at the assembly output of that insane shite you > are advertising with lengthy explanations ? > > Obviously _NOT_ > > Come back when you can show me a clean implementation of all this crap > which reproduces with my jumplabel enabled stock compiler. And please > just send me a patch w/o the blurb. > > And sane looks like: > > jmpq 2f <---- This gets noped out > 1: > mov %r12,%rdi > callq *(%r12) > [whatever cleanup it takes ] > leaveq > retq > > 2f: > [tracing gunk] > jmp 1b > yes, this is what the code should look like when we get support for hot/cold labels. I've discussed this support with gcc folk, and it's the next step here. So yes, this is exactly where we are headed. thanks, -Jason ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 21:28 ` Jason Baron @ 2010-10-19 21:55 ` Thomas Gleixner 2010-10-19 22:17 ` Thomas Gleixner 2010-10-19 22:38 ` Jason Baron 0 siblings, 2 replies; 93+ messages in thread From: Thomas Gleixner @ 2010-10-19 21:55 UTC (permalink / raw) To: Jason Baron Cc: Mathieu Desnoyers, Steven Rostedt, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, 19 Oct 2010, Jason Baron wrote: > On Tue, Oct 19, 2010 at 09:49:45PM +0200, Thomas Gleixner wrote: > > > > On Tue, 19 Oct 2010, Steven Rostedt wrote: > > > > So it trades a conditional vs. two jumps ? WTF ?? > > > > right, so the 'jmpq' on boot on x86 gets patched with 5 byte no-op > sequence. So in the disabled case we have no-op followed by a jump > around the disabled code. And that's supposed to be useful ? We do _NOT_ want to jump around disabled stuff. The noped out case should fall through into the non traced code. Otherwise that whole jumplabel thing is completely useless. > > I thought that jumplabel magic was supposed to get rid of the jump > > over the tracing code ? In fact it adds another jump. Whatfor ? > > > > yes, that is the plan. gcc does not yet support hot/cold labels...once > it does the second jump will go away and the entire tracepoint code will > be moved to a 'cold' section. It's not quite completely optimal yet, but > we are getting there. Then do not advertise it as the brilliant solution for all tracing matters. > > Now even worse, when you NOP out the jmpq then your tracepoint is > > still not enabled. Brilliant ! > > > > The 'jmpq' in the enabled case is patched with a jmpq to the body of the > tracepoint itself. Brilliant. > > Did you guys ever look at the assembly output of that insane shite you > > are advertising with lengthy explanations ? 
> > > > Obviously _NOT_ > > > > Come back when you can show me a clean implementation of all this crap > > which reproduces with my jumplabel enabled stock compiler. And please > > just send me a patch w/o the blurb. > > > > And sane looks like: > > > > jmpq 2f <---- This gets noped out > > 1: > > mov %r12,%rdi > > callq *(%r12) > > [whatever cleanup it takes ] > > leaveq > > retq > > > > 2f: > > [tracing gunk] > > jmp 1b > > > > yes, this is what the code should look like when we get support for > > hot/cold labels. I've discussed this support with gcc folk, and it's the > > next step here. So yes, this is exactly where we are headed. So and at the same time the whole tracing crowd tells me, that this is already a done deal. See previous advertisements from DrTracing. I'm seriously grumpy about this especially in the context of a patch which fixes one of the worst interfaces I've seen in years. Thanks, tglx ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 21:55 ` Thomas Gleixner @ 2010-10-19 22:17 ` Thomas Gleixner 2010-10-20 1:36 ` Steven Rostedt 2010-10-19 22:38 ` Jason Baron 1 sibling, 1 reply; 93+ messages in thread From: Thomas Gleixner @ 2010-10-19 22:17 UTC (permalink / raw) To: Jason Baron Cc: Mathieu Desnoyers, Steven Rostedt, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, 19 Oct 2010, Thomas Gleixner wrote: > On Tue, 19 Oct 2010, Jason Baron wrote: > > > Now even worse, when you NOP out the jmpq then your tracepoint is > > > still not enabled. Brilliant ! > > > > > > > The 'jmpq' in the enabled case is patched with a jmpq to the body of the > > tracepoint itself. > > Brilliant. IOW, We now jump around the jump which jumps around the disabled code. Thanks, tglx ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 22:17 ` Thomas Gleixner @ 2010-10-20 1:36 ` Steven Rostedt 2010-10-20 1:52 ` Jason Baron 0 siblings, 1 reply; 93+ messages in thread From: Steven Rostedt @ 2010-10-20 1:36 UTC (permalink / raw) To: Thomas Gleixner Cc: Jason Baron, Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Wed, 2010-10-20 at 00:17 +0200, Thomas Gleixner wrote: > On Tue, 19 Oct 2010, Thomas Gleixner wrote: > > On Tue, 19 Oct 2010, Jason Baron wrote: > > > > Now even worse, when you NOP out the jmpq then your tracepoint is > > > > still not enabled. Brilliant ! > > > > > > > > > > The 'jmpq' in the enabled case is patched with a jmpq to the body of the > > > tracepoint itself. > > > > Brilliant. > > IOW, We now jump around the jump which jumps around the disabled code. > Do you happen to have CONFIG_CC_OPTIMIZE_FOR_SIZE set? If so, then this is a known issue. We even originally had jump label enabled _only_ if CC_OPTIMIZE_FOR_SIZE was not set, but hpa NAK'd it. http://lkml.org/lkml/2010/9/22/482 http://lkml.org/lkml/2010/9/20/488 http://lkml.org/lkml/2010/9/24/259 -- Steve ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-20 1:36 ` Steven Rostedt @ 2010-10-20 1:52 ` Jason Baron 2010-10-25 22:32 ` H. Peter Anvin 0 siblings, 1 reply; 93+ messages in thread From: Jason Baron @ 2010-10-20 1:52 UTC (permalink / raw) To: Steven Rostedt Cc: Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, Oct 19, 2010 at 09:36:30PM -0400, Steven Rostedt wrote: > > On Wed, 2010-10-20 at 00:17 +0200, Thomas Gleixner wrote: > > On Tue, 19 Oct 2010, Thomas Gleixner wrote: > > > On Tue, 19 Oct 2010, Jason Baron wrote: > > > > > Now even worse, when you NOP out the jmpq then your tracepoint is > > > > > still not enabled. Brilliant ! > > > > > > > > > > > > > The 'jmpq' in the enabled case is patched with a jmpq to the body of the > > > > tracepoint itself. > > > > > > Brilliant. > > > > IOW, We now jump around the jump which jumps around the disabled code. > > > > > Do you happen to have CONFIG_CC_OPTIMIZE_FOR_SIZE set? If so, then this > is a known issue. We even originally had jump label enabled _only_ if > CC_OPTIMIZE_FOR_SIZE was not set, but hpa NAK'd it. > > http://lkml.org/lkml/2010/9/22/482 > > http://lkml.org/lkml/2010/9/20/488 > > http://lkml.org/lkml/2010/9/24/259 > > -- Steve thanks Steve. I was about to say this. When CONFIG_CC_OPTIMIZE_FOR_SIZE is not set we don't get the double 'jmp' and the tracepoint code is moved out of line. It was mentioned that a number of distros ship with CONFIG_CC_OPTIMIZE_FOR_SIZE not set, and as Steve mentioned my original patch set was conditional on !CONFIG_CC_OPTIMIZE_FOR_SIZE. using hot/cold labels gcc can fix the CONFIG_CC_OPTIMIZE_FOR_SIZE case, but its a non-trivial amount of work for gcc. I was hoping that if jump labels are included, we could make the gcc work happen. 
thanks, -Jason ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-20 1:52 ` Jason Baron @ 2010-10-25 22:32 ` H. Peter Anvin 0 siblings, 0 replies; 93+ messages in thread From: H. Peter Anvin @ 2010-10-25 22:32 UTC (permalink / raw) To: Jason Baron Cc: Steven Rostedt, Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On 10/19/2010 06:52 PM, Jason Baron wrote: > > thanks Steve. I was about to say this. When CONFIG_CC_OPTIMIZE_FOR_SIZE > is not set we don't get the double 'jmp' and the tracepoint code is > moved out of line. It was mentioned that a number of distros ship with > CONFIG_CC_OPTIMIZE_FOR_SIZE not set, and as Steve mentioned my original > patch set was conditional on !CONFIG_CC_OPTIMIZE_FOR_SIZE. > > using hot/cold labels gcc can fix the CONFIG_CC_OPTIMIZE_FOR_SIZE case, > but its a non-trivial amount of work for gcc. I was hoping that if jump > labels are included, we could make the gcc work happen. > That's fair. I think jump labels are still a win even in the double-jump case (especially if the the tracepoint turns into a NOP rather than a JMP.) Code generated with -Os has a bunch of other problems, too. -hpa ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 21:55 ` Thomas Gleixner 2010-10-19 22:17 ` Thomas Gleixner @ 2010-10-19 22:38 ` Jason Baron 2010-10-19 22:44 ` H. Peter Anvin 1 sibling, 1 reply; 93+ messages in thread From: Jason Baron @ 2010-10-19 22:38 UTC (permalink / raw) To: Thomas Gleixner Cc: Mathieu Desnoyers, Steven Rostedt, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, Oct 19, 2010 at 11:55:19PM +0200, Thomas Gleixner wrote: > On Tue, 19 Oct 2010, Jason Baron wrote: > > On Tue, Oct 19, 2010 at 09:49:45PM +0200, Thomas Gleixner wrote: > > > > > On Tue, 19 Oct 2010, Steven Rostedt wrote: > > > > > > So it trades a conditional vs. two jumps ? WTF ?? > > > > > > > right, so the 'jmpq' on boot on x86 gets patched with 5 byte no-op > > sequence. So in the disabled case we have no-op followed by a jump > > around the disabled code. > > And that's supposed to be useful ? We do _NOT_ want to jump around > disabled stuff. The noped out case should fall through into the non > traced code. Otherwise that whole jumplabel thing is completely > useless. > > > > I thought that jumplabel magic was supposed to get rid of the jump > > > over the tracing code ? In fact it adds another jump. Whatfor ? > > > > > > > yes, that is the plan. gcc does not yet support hot/cold labels...once > > it does the second jump will go away and the entire tracepoint code will > > be moved to a 'cold' section. It's not quite completely optimal yet, but > > we are getting there. > > Then do not advertise it as the brilliant solution for all tracing > matters. > I'm not sure I did, the documentation says that we have nop followed by a jmp: +The new code is a 'nopl' followed by a 'jmp'. 
Thus: + +nopl - 0f 1f 44 00 00 - 5 bytes +jmp - eb 3e - 2 bytes http://marc.info/?l=linux-kernel&m=128717355231182&w=2 > > > Now even worse, when you NOP out the jmpq then your tracepoint is > > > still not enabled. Brilliant ! > > > > > > > The 'jmpq' in the enabled case is patched with a jmpq to the body of the > > tracepoint itself. > > Brilliant. > > > > Did you guys ever look at the assembly output of that insane shite you > > > are advertising with lengthy explanations ? > > > > > > Obviously _NOT_ > > > > > > Come back when you can show me a clean implementation of all this crap > > > which reproduces with my jumplabel enabled stock compiler. And please > > > just send me a patch w/o the blurb. > > > > > > And sane looks like: > > > > > > jmpq 2f <---- This gets noped out > > > 1: > > > mov %r12,%rdi > > > callq *(%r12) > > > [whatever cleanup it takes ] > > > leaveq > > > retq > > > > > > 2f: > > > [tracing gunk] > > > jmp 1b > > > > yes, this is what the code should look like when we get support for > > hot/cold labels. I've discussed this support with gcc folk, and it's the > > next step here. So yes, this is exactly where we are headed. > > So and at the same time the whole tracing crowd tells me, that this is > already a done deal. See previous advertisements from DrTracing. I'm > seriously grumpy about this especially in the context of a patch which > fixes one of the worst interfaces I've seen in years. > > Thanks, > > tglx sorry if I misled anybody about the current state of 'jump labels'. But we have the same goal in mind, and a clear path to get there. If you don't agree with the approach - I'm all ears. And you are right - the code is not where it should be yet. thanks, -Jason ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 22:38 ` Jason Baron @ 2010-10-19 22:44 ` H. Peter Anvin 2010-10-19 22:56 ` Steven Rostedt 0 siblings, 1 reply; 93+ messages in thread From: H. Peter Anvin @ 2010-10-19 22:44 UTC (permalink / raw) To: Jason Baron Cc: Thomas Gleixner, Mathieu Desnoyers, Steven Rostedt, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On 10/19/2010 03:38 PM, Jason Baron wrote: > > I'm not sure I did, the documentation says that we have nop followed by > a jmp: > > +The new code is a 'nopl' followed by a 'jmp'. Thus: > + > +nopl - 0f 1f 44 00 00 - 5 bytes > +jmp - eb 3e - 2 bytes > There is no excuse for needing the second jump here, obviously... -hpa ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 22:44 ` H. Peter Anvin @ 2010-10-19 22:56 ` Steven Rostedt 2010-10-19 22:57 ` H. Peter Anvin 0 siblings, 1 reply; 93+ messages in thread From: Steven Rostedt @ 2010-10-19 22:56 UTC (permalink / raw) To: H. Peter Anvin Cc: Jason Baron, Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, 2010-10-19 at 15:44 -0700, H. Peter Anvin wrote: > On 10/19/2010 03:38 PM, Jason Baron wrote: > > > > I'm not sure I did, the documentation says that we have nop followed by > > a jmp: > > > > +The new code is a 'nopl' followed by a 'jmp'. Thus: > > + > > +nopl - 0f 1f 44 00 00 - 5 bytes > > +jmp - eb 3e - 2 bytes > > > > There is no excuse for needing the second jump here, obviously... Now the trick is to tell gcc that. -- Steve ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 22:56 ` Steven Rostedt @ 2010-10-19 22:57 ` H. Peter Anvin 0 siblings, 0 replies; 93+ messages in thread From: H. Peter Anvin @ 2010-10-19 22:57 UTC (permalink / raw) To: Steven Rostedt Cc: Jason Baron, Thomas Gleixner, Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On 10/19/2010 03:56 PM, Steven Rostedt wrote: >> >> There is no excuse for needing the second jump here, obviously... > > Now the trick is to tell gcc that. > Yes. -hpa ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 14:07 ` Thomas Gleixner 2010-10-19 14:28 ` Mathieu Desnoyers @ 2010-10-19 14:46 ` Steven Rostedt 1 sibling, 0 replies; 93+ messages in thread From: Steven Rostedt @ 2010-10-19 14:46 UTC (permalink / raw) To: Thomas Gleixner Cc: Mathieu Desnoyers, Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony On Tue, 2010-10-19 at 16:07 +0200, Thomas Gleixner wrote: > The vector computation is compared to the extra tracing induced jumps > probably not even measurable. Stop defending horrible coding with > handwavy performance and impact arguments. Yes this was crappy code, I'm not defending it. But this code was from the original tracepoints. I just looked at when this code was added, and it was still in the time TRACE_EVENT() was in a major flux. Heck, the code resided in include/trace/irq.h and not include/trace/events/irq.h. And yes, a lot of decisions back then were put on handwaving performance and impact, and it was not just coming from us. I admit I should have cleaned it up, but I did not want to touch it until it actually broke ;-) -- Steve ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 13:41 ` Thomas Gleixner 2010-10-19 13:54 ` Steven Rostedt @ 2010-10-19 14:00 ` Mathieu Desnoyers 1 sibling, 0 replies; 93+ messages in thread From: Mathieu Desnoyers @ 2010-10-19 14:00 UTC (permalink / raw) To: Thomas Gleixner Cc: Koki Sanagi, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, Steven Rostedt, nhorman, scott.a.mcmillan, laijs, H. Peter Anvin, LKML, eric.dumazet, kaneshige.kenji, David Miller, izumi.taku, kosaki.motohiro, Heiko Carstens, Luck, Tony * Thomas Gleixner (tglx@linutronix.de) wrote: > On Tue, 19 Oct 2010, Mathieu Desnoyers wrote: > > > * Thomas Gleixner (tglx@linutronix.de) wrote: > > > With the addition of trace_softirq_raise() the softirq tracepoint got > > > even more convoluted. Why the tracepoints take two pointers to assign > > > an integer is beyond my comprehension. > > > > > > But adding an extra case which treats the first pointer as an unsigned > > > long when the second pointer is NULL including the back and forth > > > type casting is just horrible. > > > > > > Convert the softirq tracepoints to take a single unsigned int argument > > > for the softirq vector number and fix the call sites. > > > > Well, there was originally a reason for this oddness. In __do_softirq(), the > > "h - softirq_vec" computation was not needed outside of the tracepoint handler > > in the past, but it now seems to be required with the new inlined > > "kstat_incr_softirqs_this_cpu()". > > Dudes, a vector computation is hardly a performance problem in that > function and definitely not an excuse for designing such horrible > interfaces. In this specific case, I think you are right. But things are not that trivial, and you know it. 
We have to consider: - Extra instruction cache footprint - Added register pressure - Added computation overhead of the added subtraction - Frequency of code execution for all target architectures when we add tracepoints to performance-sensitive code paths. As a general policy, we try to keep these at the lowest possible level, so that all tracepoints will be compiled into distro kernels without perceivable _overall_ performance overhead. It's not something that should be looked at only on a tracepoint-by-tracepoint overhead basis, but rather by looking at the overall system degradation that adding 300 tracepoints would cause. So I agree with you that it's a trade-off between interface cleanness and performance. When they were introduced, tracepoint handlers were barely seen as citizens of the kernel code base, so all that mattered was to keep the "tracepoints off" case clean and fast. Now that tracepoint handlers seem to be increasingly accepted as part of the kernel code base, I agree that taking into account oddness performed in this handler becomes more important. It ends up being a question of balance between oddness inside the tracepoint handler and performance overhead in the off-case. The increased acceptance of the tracepoint code base has shifted this balance slightly in favor of cleanness. Thanks, Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 93+ messages in thread
* [tip:perf/core] tracing: Cleanup the convoluted softirq tracepoints 2010-10-19 13:00 ` [PATCH] tracing: Cleanup the convoluted softirq tracepoints Thomas Gleixner 2010-10-19 13:08 ` Peter Zijlstra 2010-10-19 13:22 ` Mathieu Desnoyers @ 2010-10-21 14:52 ` tip-bot for Thomas Gleixner 2 siblings, 0 replies; 93+ messages in thread From: tip-bot for Thomas Gleixner @ 2010-10-21 14:52 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, hpa, mingo, fweisbec, rostedt, peterz, tglx Commit-ID: f4bc6bb2d562703eafc895c37e7be20906de139d Gitweb: http://git.kernel.org/tip/f4bc6bb2d562703eafc895c37e7be20906de139d Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Tue, 19 Oct 2010 15:00:13 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 21 Oct 2010 16:50:29 +0200 tracing: Cleanup the convoluted softirq tracepoints With the addition of trace_softirq_raise() the softirq tracepoint got even more convoluted. Why the tracepoints take two pointers to assign an integer is beyond my comprehension. But adding an extra case which treats the first pointer as an unsigned long when the second pointer is NULL including the back and forth type casting is just horrible. Convert the softirq tracepoints to take a single unsigned int argument for the softirq vector number and fix the call sites. 
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <alpine.LFD.2.00.1010191428560.6815@localhost6.localdomain6> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: mathieu.desnoyers@efficios.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> --- include/linux/interrupt.h | 2 +- include/trace/events/irq.h | 54 ++++++++++++++++--------------------------- kernel/softirq.c | 16 +++++++----- 3 files changed, 30 insertions(+), 42 deletions(-) diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index 531495d..0ac1949 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -410,7 +410,7 @@ extern void open_softirq(int nr, void (*action)(struct softirq_action *)); extern void softirq_init(void); static inline void __raise_softirq_irqoff(unsigned int nr) { - trace_softirq_raise((struct softirq_action *)(unsigned long)nr, NULL); + trace_softirq_raise(nr); or_softirq_pending(1UL << nr); } diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h index 6fa7cba..1c09820 100644 --- a/include/trace/events/irq.h +++ b/include/trace/events/irq.h @@ -86,76 +86,62 @@ TRACE_EVENT(irq_handler_exit, DECLARE_EVENT_CLASS(softirq, - TP_PROTO(struct softirq_action *h, struct softirq_action *vec), + TP_PROTO(unsigned int vec_nr), - TP_ARGS(h, vec), + TP_ARGS(vec_nr), TP_STRUCT__entry( - __field( int, vec ) + __field( unsigned int, vec ) ), TP_fast_assign( - if (vec) - __entry->vec = (int)(h - vec); - else - __entry->vec = (int)(long)h; + __entry->vec = vec_nr; ), - TP_printk("vec=%d [action=%s]", __entry->vec, + TP_printk("vec=%u [action=%s]", __entry->vec, show_softirq_name(__entry->vec)) ); /** * softirq_entry - called immediately before the softirq handler - * @h: pointer to struct softirq_action - * @vec: pointer to first struct softirq_action in softirq_vec array + * @vec_nr: softirq vector number * - * The @h parameter, contains a pointer to the struct softirq_action - * which has a 
pointer to the action handler that is called. By subtracting - * the @vec pointer from the @h pointer, we can determine the softirq - * number. Also, when used in combination with the softirq_exit tracepoint - * we can determine the softirq latency. + * When used in combination with the softirq_exit tracepoint + * we can determine the softirq handler runtime. */ DEFINE_EVENT(softirq, softirq_entry, - TP_PROTO(struct softirq_action *h, struct softirq_action *vec), + TP_PROTO(unsigned int vec_nr), - TP_ARGS(h, vec) + TP_ARGS(vec_nr) ); /** * softirq_exit - called immediately after the softirq handler returns - * @h: pointer to struct softirq_action - * @vec: pointer to first struct softirq_action in softirq_vec array + * @vec_nr: softirq vector number * - * The @h parameter contains a pointer to the struct softirq_action - * that has handled the softirq. By subtracting the @vec pointer from - * the @h pointer, we can determine the softirq number. Also, when used in - * combination with the softirq_entry tracepoint we can determine the softirq - * latency. + * When used in combination with the softirq_entry tracepoint + * we can determine the softirq handler runtime. */ DEFINE_EVENT(softirq, softirq_exit, - TP_PROTO(struct softirq_action *h, struct softirq_action *vec), + TP_PROTO(unsigned int vec_nr), - TP_ARGS(h, vec) + TP_ARGS(vec_nr) ); /** * softirq_raise - called immediately when a softirq is raised - * @h: pointer to struct softirq_action - * @vec: pointer to first struct softirq_action in softirq_vec array + * @vec_nr: softirq vector number * - * The @h parameter contains a pointer to the softirq vector number which is - * raised. @vec is NULL and it means @h includes vector number not - * softirq_action. When used in combination with the softirq_entry tracepoint - * we can determine the softirq raise latency. + * When used in combination with the softirq_entry tracepoint + * we can determine the softirq raise to run latency. 
*/ DEFINE_EVENT(softirq, softirq_raise, - TP_PROTO(struct softirq_action *h, struct softirq_action *vec), + TP_PROTO(unsigned int vec_nr), - TP_ARGS(h, vec) + TP_ARGS(vec_nr) ); #endif /* _TRACE_IRQ_H */ diff --git a/kernel/softirq.c b/kernel/softirq.c index 07b4f1b..b3cb1dc 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -212,18 +212,20 @@ restart: do { if (pending & 1) { + unsigned int vec_nr = h - softirq_vec; int prev_count = preempt_count(); - kstat_incr_softirqs_this_cpu(h - softirq_vec); - trace_softirq_entry(h, softirq_vec); + kstat_incr_softirqs_this_cpu(vec_nr); + + trace_softirq_entry(vec_nr); h->action(h); - trace_softirq_exit(h, softirq_vec); + trace_softirq_exit(vec_nr); if (unlikely(prev_count != preempt_count())) { - printk(KERN_ERR "huh, entered softirq %td %s %p" + printk(KERN_ERR "huh, entered softirq %u %s %p" "with preempt_count %08x," - " exited with %08x?\n", h - softirq_vec, - softirq_to_name[h - softirq_vec], - h->action, prev_count, preempt_count()); + " exited with %08x?\n", vec_nr, + softirq_to_name[vec_nr], h->action, + prev_count, preempt_count()); preempt_count() = prev_count; } ^ permalink raw reply related [flat|nested] 93+ messages in thread
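One effect of the cleanup is visible on the consumer side: with the vector number carried directly in the event, pairing softirq_entry with softirq_exit to get the handler runtime is a plain dictionary lookup. The sketch below is illustrative only — the trace lines are made-up samples in the new "vec=%u [action=%s]" format, and the vector-name table follows the 2010-era softirq list in include/linux/interrupt.h.

```python
import re

# Softirq vector names as of this patch series (include/linux/interrupt.h, 2010):
SOFTIRQ_NAMES = ["HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
                 "TASKLET", "SCHED", "HRTIMER", "RCU"]

EVENT_RE = re.compile(r"(\d+\.\d+): (softirq_entry|softirq_exit): vec=(\d+)")

def softirq_runtimes(lines):
    """Pair softirq_entry/softirq_exit events by vector number and
    return (softirq_name, runtime_in_msec) tuples."""
    entry_ts = {}
    out = []
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue
        ts, event, vec = float(m.group(1)), m.group(2), int(m.group(3))
        if event == "softirq_entry":
            entry_ts[vec] = ts
        elif vec in entry_ts:
            out.append((SOFTIRQ_NAMES[vec], (ts - entry_ts.pop(vec)) * 1000.0))
    return out

# Hypothetical sample lines (timestamps invented for illustration):
trace = [
    "<idle>-0 [000] 106133.171445: softirq_entry: vec=3 [action=NET_RX]",
    "<idle>-0 [000] 106133.171471: softirq_exit: vec=3 [action=NET_RX]",
]
print(softirq_runtimes(trace))
```

With the old two-pointer scheme the same consumer would have had to know whether the first argument was a pointer or a casted vector number; the single unsigned int removes that ambiguity.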
* [PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT 2010-08-23 9:41 [PATCH v4 0/5] netdev: show a process of packets Koki Sanagi 2010-08-23 9:42 ` [PATCH v4 1/5] irq: add tracepoint to softirq_raise Koki Sanagi @ 2010-08-23 9:43 ` Koki Sanagi 2010-08-24 3:52 ` David Miller 2010-09-08 8:34 ` [tip:perf/core] napi: Convert " tip-bot for Neil Horman 2010-08-23 9:45 ` [PATCH v4 3/5] netdev: add tracepoints to netdev layer Koki Sanagi ` (3 subsequent siblings) 5 siblings, 2 replies; 93+ messages in thread From: Koki Sanagi @ 2010-08-23 9:43 UTC (permalink / raw) To: netdev Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec, mathieu.desnoyers From: Neil Horman <nhorman@tuxdriver.com> This patch converts trace_napi_poll from DECLARE_EVENT to TRACE_EVENT to improve the usability of napi_poll tracepoint. <idle>-0 [001] 241302.750777: napi_poll: napi poll on napi struct f6acc480 for device eth3 <idle>-0 [000] 241302.852389: napi_poll: napi poll on napi struct f5d0d70c for device eth1 An original patch is below. http://marc.info/?l=linux-kernel&m=126021713809450&w=2 Signed-off-by: Neil Horman <nhorman@tuxdriver.com> And add a fix by Steven Rostedt. http://marc.info/?l=linux-kernel&m=126150506519173&w=2 Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> --- include/trace/events/napi.h | 25 +++++++++++++++++++++++-- 1 files changed, 23 insertions(+), 2 deletions(-) diff --git a/include/trace/events/napi.h b/include/trace/events/napi.h index 188deca..8fe1e93 100644 --- a/include/trace/events/napi.h +++ b/include/trace/events/napi.h @@ -6,10 +6,31 @@ #include <linux/netdevice.h> #include <linux/tracepoint.h> +#include <linux/ftrace.h> + +#define NO_DEV "(no_device)" + +TRACE_EVENT(napi_poll, -DECLARE_TRACE(napi_poll, TP_PROTO(struct napi_struct *napi), - TP_ARGS(napi)); + + TP_ARGS(napi), + + TP_STRUCT__entry( + __field( struct napi_struct *, napi) + __string( dev_name, napi->dev ? 
napi->dev->name : NO_DEV) + ), + + TP_fast_assign( + __entry->napi = napi; + __assign_str(dev_name, napi->dev ? napi->dev->name : NO_DEV); + ), + + TP_printk("napi poll on napi struct %p for device %s", + __entry->napi, __get_str(dev_name)) +); + +#undef NO_DEV #endif /* _TRACE_NAPI_H_ */ ^ permalink raw reply related [flat|nested] 93+ messages in thread
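Because the converted event has a fixed format string, its output can be consumed without debugfs format parsing. A minimal sketch (not part of the patch): counting napi_poll events per device from lines like the ones quoted in the changelog, where "(no_device)" is the NO_DEV fallback emitted when napi->dev is NULL. The third sample line is a made-up duplicate to exercise the counting.

```python
import re
from collections import Counter

NAPI_RE = re.compile(r"napi_poll: napi poll on napi struct \w+ for device (\S+)")

def polls_per_device(lines):
    """Count napi_poll events per device name."""
    return Counter(m.group(1) for m in map(NAPI_RE.search, lines) if m)

trace = [
    "<idle>-0 [001] 241302.750777: napi_poll: napi poll on napi struct f6acc480 for device eth3",
    "<idle>-0 [000] 241302.852389: napi_poll: napi poll on napi struct f5d0d70c for device eth1",
    "<idle>-0 [000] 241302.852395: napi_poll: napi poll on napi struct f5d0d70c for device eth1",
]
print(polls_per_device(trace))
```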
* Re: [PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT 2010-08-23 9:43 ` [PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT Koki Sanagi @ 2010-08-24 3:52 ` David Miller 2010-09-08 8:34 ` [tip:perf/core] napi: Convert " tip-bot for Neil Horman 1 sibling, 0 replies; 93+ messages in thread From: David Miller @ 2010-08-24 3:52 UTC (permalink / raw) To: sanagi.koki Cc: netdev, linux-kernel, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec, mathieu.desnoyers From: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Date: Mon, 23 Aug 2010 18:43:51 +0900 > From: Neil Horman <nhorman@tuxdriver.com> > > This patch converts trace_napi_poll from DECLARE_EVENT to TRACE_EVENT to improve > the usability of napi_poll tracepoint. > > <idle>-0 [001] 241302.750777: napi_poll: napi poll on napi struct f6acc480 for device eth3 > <idle>-0 [000] 241302.852389: napi_poll: napi poll on napi struct f5d0d70c for device eth1 > > An original patch is below. > http://marc.info/?l=linux-kernel&m=126021713809450&w=2 > Signed-off-by: Neil Horman <nhorman@tuxdriver.com> > > And add a fix by Steven Rostedt. > http://marc.info/?l=linux-kernel&m=126150506519173&w=2 > > Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> ^ permalink raw reply [flat|nested] 93+ messages in thread
* [tip:perf/core] napi: Convert trace_napi_poll to TRACE_EVENT 2010-08-23 9:43 ` [PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT Koki Sanagi 2010-08-24 3:52 ` David Miller @ 2010-09-08 8:34 ` tip-bot for Neil Horman 1 sibling, 0 replies; 93+ messages in thread From: tip-bot for Neil Horman @ 2010-09-08 8:34 UTC (permalink / raw) To: linux-tip-commits Cc: mingo, mathieu.desnoyers, sanagi.koki, fweisbec, rostedt, nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel, eric.dumazet, kaneshige.kenji, davem, izumi.taku, kosaki.motohiro Commit-ID: 3e4b10d7a4d2a78af64f8096dc7cdb3bebd65adb Gitweb: http://git.kernel.org/tip/3e4b10d7a4d2a78af64f8096dc7cdb3bebd65adb Author: Neil Horman <nhorman@tuxdriver.com> AuthorDate: Mon, 23 Aug 2010 18:43:51 +0900 Committer: Frederic Weisbecker <fweisbec@gmail.com> CommitDate: Tue, 7 Sep 2010 17:51:01 +0200 napi: Convert trace_napi_poll to TRACE_EVENT This patch converts trace_napi_poll from DECLARE_EVENT to TRACE_EVENT to improve the usability of napi_poll tracepoint. <idle>-0 [001] 241302.750777: napi_poll: napi poll on napi struct f6acc480 for device eth3 <idle>-0 [000] 241302.852389: napi_poll: napi poll on napi struct f5d0d70c for device eth1 The original patch is below: http://marc.info/?l=linux-kernel&m=126021713809450&w=2 [ sanagi.koki@jp.fujitsu.com: And add a fix by Steven Rostedt: http://marc.info/?l=linux-kernel&m=126150506519173&w=2 ] Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: David S. 
Miller <davem@davemloft.net> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com> Cc: Izumo Taku <izumi.taku@jp.fujitsu.com> Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Scott Mcmillan <scott.a.mcmillan@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> LKML-Reference: <4C7242D7.4050009@jp.fujitsu.com> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> --- include/trace/events/napi.h | 25 +++++++++++++++++++++++-- 1 files changed, 23 insertions(+), 2 deletions(-) diff --git a/include/trace/events/napi.h b/include/trace/events/napi.h index 188deca..8fe1e93 100644 --- a/include/trace/events/napi.h +++ b/include/trace/events/napi.h @@ -6,10 +6,31 @@ #include <linux/netdevice.h> #include <linux/tracepoint.h> +#include <linux/ftrace.h> + +#define NO_DEV "(no_device)" + +TRACE_EVENT(napi_poll, -DECLARE_TRACE(napi_poll, TP_PROTO(struct napi_struct *napi), - TP_ARGS(napi)); + + TP_ARGS(napi), + + TP_STRUCT__entry( + __field( struct napi_struct *, napi) + __string( dev_name, napi->dev ? napi->dev->name : NO_DEV) + ), + + TP_fast_assign( + __entry->napi = napi; + __assign_str(dev_name, napi->dev ? napi->dev->name : NO_DEV); + ), + + TP_printk("napi poll on napi struct %p for device %s", + __entry->napi, __get_str(dev_name)) +); + +#undef NO_DEV #endif /* _TRACE_NAPI_H_ */ ^ permalink raw reply related [flat|nested] 93+ messages in thread
* [PATCH v4 3/5] netdev: add tracepoints to netdev layer 2010-08-23 9:41 [PATCH v4 0/5] netdev: show a process of packets Koki Sanagi 2010-08-23 9:42 ` [PATCH v4 1/5] irq: add tracepoint to softirq_raise Koki Sanagi 2010-08-23 9:43 ` [PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT Koki Sanagi @ 2010-08-23 9:45 ` Koki Sanagi 2010-08-24 3:53 ` David Miller 2010-09-08 8:34 ` [tip:perf/core] netdev: Add " tip-bot for Koki Sanagi 2010-08-23 9:46 ` [PATCH v4 4/5] skb: add tracepoints to freeing skb Koki Sanagi ` (2 subsequent siblings) 5 siblings, 2 replies; 93+ messages in thread From: Koki Sanagi @ 2010-08-23 9:45 UTC (permalink / raw) To: netdev Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec, mathieu.desnoyers This patch adds tracepoint to dev_queue_xmit, dev_hard_start_xmit, netif_rx and netif_receive_skb. These tracepoints help you to monitor network driver's input/output. <idle>-0 [001] 112447.902030: netif_rx: dev=eth1 skbaddr=f3ef0900 len=84 <idle>-0 [001] 112447.902039: netif_receive_skb: dev=eth1 skbaddr=f3ef0900 len=84 sshd-6828 [000] 112447.903257: net_dev_queue: dev=eth4 skbaddr=f3fca538 len=226 sshd-6828 [000] 112447.903260: net_dev_xmit: dev=eth4 skbaddr=f3fca538 len=226 rc=0 Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> --- include/trace/events/net.h | 82 ++++++++++++++++++++++++++++++++++++++++++++ net/core/dev.c | 6 +++ net/core/net-traces.c | 1 + 3 files changed, 89 insertions(+), 0 deletions(-) diff --git a/include/trace/events/net.h b/include/trace/events/net.h new file mode 100644 index 0000000..5f247f5 --- /dev/null +++ b/include/trace/events/net.h @@ -0,0 +1,82 @@ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM net + +#if !defined(_TRACE_NET_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_NET_H + +#include <linux/skbuff.h> +#include <linux/netdevice.h> +#include <linux/ip.h> +#include <linux/tracepoint.h> + +TRACE_EVENT(net_dev_xmit, + 
+ TP_PROTO(struct sk_buff *skb, + int rc), + + TP_ARGS(skb, rc), + + TP_STRUCT__entry( + __field( void *, skbaddr ) + __field( unsigned int, len ) + __field( int, rc ) + __string( name, skb->dev->name ) + ), + + TP_fast_assign( + __entry->skbaddr = skb; + __entry->len = skb->len; + __entry->rc = rc; + __assign_str(name, skb->dev->name); + ), + + TP_printk("dev=%s skbaddr=%p len=%u rc=%d", + __get_str(name), __entry->skbaddr, __entry->len, __entry->rc) +); + +DECLARE_EVENT_CLASS(net_dev_template, + + TP_PROTO(struct sk_buff *skb), + + TP_ARGS(skb), + + TP_STRUCT__entry( + __field( void *, skbaddr ) + __field( unsigned int, len ) + __string( name, skb->dev->name ) + ), + + TP_fast_assign( + __entry->skbaddr = skb; + __entry->len = skb->len; + __assign_str(name, skb->dev->name); + ), + + TP_printk("dev=%s skbaddr=%p len=%u", + __get_str(name), __entry->skbaddr, __entry->len) +) + +DEFINE_EVENT(net_dev_template, net_dev_queue, + + TP_PROTO(struct sk_buff *skb), + + TP_ARGS(skb) +); + +DEFINE_EVENT(net_dev_template, netif_receive_skb, + + TP_PROTO(struct sk_buff *skb), + + TP_ARGS(skb) +); + +DEFINE_EVENT(net_dev_template, netif_rx, + + TP_PROTO(struct sk_buff *skb), + + TP_ARGS(skb) +); +#endif /* _TRACE_NET_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> diff --git a/net/core/dev.c b/net/core/dev.c index 7cd5237..c9b026a 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -128,6 +128,7 @@ #include <linux/jhash.h> #include <linux/random.h> #include <trace/events/napi.h> +#include <trace/events/net.h> #include <linux/pci.h> #include "net-sysfs.h" @@ -1978,6 +1979,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, } rc = ops->ndo_start_xmit(skb, dev); + trace_net_dev_xmit(skb, rc); if (rc == NETDEV_TX_OK) txq_trans_update(txq); return rc; @@ -1998,6 +2000,7 @@ gso: skb_dst_drop(nskb); rc = ops->ndo_start_xmit(nskb, dev); + trace_net_dev_xmit(nskb, rc); if (unlikely(rc != NETDEV_TX_OK)) { if (rc & ~NETDEV_TX_MASK) 
goto out_kfree_gso_skb; @@ -2186,6 +2189,7 @@ int dev_queue_xmit(struct sk_buff *skb) #ifdef CONFIG_NET_CLS_ACT skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS); #endif + trace_net_dev_queue(skb); if (q->enqueue) { rc = __dev_xmit_skb(skb, q, dev, txq); goto out; @@ -2525,6 +2529,7 @@ int netif_rx(struct sk_buff *skb) if (netdev_tstamp_prequeue) net_timestamp_check(skb); + trace_netif_rx(skb); #ifdef CONFIG_RPS { struct rps_dev_flow voidflow, *rflow = &voidflow; @@ -2841,6 +2846,7 @@ static int __netif_receive_skb(struct sk_buff *skb) if (!netdev_tstamp_prequeue) net_timestamp_check(skb); + trace_netif_receive_skb(skb); if (vlan_tx_tag_present(skb) && vlan_hwaccel_do_receive(skb)) return NET_RX_SUCCESS; diff --git a/net/core/net-traces.c b/net/core/net-traces.c index afa6380..7f1bb2a 100644 --- a/net/core/net-traces.c +++ b/net/core/net-traces.c @@ -26,6 +26,7 @@ #define CREATE_TRACE_POINTS #include <trace/events/skb.h> +#include <trace/events/net.h> #include <trace/events/napi.h> EXPORT_TRACEPOINT_SYMBOL_GPL(kfree_skb); ^ permalink raw reply related [flat|nested] 93+ messages in thread
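The Qdisc timing described in the cover letter (time from enqueue to the driver handing the packet off) can be derived from these two events by pairing net_dev_queue with net_dev_xmit on skbaddr. The sketch below uses the sample lines quoted in the changelog; the pairing logic is illustrative, not the actual netdev-times implementation.

```python
import re

QUEUE_RE = re.compile(r"(\d+\.\d+): net_dev_queue: dev=(\S+) skbaddr=(\w+) len=(\d+)")
XMIT_RE = re.compile(r"(\d+\.\d+): net_dev_xmit: dev=(\S+) skbaddr=(\w+) len=(\d+) rc=(-?\d+)")

def qdisc_latencies(lines):
    """Match each net_dev_queue to the net_dev_xmit with the same skbaddr
    and return (dev, skbaddr, len, latency_in_msec) tuples."""
    queued = {}   # skbaddr -> (timestamp, dev, len)
    results = []
    for line in lines:
        m = QUEUE_RE.search(line)
        if m:
            queued[m.group(3)] = (float(m.group(1)), m.group(2), int(m.group(4)))
            continue
        m = XMIT_RE.search(line)
        if m and m.group(3) in queued:
            ts0, dev, length = queued.pop(m.group(3))
            results.append((dev, m.group(3), length, (float(m.group(1)) - ts0) * 1e3))
    return results

# Sample lines taken from the patch description:
trace = [
    "sshd-6828 [000] 112447.903257: net_dev_queue: dev=eth4 skbaddr=f3fca538 len=226",
    "sshd-6828 [000] 112447.903260: net_dev_xmit: dev=eth4 skbaddr=f3fca538 len=226 rc=0",
]
print(qdisc_latencies(trace))
```

For the sample pair above this yields a 0.003 msec enqueue-to-driver latency, matching the point1/point2 interval shown in the cover letter's transmit table.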
* Re: [PATCH v4 3/5] netdev: add tracepoints to netdev layer 2010-08-23 9:45 ` [PATCH v4 3/5] netdev: add tracepoints to netdev layer Koki Sanagi @ 2010-08-24 3:53 ` David Miller 2010-09-08 8:34 ` [tip:perf/core] netdev: Add " tip-bot for Koki Sanagi 1 sibling, 0 replies; 93+ messages in thread From: David Miller @ 2010-08-24 3:53 UTC (permalink / raw) To: sanagi.koki Cc: netdev, linux-kernel, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec, mathieu.desnoyers From: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Date: Mon, 23 Aug 2010 18:45:02 +0900 > This patch adds tracepoint to dev_queue_xmit, dev_hard_start_xmit, netif_rx and > netif_receive_skb. These tracepoints help you to monitor network driver's > input/output. > > <idle>-0 [001] 112447.902030: netif_rx: dev=eth1 skbaddr=f3ef0900 len=84 > <idle>-0 [001] 112447.902039: netif_receive_skb: dev=eth1 skbaddr=f3ef0900 len=84 > sshd-6828 [000] 112447.903257: net_dev_queue: dev=eth4 skbaddr=f3fca538 len=226 > sshd-6828 [000] 112447.903260: net_dev_xmit: dev=eth4 skbaddr=f3fca538 len=226 rc=0 > > Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> ^ permalink raw reply [flat|nested] 93+ messages in thread
* [tip:perf/core] netdev: Add tracepoints to netdev layer 2010-08-23 9:45 ` [PATCH v4 3/5] netdev: add tracepoints to netdev layer Koki Sanagi 2010-08-24 3:53 ` David Miller @ 2010-09-08 8:34 ` tip-bot for Koki Sanagi 1 sibling, 0 replies; 93+ messages in thread From: tip-bot for Koki Sanagi @ 2010-09-08 8:34 UTC (permalink / raw) To: linux-tip-commits Cc: mingo, mathieu.desnoyers, sanagi.koki, fweisbec, rostedt, nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel, eric.dumazet, kaneshige.kenji, davem, izumi.taku, kosaki.motohiro Commit-ID: cf66ba58b5cb8b1526e9dd2fb96ff8db048d4d44 Gitweb: http://git.kernel.org/tip/cf66ba58b5cb8b1526e9dd2fb96ff8db048d4d44 Author: Koki Sanagi <sanagi.koki@jp.fujitsu.com> AuthorDate: Mon, 23 Aug 2010 18:45:02 +0900 Committer: Frederic Weisbecker <fweisbec@gmail.com> CommitDate: Tue, 7 Sep 2010 17:51:33 +0200 netdev: Add tracepoints to netdev layer This patch adds tracepoint to dev_queue_xmit, dev_hard_start_xmit, netif_rx and netif_receive_skb. These tracepoints help you to monitor network driver's input/output. <idle>-0 [001] 112447.902030: netif_rx: dev=eth1 skbaddr=f3ef0900 len=84 <idle>-0 [001] 112447.902039: netif_receive_skb: dev=eth1 skbaddr=f3ef0900 len=84 sshd-6828 [000] 112447.903257: net_dev_queue: dev=eth4 skbaddr=f3fca538 len=226 sshd-6828 [000] 112447.903260: net_dev_xmit: dev=eth4 skbaddr=f3fca538 len=226 rc=0 Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Acked-by: David S. 
Miller <davem@davemloft.net> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com> Cc: Izumo Taku <izumi.taku@jp.fujitsu.com> Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Scott Mcmillan <scott.a.mcmillan@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> LKML-Reference: <4C72431E.3000901@jp.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> --- include/trace/events/net.h | 82 ++++++++++++++++++++++++++++++++++++++++++++ net/core/dev.c | 6 +++ net/core/net-traces.c | 1 + 3 files changed, 89 insertions(+), 0 deletions(-) diff --git a/include/trace/events/net.h b/include/trace/events/net.h new file mode 100644 index 0000000..5f247f5 --- /dev/null +++ b/include/trace/events/net.h @@ -0,0 +1,82 @@ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM net + +#if !defined(_TRACE_NET_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_NET_H + +#include <linux/skbuff.h> +#include <linux/netdevice.h> +#include <linux/ip.h> +#include <linux/tracepoint.h> + +TRACE_EVENT(net_dev_xmit, + + TP_PROTO(struct sk_buff *skb, + int rc), + + TP_ARGS(skb, rc), + + TP_STRUCT__entry( + __field( void *, skbaddr ) + __field( unsigned int, len ) + __field( int, rc ) + __string( name, skb->dev->name ) + ), + + TP_fast_assign( + __entry->skbaddr = skb; + __entry->len = skb->len; + __entry->rc = rc; + __assign_str(name, skb->dev->name); + ), + + TP_printk("dev=%s skbaddr=%p len=%u rc=%d", + __get_str(name), __entry->skbaddr, __entry->len, __entry->rc) +); + +DECLARE_EVENT_CLASS(net_dev_template, + + TP_PROTO(struct sk_buff *skb), + + TP_ARGS(skb), + + TP_STRUCT__entry( + __field( void *, skbaddr ) + __field( unsigned int, len ) + __string( name, skb->dev->name ) + ), + + TP_fast_assign( + __entry->skbaddr = skb; + __entry->len = skb->len; + __assign_str(name, skb->dev->name); + ), + + 
TP_printk("dev=%s skbaddr=%p len=%u", + __get_str(name), __entry->skbaddr, __entry->len) +) + +DEFINE_EVENT(net_dev_template, net_dev_queue, + + TP_PROTO(struct sk_buff *skb), + + TP_ARGS(skb) +); + +DEFINE_EVENT(net_dev_template, netif_receive_skb, + + TP_PROTO(struct sk_buff *skb), + + TP_ARGS(skb) +); + +DEFINE_EVENT(net_dev_template, netif_rx, + + TP_PROTO(struct sk_buff *skb), + + TP_ARGS(skb) +); +#endif /* _TRACE_NET_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> diff --git a/net/core/dev.c b/net/core/dev.c index 3721fbb..5a4fbc7 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -128,6 +128,7 @@ #include <linux/jhash.h> #include <linux/random.h> #include <trace/events/napi.h> +#include <trace/events/net.h> #include <linux/pci.h> #include "net-sysfs.h" @@ -1978,6 +1979,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, } rc = ops->ndo_start_xmit(skb, dev); + trace_net_dev_xmit(skb, rc); if (rc == NETDEV_TX_OK) txq_trans_update(txq); return rc; @@ -1998,6 +2000,7 @@ gso: skb_dst_drop(nskb); rc = ops->ndo_start_xmit(nskb, dev); + trace_net_dev_xmit(nskb, rc); if (unlikely(rc != NETDEV_TX_OK)) { if (rc & ~NETDEV_TX_MASK) goto out_kfree_gso_skb; @@ -2186,6 +2189,7 @@ int dev_queue_xmit(struct sk_buff *skb) #ifdef CONFIG_NET_CLS_ACT skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS); #endif + trace_net_dev_queue(skb); if (q->enqueue) { rc = __dev_xmit_skb(skb, q, dev, txq); goto out; @@ -2512,6 +2516,7 @@ int netif_rx(struct sk_buff *skb) if (netdev_tstamp_prequeue) net_timestamp_check(skb); + trace_netif_rx(skb); #ifdef CONFIG_RPS { struct rps_dev_flow voidflow, *rflow = &voidflow; @@ -2828,6 +2833,7 @@ static int __netif_receive_skb(struct sk_buff *skb) if (!netdev_tstamp_prequeue) net_timestamp_check(skb); + trace_netif_receive_skb(skb); if (vlan_tx_tag_present(skb) && vlan_hwaccel_do_receive(skb)) return NET_RX_SUCCESS; diff --git a/net/core/net-traces.c b/net/core/net-traces.c index afa6380..7f1bb2a 
100644 --- a/net/core/net-traces.c +++ b/net/core/net-traces.c @@ -26,6 +26,7 @@ #define CREATE_TRACE_POINTS #include <trace/events/skb.h> +#include <trace/events/net.h> #include <trace/events/napi.h> EXPORT_TRACEPOINT_SYMBOL_GPL(kfree_skb); ^ permalink raw reply related [flat|nested] 93+ messages in thread
* [PATCH v4 4/5] skb: add tracepoints to freeing skb 2010-08-23 9:41 [PATCH v4 0/5] netdev: show a process of packets Koki Sanagi ` (2 preceding siblings ...) 2010-08-23 9:45 ` [PATCH v4 3/5] netdev: add tracepoints to netdev layer Koki Sanagi @ 2010-08-23 9:46 ` Koki Sanagi 2010-08-24 3:53 ` David Miller 2010-09-08 8:35 ` [tip:perf/core] skb: Add " tip-bot for Koki Sanagi 2010-08-23 9:47 ` [PATCH v4 5/5] perf:add a script shows a process of packet Koki Sanagi 2010-08-30 23:50 ` [PATCH v4 0/5] netdev: show a process of packets Steven Rostedt 5 siblings, 2 replies; 93+ messages in thread From: Koki Sanagi @ 2010-08-23 9:46 UTC (permalink / raw) To: netdev Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec, mathieu.desnoyers This patch adds a tracepoint to consume_skb and adds trace_kfree_skb before __kfree_skb in skb_free_datagram_locked and net_tx_action. Combining this with the tracepoint on dev_hard_start_xmit, we can check how long it takes to free transmitted packets, and from that we can calculate how many packets the driver held at a given time. This is useful when dropped transmitted packets are a problem. 
sshd-6828 [000] 112689.258154: consume_skb: skbaddr=f2d99bb8 Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> --- include/trace/events/skb.h | 17 +++++++++++++++++ net/core/datagram.c | 1 + net/core/dev.c | 2 ++ net/core/skbuff.c | 1 + 4 files changed, 21 insertions(+), 0 deletions(-) diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h index 4b2be6d..75ce9d5 100644 --- a/include/trace/events/skb.h +++ b/include/trace/events/skb.h @@ -35,6 +35,23 @@ TRACE_EVENT(kfree_skb, __entry->skbaddr, __entry->protocol, __entry->location) ); +TRACE_EVENT(consume_skb, + + TP_PROTO(struct sk_buff *skb), + + TP_ARGS(skb), + + TP_STRUCT__entry( + __field( void *, skbaddr ) + ), + + TP_fast_assign( + __entry->skbaddr = skb; + ), + + TP_printk("skbaddr=%p", __entry->skbaddr) +); + TRACE_EVENT(skb_copy_datagram_iovec, TP_PROTO(const struct sk_buff *skb, int len), diff --git a/net/core/datagram.c b/net/core/datagram.c index 251997a..282806b 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -243,6 +243,7 @@ void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb) unlock_sock_fast(sk, slow); /* skb is now orphaned, can be freed outside of locked section */ + trace_kfree_skb(skb, skb_free_datagram_locked); __kfree_skb(skb); } EXPORT_SYMBOL(skb_free_datagram_locked); diff --git a/net/core/dev.c b/net/core/dev.c index c9b026a..48f7977 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -129,6 +129,7 @@ #include <linux/random.h> #include <trace/events/napi.h> #include <trace/events/net.h> +#include <trace/events/skb.h> #include <linux/pci.h> #include "net-sysfs.h" @@ -2589,6 +2590,7 @@ static void net_tx_action(struct softirq_action *h) clist = clist->next; WARN_ON(atomic_read(&skb->users)); + trace_kfree_skb(skb, net_tx_action); __kfree_skb(skb); } } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 99ef721..ef4ffa8 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -466,6 +466,7 @@ void consume_skb(struct sk_buff *skb) smp_rmb(); 
else if (likely(!atomic_dec_and_test(&skb->users))) return; + trace_consume_skb(skb); __kfree_skb(skb); } EXPORT_SYMBOL(consume_skb); ^ permalink raw reply related [flat|nested] 93+ messages in thread
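The commit message's point that "we can calculate how many packets driver had at that time" can be sketched by treating net_dev_xmit (patch 3/5) as handing an skb to the driver and consume_skb as freeing it. The sample lines below are hypothetical, though they follow the event formats from patches 3/5 and 4/5 and reuse the skbaddr quoted in this changelog.

```python
import re

EV_RE = re.compile(r"(net_dev_xmit|consume_skb): .*?skbaddr=(\w+)")

def in_flight(lines):
    """Track skbs handed to the driver (net_dev_xmit) but not yet freed
    (consume_skb); return the outstanding count after each event."""
    outstanding = set()
    counts = []
    for line in lines:
        m = EV_RE.search(line)
        if not m:
            continue
        event, skb = m.groups()
        if event == "net_dev_xmit":
            outstanding.add(skb)
        else:
            outstanding.discard(skb)
        counts.append(len(outstanding))
    return counts

# Hypothetical sample lines (second skbaddr invented for illustration):
trace = [
    "sshd-6828 [000] 112689.258100: net_dev_xmit: dev=eth4 skbaddr=f2d99bb8 len=226 rc=0",
    "sshd-6828 [000] 112689.258120: net_dev_xmit: dev=eth4 skbaddr=f2d99cc8 len=226 rc=0",
    "sshd-6828 [000] 112689.258154: consume_skb: skbaddr=f2d99bb8",
]
print(in_flight(trace))
```

A persistently growing count between xmit and free would point at the packed driver queue scenario mentioned in the cover letter.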
* Re: [PATCH v4 4/5] skb: add tracepoints to freeing skb 2010-08-23 9:46 ` [PATCH v4 4/5] skb: add tracepoints to freeing skb Koki Sanagi @ 2010-08-24 3:53 ` David Miller 2010-09-08 8:35 ` [tip:perf/core] skb: Add " tip-bot for Koki Sanagi 1 sibling, 0 replies; 93+ messages in thread From: David Miller @ 2010-08-24 3:53 UTC (permalink / raw) To: sanagi.koki Cc: netdev, linux-kernel, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec, mathieu.desnoyers From: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Date: Mon, 23 Aug 2010 18:46:12 +0900 > This patch adds tracepoint to consume_skb and add trace_kfree_skb before > __kfree_skb in skb_free_datagram_locked and net_tx_action. > Combinating with tracepoint on dev_hard_start_xmit, we can check how long it > takes to free transmited packets. And using it, we can calculate how many > packets driver had at that time. It is useful when a drop of transmited packet > is a problem. > > sshd-6828 [000] 112689.258154: consume_skb: skbaddr=f2d99bb8 > > Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> ^ permalink raw reply [flat|nested] 93+ messages in thread
* [tip:perf/core] skb: Add tracepoints to freeing skb 2010-08-23 9:46 ` [PATCH v4 4/5] skb: add tracepoints to freeing skb Koki Sanagi 2010-08-24 3:53 ` David Miller @ 2010-09-08 8:35 ` tip-bot for Koki Sanagi 1 sibling, 0 replies; 93+ messages in thread From: tip-bot for Koki Sanagi @ 2010-09-08 8:35 UTC (permalink / raw) To: linux-tip-commits Cc: mingo, mathieu.desnoyers, sanagi.koki, fweisbec, rostedt, nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel, eric.dumazet, kaneshige.kenji, davem, izumi.taku, kosaki.motohiro Commit-ID: 07dc22e7295f25526f110d704655ff0ea7687420 Gitweb: http://git.kernel.org/tip/07dc22e7295f25526f110d704655ff0ea7687420 Author: Koki Sanagi <sanagi.koki@jp.fujitsu.com> AuthorDate: Mon, 23 Aug 2010 18:46:12 +0900 Committer: Frederic Weisbecker <fweisbec@gmail.com> CommitDate: Tue, 7 Sep 2010 17:51:53 +0200 skb: Add tracepoints to freeing skb This patch adds a tracepoint to consume_skb and adds trace_kfree_skb before __kfree_skb in skb_free_datagram_locked and net_tx_action. Combining this with the tracepoint on dev_hard_start_xmit, we can check how long it takes to free transmitted packets, and from that we can calculate how many packets the driver held at a given time. This is useful when dropped transmitted packets are a problem. sshd-6828 [000] 112689.258154: consume_skb: skbaddr=f2d99bb8 Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Acked-by: David S. 
Miller <davem@davemloft.net> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com> Cc: Izumo Taku <izumi.taku@jp.fujitsu.com> Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Scott Mcmillan <scott.a.mcmillan@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> LKML-Reference: <4C724364.50903@jp.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> --- include/trace/events/skb.h | 17 +++++++++++++++++ net/core/datagram.c | 1 + net/core/dev.c | 2 ++ net/core/skbuff.c | 1 + 4 files changed, 21 insertions(+), 0 deletions(-) diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h index 4b2be6d..75ce9d5 100644 --- a/include/trace/events/skb.h +++ b/include/trace/events/skb.h @@ -35,6 +35,23 @@ TRACE_EVENT(kfree_skb, __entry->skbaddr, __entry->protocol, __entry->location) ); +TRACE_EVENT(consume_skb, + + TP_PROTO(struct sk_buff *skb), + + TP_ARGS(skb), + + TP_STRUCT__entry( + __field( void *, skbaddr ) + ), + + TP_fast_assign( + __entry->skbaddr = skb; + ), + + TP_printk("skbaddr=%p", __entry->skbaddr) +); + TRACE_EVENT(skb_copy_datagram_iovec, TP_PROTO(const struct sk_buff *skb, int len), diff --git a/net/core/datagram.c b/net/core/datagram.c index 251997a..282806b 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -243,6 +243,7 @@ void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb) unlock_sock_fast(sk, slow); /* skb is now orphaned, can be freed outside of locked section */ + trace_kfree_skb(skb, skb_free_datagram_locked); __kfree_skb(skb); } EXPORT_SYMBOL(skb_free_datagram_locked); diff --git a/net/core/dev.c b/net/core/dev.c index 5a4fbc7..2308cce 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -129,6 +129,7 @@ #include <linux/random.h> #include <trace/events/napi.h> #include <trace/events/net.h> +#include 
<trace/events/skb.h> #include <linux/pci.h> #include "net-sysfs.h" @@ -2576,6 +2577,7 @@ static void net_tx_action(struct softirq_action *h) clist = clist->next; WARN_ON(atomic_read(&skb->users)); + trace_kfree_skb(skb, net_tx_action); __kfree_skb(skb); } } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 3a2513f..12e61e3 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -466,6 +466,7 @@ void consume_skb(struct sk_buff *skb) smp_rmb(); else if (likely(!atomic_dec_and_test(&skb->users))) return; + trace_consume_skb(skb); __kfree_skb(skb); } EXPORT_SYMBOL(consume_skb); ^ permalink raw reply related [flat|nested] 93+ messages in thread
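The intended use of these free-side tracepoints — pairing a consume_skb event with the earlier net_dev_xmit event for the same skb address — can be sketched in a few lines of Python. This is a trimmed-down model of the matching done by the netdev-times script in patch 5/5; the handler names mirror the script, but the timestamps and skb address below are invented for illustration:

```python
# Minimal model of matching a consume_skb event back to the
# net_dev_xmit event for the same skb, as the netdev-times script
# does. Timestamps are in nanoseconds; values are illustrative.

def diff_msec(src, dst):
    """Interval in milliseconds between two nanosecond timestamps."""
    return (dst - src) / 1000000.0

tx_xmit_list = []   # packets handed to the driver, newest first
tx_free_list = []   # packets whose free time is known

def handle_net_dev_xmit(skbaddr, time):
    tx_xmit_list.insert(0, {'skbaddr': skbaddr, 'xmit_t': time})

def handle_consume_skb(skbaddr, time):
    # A freed skb is identified purely by its address.
    for i, skb in enumerate(tx_xmit_list):
        if skb['skbaddr'] == skbaddr:
            skb['free_t'] = time
            tx_free_list.append(skb)
            del tx_xmit_list[i]
            return

handle_net_dev_xmit(0xf2d99bb8, 1000000)   # driver got the skb
handle_consume_skb(0xf2d99bb8, 1018000)    # freed 0.018 msec later
# tx_free_list now holds the matched record with both timestamps
```

A side effect of this bookkeeping is that the length of `tx_xmit_list` at any instant is the number of transmitted packets the driver still holds, which is the drop-debugging use case the changelog mentions.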
* [PATCH v4 5/5] perf:add a script shows a process of packet 2010-08-23 9:41 [PATCH v4 0/5] netdev: show a process of packets Koki Sanagi ` (3 preceding siblings ...) 2010-08-23 9:46 ` [PATCH v4 4/5] skb: add tracepoints to freeing skb Koki Sanagi @ 2010-08-23 9:47 ` Koki Sanagi 2010-08-24 3:53 ` David Miller ` (2 more replies) 2010-08-30 23:50 ` [PATCH v4 0/5] netdev: show a process of packets Steven Rostedt 5 siblings, 3 replies; 93+ messages in thread From: Koki Sanagi @ 2010-08-23 9:47 UTC (permalink / raw) To: netdev Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec, mathieu.desnoyers Add a perf script which shows a process of packets and processed time. It helps us to investigate networking or network device. If you want to use it, install perf and record perf.data like following. #perf trace record netdev-times [script] If you set script, perf gathers records until it ends. If not, you must Ctrl-C to stop recording. And if you want a report from record, #perf trace report netdev-times [options] If you use some options, you can limit an output. Option is below. tx: show only process of tx packets rx: show only process of rx packets dev=: show a process specified with this option debug: work with debug mode. It shows buffer status. For example, if you want to show a process of received packets associated with eth4, #perf trace report netdev-times rx dev=eth4 106133.171439sec cpu=0 irq_entry(+0.000msec irq=24:eth4) | softirq_entry(+0.006msec) | |---netif_receive_skb(+0.010msec skb=f2d15900 len=100) | | | skb_copy_datagram_iovec(+0.039msec 10291::10291) | napi_poll_exit(+0.022msec eth4) This perf script helps us to analyze a process time of transmit/receive sequence. 
Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> --- tools/perf/scripts/python/bin/netdev-times-record | 8 + tools/perf/scripts/python/bin/netdev-times-report | 5 + tools/perf/scripts/python/netdev-times.py | 464 +++++++++++++++++++++ 3 files changed, 477 insertions(+), 0 deletions(-) diff --git a/tools/perf/scripts/python/bin/netdev-times-record b/tools/perf/scripts/python/bin/netdev-times-record new file mode 100644 index 0000000..2b59511 --- /dev/null +++ b/tools/perf/scripts/python/bin/netdev-times-record @@ -0,0 +1,8 @@ +#!/bin/bash +perf record -c 1 -f -R -a -e net:net_dev_xmit -e net:net_dev_queue \ + -e net:netif_receive_skb -e net:netif_rx \ + -e skb:consume_skb -e skb:kfree_skb \ + -e skb:skb_copy_datagram_iovec -e napi:napi_poll \ + -e irq:irq_handler_entry -e irq:irq_handler_exit \ + -e irq:softirq_entry -e irq:softirq_exit \ + -e irq:softirq_raise $@ diff --git a/tools/perf/scripts/python/bin/netdev-times-report b/tools/perf/scripts/python/bin/netdev-times-report new file mode 100644 index 0000000..c3d0a63 --- /dev/null +++ b/tools/perf/scripts/python/bin/netdev-times-report @@ -0,0 +1,5 @@ +#!/bin/bash +# description: display a process of packet and processing time +# args: [tx] [rx] [dev=] [debug] + +perf trace -s ~/libexec/perf-core/scripts/python/netdev-times.py $@ diff --git a/tools/perf/scripts/python/netdev-times.py b/tools/perf/scripts/python/netdev-times.py new file mode 100644 index 0000000..9aa0a32 --- /dev/null +++ b/tools/perf/scripts/python/netdev-times.py @@ -0,0 +1,464 @@ +# Display a process of packets and processed time. +# It helps us to investigate networking or network device. +# +# options +# tx: show only tx chart +# rx: show only rx chart +# dev=: show only thing related to specified device +# debug: work with debug mode. It shows buffer status. 
+ +import os +import sys + +sys.path.append(os.environ['PERF_EXEC_PATH'] + \ + '/scripts/python/Perf-Trace-Util/lib/Perf/Trace') + +from perf_trace_context import * +from Core import * +from Util import * + +all_event_list = []; # insert all tracepoint event related with this script +irq_dic = {}; # key is cpu and value is a list which stacks irqs + # which raise NET_RX softirq +net_rx_dic = {}; # key is cpu and value include time of NET_RX softirq-entry + # and a list which stacks receive +receive_hunk_list = []; # a list which include a sequence of receive events +rx_skb_list = []; # received packet list for matching + # skb_copy_datagram_iovec + +buffer_budget = 65536; # the budget of rx_skb_list, tx_queue_list and + # tx_xmit_list +of_count_rx_skb_list = 0; # overflow count + +tx_queue_list = []; # list of packets which pass through dev_queue_xmit +of_count_tx_queue_list = 0; # overflow count + +tx_xmit_list = []; # list of packets which pass through dev_hard_start_xmit +of_count_tx_xmit_list = 0; # overflow count + +tx_free_list = []; # list of packets which is freed + +# options +show_tx = 0; +show_rx = 0; +dev = 0; # store a name of device specified by option "dev=" +debug = 0; + +# indices of event_info tuple +EINFO_IDX_NAME= 0 +EINFO_IDX_CONTEXT=1 +EINFO_IDX_CPU= 2 +EINFO_IDX_TIME= 3 +EINFO_IDX_PID= 4 +EINFO_IDX_COMM= 5 + +# Calculate a time interval(msec) from src(nsec) to dst(nsec) +def diff_msec(src, dst): + return (dst - src) / 1000000.0 + +# Display a process of transmitting a packet +def print_transmit(hunk): + if dev != 0 and hunk['dev'].find(dev) < 0: + return + print "%7s %5d %6d.%06dsec %12.3fmsec %12.3fmsec" % \ + (hunk['dev'], hunk['len'], + nsecs_secs(hunk['queue_t']), + nsecs_nsecs(hunk['queue_t'])/1000, + diff_msec(hunk['queue_t'], hunk['xmit_t']), + diff_msec(hunk['xmit_t'], hunk['free_t'])) + +# Format for displaying rx packet processing +PF_IRQ_ENTRY= " irq_entry(+%.3fmsec irq=%d:%s)" +PF_SOFT_ENTRY=" softirq_entry(+%.3fmsec)" 
+PF_NAPI_POLL= " napi_poll_exit(+%.3fmsec %s)" +PF_JOINT= " |" +PF_WJOINT= " | |" +PF_NET_RECV= " |---netif_receive_skb(+%.3fmsec skb=%x len=%d)" +PF_NET_RX= " |---netif_rx(+%.3fmsec skb=%x)" +PF_CPY_DGRAM= " | skb_copy_datagram_iovec(+%.3fmsec %d:%s)" +PF_KFREE_SKB= " | kfree_skb(+%.3fmsec location=%x)" +PF_CONS_SKB= " | consume_skb(+%.3fmsec)" + +# Display a process of received packets and interrputs associated with +# a NET_RX softirq +def print_receive(hunk): + show_hunk = 0 + irq_list = hunk['irq_list'] + cpu = irq_list[0]['cpu'] + base_t = irq_list[0]['irq_ent_t'] + # check if this hunk should be showed + if dev != 0: + for i in range(len(irq_list)): + if irq_list[i]['name'].find(dev) >= 0: + show_hunk = 1 + break + else: + show_hunk = 1 + if show_hunk == 0: + return + + print "%d.%06dsec cpu=%d" % \ + (nsecs_secs(base_t), nsecs_nsecs(base_t)/1000, cpu) + for i in range(len(irq_list)): + print PF_IRQ_ENTRY % \ + (diff_msec(base_t, irq_list[i]['irq_ent_t']), + irq_list[i]['irq'], irq_list[i]['name']) + print PF_JOINT + irq_event_list = irq_list[i]['event_list'] + for j in range(len(irq_event_list)): + irq_event = irq_event_list[j] + if irq_event['event'] == 'netif_rx': + print PF_NET_RX % \ + (diff_msec(base_t, irq_event['time']), + irq_event['skbaddr']) + print PF_JOINT + print PF_SOFT_ENTRY % \ + diff_msec(base_t, hunk['sirq_ent_t']) + print PF_JOINT + event_list = hunk['event_list'] + for i in range(len(event_list)): + event = event_list[i] + if event['event_name'] == 'napi_poll': + print PF_NAPI_POLL % \ + (diff_msec(base_t, event['event_t']), event['dev']) + if i == len(event_list) - 1: + print "" + else: + print PF_JOINT + else: + print PF_NET_RECV % \ + (diff_msec(base_t, event['event_t']), event['skbaddr'], + event['len']) + if 'comm' in event.keys(): + print PF_WJOINT + print PF_CPY_DGRAM % \ + (diff_msec(base_t, event['comm_t']), + event['pid'], event['comm']) + elif 'handle' in event.keys(): + print PF_WJOINT + if event['handle'] == "kfree_skb": + 
print PF_KFREE_SKB % \ + (diff_msec(base_t, + event['comm_t']), + event['location']) + elif event['handle'] == "consume_skb": + print PF_CONS_SKB % \ + diff_msec(base_t, + event['comm_t']) + print PF_JOINT + +def trace_begin(): + global show_tx + global show_rx + global dev + global debug + + for i in range(len(sys.argv)): + if i == 0: + continue + arg = sys.argv[i] + if arg == 'tx': + show_tx = 1 + elif arg =='rx': + show_rx = 1 + elif arg.find('dev=',0, 4) >= 0: + dev = arg[4:] + elif arg == 'debug': + debug = 1 + if show_tx == 0 and show_rx == 0: + show_tx = 1 + show_rx = 1 + +def trace_end(): + # order all events in time + all_event_list.sort(lambda a,b :cmp(a[EINFO_IDX_TIME], + b[EINFO_IDX_TIME])) + # process all events + for i in range(len(all_event_list)): + event_info = all_event_list[i] + name = event_info[EINFO_IDX_NAME] + if name == 'irq__softirq_exit': + handle_irq_softirq_exit(event_info) + elif name == 'irq__softirq_entry': + handle_irq_softirq_entry(event_info) + elif name == 'irq__softirq_raise': + handle_irq_softirq_raise(event_info) + elif name == 'irq__irq_handler_entry': + handle_irq_handler_entry(event_info) + elif name == 'irq__irq_handler_exit': + handle_irq_handler_exit(event_info) + elif name == 'napi__napi_poll': + handle_napi_poll(event_info) + elif name == 'net__netif_receive_skb': + handle_netif_receive_skb(event_info) + elif name == 'net__netif_rx': + handle_netif_rx(event_info) + elif name == 'skb__skb_copy_datagram_iovec': + handle_skb_copy_datagram_iovec(event_info) + elif name == 'net__net_dev_queue': + handle_net_dev_queue(event_info) + elif name == 'net__net_dev_xmit': + handle_net_dev_xmit(event_info) + elif name == 'skb__kfree_skb': + handle_kfree_skb(event_info) + elif name == 'skb__consume_skb': + handle_consume_skb(event_info) + # display receive hunks + if show_rx: + for i in range(len(receive_hunk_list)): + print_receive(receive_hunk_list[i]) + # display transmit hunks + if show_tx: + print " dev len Qdisc " \ + " 
netdevice free" + for i in range(len(tx_free_list)): + print_transmit(tx_free_list[i]) + if debug: + print "debug buffer status" + print "----------------------------" + print "xmit Qdisc:remain:%d overflow:%d" % \ + (len(tx_queue_list), of_count_tx_queue_list) + print "xmit netdevice:remain:%d overflow:%d" % \ + (len(tx_xmit_list), of_count_tx_xmit_list) + print "receive:remain:%d overflow:%d" % \ + (len(rx_skb_list), of_count_rx_skb_list) + +# called from perf, when it finds a correspoinding event +def irq__softirq_entry(name, context, cpu, sec, nsec, pid, comm, vec): + if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX": + return + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec) + all_event_list.append(event_info) + +def irq__softirq_exit(name, context, cpu, sec, nsec, pid, comm, vec): + if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX": + return + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec) + all_event_list.append(event_info) + +def irq__softirq_raise(name, context, cpu, sec, nsec, pid, comm, vec): + if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX": + return + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec) + all_event_list.append(event_info) + +def irq__irq_handler_entry(name, context, cpu, sec, nsec, pid, comm, + irq, irq_name): + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, + irq, irq_name) + all_event_list.append(event_info) + +def irq__irq_handler_exit(name, context, cpu, sec, nsec, pid, comm, irq, ret): + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, irq, ret) + all_event_list.append(event_info) + +def napi__napi_poll(name, context, cpu, sec, nsec, pid, comm, napi, dev_name): + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, + napi, dev_name) + all_event_list.append(event_info) + +def net__netif_receive_skb(name, context, cpu, sec, nsec, pid, comm, skbaddr, + skblen, dev_name): + event_info = (name, context, cpu, 
nsecs(sec, nsec), pid, comm, + skbaddr, skblen, dev_name) + all_event_list.append(event_info) + +def net__netif_rx(name, context, cpu, sec, nsec, pid, comm, skbaddr, + skblen, dev_name): + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, + skbaddr, skblen, dev_name) + all_event_list.append(event_info) + +def net__net_dev_queue(name, context, cpu, sec, nsec, pid, comm, + skbaddr, skblen, dev_name): + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, + skbaddr, skblen, dev_name) + all_event_list.append(event_info) + +def net__net_dev_xmit(name, context, cpu, sec, nsec, pid, comm, + skbaddr, skblen, rc, dev_name): + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, + skbaddr, skblen, rc ,dev_name) + all_event_list.append(event_info) + +def skb__kfree_skb(name, context, cpu, sec, nsec, pid, comm, + skbaddr, protocol, location): + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, + skbaddr, protocol, location) + all_event_list.append(event_info) + +def skb__consume_skb(name, context, cpu, sec, nsec, pid, comm, skbaddr): + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, + skbaddr) + all_event_list.append(event_info) + +def skb__skb_copy_datagram_iovec(name, context, cpu, sec, nsec, pid, comm, + skbaddr, skblen): + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, + skbaddr, skblen) + all_event_list.append(event_info) + +def handle_irq_handler_entry(event_info): + (name, context, cpu, time, pid, comm, irq, irq_name) = event_info + if cpu not in irq_dic.keys(): + irq_dic[cpu] = [] + irq_record = {'irq':irq, 'name':irq_name, 'cpu':cpu, 'irq_ent_t':time} + irq_dic[cpu].append(irq_record) + +def handle_irq_handler_exit(event_info): + (name, context, cpu, time, pid, comm, irq, ret) = event_info + if cpu not in irq_dic.keys(): + return + irq_record = irq_dic[cpu].pop() + if irq != irq_record['irq']: + return + irq_record.update({'irq_ext_t':time}) + # if an irq doesn't include NET_RX softirq, drop. 
+ if 'event_list' in irq_record.keys(): + irq_dic[cpu].append(irq_record) + +def handle_irq_softirq_raise(event_info): + (name, context, cpu, time, pid, comm, vec) = event_info + if cpu not in irq_dic.keys() \ + or len(irq_dic[cpu]) == 0: + return + irq_record = irq_dic[cpu].pop() + if 'event_list' in irq_record.keys(): + irq_event_list = irq_record['event_list'] + else: + irq_event_list = [] + irq_event_list.append({'time':time, 'event':'sirq_raise'}) + irq_record.update({'event_list':irq_event_list}) + irq_dic[cpu].append(irq_record) + +def handle_irq_softirq_entry(event_info): + (name, context, cpu, time, pid, comm, vec) = event_info + net_rx_dic[cpu] = {'sirq_ent_t':time, 'event_list':[]} + +def handle_irq_softirq_exit(event_info): + (name, context, cpu, time, pid, comm, vec) = event_info + irq_list = [] + event_list = 0 + if cpu in irq_dic.keys(): + irq_list = irq_dic[cpu] + del irq_dic[cpu] + if cpu in net_rx_dic.keys(): + sirq_ent_t = net_rx_dic[cpu]['sirq_ent_t'] + event_list = net_rx_dic[cpu]['event_list'] + del net_rx_dic[cpu] + if irq_list == [] or event_list == 0: + return + rec_data = {'sirq_ent_t':sirq_ent_t, 'sirq_ext_t':time, + 'irq_list':irq_list, 'event_list':event_list} + # merge information realted to a NET_RX softirq + receive_hunk_list.append(rec_data) + +def handle_napi_poll(event_info): + (name, context, cpu, time, pid, comm, napi, dev_name) = event_info + if cpu in net_rx_dic.keys(): + event_list = net_rx_dic[cpu]['event_list'] + rec_data = {'event_name':'napi_poll', + 'dev':dev_name, 'event_t':time} + event_list.append(rec_data) + +def handle_netif_rx(event_info): + (name, context, cpu, time, pid, comm, + skbaddr, skblen, dev_name) = event_info + if cpu not in irq_dic.keys() \ + or len(irq_dic[cpu]) == 0: + return + irq_record = irq_dic[cpu].pop() + if 'event_list' in irq_record.keys(): + irq_event_list = irq_record['event_list'] + else: + irq_event_list = [] + irq_event_list.append({'time':time, 'event':'netif_rx', + 'skbaddr':skbaddr, 
'skblen':skblen, 'dev_name':dev_name}) + irq_record.update({'event_list':irq_event_list}) + irq_dic[cpu].append(irq_record) + +def handle_netif_receive_skb(event_info): + global of_count_rx_skb_list + + (name, context, cpu, time, pid, comm, + skbaddr, skblen, dev_name) = event_info + if cpu in net_rx_dic.keys(): + rec_data = {'event_name':'netif_receive_skb', + 'event_t':time, 'skbaddr':skbaddr, 'len':skblen} + event_list = net_rx_dic[cpu]['event_list'] + event_list.append(rec_data) + rx_skb_list.insert(0, rec_data) + if len(rx_skb_list) > buffer_budget: + rx_skb_list.pop() + of_count_rx_skb_list += 1 + +def handle_net_dev_queue(event_info): + global of_count_tx_queue_list + + (name, context, cpu, time, pid, comm, + skbaddr, skblen, dev_name) = event_info + skb = {'dev':dev_name, 'skbaddr':skbaddr, 'len':skblen, 'queue_t':time} + tx_queue_list.insert(0, skb) + if len(tx_queue_list) > buffer_budget: + tx_queue_list.pop() + of_count_tx_queue_list += 1 + +def handle_net_dev_xmit(event_info): + global of_count_tx_xmit_list + + (name, context, cpu, time, pid, comm, + skbaddr, skblen, rc, dev_name) = event_info + if rc == 0: # NETDEV_TX_OK + for i in range(len(tx_queue_list)): + skb = tx_queue_list[i] + if skb['skbaddr'] == skbaddr: + skb['xmit_t'] = time + tx_xmit_list.insert(0, skb) + del tx_queue_list[i] + if len(tx_xmit_list) > buffer_budget: + tx_xmit_list.pop() + of_count_tx_xmit_list += 1 + return + +def handle_kfree_skb(event_info): + (name, context, cpu, time, pid, comm, + skbaddr, protocol, location) = event_info + for i in range(len(tx_queue_list)): + skb = tx_queue_list[i] + if skb['skbaddr'] == skbaddr: + del tx_queue_list[i] + return + for i in range(len(tx_xmit_list)): + skb = tx_xmit_list[i] + if skb['skbaddr'] == skbaddr: + skb['free_t'] = time + tx_free_list.append(skb) + del tx_xmit_list[i] + return + for i in range(len(rx_skb_list)): + rec_data = rx_skb_list[i] + if rec_data['skbaddr'] == skbaddr: + rec_data.update({'handle':"kfree_skb", + 
'comm':comm, 'pid':pid, 'comm_t':time}) + del rx_skb_list[i] + return + +def handle_consume_skb(event_info): + (name, context, cpu, time, pid, comm, skbaddr) = event_info + for i in range(len(tx_xmit_list)): + skb = tx_xmit_list[i] + if skb['skbaddr'] == skbaddr: + skb['free_t'] = time + tx_free_list.append(skb) + del tx_xmit_list[i] + return + +def handle_skb_copy_datagram_iovec(event_info): + (name, context, cpu, time, pid, comm, skbaddr, skblen) = event_info + for i in range(len(rx_skb_list)): + rec_data = rx_skb_list[i] + if skbaddr == rec_data['skbaddr']: + rec_data.update({'handle':"skb_copy_datagram_iovec", + 'comm':comm, 'pid':pid, 'comm_t':time}) + del rx_skb_list[i] + return ^ permalink raw reply related [flat|nested] 93+ messages in thread
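The receive-side bookkeeping in netdev-times.py follows the same address-matching pattern: a netif_receive_skb record waits in rx_skb_list until a skb_copy_datagram_iovec (or free) event with the same skbaddr claims it, which links the packet to the consuming process. A trimmed-down, self-contained sketch (event values invented for illustration):

```python
# Trimmed-down model of the rx matching in netdev-times.py: a received
# skb is remembered until the datagram-copy event with the same address
# claims it, linking the packet to the consuming pid/comm.

rx_skb_list = []   # unmatched netif_receive_skb records, newest first

def handle_netif_receive_skb(skbaddr, skblen, time):
    rx_skb_list.insert(0,
        {'skbaddr': skbaddr, 'len': skblen, 'event_t': time})

def handle_skb_copy_datagram_iovec(skbaddr, pid, comm, time):
    for i, rec in enumerate(rx_skb_list):
        if rec['skbaddr'] == skbaddr:
            rec.update({'handle': 'skb_copy_datagram_iovec',
                        'pid': pid, 'comm': comm, 'comm_t': time})
            del rx_skb_list[i]
            return rec
    return None   # copy event for an skb we never saw received

handle_netif_receive_skb(0xf2d15900, 100, 10000)
rec = handle_skb_copy_datagram_iovec(0xf2d15900, 10291, 'sshd', 49000)
```

This is why the script's output can print the pid:comm pair (e.g. `10291::10291` in the example above) next to the copy event: the match by skbaddr carries the process identity back onto the receive record.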
* Re: [PATCH v4 5/5] perf:add a script shows a process of packet 2010-08-23 9:47 ` [PATCH v4 5/5] perf:add a script shows a process of packet Koki Sanagi @ 2010-08-24 3:53 ` David Miller 2010-09-07 16:57 ` Frederic Weisbecker 2010-09-08 8:35 ` [tip:perf/core] perf: Add a script to show packets processing tip-bot for Koki Sanagi 2 siblings, 0 replies; 93+ messages in thread From: David Miller @ 2010-08-24 3:53 UTC (permalink / raw) To: sanagi.koki Cc: netdev, linux-kernel, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec, mathieu.desnoyers From: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Date: Mon, 23 Aug 2010 18:47:09 +0900 > Add a perf script which shows a process of packets and processed time. > It helps us to investigate networking or network device. ... > Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> ^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v4 5/5] perf:add a script shows a process of packet 2010-08-23 9:47 ` [PATCH v4 5/5] perf:add a script shows a process of packet Koki Sanagi 2010-08-24 3:53 ` David Miller @ 2010-09-07 16:57 ` Frederic Weisbecker 2010-09-08 8:35 ` [tip:perf/core] perf: Add a script to show packets processing tip-bot for Koki Sanagi 2 siblings, 0 replies; 93+ messages in thread From: Frederic Weisbecker @ 2010-09-07 16:57 UTC (permalink / raw) To: Koki Sanagi Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, mathieu.desnoyers On Mon, Aug 23, 2010 at 06:47:09PM +0900, Koki Sanagi wrote: > Add a perf script which shows a process of packets and processed time. > It helps us to investigate networking or network device. > > If you want to use it, install perf and record perf.data like following. > > #perf trace record netdev-times [script] > > If you set script, perf gathers records until it ends. > If not, you must Ctrl-C to stop recording. > > And if you want a report from record, > > #perf trace report netdev-times [options] > > If you use some options, you can limit an output. > Option is below. > > tx: show only process of tx packets > rx: show only process of rx packets > dev=: show a process specified with this option > debug: work with debug mode. It shows buffer status. > > For example, if you want to show a process of received packets associated > with eth4, > > #perf trace report netdev-times rx dev=eth4 > 106133.171439sec cpu=0 > irq_entry(+0.000msec irq=24:eth4) > | > softirq_entry(+0.006msec) > | > |---netif_receive_skb(+0.010msec skb=f2d15900 len=100) > | | > | skb_copy_datagram_iovec(+0.039msec 10291::10291) > | > napi_poll_exit(+0.022msec eth4) > > This perf script helps us to analyze a process time of transmit/receive > sequence. 
> > Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> > --- > tools/perf/scripts/python/bin/netdev-times-record | 8 + > tools/perf/scripts/python/bin/netdev-times-report | 5 + > tools/perf/scripts/python/netdev-times.py | 464 +++++++++++++++++++++ > 3 files changed, 477 insertions(+), 0 deletions(-) > > diff --git a/tools/perf/scripts/python/bin/netdev-times-record b/tools/perf/scripts/python/bin/netdev-times-record > new file mode 100644 > index 0000000..2b59511 > --- /dev/null > +++ b/tools/perf/scripts/python/bin/netdev-times-record > @@ -0,0 +1,8 @@ > +#!/bin/bash > +perf record -c 1 -f -R -a -e net:net_dev_xmit -e net:net_dev_queue \ Nano-nits: -c 1 and -R are now default settings for tracepoints and -f is not needed anymore. I've removed them. > +all_event_list = []; # insert all tracepoint event related with this script Ah I didn't know ";" works with python :) > +def trace_end(): > + # order all events in time > + all_event_list.sort(lambda a,b :cmp(a[EINFO_IDX_TIME], > + b[EINFO_IDX_TIME])) Events already arrive in time order to the scripts. Thanks! ^ permalink raw reply [flat|nested] 93+ messages in thread
* [tip:perf/core] perf: Add a script to show packets processing 2010-08-23 9:47 ` [PATCH v4 5/5] perf:add a script shows a process of packet Koki Sanagi 2010-08-24 3:53 ` David Miller 2010-09-07 16:57 ` Frederic Weisbecker @ 2010-09-08 8:35 ` tip-bot for Koki Sanagi 2 siblings, 0 replies; 93+ messages in thread From: tip-bot for Koki Sanagi @ 2010-09-08 8:35 UTC (permalink / raw) To: linux-tip-commits Cc: mingo, mathieu.desnoyers, sanagi.koki, fweisbec, rostedt, nhorman, scott.a.mcmillan, tglx, laijs, hpa, linux-kernel, eric.dumazet, tzanussi, kaneshige.kenji, davem, izumi.taku, kosaki.motohiro Commit-ID: 359d5106a2ff4ffa2ba129ec8f54743c341dabfc Gitweb: http://git.kernel.org/tip/359d5106a2ff4ffa2ba129ec8f54743c341dabfc Author: Koki Sanagi <sanagi.koki@jp.fujitsu.com> AuthorDate: Mon, 23 Aug 2010 18:47:09 +0900 Committer: Frederic Weisbecker <fweisbec@gmail.com> CommitDate: Tue, 7 Sep 2010 18:43:32 +0200 perf: Add a script to show packets processing Add a perf script which shows packets processing and processed time. It helps us to investigate networking or network devices. If you want to use it, install perf and record perf.data like following. If you set script, perf gathers records until it ends. If not, you must Ctrl-C to stop recording. And if you want a report from record, If you use some options, you can limit the output. Option is below. tx: show only tx packets processing rx: show only rx packets processing dev=: show processing on this device debug: work with debug mode. It shows buffer status. For example, if you want to show received packets processing associated with eth4, 106133.171439sec cpu=0 irq_entry(+0.000msec irq=24:eth4) | softirq_entry(+0.006msec) | |---netif_receive_skb(+0.010msec skb=f2d15900 len=100) | | | skb_copy_datagram_iovec(+0.039msec 10291::10291) | napi_poll_exit(+0.022msec eth4) This perf script helps us to analyze the processing time of a transmit/receive sequence. 
Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com> Cc: Izumo Taku <izumi.taku@jp.fujitsu.com> Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Scott Mcmillan <scott.a.mcmillan@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <4C72439D.3040001@jp.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> --- tools/perf/scripts/python/bin/netdev-times-record | 8 + tools/perf/scripts/python/bin/netdev-times-report | 5 + tools/perf/scripts/python/netdev-times.py | 464 +++++++++++++++++++++ 3 files changed, 477 insertions(+), 0 deletions(-) diff --git a/tools/perf/scripts/python/bin/netdev-times-record b/tools/perf/scripts/python/bin/netdev-times-record new file mode 100644 index 0000000..d931a82 --- /dev/null +++ b/tools/perf/scripts/python/bin/netdev-times-record @@ -0,0 +1,8 @@ +#!/bin/bash +perf record -a -e net:net_dev_xmit -e net:net_dev_queue \ + -e net:netif_receive_skb -e net:netif_rx \ + -e skb:consume_skb -e skb:kfree_skb \ + -e skb:skb_copy_datagram_iovec -e napi:napi_poll \ + -e irq:irq_handler_entry -e irq:irq_handler_exit \ + -e irq:softirq_entry -e irq:softirq_exit \ + -e irq:softirq_raise $@ diff --git a/tools/perf/scripts/python/bin/netdev-times-report b/tools/perf/scripts/python/bin/netdev-times-report new file mode 100644 index 0000000..c3d0a63 --- /dev/null +++ b/tools/perf/scripts/python/bin/netdev-times-report @@ -0,0 +1,5 @@ +#!/bin/bash +# description: display a process of packet and processing time +# args: [tx] [rx] [dev=] [debug] + +perf trace -s ~/libexec/perf-core/scripts/python/netdev-times.py $@ diff --git a/tools/perf/scripts/python/netdev-times.py 
b/tools/perf/scripts/python/netdev-times.py new file mode 100644 index 0000000..9aa0a32 --- /dev/null +++ b/tools/perf/scripts/python/netdev-times.py @@ -0,0 +1,464 @@ +# Display a process of packets and processed time. +# It helps us to investigate networking or network device. +# +# options +# tx: show only tx chart +# rx: show only rx chart +# dev=: show only thing related to specified device +# debug: work with debug mode. It shows buffer status. + +import os +import sys + +sys.path.append(os.environ['PERF_EXEC_PATH'] + \ + '/scripts/python/Perf-Trace-Util/lib/Perf/Trace') + +from perf_trace_context import * +from Core import * +from Util import * + +all_event_list = []; # insert all tracepoint event related with this script +irq_dic = {}; # key is cpu and value is a list which stacks irqs + # which raise NET_RX softirq +net_rx_dic = {}; # key is cpu and value include time of NET_RX softirq-entry + # and a list which stacks receive +receive_hunk_list = []; # a list which include a sequence of receive events +rx_skb_list = []; # received packet list for matching + # skb_copy_datagram_iovec + +buffer_budget = 65536; # the budget of rx_skb_list, tx_queue_list and + # tx_xmit_list +of_count_rx_skb_list = 0; # overflow count + +tx_queue_list = []; # list of packets which pass through dev_queue_xmit +of_count_tx_queue_list = 0; # overflow count + +tx_xmit_list = []; # list of packets which pass through dev_hard_start_xmit +of_count_tx_xmit_list = 0; # overflow count + +tx_free_list = []; # list of packets which is freed + +# options +show_tx = 0; +show_rx = 0; +dev = 0; # store a name of device specified by option "dev=" +debug = 0; + +# indices of event_info tuple +EINFO_IDX_NAME= 0 +EINFO_IDX_CONTEXT=1 +EINFO_IDX_CPU= 2 +EINFO_IDX_TIME= 3 +EINFO_IDX_PID= 4 +EINFO_IDX_COMM= 5 + +# Calculate a time interval(msec) from src(nsec) to dst(nsec) +def diff_msec(src, dst): + return (dst - src) / 1000000.0 + +# Display a process of transmitting a packet +def 
+def print_transmit(hunk):
+	if dev != 0 and hunk['dev'].find(dev) < 0:
+		return
+	print "%7s %5d %6d.%06dsec %12.3fmsec %12.3fmsec" % \
+		(hunk['dev'], hunk['len'],
+		nsecs_secs(hunk['queue_t']),
+		nsecs_nsecs(hunk['queue_t'])/1000,
+		diff_msec(hunk['queue_t'], hunk['xmit_t']),
+		diff_msec(hunk['xmit_t'], hunk['free_t']))
+
+# Format for displaying rx packet processing
+PF_IRQ_ENTRY= "  irq_entry(+%.3fmsec irq=%d:%s)"
+PF_SOFT_ENTRY="     softirq_entry(+%.3fmsec)"
+PF_NAPI_POLL= "           napi_poll_exit(+%.3fmsec %s)"
+PF_JOINT= "         |"
+PF_WJOINT= "         |            |"
+PF_NET_RECV= "         |---netif_receive_skb(+%.3fmsec skb=%x len=%d)"
+PF_NET_RX= "         |---netif_rx(+%.3fmsec skb=%x)"
+PF_CPY_DGRAM= "         |      skb_copy_datagram_iovec(+%.3fmsec %d:%s)"
+PF_KFREE_SKB= "         |      kfree_skb(+%.3fmsec location=%x)"
+PF_CONS_SKB= "         |      consume_skb(+%.3fmsec)"
+
+# Display a process of received packets and interrupts associated with
+# a NET_RX softirq
+def print_receive(hunk):
+	show_hunk = 0
+	irq_list = hunk['irq_list']
+	cpu = irq_list[0]['cpu']
+	base_t = irq_list[0]['irq_ent_t']
+	# check if this hunk should be showed
+	if dev != 0:
+		for i in range(len(irq_list)):
+			if irq_list[i]['name'].find(dev) >= 0:
+				show_hunk = 1
+				break
+	else:
+		show_hunk = 1
+	if show_hunk == 0:
+		return
+
+	print "%d.%06dsec cpu=%d" % \
+		(nsecs_secs(base_t), nsecs_nsecs(base_t)/1000, cpu)
+	for i in range(len(irq_list)):
+		print PF_IRQ_ENTRY % \
+			(diff_msec(base_t, irq_list[i]['irq_ent_t']),
+			irq_list[i]['irq'], irq_list[i]['name'])
+		print PF_JOINT
+		irq_event_list = irq_list[i]['event_list']
+		for j in range(len(irq_event_list)):
+			irq_event = irq_event_list[j]
+			if irq_event['event'] == 'netif_rx':
+				print PF_NET_RX % \
+					(diff_msec(base_t, irq_event['time']),
+					irq_event['skbaddr'])
+				print PF_JOINT
+	print PF_SOFT_ENTRY % \
+		diff_msec(base_t, hunk['sirq_ent_t'])
+	print PF_JOINT
+	event_list = hunk['event_list']
+	for i in range(len(event_list)):
+		event = event_list[i]
+		if event['event_name'] == 'napi_poll':
+			print PF_NAPI_POLL % \
+				(diff_msec(base_t, event['event_t']), event['dev'])
+			if i == len(event_list) - 1:
+				print ""
+			else:
+				print PF_JOINT
+		else:
+			print PF_NET_RECV % \
+				(diff_msec(base_t, event['event_t']), event['skbaddr'],
+				event['len'])
+			if 'comm' in event.keys():
+				print PF_WJOINT
+				print PF_CPY_DGRAM % \
+					(diff_msec(base_t, event['comm_t']),
+					event['pid'], event['comm'])
+			elif 'handle' in event.keys():
+				print PF_WJOINT
+				if event['handle'] == "kfree_skb":
+					print PF_KFREE_SKB % \
+						(diff_msec(base_t,
+						event['comm_t']),
+						event['location'])
+				elif event['handle'] == "consume_skb":
+					print PF_CONS_SKB % \
+						diff_msec(base_t,
+							event['comm_t'])
+			print PF_JOINT
+
+def trace_begin():
+	global show_tx
+	global show_rx
+	global dev
+	global debug
+
+	for i in range(len(sys.argv)):
+		if i == 0:
+			continue
+		arg = sys.argv[i]
+		if arg == 'tx':
+			show_tx = 1
+		elif arg == 'rx':
+			show_rx = 1
+		elif arg.find('dev=', 0, 4) >= 0:
+			dev = arg[4:]
+		elif arg == 'debug':
+			debug = 1
+	if show_tx == 0 and show_rx == 0:
+		show_tx = 1
+		show_rx = 1
+
+def trace_end():
+	# order all events in time
+	all_event_list.sort(lambda a,b :cmp(a[EINFO_IDX_TIME],
+					    b[EINFO_IDX_TIME]))
+	# process all events
+	for i in range(len(all_event_list)):
+		event_info = all_event_list[i]
+		name = event_info[EINFO_IDX_NAME]
+		if name == 'irq__softirq_exit':
+			handle_irq_softirq_exit(event_info)
+		elif name == 'irq__softirq_entry':
+			handle_irq_softirq_entry(event_info)
+		elif name == 'irq__softirq_raise':
+			handle_irq_softirq_raise(event_info)
+		elif name == 'irq__irq_handler_entry':
+			handle_irq_handler_entry(event_info)
+		elif name == 'irq__irq_handler_exit':
+			handle_irq_handler_exit(event_info)
+		elif name == 'napi__napi_poll':
+			handle_napi_poll(event_info)
+		elif name == 'net__netif_receive_skb':
+			handle_netif_receive_skb(event_info)
+		elif name == 'net__netif_rx':
+			handle_netif_rx(event_info)
+		elif name == 'skb__skb_copy_datagram_iovec':
+			handle_skb_copy_datagram_iovec(event_info)
+		elif name == 'net__net_dev_queue':
+			handle_net_dev_queue(event_info)
+		elif name == 'net__net_dev_xmit':
+			handle_net_dev_xmit(event_info)
+		elif name == 'skb__kfree_skb':
+			handle_kfree_skb(event_info)
+		elif name == 'skb__consume_skb':
+			handle_consume_skb(event_info)
+	# display receive hunks
+	if show_rx:
+		for i in range(len(receive_hunk_list)):
+			print_receive(receive_hunk_list[i])
+	# display transmit hunks
+	if show_tx:
+		print "   dev    len      Qdisc        " \
+			"       netdevice             free"
+		for i in range(len(tx_free_list)):
+			print_transmit(tx_free_list[i])
+	if debug:
+		print "debug buffer status"
+		print "----------------------------"
+		print "xmit Qdisc:remain:%d overflow:%d" % \
+			(len(tx_queue_list), of_count_tx_queue_list)
+		print "xmit netdevice:remain:%d overflow:%d" % \
+			(len(tx_xmit_list), of_count_tx_xmit_list)
+		print "receive:remain:%d overflow:%d" % \
+			(len(rx_skb_list), of_count_rx_skb_list)
+
+# called from perf, when it finds a corresponding event
+def irq__softirq_entry(name, context, cpu, sec, nsec, pid, comm, vec):
+	if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
+		return
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
+	all_event_list.append(event_info)
+
+def irq__softirq_exit(name, context, cpu, sec, nsec, pid, comm, vec):
+	if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
+		return
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
+	all_event_list.append(event_info)
+
+def irq__softirq_raise(name, context, cpu, sec, nsec, pid, comm, vec):
+	if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
+		return
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
+	all_event_list.append(event_info)
+
+def irq__irq_handler_entry(name, context, cpu, sec, nsec, pid, comm,
+			irq, irq_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			irq, irq_name)
+	all_event_list.append(event_info)
+
+def irq__irq_handler_exit(name, context, cpu, sec, nsec, pid, comm, irq, ret):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, irq, ret)
+	all_event_list.append(event_info)
+
+def napi__napi_poll(name, context, cpu, sec, nsec, pid, comm, napi, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			napi, dev_name)
+	all_event_list.append(event_info)
+
+def net__netif_receive_skb(name, context, cpu, sec, nsec, pid, comm, skbaddr,
+			skblen, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen, dev_name)
+	all_event_list.append(event_info)
+
+def net__netif_rx(name, context, cpu, sec, nsec, pid, comm, skbaddr,
+			skblen, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen, dev_name)
+	all_event_list.append(event_info)
+
+def net__net_dev_queue(name, context, cpu, sec, nsec, pid, comm,
+			skbaddr, skblen, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen, dev_name)
+	all_event_list.append(event_info)
+
+def net__net_dev_xmit(name, context, cpu, sec, nsec, pid, comm,
+			skbaddr, skblen, rc, dev_name):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen, rc, dev_name)
+	all_event_list.append(event_info)
+
+def skb__kfree_skb(name, context, cpu, sec, nsec, pid, comm,
+			skbaddr, protocol, location):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, protocol, location)
+	all_event_list.append(event_info)
+
+def skb__consume_skb(name, context, cpu, sec, nsec, pid, comm, skbaddr):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr)
+	all_event_list.append(event_info)
+
+def skb__skb_copy_datagram_iovec(name, context, cpu, sec, nsec, pid, comm,
+			skbaddr, skblen):
+	event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+			skbaddr, skblen)
+	all_event_list.append(event_info)
+
+def handle_irq_handler_entry(event_info):
+	(name, context, cpu, time, pid, comm, irq, irq_name) = event_info
+	if cpu not in irq_dic.keys():
+		irq_dic[cpu] = []
+	irq_record = {'irq':irq, 'name':irq_name, 'cpu':cpu, 'irq_ent_t':time}
+	irq_dic[cpu].append(irq_record)
+
+def handle_irq_handler_exit(event_info):
+	(name, context, cpu, time, pid, comm, irq, ret) = event_info
+	if cpu not in irq_dic.keys():
+		return
+	irq_record = irq_dic[cpu].pop()
+	if irq != irq_record['irq']:
+		return
+	irq_record.update({'irq_ext_t':time})
+	# if an irq doesn't include NET_RX softirq, drop.
+	if 'event_list' in irq_record.keys():
+		irq_dic[cpu].append(irq_record)
+
+def handle_irq_softirq_raise(event_info):
+	(name, context, cpu, time, pid, comm, vec) = event_info
+	if cpu not in irq_dic.keys() \
+	    or len(irq_dic[cpu]) == 0:
+		return
+	irq_record = irq_dic[cpu].pop()
+	if 'event_list' in irq_record.keys():
+		irq_event_list = irq_record['event_list']
+	else:
+		irq_event_list = []
+	irq_event_list.append({'time':time, 'event':'sirq_raise'})
+	irq_record.update({'event_list':irq_event_list})
+	irq_dic[cpu].append(irq_record)
+
+def handle_irq_softirq_entry(event_info):
+	(name, context, cpu, time, pid, comm, vec) = event_info
+	net_rx_dic[cpu] = {'sirq_ent_t':time, 'event_list':[]}
+
+def handle_irq_softirq_exit(event_info):
+	(name, context, cpu, time, pid, comm, vec) = event_info
+	irq_list = []
+	event_list = 0
+	if cpu in irq_dic.keys():
+		irq_list = irq_dic[cpu]
+		del irq_dic[cpu]
+	if cpu in net_rx_dic.keys():
+		sirq_ent_t = net_rx_dic[cpu]['sirq_ent_t']
+		event_list = net_rx_dic[cpu]['event_list']
+		del net_rx_dic[cpu]
+	if irq_list == [] or event_list == 0:
+		return
+	rec_data = {'sirq_ent_t':sirq_ent_t, 'sirq_ext_t':time,
+		    'irq_list':irq_list, 'event_list':event_list}
+	# merge information related to a NET_RX softirq
+	receive_hunk_list.append(rec_data)
+
+def handle_napi_poll(event_info):
+	(name, context, cpu, time, pid, comm, napi, dev_name) = event_info
+	if cpu in net_rx_dic.keys():
+		event_list = net_rx_dic[cpu]['event_list']
+		rec_data = {'event_name':'napi_poll',
+			    'dev':dev_name, 'event_t':time}
+		event_list.append(rec_data)
+
+def handle_netif_rx(event_info):
+	(name, context, cpu, time, pid, comm,
+	    skbaddr, skblen, dev_name) = event_info
+	if cpu not in irq_dic.keys() \
+	    or len(irq_dic[cpu]) == 0:
+		return
+	irq_record = irq_dic[cpu].pop()
+	if 'event_list' in irq_record.keys():
+		irq_event_list = irq_record['event_list']
+	else:
+		irq_event_list = []
+	irq_event_list.append({'time':time, 'event':'netif_rx',
+		'skbaddr':skbaddr, 'skblen':skblen, 'dev_name':dev_name})
+	irq_record.update({'event_list':irq_event_list})
+	irq_dic[cpu].append(irq_record)
+
+def handle_netif_receive_skb(event_info):
+	global of_count_rx_skb_list
+
+	(name, context, cpu, time, pid, comm,
+	    skbaddr, skblen, dev_name) = event_info
+	if cpu in net_rx_dic.keys():
+		rec_data = {'event_name':'netif_receive_skb',
+			    'event_t':time, 'skbaddr':skbaddr, 'len':skblen}
+		event_list = net_rx_dic[cpu]['event_list']
+		event_list.append(rec_data)
+		rx_skb_list.insert(0, rec_data)
+		if len(rx_skb_list) > buffer_budget:
+			rx_skb_list.pop()
+			of_count_rx_skb_list += 1
+
+def handle_net_dev_queue(event_info):
+	global of_count_tx_queue_list
+
+	(name, context, cpu, time, pid, comm,
+	    skbaddr, skblen, dev_name) = event_info
+	skb = {'dev':dev_name, 'skbaddr':skbaddr, 'len':skblen, 'queue_t':time}
+	tx_queue_list.insert(0, skb)
+	if len(tx_queue_list) > buffer_budget:
+		tx_queue_list.pop()
+		of_count_tx_queue_list += 1
+
+def handle_net_dev_xmit(event_info):
+	global of_count_tx_xmit_list
+
+	(name, context, cpu, time, pid, comm,
+	    skbaddr, skblen, rc, dev_name) = event_info
+	if rc == 0: # NETDEV_TX_OK
+		for i in range(len(tx_queue_list)):
+			skb = tx_queue_list[i]
+			if skb['skbaddr'] == skbaddr:
+				skb['xmit_t'] = time
+				tx_xmit_list.insert(0, skb)
+				del tx_queue_list[i]
+				if len(tx_xmit_list) > buffer_budget:
+					tx_xmit_list.pop()
+					of_count_tx_xmit_list += 1
+				return
+
+def handle_kfree_skb(event_info):
+	(name, context, cpu, time, pid, comm,
+	    skbaddr, protocol, location) = event_info
+	for i in range(len(tx_queue_list)):
+		skb = tx_queue_list[i]
+		if skb['skbaddr'] == skbaddr:
+			del tx_queue_list[i]
+			return
+	for i in range(len(tx_xmit_list)):
+		skb = tx_xmit_list[i]
+		if skb['skbaddr'] == skbaddr:
+			skb['free_t'] = time
+			tx_free_list.append(skb)
+			del tx_xmit_list[i]
+			return
+	for i in range(len(rx_skb_list)):
+		rec_data = rx_skb_list[i]
+		if rec_data['skbaddr'] == skbaddr:
+			rec_data.update({'handle':"kfree_skb",
+				'comm':comm, 'pid':pid, 'comm_t':time})
+			del rx_skb_list[i]
+			return
+
+def handle_consume_skb(event_info):
+	(name, context, cpu, time, pid, comm, skbaddr) = event_info
+	for i in range(len(tx_xmit_list)):
+		skb = tx_xmit_list[i]
+		if skb['skbaddr'] == skbaddr:
+			skb['free_t'] = time
+			tx_free_list.append(skb)
+			del tx_xmit_list[i]
+			return
+
+def handle_skb_copy_datagram_iovec(event_info):
+	(name, context, cpu, time, pid, comm, skbaddr, skblen) = event_info
+	for i in range(len(rx_skb_list)):
+		rec_data = rx_skb_list[i]
+		if skbaddr == rec_data['skbaddr']:
+			rec_data.update({'handle':"skb_copy_datagram_iovec",
+				'comm':comm, 'pid':pid, 'comm_t':time})
+			del rx_skb_list[i]
+			return

^ permalink raw reply related	[flat|nested] 93+ messages in thread
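[Editorial note, not part of the thread: the transmit-side handlers above correlate three tracepoints purely by skb address, moving a record from tx_queue_list to tx_xmit_list to tx_free_list. The following is a standalone Python 3 sketch of that matching technique, with synthetic events in place of perf callbacks; the dict-based event shape is an assumption for illustration, not the patch's actual data structures.]

```python
def track_tx(events):
    """Match net_dev_queue -> net_dev_xmit -> free events by skb address.

    Returns (dev, len, queue_to_xmit_ns, xmit_to_free_ns) per packet,
    mirroring the Qdisc/netdevice/free columns of the script's tx output.
    """
    queued = {}   # skbaddr -> (dev, len, queue time)
    xmited = {}   # skbaddr -> (dev, len, queue time, xmit time)
    done = []
    for ev in sorted(events, key=lambda e: e['t']):   # order events in time
        addr = ev['skbaddr']
        if ev['name'] == 'net_dev_queue':
            queued[addr] = (ev['dev'], ev['len'], ev['t'])
        elif ev['name'] == 'net_dev_xmit' and addr in queued:
            dev, ln, qt = queued.pop(addr)
            xmited[addr] = (dev, ln, qt, ev['t'])
        elif ev['name'] in ('kfree_skb', 'consume_skb') and addr in xmited:
            dev, ln, qt, xt = xmited.pop(addr)
            done.append((dev, ln, xt - qt, ev['t'] - xt))
    return done
```

As in the real script, a free event only completes a packet that was seen at both earlier points, so unmatched events fall through harmlessly.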
* Re: [PATCH v4 0/5] netdev: show a process of packets
  2010-08-23  9:41 [PATCH v4 0/5] netdev: show a process of packets Koki Sanagi
                   ` (4 preceding siblings ...)
  2010-08-23  9:47 ` [PATCH v4 5/5] perf:add a script shows a process of packet Koki Sanagi
@ 2010-08-30 23:50 ` Steven Rostedt
  2010-09-03  2:10   ` Koki Sanagi
  5 siblings, 1 reply; 93+ messages in thread
From: Steven Rostedt @ 2010-08-30 23:50 UTC (permalink / raw)
  To: Koki Sanagi
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet,
	fweisbec, mathieu.desnoyers

On Mon, 2010-08-23 at 18:41 +0900, Koki Sanagi wrote:
> Rebase to the latest net-next.
>
> CHANGE-LOG since v3:
> 1) change arguments of softirq tracepoint into original one.
> 2) remove tracepoint of dev_kfree_skb_irq and skb_free_datagram_locked
>    and add trace_kfree_skb before __kfree_skb instead of them.
> 3) add tracepoint to netif_rx and display it by netdev-times script.
>
> These patch-set adds tracepoints to show us a process of packets.
> Using these tracepoints and existing points, we can get the time when
> packet passes through some points in transmit or receive sequence.
> For example, this is an output of perf script which is attached by patch 5/5.
>
> 106133.171439sec cpu=0
>   irq_entry(+0.000msec irq=24:eth4)
>          |
>      softirq_entry(+0.006msec)
>          |
>          |---netif_receive_skb(+0.010msec skb=f2d15900 len=100)
>          |            |
>          |      skb_copy_datagram_iovec(+0.039msec 10291::10291)
>          |
>            napi_poll_exit(+0.022msec eth4)
>
> 106134.175634sec cpu=1
>   irq_entry(+0.000msec irq=28:eth1)
>          |
>          |---netif_rx(+0.009msec skb=f3ef0a00)
>          |
>      softirq_entry(+0.018msec)
>          |
>          |---netif_receive_skb(+0.021msec skb=f3ef0a00 len=84)
>          |            |
>          |      skb_copy_datagram_iovec(+0.033msec 0:swapper)
>          |
>            napi_poll_exit(+0.035msec (no_device))
>
> The above is a receive side (eth4 is NAPI, eth1 is non-NAPI). Like this, it can
> show the receive sequence from interrupt (irq_entry) to application
> (skb_copy_datagram_iovec).
> This script shows one NET_RX softirq and the events related to it. All relative
> times are based on the first irq_entry which raises the NET_RX softirq.
>
>    dev    len      Qdisc            netdevice        free
>   eth4     74 106125.030004sec     0.006msec     0.009msec
>   eth4     87 106125.041020sec     0.007msec     0.023msec
>   eth4     66 106125.042291sec     0.003msec     0.012msec
>   eth4     66 106125.043274sec     0.006msec     0.004msec
>   eth4    850 106125.044283sec     0.007msec     0.018msec
>
> The above is a transmit side. There are three check-time-points.
> Point1 is before putting a packet to Qdisc. Point2 is after ndo_start_xmit in
> dev_hard_start_xmit; it indicates finishing putting a packet to the driver.
> Point3 is in consume_skb and kfree_skb; it indicates freeing a transmitted packet.
> Values of this script are, from left, device name, length of a packet, a time of
> point1, an interval time between point1 and point2 and an interval time between
> point2 and point3.
>
> These times are useful to analyze performance or to detect a point where
> a packet delays. For example,
> - NET_RX softirq calling is late.
> - Application is late to take a packet.
> - It takes much time to put a transmitting packet to the driver
>   (it may be caused by a packed queue)
>
> And also, these tracepoints help us to investigate a network driver's trouble
> from a memory dump because ftrace records it to memory. And ftrace is so light
> even if tracing is always on. So, in a case investigating a problem which doesn't
> reproduce, it is useful.
>

The entire series:

Acked-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

^ permalink raw reply	[flat|nested] 93+ messages in thread
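[Editorial note, not part of the thread: the Qdisc/netdevice/free columns quoted above are nanosecond timestamp differences rendered as seconds and milliseconds. A minimal Python 3 sketch of that bookkeeping follows; the helper names (nsecs, nsecs_secs, nsecs_nsecs, diff_msec) match helpers defined in an earlier part of the patch's script, but this is a re-statement under that assumption, not the patch's exact Python 2 code.]

```python
NSECS_PER_SEC = 1000000000

def nsecs(sec, nsec):
    """perf hands each event (sec, nsec); keep one absolute nanosecond value."""
    return sec * NSECS_PER_SEC + nsec

def nsecs_secs(ns):
    """Whole-second part, e.g. the '106125' in '106125.030004sec'."""
    return ns // NSECS_PER_SEC

def nsecs_nsecs(ns):
    """Sub-second remainder in nanoseconds (printed as microseconds via /1000)."""
    return ns % NSECS_PER_SEC

def diff_msec(t0, t1):
    """Interval between two check points, in milliseconds."""
    return (t1 - t0) / 1000000.0
```

So a packet queued at point1 and handed to the driver 6 microseconds later prints as the 0.006msec Qdisc-to-netdevice interval seen in the table.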
* Re: [PATCH v4 0/5] netdev: show a process of packets
  2010-08-30 23:50 ` [PATCH v4 0/5] netdev: show a process of packets Steven Rostedt
@ 2010-09-03  2:10 ` Koki Sanagi
  2010-09-03  2:17   ` David Miller
  0 siblings, 1 reply; 93+ messages in thread
From: Koki Sanagi @ 2010-09-03 2:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet,
	fweisbec, mathieu.desnoyers

(2010/08/31 8:50), Steven Rostedt wrote:
> On Mon, 2010-08-23 at 18:41 +0900, Koki Sanagi wrote:
>> Rebase to the latest net-next.
>>
>> CHANGE-LOG since v3:
>> 1) change arguments of softirq tracepoint into original one.
>> 2) remove tracepoint of dev_kfree_skb_irq and skb_free_datagram_locked
>>    and add trace_kfree_skb before __kfree_skb instead of them.
>> 3) add tracepoint to netif_rx and display it by netdev-times script.
>>
>> These patch-set adds tracepoints to show us a process of packets.
>> Using these tracepoints and existing points, we can get the time when
>> packet passes through some points in transmit or receive sequence.
>> For example, this is an output of perf script which is attached by patch 5/5.
>>
>> 106133.171439sec cpu=0
>>   irq_entry(+0.000msec irq=24:eth4)
>>          |
>>      softirq_entry(+0.006msec)
>>          |
>>          |---netif_receive_skb(+0.010msec skb=f2d15900 len=100)
>>          |            |
>>          |      skb_copy_datagram_iovec(+0.039msec 10291::10291)
>>          |
>>            napi_poll_exit(+0.022msec eth4)
>>
>> 106134.175634sec cpu=1
>>   irq_entry(+0.000msec irq=28:eth1)
>>          |
>>          |---netif_rx(+0.009msec skb=f3ef0a00)
>>          |
>>      softirq_entry(+0.018msec)
>>          |
>>          |---netif_receive_skb(+0.021msec skb=f3ef0a00 len=84)
>>          |            |
>>          |      skb_copy_datagram_iovec(+0.033msec 0:swapper)
>>          |
>>            napi_poll_exit(+0.035msec (no_device))
>>
>> The above is a receive side (eth4 is NAPI, eth1 is non-NAPI). Like this, it can
>> show the receive sequence from interrupt (irq_entry) to application
>> (skb_copy_datagram_iovec).
>> This script shows one NET_RX softirq and the events related to it. All relative
>> times are based on the first irq_entry which raises the NET_RX softirq.
>>
>>    dev    len      Qdisc            netdevice        free
>>   eth4     74 106125.030004sec     0.006msec     0.009msec
>>   eth4     87 106125.041020sec     0.007msec     0.023msec
>>   eth4     66 106125.042291sec     0.003msec     0.012msec
>>   eth4     66 106125.043274sec     0.006msec     0.004msec
>>   eth4    850 106125.044283sec     0.007msec     0.018msec
>>
>> The above is a transmit side. There are three check-time-points.
>> Point1 is before putting a packet to Qdisc. Point2 is after ndo_start_xmit in
>> dev_hard_start_xmit; it indicates finishing putting a packet to the driver.
>> Point3 is in consume_skb and kfree_skb; it indicates freeing a transmitted packet.
>> Values of this script are, from left, device name, length of a packet, a time of
>> point1, an interval time between point1 and point2 and an interval time between
>> point2 and point3.
>>
>> These times are useful to analyze performance or to detect a point where
>> a packet delays. For example,
>> - NET_RX softirq calling is late.
>> - Application is late to take a packet.
>> - It takes much time to put a transmitting packet to the driver
>>    (it may be caused by a packed queue)
>>
>> And also, these tracepoints help us to investigate a network driver's trouble
>> from a memory dump because ftrace records it to memory. And ftrace is so light
>> even if tracing is always on. So, in a case investigating a problem which doesn't
>> reproduce, it is useful.
>>
>
> The entire series:
>
> Acked-by: Steven Rostedt <rostedt@goodmis.org>
>
> -- Steve
>

Thanks for the many acks, and I have one question.

These patches have several components.

Patch 1 is a kernel component, but patches 2-5 are netdev components.
Which tree should they be included in?
If it is not net-next, I must rebase to another tree.

Thanks,
Koki Sanagi.

^ permalink raw reply	[flat|nested] 93+ messages in thread
* Re: [PATCH v4 0/5] netdev: show a process of packets
  2010-09-03  2:10 ` Koki Sanagi
@ 2010-09-03  2:17 ` David Miller
  2010-09-03  2:55   ` Koki Sanagi
  0 siblings, 1 reply; 93+ messages in thread
From: David Miller @ 2010-09-03 2:17 UTC (permalink / raw)
  To: sanagi.koki
  Cc: rostedt, netdev, linux-kernel, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet,
	fweisbec, mathieu.desnoyers

From: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Date: Fri, 03 Sep 2010 11:10:51 +0900

> Thanks for the many acks, and I have one question.
>
> These patches have several components.
>
> Patch 1 is a kernel component, but patches 2-5 are netdev components.
> Which tree should they be included in?
> If it is not net-next, I must rebase to another tree.

I would prefer it goes into the tracing tree or whatever is the most
appropriate for patch #1.

^ permalink raw reply	[flat|nested] 93+ messages in thread
* Re: [PATCH v4 0/5] netdev: show a process of packets
  2010-09-03  2:17 ` David Miller
@ 2010-09-03  2:55 ` Koki Sanagi
  2010-09-03  4:46   ` Frederic Weisbecker
  0 siblings, 1 reply; 93+ messages in thread
From: Koki Sanagi @ 2010-09-03 2:55 UTC (permalink / raw)
  To: David Miller
  Cc: rostedt, netdev, linux-kernel, kaneshige.kenji, izumi.taku,
	kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet,
	fweisbec, mathieu.desnoyers

(2010/09/03 11:17), David Miller wrote:
> From: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> Date: Fri, 03 Sep 2010 11:10:51 +0900
>
>> Thanks for the many acks, and I have one question.
>>
>> These patches have several components.
>>
>> Patch 1 is a kernel component, but patches 2-5 are netdev components.
>> Which tree should they be included in?
>> If it is not net-next, I must rebase to another tree.
>
> I would prefer it goes into the tracing tree or whatever is the most
> appropriate for patch #1.
>

O.K. I'll rebase to linux-2.6-tip.

Thanks,
Koki Sanagi.

^ permalink raw reply	[flat|nested] 93+ messages in thread
* Re: [PATCH v4 0/5] netdev: show a process of packets
  2010-09-03  2:55 ` Koki Sanagi
@ 2010-09-03  4:46 ` Frederic Weisbecker
  2010-09-03  5:12   ` Koki Sanagi
  0 siblings, 1 reply; 93+ messages in thread
From: Frederic Weisbecker @ 2010-09-03 4:46 UTC (permalink / raw)
  To: Koki Sanagi
  Cc: David Miller, rostedt, netdev, linux-kernel, kaneshige.kenji,
	izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan,
	eric.dumazet, mathieu.desnoyers

On Fri, Sep 03, 2010 at 11:55:04AM +0900, Koki Sanagi wrote:
> (2010/09/03 11:17), David Miller wrote:
> > From: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
> > Date: Fri, 03 Sep 2010 11:10:51 +0900
> >
> >> Thanks for the many acks, and I have one question.
> >>
> >> These patches have several components.
> >>
> >> Patch 1 is a kernel component, but patches 2-5 are netdev components.
> >> Which tree should they be included in?
> >> If it is not net-next, I must rebase to another tree.
> >
> > I would prefer it goes into the tracing tree or whatever is the most
> > appropriate for patch #1.
> >
>
> O.K. I'll rebase to linux-2.6-tip.

No need, they apply very well :)

I'll push that to -tip soon.

Thanks.

^ permalink raw reply	[flat|nested] 93+ messages in thread
* Re: [PATCH v4 0/5] netdev: show a process of packets
  2010-09-03  4:46 ` Frederic Weisbecker
@ 2010-09-03  5:12 ` Koki Sanagi
  0 siblings, 0 replies; 93+ messages in thread
From: Koki Sanagi @ 2010-09-03 5:12 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: David Miller, rostedt, netdev, linux-kernel, kaneshige.kenji,
	izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan,
	eric.dumazet, mathieu.desnoyers

(2010/09/03 13:46), Frederic Weisbecker wrote:
> On Fri, Sep 03, 2010 at 11:55:04AM +0900, Koki Sanagi wrote:
>> (2010/09/03 11:17), David Miller wrote:
>>> From: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
>>> Date: Fri, 03 Sep 2010 11:10:51 +0900
>>>
>>>> Thanks for the many acks, and I have one question.
>>>>
>>>> These patches have several components.
>>>>
>>>> Patch 1 is a kernel component, but patches 2-5 are netdev components.
>>>> Which tree should they be included in?
>>>> If it is not net-next, I must rebase to another tree.
>>>
>>> I would prefer it goes into the tracing tree or whatever is the most
>>> appropriate for patch #1.
>>>
>>
>> O.K. I'll rebase to linux-2.6-tip.
>
> No need, they apply very well :)
>
> I'll push that to -tip soon.
>
> Thanks.
>

O.K. Thanks!

Koki Sanagi.

^ permalink raw reply	[flat|nested] 93+ messages in thread
end of thread, other threads:[~2010-10-26  1:14 UTC | newest]

Thread overview: 93+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-23  9:41 [PATCH v4 0/5] netdev: show a process of packets Koki Sanagi
2010-08-23  9:42 ` [PATCH v4 1/5] irq: add tracepoint to softirq_raise Koki Sanagi
2010-09-03 15:29   ` Frederic Weisbecker
2010-09-03 15:39     ` Steven Rostedt
2010-09-03 15:42       ` Frederic Weisbecker
2010-09-03 15:43         ` Steven Rostedt
2010-09-03 15:50           ` Frederic Weisbecker
2010-09-06  1:46             ` Koki Sanagi
2010-09-08  8:33   ` [tip:perf/core] irq: Add " tip-bot for Lai Jiangshan
2010-09-08 11:25     ` [sparc build bug] " Ingo Molnar
2010-09-08 12:26       ` [PATCH] irq: Fix circular headers dependency Frederic Weisbecker
2010-09-09 19:54         ` [tip:perf/core] " tip-bot for Frederic Weisbecker
2010-10-18  9:44     ` [sparc build bug] Re: [tip:perf/core] irq: Add tracepoint to softirq_raise Peter Zijlstra
2010-10-18 10:11       ` Peter Zijlstra
2010-10-18 10:26         ` Heiko Carstens
2010-10-18 10:48           ` Peter Zijlstra
2010-10-19 10:58             ` Koki Sanagi
2010-10-19 11:25               ` Peter Zijlstra
2010-10-19 13:00                 ` [PATCH] tracing: Cleanup the convoluted softirq tracepoints Thomas Gleixner
2010-10-19 13:08                   ` Peter Zijlstra
2010-10-19 13:22                   ` Mathieu Desnoyers
2010-10-19 13:41                     ` Thomas Gleixner
2010-10-19 13:54                       ` Steven Rostedt
2010-10-19 14:07                         ` Thomas Gleixner
2010-10-19 14:28                           ` Mathieu Desnoyers
2010-10-19 19:49                             ` Thomas Gleixner
2010-10-19 20:55                               ` Steven Rostedt
2010-10-19 21:07                                 ` Thomas Gleixner
2010-10-19 21:23                                   ` Steven Rostedt
2010-10-19 21:48                                     ` H. Peter Anvin
2010-10-19 22:23                                       ` Steven Rostedt
2010-10-19 22:26                                         ` H. Peter Anvin
2010-10-19 22:27                                         ` Peter Zijlstra
2010-10-19 23:39                                           ` H. Peter Anvin
2010-10-19 23:45                                             ` Steven Rostedt
2010-10-20  0:43                                               ` Jason Baron
2010-10-19 22:41                                     ` Mathieu Desnoyers
2010-10-19 22:49                                       ` H. Peter Anvin
2010-10-19 23:05                                         ` Steven Rostedt
2010-10-19 23:09                                           ` H. Peter Anvin
2010-10-20 15:27                                             ` Jason Baron
2010-10-20 15:41                                               ` Mathieu Desnoyers
2010-10-25 21:54                                                 ` H. Peter Anvin
2010-10-25 22:01                                                   ` Mathieu Desnoyers
2010-10-25 22:12                                                     ` H. Peter Anvin
2010-10-25 22:19                                                       ` H. Peter Anvin
2010-10-25 22:55                                                         ` Mathieu Desnoyers
2010-10-26  0:39                                                           ` Steven Rostedt
2010-10-26  1:14                                                             ` Mathieu Desnoyers
2010-10-19 22:04                                   ` Thomas Gleixner
2010-10-19 22:33                                     ` Steven Rostedt
2010-10-21 16:18                                       ` Thomas Gleixner
2010-10-21 17:05                                         ` Steven Rostedt
2010-10-21 19:56                                           ` Thomas Gleixner
2010-10-25 22:31                                             ` H. Peter Anvin
2010-10-19 21:45                               ` Thomas Gleixner
2010-10-19 22:14                                 ` Steven Rostedt
2010-10-19 21:16                             ` David Daney
2010-10-19 21:32                               ` Jason Baron
2010-10-19 21:38                                 ` David Daney
2010-10-19 21:47                                   ` Steven Rostedt
2010-10-19 21:28                               ` Jason Baron
2010-10-19 21:55                                 ` Thomas Gleixner
2010-10-19 22:17                                   ` Thomas Gleixner
2010-10-20  1:36                                     ` Steven Rostedt
2010-10-20  1:52                                       ` Jason Baron
2010-10-25 22:32                                         ` H. Peter Anvin
2010-10-19 22:38                                   ` Jason Baron
2010-10-19 22:44                                     ` H. Peter Anvin
2010-10-19 22:56                                       ` Steven Rostedt
2010-10-19 22:57                                         ` H. Peter Anvin
2010-10-19 14:46                       ` Steven Rostedt
2010-10-19 14:00                     ` Mathieu Desnoyers
2010-10-21 14:52                   ` [tip:perf/core] " tip-bot for Thomas Gleixner
2010-08-23  9:43 ` [PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT Koki Sanagi
2010-08-24  3:52   ` David Miller
2010-09-08  8:34   ` [tip:perf/core] napi: Convert " tip-bot for Neil Horman
2010-08-23  9:45 ` [PATCH v4 3/5] netdev: add tracepoints to netdev layer Koki Sanagi
2010-08-24  3:53   ` David Miller
2010-09-08  8:34   ` [tip:perf/core] netdev: Add " tip-bot for Koki Sanagi
2010-08-23  9:46 ` [PATCH v4 4/5] skb: add tracepoints to freeing skb Koki Sanagi
2010-08-24  3:53   ` David Miller
2010-09-08  8:35   ` [tip:perf/core] skb: Add " tip-bot for Koki Sanagi
2010-08-23  9:47 ` [PATCH v4 5/5] perf:add a script shows a process of packet Koki Sanagi
2010-08-24  3:53   ` David Miller
2010-09-07 16:57   ` Frederic Weisbecker
2010-09-08  8:35   ` [tip:perf/core] perf: Add a script to show packets processing tip-bot for Koki Sanagi
2010-08-30 23:50 ` [PATCH v4 0/5] netdev: show a process of packets Steven Rostedt
2010-09-03  2:10   ` Koki Sanagi
2010-09-03  2:17     ` David Miller
2010-09-03  2:55       ` Koki Sanagi
2010-09-03  4:46         ` Frederic Weisbecker
2010-09-03  5:12           ` Koki Sanagi