* [PATCH v2 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events)
@ 2013-08-06 16:08 Oleg Nesterov
  2013-08-06 16:08 ` [PATCH v2 1/3] tracing/perf: Expand TRACE_EVENT(sched_stat_runtime) Oleg Nesterov
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Oleg Nesterov @ 2013-08-06 16:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Frederic Weisbecker, Peter Zijlstra, David Ahern,
	Masami Hiramatsu, zhangwei(Jovi),
	linux-kernel

On 08/05, Steven Rostedt wrote:
>
> > Sorry... should I resend once again ?
>
> Sure, why not. It's only 3 patches :-)

OK. Added "v2" to avoid the confusion.

The only change is

	- Reviewed-and-Acked-by: Steven Rostedt <rostedt@goodmis.org>
	+ Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
	+ Acked-by: Steven Rostedt <rostedt@goodmis.org>
	
> I wonder if the Reviewed-by assumes the Acked-by?

I dunno. But since you gave me both, I'd better preserve them all.

Oleg.

 include/trace/events/sched.h |   22 ++++++++--------------
 include/trace/ftrace.h       |   33 ++++++++++++++++++++-------------
 2 files changed, 28 insertions(+), 27 deletions(-)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/3] tracing/perf: Expand TRACE_EVENT(sched_stat_runtime)
  2013-08-06 16:08 [PATCH v2 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events) Oleg Nesterov
@ 2013-08-06 16:08 ` Oleg Nesterov
  2013-08-15 19:48   ` [tip:perf/core] tracing/perf: Expand TRACE_EVENT( sched_stat_runtime) tip-bot for Oleg Nesterov
  2013-08-06 16:08 ` [PATCH v2 2/3] tracing/perf: Reimplement TP_perf_assign() logic Oleg Nesterov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2013-08-06 16:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Frederic Weisbecker, Peter Zijlstra, David Ahern,
	Masami Hiramatsu, zhangwei(Jovi),
	linux-kernel

To simplify the review of the next patches:

1. We are going to reimplement __perf_task/__perf_count and embed them
   into TP_ARGS(). Expand TRACE_EVENT(sched_stat_runtime) into
   DECLARE_EVENT_CLASS() + DEFINE_EVENT(), this way they can use
   different TP_ARGS's.

2. Change perf_trace_##call() macro to do perf_fetch_caller_regs()
   right before perf_trace_buf_prepare().

   This way it evaluates TP_ARGS() as soon as possible; the next
   patch exploits this fact.

   Note: after 87f44bbc perf_trace_buf_prepare() doesn't need
   "struct pt_regs *regs", perhaps it makes sense to remove this
   argument. And perhaps we can teach perf_trace_buf_submit()
   to accept regs == NULL and do fetch_caller_regs(CALLER_ADDR1)
   in this case.

3. Cosmetic: the typecast from "void *" buys nothing, it just adds
   noise; remove it.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/trace/events/sched.h |    6 +++++-
 include/trace/ftrace.h       |    7 +++----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index e5586ca..249c024 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -372,7 +372,7 @@ DEFINE_EVENT(sched_stat_template, sched_stat_blocked,
  * Tracepoint for accounting runtime (time the task is executing
  * on a CPU).
  */
-TRACE_EVENT(sched_stat_runtime,
+DECLARE_EVENT_CLASS(sched_stat_runtime,
 
 	TP_PROTO(struct task_struct *tsk, u64 runtime, u64 vruntime),
 
@@ -401,6 +401,10 @@ TRACE_EVENT(sched_stat_runtime,
 			(unsigned long long)__entry->vruntime)
 );
 
+DEFINE_EVENT(sched_stat_runtime, sched_stat_runtime,
+	     TP_PROTO(struct task_struct *tsk, u64 runtime, u64 vruntime),
+	     TP_ARGS(tsk, runtime, vruntime));
+
 /*
  * Tracepoint for showing priority inheritance modifying a tasks
  * priority.
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 41a6643..618af05 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -663,15 +663,14 @@ perf_trace_##call(void *__data, proto)					\
 	int __data_size;						\
 	int rctx;							\
 									\
-	perf_fetch_caller_regs(&__regs);				\
-									\
 	__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
 	__entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\
 			     sizeof(u64));				\
 	__entry_size -= sizeof(u32);					\
 									\
-	entry = (struct ftrace_raw_##call *)perf_trace_buf_prepare(	\
-		__entry_size, event_call->event.type, &__regs, &rctx);	\
+	perf_fetch_caller_regs(&__regs);				\
+	entry = perf_trace_buf_prepare(__entry_size,			\
+			event_call->event.type, &__regs, &rctx);	\
 	if (!entry)							\
 		return;							\
 									\
-- 
1.5.5.1



* [PATCH v2 2/3] tracing/perf: Reimplement TP_perf_assign() logic
  2013-08-06 16:08 [PATCH v2 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events) Oleg Nesterov
  2013-08-06 16:08 ` [PATCH v2 1/3] tracing/perf: Expand TRACE_EVENT(sched_stat_runtime) Oleg Nesterov
@ 2013-08-06 16:08 ` Oleg Nesterov
  2013-08-15 19:48   ` [tip:perf/core] " tip-bot for Oleg Nesterov
  2013-08-06 16:08 ` [PATCH v2 3/3] tracing/perf: Avoid perf_trace_buf_*() in perf_trace_##call() when possible Oleg Nesterov
  2013-08-12 15:09 ` [PATCH v2 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events) Peter Zijlstra
  3 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2013-08-06 16:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Frederic Weisbecker, Peter Zijlstra, David Ahern,
	Masami Hiramatsu, zhangwei(Jovi),
	linux-kernel

The next patch tries to avoid the costly perf_trace_buf_* calls
when possible, but there is a problem: we can only do this if
__task == NULL, since perf_tp_event(task != NULL) has additional
code for that case.

Unfortunately, TP_perf_assign/__perf_xxx, which change the default
values of the __count/__task variables for perf_trace_buf_submit(),
are called "too late", after we have already done
perf_trace_buf_prepare(), so the optimization above can't work.

So this patch simply embeds __perf_xxx() into TP_ARGS(); this way
DECLARE_EVENT_CLASS() can use the result of these assignments,
hidden in "args", right after ftrace_get_offsets_##call(), which is
mostly trivial. This allows us to have the fast-path "__task != NULL"
check at the start, see the next patch.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/trace/events/sched.h |   16 +++-------------
 include/trace/ftrace.h       |   19 +++++++++++--------
 2 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 249c024..2e7d994 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -57,7 +57,7 @@ DECLARE_EVENT_CLASS(sched_wakeup_template,
 
 	TP_PROTO(struct task_struct *p, int success),
 
-	TP_ARGS(p, success),
+	TP_ARGS(__perf_task(p), success),
 
 	TP_STRUCT__entry(
 		__array(	char,	comm,	TASK_COMM_LEN	)
@@ -73,9 +73,6 @@ DECLARE_EVENT_CLASS(sched_wakeup_template,
 		__entry->prio		= p->prio;
 		__entry->success	= success;
 		__entry->target_cpu	= task_cpu(p);
-	)
-	TP_perf_assign(
-		__perf_task(p);
 	),
 
 	TP_printk("comm=%s pid=%d prio=%d success=%d target_cpu=%03d",
@@ -313,7 +310,7 @@ DECLARE_EVENT_CLASS(sched_stat_template,
 
 	TP_PROTO(struct task_struct *tsk, u64 delay),
 
-	TP_ARGS(tsk, delay),
+	TP_ARGS(__perf_task(tsk), __perf_count(delay)),
 
 	TP_STRUCT__entry(
 		__array( char,	comm,	TASK_COMM_LEN	)
@@ -325,10 +322,6 @@ DECLARE_EVENT_CLASS(sched_stat_template,
 		memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
 		__entry->pid	= tsk->pid;
 		__entry->delay	= delay;
-	)
-	TP_perf_assign(
-		__perf_count(delay);
-		__perf_task(tsk);
 	),
 
 	TP_printk("comm=%s pid=%d delay=%Lu [ns]",
@@ -376,7 +369,7 @@ DECLARE_EVENT_CLASS(sched_stat_runtime,
 
 	TP_PROTO(struct task_struct *tsk, u64 runtime, u64 vruntime),
 
-	TP_ARGS(tsk, runtime, vruntime),
+	TP_ARGS(tsk, __perf_count(runtime), vruntime),
 
 	TP_STRUCT__entry(
 		__array( char,	comm,	TASK_COMM_LEN	)
@@ -390,9 +383,6 @@ DECLARE_EVENT_CLASS(sched_stat_runtime,
 		__entry->pid		= tsk->pid;
 		__entry->runtime	= runtime;
 		__entry->vruntime	= vruntime;
-	)
-	TP_perf_assign(
-		__perf_count(runtime);
 	),
 
 	TP_printk("comm=%s pid=%d runtime=%Lu [ns] vruntime=%Lu [ns]",
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 618af05..4163d93 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -507,8 +507,14 @@ static inline notrace int ftrace_get_offsets_##call(			\
 #undef TP_fast_assign
 #define TP_fast_assign(args...) args
 
-#undef TP_perf_assign
-#define TP_perf_assign(args...)
+#undef __perf_addr
+#define __perf_addr(a)	(a)
+
+#undef __perf_count
+#define __perf_count(c)	(c)
+
+#undef __perf_task
+#define __perf_task(t)	(t)
 
 #undef DECLARE_EVENT_CLASS
 #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
@@ -636,16 +642,13 @@ __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call
 #define __get_str(field) (char *)__get_dynamic_array(field)
 
 #undef __perf_addr
-#define __perf_addr(a) __addr = (a)
+#define __perf_addr(a)	(__addr = (a))
 
 #undef __perf_count
-#define __perf_count(c) __count = (c)
+#define __perf_count(c)	(__count = (c))
 
 #undef __perf_task
-#define __perf_task(t) __task = (t)
-
-#undef TP_perf_assign
-#define TP_perf_assign(args...) args
+#define __perf_task(t)	(__task = (t))
 
 #undef DECLARE_EVENT_CLASS
 #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
-- 
1.5.5.1



* [PATCH v2 3/3] tracing/perf: Avoid perf_trace_buf_*() in perf_trace_##call() when possible
  2013-08-06 16:08 [PATCH v2 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events) Oleg Nesterov
  2013-08-06 16:08 ` [PATCH v2 1/3] tracing/perf: Expand TRACE_EVENT(sched_stat_runtime) Oleg Nesterov
  2013-08-06 16:08 ` [PATCH v2 2/3] tracing/perf: Reimplement TP_perf_assign() logic Oleg Nesterov
@ 2013-08-06 16:08 ` Oleg Nesterov
  2013-08-15 19:49   ` [tip:perf/core] " tip-bot for Oleg Nesterov
  2013-08-12 15:09 ` [PATCH v2 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events) Peter Zijlstra
  3 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2013-08-06 16:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Frederic Weisbecker, Peter Zijlstra, David Ahern,
	Masami Hiramatsu, zhangwei(Jovi),
	linux-kernel

perf_trace_buf_prepare() + perf_trace_buf_submit(task => NULL)
make no sense if hlist_empty(head). Change perf_trace_##call()
to check ->perf_events beforehand and do nothing if it is empty.

This removes the overhead for tasks without events associated
with them. For example, "perf record -e sched:sched_switch -p1"
attaches the counter(s) to a single task, but every task in the
system will still do perf_trace_buf_prepare/submit() just to
realize that it was not attached to this event.

However, we can only do this if __task == NULL, so we also add
the __builtin_constant_p(__task) check.

With this patch "perf bench sched pipe" shows approximately 4%
improvement when "perf record -p1" runs in parallel, many thanks
to Steven for the testing.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/trace/ftrace.h |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 4163d93..5c7ab17 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -667,6 +667,12 @@ perf_trace_##call(void *__data, proto)					\
 	int rctx;							\
 									\
 	__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
+									\
+	head = this_cpu_ptr(event_call->perf_events);			\
+	if (__builtin_constant_p(!__task) && !__task &&			\
+				hlist_empty(head))			\
+		return;							\
+									\
 	__entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\
 			     sizeof(u64));				\
 	__entry_size -= sizeof(u32);					\
@@ -681,7 +687,6 @@ perf_trace_##call(void *__data, proto)					\
 									\
 	{ assign; }							\
 									\
-	head = this_cpu_ptr(event_call->perf_events);			\
 	perf_trace_buf_submit(entry, __entry_size, rctx, __addr,	\
 		__count, &__regs, head, __task);			\
 }
-- 
1.5.5.1



* Re: [PATCH v2 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events)
  2013-08-06 16:08 [PATCH v2 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events) Oleg Nesterov
                   ` (2 preceding siblings ...)
  2013-08-06 16:08 ` [PATCH v2 3/3] tracing/perf: Avoid perf_trace_buf_*() in perf_trace_##call() when possible Oleg Nesterov
@ 2013-08-12 15:09 ` Peter Zijlstra
  2013-08-12 17:45   ` Oleg Nesterov
  3 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2013-08-12 15:09 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Steven Rostedt, Frederic Weisbecker, David Ahern,
	Masami Hiramatsu, zhangwei(Jovi),
	linux-kernel

On Tue, Aug 06, 2013 at 06:08:26PM +0200, Oleg Nesterov wrote:
> On 08/05, Steven Rostedt wrote:
> >
> > > Sorry... should I resend once again ?
> >
> > Sure, why not. It's only 3 patches :-)
> 
> OK. Added "v2" to avoid the confusion.
> 
> The only change is
> 
> 	- Reviewed-and-Acked-by: Steven Rostedt <rostedt@goodmis.org>
> 	+ Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> 	+ Acked-by: Steven Rostedt <rostedt@goodmis.org>
> 	
> > I wonder if the Reviewed-by assumes the Acked-by?
> 
> I dunno. But since you gave me both, I'd better preserve them all.

So I suppose the down-side to putting them in TP_ARGS() is that you
cannot use arbitrary expressions for them anymore; like:

  TP_ARGS(foo);

  TP_perf_assign(
    __perf_task(foo->ponies);
    __perf_count(foo->horses);
  ),

Not that we actually did something like that, but I imagine it might've
been useful. Ah well, let's not worry too much about that and go with
this. We'll get creative again if we ever need something like that.

Acked-by: Peter Zijlstra <peterz@infradead.org>

Thanks!


* Re: [PATCH v2 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events)
  2013-08-12 15:09 ` [PATCH v2 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events) Peter Zijlstra
@ 2013-08-12 17:45   ` Oleg Nesterov
  2013-08-13  7:47     ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2013-08-12 17:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Frederic Weisbecker, David Ahern,
	Masami Hiramatsu, zhangwei(Jovi),
	linux-kernel

On 08/12, Peter Zijlstra wrote:
>
> So I suppose the down-side to putting them in TP_ARGS() is that you
> cannot use arbitrary expressions for them anymore; like:
>
>   TP_ARGS(foo);
>
>   TP_perf_assign(
>     __perf_task(foo->ponies);
>     __perf_count(foo->horses);
>   ),
>
> Not that we actually did something like that, but I imagine it might've
> been useful..

Yes. This is of course less generic. And more confusing, I agree.

> A well, lets not worry too much about that and go with
> this. We'll get creative again if we ever need something like that.
>
> Acked-by: Peter Zijlstra <peterz@infradead.org>

Thanks ;)

BTW. Can't we kill __perf_addr() and the corresponding argument in
perf_trace_buf_submit/perf_tp_event ?

Or do you think it can have a new user?

Oleg.



* Re: [PATCH v2 0/3] Teach perf_trace_##call() to check hlist_empty(perf_events)
  2013-08-12 17:45   ` Oleg Nesterov
@ 2013-08-13  7:47     ` Peter Zijlstra
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2013-08-13  7:47 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Steven Rostedt, Frederic Weisbecker, David Ahern,
	Masami Hiramatsu, zhangwei(Jovi),
	linux-kernel

On Mon, Aug 12, 2013 at 07:45:15PM +0200, Oleg Nesterov wrote:
> BTW. Can't we kill __perf_addr() and the corresponding argument in
> perf_trace_buf_submit/perf_tp_event ?
> 
> Or do you think it can have a new user?

Oh, right... I suppose that was to enable tracepoints for fault-like
events. Clearly those never happened.

Yeah, I think we can kill that, another thing to reconsider once we grow
someone wanting it I suppose.


* [tip:perf/core] tracing/perf: Expand TRACE_EVENT( sched_stat_runtime)
  2013-08-06 16:08 ` [PATCH v2 1/3] tracing/perf: Expand TRACE_EVENT(sched_stat_runtime) Oleg Nesterov
@ 2013-08-15 19:48   ` tip-bot for Oleg Nesterov
  0 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Oleg Nesterov @ 2013-08-15 19:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, peterz, rostedt, dsahern, tglx, oleg

Commit-ID:  36009d07b79d2a168d6037947357d96e5d8cebe7
Gitweb:     http://git.kernel.org/tip/36009d07b79d2a168d6037947357d96e5d8cebe7
Author:     Oleg Nesterov <oleg@redhat.com>
AuthorDate: Tue, 6 Aug 2013 18:08:41 +0200
Committer:  Steven Rostedt <rostedt@goodmis.org>
CommitDate: Tue, 13 Aug 2013 21:05:12 -0400

tracing/perf: Expand TRACE_EVENT(sched_stat_runtime)

To simplify the review of the next patches:

1. We are going to reimplement __perf_task/__perf_count and embed them
   into TP_ARGS(). Expand TRACE_EVENT(sched_stat_runtime) into
   DECLARE_EVENT_CLASS() + DEFINE_EVENT(), this way they can use
   different TP_ARGS's.

2. Change perf_trace_##call() macro to do perf_fetch_caller_regs()
   right before perf_trace_buf_prepare().

   This way it evaluates TP_ARGS() as soon as possible; the next
   patch exploits this fact.

   Note: after 87f44bbc perf_trace_buf_prepare() doesn't need
   "struct pt_regs *regs", perhaps it makes sense to remove this
   argument. And perhaps we can teach perf_trace_buf_submit()
   to accept regs == NULL and do fetch_caller_regs(CALLER_ADDR1)
   in this case.

3. Cosmetic: the typecast from "void *" buys nothing, it just adds
   noise; remove it.

Link: http://lkml.kernel.org/r/20130806160841.GA2736@redhat.com

Acked-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/trace/events/sched.h | 6 +++++-
 include/trace/ftrace.h       | 7 +++----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index e5586ca..249c024 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -372,7 +372,7 @@ DEFINE_EVENT(sched_stat_template, sched_stat_blocked,
  * Tracepoint for accounting runtime (time the task is executing
  * on a CPU).
  */
-TRACE_EVENT(sched_stat_runtime,
+DECLARE_EVENT_CLASS(sched_stat_runtime,
 
 	TP_PROTO(struct task_struct *tsk, u64 runtime, u64 vruntime),
 
@@ -401,6 +401,10 @@ TRACE_EVENT(sched_stat_runtime,
 			(unsigned long long)__entry->vruntime)
 );
 
+DEFINE_EVENT(sched_stat_runtime, sched_stat_runtime,
+	     TP_PROTO(struct task_struct *tsk, u64 runtime, u64 vruntime),
+	     TP_ARGS(tsk, runtime, vruntime));
+
 /*
  * Tracepoint for showing priority inheritance modifying a tasks
  * priority.
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 41a6643..618af05 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -663,15 +663,14 @@ perf_trace_##call(void *__data, proto)					\
 	int __data_size;						\
 	int rctx;							\
 									\
-	perf_fetch_caller_regs(&__regs);				\
-									\
 	__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
 	__entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\
 			     sizeof(u64));				\
 	__entry_size -= sizeof(u32);					\
 									\
-	entry = (struct ftrace_raw_##call *)perf_trace_buf_prepare(	\
-		__entry_size, event_call->event.type, &__regs, &rctx);	\
+	perf_fetch_caller_regs(&__regs);				\
+	entry = perf_trace_buf_prepare(__entry_size,			\
+			event_call->event.type, &__regs, &rctx);	\
 	if (!entry)							\
 		return;							\
 									\


* [tip:perf/core] tracing/perf: Reimplement TP_perf_assign() logic
  2013-08-06 16:08 ` [PATCH v2 2/3] tracing/perf: Reimplement TP_perf_assign() logic Oleg Nesterov
@ 2013-08-15 19:48   ` tip-bot for Oleg Nesterov
  0 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Oleg Nesterov @ 2013-08-15 19:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, peterz, rostedt, dsahern, tglx, oleg

Commit-ID:  12473965c38a527a0c6f7a38d23edce60957f873
Gitweb:     http://git.kernel.org/tip/12473965c38a527a0c6f7a38d23edce60957f873
Author:     Oleg Nesterov <oleg@redhat.com>
AuthorDate: Tue, 6 Aug 2013 18:08:44 +0200
Committer:  Steven Rostedt <rostedt@goodmis.org>
CommitDate: Tue, 13 Aug 2013 21:05:57 -0400

tracing/perf: Reimplement TP_perf_assign() logic

The next patch tries to avoid the costly perf_trace_buf_* calls
when possible, but there is a problem: we can only do this if
__task == NULL, since perf_tp_event(task != NULL) has additional
code for that case.

Unfortunately, TP_perf_assign/__perf_xxx, which change the default
values of the __count/__task variables for perf_trace_buf_submit(),
are called "too late", after we have already done
perf_trace_buf_prepare(), so the optimization above can't work.

So this patch simply embeds __perf_xxx() into TP_ARGS(); this way
DECLARE_EVENT_CLASS() can use the result of these assignments,
hidden in "args", right after ftrace_get_offsets_##call(), which is
mostly trivial. This allows us to have the fast-path "__task != NULL"
check at the start, see the next patch.

Link: http://lkml.kernel.org/r/20130806160844.GA2739@redhat.com

Tested-by: David Ahern <dsahern@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/trace/events/sched.h | 16 +++-------------
 include/trace/ftrace.h       | 19 +++++++++++--------
 2 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 249c024..2e7d994 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -57,7 +57,7 @@ DECLARE_EVENT_CLASS(sched_wakeup_template,
 
 	TP_PROTO(struct task_struct *p, int success),
 
-	TP_ARGS(p, success),
+	TP_ARGS(__perf_task(p), success),
 
 	TP_STRUCT__entry(
 		__array(	char,	comm,	TASK_COMM_LEN	)
@@ -73,9 +73,6 @@ DECLARE_EVENT_CLASS(sched_wakeup_template,
 		__entry->prio		= p->prio;
 		__entry->success	= success;
 		__entry->target_cpu	= task_cpu(p);
-	)
-	TP_perf_assign(
-		__perf_task(p);
 	),
 
 	TP_printk("comm=%s pid=%d prio=%d success=%d target_cpu=%03d",
@@ -313,7 +310,7 @@ DECLARE_EVENT_CLASS(sched_stat_template,
 
 	TP_PROTO(struct task_struct *tsk, u64 delay),
 
-	TP_ARGS(tsk, delay),
+	TP_ARGS(__perf_task(tsk), __perf_count(delay)),
 
 	TP_STRUCT__entry(
 		__array( char,	comm,	TASK_COMM_LEN	)
@@ -325,10 +322,6 @@ DECLARE_EVENT_CLASS(sched_stat_template,
 		memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
 		__entry->pid	= tsk->pid;
 		__entry->delay	= delay;
-	)
-	TP_perf_assign(
-		__perf_count(delay);
-		__perf_task(tsk);
 	),
 
 	TP_printk("comm=%s pid=%d delay=%Lu [ns]",
@@ -376,7 +369,7 @@ DECLARE_EVENT_CLASS(sched_stat_runtime,
 
 	TP_PROTO(struct task_struct *tsk, u64 runtime, u64 vruntime),
 
-	TP_ARGS(tsk, runtime, vruntime),
+	TP_ARGS(tsk, __perf_count(runtime), vruntime),
 
 	TP_STRUCT__entry(
 		__array( char,	comm,	TASK_COMM_LEN	)
@@ -390,9 +383,6 @@ DECLARE_EVENT_CLASS(sched_stat_runtime,
 		__entry->pid		= tsk->pid;
 		__entry->runtime	= runtime;
 		__entry->vruntime	= vruntime;
-	)
-	TP_perf_assign(
-		__perf_count(runtime);
 	),
 
 	TP_printk("comm=%s pid=%d runtime=%Lu [ns] vruntime=%Lu [ns]",
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 618af05..4163d93 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -507,8 +507,14 @@ static inline notrace int ftrace_get_offsets_##call(			\
 #undef TP_fast_assign
 #define TP_fast_assign(args...) args
 
-#undef TP_perf_assign
-#define TP_perf_assign(args...)
+#undef __perf_addr
+#define __perf_addr(a)	(a)
+
+#undef __perf_count
+#define __perf_count(c)	(c)
+
+#undef __perf_task
+#define __perf_task(t)	(t)
 
 #undef DECLARE_EVENT_CLASS
 #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
@@ -636,16 +642,13 @@ __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call
 #define __get_str(field) (char *)__get_dynamic_array(field)
 
 #undef __perf_addr
-#define __perf_addr(a) __addr = (a)
+#define __perf_addr(a)	(__addr = (a))
 
 #undef __perf_count
-#define __perf_count(c) __count = (c)
+#define __perf_count(c)	(__count = (c))
 
 #undef __perf_task
-#define __perf_task(t) __task = (t)
-
-#undef TP_perf_assign
-#define TP_perf_assign(args...) args
+#define __perf_task(t)	(__task = (t))
 
 #undef DECLARE_EVENT_CLASS
 #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\


* [tip:perf/core] tracing/perf: Avoid perf_trace_buf_*() in perf_trace_##call() when possible
  2013-08-06 16:08 ` [PATCH v2 3/3] tracing/perf: Avoid perf_trace_buf_*() in perf_trace_##call() when possible Oleg Nesterov
@ 2013-08-15 19:49   ` tip-bot for Oleg Nesterov
  0 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Oleg Nesterov @ 2013-08-15 19:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, peterz, rostedt, dsahern, tglx, oleg

Commit-ID:  d027e6a9c83440bf1ca9e5503539d58d8e0914f1
Gitweb:     http://git.kernel.org/tip/d027e6a9c83440bf1ca9e5503539d58d8e0914f1
Author:     Oleg Nesterov <oleg@redhat.com>
AuthorDate: Tue, 6 Aug 2013 18:08:47 +0200
Committer:  Steven Rostedt <rostedt@goodmis.org>
CommitDate: Tue, 13 Aug 2013 21:06:30 -0400

tracing/perf: Avoid perf_trace_buf_*() in perf_trace_##call() when possible

perf_trace_buf_prepare() + perf_trace_buf_submit(task => NULL)
make no sense if hlist_empty(head). Change perf_trace_##call()
to check ->perf_events beforehand and do nothing if it is empty.

This removes the overhead for tasks without events associated
with them. For example, "perf record -e sched:sched_switch -p1"
attaches the counter(s) to a single task, but every task in the
system will still do perf_trace_buf_prepare/submit() just to
realize that it was not attached to this event.

However, we can only do this if __task == NULL, so we also add
the __builtin_constant_p(__task) check.

With this patch "perf bench sched pipe" shows approximately 4%
improvement when "perf record -p1" runs in parallel, many thanks
to Steven for the testing.

Link: http://lkml.kernel.org/r/20130806160847.GA2746@redhat.com

Tested-by: David Ahern <dsahern@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/trace/ftrace.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 4163d93..5c7ab17 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -667,6 +667,12 @@ perf_trace_##call(void *__data, proto)					\
 	int rctx;							\
 									\
 	__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
+									\
+	head = this_cpu_ptr(event_call->perf_events);			\
+	if (__builtin_constant_p(!__task) && !__task &&			\
+				hlist_empty(head))			\
+		return;							\
+									\
 	__entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\
 			     sizeof(u64));				\
 	__entry_size -= sizeof(u32);					\
@@ -681,7 +687,6 @@ perf_trace_##call(void *__data, proto)					\
 									\
 	{ assign; }							\
 									\
-	head = this_cpu_ptr(event_call->perf_events);			\
 	perf_trace_buf_submit(entry, __entry_size, rctx, __addr,	\
 		__count, &__regs, head, __task);			\
 }

