All of lore.kernel.org
 help / color / mirror / Atom feed
* [GIT PULL] perf updates
@ 2010-03-03  6:54 Frederic Weisbecker
  2010-03-03  6:55 ` [PATCH 1/3] lockdep: Move lock events under lockdep recursion protection Frederic Weisbecker
                   ` (2 more replies)
  0 siblings, 3 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-03  6:54 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Paul Mackerras

There is an rfc patch in this set, so may be it's not that
pullable.

But for those who want to test or something, the set is in the
perf/core branch that can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	perf/core

Thanks,
	Frederic
---

Frederic Weisbecker (3):
      lockdep: Move lock events under lockdep recursion protection
      perf: Take a hot regs snapshot for trace events
      perf/x86-64: Use frame pointer to walk on irq and process stacks


 arch/x86/include/asm/asm.h        |    1 +
 arch/x86/include/asm/perf_event.h |   25 +++++++++++++++++++++++++
 arch/x86/kernel/dumpstack_64.c    |    4 ++--
 kernel/lockdep.c                  |    9 +++------
 kernel/perf_event.c               |   19 ++++++++++++++-----
 5 files changed, 45 insertions(+), 13 deletions(-)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH 1/3] lockdep: Move lock events under lockdep recursion protection
  2010-03-03  6:54 [GIT PULL] perf updates Frederic Weisbecker
@ 2010-03-03  6:55 ` Frederic Weisbecker
  2010-03-09  7:18   ` Hitoshi Mitake
  2010-03-09  8:34   ` Jens Axboe
  2010-03-03  6:55 ` [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events Frederic Weisbecker
  2010-03-03  6:55 ` [PATCH 3/3] perf/x86-64: Use frame pointer to walk on irq and process stacks Frederic Weisbecker
  2 siblings, 2 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-03  6:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Steven Rostedt, Paul Mackerras,
	Hitoshi Mitake, Li Zefan, Lai Jiangshan, Masami Hiramatsu,
	Jens Axboe

There are rcu locked read side areas in the path where we submit
a trace event. And these rcu_read_(un)lock() trigger lock events,
which create recursive events.

One pair in do_perf_sw_event:

__lock_acquire
      |
      |--96.11%-- lock_acquire
      |          |
      |          |--27.21%-- do_perf_sw_event
      |          |          perf_tp_event
      |          |          |
      |          |          |--49.62%-- ftrace_profile_lock_release
      |          |          |          lock_release
      |          |          |          |
      |          |          |          |--33.85%-- _raw_spin_unlock

Another pair in perf_output_begin/end:

__lock_acquire
      |--23.40%-- perf_output_begin
      |          |          __perf_event_overflow
      |          |          perf_swevent_overflow
      |          |          perf_swevent_add
      |          |          perf_swevent_ctx_event
      |          |          do_perf_sw_event
      |          |          perf_tp_event
      |          |          |
      |          |          |--55.37%-- ftrace_profile_lock_acquire
      |          |          |          lock_acquire
      |          |          |          |
      |          |          |          |--37.31%-- _raw_spin_lock

The problem is not that much the trace recursion itself, as we have a
recursion protection already (though it's always wasteful to recurse).
But the trace events are outside the lockdep recursion protection, then
each lockdep event triggers a lock trace, which will trigger two
other lockdep events. Here the recursive lock trace event won't
be taken because of the trace recursion, so the recursion stops there
but lockdep will still analyse these new events:

To sum up, for each lockdep events we have:

	lock_*()
	     |
             trace lock_acquire
                  |
                  ----- rcu_read_lock()
                  |          |
                  |          lock_acquire()
                  |          |
                  |          trace_lock_acquire() (stopped)
                  |          |
		  |          lockdep analyze
                  |
                  ----- rcu_read_unlock()
                             |
                             lock_release
                             |
                             trace_lock_release() (stopped)
                             |
                             lockdep analyze

And you can repeat the above two times as we have two rcu read side
sections when we submit an event.

This is fixed in this patch by moving the lock trace event under
the lockdep recursion protection.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
---
 kernel/lockdep.c |    9 +++------
 1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index c62ec14..3de6085 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -3211,8 +3211,6 @@ void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 {
 	unsigned long flags;
 
-	trace_lock_acquire(lock, subclass, trylock, read, check, nest_lock, ip);
-
 	if (unlikely(current->lockdep_recursion))
 		return;
 
@@ -3220,6 +3218,7 @@ void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 	check_flags(flags);
 
 	current->lockdep_recursion = 1;
+	trace_lock_acquire(lock, subclass, trylock, read, check, nest_lock, ip);
 	__lock_acquire(lock, subclass, trylock, read, check,
 		       irqs_disabled_flags(flags), nest_lock, ip, 0);
 	current->lockdep_recursion = 0;
@@ -3232,14 +3231,13 @@ void lock_release(struct lockdep_map *lock, int nested,
 {
 	unsigned long flags;
 
-	trace_lock_release(lock, nested, ip);
-
 	if (unlikely(current->lockdep_recursion))
 		return;
 
 	raw_local_irq_save(flags);
 	check_flags(flags);
 	current->lockdep_recursion = 1;
+	trace_lock_release(lock, nested, ip);
 	__lock_release(lock, nested, ip);
 	current->lockdep_recursion = 0;
 	raw_local_irq_restore(flags);
@@ -3413,8 +3411,6 @@ void lock_contended(struct lockdep_map *lock, unsigned long ip)
 {
 	unsigned long flags;
 
-	trace_lock_contended(lock, ip);
-
 	if (unlikely(!lock_stat))
 		return;
 
@@ -3424,6 +3420,7 @@ void lock_contended(struct lockdep_map *lock, unsigned long ip)
 	raw_local_irq_save(flags);
 	check_flags(flags);
 	current->lockdep_recursion = 1;
+	trace_lock_contended(lock, ip);
 	__lock_contended(lock, ip);
 	current->lockdep_recursion = 0;
 	raw_local_irq_restore(flags);
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events
  2010-03-03  6:54 [GIT PULL] perf updates Frederic Weisbecker
  2010-03-03  6:55 ` [PATCH 1/3] lockdep: Move lock events under lockdep recursion protection Frederic Weisbecker
@ 2010-03-03  6:55 ` Frederic Weisbecker
  2010-03-03  8:38   ` Peter Zijlstra
  2010-03-03 16:06   ` [RFC][PATCH 2/3] " Steven Rostedt
  2010-03-03  6:55 ` [PATCH 3/3] perf/x86-64: Use frame pointer to walk on irq and process stacks Frederic Weisbecker
  2 siblings, 2 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-03  6:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Peter Zijlstra, Paul Mackerras, Steven Rostedt,
	Arnaldo Carvalho de Melo

We are taking a wrong regs snapshot when a trace event triggers.
Either we use get_irq_regs(), which gives us the interrupted
registers if we are in an interrupt, or we use task_pt_regs()
which gives us the state before we entered the kernel, assuming
we are lucky enough to be no kernel thread, in which case
task_pt_regs() returns the initial set of regs when the kernel
thread was started.

What we want is different. We need a hot snapshot of the regs,
so that we can get the instruction pointer to record in the
sample, the frame pointer for the callchain, and some other
things.

In order to achieve that, we introduce PERF_SAVE_REGS() that
takes a live snapshot of the regs we are interested in, and
we use it from perf_tp_event().

Comparison with perf record -e lock: -R -a -f -g
Before:

        perf  [kernel]                   [k] __do_softirq
               |
               --- __do_softirq
                  |
                  |--55.16%-- __open
                  |
                   --44.84%-- __write_nocancel

After:

            perf  [kernel]           [k] perf_tp_event
               |
               --- perf_tp_event
                  |
                  |--41.07%-- ftrace_profile_lock_acquire
                  |          lock_acquire
                  |          |
                  |          |--39.36%-- _raw_spin_lock
                  |          |          |
                  |          |          |--7.81%-- hrtimer_interrupt
                  |          |          |          smp_apic_timer_interrupt
                  |          |          |          apic_timer_interrupt

The old case was producing unreliable callchains. Now having
right frame and instruction pointers, we have the trace we
want. We can even think about implementing a "skip" field in
PERF_SAVE_REGS() to rewind to the caller "n" (here n would
be equal to 2 so that we start on lock_acquire()).

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 arch/x86/include/asm/asm.h        |    1 +
 arch/x86/include/asm/perf_event.h |   25 +++++++++++++++++++++++++
 kernel/perf_event.c               |   19 ++++++++++++++-----
 3 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index b3ed1e1..d86e7dc 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -27,6 +27,7 @@
 #define _ASM_ADD	__ASM_SIZE(add)
 #define _ASM_SUB	__ASM_SIZE(sub)
 #define _ASM_XADD	__ASM_SIZE(xadd)
+#define _ASM_POP	__ASM_SIZE(pop)
 
 #define _ASM_AX		__ASM_REG(ax)
 #define _ASM_BX		__ASM_REG(bx)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index befd172..a166fa5 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -122,6 +122,31 @@ union cpuid10_edx {
 extern void init_hw_perf_events(void);
 extern void perf_events_lapic_init(void);
 
+/*
+ * Take a snapshot of the regs. We only need a few of them:
+ * ip for PERF_SAMPLE_IP
+ * cs for user_mode() tests
+ * bp for callchains
+ * eflags, for future purposes, just in case
+ */
+#define PERF_SAVE_REGS(regs)				{\
+	memset((regs), 0, sizeof(*(regs)));		\
+	(regs)->cs = __KERNEL_CS;			\
+							\
+	asm volatile(					\
+		"call 1f\n"				\
+		"1:\n"					\
+		_ASM_POP " %[ip]\n"			\
+		_ASM_MOV "%%" _ASM_BP ", %[bp]\n"	\
+		"pushf\n"				\
+		_ASM_POP " %[flags]\n"			\
+		:  [ip]    "=m" ((regs)->ip),		\
+		   [bp]    "=m" ((regs)->bp),		\
+		   [flags] "=m" ((regs)->flags)		\
+		:: "memory"				\
+	);						\
+}
+
 #define PERF_EVENT_INDEX_OFFSET			0
 
 #else
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index a661e79..f412be1 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -2807,6 +2807,16 @@ __weak struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
 }
 
 /*
+ * Hot regs snapshot support -- arch specific
+ * This needs to be a macro because we want the current
+ * frame pointer.
+ */
+#ifndef PERF_SAVE_REGS
+#define PERF_SAVE_REGS(regs)	memset(regs, 0, sizeof(*regs));
+#endif
+
+
+/*
  * Output
  */
 static bool perf_output_space(struct perf_mmap_data *data, unsigned long tail,
@@ -4337,6 +4347,8 @@ static const struct pmu perf_ops_task_clock = {
 void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
 			  int entry_size)
 {
+	struct pt_regs regs;
+
 	struct perf_raw_record raw = {
 		.size = entry_size,
 		.data = record,
@@ -4347,14 +4359,11 @@ void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
 		.raw = &raw,
 	};
 
-	struct pt_regs *regs = get_irq_regs();
-
-	if (!regs)
-		regs = task_pt_regs(current);
+	PERF_SAVE_REGS(&regs);
 
 	/* Trace events already protected against recursion */
 	do_perf_sw_event(PERF_TYPE_TRACEPOINT, event_id, count, 1,
-				&data, regs);
+				&data, &regs);
 }
 EXPORT_SYMBOL_GPL(perf_tp_event);
 
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 3/3] perf/x86-64: Use frame pointer to walk on irq and process stacks
  2010-03-03  6:54 [GIT PULL] perf updates Frederic Weisbecker
  2010-03-03  6:55 ` [PATCH 1/3] lockdep: Move lock events under lockdep recursion protection Frederic Weisbecker
  2010-03-03  6:55 ` [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events Frederic Weisbecker
@ 2010-03-03  6:55 ` Frederic Weisbecker
  2 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-03  6:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo

We were using the frame pointer based stack walker on every
contexts in x86-32, but not in x86-64 where we only use the
seven-league boots on the exception stacks.

Use it also on irq and process stacks. This utterly accelerate
the captures.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 arch/x86/kernel/dumpstack_64.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 676bc05..cab4f72 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -202,7 +202,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 			if (in_irq_stack(stack, irq_stack, irq_stack_end)) {
 				if (ops->stack(data, "IRQ") < 0)
 					break;
-				bp = print_context_stack(tinfo, stack, bp,
+				bp = ops->walk_stack(tinfo, stack, bp,
 					ops, data, irq_stack_end, &graph);
 				/*
 				 * We link to the next stack (which would be
@@ -223,7 +223,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	/*
 	 * This handles the process stack:
 	 */
-	bp = print_context_stack(tinfo, stack, bp, ops, data, NULL, &graph);
+	bp = ops->walk_stack(tinfo, stack, bp, ops, data, NULL, &graph);
 	put_cpu();
 }
 EXPORT_SYMBOL(dump_trace);
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events
  2010-03-03  6:55 ` [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events Frederic Weisbecker
@ 2010-03-03  8:38   ` Peter Zijlstra
  2010-03-03 20:07     ` Frederic Weisbecker
                       ` (4 more replies)
  2010-03-03 16:06   ` [RFC][PATCH 2/3] " Steven Rostedt
  1 sibling, 5 replies; 76+ messages in thread
From: Peter Zijlstra @ 2010-03-03  8:38 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Thomas Gleixner, H. Peter Anvin,
	Paul Mackerras, Steven Rostedt, Arnaldo Carvalho de Melo

On Wed, 2010-03-03 at 07:55 +0100, Frederic Weisbecker wrote:
>  /*
> + * Hot regs snapshot support -- arch specific
> + * This needs to be a macro because we want the current
> + * frame pointer.
> + */
> +#ifndef PERF_SAVE_REGS
> +#define PERF_SAVE_REGS(regs)   memset(regs, 0, sizeof(*regs));
> +#endif

It would be nice to have the fallback at least set the current or
calling IP, now you've basically wrecked stuff for everything !x86.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events
  2010-03-03  6:55 ` [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events Frederic Weisbecker
  2010-03-03  8:38   ` Peter Zijlstra
@ 2010-03-03 16:06   ` Steven Rostedt
  2010-03-03 16:37     ` Peter Zijlstra
  1 sibling, 1 reply; 76+ messages in thread
From: Steven Rostedt @ 2010-03-03 16:06 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Thomas Gleixner, H. Peter Anvin,
	Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo

On Wed, 2010-03-03 at 07:55 +0100, Frederic Weisbecker wrote:

> +/*
>   * Output
>   */
>  static bool perf_output_space(struct perf_mmap_data *data, unsigned long tail,
> @@ -4337,6 +4347,8 @@ static const struct pmu perf_ops_task_clock = {
>  void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
>  			  int entry_size)
>  {
> +	struct pt_regs regs;
> +
>  	struct perf_raw_record raw = {
>  		.size = entry_size,
>  		.data = record,
> @@ -4347,14 +4359,11 @@ void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
>  		.raw = &raw,
>  	};
>  
> -	struct pt_regs *regs = get_irq_regs();
> -
> -	if (!regs)
> -		regs = task_pt_regs(current);
> +	PERF_SAVE_REGS(&regs);
>  
>  	/* Trace events already protected against recursion */
>  	do_perf_sw_event(PERF_TYPE_TRACEPOINT, event_id, count, 1,
> -				&data, regs);
> +				&data, &regs);

Off-topic: Why is the above a perf sw event? Couldn't that also be a
normal TRACE_EVENT()?

-- Steve

>  }
>  EXPORT_SYMBOL_GPL(perf_tp_event);
>  



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events
  2010-03-03 16:06   ` [RFC][PATCH 2/3] " Steven Rostedt
@ 2010-03-03 16:37     ` Peter Zijlstra
  2010-03-03 17:07       ` Steven Rostedt
  0 siblings, 1 reply; 76+ messages in thread
From: Peter Zijlstra @ 2010-03-03 16:37 UTC (permalink / raw)
  To: rostedt
  Cc: Frederic Weisbecker, Ingo Molnar, LKML, Thomas Gleixner,
	H. Peter Anvin, Paul Mackerras, Arnaldo Carvalho de Melo

On Wed, 2010-03-03 at 11:06 -0500, Steven Rostedt wrote:
> On Wed, 2010-03-03 at 07:55 +0100, Frederic Weisbecker wrote:
> 
> > +/*
> >   * Output
> >   */
> >  static bool perf_output_space(struct perf_mmap_data *data, unsigned long tail,
> > @@ -4337,6 +4347,8 @@ static const struct pmu perf_ops_task_clock = {
> >  void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
> >  			  int entry_size)
> >  {
> > +	struct pt_regs regs;
> > +
> >  	struct perf_raw_record raw = {
> >  		.size = entry_size,
> >  		.data = record,
> > @@ -4347,14 +4359,11 @@ void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
> >  		.raw = &raw,
> >  	};
> >  
> > -	struct pt_regs *regs = get_irq_regs();
> > -
> > -	if (!regs)
> > -		regs = task_pt_regs(current);
> > +	PERF_SAVE_REGS(&regs);
> >  
> >  	/* Trace events already protected against recursion */
> >  	do_perf_sw_event(PERF_TYPE_TRACEPOINT, event_id, count, 1,
> > -				&data, regs);
> > +				&data, &regs);
> 
> Off-topic: Why is the above a perf sw event? Couldn't that also be a
> normal TRACE_EVENT()?

Well, no, this is the stuff that transforms TRACE_EVENT() into perf
software events ;-)


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events
  2010-03-03 16:37     ` Peter Zijlstra
@ 2010-03-03 17:07       ` Steven Rostedt
  2010-03-03 17:16         ` Peter Zijlstra
  0 siblings, 1 reply; 76+ messages in thread
From: Steven Rostedt @ 2010-03-03 17:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, Ingo Molnar, LKML, Thomas Gleixner,
	H. Peter Anvin, Paul Mackerras, Arnaldo Carvalho de Melo

On Wed, 2010-03-03 at 17:37 +0100, Peter Zijlstra wrote:

> > >  	/* Trace events already protected against recursion */
> > >  	do_perf_sw_event(PERF_TYPE_TRACEPOINT, event_id, count, 1,
> > > -				&data, regs);
> > > +				&data, &regs);
> > 
> > Off-topic: Why is the above a perf sw event? Couldn't that also be a
> > normal TRACE_EVENT()?
> 
> Well, no, this is the stuff that transforms TRACE_EVENT() into perf
> software events ;-)
> 

oops, my bad :-), I thought this was in the x86 arch directory. For the
University, I was helping them with adding trace points for page faults
when I came across this in arch/x86/mm/fault.c:

	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, 0, regs, address);


This is what I actually was wondering about. Why is it a "perf only"
trace point instead of a TRACE_EVENT()?

-- Steve



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events
  2010-03-03 17:07       ` Steven Rostedt
@ 2010-03-03 17:16         ` Peter Zijlstra
  2010-03-03 17:45           ` Steven Rostedt
  2010-03-04 11:25           ` Ingo Molnar
  0 siblings, 2 replies; 76+ messages in thread
From: Peter Zijlstra @ 2010-03-03 17:16 UTC (permalink / raw)
  To: rostedt
  Cc: Frederic Weisbecker, Ingo Molnar, LKML, Thomas Gleixner,
	H. Peter Anvin, Paul Mackerras, Arnaldo Carvalho de Melo

On Wed, 2010-03-03 at 12:07 -0500, Steven Rostedt wrote:
> oops, my bad :-), I thought this was in the x86 arch directory. For the
> University, I was helping them with adding trace points for page faults
> when I came across this in arch/x86/mm/fault.c:
> 
>         perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, 0, regs, address);
> 
> 
> This is what I actually was wondering about. Why is it a "perf only"
> trace point instead of a TRACE_EVENT()?

Because I wanted to make perf usable without having to rely on funny
tracepoints. That is, I am less worried about committing software
counters to ABI than I am about TRACE_EVENT(), which still gives me a
terribly uncomfortable feeling.

Also, building with all CONFIG_TRACE_*=n will still yield a usable perf,
which is something the embedded people might fancy, all that TRACE stuff
adds lots of code.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events
  2010-03-03 17:16         ` Peter Zijlstra
@ 2010-03-03 17:45           ` Steven Rostedt
  2010-03-03 20:37             ` Frederic Weisbecker
  2010-03-04 11:25           ` Ingo Molnar
  1 sibling, 1 reply; 76+ messages in thread
From: Steven Rostedt @ 2010-03-03 17:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, Ingo Molnar, LKML, Thomas Gleixner,
	H. Peter Anvin, Paul Mackerras, Arnaldo Carvalho de Melo

On Wed, 2010-03-03 at 18:16 +0100, Peter Zijlstra wrote:

> > This is what I actually was wondering about. Why is it a "perf only"
> > trace point instead of a TRACE_EVENT()?
> 
> Because I wanted to make perf usable without having to rely on funny
> tracepoints. That is, I am less worried about committing software
> counters to ABI than I am about TRACE_EVENT(), which still gives me a
> terribly uncomfortable feeling.
> 
> Also, building with all CONFIG_TRACE_*=n will still yield a usable perf,
> which is something the embedded people might fancy, all that TRACE stuff
> adds lots of code.

We could make TRACE_EVENT() into a perf only trace point with
CONFIG_TRACE_*=n. 

Just saying that it would be nice if ftrace could also see page faults
and such.

-- Steve




^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events
  2010-03-03  8:38   ` Peter Zijlstra
@ 2010-03-03 20:07     ` Frederic Weisbecker
  2010-03-04 19:01     ` Frederic Weisbecker
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-03 20:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Thomas Gleixner, H. Peter Anvin,
	Paul Mackerras, Steven Rostedt, Arnaldo Carvalho de Melo

On Wed, Mar 03, 2010 at 09:38:49AM +0100, Peter Zijlstra wrote:
> On Wed, 2010-03-03 at 07:55 +0100, Frederic Weisbecker wrote:
> >  /*
> > + * Hot regs snapshot support -- arch specific
> > + * This needs to be a macro because we want the current
> > + * frame pointer.
> > + */
> > +#ifndef PERF_SAVE_REGS
> > +#define PERF_SAVE_REGS(regs)   memset(regs, 0, sizeof(*regs));
> > +#endif
> 
> It would be nice to have the fallback at least set the current or
> calling IP, now you've basically wrecked stuff for everything !x86.


Oh I haven't wrecked that much !x86, no callchain was working on
trace events before, nor the ip was correct :-)

That said I agree with you, I'm going to reuse the CALLER_ADDRx
macros for that. This is the occasion to add a skip field
in PERF_SAVE_REGS to rewind where we want.

Thanks.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events
  2010-03-03 17:45           ` Steven Rostedt
@ 2010-03-03 20:37             ` Frederic Weisbecker
  0 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-03 20:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Ingo Molnar, LKML, Thomas Gleixner,
	H. Peter Anvin, Paul Mackerras, Arnaldo Carvalho de Melo

On Wed, Mar 03, 2010 at 12:45:55PM -0500, Steven Rostedt wrote:
> On Wed, 2010-03-03 at 18:16 +0100, Peter Zijlstra wrote:
> 
> > > This is what I actually was wondering about. Why is it a "perf only"
> > > trace point instead of a TRACE_EVENT()?
> > 
> > Because I wanted to make perf usable without having to rely on funny
> > tracepoints. That is, I am less worried about committing software
> > counters to ABI than I am about TRACE_EVENT(), which still gives me a
> > terribly uncomfortable feeling.
> > 
> > Also, building with all CONFIG_TRACE_*=n will still yield a usable perf,
> > which is something the embedded people might fancy, all that TRACE stuff
> > adds lots of code.
> 
> We could make TRACE_EVENT() into a perf only trace point with
> CONFIG_TRACE_*=n. 



Yeah.


 
> Just saying that it would be nice if ftrace could also see page faults
> and such.


Agreed, we could make it a TRACE_EVENT. IIRC, someone proposed patches
for that by the past.

That notwithstanding one of the main worries is the fact TRACE_EVENT
are less ABI-stable guaranteed.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events
  2010-03-03 17:16         ` Peter Zijlstra
  2010-03-03 17:45           ` Steven Rostedt
@ 2010-03-04 11:25           ` Ingo Molnar
  2010-03-04 15:16             ` Steven Rostedt
  1 sibling, 1 reply; 76+ messages in thread
From: Ingo Molnar @ 2010-03-04 11:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rostedt, Frederic Weisbecker, LKML, Thomas Gleixner,
	H. Peter Anvin, Paul Mackerras, Arnaldo Carvalho de Melo


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, 2010-03-03 at 12:07 -0500, Steven Rostedt wrote:
> > oops, my bad :-), I thought this was in the x86 arch directory. For the
> > University, I was helping them with adding trace points for page faults
> > when I came across this in arch/x86/mm/fault.c:
> > 
> >         perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, 0, regs, address);
> > 
> > 
> > This is what I actually was wondering about. Why is it a "perf only" trace 
> > point instead of a TRACE_EVENT()?
> 
> Because I wanted to make perf usable without having to rely on funny 
> tracepoints. That is, I am less worried about committing software counters 
> to ABI than I am about TRACE_EVENT(), which still gives me a terribly 
> uncomfortable feeling.

I'd still like a much less error-prone and work-intense way of doing it.

I'd suggest we simply add a TRACE_EVENT_ABI() for such cases, where we really 
want to expose a tracepoint to tooling, programmatically. Maybe even change 
the usage sites to trace_foo_ABI(), to make it really clear and to make people 
aware of the consequences.

> Also, building with all CONFIG_TRACE_*=n will still yield a usable perf, 
> which is something the embedded people might fancy, all that TRACE stuff 
> adds lots of code.

Not a real issue i suspect when you do lock profiling ...

Or if it is, some debloating might be in order - and the detaching of event 
enumeration and ftrace TRACE_EVENT infrastructure from other ftrace bits. (i 
suggested an '/eventfs' special filesystem before, for nicely layed out 
hierarchy of ftrace/perf events.)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events
  2010-03-04 11:25           ` Ingo Molnar
@ 2010-03-04 15:16             ` Steven Rostedt
  2010-03-04 15:36               ` Ingo Molnar
  0 siblings, 1 reply; 76+ messages in thread
From: Steven Rostedt @ 2010-03-04 15:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Frederic Weisbecker, LKML, Thomas Gleixner,
	H. Peter Anvin, Paul Mackerras, Arnaldo Carvalho de Melo

On Thu, 2010-03-04 at 12:25 +0100, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Wed, 2010-03-03 at 12:07 -0500, Steven Rostedt wrote:
> > > oops, my bad :-), I thought this was in the x86 arch directory. For the
> > > University, I was helping them with adding trace points for page faults
> > > when I came across this in arch/x86/mm/fault.c:
> > > 
> > >         perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, 0, regs, address);
> > > 
> > > 
> > > This is what I actually was wondering about. Why is it a "perf only" trace 
> > > point instead of a TRACE_EVENT()?
> > 
> > Because I wanted to make perf usable without having to rely on funny 
> > tracepoints. That is, I am less worried about committing software counters 
> > to ABI than I am about TRACE_EVENT(), which still gives me a terribly 
> > uncomfortable feeling.
> 
> I'd still like a much less error-prone and work-intense way of doing it.
> 
> I'd suggest we simply add a TRACE_EVENT_ABI() for such cases, where we really 
> want to expose a tracepoint to tooling, programmatically. Maybe even change 
> the usage sites to trace_foo_ABI(), to make it really clear and to make people 
> aware of the consequences.

Would this still be available as a normal trace event?

> 
> > Also, building with all CONFIG_TRACE_*=n will still yield a usable perf, 
> > which is something the embedded people might fancy, all that TRACE stuff 
> > adds lots of code.
> 
> Not a real issue i suspect when you do lock profiling ...
> 
> Or if it is, some debloating might be in order - and the detaching of event 
> enumeration and ftrace TRACE_EVENT infrastructure from other ftrace bits. (i 
> suggested an '/eventfs' special filesystem before, for nicely layed out 
> hierarchy of ftrace/perf events.)

Actually, we already have a way to decouple it.

include/trace/define_trace.h is the file that just adds the tracepoint
that is needed.

include/trace/ftrace.h is the file that does the magic and adds the code
for callbacks and tracing.

The perf hooks probably should not have gone in that file and been put
into a include/trace/perf.h file, and then in define_trace.h we would
add:

 #ifdef CONFIG_EVENT_TRACING
 #include <trace/ftrace.h>
 #endif

+#ifdef CONFIG_PERF_EVENTS
+#include <trace/perf.h>
+#endif

This should be done anyway. But it would also let you decouple ftrace
trace events from perf trace events but still let the two use the same
trace points.

-- Steve




^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events
  2010-03-04 15:16             ` Steven Rostedt
@ 2010-03-04 15:36               ` Ingo Molnar
  2010-03-04 15:55                 ` Steven Rostedt
  0 siblings, 1 reply; 76+ messages in thread
From: Ingo Molnar @ 2010-03-04 15:36 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Frederic Weisbecker, LKML, Thomas Gleixner,
	H. Peter Anvin, Paul Mackerras, Arnaldo Carvalho de Melo


* Steven Rostedt <rostedt@goodmis.org> wrote:

> On Thu, 2010-03-04 at 12:25 +0100, Ingo Molnar wrote:
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > On Wed, 2010-03-03 at 12:07 -0500, Steven Rostedt wrote:
> > > > oops, my bad :-), I thought this was in the x86 arch directory. For the
> > > > University, I was helping them with adding trace points for page faults
> > > > when I came across this in arch/x86/mm/fault.c:
> > > > 
> > > >         perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, 0, regs, address);
> > > > 
> > > > 
> > > > This is what I actually was wondering about. Why is it a "perf only" trace 
> > > > point instead of a TRACE_EVENT()?
> > > 
> > > Because I wanted to make perf usable without having to rely on funny 
> > > tracepoints. That is, I am less worried about committing software counters 
> > > to ABI than I am about TRACE_EVENT(), which still gives me a terribly 
> > > uncomfortable feeling.
> > 
> > I'd still like a much less error-prone and work-intense way of doing it.
> > 
> > I'd suggest we simply add a TRACE_EVENT_ABI() for such cases, where we 
> > really want to expose a tracepoint to tooling, programmatically. Maybe 
> > even change the usage sites to trace_foo_ABI(), to make it really clear 
> > and to make people aware of the consequences.
> 
> Would this still be available as a normal trace event?

Yeah, of course. It would not result in any usage or flexibility restriction.

( In the future we might want to add some sort of automated signature thing to 
  make sure that an event that has been declared an 'ABI' isnt changed - at 
  least as far as the record format goes. )

> > 
> > > Also, building with all CONFIG_TRACE_*=n will still yield a usable perf, 
> > > which is something the embedded people might fancy, all that TRACE stuff 
> > > adds lots of code.
> > 
> > Not a real issue i suspect when you do lock profiling ...
> > 
> > Or if it is, some debloating might be in order - and the detaching of event 
> > enumeration and ftrace TRACE_EVENT infrastructure from other ftrace bits. (i 
> > suggested an '/eventfs' special filesystem before, for nicely layed out 
> > hierarchy of ftrace/perf events.)
> 
> Actually, we already have a way to decouple it.
> 
> include/trace/define_trace.h is the file that just adds the tracepoint
> that is needed.
> 
> include/trace/ftrace.h is the file that does the magic and adds the code
> for callbacks and tracing.
> 
> The perf hooks probably should not have gone in that file and been put
> into a include/trace/perf.h file, and then in define_trace.h we would
> add:
> 
>  #ifdef CONFIG_EVENT_TRACING
>  #include <trace/ftrace.h>
>  #endif
> 
> +#ifdef CONFIG_PERF_EVENTS
> +#include <trace/perf.h>
> +#endif
> 
> This should be done anyway. But it would also let you decouple ftrace trace 
> events from perf trace events but still let the two use the same trace 
> points.

I think the main thing would be to have a decoupled /eventfs - basically 
/debug/tracing/events/ moved to "/eventfs" or maybe to "/proc/events/". This 
would make them available more widely, and in a standardized way.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events
  2010-03-04 15:36               ` Ingo Molnar
@ 2010-03-04 15:55                 ` Steven Rostedt
  2010-03-04 21:17                   ` Ingo Molnar
  0 siblings, 1 reply; 76+ messages in thread
From: Steven Rostedt @ 2010-03-04 15:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Frederic Weisbecker, LKML, Thomas Gleixner,
	H. Peter Anvin, Paul Mackerras, Arnaldo Carvalho de Melo,
	Greg KH

On Thu, 2010-03-04 at 16:36 +0100, Ingo Molnar wrote:

> > This should be done anyway. But it would also let you decouple ftrace trace 
> > events from perf trace events but still let the two use the same trace 
> > points.
> 
> I think the main thing would be to have a decoupled /eventfs - basically 
> /debug/tracing/events/ moved to "/eventfs" or maybe to "/proc/events/". This 
> would make them available more widely, and in a standardized way.

I know Greg once proposed a /tracefs directory. I don't really care how
things work as long as we don't lose functionality. Perhaps we should
have a standard tracefs dir, and have:

/sys/kernel/trace
/sys/kernel/trace/events
/sys/kernel/trace/ftrace
/sys/kernel/trace/perf

This would keep things nicely grouped but separate.

I could also decouple the printing of the formats from ftrace.h and then
in in the define_trace.h:

#ifdef CONFIG_EVENTS
# include <trace/events.h>
# ifdef CONFIG_FTRACE_EVENTS
#  include <trace/ftrace.h>
# endif
# ifdef CONFIG_PERF_EVENTS
#  include <trace/perf.h>
# endif
#endif

Have the trace/events.h file create the files for the event directory.

But what about the enable and filter files in the event directory. How
would they be attached? Currently these modify the way ftrace works. I'm
assuming that perf enables these with the syscall. Should these files
still be specific to ftrace if enabled?

-- Steve




^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events
  2010-03-03  8:38   ` Peter Zijlstra
  2010-03-03 20:07     ` Frederic Weisbecker
@ 2010-03-04 19:01     ` Frederic Weisbecker
  2010-03-05  3:08     ` [PATCH 0/2] Perf " Frederic Weisbecker
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-04 19:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Thomas Gleixner, H. Peter Anvin,
	Paul Mackerras, Steven Rostedt, Arnaldo Carvalho de Melo

On Wed, Mar 03, 2010 at 09:38:49AM +0100, Peter Zijlstra wrote:
> On Wed, 2010-03-03 at 07:55 +0100, Frederic Weisbecker wrote:
> >  /*
> > + * Hot regs snapshot support -- arch specific
> > + * This needs to be a macro because we want the current
> > + * frame pointer.
> > + */
> > +#ifndef PERF_SAVE_REGS
> > +#define PERF_SAVE_REGS(regs)   memset(regs, 0, sizeof(*regs));
> > +#endif
> 
> It would be nice to have the fallback at least set the current or
> calling IP, now you've basically wrecked stuff for everything !x86.


Actually I can't, as there doesn't seem to be a way to write
regs->ip in a generic way :-s


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events
  2010-03-04 15:55                 ` Steven Rostedt
@ 2010-03-04 21:17                   ` Ingo Molnar
  2010-03-04 21:30                     ` Steven Rostedt
  0 siblings, 1 reply; 76+ messages in thread
From: Ingo Molnar @ 2010-03-04 21:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Frederic Weisbecker, LKML, Thomas Gleixner,
	H. Peter Anvin, Paul Mackerras, Arnaldo Carvalho de Melo,
	Greg KH


* Steven Rostedt <rostedt@goodmis.org> wrote:

> On Thu, 2010-03-04 at 16:36 +0100, Ingo Molnar wrote:
> 
> > > This should be done anyway. But it would also let you decouple ftrace trace 
> > > events from perf trace events but still let the two use the same trace 
> > > points.
> > 
> > I think the main thing would be to have a decoupled /eventfs - basically 
> > /debug/tracing/events/ moved to "/eventfs" or maybe to "/proc/events/". This 
> > would make them available more widely, and in a standardized way.
> 
> I know Greg once proposed a /tracefs directory. I don't really care how 
> things work as long as we don't lose functionality. Perhaps we should have a 
> standard tracefs dir, and have:

No, we want to decouple it from 'tracing'. It's events, not tracing. Events 
are more broader, they can be used for RAS, profiling, counting, etc. - not 
just tracing.

Furthermore, we only want /debug/tracing/events really, not the various 
dynamic ftrace controls - those could remain in /debug/tracing/.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events
  2010-03-04 21:17                   ` Ingo Molnar
@ 2010-03-04 21:30                     ` Steven Rostedt
  2010-03-04 21:37                       ` Frederic Weisbecker
  0 siblings, 1 reply; 76+ messages in thread
From: Steven Rostedt @ 2010-03-04 21:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Frederic Weisbecker, LKML, Thomas Gleixner,
	H. Peter Anvin, Paul Mackerras, Arnaldo Carvalho de Melo,
	Greg KH

On Thu, 2010-03-04 at 22:17 +0100, Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 

> No, we want to decouple it from 'tracing'. It's events, not tracing. Events 
> are more broader, they can be used for RAS, profiling, counting, etc. - not 
> just tracing.
> 
> Furthermore, we only want /debug/tracing/events really, not the various 
> dynamic ftrace controls - those could remain in /debug/tracing/.

I was talking about the files in the events directory:

events/sched/sched_switch/{id,format,enable,filter}


Seems only the format file should go in, and perhaps the id.

I can keep the debug/tracing/events/* as is too, where the format and id
just call the same routines that the eventfs calls, but add the enable
and filter to be specific to ftrace.

-- Steve



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events
  2010-03-04 21:30                     ` Steven Rostedt
@ 2010-03-04 21:37                       ` Frederic Weisbecker
  2010-03-04 21:52                         ` Thomas Gleixner
  2010-03-04 22:02                         ` Steven Rostedt
  0 siblings, 2 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-04 21:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Thomas Gleixner,
	H. Peter Anvin, Paul Mackerras, Arnaldo Carvalho de Melo,
	Greg KH

On Thu, Mar 04, 2010 at 04:30:38PM -0500, Steven Rostedt wrote:
> On Thu, 2010-03-04 at 22:17 +0100, Ingo Molnar wrote:
> > * Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> 
> > No, we want to decouple it from 'tracing'. It's events, not tracing. Events 
> > are more broader, they can be used for RAS, profiling, counting, etc. - not 
> > just tracing.
> > 
> > Furthermore, we only want /debug/tracing/events really, not the various 
> > dynamic ftrace controls - those could remain in /debug/tracing/.
> 
> I was talking about the files in the events directory:
> 
> events/sched/sched_switch/{id,format,enable,filter}
> 
> 
> Seems only the format file should go in, and perhaps the id.
> 
> I can keep the debug/tracing/events/* as is too, where the format and id
> just call the same routines that the eventfs calls, but add the enable
> and filter to be specific to ftrace.


The /debug/tracing/events could contain symlinks for the format
files so that the rest can stay there.



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events
  2010-03-04 21:37                       ` Frederic Weisbecker
@ 2010-03-04 21:52                         ` Thomas Gleixner
  2010-03-04 22:01                           ` Frederic Weisbecker
  2010-03-04 22:02                         ` Steven Rostedt
  1 sibling, 1 reply; 76+ messages in thread
From: Thomas Gleixner @ 2010-03-04 21:52 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Steven Rostedt, Ingo Molnar, Peter Zijlstra, LKML,
	H. Peter Anvin, Paul Mackerras, Arnaldo Carvalho de Melo,
	Greg KH

On Thu, 4 Mar 2010, Frederic Weisbecker wrote:

> On Thu, Mar 04, 2010 at 04:30:38PM -0500, Steven Rostedt wrote:
> > On Thu, 2010-03-04 at 22:17 +0100, Ingo Molnar wrote:
> > > * Steven Rostedt <rostedt@goodmis.org> wrote:
> > > 
> > 
> > > No, we want to decouple it from 'tracing'. It's events, not tracing. Events 
> > > are more broader, they can be used for RAS, profiling, counting, etc. - not 
> > > just tracing.
> > > 
> > > Furthermore, we only want /debug/tracing/events really, not the various 
> > > dynamic ftrace controls - those could remain in /debug/tracing/.
> > 
> > I was talking about the files in the events directory:
> > 
> > events/sched/sched_switch/{id,format,enable,filter}
> > 
> > 
> > Seems only the format file should go in, and perhaps the id.
> > 
> > I can keep the debug/tracing/events/* as is too, where the format and id
> > just call the same routines that the eventfs calls, but add the enable
> > and filter to be specific to ftrace.
> 
> 
> The /debug/tracing/events could contain symlinks for the format
> files so that the rest can stay there.

Are you proposing to create another sysfs symlink maze ?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events
  2010-03-04 21:52                         ` Thomas Gleixner
@ 2010-03-04 22:01                           ` Frederic Weisbecker
  0 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-04 22:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Steven Rostedt, Ingo Molnar, Peter Zijlstra, LKML,
	H. Peter Anvin, Paul Mackerras, Arnaldo Carvalho de Melo,
	Greg KH

On Thu, Mar 04, 2010 at 10:52:36PM +0100, Thomas Gleixner wrote:
> On Thu, 4 Mar 2010, Frederic Weisbecker wrote:
> 
> > On Thu, Mar 04, 2010 at 04:30:38PM -0500, Steven Rostedt wrote:
> > > On Thu, 2010-03-04 at 22:17 +0100, Ingo Molnar wrote:
> > > > * Steven Rostedt <rostedt@goodmis.org> wrote:
> > > > 
> > > 
> > > > No, we want to decouple it from 'tracing'. It's events, not tracing. Events 
> > > > are more broader, they can be used for RAS, profiling, counting, etc. - not 
> > > > just tracing.
> > > > 
> > > > Furthermore, we only want /debug/tracing/events really, not the various 
> > > > dynamic ftrace controls - those could remain in /debug/tracing/.
> > > 
> > > I was talking about the files in the events directory:
> > > 
> > > events/sched/sched_switch/{id,format,enable,filter}
> > > 
> > > 
> > > Seems only the format file should go in, and perhaps the id.
> > > 
> > > I can keep the debug/tracing/events/* as is too, where the format and id
> > > just call the same routines that the eventfs calls, but add the enable
> > > and filter to be specific to ftrace.
> > 
> > 
> > The /debug/tracing/events could contain symlinks for the format
> > files so that the rest can stay there.
> 
> Are you proposing to create another sysfs symlink maze ?
> 
> Thanks,
> 
> 	tglx


I don't know, I like the idea of generalizing the events, but I hope
it won't be at the detriment of ftrace usability.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events
  2010-03-04 21:37                       ` Frederic Weisbecker
  2010-03-04 21:52                         ` Thomas Gleixner
@ 2010-03-04 22:02                         ` Steven Rostedt
  2010-03-04 22:09                           ` Frederic Weisbecker
  1 sibling, 1 reply; 76+ messages in thread
From: Steven Rostedt @ 2010-03-04 22:02 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Thomas Gleixner,
	H. Peter Anvin, Paul Mackerras, Arnaldo Carvalho de Melo,
	Greg KH

On Thu, 2010-03-04 at 22:37 +0100, Frederic Weisbecker wrote:

> > I can keep the debug/tracing/events/* as is too, where the format and id
> > just call the same routines that the eventfs calls, but add the enable
> > and filter to be specific to ftrace.
> 
> 
> The /debug/tracing/events could contain symlinks for the format
> files so that the rest can stay there.

Symlinks buy us nothing but headaches. This is a pseudo filesystem, we
can use the same code that generates the file with no extra overhead.

-- Steve
 



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events
  2010-03-04 22:02                         ` Steven Rostedt
@ 2010-03-04 22:09                           ` Frederic Weisbecker
  0 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-04 22:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Thomas Gleixner,
	H. Peter Anvin, Paul Mackerras, Arnaldo Carvalho de Melo,
	Greg KH

On Thu, Mar 04, 2010 at 05:02:48PM -0500, Steven Rostedt wrote:
> On Thu, 2010-03-04 at 22:37 +0100, Frederic Weisbecker wrote:
> 
> > > I can keep the debug/tracing/events/* as is too, where the format and id
> > > just call the same routines that the eventfs calls, but add the enable
> > > and filter to be specific to ftrace.
> > 
> > 
> > The /debug/tracing/events could contain symlinks for the format
> > files so that the rest can stay there.
> 
> Symlinks buy us nothing but headaches. This is a pseudo filesystem, we
> can use the same code that generates the file with no extra overhead.


Yeah, you're right.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH 0/2] Perf hot regs snapshot for trace events
  2010-03-03  8:38   ` Peter Zijlstra
  2010-03-03 20:07     ` Frederic Weisbecker
  2010-03-04 19:01     ` Frederic Weisbecker
@ 2010-03-05  3:08     ` Frederic Weisbecker
  2010-03-05  3:08       ` Frederic Weisbecker
  2010-03-05  3:08       ` Frederic Weisbecker
  4 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-05  3:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner,
	H . Peter Anvin, Paul Mackerras, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Jason Baron, Archs

Please tell me what you think about this new version.
I still can not fallback to save the ip for archs that
don't implement it as it can't be done in a generic way,
but more work has been made to implement this support
more easily.

No more assembly in this new version, it also uses
functions instead of macros and it implements the callers
level skipping. Then instead of starting the callchain from
perf/ftrace event callbacks, we start on the real event
source.

Thanks.

Frederic Weisbecker (2):
  perf: Introduce new perf_save_regs() for hot regs snapshot
  perf: Take a hot regs snapshot for trace events

 arch/x86/kernel/cpu/perf_event.c   |   12 ++++++++++
 arch/x86/kernel/dumpstack.h        |   15 ++++++++++++
 include/linux/ftrace_event.h       |    7 ++++-
 include/linux/perf_event.h         |   42 +++++++++++++++++++++++++++++++++++-
 include/trace/ftrace.h             |    6 ++++-
 kernel/perf_event.c                |   14 ++++++------
 kernel/trace/trace_event_profile.c |    3 +-
 kernel/trace/trace_kprobe.c        |    5 ++-
 kernel/trace/trace_syscalls.c      |    4 +-
 9 files changed, 92 insertions(+), 16 deletions(-)


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH 1/2] perf: Introduce new perf_save_regs() for hot regs snapshot
  2010-03-03  8:38   ` Peter Zijlstra
  2010-03-03 20:07     ` Frederic Weisbecker
@ 2010-03-05  3:08       ` Frederic Weisbecker
  2010-03-05  3:08     ` [PATCH 0/2] Perf " Frederic Weisbecker
                         ` (2 subsequent siblings)
  4 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-05  3:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner,
	H . Peter Anvin, Paul Mackerras, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Jason Baron, Archs,
	Peter Zijlstra

Events that trigger overflows by interrupting a context can
use get_irq_regs() or task_pt_regs() to retrieve the state
when the event triggered. But this is not the case for some
other class of events like trace events as tracepoints are
executed in the same context than the code that triggered
the event.

It means we need a different api to capture the regs there,
namely we need a hot snapshot to get the most important
informations for perf: the instruction pointer to get the
event origin, the frame pointer for the callchain, the code
segment for user_mode() tests (we always use __KERNEL_CS as
trace events always occur from the kernel) and the eflags
for further purposes.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Archs <linux-arch@vger.kernel.org>
---
 arch/x86/kernel/cpu/perf_event.c |   12 ++++++++++
 arch/x86/kernel/dumpstack.h      |   15 +++++++++++++
 include/linux/perf_event.h       |   42 +++++++++++++++++++++++++++++++++++++-
 kernel/perf_event.c              |    5 ++++
 4 files changed, 73 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 641ccb9..274d0cd 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1662,6 +1662,18 @@ struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
 	return entry;
 }
 
+void perf_arch_save_regs(struct pt_regs *regs, unsigned long ip, int skip)
+{
+	regs->ip = ip;
+	/*
+	 * perf_arch_save_regs adds another call, we need to increment
+	 * the skip level
+	 */
+	regs->bp = rewind_frame_pointer(skip + 1);
+	regs->cs = __KERNEL_CS;
+	local_save_flags(regs->flags);
+}
+
 void hw_perf_event_setup_online(int cpu)
 {
 	init_debug_store_on_cpu(cpu);
diff --git a/arch/x86/kernel/dumpstack.h b/arch/x86/kernel/dumpstack.h
index 4fd1420..29e5f7c 100644
--- a/arch/x86/kernel/dumpstack.h
+++ b/arch/x86/kernel/dumpstack.h
@@ -29,4 +29,19 @@ struct stack_frame {
 	struct stack_frame *next_frame;
 	unsigned long return_address;
 };
+
+static inline unsigned long rewind_frame_pointer(int n)
+{
+	struct stack_frame *frame;
+
+	get_bp(frame);
+
+#ifdef CONFIG_FRAME_POINTER
+	while (n--)
+		frame = frame->next_frame;
 #endif
+
+	return (unsigned long)frame;
+}
+
+#endif /* DUMPSTACK_H */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 04f06b4..e35ad6f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -452,6 +452,7 @@ enum perf_callchain_context {
 #include <linux/fs.h>
 #include <linux/pid_namespace.h>
 #include <linux/workqueue.h>
+#include <linux/ftrace.h>
 #include <asm/atomic.h>
 
 #define PERF_MAX_STACK_DEPTH		255
@@ -840,6 +841,44 @@ perf_sw_event(u32 event_id, u64 nr, int nmi, struct pt_regs *regs, u64 addr)
 		__perf_sw_event(event_id, nr, nmi, regs, addr);
 }
 
+extern void
+perf_arch_save_regs(struct pt_regs *regs, unsigned long ip, int skip);
+
+/*
+ * Take a snapshot of the regs. Skip ip and frame pointer to
+ * the nth caller. We only need a few of the regs:
+ * - ip for PERF_SAMPLE_IP
+ * - cs for user_mode() tests
+ * - bp for callchains
+ * - eflags, for future purposes, just in case
+ */
+static inline void perf_save_regs(struct pt_regs *regs, int skip)
+{
+	unsigned long ip;
+
+	memset(regs, 0, sizeof(*regs));
+
+	switch (skip) {
+	case 1 :
+		ip = CALLER_ADDR0;
+		break;
+	case 2 :
+		ip = CALLER_ADDR1;
+		break;
+	case 3 :
+		ip = CALLER_ADDR2;
+		break;
+	case 4:
+		ip = CALLER_ADDR3;
+		break;
+	/* No need to support further for now */
+	default:
+		ip = 0;
+	}
+
+	return perf_arch_save_regs(regs, ip, skip);
+}
+
 extern void __perf_event_mmap(struct vm_area_struct *vma);
 
 static inline void perf_event_mmap(struct vm_area_struct *vma)
@@ -858,7 +897,8 @@ extern int sysctl_perf_event_mlock;
 extern int sysctl_perf_event_sample_rate;
 
 extern void perf_event_init(void);
-extern void perf_tp_event(int event_id, u64 addr, u64 count, void *record, int entry_size);
+extern void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
+			  int entry_size, struct pt_regs *regs);
 extern void perf_bp_event(struct perf_event *event, void *data);
 
 #ifndef perf_misc_flags
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index a661e79..42b8276 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -2806,6 +2806,11 @@ __weak struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
 	return NULL;
 }
 
+__weak
+void perf_arch_save_regs(struct pt_regs *regs, unsigned long ip, int skip)
+{
+}
+
 /*
  * Output
  */
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 1/2] perf: Introduce new perf_save_regs() for hot regs snapshot
@ 2010-03-05  3:08       ` Frederic Weisbecker
  0 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-05  3:08 UTC (permalink / raw)
  Cc: LKML, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner,
	H . Peter Anvin, Paul Mackerras, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Jason Baron, Archs,
	Peter Zijlstra

Events that trigger overflows by interrupting a context can
use get_irq_regs() or task_pt_regs() to retrieve the state
when the event triggered. But this is not the case for some
other class of events like trace events as tracepoints are
executed in the same context than the code that triggered
the event.

It means we need a different api to capture the regs there,
namely we need a hot snapshot to get the most important
informations for perf: the instruction pointer to get the
event origin, the frame pointer for the callchain, the code
segment for user_mode() tests (we always use __KERNEL_CS as
trace events always occur from the kernel) and the eflags
for further purposes.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Archs <linux-arch@vger.kernel.org>
---
 arch/x86/kernel/cpu/perf_event.c |   12 ++++++++++
 arch/x86/kernel/dumpstack.h      |   15 +++++++++++++
 include/linux/perf_event.h       |   42 +++++++++++++++++++++++++++++++++++++-
 kernel/perf_event.c              |    5 ++++
 4 files changed, 73 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 641ccb9..274d0cd 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1662,6 +1662,18 @@ struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
 	return entry;
 }
 
+void perf_arch_save_regs(struct pt_regs *regs, unsigned long ip, int skip)
+{
+	regs->ip = ip;
+	/*
+	 * perf_arch_save_regs adds another call, we need to increment
+	 * the skip level
+	 */
+	regs->bp = rewind_frame_pointer(skip + 1);
+	regs->cs = __KERNEL_CS;
+	local_save_flags(regs->flags);
+}
+
 void hw_perf_event_setup_online(int cpu)
 {
 	init_debug_store_on_cpu(cpu);
diff --git a/arch/x86/kernel/dumpstack.h b/arch/x86/kernel/dumpstack.h
index 4fd1420..29e5f7c 100644
--- a/arch/x86/kernel/dumpstack.h
+++ b/arch/x86/kernel/dumpstack.h
@@ -29,4 +29,19 @@ struct stack_frame {
 	struct stack_frame *next_frame;
 	unsigned long return_address;
 };
+
+static inline unsigned long rewind_frame_pointer(int n)
+{
+	struct stack_frame *frame;
+
+	get_bp(frame);
+
+#ifdef CONFIG_FRAME_POINTER
+	while (n--)
+		frame = frame->next_frame;
 #endif
+
+	return (unsigned long)frame;
+}
+
+#endif /* DUMPSTACK_H */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 04f06b4..e35ad6f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -452,6 +452,7 @@ enum perf_callchain_context {
 #include <linux/fs.h>
 #include <linux/pid_namespace.h>
 #include <linux/workqueue.h>
+#include <linux/ftrace.h>
 #include <asm/atomic.h>
 
 #define PERF_MAX_STACK_DEPTH		255
@@ -840,6 +841,44 @@ perf_sw_event(u32 event_id, u64 nr, int nmi, struct pt_regs *regs, u64 addr)
 		__perf_sw_event(event_id, nr, nmi, regs, addr);
 }
 
+extern void
+perf_arch_save_regs(struct pt_regs *regs, unsigned long ip, int skip);
+
+/*
+ * Take a snapshot of the regs. Skip ip and frame pointer to
+ * the nth caller. We only need a few of the regs:
+ * - ip for PERF_SAMPLE_IP
+ * - cs for user_mode() tests
+ * - bp for callchains
+ * - eflags, for future purposes, just in case
+ */
+static inline void perf_save_regs(struct pt_regs *regs, int skip)
+{
+	unsigned long ip;
+
+	memset(regs, 0, sizeof(*regs));
+
+	switch (skip) {
+	case 1 :
+		ip = CALLER_ADDR0;
+		break;
+	case 2 :
+		ip = CALLER_ADDR1;
+		break;
+	case 3 :
+		ip = CALLER_ADDR2;
+		break;
+	case 4:
+		ip = CALLER_ADDR3;
+		break;
+	/* No need to support further for now */
+	default:
+		ip = 0;
+	}
+
+	return perf_arch_save_regs(regs, ip, skip);
+}
+
 extern void __perf_event_mmap(struct vm_area_struct *vma);
 
 static inline void perf_event_mmap(struct vm_area_struct *vma)
@@ -858,7 +897,8 @@ extern int sysctl_perf_event_mlock;
 extern int sysctl_perf_event_sample_rate;
 
 extern void perf_event_init(void);
-extern void perf_tp_event(int event_id, u64 addr, u64 count, void *record, int entry_size);
+extern void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
+			  int entry_size, struct pt_regs *regs);
 extern void perf_bp_event(struct perf_event *event, void *data);
 
 #ifndef perf_misc_flags
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index a661e79..42b8276 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -2806,6 +2806,11 @@ __weak struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
 	return NULL;
 }
 
+__weak
+void perf_arch_save_regs(struct pt_regs *regs, unsigned long ip, int skip)
+{
+}
+
 /*
  * Output
  */
-- 
1.6.2.3

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 1/2] perf: Introduce new perf_save_regs() for hot regs snapshot
@ 2010-03-05  3:08       ` Frederic Weisbecker
  0 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-05  3:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner,
	H . Peter Anvin, Paul Mackerras, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Jason Baron, Archs

Events that trigger overflows by interrupting a context can
use get_irq_regs() or task_pt_regs() to retrieve the state
when the event triggered. But this is not the case for some
other class of events like trace events as tracepoints are
executed in the same context than the code that triggered
the event.

It means we need a different api to capture the regs there,
namely we need a hot snapshot to get the most important
informations for perf: the instruction pointer to get the
event origin, the frame pointer for the callchain, the code
segment for user_mode() tests (we always use __KERNEL_CS as
trace events always occur from the kernel) and the eflags
for further purposes.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Archs <linux-arch@vger.kernel.org>
---
 arch/x86/kernel/cpu/perf_event.c |   12 ++++++++++
 arch/x86/kernel/dumpstack.h      |   15 +++++++++++++
 include/linux/perf_event.h       |   42 +++++++++++++++++++++++++++++++++++++-
 kernel/perf_event.c              |    5 ++++
 4 files changed, 73 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 641ccb9..274d0cd 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1662,6 +1662,18 @@ struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
 	return entry;
 }
 
+void perf_arch_save_regs(struct pt_regs *regs, unsigned long ip, int skip)
+{
+	regs->ip = ip;
+	/*
+	 * perf_arch_save_regs adds another call, we need to increment
+	 * the skip level
+	 */
+	regs->bp = rewind_frame_pointer(skip + 1);
+	regs->cs = __KERNEL_CS;
+	local_save_flags(regs->flags);
+}
+
 void hw_perf_event_setup_online(int cpu)
 {
 	init_debug_store_on_cpu(cpu);
diff --git a/arch/x86/kernel/dumpstack.h b/arch/x86/kernel/dumpstack.h
index 4fd1420..29e5f7c 100644
--- a/arch/x86/kernel/dumpstack.h
+++ b/arch/x86/kernel/dumpstack.h
@@ -29,4 +29,19 @@ struct stack_frame {
 	struct stack_frame *next_frame;
 	unsigned long return_address;
 };
+
+static inline unsigned long rewind_frame_pointer(int n)
+{
+	struct stack_frame *frame;
+
+	get_bp(frame);
+
+#ifdef CONFIG_FRAME_POINTER
+	while (n--)
+		frame = frame->next_frame;
 #endif
+
+	return (unsigned long)frame;
+}
+
+#endif /* DUMPSTACK_H */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 04f06b4..e35ad6f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -452,6 +452,7 @@ enum perf_callchain_context {
 #include <linux/fs.h>
 #include <linux/pid_namespace.h>
 #include <linux/workqueue.h>
+#include <linux/ftrace.h>
 #include <asm/atomic.h>
 
 #define PERF_MAX_STACK_DEPTH		255
@@ -840,6 +841,44 @@ perf_sw_event(u32 event_id, u64 nr, int nmi, struct pt_regs *regs, u64 addr)
 		__perf_sw_event(event_id, nr, nmi, regs, addr);
 }
 
+extern void
+perf_arch_save_regs(struct pt_regs *regs, unsigned long ip, int skip);
+
+/*
+ * Take a snapshot of the regs. Skip ip and frame pointer to
+ * the nth caller. We only need a few of the regs:
+ * - ip for PERF_SAMPLE_IP
+ * - cs for user_mode() tests
+ * - bp for callchains
+ * - eflags, for future purposes, just in case
+ */
+static inline void perf_save_regs(struct pt_regs *regs, int skip)
+{
+	unsigned long ip;
+
+	memset(regs, 0, sizeof(*regs));
+
+	switch (skip) {
+	case 1 :
+		ip = CALLER_ADDR0;
+		break;
+	case 2 :
+		ip = CALLER_ADDR1;
+		break;
+	case 3 :
+		ip = CALLER_ADDR2;
+		break;
+	case 4:
+		ip = CALLER_ADDR3;
+		break;
+	/* No need to support further for now */
+	default:
+		ip = 0;
+	}
+
+	return perf_arch_save_regs(regs, ip, skip);
+}
+
 extern void __perf_event_mmap(struct vm_area_struct *vma);
 
 static inline void perf_event_mmap(struct vm_area_struct *vma)
@@ -858,7 +897,8 @@ extern int sysctl_perf_event_mlock;
 extern int sysctl_perf_event_sample_rate;
 
 extern void perf_event_init(void);
-extern void perf_tp_event(int event_id, u64 addr, u64 count, void *record, int entry_size);
+extern void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
+			  int entry_size, struct pt_regs *regs);
 extern void perf_bp_event(struct perf_event *event, void *data);
 
 #ifndef perf_misc_flags
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index a661e79..42b8276 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -2806,6 +2806,11 @@ __weak struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
 	return NULL;
 }
 
+__weak
+void perf_arch_save_regs(struct pt_regs *regs, unsigned long ip, int skip)
+{
+}
+
 /*
  * Output
  */
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 2/2] perf: Take a hot regs snapshot for trace events
  2010-03-03  8:38   ` Peter Zijlstra
  2010-03-03 20:07     ` Frederic Weisbecker
@ 2010-03-05  3:08       ` Frederic Weisbecker
  2010-03-05  3:08     ` [PATCH 0/2] Perf " Frederic Weisbecker
                         ` (2 subsequent siblings)
  4 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-05  3:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner,
	H . Peter Anvin, Paul Mackerras, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Jason Baron, Archs,
	Peter Zijlstra

We are taking a wrong regs snapshot when a trace event triggers.
Either we use get_irq_regs(), which gives us the interrupted
registers if we are in an interrupt, or we use task_pt_regs()
which gives us the state before we entered the kernel, assuming
we are lucky enough to be no kernel thread, in which case
task_pt_regs() returns the initial set of regs when the kernel
thread was started.

What we want is different. We need a hot snapshot of the regs,
so that we can get the instruction pointer to record in the
sample, the frame pointer for the callchain, and some other
things.

Let's use the new perf_save_regs() for that.

Comparison with perf record -e lock: -R -a -f -g
Before:

        perf  [kernel]                   [k] __do_softirq
               |
               --- __do_softirq
                  |
                  |--55.16%-- __open
                  |
                   --44.84%-- __write_nocancel

After:

            perf  [kernel]           [k] perf_tp_event
               |
               --- perf_tp_event
                  |
                  |--41.07%-- lock_acquire
                  |          |
                  |          |--39.36%-- _raw_spin_lock
                  |          |          |
                  |          |          |--7.81%-- hrtimer_interrupt
                  |          |          |          smp_apic_timer_interrupt
                  |          |          |          apic_timer_interrupt

The old case was producing unreliable callchains. Now having
right frame and instruction pointers, we have the trace we
want.

Also syscalls and kprobe events already have the right regs,
let's use them instead of wasting a retrieval.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Archs <linux-arch@vger.kernel.org>
---
 include/linux/ftrace_event.h       |    7 +++++--
 include/trace/ftrace.h             |    6 +++++-
 kernel/perf_event.c                |    9 ++-------
 kernel/trace/trace_event_profile.c |    3 ++-
 kernel/trace/trace_kprobe.c        |    5 +++--
 kernel/trace/trace_syscalls.c      |    4 ++--
 6 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index cd95919..48b367e 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -188,6 +188,9 @@ do {									\
 
 #ifdef CONFIG_PERF_EVENTS
 struct perf_event;
+
+DECLARE_PER_CPU(struct pt_regs, perf_trace_regs);
+
 extern int ftrace_profile_enable(int event_id);
 extern void ftrace_profile_disable(int event_id);
 extern int ftrace_profile_set_filter(struct perf_event *event, int event_id,
@@ -199,11 +202,11 @@ ftrace_perf_buf_prepare(int size, unsigned short type, int *rctxp,
 
 static inline void
 ftrace_perf_buf_submit(void *raw_data, int size, int rctx, u64 addr,
-		       u64 count, unsigned long irq_flags)
+		       u64 count, unsigned long irq_flags, struct pt_regs *regs)
 {
 	struct trace_entry *entry = raw_data;
 
-	perf_tp_event(entry->type, addr, count, raw_data, size);
+	perf_tp_event(entry->type, addr, count, raw_data, size, regs);
 	perf_swevent_put_recursion_context(rctx);
 	local_irq_restore(irq_flags);
 }
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index f2c09e4..a4fa51c 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -853,6 +853,7 @@ ftrace_profile_templ_##call(struct ftrace_event_call *event_call,	\
 	struct ftrace_raw_##call *entry;				\
 	u64 __addr = 0, __count = 1;					\
 	unsigned long irq_flags;					\
+	struct pt_regs *__regs;						\
 	int __entry_size;						\
 	int __data_size;						\
 	int rctx;							\
@@ -873,8 +874,11 @@ ftrace_profile_templ_##call(struct ftrace_event_call *event_call,	\
 									\
 	{ assign; }							\
 									\
+	__regs = &__get_cpu_var(perf_trace_regs);			\
+	perf_save_regs(__regs, 2);					\
+									\
 	ftrace_perf_buf_submit(entry, __entry_size, rctx, __addr,	\
-			       __count, irq_flags);			\
+			       __count, irq_flags, __regs);		\
 }
 
 #undef DEFINE_EVENT
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 42b8276..763a482 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -4340,7 +4340,7 @@ static const struct pmu perf_ops_task_clock = {
 #ifdef CONFIG_EVENT_TRACING
 
 void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
-			  int entry_size)
+		   int entry_size, struct pt_regs *regs)
 {
 	struct perf_raw_record raw = {
 		.size = entry_size,
@@ -4352,14 +4352,9 @@ void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
 		.raw = &raw,
 	};
 
-	struct pt_regs *regs = get_irq_regs();
-
-	if (!regs)
-		regs = task_pt_regs(current);
-
 	/* Trace events already protected against recursion */
 	do_perf_sw_event(PERF_TYPE_TRACEPOINT, event_id, count, 1,
-				&data, regs);
+			 &data, regs);
 }
 EXPORT_SYMBOL_GPL(perf_tp_event);
 
diff --git a/kernel/trace/trace_event_profile.c b/kernel/trace/trace_event_profile.c
index f0d6930..e66d21e 100644
--- a/kernel/trace/trace_event_profile.c
+++ b/kernel/trace/trace_event_profile.c
@@ -2,13 +2,14 @@
  * trace event based perf counter profiling
  *
  * Copyright (C) 2009 Red Hat Inc, Peter Zijlstra <pzijlstr@redhat.com>
- *
+ * Copyright (C) 2009-2010 Frederic Weisbecker <fweisbec@gmail.com>
  */
 
 #include <linux/module.h>
 #include <linux/kprobes.h>
 #include "trace.h"
 
+DEFINE_PER_CPU(struct pt_regs, perf_trace_regs);
 
 static char *perf_trace_buf;
 static char *perf_trace_buf_nmi;
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 356c102..73e0526 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1260,7 +1260,7 @@ static __kprobes void kprobe_profile_func(struct kprobe *kp,
 	for (i = 0; i < tp->nr_args; i++)
 		entry->args[i] = call_fetch(&tp->args[i].fetch, regs);
 
-	ftrace_perf_buf_submit(entry, size, rctx, entry->ip, 1, irq_flags);
+	ftrace_perf_buf_submit(entry, size, rctx, entry->ip, 1, irq_flags, regs);
 }
 
 /* Kretprobe profile handler */
@@ -1291,7 +1291,8 @@ static __kprobes void kretprobe_profile_func(struct kretprobe_instance *ri,
 	for (i = 0; i < tp->nr_args; i++)
 		entry->args[i] = call_fetch(&tp->args[i].fetch, regs);
 
-	ftrace_perf_buf_submit(entry, size, rctx, entry->ret_ip, 1, irq_flags);
+	ftrace_perf_buf_submit(entry, size, rctx, entry->ret_ip, 1,
+			       irq_flags, regs);
 }
 
 static int probe_profile_enable(struct ftrace_event_call *call)
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 4e332b9..d0f34ab 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -462,7 +462,7 @@ static void prof_syscall_enter(struct pt_regs *regs, long id)
 	rec->nr = syscall_nr;
 	syscall_get_arguments(current, regs, 0, sys_data->nb_args,
 			       (unsigned long *)&rec->args);
-	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags);
+	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags, regs);
 }
 
 int prof_sysenter_enable(struct ftrace_event_call *call)
@@ -537,7 +537,7 @@ static void prof_syscall_exit(struct pt_regs *regs, long ret)
 	rec->nr = syscall_nr;
 	rec->ret = syscall_get_return_value(current, regs);
 
-	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags);
+	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags, regs);
 }
 
 int prof_sysexit_enable(struct ftrace_event_call *call)
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 2/2] perf: Take a hot regs snapshot for trace events
@ 2010-03-05  3:08       ` Frederic Weisbecker
  0 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-05  3:08 UTC (permalink / raw)
  Cc: LKML, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner,
	H . Peter Anvin, Paul Mackerras, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Jason Baron, Archs,
	Peter Zijlstra

We are taking a wrong regs snapshot when a trace event triggers.
Either we use get_irq_regs(), which gives us the interrupted
registers if we are in an interrupt, or we use task_pt_regs()
which gives us the state before we entered the kernel, assuming
we are lucky enough to be no kernel thread, in which case
task_pt_regs() returns the initial set of regs when the kernel
thread was started.

What we want is different. We need a hot snapshot of the regs,
so that we can get the instruction pointer to record in the
sample, the frame pointer for the callchain, and some other
things.

Let's use the new perf_save_regs() for that.

Comparison with perf record -e lock: -R -a -f -g
Before:

        perf  [kernel]                   [k] __do_softirq
               |
               --- __do_softirq
                  |
                  |--55.16%-- __open
                  |
                   --44.84%-- __write_nocancel

After:

            perf  [kernel]           [k] perf_tp_event
               |
               --- perf_tp_event
                  |
                  |--41.07%-- lock_acquire
                  |          |
                  |          |--39.36%-- _raw_spin_lock
                  |          |          |
                  |          |          |--7.81%-- hrtimer_interrupt
                  |          |          |          smp_apic_timer_interrupt
                  |          |          |          apic_timer_interrupt

The old case was producing unreliable callchains. Now having
right frame and instruction pointers, we have the trace we
want.

Also syscalls and kprobe events already have the right regs,
let's use them instead of wasting a retrieval.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Archs <linux-arch@vger.kernel.org>
---
 include/linux/ftrace_event.h       |    7 +++++--
 include/trace/ftrace.h             |    6 +++++-
 kernel/perf_event.c                |    9 ++-------
 kernel/trace/trace_event_profile.c |    3 ++-
 kernel/trace/trace_kprobe.c        |    5 +++--
 kernel/trace/trace_syscalls.c      |    4 ++--
 6 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index cd95919..48b367e 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -188,6 +188,9 @@ do {									\
 
 #ifdef CONFIG_PERF_EVENTS
 struct perf_event;
+
+DECLARE_PER_CPU(struct pt_regs, perf_trace_regs);
+
 extern int ftrace_profile_enable(int event_id);
 extern void ftrace_profile_disable(int event_id);
 extern int ftrace_profile_set_filter(struct perf_event *event, int event_id,
@@ -199,11 +202,11 @@ ftrace_perf_buf_prepare(int size, unsigned short type, int *rctxp,
 
 static inline void
 ftrace_perf_buf_submit(void *raw_data, int size, int rctx, u64 addr,
-		       u64 count, unsigned long irq_flags)
+		       u64 count, unsigned long irq_flags, struct pt_regs *regs)
 {
 	struct trace_entry *entry = raw_data;
 
-	perf_tp_event(entry->type, addr, count, raw_data, size);
+	perf_tp_event(entry->type, addr, count, raw_data, size, regs);
 	perf_swevent_put_recursion_context(rctx);
 	local_irq_restore(irq_flags);
 }
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index f2c09e4..a4fa51c 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -853,6 +853,7 @@ ftrace_profile_templ_##call(struct ftrace_event_call *event_call,	\
 	struct ftrace_raw_##call *entry;				\
 	u64 __addr = 0, __count = 1;					\
 	unsigned long irq_flags;					\
+	struct pt_regs *__regs;						\
 	int __entry_size;						\
 	int __data_size;						\
 	int rctx;							\
@@ -873,8 +874,11 @@ ftrace_profile_templ_##call(struct ftrace_event_call *event_call,	\
 									\
 	{ assign; }							\
 									\
+	__regs = &__get_cpu_var(perf_trace_regs);			\
+	perf_save_regs(__regs, 2);					\
+									\
 	ftrace_perf_buf_submit(entry, __entry_size, rctx, __addr,	\
-			       __count, irq_flags);			\
+			       __count, irq_flags, __regs);		\
 }
 
 #undef DEFINE_EVENT
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 42b8276..763a482 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -4340,7 +4340,7 @@ static const struct pmu perf_ops_task_clock = {
 #ifdef CONFIG_EVENT_TRACING
 
 void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
-			  int entry_size)
+		   int entry_size, struct pt_regs *regs)
 {
 	struct perf_raw_record raw = {
 		.size = entry_size,
@@ -4352,14 +4352,9 @@ void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
 		.raw = &raw,
 	};
 
-	struct pt_regs *regs = get_irq_regs();
-
-	if (!regs)
-		regs = task_pt_regs(current);
-
 	/* Trace events already protected against recursion */
 	do_perf_sw_event(PERF_TYPE_TRACEPOINT, event_id, count, 1,
-				&data, regs);
+			 &data, regs);
 }
 EXPORT_SYMBOL_GPL(perf_tp_event);
 
diff --git a/kernel/trace/trace_event_profile.c b/kernel/trace/trace_event_profile.c
index f0d6930..e66d21e 100644
--- a/kernel/trace/trace_event_profile.c
+++ b/kernel/trace/trace_event_profile.c
@@ -2,13 +2,14 @@
  * trace event based perf counter profiling
  *
  * Copyright (C) 2009 Red Hat Inc, Peter Zijlstra <pzijlstr@redhat.com>
- *
+ * Copyright (C) 2009-2010 Frederic Weisbecker <fweisbec@gmail.com>
  */
 
 #include <linux/module.h>
 #include <linux/kprobes.h>
 #include "trace.h"
 
+DEFINE_PER_CPU(struct pt_regs, perf_trace_regs);
 
 static char *perf_trace_buf;
 static char *perf_trace_buf_nmi;
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 356c102..73e0526 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1260,7 +1260,7 @@ static __kprobes void kprobe_profile_func(struct kprobe *kp,
 	for (i = 0; i < tp->nr_args; i++)
 		entry->args[i] = call_fetch(&tp->args[i].fetch, regs);
 
-	ftrace_perf_buf_submit(entry, size, rctx, entry->ip, 1, irq_flags);
+	ftrace_perf_buf_submit(entry, size, rctx, entry->ip, 1, irq_flags, regs);
 }
 
 /* Kretprobe profile handler */
@@ -1291,7 +1291,8 @@ static __kprobes void kretprobe_profile_func(struct kretprobe_instance *ri,
 	for (i = 0; i < tp->nr_args; i++)
 		entry->args[i] = call_fetch(&tp->args[i].fetch, regs);
 
-	ftrace_perf_buf_submit(entry, size, rctx, entry->ret_ip, 1, irq_flags);
+	ftrace_perf_buf_submit(entry, size, rctx, entry->ret_ip, 1,
+			       irq_flags, regs);
 }
 
 static int probe_profile_enable(struct ftrace_event_call *call)
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 4e332b9..d0f34ab 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -462,7 +462,7 @@ static void prof_syscall_enter(struct pt_regs *regs, long id)
 	rec->nr = syscall_nr;
 	syscall_get_arguments(current, regs, 0, sys_data->nb_args,
 			       (unsigned long *)&rec->args);
-	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags);
+	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags, regs);
 }
 
 int prof_sysenter_enable(struct ftrace_event_call *call)
@@ -537,7 +537,7 @@ static void prof_syscall_exit(struct pt_regs *regs, long ret)
 	rec->nr = syscall_nr;
 	rec->ret = syscall_get_return_value(current, regs);
 
-	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags);
+	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags, regs);
 }
 
 int prof_sysexit_enable(struct ftrace_event_call *call)
-- 
1.6.2.3

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 2/2] perf: Take a hot regs snapshot for trace events
@ 2010-03-05  3:08       ` Frederic Weisbecker
  0 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-05  3:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Ingo Molnar, Thomas Gleixner,
	H . Peter Anvin, Paul Mackerras, Steven Rostedt,
	Arnaldo Carvalho de Melo, Masami Hiramatsu, Jason Baron, Archs

We are taking a wrong regs snapshot when a trace event triggers.
Either we use get_irq_regs(), which gives us the interrupted
registers if we are in an interrupt, or we use task_pt_regs()
which gives us the state before we entered the kernel, assuming
we are lucky enough to be no kernel thread, in which case
task_pt_regs() returns the initial set of regs when the kernel
thread was started.

What we want is different. We need a hot snapshot of the regs,
so that we can get the instruction pointer to record in the
sample, the frame pointer for the callchain, and some other
things.

Let's use the new perf_save_regs() for that.

Comparison with perf record -e lock: -R -a -f -g
Before:

        perf  [kernel]                   [k] __do_softirq
               |
               --- __do_softirq
                  |
                  |--55.16%-- __open
                  |
                   --44.84%-- __write_nocancel

After:

            perf  [kernel]           [k] perf_tp_event
               |
               --- perf_tp_event
                  |
                  |--41.07%-- lock_acquire
                  |          |
                  |          |--39.36%-- _raw_spin_lock
                  |          |          |
                  |          |          |--7.81%-- hrtimer_interrupt
                  |          |          |          smp_apic_timer_interrupt
                  |          |          |          apic_timer_interrupt

The old case was producing unreliable callchains. Now having
right frame and instruction pointers, we have the trace we
want.

Also syscalls and kprobe events already have the right regs,
let's use them instead of wasting a retrieval.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Archs <linux-arch@vger.kernel.org>
---
 include/linux/ftrace_event.h       |    7 +++++--
 include/trace/ftrace.h             |    6 +++++-
 kernel/perf_event.c                |    9 ++-------
 kernel/trace/trace_event_profile.c |    3 ++-
 kernel/trace/trace_kprobe.c        |    5 +++--
 kernel/trace/trace_syscalls.c      |    4 ++--
 6 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index cd95919..48b367e 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -188,6 +188,9 @@ do {									\
 
 #ifdef CONFIG_PERF_EVENTS
 struct perf_event;
+
+DECLARE_PER_CPU(struct pt_regs, perf_trace_regs);
+
 extern int ftrace_profile_enable(int event_id);
 extern void ftrace_profile_disable(int event_id);
 extern int ftrace_profile_set_filter(struct perf_event *event, int event_id,
@@ -199,11 +202,11 @@ ftrace_perf_buf_prepare(int size, unsigned short type, int *rctxp,
 
 static inline void
 ftrace_perf_buf_submit(void *raw_data, int size, int rctx, u64 addr,
-		       u64 count, unsigned long irq_flags)
+		       u64 count, unsigned long irq_flags, struct pt_regs *regs)
 {
 	struct trace_entry *entry = raw_data;
 
-	perf_tp_event(entry->type, addr, count, raw_data, size);
+	perf_tp_event(entry->type, addr, count, raw_data, size, regs);
 	perf_swevent_put_recursion_context(rctx);
 	local_irq_restore(irq_flags);
 }
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index f2c09e4..a4fa51c 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -853,6 +853,7 @@ ftrace_profile_templ_##call(struct ftrace_event_call *event_call,	\
 	struct ftrace_raw_##call *entry;				\
 	u64 __addr = 0, __count = 1;					\
 	unsigned long irq_flags;					\
+	struct pt_regs *__regs;						\
 	int __entry_size;						\
 	int __data_size;						\
 	int rctx;							\
@@ -873,8 +874,11 @@ ftrace_profile_templ_##call(struct ftrace_event_call *event_call,	\
 									\
 	{ assign; }							\
 									\
+	__regs = &__get_cpu_var(perf_trace_regs);			\
+	perf_save_regs(__regs, 2);					\
+									\
 	ftrace_perf_buf_submit(entry, __entry_size, rctx, __addr,	\
-			       __count, irq_flags);			\
+			       __count, irq_flags, __regs);		\
 }
 
 #undef DEFINE_EVENT
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 42b8276..763a482 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -4340,7 +4340,7 @@ static const struct pmu perf_ops_task_clock = {
 #ifdef CONFIG_EVENT_TRACING
 
 void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
-			  int entry_size)
+		   int entry_size, struct pt_regs *regs)
 {
 	struct perf_raw_record raw = {
 		.size = entry_size,
@@ -4352,14 +4352,9 @@ void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
 		.raw = &raw,
 	};
 
-	struct pt_regs *regs = get_irq_regs();
-
-	if (!regs)
-		regs = task_pt_regs(current);
-
 	/* Trace events already protected against recursion */
 	do_perf_sw_event(PERF_TYPE_TRACEPOINT, event_id, count, 1,
-				&data, regs);
+			 &data, regs);
 }
 EXPORT_SYMBOL_GPL(perf_tp_event);
 
diff --git a/kernel/trace/trace_event_profile.c b/kernel/trace/trace_event_profile.c
index f0d6930..e66d21e 100644
--- a/kernel/trace/trace_event_profile.c
+++ b/kernel/trace/trace_event_profile.c
@@ -2,13 +2,14 @@
  * trace event based perf counter profiling
  *
  * Copyright (C) 2009 Red Hat Inc, Peter Zijlstra <pzijlstr@redhat.com>
- *
+ * Copyright (C) 2009-2010 Frederic Weisbecker <fweisbec@gmail.com>
  */
 
 #include <linux/module.h>
 #include <linux/kprobes.h>
 #include "trace.h"
 
+DEFINE_PER_CPU(struct pt_regs, perf_trace_regs);
 
 static char *perf_trace_buf;
 static char *perf_trace_buf_nmi;
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 356c102..73e0526 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1260,7 +1260,7 @@ static __kprobes void kprobe_profile_func(struct kprobe *kp,
 	for (i = 0; i < tp->nr_args; i++)
 		entry->args[i] = call_fetch(&tp->args[i].fetch, regs);
 
-	ftrace_perf_buf_submit(entry, size, rctx, entry->ip, 1, irq_flags);
+	ftrace_perf_buf_submit(entry, size, rctx, entry->ip, 1, irq_flags, regs);
 }
 
 /* Kretprobe profile handler */
@@ -1291,7 +1291,8 @@ static __kprobes void kretprobe_profile_func(struct kretprobe_instance *ri,
 	for (i = 0; i < tp->nr_args; i++)
 		entry->args[i] = call_fetch(&tp->args[i].fetch, regs);
 
-	ftrace_perf_buf_submit(entry, size, rctx, entry->ret_ip, 1, irq_flags);
+	ftrace_perf_buf_submit(entry, size, rctx, entry->ret_ip, 1,
+			       irq_flags, regs);
 }
 
 static int probe_profile_enable(struct ftrace_event_call *call)
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 4e332b9..d0f34ab 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -462,7 +462,7 @@ static void prof_syscall_enter(struct pt_regs *regs, long id)
 	rec->nr = syscall_nr;
 	syscall_get_arguments(current, regs, 0, sys_data->nb_args,
 			       (unsigned long *)&rec->args);
-	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags);
+	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags, regs);
 }
 
 int prof_sysenter_enable(struct ftrace_event_call *call)
@@ -537,7 +537,7 @@ static void prof_syscall_exit(struct pt_regs *regs, long ret)
 	rec->nr = syscall_nr;
 	rec->ret = syscall_get_return_value(current, regs);
 
-	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags);
+	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags, regs);
 }
 
 int prof_sysexit_enable(struct ftrace_event_call *call)
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [PATCH 1/2] perf: Introduce new perf_save_regs() for hot regs snapshot
  2010-03-05  3:08       ` Frederic Weisbecker
  (?)
  (?)
@ 2010-03-05 15:08       ` Masami Hiramatsu
  2010-03-05 16:38         ` Frederic Weisbecker
  -1 siblings, 1 reply; 76+ messages in thread
From: Masami Hiramatsu @ 2010-03-05 15:08 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Thomas Gleixner,
	H . Peter Anvin, Paul Mackerras, Steven Rostedt,
	Arnaldo Carvalho de Melo, Jason Baron, Archs

Frederic Weisbecker wrote:
> Events that trigger overflows by interrupting a context can
> use get_irq_regs() or task_pt_regs() to retrieve the state
> when the event triggered. But this is not the case for some
> other class of events like trace events as tracepoints are
> executed in the same context than the code that triggered
> the event.
> 
> It means we need a different api to capture the regs there,
> namely we need a hot snapshot to get the most important
> informations for perf: the instruction pointer to get the
> event origin, the frame pointer for the callchain, the code
> segment for user_mode() tests (we always use __KERNEL_CS as
> trace events always occur from the kernel) and the eflags
> for further purposes.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Masami Hiramatsu <mhiramat@redhat.com>
> Cc: Jason Baron <jbaron@redhat.com>
> Cc: Archs <linux-arch@vger.kernel.org>
> ---
>  arch/x86/kernel/cpu/perf_event.c |   12 ++++++++++
>  arch/x86/kernel/dumpstack.h      |   15 +++++++++++++
>  include/linux/perf_event.h       |   42 +++++++++++++++++++++++++++++++++++++-
>  kernel/perf_event.c              |    5 ++++
>  4 files changed, 73 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index 641ccb9..274d0cd 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -1662,6 +1662,18 @@ struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
>  	return entry;
>  }
>  
> +void perf_arch_save_regs(struct pt_regs *regs, unsigned long ip, int skip)

Hmm, why would you call it 'save_regs' ?
It seems that this function is just for fixing registers 
instead of saving it into somewhere...

Thank you,


-- 
Masami Hiramatsu
e-mail: mhiramat@redhat.com

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 1/2] perf: Introduce new perf_save_regs() for hot regs snapshot
  2010-03-05 15:08       ` Masami Hiramatsu
@ 2010-03-05 16:38         ` Frederic Weisbecker
  2010-03-05 17:08           ` Masami Hiramatsu
  0 siblings, 1 reply; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-05 16:38 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Thomas Gleixner,
	H . Peter Anvin, Paul Mackerras, Steven Rostedt,
	Arnaldo Carvalho de Melo, Jason Baron, Archs

On Fri, Mar 05, 2010 at 10:08:07AM -0500, Masami Hiramatsu wrote:
> > +void perf_arch_save_regs(struct pt_regs *regs, unsigned long ip, int skip)
> 
> Hmm, why would you call it 'save_regs' ?
> It seems that this function is just for fixing registers 
> instead of saving it into somewhere...
> 
> Thank you,


Hmm, save_regs() describes what it does: you pass
a pt_regs and it saves registers inside. But it
has also a kind of fixup thing as it also rewinds.

I'm not sure using a fixup thing for the naming
is correct as we are not starting with initial
regs passed to the function (just a raw buffer).

What about perf_save_caller_regs() ?


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 1/2] perf: Introduce new perf_save_regs() for hot regs snapshot
  2010-03-05 16:38         ` Frederic Weisbecker
@ 2010-03-05 17:08           ` Masami Hiramatsu
  2010-03-05 17:17             ` Frederic Weisbecker
                               ` (2 more replies)
  0 siblings, 3 replies; 76+ messages in thread
From: Masami Hiramatsu @ 2010-03-05 17:08 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Thomas Gleixner,
	H . Peter Anvin, Paul Mackerras, Steven Rostedt,
	Arnaldo Carvalho de Melo, Jason Baron, Archs

Frederic Weisbecker wrote:
> On Fri, Mar 05, 2010 at 10:08:07AM -0500, Masami Hiramatsu wrote:
>>> +void perf_arch_save_regs(struct pt_regs *regs, unsigned long ip, int skip)
>>
>> Hmm, why would you call it 'save_regs' ?
>> It seems that this function is just for fixing registers 
>> instead of saving it into somewhere...
>>
>> Thank you,
> 
> 
> Hmm, save_regs() describes what it does: you pass
> a pt_regs and it saves registers inside. But it
> has also a kind of fixup thing as it also rewinds.

Ah, I see. so this saves current register values
into pt_regs. :)

> 
> I'm not sure using a fixup thing for the naming
> is correct as we are not starting with initial
> regs passed to the function (just a raw buffer).
> 
> What about perf_save_caller_regs() ?
> 

Hmm, I think, it might be better perf_get_caller_regs()
or something like that (fetch ?). 

Thank you,


-- 
Masami Hiramatsu
e-mail: mhiramat@redhat.com

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 1/2] perf: Introduce new perf_save_regs() for hot regs snapshot
  2010-03-05 17:08           ` Masami Hiramatsu
@ 2010-03-05 17:17             ` Frederic Weisbecker
  2010-03-05 20:55               ` Frederic Weisbecker
  2010-03-05 20:55               ` Frederic Weisbecker
  2 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-05 17:17 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Thomas Gleixner,
	H . Peter Anvin, Paul Mackerras, Steven Rostedt,
	Arnaldo Carvalho de Melo, Jason Baron, Archs

On Fri, Mar 05, 2010 at 12:08:19PM -0500, Masami Hiramatsu wrote:
> Frederic Weisbecker wrote:
> > On Fri, Mar 05, 2010 at 10:08:07AM -0500, Masami Hiramatsu wrote:
> >>> +void perf_arch_save_regs(struct pt_regs *regs, unsigned long ip, int skip)
> >>
> >> Hmm, why would you call it 'save_regs' ?
> >> It seems that this function is just for fixing registers 
> >> instead of saving it into somewhere...
> >>
> >> Thank you,
> > 
> > 
> > Hmm, save_regs() describes what it does: you pass
> > a pt_regs and it saves registers inside. But it
> > has also a kind of fixup thing as it also rewinds.
> 
> Ah, I see. so this saves current register values
> into pt_regs. :)
> 
> > 
> > I'm not sure using a fixup thing for the naming
> > is correct as we are not starting with initial
> > regs passed to the function (just a raw buffer).
> > 
> > What about perf_save_caller_regs() ?
> > 
> 
> Hmm, I think, it might be better perf_get_caller_regs()
> or something like that (fetch ?). 


perf_fetch_caller_regs() looks fine. I'll update my
patch accordingly, thanks.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH 1/2] perf: Introduce new perf_fetch_caller_regs() for hot regs snapshot
  2010-03-05 17:08           ` Masami Hiramatsu
  2010-03-05 17:17             ` Frederic Weisbecker
@ 2010-03-05 20:55               ` Frederic Weisbecker
  2010-03-05 20:55               ` Frederic Weisbecker
  2 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-05 20:55 UTC (permalink / raw)
  To: LKML
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Paul Mackerras,
	Steven Rostedt, Masami Hiramatsu, Jason Baron, Thomas Gleixner,
	H . Peter Anvin, Arnaldo Carvalho de Melo, Archs, Ingo Molnar

Events that trigger overflows by interrupting a context can
use get_irq_regs() or task_pt_regs() to retrieve the state
when the event triggered. But this is not the case for some
other class of events like trace events as tracepoints are
executed in the same context than the code that triggered
the event.

It means we need a different api to capture the regs there,
namely we need a hot snapshot to get the most important
informations for perf: the instruction pointer to get the
event origin, the frame pointer for the callchain, the code
segment for user_mode() tests (we always use __KERNEL_CS as
trace events always occur from the kernel) and the eflags
for further purposes.

v2: rename perf_save_regs to perf_fetch_caller_regs as per
Masami's suggestion.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Archs <linux-arch@vger.kernel.org>
---
 arch/x86/kernel/cpu/perf_event.c |   12 ++++++++++
 arch/x86/kernel/dumpstack.h      |   15 +++++++++++++
 include/linux/perf_event.h       |   42 +++++++++++++++++++++++++++++++++++++-
 kernel/perf_event.c              |    5 ++++
 4 files changed, 73 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 641ccb9..4c1f430 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1662,6 +1662,18 @@ struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
 	return entry;
 }
 
+void perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned long ip, int skip)
+{
+	regs->ip = ip;
+	/*
+	 * perf_arch_fetch_caller_regs adds another call, we need to increment
+	 * the skip level
+	 */
+	regs->bp = rewind_frame_pointer(skip + 1);
+	regs->cs = __KERNEL_CS;
+	local_save_flags(regs->flags);
+}
+
 void hw_perf_event_setup_online(int cpu)
 {
 	init_debug_store_on_cpu(cpu);
diff --git a/arch/x86/kernel/dumpstack.h b/arch/x86/kernel/dumpstack.h
index 4fd1420..29e5f7c 100644
--- a/arch/x86/kernel/dumpstack.h
+++ b/arch/x86/kernel/dumpstack.h
@@ -29,4 +29,19 @@ struct stack_frame {
 	struct stack_frame *next_frame;
 	unsigned long return_address;
 };
+
+static inline unsigned long rewind_frame_pointer(int n)
+{
+	struct stack_frame *frame;
+
+	get_bp(frame);
+
+#ifdef CONFIG_FRAME_POINTER
+	while (n--)
+		frame = frame->next_frame;
 #endif
+
+	return (unsigned long)frame;
+}
+
+#endif /* DUMPSTACK_H */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 04f06b4..c859bee 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -452,6 +452,7 @@ enum perf_callchain_context {
 #include <linux/fs.h>
 #include <linux/pid_namespace.h>
 #include <linux/workqueue.h>
+#include <linux/ftrace.h>
 #include <asm/atomic.h>
 
 #define PERF_MAX_STACK_DEPTH		255
@@ -840,6 +841,44 @@ perf_sw_event(u32 event_id, u64 nr, int nmi, struct pt_regs *regs, u64 addr)
 		__perf_sw_event(event_id, nr, nmi, regs, addr);
 }
 
+extern void
+perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned long ip, int skip);
+
+/*
+ * Take a snapshot of the regs. Skip ip and frame pointer to
+ * the nth caller. We only need a few of the regs:
+ * - ip for PERF_SAMPLE_IP
+ * - cs for user_mode() tests
+ * - bp for callchains
+ * - eflags, for future purposes, just in case
+ */
+static inline void perf_fetch_caller_regs(struct pt_regs *regs, int skip)
+{
+	unsigned long ip;
+
+	memset(regs, 0, sizeof(*regs));
+
+	switch (skip) {
+	case 1 :
+		ip = CALLER_ADDR0;
+		break;
+	case 2 :
+		ip = CALLER_ADDR1;
+		break;
+	case 3 :
+		ip = CALLER_ADDR2;
+		break;
+	case 4:
+		ip = CALLER_ADDR3;
+		break;
+	/* No need to support further for now */
+	default:
+		ip = 0;
+	}
+
+	return perf_arch_fetch_caller_regs(regs, ip, skip);
+}
+
 extern void __perf_event_mmap(struct vm_area_struct *vma);
 
 static inline void perf_event_mmap(struct vm_area_struct *vma)
@@ -858,7 +897,8 @@ extern int sysctl_perf_event_mlock;
 extern int sysctl_perf_event_sample_rate;
 
 extern void perf_event_init(void);
-extern void perf_tp_event(int event_id, u64 addr, u64 count, void *record, int entry_size);
+extern void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
+			  int entry_size, struct pt_regs *regs);
 extern void perf_bp_event(struct perf_event *event, void *data);
 
 #ifndef perf_misc_flags
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index a661e79..1c3059b 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -2806,6 +2806,11 @@ __weak struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
 	return NULL;
 }
 
+__weak
+void perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned long ip, int skip)
+{
+}
+
 /*
  * Output
  */
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 1/2] perf: Introduce new perf_fetch_caller_regs() for hot regs snapshot
@ 2010-03-05 20:55               ` Frederic Weisbecker
  0 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-05 20:55 UTC (permalink / raw)
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Paul Mackerras,
	Steven Rostedt, Masami Hiramatsu, Jason Baron, Thomas Gleixner,
	H . Peter Anvin, Arnaldo Carvalho de Melo, Archs, Ingo Molnar

Events that trigger overflows by interrupting a context can
use get_irq_regs() or task_pt_regs() to retrieve the state
when the event triggered. But this is not the case for some
other class of events like trace events as tracepoints are
executed in the same context than the code that triggered
the event.

It means we need a different api to capture the regs there,
namely we need a hot snapshot to get the most important
informations for perf: the instruction pointer to get the
event origin, the frame pointer for the callchain, the code
segment for user_mode() tests (we always use __KERNEL_CS as
trace events always occur from the kernel) and the eflags
for further purposes.

v2: rename perf_save_regs to perf_fetch_caller_regs as per
Masami's suggestion.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Archs <linux-arch@vger.kernel.org>
---
 arch/x86/kernel/cpu/perf_event.c |   12 ++++++++++
 arch/x86/kernel/dumpstack.h      |   15 +++++++++++++
 include/linux/perf_event.h       |   42 +++++++++++++++++++++++++++++++++++++-
 kernel/perf_event.c              |    5 ++++
 4 files changed, 73 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 641ccb9..4c1f430 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1662,6 +1662,18 @@ struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
 	return entry;
 }
 
+void perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned long ip, int skip)
+{
+	regs->ip = ip;
+	/*
+	 * perf_arch_fetch_caller_regs adds another call, we need to increment
+	 * the skip level
+	 */
+	regs->bp = rewind_frame_pointer(skip + 1);
+	regs->cs = __KERNEL_CS;
+	local_save_flags(regs->flags);
+}
+
 void hw_perf_event_setup_online(int cpu)
 {
 	init_debug_store_on_cpu(cpu);
diff --git a/arch/x86/kernel/dumpstack.h b/arch/x86/kernel/dumpstack.h
index 4fd1420..29e5f7c 100644
--- a/arch/x86/kernel/dumpstack.h
+++ b/arch/x86/kernel/dumpstack.h
@@ -29,4 +29,19 @@ struct stack_frame {
 	struct stack_frame *next_frame;
 	unsigned long return_address;
 };
+
+static inline unsigned long rewind_frame_pointer(int n)
+{
+	struct stack_frame *frame;
+
+	get_bp(frame);
+
+#ifdef CONFIG_FRAME_POINTER
+	while (n--)
+		frame = frame->next_frame;
 #endif
+
+	return (unsigned long)frame;
+}
+
+#endif /* DUMPSTACK_H */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 04f06b4..c859bee 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -452,6 +452,7 @@ enum perf_callchain_context {
 #include <linux/fs.h>
 #include <linux/pid_namespace.h>
 #include <linux/workqueue.h>
+#include <linux/ftrace.h>
 #include <asm/atomic.h>
 
 #define PERF_MAX_STACK_DEPTH		255
@@ -840,6 +841,44 @@ perf_sw_event(u32 event_id, u64 nr, int nmi, struct pt_regs *regs, u64 addr)
 		__perf_sw_event(event_id, nr, nmi, regs, addr);
 }
 
+extern void
+perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned long ip, int skip);
+
+/*
+ * Take a snapshot of the regs. Skip ip and frame pointer to
+ * the nth caller. We only need a few of the regs:
+ * - ip for PERF_SAMPLE_IP
+ * - cs for user_mode() tests
+ * - bp for callchains
+ * - eflags, for future purposes, just in case
+ */
+static inline void perf_fetch_caller_regs(struct pt_regs *regs, int skip)
+{
+	unsigned long ip;
+
+	memset(regs, 0, sizeof(*regs));
+
+	switch (skip) {
+	case 1 :
+		ip = CALLER_ADDR0;
+		break;
+	case 2 :
+		ip = CALLER_ADDR1;
+		break;
+	case 3 :
+		ip = CALLER_ADDR2;
+		break;
+	case 4:
+		ip = CALLER_ADDR3;
+		break;
+	/* No need to support further for now */
+	default:
+		ip = 0;
+	}
+
+	return perf_arch_fetch_caller_regs(regs, ip, skip);
+}
+
 extern void __perf_event_mmap(struct vm_area_struct *vma);
 
 static inline void perf_event_mmap(struct vm_area_struct *vma)
@@ -858,7 +897,8 @@ extern int sysctl_perf_event_mlock;
 extern int sysctl_perf_event_sample_rate;
 
 extern void perf_event_init(void);
-extern void perf_tp_event(int event_id, u64 addr, u64 count, void *record, int entry_size);
+extern void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
+			  int entry_size, struct pt_regs *regs);
 extern void perf_bp_event(struct perf_event *event, void *data);
 
 #ifndef perf_misc_flags
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index a661e79..1c3059b 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -2806,6 +2806,11 @@ __weak struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
 	return NULL;
 }
 
+__weak
+void perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned long ip, int skip)
+{
+}
+
 /*
  * Output
  */
-- 
1.6.2.3

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 1/2] perf: Introduce new perf_fetch_caller_regs() for hot regs snapshot
@ 2010-03-05 20:55               ` Frederic Weisbecker
  0 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-05 20:55 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Peter Zijlstra, Paul Mackerras,
	Steven Rostedt, Masami Hiramatsu, Jason Baron, Thomas Gleixner,
	H . Peter Anvin, Arnaldo Carvalho de Melo, Archs, Ingo Molnar

Events that trigger overflows by interrupting a context can
use get_irq_regs() or task_pt_regs() to retrieve the state
when the event triggered. But this is not the case for some
other class of events like trace events as tracepoints are
executed in the same context than the code that triggered
the event.

It means we need a different api to capture the regs there,
namely we need a hot snapshot to get the most important
informations for perf: the instruction pointer to get the
event origin, the frame pointer for the callchain, the code
segment for user_mode() tests (we always use __KERNEL_CS as
trace events always occur from the kernel) and the eflags
for further purposes.

v2: rename perf_save_regs to perf_fetch_caller_regs as per
Masami's suggestion.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Archs <linux-arch@vger.kernel.org>
---
 arch/x86/kernel/cpu/perf_event.c |   12 ++++++++++
 arch/x86/kernel/dumpstack.h      |   15 +++++++++++++
 include/linux/perf_event.h       |   42 +++++++++++++++++++++++++++++++++++++-
 kernel/perf_event.c              |    5 ++++
 4 files changed, 73 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 641ccb9..4c1f430 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1662,6 +1662,18 @@ struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
 	return entry;
 }
 
+void perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned long ip, int skip)
+{
+	regs->ip = ip;
+	/*
+	 * perf_arch_fetch_caller_regs adds another call, we need to increment
+	 * the skip level
+	 */
+	regs->bp = rewind_frame_pointer(skip + 1);
+	regs->cs = __KERNEL_CS;
+	local_save_flags(regs->flags);
+}
+
 void hw_perf_event_setup_online(int cpu)
 {
 	init_debug_store_on_cpu(cpu);
diff --git a/arch/x86/kernel/dumpstack.h b/arch/x86/kernel/dumpstack.h
index 4fd1420..29e5f7c 100644
--- a/arch/x86/kernel/dumpstack.h
+++ b/arch/x86/kernel/dumpstack.h
@@ -29,4 +29,19 @@ struct stack_frame {
 	struct stack_frame *next_frame;
 	unsigned long return_address;
 };
+
+static inline unsigned long rewind_frame_pointer(int n)
+{
+	struct stack_frame *frame;
+
+	get_bp(frame);
+
+#ifdef CONFIG_FRAME_POINTER
+	while (n--)
+		frame = frame->next_frame;
 #endif
+
+	return (unsigned long)frame;
+}
+
+#endif /* DUMPSTACK_H */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 04f06b4..c859bee 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -452,6 +452,7 @@ enum perf_callchain_context {
 #include <linux/fs.h>
 #include <linux/pid_namespace.h>
 #include <linux/workqueue.h>
+#include <linux/ftrace.h>
 #include <asm/atomic.h>
 
 #define PERF_MAX_STACK_DEPTH		255
@@ -840,6 +841,44 @@ perf_sw_event(u32 event_id, u64 nr, int nmi, struct pt_regs *regs, u64 addr)
 		__perf_sw_event(event_id, nr, nmi, regs, addr);
 }
 
+extern void
+perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned long ip, int skip);
+
+/*
+ * Take a snapshot of the regs. Skip ip and frame pointer to
+ * the nth caller. We only need a few of the regs:
+ * - ip for PERF_SAMPLE_IP
+ * - cs for user_mode() tests
+ * - bp for callchains
+ * - eflags, for future purposes, just in case
+ */
+static inline void perf_fetch_caller_regs(struct pt_regs *regs, int skip)
+{
+	unsigned long ip;
+
+	memset(regs, 0, sizeof(*regs));
+
+	switch (skip) {
+	case 1 :
+		ip = CALLER_ADDR0;
+		break;
+	case 2 :
+		ip = CALLER_ADDR1;
+		break;
+	case 3 :
+		ip = CALLER_ADDR2;
+		break;
+	case 4:
+		ip = CALLER_ADDR3;
+		break;
+	/* No need to support further for now */
+	default:
+		ip = 0;
+	}
+
+	return perf_arch_fetch_caller_regs(regs, ip, skip);
+}
+
 extern void __perf_event_mmap(struct vm_area_struct *vma);
 
 static inline void perf_event_mmap(struct vm_area_struct *vma)
@@ -858,7 +897,8 @@ extern int sysctl_perf_event_mlock;
 extern int sysctl_perf_event_sample_rate;
 
 extern void perf_event_init(void);
-extern void perf_tp_event(int event_id, u64 addr, u64 count, void *record, int entry_size);
+extern void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
+			  int entry_size, struct pt_regs *regs);
 extern void perf_bp_event(struct perf_event *event, void *data);
 
 #ifndef perf_misc_flags
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index a661e79..1c3059b 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -2806,6 +2806,11 @@ __weak struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
 	return NULL;
 }
 
+__weak
+void perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned long ip, int skip)
+{
+}
+
 /*
  * Output
  */
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 2/2] perf: Take a hot regs snapshot for trace events
  2010-03-05 17:08           ` Masami Hiramatsu
  2010-03-05 17:17             ` Frederic Weisbecker
@ 2010-03-05 20:55               ` Frederic Weisbecker
  2010-03-05 20:55               ` Frederic Weisbecker
  2 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-05 20:55 UTC (permalink / raw)
  To: LKML
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Paul Mackerras,
	Steven Rostedt, Masami Hiramatsu, Jason Baron, Thomas Gleixner,
	H . Peter Anvin, Arnaldo Carvalho de Melo, Archs, Ingo Molnar

We are taking a wrong regs snapshot when a trace event triggers.
Either we use get_irq_regs(), which gives us the interrupted
registers if we are in an interrupt, or we use task_pt_regs()
which gives us the state before we entered the kernel, assuming
we are lucky enough to be no kernel thread, in which case
task_pt_regs() returns the initial set of regs when the kernel
thread was started.

What we want is different. We need a hot snapshot of the regs,
so that we can get the instruction pointer to record in the
sample, the frame pointer for the callchain, and some other
things.

Let's use the new perf_fetch_caller_regs() for that.

Comparison with perf record -e lock: -R -a -f -g
Before:

        perf  [kernel]                   [k] __do_softirq
               |
               --- __do_softirq
                  |
                  |--55.16%-- __open
                  |
                   --44.84%-- __write_nocancel

After:

            perf  [kernel]           [k] perf_tp_event
               |
               --- perf_tp_event
                  |
                  |--41.07%-- lock_acquire
                  |          |
                  |          |--39.36%-- _raw_spin_lock
                  |          |          |
                  |          |          |--7.81%-- hrtimer_interrupt
                  |          |          |          smp_apic_timer_interrupt
                  |          |          |          apic_timer_interrupt

The old case was producing unreliable callchains. Now having
right frame and instruction pointers, we have the trace we
want.

Also syscalls and kprobe events already have the right regs,
let's use them instead of wasting a retrieval.

v2: Follow the rename perf_save_regs() -> perf_fetch_caller_regs()

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Archs <linux-arch@vger.kernel.org>
---
 include/linux/ftrace_event.h       |    7 +++++--
 include/trace/ftrace.h             |    6 +++++-
 kernel/perf_event.c                |    9 ++-------
 kernel/trace/trace_event_profile.c |    3 ++-
 kernel/trace/trace_kprobe.c        |    5 +++--
 kernel/trace/trace_syscalls.c      |    4 ++--
 6 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index cd95919..48b367e 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -188,6 +188,9 @@ do {									\
 
 #ifdef CONFIG_PERF_EVENTS
 struct perf_event;
+
+DECLARE_PER_CPU(struct pt_regs, perf_trace_regs);
+
 extern int ftrace_profile_enable(int event_id);
 extern void ftrace_profile_disable(int event_id);
 extern int ftrace_profile_set_filter(struct perf_event *event, int event_id,
@@ -199,11 +202,11 @@ ftrace_perf_buf_prepare(int size, unsigned short type, int *rctxp,
 
 static inline void
 ftrace_perf_buf_submit(void *raw_data, int size, int rctx, u64 addr,
-		       u64 count, unsigned long irq_flags)
+		       u64 count, unsigned long irq_flags, struct pt_regs *regs)
 {
 	struct trace_entry *entry = raw_data;
 
-	perf_tp_event(entry->type, addr, count, raw_data, size);
+	perf_tp_event(entry->type, addr, count, raw_data, size, regs);
 	perf_swevent_put_recursion_context(rctx);
 	local_irq_restore(irq_flags);
 }
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index f2c09e4..1f7677d 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -853,6 +853,7 @@ ftrace_profile_templ_##call(struct ftrace_event_call *event_call,	\
 	struct ftrace_raw_##call *entry;				\
 	u64 __addr = 0, __count = 1;					\
 	unsigned long irq_flags;					\
+	struct pt_regs *__regs;						\
 	int __entry_size;						\
 	int __data_size;						\
 	int rctx;							\
@@ -873,8 +874,11 @@ ftrace_profile_templ_##call(struct ftrace_event_call *event_call,	\
 									\
 	{ assign; }							\
 									\
+	__regs = &__get_cpu_var(perf_trace_regs);			\
+	perf_fetch_caller_regs(__regs, 2);				\
+									\
 	ftrace_perf_buf_submit(entry, __entry_size, rctx, __addr,	\
-			       __count, irq_flags);			\
+			       __count, irq_flags, __regs);		\
 }
 
 #undef DEFINE_EVENT
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 1c3059b..aa3b98c 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -4340,7 +4340,7 @@ static const struct pmu perf_ops_task_clock = {
 #ifdef CONFIG_EVENT_TRACING
 
 void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
-			  int entry_size)
+		   int entry_size, struct pt_regs *regs)
 {
 	struct perf_raw_record raw = {
 		.size = entry_size,
@@ -4352,14 +4352,9 @@ void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
 		.raw = &raw,
 	};
 
-	struct pt_regs *regs = get_irq_regs();
-
-	if (!regs)
-		regs = task_pt_regs(current);
-
 	/* Trace events already protected against recursion */
 	do_perf_sw_event(PERF_TYPE_TRACEPOINT, event_id, count, 1,
-				&data, regs);
+			 &data, regs);
 }
 EXPORT_SYMBOL_GPL(perf_tp_event);
 
diff --git a/kernel/trace/trace_event_profile.c b/kernel/trace/trace_event_profile.c
index f0d6930..e66d21e 100644
--- a/kernel/trace/trace_event_profile.c
+++ b/kernel/trace/trace_event_profile.c
@@ -2,13 +2,14 @@
  * trace event based perf counter profiling
  *
  * Copyright (C) 2009 Red Hat Inc, Peter Zijlstra <pzijlstr@redhat.com>
- *
+ * Copyright (C) 2009-2010 Frederic Weisbecker <fweisbec@gmail.com>
  */
 
 #include <linux/module.h>
 #include <linux/kprobes.h>
 #include "trace.h"
 
+DEFINE_PER_CPU(struct pt_regs, perf_trace_regs);
 
 static char *perf_trace_buf;
 static char *perf_trace_buf_nmi;
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 356c102..73e0526 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1260,7 +1260,7 @@ static __kprobes void kprobe_profile_func(struct kprobe *kp,
 	for (i = 0; i < tp->nr_args; i++)
 		entry->args[i] = call_fetch(&tp->args[i].fetch, regs);
 
-	ftrace_perf_buf_submit(entry, size, rctx, entry->ip, 1, irq_flags);
+	ftrace_perf_buf_submit(entry, size, rctx, entry->ip, 1, irq_flags, regs);
 }
 
 /* Kretprobe profile handler */
@@ -1291,7 +1291,8 @@ static __kprobes void kretprobe_profile_func(struct kretprobe_instance *ri,
 	for (i = 0; i < tp->nr_args; i++)
 		entry->args[i] = call_fetch(&tp->args[i].fetch, regs);
 
-	ftrace_perf_buf_submit(entry, size, rctx, entry->ret_ip, 1, irq_flags);
+	ftrace_perf_buf_submit(entry, size, rctx, entry->ret_ip, 1,
+			       irq_flags, regs);
 }
 
 static int probe_profile_enable(struct ftrace_event_call *call)
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 4e332b9..d0f34ab 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -462,7 +462,7 @@ static void prof_syscall_enter(struct pt_regs *regs, long id)
 	rec->nr = syscall_nr;
 	syscall_get_arguments(current, regs, 0, sys_data->nb_args,
 			       (unsigned long *)&rec->args);
-	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags);
+	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags, regs);
 }
 
 int prof_sysenter_enable(struct ftrace_event_call *call)
@@ -537,7 +537,7 @@ static void prof_syscall_exit(struct pt_regs *regs, long ret)
 	rec->nr = syscall_nr;
 	rec->ret = syscall_get_return_value(current, regs);
 
-	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags);
+	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags, regs);
 }
 
 int prof_sysexit_enable(struct ftrace_event_call *call)
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 2/2] perf: Take a hot regs snapshot for trace events
@ 2010-03-05 20:55               ` Frederic Weisbecker
  0 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-05 20:55 UTC (permalink / raw)
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Paul Mackerras,
	Steven Rostedt, Masami Hiramatsu, Jason Baron, Thomas Gleixner,
	H . Peter Anvin, Arnaldo Carvalho de Melo, Archs, Ingo Molnar

We are taking a wrong regs snapshot when a trace event triggers.
Either we use get_irq_regs(), which gives us the interrupted
registers if we are in an interrupt, or we use task_pt_regs()
which gives us the state before we entered the kernel, assuming
we are lucky enough to be no kernel thread, in which case
task_pt_regs() returns the initial set of regs when the kernel
thread was started.

What we want is different. We need a hot snapshot of the regs,
so that we can get the instruction pointer to record in the
sample, the frame pointer for the callchain, and some other
things.

Let's use the new perf_fetch_caller_regs() for that.

Comparison with perf record -e lock: -R -a -f -g
Before:

        perf  [kernel]                   [k] __do_softirq
               |
               --- __do_softirq
                  |
                  |--55.16%-- __open
                  |
                   --44.84%-- __write_nocancel

After:

            perf  [kernel]           [k] perf_tp_event
               |
               --- perf_tp_event
                  |
                  |--41.07%-- lock_acquire
                  |          |
                  |          |--39.36%-- _raw_spin_lock
                  |          |          |
                  |          |          |--7.81%-- hrtimer_interrupt
                  |          |          |          smp_apic_timer_interrupt
                  |          |          |          apic_timer_interrupt

The old case was producing unreliable callchains. Now having
right frame and instruction pointers, we have the trace we
want.

Also syscalls and kprobe events already have the right regs,
let's use them instead of wasting a retrieval.

v2: Follow the rename perf_save_regs() -> perf_fetch_caller_regs()

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Archs <linux-arch@vger.kernel.org>
---
 include/linux/ftrace_event.h       |    7 +++++--
 include/trace/ftrace.h             |    6 +++++-
 kernel/perf_event.c                |    9 ++-------
 kernel/trace/trace_event_profile.c |    3 ++-
 kernel/trace/trace_kprobe.c        |    5 +++--
 kernel/trace/trace_syscalls.c      |    4 ++--
 6 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index cd95919..48b367e 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -188,6 +188,9 @@ do {									\
 
 #ifdef CONFIG_PERF_EVENTS
 struct perf_event;
+
+DECLARE_PER_CPU(struct pt_regs, perf_trace_regs);
+
 extern int ftrace_profile_enable(int event_id);
 extern void ftrace_profile_disable(int event_id);
 extern int ftrace_profile_set_filter(struct perf_event *event, int event_id,
@@ -199,11 +202,11 @@ ftrace_perf_buf_prepare(int size, unsigned short type, int *rctxp,
 
 static inline void
 ftrace_perf_buf_submit(void *raw_data, int size, int rctx, u64 addr,
-		       u64 count, unsigned long irq_flags)
+		       u64 count, unsigned long irq_flags, struct pt_regs *regs)
 {
 	struct trace_entry *entry = raw_data;
 
-	perf_tp_event(entry->type, addr, count, raw_data, size);
+	perf_tp_event(entry->type, addr, count, raw_data, size, regs);
 	perf_swevent_put_recursion_context(rctx);
 	local_irq_restore(irq_flags);
 }
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index f2c09e4..1f7677d 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -853,6 +853,7 @@ ftrace_profile_templ_##call(struct ftrace_event_call *event_call,	\
 	struct ftrace_raw_##call *entry;				\
 	u64 __addr = 0, __count = 1;					\
 	unsigned long irq_flags;					\
+	struct pt_regs *__regs;						\
 	int __entry_size;						\
 	int __data_size;						\
 	int rctx;							\
@@ -873,8 +874,11 @@ ftrace_profile_templ_##call(struct ftrace_event_call *event_call,	\
 									\
 	{ assign; }							\
 									\
+	__regs = &__get_cpu_var(perf_trace_regs);			\
+	perf_fetch_caller_regs(__regs, 2);				\
+									\
 	ftrace_perf_buf_submit(entry, __entry_size, rctx, __addr,	\
-			       __count, irq_flags);			\
+			       __count, irq_flags, __regs);		\
 }
 
 #undef DEFINE_EVENT
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 1c3059b..aa3b98c 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -4340,7 +4340,7 @@ static const struct pmu perf_ops_task_clock = {
 #ifdef CONFIG_EVENT_TRACING
 
 void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
-			  int entry_size)
+		   int entry_size, struct pt_regs *regs)
 {
 	struct perf_raw_record raw = {
 		.size = entry_size,
@@ -4352,14 +4352,9 @@ void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
 		.raw = &raw,
 	};
 
-	struct pt_regs *regs = get_irq_regs();
-
-	if (!regs)
-		regs = task_pt_regs(current);
-
 	/* Trace events already protected against recursion */
 	do_perf_sw_event(PERF_TYPE_TRACEPOINT, event_id, count, 1,
-				&data, regs);
+			 &data, regs);
 }
 EXPORT_SYMBOL_GPL(perf_tp_event);
 
diff --git a/kernel/trace/trace_event_profile.c b/kernel/trace/trace_event_profile.c
index f0d6930..e66d21e 100644
--- a/kernel/trace/trace_event_profile.c
+++ b/kernel/trace/trace_event_profile.c
@@ -2,13 +2,14 @@
  * trace event based perf counter profiling
  *
  * Copyright (C) 2009 Red Hat Inc, Peter Zijlstra <pzijlstr@redhat.com>
- *
+ * Copyright (C) 2009-2010 Frederic Weisbecker <fweisbec@gmail.com>
  */
 
 #include <linux/module.h>
 #include <linux/kprobes.h>
 #include "trace.h"
 
+DEFINE_PER_CPU(struct pt_regs, perf_trace_regs);
 
 static char *perf_trace_buf;
 static char *perf_trace_buf_nmi;
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 356c102..73e0526 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1260,7 +1260,7 @@ static __kprobes void kprobe_profile_func(struct kprobe *kp,
 	for (i = 0; i < tp->nr_args; i++)
 		entry->args[i] = call_fetch(&tp->args[i].fetch, regs);
 
-	ftrace_perf_buf_submit(entry, size, rctx, entry->ip, 1, irq_flags);
+	ftrace_perf_buf_submit(entry, size, rctx, entry->ip, 1, irq_flags, regs);
 }
 
 /* Kretprobe profile handler */
@@ -1291,7 +1291,8 @@ static __kprobes void kretprobe_profile_func(struct kretprobe_instance *ri,
 	for (i = 0; i < tp->nr_args; i++)
 		entry->args[i] = call_fetch(&tp->args[i].fetch, regs);
 
-	ftrace_perf_buf_submit(entry, size, rctx, entry->ret_ip, 1, irq_flags);
+	ftrace_perf_buf_submit(entry, size, rctx, entry->ret_ip, 1,
+			       irq_flags, regs);
 }
 
 static int probe_profile_enable(struct ftrace_event_call *call)
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 4e332b9..d0f34ab 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -462,7 +462,7 @@ static void prof_syscall_enter(struct pt_regs *regs, long id)
 	rec->nr = syscall_nr;
 	syscall_get_arguments(current, regs, 0, sys_data->nb_args,
 			       (unsigned long *)&rec->args);
-	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags);
+	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags, regs);
 }
 
 int prof_sysenter_enable(struct ftrace_event_call *call)
@@ -537,7 +537,7 @@ static void prof_syscall_exit(struct pt_regs *regs, long ret)
 	rec->nr = syscall_nr;
 	rec->ret = syscall_get_return_value(current, regs);
 
-	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags);
+	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags, regs);
 }
 
 int prof_sysexit_enable(struct ftrace_event_call *call)
-- 
1.6.2.3

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 2/2] perf: Take a hot regs snapshot for trace events
@ 2010-03-05 20:55               ` Frederic Weisbecker
  0 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-05 20:55 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Peter Zijlstra, Paul Mackerras,
	Steven Rostedt, Masami Hiramatsu, Jason Baron, Thomas Gleixner,
	H . Peter Anvin, Arnaldo Carvalho de Melo, Archs, Ingo Molnar

We are taking a wrong regs snapshot when a trace event triggers.
Either we use get_irq_regs(), which gives us the interrupted
registers if we are in an interrupt, or we use task_pt_regs()
which gives us the state before we entered the kernel, assuming
we are lucky enough to be no kernel thread, in which case
task_pt_regs() returns the initial set of regs when the kernel
thread was started.

What we want is different. We need a hot snapshot of the regs,
so that we can get the instruction pointer to record in the
sample, the frame pointer for the callchain, and some other
things.

Let's use the new perf_fetch_caller_regs() for that.

Comparison with perf record -e lock: -R -a -f -g
Before:

        perf  [kernel]                   [k] __do_softirq
               |
               --- __do_softirq
                  |
                  |--55.16%-- __open
                  |
                   --44.84%-- __write_nocancel

After:

            perf  [kernel]           [k] perf_tp_event
               |
               --- perf_tp_event
                  |
                  |--41.07%-- lock_acquire
                  |          |
                  |          |--39.36%-- _raw_spin_lock
                  |          |          |
                  |          |          |--7.81%-- hrtimer_interrupt
                  |          |          |          smp_apic_timer_interrupt
                  |          |          |          apic_timer_interrupt

The old case was producing unreliable callchains. Now having
right frame and instruction pointers, we have the trace we
want.

Also syscalls and kprobe events already have the right regs,
let's use them instead of wasting a retrieval.

v2: Follow the rename perf_save_regs() -> perf_fetch_caller_regs()

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Archs <linux-arch@vger.kernel.org>
---
 include/linux/ftrace_event.h       |    7 +++++--
 include/trace/ftrace.h             |    6 +++++-
 kernel/perf_event.c                |    9 ++-------
 kernel/trace/trace_event_profile.c |    3 ++-
 kernel/trace/trace_kprobe.c        |    5 +++--
 kernel/trace/trace_syscalls.c      |    4 ++--
 6 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index cd95919..48b367e 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -188,6 +188,9 @@ do {									\
 
 #ifdef CONFIG_PERF_EVENTS
 struct perf_event;
+
+DECLARE_PER_CPU(struct pt_regs, perf_trace_regs);
+
 extern int ftrace_profile_enable(int event_id);
 extern void ftrace_profile_disable(int event_id);
 extern int ftrace_profile_set_filter(struct perf_event *event, int event_id,
@@ -199,11 +202,11 @@ ftrace_perf_buf_prepare(int size, unsigned short type, int *rctxp,
 
 static inline void
 ftrace_perf_buf_submit(void *raw_data, int size, int rctx, u64 addr,
-		       u64 count, unsigned long irq_flags)
+		       u64 count, unsigned long irq_flags, struct pt_regs *regs)
 {
 	struct trace_entry *entry = raw_data;
 
-	perf_tp_event(entry->type, addr, count, raw_data, size);
+	perf_tp_event(entry->type, addr, count, raw_data, size, regs);
 	perf_swevent_put_recursion_context(rctx);
 	local_irq_restore(irq_flags);
 }
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index f2c09e4..1f7677d 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -853,6 +853,7 @@ ftrace_profile_templ_##call(struct ftrace_event_call *event_call,	\
 	struct ftrace_raw_##call *entry;				\
 	u64 __addr = 0, __count = 1;					\
 	unsigned long irq_flags;					\
+	struct pt_regs *__regs;						\
 	int __entry_size;						\
 	int __data_size;						\
 	int rctx;							\
@@ -873,8 +874,11 @@ ftrace_profile_templ_##call(struct ftrace_event_call *event_call,	\
 									\
 	{ assign; }							\
 									\
+	__regs = &__get_cpu_var(perf_trace_regs);			\
+	perf_fetch_caller_regs(__regs, 2);				\
+									\
 	ftrace_perf_buf_submit(entry, __entry_size, rctx, __addr,	\
-			       __count, irq_flags);			\
+			       __count, irq_flags, __regs);		\
 }
 
 #undef DEFINE_EVENT
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 1c3059b..aa3b98c 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -4340,7 +4340,7 @@ static const struct pmu perf_ops_task_clock = {
 #ifdef CONFIG_EVENT_TRACING
 
 void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
-			  int entry_size)
+		   int entry_size, struct pt_regs *regs)
 {
 	struct perf_raw_record raw = {
 		.size = entry_size,
@@ -4352,14 +4352,9 @@ void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
 		.raw = &raw,
 	};
 
-	struct pt_regs *regs = get_irq_regs();
-
-	if (!regs)
-		regs = task_pt_regs(current);
-
 	/* Trace events already protected against recursion */
 	do_perf_sw_event(PERF_TYPE_TRACEPOINT, event_id, count, 1,
-				&data, regs);
+			 &data, regs);
 }
 EXPORT_SYMBOL_GPL(perf_tp_event);
 
diff --git a/kernel/trace/trace_event_profile.c b/kernel/trace/trace_event_profile.c
index f0d6930..e66d21e 100644
--- a/kernel/trace/trace_event_profile.c
+++ b/kernel/trace/trace_event_profile.c
@@ -2,13 +2,14 @@
  * trace event based perf counter profiling
  *
  * Copyright (C) 2009 Red Hat Inc, Peter Zijlstra <pzijlstr@redhat.com>
- *
+ * Copyright (C) 2009-2010 Frederic Weisbecker <fweisbec@gmail.com>
  */
 
 #include <linux/module.h>
 #include <linux/kprobes.h>
 #include "trace.h"
 
+DEFINE_PER_CPU(struct pt_regs, perf_trace_regs);
 
 static char *perf_trace_buf;
 static char *perf_trace_buf_nmi;
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 356c102..73e0526 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1260,7 +1260,7 @@ static __kprobes void kprobe_profile_func(struct kprobe *kp,
 	for (i = 0; i < tp->nr_args; i++)
 		entry->args[i] = call_fetch(&tp->args[i].fetch, regs);
 
-	ftrace_perf_buf_submit(entry, size, rctx, entry->ip, 1, irq_flags);
+	ftrace_perf_buf_submit(entry, size, rctx, entry->ip, 1, irq_flags, regs);
 }
 
 /* Kretprobe profile handler */
@@ -1291,7 +1291,8 @@ static __kprobes void kretprobe_profile_func(struct kretprobe_instance *ri,
 	for (i = 0; i < tp->nr_args; i++)
 		entry->args[i] = call_fetch(&tp->args[i].fetch, regs);
 
-	ftrace_perf_buf_submit(entry, size, rctx, entry->ret_ip, 1, irq_flags);
+	ftrace_perf_buf_submit(entry, size, rctx, entry->ret_ip, 1,
+			       irq_flags, regs);
 }
 
 static int probe_profile_enable(struct ftrace_event_call *call)
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 4e332b9..d0f34ab 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -462,7 +462,7 @@ static void prof_syscall_enter(struct pt_regs *regs, long id)
 	rec->nr = syscall_nr;
 	syscall_get_arguments(current, regs, 0, sys_data->nb_args,
 			       (unsigned long *)&rec->args);
-	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags);
+	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags, regs);
 }
 
 int prof_sysenter_enable(struct ftrace_event_call *call)
@@ -537,7 +537,7 @@ static void prof_syscall_exit(struct pt_regs *regs, long ret)
 	rec->nr = syscall_nr;
 	rec->ret = syscall_get_return_value(current, regs);
 
-	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags);
+	ftrace_perf_buf_submit(rec, size, rctx, 0, 1, flags, regs);
 }
 
 int prof_sysexit_enable(struct ftrace_event_call *call)
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [PATCH 1/3] lockdep: Move lock events under lockdep recursion protection
  2010-03-03  6:55 ` [PATCH 1/3] lockdep: Move lock events under lockdep recursion protection Frederic Weisbecker
@ 2010-03-09  7:18   ` Hitoshi Mitake
  2010-03-10  0:10     ` Frederic Weisbecker
  2010-03-09  8:34   ` Jens Axboe
  1 sibling, 1 reply; 76+ messages in thread
From: Hitoshi Mitake @ 2010-03-09  7:18 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Steven Rostedt, Paul Mackerras, Li Zefan, Lai Jiangshan,
	Masami Hiramatsu, Jens Axboe

On 03/03/10 15:55, Frederic Weisbecker wrote:
> There are rcu locked read side areas in the path where we submit
> a trace event. And these rcu_read_(un)lock() trigger lock events,
> which create recursive events.
>
> One pair in do_perf_sw_event:
>
> __lock_acquire
>        |
>        |--96.11%-- lock_acquire
>        |          |
>        |          |--27.21%-- do_perf_sw_event
>        |          |          perf_tp_event
>        |          |          |
>        |          |          |--49.62%-- ftrace_profile_lock_release
>        |          |          |          lock_release
>        |          |          |          |
>        |          |          |          |--33.85%-- _raw_spin_unlock
>
> Another pair in perf_output_begin/end:
>
> __lock_acquire
>        |--23.40%-- perf_output_begin
>        |          |          __perf_event_overflow
>        |          |          perf_swevent_overflow
>        |          |          perf_swevent_add
>        |          |          perf_swevent_ctx_event
>        |          |          do_perf_sw_event
>        |          |          perf_tp_event
>        |          |          |
>        |          |          |--55.37%-- ftrace_profile_lock_acquire
>        |          |          |          lock_acquire
>        |          |          |          |
>        |          |          |          |--37.31%-- _raw_spin_lock
>
> The problem is not that much the trace recursion itself, as we have a
> recursion protection already (though it's always wasteful to recurse).
> But the trace events are outside the lockdep recursion protection, then
> each lockdep event triggers a lock trace, which will trigger two
> other lockdep events. Here the recursive lock trace event won't
> be taken because of the trace recursion, so the recursion stops there
> but lockdep will still analyse these new events:
>
> To sum up, for each lockdep events we have:
>
> 	lock_*()
> 	     |
>               trace lock_acquire
>                    |
>                    ----- rcu_read_lock()
>                    |          |
>                    |          lock_acquire()
>                    |          |
>                    |          trace_lock_acquire() (stopped)
>                    |          |
> 		  |          lockdep analyze
>                    |
>                    ----- rcu_read_unlock()
>                               |
>                               lock_release
>                               |
>                               trace_lock_release() (stopped)
>                               |
>                               lockdep analyze
>
> And you can repeat the above two times as we have two rcu read side
> sections when we submit an event.
>
> This is fixed in this patch by moving the lock trace event under
> the lockdep recursion protection.

Thanks a lot, Frederic!

I tested perf lock with your patch, result is like this,

Typical scores:

before:
% sudo ./perf lock record ./perf bench sched messaging
# Running sched/messaging benchmark...
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

      Total time: 3.265 [sec]
[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 143.952 MB perf.data (~6289344 samples) ]

after:
% sudo ./perf lock record ./perf bench sched messaging
# Running sched/messaging benchmark...
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

      Total time: 1.943 [sec]                       <--- about x1.5 faster!
[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 98.161 MB perf.data (~4288734 samples) 
]   <--- size of perf.data is also reduced


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 1/3] lockdep: Move lock events under lockdep recursion protection
  2010-03-03  6:55 ` [PATCH 1/3] lockdep: Move lock events under lockdep recursion protection Frederic Weisbecker
  2010-03-09  7:18   ` Hitoshi Mitake
@ 2010-03-09  8:34   ` Jens Axboe
  2010-03-09  8:35     ` Jens Axboe
  2010-03-09 23:45     ` Frederic Weisbecker
  1 sibling, 2 replies; 76+ messages in thread
From: Jens Axboe @ 2010-03-09  8:34 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Steven Rostedt, Paul Mackerras, Hitoshi Mitake, Li Zefan,
	Lai Jiangshan, Masami Hiramatsu

On Wed, Mar 03 2010, Frederic Weisbecker wrote:
> There are rcu locked read side areas in the path where we submit
> a trace event. And these rcu_read_(un)lock() trigger lock events,
> which create recursive events.
> 
> One pair in do_perf_sw_event:
> 
> __lock_acquire
>       |
>       |--96.11%-- lock_acquire
>       |          |
>       |          |--27.21%-- do_perf_sw_event
>       |          |          perf_tp_event
>       |          |          |
>       |          |          |--49.62%-- ftrace_profile_lock_release
>       |          |          |          lock_release
>       |          |          |          |
>       |          |          |          |--33.85%-- _raw_spin_unlock
> 
> Another pair in perf_output_begin/end:
> 
> __lock_acquire
>       |--23.40%-- perf_output_begin
>       |          |          __perf_event_overflow
>       |          |          perf_swevent_overflow
>       |          |          perf_swevent_add
>       |          |          perf_swevent_ctx_event
>       |          |          do_perf_sw_event
>       |          |          perf_tp_event
>       |          |          |
>       |          |          |--55.37%-- ftrace_profile_lock_acquire
>       |          |          |          lock_acquire
>       |          |          |          |
>       |          |          |          |--37.31%-- _raw_spin_lock
> 
> The problem is not that much the trace recursion itself, as we have a
> recursion protection already (though it's always wasteful to recurse).
> But the trace events are outside the lockdep recursion protection, then
> each lockdep event triggers a lock trace, which will trigger two
> other lockdep events. Here the recursive lock trace event won't
> be taken because of the trace recursion, so the recursion stops there
> but lockdep will still analyse these new events:
> 
> To sum up, for each lockdep events we have:
> 
> 	lock_*()
> 	     |
>              trace lock_acquire
>                   |
>                   ----- rcu_read_lock()
>                   |          |
>                   |          lock_acquire()
>                   |          |
>                   |          trace_lock_acquire() (stopped)
>                   |          |
> 		  |          lockdep analyze
>                   |
>                   ----- rcu_read_unlock()
>                              |
>                              lock_release
>                              |
>                              trace_lock_release() (stopped)
>                              |
>                              lockdep analyze
> 
> And you can repeat the above two times as we have two rcu read side
> sections when we submit an event.
> 
> This is fixed in this patch by moving the lock trace event under
> the lockdep recursion protection.

I went to try this on 2.6.34-rc1 to see how much it would improve things
here. With 2.6.34-rc1, a

$ time sudo perf lock record ls /dev/shm

essentially hangs the box, it ends up crashing hard (not just live
locked like before). With the patch in place, it does eventually finish:

real    0m21.301s
user    0m0.030s
sys     0m21.040s

The directory is empty.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 1/3] lockdep: Move lock events under lockdep recursion protection
  2010-03-09  8:34   ` Jens Axboe
@ 2010-03-09  8:35     ` Jens Axboe
  2010-03-10  0:05       ` Frederic Weisbecker
  2010-03-09 23:45     ` Frederic Weisbecker
  1 sibling, 1 reply; 76+ messages in thread
From: Jens Axboe @ 2010-03-09  8:35 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Steven Rostedt, Paul Mackerras, Hitoshi Mitake, Li Zefan,
	Lai Jiangshan, Masami Hiramatsu

On Tue, Mar 09 2010, Jens Axboe wrote:
> On Wed, Mar 03 2010, Frederic Weisbecker wrote:
> > There are rcu locked read side areas in the path where we submit
> > a trace event. And these rcu_read_(un)lock() trigger lock events,
> > which create recursive events.
> > 
> > One pair in do_perf_sw_event:
> > 
> > __lock_acquire
> >       |
> >       |--96.11%-- lock_acquire
> >       |          |
> >       |          |--27.21%-- do_perf_sw_event
> >       |          |          perf_tp_event
> >       |          |          |
> >       |          |          |--49.62%-- ftrace_profile_lock_release
> >       |          |          |          lock_release
> >       |          |          |          |
> >       |          |          |          |--33.85%-- _raw_spin_unlock
> > 
> > Another pair in perf_output_begin/end:
> > 
> > __lock_acquire
> >       |--23.40%-- perf_output_begin
> >       |          |          __perf_event_overflow
> >       |          |          perf_swevent_overflow
> >       |          |          perf_swevent_add
> >       |          |          perf_swevent_ctx_event
> >       |          |          do_perf_sw_event
> >       |          |          perf_tp_event
> >       |          |          |
> >       |          |          |--55.37%-- ftrace_profile_lock_acquire
> >       |          |          |          lock_acquire
> >       |          |          |          |
> >       |          |          |          |--37.31%-- _raw_spin_lock
> > 
> > The problem is not that much the trace recursion itself, as we have a
> > recursion protection already (though it's always wasteful to recurse).
> > But the trace events are outside the lockdep recursion protection, then
> > each lockdep event triggers a lock trace, which will trigger two
> > other lockdep events. Here the recursive lock trace event won't
> > be taken because of the trace recursion, so the recursion stops there
> > but lockdep will still analyse these new events:
> > 
> > To sum up, for each lockdep events we have:
> > 
> > 	lock_*()
> > 	     |
> >              trace lock_acquire
> >                   |
> >                   ----- rcu_read_lock()
> >                   |          |
> >                   |          lock_acquire()
> >                   |          |
> >                   |          trace_lock_acquire() (stopped)
> >                   |          |
> > 		  |          lockdep analyze
> >                   |
> >                   ----- rcu_read_unlock()
> >                              |
> >                              lock_release
> >                              |
> >                              trace_lock_release() (stopped)
> >                              |
> >                              lockdep analyze
> > 
> > And you can repeat the above two times as we have two rcu read side
> > sections when we submit an event.
> > 
> > This is fixed in this patch by moving the lock trace event under
> > the lockdep recursion protection.
> 
> I went to try this on 2.6.34-rc1 to see how much it would improve things
> here. With 2.6.34-rc1, a

Which, btw, throws a new lockdep warning fest:

[   42.247718] scsi7 : ioc0: LSISAS1068E B3, FwRev=011b0300h, Ports=1, MaxQ=483, IRQ=26
[   42.281125] BUG: key ffff880c7cb75250 not in .data!
[   42.288346] ------------[ cut here ]------------
[   42.294490] WARNING: at kernel/lockdep.c:2706 lockdep_init_map+0x545/0x5f0()
[   42.304013] Hardware name: QSSC-S4R
[   42.309674] Modules linked in: hid_apple usbhid ehci_hcd uhci_hcd usbcore nls_base mptsas(+) mptscsih mptbase scsi_transport_sas igb sg sr_mod cdrom
[   42.332072] Pid: 5197, comm: modprobe Not tainted 2.6.34-rc1 #176
[   42.340597] Call Trace:
[   42.345335]  [<ffffffff8107fb8d>] ? is_module_address+0x2d/0x60
[   42.353670]  [<ffffffff81074aa5>] ? lockdep_init_map+0x545/0x5f0
[   42.362154]  [<ffffffff81044648>] warn_slowpath_common+0x78/0xd0
[   42.371913]  [<ffffffff810446af>] warn_slowpath_null+0xf/0x20
[   42.380109]  [<ffffffff81074aa5>] lockdep_init_map+0x545/0x5f0
[   42.388402]  [<ffffffff81156636>] ? sysfs_new_dirent+0x76/0x120
[   42.396748]  [<ffffffff81155b6c>] sysfs_add_file_mode+0x6c/0xd0
[   42.405170]  [<ffffffff812dd260>] ? transport_add_class_device+0x0/0x50
[   42.414406]  [<ffffffff81155bdc>] sysfs_add_file+0xc/0x10
[   42.422183]  [<ffffffff81155ca1>] sysfs_create_file+0x21/0x40
[   42.430328]  [<ffffffff812d6574>] device_create_file+0x14/0x20
[   42.438584]  [<ffffffff812dcace>] attribute_container_add_attrs+0x8e/0xb0
[   42.447952]  [<ffffffff812dcb0d>] attribute_container_add_class_device+0x1d/0x30
[   42.457968]  [<ffffffff812dd27f>] transport_add_class_device+0x1f/0x50
[   42.467108]  [<ffffffff812dcc4f>] attribute_container_device_trigger+0x9f/0xe0
[   42.476978]  [<ffffffff812dd1e0>] transport_add_device+0x10/0x20
[   42.485415]  [<ffffffffa0049208>] sas_phy_add+0x28/0x40 [scsi_transport_sas]
[   42.495039]  [<ffffffffa0079e58>] mptsas_probe_one_phy+0x5f8/0x850 [mptsas]
[   42.504680]  [<ffffffff813fb469>] ? mutex_unlock+0x9/0x10
[   42.512429]  [<ffffffffa0079459>] ? mptsas_setup_wide_ports+0x269/0x380 [mptsas]
[   42.522481]  [<ffffffffa007a65d>] mptsas_probe_hba_phys+0x5ad/0xa10 [mptsas]
[   42.532154]  [<ffffffff8107401e>] ? lock_release_holdtime+0xfe/0x1f0
[   42.541071]  [<ffffffffa007af6d>] mptsas_scan_sas_topology+0x1d/0x340 [mptsas]
[   42.550920]  [<ffffffff812dd1c0>] ? transport_configure_device+0x10/0x20
[   42.560106]  [<ffffffff812ecd78>] ? scsi_sysfs_add_host+0x98/0xb0
[   42.568590]  [<ffffffffa007b693>] mptsas_probe+0x403/0x4b0 [mptsas]
[   42.577215]  [<ffffffff8123f2b2>] local_pci_probe+0x12/0x20
[   42.585257]  [<ffffffff8123f611>] pci_device_probe+0x111/0x120
[   42.593590]  [<ffffffff812d936b>] ? driver_sysfs_add+0x6b/0x90
[   42.601814]  [<ffffffff812d94a5>] driver_probe_device+0x95/0x1a0
[   42.610212]  [<ffffffff812d9643>] __driver_attach+0x93/0xa0
[   42.618261]  [<ffffffff812d95b0>] ? __driver_attach+0x0/0xa0
[   42.626320]  [<ffffffff812d8b5b>] bus_for_each_dev+0x6b/0xa0
[   42.634509]  [<ffffffff812d92fc>] driver_attach+0x1c/0x20
[   42.642204]  [<ffffffff812d8315>] bus_add_driver+0x1d5/0x320
[   42.650234]  [<ffffffffa0015000>] ? mptsas_init+0x0/0x114 [mptsas]
[   42.658889]  [<ffffffff812d994c>] driver_register+0x7c/0x170
[   42.667004]  [<ffffffffa0015000>] ? mptsas_init+0x0/0x114 [mptsas]
[   42.675673]  [<ffffffff8123f8ba>] __pci_register_driver+0x6a/0xf0
[   42.684215]  [<ffffffffa0015000>] ? mptsas_init+0x0/0x114 [mptsas]
[   42.692814]  [<ffffffffa00150f9>] mptsas_init+0xf9/0x114 [mptsas]
[   42.701336]  [<ffffffff810001d8>] do_one_initcall+0x38/0x190
[   42.709375]  [<ffffffff81083fe3>] sys_init_module+0xe3/0x260
[   42.717447]  [<ffffffff81002e2b>] system_call_fastpath+0x16/0x1b
[   42.726261] ---[ end trace 52cf32179b76155b ]---
[   42.731555] BUG: key ffff880c7cb75288 not in .data!
[   42.737130] BUG: key ffff880c7cb752c0 not in .data!
[   42.742817] BUG: key ffff880c7cb752f8 not in .data!
[   42.748476] BUG: key ffff880c7cb75330 not in .data!
[   42.754039] BUG: key ffff880c7cb75368 not in .data!
[   42.759634] BUG: key ffff880c7cb753a0 not in .data!
[   42.765192] BUG: key ffff880c7cb753d8 not in .data!
[   42.770833] BUG: key ffff880c7cb75410 not in .data!
[   42.776394] BUG: key ffff880c7cb75448 not in .data!
[   42.781967] BUG: key ffff880c7cb75480 not in .data!
[   42.787480] BUG: key ffff880c7cb754b8 not in .data!
[   42.793026] BUG: key ffff880c7cb754f0 not in .data!
[   42.798689] BUG: key ffff880c7cb75528 not in .data!
[   42.804235] BUG: key ffff880c7cb75560 not in .data!
[   42.809828] BUG: key ffff880c7cb75598 not in .data!
[   42.815401] BUG: key ffff880c7cb755d0 not in .data!
[   42.821078] BUG: key ffff880c7cb75250 not in .data!
[   42.826714] BUG: key ffff880c7cb75288 not in .data!
[   42.832401] BUG: key ffff880c7cb752c0 not in .data!
[   42.838064] BUG: key ffff880c7cb752f8 not in .data!
[   42.843619] BUG: key ffff880c7cb75330 not in .data!
[   42.849324] BUG: key ffff880c7cb75368 not in .data!
[   42.854900] BUG: key ffff880c7cb753a0 not in .data!
[   42.860519] BUG: key ffff880c7cb753d8 not in .data!
[   42.866135] BUG: key ffff880c7cb75410 not in .data!
[   42.871746] BUG: key ffff880c7cb75448 not in .data!
[   42.877257] BUG: key ffff880c7cb75480 not in .data!
[   42.882831] BUG: key ffff880c7cb754b8 not in .data!
[   42.888425] BUG: key ffff880c7cb754f0 not in .data!
[   42.893990] BUG: key ffff880c7cb75528 not in .data!
[   42.899668] BUG: key ffff880c7cb75560 not in .data!
[   42.905220] BUG: key ffff880c7cb75598 not in .data!
[   42.910881] BUG: key ffff880c7cb755d0 not in .data!
[   42.916753] BUG: key ffff880c7cb75250 not in .data!
[   42.922374] BUG: key ffff880c7cb75288 not in .data!
[   42.928045] BUG: key ffff880c7cb752c0 not in .data!
[   42.933596] BUG: key ffff880c7cb752f8 not in .data!
[   42.939245] BUG: key ffff880c7cb75330 not in .data!
[   42.944820] BUG: key ffff880c7cb75368 not in .data!
[   42.950467] BUG: key ffff880c7cb753a0 not in .data!
[   42.956052] BUG: key ffff880c7cb753d8 not in .data!
[   42.961694] BUG: key ffff880c7cb75410 not in .data!
[   42.967206] BUG: key ffff880c7cb75448 not in .data!
[   42.972741] BUG: key ffff880c7cb75480 not in .data!
[   42.978397] BUG: key ffff880c7cb754b8 not in .data!
[   42.983965] BUG: key ffff880c7cb754f0 not in .data!
[   42.989570] BUG: key ffff880c7cb75528 not in .data!
[   42.995116] BUG: key ffff880c7cb75560 not in .data!
[   43.000777] BUG: key ffff880c7cb75598 not in .data!
[   43.006418] BUG: key ffff880c7cb755d0 not in .data!
[   43.012110] BUG: key ffff880c7cb75250 not in .data!
[   43.017697] BUG: key ffff880c7cb75288 not in .data!
[   43.023278] BUG: key ffff880c7cb752c0 not in .data!
[   43.028927] BUG: key ffff880c7cb752f8 not in .data!
[   43.034543] BUG: key ffff880c7cb75330 not in .data!
[   43.040158] BUG: key ffff880c7cb75368 not in .data!
[   43.045767] BUG: key ffff880c7cb753a0 not in .data!
[   43.051370] BUG: key ffff880c7cb753d8 not in .data!
[   43.056882] BUG: key ffff880c7cb75410 not in .data!
[   43.062395] BUG: key ffff880c7cb75448 not in .data!
[   43.068074] BUG: key ffff880c7cb75480 not in .data!
[   43.073629] BUG: key ffff880c7cb754b8 not in .data!
[   43.079261] BUG: key ffff880c7cb754f0 not in .data!
[   43.084775] BUG: key ffff880c7cb75528 not in .data!
[   43.090401] BUG: key ffff880c7cb75560 not in .data!
[   43.095914] BUG: key ffff880c7cb75598 not in .data!
[   43.101507] BUG: key ffff880c7cb755d0 not in .data!
[   43.107046] BUG: key ffff880c7cb75250 not in .data!
[   43.112684] BUG: key ffff880c7cb75288 not in .data!
[   43.118270] BUG: key ffff880c7cb752c0 not in .data!
[   43.123832] BUG: key ffff880c7cb752f8 not in .data!
[   43.129442] BUG: key ffff880c7cb75330 not in .data!
[   43.134967] BUG: key ffff880c7cb75368 not in .data!
[   43.140574] BUG: key ffff880c7cb753a0 not in .data!
[   43.146190] BUG: key ffff880c7cb753d8 not in .data!
[   43.151806] BUG: key ffff880c7cb75410 not in .data!
[   43.157403] BUG: key ffff880c7cb75448 not in .data!
[   43.162976] BUG: key ffff880c7cb75480 not in .data!
[   43.168570] BUG: key ffff880c7cb754b8 not in .data!
[   43.174200] BUG: key ffff880c7cb754f0 not in .data!
[   43.179791] BUG: key ffff880c7cb75528 not in .data!
[   43.185350] BUG: key ffff880c7cb75560 not in .data!
[   43.190907] BUG: key ffff880c7cb75598 not in .data!
[   43.196421] BUG: key ffff880c7cb755d0 not in .data!
[   43.202012] BUG: key ffff880c7cb75250 not in .data!
[   43.207579] BUG: key ffff880c7cb75288 not in .data!
[   43.213112] BUG: key ffff880c7cb752c0 not in .data!
[   43.218880] BUG: key ffff880c7cb752f8 not in .data!
[   43.224416] BUG: key ffff880c7cb75330 not in .data!
[   43.230075] BUG: key ffff880c7cb75368 not in .data!
[   43.235699] BUG: key ffff880c7cb753a0 not in .data!
[   43.241477] BUG: key ffff880c7cb753d8 not in .data!
[   43.247058] BUG: key ffff880c7cb75410 not in .data!
[   43.252559] BUG: key ffff880c7cb75448 not in .data!
[   43.258273] BUG: key ffff880c7cb75480 not in .data!
[   43.263775] BUG: key ffff880c7cb754b8 not in .data!
[   43.269451] BUG: key ffff880c7cb754f0 not in .data!
[   43.275018] BUG: key ffff880c7cb75528 not in .data!
[   43.280674] BUG: key ffff880c7cb75560 not in .data!
[   43.286185] BUG: key ffff880c7cb75598 not in .data!
[   43.291700] BUG: key ffff880c7cb755d0 not in .data!
[   43.297239] BUG: key ffff880c7cb75250 not in .data!
[   43.302749] BUG: key ffff880c7cb75288 not in .data!
[   43.308351] BUG: key ffff880c7cb752c0 not in .data!
[   43.313968] BUG: key ffff880c7cb752f8 not in .data!
[   43.319584] BUG: key ffff880c7cb75330 not in .data!
[   43.325202] BUG: key ffff880c7cb75368 not in .data!
[   43.330813] BUG: key ffff880c7cb753a0 not in .data!
[   43.336326] BUG: key ffff880c7cb753d8 not in .data!
[   43.341900] BUG: key ffff880c7cb75410 not in .data!
[   43.347480] BUG: key ffff880c7cb75448 not in .data!
[   43.353050] BUG: key ffff880c7cb75480 not in .data!
[   43.358664] BUG: key ffff880c7cb754b8 not in .data!
[   43.364187] BUG: key ffff880c7cb754f0 not in .data!
[   43.369865] BUG: key ffff880c7cb75528 not in .data!
[   43.375417] BUG: key ffff880c7cb75560 not in .data!
[   43.381026] BUG: key ffff880c7cb75598 not in .data!
[   43.386632] BUG: key ffff880c7cb755d0 not in .data!
[   43.392189] BUG: key ffff880c7cb75250 not in .data!
[   43.397785] BUG: key ffff880c7cb75288 not in .data!
[   43.403343] BUG: key ffff880c7cb752c0 not in .data!
[   43.408862] BUG: key ffff880c7cb752f8 not in .data!
[   43.414405] BUG: key ffff880c7cb75330 not in .data!
[   43.420069] BUG: key ffff880c7cb75368 not in .data!
[   43.425720] BUG: key ffff880c7cb753a0 not in .data!
[   43.431331] BUG: key ffff880c7cb753d8 not in .data!
[   43.436970] BUG: key ffff880c7cb75410 not in .data!
[   43.442554] BUG: key ffff880c7cb75448 not in .data!
[   43.448209] BUG: key ffff880c7cb75480 not in .data!
[   43.453786] BUG: key ffff880c7cb754b8 not in .data!
[   43.459429] BUG: key ffff880c7cb754f0 not in .data!
[   43.465011] BUG: key ffff880c7cb75528 not in .data!
[   43.470654] BUG: key ffff880c7cb75560 not in .data!
[   43.476168] BUG: key ffff880c7cb75598 not in .data!
[   43.483110] BUG: key ffff880c7cb755d0 not in .data!


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 1/3] lockdep: Move lock events under lockdep recursion protection
  2010-03-09  8:34   ` Jens Axboe
  2010-03-09  8:35     ` Jens Axboe
@ 2010-03-09 23:45     ` Frederic Weisbecker
  2010-03-10 15:55       ` Jens Axboe
  1 sibling, 1 reply; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-09 23:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Steven Rostedt, Paul Mackerras, Hitoshi Mitake, Li Zefan,
	Lai Jiangshan, Masami Hiramatsu

On Tue, Mar 09, 2010 at 09:34:10AM +0100, Jens Axboe wrote:
> I went to try this on 2.6.34-rc1 to see how much it would improve things
> here. With 2.6.34-rc1, a
> 
> $ time sudo perf lock record ls /dev/shm
> 
> essentially hangs the box, it ends up crashing hard (not just live
> locked like before). With the patch in place, it does eventually finish:
> 
> real    0m21.301s
> user    0m0.030s
> sys     0m21.040s
> 
> The directory is empty.


Hehe :-)

That said you are probably missing a part of the puzzle.
This patch avoids the scary recursions we had. But there
is another separate patch that fixes the buffers multiplexing
dependency we had. Buffering is now done per cpu.

You need this patch:

	commit b67577dfb45580c498bfdb1bc76c00c3b2ad6310
	Author: Frederic Weisbecker <fweisbec@gmail.com>
	Date:   Wed Feb 3 09:09:33 2010 +0100

	perf lock: Drop the buffers multiplexing dependency

But this is not in -rc1, you need to checkout tip/perf/core

Well, the best would be you actually pull this on top of -rc1:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	perf/core

so that you get the buffers multiplexing dependency that is in -tip,
and you also get the recursions fixes, not yet in -tip (but should make
it soon).

I bet you still won't have magic results after that, but still it should
be better.

We still have further projects in mind to improve the scalability,
like the events injection thing (that avoids the string copy)

Again, you are my eyes on this, I'm still blind with my poor dual
laptop or my atom testbox.

Thanks for your testing.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 1/3] lockdep: Move lock events under lockdep recursion protection
  2010-03-09  8:35     ` Jens Axboe
@ 2010-03-10  0:05       ` Frederic Weisbecker
  2010-03-10 15:45         ` Peter Zijlstra
  2010-03-10 15:55         ` Jens Axboe
  0 siblings, 2 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-10  0:05 UTC (permalink / raw)
  To: Jens Axboe, Peter Zijlstra, Greg KH
  Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Steven Rostedt,
	Paul Mackerras, Hitoshi Mitake, Li Zefan, Lai Jiangshan,
	Masami Hiramatsu

On Tue, Mar 09, 2010 at 09:35:37AM +0100, Jens Axboe wrote:
> Which, btw, throws a new lockdep warning fest:
> 
> [   42.247718] scsi7 : ioc0: LSISAS1068E B3, FwRev=011b0300h, Ports=1, MaxQ=483, IRQ=26
> [   42.281125] BUG: key ffff880c7cb75250 not in .data!
> [   42.288346] ------------[ cut here ]------------
> [   42.294490] WARNING: at kernel/lockdep.c:2706 lockdep_init_map+0x545/0x5f0()
> [   42.304013] Hardware name: QSSC-S4R
> [   42.309674] Modules linked in: hid_apple usbhid ehci_hcd uhci_hcd usbcore nls_base mptsas(+) mptscsih mptbase scsi_transport_sas igb sg sr_mod cdrom
> [   42.332072] Pid: 5197, comm: modprobe Not tainted 2.6.34-rc1 #176
> [   42.340597] Call Trace:
> [   42.345335]  [<ffffffff8107fb8d>] ? is_module_address+0x2d/0x60
> [   42.353670]  [<ffffffff81074aa5>] ? lockdep_init_map+0x545/0x5f0
> [   42.362154]  [<ffffffff81044648>] warn_slowpath_common+0x78/0xd0
> [   42.371913]  [<ffffffff810446af>] warn_slowpath_null+0xf/0x20
> [   42.380109]  [<ffffffff81074aa5>] lockdep_init_map+0x545/0x5f0



This doesn't look related to my patch, it also happen in -rc1 right?



> [   42.388402]  [<ffffffff81156636>] ? sysfs_new_dirent+0x76/0x120


This calls:

#define sysfs_dirent_init_lockdep(sd)				\
do {								\
	struct attribute *attr = sd->s_attr.attr;		\
	struct lock_class_key *key = attr->key;			\
	if (!key)						\
		key = &attr->skey;				\
								\
	lockdep_init_map(&sd->dep_map, "s_active", key, 0);	\
} while(0)


Looks like the struct lock_class_key is dynamically allocated
and lockdep doesn't like that.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 1/3] lockdep: Move lock events under lockdep recursion protection
  2010-03-09  7:18   ` Hitoshi Mitake
@ 2010-03-10  0:10     ` Frederic Weisbecker
  0 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-10  0:10 UTC (permalink / raw)
  To: Hitoshi Mitake
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Steven Rostedt, Paul Mackerras, Li Zefan, Lai Jiangshan,
	Masami Hiramatsu, Jens Axboe

On Tue, Mar 09, 2010 at 04:18:58PM +0900, Hitoshi Mitake wrote:
> Thanks a lot, Frederic!
>
> I tested perf lock with your patch, result is like this,
>
> Typical scores:
>
> before:
> % sudo ./perf lock record ./perf bench sched messaging
> # Running sched/messaging benchmark...
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
>      Total time: 3.265 [sec]
> [ perf record: Woken up 0 times to write data ]
> [ perf record: Captured and wrote 143.952 MB perf.data (~6289344 samples) ]
>
> after:
> % sudo ./perf lock record ./perf bench sched messaging
> # Running sched/messaging benchmark...
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
>      Total time: 1.943 [sec]                       <--- about x1.5 faster!
> [ perf record: Woken up 0 times to write data ]
> [ perf record: Captured and wrote 98.161 MB perf.data (~4288734 samples)  
> ]   <--- size of perf.data is also reduced
>


Oh great! Yeah this recursion thing was really bad.

Thanks.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 1/3] lockdep: Move lock events under lockdep recursion protection
  2010-03-10  0:05       ` Frederic Weisbecker
@ 2010-03-10 15:45         ` Peter Zijlstra
  2010-03-10 15:56           ` Jens Axboe
  2010-03-10 15:55         ` Jens Axboe
  1 sibling, 1 reply; 76+ messages in thread
From: Peter Zijlstra @ 2010-03-10 15:45 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Jens Axboe, Greg KH, Ingo Molnar, LKML, Arnaldo Carvalho de Melo,
	Steven Rostedt, Paul Mackerras, Hitoshi Mitake, Li Zefan,
	Lai Jiangshan, Masami Hiramatsu

On Wed, 2010-03-10 at 01:05 +0100, Frederic Weisbecker wrote:
> On Tue, Mar 09, 2010 at 09:35:37AM +0100, Jens Axboe wrote:
> > Which, btw, throws a new lockdep warning fest:
> > 
> > [   42.247718] scsi7 : ioc0: LSISAS1068E B3, FwRev=011b0300h,
> Ports=1, MaxQ=483, IRQ=26
> > [   42.281125] BUG: key ffff880c7cb75250 not in .data!
> > [   42.288346] ------------[ cut here ]------------
> > [   42.294490] WARNING: at kernel/lockdep.c:2706 lockdep_init_map
> +0x545/0x5f0()
> > [   42.304013] Hardware name: QSSC-S4R
> > [   42.309674] Modules linked in: hid_apple usbhid ehci_hcd uhci_hcd
> usbcore nls_base mptsas(+) mptscsih mptbase scsi_transport_sas igb sg
> sr_mod cdrom
> > [   42.332072] Pid: 5197, comm: modprobe Not tainted 2.6.34-rc1 #176
> > [   42.340597] Call Trace:
> > [   42.345335]  [<ffffffff8107fb8d>] ? is_module_address+0x2d/0x60
> > [   42.353670]  [<ffffffff81074aa5>] ? lockdep_init_map+0x545/0x5f0
> > [   42.362154]  [<ffffffff81044648>] warn_slowpath_common+0x78/0xd0
> > [   42.371913]  [<ffffffff810446af>] warn_slowpath_null+0xf/0x20
> > [   42.380109]  [<ffffffff81074aa5>] lockdep_init_map+0x545/0x5f0
> 
> 
> 
> This doesn't look related to my patch, it also happen in -rc1 right?


The per-cpu changes broke lockdep, Tejun already posted patches for
this:

http://lkml.org/lkml/2010/3/10/76
http://lkml.org/lkml/2010/3/10/79



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 1/3] lockdep: Move lock events under lockdep recursion protection
  2010-03-09 23:45     ` Frederic Weisbecker
@ 2010-03-10 15:55       ` Jens Axboe
  0 siblings, 0 replies; 76+ messages in thread
From: Jens Axboe @ 2010-03-10 15:55 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Steven Rostedt, Paul Mackerras, Hitoshi Mitake, Li Zefan,
	Lai Jiangshan, Masami Hiramatsu

On Wed, Mar 10 2010, Frederic Weisbecker wrote:
> On Tue, Mar 09, 2010 at 09:34:10AM +0100, Jens Axboe wrote:
> > I went to try this on 2.6.34-rc1 to see how much it would improve things
> > here. With 2.6.34-rc1, a
> > 
> > $ time sudo perf lock record ls /dev/shm
> > 
> > essentially hangs the box, it ends up crashing hard (not just live
> > locked like before). With the patch in place, it does eventually finish:
> > 
> > real    0m21.301s
> > user    0m0.030s
> > sys     0m21.040s
> > 
> > The directory is empty.
> 
> 
> Hehe :-)
> 
> That said you are probably missing a part of the puzzle.
> This patch avoids the scary recursions we had. But there
> is another separate patch that fixes the buffers multiplexing
> dependency we had. Buffering is now done per cpu.
> 
> You need this patch:
> 
> 	commit b67577dfb45580c498bfdb1bc76c00c3b2ad6310
> 	Author: Frederic Weisbecker <fweisbec@gmail.com>
> 	Date:   Wed Feb 3 09:09:33 2010 +0100
> 
> 	perf lock: Drop the buffers multiplexing dependency
> 
> But this is not in -rc1, you need to checkout tip/perf/core
> 
> Well, the best would be you actually pull this on top of -rc1:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
> 	perf/core
> 
> so that you get the buffers multiplexing dependency that is in -tip,
> and you also get the recursions fixes, not yet in -tip (but should make
> it soon).
> 
> I bet you still won't have magic results after that, but still it should
> be better.

OK, I will give it a spin and see what happens. I had seen the
de-multiplexing patch, but just assumed it was in -rc1 already.

> We still have further projects in mind to improve the scalability,
> like the events injection thing (that avoids the string copy)
> 
> Again, you are my eyes on this, I'm still blind with my poor dual
> laptop or my atom testbox.
> 
> Thanks for your testing.

Not a problem, thank you for trying to get some nicer lock profiling in
the kernel :-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 1/3] lockdep: Move lock events under lockdep recursion protection
  2010-03-10  0:05       ` Frederic Weisbecker
  2010-03-10 15:45         ` Peter Zijlstra
@ 2010-03-10 15:55         ` Jens Axboe
  1 sibling, 0 replies; 76+ messages in thread
From: Jens Axboe @ 2010-03-10 15:55 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, Greg KH, Ingo Molnar, LKML,
	Arnaldo Carvalho de Melo, Steven Rostedt, Paul Mackerras,
	Hitoshi Mitake, Li Zefan, Lai Jiangshan, Masami Hiramatsu

On Wed, Mar 10 2010, Frederic Weisbecker wrote:
> On Tue, Mar 09, 2010 at 09:35:37AM +0100, Jens Axboe wrote:
> > Which, btw, throws a new lockdep warning fest:
> > 
> > [   42.247718] scsi7 : ioc0: LSISAS1068E B3, FwRev=011b0300h, Ports=1, MaxQ=483, IRQ=26
> > [   42.281125] BUG: key ffff880c7cb75250 not in .data!
> > [   42.288346] ------------[ cut here ]------------
> > [   42.294490] WARNING: at kernel/lockdep.c:2706 lockdep_init_map+0x545/0x5f0()
> > [   42.304013] Hardware name: QSSC-S4R
> > [   42.309674] Modules linked in: hid_apple usbhid ehci_hcd uhci_hcd usbcore nls_base mptsas(+) mptscsih mptbase scsi_transport_sas igb sg sr_mod cdrom
> > [   42.332072] Pid: 5197, comm: modprobe Not tainted 2.6.34-rc1 #176
> > [   42.340597] Call Trace:
> > [   42.345335]  [<ffffffff8107fb8d>] ? is_module_address+0x2d/0x60
> > [   42.353670]  [<ffffffff81074aa5>] ? lockdep_init_map+0x545/0x5f0
> > [   42.362154]  [<ffffffff81044648>] warn_slowpath_common+0x78/0xd0
> > [   42.371913]  [<ffffffff810446af>] warn_slowpath_null+0xf/0x20
> > [   42.380109]  [<ffffffff81074aa5>] lockdep_init_map+0x545/0x5f0
> 
> 
> 
> This doesn't look related to my patch, it also happen in -rc1 right?

It's not related, it does happen in -rc1 plain as well.

> > [   42.388402]  [<ffffffff81156636>] ? sysfs_new_dirent+0x76/0x120
> 
> 
> This calls:
> 
> #define sysfs_dirent_init_lockdep(sd)				\
> do {								\
> 	struct attribute *attr = sd->s_attr.attr;		\
> 	struct lock_class_key *key = attr->key;			\
> 	if (!key)						\
> 		key = &attr->skey;				\
> 								\
> 	lockdep_init_map(&sd->dep_map, "s_active", key, 0);	\
> } while(0)
> 
> 
> Looks like the struct lock_class_key is dynamically allocated
> and lockdep doesn't like that.

I saw someone else report the same warning, but from a different path
than mine. I'll try and bisect it.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 1/3] lockdep: Move lock events under lockdep recursion protection
  2010-03-10 15:45         ` Peter Zijlstra
@ 2010-03-10 15:56           ` Jens Axboe
  0 siblings, 0 replies; 76+ messages in thread
From: Jens Axboe @ 2010-03-10 15:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, Greg KH, Ingo Molnar, LKML,
	Arnaldo Carvalho de Melo, Steven Rostedt, Paul Mackerras,
	Hitoshi Mitake, Li Zefan, Lai Jiangshan, Masami Hiramatsu

On Wed, Mar 10 2010, Peter Zijlstra wrote:
> On Wed, 2010-03-10 at 01:05 +0100, Frederic Weisbecker wrote:
> > On Tue, Mar 09, 2010 at 09:35:37AM +0100, Jens Axboe wrote:
> > > Which, btw, throws a new lockdep warning fest:
> > > 
> > > [   42.247718] scsi7 : ioc0: LSISAS1068E B3, FwRev=011b0300h,
> > Ports=1, MaxQ=483, IRQ=26
> > > [   42.281125] BUG: key ffff880c7cb75250 not in .data!
> > > [   42.288346] ------------[ cut here ]------------
> > > [   42.294490] WARNING: at kernel/lockdep.c:2706 lockdep_init_map
> > +0x545/0x5f0()
> > > [   42.304013] Hardware name: QSSC-S4R
> > > [   42.309674] Modules linked in: hid_apple usbhid ehci_hcd uhci_hcd
> > usbcore nls_base mptsas(+) mptscsih mptbase scsi_transport_sas igb sg
> > sr_mod cdrom
> > > [   42.332072] Pid: 5197, comm: modprobe Not tainted 2.6.34-rc1 #176
> > > [   42.340597] Call Trace:
> > > [   42.345335]  [<ffffffff8107fb8d>] ? is_module_address+0x2d/0x60
> > > [   42.353670]  [<ffffffff81074aa5>] ? lockdep_init_map+0x545/0x5f0
> > > [   42.362154]  [<ffffffff81044648>] warn_slowpath_common+0x78/0xd0
> > > [   42.371913]  [<ffffffff810446af>] warn_slowpath_null+0xf/0x20
> > > [   42.380109]  [<ffffffff81074aa5>] lockdep_init_map+0x545/0x5f0
> > 
> > 
> > 
> > This doesn't look related to my patch, it also happen in -rc1 right?
> 
> 
> The per-cpu changes broke lockdep, Tejun already posted patches for
> this:
> 
> http://lkml.org/lkml/2010/3/10/76
> http://lkml.org/lkml/2010/3/10/79

Ah good, saves me the trouble of locating it, thanks!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [GIT PULL] perf updates
  2019-09-26 20:06 [GIT PULL] perf updates Ingo Molnar
@ 2019-09-26 23:00 ` pr-tracker-bot
  0 siblings, 0 replies; 76+ messages in thread
From: pr-tracker-bot @ 2019-09-26 23:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Andrew Morton, Jiri Olsa, Thomas Gleixner

The pull request you sent on Thu, 26 Sep 2019 22:06:22 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-urgent-for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/a7b7b772bb4abaa4b2d9df67b50bf7208203da82

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [GIT PULL] perf updates
@ 2019-09-26 20:06 Ingo Molnar
  2019-09-26 23:00 ` pr-tracker-bot
  0 siblings, 1 reply; 76+ messages in thread
From: Ingo Molnar @ 2019-09-26 20:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Andrew Morton, Jiri Olsa, Thomas Gleixner

Linus,

Please pull the latest perf-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-urgent-for-linus

   # HEAD: 26acf400d2dcc72c7e713e1f55db47ad92010cc2 perf unwind: Fix libunwind build failure on i386 systems

The only kernel change is comment typo fixes, the rest is mostly tooling 
fixes, but also new vendor event additions and updates, a bigger 
libperf/libtraceevent library and a header files reorganization that came 
in a bit late.

 Thanks,

	Ingo

------------------>
Andi Kleen (2):
      perf stat: Fix free memory access / memory leaks in metrics
      perf evlist: Fix access of freed id arrays

Anju T Sudhakar (3):
      perf kvm: Move kvm-stat header file from conditional inclusion to common include section
      perf kvm: Add arch neutral function to choose event for perf kvm record
      perf kvm stat: Set 'trace_cycles' as default event for 'perf kvm record' in powerpc

Arnaldo Carvalho de Melo (36):
      perf jvmti: Link against tools/lib/string.o to have weak strlcpy()
      perf tools: Remove needless builtin.h include directives
      perf debug: No need to include ui/util.h
      perf tools: Remove debug.h from places where it is not needed
      perf tools: Remove util.h from where it is not needed
      perf probe: Add missing build-id.h header.
      perf symbols: Add missing dso.h header
      perf env: Remove needless cpumap.h header
      perf event: Move perf_event__synthesize* to event.h
      perf stat: Move perf_stat_synthesize_config() to event.h
      perf callchain: Remove needless event.h include
      perf python: Remove debug.h
      perf hist: Add missing 'struct branch_stack' forward declaration
      perf annotate: Add missing machine.h include directive
      perf sched: Add missing event.h include directive
      perf auxtrace: Add missing 'struct perf_sample' forward declaration
      perf tools: Move event synthesizing routines to separate header
      perf memswap: Adopt 'struct u64_swap' from evsel.h
      perf tools: Move event synthesizing routines to separate .c file
      tools headers uapi: Sync prctl.h with the kernel sources
      tools uapi asm-generic: Sync unistd.h with the kernel sources
      tools arch x86 uapi: Synch asm/unistd.h with the kernel sources
      tools arch x86: Sync asm/cpufeatures.h with the kernel sources
      perf record: Move restricted maps check to after a possible fallback to not collect kernel samples
      perf evlist: Adopt backwards ring buffer state enum
      libperf: Add missing 'struct xyarray' forward declaration
      perf tools: No need to include internal/lib.h from util/util.h
      libperf: Use sys/types.h to get ssize_t, not unistd.h
      perf copyfile: Move copyfile routines to separate files
      perf evsel: Remove need for symbol_conf in evsel_fprintf.c
      perf evsel: Introduce evsel_fprintf.h
      perf evlist: Remove unused perf_evlist__fprintf() method
      perf evsel: Move config terms to a separate header
      perf tools: Replace needless mmap.h with what is needed, event.h
      perf parser: Remove needless include directives
      perf unwind: Fix libunwind build failure on i386 systems

Colin Ian King (1):
      perf test: Fix spelling mistake "allos" -> "allocate"

James Clark (1):
      perf tools: Add PMU event JSON files for ARM Cortex-A76 and, Neoverse N1.

Jiri Olsa (43):
      perf python: Add missing python/perf.so dependency for libperf
      perf tests: Add libperf automated test for 'make -C tools/perf build-test'
      libperf: Add missing event.h file to install rule
      libperf: Adopt perf_cpu_map__max() function
      perf tests: Fix static build test
      perf tools: Fix segfault in cpu_cache_level__read()
      tools: Add missing stdio.h include to asm/bug.h header
      perf tools: Rename 'struct perf_mmap' to 'struct mmap'
      perf tools: Rename perf_evlist__mmap() to evlist__mmap()
      perf tools: Rename perf_evlist__munmap() to evlist__munmap()
      perf tools: Rename perf_evlist__alloc_mmap() to evlist__alloc_mmap()
      perf tools: Rename perf_evlist__exit() to evlist__exit()
      perf tools: Rename perf_evlist__purge() to evlist__purge()
      libperf: Link libapi.a in libperf.so
      libperf: Add perf_mmap struct
      libperf: Add 'mask' to struct perf_mmap
      libperf: Add 'fd' to struct perf_mmap
      libperf: Add 'cpu' to struct perf_mmap
      libperf: Add 'refcnt' to struct perf_mmap
      libperf: Add prev/start/end to struct perf_mmap
      libperf: Add 'overwrite' to 'struct perf_mmap'
      libperf: Add 'event_copy' to 'struct perf_mmap'
      libperf: Add 'flush' to 'struct perf_mmap'
      libperf: Move 'system_wide' from 'struct evsel' to 'struct perf_evsel'
      libperf: Move 'nr_mmaps' from 'struct evlist' to 'struct perf_evlist'
      libperf: Move 'mmap_len' from 'struct evlist' to 'struct perf_evlist'
      libperf: Move 'pollfd' from 'struct evlist' to 'struct perf_evlist'
      libperf: Move 'sample_id' from 'struct evsel' to 'struct perf_evsel'
      libperf: Move 'id' from 'struct evsel' to 'struct perf_evsel'
      libperf: Move 'ids' from 'struct evsel' to 'struct perf_evsel'
      libperf: Move 'heads' from 'struct evlist' to 'struct perf_evlist'
      libperf: Add perf_evsel__alloc_id/perf_evsel__free_id functions
      libperf: Add perf_evlist__first()/last() functions
      libperf: Add perf_evlist__read_format() function
      libperf: Add perf_evlist__id_add() function
      libperf: Add perf_evlist__id_add_fd() function
      libperf: Move 'page_size' global variable to libperf
      libperf: Add libperf dependency for tests targets
      libperf: Merge libperf_set_print() into libperf_init()
      libperf: Add libperf_init() call to the tests
      libperf: Add perf_evlist__alloc_pollfd() function
      libperf: Add perf_evlist__add_pollfd() function
      libperf: Add perf_evlist__poll() function

Kim Phillips (4):
      perf vendor events amd: Add L3 cache events for Family 17h
      perf vendor events amd: Remove redundant '['
      perf vendor events: Minor fixes to the README
      perf list: Allow plurals for metric, metricgroup

Mamatha Inamdar (2):
      perf session: Return error code for perf_session__new() function on failure
      perf vendor events: Remove P8 HW events which are not supported

Masami Hiramatsu (2):
      perf probe: Skip same probe address for a given line
      perf probe: Fix to clear tev->nargs in clear_probe_trace_event()

Roy Ben Shlomo (1):
      perf/core: Fix several typos in comments

Sakari Ailus (1):
      tools lib traceevent: Convert remaining %p[fF] users to %p[sS]

Srikar Dronamraju (2):
      perf stat: Reset previous counts on repeat with interval
      perf stat: Fix a segmentation fault when using repeat forever

Stephane Eranian (1):
      perf record: Fix priv level with branch sampling for paranoid=2

Steven Rostedt (VMware) (1):
      libtraceevent: Round up in tep_print_event() time precision

Thomas Richter (2):
      perf jvmti: Include JVMTI support for s390
      perf build: Add detection of java-11-openjdk-devel package

Tzvetomir Stoyanov (2):
      libtraceevent: Man pages for libtraceevent event print related API
      libtraceevent: Man pages for tep plugins APIs

Tzvetomir Stoyanov (VMware) (4):
      libtraceevent: Man pages fix, rename tep_ref_get() to tep_get_ref()
      libtraceevent: Man pages fix, changes in event printing APIs
      libtraceevent: Add tep_get_event() in event-parse.h
      libtraceevent: Move traceevent plugins in its own subdirectory


 kernel/events/core.c                               |    6 +-
 tools/arch/x86/include/asm/cpufeatures.h           |    3 +
 tools/arch/x86/include/uapi/asm/unistd.h           |    2 +-
 tools/include/asm/bug.h                            |    1 +
 tools/include/uapi/asm-generic/unistd.h            |    2 +-
 tools/include/uapi/linux/prctl.h                   |    7 +-
 tools/lib/traceevent/Build                         |   11 -
 .../Documentation/libtraceevent-event_print.txt    |  130 ++
 .../Documentation/libtraceevent-func_apis.txt      |   10 +-
 .../Documentation/libtraceevent-handle.txt         |    8 +-
 .../Documentation/libtraceevent-plugins.txt        |   99 +
 .../lib/traceevent/Documentation/libtraceevent.txt |   15 +-
 tools/lib/traceevent/Makefile                      |   94 +-
 tools/lib/traceevent/event-parse.c                 |   22 +-
 tools/lib/traceevent/event-parse.h                 |    2 +
 tools/lib/traceevent/plugins/Build                 |   10 +
 tools/lib/traceevent/plugins/Makefile              |  222 +++
 .../lib/traceevent/{ => plugins}/plugin_cfg80211.c |    0
 .../lib/traceevent/{ => plugins}/plugin_function.c |    0
 .../lib/traceevent/{ => plugins}/plugin_hrtimer.c  |    0
 tools/lib/traceevent/{ => plugins}/plugin_jbd2.c   |    0
 tools/lib/traceevent/{ => plugins}/plugin_kmem.c   |    0
 tools/lib/traceevent/{ => plugins}/plugin_kvm.c    |    0
 .../lib/traceevent/{ => plugins}/plugin_mac80211.c |    0
 .../traceevent/{ => plugins}/plugin_sched_switch.c |    0
 tools/lib/traceevent/{ => plugins}/plugin_scsi.c   |    0
 tools/lib/traceevent/{ => plugins}/plugin_xen.c    |    0
 tools/perf/Makefile.config                         |    2 +-
 tools/perf/Makefile.perf                           |    6 +-
 tools/perf/arch/arm/util/cs-etm.c                  |    7 +-
 tools/perf/arch/arm64/util/arm-spe.c               |    6 +-
 tools/perf/arch/arm64/util/dwarf-regs.c            |    1 -
 tools/perf/arch/arm64/util/header.c                |    4 +-
 tools/perf/arch/arm64/util/unwind-libunwind.c      |    2 +-
 tools/perf/arch/powerpc/util/dwarf-regs.c          |    1 -
 tools/perf/arch/powerpc/util/header.c              |    1 -
 tools/perf/arch/powerpc/util/kvm-stat.c            |   45 +
 tools/perf/arch/powerpc/util/skip-callchain-idx.c  |    1 +
 tools/perf/arch/powerpc/util/sym-handling.c        |    1 -
 tools/perf/arch/s390/Makefile                      |    1 +
 tools/perf/arch/s390/util/auxtrace.c               |    1 +
 tools/perf/arch/s390/util/machine.c                |    2 +-
 tools/perf/arch/x86/tests/intel-cqm.c              |    6 +-
 tools/perf/arch/x86/tests/perf-time-to-tsc.c       |   12 +-
 tools/perf/arch/x86/tests/rdpmc.c                  |    2 +-
 tools/perf/arch/x86/util/archinsn.c                |    1 +
 tools/perf/arch/x86/util/event.c                   |    2 +
 tools/perf/arch/x86/util/intel-bts.c               |    9 +-
 tools/perf/arch/x86/util/intel-pt.c                |   17 +-
 tools/perf/arch/x86/util/machine.c                 |    3 +-
 tools/perf/arch/x86/util/tsc.c                     |    2 +
 tools/perf/arch/x86/util/unwind-libunwind.c        |    2 +-
 tools/perf/bench/epoll-ctl.c                       |    2 +-
 tools/perf/bench/epoll-wait.c                      |    2 +-
 tools/perf/bench/futex-hash.c                      |    2 +-
 tools/perf/bench/futex-lock-pi.c                   |    2 +-
 tools/perf/bench/futex-requeue.c                   |    2 +-
 tools/perf/bench/futex-wake-parallel.c             |    3 +-
 tools/perf/bench/futex-wake.c                      |    2 +-
 tools/perf/bench/numa.c                            |    1 -
 tools/perf/bench/sched-messaging.c                 |    2 -
 tools/perf/bench/sched-pipe.c                      |    2 -
 tools/perf/builtin-annotate.c                      |    6 +-
 tools/perf/builtin-buildid-cache.c                 |    5 +-
 tools/perf/builtin-buildid-list.c                  |    5 +-
 tools/perf/builtin-c2c.c                           |    7 +-
 tools/perf/builtin-config.c                        |    1 -
 tools/perf/builtin-diff.c                          |    9 +-
 tools/perf/builtin-evlist.c                        |    8 +-
 tools/perf/builtin-inject.c                        |    6 +-
 tools/perf/builtin-kmem.c                          |    5 +-
 tools/perf/builtin-kvm.c                           |   37 +-
 tools/perf/builtin-list.c                          |    4 +-
 tools/perf/builtin-lock.c                          |    5 +-
 tools/perf/builtin-mem.c                           |    5 +-
 tools/perf/builtin-record.c                        |  117 +-
 tools/perf/builtin-report.c                        |    6 +-
 tools/perf/builtin-sched.c                         |   17 +-
 tools/perf/builtin-script.c                        |   20 +-
 tools/perf/builtin-stat.c                          |   41 +-
 tools/perf/builtin-timechart.c                     |    5 +-
 tools/perf/builtin-top.c                           |   28 +-
 tools/perf/builtin-trace.c                         |   22 +-
 tools/perf/jvmti/Build                             |    9 +
 tools/perf/lib/Makefile                            |   36 +-
 tools/perf/lib/core.c                              |   13 +-
 tools/perf/lib/cpumap.c                            |   12 +
 tools/perf/lib/evlist.c                            |  124 ++
 tools/perf/lib/evsel.c                             |   30 +
 tools/perf/lib/include/internal/evlist.h           |   33 +
 tools/perf/lib/include/internal/evsel.h            |   33 +
 tools/perf/lib/include/internal/lib.h              |    4 +-
 tools/perf/lib/include/internal/mmap.h             |   32 +
 tools/perf/lib/include/perf/core.h                 |    2 +-
 tools/perf/lib/include/perf/cpumap.h               |    1 +
 tools/perf/lib/include/perf/evlist.h               |    1 +
 tools/perf/lib/lib.c                               |    2 +
 tools/perf/lib/libperf.map                         |    4 +-
 tools/perf/lib/tests/test-cpumap.c                 |   10 +
 tools/perf/lib/tests/test-evlist.c                 |   10 +
 tools/perf/lib/tests/test-evsel.c                  |   10 +
 tools/perf/lib/tests/test-threadmap.c              |   10 +
 tools/perf/perf.c                                  |   13 +-
 tools/perf/pmu-events/README                       |   22 +-
 .../arch/arm64/arm/cortex-a76-n1/branch.json       |   14 +
 .../arch/arm64/arm/cortex-a76-n1/bus.json          |   24 +
 .../arch/arm64/arm/cortex-a76-n1/cache.json        |  207 +++
 .../arch/arm64/arm/cortex-a76-n1/exception.json    |   52 +
 .../arch/arm64/arm/cortex-a76-n1/instruction.json  |  108 ++
 .../arch/arm64/arm/cortex-a76-n1/memory.json       |   23 +
 .../arch/arm64/arm/cortex-a76-n1/other.json        |    7 +
 .../arch/arm64/arm/cortex-a76-n1/pipeline.json     |   14 +
 tools/perf/pmu-events/arch/arm64/mapfile.csv       |    2 +
 .../perf/pmu-events/arch/powerpc/power8/other.json |   24 -
 .../perf/pmu-events/arch/x86/amdfam17h/cache.json  |   42 +
 tools/perf/pmu-events/arch/x86/amdfam17h/core.json |    2 +-
 tools/perf/pmu-events/jevents.c                    |    1 +
 tools/perf/tests/backward-ring-buffer.c            |   11 +-
 tools/perf/tests/bitmap.c                          |    2 +-
 tools/perf/tests/bpf.c                             |    9 +-
 tools/perf/tests/clang.c                           |    2 -
 tools/perf/tests/code-reading.c                    |   13 +-
 tools/perf/tests/cpumap.c                          |    1 +
 tools/perf/tests/dso-data.c                        |    1 -
 tools/perf/tests/dwarf-unwind.c                    |    1 +
 tools/perf/tests/event-times.c                     |   15 +-
 tools/perf/tests/event_update.c                    |   10 +-
 tools/perf/tests/evsel-roundtrip-name.c            |    2 +-
 tools/perf/tests/hists_common.c                    |    2 +
 tools/perf/tests/hists_cumulate.c                  |    2 +-
 tools/perf/tests/hists_link.c                      |    5 +-
 tools/perf/tests/hists_output.c                    |    2 +-
 tools/perf/tests/keep-tracking.c                   |   14 +-
 tools/perf/tests/llvm.c                            |    1 -
 tools/perf/tests/make                              |    8 +-
 tools/perf/tests/mem2node.c                        |    2 +-
 tools/perf/tests/mmap-basic.c                      |    8 +-
 tools/perf/tests/mmap-thread-lookup.c              |    4 +-
 tools/perf/tests/openat-syscall-all-cpus.c         |    5 +-
 tools/perf/tests/openat-syscall-tp-fields.c        |   11 +-
 tools/perf/tests/parse-events.c                    |  117 +-
 tools/perf/tests/parse-no-sample-id-all.c          |    2 -
 tools/perf/tests/perf-hooks.c                      |    1 -
 tools/perf/tests/perf-record.c                     |   13 +-
 tools/perf/tests/pmu.c                             |    1 -
 tools/perf/tests/sample-parsing.c                  |    2 +-
 tools/perf/tests/sdt.c                             |    1 +
 tools/perf/tests/stat.c                            |    1 +
 tools/perf/tests/sw-clock.c                        |    5 +-
 tools/perf/tests/switch-tracking.c                 |   30 +-
 tools/perf/tests/task-exit.c                       |   11 +-
 tools/perf/tests/thread-map.c                      |    1 +
 tools/perf/tests/topology.c                        |    7 +-
 tools/perf/tests/vmlinux-kallsyms.c                |    2 +-
 tools/perf/ui/browser.c                            |    1 -
 tools/perf/ui/browsers/annotate.c                  |    1 -
 tools/perf/ui/browsers/header.c                    |    1 -
 tools/perf/ui/browsers/hists.c                     |    6 +-
 tools/perf/ui/browsers/map.c                       |    1 -
 tools/perf/ui/browsers/res_sample.c                |    2 +-
 tools/perf/ui/browsers/scripts.c                   |    3 +-
 tools/perf/ui/gtk/helpline.c                       |    1 -
 tools/perf/ui/gtk/hists.c                          |    1 +
 tools/perf/ui/gtk/progress.c                       |    1 -
 tools/perf/ui/gtk/setup.c                          |    3 +-
 tools/perf/ui/gtk/util.c                           |    1 -
 tools/perf/ui/helpline.c                           |    2 -
 tools/perf/ui/hist.c                               |    1 -
 tools/perf/ui/setup.c                              |    2 +-
 tools/perf/ui/stdio/hist.c                         |    1 +
 tools/perf/ui/tui/helpline.c                       |    1 -
 tools/perf/ui/tui/setup.c                          |    2 +-
 tools/perf/ui/tui/util.c                           |    1 -
 tools/perf/util/Build                              |    3 +
 tools/perf/util/annotate.c                         |    3 +-
 tools/perf/util/arm-spe.c                          |    1 -
 tools/perf/util/auxtrace.c                         |   12 +-
 tools/perf/util/auxtrace.h                         |   26 +-
 tools/perf/util/bpf-event.c                        |    1 +
 tools/perf/util/bpf-event.h                        |   15 +-
 tools/perf/util/bpf-loader.c                       |    2 +-
 tools/perf/util/branch.c                           |    2 -
 tools/perf/util/branch.h                           |    9 +-
 tools/perf/util/build-id.c                         |    3 +-
 tools/perf/util/callchain.c                        |    1 +
 tools/perf/util/callchain.h                        |    5 +-
 tools/perf/util/cloexec.c                          |    2 +-
 tools/perf/util/copyfile.c                         |  144 ++
 tools/perf/util/copyfile.h                         |   16 +
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c    |    1 -
 tools/perf/util/cs-etm.c                           |    4 +-
 tools/perf/util/data-convert-bt.c                  |    5 +-
 tools/perf/util/data.c                             |    3 +-
 tools/perf/util/debug.c                            |    1 -
 tools/perf/util/debug.h                            |    2 +-
 tools/perf/util/demangle-java.c                    |    1 -
 tools/perf/util/demangle-rust.c                    |    1 -
 tools/perf/util/dwarf-regs.c                       |    1 -
 tools/perf/util/env.h                              |    3 +-
 tools/perf/util/event.c                            | 1109 +-----------
 tools/perf/util/event.h                            |   77 +-
 tools/perf/util/evlist.c                           |  295 +--
 tools/perf/util/evlist.h                           |   81 +-
 tools/perf/util/evsel.c                            |  484 +----
 tools/perf/util/evsel.h                            |  126 +-
 tools/perf/util/evsel_config.h                     |   50 +
 tools/perf/util/evsel_fprintf.c                    |   16 +-
 tools/perf/util/evsel_fprintf.h                    |   50 +
 tools/perf/util/genelf.h                           |    3 +
 tools/perf/util/header.c                           |  424 +----
 tools/perf/util/header.h                           |   60 +-
 tools/perf/util/hist.h                             |    1 +
 tools/perf/util/intel-bts.c                        |    6 +-
 tools/perf/util/intel-pt.c                         |   11 +-
 tools/perf/util/jitdump.c                          |    4 +-
 tools/perf/util/kvm-stat.h                         |    4 +
 tools/perf/util/libunwind/arm64.c                  |    1 -
 tools/perf/util/libunwind/x86_32.c                 |    1 -
 tools/perf/util/llvm-utils.c                       |    1 +
 tools/perf/util/lzma.c                             |    2 +-
 tools/perf/util/machine.c                          |   16 +-
 tools/perf/util/machine.h                          |   15 -
 tools/perf/util/memswap.h                          |    7 +
 tools/perf/util/mmap.c                             |  185 +-
 tools/perf/util/mmap.h                             |   77 +-
 tools/perf/util/namespaces.c                       |   18 +
 tools/perf/util/namespaces.h                       |    2 +
 tools/perf/util/parse-events.c                     |    9 +-
 tools/perf/util/parse-events.y                     |    4 +-
 tools/perf/util/perf-hooks.c                       |    1 -
 tools/perf/util/perf_event_attr_fprintf.c          |  148 ++
 tools/perf/util/pmu.c                              |    1 -
 tools/perf/util/probe-event.c                      |    1 +
 tools/perf/util/probe-file.c                       |    1 +
 tools/perf/util/probe-finder.c                     |   19 +
 tools/perf/util/python-ext-sources                 |    1 +
 tools/perf/util/python.c                           |   28 +-
 tools/perf/util/record.c                           |    8 +-
 tools/perf/util/rwsem.c                            |    1 +
 tools/perf/util/s390-cpumsf.c                      |    1 -
 tools/perf/util/s390-sample-raw.c                  |    1 -
 .../util/scripting-engines/trace-event-python.c    |    2 -
 tools/perf/util/session.c                          |   92 +-
 tools/perf/util/session.h                          |    5 -
 tools/perf/util/sort.c                             |    2 +-
 tools/perf/util/srccode.c                          |    2 +-
 tools/perf/util/stat-shadow.c                      |    4 +-
 tools/perf/util/stat.c                             |   62 +-
 tools/perf/util/stat.h                             |    9 +-
 tools/perf/util/svghelper.c                        |    2 +-
 tools/perf/util/symbol-elf.c                       |    5 +-
 tools/perf/util/symbol-minimal.c                   |    3 +-
 tools/perf/util/symbol.c                           |    2 +-
 tools/perf/util/synthetic-events.c                 | 1884 ++++++++++++++++++++
 tools/perf/util/synthetic-events.h                 |  103 ++
 tools/perf/util/target.c                           |    2 -
 tools/perf/util/top.c                              |    3 +-
 tools/perf/util/trace-event-info.c                 |    2 +-
 tools/perf/util/trace-event-read.c                 |    1 -
 tools/perf/util/trace-event.c                      |    1 -
 tools/perf/util/tsc.h                              |   14 +-
 tools/perf/util/unwind-libdw.c                     |    1 -
 tools/perf/util/unwind-libunwind-local.c           |    1 -
 tools/perf/util/usage.c                            |    1 -
 tools/perf/util/util.c                             |  136 --
 tools/perf/util/util.h                             |    8 -
 tools/perf/util/vdso.c                             |    2 +-
 tools/perf/util/zlib.c                             |    4 +-
 268 files changed, 4801 insertions(+), 3617 deletions(-)
 create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-event_print.txt
 create mode 100644 tools/lib/traceevent/Documentation/libtraceevent-plugins.txt
 create mode 100644 tools/lib/traceevent/plugins/Build
 create mode 100644 tools/lib/traceevent/plugins/Makefile
 rename tools/lib/traceevent/{ => plugins}/plugin_cfg80211.c (100%)
 rename tools/lib/traceevent/{ => plugins}/plugin_function.c (100%)
 rename tools/lib/traceevent/{ => plugins}/plugin_hrtimer.c (100%)
 rename tools/lib/traceevent/{ => plugins}/plugin_jbd2.c (100%)
 rename tools/lib/traceevent/{ => plugins}/plugin_kmem.c (100%)
 rename tools/lib/traceevent/{ => plugins}/plugin_kvm.c (100%)
 rename tools/lib/traceevent/{ => plugins}/plugin_mac80211.c (100%)
 rename tools/lib/traceevent/{ => plugins}/plugin_sched_switch.c (100%)
 rename tools/lib/traceevent/{ => plugins}/plugin_scsi.c (100%)
 rename tools/lib/traceevent/{ => plugins}/plugin_xen.c (100%)
 create mode 100644 tools/perf/lib/include/internal/mmap.h
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/branch.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/bus.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/cache.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/exception.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/instruction.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/memory.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/other.json
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/pipeline.json
 create mode 100644 tools/perf/util/copyfile.c
 create mode 100644 tools/perf/util/copyfile.h
 create mode 100644 tools/perf/util/evsel_config.h
 create mode 100644 tools/perf/util/evsel_fprintf.h
 create mode 100644 tools/perf/util/perf_event_attr_fprintf.c
 create mode 100644 tools/perf/util/synthetic-events.c
 create mode 100644 tools/perf/util/synthetic-events.h

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [GIT PULL] perf updates
@ 2016-08-06  6:05 Ingo Molnar
  0 siblings, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2016-08-06  6:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Jiri Olsa,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton

Linus,

Please pull the latest perf-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-urgent-for-linus

   # HEAD: f282f7a0ecc3e0b8fd8532a6c3e9401534cb907c Merge tag 'perf-core-for-mingo-20160803' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Mostly tooling fixes and some late tooling updates, plus two perf related printk 
message fixes.

 Thanks,

	Ingo

------------------>
Arnaldo Carvalho de Melo (7):
      perf evsel: Introduce constructor for cycles event
      perf annotate: Use pipe + fork instead of popen
      perf target: str_error_r() always returns the buffer it receives
      perf annotate: Rename symbol__annotate() to symbol__disassemble()
      perf annotate: Introduce strerror for handling symbol__disassemble() errors
      perf annotate: Plug filename string leak
      perf tests bpf: Use SyS_epoll_wait alias

David Ahern (1):
      perf/core: Change log level for duration warning to KERN_INFO

Jan Stancek (1):
      perf tests: objdump output can contain multi byte chunks

Jiri Olsa (7):
      tools lib: Add bitmap_alloc function
      tools lib: Add bitmap_scnprintf function
      tools lib: Add bitmap_and function
      perf tests: Add test for bitmap_scnprintf function
      perf tools: Move config/Makefile into Makefile.config
      perf hists: Introduce output_resort_cb method
      perf record: Add --sample-cpu option

Juergen Gross (1):
      perf/x86: Modify error message in virtualized environment

Namhyung Kim (2):
      perf tools: Fix build failure on perl script context
      tools lib traceevent: Ignore generated library files


 arch/x86/events/core.c                          |  11 ++-
 kernel/events/core.c                            |   2 +-
 tools/include/linux/bitmap.h                    |  37 ++++++++
 tools/lib/bitmap.c                              |  44 +++++++++
 tools/lib/traceevent/.gitignore                 |   1 +
 tools/perf/Documentation/perf-record.txt        |   3 +
 tools/perf/{config/Makefile => Makefile.config} |   0
 tools/perf/Makefile.perf                        |   6 +-
 tools/perf/builtin-record.c                     |   1 +
 tools/perf/builtin-top.c                        |   6 +-
 tools/perf/perf.h                               |   1 +
 tools/perf/scripts/perl/Perf-Trace-Util/Build   |   4 +-
 tools/perf/tests/Build                          |   1 +
 tools/perf/tests/bitmap.c                       |  53 +++++++++++
 tools/perf/tests/bpf-script-example.c           |   4 +-
 tools/perf/tests/builtin-test.c                 |   4 +
 tools/perf/tests/code-reading.c                 | 100 ++++++++++++++------
 tools/perf/tests/tests.h                        |   1 +
 tools/perf/ui/browsers/annotate.c               |   9 +-
 tools/perf/ui/gtk/annotate.c                    |   8 +-
 tools/perf/util/annotate.c                      | 116 +++++++++++++++++-------
 tools/perf/util/annotate.h                      |  22 ++++-
 tools/perf/util/evlist.c                        |  22 +----
 tools/perf/util/evsel.c                         |  30 +++++-
 tools/perf/util/evsel.h                         |   2 +
 tools/perf/util/hist.c                          |  15 ++-
 tools/perf/util/hist.h                          |   4 +
 tools/perf/util/target.c                        |   6 +-
 28 files changed, 402 insertions(+), 111 deletions(-)
 rename tools/perf/{config/Makefile => Makefile.config} (100%)
 create mode 100644 tools/perf/tests/bitmap.c

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index c17f0de5fd39..ba1335d16529 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -262,10 +262,13 @@ static bool check_hw_exists(void)
 	return true;
 
 msr_fail:
-	pr_cont("Broken PMU hardware detected, using software events only.\n");
-	printk("%sFailed to access perfctr msr (MSR %x is %Lx)\n",
-		boot_cpu_has(X86_FEATURE_HYPERVISOR) ? KERN_INFO : KERN_ERR,
-		reg, val_new);
+	if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+		pr_cont("PMU not available due to virtualization, using software events only.\n");
+	} else {
+		pr_cont("Broken PMU hardware detected, using software events only.\n");
+		pr_err("Failed to access perfctr msr (MSR %x is %Lx)\n",
+		       reg, val_new);
+	}
 
 	return false;
 }
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 356a6c7cb52a..a19550d80ab1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -448,7 +448,7 @@ static u64 __report_allowed;
 
 static void perf_duration_warn(struct irq_work *w)
 {
-	printk_ratelimited(KERN_WARNING
+	printk_ratelimited(KERN_INFO
 		"perf: interrupt took too long (%lld > %lld), lowering "
 		"kernel.perf_event_max_sample_rate to %d\n",
 		__report_avg, __report_allowed,
diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h
index 28f5493da491..43c1c5021e4b 100644
--- a/tools/include/linux/bitmap.h
+++ b/tools/include/linux/bitmap.h
@@ -3,6 +3,7 @@
 
 #include <string.h>
 #include <linux/bitops.h>
+#include <stdlib.h>
 
 #define DECLARE_BITMAP(name,bits) \
 	unsigned long name[BITS_TO_LONGS(bits)]
@@ -10,6 +11,8 @@
 int __bitmap_weight(const unsigned long *bitmap, int bits);
 void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1,
 		 const unsigned long *bitmap2, int bits);
+int __bitmap_and(unsigned long *dst, const unsigned long *bitmap1,
+		 const unsigned long *bitmap2, unsigned int bits);
 
 #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
 
@@ -65,4 +68,38 @@ static inline int test_and_set_bit(int nr, unsigned long *addr)
 	return (old & mask) != 0;
 }
 
+/**
+ * bitmap_alloc - Allocate bitmap
+ * @nr: Bit to set
+ */
+static inline unsigned long *bitmap_alloc(int nbits)
+{
+	return calloc(1, BITS_TO_LONGS(nbits) * sizeof(unsigned long));
+}
+
+/*
+ * bitmap_scnprintf - print bitmap list into buffer
+ * @bitmap: bitmap
+ * @nbits: size of bitmap
+ * @buf: buffer to store output
+ * @size: size of @buf
+ */
+size_t bitmap_scnprintf(unsigned long *bitmap, int nbits,
+			char *buf, size_t size);
+
+/**
+ * bitmap_and - Do logical and on bitmaps
+ * @dst: resulting bitmap
+ * @src1: operand 1
+ * @src2: operand 2
+ * @nbits: size of bitmap
+ */
+static inline int bitmap_and(unsigned long *dst, const unsigned long *src1,
+			     const unsigned long *src2, unsigned int nbits)
+{
+	if (small_const_nbits(nbits))
+		return (*dst = *src1 & *src2 & BITMAP_LAST_WORD_MASK(nbits)) != 0;
+	return __bitmap_and(dst, src1, src2, nbits);
+}
+
 #endif /* _PERF_BITOPS_H */
diff --git a/tools/lib/bitmap.c b/tools/lib/bitmap.c
index 0a1adc1111fd..38748b0e342f 100644
--- a/tools/lib/bitmap.c
+++ b/tools/lib/bitmap.c
@@ -29,3 +29,47 @@ void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1,
 	for (k = 0; k < nr; k++)
 		dst[k] = bitmap1[k] | bitmap2[k];
 }
+
+size_t bitmap_scnprintf(unsigned long *bitmap, int nbits,
+			char *buf, size_t size)
+{
+	/* current bit is 'cur', most recently seen range is [rbot, rtop] */
+	int cur, rbot, rtop;
+	bool first = true;
+	size_t ret = 0;
+
+	rbot = cur = find_first_bit(bitmap, nbits);
+	while (cur < nbits) {
+		rtop = cur;
+		cur = find_next_bit(bitmap, nbits, cur + 1);
+		if (cur < nbits && cur <= rtop + 1)
+			continue;
+
+		if (!first)
+			ret += scnprintf(buf + ret, size - ret, ",");
+
+		first = false;
+
+		ret += scnprintf(buf + ret, size - ret, "%d", rbot);
+		if (rbot < rtop)
+			ret += scnprintf(buf + ret, size - ret, "-%d", rtop);
+
+		rbot = cur;
+	}
+	return ret;
+}
+
+int __bitmap_and(unsigned long *dst, const unsigned long *bitmap1,
+		 const unsigned long *bitmap2, unsigned int bits)
+{
+	unsigned int k;
+	unsigned int lim = bits/BITS_PER_LONG;
+	unsigned long result = 0;
+
+	for (k = 0; k < lim; k++)
+		result |= (dst[k] = bitmap1[k] & bitmap2[k]);
+	if (bits % BITS_PER_LONG)
+		result |= (dst[k] = bitmap1[k] & bitmap2[k] &
+			   BITMAP_LAST_WORD_MASK(bits));
+	return result != 0;
+}
diff --git a/tools/lib/traceevent/.gitignore b/tools/lib/traceevent/.gitignore
index 3c60335fe7be..9e9f25fb1922 100644
--- a/tools/lib/traceevent/.gitignore
+++ b/tools/lib/traceevent/.gitignore
@@ -1,2 +1,3 @@
 TRACEEVENT-CFLAGS
 libtraceevent-dynamic-list
+libtraceevent.so.*
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 69966abf65d1..379a2bed07c0 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -192,6 +192,9 @@ OPTIONS
 --period::
 	Record the sample period.
 
+--sample-cpu::
+	Record the sample cpu.
+
 -n::
 --no-samples::
 	Don't sample.
diff --git a/tools/perf/config/Makefile b/tools/perf/Makefile.config
similarity index 100%
rename from tools/perf/config/Makefile
rename to tools/perf/Makefile.config
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 6641abb97f0a..2d9087501633 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -161,7 +161,7 @@ TRACE_EVENT_DIR = $(srctree)/tools/lib/traceevent/
 BPF_DIR		= $(srctree)/tools/lib/bpf/
 SUBCMD_DIR	= $(srctree)/tools/lib/subcmd/
 
-# include config/Makefile by default and rule out
+# include Makefile.config by default and rule out
 # non-config cases
 config := 1
 
@@ -183,7 +183,7 @@ ifeq ($(filter feature-dump,$(MAKECMDGOALS)),feature-dump)
 FEATURE_TESTS := all
 endif
 endif
-include config/Makefile
+include Makefile.config
 endif
 
 ifeq ($(config),0)
@@ -706,7 +706,7 @@ install: install-bin try-install-man install-traceevent-plugins
 ### Cleaning rules
 
 #
-# This is here, not in config/Makefile, because config/Makefile does
+# This is here, not in Makefile.config, because Makefile.config does
 # not get included for the clean target:
 #
 config-clean:
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8f2c16d9275f..6355902fbfc8 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1434,6 +1434,7 @@ struct option __record_options[] = {
 	OPT_BOOLEAN('s', "stat", &record.opts.inherit_stat,
 		    "per thread counts"),
 	OPT_BOOLEAN('d', "data", &record.opts.sample_address, "Record the sample addresses"),
+	OPT_BOOLEAN(0, "sample-cpu", &record.opts.sample_cpu, "Record the sample cpu"),
 	OPT_BOOLEAN_SET('T', "timestamp", &record.opts.sample_time,
 			&record.opts.sample_time_set,
 			"Record the sample timestamps"),
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index bd108683fcb8..418ed94756d3 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -128,10 +128,14 @@ static int perf_top__parse_source(struct perf_top *top, struct hist_entry *he)
 		return err;
 	}
 
-	err = symbol__annotate(sym, map, 0);
+	err = symbol__disassemble(sym, map, 0);
 	if (err == 0) {
 out_assign:
 		top->sym_filter_entry = he;
+	} else {
+		char msg[BUFSIZ];
+		symbol__strerror_disassemble(sym, map, err, msg, sizeof(msg));
+		pr_err("Couldn't annotate %s: %s\n", sym->name, msg);
 	}
 
 	pthread_mutex_unlock(&notes->lock);
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index a7e0f1497244..cb0f1356ff81 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -52,6 +52,7 @@ struct record_opts {
 	bool	     sample_weight;
 	bool	     sample_time;
 	bool	     sample_time_set;
+	bool	     sample_cpu;
 	bool	     period;
 	bool	     running_time;
 	bool	     full_auxtrace;
diff --git a/tools/perf/scripts/perl/Perf-Trace-Util/Build b/tools/perf/scripts/perl/Perf-Trace-Util/Build
index 928e110179cb..34faecf774ae 100644
--- a/tools/perf/scripts/perl/Perf-Trace-Util/Build
+++ b/tools/perf/scripts/perl/Perf-Trace-Util/Build
@@ -1,3 +1,5 @@
 libperf-y += Context.o
 
-CFLAGS_Context.o += $(PERL_EMBED_CCOPTS) -Wno-redundant-decls -Wno-strict-prototypes -Wno-unused-parameter -Wno-nested-externs -Wno-undef -Wno-switch-default
+CFLAGS_Context.o += $(PERL_EMBED_CCOPTS) -Wno-redundant-decls -Wno-strict-prototypes
+CFLAGS_Context.o += -Wno-unused-parameter -Wno-nested-externs -Wno-undef
+CFLAGS_Context.o += -Wno-switch-default -Wno-shadow
diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build
index cb20ae1c0d35..dc51bc570e51 100644
--- a/tools/perf/tests/Build
+++ b/tools/perf/tests/Build
@@ -41,6 +41,7 @@ perf-y += event-times.o
 perf-y += backward-ring-buffer.o
 perf-y += sdt.o
 perf-y += is_printable_array.o
+perf-y += bitmap.o
 
 $(OUTPUT)tests/llvm-src-base.c: tests/bpf-script-example.c tests/Build
 	$(call rule_mkdir)
diff --git a/tools/perf/tests/bitmap.c b/tools/perf/tests/bitmap.c
new file mode 100644
index 000000000000..9abe6c13090f
--- /dev/null
+++ b/tools/perf/tests/bitmap.c
@@ -0,0 +1,53 @@
+#include <linux/compiler.h>
+#include <linux/bitmap.h>
+#include "tests.h"
+#include "cpumap.h"
+#include "debug.h"
+
+#define NBITS 100
+
+static unsigned long *get_bitmap(const char *str, int nbits)
+{
+	struct cpu_map *map = cpu_map__new(str);
+	unsigned long *bm = NULL;
+	int i;
+
+	bm = bitmap_alloc(nbits);
+
+	if (map && bm) {
+		bitmap_zero(bm, nbits);
+
+		for (i = 0; i < map->nr; i++)
+			set_bit(map->map[i], bm);
+	}
+
+	if (map)
+		cpu_map__put(map);
+	return bm;
+}
+
+static int test_bitmap(const char *str)
+{
+	unsigned long *bm = get_bitmap(str, NBITS);
+	char buf[100];
+	int ret;
+
+	bitmap_scnprintf(bm, NBITS, buf, sizeof(buf));
+	pr_debug("bitmap: %s\n", buf);
+
+	ret = !strcmp(buf, str);
+	free(bm);
+	return ret;
+}
+
+int test__bitmap_print(int subtest __maybe_unused)
+{
+	TEST_ASSERT_VAL("failed to convert map", test_bitmap("1"));
+	TEST_ASSERT_VAL("failed to convert map", test_bitmap("1,5"));
+	TEST_ASSERT_VAL("failed to convert map", test_bitmap("1,3,5,7,9,11,13,15,17,19,21-40"));
+	TEST_ASSERT_VAL("failed to convert map", test_bitmap("2-5"));
+	TEST_ASSERT_VAL("failed to convert map", test_bitmap("1,3-6,8-10,24,35-37"));
+	TEST_ASSERT_VAL("failed to convert map", test_bitmap("1,3-6,8-10,24,35-37"));
+	TEST_ASSERT_VAL("failed to convert map", test_bitmap("1-10,12-20,22-30,32-40"));
+	return 0;
+}
diff --git a/tools/perf/tests/bpf-script-example.c b/tools/perf/tests/bpf-script-example.c
index e53bc91fa260..268e5f8e4aa2 100644
--- a/tools/perf/tests/bpf-script-example.c
+++ b/tools/perf/tests/bpf-script-example.c
@@ -31,8 +31,8 @@ struct bpf_map_def SEC("maps") flip_table = {
 	.max_entries = 1,
 };
 
-SEC("func=sys_epoll_wait")
-int bpf_func__sys_epoll_wait(void *ctx)
+SEC("func=SyS_epoll_wait")
+int bpf_func__SyS_epoll_wait(void *ctx)
 {
 	int ind =0;
 	int *flag = bpf_map_lookup_elem(&flip_table, &ind);
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 10eb30686c9c..778668a2a966 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -226,6 +226,10 @@ static struct test generic_tests[] = {
 		.func = test__is_printable_array,
 	},
 	{
+		.desc = "Test bitmap print",
+		.func = test__bitmap_print,
+	},
+	{
 		.func = NULL,
 	},
 };
diff --git a/tools/perf/tests/code-reading.c b/tools/perf/tests/code-reading.c
index 68a69a195545..2af156a8d4e5 100644
--- a/tools/perf/tests/code-reading.c
+++ b/tools/perf/tests/code-reading.c
@@ -33,44 +33,86 @@ static unsigned int hex(char c)
 	return c - 'A' + 10;
 }
 
-static size_t read_objdump_line(const char *line, size_t line_len, void *buf,
-			      size_t len)
+static size_t read_objdump_chunk(const char **line, unsigned char **buf,
+				 size_t *buf_len)
 {
-	const char *p;
-	size_t i, j = 0;
-
-	/* Skip to a colon */
-	p = strchr(line, ':');
-	if (!p)
-		return 0;
-	i = p + 1 - line;
+	size_t bytes_read = 0;
+	unsigned char *chunk_start = *buf;
 
 	/* Read bytes */
-	while (j < len) {
+	while (*buf_len > 0) {
 		char c1, c2;
 
-		/* Skip spaces */
-		for (; i < line_len; i++) {
-			if (!isspace(line[i]))
-				break;
-		}
 		/* Get 2 hex digits */
-		if (i >= line_len || !isxdigit(line[i]))
+		c1 = *(*line)++;
+		if (!isxdigit(c1))
 			break;
-		c1 = line[i++];
-		if (i >= line_len || !isxdigit(line[i]))
+		c2 = *(*line)++;
+		if (!isxdigit(c2))
 			break;
-		c2 = line[i++];
-		/* Followed by a space */
-		if (i < line_len && line[i] && !isspace(line[i]))
+
+		/* Store byte and advance buf */
+		**buf = (hex(c1) << 4) | hex(c2);
+		(*buf)++;
+		(*buf_len)--;
+		bytes_read++;
+
+		/* End of chunk? */
+		if (isspace(**line))
 			break;
-		/* Store byte */
-		*(unsigned char *)buf = (hex(c1) << 4) | hex(c2);
-		buf += 1;
-		j++;
 	}
+
+	/*
+	 * objdump will display raw insn as LE if code endian
+	 * is LE and bytes_per_chunk > 1. In that case reverse
+	 * the chunk we just read.
+	 *
+	 * see disassemble_bytes() at binutils/objdump.c for details
+	 * how objdump chooses display endian)
+	 */
+	if (bytes_read > 1 && !bigendian()) {
+		unsigned char *chunk_end = chunk_start + bytes_read - 1;
+		unsigned char tmp;
+
+		while (chunk_start < chunk_end) {
+			tmp = *chunk_start;
+			*chunk_start = *chunk_end;
+			*chunk_end = tmp;
+			chunk_start++;
+			chunk_end--;
+		}
+	}
+
+	return bytes_read;
+}
+
+static size_t read_objdump_line(const char *line, unsigned char *buf,
+				size_t buf_len)
+{
+	const char *p;
+	size_t ret, bytes_read = 0;
+
+	/* Skip to a colon */
+	p = strchr(line, ':');
+	if (!p)
+		return 0;
+	p++;
+
+	/* Skip initial spaces */
+	while (*p) {
+		if (!isspace(*p))
+			break;
+		p++;
+	}
+
+	do {
+		ret = read_objdump_chunk(&p, &buf, &buf_len);
+		bytes_read += ret;
+		p++;
+	} while (ret > 0);
+
 	/* return number of successfully read bytes */
-	return j;
+	return bytes_read;
 }
 
 static int read_objdump_output(FILE *f, void *buf, size_t *len, u64 start_addr)
@@ -95,7 +137,7 @@ static int read_objdump_output(FILE *f, void *buf, size_t *len, u64 start_addr)
 		}
 
 		/* read objdump data into temporary buffer */
-		read_bytes = read_objdump_line(line, ret, tmp, sizeof(tmp));
+		read_bytes = read_objdump_line(line, tmp, sizeof(tmp));
 		if (!read_bytes)
 			continue;
 
@@ -152,7 +194,7 @@ static int read_via_objdump(const char *filename, u64 addr, void *buf,
 
 	ret = read_objdump_output(f, buf, &len, addr);
 	if (len) {
-		pr_debug("objdump read too few bytes\n");
+		pr_debug("objdump read too few bytes: %zd\n", len);
 		if (!ret)
 			ret = len;
 	}
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index 9bfc0e06c61a..7c196c585472 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -90,6 +90,7 @@ int test__backward_ring_buffer(int subtest);
 int test__cpu_map_print(int subtest);
 int test__sdt_event(int subtest);
 int test__is_printable_array(int subtest);
+int test__bitmap_print(int subtest);
 
 #if defined(__arm__) || defined(__aarch64__)
 #ifdef HAVE_DWARF_UNWIND_SUPPORT
diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index 29dc6d20364e..2e2d10022355 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -1026,7 +1026,7 @@ int symbol__tui_annotate(struct symbol *sym, struct map *map,
 			.use_navkeypressed = true,
 		},
 	};
-	int ret = -1;
+	int ret = -1, err;
 	int nr_pcnt = 1;
 	size_t sizeof_bdl = sizeof(struct browser_disasm_line);
 
@@ -1050,8 +1050,11 @@ int symbol__tui_annotate(struct symbol *sym, struct map *map,
 		  (nr_pcnt - 1);
 	}
 
-	if (symbol__annotate(sym, map, sizeof_bdl) < 0) {
-		ui__error("%s", ui_helpline__last_msg);
+	err = symbol__disassemble(sym, map, sizeof_bdl);
+	if (err) {
+		char msg[BUFSIZ];
+		symbol__strerror_disassemble(sym, map, err, msg, sizeof(msg));
+		ui__error("Couldn't annotate %s:\n%s", sym->name, msg);
 		goto out_free_offsets;
 	}
 
diff --git a/tools/perf/ui/gtk/annotate.c b/tools/perf/ui/gtk/annotate.c
index 9c7ff8d31b27..42d319927762 100644
--- a/tools/perf/ui/gtk/annotate.c
+++ b/tools/perf/ui/gtk/annotate.c
@@ -162,12 +162,16 @@ static int symbol__gtk_annotate(struct symbol *sym, struct map *map,
 	GtkWidget *notebook;
 	GtkWidget *scrolled_window;
 	GtkWidget *tab_label;
+	int err;
 
 	if (map->dso->annotate_warned)
 		return -1;
 
-	if (symbol__annotate(sym, map, 0) < 0) {
-		ui__error("%s", ui_helpline__current);
+	err = symbol__disassemble(sym, map, 0);
+	if (err) {
+		char msg[BUFSIZ];
+		symbol__strerror_disassemble(sym, map, err, msg, sizeof(msg));
+		ui__error("Couldn't annotate %s: %s\n", sym->name, msg);
 		return -1;
 	}
 
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index e9825fe825fd..4024d309bb00 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -1123,7 +1123,46 @@ static void delete_last_nop(struct symbol *sym)
 	}
 }
 
-int symbol__annotate(struct symbol *sym, struct map *map, size_t privsize)
+int symbol__strerror_disassemble(struct symbol *sym __maybe_unused, struct map *map,
+			      int errnum, char *buf, size_t buflen)
+{
+	struct dso *dso = map->dso;
+
+	BUG_ON(buflen == 0);
+
+	if (errnum >= 0) {
+		str_error_r(errnum, buf, buflen);
+		return 0;
+	}
+
+	switch (errnum) {
+	case SYMBOL_ANNOTATE_ERRNO__NO_VMLINUX: {
+		char bf[SBUILD_ID_SIZE + 15] = " with build id ";
+		char *build_id_msg = NULL;
+
+		if (dso->has_build_id) {
+			build_id__sprintf(dso->build_id,
+					  sizeof(dso->build_id), bf + 15);
+			build_id_msg = bf;
+		}
+		scnprintf(buf, buflen,
+			  "No vmlinux file%s\nwas found in the path.\n\n"
+			  "Note that annotation using /proc/kcore requires CAP_SYS_RAWIO capability.\n\n"
+			  "Please use:\n\n"
+			  "  perf buildid-cache -vu vmlinux\n\n"
+			  "or:\n\n"
+			  "  --vmlinux vmlinux\n", build_id_msg ?: "");
+	}
+		break;
+	default:
+		scnprintf(buf, buflen, "Internal error: Invalid %d error code\n", errnum);
+		break;
+	}
+
+	return 0;
+}
+
+int symbol__disassemble(struct symbol *sym, struct map *map, size_t privsize)
 {
 	struct dso *dso = map->dso;
 	char *filename = dso__build_id_filename(dso, NULL, 0);
@@ -1134,22 +1173,20 @@ int symbol__annotate(struct symbol *sym, struct map *map, size_t privsize)
 	char symfs_filename[PATH_MAX];
 	struct kcore_extract kce;
 	bool delete_extract = false;
+	int stdout_fd[2];
 	int lineno = 0;
 	int nline;
+	pid_t pid;
 
 	if (filename)
 		symbol__join_symfs(symfs_filename, filename);
 
 	if (filename == NULL) {
-		if (dso->has_build_id) {
-			pr_err("Can't annotate %s: not enough memory\n",
-			       sym->name);
-			return -ENOMEM;
-		}
+		if (dso->has_build_id)
+			return ENOMEM;
 		goto fallback;
-	} else if (dso__is_kcore(dso)) {
-		goto fallback;
-	} else if (readlink(symfs_filename, command, sizeof(command)) < 0 ||
+	} else if (dso__is_kcore(dso) ||
+		   readlink(symfs_filename, command, sizeof(command)) < 0 ||
 		   strstr(command, DSO__NAME_KALLSYMS) ||
 		   access(symfs_filename, R_OK)) {
 		free(filename);
@@ -1166,27 +1203,7 @@ int symbol__annotate(struct symbol *sym, struct map *map, size_t privsize)
 
 	if (dso->symtab_type == DSO_BINARY_TYPE__KALLSYMS &&
 	    !dso__is_kcore(dso)) {
-		char bf[SBUILD_ID_SIZE + 15] = " with build id ";
-		char *build_id_msg = NULL;
-
-		if (dso->annotate_warned)
-			goto out_free_filename;
-
-		if (dso->has_build_id) {
-			build_id__sprintf(dso->build_id,
-					  sizeof(dso->build_id), bf + 15);
-			build_id_msg = bf;
-		}
-		err = -ENOENT;
-		dso->annotate_warned = 1;
-		pr_err("Can't annotate %s:\n\n"
-		       "No vmlinux file%s\nwas found in the path.\n\n"
-		       "Note that annotation using /proc/kcore requires CAP_SYS_RAWIO capability.\n\n"
-		       "Please use:\n\n"
-		       "  perf buildid-cache -vu vmlinux\n\n"
-		       "or:\n\n"
-		       "  --vmlinux vmlinux\n",
-		       sym->name, build_id_msg ?: "");
+		err = SYMBOL_ANNOTATE_ERRNO__NO_VMLINUX;
 		goto out_free_filename;
 	}
 
@@ -1258,9 +1275,32 @@ int symbol__annotate(struct symbol *sym, struct map *map, size_t privsize)
 
 	pr_debug("Executing: %s\n", command);
 
-	file = popen(command, "r");
+	err = -1;
+	if (pipe(stdout_fd) < 0) {
+		pr_err("Failure creating the pipe to run %s\n", command);
+		goto out_remove_tmp;
+	}
+
+	pid = fork();
+	if (pid < 0) {
+		pr_err("Failure forking to run %s\n", command);
+		goto out_close_stdout;
+	}
+
+	if (pid == 0) {
+		close(stdout_fd[0]);
+		dup2(stdout_fd[1], 1);
+		close(stdout_fd[1]);
+		execl("/bin/sh", "sh", "-c", command, NULL);
+		perror(command);
+		exit(-1);
+	}
+
+	close(stdout_fd[1]);
+
+	file = fdopen(stdout_fd[0], "r");
 	if (!file) {
-		pr_err("Failure running %s\n", command);
+		pr_err("Failure creating FILE stream for %s\n", command);
 		/*
 		 * If we were using debug info should retry with
 		 * original binary.
@@ -1286,9 +1326,11 @@ int symbol__annotate(struct symbol *sym, struct map *map, size_t privsize)
 	if (dso__is_kcore(dso))
 		delete_last_nop(sym);
 
-	pclose(file);
-
+	fclose(file);
+	err = 0;
 out_remove_tmp:
+	close(stdout_fd[0]);
+
 	if (dso__needs_decompress(dso))
 		unlink(symfs_filename);
 out_free_filename:
@@ -1297,6 +1339,10 @@ int symbol__annotate(struct symbol *sym, struct map *map, size_t privsize)
 	if (free_filename)
 		free(filename);
 	return err;
+
+out_close_stdout:
+	close(stdout_fd[1]);
+	goto out_remove_tmp;
 }
 
 static void insert_source_line(struct rb_root *root, struct source_line *src_line)
@@ -1663,7 +1709,7 @@ int symbol__tty_annotate(struct symbol *sym, struct map *map,
 	struct rb_root source_line = RB_ROOT;
 	u64 len;
 
-	if (symbol__annotate(sym, map, 0) < 0)
+	if (symbol__disassemble(sym, map, 0) < 0)
 		return -1;
 
 	len = symbol__size(sym);
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index a23084f54128..f67ccb027561 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -155,7 +155,27 @@ int hist_entry__inc_addr_samples(struct hist_entry *he, int evidx, u64 addr);
 int symbol__alloc_hist(struct symbol *sym);
 void symbol__annotate_zero_histograms(struct symbol *sym);
 
-int symbol__annotate(struct symbol *sym, struct map *map, size_t privsize);
+int symbol__disassemble(struct symbol *sym, struct map *map, size_t privsize);
+
+enum symbol_disassemble_errno {
+	SYMBOL_ANNOTATE_ERRNO__SUCCESS		= 0,
+
+	/*
+	 * Choose an arbitrary negative big number not to clash with standard
+	 * errno since SUS requires the errno has distinct positive values.
+	 * See 'Issue 6' in the link below.
+	 *
+	 * http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/errno.h.html
+	 */
+	__SYMBOL_ANNOTATE_ERRNO__START		= -10000,
+
+	SYMBOL_ANNOTATE_ERRNO__NO_VMLINUX	= __SYMBOL_ANNOTATE_ERRNO__START,
+
+	__SYMBOL_ANNOTATE_ERRNO__END,
+};
+
+int symbol__strerror_disassemble(struct symbol *sym, struct map *map,
+				 int errnum, char *buf, size_t buflen);
 
 int symbol__annotate_init(struct map *map, struct symbol *sym);
 int symbol__annotate_printf(struct symbol *sym, struct map *map,
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 2a40b8e1def7..097b3ed77fdd 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -239,31 +239,13 @@ void perf_event_attr__set_max_precise_ip(struct perf_event_attr *attr)
 
 int perf_evlist__add_default(struct perf_evlist *evlist)
 {
-	struct perf_event_attr attr = {
-		.type = PERF_TYPE_HARDWARE,
-		.config = PERF_COUNT_HW_CPU_CYCLES,
-	};
-	struct perf_evsel *evsel;
-
-	event_attr_init(&attr);
+	struct perf_evsel *evsel = perf_evsel__new_cycles();
 
-	perf_event_attr__set_max_precise_ip(&attr);
-
-	evsel = perf_evsel__new(&attr);
 	if (evsel == NULL)
-		goto error;
-
-	/* use asprintf() because free(evsel) assumes name is allocated */
-	if (asprintf(&evsel->name, "cycles%.*s",
-		     attr.precise_ip ? attr.precise_ip + 1 : 0, ":ppp") < 0)
-		goto error_free;
+		return -ENOMEM;
 
 	perf_evlist__add(evlist, evsel);
 	return 0;
-error_free:
-	perf_evsel__delete(evsel);
-error:
-	return -ENOMEM;
 }
 
 int perf_evlist__add_dummy(struct perf_evlist *evlist)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 8c54df61fe64..d9b80ef881cd 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -253,6 +253,34 @@ struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx)
 	return evsel;
 }
 
+struct perf_evsel *perf_evsel__new_cycles(void)
+{
+	struct perf_event_attr attr = {
+		.type	= PERF_TYPE_HARDWARE,
+		.config	= PERF_COUNT_HW_CPU_CYCLES,
+	};
+	struct perf_evsel *evsel;
+
+	event_attr_init(&attr);
+
+	perf_event_attr__set_max_precise_ip(&attr);
+
+	evsel = perf_evsel__new(&attr);
+	if (evsel == NULL)
+		goto out;
+
+	/* use asprintf() because free(evsel) assumes name is allocated */
+	if (asprintf(&evsel->name, "cycles%.*s",
+		     attr.precise_ip ? attr.precise_ip + 1 : 0, ":ppp") < 0)
+		goto error_free;
+out:
+	return evsel;
+error_free:
+	perf_evsel__delete(evsel);
+	evsel = NULL;
+	goto out;
+}
+
 /*
  * Returns pointer with encoded error via <linux/err.h> interface.
  */
@@ -854,7 +882,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts,
 		perf_evsel__set_sample_bit(evsel, REGS_INTR);
 	}
 
-	if (target__has_cpu(&opts->target))
+	if (target__has_cpu(&opts->target) || opts->sample_cpu)
 		perf_evsel__set_sample_bit(evsel, CPU);
 
 	if (opts->period)
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 8a4a6c9f1480..4d44129e050b 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -175,6 +175,8 @@ static inline struct perf_evsel *perf_evsel__newtp(const char *sys, const char *
 	return perf_evsel__newtp_idx(sys, name, 0);
 }
 
+struct perf_evsel *perf_evsel__new_cycles(void);
+
 struct event_format *event_format__new(const char *sys, const char *name);
 
 void perf_evsel__init(struct perf_evsel *evsel,
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index a18d142cdca3..de15dbcdcecf 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -1672,7 +1672,7 @@ static void __hists__insert_output_entry(struct rb_root *entries,
 }
 
 static void output_resort(struct hists *hists, struct ui_progress *prog,
-			  bool use_callchain)
+			  bool use_callchain, hists__resort_cb_t cb)
 {
 	struct rb_root *root;
 	struct rb_node *next;
@@ -1711,6 +1711,9 @@ static void output_resort(struct hists *hists, struct ui_progress *prog,
 		n = rb_entry(next, struct hist_entry, rb_node_in);
 		next = rb_next(&n->rb_node_in);
 
+		if (cb && cb(n))
+			continue;
+
 		__hists__insert_output_entry(&hists->entries, n, min_callchain_hits, use_callchain);
 		hists__inc_stats(hists, n);
 
@@ -1731,12 +1734,18 @@ void perf_evsel__output_resort(struct perf_evsel *evsel, struct ui_progress *pro
 	else
 		use_callchain = symbol_conf.use_callchain;
 
-	output_resort(evsel__hists(evsel), prog, use_callchain);
+	output_resort(evsel__hists(evsel), prog, use_callchain, NULL);
 }
 
 void hists__output_resort(struct hists *hists, struct ui_progress *prog)
 {
-	output_resort(hists, prog, symbol_conf.use_callchain);
+	output_resort(hists, prog, symbol_conf.use_callchain, NULL);
+}
+
+void hists__output_resort_cb(struct hists *hists, struct ui_progress *prog,
+			     hists__resort_cb_t cb)
+{
+	output_resort(hists, prog, symbol_conf.use_callchain, cb);
 }
 
 static bool can_goto_child(struct hist_entry *he, enum hierarchy_move_dir hmd)
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 49aa4fac148f..0a1edf1ab450 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -153,8 +153,12 @@ int hist_entry__snprintf_alignment(struct hist_entry *he, struct perf_hpp *hpp,
 				   struct perf_hpp_fmt *fmt, int printed);
 void hist_entry__delete(struct hist_entry *he);
 
+typedef int (*hists__resort_cb_t)(struct hist_entry *he);
+
 void perf_evsel__output_resort(struct perf_evsel *evsel, struct ui_progress *prog);
 void hists__output_resort(struct hists *hists, struct ui_progress *prog);
+void hists__output_resort_cb(struct hists *hists, struct ui_progress *prog,
+			     hists__resort_cb_t cb);
 int hists__collapse_resort(struct hists *hists, struct ui_progress *prog);
 
 void hists__decay_entries(struct hists *hists, bool zap_user, bool zap_kernel);
diff --git a/tools/perf/util/target.c b/tools/perf/util/target.c
index 8cdcf4641c51..21c4d9b23c24 100644
--- a/tools/perf/util/target.c
+++ b/tools/perf/util/target.c
@@ -122,11 +122,7 @@ int target__strerror(struct target *target, int errnum,
 	BUG_ON(buflen == 0);
 
 	if (errnum >= 0) {
-		const char *err = str_error_r(errnum, buf, buflen);
-
-		if (err != buf)
-			scnprintf(buf, buflen, "%s", err);
-
+		str_error_r(errnum, buf, buflen);
 		return 0;
 	}
 

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [GIT PULL] perf updates
@ 2016-05-25 20:18 Ingo Molnar
  0 siblings, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2016-05-25 20:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Jiri Olsa, Thomas Gleixner, Andrew Morton

Linus,

Please pull the latest perf-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-urgent-for-linus

   # HEAD: 0c9f790fcbdaf8cfb6dd7fb4e88fadf55082e37e Merge tag 'perf-core-for-mingo-20160523' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Mostly tooling and PMU driver fixes, but also a number of late updates such as the 
reworking of the call-chain size limiting logic to make call-graph recording more 
robust, plus tooling side changes for the new 'backwards ring-buffer' extension to 
the perf ring-buffer.

 Thanks,

	Ingo

------------------>
Andi Kleen (2):
      perf stat: Avoid fractional digits for integer scales
      perf report: Add srcline_from/to branch sort keys

Arnaldo Carvalho de Melo (15):
      perf core: Generalize max_stack sysctl handler
      perf core: Pass max stack as a perf_callchain_entry context
      perf core: Add a 'nr' field to perf_event_callchain_context
      perf core: Add perf_callchain_store_context() helper
      perf core: Separate accounting of contexts and real addresses in a stack trace
      perf tools: Separate accounting of contexts and real addresses in a stack trace
      perf machine: Do not bail out if not managing to read ref reloc symbol
      perf trace: Warn when trying to resolve kernel addresses with kptr_restrict=1
      perf top: Use machine->kptr_restrict_warned
      perf trace: Fix exit_group() formatting
      perf callchain: Stop validating callchains by the max_stack sysctl
      perf tools: Fix usage of max_stack sysctl
      perf trace: Only auto set call-graph to "dwarf" when syscalls are being traced
      perf trace: Use the fd->name beautifier as default for "fd" args
      perf trace: Use the ptr->name beautifier as default for "filename" args

Chris Ryder (2):
      perf annotate: Fix identification of ARM blt and bls instructions
      perf annotate: Sort list of recognised instructions

Colin Ian King (1):
      perf/x86/intel/p4: Trival indentation fix, remove space

He Kuang (2):
      perf symbols: Store vdso buildid unconditionally
      perf tools: Set buildid dir under symfs when --symfs is provided

Jiri Olsa (1):
      perf/x86/intel/uncore: Remove WARN_ON_ONCE in uncore_pci_probe

Masami Hiramatsu (1):
      perf symbols: Introduce DSO__NAME_KALLSYMS and DSO__NAME_KCORE

Namhyung Kim (3):
      perf stat: Fix indentation of stalled backend cycle
      perf stat: Update runtime using cpu-clock event
      perf stat: Use cpu-clock event for cpu targets

Peter Zijlstra (1):
      perf: Update MAINTAINERS for x86 move

Wang Nan (6):
      perf evsel: Add overwrite attribute and check write_backward
      perf evsel: Record fd into perf_mmap
      perf evlist: Add API to pause/resume
      perf record: Prevent reading invalid data in record__mmap_read
      perf record: Rename variable to make code clear
      perf record: Read from backward ring buffer


 Documentation/sysctl/kernel.txt                    |  14 ++
 MAINTAINERS                                        |   1 +
 arch/arc/kernel/perf_event.c                       |   6 +-
 arch/arm/kernel/perf_callchain.c                   |  10 +-
 arch/arm64/kernel/perf_callchain.c                 |  14 +-
 arch/metag/kernel/perf_callchain.c                 |  10 +-
 arch/mips/kernel/perf_event.c                      |  12 +-
 arch/powerpc/perf/callchain.c                      |  20 +-
 arch/s390/kernel/perf_event.c                      |   4 +-
 arch/sh/kernel/perf_callchain.c                    |   4 +-
 arch/sparc/kernel/perf_event.c                     |  14 +-
 arch/tile/kernel/perf_event.c                      |   6 +-
 arch/x86/events/core.c                             |  14 +-
 arch/x86/events/intel/p4.c                         |   2 +-
 arch/x86/events/intel/uncore.c                     |   2 +-
 arch/xtensa/kernel/perf_event.c                    |  10 +-
 include/linux/perf_event.h                         |  34 ++-
 include/uapi/linux/perf_event.h                    |   1 +
 kernel/bpf/stackmap.c                              |   3 +-
 kernel/events/callchain.c                          |  36 ++-
 kernel/sysctl.c                                    |  11 +-
 tools/perf/Documentation/perf-report.txt           |   5 +-
 tools/perf/Documentation/perf-script.txt           |   2 +-
 tools/perf/Documentation/perf-trace.txt            |   3 +-
 tools/perf/builtin-annotate.c                      |   5 +-
 tools/perf/builtin-buildid-cache.c                 |   8 +-
 tools/perf/builtin-diff.c                          |   5 +-
 tools/perf/builtin-record.c                        |  81 +++++-
 tools/perf/builtin-report.c                        |   7 +-
 tools/perf/builtin-script.c                        |   7 +-
 tools/perf/builtin-stat.c                          |  22 +-
 tools/perf/builtin-timechart.c                     |   5 +-
 tools/perf/builtin-top.c                           |   6 +-
 tools/perf/builtin-trace.c                         | 274 +++++++++------------
 tools/perf/perf.c                                  |   3 +
 tools/perf/util/annotate.c                         |  32 ++-
 tools/perf/util/build-id.c                         |   2 +-
 tools/perf/util/db-export.c                        |   3 +-
 tools/perf/util/dso.c                              |   7 +-
 tools/perf/util/evlist.c                           |  34 +++
 tools/perf/util/evlist.h                           |   4 +
 tools/perf/util/evsel.c                            |  13 +
 tools/perf/util/evsel.h                            |   1 +
 tools/perf/util/hist.c                             |   9 +
 tools/perf/util/hist.h                             |   2 +
 tools/perf/util/machine.c                          |  35 ++-
 tools/perf/util/machine.h                          |   1 +
 .../perf/util/scripting-engines/trace-event-perl.c |   3 +-
 tools/perf/util/sort.c                             |  84 +++++++
 tools/perf/util/sort.h                             |   2 +
 tools/perf/util/stat-shadow.c                      |   8 +-
 tools/perf/util/symbol.c                           |  33 ++-
 tools/perf/util/symbol.h                           |   7 +
 tools/perf/util/top.h                              |   1 -
 tools/perf/util/util.c                             |   3 +-
 tools/perf/util/util.h                             |   3 +-
 56 files changed, 618 insertions(+), 330 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index daabdd7ee543..a3683ce2a2f3 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -61,6 +61,7 @@ Currently, these files might (depending on your configuration)
 - perf_cpu_time_max_percent
 - perf_event_paranoid
 - perf_event_max_stack
+- perf_event_max_contexts_per_stack
 - pid_max
 - powersave-nap               [ PPC only ]
 - printk
@@ -668,6 +669,19 @@ The default value is 127.
 
 ==============================================================
 
+perf_event_max_contexts_per_stack:
+
+Controls maximum number of stack frame context entries for
+(attr.sample_type & PERF_SAMPLE_CALLCHAIN) configured events, for
+instance, when using 'perf record -g' or 'perf trace --call-graph fp'.
+
+This can only be done when no events are in use that have callchains
+enabled, otherwise writing to this file will return -EBUSY.
+
+The default value is 8.
+
+==============================================================
+
 pid_max:
 
 PID allocation wrap value.  When the kernel's next PID value
diff --git a/MAINTAINERS b/MAINTAINERS
index 9c567a431d8d..e15a79d88480 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8637,6 +8637,7 @@ F:	arch/*/kernel/*/perf_event*.c
 F:	arch/*/kernel/*/*/perf_event*.c
 F:	arch/*/include/asm/perf_event.h
 F:	arch/*/kernel/perf_callchain.c
+F:	arch/*/events/*
 F:	tools/perf/
 
 PERSONALITY HANDLING
diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 8b134cfe5e1f..6fd48021324b 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -48,7 +48,7 @@ struct arc_callchain_trace {
 static int callchain_trace(unsigned int addr, void *data)
 {
 	struct arc_callchain_trace *ctrl = data;
-	struct perf_callchain_entry *entry = ctrl->perf_stuff;
+	struct perf_callchain_entry_ctx *entry = ctrl->perf_stuff;
 	perf_callchain_store(entry, addr);
 
 	if (ctrl->depth++ < 3)
@@ -58,7 +58,7 @@ static int callchain_trace(unsigned int addr, void *data)
 }
 
 void
-perf_callchain_kernel(struct perf_callchain_entry *entry, struct pt_regs *regs)
+perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
 	struct arc_callchain_trace ctrl = {
 		.depth = 0,
@@ -69,7 +69,7 @@ perf_callchain_kernel(struct perf_callchain_entry *entry, struct pt_regs *regs)
 }
 
 void
-perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
+perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
 	/*
 	 * User stack can't be unwound trivially with kernel dwarf unwinder
diff --git a/arch/arm/kernel/perf_callchain.c b/arch/arm/kernel/perf_callchain.c
index 27563befa8a2..22bf1f64d99a 100644
--- a/arch/arm/kernel/perf_callchain.c
+++ b/arch/arm/kernel/perf_callchain.c
@@ -31,7 +31,7 @@ struct frame_tail {
  */
 static struct frame_tail __user *
 user_backtrace(struct frame_tail __user *tail,
-	       struct perf_callchain_entry *entry)
+	       struct perf_callchain_entry_ctx *entry)
 {
 	struct frame_tail buftail;
 	unsigned long err;
@@ -59,7 +59,7 @@ user_backtrace(struct frame_tail __user *tail,
 }
 
 void
-perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
+perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
 	struct frame_tail __user *tail;
 
@@ -75,7 +75,7 @@ perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
 
 	tail = (struct frame_tail __user *)regs->ARM_fp - 1;
 
-	while ((entry->nr < sysctl_perf_event_max_stack) &&
+	while ((entry->nr < entry->max_stack) &&
 	       tail && !((unsigned long)tail & 0x3))
 		tail = user_backtrace(tail, entry);
 }
@@ -89,13 +89,13 @@ static int
 callchain_trace(struct stackframe *fr,
 		void *data)
 {
-	struct perf_callchain_entry *entry = data;
+	struct perf_callchain_entry_ctx *entry = data;
 	perf_callchain_store(entry, fr->pc);
 	return 0;
 }
 
 void
-perf_callchain_kernel(struct perf_callchain_entry *entry, struct pt_regs *regs)
+perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
 	struct stackframe fr;
 
diff --git a/arch/arm64/kernel/perf_callchain.c b/arch/arm64/kernel/perf_callchain.c
index 32c3c6e70119..713ca824f266 100644
--- a/arch/arm64/kernel/perf_callchain.c
+++ b/arch/arm64/kernel/perf_callchain.c
@@ -31,7 +31,7 @@ struct frame_tail {
  */
 static struct frame_tail __user *
 user_backtrace(struct frame_tail __user *tail,
-	       struct perf_callchain_entry *entry)
+	       struct perf_callchain_entry_ctx *entry)
 {
 	struct frame_tail buftail;
 	unsigned long err;
@@ -76,7 +76,7 @@ struct compat_frame_tail {
 
 static struct compat_frame_tail __user *
 compat_user_backtrace(struct compat_frame_tail __user *tail,
-		      struct perf_callchain_entry *entry)
+		      struct perf_callchain_entry_ctx *entry)
 {
 	struct compat_frame_tail buftail;
 	unsigned long err;
@@ -106,7 +106,7 @@ compat_user_backtrace(struct compat_frame_tail __user *tail,
 }
 #endif /* CONFIG_COMPAT */
 
-void perf_callchain_user(struct perf_callchain_entry *entry,
+void perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 			 struct pt_regs *regs)
 {
 	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
@@ -122,7 +122,7 @@ void perf_callchain_user(struct perf_callchain_entry *entry,
 
 		tail = (struct frame_tail __user *)regs->regs[29];
 
-		while (entry->nr < sysctl_perf_event_max_stack &&
+		while (entry->nr < entry->max_stack &&
 		       tail && !((unsigned long)tail & 0xf))
 			tail = user_backtrace(tail, entry);
 	} else {
@@ -132,7 +132,7 @@ void perf_callchain_user(struct perf_callchain_entry *entry,
 
 		tail = (struct compat_frame_tail __user *)regs->compat_fp - 1;
 
-		while ((entry->nr < sysctl_perf_event_max_stack) &&
+		while ((entry->nr < entry->max_stack) &&
 			tail && !((unsigned long)tail & 0x3))
 			tail = compat_user_backtrace(tail, entry);
 #endif
@@ -146,12 +146,12 @@ void perf_callchain_user(struct perf_callchain_entry *entry,
  */
 static int callchain_trace(struct stackframe *frame, void *data)
 {
-	struct perf_callchain_entry *entry = data;
+	struct perf_callchain_entry_ctx *entry = data;
 	perf_callchain_store(entry, frame->pc);
 	return 0;
 }
 
-void perf_callchain_kernel(struct perf_callchain_entry *entry,
+void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 			   struct pt_regs *regs)
 {
 	struct stackframe frame;
diff --git a/arch/metag/kernel/perf_callchain.c b/arch/metag/kernel/perf_callchain.c
index 252abc12a5a3..3e8e048040df 100644
--- a/arch/metag/kernel/perf_callchain.c
+++ b/arch/metag/kernel/perf_callchain.c
@@ -29,7 +29,7 @@ static bool is_valid_call(unsigned long calladdr)
 
 static struct metag_frame __user *
 user_backtrace(struct metag_frame __user *user_frame,
-	       struct perf_callchain_entry *entry)
+	       struct perf_callchain_entry_ctx *entry)
 {
 	struct metag_frame frame;
 	unsigned long calladdr;
@@ -56,7 +56,7 @@ user_backtrace(struct metag_frame __user *user_frame,
 }
 
 void
-perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
+perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
 	unsigned long sp = regs->ctx.AX[0].U0;
 	struct metag_frame __user *frame;
@@ -65,7 +65,7 @@ perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
 
 	--frame;
 
-	while ((entry->nr < sysctl_perf_event_max_stack) && frame)
+	while ((entry->nr < entry->max_stack) && frame)
 		frame = user_backtrace(frame, entry);
 }
 
@@ -78,13 +78,13 @@ static int
 callchain_trace(struct stackframe *fr,
 		void *data)
 {
-	struct perf_callchain_entry *entry = data;
+	struct perf_callchain_entry_ctx *entry = data;
 	perf_callchain_store(entry, fr->pc);
 	return 0;
 }
 
 void
-perf_callchain_kernel(struct perf_callchain_entry *entry, struct pt_regs *regs)
+perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
 	struct stackframe fr;
 
diff --git a/arch/mips/kernel/perf_event.c b/arch/mips/kernel/perf_event.c
index 5021c546ad07..d64056e0bb56 100644
--- a/arch/mips/kernel/perf_event.c
+++ b/arch/mips/kernel/perf_event.c
@@ -25,8 +25,8 @@
  * the user stack callchains, we will add it here.
  */
 
-static void save_raw_perf_callchain(struct perf_callchain_entry *entry,
-	unsigned long reg29)
+static void save_raw_perf_callchain(struct perf_callchain_entry_ctx *entry,
+				    unsigned long reg29)
 {
 	unsigned long *sp = (unsigned long *)reg29;
 	unsigned long addr;
@@ -35,14 +35,14 @@ static void save_raw_perf_callchain(struct perf_callchain_entry *entry,
 		addr = *sp++;
 		if (__kernel_text_address(addr)) {
 			perf_callchain_store(entry, addr);
-			if (entry->nr >= sysctl_perf_event_max_stack)
+			if (entry->nr >= entry->max_stack)
 				break;
 		}
 	}
 }
 
-void perf_callchain_kernel(struct perf_callchain_entry *entry,
-		      struct pt_regs *regs)
+void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
+			   struct pt_regs *regs)
 {
 	unsigned long sp = regs->regs[29];
 #ifdef CONFIG_KALLSYMS
@@ -59,7 +59,7 @@ void perf_callchain_kernel(struct perf_callchain_entry *entry,
 	}
 	do {
 		perf_callchain_store(entry, pc);
-		if (entry->nr >= sysctl_perf_event_max_stack)
+		if (entry->nr >= entry->max_stack)
 			break;
 		pc = unwind_stack(current, &sp, pc, &ra);
 	} while (pc);
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index 22d9015c1acc..f62597dbd757 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -47,7 +47,7 @@ static int valid_next_sp(unsigned long sp, unsigned long prev_sp)
 }
 
 void
-perf_callchain_kernel(struct perf_callchain_entry *entry, struct pt_regs *regs)
+perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
 	unsigned long sp, next_sp;
 	unsigned long next_ip;
@@ -76,7 +76,7 @@ perf_callchain_kernel(struct perf_callchain_entry *entry, struct pt_regs *regs)
 			next_ip = regs->nip;
 			lr = regs->link;
 			level = 0;
-			perf_callchain_store(entry, PERF_CONTEXT_KERNEL);
+			perf_callchain_store_context(entry, PERF_CONTEXT_KERNEL);
 
 		} else {
 			if (level == 0)
@@ -232,7 +232,7 @@ static int sane_signal_64_frame(unsigned long sp)
 		puc == (unsigned long) &sf->uc;
 }
 
-static void perf_callchain_user_64(struct perf_callchain_entry *entry,
+static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
 				   struct pt_regs *regs)
 {
 	unsigned long sp, next_sp;
@@ -247,7 +247,7 @@ static void perf_callchain_user_64(struct perf_callchain_entry *entry,
 	sp = regs->gpr[1];
 	perf_callchain_store(entry, next_ip);
 
-	while (entry->nr < sysctl_perf_event_max_stack) {
+	while (entry->nr < entry->max_stack) {
 		fp = (unsigned long __user *) sp;
 		if (!valid_user_sp(sp, 1) || read_user_stack_64(fp, &next_sp))
 			return;
@@ -274,7 +274,7 @@ static void perf_callchain_user_64(struct perf_callchain_entry *entry,
 			    read_user_stack_64(&uregs[PT_R1], &sp))
 				return;
 			level = 0;
-			perf_callchain_store(entry, PERF_CONTEXT_USER);
+			perf_callchain_store_context(entry, PERF_CONTEXT_USER);
 			perf_callchain_store(entry, next_ip);
 			continue;
 		}
@@ -319,7 +319,7 @@ static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
 	return rc;
 }
 
-static inline void perf_callchain_user_64(struct perf_callchain_entry *entry,
+static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
 					  struct pt_regs *regs)
 {
 }
@@ -439,7 +439,7 @@ static unsigned int __user *signal_frame_32_regs(unsigned int sp,
 	return mctx->mc_gregs;
 }
 
-static void perf_callchain_user_32(struct perf_callchain_entry *entry,
+static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
 				   struct pt_regs *regs)
 {
 	unsigned int sp, next_sp;
@@ -453,7 +453,7 @@ static void perf_callchain_user_32(struct perf_callchain_entry *entry,
 	sp = regs->gpr[1];
 	perf_callchain_store(entry, next_ip);
 
-	while (entry->nr < sysctl_perf_event_max_stack) {
+	while (entry->nr < entry->max_stack) {
 		fp = (unsigned int __user *) (unsigned long) sp;
 		if (!valid_user_sp(sp, 0) || read_user_stack_32(fp, &next_sp))
 			return;
@@ -473,7 +473,7 @@ static void perf_callchain_user_32(struct perf_callchain_entry *entry,
 			    read_user_stack_32(&uregs[PT_R1], &sp))
 				return;
 			level = 0;
-			perf_callchain_store(entry, PERF_CONTEXT_USER);
+			perf_callchain_store_context(entry, PERF_CONTEXT_USER);
 			perf_callchain_store(entry, next_ip);
 			continue;
 		}
@@ -487,7 +487,7 @@ static void perf_callchain_user_32(struct perf_callchain_entry *entry,
 }
 
 void
-perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
+perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
 	if (current_is_64bit())
 		perf_callchain_user_64(entry, regs);
diff --git a/arch/s390/kernel/perf_event.c b/arch/s390/kernel/perf_event.c
index c3e4099b60a5..87035fa58bbe 100644
--- a/arch/s390/kernel/perf_event.c
+++ b/arch/s390/kernel/perf_event.c
@@ -224,13 +224,13 @@ arch_initcall(service_level_perf_register);
 
 static int __perf_callchain_kernel(void *data, unsigned long address)
 {
-	struct perf_callchain_entry *entry = data;
+	struct perf_callchain_entry_ctx *entry = data;
 
 	perf_callchain_store(entry, address);
 	return 0;
 }
 
-void perf_callchain_kernel(struct perf_callchain_entry *entry,
+void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 			   struct pt_regs *regs)
 {
 	if (user_mode(regs))
diff --git a/arch/sh/kernel/perf_callchain.c b/arch/sh/kernel/perf_callchain.c
index cc80b614b5fa..fa2c0cd23eaa 100644
--- a/arch/sh/kernel/perf_callchain.c
+++ b/arch/sh/kernel/perf_callchain.c
@@ -21,7 +21,7 @@ static int callchain_stack(void *data, char *name)
 
 static void callchain_address(void *data, unsigned long addr, int reliable)
 {
-	struct perf_callchain_entry *entry = data;
+	struct perf_callchain_entry_ctx *entry = data;
 
 	if (reliable)
 		perf_callchain_store(entry, addr);
@@ -33,7 +33,7 @@ static const struct stacktrace_ops callchain_ops = {
 };
 
 void
-perf_callchain_kernel(struct perf_callchain_entry *entry, struct pt_regs *regs)
+perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
 	perf_callchain_store(entry, regs->pc);
 
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index a4b8b5aed21c..710f3278d448 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -1711,7 +1711,7 @@ static int __init init_hw_perf_events(void)
 }
 pure_initcall(init_hw_perf_events);
 
-void perf_callchain_kernel(struct perf_callchain_entry *entry,
+void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 			   struct pt_regs *regs)
 {
 	unsigned long ksp, fp;
@@ -1756,7 +1756,7 @@ void perf_callchain_kernel(struct perf_callchain_entry *entry,
 			}
 		}
 #endif
-	} while (entry->nr < sysctl_perf_event_max_stack);
+	} while (entry->nr < entry->max_stack);
 }
 
 static inline int
@@ -1769,7 +1769,7 @@ valid_user_frame(const void __user *fp, unsigned long size)
 	return (__range_not_ok(fp, size, TASK_SIZE) == 0);
 }
 
-static void perf_callchain_user_64(struct perf_callchain_entry *entry,
+static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
 				   struct pt_regs *regs)
 {
 	unsigned long ufp;
@@ -1790,10 +1790,10 @@ static void perf_callchain_user_64(struct perf_callchain_entry *entry,
 		pc = sf.callers_pc;
 		ufp = (unsigned long)sf.fp + STACK_BIAS;
 		perf_callchain_store(entry, pc);
-	} while (entry->nr < sysctl_perf_event_max_stack);
+	} while (entry->nr < entry->max_stack);
 }
 
-static void perf_callchain_user_32(struct perf_callchain_entry *entry,
+static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
 				   struct pt_regs *regs)
 {
 	unsigned long ufp;
@@ -1822,11 +1822,11 @@ static void perf_callchain_user_32(struct perf_callchain_entry *entry,
 			ufp = (unsigned long)sf.fp;
 		}
 		perf_callchain_store(entry, pc);
-	} while (entry->nr < sysctl_perf_event_max_stack);
+	} while (entry->nr < entry->max_stack);
 }
 
 void
-perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
+perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
 	u64 saved_fault_address = current_thread_info()->fault_address;
 	u8 saved_fault_code = get_thread_fault_code();
diff --git a/arch/tile/kernel/perf_event.c b/arch/tile/kernel/perf_event.c
index 8767060d70fb..6394c1ccb68e 100644
--- a/arch/tile/kernel/perf_event.c
+++ b/arch/tile/kernel/perf_event.c
@@ -941,7 +941,7 @@ arch_initcall(init_hw_perf_events);
 /*
  * Tile specific backtracing code for perf_events.
  */
-static inline void perf_callchain(struct perf_callchain_entry *entry,
+static inline void perf_callchain(struct perf_callchain_entry_ctx *entry,
 		    struct pt_regs *regs)
 {
 	struct KBacktraceIterator kbt;
@@ -992,13 +992,13 @@ static inline void perf_callchain(struct perf_callchain_entry *entry,
 	}
 }
 
-void perf_callchain_user(struct perf_callchain_entry *entry,
+void perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 		    struct pt_regs *regs)
 {
 	perf_callchain(entry, regs);
 }
 
-void perf_callchain_kernel(struct perf_callchain_entry *entry,
+void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 		      struct pt_regs *regs)
 {
 	perf_callchain(entry, regs);
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 73a75aa5a66d..33787ee817f0 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2202,7 +2202,7 @@ static int backtrace_stack(void *data, char *name)
 
 static int backtrace_address(void *data, unsigned long addr, int reliable)
 {
-	struct perf_callchain_entry *entry = data;
+	struct perf_callchain_entry_ctx *entry = data;
 
 	return perf_callchain_store(entry, addr);
 }
@@ -2214,7 +2214,7 @@ static const struct stacktrace_ops backtrace_ops = {
 };
 
 void
-perf_callchain_kernel(struct perf_callchain_entry *entry, struct pt_regs *regs)
+perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
 	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
 		/* TODO: We don't support guest os callchain now */
@@ -2268,7 +2268,7 @@ static unsigned long get_segment_base(unsigned int segment)
 #include <asm/compat.h>
 
 static inline int
-perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry)
+perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ctx *entry)
 {
 	/* 32-bit process in 64-bit kernel. */
 	unsigned long ss_base, cs_base;
@@ -2283,7 +2283,7 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry)
 
 	fp = compat_ptr(ss_base + regs->bp);
 	pagefault_disable();
-	while (entry->nr < sysctl_perf_event_max_stack) {
+	while (entry->nr < entry->max_stack) {
 		unsigned long bytes;
 		frame.next_frame     = 0;
 		frame.return_address = 0;
@@ -2309,14 +2309,14 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry)
 }
 #else
 static inline int
-perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry)
+perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ctx *entry)
 {
     return 0;
 }
 #endif
 
 void
-perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
+perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
 	struct stack_frame frame;
 	const void __user *fp;
@@ -2343,7 +2343,7 @@ perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
 		return;
 
 	pagefault_disable();
-	while (entry->nr < sysctl_perf_event_max_stack) {
+	while (entry->nr < entry->max_stack) {
 		unsigned long bytes;
 		frame.next_frame	     = NULL;
 		frame.return_address = 0;
diff --git a/arch/x86/events/intel/p4.c b/arch/x86/events/intel/p4.c
index 0a5ede187d9c..eb0533558c2b 100644
--- a/arch/x86/events/intel/p4.c
+++ b/arch/x86/events/intel/p4.c
@@ -826,7 +826,7 @@ static int p4_hw_config(struct perf_event *event)
 		 * Clear bits we reserve to be managed by kernel itself
 		 * and never allowed from a user space
 		 */
-		 event->attr.config &= P4_CONFIG_MASK;
+		event->attr.config &= P4_CONFIG_MASK;
 
 		rc = p4_validate_raw_event(event);
 		if (rc)
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 16c178916412..fce74062d981 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -891,7 +891,7 @@ static int uncore_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id
 		return -ENODEV;
 
 	pkg = topology_phys_to_logical_pkg(phys_id);
-	if (WARN_ON_ONCE(pkg < 0))
+	if (pkg < 0)
 		return -EINVAL;
 
 	if (UNCORE_PCI_DEV_TYPE(id->driver_data) == UNCORE_EXTRA_PCI_DEV) {
diff --git a/arch/xtensa/kernel/perf_event.c b/arch/xtensa/kernel/perf_event.c
index a6b00b3af429..ef90479e0397 100644
--- a/arch/xtensa/kernel/perf_event.c
+++ b/arch/xtensa/kernel/perf_event.c
@@ -323,23 +323,23 @@ static void xtensa_pmu_read(struct perf_event *event)
 
 static int callchain_trace(struct stackframe *frame, void *data)
 {
-	struct perf_callchain_entry *entry = data;
+	struct perf_callchain_entry_ctx *entry = data;
 
 	perf_callchain_store(entry, frame->pc);
 	return 0;
 }
 
-void perf_callchain_kernel(struct perf_callchain_entry *entry,
+void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 			   struct pt_regs *regs)
 {
-	xtensa_backtrace_kernel(regs, sysctl_perf_event_max_stack,
+	xtensa_backtrace_kernel(regs, entry->max_stack,
 				callchain_trace, NULL, entry);
 }
 
-void perf_callchain_user(struct perf_callchain_entry *entry,
+void perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 			 struct pt_regs *regs)
 {
-	xtensa_backtrace_user(regs, sysctl_perf_event_max_stack,
+	xtensa_backtrace_user(regs, entry->max_stack,
 			      callchain_trace, entry);
 }
 
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 9e1c3ada91c4..6b87be908790 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -61,6 +61,14 @@ struct perf_callchain_entry {
 	__u64				ip[0]; /* /proc/sys/kernel/perf_event_max_stack */
 };
 
+struct perf_callchain_entry_ctx {
+	struct perf_callchain_entry *entry;
+	u32			    max_stack;
+	u32			    nr;
+	short			    contexts;
+	bool			    contexts_maxed;
+};
+
 struct perf_raw_record {
 	u32				size;
 	void				*data;
@@ -1063,20 +1071,36 @@ extern void perf_event_fork(struct task_struct *tsk);
 /* Callchains */
 DECLARE_PER_CPU(struct perf_callchain_entry, perf_callchain_entry);
 
-extern void perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs);
-extern void perf_callchain_kernel(struct perf_callchain_entry *entry, struct pt_regs *regs);
+extern void perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs);
+extern void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs);
 extern struct perf_callchain_entry *
 get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
-		   bool crosstask, bool add_mark);
+		   u32 max_stack, bool crosstask, bool add_mark);
 extern int get_callchain_buffers(void);
 extern void put_callchain_buffers(void);
 
 extern int sysctl_perf_event_max_stack;
+extern int sysctl_perf_event_max_contexts_per_stack;
+
+static inline int perf_callchain_store_context(struct perf_callchain_entry_ctx *ctx, u64 ip)
+{
+	if (ctx->contexts < sysctl_perf_event_max_contexts_per_stack) {
+		struct perf_callchain_entry *entry = ctx->entry;
+		entry->ip[entry->nr++] = ip;
+		++ctx->contexts;
+		return 0;
+	} else {
+		ctx->contexts_maxed = true;
+		return -1; /* no more room, stop walking the stack */
+	}
+}
 
-static inline int perf_callchain_store(struct perf_callchain_entry *entry, u64 ip)
+static inline int perf_callchain_store(struct perf_callchain_entry_ctx *ctx, u64 ip)
 {
-	if (entry->nr < sysctl_perf_event_max_stack) {
+	if (ctx->nr < ctx->max_stack && !ctx->contexts_maxed) {
+		struct perf_callchain_entry *entry = ctx->entry;
 		entry->ip[entry->nr++] = ip;
+		++ctx->nr;
 		return 0;
 	} else {
 		return -1; /* no more room, stop walking the stack */
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 43fc8d213472..36ce552cf6a9 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -862,6 +862,7 @@ enum perf_event_type {
 };
 
 #define PERF_MAX_STACK_DEPTH		127
+#define PERF_MAX_CONTEXTS_PER_STACK	  8
 
 enum perf_callchain_context {
 	PERF_CONTEXT_HV			= (__u64)-32,
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index f5a19548be12..a82d7605db3f 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -136,7 +136,8 @@ static u64 bpf_get_stackid(u64 r1, u64 r2, u64 flags, u64 r4, u64 r5)
 			       BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
 		return -EINVAL;
 
-	trace = get_perf_callchain(regs, init_nr, kernel, user, false, false);
+	trace = get_perf_callchain(regs, init_nr, kernel, user,
+				   sysctl_perf_event_max_stack, false, false);
 
 	if (unlikely(!trace))
 		/* couldn't fetch the stack trace */
diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index b9325e7dcba1..179ef4640964 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -19,11 +19,13 @@ struct callchain_cpus_entries {
 };
 
 int sysctl_perf_event_max_stack __read_mostly = PERF_MAX_STACK_DEPTH;
+int sysctl_perf_event_max_contexts_per_stack __read_mostly = PERF_MAX_CONTEXTS_PER_STACK;
 
 static inline size_t perf_callchain_entry__sizeof(void)
 {
 	return (sizeof(struct perf_callchain_entry) +
-		sizeof(__u64) * sysctl_perf_event_max_stack);
+		sizeof(__u64) * (sysctl_perf_event_max_stack +
+				 sysctl_perf_event_max_contexts_per_stack));
 }
 
 static DEFINE_PER_CPU(int, callchain_recursion[PERF_NR_CONTEXTS]);
@@ -32,12 +34,12 @@ static DEFINE_MUTEX(callchain_mutex);
 static struct callchain_cpus_entries *callchain_cpus_entries;
 
 
-__weak void perf_callchain_kernel(struct perf_callchain_entry *entry,
+__weak void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
 				  struct pt_regs *regs)
 {
 }
 
-__weak void perf_callchain_user(struct perf_callchain_entry *entry,
+__weak void perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 				struct pt_regs *regs)
 {
 }
@@ -176,14 +178,15 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
 	if (!kernel && !user)
 		return NULL;
 
-	return get_perf_callchain(regs, 0, kernel, user, crosstask, true);
+	return get_perf_callchain(regs, 0, kernel, user, sysctl_perf_event_max_stack, crosstask, true);
 }
 
 struct perf_callchain_entry *
 get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
-		   bool crosstask, bool add_mark)
+		   u32 max_stack, bool crosstask, bool add_mark)
 {
 	struct perf_callchain_entry *entry;
+	struct perf_callchain_entry_ctx ctx;
 	int rctx;
 
 	entry = get_callchain_entry(&rctx);
@@ -193,12 +196,16 @@ get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
 	if (!entry)
 		goto exit_put;
 
-	entry->nr = init_nr;
+	ctx.entry     = entry;
+	ctx.max_stack = max_stack;
+	ctx.nr	      = entry->nr = init_nr;
+	ctx.contexts       = 0;
+	ctx.contexts_maxed = false;
 
 	if (kernel && !user_mode(regs)) {
 		if (add_mark)
-			perf_callchain_store(entry, PERF_CONTEXT_KERNEL);
-		perf_callchain_kernel(entry, regs);
+			perf_callchain_store_context(&ctx, PERF_CONTEXT_KERNEL);
+		perf_callchain_kernel(&ctx, regs);
 	}
 
 	if (user) {
@@ -214,8 +221,8 @@ get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
 				goto exit_put;
 
 			if (add_mark)
-				perf_callchain_store(entry, PERF_CONTEXT_USER);
-			perf_callchain_user(entry, regs);
+				perf_callchain_store_context(&ctx, PERF_CONTEXT_USER);
+			perf_callchain_user(&ctx, regs);
 		}
 	}
 
@@ -225,10 +232,15 @@ get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
 	return entry;
 }
 
+/*
+ * Used for sysctl_perf_event_max_stack and
+ * sysctl_perf_event_max_contexts_per_stack.
+ */
 int perf_event_max_stack_handler(struct ctl_table *table, int write,
 				 void __user *buffer, size_t *lenp, loff_t *ppos)
 {
-	int new_value = sysctl_perf_event_max_stack, ret;
+	int *value = table->data;
+	int new_value = *value, ret;
 	struct ctl_table new_table = *table;
 
 	new_table.data = &new_value;
@@ -240,7 +252,7 @@ int perf_event_max_stack_handler(struct ctl_table *table, int write,
 	if (atomic_read(&nr_callchain_events))
 		ret = -EBUSY;
 	else
-		sysctl_perf_event_max_stack = new_value;
+		*value = new_value;
 
 	mutex_unlock(&callchain_mutex);
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c8b318663525..bec4c11c47d6 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1149,13 +1149,22 @@ static struct ctl_table kern_table[] = {
 	},
 	{
 		.procname	= "perf_event_max_stack",
-		.data		= NULL, /* filled in by handler */
+		.data		= &sysctl_perf_event_max_stack,
 		.maxlen		= sizeof(sysctl_perf_event_max_stack),
 		.mode		= 0644,
 		.proc_handler	= perf_event_max_stack_handler,
 		.extra1		= &zero,
 		.extra2		= &six_hundred_forty_kb,
 	},
+	{
+		.procname	= "perf_event_max_contexts_per_stack",
+		.data		= &sysctl_perf_event_max_contexts_per_stack,
+		.maxlen		= sizeof(sysctl_perf_event_max_contexts_per_stack),
+		.mode		= 0644,
+		.proc_handler	= perf_event_max_stack_handler,
+		.extra1		= &zero,
+		.extra2		= &one_thousand,
+	},
 #endif
 #ifdef CONFIG_KMEMCHECK
 	{
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index ebaf849e30ef..9cbddc290aff 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -103,12 +103,13 @@ OPTIONS
 
 	If --branch-stack option is used, following sort keys are also
 	available:
-	dso_from, dso_to, symbol_from, symbol_to, mispredict.
 
 	- dso_from: name of library or module branched from
 	- dso_to: name of library or module branched to
 	- symbol_from: name of function branched from
 	- symbol_to: name of function branched to
+	- srcline_from: source file and line branched from
+	- srcline_to: source file and line branched to
 	- mispredict: "N" for predicted branch, "Y" for mispredicted branch
 	- in_tx: branch in TSX transaction
 	- abort: TSX transaction abort.
@@ -248,7 +249,7 @@ OPTIONS
 	Note that when using the --itrace option the synthesized callchain size
 	will override this value if the synthesized callchain size is bigger.
 
-	Default: /proc/sys/kernel/perf_event_max_stack when present, 127 otherwise.
+	Default: 127
 
 -G::
 --inverted::
diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index a856a1095893..4fc44c75263f 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -267,7 +267,7 @@ include::itrace.txt[]
         Note that when using the --itrace option the synthesized callchain size
         will override this value if the synthesized callchain size is bigger.
 
-        Default: /proc/sys/kernel/perf_event_max_stack when present, 127 otherwise.
+        Default: 127
 
 --ns::
 	Use 9 decimal places when displaying time (i.e. show the nanoseconds)
diff --git a/tools/perf/Documentation/perf-trace.txt b/tools/perf/Documentation/perf-trace.txt
index 6afe20121bc0..1ab0782369b1 100644
--- a/tools/perf/Documentation/perf-trace.txt
+++ b/tools/perf/Documentation/perf-trace.txt
@@ -143,7 +143,8 @@ the thread executes on the designated CPUs. Default is to monitor all CPUs.
         Implies '--call-graph dwarf' when --call-graph not present on the
         command line, on systems where DWARF unwinding was built in.
 
-        Default: /proc/sys/kernel/perf_event_max_stack when present, 127 otherwise.
+        Default: /proc/sys/kernel/perf_event_max_stack when present for
+                 live sessions (without --input/-i), 127 otherwise.
 
 --min-stack::
         Set the stack depth limit when parsing the callchain, anything
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 814158393656..25c81734a950 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -324,8 +324,9 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_BOOLEAN(0, "skip-missing", &annotate.skip_missing,
 		    "Skip symbols that cannot be annotated"),
 	OPT_STRING('C', "cpu", &annotate.cpu_list, "cpu", "list of cpus to profile"),
-	OPT_STRING(0, "symfs", &symbol_conf.symfs, "directory",
-		   "Look for files with symbols relative to this directory"),
+	OPT_CALLBACK(0, "symfs", NULL, "directory",
+		     "Look for files with symbols relative to this directory",
+		     symbol__config_symfs),
 	OPT_BOOLEAN(0, "source", &symbol_conf.annotate_src,
 		    "Interleave source code with assembly code (default)"),
 	OPT_BOOLEAN(0, "asm-raw", &symbol_conf.annotate_asm_raw,
diff --git a/tools/perf/builtin-buildid-cache.c b/tools/perf/builtin-buildid-cache.c
index 632efc6b79a0..d75bded21fe0 100644
--- a/tools/perf/builtin-buildid-cache.c
+++ b/tools/perf/builtin-buildid-cache.c
@@ -119,8 +119,8 @@ static int build_id_cache__add_kcore(const char *filename, bool force)
 	if (build_id_cache__kcore_buildid(from_dir, sbuildid) < 0)
 		return -1;
 
-	scnprintf(to_dir, sizeof(to_dir), "%s/[kernel.kcore]/%s",
-		  buildid_dir, sbuildid);
+	scnprintf(to_dir, sizeof(to_dir), "%s/%s/%s",
+		  buildid_dir, DSO__NAME_KCORE, sbuildid);
 
 	if (!force &&
 	    !build_id_cache__kcore_existing(from_dir, to_dir, sizeof(to_dir))) {
@@ -131,8 +131,8 @@ static int build_id_cache__add_kcore(const char *filename, bool force)
 	if (build_id_cache__kcore_dir(dir, sizeof(dir)))
 		return -1;
 
-	scnprintf(to_dir, sizeof(to_dir), "%s/[kernel.kcore]/%s/%s",
-		  buildid_dir, sbuildid, dir);
+	scnprintf(to_dir, sizeof(to_dir), "%s/%s/%s/%s",
+		  buildid_dir, DSO__NAME_KCORE, sbuildid, dir);
 
 	if (mkdir_p(to_dir, 0755))
 		return -1;
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 9ce354f469dc..f7645a42708e 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -812,8 +812,9 @@ static const struct option options[] = {
 	OPT_STRING_NOEMPTY('t', "field-separator", &symbol_conf.field_sep, "separator",
 		   "separator for columns, no spaces will be added between "
 		   "columns '.' is reserved."),
-	OPT_STRING(0, "symfs", &symbol_conf.symfs, "directory",
-		    "Look for files with symbols relative to this directory"),
+	OPT_CALLBACK(0, "symfs", NULL, "directory",
+		     "Look for files with symbols relative to this directory",
+		     symbol__config_symfs),
 	OPT_UINTEGER('o', "order", &sort_compute, "Specify compute sorting."),
 	OPT_CALLBACK(0, "percentage", NULL, "relative|absolute",
 		     "How to display percentage of filtered entries", parse_filter_percentage),
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index f3679c44d3f3..dc3fcb597e4c 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -40,6 +40,7 @@
 #include <unistd.h>
 #include <sched.h>
 #include <sys/mman.h>
+#include <asm/bug.h>
 
 
 struct record {
@@ -82,27 +83,87 @@ static int process_synthesized_event(struct perf_tool *tool,
 	return record__write(rec, event, event->header.size);
 }
 
+static int
+backward_rb_find_range(void *buf, int mask, u64 head, u64 *start, u64 *end)
+{
+	struct perf_event_header *pheader;
+	u64 evt_head = head;
+	int size = mask + 1;
+
+	pr_debug2("backward_rb_find_range: buf=%p, head=%"PRIx64"\n", buf, head);
+	pheader = (struct perf_event_header *)(buf + (head & mask));
+	*start = head;
+	while (true) {
+		if (evt_head - head >= (unsigned int)size) {
+			pr_debug("Finshed reading backward ring buffer: rewind\n");
+			if (evt_head - head > (unsigned int)size)
+				evt_head -= pheader->size;
+			*end = evt_head;
+			return 0;
+		}
+
+		pheader = (struct perf_event_header *)(buf + (evt_head & mask));
+
+		if (pheader->size == 0) {
+			pr_debug("Finshed reading backward ring buffer: get start\n");
+			*end = evt_head;
+			return 0;
+		}
+
+		evt_head += pheader->size;
+		pr_debug3("move evt_head: %"PRIx64"\n", evt_head);
+	}
+	WARN_ONCE(1, "Shouldn't get here\n");
+	return -1;
+}
+
+static int
+rb_find_range(struct perf_evlist *evlist,
+	      void *data, int mask, u64 head, u64 old,
+	      u64 *start, u64 *end)
+{
+	if (!evlist->backward) {
+		*start = old;
+		*end = head;
+		return 0;
+	}
+
+	return backward_rb_find_range(data, mask, head, start, end);
+}
+
 static int record__mmap_read(struct record *rec, int idx)
 {
 	struct perf_mmap *md = &rec->evlist->mmap[idx];
 	u64 head = perf_mmap__read_head(md);
 	u64 old = md->prev;
+	u64 end = head, start = old;
 	unsigned char *data = md->base + page_size;
 	unsigned long size;
 	void *buf;
 	int rc = 0;
 
-	if (old == head)
+	if (rb_find_range(rec->evlist, data, md->mask, head,
+			  old, &start, &end))
+		return -1;
+
+	if (start == end)
 		return 0;
 
 	rec->samples++;
 
-	size = head - old;
+	size = end - start;
+	if (size > (unsigned long)(md->mask) + 1) {
+		WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
+
+		md->prev = head;
+		perf_evlist__mmap_consume(rec->evlist, idx);
+		return 0;
+	}
 
-	if ((old & md->mask) + size != (head & md->mask)) {
-		buf = &data[old & md->mask];
-		size = md->mask + 1 - (old & md->mask);
-		old += size;
+	if ((start & md->mask) + size != (end & md->mask)) {
+		buf = &data[start & md->mask];
+		size = md->mask + 1 - (start & md->mask);
+		start += size;
 
 		if (record__write(rec, buf, size) < 0) {
 			rc = -1;
@@ -110,16 +171,16 @@ static int record__mmap_read(struct record *rec, int idx)
 		}
 	}
 
-	buf = &data[old & md->mask];
-	size = head - old;
-	old += size;
+	buf = &data[start & md->mask];
+	size = end - start;
+	start += size;
 
 	if (record__write(rec, buf, size) < 0) {
 		rc = -1;
 		goto out;
 	}
 
-	md->prev = old;
+	md->prev = head;
 	perf_evlist__mmap_consume(rec->evlist, idx);
 out:
 	return rc;
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 87d40e3c4078..a87cb338bdf1 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -691,7 +691,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 			.ordered_events	 = true,
 			.ordering_requires_timestamps = true,
 		},
-		.max_stack		 = sysctl_perf_event_max_stack,
+		.max_stack		 = PERF_MAX_STACK_DEPTH,
 		.pretty_printing_style	 = "normal",
 		.socket_filter		 = -1,
 	};
@@ -770,8 +770,9 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 		   "columns '.' is reserved."),
 	OPT_BOOLEAN('U', "hide-unresolved", &symbol_conf.hide_unresolved,
 		    "Only display entries resolved to a symbol"),
-	OPT_STRING(0, "symfs", &symbol_conf.symfs, "directory",
-		    "Look for files with symbols relative to this directory"),
+	OPT_CALLBACK(0, "symfs", NULL, "directory",
+		     "Look for files with symbols relative to this directory",
+		     symbol__config_symfs),
 	OPT_STRING('C', "cpu", &report.cpu_list, "cpu",
 		   "list of cpus to profile"),
 	OPT_BOOLEAN('I', "show-info", &report.show_full_info,
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index efca81679bb3..e3ce2f34d3ad 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2010,8 +2010,9 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 		   "file", "kallsyms pathname"),
 	OPT_BOOLEAN('G', "hide-call-graph", &no_callchain,
 		    "When printing symbols do not display call chain"),
-	OPT_STRING(0, "symfs", &symbol_conf.symfs, "directory",
-		    "Look for files with symbols relative to this directory"),
+	OPT_CALLBACK(0, "symfs", NULL, "directory",
+		     "Look for files with symbols relative to this directory",
+		     symbol__config_symfs),
 	OPT_CALLBACK('F', "fields", NULL, "str",
 		     "comma separated output fields prepend with 'type:'. "
 		     "Valid types: hw,sw,trace,raw. "
@@ -2067,8 +2068,6 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 		NULL
 	};
 
-	scripting_max_stack = sysctl_perf_event_max_stack;
-
 	setup_scripting();
 
 	argc = parse_options_subcommand(argc, argv, options, script_subcommands, script_usage,
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index e459b685a4e9..ee7ada78d86f 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -66,6 +66,7 @@
 #include <stdlib.h>
 #include <sys/prctl.h>
 #include <locale.h>
+#include <math.h>
 
 #define DEFAULT_SEPARATOR	" "
 #define CNTR_NOT_SUPPORTED	"<not supported>"
@@ -991,12 +992,12 @@ static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg)
 	const char *fmt;
 
 	if (csv_output) {
-		fmt = sc != 1.0 ?  "%.2f%s" : "%.0f%s";
+		fmt = floor(sc) != sc ?  "%.2f%s" : "%.0f%s";
 	} else {
 		if (big_num)
-			fmt = sc != 1.0 ? "%'18.2f%s" : "%'18.0f%s";
+			fmt = floor(sc) != sc ? "%'18.2f%s" : "%'18.0f%s";
 		else
-			fmt = sc != 1.0 ? "%18.2f%s" : "%18.0f%s";
+			fmt = floor(sc) != sc ? "%18.2f%s" : "%18.0f%s";
 	}
 
 	aggr_printout(evsel, id, nr);
@@ -1909,6 +1910,9 @@ static int add_default_attributes(void)
 	}
 
 	if (!evsel_list->nr_entries) {
+		if (target__has_cpu(&target))
+			default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK;
+
 		if (perf_evlist__add_default_attrs(evsel_list, default_attrs0) < 0)
 			return -1;
 		if (pmu_have_event("cpu", "stalled-cycles-frontend")) {
@@ -2000,7 +2004,7 @@ static int process_stat_round_event(struct perf_tool *tool __maybe_unused,
 				    union perf_event *event,
 				    struct perf_session *session)
 {
-	struct stat_round_event *round = &event->stat_round;
+	struct stat_round_event *stat_round = &event->stat_round;
 	struct perf_evsel *counter;
 	struct timespec tsh, *ts = NULL;
 	const char **argv = session->header.env.cmdline_argv;
@@ -2009,12 +2013,12 @@ static int process_stat_round_event(struct perf_tool *tool __maybe_unused,
 	evlist__for_each(evsel_list, counter)
 		perf_stat_process_counter(&stat_config, counter);
 
-	if (round->type == PERF_STAT_ROUND_TYPE__FINAL)
-		update_stats(&walltime_nsecs_stats, round->time);
+	if (stat_round->type == PERF_STAT_ROUND_TYPE__FINAL)
+		update_stats(&walltime_nsecs_stats, stat_round->time);
 
-	if (stat_config.interval && round->time) {
-		tsh.tv_sec  = round->time / NSECS_PER_SEC;
-		tsh.tv_nsec = round->time % NSECS_PER_SEC;
+	if (stat_config.interval && stat_round->time) {
+		tsh.tv_sec  = stat_round->time / NSECS_PER_SEC;
+		tsh.tv_nsec = stat_round->time % NSECS_PER_SEC;
 		ts = &tsh;
 	}
 
diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
index 40cc9bb3506c..733a55422d03 100644
--- a/tools/perf/builtin-timechart.c
+++ b/tools/perf/builtin-timechart.c
@@ -1945,8 +1945,9 @@ int cmd_timechart(int argc, const char **argv,
 	OPT_CALLBACK('p', "process", NULL, "process",
 		      "process selector. Pass a pid or process name.",
 		       parse_process),
-	OPT_STRING(0, "symfs", &symbol_conf.symfs, "directory",
-		    "Look for files with symbols relative to this directory"),
+	OPT_CALLBACK(0, "symfs", NULL, "directory",
+		     "Look for files with symbols relative to this directory",
+		     symbol__config_symfs),
 	OPT_INTEGER('n', "proc-num", &tchart.proc_num,
 		    "min. number of tasks to print"),
 	OPT_BOOLEAN('t', "topology", &tchart.topology,
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 1793da585676..2a6cc254ad0c 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -732,7 +732,7 @@ static void perf_event__process_sample(struct perf_tool *tool,
 	if (machine__resolve(machine, &al, sample) < 0)
 		return;
 
-	if (!top->kptr_restrict_warned &&
+	if (!machine->kptr_restrict_warned &&
 	    symbol_conf.kptr_restrict &&
 	    al.cpumode == PERF_RECORD_MISC_KERNEL) {
 		ui__warning(
@@ -743,7 +743,7 @@ static void perf_event__process_sample(struct perf_tool *tool,
 			  " modules" : "");
 		if (use_browser <= 0)
 			sleep(5);
-		top->kptr_restrict_warned = true;
+		machine->kptr_restrict_warned = true;
 	}
 
 	if (al.sym == NULL) {
@@ -759,7 +759,7 @@ static void perf_event__process_sample(struct perf_tool *tool,
 		 * --hide-kernel-symbols, even if the user specifies an
 		 * invalid --vmlinux ;-)
 		 */
-		if (!top->kptr_restrict_warned && !top->vmlinux_warned &&
+		if (!machine->kptr_restrict_warned && !top->vmlinux_warned &&
 		    al.map == machine->vmlinux_maps[MAP__FUNCTION] &&
 		    RB_EMPTY_ROOT(&al.map->dso->symbols[MAP__FUNCTION])) {
 			if (symbol_conf.vmlinux_name) {
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 6e5c325148e4..5c50fe70d6b3 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -576,84 +576,54 @@ static struct syscall_fmt {
 	bool	   hexret;
 } syscall_fmts[] = {
 	{ .name	    = "access",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* filename */
-			     [1] = SCA_ACCMODE,  /* mode */ }, },
+	  .arg_scnprintf = { [1] = SCA_ACCMODE,  /* mode */ }, },
 	{ .name	    = "arch_prctl", .errmsg = true, .alias = "prctl", },
 	{ .name	    = "bpf",	    .errmsg = true, STRARRAY(0, cmd, bpf_cmd), },
 	{ .name	    = "brk",	    .hexret = true,
 	  .arg_scnprintf = { [0] = SCA_HEX, /* brk */ }, },
-	{ .name	    = "chdir",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* filename */ }, },
-	{ .name	    = "chmod",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* filename */ }, },
-	{ .name	    = "chroot",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* filename */ }, },
+	{ .name	    = "chdir",	    .errmsg = true, },
+	{ .name	    = "chmod",	    .errmsg = true, },
+	{ .name	    = "chroot",	    .errmsg = true, },
 	{ .name     = "clock_gettime",  .errmsg = true, STRARRAY(0, clk_id, clockid), },
 	{ .name	    = "clone",	    .errpid = true, },
 	{ .name	    = "close",	    .errmsg = true,
 	  .arg_scnprintf = { [0] = SCA_CLOSE_FD, /* fd */ }, },
 	{ .name	    = "connect",    .errmsg = true, },
-	{ .name	    = "creat",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
-	{ .name	    = "dup",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
-	{ .name	    = "dup2",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
-	{ .name	    = "dup3",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
+	{ .name	    = "creat",	    .errmsg = true, },
+	{ .name	    = "dup",	    .errmsg = true, },
+	{ .name	    = "dup2",	    .errmsg = true, },
+	{ .name	    = "dup3",	    .errmsg = true, },
 	{ .name	    = "epoll_ctl",  .errmsg = true, STRARRAY(1, op, epoll_ctl_ops), },
 	{ .name	    = "eventfd2",   .errmsg = true,
 	  .arg_scnprintf = { [1] = SCA_EFD_FLAGS, /* flags */ }, },
-	{ .name	    = "faccessat",  .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */
-			     [1] = SCA_FILENAME, /* filename */ }, },
-	{ .name	    = "fadvise64",  .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
-	{ .name	    = "fallocate",  .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
-	{ .name	    = "fchdir",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
-	{ .name	    = "fchmod",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
+	{ .name	    = "faccessat",  .errmsg = true, },
+	{ .name	    = "fadvise64",  .errmsg = true, },
+	{ .name	    = "fallocate",  .errmsg = true, },
+	{ .name	    = "fchdir",	    .errmsg = true, },
+	{ .name	    = "fchmod",	    .errmsg = true, },
 	{ .name	    = "fchmodat",   .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FDAT, /* fd */
-			     [1] = SCA_FILENAME, /* filename */ }, },
-	{ .name	    = "fchown",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
+	  .arg_scnprintf = { [0] = SCA_FDAT, /* fd */ }, },
+	{ .name	    = "fchown",	    .errmsg = true, },
 	{ .name	    = "fchownat",   .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FDAT, /* fd */
-			     [1] = SCA_FILENAME, /* filename */ }, },
+	  .arg_scnprintf = { [0] = SCA_FDAT, /* fd */ }, },
 	{ .name	    = "fcntl",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */
-			     [1] = SCA_STRARRAY, /* cmd */ },
+	  .arg_scnprintf = { [1] = SCA_STRARRAY, /* cmd */ },
 	  .arg_parm	 = { [1] = &strarray__fcntl_cmds, /* cmd */ }, },
-	{ .name	    = "fdatasync",  .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
+	{ .name	    = "fdatasync",  .errmsg = true, },
 	{ .name	    = "flock",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */
-			     [1] = SCA_FLOCK, /* cmd */ }, },
-	{ .name	    = "fsetxattr",  .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
-	{ .name	    = "fstat",	    .errmsg = true, .alias = "newfstat",
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
-	{ .name	    = "fstatat",    .errmsg = true, .alias = "newfstatat",
-	  .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */
-			     [1] = SCA_FILENAME, /* filename */ }, },
-	{ .name	    = "fstatfs",    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
-	{ .name	    = "fsync",    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
-	{ .name	    = "ftruncate", .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
+	  .arg_scnprintf = { [1] = SCA_FLOCK, /* cmd */ }, },
+	{ .name	    = "fsetxattr",  .errmsg = true, },
+	{ .name	    = "fstat",	    .errmsg = true, .alias = "newfstat", },
+	{ .name	    = "fstatat",    .errmsg = true, .alias = "newfstatat", },
+	{ .name	    = "fstatfs",    .errmsg = true, },
+	{ .name	    = "fsync",    .errmsg = true, },
+	{ .name	    = "ftruncate", .errmsg = true, },
 	{ .name	    = "futex",	    .errmsg = true,
 	  .arg_scnprintf = { [1] = SCA_FUTEX_OP, /* op */ }, },
 	{ .name	    = "futimesat", .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FDAT, /* fd */
-			     [1] = SCA_FILENAME, /* filename */ }, },
-	{ .name	    = "getdents",   .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
-	{ .name	    = "getdents64", .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
+	  .arg_scnprintf = { [0] = SCA_FDAT, /* fd */ }, },
+	{ .name	    = "getdents",   .errmsg = true, },
+	{ .name	    = "getdents64", .errmsg = true, },
 	{ .name	    = "getitimer",  .errmsg = true, STRARRAY(0, which, itimers), },
 	{ .name	    = "getpid",	    .errpid = true, },
 	{ .name	    = "getpgid",    .errpid = true, },
@@ -661,12 +631,10 @@ static struct syscall_fmt {
 	{ .name	    = "getrandom",  .errmsg = true,
 	  .arg_scnprintf = { [2] = SCA_GETRANDOM_FLAGS, /* flags */ }, },
 	{ .name	    = "getrlimit",  .errmsg = true, STRARRAY(0, resource, rlimit_resources), },
-	{ .name	    = "getxattr",    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
-	{ .name	    = "inotify_add_watch",	    .errmsg = true,
-	  .arg_scnprintf = { [1] = SCA_FILENAME, /* pathname */ }, },
+	{ .name	    = "getxattr",   .errmsg = true, },
+	{ .name	    = "inotify_add_watch",	    .errmsg = true, },
 	{ .name	    = "ioctl",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */
+	  .arg_scnprintf = {
 #if defined(__i386__) || defined(__x86_64__)
 /*
  * FIXME: Make this available to all arches.
@@ -680,41 +648,28 @@ static struct syscall_fmt {
 	{ .name	    = "keyctl",	    .errmsg = true, STRARRAY(0, option, keyctl_options), },
 	{ .name	    = "kill",	    .errmsg = true,
 	  .arg_scnprintf = { [1] = SCA_SIGNUM, /* sig */ }, },
-	{ .name	    = "lchown",    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* filename */ }, },
-	{ .name	    = "lgetxattr",  .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
+	{ .name	    = "lchown",    .errmsg = true, },
+	{ .name	    = "lgetxattr",  .errmsg = true, },
 	{ .name	    = "linkat",	    .errmsg = true,
 	  .arg_scnprintf = { [0] = SCA_FDAT, /* fd */ }, },
-	{ .name	    = "listxattr",  .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
-	{ .name	    = "llistxattr", .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
-	{ .name	    = "lremovexattr",  .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
+	{ .name	    = "listxattr",  .errmsg = true, },
+	{ .name	    = "llistxattr", .errmsg = true, },
+	{ .name	    = "lremovexattr",  .errmsg = true, },
 	{ .name	    = "lseek",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */
-			     [2] = SCA_STRARRAY, /* whence */ },
+	  .arg_scnprintf = { [2] = SCA_STRARRAY, /* whence */ },
 	  .arg_parm	 = { [2] = &strarray__whences, /* whence */ }, },
-	{ .name	    = "lsetxattr",  .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
-	{ .name	    = "lstat",	    .errmsg = true, .alias = "newlstat",
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* filename */ }, },
-	{ .name	    = "lsxattr",    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
+	{ .name	    = "lsetxattr",  .errmsg = true, },
+	{ .name	    = "lstat",	    .errmsg = true, .alias = "newlstat", },
+	{ .name	    = "lsxattr",    .errmsg = true, },
 	{ .name     = "madvise",    .errmsg = true,
 	  .arg_scnprintf = { [0] = SCA_HEX,	 /* start */
 			     [2] = SCA_MADV_BHV, /* behavior */ }, },
-	{ .name	    = "mkdir",    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
+	{ .name	    = "mkdir",    .errmsg = true, },
 	{ .name	    = "mkdirat",    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FDAT, /* fd */
-			     [1] = SCA_FILENAME, /* pathname */ }, },
-	{ .name	    = "mknod",      .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* filename */ }, },
+	  .arg_scnprintf = { [0] = SCA_FDAT, /* fd */ }, },
+	{ .name	    = "mknod",      .errmsg = true, },
 	{ .name	    = "mknodat",    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FDAT, /* fd */
-			     [1] = SCA_FILENAME, /* filename */ }, },
+	  .arg_scnprintf = { [0] = SCA_FDAT, /* fd */ }, },
 	{ .name	    = "mlock",	    .errmsg = true,
 	  .arg_scnprintf = { [0] = SCA_HEX, /* addr */ }, },
 	{ .name	    = "mlockall",   .errmsg = true,
@@ -722,8 +677,7 @@ static struct syscall_fmt {
 	{ .name	    = "mmap",	    .hexret = true,
 	  .arg_scnprintf = { [0] = SCA_HEX,	  /* addr */
 			     [2] = SCA_MMAP_PROT, /* prot */
-			     [3] = SCA_MMAP_FLAGS, /* flags */
-			     [4] = SCA_FD, 	  /* fd */ }, },
+			     [3] = SCA_MMAP_FLAGS, /* flags */ }, },
 	{ .name	    = "mprotect",   .errmsg = true,
 	  .arg_scnprintf = { [0] = SCA_HEX, /* start */
 			     [2] = SCA_MMAP_PROT, /* prot */ }, },
@@ -740,17 +694,14 @@ static struct syscall_fmt {
 	{ .name	    = "name_to_handle_at", .errmsg = true,
 	  .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, },
 	{ .name	    = "newfstatat", .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */
-			     [1] = SCA_FILENAME, /* filename */ }, },
+	  .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, },
 	{ .name	    = "open",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME,   /* filename */
-			     [1] = SCA_OPEN_FLAGS, /* flags */ }, },
+	  .arg_scnprintf = { [1] = SCA_OPEN_FLAGS, /* flags */ }, },
 	{ .name	    = "open_by_handle_at", .errmsg = true,
 	  .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */
 			     [2] = SCA_OPEN_FLAGS, /* flags */ }, },
 	{ .name	    = "openat",	    .errmsg = true,
 	  .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */
-			     [1] = SCA_FILENAME, /* filename */
 			     [2] = SCA_OPEN_FLAGS, /* flags */ }, },
 	{ .name	    = "perf_event_open", .errmsg = true,
 	  .arg_scnprintf = { [2] = SCA_INT, /* cpu */
@@ -760,39 +711,26 @@ static struct syscall_fmt {
 	  .arg_scnprintf = { [1] = SCA_PIPE_FLAGS, /* flags */ }, },
 	{ .name	    = "poll",	    .errmsg = true, .timeout = true, },
 	{ .name	    = "ppoll",	    .errmsg = true, .timeout = true, },
-	{ .name	    = "pread",	    .errmsg = true, .alias = "pread64",
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
-	{ .name	    = "preadv",	    .errmsg = true, .alias = "pread",
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
+	{ .name	    = "pread",	    .errmsg = true, .alias = "pread64", },
+	{ .name	    = "preadv",	    .errmsg = true, .alias = "pread", },
 	{ .name	    = "prlimit64",  .errmsg = true, STRARRAY(1, resource, rlimit_resources), },
-	{ .name	    = "pwrite",	    .errmsg = true, .alias = "pwrite64",
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
-	{ .name	    = "pwritev",    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
-	{ .name	    = "read",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
-	{ .name	    = "readlink",   .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* path */ }, },
+	{ .name	    = "pwrite",	    .errmsg = true, .alias = "pwrite64", },
+	{ .name	    = "pwritev",    .errmsg = true, },
+	{ .name	    = "read",	    .errmsg = true, },
+	{ .name	    = "readlink",   .errmsg = true, },
 	{ .name	    = "readlinkat", .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */
-			     [1] = SCA_FILENAME, /* pathname */ }, },
-	{ .name	    = "readv",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
+	  .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, },
+	{ .name	    = "readv",	    .errmsg = true, },
 	{ .name	    = "recvfrom",   .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */
-			     [3] = SCA_MSG_FLAGS, /* flags */ }, },
+	  .arg_scnprintf = { [3] = SCA_MSG_FLAGS, /* flags */ }, },
 	{ .name	    = "recvmmsg",   .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */
-			     [3] = SCA_MSG_FLAGS, /* flags */ }, },
+	  .arg_scnprintf = { [3] = SCA_MSG_FLAGS, /* flags */ }, },
 	{ .name	    = "recvmsg",    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */
-			     [2] = SCA_MSG_FLAGS, /* flags */ }, },
-	{ .name	    = "removexattr", .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
+	  .arg_scnprintf = { [2] = SCA_MSG_FLAGS, /* flags */ }, },
+	{ .name	    = "removexattr", .errmsg = true, },
 	{ .name	    = "renameat",   .errmsg = true,
 	  .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, },
-	{ .name	    = "rmdir",    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
+	{ .name	    = "rmdir",    .errmsg = true, },
 	{ .name	    = "rt_sigaction", .errmsg = true,
 	  .arg_scnprintf = { [0] = SCA_SIGNUM, /* sig */ }, },
 	{ .name	    = "rt_sigprocmask",  .errmsg = true, STRARRAY(0, how, sighow), },
@@ -807,22 +745,17 @@ static struct syscall_fmt {
 			     [1] = SCA_SECCOMP_FLAGS, /* flags */ }, },
 	{ .name	    = "select",	    .errmsg = true, .timeout = true, },
 	{ .name	    = "sendmmsg",    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */
-			     [3] = SCA_MSG_FLAGS, /* flags */ }, },
+	  .arg_scnprintf = { [3] = SCA_MSG_FLAGS, /* flags */ }, },
 	{ .name	    = "sendmsg",    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */
-			     [2] = SCA_MSG_FLAGS, /* flags */ }, },
+	  .arg_scnprintf = { [2] = SCA_MSG_FLAGS, /* flags */ }, },
 	{ .name	    = "sendto",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */
-			     [3] = SCA_MSG_FLAGS, /* flags */ }, },
+	  .arg_scnprintf = { [3] = SCA_MSG_FLAGS, /* flags */ }, },
 	{ .name	    = "set_tid_address", .errpid = true, },
 	{ .name	    = "setitimer",  .errmsg = true, STRARRAY(0, which, itimers), },
 	{ .name	    = "setpgid",    .errmsg = true, },
 	{ .name	    = "setrlimit",  .errmsg = true, STRARRAY(0, resource, rlimit_resources), },
-	{ .name	    = "setxattr",   .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
-	{ .name	    = "shutdown",   .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
+	{ .name	    = "setxattr",   .errmsg = true, },
+	{ .name	    = "shutdown",   .errmsg = true, },
 	{ .name	    = "socket",	    .errmsg = true,
 	  .arg_scnprintf = { [0] = SCA_STRARRAY, /* family */
 			     [1] = SCA_SK_TYPE, /* type */ },
@@ -831,10 +764,8 @@ static struct syscall_fmt {
 	  .arg_scnprintf = { [0] = SCA_STRARRAY, /* family */
 			     [1] = SCA_SK_TYPE, /* type */ },
 	  .arg_parm	 = { [0] = &strarray__socket_families, /* family */ }, },
-	{ .name	    = "stat",	    .errmsg = true, .alias = "newstat",
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
-	{ .name	    = "statfs",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
+	{ .name	    = "stat",	    .errmsg = true, .alias = "newstat", },
+	{ .name	    = "statfs",	    .errmsg = true, },
 	{ .name	    = "swapoff",    .errmsg = true,
 	  .arg_scnprintf = { [0] = SCA_FILENAME, /* specialfile */ }, },
 	{ .name	    = "swapon",	    .errmsg = true,
@@ -845,29 +776,21 @@ static struct syscall_fmt {
 	  .arg_scnprintf = { [2] = SCA_SIGNUM, /* sig */ }, },
 	{ .name	    = "tkill",	    .errmsg = true,
 	  .arg_scnprintf = { [1] = SCA_SIGNUM, /* sig */ }, },
-	{ .name	    = "truncate",   .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* path */ }, },
+	{ .name	    = "truncate",   .errmsg = true, },
 	{ .name	    = "uname",	    .errmsg = true, .alias = "newuname", },
 	{ .name	    = "unlinkat",   .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */
-			     [1] = SCA_FILENAME, /* pathname */ }, },
-	{ .name	    = "utime",  .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* filename */ }, },
+	  .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, },
+	{ .name	    = "utime",  .errmsg = true, },
 	{ .name	    = "utimensat",  .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FDAT, /* dirfd */
-			     [1] = SCA_FILENAME, /* filename */ }, },
-	{ .name	    = "utimes",  .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FILENAME, /* filename */ }, },
-	{ .name	    = "vmsplice",  .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
+	  .arg_scnprintf = { [0] = SCA_FDAT, /* dirfd */ }, },
+	{ .name	    = "utimes",  .errmsg = true, },
+	{ .name	    = "vmsplice",  .errmsg = true, },
 	{ .name	    = "wait4",	    .errpid = true,
 	  .arg_scnprintf = { [2] = SCA_WAITID_OPTIONS, /* options */ }, },
 	{ .name	    = "waitid",	    .errpid = true,
 	  .arg_scnprintf = { [3] = SCA_WAITID_OPTIONS, /* options */ }, },
-	{ .name	    = "write",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
-	{ .name	    = "writev",	    .errmsg = true,
-	  .arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
+	{ .name	    = "write",	    .errmsg = true, },
+	{ .name	    = "writev",	    .errmsg = true, },
 };
 
 static int syscall_fmt__cmp(const void *name, const void *fmtp)
@@ -1160,6 +1083,24 @@ static int trace__tool_process(struct perf_tool *tool,
 	return trace__process_event(trace, machine, event, sample);
 }
 
+static char *trace__machine__resolve_kernel_addr(void *vmachine, unsigned long long *addrp, char **modp)
+{
+	struct machine *machine = vmachine;
+
+	if (machine->kptr_restrict_warned)
+		return NULL;
+
+	if (symbol_conf.kptr_restrict) {
+		pr_warning("Kernel address maps (/proc/{kallsyms,modules}) are restricted.\n\n"
+			   "Check /proc/sys/kernel/kptr_restrict.\n\n"
+			   "Kernel samples will not be resolved.\n");
+		machine->kptr_restrict_warned = true;
+		return NULL;
+	}
+
+	return machine__resolve_kernel_addr(vmachine, addrp, modp);
+}
+
 static int trace__symbols_init(struct trace *trace, struct perf_evlist *evlist)
 {
 	int err = symbol__init(NULL);
@@ -1171,7 +1112,7 @@ static int trace__symbols_init(struct trace *trace, struct perf_evlist *evlist)
 	if (trace->host == NULL)
 		return -ENOMEM;
 
-	if (trace_event__register_resolver(trace->host, machine__resolve_kernel_addr) < 0)
+	if (trace_event__register_resolver(trace->host, trace__machine__resolve_kernel_addr) < 0)
 		return -errno;
 
 	err = __machine__synthesize_threads(trace->host, &trace->tool, &trace->opts.target,
@@ -1186,7 +1127,7 @@ static int trace__symbols_init(struct trace *trace, struct perf_evlist *evlist)
 static int syscall__set_arg_fmts(struct syscall *sc)
 {
 	struct format_field *field;
-	int idx = 0;
+	int idx = 0, len;
 
 	sc->arg_scnprintf = calloc(sc->nr_args, sizeof(void *));
 	if (sc->arg_scnprintf == NULL)
@@ -1198,12 +1139,31 @@ static int syscall__set_arg_fmts(struct syscall *sc)
 	for (field = sc->args; field; field = field->next) {
 		if (sc->fmt && sc->fmt->arg_scnprintf[idx])
 			sc->arg_scnprintf[idx] = sc->fmt->arg_scnprintf[idx];
+		else if (strcmp(field->type, "const char *") == 0 &&
+			 (strcmp(field->name, "filename") == 0 ||
+			  strcmp(field->name, "path") == 0 ||
+			  strcmp(field->name, "pathname") == 0))
+			sc->arg_scnprintf[idx] = SCA_FILENAME;
 		else if (field->flags & FIELD_IS_POINTER)
 			sc->arg_scnprintf[idx] = syscall_arg__scnprintf_hex;
 		else if (strcmp(field->type, "pid_t") == 0)
 			sc->arg_scnprintf[idx] = SCA_PID;
 		else if (strcmp(field->type, "umode_t") == 0)
 			sc->arg_scnprintf[idx] = SCA_MODE_T;
+		else if ((strcmp(field->type, "int") == 0 ||
+			  strcmp(field->type, "unsigned int") == 0 ||
+			  strcmp(field->type, "long") == 0) &&
+			 (len = strlen(field->name)) >= 2 &&
+			 strcmp(field->name + len - 2, "fd") == 0) {
+			/*
+			 * /sys/kernel/tracing/events/syscalls/sys_enter*
+			 * egrep 'field:.*fd;' .../format|sed -r 's/.*field:([a-z ]+) [a-z_]*fd.+/\1/g'|sort|uniq -c
+			 * 65 int
+			 * 23 unsigned int
+			 * 7 unsigned long
+			 */
+			sc->arg_scnprintf[idx] = SCA_FD;
+		}
 		++idx;
 	}
 
@@ -1534,7 +1494,7 @@ static int trace__sys_enter(struct trace *trace, struct perf_evsel *evsel,
 	if (sc->is_exit) {
 		if (!(trace->duration_filter || trace->summary_only || trace->min_stack)) {
 			trace__fprintf_entry_head(trace, thread, 1, sample->time, trace->output);
-			fprintf(trace->output, "%-70s\n", ttrace->entry_str);
+			fprintf(trace->output, "%-70s)\n", ttrace->entry_str);
 		}
 	} else {
 		ttrace->entry_pending = true;
@@ -2887,12 +2847,12 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused)
 		mmap_pages_user_set = false;
 
 	if (trace.max_stack == UINT_MAX) {
-		trace.max_stack = sysctl_perf_event_max_stack;
+		trace.max_stack = input_name ? PERF_MAX_STACK_DEPTH : sysctl_perf_event_max_stack;
 		max_stack_user_set = false;
 	}
 
 #ifdef HAVE_DWARF_UNWIND_SUPPORT
-	if ((trace.min_stack || max_stack_user_set) && !callchain_param.enabled)
+	if ((trace.min_stack || max_stack_user_set) && !callchain_param.enabled && trace.trace_syscalls)
 		record_opts__parse_callchain(&trace.opts, &callchain_param, "dwarf", false);
 #endif
 
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index 797000842d40..15982cee5ef3 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -549,6 +549,9 @@ int main(int argc, const char **argv)
 	if (sysctl__read_int("kernel/perf_event_max_stack", &value) == 0)
 		sysctl_perf_event_max_stack = value;
 
+	if (sysctl__read_int("kernel/perf_event_max_contexts_per_stack", &value) == 0)
+		sysctl_perf_event_max_contexts_per_stack = value;
+
 	cmd = extract_argv0_path(argv[0]);
 	if (!cmd)
 		cmd = "perf-help";
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 4db73d5a0dbc..7e5a1e8874ce 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -354,9 +354,6 @@ static struct ins_ops nop_ops = {
 	.scnprintf = nop__scnprintf,
 };
 
-/*
- * Must be sorted by name!
- */
 static struct ins instructions[] = {
 	{ .name = "add",   .ops  = &mov_ops, },
 	{ .name = "addl",  .ops  = &mov_ops, },
@@ -372,8 +369,8 @@ static struct ins instructions[] = {
 	{ .name = "bgt",   .ops  = &jump_ops, },
 	{ .name = "bhi",   .ops  = &jump_ops, },
 	{ .name = "bl",    .ops  = &call_ops, },
-	{ .name = "blt",   .ops  = &jump_ops, },
 	{ .name = "bls",   .ops  = &jump_ops, },
+	{ .name = "blt",   .ops  = &jump_ops, },
 	{ .name = "blx",   .ops  = &call_ops, },
 	{ .name = "bne",   .ops  = &jump_ops, },
 #endif
@@ -449,18 +446,39 @@ static struct ins instructions[] = {
 	{ .name = "xbeginq", .ops  = &jump_ops, },
 };
 
-static int ins__cmp(const void *name, const void *insp)
+static int ins__key_cmp(const void *name, const void *insp)
 {
 	const struct ins *ins = insp;
 
 	return strcmp(name, ins->name);
 }
 
+static int ins__cmp(const void *a, const void *b)
+{
+	const struct ins *ia = a;
+	const struct ins *ib = b;
+
+	return strcmp(ia->name, ib->name);
+}
+
+static void ins__sort(void)
+{
+	const int nmemb = ARRAY_SIZE(instructions);
+
+	qsort(instructions, nmemb, sizeof(struct ins), ins__cmp);
+}
+
 static struct ins *ins__find(const char *name)
 {
 	const int nmemb = ARRAY_SIZE(instructions);
+	static bool sorted;
+
+	if (!sorted) {
+		ins__sort();
+		sorted = true;
+	}
 
-	return bsearch(name, instructions, nmemb, sizeof(struct ins), ins__cmp);
+	return bsearch(name, instructions, nmemb, sizeof(struct ins), ins__key_cmp);
 }
 
 int symbol__annotate_init(struct map *map __maybe_unused, struct symbol *sym)
@@ -1122,7 +1140,7 @@ int symbol__annotate(struct symbol *sym, struct map *map, size_t privsize)
 	} else if (dso__is_kcore(dso)) {
 		goto fallback;
 	} else if (readlink(symfs_filename, command, sizeof(command)) < 0 ||
-		   strstr(command, "[kernel.kallsyms]") ||
+		   strstr(command, DSO__NAME_KALLSYMS) ||
 		   access(symfs_filename, R_OK)) {
 		free(filename);
 fallback:
diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index bff425e1232c..67e5966503b2 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -256,7 +256,7 @@ static int machine__write_buildid_table(struct machine *machine, int fd)
 		size_t name_len;
 		bool in_kernel = false;
 
-		if (!pos->hit)
+		if (!pos->hit && !dso__is_vdso(pos))
 			continue;
 
 		if (dso__is_vdso(pos)) {
diff --git a/tools/perf/util/db-export.c b/tools/perf/util/db-export.c
index 8d96c80cc67e..c9a6dc173e74 100644
--- a/tools/perf/util/db-export.c
+++ b/tools/perf/util/db-export.c
@@ -298,8 +298,7 @@ static struct call_path *call_path_from_sample(struct db_export *dbe,
 	 */
 	callchain_param.order = ORDER_CALLER;
 	err = thread__resolve_callchain(thread, &callchain_cursor, evsel,
-					sample, NULL, NULL,
-					sysctl_perf_event_max_stack);
+					sample, NULL, NULL, PERF_MAX_STACK_DEPTH);
 	if (err) {
 		callchain_param.order = saved_order;
 		return NULL;
diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 3357479082ca..5d286f5d7906 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -7,6 +7,7 @@
 #include "auxtrace.h"
 #include "util.h"
 #include "debug.h"
+#include "vdso.h"
 
 char dso__symtab_origin(const struct dso *dso)
 {
@@ -62,9 +63,7 @@ int dso__read_binary_type_filename(const struct dso *dso,
 		}
 		break;
 	case DSO_BINARY_TYPE__BUILD_ID_CACHE:
-		/* skip the locally configured cache if a symfs is given */
-		if (symbol_conf.symfs[0] ||
-		    (dso__build_id_filename(dso, filename, size) == NULL))
+		if (dso__build_id_filename(dso, filename, size) == NULL)
 			ret = -1;
 		break;
 
@@ -1169,7 +1168,7 @@ bool __dsos__read_build_ids(struct list_head *head, bool with_hits)
 	struct dso *pos;
 
 	list_for_each_entry(pos, head, node) {
-		if (with_hits && !pos->hit)
+		if (with_hits && !pos->hit && !dso__is_vdso(pos))
 			continue;
 		if (pos->has_build_id) {
 			have_build_id = true;
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index c4bfe11479a0..e82ba90cc969 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -44,6 +44,7 @@ void perf_evlist__init(struct perf_evlist *evlist, struct cpu_map *cpus,
 	perf_evlist__set_maps(evlist, cpus, threads);
 	fdarray__init(&evlist->pollfd, 64);
 	evlist->workload.pid = -1;
+	evlist->backward = false;
 }
 
 struct perf_evlist *perf_evlist__new(void)
@@ -679,6 +680,33 @@ static struct perf_evsel *perf_evlist__event2evsel(struct perf_evlist *evlist,
 	return NULL;
 }
 
+static int perf_evlist__set_paused(struct perf_evlist *evlist, bool value)
+{
+	int i;
+
+	for (i = 0; i < evlist->nr_mmaps; i++) {
+		int fd = evlist->mmap[i].fd;
+		int err;
+
+		if (fd < 0)
+			continue;
+		err = ioctl(fd, PERF_EVENT_IOC_PAUSE_OUTPUT, value ? 1 : 0);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+int perf_evlist__pause(struct perf_evlist *evlist)
+{
+	return perf_evlist__set_paused(evlist, true);
+}
+
+int perf_evlist__resume(struct perf_evlist *evlist)
+{
+	return perf_evlist__set_paused(evlist, false);
+}
+
 /* When check_messup is true, 'end' must points to a good entry */
 static union perf_event *
 perf_mmap__read(struct perf_mmap *md, bool check_messup, u64 start,
@@ -881,6 +909,7 @@ static void __perf_evlist__munmap(struct perf_evlist *evlist, int idx)
 	if (evlist->mmap[idx].base != NULL) {
 		munmap(evlist->mmap[idx].base, evlist->mmap_len);
 		evlist->mmap[idx].base = NULL;
+		evlist->mmap[idx].fd = -1;
 		atomic_set(&evlist->mmap[idx].refcnt, 0);
 	}
 	auxtrace_mmap__munmap(&evlist->mmap[idx].auxtrace_mmap);
@@ -901,10 +930,14 @@ void perf_evlist__munmap(struct perf_evlist *evlist)
 
 static int perf_evlist__alloc_mmap(struct perf_evlist *evlist)
 {
+	int i;
+
 	evlist->nr_mmaps = cpu_map__nr(evlist->cpus);
 	if (cpu_map__empty(evlist->cpus))
 		evlist->nr_mmaps = thread_map__nr(evlist->threads);
 	evlist->mmap = zalloc(evlist->nr_mmaps * sizeof(struct perf_mmap));
+	for (i = 0; i < evlist->nr_mmaps; i++)
+		evlist->mmap[i].fd = -1;
 	return evlist->mmap != NULL ? 0 : -ENOMEM;
 }
 
@@ -941,6 +974,7 @@ static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
 		evlist->mmap[idx].base = NULL;
 		return -1;
 	}
+	evlist->mmap[idx].fd = fd;
 
 	if (auxtrace_mmap__mmap(&evlist->mmap[idx].auxtrace_mmap,
 				&mp->auxtrace_mp, evlist->mmap[idx].base, fd))
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 85d1b59802e8..d740fb877ab6 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -28,6 +28,7 @@ struct record_opts;
 struct perf_mmap {
 	void		 *base;
 	int		 mask;
+	int		 fd;
 	atomic_t	 refcnt;
 	u64		 prev;
 	struct auxtrace_mmap auxtrace_mmap;
@@ -43,6 +44,7 @@ struct perf_evlist {
 	bool		 overwrite;
 	bool		 enabled;
 	bool		 has_user_cpus;
+	bool		 backward;
 	size_t		 mmap_len;
 	int		 id_pos;
 	int		 is_pos;
@@ -135,6 +137,8 @@ void perf_evlist__mmap_read_catchup(struct perf_evlist *evlist, int idx);
 
 void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx);
 
+int perf_evlist__pause(struct perf_evlist *evlist);
+int perf_evlist__resume(struct perf_evlist *evlist);
 int perf_evlist__open(struct perf_evlist *evlist);
 void perf_evlist__close(struct perf_evlist *evlist);
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 964c7c3602c0..02c177d14c8d 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -37,6 +37,7 @@ static struct {
 	bool clockid;
 	bool clockid_wrong;
 	bool lbr_flags;
+	bool write_backward;
 } perf_missing_features;
 
 static clockid_t clockid;
@@ -1376,6 +1377,8 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
 	if (perf_missing_features.lbr_flags)
 		evsel->attr.branch_sample_type &= ~(PERF_SAMPLE_BRANCH_NO_FLAGS |
 				     PERF_SAMPLE_BRANCH_NO_CYCLES);
+	if (perf_missing_features.write_backward)
+		evsel->attr.write_backward = false;
 retry_sample_id:
 	if (perf_missing_features.sample_id_all)
 		evsel->attr.sample_id_all = 0;
@@ -1438,6 +1441,12 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
 				err = -EINVAL;
 				goto out_close;
 			}
+
+			if (evsel->overwrite &&
+			    perf_missing_features.write_backward) {
+				err = -EINVAL;
+				goto out_close;
+			}
 		}
 	}
 
@@ -1500,6 +1509,10 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
 			  PERF_SAMPLE_BRANCH_NO_FLAGS))) {
 		perf_missing_features.lbr_flags = true;
 		goto fallback_missing_features;
+	} else if (!perf_missing_features.write_backward &&
+			evsel->attr.write_backward) {
+		perf_missing_features.write_backward = true;
+		goto fallback_missing_features;
 	}
 
 out_close:
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 8a644fef452c..c1f10159804c 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -112,6 +112,7 @@ struct perf_evsel {
 	bool			tracking;
 	bool			per_pkg;
 	bool			precise_max;
+	bool			overwrite;
 	/* parse modifier helper */
 	int			exclude_GH;
 	int			nr_members;
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index cfab531437c7..d1f19e0012d4 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -117,6 +117,13 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 			hists__new_col_len(hists, HISTC_SYMBOL_TO, symlen);
 			hists__set_unres_dso_col_len(hists, HISTC_DSO_TO);
 		}
+
+		if (h->branch_info->srcline_from)
+			hists__new_col_len(hists, HISTC_SRCLINE_FROM,
+					strlen(h->branch_info->srcline_from));
+		if (h->branch_info->srcline_to)
+			hists__new_col_len(hists, HISTC_SRCLINE_TO,
+					strlen(h->branch_info->srcline_to));
 	}
 
 	if (h->mem_info) {
@@ -1042,6 +1049,8 @@ void hist_entry__delete(struct hist_entry *he)
 	if (he->branch_info) {
 		map__zput(he->branch_info->from.map);
 		map__zput(he->branch_info->to.map);
+		free_srcline(he->branch_info->srcline_from);
+		free_srcline(he->branch_info->srcline_to);
 		zfree(&he->branch_info);
 	}
 
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 0f84bfb42bb1..7b54ccf1b737 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -52,6 +52,8 @@ enum hist_column {
 	HISTC_MEM_IADDR_SYMBOL,
 	HISTC_TRANSACTION,
 	HISTC_CYCLES,
+	HISTC_SRCLINE_FROM,
+	HISTC_SRCLINE_TO,
 	HISTC_TRACE,
 	HISTC_NR_COLS, /* Last entry */
 };
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 639a2903065e..205d27017361 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -43,6 +43,7 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 
 	machine->symbol_filter = NULL;
 	machine->id_hdr_size = 0;
+	machine->kptr_restrict_warned = false;
 	machine->comm_exec = false;
 	machine->kernel_start = 0;
 
@@ -709,7 +710,7 @@ static struct dso *machine__get_kernel(struct machine *machine)
 	if (machine__is_host(machine)) {
 		vmlinux_name = symbol_conf.vmlinux_name;
 		if (!vmlinux_name)
-			vmlinux_name = "[kernel.kallsyms]";
+			vmlinux_name = DSO__NAME_KALLSYMS;
 
 		kernel = machine__findnew_kernel(machine, vmlinux_name,
 						 "[kernel]", DSO_TYPE_KERNEL);
@@ -1135,10 +1136,10 @@ int machine__create_kernel_maps(struct machine *machine)
 {
 	struct dso *kernel = machine__get_kernel(machine);
 	const char *name;
-	u64 addr = machine__get_running_kernel_start(machine, &name);
+	u64 addr;
 	int ret;
 
-	if (!addr || kernel == NULL)
+	if (kernel == NULL)
 		return -1;
 
 	ret = __machine__create_kernel_maps(machine, kernel);
@@ -1160,8 +1161,9 @@ int machine__create_kernel_maps(struct machine *machine)
 	 */
 	map_groups__fixup_end(&machine->kmaps);
 
-	if (maps__set_kallsyms_ref_reloc_sym(machine->vmlinux_maps, name,
-					     addr)) {
+	addr = machine__get_running_kernel_start(machine, &name);
+	if (!addr) {
+	} else if (maps__set_kallsyms_ref_reloc_sym(machine->vmlinux_maps, name, addr)) {
 		machine__destroy_kernel_maps(machine);
 		return -1;
 	}
@@ -1769,11 +1771,6 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 		 */
 		int mix_chain_nr = i + 1 + lbr_nr + 1;
 
-		if (mix_chain_nr > (int)sysctl_perf_event_max_stack + PERF_MAX_BRANCH_DEPTH) {
-			pr_warning("corrupted callchain. skipping...\n");
-			return 0;
-		}
-
 		for (j = 0; j < mix_chain_nr; j++) {
 			if (callchain_param.order == ORDER_CALLEE) {
 				if (j < i + 1)
@@ -1811,9 +1808,9 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 {
 	struct branch_stack *branch = sample->branch_stack;
 	struct ip_callchain *chain = sample->callchain;
-	int chain_nr = min(max_stack, (int)chain->nr);
+	int chain_nr = chain->nr;
 	u8 cpumode = PERF_RECORD_MISC_USER;
-	int i, j, err;
+	int i, j, err, nr_entries;
 	int skip_idx = -1;
 	int first_call = 0;
 
@@ -1828,8 +1825,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 	 * Based on DWARF debug information, some architectures skip
 	 * a callchain entry saved by the kernel.
 	 */
-	if (chain->nr < sysctl_perf_event_max_stack)
-		skip_idx = arch_skip_callchain_idx(thread, chain);
+	skip_idx = arch_skip_callchain_idx(thread, chain);
 
 	/*
 	 * Add branches to call stack for easier browsing. This gives
@@ -1889,12 +1885,8 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 	}
 
 check_calls:
-	if (chain->nr > sysctl_perf_event_max_stack && (int)chain->nr > max_stack) {
-		pr_warning("corrupted callchain. skipping...\n");
-		return 0;
-	}
-
-	for (i = first_call; i < chain_nr; i++) {
+	for (i = first_call, nr_entries = 0;
+	     i < chain_nr && nr_entries < max_stack; i++) {
 		u64 ip;
 
 		if (callchain_param.order == ORDER_CALLEE)
@@ -1908,6 +1900,9 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 #endif
 		ip = chain->ips[j];
 
+		if (ip < PERF_CONTEXT_MAX)
+                       ++nr_entries;
+
 		err = add_callchain_ip(thread, cursor, parent, root_al, &cpumode, ip);
 
 		if (err)
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 83f46790c52f..41ac9cfd416b 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -28,6 +28,7 @@ struct machine {
 	pid_t		  pid;
 	u16		  id_hdr_size;
 	bool		  comm_exec;
+	bool		  kptr_restrict_warned;
 	char		  *root_dir;
 	struct rb_root	  threads;
 	pthread_rwlock_t  threads_lock;
diff --git a/tools/perf/util/scripting-engines/trace-event-perl.c b/tools/perf/util/scripting-engines/trace-event-perl.c
index 62c7f6988e0e..5d1eb1ccd96c 100644
--- a/tools/perf/util/scripting-engines/trace-event-perl.c
+++ b/tools/perf/util/scripting-engines/trace-event-perl.c
@@ -264,8 +264,7 @@ static SV *perl_process_callchain(struct perf_sample *sample,
 		goto exit;
 
 	if (thread__resolve_callchain(al->thread, &callchain_cursor, evsel,
-				      sample, NULL, NULL,
-				      sysctl_perf_event_max_stack) != 0) {
+				      sample, NULL, NULL, scripting_max_stack) != 0) {
 		pr_err("Failed to resolve callchain. Skipping\n");
 		goto exit;
 	}
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 20e69edd5006..c4e9bd70723c 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -353,6 +353,88 @@ struct sort_entry sort_srcline = {
 	.se_width_idx	= HISTC_SRCLINE,
 };
 
+/* --sort srcline_from */
+
+static int64_t
+sort__srcline_from_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	if (!left->branch_info->srcline_from) {
+		struct map *map = left->branch_info->from.map;
+		if (!map)
+			left->branch_info->srcline_from = SRCLINE_UNKNOWN;
+		else
+			left->branch_info->srcline_from = get_srcline(map->dso,
+					   map__rip_2objdump(map,
+							     left->branch_info->from.al_addr),
+							 left->branch_info->from.sym, true);
+	}
+	if (!right->branch_info->srcline_from) {
+		struct map *map = right->branch_info->from.map;
+		if (!map)
+			right->branch_info->srcline_from = SRCLINE_UNKNOWN;
+		else
+			right->branch_info->srcline_from = get_srcline(map->dso,
+					     map__rip_2objdump(map,
+							       right->branch_info->from.al_addr),
+						     right->branch_info->from.sym, true);
+	}
+	return strcmp(right->branch_info->srcline_from, left->branch_info->srcline_from);
+}
+
+static int hist_entry__srcline_from_snprintf(struct hist_entry *he, char *bf,
+					size_t size, unsigned int width)
+{
+	return repsep_snprintf(bf, size, "%-*.*s", width, width, he->branch_info->srcline_from);
+}
+
+struct sort_entry sort_srcline_from = {
+	.se_header	= "From Source:Line",
+	.se_cmp		= sort__srcline_from_cmp,
+	.se_snprintf	= hist_entry__srcline_from_snprintf,
+	.se_width_idx	= HISTC_SRCLINE_FROM,
+};
+
+/* --sort srcline_to */
+
+static int64_t
+sort__srcline_to_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	if (!left->branch_info->srcline_to) {
+		struct map *map = left->branch_info->to.map;
+		if (!map)
+			left->branch_info->srcline_to = SRCLINE_UNKNOWN;
+		else
+			left->branch_info->srcline_to = get_srcline(map->dso,
+					   map__rip_2objdump(map,
+							     left->branch_info->to.al_addr),
+							 left->branch_info->from.sym, true);
+	}
+	if (!right->branch_info->srcline_to) {
+		struct map *map = right->branch_info->to.map;
+		if (!map)
+			right->branch_info->srcline_to = SRCLINE_UNKNOWN;
+		else
+			right->branch_info->srcline_to = get_srcline(map->dso,
+					     map__rip_2objdump(map,
+							       right->branch_info->to.al_addr),
+						     right->branch_info->to.sym, true);
+	}
+	return strcmp(right->branch_info->srcline_to, left->branch_info->srcline_to);
+}
+
+static int hist_entry__srcline_to_snprintf(struct hist_entry *he, char *bf,
+					size_t size, unsigned int width)
+{
+	return repsep_snprintf(bf, size, "%-*.*s", width, width, he->branch_info->srcline_to);
+}
+
+struct sort_entry sort_srcline_to = {
+	.se_header	= "To Source:Line",
+	.se_cmp		= sort__srcline_to_cmp,
+	.se_snprintf	= hist_entry__srcline_to_snprintf,
+	.se_width_idx	= HISTC_SRCLINE_TO,
+};
+
 /* --sort srcfile */
 
 static char no_srcfile[1];
@@ -1347,6 +1429,8 @@ static struct sort_dimension bstack_sort_dimensions[] = {
 	DIM(SORT_IN_TX, "in_tx", sort_in_tx),
 	DIM(SORT_ABORT, "abort", sort_abort),
 	DIM(SORT_CYCLES, "cycles", sort_cycles),
+	DIM(SORT_SRCLINE_FROM, "srcline_from", sort_srcline_from),
+	DIM(SORT_SRCLINE_TO, "srcline_to", sort_srcline_to),
 };
 
 #undef DIM
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 42927f448bcb..ebb59cacd092 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -215,6 +215,8 @@ enum sort_type {
 	SORT_ABORT,
 	SORT_IN_TX,
 	SORT_CYCLES,
+	SORT_SRCLINE_FROM,
+	SORT_SRCLINE_TO,
 
 	/* memory mode specific sort keys */
 	__SORT_MEMORY_MODE,
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index fdb71961143e..aa9efe08762b 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -94,7 +94,8 @@ void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 *count,
 {
 	int ctx = evsel_context(counter);
 
-	if (perf_evsel__match(counter, SOFTWARE, SW_TASK_CLOCK))
+	if (perf_evsel__match(counter, SOFTWARE, SW_TASK_CLOCK) ||
+	    perf_evsel__match(counter, SOFTWARE, SW_CPU_CLOCK))
 		update_stats(&runtime_nsecs_stats[cpu], count[0]);
 	else if (perf_evsel__match(counter, HARDWARE, HW_CPU_CYCLES))
 		update_stats(&runtime_cycles_stats[ctx][cpu], count[0]);
@@ -188,7 +189,7 @@ static void print_stalled_cycles_backend(int cpu,
 
 	color = get_ratio_color(GRC_STALLED_CYCLES_BE, ratio);
 
-	out->print_metric(out->ctx, color, "%6.2f%%", "backend cycles idle", ratio);
+	out->print_metric(out->ctx, color, "%7.2f%%", "backend cycles idle", ratio);
 }
 
 static void print_branch_misses(int cpu,
@@ -444,7 +445,8 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 			ratio = total / avg;
 
 		print_metric(ctxp, NULL, "%8.0f", "cycles / elision", ratio);
-	} else if (perf_evsel__match(evsel, SOFTWARE, SW_TASK_CLOCK)) {
+	} else if (perf_evsel__match(evsel, SOFTWARE, SW_TASK_CLOCK) ||
+		   perf_evsel__match(evsel, SOFTWARE, SW_CPU_CLOCK)) {
 		if ((ratio = avg_stats(&walltime_nsecs_stats)) != 0)
 			print_metric(ctxp, NULL, "%8.3f", "CPUs utilized",
 				     avg / ratio);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 7fb33304fb4e..20f9cb32b703 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1662,8 +1662,8 @@ static char *dso__find_kallsyms(struct dso *dso, struct map *map)
 
 	build_id__sprintf(dso->build_id, sizeof(dso->build_id), sbuild_id);
 
-	scnprintf(path, sizeof(path), "%s/[kernel.kcore]/%s", buildid_dir,
-		  sbuild_id);
+	scnprintf(path, sizeof(path), "%s/%s/%s", buildid_dir,
+		  DSO__NAME_KCORE, sbuild_id);
 
 	/* Use /proc/kallsyms if possible */
 	if (is_host) {
@@ -1699,8 +1699,8 @@ static char *dso__find_kallsyms(struct dso *dso, struct map *map)
 	if (!find_matching_kcore(map, path, sizeof(path)))
 		return strdup(path);
 
-	scnprintf(path, sizeof(path), "%s/[kernel.kallsyms]/%s",
-		  buildid_dir, sbuild_id);
+	scnprintf(path, sizeof(path), "%s/%s/%s",
+		  buildid_dir, DSO__NAME_KALLSYMS, sbuild_id);
 
 	if (access(path, F_OK)) {
 		pr_err("No kallsyms or vmlinux with build-id %s was found\n",
@@ -1769,7 +1769,7 @@ static int dso__load_kernel_sym(struct dso *dso, struct map *map,
 
 	if (err > 0 && !dso__is_kcore(dso)) {
 		dso->binary_type = DSO_BINARY_TYPE__KALLSYMS;
-		dso__set_long_name(dso, "[kernel.kallsyms]", false);
+		dso__set_long_name(dso, DSO__NAME_KALLSYMS, false);
 		map__fixup_start(map);
 		map__fixup_end(map);
 	}
@@ -2033,3 +2033,26 @@ void symbol__exit(void)
 	symbol_conf.sym_list = symbol_conf.dso_list = symbol_conf.comm_list = NULL;
 	symbol_conf.initialized = false;
 }
+
+int symbol__config_symfs(const struct option *opt __maybe_unused,
+			 const char *dir, int unset __maybe_unused)
+{
+	char *bf = NULL;
+	int ret;
+
+	symbol_conf.symfs = strdup(dir);
+	if (symbol_conf.symfs == NULL)
+		return -ENOMEM;
+
+	/* skip the locally configured cache if a symfs is given, and
+	 * config buildid dir to symfs/.debug
+	 */
+	ret = asprintf(&bf, "%s/%s", dir, ".debug");
+	if (ret < 0)
+		return -ENOMEM;
+
+	set_buildid_dir(bf);
+
+	free(bf);
+	return 0;
+}
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 2b5e4ed76fcb..b10d558a8803 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -44,6 +44,9 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
 #define DMGL_ANSI        (1 << 1)       /* Include const, volatile, etc */
 #endif
 
+#define DSO__NAME_KALLSYMS	"[kernel.kallsyms]"
+#define DSO__NAME_KCORE		"[kernel.kcore]"
+
 /** struct symbol - symtab entry
  *
  * @ignore - resolvable but tools ignore it (e.g. idle routines)
@@ -183,6 +186,8 @@ struct branch_info {
 	struct addr_map_symbol from;
 	struct addr_map_symbol to;
 	struct branch_flags flags;
+	char			*srcline_from;
+	char			*srcline_to;
 };
 
 struct mem_info {
@@ -287,6 +292,8 @@ bool symbol_type__is_a(char symbol_type, enum map_type map_type);
 bool symbol__restricted_filename(const char *filename,
 				 const char *restricted_filename);
 bool symbol__is_idle(struct symbol *sym);
+int symbol__config_symfs(const struct option *opt __maybe_unused,
+			 const char *dir, int unset __maybe_unused);
 
 int dso__load_sym(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 		  struct symsrc *runtime_ss, symbol_filter_t filter,
diff --git a/tools/perf/util/top.h b/tools/perf/util/top.h
index f92c37abb0a8..b2940c88734a 100644
--- a/tools/perf/util/top.h
+++ b/tools/perf/util/top.h
@@ -27,7 +27,6 @@ struct perf_top {
 	int		   max_stack;
 	bool		   hide_kernel_symbols, hide_user_symbols, zero;
 	bool		   use_tui, use_stdio;
-	bool		   kptr_restrict_warned;
 	bool		   vmlinux_warned;
 	bool		   dump_symtab;
 	struct hist_entry  *sym_filter_entry;
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index eab077ad6ca9..23504ad5d6dd 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -33,7 +33,8 @@ struct callchain_param	callchain_param = {
 unsigned int page_size;
 int cacheline_size;
 
-unsigned int sysctl_perf_event_max_stack = PERF_MAX_STACK_DEPTH;
+int sysctl_perf_event_max_stack = PERF_MAX_STACK_DEPTH;
+int sysctl_perf_event_max_contexts_per_stack = PERF_MAX_CONTEXTS_PER_STACK;
 
 bool test_attr__enabled;
 
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 7651633a8dc7..1e8c3167b9fb 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -261,7 +261,8 @@ void sighandler_dump_stack(int sig);
 
 extern unsigned int page_size;
 extern int cacheline_size;
-extern unsigned int sysctl_perf_event_max_stack;
+extern int sysctl_perf_event_max_stack;
+extern int sysctl_perf_event_max_contexts_per_stack;
 
 struct parse_tag {
 	char tag;

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [GIT PULL] perf updates
@ 2015-07-04 11:25 Ingo Molnar
  0 siblings, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2015-07-04 11:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Andrew Morton

Linus,

Please pull the latest perf-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-urgent-for-linus

   # HEAD: b9df84fd7c05cc300d6d14f022b8a00773ebcf8c Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

This tree includes an x86 PMU scheduling fix, but most changes are late breaking 
tooling fixes and updates:

User visible fixes:

  - Create config.detected into OUTPUT directory, fixing parallel
    builds sharing the same source directory (Aaro Kiskinen)

  - Allow to specify custom linker command, fixing some MIPS64
    builds. (Aaro Kiskinen)

  - Fix to show proper convergence stats in 'perf bench numa' (Srikar Dronamraju)

User visible changes:

  - Validate syscall list passed via -e argument to 'perf trace'. (Arnaldo Carvalho de Melo)

  - Introduce 'perf stat --per-thread'. (Jiri Olsa)

  - Check access permission for --kallsyms and --vmlinux. (Li Zhang)

  - Move toggling event logic from 'perf top' and into hists browser, allowing
    freeze/unfreeze with event lists with more than one entry (Namhyung Kim)

  - Add missing newlines when dumping PERF_RECORD_FINISHED_ROUND and
    showing the Aggregated stats in 'perf report -D' (Adrian Hunter)

 Infrastructure fixes:

  - Add missing break for PERF_RECORD_ITRACE_START, which caused those events
    samples to be parsed as well as PERF_RECORD_LOST_SAMPLES. ITRACE_START only
    appears when Intel PT or BTS are present, so  (Jiri Olsa)

  - Call the perf_session destructor when bailing out in the inject, kmem, report,
    kvm and mem tools (Taeung Song)

 Infrastructure changes:

  - Move stuff out of 'perf stat' and into the lib for further use. (Jiri Olsa)

  - Reference count the cpu_map and thread_map classes. (Jiri Olsa)

  - Set evsel->{cpus,threads} from the evlist, if not set,
    allowing the generalization of some 'perf stat' functions that
    previously were accessing private static evlist variable. (Jiri Olsa)

  - Delete an unnecessary check before the calling
    free_event_desc() (Markus Elfring)

  - Allow auxtrace data alignment (Adrian Hunter)

  - Allow events with dot (Andi Kleen)

  - Fix failure to 'perf probe' events on arm (He Kuang)

  - Add testing for Makefile.perf (Jiri Olsa)

  - Add test for make install with prefix (Jiri Olsa)

  - Fix single target build dependency check (Jiri Olsa)

  - Access thread_map entries via accessors, prep patch to hold more info per
    entry, for ongoing 'perf stat --per-thread' work (Jiri Olsa)

  - Use __weak definition from compiler.h (Sukadev Bhattiprolu)

  - Split perf_pmu__new_alias() (Sukadev Bhattiprolu)

 Thanks,

	Ingo

------------------>
Aaro Koskinen (2):
      perf tools: Create config.detected into OUTPUT directory
      perf tools: Allow to specify custom linker command

Adrian Hunter (3):
      perf session: Print a newline when dumping PERF_RECORD_FINISHED_ROUND
      perf tools: Print a newline before dumping Aggregated stats
      perf tools: Allow auxtrace data alignment

Andi Kleen (1):
      perf tools: Allow events with dot

Arnaldo Carvalho de Melo (2):
      perf tools: Future-proof thread_map allocation size calculation
      perf trace: Validate syscall list passed via -e argument

He Kuang (1):
      perf probe: Fix failure to probe events on arm

Jiri Olsa (33):
      perf tests: Add testing for Makefile.perf
      perf tests: Add test for make install with prefix
      perf build: Fix single target build dependency check
      perf thread_map: Don't access the array entries directly
      perf thread_map: Change map entries into a struct
      perf tools: Add reference counting for cpu_map object
      perf tools: Add reference counting for thread_map object
      perf evlist: Propagate cpu maps to evsels in an evlist
      perf evlist: Propagate thread maps through the evlist
      perf tools: Make perf_evsel__(nr_)cpus generic
      perf thread_map: Introduce thread_map__reset function
      perf thrad_map: Add comm string into array
      perf tests: Add thread_map object tests
      perf stat: Introduce perf_counts function
      perf stat: Use xyarray for cpu evsel counts
      perf stat: Make stats work over the thread dimension
      perf stat: Rename struct perf_counts::cpu member to values
      perf stat: Introduce perf_evlist__reset_stats
      perf stat: Move perf_evsel__(alloc|free|reset)_stat_priv into stat object
      perf stat: Move perf_evsel__(alloc|free)_prev_raw_counts into stat object
      perf stat: Move perf_evlist__(alloc|free|reset)_stats into stat object
      perf stat: Introduce perf_evsel__alloc_stats function
      perf stat: Introduce perf_evsel__read function
      perf stat: Introduce read_counters function
      perf stat: Separate counters reading and processing
      perf stat: Move zero_per_pkg into counter process code
      perf stat: Move perf_stat initialization counter process code
      perf stat: Remove perf_evsel__read_cb function
      perf stat: Rename print_interval to process_interval
      perf stat: Using init_stats instead of memset
      perf stat: Introduce print_counters function
      perf stat: Introduce --per-thread option
      perf tools: Add missing break for PERF_RECORD_ITRACE_START

Li Zhang (1):
      perf symbols: Check access permission when reading symbol files

Markus Elfring (1):
      perf header: Delete an unnecessary check before the calling free_event_desc()

Namhyung Kim (1):
      perf top: Move toggling event logic into hists browser

Peter Zijlstra (1):
      perf/x86: Fix 'active_events' imbalance

Srikar Dronamraju (1):
      perf bench numa: Fix to show proper convergence stats

Sukadev Bhattiprolu (2):
      perf pmu: Use __weak definition from <linux/compiler.h>
      perf pmu: Split perf_pmu__new_alias()

Taeung Song (5):
      perf inject: Fill in the missing session freeing after an error occurs
      perf kmem: Fill in the missing session freeing after an error occurs
      perf report: Fill in the missing session freeing after an error occurs
      perf kvm: Fill in the missing session freeing after an error occurs
      perf mem: Fill in the missing session freeing after an error occurs


 arch/x86/kernel/cpu/perf_event.c            |  36 +--
 tools/build/Makefile.build                  |   2 +-
 tools/perf/Documentation/perf-stat.txt      |   4 +
 tools/perf/Makefile                         |   4 +-
 tools/perf/Makefile.perf                    |   4 +-
 tools/perf/builtin-inject.c                 |   7 +-
 tools/perf/builtin-kmem.c                   |   4 +-
 tools/perf/builtin-kvm.c                    |  14 +-
 tools/perf/builtin-mem.c                    |  16 +-
 tools/perf/builtin-report.c                 |  17 +-
 tools/perf/builtin-stat.c                   | 412 ++++++++++++++--------------
 tools/perf/builtin-top.c                    |  24 +-
 tools/perf/builtin-trace.c                  |  36 ++-
 tools/perf/config/Makefile                  |   6 +-
 tools/perf/tests/Build                      |   1 +
 tools/perf/tests/builtin-test.c             |   4 +
 tools/perf/tests/code-reading.c             |   4 +-
 tools/perf/tests/keep-tracking.c            |   4 +-
 tools/perf/tests/make                       |  31 ++-
 tools/perf/tests/mmap-basic.c               |   4 +-
 tools/perf/tests/mmap-thread-lookup.c       |   2 +-
 tools/perf/tests/openat-syscall-all-cpus.c  |   8 +-
 tools/perf/tests/openat-syscall-tp-fields.c |   2 +-
 tools/perf/tests/openat-syscall.c           |   6 +-
 tools/perf/tests/switch-tracking.c          |   4 +-
 tools/perf/tests/tests.h                    |   1 +
 tools/perf/tests/thread-map.c               |  38 +++
 tools/perf/ui/browsers/hists.c              |  19 +-
 tools/perf/util/auxtrace.c                  |  11 +-
 tools/perf/util/auxtrace.h                  |   1 +
 tools/perf/util/cloexec.c                   |   4 +
 tools/perf/util/cpumap.c                    |  26 +-
 tools/perf/util/cpumap.h                    |   6 +-
 tools/perf/util/event.c                     |   6 +-
 tools/perf/util/evlist.c                    |  39 ++-
 tools/perf/util/evlist.h                    |   1 -
 tools/perf/util/evsel.c                     |  28 +-
 tools/perf/util/evsel.h                     |  40 ++-
 tools/perf/util/header.c                    |   3 +-
 tools/perf/util/machine.c                   |   3 +-
 tools/perf/util/parse-events.c              |   5 +-
 tools/perf/util/parse-events.l              |   5 +-
 tools/perf/util/pmu.c                       |  45 +--
 tools/perf/util/probe-event.c               |   6 +-
 tools/perf/util/python-ext-sources          |   1 +
 tools/perf/util/python.c                    |   4 +-
 tools/perf/util/record.c                    |   4 +-
 tools/perf/util/session.c                   |   6 +-
 tools/perf/util/stat.c                      | 132 ++++++++-
 tools/perf/util/stat.h                      |  47 +++-
 tools/perf/util/svghelper.c                 |   2 +-
 tools/perf/util/symbol.c                    |   5 +-
 tools/perf/util/thread_map.c                | 132 ++++++++-
 tools/perf/util/thread_map.h                |  31 ++-
 54 files changed, 885 insertions(+), 422 deletions(-)
 create mode 100644 tools/perf/tests/thread-map.c

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 5801a14f7524..3658de47900f 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -357,34 +357,24 @@ void x86_release_hardware(void)
  */
 int x86_add_exclusive(unsigned int what)
 {
-	int ret = -EBUSY, i;
-
-	if (atomic_inc_not_zero(&x86_pmu.lbr_exclusive[what]))
-		return 0;
+	int i;
 
-	mutex_lock(&pmc_reserve_mutex);
-	for (i = 0; i < ARRAY_SIZE(x86_pmu.lbr_exclusive); i++) {
-		if (i != what && atomic_read(&x86_pmu.lbr_exclusive[i]))
-			goto out;
+	if (!atomic_inc_not_zero(&x86_pmu.lbr_exclusive[what])) {
+		mutex_lock(&pmc_reserve_mutex);
+		for (i = 0; i < ARRAY_SIZE(x86_pmu.lbr_exclusive); i++) {
+			if (i != what && atomic_read(&x86_pmu.lbr_exclusive[i]))
+				goto fail_unlock;
+		}
+		atomic_inc(&x86_pmu.lbr_exclusive[what]);
+		mutex_unlock(&pmc_reserve_mutex);
 	}
 
-	atomic_inc(&x86_pmu.lbr_exclusive[what]);
-	ret = 0;
+	atomic_inc(&active_events);
+	return 0;
 
-out:
+fail_unlock:
 	mutex_unlock(&pmc_reserve_mutex);
-
-	/*
-	 * Assuming that all exclusive events will share the PMI handler
-	 * (which checks active_events for whether there is work to do),
-	 * we can bump active_events counter right here, except for
-	 * x86_lbr_exclusive_lbr events that go through x86_pmu_event_init()
-	 * path, which already bumps active_events for them.
-	 */
-	if (!ret && what != x86_lbr_exclusive_lbr)
-		atomic_inc(&active_events);
-
-	return ret;
+	return -EBUSY;
 }
 
 void x86_del_exclusive(unsigned int what)
diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build
index a51244a8022f..faca2bf6a430 100644
--- a/tools/build/Makefile.build
+++ b/tools/build/Makefile.build
@@ -25,7 +25,7 @@ build-dir := $(srctree)/tools/build
 include $(build-dir)/Build.include
 
 # do not force detected configuration
--include .config-detected
+-include $(OUTPUT).config-detected
 
 # Init all relevant variables used in build files so
 # 1) they have correct type
diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 04e150d83e7d..47469abdcc1c 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -144,6 +144,10 @@ is a useful mode to detect imbalance between physical cores.  To enable this mod
 use --per-core in addition to -a. (system-wide).  The output includes the
 core number and the number of online logical processors on that physical processor.
 
+--per-thread::
+Aggregate counts per monitored threads, when monitoring threads (-t option)
+or processes (-p option).
+
 -D msecs::
 --delay msecs::
 After starting the program, wait msecs before measuring. This is useful to
diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index d31a7bbd7cee..480546d5f13b 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -83,8 +83,8 @@ endef
 #
 # All other targets get passed through:
 #
-%:
+%: FORCE
 	$(print_msg)
 	$(make)
 
-.PHONY: tags TAGS
+.PHONY: tags TAGS FORCE Makefile
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 1af0cfeb7a57..7a4b549214e3 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -110,7 +110,7 @@ $(OUTPUT)PERF-VERSION-FILE: ../../.git/HEAD
 	$(Q)touch $(OUTPUT)PERF-VERSION-FILE
 
 CC = $(CROSS_COMPILE)gcc
-LD = $(CROSS_COMPILE)ld
+LD ?= $(CROSS_COMPILE)ld
 AR = $(CROSS_COMPILE)ar
 PKG_CONFIG = $(CROSS_COMPILE)pkg-config
 
@@ -545,7 +545,7 @@ install: install-bin try-install-man install-traceevent-plugins
 clean: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean config-clean
 	$(call QUIET_CLEAN, core-objs)  $(RM) $(LIB_FILE) $(OUTPUT)perf-archive $(OUTPUT)perf-with-kcore $(LANG_BINDINGS)
 	$(Q)find . -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
-	$(Q)$(RM) .config-detected
+	$(Q)$(RM) $(OUTPUT).config-detected
 	$(call QUIET_CLEAN, core-progs) $(RM) $(ALL_PROGRAMS) perf perf-read-vdso32 perf-read-vdsox32
 	$(call QUIET_CLEAN, core-gen)   $(RM)  *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)FEATURE-DUMP $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex*
 	$(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) clean
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 52ec66b23607..01b06492bd6a 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -630,12 +630,13 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (inject.session == NULL)
 		return -1;
 
-	if (symbol__init(&inject.session->header.env) < 0)
-		return -1;
+	ret = symbol__init(&inject.session->header.env);
+	if (ret < 0)
+		goto out_delete;
 
 	ret = __cmd_inject(&inject);
 
+out_delete:
 	perf_session__delete(inject.session);
-
 	return ret;
 }
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 950f296dfcf7..23b1faaaa4cc 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -1916,7 +1916,7 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 		if (!perf_evlist__find_tracepoint_by_name(session->evlist,
 							  "kmem:kmalloc")) {
 			pr_err(errmsg, "slab", "slab");
-			return -1;
+			goto out_delete;
 		}
 	}
 
@@ -1927,7 +1927,7 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 							     "kmem:mm_page_alloc");
 		if (evsel == NULL) {
 			pr_err(errmsg, "page", "page");
-			return -1;
+			goto out_delete;
 		}
 
 		kmem_page_size = pevent_get_page_size(evsel->tp_format->pevent);
diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 74878cd75078..fc1cffb1b7a2 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1061,8 +1061,10 @@ static int read_events(struct perf_kvm_stat *kvm)
 
 	symbol__init(&kvm->session->header.env);
 
-	if (!perf_session__has_traces(kvm->session, "kvm record"))
-		return -EINVAL;
+	if (!perf_session__has_traces(kvm->session, "kvm record")) {
+		ret = -EINVAL;
+		goto out_delete;
+	}
 
 	/*
 	 * Do not use 'isa' recorded in kvm_exit tracepoint since it is not
@@ -1070,9 +1072,13 @@ static int read_events(struct perf_kvm_stat *kvm)
 	 */
 	ret = cpu_isa_config(kvm);
 	if (ret < 0)
-		return ret;
+		goto out_delete;
 
-	return perf_session__process_events(kvm->session);
+	ret = perf_session__process_events(kvm->session);
+
+out_delete:
+	perf_session__delete(kvm->session);
+	return ret;
 }
 
 static int parse_target_str(struct perf_kvm_stat *kvm)
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index da2ec06f0742..80170aace5d4 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -124,7 +124,6 @@ static int report_raw_events(struct perf_mem *mem)
 		.mode = PERF_DATA_MODE_READ,
 		.force = mem->force,
 	};
-	int err = -EINVAL;
 	int ret;
 	struct perf_session *session = perf_session__new(&file, false,
 							 &mem->tool);
@@ -135,24 +134,21 @@ static int report_raw_events(struct perf_mem *mem)
 	if (mem->cpu_list) {
 		ret = perf_session__cpu_bitmap(session, mem->cpu_list,
 					       mem->cpu_bitmap);
-		if (ret)
+		if (ret < 0)
 			goto out_delete;
 	}
 
-	if (symbol__init(&session->header.env) < 0)
-		return -1;
+	ret = symbol__init(&session->header.env);
+	if (ret < 0)
+		goto out_delete;
 
 	printf("# PID, TID, IP, ADDR, LOCAL WEIGHT, DSRC, SYMBOL\n");
 
-	err = perf_session__process_events(session);
-	if (err)
-		return err;
-
-	return 0;
+	ret = perf_session__process_events(session);
 
 out_delete:
 	perf_session__delete(session);
-	return err;
+	return ret;
 }
 
 static int report_events(int argc, const char **argv, struct perf_mem *mem)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 32626ea3e227..95a47719aec3 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -742,6 +742,17 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 
 	argc = parse_options(argc, argv, options, report_usage, 0);
 
+	if (symbol_conf.vmlinux_name &&
+	    access(symbol_conf.vmlinux_name, R_OK)) {
+		pr_err("Invalid file: %s\n", symbol_conf.vmlinux_name);
+		return -EINVAL;
+	}
+	if (symbol_conf.kallsyms_name &&
+	    access(symbol_conf.kallsyms_name, R_OK)) {
+		pr_err("Invalid file: %s\n", symbol_conf.kallsyms_name);
+		return -EINVAL;
+	}
+
 	if (report.use_stdio)
 		use_browser = 0;
 	else if (report.use_tui)
@@ -828,8 +839,10 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (report.header || report.header_only) {
 		perf_session__fprintf_info(session, stdout,
 					   report.show_full_info);
-		if (report.header_only)
-			return 0;
+		if (report.header_only) {
+			ret = 0;
+			goto error;
+		}
 	} else if (use_browser == 0) {
 		fputs("# To display the perf.data header info, please use --header/--header-only options.\n#\n",
 		      stdout);
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index fcf99bdeb19e..37e301a32f43 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -67,10 +67,7 @@
 #define CNTR_NOT_SUPPORTED	"<not supported>"
 #define CNTR_NOT_COUNTED	"<not counted>"
 
-static void print_stat(int argc, const char **argv);
-static void print_counter_aggr(struct perf_evsel *counter, char *prefix);
-static void print_counter(struct perf_evsel *counter, char *prefix);
-static void print_aggr(char *prefix);
+static void print_counters(struct timespec *ts, int argc, const char **argv);
 
 /* Default events used for perf stat -T */
 static const char *transaction_attrs = {
@@ -141,96 +138,9 @@ static inline void diff_timespec(struct timespec *r, struct timespec *a,
 	}
 }
 
-static inline struct cpu_map *perf_evsel__cpus(struct perf_evsel *evsel)
+static void perf_stat__reset_stats(void)
 {
-	return (evsel->cpus && !target.cpu_list) ? evsel->cpus : evsel_list->cpus;
-}
-
-static inline int perf_evsel__nr_cpus(struct perf_evsel *evsel)
-{
-	return perf_evsel__cpus(evsel)->nr;
-}
-
-static void perf_evsel__reset_stat_priv(struct perf_evsel *evsel)
-{
-	int i;
-	struct perf_stat *ps = evsel->priv;
-
-	for (i = 0; i < 3; i++)
-		init_stats(&ps->res_stats[i]);
-
-	perf_stat_evsel_id_init(evsel);
-}
-
-static int perf_evsel__alloc_stat_priv(struct perf_evsel *evsel)
-{
-	evsel->priv = zalloc(sizeof(struct perf_stat));
-	if (evsel->priv == NULL)
-		return -ENOMEM;
-	perf_evsel__reset_stat_priv(evsel);
-	return 0;
-}
-
-static void perf_evsel__free_stat_priv(struct perf_evsel *evsel)
-{
-	zfree(&evsel->priv);
-}
-
-static int perf_evsel__alloc_prev_raw_counts(struct perf_evsel *evsel)
-{
-	struct perf_counts *counts;
-
-	counts = perf_counts__new(perf_evsel__nr_cpus(evsel));
-	if (counts)
-		evsel->prev_raw_counts = counts;
-
-	return counts ? 0 : -ENOMEM;
-}
-
-static void perf_evsel__free_prev_raw_counts(struct perf_evsel *evsel)
-{
-	perf_counts__delete(evsel->prev_raw_counts);
-	evsel->prev_raw_counts = NULL;
-}
-
-static void perf_evlist__free_stats(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel;
-
-	evlist__for_each(evlist, evsel) {
-		perf_evsel__free_stat_priv(evsel);
-		perf_evsel__free_counts(evsel);
-		perf_evsel__free_prev_raw_counts(evsel);
-	}
-}
-
-static int perf_evlist__alloc_stats(struct perf_evlist *evlist, bool alloc_raw)
-{
-	struct perf_evsel *evsel;
-
-	evlist__for_each(evlist, evsel) {
-		if (perf_evsel__alloc_stat_priv(evsel) < 0 ||
-		    perf_evsel__alloc_counts(evsel, perf_evsel__nr_cpus(evsel)) < 0 ||
-		    (alloc_raw && perf_evsel__alloc_prev_raw_counts(evsel) < 0))
-			goto out_free;
-	}
-
-	return 0;
-
-out_free:
-	perf_evlist__free_stats(evlist);
-	return -1;
-}
-
-static void perf_stat__reset_stats(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel;
-
-	evlist__for_each(evlist, evsel) {
-		perf_evsel__reset_stat_priv(evsel);
-		perf_evsel__reset_counts(evsel, perf_evsel__nr_cpus(evsel));
-	}
-
+	perf_evlist__reset_stats(evsel_list);
 	perf_stat__reset_shadow_stats();
 }
 
@@ -304,8 +214,9 @@ static int check_per_pkg(struct perf_evsel *counter, int cpu, bool *skip)
 	return 0;
 }
 
-static int read_cb(struct perf_evsel *evsel, int cpu, int thread __maybe_unused,
-		   struct perf_counts_values *count)
+static int
+process_counter_values(struct perf_evsel *evsel, int cpu, int thread,
+		       struct perf_counts_values *count)
 {
 	struct perf_counts_values *aggr = &evsel->counts->aggr;
 	static struct perf_counts_values zero;
@@ -320,13 +231,13 @@ static int read_cb(struct perf_evsel *evsel, int cpu, int thread __maybe_unused,
 		count = &zero;
 
 	switch (aggr_mode) {
+	case AGGR_THREAD:
 	case AGGR_CORE:
 	case AGGR_SOCKET:
 	case AGGR_NONE:
 		if (!evsel->snapshot)
-			perf_evsel__compute_deltas(evsel, cpu, count);
+			perf_evsel__compute_deltas(evsel, cpu, thread, count);
 		perf_counts_values__scale(count, scale, NULL);
-		evsel->counts->cpu[cpu] = *count;
 		if (aggr_mode == AGGR_NONE)
 			perf_stat__update_shadow_stats(evsel, count->values, cpu);
 		break;
@@ -343,26 +254,48 @@ static int read_cb(struct perf_evsel *evsel, int cpu, int thread __maybe_unused,
 	return 0;
 }
 
-static int read_counter(struct perf_evsel *counter);
+static int process_counter_maps(struct perf_evsel *counter)
+{
+	int nthreads = thread_map__nr(counter->threads);
+	int ncpus = perf_evsel__nr_cpus(counter);
+	int cpu, thread;
 
-/*
- * Read out the results of a single counter:
- * aggregate counts across CPUs in system-wide mode
- */
-static int read_counter_aggr(struct perf_evsel *counter)
+	if (counter->system_wide)
+		nthreads = 1;
+
+	for (thread = 0; thread < nthreads; thread++) {
+		for (cpu = 0; cpu < ncpus; cpu++) {
+			if (process_counter_values(counter, cpu, thread,
+						   perf_counts(counter->counts, cpu, thread)))
+				return -1;
+		}
+	}
+
+	return 0;
+}
+
+static int process_counter(struct perf_evsel *counter)
 {
 	struct perf_counts_values *aggr = &counter->counts->aggr;
 	struct perf_stat *ps = counter->priv;
 	u64 *count = counter->counts->aggr.values;
-	int i;
+	int i, ret;
 
 	aggr->val = aggr->ena = aggr->run = 0;
+	init_stats(ps->res_stats);
 
-	if (read_counter(counter))
-		return -1;
+	if (counter->per_pkg)
+		zero_per_pkg(counter);
+
+	ret = process_counter_maps(counter);
+	if (ret)
+		return ret;
+
+	if (aggr_mode != AGGR_GLOBAL)
+		return 0;
 
 	if (!counter->snapshot)
-		perf_evsel__compute_deltas(counter, -1, aggr);
+		perf_evsel__compute_deltas(counter, -1, -1, aggr);
 	perf_counts_values__scale(aggr, scale, &counter->counts->scaled);
 
 	for (i = 0; i < 3; i++)
@@ -397,12 +330,12 @@ static int read_counter(struct perf_evsel *counter)
 	if (counter->system_wide)
 		nthreads = 1;
 
-	if (counter->per_pkg)
-		zero_per_pkg(counter);
-
 	for (thread = 0; thread < nthreads; thread++) {
 		for (cpu = 0; cpu < ncpus; cpu++) {
-			if (perf_evsel__read_cb(counter, cpu, thread, read_cb))
+			struct perf_counts_values *count;
+
+			count = perf_counts(counter->counts, cpu, thread);
+			if (perf_evsel__read(counter, cpu, thread, count))
 				return -1;
 		}
 	}
@@ -410,68 +343,34 @@ static int read_counter(struct perf_evsel *counter)
 	return 0;
 }
 
-static void print_interval(void)
+static void read_counters(bool close)
 {
-	static int num_print_interval;
 	struct perf_evsel *counter;
-	struct perf_stat *ps;
-	struct timespec ts, rs;
-	char prefix[64];
 
-	if (aggr_mode == AGGR_GLOBAL) {
-		evlist__for_each(evsel_list, counter) {
-			ps = counter->priv;
-			memset(ps->res_stats, 0, sizeof(ps->res_stats));
-			read_counter_aggr(counter);
-		}
-	} else	{
-		evlist__for_each(evsel_list, counter) {
-			ps = counter->priv;
-			memset(ps->res_stats, 0, sizeof(ps->res_stats));
-			read_counter(counter);
-		}
-	}
+	evlist__for_each(evsel_list, counter) {
+		if (read_counter(counter))
+			pr_warning("failed to read counter %s\n", counter->name);
 
-	clock_gettime(CLOCK_MONOTONIC, &ts);
-	diff_timespec(&rs, &ts, &ref_time);
-	sprintf(prefix, "%6lu.%09lu%s", rs.tv_sec, rs.tv_nsec, csv_sep);
+		if (process_counter(counter))
+			pr_warning("failed to process counter %s\n", counter->name);
 
-	if (num_print_interval == 0 && !csv_output) {
-		switch (aggr_mode) {
-		case AGGR_SOCKET:
-			fprintf(output, "#           time socket cpus             counts %*s events\n", unit_width, "unit");
-			break;
-		case AGGR_CORE:
-			fprintf(output, "#           time core         cpus             counts %*s events\n", unit_width, "unit");
-			break;
-		case AGGR_NONE:
-			fprintf(output, "#           time CPU                counts %*s events\n", unit_width, "unit");
-			break;
-		case AGGR_GLOBAL:
-		default:
-			fprintf(output, "#           time             counts %*s events\n", unit_width, "unit");
+		if (close) {
+			perf_evsel__close_fd(counter, perf_evsel__nr_cpus(counter),
+					     thread_map__nr(evsel_list->threads));
 		}
 	}
+}
 
-	if (++num_print_interval == 25)
-		num_print_interval = 0;
+static void process_interval(void)
+{
+	struct timespec ts, rs;
 
-	switch (aggr_mode) {
-	case AGGR_CORE:
-	case AGGR_SOCKET:
-		print_aggr(prefix);
-		break;
-	case AGGR_NONE:
-		evlist__for_each(evsel_list, counter)
-			print_counter(counter, prefix);
-		break;
-	case AGGR_GLOBAL:
-	default:
-		evlist__for_each(evsel_list, counter)
-			print_counter_aggr(counter, prefix);
-	}
+	read_counters(false);
 
-	fflush(output);
+	clock_gettime(CLOCK_MONOTONIC, &ts);
+	diff_timespec(&rs, &ts, &ref_time);
+
+	print_counters(&rs, 0, NULL);
 }
 
 static void handle_initial_delay(void)
@@ -586,7 +485,7 @@ static int __run_perf_stat(int argc, const char **argv)
 		if (interval) {
 			while (!waitpid(child_pid, &status, WNOHANG)) {
 				nanosleep(&ts, NULL);
-				print_interval();
+				process_interval();
 			}
 		}
 		wait(&status);
@@ -604,7 +503,7 @@ static int __run_perf_stat(int argc, const char **argv)
 		while (!done) {
 			nanosleep(&ts, NULL);
 			if (interval)
-				print_interval();
+				process_interval();
 		}
 	}
 
@@ -612,18 +511,7 @@ static int __run_perf_stat(int argc, const char **argv)
 
 	update_stats(&walltime_nsecs_stats, t1 - t0);
 
-	if (aggr_mode == AGGR_GLOBAL) {
-		evlist__for_each(evsel_list, counter) {
-			read_counter_aggr(counter);
-			perf_evsel__close_fd(counter, perf_evsel__nr_cpus(counter),
-					     thread_map__nr(evsel_list->threads));
-		}
-	} else {
-		evlist__for_each(evsel_list, counter) {
-			read_counter(counter);
-			perf_evsel__close_fd(counter, perf_evsel__nr_cpus(counter), 1);
-		}
-	}
+	read_counters(true);
 
 	return WEXITSTATUS(status);
 }
@@ -715,6 +603,14 @@ static void aggr_printout(struct perf_evsel *evsel, int id, int nr)
 			csv_output ? 0 : -4,
 			perf_evsel__cpus(evsel)->map[id], csv_sep);
 		break;
+	case AGGR_THREAD:
+		fprintf(output, "%*s-%*d%s",
+			csv_output ? 0 : 16,
+			thread_map__comm(evsel->threads, id),
+			csv_output ? 0 : -8,
+			thread_map__pid(evsel->threads, id),
+			csv_sep);
+		break;
 	case AGGR_GLOBAL:
 	default:
 		break;
@@ -815,9 +711,9 @@ static void print_aggr(char *prefix)
 				s2 = aggr_get_id(evsel_list->cpus, cpu2);
 				if (s2 != id)
 					continue;
-				val += counter->counts->cpu[cpu].val;
-				ena += counter->counts->cpu[cpu].ena;
-				run += counter->counts->cpu[cpu].run;
+				val += perf_counts(counter->counts, cpu, 0)->val;
+				ena += perf_counts(counter->counts, cpu, 0)->ena;
+				run += perf_counts(counter->counts, cpu, 0)->run;
 				nr++;
 			}
 			if (prefix)
@@ -863,6 +759,40 @@ static void print_aggr(char *prefix)
 	}
 }
 
+static void print_aggr_thread(struct perf_evsel *counter, char *prefix)
+{
+	int nthreads = thread_map__nr(counter->threads);
+	int ncpus = cpu_map__nr(counter->cpus);
+	int cpu, thread;
+	double uval;
+
+	for (thread = 0; thread < nthreads; thread++) {
+		u64 ena = 0, run = 0, val = 0;
+
+		for (cpu = 0; cpu < ncpus; cpu++) {
+			val += perf_counts(counter->counts, cpu, thread)->val;
+			ena += perf_counts(counter->counts, cpu, thread)->ena;
+			run += perf_counts(counter->counts, cpu, thread)->run;
+		}
+
+		if (prefix)
+			fprintf(output, "%s", prefix);
+
+		uval = val * counter->scale;
+
+		if (nsec_counter(counter))
+			nsec_printout(thread, 0, counter, uval);
+		else
+			abs_printout(thread, 0, counter, uval);
+
+		if (!csv_output)
+			print_noise(counter, 1.0);
+
+		print_running(run, ena);
+		fputc('\n', output);
+	}
+}
+
 /*
  * Print out the results of a single counter:
  * aggregated counts in system-wide mode
@@ -925,9 +855,9 @@ static void print_counter(struct perf_evsel *counter, char *prefix)
 	int cpu;
 
 	for (cpu = 0; cpu < perf_evsel__nr_cpus(counter); cpu++) {
-		val = counter->counts->cpu[cpu].val;
-		ena = counter->counts->cpu[cpu].ena;
-		run = counter->counts->cpu[cpu].run;
+		val = perf_counts(counter->counts, cpu, 0)->val;
+		ena = perf_counts(counter->counts, cpu, 0)->ena;
+		run = perf_counts(counter->counts, cpu, 0)->run;
 
 		if (prefix)
 			fprintf(output, "%s", prefix);
@@ -972,9 +902,38 @@ static void print_counter(struct perf_evsel *counter, char *prefix)
 	}
 }
 
-static void print_stat(int argc, const char **argv)
+static void print_interval(char *prefix, struct timespec *ts)
+{
+	static int num_print_interval;
+
+	sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, csv_sep);
+
+	if (num_print_interval == 0 && !csv_output) {
+		switch (aggr_mode) {
+		case AGGR_SOCKET:
+			fprintf(output, "#           time socket cpus             counts %*s events\n", unit_width, "unit");
+			break;
+		case AGGR_CORE:
+			fprintf(output, "#           time core         cpus             counts %*s events\n", unit_width, "unit");
+			break;
+		case AGGR_NONE:
+			fprintf(output, "#           time CPU                counts %*s events\n", unit_width, "unit");
+			break;
+		case AGGR_THREAD:
+			fprintf(output, "#           time             comm-pid                  counts %*s events\n", unit_width, "unit");
+			break;
+		case AGGR_GLOBAL:
+		default:
+			fprintf(output, "#           time             counts %*s events\n", unit_width, "unit");
+		}
+	}
+
+	if (++num_print_interval == 25)
+		num_print_interval = 0;
+}
+
+static void print_header(int argc, const char **argv)
 {
-	struct perf_evsel *counter;
 	int i;
 
 	fflush(stdout);
@@ -1000,36 +959,57 @@ static void print_stat(int argc, const char **argv)
 			fprintf(output, " (%d runs)", run_count);
 		fprintf(output, ":\n\n");
 	}
+}
+
+static void print_footer(void)
+{
+	if (!null_run)
+		fprintf(output, "\n");
+	fprintf(output, " %17.9f seconds time elapsed",
+			avg_stats(&walltime_nsecs_stats)/1e9);
+	if (run_count > 1) {
+		fprintf(output, "                                        ");
+		print_noise_pct(stddev_stats(&walltime_nsecs_stats),
+				avg_stats(&walltime_nsecs_stats));
+	}
+	fprintf(output, "\n\n");
+}
+
+static void print_counters(struct timespec *ts, int argc, const char **argv)
+{
+	struct perf_evsel *counter;
+	char buf[64], *prefix = NULL;
+
+	if (interval)
+		print_interval(prefix = buf, ts);
+	else
+		print_header(argc, argv);
 
 	switch (aggr_mode) {
 	case AGGR_CORE:
 	case AGGR_SOCKET:
-		print_aggr(NULL);
+		print_aggr(prefix);
+		break;
+	case AGGR_THREAD:
+		evlist__for_each(evsel_list, counter)
+			print_aggr_thread(counter, prefix);
 		break;
 	case AGGR_GLOBAL:
 		evlist__for_each(evsel_list, counter)
-			print_counter_aggr(counter, NULL);
+			print_counter_aggr(counter, prefix);
 		break;
 	case AGGR_NONE:
 		evlist__for_each(evsel_list, counter)
-			print_counter(counter, NULL);
+			print_counter(counter, prefix);
 		break;
 	default:
 		break;
 	}
 
-	if (!csv_output) {
-		if (!null_run)
-			fprintf(output, "\n");
-		fprintf(output, " %17.9f seconds time elapsed",
-				avg_stats(&walltime_nsecs_stats)/1e9);
-		if (run_count > 1) {
-			fprintf(output, "                                        ");
-			print_noise_pct(stddev_stats(&walltime_nsecs_stats),
-					avg_stats(&walltime_nsecs_stats));
-		}
-		fprintf(output, "\n\n");
-	}
+	if (!interval && !csv_output)
+		print_footer();
+
+	fflush(output);
 }
 
 static volatile int signr = -1;
@@ -1101,6 +1081,7 @@ static int perf_stat_init_aggr_mode(void)
 		break;
 	case AGGR_NONE:
 	case AGGR_GLOBAL:
+	case AGGR_THREAD:
 	default:
 		break;
 	}
@@ -1325,6 +1306,8 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 		     "aggregate counts per processor socket", AGGR_SOCKET),
 	OPT_SET_UINT(0, "per-core", &aggr_mode,
 		     "aggregate counts per physical processor core", AGGR_CORE),
+	OPT_SET_UINT(0, "per-thread", &aggr_mode,
+		     "aggregate counts per thread", AGGR_THREAD),
 	OPT_UINTEGER('D', "delay", &initial_delay,
 		     "ms to wait before starting measurement after program start"),
 	OPT_END()
@@ -1416,8 +1399,19 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 		run_count = 1;
 	}
 
-	/* no_aggr, cgroup are for system-wide only */
-	if ((aggr_mode != AGGR_GLOBAL || nr_cgroups) &&
+	if ((aggr_mode == AGGR_THREAD) && !target__has_task(&target)) {
+		fprintf(stderr, "The --per-thread option is only available "
+			"when monitoring via -p -t options.\n");
+		parse_options_usage(NULL, options, "p", 1);
+		parse_options_usage(NULL, options, "t", 1);
+		goto out;
+	}
+
+	/*
+	 * no_aggr, cgroup are for system-wide only
+	 * --per-thread is aggregated per thread, we dont mix it with cpu mode
+	 */
+	if (((aggr_mode != AGGR_GLOBAL && aggr_mode != AGGR_THREAD) || nr_cgroups) &&
 	    !target__has_cpu(&target)) {
 		fprintf(stderr, "both cgroup and no-aggregation "
 			"modes only available in system-wide mode\n");
@@ -1445,6 +1439,14 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 		}
 		goto out;
 	}
+
+	/*
+	 * Initialize thread_map with comm names,
+	 * so we could print it out on output.
+	 */
+	if (aggr_mode == AGGR_THREAD)
+		thread_map__read_comms(evsel_list->threads);
+
 	if (interval && interval < 100) {
 		pr_err("print interval must be >= 100ms\n");
 		parse_options_usage(stat_usage, options, "I", 1);
@@ -1478,13 +1480,13 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 
 		status = run_perf_stat(argc, argv);
 		if (forever && status != -1) {
-			print_stat(argc, argv);
-			perf_stat__reset_stats(evsel_list);
+			print_counters(NULL, argc, argv);
+			perf_stat__reset_stats();
 		}
 	}
 
 	if (!forever && status != -1 && !interval)
-		print_stat(argc, argv);
+		print_counters(NULL, argc, argv);
 
 	perf_evlist__free_stats(evsel_list);
 out:
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 619a8696fda7..ecf319728f25 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -586,27 +586,9 @@ static void *display_thread_tui(void *arg)
 		hists->uid_filter_str = top->record_opts.target.uid_str;
 	}
 
-	while (true)  {
-		int key = perf_evlist__tui_browse_hists(top->evlist, help, &hbt,
-							top->min_percent,
-							&top->session->header.env);
-
-		if (key != 'f')
-			break;
-
-		perf_evlist__toggle_enable(top->evlist);
-		/*
-		 * No need to refresh, resort/decay histogram entries
-		 * if we are not collecting samples:
-		 */
-		if (top->evlist->enabled) {
-			hbt.refresh = top->delay_secs;
-			help = "Press 'f' to disable the events or 'h' to see other hotkeys";
-		} else {
-			help = "Press 'f' again to re-enable the events";
-			hbt.refresh = 0;
-		}
-	}
+	perf_evlist__tui_browse_hists(top->evlist, help, &hbt,
+				      top->min_percent,
+				      &top->session->header.env);
 
 	done = 1;
 	return NULL;
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index de5d277d1ad7..39ad4d0ca884 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -1617,6 +1617,34 @@ static int trace__read_syscall_info(struct trace *trace, int id)
 	return syscall__set_arg_fmts(sc);
 }
 
+static int trace__validate_ev_qualifier(struct trace *trace)
+{
+	int err = 0;
+	struct str_node *pos;
+
+	strlist__for_each(pos, trace->ev_qualifier) {
+		const char *sc = pos->s;
+
+		if (audit_name_to_syscall(sc, trace->audit.machine) < 0) {
+			if (err == 0) {
+				fputs("Error:\tInvalid syscall ", trace->output);
+				err = -EINVAL;
+			} else {
+				fputs(", ", trace->output);
+			}
+
+			fputs(sc, trace->output);
+		}
+	}
+
+	if (err < 0) {
+		fputs("\nHint:\ttry 'perf list syscalls:sys_enter_*'"
+		      "\nHint:\tand: 'man syscalls'\n", trace->output);
+	}
+
+	return err;
+}
+
 /*
  * args is to be interpreted as a series of longs but we need to handle
  * 8-byte unaligned accesses. args points to raw_data within the event
@@ -2325,7 +2353,7 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
 	 */
 	if (trace->filter_pids.nr > 0)
 		err = perf_evlist__set_filter_pids(evlist, trace->filter_pids.nr, trace->filter_pids.entries);
-	else if (evlist->threads->map[0] == -1)
+	else if (thread_map__pid(evlist->threads, 0) == -1)
 		err = perf_evlist__set_filter_pid(evlist, getpid());
 
 	if (err < 0) {
@@ -2343,7 +2371,7 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
 	if (forks)
 		perf_evlist__start_workload(evlist);
 
-	trace->multiple_threads = evlist->threads->map[0] == -1 ||
+	trace->multiple_threads = thread_map__pid(evlist->threads, 0) == -1 ||
 				  evlist->threads->nr > 1 ||
 				  perf_evlist__first(evlist)->attr.inherit;
 again:
@@ -2862,6 +2890,10 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused)
 			err = -ENOMEM;
 			goto out_close;
 		}
+
+		err = trace__validate_ev_qualifier(&trace);
+		if (err)
+			goto out_close;
 	}
 
 	err = target__validate(&trace.opts.target);
diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index 317001c94660..094ddaee104c 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -11,9 +11,9 @@ ifneq ($(obj-perf),)
 obj-perf := $(abspath $(obj-perf))/
 endif
 
-$(shell echo -n > .config-detected)
-detected     = $(shell echo "$(1)=y"       >> .config-detected)
-detected_var = $(shell echo "$(1)=$($(1))" >> .config-detected)
+$(shell echo -n > $(OUTPUT).config-detected)
+detected     = $(shell echo "$(1)=y"       >> $(OUTPUT).config-detected)
+detected_var = $(shell echo "$(1)=$($(1))" >> $(OUTPUT).config-detected)
 
 CFLAGS := $(EXTRA_CFLAGS) $(EXTRA_WARNINGS)
 
diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build
index ee41e705b2eb..d20d6e6ab65b 100644
--- a/tools/perf/tests/Build
+++ b/tools/perf/tests/Build
@@ -31,6 +31,7 @@ perf-y += code-reading.o
 perf-y += sample-parsing.o
 perf-y += parse-no-sample-id-all.o
 perf-y += kmod-path.o
+perf-y += thread-map.o
 
 perf-$(CONFIG_X86) += perf-time-to-tsc.o
 
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 87b9961646e4..c1dde733c3a6 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -171,6 +171,10 @@ static struct test {
 		.func = test__kmod_path__parse,
 	},
 	{
+		.desc = "Test thread map",
+		.func = test__thread_map,
+	},
+	{
 		.func = NULL,
 	},
 };
diff --git a/tools/perf/tests/code-reading.c b/tools/perf/tests/code-reading.c
index 22f8a00446e1..39c784a100a9 100644
--- a/tools/perf/tests/code-reading.c
+++ b/tools/perf/tests/code-reading.c
@@ -545,8 +545,8 @@ static int do_test_code_reading(bool try_kcore)
 	if (evlist) {
 		perf_evlist__delete(evlist);
 	} else {
-		cpu_map__delete(cpus);
-		thread_map__delete(threads);
+		cpu_map__put(cpus);
+		thread_map__put(threads);
 	}
 	machines__destroy_kernel_maps(&machines);
 	machine__delete_threads(machine);
diff --git a/tools/perf/tests/keep-tracking.c b/tools/perf/tests/keep-tracking.c
index 5b171d1e338b..4d4b9837b630 100644
--- a/tools/perf/tests/keep-tracking.c
+++ b/tools/perf/tests/keep-tracking.c
@@ -144,8 +144,8 @@ int test__keep_tracking(void)
 		perf_evlist__disable(evlist);
 		perf_evlist__delete(evlist);
 	} else {
-		cpu_map__delete(cpus);
-		thread_map__delete(threads);
+		cpu_map__put(cpus);
+		thread_map__put(threads);
 	}
 
 	return err;
diff --git a/tools/perf/tests/make b/tools/perf/tests/make
index 65280d28662e..729112f4cfaa 100644
--- a/tools/perf/tests/make
+++ b/tools/perf/tests/make
@@ -1,5 +1,16 @@
+ifndef MK
+ifeq ($(MAKECMDGOALS),)
+# no target specified, trigger the whole suite
+all:
+	@echo "Testing Makefile";      $(MAKE) -sf tests/make MK=Makefile
+	@echo "Testing Makefile.perf"; $(MAKE) -sf tests/make MK=Makefile.perf
+else
+# run only specific test over 'Makefile'
+%:
+	@echo "Testing Makefile";      $(MAKE) -sf tests/make MK=Makefile $@
+endif
+else
 PERF := .
-MK   := Makefile
 
 include config/Makefile.arch
 
@@ -47,6 +58,7 @@ make_install_man    := install-man
 make_install_html   := install-html
 make_install_info   := install-info
 make_install_pdf    := install-pdf
+make_install_prefix := install prefix=/tmp/krava
 make_static         := LDFLAGS=-static
 
 # all the NO_* variable combined
@@ -57,7 +69,12 @@ make_minimal        += NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1
 
 # $(run) contains all available tests
 run := make_pure
+# Targets 'clean all' can be run together only through top level
+# Makefile because we detect clean target in Makefile.perf and
+# disable features detection
+ifeq ($(MK),Makefile)
 run += make_clean_all
+endif
 run += make_python_perf_so
 run += make_debug
 run += make_no_libperl
@@ -83,6 +100,7 @@ run += make_util_map_o
 run += make_util_pmu_bison_o
 run += make_install
 run += make_install_bin
+run += make_install_prefix
 # FIXME 'install-*' commented out till they're fixed
 # run += make_install_doc
 # run += make_install_man
@@ -157,6 +175,12 @@ test_make_install_O     := $(call test_dest_files,$(installed_files_all))
 test_make_install_bin   := $(call test_dest_files,$(installed_files_bin))
 test_make_install_bin_O := $(call test_dest_files,$(installed_files_bin))
 
+# We prefix all installed files for make_install_prefix
+# with '/tmp/krava' to match installed/prefix-ed files.
+installed_files_all_prefix := $(addprefix /tmp/krava/,$(installed_files_all))
+test_make_install_prefix   := $(call test_dest_files,$(installed_files_all_prefix))
+test_make_install_prefix_O := $(call test_dest_files,$(installed_files_all_prefix))
+
 # FIXME nothing gets installed
 test_make_install_man    := test -f $$TMP_DEST/share/man/man1/perf.1
 test_make_install_man_O  := $(test_make_install_man)
@@ -226,13 +250,13 @@ clean := @(cd $(PERF); make -s -f $(MK) clean >/dev/null)
 	( eval $$cmd ) >> $@ 2>&1
 
 make_kernelsrc:
-	@echo " - make -C <kernelsrc> tools/perf"
+	@echo "- make -C <kernelsrc> tools/perf"
 	$(call clean); \
 	(make -C ../.. tools/perf) > $@ 2>&1 && \
 	test -x perf && rm -f $@ || (cat $@ ; false)
 
 make_kernelsrc_tools:
-	@echo " - make -C <kernelsrc>/tools perf"
+	@echo "- make -C <kernelsrc>/tools perf"
 	$(call clean); \
 	(make -C ../../tools perf) > $@ 2>&1 && \
 	test -x perf && rm -f $@ || (cat $@ ; false)
@@ -244,3 +268,4 @@ out: $(run_O)
 	@echo OK
 
 .PHONY: all $(run) $(run_O) tarpkg clean
+endif # ifndef MK
diff --git a/tools/perf/tests/mmap-basic.c b/tools/perf/tests/mmap-basic.c
index 5855cf471210..666b67a4df9d 100644
--- a/tools/perf/tests/mmap-basic.c
+++ b/tools/perf/tests/mmap-basic.c
@@ -140,8 +140,8 @@ int test__basic_mmap(void)
 	cpus	= NULL;
 	threads = NULL;
 out_free_cpus:
-	cpu_map__delete(cpus);
+	cpu_map__put(cpus);
 out_free_threads:
-	thread_map__delete(threads);
+	thread_map__put(threads);
 	return err;
 }
diff --git a/tools/perf/tests/mmap-thread-lookup.c b/tools/perf/tests/mmap-thread-lookup.c
index 7f48efa7e295..145050e2e544 100644
--- a/tools/perf/tests/mmap-thread-lookup.c
+++ b/tools/perf/tests/mmap-thread-lookup.c
@@ -143,7 +143,7 @@ static int synth_process(struct machine *machine)
 						perf_event__process,
 						machine, 0, 500);
 
-	thread_map__delete(map);
+	thread_map__put(map);
 	return err;
 }
 
diff --git a/tools/perf/tests/openat-syscall-all-cpus.c b/tools/perf/tests/openat-syscall-all-cpus.c
index 9a7a116e09b8..a572f87e9c8d 100644
--- a/tools/perf/tests/openat-syscall-all-cpus.c
+++ b/tools/perf/tests/openat-syscall-all-cpus.c
@@ -78,7 +78,7 @@ int test__openat_syscall_event_on_all_cpus(void)
 	 * we use the auto allocation it will allocate just for 1 cpu,
 	 * as we start by cpu 0.
 	 */
-	if (perf_evsel__alloc_counts(evsel, cpus->nr) < 0) {
+	if (perf_evsel__alloc_counts(evsel, cpus->nr, 1) < 0) {
 		pr_debug("perf_evsel__alloc_counts(ncpus=%d)\n", cpus->nr);
 		goto out_close_fd;
 	}
@@ -98,9 +98,9 @@ int test__openat_syscall_event_on_all_cpus(void)
 		}
 
 		expected = nr_openat_calls + cpu;
-		if (evsel->counts->cpu[cpu].val != expected) {
+		if (perf_counts(evsel->counts, cpu, 0)->val != expected) {
 			pr_debug("perf_evsel__read_on_cpu: expected to intercept %d calls on cpu %d, got %" PRIu64 "\n",
-				 expected, cpus->map[cpu], evsel->counts->cpu[cpu].val);
+				 expected, cpus->map[cpu], perf_counts(evsel->counts, cpu, 0)->val);
 			err = -1;
 		}
 	}
@@ -111,6 +111,6 @@ int test__openat_syscall_event_on_all_cpus(void)
 out_evsel_delete:
 	perf_evsel__delete(evsel);
 out_thread_map_delete:
-	thread_map__delete(threads);
+	thread_map__put(threads);
 	return err;
 }
diff --git a/tools/perf/tests/openat-syscall-tp-fields.c b/tools/perf/tests/openat-syscall-tp-fields.c
index 6245221479d7..01a19626c846 100644
--- a/tools/perf/tests/openat-syscall-tp-fields.c
+++ b/tools/perf/tests/openat-syscall-tp-fields.c
@@ -45,7 +45,7 @@ int test__syscall_openat_tp_fields(void)
 
 	perf_evsel__config(evsel, &opts);
 
-	evlist->threads->map[0] = getpid();
+	thread_map__set_pid(evlist->threads, 0, getpid());
 
 	err = perf_evlist__open(evlist);
 	if (err < 0) {
diff --git a/tools/perf/tests/openat-syscall.c b/tools/perf/tests/openat-syscall.c
index 9f9491bb8e48..c9a37bc6b33a 100644
--- a/tools/perf/tests/openat-syscall.c
+++ b/tools/perf/tests/openat-syscall.c
@@ -44,9 +44,9 @@ int test__openat_syscall_event(void)
 		goto out_close_fd;
 	}
 
-	if (evsel->counts->cpu[0].val != nr_openat_calls) {
+	if (perf_counts(evsel->counts, 0, 0)->val != nr_openat_calls) {
 		pr_debug("perf_evsel__read_on_cpu: expected to intercept %d calls, got %" PRIu64 "\n",
-			 nr_openat_calls, evsel->counts->cpu[0].val);
+			 nr_openat_calls, perf_counts(evsel->counts, 0, 0)->val);
 		goto out_close_fd;
 	}
 
@@ -56,6 +56,6 @@ int test__openat_syscall_event(void)
 out_evsel_delete:
 	perf_evsel__delete(evsel);
 out_thread_map_delete:
-	thread_map__delete(threads);
+	thread_map__put(threads);
 	return err;
 }
diff --git a/tools/perf/tests/switch-tracking.c b/tools/perf/tests/switch-tracking.c
index 0d31403ea593..e698742d4fec 100644
--- a/tools/perf/tests/switch-tracking.c
+++ b/tools/perf/tests/switch-tracking.c
@@ -560,8 +560,8 @@ int test__switch_tracking(void)
 		perf_evlist__disable(evlist);
 		perf_evlist__delete(evlist);
 	} else {
-		cpu_map__delete(cpus);
-		thread_map__delete(threads);
+		cpu_map__put(cpus);
+		thread_map__put(threads);
 	}
 
 	return err;
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index 8e5038b48ba8..ebb47d96bc0b 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -61,6 +61,7 @@ int test__switch_tracking(void);
 int test__fdarray__filter(void);
 int test__fdarray__add(void);
 int test__kmod_path__parse(void);
+int test__thread_map(void);
 
 #if defined(__x86_64__) || defined(__i386__) || defined(__arm__) || defined(__aarch64__)
 #ifdef HAVE_DWARF_UNWIND_SUPPORT
diff --git a/tools/perf/tests/thread-map.c b/tools/perf/tests/thread-map.c
new file mode 100644
index 000000000000..5acf000939ea
--- /dev/null
+++ b/tools/perf/tests/thread-map.c
@@ -0,0 +1,38 @@
+#include <sys/types.h>
+#include <unistd.h>
+#include "tests.h"
+#include "thread_map.h"
+#include "debug.h"
+
+int test__thread_map(void)
+{
+	struct thread_map *map;
+
+	/* test map on current pid */
+	map = thread_map__new_by_pid(getpid());
+	TEST_ASSERT_VAL("failed to alloc map", map);
+
+	thread_map__read_comms(map);
+
+	TEST_ASSERT_VAL("wrong nr", map->nr == 1);
+	TEST_ASSERT_VAL("wrong pid",
+			thread_map__pid(map, 0) == getpid());
+	TEST_ASSERT_VAL("wrong comm",
+			thread_map__comm(map, 0) &&
+			!strcmp(thread_map__comm(map, 0), "perf"));
+	thread_map__put(map);
+
+	/* test dummy pid */
+	map = thread_map__new_dummy();
+	TEST_ASSERT_VAL("failed to alloc map", map);
+
+	thread_map__read_comms(map);
+
+	TEST_ASSERT_VAL("wrong nr", map->nr == 1);
+	TEST_ASSERT_VAL("wrong pid", thread_map__pid(map, 0) == -1);
+	TEST_ASSERT_VAL("wrong comm",
+			thread_map__comm(map, 0) &&
+			!strcmp(thread_map__comm(map, 0), "dummy"));
+	thread_map__put(map);
+	return 0;
+}
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index c42adb600091..7629bef2fd79 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -1902,8 +1902,23 @@ static int perf_evsel__hists_browse(struct perf_evsel *evsel, int nr_events,
 		case CTRL('c'):
 			goto out_free_stack;
 		case 'f':
-			if (!is_report_browser(hbt))
-				goto out_free_stack;
+			if (!is_report_browser(hbt)) {
+				struct perf_top *top = hbt->arg;
+
+				perf_evlist__toggle_enable(top->evlist);
+				/*
+				 * No need to refresh, resort/decay histogram
+				 * entries if we are not collecting samples:
+				 */
+				if (top->evlist->enabled) {
+					helpline = "Press 'f' to disable the events or 'h' to see other hotkeys";
+					hbt->refresh = delay_secs;
+				} else {
+					helpline = "Press 'f' again to re-enable the events";
+					hbt->refresh = 0;
+				}
+				continue;
+			}
 			/* Fall thru */
 		default:
 			helpline = "Press '?' for help on key bindings";
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index df66966cfde7..7e7405c9b936 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -119,12 +119,12 @@ void auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp,
 	if (per_cpu) {
 		mp->cpu = evlist->cpus->map[idx];
 		if (evlist->threads)
-			mp->tid = evlist->threads->map[0];
+			mp->tid = thread_map__pid(evlist->threads, 0);
 		else
 			mp->tid = -1;
 	} else {
 		mp->cpu = -1;
-		mp->tid = evlist->threads->map[idx];
+		mp->tid = thread_map__pid(evlist->threads, idx);
 	}
 }
 
@@ -1182,6 +1182,13 @@ static int __auxtrace_mmap__read(struct auxtrace_mmap *mm,
 		data2 = NULL;
 	}
 
+	if (itr->alignment) {
+		unsigned int unwanted = len1 % itr->alignment;
+
+		len1 -= unwanted;
+		size -= unwanted;
+	}
+
 	/* padding must be written by fn() e.g. record__process_auxtrace() */
 	padding = size & 7;
 	if (padding)
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index a171abbe7301..471aecbc4d68 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -303,6 +303,7 @@ struct auxtrace_record {
 				      const char *str);
 	u64 (*reference)(struct auxtrace_record *itr);
 	int (*read_finish)(struct auxtrace_record *itr, int idx);
+	unsigned int alignment;
 };
 
 #ifdef HAVE_AUXTRACE_SUPPORT
diff --git a/tools/perf/util/cloexec.c b/tools/perf/util/cloexec.c
index 85b523885f9d..2babddaa2481 100644
--- a/tools/perf/util/cloexec.c
+++ b/tools/perf/util/cloexec.c
@@ -7,11 +7,15 @@
 
 static unsigned long flag = PERF_FLAG_FD_CLOEXEC;
 
+#ifdef __GLIBC_PREREQ
+#if !__GLIBC_PREREQ(2, 6)
 int __weak sched_getcpu(void)
 {
 	errno = ENOSYS;
 	return -1;
 }
+#endif
+#endif
 
 static int perf_flag_probe(void)
 {
diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
index c4e55b71010c..3667e2123e5b 100644
--- a/tools/perf/util/cpumap.c
+++ b/tools/perf/util/cpumap.c
@@ -5,6 +5,7 @@
 #include <assert.h>
 #include <stdio.h>
 #include <stdlib.h>
+#include "asm/bug.h"
 
 static struct cpu_map *cpu_map__default_new(void)
 {
@@ -22,6 +23,7 @@ static struct cpu_map *cpu_map__default_new(void)
 			cpus->map[i] = i;
 
 		cpus->nr = nr_cpus;
+		atomic_set(&cpus->refcnt, 1);
 	}
 
 	return cpus;
@@ -35,6 +37,7 @@ static struct cpu_map *cpu_map__trim_new(int nr_cpus, int *tmp_cpus)
 	if (cpus != NULL) {
 		cpus->nr = nr_cpus;
 		memcpy(cpus->map, tmp_cpus, payload_size);
+		atomic_set(&cpus->refcnt, 1);
 	}
 
 	return cpus;
@@ -194,14 +197,32 @@ struct cpu_map *cpu_map__dummy_new(void)
 	if (cpus != NULL) {
 		cpus->nr = 1;
 		cpus->map[0] = -1;
+		atomic_set(&cpus->refcnt, 1);
 	}
 
 	return cpus;
 }
 
-void cpu_map__delete(struct cpu_map *map)
+static void cpu_map__delete(struct cpu_map *map)
 {
-	free(map);
+	if (map) {
+		WARN_ONCE(atomic_read(&map->refcnt) != 0,
+			  "cpu_map refcnt unbalanced\n");
+		free(map);
+	}
+}
+
+struct cpu_map *cpu_map__get(struct cpu_map *map)
+{
+	if (map)
+		atomic_inc(&map->refcnt);
+	return map;
+}
+
+void cpu_map__put(struct cpu_map *map)
+{
+	if (map && atomic_dec_and_test(&map->refcnt))
+		cpu_map__delete(map);
 }
 
 int cpu_map__get_socket(struct cpu_map *map, int idx)
@@ -263,6 +284,7 @@ static int cpu_map__build_map(struct cpu_map *cpus, struct cpu_map **res,
 	/* ensure we process id in increasing order */
 	qsort(c->map, c->nr, sizeof(int), cmp_ids);
 
+	atomic_set(&cpus->refcnt, 1);
 	*res = c;
 	return 0;
 }
diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
index 61a654849002..0af9cecb4c51 100644
--- a/tools/perf/util/cpumap.h
+++ b/tools/perf/util/cpumap.h
@@ -3,18 +3,19 @@
 
 #include <stdio.h>
 #include <stdbool.h>
+#include <linux/atomic.h>
 
 #include "perf.h"
 #include "util/debug.h"
 
 struct cpu_map {
+	atomic_t refcnt;
 	int nr;
 	int map[];
 };
 
 struct cpu_map *cpu_map__new(const char *cpu_list);
 struct cpu_map *cpu_map__dummy_new(void);
-void cpu_map__delete(struct cpu_map *map);
 struct cpu_map *cpu_map__read(FILE *file);
 size_t cpu_map__fprintf(struct cpu_map *map, FILE *fp);
 int cpu_map__get_socket(struct cpu_map *map, int idx);
@@ -22,6 +23,9 @@ int cpu_map__get_core(struct cpu_map *map, int idx);
 int cpu_map__build_socket_map(struct cpu_map *cpus, struct cpu_map **sockp);
 int cpu_map__build_core_map(struct cpu_map *cpus, struct cpu_map **corep);
 
+struct cpu_map *cpu_map__get(struct cpu_map *map);
+void cpu_map__put(struct cpu_map *map);
+
 static inline int cpu_map__socket(struct cpu_map *sock, int s)
 {
 	if (!sock || s > sock->nr || s < 0)
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index d7d986d8f23e..67a977e5d0ab 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -504,7 +504,7 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
 	for (thread = 0; thread < threads->nr; ++thread) {
 		if (__event__synthesize_thread(comm_event, mmap_event,
 					       fork_event,
-					       threads->map[thread], 0,
+					       thread_map__pid(threads, thread), 0,
 					       process, tool, machine,
 					       mmap_data, proc_map_timeout)) {
 			err = -1;
@@ -515,12 +515,12 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
 		 * comm.pid is set to thread group id by
 		 * perf_event__synthesize_comm
 		 */
-		if ((int) comm_event->comm.pid != threads->map[thread]) {
+		if ((int) comm_event->comm.pid != thread_map__pid(threads, thread)) {
 			bool need_leader = true;
 
 			/* is thread group leader in thread_map? */
 			for (j = 0; j < threads->nr; ++j) {
-				if ((int) comm_event->comm.pid == threads->map[j]) {
+				if ((int) comm_event->comm.pid == thread_map__pid(threads, j)) {
 					need_leader = false;
 					break;
 				}
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 8366511b45f8..6cfdee68e763 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -114,8 +114,8 @@ void perf_evlist__delete(struct perf_evlist *evlist)
 {
 	perf_evlist__munmap(evlist);
 	perf_evlist__close(evlist);
-	cpu_map__delete(evlist->cpus);
-	thread_map__delete(evlist->threads);
+	cpu_map__put(evlist->cpus);
+	thread_map__put(evlist->threads);
 	evlist->cpus = NULL;
 	evlist->threads = NULL;
 	perf_evlist__purge(evlist);
@@ -548,7 +548,7 @@ static void perf_evlist__set_sid_idx(struct perf_evlist *evlist,
 	else
 		sid->cpu = -1;
 	if (!evsel->system_wide && evlist->threads && thread >= 0)
-		sid->tid = evlist->threads->map[thread];
+		sid->tid = thread_map__pid(evlist->threads, thread);
 	else
 		sid->tid = -1;
 }
@@ -1101,6 +1101,31 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
 	return perf_evlist__mmap_ex(evlist, pages, overwrite, 0, false);
 }
 
+static int perf_evlist__propagate_maps(struct perf_evlist *evlist,
+				       struct target *target)
+{
+	struct perf_evsel *evsel;
+
+	evlist__for_each(evlist, evsel) {
+		/*
+		 * We already have cpus for evsel (via PMU sysfs) so
+		 * keep it, if there's no target cpu list defined.
+		 */
+		if (evsel->cpus && target->cpu_list)
+			cpu_map__put(evsel->cpus);
+
+		if (!evsel->cpus || target->cpu_list)
+			evsel->cpus = cpu_map__get(evlist->cpus);
+
+		evsel->threads = thread_map__get(evlist->threads);
+
+		if (!evsel->cpus || !evsel->threads)
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
 int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
 {
 	evlist->threads = thread_map__new_str(target->pid, target->tid,
@@ -1117,10 +1142,10 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
 	if (evlist->cpus == NULL)
 		goto out_delete_threads;
 
-	return 0;
+	return perf_evlist__propagate_maps(evlist, target);
 
 out_delete_threads:
-	thread_map__delete(evlist->threads);
+	thread_map__put(evlist->threads);
 	evlist->threads = NULL;
 	return -1;
 }
@@ -1353,7 +1378,7 @@ static int perf_evlist__create_syswide_maps(struct perf_evlist *evlist)
 out:
 	return err;
 out_free_cpus:
-	cpu_map__delete(evlist->cpus);
+	cpu_map__put(evlist->cpus);
 	evlist->cpus = NULL;
 	goto out;
 }
@@ -1475,7 +1500,7 @@ int perf_evlist__prepare_workload(struct perf_evlist *evlist, struct target *tar
 				__func__, __LINE__);
 			goto out_close_pipes;
 		}
-		evlist->threads->map[0] = evlist->workload.pid;
+		thread_map__set_pid(evlist->threads, 0, evlist->workload.pid);
 	}
 
 	close(child_ready_pipe[1]);
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index a8489b9d2812..037633c1da9d 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -289,5 +289,4 @@ void perf_evlist__to_front(struct perf_evlist *evlist,
 
 void perf_evlist__set_tracking_event(struct perf_evlist *evlist,
 				     struct perf_evsel *tracking_evsel);
-
 #endif /* __PERF_EVLIST_H */
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 33449decf7bd..2936b3080722 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -885,6 +885,8 @@ void perf_evsel__exit(struct perf_evsel *evsel)
 	perf_evsel__free_fd(evsel);
 	perf_evsel__free_id(evsel);
 	close_cgroup(evsel->cgrp);
+	cpu_map__put(evsel->cpus);
+	thread_map__put(evsel->threads);
 	zfree(&evsel->group_name);
 	zfree(&evsel->name);
 	perf_evsel__object.fini(evsel);
@@ -896,7 +898,7 @@ void perf_evsel__delete(struct perf_evsel *evsel)
 	free(evsel);
 }
 
-void perf_evsel__compute_deltas(struct perf_evsel *evsel, int cpu,
+void perf_evsel__compute_deltas(struct perf_evsel *evsel, int cpu, int thread,
 				struct perf_counts_values *count)
 {
 	struct perf_counts_values tmp;
@@ -908,8 +910,8 @@ void perf_evsel__compute_deltas(struct perf_evsel *evsel, int cpu,
 		tmp = evsel->prev_raw_counts->aggr;
 		evsel->prev_raw_counts->aggr = *count;
 	} else {
-		tmp = evsel->prev_raw_counts->cpu[cpu];
-		evsel->prev_raw_counts->cpu[cpu] = *count;
+		tmp = *perf_counts(evsel->prev_raw_counts, cpu, thread);
+		*perf_counts(evsel->prev_raw_counts, cpu, thread) = *count;
 	}
 
 	count->val = count->val - tmp.val;
@@ -937,20 +939,18 @@ void perf_counts_values__scale(struct perf_counts_values *count,
 		*pscaled = scaled;
 }
 
-int perf_evsel__read_cb(struct perf_evsel *evsel, int cpu, int thread,
-			perf_evsel__read_cb_t cb)
+int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
+		     struct perf_counts_values *count)
 {
-	struct perf_counts_values count;
-
-	memset(&count, 0, sizeof(count));
+	memset(count, 0, sizeof(*count));
 
 	if (FD(evsel, cpu, thread) < 0)
 		return -EINVAL;
 
-	if (readn(FD(evsel, cpu, thread), &count, sizeof(count)) < 0)
+	if (readn(FD(evsel, cpu, thread), count, sizeof(*count)) < 0)
 		return -errno;
 
-	return cb(evsel, cpu, thread, &count);
+	return 0;
 }
 
 int __perf_evsel__read_on_cpu(struct perf_evsel *evsel,
@@ -962,15 +962,15 @@ int __perf_evsel__read_on_cpu(struct perf_evsel *evsel,
 	if (FD(evsel, cpu, thread) < 0)
 		return -EINVAL;
 
-	if (evsel->counts == NULL && perf_evsel__alloc_counts(evsel, cpu + 1) < 0)
+	if (evsel->counts == NULL && perf_evsel__alloc_counts(evsel, cpu + 1, thread + 1) < 0)
 		return -ENOMEM;
 
 	if (readn(FD(evsel, cpu, thread), &count, nv * sizeof(u64)) < 0)
 		return -errno;
 
-	perf_evsel__compute_deltas(evsel, cpu, &count);
+	perf_evsel__compute_deltas(evsel, cpu, thread, &count);
 	perf_counts_values__scale(&count, scale, NULL);
-	evsel->counts->cpu[cpu] = count;
+	*perf_counts(evsel->counts, cpu, thread) = count;
 	return 0;
 }
 
@@ -1167,7 +1167,7 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
 			int group_fd;
 
 			if (!evsel->cgrp && !evsel->system_wide)
-				pid = threads->map[thread];
+				pid = thread_map__pid(threads, thread);
 
 			group_fd = get_group_fd(evsel, cpu, thread);
 retry_open:
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index bb0579e8a10a..4a7ed5656cf0 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -8,23 +8,8 @@
 #include <linux/types.h>
 #include "xyarray.h"
 #include "symbol.h"
-
-struct perf_counts_values {
-	union {
-		struct {
-			u64 val;
-			u64 ena;
-			u64 run;
-		};
-		u64 values[3];
-	};
-};
-
-struct perf_counts {
-	s8		   	  scaled;
-	struct perf_counts_values aggr;
-	struct perf_counts_values cpu[];
-};
+#include "cpumap.h"
+#include "stat.h"
 
 struct perf_evsel;
 
@@ -82,6 +67,7 @@ struct perf_evsel {
 	struct cgroup_sel	*cgrp;
 	void			*handler;
 	struct cpu_map		*cpus;
+	struct thread_map	*threads;
 	unsigned int		sample_size;
 	int			id_pos;
 	int			is_pos;
@@ -113,10 +99,20 @@ struct thread_map;
 struct perf_evlist;
 struct record_opts;
 
+static inline struct cpu_map *perf_evsel__cpus(struct perf_evsel *evsel)
+{
+	return evsel->cpus;
+}
+
+static inline int perf_evsel__nr_cpus(struct perf_evsel *evsel)
+{
+	return perf_evsel__cpus(evsel)->nr;
+}
+
 void perf_counts_values__scale(struct perf_counts_values *count,
 			       bool scale, s8 *pscaled);
 
-void perf_evsel__compute_deltas(struct perf_evsel *evsel, int cpu,
+void perf_evsel__compute_deltas(struct perf_evsel *evsel, int cpu, int thread,
 				struct perf_counts_values *count);
 
 int perf_evsel__object_config(size_t object_size,
@@ -233,12 +229,8 @@ static inline bool perf_evsel__match2(struct perf_evsel *e1,
 	 (a)->attr.type == (b)->attr.type &&	\
 	 (a)->attr.config == (b)->attr.config)
 
-typedef int (perf_evsel__read_cb_t)(struct perf_evsel *evsel,
-				    int cpu, int thread,
-				    struct perf_counts_values *count);
-
-int perf_evsel__read_cb(struct perf_evsel *evsel, int cpu, int thread,
-			perf_evsel__read_cb_t cb);
+int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
+		     struct perf_counts_values *count);
 
 int __perf_evsel__read_on_cpu(struct perf_evsel *evsel,
 			      int cpu, int thread, bool scale);
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 21a77e7a171e..03ace57a800c 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -1063,8 +1063,7 @@ read_event_desc(struct perf_header *ph, int fd)
 	free(buf);
 	return events;
 error:
-	if (events)
-		free_event_desc(events);
+	free_event_desc(events);
 	events = NULL;
 	goto out;
 }
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 4744673aff1b..7ff682770fdb 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1448,10 +1448,9 @@ int machine__process_event(struct machine *machine, union perf_event *event,
 	case PERF_RECORD_AUX:
 		ret = machine__process_aux_event(machine, event); break;
 	case PERF_RECORD_ITRACE_START:
-		ret = machine__process_itrace_start_event(machine, event);
+		ret = machine__process_itrace_start_event(machine, event); break;
 	case PERF_RECORD_LOST_SAMPLES:
 		ret = machine__process_lost_samples_event(machine, event, sample); break;
-		break;
 	default:
 		ret = -1;
 		break;
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 2a4d1ec02846..09f8d2357108 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -17,6 +17,7 @@
 #include "parse-events-flex.h"
 #include "pmu.h"
 #include "thread_map.h"
+#include "cpumap.h"
 #include "asm/bug.h"
 
 #define MAX_NAME_LEN 100
@@ -285,7 +286,9 @@ __add_event(struct list_head *list, int *idx,
 	if (!evsel)
 		return NULL;
 
-	evsel->cpus = cpus;
+	if (cpus)
+		evsel->cpus = cpu_map__get(cpus);
+
 	if (name)
 		evsel->name = strdup(name);
 	list_add_tail(&evsel->node, list);
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 09e738fe9ea2..13cef3c65565 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -119,8 +119,8 @@ event		[^,{}/]+
 num_dec		[0-9]+
 num_hex		0x[a-fA-F0-9]+
 num_raw_hex	[a-fA-F0-9]+
-name		[a-zA-Z_*?][a-zA-Z0-9_*?]*
-name_minus	[a-zA-Z_*?][a-zA-Z0-9\-_*?]*
+name		[a-zA-Z_*?][a-zA-Z0-9_*?.]*
+name_minus	[a-zA-Z_*?][a-zA-Z0-9\-_*?.]*
 /* If you add a modifier you need to update check_modifier() */
 modifier_event	[ukhpGHSDI]+
 modifier_bp	[rwx]{1,3}
@@ -165,7 +165,6 @@ modifier_bp	[rwx]{1,3}
 			return PE_EVENT_NAME;
 		}
 
-.		|
 <<EOF>>		{
 			BEGIN(INITIAL);
 			REWIND(0);
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 0fcc624eb767..7bcb8c315615 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1,4 +1,5 @@
 #include <linux/list.h>
+#include <linux/compiler.h>
 #include <sys/types.h>
 #include <unistd.h>
 #include <stdio.h>
@@ -205,17 +206,12 @@ static int perf_pmu__parse_snapshot(struct perf_pmu_alias *alias,
 	return 0;
 }
 
-static int perf_pmu__new_alias(struct list_head *list, char *dir, char *name, FILE *file)
+static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
+				 char *desc __maybe_unused, char *val)
 {
 	struct perf_pmu_alias *alias;
-	char buf[256];
 	int ret;
 
-	ret = fread(buf, 1, sizeof(buf), file);
-	if (ret == 0)
-		return -EINVAL;
-	buf[ret] = 0;
-
 	alias = malloc(sizeof(*alias));
 	if (!alias)
 		return -ENOMEM;
@@ -225,26 +221,43 @@ static int perf_pmu__new_alias(struct list_head *list, char *dir, char *name, FI
 	alias->unit[0] = '\0';
 	alias->per_pkg = false;
 
-	ret = parse_events_terms(&alias->terms, buf);
+	ret = parse_events_terms(&alias->terms, val);
 	if (ret) {
+		pr_err("Cannot parse alias %s: %d\n", val, ret);
 		free(alias);
 		return ret;
 	}
 
 	alias->name = strdup(name);
-	/*
-	 * load unit name and scale if available
-	 */
-	perf_pmu__parse_unit(alias, dir, name);
-	perf_pmu__parse_scale(alias, dir, name);
-	perf_pmu__parse_per_pkg(alias, dir, name);
-	perf_pmu__parse_snapshot(alias, dir, name);
+	if (dir) {
+		/*
+		 * load unit name and scale if available
+		 */
+		perf_pmu__parse_unit(alias, dir, name);
+		perf_pmu__parse_scale(alias, dir, name);
+		perf_pmu__parse_per_pkg(alias, dir, name);
+		perf_pmu__parse_snapshot(alias, dir, name);
+	}
 
 	list_add_tail(&alias->list, list);
 
 	return 0;
 }
 
+static int perf_pmu__new_alias(struct list_head *list, char *dir, char *name, FILE *file)
+{
+	char buf[256];
+	int ret;
+
+	ret = fread(buf, 1, sizeof(buf), file);
+	if (ret == 0)
+		return -EINVAL;
+
+	buf[ret] = 0;
+
+	return __perf_pmu__new_alias(list, dir, name, NULL, buf);
+}
+
 static inline bool pmu_alias_info_file(char *name)
 {
 	size_t len;
@@ -436,7 +449,7 @@ static struct cpu_map *pmu_cpumask(const char *name)
 	return cpus;
 }
 
-struct perf_event_attr *__attribute__((weak))
+struct perf_event_attr * __weak
 perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused)
 {
 	return NULL;
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 076527b639bd..381f23a443c7 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -249,8 +249,12 @@ static void clear_probe_trace_events(struct probe_trace_event *tevs, int ntevs)
 static bool kprobe_blacklist__listed(unsigned long address);
 static bool kprobe_warn_out_range(const char *symbol, unsigned long address)
 {
+	u64 etext_addr;
+
 	/* Get the address of _etext for checking non-probable text symbol */
-	if (kernel_get_symbol_address_by_name("_etext", false) < address)
+	etext_addr = kernel_get_symbol_address_by_name("_etext", false);
+
+	if (etext_addr != 0 && etext_addr < address)
 		pr_warning("%s is out of .text, skip it.\n", symbol);
 	else if (kprobe_blacklist__listed(address))
 		pr_warning("%s is blacklisted function, skip it.\n", symbol);
diff --git a/tools/perf/util/python-ext-sources b/tools/perf/util/python-ext-sources
index 5925fec90562..e23ded40c79e 100644
--- a/tools/perf/util/python-ext-sources
+++ b/tools/perf/util/python-ext-sources
@@ -20,3 +20,4 @@ util/stat.c
 util/strlist.c
 util/trace-event.c
 ../../lib/rbtree.c
+util/string.c
diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index d906d0ad5d40..626422eda727 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -384,7 +384,7 @@ static int pyrf_cpu_map__init(struct pyrf_cpu_map *pcpus,
 
 static void pyrf_cpu_map__delete(struct pyrf_cpu_map *pcpus)
 {
-	cpu_map__delete(pcpus->cpus);
+	cpu_map__put(pcpus->cpus);
 	pcpus->ob_type->tp_free((PyObject*)pcpus);
 }
 
@@ -453,7 +453,7 @@ static int pyrf_thread_map__init(struct pyrf_thread_map *pthreads,
 
 static void pyrf_thread_map__delete(struct pyrf_thread_map *pthreads)
 {
-	thread_map__delete(pthreads->threads);
+	thread_map__put(pthreads->threads);
 	pthreads->ob_type->tp_free((PyObject*)pthreads);
 }
 
diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index d457c523a33d..1f7becbe5e18 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -64,7 +64,7 @@ static bool perf_probe_api(setup_probe_fn_t fn)
 	if (!cpus)
 		return false;
 	cpu = cpus->map[0];
-	cpu_map__delete(cpus);
+	cpu_map__put(cpus);
 
 	do {
 		ret = perf_do_probe_api(fn, cpu, try[i++]);
@@ -226,7 +226,7 @@ bool perf_evlist__can_select_event(struct perf_evlist *evlist, const char *str)
 		struct cpu_map *cpus = cpu_map__new(NULL);
 
 		cpu =  cpus ? cpus->map[0] : 0;
-		cpu_map__delete(cpus);
+		cpu_map__put(cpus);
 	} else {
 		cpu = evlist->cpus->map[0];
 	}
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index aa482c10469d..ed9dc2555ec7 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -686,6 +686,8 @@ static int process_finished_round(struct perf_tool *tool __maybe_unused,
 				  union perf_event *event __maybe_unused,
 				  struct ordered_events *oe)
 {
+	if (dump_trace)
+		fprintf(stdout, "\n");
 	return ordered_events__flush(oe, OE_FLUSH__ROUND);
 }
 
@@ -1726,7 +1728,7 @@ size_t perf_session__fprintf_nr_events(struct perf_session *session, FILE *fp)
 	if (perf_header__has_feat(&session->header, HEADER_AUXTRACE))
 		msg = " (excludes AUX area (e.g. instruction trace) decoded / synthesized events)";
 
-	ret = fprintf(fp, "Aggregated stats:%s\n", msg);
+	ret = fprintf(fp, "\nAggregated stats:%s\n", msg);
 
 	ret += events_stats__fprintf(&session->evlist->stats, fp);
 	return ret;
@@ -1893,7 +1895,7 @@ int perf_session__cpu_bitmap(struct perf_session *session,
 	err = 0;
 
 out_delete_map:
-	cpu_map__delete(map);
+	cpu_map__put(map);
 	return err;
 }
 
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 4014b709f956..f2a0d1521e26 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -1,6 +1,8 @@
 #include <math.h>
 #include "stat.h"
+#include "evlist.h"
 #include "evsel.h"
+#include "thread_map.h"
 
 void update_stats(struct stats *stats, u64 val)
 {
@@ -95,33 +97,46 @@ void perf_stat_evsel_id_init(struct perf_evsel *evsel)
 	}
 }
 
-struct perf_counts *perf_counts__new(int ncpus)
+struct perf_counts *perf_counts__new(int ncpus, int nthreads)
 {
-	int size = sizeof(struct perf_counts) +
-		   ncpus * sizeof(struct perf_counts_values);
+	struct perf_counts *counts = zalloc(sizeof(*counts));
 
-	return zalloc(size);
+	if (counts) {
+		struct xyarray *values;
+
+		values = xyarray__new(ncpus, nthreads, sizeof(struct perf_counts_values));
+		if (!values) {
+			free(counts);
+			return NULL;
+		}
+
+		counts->values = values;
+	}
+
+	return counts;
 }
 
 void perf_counts__delete(struct perf_counts *counts)
 {
-	free(counts);
+	if (counts) {
+		xyarray__delete(counts->values);
+		free(counts);
+	}
 }
 
-static void perf_counts__reset(struct perf_counts *counts, int ncpus)
+static void perf_counts__reset(struct perf_counts *counts)
 {
-	memset(counts, 0, (sizeof(*counts) +
-	       (ncpus * sizeof(struct perf_counts_values))));
+	xyarray__reset(counts->values);
 }
 
-void perf_evsel__reset_counts(struct perf_evsel *evsel, int ncpus)
+void perf_evsel__reset_counts(struct perf_evsel *evsel)
 {
-	perf_counts__reset(evsel->counts, ncpus);
+	perf_counts__reset(evsel->counts);
 }
 
-int perf_evsel__alloc_counts(struct perf_evsel *evsel, int ncpus)
+int perf_evsel__alloc_counts(struct perf_evsel *evsel, int ncpus, int nthreads)
 {
-	evsel->counts = perf_counts__new(ncpus);
+	evsel->counts = perf_counts__new(ncpus, nthreads);
 	return evsel->counts != NULL ? 0 : -ENOMEM;
 }
 
@@ -130,3 +145,96 @@ void perf_evsel__free_counts(struct perf_evsel *evsel)
 	perf_counts__delete(evsel->counts);
 	evsel->counts = NULL;
 }
+
+void perf_evsel__reset_stat_priv(struct perf_evsel *evsel)
+{
+	int i;
+	struct perf_stat *ps = evsel->priv;
+
+	for (i = 0; i < 3; i++)
+		init_stats(&ps->res_stats[i]);
+
+	perf_stat_evsel_id_init(evsel);
+}
+
+int perf_evsel__alloc_stat_priv(struct perf_evsel *evsel)
+{
+	evsel->priv = zalloc(sizeof(struct perf_stat));
+	if (evsel->priv == NULL)
+		return -ENOMEM;
+	perf_evsel__reset_stat_priv(evsel);
+	return 0;
+}
+
+void perf_evsel__free_stat_priv(struct perf_evsel *evsel)
+{
+	zfree(&evsel->priv);
+}
+
+int perf_evsel__alloc_prev_raw_counts(struct perf_evsel *evsel,
+				      int ncpus, int nthreads)
+{
+	struct perf_counts *counts;
+
+	counts = perf_counts__new(ncpus, nthreads);
+	if (counts)
+		evsel->prev_raw_counts = counts;
+
+	return counts ? 0 : -ENOMEM;
+}
+
+void perf_evsel__free_prev_raw_counts(struct perf_evsel *evsel)
+{
+	perf_counts__delete(evsel->prev_raw_counts);
+	evsel->prev_raw_counts = NULL;
+}
+
+int perf_evsel__alloc_stats(struct perf_evsel *evsel, bool alloc_raw)
+{
+	int ncpus = perf_evsel__nr_cpus(evsel);
+	int nthreads = thread_map__nr(evsel->threads);
+
+	if (perf_evsel__alloc_stat_priv(evsel) < 0 ||
+	    perf_evsel__alloc_counts(evsel, ncpus, nthreads) < 0 ||
+	    (alloc_raw && perf_evsel__alloc_prev_raw_counts(evsel, ncpus, nthreads) < 0))
+		return -ENOMEM;
+
+	return 0;
+}
+
+int perf_evlist__alloc_stats(struct perf_evlist *evlist, bool alloc_raw)
+{
+	struct perf_evsel *evsel;
+
+	evlist__for_each(evlist, evsel) {
+		if (perf_evsel__alloc_stats(evsel, alloc_raw))
+			goto out_free;
+	}
+
+	return 0;
+
+out_free:
+	perf_evlist__free_stats(evlist);
+	return -1;
+}
+
+void perf_evlist__free_stats(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+
+	evlist__for_each(evlist, evsel) {
+		perf_evsel__free_stat_priv(evsel);
+		perf_evsel__free_counts(evsel);
+		perf_evsel__free_prev_raw_counts(evsel);
+	}
+}
+
+void perf_evlist__reset_stats(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+
+	evlist__for_each(evlist, evsel) {
+		perf_evsel__reset_stat_priv(evsel);
+		perf_evsel__reset_counts(evsel);
+	}
+}
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 093dc3cb28dd..1cfbe0a980ac 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -3,6 +3,7 @@
 
 #include <linux/types.h>
 #include <stdio.h>
+#include "xyarray.h"
 
 struct stats
 {
@@ -29,8 +30,32 @@ enum aggr_mode {
 	AGGR_GLOBAL,
 	AGGR_SOCKET,
 	AGGR_CORE,
+	AGGR_THREAD,
 };
 
+struct perf_counts_values {
+	union {
+		struct {
+			u64 val;
+			u64 ena;
+			u64 run;
+		};
+		u64 values[3];
+	};
+};
+
+struct perf_counts {
+	s8			  scaled;
+	struct perf_counts_values aggr;
+	struct xyarray		  *values;
+};
+
+static inline struct perf_counts_values*
+perf_counts(struct perf_counts *counts, int cpu, int thread)
+{
+	return xyarray__entry(counts->values, cpu, thread);
+}
+
 void update_stats(struct stats *stats, u64 val);
 double avg_stats(struct stats *stats);
 double stddev_stats(struct stats *stats);
@@ -46,6 +71,8 @@ static inline void init_stats(struct stats *stats)
 }
 
 struct perf_evsel;
+struct perf_evlist;
+
 bool __perf_evsel_stat__is(struct perf_evsel *evsel,
 			   enum perf_stat_evsel_id id);
 
@@ -62,10 +89,24 @@ void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 *count,
 void perf_stat__print_shadow_stats(FILE *out, struct perf_evsel *evsel,
 				   double avg, int cpu, enum aggr_mode aggr);
 
-struct perf_counts *perf_counts__new(int ncpus);
+struct perf_counts *perf_counts__new(int ncpus, int nthreads);
 void perf_counts__delete(struct perf_counts *counts);
 
-void perf_evsel__reset_counts(struct perf_evsel *evsel, int ncpus);
-int perf_evsel__alloc_counts(struct perf_evsel *evsel, int ncpus);
+void perf_evsel__reset_counts(struct perf_evsel *evsel);
+int perf_evsel__alloc_counts(struct perf_evsel *evsel, int ncpus, int nthreads);
 void perf_evsel__free_counts(struct perf_evsel *evsel);
+
+void perf_evsel__reset_stat_priv(struct perf_evsel *evsel);
+int perf_evsel__alloc_stat_priv(struct perf_evsel *evsel);
+void perf_evsel__free_stat_priv(struct perf_evsel *evsel);
+
+int perf_evsel__alloc_prev_raw_counts(struct perf_evsel *evsel,
+				      int ncpus, int nthreads);
+void perf_evsel__free_prev_raw_counts(struct perf_evsel *evsel);
+
+int perf_evsel__alloc_stats(struct perf_evsel *evsel, bool alloc_raw);
+
+int perf_evlist__alloc_stats(struct perf_evlist *evlist, bool alloc_raw);
+void perf_evlist__free_stats(struct perf_evlist *evlist);
+void perf_evlist__reset_stats(struct perf_evlist *evlist);
 #endif
diff --git a/tools/perf/util/svghelper.c b/tools/perf/util/svghelper.c
index 283d3e73e2f2..eec6c1149f44 100644
--- a/tools/perf/util/svghelper.c
+++ b/tools/perf/util/svghelper.c
@@ -748,7 +748,7 @@ static int str_to_bitmap(char *s, cpumask_t *b)
 		set_bit(c, cpumask_bits(b));
 	}
 
-	cpu_map__delete(m);
+	cpu_map__put(m);
 
 	return ret;
 }
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 504f2d73b7ee..48b588c6951a 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1132,8 +1132,11 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
 	INIT_LIST_HEAD(&md.maps);
 
 	fd = open(kcore_filename, O_RDONLY);
-	if (fd < 0)
+	if (fd < 0) {
+		pr_err("%s requires CAP_SYS_RAWIO capability to access.\n",
+			kcore_filename);
 		return -EINVAL;
+	}
 
 	/* Read new maps into temporary lists */
 	err = file__read_maps(fd, md.type == MAP__FUNCTION, kcore_mapfn, &md,
diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
index f4822bd03709..da7646d767fe 100644
--- a/tools/perf/util/thread_map.c
+++ b/tools/perf/util/thread_map.c
@@ -8,8 +8,11 @@
 #include <unistd.h>
 #include "strlist.h"
 #include <string.h>
+#include <api/fs/fs.h>
+#include "asm/bug.h"
 #include "thread_map.h"
 #include "util.h"
+#include "debug.h"
 
 /* Skip "." and ".." directories */
 static int filter(const struct dirent *dir)
@@ -20,11 +23,26 @@ static int filter(const struct dirent *dir)
 		return 1;
 }
 
+static void thread_map__reset(struct thread_map *map, int start, int nr)
+{
+	size_t size = (nr - start) * sizeof(map->map[0]);
+
+	memset(&map->map[start], 0, size);
+}
+
 static struct thread_map *thread_map__realloc(struct thread_map *map, int nr)
 {
-	size_t size = sizeof(*map) + sizeof(pid_t) * nr;
+	size_t size = sizeof(*map) + sizeof(map->map[0]) * nr;
+	int start = map ? map->nr : 0;
+
+	map = realloc(map, size);
+	/*
+	 * We only realloc to add more items, let's reset new items.
+	 */
+	if (map)
+		thread_map__reset(map, start, nr);
 
-	return realloc(map, size);
+	return map;
 }
 
 #define thread_map__alloc(__nr) thread_map__realloc(NULL, __nr)
@@ -45,8 +63,9 @@ struct thread_map *thread_map__new_by_pid(pid_t pid)
 	threads = thread_map__alloc(items);
 	if (threads != NULL) {
 		for (i = 0; i < items; i++)
-			threads->map[i] = atoi(namelist[i]->d_name);
+			thread_map__set_pid(threads, i, atoi(namelist[i]->d_name));
 		threads->nr = items;
+		atomic_set(&threads->refcnt, 1);
 	}
 
 	for (i=0; i<items; i++)
@@ -61,8 +80,9 @@ struct thread_map *thread_map__new_by_tid(pid_t tid)
 	struct thread_map *threads = thread_map__alloc(1);
 
 	if (threads != NULL) {
-		threads->map[0] = tid;
-		threads->nr	= 1;
+		thread_map__set_pid(threads, 0, tid);
+		threads->nr = 1;
+		atomic_set(&threads->refcnt, 1);
 	}
 
 	return threads;
@@ -84,6 +104,7 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
 		goto out_free_threads;
 
 	threads->nr = 0;
+	atomic_set(&threads->refcnt, 1);
 
 	while (!readdir_r(proc, &dirent, &next) && next) {
 		char *end;
@@ -123,8 +144,10 @@ struct thread_map *thread_map__new_by_uid(uid_t uid)
 			threads = tmp;
 		}
 
-		for (i = 0; i < items; i++)
-			threads->map[threads->nr + i] = atoi(namelist[i]->d_name);
+		for (i = 0; i < items; i++) {
+			thread_map__set_pid(threads, threads->nr + i,
+					    atoi(namelist[i]->d_name));
+		}
 
 		for (i = 0; i < items; i++)
 			zfree(&namelist[i]);
@@ -201,7 +224,7 @@ static struct thread_map *thread_map__new_by_pid_str(const char *pid_str)
 		threads = nt;
 
 		for (i = 0; i < items; i++) {
-			threads->map[j++] = atoi(namelist[i]->d_name);
+			thread_map__set_pid(threads, j++, atoi(namelist[i]->d_name));
 			zfree(&namelist[i]);
 		}
 		threads->nr = total_tasks;
@@ -210,6 +233,8 @@ static struct thread_map *thread_map__new_by_pid_str(const char *pid_str)
 
 out:
 	strlist__delete(slist);
+	if (threads)
+		atomic_set(&threads->refcnt, 1);
 	return threads;
 
 out_free_namelist:
@@ -227,8 +252,9 @@ struct thread_map *thread_map__new_dummy(void)
 	struct thread_map *threads = thread_map__alloc(1);
 
 	if (threads != NULL) {
-		threads->map[0]	= -1;
-		threads->nr	= 1;
+		thread_map__set_pid(threads, 0, -1);
+		threads->nr = 1;
+		atomic_set(&threads->refcnt, 1);
 	}
 	return threads;
 }
@@ -267,10 +293,12 @@ static struct thread_map *thread_map__new_by_tid_str(const char *tid_str)
 			goto out_free_threads;
 
 		threads = nt;
-		threads->map[ntasks - 1] = tid;
-		threads->nr		 = ntasks;
+		thread_map__set_pid(threads, ntasks - 1, tid);
+		threads->nr = ntasks;
 	}
 out:
+	if (threads)
+		atomic_set(&threads->refcnt, 1);
 	return threads;
 
 out_free_threads:
@@ -290,9 +318,30 @@ struct thread_map *thread_map__new_str(const char *pid, const char *tid,
 	return thread_map__new_by_tid_str(tid);
 }
 
-void thread_map__delete(struct thread_map *threads)
+static void thread_map__delete(struct thread_map *threads)
 {
-	free(threads);
+	if (threads) {
+		int i;
+
+		WARN_ONCE(atomic_read(&threads->refcnt) != 0,
+			  "thread map refcnt unbalanced\n");
+		for (i = 0; i < threads->nr; i++)
+			free(thread_map__comm(threads, i));
+		free(threads);
+	}
+}
+
+struct thread_map *thread_map__get(struct thread_map *map)
+{
+	if (map)
+		atomic_inc(&map->refcnt);
+	return map;
+}
+
+void thread_map__put(struct thread_map *map)
+{
+	if (map && atomic_dec_and_test(&map->refcnt))
+		thread_map__delete(map);
 }
 
 size_t thread_map__fprintf(struct thread_map *threads, FILE *fp)
@@ -301,7 +350,60 @@ size_t thread_map__fprintf(struct thread_map *threads, FILE *fp)
 	size_t printed = fprintf(fp, "%d thread%s: ",
 				 threads->nr, threads->nr > 1 ? "s" : "");
 	for (i = 0; i < threads->nr; ++i)
-		printed += fprintf(fp, "%s%d", i ? ", " : "", threads->map[i]);
+		printed += fprintf(fp, "%s%d", i ? ", " : "", thread_map__pid(threads, i));
 
 	return printed + fprintf(fp, "\n");
 }
+
+static int get_comm(char **comm, pid_t pid)
+{
+	char *path;
+	size_t size;
+	int err;
+
+	if (asprintf(&path, "%s/%d/comm", procfs__mountpoint(), pid) == -1)
+		return -ENOMEM;
+
+	err = filename__read_str(path, comm, &size);
+	if (!err) {
+		/*
+		 * We're reading 16 bytes, while filename__read_str
+		 * allocates data per BUFSIZ bytes, so we can safely
+		 * mark the end of the string.
+		 */
+		(*comm)[size] = 0;
+		rtrim(*comm);
+	}
+
+	free(path);
+	return err;
+}
+
+static void comm_init(struct thread_map *map, int i)
+{
+	pid_t pid = thread_map__pid(map, i);
+	char *comm = NULL;
+
+	/* dummy pid comm initialization */
+	if (pid == -1) {
+		map->map[i].comm = strdup("dummy");
+		return;
+	}
+
+	/*
+	 * The comm name is like extra bonus ;-),
+	 * so just warn if we fail for any reason.
+	 */
+	if (get_comm(&comm, pid))
+		pr_warning("Couldn't resolve comm name for pid %d\n", pid);
+
+	map->map[i].comm = comm;
+}
+
+void thread_map__read_comms(struct thread_map *threads)
+{
+	int i;
+
+	for (i = 0; i < threads->nr; ++i)
+		comm_init(threads, i);
+}
diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h
index 95313f43cc0f..af679d8a50f8 100644
--- a/tools/perf/util/thread_map.h
+++ b/tools/perf/util/thread_map.h
@@ -3,10 +3,17 @@
 
 #include <sys/types.h>
 #include <stdio.h>
+#include <linux/atomic.h>
+
+struct thread_map_data {
+	pid_t    pid;
+	char	*comm;
+};
 
 struct thread_map {
+	atomic_t refcnt;
 	int nr;
-	pid_t map[];
+	struct thread_map_data map[];
 };
 
 struct thread_map *thread_map__new_dummy(void);
@@ -15,11 +22,12 @@ struct thread_map *thread_map__new_by_tid(pid_t tid);
 struct thread_map *thread_map__new_by_uid(uid_t uid);
 struct thread_map *thread_map__new(pid_t pid, pid_t tid, uid_t uid);
 
+struct thread_map *thread_map__get(struct thread_map *map);
+void thread_map__put(struct thread_map *map);
+
 struct thread_map *thread_map__new_str(const char *pid,
 		const char *tid, uid_t uid);
 
-void thread_map__delete(struct thread_map *threads);
-
 size_t thread_map__fprintf(struct thread_map *threads, FILE *fp);
 
 static inline int thread_map__nr(struct thread_map *threads)
@@ -27,4 +35,21 @@ static inline int thread_map__nr(struct thread_map *threads)
 	return threads ? threads->nr : 1;
 }
 
+static inline pid_t thread_map__pid(struct thread_map *map, int thread)
+{
+	return map->map[thread].pid;
+}
+
+static inline void
+thread_map__set_pid(struct thread_map *map, int thread, pid_t pid)
+{
+	map->map[thread].pid = pid;
+}
+
+static inline char *thread_map__comm(struct thread_map *map, int thread)
+{
+	return map->map[thread].comm;
+}
+
+void thread_map__read_comms(struct thread_map *threads);
 #endif	/* __PERF_THREAD_MAP_H */

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [GIT PULL] perf updates
@ 2015-04-18 15:22 Ingo Molnar
  0 siblings, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2015-04-18 15:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Andrew Morton

Linus,

Please pull the latest perf-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-urgent-for-linus

   # HEAD: 0c99241c93b8060441f3c8434848e54b5338f922 perf/x86/intel/pt: Fix and clean up error handling in pt_event_add()

This update has mostly fixes, but also other bits:

  - perf tooling fixes

  - PMU driver fixes

  - Intel Broadwell PMU driver HW-enablement for LBR callstacks

  - a late coming 'perf kmem' tool update that enables it to also 
    analyze page allocation data. Note, this comes with MM tracepoint
    changes that we believe to not break anything: because it changes
    the formerly opaque 'struct page *' field that uniquely identifies
    pages to 'pfn' which identifies pages uniquely too, but isn't as
    opaque and can be used for other purposes as well.

 Thanks,

	Ingo

------------------>
He Kuang (2):
      perf probe: Set retprobe flag when probe in address-based alternative mode
      perf probe: Fix segfault when probe with lazy_line to file

Ingo Molnar (1):
      perf/x86/intel/pt: Fix and clean up error handling in pt_event_add()

Jacob Pan (1):
      perf/x86/intel/rapl: Fix energy counter measurements but supporing per domain energy units

Kan Liang (1):
      perf/x86/intel: Add Broadwell support for the LBR callstack

Namhyung Kim (2):
      tracing, mm: Record pfn instead of pointer to struct page
      perf kmem: Analyze page allocator events also

Naohiro Aota (1):
      perf probe: Find compilation directory path for lazy matching

Peter Zijlstra (2):
      perf/x86: Fix hw_perf_event::flags collision
      perf/x86/intel: Fix Core2,Atom,NHM,WSM cycles:pp events


 arch/x86/kernel/cpu/perf_event.h            |  18 +-
 arch/x86/kernel/cpu/perf_event_intel.c      |   2 +-
 arch/x86/kernel/cpu/perf_event_intel_ds.c   |   8 +
 arch/x86/kernel/cpu/perf_event_intel_pt.c   |  33 +-
 arch/x86/kernel/cpu/perf_event_intel_rapl.c |  94 ++++--
 include/trace/events/filemap.h              |   8 +-
 include/trace/events/kmem.h                 |  42 +--
 include/trace/events/vmscan.h               |   8 +-
 tools/perf/Documentation/perf-kmem.txt      |   8 +-
 tools/perf/builtin-kmem.c                   | 500 +++++++++++++++++++++++++++-
 tools/perf/util/probe-event.c               |  60 +---
 tools/perf/util/probe-finder.c              |  73 +++-
 tools/perf/util/probe-finder.h              |   4 +
 13 files changed, 702 insertions(+), 156 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 329f0356ad4a..6ac5cb7a9e14 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -65,15 +65,15 @@ struct event_constraint {
 /*
  * struct hw_perf_event.flags flags
  */
-#define PERF_X86_EVENT_PEBS_LDLAT	0x1 /* ld+ldlat data address sampling */
-#define PERF_X86_EVENT_PEBS_ST		0x2 /* st data address sampling */
-#define PERF_X86_EVENT_PEBS_ST_HSW	0x4 /* haswell style datala, store */
-#define PERF_X86_EVENT_COMMITTED	0x8 /* event passed commit_txn */
-#define PERF_X86_EVENT_PEBS_LD_HSW	0x10 /* haswell style datala, load */
-#define PERF_X86_EVENT_PEBS_NA_HSW	0x20 /* haswell style datala, unknown */
-#define PERF_X86_EVENT_EXCL		0x40 /* HT exclusivity on counter */
-#define PERF_X86_EVENT_DYNAMIC		0x80 /* dynamic alloc'd constraint */
-#define PERF_X86_EVENT_RDPMC_ALLOWED	0x40 /* grant rdpmc permission */
+#define PERF_X86_EVENT_PEBS_LDLAT	0x0001 /* ld+ldlat data address sampling */
+#define PERF_X86_EVENT_PEBS_ST		0x0002 /* st data address sampling */
+#define PERF_X86_EVENT_PEBS_ST_HSW	0x0004 /* haswell style datala, store */
+#define PERF_X86_EVENT_COMMITTED	0x0008 /* event passed commit_txn */
+#define PERF_X86_EVENT_PEBS_LD_HSW	0x0010 /* haswell style datala, load */
+#define PERF_X86_EVENT_PEBS_NA_HSW	0x0020 /* haswell style datala, unknown */
+#define PERF_X86_EVENT_EXCL		0x0040 /* HT exclusivity on counter */
+#define PERF_X86_EVENT_DYNAMIC		0x0080 /* dynamic alloc'd constraint */
+#define PERF_X86_EVENT_RDPMC_ALLOWED	0x0100 /* grant rdpmc permission */
 
 
 struct amd_nb {
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 9da2400c2ec3..219d3fb423a1 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -3275,7 +3275,7 @@ __init int intel_pmu_init(void)
 		hw_cache_extra_regs[C(NODE)][C(OP_WRITE)][C(RESULT_ACCESS)] = HSW_DEMAND_WRITE|
 									      BDW_L3_MISS_LOCAL|HSW_SNOOP_DRAM;
 
-		intel_pmu_lbr_init_snb();
+		intel_pmu_lbr_init_hsw();
 
 		x86_pmu.event_constraints = intel_bdw_event_constraints;
 		x86_pmu.pebs_constraints = intel_hsw_pebs_event_constraints;
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index ca69ea56c712..813f75d71175 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -558,6 +558,8 @@ struct event_constraint intel_core2_pebs_event_constraints[] = {
 	INTEL_FLAGS_UEVENT_CONSTRAINT(0x00c5, 0x1), /* BR_INST_RETIRED.MISPRED */
 	INTEL_FLAGS_UEVENT_CONSTRAINT(0x1fc7, 0x1), /* SIMD_INST_RETURED.ANY */
 	INTEL_FLAGS_EVENT_CONSTRAINT(0xcb, 0x1),    /* MEM_LOAD_RETIRED.* */
+	/* INST_RETIRED.ANY_P, inv=1, cmask=16 (cycles:p). */
+	INTEL_FLAGS_EVENT_CONSTRAINT(0x108000c0, 0x01),
 	EVENT_CONSTRAINT_END
 };
 
@@ -565,6 +567,8 @@ struct event_constraint intel_atom_pebs_event_constraints[] = {
 	INTEL_FLAGS_UEVENT_CONSTRAINT(0x00c0, 0x1), /* INST_RETIRED.ANY */
 	INTEL_FLAGS_UEVENT_CONSTRAINT(0x00c5, 0x1), /* MISPREDICTED_BRANCH_RETIRED */
 	INTEL_FLAGS_EVENT_CONSTRAINT(0xcb, 0x1),    /* MEM_LOAD_RETIRED.* */
+	/* INST_RETIRED.ANY_P, inv=1, cmask=16 (cycles:p). */
+	INTEL_FLAGS_EVENT_CONSTRAINT(0x108000c0, 0x01),
 	EVENT_CONSTRAINT_END
 };
 
@@ -588,6 +592,8 @@ struct event_constraint intel_nehalem_pebs_event_constraints[] = {
 	INTEL_FLAGS_UEVENT_CONSTRAINT(0x20c8, 0xf), /* ITLB_MISS_RETIRED */
 	INTEL_FLAGS_EVENT_CONSTRAINT(0xcb, 0xf),    /* MEM_LOAD_RETIRED.* */
 	INTEL_FLAGS_EVENT_CONSTRAINT(0xf7, 0xf),    /* FP_ASSIST.* */
+	/* INST_RETIRED.ANY_P, inv=1, cmask=16 (cycles:p). */
+	INTEL_FLAGS_EVENT_CONSTRAINT(0x108000c0, 0x0f),
 	EVENT_CONSTRAINT_END
 };
 
@@ -603,6 +609,8 @@ struct event_constraint intel_westmere_pebs_event_constraints[] = {
 	INTEL_FLAGS_UEVENT_CONSTRAINT(0x20c8, 0xf), /* ITLB_MISS_RETIRED */
 	INTEL_FLAGS_EVENT_CONSTRAINT(0xcb, 0xf),    /* MEM_LOAD_RETIRED.* */
 	INTEL_FLAGS_EVENT_CONSTRAINT(0xf7, 0xf),    /* FP_ASSIST.* */
+	/* INST_RETIRED.ANY_P, inv=1, cmask=16 (cycles:p). */
+	INTEL_FLAGS_EVENT_CONSTRAINT(0x108000c0, 0x0f),
 	EVENT_CONSTRAINT_END
 };
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel_pt.c b/arch/x86/kernel/cpu/perf_event_intel_pt.c
index f2770641c0fd..ffe666c2c6b5 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_pt.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_pt.c
@@ -988,39 +988,36 @@ static int pt_event_add(struct perf_event *event, int mode)
 	int ret = -EBUSY;
 
 	if (pt->handle.event)
-		goto out;
+		goto fail;
 
 	buf = perf_aux_output_begin(&pt->handle, event);
-	if (!buf) {
-		ret = -EINVAL;
-		goto out;
-	}
+	ret = -EINVAL;
+	if (!buf)
+		goto fail_stop;
 
 	pt_buffer_reset_offsets(buf, pt->handle.head);
 	if (!buf->snapshot) {
 		ret = pt_buffer_reset_markers(buf, &pt->handle);
-		if (ret) {
-			perf_aux_output_end(&pt->handle, 0, true);
-			goto out;
-		}
+		if (ret)
+			goto fail_end_stop;
 	}
 
 	if (mode & PERF_EF_START) {
 		pt_event_start(event, 0);
-		if (hwc->state == PERF_HES_STOPPED) {
-			pt_event_del(event, 0);
-			ret = -EBUSY;
-		}
+		ret = -EBUSY;
+		if (hwc->state == PERF_HES_STOPPED)
+			goto fail_end_stop;
 	} else {
 		hwc->state = PERF_HES_STOPPED;
 	}
 
-	ret = 0;
-out:
-
-	if (ret)
-		hwc->state = PERF_HES_STOPPED;
+	return 0;
 
+fail_end_stop:
+	perf_aux_output_end(&pt->handle, 0, true);
+fail_stop:
+	hwc->state = PERF_HES_STOPPED;
+fail:
 	return ret;
 }
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel_rapl.c b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
index c4bb8b8e5017..999289b94025 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -62,6 +62,14 @@
 #define RAPL_IDX_PP1_NRG_STAT	3	/* gpu */
 #define INTEL_RAPL_PP1		0x4	/* pseudo-encoding */
 
+#define NR_RAPL_DOMAINS         0x4
+static const char *rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {
+	"pp0-core",
+	"package",
+	"dram",
+	"pp1-gpu",
+};
+
 /* Clients have PP0, PKG */
 #define RAPL_IDX_CLN	(1<<RAPL_IDX_PP0_NRG_STAT|\
 			 1<<RAPL_IDX_PKG_NRG_STAT|\
@@ -112,7 +120,6 @@ static struct perf_pmu_events_attr event_attr_##v = {			\
 
 struct rapl_pmu {
 	spinlock_t	 lock;
-	int		 hw_unit;  /* 1/2^hw_unit Joule */
 	int		 n_active; /* number of active events */
 	struct list_head active_list;
 	struct pmu	 *pmu; /* pointer to rapl_pmu_class */
@@ -120,6 +127,7 @@ struct rapl_pmu {
 	struct hrtimer   hrtimer;
 };
 
+static int rapl_hw_unit[NR_RAPL_DOMAINS] __read_mostly;  /* 1/2^hw_unit Joule */
 static struct pmu rapl_pmu_class;
 static cpumask_t rapl_cpu_mask;
 static int rapl_cntr_mask;
@@ -127,6 +135,7 @@ static int rapl_cntr_mask;
 static DEFINE_PER_CPU(struct rapl_pmu *, rapl_pmu);
 static DEFINE_PER_CPU(struct rapl_pmu *, rapl_pmu_to_free);
 
+static struct x86_pmu_quirk *rapl_quirks;
 static inline u64 rapl_read_counter(struct perf_event *event)
 {
 	u64 raw;
@@ -134,15 +143,28 @@ static inline u64 rapl_read_counter(struct perf_event *event)
 	return raw;
 }
 
-static inline u64 rapl_scale(u64 v)
+#define rapl_add_quirk(func_)						\
+do {									\
+	static struct x86_pmu_quirk __quirk __initdata = {		\
+		.func = func_,						\
+	};								\
+	__quirk.next = rapl_quirks;					\
+	rapl_quirks = &__quirk;						\
+} while (0)
+
+static inline u64 rapl_scale(u64 v, int cfg)
 {
+	if (cfg > NR_RAPL_DOMAINS) {
+		pr_warn("invalid domain %d, failed to scale data\n", cfg);
+		return v;
+	}
 	/*
 	 * scale delta to smallest unit (1/2^32)
 	 * users must then scale back: count * 1/(1e9*2^32) to get Joules
 	 * or use ldexp(count, -32).
 	 * Watts = Joules/Time delta
 	 */
-	return v << (32 - __this_cpu_read(rapl_pmu)->hw_unit);
+	return v << (32 - rapl_hw_unit[cfg - 1]);
 }
 
 static u64 rapl_event_update(struct perf_event *event)
@@ -173,7 +195,7 @@ static u64 rapl_event_update(struct perf_event *event)
 	delta = (new_raw_count << shift) - (prev_raw_count << shift);
 	delta >>= shift;
 
-	sdelta = rapl_scale(delta);
+	sdelta = rapl_scale(delta, event->hw.config);
 
 	local64_add(sdelta, &event->count);
 
@@ -546,12 +568,22 @@ static void rapl_cpu_init(int cpu)
 	cpumask_set_cpu(cpu, &rapl_cpu_mask);
 }
 
+static __init void rapl_hsw_server_quirk(void)
+{
+	/*
+	 * DRAM domain on HSW server has fixed energy unit which can be
+	 * different than the unit from power unit MSR.
+	 * "Intel Xeon Processor E5-1600 and E5-2600 v3 Product Families, V2
+	 * of 2. Datasheet, September 2014, Reference Number: 330784-001 "
+	 */
+	rapl_hw_unit[RAPL_IDX_RAM_NRG_STAT] = 16;
+}
+
 static int rapl_cpu_prepare(int cpu)
 {
 	struct rapl_pmu *pmu = per_cpu(rapl_pmu, cpu);
 	int phys_id = topology_physical_package_id(cpu);
 	u64 ms;
-	u64 msr_rapl_power_unit_bits;
 
 	if (pmu)
 		return 0;
@@ -559,24 +591,13 @@ static int rapl_cpu_prepare(int cpu)
 	if (phys_id < 0)
 		return -1;
 
-	/* protect rdmsrl() to handle virtualization */
-	if (rdmsrl_safe(MSR_RAPL_POWER_UNIT, &msr_rapl_power_unit_bits))
-		return -1;
-
 	pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
 	if (!pmu)
 		return -1;
-
 	spin_lock_init(&pmu->lock);
 
 	INIT_LIST_HEAD(&pmu->active_list);
 
-	/*
-	 * grab power unit as: 1/2^unit Joules
-	 *
-	 * we cache in local PMU instance
-	 */
-	pmu->hw_unit = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
 	pmu->pmu = &rapl_pmu_class;
 
 	/*
@@ -586,8 +607,8 @@ static int rapl_cpu_prepare(int cpu)
 	 * divide interval by 2 to avoid lockstep (2 * 100)
 	 * if hw unit is 32, then we use 2 ms 1/200/2
 	 */
-	if (pmu->hw_unit < 32)
-		ms = (1000 / (2 * 100)) * (1ULL << (32 - pmu->hw_unit - 1));
+	if (rapl_hw_unit[0] < 32)
+		ms = (1000 / (2 * 100)) * (1ULL << (32 - rapl_hw_unit[0] - 1));
 	else
 		ms = 2;
 
@@ -655,6 +676,20 @@ static int rapl_cpu_notifier(struct notifier_block *self,
 	return NOTIFY_OK;
 }
 
+static int rapl_check_hw_unit(void)
+{
+	u64 msr_rapl_power_unit_bits;
+	int i;
+
+	/* protect rdmsrl() to handle virtualization */
+	if (rdmsrl_safe(MSR_RAPL_POWER_UNIT, &msr_rapl_power_unit_bits))
+		return -1;
+	for (i = 0; i < NR_RAPL_DOMAINS; i++)
+		rapl_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
+
+	return 0;
+}
+
 static const struct x86_cpu_id rapl_cpu_match[] = {
 	[0] = { .vendor = X86_VENDOR_INTEL, .family = 6 },
 	[1] = {},
@@ -664,6 +699,8 @@ static int __init rapl_pmu_init(void)
 {
 	struct rapl_pmu *pmu;
 	int cpu, ret;
+	struct x86_pmu_quirk *quirk;
+	int i;
 
 	/*
 	 * check for Intel processor family 6
@@ -678,6 +715,11 @@ static int __init rapl_pmu_init(void)
 		rapl_cntr_mask = RAPL_IDX_CLN;
 		rapl_pmu_events_group.attrs = rapl_events_cln_attr;
 		break;
+	case 63: /* Haswell-Server */
+		rapl_add_quirk(rapl_hsw_server_quirk);
+		rapl_cntr_mask = RAPL_IDX_SRV;
+		rapl_pmu_events_group.attrs = rapl_events_srv_attr;
+		break;
 	case 60: /* Haswell */
 	case 69: /* Haswell-Celeron */
 		rapl_cntr_mask = RAPL_IDX_HSW;
@@ -693,7 +735,13 @@ static int __init rapl_pmu_init(void)
 		/* unsupported */
 		return 0;
 	}
+	ret = rapl_check_hw_unit();
+	if (ret)
+		return ret;
 
+	/* run cpu model quirks */
+	for (quirk = rapl_quirks; quirk; quirk = quirk->next)
+		quirk->func();
 	cpu_notifier_register_begin();
 
 	for_each_online_cpu(cpu) {
@@ -714,14 +762,18 @@ static int __init rapl_pmu_init(void)
 
 	pmu = __this_cpu_read(rapl_pmu);
 
-	pr_info("RAPL PMU detected, hw unit 2^-%d Joules,"
+	pr_info("RAPL PMU detected,"
 		" API unit is 2^-32 Joules,"
 		" %d fixed counters"
 		" %llu ms ovfl timer\n",
-		pmu->hw_unit,
 		hweight32(rapl_cntr_mask),
 		ktime_to_ms(pmu->timer_interval));
-
+	for (i = 0; i < NR_RAPL_DOMAINS; i++) {
+		if (rapl_cntr_mask & (1 << i)) {
+			pr_info("hw unit of domain %s 2^-%d Joules\n",
+				rapl_domain_names[i], rapl_hw_unit[i]);
+		}
+	}
 out:
 	cpu_notifier_register_done();
 
diff --git a/include/trace/events/filemap.h b/include/trace/events/filemap.h
index 0421f49a20f7..42febb6bc1d5 100644
--- a/include/trace/events/filemap.h
+++ b/include/trace/events/filemap.h
@@ -18,14 +18,14 @@ DECLARE_EVENT_CLASS(mm_filemap_op_page_cache,
 	TP_ARGS(page),
 
 	TP_STRUCT__entry(
-		__field(struct page *, page)
+		__field(unsigned long, pfn)
 		__field(unsigned long, i_ino)
 		__field(unsigned long, index)
 		__field(dev_t, s_dev)
 	),
 
 	TP_fast_assign(
-		__entry->page = page;
+		__entry->pfn = page_to_pfn(page);
 		__entry->i_ino = page->mapping->host->i_ino;
 		__entry->index = page->index;
 		if (page->mapping->host->i_sb)
@@ -37,8 +37,8 @@ DECLARE_EVENT_CLASS(mm_filemap_op_page_cache,
 	TP_printk("dev %d:%d ino %lx page=%p pfn=%lu ofs=%lu",
 		MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
 		__entry->i_ino,
-		__entry->page,
-		page_to_pfn(__entry->page),
+		pfn_to_page(__entry->pfn),
+		__entry->pfn,
 		__entry->index << PAGE_SHIFT)
 );
 
diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 4ad10baecd4d..81ea59812117 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -154,18 +154,18 @@ TRACE_EVENT(mm_page_free,
 	TP_ARGS(page, order),
 
 	TP_STRUCT__entry(
-		__field(	struct page *,	page		)
+		__field(	unsigned long,	pfn		)
 		__field(	unsigned int,	order		)
 	),
 
 	TP_fast_assign(
-		__entry->page		= page;
+		__entry->pfn		= page_to_pfn(page);
 		__entry->order		= order;
 	),
 
 	TP_printk("page=%p pfn=%lu order=%d",
-			__entry->page,
-			page_to_pfn(__entry->page),
+			pfn_to_page(__entry->pfn),
+			__entry->pfn,
 			__entry->order)
 );
 
@@ -176,18 +176,18 @@ TRACE_EVENT(mm_page_free_batched,
 	TP_ARGS(page, cold),
 
 	TP_STRUCT__entry(
-		__field(	struct page *,	page		)
+		__field(	unsigned long,	pfn		)
 		__field(	int,		cold		)
 	),
 
 	TP_fast_assign(
-		__entry->page		= page;
+		__entry->pfn		= page_to_pfn(page);
 		__entry->cold		= cold;
 	),
 
 	TP_printk("page=%p pfn=%lu order=0 cold=%d",
-			__entry->page,
-			page_to_pfn(__entry->page),
+			pfn_to_page(__entry->pfn),
+			__entry->pfn,
 			__entry->cold)
 );
 
@@ -199,22 +199,22 @@ TRACE_EVENT(mm_page_alloc,
 	TP_ARGS(page, order, gfp_flags, migratetype),
 
 	TP_STRUCT__entry(
-		__field(	struct page *,	page		)
+		__field(	unsigned long,	pfn		)
 		__field(	unsigned int,	order		)
 		__field(	gfp_t,		gfp_flags	)
 		__field(	int,		migratetype	)
 	),
 
 	TP_fast_assign(
-		__entry->page		= page;
+		__entry->pfn		= page ? page_to_pfn(page) : -1UL;
 		__entry->order		= order;
 		__entry->gfp_flags	= gfp_flags;
 		__entry->migratetype	= migratetype;
 	),
 
 	TP_printk("page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s",
-		__entry->page,
-		__entry->page ? page_to_pfn(__entry->page) : 0,
+		__entry->pfn != -1UL ? pfn_to_page(__entry->pfn) : NULL,
+		__entry->pfn != -1UL ? __entry->pfn : 0,
 		__entry->order,
 		__entry->migratetype,
 		show_gfp_flags(__entry->gfp_flags))
@@ -227,20 +227,20 @@ DECLARE_EVENT_CLASS(mm_page,
 	TP_ARGS(page, order, migratetype),
 
 	TP_STRUCT__entry(
-		__field(	struct page *,	page		)
+		__field(	unsigned long,	pfn		)
 		__field(	unsigned int,	order		)
 		__field(	int,		migratetype	)
 	),
 
 	TP_fast_assign(
-		__entry->page		= page;
+		__entry->pfn		= page ? page_to_pfn(page) : -1UL;
 		__entry->order		= order;
 		__entry->migratetype	= migratetype;
 	),
 
 	TP_printk("page=%p pfn=%lu order=%u migratetype=%d percpu_refill=%d",
-		__entry->page,
-		__entry->page ? page_to_pfn(__entry->page) : 0,
+		__entry->pfn != -1UL ? pfn_to_page(__entry->pfn) : NULL,
+		__entry->pfn != -1UL ? __entry->pfn : 0,
 		__entry->order,
 		__entry->migratetype,
 		__entry->order == 0)
@@ -260,7 +260,7 @@ DEFINE_EVENT_PRINT(mm_page, mm_page_pcpu_drain,
 	TP_ARGS(page, order, migratetype),
 
 	TP_printk("page=%p pfn=%lu order=%d migratetype=%d",
-		__entry->page, page_to_pfn(__entry->page),
+		pfn_to_page(__entry->pfn), __entry->pfn,
 		__entry->order, __entry->migratetype)
 );
 
@@ -275,7 +275,7 @@ TRACE_EVENT(mm_page_alloc_extfrag,
 		alloc_migratetype, fallback_migratetype),
 
 	TP_STRUCT__entry(
-		__field(	struct page *,	page			)
+		__field(	unsigned long,	pfn			)
 		__field(	int,		alloc_order		)
 		__field(	int,		fallback_order		)
 		__field(	int,		alloc_migratetype	)
@@ -284,7 +284,7 @@ TRACE_EVENT(mm_page_alloc_extfrag,
 	),
 
 	TP_fast_assign(
-		__entry->page			= page;
+		__entry->pfn			= page_to_pfn(page);
 		__entry->alloc_order		= alloc_order;
 		__entry->fallback_order		= fallback_order;
 		__entry->alloc_migratetype	= alloc_migratetype;
@@ -294,8 +294,8 @@ TRACE_EVENT(mm_page_alloc_extfrag,
 	),
 
 	TP_printk("page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d",
-		__entry->page,
-		page_to_pfn(__entry->page),
+		pfn_to_page(__entry->pfn),
+		__entry->pfn,
 		__entry->alloc_order,
 		__entry->fallback_order,
 		pageblock_order,
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index 69590b6ffc09..f66476b96264 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -336,18 +336,18 @@ TRACE_EVENT(mm_vmscan_writepage,
 	TP_ARGS(page, reclaim_flags),
 
 	TP_STRUCT__entry(
-		__field(struct page *, page)
+		__field(unsigned long, pfn)
 		__field(int, reclaim_flags)
 	),
 
 	TP_fast_assign(
-		__entry->page = page;
+		__entry->pfn = page_to_pfn(page);
 		__entry->reclaim_flags = reclaim_flags;
 	),
 
 	TP_printk("page=%p pfn=%lu flags=%s",
-		__entry->page,
-		page_to_pfn(__entry->page),
+		pfn_to_page(__entry->pfn),
+		__entry->pfn,
 		show_reclaim_flags(__entry->reclaim_flags))
 );
 
diff --git a/tools/perf/Documentation/perf-kmem.txt b/tools/perf/Documentation/perf-kmem.txt
index 150253cc3c97..23219c65c16f 100644
--- a/tools/perf/Documentation/perf-kmem.txt
+++ b/tools/perf/Documentation/perf-kmem.txt
@@ -3,7 +3,7 @@ perf-kmem(1)
 
 NAME
 ----
-perf-kmem - Tool to trace/measure kernel memory(slab) properties
+perf-kmem - Tool to trace/measure kernel memory properties
 
 SYNOPSIS
 --------
@@ -46,6 +46,12 @@ OPTIONS
 --raw-ip::
 	Print raw ip instead of symbol
 
+--slab::
+	Analyze SLAB allocator events.
+
+--page::
+	Analyze page allocator events
+
 SEE ALSO
 --------
 linkperf:perf-record[1]
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 4ebf65c79434..63ea01349b6e 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -22,6 +22,11 @@
 #include <linux/string.h>
 #include <locale.h>
 
+static int	kmem_slab;
+static int	kmem_page;
+
+static long	kmem_page_size;
+
 struct alloc_stat;
 typedef int (*sort_fn_t)(struct alloc_stat *, struct alloc_stat *);
 
@@ -226,6 +231,244 @@ static int perf_evsel__process_free_event(struct perf_evsel *evsel,
 	return 0;
 }
 
+static u64 total_page_alloc_bytes;
+static u64 total_page_free_bytes;
+static u64 total_page_nomatch_bytes;
+static u64 total_page_fail_bytes;
+static unsigned long nr_page_allocs;
+static unsigned long nr_page_frees;
+static unsigned long nr_page_fails;
+static unsigned long nr_page_nomatch;
+
+static bool use_pfn;
+
+#define MAX_MIGRATE_TYPES  6
+#define MAX_PAGE_ORDER     11
+
+static int order_stats[MAX_PAGE_ORDER][MAX_MIGRATE_TYPES];
+
+struct page_stat {
+	struct rb_node 	node;
+	u64 		page;
+	int 		order;
+	unsigned 	gfp_flags;
+	unsigned 	migrate_type;
+	u64		alloc_bytes;
+	u64 		free_bytes;
+	int 		nr_alloc;
+	int 		nr_free;
+};
+
+static struct rb_root page_tree;
+static struct rb_root page_alloc_tree;
+static struct rb_root page_alloc_sorted;
+
+static struct page_stat *search_page(unsigned long page, bool create)
+{
+	struct rb_node **node = &page_tree.rb_node;
+	struct rb_node *parent = NULL;
+	struct page_stat *data;
+
+	while (*node) {
+		s64 cmp;
+
+		parent = *node;
+		data = rb_entry(*node, struct page_stat, node);
+
+		cmp = data->page - page;
+		if (cmp < 0)
+			node = &parent->rb_left;
+		else if (cmp > 0)
+			node = &parent->rb_right;
+		else
+			return data;
+	}
+
+	if (!create)
+		return NULL;
+
+	data = zalloc(sizeof(*data));
+	if (data != NULL) {
+		data->page = page;
+
+		rb_link_node(&data->node, parent, node);
+		rb_insert_color(&data->node, &page_tree);
+	}
+
+	return data;
+}
+
+static int page_stat_cmp(struct page_stat *a, struct page_stat *b)
+{
+	if (a->page > b->page)
+		return -1;
+	if (a->page < b->page)
+		return 1;
+	if (a->order > b->order)
+		return -1;
+	if (a->order < b->order)
+		return 1;
+	if (a->migrate_type > b->migrate_type)
+		return -1;
+	if (a->migrate_type < b->migrate_type)
+		return 1;
+	if (a->gfp_flags > b->gfp_flags)
+		return -1;
+	if (a->gfp_flags < b->gfp_flags)
+		return 1;
+	return 0;
+}
+
+static struct page_stat *search_page_alloc_stat(struct page_stat *stat, bool create)
+{
+	struct rb_node **node = &page_alloc_tree.rb_node;
+	struct rb_node *parent = NULL;
+	struct page_stat *data;
+
+	while (*node) {
+		s64 cmp;
+
+		parent = *node;
+		data = rb_entry(*node, struct page_stat, node);
+
+		cmp = page_stat_cmp(data, stat);
+		if (cmp < 0)
+			node = &parent->rb_left;
+		else if (cmp > 0)
+			node = &parent->rb_right;
+		else
+			return data;
+	}
+
+	if (!create)
+		return NULL;
+
+	data = zalloc(sizeof(*data));
+	if (data != NULL) {
+		data->page = stat->page;
+		data->order = stat->order;
+		data->gfp_flags = stat->gfp_flags;
+		data->migrate_type = stat->migrate_type;
+
+		rb_link_node(&data->node, parent, node);
+		rb_insert_color(&data->node, &page_alloc_tree);
+	}
+
+	return data;
+}
+
+static bool valid_page(u64 pfn_or_page)
+{
+	if (use_pfn && pfn_or_page == -1UL)
+		return false;
+	if (!use_pfn && pfn_or_page == 0)
+		return false;
+	return true;
+}
+
+static int perf_evsel__process_page_alloc_event(struct perf_evsel *evsel,
+						struct perf_sample *sample)
+{
+	u64 page;
+	unsigned int order = perf_evsel__intval(evsel, sample, "order");
+	unsigned int gfp_flags = perf_evsel__intval(evsel, sample, "gfp_flags");
+	unsigned int migrate_type = perf_evsel__intval(evsel, sample,
+						       "migratetype");
+	u64 bytes = kmem_page_size << order;
+	struct page_stat *stat;
+	struct page_stat this = {
+		.order = order,
+		.gfp_flags = gfp_flags,
+		.migrate_type = migrate_type,
+	};
+
+	if (use_pfn)
+		page = perf_evsel__intval(evsel, sample, "pfn");
+	else
+		page = perf_evsel__intval(evsel, sample, "page");
+
+	nr_page_allocs++;
+	total_page_alloc_bytes += bytes;
+
+	if (!valid_page(page)) {
+		nr_page_fails++;
+		total_page_fail_bytes += bytes;
+
+		return 0;
+	}
+
+	/*
+	 * This is to find the current page (with correct gfp flags and
+	 * migrate type) at free event.
+	 */
+	stat = search_page(page, true);
+	if (stat == NULL)
+		return -ENOMEM;
+
+	stat->order = order;
+	stat->gfp_flags = gfp_flags;
+	stat->migrate_type = migrate_type;
+
+	this.page = page;
+	stat = search_page_alloc_stat(&this, true);
+	if (stat == NULL)
+		return -ENOMEM;
+
+	stat->nr_alloc++;
+	stat->alloc_bytes += bytes;
+
+	order_stats[order][migrate_type]++;
+
+	return 0;
+}
+
+static int perf_evsel__process_page_free_event(struct perf_evsel *evsel,
+						struct perf_sample *sample)
+{
+	u64 page;
+	unsigned int order = perf_evsel__intval(evsel, sample, "order");
+	u64 bytes = kmem_page_size << order;
+	struct page_stat *stat;
+	struct page_stat this = {
+		.order = order,
+	};
+
+	if (use_pfn)
+		page = perf_evsel__intval(evsel, sample, "pfn");
+	else
+		page = perf_evsel__intval(evsel, sample, "page");
+
+	nr_page_frees++;
+	total_page_free_bytes += bytes;
+
+	stat = search_page(page, false);
+	if (stat == NULL) {
+		pr_debug2("missing free at page %"PRIx64" (order: %d)\n",
+			  page, order);
+
+		nr_page_nomatch++;
+		total_page_nomatch_bytes += bytes;
+
+		return 0;
+	}
+
+	this.page = page;
+	this.gfp_flags = stat->gfp_flags;
+	this.migrate_type = stat->migrate_type;
+
+	rb_erase(&stat->node, &page_tree);
+	free(stat);
+
+	stat = search_page_alloc_stat(&this, false);
+	if (stat == NULL)
+		return -ENOENT;
+
+	stat->nr_free++;
+	stat->free_bytes += bytes;
+
+	return 0;
+}
+
 typedef int (*tracepoint_handler)(struct perf_evsel *evsel,
 				  struct perf_sample *sample);
 
@@ -270,8 +513,9 @@ static double fragmentation(unsigned long n_req, unsigned long n_alloc)
 		return 100.0 - (100.0 * n_req / n_alloc);
 }
 
-static void __print_result(struct rb_root *root, struct perf_session *session,
-			   int n_lines, int is_caller)
+static void __print_slab_result(struct rb_root *root,
+				struct perf_session *session,
+				int n_lines, int is_caller)
 {
 	struct rb_node *next;
 	struct machine *machine = &session->machines.host;
@@ -323,9 +567,56 @@ static void __print_result(struct rb_root *root, struct perf_session *session,
 	printf("%.105s\n", graph_dotted_line);
 }
 
-static void print_summary(void)
+static const char * const migrate_type_str[] = {
+	"UNMOVABL",
+	"RECLAIM",
+	"MOVABLE",
+	"RESERVED",
+	"CMA/ISLT",
+	"UNKNOWN",
+};
+
+static void __print_page_result(struct rb_root *root,
+				struct perf_session *session __maybe_unused,
+				int n_lines)
+{
+	struct rb_node *next = rb_first(root);
+	const char *format;
+
+	printf("\n%.80s\n", graph_dotted_line);
+	printf(" %-16s | Total alloc (KB) | Hits      | Order | Mig.type | GFP flags\n",
+	       use_pfn ? "PFN" : "Page");
+	printf("%.80s\n", graph_dotted_line);
+
+	if (use_pfn)
+		format = " %16llu | %'16llu | %'9d | %5d | %8s |  %08lx\n";
+	else
+		format = " %016llx | %'16llu | %'9d | %5d | %8s |  %08lx\n";
+
+	while (next && n_lines--) {
+		struct page_stat *data;
+
+		data = rb_entry(next, struct page_stat, node);
+
+		printf(format, (unsigned long long)data->page,
+		       (unsigned long long)data->alloc_bytes / 1024,
+		       data->nr_alloc, data->order,
+		       migrate_type_str[data->migrate_type],
+		       (unsigned long)data->gfp_flags);
+
+		next = rb_next(next);
+	}
+
+	if (n_lines == -1)
+		printf(" ...              | ...              | ...       | ...   | ...      | ...     \n");
+
+	printf("%.80s\n", graph_dotted_line);
+}
+
+static void print_slab_summary(void)
 {
-	printf("\nSUMMARY\n=======\n");
+	printf("\nSUMMARY (SLAB allocator)");
+	printf("\n========================\n");
 	printf("Total bytes requested: %'lu\n", total_requested);
 	printf("Total bytes allocated: %'lu\n", total_allocated);
 	printf("Total bytes wasted on internal fragmentation: %'lu\n",
@@ -335,13 +626,73 @@ static void print_summary(void)
 	printf("Cross CPU allocations: %'lu/%'lu\n", nr_cross_allocs, nr_allocs);
 }
 
-static void print_result(struct perf_session *session)
+static void print_page_summary(void)
+{
+	int o, m;
+	u64 nr_alloc_freed = nr_page_frees - nr_page_nomatch;
+	u64 total_alloc_freed_bytes = total_page_free_bytes - total_page_nomatch_bytes;
+
+	printf("\nSUMMARY (page allocator)");
+	printf("\n========================\n");
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total allocation requests",
+	       nr_page_allocs, total_page_alloc_bytes / 1024);
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total free requests",
+	       nr_page_frees, total_page_free_bytes / 1024);
+	printf("\n");
+
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total alloc+freed requests",
+	       nr_alloc_freed, (total_alloc_freed_bytes) / 1024);
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total alloc-only requests",
+	       nr_page_allocs - nr_alloc_freed,
+	       (total_page_alloc_bytes - total_alloc_freed_bytes) / 1024);
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total free-only requests",
+	       nr_page_nomatch, total_page_nomatch_bytes / 1024);
+	printf("\n");
+
+	printf("%-30s: %'16lu   [ %'16"PRIu64" KB ]\n", "Total allocation failures",
+	       nr_page_fails, total_page_fail_bytes / 1024);
+	printf("\n");
+
+	printf("%5s  %12s  %12s  %12s  %12s  %12s\n", "Order",  "Unmovable",
+	       "Reclaimable", "Movable", "Reserved", "CMA/Isolated");
+	printf("%.5s  %.12s  %.12s  %.12s  %.12s  %.12s\n", graph_dotted_line,
+	       graph_dotted_line, graph_dotted_line, graph_dotted_line,
+	       graph_dotted_line, graph_dotted_line);
+
+	for (o = 0; o < MAX_PAGE_ORDER; o++) {
+		printf("%5d", o);
+		for (m = 0; m < MAX_MIGRATE_TYPES - 1; m++) {
+			if (order_stats[o][m])
+				printf("  %'12d", order_stats[o][m]);
+			else
+				printf("  %12c", '.');
+		}
+		printf("\n");
+	}
+}
+
+static void print_slab_result(struct perf_session *session)
 {
 	if (caller_flag)
-		__print_result(&root_caller_sorted, session, caller_lines, 1);
+		__print_slab_result(&root_caller_sorted, session, caller_lines, 1);
+	if (alloc_flag)
+		__print_slab_result(&root_alloc_sorted, session, alloc_lines, 0);
+	print_slab_summary();
+}
+
+static void print_page_result(struct perf_session *session)
+{
 	if (alloc_flag)
-		__print_result(&root_alloc_sorted, session, alloc_lines, 0);
-	print_summary();
+		__print_page_result(&page_alloc_sorted, session, alloc_lines);
+	print_page_summary();
+}
+
+static void print_result(struct perf_session *session)
+{
+	if (kmem_slab)
+		print_slab_result(session);
+	if (kmem_page)
+		print_page_result(session);
 }
 
 struct sort_dimension {
@@ -353,8 +704,8 @@ struct sort_dimension {
 static LIST_HEAD(caller_sort);
 static LIST_HEAD(alloc_sort);
 
-static void sort_insert(struct rb_root *root, struct alloc_stat *data,
-			struct list_head *sort_list)
+static void sort_slab_insert(struct rb_root *root, struct alloc_stat *data,
+			     struct list_head *sort_list)
 {
 	struct rb_node **new = &(root->rb_node);
 	struct rb_node *parent = NULL;
@@ -383,8 +734,8 @@ static void sort_insert(struct rb_root *root, struct alloc_stat *data,
 	rb_insert_color(&data->node, root);
 }
 
-static void __sort_result(struct rb_root *root, struct rb_root *root_sorted,
-			  struct list_head *sort_list)
+static void __sort_slab_result(struct rb_root *root, struct rb_root *root_sorted,
+			       struct list_head *sort_list)
 {
 	struct rb_node *node;
 	struct alloc_stat *data;
@@ -396,26 +747,79 @@ static void __sort_result(struct rb_root *root, struct rb_root *root_sorted,
 
 		rb_erase(node, root);
 		data = rb_entry(node, struct alloc_stat, node);
-		sort_insert(root_sorted, data, sort_list);
+		sort_slab_insert(root_sorted, data, sort_list);
+	}
+}
+
+static void sort_page_insert(struct rb_root *root, struct page_stat *data)
+{
+	struct rb_node **new = &root->rb_node;
+	struct rb_node *parent = NULL;
+
+	while (*new) {
+		struct page_stat *this;
+		int cmp = 0;
+
+		this = rb_entry(*new, struct page_stat, node);
+		parent = *new;
+
+		/* TODO: support more sort key */
+		cmp = data->alloc_bytes - this->alloc_bytes;
+
+		if (cmp > 0)
+			new = &parent->rb_left;
+		else
+			new = &parent->rb_right;
+	}
+
+	rb_link_node(&data->node, parent, new);
+	rb_insert_color(&data->node, root);
+}
+
+static void __sort_page_result(struct rb_root *root, struct rb_root *root_sorted)
+{
+	struct rb_node *node;
+	struct page_stat *data;
+
+	for (;;) {
+		node = rb_first(root);
+		if (!node)
+			break;
+
+		rb_erase(node, root);
+		data = rb_entry(node, struct page_stat, node);
+		sort_page_insert(root_sorted, data);
 	}
 }
 
 static void sort_result(void)
 {
-	__sort_result(&root_alloc_stat, &root_alloc_sorted, &alloc_sort);
-	__sort_result(&root_caller_stat, &root_caller_sorted, &caller_sort);
+	if (kmem_slab) {
+		__sort_slab_result(&root_alloc_stat, &root_alloc_sorted,
+				   &alloc_sort);
+		__sort_slab_result(&root_caller_stat, &root_caller_sorted,
+				   &caller_sort);
+	}
+	if (kmem_page) {
+		__sort_page_result(&page_alloc_tree, &page_alloc_sorted);
+	}
 }
 
 static int __cmd_kmem(struct perf_session *session)
 {
 	int err = -EINVAL;
+	struct perf_evsel *evsel;
 	const struct perf_evsel_str_handler kmem_tracepoints[] = {
+		/* slab allocator */
 		{ "kmem:kmalloc",		perf_evsel__process_alloc_event, },
     		{ "kmem:kmem_cache_alloc",	perf_evsel__process_alloc_event, },
 		{ "kmem:kmalloc_node",		perf_evsel__process_alloc_node_event, },
     		{ "kmem:kmem_cache_alloc_node", perf_evsel__process_alloc_node_event, },
 		{ "kmem:kfree",			perf_evsel__process_free_event, },
     		{ "kmem:kmem_cache_free",	perf_evsel__process_free_event, },
+		/* page allocator */
+		{ "kmem:mm_page_alloc",		perf_evsel__process_page_alloc_event, },
+		{ "kmem:mm_page_free",		perf_evsel__process_page_free_event, },
 	};
 
 	if (!perf_session__has_traces(session, "kmem record"))
@@ -426,10 +830,20 @@ static int __cmd_kmem(struct perf_session *session)
 		goto out;
 	}
 
+	evlist__for_each(session->evlist, evsel) {
+		if (!strcmp(perf_evsel__name(evsel), "kmem:mm_page_alloc") &&
+		    perf_evsel__field(evsel, "pfn")) {
+			use_pfn = true;
+			break;
+		}
+	}
+
 	setup_pager();
 	err = perf_session__process_events(session);
-	if (err != 0)
+	if (err != 0) {
+		pr_err("error during process events: %d\n", err);
 		goto out;
+	}
 	sort_result();
 	print_result(session);
 out:
@@ -612,6 +1026,22 @@ static int parse_alloc_opt(const struct option *opt __maybe_unused,
 	return 0;
 }
 
+static int parse_slab_opt(const struct option *opt __maybe_unused,
+			  const char *arg __maybe_unused,
+			  int unset __maybe_unused)
+{
+	kmem_slab = (kmem_page + 1);
+	return 0;
+}
+
+static int parse_page_opt(const struct option *opt __maybe_unused,
+			  const char *arg __maybe_unused,
+			  int unset __maybe_unused)
+{
+	kmem_page = (kmem_slab + 1);
+	return 0;
+}
+
 static int parse_line_opt(const struct option *opt __maybe_unused,
 			  const char *arg, int unset __maybe_unused)
 {
@@ -634,6 +1064,8 @@ static int __cmd_record(int argc, const char **argv)
 {
 	const char * const record_args[] = {
 	"record", "-a", "-R", "-c", "1",
+	};
+	const char * const slab_events[] = {
 	"-e", "kmem:kmalloc",
 	"-e", "kmem:kmalloc_node",
 	"-e", "kmem:kfree",
@@ -641,10 +1073,19 @@ static int __cmd_record(int argc, const char **argv)
 	"-e", "kmem:kmem_cache_alloc_node",
 	"-e", "kmem:kmem_cache_free",
 	};
+	const char * const page_events[] = {
+	"-e", "kmem:mm_page_alloc",
+	"-e", "kmem:mm_page_free",
+	};
 	unsigned int rec_argc, i, j;
 	const char **rec_argv;
 
 	rec_argc = ARRAY_SIZE(record_args) + argc - 1;
+	if (kmem_slab)
+		rec_argc += ARRAY_SIZE(slab_events);
+	if (kmem_page)
+		rec_argc += ARRAY_SIZE(page_events);
+
 	rec_argv = calloc(rec_argc + 1, sizeof(char *));
 
 	if (rec_argv == NULL)
@@ -653,6 +1094,15 @@ static int __cmd_record(int argc, const char **argv)
 	for (i = 0; i < ARRAY_SIZE(record_args); i++)
 		rec_argv[i] = strdup(record_args[i]);
 
+	if (kmem_slab) {
+		for (j = 0; j < ARRAY_SIZE(slab_events); j++, i++)
+			rec_argv[i] = strdup(slab_events[j]);
+	}
+	if (kmem_page) {
+		for (j = 0; j < ARRAY_SIZE(page_events); j++, i++)
+			rec_argv[i] = strdup(page_events[j]);
+	}
+
 	for (j = 1; j < (unsigned int)argc; j++, i++)
 		rec_argv[i] = argv[j];
 
@@ -679,6 +1129,10 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_CALLBACK('l', "line", NULL, "num", "show n lines", parse_line_opt),
 	OPT_BOOLEAN(0, "raw-ip", &raw_ip, "show raw ip instead of symbol"),
 	OPT_BOOLEAN('f', "force", &file.force, "don't complain, do it"),
+	OPT_CALLBACK_NOOPT(0, "slab", NULL, NULL, "Analyze slab allocator",
+			   parse_slab_opt),
+	OPT_CALLBACK_NOOPT(0, "page", NULL, NULL, "Analyze page allocator",
+			   parse_page_opt),
 	OPT_END()
 	};
 	const char *const kmem_subcommands[] = { "record", "stat", NULL };
@@ -695,6 +1149,9 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (!argc)
 		usage_with_options(kmem_usage, kmem_options);
 
+	if (kmem_slab == 0 && kmem_page == 0)
+		kmem_slab = 1;  /* for backward compatibility */
+
 	if (!strncmp(argv[0], "rec", 3)) {
 		symbol__init(NULL);
 		return __cmd_record(argc, argv);
@@ -706,6 +1163,17 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (session == NULL)
 		return -1;
 
+	if (kmem_page) {
+		struct perf_evsel *evsel = perf_evlist__first(session->evlist);
+
+		if (evsel == NULL || evsel->tp_format == NULL) {
+			pr_err("invalid event found.. aborting\n");
+			return -1;
+		}
+
+		kmem_page_size = pevent_get_page_size(evsel->tp_format->pevent);
+	}
+
 	symbol__init(&session->header.env);
 
 	if (!strcmp(argv[0], "stat")) {
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 30545ce2c712..d8bb616ff57c 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -332,6 +332,7 @@ static int find_alternative_probe_point(struct debuginfo *dinfo,
 	else {
 		result->offset += pp->offset;
 		result->line += pp->line;
+		result->retprobe = pp->retprobe;
 		ret = 0;
 	}
 
@@ -654,65 +655,6 @@ static int try_to_find_probe_trace_events(struct perf_probe_event *pev,
 	return ntevs;
 }
 
-/*
- * Find a src file from a DWARF tag path. Prepend optional source path prefix
- * and chop off leading directories that do not exist. Result is passed back as
- * a newly allocated path on success.
- * Return 0 if file was found and readable, -errno otherwise.
- */
-static int get_real_path(const char *raw_path, const char *comp_dir,
-			 char **new_path)
-{
-	const char *prefix = symbol_conf.source_prefix;
-
-	if (!prefix) {
-		if (raw_path[0] != '/' && comp_dir)
-			/* If not an absolute path, try to use comp_dir */
-			prefix = comp_dir;
-		else {
-			if (access(raw_path, R_OK) == 0) {
-				*new_path = strdup(raw_path);
-				return *new_path ? 0 : -ENOMEM;
-			} else
-				return -errno;
-		}
-	}
-
-	*new_path = malloc((strlen(prefix) + strlen(raw_path) + 2));
-	if (!*new_path)
-		return -ENOMEM;
-
-	for (;;) {
-		sprintf(*new_path, "%s/%s", prefix, raw_path);
-
-		if (access(*new_path, R_OK) == 0)
-			return 0;
-
-		if (!symbol_conf.source_prefix) {
-			/* In case of searching comp_dir, don't retry */
-			zfree(new_path);
-			return -errno;
-		}
-
-		switch (errno) {
-		case ENAMETOOLONG:
-		case ENOENT:
-		case EROFS:
-		case EFAULT:
-			raw_path = strchr(++raw_path, '/');
-			if (!raw_path) {
-				zfree(new_path);
-				return -ENOENT;
-			}
-			continue;
-
-		default:
-			zfree(new_path);
-			return -errno;
-		}
-	}
-}
-
 #define LINEBUF_SIZE 256
 #define NR_ADDITIONAL_LINES 2
 
diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index 7831e2d93949..44554c3c2220 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -855,11 +855,22 @@ static int probe_point_lazy_walker(const char *fname, int lineno,
 static int find_probe_point_lazy(Dwarf_Die *sp_die, struct probe_finder *pf)
 {
 	int ret = 0;
+	char *fpath;
 
 	if (intlist__empty(pf->lcache)) {
+		const char *comp_dir;
+
+		comp_dir = cu_get_comp_dir(&pf->cu_die);
+		ret = get_real_path(pf->fname, comp_dir, &fpath);
+		if (ret < 0) {
+			pr_warning("Failed to find source file path.\n");
+			return ret;
+		}
+
 		/* Matching lazy line pattern */
-		ret = find_lazy_match_lines(pf->lcache, pf->fname,
+		ret = find_lazy_match_lines(pf->lcache, fpath,
 					    pf->pev->point.lazy_line);
+		free(fpath);
 		if (ret <= 0)
 			return ret;
 	}
@@ -1055,7 +1066,7 @@ static int debuginfo__find_probes(struct debuginfo *dbg,
 			if (pp->function)
 				ret = find_probe_point_by_func(pf);
 			else if (pp->lazy_line)
-				ret = find_probe_point_lazy(NULL, pf);
+				ret = find_probe_point_lazy(&pf->cu_die, pf);
 			else {
 				pf->lno = pp->line;
 				ret = find_probe_point_by_line(pf);
@@ -1622,3 +1633,61 @@ int debuginfo__find_line_range(struct debuginfo *dbg, struct line_range *lr)
 	return (ret < 0) ? ret : lf.found;
 }
 
+/*
+ * Find a src file from a DWARF tag path. Prepend optional source path prefix
+ * and chop off leading directories that do not exist. Result is passed back as
+ * a newly allocated path on success.
+ * Return 0 if file was found and readable, -errno otherwise.
+ */
+int get_real_path(const char *raw_path, const char *comp_dir,
+			 char **new_path)
+{
+	const char *prefix = symbol_conf.source_prefix;
+
+	if (!prefix) {
+		if (raw_path[0] != '/' && comp_dir)
+			/* If not an absolute path, try to use comp_dir */
+			prefix = comp_dir;
+		else {
+			if (access(raw_path, R_OK) == 0) {
+				*new_path = strdup(raw_path);
+				return *new_path ? 0 : -ENOMEM;
+			} else
+				return -errno;
+		}
+	}
+
+	*new_path = malloc((strlen(prefix) + strlen(raw_path) + 2));
+	if (!*new_path)
+		return -ENOMEM;
+
+	for (;;) {
+		sprintf(*new_path, "%s/%s", prefix, raw_path);
+
+		if (access(*new_path, R_OK) == 0)
+			return 0;
+
+		if (!symbol_conf.source_prefix) {
+			/* In case of searching comp_dir, don't retry */
+			zfree(new_path);
+			return -errno;
+		}
+
+		switch (errno) {
+		case ENAMETOOLONG:
+		case ENOENT:
+		case EROFS:
+		case EFAULT:
+			raw_path = strchr(++raw_path, '/');
+			if (!raw_path) {
+				zfree(new_path);
+				return -ENOENT;
+			}
+			continue;
+
+		default:
+			zfree(new_path);
+			return -errno;
+		}
+	}
+}
diff --git a/tools/perf/util/probe-finder.h b/tools/perf/util/probe-finder.h
index 92590b2c7e1c..ebf8c8c81453 100644
--- a/tools/perf/util/probe-finder.h
+++ b/tools/perf/util/probe-finder.h
@@ -55,6 +55,10 @@ extern int debuginfo__find_available_vars_at(struct debuginfo *dbg,
 					     struct variable_list **vls,
 					     int max_points, bool externs);
 
+/* Find a src file from a DWARF tag path */
+int get_real_path(const char *raw_path, const char *comp_dir,
+			 char **new_path);
+
 struct probe_finder {
 	struct perf_probe_event	*pev;		/* Target probe event */
 

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [GIT PULL] perf updates
@ 2013-11-13 20:58 Ingo Molnar
  0 siblings, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2013-11-13 20:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Thomas Gleixner, H. Peter Anvin, Andrew Morton

Linus,

Please pull the latest perf-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-urgent-for-linus

   # HEAD: d969135aae1434547f41853f0e8eaa622e8b8816 Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

A number of fixes:

  * Fix segfault on perf trace -i perf.data, from Namhyung Kim.

  * Fix segfault with --no-mmap-pages, from David Ahern.

  * Don't force a refresh during progress update in the TUI, greatly reducing
    startup costs, fix from Patrick Palka.

  * Fix sw clock event period test wrt not checking if using > max_sample_freq.

  * Handle throttle events in 'object code reading' test, fix from Adrian Hunter.

  * Prevent condition that all sort keys are elided, fix from Namhyung Kim.

  * Round mmap pages to power 2, from David Ahern.

And a number of late arrival changes:

  * Add summary only option to 'perf trace', suppressing the decoding of
    events, from David Ahern

  * 'perf trace --summary' formatting simplifications, from Pekka Emberg.

  * Beautify fifth argument of mmap() as fd, in 'perf trace', from Namhyung Kim.

  * Add direct access to dynamic arrays in libtraceevent, from Steven Rostedt.

  * Synthesize non-exec MMAP records when --data used, allowing the resolution of
    data addresses to symbols (global variables, etc), by Arnaldo Carvalho de Melo.

  * Code cleanups by David Ahern and Adrian Hunter.

 Thanks,

	Ingo

------------------>
Adrian Hunter (3):
      perf record: Use correct return type for write()
      perf tests: Compensate lower sample freq with longer test loop
      perf tests: Handle throttle events in 'object code reading' test

Arnaldo Carvalho de Melo (7):
      perf evsel: Remove idx parm from constructor
      perf record: Synthesize non-exec MMAP records when --data used
      perf machine: Introduce synthesize_threads method out of open coded equivalent
      perf machine: Simplify synthesize_threads method
      perf tests: Check return of perf_evlist__open sw clock event period test
      perf tests: Use lower sample_freq in sw clock event period test
      perf target: Shorten perf_target__ to target__

David Ahern (5):
      perf record: Move existing write_output into helper function
      perf trace: Add summary only option
      perf record: Fix segfault with --no-mmap-pages
      perf evlist: Round mmap pages to power 2 - v2
      perf evlist: Refactor mmap_pages parsing

Namhyung Kim (4):
      perf tools: Prevent condition that all sort keys are elided
      perf trace: Beautify fifth argument of mmap() as fd
      perf trace: Separate tp syscall field caching into init routine to be reused
      perf trace: Fix segfault on perf trace -i perf.data

Patrick Palka (1):
      perf ui tui progress: Don't force a refresh during progress update

Pekka Enberg (2):
      perf trace: Change syscall summary duration order
      perf trace: Simplify '--summary' output

Steven Rostedt (1):
      tools lib traceevent: Add direct access to dynamic arrays


 tools/lib/traceevent/event-parse.c        |  13 +++
 tools/perf/Documentation/perf-trace.txt   |  10 ++-
 tools/perf/builtin-kvm.c                  |  20 ++---
 tools/perf/builtin-record.c               |  35 ++++----
 tools/perf/builtin-stat.c                 |  21 +++--
 tools/perf/builtin-top.c                  |  24 ++---
 tools/perf/builtin-trace.c                | 143 +++++++++++++++++-------------
 tools/perf/perf.h                         |   2 +-
 tools/perf/tests/code-reading.c           |  17 +++-
 tools/perf/tests/evsel-tp-sched.c         |   4 +-
 tools/perf/tests/mmap-basic.c             |   2 +-
 tools/perf/tests/open-syscall-all-cpus.c  |   2 +-
 tools/perf/tests/open-syscall-tp-fields.c |   2 +-
 tools/perf/tests/open-syscall.c           |   2 +-
 tools/perf/tests/sw-clock.c               |  15 +++-
 tools/perf/tests/task-exit.c              |   2 +-
 tools/perf/ui/tui/progress.c              |   3 +-
 tools/perf/util/event.c                   |  50 ++++++-----
 tools/perf/util/event.h                   |   4 +-
 tools/perf/util/evlist.c                  |  73 ++++++++-------
 tools/perf/util/evlist.h                  |   5 +-
 tools/perf/util/evsel.c                   |  13 ++-
 tools/perf/util/evsel.h                   |  18 +++-
 tools/perf/util/header.c                  |   4 +-
 tools/perf/util/machine.c                 |  12 +++
 tools/perf/util/machine.h                 |  12 +++
 tools/perf/util/parse-events.c            |   6 +-
 tools/perf/util/sort.c                    |  13 +++
 tools/perf/util/target.c                  |  54 ++++++-----
 tools/perf/util/target.h                  |  44 +++++----
 tools/perf/util/top.c                     |   2 +-
 31 files changed, 363 insertions(+), 264 deletions(-)

diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c
index 8f450ad..0362d57 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -3435,6 +3435,19 @@ eval_num_arg(void *data, int size, struct event_format *event, struct print_arg
 			goto out_warning_op;
 		}
 		break;
+	case PRINT_DYNAMIC_ARRAY:
+		/* Without [], we pass the address to the dynamic data */
+		offset = pevent_read_number(pevent,
+					    data + arg->dynarray.field->offset,
+					    arg->dynarray.field->size);
+		/*
+		 * The actual length of the dynamic array is stored
+		 * in the top half of the field, and the offset
+		 * is in the bottom half of the 32 bit field.
+		 */
+		offset &= 0xffff;
+		val = (unsigned long long)(data + offset);
+		break;
 	default: /* not sure what to do there */
 		return 0;
 	}
diff --git a/tools/perf/Documentation/perf-trace.txt b/tools/perf/Documentation/perf-trace.txt
index 7b0497f..fae38d9 100644
--- a/tools/perf/Documentation/perf-trace.txt
+++ b/tools/perf/Documentation/perf-trace.txt
@@ -93,9 +93,15 @@ the thread executes on the designated CPUs. Default is to monitor all CPUs.
 --comm::
         Show process COMM right beside its ID, on by default, disable with --no-comm.
 
+-s::
 --summary::
-	Show a summary of syscalls by thread with min, max, and average times (in
-    msec) and relative stddev.
+	Show only a summary of syscalls by thread with min, max, and average times
+    (in msec) and relative stddev.
+
+-S::
+--with-summary::
+	Show all syscalls followed by a summary by thread with min, max, and
+    average times (in msec) and relative stddev.
 
 --tool_stats::
 	Show tool stats such as number of times fd->pathname was discovered thru
diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index cd9f920..f8bf5f2 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1510,13 +1510,13 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
 	/*
 	 * target related setups
 	 */
-	err = perf_target__validate(&kvm->opts.target);
+	err = target__validate(&kvm->opts.target);
 	if (err) {
-		perf_target__strerror(&kvm->opts.target, err, errbuf, BUFSIZ);
+		target__strerror(&kvm->opts.target, err, errbuf, BUFSIZ);
 		ui__warning("%s", errbuf);
 	}
 
-	if (perf_target__none(&kvm->opts.target))
+	if (target__none(&kvm->opts.target))
 		kvm->opts.target.system_wide = true;
 
 
@@ -1544,18 +1544,8 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
 	}
 	kvm->session->evlist = kvm->evlist;
 	perf_session__set_id_hdr_size(kvm->session);
-
-
-	if (perf_target__has_task(&kvm->opts.target))
-		perf_event__synthesize_thread_map(&kvm->tool,
-						  kvm->evlist->threads,
-						  perf_event__process,
-						  &kvm->session->machines.host);
-	else
-		perf_event__synthesize_threads(&kvm->tool, perf_event__process,
-					       &kvm->session->machines.host);
-
-
+	machine__synthesize_threads(&kvm->session->machines.host, &kvm->opts.target,
+				    kvm->evlist->threads, false);
 	err = kvm_live_open_events(kvm);
 	if (err)
 		goto out;
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 15280b5..4d644fe 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -76,12 +76,12 @@ struct perf_record {
 	long			samples;
 };
 
-static int write_output(struct perf_record *rec, void *buf, size_t size)
+static int do_write_output(struct perf_record *rec, void *buf, size_t size)
 {
 	struct perf_data_file *file = &rec->file;
 
 	while (size) {
-		int ret = write(file->fd, buf, size);
+		ssize_t ret = write(file->fd, buf, size);
 
 		if (ret < 0) {
 			pr_err("failed to write perf data, error: %m\n");
@@ -97,6 +97,11 @@ static int write_output(struct perf_record *rec, void *buf, size_t size)
 	return 0;
 }
 
+static int write_output(struct perf_record *rec, void *buf, size_t size)
+{
+	return do_write_output(rec, buf, size);
+}
+
 static int process_synthesized_event(struct perf_tool *tool,
 				     union perf_event *event,
 				     struct perf_sample *sample __maybe_unused,
@@ -480,16 +485,8 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
 					 perf_event__synthesize_guest_os, tool);
 	}
 
-	if (perf_target__has_task(&opts->target))
-		err = perf_event__synthesize_thread_map(tool, evsel_list->threads,
-						  process_synthesized_event,
-						  machine);
-	else if (perf_target__has_cpu(&opts->target))
-		err = perf_event__synthesize_threads(tool, process_synthesized_event,
-					       machine);
-	else /* command specified */
-		err = 0;
-
+	err = __machine__synthesize_threads(machine, tool, &opts->target, evsel_list->threads,
+					    process_synthesized_event, opts->sample_address);
 	if (err != 0)
 		goto out_delete_session;
 
@@ -509,7 +506,7 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
 	 * (apart from group members) have enable_on_exec=1 set,
 	 * so don't spoil it by prematurely enabling them.
 	 */
-	if (!perf_target__none(&opts->target))
+	if (!target__none(&opts->target))
 		perf_evlist__enable(evsel_list);
 
 	/*
@@ -538,7 +535,7 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
 		 * die with the process and we wait for that. Thus no need to
 		 * disable events in this case.
 		 */
-		if (done && !disabled && !perf_target__none(&opts->target)) {
+		if (done && !disabled && !target__none(&opts->target)) {
 			perf_evlist__disable(evsel_list);
 			disabled = true;
 		}
@@ -909,7 +906,7 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
 
 	argc = parse_options(argc, argv, record_options, record_usage,
 			    PARSE_OPT_STOP_AT_NON_OPTION);
-	if (!argc && perf_target__none(&rec->opts.target))
+	if (!argc && target__none(&rec->opts.target))
 		usage_with_options(record_usage, record_options);
 
 	if (nr_cgroups && !rec->opts.target.system_wide) {
@@ -939,17 +936,17 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
 		goto out_symbol_exit;
 	}
 
-	err = perf_target__validate(&rec->opts.target);
+	err = target__validate(&rec->opts.target);
 	if (err) {
-		perf_target__strerror(&rec->opts.target, err, errbuf, BUFSIZ);
+		target__strerror(&rec->opts.target, err, errbuf, BUFSIZ);
 		ui__warning("%s", errbuf);
 	}
 
-	err = perf_target__parse_uid(&rec->opts.target);
+	err = target__parse_uid(&rec->opts.target);
 	if (err) {
 		int saved_errno = errno;
 
-		perf_target__strerror(&rec->opts.target, err, errbuf, BUFSIZ);
+		target__strerror(&rec->opts.target, err, errbuf, BUFSIZ);
 		ui__error("%s", errbuf);
 
 		err = -saved_errno;
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 0fc1c94..ee0d565 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -108,7 +108,7 @@ enum {
 
 static struct perf_evlist	*evsel_list;
 
-static struct perf_target	target = {
+static struct target target = {
 	.uid	= UINT_MAX,
 };
 
@@ -294,11 +294,10 @@ static int create_perf_stat_counter(struct perf_evsel *evsel)
 
 	attr->inherit = !no_inherit;
 
-	if (perf_target__has_cpu(&target))
+	if (target__has_cpu(&target))
 		return perf_evsel__open_per_cpu(evsel, perf_evsel__cpus(evsel));
 
-	if (!perf_target__has_task(&target) &&
-	    perf_evsel__is_group_leader(evsel)) {
+	if (!target__has_task(&target) && perf_evsel__is_group_leader(evsel)) {
 		attr->disabled = 1;
 		if (!initial_delay)
 			attr->enable_on_exec = 1;
@@ -1236,7 +1235,7 @@ static void print_stat(int argc, const char **argv)
 			fprintf(output, "\'system wide");
 		else if (target.cpu_list)
 			fprintf(output, "\'CPU(s) %s", target.cpu_list);
-		else if (!perf_target__has_task(&target)) {
+		else if (!target__has_task(&target)) {
 			fprintf(output, "\'%s", argv[0]);
 			for (i = 1; i < argc; i++)
 				fprintf(output, " %s", argv[i]);
@@ -1667,7 +1666,7 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 	} else if (big_num_opt == 0) /* User passed --no-big-num */
 		big_num = false;
 
-	if (!argc && perf_target__none(&target))
+	if (!argc && target__none(&target))
 		usage_with_options(stat_usage, options);
 
 	if (run_count < 0) {
@@ -1680,8 +1679,8 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 	}
 
 	/* no_aggr, cgroup are for system-wide only */
-	if ((aggr_mode != AGGR_GLOBAL || nr_cgroups)
-	     && !perf_target__has_cpu(&target)) {
+	if ((aggr_mode != AGGR_GLOBAL || nr_cgroups) &&
+	    !target__has_cpu(&target)) {
 		fprintf(stderr, "both cgroup and no-aggregation "
 			"modes only available in system-wide mode\n");
 
@@ -1694,14 +1693,14 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (add_default_attributes())
 		goto out;
 
-	perf_target__validate(&target);
+	target__validate(&target);
 
 	if (perf_evlist__create_maps(evsel_list, &target) < 0) {
-		if (perf_target__has_task(&target)) {
+		if (target__has_task(&target)) {
 			pr_err("Problems finding threads of monitor\n");
 			parse_options_usage(stat_usage, options, "p", 1);
 			parse_options_usage(NULL, options, "t", 1);
-		} else if (perf_target__has_cpu(&target)) {
+		} else if (target__has_cpu(&target)) {
 			perror("failed to parse CPUs map");
 			parse_options_usage(stat_usage, options, "C", 1);
 			parse_options_usage(NULL, options, "a", 1);
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 9acca88..b8f8e29 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -950,14 +950,8 @@ static int __cmd_top(struct perf_top *top)
 	if (ret)
 		goto out_delete;
 
-	if (perf_target__has_task(&opts->target))
-		perf_event__synthesize_thread_map(&top->tool, top->evlist->threads,
-						  perf_event__process,
-						  &top->session->machines.host);
-	else
-		perf_event__synthesize_threads(&top->tool, perf_event__process,
-					       &top->session->machines.host);
-
+	machine__synthesize_threads(&top->session->machines.host, &opts->target,
+				    top->evlist->threads, false);
 	ret = perf_top__start_counters(top);
 	if (ret)
 		goto out_delete;
@@ -973,7 +967,7 @@ static int __cmd_top(struct perf_top *top)
 	 * XXX 'top' still doesn't start workloads like record, trace, but should,
 	 * so leave the check here.
 	 */
-        if (!perf_target__none(&opts->target))
+        if (!target__none(&opts->target))
                 perf_evlist__enable(top->evlist);
 
 	/* Wait for a minimal set of events before starting the snapshot */
@@ -1059,7 +1053,7 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
 		.sym_pcnt_filter     = 5,
 	};
 	struct perf_record_opts *opts = &top.record_opts;
-	struct perf_target *target = &opts->target;
+	struct target *target = &opts->target;
 	const struct option options[] = {
 	OPT_CALLBACK('e', "event", &top.evlist, "event",
 		     "event selector. use 'perf list' to list available events",
@@ -1175,24 +1169,24 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
 
 	setup_browser(false);
 
-	status = perf_target__validate(target);
+	status = target__validate(target);
 	if (status) {
-		perf_target__strerror(target, status, errbuf, BUFSIZ);
+		target__strerror(target, status, errbuf, BUFSIZ);
 		ui__warning("%s", errbuf);
 	}
 
-	status = perf_target__parse_uid(target);
+	status = target__parse_uid(target);
 	if (status) {
 		int saved_errno = errno;
 
-		perf_target__strerror(target, status, errbuf, BUFSIZ);
+		target__strerror(target, status, errbuf, BUFSIZ);
 		ui__error("%s", errbuf);
 
 		status = -saved_errno;
 		goto out_delete_evlist;
 	}
 
-	if (perf_target__none(target))
+	if (target__none(target))
 		target->system_wide = true;
 
 	if (perf_evlist__create_maps(top.evlist, target) < 0)
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 329b783..6b230af 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -149,21 +149,32 @@ static void perf_evsel__delete_priv(struct perf_evsel *evsel)
 	perf_evsel__delete(evsel);
 }
 
-static struct perf_evsel *perf_evsel__syscall_newtp(const char *direction,
-						    void *handler, int idx)
+static int perf_evsel__init_syscall_tp(struct perf_evsel *evsel, void *handler)
 {
-	struct perf_evsel *evsel = perf_evsel__newtp("raw_syscalls", direction, idx);
-
-	if (evsel) {
-		evsel->priv = malloc(sizeof(struct syscall_tp));
-
-		if (evsel->priv == NULL)
-			goto out_delete;
-
+	evsel->priv = malloc(sizeof(struct syscall_tp));
+	if (evsel->priv != NULL) {
 		if (perf_evsel__init_sc_tp_uint_field(evsel, id))
 			goto out_delete;
 
 		evsel->handler = handler;
+		return 0;
+	}
+
+	return -ENOMEM;
+
+out_delete:
+	free(evsel->priv);
+	evsel->priv = NULL;
+	return -ENOENT;
+}
+
+static struct perf_evsel *perf_evsel__syscall_newtp(const char *direction, void *handler)
+{
+	struct perf_evsel *evsel = perf_evsel__newtp("raw_syscalls", direction);
+
+	if (evsel) {
+		if (perf_evsel__init_syscall_tp(evsel, handler))
+			goto out_delete;
 	}
 
 	return evsel;
@@ -186,17 +197,16 @@ static int perf_evlist__add_syscall_newtp(struct perf_evlist *evlist,
 					  void *sys_exit_handler)
 {
 	int ret = -1;
-	int idx = evlist->nr_entries;
 	struct perf_evsel *sys_enter, *sys_exit;
 
-	sys_enter = perf_evsel__syscall_newtp("sys_enter", sys_enter_handler, idx++);
+	sys_enter = perf_evsel__syscall_newtp("sys_enter", sys_enter_handler);
 	if (sys_enter == NULL)
 		goto out;
 
 	if (perf_evsel__init_sc_tp_ptr_field(sys_enter, args))
 		goto out_delete_sys_enter;
 
-	sys_exit = perf_evsel__syscall_newtp("sys_exit", sys_exit_handler, idx++);
+	sys_exit = perf_evsel__syscall_newtp("sys_exit", sys_exit_handler);
 	if (sys_exit == NULL)
 		goto out_delete_sys_enter;
 
@@ -953,7 +963,8 @@ static struct syscall_fmt {
 	{ .name	    = "mmap",	    .hexret = true,
 	  .arg_scnprintf = { [0] = SCA_HEX,	  /* addr */
 			     [2] = SCA_MMAP_PROT, /* prot */
-			     [3] = SCA_MMAP_FLAGS, /* flags */ }, },
+			     [3] = SCA_MMAP_FLAGS, /* flags */
+			     [4] = SCA_FD, 	  /* fd */ }, },
 	{ .name	    = "mprotect",   .errmsg = true,
 	  .arg_scnprintf = { [0] = SCA_HEX, /* start */
 			     [2] = SCA_MMAP_PROT, /* prot */ }, },
@@ -1157,6 +1168,7 @@ struct trace {
 	bool			sched;
 	bool			multiple_threads;
 	bool			summary;
+	bool			summary_only;
 	bool			show_comm;
 	bool			show_tool_stats;
 	double			duration_filter;
@@ -1342,15 +1354,8 @@ static int trace__symbols_init(struct trace *trace, struct perf_evlist *evlist)
 	if (trace->host == NULL)
 		return -ENOMEM;
 
-	if (perf_target__has_task(&trace->opts.target)) {
-		err = perf_event__synthesize_thread_map(&trace->tool, evlist->threads,
-							trace__tool_process,
-							trace->host);
-	} else {
-		err = perf_event__synthesize_threads(&trace->tool, trace__tool_process,
-						     trace->host);
-	}
-
+	err = __machine__synthesize_threads(trace->host, &trace->tool, &trace->opts.target,
+					    evlist->threads, trace__tool_process, false);
 	if (err)
 		symbol__exit();
 
@@ -1607,7 +1612,7 @@ static int trace__sys_enter(struct trace *trace, struct perf_evsel *evsel,
 					   args, trace, thread);
 
 	if (!strcmp(sc->name, "exit_group") || !strcmp(sc->name, "exit")) {
-		if (!trace->duration_filter) {
+		if (!trace->duration_filter && !trace->summary_only) {
 			trace__fprintf_entry_head(trace, thread, 1, sample->time, trace->output);
 			fprintf(trace->output, "%-70s\n", ttrace->entry_str);
 		}
@@ -1660,6 +1665,9 @@ static int trace__sys_exit(struct trace *trace, struct perf_evsel *evsel,
 	} else if (trace->duration_filter)
 		goto out;
 
+	if (trace->summary_only)
+		goto out;
+
 	trace__fprintf_entry_head(trace, thread, duration, sample->time, trace->output);
 
 	if (ttrace->entry_pending) {
@@ -1762,16 +1770,6 @@ static int trace__process_sample(struct perf_tool *tool,
 	return err;
 }
 
-static bool
-perf_session__has_tp(struct perf_session *session, const char *name)
-{
-	struct perf_evsel *evsel;
-
-	evsel = perf_evlist__find_tracepoint_by_name(session->evlist, name);
-
-	return evsel != NULL;
-}
-
 static int parse_target_str(struct trace *trace)
 {
 	if (trace->opts.target.pid) {
@@ -1824,8 +1822,7 @@ static size_t trace__fprintf_thread_summary(struct trace *trace, FILE *fp);
 
 static void perf_evlist__add_vfs_getname(struct perf_evlist *evlist)
 {
-	struct perf_evsel *evsel = perf_evsel__newtp("probe", "vfs_getname",
-						     evlist->nr_entries);
+	struct perf_evsel *evsel = perf_evsel__newtp("probe", "vfs_getname");
 	if (evsel == NULL)
 		return;
 
@@ -2009,8 +2006,6 @@ out_error:
 static int trace__replay(struct trace *trace)
 {
 	const struct perf_evsel_str_handler handlers[] = {
-		{ "raw_syscalls:sys_enter",  trace__sys_enter, },
-		{ "raw_syscalls:sys_exit",   trace__sys_exit, },
 		{ "probe:vfs_getname",	     trace__vfs_getname, },
 	};
 	struct perf_data_file file = {
@@ -2018,6 +2013,7 @@ static int trace__replay(struct trace *trace)
 		.mode  = PERF_DATA_MODE_READ,
 	};
 	struct perf_session *session;
+	struct perf_evsel *evsel;
 	int err = -1;
 
 	trace->tool.sample	  = trace__process_sample;
@@ -2049,13 +2045,29 @@ static int trace__replay(struct trace *trace)
 	if (err)
 		goto out;
 
-	if (!perf_session__has_tp(session, "raw_syscalls:sys_enter")) {
-		pr_err("Data file does not have raw_syscalls:sys_enter events\n");
+	evsel = perf_evlist__find_tracepoint_by_name(session->evlist,
+						     "raw_syscalls:sys_enter");
+	if (evsel == NULL) {
+		pr_err("Data file does not have raw_syscalls:sys_enter event\n");
 		goto out;
 	}
 
-	if (!perf_session__has_tp(session, "raw_syscalls:sys_exit")) {
-		pr_err("Data file does not have raw_syscalls:sys_exit events\n");
+	if (perf_evsel__init_syscall_tp(evsel, trace__sys_enter) < 0 ||
+	    perf_evsel__init_sc_tp_ptr_field(evsel, args)) {
+		pr_err("Error during initialize raw_syscalls:sys_enter event\n");
+		goto out;
+	}
+
+	evsel = perf_evlist__find_tracepoint_by_name(session->evlist,
+						     "raw_syscalls:sys_exit");
+	if (evsel == NULL) {
+		pr_err("Data file does not have raw_syscalls:sys_exit event\n");
+		goto out;
+	}
+
+	if (perf_evsel__init_syscall_tp(evsel, trace__sys_exit) < 0 ||
+	    perf_evsel__init_sc_tp_uint_field(evsel, ret)) {
+		pr_err("Error during initialize raw_syscalls:sys_exit event\n");
 		goto out;
 	}
 
@@ -2082,12 +2094,7 @@ static size_t trace__fprintf_threads_header(FILE *fp)
 {
 	size_t printed;
 
-	printed  = fprintf(fp, "\n _____________________________________________________________________________\n");
-	printed += fprintf(fp, " __)    Summary of events    (__\n\n");
-	printed += fprintf(fp, "              [ task - pid ]     [ events ] [ ratio ]  [ runtime ]\n");
-	printed += fprintf(fp, "                                  syscall  count    min     max    avg  stddev\n");
-	printed += fprintf(fp, "                                                   msec    msec   msec     %%\n");
-	printed += fprintf(fp, " _____________________________________________________________________________\n\n");
+	printed  = fprintf(fp, "\n Summary of events:\n\n");
 
 	return printed;
 }
@@ -2105,6 +2112,10 @@ static size_t thread__dump_stats(struct thread_trace *ttrace,
 
 	printed += fprintf(fp, "\n");
 
+	printed += fprintf(fp, "                                                    msec/call\n");
+	printed += fprintf(fp, "   syscall            calls      min      avg      max stddev\n");
+	printed += fprintf(fp, "   --------------- -------- -------- -------- -------- ------\n");
+
 	/* each int_node is a syscall */
 	while (inode) {
 		stats = inode->priv;
@@ -2119,10 +2130,10 @@ static size_t thread__dump_stats(struct thread_trace *ttrace,
 			avg /= NSEC_PER_MSEC;
 
 			sc = &trace->syscalls.table[inode->i];
-			printed += fprintf(fp, "%24s  %14s : ", "", sc->name);
-			printed += fprintf(fp, "%5" PRIu64 "  %8.3f  %8.3f",
-					   n, min, max);
-			printed += fprintf(fp, "  %8.3f  %6.2f\n", avg, pct);
+			printed += fprintf(fp, "   %-15s", sc->name);
+			printed += fprintf(fp, " %8" PRIu64 " %8.3f %8.3f",
+					   n, min, avg);
+			printed += fprintf(fp, " %8.3f %6.2f\n", max, pct);
 		}
 
 		inode = intlist__next(inode);
@@ -2163,10 +2174,10 @@ static int trace__fprintf_one_thread(struct thread *thread, void *priv)
 	else if (ratio > 5.0)
 		color = PERF_COLOR_YELLOW;
 
-	printed += color_fprintf(fp, color, "%20s", thread__comm_str(thread));
-	printed += fprintf(fp, " - %-5d :%11lu   [", thread->tid, ttrace->nr_events);
-	printed += color_fprintf(fp, color, "%5.1f%%", ratio);
-	printed += fprintf(fp, " ] %10.3f ms\n", ttrace->runtime_ms);
+	printed += color_fprintf(fp, color, " %s (%d), ", thread__comm_str(thread), thread->tid);
+	printed += fprintf(fp, "%lu events, ", ttrace->nr_events);
+	printed += color_fprintf(fp, color, "%.1f%%", ratio);
+	printed += fprintf(fp, ", %.3f msec\n", ttrace->runtime_ms);
 	printed += thread__dump_stats(ttrace, trace, fp);
 
 	data->printed += printed;
@@ -2275,8 +2286,10 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_INCR('v', "verbose", &verbose, "be more verbose"),
 	OPT_BOOLEAN('T', "time", &trace.full_time,
 		    "Show full timestamp, not time relative to first start"),
-	OPT_BOOLEAN(0, "summary", &trace.summary,
-		    "Show syscall summary with statistics"),
+	OPT_BOOLEAN('s', "summary", &trace.summary_only,
+		    "Show only syscall summary with statistics"),
+	OPT_BOOLEAN('S', "with-summary", &trace.summary,
+		    "Show all syscalls and summary with statistics"),
 	OPT_END()
 	};
 	int err;
@@ -2287,6 +2300,10 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused)
 
 	argc = parse_options(argc, argv, trace_options, trace_usage, 0);
 
+	/* summary_only implies summary option, but don't overwrite summary if set */
+	if (trace.summary_only)
+		trace.summary = trace.summary_only;
+
 	if (output_name != NULL) {
 		err = trace__open_output(&trace, output_name);
 		if (err < 0) {
@@ -2310,21 +2327,21 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused)
 		}
 	}
 
-	err = perf_target__validate(&trace.opts.target);
+	err = target__validate(&trace.opts.target);
 	if (err) {
-		perf_target__strerror(&trace.opts.target, err, bf, sizeof(bf));
+		target__strerror(&trace.opts.target, err, bf, sizeof(bf));
 		fprintf(trace.output, "%s", bf);
 		goto out_close;
 	}
 
-	err = perf_target__parse_uid(&trace.opts.target);
+	err = target__parse_uid(&trace.opts.target);
 	if (err) {
-		perf_target__strerror(&trace.opts.target, err, bf, sizeof(bf));
+		target__strerror(&trace.opts.target, err, bf, sizeof(bf));
 		fprintf(trace.output, "%s", bf);
 		goto out_close;
 	}
 
-	if (!argc && perf_target__none(&trace.opts.target))
+	if (!argc && target__none(&trace.opts.target))
 		trace.opts.target.system_wide = true;
 
 	if (input_name)
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 6a587e84..b079304 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -248,7 +248,7 @@ enum perf_call_graph_mode {
 };
 
 struct perf_record_opts {
-	struct perf_target target;
+	struct target target;
 	int	     call_graph;
 	bool	     group;
 	bool	     inherit_stat;
diff --git a/tools/perf/tests/code-reading.c b/tools/perf/tests/code-reading.c
index 49ccc3b..85d4919 100644
--- a/tools/perf/tests/code-reading.c
+++ b/tools/perf/tests/code-reading.c
@@ -275,8 +275,19 @@ static int process_event(struct machine *machine, struct perf_evlist *evlist,
 	if (event->header.type == PERF_RECORD_SAMPLE)
 		return process_sample_event(machine, evlist, event, state);
 
-	if (event->header.type < PERF_RECORD_MAX)
-		return machine__process_event(machine, event, NULL);
+	if (event->header.type == PERF_RECORD_THROTTLE ||
+	    event->header.type == PERF_RECORD_UNTHROTTLE)
+		return 0;
+
+	if (event->header.type < PERF_RECORD_MAX) {
+		int ret;
+
+		ret = machine__process_event(machine, event, NULL);
+		if (ret < 0)
+			pr_debug("machine__process_event failed, event type %u\n",
+				 event->header.type);
+		return ret;
+	}
 
 	return 0;
 }
@@ -441,7 +452,7 @@ static int do_test_code_reading(bool try_kcore)
 	}
 
 	ret = perf_event__synthesize_thread_map(NULL, threads,
-						perf_event__process, machine);
+						perf_event__process, machine, false);
 	if (ret < 0) {
 		pr_debug("perf_event__synthesize_thread_map failed\n");
 		goto out_err;
diff --git a/tools/perf/tests/evsel-tp-sched.c b/tools/perf/tests/evsel-tp-sched.c
index 9b98c15..4774f7f 100644
--- a/tools/perf/tests/evsel-tp-sched.c
+++ b/tools/perf/tests/evsel-tp-sched.c
@@ -32,7 +32,7 @@ static int perf_evsel__test_field(struct perf_evsel *evsel, const char *name,
 
 int test__perf_evsel__tp_sched_test(void)
 {
-	struct perf_evsel *evsel = perf_evsel__newtp("sched", "sched_switch", 0);
+	struct perf_evsel *evsel = perf_evsel__newtp("sched", "sched_switch");
 	int ret = 0;
 
 	if (evsel == NULL) {
@@ -63,7 +63,7 @@ int test__perf_evsel__tp_sched_test(void)
 
 	perf_evsel__delete(evsel);
 
-	evsel = perf_evsel__newtp("sched", "sched_wakeup", 0);
+	evsel = perf_evsel__newtp("sched", "sched_wakeup");
 
 	if (perf_evsel__test_field(evsel, "comm", 16, true))
 		ret = -1;
diff --git a/tools/perf/tests/mmap-basic.c b/tools/perf/tests/mmap-basic.c
index a7232c2..d64ab79 100644
--- a/tools/perf/tests/mmap-basic.c
+++ b/tools/perf/tests/mmap-basic.c
@@ -65,7 +65,7 @@ int test__basic_mmap(void)
 		char name[64];
 
 		snprintf(name, sizeof(name), "sys_enter_%s", syscall_names[i]);
-		evsels[i] = perf_evsel__newtp("syscalls", name, i);
+		evsels[i] = perf_evsel__newtp("syscalls", name);
 		if (evsels[i] == NULL) {
 			pr_debug("perf_evsel__new\n");
 			goto out_free_evlist;
diff --git a/tools/perf/tests/open-syscall-all-cpus.c b/tools/perf/tests/open-syscall-all-cpus.c
index b0657a9..5fecdbd 100644
--- a/tools/perf/tests/open-syscall-all-cpus.c
+++ b/tools/perf/tests/open-syscall-all-cpus.c
@@ -26,7 +26,7 @@ int test__open_syscall_event_on_all_cpus(void)
 
 	CPU_ZERO(&cpu_set);
 
-	evsel = perf_evsel__newtp("syscalls", "sys_enter_open", 0);
+	evsel = perf_evsel__newtp("syscalls", "sys_enter_open");
 	if (evsel == NULL) {
 		pr_debug("is debugfs mounted on /sys/kernel/debug?\n");
 		goto out_thread_map_delete;
diff --git a/tools/perf/tests/open-syscall-tp-fields.c b/tools/perf/tests/open-syscall-tp-fields.c
index 524b221..41cc0ba 100644
--- a/tools/perf/tests/open-syscall-tp-fields.c
+++ b/tools/perf/tests/open-syscall-tp-fields.c
@@ -27,7 +27,7 @@ int test__syscall_open_tp_fields(void)
 		goto out;
 	}
 
-	evsel = perf_evsel__newtp("syscalls", "sys_enter_open", 0);
+	evsel = perf_evsel__newtp("syscalls", "sys_enter_open");
 	if (evsel == NULL) {
 		pr_debug("%s: perf_evsel__newtp\n", __func__);
 		goto out_delete_evlist;
diff --git a/tools/perf/tests/open-syscall.c b/tools/perf/tests/open-syscall.c
index befc067..c1dc7d2 100644
--- a/tools/perf/tests/open-syscall.c
+++ b/tools/perf/tests/open-syscall.c
@@ -15,7 +15,7 @@ int test__open_syscall_event(void)
 		return -1;
 	}
 
-	evsel = perf_evsel__newtp("syscalls", "sys_enter_open", 0);
+	evsel = perf_evsel__newtp("syscalls", "sys_enter_open");
 	if (evsel == NULL) {
 		pr_debug("is debugfs mounted on /sys/kernel/debug?\n");
 		goto out_thread_map_delete;
diff --git a/tools/perf/tests/sw-clock.c b/tools/perf/tests/sw-clock.c
index 6e2b44e..6664a7c 100644
--- a/tools/perf/tests/sw-clock.c
+++ b/tools/perf/tests/sw-clock.c
@@ -9,7 +9,7 @@
 #include "util/cpumap.h"
 #include "util/thread_map.h"
 
-#define NR_LOOPS  1000000
+#define NR_LOOPS  10000000
 
 /*
  * This test will open software clock events (cpu-clock, task-clock)
@@ -34,7 +34,7 @@ static int __test__sw_clock_freq(enum perf_sw_ids clock_id)
 		.freq = 1,
 	};
 
-	attr.sample_freq = 10000;
+	attr.sample_freq = 500;
 
 	evlist = perf_evlist__new();
 	if (evlist == NULL) {
@@ -42,7 +42,7 @@ static int __test__sw_clock_freq(enum perf_sw_ids clock_id)
 		return -1;
 	}
 
-	evsel = perf_evsel__new(&attr, 0);
+	evsel = perf_evsel__new(&attr);
 	if (evsel == NULL) {
 		pr_debug("perf_evsel__new\n");
 		goto out_free_evlist;
@@ -57,7 +57,14 @@ static int __test__sw_clock_freq(enum perf_sw_ids clock_id)
 		goto out_delete_maps;
 	}
 
-	perf_evlist__open(evlist);
+	if (perf_evlist__open(evlist)) {
+		const char *knob = "/proc/sys/kernel/perf_event_max_sample_rate";
+
+		err = -errno;
+		pr_debug("Couldn't open evlist: %s\nHint: check %s, using %" PRIu64 " in this test.\n",
+			 strerror(errno), knob, (u64)attr.sample_freq);
+		goto out_delete_maps;
+	}
 
 	err = perf_evlist__mmap(evlist, 128, true);
 	if (err < 0) {
diff --git a/tools/perf/tests/task-exit.c b/tools/perf/tests/task-exit.c
index c33d95f..d09ab57 100644
--- a/tools/perf/tests/task-exit.c
+++ b/tools/perf/tests/task-exit.c
@@ -28,7 +28,7 @@ int test__task_exit(void)
 	union perf_event *event;
 	struct perf_evsel *evsel;
 	struct perf_evlist *evlist;
-	struct perf_target target = {
+	struct target target = {
 		.uid		= UINT_MAX,
 		.uses_mmap	= true,
 	};
diff --git a/tools/perf/ui/tui/progress.c b/tools/perf/ui/tui/progress.c
index 3e2d936..c61d14b 100644
--- a/tools/perf/ui/tui/progress.c
+++ b/tools/perf/ui/tui/progress.c
@@ -18,13 +18,14 @@ static void tui_progress__update(struct ui_progress *p)
 	if (p->total == 0)
 		return;
 
-	ui__refresh_dimensions(true);
+	ui__refresh_dimensions(false);
 	pthread_mutex_lock(&ui__lock);
 	y = SLtt_Screen_Rows / 2 - 2;
 	SLsmg_set_color(0);
 	SLsmg_draw_box(y, 0, 3, SLtt_Screen_Cols);
 	SLsmg_gotorc(y++, 1);
 	SLsmg_write_string((char *)p->title);
+	SLsmg_fill_region(y, 1, 1, SLtt_Screen_Cols - 2, ' ');
 	SLsmg_set_color(HE_COLORSET_SELECTED);
 	bar = ((SLtt_Screen_Cols - 2) * p->curr) / p->total;
 	SLsmg_fill_region(y, 1, 1, bar, ' ');
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index ec9ae11..6e3a846 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -170,7 +170,8 @@ static int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 					      union perf_event *event,
 					      pid_t pid, pid_t tgid,
 					      perf_event__handler_t process,
-					      struct machine *machine)
+					      struct machine *machine,
+					      bool mmap_data)
 {
 	char filename[PATH_MAX];
 	FILE *fp;
@@ -188,10 +189,6 @@ static int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 	}
 
 	event->header.type = PERF_RECORD_MMAP;
-	/*
-	 * Just like the kernel, see __perf_event_mmap in kernel/perf_event.c
-	 */
-	event->header.misc = PERF_RECORD_MISC_USER;
 
 	while (1) {
 		char bf[BUFSIZ];
@@ -215,9 +212,17 @@ static int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 
 		if (n != 5)
 			continue;
+		/*
+		 * Just like the kernel, see __perf_event_mmap in kernel/perf_event.c
+		 */
+		event->header.misc = PERF_RECORD_MISC_USER;
 
-		if (prot[2] != 'x')
-			continue;
+		if (prot[2] != 'x') {
+			if (!mmap_data || prot[0] != 'r')
+				continue;
+
+			event->header.misc |= PERF_RECORD_MISC_MMAP_DATA;
+		}
 
 		if (!strcmp(execname, ""))
 			strcpy(execname, anonstr);
@@ -304,20 +309,21 @@ static int __event__synthesize_thread(union perf_event *comm_event,
 				      pid_t pid, int full,
 					  perf_event__handler_t process,
 				      struct perf_tool *tool,
-				      struct machine *machine)
+				      struct machine *machine, bool mmap_data)
 {
 	pid_t tgid = perf_event__synthesize_comm(tool, comm_event, pid, full,
 						 process, machine);
 	if (tgid == -1)
 		return -1;
 	return perf_event__synthesize_mmap_events(tool, mmap_event, pid, tgid,
-						  process, machine);
+						  process, machine, mmap_data);
 }
 
 int perf_event__synthesize_thread_map(struct perf_tool *tool,
 				      struct thread_map *threads,
 				      perf_event__handler_t process,
-				      struct machine *machine)
+				      struct machine *machine,
+				      bool mmap_data)
 {
 	union perf_event *comm_event, *mmap_event;
 	int err = -1, thread, j;
@@ -334,7 +340,8 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
 	for (thread = 0; thread < threads->nr; ++thread) {
 		if (__event__synthesize_thread(comm_event, mmap_event,
 					       threads->map[thread], 0,
-					       process, tool, machine)) {
+					       process, tool, machine,
+					       mmap_data)) {
 			err = -1;
 			break;
 		}
@@ -356,10 +363,10 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
 
 			/* if not, generate events for it */
 			if (need_leader &&
-			    __event__synthesize_thread(comm_event,
-						      mmap_event,
-						      comm_event->comm.pid, 0,
-						      process, tool, machine)) {
+			    __event__synthesize_thread(comm_event, mmap_event,
+						       comm_event->comm.pid, 0,
+						       process, tool, machine,
+						       mmap_data)) {
 				err = -1;
 				break;
 			}
@@ -374,7 +381,7 @@ out:
 
 int perf_event__synthesize_threads(struct perf_tool *tool,
 				   perf_event__handler_t process,
-				   struct machine *machine)
+				   struct machine *machine, bool mmap_data)
 {
 	DIR *proc;
 	struct dirent dirent, *next;
@@ -404,7 +411,7 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
  		 * one thread couldn't be synthesized.
  		 */
 		__event__synthesize_thread(comm_event, mmap_event, pid, 1,
-					   process, tool, machine);
+					   process, tool, machine, mmap_data);
 	}
 
 	err = 0;
@@ -528,19 +535,22 @@ int perf_event__process_lost(struct perf_tool *tool __maybe_unused,
 
 size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp)
 {
-	return fprintf(fp, " %d/%d: [%#" PRIx64 "(%#" PRIx64 ") @ %#" PRIx64 "]: %s\n",
+	return fprintf(fp, " %d/%d: [%#" PRIx64 "(%#" PRIx64 ") @ %#" PRIx64 "]: %c %s\n",
 		       event->mmap.pid, event->mmap.tid, event->mmap.start,
-		       event->mmap.len, event->mmap.pgoff, event->mmap.filename);
+		       event->mmap.len, event->mmap.pgoff,
+		       (event->header.misc & PERF_RECORD_MISC_MMAP_DATA) ? 'r' : 'x',
+		       event->mmap.filename);
 }
 
 size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp)
 {
 	return fprintf(fp, " %d/%d: [%#" PRIx64 "(%#" PRIx64 ") @ %#" PRIx64
-			   " %02x:%02x %"PRIu64" %"PRIu64"]: %s\n",
+			   " %02x:%02x %"PRIu64" %"PRIu64"]: %c %s\n",
 		       event->mmap2.pid, event->mmap2.tid, event->mmap2.start,
 		       event->mmap2.len, event->mmap2.pgoff, event->mmap2.maj,
 		       event->mmap2.min, event->mmap2.ino,
 		       event->mmap2.ino_generation,
+		       (event->header.misc & PERF_RECORD_MISC_MMAP_DATA) ? 'r' : 'x',
 		       event->mmap2.filename);
 }
 
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index f8d70f3..30fec99 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -208,10 +208,10 @@ typedef int (*perf_event__handler_t)(struct perf_tool *tool,
 int perf_event__synthesize_thread_map(struct perf_tool *tool,
 				      struct thread_map *threads,
 				      perf_event__handler_t process,
-				      struct machine *machine);
+				      struct machine *machine, bool mmap_data);
 int perf_event__synthesize_threads(struct perf_tool *tool,
 				   perf_event__handler_t process,
-				   struct machine *machine);
+				   struct machine *machine, bool mmap_data);
 int perf_event__synthesize_kernel_mmap(struct perf_tool *tool,
 				       perf_event__handler_t process,
 				       struct machine *machine,
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index b939221..dc6fa3f 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -117,6 +117,8 @@ void perf_evlist__delete(struct perf_evlist *evlist)
 void perf_evlist__add(struct perf_evlist *evlist, struct perf_evsel *entry)
 {
 	list_add_tail(&entry->node, &evlist->entries);
+	entry->idx = evlist->nr_entries;
+
 	if (!evlist->nr_entries++)
 		perf_evlist__set_id_pos(evlist);
 }
@@ -165,7 +167,7 @@ int perf_evlist__add_default(struct perf_evlist *evlist)
 
 	event_attr_init(&attr);
 
-	evsel = perf_evsel__new(&attr, 0);
+	evsel = perf_evsel__new(&attr);
 	if (evsel == NULL)
 		goto error;
 
@@ -190,7 +192,7 @@ static int perf_evlist__add_attrs(struct perf_evlist *evlist,
 	size_t i;
 
 	for (i = 0; i < nr_attrs; i++) {
-		evsel = perf_evsel__new(attrs + i, evlist->nr_entries + i);
+		evsel = perf_evsel__new_idx(attrs + i, evlist->nr_entries + i);
 		if (evsel == NULL)
 			goto out_delete_partial_list;
 		list_add_tail(&evsel->node, &head);
@@ -249,9 +251,8 @@ perf_evlist__find_tracepoint_by_name(struct perf_evlist *evlist,
 int perf_evlist__add_newtp(struct perf_evlist *evlist,
 			   const char *sys, const char *name, void *handler)
 {
-	struct perf_evsel *evsel;
+	struct perf_evsel *evsel = perf_evsel__newtp(sys, name);
 
-	evsel = perf_evsel__newtp(sys, name, evlist->nr_entries);
 	if (evsel == NULL)
 		return -1;
 
@@ -704,12 +705,10 @@ static size_t perf_evlist__mmap_size(unsigned long pages)
 	return (pages + 1) * page_size;
 }
 
-int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
-				  int unset __maybe_unused)
+static long parse_pages_arg(const char *str, unsigned long min,
+			    unsigned long max)
 {
-	unsigned int *mmap_pages = opt->value;
 	unsigned long pages, val;
-	size_t size;
 	static struct parse_tag tags[] = {
 		{ .tag  = 'B', .mult = 1       },
 		{ .tag  = 'K', .mult = 1 << 10 },
@@ -718,33 +717,49 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
 		{ .tag  = 0 },
 	};
 
+	if (str == NULL)
+		return -EINVAL;
+
 	val = parse_tag_value(str, tags);
 	if (val != (unsigned long) -1) {
 		/* we got file size value */
 		pages = PERF_ALIGN(val, page_size) / page_size;
-		if (pages < (1UL << 31) && !is_power_of_2(pages)) {
-			pages = next_pow2(pages);
-			pr_info("rounding mmap pages size to %lu (%lu pages)\n",
-				pages * page_size, pages);
-		}
 	} else {
 		/* we got pages count value */
 		char *eptr;
 		pages = strtoul(str, &eptr, 10);
-		if (*eptr != '\0') {
-			pr_err("failed to parse --mmap_pages/-m value\n");
-			return -1;
-		}
+		if (*eptr != '\0')
+			return -EINVAL;
 	}
 
-	if (pages > UINT_MAX || pages > SIZE_MAX / page_size) {
-		pr_err("--mmap_pages/-m value too big\n");
-		return -1;
+	if ((pages == 0) && (min == 0)) {
+		/* leave number of pages at 0 */
+	} else if (pages < (1UL << 31) && !is_power_of_2(pages)) {
+		/* round pages up to next power of 2 */
+		pages = next_pow2(pages);
+		pr_info("rounding mmap pages size to %lu bytes (%lu pages)\n",
+			pages * page_size, pages);
 	}
 
-	size = perf_evlist__mmap_size(pages);
-	if (!size) {
-		pr_err("--mmap_pages/-m value must be a power of two.");
+	if (pages > max)
+		return -EINVAL;
+
+	return pages;
+}
+
+int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
+				  int unset __maybe_unused)
+{
+	unsigned int *mmap_pages = opt->value;
+	unsigned long max = UINT_MAX;
+	long pages;
+
+	if (max < SIZE_MAX / page_size)
+		max = SIZE_MAX / page_size;
+
+	pages = parse_pages_arg(str, 1, max);
+	if (pages < 0) {
+		pr_err("Invalid argument for --mmap_pages/-m\n");
 		return -1;
 	}
 
@@ -796,8 +811,7 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
 	return perf_evlist__mmap_per_cpu(evlist, prot, mask);
 }
 
-int perf_evlist__create_maps(struct perf_evlist *evlist,
-			     struct perf_target *target)
+int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
 {
 	evlist->threads = thread_map__new_str(target->pid, target->tid,
 					      target->uid);
@@ -805,9 +819,9 @@ int perf_evlist__create_maps(struct perf_evlist *evlist,
 	if (evlist->threads == NULL)
 		return -1;
 
-	if (perf_target__has_task(target))
+	if (target__has_task(target))
 		evlist->cpus = cpu_map__dummy_new();
-	else if (!perf_target__has_cpu(target) && !target->uses_mmap)
+	else if (!target__has_cpu(target) && !target->uses_mmap)
 		evlist->cpus = cpu_map__dummy_new();
 	else
 		evlist->cpus = cpu_map__new(target->cpu_list);
@@ -1016,8 +1030,7 @@ out_err:
 	return err;
 }
 
-int perf_evlist__prepare_workload(struct perf_evlist *evlist,
-				  struct perf_target *target,
+int perf_evlist__prepare_workload(struct perf_evlist *evlist, struct target *target,
 				  const char *argv[], bool pipe_output,
 				  bool want_signal)
 {
@@ -1069,7 +1082,7 @@ int perf_evlist__prepare_workload(struct perf_evlist *evlist,
 		exit(-1);
 	}
 
-	if (perf_target__none(target))
+	if (target__none(target))
 		evlist->threads->map[0] = evlist->workload.pid;
 
 	close(child_ready_pipe[1]);
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index ecaa582..649d6ea 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -102,7 +102,7 @@ void perf_evlist__config(struct perf_evlist *evlist,
 int perf_record_opts__config(struct perf_record_opts *opts);
 
 int perf_evlist__prepare_workload(struct perf_evlist *evlist,
-				  struct perf_target *target,
+				  struct target *target,
 				  const char *argv[], bool pipe_output,
 				  bool want_signal);
 int perf_evlist__start_workload(struct perf_evlist *evlist);
@@ -134,8 +134,7 @@ static inline void perf_evlist__set_maps(struct perf_evlist *evlist,
 	evlist->threads	= threads;
 }
 
-int perf_evlist__create_maps(struct perf_evlist *evlist,
-			     struct perf_target *target);
+int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target);
 void perf_evlist__delete_maps(struct perf_evlist *evlist);
 int perf_evlist__apply_filters(struct perf_evlist *evlist);
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 5280820..18f7c18 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -168,7 +168,7 @@ void perf_evsel__init(struct perf_evsel *evsel,
 	perf_evsel__calc_id_pos(evsel);
 }
 
-struct perf_evsel *perf_evsel__new(struct perf_event_attr *attr, int idx)
+struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx)
 {
 	struct perf_evsel *evsel = zalloc(sizeof(*evsel));
 
@@ -219,7 +219,7 @@ out:
 	return format;
 }
 
-struct perf_evsel *perf_evsel__newtp(const char *sys, const char *name, int idx)
+struct perf_evsel *perf_evsel__newtp_idx(const char *sys, const char *name, int idx)
 {
 	struct perf_evsel *evsel = zalloc(sizeof(*evsel));
 
@@ -645,7 +645,7 @@ void perf_evsel__config(struct perf_evsel *evsel,
 		}
 	}
 
-	if (perf_target__has_cpu(&opts->target))
+	if (target__has_cpu(&opts->target))
 		perf_evsel__set_sample_bit(evsel, CPU);
 
 	if (opts->period)
@@ -653,7 +653,7 @@ void perf_evsel__config(struct perf_evsel *evsel,
 
 	if (!perf_missing_features.sample_id_all &&
 	    (opts->sample_time || !opts->no_inherit ||
-	     perf_target__has_cpu(&opts->target)))
+	     target__has_cpu(&opts->target)))
 		perf_evsel__set_sample_bit(evsel, TIME);
 
 	if (opts->raw_samples) {
@@ -696,7 +696,7 @@ void perf_evsel__config(struct perf_evsel *evsel,
 	 * Setting enable_on_exec for independent events and
 	 * group leaders for traced executed by perf.
 	 */
-	if (perf_target__none(&opts->target) && perf_evsel__is_group_leader(evsel))
+	if (target__none(&opts->target) && perf_evsel__is_group_leader(evsel))
 		attr->enable_on_exec = 1;
 }
 
@@ -2006,8 +2006,7 @@ bool perf_evsel__fallback(struct perf_evsel *evsel, int err,
 	return false;
 }
 
-int perf_evsel__open_strerror(struct perf_evsel *evsel,
-			      struct perf_target *target,
+int perf_evsel__open_strerror(struct perf_evsel *evsel, struct target *target,
 			      int err, char *msg, size_t size)
 {
 	switch (err) {
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 64ec8e1..f502965 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -96,8 +96,19 @@ struct thread_map;
 struct perf_evlist;
 struct perf_record_opts;
 
-struct perf_evsel *perf_evsel__new(struct perf_event_attr *attr, int idx);
-struct perf_evsel *perf_evsel__newtp(const char *sys, const char *name, int idx);
+struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx);
+
+static inline struct perf_evsel *perf_evsel__new(struct perf_event_attr *attr)
+{
+	return perf_evsel__new_idx(attr, 0);
+}
+
+struct perf_evsel *perf_evsel__newtp_idx(const char *sys, const char *name, int idx);
+
+static inline struct perf_evsel *perf_evsel__newtp(const char *sys, const char *name)
+{
+	return perf_evsel__newtp_idx(sys, name, 0);
+}
 
 struct event_format *event_format__new(const char *sys, const char *name);
 
@@ -307,8 +318,7 @@ int perf_evsel__fprintf(struct perf_evsel *evsel,
 
 bool perf_evsel__fallback(struct perf_evsel *evsel, int err,
 			  char *msg, size_t msgsize);
-int perf_evsel__open_strerror(struct perf_evsel *evsel,
-			      struct perf_target *target,
+int perf_evsel__open_strerror(struct perf_evsel *evsel, struct target *target,
 			      int err, char *msg, size_t size);
 
 static inline int perf_evsel__group_idx(struct perf_evsel *evsel)
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 26d9520..369c036 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2797,7 +2797,7 @@ int perf_session__read_header(struct perf_session *session)
 			perf_event__attr_swap(&f_attr.attr);
 
 		tmp = lseek(fd, 0, SEEK_CUR);
-		evsel = perf_evsel__new(&f_attr.attr, i);
+		evsel = perf_evsel__new(&f_attr.attr);
 
 		if (evsel == NULL)
 			goto out_delete_evlist;
@@ -2916,7 +2916,7 @@ int perf_event__process_attr(struct perf_tool *tool __maybe_unused,
 			return -ENOMEM;
 	}
 
-	evsel = perf_evsel__new(&event->attr.attr, evlist->nr_entries);
+	evsel = perf_evsel__new(&event->attr.attr);
 	if (evsel == NULL)
 		return -ENOMEM;
 
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index ce034c1..0393912 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1394,3 +1394,15 @@ int machine__for_each_thread(struct machine *machine,
 	}
 	return rc;
 }
+
+int __machine__synthesize_threads(struct machine *machine, struct perf_tool *tool,
+				  struct target *target, struct thread_map *threads,
+				  perf_event__handler_t process, bool data_mmap)
+{
+	if (target__has_task(target))
+		return perf_event__synthesize_thread_map(tool, threads, process, machine, data_mmap);
+	else if (target__has_cpu(target))
+		return perf_event__synthesize_threads(tool, process, machine, data_mmap);
+	/* command specified */
+	return 0;
+}
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 2389ba8..4771330 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -4,6 +4,7 @@
 #include <sys/types.h>
 #include <linux/rbtree.h>
 #include "map.h"
+#include "event.h"
 
 struct addr_location;
 struct branch_stack;
@@ -178,4 +179,15 @@ int machine__for_each_thread(struct machine *machine,
 			     int (*fn)(struct thread *thread, void *p),
 			     void *priv);
 
+int __machine__synthesize_threads(struct machine *machine, struct perf_tool *tool,
+				  struct target *target, struct thread_map *threads,
+				  perf_event__handler_t process, bool data_mmap);
+static inline
+int machine__synthesize_threads(struct machine *machine, struct target *target,
+				struct thread_map *threads, bool data_mmap)
+{
+	return __machine__synthesize_threads(machine, NULL, target, threads,
+					     perf_event__process, data_mmap);
+}
+
 #endif /* __PERF_MACHINE_H */
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index c90e55c..6de6f89 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -277,7 +277,7 @@ static int __add_event(struct list_head *list, int *idx,
 
 	event_attr_init(attr);
 
-	evsel = perf_evsel__new(attr, (*idx)++);
+	evsel = perf_evsel__new_idx(attr, (*idx)++);
 	if (!evsel)
 		return -ENOMEM;
 
@@ -378,7 +378,7 @@ static int add_tracepoint(struct list_head *list, int *idx,
 {
 	struct perf_evsel *evsel;
 
-	evsel = perf_evsel__newtp(sys_name, evt_name, (*idx)++);
+	evsel = perf_evsel__newtp_idx(sys_name, evt_name, (*idx)++);
 	if (!evsel)
 		return -ENOMEM;
 
@@ -1097,7 +1097,7 @@ static bool is_event_supported(u8 type, unsigned config)
 		.threads = { 0 },
 	};
 
-	evsel = perf_evsel__new(&attr, 0);
+	evsel = perf_evsel__new(&attr);
 	if (evsel) {
 		ret = perf_evsel__open(evsel, NULL, &tmap.map) >= 0;
 		perf_evsel__delete(evsel);
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 3c1b75c..8b0bb1f 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -1137,6 +1137,8 @@ static void sort_entry__setup_elide(struct sort_entry *se,
 
 void sort__setup_elide(FILE *output)
 {
+	struct sort_entry *se;
+
 	sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list,
 				"dso", output);
 	sort_entry__setup_elide(&sort_comm, symbol_conf.comm_list,
@@ -1172,4 +1174,15 @@ void sort__setup_elide(FILE *output)
 					"snoop", output);
 	}
 
+	/*
+	 * It makes no sense to elide all of sort entries.
+	 * Just revert them to show up again.
+	 */
+	list_for_each_entry(se, &hist_entry__sort_list, list) {
+		if (!se->elide)
+			return;
+	}
+
+	list_for_each_entry(se, &hist_entry__sort_list, list)
+		se->elide = false;
 }
diff --git a/tools/perf/util/target.c b/tools/perf/util/target.c
index 065528b..3c778a0 100644
--- a/tools/perf/util/target.c
+++ b/tools/perf/util/target.c
@@ -13,9 +13,9 @@
 #include <string.h>
 
 
-enum perf_target_errno perf_target__validate(struct perf_target *target)
+enum target_errno target__validate(struct target *target)
 {
-	enum perf_target_errno ret = PERF_ERRNO_TARGET__SUCCESS;
+	enum target_errno ret = TARGET_ERRNO__SUCCESS;
 
 	if (target->pid)
 		target->tid = target->pid;
@@ -23,42 +23,42 @@ enum perf_target_errno perf_target__validate(struct perf_target *target)
 	/* CPU and PID are mutually exclusive */
 	if (target->tid && target->cpu_list) {
 		target->cpu_list = NULL;
-		if (ret == PERF_ERRNO_TARGET__SUCCESS)
-			ret = PERF_ERRNO_TARGET__PID_OVERRIDE_CPU;
+		if (ret == TARGET_ERRNO__SUCCESS)
+			ret = TARGET_ERRNO__PID_OVERRIDE_CPU;
 	}
 
 	/* UID and PID are mutually exclusive */
 	if (target->tid && target->uid_str) {
 		target->uid_str = NULL;
-		if (ret == PERF_ERRNO_TARGET__SUCCESS)
-			ret = PERF_ERRNO_TARGET__PID_OVERRIDE_UID;
+		if (ret == TARGET_ERRNO__SUCCESS)
+			ret = TARGET_ERRNO__PID_OVERRIDE_UID;
 	}
 
 	/* UID and CPU are mutually exclusive */
 	if (target->uid_str && target->cpu_list) {
 		target->cpu_list = NULL;
-		if (ret == PERF_ERRNO_TARGET__SUCCESS)
-			ret = PERF_ERRNO_TARGET__UID_OVERRIDE_CPU;
+		if (ret == TARGET_ERRNO__SUCCESS)
+			ret = TARGET_ERRNO__UID_OVERRIDE_CPU;
 	}
 
 	/* PID and SYSTEM are mutually exclusive */
 	if (target->tid && target->system_wide) {
 		target->system_wide = false;
-		if (ret == PERF_ERRNO_TARGET__SUCCESS)
-			ret = PERF_ERRNO_TARGET__PID_OVERRIDE_SYSTEM;
+		if (ret == TARGET_ERRNO__SUCCESS)
+			ret = TARGET_ERRNO__PID_OVERRIDE_SYSTEM;
 	}
 
 	/* UID and SYSTEM are mutually exclusive */
 	if (target->uid_str && target->system_wide) {
 		target->system_wide = false;
-		if (ret == PERF_ERRNO_TARGET__SUCCESS)
-			ret = PERF_ERRNO_TARGET__UID_OVERRIDE_SYSTEM;
+		if (ret == TARGET_ERRNO__SUCCESS)
+			ret = TARGET_ERRNO__UID_OVERRIDE_SYSTEM;
 	}
 
 	return ret;
 }
 
-enum perf_target_errno perf_target__parse_uid(struct perf_target *target)
+enum target_errno target__parse_uid(struct target *target)
 {
 	struct passwd pwd, *result;
 	char buf[1024];
@@ -66,7 +66,7 @@ enum perf_target_errno perf_target__parse_uid(struct perf_target *target)
 
 	target->uid = UINT_MAX;
 	if (str == NULL)
-		return PERF_ERRNO_TARGET__SUCCESS;
+		return TARGET_ERRNO__SUCCESS;
 
 	/* Try user name first */
 	getpwnam_r(str, &pwd, buf, sizeof(buf), &result);
@@ -79,22 +79,22 @@ enum perf_target_errno perf_target__parse_uid(struct perf_target *target)
 		int uid = strtol(str, &endptr, 10);
 
 		if (*endptr != '\0')
-			return PERF_ERRNO_TARGET__INVALID_UID;
+			return TARGET_ERRNO__INVALID_UID;
 
 		getpwuid_r(uid, &pwd, buf, sizeof(buf), &result);
 
 		if (result == NULL)
-			return PERF_ERRNO_TARGET__USER_NOT_FOUND;
+			return TARGET_ERRNO__USER_NOT_FOUND;
 	}
 
 	target->uid = result->pw_uid;
-	return PERF_ERRNO_TARGET__SUCCESS;
+	return TARGET_ERRNO__SUCCESS;
 }
 
 /*
- * This must have a same ordering as the enum perf_target_errno.
+ * This must have a same ordering as the enum target_errno.
  */
-static const char *perf_target__error_str[] = {
+static const char *target__error_str[] = {
 	"PID/TID switch overriding CPU",
 	"PID/TID switch overriding UID",
 	"UID switch overriding CPU",
@@ -104,7 +104,7 @@ static const char *perf_target__error_str[] = {
 	"Problems obtaining information for user %s",
 };
 
-int perf_target__strerror(struct perf_target *target, int errnum,
+int target__strerror(struct target *target, int errnum,
 			  char *buf, size_t buflen)
 {
 	int idx;
@@ -124,21 +124,19 @@ int perf_target__strerror(struct perf_target *target, int errnum,
 		return 0;
 	}
 
-	if (errnum <  __PERF_ERRNO_TARGET__START ||
-	    errnum >= __PERF_ERRNO_TARGET__END)
+	if (errnum <  __TARGET_ERRNO__START || errnum >= __TARGET_ERRNO__END)
 		return -1;
 
-	idx = errnum - __PERF_ERRNO_TARGET__START;
-	msg = perf_target__error_str[idx];
+	idx = errnum - __TARGET_ERRNO__START;
+	msg = target__error_str[idx];
 
 	switch (errnum) {
-	case PERF_ERRNO_TARGET__PID_OVERRIDE_CPU
-	 ... PERF_ERRNO_TARGET__UID_OVERRIDE_SYSTEM:
+	case TARGET_ERRNO__PID_OVERRIDE_CPU ... TARGET_ERRNO__UID_OVERRIDE_SYSTEM:
 		snprintf(buf, buflen, "%s", msg);
 		break;
 
-	case PERF_ERRNO_TARGET__INVALID_UID:
-	case PERF_ERRNO_TARGET__USER_NOT_FOUND:
+	case TARGET_ERRNO__INVALID_UID:
+	case TARGET_ERRNO__USER_NOT_FOUND:
 		snprintf(buf, buflen, msg, target->uid_str);
 		break;
 
diff --git a/tools/perf/util/target.h b/tools/perf/util/target.h
index a4be857..89bab71 100644
--- a/tools/perf/util/target.h
+++ b/tools/perf/util/target.h
@@ -4,7 +4,7 @@
 #include <stdbool.h>
 #include <sys/types.h>
 
-struct perf_target {
+struct target {
 	const char   *pid;
 	const char   *tid;
 	const char   *cpu_list;
@@ -14,8 +14,8 @@ struct perf_target {
 	bool	     uses_mmap;
 };
 
-enum perf_target_errno {
-	PERF_ERRNO_TARGET__SUCCESS		= 0,
+enum target_errno {
+	TARGET_ERRNO__SUCCESS		= 0,
 
 	/*
 	 * Choose an arbitrary negative big number not to clash with standard
@@ -24,42 +24,40 @@ enum perf_target_errno {
 	 *
 	 * http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/errno.h.html
 	 */
-	__PERF_ERRNO_TARGET__START		= -10000,
+	__TARGET_ERRNO__START		= -10000,
 
+	/* for target__validate() */
+	TARGET_ERRNO__PID_OVERRIDE_CPU	= __TARGET_ERRNO__START,
+	TARGET_ERRNO__PID_OVERRIDE_UID,
+	TARGET_ERRNO__UID_OVERRIDE_CPU,
+	TARGET_ERRNO__PID_OVERRIDE_SYSTEM,
+	TARGET_ERRNO__UID_OVERRIDE_SYSTEM,
 
-	/* for perf_target__validate() */
-	PERF_ERRNO_TARGET__PID_OVERRIDE_CPU	= __PERF_ERRNO_TARGET__START,
-	PERF_ERRNO_TARGET__PID_OVERRIDE_UID,
-	PERF_ERRNO_TARGET__UID_OVERRIDE_CPU,
-	PERF_ERRNO_TARGET__PID_OVERRIDE_SYSTEM,
-	PERF_ERRNO_TARGET__UID_OVERRIDE_SYSTEM,
+	/* for target__parse_uid() */
+	TARGET_ERRNO__INVALID_UID,
+	TARGET_ERRNO__USER_NOT_FOUND,
 
-	/* for perf_target__parse_uid() */
-	PERF_ERRNO_TARGET__INVALID_UID,
-	PERF_ERRNO_TARGET__USER_NOT_FOUND,
-
-	__PERF_ERRNO_TARGET__END,
+	__TARGET_ERRNO__END,
 };
 
-enum perf_target_errno perf_target__validate(struct perf_target *target);
-enum perf_target_errno perf_target__parse_uid(struct perf_target *target);
+enum target_errno target__validate(struct target *target);
+enum target_errno target__parse_uid(struct target *target);
 
-int perf_target__strerror(struct perf_target *target, int errnum, char *buf,
-			  size_t buflen);
+int target__strerror(struct target *target, int errnum, char *buf, size_t buflen);
 
-static inline bool perf_target__has_task(struct perf_target *target)
+static inline bool target__has_task(struct target *target)
 {
 	return target->tid || target->pid || target->uid_str;
 }
 
-static inline bool perf_target__has_cpu(struct perf_target *target)
+static inline bool target__has_cpu(struct target *target)
 {
 	return target->system_wide || target->cpu_list;
 }
 
-static inline bool perf_target__none(struct perf_target *target)
+static inline bool target__none(struct target *target)
 {
-	return !perf_target__has_task(target) && !perf_target__has_cpu(target);
+	return !target__has_task(target) && !target__has_cpu(target);
 }
 
 #endif /* _PERF_TARGET_H */
diff --git a/tools/perf/util/top.c b/tools/perf/util/top.c
index f857b51..ce793c7 100644
--- a/tools/perf/util/top.c
+++ b/tools/perf/util/top.c
@@ -27,7 +27,7 @@ size_t perf_top__header_snprintf(struct perf_top *top, char *bf, size_t size)
 	float ksamples_per_sec;
 	float esamples_percent;
 	struct perf_record_opts *opts = &top->record_opts;
-	struct perf_target *target = &opts->target;
+	struct target *target = &opts->target;
 	size_t ret = 0;
 
 	if (top->samples) {

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [GIT PULL] perf updates
@ 2012-05-23 19:01 Ingo Molnar
  0 siblings, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2012-05-23 19:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Thomas Gleixner, Andrew Morton

Linus,

Please pull the latest perf-core-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-core-for-linus

   HEAD: ab0cce560ef177bdc7a8f73e9962be9d829a7b2c Revert "sched, perf: Use a single callback into the scheduler"

These are late tools/perf/ changes that missed the first round: 

 - endianness fixes
 - event parsing improvements
 - libtraceevent fixes factored out from trace-cmd
 - perl scripting engine fixes related to libtraceevent,
 - testcase improvements
 - perf inject / pipe mode fixes
 - plus a kernel side fix

 Thanks,

	Ingo

------------------>
Anshuman Khandual (1):
      perf record: Fix documentation for branch stack sampling

Arnaldo Carvalho de Melo (2):
      perf tools: Bump default sample freq to 4 kHz
      perf evlist: Show event attribute details

Frederic Weisbecker (2):
      perf script: Explicitly handle known default print arg type
      perf script: Rename struct event to struct event_format in perl engine

Jiri Olsa (8):
      perf test: Move parse event automated tests to separated object
      perf tools: Add support for displaying event parser debug info
      perf tools: Use allocated list for each parsed event
      perf tools: Separate 'mem:' event scanner bits
      perf tools: Add hardcoded name term for pmu events
      perf tools: Carry perf_event_attr bitfield throught different endians
      perf tools: Add union u64_swap type for swapping u64 data
      Revert "sched, perf: Use a single callback into the scheduler"

Namhyung Kim (3):
      perf tools: Rename libparsevent to libtraceevent in Makefile
      perf tools: Always try to build libtraceevent
      perf target: Add cpu flag to sample_type if target has cpu

Stephane Eranian (4):
      perf tools: rename HEADER_TRACE_INFO to HEADER_TRACING_DATA
      perf inject: Fix broken perf inject -b
      perf tools: Fix piped mode read code
      perf buildid-list: Work better with pipe mode


 include/linux/perf_event.h                         |  24 +-
 kernel/events/core.c                               |  14 +-
 kernel/sched/core.c                                |   9 +-
 tools/perf/Documentation/perf-evlist.txt           |   8 +
 tools/perf/Documentation/perf-record.txt           |   2 +-
 tools/perf/Makefile                                |  37 +-
 tools/perf/builtin-buildid-list.c                  |   6 +-
 tools/perf/builtin-evlist.c                        | 103 +++-
 tools/perf/builtin-inject.c                        |   5 +
 tools/perf/builtin-record.c                        |   6 +-
 tools/perf/builtin-test.c                          | 552 +-----------------
 tools/perf/builtin-top.c                           |   5 +-
 tools/perf/util/build-id.c                         |   2 +
 tools/perf/util/evsel.c                            |  12 +-
 tools/perf/util/header.c                           |  10 +-
 tools/perf/util/header.h                           |   2 +-
 tools/perf/util/parse-events-test.c                | 625 +++++++++++++++++++++
 tools/perf/util/parse-events.c                     |  69 ++-
 tools/perf/util/parse-events.h                     |  20 +-
 tools/perf/util/parse-events.l                     |  26 +-
 tools/perf/util/parse-events.y                     |  77 ++-
 tools/perf/util/pmu.c                              |   4 +-
 .../perf/util/scripting-engines/trace-event-perl.c |  16 +-
 tools/perf/util/session.c                          |  68 ++-
 tools/perf/util/types.h                            |   5 +
 25 files changed, 1032 insertions(+), 675 deletions(-)
 create mode 100644 tools/perf/util/parse-events-test.c

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8adf70e..f325786 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1084,8 +1084,10 @@ extern void perf_pmu_unregister(struct pmu *pmu);
 
 extern int perf_num_counters(void);
 extern const char *perf_pmu_name(void);
-extern void __perf_event_task_sched(struct task_struct *prev,
-				    struct task_struct *next);
+extern void __perf_event_task_sched_in(struct task_struct *prev,
+				       struct task_struct *task);
+extern void __perf_event_task_sched_out(struct task_struct *prev,
+					struct task_struct *next);
 extern int perf_event_init_task(struct task_struct *child);
 extern void perf_event_exit_task(struct task_struct *child);
 extern void perf_event_free_task(struct task_struct *task);
@@ -1205,13 +1207,20 @@ perf_sw_event(u32 event_id, u64 nr, struct pt_regs *regs, u64 addr)
 
 extern struct static_key_deferred perf_sched_events;
 
-static inline void perf_event_task_sched(struct task_struct *prev,
+static inline void perf_event_task_sched_in(struct task_struct *prev,
 					    struct task_struct *task)
 {
+	if (static_key_false(&perf_sched_events.key))
+		__perf_event_task_sched_in(prev, task);
+}
+
+static inline void perf_event_task_sched_out(struct task_struct *prev,
+					     struct task_struct *next)
+{
 	perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, NULL, 0);
 
 	if (static_key_false(&perf_sched_events.key))
-		__perf_event_task_sched(prev, task);
+		__perf_event_task_sched_out(prev, next);
 }
 
 extern void perf_event_mmap(struct vm_area_struct *vma);
@@ -1286,8 +1295,11 @@ extern void perf_event_disable(struct perf_event *event);
 extern void perf_event_task_tick(void);
 #else
 static inline void
-perf_event_task_sched(struct task_struct *prev,
-		      struct task_struct *task)				{ }
+perf_event_task_sched_in(struct task_struct *prev,
+			 struct task_struct *task)			{ }
+static inline void
+perf_event_task_sched_out(struct task_struct *prev,
+			  struct task_struct *next)			{ }
 static inline int perf_event_init_task(struct task_struct *child)	{ return 0; }
 static inline void perf_event_exit_task(struct task_struct *child)	{ }
 static inline void perf_event_free_task(struct task_struct *task)	{ }
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 91a4459..5b06cbb 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2039,8 +2039,8 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn,
  * accessing the event control register. If a NMI hits, then it will
  * not restart the event.
  */
-static void __perf_event_task_sched_out(struct task_struct *task,
-					struct task_struct *next)
+void __perf_event_task_sched_out(struct task_struct *task,
+				 struct task_struct *next)
 {
 	int ctxn;
 
@@ -2279,8 +2279,8 @@ static void perf_branch_stack_sched_in(struct task_struct *prev,
  * accessing the event control register. If a NMI hits, then it will
  * keep the event running.
  */
-static void __perf_event_task_sched_in(struct task_struct *prev,
-				       struct task_struct *task)
+void __perf_event_task_sched_in(struct task_struct *prev,
+				struct task_struct *task)
 {
 	struct perf_event_context *ctx;
 	int ctxn;
@@ -2305,12 +2305,6 @@ static void __perf_event_task_sched_in(struct task_struct *prev,
 		perf_branch_stack_sched_in(prev, task);
 }
 
-void __perf_event_task_sched(struct task_struct *prev, struct task_struct *next)
-{
-	__perf_event_task_sched_out(prev, next);
-	__perf_event_task_sched_in(prev, next);
-}
-
 static u64 perf_calculate_period(struct perf_event *event, u64 nsec, u64 count)
 {
 	u64 frequency = event->attr.sample_freq;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 13c3883..0533a68 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1913,7 +1913,7 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
 		    struct task_struct *next)
 {
 	sched_info_switch(prev, next);
-	perf_event_task_sched(prev, next);
+	perf_event_task_sched_out(prev, next);
 	fire_sched_out_preempt_notifiers(prev, next);
 	prepare_lock_switch(rq, next);
 	prepare_arch_switch(next);
@@ -1956,6 +1956,13 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
 	 */
 	prev_state = prev->state;
 	finish_arch_switch(prev);
+#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
+	local_irq_disable();
+#endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */
+	perf_event_task_sched_in(prev, current);
+#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
+	local_irq_enable();
+#endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */
 	finish_lock_switch(rq, prev);
 	finish_arch_post_lock_switch();
 
diff --git a/tools/perf/Documentation/perf-evlist.txt b/tools/perf/Documentation/perf-evlist.txt
index 0507ec7..1521734 100644
--- a/tools/perf/Documentation/perf-evlist.txt
+++ b/tools/perf/Documentation/perf-evlist.txt
@@ -20,6 +20,14 @@ OPTIONS
 --input=::
         Input file name. (default: perf.data unless stdin is a fifo)
 
+-F::
+--freq=::
+	Show just the sample frequency used for each event.
+
+-v::
+--verbose=::
+	Show all fields.
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-list[1],
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index a1386b2..b38a1f9 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -168,7 +168,7 @@ following filters are defined:
         - any:  any type of branches
         - any_call: any function call or system call
         - any_ret: any function return or system call return
-        - any_ind: any indirect branch
+        - ind_call: any indirect branch
         - u:  only when the branch target is at the user level
         - k: only when the branch target is in the kernel
         - hv: only when the target is at the hypervisor level
diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index fa37cd5..1d3d513 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -83,7 +83,13 @@ ifndef PERF_DEBUG
   CFLAGS_OPTIMIZE = -O6
 endif
 
-CFLAGS = -fno-omit-frame-pointer -ggdb3 -Wall -Wextra -std=gnu99 $(CFLAGS_WERROR) $(CFLAGS_OPTIMIZE) -D_FORTIFY_SOURCE=2 $(EXTRA_WARNINGS) $(EXTRA_CFLAGS)
+ifdef PARSER_DEBUG
+	PARSER_DEBUG_BISON  := -t
+	PARSER_DEBUG_FLEX   := -d
+	PARSER_DEBUG_CFLAGS := -DPARSER_DEBUG
+endif
+
+CFLAGS = -fno-omit-frame-pointer -ggdb3 -Wall -Wextra -std=gnu99 $(CFLAGS_WERROR) $(CFLAGS_OPTIMIZE) -D_FORTIFY_SOURCE=2 $(EXTRA_WARNINGS) $(EXTRA_CFLAGS) $(PARSER_DEBUG_CFLAGS)
 EXTLIBS = -lpthread -lrt -lelf -lm
 ALL_CFLAGS = $(CFLAGS) -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE
 ALL_LDFLAGS = $(LDFLAGS)
@@ -149,7 +155,7 @@ endif
 
 ### --- END CONFIGURATION SECTION ---
 
-BASIC_CFLAGS = -Iutil/include -Iarch/$(ARCH)/include -I$(OUTPUT)/util -I$(EVENT_PARSE_DIR) -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE
+BASIC_CFLAGS = -Iutil/include -Iarch/$(ARCH)/include -I$(OUTPUT)/util -I$(TRACE_EVENT_DIR) -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE
 BASIC_LDFLAGS =
 
 # Guard against environment variables
@@ -178,16 +184,16 @@ $(OUTPUT)python/perf.so: $(PYRF_OBJS) $(PYTHON_EXT_SRCS) $(PYTHON_EXT_DEPS)
 
 SCRIPTS = $(patsubst %.sh,%,$(SCRIPT_SH))
 
-EVENT_PARSE_DIR = ../lib/traceevent/
+TRACE_EVENT_DIR = ../lib/traceevent/
 
 ifeq ("$(origin O)", "command line")
-	EP_PATH=$(OUTPUT)/
+	TE_PATH=$(OUTPUT)/
 else
-	EP_PATH=$(EVENT_PARSE_DIR)/
+	TE_PATH=$(TRACE_EVENT_DIR)/
 endif
 
-LIBPARSEVENT = $(EP_PATH)libtraceevent.a
-EP_LIB := -L$(EP_PATH) -ltraceevent
+LIBTRACEEVENT = $(TE_PATH)libtraceevent.a
+TE_LIB := -L$(TE_PATH) -ltraceevent
 
 #
 # Single 'perf' binary right now:
@@ -216,10 +222,10 @@ FLEX = flex
 BISON= bison
 
 $(OUTPUT)util/parse-events-flex.c: util/parse-events.l
-	$(QUIET_FLEX)$(FLEX) --header-file=$(OUTPUT)util/parse-events-flex.h -t util/parse-events.l > $(OUTPUT)util/parse-events-flex.c
+	$(QUIET_FLEX)$(FLEX) --header-file=$(OUTPUT)util/parse-events-flex.h $(PARSER_DEBUG_FLEX) -t util/parse-events.l > $(OUTPUT)util/parse-events-flex.c
 
 $(OUTPUT)util/parse-events-bison.c: util/parse-events.y
-	$(QUIET_BISON)$(BISON) -v util/parse-events.y -d -o $(OUTPUT)util/parse-events-bison.c
+	$(QUIET_BISON)$(BISON) -v util/parse-events.y -d $(PARSER_DEBUG_BISON) -o $(OUTPUT)util/parse-events-bison.c
 
 $(OUTPUT)util/pmu-flex.c: util/pmu.l
 	$(QUIET_FLEX)$(FLEX) --header-file=$(OUTPUT)util/pmu-flex.h -t util/pmu.l > $(OUTPUT)util/pmu-flex.c
@@ -311,7 +317,7 @@ LIB_H += util/cpumap.h
 LIB_H += util/top.h
 LIB_H += $(ARCH_INCLUDE)
 LIB_H += util/cgroup.h
-LIB_H += $(EVENT_PARSE_DIR)event-parse.h
+LIB_H += $(TRACE_EVENT_DIR)event-parse.h
 LIB_H += util/target.h
 
 LIB_OBJS += $(OUTPUT)util/abspath.o
@@ -332,6 +338,7 @@ LIB_OBJS += $(OUTPUT)util/help.o
 LIB_OBJS += $(OUTPUT)util/levenshtein.o
 LIB_OBJS += $(OUTPUT)util/parse-options.o
 LIB_OBJS += $(OUTPUT)util/parse-events.o
+LIB_OBJS += $(OUTPUT)util/parse-events-test.o
 LIB_OBJS += $(OUTPUT)util/path.o
 LIB_OBJS += $(OUTPUT)util/rbtree.o
 LIB_OBJS += $(OUTPUT)util/bitmap.o
@@ -410,7 +417,7 @@ BUILTIN_OBJS += $(OUTPUT)builtin-kvm.o
 BUILTIN_OBJS += $(OUTPUT)builtin-test.o
 BUILTIN_OBJS += $(OUTPUT)builtin-inject.o
 
-PERFLIBS = $(LIB_FILE) $(LIBPARSEVENT)
+PERFLIBS = $(LIB_FILE) $(LIBTRACEEVENT)
 
 # Files needed for the python binding, perf.so
 # pyrf is just an internal name needed for all those wrappers.
@@ -819,9 +826,9 @@ $(sort $(dir $(DIRECTORY_DEPS))):
 $(LIB_FILE): $(LIB_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) rcs $@ $(LIB_OBJS)
 
-# libparsevent.a
-$(LIBPARSEVENT):
-	make -C $(EVENT_PARSE_DIR) $(COMMAND_O) libtraceevent.a
+# libtraceevent.a
+$(LIBTRACEEVENT):
+	$(QUIET_SUBDIR0)$(TRACE_EVENT_DIR) $(QUIET_SUBDIR1) $(COMMAND_O) libtraceevent.a
 
 help:
 	@echo 'Perf make targets:'
@@ -969,6 +976,6 @@ clean:
 	$(RM) $(OUTPUT)util/*-{bison,flex}*
 	$(python-clean)
 
-.PHONY: all install clean strip
+.PHONY: all install clean strip $(LIBTRACEEVENT)
 .PHONY: shell_compatibility_test please_set_SHELL_PATH_to_a_more_modern_shell
 .PHONY: .FORCE-PERF-VERSION-FILE TAGS tags cscope .FORCE-PERF-CFLAGS
diff --git a/tools/perf/builtin-buildid-list.c b/tools/perf/builtin-buildid-list.c
index 5248046..6b2bcfb 100644
--- a/tools/perf/builtin-buildid-list.c
+++ b/tools/perf/builtin-buildid-list.c
@@ -84,7 +84,11 @@ static int perf_session__list_build_ids(void)
 	if (filename__fprintf_build_id(session->filename, stdout))
 		goto out;
 
-	if (with_hits)
+	/*
+	 * in pipe-mode, the only way to get the buildids is to parse
+	 * the record stream. Buildids are stored as RECORD_HEADER_BUILD_ID
+	 */
+	if (with_hits || session->fd_pipe)
 		perf_session__process_events(session, &build_id__mark_dso_hit_ops);
 
 	perf_session__fprintf_dsos_buildid(session, stdout, with_hits);
diff --git a/tools/perf/builtin-evlist.c b/tools/perf/builtin-evlist.c
index 2676032..e52d77e 100644
--- a/tools/perf/builtin-evlist.c
+++ b/tools/perf/builtin-evlist.c
@@ -15,9 +15,40 @@
 #include "util/parse-options.h"
 #include "util/session.h"
 
-static const char *input_name;
+struct perf_attr_details {
+	bool freq;
+	bool verbose;
+};
+
+static int comma_printf(bool *first, const char *fmt, ...)
+{
+	va_list args;
+	int ret = 0;
+
+	if (!*first) {
+		ret += printf(",");
+	} else {
+		ret += printf(":");
+		*first = false;
+	}
+
+	va_start(args, fmt);
+	ret += vprintf(fmt, args);
+	va_end(args);
+	return ret;
+}
+
+static int __if_print(bool *first, const char *field, u64 value)
+{
+	if (value == 0)
+		return 0;
+
+	return comma_printf(first, " %s: %" PRIu64, field, value);
+}
+
+#define if_print(field) __if_print(&first, #field, pos->attr.field)
 
-static int __cmd_evlist(void)
+static int __cmd_evlist(const char *input_name, struct perf_attr_details *details)
 {
 	struct perf_session *session;
 	struct perf_evsel *pos;
@@ -26,8 +57,52 @@ static int __cmd_evlist(void)
 	if (session == NULL)
 		return -ENOMEM;
 
-	list_for_each_entry(pos, &session->evlist->entries, node)
-		printf("%s\n", event_name(pos));
+	list_for_each_entry(pos, &session->evlist->entries, node) {
+		bool first = true;
+
+		printf("%s", event_name(pos));
+
+		if (details->verbose || details->freq) {
+			comma_printf(&first, " sample_freq=%" PRIu64,
+				     (u64)pos->attr.sample_freq);
+		}
+
+		if (details->verbose) {
+			if_print(type);
+			if_print(config);
+			if_print(config1);
+			if_print(config2);
+			if_print(size);
+			if_print(sample_type);
+			if_print(read_format);
+			if_print(disabled);
+			if_print(inherit);
+			if_print(pinned);
+			if_print(exclusive);
+			if_print(exclude_user);
+			if_print(exclude_kernel);
+			if_print(exclude_hv);
+			if_print(exclude_idle);
+			if_print(mmap);
+			if_print(comm);
+			if_print(freq);
+			if_print(inherit_stat);
+			if_print(enable_on_exec);
+			if_print(task);
+			if_print(watermark);
+			if_print(precise_ip);
+			if_print(mmap_data);
+			if_print(sample_id_all);
+			if_print(exclude_host);
+			if_print(exclude_guest);
+			if_print(__reserved_1);
+			if_print(wakeup_events);
+			if_print(bp_type);
+			if_print(branch_sample_type);
+		}
+
+		putchar('\n');
+	}
 
 	perf_session__delete(session);
 	return 0;
@@ -38,17 +113,23 @@ static const char * const evlist_usage[] = {
 	NULL
 };
 
-static const struct option options[] = {
-	OPT_STRING('i', "input", &input_name, "file",
-		    "input file name"),
-	OPT_END()
-};
-
 int cmd_evlist(int argc, const char **argv, const char *prefix __used)
 {
+	struct perf_attr_details details = { .verbose = false, };
+	const char *input_name;
+	const struct option options[] = {
+		OPT_STRING('i', "input", &input_name, "file",
+			    "Input file name"),
+		OPT_BOOLEAN('F', "freq", &details.freq,
+			    "Show the sample frequency"),
+		OPT_BOOLEAN('v', "verbose", &details.verbose,
+			    "Show all event attr details"),
+		OPT_END()
+	};
+
 	argc = parse_options(argc, argv, options, evlist_usage, 0);
 	if (argc)
 		usage_with_options(evlist_usage, options);
 
-	return __cmd_evlist();
+	return __cmd_evlist(input_name, &details);
 }
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 09c1061..3beab48 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -60,6 +60,11 @@ static int perf_event__repipe_tracing_data_synth(union perf_event *event,
 static int perf_event__repipe_attr(union perf_event *event,
 				   struct perf_evlist **pevlist __used)
 {
+	int ret;
+	ret = perf_event__process_attr(event, pevlist);
+	if (ret)
+		return ret;
+
 	return perf_event__repipe_synth(NULL, event, NULL);
 }
 
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8a3dfac..e5cb084 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -396,7 +396,7 @@ static void perf_record__mmap_read_all(struct perf_record *rec)
 			perf_record__mmap_read(rec, &rec->evlist->mmap[i]);
 	}
 
-	if (perf_header__has_feat(&rec->session->header, HEADER_TRACE_INFO))
+	if (perf_header__has_feat(&rec->session->header, HEADER_TRACING_DATA))
 		write_output(rec, &finished_round_event, sizeof(finished_round_event));
 }
 
@@ -478,7 +478,7 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
 		perf_header__clear_feat(&session->header, HEADER_BUILD_ID);
 
 	if (!have_tracepoints(&evsel_list->entries))
-		perf_header__clear_feat(&session->header, HEADER_TRACE_INFO);
+		perf_header__clear_feat(&session->header, HEADER_TRACING_DATA);
 
 	if (!rec->opts.branch_stack)
 		perf_header__clear_feat(&session->header, HEADER_BRANCH_STACK);
@@ -753,7 +753,7 @@ static struct perf_record record = {
 		.mmap_pages	     = UINT_MAX,
 		.user_freq	     = UINT_MAX,
 		.user_interval	     = ULLONG_MAX,
-		.freq		     = 1000,
+		.freq		     = 4000,
 		.target		     = {
 			.uses_mmap   = true,
 		},
diff --git a/tools/perf/builtin-test.c b/tools/perf/builtin-test.c
index 6c47376..5a8727c 100644
--- a/tools/perf/builtin-test.c
+++ b/tools/perf/builtin-test.c
@@ -604,556 +604,6 @@ out_free_threads:
 #undef nsyscalls
 }
 
-#define TEST_ASSERT_VAL(text, cond) \
-do { \
-	if (!(cond)) { \
-		pr_debug("FAILED %s:%d %s\n", __FILE__, __LINE__, text); \
-		return -1; \
-	} \
-} while (0)
-
-static int test__checkevent_tracepoint(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
-	TEST_ASSERT_VAL("wrong type", PERF_TYPE_TRACEPOINT == evsel->attr.type);
-	TEST_ASSERT_VAL("wrong sample_type",
-		(PERF_SAMPLE_RAW | PERF_SAMPLE_TIME | PERF_SAMPLE_CPU) ==
-		evsel->attr.sample_type);
-	TEST_ASSERT_VAL("wrong sample_period", 1 == evsel->attr.sample_period);
-	return 0;
-}
-
-static int test__checkevent_tracepoint_multi(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel;
-
-	TEST_ASSERT_VAL("wrong number of entries", evlist->nr_entries > 1);
-
-	list_for_each_entry(evsel, &evlist->entries, node) {
-		TEST_ASSERT_VAL("wrong type",
-			PERF_TYPE_TRACEPOINT == evsel->attr.type);
-		TEST_ASSERT_VAL("wrong sample_type",
-			(PERF_SAMPLE_RAW | PERF_SAMPLE_TIME | PERF_SAMPLE_CPU)
-			== evsel->attr.sample_type);
-		TEST_ASSERT_VAL("wrong sample_period",
-			1 == evsel->attr.sample_period);
-	}
-	return 0;
-}
-
-static int test__checkevent_raw(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
-	TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->attr.type);
-	TEST_ASSERT_VAL("wrong config", 0x1a == evsel->attr.config);
-	return 0;
-}
-
-static int test__checkevent_numeric(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
-	TEST_ASSERT_VAL("wrong type", 1 == evsel->attr.type);
-	TEST_ASSERT_VAL("wrong config", 1 == evsel->attr.config);
-	return 0;
-}
-
-static int test__checkevent_symbolic_name(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
-	TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->attr.type);
-	TEST_ASSERT_VAL("wrong config",
-			PERF_COUNT_HW_INSTRUCTIONS == evsel->attr.config);
-	return 0;
-}
-
-static int test__checkevent_symbolic_name_config(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
-	TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->attr.type);
-	TEST_ASSERT_VAL("wrong config",
-			PERF_COUNT_HW_CPU_CYCLES == evsel->attr.config);
-	TEST_ASSERT_VAL("wrong period",
-			100000 == evsel->attr.sample_period);
-	TEST_ASSERT_VAL("wrong config1",
-			0 == evsel->attr.config1);
-	TEST_ASSERT_VAL("wrong config2",
-			1 == evsel->attr.config2);
-	return 0;
-}
-
-static int test__checkevent_symbolic_alias(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
-	TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->attr.type);
-	TEST_ASSERT_VAL("wrong config",
-			PERF_COUNT_SW_PAGE_FAULTS == evsel->attr.config);
-	return 0;
-}
-
-static int test__checkevent_genhw(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
-	TEST_ASSERT_VAL("wrong type", PERF_TYPE_HW_CACHE == evsel->attr.type);
-	TEST_ASSERT_VAL("wrong config", (1 << 16) == evsel->attr.config);
-	return 0;
-}
-
-static int test__checkevent_breakpoint(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
-	TEST_ASSERT_VAL("wrong type", PERF_TYPE_BREAKPOINT == evsel->attr.type);
-	TEST_ASSERT_VAL("wrong config", 0 == evsel->attr.config);
-	TEST_ASSERT_VAL("wrong bp_type", (HW_BREAKPOINT_R | HW_BREAKPOINT_W) ==
-					 evsel->attr.bp_type);
-	TEST_ASSERT_VAL("wrong bp_len", HW_BREAKPOINT_LEN_4 ==
-					evsel->attr.bp_len);
-	return 0;
-}
-
-static int test__checkevent_breakpoint_x(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
-	TEST_ASSERT_VAL("wrong type", PERF_TYPE_BREAKPOINT == evsel->attr.type);
-	TEST_ASSERT_VAL("wrong config", 0 == evsel->attr.config);
-	TEST_ASSERT_VAL("wrong bp_type",
-			HW_BREAKPOINT_X == evsel->attr.bp_type);
-	TEST_ASSERT_VAL("wrong bp_len", sizeof(long) == evsel->attr.bp_len);
-	return 0;
-}
-
-static int test__checkevent_breakpoint_r(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
-	TEST_ASSERT_VAL("wrong type",
-			PERF_TYPE_BREAKPOINT == evsel->attr.type);
-	TEST_ASSERT_VAL("wrong config", 0 == evsel->attr.config);
-	TEST_ASSERT_VAL("wrong bp_type",
-			HW_BREAKPOINT_R == evsel->attr.bp_type);
-	TEST_ASSERT_VAL("wrong bp_len",
-			HW_BREAKPOINT_LEN_4 == evsel->attr.bp_len);
-	return 0;
-}
-
-static int test__checkevent_breakpoint_w(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
-	TEST_ASSERT_VAL("wrong type",
-			PERF_TYPE_BREAKPOINT == evsel->attr.type);
-	TEST_ASSERT_VAL("wrong config", 0 == evsel->attr.config);
-	TEST_ASSERT_VAL("wrong bp_type",
-			HW_BREAKPOINT_W == evsel->attr.bp_type);
-	TEST_ASSERT_VAL("wrong bp_len",
-			HW_BREAKPOINT_LEN_4 == evsel->attr.bp_len);
-	return 0;
-}
-
-static int test__checkevent_tracepoint_modifier(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong exclude_user", evsel->attr.exclude_user);
-	TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->attr.exclude_kernel);
-	TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
-	TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
-
-	return test__checkevent_tracepoint(evlist);
-}
-
-static int
-test__checkevent_tracepoint_multi_modifier(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel;
-
-	TEST_ASSERT_VAL("wrong number of entries", evlist->nr_entries > 1);
-
-	list_for_each_entry(evsel, &evlist->entries, node) {
-		TEST_ASSERT_VAL("wrong exclude_user",
-				!evsel->attr.exclude_user);
-		TEST_ASSERT_VAL("wrong exclude_kernel",
-				evsel->attr.exclude_kernel);
-		TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
-		TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
-	}
-
-	return test__checkevent_tracepoint_multi(evlist);
-}
-
-static int test__checkevent_raw_modifier(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong exclude_user", evsel->attr.exclude_user);
-	TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->attr.exclude_kernel);
-	TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
-	TEST_ASSERT_VAL("wrong precise_ip", evsel->attr.precise_ip);
-
-	return test__checkevent_raw(evlist);
-}
-
-static int test__checkevent_numeric_modifier(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong exclude_user", evsel->attr.exclude_user);
-	TEST_ASSERT_VAL("wrong exclude_kernel", evsel->attr.exclude_kernel);
-	TEST_ASSERT_VAL("wrong exclude_hv", !evsel->attr.exclude_hv);
-	TEST_ASSERT_VAL("wrong precise_ip", evsel->attr.precise_ip);
-
-	return test__checkevent_numeric(evlist);
-}
-
-static int test__checkevent_symbolic_name_modifier(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong exclude_user", evsel->attr.exclude_user);
-	TEST_ASSERT_VAL("wrong exclude_kernel", evsel->attr.exclude_kernel);
-	TEST_ASSERT_VAL("wrong exclude_hv", !evsel->attr.exclude_hv);
-	TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
-
-	return test__checkevent_symbolic_name(evlist);
-}
-
-static int test__checkevent_exclude_host_modifier(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong exclude guest", !evsel->attr.exclude_guest);
-	TEST_ASSERT_VAL("wrong exclude host", evsel->attr.exclude_host);
-
-	return test__checkevent_symbolic_name(evlist);
-}
-
-static int test__checkevent_exclude_guest_modifier(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong exclude guest", evsel->attr.exclude_guest);
-	TEST_ASSERT_VAL("wrong exclude host", !evsel->attr.exclude_host);
-
-	return test__checkevent_symbolic_name(evlist);
-}
-
-static int test__checkevent_symbolic_alias_modifier(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong exclude_user", !evsel->attr.exclude_user);
-	TEST_ASSERT_VAL("wrong exclude_kernel", evsel->attr.exclude_kernel);
-	TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
-	TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
-
-	return test__checkevent_symbolic_alias(evlist);
-}
-
-static int test__checkevent_genhw_modifier(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong exclude_user", evsel->attr.exclude_user);
-	TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->attr.exclude_kernel);
-	TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
-	TEST_ASSERT_VAL("wrong precise_ip", evsel->attr.precise_ip);
-
-	return test__checkevent_genhw(evlist);
-}
-
-static int test__checkevent_breakpoint_modifier(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong exclude_user", !evsel->attr.exclude_user);
-	TEST_ASSERT_VAL("wrong exclude_kernel", evsel->attr.exclude_kernel);
-	TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
-	TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
-
-	return test__checkevent_breakpoint(evlist);
-}
-
-static int test__checkevent_breakpoint_x_modifier(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong exclude_user", evsel->attr.exclude_user);
-	TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->attr.exclude_kernel);
-	TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
-	TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
-
-	return test__checkevent_breakpoint_x(evlist);
-}
-
-static int test__checkevent_breakpoint_r_modifier(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong exclude_user", evsel->attr.exclude_user);
-	TEST_ASSERT_VAL("wrong exclude_kernel", evsel->attr.exclude_kernel);
-	TEST_ASSERT_VAL("wrong exclude_hv", !evsel->attr.exclude_hv);
-	TEST_ASSERT_VAL("wrong precise_ip", evsel->attr.precise_ip);
-
-	return test__checkevent_breakpoint_r(evlist);
-}
-
-static int test__checkevent_breakpoint_w_modifier(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong exclude_user", !evsel->attr.exclude_user);
-	TEST_ASSERT_VAL("wrong exclude_kernel", evsel->attr.exclude_kernel);
-	TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
-	TEST_ASSERT_VAL("wrong precise_ip", evsel->attr.precise_ip);
-
-	return test__checkevent_breakpoint_w(evlist);
-}
-
-static int test__checkevent_pmu(struct perf_evlist *evlist)
-{
-
-	struct perf_evsel *evsel = list_entry(evlist->entries.next,
-					      struct perf_evsel, node);
-
-	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
-	TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->attr.type);
-	TEST_ASSERT_VAL("wrong config",    10 == evsel->attr.config);
-	TEST_ASSERT_VAL("wrong config1",    1 == evsel->attr.config1);
-	TEST_ASSERT_VAL("wrong config2",    3 == evsel->attr.config2);
-	TEST_ASSERT_VAL("wrong period",  1000 == evsel->attr.sample_period);
-
-	return 0;
-}
-
-static int test__checkevent_list(struct perf_evlist *evlist)
-{
-	struct perf_evsel *evsel;
-
-	TEST_ASSERT_VAL("wrong number of entries", 3 == evlist->nr_entries);
-
-	/* r1 */
-	evsel = list_entry(evlist->entries.next, struct perf_evsel, node);
-	TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->attr.type);
-	TEST_ASSERT_VAL("wrong config", 1 == evsel->attr.config);
-	TEST_ASSERT_VAL("wrong config1", 0 == evsel->attr.config1);
-	TEST_ASSERT_VAL("wrong config2", 0 == evsel->attr.config2);
-	TEST_ASSERT_VAL("wrong exclude_user", !evsel->attr.exclude_user);
-	TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->attr.exclude_kernel);
-	TEST_ASSERT_VAL("wrong exclude_hv", !evsel->attr.exclude_hv);
-	TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
-
-	/* syscalls:sys_enter_open:k */
-	evsel = list_entry(evsel->node.next, struct perf_evsel, node);
-	TEST_ASSERT_VAL("wrong type", PERF_TYPE_TRACEPOINT == evsel->attr.type);
-	TEST_ASSERT_VAL("wrong sample_type",
-		(PERF_SAMPLE_RAW | PERF_SAMPLE_TIME | PERF_SAMPLE_CPU) ==
-		evsel->attr.sample_type);
-	TEST_ASSERT_VAL("wrong sample_period", 1 == evsel->attr.sample_period);
-	TEST_ASSERT_VAL("wrong exclude_user", evsel->attr.exclude_user);
-	TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->attr.exclude_kernel);
-	TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
-	TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
-
-	/* 1:1:hp */
-	evsel = list_entry(evsel->node.next, struct perf_evsel, node);
-	TEST_ASSERT_VAL("wrong type", 1 == evsel->attr.type);
-	TEST_ASSERT_VAL("wrong config", 1 == evsel->attr.config);
-	TEST_ASSERT_VAL("wrong exclude_user", evsel->attr.exclude_user);
-	TEST_ASSERT_VAL("wrong exclude_kernel", evsel->attr.exclude_kernel);
-	TEST_ASSERT_VAL("wrong exclude_hv", !evsel->attr.exclude_hv);
-	TEST_ASSERT_VAL("wrong precise_ip", evsel->attr.precise_ip);
-
-	return 0;
-}
-
-static struct test__event_st {
-	const char *name;
-	__u32 type;
-	int (*check)(struct perf_evlist *evlist);
-} test__events[] = {
-	{
-		.name  = "syscalls:sys_enter_open",
-		.check = test__checkevent_tracepoint,
-	},
-	{
-		.name  = "syscalls:*",
-		.check = test__checkevent_tracepoint_multi,
-	},
-	{
-		.name  = "r1a",
-		.check = test__checkevent_raw,
-	},
-	{
-		.name  = "1:1",
-		.check = test__checkevent_numeric,
-	},
-	{
-		.name  = "instructions",
-		.check = test__checkevent_symbolic_name,
-	},
-	{
-		.name  = "cycles/period=100000,config2/",
-		.check = test__checkevent_symbolic_name_config,
-	},
-	{
-		.name  = "faults",
-		.check = test__checkevent_symbolic_alias,
-	},
-	{
-		.name  = "L1-dcache-load-miss",
-		.check = test__checkevent_genhw,
-	},
-	{
-		.name  = "mem:0",
-		.check = test__checkevent_breakpoint,
-	},
-	{
-		.name  = "mem:0:x",
-		.check = test__checkevent_breakpoint_x,
-	},
-	{
-		.name  = "mem:0:r",
-		.check = test__checkevent_breakpoint_r,
-	},
-	{
-		.name  = "mem:0:w",
-		.check = test__checkevent_breakpoint_w,
-	},
-	{
-		.name  = "syscalls:sys_enter_open:k",
-		.check = test__checkevent_tracepoint_modifier,
-	},
-	{
-		.name  = "syscalls:*:u",
-		.check = test__checkevent_tracepoint_multi_modifier,
-	},
-	{
-		.name  = "r1a:kp",
-		.check = test__checkevent_raw_modifier,
-	},
-	{
-		.name  = "1:1:hp",
-		.check = test__checkevent_numeric_modifier,
-	},
-	{
-		.name  = "instructions:h",
-		.check = test__checkevent_symbolic_name_modifier,
-	},
-	{
-		.name  = "faults:u",
-		.check = test__checkevent_symbolic_alias_modifier,
-	},
-	{
-		.name  = "L1-dcache-load-miss:kp",
-		.check = test__checkevent_genhw_modifier,
-	},
-	{
-		.name  = "mem:0:u",
-		.check = test__checkevent_breakpoint_modifier,
-	},
-	{
-		.name  = "mem:0:x:k",
-		.check = test__checkevent_breakpoint_x_modifier,
-	},
-	{
-		.name  = "mem:0:r:hp",
-		.check = test__checkevent_breakpoint_r_modifier,
-	},
-	{
-		.name  = "mem:0:w:up",
-		.check = test__checkevent_breakpoint_w_modifier,
-	},
-	{
-		.name  = "cpu/config=10,config1,config2=3,period=1000/u",
-		.check = test__checkevent_pmu,
-	},
-	{
-		.name  = "r1,syscalls:sys_enter_open:k,1:1:hp",
-		.check = test__checkevent_list,
-	},
-	{
-		.name  = "instructions:G",
-		.check = test__checkevent_exclude_host_modifier,
-	},
-	{
-		.name  = "instructions:H",
-		.check = test__checkevent_exclude_guest_modifier,
-	},
-};
-
-#define TEST__EVENTS_CNT (sizeof(test__events) / sizeof(struct test__event_st))
-
-static int test__parse_events(void)
-{
-	struct perf_evlist *evlist;
-	u_int i;
-	int ret = 0;
-
-	for (i = 0; i < TEST__EVENTS_CNT; i++) {
-		struct test__event_st *e = &test__events[i];
-
-		evlist = perf_evlist__new(NULL, NULL);
-		if (evlist == NULL)
-			break;
-
-		ret = parse_events(evlist, e->name, 0);
-		if (ret) {
-			pr_debug("failed to parse event '%s', err %d\n",
-				 e->name, ret);
-			break;
-		}
-
-		ret = e->check(evlist);
-		perf_evlist__delete(evlist);
-		if (ret)
-			break;
-	}
-
-	return ret;
-}
-
 static int sched__get_first_possible_cpu(pid_t pid, cpu_set_t **maskp,
 					 size_t *sizep)
 {
@@ -1675,7 +1125,7 @@ static struct test {
 	},
 	{
 		.desc = "parse events tests",
-		.func = test__parse_events,
+		.func = parse_events__test,
 	},
 #if defined(__x86_64__) || defined(__i386__)
 	{
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 3e981a7..6031dce 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -900,6 +900,9 @@ static void perf_top__start_counters(struct perf_top *top)
 			attr->read_format |= PERF_FORMAT_ID;
 		}
 
+		if (perf_target__has_cpu(&top->target))
+			attr->sample_type |= PERF_SAMPLE_CPU;
+
 		if (symbol_conf.use_callchain)
 			attr->sample_type |= PERF_SAMPLE_CALLCHAIN;
 
@@ -1159,7 +1162,7 @@ int cmd_top(int argc, const char **argv, const char *prefix __used)
 	struct perf_top top = {
 		.count_filter	     = 5,
 		.delay_secs	     = 2,
-		.freq		     = 1000, /* 1 KHz */
+		.freq		     = 4000, /* 4 KHz */
 		.mmap_pages	     = 128,
 		.sym_pcnt_filter     = 5,
 		.target		     = {
diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index dff9c7a..fd9a594 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -65,6 +65,8 @@ struct perf_tool build_id__mark_dso_hit_ops = {
 	.mmap	= perf_event__process_mmap,
 	.fork	= perf_event__process_task,
 	.exit	= perf_event__exit_del_thread,
+	.attr		 = perf_event__process_attr,
+	.build_id	 = perf_event__process_build_id,
 };
 
 char *dso__build_id_filename(struct dso *self, char *bf, size_t size)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index f4f427c..57e4ce5 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -108,7 +108,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
 	if (opts->call_graph)
 		attr->sample_type	|= PERF_SAMPLE_CALLCHAIN;
 
-	if (opts->target.system_wide)
+	if (perf_target__has_cpu(&opts->target))
 		attr->sample_type	|= PERF_SAMPLE_CPU;
 
 	if (opts->period)
@@ -462,10 +462,7 @@ int perf_event__parse_sample(const union perf_event *event, u64 type,
 	 * used for cross-endian analysis. See git commit 65014ab3
 	 * for why this goofiness is needed.
 	 */
-	union {
-		u64 val64;
-		u32 val32[2];
-	} u;
+	union u64_swap u;
 
 	memset(data, 0, sizeof(*data));
 	data->cpu = data->pid = data->tid = -1;
@@ -608,10 +605,7 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type,
 	 * used for cross-endian analysis. See git commit 65014ab3
 	 * for why this goofiness is needed.
 	 */
-	union {
-		u64 val64;
-		u32 val32[2];
-	} u;
+	union u64_swap u;
 
 	array = event->sample.array;
 
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 5385980..2dd5edf 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -437,7 +437,7 @@ static bool perf_session__read_build_ids(struct perf_session *session, bool with
 	return ret;
 }
 
-static int write_trace_info(int fd, struct perf_header *h __used,
+static int write_tracing_data(int fd, struct perf_header *h __used,
 			    struct perf_evlist *evlist)
 {
 	return read_tracing_data(fd, &evlist->entries);
@@ -1472,7 +1472,7 @@ out:
 	return err;
 }
 
-static int process_trace_info(struct perf_file_section *section __unused,
+static int process_tracing_data(struct perf_file_section *section __unused,
 			      struct perf_header *ph __unused,
 			      int feat __unused, int fd)
 {
@@ -1508,11 +1508,11 @@ struct feature_ops {
 		.full_only = true }
 
 /* feature_ops not implemented: */
-#define print_trace_info		NULL
-#define print_build_id			NULL
+#define print_tracing_data	NULL
+#define print_build_id		NULL
 
 static const struct feature_ops feat_ops[HEADER_LAST_FEATURE] = {
-	FEAT_OPP(HEADER_TRACE_INFO,	trace_info),
+	FEAT_OPP(HEADER_TRACING_DATA,	tracing_data),
 	FEAT_OPP(HEADER_BUILD_ID,	build_id),
 	FEAT_OPA(HEADER_HOSTNAME,	hostname),
 	FEAT_OPA(HEADER_OSRELEASE,	osrelease),
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index 21a6be0..2d42b3e 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -12,7 +12,7 @@
 enum {
 	HEADER_RESERVED		= 0,	/* always cleared */
 	HEADER_FIRST_FEATURE	= 1,
-	HEADER_TRACE_INFO	= 1,
+	HEADER_TRACING_DATA	= 1,
 	HEADER_BUILD_ID,
 
 	HEADER_HOSTNAME,
diff --git a/tools/perf/util/parse-events-test.c b/tools/perf/util/parse-events-test.c
new file mode 100644
index 0000000..76b98e2
--- /dev/null
+++ b/tools/perf/util/parse-events-test.c
@@ -0,0 +1,625 @@
+
+#include "parse-events.h"
+#include "evsel.h"
+#include "evlist.h"
+#include "sysfs.h"
+#include "../../../include/linux/hw_breakpoint.h"
+
+#define TEST_ASSERT_VAL(text, cond) \
+do { \
+	if (!(cond)) { \
+		pr_debug("FAILED %s:%d %s\n", __FILE__, __LINE__, text); \
+		return -1; \
+	} \
+} while (0)
+
+static int test__checkevent_tracepoint(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_TRACEPOINT == evsel->attr.type);
+	TEST_ASSERT_VAL("wrong sample_type",
+		(PERF_SAMPLE_RAW | PERF_SAMPLE_TIME | PERF_SAMPLE_CPU) ==
+		evsel->attr.sample_type);
+	TEST_ASSERT_VAL("wrong sample_period", 1 == evsel->attr.sample_period);
+	return 0;
+}
+
+static int test__checkevent_tracepoint_multi(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+
+	TEST_ASSERT_VAL("wrong number of entries", evlist->nr_entries > 1);
+
+	list_for_each_entry(evsel, &evlist->entries, node) {
+		TEST_ASSERT_VAL("wrong type",
+			PERF_TYPE_TRACEPOINT == evsel->attr.type);
+		TEST_ASSERT_VAL("wrong sample_type",
+			(PERF_SAMPLE_RAW | PERF_SAMPLE_TIME | PERF_SAMPLE_CPU)
+			== evsel->attr.sample_type);
+		TEST_ASSERT_VAL("wrong sample_period",
+			1 == evsel->attr.sample_period);
+	}
+	return 0;
+}
+
+static int test__checkevent_raw(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->attr.type);
+	TEST_ASSERT_VAL("wrong config", 0x1a == evsel->attr.config);
+	return 0;
+}
+
+static int test__checkevent_numeric(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
+	TEST_ASSERT_VAL("wrong type", 1 == evsel->attr.type);
+	TEST_ASSERT_VAL("wrong config", 1 == evsel->attr.config);
+	return 0;
+}
+
+static int test__checkevent_symbolic_name(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->attr.type);
+	TEST_ASSERT_VAL("wrong config",
+			PERF_COUNT_HW_INSTRUCTIONS == evsel->attr.config);
+	return 0;
+}
+
+static int test__checkevent_symbolic_name_config(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == evsel->attr.type);
+	TEST_ASSERT_VAL("wrong config",
+			PERF_COUNT_HW_CPU_CYCLES == evsel->attr.config);
+	TEST_ASSERT_VAL("wrong period",
+			100000 == evsel->attr.sample_period);
+	TEST_ASSERT_VAL("wrong config1",
+			0 == evsel->attr.config1);
+	TEST_ASSERT_VAL("wrong config2",
+			1 == evsel->attr.config2);
+	return 0;
+}
+
+static int test__checkevent_symbolic_alias(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->attr.type);
+	TEST_ASSERT_VAL("wrong config",
+			PERF_COUNT_SW_PAGE_FAULTS == evsel->attr.config);
+	return 0;
+}
+
+static int test__checkevent_genhw(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_HW_CACHE == evsel->attr.type);
+	TEST_ASSERT_VAL("wrong config", (1 << 16) == evsel->attr.config);
+	return 0;
+}
+
+static int test__checkevent_breakpoint(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_BREAKPOINT == evsel->attr.type);
+	TEST_ASSERT_VAL("wrong config", 0 == evsel->attr.config);
+	TEST_ASSERT_VAL("wrong bp_type", (HW_BREAKPOINT_R | HW_BREAKPOINT_W) ==
+					 evsel->attr.bp_type);
+	TEST_ASSERT_VAL("wrong bp_len", HW_BREAKPOINT_LEN_4 ==
+					evsel->attr.bp_len);
+	return 0;
+}
+
+static int test__checkevent_breakpoint_x(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_BREAKPOINT == evsel->attr.type);
+	TEST_ASSERT_VAL("wrong config", 0 == evsel->attr.config);
+	TEST_ASSERT_VAL("wrong bp_type",
+			HW_BREAKPOINT_X == evsel->attr.bp_type);
+	TEST_ASSERT_VAL("wrong bp_len", sizeof(long) == evsel->attr.bp_len);
+	return 0;
+}
+
+static int test__checkevent_breakpoint_r(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
+	TEST_ASSERT_VAL("wrong type",
+			PERF_TYPE_BREAKPOINT == evsel->attr.type);
+	TEST_ASSERT_VAL("wrong config", 0 == evsel->attr.config);
+	TEST_ASSERT_VAL("wrong bp_type",
+			HW_BREAKPOINT_R == evsel->attr.bp_type);
+	TEST_ASSERT_VAL("wrong bp_len",
+			HW_BREAKPOINT_LEN_4 == evsel->attr.bp_len);
+	return 0;
+}
+
+static int test__checkevent_breakpoint_w(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
+	TEST_ASSERT_VAL("wrong type",
+			PERF_TYPE_BREAKPOINT == evsel->attr.type);
+	TEST_ASSERT_VAL("wrong config", 0 == evsel->attr.config);
+	TEST_ASSERT_VAL("wrong bp_type",
+			HW_BREAKPOINT_W == evsel->attr.bp_type);
+	TEST_ASSERT_VAL("wrong bp_len",
+			HW_BREAKPOINT_LEN_4 == evsel->attr.bp_len);
+	return 0;
+}
+
+static int test__checkevent_tracepoint_modifier(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong exclude_user", evsel->attr.exclude_user);
+	TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->attr.exclude_kernel);
+	TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
+	TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
+
+	return test__checkevent_tracepoint(evlist);
+}
+
+static int
+test__checkevent_tracepoint_multi_modifier(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+
+	TEST_ASSERT_VAL("wrong number of entries", evlist->nr_entries > 1);
+
+	list_for_each_entry(evsel, &evlist->entries, node) {
+		TEST_ASSERT_VAL("wrong exclude_user",
+				!evsel->attr.exclude_user);
+		TEST_ASSERT_VAL("wrong exclude_kernel",
+				evsel->attr.exclude_kernel);
+		TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
+		TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
+	}
+
+	return test__checkevent_tracepoint_multi(evlist);
+}
+
+static int test__checkevent_raw_modifier(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong exclude_user", evsel->attr.exclude_user);
+	TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->attr.exclude_kernel);
+	TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
+	TEST_ASSERT_VAL("wrong precise_ip", evsel->attr.precise_ip);
+
+	return test__checkevent_raw(evlist);
+}
+
+static int test__checkevent_numeric_modifier(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong exclude_user", evsel->attr.exclude_user);
+	TEST_ASSERT_VAL("wrong exclude_kernel", evsel->attr.exclude_kernel);
+	TEST_ASSERT_VAL("wrong exclude_hv", !evsel->attr.exclude_hv);
+	TEST_ASSERT_VAL("wrong precise_ip", evsel->attr.precise_ip);
+
+	return test__checkevent_numeric(evlist);
+}
+
+static int test__checkevent_symbolic_name_modifier(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong exclude_user", evsel->attr.exclude_user);
+	TEST_ASSERT_VAL("wrong exclude_kernel", evsel->attr.exclude_kernel);
+	TEST_ASSERT_VAL("wrong exclude_hv", !evsel->attr.exclude_hv);
+	TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
+
+	return test__checkevent_symbolic_name(evlist);
+}
+
+static int test__checkevent_exclude_host_modifier(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong exclude guest", !evsel->attr.exclude_guest);
+	TEST_ASSERT_VAL("wrong exclude host", evsel->attr.exclude_host);
+
+	return test__checkevent_symbolic_name(evlist);
+}
+
+static int test__checkevent_exclude_guest_modifier(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong exclude guest", evsel->attr.exclude_guest);
+	TEST_ASSERT_VAL("wrong exclude host", !evsel->attr.exclude_host);
+
+	return test__checkevent_symbolic_name(evlist);
+}
+
+static int test__checkevent_symbolic_alias_modifier(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong exclude_user", !evsel->attr.exclude_user);
+	TEST_ASSERT_VAL("wrong exclude_kernel", evsel->attr.exclude_kernel);
+	TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
+	TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
+
+	return test__checkevent_symbolic_alias(evlist);
+}
+
+static int test__checkevent_genhw_modifier(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong exclude_user", evsel->attr.exclude_user);
+	TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->attr.exclude_kernel);
+	TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
+	TEST_ASSERT_VAL("wrong precise_ip", evsel->attr.precise_ip);
+
+	return test__checkevent_genhw(evlist);
+}
+
+static int test__checkevent_breakpoint_modifier(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong exclude_user", !evsel->attr.exclude_user);
+	TEST_ASSERT_VAL("wrong exclude_kernel", evsel->attr.exclude_kernel);
+	TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
+	TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
+
+	return test__checkevent_breakpoint(evlist);
+}
+
+static int test__checkevent_breakpoint_x_modifier(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong exclude_user", evsel->attr.exclude_user);
+	TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->attr.exclude_kernel);
+	TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
+	TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
+
+	return test__checkevent_breakpoint_x(evlist);
+}
+
+static int test__checkevent_breakpoint_r_modifier(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong exclude_user", evsel->attr.exclude_user);
+	TEST_ASSERT_VAL("wrong exclude_kernel", evsel->attr.exclude_kernel);
+	TEST_ASSERT_VAL("wrong exclude_hv", !evsel->attr.exclude_hv);
+	TEST_ASSERT_VAL("wrong precise_ip", evsel->attr.precise_ip);
+
+	return test__checkevent_breakpoint_r(evlist);
+}
+
+static int test__checkevent_breakpoint_w_modifier(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong exclude_user", !evsel->attr.exclude_user);
+	TEST_ASSERT_VAL("wrong exclude_kernel", evsel->attr.exclude_kernel);
+	TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
+	TEST_ASSERT_VAL("wrong precise_ip", evsel->attr.precise_ip);
+
+	return test__checkevent_breakpoint_w(evlist);
+}
+
+static int test__checkevent_pmu(struct perf_evlist *evlist)
+{
+
+	struct perf_evsel *evsel = list_entry(evlist->entries.next,
+					      struct perf_evsel, node);
+
+	TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->nr_entries);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->attr.type);
+	TEST_ASSERT_VAL("wrong config",    10 == evsel->attr.config);
+	TEST_ASSERT_VAL("wrong config1",    1 == evsel->attr.config1);
+	TEST_ASSERT_VAL("wrong config2",    3 == evsel->attr.config2);
+	TEST_ASSERT_VAL("wrong period",  1000 == evsel->attr.sample_period);
+
+	return 0;
+}
+
+static int test__checkevent_list(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+
+	TEST_ASSERT_VAL("wrong number of entries", 3 == evlist->nr_entries);
+
+	/* r1 */
+	evsel = list_entry(evlist->entries.next, struct perf_evsel, node);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->attr.type);
+	TEST_ASSERT_VAL("wrong config", 1 == evsel->attr.config);
+	TEST_ASSERT_VAL("wrong config1", 0 == evsel->attr.config1);
+	TEST_ASSERT_VAL("wrong config2", 0 == evsel->attr.config2);
+	TEST_ASSERT_VAL("wrong exclude_user", !evsel->attr.exclude_user);
+	TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->attr.exclude_kernel);
+	TEST_ASSERT_VAL("wrong exclude_hv", !evsel->attr.exclude_hv);
+	TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
+
+	/* syscalls:sys_enter_open:k */
+	evsel = list_entry(evsel->node.next, struct perf_evsel, node);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_TRACEPOINT == evsel->attr.type);
+	TEST_ASSERT_VAL("wrong sample_type",
+		(PERF_SAMPLE_RAW | PERF_SAMPLE_TIME | PERF_SAMPLE_CPU) ==
+		evsel->attr.sample_type);
+	TEST_ASSERT_VAL("wrong sample_period", 1 == evsel->attr.sample_period);
+	TEST_ASSERT_VAL("wrong exclude_user", evsel->attr.exclude_user);
+	TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->attr.exclude_kernel);
+	TEST_ASSERT_VAL("wrong exclude_hv", evsel->attr.exclude_hv);
+	TEST_ASSERT_VAL("wrong precise_ip", !evsel->attr.precise_ip);
+
+	/* 1:1:hp */
+	evsel = list_entry(evsel->node.next, struct perf_evsel, node);
+	TEST_ASSERT_VAL("wrong type", 1 == evsel->attr.type);
+	TEST_ASSERT_VAL("wrong config", 1 == evsel->attr.config);
+	TEST_ASSERT_VAL("wrong exclude_user", evsel->attr.exclude_user);
+	TEST_ASSERT_VAL("wrong exclude_kernel", evsel->attr.exclude_kernel);
+	TEST_ASSERT_VAL("wrong exclude_hv", !evsel->attr.exclude_hv);
+	TEST_ASSERT_VAL("wrong precise_ip", evsel->attr.precise_ip);
+
+	return 0;
+}
+
+static int test__checkevent_pmu_name(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+
+	/* cpu/config=1,name=krava1/u */
+	evsel = list_entry(evlist->entries.next, struct perf_evsel, node);
+	TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->nr_entries);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->attr.type);
+	TEST_ASSERT_VAL("wrong config",  1 == evsel->attr.config);
+	TEST_ASSERT_VAL("wrong name", !strcmp(evsel->name, "krava"));
+
+	/* cpu/config=2/" */
+	evsel = list_entry(evsel->node.next, struct perf_evsel, node);
+	TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->nr_entries);
+	TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->attr.type);
+	TEST_ASSERT_VAL("wrong config",  2 == evsel->attr.config);
+	TEST_ASSERT_VAL("wrong name", !strcmp(evsel->name, "raw 0x2"));
+
+	return 0;
+}
+
+struct test__event_st {
+	const char *name;
+	__u32 type;
+	int (*check)(struct perf_evlist *evlist);
+};
+
+static struct test__event_st test__events[] = {
+	[0] = {
+		.name  = "syscalls:sys_enter_open",
+		.check = test__checkevent_tracepoint,
+	},
+	[1] = {
+		.name  = "syscalls:*",
+		.check = test__checkevent_tracepoint_multi,
+	},
+	[2] = {
+		.name  = "r1a",
+		.check = test__checkevent_raw,
+	},
+	[3] = {
+		.name  = "1:1",
+		.check = test__checkevent_numeric,
+	},
+	[4] = {
+		.name  = "instructions",
+		.check = test__checkevent_symbolic_name,
+	},
+	[5] = {
+		.name  = "cycles/period=100000,config2/",
+		.check = test__checkevent_symbolic_name_config,
+	},
+	[6] = {
+		.name  = "faults",
+		.check = test__checkevent_symbolic_alias,
+	},
+	[7] = {
+		.name  = "L1-dcache-load-miss",
+		.check = test__checkevent_genhw,
+	},
+	[8] = {
+		.name  = "mem:0",
+		.check = test__checkevent_breakpoint,
+	},
+	[9] = {
+		.name  = "mem:0:x",
+		.check = test__checkevent_breakpoint_x,
+	},
+	[10] = {
+		.name  = "mem:0:r",
+		.check = test__checkevent_breakpoint_r,
+	},
+	[11] = {
+		.name  = "mem:0:w",
+		.check = test__checkevent_breakpoint_w,
+	},
+	[12] = {
+		.name  = "syscalls:sys_enter_open:k",
+		.check = test__checkevent_tracepoint_modifier,
+	},
+	[13] = {
+		.name  = "syscalls:*:u",
+		.check = test__checkevent_tracepoint_multi_modifier,
+	},
+	[14] = {
+		.name  = "r1a:kp",
+		.check = test__checkevent_raw_modifier,
+	},
+	[15] = {
+		.name  = "1:1:hp",
+		.check = test__checkevent_numeric_modifier,
+	},
+	[16] = {
+		.name  = "instructions:h",
+		.check = test__checkevent_symbolic_name_modifier,
+	},
+	[17] = {
+		.name  = "faults:u",
+		.check = test__checkevent_symbolic_alias_modifier,
+	},
+	[18] = {
+		.name  = "L1-dcache-load-miss:kp",
+		.check = test__checkevent_genhw_modifier,
+	},
+	[19] = {
+		.name  = "mem:0:u",
+		.check = test__checkevent_breakpoint_modifier,
+	},
+	[20] = {
+		.name  = "mem:0:x:k",
+		.check = test__checkevent_breakpoint_x_modifier,
+	},
+	[21] = {
+		.name  = "mem:0:r:hp",
+		.check = test__checkevent_breakpoint_r_modifier,
+	},
+	[22] = {
+		.name  = "mem:0:w:up",
+		.check = test__checkevent_breakpoint_w_modifier,
+	},
+	[23] = {
+		.name  = "r1,syscalls:sys_enter_open:k,1:1:hp",
+		.check = test__checkevent_list,
+	},
+	[24] = {
+		.name  = "instructions:G",
+		.check = test__checkevent_exclude_host_modifier,
+	},
+	[25] = {
+		.name  = "instructions:H",
+		.check = test__checkevent_exclude_guest_modifier,
+	},
+};
+
+#define TEST__EVENTS_CNT (sizeof(test__events) / sizeof(struct test__event_st))
+
+static struct test__event_st test__events_pmu[] = {
+	[0] = {
+		.name  = "cpu/config=10,config1,config2=3,period=1000/u",
+		.check = test__checkevent_pmu,
+	},
+	[1] = {
+		.name  = "cpu/config=1,name=krava/u,cpu/config=2/u",
+		.check = test__checkevent_pmu_name,
+	},
+};
+
+#define TEST__EVENTS_PMU_CNT (sizeof(test__events_pmu) / \
+			      sizeof(struct test__event_st))
+
+static int test(struct test__event_st *e)
+{
+	struct perf_evlist *evlist;
+	int ret;
+
+	evlist = perf_evlist__new(NULL, NULL);
+	if (evlist == NULL)
+		return -ENOMEM;
+
+	ret = parse_events(evlist, e->name, 0);
+	if (ret) {
+		pr_debug("failed to parse event '%s', err %d\n",
+			 e->name, ret);
+		return ret;
+	}
+
+	ret = e->check(evlist);
+	perf_evlist__delete(evlist);
+
+	return ret;
+}
+
+static int test_events(struct test__event_st *events, unsigned cnt)
+{
+	int ret = 0;
+	unsigned i;
+
+	for (i = 0; i < cnt; i++) {
+		struct test__event_st *e = &events[i];
+
+		pr_debug("running test %d '%s'\n", i, e->name);
+		ret = test(e);
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+
+static int test_pmu(void)
+{
+	struct stat st;
+	char path[PATH_MAX];
+	int ret;
+
+	snprintf(path, PATH_MAX, "%s/bus/event_source/devices/cpu/format/",
+		 sysfs_find_mountpoint());
+
+	ret = stat(path, &st);
+	if (ret)
+		pr_debug("ommiting PMU cpu tests\n");
+	return !ret;
+}
+
+int parse_events__test(void)
+{
+	int ret;
+
+	ret = test_events(test__events, TEST__EVENTS_CNT);
+	if (!ret && test_pmu())
+		ret = test_events(test__events_pmu, TEST__EVENTS_PMU_CNT);
+
+	return ret;
+}
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index c7fc18a..fac7d59 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -23,8 +23,10 @@ struct event_symbol {
 	const char	*alias;
 };
 
-int parse_events_parse(struct list_head *list, struct list_head *list_tmp,
-		       int *idx);
+#ifdef PARSER_DEBUG
+extern int parse_events_debug;
+#endif
+int parse_events_parse(struct list_head *list, int *idx);
 
 #define CHW(x) .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_##x
 #define CSW(x) .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_##x
@@ -355,20 +357,30 @@ const char *__event_name(int type, u64 config)
 	return "unknown";
 }
 
-static int add_event(struct list_head *list, int *idx,
+static int add_event(struct list_head **_list, int *idx,
 		     struct perf_event_attr *attr, char *name)
 {
 	struct perf_evsel *evsel;
+	struct list_head *list = *_list;
+
+	if (!list) {
+		list = malloc(sizeof(*list));
+		if (!list)
+			return -ENOMEM;
+		INIT_LIST_HEAD(list);
+	}
 
 	event_attr_init(attr);
 
 	evsel = perf_evsel__new(attr, (*idx)++);
-	if (!evsel)
+	if (!evsel) {
+		free(list);
 		return -ENOMEM;
-
-	list_add_tail(&evsel->node, list);
+	}
 
 	evsel->name = strdup(name);
+	list_add_tail(&evsel->node, list);
+	*_list = list;
 	return 0;
 }
 
@@ -390,7 +402,7 @@ static int parse_aliases(char *str, const char *names[][MAX_ALIASES], int size)
 	return -1;
 }
 
-int parse_events_add_cache(struct list_head *list, int *idx,
+int parse_events_add_cache(struct list_head **list, int *idx,
 			   char *type, char *op_result1, char *op_result2)
 {
 	struct perf_event_attr attr;
@@ -451,7 +463,7 @@ int parse_events_add_cache(struct list_head *list, int *idx,
 	return add_event(list, idx, &attr, name);
 }
 
-static int add_tracepoint(struct list_head *list, int *idx,
+static int add_tracepoint(struct list_head **list, int *idx,
 			  char *sys_name, char *evt_name)
 {
 	struct perf_event_attr attr;
@@ -488,7 +500,7 @@ static int add_tracepoint(struct list_head *list, int *idx,
 	return add_event(list, idx, &attr, name);
 }
 
-static int add_tracepoint_multi(struct list_head *list, int *idx,
+static int add_tracepoint_multi(struct list_head **list, int *idx,
 				char *sys_name, char *evt_name)
 {
 	char evt_path[MAXPATHLEN];
@@ -519,7 +531,7 @@ static int add_tracepoint_multi(struct list_head *list, int *idx,
 	return ret;
 }
 
-int parse_events_add_tracepoint(struct list_head *list, int *idx,
+int parse_events_add_tracepoint(struct list_head **list, int *idx,
 				char *sys, char *event)
 {
 	int ret;
@@ -563,7 +575,7 @@ parse_breakpoint_type(const char *type, struct perf_event_attr *attr)
 	return 0;
 }
 
-int parse_events_add_breakpoint(struct list_head *list, int *idx,
+int parse_events_add_breakpoint(struct list_head **list, int *idx,
 				void *ptr, char *type)
 {
 	struct perf_event_attr attr;
@@ -622,6 +634,9 @@ do {								\
 		 * attr->branch_sample_type = term->val.num;
 		 */
 		break;
+	case PARSE_EVENTS__TERM_TYPE_NAME:
+		CHECK_TYPE_VAL(STR);
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -642,7 +657,7 @@ static int config_attr(struct perf_event_attr *attr,
 	return 0;
 }
 
-int parse_events_add_numeric(struct list_head *list, int *idx,
+int parse_events_add_numeric(struct list_head **list, int *idx,
 			     unsigned long type, unsigned long config,
 			     struct list_head *head_config)
 {
@@ -660,7 +675,24 @@ int parse_events_add_numeric(struct list_head *list, int *idx,
 			 (char *) __event_name(type, config));
 }
 
-int parse_events_add_pmu(struct list_head *list, int *idx,
+static int parse_events__is_name_term(struct parse_events__term *term)
+{
+	return term->type_term == PARSE_EVENTS__TERM_TYPE_NAME;
+}
+
+static char *pmu_event_name(struct perf_event_attr *attr,
+			    struct list_head *head_terms)
+{
+	struct parse_events__term *term;
+
+	list_for_each_entry(term, head_terms, list)
+		if (parse_events__is_name_term(term))
+			return term->val.str;
+
+	return (char *) __event_name(PERF_TYPE_RAW, attr->config);
+}
+
+int parse_events_add_pmu(struct list_head **list, int *idx,
 			 char *name, struct list_head *head_config)
 {
 	struct perf_event_attr attr;
@@ -681,7 +713,8 @@ int parse_events_add_pmu(struct list_head *list, int *idx,
 	if (perf_pmu__config(pmu, &attr, head_config))
 		return -EINVAL;
 
-	return add_event(list, idx, &attr, (char *) "pmu");
+	return add_event(list, idx, &attr,
+			 pmu_event_name(&attr, head_config));
 }
 
 void parse_events_update_lists(struct list_head *list_event,
@@ -693,7 +726,7 @@ void parse_events_update_lists(struct list_head *list_event,
 	 * list, for next event definition.
 	 */
 	list_splice_tail(list_event, list_all);
-	INIT_LIST_HEAD(list_event);
+	free(list_event);
 }
 
 int parse_events_modifier(struct list_head *list, char *str)
@@ -768,10 +801,14 @@ int parse_events(struct perf_evlist *evlist, const char *str, int unset __used)
 
 	buffer = parse_events__scan_string(str);
 
-	ret = parse_events_parse(&list, &list_tmp, &idx);
+#ifdef PARSER_DEBUG
+	parse_events_debug = 1;
+#endif
+	ret = parse_events_parse(&list, &idx);
 
 	parse_events__flush_buffer(buffer);
 	parse_events__delete_buffer(buffer);
+	parse_events_lex_destroy();
 
 	if (!ret) {
 		int entries = idx - evlist->nr_entries;
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 3fddd61..8cac57a 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -4,7 +4,9 @@
  * Parse symbolic events/counts passed in as options:
  */
 
+#include <linux/list.h>
 #include <stdbool.h>
+#include "types.h"
 #include "../../../include/linux/perf_event.h"
 #include "types.h"
 
@@ -45,6 +47,7 @@ enum {
 	PARSE_EVENTS__TERM_TYPE_CONFIG,
 	PARSE_EVENTS__TERM_TYPE_CONFIG1,
 	PARSE_EVENTS__TERM_TYPE_CONFIG2,
+	PARSE_EVENTS__TERM_TYPE_NAME,
 	PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD,
 	PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE,
 };
@@ -66,26 +69,23 @@ int parse_events__term_num(struct parse_events__term **_term,
 int parse_events__term_str(struct parse_events__term **_term,
 			   int type_term, char *config, char *str);
 void parse_events__free_terms(struct list_head *terms);
-int parse_events_modifier(struct list_head *list __used, char *str __used);
-int parse_events_add_tracepoint(struct list_head *list, int *idx,
+int parse_events_modifier(struct list_head *list, char *str);
+int parse_events_add_tracepoint(struct list_head **list, int *idx,
 				char *sys, char *event);
-int parse_events_add_raw(struct perf_evlist *evlist, unsigned long config,
-			 unsigned long config1, unsigned long config2,
-			 char *mod);
-int parse_events_add_numeric(struct list_head *list, int *idx,
+int parse_events_add_numeric(struct list_head **list, int *idx,
 			     unsigned long type, unsigned long config,
 			     struct list_head *head_config);
-int parse_events_add_cache(struct list_head *list, int *idx,
+int parse_events_add_cache(struct list_head **list, int *idx,
 			   char *type, char *op_result1, char *op_result2);
-int parse_events_add_breakpoint(struct list_head *list, int *idx,
+int parse_events_add_breakpoint(struct list_head **list, int *idx,
 				void *ptr, char *type);
-int parse_events_add_pmu(struct list_head *list, int *idx,
+int parse_events_add_pmu(struct list_head **list, int *idx,
 			 char *pmu , struct list_head *head_config);
 void parse_events_update_lists(struct list_head *list_event,
 			       struct list_head *list_all);
 void parse_events_error(struct list_head *list_all,
-			struct list_head *list_event,
 			int *idx, char const *msg);
+int parse_events__test(void);
 
 void print_events(const char *event_glob);
 void print_events_type(u8 type);
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 1fcf1bb..618a8e7 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -1,5 +1,6 @@
 
 %option prefix="parse_events_"
+%option stack
 
 %{
 #include <errno.h>
@@ -50,6 +51,8 @@ static int term(int type)
 
 %}
 
+%x mem
+
 num_dec		[0-9]+
 num_hex		0x[a-fA-F0-9]+
 num_raw_hex	[a-fA-F0-9]+
@@ -102,16 +105,16 @@ misses|miss				{ return str(PE_NAME_CACHE_OP_RESULT); }
 config			{ return term(PARSE_EVENTS__TERM_TYPE_CONFIG); }
 config1			{ return term(PARSE_EVENTS__TERM_TYPE_CONFIG1); }
 config2			{ return term(PARSE_EVENTS__TERM_TYPE_CONFIG2); }
+name			{ return term(PARSE_EVENTS__TERM_TYPE_NAME); }
 period			{ return term(PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD); }
 branch_type		{ return term(PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE); }
 
-mem:			{ return PE_PREFIX_MEM; }
+mem:			{ BEGIN(mem); return PE_PREFIX_MEM; }
 r{num_raw_hex}		{ return raw(); }
 {num_dec}		{ return value(10); }
 {num_hex}		{ return value(16); }
 
 {modifier_event}	{ return str(PE_MODIFIER_EVENT); }
-{modifier_bp}		{ return str(PE_MODIFIER_BP); }
 {name}			{ return str(PE_NAME); }
 "/"			{ return '/'; }
 -			{ return '-'; }
@@ -119,6 +122,25 @@ r{num_raw_hex}		{ return raw(); }
 :			{ return ':'; }
 =			{ return '='; }
 
+<mem>{
+{modifier_bp}		{ return str(PE_MODIFIER_BP); }
+:			{ return ':'; }
+{num_dec}		{ return value(10); }
+{num_hex}		{ return value(16); }
+	/*
+	 * We need to separate 'mem:' scanner part, in order to get specific
+	 * modifier bits parsed out. Otherwise we would need to handle PE_NAME
+	 * and we'd need to parse it manually. During the escape from <mem>
+	 * state we need to put the escaping char back, so we dont miss it.
+	 */
+.			{ unput(*parse_events_text); BEGIN(INITIAL); }
+	/*
+	 * We destroy the scanner after reaching EOF,
+	 * but anyway just to be sure get back to INIT state.
+	 */
+<<EOF>>			{ BEGIN(INITIAL); }
+}
+
 %%
 
 int parse_events_wrap(void)
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 936913e..362cc59 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -1,7 +1,6 @@
 
 %name-prefix "parse_events_"
 %parse-param {struct list_head *list_all}
-%parse-param {struct list_head *list_event}
 %parse-param {int *idx}
 
 %{
@@ -41,6 +40,14 @@ do { \
 %type <str> PE_MODIFIER_BP
 %type <head> event_config
 %type <term> event_term
+%type <head> event_pmu
+%type <head> event_legacy_symbol
+%type <head> event_legacy_cache
+%type <head> event_legacy_mem
+%type <head> event_legacy_tracepoint
+%type <head> event_legacy_numeric
+%type <head> event_legacy_raw
+%type <head> event_def
 
 %union
 {
@@ -62,13 +69,13 @@ event_def PE_MODIFIER_EVENT
 	 * (there could be more events added for multiple tracepoint
 	 * definitions via '*?'.
 	 */
-	ABORT_ON(parse_events_modifier(list_event, $2));
-	parse_events_update_lists(list_event, list_all);
+	ABORT_ON(parse_events_modifier($1, $2));
+	parse_events_update_lists($1, list_all);
 }
 |
 event_def
 {
-	parse_events_update_lists(list_event, list_all);
+	parse_events_update_lists($1, list_all);
 }
 
 event_def: event_pmu |
@@ -82,71 +89,102 @@ event_def: event_pmu |
 event_pmu:
 PE_NAME '/' event_config '/'
 {
-	ABORT_ON(parse_events_add_pmu(list_event, idx, $1, $3));
+	struct list_head *list = NULL;
+
+	ABORT_ON(parse_events_add_pmu(&list, idx, $1, $3));
 	parse_events__free_terms($3);
+	$$ = list;
 }
 
 event_legacy_symbol:
 PE_VALUE_SYM '/' event_config '/'
 {
+	struct list_head *list = NULL;
 	int type = $1 >> 16;
 	int config = $1 & 255;
 
-	ABORT_ON(parse_events_add_numeric(list_event, idx, type, config, $3));
+	ABORT_ON(parse_events_add_numeric(&list, idx, type, config, $3));
 	parse_events__free_terms($3);
+	$$ = list;
 }
 |
 PE_VALUE_SYM sep_slash_dc
 {
+	struct list_head *list = NULL;
 	int type = $1 >> 16;
 	int config = $1 & 255;
 
-	ABORT_ON(parse_events_add_numeric(list_event, idx, type, config, NULL));
+	ABORT_ON(parse_events_add_numeric(&list, idx, type, config, NULL));
+	$$ = list;
 }
 
 event_legacy_cache:
 PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT '-' PE_NAME_CACHE_OP_RESULT
 {
-	ABORT_ON(parse_events_add_cache(list_event, idx, $1, $3, $5));
+	struct list_head *list = NULL;
+
+	ABORT_ON(parse_events_add_cache(&list, idx, $1, $3, $5));
+	$$ = list;
 }
 |
 PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT
 {
-	ABORT_ON(parse_events_add_cache(list_event, idx, $1, $3, NULL));
+	struct list_head *list = NULL;
+
+	ABORT_ON(parse_events_add_cache(&list, idx, $1, $3, NULL));
+	$$ = list;
 }
 |
 PE_NAME_CACHE_TYPE
 {
-	ABORT_ON(parse_events_add_cache(list_event, idx, $1, NULL, NULL));
+	struct list_head *list = NULL;
+
+	ABORT_ON(parse_events_add_cache(&list, idx, $1, NULL, NULL));
+	$$ = list;
 }
 
 event_legacy_mem:
 PE_PREFIX_MEM PE_VALUE ':' PE_MODIFIER_BP sep_dc
 {
-	ABORT_ON(parse_events_add_breakpoint(list_event, idx, (void *) $2, $4));
+	struct list_head *list = NULL;
+
+	ABORT_ON(parse_events_add_breakpoint(&list, idx, (void *) $2, $4));
+	$$ = list;
 }
 |
 PE_PREFIX_MEM PE_VALUE sep_dc
 {
-	ABORT_ON(parse_events_add_breakpoint(list_event, idx, (void *) $2, NULL));
+	struct list_head *list = NULL;
+
+	ABORT_ON(parse_events_add_breakpoint(&list, idx, (void *) $2, NULL));
+	$$ = list;
 }
 
 event_legacy_tracepoint:
 PE_NAME ':' PE_NAME
 {
-	ABORT_ON(parse_events_add_tracepoint(list_event, idx, $1, $3));
+	struct list_head *list = NULL;
+
+	ABORT_ON(parse_events_add_tracepoint(&list, idx, $1, $3));
+	$$ = list;
 }
 
 event_legacy_numeric:
 PE_VALUE ':' PE_VALUE
 {
-	ABORT_ON(parse_events_add_numeric(list_event, idx, $1, $3, NULL));
+	struct list_head *list = NULL;
+
+	ABORT_ON(parse_events_add_numeric(&list, idx, $1, $3, NULL));
+	$$ = list;
 }
 
 event_legacy_raw:
 PE_RAW
 {
-	ABORT_ON(parse_events_add_numeric(list_event, idx, PERF_TYPE_RAW, $1, NULL));
+	struct list_head *list = NULL;
+
+	ABORT_ON(parse_events_add_numeric(&list, idx, PERF_TYPE_RAW, $1, NULL));
+	$$ = list;
 }
 
 event_config:
@@ -199,6 +237,14 @@ PE_NAME
 	$$ = term;
 }
 |
+PE_TERM '=' PE_NAME
+{
+	struct parse_events__term *term;
+
+	ABORT_ON(parse_events__term_str(&term, $1, NULL, $3));
+	$$ = term;
+}
+|
 PE_TERM '=' PE_VALUE
 {
 	struct parse_events__term *term;
@@ -222,7 +268,6 @@ sep_slash_dc: '/' | ':' |
 %%
 
 void parse_events_error(struct list_head *list_all __used,
-			struct list_head *list_event __used,
 			int *idx __used,
 			char const *msg __used)
 {
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 8ee219b..a119a53 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -258,9 +258,9 @@ static int pmu_config_term(struct list_head *formats,
 static int pmu_config(struct list_head *formats, struct perf_event_attr *attr,
 		      struct list_head *head_terms)
 {
-	struct parse_events__term *term, *h;
+	struct parse_events__term *term;
 
-	list_for_each_entry_safe(term, h, head_terms, list)
+	list_for_each_entry(term, head_terms, list)
 		if (pmu_config_term(formats, attr, term))
 			return -EINVAL;
 
diff --git a/tools/perf/util/scripting-engines/trace-event-perl.c b/tools/perf/util/scripting-engines/trace-event-perl.c
index e30749e..4c1b3d7 100644
--- a/tools/perf/util/scripting-engines/trace-event-perl.c
+++ b/tools/perf/util/scripting-engines/trace-event-perl.c
@@ -56,7 +56,7 @@ INTERP my_perl;
 #define FTRACE_MAX_EVENT				\
 	((1 << (sizeof(unsigned short) * 8)) - 1)
 
-struct event *events[FTRACE_MAX_EVENT];
+struct event_format *events[FTRACE_MAX_EVENT];
 
 extern struct scripting_context *scripting_context;
 
@@ -181,7 +181,7 @@ static void define_flag_field(const char *ev_name,
 	LEAVE;
 }
 
-static void define_event_symbols(struct event *event,
+static void define_event_symbols(struct event_format *event,
 				 const char *ev_name,
 				 struct print_arg *args)
 {
@@ -209,6 +209,8 @@ static void define_event_symbols(struct event *event,
 		define_symbolic_values(args->symbol.symbols, ev_name,
 				       cur_field_name);
 		break;
+	case PRINT_BSTRING:
+	case PRINT_DYNAMIC_ARRAY:
 	case PRINT_STRING:
 		break;
 	case PRINT_TYPE:
@@ -220,7 +222,9 @@ static void define_event_symbols(struct event *event,
 		define_event_symbols(event, ev_name, args->op.left);
 		define_event_symbols(event, ev_name, args->op.right);
 		break;
+	case PRINT_FUNC:
 	default:
+		pr_err("Unsupported print arg type\n");
 		/* we should warn... */
 		return;
 	}
@@ -229,10 +233,10 @@ static void define_event_symbols(struct event *event,
 		define_event_symbols(event, ev_name, args->next);
 }
 
-static inline struct event *find_cache_event(int type)
+static inline struct event_format *find_cache_event(int type)
 {
 	static char ev_name[256];
-	struct event *event;
+	struct event_format *event;
 
 	if (events[type])
 		return events[type];
@@ -258,7 +262,7 @@ static void perl_process_tracepoint(union perf_event *pevent __unused,
 	static char handler[256];
 	unsigned long long val;
 	unsigned long s, ns;
-	struct event *event;
+	struct event_format *event;
 	int type;
 	int pid;
 	int cpu = sample->cpu;
@@ -446,7 +450,7 @@ static int perl_stop_script(void)
 
 static int perl_generate_script(const char *outfile)
 {
-	struct event *event = NULL;
+	struct event_format *event = NULL;
 	struct format_field *f;
 	char fname[PATH_MAX];
 	int not_first, count;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 4dcc8f3..93d355d 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -481,6 +481,38 @@ static void perf_event__read_swap(union perf_event *event)
 	event->read.id		 = bswap_64(event->read.id);
 }
 
+static u8 revbyte(u8 b)
+{
+	int rev = (b >> 4) | ((b & 0xf) << 4);
+	rev = ((rev & 0xcc) >> 2) | ((rev & 0x33) << 2);
+	rev = ((rev & 0xaa) >> 1) | ((rev & 0x55) << 1);
+	return (u8) rev;
+}
+
+/*
+ * XXX this is hack in attempt to carry flags bitfield
+ * throught endian village. ABI says:
+ *
+ * Bit-fields are allocated from right to left (least to most significant)
+ * on little-endian implementations and from left to right (most to least
+ * significant) on big-endian implementations.
+ *
+ * The above seems to be byte specific, so we need to reverse each
+ * byte of the bitfield. 'Internet' also says this might be implementation
+ * specific and we probably need proper fix and carry perf_event_attr
+ * bitfield flags in separate data file FEAT_ section. Thought this seems
+ * to work for now.
+ */
+static void swap_bitfield(u8 *p, unsigned len)
+{
+	unsigned i;
+
+	for (i = 0; i < len; i++) {
+		*p = revbyte(*p);
+		p++;
+	}
+}
+
 /* exported for swapping attributes in file header */
 void perf_event__attr_swap(struct perf_event_attr *attr)
 {
@@ -494,6 +526,8 @@ void perf_event__attr_swap(struct perf_event_attr *attr)
 	attr->bp_type		= bswap_32(attr->bp_type);
 	attr->bp_addr		= bswap_64(attr->bp_addr);
 	attr->bp_len		= bswap_64(attr->bp_len);
+
+	swap_bitfield((u8 *) (&attr->read_format + 1), sizeof(u64));
 }
 
 static void perf_event__hdr_attr_swap(union perf_event *event)
@@ -1064,8 +1098,9 @@ volatile int session_done;
 static int __perf_session__process_pipe_events(struct perf_session *self,
 					       struct perf_tool *tool)
 {
-	union perf_event event;
-	uint32_t size;
+	union perf_event *event;
+	uint32_t size, cur_size = 0;
+	void *buf = NULL;
 	int skip = 0;
 	u64 head;
 	int err;
@@ -1074,8 +1109,14 @@ static int __perf_session__process_pipe_events(struct perf_session *self,
 	perf_tool__fill_defaults(tool);
 
 	head = 0;
+	cur_size = sizeof(union perf_event);
+
+	buf = malloc(cur_size);
+	if (!buf)
+		return -errno;
 more:
-	err = readn(self->fd, &event, sizeof(struct perf_event_header));
+	event = buf;
+	err = readn(self->fd, event, sizeof(struct perf_event_header));
 	if (err <= 0) {
 		if (err == 0)
 			goto done;
@@ -1085,13 +1126,23 @@ more:
 	}
 
 	if (self->header.needs_swap)
-		perf_event_header__bswap(&event.header);
+		perf_event_header__bswap(&event->header);
 
-	size = event.header.size;
+	size = event->header.size;
 	if (size == 0)
 		size = 8;
 
-	p = &event;
+	if (size > cur_size) {
+		void *new = realloc(buf, size);
+		if (!new) {
+			pr_err("failed to allocate memory to read event\n");
+			goto out_err;
+		}
+		buf = new;
+		cur_size = size;
+		event = buf;
+	}
+	p = event;
 	p += sizeof(struct perf_event_header);
 
 	if (size - sizeof(struct perf_event_header)) {
@@ -1107,9 +1158,9 @@ more:
 		}
 	}
 
-	if ((skip = perf_session__process_event(self, &event, tool, head)) < 0) {
+	if ((skip = perf_session__process_event(self, event, tool, head)) < 0) {
 		pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n",
-		       head, event.header.size, event.header.type);
+		       head, event->header.size, event->header.type);
 		err = -EINVAL;
 		goto out_err;
 	}
@@ -1124,6 +1175,7 @@ more:
 done:
 	err = 0;
 out_err:
+	free(buf);
 	perf_session__warn_about_errors(self, tool);
 	perf_session_free_sample_buffers(self);
 	return err;
diff --git a/tools/perf/util/types.h b/tools/perf/util/types.h
index 5f3689a..c51fa6b 100644
--- a/tools/perf/util/types.h
+++ b/tools/perf/util/types.h
@@ -16,4 +16,9 @@ typedef signed short	   s16;
 typedef unsigned char	   u8;
 typedef signed char	   s8;
 
+union u64_swap {
+	u64 val64;
+	u32 val32[2];
+};
+
 #endif /* __PERF_TYPES_H */

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [GIT PULL] perf updates
  2011-03-04  0:38 ` [GIT PULL] perf updates Frederic Weisbecker
@ 2011-03-04  7:10   ` Ingo Molnar
  0 siblings, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2011-03-04  7:10 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: LKML, Peter Zijlstra, Arnaldo Carvalho de Melo


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> Ingo,
> 
> Please pull the perf/core branch that can be found at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
> 	perf/core
> 
> Thanks,
> 	Frederic
> ---
> 
> Frederic Weisbecker (2):
>       perf: Fix missing strndup declaration
>       perf: Fix undefined PyVarObject_HEAD_INIT in python 2.5
> 
> 
>  tools/perf/util/python.c    |    5 +++++
>  tools/perf/util/strfilter.c |    1 -
>  2 files changed, 5 insertions(+), 1 deletions(-)

Pulled, thanks a lot Frederic!

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [GIT PULL] perf updates
  2011-02-26 15:11 [PATCH v2] perf: Fix undefined PyVarObject_HEAD_INIT in python 2.5 Frederic Weisbecker
@ 2011-03-04  0:38 ` Frederic Weisbecker
  2011-03-04  7:10   ` Ingo Molnar
  0 siblings, 1 reply; 76+ messages in thread
From: Frederic Weisbecker @ 2011-03-04  0:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Arnaldo Carvalho de Melo

Ingo,

Please pull the perf/core branch that can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	perf/core

Thanks,
	Frederic
---

Frederic Weisbecker (2):
      perf: Fix missing strndup declaration
      perf: Fix undefined PyVarObject_HEAD_INIT in python 2.5


 tools/perf/util/python.c    |    5 +++++
 tools/perf/util/strfilter.c |    1 -
 2 files changed, 5 insertions(+), 1 deletions(-)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [GIT PULL] perf updates
  2010-06-08 20:13 Frederic Weisbecker
@ 2010-06-08 21:13 ` Ingo Molnar
  0 siblings, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2010-06-08 21:13 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Steven Rostedt, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=unknown-8bit, Size: 2773 bytes --]


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> Ingo,
> 
> Please pull the perf/core branch that can be found at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
> 	perf/core
> 
> I'm only posting Oleg's patches as only those are new in my queues.
> 
> Thanks,
> 	Frederic
> ---
> 
> Frederic Weisbecker (2):
>       x86: Unify dumpstack.h and stacktrace.h
>       perf: Drop the skip argument from perf_arch_fetch_regs_caller
> 
> Oleg Nesterov (2):
>       x86: Make save_stack_address() !CONFIG_FRAME_POINTER friendly
>       x86: Unify save_stack_address() and save_stack_address_nosched()
> 
> Américo Wang (1):
>       tracing: Remove boot tracer
> 
> Li Zefan (1):
>       tracing: Remove kmemtrace ftrace plugin
> 
> 
>  Documentation/ABI/testing/debugfs-kmemtrace |   71 ----
>  Documentation/trace/kmemtrace.txt           |  126 -------
>  MAINTAINERS                                 |    7 -
>  arch/powerpc/include/asm/perf_event.h       |   12 +
>  arch/powerpc/kernel/misc.S                  |   26 --
>  arch/sparc/include/asm/perf_event.h         |    8 +
>  arch/sparc/kernel/helpers.S                 |    6 +-
>  arch/x86/include/asm/perf_event.h           |   13 +
>  arch/x86/include/asm/stacktrace.h           |   49 +++
>  arch/x86/kernel/cpu/perf_event.c            |   18 -
>  arch/x86/kernel/dumpstack.c                 |    1 -
>  arch/x86/kernel/dumpstack.h                 |   56 ---
>  arch/x86/kernel/dumpstack_32.c              |    2 -
>  arch/x86/kernel/dumpstack_64.c              |    1 -
>  arch/x86/kernel/stacktrace.c                |   31 +-
>  include/linux/kmemtrace.h                   |   25 --
>  include/linux/perf_event.h                  |   32 +--
>  include/linux/slab_def.h                    |    3 +-
>  include/linux/slub_def.h                    |    3 +-
>  include/trace/boot.h                        |   60 ---
>  include/trace/ftrace.h                      |    2 +-
>  init/main.c                                 |   29 +-
>  kernel/perf_event.c                         |    5 -
>  kernel/trace/Kconfig                        |   37 --
>  kernel/trace/Makefile                       |    2 -
>  kernel/trace/kmemtrace.c                    |  529 ---------------------------
>  kernel/trace/trace.c                        |    3 -
>  kernel/trace/trace.h                        |   20 -
>  kernel/trace/trace_boot.c                   |  185 ----------
>  kernel/trace/trace_entries.h                |   62 ----
>  kernel/trace/trace_event_perf.c             |    2 -
>  mm/slab.c                                   |    1 -
>  mm/slub.c                                   |    1 -
>  33 files changed, 123 insertions(+), 1305 deletions(-)

Pulled, thanks a lot Frederic!

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [GIT PULL] perf updates
@ 2010-06-08 20:13 Frederic Weisbecker
  2010-06-08 21:13 ` Ingo Molnar
  0 siblings, 1 reply; 76+ messages in thread
From: Frederic Weisbecker @ 2010-06-08 20:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Steven Rostedt, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 2558 bytes --]

Ingo,

Please pull the perf/core branch that can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	perf/core

I'm only posting Oleg's patches as only those are new in my queues.

Thanks,
	Frederic
---

Frederic Weisbecker (2):
      x86: Unify dumpstack.h and stacktrace.h
      perf: Drop the skip argument from perf_arch_fetch_regs_caller

Oleg Nesterov (2):
      x86: Make save_stack_address() !CONFIG_FRAME_POINTER friendly
      x86: Unify save_stack_address() and save_stack_address_nosched()

Américo Wang (1):
      tracing: Remove boot tracer

Li Zefan (1):
      tracing: Remove kmemtrace ftrace plugin


 Documentation/ABI/testing/debugfs-kmemtrace |   71 ----
 Documentation/trace/kmemtrace.txt           |  126 -------
 MAINTAINERS                                 |    7 -
 arch/powerpc/include/asm/perf_event.h       |   12 +
 arch/powerpc/kernel/misc.S                  |   26 --
 arch/sparc/include/asm/perf_event.h         |    8 +
 arch/sparc/kernel/helpers.S                 |    6 +-
 arch/x86/include/asm/perf_event.h           |   13 +
 arch/x86/include/asm/stacktrace.h           |   49 +++
 arch/x86/kernel/cpu/perf_event.c            |   18 -
 arch/x86/kernel/dumpstack.c                 |    1 -
 arch/x86/kernel/dumpstack.h                 |   56 ---
 arch/x86/kernel/dumpstack_32.c              |    2 -
 arch/x86/kernel/dumpstack_64.c              |    1 -
 arch/x86/kernel/stacktrace.c                |   31 +-
 include/linux/kmemtrace.h                   |   25 --
 include/linux/perf_event.h                  |   32 +--
 include/linux/slab_def.h                    |    3 +-
 include/linux/slub_def.h                    |    3 +-
 include/trace/boot.h                        |   60 ---
 include/trace/ftrace.h                      |    2 +-
 init/main.c                                 |   29 +-
 kernel/perf_event.c                         |    5 -
 kernel/trace/Kconfig                        |   37 --
 kernel/trace/Makefile                       |    2 -
 kernel/trace/kmemtrace.c                    |  529 ---------------------------
 kernel/trace/trace.c                        |    3 -
 kernel/trace/trace.h                        |   20 -
 kernel/trace/trace_boot.c                   |  185 ----------
 kernel/trace/trace_entries.h                |   62 ----
 kernel/trace/trace_event_perf.c             |    2 -
 mm/slab.c                                   |    1 -
 mm/slub.c                                   |    1 -
 33 files changed, 123 insertions(+), 1305 deletions(-)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [GIT PULL] perf updates
@ 2010-05-25 13:45 Frederic Weisbecker
  0 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-05-25 13:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Paul Mackerras, Peter Zijlstra,
	Arnaldo Carvalho de Melo

Ingo,

Please pull the perf/core branch that can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	perf/core

Thanks,
	Frederic
---

Frederic Weisbecker (2):
      x86: Unify dumpstack.h and stacktrace.h
      perf: Drop the skip argument from perf_arch_fetch_regs_caller


 arch/powerpc/include/asm/perf_event.h |   12 +++++++
 arch/powerpc/kernel/misc.S            |   26 ---------------
 arch/sparc/include/asm/perf_event.h   |    8 +++++
 arch/sparc/kernel/helpers.S           |    6 ++--
 arch/x86/include/asm/perf_event.h     |   13 +++++++
 arch/x86/include/asm/stacktrace.h     |   49 ++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/perf_event.c      |   18 ----------
 arch/x86/kernel/dumpstack.c           |    1 -
 arch/x86/kernel/dumpstack.h           |   56 ---------------------------------
 arch/x86/kernel/dumpstack_32.c        |    2 -
 arch/x86/kernel/dumpstack_64.c        |    1 -
 arch/x86/kernel/stacktrace.c          |    7 ++--
 include/linux/perf_event.h            |   32 ++++--------------
 include/trace/ftrace.h                |    2 +-
 kernel/perf_event.c                   |    5 ---
 kernel/trace/trace_event_perf.c       |    2 -
 16 files changed, 97 insertions(+), 143 deletions(-)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [GIT PULL] perf updates
  2010-05-09 20:43 Frederic Weisbecker
  2010-05-10  5:10 ` Frederic Weisbecker
@ 2010-05-10  6:46 ` Ingo Molnar
  1 sibling, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2010-05-10  6:46 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Hitoshi Mitake, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, Jens Axboe, Jason Baron,
	Xiao Guangrong, Tom Zanussi, Masami Hiramatsu, Steven Rostedt


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> Ingo,
> 
> Please pull the perf/test branch that can be found at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
> 	perf/test

Pulled, thanks a lot Frederic!

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [GIT PULL] perf updates
  2010-05-09 20:43 Frederic Weisbecker
@ 2010-05-10  5:10 ` Frederic Weisbecker
  2010-05-10  6:46 ` Ingo Molnar
  1 sibling, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-05-10  5:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Hitoshi Mitake, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, Jens Axboe, Jason Baron,
	Xiao Guangrong, Tom Zanussi, Masami Hiramatsu, Steven Rostedt

On Sun, May 09, 2010 at 10:43:28PM +0200, Frederic Weisbecker wrote:
> Ingo,
> 
> Please pull the perf/test branch that can be found at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
> 	perf/test


It's for tip:perf/core btw.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [GIT PULL] perf updates
@ 2010-05-09 20:43 Frederic Weisbecker
  2010-05-10  5:10 ` Frederic Weisbecker
  2010-05-10  6:46 ` Ingo Molnar
  0 siblings, 2 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-05-09 20:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Hitoshi Mitake, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Jens Axboe,
	Jason Baron, Xiao Guangrong, Tom Zanussi, Masami Hiramatsu,
	Steven Rostedt

Ingo,

Please pull the perf/test branch that can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	perf/test

I've already posted the new reordering patches.

Summary:

- Now the reordering works with any kind of raw sample flow. I
  know I already said that with the previous period based reordering,
  but now it seems to be true. I hope...

- Perf lock cleanups and improvements.

- Lock events cleanups

Now I observed we lose a lot of events with perf lock. It's just
that the flow can be so huge that perf record can't keep up with
the buffers, and the kernel often meets the buffer size limit.
Increasing the buffers size makes the things better but it's still not
enough.

In fact we have strong scalability problems that can be solved after
we reach the two following steps:

- I need to unearth the injection thing, so that I'll be able to reduce
the size of the lock events: putting the name in the lock_init event only
and only pass the address of the lock instance in the other lock events.

- Introducing "perf muliplex". The way perf record saves the events by
walking through every counters in a linear way is nice if you have a small
flow and few cpus. I did all my tests on an SMT 2 threads machine (Atom) and
it loses lots of lock events (even with 8192 pages per counters), due to the
huge flow, and this linear walking: the time you read a counter and write the
events to the file, another counter may have reached its buffer size limit already.

Now I can imagine what will be the result after a test on this sparc machine
with 64 cpus.

So we need a perf record mode that does this: multiplex counters buffers per
cpu, so that we have one buffer per cpu. Having one thread per cpu that does its
own cpu record in its own file. Then pass the whole to perf multiplex that can
handle the multiplexing into a single file (useful as a multiplexing remote point
too for perf pipe).

Anyway, lot of work in prevision.

Thanks,
	Frederic
---

Frederic Weisbecker (9):
      perf: Introduce a new "round of buffers read" pseudo event
      perf: Provide a new deterministic events reordering algorithm
      perf: Cleanup perf lock broken states
      perf: Humanize lock flags in perf lock
      perf: Fix perf lock bad rate
      perf lock: Always check min AND max wait time
      tracing: Drop lock_acquired waittime field
      tracing: Drop the nested field from lock_release event
      tracing: Factorize lock events in a lock class

Hitoshi Mitake (2):
      perf lock: Add "info" subcommand for dumping misc information
      perf lock: Drop "-a" option from cmd_record() default arguments set

Tom Zanussi (1):
      perf/live-mode: Handle payload-less events


 include/trace/events/lock.h |   55 +++++-----------
 kernel/lockdep.c            |    4 +-
 tools/perf/builtin-lock.c   |  145 +++++++++++++++++++++++++++++++------------
 tools/perf/builtin-record.c |   34 +++++++---
 tools/perf/util/event.h     |    3 +-
 tools/perf/util/session.c   |  125 ++++++++++++++++++++++++++-----------
 tools/perf/util/session.h   |   36 ++++++-----
 7 files changed, 257 insertions(+), 145 deletions(-)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [GIT PULL] perf updates
@ 2010-04-14 15:59 Frederic Weisbecker
  0 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-04-14 15:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Paul Mackerras

Ingo,

Please pull the perf/core branch that can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	perf/core

Thanks,
	Frederic
---

Frederic Weisbecker (2):
      perf: Store active software events in a hashlist
      perf: Make clock software events consistent with general exclusion rules


 include/linux/perf_event.h |   12 ++
 kernel/perf_event.c        |  255 +++++++++++++++++++++++++++++++------------
 2 files changed, 196 insertions(+), 71 deletions(-)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [GIT PULL] perf updates
  2010-04-04 14:42 ` Frederic Weisbecker
  2010-04-04 19:02   ` Ingo Molnar
@ 2010-04-05  9:56   ` Hitoshi Mitake
  1 sibling, 0 replies; 76+ messages in thread
From: Hitoshi Mitake @ 2010-04-05  9:56 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Stephane Eranian

On 04/04/10 23:42, Frederic Weisbecker wrote:
 > On Sun, Apr 04, 2010 at 04:36:40PM +0200, Frederic Weisbecker wrote:
 >> Ingo,
 >>
 >> Please pull the perf/core branch that can be found at:
 >>
 >> 
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
 >> 	perf/core
 >>
 >> Thanks,
 >> 	Frederic
 >> ---
 >>
 >> Frederic Weisbecker (2):
 >>        perf: Drop the frame reliablity check
 >>        perf: Fetch hot regs from the template caller
 >>
 >> Arnd Bergmann (1):
 >>        perf_event: Make perf fd non seekable
 >>
 >> Hitoshi Mitake (1):
 >>        Swap inclusion order of util.h and string.h in util/string.c
 >
 >
 > I just did a last minute rebase, just to prepend the title of this
 > one with "perf: "

Ah, sorry, my naming of title was not good...

Thanks,
	Hitoshi

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [GIT PULL] perf updates
  2010-04-04 14:42 ` Frederic Weisbecker
@ 2010-04-04 19:02   ` Ingo Molnar
  2010-04-05  9:56   ` Hitoshi Mitake
  1 sibling, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2010-04-04 19:02 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Hitoshi Mitake, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Stephane Eranian


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Sun, Apr 04, 2010 at 04:36:40PM +0200, Frederic Weisbecker wrote:
> > Ingo,
> > 
> > Please pull the perf/core branch that can be found at:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
> > 	perf/core
> > 
> > Thanks,
> > 	Frederic
> > ---
> > 
> > Frederic Weisbecker (2):
> >       perf: Drop the frame reliablity check
> >       perf: Fetch hot regs from the template caller
> > 
> > Arnd Bergmann (1):
> >       perf_event: Make perf fd non seekable
> > 
> > Hitoshi Mitake (1):
> >       Swap inclusion order of util.h and string.h in util/string.c
> 
> 
> I just did a last minute rebase, just to prepend the title of this
> one with "perf: "
> 
> 
> > 
> > 
> >  arch/x86/kernel/cpu/perf_event.c |    3 +--
> >  include/trace/ftrace.h           |   23 ++++++++++++-----------
> >  kernel/perf_event.c              |    1 +
> >  tools/perf/util/string.c         |    2 +-
> >  4 files changed, 15 insertions(+), 14 deletions(-)

Pulled, thanks a lot Frederic!

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [GIT PULL] perf updates
  2010-04-04 14:36 Frederic Weisbecker
@ 2010-04-04 14:42 ` Frederic Weisbecker
  2010-04-04 19:02   ` Ingo Molnar
  2010-04-05  9:56   ` Hitoshi Mitake
  0 siblings, 2 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-04-04 14:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Hitoshi Mitake, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, Arnd Bergmann, Stephane Eranian

On Sun, Apr 04, 2010 at 04:36:40PM +0200, Frederic Weisbecker wrote:
> Ingo,
> 
> Please pull the perf/core branch that can be found at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
> 	perf/core
> 
> Thanks,
> 	Frederic
> ---
> 
> Frederic Weisbecker (2):
>       perf: Drop the frame reliablity check
>       perf: Fetch hot regs from the template caller
> 
> Arnd Bergmann (1):
>       perf_event: Make perf fd non seekable
> 
> Hitoshi Mitake (1):
>       Swap inclusion order of util.h and string.h in util/string.c


I just did a last minute rebase, just to prepend the title of this
one with "perf: "


> 
> 
>  arch/x86/kernel/cpu/perf_event.c |    3 +--
>  include/trace/ftrace.h           |   23 ++++++++++++-----------
>  kernel/perf_event.c              |    1 +
>  tools/perf/util/string.c         |    2 +-
>  4 files changed, 15 insertions(+), 14 deletions(-)


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [GIT PULL] perf updates
@ 2010-04-04 14:36 Frederic Weisbecker
  2010-04-04 14:42 ` Frederic Weisbecker
  0 siblings, 1 reply; 76+ messages in thread
From: Frederic Weisbecker @ 2010-04-04 14:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Hitoshi Mitake, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Arnd Bergmann,
	Stephane Eranian

Ingo,

Please pull the perf/core branch that can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	perf/core

Thanks,
	Frederic
---

Frederic Weisbecker (2):
      perf: Drop the frame reliablity check
      perf: Fetch hot regs from the template caller

Arnd Bergmann (1):
      perf_event: Make perf fd non seekable

Hitoshi Mitake (1):
      Swap inclusion order of util.h and string.h in util/string.c


 arch/x86/kernel/cpu/perf_event.c |    3 +--
 include/trace/ftrace.h           |   23 ++++++++++++-----------
 kernel/perf_event.c              |    1 +
 tools/perf/util/string.c         |    2 +-
 4 files changed, 15 insertions(+), 14 deletions(-)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [GIT PULL] perf updates
  2010-03-05 21:55 Frederic Weisbecker
@ 2010-03-10 14:28 ` Ingo Molnar
  0 siblings, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2010-03-10 14:28 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: LKML, Peter Zijlstra, Paul Mackerras


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> Ingo,
> 
> Please pull the perf/core branch that can be found at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
> 	perf/core
> 
> - Lock events seem to be usable now with perf
> - Better performances in x86-64 stacktraces
> - Trace events now have correct callchains
> - Cleanups
> 
> 
> Other patches still in discussion are not in the branch.
> 
> Thanks,
> 	Frederic
> ---
> 
> Frederic Weisbecker (5):
>       lockdep: Move lock events under lockdep recursion protection
>       perf/x86-64: Use frame pointer to walk on irq and process stacks
>       perf: Introduce new perf_fetch_caller_regs() for hot regs snapshot
>       perf: Take a hot regs snapshot for trace events
>       perf: Drop the obsolete profile naming for trace events
> 
> 
>  arch/x86/kernel/cpu/perf_event.c   |   12 +++
>  arch/x86/kernel/dumpstack.h        |   15 +++
>  arch/x86/kernel/dumpstack_64.c     |    4 +-
>  include/linux/ftrace_event.h       |   23 +++--
>  include/linux/perf_event.h         |   42 +++++++++-
>  include/linux/syscalls.h           |   24 +++---
>  include/trace/ftrace.h             |   44 +++++-----
>  include/trace/syscall.h            |    8 +-
>  kernel/lockdep.c                   |    9 +--
>  kernel/perf_event.c                |   18 ++--
>  kernel/trace/Makefile              |    2 +-
>  kernel/trace/trace_event_perf.c    |  165 ++++++++++++++++++++++++++++++++++++
>  kernel/trace/trace_event_profile.c |  164 -----------------------------------
>  kernel/trace/trace_events.c        |    2 +-
>  kernel/trace/trace_kprobe.c        |   29 +++---
>  kernel/trace/trace_syscalls.c      |   72 ++++++++--------
>  16 files changed, 353 insertions(+), 280 deletions(-)

Pulled, thanks a lot Frederic!

	Ingo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [GIT PULL] perf updates
@ 2010-03-05 21:55 Frederic Weisbecker
  2010-03-10 14:28 ` Ingo Molnar
  0 siblings, 1 reply; 76+ messages in thread
From: Frederic Weisbecker @ 2010-03-05 21:55 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Paul Mackerras

Ingo,

Please pull the perf/core branch that can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	perf/core

- Lock events seem to be usable now with perf
- Better performances in x86-64 stacktraces
- Trace events now have correct callchains
- Cleanups


Other patches still in discussion are not in the branch.

Thanks,
	Frederic
---

Frederic Weisbecker (5):
      lockdep: Move lock events under lockdep recursion protection
      perf/x86-64: Use frame pointer to walk on irq and process stacks
      perf: Introduce new perf_fetch_caller_regs() for hot regs snapshot
      perf: Take a hot regs snapshot for trace events
      perf: Drop the obsolete profile naming for trace events


 arch/x86/kernel/cpu/perf_event.c   |   12 +++
 arch/x86/kernel/dumpstack.h        |   15 +++
 arch/x86/kernel/dumpstack_64.c     |    4 +-
 include/linux/ftrace_event.h       |   23 +++--
 include/linux/perf_event.h         |   42 +++++++++-
 include/linux/syscalls.h           |   24 +++---
 include/trace/ftrace.h             |   44 +++++-----
 include/trace/syscall.h            |    8 +-
 kernel/lockdep.c                   |    9 +--
 kernel/perf_event.c                |   18 ++--
 kernel/trace/Makefile              |    2 +-
 kernel/trace/trace_event_perf.c    |  165 ++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_event_profile.c |  164 -----------------------------------
 kernel/trace/trace_events.c        |    2 +-
 kernel/trace/trace_kprobe.c        |   29 +++---
 kernel/trace/trace_syscalls.c      |   72 ++++++++--------
 16 files changed, 353 insertions(+), 280 deletions(-)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [GIT PULL] Perf updates
@ 2010-02-27 18:25 Frederic Weisbecker
  0 siblings, 0 replies; 76+ messages in thread
From: Frederic Weisbecker @ 2010-02-27 18:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, K . Prasad, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Hitoshi Mitake,
	Tejun Heo

Ingo,

Please pull the perf/core branch that can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	perf/core

Thanks,
	Frederic
---

Frederic Weisbecker (3):
      perf lock: Drop the buffers multiplexing dependency
      perf: Remove pointless breakpoint union
      x86/hw-breakpoints: Remove the name field

Hitoshi Mitake (1):
      perf lock: Fix and add misc documentally things

Tejun Heo (1):
      percpu: Add __percpu sparse annotations to hw_breakpoint


 arch/x86/include/asm/hw_breakpoint.h    |    1 -
 arch/x86/kernel/hw_breakpoint.c         |    7 --
 include/linux/hw_breakpoint.h           |    8 +-
 include/linux/perf_event.h              |    5 +-
 kernel/hw_breakpoint.c                  |   10 +-
 lib/Kconfig.debug                       |    8 ++
 samples/hw_breakpoint/data_breakpoint.c |    6 +-
 tools/perf/Documentation/perf-lock.txt  |   29 ++++++
 tools/perf/builtin-lock.c               |  148 ++++++++++++++++++++++++++++++-
 tools/perf/command-list.txt             |    1 +
 10 files changed, 198 insertions(+), 25 deletions(-)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [GIT PULL] perf updates
@ 2009-12-14 16:24 Ingo Molnar
  0 siblings, 0 replies; 76+ messages in thread
From: Ingo Molnar @ 2009-12-14 16:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, Fr??d??ric Weisbecker, Andrew Morton

Linus,

Please pull the latest perf-fixes-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git perf-fixes-for-linus

 Thanks,

	Ingo

------------------>
Arnaldo Carvalho de Melo (8):
      perf symbols: Rename kthreads to kmaps, using another abstraction for it
      perf symbols: Introduce symbol_type__is_a
      perf symbols: Introduce ELF counterparts to symbol_type__is_a
      perf symbols: Add support for 'variable' symtabs
      perf symbols: Add missing "Variables" entry to map_type__name
      perf symbols: Allow lookups by symbol name too
      perf symbols: Ditch dso->find_symbol
      perf tools: Introduce perf_session class

David Miller (1):
      perf sched: Fix build failure on sparc

Frederic Weisbecker (1):
      hw-breakpoints: Handle bad modify_user_hw_breakpoint off-case return value

Hitoshi Mitake (1):
      perf bench: Add "all" pseudo subsystem and "all" pseudo suite

Jamie Iles (2):
      perf tools: Allow cross compiling
      perf tools: Allow building for ARM

Li Zefan (2):
      tracing, slab: Define kmem_cache_alloc_notrace ifdef CONFIG_TRACING
      tracing, slab: Fix no callsite ifndef CONFIG_KMEMTRACE


 include/linux/hw_breakpoint.h      |    2 +-
 include/linux/slab_def.h           |    4 +-
 include/linux/slub_def.h           |    4 +-
 mm/slab.c                          |   12 +-
 mm/slub.c                          |    4 +-
 tools/perf/Makefile                |    9 +-
 tools/perf/bench/sched-messaging.c |    8 +-
 tools/perf/bench/sched-pipe.c      |   11 +-
 tools/perf/builtin-annotate.c      |   15 ++-
 tools/perf/builtin-bench.c         |   57 +++++++-
 tools/perf/builtin-buildid-list.c  |   55 +-------
 tools/perf/builtin-kmem.c          |   15 ++-
 tools/perf/builtin-record.c        |   25 ++--
 tools/perf/builtin-report.c        |   20 ++-
 tools/perf/builtin-sched.c         |   13 ++-
 tools/perf/builtin-timechart.c     |   15 ++-
 tools/perf/builtin-trace.c         |   20 ++-
 tools/perf/perf.h                  |   12 ++
 tools/perf/util/data_map.c         |   71 ++--------
 tools/perf/util/data_map.h         |    9 +-
 tools/perf/util/event.c            |   11 +-
 tools/perf/util/event.h            |    7 +-
 tools/perf/util/header.c           |   28 +---
 tools/perf/util/header.h           |    4 +-
 tools/perf/util/map.c              |   87 +++++++-----
 tools/perf/util/session.c          |   80 +++++++++++
 tools/perf/util/session.h          |   16 ++
 tools/perf/util/symbol.c           |  268 ++++++++++++++++++++++++++++--------
 tools/perf/util/symbol.h           |   17 ++-
 tools/perf/util/thread.c           |   63 +++++----
 tools/perf/util/thread.h           |   42 ++++--
 31 files changed, 651 insertions(+), 353 deletions(-)
 create mode 100644 tools/perf/util/session.c
 create mode 100644 tools/perf/util/session.h

diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h
index 69f07a9..41235c9 100644
--- a/include/linux/hw_breakpoint.h
+++ b/include/linux/hw_breakpoint.h
@@ -93,7 +93,7 @@ register_user_hw_breakpoint(struct perf_event_attr *attr,
 			    struct task_struct *tsk)	{ return NULL; }
 static inline int
 modify_user_hw_breakpoint(struct perf_event *bp,
-			  struct perf_event_attr *attr)	{ return NULL; }
+			  struct perf_event_attr *attr)	{ return -ENOSYS; }
 static inline struct perf_event *
 register_wide_hw_breakpoint_cpu(struct perf_event_attr *attr,
 				perf_overflow_handler_t	 triggered,
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 850d057..ca6b2b3 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -110,7 +110,7 @@ extern struct cache_sizes malloc_sizes[];
 void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
 void *__kmalloc(size_t size, gfp_t flags);
 
-#ifdef CONFIG_KMEMTRACE
+#ifdef CONFIG_TRACING
 extern void *kmem_cache_alloc_notrace(struct kmem_cache *cachep, gfp_t flags);
 extern size_t slab_buffer_size(struct kmem_cache *cachep);
 #else
@@ -166,7 +166,7 @@ found:
 extern void *__kmalloc_node(size_t size, gfp_t flags, int node);
 extern void *kmem_cache_alloc_node(struct kmem_cache *, gfp_t flags, int node);
 
-#ifdef CONFIG_KMEMTRACE
+#ifdef CONFIG_TRACING
 extern void *kmem_cache_alloc_node_notrace(struct kmem_cache *cachep,
 					   gfp_t flags,
 					   int nodeid);
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 5ad70a6..1e14beb 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -217,7 +217,7 @@ static __always_inline struct kmem_cache *kmalloc_slab(size_t size)
 void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
 void *__kmalloc(size_t size, gfp_t flags);
 
-#ifdef CONFIG_KMEMTRACE
+#ifdef CONFIG_TRACING
 extern void *kmem_cache_alloc_notrace(struct kmem_cache *s, gfp_t gfpflags);
 #else
 static __always_inline void *
@@ -266,7 +266,7 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
 void *__kmalloc_node(size_t size, gfp_t flags, int node);
 void *kmem_cache_alloc_node(struct kmem_cache *, gfp_t flags, int node);
 
-#ifdef CONFIG_KMEMTRACE
+#ifdef CONFIG_TRACING
 extern void *kmem_cache_alloc_node_notrace(struct kmem_cache *s,
 					   gfp_t gfpflags,
 					   int node);
diff --git a/mm/slab.c b/mm/slab.c
index 7dfa481..c3d092d 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -490,7 +490,7 @@ static void **dbg_userword(struct kmem_cache *cachep, void *objp)
 
 #endif
 
-#ifdef CONFIG_KMEMTRACE
+#ifdef CONFIG_TRACING
 size_t slab_buffer_size(struct kmem_cache *cachep)
 {
 	return cachep->buffer_size;
@@ -3558,7 +3558,7 @@ void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
 }
 EXPORT_SYMBOL(kmem_cache_alloc);
 
-#ifdef CONFIG_KMEMTRACE
+#ifdef CONFIG_TRACING
 void *kmem_cache_alloc_notrace(struct kmem_cache *cachep, gfp_t flags)
 {
 	return __cache_alloc(cachep, flags, __builtin_return_address(0));
@@ -3621,7 +3621,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 }
 EXPORT_SYMBOL(kmem_cache_alloc_node);
 
-#ifdef CONFIG_KMEMTRACE
+#ifdef CONFIG_TRACING
 void *kmem_cache_alloc_node_notrace(struct kmem_cache *cachep,
 				    gfp_t flags,
 				    int nodeid)
@@ -3649,7 +3649,7 @@ __do_kmalloc_node(size_t size, gfp_t flags, int node, void *caller)
 	return ret;
 }
 
-#if defined(CONFIG_DEBUG_SLAB) || defined(CONFIG_KMEMTRACE)
+#if defined(CONFIG_DEBUG_SLAB) || defined(CONFIG_TRACING)
 void *__kmalloc_node(size_t size, gfp_t flags, int node)
 {
 	return __do_kmalloc_node(size, flags, node,
@@ -3669,7 +3669,7 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
 	return __do_kmalloc_node(size, flags, node, NULL);
 }
 EXPORT_SYMBOL(__kmalloc_node);
-#endif /* CONFIG_DEBUG_SLAB */
+#endif /* CONFIG_DEBUG_SLAB || CONFIG_TRACING */
 #endif /* CONFIG_NUMA */
 
 /**
@@ -3701,7 +3701,7 @@ static __always_inline void *__do_kmalloc(size_t size, gfp_t flags,
 }
 
 
-#if defined(CONFIG_DEBUG_SLAB) || defined(CONFIG_KMEMTRACE)
+#if defined(CONFIG_DEBUG_SLAB) || defined(CONFIG_TRACING)
 void *__kmalloc(size_t size, gfp_t flags)
 {
 	return __do_kmalloc(size, flags, __builtin_return_address(0));
diff --git a/mm/slub.c b/mm/slub.c
index 4996fc7..4a89c3d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1754,7 +1754,7 @@ void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
 }
 EXPORT_SYMBOL(kmem_cache_alloc);
 
-#ifdef CONFIG_KMEMTRACE
+#ifdef CONFIG_TRACING
 void *kmem_cache_alloc_notrace(struct kmem_cache *s, gfp_t gfpflags)
 {
 	return slab_alloc(s, gfpflags, -1, _RET_IP_);
@@ -1775,7 +1775,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
 EXPORT_SYMBOL(kmem_cache_alloc_node);
 #endif
 
-#ifdef CONFIG_KMEMTRACE
+#ifdef CONFIG_TRACING
 void *kmem_cache_alloc_node_notrace(struct kmem_cache *s,
 				    gfp_t gfpflags,
 				    int node)
diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 23ec660..4069996 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -237,8 +237,8 @@ lib = lib
 
 export prefix bindir sharedir sysconfdir
 
-CC = gcc
-AR = ar
+CC = $(CROSS_COMPILE)gcc
+AR = $(CROSS_COMPILE)ar
 RM = rm -f
 TAR = tar
 FIND = find
@@ -356,7 +356,9 @@ LIB_H += util/parse-options.h
 LIB_H += util/parse-events.h
 LIB_H += util/quote.h
 LIB_H += util/util.h
+LIB_H += util/header.h
 LIB_H += util/help.h
+LIB_H += util/session.h
 LIB_H += util/strbuf.h
 LIB_H += util/string.h
 LIB_H += util/strlist.h
@@ -405,6 +407,7 @@ LIB_OBJS += util/callchain.o
 LIB_OBJS += util/values.o
 LIB_OBJS += util/debug.o
 LIB_OBJS += util/map.o
+LIB_OBJS += util/session.o
 LIB_OBJS += util/thread.o
 LIB_OBJS += util/trace-event-parse.o
 LIB_OBJS += util/trace-event-read.o
@@ -492,8 +495,10 @@ else
 	LIB_OBJS += util/probe-finder.o
 endif
 
+ifndef NO_LIBPERL
 PERL_EMBED_LDOPTS = `perl -MExtUtils::Embed -e ldopts 2>/dev/null`
 PERL_EMBED_CCOPTS = `perl -MExtUtils::Embed -e ccopts 2>/dev/null`
+endif
 
 ifneq ($(shell sh -c "(echo '\#include <EXTERN.h>'; echo '\#include <perl.h>'; echo 'int main(void) { perl_alloc(); return 0; }') | $(CC) -x c - $(PERL_EMBED_CCOPTS) -o /dev/null $(PERL_EMBED_LDOPTS) > /dev/null 2>&1 && echo y"), y)
 	BASIC_CFLAGS += -DNO_LIBPERL
diff --git a/tools/perf/bench/sched-messaging.c b/tools/perf/bench/sched-messaging.c
index 605a2a9..81cee78 100644
--- a/tools/perf/bench/sched-messaging.c
+++ b/tools/perf/bench/sched-messaging.c
@@ -1,6 +1,6 @@
 /*
  *
- * builtin-bench-messaging.c
+ * sched-messaging.c
  *
  * messaging: Benchmark for scheduler and IPC mechanisms
  *
@@ -320,10 +320,12 @@ int bench_sched_messaging(int argc, const char **argv,
 		       num_groups, num_groups * 2 * num_fds,
 		       thread_mode ? "threads" : "processes");
 		printf(" %14s: %lu.%03lu [sec]\n", "Total time",
-		       diff.tv_sec, diff.tv_usec/1000);
+		       diff.tv_sec,
+		       (unsigned long) (diff.tv_usec/1000));
 		break;
 	case BENCH_FORMAT_SIMPLE:
-		printf("%lu.%03lu\n", diff.tv_sec, diff.tv_usec/1000);
+		printf("%lu.%03lu\n", diff.tv_sec,
+		       (unsigned long) (diff.tv_usec/1000));
 		break;
 	default:
 		/* reaching here is something disaster */
diff --git a/tools/perf/bench/sched-pipe.c b/tools/perf/bench/sched-pipe.c
index 238185f..4f77c7c 100644
--- a/tools/perf/bench/sched-pipe.c
+++ b/tools/perf/bench/sched-pipe.c
@@ -1,6 +1,6 @@
 /*
  *
- * builtin-bench-pipe.c
+ * sched-pipe.c
  *
  * pipe: Benchmark for pipe()
  *
@@ -87,7 +87,8 @@ int bench_sched_pipe(int argc, const char **argv,
 	if (pid) {
 		retpid = waitpid(pid, &wait_stat, 0);
 		assert((retpid == pid) && WIFEXITED(wait_stat));
-		return 0;
+	} else {
+		exit(0);
 	}
 
 	switch (bench_format) {
@@ -99,7 +100,8 @@ int bench_sched_pipe(int argc, const char **argv,
 		result_usec += diff.tv_usec;
 
 		printf(" %14s: %lu.%03lu [sec]\n\n", "Total time",
-		       diff.tv_sec, diff.tv_usec/1000);
+		       diff.tv_sec,
+		       (unsigned long) (diff.tv_usec/1000));
 
 		printf(" %14lf usecs/op\n",
 		       (double)result_usec / (double)loops);
@@ -110,7 +112,8 @@ int bench_sched_pipe(int argc, const char **argv,
 
 	case BENCH_FORMAT_SIMPLE:
 		printf("%lu.%03lu\n",
-		       diff.tv_sec, diff.tv_usec / 1000);
+		       diff.tv_sec,
+		       (unsigned long) (diff.tv_usec / 1000));
 		break;
 
 	default:
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 0bf2e8f..21a78d3 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -25,6 +25,7 @@
 #include "util/thread.h"
 #include "util/sort.h"
 #include "util/hist.h"
+#include "util/session.h"
 #include "util/data_map.h"
 
 static char		const *input_name = "perf.data";
@@ -462,21 +463,23 @@ static struct perf_file_handler file_handler = {
 
 static int __cmd_annotate(void)
 {
-	struct perf_header *header;
+	struct perf_session *session = perf_session__new(input_name, O_RDONLY, force);
 	struct thread *idle;
 	int ret;
 
+	if (session == NULL)
+		return -ENOMEM;
+
 	idle = register_idle_thread();
 	register_perf_file_handler(&file_handler);
 
-	ret = mmap_dispatch_perf_file(&header, input_name, 0, 0,
-				      &event__cwdlen, &event__cwd);
+	ret = perf_session__process_events(session, 0, &event__cwdlen, &event__cwd);
 	if (ret)
-		return ret;
+		goto out_delete;
 
 	if (dump_trace) {
 		event__print_totals();
-		return 0;
+		goto out_delete;
 	}
 
 	if (verbose > 3)
@@ -489,6 +492,8 @@ static int __cmd_annotate(void)
 	output__resort(event__total[0]);
 
 	find_annotations();
+out_delete:
+	perf_session__delete(session);
 
 	return ret;
 }
diff --git a/tools/perf/builtin-bench.c b/tools/perf/builtin-bench.c
index e043eb8..4699677 100644
--- a/tools/perf/builtin-bench.c
+++ b/tools/perf/builtin-bench.c
@@ -31,6 +31,9 @@ struct bench_suite {
 	const char *summary;
 	int (*fn)(int, const char **, const char *);
 };
+						\
+/* sentinel: easy for help */
+#define suite_all { "all", "test all suite (pseudo suite)", NULL }
 
 static struct bench_suite sched_suites[] = {
 	{ "messaging",
@@ -39,6 +42,7 @@ static struct bench_suite sched_suites[] = {
 	{ "pipe",
 	  "Flood of communication over pipe() between two processes",
 	  bench_sched_pipe      },
+	suite_all,
 	{ NULL,
 	  NULL,
 	  NULL                  }
@@ -48,6 +52,7 @@ static struct bench_suite mem_suites[] = {
 	{ "memcpy",
 	  "Simple memory copy in various ways",
 	  bench_mem_memcpy },
+	suite_all,
 	{ NULL,
 	  NULL,
 	  NULL             }
@@ -66,6 +71,9 @@ static struct bench_subsys subsystems[] = {
 	{ "mem",
 	  "memory access performance",
 	  mem_suites },
+	{ "all",		/* sentinel: easy for help */
+	  "test all subsystem (pseudo subsystem)",
+	  NULL },
 	{ NULL,
 	  NULL,
 	  NULL       }
@@ -75,11 +83,11 @@ static void dump_suites(int subsys_index)
 {
 	int i;
 
-	printf("List of available suites for %s...\n\n",
+	printf("# List of available suites for %s...\n\n",
 	       subsystems[subsys_index].name);
 
 	for (i = 0; subsystems[subsys_index].suites[i].name; i++)
-		printf("\t%s: %s\n",
+		printf("%14s: %s\n",
 		       subsystems[subsys_index].suites[i].name,
 		       subsystems[subsys_index].suites[i].summary);
 
@@ -110,10 +118,10 @@ static void print_usage(void)
 		printf("\t%s\n", bench_usage[i]);
 	printf("\n");
 
-	printf("List of available subsystems...\n\n");
+	printf("# List of available subsystems...\n\n");
 
 	for (i = 0; subsystems[i].name; i++)
-		printf("\t%s: %s\n",
+		printf("%14s: %s\n",
 		       subsystems[i].name, subsystems[i].summary);
 	printf("\n");
 }
@@ -131,6 +139,37 @@ static int bench_str2int(char *str)
 	return BENCH_FORMAT_UNKNOWN;
 }
 
+static void all_suite(struct bench_subsys *subsys)	  /* FROM HERE */
+{
+	int i;
+	const char *argv[2];
+	struct bench_suite *suites = subsys->suites;
+
+	argv[1] = NULL;
+	/*
+	 * TODO:
+	 * preparing preset parameters for
+	 * embedded, ordinary PC, HPC, etc...
+	 * will be helpful
+	 */
+	for (i = 0; suites[i].fn; i++) {
+		printf("# Running %s/%s benchmark...\n",
+		       subsys->name,
+		       suites[i].name);
+
+		argv[1] = suites[i].name;
+		suites[i].fn(1, argv, NULL);
+		printf("\n");
+	}
+}
+
+static void all_subsystem(void)
+{
+	int i;
+	for (i = 0; subsystems[i].suites; i++)
+		all_suite(&subsystems[i]);
+}
+
 int cmd_bench(int argc, const char **argv, const char *prefix __used)
 {
 	int i, j, status = 0;
@@ -155,6 +194,11 @@ int cmd_bench(int argc, const char **argv, const char *prefix __used)
 		goto end;
 	}
 
+	if (!strcmp(argv[0], "all")) {
+		all_subsystem();
+		goto end;
+	}
+
 	for (i = 0; subsystems[i].name; i++) {
 		if (strcmp(subsystems[i].name, argv[0]))
 			continue;
@@ -165,6 +209,11 @@ int cmd_bench(int argc, const char **argv, const char *prefix __used)
 			goto end;
 		}
 
+		if (!strcmp(argv[1], "all")) {
+			all_suite(&subsystems[i]);
+			goto end;
+		}
+
 		for (j = 0; subsystems[i].suites[j].name; j++) {
 			if (strcmp(subsystems[i].suites[j].name, argv[1]))
 				continue;
diff --git a/tools/perf/builtin-buildid-list.c b/tools/perf/builtin-buildid-list.c
index dcb6143..bfd16a1 100644
--- a/tools/perf/builtin-buildid-list.c
+++ b/tools/perf/builtin-buildid-list.c
@@ -11,8 +11,8 @@
 #include "util/cache.h"
 #include "util/data_map.h"
 #include "util/debug.h"
-#include "util/header.h"
 #include "util/parse-options.h"
+#include "util/session.h"
 #include "util/symbol.h"
 
 static char const *input_name = "perf.data";
@@ -55,56 +55,17 @@ static int perf_file_section__process_buildids(struct perf_file_section *self,
 static int __cmd_buildid_list(void)
 {
 	int err = -1;
-	struct perf_header *header;
-	struct perf_file_header f_header;
-	struct stat input_stat;
-	int input = open(input_name, O_RDONLY);
+	struct perf_session *session = perf_session__new(input_name, O_RDONLY, force);
 
-	if (input < 0) {
-		pr_err("failed to open file: %s", input_name);
-		if (!strcmp(input_name, "perf.data"))
-			pr_err("  (try 'perf record' first)");
-		pr_err("\n");
-		goto out;
-	}
-
-	err = fstat(input, &input_stat);
-	if (err < 0) {
-		perror("failed to stat file");
-		goto out_close;
-	}
-
-	if (!force && input_stat.st_uid && (input_stat.st_uid != geteuid())) {
-		pr_err("file %s not owned by current user or root\n",
-		       input_name);
-		goto out_close;
-	}
-
-	if (!input_stat.st_size) {
-		pr_info("zero-sized file, nothing to do!\n");
-		goto out_close;
-	}
-
-	err = -1;
-	header = perf_header__new();
-	if (header == NULL)
-		goto out_close;
-
-	if (perf_file_header__read(&f_header, header, input) < 0) {
-		pr_warning("incompatible file format");
-		goto out_close;
-	}
+	if (session == NULL)
+		return -1;
 
-	err = perf_header__process_sections(header, input,
+	err = perf_header__process_sections(&session->header, session->fd,
 				         perf_file_section__process_buildids);
+	if (err >= 0)
+		dsos__fprintf_buildid(stdout);
 
-	if (err < 0)
-		goto out_close;
-
-	dsos__fprintf_buildid(stdout);
-out_close:
-	close(input);
-out:
+	perf_session__delete(session);
 	return err;
 }
 
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 5f20951..2071d24 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -6,6 +6,7 @@
 #include "util/symbol.h"
 #include "util/thread.h"
 #include "util/header.h"
+#include "util/session.h"
 
 #include "util/parse-options.h"
 #include "util/trace-event.h"
@@ -20,7 +21,6 @@ typedef int (*sort_fn_t)(struct alloc_stat *, struct alloc_stat *);
 
 static char const		*input_name = "perf.data";
 
-static struct perf_header	*header;
 static u64			sample_type;
 
 static int			alloc_flag;
@@ -367,11 +367,18 @@ static struct perf_file_handler file_handler = {
 
 static int read_events(void)
 {
+	int err;
+	struct perf_session *session = perf_session__new(input_name, O_RDONLY, 0);
+
+	if (session == NULL)
+		return -ENOMEM;
+
 	register_idle_thread();
 	register_perf_file_handler(&file_handler);
 
-	return mmap_dispatch_perf_file(&header, input_name, 0, 0,
-				       &event__cwdlen, &event__cwd);
+	err = perf_session__process_events(session, 0, &event__cwdlen, &event__cwd);
+	perf_session__delete(session);
+	return err;
 }
 
 static double fragmentation(unsigned long n_req, unsigned long n_alloc)
@@ -403,7 +410,7 @@ static void __print_result(struct rb_root *root, int n_lines, int is_caller)
 		if (is_caller) {
 			addr = data->call_site;
 			if (!raw_ip)
-				sym = thread__find_function(kthread, addr, NULL);
+				sym = map_groups__find_function(kmaps, addr, NULL);
 		} else
 			addr = data->ptr;
 
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 0e519c6..4decbd1 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -17,6 +17,7 @@
 #include "util/header.h"
 #include "util/event.h"
 #include "util/debug.h"
+#include "util/session.h"
 #include "util/symbol.h"
 
 #include <unistd.h>
@@ -62,7 +63,7 @@ static int			nr_cpu				=      0;
 
 static int			file_new			=      1;
 
-struct perf_header		*header				=   NULL;
+static struct perf_session	*session;
 
 struct mmap_data {
 	int			counter;
@@ -216,12 +217,12 @@ static struct perf_header_attr *get_header_attr(struct perf_event_attr *a, int n
 {
 	struct perf_header_attr *h_attr;
 
-	if (nr < header->attrs) {
-		h_attr = header->attr[nr];
+	if (nr < session->header.attrs) {
+		h_attr = session->header.attr[nr];
 	} else {
 		h_attr = perf_header_attr__new(a);
 		if (h_attr != NULL)
-			if (perf_header__add_attr(header, h_attr) < 0) {
+			if (perf_header__add_attr(&session->header, h_attr) < 0) {
 				perf_header_attr__delete(h_attr);
 				h_attr = NULL;
 			}
@@ -395,9 +396,9 @@ static void open_counters(int cpu, pid_t pid)
 
 static void atexit_header(void)
 {
-	header->data_size += bytes_written;
+	session->header.data_size += bytes_written;
 
-	perf_header__write(header, output, true);
+	perf_header__write(&session->header, output, true);
 }
 
 static int __cmd_record(int argc, const char **argv)
@@ -440,24 +441,24 @@ static int __cmd_record(int argc, const char **argv)
 		exit(-1);
 	}
 
-	header = perf_header__new();
-	if (header == NULL) {
+	session = perf_session__new(output_name, O_WRONLY, force);
+	if (session == NULL) {
 		pr_err("Not enough memory for reading perf file header\n");
 		return -1;
 	}
 
 	if (!file_new) {
-		err = perf_header__read(header, output);
+		err = perf_header__read(&session->header, output);
 		if (err < 0)
 			return err;
 	}
 
 	if (raw_samples) {
-		perf_header__set_feat(header, HEADER_TRACE_INFO);
+		perf_header__set_feat(&session->header, HEADER_TRACE_INFO);
 	} else {
 		for (i = 0; i < nr_counters; i++) {
 			if (attrs[i].sample_type & PERF_SAMPLE_RAW) {
-				perf_header__set_feat(header, HEADER_TRACE_INFO);
+				perf_header__set_feat(&session->header, HEADER_TRACE_INFO);
 				break;
 			}
 		}
@@ -481,7 +482,7 @@ static int __cmd_record(int argc, const char **argv)
 	}
 
 	if (file_new) {
-		err = perf_header__write(header, output, false);
+		err = perf_header__write(&session->header, output, false);
 		if (err < 0)
 			return err;
 	}
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 2b9eb3a..e2ec49a 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -22,6 +22,7 @@
 #include "perf.h"
 #include "util/debug.h"
 #include "util/header.h"
+#include "util/session.h"
 
 #include "util/parse-options.h"
 #include "util/parse-events.h"
@@ -52,7 +53,7 @@ static int		exclude_other = 1;
 
 static char		callchain_default_opt[] = "fractal,0.5";
 
-static struct perf_header *header;
+static struct perf_session *session;
 
 static u64		sample_type;
 
@@ -701,7 +702,7 @@ static int process_read_event(event_t *event)
 {
 	struct perf_event_attr *attr;
 
-	attr = perf_header__find_attr(event->read.id, header);
+	attr = perf_header__find_attr(event->read.id, &session->header);
 
 	if (show_threads) {
 		const char *name = attr ? __event_name(attr->type, attr->config)
@@ -766,6 +767,10 @@ static int __cmd_report(void)
 	struct thread *idle;
 	int ret;
 
+	session = perf_session__new(input_name, O_RDONLY, force);
+	if (session == NULL)
+		return -ENOMEM;
+
 	idle = register_idle_thread();
 	thread__comm_adjust(idle);
 
@@ -774,14 +779,14 @@ static int __cmd_report(void)
 
 	register_perf_file_handler(&file_handler);
 
-	ret = mmap_dispatch_perf_file(&header, input_name, force,
-				      full_paths, &event__cwdlen, &event__cwd);
+	ret = perf_session__process_events(session, full_paths,
+					   &event__cwdlen, &event__cwd);
 	if (ret)
-		return ret;
+		goto out_delete;
 
 	if (dump_trace) {
 		event__print_totals();
-		return 0;
+		goto out_delete;
 	}
 
 	if (verbose > 3)
@@ -796,7 +801,8 @@ static int __cmd_report(void)
 
 	if (show_threads)
 		perf_read_values_destroy(&show_threads_values);
-
+out_delete:
+	perf_session__delete(session);
 	return ret;
 }
 
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 7cca7c1..65021fe 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -6,6 +6,7 @@
 #include "util/symbol.h"
 #include "util/thread.h"
 #include "util/header.h"
+#include "util/session.h"
 
 #include "util/parse-options.h"
 #include "util/trace-event.h"
@@ -21,7 +22,6 @@
 
 static char			const *input_name = "perf.data";
 
-static struct perf_header	*header;
 static u64			sample_type;
 
 static char			default_sort_order[] = "avg, max, switch, runtime";
@@ -1663,11 +1663,18 @@ static struct perf_file_handler file_handler = {
 
 static int read_events(void)
 {
+	int err;
+	struct perf_session *session = perf_session__new(input_name, O_RDONLY, 0);
+
+	if (session == NULL)
+		return -ENOMEM;
+
 	register_idle_thread();
 	register_perf_file_handler(&file_handler);
 
-	return mmap_dispatch_perf_file(&header, input_name, 0, 0,
-				       &event__cwdlen, &event__cwd);
+	err = perf_session__process_events(session, 0, &event__cwdlen, &event__cwd);
+	perf_session__delete(session);
+	return err;
 }
 
 static void print_bad_events(void)
diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
index f472df9..759dd2b 100644
--- a/tools/perf/builtin-timechart.c
+++ b/tools/perf/builtin-timechart.c
@@ -1059,15 +1059,17 @@ static struct perf_file_handler file_handler = {
 
 static int __cmd_timechart(void)
 {
-	struct perf_header *header;
+	struct perf_session *session = perf_session__new(input_name, O_RDONLY, 0);
 	int ret;
 
+	if (session == NULL)
+		return -ENOMEM;
+
 	register_perf_file_handler(&file_handler);
 
-	ret = mmap_dispatch_perf_file(&header, input_name, 0, 0,
-				      &event__cwdlen, &event__cwd);
+	ret = perf_session__process_events(session, 0, &event__cwdlen, &event__cwd);
 	if (ret)
-		return EXIT_FAILURE;
+		goto out_delete;
 
 	process_samples();
 
@@ -1079,8 +1081,9 @@ static int __cmd_timechart(void)
 
 	pr_info("Written %2.1f seconds of trace to %s.\n",
 		(last_time - first_time) / 1000000000.0, output_name);
-
-	return EXIT_SUCCESS;
+out_delete:
+	perf_session__delete(session);
+	return ret;
 }
 
 static const char * const timechart_usage[] = {
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index c2fcc34..0756664 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -7,6 +7,7 @@
 #include "util/header.h"
 #include "util/exec_cmd.h"
 #include "util/trace-event.h"
+#include "util/session.h"
 
 static char const		*script_name;
 static char const		*generate_script_lang;
@@ -61,7 +62,7 @@ static int cleanup_scripting(void)
 
 static char const		*input_name = "perf.data";
 
-static struct perf_header	*header;
+static struct perf_session 	*session;
 static u64			sample_type;
 
 static int process_sample_event(event_t *event)
@@ -126,11 +127,18 @@ static struct perf_file_handler file_handler = {
 
 static int __cmd_trace(void)
 {
+	int err;
+
+	session = perf_session__new(input_name, O_RDONLY, 0);
+	if (session == NULL)
+		return -ENOMEM;
+
 	register_idle_thread();
 	register_perf_file_handler(&file_handler);
 
-	return mmap_dispatch_perf_file(&header, input_name,
-				       0, 0, &event__cwdlen, &event__cwd);
+	err = perf_session__process_events(session, 0, &event__cwdlen, &event__cwd);
+	perf_session__delete(session);
+	return err;
 }
 
 struct script_spec {
@@ -348,11 +356,7 @@ int cmd_trace(int argc, const char **argv, const char *prefix __used)
 			return -1;
 		}
 
-		header = perf_header__new();
-		if (header == NULL)
-			return -1;
-
-		perf_header__read(header, input);
+		perf_header__read(&session->header, input);
 		err = scripting_ops->generate_script("perf-trace");
 		goto out;
 	}
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 454d5d5..75f941b 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -59,6 +59,18 @@
 #define cpu_relax()	asm volatile ("hint @pause" ::: "memory")
 #endif
 
+#ifdef __arm__
+#include "../../arch/arm/include/asm/unistd.h"
+/*
+ * Use the __kuser_memory_barrier helper in the CPU helper page. See
+ * arch/arm/kernel/entry-armv.S in the kernel source for details.
+ */
+#define rmb()		asm volatile("mov r0, #0xffff0fff; mov lr, pc;" \
+				     "sub pc, r0, #95" ::: "r0", "lr", "cc", \
+				     "memory")
+#define cpu_relax()	asm volatile("":::"memory")
+#endif
+
 #include <time.h>
 #include <unistd.h>
 #include <sys/types.h>
diff --git a/tools/perf/util/data_map.c b/tools/perf/util/data_map.c
index 59b65d0..6d46dda 100644
--- a/tools/perf/util/data_map.c
+++ b/tools/perf/util/data_map.c
@@ -129,23 +129,16 @@ out:
 	return err;
 }
 
-int mmap_dispatch_perf_file(struct perf_header **pheader,
-			    const char *input_name,
-			    int force,
-			    int full_paths,
-			    int *cwdlen,
-			    char **cwd)
+int perf_session__process_events(struct perf_session *self,
+				 int full_paths, int *cwdlen, char **cwd)
 {
 	int err;
-	struct perf_header *header;
 	unsigned long head, shift;
 	unsigned long offset = 0;
-	struct stat input_stat;
 	size_t	page_size;
 	u64 sample_type;
 	event_t *event;
 	uint32_t size;
-	int input;
 	char *buf;
 
 	if (curr_handler == NULL) {
@@ -155,56 +148,19 @@ int mmap_dispatch_perf_file(struct perf_header **pheader,
 
 	page_size = getpagesize();
 
-	input = open(input_name, O_RDONLY);
-	if (input < 0) {
-		pr_err("Failed to open file: %s", input_name);
-		if (!strcmp(input_name, "perf.data"))
-			pr_err("  (try 'perf record' first)");
-		pr_err("\n");
-		return -errno;
-	}
-
-	if (fstat(input, &input_stat) < 0) {
-		pr_err("failed to stat file");
-		err = -errno;
-		goto out_close;
-	}
-
-	err = -EACCES;
-	if (!force && input_stat.st_uid && (input_stat.st_uid != geteuid())) {
-		pr_err("file: %s not owned by current user or root\n",
-			input_name);
-		goto out_close;
-	}
-
-	if (input_stat.st_size == 0) {
-		pr_info("zero-sized file, nothing to do!\n");
-		goto done;
-	}
-
-	err = -ENOMEM;
-	header = perf_header__new();
-	if (header == NULL)
-		goto out_close;
-
-	err = perf_header__read(header, input);
-	if (err < 0)
-		goto out_delete;
-	*pheader = header;
-	head = header->data_offset;
-
-	sample_type = perf_header__sample_type(header);
+	head = self->header.data_offset;
+	sample_type = perf_header__sample_type(&self->header);
 
 	err = -EINVAL;
 	if (curr_handler->sample_type_check &&
 	    curr_handler->sample_type_check(sample_type) < 0)
-		goto out_delete;
+		goto out_err;
 
 	if (!full_paths) {
 		if (getcwd(__cwd, sizeof(__cwd)) == NULL) {
 			pr_err("failed to get the current directory\n");
 			err = -errno;
-			goto out_delete;
+			goto out_err;
 		}
 		*cwd = __cwd;
 		*cwdlen = strlen(*cwd);
@@ -219,11 +175,11 @@ int mmap_dispatch_perf_file(struct perf_header **pheader,
 
 remap:
 	buf = mmap(NULL, page_size * mmap_window, PROT_READ,
-		   MAP_SHARED, input, offset);
+		   MAP_SHARED, self->fd, offset);
 	if (buf == MAP_FAILED) {
 		pr_err("failed to mmap file\n");
 		err = -errno;
-		goto out_delete;
+		goto out_err;
 	}
 
 more:
@@ -273,19 +229,14 @@ more:
 
 	head += size;
 
-	if (offset + head >= header->data_offset + header->data_size)
+	if (offset + head >= self->header.data_offset + self->header.data_size)
 		goto done;
 
-	if (offset + head < (unsigned long)input_stat.st_size)
+	if (offset + head < self->size)
 		goto more;
 
 done:
 	err = 0;
-out_close:
-	close(input);
-
+out_err:
 	return err;
-out_delete:
-	perf_header__delete(header);
-	goto out_close;
 }
diff --git a/tools/perf/util/data_map.h b/tools/perf/util/data_map.h
index 258a87b..98c5b82 100644
--- a/tools/perf/util/data_map.h
+++ b/tools/perf/util/data_map.h
@@ -3,6 +3,7 @@
 
 #include "event.h"
 #include "header.h"
+#include "session.h"
 
 typedef int (*event_type_handler_t)(event_t *);
 
@@ -21,12 +22,8 @@ struct perf_file_handler {
 };
 
 void register_perf_file_handler(struct perf_file_handler *handler);
-int mmap_dispatch_perf_file(struct perf_header **pheader,
-			    const char *input_name,
-			    int force,
-			    int full_paths,
-			    int *cwdlen,
-			    char **cwd);
+int perf_session__process_events(struct perf_session *self,
+				 int full_paths, int *cwdlen, char **cwd);
 int perf_header__read_build_ids(int input, u64 offset, u64 file_size);
 
 #endif
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 4dcecaf..ba0de90 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -254,13 +254,14 @@ void thread__find_addr_location(struct thread *self, u8 cpumode,
 				struct addr_location *al,
 				symbol_filter_t filter)
 {
-	struct thread *thread = al->thread = self;
+	struct map_groups *mg = &self->mg;
 
+	al->thread = self;
 	al->addr = addr;
 
 	if (cpumode & PERF_RECORD_MISC_KERNEL) {
 		al->level = 'k';
-		thread = kthread;
+		mg = kmaps;
 	} else if (cpumode & PERF_RECORD_MISC_USER)
 		al->level = '.';
 	else {
@@ -270,7 +271,7 @@ void thread__find_addr_location(struct thread *self, u8 cpumode,
 		return;
 	}
 try_again:
-	al->map = thread__find_map(thread, type, al->addr);
+	al->map = map_groups__find(mg, type, al->addr);
 	if (al->map == NULL) {
 		/*
 		 * If this is outside of all known maps, and is a negative
@@ -281,8 +282,8 @@ try_again:
 		 * "[vdso]" dso, but for now lets use the old trick of looking
 		 * in the whole kernel symbol list.
 		 */
-		if ((long long)al->addr < 0 && thread != kthread) {
-			thread = kthread;
+		if ((long long)al->addr < 0 && mg != kmaps) {
+			mg = kmaps;
 			goto try_again;
 		}
 		al->sym = NULL;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index c7a78ee..51a96c2 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -103,10 +103,11 @@ void event__print_totals(void);
 
 enum map_type {
 	MAP__FUNCTION = 0,
-
-	MAP__NR_TYPES,
+	MAP__VARIABLE,
 };
 
+#define MAP__NR_TYPES (MAP__VARIABLE + 1)
+
 struct map {
 	union {
 		struct rb_node	rb_node;
@@ -150,6 +151,8 @@ int map__overlap(struct map *l, struct map *r);
 size_t map__fprintf(struct map *self, FILE *fp);
 struct symbol *map__find_symbol(struct map *self, u64 addr,
 				symbol_filter_t filter);
+struct symbol *map__find_symbol_by_name(struct map *self, const char *name,
+					symbol_filter_t filter);
 void map__fixup_start(struct map *self);
 void map__fixup_end(struct map *self);
 
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 59a9c0b..f2e8d87 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -58,35 +58,19 @@ int perf_header_attr__add_id(struct perf_header_attr *self, u64 id)
 	return 0;
 }
 
-/*
- * Create new perf.data header:
- */
-struct perf_header *perf_header__new(void)
+int perf_header__init(struct perf_header *self)
 {
-	struct perf_header *self = zalloc(sizeof(*self));
-
-	if (self != NULL) {
-		self->size = 1;
-		self->attr = malloc(sizeof(void *));
-
-		if (self->attr == NULL) {
-			free(self);
-			self = NULL;
-		}
-	}
-
-	return self;
+	self->size = 1;
+	self->attr = malloc(sizeof(void *));
+	return self->attr == NULL ? -ENOMEM : 0;
 }
 
-void perf_header__delete(struct perf_header *self)
+void perf_header__exit(struct perf_header *self)
 {
 	int i;
-
 	for (i = 0; i < self->attrs; ++i)
-		perf_header_attr__delete(self->attr[i]);
-
+                perf_header_attr__delete(self->attr[i]);
 	free(self->attr);
-	free(self);
 }
 
 int perf_header__add_attr(struct perf_header *self,
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index d1dbe2b..d118d05 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -55,8 +55,8 @@ struct perf_header {
 	DECLARE_BITMAP(adds_features, HEADER_FEAT_BITS);
 };
 
-struct perf_header *perf_header__new(void);
-void perf_header__delete(struct perf_header *self);
+int perf_header__init(struct perf_header *self);
+void perf_header__exit(struct perf_header *self);
 
 int perf_header__read(struct perf_header *self, int fd);
 int perf_header__write(struct perf_header *self, int fd, bool at_exit);
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 69f94fe..76bdca6 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -104,43 +104,64 @@ void map__fixup_end(struct map *self)
 
 #define DSO__DELETED "(deleted)"
 
-struct symbol *map__find_symbol(struct map *self, u64 addr,
-				symbol_filter_t filter)
+static int map__load(struct map *self, symbol_filter_t filter)
 {
-	if (!dso__loaded(self->dso, self->type)) {
-		int nr = dso__load(self->dso, self, filter);
-
-		if (nr < 0) {
-			if (self->dso->has_build_id) {
-				char sbuild_id[BUILD_ID_SIZE * 2 + 1];
-
-				build_id__sprintf(self->dso->build_id,
-						  sizeof(self->dso->build_id),
-						  sbuild_id);
-				pr_warning("%s with build id %s not found",
-					   self->dso->long_name, sbuild_id);
-			} else
-				pr_warning("Failed to open %s",
-					   self->dso->long_name);
-			pr_warning(", continuing without symbols\n");
-			return NULL;
-		} else if (nr == 0) {
-			const char *name = self->dso->long_name;
-			const size_t len = strlen(name);
-			const size_t real_len = len - sizeof(DSO__DELETED);
-
-			if (len > sizeof(DSO__DELETED) &&
-			    strcmp(name + real_len + 1, DSO__DELETED) == 0) {
-				pr_warning("%.*s was updated, restart the long running apps that use it!\n",
-					   (int)real_len, name);
-			} else {
-				pr_warning("no symbols found in %s, maybe install a debug package?\n", name);
-			}
-			return NULL;
+	const char *name = self->dso->long_name;
+	int nr = dso__load(self->dso, self, filter);
+
+	if (nr < 0) {
+		if (self->dso->has_build_id) {
+			char sbuild_id[BUILD_ID_SIZE * 2 + 1];
+
+			build_id__sprintf(self->dso->build_id,
+					  sizeof(self->dso->build_id),
+					  sbuild_id);
+			pr_warning("%s with build id %s not found",
+				   name, sbuild_id);
+		} else
+			pr_warning("Failed to open %s", name);
+
+		pr_warning(", continuing without symbols\n");
+		return -1;
+	} else if (nr == 0) {
+		const size_t len = strlen(name);
+		const size_t real_len = len - sizeof(DSO__DELETED);
+
+		if (len > sizeof(DSO__DELETED) &&
+		    strcmp(name + real_len + 1, DSO__DELETED) == 0) {
+			pr_warning("%.*s was updated, restart the long "
+				   "running apps that use it!\n",
+				   (int)real_len, name);
+		} else {
+			pr_warning("no symbols found in %s, maybe install "
+				   "a debug package?\n", name);
 		}
+
+		return -1;
 	}
 
-	return self->dso->find_symbol(self->dso, self->type, addr);
+	return 0;
+}
+
+struct symbol *map__find_symbol(struct map *self, u64 addr,
+				symbol_filter_t filter)
+{
+	if (!dso__loaded(self->dso, self->type) && map__load(self, filter) < 0)
+		return NULL;
+
+	return dso__find_symbol(self->dso, self->type, addr);
+}
+
+struct symbol *map__find_symbol_by_name(struct map *self, const char *name,
+					symbol_filter_t filter)
+{
+	if (!dso__loaded(self->dso, self->type) && map__load(self, filter) < 0)
+		return NULL;
+
+	if (!dso__sorted_by_name(self->dso, self->type))
+		dso__sort_by_name(self->dso, self->type);
+
+	return dso__find_symbol_by_name(self->dso, self->type, name);
 }
 
 struct map *map__clone(struct map *self)
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
new file mode 100644
index 0000000..707ce1c
--- /dev/null
+++ b/tools/perf/util/session.c
@@ -0,0 +1,80 @@
+#include <linux/kernel.h>
+
+#include <unistd.h>
+#include <sys/types.h>
+
+#include "session.h"
+#include "util.h"
+
+static int perf_session__open(struct perf_session *self, bool force)
+{
+	struct stat input_stat;
+
+	self->fd = open(self->filename, O_RDONLY);
+	if (self->fd < 0) {
+		pr_err("failed to open file: %s", self->filename);
+		if (!strcmp(self->filename, "perf.data"))
+			pr_err("  (try 'perf record' first)");
+		pr_err("\n");
+		return -errno;
+	}
+
+	if (fstat(self->fd, &input_stat) < 0)
+		goto out_close;
+
+	if (!force && input_stat.st_uid && (input_stat.st_uid != geteuid())) {
+		pr_err("file %s not owned by current user or root\n",
+		       self->filename);
+		goto out_close;
+	}
+
+	if (!input_stat.st_size) {
+		pr_info("zero-sized file (%s), nothing to do!\n",
+			self->filename);
+		goto out_close;
+	}
+
+	if (perf_header__read(&self->header, self->fd) < 0) {
+		pr_err("incompatible file format");
+		goto out_close;
+	}
+
+	self->size = input_stat.st_size;
+	return 0;
+
+out_close:
+	close(self->fd);
+	self->fd = -1;
+	return -1;
+}
+
+struct perf_session *perf_session__new(const char *filename, int mode, bool force)
+{
+	size_t len = strlen(filename) + 1;
+	struct perf_session *self = zalloc(sizeof(*self) + len);
+
+	if (self == NULL)
+		goto out;
+
+	if (perf_header__init(&self->header) < 0)
+		goto out_delete;
+
+	memcpy(self->filename, filename, len);
+
+	if (mode == O_RDONLY && perf_session__open(self, force) < 0) {
+		perf_session__delete(self);
+		self = NULL;
+	}
+out:
+	return self;
+out_delete:
+	free(self);
+	return NULL;
+}
+
+void perf_session__delete(struct perf_session *self)
+{
+	perf_header__exit(&self->header);
+	close(self->fd);
+	free(self);
+}
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
new file mode 100644
index 0000000..f3699c8
--- /dev/null
+++ b/tools/perf/util/session.h
@@ -0,0 +1,16 @@
+#ifndef __PERF_SESSION_H
+#define __PERF_SESSION_H
+
+#include "header.h"
+
+struct perf_session {
+	struct perf_header	header;
+	unsigned long		size;
+	int			fd;
+	char filename[0];
+};
+
+struct perf_session *perf_session__new(const char *filename, int mode, bool force);
+void perf_session__delete(struct perf_session *self);
+
+#endif /* __PERF_SESSION_H */
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index e7508ad..d3d9fed 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -29,11 +29,9 @@ enum dso_origin {
 };
 
 static void dsos__add(struct list_head *head, struct dso *dso);
-static struct map *thread__find_map_by_name(struct thread *self, char *name);
 static struct map *map__new2(u64 start, struct dso *dso, enum map_type type);
-struct symbol *dso__find_symbol(struct dso *self, enum map_type type, u64 addr);
 static int dso__load_kernel_sym(struct dso *self, struct map *map,
-				struct thread *thread, symbol_filter_t filter);
+				struct map_groups *mg, symbol_filter_t filter);
 unsigned int symbol__priv_size;
 static int vmlinux_path__nr_entries;
 static char **vmlinux_path;
@@ -43,19 +41,41 @@ static struct symbol_conf symbol_conf__defaults = {
 	.try_vmlinux_path = true,
 };
 
-static struct thread kthread_mem;
-struct thread *kthread = &kthread_mem;
+static struct map_groups kmaps_mem;
+struct map_groups *kmaps = &kmaps_mem;
 
 bool dso__loaded(const struct dso *self, enum map_type type)
 {
 	return self->loaded & (1 << type);
 }
 
+bool dso__sorted_by_name(const struct dso *self, enum map_type type)
+{
+	return self->sorted_by_name & (1 << type);
+}
+
 static void dso__set_loaded(struct dso *self, enum map_type type)
 {
 	self->loaded |= (1 << type);
 }
 
+static void dso__set_sorted_by_name(struct dso *self, enum map_type type)
+{
+	self->sorted_by_name |= (1 << type);
+}
+
+static bool symbol_type__is_a(char symbol_type, enum map_type map_type)
+{
+	switch (map_type) {
+	case MAP__FUNCTION:
+		return symbol_type == 'T' || symbol_type == 'W';
+	case MAP__VARIABLE:
+		return symbol_type == 'D' || symbol_type == 'd';
+	default:
+		return false;
+	}
+}
+
 static void symbols__fixup_end(struct rb_root *self)
 {
 	struct rb_node *nd, *prevnd = rb_first(self);
@@ -79,7 +99,7 @@ static void symbols__fixup_end(struct rb_root *self)
 		curr->end = roundup(curr->start, 4096);
 }
 
-static void __thread__fixup_maps_end(struct thread *self, enum map_type type)
+static void __map_groups__fixup_end(struct map_groups *self, enum map_type type)
 {
 	struct map *prev, *curr;
 	struct rb_node *nd, *prevnd = rb_first(&self->maps[type]);
@@ -102,11 +122,11 @@ static void __thread__fixup_maps_end(struct thread *self, enum map_type type)
 	curr->end = ~0UL;
 }
 
-static void thread__fixup_maps_end(struct thread *self)
+static void map_groups__fixup_end(struct map_groups *self)
 {
 	int i;
 	for (i = 0; i < MAP__NR_TYPES; ++i)
-		__thread__fixup_maps_end(self, i);
+		__map_groups__fixup_end(self, i);
 }
 
 static struct symbol *symbol__new(u64 start, u64 len, const char *name)
@@ -164,11 +184,11 @@ struct dso *dso__new(const char *name)
 		dso__set_long_name(self, self->name);
 		self->short_name = self->name;
 		for (i = 0; i < MAP__NR_TYPES; ++i)
-			self->symbols[i] = RB_ROOT;
-		self->find_symbol = dso__find_symbol;
+			self->symbols[i] = self->symbol_names[i] = RB_ROOT;
 		self->slen_calculated = 0;
 		self->origin = DSO__ORIG_NOT_FOUND;
 		self->loaded = 0;
+		self->sorted_by_name = 0;
 		self->has_build_id = 0;
 	}
 
@@ -246,11 +266,85 @@ static struct symbol *symbols__find(struct rb_root *self, u64 ip)
 	return NULL;
 }
 
-struct symbol *dso__find_symbol(struct dso *self, enum map_type type, u64 addr)
+struct symbol_name_rb_node {
+	struct rb_node	rb_node;
+	struct symbol	sym;
+};
+
+static void symbols__insert_by_name(struct rb_root *self, struct symbol *sym)
+{
+	struct rb_node **p = &self->rb_node;
+	struct rb_node *parent = NULL;
+	struct symbol_name_rb_node *symn = ((void *)sym) - sizeof(*parent), *s;
+
+	while (*p != NULL) {
+		parent = *p;
+		s = rb_entry(parent, struct symbol_name_rb_node, rb_node);
+		if (strcmp(sym->name, s->sym.name) < 0)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+	rb_link_node(&symn->rb_node, parent, p);
+	rb_insert_color(&symn->rb_node, self);
+}
+
+static void symbols__sort_by_name(struct rb_root *self, struct rb_root *source)
+{
+	struct rb_node *nd;
+
+	for (nd = rb_first(source); nd; nd = rb_next(nd)) {
+		struct symbol *pos = rb_entry(nd, struct symbol, rb_node);
+		symbols__insert_by_name(self, pos);
+	}
+}
+
+static struct symbol *symbols__find_by_name(struct rb_root *self, const char *name)
+{
+	struct rb_node *n;
+
+	if (self == NULL)
+		return NULL;
+
+	n = self->rb_node;
+
+	while (n) {
+		struct symbol_name_rb_node *s;
+		int cmp;
+
+		s = rb_entry(n, struct symbol_name_rb_node, rb_node);
+		cmp = strcmp(name, s->sym.name);
+
+		if (cmp < 0)
+			n = n->rb_left;
+		else if (cmp > 0)
+			n = n->rb_right;
+		else
+			return &s->sym;
+	}
+
+	return NULL;
+}
+
+struct symbol *dso__find_symbol(struct dso *self,
+				enum map_type type, u64 addr)
 {
 	return symbols__find(&self->symbols[type], addr);
 }
 
+struct symbol *dso__find_symbol_by_name(struct dso *self, enum map_type type,
+					const char *name)
+{
+	return symbols__find_by_name(&self->symbol_names[type], name);
+}
+
+void dso__sort_by_name(struct dso *self, enum map_type type)
+{
+	dso__set_sorted_by_name(self, type);
+	return symbols__sort_by_name(&self->symbol_names[type],
+				     &self->symbols[type]);
+}
+
 int build_id__sprintf(u8 *self, int len, char *bf)
 {
 	char *bid = bf;
@@ -327,10 +421,7 @@ static int dso__load_all_kallsyms(struct dso *self, struct map *map)
 			continue;
 
 		symbol_type = toupper(line[len]);
-		/*
-		 * We're interested only in code ('T'ext)
-		 */
-		if (symbol_type != 'T' && symbol_type != 'W')
+		if (!symbol_type__is_a(symbol_type, map->type))
 			continue;
 
 		symbol_name = line + len + 2;
@@ -364,8 +455,8 @@ out_failure:
  * kernel range is broken in several maps, named [kernel].N, as we don't have
  * the original ELF section names vmlinux have.
  */
-static int dso__split_kallsyms(struct dso *self, struct map *map, struct thread *thread,
-			       symbol_filter_t filter)
+static int dso__split_kallsyms(struct dso *self, struct map *map,
+			       struct map_groups *mg, symbol_filter_t filter)
 {
 	struct map *curr_map = map;
 	struct symbol *pos;
@@ -382,13 +473,13 @@ static int dso__split_kallsyms(struct dso *self, struct map *map, struct thread
 
 		module = strchr(pos->name, '\t');
 		if (module) {
-			if (!thread->use_modules)
+			if (!mg->use_modules)
 				goto discard_symbol;
 
 			*module++ = '\0';
 
 			if (strcmp(self->name, module)) {
-				curr_map = thread__find_map_by_name(thread, module);
+				curr_map = map_groups__find_by_name(mg, map->type, module);
 				if (curr_map == NULL) {
 					pr_debug("/proc/{kallsyms,modules} "
 					         "inconsistency!\n");
@@ -419,7 +510,7 @@ static int dso__split_kallsyms(struct dso *self, struct map *map, struct thread
 			}
 
 			curr_map->map_ip = curr_map->unmap_ip = identity__map_ip;
-			__thread__insert_map(thread, curr_map);
+			map_groups__insert(mg, curr_map);
 			++kernel_range;
 		}
 
@@ -440,7 +531,7 @@ discard_symbol:		rb_erase(&pos->rb_node, root);
 
 
 static int dso__load_kallsyms(struct dso *self, struct map *map,
-			      struct thread *thread, symbol_filter_t filter)
+			      struct map_groups *mg, symbol_filter_t filter)
 {
 	if (dso__load_all_kallsyms(self, map) < 0)
 		return -1;
@@ -448,13 +539,13 @@ static int dso__load_kallsyms(struct dso *self, struct map *map,
 	symbols__fixup_end(&self->symbols[map->type]);
 	self->origin = DSO__ORIG_KERNEL;
 
-	return dso__split_kallsyms(self, map, thread, filter);
+	return dso__split_kallsyms(self, map, mg, filter);
 }
 
 size_t kernel_maps__fprintf(FILE *fp)
 {
 	size_t printed = fprintf(fp, "Kernel maps:\n");
-	printed += thread__fprintf_maps(kthread, fp);
+	printed += map_groups__fprintf_maps(kmaps, fp);
 	return printed + fprintf(fp, "END kernel maps\n");
 }
 
@@ -544,6 +635,13 @@ static inline int elf_sym__is_function(const GElf_Sym *sym)
 	       sym->st_shndx != SHN_UNDEF;
 }
 
+static inline bool elf_sym__is_object(const GElf_Sym *sym)
+{
+	return elf_sym__type(sym) == STT_OBJECT &&
+		sym->st_name != 0 &&
+		sym->st_shndx != SHN_UNDEF;
+}
+
 static inline int elf_sym__is_label(const GElf_Sym *sym)
 {
 	return elf_sym__type(sym) == STT_NOTYPE &&
@@ -564,6 +662,12 @@ static inline int elf_sec__is_text(const GElf_Shdr *shdr,
 	return strstr(elf_sec__name(shdr, secstrs), "text") != NULL;
 }
 
+static inline bool elf_sec__is_data(const GElf_Shdr *shdr,
+				    const Elf_Data *secstrs)
+{
+	return strstr(elf_sec__name(shdr, secstrs), "data") != NULL;
+}
+
 static inline const char *elf_sym__name(const GElf_Sym *sym,
 					const Elf_Data *symstrs)
 {
@@ -744,8 +848,32 @@ out:
 	return 0;
 }
 
+static bool elf_sym__is_a(GElf_Sym *self, enum map_type type)
+{
+	switch (type) {
+	case MAP__FUNCTION:
+		return elf_sym__is_function(self);
+	case MAP__VARIABLE:
+		return elf_sym__is_object(self);
+	default:
+		return false;
+	}
+}
+
+static bool elf_sec__is_a(GElf_Shdr *self, Elf_Data *secstrs, enum map_type type)
+{
+	switch (type) {
+	case MAP__FUNCTION:
+		return elf_sec__is_text(self, secstrs);
+	case MAP__VARIABLE:
+		return elf_sec__is_data(self, secstrs);
+	default:
+		return false;
+	}
+}
+
 static int dso__load_sym(struct dso *self, struct map *map,
-			 struct thread *thread, const char *name, int fd,
+			 struct map_groups *mg, const char *name, int fd,
 			 symbol_filter_t filter, int kernel, int kmodule)
 {
 	struct map *curr_map = map;
@@ -818,7 +946,7 @@ static int dso__load_sym(struct dso *self, struct map *map,
 		int is_label = elf_sym__is_label(&sym);
 		const char *section_name;
 
-		if (!is_label && !elf_sym__is_function(&sym))
+		if (!is_label && !elf_sym__is_a(&sym, map->type))
 			continue;
 
 		sec = elf_getscn(elf, sym.st_shndx);
@@ -827,7 +955,7 @@ static int dso__load_sym(struct dso *self, struct map *map,
 
 		gelf_getshdr(sec, &shdr);
 
-		if (is_label && !elf_sec__is_text(&shdr, secstrs))
+		if (is_label && !elf_sec__is_a(&shdr, secstrs, map->type))
 			continue;
 
 		elf_name = elf_sym__name(&sym, symstrs);
@@ -849,7 +977,7 @@ static int dso__load_sym(struct dso *self, struct map *map,
 			snprintf(dso_name, sizeof(dso_name),
 				 "%s%s", self->short_name, section_name);
 
-			curr_map = thread__find_map_by_name(thread, dso_name);
+			curr_map = map_groups__find_by_name(mg, map->type, dso_name);
 			if (curr_map == NULL) {
 				u64 start = sym.st_value;
 
@@ -868,7 +996,7 @@ static int dso__load_sym(struct dso *self, struct map *map,
 				curr_map->map_ip = identity__map_ip;
 				curr_map->unmap_ip = identity__map_ip;
 				curr_dso->origin = DSO__ORIG_KERNEL;
-				__thread__insert_map(kthread, curr_map);
+				map_groups__insert(kmaps, curr_map);
 				dsos__add(&dsos__kernel, curr_dso);
 			} else
 				curr_dso = curr_map->dso;
@@ -1094,7 +1222,7 @@ int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
 	dso__set_loaded(self, map->type);
 
 	if (self->kernel)
-		return dso__load_kernel_sym(self, map, kthread, filter);
+		return dso__load_kernel_sym(self, map, kmaps, filter);
 
 	name = malloc(size);
 	if (!name)
@@ -1180,11 +1308,12 @@ out:
 	return ret;
 }
 
-static struct map *thread__find_map_by_name(struct thread *self, char *name)
+struct map *map_groups__find_by_name(struct map_groups *self,
+				     enum map_type type, const char *name)
 {
 	struct rb_node *nd;
 
-	for (nd = rb_first(&self->maps[MAP__FUNCTION]); nd; nd = rb_next(nd)) {
+	for (nd = rb_first(&self->maps[type]); nd; nd = rb_next(nd)) {
 		struct map *map = rb_entry(nd, struct map, rb_node);
 
 		if (map->dso && strcmp(map->dso->name, name) == 0)
@@ -1228,7 +1357,7 @@ static int dsos__set_modules_path_dir(char *dirname)
 				 (int)(dot - dent->d_name), dent->d_name);
 
 			strxfrchar(dso_name, '-', '_');
-			map = thread__find_map_by_name(kthread, dso_name);
+			map = map_groups__find_by_name(kmaps, MAP__FUNCTION, dso_name);
 			if (map == NULL)
 				continue;
 
@@ -1281,7 +1410,7 @@ static struct map *map__new2(u64 start, struct dso *dso, enum map_type type)
 	return self;
 }
 
-static int thread__create_module_maps(struct thread *self)
+static int map_groups__create_module_maps(struct map_groups *self)
 {
 	char *line = NULL;
 	size_t n;
@@ -1338,7 +1467,7 @@ static int thread__create_module_maps(struct thread *self)
 			dso->has_build_id = true;
 
 		dso->origin = DSO__ORIG_KMODULE;
-		__thread__insert_map(self, map);
+		map_groups__insert(self, map);
 		dsos__add(&dsos__kernel, dso);
 	}
 
@@ -1353,7 +1482,8 @@ out_failure:
 	return -1;
 }
 
-static int dso__load_vmlinux(struct dso *self, struct map *map, struct thread *thread,
+static int dso__load_vmlinux(struct dso *self, struct map *map,
+			     struct map_groups *mg,
 			     const char *vmlinux, symbol_filter_t filter)
 {
 	int err = -1, fd;
@@ -1387,14 +1517,14 @@ static int dso__load_vmlinux(struct dso *self, struct map *map, struct thread *t
 		return -1;
 
 	dso__set_loaded(self, map->type);
-	err = dso__load_sym(self, map, thread, self->long_name, fd, filter, 1, 0);
+	err = dso__load_sym(self, map, mg, self->long_name, fd, filter, 1, 0);
 	close(fd);
 
 	return err;
 }
 
 static int dso__load_kernel_sym(struct dso *self, struct map *map,
-				struct thread *thread, symbol_filter_t filter)
+				struct map_groups *mg, symbol_filter_t filter)
 {
 	int err;
 	bool is_kallsyms;
@@ -1404,7 +1534,7 @@ static int dso__load_kernel_sym(struct dso *self, struct map *map,
 		pr_debug("Looking at the vmlinux_path (%d entries long)\n",
 			 vmlinux_path__nr_entries);
 		for (i = 0; i < vmlinux_path__nr_entries; ++i) {
-			err = dso__load_vmlinux(self, map, thread,
+			err = dso__load_vmlinux(self, map, mg,
 						vmlinux_path[i], filter);
 			if (err > 0) {
 				pr_debug("Using %s for symbols\n",
@@ -1420,12 +1550,12 @@ static int dso__load_kernel_sym(struct dso *self, struct map *map,
 	if (is_kallsyms)
 		goto do_kallsyms;
 
-	err = dso__load_vmlinux(self, map, thread, self->long_name, filter);
+	err = dso__load_vmlinux(self, map, mg, self->long_name, filter);
 	if (err <= 0) {
 		pr_info("The file %s cannot be used, "
 			"trying to use /proc/kallsyms...", self->long_name);
 do_kallsyms:
-		err = dso__load_kallsyms(self, map, thread, filter);
+		err = dso__load_kallsyms(self, map, mg, filter);
 		if (err > 0 && !is_kallsyms)
                         dso__set_long_name(self, strdup("[kernel.kallsyms]"));
 	}
@@ -1508,42 +1638,59 @@ size_t dsos__fprintf_buildid(FILE *fp)
 		__dsos__fprintf_buildid(&dsos__user, fp));
 }
 
-static int thread__create_kernel_map(struct thread *self, const char *vmlinux)
+static struct dso *dsos__create_kernel( const char *vmlinux)
 {
-	struct map *kmap;
 	struct dso *kernel = dso__new(vmlinux ?: "[kernel.kallsyms]");
 
 	if (kernel == NULL)
-		return -1;
-
-	kmap = map__new2(0, kernel, MAP__FUNCTION);
-	if (kmap == NULL)
-		goto out_delete_kernel_dso;
+		return NULL;
 
-	kmap->map_ip	   = kmap->unmap_ip = identity__map_ip;
 	kernel->short_name = "[kernel]";
 	kernel->kernel	   = 1;
 
 	vdso = dso__new("[vdso]");
 	if (vdso == NULL)
-		goto out_delete_kernel_map;
+		goto out_delete_kernel_dso;
 	dso__set_loaded(vdso, MAP__FUNCTION);
 
 	if (sysfs__read_build_id("/sys/kernel/notes", kernel->build_id,
 				 sizeof(kernel->build_id)) == 0)
 		kernel->has_build_id = true;
 
-	__thread__insert_map(self, kmap);
 	dsos__add(&dsos__kernel, kernel);
 	dsos__add(&dsos__user, vdso);
 
-	return 0;
+	return kernel;
 
-out_delete_kernel_map:
-	map__delete(kmap);
 out_delete_kernel_dso:
 	dso__delete(kernel);
-	return -1;
+	return NULL;
+}
+
+static int map_groups__create_kernel_maps(struct map_groups *self, const char *vmlinux)
+{
+	struct map *functions, *variables;
+	struct dso *kernel = dsos__create_kernel(vmlinux);
+
+	if (kernel == NULL)
+		return -1;
+
+	functions = map__new2(0, kernel, MAP__FUNCTION);
+	if (functions == NULL)
+		return -1;
+
+	variables = map__new2(0, kernel, MAP__VARIABLE);
+	if (variables == NULL) {
+		map__delete(functions);
+		return -1;
+	}
+
+	functions->map_ip = functions->unmap_ip =
+		variables->map_ip = variables->unmap_ip = identity__map_ip;
+	map_groups__insert(self, functions);
+	map_groups__insert(self, variables);
+
+	return 0;
 }
 
 static void vmlinux_path__exit(void)
@@ -1607,23 +1754,26 @@ int symbol__init(struct symbol_conf *conf)
 
 	elf_version(EV_CURRENT);
 	symbol__priv_size = pconf->priv_size;
-	thread__init(kthread, 0);
+	if (pconf->sort_by_name)
+		symbol__priv_size += (sizeof(struct symbol_name_rb_node) -
+				      sizeof(struct symbol));
+	map_groups__init(kmaps);
 
 	if (pconf->try_vmlinux_path && vmlinux_path__init() < 0)
 		return -1;
 
-	if (thread__create_kernel_map(kthread, pconf->vmlinux_name) < 0) {
+	if (map_groups__create_kernel_maps(kmaps, pconf->vmlinux_name) < 0) {
 		vmlinux_path__exit();
 		return -1;
 	}
 
-	kthread->use_modules = pconf->use_modules;
-	if (pconf->use_modules && thread__create_module_maps(kthread) < 0)
+	kmaps->use_modules = pconf->use_modules;
+	if (pconf->use_modules && map_groups__create_module_maps(kmaps) < 0)
 		pr_debug("Failed to load list of modules in use, "
 			 "continuing...\n");
 	/*
 	 * Now that we have all the maps created, just set the ->end of them:
 	 */
-	thread__fixup_maps_end(kthread);
+	map_groups__fixup_end(kmaps);
 	return 0;
 }
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 17003ef..cf99f88 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -52,7 +52,8 @@ struct symbol {
 struct symbol_conf {
 	unsigned short	priv_size;
 	bool		try_vmlinux_path,
-			use_modules;
+			use_modules,
+			sort_by_name;
 	const char	*vmlinux_name;
 };
 
@@ -74,13 +75,13 @@ struct addr_location {
 struct dso {
 	struct list_head node;
 	struct rb_root	 symbols[MAP__NR_TYPES];
-	struct symbol    *(*find_symbol)(struct dso *self,
-					 enum map_type type, u64 addr);
+	struct rb_root	 symbol_names[MAP__NR_TYPES];
 	u8		 adjust_symbols:1;
 	u8		 slen_calculated:1;
 	u8		 has_build_id:1;
 	u8		 kernel:1;
 	unsigned char	 origin;
+	u8		 sorted_by_name;
 	u8		 loaded;
 	u8		 build_id[BUILD_ID_SIZE];
 	u16		 long_name_len;
@@ -93,6 +94,9 @@ struct dso *dso__new(const char *name);
 void dso__delete(struct dso *self);
 
 bool dso__loaded(const struct dso *self, enum map_type type);
+bool dso__sorted_by_name(const struct dso *self, enum map_type type);
+
+void dso__sort_by_name(struct dso *self, enum map_type type);
 
 struct dso *dsos__findnew(const char *name);
 int dso__load(struct dso *self, struct map *map, symbol_filter_t filter);
@@ -103,6 +107,9 @@ size_t dso__fprintf_buildid(struct dso *self, FILE *fp);
 size_t dso__fprintf(struct dso *self, enum map_type type, FILE *fp);
 char dso__symtab_origin(const struct dso *self);
 void dso__set_build_id(struct dso *self, void *build_id);
+struct symbol *dso__find_symbol(struct dso *self, enum map_type type, u64 addr);
+struct symbol *dso__find_symbol_by_name(struct dso *self, enum map_type type,
+					const char *name);
 
 int filename__read_build_id(const char *filename, void *bf, size_t size);
 int sysfs__read_build_id(const char *filename, void *bf, size_t size);
@@ -113,8 +120,8 @@ size_t kernel_maps__fprintf(FILE *fp);
 
 int symbol__init(struct symbol_conf *conf);
 
-struct thread;
-struct thread *kthread;
+struct map_groups;
+struct map_groups *kmaps;
 extern struct list_head dsos__user, dsos__kernel;
 extern struct dso *vdso;
 #endif /* __PERF_SYMBOL */
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 603f561..b68a00e 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -9,11 +9,9 @@
 static struct rb_root threads;
 static struct thread *last_match;
 
-void thread__init(struct thread *self, pid_t pid)
+void map_groups__init(struct map_groups *self)
 {
 	int i;
-	self->pid = pid;
-	self->comm = NULL;
 	for (i = 0; i < MAP__NR_TYPES; ++i) {
 		self->maps[i] = RB_ROOT;
 		INIT_LIST_HEAD(&self->removed_maps[i]);
@@ -25,7 +23,8 @@ static struct thread *thread__new(pid_t pid)
 	struct thread *self = zalloc(sizeof(*self));
 
 	if (self != NULL) {
-		thread__init(self, pid);
+		map_groups__init(&self->mg);
+		self->pid = pid;
 		self->comm = malloc(32);
 		if (self->comm)
 			snprintf(self->comm, 32, ":%d", self->pid);
@@ -55,10 +54,11 @@ int thread__comm_len(struct thread *self)
 
 static const char *map_type__name[MAP__NR_TYPES] = {
 	[MAP__FUNCTION] = "Functions",
+	[MAP__VARIABLE] = "Variables",
 };
 
-static size_t __thread__fprintf_maps(struct thread *self,
-				     enum map_type type, FILE *fp)
+static size_t __map_groups__fprintf_maps(struct map_groups *self,
+					 enum map_type type, FILE *fp)
 {
 	size_t printed = fprintf(fp, "%s:\n", map_type__name[type]);
 	struct rb_node *nd;
@@ -76,16 +76,16 @@ static size_t __thread__fprintf_maps(struct thread *self,
 	return printed;
 }
 
-size_t thread__fprintf_maps(struct thread *self, FILE *fp)
+size_t map_groups__fprintf_maps(struct map_groups *self, FILE *fp)
 {
 	size_t printed = 0, i;
 	for (i = 0; i < MAP__NR_TYPES; ++i)
-		printed += __thread__fprintf_maps(self, i, fp);
+		printed += __map_groups__fprintf_maps(self, i, fp);
 	return printed;
 }
 
-static size_t __thread__fprintf_removed_maps(struct thread *self,
-					     enum map_type type, FILE *fp)
+static size_t __map_groups__fprintf_removed_maps(struct map_groups *self,
+						 enum map_type type, FILE *fp)
 {
 	struct map *pos;
 	size_t printed = 0;
@@ -101,20 +101,25 @@ static size_t __thread__fprintf_removed_maps(struct thread *self,
 	return printed;
 }
 
-static size_t thread__fprintf_removed_maps(struct thread *self, FILE *fp)
+static size_t map_groups__fprintf_removed_maps(struct map_groups *self, FILE *fp)
 {
 	size_t printed = 0, i;
 	for (i = 0; i < MAP__NR_TYPES; ++i)
-		printed += __thread__fprintf_removed_maps(self, i, fp);
+		printed += __map_groups__fprintf_removed_maps(self, i, fp);
 	return printed;
 }
 
-static size_t thread__fprintf(struct thread *self, FILE *fp)
+static size_t map_groups__fprintf(struct map_groups *self, FILE *fp)
 {
-	size_t printed = fprintf(fp, "Thread %d %s\n", self->pid, self->comm);
-	printed += thread__fprintf_removed_maps(self, fp);
+	size_t printed = map_groups__fprintf_maps(self, fp);
 	printed += fprintf(fp, "Removed maps:\n");
-	return printed + thread__fprintf_removed_maps(self, fp);
+	return printed + map_groups__fprintf_removed_maps(self, fp);
+}
+
+static size_t thread__fprintf(struct thread *self, FILE *fp)
+{
+	return fprintf(fp, "Thread %d %s\n", self->pid, self->comm) +
+	       map_groups__fprintf(&self->mg, fp);
 }
 
 struct thread *threads__findnew(pid_t pid)
@@ -168,7 +173,8 @@ struct thread *register_idle_thread(void)
 	return thread;
 }
 
-static void thread__remove_overlappings(struct thread *self, struct map *map)
+static void map_groups__remove_overlappings(struct map_groups *self,
+					    struct map *map)
 {
 	struct rb_root *root = &self->maps[map->type];
 	struct rb_node *next = rb_first(root);
@@ -238,12 +244,15 @@ struct map *maps__find(struct rb_root *maps, u64 ip)
 
 void thread__insert_map(struct thread *self, struct map *map)
 {
-	thread__remove_overlappings(self, map);
-	maps__insert(&self->maps[map->type], map);
+	map_groups__remove_overlappings(&self->mg, map);
+	map_groups__insert(&self->mg, map);
 }
 
-static int thread__clone_maps(struct thread *self, struct thread *parent,
-			      enum map_type type)
+/*
+ * XXX This should not really _copy_ te maps, but refcount them.
+ */
+static int map_groups__clone(struct map_groups *self,
+			     struct map_groups *parent, enum map_type type)
 {
 	struct rb_node *nd;
 	for (nd = rb_first(&parent->maps[type]); nd; nd = rb_next(nd)) {
@@ -251,7 +260,7 @@ static int thread__clone_maps(struct thread *self, struct thread *parent,
 		struct map *new = map__clone(map);
 		if (new == NULL)
 			return -ENOMEM;
-		thread__insert_map(self, new);
+		map_groups__insert(self, new);
 	}
 	return 0;
 }
@@ -267,7 +276,7 @@ int thread__fork(struct thread *self, struct thread *parent)
 		return -ENOMEM;
 
 	for (i = 0; i < MAP__NR_TYPES; ++i)
-		if (thread__clone_maps(self, parent, i) < 0)
+		if (map_groups__clone(&self->mg, &parent->mg, i) < 0)
 			return -ENOMEM;
 	return 0;
 }
@@ -286,11 +295,11 @@ size_t threads__fprintf(FILE *fp)
 	return ret;
 }
 
-struct symbol *thread__find_symbol(struct thread *self,
-				   enum map_type type, u64 addr,
-				   symbol_filter_t filter)
+struct symbol *map_groups__find_symbol(struct map_groups *self,
+				       enum map_type type, u64 addr,
+				       symbol_filter_t filter)
 {
-	struct map *map = thread__find_map(self, type, addr);
+	struct map *map = map_groups__find(self, type, addr);
 
 	if (map != NULL)
 		return map__find_symbol(map, map->map_ip(map, addr), filter);
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 686d6e9..1751802 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -5,52 +5,66 @@
 #include <unistd.h>
 #include "symbol.h"
 
-struct thread {
-	struct rb_node		rb_node;
+struct map_groups {
 	struct rb_root		maps[MAP__NR_TYPES];
 	struct list_head	removed_maps[MAP__NR_TYPES];
-	pid_t			pid;
 	bool			use_modules;
+};
+
+struct thread {
+	struct rb_node		rb_node;
+	struct map_groups	mg;
+	pid_t			pid;
 	char			shortname[3];
 	char			*comm;
 	int			comm_len;
 };
 
-void thread__init(struct thread *self, pid_t pid);
+void map_groups__init(struct map_groups *self);
 int thread__set_comm(struct thread *self, const char *comm);
 int thread__comm_len(struct thread *self);
 struct thread *threads__findnew(pid_t pid);
 struct thread *register_idle_thread(void);
 void thread__insert_map(struct thread *self, struct map *map);
 int thread__fork(struct thread *self, struct thread *parent);
-size_t thread__fprintf_maps(struct thread *self, FILE *fp);
+size_t map_groups__fprintf_maps(struct map_groups *self, FILE *fp);
 size_t threads__fprintf(FILE *fp);
 
 void maps__insert(struct rb_root *maps, struct map *map);
 struct map *maps__find(struct rb_root *maps, u64 addr);
 
-static inline struct map *thread__find_map(struct thread *self,
+static inline void map_groups__insert(struct map_groups *self, struct map *map)
+{
+	 maps__insert(&self->maps[map->type], map);
+}
+
+static inline struct map *map_groups__find(struct map_groups *self,
 					   enum map_type type, u64 addr)
 {
-	return self ? maps__find(&self->maps[type], addr) : NULL;
+	return maps__find(&self->maps[type], addr);
 }
 
-static inline void __thread__insert_map(struct thread *self, struct map *map)
+static inline struct map *thread__find_map(struct thread *self,
+					   enum map_type type, u64 addr)
 {
-	 maps__insert(&self->maps[map->type], map);
+	return self ? map_groups__find(&self->mg, type, addr) : NULL;
 }
 
 void thread__find_addr_location(struct thread *self, u8 cpumode,
 				enum map_type type, u64 addr,
 				struct addr_location *al,
 				symbol_filter_t filter);
-struct symbol *thread__find_symbol(struct thread *self,
-				   enum map_type type, u64 addr,
-				   symbol_filter_t filter);
+struct symbol *map_groups__find_symbol(struct map_groups *self,
+				       enum map_type type, u64 addr,
+				       symbol_filter_t filter);
 
 static inline struct symbol *
-thread__find_function(struct thread *self, u64 addr, symbol_filter_t filter)
+map_groups__find_function(struct map_groups *self, u64 addr,
+			  symbol_filter_t filter)
 {
-	return thread__find_symbol(self, MAP__FUNCTION, addr, filter);
+	return map_groups__find_symbol(self, MAP__FUNCTION, addr, filter);
 }
+
+struct map *map_groups__find_by_name(struct map_groups *self,
+				     enum map_type type, const char *name);
 #endif	/* __PERF_THREAD_H */

^ permalink raw reply related	[flat|nested] 76+ messages in thread

end of thread, other threads:[~2019-09-26 23:00 UTC | newest]

Thread overview: 76+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-03  6:54 [GIT PULL] perf updates Frederic Weisbecker
2010-03-03  6:55 ` [PATCH 1/3] lockdep: Move lock events under lockdep recursion protection Frederic Weisbecker
2010-03-09  7:18   ` Hitoshi Mitake
2010-03-10  0:10     ` Frederic Weisbecker
2010-03-09  8:34   ` Jens Axboe
2010-03-09  8:35     ` Jens Axboe
2010-03-10  0:05       ` Frederic Weisbecker
2010-03-10 15:45         ` Peter Zijlstra
2010-03-10 15:56           ` Jens Axboe
2010-03-10 15:55         ` Jens Axboe
2010-03-09 23:45     ` Frederic Weisbecker
2010-03-10 15:55       ` Jens Axboe
2010-03-03  6:55 ` [RFC][PATCH 2/3] perf: Take a hot regs snapshot for trace events Frederic Weisbecker
2010-03-03  8:38   ` Peter Zijlstra
2010-03-03 20:07     ` Frederic Weisbecker
2010-03-04 19:01     ` Frederic Weisbecker
2010-03-05  3:08     ` [PATCH 0/2] Perf " Frederic Weisbecker
2010-03-05  3:08     ` [PATCH 1/2] perf: Introduce new perf_save_regs() for hot regs snapshot Frederic Weisbecker
2010-03-05  3:08       ` Frederic Weisbecker
2010-03-05  3:08       ` Frederic Weisbecker
2010-03-05 15:08       ` Masami Hiramatsu
2010-03-05 16:38         ` Frederic Weisbecker
2010-03-05 17:08           ` Masami Hiramatsu
2010-03-05 17:17             ` Frederic Weisbecker
2010-03-05 20:55             ` [PATCH 1/2] perf: Introduce new perf_fetch_caller_regs() " Frederic Weisbecker
2010-03-05 20:55               ` Frederic Weisbecker
2010-03-05 20:55               ` Frederic Weisbecker
2010-03-05 20:55             ` [PATCH 2/2] perf: Take a hot regs snapshot for trace events Frederic Weisbecker
2010-03-05 20:55               ` Frederic Weisbecker
2010-03-05 20:55               ` Frederic Weisbecker
2010-03-05  3:08     ` Frederic Weisbecker
2010-03-05  3:08       ` Frederic Weisbecker
2010-03-05  3:08       ` Frederic Weisbecker
2010-03-03 16:06   ` [RFC][PATCH 2/3] " Steven Rostedt
2010-03-03 16:37     ` Peter Zijlstra
2010-03-03 17:07       ` Steven Rostedt
2010-03-03 17:16         ` Peter Zijlstra
2010-03-03 17:45           ` Steven Rostedt
2010-03-03 20:37             ` Frederic Weisbecker
2010-03-04 11:25           ` Ingo Molnar
2010-03-04 15:16             ` Steven Rostedt
2010-03-04 15:36               ` Ingo Molnar
2010-03-04 15:55                 ` Steven Rostedt
2010-03-04 21:17                   ` Ingo Molnar
2010-03-04 21:30                     ` Steven Rostedt
2010-03-04 21:37                       ` Frederic Weisbecker
2010-03-04 21:52                         ` Thomas Gleixner
2010-03-04 22:01                           ` Frederic Weisbecker
2010-03-04 22:02                         ` Steven Rostedt
2010-03-04 22:09                           ` Frederic Weisbecker
2010-03-03  6:55 ` [PATCH 3/3] perf/x86-64: Use frame pointer to walk on irq and process stacks Frederic Weisbecker
  -- strict thread matches above, loose matches on Subject: below --
2019-09-26 20:06 [GIT PULL] perf updates Ingo Molnar
2019-09-26 23:00 ` pr-tracker-bot
2016-08-06  6:05 Ingo Molnar
2016-05-25 20:18 Ingo Molnar
2015-07-04 11:25 Ingo Molnar
2015-04-18 15:22 Ingo Molnar
2013-11-13 20:58 Ingo Molnar
2012-05-23 19:01 Ingo Molnar
2011-02-26 15:11 [PATCH v2] perf: Fix undefined PyVarObject_HEAD_INIT in python 2.5 Frederic Weisbecker
2011-03-04  0:38 ` [GIT PULL] perf updates Frederic Weisbecker
2011-03-04  7:10   ` Ingo Molnar
2010-06-08 20:13 Frederic Weisbecker
2010-06-08 21:13 ` Ingo Molnar
2010-05-25 13:45 Frederic Weisbecker
2010-05-09 20:43 Frederic Weisbecker
2010-05-10  5:10 ` Frederic Weisbecker
2010-05-10  6:46 ` Ingo Molnar
2010-04-14 15:59 Frederic Weisbecker
2010-04-04 14:36 Frederic Weisbecker
2010-04-04 14:42 ` Frederic Weisbecker
2010-04-04 19:02   ` Ingo Molnar
2010-04-05  9:56   ` Hitoshi Mitake
2010-03-05 21:55 Frederic Weisbecker
2010-03-10 14:28 ` Ingo Molnar
2010-02-27 18:25 [GIT PULL] Perf updates Frederic Weisbecker
2009-12-14 16:24 [GIT PULL] perf updates Ingo Molnar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.