* [RFCv2 00/15] perf: Add backtrace post dwarf unwind
@ 2012-04-17 11:17 Jiri Olsa
  2012-04-17 11:17 ` [PATCH 01/16] uaccess: Add new copy_from_user_gup API Jiri Olsa
                   ` (16 more replies)
  0 siblings, 17 replies; 28+ messages in thread
From: Jiri Olsa @ 2012-04-17 11:17 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, rostedt, robert.richter,
	fche, linux-kernel, masami.hiramatsu.pt, drepper

hi,
sending another RFC version. There are some fixes and some
as-yet-unresolved issues outlined below.

thanks for comments,
jirka

v2 changes:
	02/16 - fixed register enums
	12/16 - fixed the perf_evlist__* names
	14/16 - 'fp' stays default even if compiled with dwarf unwind
	15/16 - added a cache to the DSO data interface, which makes
                the dwarf unwind really fast now

v2 not solved yet:
  1) generic user regs capturing
     discussed in here:
       http://marc.info/?l=linux-kernel&m=133304076114629&w=2

     Looks like we could have a generic way of getting registers
     on samples and support more than just user level registers.
     But it seems it wasn't completely decided which register
     levels are worth storing..

     It looks to me like each level could add its own register
     mask, which would be used/saved if the mask is non zero -
     the same way the user level regs are dealt with now ;).

     Or we could add something that Stephane came up with:
       attr->sample_type |= PERF_SAMPLE_REGS
       attr->sample_regs = EAX | EBX | EDI | ESI |.....
       attr->sample_reg_mode = { INTR, PRECISE, USER }

     But I guess we need to decide first which levels make sense
     to store. Also if there are only 2 or 3, it might be
     better to use separate masks as pointed out above.

  2) compat task handling
     we don't handle compat task unwinding currently; we probably want
     to make it part of the 1) solution or add something like:
	__u64 user_regs_compat;

     Handling compat tasks would also need some other changes in the
     unwind code, and I'm not completely sure how libunwind deals with
     that; need to check.

     How much do we want this? ;)

  3) registers version names
     this one is now probably connected to 1) and 2) ;)
     I kept the register version, since I think it might be useful
     for dealing with compat tasks - to recognize what type of registers
     were stored.

---
Adding post-unwinding user stack backtraces using dwarf unwind
via libunwind. The original work was done by Frederic. I mostly took
his patches and made them compile against current kernel code, plus
I added some stuff here and there.

The main idea is to store the user registers and a portion of the
user stack in the sample data during the record phase. Then during
the report, when the data is presented, perform the actual dwarf
unwind.

attached patches:
 01/16 uaccess: Add new copy_from_user_gup API
 02/16 perf: Unified API to record selective sets of arch registers
 03/16 perf: Add ability to dump user regs
 04/16 perf: Add ability to dump part of the user stack
 05/16 perf: Add attribute to filter out user callchains
 06/16 perf, tool: Factor DSO symtab types to generic binary types
 07/16 perf, tool: Add interface to read DSO image data
 08/16 perf, tool: Add '.note' check into search for NOTE section
 09/16 perf, tool: Back [vdso] DSO with real data
 10/16 perf, tool: Add interface to arch registers sets
 11/16 perf, tool: Add libunwind dependency for dwarf cfi unwinding
 12/16 perf, tool: Support user regs and stack in sample parsing
 13/16 perf, tool: Support for dwarf cfi unwinding on post processing
 14/16 perf, tool: Support for dwarf mode callchain on perf record
 15/16 perf, tool: Add dso data caching
 16/16 perf, tool: Add dso data caching tests

I tested on Fedora. There was not much gain on i386, because the
binaries are compiled with frame pointers. Though the dwarf
backtrace is more accurate and unwinds calls in more detail
(functions that do not set up frame pointers).

I could see some improvement on x86_64, where I got a full backtrace
where the current code could get just the first address out of the
instruction pointer.

Example on x86_64:
[dwarf]
   perf record -g -e syscalls:sys_enter_write date

   100.00%     date  libc-2.14.90.so  [.] __GI___libc_write
               |
               --- __GI___libc_write
                   _IO_file_write@@GLIBC_2.2.5
                   new_do_write
                   _IO_do_write@@GLIBC_2.2.5
                   _IO_file_overflow@@GLIBC_2.2.5
                   0x4022cd
                   0x401ee6
                   __libc_start_main
                   0x4020b9


[frame pointer]
   perf record -g fp -e syscalls:sys_enter_write date

   100.00%     date  libc-2.14.90.so  [.] __GI___libc_write
               |
               --- __GI___libc_write

I tested mainly on coreutils binaries, but I could see wider
backtraces with dwarf unwind for more complex applications
like firefox.

The unwind should go through the [vdso] object. I haven't studied
the [vsyscall] yet, so I'm not sure there.

Attached patches should work on both x86 and x86_64. I did
some initial testing so far.

The unwind backtrace can be interrupted for the following reasons:
    - a bug in the unwind information of the processed shared library
    - a bug in the unwind processing code (most likely ;) )
    - insufficient dump stack size
    - a wrong register value - x86_64 does not store the whole
      register set when in an exception, but so far
      it looks like RIP and RSP should be enough

thanks for comments,
jirka
---
 arch/x86/include/asm/perf_regs.h                   |   16 +
 arch/x86/include/asm/perf_regs_32.h                |   84 +++
 arch/x86/include/asm/perf_regs_64.h                |  101 ++++
 arch/x86/include/asm/uaccess.h                     |    7 +-
 arch/x86/kernel/cpu/perf_event.c                   |    4 +-
 arch/x86/kernel/cpu/perf_event_intel_ds.c          |    3 +-
 arch/x86/kernel/cpu/perf_event_intel_lbr.c         |    2 +-
 arch/x86/lib/usercopy.c                            |    4 +-
 arch/x86/oprofile/backtrace.c                      |    4 +-
 include/asm-generic/perf_regs.h                    |   23 +
 include/asm-generic/uaccess.h                      |    4 +
 include/linux/perf_event.h                         |   18 +-
 kernel/events/callchain.c                          |    4 +-
 kernel/events/core.c                               |  127 +++++-
 kernel/events/internal.h                           |   59 ++-
 kernel/events/ring_buffer.c                        |    4 +-
 tools/perf/Makefile                                |   41 ++-
 tools/perf/arch/x86/Makefile                       |    3 +
 tools/perf/arch/x86/include/perf_regs.h            |  101 ++++
 tools/perf/arch/x86/util/unwind.c                  |  111 ++++
 tools/perf/builtin-record.c                        |   86 +++-
 tools/perf/builtin-report.c                        |   24 +-
 tools/perf/builtin-script.c                        |   56 ++-
 tools/perf/builtin-test.c                          |    7 +-
 tools/perf/builtin-top.c                           |    7 +-
 tools/perf/config/feature-tests.mak                |   25 +
 tools/perf/perf.h                                  |    9 +-
 tools/perf/util/annotate.c                         |    2 +-
 tools/perf/util/dso-test.c                         |  154 ++++++
 tools/perf/util/event.h                            |   15 +-
 tools/perf/util/evlist.c                           |   16 +
 tools/perf/util/evlist.h                           |    2 +
 tools/perf/util/evsel.c                            |   35 ++-
 tools/perf/util/include/linux/compiler.h           |    1 +
 tools/perf/util/map.c                              |   23 +-
 tools/perf/util/map.h                              |    7 +-
 tools/perf/util/perf_regs.h                        |   10 +
 tools/perf/util/python.c                           |    3 +-
 .../perf/util/scripting-engines/trace-event-perl.c |    3 +-
 .../util/scripting-engines/trace-event-python.c    |    3 +-
 tools/perf/util/session.c                          |  104 ++++-
 tools/perf/util/session.h                          |   10 +-
 tools/perf/util/symbol.c                           |  435 +++++++++++++---
 tools/perf/util/symbol.h                           |   52 ++-
 tools/perf/util/trace-event-scripting.c            |    3 +-
 tools/perf/util/trace-event.h                      |    5 +-
 tools/perf/util/unwind.c                           |  565 ++++++++++++++++++++
 tools/perf/util/unwind.h                           |   34 ++
 tools/perf/util/vdso.c                             |   90 +++
 tools/perf/util/vdso.h                             |    8 +
 50 files changed, 2315 insertions(+), 199 deletions(-)


* [PATCH 01/16] uaccess: Add new copy_from_user_gup API
  2012-04-17 11:17 [RFCv2 00/15] perf: Add backtrace post dwarf unwind Jiri Olsa
@ 2012-04-17 11:17 ` Jiri Olsa
  2012-04-17 11:17 ` [PATCH 02/16] perf: Unified API to record selective sets of arch registers Jiri Olsa
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2012-04-17 11:17 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, rostedt, robert.richter,
	fche, linux-kernel, masami.hiramatsu.pt, drepper, Jiri Olsa

This brings a get_user_pages_fast() based copy_from_user() that can
do a best-effort copy from any context.

In order to support user stack dumps safely in perf samples from
generic code, rename the x86 copy_from_user_nmi to copy_from_user_gup
and make it generally available. If the arch doesn't provide an
implementation, it maps to __copy_from_user_inatomic.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 arch/x86/include/asm/uaccess.h             |    7 +++++--
 arch/x86/kernel/cpu/perf_event.c           |    4 ++--
 arch/x86/kernel/cpu/perf_event_intel_ds.c  |    3 ++-
 arch/x86/kernel/cpu/perf_event_intel_lbr.c |    2 +-
 arch/x86/lib/usercopy.c                    |    4 ++--
 arch/x86/oprofile/backtrace.c              |    4 ++--
 include/asm-generic/uaccess.h              |    4 ++++
 7 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index e054459..b8724e9 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -228,6 +228,11 @@ extern void __put_user_2(void);
 extern void __put_user_4(void);
 extern void __put_user_8(void);
 
+extern unsigned long
+__copy_from_user_gup(void *to, const void __user *from, unsigned long n);
+
+#define copy_from_user_gup(to, from, n) __copy_from_user_gup(to, from, n)
+
 #ifdef CONFIG_X86_WP_WORKS_OK
 
 /**
@@ -555,8 +560,6 @@ struct __large_struct { unsigned long buf[100]; };
 
 #endif /* CONFIG_X86_WP_WORKS_OK */
 
-extern unsigned long
-copy_from_user_nmi(void *to, const void __user *from, unsigned long n);
 extern __must_check long
 strncpy_from_user(char *dst, const char __user *src, long count);
 
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index bb8e034..294ad68 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1781,7 +1781,7 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry)
 		frame.next_frame     = 0;
 		frame.return_address = 0;
 
-		bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
+		bytes = copy_from_user_gup(&frame, fp, sizeof(frame));
 		if (bytes != sizeof(frame))
 			break;
 
@@ -1827,7 +1827,7 @@ perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
 		frame.next_frame	     = NULL;
 		frame.return_address = 0;
 
-		bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
+		bytes = copy_from_user_gup(&frame, fp, sizeof(frame));
 		if (bytes != sizeof(frame))
 			break;
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 7f64df1..77053af 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -520,7 +520,8 @@ static int intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
 		if (!kernel_ip(ip)) {
 			int bytes, size = MAX_INSN_SIZE;
 
-			bytes = copy_from_user_nmi(buf, (void __user *)to, size);
+			bytes = copy_from_user_gup(buf, (void __user *)to,
+						   size);
 			if (bytes != size)
 				return 0;
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index 520b426..8f0b9ce 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -437,7 +437,7 @@ static int branch_type(unsigned long from, unsigned long to)
 			return X86_BR_NONE;
 
 		/* may fail if text not present */
-		bytes = copy_from_user_nmi(buf, (void __user *)from, size);
+		bytes = copy_from_user_gup(buf, (void __user *)from, size);
 		if (bytes != size)
 			return X86_BR_NONE;
 
diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index d6ae30b..e797d6e 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -13,7 +13,7 @@
  * best effort, GUP based copy_from_user() that is NMI-safe
  */
 unsigned long
-copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
+__copy_from_user_gup(void *to, const void __user *from, unsigned long n)
 {
 	unsigned long offset, addr = (unsigned long)from;
 	unsigned long size, len = 0;
@@ -42,7 +42,7 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
 
 	return len;
 }
-EXPORT_SYMBOL_GPL(copy_from_user_nmi);
+EXPORT_SYMBOL_GPL(__copy_from_user_gup);
 
 static inline unsigned long count_bytes(unsigned long mask)
 {
diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index d6aa6e8..42511b0 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -46,7 +46,7 @@ dump_user_backtrace_32(struct stack_frame_ia32 *head)
 	struct stack_frame_ia32 *fp;
 	unsigned long bytes;
 
-	bytes = copy_from_user_nmi(bufhead, head, sizeof(bufhead));
+	bytes = copy_from_user_gup(bufhead, head, sizeof(bufhead));
 	if (bytes != sizeof(bufhead))
 		return NULL;
 
@@ -92,7 +92,7 @@ static struct stack_frame *dump_user_backtrace(struct stack_frame *head)
 	struct stack_frame bufhead[2];
 	unsigned long bytes;
 
-	bytes = copy_from_user_nmi(bufhead, head, sizeof(bufhead));
+	bytes = copy_from_user_gup(bufhead, head, sizeof(bufhead));
 	if (bytes != sizeof(bufhead))
 		return NULL;
 
diff --git a/include/asm-generic/uaccess.h b/include/asm-generic/uaccess.h
index 9788568..759339b 100644
--- a/include/asm-generic/uaccess.h
+++ b/include/asm-generic/uaccess.h
@@ -240,6 +240,10 @@ extern int __get_user_bad(void) __attribute__((noreturn));
 #define __copy_to_user_inatomic __copy_to_user
 #endif
 
+#ifndef copy_from_user_gup
+#define copy_from_user_gup __copy_from_user_inatomic
+#endif
+
 static inline long copy_from_user(void *to,
 		const void __user * from, unsigned long n)
 {
-- 
1.7.7.6



* [PATCH 02/16] perf: Unified API to record selective sets of arch registers
  2012-04-17 11:17 [RFCv2 00/15] perf: Add backtrace post dwarf unwind Jiri Olsa
  2012-04-17 11:17 ` [PATCH 01/16] uaccess: Add new copy_from_user_gup API Jiri Olsa
@ 2012-04-17 11:17 ` Jiri Olsa
  2012-04-23 10:10   ` Stephane Eranian
  2012-04-17 11:17 ` [PATCH 03/16] perf: Add ability to dump user regs Jiri Olsa
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 28+ messages in thread
From: Jiri Olsa @ 2012-04-17 11:17 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, rostedt, robert.richter,
	fche, linux-kernel, masami.hiramatsu.pt, drepper, Jiri Olsa

This brings a new API to help selectively dump registers on
event sampling, and its implementation on x86.

- The information about the desired registers is passed in a
  single u64 mask. It's up to the architecture to map the
  registers onto the mask bits.

- The architecture must provide a non-zero and unique id to
  identify the origin of a register set, because interpreting a
  register dump requires knowing which architecture it comes from.
  The 32 and 64 bit versions are considered different architectures:
  x86-32 has the id 1, x86-64 has the id 2.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 arch/x86/include/asm/perf_regs.h    |   16 ++++++
 arch/x86/include/asm/perf_regs_32.h |   84 +++++++++++++++++++++++++++++
 arch/x86/include/asm/perf_regs_64.h |  101 +++++++++++++++++++++++++++++++++++
 include/asm-generic/perf_regs.h     |   23 ++++++++
 4 files changed, 224 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/perf_regs.h
 create mode 100644 arch/x86/include/asm/perf_regs_32.h
 create mode 100644 arch/x86/include/asm/perf_regs_64.h
 create mode 100644 include/asm-generic/perf_regs.h

diff --git a/arch/x86/include/asm/perf_regs.h b/arch/x86/include/asm/perf_regs.h
new file mode 100644
index 0000000..80b7fbe
--- /dev/null
+++ b/arch/x86/include/asm/perf_regs.h
@@ -0,0 +1,16 @@
+#ifndef _ASM_X86_PERF_REGS_H
+#define _ASM_X86_PERF_REGS_H
+
+enum {
+	PERF_REGS_VERSION_NONE   = 0UL,
+	PERF_REGS_VERSION_X86_32 = 1UL,
+	PERF_REGS_VERSION_X86_64 = 2UL,
+};
+
+#ifdef CONFIG_X86_32
+#include "perf_regs_32.h"
+#else
+#include "perf_regs_64.h"
+#endif
+
+#endif /* _ASM_X86_PERF_REGS_H */
diff --git a/arch/x86/include/asm/perf_regs_32.h b/arch/x86/include/asm/perf_regs_32.h
new file mode 100644
index 0000000..3c5aa80
--- /dev/null
+++ b/arch/x86/include/asm/perf_regs_32.h
@@ -0,0 +1,84 @@
+#ifndef _ASM_X86_PERF_REGS_32_H
+#define _ASM_X86_PERF_REGS_32_H
+
+enum perf_event_x86_32_regs {
+	PERF_X86_32_REG_EAX,
+	PERF_X86_32_REG_EBX,
+	PERF_X86_32_REG_ECX,
+	PERF_X86_32_REG_EDX,
+	PERF_X86_32_REG_ESI,
+	PERF_X86_32_REG_EDI,
+	PERF_X86_32_REG_EBP,
+	PERF_X86_32_REG_ESP,
+	PERF_X86_32_REG_EIP,
+	PERF_X86_32_REG_FLAGS,
+	PERF_X86_32_REG_CS,
+	PERF_X86_32_REG_DS,
+	PERF_X86_32_REG_ES,
+	PERF_X86_32_REG_FS,
+	PERF_X86_32_REG_GS,
+
+	/* Non ABI */
+	PERF_X86_32_REG_MAX,
+	PERF_REG_IP = PERF_X86_32_REG_EIP,
+	PERF_REG_SP = PERF_X86_32_REG_ESP,
+};
+
+#ifdef __KERNEL__
+
+#define PERF_X86_32_REG_RESERVED (~((1ULL << PERF_X86_32_REG_MAX) - 1ULL))
+
+static inline u64 perf_reg_version(void)
+{
+	return PERF_REGS_VERSION_X86_32;
+}
+
+static inline int perf_reg_validate(u64 mask)
+{
+	if (mask & PERF_X86_32_REG_RESERVED)
+		return -EINVAL;
+
+	return 0;
+}
+
+static inline u64 perf_reg_value(struct pt_regs *regs, int idx)
+{
+	switch (idx) {
+	case PERF_X86_32_REG_EAX:
+		return regs->ax;
+	case PERF_X86_32_REG_EBX:
+		return regs->bx;
+	case PERF_X86_32_REG_ECX:
+		return regs->cx;
+	case PERF_X86_32_REG_EDX:
+		return regs->dx;
+	case PERF_X86_32_REG_ESI:
+		return regs->si;
+	case PERF_X86_32_REG_EDI:
+		return regs->di;
+	case PERF_X86_32_REG_EBP:
+		return regs->bp;
+	case PERF_X86_32_REG_ESP:
+		return regs->sp;
+	case PERF_X86_32_REG_EIP:
+		return regs->ip;
+	case PERF_X86_32_REG_FLAGS:
+		return regs->flags;
+	case PERF_X86_32_REG_CS:
+		return regs->cs;
+	case PERF_X86_32_REG_DS:
+		return regs->ds;
+	case PERF_X86_32_REG_ES:
+		return regs->es;
+	case PERF_X86_32_REG_FS:
+		return regs->fs;
+	case PERF_X86_32_REG_GS:
+		return regs->gs;
+	}
+
+	return 0;
+}
+
+#endif /* __KERNEL__ */
+
+#endif /* _ASM_X86_PERF_REGS_32_H */
diff --git a/arch/x86/include/asm/perf_regs_64.h b/arch/x86/include/asm/perf_regs_64.h
new file mode 100644
index 0000000..d775213
--- /dev/null
+++ b/arch/x86/include/asm/perf_regs_64.h
@@ -0,0 +1,101 @@
+#ifndef _ASM_X86_PERF_REGS_64_H
+#define _ASM_X86_PERF_REGS_64_H
+
+#define PERF_X86_64_REG_VERSION		1ULL
+
+enum perf_event_x86_64_regs {
+	PERF_X86_64_REG_RAX,
+	PERF_X86_64_REG_RBX,
+	PERF_X86_64_REG_RCX,
+	PERF_X86_64_REG_RDX,
+	PERF_X86_64_REG_RSI,
+	PERF_X86_64_REG_RDI,
+	PERF_X86_64_REG_R8,
+	PERF_X86_64_REG_R9,
+	PERF_X86_64_REG_R10,
+	PERF_X86_64_REG_R11,
+	PERF_X86_64_REG_R12,
+	PERF_X86_64_REG_R13,
+	PERF_X86_64_REG_R14,
+	PERF_X86_64_REG_R15,
+	PERF_X86_64_REG_RBP,
+	PERF_X86_64_REG_RSP,
+	PERF_X86_64_REG_RIP,
+	PERF_X86_64_REG_FLAGS,
+	PERF_X86_64_REG_CS,
+	PERF_X86_64_REG_SS,
+
+	/* Non ABI */
+	PERF_X86_64_REG_MAX,
+	PERF_REG_IP = PERF_X86_64_REG_RIP,
+	PERF_REG_SP = PERF_X86_64_REG_RSP,
+};
+
+#ifdef __KERNEL__
+
+#define PERF_X86_64_REG_RESERVED (~((1ULL << PERF_X86_64_REG_MAX) - 1ULL))
+
+static inline u64 perf_reg_version(void)
+{
+	return PERF_REGS_VERSION_X86_64;
+}
+
+static inline int perf_reg_validate(u64 mask)
+{
+	if (mask & PERF_X86_64_REG_RESERVED)
+		return -EINVAL;
+
+	return 0;
+}
+
+static inline u64 perf_reg_value(struct pt_regs *regs, int idx)
+{
+	switch (idx) {
+	case PERF_X86_64_REG_RAX:
+		return regs->ax;
+	case PERF_X86_64_REG_RBX:
+		return regs->bx;
+	case PERF_X86_64_REG_RCX:
+		return regs->cx;
+	case PERF_X86_64_REG_RDX:
+		return regs->dx;
+	case PERF_X86_64_REG_RSI:
+		return regs->si;
+	case PERF_X86_64_REG_RDI:
+		return regs->di;
+	case PERF_X86_64_REG_R8:
+		return regs->r8;
+	case PERF_X86_64_REG_R9:
+		return regs->r9;
+	case PERF_X86_64_REG_R10:
+		return regs->r10;
+	case PERF_X86_64_REG_R11:
+		return regs->r11;
+	case PERF_X86_64_REG_R12:
+		return regs->r12;
+	case PERF_X86_64_REG_R13:
+		return regs->r13;
+	case PERF_X86_64_REG_R14:
+		return regs->r14;
+	case PERF_X86_64_REG_R15:
+		return regs->r15;
+	case PERF_X86_64_REG_RBP:
+		return regs->bp;
+	case PERF_X86_64_REG_RSP:
+		return regs->sp;
+	case PERF_X86_64_REG_RIP:
+		return regs->ip;
+	case PERF_X86_64_REG_FLAGS:
+		return regs->flags;
+	case PERF_X86_64_REG_CS:
+		return regs->cs;
+	case PERF_X86_64_REG_SS:
+		return regs->ss;
+	}
+
+	return 0;
+}
+
+#endif /* __KERNEL__ */
+
+#endif /* _ASM_X86_PERF_REGS_64_H */
diff --git a/include/asm-generic/perf_regs.h b/include/asm-generic/perf_regs.h
new file mode 100644
index 0000000..f616096
--- /dev/null
+++ b/include/asm-generic/perf_regs.h
@@ -0,0 +1,23 @@
+#ifndef __ASM_GENERIC_PERF_REGS_H
+#define __ASM_GENERIC_PERF_REGS_H
+
+enum {
+	PERF_REGS_VERSION_NONE = 0UL,
+};
+
+static inline int perf_reg_value(struct pt_regs *regs, int idx)
+{
+	return 0;
+}
+
+static inline int perf_reg_version(void)
+{
+	return PERF_REGS_VERSION_NONE;
+}
+
+static inline int perf_reg_validate(u64 mask)
+{
+	return mask ? -ENOSYS : 0;
+}
+
+#endif /* __ASM_GENERIC_PERF_REGS_H */
-- 
1.7.7.6



* [PATCH 03/16] perf: Add ability to dump user regs
  2012-04-17 11:17 [RFCv2 00/15] perf: Add backtrace post dwarf unwind Jiri Olsa
  2012-04-17 11:17 ` [PATCH 01/16] uaccess: Add new copy_from_user_gup API Jiri Olsa
  2012-04-17 11:17 ` [PATCH 02/16] perf: Unified API to record selective sets of arch registers Jiri Olsa
@ 2012-04-17 11:17 ` Jiri Olsa
  2012-04-23 10:15   ` Stephane Eranian
  2012-04-17 11:17 ` [PATCH 04/16] perf: Add ability to dump part of the user stack Jiri Olsa
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 28+ messages in thread
From: Jiri Olsa @ 2012-04-17 11:17 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, rostedt, robert.richter,
	fche, linux-kernel, masami.hiramatsu.pt, drepper, Jiri Olsa

Add a new attr->user_regs bitmap that lets a user choose a set
of user registers to dump in the sample. The layout of this
bitmap is described in asm/perf_regs.h for archs that
support register dump.

The perf syscall will fail if attr->user_regs is non zero on an
arch that doesn't support register dump, or if the mask contains
unsupported register bits.

The register values here are those of the user space context as
it was before the user entered the kernel for whatever reason
(syscall, irq, exception, or a PMI happening in userspace).

This is going to be useful to bring Dwarf CFI based stack unwinding
on top of samples.

TODO handle compat tasks

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/perf_event.h |    8 +++++
 kernel/events/core.c       |   63 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+), 0 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ddbb6a9..c63b807 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -272,6 +272,12 @@ struct perf_event_attr {
 		__u64		config2; /* extension of config1 */
 	};
 	__u64	branch_sample_type; /* enum branch_sample_type */
+
+	/*
+	 * Arch specific mask that defines a set of user regs to dump on
+	 * samples. See asm/perf_regs.h for details.
+	 */
+	__u64			user_regs;
 };
 
 /*
@@ -608,6 +614,7 @@ struct perf_guest_info_callbacks {
 #include <linux/atomic.h>
 #include <linux/sysfs.h>
 #include <asm/local.h>
+#include <asm/perf_regs.h>
 
 #define PERF_MAX_STACK_DEPTH		255
 
@@ -1130,6 +1137,7 @@ struct perf_sample_data {
 	struct perf_callchain_entry	*callchain;
 	struct perf_raw_record		*raw;
 	struct perf_branch_stack	*br_stack;
+	struct pt_regs			*uregs;
 };
 
 static inline void perf_sample_data_init(struct perf_sample_data *data, u64 addr)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index a6a9ec4..9f29fc3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3751,6 +3751,37 @@ int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
 }
 EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);
 
+static void
+perf_output_sample_regs(struct perf_output_handle *handle,
+			struct pt_regs *regs, u64 mask)
+{
+	int i = 0;
+
+	do {
+		u64 val;
+
+		if (mask & 1) {
+			val = perf_reg_value(regs, i);
+			perf_output_put(handle, val);
+		}
+
+		mask >>= 1;
+		i++;
+	} while (mask);
+}
+
+static struct pt_regs *perf_sample_uregs(struct pt_regs *regs)
+{
+	if (!user_mode(regs)) {
+		if (current->mm)
+			regs = task_pt_regs(current);
+		else
+			regs = NULL;
+	}
+
+	return regs;
+}
+
 static void __perf_event_header__init_id(struct perf_event_header *header,
 					 struct perf_sample_data *data,
 					 struct perf_event *event)
@@ -4011,6 +4042,21 @@ void perf_output_sample(struct perf_output_handle *handle,
 			perf_output_put(handle, nr);
 		}
 	}
+
+	if (event->attr.user_regs) {
+		u64 id;
+
+		/*
+		 * If there are no regs to dump, notice it through a
+		 * PERF_REGS_VERSION_NONE version.
+		 */
+		id = data->uregs ? perf_reg_version() : PERF_REGS_VERSION_NONE;
+		perf_output_put(handle, id);
+
+		if (id)
+			perf_output_sample_regs(handle, data->uregs,
+						event->attr.user_regs);
+	}
 }
 
 void perf_prepare_sample(struct perf_event_header *header,
@@ -4062,6 +4108,16 @@ void perf_prepare_sample(struct perf_event_header *header,
 		}
 		header->size += size;
 	}
+
+	if (event->attr.user_regs) {
+		int size = sizeof(u64); /* the version size */
+
+		data->uregs = perf_sample_uregs(regs);
+		if (data->uregs)
+			size += hweight64(event->attr.user_regs) * sizeof(u64);
+
+		header->size += size;
+	}
 }
 
 static void perf_event_output(struct perf_event *event,
@@ -6112,6 +6168,13 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr,
 			attr->branch_sample_type = mask;
 		}
 	}
+
+	/*
+	 * Don't let throught invalid register mask (i.e. the architecture
+	 * does not support register dump at all).
+	 */
+	ret = perf_reg_validate(attr->user_regs);
+
 out:
 	return ret;
 
-- 
1.7.7.6



* [PATCH 04/16] perf: Add ability to dump part of the user stack
  2012-04-17 11:17 [RFCv2 00/15] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (2 preceding siblings ...)
  2012-04-17 11:17 ` [PATCH 03/16] perf: Add ability to dump user regs Jiri Olsa
@ 2012-04-17 11:17 ` Jiri Olsa
  2012-04-17 11:17 ` [PATCH 05/16] perf: Add attribute to filter out user callchains Jiri Olsa
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2012-04-17 11:17 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, rostedt, robert.richter,
	fche, linux-kernel, masami.hiramatsu.pt, drepper, Jiri Olsa

Being able to dump parts of the user stack, starting from the
stack pointer, will be useful for post-mortem dwarf CFI based
stack unwinding.

This is done through the new ustack_dump_size perf attribute. If it
is non zero, the user stack will be dumped in samples, up to the
requested size in bytes.

The larger the dump, the deeper the resulting retrieved
callchain can be.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/perf_event.h  |    6 +++-
 kernel/events/core.c        |   61 +++++++++++++++++++++++++++++++++++++++++++
 kernel/events/internal.h    |   56 ++++++++++++++++++++++++---------------
 kernel/events/ring_buffer.c |    4 +-
 4 files changed, 102 insertions(+), 25 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index c63b807..e40bac5 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -278,6 +278,7 @@ struct perf_event_attr {
 	 * samples. See asm/perf_regs.h for details.
 	 */
 	__u64			user_regs;
+	__u32			ustack_dump_size;
 };
 
 /*
@@ -1291,8 +1292,9 @@ static inline bool has_branch_stack(struct perf_event *event)
 extern int perf_output_begin(struct perf_output_handle *handle,
 			     struct perf_event *event, unsigned int size);
 extern void perf_output_end(struct perf_output_handle *handle);
-extern void perf_output_copy(struct perf_output_handle *handle,
-			     const void *buf, unsigned int len);
+extern unsigned int
+perf_output_copy(struct perf_output_handle *handle,
+		 const void *buf, unsigned int len);
 extern int perf_swevent_get_recursion_context(void);
 extern void perf_swevent_put_recursion_context(int rctx);
 extern void perf_event_enable(struct perf_event *event);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9f29fc3..5bae705 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4057,6 +4057,41 @@ void perf_output_sample(struct perf_output_handle *handle,
 			perf_output_sample_regs(handle, data->uregs,
 						event->attr.user_regs);
 	}
+
+	if (event->attr.ustack_dump_size) {
+		unsigned long sp;
+		unsigned int rem;
+		u64 size, dyn_size;
+
+		/* Case of a kernel thread, nothing to dump */
+		if (!data->uregs) {
+			size = 0;
+			perf_output_put(handle, size);
+		} else {
+
+			/*
+			 * Static size: we always dump the size requested by
+			 * the user because most of the time, the top of the
+			 * user stack is not paged out.
+			 */
+			size = event->attr.ustack_dump_size;
+			size = round_up(size, sizeof(u64));
+			perf_output_put(handle, size);
+
+			sp = user_stack_pointer(data->uregs);
+			rem = __output_copy_user_gup(handle, (void *)sp, size);
+			dyn_size = size - rem;
+
+			/* What couldn't be dumped is zero padded */
+			while (rem--) {
+				char zero = 0;
+				perf_output_put(handle, zero);
+			}
+
+			/* Dynamic size: whole dump - padding */
+			perf_output_put(handle, dyn_size);
+		}
+	}
 }
 
 void perf_prepare_sample(struct perf_event_header *header,
@@ -4118,6 +4153,32 @@ void perf_prepare_sample(struct perf_event_header *header,
 
 		header->size += size;
 	}
+
+	if (event->attr.ustack_dump_size) {
+		if (!event->attr.user_regs)
+			data->uregs = perf_sample_uregs(regs);
+
+		/*
+		 * A first field that tells the _static_ size of the dump. 0 if
+		 * there is nothing to dump (ie: we are in a kernel thread)
+		 * otherwise the requested size.
+		 */
+		header->size += sizeof(u64);
+
+		/*
+		 * If there is something to dump, add space for the dump itself
+		 * and for the field that tells the _dynamic_ size, which is
+		 * how many have been actually dumped. What couldn't be dumped
+		 * will be zero-padded.
+		 */
+		if (data->uregs) {
+			u64 size = event->attr.ustack_dump_size;
+
+			size = round_up(size, sizeof(u64));
+			header->size += size;
+			header->size += sizeof(u64);
+		}
+	}
 }
 
 static void perf_event_output(struct perf_event *event,
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index b0b107f..1ae5270 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -76,30 +76,44 @@ static inline unsigned long perf_data_size(struct ring_buffer *rb)
 	return rb->nr_pages << (PAGE_SHIFT + page_order(rb));
 }
 
-static inline void
-__output_copy(struct perf_output_handle *handle,
-		   const void *buf, unsigned int len)
+static int memcpy_common(void *dst, const void *src, size_t n)
 {
-	do {
-		unsigned long size = min_t(unsigned long, handle->size, len);
-
-		memcpy(handle->addr, buf, size);
-
-		len -= size;
-		handle->addr += size;
-		buf += size;
-		handle->size -= size;
-		if (!handle->size) {
-			struct ring_buffer *rb = handle->rb;
-
-			handle->page++;
-			handle->page &= rb->nr_pages - 1;
-			handle->addr = rb->data_pages[handle->page];
-			handle->size = PAGE_SIZE << page_order(rb);
-		}
-	} while (len);
+	memcpy(dst, src, n);
+	return n;
 }
 
+#define DEFINE_PERF_OUTPUT_COPY(func_name, memcpy_func)			\
+static inline unsigned int						\
+func_name(struct perf_output_handle *handle,				\
+	  const void *buf, unsigned int len)				\
+{									\
+	unsigned long size, written;					\
+									\
+	do {								\
+		size = min_t(unsigned long, handle->size, len);		\
+									\
+		written = memcpy_func(handle->addr, buf, size);		\
+									\
+		len -= written;						\
+		handle->addr += written;				\
+		buf += written;						\
+		handle->size -= written;				\
+		if (!handle->size) {					\
+			struct ring_buffer *rb = handle->rb;		\
+									\
+			handle->page++;					\
+			handle->page &= rb->nr_pages - 1;		\
+			handle->addr = rb->data_pages[handle->page];	\
+			handle->size = PAGE_SIZE << page_order(rb);	\
+		}							\
+	} while (len && written == size);				\
+									\
+	return len;							\
+}
+
+DEFINE_PERF_OUTPUT_COPY(__output_copy, memcpy_common)
+DEFINE_PERF_OUTPUT_COPY(__output_copy_user_gup, copy_from_user_gup)
+
 /* Callchain handling */
 extern struct perf_callchain_entry *perf_callchain(struct pt_regs *regs);
 extern int get_callchain_buffers(void);
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 6ddaba4..b4c2ad3 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -182,10 +182,10 @@ out:
 	return -ENOSPC;
 }
 
-void perf_output_copy(struct perf_output_handle *handle,
+unsigned int perf_output_copy(struct perf_output_handle *handle,
 		      const void *buf, unsigned int len)
 {
-	__output_copy(handle, buf, len);
+	return __output_copy(handle, buf, len);
 }
 
 void perf_output_end(struct perf_output_handle *handle)
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 28+ messages in thread
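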

* [PATCH 05/16] perf: Add attribute to filter out user callchains
  2012-04-17 11:17 [RFCv2 00/15] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (3 preceding siblings ...)
  2012-04-17 11:17 ` [PATCH 04/16] perf: Add ability to dump part of the user stack Jiri Olsa
@ 2012-04-17 11:17 ` Jiri Olsa
  2012-04-17 11:17 ` [PATCH 06/16] perf, tool: Factor DSO symtab types to generic binary types Jiri Olsa
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2012-04-17 11:17 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, rostedt, robert.richter,
	fche, linux-kernel, masami.hiramatsu.pt, drepper, Jiri Olsa

Add a new exclude_user_callchain attribute to filter out
frame-pointer based user callchains. We want to set this
when using the dwarf CFI callchain mode, because
frame-pointer based user callchains are useless in that
mode.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/perf_event.h |    4 +++-
 kernel/events/callchain.c  |    4 ++--
 kernel/events/core.c       |    3 ++-
 kernel/events/internal.h   |    3 ++-
 4 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e40bac5..b121c05 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -255,7 +255,9 @@ struct perf_event_attr {
 				exclude_host   :  1, /* don't count in host   */
 				exclude_guest  :  1, /* don't count in guest  */
 
-				__reserved_1   : 43;
+				exclude_user_callchain : 1, /* only record kernel callchains */
+
+				__reserved_1   : 42;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index 6581a04..884e997 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -153,7 +153,7 @@ put_callchain_entry(int rctx)
 	put_recursion_context(__get_cpu_var(callchain_recursion), rctx);
 }
 
-struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
+struct perf_callchain_entry *perf_callchain(struct pt_regs *regs, int user)
 {
 	int rctx;
 	struct perf_callchain_entry *entry;
@@ -177,7 +177,7 @@ struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
 			regs = NULL;
 	}
 
-	if (regs) {
+	if (user && regs) {
 		perf_callchain_store(entry, PERF_CONTEXT_USER);
 		perf_callchain_user(entry, regs);
 	}
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5bae705..35c28f5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4114,8 +4114,9 @@ void perf_prepare_sample(struct perf_event_header *header,
 
 	if (sample_type & PERF_SAMPLE_CALLCHAIN) {
 		int size = 1;
+		int user = !event->attr.exclude_user_callchain;
 
-		data->callchain = perf_callchain(regs);
+		data->callchain = perf_callchain(regs, user);
 
 		if (data->callchain)
 			size += data->callchain->nr;
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 1ae5270..b3ade19 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -115,7 +115,8 @@ DEFINE_PERF_OUTPUT_COPY(__output_copy, memcpy_common)
 DEFINE_PERF_OUTPUT_COPY(__output_copy_user_gup, copy_from_user_gup)
 
 /* Callchain handling */
-extern struct perf_callchain_entry *perf_callchain(struct pt_regs *regs);
+extern struct perf_callchain_entry *perf_callchain(struct pt_regs *regs,
+						   int user);
 extern int get_callchain_buffers(void);
 extern void put_callchain_buffers(void);
 
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 28+ messages in thread


* [PATCH 06/16] perf, tool: Factor DSO symtab types to generic binary types
  2012-04-17 11:17 [RFCv2 00/15] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (4 preceding siblings ...)
  2012-04-17 11:17 ` [PATCH 05/16] perf: Add attribute to filter out user callchains Jiri Olsa
@ 2012-04-17 11:17 ` Jiri Olsa
  2012-04-17 11:17 ` [PATCH 07/16] perf, tool: Add interface to read DSO image data Jiri Olsa
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2012-04-17 11:17 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, rostedt, robert.richter,
	fche, linux-kernel, masami.hiramatsu.pt, drepper, Jiri Olsa

Add an interface to access DSOs so it can be used
from other places.

A new DSO binary type is added, making the current SYMTAB__*
types more general:
   DSO_BINARY_TYPE__* = SYMTAB__*

The following function is added to return a path based on the
specified binary type:
   dso__binary_type_file

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/builtin-top.c   |    2 +-
 tools/perf/util/annotate.c |    2 +-
 tools/perf/util/symbol.c   |  181 +++++++++++++++++++++++++++-----------------
 tools/perf/util/symbol.h   |   32 ++++----
 4 files changed, 129 insertions(+), 88 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 8ef59f8..10184f6 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -125,7 +125,7 @@ static int perf_top__parse_source(struct perf_top *top, struct hist_entry *he)
 	/*
 	 * We can't annotate with just /proc/kallsyms
 	 */
-	if (map->dso->symtab_type == SYMTAB__KALLSYMS) {
+	if (map->dso->symtab_type == DSO_BINARY_TYPE__KALLSYMS) {
 		pr_err("Can't annotate %s: No vmlinux file was found in the "
 		       "path\n", sym->name);
 		sleep(1);
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 1e7fd52..7ceef29 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -307,7 +307,7 @@ fallback:
 		free_filename = false;
 	}
 
-	if (dso->symtab_type == SYMTAB__KALLSYMS) {
+	if (dso->symtab_type == DSO_BINARY_TYPE__KALLSYMS) {
 		char bf[BUILD_ID_SIZE * 2 + 16] = " with build id ";
 		char *build_id_msg = NULL;
 
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index c0a028c..09b440c 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -48,6 +48,22 @@ struct symbol_conf symbol_conf = {
 	.symfs            = "",
 };
 
+static enum dso_binary_type binary_type_symtab[] = {
+	DSO_BINARY_TYPE__KALLSYMS,
+	DSO_BINARY_TYPE__GUEST_KALLSYMS,
+	DSO_BINARY_TYPE__JAVA_JIT,
+	DSO_BINARY_TYPE__BUILD_ID_CACHE,
+	DSO_BINARY_TYPE__FEDORA_DEBUGINFO,
+	DSO_BINARY_TYPE__UBUNTU_DEBUGINFO,
+	DSO_BINARY_TYPE__BUILDID_DEBUGINFO,
+	DSO_BINARY_TYPE__SYSTEM_PATH_DSO,
+	DSO_BINARY_TYPE__GUEST_KMODULE,
+	DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE,
+	DSO_BINARY_TYPE__NOT_FOUND,
+};
+
+#define DSO_BINARY_TYPE__SYMTAB_CNT sizeof(binary_type_symtab)
+
 int dso__name_len(const struct dso *dso)
 {
 	if (!dso)
@@ -318,7 +334,7 @@ struct dso *dso__new(const char *name)
 		dso__set_short_name(dso, dso->name);
 		for (i = 0; i < MAP__NR_TYPES; ++i)
 			dso->symbols[i] = dso->symbol_names[i] = RB_ROOT;
-		dso->symtab_type = SYMTAB__NOT_FOUND;
+		dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
 		dso->loaded = 0;
 		dso->sorted_by_name = 0;
 		dso->has_build_id = 0;
@@ -805,9 +821,9 @@ int dso__load_kallsyms(struct dso *dso, const char *filename,
 	symbols__fixup_end(&dso->symbols[map->type]);
 
 	if (dso->kernel == DSO_TYPE_GUEST_KERNEL)
-		dso->symtab_type = SYMTAB__GUEST_KALLSYMS;
+		dso->symtab_type = DSO_BINARY_TYPE__GUEST_KALLSYMS;
 	else
-		dso->symtab_type = SYMTAB__KALLSYMS;
+		dso->symtab_type = DSO_BINARY_TYPE__KALLSYMS;
 
 	return dso__split_kallsyms(dso, map, filter);
 }
@@ -1564,31 +1580,96 @@ out:
 char dso__symtab_origin(const struct dso *dso)
 {
 	static const char origin[] = {
-		[SYMTAB__KALLSYMS]	      = 'k',
-		[SYMTAB__JAVA_JIT]	      = 'j',
-		[SYMTAB__BUILD_ID_CACHE]      = 'B',
-		[SYMTAB__FEDORA_DEBUGINFO]    = 'f',
-		[SYMTAB__UBUNTU_DEBUGINFO]    = 'u',
-		[SYMTAB__BUILDID_DEBUGINFO]   = 'b',
-		[SYMTAB__SYSTEM_PATH_DSO]     = 'd',
-		[SYMTAB__SYSTEM_PATH_KMODULE] = 'K',
-		[SYMTAB__GUEST_KALLSYMS]      =  'g',
-		[SYMTAB__GUEST_KMODULE]	      =  'G',
+		[DSO_BINARY_TYPE__KALLSYMS]		= 'k',
+		[DSO_BINARY_TYPE__JAVA_JIT]		= 'j',
+		[DSO_BINARY_TYPE__BUILD_ID_CACHE]	= 'B',
+		[DSO_BINARY_TYPE__FEDORA_DEBUGINFO]	= 'f',
+		[DSO_BINARY_TYPE__UBUNTU_DEBUGINFO]	= 'u',
+		[DSO_BINARY_TYPE__BUILDID_DEBUGINFO]	= 'b',
+		[DSO_BINARY_TYPE__SYSTEM_PATH_DSO]	= 'd',
+		[DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE]	= 'K',
+		[DSO_BINARY_TYPE__GUEST_KALLSYMS]	= 'g',
+		[DSO_BINARY_TYPE__GUEST_KMODULE]	= 'G',
 	};
 
-	if (dso == NULL || dso->symtab_type == SYMTAB__NOT_FOUND)
+	if (dso == NULL || dso->symtab_type == DSO_BINARY_TYPE__NOT_FOUND)
 		return '!';
 	return origin[dso->symtab_type];
 }
 
+int dso__binary_type_file(struct dso *dso, enum dso_binary_type type,
+			  char *root_dir, char *file, size_t size)
+{
+	char build_id_hex[BUILD_ID_SIZE * 2 + 1];
+	int ret = 0;
+
+	switch (type) {
+	case DSO_BINARY_TYPE__BUILD_ID_CACHE:
+		/* skip the locally configured cache if a symfs is given */
+		if (symbol_conf.symfs[0] ||
+		    (dso__build_id_filename(dso, file, size) == NULL))
+			ret = -1;
+		break;
+
+	case DSO_BINARY_TYPE__FEDORA_DEBUGINFO:
+		snprintf(file, size, "%s/usr/lib/debug%s.debug",
+			 symbol_conf.symfs, dso->long_name);
+		break;
+
+	case DSO_BINARY_TYPE__UBUNTU_DEBUGINFO:
+		snprintf(file, size, "%s/usr/lib/debug%s",
+			 symbol_conf.symfs, dso->long_name);
+		break;
+
+	case DSO_BINARY_TYPE__BUILDID_DEBUGINFO:
+		if (!dso->has_build_id) {
+			ret = -1;
+			break;
+		}
+
+		build_id__sprintf(dso->build_id,
+				  sizeof(dso->build_id),
+				  build_id_hex);
+		snprintf(file, size,
+			 "%s/usr/lib/debug/.build-id/%.2s/%s.debug",
+			 symbol_conf.symfs, build_id_hex, build_id_hex + 2);
+		break;
+
+	case DSO_BINARY_TYPE__SYSTEM_PATH_DSO:
+		snprintf(file, size, "%s%s",
+			 symbol_conf.symfs, dso->long_name);
+		break;
+
+	case DSO_BINARY_TYPE__GUEST_KMODULE:
+		snprintf(file, size, "%s%s%s", symbol_conf.symfs,
+			 root_dir, dso->long_name);
+		break;
+
+	case DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE:
+		snprintf(file, size, "%s%s", symbol_conf.symfs,
+			 dso->long_name);
+		break;
+
+	default:
+	case DSO_BINARY_TYPE__KALLSYMS:
+	case DSO_BINARY_TYPE__GUEST_KALLSYMS:
+	case DSO_BINARY_TYPE__JAVA_JIT:
+	case DSO_BINARY_TYPE__NOT_FOUND:
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
 int dso__load(struct dso *dso, struct map *map, symbol_filter_t filter)
 {
-	int size = PATH_MAX;
 	char *name;
 	int ret = -1;
 	int fd;
+	u_int i;
 	struct machine *machine;
-	const char *root_dir;
+	char *root_dir = (char *) "";
 	int want_symtab;
 
 	dso__set_loaded(dso, map->type);
@@ -1603,7 +1684,7 @@ int dso__load(struct dso *dso, struct map *map, symbol_filter_t filter)
 	else
 		machine = NULL;
 
-	name = malloc(size);
+	name = malloc(PATH_MAX);
 	if (!name)
 		return -1;
 
@@ -1622,69 +1703,27 @@ int dso__load(struct dso *dso, struct map *map, symbol_filter_t filter)
 		}
 
 		ret = dso__load_perf_map(dso, map, filter);
-		dso->symtab_type = ret > 0 ? SYMTAB__JAVA_JIT :
-					      SYMTAB__NOT_FOUND;
+		dso->symtab_type = ret > 0 ? DSO_BINARY_TYPE__JAVA_JIT :
+					     DSO_BINARY_TYPE__NOT_FOUND;
 		return ret;
 	}
 
+	if (machine)
+		root_dir = machine->root_dir;
+
 	/* Iterate over candidate debug images.
 	 * On the first pass, only load images if they have a full symtab.
 	 * Failing that, do a second pass where we accept .dynsym also
 	 */
 	want_symtab = 1;
 restart:
-	for (dso->symtab_type = SYMTAB__BUILD_ID_CACHE;
-	     dso->symtab_type != SYMTAB__NOT_FOUND;
-	     dso->symtab_type++) {
-		switch (dso->symtab_type) {
-		case SYMTAB__BUILD_ID_CACHE:
-			/* skip the locally configured cache if a symfs is given */
-			if (symbol_conf.symfs[0] ||
-			    (dso__build_id_filename(dso, name, size) == NULL)) {
-				continue;
-			}
-			break;
-		case SYMTAB__FEDORA_DEBUGINFO:
-			snprintf(name, size, "%s/usr/lib/debug%s.debug",
-				 symbol_conf.symfs, dso->long_name);
-			break;
-		case SYMTAB__UBUNTU_DEBUGINFO:
-			snprintf(name, size, "%s/usr/lib/debug%s",
-				 symbol_conf.symfs, dso->long_name);
-			break;
-		case SYMTAB__BUILDID_DEBUGINFO: {
-			char build_id_hex[BUILD_ID_SIZE * 2 + 1];
-
-			if (!dso->has_build_id)
-				continue;
+	for (i = 0; i < DSO_BINARY_TYPE__SYMTAB_CNT; i++) {
 
-			build_id__sprintf(dso->build_id,
-					  sizeof(dso->build_id),
-					  build_id_hex);
-			snprintf(name, size,
-				 "%s/usr/lib/debug/.build-id/%.2s/%s.debug",
-				 symbol_conf.symfs, build_id_hex, build_id_hex + 2);
-			}
-			break;
-		case SYMTAB__SYSTEM_PATH_DSO:
-			snprintf(name, size, "%s%s",
-			     symbol_conf.symfs, dso->long_name);
-			break;
-		case SYMTAB__GUEST_KMODULE:
-			if (map->groups && machine)
-				root_dir = machine->root_dir;
-			else
-				root_dir = "";
-			snprintf(name, size, "%s%s%s", symbol_conf.symfs,
-				 root_dir, dso->long_name);
-			break;
+		dso->symtab_type = binary_type_symtab[i];
 
-		case SYMTAB__SYSTEM_PATH_KMODULE:
-			snprintf(name, size, "%s%s", symbol_conf.symfs,
-				 dso->long_name);
-			break;
-		default:;
-		}
+		if (dso__binary_type_file(dso, dso->symtab_type,
+					  root_dir, name, PATH_MAX))
+			continue;
 
 		/* Name is now the name of the next image to try */
 		fd = open(name, O_RDONLY);
@@ -1900,9 +1939,9 @@ struct map *machine__new_module(struct machine *machine, u64 start,
 		return NULL;
 
 	if (machine__is_host(machine))
-		dso->symtab_type = SYMTAB__SYSTEM_PATH_KMODULE;
+		dso->symtab_type = DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE;
 	else
-		dso->symtab_type = SYMTAB__GUEST_KMODULE;
+		dso->symtab_type = DSO_BINARY_TYPE__GUEST_KMODULE;
 	map_groups__insert(&machine->kmaps, map);
 	return map;
 }
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index ac49ef2..13c3618 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -149,6 +149,20 @@ struct addr_location {
 	s32	      cpu;
 };
 
+enum dso_binary_type {
+	DSO_BINARY_TYPE__KALLSYMS = 0,
+	DSO_BINARY_TYPE__GUEST_KALLSYMS,
+	DSO_BINARY_TYPE__JAVA_JIT,
+	DSO_BINARY_TYPE__BUILD_ID_CACHE,
+	DSO_BINARY_TYPE__FEDORA_DEBUGINFO,
+	DSO_BINARY_TYPE__UBUNTU_DEBUGINFO,
+	DSO_BINARY_TYPE__BUILDID_DEBUGINFO,
+	DSO_BINARY_TYPE__SYSTEM_PATH_DSO,
+	DSO_BINARY_TYPE__GUEST_KMODULE,
+	DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE,
+	DSO_BINARY_TYPE__NOT_FOUND,
+};
+
 enum dso_kernel_type {
 	DSO_TYPE_USER = 0,
 	DSO_TYPE_KERNEL,
@@ -160,13 +174,13 @@ struct dso {
 	struct rb_root	 symbols[MAP__NR_TYPES];
 	struct rb_root	 symbol_names[MAP__NR_TYPES];
 	enum dso_kernel_type	kernel;
+	enum dso_binary_type	symtab_type;
 	u8		 adjust_symbols:1;
 	u8		 has_build_id:1;
 	u8		 hit:1;
 	u8		 annotate_warned:1;
 	u8		 sname_alloc:1;
 	u8		 lname_alloc:1;
-	unsigned char	 symtab_type;
 	u8		 sorted_by_name;
 	u8		 loaded;
 	u8		 build_id[BUILD_ID_SIZE];
@@ -218,20 +232,6 @@ size_t dso__fprintf_symbols_by_name(struct dso *dso,
 				    enum map_type type, FILE *fp);
 size_t dso__fprintf(struct dso *dso, enum map_type type, FILE *fp);
 
-enum symtab_type {
-	SYMTAB__KALLSYMS = 0,
-	SYMTAB__GUEST_KALLSYMS,
-	SYMTAB__JAVA_JIT,
-	SYMTAB__BUILD_ID_CACHE,
-	SYMTAB__FEDORA_DEBUGINFO,
-	SYMTAB__UBUNTU_DEBUGINFO,
-	SYMTAB__BUILDID_DEBUGINFO,
-	SYMTAB__SYSTEM_PATH_DSO,
-	SYMTAB__GUEST_KMODULE,
-	SYMTAB__SYSTEM_PATH_KMODULE,
-	SYMTAB__NOT_FOUND,
-};
-
 char dso__symtab_origin(const struct dso *dso);
 void dso__set_long_name(struct dso *dso, char *name);
 void dso__set_build_id(struct dso *dso, void *build_id);
@@ -267,4 +267,6 @@ bool symbol_type__is_a(char symbol_type, enum map_type map_type);
 
 size_t machine__fprintf_vmlinux_path(struct machine *machine, FILE *fp);
 
+int dso__binary_type_file(struct dso *dso, enum dso_binary_type type,
+			  char *root_dir, char *file, size_t size);
 #endif /* __PERF_SYMBOL */
-- 
1.7.7.6



* [PATCH 07/16] perf, tool: Add interface to read DSO image data
  2012-04-17 11:17 [RFCv2 00/15] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (5 preceding siblings ...)
  2012-04-17 11:17 ` [PATCH 06/16] perf, tool: Factor DSO symtab types to generic binary types Jiri Olsa
@ 2012-04-17 11:17 ` Jiri Olsa
  2012-04-17 11:17 ` [PATCH 08/16] perf, tool: Add '.note' check into search for NOTE section Jiri Olsa
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2012-04-17 11:17 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, rostedt, robert.richter,
	fche, linux-kernel, masami.hiramatsu.pt, drepper, Jiri Olsa

Add the following interface to the DSO object to allow
reading of DSO image data:

  dso__data_fd
    - opens the DSO and returns a file descriptor
      Binary types are used to locate/open the DSO in the following order:
        DSO_BINARY_TYPE__BUILD_ID_CACHE
        DSO_BINARY_TYPE__SYSTEM_PATH_DSO
      In other words, we first try to open the DSO build-id path,
      and if that fails we try to open the DSO system path.

  dso__data_read_offset
    - reads DSO data from the specified offset

  dso__data_read_addr
    - reads DSO data from the specified address/map.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/util/symbol.c |  107 ++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/symbol.h |    8 +++
 2 files changed, 115 insertions(+), 0 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 09b440c..1e42034 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -64,6 +64,14 @@ static enum dso_binary_type binary_type_symtab[] = {
 
 #define DSO_BINARY_TYPE__SYMTAB_CNT sizeof(binary_type_symtab)
 
+static enum dso_binary_type binary_type_data[] = {
+	DSO_BINARY_TYPE__BUILD_ID_CACHE,
+	DSO_BINARY_TYPE__SYSTEM_PATH_DSO,
+	DSO_BINARY_TYPE__NOT_FOUND,
+};
+
+#define DSO_BINARY_TYPE__DATA_CNT sizeof(binary_type_data)
+
 int dso__name_len(const struct dso *dso)
 {
 	if (!dso)
@@ -335,6 +343,7 @@ struct dso *dso__new(const char *name)
 		for (i = 0; i < MAP__NR_TYPES; ++i)
 			dso->symbols[i] = dso->symbol_names[i] = RB_ROOT;
 		dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
+		dso->data_type   = DSO_BINARY_TYPE__NOT_FOUND;
 		dso->loaded = 0;
 		dso->sorted_by_name = 0;
 		dso->has_build_id = 0;
@@ -2823,3 +2832,101 @@ int machine__load_vmlinux_path(struct machine *machine, enum map_type type,
 
 	return ret;
 }
+
+static int open_dso(struct dso *dso, struct machine *machine)
+{
+	char *root_dir = (char *) "";
+	char *name;
+	int fd;
+
+	name = malloc(PATH_MAX);
+	if (!name)
+		return -ENOMEM;
+
+	if (machine)
+		root_dir = machine->root_dir;
+
+	if (dso__binary_type_file(dso, dso->data_type,
+				  root_dir, name, PATH_MAX))
+		return -EINVAL;
+
+	fd = open(name, O_RDONLY);
+	free(name);
+	return fd;
+}
+
+int dso__data_fd(struct dso *dso, struct machine *machine)
+{
+	int i = 0;
+
+	if (dso->data_type != DSO_BINARY_TYPE__NOT_FOUND)
+		return open_dso(dso, machine);
+
+	do {
+		int fd;
+
+		dso->data_type = binary_type_data[i++];
+
+		fd = open_dso(dso, machine);
+		if (fd >= 0)
+			return fd;
+
+	} while (dso->data_type != DSO_BINARY_TYPE__NOT_FOUND);
+
+	return -EINVAL;
+}
+
+static ssize_t dso_cache_read(struct dso *dso __used, u64 offset __used,
+			      u8 *data __used, ssize_t size __used)
+{
+	return -EINVAL;
+}
+
+static int dso_cache_add(struct dso *dso __used, u64 offset __used,
+			 u8 *data __used, ssize_t size __used)
+{
+	return 0;
+}
+
+static ssize_t read_dso_data(struct dso *dso, struct machine *machine,
+		     u64 offset, u8 *data, ssize_t size)
+{
+	ssize_t rsize = -1;
+	int fd;
+
+	fd = dso__data_fd(dso, machine);
+	if (fd < 0)
+		return -1;
+
+	do {
+		if (-1 == lseek(fd, offset, SEEK_SET))
+			break;
+
+		rsize = read(fd, data, size);
+		if (-1 == rsize)
+			break;
+
+		if (dso_cache_add(dso, offset, data, size))
+			pr_err("Failed to add data into dso cache.\n");
+
+	} while (0);
+
+	close(fd);
+	return rsize;
+}
+
+ssize_t dso__data_read_offset(struct dso *dso, struct machine *machine,
+			      u64 offset, u8 *data, ssize_t size)
+{
+	if (dso_cache_read(dso, offset, data, size))
+		return read_dso_data(dso, machine, offset, data, size);
+	return 0;
+}
+
+ssize_t dso__data_read_addr(struct dso *dso, struct map *map,
+			    struct machine *machine, u64 addr,
+			    u8 *data, ssize_t size)
+{
+	u64 offset = map->map_ip(map, addr);
+	return dso__data_read_offset(dso, machine, offset, data, size);
+}
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 13c3618..b62321f 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -175,6 +175,7 @@ struct dso {
 	struct rb_root	 symbol_names[MAP__NR_TYPES];
 	enum dso_kernel_type	kernel;
 	enum dso_binary_type	symtab_type;
+	enum dso_binary_type	data_type;
 	u8		 adjust_symbols:1;
 	u8		 has_build_id:1;
 	u8		 hit:1;
@@ -269,4 +270,11 @@ size_t machine__fprintf_vmlinux_path(struct machine *machine, FILE *fp);
 
 int dso__binary_type_file(struct dso *dso, enum dso_binary_type type,
 			  char *root_dir, char *file, size_t size);
+
+int dso__data_fd(struct dso *dso, struct machine *machine);
+ssize_t dso__data_read_offset(struct dso *dso, struct machine *machine,
+			      u64 offset, u8 *data, ssize_t size);
+ssize_t dso__data_read_addr(struct dso *dso, struct map *map,
+			    struct machine *machine, u64 addr,
+			    u8 *data, ssize_t size);
 #endif /* __PERF_SYMBOL */
-- 
1.7.7.6



* [PATCH 08/16] perf, tool: Add '.note' check into search for NOTE section
  2012-04-17 11:17 [RFCv2 00/15] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (6 preceding siblings ...)
  2012-04-17 11:17 ` [PATCH 07/16] perf, tool: Add interface to read DSO image data Jiri Olsa
@ 2012-04-17 11:17 ` Jiri Olsa
  2012-04-17 11:17 ` [PATCH 09/16] perf, tool: Back [vdso] DSO with real data Jiri Olsa
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2012-04-17 11:17 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, rostedt, robert.richter,
	fche, linux-kernel, masami.hiramatsu.pt, drepper, Jiri Olsa

Add the '.note' section name to the list checked when looking
for the notes section. The '.note' name is used by the kernel VDSO.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/util/symbol.c |   29 +++++++++++++++++++++++------
 1 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 1e42034..9c0fa32 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1474,14 +1474,31 @@ static int elf_read_build_id(Elf *elf, void *bf, size_t size)
 		goto out;
 	}
 
-	sec = elf_section_by_name(elf, &ehdr, &shdr,
-				  ".note.gnu.build-id", NULL);
-	if (sec == NULL) {
+	/*
+	 * Check following sections for notes:
+	 *   '.note.gnu.build-id'
+	 *   '.notes'
+	 *   '.note' (VDSO specific)
+	 */
+	do {
+		sec = elf_section_by_name(elf, &ehdr, &shdr,
+					  ".note.gnu.build-id", NULL);
+		if (sec)
+			break;
+
 		sec = elf_section_by_name(elf, &ehdr, &shdr,
 					  ".notes", NULL);
-		if (sec == NULL)
-			goto out;
-	}
+		if (sec)
+			break;
+
+		sec = elf_section_by_name(elf, &ehdr, &shdr,
+					  ".note", NULL);
+		if (sec)
+			break;
+
+		return err;
+
+	} while (0);
 
 	data = elf_getdata(sec, NULL);
 	if (data == NULL)
-- 
1.7.7.6



* [PATCH 09/16] perf, tool: Back [vdso] DSO with real data
  2012-04-17 11:17 [RFCv2 00/15] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (7 preceding siblings ...)
  2012-04-17 11:17 ` [PATCH 08/16] perf, tool: Add '.note' check into search for NOTE section Jiri Olsa
@ 2012-04-17 11:17 ` Jiri Olsa
  2012-04-17 11:17 ` [PATCH 10/16] perf, tool: Add interface to arch registers sets Jiri Olsa
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2012-04-17 11:17 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, rostedt, robert.richter,
	fche, linux-kernel, masami.hiramatsu.pt, drepper, Jiri Olsa

Store the data of the VDSO shared object, because we need it
for the unwind process.

The idea is that the VDSO shared object is the same for all
processes on a running system, so it makes no difference if
we store it from inside the tracer itself (perf).

The record command:
When a [vdso] map is hit, we retrieve the [vdso] DSO image
and store it in a temporary file. During build-id
processing the [vdso] DSO image is stored in the build-id db,
and a build-id reference is made inside perf.data. The temporary
file is removed when record finishes.

The report command:
We read the build-id from perf.data and store the [vdso] DSO object.
This object is referenced and attached to the map when the MMAP
events are processed. Thus during SAMPLE event processing
we have the correct map/DSO attached.

Adding the following API for the vdso object:
  vdso__file
    - the vdso temp file path

  vdso__get_file
    - finds and stores the VDSO image in the temp file,
      the temp file path is returned

  vdso__exit
    - removes the temporary VDSO image if there is any

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/Makefile       |    2 +
 tools/perf/util/map.c     |   23 ++++++++++-
 tools/perf/util/session.c |    2 +
 tools/perf/util/vdso.c    |   90 +++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/vdso.h    |    8 ++++
 5 files changed, 123 insertions(+), 2 deletions(-)
 create mode 100644 tools/perf/util/vdso.c
 create mode 100644 tools/perf/util/vdso.h

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index e98e14c..d82af48 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -300,6 +300,7 @@ LIB_H += util/cpumap.h
 LIB_H += util/top.h
 LIB_H += $(ARCH_INCLUDE)
 LIB_H += util/cgroup.h
+LIB_H += util/vdso.h
 
 LIB_OBJS += $(OUTPUT)util/abspath.o
 LIB_OBJS += $(OUTPUT)util/alias.o
@@ -361,6 +362,7 @@ LIB_OBJS += $(OUTPUT)util/util.o
 LIB_OBJS += $(OUTPUT)util/xyarray.o
 LIB_OBJS += $(OUTPUT)util/cpumap.o
 LIB_OBJS += $(OUTPUT)util/cgroup.o
+LIB_OBJS += $(OUTPUT)util/vdso.o
 
 BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o
 
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 35ae568..1649ea0 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -7,6 +7,7 @@
 #include <stdio.h>
 #include <unistd.h>
 #include "map.h"
+#include "vdso.h"
 
 const char *map_type__name[MAP__NR_TYPES] = {
 	[MAP__FUNCTION] = "Functions",
@@ -18,10 +19,14 @@ static inline int is_anon_memory(const char *filename)
 	return strcmp(filename, "//anon") == 0;
 }
 
+static inline int is_vdso_memory(const char *filename)
+{
+	return !strcmp(filename, "[vdso]");
+}
+
 static inline int is_no_dso_memory(const char *filename)
 {
 	return !strcmp(filename, "[stack]") ||
-	       !strcmp(filename, "[vdso]")  ||
 	       !strcmp(filename, "[heap]");
 }
 
@@ -50,9 +55,10 @@ struct map *map__new(struct list_head *dsos__list, u64 start, u64 len,
 	if (self != NULL) {
 		char newfilename[PATH_MAX];
 		struct dso *dso;
-		int anon, no_dso;
+		int anon, no_dso, vdso;
 
 		anon = is_anon_memory(filename);
+		vdso = is_vdso_memory(filename);
 		no_dso = is_no_dso_memory(filename);
 
 		if (anon) {
@@ -60,10 +66,23 @@ struct map *map__new(struct list_head *dsos__list, u64 start, u64 len,
 			filename = newfilename;
 		}
 
+		if (vdso) {
+			filename = (char *) vdso__file;
+			pgoff = 0;
+		}
+
 		dso = __dsos__findnew(dsos__list, filename);
 		if (dso == NULL)
 			goto out_delete;
 
+		if (vdso && !dso->has_build_id) {
+			char *file_vdso = vdso__get_file();
+			if (file_vdso)
+				dso__set_long_name(dso, file_vdso);
+			else
+				no_dso = 1;
+		}
+
 		map__init(self, type, start, start + len, pgoff, dso);
 
 		if (anon || no_dso) {
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 1efd3be..ae30ca5 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -14,6 +14,7 @@
 #include "sort.h"
 #include "util.h"
 #include "cpumap.h"
+#include "vdso.h"
 
 static int perf_session__open(struct perf_session *self, bool force)
 {
@@ -209,6 +210,7 @@ void perf_session__delete(struct perf_session *self)
 	machine__exit(&self->host_machine);
 	close(self->fd);
 	free(self);
+	vdso__exit();
 }
 
 void machine__remove_thread(struct machine *self, struct thread *th)
diff --git a/tools/perf/util/vdso.c b/tools/perf/util/vdso.c
new file mode 100644
index 0000000..e964482
--- /dev/null
+++ b/tools/perf/util/vdso.c
@@ -0,0 +1,90 @@
+
+#include <unistd.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <linux/kernel.h>
+#include "vdso.h"
+#include "util.h"
+
+const char vdso__file[] = "/tmp/vdso.so";
+static bool vdso_found;
+
+static int find_vdso_map(void **start, void **end)
+{
+	FILE *maps;
+	char line[128];
+	int found = 0;
+
+	maps = fopen("/proc/self/maps", "r");
+	if (!maps) {
+		pr_err("vdso: cannot open maps\n");
+		return -1;
+	}
+
+	while (!found && fgets(line, sizeof(line), maps)) {
+		int m = -1;
+
+		/* We care only about private r-x mappings. */
+		if (2 != sscanf(line, "%p-%p r-xp %*x %*x:%*x %*u %n",
+				start, end, &m))
+			continue;
+		if (m < 0)
+			continue;
+
+		if (!strncmp(&line[m], "[vdso]", 6))
+			found = 1;
+	}
+
+	fclose(maps);
+	return !found;
+}
+
+char *vdso__get_file(void)
+{
+	char *vdso = NULL;
+	char *buf = NULL;
+	void *start, *end;
+
+	do {
+		int fd, size;
+
+		if (vdso_found) {
+			vdso = (char *) vdso__file;
+			break;
+		}
+
+		if (find_vdso_map(&start, &end))
+			break;
+
+		size = end - start;
+		buf = malloc(size);
+		if (!buf)
+			break;
+
+		memcpy(buf, start, size);
+
+		fd = open(vdso__file, O_CREAT|O_WRONLY|O_TRUNC, S_IRWXU);
+		if (fd < 0)
+			break;
+
+		if (size == write(fd, buf, size))
+			vdso = (char *) vdso__file;
+
+		close(fd);
+	} while (0);
+
+	if (buf)
+		free(buf);
+
+	vdso_found = (vdso != NULL);
+	return vdso;
+}
+
+void vdso__exit(void)
+{
+	if (vdso_found)
+		unlink(vdso__file);
+}
diff --git a/tools/perf/util/vdso.h b/tools/perf/util/vdso.h
new file mode 100644
index 0000000..908b041
--- /dev/null
+++ b/tools/perf/util/vdso.h
@@ -0,0 +1,8 @@
+#ifndef __VDSO__
+#define __VDSO__
+
+extern const char vdso__file[];
+char *vdso__get_file(void);
+void  vdso__exit(void);
+
+#endif /* __VDSO__ */
-- 
1.7.7.6
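[Not part of the patch: a standalone sketch of the /proc maps scan used by
vdso__get_file() above, so the parsing approach can be tried in isolation.
The function name and return convention mirror the patch; it assumes a
Linux glibc environment where scanf's %p accepts the hex addresses that
/proc/self/maps prints.]

```c
#include <stdio.h>
#include <string.h>

/*
 * Sketch of tools/perf/util/vdso.c:find_vdso_map(): scan
 * /proc/self/maps for the private r-xp mapping named [vdso]
 * and report its bounds. Returns 0 on success, non-zero if
 * the mapping was not found or maps could not be opened.
 */
static int find_vdso_map(void **start, void **end)
{
	FILE *maps;
	char line[128];
	int found = 0;

	maps = fopen("/proc/self/maps", "r");
	if (!maps)
		return -1;

	while (!found && fgets(line, sizeof(line), maps)) {
		int m = -1;

		/* "start-end perms offset dev:dev inode  pathname" */
		if (sscanf(line, "%p-%p r-xp %*x %*x:%*x %*u %n",
			   start, end, &m) != 2)
			continue;
		if (m < 0)
			continue;

		if (!strncmp(&line[m], "[vdso]", 6))
			found = 1;
	}

	fclose(maps);
	return !found;
}
```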


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 10/16] perf, tool: Add interface to arch registers sets
  2012-04-17 11:17 [RFCv2 00/15] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (8 preceding siblings ...)
  2012-04-17 11:17 ` [PATCH 09/16] perf, tool: Back [vdso] DSO with real data Jiri Olsa
@ 2012-04-17 11:17 ` Jiri Olsa
  2012-04-17 11:17 ` [PATCH 11/16] perf, tool: Add libunwind dependency for dwarf cfi unwinding Jiri Olsa
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2012-04-17 11:17 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, rostedt, robert.richter,
	fche, linux-kernel, masami.hiramatsu.pt, drepper, Jiri Olsa

Adding header files to access a unified API for arch registers.

In addition, adding a way to obtain a register name based on its
API ID value.

Also adding the PERF_REGS_MASK macro with a mask covering all
current arch registers (it will be used in the unwind patches).

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
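[Not part of the patch: a compilable sketch of the id-to-name lookup the
header provides. The enum values here are a hypothetical subset standing
in for PERF_X86_64_REG_*; the real header uses a switch, this shows the
equivalent table form and the same PERF_REGS_MASK construction.]

```c
#include <stddef.h>

/* Hypothetical subset standing in for the PERF_X86_64_REG_* enum. */
enum sample_reg {
	SREG_RAX, SREG_RBX, SREG_RCX, SREG_RDX,
	SREG_RSP, SREG_RIP,
	SREG_MAX,
};

/* Mask of all registers, built the same way as PERF_REGS_MASK. */
#define SAMPLE_REGS_MASK ((1 << SREG_MAX) - 1)

/* Table-based equivalent of the patch's perf_reg_name() switch. */
static const char *sample_reg_name(int id)
{
	static const char * const names[SREG_MAX] = {
		[SREG_RAX] = "RAX", [SREG_RBX] = "RBX",
		[SREG_RCX] = "RCX", [SREG_RDX] = "RDX",
		[SREG_RSP] = "RSP", [SREG_RIP] = "RIP",
	};

	if (id < 0 || id >= SREG_MAX)
		return NULL;
	return names[id];
}
```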
 tools/perf/Makefile                     |    9 +++-
 tools/perf/arch/x86/include/perf_regs.h |  101 +++++++++++++++++++++++++++++++
 tools/perf/util/perf_regs.h             |   10 +++
 3 files changed, 119 insertions(+), 1 deletions(-)
 create mode 100644 tools/perf/arch/x86/include/perf_regs.h
 create mode 100644 tools/perf/util/perf_regs.h

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index d82af48..4ce268c 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -50,13 +50,15 @@ ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/i386/ -e s/sun4u/sparc64/ \
 				  -e s/s390x/s390/ -e s/parisc64/parisc/ \
 				  -e s/ppc.*/powerpc/ -e s/mips.*/mips/ \
 				  -e s/sh[234].*/sh/ )
+NO_PERF_REGS_DEFS := 1
 
 CC = $(CROSS_COMPILE)gcc
 AR = $(CROSS_COMPILE)ar
 
 # Additional ARCH settings for x86
 ifeq ($(ARCH),i386)
-        ARCH := x86
+	ARCH := x86
+	NO_PERF_REGS_DEFS := 0
 endif
 ifeq ($(ARCH),x86_64)
 	ARCH := x86
@@ -69,6 +71,7 @@ ifeq ($(ARCH),x86_64)
 		ARCH_CFLAGS := -DARCH_X86_64
 		ARCH_INCLUDE = ../../arch/x86/lib/memcpy_64.S ../../arch/x86/lib/memset_64.S
 	endif
+	NO_PERF_REGS_DEFS := 0
 endif
 
 # Treat warnings as errors unless directed not to
@@ -301,6 +304,7 @@ LIB_H += util/top.h
 LIB_H += $(ARCH_INCLUDE)
 LIB_H += util/cgroup.h
 LIB_H += util/vdso.h
+LIB_H += util/perf_regs.h
 
 LIB_OBJS += $(OUTPUT)util/abspath.o
 LIB_OBJS += $(OUTPUT)util/alias.o
@@ -638,6 +642,9 @@ else
 	endif
 endif
 
+ifeq ($(NO_PERF_REGS_DEFS),1)
+	BASIC_CFLAGS += -DNO_PERF_REGS_DEFS
+endif
 
 ifdef NO_STRLCPY
 	BASIC_CFLAGS += -DNO_STRLCPY
diff --git a/tools/perf/arch/x86/include/perf_regs.h b/tools/perf/arch/x86/include/perf_regs.h
new file mode 100644
index 0000000..97c0a25
--- /dev/null
+++ b/tools/perf/arch/x86/include/perf_regs.h
@@ -0,0 +1,101 @@
+#ifndef PERF_REGS_H
+#define PERF_REGS_H
+
+#include <stdlib.h>
+
+#ifdef ARCH_X86_64
+#include "../../../../../arch/x86/include/asm/perf_regs_64.h"
+#define PERF_REGS_MASK ((1 << PERF_X86_64_REG_MAX) - 1)
+
+static inline const char *perf_reg_name(int id)
+{
+	switch (id) {
+	case PERF_X86_64_REG_RAX:
+		return "RAX";
+	case PERF_X86_64_REG_RBX:
+		return "RBX";
+	case PERF_X86_64_REG_RCX:
+		return "RCX";
+	case PERF_X86_64_REG_RDX:
+		return "RDX";
+	case PERF_X86_64_REG_RSI:
+		return "RSI";
+	case PERF_X86_64_REG_RDI:
+		return "RDI";
+	case PERF_X86_64_REG_R8:
+		return "R8";
+	case PERF_X86_64_REG_R9:
+		return "R9";
+	case PERF_X86_64_REG_R10:
+		return "R10";
+	case PERF_X86_64_REG_R11:
+		return "R11";
+	case PERF_X86_64_REG_R12:
+		return "R12";
+	case PERF_X86_64_REG_R13:
+		return "R13";
+	case PERF_X86_64_REG_R14:
+		return "R14";
+	case PERF_X86_64_REG_R15:
+		return "R15";
+	case PERF_X86_64_REG_RBP:
+		return "RBP";
+	case PERF_X86_64_REG_RSP:
+		return "RSP";
+	case PERF_X86_64_REG_RIP:
+		return "RIP";
+	case PERF_X86_64_REG_FLAGS:
+		return "FLAGS";
+	case PERF_X86_64_REG_CS:
+		return "CS";
+	case PERF_X86_64_REG_SS:
+		return "SS";
+	default:
+		return NULL;
+	}
+	return NULL;
+}
+#else
+#include "../../../../../arch/x86/include/asm/perf_regs_32.h"
+#define PERF_REGS_MASK ((1 << PERF_X86_32_REG_MAX) - 1)
+
+static inline const char *perf_reg_name(int id)
+{
+	switch (id) {
+	case PERF_X86_32_REG_EAX:
+		return "EAX";
+	case PERF_X86_32_REG_EBX:
+		return "EBX";
+	case PERF_X86_32_REG_ECX:
+		return "ECX";
+	case PERF_X86_32_REG_EDX:
+		return "EDX";
+	case PERF_X86_32_REG_ESI:
+		return "ESI";
+	case PERF_X86_32_REG_EDI:
+		return "EDI";
+	case PERF_X86_32_REG_EBP:
+		return "EBP";
+	case PERF_X86_32_REG_ESP:
+		return "ESP";
+	case PERF_X86_32_REG_EIP:
+		return "EIP";
+	case PERF_X86_32_REG_FLAGS:
+		return "FLAGS";
+	case PERF_X86_32_REG_CS:
+		return "CS";
+	case PERF_X86_32_REG_DS:
+		return "DS";
+	case PERF_X86_32_REG_ES:
+		return "ES";
+	case PERF_X86_32_REG_FS:
+		return "FS";
+	case PERF_X86_32_REG_GS:
+		return "GS";
+	default:
+		return NULL;
+	}
+	return NULL;
+}
+#endif /* ARCH_X86_64 */
+#endif /* PERF_REGS_H */
diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
new file mode 100644
index 0000000..5e2a945
--- /dev/null
+++ b/tools/perf/util/perf_regs.h
@@ -0,0 +1,10 @@
+#ifndef __PERF_REGS_H
+#define __PERF_REGS_H
+
+#ifndef NO_PERF_REGS_DEFS
+#include <perf_regs.h>
+#else
+#define PERF_REGS_MASK	0
+#endif /* NO_PERF_REGS_DEFS */
+
+#endif /* __PERF_REGS_H */
-- 
1.7.7.6




* [PATCH 11/16] perf, tool: Add libunwind dependency for dwarf cfi unwinding
  2012-04-17 11:17 [RFCv2 00/15] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (9 preceding siblings ...)
  2012-04-17 11:17 ` [PATCH 10/16] perf, tool: Add interface to arch registers sets Jiri Olsa
@ 2012-04-17 11:17 ` Jiri Olsa
  2012-04-17 11:17 ` [PATCH 12/16] perf, tool: Support user regs and stack in sample parsing Jiri Olsa
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2012-04-17 11:17 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, rostedt, robert.richter,
	fche, linux-kernel, masami.hiramatsu.pt, drepper, Jiri Olsa

Adding libunwind to be linked with perf if available. It's required
to get dwarf cfi unwinding support.

Also building perf with dwarf call frame information by default,
so that we can unwind callchains in perf itself.

Adding the LIBUNWIND_DIR Makefile variable, allowing the user to
specify the directory of the libunwind installation to link with.
This is used for debugging purposes.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
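[Not part of the patch: the gating logic condensed, for readers unfamiliar
with perf's try-cc feature tests. try-cc compiles the given test source
with the candidate flags and yields "y" only if it builds and links, so a
missing libunwind degrades to a warning plus NO_LIBUNWIND instead of a
build failure. A minimal sketch (comments illustrative):]

```make
# Feature-test pattern used by perf's Makefile: only link
# libunwind when the SOURCE_LIBUNWIND test program builds
# with the candidate include/lib paths.
ifndef NO_LIBUNWIND
FLAGS_UNWIND = $(LIBUNWIND_CFLAGS) $(ALL_CFLAGS) \
               $(LIBUNWIND_LDFLAGS) $(ALL_LDFLAGS) \
               $(EXTLIBS) $(LIBUNWIND_LIBS)
ifneq ($(call try-cc,$(SOURCE_LIBUNWIND),$(FLAGS_UNWIND)),y)
    NO_LIBUNWIND := 1   # fall back: build without dwarf unwind
endif
endif
```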
 tools/perf/Makefile                 |   27 ++++++++++++++++++++++++++-
 tools/perf/config/feature-tests.mak |   25 +++++++++++++++++++++++++
 2 files changed, 51 insertions(+), 1 deletions(-)

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 4ce268c..01d67ec 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -59,6 +59,7 @@ AR = $(CROSS_COMPILE)ar
 ifeq ($(ARCH),i386)
 	ARCH := x86
 	NO_PERF_REGS_DEFS := 0
+	LIBUNWIND_LIBS = -lunwind -lunwind-x86
 endif
 ifeq ($(ARCH),x86_64)
 	ARCH := x86
@@ -72,6 +73,7 @@ ifeq ($(ARCH),x86_64)
 		ARCH_INCLUDE = ../../arch/x86/lib/memcpy_64.S ../../arch/x86/lib/memset_64.S
 	endif
 	NO_PERF_REGS_DEFS := 0
+	LIBUNWIND_LIBS = -lunwind -lunwind-x86_64
 endif
 
 # Treat warnings as errors unless directed not to
@@ -86,7 +88,7 @@ ifndef PERF_DEBUG
   CFLAGS_OPTIMIZE = -O6
 endif
 
-CFLAGS = -fno-omit-frame-pointer -ggdb3 -Wall -Wextra -std=gnu99 $(CFLAGS_WERROR) $(CFLAGS_OPTIMIZE) -D_FORTIFY_SOURCE=2 $(EXTRA_WARNINGS) $(EXTRA_CFLAGS)
+CFLAGS = -fno-omit-frame-pointer -ggdb3 -funwind-tables -Wall -Wextra -std=gnu99 $(CFLAGS_WERROR) $(CFLAGS_OPTIMIZE) -D_FORTIFY_SOURCE=2 $(EXTRA_WARNINGS) $(EXTRA_CFLAGS)
 EXTLIBS = -lpthread -lrt -lelf -lm
 ALL_CFLAGS = $(CFLAGS) -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE
 ALL_LDFLAGS = $(LDFLAGS)
@@ -437,6 +439,21 @@ ifneq ($(call try-cc,$(SOURCE_DWARF),$(FLAGS_DWARF)),y)
 endif # Dwarf support
 endif # NO_DWARF
 
+ifndef NO_LIBUNWIND
+# for linking with debug library, run like:
+# make DEBUG=1 LIBUNWIND_DIR=/opt/libunwind/
+ifdef LIBUNWIND_DIR
+	LIBUNWIND_CFLAGS  := -I$(LIBUNWIND_DIR)/include
+	LIBUNWIND_LDFLAGS := -L$(LIBUNWIND_DIR)/lib
+endif
+
+FLAGS_UNWIND=$(LIBUNWIND_CFLAGS) $(ALL_CFLAGS) $(LIBUNWIND_LDFLAGS) $(ALL_LDFLAGS) $(EXTLIBS) $(LIBUNWIND_LIBS)
+ifneq ($(call try-cc,$(SOURCE_LIBUNWIND),$(FLAGS_UNWIND)),y)
+	msg := $(warning No libunwind found. Please install libunwind >= 0.99);
+	NO_LIBUNWIND := 1
+endif # Libunwind support
+endif # NO_LIBUNWIND
+
 -include arch/$(ARCH)/Makefile
 
 ifneq ($(OUTPUT),)
@@ -468,6 +485,14 @@ else
 endif # PERF_HAVE_DWARF_REGS
 endif # NO_DWARF
 
+ifdef NO_LIBUNWIND
+	BASIC_CFLAGS += -DNO_LIBUNWIND_SUPPORT
+else
+	EXTLIBS += $(LIBUNWIND_LIBS)
+	BASIC_CFLAGS := $(LIBUNWIND_CFLAGS) $(BASIC_CFLAGS)
+	BASIC_LDFLAGS := $(LIBUNWIND_LDFLAGS) $(BASIC_LDFLAGS)
+endif
+
 ifdef NO_NEWT
 	BASIC_CFLAGS += -DNO_NEWT_SUPPORT
 else
diff --git a/tools/perf/config/feature-tests.mak b/tools/perf/config/feature-tests.mak
index d9084e0..51cd201 100644
--- a/tools/perf/config/feature-tests.mak
+++ b/tools/perf/config/feature-tests.mak
@@ -141,3 +141,28 @@ int main(void)
 	return 0;
 }
 endef
+
+ifndef NO_LIBUNWIND
+define SOURCE_LIBUNWIND
+#include <libunwind.h>
+#include <stdlib.h>
+
+extern int UNW_OBJ(dwarf_search_unwind_table) (unw_addr_space_t as,
+                                      unw_word_t ip,
+                                      unw_dyn_info_t *di,
+                                      unw_proc_info_t *pi,
+                                      int need_unwind_info, void *arg);
+
+
+#define dwarf_search_unwind_table UNW_OBJ(dwarf_search_unwind_table)
+
+int main(void)
+{
+	unw_addr_space_t addr_space;
+	addr_space = unw_create_addr_space(NULL, 0);
+	unw_init_remote(NULL, addr_space, NULL);
+	dwarf_search_unwind_table(addr_space, 0, NULL, NULL, 0, NULL);
+	return 0;
+}
+endef
+endif
-- 
1.7.7.6



* [PATCH 12/16] perf, tool: Support user regs and stack in sample parsing
  2012-04-17 11:17 [RFCv2 00/15] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (10 preceding siblings ...)
  2012-04-17 11:17 ` [PATCH 11/16] perf, tool: Add libunwind dependency for dwarf cfi unwinding Jiri Olsa
@ 2012-04-17 11:17 ` Jiri Olsa
  2012-04-17 11:17 ` [PATCH 13/16] perf, tool: Support for dwarf cfi unwinding on post processing Jiri Olsa
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2012-04-17 11:17 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, rostedt, robert.richter,
	fche, linux-kernel, masami.hiramatsu.pt, drepper, Jiri Olsa

Adding the following info to be parsed out of the event sample:
 - user register set
 - user stack dump

Both are global and common to all events within the session.
This info will be used in the unwind patches coming shortly.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
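[Not part of the patch: a self-contained sketch of the sample layout this
patch parses, so the walk over the u64 array can be checked in isolation.
Struct and function names here are illustrative; nr_regs corresponds to
hweight_long(sample_uregs), and the stack dump is assumed present.]

```c
#include <stdint.h>

struct user_regs_sketch {
	uint64_t version;
	const uint64_t *regs;
};

struct user_stack_sketch {
	uint64_t size;		/* real size written at dump time */
	const char *data;
};

/*
 * Layout parsed by perf_event__parse_sample() in this patch:
 *   [regs version][regs ...]
 *   [stack dump area size][stack data ...][real dump size]
 * Returns a pointer just past the consumed words.
 */
static const uint64_t *parse_uregs_and_stack(const uint64_t *array,
					     int nr_regs,
					     struct user_regs_sketch *ur,
					     struct user_stack_sketch *st)
{
	uint64_t size;

	if (nr_regs) {
		ur->version = *array++;
		if (ur->version) {
			ur->regs = array;
			array += nr_regs;
		}
	}

	size = *array++;
	if (!size) {
		st->size = 0;
	} else {
		st->data = (const char *)array;
		array += size / sizeof(*array);
		st->size = *array++;	/* trailing real-size word */
	}

	return array;
}
```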
 tools/perf/builtin-test.c |    3 ++-
 tools/perf/util/event.h   |   15 ++++++++++++++-
 tools/perf/util/evlist.c  |   16 ++++++++++++++++
 tools/perf/util/evlist.h  |    2 ++
 tools/perf/util/evsel.c   |   25 +++++++++++++++++++++++++
 tools/perf/util/python.c  |    3 ++-
 tools/perf/util/session.c |    2 ++
 tools/perf/util/session.h |    7 ++++++-
 8 files changed, 69 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-test.c b/tools/perf/builtin-test.c
index 1c5b980..a434aaa 100644
--- a/tools/perf/builtin-test.c
+++ b/tools/perf/builtin-test.c
@@ -564,7 +564,7 @@ static int test__basic_mmap(void)
 		}
 
 		err = perf_event__parse_sample(event, attr.sample_type, sample_size,
-					       false, &sample, false);
+					       false, 0, false, &sample, false);
 		if (err) {
 			pr_err("Can't parse sample, err = %d\n", err);
 			goto out_munmap;
@@ -1307,6 +1307,7 @@ static int test__PERF_RECORD(void)
 
 				err = perf_event__parse_sample(event, sample_type,
 							       sample_size, true,
+							       0, false,
 							       &sample, false);
 				if (err < 0) {
 					if (verbose)
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 1b19728..31d3e8c 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -69,6 +69,16 @@ struct sample_event {
 	u64 array[];
 };
 
+struct user_regs {
+	u64 version;
+	u64 *regs;
+};
+
+struct user_stack_dump {
+	u64 size;
+	char *data;
+};
+
 struct perf_sample {
 	u64 ip;
 	u32 pid, tid;
@@ -82,6 +92,8 @@ struct perf_sample {
 	void *raw_data;
 	struct ip_callchain *callchain;
 	struct branch_stack *branch_stack;
+	struct user_regs uregs;
+	struct user_stack_dump stack;
 };
 
 #define BUILD_ID_SIZE 20
@@ -199,7 +211,8 @@ const char *perf_event__name(unsigned int id);
 
 int perf_event__parse_sample(const union perf_event *event, u64 type,
 			     int sample_size, bool sample_id_all,
-			     struct perf_sample *sample, bool swapped);
+			     u64 sample_uregs, bool sample_ustack,
+			     struct perf_sample *data, bool swapped);
 int perf_event__synthesize_sample(union perf_event *event, u64 type,
 				  const struct perf_sample *sample,
 				  bool swapped);
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 1986d80..11cc50e 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -672,6 +672,22 @@ bool perf_evlist__valid_sample_type(const struct perf_evlist *evlist)
 	return true;
 }
 
+u64 perf_evlist__sample_uregs(const struct perf_evlist *evlist)
+{
+	struct perf_evsel *first;
+
+	first = list_entry(evlist->entries.next, struct perf_evsel, node);
+	return first->attr.user_regs;
+}
+
+bool perf_evlist__sample_ustack(const struct perf_evlist *evlist)
+{
+	struct perf_evsel *first;
+
+	first = list_entry(evlist->entries.next, struct perf_evsel, node);
+	return (first->attr.ustack_dump_size != 0);
+}
+
 u64 perf_evlist__sample_type(const struct perf_evlist *evlist)
 {
 	struct perf_evsel *first;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 21f1c9e..a90cf24 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -117,6 +117,8 @@ u16 perf_evlist__id_hdr_size(const struct perf_evlist *evlist);
 
 bool perf_evlist__valid_sample_type(const struct perf_evlist *evlist);
 bool perf_evlist__valid_sample_id_all(const struct perf_evlist *evlist);
+u64 perf_evlist__sample_uregs(const struct perf_evlist *evlist);
+bool perf_evlist__sample_ustack(const struct perf_evlist *evlist);
 
 void perf_evlist__splice_list_tail(struct perf_evlist *evlist,
 				   struct list_head *list,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 8c13dbc..7ee47c1 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -8,6 +8,7 @@
  */
 
 #include <byteswap.h>
+#include <linux/bitops.h>
 #include "asm/bug.h"
 #include "evsel.h"
 #include "evlist.h"
@@ -453,8 +454,10 @@ static bool sample_overlap(const union perf_event *event,
 
 int perf_event__parse_sample(const union perf_event *event, u64 type,
 			     int sample_size, bool sample_id_all,
+			     u64 sample_uregs, bool sample_ustack,
 			     struct perf_sample *data, bool swapped)
 {
+	int sample_uregs_nr = hweight_long(sample_uregs);
 	const u64 *array;
 
 	/*
@@ -594,6 +597,28 @@ int perf_event__parse_sample(const union perf_event *event, u64 type,
 		sz /= sizeof(u64);
 		array += sz;
 	}
+
+	if (sample_uregs_nr) {
+		data->uregs.version = *array++;
+
+		if (data->uregs.version) {
+			data->uregs.regs = (u64 *)array;
+			array += sample_uregs_nr;
+		}
+	}
+
+	if (sample_ustack) {
+		u64 size = *array++;
+
+		if (!size) {
+			data->stack.size = 0;
+		} else {
+			data->stack.data = (char *)array;
+			array += size / sizeof(*array);
+			data->stack.size = *array;
+		}
+	}
+
 	return 0;
 }
 
diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index e03b58a..257efcd 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -807,7 +807,8 @@ static PyObject *pyrf_evlist__read_on_cpu(struct pyrf_evlist *pevlist,
 		first = list_entry(evlist->entries.next, struct perf_evsel, node);
 		err = perf_event__parse_sample(event, first->attr.sample_type,
 					       perf_evsel__sample_size(first),
-					       sample_id_all, &pevent->sample, false);
+					       sample_id_all, 0, false,
+					       &pevent->sample, false);
 		if (err)
 			return PyErr_Format(PyExc_OSError,
 					    "perf: can't parse sample, err=%d", err);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index ae30ca5..a2e1c1b 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -87,6 +87,8 @@ void perf_session__update_sample_type(struct perf_session *self)
 	self->sample_id_all = perf_evlist__sample_id_all(self->evlist);
 	self->id_hdr_size = perf_evlist__id_hdr_size(self->evlist);
 	self->host_machine.id_hdr_size = self->id_hdr_size;
+	self->sample_uregs = perf_evlist__sample_uregs(self->evlist);
+	self->sample_ustack = perf_evlist__sample_ustack(self->evlist);
 }
 
 int perf_session__create_kernel_maps(struct perf_session *self)
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 7a5434c..182c0e5 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -42,6 +42,8 @@ struct perf_session {
 	struct hists		hists;
 	u64			sample_type;
 	int			sample_size;
+	u64			sample_uregs;
+	bool			sample_ustack;
 	int			fd;
 	bool			fd_pipe;
 	bool			repipe;
@@ -134,7 +136,10 @@ static inline int perf_session__parse_sample(struct perf_session *session,
 {
 	return perf_event__parse_sample(event, session->sample_type,
 					session->sample_size,
-					session->sample_id_all, sample,
+					session->sample_id_all,
+					session->sample_uregs,
+					session->sample_ustack,
+					sample,
 					session->header.needs_swap);
 }
 
-- 
1.7.7.6



* [PATCH 13/16] perf, tool: Support for dwarf cfi unwinding on post processing
  2012-04-17 11:17 [RFCv2 00/15] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (11 preceding siblings ...)
  2012-04-17 11:17 ` [PATCH 12/16] perf, tool: Support user regs and stack in sample parsing Jiri Olsa
@ 2012-04-17 11:17 ` Jiri Olsa
  2012-04-17 11:17 ` [PATCH 14/16] perf, tool: Support for dwarf mode callchain on perf record Jiri Olsa
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2012-04-17 11:17 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, rostedt, robert.richter,
	fche, linux-kernel, masami.hiramatsu.pt, drepper, Jiri Olsa

This brings support for dwarf cfi unwinding to perf post
processing. Call frame information is retrieved and then passed
to libunwind, which requests memory and register content from
the application.

Adding an unwind object to handle the user stack backtrace based
on the user register values and the user stack dump.

The unwind object accesses libunwind via its remote interface
and provides it all the data necessary to unwind the stack.

The unwind interface provides the following function:
	unwind__get_entries

And a callback (passed to the above function) to retrieve
the backtrace entries:
	typedef int (*unwind_entry_cb_t)(struct unwind_entry *entry,
					 void *arg);

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
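[Not part of the patch: a toy sketch of the callback-driven shape of
unwind__get_entries(), to make the interface above concrete. Names with
a _sketch suffix are illustrative; the real unwind_entry also carries
map/symbol information, and the real implementation walks frames via
libunwind rather than a pre-resolved IP list.]

```c
#include <stdint.h>

struct unwind_entry_sketch {
	uint64_t ip;
};

typedef int (*unwind_entry_cb_t)(struct unwind_entry_sketch *entry,
				 void *arg);

/*
 * Stand-in for unwind__get_entries(): hand each frame to the
 * callback, stopping early if the callback returns non-zero.
 */
static int get_entries_sketch(unwind_entry_cb_t cb, void *arg,
			      const uint64_t *ips, int nr)
{
	int i;

	for (i = 0; i < nr; i++) {
		struct unwind_entry_sketch e = { .ip = ips[i] };
		int ret = cb(&e, arg);

		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: count the frames delivered. */
static int count_cb(struct unwind_entry_sketch *entry, void *arg)
{
	(void)entry;
	(*(int *)arg)++;
	return 0;
}
```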
 tools/perf/Makefile                                |    2 +
 tools/perf/arch/x86/Makefile                       |    3 +
 tools/perf/arch/x86/util/unwind.c                  |  111 ++++
 tools/perf/builtin-report.c                        |   24 +-
 tools/perf/builtin-script.c                        |   56 ++-
 tools/perf/builtin-top.c                           |    5 +-
 tools/perf/util/include/linux/compiler.h           |    1 +
 tools/perf/util/map.h                              |    7 +-
 .../perf/util/scripting-engines/trace-event-perl.c |    3 +-
 .../util/scripting-engines/trace-event-python.c    |    3 +-
 tools/perf/util/session.c                          |  100 +++-
 tools/perf/util/session.h                          |    3 +-
 tools/perf/util/trace-event-scripting.c            |    3 +-
 tools/perf/util/trace-event.h                      |    5 +-
 tools/perf/util/unwind.c                           |  565 ++++++++++++++++++++
 tools/perf/util/unwind.h                           |   34 ++
 16 files changed, 872 insertions(+), 53 deletions(-)
 create mode 100644 tools/perf/arch/x86/util/unwind.c
 create mode 100644 tools/perf/util/unwind.c
 create mode 100644 tools/perf/util/unwind.h

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 01d67ec..3614b1a 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -307,6 +307,7 @@ LIB_H += $(ARCH_INCLUDE)
 LIB_H += util/cgroup.h
 LIB_H += util/vdso.h
 LIB_H += util/perf_regs.h
+LIB_H += util/unwind.h
 
 LIB_OBJS += $(OUTPUT)util/abspath.o
 LIB_OBJS += $(OUTPUT)util/alias.o
@@ -491,6 +492,7 @@ else
 	EXTLIBS += $(LIBUNWIND_LIBS)
 	BASIC_CFLAGS := $(LIBUNWIND_CFLAGS) $(BASIC_CFLAGS)
 	BASIC_LDFLAGS := $(LIBUNWIND_LDFLAGS) $(BASIC_LDFLAGS)
+	LIB_OBJS += $(OUTPUT)util/unwind.o
 endif
 
 ifdef NO_NEWT
diff --git a/tools/perf/arch/x86/Makefile b/tools/perf/arch/x86/Makefile
index 744e629..815841c 100644
--- a/tools/perf/arch/x86/Makefile
+++ b/tools/perf/arch/x86/Makefile
@@ -2,4 +2,7 @@ ifndef NO_DWARF
 PERF_HAVE_DWARF_REGS := 1
 LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/dwarf-regs.o
 endif
+ifndef NO_LIBUNWIND
+LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/unwind.o
+endif
 LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/header.o
diff --git a/tools/perf/arch/x86/util/unwind.c b/tools/perf/arch/x86/util/unwind.c
new file mode 100644
index 0000000..aed7f5a
--- /dev/null
+++ b/tools/perf/arch/x86/util/unwind.c
@@ -0,0 +1,111 @@
+
+#include <errno.h>
+#include <libunwind.h>
+#include "perf_regs.h"
+#include "../../util/unwind.h"
+
+#ifdef ARCH_X86_64
+int unwind__arch_reg_id(int regnum)
+{
+	int id;
+
+	switch (regnum) {
+	case UNW_X86_64_RAX:
+		id = PERF_X86_64_REG_RAX;
+		break;
+	case UNW_X86_64_RDX:
+		id = PERF_X86_64_REG_RDX;
+		break;
+	case UNW_X86_64_RCX:
+		id = PERF_X86_64_REG_RCX;
+		break;
+	case UNW_X86_64_RBX:
+		id = PERF_X86_64_REG_RBX;
+		break;
+	case UNW_X86_64_RSI:
+		id = PERF_X86_64_REG_RSI;
+		break;
+	case UNW_X86_64_RDI:
+		id = PERF_X86_64_REG_RDI;
+		break;
+	case UNW_X86_64_RBP:
+		id = PERF_X86_64_REG_RBP;
+		break;
+	case UNW_X86_64_RSP:
+		id = PERF_X86_64_REG_RSP;
+		break;
+	case UNW_X86_64_R8:
+		id = PERF_X86_64_REG_R8;
+		break;
+	case UNW_X86_64_R9:
+		id = PERF_X86_64_REG_R9;
+		break;
+	case UNW_X86_64_R10:
+		id = PERF_X86_64_REG_R10;
+		break;
+	case UNW_X86_64_R11:
+		id = PERF_X86_64_REG_R11;
+		break;
+	case UNW_X86_64_R12:
+		id = PERF_X86_64_REG_R12;
+		break;
+	case UNW_X86_64_R13:
+		id = PERF_X86_64_REG_R13;
+		break;
+	case UNW_X86_64_R14:
+		id = PERF_X86_64_REG_R14;
+		break;
+	case UNW_X86_64_R15:
+		id = PERF_X86_64_REG_R15;
+		break;
+	case UNW_X86_64_RIP:
+		id = PERF_X86_64_REG_RIP;
+		break;
+	default:
+		pr_err("unwind: invalid reg id %d\n", regnum);
+		return -EINVAL;
+	}
+
+	return id;
+}
+#else
+int unwind__arch_reg_id(int regnum)
+{
+	int id;
+
+	switch (regnum) {
+	case UNW_X86_EAX:
+		id = PERF_X86_32_REG_EAX;
+		break;
+	case UNW_X86_EDX:
+		id = PERF_X86_32_REG_EDX;
+		break;
+	case UNW_X86_ECX:
+		id = PERF_X86_32_REG_ECX;
+		break;
+	case UNW_X86_EBX:
+		id = PERF_X86_32_REG_EBX;
+		break;
+	case UNW_X86_ESI:
+		id = PERF_X86_32_REG_ESI;
+		break;
+	case UNW_X86_EDI:
+		id = PERF_X86_32_REG_EDI;
+		break;
+	case UNW_X86_EBP:
+		id = PERF_X86_32_REG_EBP;
+		break;
+	case UNW_X86_ESP:
+		id = PERF_X86_32_REG_ESP;
+		break;
+	case UNW_X86_EIP:
+		id = PERF_X86_32_REG_EIP;
+		break;
+	default:
+		pr_err("unwind: invalid reg id %d\n", regnum);
+		return -EINVAL;
+	}
+
+	return id;
+}
+#endif /* ARCH_X86_64 */
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index cec2b8c..3753820 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -69,8 +69,8 @@ static int perf_report__add_branch_hist_entry(struct perf_tool *tool,
 
 	if ((sort__has_parent || symbol_conf.use_callchain)
 	    && sample->callchain) {
-		err = machine__resolve_callchain(machine, evsel, al->thread,
-						 sample->callchain, &parent);
+		err = machine__resolve_callchain(rep->session, machine, evsel,
+						 al->thread, sample, &parent);
 		if (err)
 			return err;
 	}
@@ -130,7 +130,8 @@ out:
 	return err;
 }
 
-static int perf_evsel__add_hist_entry(struct perf_evsel *evsel,
+static int perf_evsel__add_hist_entry(struct perf_session *session,
+				      struct perf_evsel *evsel,
 				      struct addr_location *al,
 				      struct perf_sample *sample,
 				      struct machine *machine)
@@ -140,8 +141,8 @@ static int perf_evsel__add_hist_entry(struct perf_evsel *evsel,
 	struct hist_entry *he;
 
 	if ((sort__has_parent || symbol_conf.use_callchain) && sample->callchain) {
-		err = machine__resolve_callchain(machine, evsel, al->thread,
-						 sample->callchain, &parent);
+		err = machine__resolve_callchain(session, machine, evsel,
+						 al->thread, sample, &parent);
 		if (err)
 			return err;
 	}
@@ -213,7 +214,8 @@ static int process_sample_event(struct perf_tool *tool,
 		if (al.map != NULL)
 			al.map->dso->hit = 1;
 
-		if (perf_evsel__add_hist_entry(evsel, &al, sample, machine)) {
+		if (perf_evsel__add_hist_entry(rep->session, evsel, &al,
+					       sample, machine)) {
 			pr_debug("problem incrementing symbol period, skipping event\n");
 			return -1;
 		}
@@ -389,17 +391,17 @@ static int __cmd_report(struct perf_report *rep)
 "If some relocation was applied (e.g. kexec) symbols may be misresolved.");
 	}
 
-	if (dump_trace) {
-		perf_session__fprintf_nr_events(session, stdout);
-		goto out_delete;
-	}
-
 	if (verbose > 3)
 		perf_session__fprintf(session, stdout);
 
 	if (verbose > 2)
 		perf_session__fprintf_dsos(session, stdout);
 
+	if (dump_trace) {
+		perf_session__fprintf_nr_events(session, stdout);
+		goto out_delete;
+	}
+
 	nr_samples = 0;
 	list_for_each_entry(pos, &session->evlist->entries, node) {
 		struct hists *hists = &pos->hists;
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index d4ce733..0967c97 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -28,6 +28,11 @@ static bool			system_wide;
 static const char		*cpu_list;
 static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
 
+struct perf_script {
+	struct perf_tool tool;
+	struct perf_session *session;
+};
+
 enum perf_output_field {
 	PERF_OUTPUT_COMM            = 1U << 0,
 	PERF_OUTPUT_TID             = 1U << 1,
@@ -373,7 +378,8 @@ static void print_sample_addr(union perf_event *event,
 	}
 }
 
-static void print_sample_bts(union perf_event *event,
+static void print_sample_bts(struct perf_session *session,
+			     union perf_event *event,
 			     struct perf_sample *sample,
 			     struct perf_evsel *evsel,
 			     struct machine *machine,
@@ -387,7 +393,7 @@ static void print_sample_bts(union perf_event *event,
 			printf(" ");
 		else
 			printf("\n");
-		perf_event__print_ip(event, sample, machine, evsel,
+		perf_event__print_ip(session, event, sample, machine, evsel,
 				     PRINT_FIELD(SYM), PRINT_FIELD(DSO),
 				     PRINT_FIELD(SYMOFFSET));
 	}
@@ -401,7 +407,8 @@ static void print_sample_bts(union perf_event *event,
 	printf("\n");
 }
 
-static void process_event(union perf_event *event __unused,
+static void process_event(struct perf_session *session,
+			  union perf_event *event __unused,
 			  struct perf_sample *sample,
 			  struct perf_evsel *evsel,
 			  struct machine *machine,
@@ -415,7 +422,8 @@ static void process_event(union perf_event *event __unused,
 	print_sample_start(sample, thread, attr);
 
 	if (is_bts_event(attr)) {
-		print_sample_bts(event, sample, evsel, machine, thread);
+		print_sample_bts(session, event, sample, evsel,
+				 machine, thread);
 		return;
 	}
 
@@ -431,7 +439,7 @@ static void process_event(union perf_event *event __unused,
 			printf(" ");
 		else
 			printf("\n");
-		perf_event__print_ip(event, sample, machine, evsel,
+		perf_event__print_ip(session, event, sample, machine, evsel,
 				     PRINT_FIELD(SYM), PRINT_FIELD(DSO),
 				     PRINT_FIELD(SYMOFFSET));
 	}
@@ -488,6 +496,8 @@ static int process_sample_event(struct perf_tool *tool __used,
 				struct perf_evsel *evsel,
 				struct machine *machine)
 {
+	struct perf_script *script = container_of(tool, struct perf_script,
+						  tool);
 	struct addr_location al;
 	struct thread *thread = machine__findnew_thread(machine, event->ip.tid);
 
@@ -520,24 +530,27 @@ static int process_sample_event(struct perf_tool *tool __used,
 	if (cpu_list && !test_bit(sample->cpu, cpu_bitmap))
 		return 0;
 
-	scripting_ops->process_event(event, sample, evsel, machine, thread);
+	scripting_ops->process_event(script->session, event, sample, evsel,
+				     machine, thread);
 
 	evsel->hists.stats.total_period += sample->period;
 	return 0;
 }
 
-static struct perf_tool perf_script = {
-	.sample		 = process_sample_event,
-	.mmap		 = perf_event__process_mmap,
-	.comm		 = perf_event__process_comm,
-	.exit		 = perf_event__process_task,
-	.fork		 = perf_event__process_task,
-	.attr		 = perf_event__process_attr,
-	.event_type	 = perf_event__process_event_type,
-	.tracing_data	 = perf_event__process_tracing_data,
-	.build_id	 = perf_event__process_build_id,
-	.ordered_samples = true,
-	.ordering_requires_timestamps = true,
+static struct perf_script perf_script = {
+	.tool = {
+		.sample		 = process_sample_event,
+		.mmap		 = perf_event__process_mmap,
+		.comm		 = perf_event__process_comm,
+		.exit		 = perf_event__process_task,
+		.fork		 = perf_event__process_task,
+		.attr		 = perf_event__process_attr,
+		.event_type	 = perf_event__process_event_type,
+		.tracing_data	 = perf_event__process_tracing_data,
+		.build_id	 = perf_event__process_build_id,
+		.ordered_samples = true,
+		.ordering_requires_timestamps = true,
+	},
 };
 
 extern volatile int session_done;
@@ -553,7 +566,7 @@ static int __cmd_script(struct perf_session *session)
 
 	signal(SIGINT, sig_handler);
 
-	ret = perf_session__process_events(session, &perf_script);
+	ret = perf_session__process_events(session, &perf_script.tool);
 
 	if (debug_mode)
 		pr_err("Misordered timestamps: %" PRIu64 "\n", nr_unordered);
@@ -1335,10 +1348,13 @@ int cmd_script(int argc, const char **argv, const char *prefix __used)
 	if (!script_name)
 		setup_pager();
 
-	session = perf_session__new(input_name, O_RDONLY, 0, false, &perf_script);
+	session = perf_session__new(input_name, O_RDONLY, 0, false,
+				    &perf_script.tool);
 	if (session == NULL)
 		return -ENOMEM;
 
+	perf_script.session = session;
+
 	if (cpu_list) {
 		if (perf_session__cpu_bitmap(session, cpu_list, cpu_bitmap))
 			return -1;
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 10184f6..1a4d9b5 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -774,8 +774,9 @@ static void perf_event__process_sample(struct perf_tool *tool,
 
 		if ((sort__has_parent || symbol_conf.use_callchain) &&
 		    sample->callchain) {
-			err = machine__resolve_callchain(machine, evsel, al.thread,
-							 sample->callchain, &parent);
+			err = machine__resolve_callchain(top->session,
+						machine, evsel, al.thread,
+						sample, &parent);
 			if (err)
 				return;
 		}
diff --git a/tools/perf/util/include/linux/compiler.h b/tools/perf/util/include/linux/compiler.h
index 547628e..2dc8671 100644
--- a/tools/perf/util/include/linux/compiler.h
+++ b/tools/perf/util/include/linux/compiler.h
@@ -10,5 +10,6 @@
 #endif
 
 #define __used		__attribute__((__unused__))
+#define __packed	__attribute__((__packed__))
 
 #endif
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index 81371ba..a27a0a1 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -156,9 +156,12 @@ int machine__init(struct machine *self, const char *root_dir, pid_t pid);
 void machine__exit(struct machine *self);
 void machine__delete(struct machine *self);
 
-int machine__resolve_callchain(struct machine *machine,
+struct perf_session;
+struct perf_sample;
+int machine__resolve_callchain(struct perf_session *session,
+			       struct machine *machine,
 			       struct perf_evsel *evsel, struct thread *thread,
-			       struct ip_callchain *chain,
+			       struct perf_sample *sample,
 			       struct symbol **parent);
 int maps__set_kallsyms_ref_reloc_sym(struct map **maps, const char *symbol_name,
 				     u64 addr);
diff --git a/tools/perf/util/scripting-engines/trace-event-perl.c b/tools/perf/util/scripting-engines/trace-event-perl.c
index e30749e..ae07098 100644
--- a/tools/perf/util/scripting-engines/trace-event-perl.c
+++ b/tools/perf/util/scripting-engines/trace-event-perl.c
@@ -364,7 +364,8 @@ static void perl_process_event_generic(union perf_event *pevent __unused,
 	LEAVE;
 }
 
-static void perl_process_event(union perf_event *pevent,
+static void perl_process_event(struct perf_session *session __used,
+			       union perf_event *pevent,
 			       struct perf_sample *sample,
 			       struct perf_evsel *evsel,
 			       struct machine *machine,
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index c2623c6..c593ba9 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -205,7 +205,8 @@ static inline struct event *find_cache_event(int type)
 	return event;
 }
 
-static void python_process_event(union perf_event *pevent __unused,
+static void python_process_event(struct perf_session *session,
+				 union perf_event *pevent __unused,
 				 struct perf_sample *sample,
 				 struct perf_evsel *evsel __unused,
 				 struct machine *machine __unused,
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index a2e1c1b..4920e21 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -15,6 +15,8 @@
 #include "util.h"
 #include "cpumap.h"
 #include "vdso.h"
+#include "unwind.h"
+#include "perf_regs.h"
 
 static int perf_session__open(struct perf_session *self, bool force)
 {
@@ -292,17 +294,17 @@ struct branch_info *machine__resolve_bstack(struct machine *self,
 	return bi;
 }
 
-int machine__resolve_callchain(struct machine *self, struct perf_evsel *evsel,
-			       struct thread *thread,
-			       struct ip_callchain *chain,
-			       struct symbol **parent)
+static int
+resolve_callchain_sample(struct machine *machine,
+			 struct perf_evsel *evsel,
+			 struct thread *thread,
+			 struct ip_callchain *chain,
+			 struct symbol **parent)
 {
 	u8 cpumode = PERF_RECORD_MISC_USER;
 	unsigned int i;
 	int err;
 
-	callchain_cursor_reset(&evsel->hists.callchain_cursor);
-
 	for (i = 0; i < chain->nr; i++) {
 		u64 ip;
 		struct addr_location al;
@@ -315,11 +317,14 @@ int machine__resolve_callchain(struct machine *self, struct perf_evsel *evsel,
 		if (ip >= PERF_CONTEXT_MAX) {
 			switch (ip) {
 			case PERF_CONTEXT_HV:
-				cpumode = PERF_RECORD_MISC_HYPERVISOR;	break;
+				cpumode = PERF_RECORD_MISC_HYPERVISOR;
+				break;
 			case PERF_CONTEXT_KERNEL:
-				cpumode = PERF_RECORD_MISC_KERNEL;	break;
+				cpumode = PERF_RECORD_MISC_KERNEL;
+				break;
 			case PERF_CONTEXT_USER:
-				cpumode = PERF_RECORD_MISC_USER;	break;
+				cpumode = PERF_RECORD_MISC_USER;
+				break;
 			default:
 				break;
 			}
@@ -327,7 +332,7 @@ int machine__resolve_callchain(struct machine *self, struct perf_evsel *evsel,
 		}
 
 		al.filtered = false;
-		thread__find_addr_location(thread, self, cpumode,
+		thread__find_addr_location(thread, machine, cpumode,
 					   MAP__FUNCTION, ip, &al, NULL);
 		if (al.sym != NULL) {
 			if (sort__has_parent && !*parent &&
@@ -346,6 +351,38 @@ int machine__resolve_callchain(struct machine *self, struct perf_evsel *evsel,
 	return 0;
 }
 
+static int unwind_entry(struct unwind_entry *entry, void *arg)
+{
+	struct callchain_cursor *cursor = arg;
+	return callchain_cursor_append(cursor, entry->ip,
+				       entry->map, entry->sym);
+}
+
+int machine__resolve_callchain(struct perf_session *session,
+			       struct machine *self,
+			       struct perf_evsel *evsel,
+			       struct thread *thread,
+			       struct perf_sample *sample,
+			       struct symbol **parent)
+{
+	int ret;
+
+	callchain_cursor_reset(&evsel->hists.callchain_cursor);
+
+	ret = resolve_callchain_sample(self, evsel, thread,
+				       sample->callchain, parent);
+	if (ret)
+		return ret;
+
+	if (!session->sample_uregs || !session->sample_ustack)
+		return 0;
+
+	return unwind__get_entries(unwind_entry,
+				   &evsel->hists.callchain_cursor,
+				   self, thread,
+				   session->sample_uregs, sample);
+}
+
 static int process_event_synth_tracing_data_stub(union perf_event *event __used,
 						 struct perf_session *session __used)
 {
@@ -772,6 +809,36 @@ static void branch_stack__printf(struct perf_sample *sample)
 			sample->branch_stack->entries[i].to);
 }
 
+static void uregs_printf(struct perf_sample *sample, u64 sample_uregs)
+{
+	struct user_regs *uregs = &sample->uregs;
+	int i = 0, rid = 0;
+
+	if (!uregs->version)
+		return;
+
+	printf("... uregs: mask 0x%" PRIx64 "\n", sample_uregs);
+
+	do {
+		u64 val;
+
+		if (sample_uregs & 1) {
+			val = uregs->regs[i++];
+			printf(".... %-5s 0x%" PRIx64 "\n",
+			       perf_reg_name(rid), val);
+		}
+
+		rid++;
+		sample_uregs >>= 1;
+
+	} while (sample_uregs);
+}
+
+static void ustack_printf(struct perf_sample *sample)
+{
+	printf("... ustack: size %" PRIu64 "\n", sample->stack.size);
+}
+
 static void perf_session__print_tstamp(struct perf_session *session,
 				       union perf_event *event,
 				       struct perf_sample *sample)
@@ -822,6 +889,12 @@ static void dump_sample(struct perf_session *session, union perf_event *event,
 
 	if (session->sample_type & PERF_SAMPLE_BRANCH_STACK)
 		branch_stack__printf(sample);
+
+	if (session->sample_uregs)
+		uregs_printf(sample, session->sample_uregs);
+
+	if (session->sample_ustack)
+		ustack_printf(sample);
 }
 
 static struct machine *
@@ -1387,7 +1460,8 @@ struct perf_evsel *perf_session__find_first_evtype(struct perf_session *session,
 	return NULL;
 }
 
-void perf_event__print_ip(union perf_event *event, struct perf_sample *sample,
+void perf_event__print_ip(struct perf_session *session,
+			  union perf_event *event, struct perf_sample *sample,
 			  struct machine *machine, struct perf_evsel *evsel,
 			  int print_sym, int print_dso, int print_symoffset)
 {
@@ -1404,8 +1478,8 @@ void perf_event__print_ip(union perf_event *event, struct perf_sample *sample,
 
 	if (symbol_conf.use_callchain && sample->callchain) {
 
-		if (machine__resolve_callchain(machine, evsel, al.thread,
-						sample->callchain, NULL) != 0) {
+		if (machine__resolve_callchain(session, machine, evsel,
+					al.thread, sample, NULL) != 0) {
 			if (verbose)
 				error("Failed to resolve callchain. Skipping\n");
 			return;
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 182c0e5..090563a 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -154,7 +154,8 @@ static inline int perf_session__synthesize_sample(struct perf_session *session,
 struct perf_evsel *perf_session__find_first_evtype(struct perf_session *session,
 					    unsigned int type);
 
-void perf_event__print_ip(union perf_event *event, struct perf_sample *sample,
+void perf_event__print_ip(struct perf_session *session,
+			  union perf_event *event, struct perf_sample *sample,
 			  struct machine *machine, struct perf_evsel *evsel,
 			  int print_sym, int print_dso, int print_symoffset);
 
diff --git a/tools/perf/util/trace-event-scripting.c b/tools/perf/util/trace-event-scripting.c
index 18ae6c1..496c8d2 100644
--- a/tools/perf/util/trace-event-scripting.c
+++ b/tools/perf/util/trace-event-scripting.c
@@ -35,7 +35,8 @@ static int stop_script_unsupported(void)
 	return 0;
 }
 
-static void process_event_unsupported(union perf_event *event __unused,
+static void process_event_unsupported(struct perf_session *session __used,
+				      union perf_event *event __unused,
 				      struct perf_sample *sample __unused,
 				      struct perf_evsel *evsel __unused,
 				      struct machine *machine __unused,
diff --git a/tools/perf/util/trace-event.h b/tools/perf/util/trace-event.h
index 58ae14c..15b3133 100644
--- a/tools/perf/util/trace-event.h
+++ b/tools/perf/util/trace-event.h
@@ -289,11 +289,14 @@ enum trace_flag_type {
 	TRACE_FLAG_SOFTIRQ		= 0x10,
 };
 
+struct perf_session;
+
 struct scripting_ops {
 	const char *name;
 	int (*start_script) (const char *script, int argc, const char **argv);
 	int (*stop_script) (void);
-	void (*process_event) (union perf_event *event,
+	void (*process_event) (struct perf_session *session,
+			       union perf_event *event,
 			       struct perf_sample *sample,
 			       struct perf_evsel *evsel,
 			       struct machine *machine,
diff --git a/tools/perf/util/unwind.c b/tools/perf/util/unwind.c
new file mode 100644
index 0000000..d1bdcda
--- /dev/null
+++ b/tools/perf/util/unwind.c
@@ -0,0 +1,565 @@
+/*
+ * Post mortem Dwarf CFI based unwinding on top of regs and stack dumps.
+ *
+ * Lots of this code has been borrowed from or heavily inspired by parts
+ * of the libunwind 0.99 code, which is (amongst other contributors I may
+ * have forgotten):
+ *
+ * Copyright (C) 2002-2007 Hewlett-Packard Co
+ *	Contributed by David Mosberger-Tang <davidm@hpl.hp.com>
+ *
+ * And the bugs have been added by:
+ *
+ * Copyright (C) 2010, Frederic Weisbecker <fweisbec@gmail.com>
+ * Copyright (C) 2012, Jiri Olsa <jolsa@redhat.com>
+ *
+ */
+
+#include <elf.h>
+#include <gelf.h>
+#include <fcntl.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <linux/list.h>
+#include <libunwind.h>
+#include <libunwind-ptrace.h>
+#include "thread.h"
+#include "session.h"
+#include "perf_regs.h"
+#include "unwind.h"
+#include "util.h"
+
+extern int
+UNW_OBJ(dwarf_search_unwind_table) (unw_addr_space_t as,
+				    unw_word_t ip,
+				    unw_dyn_info_t *di,
+				    unw_proc_info_t *pi,
+				    int need_unwind_info, void *arg);
+
+#define dwarf_search_unwind_table UNW_OBJ(dwarf_search_unwind_table)
+
+#define DW_EH_PE_FORMAT_MASK	0x0f	/* format of the encoded value */
+#define DW_EH_PE_APPL_MASK	0x70	/* how the value is to be applied */
+
+/* Pointer-encoding formats: */
+#define DW_EH_PE_omit		0xff
+#define DW_EH_PE_ptr		0x00	/* pointer-sized unsigned value */
+#define DW_EH_PE_udata4		0x03	/* unsigned 32-bit value */
+#define DW_EH_PE_udata8		0x04	/* unsigned 64-bit value */
+#define DW_EH_PE_sdata4		0x0b	/* signed 32-bit value */
+#define DW_EH_PE_sdata8		0x0c	/* signed 64-bit value */
+
+/* Pointer-encoding application: */
+#define DW_EH_PE_absptr		0x00	/* absolute value */
+#define DW_EH_PE_pcrel		0x10	/* rel. to addr. of encoded value */
+
+/*
+ * The following are not documented by LSB v1.3, yet they are used by
+ * GCC, presumably they aren't documented by LSB since they aren't
+ * used on Linux:
+ */
+#define DW_EH_PE_funcrel	0x40	/* start-of-procedure-relative */
+#define DW_EH_PE_aligned	0x50	/* aligned pointer */
+
+/* Flags intentionally not handled, since they're not needed:
+ * #define DW_EH_PE_indirect      0x80
+ * #define DW_EH_PE_uleb128       0x01
+ * #define DW_EH_PE_udata2        0x02
+ * #define DW_EH_PE_sleb128       0x09
+ * #define DW_EH_PE_sdata2        0x0a
+ * #define DW_EH_PE_textrel       0x20
+ * #define DW_EH_PE_datarel       0x30
+ */
+
+struct unwind_info {
+	struct perf_sample	*sample;
+	struct machine		*machine;
+	struct thread		*thread;
+	u64			sample_uregs;
+};
+
+#define dw_read(ptr, type, end) ({	\
+	type *__p = (type *) ptr;	\
+	type  __v;			\
+	if ((__p + 1) > (type *) end)	\
+		return -EINVAL;		\
+	__v = *__p++;			\
+	ptr = (typeof(ptr)) __p;	\
+	__v;				\
+	})
+
+static int __dw_read_encoded_value(u8 **p, u8 *end, u64 *val,
+				   u8 encoding)
+{
+	u8 *cur = *p;
+	*val = 0;
+
+	switch (encoding) {
+	case DW_EH_PE_omit:
+		*val = 0;
+		goto out;
+	case DW_EH_PE_ptr:
+		*val = dw_read(cur, unsigned long, end);
+		goto out;
+	default:
+		break;
+	}
+
+	switch (encoding & DW_EH_PE_APPL_MASK) {
+	case DW_EH_PE_absptr:
+		break;
+	case DW_EH_PE_pcrel:
+		*val = (unsigned long) cur;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if ((encoding & 0x07) == 0x00)
+		encoding |= DW_EH_PE_udata4;
+
+	switch (encoding & DW_EH_PE_FORMAT_MASK) {
+	case DW_EH_PE_sdata4:
+		*val += dw_read(cur, s32, end);
+		break;
+	case DW_EH_PE_udata4:
+		*val += dw_read(cur, u32, end);
+		break;
+	case DW_EH_PE_sdata8:
+		*val += dw_read(cur, s64, end);
+		break;
+	case DW_EH_PE_udata8:
+		*val += dw_read(cur, u64, end);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+ out:
+	*p = cur;
+	return 0;
+}
+
+#define dw_read_encoded_value(ptr, end, enc) ({			\
+	u64 __v;						\
+	if (__dw_read_encoded_value(&ptr, end, &__v, enc)) {	\
+		return -EINVAL;                                 \
+	}                                                       \
+	__v;                                                    \
+	})
+
+static Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
+				    GElf_Shdr *shp, const char *name)
+{
+	Elf_Scn *sec = NULL;
+
+	while ((sec = elf_nextscn(elf, sec)) != NULL) {
+		char *str;
+
+		gelf_getshdr(sec, shp);
+		str = elf_strptr(elf, ep->e_shstrndx, shp->sh_name);
+		if (!strcmp(name, str))
+			break;
+	}
+
+	return sec;
+}
+
+static u64 elf_section_offset(int fd, const char *name)
+{
+	Elf *elf;
+	GElf_Ehdr ehdr;
+	GElf_Shdr shdr;
+	u64 offset = 0;
+
+	elf = elf_begin(fd, PERF_ELF_C_READ_MMAP, NULL);
+	if (elf == NULL)
+		return 0;
+
+	do {
+		if (gelf_getehdr(elf, &ehdr) == NULL)
+			break;
+
+		if (!elf_section_by_name(elf, &ehdr, &shdr, name))
+			break;
+
+		offset = shdr.sh_offset;
+	} while (0);
+
+	elf_end(elf);
+	return offset;
+}
+
+struct table_entry {
+	u32 start_ip_offset;
+	u32 fde_offset;
+};
+
+struct eh_frame_hdr {
+	unsigned char version;
+	unsigned char eh_frame_ptr_enc;
+	unsigned char fde_count_enc;
+	unsigned char table_enc;
+
+	/*
+	 * The rest of the header is variable-length and consists of the
+	 * following members:
+	 *
+	 *	encoded_t eh_frame_ptr;
+	 *	encoded_t fde_count;
+	 */
+
+	/* A single encoded pointer should not be more than 8 bytes. */
+	u64 enc[2];
+
+	/*
+	 * struct {
+	 *    encoded_t start_ip;
+	 *    encoded_t fde_addr;
+	 * } binary_search_table[fde_count];
+	 */
+	char data[0];
+} __packed;
+
+static int unwind_spec_ehframe(struct dso *dso, struct machine *machine,
+			       u64 offset, u64 *table_data, u64 *segbase,
+			       u64 *fde_count)
+{
+	struct eh_frame_hdr hdr;
+	u8 *enc = (u8 *) &hdr.enc;
+	u8 *end = (u8 *) &hdr.data;
+	ssize_t r;
+
+	r = dso__data_read_offset(dso, machine, offset,
+				  (u8 *) &hdr, sizeof(hdr));
+	if (r != sizeof(hdr))
+		return -EINVAL;
+
+	/* We don't need eh_frame_ptr, just skip it. */
+	dw_read_encoded_value(enc, end, hdr.eh_frame_ptr_enc);
+
+	*fde_count  = dw_read_encoded_value(enc, end, hdr.fde_count_enc);
+	*segbase    = offset;
+	*table_data = (enc - (u8 *) &hdr) + offset;
+	return 0;
+}
+
+static int read_unwind_spec(struct dso *dso, struct machine *machine,
+			    u64 *table_data, u64 *segbase, u64 *fde_count)
+{
+	int ret = -EINVAL, fd;
+	u64 offset;
+
+	fd = dso__data_fd(dso, machine);
+	if (fd < 0)
+		return -EINVAL;
+
+	offset = elf_section_offset(fd, ".eh_frame_hdr");
+	close(fd);
+
+	if (offset)
+		ret = unwind_spec_ehframe(dso, machine, offset,
+					  table_data, segbase,
+					  fde_count);
+
+	/* TODO .debug_frame check if eh_frame_hdr fails */
+	return ret;
+}
+
+static struct map *find_map(unw_word_t ip, struct unwind_info *ui)
+{
+	struct addr_location al;
+
+	thread__find_addr_map(ui->thread, ui->machine, PERF_RECORD_MISC_USER,
+			      MAP__FUNCTION, ip, &al);
+	return al.map;
+}
+
+static int
+find_proc_info(unw_addr_space_t as, unw_word_t ip, unw_proc_info_t *pi,
+	       int need_unwind_info, void *arg)
+{
+	struct unwind_info *ui = arg;
+	struct map *map;
+	unw_dyn_info_t di;
+	u64 table_data, segbase, fde_count;
+
+	map = find_map(ip, ui);
+	if (!map || !map->dso)
+		return -EINVAL;
+
+	if (read_unwind_spec(map->dso, ui->machine,
+			     &table_data, &segbase, &fde_count))
+		return -EINVAL;
+
+	memset(&di, 0, sizeof(di));
+	di.format   = UNW_INFO_FORMAT_REMOTE_TABLE;
+	di.start_ip = map->start;
+	di.end_ip   = map->end;
+	di.u.rti.segbase    = map->start + segbase;
+	di.u.rti.table_data = map->start + table_data;
+	di.u.rti.table_len  = fde_count * sizeof(struct table_entry)
+			      / sizeof(unw_word_t);
+	return dwarf_search_unwind_table(as, ip, &di, pi,
+					 need_unwind_info, arg);
+}
+
+static int access_fpreg(unw_addr_space_t __used as, unw_regnum_t __used num,
+			unw_fpreg_t __used *val, int __used __write,
+			void __used *arg)
+{
+	pr_err("unwind: access_fpreg unsupported\n");
+	return -UNW_EINVAL;
+}
+
+static int get_dyn_info_list_addr(unw_addr_space_t __used as,
+				  unw_word_t __used *dil_addr,
+				  void __used *arg)
+{
+	return -UNW_ENOINFO;
+}
+
+static int resume(unw_addr_space_t __used as, unw_cursor_t __used *cu,
+		  void __used *arg)
+{
+	pr_err("unwind: resume unsupported\n");
+	return -UNW_EINVAL;
+}
+
+static int
+get_proc_name(unw_addr_space_t __used as, unw_word_t __used addr,
+		char __used *bufp, size_t __used buf_len,
+		unw_word_t __used *offp, void __used *arg)
+{
+	pr_err("unwind: get_proc_name unsupported\n");
+	return -UNW_EINVAL;
+}
+
+static int access_dso_mem(struct unwind_info *ui, unw_word_t addr,
+			  unw_word_t *data)
+{
+	struct addr_location al;
+	ssize_t size;
+
+	thread__find_addr_map(ui->thread, ui->machine, PERF_RECORD_MISC_USER,
+			      MAP__FUNCTION, addr, &al);
+	if (!al.map) {
+		pr_debug("unwind: no map for %lx\n", (unsigned long)addr);
+		return -1;
+	}
+
+	if (!al.map->dso)
+		return -1;
+
+	size = dso__data_read_addr(al.map->dso, al.map, ui->machine,
+				   addr, (u8 *) data, sizeof(*data));
+
+	return !(size == sizeof(*data));
+}
+
+static int reg_value(unw_word_t *valp, struct user_regs *regs, int id,
+		     u64 sample_regs)
+{
+	int i, idx = 0;
+
+	if (!(sample_regs & (1 << id)))
+		return -EINVAL;
+
+	for (i = 0; i < id; i++) {
+		if (sample_regs & (1 << i))
+			idx++;
+	}
+
+	*valp = regs->regs[idx];
+	return 0;
+}
+
+static int access_mem(unw_addr_space_t __used as,
+		      unw_word_t addr, unw_word_t *valp,
+		      int __write, void *arg)
+{
+	struct unwind_info *ui = arg;
+	struct user_stack_dump *stack = &ui->sample->stack;
+	unw_word_t start, end;
+	int offset;
+	int ret;
+
+	/* Don't support write, probably not needed. */
+	if (__write || !stack || !ui->sample->uregs.version) {
+		*valp = 0;
+		return 0;
+	}
+
+	ret = reg_value(&start, &ui->sample->uregs, PERF_REG_SP,
+			ui->sample_uregs);
+	if (ret)
+		return ret;
+
+	end = start + stack->size;
+
+	/* Check overflow. */
+	if (addr + sizeof(unw_word_t) < addr)
+		return -EINVAL;
+
+	if (addr < start || addr + sizeof(unw_word_t) >= end) {
+		ret = access_dso_mem(ui, addr, valp);
+		if (ret) {
+			pr_debug("unwind: access_mem %p not inside range %p-%p\n",
+				(void *)addr, (void *)start, (void *)end);
+			*valp = 0;
+			return ret;
+		}
+		return 0;
+	}
+
+	offset = addr - start;
+	*valp  = *(unw_word_t *)&stack->data[offset];
+	pr_debug("unwind: access_mem %p %lx\n",
+		 (void *)addr, (unsigned long)*valp);
+	return 0;
+}
+
+static int access_reg(unw_addr_space_t __used as,
+		      unw_regnum_t regnum, unw_word_t *valp,
+		      int __write, void *arg)
+{
+	struct unwind_info *ui = arg;
+	int id, ret;
+
+	/* Don't support write, I suspect we don't need it. */
+	if (__write) {
+		pr_err("unwind: access_reg w %d\n", regnum);
+		return 0;
+	}
+
+	if (!ui->sample->uregs.version) {
+		*valp = 0;
+		return 0;
+	}
+
+	id = unwind__arch_reg_id(regnum);
+	if (id < 0)
+		return -EINVAL;
+
+	ret = reg_value(valp, &ui->sample->uregs, id, ui->sample_uregs);
+	if (ret) {
+		pr_err("unwind: can't read reg %d\n", regnum);
+		return ret;
+	}
+
+	pr_debug("unwind: reg %d, val %lx\n", regnum, (unsigned long)*valp);
+	return 0;
+}
+
+static void put_unwind_info(unw_addr_space_t __used as,
+			    unw_proc_info_t *pi __used,
+			    void *arg __used)
+{
+	pr_debug("unwind: put_unwind_info called\n");
+}
+
+static int entry(u64 ip, struct thread *thread, struct machine *machine,
+		 unwind_entry_cb_t cb, void *arg)
+{
+	struct unwind_entry e;
+	struct addr_location al;
+
+	thread__find_addr_location(thread, machine,
+				   PERF_RECORD_MISC_USER,
+				   MAP__FUNCTION, ip, &al, NULL);
+
+	e.ip = ip;
+	e.map = al.map;
+	e.sym = al.sym;
+
+	pr_debug("unwind: %s:ip = 0x%" PRIx64 " (0x%" PRIx64 ")\n",
+		 al.sym ? al.sym->name : "[]",
+		 ip,
+		 al.map ? al.map->map_ip(al.map, ip) : (u64) 0);
+
+	return cb(&e, arg);
+}
+
+static void display_error(int err)
+{
+	switch (err) {
+	case UNW_EINVAL:
+		pr_err("unwind: Only supports local.\n");
+		break;
+	case UNW_EUNSPEC:
+		pr_err("unwind: Unspecified error.\n");
+		break;
+	case UNW_EBADREG:
+		pr_err("unwind: Register unavailable.\n");
+		break;
+	default:
+		break;
+	}
+}
+
+static unw_accessors_t accessors = {
+	.find_proc_info		= find_proc_info,
+	.put_unwind_info	= put_unwind_info,
+	.get_dyn_info_list_addr	= get_dyn_info_list_addr,
+	.access_mem		= access_mem,
+	.access_reg		= access_reg,
+	.access_fpreg		= access_fpreg,
+	.resume			= resume,
+	.get_proc_name		= get_proc_name,
+};
+
+static int get_entries(struct unwind_info *ui, unwind_entry_cb_t cb,
+		       void *arg)
+{
+	unw_addr_space_t addr_space;
+	unw_cursor_t c;
+	int ret;
+
+	addr_space = unw_create_addr_space(&accessors, 0);
+	if (!addr_space) {
+		pr_err("unwind: Can't create unwind address space.\n");
+		return -ENOMEM;
+	}
+
+	ret = unw_init_remote(&c, addr_space, ui);
+	if (ret)
+		display_error(ret);
+
+	while (!ret && (unw_step(&c) > 0)) {
+		unw_word_t ip;
+
+		unw_get_reg(&c, UNW_REG_IP, &ip);
+		ret = entry(ip, ui->thread, ui->machine, cb, arg);
+	}
+
+	unw_destroy_addr_space(addr_space);
+	return ret;
+}
+
+int unwind__get_entries(unwind_entry_cb_t cb, void *arg,
+			struct machine *machine, struct thread *thread,
+			u64 sample_uregs, struct perf_sample *data)
+{
+	unw_word_t ip;
+	struct unwind_info ui = {
+		.sample       = data,
+		.sample_uregs = sample_uregs,
+		.thread       = thread,
+		.machine      = machine,
+	};
+	int ret;
+
+	if (!data->uregs.version)
+		return -EINVAL;
+
+	ret = reg_value(&ip, &data->uregs, PERF_REG_IP, sample_uregs);
+	if (ret)
+		return ret;
+
+	ret = entry(ip, thread, machine, cb, arg);
+	if (ret)
+		return -ENOMEM;
+
+	return get_entries(&ui, cb, arg);
+}
diff --git a/tools/perf/util/unwind.h b/tools/perf/util/unwind.h
new file mode 100644
index 0000000..919bd6a
--- /dev/null
+++ b/tools/perf/util/unwind.h
@@ -0,0 +1,34 @@
+#ifndef __UNWIND_H
+#define __UNWIND_H
+
+#include "types.h"
+#include "event.h"
+#include "symbol.h"
+
+struct unwind_entry {
+	struct map	*map;
+	struct symbol	*sym;
+	u64		ip;
+};
+
+typedef int (*unwind_entry_cb_t)(struct unwind_entry *entry, void *arg);
+
+#ifndef NO_LIBUNWIND_SUPPORT
+int unwind__get_entries(unwind_entry_cb_t cb, void *arg,
+			struct machine *machine,
+			struct thread *thread,
+			u64 sample_uregs,
+			struct perf_sample *data);
+int unwind__arch_reg_id(int regnum);
+#else
+static inline int
+unwind__get_entries(unwind_entry_cb_t cb __used, void *arg __used,
+		    struct machine *machine __used,
+		    struct thread *thread __used,
+		    u64 sample_uregs __used,
+		    struct perf_sample *data __used)
+{
+	return 0;
+}
+#endif /* NO_LIBUNWIND_SUPPORT */
+#endif /* __UNWIND_H */
-- 
1.7.7.6
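
A note on the reg_value() helper in the unwind code above: the sampled
registers arrive as a sparse dump, so the value for register `id` sits at
the index equal to the number of set mask bits below `id`. A standalone
sketch of that indexing scheme (hypothetical name, not the patch's code):

```c
#include <assert.h>

typedef unsigned long long u64;

/*
 * regs[] holds only the registers whose bit is set in sample_regs,
 * in ascending bit order, so the index of register 'id' is the
 * count of set mask bits below it. Returns 0 on success, -1 if the
 * register was not sampled.
 */
static int sparse_reg_value(u64 *valp, const u64 *regs, int id,
			    u64 sample_regs)
{
	int i, idx = 0;

	if (!(sample_regs & (1ULL << id)))
		return -1;

	for (i = 0; i < id; i++)
		if (sample_regs & (1ULL << i))
			idx++;

	*valp = regs[idx];
	return 0;
}
```

With mask 0x15 (bits 0, 2 and 4 sampled), register 4 is found at index 2
of the dump, and asking for the unsampled register 1 fails.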



* [PATCH 14/16] perf, tool: Support for dwarf mode callchain on perf record
  2012-04-17 11:17 [RFCv2 00/15] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (12 preceding siblings ...)
  2012-04-17 11:17 ` [PATCH 13/16] perf, tool: Support for dwarf cfi unwinding on post processing Jiri Olsa
@ 2012-04-17 11:17 ` Jiri Olsa
  2012-04-17 11:17 ` [PATCH 15/16] perf, tool: Add dso data caching Jiri Olsa
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2012-04-17 11:17 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, rostedt, robert.richter,
	fche, linux-kernel, masami.hiramatsu.pt, drepper, Jiri Olsa

This patch enables perf to use the dwarf unwind code.

It extends the perf record '-g' option with the following arguments:
  'fp'           - provides framepointer based user
                   stack backtrace
  'dwarf[,size]' - provides dwarf (libunwind) based user stack
                   backtrace. The size specifies the size of the
                   user stack dump. If omitted, it defaults to 8192.

If libunwind is found during the perf build, then the 'dwarf'
argument becomes available for the record command. The 'fp' stays
as the default option in any case.

Examples: (perf compiled with libunwind)

   perf record -g dwarf ls
      - provides dwarf unwind with 8192 as stack dump size

   perf record -g dwarf,4096 ls
      - provides dwarf unwind with 4096 as stack dump size

   perf record -g -- ls
   perf record -g fp ls
      - provides frame pointer unwind

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/builtin-record.c |   86 ++++++++++++++++++++++++++++++++++++++++++-
 tools/perf/perf.h           |    9 ++++-
 tools/perf/util/evsel.c     |   10 ++++-
 3 files changed, 101 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 10b1f1f..ba01545 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -31,6 +31,15 @@
 #include <sched.h>
 #include <sys/mman.h>
 
+#define CALLCHAIN_HELP "do call-graph (stack chain/backtrace) recording: "
+
+#ifdef NO_LIBUNWIND_SUPPORT
+static char callchain_help[] = CALLCHAIN_HELP "[fp]";
+#else
+static unsigned long default_stack_dump_size = 8192;
+static char callchain_help[] = CALLCHAIN_HELP "[fp] dwarf";
+#endif
+
 enum write_mode_t {
 	WRITE_FORCE,
 	WRITE_APPEND
@@ -725,6 +734,78 @@ error:
 	return ret;
 }
 
+static int
+parse_callchain_opt(const struct option *opt __used, const char *arg,
+		    int unset)
+{
+	struct perf_record *rec = (struct perf_record *)opt->value;
+	char *tok, *name, *saveptr = NULL;
+	char buf[20];
+	int ret = -1;
+
+	/* --no-call-graph */
+	if (unset)
+		return 0;
+
+	/* We specified default option if none is provided. */
+	BUG_ON(!arg);
+
+	/* We need buffer that we know we can write to. */
+	snprintf(buf, 20, "%s", arg);
+
+	tok = strtok_r((char *)buf, ",", &saveptr);
+	name = tok ? : (char *)buf;
+
+	do {
+		/* Framepointer style */
+		if (!strncmp(name, "fp", sizeof("fp"))) {
+			if (!strtok_r(NULL, ",", &saveptr)) {
+				rec->opts.call_graph = CALLCHAIN_FP;
+				ret = 0;
+			} else
+				pr_err("callchain: No more arguments "
+				       "needed for -g fp\n");
+			break;
+
+#ifndef NO_LIBUNWIND_SUPPORT
+		/* Dwarf style */
+		} else if (!strncmp(name, "dwarf", sizeof("dwarf"))) {
+			ret = 0;
+			rec->opts.call_graph = CALLCHAIN_DWARF;
+			rec->opts.stack_dump_size = default_stack_dump_size;
+
+			tok = strtok_r(NULL, ",", &saveptr);
+			if (tok) {
+				char *endptr;
+				unsigned long size;
+
+				size = strtoul(tok, &endptr, 0);
+				if (*endptr) {
+					pr_err("callchain: Incorrect stack "
+					       "dump size: %s\n", tok);
+					ret = -1;
+				}
+
+				rec->opts.stack_dump_size = size;
+			}
+
+			pr_debug("callchain: stack dump size %lu\n",
+				 rec->opts.stack_dump_size);
+#endif
+		} else {
+			pr_err("callchain: Unknown -g option "
+			       "value: %s\n", name);
+			break;
+		}
+
+	} while (0);
+
+	if (!ret)
+		pr_debug("callchain: type %d\n", rec->opts.call_graph);
+
+	return ret;
+}
+
 static const char * const record_usage[] = {
 	"perf record [<options>] [<command>]",
 	"perf record [<options>] -- <command> [<options>]",
@@ -793,8 +874,9 @@ const struct option record_options[] = {
 		     "number of mmap data pages"),
 	OPT_BOOLEAN(0, "group", &record.opts.group,
 		    "put the counters into a counter group"),
-	OPT_BOOLEAN('g', "call-graph", &record.opts.call_graph,
-		    "do call-graph (stack chain/backtrace) recording"),
+	OPT_CALLBACK_DEFAULT('g', "call-graph", &record, "mode,dump_size",
+			     callchain_help, &parse_callchain_opt,
+			     "fp"),
 	OPT_INCR('v', "verbose", &verbose,
 		    "be more verbose (show counter open errors, etc)"),
 	OPT_BOOLEAN('q', "quiet", &quiet, "don't print any message"),
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 89e3355..1b72438 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -207,11 +207,17 @@ extern const char perf_version_string[];
 
 void pthread__unblock_sigwinch(void);
 
+enum perf_call_graph_mode {
+	CALLCHAIN_NONE,
+	CALLCHAIN_FP,
+	CALLCHAIN_DWARF
+};
+
 struct perf_record_opts {
 	const char   *target_pid;
 	const char   *target_tid;
 	uid_t	     uid;
-	bool	     call_graph;
+	int	     call_graph;
 	bool	     group;
 	bool	     inherit_stat;
 	bool	     no_delay;
@@ -232,6 +238,7 @@ struct perf_record_opts {
 	u64	     default_interval;
 	u64	     user_interval;
 	const char   *cpu_list;
+	unsigned long stack_dump_size;
 };
 
 #endif
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 7ee47c1..f2c4ca6 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -15,6 +15,7 @@
 #include "util.h"
 #include "cpumap.h"
 #include "thread_map.h"
+#include "perf_regs.h"
 
 #define FD(e, x, y) (*(int *)xyarray__entry(e->fd, x, y))
 #define GROUP_FD(group_fd, cpu) (*(int *)xyarray__entry(group_fd, cpu, 0))
@@ -104,9 +105,16 @@ void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
 		attr->mmap_data = track;
 	}
 
-	if (opts->call_graph)
+	if (opts->call_graph) {
 		attr->sample_type	|= PERF_SAMPLE_CALLCHAIN;
 
+		if (opts->call_graph == CALLCHAIN_DWARF) {
+			attr->user_regs = PERF_REGS_MASK;
+			attr->ustack_dump_size = opts->stack_dump_size;
+			attr->exclude_user_callchain = 1;
+		}
+	}
+
 	if (opts->system_wide)
 		attr->sample_type	|= PERF_SAMPLE_CPU;
 
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 15/16] perf, tool: Add dso data caching
  2012-04-17 11:17 [RFCv2 00/15] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (13 preceding siblings ...)
  2012-04-17 11:17 ` [PATCH 14/16] perf, tool: Support for dwarf mode callchain on perf record Jiri Olsa
@ 2012-04-17 11:17 ` Jiri Olsa
  2012-04-17 11:17 ` [PATCH 16/16] perf, tool: Add dso data caching tests Jiri Olsa
  2012-04-18  6:51 ` [RFCv2 00/15] perf: Add backtrace post dwarf unwind Frederic Weisbecker
  16 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2012-04-17 11:17 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, rostedt, robert.richter,
	fche, linux-kernel, masami.hiramatsu.pt, drepper, Jiri Olsa

Adding DSO data caching so we don't need to open/read/close the
file each time we want DSO data.

The DSO data caching affects the following functions:
  dso__data_read_offset
  dso__data_read_addr

Each DSO read tries to find the data (based on the offset) inside
the cache. If it's not present, the cache is filled from the file
and the data is returned. If it is present, the data is returned
with no file read.

Each data read is cached by reading a cache-page-sized/aligned
amount of DSO data. The cache page size is hardcoded to 4096 bytes.
The cache uses an RB tree with the file offset as the sort key.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/util/symbol.c |  154 ++++++++++++++++++++++++++++++++++++++++------
 tools/perf/util/symbol.h |   11 +++
 2 files changed, 147 insertions(+), 18 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 9c0fa32..b7b0c2a 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -29,6 +29,7 @@
 #define NT_GNU_BUILD_ID 3
 #endif
 
+static void data_cache__free(struct rb_root *root);
 static bool dso__build_id_equal(const struct dso *dso, u8 *build_id);
 static int elf_read_build_id(Elf *elf, void *bf, size_t size);
 static void dsos__add(struct list_head *head, struct dso *dso);
@@ -342,6 +343,7 @@ struct dso *dso__new(const char *name)
 		dso__set_short_name(dso, dso->name);
 		for (i = 0; i < MAP__NR_TYPES; ++i)
 			dso->symbols[i] = dso->symbol_names[i] = RB_ROOT;
+		dso->data_cache = RB_ROOT;
 		dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
 		dso->data_type   = DSO_BINARY_TYPE__NOT_FOUND;
 		dso->loaded = 0;
@@ -376,6 +378,7 @@ void dso__delete(struct dso *dso)
 		free((char *)dso->short_name);
 	if (dso->lname_alloc)
 		free(dso->long_name);
+	data_cache__free(&dso->data_cache);
 	free(dso);
 }
 
@@ -2893,22 +2896,87 @@ int dso__data_fd(struct dso *dso, struct machine *machine)
 	return -EINVAL;
 }
 
-static ssize_t dso_cache_read(struct dso *dso __used, u64 offset __used,
-			      u8 *data __used, ssize_t size __used)
+static void
+data_cache__free(struct rb_root *root)
 {
-	return -EINVAL;
+	struct rb_node *next = rb_first(root);
+
+	while (next) {
+		struct dso__data_cache *cache;
+
+		cache = rb_entry(next, struct dso__data_cache, rb_node);
+		next = rb_next(&cache->rb_node);
+		rb_erase(&cache->rb_node, root);
+		free(cache);
+	}
 }
 
-static int dso_cache_add(struct dso *dso __used, u64 offset __used,
-			 u8 *data __used, ssize_t size __used)
+static struct dso__data_cache*
+data_cache__find(struct rb_root *root, u64 offset)
 {
-	return 0;
+	struct rb_node **p = &root->rb_node;
+	struct rb_node *parent = NULL;
+	struct dso__data_cache *cache;
+
+	while (*p != NULL) {
+		u64 end;
+
+		parent = *p;
+		cache = rb_entry(parent, struct dso__data_cache, rb_node);
+		end = cache->offset + DSO__DATA_CACHE_SIZE;
+
+		if (offset < cache->offset)
+			p = &(*p)->rb_left;
+		else if (offset >= end)
+			p = &(*p)->rb_right;
+		else
+			return cache;
+	}
+	return NULL;
+}
+
+static void
+data_cache__insert(struct rb_root *root, struct dso__data_cache *new)
+{
+	struct rb_node **p = &root->rb_node;
+	struct rb_node *parent = NULL;
+	struct dso__data_cache *cache;
+	u64 offset = new->offset;
+
+	while (*p != NULL) {
+		u64 end;
+
+		parent = *p;
+		cache = rb_entry(parent, struct dso__data_cache, rb_node);
+		end = cache->offset + DSO__DATA_CACHE_SIZE;
+
+		if (offset < cache->offset)
+			p = &(*p)->rb_left;
+		else if (offset >= end)
+			p = &(*p)->rb_right;
+	}
+
+	rb_link_node(&new->rb_node, parent, p);
+	rb_insert_color(&new->rb_node, root);
+}
+
+static ssize_t
+data_cache__data(struct dso__data_cache *cache, u64 offset,
+		 u8 *data, u64 size)
+{
+	u64 cache_offset = offset - cache->offset;
+	u64 cache_size   = min(cache->size - cache_offset, size);
+
+	memcpy(data, cache->data + cache_offset, cache_size);
+	return cache_size;
 }
 
-static ssize_t read_dso_data(struct dso *dso, struct machine *machine,
-		     u64 offset, u8 *data, ssize_t size)
+static ssize_t
+data_cache__read(struct dso *dso, struct machine *machine,
+		 u64 offset, u8 *data, ssize_t size)
 {
-	ssize_t rsize = -1;
+	struct dso__data_cache *cache;
+	ssize_t ret;
 	int fd;
 
 	fd = dso__data_fd(dso, machine);
@@ -2916,28 +2984,78 @@ static ssize_t read_dso_data(struct dso *dso, struct machine *machine,
 		return -1;
 
 	do {
-		if (-1 == lseek(fd, offset, SEEK_SET))
+		u64 cache_offset;
+
+		ret = -ENOMEM;
+
+		cache = zalloc(sizeof(*cache) + DSO__DATA_CACHE_SIZE);
+		if (!cache)
 			break;
 
-		rsize = read(fd, data, size);
-		if (-1 == rsize)
+		cache_offset = offset & DSO__DATA_CACHE_MASK;
+		ret = -EINVAL;
+
+		if (-1 == lseek(fd, cache_offset, SEEK_SET))
 			break;
 
-		if (dso_cache_add(dso, offset, data, size))
-			pr_err("Failed to add data int dso cache.");
+		ret = read(fd, cache->data, DSO__DATA_CACHE_SIZE);
+		if (ret <= 0)
+			break;
+
+		cache->offset = cache_offset;
+		cache->size   = ret;
+		data_cache__insert(&dso->data_cache, cache);
+
+		ret = data_cache__data(cache, offset, data, size);
 
 	} while (0);
 
+	if (ret <= 0)
+		free(cache);
+
 	close(fd);
-	return rsize;
+	return ret;
+}
+
+static ssize_t dso_cache_read(struct dso *dso, struct machine *machine,
+			      u64 offset, u8 *data, ssize_t size)
+{
+	struct dso__data_cache *cache;
+
+	cache = data_cache__find(&dso->data_cache, offset);
+	if (cache)
+		return data_cache__data(cache, offset, data, size);
+	else
+		return data_cache__read(dso, machine, offset, data, size);
 }
 
 ssize_t dso__data_read_offset(struct dso *dso, struct machine *machine,
 			      u64 offset, u8 *data, ssize_t size)
 {
-	if (dso_cache_read(dso, offset, data, size))
-		return read_dso_data(dso, machine, offset, data, size);
-	return 0;
+	ssize_t r = 0;
+	u8 *p = data;
+
+	do {
+		ssize_t ret;
+
+		ret = dso_cache_read(dso, machine, offset, p, size);
+		if (ret < 0)
+			return ret;
+
+		/* Reached EOF, return what we have. */
+		if (!ret)
+			break;
+
+		BUG_ON(ret > size);
+
+		r      += ret;
+		p      += ret;
+		offset += ret;
+		size   -= ret;
+
+	} while (size);
+
+	return r;
 }
 
 ssize_t dso__data_read_addr(struct dso *dso, struct map *map,
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index b62321f..e5744c0 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -169,10 +169,21 @@ enum dso_kernel_type {
 	DSO_TYPE_GUEST_KERNEL
 };
 
+#define DSO__DATA_CACHE_SIZE 4096
+#define DSO__DATA_CACHE_MASK ~(DSO__DATA_CACHE_SIZE - 1)
+
+struct dso__data_cache {
+	struct rb_node	rb_node;
+	u64 offset;
+	u64 size;
+	char data[0];
+};
+
 struct dso {
 	struct list_head node;
 	struct rb_root	 symbols[MAP__NR_TYPES];
 	struct rb_root	 symbol_names[MAP__NR_TYPES];
+	struct rb_root	 data_cache;
 	enum dso_kernel_type	kernel;
 	enum dso_binary_type	symtab_type;
 	enum dso_binary_type	data_type;
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 16/16] perf, tool: Add dso data caching tests
  2012-04-17 11:17 [RFCv2 00/15] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (14 preceding siblings ...)
  2012-04-17 11:17 ` [PATCH 15/16] perf, tool: Add dso data caching Jiri Olsa
@ 2012-04-17 11:17 ` Jiri Olsa
  2012-04-18  6:51 ` [RFCv2 00/15] perf: Add backtrace post dwarf unwind Frederic Weisbecker
  16 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2012-04-17 11:17 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, rostedt, robert.richter,
	fche, linux-kernel, masami.hiramatsu.pt, drepper, Jiri Olsa

Adding an automated test for DSO data reading. It tests raw/cached
reads from different file/cache locations.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/Makefile        |    1 +
 tools/perf/builtin-test.c  |    4 +
 tools/perf/util/dso-test.c |  154 ++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/symbol.h   |    1 +
 4 files changed, 160 insertions(+), 0 deletions(-)
 create mode 100644 tools/perf/util/dso-test.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 3614b1a..dc56c8e 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -342,6 +342,7 @@ LIB_OBJS += $(OUTPUT)util/usage.o
 LIB_OBJS += $(OUTPUT)util/wrapper.o
 LIB_OBJS += $(OUTPUT)util/sigchain.o
 LIB_OBJS += $(OUTPUT)util/symbol.o
+LIB_OBJS += $(OUTPUT)util/dso-test.o
 LIB_OBJS += $(OUTPUT)util/color.o
 LIB_OBJS += $(OUTPUT)util/pager.o
 LIB_OBJS += $(OUTPUT)util/header.o
diff --git a/tools/perf/builtin-test.c b/tools/perf/builtin-test.c
index a434aaa..83ff5ba 100644
--- a/tools/perf/builtin-test.c
+++ b/tools/perf/builtin-test.c
@@ -1662,6 +1662,10 @@ static struct test {
 		.func = test__perf_pmu,
 	},
 	{
+		.desc = "Test dso data interface",
+		.func = dso__test_data,
+	},
+	{
 		.func = NULL,
 	},
 };
diff --git a/tools/perf/util/dso-test.c b/tools/perf/util/dso-test.c
new file mode 100644
index 0000000..ec62977
--- /dev/null
+++ b/tools/perf/util/dso-test.c
@@ -0,0 +1,154 @@
+
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <string.h>
+
+#include "symbol.h"
+
+#define TEST_ASSERT_VAL(text, cond) \
+do { \
+	if (!(cond)) { \
+		pr_debug("FAILED %s:%d %s\n", __FILE__, __LINE__, text); \
+		return -1; \
+	} \
+} while (0)
+
+static char *test_file(int size)
+{
+	static char buf_templ[] = "/tmp/test-XXXXXX";
+	char *templ = buf_templ;
+	int fd, i;
+	unsigned char *buf;
+
+	fd = mkostemp(templ, O_CREAT|O_WRONLY|O_TRUNC);
+
+	buf = malloc(size);
+	if (!buf) {
+		close(fd);
+		return NULL;
+	}
+
+	for (i = 0; i < size; i++)
+		buf[i] = (unsigned char) ((int) i % 10);
+
+	if (size != write(fd, buf, size))
+		templ = NULL;
+
+	close(fd);
+	return templ;
+}
+
+#define TEST_FILE_SIZE (DSO__DATA_CACHE_SIZE * 20)
+
+struct test_data__offset {
+	off_t offset;
+	u8 data[10];
+	int size;
+};
+
+struct test_data__offset offsets[] = {
+	/* Fill first cache page. */
+	{
+		.offset = 10,
+		.data   = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 },
+		.size   = 10,
+	},
+	/* Read first cache page. */
+	{
+		.offset = 10,
+		.data   = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 },
+		.size   = 10,
+	},
+	/* Fill cache boundary pages. */
+	{
+		.offset = DSO__DATA_CACHE_SIZE - DSO__DATA_CACHE_SIZE % 10,
+		.data   = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 },
+		.size   = 10,
+	},
+	/* Read cache boundary pages. */
+	{
+		.offset = DSO__DATA_CACHE_SIZE - DSO__DATA_CACHE_SIZE % 10,
+		.data   = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 },
+		.size   = 10,
+	},
+	/* Fill final cache page. */
+	{
+		.offset = TEST_FILE_SIZE - 10,
+		.data   = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 },
+		.size   = 10,
+	},
+	/* Read final cache page. */
+	{
+		.offset = TEST_FILE_SIZE - 10,
+		.data   = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 },
+		.size   = 10,
+	},
+	/* Read final cache page. */
+	{
+		.offset = TEST_FILE_SIZE - 3,
+		.data   = { 7, 8, 9, 0, 0, 0, 0, 0, 0, 0 },
+		.size   = 3,
+	},
+};
+
+#define OFFSETS_CNT (sizeof(offsets) / sizeof(struct test_data__offset))
+
+int dso__test_data(void)
+{
+	struct machine machine;
+	struct dso *dso;
+	char *file = test_file(TEST_FILE_SIZE);
+	int i;
+
+	TEST_ASSERT_VAL("No test file", file);
+
+	memset(&machine, 0, sizeof(machine));
+
+	dso = dso__new((const char *)file);
+
+	/* Basic 10 bytes tests. */
+	for (i = 0; i < (int) OFFSETS_CNT; i++) {
+		struct test_data__offset *data = &offsets[i];
+		ssize_t size;
+		u8 buf[10];
+
+		memset(buf, 0, 10);
+		size = dso__data_read_offset(dso, &machine, data->offset,
+				     buf, 10);
+
+		TEST_ASSERT_VAL("Wrong size", size == data->size);
+		TEST_ASSERT_VAL("Wrong data", !memcmp(buf, data->data, 10));
+	}
+
+	/* Read cross multiple cache pages. */
+	{
+		ssize_t size;
+		int c;
+		u8 *buf;
+
+		buf = malloc(TEST_FILE_SIZE);
+		TEST_ASSERT_VAL("ENOMEM\n", buf);
+
+		/* First iteration to fill caches, second one to read them. */
+		for (c = 0; c < 2; c++) {
+			memset(buf, 0, TEST_FILE_SIZE);
+			size = dso__data_read_offset(dso, &machine, 10,
+						     buf, TEST_FILE_SIZE);
+
+			TEST_ASSERT_VAL("Wrong size",
+				size == (TEST_FILE_SIZE - 10));
+
+			for (i = 0; i < size; i++)
+				TEST_ASSERT_VAL("Wrong data",
+					buf[i] == (i % 10));
+		}
+
+		free(buf);
+	}
+
+	dso__delete(dso);
+	unlink(file);
+	return 0;
+}
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index e5744c0..897b35e 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -288,4 +288,5 @@ ssize_t dso__data_read_offset(struct dso *dso, struct machine *machine,
 ssize_t dso__data_read_addr(struct dso *dso, struct map *map,
 			    struct machine *machine, u64 addr,
 			    u8 *data, ssize_t size);
+int dso__test_data(void);
 #endif /* __PERF_SYMBOL */
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RFCv2 00/15] perf: Add backtrace post dwarf unwind
  2012-04-17 11:17 [RFCv2 00/15] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (15 preceding siblings ...)
  2012-04-17 11:17 ` [PATCH 16/16] perf, tool: Add dso data caching tests Jiri Olsa
@ 2012-04-18  6:51 ` Frederic Weisbecker
  16 siblings, 0 replies; 28+ messages in thread
From: Frederic Weisbecker @ 2012-04-18  6:51 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, eranian, gorcunov,
	tzanussi, mhiramat, rostedt, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper

On Tue, Apr 17, 2012 at 01:17:05PM +0200, Jiri Olsa wrote:
> hi,
> sending another RFC version. There are some fixies and some
> yet unresolved issues outlined below.
> 
> thanks for comments,
> jirka
> 
> v2 changes:
> 	02/16 - fixed register enums
> 	12/16 - fixed the perf_evlist__* names
> 	14/16 - 'fp' stays default even if compiled with dwarf unwind
> 	15/16 - added cache to the DSO data interface, it makes the
>                 dwarf unwind real fast now
> 
> v2 not solved yet:
>   1) generic user regs capturing
>      discussed in here:
>        http://marc.info/?l=linux-kernel&m=133304076114629&w=2
> 
>      Looks like we could have a generic way of getting registers
>      on samples and support more than only user level registers.
>      But looks like it wasn't completely decided what register
>      levels are worth to store..

Right, but I think this is outside the scope of this patchset.
Especially without a proper use case in the tools, we can only make
mistakes by implementing early support for other kinds of reg dumps
in samples.

> 
>      It looks to me that each level could add its own registers
>      mask and it'd be used/saved if the mask is non zero.
>      The same way the user level regs are dealt with now ;).
> 
>      Or we could add something that Stephane came up with:
>        attr->sample_type |= PERF_SAMPLE_REGS
>        attr->sample_regs = EAX | EBX | EDI | ESI |.....
>        attr->sample_reg_mode = { INTR, PRECISE, USER }

If we do this we won't be able to record user regs and, say, precise regs
at the same time. These should be different sample types that can be set
at the same time: PERF_SAMPLE_UREGS | PERF_SAMPLE_PRECISE_REGS | ...

And then have attr->sample_uregs, attr->sample_pregs, ...

This is scary because I fear we'll need to increase the size of attr
and the ABI is then going to become incompatible. We'll probably need
a way to extend it properly.

Anyway, there is no need to implement support for regs samples (other
than user) in this patchset. But we do indeed need to plan so that the
ABI is ready to host that later.

> 
>      But I guess we need to decide what levels make sense
>      to store first. Also if there's like 2 or 3 only, it might be
>      better to use separate masks as pointed out above.
> 
>   2) compat task handling
>      we dont handle compat tasks unwinding currently, we probably want
>      to make it part of the 1) solution or add something like:
> 	__u64 user_regs_compat;

Yep. One question I have for those who know architecture support well
in general: do we have some archs that support more than just one
kind of compat/emulated mode?

If so I think we can't rely on that native/compat dichotomy but rather
we need to treat all these modes as different architectures.

In this case, attr->sample_regs would only make sense with some arch ID.

Anybody, more clues on this?

> 
>      Handling the compat task would also need some other changes in the
>      unwind code and I'm not completely sure how libunwind deals with that,
>      need to check.

Indeed.

> 
>      How much do we want this? ;)

I think we do. But there is no emergency. We can do it incrementally once
we have native support. But then again I think we need to ensure we get
things right so the ABI is able to host that in the future.

> 
>   3) registers version names
>      this one is now probably connected to 1) and 2) ;)
>      I kept the register version, since I think it might be usefull
>      for dealing with compat tasks - to recognize what type of registers
>      were stored

Right. Hm, I need to look at this deeper.

Thanks!

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 02/16] perf: Unified API to record selective sets of arch registers
  2012-04-17 11:17 ` [PATCH 02/16] perf: Unified API to record selective sets of arch registers Jiri Olsa
@ 2012-04-23 10:10   ` Stephane Eranian
  2012-04-23 10:33     ` Jiri Olsa
  0 siblings, 1 reply; 28+ messages in thread
From: Stephane Eranian @ 2012-04-23 10:10 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, rostedt, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper

On Tue, Apr 17, 2012 at 1:17 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> This brings a new API to help the selective dump of registers on
> event sampling, and its implementation in x86.
>
> - The informations about the desired registers will be passed
>  to a single u64 mask. It's up to the architecture to map the
>  registers into the mask bits.
>
> - The architecture must provide a non-zero and unique id to
>  identify the origin of a register set because interpreting a
>  register dump requires to know from which architecture it comes.
>  The achitecture is considered different between the 32 and 64 bits
>  version. x86-32 has the id 1, x86-64 has the id 2.
>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
>  arch/x86/include/asm/perf_regs.h    |   16 ++++++
>  arch/x86/include/asm/perf_regs_32.h |   84 +++++++++++++++++++++++++++++
>  arch/x86/include/asm/perf_regs_64.h |  101 +++++++++++++++++++++++++++++++++++
>  include/asm-generic/perf_regs.h     |   23 ++++++++
>  4 files changed, 224 insertions(+), 0 deletions(-)
>  create mode 100644 arch/x86/include/asm/perf_regs.h
>  create mode 100644 arch/x86/include/asm/perf_regs_32.h
>  create mode 100644 arch/x86/include/asm/perf_regs_64.h
>  create mode 100644 include/asm-generic/perf_regs.h
>
> diff --git a/arch/x86/include/asm/perf_regs.h b/arch/x86/include/asm/perf_regs.h
> new file mode 100644
> index 0000000..80b7fbe
> --- /dev/null
> +++ b/arch/x86/include/asm/perf_regs.h
> @@ -0,0 +1,16 @@
> +#ifndef _ASM_X86_PERF_REGS_H
> +#define _ASM_X86_PERF_REGS_H
> +
> +enum {
> +       PERF_REGS_VERSION_NONE   = 0UL,
> +       PERF_REGS_VERSION_X86_32 = 1UL,
> +       PERF_REGS_VERSION_X86_64 = 2UL,
> +};
> +
I don't really like the term VERSION here. It's not a versioning
problem you're trying to solve. It's an ABI problem, unless I am
mistaken. You should rename to PERF_REGS_ABI_X86_32 and
PERF_REGS_ABI_X86_64.

I assume the NONE is here to cover the case where you don't
have a user machine state, i.e., hit a kernel thread. Is that right?


> +#ifdef CONFIG_X86_32
> +#include "perf_regs_32.h"
> +#else
> +#include "perf_regs_64.h"
> +#endif
> +
How are you going to deal with 32-bit binaries sampled on a 64-bit system?

> +#endif /* _ASM_X86_PERF_REGS_H */
> diff --git a/arch/x86/include/asm/perf_regs_32.h b/arch/x86/include/asm/perf_regs_32.h
> new file mode 100644
> index 0000000..3c5aa80
> --- /dev/null
> +++ b/arch/x86/include/asm/perf_regs_32.h
> @@ -0,0 +1,84 @@
> +#ifndef _ASM_X86_PERF_REGS_32_H
> +#define _ASM_X86_PERF_REGS_32_H
> +
> +enum perf_event_x86_32_regs {
> +       PERF_X86_32_REG_EAX,
> +       PERF_X86_32_REG_EBX,
> +       PERF_X86_32_REG_ECX,
> +       PERF_X86_32_REG_EDX,
> +       PERF_X86_32_REG_ESI,
> +       PERF_X86_32_REG_EDI,
> +       PERF_X86_32_REG_EBP,
> +       PERF_X86_32_REG_ESP,
> +       PERF_X86_32_REG_EIP,
> +       PERF_X86_32_REG_FLAGS,
> +       PERF_X86_32_REG_CS,
> +       PERF_X86_32_REG_DS,
> +       PERF_X86_32_REG_ES,
> +       PERF_X86_32_REG_FS,
> +       PERF_X86_32_REG_GS,
> +
> +       /* Non ABI */
> +       PERF_X86_32_REG_MAX,
> +       PERF_REG_IP = PERF_X86_32_REG_EIP,
> +       PERF_REG_SP = PERF_X86_32_REG_ESP,
> +};
> +
> +#ifdef __KERNEL__
> +
> +#define PERF_X86_32_REG_RESERVED (~((1ULL << PERF_X86_32_REG_MAX) - 1ULL))
> +
> +static inline u64 perf_reg_version(void)
> +{
> +       return PERF_REGS_VERSION_X86_32;
> +}
> +
> +static inline int perf_reg_validate(u64 mask)
> +{
> +       if (mask & PERF_X86_32_REG_RESERVED)
> +               return -EINVAL;
> +
> +       return 0;
> +}
> +
> +static inline u64 perf_reg_value(struct pt_regs *regs, int idx)
> +{
> +       switch (idx) {
> +       case PERF_X86_32_REG_EAX:
> +               return regs->ax;
> +       case PERF_X86_32_REG_EBX:
> +               return regs->bx;
> +       case PERF_X86_32_REG_ECX:
> +               return regs->cx;
> +       case PERF_X86_32_REG_EDX:
> +               return regs->dx;
> +       case PERF_X86_32_REG_ESI:
> +               return regs->si;
> +       case PERF_X86_32_REG_EDI:
> +               return regs->di;
> +       case PERF_X86_32_REG_EBP:
> +               return regs->bp;
> +       case PERF_X86_32_REG_ESP:
> +               return regs->sp;
> +       case PERF_X86_32_REG_EIP:
> +               return regs->ip;
> +       case PERF_X86_32_REG_FLAGS:
> +               return regs->flags;
> +       case PERF_X86_32_REG_CS:
> +               return regs->cs;
> +       case PERF_X86_32_REG_DS:
> +               return regs->ds;
> +       case PERF_X86_32_REG_ES:
> +               return regs->es;
> +       case PERF_X86_32_REG_FS:
> +               return regs->fs;
> +       case PERF_X86_32_REG_GS:
> +               return regs->gs;
> +       }
> +
> +       return 0;
> +}
> +
> +#endif /* __KERNEL__ */
> +
> +#endif /* _ASM_X86_PERF_REGS_32_H */
> diff --git a/arch/x86/include/asm/perf_regs_64.h b/arch/x86/include/asm/perf_regs_64.h
> new file mode 100644
> index 0000000..d775213
> --- /dev/null
> +++ b/arch/x86/include/asm/perf_regs_64.h
> @@ -0,0 +1,101 @@
> +#ifndef _ASM_X86_PERF_REGS_64_H
> +#define _ASM_X86_PERF_REGS_64_H
> +
> +#define PERF_X86_64_REG_VERSION                1ULL
> +
> +enum perf_event_x86_64_regs {
> +       PERF_X86_64_REG_RAX,
> +       PERF_X86_64_REG_RBX,
> +       PERF_X86_64_REG_RCX,
> +       PERF_X86_64_REG_RDX,
> +       PERF_X86_64_REG_RSI,
> +       PERF_X86_64_REG_RDI,
> +       PERF_X86_64_REG_R8,
> +       PERF_X86_64_REG_R9,
> +       PERF_X86_64_REG_R10,
> +       PERF_X86_64_REG_R11,
> +       PERF_X86_64_REG_R12,
> +       PERF_X86_64_REG_R13,
> +       PERF_X86_64_REG_R14,
> +       PERF_X86_64_REG_R15,
> +       PERF_X86_64_REG_RBP,
> +       PERF_X86_64_REG_RSP,
> +       PERF_X86_64_REG_RIP,
> +       PERF_X86_64_REG_FLAGS,
> +       PERF_X86_64_REG_CS,
> +       PERF_X86_64_REG_SS,
> +
> +       /* Non ABI */
> +       PERF_X86_64_REG_MAX,
> +       PERF_REG_IP = PERF_X86_64_REG_RIP,
> +       PERF_REG_SP = PERF_X86_64_REG_RSP,
> +};
> +
> +#ifdef __KERNEL__
> +
> +#define PERF_X86_64_REG_RESERVED (~((1ULL << PERF_X86_64_REG_MAX) - 1ULL))
> +
> +static inline u64 perf_reg_version(void)
> +{
> +       return PERF_REGS_VERSION_X86_64;
> +}
> +
> +static inline int perf_reg_validate(u64 mask)
> +{
> +       if (mask & PERF_X86_64_REG_RESERVED)
> +               return -EINVAL;
> +
> +       return 0;
> +}
> +
> +static inline u64 perf_reg_value(struct pt_regs *regs, int idx)
> +{
> +       switch (idx) {
> +       case PERF_X86_64_REG_RAX:
> +               return regs->ax;
> +       case PERF_X86_64_REG_RBX:
> +               return regs->bx;
> +       case PERF_X86_64_REG_RCX:
> +               return regs->cx;
> +       case PERF_X86_64_REG_RDX:
> +               return regs->dx;
> +       case PERF_X86_64_REG_RSI:
> +               return regs->si;
> +       case PERF_X86_64_REG_RDI:
> +               return regs->di;
> +       case PERF_X86_64_REG_R8:
> +               return regs->r8;
> +       case PERF_X86_64_REG_R9:
> +               return regs->r9;
> +       case PERF_X86_64_REG_R10:
> +               return regs->r10;
> +       case PERF_X86_64_REG_R11:
> +               return regs->r11;
> +       case PERF_X86_64_REG_R12:
> +               return regs->r12;
> +       case PERF_X86_64_REG_R13:
> +               return regs->r13;
> +       case PERF_X86_64_REG_R14:
> +               return regs->r14;
> +       case PERF_X86_64_REG_R15:
> +               return regs->r15;
> +       case PERF_X86_64_REG_RBP:
> +               return regs->bp;
> +       case PERF_X86_64_REG_RSP:
> +               return regs->sp;
> +       case PERF_X86_64_REG_RIP:
> +               return regs->ip;
> +       case PERF_X86_64_REG_FLAGS:
> +               return regs->flags;
> +       case PERF_X86_64_REG_CS:
> +               return regs->cs;
> +       case PERF_X86_64_REG_SS:
> +               return regs->ss;
> +       }
> +
> +       return 0;
> +}
> +
> +#endif /* __KERNEL__ */
> +
> +#endif /* _ASM_X86_PERF_REGS_64_H */
> diff --git a/include/asm-generic/perf_regs.h b/include/asm-generic/perf_regs.h
> new file mode 100644
> index 0000000..f616096
> --- /dev/null
> +++ b/include/asm-generic/perf_regs.h
> @@ -0,0 +1,23 @@
> +#ifndef __ASM_GENERIC_PERF_REGS_H
> +#define __ASM_GENERIC_PERF_REGS_H
> +
> +enum {
> +       PERF_REGS_VERSION_NONE = 0UL,
> +};
> +
> +static inline int perf_reg_value(struct pt_regs *regs, int idx)
> +{
> +       return 0;
> +}
> +
> +static inline int perf_reg_version(void)
> +{
> +       return PERF_REGS_VERSION_NONE;
> +}
> +
> +static inline int perf_reg_validate(u64 mask)
> +{
> +       return mask ? -ENOSYS : 0;
> +}
> +
> +#endif /* __ASM_GENERIC_PERF_REGS_H */
> --
> 1.7.7.6
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 03/16] perf: Add ability to dump user regs
  2012-04-17 11:17 ` [PATCH 03/16] perf: Add ability to dump user regs Jiri Olsa
@ 2012-04-23 10:15   ` Stephane Eranian
  0 siblings, 0 replies; 28+ messages in thread
From: Stephane Eranian @ 2012-04-23 10:15 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, rostedt, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper

On Tue, Apr 17, 2012 at 1:17 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> Add new attr->user_regs bitmap that lets a user choose a set
> of user registers to dump to the sample. The layout of this
> bitmap is described in asm/perf_regs.h for archs that
> support register dump.
>
> The perf syscall will fail if attr->user_regs is non zero.
>
> The register value here are those of the user space context as
> it was before the user entered the kernel for whatever reason
> (syscall, irq, exception, or a PMI happening in userspace).
>
> This is going to be useful to bring Dwarf CFI based stack unwinding
> on top of samples.
>
> TODO handle compat tasks
>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
>  include/linux/perf_event.h |    8 +++++
>  kernel/events/core.c       |   63 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 71 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index ddbb6a9..c63b807 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -272,6 +272,12 @@ struct perf_event_attr {
>                __u64           config2; /* extension of config1 */
>        };
>        __u64   branch_sample_type; /* enum branch_sample_type */
> +
> +       /*
> +        * Arch specific mask that defines a set of user regs to dump on
> +        * samples. See asm/perf_regs.h for details.
> +        */
> +       __u64                   user_regs;
>  };
>
I don't like the name of the field too much. You should be more
explicit. Something like user_sample_regs.

>  /*
> @@ -608,6 +614,7 @@ struct perf_guest_info_callbacks {
>  #include <linux/atomic.h>
>  #include <linux/sysfs.h>
>  #include <asm/local.h>
> +#include <asm/perf_regs.h>
>
>  #define PERF_MAX_STACK_DEPTH           255
>
> @@ -1130,6 +1137,7 @@ struct perf_sample_data {
>        struct perf_callchain_entry     *callchain;
>        struct perf_raw_record          *raw;
>        struct perf_branch_stack        *br_stack;
> +       struct pt_regs                  *uregs;
>  };
>
>  static inline void perf_sample_data_init(struct perf_sample_data *data, u64 addr)
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index a6a9ec4..9f29fc3 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3751,6 +3751,37 @@ int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
>  }
>  EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);
>
> +static void
> +perf_output_sample_regs(struct perf_output_handle *handle,
> +                       struct pt_regs *regs, u64 mask)
> +{
> +       int i = 0;
> +
> +       do {
> +               u64 val;
> +
> +               if (mask & 1) {
> +                       val = perf_reg_value(regs, i);
> +                       perf_output_put(handle, val);
> +               }
> +
> +               mask >>= 1;
> +               i++;
> +       } while (mask);
> +}
> +
> +static struct pt_regs *perf_sample_uregs(struct pt_regs *regs)
> +{
> +       if (!user_mode(regs)) {
> +               if (current->mm)
> +                       regs = task_pt_regs(current);
> +               else
> +                       regs = NULL;
> +       }
> +
> +       return regs;
> +}
> +
You are assuming the user app is running the same ABI as the
kernel. That's not correct on x86 and possibly other architectures.

>  static void __perf_event_header__init_id(struct perf_event_header *header,
>                                         struct perf_sample_data *data,
>                                         struct perf_event *event)
> @@ -4011,6 +4042,21 @@ void perf_output_sample(struct perf_output_handle *handle,
>                        perf_output_put(handle, nr);
>                }
>        }
> +
> +       if (event->attr.user_regs) {
> +               u64 id;
> +
> +               /*
> +                * If there are no regs to dump, notice it through a
> +                * PERF_REGS_VERSION_NONE version.
> +                */
> +               id = data->uregs ? perf_reg_version() : PERF_REGS_VERSION_NONE;
> +               perf_output_put(handle, id);
> +
> +               if (id)
> +                       perf_output_sample_regs(handle, data->uregs,
> +                                               event->attr.user_regs);
> +       }
>  }
>
>  void perf_prepare_sample(struct perf_event_header *header,
> @@ -4062,6 +4108,16 @@ void perf_prepare_sample(struct perf_event_header *header,
>                }
>                header->size += size;
>        }
> +
> +       if (event->attr.user_regs) {
> +               int size = sizeof(u64); /* the version size */
> +
> +               data->uregs = perf_sample_uregs(regs);
> +               if (data->uregs)
> +                       size += hweight64(event->attr.user_regs) * sizeof(u64);
> +
> +               header->size += size;
> +       }
>  }
>
>  static void perf_event_output(struct perf_event *event,
> @@ -6112,6 +6168,13 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr,
>                        attr->branch_sample_type = mask;
>                }
>        }
> +
> +       /*
> +        * Don't let through an invalid register mask (i.e. the architecture
> +        * does not support register dump at all).
> +        */
> +       ret = perf_reg_validate(attr->user_regs);
> +
Again, how do you deal with 32-bit apps vs. 64-bit kernel?

>  out:
>        return ret;
>
> --
> 1.7.7.6
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 02/16] perf: Unified API to record selective sets of arch registers
  2012-04-23 10:10   ` Stephane Eranian
@ 2012-04-23 10:33     ` Jiri Olsa
  2012-04-26 15:28       ` Jiri Olsa
  0 siblings, 1 reply; 28+ messages in thread
From: Jiri Olsa @ 2012-04-23 10:33 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, rostedt, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper

On Mon, Apr 23, 2012 at 12:10:57PM +0200, Stephane Eranian wrote:
> On Tue, Apr 17, 2012 at 1:17 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> > This brings a new API to help the selective dump of registers on
> > event sampling, and its implementation in x86.
> >
> > - The information about the desired registers will be passed
> >  in a single u64 mask. It's up to the architecture to map the
> >  registers into the mask bits.
> >
> > - The architecture must provide a non-zero and unique id to
> >  identify the origin of a register set, because interpreting a
> >  register dump requires knowing which architecture it comes from.
> >  The architecture is considered different between the 32 and 64 bit
> >  versions. x86-32 has id 1, x86-64 has id 2.
> >
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> > ---
> >  arch/x86/include/asm/perf_regs.h    |   16 ++++++
> >  arch/x86/include/asm/perf_regs_32.h |   84 +++++++++++++++++++++++++++++
> >  arch/x86/include/asm/perf_regs_64.h |  101 +++++++++++++++++++++++++++++++++++
> >  include/asm-generic/perf_regs.h     |   23 ++++++++
> >  4 files changed, 224 insertions(+), 0 deletions(-)
> >  create mode 100644 arch/x86/include/asm/perf_regs.h
> >  create mode 100644 arch/x86/include/asm/perf_regs_32.h
> >  create mode 100644 arch/x86/include/asm/perf_regs_64.h
> >  create mode 100644 include/asm-generic/perf_regs.h
> >
> > diff --git a/arch/x86/include/asm/perf_regs.h b/arch/x86/include/asm/perf_regs.h
> > new file mode 100644
> > index 0000000..80b7fbe
> > --- /dev/null
> > +++ b/arch/x86/include/asm/perf_regs.h
> > @@ -0,0 +1,16 @@
> > +#ifndef _ASM_X86_PERF_REGS_H
> > +#define _ASM_X86_PERF_REGS_H
> > +
> > +enum {
> > +       PERF_REGS_VERSION_NONE   = 0UL,
> > +       PERF_REGS_VERSION_X86_32 = 1UL,
> > +       PERF_REGS_VERSION_X86_64 = 2UL,
> > +};
> > +
> I don't really like the term VERSION here. It's not a versioning
> problem you're trying to solve. It's an ABI problem, unless I am
> mistaken. You should rename to PERF_REGS_ABI_X86_32 and
> PERF_REGS_ABI_X86_64.
> 
> I assume the NONE is here to cover the case where you don't
> have a user machine state, i.e., hit a kernel thread. Is that right?

right

> 
> 
> > +#ifdef CONFIG_X86_32
> > +#include "perf_regs_32.h"
> > +#else
> > +#include "perf_regs_64.h"
> > +#endif
> > +
> How are you going to deal with 32-bit binaries sampled on a 64-bit system?

I don't have a solution right now... but it seems like compat tasks need
more thinking even before going ahead with this patchset, since it's going
to affect the perf_event_attr and could bite us in the future.

I'll see what I can do about that and send out a new patchset

jirka


* Re: [PATCH 02/16] perf: Unified API to record selective sets of arch registers
  2012-04-23 10:33     ` Jiri Olsa
@ 2012-04-26 15:28       ` Jiri Olsa
  2012-05-02 12:00         ` Stephane Eranian
  0 siblings, 1 reply; 28+ messages in thread
From: Jiri Olsa @ 2012-04-26 15:28 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, rostedt, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper

On Mon, Apr 23, 2012 at 12:33:50PM +0200, Jiri Olsa wrote:
> On Mon, Apr 23, 2012 at 12:10:57PM +0200, Stephane Eranian wrote:
> > On Tue, Apr 17, 2012 at 1:17 PM, Jiri Olsa <jolsa@redhat.com> wrote:

SNIP

> > How are you going to deal with 32-bit binaries sampled on a 64-bit system?
> 
> I dont have the solution right now... but seems like compat tasks need more
> thinking even before go ahead with this patchset.. since it's going affect
> the perf_event_attr and could bite us in future.
hi,
got more info on the compat task unwind

- for a 32 bit task running under a 64 bit environment, the 64 bit
  user register values are stored on the kernel stack when entering
  the kernel via exception or interrupt, just like for a native
  64 bit task

  So I think we can keep the current interface as far as
  compat tasks are concerned, since we will get 64 bit
  registers all the time anyway.

  The place that will take care of compat task unwind
  is the post-processing unwind.

  For each processed sample we:
     - get the sample and translate the IP into a MAP and DSO
     - read the DSO ELF class and figure out whether we deal with
       a 64 or 32 bit task
     - run the libunwind interface with the proper task class info,
       which gets us to the next bullet:

- 64 bit libunwind does not support unwind of 32 bit tasks ;)
  so unless that changes, I can see just one hacky way of doing
  this: a 32 bit libunwind loaded in a separate 32 bit
  process, doing the remote unwind for us..

  I'll try to follow up on this to see if there'd be some better
  libunwind interface solution.. but that's quite long-term ;)
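For the "read the DSO ELF class" step, a minimal user-space sketch could look like this. The helper name `dso_elf_class` and its error handling are mine, not part of the patchset:

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Classify a DSO by its ELF class byte, the way the post-processing
 * unwind step described above would have to.  Returns 32, 64, or -1
 * on error.  Illustrative only; perf's real DSO handling differs. */
static int dso_elf_class(const char *path)
{
	unsigned char ident[EI_NIDENT];
	FILE *f = fopen(path, "rb");
	int class = -1;

	if (!f)
		return -1;
	if (fread(ident, 1, EI_NIDENT, f) == EI_NIDENT &&
	    memcmp(ident, ELFMAG, SELFMAG) == 0) {
		if (ident[EI_CLASS] == ELFCLASS32)
			class = 32;
		else if (ident[EI_CLASS] == ELFCLASS64)
			class = 64;
	}
	fclose(f);
	return class;
}
```

The result would then select the 32-bit or 64-bit unwind path, the latter being the hard part per the libunwind limitation above.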


As for the sample registers interface.

Currently we have:

  u64 user_sample_regs
  - if != 0 we provide the user registers with the mask specified
    by its value

  - it will stay for compat tasks as well
  - we could use a PERF_SAMPLE_USER_REGS sample type instead of the != 0
    check to be more consistent, but that would eat up one sample bit
    unnecessarily

In some previous email you suggested some generic interface like

    attr->sample_type |= PERF_SAMPLE_REGS
    attr->sample_regs = EAX | EBX | EDI | ESI |.....
    attr->sample_reg_mode = { INTR, PRECISE, USER }

I think we can have something like:

    attr->sample_type |= PERF_SAMPLE_REGS
    attr->sample_reg_mode = { INTR, PRECISE, USER }

but in case we want e.g. both USER and INTR modes together, then we still
need to have:

  u64 user_sample_regs
  u64 intr_sample_regs
  ...

for the register mode mask definitions. Some mode combinations might be
useless, but I think this could work.. we could always accommodate new
needs with a new mode ;)
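A rough sketch of what those per-mode masks could look like, with the per-mode sample payload sized the same way as the hweight64() computation in the quoted perf_prepare_sample() hunk. All names here are hypothetical; this mirrors a proposal under discussion, not a merged ABI:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical attr layout: one register mask per sampling mode. */
struct sketch_attr {
	uint64_t sample_type;		/* would carry PERF_SAMPLE_REGS */
	uint64_t user_sample_regs;	/* USER-mode register mask */
	uint64_t intr_sample_regs;	/* INTR-mode register mask */
};

/* Size of one mode's register dump in the sample: a u64 ABI/version
 * word plus one u64 per bit set in the mask (hweight64 equivalent). */
static uint64_t regs_dump_size(uint64_t mask)
{
	return sizeof(uint64_t) * (1 + (uint64_t)__builtin_popcountll(mask));
}
```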

I'll start to work on this unless I hear some screaming ;)

thoughts? ;)


thanks and sorry for the long email,
jirka


* Re: [PATCH 02/16] perf: Unified API to record selective sets of arch registers
  2012-04-26 15:28       ` Jiri Olsa
@ 2012-05-02 12:00         ` Stephane Eranian
  2012-05-02 12:26           ` Jiri Olsa
  0 siblings, 1 reply; 28+ messages in thread
From: Stephane Eranian @ 2012-05-02 12:00 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, rostedt, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper, Arun Sharma

Sorry for the delay, had higher priority tasks to do.
[+asharma]

On Thu, Apr 26, 2012 at 5:28 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> On Mon, Apr 23, 2012 at 12:33:50PM +0200, Jiri Olsa wrote:
>> On Mon, Apr 23, 2012 at 12:10:57PM +0200, Stephane Eranian wrote:
>> > On Tue, Apr 17, 2012 at 1:17 PM, Jiri Olsa <jolsa@redhat.com> wrote:
>
> SNIP
>
>> > How are you going to deal with 32-bit binaries sampled on a 64-bit system?
>>
>> I dont have the solution right now... but seems like compat tasks need more
>> thinking even before go ahead with this patchset.. since it's going affect
>> the perf_event_attr and could bite us in future.
> hi,
> got more info on the compat task unwind
>
> - for 32 bit task running under 64 bit env. the 64 bits user
>  registers values are stored on kernel stack when entering
>  the kernel via exception or interrupt, like for native
>  64 bit task
>
You mean the 32-bit registers are stored on the kernel stack,
right? Or you mean 64-bit and the upper 32 are guaranteed 0.


>  So I think we can keep the current interface as far as
>  compat tasks are concerned, since we will get 64 bits
>  registers all the time anyway.
>
>  The place that will take care of compat task unwind
>  is the post processing unwind.
>
>  For each processed sample we:
>     - get the sample and translate IP into MAP and DSO
>     - read DSO ELF class and figure out wether we deal with
>       64 or 32 bit task
>     - run libunwind interface with proper task class info,
>       which gets us to next bullet:
>
> - 64 bit libunwind does not support unwind of 32 bit tasks ;)
>  so unless that change, I can see just one hacky way of doing
>  this via 32 bit libunwind being loaded in separate 32 bit
>  process and doing remote unwind for us..

okay, I was not aware of that restriction on libunwind. I copied Arun
on this response, so maybe he can comment on that.

>
>  I'll try to follow on this to see if there'd be some better
>  libunwind interface solution.. but thats quite longterm ;)
>
>
> As for the sample registers interface.
>
> Currently we have:
>
>  u64 user_sample_regs
>  - if != 0 we provide the user registers with mask specified
>    by its value
>
>  - it will stay for compat tasks as well

What if I say EAX|EBX|R15, but the sample was captured
on a 32-bit task? Are you going to just store 0 for R15?
Unless you also store a bitmask of what was actually saved,
you have to fill in non-existent registers with zeroes; otherwise
the tool cannot parse the sample.


>  - we could use PERF_SAMPLE_USER_REGS sample type instead of the != 0
>    check to be more consistent, but that would eat up one sample bit
>    unnecessary

But then that would be aligned with how branch_stack has been implemented
for instance (PERF_SAMPLE_BRANCH_STACK).

>
> In some previous email you suggested some generic interface like
>
>    attr->sample_type |= PERF_SAMPLE_REGS
>    attr->sample_regs = EAX | EBX | EDI | ESI |.....
>    attr->sample_reg_mode = { INTR, PRECISE, USER }
>
> I think we can have something like:
>
>    attr->sample_type |= PERF_SAMPLE_REGS
>    attr->sample_reg_mode = { INTR, PRECISE, USER }
>
> but in case we want eg both USER and INTR modes together then we still
> need to have:
>
>  u64 user_sample_regs
>  u64 intr_sample_regs
>  ...
>
Yes. but if we allow any combinations, then you'd need
u64 user_sample_regs
u64 intr_sample_regs
u64 precise_sample_regs

Note that in the case of Intel PEBS used for precise mode, only a
subset of the INTR registers is available.

> for the register modes mask definition. Some mode combinations might be
> useless, but I think this could work.. we could always customize our
> needs with new mode ;)
>
The INTR vs. PRECISE is useful to get an idea of the skid.
The USER vs. INTR is useful to determine how we entered
the kernel in case the IP @ INTR is in the kernel.

> I'll start to work on this unless I hear some screaming ;)
>

In any case, the important issue is how the kernel satisfies
the request for registers when those may not be available
in the interrupted task AND it is impossible to know this
in advance.

Note that in the case of precise on Intel, we know in advance
which registers will be available. So you can fail early, when
the event is created.

The alternative is to include the bitmask of which registers
were actually saved at the beginning of the section, after the
ABI type flag.
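That alternative, self-describing layout could look roughly like this. The field names are mine and purely illustrative, not an existing kernel ABI:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: the register section starts with the ABI type flag, then the
 * mask of registers that were actually saved, then one u64 value per
 * set bit.  A parser no longer needs zero-filled placeholders. */
struct regs_section_hdr {
	uint64_t abi;		/* e.g. the 32/64-bit ABI id */
	uint64_t saved_mask;	/* which requested registers made it */
	/* followed by popcount(saved_mask) u64 register values */
};

static uint64_t regs_section_size(uint64_t saved_mask)
{
	return sizeof(struct regs_section_hdr) +
	       sizeof(uint64_t) * (uint64_t)__builtin_popcountll(saved_mask);
}
```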


> thoughts? ;)
>
>
> thanks and sorry for long email,
> jirka


* Re: [PATCH 02/16] perf: Unified API to record selective sets of arch registers
  2012-05-02 12:00         ` Stephane Eranian
@ 2012-05-02 12:26           ` Jiri Olsa
  2012-05-02 12:36             ` Stephane Eranian
  0 siblings, 1 reply; 28+ messages in thread
From: Jiri Olsa @ 2012-05-02 12:26 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, rostedt, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper, Arun Sharma

On Wed, May 02, 2012 at 02:00:23PM +0200, Stephane Eranian wrote:
> Sorry for the delay, had higher priority tasks to do.
hi,
np at all :)
I just sent v3, but I answered some of your comments below

thanks,
jirka


> [+asharma]
> 
> On Thu, Apr 26, 2012 at 5:28 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> > On Mon, Apr 23, 2012 at 12:33:50PM +0200, Jiri Olsa wrote:
> >> On Mon, Apr 23, 2012 at 12:10:57PM +0200, Stephane Eranian wrote:
> >> > On Tue, Apr 17, 2012 at 1:17 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > SNIP
> >
> >> > How are you going to deal with 32-bit binaries sampled on a 64-bit system?
> >>
> >> I dont have the solution right now... but seems like compat tasks need more
> >> thinking even before go ahead with this patchset.. since it's going affect
> >> the perf_event_attr and could bite us in future.
> > hi,
> > got more info on the compat task unwind
> >
> > - for 32 bit task running under 64 bit env. the 64 bits user
> >  registers values are stored on kernel stack when entering
> >  the kernel via exception or interrupt, like for native
> >  64 bit task
> >
> You mean the 32-bit registers are stored on the kernel stack,
> right? Or you mean 64-bit and the upper 32 are guaranteed 0.

I meant that the 64 bit registers are stored on the stack the same
way as for a native process. There are different code paths for
exceptions, but the same saved-register stack layout.

So if there's an event within a compat task, you still get the
64 bit registers saved on the stack as if the event had happened
in a native process.

The upper 32 bits are probably 0, but I'm not sure that's guaranteed.

> 
> 
> >  So I think we can keep the current interface as far as
> >  compat tasks are concerned, since we will get 64 bits
> >  registers all the time anyway.
> >
> >  The place that will take care of compat task unwind
> >  is the post processing unwind.
> >
> >  For each processed sample we:
> >     - get the sample and translate IP into MAP and DSO
> >     - read DSO ELF class and figure out wether we deal with
> >       64 or 32 bit task
> >     - run libunwind interface with proper task class info,
> >       which gets us to next bullet:
> >
> > - 64 bit libunwind does not support unwind of 32 bit tasks ;)
> >  so unless that change, I can see just one hacky way of doing
> >  this via 32 bit libunwind being loaded in separate 32 bit
> >  process and doing remote unwind for us..
> 
> okay was not aware of that restriction on libunwind. I copied Arun
> on this response, so maybe he can comment on that.
> 
> >
> >  I'll try to follow on this to see if there'd be some better
> >  libunwind interface solution.. but thats quite longterm ;)
> >
> >
> > As for the sample registers interface.
> >
> > Currently we have:
> >
> >  u64 user_sample_regs
> >  - if != 0 we provide the user registers with mask specified
> >    by its value
> >
> >  - it will stay for compat tasks as well
> 
> What if I say EAX|EBX|R15? but the sample was captured
> on a 32-bit tasks. Are you going to just store 0 for R15?
> Unless you also store a bitmask of what was actually saved,
> then you have to fill in non-existent registers with zeroes, otherwise
> the tool cannot parse the sample.

I just sent v3, with a changed design to be more generic, please check.

Anyway, currently there's no way to mix 32 and 64 bit registers in a sample.

As I mentioned above, when running a compat task, the 64 bit registers
are stored anyway. Given that all 32 bit registers have 64 bit equivalents,
you can ask to store RAX|RBX|R15.

You need to know whether to examine the 32 or 64 bit register afterwards.

> 
> 
> >  - we could use PERF_SAMPLE_USER_REGS sample type instead of the != 0
> >    check to be more consistent, but that would eat up one sample bit
> >    unnecessary
> 
> But then that would be aligned with how branch_stack has been implemented
> for instance (PERF_SAMPLE_BRANCH_STACK).
> 
> >
> > In some previous email you suggested some generic interface like
> >
> >    attr->sample_type |= PERF_SAMPLE_REGS
> >    attr->sample_regs = EAX | EBX | EDI | ESI |.....
> >    attr->sample_reg_mode = { INTR, PRECISE, USER }
> >
> > I think we can have something like:
> >
> >    attr->sample_type |= PERF_SAMPLE_REGS
> >    attr->sample_reg_mode = { INTR, PRECISE, USER }
> >
> > but in case we want eg both USER and INTR modes together then we still
> > need to have:
> >
> >  u64 user_sample_regs
> >  u64 intr_sample_regs
> >  ...
> >
> Yes. but if we allow any combinations, then you'd need
> u64 user_sample_regs
> u64 intr_sample_regs
> u64 precise_sample_regs
> 
> Note that in the case of Intel PEBS used for precise mode, there are
> only a subset of the INTR registers available.
> 
> > for the register modes mask definition. Some mode combinations might be
> > useless, but I think this could work.. we could always customize our
> > needs with new mode ;)
> >
> The INTR vs. PRECISE is useful to get an idea of the skid.
> The USER vs. INTR is useful to determine how we entered
> the kernel in case the IP @ INTR is in the kernel.
> 
> > I'll start to work on this unless I hear some screaming ;)
> >

my thinking with v3 was to have a new sample type PERF_SAMPLE_REGS

Once set, the perf_event_attr::sample_regs value carries the
kind of registers we want to store.

Currently there's just the following user regs bit:

enum perf_sample_regs {
       PERF_SAMPLE_REGS_USER   = 1U << 0, /* user registers */
       PERF_SAMPLE_REGS_MAX    = 1U << 1, /* non-ABI */
};

If PERF_SAMPLE_REGS_USER is set then perf_event_attr::sample_regs_user
gives the mask of user registers to store.

we could add more bits like:
       PERF_SAMPLE_REGS_KERNEL
       PERF_SAMPLE_REGS_PRECISE
       ...

to determine the kind of registers we want to dump and
retrieve the registers accordingly. And if a bit needs
additional info, we add a new perf_event_attr value, same
as in the sample_regs_user case.
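Validating the requested kinds up front, the way the quoted perf_copy_attr() hunk validates other attr fields, might then look like this sketch. The constant names mirror the enum above but are placeholders, not merged ABI:

```c
#include <assert.h>
#include <stdint.h>

enum sketch_sample_regs {
	SKETCH_REGS_USER   = 1u << 0,	/* user registers */
	SKETCH_REGS_KERNEL = 1u << 1,	/* possible future bit */
	SKETCH_REGS_MAX    = 1u << 2,	/* non-ABI */
};

/* Reject bits the kernel doesn't know about, so that setting a future
 * bit on an old kernel fails loudly instead of being silently ignored. */
static int validate_sample_regs(uint32_t sample_regs)
{
	if (sample_regs & ~(uint32_t)(SKETCH_REGS_MAX - 1))
		return -22;	/* -EINVAL */
	return 0;
}
```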


> 
> In any case, the important issue is how does the kernel
> satisfy the request for registers when those  may not
> be available in the interrupt task AND it is impossible
> to know this in advance.
> 
> Note that in the case of precise on Intel, we know in advance
> which registers will be available. So you can fail early, when
> the event is created.
> 
> The alternative is to include the bitmask of which registers
> was actually saved at the beginning of the section after the
> ABI type flag.
> 
> 
> > thoughts? ;)
> >
> >
> > thanks and sorry for long email,
> > jirka


* Re: [PATCH 02/16] perf: Unified API to record selective sets of arch registers
  2012-05-02 12:26           ` Jiri Olsa
@ 2012-05-02 12:36             ` Stephane Eranian
  2012-05-02 12:58               ` Jiri Olsa
  0 siblings, 1 reply; 28+ messages in thread
From: Stephane Eranian @ 2012-05-02 12:36 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, rostedt, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper, Arun Sharma

On Wed, May 2, 2012 at 2:26 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> On Wed, May 02, 2012 at 02:00:23PM +0200, Stephane Eranian wrote:
>> Sorry for the delay, had higher priority tasks to do.
> hi,
> np at all :)
> I just sent v3, but I answered some of your comments below
>
> thanks,
> jirka
>
>
>> [+asharma]
>>
>> On Thu, Apr 26, 2012 at 5:28 PM, Jiri Olsa <jolsa@redhat.com> wrote:
>> > On Mon, Apr 23, 2012 at 12:33:50PM +0200, Jiri Olsa wrote:
>> >> On Mon, Apr 23, 2012 at 12:10:57PM +0200, Stephane Eranian wrote:
>> >> > On Tue, Apr 17, 2012 at 1:17 PM, Jiri Olsa <jolsa@redhat.com> wrote:
>> >
>> > SNIP
>> >
>> >> > How are you going to deal with 32-bit binaries sampled on a 64-bit system?
>> >>
>> >> I dont have the solution right now... but seems like compat tasks need more
>> >> thinking even before go ahead with this patchset.. since it's going affect
>> >> the perf_event_attr and could bite us in future.
>> > hi,
>> > got more info on the compat task unwind
>> >
>> > - for 32 bit task running under 64 bit env. the 64 bits user
>> >  registers values are stored on kernel stack when entering
>> >  the kernel via exception or interrupt, like for native
>> >  64 bit task
>> >
>> You mean the 32-bit registers are stored on the kernel stack,
>> right? Or you mean 64-bit and the upper 32 are guaranteed 0.
>
> I meant 64 bit registers are stored on stack the same way
> as for native process. There are different code paths for
> exception, but same registers' saved stack layout.
>
> So if there's an event within the compat task, you still get
> 64 bit registers saved on stack as if the event happened
> in native process.
>
> The upper 32 are probably 0, but I'm not sure that's garanteed.
>
>>
>>
>> >  So I think we can keep the current interface as far as
>> >  compat tasks are concerned, since we will get 64 bits
>> >  registers all the time anyway.
>> >
>> >  The place that will take care of compat task unwind
>> >  is the post processing unwind.
>> >
>> >  For each processed sample we:
>> >     - get the sample and translate IP into MAP and DSO
>> >     - read DSO ELF class and figure out wether we deal with
>> >       64 or 32 bit task
>> >     - run libunwind interface with proper task class info,
>> >       which gets us to next bullet:
>> >
>> > - 64 bit libunwind does not support unwind of 32 bit tasks ;)
>> >  so unless that change, I can see just one hacky way of doing
>> >  this via 32 bit libunwind being loaded in separate 32 bit
>> >  process and doing remote unwind for us..
>>
>> okay was not aware of that restriction on libunwind. I copied Arun
>> on this response, so maybe he can comment on that.
>>
>> >
>> >  I'll try to follow on this to see if there'd be some better
>> >  libunwind interface solution.. but thats quite longterm ;)
>> >
>> >
>> > As for the sample registers interface.
>> >
>> > Currently we have:
>> >
>> >  u64 user_sample_regs
>> >  - if != 0 we provide the user registers with mask specified
>> >    by its value
>> >
>> >  - it will stay for compat tasks as well
>>
>> What if I say EAX|EBX|R15? but the sample was captured
>> on a 32-bit tasks. Are you going to just store 0 for R15?
>> Unless you also store a bitmask of what was actually saved,
>> then you have to fill in non-existent registers with zeroes, otherwise
>> the tool cannot parse the sample.
>
> I just sent v3, with changed design to be more generic, please check
>
> anyway, currently there's no way to mix 32 and 64 bit registers in sample.
>
> As I mentioned above, once running compat task, 64 bit registers
> are stored anyway. Given that all 32 bit registers have 64 equiv.
> you can ask to store RAX|RBX|R15.
>
Well, R8-R15 do not exist in 32-bit mode. So I wonder what is saved
on the stack for those, probably nothing. And in that case, how do you
handle the case where the user asked for R15 but it is not available,
and you only learn that at the PMU interrupt?


> You need to know wether to examine 32 or 64 bit register afterwards.
>
>>
>>
>> >  - we could use PERF_SAMPLE_USER_REGS sample type instead of the != 0
>> >    check to be more consistent, but that would eat up one sample bit
>> >    unnecessary
>>
>> But then that would be aligned with how branch_stack has been implemented
>> for instance (PERF_SAMPLE_BRANCH_STACK).
>>
>> >
>> > In some previous email you suggested some generic interface like
>> >
>> >     attr->sample_type |= PERF_SAMPLE_REGS
>> >     attr->sample_regs = EAX | EBX | EDI | ESI |.....
>> >     attr->sample_reg_mode = { INTR, PRECISE, USER }
>> >
>> > I think we can have something like:
>> >
>> >     attr->sample_type |= PERF_SAMPLE_REGS
>> >     attr->sample_reg_mode = { INTR, PRECISE, USER }
>> >
>> > but in case we want eg both USER and INTR modes together then we still
>> > need to have:
>> >
>> >  u64 user_sample_regs
>> >  u64 intr_sample_regs
>> >  ...
>> >
>> Yes. but if we allow any combinations, then you'd need
>> u64 user_sample_regs
>> u64 intr_sample_regs
>> u64 precise_sample_regs
>>
>> Note that in the case of Intel PEBS used for precise mode, there are
>> only a subset of the INTR registers available.
>>
>> > for the register modes mask definition. Some mode combinations might be
>> > useless, but I think this could work.. we could always customize our
>> > needs with new mode ;)
>> >
>> The INTR vs. PRECISE is useful to get an idea of the skid.
>> The USER vs. INTR is useful to determine how we entered
>> the kernel in case the IP @ INTR is in the kernel.
>>
>> > I'll start to work on this unless I hear some screaming ;)
>> >
>
> my thinking with v3 was to have new sample type PERF_SAMPLE_REGS
>
> Once set there's perf_event_attr:sample_regs value carying the
> king of registers we want to store.
>
> Currently there's just following user regs bit:
>
> enum perf_sample_regs {
>       PERF_SAMPLE_REGS_USER   = 1U << 0, /* user registers */
>       PERF_SAMPLE_REGS_MAX    = 1U << 1, /* non-ABI */
> };
>
> If PERF_SAMPLE_REGS_USER is set then perf_event_attr::sample_regs_user
> gives the mask of user registers to store.
>
> we could add more bits like:
>       PERF_SAMPLE_REGS_KERNEL
>       PERF_SAMPLE_REGS_PRECISE
>       ...
>
> to determine the kind of registers we want to dump and
> retrieve registers accordingly. And if the bit needs
> additional info we add new perf_event_attr value same
> like in sample_regs_user case.
>
>
>>
>> In any case, the important issue is how does the kernel
>> satisfy the request for registers when those  may not
>> be available in the interrupt task AND it is impossible
>> to know this in advance.
>>
>> Note that in the case of precise on Intel, we know in advance
>> which registers will be available. So you can fail early, when
>> the event is created.
>>
>> The alternative is to include the bitmask of which registers
>> was actually saved at the beginning of the section after the
>> ABI type flag.
>>
>>
>> > thoughts? ;)
>> >
>> >
>> > thanks and sorry for long email,
>> > jirka


* Re: [PATCH 02/16] perf: Unified API to record selective sets of arch registers
  2012-05-02 12:36             ` Stephane Eranian
@ 2012-05-02 12:58               ` Jiri Olsa
  2012-05-02 14:46                 ` Stephane Eranian
  0 siblings, 1 reply; 28+ messages in thread
From: Jiri Olsa @ 2012-05-02 12:58 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, rostedt, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper, Arun Sharma


SNIP

> >
> > I just sent v3, with changed design to be more generic, please check
> >
> > anyway, currently there's no way to mix 32 and 64 bit registers in sample.
> >
> > As I mentioned above, once running compat task, 64 bit registers
> > are stored anyway. Given that all 32 bit registers have 64 equiv.
> > you can ask to store RAX|RBX|R15.
> >
> Well, R8-R15 do not exist in 32-bit mode. So I wonder what is saved
> on the stack for those, probably nothing. And in that case, how do you
> handle the case where the user asked for R15 but it is not available and
> you know that only on PMU interrupt.

right, R8-R15 do not exist in 32 bit mode, meaning that 32 bit tasks
do not use them... but when you enter the 64 bit kernel from a 32 bit
compat task, the 64 bit registers are still saved, same as for a native
64 bit process; there's no difference.. so even for a 32 bit task you
get the 64 bit registers described below.


The availability of registers is tricky for the user space register dump.

We dump whatever we got in the pt_regs structure from perf_sample_regs_user.
The problem is that an exception does not store all of the registers; the
rest of the registers is whatever is on the stack occupying the pt_regs
space.

When the kernel is entered (by exception or irq), the following is saved
by the CPU (both 32 and 64 bit):
 ss, sp, flags, cs, ip

followed by SAVE_ARGS (for exceptions):
        movq_cfi rdi, 8*8
        movq_cfi rsi, 7*8
        movq_cfi rdx, 6*8

        .if \save_rcx
        movq_cfi rcx, 5*8
        .endif

        movq_cfi rax, 4*8

        .if \save_r891011
        movq_cfi r8,  3*8
        movq_cfi r9,  2*8
        movq_cfi r10, 1*8
        movq_cfi r11, 0*8

SAVE_REST saves the whole pt_regs structure just for the
sake of the syscall_trace_enter function; once finished, it's
popped off.

As for the unwind, we should get the most important ones. Of course
the unwind could be based on any register, and with a bogus value
the unwind just stops.
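On the consumer side, a tool has to walk the very same mask the kernel used in the quoted perf_output_sample_regs() loop, pulling one u64 per set bit. A minimal sketch; indexing regs[] by bit number is an assumption here, since the real bit-to-register mapping is arch-defined:

```c
#include <assert.h>
#include <stdint.h>

/* Inverse of the kernel-side output loop: consume one u64 from the
 * sample buffer for each set bit in the register mask. */
static int decode_sample_regs(const uint64_t *buf, uint64_t mask,
			      uint64_t regs[64])
{
	int n = 0;
	int bit;

	for (bit = 0; mask; mask >>= 1, bit++) {
		if (mask & 1)
			regs[bit] = buf[n++];
	}
	return n;	/* number of u64s consumed from buf */
}
```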

I think we are in better shape for irqs but I'd need to check.

jirka


* Re: [PATCH 02/16] perf: Unified API to record selective sets of arch registers
  2012-05-02 12:58               ` Jiri Olsa
@ 2012-05-02 14:46                 ` Stephane Eranian
  2012-05-03 10:25                   ` Jiri Olsa
  0 siblings, 1 reply; 28+ messages in thread
From: Stephane Eranian @ 2012-05-02 14:46 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, rostedt, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper, Arun Sharma

On Wed, May 2, 2012 at 2:58 PM, Jiri Olsa <jolsa@redhat.com> wrote:
>
> SNIP
>
>> >
>> > I just sent v3, with changed design to be more generic, please check
>> >
>> > anyway, currently there's no way to mix 32 and 64 bit registers in sample.
>> >
>> > As I mentioned above, once running compat task, 64 bit registers
>> > are stored anyway. Given that all 32 bit registers have 64 equiv.
>> > you can ask to store RAX|RBX|R15.
>> >
>> Well, R8-R15 do not exist in 32-bit mode. So I wonder what is saved
>> on the stack for those, probably nothing. And in that case, how do you
>> handle the case where the user asked for R15 but it is not available and
>> you know that only on PMU interrupt.
>
> right, R8-R15 do not exist in 32-bit mode, meaning that a 32-bit task
> does not use them... but when you enter the 64-bit kernel from a 32-bit
> compat task, the 64-bit registers are still saved.. as for a native
> 64-bit process,

I am confused by your term '64-bit registers' here. I assume you
mean the registers are saved as 64-bit integers. But that does not
mean that the full set of 64-bit registers (incl. R8-R15) is saved.
Unless you're telling me that whatever values happen to be in those
64-bit-ABI-only registers are thus saved on the stack.


> there's no difference.. so even for a 32-bit task you get the 64-bit
> registers described below.
>
>
> The availability of registers is tricky for a user-space register dump.
>
> We dump whatever we get in the pt_regs structure from perf_sample_regs_user.
> The problem is that an exception does not store all of the registers; the
> rest of the registers are whatever happens to be on the stack in the space
> pt_regs occupies.
>
> When the kernel is entered (by an exception or an irq), the following is
> saved by the CPU (both 32-bit and 64-bit):
>  ss, sp, flags, cs, ip
>
> followed by SAVE_ARGS  (for exception):
>        movq_cfi rdi, 8*8
>        movq_cfi rsi, 7*8
>        movq_cfi rdx, 6*8
>
>        .if \save_rcx
>        movq_cfi rcx, 5*8
>        .endif
>
>        movq_cfi rax, 4*8
>
>        .if \save_r891011
>        movq_cfi r8,  3*8
>        movq_cfi r9,  2*8
>        movq_cfi r10, 1*8
>        movq_cfi r11, 0*8
>
> SAVE_REST saves the whole pt_regs structure, just for the sake
> of the syscall_trace_enter function; once that finishes, it is
> popped off again.
>
> As for the unwind, we should get the most important ones. Of course
> the unwind could be based on any register, and with a bogus value
> the unwind simply stops.
>
> I think we are in better shape for irqs but I'd need to check.
>
> jirka

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 02/16] perf: Unified API to record selective sets of arch registers
  2012-05-02 14:46                 ` Stephane Eranian
@ 2012-05-03 10:25                   ` Jiri Olsa
  0 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2012-05-03 10:25 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, rostedt, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper, Arun Sharma

On Wed, May 02, 2012 at 04:46:33PM +0200, Stephane Eranian wrote:
> On Wed, May 2, 2012 at 2:58 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > SNIP
> >
> >> >
> >> > I just sent v3, with changed design to be more generic, please check
> >> >
> >> > anyway, currently there's no way to mix 32 and 64 bit registers in sample.
> >> >
> >> > As I mentioned above, once running compat task, 64 bit registers
> >> > are stored anyway. Given that all 32 bit registers have 64 equiv.
> >> > you can ask to store RAX|RBX|R15.
> >> >
> >> Well, R8-R15 do not exist in 32-bit mode. So I wonder what is saved
> >> on the stack for those, probably nothing. And in that case, how do you
> >> handle the case where the user asked for R15 but it is not available and
> >> you know that only on PMU interrupt.
> >
> > right, R8-R15 do not exist in 32-bit mode, meaning that a 32-bit task
> > does not use them... but when you enter the 64-bit kernel from a 32-bit
> > compat task, the 64-bit registers are still saved.. as for a native
> > 64-bit process,
> 
> I am confused by your term '64-bit registers' here. I assume you
> mean the registers are saved as 64-bit integers. But that does not mean
> that the full set of 64-bit registers (incl. R8-R15) is saved. Unless
same set as for 64-bit tasks.. it's not always full, as I described
in the previous email

> you're telling
> me that whatever values are in those 64-bit ABI only registers are thus saved
> on the stack.
yep, that's what I see in arch/x86/ia32/ia32entry.S
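The practical consequence for a consumer: in a compat sample the R8-R15 slots are written but not meaningful, so the sample's ABI word is the only way to decide whether to trust them. A hedged sketch of such a check; the enum values and the bit numbering (bits 8..15 = r8..r15) are our assumptions, not something fixed by this thread.

```c
#include <stdbool.h>
#include <stdint.h>

enum { REGS_ABI_NONE = 0, REGS_ABI_32 = 1, REGS_ABI_64 = 2 };

/* For a 32-bit (compat) sample, the slots for the 64-bit-only
 * registers hold whatever happened to be live on kernel entry,
 * so treat them as untrustworthy. */
static bool reg_is_meaningful(uint64_t abi, int reg_bit)
{
    if (abi == REGS_ABI_NONE)
        return false;
    if (abi == REGS_ABI_32 && reg_bit >= 8 && reg_bit <= 15)
        return false;
    return true;
}
```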

jirka

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2012-05-03 10:25 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-17 11:17 [RFCv2 00/15] perf: Add backtrace post dwarf unwind Jiri Olsa
2012-04-17 11:17 ` [PATCH 01/16] uaccess: Add new copy_from_user_gup API Jiri Olsa
2012-04-17 11:17 ` [PATCH 02/16] perf: Unified API to record selective sets of arch registers Jiri Olsa
2012-04-23 10:10   ` Stephane Eranian
2012-04-23 10:33     ` Jiri Olsa
2012-04-26 15:28       ` Jiri Olsa
2012-05-02 12:00         ` Stephane Eranian
2012-05-02 12:26           ` Jiri Olsa
2012-05-02 12:36             ` Stephane Eranian
2012-05-02 12:58               ` Jiri Olsa
2012-05-02 14:46                 ` Stephane Eranian
2012-05-03 10:25                   ` Jiri Olsa
2012-04-17 11:17 ` [PATCH 03/16] perf: Add ability to dump user regs Jiri Olsa
2012-04-23 10:15   ` Stephane Eranian
2012-04-17 11:17 ` [PATCH 04/16] perf: Add ability to dump part of the user stack Jiri Olsa
2012-04-17 11:17 ` [PATCH 05/16] perf: Add attribute to filter out user callchains Jiri Olsa
2012-04-17 11:17 ` [PATCH 06/16] perf, tool: Factor DSO symtab types to generic binary types Jiri Olsa
2012-04-17 11:17 ` [PATCH 07/16] perf, tool: Add interface to read DSO image data Jiri Olsa
2012-04-17 11:17 ` [PATCH 08/16] perf, tool: Add '.note' check into search for NOTE section Jiri Olsa
2012-04-17 11:17 ` [PATCH 09/16] perf, tool: Back [vdso] DSO with real data Jiri Olsa
2012-04-17 11:17 ` [PATCH 10/16] perf, tool: Add interface to arch registers sets Jiri Olsa
2012-04-17 11:17 ` [PATCH 11/16] perf, tool: Add libunwind dependency for dwarf cfi unwinding Jiri Olsa
2012-04-17 11:17 ` [PATCH 12/16] perf, tool: Support user regs and stack in sample parsing Jiri Olsa
2012-04-17 11:17 ` [PATCH 13/16] perf, tool: Support for dwarf cfi unwinding on post processing Jiri Olsa
2012-04-17 11:17 ` [PATCH 14/16] perf, tool: Support for dwarf mode callchain on perf record Jiri Olsa
2012-04-17 11:17 ` [PATCH 15/16] perf, tool: Add dso data caching Jiri Olsa
2012-04-17 11:17 ` [PATCH 16/16] perf, tool: Add dso data caching tests Jiri Olsa
2012-04-18  6:51 ` [RFCv2 00/15] perf: Add backtrace post dwarf unwind Frederic Weisbecker
