* [RFCv5 00/19] perf: Add backtrace post dwarf unwind
@ 2012-06-11 13:19 Jiri Olsa
  2012-06-11 13:19 ` [PATCH 01/19] perf: Unified API to record selective sets of arch registers Jiri Olsa
                   ` (19 more replies)
  0 siblings, 20 replies; 37+ messages in thread
From: Jiri Olsa @ 2012-06-11 13:19 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, robert.richter, fche,
	linux-kernel, masami.hiramatsu.pt, drepper, asharma,
	benjamin.redelings

hi,
besides fixing several issues, this version goes back to the original
design, because the last one was considered too generic. Now we have:

 sample_regs_user  - != 0 triggers the user level regs dump
 sample_stack_user - != 0 triggers the user stack dump

We can always extend this in the future with new masks and flags
for IRQ/PEBS regs.

patches available also as tarball in here:
http://people.redhat.com/~jolsa/perf_post_unwind_v5.tar.bz2

v5 changes:
   patch 1/19 - having just one enum set of the perf registers
   patch 2/19 - using for_each_set_bit for scanning the mask
              - single regs enum for both 32 and 64 bits versions
              - using regs mask != 0 to trigger the regs dump
   patch 5/19 - adding perf_output_skip so we can skip the undumped part of the stack in the ring buffer
   patch 6/19 - using stack size != 0 to trigger the stack dump
              - do not zero the memory for the non retrieved part of the stack dump
   patch 7/19 - adding exclude_callchain_kernel attribute
   patch 8/19 - this could be taken without the rest of the series

v4 changes:
   - no real change from v3, just rebase
   - v3 patch 06/17 got already merged

v3 changes:
   patch 01/17
   - added HAVE_PERF_REGS config option
   patch 02/17, 04/17
   - regs and stack perf interface is more general now
   patch 06/17
   - unrelated one-line fix for i386 compilation
   patch 16/17
   - few namespace fixes

---
Adding the post unwinding user stack backtrace using dwarf unwind
via libunwind. The original work was done by Frederic. I mostly took
his patches and made them compile in current kernel code, plus I added
some stuff here and there.

The main idea is to store the user registers and a portion of the user
stack with the sample data during the record phase. Then during
the report, when the data is presented, perform the actual dwarf
unwind.

attached patches:
  01/19 perf: Unified API to record selective sets of arch registers
  02/19 perf: Add ability to attach user level registers dump to sample
  03/19 perf, x86: Add copy_from_user_nmi_nochk for best effort copy
  04/19 perf: Factor __output_copy to be usable with specific copy function
  05/19 perf: Add perf_output_skip function to skip bytes in sample
  06/19 perf: Add ability to attach user stack dump to sample
  07/19 perf: Add attribute to filter out callchains
  08/19 perf, tool: Remove unsused evsel parameter from machine__resolve_callchain
  09/19 perf, tool: Factor DSO symtab types to generic binary types
  10/19 perf, tool: Add interface to read DSO image data
  11/19 perf, tool: Add '.note' check into search for NOTE section
  12/19 perf, tool: Back [vdso] DSO with real data
  13/19 perf, tool: Add interface to arch registers sets
  14/19 perf, tool: Add libunwind dependency for dwarf cfi unwinding
  15/19 perf, tool: Support user regs and stack in sample parsing
  16/19 perf, tool: Support for dwarf cfi unwinding on post processing
  17/19 perf, tool: Support for dwarf mode callchain on perf record
  18/19 perf, tool: Add dso data caching
  19/19 perf, tool: Add dso data caching tests


I tested on Fedora. There was not much gain on i386, because the
binaries are compiled with frame pointers. Though the dwarf
backtrace is more accurate and unwraps calls in more detail
(functions that do not set the frame pointers).

I could see some improvement on x86_64, where I got a full backtrace
where the current code could get just the first address out of the
instruction pointer.

Example on x86_64:
[dwarf]
   perf record -g -e syscalls:sys_enter_write date

   100.00%     date  libc-2.14.90.so  [.] __GI___libc_write
               |
               --- __GI___libc_write
                   _IO_file_write@@GLIBC_2.2.5
                   new_do_write
                   _IO_do_write@@GLIBC_2.2.5
                   _IO_file_overflow@@GLIBC_2.2.5
                   0x4022cd
                   0x401ee6
                   __libc_start_main
                   0x4020b9


[frame pointer]
   perf record -g fp -e syscalls:sys_enter_write date

   100.00%     date  libc-2.14.90.so  [.] __GI___libc_write
               |
               --- __GI___libc_write

I tested mainly on coreutils binaries, but I could also see
wider backtraces with dwarf unwind for more complex
applications like firefox.

The unwind should go through the [vdso] object. I haven't studied
the [vsyscall] yet, so not sure there.

Attached patches should work on both x86 and x86_64. I did
some initial testing so far.

The unwind backtrace can be interrupted for the following reasons:
    - bug in unwind information of processed shared library
    - bug in unwind processing code (most likely ;) )
    - insufficient dump stack size
    - wrong register value - x86_64 does not store the whole
      set of registers when in exception, but so far
      it looks like RIP and RSP should be enough

thanks for comments,
jirka
---
 arch/Kconfig                                       |    6 +
 arch/x86/Kconfig                                   |    1 +
 arch/x86/include/asm/perf_event.h                  |    2 +
 arch/x86/include/asm/perf_regs.h                   |   34 ++
 arch/x86/include/asm/uaccess.h                     |    2 +
 arch/x86/kernel/Makefile                           |    2 +
 arch/x86/kernel/perf_regs.c                        |   91 ++++
 arch/x86/lib/usercopy.c                            |   15 +-
 include/linux/perf_event.h                         |   24 +-
 include/linux/perf_regs.h                          |   19 +
 kernel/events/callchain.c                          |   25 +-
 kernel/events/core.c                               |  132 +++++-
 kernel/events/internal.h                           |   69 ++-
 kernel/events/ring_buffer.c                        |   10 +-
 tools/perf/Makefile                                |   45 ++-
 tools/perf/arch/x86/Makefile                       |    3 +
 tools/perf/arch/x86/include/perf_regs.h            |   80 +++
 tools/perf/arch/x86/util/unwind.c                  |  111 ++++
 tools/perf/builtin-record.c                        |   86 +++-
 tools/perf/builtin-report.c                        |   24 +-
 tools/perf/builtin-script.c                        |   56 ++-
 tools/perf/builtin-test.c                          |    7 +-
 tools/perf/builtin-top.c                           |    7 +-
 tools/perf/config/feature-tests.mak                |   25 +
 tools/perf/perf.h                                  |    9 +-
 tools/perf/util/annotate.c                         |    2 +-
 tools/perf/util/dso-test-data.c                    |  154 ++++++
 tools/perf/util/event.h                            |   15 +-
 tools/perf/util/evlist.c                           |   16 +
 tools/perf/util/evlist.h                           |    2 +
 tools/perf/util/evsel.c                            |   35 ++-
 tools/perf/util/include/linux/compiler.h           |    1 +
 tools/perf/util/map.c                              |   23 +-
 tools/perf/util/map.h                              |    9 +-
 tools/perf/util/perf_regs.h                        |   14 +
 tools/perf/util/python.c                           |    3 +-
 .../perf/util/scripting-engines/trace-event-perl.c |    3 +-
 .../util/scripting-engines/trace-event-python.c    |    3 +-
 tools/perf/util/session.c                          |  110 ++++-
 tools/perf/util/session.h                          |   17 +-
 tools/perf/util/symbol.c                           |  435 +++++++++++++---
 tools/perf/util/symbol.h                           |   52 ++-
 tools/perf/util/trace-event-scripting.c            |    3 +-
 tools/perf/util/trace-event.h                      |    5 +-
 tools/perf/util/unwind.c                           |  565 ++++++++++++++++++++
 tools/perf/util/unwind.h                           |   34 ++
 tools/perf/util/vdso.c                             |   90 +++
 tools/perf/util/vdso.h                             |    8 +
 48 files changed, 2278 insertions(+), 206 deletions(-)


* [PATCH 01/19] perf: Unified API to record selective sets of arch registers
  2012-06-11 13:19 [RFCv5 00/19] perf: Add backtrace post dwarf unwind Jiri Olsa
@ 2012-06-11 13:19 ` Jiri Olsa
  2012-06-11 13:19 ` [PATCH 02/19] perf: Add ability to attach user level registers dump to sample Jiri Olsa
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2012-06-11 13:19 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, robert.richter, fche,
	linux-kernel, masami.hiramatsu.pt, drepper, asharma,
	benjamin.redelings, Jiri Olsa

This brings a new API to help the selective dump of registers on
event sampling, and its implementation for x86 arch.

Added HAVE_PERF_REGS config option to determine if the architecture
provides perf registers ABI.

The information about desired registers will be passed in u64 mask.
It's up to the architecture to map the registers into the mask bits.

For the x86 arch implementation, both 32 and 64 bit register
bits are defined within a single enum to ensure a 64 bit system can
provide a register dump for a compat task if needed in the future.
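
To illustrate the mask semantics described above, here is a minimal
userspace sketch. It is not part of the patch: the SK_/sketch_ names
are made up for this example, and only the register ordering and the
64 bit validation rules are copied from the enum and
perf_reg_validate() in the diff below.

```c
#include <stdint.h>
#include <errno.h>

/* Same ordering as enum perf_event_x86_regs; names are illustrative. */
enum sketch_x86_regs {
	SK_REG_AX, SK_REG_BX, SK_REG_CX, SK_REG_DX,
	SK_REG_SI, SK_REG_DI, SK_REG_BP, SK_REG_SP,
	SK_REG_IP, SK_REG_FLAGS, SK_REG_CS,
	SK_REG_DS, SK_REG_ES, SK_REG_FS, SK_REG_GS,
	SK_REG_R8, SK_REG_R9, SK_REG_R10, SK_REG_R11,
	SK_REG_R12, SK_REG_R13, SK_REG_R14, SK_REG_R15,
	SK_REG_SS,
	SK_REG_64_MAX = SK_REG_SS + 1,
};

#define SK_BIT(r)	(1ULL << (r))
/* Every bit at or above SK_REG_64_MAX is reserved. */
#define SK_RESERVED	(~(SK_BIT(SK_REG_64_MAX) - 1ULL))
/* Segment registers that pt_regs does not carry in 64 bit mode. */
#define SK_NOSUPPORT	(SK_BIT(SK_REG_DS) | SK_BIT(SK_REG_ES) | \
			 SK_BIT(SK_REG_FS) | SK_BIT(SK_REG_GS))

/* Mirrors the 64 bit perf_reg_validate() logic from the patch. */
int sketch_reg_validate_64(uint64_t mask)
{
	if (mask & SK_RESERVED)
		return -EINVAL;
	if (mask & SK_NOSUPPORT)
		return -EINVAL;
	return 0;
}

/* A mask an unwinder might ask for: IP, SP and BP (an assumption). */
uint64_t sketch_unwind_mask(void)
{
	return SK_BIT(SK_REG_IP) | SK_BIT(SK_REG_SP) | SK_BIT(SK_REG_BP);
}
```

With this ordering the IP/SP/BP mask is 0x1c0, and requesting FS or
any bit past SS is rejected with -EINVAL.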

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 arch/Kconfig                     |    6 +++
 arch/x86/Kconfig                 |    1 +
 arch/x86/include/asm/perf_regs.h |   34 ++++++++++++++
 arch/x86/kernel/Makefile         |    2 +
 arch/x86/kernel/perf_regs.c      |   91 ++++++++++++++++++++++++++++++++++++++
 include/linux/perf_regs.h        |   19 ++++++++
 6 files changed, 153 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/perf_regs.h
 create mode 100644 arch/x86/kernel/perf_regs.c
 create mode 100644 include/linux/perf_regs.h

diff --git a/arch/Kconfig b/arch/Kconfig
index 8c3d957..32f4873 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -222,6 +222,12 @@ config HAVE_PERF_EVENTS_NMI
 	  subsystem.  Also has support for calculating CPU cycle events
 	  to determine how many clock cycles in a given period.
 
+config HAVE_PERF_REGS
+	bool
+	help
+	  Support selective register dumps for perf events. This includes
+	  bit-mapping of each register and a unique architecture id.
+
 config HAVE_ARCH_JUMP_LABEL
 	bool
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d35438e..457d8d6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -60,6 +60,7 @@ config X86
 	select HAVE_MIXED_BREAKPOINTS_REGS
 	select PERF_EVENTS
 	select HAVE_PERF_EVENTS_NMI
+	select HAVE_PERF_REGS
 	select ANON_INODES
 	select HAVE_ALIGNED_STRUCT_PAGE if SLUB && !M386
 	select HAVE_CMPXCHG_LOCAL if !M386
diff --git a/arch/x86/include/asm/perf_regs.h b/arch/x86/include/asm/perf_regs.h
new file mode 100644
index 0000000..0397bfc
--- /dev/null
+++ b/arch/x86/include/asm/perf_regs.h
@@ -0,0 +1,34 @@
+#ifndef _ASM_X86_PERF_REGS_H
+#define _ASM_X86_PERF_REGS_H
+
+enum perf_event_x86_regs {
+	PERF_REG_X86_AX,
+	PERF_REG_X86_BX,
+	PERF_REG_X86_CX,
+	PERF_REG_X86_DX,
+	PERF_REG_X86_SI,
+	PERF_REG_X86_DI,
+	PERF_REG_X86_BP,
+	PERF_REG_X86_SP,
+	PERF_REG_X86_IP,
+	PERF_REG_X86_FLAGS,
+	PERF_REG_X86_CS,
+	PERF_REG_X86_DS,
+	PERF_REG_X86_ES,
+	PERF_REG_X86_FS,
+	PERF_REG_X86_GS,
+	PERF_REG_X86_R8,
+	PERF_REG_X86_R9,
+	PERF_REG_X86_R10,
+	PERF_REG_X86_R11,
+	PERF_REG_X86_R12,
+	PERF_REG_X86_R13,
+	PERF_REG_X86_R14,
+	PERF_REG_X86_R15,
+	PERF_REG_X86_SS,
+
+	/* non ABI */
+	PERF_REG_X86_64_MAX = PERF_REG_X86_SS + 1,
+	PERF_REG_X86_32_MAX = PERF_REG_X86_GS + 1,
+};
+#endif /* _ASM_X86_PERF_REGS_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 8215e56..8d7a619 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -100,6 +100,8 @@ obj-$(CONFIG_SWIOTLB)			+= pci-swiotlb.o
 obj-$(CONFIG_OF)			+= devicetree.o
 obj-$(CONFIG_UPROBES)			+= uprobes.o
 
+obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
+
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
new file mode 100644
index 0000000..87f0e81
--- /dev/null
+++ b/arch/x86/kernel/perf_regs.c
@@ -0,0 +1,91 @@
+
+#include <linux/kernel.h>
+#include <linux/bug.h>
+#include <linux/stddef.h>
+#include <asm/perf_regs.h>
+#include <asm/ptrace.h>
+
+#ifdef CONFIG_X86_32
+#define PERF_REG_X86_MAX PERF_REG_X86_32_MAX
+#else
+#define PERF_REG_X86_MAX PERF_REG_X86_64_MAX
+#endif
+
+#define PT_REGS_OFFSET(id, r) [id] = offsetof(struct pt_regs, r)
+
+static unsigned int pt_regs_offset[PERF_REG_X86_MAX] = {
+	PT_REGS_OFFSET(PERF_REG_X86_AX, ax),
+	PT_REGS_OFFSET(PERF_REG_X86_BX, bx),
+	PT_REGS_OFFSET(PERF_REG_X86_CX, cx),
+	PT_REGS_OFFSET(PERF_REG_X86_DX, dx),
+	PT_REGS_OFFSET(PERF_REG_X86_SI, si),
+	PT_REGS_OFFSET(PERF_REG_X86_DI, di),
+	PT_REGS_OFFSET(PERF_REG_X86_BP, bp),
+	PT_REGS_OFFSET(PERF_REG_X86_SP, sp),
+	PT_REGS_OFFSET(PERF_REG_X86_IP, ip),
+	PT_REGS_OFFSET(PERF_REG_X86_FLAGS, flags),
+	PT_REGS_OFFSET(PERF_REG_X86_CS, cs),
+#ifdef CONFIG_X86_32
+	PT_REGS_OFFSET(PERF_REG_X86_DS, ds),
+	PT_REGS_OFFSET(PERF_REG_X86_ES, es),
+	PT_REGS_OFFSET(PERF_REG_X86_FS, fs),
+	PT_REGS_OFFSET(PERF_REG_X86_GS, gs),
+#else
+	/*
+	 * The pt_regs struct does not store
+	 * ds, es, fs, gs in 64 bit mode.
+	 */
+	(unsigned int) -1,
+	(unsigned int) -1,
+	(unsigned int) -1,
+	(unsigned int) -1,
+#endif
+#ifdef CONFIG_X86_64
+	PT_REGS_OFFSET(PERF_REG_X86_R8, r8),
+	PT_REGS_OFFSET(PERF_REG_X86_R9, r9),
+	PT_REGS_OFFSET(PERF_REG_X86_R10, r10),
+	PT_REGS_OFFSET(PERF_REG_X86_R11, r11),
+	PT_REGS_OFFSET(PERF_REG_X86_R12, r12),
+	PT_REGS_OFFSET(PERF_REG_X86_R13, r13),
+	PT_REGS_OFFSET(PERF_REG_X86_R14, r14),
+	PT_REGS_OFFSET(PERF_REG_X86_R15, r15),
+	PT_REGS_OFFSET(PERF_REG_X86_SS, ss),
+#endif
+};
+
+u64 perf_reg_value(struct pt_regs *regs, int idx)
+{
+	if (WARN_ON_ONCE(idx >= ARRAY_SIZE(pt_regs_offset)))
+		return 0;
+
+	return regs_get_register(regs, pt_regs_offset[idx]);
+}
+
+#define REG_RESERVED (~((1ULL << PERF_REG_X86_MAX) - 1ULL))
+
+#ifdef CONFIG_X86_32
+int perf_reg_validate(u64 mask)
+{
+	if (mask & REG_RESERVED)
+		return -EINVAL;
+
+	return 0;
+}
+
+#else /* CONFIG_X86_64 */
+#define REG_NOSUPPORT ((1ULL << PERF_REG_X86_DS) | \
+		       (1ULL << PERF_REG_X86_ES) | \
+		       (1ULL << PERF_REG_X86_FS) | \
+		       (1ULL << PERF_REG_X86_GS))
+
+int perf_reg_validate(u64 mask)
+{
+	if (mask & REG_RESERVED)
+		return -EINVAL;
+
+	if (mask & REG_NOSUPPORT)
+		return -EINVAL;
+
+	return 0;
+}
+#endif /* CONFIG_X86_32 */
diff --git a/include/linux/perf_regs.h b/include/linux/perf_regs.h
new file mode 100644
index 0000000..a2f1a98
--- /dev/null
+++ b/include/linux/perf_regs.h
@@ -0,0 +1,19 @@
+#ifndef _LINUX_PERF_REGS_H
+#define _LINUX_PERF_REGS_H
+
+#ifdef CONFIG_HAVE_PERF_REGS
+#include <asm/perf_regs.h>
+u64 perf_reg_value(struct pt_regs *regs, int idx);
+int perf_reg_validate(u64 mask);
+#else
+static inline u64 perf_reg_value(struct pt_regs *regs, int idx)
+{
+	return 0;
+}
+
+static inline int perf_reg_validate(u64 mask)
+{
+	return mask ? -ENOSYS : 0;
+}
+#endif /* CONFIG_HAVE_PERF_REGS */
+#endif /* _LINUX_PERF_REGS_H */
-- 
1.7.7.6



* [PATCH 02/19] perf: Add ability to attach user level registers dump to sample
  2012-06-11 13:19 [RFCv5 00/19] perf: Add backtrace post dwarf unwind Jiri Olsa
  2012-06-11 13:19 ` [PATCH 01/19] perf: Unified API to record selective sets of arch registers Jiri Olsa
@ 2012-06-11 13:19 ` Jiri Olsa
  2012-06-13 11:16   ` Stephane Eranian
  2012-06-13 13:29   ` Stephane Eranian
  2012-06-11 13:19 ` [PATCH 03/19] perf, x86: Add copy_from_user_nmi_nochk for best effort copy Jiri Olsa
                   ` (17 subsequent siblings)
  19 siblings, 2 replies; 37+ messages in thread
From: Jiri Olsa @ 2012-06-11 13:19 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, robert.richter, fche,
	linux-kernel, masami.hiramatsu.pt, drepper, asharma,
	benjamin.redelings, Jiri Olsa

Introducing sample_regs_user bitmask into perf_event_attr
struct to define the user level registers we want to attach
to the sample. The dump itself is triggered once the
sample_regs_user is not empty.

Only user level registers are dumped at the moment, meaning the
register values of the user space context as it was before the
user entered the kernel for whatever reason (syscall, irq,
exception, or a PMI happening in userspace).

The layout of the sample_regs_user bitmap is described in
asm/perf_regs.h for archs that support register dump.

This is going to be useful to bring Dwarf CFI based stack
unwinding on top of samples.
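
For readers of the sample data, a rough userspace sketch of how the
register section could be decoded, based on the layout the diff below
produces: one u64 "available" word, followed (only when it is non-zero)
by one u64 per set bit of sample_regs_user, in ascending bit order. The
struct and the sketch_ function names are hypothetical and not part of
the patch.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical container for a decoded dump; not part of the patch. */
struct sketch_regs_dump {
	int avail;		/* non-zero when registers were dumped */
	uint64_t val[64];	/* indexed by register bit number */
};

/* Portable population count, standing in for hweight64(). */
static unsigned int sketch_hweight64(uint64_t w)
{
	unsigned int n = 0;

	while (w) {
		w &= w - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Bytes the register section occupies in one sample. */
size_t sketch_regs_section_size(uint64_t mask, int avail)
{
	size_t size = sizeof(uint64_t);	/* the "available" word */

	if (avail)
		size += sketch_hweight64(mask) * sizeof(uint64_t);
	return size;
}

/* Decode the section at 'in'; returns the number of u64 words consumed. */
size_t sketch_parse_regs(const uint64_t *in, uint64_t mask,
			 struct sketch_regs_dump *out)
{
	size_t pos = 0;
	int bit;

	out->avail = in[pos++] != 0;
	if (!out->avail)
		return pos;

	/* Values follow in ascending bit order, one per set mask bit. */
	for (bit = 0; bit < 64; bit++) {
		if (mask & (1ULL << bit))
			out->val[bit] = in[pos++];
	}
	return pos;
}
```

The size helper matches what perf_prepare_sample() adds to
header->size: 8 bytes when no user registers are available, otherwise
8 bytes plus 8 per requested register.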

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/perf_event.h |   10 ++++++-
 kernel/events/core.c       |   61 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 70 insertions(+), 1 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1ce887a..d66cbeb 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -271,7 +271,13 @@ struct perf_event_attr {
 		__u64		bp_len;
 		__u64		config2; /* extension of config1 */
 	};
-	__u64	branch_sample_type; /* enum branch_sample_type */
+	__u64	branch_sample_type; /* enum perf_branch_sample_type */
+
+	/*
+	 * Defines set of user regs to dump on samples.
+	 * See asm/perf_regs.h for details.
+	 */
+	__u64	sample_regs_user;
 };
 
 /*
@@ -609,6 +615,7 @@ struct perf_guest_info_callbacks {
 #include <linux/static_key.h>
 #include <linux/atomic.h>
 #include <linux/sysfs.h>
+#include <linux/perf_regs.h>
 #include <asm/local.h>
 
 struct perf_callchain_entry {
@@ -1131,6 +1138,7 @@ struct perf_sample_data {
 	struct perf_callchain_entry	*callchain;
 	struct perf_raw_record		*raw;
 	struct perf_branch_stack	*br_stack;
+	struct pt_regs			*regs_user;
 };
 
 static inline void perf_sample_data_init(struct perf_sample_data *data,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f85c015..e4df59d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3750,6 +3750,33 @@ int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
 }
 EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);
 
+static void
+perf_output_sample_regs(struct perf_output_handle *handle,
+			struct pt_regs *regs, u64 mask)
+{
+	int bit;
+
+	for_each_set_bit(bit, (const unsigned long *) &mask,
+			 sizeof(mask) * BITS_PER_BYTE) {
+		u64 val;
+
+		val = perf_reg_value(regs, bit);
+		perf_output_put(handle, val);
+	}
+}
+
+static struct pt_regs *perf_sample_regs_user(struct pt_regs *regs)
+{
+	if (!user_mode(regs)) {
+		if (current->mm)
+			regs = task_pt_regs(current);
+		else
+			regs = NULL;
+	}
+
+	return regs;
+}
+
 static void __perf_event_header__init_id(struct perf_event_header *header,
 					 struct perf_sample_data *data,
 					 struct perf_event *event)
@@ -4010,6 +4037,23 @@ void perf_output_sample(struct perf_output_handle *handle,
 			perf_output_put(handle, nr);
 		}
 	}
+
+	if (event->attr.sample_regs_user) {
+		u64 avail = (data->regs_user != NULL);
+
+		/*
+		 * If there are no regs to dump, notice it through
+		 * first u64 being zero.
+		 */
+		perf_output_put(handle, avail);
+
+		if (avail) {
+			u64 mask = event->attr.sample_regs_user;
+			perf_output_sample_regs(handle,
+						data->regs_user,
+						mask);
+		}
+	}
 }
 
 void perf_prepare_sample(struct perf_event_header *header,
@@ -4061,6 +4105,19 @@ void perf_prepare_sample(struct perf_event_header *header,
 		}
 		header->size += size;
 	}
+
+	if (event->attr.sample_regs_user) {
+		/* regs dump available bool */
+		int size = sizeof(u64);
+
+		data->regs_user = perf_sample_regs_user(regs);
+		if (data->regs_user) {
+			u64 mask = event->attr.sample_regs_user;
+			size += hweight64(mask) * sizeof(u64);
+		}
+
+		header->size += size;
+	}
 }
 
 static void perf_event_output(struct perf_event *event,
@@ -6110,6 +6167,10 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr,
 			attr->branch_sample_type = mask;
 		}
 	}
+
+	if (attr->sample_regs_user)
+		ret = perf_reg_validate(attr->sample_regs_user);
+
 out:
 	return ret;
 
-- 
1.7.7.6



* [PATCH 03/19] perf, x86: Add copy_from_user_nmi_nochk for best effort copy
  2012-06-11 13:19 [RFCv5 00/19] perf: Add backtrace post dwarf unwind Jiri Olsa
  2012-06-11 13:19 ` [PATCH 01/19] perf: Unified API to record selective sets of arch registers Jiri Olsa
  2012-06-11 13:19 ` [PATCH 02/19] perf: Add ability to attach user level registers dump to sample Jiri Olsa
@ 2012-06-11 13:19 ` Jiri Olsa
  2012-06-11 13:19 ` [PATCH 04/19] perf: Factor __output_copy to be usable with specific copy function Jiri Olsa
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2012-06-11 13:19 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, robert.richter, fche,
	linux-kernel, masami.hiramatsu.pt, drepper, asharma,
	benjamin.redelings, Jiri Olsa

Adding copy_from_user_nmi_nochk that provides a best effort
copy regardless of whether the requested size crosses the task boundary.

This is going to be useful for stack dump we need in post
DWARF CFI based unwind.
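
The difference between the checked and the unchecked variant can be
sketched with a toy userspace model. Everything here is an assumption
made for illustration (a 4 page "address space" with 16 byte pages, the
sk_ names); it only mimics the shape of the two functions, not the GUP
based implementation in the diff below.

```c
#include <stddef.h>
#include <string.h>

#define SK_PAGE		16u			/* toy page size */
#define SK_NPAGES	4u			/* pages 0..3 are "mapped" */
#define SK_TASK_SIZE	(SK_PAGE * SK_NPAGES)	/* toy user address limit */

static unsigned char sk_mem[SK_PAGE * SK_NPAGES];

/*
 * Best effort: copy page by page and stop at the first unmapped page.
 * Returns the number of bytes actually copied.
 */
size_t sk_copy_nochk(void *to, size_t from, size_t n)
{
	size_t len = 0;

	while (len < n) {
		size_t addr = from + len;
		size_t chunk;

		if (addr / SK_PAGE >= SK_NPAGES)
			break;	/* page not mapped, give up here */

		chunk = SK_PAGE - addr % SK_PAGE;
		if (chunk > n - len)
			chunk = n - len;

		memcpy((unsigned char *)to + len, &sk_mem[addr], chunk);
		len += chunk;
	}
	return len;
}

/* Strict: refuse the whole request if the range leaves the address space. */
size_t sk_copy_checked(void *to, size_t from, size_t n)
{
	if (from > SK_TASK_SIZE || n > SK_TASK_SIZE - from)
		return 0;

	return sk_copy_nochk(to, from, n);
}
```

A request for 32 bytes starting 16 bytes below the limit gets nothing
from the checked variant but 16 bytes from the unchecked one, which is
the behaviour a fixed-size stack dump near the top of the stack needs.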

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 arch/x86/include/asm/uaccess.h |    2 ++
 arch/x86/lib/usercopy.c        |   15 +++++++++++----
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index e1f3a17..d8d6bcd 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -562,6 +562,8 @@ struct __large_struct { unsigned long buf[100]; };
 #endif /* CONFIG_X86_WP_WORKS_OK */
 
 extern unsigned long
+copy_from_user_nmi_nochk(void *to, const void __user *from, unsigned long n);
+extern unsigned long
 copy_from_user_nmi(void *to, const void __user *from, unsigned long n);
 extern __must_check long
 strncpy_from_user(char *dst, const char __user *src, long count);
diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index 677b1ed..29ca1c7 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -14,7 +14,7 @@
  * best effort, GUP based copy_from_user() that is NMI-safe
  */
 unsigned long
-copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
+copy_from_user_nmi_nochk(void *to, const void __user *from, unsigned long n)
 {
 	unsigned long offset, addr = (unsigned long)from;
 	unsigned long size, len = 0;
@@ -22,9 +22,6 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
 	void *map;
 	int ret;
 
-	if (__range_not_ok(from, n, TASK_SIZE) == 0)
-		return len;
-
 	do {
 		ret = __get_user_pages_fast(addr, 1, 0, &page);
 		if (!ret)
@@ -46,4 +43,14 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
 
 	return len;
 }
+EXPORT_SYMBOL_GPL(copy_from_user_nmi_nochk);
+
+unsigned long
+copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
+{
+	if (__range_not_ok(from, n, TASK_SIZE))
+		return 0;
+
+	return copy_from_user_nmi_nochk(to, from, n);
+}
 EXPORT_SYMBOL_GPL(copy_from_user_nmi);
-- 
1.7.7.6



* [PATCH 04/19] perf: Factor __output_copy to be usable with specific copy function
  2012-06-11 13:19 [RFCv5 00/19] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (2 preceding siblings ...)
  2012-06-11 13:19 ` [PATCH 03/19] perf, x86: Add copy_from_user_nmi_nochk for best effort copy Jiri Olsa
@ 2012-06-11 13:19 ` Jiri Olsa
  2012-06-11 13:20 ` [PATCH 05/19] perf: Add perf_output_skip function to skip bytes in sample Jiri Olsa
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2012-06-11 13:19 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, robert.richter, fche,
	linux-kernel, masami.hiramatsu.pt, drepper, asharma,
	benjamin.redelings, Jiri Olsa

Adding a generic way to use __output_copy function with
specific copy function via DEFINE_OUTPUT_COPY macro.

Using this to add new __output_copy_user function, that provides
output copy from user pointers. For x86 the copy_from_user_nmi_nochk
function is used and __copy_from_user_inatomic for the rest
of the architectures.

This new function will be used in user stack dump on sample,
coming in next patches.
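
As a rough userspace model of the macro approach, assuming a toy paged
buffer (the sk_ names, the 8 byte pages and the "short" copier are all
made up for this sketch): the macro stamps out one output function per
copy routine, the copy routine returns the number of bytes it managed
to copy, and the generated function returns the number of bytes left
uncopied.

```c
#include <stddef.h>
#include <string.h>

#define SK_PAGE_SIZE	8u
#define SK_NR_PAGES	4u	/* power of two, the page wrap uses a mask */

struct sk_handle {
	unsigned char pages[SK_NR_PAGES][SK_PAGE_SIZE];
	unsigned int page;	/* current page index */
	unsigned char *addr;	/* write position inside the current page */
	size_t size;		/* bytes left in the current page */
};

void sk_handle_init(struct sk_handle *h)
{
	memset(h, 0, sizeof(*h));
	h->addr = h->pages[0];
	h->size = SK_PAGE_SIZE;
}

/* Generate an output function around copy_func (returns bytes copied). */
#define SK_DEFINE_OUTPUT_COPY(func_name, copy_func)			\
size_t func_name(struct sk_handle *h, const void *buf, size_t len)	\
{									\
	const unsigned char *src = buf;					\
	size_t size, written;						\
									\
	do {								\
		size = len < h->size ? len : h->size;			\
		written = copy_func(h->addr, src, size);		\
									\
		len -= written;						\
		h->addr += written;					\
		src += written;						\
		h->size -= written;					\
		if (!h->size) {						\
			h->page = (h->page + 1) & (SK_NR_PAGES - 1);	\
			h->addr = h->pages[h->page];			\
			h->size = SK_PAGE_SIZE;				\
		}							\
	} while (len && written == size);				\
									\
	return len;							\
}

/* Plain copy that always succeeds. */
size_t sk_memcpy_all(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
	return n;
}

/* Toy stand-in for a faulting user copy: at most 5 bytes per call. */
size_t sk_memcpy_short(void *dst, const void *src, size_t n)
{
	size_t done = n < 5 ? n : 5;

	memcpy(dst, src, done);
	return done;
}

SK_DEFINE_OUTPUT_COPY(sk_output_copy, sk_memcpy_all)
SK_DEFINE_OUTPUT_COPY(sk_output_copy_short, sk_memcpy_short)
```

The "written == size" loop condition is what stops the copy at the
first short transfer, so the caller learns how many bytes are still
missing.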

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 arch/x86/include/asm/perf_event.h |    2 +
 include/linux/perf_event.h        |    2 +-
 kernel/events/internal.h          |   62 ++++++++++++++++++++++++------------
 kernel/events/ring_buffer.c       |    4 +-
 4 files changed, 46 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 588f52e..7b46815 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -257,4 +257,6 @@ static inline void perf_events_lapic_init(void)	{ }
  static inline void amd_pmu_disable_virt(void) { }
 #endif
 
+#define arch_perf_out_copy_user copy_from_user_nmi_nochk
+
 #endif /* _ASM_X86_PERF_EVENT_H */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index d66cbeb..741e997 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1295,7 +1295,7 @@ static inline bool has_branch_stack(struct perf_event *event)
 extern int perf_output_begin(struct perf_output_handle *handle,
 			     struct perf_event *event, unsigned int size);
 extern void perf_output_end(struct perf_output_handle *handle);
-extern void perf_output_copy(struct perf_output_handle *handle,
+extern unsigned int perf_output_copy(struct perf_output_handle *handle,
 			     const void *buf, unsigned int len);
 extern int perf_swevent_get_recursion_context(void);
 extern void perf_swevent_put_recursion_context(int rctx);
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index b0b107f..2206a0f 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -2,6 +2,7 @@
 #define _KERNEL_EVENTS_INTERNAL_H
 
 #include <linux/hardirq.h>
+#include <linux/uaccess.h>
 
 /* Buffer handling */
 
@@ -76,30 +77,49 @@ static inline unsigned long perf_data_size(struct ring_buffer *rb)
 	return rb->nr_pages << (PAGE_SHIFT + page_order(rb));
 }
 
-static inline void
-__output_copy(struct perf_output_handle *handle,
-		   const void *buf, unsigned int len)
+#define DEFINE_OUTPUT_COPY(func_name, memcpy_func)			\
+static inline unsigned int						\
+func_name(struct perf_output_handle *handle,				\
+	  const void *buf, unsigned int len)				\
+{									\
+	unsigned long size, written;					\
+									\
+	do {								\
+		size = min_t(unsigned long, handle->size, len);		\
+									\
+		written = memcpy_func(handle->addr, buf, size);		\
+									\
+		len -= written;						\
+		handle->addr += written;				\
+		buf += written;						\
+		handle->size -= written;				\
+		if (!handle->size) {					\
+			struct ring_buffer *rb = handle->rb;		\
+									\
+			handle->page++;					\
+			handle->page &= rb->nr_pages - 1;		\
+			handle->addr = rb->data_pages[handle->page];	\
+			handle->size = PAGE_SIZE << page_order(rb);	\
+		}							\
+	} while (len && written == size);				\
+									\
+	return len;							\
+}
+
+static inline int memcpy_common(void *dst, const void *src, size_t n)
 {
-	do {
-		unsigned long size = min_t(unsigned long, handle->size, len);
-
-		memcpy(handle->addr, buf, size);
-
-		len -= size;
-		handle->addr += size;
-		buf += size;
-		handle->size -= size;
-		if (!handle->size) {
-			struct ring_buffer *rb = handle->rb;
-
-			handle->page++;
-			handle->page &= rb->nr_pages - 1;
-			handle->addr = rb->data_pages[handle->page];
-			handle->size = PAGE_SIZE << page_order(rb);
-		}
-	} while (len);
+	memcpy(dst, src, n);
+	return n;
 }
 
+DEFINE_OUTPUT_COPY(__output_copy, memcpy_common)
+
+#ifndef arch_perf_out_copy_user
+#define arch_perf_out_copy_user __copy_from_user_inatomic
+#endif
+
+DEFINE_OUTPUT_COPY(__output_copy_user, arch_perf_out_copy_user)
+
 /* Callchain handling */
 extern struct perf_callchain_entry *perf_callchain(struct pt_regs *regs);
 extern int get_callchain_buffers(void);
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 6ddaba4..b4c2ad3 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -182,10 +182,10 @@ out:
 	return -ENOSPC;
 }
 
-void perf_output_copy(struct perf_output_handle *handle,
+unsigned int perf_output_copy(struct perf_output_handle *handle,
 		      const void *buf, unsigned int len)
 {
-	__output_copy(handle, buf, len);
+	return __output_copy(handle, buf, len);
 }
 
 void perf_output_end(struct perf_output_handle *handle)
-- 
1.7.7.6



* [PATCH 05/19] perf: Add perf_output_skip function to skip bytes in sample
  2012-06-11 13:19 [RFCv5 00/19] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (3 preceding siblings ...)
  2012-06-11 13:19 ` [PATCH 04/19] perf: Factor __output_copy to be usable with specific copy function Jiri Olsa
@ 2012-06-11 13:20 ` Jiri Olsa
  2012-06-11 13:20 ` [PATCH 06/19] perf: Add ability to attach user stack dump to sample Jiri Olsa
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2012-06-11 13:20 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, robert.richter, fche,
	linux-kernel, masami.hiramatsu.pt, drepper, asharma,
	benjamin.redelings, Jiri Olsa

Introducing perf_output_skip function to be able to skip
data within the perf ring buffer.

When writing data into perf ring buffer we first reserve the needed
space in the ring buffer and then copy the actual data.

There's a possibility we won't be able to fill all the reserved
size with data, so we need a way to skip the remaining bytes.

This is going to be useful when storing the user stack dump,
where we might end up with less data than we originally requested.
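
A small userspace sketch of the reserve, fill, skip pattern, under the
assumption of a flat 64 byte output buffer (the sk_ names are invented
for this example and the real code works on the paged ring buffer
instead): a record slot has a fixed reserved size, the copy may deliver
less, and the skip advances the write position so the next record
still starts where the reader expects it.

```c
#include <stddef.h>
#include <string.h>

struct sk_out {
	unsigned char buf[64];
	size_t pos;	/* next write offset */
};

/* Copy len bytes and advance; returns the bytes that did not fit. */
size_t sk_out_copy(struct sk_out *o, const void *src, size_t len)
{
	size_t room = sizeof(o->buf) - o->pos;
	size_t n = len < room ? len : room;

	memcpy(o->buf + o->pos, src, n);
	o->pos += n;
	return len - n;
}

/* Advance without writing; returns the bytes that could not be skipped. */
size_t sk_out_skip(struct sk_out *o, size_t len)
{
	size_t room = sizeof(o->buf) - o->pos;
	size_t n = len < room ? len : room;

	o->pos += n;
	return len - n;
}

/*
 * Fill a slot of 'reserved' bytes with 'got' bytes of data, then skip
 * whatever is left of the slot. Returns the offset following the slot.
 */
size_t sk_write_slot(struct sk_out *o, const void *data,
		     size_t got, size_t reserved)
{
	size_t rem;

	if (got > reserved)
		got = reserved;

	rem = sk_out_copy(o, data, got);
	sk_out_skip(o, reserved - got + rem);
	return o->pos;
}
```

The skipped bytes are left as they were, which matches the v5 note
about not zeroing the non retrieved part of the stack dump.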

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/perf_event.h  |    2 ++
 kernel/events/internal.h    |    4 ++++
 kernel/events/ring_buffer.c |    6 ++++++
 3 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 741e997..8960968 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1297,6 +1297,8 @@ extern int perf_output_begin(struct perf_output_handle *handle,
 extern void perf_output_end(struct perf_output_handle *handle);
 extern unsigned int perf_output_copy(struct perf_output_handle *handle,
 			     const void *buf, unsigned int len);
+extern unsigned int perf_output_skip(struct perf_output_handle *handle,
+				     unsigned int len);
 extern int perf_swevent_get_recursion_context(void);
 extern void perf_swevent_put_recursion_context(int rctx);
 extern void perf_event_enable(struct perf_event *event);
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 2206a0f..be1ef29 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -114,6 +114,10 @@ static inline int memcpy_common(void *dst, const void *src, size_t n)
 
 DEFINE_OUTPUT_COPY(__output_copy, memcpy_common)
 
+#define MEMCPY_SKIP(dst, src, n) (n)
+
+DEFINE_OUTPUT_COPY(__output_skip, MEMCPY_SKIP)
+
 #ifndef arch_perf_out_copy_user
 #define arch_perf_out_copy_user __copy_from_user_inatomic
 #endif
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index b4c2ad3..23cb34f 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -188,6 +188,12 @@ unsigned int perf_output_copy(struct perf_output_handle *handle,
 	return __output_copy(handle, buf, len);
 }
 
+unsigned int perf_output_skip(struct perf_output_handle *handle,
+			      unsigned int len)
+{
+	return __output_skip(handle, NULL, len);
+}
+
 void perf_output_end(struct perf_output_handle *handle)
 {
 	perf_output_put_handle(handle);
-- 
1.7.7.6



* [PATCH 06/19] perf: Add ability to attach user stack dump to sample
  2012-06-11 13:19 [RFCv5 00/19] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (4 preceding siblings ...)
  2012-06-11 13:20 ` [PATCH 05/19] perf: Add perf_output_skip function to skip bytes in sample Jiri Olsa
@ 2012-06-11 13:20 ` Jiri Olsa
  2012-06-11 13:20 ` [PATCH 07/19] perf: Add attribute to filter out callchains Jiri Olsa
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2012-06-11 13:20 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, robert.richter, fche,
	linux-kernel, masami.hiramatsu.pt, drepper, asharma,
	benjamin.redelings, Jiri Olsa

Introducing sample_stack_user value into the perf_event_attr
struct to define the size of the user stack dump to be attached
to the sample. The dump is triggered whenever sample_stack_user
is non-zero.

Being able to dump parts of the user stack, starting from the
stack pointer, will be useful for post mortem DWARF CFI
based stack unwinding.
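
For illustration, a consumer of this record could parse the dump along
these lines. This is a sketch assuming the layout written by this
patch (a u64 static size, the dump bytes, then a u64 dynamic size) and
host byte order; struct ustack_dump and parse_ustack are hypothetical
names, not part of the perf tool.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of one user stack dump inside a sample. */
struct ustack_dump {
	uint64_t size;		/* static size; 0 means nothing was dumped */
	const uint8_t *data;	/* points into the sample, size bytes long */
	uint64_t dyn_size;	/* leading bytes of data that are valid */
};

/*
 * Parse "u64 size; u8 data[size]; u64 dyn_size" from buf.
 * Returns the number of bytes consumed, or 0 if buf is malformed.
 */
static size_t parse_ustack(const uint8_t *buf, size_t len,
			   struct ustack_dump *out)
{
	size_t pos = 0;

	assert(buf != NULL && out != NULL);

	if (len < sizeof(uint64_t))
		return 0;
	memcpy(&out->size, buf, sizeof(uint64_t));
	pos += sizeof(uint64_t);

	out->data = NULL;
	out->dyn_size = 0;
	if (out->size == 0)
		return pos;	/* kernel thread case: only the size field */

	/* The dump and the trailing dynamic size must both fit. */
	if (out->size > len - pos ||
	    len - pos - out->size < sizeof(uint64_t))
		return 0;

	out->data = buf + pos;
	pos += out->size;
	memcpy(&out->dyn_size, buf + pos, sizeof(uint64_t));
	pos += sizeof(uint64_t);

	if (out->dyn_size > out->size)
		return 0;
	return pos;
}
```

Only the first dyn_size bytes of data are meaningful; the rest is the
skipped part of the reservation.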

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/perf_event.h |    5 +++
 kernel/events/core.c       |   66 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+), 0 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8960968..8db1655 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -278,6 +278,11 @@ struct perf_event_attr {
 	 * See asm/perf_regs.h for details.
 	 */
 	__u64	sample_regs_user;
+
+	/*
+	 * Defines size of the user stack to dump on samples.
+	 */
+	__u64	sample_stack_user;
 };
 
 /*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index e4df59d..c1062a1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3765,6 +3765,38 @@ perf_output_sample_regs(struct perf_output_handle *handle,
 	}
 }
 
+static void
+perf_output_sample_ustack(struct perf_output_handle *handle, u64 dump_size,
+			  struct pt_regs *regs)
+{
+	/* Case of a kernel thread, nothing to dump */
+	if (!regs) {
+		u64 size = 0;
+		perf_output_put(handle, size);
+	} else {
+		unsigned long sp;
+		unsigned int rem;
+		u64 dyn_size;
+
+		/*
+		 * Static size: we always dump the size
+		 * requested by the user because most of the
+		 * time, the top of the user stack is not
+		 * paged out.
+		 */
+		perf_output_put(handle, dump_size);
+
+		sp = user_stack_pointer(regs);
+		rem = __output_copy_user(handle, (void *) sp, dump_size);
+		dyn_size = dump_size - rem;
+
+		perf_output_skip(handle, rem);
+
+		/* Dynamic size: whole dump - padding. */
+		perf_output_put(handle, dyn_size);
+	}
+}
+
 static struct pt_regs *perf_sample_regs_user(struct pt_regs *regs)
 {
 	if (!user_mode(regs)) {
@@ -4054,6 +4086,11 @@ void perf_output_sample(struct perf_output_handle *handle,
 						mask);
 		}
 	}
+
+	if (event->attr.sample_stack_user)
+		perf_output_sample_ustack(handle,
+					  event->attr.sample_stack_user,
+					  data->regs_user);
 }
 
 void perf_prepare_sample(struct perf_event_header *header,
@@ -4118,6 +4155,31 @@ void perf_prepare_sample(struct perf_event_header *header,
 
 		header->size += size;
 	}
+
+	if (event->attr.sample_stack_user) {
+		/*
+		 * A first field that tells the _static_ size of the
+		 * dump. 0 if there is nothing to dump (ie: we are in
+		 * a kernel thread) otherwise the requested size.
+		 */
+		int size = sizeof(u64);
+
+		if (!data->regs_user)
+			data->regs_user = perf_sample_regs_user(regs);
+
+		/*
+		 * If there is something to dump, add space for the
+		 * dump itself and for the field that tells the
+		 * dynamic size, which is how many bytes have actually
+		 * been dumped.
+		 */
+		if (data->regs_user) {
+			size += event->attr.sample_stack_user;
+			size += sizeof(u64);
+		}
+
+		header->size += size;
+	}
 }
 
 static void perf_event_output(struct perf_event *event,
@@ -6171,6 +6233,10 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr,
 	if (attr->sample_regs_user)
 		ret = perf_reg_validate(attr->sample_regs_user);
 
+	if (attr->sample_stack_user)
+		attr->sample_stack_user = round_up(attr->sample_stack_user,
+						   sizeof(u64));
+
 out:
 	return ret;
 
-- 
1.7.7.6



* [PATCH 07/19] perf: Add attribute to filter out callchains
  2012-06-11 13:19 [RFCv5 00/19] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (5 preceding siblings ...)
  2012-06-11 13:20 ` [PATCH 06/19] perf: Add ability to attach user stack dump to sample Jiri Olsa
@ 2012-06-11 13:20 ` Jiri Olsa
  2012-06-11 13:20 ` [PATCH 08/19] perf, tool: Remove unused evsel parameter from machine__resolve_callchain Jiri Olsa
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2012-06-11 13:20 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, robert.richter, fche,
	linux-kernel, masami.hiramatsu.pt, drepper, asharma,
	benjamin.redelings, Jiri Olsa

Introducing new bits into the perf_event_attr struct:
  - exclude_callchain_kernel to filter out kernel callchain
  - exclude_callchain_user to filter out user callchain
from the sample dump.

We need to be able to disable standard user callchain dump
when we use the dwarf cfi callchain mode, because frame
pointer based user callchains are useless in this mode.

Also implementing exclude_callchain_kernel to have a complete
set of options.
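
The effect of the two bits can be modelled roughly as below. This is a
sketch of the selection logic only, assuming the behaviour of the
perf_callchain() change in this patch; struct toy_attr, the CHAIN_*
flags and callchain_parts are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHAIN_KERNEL 0x1	/* kernel callchain would be recorded */
#define CHAIN_USER   0x2	/* user callchain would be recorded */

/* Hypothetical stand-in for the two new perf_event_attr bits. */
struct toy_attr {
	unsigned int exclude_callchain_kernel : 1;
	unsigned int exclude_callchain_user   : 1;
};

/*
 * Which callchain parts end up in a sample.
 *   in_user_mode: the sample hit while running user code
 *   has_mm: the task has a user address space (false for kernel threads)
 */
static int callchain_parts(const struct toy_attr *attr,
			   bool in_user_mode, bool has_mm)
{
	int parts = 0;

	assert(attr != NULL);

	if (!attr->exclude_callchain_kernel && !in_user_mode)
		parts |= CHAIN_KERNEL;

	/* From kernel mode, user regs are reachable only if there is an mm. */
	if (!attr->exclude_callchain_user && (in_user_mode || has_mm))
		parts |= CHAIN_USER;

	return parts;
}
```

With both bits set nothing is collected, which matches the
"kernel || user" check added to perf_prepare_sample().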

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 include/linux/perf_event.h |    5 ++++-
 kernel/events/callchain.c  |   25 +++++++++++++++----------
 kernel/events/core.c       |    5 ++++-
 kernel/events/internal.h   |    3 ++-
 4 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8db1655..895fe7a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -255,7 +255,10 @@ struct perf_event_attr {
 				exclude_host   :  1, /* don't count in host   */
 				exclude_guest  :  1, /* don't count in guest  */
 
-				__reserved_1   : 43;
+				exclude_callchain_kernel : 1, /* exclude kernel callchains */
+				exclude_callchain_user   : 1, /* exclude user callchains */
+
+				__reserved_1   : 41;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index 6581a04..905ffb2 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -153,12 +153,12 @@ put_callchain_entry(int rctx)
 	put_recursion_context(__get_cpu_var(callchain_recursion), rctx);
 }
 
-struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
+struct perf_callchain_entry *perf_callchain(struct pt_regs *regs,
+					    int kernel, int user)
 {
 	int rctx;
 	struct perf_callchain_entry *entry;
 
-
 	entry = get_callchain_entry(&rctx);
 	if (rctx == -1)
 		return NULL;
@@ -168,18 +168,23 @@ struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
 
 	entry->nr = 0;
 
-	if (!user_mode(regs)) {
+	if (kernel && !user_mode(regs)) {
 		perf_callchain_store(entry, PERF_CONTEXT_KERNEL);
 		perf_callchain_kernel(entry, regs);
-		if (current->mm)
-			regs = task_pt_regs(current);
-		else
-			regs = NULL;
 	}
 
-	if (regs) {
-		perf_callchain_store(entry, PERF_CONTEXT_USER);
-		perf_callchain_user(entry, regs);
+	if (user) {
+		if (!user_mode(regs)) {
+			if  (current->mm)
+				regs = task_pt_regs(current);
+			else
+				regs = NULL;
+		}
+
+		if (regs) {
+			perf_callchain_store(entry, PERF_CONTEXT_USER);
+			perf_callchain_user(entry, regs);
+		}
 	}
 
 exit_put:
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c1062a1..5ce9e75 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4113,8 +4113,11 @@ void perf_prepare_sample(struct perf_event_header *header,
 
 	if (sample_type & PERF_SAMPLE_CALLCHAIN) {
 		int size = 1;
+		int kernel = !event->attr.exclude_callchain_kernel;
+		int user   = !event->attr.exclude_callchain_user;
 
-		data->callchain = perf_callchain(regs);
+		if (kernel || user)
+			data->callchain = perf_callchain(regs, kernel, user);
 
 		if (data->callchain)
 			size += data->callchain->nr;
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index be1ef29..abcc03d 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -125,7 +125,8 @@ DEFINE_OUTPUT_COPY(__output_skip, MEMCPY_SKIP)
 DEFINE_OUTPUT_COPY(__output_copy_user, arch_perf_out_copy_user)
 
 /* Callchain handling */
-extern struct perf_callchain_entry *perf_callchain(struct pt_regs *regs);
+extern struct perf_callchain_entry *perf_callchain(struct pt_regs *regs,
+						   int kernel, int user);
 extern int get_callchain_buffers(void);
 extern void put_callchain_buffers(void);
 
-- 
1.7.7.6



* [PATCH 08/19] perf, tool: Remove unused evsel parameter from machine__resolve_callchain
  2012-06-11 13:19 [RFCv5 00/19] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (6 preceding siblings ...)
  2012-06-11 13:20 ` [PATCH 07/19] perf: Add attribute to filter out callchains Jiri Olsa
@ 2012-06-11 13:20 ` Jiri Olsa
  2012-06-20 16:59   ` [tip:perf/core] perf tools: Remove unused " tip-bot for Jiri Olsa
  2012-06-11 13:20 ` [PATCH 09/19] perf, tool: Factor DSO symtab types to generic binary types Jiri Olsa
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 37+ messages in thread
From: Jiri Olsa @ 2012-06-11 13:20 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, robert.richter, fche,
	linux-kernel, masami.hiramatsu.pt, drepper, asharma,
	benjamin.redelings, Jiri Olsa

Removing the unused evsel parameter from the machine__resolve_callchain
function, plus the related header file and caller changes.

The evsel parameter has been unused since the following commit:
  perf callchain: Make callchain cursors TLS
  commit 472606458f3e1ced5fe3cc5f04e90a6b5a4732cf
  Author: Namhyung Kim <namhyung.kim@lge.com>
  Date:   Thu May 31 14:43:26 2012 +0900

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/builtin-report.c |    4 ++--
 tools/perf/builtin-script.c |    4 ++--
 tools/perf/builtin-top.c    |    2 +-
 tools/perf/util/map.h       |    2 +-
 tools/perf/util/session.c   |    7 +++----
 tools/perf/util/session.h   |    4 ++--
 6 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 25249f7..d20ef95 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -69,7 +69,7 @@ static int perf_report__add_branch_hist_entry(struct perf_tool *tool,
 
 	if ((sort__has_parent || symbol_conf.use_callchain)
 	    && sample->callchain) {
-		err = machine__resolve_callchain(machine, evsel, al->thread,
+		err = machine__resolve_callchain(machine, al->thread,
 						 sample->callchain, &parent);
 		if (err)
 			return err;
@@ -140,7 +140,7 @@ static int perf_evsel__add_hist_entry(struct perf_evsel *evsel,
 	struct hist_entry *he;
 
 	if ((sort__has_parent || symbol_conf.use_callchain) && sample->callchain) {
-		err = machine__resolve_callchain(machine, evsel, al->thread,
+		err = machine__resolve_callchain(machine, al->thread,
 						 sample->callchain, &parent);
 		if (err)
 			return err;
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 8e395a5..05aa2bb 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -387,7 +387,7 @@ static void print_sample_bts(union perf_event *event,
 			printf(" ");
 		else
 			printf("\n");
-		perf_event__print_ip(event, sample, machine, evsel,
+		perf_event__print_ip(event, sample, machine,
 				     PRINT_FIELD(SYM), PRINT_FIELD(DSO),
 				     PRINT_FIELD(SYMOFFSET));
 	}
@@ -431,7 +431,7 @@ static void process_event(union perf_event *event __unused,
 			printf(" ");
 		else
 			printf("\n");
-		perf_event__print_ip(event, sample, machine, evsel,
+		perf_event__print_ip(event, sample, machine,
 				     PRINT_FIELD(SYM), PRINT_FIELD(DSO),
 				     PRINT_FIELD(SYMOFFSET));
 	}
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 6bb0277..79cabe4 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -774,7 +774,7 @@ static void perf_event__process_sample(struct perf_tool *tool,
 
 		if ((sort__has_parent || symbol_conf.use_callchain) &&
 		    sample->callchain) {
-			err = machine__resolve_callchain(machine, evsel, al.thread,
+			err = machine__resolve_callchain(machine, al.thread,
 							 sample->callchain, &parent);
 			if (err)
 				return;
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index 81371ba..c14c665 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -157,7 +157,7 @@ void machine__exit(struct machine *self);
 void machine__delete(struct machine *self);
 
 int machine__resolve_callchain(struct machine *machine,
-			       struct perf_evsel *evsel, struct thread *thread,
+			       struct thread *thread,
 			       struct ip_callchain *chain,
 			       struct symbol **parent);
 int maps__set_kallsyms_ref_reloc_sym(struct map **maps, const char *symbol_name,
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 2600916..2785ce8 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -289,7 +289,6 @@ struct branch_info *machine__resolve_bstack(struct machine *self,
 }
 
 int machine__resolve_callchain(struct machine *self,
-			       struct perf_evsel *evsel __used,
 			       struct thread *thread,
 			       struct ip_callchain *chain,
 			       struct symbol **parent)
@@ -1480,8 +1479,8 @@ struct perf_evsel *perf_session__find_first_evtype(struct perf_session *session,
 }
 
 void perf_event__print_ip(union perf_event *event, struct perf_sample *sample,
-			  struct machine *machine, struct perf_evsel *evsel,
-			  int print_sym, int print_dso, int print_symoffset)
+			  struct machine *machine, int print_sym,
+			  int print_dso, int print_symoffset)
 {
 	struct addr_location al;
 	struct callchain_cursor_node *node;
@@ -1495,7 +1494,7 @@ void perf_event__print_ip(union perf_event *event, struct perf_sample *sample,
 
 	if (symbol_conf.use_callchain && sample->callchain) {
 
-		if (machine__resolve_callchain(machine, evsel, al.thread,
+		if (machine__resolve_callchain(machine, al.thread,
 						sample->callchain, NULL) != 0) {
 			if (verbose)
 				error("Failed to resolve callchain. Skipping\n");
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 7a5434c..877d781 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -150,8 +150,8 @@ struct perf_evsel *perf_session__find_first_evtype(struct perf_session *session,
 					    unsigned int type);
 
 void perf_event__print_ip(union perf_event *event, struct perf_sample *sample,
-			  struct machine *machine, struct perf_evsel *evsel,
-			  int print_sym, int print_dso, int print_symoffset);
+			  struct machine *machine, int print_sym,
+			  int print_dso, int print_symoffset);
 
 int perf_session__cpu_bitmap(struct perf_session *session,
 			     const char *cpu_list, unsigned long *cpu_bitmap);
-- 
1.7.7.6



* [PATCH 09/19] perf, tool: Factor DSO symtab types to generic binary types
  2012-06-11 13:19 [RFCv5 00/19] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (7 preceding siblings ...)
  2012-06-11 13:20 ` [PATCH 08/19] perf, tool: Remove unused evsel parameter from machine__resolve_callchain Jiri Olsa
@ 2012-06-11 13:20 ` Jiri Olsa
  2012-06-11 13:20 ` [PATCH 10/19] perf, tool: Add interface to read DSO image data Jiri Olsa
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2012-06-11 13:20 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, robert.richter, fche,
	linux-kernel, masami.hiramatsu.pt, drepper, asharma,
	benjamin.redelings, Jiri Olsa

Adding an interface to access DSOs so it can be used
from other places.

A new DSO binary type is added, making the current SYMTAB__*
types more general:
   DSO_BINARY_TYPE__* = SYMTAB__*

The following function is added to return the path based on the
specified binary type:
   dso__binary_type_file
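
As a rough illustration of the type-to-path mapping, a stand-alone
version could look like the sketch below. It assumes a reduced set of
types and plain string arguments instead of struct dso; enum
toy_binary_type and toy_binary_type_file are hypothetical names, while
the path formats are the ones used in this patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of the binary types, for illustration only. */
enum toy_binary_type {
	TOY_FEDORA_DEBUGINFO,
	TOY_UBUNTU_DEBUGINFO,
	TOY_BUILDID_DEBUGINFO,
	TOY_SYSTEM_PATH_DSO,
	TOY_NOT_FOUND,
};

/*
 * Build the candidate file name for one binary type.
 * build_id_hex may be NULL when the DSO has no build-id.
 * Returns 0 on success, -1 if no path exists for this type.
 */
static int toy_binary_type_file(enum toy_binary_type type, const char *symfs,
				const char *long_name,
				const char *build_id_hex,
				char *file, size_t size)
{
	assert(file != NULL && size > 0);

	switch (type) {
	case TOY_FEDORA_DEBUGINFO:
		snprintf(file, size, "%s/usr/lib/debug%s.debug",
			 symfs, long_name);
		return 0;
	case TOY_UBUNTU_DEBUGINFO:
		snprintf(file, size, "%s/usr/lib/debug%s", symfs, long_name);
		return 0;
	case TOY_BUILDID_DEBUGINFO:
		if (!build_id_hex || strlen(build_id_hex) < 3)
			return -1;
		/* The first two hex digits name the directory. */
		snprintf(file, size,
			 "%s/usr/lib/debug/.build-id/%.2s/%s.debug",
			 symfs, build_id_hex, build_id_hex + 2);
		return 0;
	case TOY_SYSTEM_PATH_DSO:
		snprintf(file, size, "%s%s", symfs, long_name);
		return 0;
	default:
		return -1;
	}
}
```

A caller can walk an ordered table of types and try to open each name
for which the function returns 0.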

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/builtin-top.c   |    2 +-
 tools/perf/util/annotate.c |    2 +-
 tools/perf/util/symbol.c   |  181 +++++++++++++++++++++++++++-----------------
 tools/perf/util/symbol.h   |   32 ++++----
 4 files changed, 129 insertions(+), 88 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 79cabe4..246081a 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -125,7 +125,7 @@ static int perf_top__parse_source(struct perf_top *top, struct hist_entry *he)
 	/*
 	 * We can't annotate with just /proc/kallsyms
 	 */
-	if (map->dso->symtab_type == SYMTAB__KALLSYMS) {
+	if (map->dso->symtab_type == DSO_BINARY_TYPE__KALLSYMS) {
 		pr_err("Can't annotate %s: No vmlinux file was found in the "
 		       "path\n", sym->name);
 		sleep(1);
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 8069dfb..7d3641f 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -777,7 +777,7 @@ fallback:
 		free_filename = false;
 	}
 
-	if (dso->symtab_type == SYMTAB__KALLSYMS) {
+	if (dso->symtab_type == DSO_BINARY_TYPE__KALLSYMS) {
 		char bf[BUILD_ID_SIZE * 2 + 16] = " with build id ";
 		char *build_id_msg = NULL;
 
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 3e2e5ea..b6c456e 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -48,6 +48,22 @@ struct symbol_conf symbol_conf = {
 	.symfs            = "",
 };
 
+static enum dso_binary_type binary_type_symtab[] = {
+	DSO_BINARY_TYPE__KALLSYMS,
+	DSO_BINARY_TYPE__GUEST_KALLSYMS,
+	DSO_BINARY_TYPE__JAVA_JIT,
+	DSO_BINARY_TYPE__BUILD_ID_CACHE,
+	DSO_BINARY_TYPE__FEDORA_DEBUGINFO,
+	DSO_BINARY_TYPE__UBUNTU_DEBUGINFO,
+	DSO_BINARY_TYPE__BUILDID_DEBUGINFO,
+	DSO_BINARY_TYPE__SYSTEM_PATH_DSO,
+	DSO_BINARY_TYPE__GUEST_KMODULE,
+	DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE,
+	DSO_BINARY_TYPE__NOT_FOUND,
+};
+
+#define DSO_BINARY_TYPE__SYMTAB_CNT ARRAY_SIZE(binary_type_symtab)
+
 int dso__name_len(const struct dso *dso)
 {
 	if (!dso)
@@ -318,7 +334,7 @@ struct dso *dso__new(const char *name)
 		dso__set_short_name(dso, dso->name);
 		for (i = 0; i < MAP__NR_TYPES; ++i)
 			dso->symbols[i] = dso->symbol_names[i] = RB_ROOT;
-		dso->symtab_type = SYMTAB__NOT_FOUND;
+		dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
 		dso->loaded = 0;
 		dso->sorted_by_name = 0;
 		dso->has_build_id = 0;
@@ -806,9 +822,9 @@ int dso__load_kallsyms(struct dso *dso, const char *filename,
 	symbols__fixup_end(&dso->symbols[map->type]);
 
 	if (dso->kernel == DSO_TYPE_GUEST_KERNEL)
-		dso->symtab_type = SYMTAB__GUEST_KALLSYMS;
+		dso->symtab_type = DSO_BINARY_TYPE__GUEST_KALLSYMS;
 	else
-		dso->symtab_type = SYMTAB__KALLSYMS;
+		dso->symtab_type = DSO_BINARY_TYPE__KALLSYMS;
 
 	return dso__split_kallsyms(dso, map, filter);
 }
@@ -1593,31 +1609,96 @@ out:
 char dso__symtab_origin(const struct dso *dso)
 {
 	static const char origin[] = {
-		[SYMTAB__KALLSYMS]	      = 'k',
-		[SYMTAB__JAVA_JIT]	      = 'j',
-		[SYMTAB__BUILD_ID_CACHE]      = 'B',
-		[SYMTAB__FEDORA_DEBUGINFO]    = 'f',
-		[SYMTAB__UBUNTU_DEBUGINFO]    = 'u',
-		[SYMTAB__BUILDID_DEBUGINFO]   = 'b',
-		[SYMTAB__SYSTEM_PATH_DSO]     = 'd',
-		[SYMTAB__SYSTEM_PATH_KMODULE] = 'K',
-		[SYMTAB__GUEST_KALLSYMS]      =  'g',
-		[SYMTAB__GUEST_KMODULE]	      =  'G',
+		[DSO_BINARY_TYPE__KALLSYMS]		= 'k',
+		[DSO_BINARY_TYPE__JAVA_JIT]		= 'j',
+		[DSO_BINARY_TYPE__BUILD_ID_CACHE]	= 'B',
+		[DSO_BINARY_TYPE__FEDORA_DEBUGINFO]	= 'f',
+		[DSO_BINARY_TYPE__UBUNTU_DEBUGINFO]	= 'u',
+		[DSO_BINARY_TYPE__BUILDID_DEBUGINFO]	= 'b',
+		[DSO_BINARY_TYPE__SYSTEM_PATH_DSO]	= 'd',
+		[DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE]	= 'K',
+		[DSO_BINARY_TYPE__GUEST_KALLSYMS]	= 'g',
+		[DSO_BINARY_TYPE__GUEST_KMODULE]	= 'G',
 	};
 
-	if (dso == NULL || dso->symtab_type == SYMTAB__NOT_FOUND)
+	if (dso == NULL || dso->symtab_type == DSO_BINARY_TYPE__NOT_FOUND)
 		return '!';
 	return origin[dso->symtab_type];
 }
 
+int dso__binary_type_file(struct dso *dso, enum dso_binary_type type,
+			  char *root_dir, char *file, size_t size)
+{
+	char build_id_hex[BUILD_ID_SIZE * 2 + 1];
+	int ret = 0;
+
+	switch (type) {
+	case DSO_BINARY_TYPE__BUILD_ID_CACHE:
+		/* skip the locally configured cache if a symfs is given */
+		if (symbol_conf.symfs[0] ||
+		    (dso__build_id_filename(dso, file, size) == NULL))
+			ret = -1;
+		break;
+
+	case DSO_BINARY_TYPE__FEDORA_DEBUGINFO:
+		snprintf(file, size, "%s/usr/lib/debug%s.debug",
+			 symbol_conf.symfs, dso->long_name);
+		break;
+
+	case DSO_BINARY_TYPE__UBUNTU_DEBUGINFO:
+		snprintf(file, size, "%s/usr/lib/debug%s",
+			 symbol_conf.symfs, dso->long_name);
+		break;
+
+	case DSO_BINARY_TYPE__BUILDID_DEBUGINFO:
+		if (!dso->has_build_id) {
+			ret = -1;
+			break;
+		}
+
+		build_id__sprintf(dso->build_id,
+				  sizeof(dso->build_id),
+				  build_id_hex);
+		snprintf(file, size,
+			 "%s/usr/lib/debug/.build-id/%.2s/%s.debug",
+			 symbol_conf.symfs, build_id_hex, build_id_hex + 2);
+		break;
+
+	case DSO_BINARY_TYPE__SYSTEM_PATH_DSO:
+		snprintf(file, size, "%s%s",
+			 symbol_conf.symfs, dso->long_name);
+		break;
+
+	case DSO_BINARY_TYPE__GUEST_KMODULE:
+		snprintf(file, size, "%s%s%s", symbol_conf.symfs,
+			 root_dir, dso->long_name);
+		break;
+
+	case DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE:
+		snprintf(file, size, "%s%s", symbol_conf.symfs,
+			 dso->long_name);
+		break;
+
+	default:
+	case DSO_BINARY_TYPE__KALLSYMS:
+	case DSO_BINARY_TYPE__GUEST_KALLSYMS:
+	case DSO_BINARY_TYPE__JAVA_JIT:
+	case DSO_BINARY_TYPE__NOT_FOUND:
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
 int dso__load(struct dso *dso, struct map *map, symbol_filter_t filter)
 {
-	int size = PATH_MAX;
 	char *name;
 	int ret = -1;
 	int fd;
+	u_int i;
 	struct machine *machine;
-	const char *root_dir;
+	char *root_dir = (char *) "";
 	int want_symtab;
 
 	dso__set_loaded(dso, map->type);
@@ -1632,7 +1713,7 @@ int dso__load(struct dso *dso, struct map *map, symbol_filter_t filter)
 	else
 		machine = NULL;
 
-	name = malloc(size);
+	name = malloc(PATH_MAX);
 	if (!name)
 		return -1;
 
@@ -1651,69 +1732,27 @@ int dso__load(struct dso *dso, struct map *map, symbol_filter_t filter)
 		}
 
 		ret = dso__load_perf_map(dso, map, filter);
-		dso->symtab_type = ret > 0 ? SYMTAB__JAVA_JIT :
-					      SYMTAB__NOT_FOUND;
+		dso->symtab_type = ret > 0 ? DSO_BINARY_TYPE__JAVA_JIT :
+					     DSO_BINARY_TYPE__NOT_FOUND;
 		return ret;
 	}
 
+	if (machine)
+		root_dir = machine->root_dir;
+
 	/* Iterate over candidate debug images.
 	 * On the first pass, only load images if they have a full symtab.
 	 * Failing that, do a second pass where we accept .dynsym also
 	 */
 	want_symtab = 1;
 restart:
-	for (dso->symtab_type = SYMTAB__BUILD_ID_CACHE;
-	     dso->symtab_type != SYMTAB__NOT_FOUND;
-	     dso->symtab_type++) {
-		switch (dso->symtab_type) {
-		case SYMTAB__BUILD_ID_CACHE:
-			/* skip the locally configured cache if a symfs is given */
-			if (symbol_conf.symfs[0] ||
-			    (dso__build_id_filename(dso, name, size) == NULL)) {
-				continue;
-			}
-			break;
-		case SYMTAB__FEDORA_DEBUGINFO:
-			snprintf(name, size, "%s/usr/lib/debug%s.debug",
-				 symbol_conf.symfs, dso->long_name);
-			break;
-		case SYMTAB__UBUNTU_DEBUGINFO:
-			snprintf(name, size, "%s/usr/lib/debug%s",
-				 symbol_conf.symfs, dso->long_name);
-			break;
-		case SYMTAB__BUILDID_DEBUGINFO: {
-			char build_id_hex[BUILD_ID_SIZE * 2 + 1];
+	for (i = 0; i < DSO_BINARY_TYPE__SYMTAB_CNT; i++) {
 
-			if (!dso->has_build_id)
-				continue;
+		dso->symtab_type = binary_type_symtab[i];
 
-			build_id__sprintf(dso->build_id,
-					  sizeof(dso->build_id),
-					  build_id_hex);
-			snprintf(name, size,
-				 "%s/usr/lib/debug/.build-id/%.2s/%s.debug",
-				 symbol_conf.symfs, build_id_hex, build_id_hex + 2);
-			}
-			break;
-		case SYMTAB__SYSTEM_PATH_DSO:
-			snprintf(name, size, "%s%s",
-			     symbol_conf.symfs, dso->long_name);
-			break;
-		case SYMTAB__GUEST_KMODULE:
-			if (map->groups && machine)
-				root_dir = machine->root_dir;
-			else
-				root_dir = "";
-			snprintf(name, size, "%s%s%s", symbol_conf.symfs,
-				 root_dir, dso->long_name);
-			break;
-
-		case SYMTAB__SYSTEM_PATH_KMODULE:
-			snprintf(name, size, "%s%s", symbol_conf.symfs,
-				 dso->long_name);
-			break;
-		default:;
-		}
+		if (dso__binary_type_file(dso, dso->symtab_type,
+					  root_dir, name, PATH_MAX))
+			continue;
 
 		/* Name is now the name of the next image to try */
 		fd = open(name, O_RDONLY);
@@ -1930,9 +1969,9 @@ struct map *machine__new_module(struct machine *machine, u64 start,
 		return NULL;
 
 	if (machine__is_host(machine))
-		dso->symtab_type = SYMTAB__SYSTEM_PATH_KMODULE;
+		dso->symtab_type = DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE;
 	else
-		dso->symtab_type = SYMTAB__GUEST_KMODULE;
+		dso->symtab_type = DSO_BINARY_TYPE__GUEST_KMODULE;
 	map_groups__insert(&machine->kmaps, map);
 	return map;
 }
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index af0752b..0dc30f9 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -155,6 +155,20 @@ struct addr_location {
 	s32	      cpu;
 };
 
+enum dso_binary_type {
+	DSO_BINARY_TYPE__KALLSYMS = 0,
+	DSO_BINARY_TYPE__GUEST_KALLSYMS,
+	DSO_BINARY_TYPE__JAVA_JIT,
+	DSO_BINARY_TYPE__BUILD_ID_CACHE,
+	DSO_BINARY_TYPE__FEDORA_DEBUGINFO,
+	DSO_BINARY_TYPE__UBUNTU_DEBUGINFO,
+	DSO_BINARY_TYPE__BUILDID_DEBUGINFO,
+	DSO_BINARY_TYPE__SYSTEM_PATH_DSO,
+	DSO_BINARY_TYPE__GUEST_KMODULE,
+	DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE,
+	DSO_BINARY_TYPE__NOT_FOUND,
+};
+
 enum dso_kernel_type {
 	DSO_TYPE_USER = 0,
 	DSO_TYPE_KERNEL,
@@ -173,13 +187,13 @@ struct dso {
 	struct rb_root	 symbol_names[MAP__NR_TYPES];
 	enum dso_kernel_type	kernel;
 	enum dso_swap_type	needs_swap;
+	enum dso_binary_type	symtab_type;
 	u8		 adjust_symbols:1;
 	u8		 has_build_id:1;
 	u8		 hit:1;
 	u8		 annotate_warned:1;
 	u8		 sname_alloc:1;
 	u8		 lname_alloc:1;
-	unsigned char	 symtab_type;
 	u8		 sorted_by_name;
 	u8		 loaded;
 	u8		 build_id[BUILD_ID_SIZE];
@@ -253,20 +267,6 @@ size_t dso__fprintf_symbols_by_name(struct dso *dso,
 				    enum map_type type, FILE *fp);
 size_t dso__fprintf(struct dso *dso, enum map_type type, FILE *fp);
 
-enum symtab_type {
-	SYMTAB__KALLSYMS = 0,
-	SYMTAB__GUEST_KALLSYMS,
-	SYMTAB__JAVA_JIT,
-	SYMTAB__BUILD_ID_CACHE,
-	SYMTAB__FEDORA_DEBUGINFO,
-	SYMTAB__UBUNTU_DEBUGINFO,
-	SYMTAB__BUILDID_DEBUGINFO,
-	SYMTAB__SYSTEM_PATH_DSO,
-	SYMTAB__GUEST_KMODULE,
-	SYMTAB__SYSTEM_PATH_KMODULE,
-	SYMTAB__NOT_FOUND,
-};
-
 char dso__symtab_origin(const struct dso *dso);
 void dso__set_long_name(struct dso *dso, char *name);
 void dso__set_build_id(struct dso *dso, void *build_id);
@@ -303,4 +303,6 @@ bool symbol_type__is_a(char symbol_type, enum map_type map_type);
 
 size_t machine__fprintf_vmlinux_path(struct machine *machine, FILE *fp);
 
+int dso__binary_type_file(struct dso *dso, enum dso_binary_type type,
+			  char *root_dir, char *file, size_t size);
 #endif /* __PERF_SYMBOL */
-- 
1.7.7.6



* [PATCH 10/19] perf, tool: Add interface to read DSO image data
  2012-06-11 13:19 [RFCv5 00/19] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (8 preceding siblings ...)
  2012-06-11 13:20 ` [PATCH 09/19] perf, tool: Factor DSO symtab types to generic binary types Jiri Olsa
@ 2012-06-11 13:20 ` Jiri Olsa
  2012-06-11 13:20 ` [PATCH 11/19] perf, tool: Add '.note' check into search for NOTE section Jiri Olsa
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2012-06-11 13:20 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, robert.richter, fche,
	linux-kernel, masami.hiramatsu.pt, drepper, asharma,
	benjamin.redelings, Jiri Olsa

Adding the following interface for the DSO object to allow
reading of DSO image data:

  dso__data_fd
    - opens the DSO and returns a file descriptor
      Binary types are used to locate/open the DSO in the following order:
        DSO_BINARY_TYPE__BUILD_ID_CACHE
        DSO_BINARY_TYPE__SYSTEM_PATH_DSO
      In other words, we first try to open the DSO build-id path,
      and if that fails we try to open the DSO system path.

  dso__data_read_offset
    - reads DSO data from the specified offset

  dso__data_read_addr
    - reads DSO data from the specified address/map.
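
The lookup order and the offset read described above can be sketched in
plain POSIX C as follows. This is an illustration under assumptions:
open_first and read_at are hypothetical helpers that take a list of
candidate paths instead of a struct dso, and no caching is modelled.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Open the first candidate path that can be opened; returns fd or -1. */
static int open_first(const char *const *paths, size_t n)
{
	size_t i;

	assert(paths != NULL);

	for (i = 0; i < n; i++) {
		int fd = open(paths[i], O_RDONLY);

		if (fd >= 0)
			return fd;
	}
	return -1;
}

/* Read up to size bytes at offset; returns bytes read or -1 on error. */
static ssize_t read_at(const char *const *paths, size_t n,
		       off_t offset, void *data, size_t size)
{
	ssize_t rsize = -1;
	int fd = open_first(paths, n);

	if (fd < 0)
		return -1;

	if (lseek(fd, offset, SEEK_SET) != (off_t) -1)
		rsize = read(fd, data, size);

	close(fd);	/* closed on every path, success or failure */
	return rsize;
}
```

Listing the build-id cache path before the system path in paths gives
the same preference order as dso__data_fd.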

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/util/symbol.c |  107 ++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/symbol.h |    8 +++
 2 files changed, 115 insertions(+), 0 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index b6c456e..425bea5 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -64,6 +64,14 @@ static enum dso_binary_type binary_type_symtab[] = {
 
 #define DSO_BINARY_TYPE__SYMTAB_CNT ARRAY_SIZE(binary_type_symtab)
 
+static enum dso_binary_type binary_type_data[] = {
+	DSO_BINARY_TYPE__BUILD_ID_CACHE,
+	DSO_BINARY_TYPE__SYSTEM_PATH_DSO,
+	DSO_BINARY_TYPE__NOT_FOUND,
+};
+
+#define DSO_BINARY_TYPE__DATA_CNT ARRAY_SIZE(binary_type_data)
+
 int dso__name_len(const struct dso *dso)
 {
 	if (!dso)
@@ -335,6 +343,7 @@ struct dso *dso__new(const char *name)
 		for (i = 0; i < MAP__NR_TYPES; ++i)
 			dso->symbols[i] = dso->symbol_names[i] = RB_ROOT;
 		dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
+		dso->data_type   = DSO_BINARY_TYPE__NOT_FOUND;
 		dso->loaded = 0;
 		dso->sorted_by_name = 0;
 		dso->has_build_id = 0;
@@ -2864,3 +2873,101 @@ struct map *dso__new_map(const char *name)
 
 	return map;
 }
+
+static int open_dso(struct dso *dso, struct machine *machine)
+{
+	char *root_dir = (char *) "";
+	char *name;
+	int fd;
+
+	name = malloc(PATH_MAX);
+	if (!name)
+		return -ENOMEM;
+
+	if (machine)
+		root_dir = machine->root_dir;
+
+	fd = -EINVAL;
+	if (!dso__binary_type_file(dso, dso->data_type,
+				   root_dir, name, PATH_MAX))
+		fd = open(name, O_RDONLY);
+
+	free(name);
+	return fd;
+}
+
+int dso__data_fd(struct dso *dso, struct machine *machine)
+{
+	int i = 0;
+
+	if (dso->data_type != DSO_BINARY_TYPE__NOT_FOUND)
+		return open_dso(dso, machine);
+
+	do {
+		int fd;
+
+		dso->data_type = binary_type_data[i++];
+
+		fd = open_dso(dso, machine);
+		if (fd >= 0)
+			return fd;
+
+	} while (dso->data_type != DSO_BINARY_TYPE__NOT_FOUND);
+
+	return -EINVAL;
+}
+
+static ssize_t dso_cache_read(struct dso *dso __used, u64 offset __used,
+			      u8 *data __used, ssize_t size __used)
+{
+	return -EINVAL;
+}
+
+static int dso_cache_add(struct dso *dso __used, u64 offset __used,
+			 u8 *data __used, ssize_t size __used)
+{
+	return 0;
+}
+
+static ssize_t read_dso_data(struct dso *dso, struct machine *machine,
+		     u64 offset, u8 *data, ssize_t size)
+{
+	ssize_t rsize = -1;
+	int fd;
+
+	fd = dso__data_fd(dso, machine);
+	if (fd < 0)
+		return -1;
+
+	do {
+		if (-1 == lseek(fd, offset, SEEK_SET))
+			break;
+
+		rsize = read(fd, data, size);
+		if (-1 == rsize)
+			break;
+
+		if (dso_cache_add(dso, offset, data, size))
+			pr_err("Failed to add data into dso cache.");
+
+	} while (0);
+
+	close(fd);
+	return rsize;
+}
+
+ssize_t dso__data_read_offset(struct dso *dso, struct machine *machine,
+			      u64 offset, u8 *data, ssize_t size)
+{
+	if (dso_cache_read(dso, offset, data, size))
+		return read_dso_data(dso, machine, offset, data, size);
+	return 0;
+}
+
+ssize_t dso__data_read_addr(struct dso *dso, struct map *map,
+			    struct machine *machine, u64 addr,
+			    u8 *data, ssize_t size)
+{
+	u64 offset = map->map_ip(map, addr);
+	return dso__data_read_offset(dso, machine, offset, data, size);
+}
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 0dc30f9..77044cb 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -188,6 +188,7 @@ struct dso {
 	enum dso_kernel_type	kernel;
 	enum dso_swap_type	needs_swap;
 	enum dso_binary_type	symtab_type;
+	enum dso_binary_type	data_type;
 	u8		 adjust_symbols:1;
 	u8		 has_build_id:1;
 	u8		 hit:1;
@@ -305,4 +306,11 @@ size_t machine__fprintf_vmlinux_path(struct machine *machine, FILE *fp);
 
 int dso__binary_type_file(struct dso *dso, enum dso_binary_type type,
 			  char *root_dir, char *file, size_t size);
+
+int dso__data_fd(struct dso *dso, struct machine *machine);
+ssize_t dso__data_read_offset(struct dso *dso, struct machine *machine,
+			      u64 offset, u8 *data, ssize_t size);
+ssize_t dso__data_read_addr(struct dso *dso, struct map *map,
+			    struct machine *machine, u64 addr,
+			    u8 *data, ssize_t size);
 #endif /* __PERF_SYMBOL */
-- 
1.7.7.6



* [PATCH 11/19] perf, tool: Add '.note' check into search for NOTE section
  2012-06-11 13:19 [RFCv5 00/19] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (9 preceding siblings ...)
  2012-06-11 13:20 ` [PATCH 10/19] perf, tool: Add interface to read DSO image data Jiri Olsa
@ 2012-06-11 13:20 ` Jiri Olsa
  2012-06-11 13:20 ` [PATCH 12/19] perf, tool: Back [vdso] DSO with real data Jiri Olsa
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2012-06-11 13:20 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, robert.richter, fche,
	linux-kernel, masami.hiramatsu.pt, drepper, asharma,
	benjamin.redelings, Jiri Olsa

Adding the '.note' section name to be checked when looking for the
notes section. The '.note' name is used by the kernel VDSO.
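
A minimal sketch of the lookup order, assuming the section names found
in the file are already available as a plain array. The helper
pick_note_section is hypothetical and only stands in for the repeated
elf_section_by_name calls in the diff below:

```c
#include <stddef.h>
#include <string.h>

/* Candidate note section names, in the order the patch checks them. */
static const char *const note_sections[] = {
	".note.gnu.build-id",
	".notes",
	".note",		/* used by the kernel VDSO */
};

/*
 * Hypothetical stand-in for the ELF lookup: 'present' is a plain array
 * of section names found in the file. Returns the first candidate that
 * is present, or NULL when none is.
 */
static const char *pick_note_section(const char *const *present, size_t n)
{
	size_t i, j;

	for (i = 0; i < sizeof(note_sections) / sizeof(note_sections[0]); i++)
		for (j = 0; j < n; j++)
			if (!strcmp(note_sections[i], present[j]))
				return note_sections[i];

	return NULL;
}
```

Keeping the names in a table means a further fallback name is a
one-line addition.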

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/util/symbol.c |   29 +++++++++++++++++++++++------
 1 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 425bea5..59012fc 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1503,14 +1503,31 @@ static int elf_read_build_id(Elf *elf, void *bf, size_t size)
 		goto out;
 	}
 
-	sec = elf_section_by_name(elf, &ehdr, &shdr,
-				  ".note.gnu.build-id", NULL);
-	if (sec == NULL) {
+	/*
+	 * Check following sections for notes:
+	 *   '.note.gnu.build-id'
+	 *   '.notes'
+	 *   '.note' (VDSO specific)
+	 */
+	do {
+		sec = elf_section_by_name(elf, &ehdr, &shdr,
+					  ".note.gnu.build-id", NULL);
+		if (sec)
+			break;
+
 		sec = elf_section_by_name(elf, &ehdr, &shdr,
 					  ".notes", NULL);
-		if (sec == NULL)
-			goto out;
-	}
+		if (sec)
+			break;
+
+		sec = elf_section_by_name(elf, &ehdr, &shdr,
+					  ".note", NULL);
+		if (sec)
+			break;
+
+		return err;
+
+	} while (0);
 
 	data = elf_getdata(sec, NULL);
 	if (data == NULL)
-- 
1.7.7.6



* [PATCH 12/19] perf, tool: Back [vdso] DSO with real data
  2012-06-11 13:19 [RFCv5 00/19] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (10 preceding siblings ...)
  2012-06-11 13:20 ` [PATCH 11/19] perf, tool: Add '.note' check into search for NOTE section Jiri Olsa
@ 2012-06-11 13:20 ` Jiri Olsa
  2012-06-29 18:49   ` Arnaldo Carvalho de Melo
  2012-06-11 13:20 ` [PATCH 13/19] perf, tool: Add interface to arch registers sets Jiri Olsa
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 37+ messages in thread
From: Jiri Olsa @ 2012-06-11 13:20 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, robert.richter, fche,
	linux-kernel, masami.hiramatsu.pt, drepper, asharma,
	benjamin.redelings, Jiri Olsa

Storing data for the VDSO shared object, because we need it for
the unwind process.

The idea is that the VDSO shared object is the same for all processes
on a running system, so it makes no difference if we store
it inside the tracer - perf.

The record command:
When [vdso] map memory is hit, we retrieve the [vdso] DSO image
and store it into a temporary file. During the build-id
processing the [vdso] DSO image is stored in the build-id db,
and a build-id reference is made inside perf.data. The temporary
file is removed when record is finished.

The report command:
We read the build-id from perf.data and store the [vdso] DSO object.
This object is referenced and attached to the map when the MMAP
events are processed. Thus during the SAMPLE event processing
we have the correct mmap/dso attached.

Adding the following API for the vdso object:
  vdso__file
    - vdso temp file path

  vdso__get_file
    - finds the VDSO image and stores it into a temp file,
      the temp file path is returned

  vdso__exit
    - removes temporary VDSO image if there's any
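
As a rough sketch of how the [vdso] mapping can be recognized, the
following parses a single /proc/<pid>/maps line. The helper name
parse_vdso_line is hypothetical, and it reads the addresses as
unsigned long rather than with %p as the patch does:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse one /proc/<pid>/maps line and report
 * whether it describes the private r-x [vdso] mapping. On a match the
 * mapping bounds are stored in *start and *end and 1 is returned;
 * otherwise 0 is returned.
 */
static int parse_vdso_line(const char *line,
			   unsigned long *start, unsigned long *end)
{
	char perms[5];
	int m = -1;

	/* %n records where the path name begins; it is not counted. */
	if (sscanf(line, "%lx-%lx %4s %*x %*x:%*x %*u %n",
		   start, end, perms, &m) != 3 || m < 0)
		return 0;

	/* Only private, readable and executable mappings qualify. */
	if (strcmp(perms, "r-xp"))
		return 0;

	return !strncmp(line + m, "[vdso]", 6);
}
```

The image size is then end - start, which is what gets copied into the
temporary file.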

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/Makefile       |    2 +
 tools/perf/util/map.c     |   23 ++++++++++-
 tools/perf/util/session.c |    2 +
 tools/perf/util/vdso.c    |   90 +++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/vdso.h    |    8 ++++
 5 files changed, 123 insertions(+), 2 deletions(-)
 create mode 100644 tools/perf/util/vdso.c
 create mode 100644 tools/perf/util/vdso.h

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 0eee64c..e48b969 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -319,6 +319,7 @@ LIB_H += $(ARCH_INCLUDE)
 LIB_H += util/cgroup.h
 LIB_H += $(TRACE_EVENT_DIR)event-parse.h
 LIB_H += util/target.h
+LIB_H += util/vdso.h
 
 LIB_OBJS += $(OUTPUT)util/abspath.o
 LIB_OBJS += $(OUTPUT)util/alias.o
@@ -382,6 +383,7 @@ LIB_OBJS += $(OUTPUT)util/xyarray.o
 LIB_OBJS += $(OUTPUT)util/cpumap.o
 LIB_OBJS += $(OUTPUT)util/cgroup.o
 LIB_OBJS += $(OUTPUT)util/target.o
+LIB_OBJS += $(OUTPUT)util/vdso.o
 
 BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o
 
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 35ae568..1649ea0 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -7,6 +7,7 @@
 #include <stdio.h>
 #include <unistd.h>
 #include "map.h"
+#include "vdso.h"
 
 const char *map_type__name[MAP__NR_TYPES] = {
 	[MAP__FUNCTION] = "Functions",
@@ -18,10 +19,14 @@ static inline int is_anon_memory(const char *filename)
 	return strcmp(filename, "//anon") == 0;
 }
 
+static inline int is_vdso_memory(const char *filename)
+{
+	return !strcmp(filename, "[vdso]");
+}
+
 static inline int is_no_dso_memory(const char *filename)
 {
 	return !strcmp(filename, "[stack]") ||
-	       !strcmp(filename, "[vdso]")  ||
 	       !strcmp(filename, "[heap]");
 }
 
@@ -50,9 +55,10 @@ struct map *map__new(struct list_head *dsos__list, u64 start, u64 len,
 	if (self != NULL) {
 		char newfilename[PATH_MAX];
 		struct dso *dso;
-		int anon, no_dso;
+		int anon, no_dso, vdso;
 
 		anon = is_anon_memory(filename);
+		vdso = is_vdso_memory(filename);
 		no_dso = is_no_dso_memory(filename);
 
 		if (anon) {
@@ -60,10 +66,23 @@ struct map *map__new(struct list_head *dsos__list, u64 start, u64 len,
 			filename = newfilename;
 		}
 
+		if (vdso) {
+			filename = (char *) vdso__file;
+			pgoff = 0;
+		}
+
 		dso = __dsos__findnew(dsos__list, filename);
 		if (dso == NULL)
 			goto out_delete;
 
+		if (vdso && !dso->has_build_id) {
+			char *file_vdso = vdso__get_file();
+			if (file_vdso)
+				dso__set_long_name(dso, file_vdso);
+			else
+				no_dso = 1;
+		}
+
 		map__init(self, type, start, start + len, pgoff, dso);
 
 		if (anon || no_dso) {
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 2785ce8..f400612 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -14,6 +14,7 @@
 #include "sort.h"
 #include "util.h"
 #include "cpumap.h"
+#include "vdso.h"
 
 static int perf_session__open(struct perf_session *self, bool force)
 {
@@ -209,6 +210,7 @@ void perf_session__delete(struct perf_session *self)
 	machine__exit(&self->host_machine);
 	close(self->fd);
 	free(self);
+	vdso__exit();
 }
 
 void machine__remove_thread(struct machine *self, struct thread *th)
diff --git a/tools/perf/util/vdso.c b/tools/perf/util/vdso.c
new file mode 100644
index 0000000..e964482
--- /dev/null
+++ b/tools/perf/util/vdso.c
@@ -0,0 +1,90 @@
+
+#include <unistd.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <linux/kernel.h>
+#include "vdso.h"
+#include "util.h"
+
+const char vdso__file[] = "/tmp/vdso.so";
+static bool vdso_found;
+
+static int find_vdso_map(void **start, void **end)
+{
+	FILE *maps;
+	char line[128];
+	int found = 0;
+
+	maps = fopen("/proc/self/maps", "r");
+	if (!maps) {
+		pr_err("vdso: cannot open maps\n");
+		return -1;
+	}
+
+	while (!found && fgets(line, sizeof(line), maps)) {
+		int m = -1;
+
+		/* We care only about private r-x mappings. */
+		if (2 != sscanf(line, "%p-%p r-xp %*x %*x:%*x %*u %n",
+				start, end, &m))
+			continue;
+		if (m < 0)
+			continue;
+
+		if (!strncmp(&line[m], "[vdso]", 6))
+			found = 1;
+	}
+
+	fclose(maps);
+	return !found;
+}
+
+char *vdso__get_file(void)
+{
+	char *vdso = NULL;
+	char *buf = NULL;
+	void *start, *end;
+
+	do {
+		int fd, size;
+
+		if (vdso_found) {
+			vdso = (char *) vdso__file;
+			break;
+		}
+
+		if (find_vdso_map(&start, &end))
+			break;
+
+		size = end - start;
+		buf = malloc(size);
+		if (!buf)
+			break;
+
+		memcpy(buf, start, size);
+
+		fd = open(vdso__file, O_CREAT|O_WRONLY|O_TRUNC, S_IRWXU);
+		if (fd < 0)
+			break;
+
+		if (size == write(fd, buf, size))
+			vdso = (char *) vdso__file;
+
+		close(fd);
+	} while (0);
+
+	if (buf)
+		free(buf);
+
+	vdso_found = (vdso != NULL);
+	return vdso;
+}
+
+void vdso__exit(void)
+{
+	if (vdso_found)
+		unlink(vdso__file);
+}
diff --git a/tools/perf/util/vdso.h b/tools/perf/util/vdso.h
new file mode 100644
index 0000000..908b041
--- /dev/null
+++ b/tools/perf/util/vdso.h
@@ -0,0 +1,8 @@
+#ifndef __VDSO__
+#define __VDSO__
+
+extern const char vdso__file[];
+char *vdso__get_file(void);
+void  vdso__exit(void);
+
+#endif /* __VDSO__ */
-- 
1.7.7.6



* [PATCH 13/19] perf, tool: Add interface to arch registers sets
  2012-06-11 13:19 [RFCv5 00/19] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (11 preceding siblings ...)
  2012-06-11 13:20 ` [PATCH 12/19] perf, tool: Back [vdso] DSO with real data Jiri Olsa
@ 2012-06-11 13:20 ` Jiri Olsa
  2012-06-11 13:20 ` [PATCH 14/19] perf, tool: Add libunwind dependency for dwarf cfi unwinding Jiri Olsa
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2012-06-11 13:20 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, robert.richter, fche,
	linux-kernel, masami.hiramatsu.pt, drepper, asharma,
	benjamin.redelings, Jiri Olsa

Adding header files to access the unified API for arch registers:
  util/perf_regs.h - global perf_reg declarations
  arch/x86/include/perf_regs.h - x86 arch specific

Adding the perf_reg_name function to obtain a register name based
on the API ID value, and the PERF_REGS_MASK macro with a mask
definition of all current arch registers (will be used in
unwind patches).
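
To show how such a mask is built and walked, here is a small sketch.
The register IDs, REGS_MASK and reg_name below are made up for the
example; the real IDs come from the arch perf_regs.h header:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register IDs, for illustration only. */
enum {
	REG_AX, REG_BX, REG_SP, REG_IP, REG_DS, REG_ES,
	REG_MAX,
};

/* Every register below REG_MAX ... */
#define ALL_REGS	((1ULL << REG_MAX) - 1)
/* ... minus the ones that cannot be sampled. */
#define NOSUPPORT	((1ULL << REG_DS) | (1ULL << REG_ES))
#define REGS_MASK	(ALL_REGS & ~NOSUPPORT)

/* Map a register ID to its name, NULL for unknown IDs. */
static const char *reg_name(int id)
{
	switch (id) {
	case REG_AX: return "AX";
	case REG_BX: return "BX";
	case REG_SP: return "SP";
	case REG_IP: return "IP";
	case REG_DS: return "DS";
	case REG_ES: return "ES";
	default:     return NULL;
	}
}

/* Count the registers selected by 'mask' that have a known name. */
static int count_named_regs(uint64_t mask)
{
	int id, n = 0;

	for (id = 0; mask; id++, mask >>= 1)
		if ((mask & 1) && reg_name(id))
			n++;

	return n;
}
```

Bit N of the mask selects register ID N, so the mask and the ID enum
have to stay in sync.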

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/Makefile                     |   13 +++++-
 tools/perf/arch/x86/include/perf_regs.h |   80 +++++++++++++++++++++++++++++++
 tools/perf/util/perf_regs.h             |   14 +++++
 3 files changed, 106 insertions(+), 1 deletions(-)
 create mode 100644 tools/perf/arch/x86/include/perf_regs.h
 create mode 100644 tools/perf/util/perf_regs.h

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index e48b969..f412fb1 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -50,13 +50,15 @@ ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/i386/ -e s/sun4u/sparc64/ \
 				  -e s/s390x/s390/ -e s/parisc64/parisc/ \
 				  -e s/ppc.*/powerpc/ -e s/mips.*/mips/ \
 				  -e s/sh[234].*/sh/ )
+NO_PERF_REGS := 1
 
 CC = $(CROSS_COMPILE)gcc
 AR = $(CROSS_COMPILE)ar
 
 # Additional ARCH settings for x86
 ifeq ($(ARCH),i386)
-        ARCH := x86
+	ARCH := x86
+	NO_PERF_REGS := 0
 endif
 ifeq ($(ARCH),x86_64)
 	ARCH := x86
@@ -69,6 +71,7 @@ ifeq ($(ARCH),x86_64)
 		ARCH_CFLAGS := -DARCH_X86_64
 		ARCH_INCLUDE = ../../arch/x86/lib/memcpy_64.S ../../arch/x86/lib/memset_64.S
 	endif
+	NO_PERF_REGS := 0
 endif
 
 # Treat warnings as errors unless directed not to
@@ -320,6 +323,7 @@ LIB_H += util/cgroup.h
 LIB_H += $(TRACE_EVENT_DIR)event-parse.h
 LIB_H += util/target.h
 LIB_H += util/vdso.h
+LIB_H += util/perf_regs.h
 
 LIB_OBJS += $(OUTPUT)util/abspath.o
 LIB_OBJS += $(OUTPUT)util/alias.o
@@ -665,6 +669,13 @@ else
 	endif
 endif
 
+ifeq ($(NO_PERF_REGS),0)
+	ifeq ($(ARCH),x86)
+		LIB_H += arch/x86/include/perf_regs.h
+	endif
+else
+	BASIC_CFLAGS += -DNO_PERF_REGS
+endif
 
 ifdef NO_STRLCPY
 	BASIC_CFLAGS += -DNO_STRLCPY
diff --git a/tools/perf/arch/x86/include/perf_regs.h b/tools/perf/arch/x86/include/perf_regs.h
new file mode 100644
index 0000000..ee5aff6
--- /dev/null
+++ b/tools/perf/arch/x86/include/perf_regs.h
@@ -0,0 +1,80 @@
+#ifndef ARCH_PERF_REGS_H
+#define ARCH_PERF_REGS_H
+
+#include <stdlib.h>
+#include "../../util/types.h"
+#include "../../../../../arch/x86/include/asm/perf_regs.h"
+
+#ifndef ARCH_X86_64
+#define PERF_REGS_MASK ((1ULL << PERF_REG_X86_32_MAX) - 1)
+#else
+#define REG_NOSUPPORT ((1ULL << PERF_REG_X86_DS) | \
+		       (1ULL << PERF_REG_X86_ES) | \
+		       (1ULL << PERF_REG_X86_FS) | \
+		       (1ULL << PERF_REG_X86_GS))
+#define PERF_REGS_MASK (((1ULL << PERF_REG_X86_64_MAX) - 1) & ~REG_NOSUPPORT)
+#endif
+#define PERF_REG_IP PERF_REG_X86_IP
+#define PERF_REG_SP PERF_REG_X86_SP
+
+static inline const char *perf_reg_name(int id)
+{
+	switch (id) {
+	case PERF_REG_X86_AX:
+		return "AX";
+	case PERF_REG_X86_BX:
+		return "BX";
+	case PERF_REG_X86_CX:
+		return "CX";
+	case PERF_REG_X86_DX:
+		return "DX";
+	case PERF_REG_X86_SI:
+		return "SI";
+	case PERF_REG_X86_DI:
+		return "DI";
+	case PERF_REG_X86_BP:
+		return "BP";
+	case PERF_REG_X86_SP:
+		return "SP";
+	case PERF_REG_X86_IP:
+		return "IP";
+	case PERF_REG_X86_FLAGS:
+		return "FLAGS";
+	case PERF_REG_X86_CS:
+		return "CS";
+	case PERF_REG_X86_DS:
+		return "DS";
+	case PERF_REG_X86_ES:
+		return "ES";
+	case PERF_REG_X86_FS:
+		return "FS";
+	case PERF_REG_X86_GS:
+		return "GS";
+#ifdef ARCH_X86_64
+	case PERF_REG_X86_R8:
+		return "R8";
+	case PERF_REG_X86_R9:
+		return "R9";
+	case PERF_REG_X86_R10:
+		return "R10";
+	case PERF_REG_X86_R11:
+		return "R11";
+	case PERF_REG_X86_R12:
+		return "R12";
+	case PERF_REG_X86_R13:
+		return "R13";
+	case PERF_REG_X86_R14:
+		return "R14";
+	case PERF_REG_X86_R15:
+		return "R15";
+	case PERF_REG_X86_SS:
+		return "SS";
+#endif /* ARCH_X86_64 */
+	default:
+		return NULL;
+	}
+
+	return NULL;
+}
+
+#endif /* ARCH_PERF_REGS_H */
diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
new file mode 100644
index 0000000..3efb79a
--- /dev/null
+++ b/tools/perf/util/perf_regs.h
@@ -0,0 +1,14 @@
+#ifndef __PERF_REGS_H
+#define __PERF_REGS_H
+
+#ifndef NO_PERF_REGS_DEFS
+#include <perf_regs.h>
+#else
+#define PERF_REGS_MASK	0
+
+static inline const char *perf_reg_name(int id __used)
+{
+	return NULL;
+}
+#endif /* NO_PERF_REGS_DEFS */
+#endif /* __PERF_REGS_H */
-- 
1.7.7.6



* [PATCH 14/19] perf, tool: Add libunwind dependency for dwarf cfi unwinding
  2012-06-11 13:19 [RFCv5 00/19] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (12 preceding siblings ...)
  2012-06-11 13:20 ` [PATCH 13/19] perf, tool: Add interface to arch registers sets Jiri Olsa
@ 2012-06-11 13:20 ` Jiri Olsa
  2012-06-11 13:20 ` [PATCH 15/19] perf, tool: Support user regs and stack in sample parsing Jiri Olsa
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2012-06-11 13:20 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, robert.richter, fche,
	linux-kernel, masami.hiramatsu.pt, drepper, asharma,
	benjamin.redelings, Jiri Olsa

Adding libunwind to be linked with perf if available. It's required
to get dwarf cfi unwinding support.

Also building perf with the dwarf call frame information by default,
so that we can unwind callchains in perf itself.

Adding LIBUNWIND_DIR Makefile variable allowing user to specify
the directory with libunwind to be linked. This is used for
debug purposes.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/Makefile                 |   27 ++++++++++++++++++++++++++-
 tools/perf/config/feature-tests.mak |   25 +++++++++++++++++++++++++
 2 files changed, 51 insertions(+), 1 deletions(-)

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index f412fb1..4993a97 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -59,6 +59,7 @@ AR = $(CROSS_COMPILE)ar
 ifeq ($(ARCH),i386)
 	ARCH := x86
 	NO_PERF_REGS := 0
+	LIBUNWIND_LIBS = -lunwind -lunwind-x86
 endif
 ifeq ($(ARCH),x86_64)
 	ARCH := x86
@@ -72,6 +73,7 @@ ifeq ($(ARCH),x86_64)
 		ARCH_INCLUDE = ../../arch/x86/lib/memcpy_64.S ../../arch/x86/lib/memset_64.S
 	endif
 	NO_PERF_REGS := 0
+	LIBUNWIND_LIBS = -lunwind -lunwind-x86_64
 endif
 
 # Treat warnings as errors unless directed not to
@@ -92,7 +94,7 @@ ifdef PARSER_DEBUG
 	PARSER_DEBUG_CFLAGS := -DPARSER_DEBUG
 endif
 
-CFLAGS = -fno-omit-frame-pointer -ggdb3 -Wall -Wextra -std=gnu99 $(CFLAGS_WERROR) $(CFLAGS_OPTIMIZE) $(EXTRA_WARNINGS) $(EXTRA_CFLAGS) $(PARSER_DEBUG_CFLAGS)
+CFLAGS = -fno-omit-frame-pointer -ggdb3 -funwind-tables -Wall -Wextra -std=gnu99 $(CFLAGS_WERROR) $(CFLAGS_OPTIMIZE) $(EXTRA_WARNINGS) $(EXTRA_CFLAGS) $(PARSER_DEBUG_CFLAGS)
 EXTLIBS = -lpthread -lrt -lelf -lm
 ALL_CFLAGS = $(CFLAGS) -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE
 ALL_LDFLAGS = $(LDFLAGS)
@@ -458,6 +460,21 @@ ifneq ($(call try-cc,$(SOURCE_DWARF),$(FLAGS_DWARF)),y)
 endif # Dwarf support
 endif # NO_DWARF
 
+ifndef NO_LIBUNWIND
+# for linking with debug library, run like:
+# make DEBUG=1 LIBUNWIND_DIR=/opt/libunwind/
+ifdef LIBUNWIND_DIR
+	LIBUNWIND_CFLAGS  := -I$(LIBUNWIND_DIR)/include
+	LIBUNWIND_LDFLAGS := -L$(LIBUNWIND_DIR)/lib
+endif
+
+FLAGS_UNWIND=$(LIBUNWIND_CFLAGS) $(ALL_CFLAGS) $(LIBUNWIND_LDFLAGS) $(ALL_LDFLAGS) $(EXTLIBS) $(LIBUNWIND_LIBS)
+ifneq ($(call try-cc,$(SOURCE_LIBUNWIND),$(FLAGS_UNWIND)),y)
+	msg := $(warning No libunwind found. Please install libunwind >= 0.99);
+	NO_LIBUNWIND := 1
+endif # Libunwind support
+endif # NO_LIBUNWIND
+
 -include arch/$(ARCH)/Makefile
 
 ifneq ($(OUTPUT),)
@@ -489,6 +506,14 @@ else
 endif # PERF_HAVE_DWARF_REGS
 endif # NO_DWARF
 
+ifdef NO_LIBUNWIND
+	BASIC_CFLAGS += -DNO_LIBUNWIND_SUPPORT
+else
+	EXTLIBS += $(LIBUNWIND_LIBS)
+	BASIC_CFLAGS := $(LIBUNWIND_CFLAGS) $(BASIC_CFLAGS)
+	BASIC_LDFLAGS := $(LIBUNWIND_LDFLAGS) $(BASIC_LDFLAGS)
+endif
+
 ifdef NO_NEWT
 	BASIC_CFLAGS += -DNO_NEWT_SUPPORT
 else
diff --git a/tools/perf/config/feature-tests.mak b/tools/perf/config/feature-tests.mak
index d9084e0..51cd201 100644
--- a/tools/perf/config/feature-tests.mak
+++ b/tools/perf/config/feature-tests.mak
@@ -141,3 +141,28 @@ int main(void)
 	return 0;
 }
 endef
+
+ifndef NO_LIBUNWIND
+define SOURCE_LIBUNWIND
+#include <libunwind.h>
+#include <stdlib.h>
+
+extern int UNW_OBJ(dwarf_search_unwind_table) (unw_addr_space_t as,
+                                      unw_word_t ip,
+                                      unw_dyn_info_t *di,
+                                      unw_proc_info_t *pi,
+                                      int need_unwind_info, void *arg);
+
+
+#define dwarf_search_unwind_table UNW_OBJ(dwarf_search_unwind_table)
+
+int main(void)
+{
+	unw_addr_space_t addr_space;
+	addr_space = unw_create_addr_space(NULL, 0);
+	unw_init_remote(NULL, addr_space, NULL);
+	dwarf_search_unwind_table(addr_space, 0, NULL, NULL, 0, NULL);
+	return 0;
+}
+endef
+endif
-- 
1.7.7.6



* [PATCH 15/19] perf, tool: Support user regs and stack in sample parsing
  2012-06-11 13:19 [RFCv5 00/19] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (13 preceding siblings ...)
  2012-06-11 13:20 ` [PATCH 14/19] perf, tool: Add libunwind dependency for dwarf cfi unwinding Jiri Olsa
@ 2012-06-11 13:20 ` Jiri Olsa
  2012-06-11 13:20 ` [PATCH 16/19] perf, tool: Support for dwarf cfi unwinding on post processing Jiri Olsa
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2012-06-11 13:20 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, robert.richter, fche,
	linux-kernel, masami.hiramatsu.pt, drepper, asharma,
	benjamin.redelings, Jiri Olsa

Adding following info to be parsed out of the event sample:
 - user register set
 - user stack dump

Both are global and apply to all events within the session.
This info will be used in the unwind patches coming in shortly.

Adding simple output printout (report -D) for both register and
stack dumps.
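
The layout this parsing assumes can be sketched as below. This is
inferred from the parsing code in this patch and is not meant as an
ABI description; struct user_dump and parse_user_dump are hypothetical
names:

```c
#include <stddef.h>
#include <stdint.h>

struct user_dump {
	const uint64_t *regs;	/* NULL when no registers were dumped */
	const char *stack;	/* NULL when no stack was dumped */
	uint64_t stack_size;	/* size word that follows the stack data */
};

/* Number of set bits in 'mask', i.e. how many registers were requested. */
static unsigned int popcount64(uint64_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * Walk the tail of a sample, assuming this layout:
 *   u64 avail; u64 regs[popcount(regs_mask)]	(regs only if avail)
 *   u64 size;  char data[size]; u64 dyn_size	(rest only if size)
 * Returns a pointer just past the consumed words.
 */
static const uint64_t *parse_user_dump(const uint64_t *array,
				       uint64_t regs_mask, uint64_t stack_req,
				       struct user_dump *out)
{
	out->regs = NULL;
	out->stack = NULL;
	out->stack_size = 0;

	if (regs_mask) {
		uint64_t avail = *array++;

		if (avail) {
			out->regs = array;
			array += popcount64(regs_mask);
		}
	}

	if (stack_req) {
		uint64_t size = *array++;

		if (size) {
			out->stack = (const char *) array;
			array += size / sizeof(*array);
			out->stack_size = *array++;
		}
	}

	return array;
}
```

The register values carry no IDs of their own; the n-th value belongs
to the n-th set bit of the mask, which is why the mask has to be the
same for all events in the session.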

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/builtin-test.c |    3 ++-
 tools/perf/util/event.h   |   15 ++++++++++++++-
 tools/perf/util/evlist.c  |   16 ++++++++++++++++
 tools/perf/util/evlist.h  |    2 ++
 tools/perf/util/evsel.c   |   25 +++++++++++++++++++++++++
 tools/perf/util/python.c  |    3 ++-
 tools/perf/util/session.c |   43 +++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/session.h |   10 ++++++++--
 8 files changed, 112 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-test.c b/tools/perf/builtin-test.c
index 5a8727c..8c2fcb0 100644
--- a/tools/perf/builtin-test.c
+++ b/tools/perf/builtin-test.c
@@ -564,7 +564,7 @@ static int test__basic_mmap(void)
 		}
 
 		err = perf_event__parse_sample(event, attr.sample_type, sample_size,
-					       false, &sample, false);
+					       false, 0, 0, &sample, false);
 		if (err) {
 			pr_err("Can't parse sample, err = %d\n", err);
 			goto out_munmap;
@@ -790,6 +790,7 @@ static int test__PERF_RECORD(void)
 
 				err = perf_event__parse_sample(event, sample_type,
 							       sample_size, true,
+							       0, 0,
 							       &sample, false);
 				if (err < 0) {
 					if (verbose)
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 1b19728..c2aa7ba 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -69,6 +69,15 @@ struct sample_event {
 	u64 array[];
 };
 
+struct regs_dump {
+	u64 *regs;
+};
+
+struct stack_dump {
+	u64 size;
+	char *data;
+};
+
 struct perf_sample {
 	u64 ip;
 	u32 pid, tid;
@@ -82,6 +91,8 @@ struct perf_sample {
 	void *raw_data;
 	struct ip_callchain *callchain;
 	struct branch_stack *branch_stack;
+	struct regs_dump  user_regs;
+	struct stack_dump user_stack;
 };
 
 #define BUILD_ID_SIZE 20
@@ -199,7 +210,9 @@ const char *perf_event__name(unsigned int id);
 
 int perf_event__parse_sample(const union perf_event *event, u64 type,
 			     int sample_size, bool sample_id_all,
-			     struct perf_sample *sample, bool swapped);
+			     u64 sample_regs_user, u64 sample_stack_user,
+			     struct perf_sample *data, bool swapped);
+
 int perf_event__synthesize_sample(union perf_event *event, u64 type,
 				  const struct perf_sample *sample,
 				  bool swapped);
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 7400fb3..b7c6ffa 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -689,6 +689,22 @@ bool perf_evlist__valid_sample_type(const struct perf_evlist *evlist)
 	return true;
 }
 
+u64 perf_evlist__sample_regs_user(const struct perf_evlist *evlist)
+{
+	struct perf_evsel *first;
+
+	first = list_entry(evlist->entries.next, struct perf_evsel, node);
+	return first->attr.sample_regs_user;
+}
+
+u64 perf_evlist__sample_stack_user(const struct perf_evlist *evlist)
+{
+	struct perf_evsel *first;
+
+	first = list_entry(evlist->entries.next, struct perf_evsel, node);
+	return first->attr.sample_stack_user;
+}
+
 u64 perf_evlist__sample_type(const struct perf_evlist *evlist)
 {
 	struct perf_evsel *first;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 989bee9..0b2560f 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -121,6 +121,8 @@ u16 perf_evlist__id_hdr_size(const struct perf_evlist *evlist);
 
 bool perf_evlist__valid_sample_type(const struct perf_evlist *evlist);
 bool perf_evlist__valid_sample_id_all(const struct perf_evlist *evlist);
+u64 perf_evlist__sample_regs_user(const struct perf_evlist *evlist);
+u64 perf_evlist__sample_stack_user(const struct perf_evlist *evlist);
 
 void perf_evlist__splice_list_tail(struct perf_evlist *evlist,
 				   struct list_head *list,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 9f6cebd..32ade1f 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -8,6 +8,7 @@
  */
 
 #include <byteswap.h>
+#include <linux/bitops.h>
 #include "asm/bug.h"
 #include "evsel.h"
 #include "evlist.h"
@@ -559,6 +560,7 @@ static bool sample_overlap(const union perf_event *event,
 
 int perf_event__parse_sample(const union perf_event *event, u64 type,
 			     int sample_size, bool sample_id_all,
+			     u64 sample_regs_user, u64 sample_stack_user,
 			     struct perf_sample *data, bool swapped)
 {
 	const u64 *array;
@@ -697,6 +699,29 @@ int perf_event__parse_sample(const union perf_event *event, u64 type,
 		sz /= sizeof(u64);
 		array += sz;
 	}
+
+	if (sample_regs_user) {
+		/* First u64 tells us if we have any regs in sample. */
+		u64 avail = *array++;
+
+		if (avail) {
+			data->user_regs.regs = (u64 *)array;
+			array += hweight_long(sample_regs_user);
+		}
+	}
+
+	if (sample_stack_user) {
+		u64 size = *array++;
+
+		if (!size) {
+			data->user_stack.size = 0;
+		} else {
+			data->user_stack.data = (char *)array;
+			array += size / sizeof(*array);
+			data->user_stack.size = *array;
+		}
+	}
+
 	return 0;
 }
 
diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index e03b58a..17eb1f9 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -807,7 +807,8 @@ static PyObject *pyrf_evlist__read_on_cpu(struct pyrf_evlist *pevlist,
 		first = list_entry(evlist->entries.next, struct perf_evsel, node);
 		err = perf_event__parse_sample(event, first->attr.sample_type,
 					       perf_evsel__sample_size(first),
-					       sample_id_all, &pevent->sample, false);
+					       sample_id_all, 0, 0,
+					       &pevent->sample, false);
 		if (err)
 			return PyErr_Format(PyExc_OSError,
 					    "perf: can't parse sample, err=%d", err);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index f400612..8dd331d 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -15,6 +15,7 @@
 #include "util.h"
 #include "cpumap.h"
 #include "vdso.h"
+#include "perf_regs.h"
 
 static int perf_session__open(struct perf_session *self, bool force)
 {
@@ -87,6 +88,8 @@ void perf_session__update_sample_type(struct perf_session *self)
 	self->sample_id_all = perf_evlist__sample_id_all(self->evlist);
 	self->id_hdr_size = perf_evlist__id_hdr_size(self->evlist);
 	self->host_machine.id_hdr_size = self->id_hdr_size;
+	self->sample_regs_user = perf_evlist__sample_regs_user(self->evlist);
+	self->sample_stack_user = perf_evlist__sample_stack_user(self->evlist);
 }
 
 int perf_session__create_kernel_maps(struct perf_session *self)
@@ -851,6 +854,40 @@ static void branch_stack__printf(struct perf_sample *sample)
 			sample->branch_stack->entries[i].to);
 }
 
+static void regs_dump__printf(u64 mask, u64 *regs)
+{
+	int i = 0, rid = 0;
+
+	do {
+		u64 val;
+
+		if (mask & 1) {
+			val = regs[i++];
+			printf(".... %-5s 0x%" PRIx64 "\n",
+			       perf_reg_name(rid), val);
+		}
+
+		rid++;
+		mask >>= 1;
+
+	} while (mask);
+}
+
+static void regs_user__printf(struct perf_sample *sample, u64 mask)
+{
+	struct regs_dump *user_regs = &sample->user_regs;
+
+	if (user_regs->regs) {
+		printf("... user regs: mask 0x%" PRIx64 "\n", mask);
+		regs_dump__printf(mask, user_regs->regs);
+	}
+}
+
+static void stack_user__printf(struct stack_dump *dump)
+{
+	printf("... ustack: size %" PRIu64 "\n", dump->size);
+}
+
 static void perf_session__print_tstamp(struct perf_session *session,
 				       union perf_event *event,
 				       struct perf_sample *sample)
@@ -901,6 +938,12 @@ static void dump_sample(struct perf_session *session, union perf_event *event,
 
 	if (session->sample_type & PERF_SAMPLE_BRANCH_STACK)
 		branch_stack__printf(sample);
+
+	if (session->sample_regs_user)
+		regs_user__printf(sample, session->sample_regs_user);
+
+	if (session->sample_stack_user)
+		stack_user__printf(&sample->user_stack);
 }
 
 static struct machine *
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 877d781..452d826 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -42,6 +42,8 @@ struct perf_session {
 	struct hists		hists;
 	u64			sample_type;
 	int			sample_size;
+	u64			sample_regs_user;
+	u64			sample_stack_user;
 	int			fd;
 	bool			fd_pipe;
 	bool			repipe;
@@ -132,9 +134,13 @@ static inline int perf_session__parse_sample(struct perf_session *session,
 					     const union perf_event *event,
 					     struct perf_sample *sample)
 {
-	return perf_event__parse_sample(event, session->sample_type,
+	return perf_event__parse_sample(event,
+					session->sample_type,
 					session->sample_size,
-					session->sample_id_all, sample,
+					session->sample_id_all,
+					session->sample_regs_user,
+					session->sample_stack_user,
+					sample,
 					session->header.needs_swap);
 }
 
-- 
1.7.7.6



* [PATCH 16/19] perf, tool: Support for dwarf cfi unwinding on post processing
  2012-06-11 13:19 [RFCv5 00/19] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (14 preceding siblings ...)
  2012-06-11 13:20 ` [PATCH 15/19] perf, tool: Support user regs and stack in sample parsing Jiri Olsa
@ 2012-06-11 13:20 ` Jiri Olsa
  2012-06-11 13:20 ` [PATCH 17/19] perf, tool: Support for dwarf mode callchain on perf record Jiri Olsa
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2012-06-11 13:20 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, robert.richter, fche,
	linux-kernel, masami.hiramatsu.pt, drepper, asharma,
	benjamin.redelings, Jiri Olsa

This adds support for DWARF CFI unwinding during perf post
processing. Call frame information is retrieved and then passed
to libunwind, which requests memory and register contents from
the application.

Add an unwind object that handles the user stack backtrace based
on the user register values and the user stack dump.

The unwind object accesses libunwind via its remote interface
and provides it with all the data needed to unwind the stack.
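
As a rough illustration of the kind of data handed to libunwind, the
stand-alone sketch below mimics how a register value can be located in a
sampled register dump (indexed by the bits set in the sample mask) and how
a word can be read from a stack dump that starts at the sampled SP. The
struct layouts and the names sketch_reg_value and sketch_stack_read are
simplified stand-ins assumed for this example; they are not the
definitions from this patch, and the bounds check is deliberately
conservative rather than a copy of the patch's logic.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the perf sample structures. */
struct sketch_regs_dump {
	const uint64_t *regs;	/* values of the sampled registers, in mask bit order */
};

struct sketch_stack_dump {
	uint64_t size;		/* number of dumped bytes */
	const uint8_t *data;	/* stack contents, starting at the sampled SP */
};

/*
 * Find the value of register 'id' in a dump that only holds the
 * registers selected by 'mask'. The index into the dump is the number
 * of mask bits set below 'id'.
 * Returns 0 on success, -1 if the register was not sampled.
 */
static int sketch_reg_value(uint64_t *valp, const struct sketch_regs_dump *regs,
			    int id, uint64_t mask)
{
	int i, idx = 0;

	if (id < 0 || id >= 64 || !(mask & (1ULL << id)))
		return -1;

	for (i = 0; i < id; i++) {
		if (mask & (1ULL << i))
			idx++;
	}

	*valp = regs->regs[idx];
	return 0;
}

/*
 * Read one 64-bit word at 'addr' from a stack dump that begins at 'sp'.
 * Returns 0 on success, -1 if the word is not fully inside the dump.
 */
static int sketch_stack_read(uint64_t *valp, const struct sketch_stack_dump *stack,
			     uint64_t sp, uint64_t addr)
{
	uint64_t offset;

	if (addr < sp)
		return -1;

	offset = addr - sp;
	if (offset > stack->size || stack->size - offset < sizeof(*valp))
		return -1;

	/* memcpy avoids an unaligned access into the byte buffer. */
	memcpy(valp, stack->data + offset, sizeof(*valp));
	return 0;
}
```

An address that falls outside the dumped stack range would have to be
served from somewhere else (the patch falls back to reading the mapped
DSO); the sketch simply reports failure in that case.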

The unwind interface provides the following function:
	unwind__get_entries

and a callback (passed to the above function) to retrieve
the backtrace entries:
	typedef int (*unwind_entry_cb_t)(struct unwind_entry *entry,
					 void *arg);
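
A caller-side sketch of this callback pattern might look as follows. It
is stand-alone and uses reduced, hypothetical types (sketch_unwind_entry,
sketch_backtrace) plus a stand-in walker (sketch_walk) in place of the
real unwinder, so none of these names come from the patch. It also
assumes that a non-zero return from the callback stops the walk.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, reduced version of the entry passed to the callback. */
struct sketch_unwind_entry {
	uint64_t ip;	/* instruction pointer of this frame */
};

typedef int (*sketch_unwind_entry_cb_t)(struct sketch_unwind_entry *entry,
					void *arg);

#define SKETCH_MAX_FRAMES 8

struct sketch_backtrace {
	uint64_t ips[SKETCH_MAX_FRAMES];
	size_t nr;
};

/* Callback: append the frame to the backtrace; return non-zero when full. */
static int sketch_collect(struct sketch_unwind_entry *entry, void *arg)
{
	struct sketch_backtrace *bt = arg;

	if (bt->nr >= SKETCH_MAX_FRAMES)
		return -1;

	bt->ips[bt->nr++] = entry->ip;
	return 0;
}

/*
 * Stand-in for the unwinder: walks a fixed list of addresses and calls
 * 'cb' once per frame, stopping at the first non-zero return value.
 */
static int sketch_walk(const uint64_t *ips, size_t nr,
		       sketch_unwind_entry_cb_t cb, void *arg)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < nr && !ret; i++) {
		struct sketch_unwind_entry e = { .ip = ips[i] };

		ret = cb(&e, arg);
	}
	return ret;
}
```

Passing the destination through the opaque 'arg' pointer keeps the
unwinder independent of how the caller stores the frames.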

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/Makefile                                |    2 +
 tools/perf/arch/x86/Makefile                       |    3 +
 tools/perf/arch/x86/util/unwind.c                  |  111 ++++
 tools/perf/builtin-report.c                        |   24 +-
 tools/perf/builtin-script.c                        |   56 ++-
 tools/perf/builtin-top.c                           |    5 +-
 tools/perf/util/include/linux/compiler.h           |    1 +
 tools/perf/util/map.h                              |    7 +-
 .../perf/util/scripting-engines/trace-event-perl.c |    3 +-
 .../util/scripting-engines/trace-event-python.c    |    3 +-
 tools/perf/util/session.c                          |   60 ++-
 tools/perf/util/session.h                          |    3 +-
 tools/perf/util/trace-event-scripting.c            |    3 +-
 tools/perf/util/trace-event.h                      |    5 +-
 tools/perf/util/unwind.c                           |  565 ++++++++++++++++++++
 tools/perf/util/unwind.h                           |   34 ++
 16 files changed, 834 insertions(+), 51 deletions(-)
 create mode 100644 tools/perf/arch/x86/util/unwind.c
 create mode 100644 tools/perf/util/unwind.c
 create mode 100644 tools/perf/util/unwind.h

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 4993a97..92c3f8f 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -326,6 +326,7 @@ LIB_H += $(TRACE_EVENT_DIR)event-parse.h
 LIB_H += util/target.h
 LIB_H += util/vdso.h
 LIB_H += util/perf_regs.h
+LIB_H += util/unwind.h
 
 LIB_OBJS += $(OUTPUT)util/abspath.o
 LIB_OBJS += $(OUTPUT)util/alias.o
@@ -512,6 +513,7 @@ else
 	EXTLIBS += $(LIBUNWIND_LIBS)
 	BASIC_CFLAGS := $(LIBUNWIND_CFLAGS) $(BASIC_CFLAGS)
 	BASIC_LDFLAGS := $(LIBUNWIND_LDFLAGS) $(BASIC_LDFLAGS)
+	LIB_OBJS += $(OUTPUT)util/unwind.o
 endif
 
 ifdef NO_NEWT
diff --git a/tools/perf/arch/x86/Makefile b/tools/perf/arch/x86/Makefile
index 744e629..815841c 100644
--- a/tools/perf/arch/x86/Makefile
+++ b/tools/perf/arch/x86/Makefile
@@ -2,4 +2,7 @@ ifndef NO_DWARF
 PERF_HAVE_DWARF_REGS := 1
 LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/dwarf-regs.o
 endif
+ifndef NO_LIBUNWIND
+LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/unwind.o
+endif
 LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/header.o
diff --git a/tools/perf/arch/x86/util/unwind.c b/tools/perf/arch/x86/util/unwind.c
new file mode 100644
index 0000000..78d956e
--- /dev/null
+++ b/tools/perf/arch/x86/util/unwind.c
@@ -0,0 +1,111 @@
+
+#include <errno.h>
+#include <libunwind.h>
+#include "perf_regs.h"
+#include "../../util/unwind.h"
+
+#ifdef ARCH_X86_64
+int unwind__arch_reg_id(int regnum)
+{
+	int id;
+
+	switch (regnum) {
+	case UNW_X86_64_RAX:
+		id = PERF_REG_X86_AX;
+		break;
+	case UNW_X86_64_RDX:
+		id = PERF_REG_X86_DX;
+		break;
+	case UNW_X86_64_RCX:
+		id = PERF_REG_X86_CX;
+		break;
+	case UNW_X86_64_RBX:
+		id = PERF_REG_X86_BX;
+		break;
+	case UNW_X86_64_RSI:
+		id = PERF_REG_X86_SI;
+		break;
+	case UNW_X86_64_RDI:
+		id = PERF_REG_X86_DI;
+		break;
+	case UNW_X86_64_RBP:
+		id = PERF_REG_X86_BP;
+		break;
+	case UNW_X86_64_RSP:
+		id = PERF_REG_X86_SP;
+		break;
+	case UNW_X86_64_R8:
+		id = PERF_REG_X86_R8;
+		break;
+	case UNW_X86_64_R9:
+		id = PERF_REG_X86_R9;
+		break;
+	case UNW_X86_64_R10:
+		id = PERF_REG_X86_R10;
+		break;
+	case UNW_X86_64_R11:
+		id = PERF_REG_X86_R11;
+		break;
+	case UNW_X86_64_R12:
+		id = PERF_REG_X86_R12;
+		break;
+	case UNW_X86_64_R13:
+		id = PERF_REG_X86_R13;
+		break;
+	case UNW_X86_64_R14:
+		id = PERF_REG_X86_R14;
+		break;
+	case UNW_X86_64_R15:
+		id = PERF_REG_X86_R15;
+		break;
+	case UNW_X86_64_RIP:
+		id = PERF_REG_X86_IP;
+		break;
+	default:
+		pr_err("unwind: invalid reg id %d\n", regnum);
+		return -EINVAL;
+	}
+
+	return id;
+}
+#else
+int unwind__arch_reg_id(int regnum)
+{
+	int id;
+
+	switch (regnum) {
+	case UNW_X86_EAX:
+		id = PERF_REG_X86_AX;
+		break;
+	case UNW_X86_EDX:
+		id = PERF_REG_X86_DX;
+		break;
+	case UNW_X86_ECX:
+		id = PERF_REG_X86_CX;
+		break;
+	case UNW_X86_EBX:
+		id = PERF_REG_X86_BX;
+		break;
+	case UNW_X86_ESI:
+		id = PERF_REG_X86_SI;
+		break;
+	case UNW_X86_EDI:
+		id = PERF_REG_X86_DI;
+		break;
+	case UNW_X86_EBP:
+		id = PERF_REG_X86_BP;
+		break;
+	case UNW_X86_ESP:
+		id = PERF_REG_X86_SP;
+		break;
+	case UNW_X86_EIP:
+		id = PERF_REG_X86_IP;
+		break;
+	default:
+		pr_err("unwind: invalid reg id %d\n", regnum);
+		return -EINVAL;
+	}
+
+	return id;
+}
+#endif /* ARCH_X86_64 */
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index d20ef95..ecf94df 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -69,8 +69,8 @@ static int perf_report__add_branch_hist_entry(struct perf_tool *tool,
 
 	if ((sort__has_parent || symbol_conf.use_callchain)
 	    && sample->callchain) {
-		err = machine__resolve_callchain(machine, al->thread,
-						 sample->callchain, &parent);
+		err = machine__resolve_callchain(rep->session, machine,
+						 al->thread, sample, &parent);
 		if (err)
 			return err;
 	}
@@ -130,7 +130,8 @@ out:
 	return err;
 }
 
-static int perf_evsel__add_hist_entry(struct perf_evsel *evsel,
+static int perf_evsel__add_hist_entry(struct perf_session *session,
+				      struct perf_evsel *evsel,
 				      struct addr_location *al,
 				      struct perf_sample *sample,
 				      struct machine *machine)
@@ -140,8 +141,8 @@ static int perf_evsel__add_hist_entry(struct perf_evsel *evsel,
 	struct hist_entry *he;
 
 	if ((sort__has_parent || symbol_conf.use_callchain) && sample->callchain) {
-		err = machine__resolve_callchain(machine, al->thread,
-						 sample->callchain, &parent);
+		err = machine__resolve_callchain(session, machine,
+						 al->thread, sample, &parent);
 		if (err)
 			return err;
 	}
@@ -213,7 +214,8 @@ static int process_sample_event(struct perf_tool *tool,
 		if (al.map != NULL)
 			al.map->dso->hit = 1;
 
-		if (perf_evsel__add_hist_entry(evsel, &al, sample, machine)) {
+		if (perf_evsel__add_hist_entry(rep->session, evsel, &al,
+					       sample, machine)) {
 			pr_debug("problem incrementing symbol period, skipping event\n");
 			return -1;
 		}
@@ -394,17 +396,17 @@ static int __cmd_report(struct perf_report *rep)
 		desc);
 	}
 
-	if (dump_trace) {
-		perf_session__fprintf_nr_events(session, stdout);
-		goto out_delete;
-	}
-
 	if (verbose > 3)
 		perf_session__fprintf(session, stdout);
 
 	if (verbose > 2)
 		perf_session__fprintf_dsos(session, stdout);
 
+	if (dump_trace) {
+		perf_session__fprintf_nr_events(session, stdout);
+		goto out_delete;
+	}
+
 	nr_samples = 0;
 	list_for_each_entry(pos, &session->evlist->entries, node) {
 		struct hists *hists = &pos->hists;
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 05aa2bb..2ee4ce5 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -28,6 +28,11 @@ static bool			system_wide;
 static const char		*cpu_list;
 static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
 
+struct perf_script {
+	struct perf_tool tool;
+	struct perf_session *session;
+};
+
 enum perf_output_field {
 	PERF_OUTPUT_COMM            = 1U << 0,
 	PERF_OUTPUT_TID             = 1U << 1,
@@ -373,7 +378,8 @@ static void print_sample_addr(union perf_event *event,
 	}
 }
 
-static void print_sample_bts(union perf_event *event,
+static void print_sample_bts(struct perf_session *session,
+			     union perf_event *event,
 			     struct perf_sample *sample,
 			     struct perf_evsel *evsel,
 			     struct machine *machine,
@@ -387,7 +393,7 @@ static void print_sample_bts(union perf_event *event,
 			printf(" ");
 		else
 			printf("\n");
-		perf_event__print_ip(event, sample, machine,
+		perf_event__print_ip(session, event, sample, machine,
 				     PRINT_FIELD(SYM), PRINT_FIELD(DSO),
 				     PRINT_FIELD(SYMOFFSET));
 	}
@@ -401,7 +407,8 @@ static void print_sample_bts(union perf_event *event,
 	printf("\n");
 }
 
-static void process_event(union perf_event *event __unused,
+static void process_event(struct perf_session *session,
+			  union perf_event *event __unused,
 			  struct perf_sample *sample,
 			  struct perf_evsel *evsel,
 			  struct machine *machine,
@@ -415,7 +422,8 @@ static void process_event(union perf_event *event __unused,
 	print_sample_start(sample, thread, attr);
 
 	if (is_bts_event(attr)) {
-		print_sample_bts(event, sample, evsel, machine, thread);
+		print_sample_bts(session, event, sample, evsel,
+				 machine, thread);
 		return;
 	}
 
@@ -431,7 +439,7 @@ static void process_event(union perf_event *event __unused,
 			printf(" ");
 		else
 			printf("\n");
-		perf_event__print_ip(event, sample, machine,
+		perf_event__print_ip(session, event, sample, machine,
 				     PRINT_FIELD(SYM), PRINT_FIELD(DSO),
 				     PRINT_FIELD(SYMOFFSET));
 	}
@@ -488,6 +496,8 @@ static int process_sample_event(struct perf_tool *tool __used,
 				struct perf_evsel *evsel,
 				struct machine *machine)
 {
+	struct perf_script *script = container_of(tool, struct perf_script,
+						  tool);
 	struct addr_location al;
 	struct thread *thread = machine__findnew_thread(machine, event->ip.tid);
 
@@ -520,24 +530,27 @@ static int process_sample_event(struct perf_tool *tool __used,
 	if (cpu_list && !test_bit(sample->cpu, cpu_bitmap))
 		return 0;
 
-	scripting_ops->process_event(event, sample, evsel, machine, thread);
+	scripting_ops->process_event(script->session, event, sample, evsel,
+				     machine, thread);
 
 	evsel->hists.stats.total_period += sample->period;
 	return 0;
 }
 
-static struct perf_tool perf_script = {
-	.sample		 = process_sample_event,
-	.mmap		 = perf_event__process_mmap,
-	.comm		 = perf_event__process_comm,
-	.exit		 = perf_event__process_task,
-	.fork		 = perf_event__process_task,
-	.attr		 = perf_event__process_attr,
-	.event_type	 = perf_event__process_event_type,
-	.tracing_data	 = perf_event__process_tracing_data,
-	.build_id	 = perf_event__process_build_id,
-	.ordered_samples = true,
-	.ordering_requires_timestamps = true,
+static struct perf_script perf_script = {
+	.tool = {
+		.sample		 = process_sample_event,
+		.mmap		 = perf_event__process_mmap,
+		.comm		 = perf_event__process_comm,
+		.exit		 = perf_event__process_task,
+		.fork		 = perf_event__process_task,
+		.attr		 = perf_event__process_attr,
+		.event_type	 = perf_event__process_event_type,
+		.tracing_data	 = perf_event__process_tracing_data,
+		.build_id	 = perf_event__process_build_id,
+		.ordered_samples = true,
+		.ordering_requires_timestamps = true,
+	},
 };
 
 extern volatile int session_done;
@@ -553,7 +566,7 @@ static int __cmd_script(struct perf_session *session)
 
 	signal(SIGINT, sig_handler);
 
-	ret = perf_session__process_events(session, &perf_script);
+	ret = perf_session__process_events(session, &perf_script.tool);
 
 	if (debug_mode)
 		pr_err("Misordered timestamps: %" PRIu64 "\n", nr_unordered);
@@ -1335,10 +1348,13 @@ int cmd_script(int argc, const char **argv, const char *prefix __used)
 	if (!script_name)
 		setup_pager();
 
-	session = perf_session__new(input_name, O_RDONLY, 0, false, &perf_script);
+	session = perf_session__new(input_name, O_RDONLY, 0, false,
+				    &perf_script.tool);
 	if (session == NULL)
 		return -ENOMEM;
 
+	perf_script.session = session;
+
 	if (cpu_list) {
 		if (perf_session__cpu_bitmap(session, cpu_list, cpu_bitmap))
 			return -1;
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 246081a..5ff95e1 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -774,8 +774,9 @@ static void perf_event__process_sample(struct perf_tool *tool,
 
 		if ((sort__has_parent || symbol_conf.use_callchain) &&
 		    sample->callchain) {
-			err = machine__resolve_callchain(machine, al.thread,
-							 sample->callchain, &parent);
+			err = machine__resolve_callchain(top->session,
+						machine, al.thread,
+						sample, &parent);
 			if (err)
 				return;
 		}
diff --git a/tools/perf/util/include/linux/compiler.h b/tools/perf/util/include/linux/compiler.h
index 547628e..2dc8671 100644
--- a/tools/perf/util/include/linux/compiler.h
+++ b/tools/perf/util/include/linux/compiler.h
@@ -10,5 +10,6 @@
 #endif
 
 #define __used		__attribute__((__unused__))
+#define __packed	__attribute__((__packed__))
 
 #endif
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index c14c665..2b2468c 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -156,9 +156,12 @@ int machine__init(struct machine *self, const char *root_dir, pid_t pid);
 void machine__exit(struct machine *self);
 void machine__delete(struct machine *self);
 
-int machine__resolve_callchain(struct machine *machine,
+struct perf_session;
+struct perf_sample;
+int machine__resolve_callchain(struct perf_session *session,
+			       struct machine *machine,
 			       struct thread *thread,
-			       struct ip_callchain *chain,
+			       struct perf_sample *sample,
 			       struct symbol **parent);
 int maps__set_kallsyms_ref_reloc_sym(struct map **maps, const char *symbol_name,
 				     u64 addr);
diff --git a/tools/perf/util/scripting-engines/trace-event-perl.c b/tools/perf/util/scripting-engines/trace-event-perl.c
index 4c1b3d7..40d7a4a 100644
--- a/tools/perf/util/scripting-engines/trace-event-perl.c
+++ b/tools/perf/util/scripting-engines/trace-event-perl.c
@@ -368,7 +368,8 @@ static void perl_process_event_generic(union perf_event *pevent __unused,
 	LEAVE;
 }
 
-static void perl_process_event(union perf_event *pevent,
+static void perl_process_event(struct perf_session *session __used,
+			       union perf_event *pevent,
 			       struct perf_sample *sample,
 			       struct perf_evsel *evsel,
 			       struct machine *machine,
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index acb9795..c6cc453 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -209,7 +209,8 @@ static inline struct event_format *find_cache_event(int type)
 	return event;
 }
 
-static void python_process_event(union perf_event *pevent __unused,
+static void python_process_event(struct perf_session *session,
+				 union perf_event *pevent __unused,
 				 struct perf_sample *sample,
 				 struct perf_evsel *evsel __unused,
 				 struct machine *machine __unused,
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 8dd331d..f3cd0ad 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -16,6 +16,7 @@
 #include "cpumap.h"
 #include "vdso.h"
 #include "perf_regs.h"
+#include "unwind.h"
 
 static int perf_session__open(struct perf_session *self, bool force)
 {
@@ -293,10 +294,11 @@ struct branch_info *machine__resolve_bstack(struct machine *self,
 	return bi;
 }
 
-int machine__resolve_callchain(struct machine *self,
-			       struct thread *thread,
-			       struct ip_callchain *chain,
-			       struct symbol **parent)
+static int
+resolve_callchain_sample(struct machine *machine,
+			 struct thread *thread,
+			 struct ip_callchain *chain,
+			 struct symbol **parent)
 {
 	u8 cpumode = PERF_RECORD_MISC_USER;
 	unsigned int i;
@@ -321,11 +323,14 @@ int machine__resolve_callchain(struct machine *self,
 		if (ip >= PERF_CONTEXT_MAX) {
 			switch (ip) {
 			case PERF_CONTEXT_HV:
-				cpumode = PERF_RECORD_MISC_HYPERVISOR;	break;
+				cpumode = PERF_RECORD_MISC_HYPERVISOR;
+				break;
 			case PERF_CONTEXT_KERNEL:
-				cpumode = PERF_RECORD_MISC_KERNEL;	break;
+				cpumode = PERF_RECORD_MISC_KERNEL;
+				break;
 			case PERF_CONTEXT_USER:
-				cpumode = PERF_RECORD_MISC_USER;	break;
+				cpumode = PERF_RECORD_MISC_USER;
+				break;
 			default:
 				pr_debug("invalid callchain context: "
 					 "%"PRId64"\n", (s64) ip);
@@ -340,7 +345,7 @@ int machine__resolve_callchain(struct machine *self,
 		}
 
 		al.filtered = false;
-		thread__find_addr_location(thread, self, cpumode,
+		thread__find_addr_location(thread, machine, cpumode,
 					   MAP__FUNCTION, ip, &al, NULL);
 		if (al.sym != NULL) {
 			if (sort__has_parent && !*parent &&
@@ -359,6 +364,38 @@ int machine__resolve_callchain(struct machine *self,
 	return 0;
 }
 
+static int unwind_entry(struct unwind_entry *entry, void *arg)
+{
+	struct callchain_cursor *cursor = arg;
+	return callchain_cursor_append(cursor, entry->ip,
+				       entry->map, entry->sym);
+}
+
+int machine__resolve_callchain(struct perf_session *session,
+			       struct machine *self,
+			       struct thread *thread,
+			       struct perf_sample *sample,
+			       struct symbol **parent)
+{
+	int ret;
+
+	callchain_cursor_reset(&callchain_cursor);
+
+	ret = resolve_callchain_sample(self, thread, sample->callchain,
+				       parent);
+	if (ret)
+		return ret;
+
+	if (!session->sample_regs_user ||
+	    !session->sample_stack_user)
+		return 0;
+
+	return unwind__get_entries(unwind_entry,
+				   &callchain_cursor,
+				   self, thread,
+				   session->sample_regs_user, sample);
+}
+
 static int process_event_synth_tracing_data_stub(union perf_event *event __used,
 						 struct perf_session *session __used)
 {
@@ -1523,7 +1560,8 @@ struct perf_evsel *perf_session__find_first_evtype(struct perf_session *session,
 	return NULL;
 }
 
-void perf_event__print_ip(union perf_event *event, struct perf_sample *sample,
+void perf_event__print_ip(struct perf_session *session,
+			  union perf_event *event, struct perf_sample *sample,
 			  struct machine *machine, int print_sym,
 			  int print_dso, int print_symoffset)
 {
@@ -1539,8 +1577,8 @@ void perf_event__print_ip(union perf_event *event, struct perf_sample *sample,
 
 	if (symbol_conf.use_callchain && sample->callchain) {
 
-		if (machine__resolve_callchain(machine, al.thread,
-						sample->callchain, NULL) != 0) {
+		if (machine__resolve_callchain(session, machine,
+					al.thread, sample, NULL) != 0) {
 			if (verbose)
 				error("Failed to resolve callchain. Skipping\n");
 			return;
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 452d826..3474d68 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -155,7 +155,8 @@ static inline int perf_session__synthesize_sample(struct perf_session *session,
 struct perf_evsel *perf_session__find_first_evtype(struct perf_session *session,
 					    unsigned int type);
 
-void perf_event__print_ip(union perf_event *event, struct perf_sample *sample,
+void perf_event__print_ip(struct perf_session *session,
+			  union perf_event *event, struct perf_sample *sample,
 			  struct machine *machine, int print_sym,
 			  int print_dso, int print_symoffset);
 
diff --git a/tools/perf/util/trace-event-scripting.c b/tools/perf/util/trace-event-scripting.c
index 18ae6c1..496c8d2 100644
--- a/tools/perf/util/trace-event-scripting.c
+++ b/tools/perf/util/trace-event-scripting.c
@@ -35,7 +35,8 @@ static int stop_script_unsupported(void)
 	return 0;
 }
 
-static void process_event_unsupported(union perf_event *event __unused,
+static void process_event_unsupported(struct perf_session *session __used,
+				      union perf_event *event __unused,
 				      struct perf_sample *sample __unused,
 				      struct perf_evsel *evsel __unused,
 				      struct machine *machine __unused,
diff --git a/tools/perf/util/trace-event.h b/tools/perf/util/trace-event.h
index 639852a..e38fe82 100644
--- a/tools/perf/util/trace-event.h
+++ b/tools/perf/util/trace-event.h
@@ -72,11 +72,14 @@ struct tracing_data *tracing_data_get(struct list_head *pattrs,
 void tracing_data_put(struct tracing_data *tdata);
 
 
+struct perf_session;
+
 struct scripting_ops {
 	const char *name;
 	int (*start_script) (const char *script, int argc, const char **argv);
 	int (*stop_script) (void);
-	void (*process_event) (union perf_event *event,
+	void (*process_event) (struct perf_session *session,
+			       union perf_event *event,
 			       struct perf_sample *sample,
 			       struct perf_evsel *evsel,
 			       struct machine *machine,
diff --git a/tools/perf/util/unwind.c b/tools/perf/util/unwind.c
new file mode 100644
index 0000000..e49e6a5
--- /dev/null
+++ b/tools/perf/util/unwind.c
@@ -0,0 +1,565 @@
+/*
+ * Post mortem Dwarf CFI based unwinding on top of regs and stack dumps.
+ *
+ * Much of this code has been borrowed from, or heavily inspired by, parts
+ * of the libunwind 0.99 code, which is (amongst other contributors I may
+ * have forgotten):
+ *
+ * Copyright (C) 2002-2007 Hewlett-Packard Co
+ *	Contributed by David Mosberger-Tang <davidm@hpl.hp.com>
+ *
+ * And the bugs have been added by:
+ *
+ * Copyright (C) 2010, Frederic Weisbecker <fweisbec@gmail.com>
+ * Copyright (C) 2012, Jiri Olsa <jolsa@redhat.com>
+ *
+ */
+
+#include <elf.h>
+#include <gelf.h>
+#include <fcntl.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <linux/list.h>
+#include <libunwind.h>
+#include <libunwind-ptrace.h>
+#include "thread.h"
+#include "session.h"
+#include "perf_regs.h"
+#include "unwind.h"
+#include "util.h"
+
+extern int
+UNW_OBJ(dwarf_search_unwind_table) (unw_addr_space_t as,
+				    unw_word_t ip,
+				    unw_dyn_info_t *di,
+				    unw_proc_info_t *pi,
+				    int need_unwind_info, void *arg);
+
+#define dwarf_search_unwind_table UNW_OBJ(dwarf_search_unwind_table)
+
+#define DW_EH_PE_FORMAT_MASK	0x0f	/* format of the encoded value */
+#define DW_EH_PE_APPL_MASK	0x70	/* how the value is to be applied */
+
+/* Pointer-encoding formats: */
+#define DW_EH_PE_omit		0xff
+#define DW_EH_PE_ptr		0x00	/* pointer-sized unsigned value */
+#define DW_EH_PE_udata4		0x03	/* unsigned 32-bit value */
+#define DW_EH_PE_udata8		0x04	/* unsigned 64-bit value */
+#define DW_EH_PE_sdata4		0x0b	/* signed 32-bit value */
+#define DW_EH_PE_sdata8		0x0c	/* signed 64-bit value */
+
+/* Pointer-encoding application: */
+#define DW_EH_PE_absptr		0x00	/* absolute value */
+#define DW_EH_PE_pcrel		0x10	/* rel. to addr. of encoded value */
+
+/*
+ * The following are not documented by LSB v1.3, yet they are used by
+ * GCC; presumably they aren't documented by LSB since they aren't
+ * used on Linux:
+ */
+#define DW_EH_PE_funcrel	0x40	/* start-of-procedure-relative */
+#define DW_EH_PE_aligned	0x50	/* aligned pointer */
+
+/* Flags intentionally not handled, since they're not needed:
+ * #define DW_EH_PE_indirect      0x80
+ * #define DW_EH_PE_uleb128       0x01
+ * #define DW_EH_PE_udata2        0x02
+ * #define DW_EH_PE_sleb128       0x09
+ * #define DW_EH_PE_sdata2        0x0a
+ * #define DW_EH_PE_textrel       0x20
+ * #define DW_EH_PE_datarel       0x30
+ */
+
+struct unwind_info {
+	struct perf_sample	*sample;
+	struct machine		*machine;
+	struct thread		*thread;
+	u64			sample_uregs;
+};
+
+#define dw_read(ptr, type, end) ({	\
+	type *__p = (type *) ptr;	\
+	type  __v;			\
+	if ((__p + 1) > (type *) end)	\
+		return -EINVAL;		\
+	__v = *__p++;			\
+	ptr = (typeof(ptr)) __p;	\
+	__v;				\
+	})
+
+static int __dw_read_encoded_value(u8 **p, u8 *end, u64 *val,
+				   u8 encoding)
+{
+	u8 *cur = *p;
+	*val = 0;
+
+	switch (encoding) {
+	case DW_EH_PE_omit:
+		*val = 0;
+		goto out;
+	case DW_EH_PE_ptr:
+		*val = dw_read(cur, unsigned long, end);
+		goto out;
+	default:
+		break;
+	}
+
+	switch (encoding & DW_EH_PE_APPL_MASK) {
+	case DW_EH_PE_absptr:
+		break;
+	case DW_EH_PE_pcrel:
+		*val = (unsigned long) cur;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if ((encoding & 0x07) == 0x00)
+		encoding |= DW_EH_PE_udata4;
+
+	switch (encoding & DW_EH_PE_FORMAT_MASK) {
+	case DW_EH_PE_sdata4:
+		*val += dw_read(cur, s32, end);
+		break;
+	case DW_EH_PE_udata4:
+		*val += dw_read(cur, u32, end);
+		break;
+	case DW_EH_PE_sdata8:
+		*val += dw_read(cur, s64, end);
+		break;
+	case DW_EH_PE_udata8:
+		*val += dw_read(cur, u64, end);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+ out:
+	*p = cur;
+	return 0;
+}
+
+#define dw_read_encoded_value(ptr, end, enc) ({			\
+	u64 __v;						\
+	if (__dw_read_encoded_value(&ptr, end, &__v, enc)) {	\
+		return -EINVAL;                                 \
+	}                                                       \
+	__v;                                                    \
+	})
+
+static Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
+				    GElf_Shdr *shp, const char *name)
+{
+	Elf_Scn *sec = NULL;
+
+	while ((sec = elf_nextscn(elf, sec)) != NULL) {
+		char *str;
+
+		gelf_getshdr(sec, shp);
+		str = elf_strptr(elf, ep->e_shstrndx, shp->sh_name);
+		if (!strcmp(name, str))
+			break;
+	}
+
+	return sec;
+}
+
+static u64 elf_section_offset(int fd, const char *name)
+{
+	Elf *elf;
+	GElf_Ehdr ehdr;
+	GElf_Shdr shdr;
+	u64 offset = 0;
+
+	elf = elf_begin(fd, PERF_ELF_C_READ_MMAP, NULL);
+	if (elf == NULL)
+		return 0;
+
+	do {
+		if (gelf_getehdr(elf, &ehdr) == NULL)
+			break;
+
+		if (!elf_section_by_name(elf, &ehdr, &shdr, name))
+			break;
+
+		offset = shdr.sh_offset;
+	} while (0);
+
+	elf_end(elf);
+	return offset;
+}
+
+struct table_entry {
+	u32 start_ip_offset;
+	u32 fde_offset;
+};
+
+struct eh_frame_hdr {
+	unsigned char version;
+	unsigned char eh_frame_ptr_enc;
+	unsigned char fde_count_enc;
+	unsigned char table_enc;
+
+	/*
+	 * The rest of the header is variable-length and consists of the
+	 * following members:
+	 *
+	 *	encoded_t eh_frame_ptr;
+	 *	encoded_t fde_count;
+	 */
+
+	/* A single encoded pointer should not be more than 8 bytes. */
+	u64 enc[2];
+
+	/*
+	 * struct {
+	 *    encoded_t start_ip;
+	 *    encoded_t fde_addr;
+	 * } binary_search_table[fde_count];
+	 */
+	char data[0];
+} __packed;
+
+static int unwind_spec_ehframe(struct dso *dso, struct machine *machine,
+			       u64 offset, u64 *table_data, u64 *segbase,
+			       u64 *fde_count)
+{
+	struct eh_frame_hdr hdr;
+	u8 *enc = (u8 *) &hdr.enc;
+	u8 *end = (u8 *) &hdr.data;
+	ssize_t r;
+
+	r = dso__data_read_offset(dso, machine, offset,
+				  (u8 *) &hdr, sizeof(hdr));
+	if (r != sizeof(hdr))
+		return -EINVAL;
+
+	/* We don't need eh_frame_ptr, just skip it. */
+	dw_read_encoded_value(enc, end, hdr.eh_frame_ptr_enc);
+
+	*fde_count  = dw_read_encoded_value(enc, end, hdr.fde_count_enc);
+	*segbase    = offset;
+	*table_data = (enc - (u8 *) &hdr) + offset;
+	return 0;
+}
+
+static int read_unwind_spec(struct dso *dso, struct machine *machine,
+			    u64 *table_data, u64 *segbase, u64 *fde_count)
+{
+	int ret = -EINVAL, fd;
+	u64 offset;
+
+	fd = dso__data_fd(dso, machine);
+	if (fd < 0)
+		return -EINVAL;
+
+	offset = elf_section_offset(fd, ".eh_frame_hdr");
+	close(fd);
+
+	if (offset)
+		ret = unwind_spec_ehframe(dso, machine, offset,
+					  table_data, segbase,
+					  fde_count);
+
+	/* TODO .debug_frame check if eh_frame_hdr fails */
+	return ret;
+}
+
+static struct map *find_map(unw_word_t ip, struct unwind_info *ui)
+{
+	struct addr_location al;
+
+	thread__find_addr_map(ui->thread, ui->machine, PERF_RECORD_MISC_USER,
+			      MAP__FUNCTION, ip, &al);
+	return al.map;
+}
+
+static int
+find_proc_info(unw_addr_space_t as, unw_word_t ip, unw_proc_info_t *pi,
+	       int need_unwind_info, void *arg)
+{
+	struct unwind_info *ui = arg;
+	struct map *map;
+	unw_dyn_info_t di;
+	u64 table_data, segbase, fde_count;
+
+	map = find_map(ip, ui);
+	if (!map || !map->dso)
+		return -EINVAL;
+
+	if (read_unwind_spec(map->dso, ui->machine,
+			     &table_data, &segbase, &fde_count))
+		return -EINVAL;
+
+	memset(&di, 0, sizeof(di));
+	di.format   = UNW_INFO_FORMAT_REMOTE_TABLE;
+	di.start_ip = map->start;
+	di.end_ip   = map->end;
+	di.u.rti.segbase    = map->start + segbase;
+	di.u.rti.table_data = map->start + table_data;
+	di.u.rti.table_len  = fde_count * sizeof(struct table_entry)
+			      / sizeof(unw_word_t);
+	return dwarf_search_unwind_table(as, ip, &di, pi,
+					 need_unwind_info, arg);
+}
+
+static int access_fpreg(unw_addr_space_t __used as, unw_regnum_t __used num,
+			unw_fpreg_t __used *val, int __used __write,
+			void __used *arg)
+{
+	pr_err("unwind: access_fpreg unsupported\n");
+	return -UNW_EINVAL;
+}
+
+static int get_dyn_info_list_addr(unw_addr_space_t __used as,
+				  unw_word_t __used *dil_addr,
+				  void __used *arg)
+{
+	return -UNW_ENOINFO;
+}
+
+static int resume(unw_addr_space_t __used as, unw_cursor_t __used *cu,
+		  void __used *arg)
+{
+	pr_err("unwind: resume unsupported\n");
+	return -UNW_EINVAL;
+}
+
+static int
+get_proc_name(unw_addr_space_t __used as, unw_word_t __used addr,
+		char __used *bufp, size_t __used buf_len,
+		unw_word_t __used *offp, void __used *arg)
+{
+	pr_err("unwind: get_proc_name unsupported\n");
+	return -UNW_EINVAL;
+}
+
+static int access_dso_mem(struct unwind_info *ui, unw_word_t addr,
+			  unw_word_t *data)
+{
+	struct addr_location al;
+	ssize_t size;
+
+	thread__find_addr_map(ui->thread, ui->machine, PERF_RECORD_MISC_USER,
+			      MAP__FUNCTION, addr, &al);
+	if (!al.map) {
+		pr_debug("unwind: no map for %lx\n", (unsigned long)addr);
+		return -1;
+	}
+
+	if (!al.map->dso)
+		return -1;
+
+	size = dso__data_read_addr(al.map->dso, al.map, ui->machine,
+				   addr, (u8 *) data, sizeof(*data));
+
+	return !(size == sizeof(*data));
+}
+
+static int reg_value(unw_word_t *valp, struct regs_dump *regs, int id,
+		     u64 sample_regs)
+{
+	int i, idx = 0;
+
+	if (!(sample_regs & (1ULL << id)))
+		return -EINVAL;
+
+	for (i = 0; i < id; i++) {
+		if (sample_regs & (1ULL << i))
+			idx++;
+	}
+
+	*valp = regs->regs[idx];
+	return 0;
+}
+
+static int access_mem(unw_addr_space_t __used as,
+		      unw_word_t addr, unw_word_t *valp,
+		      int __write, void *arg)
+{
+	struct unwind_info *ui = arg;
+	struct stack_dump *stack = &ui->sample->user_stack;
+	unw_word_t start, end;
+	int offset;
+	int ret;
+
+	/* Don't support write, probably not needed. */
+	if (__write || !stack || !ui->sample->user_regs.regs) {
+		*valp = 0;
+		return 0;
+	}
+
+	ret = reg_value(&start, &ui->sample->user_regs, PERF_REG_SP,
+			ui->sample_uregs);
+	if (ret)
+		return ret;
+
+	end = start + stack->size;
+
+	/* Check overflow. */
+	if (addr + sizeof(unw_word_t) < addr)
+		return -EINVAL;
+
+	if (addr < start || addr + sizeof(unw_word_t) >= end) {
+		ret = access_dso_mem(ui, addr, valp);
+		if (ret) {
+			pr_debug("unwind: access_mem %p not inside range %p-%p\n",
+				(void *)addr, (void *)start, (void *)end);
+			*valp = 0;
+			return ret;
+		}
+		return 0;
+	}
+
+	offset = addr - start;
+	*valp  = *(unw_word_t *)&stack->data[offset];
+	pr_debug("unwind: access_mem %p %lx\n",
+		 (void *)addr, (unsigned long)*valp);
+	return 0;
+}
+
+static int access_reg(unw_addr_space_t __used as,
+		      unw_regnum_t regnum, unw_word_t *valp,
+		      int __write, void *arg)
+{
+	struct unwind_info *ui = arg;
+	int id, ret;
+
+	/* Don't support write, I suspect we don't need it. */
+	if (__write) {
+		pr_err("unwind: access_reg w %d\n", regnum);
+		return 0;
+	}
+
+	if (!ui->sample->user_regs.regs) {
+		*valp = 0;
+		return 0;
+	}
+
+	id = unwind__arch_reg_id(regnum);
+	if (id < 0)
+		return -EINVAL;
+
+	ret = reg_value(valp, &ui->sample->user_regs, id, ui->sample_uregs);
+	if (ret) {
+		pr_err("unwind: can't read reg %d\n", regnum);
+		return ret;
+	}
+
+	pr_debug("unwind: reg %d, val %lx\n", regnum, (unsigned long)*valp);
+	return 0;
+}
+
+static void put_unwind_info(unw_addr_space_t __used as,
+			    unw_proc_info_t *pi __used,
+			    void *arg __used)
+{
+	pr_debug("unwind: put_unwind_info called\n");
+}
+
+static int entry(u64 ip, struct thread *thread, struct machine *machine,
+		 unwind_entry_cb_t cb, void *arg)
+{
+	struct unwind_entry e;
+	struct addr_location al;
+
+	thread__find_addr_location(thread, machine,
+				   PERF_RECORD_MISC_USER,
+				   MAP__FUNCTION, ip, &al, NULL);
+
+	e.ip = ip;
+	e.map = al.map;
+	e.sym = al.sym;
+
+	pr_debug("unwind: %s:ip = 0x%" PRIx64 " (0x%" PRIx64 ")\n",
+		 al.sym ? al.sym->name : "[]",
+		 ip,
+		 al.map ? al.map->map_ip(al.map, ip) : (u64) 0);
+
+	return cb(&e, arg);
+}
+
+static void display_error(int err)
+{
+	switch (err) {
+	case UNW_EINVAL:
+		pr_err("unwind: Only supports local.\n");
+		break;
+	case UNW_EUNSPEC:
+		pr_err("unwind: Unspecified error.\n");
+		break;
+	case UNW_EBADREG:
+		pr_err("unwind: Register unavailable.\n");
+		break;
+	default:
+		break;
+	}
+}
+
+static unw_accessors_t accessors = {
+	.find_proc_info		= find_proc_info,
+	.put_unwind_info	= put_unwind_info,
+	.get_dyn_info_list_addr	= get_dyn_info_list_addr,
+	.access_mem		= access_mem,
+	.access_reg		= access_reg,
+	.access_fpreg		= access_fpreg,
+	.resume			= resume,
+	.get_proc_name		= get_proc_name,
+};
+
+static int get_entries(struct unwind_info *ui, unwind_entry_cb_t cb,
+		       void *arg)
+{
+	unw_addr_space_t addr_space;
+	unw_cursor_t c;
+	int ret;
+
+	addr_space = unw_create_addr_space(&accessors, 0);
+	if (!addr_space) {
+		pr_err("unwind: Can't create unwind address space.\n");
+		return -ENOMEM;
+	}
+
+	ret = unw_init_remote(&c, addr_space, ui);
+	if (ret)
+		display_error(ret);
+
+	while (!ret && (unw_step(&c) > 0)) {
+		unw_word_t ip;
+
+		unw_get_reg(&c, UNW_REG_IP, &ip);
+		ret = entry(ip, ui->thread, ui->machine, cb, arg);
+	}
+
+	unw_destroy_addr_space(addr_space);
+	return ret;
+}
+
+int unwind__get_entries(unwind_entry_cb_t cb, void *arg,
+			struct machine *machine, struct thread *thread,
+			u64 sample_uregs, struct perf_sample *data)
+{
+	unw_word_t ip;
+	struct unwind_info ui = {
+		.sample       = data,
+		.sample_uregs = sample_uregs,
+		.thread       = thread,
+		.machine      = machine,
+	};
+	int ret;
+
+	if (!data->user_regs.regs)
+		return -EINVAL;
+
+	ret = reg_value(&ip, &data->user_regs, PERF_REG_IP, sample_uregs);
+	if (ret)
+		return ret;
+
+	ret = entry(ip, thread, machine, cb, arg);
+	if (ret)
+		return -ENOMEM;
+
+	return get_entries(&ui, cb, arg);
+}
diff --git a/tools/perf/util/unwind.h b/tools/perf/util/unwind.h
new file mode 100644
index 0000000..919bd6a
--- /dev/null
+++ b/tools/perf/util/unwind.h
@@ -0,0 +1,34 @@
+#ifndef __UNWIND_H
+#define __UNWIND_H
+
+#include "types.h"
+#include "event.h"
+#include "symbol.h"
+
+struct unwind_entry {
+	struct map	*map;
+	struct symbol	*sym;
+	u64		ip;
+};
+
+typedef int (*unwind_entry_cb_t)(struct unwind_entry *entry, void *arg);
+
+#ifndef NO_LIBUNWIND_SUPPORT
+int unwind__get_entries(unwind_entry_cb_t cb, void *arg,
+			struct machine *machine,
+			struct thread *thread,
+			u64 sample_uregs,
+			struct perf_sample *data);
+int unwind__arch_reg_id(int regnum);
+#else
+static inline int
+unwind__get_entries(unwind_entry_cb_t cb __used, void *arg __used,
+		    struct machine *machine __used,
+		    struct thread *thread __used,
+		    u64 sample_uregs __used,
+		    struct perf_sample *data __used)
+{
+	return 0;
+}
+#endif /* NO_LIBUNWIND_SUPPORT */
+#endif /* __UNWIND_H */
-- 
1.7.7.6



* [PATCH 17/19] perf, tool: Support for dwarf mode callchain on perf record
  2012-06-11 13:19 [RFCv5 00/19] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (15 preceding siblings ...)
  2012-06-11 13:20 ` [PATCH 16/19] perf, tool: Support for dwarf cfi unwinding on post processing Jiri Olsa
@ 2012-06-11 13:20 ` Jiri Olsa
  2012-06-11 13:20 ` [PATCH 18/19] perf, tool: Add dso data caching Jiri Olsa
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2012-06-11 13:20 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, robert.richter, fche,
	linux-kernel, masami.hiramatsu.pt, drepper, asharma,
	benjamin.redelings, Jiri Olsa

This patch enables perf to use the dwarf unwind code.

It extends the perf record '-g' option with the following arguments:
  'fp'           - provides framepointer based user
                   stack backtrace
  'dwarf[,size]' - provides dwarf (libunwind) based user stack
                   backtrace. The size specifies the size of the
                   user stack dump. If omitted, it defaults to 8192.

If libunwind is found during the perf build, the 'dwarf'
argument becomes available for the record command. 'fp' stays the
default option in any case.

Examples: (perf compiled with libunwind)

   perf record -g dwarf ls
      - provides dwarf unwind with 8192 as stack dump size

   perf record -g dwarf,4096 ls
      - provides dwarf unwind with 4096 as stack dump size

   perf record -g -- ls
   perf record -g fp ls
      - provides frame pointer unwind
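
As a rough illustration of the argument syntax above (this is not the code
from the patch), a 'mode[,size]' string could be parsed as in the sketch
below. The names cg_mode, cg_opts, cg_parse and CG_DEFAULT_DUMP_SIZE are
made up for this example; only the 'fp'/'dwarf' keywords and the 8192
default are taken from the description.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration only. */
enum cg_mode { CG_NONE, CG_FP, CG_DWARF };

struct cg_opts {
	enum cg_mode mode;
	unsigned long dump_size;	/* user stack dump size, 0 for fp */
};

#define CG_DEFAULT_DUMP_SIZE 8192UL

/* Parse "fp", "dwarf" or "dwarf,<size>". Returns 0 on success, -1 on error. */
int cg_parse(const char *arg, struct cg_opts *opts)
{
	char buf[32];
	char *comma, *size_str = NULL;

	assert(arg && opts);

	/* Work on a private copy so the caller's string stays untouched. */
	if (strlen(arg) >= sizeof(buf))
		return -1;
	strcpy(buf, arg);

	/* Split at the first comma, if there is one. */
	comma = strchr(buf, ',');
	if (comma) {
		*comma = '\0';
		size_str = comma + 1;
	}

	if (!strcmp(buf, "fp")) {
		/* Frame pointer mode takes no further argument. */
		if (size_str)
			return -1;
		opts->mode = CG_FP;
		opts->dump_size = 0;
		return 0;
	}

	if (!strcmp(buf, "dwarf")) {
		unsigned long size = CG_DEFAULT_DUMP_SIZE;

		if (size_str) {
			char *end;

			size = strtoul(size_str, &end, 0);
			/* Reject empty input and trailing garbage. */
			if (end == size_str || *end != '\0')
				return -1;
		}
		opts->mode = CG_DWARF;
		opts->dump_size = size;
		return 0;
	}

	return -1;
}
```

Unlike a parser that stores the size before checking it, this sketch leaves
opts untouched whenever it returns -1.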

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/builtin-record.c |   86 ++++++++++++++++++++++++++++++++++++++++++-
 tools/perf/perf.h           |    9 ++++-
 tools/perf/util/evsel.c     |   10 ++++-
 3 files changed, 101 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index f95840d..6dbad83 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -31,6 +31,15 @@
 #include <sched.h>
 #include <sys/mman.h>
 
+#define CALLCHAIN_HELP "do call-graph (stack chain/backtrace) recording: "
+
+#ifdef NO_LIBUNWIND_SUPPORT
+static char callchain_help[] = CALLCHAIN_HELP "[fp]";
+#else
+static unsigned long default_stack_dump_size = 8192;
+static char callchain_help[] = CALLCHAIN_HELP "[fp] dwarf";
+#endif
+
 enum write_mode_t {
 	WRITE_FORCE,
 	WRITE_APPEND
@@ -732,6 +741,78 @@ error:
 	return ret;
 }
 
+static int
+parse_callchain_opt(const struct option *opt, const char *arg,
+		    int unset)
+{
+	struct perf_record *rec = (struct perf_record *)opt->value;
+	char *tok, *name, *saveptr = NULL;
+	char buf[20];
+	int ret = -1;
+
+	/* --no-call-graph */
+	if (unset)
+		return 0;
+
+	/* We specified default option if none is provided. */
+	BUG_ON(!arg);
+
+	/* We need buffer that we know we can write to. */
+	snprintf(buf, sizeof(buf), "%s", arg);
+
+	tok = strtok_r((char *)buf, ",", &saveptr);
+	name = tok ? : (char *)buf;
+
+	do {
+		/* Framepointer style */
+		if (!strncmp(name, "fp", sizeof("fp"))) {
+			if (!strtok_r(NULL, ",", &saveptr)) {
+				rec->opts.call_graph = CALLCHAIN_FP;
+				ret = 0;
+			} else
+				pr_err("callchain: No more arguments "
+				       "needed for -g fp\n");
+			break;
+
+#ifndef NO_LIBUNWIND_SUPPORT
+		/* Dwarf style */
+		} else if (!strncmp(name, "dwarf", sizeof("dwarf"))) {
+			ret = 0;
+			rec->opts.call_graph = CALLCHAIN_DWARF;
+			rec->opts.stack_dump_size = default_stack_dump_size;
+
+			tok = strtok_r(NULL, ",", &saveptr);
+			if (tok) {
+				char *endptr;
+				unsigned long size;
+
+				size = strtoul(tok, &endptr, 0);
+				if (*endptr) {
+					pr_err("callchain: Incorrect stack "
+					       "dump size: %s\n", tok);
+					ret = -1;
+				}
+
+				rec->opts.stack_dump_size = size;
+			}
+
+			pr_debug("callchain: stack dump size %lu\n",
+				 rec->opts.stack_dump_size);
+#endif
+		} else {
+			pr_err("callchain: Unknown -g option "
+			       "value: %s\n", name);
+			break;
+		}
+
+	} while (0);
+
+	if (!ret)
+		pr_debug("callchain: type %d\n", rec->opts.call_graph);
+
+	return ret;
+}
+
 static const char * const record_usage[] = {
 	"perf record [<options>] [<command>]",
 	"perf record [<options>] -- <command> [<options>]",
@@ -803,8 +884,9 @@ const struct option record_options[] = {
 		     "number of mmap data pages"),
 	OPT_BOOLEAN(0, "group", &record.opts.group,
 		    "put the counters into a counter group"),
-	OPT_BOOLEAN('g', "call-graph", &record.opts.call_graph,
-		    "do call-graph (stack chain/backtrace) recording"),
+	OPT_CALLBACK_DEFAULT('g', "call-graph", &record, "mode,dump_size",
+			     callchain_help, &parse_callchain_opt,
+			     "fp"),
 	OPT_INCR('v', "verbose", &verbose,
 		    "be more verbose (show counter open errors, etc)"),
 	OPT_BOOLEAN('q', "quiet", &quiet, "don't print any message"),
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index f960ccb..09272db 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -209,9 +209,15 @@ void pthread__unblock_sigwinch(void);
 
 #include "util/target.h"
 
+enum perf_call_graph_mode {
+	CALLCHAIN_NONE,
+	CALLCHAIN_FP,
+	CALLCHAIN_DWARF
+};
+
 struct perf_record_opts {
 	struct perf_target target;
-	bool	     call_graph;
+	int	     call_graph;
 	bool	     group;
 	bool	     inherit_stat;
 	bool	     no_delay;
@@ -230,6 +236,7 @@ struct perf_record_opts {
 	u64          branch_stack;
 	u64	     default_interval;
 	u64	     user_interval;
+	unsigned long stack_dump_size;
 };
 
 #endif
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 32ade1f..be824da 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -17,6 +17,7 @@
 #include "thread_map.h"
 #include "target.h"
 #include "../../include/linux/perf_event.h"
+#include "perf_regs.h"
 
 #define FD(e, x, y) (*(int *)xyarray__entry(e->fd, x, y))
 #define GROUP_FD(group_fd, cpu) (*(int *)xyarray__entry(group_fd, cpu, 0))
@@ -196,9 +197,16 @@ void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
 		attr->mmap_data = track;
 	}
 
-	if (opts->call_graph)
+	if (opts->call_graph) {
 		attr->sample_type	|= PERF_SAMPLE_CALLCHAIN;
 
+		if (opts->call_graph == CALLCHAIN_DWARF) {
+			attr->sample_regs_user = PERF_REGS_MASK;
+			attr->sample_stack_user = opts->stack_dump_size;
+			attr->exclude_callchain_user = 1;
+		}
+	}
+
 	if (perf_target__has_cpu(&opts->target))
 		attr->sample_type	|= PERF_SAMPLE_CPU;
 
-- 
1.7.7.6



* [PATCH 18/19] perf, tool: Add dso data caching
  2012-06-11 13:19 [RFCv5 00/19] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (16 preceding siblings ...)
  2012-06-11 13:20 ` [PATCH 17/19] perf, tool: Support for dwarf mode callchain on perf record Jiri Olsa
@ 2012-06-11 13:20 ` Jiri Olsa
  2012-06-11 13:20 ` [PATCH 19/19] perf, tool: Add dso data caching tests Jiri Olsa
  2012-06-11 21:44 ` [RFCv5 00/19] perf: Add backtrace post dwarf unwind Benjamin Redelings
  19 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2012-06-11 13:20 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, robert.richter, fche,
	linux-kernel, masami.hiramatsu.pt, drepper, asharma,
	benjamin.redelings, Jiri Olsa

Adding dso data caching so we don't need to open/read/close
the file each time we want dso data.

The DSO data caching affects the following functions:
  dso__data_read_offset
  dso__data_read_addr

Each DSO read tries to find the data (based on offset) inside
the cache. If it's not present it fills the cache from file,
and returns the data. If it is present, data are returned
with no file read.

Each uncached read fills the cache with a cache page sized/aligned
chunk of DSO data. The cache page size is hardcoded to 4096.
The cache is an RB tree with the file offset as the sort key.
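
The page arithmetic described above can be sketched as follows. This is a
standalone illustration, not the code from this patch: CACHE_PAGE_SIZE,
page_start, page_chunk and pages_touched are names invented for the
example, and the RB tree and file I/O are left out entirely.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_PAGE_SIZE 4096ULL			/* same value as the hardcoded page size */
#define CACHE_PAGE_MASK (~(CACHE_PAGE_SIZE - 1))

/* File offset of the cache page that contains 'offset'. */
uint64_t page_start(uint64_t offset)
{
	return offset & CACHE_PAGE_MASK;
}

/*
 * Number of bytes one cache page can serve for a read of 'size' bytes
 * at 'offset'. 'page_bytes' is how many valid bytes the page holds,
 * which is less than CACHE_PAGE_SIZE for the last page of a file.
 */
uint64_t page_chunk(uint64_t offset, uint64_t size, uint64_t page_bytes)
{
	uint64_t in_page = offset - page_start(offset);
	uint64_t avail;

	/* Offset lies past the valid data of this page: nothing to copy. */
	if (in_page >= page_bytes)
		return 0;

	avail = page_bytes - in_page;
	return size < avail ? size : avail;
}

/* How many full cache pages a read of 'size' bytes at 'offset' touches. */
unsigned int pages_touched(uint64_t offset, uint64_t size)
{
	unsigned int n = 0;

	while (size) {
		uint64_t chunk = page_chunk(offset, size, CACHE_PAGE_SIZE);

		offset += chunk;
		size   -= chunk;
		n++;
	}
	return n;
}
```

A 10 byte read at offset 4090 is therefore served in two chunks (6 bytes
from the first page, 4 from the second), which is why a caller has to loop
until the requested size is consumed or a chunk of 0 signals end of data.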

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/util/symbol.c |  154 ++++++++++++++++++++++++++++++++++++++++------
 tools/perf/util/symbol.h |   11 +++
 2 files changed, 147 insertions(+), 18 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 59012fc..8a66cce 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -29,6 +29,7 @@
 #define NT_GNU_BUILD_ID 3
 #endif
 
+static void dso_cache__free(struct rb_root *root);
 static bool dso__build_id_equal(const struct dso *dso, u8 *build_id);
 static int elf_read_build_id(Elf *elf, void *bf, size_t size);
 static void dsos__add(struct list_head *head, struct dso *dso);
@@ -342,6 +343,7 @@ struct dso *dso__new(const char *name)
 		dso__set_short_name(dso, dso->name);
 		for (i = 0; i < MAP__NR_TYPES; ++i)
 			dso->symbols[i] = dso->symbol_names[i] = RB_ROOT;
+		dso->cache = RB_ROOT;
 		dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
 		dso->data_type   = DSO_BINARY_TYPE__NOT_FOUND;
 		dso->loaded = 0;
@@ -377,6 +379,7 @@ void dso__delete(struct dso *dso)
 		free((char *)dso->short_name);
 	if (dso->lname_alloc)
 		free(dso->long_name);
+	dso_cache__free(&dso->cache);
 	free(dso);
 }
 
@@ -2934,22 +2937,87 @@ int dso__data_fd(struct dso *dso, struct machine *machine)
 	return -EINVAL;
 }
 
-static ssize_t dso_cache_read(struct dso *dso __used, u64 offset __used,
-			      u8 *data __used, ssize_t size __used)
+static void
+dso_cache__free(struct rb_root *root)
 {
-	return -EINVAL;
+	struct rb_node *next = rb_first(root);
+
+	while (next) {
+		struct dso_cache *cache;
+
+		cache = rb_entry(next, struct dso_cache, rb_node);
+		next = rb_next(&cache->rb_node);
+		rb_erase(&cache->rb_node, root);
+		free(cache);
+	}
 }
 
-static int dso_cache_add(struct dso *dso __used, u64 offset __used,
-			 u8 *data __used, ssize_t size __used)
+static struct dso_cache*
+dso_cache__find(struct rb_root *root, u64 offset)
 {
-	return 0;
+	struct rb_node **p = &root->rb_node;
+	struct rb_node *parent = NULL;
+	struct dso_cache *cache;
+
+	while (*p != NULL) {
+		u64 end;
+
+		parent = *p;
+		cache = rb_entry(parent, struct dso_cache, rb_node);
+		end = cache->offset + DSO__DATA_CACHE_SIZE;
+
+		if (offset < cache->offset)
+			p = &(*p)->rb_left;
+		else if (offset >= end)
+			p = &(*p)->rb_right;
+		else
+			return cache;
+	}
+	return NULL;
+}
+
+static void
+dso_cache__insert(struct rb_root *root, struct dso_cache *new)
+{
+	struct rb_node **p = &root->rb_node;
+	struct rb_node *parent = NULL;
+	struct dso_cache *cache;
+	u64 offset = new->offset;
+
+	while (*p != NULL) {
+		u64 end;
+
+		parent = *p;
+		cache = rb_entry(parent, struct dso_cache, rb_node);
+		end = cache->offset + DSO__DATA_CACHE_SIZE;
+
+		if (offset < cache->offset)
+			p = &(*p)->rb_left;
+		else if (offset >= end)
+			p = &(*p)->rb_right;
+	}
+
+	rb_link_node(&new->rb_node, parent, p);
+	rb_insert_color(&new->rb_node, root);
+}
+
+static ssize_t
+dso_cache__memcpy(struct dso_cache *cache, u64 offset,
+		  u8 *data, u64 size)
+{
+	u64 cache_offset = offset - cache->offset;
+	u64 cache_size   = min(cache->size - cache_offset, size);
+
+	memcpy(data, cache->data + cache_offset, cache_size);
+	return cache_size;
 }
 
-static ssize_t read_dso_data(struct dso *dso, struct machine *machine,
-		     u64 offset, u8 *data, ssize_t size)
+static ssize_t
+dso_cache__read(struct dso *dso, struct machine *machine,
+		 u64 offset, u8 *data, ssize_t size)
 {
-	ssize_t rsize = -1;
+	struct dso_cache *cache;
+	ssize_t ret;
 	int fd;
 
 	fd = dso__data_fd(dso, machine);
@@ -2957,28 +3025,78 @@ static ssize_t read_dso_data(struct dso *dso, struct machine *machine,
 		return -1;
 
 	do {
-		if (-1 == lseek(fd, offset, SEEK_SET))
+		u64 cache_offset;
+
+		ret = -ENOMEM;
+
+		cache = zalloc(sizeof(*cache) + DSO__DATA_CACHE_SIZE);
+		if (!cache)
 			break;
 
-		rsize = read(fd, data, size);
-		if (-1 == rsize)
+		cache_offset = offset & DSO__DATA_CACHE_MASK;
+		ret = -EINVAL;
+
+		if (-1 == lseek(fd, cache_offset, SEEK_SET))
 			break;
 
-		if (dso_cache_add(dso, offset, data, size))
-			pr_err("Failed to add data int dso cache.");
+		ret = read(fd, cache->data, DSO__DATA_CACHE_SIZE);
+		if (ret <= 0)
+			break;
+
+		cache->offset = cache_offset;
+		cache->size   = ret;
+		dso_cache__insert(&dso->cache, cache);
+
+		ret = dso_cache__memcpy(cache, offset, data, size);
 
 	} while (0);
 
+	if (ret <= 0)
+		free(cache);
+
 	close(fd);
-	return rsize;
+	return ret;
+}
+
+static ssize_t dso_cache_read(struct dso *dso, struct machine *machine,
+			      u64 offset, u8 *data, ssize_t size)
+{
+	struct dso_cache *cache;
+
+	cache = dso_cache__find(&dso->cache, offset);
+	if (cache)
+		return dso_cache__memcpy(cache, offset, data, size);
+	else
+		return dso_cache__read(dso, machine, offset, data, size);
 }
 
 ssize_t dso__data_read_offset(struct dso *dso, struct machine *machine,
 			      u64 offset, u8 *data, ssize_t size)
 {
-	if (dso_cache_read(dso, offset, data, size))
-		return read_dso_data(dso, machine, offset, data, size);
-	return 0;
+	ssize_t r = 0;
+	u8 *p = data;
+
+	do {
+		ssize_t ret;
+
+		ret = dso_cache_read(dso, machine, offset, p, size);
+		if (ret < 0)
+			return ret;
+
+		/* Reached EOF, return what we have. */
+		if (!ret)
+			break;
+
+		BUG_ON(ret > size);
+
+		r      += ret;
+		p      += ret;
+		offset += ret;
+		size   -= ret;
+
+	} while (size);
+
+	return r;
 }
 
 ssize_t dso__data_read_addr(struct dso *dso, struct map *map,
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 77044cb..67313f0 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -181,10 +181,21 @@ enum dso_swap_type {
 	DSO_SWAP__YES,
 };
 
+#define DSO__DATA_CACHE_SIZE 4096
+#define DSO__DATA_CACHE_MASK (~(DSO__DATA_CACHE_SIZE - 1))
+
+struct dso_cache {
+	struct rb_node	rb_node;
+	u64 offset;
+	u64 size;
+	char data[0];
+};
+
 struct dso {
 	struct list_head node;
 	struct rb_root	 symbols[MAP__NR_TYPES];
 	struct rb_root	 symbol_names[MAP__NR_TYPES];
+	struct rb_root	 cache;
 	enum dso_kernel_type	kernel;
 	enum dso_swap_type	needs_swap;
 	enum dso_binary_type	symtab_type;
-- 
1.7.7.6



* [PATCH 19/19] perf, tool: Add dso data caching tests
  2012-06-11 13:19 [RFCv5 00/19] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (17 preceding siblings ...)
  2012-06-11 13:20 ` [PATCH 18/19] perf, tool: Add dso data caching Jiri Olsa
@ 2012-06-11 13:20 ` Jiri Olsa
  2012-06-11 21:44 ` [RFCv5 00/19] perf: Add backtrace post dwarf unwind Benjamin Redelings
  19 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2012-06-11 13:20 UTC (permalink / raw)
  To: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec
  Cc: eranian, gorcunov, tzanussi, mhiramat, robert.richter, fche,
	linux-kernel, masami.hiramatsu.pt, drepper, asharma,
	benjamin.redelings, Jiri Olsa

Adding an automated test for DSO data reading. It tests raw and
cached reads from different file and cache locations.
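
To make the expectations in the test table easier to follow, here is a
small sketch of what a correct read should return from a file whose byte i
holds (i % 10). The helper expected_read and the FILE_SIZE macro are
invented for this illustration and are not part of the patch; FILE_SIZE
just mirrors the 20 cache pages the test file uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FILE_SIZE (4096 * 20)	/* 20 cache pages, like the test file */

/*
 * Fill 'out' with the bytes a correct read of 'want' bytes at 'offset'
 * should produce, and return how many bytes that is. The count is short
 * when the read runs into the end of the file.
 */
size_t expected_read(uint64_t offset, uint8_t *out, size_t want)
{
	size_t n = 0;

	while (n < want && offset + n < FILE_SIZE) {
		out[n] = (uint8_t)((offset + n) % 10);
		n++;
	}
	return n;
}
```

For example, a 10 byte request 3 bytes before the end of the file yields
only 3 bytes (7, 8, 9), and a request at offset 4090 spans the boundary
between the first and second cache page.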

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/Makefile             |    1 +
 tools/perf/builtin-test.c       |    4 +
 tools/perf/util/dso-test-data.c |  154 +++++++++++++++++++++++++++++++++++++++
 tools/perf/util/symbol.h        |    1 +
 4 files changed, 160 insertions(+), 0 deletions(-)
 create mode 100644 tools/perf/util/dso-test-data.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 92c3f8f..503ed03 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -362,6 +362,7 @@ LIB_OBJS += $(OUTPUT)util/usage.o
 LIB_OBJS += $(OUTPUT)util/wrapper.o
 LIB_OBJS += $(OUTPUT)util/sigchain.o
 LIB_OBJS += $(OUTPUT)util/symbol.o
+LIB_OBJS += $(OUTPUT)util/dso-test-data.o
 LIB_OBJS += $(OUTPUT)util/color.o
 LIB_OBJS += $(OUTPUT)util/pager.o
 LIB_OBJS += $(OUTPUT)util/header.o
diff --git a/tools/perf/builtin-test.c b/tools/perf/builtin-test.c
index 8c2fcb0..f6011a5 100644
--- a/tools/perf/builtin-test.c
+++ b/tools/perf/builtin-test.c
@@ -1143,6 +1143,10 @@ static struct test {
 		.func = test__perf_pmu,
 	},
 	{
+		.desc = "Test dso data interface",
+		.func = dso__test_data,
+	},
+	{
 		.func = NULL,
 	},
 };
diff --git a/tools/perf/util/dso-test-data.c b/tools/perf/util/dso-test-data.c
new file mode 100644
index 0000000..ec62977
--- /dev/null
+++ b/tools/perf/util/dso-test-data.c
@@ -0,0 +1,154 @@
+
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <string.h>
+
+#include "symbol.h"
+
+#define TEST_ASSERT_VAL(text, cond) \
+do { \
+	if (!(cond)) { \
+		pr_debug("FAILED %s:%d %s\n", __FILE__, __LINE__, text); \
+		return -1; \
+	} \
+} while (0)
+
+static char *test_file(int size)
+{
+	static char buf_templ[] = "/tmp/test-XXXXXX";
+	char *templ = buf_templ;
+	int fd, i;
+	unsigned char *buf;
+
+	fd = mkostemp(templ, O_CREAT|O_WRONLY|O_TRUNC);
+
+	buf = malloc(size);
+	if (!buf) {
+		close(fd);
+		return NULL;
+	}
+
+	for (i = 0; i < size; i++)
+		buf[i] = (unsigned char) ((int) i % 10);
+
+	if (size != write(fd, buf, size))
+		templ = NULL;
+
+	close(fd);
+	return templ;
+}
+
+#define TEST_FILE_SIZE (DSO__DATA_CACHE_SIZE * 20)
+
+struct test_data__offset {
+	off_t offset;
+	u8 data[10];
+	int size;
+};
+
+struct test_data__offset offsets[] = {
+	/* Fill first cache page. */
+	{
+		.offset = 10,
+		.data   = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 },
+		.size   = 10,
+	},
+	/* Read first cache page. */
+	{
+		.offset = 10,
+		.data   = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 },
+		.size   = 10,
+	},
+	/* Fill cache boundary pages. */
+	{
+		.offset = DSO__DATA_CACHE_SIZE - DSO__DATA_CACHE_SIZE % 10,
+		.data   = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 },
+		.size   = 10,
+	},
+	/* Read cache boundary pages. */
+	{
+		.offset = DSO__DATA_CACHE_SIZE - DSO__DATA_CACHE_SIZE % 10,
+		.data   = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 },
+		.size   = 10,
+	},
+	/* Fill final cache page. */
+	{
+		.offset = TEST_FILE_SIZE - 10,
+		.data   = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 },
+		.size   = 10,
+	},
+	/* Read final cache page. */
+	{
+		.offset = TEST_FILE_SIZE - 10,
+		.data   = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 },
+		.size   = 10,
+	},
+	/* Short read at the end of the final cache page. */
+	{
+		.offset = TEST_FILE_SIZE - 3,
+		.data   = { 7, 8, 9, 0, 0, 0, 0, 0, 0, 0 },
+		.size   = 3,
+	},
+};
+
+#define OFFSETS_CNT (sizeof(offsets) / sizeof(struct test_data__offset))
+
+int dso__test_data(void)
+{
+	struct machine machine;
+	struct dso *dso;
+	char *file = test_file(TEST_FILE_SIZE);
+	int i;
+
+	TEST_ASSERT_VAL("No test file", file);
+
+	memset(&machine, 0, sizeof(machine));
+
+	dso = dso__new((const char *)file);
+
+	/* Basic 10 bytes tests. */
+	for (i = 0; i < (int) OFFSETS_CNT; i++) {
+		struct test_data__offset *data = &offsets[i];
+		ssize_t size;
+		u8 buf[10];
+
+		memset(buf, 0, 10);
+		size = dso__data_read_offset(dso, &machine, data->offset,
+				     buf, 10);
+
+		TEST_ASSERT_VAL("Wrong size", size == data->size);
+		TEST_ASSERT_VAL("Wrong data", !memcmp(buf, data->data, 10));
+	}
+
+	/* Read across multiple cache pages. */
+	{
+		ssize_t size;
+		int c;
+		u8 *buf;
+
+		buf = malloc(TEST_FILE_SIZE);
+		TEST_ASSERT_VAL("ENOMEM\n", buf);
+
+		/* First iteration to fill caches, second one to read them. */
+		for (c = 0; c < 2; c++) {
+			memset(buf, 0, TEST_FILE_SIZE);
+			size = dso__data_read_offset(dso, &machine, 10,
+						     buf, TEST_FILE_SIZE);
+
+			TEST_ASSERT_VAL("Wrong size",
+				size == (TEST_FILE_SIZE - 10));
+
+			for (i = 0; i < size; i++)
+				TEST_ASSERT_VAL("Wrong data",
+					buf[i] == (i % 10));
+		}
+
+		free(buf);
+	}
+
+	dso__delete(dso);
+	unlink(file);
+	return 0;
+}
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 67313f0..17a7e8f 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -324,4 +324,5 @@ ssize_t dso__data_read_offset(struct dso *dso, struct machine *machine,
 ssize_t dso__data_read_addr(struct dso *dso, struct map *map,
 			    struct machine *machine, u64 addr,
 			    u8 *data, ssize_t size);
+int dso__test_data(void);
 #endif /* __PERF_SYMBOL */
-- 
1.7.7.6



* Re: [RFCv5 00/19] perf: Add backtrace post dwarf unwind
  2012-06-11 13:19 [RFCv5 00/19] perf: Add backtrace post dwarf unwind Jiri Olsa
                   ` (18 preceding siblings ...)
  2012-06-11 13:20 ` [PATCH 19/19] perf, tool: Add dso data caching tests Jiri Olsa
@ 2012-06-11 21:44 ` Benjamin Redelings
  19 siblings, 0 replies; 37+ messages in thread
From: Benjamin Redelings @ 2012-06-11 21:44 UTC (permalink / raw)
  To: linux-kernel


Hi,

     I want to say that these patches have been quite useful to me as a 
non-kernel-developer doing ordinary CPU profiling on x86_64, and I do 
hope they enter mainline soon.  I've been able to track down a ton of 
CPU time wasted in library functions like _int_malloc and strcmp_ssse3 
using preliminary versions of the patch (v4 and v5).  Since symbols
like _int_malloc are in system libraries like libc-2.13.so or
libstdc++.so.6.0.17, I couldn't tell what functions were responsible for 
calling them.  (I am not ready to recompile the C library with frame 
pointers just to get accurate profiling information!)  The post-hoc 
DWARF unwinder makes things "just work" without recompiling all system 
libraries, and I'm already relying on it to find further speedups.


     Now for some details.

1. [FIXED in v5] The 'perf script' output now gets the DSO for each
address in the backtrace correct, instead of assuming that all stack 
addresses have the same DSO as the stack top.

2. The 'perf report' command is still much slower when the samples are 
taken with dwarf than when they are taken using frame pointers.  
Specifically, it took about 80 seconds with DWARF, and less than 1 
second with fp.  However, I appreciate that these are run-time options, 
so that people who want the speed can use fp instead of DWARF.  Is there 
much chances of this post-processing getting faster?

3. The main (only?) symbol that didn't get a backtrace was __times in 
libc6.  I'm not clear why this would be different.  For example:

bali-phy 25064 10751.879731: cycles:
         ffffffff810136b4 __cycles_2_ns ([kernel.kallsyms])
             7f2b8c30332a __times (/lib/x86_64-linux-gnu/libc-2.13.so)
                       1c [unknown] ([unknown])

...

bali-phy 25064 10751.935645: cycles:
         ffffffff8103eb33 jiffies_to_clock_t ([kernel.kallsyms])
             7f2b8c30332a __times (/lib/x86_64-linux-gnu/libc-2.13.so)
             7fff0baf0540 [unknown] ([unknown])

...

bali-phy 25064 10752.059581: cycles:
         ffffffff8103eb33 jiffies_to_clock_t ([kernel.kallsyms])
             7f2b8c30332a __times (/lib/x86_64-linux-gnu/libc-2.13.so)
             7fff0baefd60 [unknown] ([unknown])

I know this is being called by my program.  From gdb:

#0  __times (buf=0x7fffffffc710) at ../sysdeps/unix/sysv/linux/times.c:28
#1  0x00000000005d3f85 in boost::chrono::process_user_cpu_clock::now ()
     at ../../../../../master/boost/include/boost/chrono/detail/inlined/posix/process_cpu_clocks.hpp:124
#2  0x000000000053e8e3 in total_cpu_time ()
     at ../../../master/src/timer_stack.C:38

4.  Hmm... running the same profiling command with "perf record -g fp
<command>" appears to not record backtraces at all, even though my 
program was compiled with frame pointers.  Perhaps this part broke?

-BenRI

P.S. This is using the latest git version of libunwind.  I applied v5 of 
Jiri's perf-unwind patches against Linus's current tip.

P.P.S. I've attached a profile graph from gprof2dot.py 
(http://code.google.com/p/jrfonseca/wiki/Gprof2Dot).  Now that I can get 
backtraces from library symbols, 97% of cpu is spent in main and 
children of main, versus about 80% before.  (I'm not sure we should 
expect 100% even if all backtraces work perfectly, since functions with 
very few samples may be ignored.)

On 06/11/2012 09:19 AM, Jiri Olsa wrote:
> hi,
> besides fixing several issues, going back to the original design
> because the last one was considered too generic.. now we have:
>
>   sample_regs_user  - != 0 triggers the user level regs dump
>   sample_stack_user - != 0 triggers the user stack dump
>
> We can allway extend this in future with new mask and flags
> for IRQ/PEBS regs.
>
> patches available also as tarball in here:
> http://people.redhat.com/~jolsa/perf_post_unwind_v5.tar.bz2
>
> v5 changes:
>     patch 1/19 - having just one enum set of the perf registers
>     patch 2/19 - using for_each_set_bit for scanning the mask
>                - single regs enum for both 32 and 64 bits versions
>                - using regs mask != 0 trigger to trigger the regs dump
>     patch 5/19 - adding perf_output_skip so we can skip undumped part of the stack in RB
>     patch 6/19 - using stack size != 0 trigger to trigger the stack dump
>                - do not zero the memory for non retrieved part of the stack dump
>     patch 7/19 - adding exclude_callchain_kernel attribute
>     patch 8/19 - this could be taken without the rest of the series
>
> v4 changes:
>     - no real change from v3, just rebase
>     - v3 patch 06/17 got already merged
>
> v3 changes:
>     patch 01/17
>     - added HAVE_PERF_REGS config option
>     patch 02/17, 04/17
>     - regs and stack perf interface is more general now
>     patch 06/17
>     - unrelated online fix for i386 compilation
>     patch 16/17
>     - few namespace fixies
>
> ---
> Adding the post unwinding user stack backtrace using dwarf unwind
> via libunwind. The original work was done by Frederic. I mostly took
> his patches and make them compile in current kernel code plus I added
> some stuff here and there.
>
> The main idea is to store user registers and portion of user
> stack when the sample data during the record phase. Then during
> the report, when the data is presented, perform the actual dwarf
> dwarf unwind.
>
> attached patches:
>    01/19 perf: Unified API to record selective sets of arch registers
>    02/19 perf: Add ability to attach user level registers dump to sample
>    03/19 perf, x86: Add copy_from_user_nmi_nochk for best effort copy
>    04/19 perf: Factor __output_copy to be usable with specific copy function
>    05/19 perf: Add perf_output_skip function to skip bytes in sample
>    06/19 perf: Add ability to attach user stack dump to sample
>    07/19 perf: Add attribute to filter out callchains
>    08/19 perf, tool: Remove unsused evsel parameter from machine__resolve_callchain
>    09/19 perf, tool: Factor DSO symtab types to generic binary types
>    10/19 perf, tool: Add interface to read DSO image data
>    11/19 perf, tool: Add '.note' check into search for NOTE section
>    12/19 perf, tool: Back [vdso] DSO with real data
>    13/19 perf, tool: Add interface to arch registers sets
>    14/19 perf, tool: Add libunwind dependency for dwarf cfi unwinding
>    15/19 perf, tool: Support user regs and stack in sample parsing
>    16/19 perf, tool: Support for dwarf cfi unwinding on post processing
>    17/19 perf, tool: Support for dwarf mode callchain on perf record
>    18/19 perf, tool: Add dso data caching
>    19/19 perf, tool: Add dso data caching tests
>
>
> I tested on Fedora. There was not much gain on i386, because the
> binaries are compiled with frame pointers. Thought the dwarf
> backtrace is more accurade and unwraps calls in more details
> (functions that do not set the frame pointers).
>
> I could see some improvement on x86_64, where I got full backtrace
> where current code could got just the first address out of the
> instruction pointer.
>
> Example on x86_64:
> [dwarf]
>     perf record -g -e syscalls:sys_enter_write date
>
>     100.00%     date  libc-2.14.90.so  [.] __GI___libc_write
>                 |
>                 --- __GI___libc_write
>                     _IO_file_write@@GLIBC_2.2.5
>                     new_do_write
>                     _IO_do_write@@GLIBC_2.2.5
>                     _IO_file_overflow@@GLIBC_2.2.5
>                     0x4022cd
>                     0x401ee6
>                     __libc_start_main
>                     0x4020b9
>
>
> [frame pointer]
>     perf record -g fp -e syscalls:sys_enter_write date
>
>     100.00%     date  libc-2.14.90.so  [.] __GI___libc_write
>                 |
>                 --- __GI___libc_write
>
> Also I tested on coreutils binaries mainly, but I could see
> getting wider backtraces with dwarf unwind for more complex
> applications like firefox.
>
> The unwind should go through the [vdso] object. I haven't studied
> the [vsyscall] yet, so not sure there.
>
> Attached patches should work on both x86 and x86_64. I did
> some initial testing so far.
>
> The unwind backtrace can be interrupted for the following reasons:
>      - bug in unwind information of processed shared library
>      - bug in unwind processing code (most likely ;) )
>      - insufficient dump stack size
>      - wrong register value - x86_64 does not store whole
>        set of registers when in exception, but so far
>        it looks like RIP and RSP should be enough
>
> thanks for comments,
> jirka
> ---
>   arch/Kconfig                                       |    6 +
>   arch/x86/Kconfig                                   |    1 +
>   arch/x86/include/asm/perf_event.h                  |    2 +
>   arch/x86/include/asm/perf_regs.h                   |   34 ++
>   arch/x86/include/asm/uaccess.h                     |    2 +
>   arch/x86/kernel/Makefile                           |    2 +
>   arch/x86/kernel/perf_regs.c                        |   91 ++++
>   arch/x86/lib/usercopy.c                            |   15 +-
>   include/linux/perf_event.h                         |   24 +-
>   include/linux/perf_regs.h                          |   19 +
>   kernel/events/callchain.c                          |   25 +-
>   kernel/events/core.c                               |  132 +++++-
>   kernel/events/internal.h                           |   69 ++-
>   kernel/events/ring_buffer.c                        |   10 +-
>   tools/perf/Makefile                                |   45 ++-
>   tools/perf/arch/x86/Makefile                       |    3 +
>   tools/perf/arch/x86/include/perf_regs.h            |   80 +++
>   tools/perf/arch/x86/util/unwind.c                  |  111 ++++
>   tools/perf/builtin-record.c                        |   86 +++-
>   tools/perf/builtin-report.c                        |   24 +-
>   tools/perf/builtin-script.c                        |   56 ++-
>   tools/perf/builtin-test.c                          |    7 +-
>   tools/perf/builtin-top.c                           |    7 +-
>   tools/perf/config/feature-tests.mak                |   25 +
>   tools/perf/perf.h                                  |    9 +-
>   tools/perf/util/annotate.c                         |    2 +-
>   tools/perf/util/dso-test-data.c                    |  154 ++++++
>   tools/perf/util/event.h                            |   15 +-
>   tools/perf/util/evlist.c                           |   16 +
>   tools/perf/util/evlist.h                           |    2 +
>   tools/perf/util/evsel.c                            |   35 ++-
>   tools/perf/util/include/linux/compiler.h           |    1 +
>   tools/perf/util/map.c                              |   23 +-
>   tools/perf/util/map.h                              |    9 +-
>   tools/perf/util/perf_regs.h                        |   14 +
>   tools/perf/util/python.c                           |    3 +-
>   .../perf/util/scripting-engines/trace-event-perl.c |    3 +-
>   .../util/scripting-engines/trace-event-python.c    |    3 +-
>   tools/perf/util/session.c                          |  110 ++++-
>   tools/perf/util/session.h                          |   17 +-
>   tools/perf/util/symbol.c                           |  435 +++++++++++++---
>   tools/perf/util/symbol.h                           |   52 ++-
>   tools/perf/util/trace-event-scripting.c            |    3 +-
>   tools/perf/util/trace-event.h                      |    5 +-
>   tools/perf/util/unwind.c                           |  565 ++++++++++++++++++++
>   tools/perf/util/unwind.h                           |   34 ++
>   tools/perf/util/vdso.c                             |   90 +++
>   tools/perf/util/vdso.h                             |    8 +
>   48 files changed, 2278 insertions(+), 206 deletions(-)
>


[-- Attachment #2: call-graph-dwarf.pdf --]
[-- Type: application/pdf, Size: 71783 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 02/19] perf: Add ability to attach user level registers dump to sample
  2012-06-11 13:19 ` [PATCH 02/19] perf: Add ability to attach user level registers dump to sample Jiri Olsa
@ 2012-06-13 11:16   ` Stephane Eranian
  2012-06-13 13:12     ` Jiri Olsa
  2012-06-13 13:29   ` Stephane Eranian
  1 sibling, 1 reply; 37+ messages in thread
From: Stephane Eranian @ 2012-06-13 11:16 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper, asharma, benjamin.redelings

On Mon, Jun 11, 2012 at 3:19 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> Introducing sample_regs_user bitmask into perf_event_attr
> struct to define the user level registers we want to attach
> to the sample. The dump itself is triggered once the
> sample_regs_user is not empty.
>
> Only user level registers are dumped at the moment, meaning the
> register values of the user space context as it was before the
> user entered the kernel for whatever reason (syscall, irq,
> exception, or a PMI happening in userspace).
>
> The layout of the sample_regs_user bitmap is described in
> asm/perf_regs.h for archs that support register dump.
>
> This is going to be useful to bring Dwarf CFI based stack
> unwinding on top of samples.
>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
>  include/linux/perf_event.h |   10 ++++++-
>  kernel/events/core.c       |   61 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 70 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 1ce887a..d66cbeb 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -271,7 +271,13 @@ struct perf_event_attr {
>                __u64           bp_len;
>                __u64           config2; /* extension of config1 */
>        };
> -       __u64   branch_sample_type; /* enum branch_sample_type */
> +       __u64   branch_sample_type; /* enum perf_branch_sample_type */
> +
> +       /*
> +        * Defines set of user regs to dump on samples.
> +        * See asm/perf_regs.h for details.
> +        */
> +       __u64   sample_regs_user;
>  };
That's not enough. You also need to define PERF_SAMPLE_USER_REGS
for sample_type. Although sample_regs_user might look like it's enough
to capture regs, there is a problem when it comes to parsing the record. You
need an ordering guarantee that is explicitly spelled out in the API (the header
file). In your current patch, I have no way of knowing that the sample_regs_user
registers are saved after BRANCH_STACK (should you have that enabled). Remember
that you can turn sampled fields on and off at will in sample_type. Yet to find
each field when parsing, you need to know the order.

The enum perf_event_sample_format provides that ordering information. You
need to add a new type for sampling user regs.

>
>  /*
> @@ -609,6 +615,7 @@ struct perf_guest_info_callbacks {
>  #include <linux/static_key.h>
>  #include <linux/atomic.h>
>  #include <linux/sysfs.h>
> +#include <linux/perf_regs.h>
>  #include <asm/local.h>
>
>  struct perf_callchain_entry {
> @@ -1131,6 +1138,7 @@ struct perf_sample_data {
>        struct perf_callchain_entry     *callchain;
>        struct perf_raw_record          *raw;
>        struct perf_branch_stack        *br_stack;
> +       struct pt_regs                  *regs_user;
>  };
>
>  static inline void perf_sample_data_init(struct perf_sample_data *data,
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index f85c015..e4df59d 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3750,6 +3750,33 @@ int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
>  }
>  EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);
>
> +static void
> +perf_output_sample_regs(struct perf_output_handle *handle,
> +                       struct pt_regs *regs, u64 mask)
> +{
> +       int bit;
> +
> +       for_each_set_bit(bit, (const unsigned long *) &mask,
> +                        sizeof(mask) * BITS_PER_BYTE) {
> +               u64 val;
> +
> +               val = perf_reg_value(regs, bit);
> +               perf_output_put(handle, val);
> +       }
> +}
> +
> +static struct pt_regs *perf_sample_regs_user(struct pt_regs *regs)
> +{
> +       if (!user_mode(regs)) {
> +               if (current->mm)
> +                       regs = task_pt_regs(current);
> +               else
> +                       regs = NULL;
> +       }
> +
> +       return regs;
> +}
> +
>  static void __perf_event_header__init_id(struct perf_event_header *header,
>                                         struct perf_sample_data *data,
>                                         struct perf_event *event)
> @@ -4010,6 +4037,23 @@ void perf_output_sample(struct perf_output_handle *handle,
>                        perf_output_put(handle, nr);
>                }
>        }
> +
> +       if (event->attr.sample_regs_user) {
> +               u64 avail = (data->regs_user != NULL);
> +
> +               /*
> +                * If there are no regs to dump, notice it through
> +                * first u64 being zero.
> +                */
> +               perf_output_put(handle, avail);
> +
> +               if (avail) {
> +                       u64 mask = event->attr.sample_regs_user;
> +                       perf_output_sample_regs(handle,
> +                                               data->regs_user,
> +                                               mask);
> +               }
> +       }
>  }
>
>  void perf_prepare_sample(struct perf_event_header *header,
> @@ -4061,6 +4105,19 @@ void perf_prepare_sample(struct perf_event_header *header,
>                }
>                header->size += size;
>        }
> +
> +       if (event->attr.sample_regs_user) {
> +               /* regs dump available bool */
> +               int size = sizeof(u64);
> +
> +               data->regs_user = perf_sample_regs_user(regs);
> +               if (data->regs_user) {
> +                       u64 mask = event->attr.sample_regs_user;
> +                       size += hweight64(mask) * sizeof(u64);
> +               }
> +
> +               header->size += size;
> +       }
>  }
>
>  static void perf_event_output(struct perf_event *event,
> @@ -6110,6 +6167,10 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr,
>                        attr->branch_sample_type = mask;
>                }
>        }
> +
> +       if (attr->sample_regs_user)
> +               ret = perf_reg_validate(attr->sample_regs_user);
> +
>  out:
>        return ret;
>
> --
> 1.7.7.6
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 02/19] perf: Add ability to attach user level registers dump to sample
  2012-06-13 11:16   ` Stephane Eranian
@ 2012-06-13 13:12     ` Jiri Olsa
  2012-06-13 13:18       ` Stephane Eranian
  2012-06-13 13:37       ` Peter Zijlstra
  0 siblings, 2 replies; 37+ messages in thread
From: Jiri Olsa @ 2012-06-13 13:12 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper, asharma, benjamin.redelings

On Wed, Jun 13, 2012 at 01:16:44PM +0200, Stephane Eranian wrote:
> On Mon, Jun 11, 2012 at 3:19 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> > Introducing sample_regs_user bitmask into perf_event_attr
> > struct to define the user level registers we want to attach
> > to the sample. The dump itself is triggered once the
> > sample_regs_user is not empty.
> >
> > Only user level registers are dumped at the moment, meaning the
> > register values of the user space context as it was before the
> > user entered the kernel for whatever reason (syscall, irq,
> > exception, or a PMI happening in userspace).
> >
> > The layout of the sample_regs_user bitmap is described in
> > asm/perf_regs.h for archs that support register dump.
> >
> > This is going to be useful to bring Dwarf CFI based stack
> > unwinding on top of samples.
> >
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> > ---
> >  include/linux/perf_event.h |   10 ++++++-
> >  kernel/events/core.c       |   61 ++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 70 insertions(+), 1 deletions(-)
> >
> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> > index 1ce887a..d66cbeb 100644
> > --- a/include/linux/perf_event.h
> > +++ b/include/linux/perf_event.h
> > @@ -271,7 +271,13 @@ struct perf_event_attr {
> >                __u64           bp_len;
> >                __u64           config2; /* extension of config1 */
> >        };
> > -       __u64   branch_sample_type; /* enum branch_sample_type */
> > +       __u64   branch_sample_type; /* enum perf_branch_sample_type */
> > +
> > +       /*
> > +        * Defines set of user regs to dump on samples.
> > +        * See asm/perf_regs.h for details.
> > +        */
> > +       __u64   sample_regs_user;
> >  };
> That's not enough. You also need to define PERF_SAMPLE_USER_REGS
> for sample_type. Although sample_regs_user might look like it's enough
> to capture regs, there is a problem when it comes to parsing the record. You
> need an ordering guarantee that is explicitly spelled out in the API (the header
> file). In your current patch, I have no way of knowing that the sample_regs_user
> registers are saved after BRANCH_STACK (should you have that enabled). Remember
> that you can turn sampled fields on and off at will in sample_type. Yet to find
> each field when parsing, you need to know the order.

Well, sample_regs_user != 0 substitutes for the PERF_SAMPLE_USER_REGS bit.
The behaviour is the same as if that bit were defined.

After the last discussion the idea was to keep this with just sample_regs_user != 0.

I don't see any limitation except for being inconsistent with the rest of
the sample dumps. I'm all for having that PERF_SAMPLE_USER_REGS bit and
the user stack bit as well.

jirka

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 02/19] perf: Add ability to attach user level registers dump to sample
  2012-06-13 13:12     ` Jiri Olsa
@ 2012-06-13 13:18       ` Stephane Eranian
  2012-06-13 13:23         ` Jiri Olsa
  2012-06-13 13:40         ` Peter Zijlstra
  2012-06-13 13:37       ` Peter Zijlstra
  1 sibling, 2 replies; 37+ messages in thread
From: Stephane Eranian @ 2012-06-13 13:18 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper, asharma, benjamin.redelings

On Wed, Jun 13, 2012 at 3:12 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> On Wed, Jun 13, 2012 at 01:16:44PM +0200, Stephane Eranian wrote:
>> On Mon, Jun 11, 2012 at 3:19 PM, Jiri Olsa <jolsa@redhat.com> wrote:
>> > Introducing sample_regs_user bitmask into perf_event_attr
>> > struct to define the user level registers we want to attach
>> > to the sample. The dump itself is triggered once the
>> > sample_regs_user is not empty.
>> >
>> > Only user level registers are dumped at the moment, meaning the
>> > register values of the user space context as it was before the
>> > user entered the kernel for whatever reason (syscall, irq,
>> > exception, or a PMI happening in userspace).
>> >
>> > The layout of the sample_regs_user bitmap is described in
>> > asm/perf_regs.h for archs that support register dump.
>> >
>> > This is going to be useful to bring Dwarf CFI based stack
>> > unwinding on top of samples.
>> >
>> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
>> > Signed-off-by: Jiri Olsa <jolsa@redhat.com>
>> > ---
>> >  include/linux/perf_event.h |   10 ++++++-
>> >  kernel/events/core.c       |   61 ++++++++++++++++++++++++++++++++++++++++++++
>> >  2 files changed, 70 insertions(+), 1 deletions(-)
>> >
>> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> > index 1ce887a..d66cbeb 100644
>> > --- a/include/linux/perf_event.h
>> > +++ b/include/linux/perf_event.h
>> > @@ -271,7 +271,13 @@ struct perf_event_attr {
>> >                __u64           bp_len;
>> >                __u64           config2; /* extension of config1 */
>> >        };
>> > -       __u64   branch_sample_type; /* enum branch_sample_type */
>> > +       __u64   branch_sample_type; /* enum perf_branch_sample_type */
>> > +
>> > +       /*
>> > +        * Defines set of user regs to dump on samples.
>> > +        * See asm/perf_regs.h for details.
>> > +        */
>> > +       __u64   sample_regs_user;
>> >  };
>> That's not enough. You also need to define PERF_SAMPLE_USER_REGS
>> for sample_type. Although sample_regs_user might look like it's enough
>> to capture regs, there is a problem when it comes to parsing the record. You
>> need an ordering guarantee that is explicitly spelled out in the API (the header
>> file). In your current patch, I have no way of knowing that the sample_regs_user
>> registers are saved after BRANCH_STACK (should you have that enabled). Remember
>> that you can turn sampled fields on and off at will in sample_type. Yet to find
>> each field when parsing, you need to know the order.
>
> Well, sample_regs_user != 0 substitutes for the PERF_SAMPLE_USER_REGS bit.
> The behaviour is the same as if that bit were defined.
>
No it's not the same.  Looking at sample_regs_user != 0, do you know in which
order the regs array is going to appear RELATIVE to the other captured
information?

Take sample_type = IP|CPU|PERIOD, sample_regs_user = EAX

Now, I get the raw record and want to parse it. Which comes first, the user
regs or the IP, CPU, PERIOD?

Worse, if I add more entries to PERF_SAMPLE_*, are they laid out before or
after the regs?

If you look carefully at perf_output_sample(), you will notice that data is
written in the exact order of the enum perf_event_sample_format. Otherwise
there is no way to parse this in the right order without looking at the kernel
source code, which is not the right way.
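
To make the ordering argument concrete, here is a minimal user-space sketch
(not part of the patch) of a parser that walks a raw sample in a fixed,
documented order. The SAMPLE_REGS_USER bit and its value are assumptions for
illustration, as are the struct and helper names. The regs layout (one
"available" u64 followed by one u64 per bit set in the mask) follows what the
patch writes in perf_output_sample(). Only a subset of fields is handled, and
the CPU word is kept raw.

```c
#include <stddef.h>
#include <stdint.h>

/* Subset of sample_type bits, values as in enum perf_event_sample_format. */
#define SAMPLE_IP         (1ULL << 0)
#define SAMPLE_CPU        (1ULL << 7)
#define SAMPLE_PERIOD     (1ULL << 8)
/* Assumption: a new bit for the user regs dump, placed after existing ones. */
#define SAMPLE_REGS_USER  (1ULL << 12)

struct parsed_sample {
	uint64_t ip;
	uint64_t cpu_word;       /* raw u64 holding cpu + reserved */
	uint64_t period;
	uint64_t regs_avail;     /* 0 when no user regs were dumped */
	const uint64_t *regs;    /* one entry per bit set in the regs mask */
	size_t nr_regs;
};

/* Count set bits in the register mask (the job hweight64() does in the kernel). */
static size_t count_bits(uint64_t mask)
{
	size_t n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * Walk the record in the order the bits are defined and return the
 * number of u64 words consumed.
 */
static size_t parse_sample(const uint64_t *rec, uint64_t sample_type,
			   uint64_t regs_mask, struct parsed_sample *out)
{
	const uint64_t *p = rec;

	out->ip = out->cpu_word = out->period = out->regs_avail = 0;
	out->regs = NULL;
	out->nr_regs = 0;

	if (sample_type & SAMPLE_IP)
		out->ip = *p++;
	if (sample_type & SAMPLE_CPU)
		out->cpu_word = *p++;
	if (sample_type & SAMPLE_PERIOD)
		out->period = *p++;
	if (sample_type & SAMPLE_REGS_USER) {
		out->regs_avail = *p++;
		if (out->regs_avail) {
			out->nr_regs = count_bits(regs_mask);
			out->regs = p;
			p += out->nr_regs;
		}
	}
	return (size_t)(p - rec);
}
```

With sample_type = IP|CPU|PERIOD plus the regs bit and a one-register mask, a
record carrying a regs dump is five u64 words long. The parser can only find
the regs block because its position relative to the other fields is fixed by
the bit order.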


> After the last discussion the idea was to keep this with just sample_regs_user != 0.
>
> I don't see any limitation except for being inconsistent with the rest of
> the sample dumps. I'm all for having that PERF_SAMPLE_USER_REGS bit and
> the user stack bit as well.
>
> jirka

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 02/19] perf: Add ability to attach user level registers dump to sample
  2012-06-13 13:18       ` Stephane Eranian
@ 2012-06-13 13:23         ` Jiri Olsa
  2012-06-13 13:25           ` Stephane Eranian
  2012-06-13 13:40         ` Peter Zijlstra
  1 sibling, 1 reply; 37+ messages in thread
From: Jiri Olsa @ 2012-06-13 13:23 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper, asharma, benjamin.redelings

On Wed, Jun 13, 2012 at 03:18:54PM +0200, Stephane Eranian wrote:
> On Wed, Jun 13, 2012 at 3:12 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> > On Wed, Jun 13, 2012 at 01:16:44PM +0200, Stephane Eranian wrote:
> >> On Mon, Jun 11, 2012 at 3:19 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> >> > Introducing sample_regs_user bitmask into perf_event_attr
> >> > struct to define the user level registers we want to attach
> >> > to the sample. The dump itself is triggered once the
> >> > sample_regs_user is not empty.
> >> >
> >> > Only user level registers are dumped at the moment, meaning the
> >> > register values of the user space context as it was before the
> >> > user entered the kernel for whatever reason (syscall, irq,
> >> > exception, or a PMI happening in userspace).
> >> >
> >> > The layout of the sample_regs_user bitmap is described in
> >> > asm/perf_regs.h for archs that support register dump.
> >> >
> >> > This is going to be useful to bring Dwarf CFI based stack
> >> > unwinding on top of samples.
> >> >
> >> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> >> > Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> >> > ---
> >> >  include/linux/perf_event.h |   10 ++++++-
> >> >  kernel/events/core.c       |   61 ++++++++++++++++++++++++++++++++++++++++++++
> >> >  2 files changed, 70 insertions(+), 1 deletions(-)
> >> >
> >> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> >> > index 1ce887a..d66cbeb 100644
> >> > --- a/include/linux/perf_event.h
> >> > +++ b/include/linux/perf_event.h
> >> > @@ -271,7 +271,13 @@ struct perf_event_attr {
> >> >                __u64           bp_len;
> >> >                __u64           config2; /* extension of config1 */
> >> >        };
> >> > -       __u64   branch_sample_type; /* enum branch_sample_type */
> >> > +       __u64   branch_sample_type; /* enum perf_branch_sample_type */
> >> > +
> >> > +       /*
> >> > +        * Defines set of user regs to dump on samples.
> >> > +        * See asm/perf_regs.h for details.
> >> > +        */
> >> > +       __u64   sample_regs_user;
> >> >  };
> >> That's not enough. You also need to define PERF_SAMPLE_USER_REGS
> >> for sample_type. Although sample_regs_user might look like it's enough
> >> to capture regs, there is a problem when it comes to parsing the record. You
> >> need an ordering guarantee that is explicitly spelled out in the API (the header
> >> file). In your current patch, I have no way of knowing that the sample_regs_user
> >> registers are saved after BRANCH_STACK (should you have that enabled). Remember
> >> that you can turn sampled fields on and off at will in sample_type. Yet to find
> >> each field when parsing, you need to know the order.
> >
> > Well, sample_regs_user != 0 substitutes for the PERF_SAMPLE_USER_REGS bit.
> > The behaviour is the same as if that bit were defined.
> >
> No it's not the same.  Looking at sample_regs_user != 0, do you know in which
> order the regs array is going to appear RELATIVE to the other captured
> information?
> 
> Take sample_type = IP|CPU|PERIOD, sample_regs_user = EAX
>
> Now, I get the raw record and want to parse it. Which comes first, the user
> regs or the IP, CPU, PERIOD?
>
> Worse, if I add more entries to PERF_SAMPLE_*, are they laid out before or
> after the regs?

after.. but only because I know that.. yep, I think you're right,
we should track it in the sample_type enum.. I'll add those 2 bits

jirka

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 02/19] perf: Add ability to attach user level registers dump to sample
  2012-06-13 13:23         ` Jiri Olsa
@ 2012-06-13 13:25           ` Stephane Eranian
  0 siblings, 0 replies; 37+ messages in thread
From: Stephane Eranian @ 2012-06-13 13:25 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper, asharma, benjamin.redelings

On Wed, Jun 13, 2012 at 3:23 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> On Wed, Jun 13, 2012 at 03:18:54PM +0200, Stephane Eranian wrote:
>> On Wed, Jun 13, 2012 at 3:12 PM, Jiri Olsa <jolsa@redhat.com> wrote:
>> > On Wed, Jun 13, 2012 at 01:16:44PM +0200, Stephane Eranian wrote:
>> >> On Mon, Jun 11, 2012 at 3:19 PM, Jiri Olsa <jolsa@redhat.com> wrote:
>> >> > Introducing sample_regs_user bitmask into perf_event_attr
>> >> > struct to define the user level registers we want to attach
>> >> > to the sample. The dump itself is triggered once the
>> >> > sample_regs_user is not empty.
>> >> >
>> >> > Only user level registers are dumped at the moment, meaning the
>> >> > register values of the user space context as it was before the
>> >> > user entered the kernel for whatever reason (syscall, irq,
>> >> > exception, or a PMI happening in userspace).
>> >> >
>> >> > The layout of the sample_regs_user bitmap is described in
>> >> > asm/perf_regs.h for archs that support register dump.
>> >> >
>> >> > This is going to be useful to bring Dwarf CFI based stack
>> >> > unwinding on top of samples.
>> >> >
>> >> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
>> >> > Signed-off-by: Jiri Olsa <jolsa@redhat.com>
>> >> > ---
>> >> >  include/linux/perf_event.h |   10 ++++++-
>> >> >  kernel/events/core.c       |   61 ++++++++++++++++++++++++++++++++++++++++++++
>> >> >  2 files changed, 70 insertions(+), 1 deletions(-)
>> >> >
>> >> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> >> > index 1ce887a..d66cbeb 100644
>> >> > --- a/include/linux/perf_event.h
>> >> > +++ b/include/linux/perf_event.h
>> >> > @@ -271,7 +271,13 @@ struct perf_event_attr {
>> >> >                __u64           bp_len;
>> >> >                __u64           config2; /* extension of config1 */
>> >> >        };
>> >> > -       __u64   branch_sample_type; /* enum branch_sample_type */
>> >> > +       __u64   branch_sample_type; /* enum perf_branch_sample_type */
>> >> > +
>> >> > +       /*
>> >> > +        * Defines set of user regs to dump on samples.
>> >> > +        * See asm/perf_regs.h for details.
>> >> > +        */
>> >> > +       __u64   sample_regs_user;
>> >> >  };
>> >> That's not enough. You also need to define PERF_SAMPLE_USER_REGS
>> >> for sample_type. Although sample_regs_user might look like it's enough
>> >> to capture regs, there is a problem when it comes to parsing the record. You
>> >> need an ordering guarantee that is explicitly spelled out in the API (the header
>> >> file). In your current patch, I have no way of knowing that the sample_regs_user
>> >> registers are saved after BRANCH_STACK (should you have that enabled). Remember
>> >> that you can turn sampled fields on and off at will in sample_type. Yet to find
>> >> each field when parsing, you need to know the order.
>> >
>> > Well, sample_regs_user != 0 substitutes for the PERF_SAMPLE_USER_REGS bit.
>> > The behaviour is the same as if that bit were defined.
>> >
>> No it's not the same.  Looking at sample_regs_user != 0, do you know in which
>> order the regs array is going to appear RELATIVE to the other captured
>> information?
>>
>> Take sample_type = IP|CPU|PERIOD, sample_regs_user = EAX
>>
>> Now, I get the raw record and want to parse it. Which comes first, the user
>> regs or the IP, CPU, PERIOD?
>>
>> Worse, if I add more entries to PERF_SAMPLE_*, are they laid out before or
>> after the regs?
>
> after.. but only because I know that.. yep, I think you're right,
> we should track it in the sample_type enum.. I'll add those 2 bits
>
Good thanks.

> jirka

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 02/19] perf: Add ability to attach user level registers dump to sample
  2012-06-11 13:19 ` [PATCH 02/19] perf: Add ability to attach user level registers dump to sample Jiri Olsa
  2012-06-13 11:16   ` Stephane Eranian
@ 2012-06-13 13:29   ` Stephane Eranian
  1 sibling, 0 replies; 37+ messages in thread
From: Stephane Eranian @ 2012-06-13 13:29 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, a.p.zijlstra, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper, asharma, benjamin.redelings

On Mon, Jun 11, 2012 at 3:19 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> Introducing sample_regs_user bitmask into perf_event_attr
> struct to define the user level registers we want to attach
> to the sample. The dump itself is triggered once the
> sample_regs_user is not empty.
>
> Only user level registers are dumped at the moment, meaning the
> register values of the user space context as it was before the
> user entered the kernel for whatever reason (syscall, irq,
> exception, or a PMI happening in userspace).
>
> The layout of the sample_regs_user bitmap is described in
> asm/perf_regs.h for archs that support register dump.
>
> This is going to be useful to bring Dwarf CFI based stack
> unwinding on top of samples.
>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
>  include/linux/perf_event.h |   10 ++++++-
>  kernel/events/core.c       |   61 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 70 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 1ce887a..d66cbeb 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -271,7 +271,13 @@ struct perf_event_attr {
>                __u64           bp_len;
>                __u64           config2; /* extension of config1 */
>        };
> -       __u64   branch_sample_type; /* enum branch_sample_type */
> +       __u64   branch_sample_type; /* enum perf_branch_sample_type */
> +
> +       /*
> +        * Defines set of user regs to dump on samples.
> +        * See asm/perf_regs.h for details.
> +        */
> +       __u64   sample_regs_user;
>  };
>
>  /*
> @@ -609,6 +615,7 @@ struct perf_guest_info_callbacks {
>  #include <linux/static_key.h>
>  #include <linux/atomic.h>
>  #include <linux/sysfs.h>
> +#include <linux/perf_regs.h>
>  #include <asm/local.h>
>
>  struct perf_callchain_entry {
> @@ -1131,6 +1138,7 @@ struct perf_sample_data {
>        struct perf_callchain_entry     *callchain;
>        struct perf_raw_record          *raw;
>        struct perf_branch_stack        *br_stack;
> +       struct pt_regs                  *regs_user;
>  };
>
That one needs to be initialized in perf_sample_data_init(),
otherwise you may get random junk: perf_sample_data is allocated on the
stack.
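
A minimal user-space sketch of the point above, using stand-in types. The
names struct pt_regs_stub, struct sample_data, sample_data_init() and
regs_available() are hypothetical and are not the kernel's definitions. It
only illustrates why a pointer member of a stack-allocated struct should be
cleared in the init helper before any later code tests it against NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct pt_regs; only here to give the pointer a type. */
struct pt_regs_stub {
	uint64_t ip;
	uint64_t sp;
};

/* Stand-in for struct perf_sample_data, reduced to the relevant members. */
struct sample_data {
	uint64_t addr;
	uint64_t period;
	struct pt_regs_stub *regs_user;
};

/*
 * Init helper for a struct that callers declare on the stack. Every
 * member that is later tested must be given a defined value here.
 */
static inline void sample_data_init(struct sample_data *data, uint64_t addr)
{
	data->addr = addr;
	data->period = 0;
	data->regs_user = NULL;	/* otherwise the check below reads stack junk */
}

/* Mirrors the "avail" flag the patch derives from regs_user != NULL. */
static inline uint64_t regs_available(const struct sample_data *data)
{
	return data->regs_user != NULL;
}
```

Whether the uninitialized value can actually be read depends on the call
paths in the real code; the sketch just shows the defensive pattern the
comment asks for.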

>  static inline void perf_sample_data_init(struct perf_sample_data *data,
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index f85c015..e4df59d 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3750,6 +3750,33 @@ int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
>  }
>  EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);
>
> +static void
> +perf_output_sample_regs(struct perf_output_handle *handle,
> +                       struct pt_regs *regs, u64 mask)
> +{
> +       int bit;
> +
> +       for_each_set_bit(bit, (const unsigned long *) &mask,
> +                        sizeof(mask) * BITS_PER_BYTE) {
> +               u64 val;
> +
> +               val = perf_reg_value(regs, bit);
> +               perf_output_put(handle, val);
> +       }
> +}
> +
> +static struct pt_regs *perf_sample_regs_user(struct pt_regs *regs)
> +{
> +       if (!user_mode(regs)) {
> +               if (current->mm)
> +                       regs = task_pt_regs(current);
> +               else
> +                       regs = NULL;
> +       }
> +
> +       return regs;
> +}
> +
>  static void __perf_event_header__init_id(struct perf_event_header *header,
>                                         struct perf_sample_data *data,
>                                         struct perf_event *event)
> @@ -4010,6 +4037,23 @@ void perf_output_sample(struct perf_output_handle *handle,
>                        perf_output_put(handle, nr);
>                }
>        }
> +
> +       if (event->attr.sample_regs_user) {
> +               u64 avail = (data->regs_user != NULL);
> +
> +               /*
> +                * If there are no regs to dump, notice it through
> +                * first u64 being zero.
> +                */
> +               perf_output_put(handle, avail);
> +
> +               if (avail) {
> +                       u64 mask = event->attr.sample_regs_user;
> +                       perf_output_sample_regs(handle,
> +                                               data->regs_user,
> +                                               mask);
> +               }
> +       }
>  }
>
>  void perf_prepare_sample(struct perf_event_header *header,
> @@ -4061,6 +4105,19 @@ void perf_prepare_sample(struct perf_event_header *header,
>                }
>                header->size += size;
>        }
> +
> +       if (event->attr.sample_regs_user) {
> +               /* regs dump available bool */
> +               int size = sizeof(u64);
> +
> +               data->regs_user = perf_sample_regs_user(regs);
> +               if (data->regs_user) {
> +                       u64 mask = event->attr.sample_regs_user;
> +                       size += hweight64(mask) * sizeof(u64);
> +               }
> +
> +               header->size += size;
> +       }
>  }
>
>  static void perf_event_output(struct perf_event *event,
> @@ -6110,6 +6167,10 @@ static int perf_copy_attr(struct perf_event_attr __user *uattr,
>                        attr->branch_sample_type = mask;
>                }
>        }
> +
> +       if (attr->sample_regs_user)
> +               ret = perf_reg_validate(attr->sample_regs_user);
> +
>  out:
>        return ret;
>
> --
> 1.7.7.6
>

* Re: [PATCH 02/19] perf: Add ability to attach user level registers dump to sample
  2012-06-13 13:12     ` Jiri Olsa
  2012-06-13 13:18       ` Stephane Eranian
@ 2012-06-13 13:37       ` Peter Zijlstra
  1 sibling, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2012-06-13 13:37 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Stephane Eranian, acme, mingo, paulus, cjashfor, fweisbec,
	gorcunov, tzanussi, mhiramat, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper, asharma, benjamin.redelings

On Wed, 2012-06-13 at 15:12 +0200, Jiri Olsa wrote:
> 
> I don't see any limitation except for being inconsistent with the rest
> of the sample dumps. I'm all for having that PERF_SAMPLE_USER_REGS bit
> and the user stack bit as well.

Hmm.. we don't have a PERF_SAMPLE_ bit anymore? We should have one.

I was only objecting to this other flag word where you were having one
or two bits set for no apparent reason.

* Re: [PATCH 02/19] perf: Add ability to attach user level registers dump to sample
  2012-06-13 13:18       ` Stephane Eranian
  2012-06-13 13:23         ` Jiri Olsa
@ 2012-06-13 13:40         ` Peter Zijlstra
  2012-06-13 13:41           ` Stephane Eranian
  1 sibling, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2012-06-13 13:40 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Jiri Olsa, acme, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper, asharma, benjamin.redelings

On Wed, 2012-06-13 at 15:18 +0200, Stephane Eranian wrote:
> 
> If you look carefully at perf_output_sample(), you will notice that data is
> written in the exact order of the enum perf_event_sample_format. 

Not so actually.. CALLCHAIN is out of order. Not sure why we did that
though.


But it should match the comment near PERF_RECORD_SAMPLE.

* Re: [PATCH 02/19] perf: Add ability to attach user level registers dump to sample
  2012-06-13 13:40         ` Peter Zijlstra
@ 2012-06-13 13:41           ` Stephane Eranian
  2012-06-13 13:51             ` Stephane Eranian
  2012-06-14  8:36             ` Peter Zijlstra
  0 siblings, 2 replies; 37+ messages in thread
From: Stephane Eranian @ 2012-06-13 13:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Olsa, acme, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper, asharma, benjamin.redelings

On Wed, Jun 13, 2012 at 3:40 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Wed, 2012-06-13 at 15:18 +0200, Stephane Eranian wrote:
>>
>> If you look carefully at perf_output_sample(), you will notice that data is
>> written in the exact order of the enum perf_event_sample_format.
>
> Not so actually.. CALLCHAIN is out of order. Not sure why we did that
> though.
>
>
> But it should match the comment near PERF_RECORD_SAMPLE.

Ok, yes that one. It would have been much nicer to follow the enum
order.

* Re: [PATCH 02/19] perf: Add ability to attach user level registers dump to sample
  2012-06-13 13:41           ` Stephane Eranian
@ 2012-06-13 13:51             ` Stephane Eranian
  2012-06-14  8:36             ` Peter Zijlstra
  1 sibling, 0 replies; 37+ messages in thread
From: Stephane Eranian @ 2012-06-13 13:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Olsa, acme, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper, asharma, benjamin.redelings

On Wed, Jun 13, 2012 at 3:41 PM, Stephane Eranian <eranian@google.com> wrote:
> On Wed, Jun 13, 2012 at 3:40 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>> On Wed, 2012-06-13 at 15:18 +0200, Stephane Eranian wrote:
>>>
>>> If you look carefully at perf_output_sample(), you will notice that data is
>>> written in the exact order of the enum perf_event_sample_format.
>>
>> Not so actually.. CALLCHAIN is out of order. Not sure why we did that
>> though.
>>
>>
>> But it should match the comment near PERF_RECORD_SAMPLE.
>
> Ok, yes that one. It would have been much nicer to follow the enum
> order.
In that case, the comment needs to be updated to show the sample_user_regs[]
in the right order.

         *      { u32                   size;
         *        char                  data[size];} && PERF_SAMPLE_RAW
         *
         *      { u64 from, to, flags } lbr[nr];} && PERF_SAMPLE_BRANCH_STACK
         *      { u64 size; u64 regs[size];} && PERF_SAMPLE_USER_REGS

Same thing for the user stack.
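
A minimal userspace sketch of how a consumer could read that block,
assuming the layout the patch as posted emits: one u64 availability
word, then, if it is non-zero, one u64 per bit set in sample_regs_user,
lowest bit first. parse_user_regs() and its parameters are hypothetical
names for illustration, not part of the perf tool.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Parse the user regs block of a sample.
 *   p        - points at the availability word
 *   nwords   - number of u64 words left in the sample
 *   mask     - the sample_regs_user mask the event was opened with
 *   regs_out - array of 64 slots, indexed by register bit number
 *   avail    - set to 1 if registers were dumped, 0 otherwise
 * Returns the number of u64 words consumed, or 0 if the input is
 * too short for the mask.
 */
static size_t parse_user_regs(const uint64_t *p, size_t nwords,
			      uint64_t mask, uint64_t *regs_out, int *avail)
{
	size_t i = 0;
	int bit;

	if (nwords < 1)
		return 0;

	*avail = p[i++] != 0;
	if (!*avail)
		return i;               /* only the zero word was written */

	for (bit = 0; bit < 64; bit++) {
		if (!(mask & (1ULL << bit)))
			continue;
		if (i >= nwords)
			return 0;       /* truncated sample */
		regs_out[bit] = p[i++];
	}
	return i;
}
```

Slots for registers not selected in the mask are left untouched, so the
caller should initialize regs_out beforehand.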

* Re: [PATCH 02/19] perf: Add ability to attach user level registers dump to sample
  2012-06-13 13:41           ` Stephane Eranian
  2012-06-13 13:51             ` Stephane Eranian
@ 2012-06-14  8:36             ` Peter Zijlstra
  2012-06-14 10:45               ` Stephane Eranian
  1 sibling, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2012-06-14  8:36 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Jiri Olsa, acme, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper, asharma, benjamin.redelings

On Wed, 2012-06-13 at 15:41 +0200, Stephane Eranian wrote:
> Ok, yes that one. It would have been much nicer to follow the enum
> order. 

Yes.. I should make a list of all the things we should've done
differently. Maybe if that list grows long enough someone gets motivated
enough to rev the format as a whole.

It would be a tedious endeavour since you'd need to have both output
paths available for a good while, but it might be worth it in the long
run depending on how painful the current format becomes.

* Re: [PATCH 02/19] perf: Add ability to attach user level registers dump to sample
  2012-06-14  8:36             ` Peter Zijlstra
@ 2012-06-14 10:45               ` Stephane Eranian
  2012-06-14 10:50                 ` Peter Zijlstra
  0 siblings, 1 reply; 37+ messages in thread
From: Stephane Eranian @ 2012-06-14 10:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Olsa, acme, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper, asharma, benjamin.redelings

On Thu, Jun 14, 2012 at 10:36 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Wed, 2012-06-13 at 15:41 +0200, Stephane Eranian wrote:
>> Ok, yes that one. It would have been much nicer to follow the enum
>> order.
>
> Yes.. I should make a list of all the things we should've done
> differently. Maybe if that list grows long enough someone gets motivated
> enough to rev the format as a whole.
>
Yeah, that would be good.

I will soon post a patch to extend that format to make it possible to
have distinct sample_type per event and yet be able to parse them.
I am still waiting for Arnaldo to take my perf patches concerning pipe
mode first. Can't have too many patch sets in flight at the same time.

> It would be a tedious endeavour since you'd need to have both output
> paths available for a good while, but it might be worth it in the long
> run depending on how painful the current format becomes.

True. What we don't know, however, is how many other tools are out
there using the kernel API directly.

* Re: [PATCH 02/19] perf: Add ability to attach user level registers dump to sample
  2012-06-14 10:45               ` Stephane Eranian
@ 2012-06-14 10:50                 ` Peter Zijlstra
  2012-06-14 10:56                   ` Peter Zijlstra
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2012-06-14 10:50 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Jiri Olsa, acme, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper, asharma, benjamin.redelings

On Thu, 2012-06-14 at 12:45 +0200, Stephane Eranian wrote:
> > It would be a tedious endeavour since you'd need to have both output
> > paths available for a good while, but it might be worth it in the long
> > run depending on how painful the current format becomes.
> 
> True. What we don't know, however, is how many other tools are out
> there using the kernel API directly. 

Right.. but if you leave the old stuff in for say 2 years and start
emitting a warning on using the old stuff after 1 year.. give or take a
few years we should be able to get the old stuff deprecated and removed.

* Re: [PATCH 02/19] perf: Add ability to attach user level registers dump to sample
  2012-06-14 10:50                 ` Peter Zijlstra
@ 2012-06-14 10:56                   ` Peter Zijlstra
  0 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2012-06-14 10:56 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Jiri Olsa, acme, mingo, paulus, cjashfor, fweisbec, gorcunov,
	tzanussi, mhiramat, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper, asharma, benjamin.redelings

On Thu, 2012-06-14 at 12:50 +0200, Peter Zijlstra wrote:
> On Thu, 2012-06-14 at 12:45 +0200, Stephane Eranian wrote:
> > > It would be a tedious endeavour since you'd need to have both output
> > > paths available for a good while, but it might be worth it in the long
> > > run depending on how painful the current format becomes.
> > 
> > True. What we don't know, however, is how many other tools are out
> > there using the kernel API directly. 
> 
> Right.. but if you leave the old stuff in for say 2 years and start
> emitting a warning on using the old stuff after 1 year.. give or take a
> few years we should be able to get the old stuff deprecated and removed.

Another thing I did consider was trying to JIT a whole output path in
order to reduce the insane amount of conditionals in there. It would be
very nice if GCC grew something to help stitch code snippets together so
we could write it arch independent.

* [tip:perf/core] perf tools: Remove unused evsel parameter from machine__resolve_callchain
  2012-06-11 13:20 ` [PATCH 08/19] perf, tool: Remove unsused evsel parameter from machine__resolve_callchain Jiri Olsa
@ 2012-06-20 16:59   ` tip-bot for Jiri Olsa
  0 siblings, 0 replies; 37+ messages in thread
From: tip-bot for Jiri Olsa @ 2012-06-20 16:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, eranian, mingo, gorcunov, a.p.zijlstra, benjamin.redelings,
	jolsa, drepper, robert.richter, fweisbec, tglx, cjashfor,
	asharma, linux-kernel, hpa, fche, paulus, tzanussi,
	masami.hiramatsu.pt, mingo

Commit-ID:  a9c34a9f9c677fcbe06bd3eda8d6caa3487b4a65
Gitweb:     http://git.kernel.org/tip/a9c34a9f9c677fcbe06bd3eda8d6caa3487b4a65
Author:     Jiri Olsa <jolsa@redhat.com>
AuthorDate: Mon, 11 Jun 2012 15:20:03 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 19 Jun 2012 13:06:21 -0300

perf tools: Remove unused evsel parameter from machine__resolve_callchain

Removing unused evsel parameter from machine__resolve_callchain
function. Plus related header file and callers changes.

The evsel parameter is unused since following commit:
  perf callchain: Make callchain cursors TLS
  commit 472606458f3e1ced5fe3cc5f04e90a6b5a4732cf
  Author: Namhyung Kim <namhyung.kim@lge.com>
  Date:   Thu May 31 14:43:26 2012 +0900

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1339420814-7379-9-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-report.c |    4 ++--
 tools/perf/builtin-script.c |    4 ++--
 tools/perf/builtin-top.c    |    2 +-
 tools/perf/util/map.h       |    2 +-
 tools/perf/util/session.c   |    7 +++----
 tools/perf/util/session.h   |    4 ++--
 6 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index ea8ce8e..40b0ffc 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -69,7 +69,7 @@ static int perf_report__add_branch_hist_entry(struct perf_tool *tool,
 
 	if ((sort__has_parent || symbol_conf.use_callchain)
 	    && sample->callchain) {
-		err = machine__resolve_callchain(machine, evsel, al->thread,
+		err = machine__resolve_callchain(machine, al->thread,
 						 sample->callchain, &parent);
 		if (err)
 			return err;
@@ -140,7 +140,7 @@ static int perf_evsel__add_hist_entry(struct perf_evsel *evsel,
 	struct hist_entry *he;
 
 	if ((sort__has_parent || symbol_conf.use_callchain) && sample->callchain) {
-		err = machine__resolve_callchain(machine, evsel, al->thread,
+		err = machine__resolve_callchain(machine, al->thread,
 						 sample->callchain, &parent);
 		if (err)
 			return err;
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 8f9f9b6..8fecd3b 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -389,7 +389,7 @@ static void print_sample_bts(union perf_event *event,
 			printf(" ");
 		else
 			printf("\n");
-		perf_event__print_ip(event, sample, machine, evsel,
+		perf_event__print_ip(event, sample, machine,
 				     PRINT_FIELD(SYM), PRINT_FIELD(DSO),
 				     PRINT_FIELD(SYMOFFSET));
 	}
@@ -433,7 +433,7 @@ static void process_event(union perf_event *event __unused,
 			printf(" ");
 		else
 			printf("\n");
-		perf_event__print_ip(event, sample, machine, evsel,
+		perf_event__print_ip(event, sample, machine,
 				     PRINT_FIELD(SYM), PRINT_FIELD(DSO),
 				     PRINT_FIELD(SYMOFFSET));
 	}
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 8090a28..e3cab5f 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -774,7 +774,7 @@ static void perf_event__process_sample(struct perf_tool *tool,
 
 		if ((sort__has_parent || symbol_conf.use_callchain) &&
 		    sample->callchain) {
-			err = machine__resolve_callchain(machine, evsel, al.thread,
+			err = machine__resolve_callchain(machine, al.thread,
 							 sample->callchain, &parent);
 			if (err)
 				return;
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index 81371ba..c14c665 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -157,7 +157,7 @@ void machine__exit(struct machine *self);
 void machine__delete(struct machine *self);
 
 int machine__resolve_callchain(struct machine *machine,
-			       struct perf_evsel *evsel, struct thread *thread,
+			       struct thread *thread,
 			       struct ip_callchain *chain,
 			       struct symbol **parent);
 int maps__set_kallsyms_ref_reloc_sym(struct map **maps, const char *symbol_name,
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 582ee38..febc0ae 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -289,7 +289,6 @@ struct branch_info *machine__resolve_bstack(struct machine *self,
 }
 
 int machine__resolve_callchain(struct machine *self,
-			       struct perf_evsel *evsel __used,
 			       struct thread *thread,
 			       struct ip_callchain *chain,
 			       struct symbol **parent)
@@ -1480,8 +1479,8 @@ struct perf_evsel *perf_session__find_first_evtype(struct perf_session *session,
 }
 
 void perf_event__print_ip(union perf_event *event, struct perf_sample *sample,
-			  struct machine *machine, struct perf_evsel *evsel,
-			  int print_sym, int print_dso, int print_symoffset)
+			  struct machine *machine, int print_sym,
+			  int print_dso, int print_symoffset)
 {
 	struct addr_location al;
 	struct callchain_cursor_node *node;
@@ -1495,7 +1494,7 @@ void perf_event__print_ip(union perf_event *event, struct perf_sample *sample,
 
 	if (symbol_conf.use_callchain && sample->callchain) {
 
-		if (machine__resolve_callchain(machine, evsel, al.thread,
+		if (machine__resolve_callchain(machine, al.thread,
 						sample->callchain, NULL) != 0) {
 			if (verbose)
 				error("Failed to resolve callchain. Skipping\n");
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 7a5434c..877d781 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -150,8 +150,8 @@ struct perf_evsel *perf_session__find_first_evtype(struct perf_session *session,
 					    unsigned int type);
 
 void perf_event__print_ip(union perf_event *event, struct perf_sample *sample,
-			  struct machine *machine, struct perf_evsel *evsel,
-			  int print_sym, int print_dso, int print_symoffset);
+			  struct machine *machine, int print_sym,
+			  int print_dso, int print_symoffset);
 
 int perf_session__cpu_bitmap(struct perf_session *session,
 			     const char *cpu_list, unsigned long *cpu_bitmap);

* Re: [PATCH 12/19] perf, tool: Back [vdso] DSO with real data
  2012-06-11 13:20 ` [PATCH 12/19] perf, tool: Back [vdso] DSO with real data Jiri Olsa
@ 2012-06-29 18:49   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2012-06-29 18:49 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: a.p.zijlstra, mingo, paulus, cjashfor, fweisbec, eranian,
	gorcunov, tzanussi, mhiramat, robert.richter, fche, linux-kernel,
	masami.hiramatsu.pt, drepper, asharma, benjamin.redelings

Em Mon, Jun 11, 2012 at 03:20:07PM +0200, Jiri Olsa escreveu:
> Storing data for VDSO shared object, because we need it for
> the unwind process.
> 
> The idea is that VDSO shared object is same for all process
> on a running system, so it makes no difference if we store
> it inside the tracer - perf.
> 
> The record command:
> When [vdso] map memory is hit, we retrieve [vdso] DSO image
> and store it into temporary file. During the build-id
> processing the [vdso] DSO image is stored in build-id db,
> and build-id refference is made inside perf.data. The temporary
> file is removed when record is finished.
> 
> The report command:
> We read build-id from perf.data and store [vdso] DSO object.
> This object is refferenced and attached to map when the MMAP
> events are processed. Thus during the SAMPLE event processing
> we have correct mmap/dso attached.
> 
> Adding following API for vdso object:
>   vdso__file
>     - vdso temp file path
> 
>   vdso__get_file
>     - finds and store VDSO image into temp file,
>       the temp file path is returned
> 
>   vdso__exit
>     - removes temporary VDSO image if there's any
> 
> Signed-off-by: Jiri Olsa <jolsa@redhat.com>
> ---
>  tools/perf/Makefile       |    2 +
>  tools/perf/util/map.c     |   23 ++++++++++-
>  tools/perf/util/session.c |    2 +
>  tools/perf/util/vdso.c    |   90 +++++++++++++++++++++++++++++++++++++++++++++
>  tools/perf/util/vdso.h    |    8 ++++
>  5 files changed, 123 insertions(+), 2 deletions(-)
>  create mode 100644 tools/perf/util/vdso.c
>  create mode 100644 tools/perf/util/vdso.h
> 
> diff --git a/tools/perf/Makefile b/tools/perf/Makefile
> index 0eee64c..e48b969 100644
> --- a/tools/perf/Makefile
> +++ b/tools/perf/Makefile
> @@ -319,6 +319,7 @@ LIB_H += $(ARCH_INCLUDE)
>  LIB_H += util/cgroup.h
>  LIB_H += $(TRACE_EVENT_DIR)event-parse.h
>  LIB_H += util/target.h
> +LIB_H += util/vdso.h
>  
>  LIB_OBJS += $(OUTPUT)util/abspath.o
>  LIB_OBJS += $(OUTPUT)util/alias.o
> @@ -382,6 +383,7 @@ LIB_OBJS += $(OUTPUT)util/xyarray.o
>  LIB_OBJS += $(OUTPUT)util/cpumap.o
>  LIB_OBJS += $(OUTPUT)util/cgroup.o
>  LIB_OBJS += $(OUTPUT)util/target.o
> +LIB_OBJS += $(OUTPUT)util/vdso.o
>  
>  BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o
>  
> diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
> index 35ae568..1649ea0 100644
> --- a/tools/perf/util/map.c
> +++ b/tools/perf/util/map.c
> @@ -7,6 +7,7 @@
>  #include <stdio.h>
>  #include <unistd.h>
>  #include "map.h"
> +#include "vdso.h"
>  
>  const char *map_type__name[MAP__NR_TYPES] = {
>  	[MAP__FUNCTION] = "Functions",
> @@ -18,10 +19,14 @@ static inline int is_anon_memory(const char *filename)
>  	return strcmp(filename, "//anon") == 0;
>  }
>  
> +static inline int is_vdso_memory(const char *filename)
> +{
> +	return !strcmp(filename, "[vdso]");
> +}
> +
>  static inline int is_no_dso_memory(const char *filename)
>  {
>  	return !strcmp(filename, "[stack]") ||
> -	       !strcmp(filename, "[vdso]")  ||
>  	       !strcmp(filename, "[heap]");
>  }
>  
> @@ -50,9 +55,10 @@ struct map *map__new(struct list_head *dsos__list, u64 start, u64 len,
>  	if (self != NULL) {
>  		char newfilename[PATH_MAX];
>  		struct dso *dso;
> -		int anon, no_dso;
> +		int anon, no_dso, vdso;
>  
>  		anon = is_anon_memory(filename);
> +		vdso = is_vdso_memory(filename);
>  		no_dso = is_no_dso_memory(filename);
>  
>  		if (anon) {
> @@ -60,10 +66,23 @@ struct map *map__new(struct list_head *dsos__list, u64 start, u64 len,
>  			filename = newfilename;
>  		}
>  
> +		if (vdso) {
> +			filename = (char *) vdso__file;
> +			pgoff = 0;
> +		}
> +
>  		dso = __dsos__findnew(dsos__list, filename);
>  		if (dso == NULL)
>  			goto out_delete;
>  
> +		if (vdso && !dso->has_build_id) {
> +			char *file_vdso = vdso__get_file();
> +			if (file_vdso)
> +				dso__set_long_name(dso, file_vdso);
> +			else
> +				no_dso = 1;
> +		}
> +
>  		map__init(self, type, start, start + len, pgoff, dso);
>  
>  		if (anon || no_dso) {
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 2785ce8..f400612 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -14,6 +14,7 @@
>  #include "sort.h"
>  #include "util.h"
>  #include "cpumap.h"
> +#include "vdso.h"
>  
>  static int perf_session__open(struct perf_session *self, bool force)
>  {
> @@ -209,6 +210,7 @@ void perf_session__delete(struct perf_session *self)
>  	machine__exit(&self->host_machine);
>  	close(self->fd);
>  	free(self);
> +	vdso__exit();
>  }
>  
>  void machine__remove_thread(struct machine *self, struct thread *th)
> diff --git a/tools/perf/util/vdso.c b/tools/perf/util/vdso.c
> new file mode 100644
> index 0000000..e964482
> --- /dev/null
> +++ b/tools/perf/util/vdso.c
> @@ -0,0 +1,90 @@
> +
> +#include <unistd.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <fcntl.h>
> +#include <linux/kernel.h>
> +#include "vdso.h"
> +#include "util.h"
> +
> +const char vdso__file[] = "/tmp/vdso.so";

Oops, can you please use mkstemp()?
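
Something along these lines, as an untested sketch rather than a drop-in
replacement: write_temp_image() and the template string in the usage
note are made-up names, and the caller still has to remember the
resulting path and unlink it when done.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/*
 * Write 'size' bytes from 'data' into a freshly created temp file.
 * 'path_template' must be a writable buffer ending in "XXXXXX";
 * mkstemp() rewrites it in place with the name it picked and opens
 * the file with O_EXCL, so a pre-existing file or symlink at a
 * guessable name cannot be hit.
 * Returns 0 on success, -1 on failure (no file is left behind).
 */
static int write_temp_image(char *path_template, const void *data, size_t size)
{
	ssize_t n;
	int fd;

	fd = mkstemp(path_template);
	if (fd < 0)
		return -1;

	n = write(fd, data, size);
	close(fd);

	if (n < 0 || (size_t)n != size) {
		unlink(path_template);
		return -1;
	}
	return 0;
}
```

Usage would be something like char path[] = "/tmp/perf-vdso.so-XXXXXX";
followed by write_temp_image(path, buf, size), with path then holding
the real file name.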

> +static bool vdso_found;
> +
> +static int find_vdso_map(void **start, void **end)
> +{
> +	FILE *maps;
> +	char line[128];
> +	int found = 0;
> +
> +	maps = fopen("/proc/self/maps", "r");
> +	if (!maps) {
> +		pr_err("vdso: cannot open maps\n");
> +		return -1;
> +	}
> +
> +	while (!found && fgets(line, sizeof(line), maps)) {
> +		int m = -1;
> +
> +		/* We care only about private r-x mappings. */
> +		if (2 != sscanf(line, "%p-%p r-xp %*x %*x:%*x %*u %n",
> +				start, end, &m))
> +			continue;
> +		if (m < 0)
> +			continue;
> +
> +		if (!strncmp(&line[m], "[vdso]", 6))
> +			found = 1;
> +	}
> +
> +	fclose(maps);
> +	return !found;
> +}
> +
> +char *vdso__get_file(void)
> +{
> +	char *vdso = NULL;
> +	char *buf = NULL;
> +	void *start, *end;
> +
> +	do {
> +		int fd, size;
> +
> +		if (vdso_found) {
> +			vdso = (char *) vdso__file;

Coding style, bzzt; use:

			vdso = (char *)vdso__file;

> +			break;
> +		}
> +
> +		if (find_vdso_map(&start, &end))
> +			break;
> +
> +		size = end - start;
> +		buf = malloc(size);
> +		if (!buf)
> +			break;
> +
> +		memcpy(buf, start, size);

Introduce memdup, just like in the kernel we have kmemdup?
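
A minimal sketch of what such a helper could look like in the tools
code. The name memdup() mirrors kmemdup() minus the gfp argument; it is
a suggestion here, not an existing libc function.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate 'len' bytes starting at 'src' into newly allocated memory.
 * Returns NULL if the allocation fails; the caller owns the result
 * and releases it with free().
 */
static void *memdup(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);

	return p;
}
```

The malloc() + memcpy() pair above then collapses into
buf = memdup(start, size);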

> +
> +		fd = open(vdso__file, O_CREAT|O_WRONLY|O_TRUNC, S_IRWXU);
> +		if (fd < 0)
> +			break;
> +
> +		if (size == write(fd, buf, size))
> +			vdso = (char *) vdso__file;

Coding style: same here, no space after the cast.

> +
> +		close(fd);
> +	} while (0);

And what is the point of this while construct?

> +
> +	if (buf)
> +		free(buf);

No need to check, free accepts NULL pointers.

> +
> +	vdso_found = (vdso != NULL);
> +	return vdso;
> +}
> +
> +void vdso__exit(void)
> +{
> +	if (vdso_found)
> +		unlink(vdso__file);

What is the point of using a global variable to check if you need to
call unlink if you don't check unlink's result? Just ditch vdso_found
and call unlink unconditionally, but on the temp file created.

And be careful how you use that temp filename...

> +}
> diff --git a/tools/perf/util/vdso.h b/tools/perf/util/vdso.h
> new file mode 100644
> index 0000000..908b041
> --- /dev/null
> +++ b/tools/perf/util/vdso.h
> @@ -0,0 +1,8 @@
> +#ifndef __VDSO__
> +#define __VDSO__

Better use __PERF_VDSO__

> +
> +extern const char vdso__file[];
> +char *vdso__get_file(void);
> +void  vdso__exit(void);
> +
> +#endif /* __VDSO__ */
> -- 
> 1.7.7.6

Thread overview: 37+ messages
2012-06-11 13:19 [RFCv5 00/19] perf: Add backtrace post dwarf unwind Jiri Olsa
2012-06-11 13:19 ` [PATCH 01/19] perf: Unified API to record selective sets of arch registers Jiri Olsa
2012-06-11 13:19 ` [PATCH 02/19] perf: Add ability to attach user level registers dump to sample Jiri Olsa
2012-06-13 11:16   ` Stephane Eranian
2012-06-13 13:12     ` Jiri Olsa
2012-06-13 13:18       ` Stephane Eranian
2012-06-13 13:23         ` Jiri Olsa
2012-06-13 13:25           ` Stephane Eranian
2012-06-13 13:40         ` Peter Zijlstra
2012-06-13 13:41           ` Stephane Eranian
2012-06-13 13:51             ` Stephane Eranian
2012-06-14  8:36             ` Peter Zijlstra
2012-06-14 10:45               ` Stephane Eranian
2012-06-14 10:50                 ` Peter Zijlstra
2012-06-14 10:56                   ` Peter Zijlstra
2012-06-13 13:37       ` Peter Zijlstra
2012-06-13 13:29   ` Stephane Eranian
2012-06-11 13:19 ` [PATCH 03/19] perf, x86: Add copy_from_user_nmi_nochk for best effort copy Jiri Olsa
2012-06-11 13:19 ` [PATCH 04/19] perf: Factor __output_copy to be usable with specific copy function Jiri Olsa
2012-06-11 13:20 ` [PATCH 05/19] perf: Add perf_output_skip function to skip bytes in sample Jiri Olsa
2012-06-11 13:20 ` [PATCH 06/19] perf: Add ability to attach user stack dump to sample Jiri Olsa
2012-06-11 13:20 ` [PATCH 07/19] perf: Add attribute to filter out callchains Jiri Olsa
2012-06-11 13:20 ` [PATCH 08/19] perf, tool: Remove unsused evsel parameter from machine__resolve_callchain Jiri Olsa
2012-06-20 16:59   ` [tip:perf/core] perf tools: Remove unused " tip-bot for Jiri Olsa
2012-06-11 13:20 ` [PATCH 09/19] perf, tool: Factor DSO symtab types to generic binary types Jiri Olsa
2012-06-11 13:20 ` [PATCH 10/19] perf, tool: Add interface to read DSO image data Jiri Olsa
2012-06-11 13:20 ` [PATCH 11/19] perf, tool: Add '.note' check into search for NOTE section Jiri Olsa
2012-06-11 13:20 ` [PATCH 12/19] perf, tool: Back [vdso] DSO with real data Jiri Olsa
2012-06-29 18:49   ` Arnaldo Carvalho de Melo
2012-06-11 13:20 ` [PATCH 13/19] perf, tool: Add interface to arch registers sets Jiri Olsa
2012-06-11 13:20 ` [PATCH 14/19] perf, tool: Add libunwind dependency for dwarf cfi unwinding Jiri Olsa
2012-06-11 13:20 ` [PATCH 15/19] perf, tool: Support user regs and stack in sample parsing Jiri Olsa
2012-06-11 13:20 ` [PATCH 16/19] perf, tool: Support for dwarf cfi unwinding on post processing Jiri Olsa
2012-06-11 13:20 ` [PATCH 17/19] perf, tool: Support for dwarf mode callchain on perf record Jiri Olsa
2012-06-11 13:20 ` [PATCH 18/19] perf, tool: Add dso data caching Jiri Olsa
2012-06-11 13:20 ` [PATCH 19/19] perf, tool: Add dso data caching tests Jiri Olsa
2012-06-11 21:44 ` [RFCv5 00/19] perf: Add backtrace post dwarf unwind Benjamin Redelings
