linux-kernel.vger.kernel.org archive mirror
* [PATCHv10 00/12] perf: Add backtrace post dwarf unwind
From: Jiri Olsa @ 2012-08-07 13:20 UTC
  To: linux-kernel
  Cc: Arnaldo Carvalho de Melo, Arun Sharma, Benjamin Redelings,
	Corey Ashford, Cyrill Gorcunov, Frank Ch. Eigler,
	Frederic Weisbecker, Ingo Molnar, Masami Hiramatsu,
	Paul Mackerras, Peter Zijlstra, Robert Richter, Stephane Eranian,
	Tom Zanussi, Ulrich Drepper

hi,
the patches are also available as a tarball here:
http://people.redhat.com/~jolsa/perf_post_unwind_v10.tar.bz2

v10 changes:
   - omit copy_from_user_nmi_nochk function
   - record -g option fix for unlimited arg len

v9 changes:
   - rebased to current tip tree

v8 changes:
   - patch 2 - added dump registers ABI specification as suggested
               by Stephane
   - v7 patches 9,10,16,17 already in

v7 changes:
   - omitted v6 patches 9 and 15
     They need more work and will be sent separately. I don't want to hold off
     the whole patchset because of them. We could miss some related backtraces
     (syscall, vdso) in this version.
   - v6 patch 11, 14, 20 already in

v6 changes:
   patch 01/23 - unrelated - ftrace stuff
   patch 03/23 - added PERF_SAMPLE_REGS_USER bit
               - added regs_user initialization
   patch 07/23 - added PERF_SAMPLE_STACK_USER bit
               - sample_stack_user changed to u32 and
                 added size check
   new patches 1,9,10,20

v5 changes:
   patch 1/19 - having just one enum set of the perf registers
   patch 2/19 - using for_each_set_bit for scanning the mask
              - single regs enum for both 32 and 64 bits versions
              - using regs mask != 0 to trigger the regs dump
   patch 5/19 - adding perf_output_skip so we can skip undumped part of the stack in RB
   patch 6/19 - using stack size != 0 to trigger the stack dump
              - do not zero the memory for non retrieved part of the stack dump
   patch 7/19 - adding exclude_callchain_kernel attribute
   patch 8/19 - this could be taken without the rest of the series

v4 changes:
   - no real change from v3, just rebase
   - v3 patch 06/17 got already merged

v3 changes:
   patch 01/17
   - added HAVE_PERF_REGS config option
   patch 02/17, 04/17
   - regs and stack perf interface is more general now
   patch 06/17
   - unrelated one-line fix for i386 compilation
   patch 16/17
   - a few namespace fixes

---
Adding the post unwinding user stack backtrace using dwarf unwind
via libunwind. The original work was done by Frederic. I mostly took
his patches and made them compile in the current kernel code, plus I
added some stuff here and there.

The main idea is to store the user registers and a portion of the
user stack along with the sample data during the record phase. Then
during the report, when the data is presented, perform the actual
dwarf unwind.
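
To make the split between the two phases concrete, here is a minimal
user space sketch. It is only a model: struct sample, NUM_REGS,
DUMP_MAX, record_sample() and sample_reg() are hypothetical names made
up for illustration and do not match the layout or the ABI used by the
patches. The record side just copies the registers selected by a mask
and a bounded chunk of the stack; the report side reads them back.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGS   8      /* hypothetical register file size */
#define DUMP_MAX   64     /* hypothetical upper bound on the stack dump */

/* Hypothetical sample layout: selected registers plus a stack snapshot. */
struct sample {
	uint64_t regs_mask;          /* which registers were recorded */
	uint64_t regs[NUM_REGS];     /* packed: one slot per set bit, in bit order */
	uint32_t stack_size;         /* bytes of user stack actually copied */
	uint8_t  stack[DUMP_MAX];    /* snapshot starting at the stack pointer */
};

/* Record phase: copy only the registers named in mask and at most
 * want bytes of stack. No unwinding happens here. */
static void record_sample(struct sample *s, const uint64_t *cpu_regs,
			  uint64_t mask, const uint8_t *stack_top,
			  uint32_t avail, uint32_t want)
{
	unsigned int bit, n = 0;

	assert(want <= DUMP_MAX);
	memset(s, 0, sizeof(*s));
	s->regs_mask = mask;
	for (bit = 0; bit < NUM_REGS; bit++)
		if (mask & (1ULL << bit))
			s->regs[n++] = cpu_regs[bit];

	/* Dump no more than what is really there. */
	s->stack_size = want < avail ? want : avail;
	memcpy(s->stack, stack_top, s->stack_size);
}

/* Report phase helper: look up a recorded register by its bit index.
 * Returns 0 on success, -1 if that register was not recorded. */
static int sample_reg(const struct sample *s, unsigned int bit, uint64_t *val)
{
	unsigned int i, n = 0;

	if (bit >= NUM_REGS || !(s->regs_mask & (1ULL << bit)))
		return -1;
	/* The slot index is the number of set bits below this one. */
	for (i = 0; i < bit; i++)
		if (s->regs_mask & (1ULL << i))
			n++;
	*val = s->regs[n];
	return 0;
}
```

In this model an unwinder running at report time would only ever call
sample_reg() and read s->stack, so the sampled process does not need
to exist anymore when the backtrace is computed.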

attached patches:
  01/12 perf: Unified API to record selective sets of arch registers
  02/12 perf: Add ability to attach user level registers dump to sample
  03/12 perf: Factor __output_copy to be usable with specific copy function
  04/12 perf: Add perf_output_skip function to skip bytes in sample
  05/12 perf: Add ability to attach user stack dump to sample
  06/12 perf: Add attribute to filter out callchains
  07/12 perf tools: Adding PERF_ATTR_SIZE_VER2 to the header swap check
  08/12 perf tools: Add interface to arch registers sets
  09/12 perf tools: Add libunwind dependency for DWARF CFI unwinding
  10/12 perf tools: Support user regs and stack in sample parsing
  11/12 perf tools: Support for DWARF CFI unwinding on post processing
  12/12 perf tools: Support for DWARF mode callchain


I tested on Fedora. There was not much gain on i386, because the
binaries are compiled with frame pointers. Though the dwarf
backtrace is more accurate and unwraps calls in more detail
(functions that do not set up the frame pointer).

I could see some improvement on x86_64, where I got the full
backtrace in cases where the current code could get just the first
address, taken from the instruction pointer.

Example on x86_64:
[dwarf]
   perf record -g dwarf -e syscalls:sys_enter_write date

   100.00%     date  libc-2.14.90.so  [.] __GI___libc_write
               |
               --- __GI___libc_write
                   _IO_file_write@@GLIBC_2.2.5
                   new_do_write
                   _IO_do_write@@GLIBC_2.2.5
                   _IO_file_overflow@@GLIBC_2.2.5
                   0x4022cd
                   0x401ee6
                   __libc_start_main
                   0x4020b9


[frame pointer]
   perf record -g fp -e syscalls:sys_enter_write date

   100.00%     date  libc-2.14.90.so  [.] __GI___libc_write
               |
               --- __GI___libc_write
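
A rough model of why the frame pointer walk can end after a single
entry, assuming the sampled code does not maintain a frame pointer
chain. This is a simulation on an array of words, not the kernel
callchain code; fp_walk() and the stack convention in the comment are
assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walk a chain of saved frame pointers in a simulated stack.
 * Convention assumed here (x86-like): stack[fp] holds the caller's
 * frame pointer and stack[fp + 1] holds the return address.
 * fp values are word indices into stack[]. Returns the entry count. */
static size_t fp_walk(const uint64_t *stack, size_t nwords,
		      uint64_t ip, uint64_t fp,
		      uint64_t *out, size_t max)
{
	size_t n = 0;

	assert(stack != NULL && out != NULL);
	if (max == 0)
		return 0;
	out[n++] = ip;                      /* sampled instruction pointer */

	/* Stop as soon as fp does not point at a complete frame. */
	while (n < max && nwords >= 2 && fp <= nwords - 2) {
		uint64_t next = stack[fp];

		out[n++] = stack[fp + 1];   /* return address of this frame */
		if (next <= fp)             /* chain must move to older frames */
			break;
		fp = next;
	}
	return n;
}
```

With a well formed chain the walk returns one entry per frame. When
the frame pointer register holds an unrelated value, the loop never
runs and only the instruction pointer is reported, which is the shape
of the [frame pointer] output above.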

I tested mainly on coreutils binaries, but I could also see
deeper backtraces with dwarf unwind for more complex
applications like firefox.

Attached patches should work on both x86 and x86_64.

The unwind backtrace can be interrupted for the following reasons:
    - bug in unwind information of processed shared library
    - bug in unwind processing code (most likely ;) )
    - insufficient dump stack size
    - until full syscall register storage and vdso support is added,
      we could miss some related backtraces
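
The "insufficient dump stack size" case can be pictured as a bounds
check in the memory read callback an unwinder would use. The sketch
below is an assumption about how such a check might look, not the code
from the patches; struct stack_dump and dump_read_word() are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of the stack bytes saved with one sample. */
struct stack_dump {
	uint64_t sp;          /* user stack pointer at sample time */
	uint64_t size;        /* number of bytes that were dumped */
	const uint8_t *data;  /* data[0] corresponds to address sp */
};

/* Read one 64-bit word at addr from the dump.
 * Returns 0 on success, -1 when the word is not fully covered by the
 * dump, which is where an unwinder has to give up on older frames. */
static int dump_read_word(const struct stack_dump *d, uint64_t addr,
			  uint64_t *val)
{
	uint64_t off;

	assert(d != NULL && val != NULL);
	if (addr < d->sp)
		return -1;
	off = addr - d->sp;
	if (d->size < sizeof(*val) || off > d->size - sizeof(*val))
		return -1;
	memcpy(val, d->data + off, sizeof(*val));
	return 0;
}
```

Any frame whose saved data lives above sp + size cannot be read this
way, so a backtrace deeper than the dumped region gets cut at that
point.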

jirka

Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
---
 arch/Kconfig                             |  13 ++
 arch/x86/Kconfig                         |   2 +
 arch/x86/include/asm/perf_event.h        |   2 +
 arch/x86/include/asm/perf_regs.h         |  33 ++++++
 arch/x86/kernel/Makefile                 |   2 +
 arch/x86/kernel/perf_regs.c              | 105 ++++++++++++++++
 include/linux/perf_event.h               |  60 +++++++++-
 include/linux/perf_regs.h                |  25 ++++
 kernel/events/callchain.c                |  38 +++---
 kernel/events/core.c                     | 214 +++++++++++++++++++++++++++++++++
 kernel/events/internal.h                 |  82 +++++++++----
 kernel/events/ring_buffer.c              |  10 +-
 tools/perf/Makefile                      |  45 ++++++-
 tools/perf/arch/x86/Makefile             |   3 +
 tools/perf/arch/x86/include/perf_regs.h  |  80 +++++++++++++
 tools/perf/arch/x86/util/unwind.c        | 111 +++++++++++++++++
 tools/perf/builtin-record.c              | 114 +++++++++++++++++-
 tools/perf/builtin-report.c              |  18 +--
 tools/perf/builtin-script.c              |   6 +-
 tools/perf/builtin-top.c                 |   6 +-
 tools/perf/config/feature-tests.mak      |  25 ++++
 tools/perf/perf.h                        |   9 +-
 tools/perf/util/event.h                  |  12 ++
 tools/perf/util/evsel.c                  |  41 ++++++-
 tools/perf/util/header.c                 |   3 +
 tools/perf/util/include/linux/compiler.h |   1 +
 tools/perf/util/map.h                    |   5 +-
 tools/perf/util/perf_regs.h              |  14 +++
 tools/perf/util/session.c                | 107 ++++++++++++++---
 tools/perf/util/session.h                |   6 +-
 tools/perf/util/trace-event.h            |   2 +
 tools/perf/util/unwind.c                 | 567 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/unwind.h                 |  34 ++++++
 33 files changed, 1713 insertions(+), 82 deletions(-)


Thread overview: 32+ messages
2012-08-07 13:20 [PATCHv10 00/12] perf: Add backtrace post dwarf unwind Jiri Olsa
2012-08-07 13:20 ` [PATCH 01/12] perf: Unified API to record selective sets of arch registers Jiri Olsa
2012-08-21 15:45   ` [tip:perf/core] " tip-bot for Jiri Olsa
2012-08-07 13:20 ` [PATCH 02/12] perf: Add ability to attach user level registers dump to sample Jiri Olsa
2012-08-21 15:46   ` [tip:perf/core] " tip-bot for Jiri Olsa
2012-08-07 13:20 ` [PATCH 03/12] perf: Factor __output_copy to be usable with specific copy function Jiri Olsa
2012-08-21 15:47   ` [tip:perf/core] " tip-bot for Frederic Weisbecker
2012-08-07 13:20 ` [PATCH 04/12] perf: Add perf_output_skip function to skip bytes in sample Jiri Olsa
2012-08-21 15:48   ` [tip:perf/core] " tip-bot for Jiri Olsa
2012-08-07 13:20 ` [PATCH 05/12] perf: Add ability to attach user stack dump to sample Jiri Olsa
2012-08-21 15:49   ` [tip:perf/core] " tip-bot for Jiri Olsa
2012-08-21 17:11     ` Peter Zijlstra
2012-08-22  8:35       ` [PATCH] perf: Keep the perf_event_attr on version 3 Jiri Olsa
2012-08-22 18:18         ` Arnaldo Carvalho de Melo
2012-08-22 18:21           ` Peter Zijlstra
2012-08-27 16:57         ` [tip:perf/core] perf tools: " tip-bot for Jiri Olsa
2012-08-07 13:20 ` [PATCH 06/12] perf: Add attribute to filter out callchains Jiri Olsa
2012-08-21 15:50   ` [tip:perf/core] " tip-bot for Frederic Weisbecker
2012-08-07 13:20 ` [PATCH 07/12] perf tools: Adding PERF_ATTR_SIZE_VER2 to the header swap check Jiri Olsa
2012-08-21 15:51   ` [tip:perf/core] " tip-bot for Jiri Olsa
2012-08-21 17:12     ` Peter Zijlstra
2012-08-22  8:31       ` Jiri Olsa
2012-08-07 13:20 ` [PATCH 08/12] perf tools: Add interface to arch registers sets Jiri Olsa
2012-08-21 15:52   ` [tip:perf/core] " tip-bot for Jiri Olsa
2012-08-07 13:20 ` [PATCH 09/12] perf tools: Add libunwind dependency for DWARF CFI unwinding Jiri Olsa
2012-08-21 15:53   ` [tip:perf/core] " tip-bot for Jiri Olsa
2012-08-07 13:20 ` [PATCH 10/12] perf tools: Support user regs and stack in sample parsing Jiri Olsa
2012-08-21 15:54   ` [tip:perf/core] " tip-bot for Jiri Olsa
2012-08-07 13:20 ` [PATCH 11/12] perf tools: Support for DWARF CFI unwinding on post processing Jiri Olsa
2012-08-21 15:55   ` [tip:perf/core] " tip-bot for Jiri Olsa
2012-08-07 13:20 ` [PATCH 12/12] perf tools: Support for DWARF mode callchain Jiri Olsa
2012-08-21 15:55   ` [tip:perf/core] " tip-bot for Jiri Olsa
