From: kan.liang@intel.com
To: acme@kernel.org, jolsa@redhat.com, a.p.zijlstra@chello.nl,
	eranian@google.com
Cc: linux-kernel@vger.kernel.org, mingo@redhat.com, paulus@samba.org,
	ak@linux.intel.com, Kan Liang <kan.liang@intel.com>
Subject: [PATCH V3 0/3] perf tool: Haswell LBR call stack support (user)
Date: Fri, 14 Nov 2014 08:44:09 -0500
Message-ID: <1415972652-17310-1-git-send-email-kan.liang@intel.com>

From: Kan Liang <kan.liang@intel.com>

This is the user space patch for Haswell LBR call stack support.
For many profiling tasks we need the callgraph. For example, we often
need to see the caller of a lock, a memcpy, or another library function
to actually tune the program. Frame pointer unwinding is efficient and
works well, but frame pointers are off by default in 64-bit code (and
with modern 32-bit gccs), so there are many binaries around that do not
use frame pointers. Profiling unchanged production code is very useful
in practice. On some CPUs frame pointers also have a high cost. DWARF2
unwinding does not always work either and is extremely slow (up to 20%
overhead).

Haswell has a new feature that uses the existing Last Branch Record
facility to record call chains. When the feature is enabled, function
calls are collected as normal, but as return instructions are executed
the last captured branch record is popped from the on-chip LBR
registers. The LBR call stack facility provides an alternative way to
get the callgraph. It has some limitations too, but should work in most
cases and is significantly faster than DWARF unwinding. Frame pointer
unwinding is still the best default, but LBR call stack is a good
alternative when nothing else works.
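
The push-on-call, pop-on-return behaviour described above can be
modelled in a few lines of C. This is only a software sketch of the
idea, not the hardware or the perf implementation: the names
lbr_stack, lbr_on_call, lbr_on_ret and lbr_read_chain are made up for
illustration, and LBR_DEPTH is an assumed register count.

```c
#include <stdint.h>

#define LBR_DEPTH 16 /* assumed number of LBR entries; illustrative */

struct lbr_stack {
	uint64_t from[LBR_DEPTH]; /* addresses of recorded call instructions */
	int nr;                   /* number of live records */
};

/* A call pushes a record; if the stack is full, the oldest one is lost. */
void lbr_on_call(struct lbr_stack *s, uint64_t from)
{
	if (s->nr == LBR_DEPTH) {
		for (int i = 1; i < LBR_DEPTH; i++)
			s->from[i - 1] = s->from[i];
		s->nr--;
	}
	s->from[s->nr++] = from;
}

/* A return pops the most recently captured record. */
void lbr_on_ret(struct lbr_stack *s)
{
	if (s->nr > 0)
		s->nr--;
}

/* Copy the records innermost-first, the order a call chain is reported in. */
int lbr_read_chain(const struct lbr_stack *s, uint64_t *chain)
{
	for (int i = 0; i < s->nr; i++)
		chain[i] = s->from[s->nr - 1 - i];
	return s->nr;
}
```

Starting from an empty stack, a call from 0x1000, a call from 0x2000,
one return, and a call from 0x2010 leave two records, so
lbr_read_chain() fills in { 0x2010, 0x1000 }: the record for the call
that already returned is gone, which is what makes the remaining
records usable as a call chain.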

A new call chain recording option "lbr" is introduced into the perf tool
for LBR call stack. The user can use --call-graph lbr to get the call
stack information from hardware.

When profiling bc(1) on Fedora 19:
echo 'scale=2000; 4*a(1)' > cmd; perf record --call-graph lbr bc -l < cmd
With LBR, perf report output looks like:
    50.36%       bc  bc                 [.] bc_divide
                 |
                 --- bc_divide
                     execute
                     run_code
                     yyparse
                     main
                     __libc_start_main
                     _start
    33.66%       bc  bc                 [.] _one_mult
                 |
                 --- _one_mult
                     bc_divide
                     execute
                     run_code
                     yyparse
                     main
                     __libc_start_main
                     _start
     7.62%       bc  bc                 [.] _bc_do_add
                 |
                 --- _bc_do_add
                    |
                    |--99.89%-- 0x2000186a8
                     --0.11%-- [...]
     6.83%       bc  bc                 [.] _bc_do_sub
                 |
                 --- _bc_do_sub
                    |
                    |--99.94%-- bc_add
                    |          execute
                    |          run_code
                    |          yyparse
                    |          main
                    |          __libc_start_main
                    |          _start
                     --0.06%-- [...]
     0.46%       bc  libc-2.17.so       [.] __memset_sse2
                 |
                 --- __memset_sse2
                    |
                    |--54.13%-- bc_new_num
                    |          |
                    |          |--51.00%-- bc_divide
                    |          |          execute
                    |          |          run_code
                    |          |          yyparse
                    |          |          main
                    |          |          __libc_start_main
                    |          |          _start
                    |          |
                    |          |--30.46%-- _bc_do_sub
                    |          |          bc_add
                    |          |          execute
                    |          |          run_code
                    |          |          yyparse
                    |          |          main
                    |          |          __libc_start_main
                    |          |          _start
                    |          |
                    |           --18.55%-- _bc_do_add
                    |                     bc_add
                    |                     execute
                    |                     run_code
                    |                     yyparse
                    |                     main
                    |                     __libc_start_main
                    |                     _start
                    |
                     --45.87%-- bc_divide
                               execute
                               run_code
                               yyparse
                               main
                               __libc_start_main
                               _start
When recording the same workload with FP instead:
echo 'scale=2000; 4*a(1)' > cmd; perf record --call-graph fp bc -l < cmd
perf report output looks like:
    50.49%       bc  bc                 [.] bc_divide
                 |
                 --- bc_divide
    33.57%       bc  bc                 [.] _one_mult
                 |
                 --- _one_mult
     7.61%       bc  bc                 [.] _bc_do_add
                 |
                 --- _bc_do_add
                     0x2000186a8
     6.88%       bc  bc                 [.] _bc_do_sub
                 |
                 --- _bc_do_sub
     0.42%       bc  libc-2.17.so       [.] __memcpy_ssse3_back
                 |
                 --- __memcpy_ssse3_back

With LBR, perf report -D output looks like:
11739295893248 0x4d0 [0xe0]: PERF_RECORD_SAMPLE(IP, 0x2): 10505/10505:
0x40054d period: 39255 addr: 0
... LBR call chain: nr:7
.....  0: fffffffffffffe00
.....  1: 0000000000400540
.....  2: 0000000000400587
.....  3: 00000000004005b3
.....  4: 00000000004005ef
.....  5: 0000003d1cc21b43
.....  6: 0000000000400474
... FP chain: nr:6
.....  0: fffffffffffffe00
.....  1: 000000000040054d
.....  2: 000000000040058c
.....  3: 00000000004005b8
.....  4: 00000000004005f4
.....  5: 0000003d1cc21b45
 ... thread: a.out:10505
 ...... dso: /home/lk/a.out
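
One way to read the two chains side by side is to subtract them entry
by entry. The sketch below does that for entries 2..5 of the dump
above. It assumes that each LBR entry is the address of a call
instruction and each FP entry is the matching return address; the
cover letter does not say so explicitly. chain_deltas, lbr_chain and
fp_chain are names made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * For each pair of entries, compute fp[i] - lbr[i]. Under the assumption
 * stated above, this is the length of the call instruction at lbr[i].
 */
void chain_deltas(const uint64_t *lbr, const uint64_t *fp, size_t n,
		  int64_t *delta)
{
	for (size_t i = 0; i < n; i++)
		delta[i] = (int64_t)(fp[i] - lbr[i]);
}

/* Entries 2..5 of the LBR call chain in the dump above. */
const uint64_t lbr_chain[] = {
	0x400587, 0x4005b3, 0x4005ef, 0x3d1cc21b43ULL,
};

/* Entries 2..5 of the FP chain in the dump above. */
const uint64_t fp_chain[] = {
	0x40058c, 0x4005b8, 0x4005f4, 0x3d1cc21b45ULL,
};
```

Calling chain_deltas(lbr_chain, fp_chain, 4, delta) gives 5, 5, 5 and
2, which would fit three 5-byte direct calls in a.out and a 2-byte
indirect call in libc. The LBR chain also carries one more entry
(0x400474) that has no counterpart in the FP chain.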


The LBR call stack has the following known limitations:
 - Zero length calls are not filtered out by hardware
 - Exception handling such as setjmp/longjmp leaves calls and returns
   unmatched
 - Pushing a different return address onto the stack leaves calls and
   returns unmatched
 - If the call stack is deeper than the LBR, only the last entries are
   captured
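
To illustrate the first limitation: a zero length call is a call whose
target is the instruction right after it, so it pushes a record that
no return will ever pop. A software check could look roughly like the
sketch below. It assumes the 5-byte near call encoding (opcode 0xe8
plus a 4-byte displacement) and that the record holds the address of
the call instruction and its target. struct lbr_branch and
is_zero_length_call are hypothetical names, not code from this series.

```c
#include <stdbool.h>
#include <stdint.h>

#define NEAR_CALL_REL32_LEN 5 /* opcode 0xe8 plus a 4-byte displacement */

/* Hypothetical branch record: call instruction address and its target. */
struct lbr_branch {
	uint64_t from;
	uint64_t to;
};

/* True if the call targets the instruction that immediately follows it. */
bool is_zero_length_call(const struct lbr_branch *br)
{
	return br->to == br->from + NEAR_CALL_REL32_LEN;
}
```

A record with from = 0x1000 and to = 0x1005 is flagged, while an
ordinary call to a function elsewhere in the binary is not.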

Changes since v1
 - Update help document
 - Force exclude_user to 0 with warning in LBR call stack
 - Dump both LBR and FP info in report -D
 - Reconstruct thread__resolve_callchain_sample and split it into two patches
 - Use has_branch_callstack function to check whether LBR call stack is
   available

Changes since v2
 - Rebase to 025ce5d33373

Kan Liang (3):
  perf tools: enable LBR call stack support
  perf tool: Move cpumode resolve code to add_callchain_ip
  perf tools: Construct LBR call chain

 tools/perf/Documentation/perf-record.txt |   8 +-
 tools/perf/builtin-record.c              |   6 +-
 tools/perf/builtin-report.c              |   2 +
 tools/perf/util/callchain.c              |  10 ++-
 tools/perf/util/callchain.h              |   1 +
 tools/perf/util/evsel.c                  |  21 ++++-
 tools/perf/util/evsel.h                  |   4 +
 tools/perf/util/machine.c                | 134 +++++++++++++++++++++----------
 tools/perf/util/session.c                |  60 ++++++++++++--
 9 files changed, 192 insertions(+), 54 deletions(-)

-- 
1.8.3.2

