From: Stephane Eranian <eranian@google.com>
To: "Yan, Zheng" <zheng.z.yan@intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Ingo Molnar <mingo@kernel.org>,
Arnaldo Carvalho de Melo <acme@infradead.org>,
Andi Kleen <andi@firstfloor.org>
Subject: Re: [PATCH v3 00/14] perf, x86: Haswell LBR call stack support
Date: Sun, 23 Feb 2014 20:47:55 +0100
Message-ID: <CABPqkBSuQoedYqe7EZ567vD4eTfxbzsLJhGJoGUh0d0GPJ=O0A@mail.gmail.com>
In-Reply-To: <1392703661-15104-1-git-send-email-zheng.z.yan@intel.com>
Could you add the Reviewed-by: tags to the patches I have already
reviewed? That way I can focus on the changes you made and continue
testing on my HSW system.
Thanks.
On Tue, Feb 18, 2014 at 7:07 AM, Yan, Zheng <zheng.z.yan@intel.com> wrote:
> For many profiling tasks we need the callgraph. For example we often
> need to see the caller of a lock or the caller of a memcpy or other
> library function to actually tune the program. Frame pointer unwinding
> is efficient and works well. But frame pointers are off by default on
> 64bit code (and on modern 32bit gccs), so there are many binaries around
> that do not use frame pointers. Profiling unchanged production code is
> very useful in practice. On some CPUs frame pointers also have a high
> cost. Dwarf2 unwinding does not always work either and is extremely slow
> (up to 20% overhead).
>
> Haswell has a new feature that utilizes the existing Last Branch Record
> facility to record call chains. When the feature is enabled, function
> calls are collected as normal, but as return instructions are
> executed the last captured branch record is popped from the on-chip LBR
> registers. The LBR call stack facility provides an alternative way to get
> the callgraph. It has some limitations too, but should work in most cases
> and is significantly faster than dwarf. Frame pointer unwinding is still
> the best default, but LBR call stack is a good alternative when nothing
> else works.
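
The push-on-call, pop-on-return behaviour described above can be
sketched as a small software model. This is only an illustration, not
the hardware interface or the code in this series: the class name
LbrCallStack, its methods, and the default depth of 16 entries are
assumptions made for the example.

```python
from collections import deque


class LbrCallStack:
    """Toy model of an LBR operating in call-stack mode (illustrative only)."""

    def __init__(self, depth=16):
        # Assumed depth; a bounded deque drops the oldest entry when full.
        self.entries = deque(maxlen=depth)

    def call(self, caller, callee):
        # A call instruction records a (from, to) branch entry.
        self.entries.append((caller, callee))

    def ret(self):
        # A return instruction pops the most recently recorded entry.
        if self.entries:
            self.entries.pop()

    def callchain(self, leaf):
        # Innermost function first, then each recorded caller, newest first.
        return [leaf] + [caller for caller, _ in reversed(self.entries)]


# Replay the bc call path from the report output in this message.
lbr = LbrCallStack()
path = ["_start", "__libc_start_main", "main", "yyparse",
        "run_code", "execute", "bc_divide"]
for caller, callee in zip(path, path[1:]):
    lbr.call(caller, callee)

lbr.call("bc_divide", "_one_mult")   # bc_divide calls _one_mult ...
lbr.ret()                            # ... and _one_mult returns

chain = lbr.callchain("bc_divide")
print("\n".join(chain))
```

A sample taken in bc_divide after _one_mult has returned yields the
chain bc_divide, execute, run_code, yyparse, main, __libc_start_main,
_start, because the entry for the finished call was popped.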
>
> This patch series adds LBR call stack support. Users can enable/disable
> this through a sysfs attribute file in the CPU PMU directory:
> echo 1 > /sys/bus/event_source/devices/cpu/lbr_callstack
>
> When profiling bc(1) on Fedora 19:
> echo 'scale=2000; 4*a(1)' > cmd; perf record -g fp bc -l < cmd
>
> If this feature is enabled, perf report output looks like:
> 50.36% bc bc [.] bc_divide
> |
> --- bc_divide
> execute
> run_code
> yyparse
> main
> __libc_start_main
> _start
>
> 33.66% bc bc [.] _one_mult
> |
> --- _one_mult
> bc_divide
> execute
> run_code
> yyparse
> main
> __libc_start_main
> _start
>
> 7.62% bc bc [.] _bc_do_add
> |
> --- _bc_do_add
> |
> |--99.89%-- 0x2000186a8
> --0.11%-- [...]
>
> 6.83% bc bc [.] _bc_do_sub
> |
> --- _bc_do_sub
> |
> |--99.94%-- bc_add
> | execute
> | run_code
> | yyparse
> | main
> | __libc_start_main
> | _start
> --0.06%-- [...]
>
> 0.46% bc libc-2.17.so [.] __memset_sse2
> |
> --- __memset_sse2
> |
> |--54.13%-- bc_new_num
> | |
> | |--51.00%-- bc_divide
> | | execute
> | | run_code
> | | yyparse
> | | main
> | | __libc_start_main
> | | _start
> | |
> | |--30.46%-- _bc_do_sub
> | | bc_add
> | | execute
> | | run_code
> | | yyparse
> | | main
> | | __libc_start_main
> | | _start
> | |
> | --18.55%-- _bc_do_add
> | bc_add
> | execute
> | run_code
> | yyparse
> | main
> | __libc_start_main
> | _start
> |
> --45.87%-- bc_divide
> execute
> run_code
> yyparse
> main
> __libc_start_main
> _start
>
> If this feature is disabled, perf report output looks like:
> 50.49% bc bc [.] bc_divide
> |
> --- bc_divide
>
> 33.57% bc bc [.] _one_mult
> |
> --- _one_mult
>
> 7.61% bc bc [.] _bc_do_add
> |
> --- _bc_do_add
> 0x2000186a8
>
> 6.88% bc bc [.] _bc_do_sub
> |
> --- _bc_do_sub
>
> 0.42% bc libc-2.17.so [.] __memcpy_ssse3_back
> |
> --- __memcpy_ssse3_back
>
> The LBR call stack has the following known limitations:
> - Zero length calls are not filtered out by hardware
> - Exception handling such as setjmp/longjmp causes calls and returns
>   not to match
> - Pushing a different return address onto the stack causes calls and
>   returns not to match
> - If callstack is deeper than the LBR, only the last entries are captured
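
Two of these limitations, the depth limit and the call/return mismatch
after longjmp, can be reproduced with a toy replay of branch events.
This is a rough sketch under assumptions: the record() helper, the
event tuples, and the depth of 16 are hypothetical names and values
chosen for illustration, and longjmp is modelled simply as unwinding
without executing any return instruction.

```python
from collections import deque


def record(events, depth=16):
    """Replay ('call', caller, callee) and ('ret',) events through a toy
    LBR of the given depth; return the recorded callers, newest first."""
    lbr = deque(maxlen=depth)  # oldest entry is dropped once the LBR is full
    for ev in events:
        if ev[0] == "call":
            lbr.append((ev[1], ev[2]))
        elif ev[0] == "ret" and lbr:
            lbr.pop()
    return [caller for caller, _ in reversed(lbr)]


# Call stack deeper than the LBR: 20 nested calls, only 16 entries survive,
# so the outermost callers f0..f3 are lost.
deep = [("call", f"f{i}", f"f{i + 1}") for i in range(20)]
deep_chain = record(deep)

# longjmp: main calls a, a calls b, b jumps straight back into main without
# any return being executed, then main calls c. The entries for a and b are
# never popped, so the chain seen from c contains stale callers.
jump = [("call", "main", "a"), ("call", "a", "b"), ("call", "main", "c")]
stale_chain = record(jump)

print(len(deep_chain), deep_chain[0], deep_chain[-1])
print(stale_chain)
```

In the second case the real caller chain of c is just main, while the
model reports main, a, main.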
>
> Changes since v1
> - split change into more patches
> - introduce context switch callback and use it to flush LBR
> - use the context switch callback to save/restore LBR
> - dynamically allocate a memory area for storing the LBR stack, always
>   switch the memory area during context switch
> - disable this feature by default
> - more description in change logs
>
> Changes since v2
> - don't use xchg to switch PMU specific data
> - remove nr_branch_stack from struct perf_event_context
> - simplify the save/restore LBR stack logic
> - remove unnecessary 'has_branch_stack -> needs_branch_stack'
> conversion
> - more description in change logs
>
>