linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Ingo Molnar <mingo@kernel.org>, Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	Clark Williams <williams@redhat.com>,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Kan Liang <kan.liang@linux.intel.com>,
	Andi Kleen <ak@linux.intel.com>, Jiri Olsa <jolsa@redhat.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Alexey Budankov <alexey.budankov@linux.intel.com>,
	Mathieu Poirier <mathieu.poirier@linaro.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Pavel Gerasimov <pavel.gerasimov@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ravi Bangoria <ravi.bangoria@linux.ibm.com>,
	Stephane Eranian <eranian@google.com>,
	Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Subject: [PATCH 56/60] perf report: Add option to enable the LBR stitching approach
Date: Mon, 20 Apr 2020 08:53:12 -0300	[thread overview]
Message-ID: <20200420115316.18781-57-acme@kernel.org> (raw)
In-Reply-To: <20200420115316.18781-1-acme@kernel.org>

From: Kan Liang <kan.liang@linux.intel.com>

With the LBR stitching approach, the reconstructed LBR call stack can
break the HW limitation. However, it may reconstruct invalid call stacks
in some cases, e.g. exception handing such as setjmp/longjmp.  Also, it
may impact the processing time especially when the number of samples
with stitched LBRs are huge.

Add an option to enable the approach.

  # To display the perf.data header info, please use
  # --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 6K of event 'cycles'
  # Event count (approx.): 6492797701
  #
  # Children      Self  Command          Shared Object       Symbol
  # ........  ........  ...............  ..................
  # .................................
  #
    99.99%    99.99%  tchain_edit      tchain_edit        [.] f43
            |
            ---main
               f1
               f2
               f3
               f4
               f5
               f6
               f7
               f8
               f9
               f10
               f11
               f12
               f13
               f14
               f15
               f16
               f17
               f18
               f19
               f20
               f21
               f22
               f23
               f24
               f25
               f26
               f27
               f28
               f29
               f30
               f31
               |
                --99.65%--f32
                          f33
                          f34
                          f35
                          f36
                          f37
                          f38
                          f39
                          f40
                          f41
                          f42
                          f43

Committer testing:

  $ perf record --call-graph lbr /wb/tchain_edit
  [ perf record: Woken up 23 times to write data ]
  [ perf record: Captured and wrote 5.578 MB perf.data (6839 samples) ]
  $ perf report --header-only | egrep 'cpu(desc|.*capabilities)'
  # cpudesc : Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  # cpu pmu capabilities: branches=32, max_precise=3, pmu_name=skylake
  $

Before:

  $ perf report --no-children --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 6K of event 'cycles:u'
  # Event count (approx.): 6459523879
  #
  # Overhead  Command      Shared Object     Symbol
  # ........  ...........  ................  .......................
  #
      99.95%  tchain_edit  tchain_edit       [.] f43
              |
               --99.92%--f43
                         f42
                         f41
                         f40
                         f39
                         f38
                         f37
                         f36
                         f35
                         f34
                         f33
                         f32
                         f31
                         f30
                         f29
                         f28
                         f27
                         f26
                         f25
                         f24
                         f23
                         f22
                         f21
                         f20
                         f19
                         f18
                         f17
                         f16
                         f15
                         f14
                         f13
                         f12
                         f11

       0.03%  tchain_edit  tchain_edit       [.] f42
       0.01%  tchain_edit  tchain_edit       [.] f41
       0.00%  tchain_edit  tchain_edit       [.] f31
       0.00%  tchain_edit  ld-2.29.so        [.] _dl_relocate_object
       0.00%  tchain_edit  ld-2.29.so        [.] memmove
       0.00%  tchain_edit  [unknown]         [k] 0xffffffff93a00b17

After:

  $ perf report --stitch-lbr --no-children --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 6K of event 'cycles:u'
  # Event count (approx.): 6459496645
  #
  # Overhead  Command      Shared Object     Symbol
  # ........  ...........  ................  ........................
  #
      99.97%  tchain_edit  tchain_edit       [.] f43
              |
               --99.93%--f43
                         f42
                         f41
                         f40
                         f39
                         f38
                         f37
                         f36
                         f35
                         f34
                         f33
                         f32
                         f31
                         f30
                         f29
                         f28
                         f27
                         f26
                         f25
                         f24
                         f23
                         f22
                         f21
                         f20
                         f19
                         f18
                         f17
                         f16
                         f15
                         f14
                         f13
                         f12
                         f11
                         f10
                         f9
                         f8
                         f7
                         f6
                         f5
                         f4
                         f3
                         f2
                         f1
                         main
                         __libc_start_main

       0.02%  tchain_edit  [unknown]         [k] 0xffffffff93a00b17
       0.01%  tchain_edit  tchain_edit       [.] f31
       0.00%  tchain_edit  ld-2.29.so        [.] _dl_important_hwcaps

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lore.kernel.org/lkml/20200319202517.23423-14-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-report.txt | 11 +++++++++++
 tools/perf/builtin-report.c              | 12 ++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index f569b9ea4002..d068103690cc 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -488,6 +488,17 @@ include::itrace.txt[]
 	This option extends the perf report to show reference callgraphs,
 	which collected by reference event, in no callgraph event.
 
+--stitch-lbr::
+	Show callgraph with stitched LBRs, which may have more complete
+	callgraph. The perf.data file must have been obtained using
+	perf record --call-graph lbr.
+	Disabled by default. In common cases with call stack overflows,
+	it can recreate better call stacks than the default lbr call stack
+	output. But this approach is not full proof. There can be cases
+	where it creates incorrect call stacks from incorrect matches.
+	The known limitations include exception handing such as
+	setjmp/longjmp will have calls/returns not match.
+
 --socket-filter::
 	Only report the samples on the processor socket that match with this filter
 
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index c0cebd53ecf9..0c32767b1c56 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -84,6 +84,7 @@ struct report {
 	bool			header_only;
 	bool			nonany_branch_mode;
 	bool			group_set;
+	bool			stitch_lbr;
 	int			max_stack;
 	struct perf_read_values	show_threads_values;
 	struct annotation_options annotation_opts;
@@ -267,6 +268,9 @@ static int process_sample_event(struct perf_tool *tool,
 		return -1;
 	}
 
+	if (rep->stitch_lbr)
+		al.thread->lbr_stitch_enable = true;
+
 	if (symbol_conf.hide_unresolved && al.sym == NULL)
 		goto out_put;
 
@@ -408,6 +412,12 @@ static int report__setup_sample_type(struct report *rep)
 			callchain_param.record_mode = CALLCHAIN_FP;
 	}
 
+	if (rep->stitch_lbr && (callchain_param.record_mode != CALLCHAIN_LBR)) {
+		ui__warning("Can't find LBR callchain. Switch off --stitch-lbr.\n"
+			    "Please apply --call-graph lbr when recording.\n");
+		rep->stitch_lbr = false;
+	}
+
 	/* ??? handle more cases than just ANY? */
 	if (!(perf_evlist__combined_branch_type(session->evlist) &
 				PERF_SAMPLE_BRANCH_ANY))
@@ -1258,6 +1268,8 @@ int cmd_report(int argc, const char **argv)
 			"Show full source file name path for source lines"),
 	OPT_BOOLEAN(0, "show-ref-call-graph", &symbol_conf.show_ref_callgraph,
 		    "Show callgraph from reference event"),
+	OPT_BOOLEAN(0, "stitch-lbr", &report.stitch_lbr,
+		    "Enable LBR callgraph stitching approach"),
 	OPT_INTEGER(0, "socket-filter", &report.socket_filter,
 		    "only show processor socket that match with this filter"),
 	OPT_BOOLEAN(0, "raw-trace", &symbol_conf.raw_trace,
-- 
2.21.1


  parent reply	other threads:[~2020-04-20 11:57 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-20 11:52 [GIT PULL] perf/core improvements and fixes Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 01/60] perf stat: Honour --timeout for forked workloads Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 02/60] perf tools: Synthesize bpf_trampoline/dispatcher ksymbol event Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 03/60] perf machine: Set ksymbol dso as loaded on arrival Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 04/60] perf annotate: Add basic support for bpf_image Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 05/60] capabilities: Introduce CAP_PERFMON to kernel and user space Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 06/60] perf/core: Open access to the core for CAP_PERFMON privileged process Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 07/60] perf/core: open access to probes " Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 08/60] perf tools: Support CAP_PERFMON capability Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 09/60] drm/i915/perf: Open access for CAP_PERFMON privileged process Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 10/60] trace/bpf_trace: " Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 11/60] powerpc/perf: open " Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 12/60] parisc/perf: " Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 13/60] drivers/perf: Open " Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 14/60] drivers/oprofile: " Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 15/60] doc/admin-guide: Update perf-security.rst with CAP_PERFMON information Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 16/60] doc/admin-guide: update kernel.rst " Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 17/60] perf script: Simplify auxiliary event printing functions Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 18/60] perf bench: Add event synthesis benchmark Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 19/60] tools api fs: Make xxx__mountpoint() more scalable Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 20/60] perf synthetic-events: save 4kb from 2 stack frames Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 21/60] perf expr: Add expr_ prefix for parse_ctx and parse_id Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 22/60] perf expr: Add expr_scanner_ctx object Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 23/60] perf metrictroup: Split the metricgroup__add_metric function Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 24/60] perf script: Add flamegraph.py script Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 25/60] perf auxtrace: Add ->evsel_is_auxtrace() callback Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 26/60] perf intel-pt: Implement " Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 27/60] perf intel-bts: " Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 28/60] perf arm-spe: " Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 29/60] perf cs-etm: " Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 30/60] perf s390-cpumsf: " Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 31/60] perf auxtrace: For reporting purposes, un-group AUX area event Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 32/60] perf auxtrace: Add an option to synthesize callchains for regular events Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 33/60] perf thread-stack: Add thread_stack__sample_late() Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 34/60] perf evsel: Be consistent when looking which evsel PERF_SAMPLE_ bits are set Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 35/60] perf evsel: Add support for synthesized sample type Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 36/60] perf intel-pt: Add support for synthesizing callchains for regular events Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 37/60] perf evsel: Move and globalize perf_evsel__find_pmu() and perf_evsel__is_aux_event() Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 38/60] perf evlist: Move leader-sampling configuration Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 39/60] perf evsel: Rearrange perf_evsel__config_leader_sampling() Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 40/60] perf evlist: Allow multiple read formats Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 41/60] perf tools: Add support for leader-sampling with AUX area events Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 42/60] perf stat: Force error in fallback on :k events Arnaldo Carvalho de Melo
2020-04-20 11:52 ` [PATCH 43/60] tools lib traceevent: Take care of return value of asprintf Arnaldo Carvalho de Melo
2020-04-20 11:53 ` [PATCH 44/60] perf pmu: Add support for PMU capabilities Arnaldo Carvalho de Melo
2020-04-20 11:53 ` [PATCH 45/60] perf doc: allow ASCIIDOC_EXTRA to be an argument Arnaldo Carvalho de Melo
2020-04-20 11:53 ` [PATCH 46/60] perf parser: Add support to specify rXXX event with pmu Arnaldo Carvalho de Melo
2020-04-20 11:53 ` [PATCH 47/60] perf header: Support CPU PMU capabilities Arnaldo Carvalho de Melo
2020-04-20 11:53 ` [PATCH 48/60] perf machine: Remove the indent in resolve_lbr_callchain_sample Arnaldo Carvalho de Melo
2020-04-20 11:53 ` [PATCH 49/60] perf machine: Refine the function for LBR call stack reconstruction Arnaldo Carvalho de Melo
2020-04-20 11:53 ` [PATCH 50/60] perf machine: Factor out lbr_callchain_add_kernel_ip() Arnaldo Carvalho de Melo
2020-04-20 11:53 ` [PATCH 51/60] perf machine: Factor out lbr_callchain_add_lbr_ip() Arnaldo Carvalho de Melo
2020-04-20 11:53 ` [PATCH 52/60] perf thread: Add a knob for LBR stitch approach Arnaldo Carvalho de Melo
2020-04-20 11:53 ` [PATCH 53/60] perf thread: Save previous sample for LBR stitching approach Arnaldo Carvalho de Melo
2020-04-20 11:53 ` [PATCH 54/60] perf callchain: Save previous cursor nodes " Arnaldo Carvalho de Melo
2020-04-20 11:53 ` [PATCH 55/60] perf callchain: Stitch LBR call stack Arnaldo Carvalho de Melo
2020-04-20 11:53 ` Arnaldo Carvalho de Melo [this message]
2020-04-20 11:53 ` [PATCH 57/60] perf script: Add option to enable the LBR stitching approach Arnaldo Carvalho de Melo
2020-04-20 11:53 ` [PATCH 58/60] perf top: " Arnaldo Carvalho de Melo
2020-04-20 11:53 ` [PATCH 59/60] perf c2c: " Arnaldo Carvalho de Melo
2020-04-20 11:53 ` [PATCH 60/60] perf hist: Add fast path for duplicate entries check Arnaldo Carvalho de Melo
2020-04-22 12:09 ` [GIT PULL] perf/core improvements and fixes Ingo Molnar
2020-04-23 21:28   ` Daniel Díaz
2020-04-24 13:07     ` Arnaldo Carvalho de Melo
2020-04-24 14:10       ` Andreas Gerstmayr
2020-05-04 19:07         ` Daniel Díaz
2020-05-05 16:37           ` Arnaldo Carvalho de Melo
2020-05-05 16:57             ` Daniel Díaz
2020-05-05 17:03               ` Arnaldo Carvalho de Melo
2020-05-08 13:04     ` [tip: perf/core] perf flamegraph: Use /bin/bash for report and record scripts tip-bot2 for Arnaldo Carvalho de Melo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200420115316.18781-57-acme@kernel.org \
    --to=acme@kernel.org \
    --cc=acme@redhat.com \
    --cc=adrian.hunter@intel.com \
    --cc=ak@linux.intel.com \
    --cc=alexey.budankov@linux.intel.com \
    --cc=eranian@google.com \
    --cc=jolsa@kernel.org \
    --cc=jolsa@redhat.com \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mathieu.poirier@linaro.org \
    --cc=mingo@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=namhyung@kernel.org \
    --cc=pavel.gerasimov@intel.com \
    --cc=peterz@infradead.org \
    --cc=ravi.bangoria@linux.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=vitaly.slobodskoy@intel.com \
    --cc=williams@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).