linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Make perf script easier to use for itrace
@ 2018-09-20 18:05 Andi Kleen
  2018-09-20 18:05 ` [PATCH v6 1/5] tools, perf, script: Add --insn-trace for instruction decoding Andi Kleen
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Andi Kleen @ 2018-09-20 18:05 UTC (permalink / raw)
  To: acme; +Cc: jolsa, linux-kernel

Implement a range of improvements to make it easier to look
at itrace traces with perf script. Nothing here couldn't be done
before with some additional scripting, but add simple high 
level options to make it easier to use.

% perf record -e intel_pt//k -a sleep 1

Show function calls:

% perf script --call-trace
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])          perf_pmu_enable
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])          __x86_indirect_thunk_rax
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])          event_filter_match
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])          group_sched_in
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])              __x86_indirect_thunk_rax
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])              event_sched_in.isra.107
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_event_set_state.part.71
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_time
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_pmu_disable
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_log_itrace_start
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  __x86_indirect_thunk_rax
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_userpage
    
Show function calls and returns:

% perf script --call-ret-trace
                perf   900 [000] 194167.205652203:   tr strt     ([unknown])        pt_config
                perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            pt_config
                perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            pt_event_add
                perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            perf_pmu_enable
                perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            perf_pmu_nop_void
                perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            event_sched_in.isra.107
                perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            __x86_indirect_thunk_rax
                perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            perf_pmu_nop_int
                perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            group_sched_in
                perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            event_filter_match
                perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            event_filter_match
                perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            group_sched_in


Show instruction traces (using XED):

  % perf script --insn-trace --xed
       swapper     0 [000] 117276.429606186:  ffffffff81010486 pt_config ([kernel.kallsyms])          nopl  %eax, (%rax,%rax,1)
       swapper     0 [000] 117276.429606186:  ffffffff8101048b pt_config ([kernel.kallsyms])          add $0x10, %rsp
       swapper     0 [000] 117276.429606186:  ffffffff8101048f pt_config ([kernel.kallsyms])          popq  %rbx
       swapper     0 [000] 117276.429606186:  ffffffff81010490 pt_config ([kernel.kallsyms])          popq  %rbp
       swapper     0 [000] 117276.429606186:  ffffffff81010491 pt_config ([kernel.kallsyms])          popq  %r12
       swapper     0 [000] 117276.429606186:  ffffffff81010493 pt_config ([kernel.kallsyms])          popq  %r13
       swapper     0 [000] 117276.429606186:  ffffffff81010495 pt_config ([kernel.kallsyms])          popq  %r14
       swapper     0 [000] 117276.429606186:  ffffffff81010497 pt_config ([kernel.kallsyms])          popq  %r15
       swapper     0 [000] 117276.429606186:  ffffffff81010499 pt_config ([kernel.kallsyms])          retq


Filter by a ftrace style graph function:

 % perf script --graph-function group_sched_in --call-trace
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])          group_sched_in
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])              __x86_indirect_thunk_rax
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])              event_sched_in.isra.107
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_event_set_state.part.71
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_time
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_pmu_disable
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_log_itrace_start
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  __x86_indirect_thunk_rax
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_userpage
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])                          calc_timer_values
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              sched_clock_cpu
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])                          __x86_indirect_thunk_rax
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])                          arch_perf_update_userpage
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              __fentry__
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              using_native_sched_clock
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              sched_clock_stable
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_pmu_enable
                perf   900 [000] 194167.205652203: ([kernel.kallsyms])              __x86_indirect_thunk_rax


Also available in
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git pt/easy-script-7

v1: Initial post
v2: Address review comments. Minor fixes to descriptions. Now builds everywhere.
v3: Merge wildcard+remove addr/sym patch for one logical review unit.
Drop typed remove patch
Fix build failure on ARM ETM builds.
Fix git branch number.
v4: Remove debug printout.
Really fix ETM build failure (Kim Phillips)
v5: Split out --xed into separate option to handle other architectures better,
and support brstackinsn too. Add total cycles printing for brstackinsn
v6: Document xed build. Change code formatting for one patch.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v6 1/5] tools, perf, script: Add --insn-trace for instruction decoding
  2018-09-20 18:05 Make perf script easier to use for itrace Andi Kleen
@ 2018-09-20 18:05 ` Andi Kleen
  2018-10-08 17:50   ` Arnaldo Carvalho de Melo
                     ` (2 more replies)
  2018-09-20 18:05 ` [PATCH v6 2/5] perf, tools, script: Make itrace script default to all calls Andi Kleen
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 21+ messages in thread
From: Andi Kleen @ 2018-09-20 18:05 UTC (permalink / raw)
  To: acme; +Cc: jolsa, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add a --insn-trace short hand option for decoding and disassembling
instruction streams for intel_pt. This automatically pipes the
output into the xed disassembler to generate disassembled instructions.
This just makes this use model much nicer to use

Before

% perf record -e intel_pt// ...
% perf script --itrace=i0ns --ns -F +insn,-event,-period | xed -F insn: -A -64
   swapper     0 [000] 117276.429606186:  ffffffff81010486 pt_config ([kernel.kallsyms])          nopl  %eax, (%rax,%rax,1)
   swapper     0 [000] 117276.429606186:  ffffffff8101048b pt_config ([kernel.kallsyms])          add $0x10, %rsp
   swapper     0 [000] 117276.429606186:  ffffffff8101048f pt_config ([kernel.kallsyms])          popq  %rbx
   swapper     0 [000] 117276.429606186:  ffffffff81010490 pt_config ([kernel.kallsyms])          popq  %rbp
   swapper     0 [000] 117276.429606186:  ffffffff81010491 pt_config ([kernel.kallsyms])          popq  %r12
   swapper     0 [000] 117276.429606186:  ffffffff81010493 pt_config ([kernel.kallsyms])          popq  %r13
   swapper     0 [000] 117276.429606186:  ffffffff81010495 pt_config ([kernel.kallsyms])          popq  %r14
   swapper     0 [000] 117276.429606186:  ffffffff81010497 pt_config ([kernel.kallsyms])          popq  %r15
   swapper     0 [000] 117276.429606186:  ffffffff81010499 pt_config ([kernel.kallsyms])          retq
   swapper     0 [000] 117276.429606186:  ffffffff8101063e pt_event_add ([kernel.kallsyms])               cmpl  $0x1, 0x1b0(%rbx)
   swapper     0 [000] 117276.429606186:  ffffffff81010645 pt_event_add ([kernel.kallsyms])               mov $0xffffffea, %eax
   swapper     0 [000] 117276.429606186:  ffffffff8101064a pt_event_add ([kernel.kallsyms])               mov $0x0, %edx
   swapper     0 [000] 117276.429606186:  ffffffff8101064f pt_event_add ([kernel.kallsyms])               popq  %rbx
   swapper     0 [000] 117276.429606186:  ffffffff81010650 pt_event_add ([kernel.kallsyms])               cmovnz %edx, %eax
   swapper     0 [000] 117276.429606186:  ffffffff81010653 pt_event_add ([kernel.kallsyms])               jmp 0xffffffff81010635
   swapper     0 [000] 117276.429606186:  ffffffff81010635 pt_event_add ([kernel.kallsyms])               retq
   swapper     0 [000] 117276.429606186:  ffffffff8115e687 event_sched_in.isra.107 ([kernel.kallsyms])            test %eax, %eax

Now

% perf record -e intel_pt// ...
% perf script --insn-trace --xed
... same output ...

XED needs to be installed with:

> git clone https://github.com/intelxed/mbuild.git mbuild
> git clone https://github.com/intelxed/xed
> cd xed
> mkdir obj
> cd obj
> ../mfile.py
> sudo ../mfile.py --prefix=/usr/local install

Signed-off-by: Andi Kleen <ak@linux.intel.com>

--

v2: Add separate --xed option
v3: Add xed build documentation and update commit
---
 tools/perf/Documentation/build-xed.txt   | 11 +++++++++++
 tools/perf/Documentation/perf-script.txt |  7 +++++++
 tools/perf/builtin-script.c              | 23 +++++++++++++++++++++++
 3 files changed, 41 insertions(+)
 create mode 100644 tools/perf/Documentation/build-xed.txt

diff --git a/tools/perf/Documentation/build-xed.txt b/tools/perf/Documentation/build-xed.txt
new file mode 100644
index 000000000000..8da3028e6dca
--- /dev/null
+++ b/tools/perf/Documentation/build-xed.txt
@@ -0,0 +1,11 @@
+
+For --xed the xed tool is needed. Here is how to install it:
+
+> git clone https://github.com/intelxed/mbuild.git mbuild                                                                                        
+> git clone https://github.com/intelxed/xed                                                                                                      
+> cd xed                                                                                                                                         
+> mkdir obj                                                                                                                                      
+> cd obj                                                                                                                                         
+> ../mfile.py                                                                                                                                    
+> sudo ../mfile.py --prefix=/usr/local install    
+
diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index afdafe2110a1..00c655ab4968 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -383,6 +383,13 @@ include::itrace.txt[]
 	will be printed. Each entry has function name and file/line. Enabled by
 	default, disable with --no-inline.
 
+--insn-trace::
+	Show instruction stream for intel_pt traces. Combine with --xed to
+	show disassembly.
+
+--xed::
+	Run xed disassembler on output. Requires installing the xed disassembler.
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-script-perl[1],
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 4da5e32b9e03..311f5b53dd83 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -44,6 +44,7 @@
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <unistd.h>
+#include <subcmd/pager.h>
 
 #include "sane_ctype.h"
 
@@ -3100,6 +3101,24 @@ static int perf_script__process_auxtrace_info(struct perf_session *session,
 #define perf_script__process_auxtrace_info 0
 #endif
 
+static int parse_insn_trace(const struct option *opt __maybe_unused,
+			    const char *str __maybe_unused,
+			    int unset __maybe_unused)
+{
+	parse_output_fields(NULL, "+insn,-event,-period", 0);
+	itrace_parse_synth_opts(opt, "i0ns", 0);
+	nanosecs = true;
+	return 0;
+}
+
+static int parse_xed(const struct option *opt __maybe_unused,
+		     const char *str __maybe_unused,
+		     int unset __maybe_unused)
+{
+	force_pager("xed -F insn: -A -64 | less");
+	return 0;
+}
+
 int cmd_script(int argc, const char **argv)
 {
 	bool show_full_info = false;
@@ -3184,6 +3203,10 @@ int cmd_script(int argc, const char **argv)
 		    "system-wide collection from all CPUs"),
 	OPT_STRING('S', "symbols", &symbol_conf.sym_list_str, "symbol[,symbol...]",
 		   "only consider these symbols"),
+	OPT_CALLBACK_OPTARG(0, "insn-trace", &itrace_synth_opts, NULL, NULL,
+			"Decode instructions from itrace", parse_insn_trace),
+	OPT_CALLBACK_OPTARG(0, "xed", NULL, NULL, NULL,
+			"Run xed disassembler on output", parse_xed),
 	OPT_STRING(0, "stop-bt", &symbol_conf.bt_stop_list_str, "symbol[,symbol...]",
 		   "Stop display of callgraph at these symbols"),
 	OPT_STRING('C', "cpu", &cpu_list, "cpu", "list of cpus to profile"),
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v6 2/5] perf, tools, script: Make itrace script default to all calls
  2018-09-20 18:05 Make perf script easier to use for itrace Andi Kleen
  2018-09-20 18:05 ` [PATCH v6 1/5] tools, perf, script: Add --insn-trace for instruction decoding Andi Kleen
@ 2018-09-20 18:05 ` Andi Kleen
  2018-10-26  7:46   ` [tip:perf/urgent] perf " tip-bot for Andi Kleen
  2018-09-20 18:05 ` [PATCH v6 3/5] tools, perf, script: Add --call-trace and --call-ret-trace Andi Kleen
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2018-09-20 18:05 UTC (permalink / raw)
  To: acme; +Cc: jolsa, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

By default perf script for itrace outputs sampled instructions or
branches. In my experience this is confusing to users because it's
hard to correlate with real program behavior. The sampling makes
sense for tools like report that actually sample to reduce
the run time, but run time is normally not a problem for perf script.
It's better to give an accurate representation of the program flow.

Default perf script to output all calls for itrace. That's a much saner
default. The old behavior can be still requested with
perf script --itrace=ibxwpe100000

v2: Fix ETM build failure
v3: Really fix ETM build failure (Kim Phillips)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/itrace.txt |  7 ++++---
 tools/perf/builtin-script.c         |  5 ++++-
 tools/perf/util/auxtrace.c          | 17 ++++++++++++-----
 tools/perf/util/auxtrace.h          |  5 ++++-
 tools/perf/util/cs-etm.c            |  3 ++-
 tools/perf/util/intel-bts.c         |  3 ++-
 tools/perf/util/intel-pt.c          |  3 ++-
 7 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
index a3abe04c779d..c2182cbabde3 100644
--- a/tools/perf/Documentation/itrace.txt
+++ b/tools/perf/Documentation/itrace.txt
@@ -11,10 +11,11 @@
 		l	synthesize last branch entries (use with i or x)
 		s       skip initial number of events
 
-	The default is all events i.e. the same as --itrace=ibxwpe
+	The default is all events i.e. the same as --itrace=ibxwpe,
+	except for perf script where it is --itrace=ce
 
-	In addition, the period (default 100000) for instructions events
-	can be specified in units of:
+	In addition, the period (default 100000, except for perf script where it is 1)
+	for instructions events can be specified in units of:
 
 		i	instructions
 		t	ticks
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 311f5b53dd83..519ebb5a1f96 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -3128,7 +3128,10 @@ int cmd_script(int argc, const char **argv)
 	char *rec_script_path = NULL;
 	char *rep_script_path = NULL;
 	struct perf_session *session;
-	struct itrace_synth_opts itrace_synth_opts = { .set = false, };
+	struct itrace_synth_opts itrace_synth_opts = {
+		.set = false,
+		.default_no_sample = true,
+	};
 	char *script_path = NULL;
 	const char **__argv;
 	int i, j, err = 0;
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index c4617bcfd521..72d5ba2479bf 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -962,16 +962,23 @@ s64 perf_event__process_auxtrace(struct perf_session *session,
 #define PERF_ITRACE_DEFAULT_LAST_BRANCH_SZ	64
 #define PERF_ITRACE_MAX_LAST_BRANCH_SZ		1024
 
-void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts)
+void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts,
+				    bool no_sample)
 {
-	synth_opts->instructions = true;
 	synth_opts->branches = true;
 	synth_opts->transactions = true;
 	synth_opts->ptwrites = true;
 	synth_opts->pwr_events = true;
 	synth_opts->errors = true;
-	synth_opts->period_type = PERF_ITRACE_DEFAULT_PERIOD_TYPE;
-	synth_opts->period = PERF_ITRACE_DEFAULT_PERIOD;
+	if (no_sample) {
+		synth_opts->period_type = PERF_ITRACE_PERIOD_INSTRUCTIONS;
+		synth_opts->period = 1;
+		synth_opts->calls = true;
+	} else {
+		synth_opts->instructions = true;
+		synth_opts->period_type = PERF_ITRACE_DEFAULT_PERIOD_TYPE;
+		synth_opts->period = PERF_ITRACE_DEFAULT_PERIOD;
+	}
 	synth_opts->callchain_sz = PERF_ITRACE_DEFAULT_CALLCHAIN_SZ;
 	synth_opts->last_branch_sz = PERF_ITRACE_DEFAULT_LAST_BRANCH_SZ;
 	synth_opts->initial_skip = 0;
@@ -999,7 +1006,7 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 	}
 
 	if (!str) {
-		itrace_synth_opts__set_default(synth_opts);
+		itrace_synth_opts__set_default(synth_opts, false);
 		return 0;
 	}
 
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 0a6ce9c4fc11..f6df30187e1c 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -57,6 +57,7 @@ enum itrace_period_type {
 /**
  * struct itrace_synth_opts - AUX area tracing synthesis options.
  * @set: indicates whether or not options have been set
+ * @default_no_sample: Default to no sampling.
  * @inject: indicates the event (not just the sample) must be fully synthesized
  *          because 'perf inject' will write it out
  * @instructions: whether to synthesize 'instructions' events
@@ -81,6 +82,7 @@ enum itrace_period_type {
  */
 struct itrace_synth_opts {
 	bool			set;
+	bool			default_no_sample;
 	bool			inject;
 	bool			instructions;
 	bool			branches;
@@ -527,7 +529,8 @@ int perf_event__process_auxtrace_error(struct perf_session *session,
 				       union perf_event *event);
 int itrace_parse_synth_opts(const struct option *opt, const char *str,
 			    int unset);
-void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts);
+void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts,
+				    bool no_sample);
 
 size_t perf_event__fprintf_auxtrace_error(union perf_event *event, FILE *fp);
 void perf_session__auxtrace_error_inc(struct perf_session *session,
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 2ae640257fdb..3b37d66dc533 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1432,7 +1432,8 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 	if (session->itrace_synth_opts && session->itrace_synth_opts->set) {
 		etm->synth_opts = *session->itrace_synth_opts;
 	} else {
-		itrace_synth_opts__set_default(&etm->synth_opts);
+		itrace_synth_opts__set_default(&etm->synth_opts,
+				session->itrace_synth_opts->default_no_sample);
 		etm->synth_opts.callchain = false;
 	}
 
diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
index 7f0c83b6332b..3b3a3d55dca1 100644
--- a/tools/perf/util/intel-bts.c
+++ b/tools/perf/util/intel-bts.c
@@ -910,7 +910,8 @@ int intel_bts_process_auxtrace_info(union perf_event *event,
 	if (session->itrace_synth_opts && session->itrace_synth_opts->set) {
 		bts->synth_opts = *session->itrace_synth_opts;
 	} else {
-		itrace_synth_opts__set_default(&bts->synth_opts);
+		itrace_synth_opts__set_default(&bts->synth_opts,
+				session->itrace_synth_opts->default_no_sample);
 		if (session->itrace_synth_opts)
 			bts->synth_opts.thread_stack =
 				session->itrace_synth_opts->thread_stack;
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 48c1d415c6b0..ffa385a029b3 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -2559,7 +2559,8 @@ int intel_pt_process_auxtrace_info(union perf_event *event,
 	if (session->itrace_synth_opts && session->itrace_synth_opts->set) {
 		pt->synth_opts = *session->itrace_synth_opts;
 	} else {
-		itrace_synth_opts__set_default(&pt->synth_opts);
+		itrace_synth_opts__set_default(&pt->synth_opts,
+				session->itrace_synth_opts->default_no_sample);
 		if (use_browser != -1) {
 			pt->synth_opts.branches = false;
 			pt->synth_opts.callchain = true;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v6 3/5] tools, perf, script: Add --call-trace and --call-ret-trace
  2018-09-20 18:05 Make perf script easier to use for itrace Andi Kleen
  2018-09-20 18:05 ` [PATCH v6 1/5] tools, perf, script: Add --insn-trace for instruction decoding Andi Kleen
  2018-09-20 18:05 ` [PATCH v6 2/5] perf, tools, script: Make itrace script default to all calls Andi Kleen
@ 2018-09-20 18:05 ` Andi Kleen
  2018-09-28 10:23   ` leo.yan
  2018-10-26  7:46   ` [tip:perf/urgent] tools " tip-bot for Andi Kleen
  2018-09-20 18:05 ` [PATCH v6 4/5] tools, perf, script: Implement --graph-function Andi Kleen
  2018-09-20 18:05 ` [PATCH v6 5/5] perf, tools, script: Support total cycles count Andi Kleen
  4 siblings, 2 replies; 21+ messages in thread
From: Andi Kleen @ 2018-09-20 18:05 UTC (permalink / raw)
  To: acme; +Cc: jolsa, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add short cut options to print PT call trace and call-ret-trace,
for calls and call and returns. Roughly corresponds to ftrace
function tracer and function graph tracer.

Just makes these common use cases nicer to use.

% perf record -a -e intel_pt// sleep 1
% perf script --call-trace
	    perf   900 [000] 194167.205652203: ([kernel.kallsyms])          perf_pmu_enable
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])          __x86_indirect_thunk_rax
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])          event_filter_match
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])          group_sched_in
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])              __x86_indirect_thunk_rax
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])              event_sched_in.isra.107
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_event_set_state.part.71
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_time
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_pmu_disable
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_log_itrace_start
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  __x86_indirect_thunk_rax
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_userpage

% perf script --call-ret-trace
	    perf   900 [000] 194167.205652203:   tr strt     ([unknown])        pt_config
            perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            pt_config
            perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            pt_event_add
            perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            perf_pmu_enable
            perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            perf_pmu_nop_void
            perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            event_sched_in.isra.107
            perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            __x86_indirect_thunk_rax
            perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            perf_pmu_nop_int
            perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            group_sched_in
            perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            event_filter_match
            perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            event_filter_match
            perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            group_sched_in
            perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])                __x86_indirect_thunk_rax
            perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])                perf_pmu_nop_txn
            perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])                event_sched_in.isra.107
            perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])                    perf_event_set_state.part.71

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
v2: Print errors, power, ptwrite too
---
 tools/perf/Documentation/perf-script.txt |  7 +++++++
 tools/perf/builtin-script.c              | 24 ++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 00c655ab4968..805baabd238e 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -390,6 +390,13 @@ include::itrace.txt[]
 --xed::
 	Run xed disassembler on output. Requires installing the xed disassembler.
 
+--call-trace::
+	Show call stream for intel_pt traces. The CPUs are interleaved, but
+	can be filtered with -C.
+
+--call-ret-trace::
+	Show call and return stream for intel_pt traces.
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-script-perl[1],
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 519ebb5a1f96..6c4562973983 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -3119,6 +3119,26 @@ static int parse_xed(const struct option *opt __maybe_unused,
 	return 0;
 }
 
+static int parse_call_trace(const struct option *opt __maybe_unused,
+			    const char *str __maybe_unused,
+			    int unset __maybe_unused)
+{
+	parse_output_fields(NULL, "-ip,-addr,-event,-period,+callindent", 0);
+	itrace_parse_synth_opts(opt, "cewp", 0);
+	nanosecs = true;
+	return 0;
+}
+
+static int parse_callret_trace(const struct option *opt __maybe_unused,
+			    const char *str __maybe_unused,
+			    int unset __maybe_unused)
+{
+	parse_output_fields(NULL, "-ip,-addr,-event,-period,+callindent,+flags", 0);
+	itrace_parse_synth_opts(opt, "crewp", 0);
+	nanosecs = true;
+	return 0;
+}
+
 int cmd_script(int argc, const char **argv)
 {
 	bool show_full_info = false;
@@ -3210,6 +3230,10 @@ int cmd_script(int argc, const char **argv)
 			"Decode instructions from itrace", parse_insn_trace),
 	OPT_CALLBACK_OPTARG(0, "xed", NULL, NULL, NULL,
 			"Run xed disassembler on output", parse_xed),
+	OPT_CALLBACK_OPTARG(0, "call-trace", &itrace_synth_opts, NULL, NULL,
+			"Decode calls from from itrace", parse_call_trace),
+	OPT_CALLBACK_OPTARG(0, "call-ret-trace", &itrace_synth_opts, NULL, NULL,
+			"Decode calls and returns from itrace", parse_callret_trace),
 	OPT_STRING(0, "stop-bt", &symbol_conf.bt_stop_list_str, "symbol[,symbol...]",
 		   "Stop display of callgraph at these symbols"),
 	OPT_STRING('C', "cpu", &cpu_list, "cpu", "list of cpus to profile"),
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v6 4/5] tools, perf, script: Implement --graph-function
  2018-09-20 18:05 Make perf script easier to use for itrace Andi Kleen
                   ` (2 preceding siblings ...)
  2018-09-20 18:05 ` [PATCH v6 3/5] tools, perf, script: Add --call-trace and --call-ret-trace Andi Kleen
@ 2018-09-20 18:05 ` Andi Kleen
  2018-10-26  7:47   ` [tip:perf/urgent] perf " tip-bot for Andi Kleen
  2018-09-20 18:05 ` [PATCH v6 5/5] perf, tools, script: Support total cycles count Andi Kleen
  4 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2018-09-20 18:05 UTC (permalink / raw)
  To: acme; +Cc: jolsa, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add a ftrace style --graph-function argument to perf script that allows
to print itrace function calls only below a given function. This
makes it easier to find the code of interest in a large trace.

% perf record -e intel_pt//k -a sleep 1
% perf script --graph-function group_sched_in --call-trace
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])          group_sched_in
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])              __x86_indirect_thunk_rax
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])              event_sched_in.isra.107
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_event_set_state.part.71
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_time
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_pmu_disable
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_log_itrace_start
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  __x86_indirect_thunk_rax
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_userpage
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                          calc_timer_values
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              sched_clock_cpu
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                          __x86_indirect_thunk_rax
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                          arch_perf_update_userpage
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              __fentry__
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              using_native_sched_clock
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              sched_clock_stable
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_pmu_enable
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])              __x86_indirect_thunk_rax
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])          group_sched_in
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])              __x86_indirect_thunk_rax
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])              event_sched_in.isra.107
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                  perf_event_set_state.part.71
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                      perf_event_update_time
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                  perf_pmu_disable
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                  perf_log_itrace_start
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                  __x86_indirect_thunk_rax
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                      perf_event_update_userpage
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                          calc_timer_values
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                              sched_clock_cpu
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                          __x86_indirect_thunk_rax
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                          arch_perf_update_userpage
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                              __fentry__
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                              using_native_sched_clock
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                              sched_clock_stable

v2: Remove debug printout
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-script.txt |  4 +
 tools/perf/builtin-script.c              | 96 +++++++++++++++++++-----
 tools/perf/util/symbol.h                 |  3 +-
 tools/perf/util/thread.h                 |  2 +
 4 files changed, 86 insertions(+), 19 deletions(-)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 805baabd238e..a2b37ce48094 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -397,6 +397,10 @@ include::itrace.txt[]
 --call-ret-trace::
 	Show call and return stream for intel_pt traces.
 
+--graph-function::
+	For itrace only show specified functions and their callees for
+	itrace. Multiple functions can be separated by comma.
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-script-perl[1],
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 6c4562973983..6232658c6f31 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1105,6 +1105,35 @@ static int perf_sample__fprintf_addr(struct perf_sample *sample,
 	return printed;
 }
 
+static const char *resolve_branch_sym(struct perf_sample *sample,
+				      struct perf_evsel *evsel,
+				      struct thread *thread,
+				      struct addr_location *al,
+				      u64 *ip)
+{
+	struct addr_location addr_al;
+	struct perf_event_attr *attr = &evsel->attr;
+	const char *name = NULL;
+
+	if (sample->flags & (PERF_IP_FLAG_CALL | PERF_IP_FLAG_TRACE_BEGIN)) {
+		if (sample_addr_correlates_sym(attr)) {
+			thread__resolve(thread, &addr_al, sample);
+			if (addr_al.sym)
+				name = addr_al.sym->name;
+			else
+				*ip = sample->addr;
+		} else {
+			*ip = sample->addr;
+		}
+	} else if (sample->flags & (PERF_IP_FLAG_RETURN | PERF_IP_FLAG_TRACE_END)) {
+		if (al->sym)
+			name = al->sym->name;
+		else
+			*ip = sample->ip;
+	}
+	return name;
+}
+
 static int perf_sample__fprintf_callindent(struct perf_sample *sample,
 					   struct perf_evsel *evsel,
 					   struct thread *thread,
@@ -1112,7 +1141,6 @@ static int perf_sample__fprintf_callindent(struct perf_sample *sample,
 {
 	struct perf_event_attr *attr = &evsel->attr;
 	size_t depth = thread_stack__depth(thread);
-	struct addr_location addr_al;
 	const char *name = NULL;
 	static int spacing;
 	int len = 0;
@@ -1126,22 +1154,7 @@ static int perf_sample__fprintf_callindent(struct perf_sample *sample,
 	if (thread->ts && sample->flags & PERF_IP_FLAG_RETURN)
 		depth += 1;
 
-	if (sample->flags & (PERF_IP_FLAG_CALL | PERF_IP_FLAG_TRACE_BEGIN)) {
-		if (sample_addr_correlates_sym(attr)) {
-			thread__resolve(thread, &addr_al, sample);
-			if (addr_al.sym)
-				name = addr_al.sym->name;
-			else
-				ip = sample->addr;
-		} else {
-			ip = sample->addr;
-		}
-	} else if (sample->flags & (PERF_IP_FLAG_RETURN | PERF_IP_FLAG_TRACE_END)) {
-		if (al->sym)
-			name = al->sym->name;
-		else
-			ip = sample->ip;
-	}
+	name = resolve_branch_sym(sample, evsel, thread, al, &ip);
 
 	if (PRINT_FIELD(DSO) && !(PRINT_FIELD(IP) || PRINT_FIELD(ADDR))) {
 		dlen += fprintf(fp, "(");
@@ -1647,6 +1660,47 @@ static void perf_sample__fprint_metric(struct perf_script *script,
 	}
 }
 
+static bool show_event(struct perf_sample *sample,
+		       struct perf_evsel *evsel,
+		       struct thread *thread,
+		       struct addr_location *al)
+{
+	int depth = thread_stack__depth(thread);
+
+	if (!symbol_conf.graph_function)
+		return true;
+
+	if (thread->filter) {
+		if (depth <= thread->filter_entry_depth) {
+			thread->filter = false;
+			return false;
+		}
+		return true;
+	} else {
+		const char *s = symbol_conf.graph_function;
+		u64 ip;
+		const char *name = resolve_branch_sym(sample, evsel, thread, al,
+				&ip);
+		unsigned nlen;
+
+		if (!name)
+			return false;
+		nlen = strlen(name);
+		while (*s) {
+			unsigned len = strcspn(s, ",");
+			if (nlen == len && !strncmp(name, s, len)) {
+				thread->filter = true;
+				thread->filter_entry_depth = depth;
+				return true;
+			}
+			s += len;
+			if (*s == ',')
+				s++;
+		}
+		return false;
+	}
+}
+
 static void process_event(struct perf_script *script,
 			  struct perf_sample *sample, struct perf_evsel *evsel,
 			  struct addr_location *al,
@@ -1661,6 +1715,9 @@ static void process_event(struct perf_script *script,
 	if (output[type].fields == 0)
 		return;
 
+	if (!show_event(sample, evsel, thread, al))
+		return;
+
 	++es->samples;
 
 	perf_sample__fprintf_start(sample, thread, evsel,
@@ -3234,6 +3291,8 @@ int cmd_script(int argc, const char **argv)
 			"Decode calls from from itrace", parse_call_trace),
 	OPT_CALLBACK_OPTARG(0, "call-ret-trace", &itrace_synth_opts, NULL, NULL,
 			"Decode calls and returns from itrace", parse_callret_trace),
+	OPT_STRING(0, "graph-function", &symbol_conf.graph_function, "symbol[,symbol...]",
+			"Only print symbols and callees with --call-trace/--call-ret-trace"),
 	OPT_STRING(0, "stop-bt", &symbol_conf.bt_stop_list_str, "symbol[,symbol...]",
 		   "Stop display of callgraph at these symbols"),
 	OPT_STRING('C', "cpu", &cpu_list, "cpu", "list of cpus to profile"),
@@ -3489,7 +3548,8 @@ int cmd_script(int argc, const char **argv)
 	script.session = session;
 	script__setup_sample_type(&script);
 
-	if (output[PERF_TYPE_HARDWARE].fields & PERF_OUTPUT_CALLINDENT)
+	if ((output[PERF_TYPE_HARDWARE].fields & PERF_OUTPUT_CALLINDENT) ||
+	    symbol_conf.graph_function)
 		itrace_synth_opts.thread_stack = true;
 
 	session->itrace_synth_opts = &itrace_synth_opts;
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index f25fae4b5743..d726a8a7bb1b 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -123,7 +123,8 @@ struct symbol_conf {
 	const char	*vmlinux_name,
 			*kallsyms_name,
 			*source_prefix,
-			*field_sep;
+			*field_sep,
+			*graph_function;
 	const char	*default_guest_vmlinux_name,
 			*default_guest_kallsyms,
 			*default_guest_modules;
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 07606aa6998d..36c09a9904e6 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -42,6 +42,8 @@ struct thread {
 	void				*addr_space;
 	struct unwind_libunwind_ops	*unwind_libunwind_ops;
 #endif
+	bool			filter;
+	int			filter_entry_depth;
 };
 
 struct machine;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v6 5/5] perf, tools, script: Support total cycles count
  2018-09-20 18:05 Make perf script easier to use for itrace Andi Kleen
                   ` (3 preceding siblings ...)
  2018-09-20 18:05 ` [PATCH v6 4/5] tools, perf, script: Implement --graph-function Andi Kleen
@ 2018-09-20 18:05 ` Andi Kleen
  2018-09-24 17:07   ` Andi Kleen
  4 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2018-09-20 18:05 UTC (permalink / raw)
  To: acme; +Cc: jolsa, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

For perf script brstackinsn also print a running cycles count.
This makes it easier to calculate cycle deltas for code sections
measured with LBRs.

% perf record -b -a sleep 1
% perf script -F +brstackinsn
...
        _dl_sysdep_start+330:
        00007eff9f20583a        insn: 75 c4                     # PRED 24 cycles [24]
        00007eff9f205800        insn: 48 83 e8 03
        00007eff9f205804        insn: 48 83 f8 1e
        00007eff9f205808        insn: 77 26
        00007eff9f20580a        insn: 48 63 04 81
        00007eff9f20580e        insn: 48 01 c8
        00007eff9f205811        insn: ff e0                     # MISPRED 31 cycles [7] 0.71 IPC
        00007eff9f2059c0        insn: 44 8b 62 08
        00007eff9f2059c4        insn: e9 67 fe ff ff            # PRED 55 cycles [24] 0.04 IPC
        00007eff9f205830        insn: 48 83 c2 10
        00007eff9f205834        insn: 48 8b 02
        00007eff9f205837        insn: 48 85 c0
        00007eff9f20583a        insn: 75 c4                     # PRED 68 cycles [13] 0.23 IPC

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---

v2: reflow line
---
 tools/perf/builtin-script.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 6232658c6f31..53dc27e63d98 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -913,7 +913,7 @@ static int grab_bb(u8 *buffer, u64 start, u64 end,
 
 static int ip__fprintf_jump(uint64_t ip, struct branch_entry *en,
 			    struct perf_insn *x, u8 *inbuf, int len,
-			    int insn, FILE *fp)
+			    int insn, FILE *fp, int *total_cycles)
 {
 	int printed = fprintf(fp, "\t%016" PRIx64 "\t%-30s\t#%s%s%s%s", ip,
 			      dump_insn(x, ip, inbuf, len, NULL),
@@ -922,7 +922,8 @@ static int ip__fprintf_jump(uint64_t ip, struct branch_entry *en,
 			      en->flags.in_tx ? " INTX" : "",
 			      en->flags.abort ? " ABORT" : "");
 	if (en->flags.cycles) {
-		printed += fprintf(fp, " %d cycles", en->flags.cycles);
+		*total_cycles += en->flags.cycles;
+		printed += fprintf(fp, " %d cycles [%d]", *total_cycles, en->flags.cycles);
 		if (insn)
 			printed += fprintf(fp, " %.2f IPC", (float)insn / en->flags.cycles);
 	}
@@ -979,6 +980,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 	u8 buffer[MAXBB];
 	unsigned off;
 	struct symbol *lastsym = NULL;
+	int total_cycles = 0;
 
 	if (!(br && br->nr))
 		return 0;
@@ -999,7 +1001,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 		printed += ip__fprintf_sym(br->entries[nr - 1].from, thread,
 					   x.cpumode, x.cpu, &lastsym, attr, fp);
 		printed += ip__fprintf_jump(br->entries[nr - 1].from, &br->entries[nr - 1],
-					    &x, buffer, len, 0, fp);
+					    &x, buffer, len, 0, fp, &total_cycles);
 	}
 
 	/* Print all blocks */
@@ -1027,7 +1029,8 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 
 			printed += ip__fprintf_sym(ip, thread, x.cpumode, x.cpu, &lastsym, attr, fp);
 			if (ip == end) {
-				printed += ip__fprintf_jump(ip, &br->entries[i], &x, buffer + off, len - off, insn, fp);
+				printed += ip__fprintf_jump(ip, &br->entries[i], &x, buffer + off, len - off, insn, fp,
+							    &total_cycles);
 				break;
 			} else {
 				printed += fprintf(fp, "\t%016" PRIx64 "\t%s\n", ip,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v6 5/5] perf, tools, script: Support total cycles count
  2018-09-20 18:05 ` [PATCH v6 5/5] perf, tools, script: Support total cycles count Andi Kleen
@ 2018-09-24 17:07   ` Andi Kleen
  2018-10-26  7:47     ` [tip:perf/urgent] perf " tip-bot for Andi Kleen
  0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2018-09-24 17:07 UTC (permalink / raw)
  To: Andi Kleen; +Cc: acme, jolsa, linux-kernel

[Updated patch with a bug fix. Please use this version.]

For perf script brstackinsn also print a running cycles count.
This makes it easier to calculate cycle deltas for code sections
measured with LBRs.

% perf record -b -a sleep 1
% perf script -F +brstackinsn
...
        00007f73ecc41083        insn: 74 06                     # PRED 9 cycles [17] 1.11 IPC
        00007f73ecc4108b        insn: a8 10
        00007f73ecc4108d        insn: 74 71                     # PRED 1 cycles [18] 1.00 IPC
        00007f73ecc41100        insn: 48 8b 46 10
        00007f73ecc41104        insn: 4c 8b 38
        00007f73ecc41107        insn: 4d 85 ff
        00007f73ecc4110a        insn: 0f 84 b0 00 00 00
        00007f73ecc41110        insn: 83 43 58 01
        00007f73ecc41114        insn: 48 89 df
        00007f73ecc41117        insn: e8 94 73 04 00            # PRED 6 cycles [24] 1.00 IPC

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---

v2: reflow line
v3: Print cycles in correct column

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 6232658c6f31..dbb0a780225c 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -913,7 +913,7 @@ static int grab_bb(u8 *buffer, u64 start, u64 end,
 
 static int ip__fprintf_jump(uint64_t ip, struct branch_entry *en,
 			    struct perf_insn *x, u8 *inbuf, int len,
-			    int insn, FILE *fp)
+			    int insn, FILE *fp, int *total_cycles)
 {
 	int printed = fprintf(fp, "\t%016" PRIx64 "\t%-30s\t#%s%s%s%s", ip,
 			      dump_insn(x, ip, inbuf, len, NULL),
@@ -922,7 +922,8 @@ static int ip__fprintf_jump(uint64_t ip, struct branch_entry *en,
 			      en->flags.in_tx ? " INTX" : "",
 			      en->flags.abort ? " ABORT" : "");
 	if (en->flags.cycles) {
-		printed += fprintf(fp, " %d cycles", en->flags.cycles);
+		*total_cycles += en->flags.cycles;
+		printed += fprintf(fp, " %d cycles [%d]", en->flags.cycles, *total_cycles);
 		if (insn)
 			printed += fprintf(fp, " %.2f IPC", (float)insn / en->flags.cycles);
 	}
@@ -979,6 +980,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 	u8 buffer[MAXBB];
 	unsigned off;
 	struct symbol *lastsym = NULL;
+	int total_cycles = 0;
 
 	if (!(br && br->nr))
 		return 0;
@@ -999,7 +1001,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 		printed += ip__fprintf_sym(br->entries[nr - 1].from, thread,
 					   x.cpumode, x.cpu, &lastsym, attr, fp);
 		printed += ip__fprintf_jump(br->entries[nr - 1].from, &br->entries[nr - 1],
-					    &x, buffer, len, 0, fp);
+					    &x, buffer, len, 0, fp, &total_cycles);
 	}
 
 	/* Print all blocks */
@@ -1027,7 +1029,8 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 
 			printed += ip__fprintf_sym(ip, thread, x.cpumode, x.cpu, &lastsym, attr, fp);
 			if (ip == end) {
-				printed += ip__fprintf_jump(ip, &br->entries[i], &x, buffer + off, len - off, insn, fp);
+				printed += ip__fprintf_jump(ip, &br->entries[i], &x, buffer + off, len - off, insn, fp,
+							    &total_cycles);
 				break;
 			} else {
 				printed += fprintf(fp, "\t%016" PRIx64 "\t%s\n", ip,

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 3/5] tools, perf, script: Add --call-trace and --call-ret-trace
  2018-09-20 18:05 ` [PATCH v6 3/5] tools, perf, script: Add --call-trace and --call-ret-trace Andi Kleen
@ 2018-09-28 10:23   ` leo.yan
  2018-09-28 17:19     ` Andi Kleen
  2018-10-26  7:46   ` [tip:perf/urgent] tools " tip-bot for Andi Kleen
  1 sibling, 1 reply; 21+ messages in thread
From: leo.yan @ 2018-09-28 10:23 UTC (permalink / raw)
  To: Andi Kleen; +Cc: acme, jolsa, linux-kernel, Andi Kleen, Mathieu Poirier

Hi Andi,

On Thu, Sep 20, 2018 at 11:05:38AM -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Add short cut options to print PT call trace and call-ret-trace,
> for calls and call and returns. Roughly corresponds to ftrace
> function tracer and function graph tracer.
> 
> Just makes these common use cases nicer to use.
> 
> % perf record -a -e intel_pt// sleep 1
> % perf script --call-trace
> 	    perf   900 [000] 194167.205652203: ([kernel.kallsyms])          perf_pmu_enable
>             perf   900 [000] 194167.205652203: ([kernel.kallsyms])          __x86_indirect_thunk_rax
>             perf   900 [000] 194167.205652203: ([kernel.kallsyms])          event_filter_match
>             perf   900 [000] 194167.205652203: ([kernel.kallsyms])          group_sched_in
>             perf   900 [000] 194167.205652203: ([kernel.kallsyms])              __x86_indirect_thunk_rax
>             perf   900 [000] 194167.205652203: ([kernel.kallsyms])              event_sched_in.isra.107
>             perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_event_set_state.part.71
>             perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_time
>             perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_pmu_disable
>             perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_log_itrace_start
>             perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  __x86_indirect_thunk_rax
>             perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_userpage
> 
> % perf script --call-ret-trace
> 	    perf   900 [000] 194167.205652203:   tr strt     ([unknown])        pt_config
>             perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            pt_config
>             perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            pt_event_add
>             perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            perf_pmu_enable
>             perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            perf_pmu_nop_void
>             perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            event_sched_in.isra.107
>             perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            __x86_indirect_thunk_rax
>             perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            perf_pmu_nop_int
>             perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            group_sched_in
>             perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            event_filter_match
>             perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            event_filter_match
>             perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            group_sched_in
>             perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])                __x86_indirect_thunk_rax
>             perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])                perf_pmu_nop_txn
>             perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])                event_sched_in.isra.107
>             perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])                    perf_event_set_state.part.71
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
> v2: Print errors, power, ptwrite too
> ---
>  tools/perf/Documentation/perf-script.txt |  7 +++++++
>  tools/perf/builtin-script.c              | 24 ++++++++++++++++++++++++
>  2 files changed, 31 insertions(+)
> 
> diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
> index 00c655ab4968..805baabd238e 100644
> --- a/tools/perf/Documentation/perf-script.txt
> +++ b/tools/perf/Documentation/perf-script.txt
> @@ -390,6 +390,13 @@ include::itrace.txt[]
>  --xed::
>  	Run xed disassembler on output. Requires installing the xed disassembler.
>  
> +--call-trace::
> +	Show call stream for intel_pt traces. The CPUs are interleaved, but
> +	can be filtered with -C.
> +
> +--call-ret-trace::
> +	Show call and return stream for intel_pt traces.

Seems to me, these two features are _NOT_ only benefit for intel_pt,
other hardware tracing (e.g. Arm CoreSight) can enable these features
as well.  This patch is to document only for intel_pt, later if we
enable this feature on Arm platform we need to change the doc;
alternatively we can use more general description for these two options
at the first place.  How about you think for this?

Except this question, this patch looks good for me.

Thanks,
Leo Yan

>  SEE ALSO
>  --------
>  linkperf:perf-record[1], linkperf:perf-script-perl[1],
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index 519ebb5a1f96..6c4562973983 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -3119,6 +3119,26 @@ static int parse_xed(const struct option *opt __maybe_unused,
>  	return 0;
>  }
>  
> +static int parse_call_trace(const struct option *opt __maybe_unused,
> +			    const char *str __maybe_unused,
> +			    int unset __maybe_unused)
> +{
> +	parse_output_fields(NULL, "-ip,-addr,-event,-period,+callindent", 0);
> +	itrace_parse_synth_opts(opt, "cewp", 0);
> +	nanosecs = true;
> +	return 0;
> +}
> +
> +static int parse_callret_trace(const struct option *opt __maybe_unused,
> +			    const char *str __maybe_unused,
> +			    int unset __maybe_unused)
> +{
> +	parse_output_fields(NULL, "-ip,-addr,-event,-period,+callindent,+flags", 0);
> +	itrace_parse_synth_opts(opt, "crewp", 0);
> +	nanosecs = true;
> +	return 0;
> +}
> +
>  int cmd_script(int argc, const char **argv)
>  {
>  	bool show_full_info = false;
> @@ -3210,6 +3230,10 @@ int cmd_script(int argc, const char **argv)
>  			"Decode instructions from itrace", parse_insn_trace),
>  	OPT_CALLBACK_OPTARG(0, "xed", NULL, NULL, NULL,
>  			"Run xed disassembler on output", parse_xed),
> +	OPT_CALLBACK_OPTARG(0, "call-trace", &itrace_synth_opts, NULL, NULL,
> +			"Decode calls from from itrace", parse_call_trace),
> +	OPT_CALLBACK_OPTARG(0, "call-ret-trace", &itrace_synth_opts, NULL, NULL,
> +			"Decode calls and returns from itrace", parse_callret_trace),
>  	OPT_STRING(0, "stop-bt", &symbol_conf.bt_stop_list_str, "symbol[,symbol...]",
>  		   "Stop display of callgraph at these symbols"),
>  	OPT_STRING('C', "cpu", &cpu_list, "cpu", "list of cpus to profile"),

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 3/5] tools, perf, script: Add --call-trace and --call-ret-trace
  2018-09-28 10:23   ` leo.yan
@ 2018-09-28 17:19     ` Andi Kleen
  2018-09-29  7:39       ` leo.yan
  0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2018-09-28 17:19 UTC (permalink / raw)
  To: leo.yan
  Cc: Andi Kleen, acme, jolsa, linux-kernel, Andi Kleen, Mathieu Poirier

> Seems to me, these two features are _NOT_ only benefit for intel_pt,
> other hardware tracing (e.g. Arm CoreSight) can enable these features
> as well.  This patch is to document only for intel_pt, later if we
> enable this feature on Arm platform we need to change the doc;
> alternatively we can use more general description for these two options
> at the first place.  How about you think for this?

Likely it already works for CoreSight

I specified intel_pt, because if we just say traces the users won't
know what PMU to specify for record. Being too abstract is
often not helpful.

If someone successfully tests it on CoreSight they could submit
a patch to the documentation to add "or <coresightpmu>" to these
two cases. That would make it then clear for those users too.

-Andi

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 3/5] tools, perf, script: Add --call-trace and --call-ret-trace
  2018-09-28 17:19     ` Andi Kleen
@ 2018-09-29  7:39       ` leo.yan
  2018-10-02 14:24         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 21+ messages in thread
From: leo.yan @ 2018-09-29  7:39 UTC (permalink / raw)
  To: Andi Kleen; +Cc: acme, jolsa, linux-kernel, Andi Kleen, Mathieu Poirier

On Fri, Sep 28, 2018 at 10:19:44AM -0700, Andi Kleen wrote:
> > Seems to me, these two features are _NOT_ only benefit for intel_pt,
> > other hardware tracing (e.g. Arm CoreSight) can enable these features
> > as well.  This patch is to document only for intel_pt, later if we
> > enable this feature on Arm platform we need to change the doc;
> > alternatively we can use more general description for these two options
> > at the first place.  How about you think for this?
> 
> Likely it already works for CoreSight

I think Kim played with this patch series and he also pointed me for
this series.

> I specified intel_pt, because if we just say traces the users won't
> know what PMU to specify for record. Being too abstract is
> often not helpful.
> 
> If someone successfully tests it on CoreSight they could submit
> a patch to the documentation to add "or <coresightpmu>" to these
> two cases. That would make it then clear for those users too.

Okay, agree.

Actually I applied your patch series v6 on the perf latest core branch
and tested on Arm Juno board, I observed there have couple issues, one
is CoreSight trace data doesn't support timestamp so I need to use
'-F,-time' to workaround the command failure; another issue is now
CoreSight is absent to set sample flags so perf fails to resolve
symbols [1]; these two issues are only related with CoreSight decoder
and it's no matter with this patch, so I didn't mention in my previous
replying.

I need a bit more time to work out more formal CoreSight fixing patches
and will send for reviewing (also will include one patch to clarifying
Arm Coresight support in doc as suggested).

Thanks,
Leo Yan

[1] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/builtin-script.c?h=perf/core#n1128

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 3/5] tools, perf, script: Add --call-trace and --call-ret-trace
  2018-09-29  7:39       ` leo.yan
@ 2018-10-02 14:24         ` Arnaldo Carvalho de Melo
  2018-10-02 23:09           ` leo.yan
  0 siblings, 1 reply; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-10-02 14:24 UTC (permalink / raw)
  To: leo.yan; +Cc: Andi Kleen, jolsa, linux-kernel, Andi Kleen, Mathieu Poirier

Em Sat, Sep 29, 2018 at 03:39:03PM +0800, leo.yan@linaro.org escreveu:
> On Fri, Sep 28, 2018 at 10:19:44AM -0700, Andi Kleen wrote:
> > > Seems to me, these two features are _NOT_ only benefit for intel_pt,
> > > other hardware tracing (e.g. Arm CoreSight) can enable these features
> > > as well.  This patch is to document only for intel_pt, later if we
> > > enable this feature on Arm platform we need to change the doc;
> > > alternatively we can use more general description for these two options
> > > at the first place.  How about you think for this?
> > 
> > Likely it already works for CoreSight
> 
> I think Kim played with this patch series and he also pointed me for
> this series.
> 
> > I specified intel_pt, because if we just say traces the users won't
> > know what PMU to specify for record. Being too abstract is
> > often not helpful.
> > 
> > If someone successfully tests it on CoreSight they could submit
> > a patch to the documentation to add "or <coresightpmu>" to these
> > two cases. That would make it then clear for those users too.
> 
> Okay, agree.
> 
> Actually I applied your patch series v6 on the perf latest core branch
> and tested on Arm Juno board, I observed there have couple issues, one
> is CoreSight trace data doesn't support timestamp so I need to use
> '-F,-time' to workaround the command failure; another issue is now
> CoreSight is absent to set sample flags so perf fails to resolve
> symbols [1]; these two issues are only related with CoreSight decoder
> and it's no matter with this patch, so I didn't mention in my previous
> replying.

Could I take that as a Tested-by? I.e. you actually applied the patches,
run it and saw that it works as advertised, right?

- Arnaldo
 
> I need a bit more time to work out more formal CoreSight fixing patches
> and will send for reviewing (also will include one patch to clarifying
> Arm Coresight support in doc as suggested).
> 
> Thanks,
> Leo Yan
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/builtin-script.c?h=perf/core#n1128

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 3/5] tools, perf, script: Add --call-trace and --call-ret-trace
  2018-10-02 14:24         ` Arnaldo Carvalho de Melo
@ 2018-10-02 23:09           ` leo.yan
  0 siblings, 0 replies; 21+ messages in thread
From: leo.yan @ 2018-10-02 23:09 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Andi Kleen, jolsa, linux-kernel, Andi Kleen, Mathieu Poirier

On Tue, Oct 02, 2018 at 11:24:56AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Sat, Sep 29, 2018 at 03:39:03PM +0800, leo.yan@linaro.org escreveu:
> > On Fri, Sep 28, 2018 at 10:19:44AM -0700, Andi Kleen wrote:
> > > > Seems to me, these two features are _NOT_ only benefit for intel_pt,
> > > > other hardware tracing (e.g. Arm CoreSight) can enable these features
> > > > as well.  This patch is to document only for intel_pt, later if we
> > > > enable this feature on Arm platform we need to change the doc;
> > > > alternatively we can use more general description for these two options
> > > > at the first place.  How about you think for this?
> > > 
> > > Likely it already works for CoreSight
> > 
> > I think Kim played with this patch series and he also pointed me for
> > this series.
> > 
> > > I specified intel_pt, because if we just say traces the users won't
> > > know what PMU to specify for record. Being too abstract is
> > > often not helpful.
> > > 
> > > If someone successfully tests it on CoreSight they could submit
> > > a patch to the documentation to add "or <coresightpmu>" to these
> > > two cases. That would make it then clear for those users too.
> > 
> > Okay, agree.
> > 
> > Actually I applied your patch series v6 on the perf latest core branch
> > and tested on Arm Juno board, I observed there have couple issues, one
> > is CoreSight trace data doesn't support timestamp so I need to use
> > '-F,-time' to workaround the command failure; another issue is now
> > CoreSight is absent to set sample flags so perf fails to resolve
> > symbols [1]; these two issues are only related with CoreSight decoder
> > and it's no matter with this patch, so I didn't mention in my previous
> > replying.
> 
> Could I take that as a Tested-by? I.e. you actually applied the patches,
> run it and saw that it works as advertised, right?

Yes, Arnaldo.  This is my testing tag:

Tested-by: Leo Yan <leo.yan@linaro.org>

> - Arnaldo
>  
> > I need a bit more time to work out more formal CoreSight fixing patches
> > and will send for reviewing (also will include one patch to clarifying
> > Arm Coresight support in doc as suggested).
> > 
> > Thanks,
> > Leo Yan
> > 
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/builtin-script.c?h=perf/core#n1128

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 1/5] tools, perf, script: Add --insn-trace for instruction decoding
  2018-09-20 18:05 ` [PATCH v6 1/5] tools, perf, script: Add --insn-trace for instruction decoding Andi Kleen
@ 2018-10-08 17:50   ` Arnaldo Carvalho de Melo
  2018-10-08 18:03   ` Arnaldo Carvalho de Melo
  2018-10-26  7:45   ` [tip:perf/urgent] perf " tip-bot for Andi Kleen
  2 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-10-08 17:50 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jolsa, linux-kernel, Andi Kleen

Em Thu, Sep 20, 2018 at 11:05:36AM -0700, Andi Kleen escreveu:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Add a --insn-trace short hand option for decoding and disassembling
> instruction streams for intel_pt. This automatically pipes the
> output into the xed disassembler to generate disassembled instructions.
> This just makes this use model much nicer to use
> 
> Before
> 
> % perf record -e intel_pt// ...
> % perf script --itrace=i0ns --ns -F +insn,-event,-period | xed -F insn: -A -64
>    swapper     0 [000] 117276.429606186:  ffffffff81010486 pt_config ([kernel.kallsyms])          nopl  %eax, (%rax,%rax,1)
>    swapper     0 [000] 117276.429606186:  ffffffff8101048b pt_config ([kernel.kallsyms])          add $0x10, %rsp
>    swapper     0 [000] 117276.429606186:  ffffffff8101048f pt_config ([kernel.kallsyms])          popq  %rbx
>    swapper     0 [000] 117276.429606186:  ffffffff81010490 pt_config ([kernel.kallsyms])          popq  %rbp
>    swapper     0 [000] 117276.429606186:  ffffffff81010491 pt_config ([kernel.kallsyms])          popq  %r12
>    swapper     0 [000] 117276.429606186:  ffffffff81010493 pt_config ([kernel.kallsyms])          popq  %r13
>    swapper     0 [000] 117276.429606186:  ffffffff81010495 pt_config ([kernel.kallsyms])          popq  %r14
>    swapper     0 [000] 117276.429606186:  ffffffff81010497 pt_config ([kernel.kallsyms])          popq  %r15
>    swapper     0 [000] 117276.429606186:  ffffffff81010499 pt_config ([kernel.kallsyms])          retq
>    swapper     0 [000] 117276.429606186:  ffffffff8101063e pt_event_add ([kernel.kallsyms])               cmpl  $0x1, 0x1b0(%rbx)
>    swapper     0 [000] 117276.429606186:  ffffffff81010645 pt_event_add ([kernel.kallsyms])               mov $0xffffffea, %eax
>    swapper     0 [000] 117276.429606186:  ffffffff8101064a pt_event_add ([kernel.kallsyms])               mov $0x0, %edx
>    swapper     0 [000] 117276.429606186:  ffffffff8101064f pt_event_add ([kernel.kallsyms])               popq  %rbx
>    swapper     0 [000] 117276.429606186:  ffffffff81010650 pt_event_add ([kernel.kallsyms])               cmovnz %edx, %eax
>    swapper     0 [000] 117276.429606186:  ffffffff81010653 pt_event_add ([kernel.kallsyms])               jmp 0xffffffff81010635
>    swapper     0 [000] 117276.429606186:  ffffffff81010635 pt_event_add ([kernel.kallsyms])               retq
>    swapper     0 [000] 117276.429606186:  ffffffff8115e687 event_sched_in.isra.107 ([kernel.kallsyms])            test %eax, %eax
> 
> Now
> 
> % perf record -e intel_pt// ...
> % perf script --insn-trace --xed
> ... same output ...
> 
> XED needs to be installed with:
> 
> > git clone https://github.com/intelxed/mbuild.git mbuild
> > git clone https://github.com/intelxed/xed
> > cd xed
> > mkdir obj
> > cd obj
> > ../mfile.py
> > sudo ../mfile.py --prefix=/usr/local install
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>

Fixed up whitespace damage in the Documentation for building xed,
applied.

- Arnaldo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 1/5] tools, perf, script: Add --insn-trace for instruction decoding
  2018-09-20 18:05 ` [PATCH v6 1/5] tools, perf, script: Add --insn-trace for instruction decoding Andi Kleen
  2018-10-08 17:50   ` Arnaldo Carvalho de Melo
@ 2018-10-08 18:03   ` Arnaldo Carvalho de Melo
  2018-10-08 18:41     ` Andi Kleen
  2018-10-26  7:45   ` [tip:perf/urgent] perf " tip-bot for Andi Kleen
  2 siblings, 1 reply; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-10-08 18:03 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jolsa, linux-kernel, Andi Kleen

Em Thu, Sep 20, 2018 at 11:05:36AM -0700, Andi Kleen escreveu:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Add a --insn-trace short hand option for decoding and disassembling
> instruction streams for intel_pt. This automatically pipes the
> output into the xed disassembler to generate disassembled instructions.
> This just makes this use model much nicer to use
> 
> Before
> 
> % perf record -e intel_pt// ...
> % perf script --itrace=i0ns --ns -F +insn,-event,-period | xed -F insn: -A -64
>    swapper     0 [000] 117276.429606186:  ffffffff81010486 pt_config ([kernel.kallsyms])          nopl  %eax, (%rax,%rax,1)
>    swapper     0 [000] 117276.429606186:  ffffffff8101048b pt_config ([kernel.kallsyms])          add $0x10, %rsp
>    swapper     0 [000] 117276.429606186:  ffffffff8101048f pt_config ([kernel.kallsyms])          popq  %rbx
>    swapper     0 [000] 117276.429606186:  ffffffff81010490 pt_config ([kernel.kallsyms])          popq  %rbp
>    swapper     0 [000] 117276.429606186:  ffffffff81010491 pt_config ([kernel.kallsyms])          popq  %r12
>    swapper     0 [000] 117276.429606186:  ffffffff81010493 pt_config ([kernel.kallsyms])          popq  %r13
>    swapper     0 [000] 117276.429606186:  ffffffff81010495 pt_config ([kernel.kallsyms])          popq  %r14
>    swapper     0 [000] 117276.429606186:  ffffffff81010497 pt_config ([kernel.kallsyms])          popq  %r15
>    swapper     0 [000] 117276.429606186:  ffffffff81010499 pt_config ([kernel.kallsyms])          retq
>    swapper     0 [000] 117276.429606186:  ffffffff8101063e pt_event_add ([kernel.kallsyms])               cmpl  $0x1, 0x1b0(%rbx)
>    swapper     0 [000] 117276.429606186:  ffffffff81010645 pt_event_add ([kernel.kallsyms])               mov $0xffffffea, %eax
>    swapper     0 [000] 117276.429606186:  ffffffff8101064a pt_event_add ([kernel.kallsyms])               mov $0x0, %edx
>    swapper     0 [000] 117276.429606186:  ffffffff8101064f pt_event_add ([kernel.kallsyms])               popq  %rbx
>    swapper     0 [000] 117276.429606186:  ffffffff81010650 pt_event_add ([kernel.kallsyms])               cmovnz %edx, %eax
>    swapper     0 [000] 117276.429606186:  ffffffff81010653 pt_event_add ([kernel.kallsyms])               jmp 0xffffffff81010635
>    swapper     0 [000] 117276.429606186:  ffffffff81010635 pt_event_add ([kernel.kallsyms])               retq
>    swapper     0 [000] 117276.429606186:  ffffffff8115e687 event_sched_in.isra.107 ([kernel.kallsyms])            test %eax, %eax
> 
> Now
> 
> % perf record -e intel_pt// ...
> % perf script --insn-trace --xed
> ... same output ...
> 
> XED needs to be installed with:
> 
> > git clone https://github.com/intelxed/mbuild.git mbuild
> > git clone https://github.com/intelxed/xed
> > cd xed
> > mkdir obj
> > cd obj
> > ../mfile.py
> > sudo ../mfile.py --prefix=/usr/local install

I tried following these instructions, but in the end these are the only
files installed in /usr/local:

[acme@jouet obj]$ find /usr/local -name "*xed*"
/usr/local/include/xed
/usr/local/include/xed/xed-reg-class.h
/usr/local/include/xed/xed-reg-role-enum.h
/usr/local/include/xed/xed-format-options.h
/usr/local/include/xed/xed-encoder-hl.h
/usr/local/include/xed/xed-ild.h
/usr/local/include/xed/xed-encode.h
/usr/local/include/xed/xed-immed.h
/usr/local/include/xed/xed-init-pointer-names.h
/usr/local/include/xed/xed-operand-accessors.h
/usr/local/include/xed/xed-operand-enum.h
/usr/local/include/xed/xed-flags.h
/usr/local/include/xed/xed-operand-action.h
/usr/local/include/xed/xed-print-info.h
/usr/local/include/xed/xed-chip-features.h
/usr/local/include/xed/xed-types.h
/usr/local/include/xed/xed-common-hdrs.h
/usr/local/include/xed/xed-decoded-inst-api.h
/usr/local/include/xed/xed-inst.h
/usr/local/include/xed/xed-address-width-enum.h
/usr/local/include/xed/xed-state.h
/usr/local/include/xed/xed-decoded-inst.h
/usr/local/include/xed/xed-isa-set.h
/usr/local/include/xed/xed-reg-enum.h
/usr/local/include/xed/xed-attribute-enum.h
/usr/local/include/xed/xed-operand-ctype-enum.h
/usr/local/include/xed/xed-extension-enum.h
/usr/local/include/xed/xed-iformfl-enum.h
/usr/local/include/xed/xed-convert-table-init.h
/usr/local/include/xed/xed-immdis.h
/usr/local/include/xed/xed-iclass-enum.h
/usr/local/include/xed/xed-flag-enum.h
/usr/local/include/xed/xed-get-time.h
/usr/local/include/xed/xed-attributes.h
/usr/local/include/xed/xed-init.h
/usr/local/include/xed/xed-operand-storage.h
/usr/local/include/xed/xed-operand-width-enum.h
/usr/local/include/xed/xed-operand-ctype-map.h
/usr/local/include/xed/xed-error-enum.h
/usr/local/include/xed/xed-operand-values-interface.h
/usr/local/include/xed/xed-operand-convert-enum.h
/usr/local/include/xed/xed-chip-enum.h
/usr/local/include/xed/xed-cpuid-bit-enum.h
/usr/local/include/xed/xed-iform-enum.h
/usr/local/include/xed/xed-decode.h
/usr/local/include/xed/xed-flag-action-enum.h
/usr/local/include/xed/xed-rep-prefix.h
/usr/local/include/xed/xed-patch.h
/usr/local/include/xed/xed-util.h
/usr/local/include/xed/xed-cpuid-rec.h
/usr/local/include/xed/xed-iform-map.h
/usr/local/include/xed/xed-category-enum.h
/usr/local/include/xed/xed-machine-mode-enum.h
/usr/local/include/xed/xed-syntax-enum.h
/usr/local/include/xed/xed-isa-set-enum.h
/usr/local/include/xed/xed-disas.h
/usr/local/include/xed/xed-operand-type-enum.h
/usr/local/include/xed/xed-portability.h
/usr/local/include/xed/xed-exception-enum.h
/usr/local/include/xed/xed-version.h
/usr/local/include/xed/xed-operand-element-xtype-enum.h
/usr/local/include/xed/xed-build-defines.h
/usr/local/include/xed/xed-common-defs.h
/usr/local/include/xed/xed-gen-table-defs.h
/usr/local/include/xed/xed-operand-element-type-enum.h
/usr/local/include/xed/xed-nonterminal-enum.h
/usr/local/include/xed/xed-encoder-gen-defs.h
/usr/local/include/xed/xed-agen.h
/usr/local/include/xed/xed-interface.h
/usr/local/include/xed/xed-encoder-iforms.h
/usr/local/include/xed/xed-operand-visibility-enum.h
/usr/local/include/xed/xed-operand-action-enum.h
/usr/local/include/xed/xed-reg-class-enum.h
/usr/local/include/llvm/MC/MCFixedLenDisassembler.h
/usr/local/include/llvm/ADT/IndexedMap.h
/usr/local/include/clang/Basic/FixedPoint.h
/usr/local/lib/libxed.a
/usr/local/lib/libxed-ild.a
[acme@jouet obj]$

I have /usr/local/bin in my PATH and:

[acme@jouet perf]$ perf script --xed -i /home/acme/perf.data.intel_pt
sh: xed: command not found
[acme@jouet perf]$

What am I missing?

- Arnaldo
 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> 
> --
> 
> v2: Add separate --xed option
> v3: Add xed build documentation and update commit
> ---
>  tools/perf/Documentation/build-xed.txt   | 11 +++++++++++
>  tools/perf/Documentation/perf-script.txt |  7 +++++++
>  tools/perf/builtin-script.c              | 23 +++++++++++++++++++++++
>  3 files changed, 41 insertions(+)
>  create mode 100644 tools/perf/Documentation/build-xed.txt
> 
> diff --git a/tools/perf/Documentation/build-xed.txt b/tools/perf/Documentation/build-xed.txt
> new file mode 100644
> index 000000000000..8da3028e6dca
> --- /dev/null
> +++ b/tools/perf/Documentation/build-xed.txt
> @@ -0,0 +1,11 @@
> +
> +For --xed the xed tool is needed. Here is how to install it:
> +
> +> git clone https://github.com/intelxed/mbuild.git mbuild                                                                                        
> +> git clone https://github.com/intelxed/xed                                                                                                      
> +> cd xed                                                                                                                                         
> +> mkdir obj                                                                                                                                      
> +> cd obj                                                                                                                                         
> +> ../mfile.py                                                                                                                                    
> +> sudo ../mfile.py --prefix=/usr/local install    
> +
> diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
> index afdafe2110a1..00c655ab4968 100644
> --- a/tools/perf/Documentation/perf-script.txt
> +++ b/tools/perf/Documentation/perf-script.txt
> @@ -383,6 +383,13 @@ include::itrace.txt[]
>  	will be printed. Each entry has function name and file/line. Enabled by
>  	default, disable with --no-inline.
>  
> +--insn-trace::
> +	Show instruction stream for intel_pt traces. Combine with --xed to
> +	show disassembly.
> +
> +--xed::
> +	Run xed disassembler on output. Requires installing the xed disassembler.
> +
>  SEE ALSO
>  --------
>  linkperf:perf-record[1], linkperf:perf-script-perl[1],
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index 4da5e32b9e03..311f5b53dd83 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -44,6 +44,7 @@
>  #include <sys/stat.h>
>  #include <fcntl.h>
>  #include <unistd.h>
> +#include <subcmd/pager.h>
>  
>  #include "sane_ctype.h"
>  
> @@ -3100,6 +3101,24 @@ static int perf_script__process_auxtrace_info(struct perf_session *session,
>  #define perf_script__process_auxtrace_info 0
>  #endif
>  
> +static int parse_insn_trace(const struct option *opt __maybe_unused,
> +			    const char *str __maybe_unused,
> +			    int unset __maybe_unused)
> +{
> +	parse_output_fields(NULL, "+insn,-event,-period", 0);
> +	itrace_parse_synth_opts(opt, "i0ns", 0);
> +	nanosecs = true;
> +	return 0;
> +}
> +
> +static int parse_xed(const struct option *opt __maybe_unused,
> +		     const char *str __maybe_unused,
> +		     int unset __maybe_unused)
> +{
> +	force_pager("xed -F insn: -A -64 | less");
> +	return 0;
> +}
> +
>  int cmd_script(int argc, const char **argv)
>  {
>  	bool show_full_info = false;
> @@ -3184,6 +3203,10 @@ int cmd_script(int argc, const char **argv)
>  		    "system-wide collection from all CPUs"),
>  	OPT_STRING('S', "symbols", &symbol_conf.sym_list_str, "symbol[,symbol...]",
>  		   "only consider these symbols"),
> +	OPT_CALLBACK_OPTARG(0, "insn-trace", &itrace_synth_opts, NULL, NULL,
> +			"Decode instructions from itrace", parse_insn_trace),
> +	OPT_CALLBACK_OPTARG(0, "xed", NULL, NULL, NULL,
> +			"Run xed disassembler on output", parse_xed),
>  	OPT_STRING(0, "stop-bt", &symbol_conf.bt_stop_list_str, "symbol[,symbol...]",
>  		   "Stop display of callgraph at these symbols"),
>  	OPT_STRING('C', "cpu", &cpu_list, "cpu", "list of cpus to profile"),
> -- 
> 2.17.1

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 1/5] tools, perf, script: Add --insn-trace for instruction decoding
  2018-10-08 18:03   ` Arnaldo Carvalho de Melo
@ 2018-10-08 18:41     ` Andi Kleen
  2018-10-08 19:40       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2018-10-08 18:41 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Andi Kleen, jolsa, linux-kernel

> > +> git clone https://github.com/intelxed/mbuild.git mbuild                                                                                        
> > +> git clone https://github.com/intelxed/xed                                                                                                      
> > +> cd xed                                                                                                                                         
> > +> mkdir obj                                                                                                                                      
> > +> cd obj                                                                                                                                         
> > +> ../mfile.py                                                                                                                                    
> > +> sudo ../mfile.py --prefix=/usr/local install    


Need 

../mfile.py examples
sudo ../mfile.py --prefix=/usr/local examples install


-Andi


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 1/5] tools, perf, script: Add --insn-trace for instruction decoding
  2018-10-08 18:41     ` Andi Kleen
@ 2018-10-08 19:40       ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-10-08 19:40 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andi Kleen, jolsa, linux-kernel

Em Mon, Oct 08, 2018 at 11:41:15AM -0700, Andi Kleen escreveu:
> > > +> git clone https://github.com/intelxed/mbuild.git mbuild                                                                                        
> > > +> git clone https://github.com/intelxed/xed                                                                                                      
> > > +> cd xed                                                                                                                                         
> > > +> mkdir obj                                                                                                                                      
> > > +> cd obj                                                                                                                                         
> > > +> ../mfile.py                                                                                                                                    
> > > +> sudo ../mfile.py --prefix=/usr/local install    
> 
> 
> Need 
> 
> ../mfile.py examples
> sudo ../mfile.py --prefix=/usr/local examples install

Will try that now.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [tip:perf/urgent] perf script: Add --insn-trace for instruction decoding
  2018-09-20 18:05 ` [PATCH v6 1/5] tools, perf, script: Add --insn-trace for instruction decoding Andi Kleen
  2018-10-08 17:50   ` Arnaldo Carvalho de Melo
  2018-10-08 18:03   ` Arnaldo Carvalho de Melo
@ 2018-10-26  7:45   ` tip-bot for Andi Kleen
  2 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Andi Kleen @ 2018-10-26  7:45 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: ak, linux-kernel, mingo, hpa, jolsa, tglx, acme

Commit-ID:  b585ebdb5912cf1438d4822f79aaebe36a2d123a
Gitweb:     https://git.kernel.org/tip/b585ebdb5912cf1438d4822f79aaebe36a2d123a
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Thu, 20 Sep 2018 11:05:36 -0700
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 24 Oct 2018 15:29:50 -0300

perf script: Add --insn-trace for instruction decoding

Add a --insn-trace short hand option for decoding and disassembling
instruction streams for intel_pt. This automatically pipes the output
into the xed disassembler to generate disassembled instructions.  This
just makes this use model much nicer to use.

Before

  % perf record -e intel_pt// ...
  % perf script --itrace=i0ns --ns -F +insn,-event,-period | xed -F insn: -A -64
   swapper 0 [000] 17276.429606186: ffffffff81010486 pt_config ([kernel.kallsyms])    nopl  %eax, (%rax,%rax,1)
   swapper 0 [000] 17276.429606186: ffffffff8101048b pt_config ([kernel.kallsyms])    add $0x10, %rsp
   swapper 0 [000] 17276.429606186: ffffffff8101048f pt_config ([kernel.kallsyms])    popq  %rbx
   swapper 0 [000] 17276.429606186: ffffffff81010490 pt_config ([kernel.kallsyms])    popq  %rbp
   swapper 0 [000] 17276.429606186: ffffffff81010491 pt_config ([kernel.kallsyms])    popq  %r12
   swapper 0 [000] 17276.429606186: ffffffff81010493 pt_config ([kernel.kallsyms])    popq  %r13
   swapper 0 [000] 17276.429606186: ffffffff81010495 pt_config ([kernel.kallsyms])    popq  %r14
   swapper 0 [000] 17276.429606186: ffffffff81010497 pt_config ([kernel.kallsyms])    popq  %r15
   swapper 0 [000] 17276.429606186: ffffffff81010499 pt_config ([kernel.kallsyms])    retq
   swapper 0 [000] 17276.429606186: ffffffff8101063e pt_event_add ([kernel.kallsyms])         cmpl  $0x1, 0x1b0(%rbx)
   swapper 0 [000] 17276.429606186: ffffffff81010645 pt_event_add ([kernel.kallsyms])         mov $0xffffffea, %eax
   swapper 0 [000] 17276.429606186: ffffffff8101064a pt_event_add ([kernel.kallsyms])         mov $0x0, %edx
   swapper 0 [000] 17276.429606186: ffffffff8101064f pt_event_add ([kernel.kallsyms])         popq  %rbx
   swapper 0 [000] 17276.429606186: ffffffff81010650 pt_event_add ([kernel.kallsyms])         cmovnz %edx, %eax
   swapper 0 [000] 17276.429606186: ffffffff81010653 pt_event_add ([kernel.kallsyms])         jmp 0xffffffff81010635
   swapper 0 [000] 17276.429606186: ffffffff81010635 pt_event_add ([kernel.kallsyms])         retq
   swapper 0 [000] 17276.429606186: ffffffff8115e687 event_sched_in.isra.107 ([kernel.kallsyms])       test %eax, %eax

Now:

  % perf record -e intel_pt// ...
  % perf script --insn-trace --xed
  ... same output ...

XED needs to be installed with:

  $ git clone https://github.com/intelxed/mbuild.git mbuild
  $ git clone https://github.com/intelxed/xed
  $ cd xed
  $ ./mfile.py
  $ ./mfile.py examples
  $ sudo ./mfile.py --prefix=/usr/local install
  $ sudo cp obj/examples/xed /usr/local/bin
  $ xed | head -3
  ERROR: required argument(s) were missing
  Copyright (C) 2017, Intel Corporation. All rights reserved.
  XED version: [v10.0-328-g7d62c8c49b7b]
  $

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20180920180540.14039-2-andi@firstfloor.org
[ Fixed up whitespace damage, added the 'mfile.py examples + cp obj/examples/xed ... ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/build-xed.txt   | 19 +++++++++++++++++++
 tools/perf/Documentation/perf-script.txt |  7 +++++++
 tools/perf/builtin-script.c              | 23 +++++++++++++++++++++++
 3 files changed, 49 insertions(+)

diff --git a/tools/perf/Documentation/build-xed.txt b/tools/perf/Documentation/build-xed.txt
new file mode 100644
index 000000000000..6222c1e7231f
--- /dev/null
+++ b/tools/perf/Documentation/build-xed.txt
@@ -0,0 +1,19 @@
+
+For --xed the xed tool is needed. Here is how to install it:
+
+  $ git clone https://github.com/intelxed/mbuild.git mbuild
+  $ git clone https://github.com/intelxed/xed
+  $ cd xed
+  $ ./mfile.py --share
+  $ ./mfile.py examples
+  $ sudo ./mfile.py --prefix=/usr/local install
+  $ sudo ldconfig
+  $ sudo cp obj/examples/xed /usr/local/bin
+
+Basic xed testing:
+
+  $ xed | head -3
+  ERROR: required argument(s) were missing
+  Copyright (C) 2017, Intel Corporation. All rights reserved.
+  XED version: [v10.0-328-g7d62c8c49b7b]
+  $
diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index afdafe2110a1..00c655ab4968 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -383,6 +383,13 @@ include::itrace.txt[]
 	will be printed. Each entry has function name and file/line. Enabled by
 	default, disable with --no-inline.
 
+--insn-trace::
+	Show instruction stream for intel_pt traces. Combine with --xed to
+	show disassembly.
+
+--xed::
+	Run xed disassembler on output. Requires installing the xed disassembler.
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-script-perl[1],
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index ca09b7d2adb7..411ea175bcaf 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -44,6 +44,7 @@
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <unistd.h>
+#include <subcmd/pager.h>
 
 #include "sane_ctype.h"
 
@@ -3103,6 +3104,24 @@ static int perf_script__process_auxtrace_info(struct perf_session *session,
 #define perf_script__process_auxtrace_info 0
 #endif
 
+static int parse_insn_trace(const struct option *opt __maybe_unused,
+			    const char *str __maybe_unused,
+			    int unset __maybe_unused)
+{
+	parse_output_fields(NULL, "+insn,-event,-period", 0);
+	itrace_parse_synth_opts(opt, "i0ns", 0);
+	nanosecs = true;
+	return 0;
+}
+
+static int parse_xed(const struct option *opt __maybe_unused,
+		     const char *str __maybe_unused,
+		     int unset __maybe_unused)
+{
+	force_pager("xed -F insn: -A -64 | less");
+	return 0;
+}
+
 int cmd_script(int argc, const char **argv)
 {
 	bool show_full_info = false;
@@ -3187,6 +3206,10 @@ int cmd_script(int argc, const char **argv)
 		    "system-wide collection from all CPUs"),
 	OPT_STRING('S', "symbols", &symbol_conf.sym_list_str, "symbol[,symbol...]",
 		   "only consider these symbols"),
+	OPT_CALLBACK_OPTARG(0, "insn-trace", &itrace_synth_opts, NULL, NULL,
+			"Decode instructions from itrace", parse_insn_trace),
+	OPT_CALLBACK_OPTARG(0, "xed", NULL, NULL, NULL,
+			"Run xed disassembler on output", parse_xed),
 	OPT_STRING(0, "stop-bt", &symbol_conf.bt_stop_list_str, "symbol[,symbol...]",
 		   "Stop display of callgraph at these symbols"),
 	OPT_STRING('C', "cpu", &cpu_list, "cpu", "list of cpus to profile"),

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [tip:perf/urgent] perf script: Make itrace script default to all calls
  2018-09-20 18:05 ` [PATCH v6 2/5] perf, tools, script: Make itrace script default to all calls Andi Kleen
@ 2018-10-26  7:46   ` tip-bot for Andi Kleen
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Andi Kleen @ 2018-10-26  7:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, jolsa, hpa, leo.yan, kim.phillips, ak, mingo, acme, linux-kernel

Commit-ID:  4eb068157121939f4bc16256a37bcd88f5554123
Gitweb:     https://git.kernel.org/tip/4eb068157121939f4bc16256a37bcd88f5554123
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Thu, 20 Sep 2018 11:05:37 -0700
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 24 Oct 2018 15:29:54 -0300

perf script: Make itrace script default to all calls

By default 'perf script' for itrace outputs sampled instructions or
branches. In my experience this is confusing to users because it's hard
to correlate with real program behavior. The sampling makes sense for
tools like 'perf report' that actually sample to reduce the run time,
but run time is normally not a problem for 'perf script'.  It's better
to give an accurate representation of the program flow.

Default 'perf script' to output all calls for itrace. That's a much saner
default. The old behavior can be still requested with 'perf script'
--itrace=ibxwpe100000

v2: Fix ETM build failure
v3: Really fix ETM build failure (Kim Phillips)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Link: http://lkml.kernel.org/r/20180920180540.14039-3-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/itrace.txt |  7 ++++---
 tools/perf/builtin-script.c         |  5 ++++-
 tools/perf/util/auxtrace.c          | 17 ++++++++++++-----
 tools/perf/util/auxtrace.h          |  5 ++++-
 tools/perf/util/cs-etm.c            |  3 ++-
 tools/perf/util/intel-bts.c         |  3 ++-
 tools/perf/util/intel-pt.c          |  3 ++-
 7 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
index a3abe04c779d..c2182cbabde3 100644
--- a/tools/perf/Documentation/itrace.txt
+++ b/tools/perf/Documentation/itrace.txt
@@ -11,10 +11,11 @@
 		l	synthesize last branch entries (use with i or x)
 		s       skip initial number of events
 
-	The default is all events i.e. the same as --itrace=ibxwpe
+	The default is all events i.e. the same as --itrace=ibxwpe,
+	except for perf script where it is --itrace=ce
 
-	In addition, the period (default 100000) for instructions events
-	can be specified in units of:
+	In addition, the period (default 100000, except for perf script where it is 1)
+	for instructions events can be specified in units of:
 
 		i	instructions
 		t	ticks
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 411ea175bcaf..6099c722a679 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -3131,7 +3131,10 @@ int cmd_script(int argc, const char **argv)
 	char *rec_script_path = NULL;
 	char *rep_script_path = NULL;
 	struct perf_session *session;
-	struct itrace_synth_opts itrace_synth_opts = { .set = false, };
+	struct itrace_synth_opts itrace_synth_opts = {
+		.set = false,
+		.default_no_sample = true,
+	};
 	char *script_path = NULL;
 	const char **__argv;
 	int i, j, err = 0;
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index c4617bcfd521..72d5ba2479bf 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -962,16 +962,23 @@ s64 perf_event__process_auxtrace(struct perf_session *session,
 #define PERF_ITRACE_DEFAULT_LAST_BRANCH_SZ	64
 #define PERF_ITRACE_MAX_LAST_BRANCH_SZ		1024
 
-void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts)
+void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts,
+				    bool no_sample)
 {
-	synth_opts->instructions = true;
 	synth_opts->branches = true;
 	synth_opts->transactions = true;
 	synth_opts->ptwrites = true;
 	synth_opts->pwr_events = true;
 	synth_opts->errors = true;
-	synth_opts->period_type = PERF_ITRACE_DEFAULT_PERIOD_TYPE;
-	synth_opts->period = PERF_ITRACE_DEFAULT_PERIOD;
+	if (no_sample) {
+		synth_opts->period_type = PERF_ITRACE_PERIOD_INSTRUCTIONS;
+		synth_opts->period = 1;
+		synth_opts->calls = true;
+	} else {
+		synth_opts->instructions = true;
+		synth_opts->period_type = PERF_ITRACE_DEFAULT_PERIOD_TYPE;
+		synth_opts->period = PERF_ITRACE_DEFAULT_PERIOD;
+	}
 	synth_opts->callchain_sz = PERF_ITRACE_DEFAULT_CALLCHAIN_SZ;
 	synth_opts->last_branch_sz = PERF_ITRACE_DEFAULT_LAST_BRANCH_SZ;
 	synth_opts->initial_skip = 0;
@@ -999,7 +1006,7 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 	}
 
 	if (!str) {
-		itrace_synth_opts__set_default(synth_opts);
+		itrace_synth_opts__set_default(synth_opts, false);
 		return 0;
 	}
 
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index d88f6e9eb461..8e50f96d4b23 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -58,6 +58,7 @@ enum itrace_period_type {
 /**
  * struct itrace_synth_opts - AUX area tracing synthesis options.
  * @set: indicates whether or not options have been set
+ * @default_no_sample: Default to no sampling.
  * @inject: indicates the event (not just the sample) must be fully synthesized
  *          because 'perf inject' will write it out
  * @instructions: whether to synthesize 'instructions' events
@@ -82,6 +83,7 @@ enum itrace_period_type {
  */
 struct itrace_synth_opts {
 	bool			set;
+	bool			default_no_sample;
 	bool			inject;
 	bool			instructions;
 	bool			branches;
@@ -528,7 +530,8 @@ int perf_event__process_auxtrace_error(struct perf_session *session,
 				       union perf_event *event);
 int itrace_parse_synth_opts(const struct option *opt, const char *str,
 			    int unset);
-void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts);
+void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts,
+				    bool no_sample);
 
 size_t perf_event__fprintf_auxtrace_error(union perf_event *event, FILE *fp);
 void perf_session__auxtrace_error_inc(struct perf_session *session,
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 2ae640257fdb..3b37d66dc533 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1432,7 +1432,8 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 	if (session->itrace_synth_opts && session->itrace_synth_opts->set) {
 		etm->synth_opts = *session->itrace_synth_opts;
 	} else {
-		itrace_synth_opts__set_default(&etm->synth_opts);
+		itrace_synth_opts__set_default(&etm->synth_opts,
+				session->itrace_synth_opts->default_no_sample);
 		etm->synth_opts.callchain = false;
 	}
 
diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
index 7f0c83b6332b..3b3a3d55dca1 100644
--- a/tools/perf/util/intel-bts.c
+++ b/tools/perf/util/intel-bts.c
@@ -910,7 +910,8 @@ int intel_bts_process_auxtrace_info(union perf_event *event,
 	if (session->itrace_synth_opts && session->itrace_synth_opts->set) {
 		bts->synth_opts = *session->itrace_synth_opts;
 	} else {
-		itrace_synth_opts__set_default(&bts->synth_opts);
+		itrace_synth_opts__set_default(&bts->synth_opts,
+				session->itrace_synth_opts->default_no_sample);
 		if (session->itrace_synth_opts)
 			bts->synth_opts.thread_stack =
 				session->itrace_synth_opts->thread_stack;
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 48c1d415c6b0..ffa385a029b3 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -2559,7 +2559,8 @@ int intel_pt_process_auxtrace_info(union perf_event *event,
 	if (session->itrace_synth_opts && session->itrace_synth_opts->set) {
 		pt->synth_opts = *session->itrace_synth_opts;
 	} else {
-		itrace_synth_opts__set_default(&pt->synth_opts);
+		itrace_synth_opts__set_default(&pt->synth_opts,
+				session->itrace_synth_opts->default_no_sample);
 		if (use_browser != -1) {
 			pt->synth_opts.branches = false;
 			pt->synth_opts.callchain = true;

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [tip:perf/urgent] tools script: Add --call-trace and --call-ret-trace
  2018-09-20 18:05 ` [PATCH v6 3/5] tools, perf, script: Add --call-trace and --call-ret-trace Andi Kleen
  2018-09-28 10:23   ` leo.yan
@ 2018-10-26  7:46   ` tip-bot for Andi Kleen
  1 sibling, 0 replies; 21+ messages in thread
From: tip-bot for Andi Kleen @ 2018-10-26  7:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, jolsa, acme, leo.yan, kim.phillips, linux-kernel, tglx, ak, hpa

Commit-ID:  d1b1552e15d41297abcaf3812378e3391d44fa6b
Gitweb:     https://git.kernel.org/tip/d1b1552e15d41297abcaf3812378e3391d44fa6b
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Thu, 20 Sep 2018 11:05:38 -0700
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 24 Oct 2018 15:29:55 -0300

tools script: Add --call-trace and --call-ret-trace

Add short cut options to print PT call trace and call-ret-trace, for
calls and call and returns. Roughly corresponds to ftrace function
tracer and function graph tracer.

Just makes these common use cases nicer to use.

% perf record -a -e intel_pt// sleep 1
% perf script --call-trace
	    perf   900 [000] 194167.205652203: ([kernel.kallsyms])          perf_pmu_enable
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])          __x86_indirect_thunk_rax
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])          event_filter_match
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])          group_sched_in
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])              __x86_indirect_thunk_rax
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])              event_sched_in.isra.107
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_event_set_state.part.71
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_time
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_pmu_disable
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_log_itrace_start
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  __x86_indirect_thunk_rax
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_userpage

% perf script --call-ret-trace
	    perf   900 [000] 194167.205652203:   tr strt     ([unknown])        pt_config
            perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            pt_config
            perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            pt_event_add
            perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            perf_pmu_enable
            perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            perf_pmu_nop_void
            perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            event_sched_in.isra.107
            perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            __x86_indirect_thunk_rax
            perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            perf_pmu_nop_int
            perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            group_sched_in
            perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            event_filter_match
            perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])            event_filter_match
            perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])            group_sched_in
            perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])                __x86_indirect_thunk_rax
            perf   900 [000] 194167.205652203:   return      ([kernel.kallsyms])                perf_pmu_nop_txn
            perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])                event_sched_in.isra.107
            perf   900 [000] 194167.205652203:   call        ([kernel.kallsyms])                    perf_event_set_state.part.71

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Kim Phillips <kim.phillips@arm.com>
Link: http://lkml.kernel.org/r/20180920180540.14039-4-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-script.txt |  7 +++++++
 tools/perf/builtin-script.c              | 24 ++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 00c655ab4968..805baabd238e 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -390,6 +390,13 @@ include::itrace.txt[]
 --xed::
 	Run xed disassembler on output. Requires installing the xed disassembler.
 
+--call-trace::
+	Show call stream for intel_pt traces. The CPUs are interleaved, but
+	can be filtered with -C.
+
+--call-ret-trace::
+	Show call and return stream for intel_pt traces.
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-script-perl[1],
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 6099c722a679..566e1450898a 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -3122,6 +3122,26 @@ static int parse_xed(const struct option *opt __maybe_unused,
 	return 0;
 }
 
+static int parse_call_trace(const struct option *opt __maybe_unused,
+			    const char *str __maybe_unused,
+			    int unset __maybe_unused)
+{
+	parse_output_fields(NULL, "-ip,-addr,-event,-period,+callindent", 0);
+	itrace_parse_synth_opts(opt, "cewp", 0);
+	nanosecs = true;
+	return 0;
+}
+
+static int parse_callret_trace(const struct option *opt __maybe_unused,
+			    const char *str __maybe_unused,
+			    int unset __maybe_unused)
+{
+	parse_output_fields(NULL, "-ip,-addr,-event,-period,+callindent,+flags", 0);
+	itrace_parse_synth_opts(opt, "crewp", 0);
+	nanosecs = true;
+	return 0;
+}
+
 int cmd_script(int argc, const char **argv)
 {
 	bool show_full_info = false;
@@ -3213,6 +3233,10 @@ int cmd_script(int argc, const char **argv)
 			"Decode instructions from itrace", parse_insn_trace),
 	OPT_CALLBACK_OPTARG(0, "xed", NULL, NULL, NULL,
 			"Run xed disassembler on output", parse_xed),
+	OPT_CALLBACK_OPTARG(0, "call-trace", &itrace_synth_opts, NULL, NULL,
+			"Decode calls from from itrace", parse_call_trace),
+	OPT_CALLBACK_OPTARG(0, "call-ret-trace", &itrace_synth_opts, NULL, NULL,
+			"Decode calls and returns from itrace", parse_callret_trace),
 	OPT_STRING(0, "stop-bt", &symbol_conf.bt_stop_list_str, "symbol[,symbol...]",
 		   "Stop display of callgraph at these symbols"),
 	OPT_STRING('C', "cpu", &cpu_list, "cpu", "list of cpus to profile"),

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [tip:perf/urgent] perf script: Implement --graph-function
  2018-09-20 18:05 ` [PATCH v6 4/5] tools, perf, script: Implement --graph-function Andi Kleen
@ 2018-10-26  7:47   ` tip-bot for Andi Kleen
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Andi Kleen @ 2018-10-26  7:47 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, leo.yan, mingo, ak, kim.phillips, tglx, jolsa, hpa, acme

Commit-ID:  99f753f048b3f02f31a56951781672021af6cd0d
Gitweb:     https://git.kernel.org/tip/99f753f048b3f02f31a56951781672021af6cd0d
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Thu, 20 Sep 2018 11:05:39 -0700
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 24 Oct 2018 15:29:55 -0300

perf script: Implement --graph-function

Add a ftrace style --graph-function argument to 'perf script' that
allows to print itrace function calls only below a given function. This
makes it easier to find the code of interest in a large trace.

% perf record -e intel_pt//k -a sleep 1
% perf script --graph-function group_sched_in --call-trace
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])          group_sched_in
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])              __x86_indirect_thunk_rax
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])              event_sched_in.isra.107
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_event_set_state.part.71
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_time
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_pmu_disable
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_log_itrace_start
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  __x86_indirect_thunk_rax
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_userpage
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                          calc_timer_values
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              sched_clock_cpu
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                          __x86_indirect_thunk_rax
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                          arch_perf_update_userpage
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              __fentry__
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              using_native_sched_clock
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              sched_clock_stable
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_pmu_enable
            perf   900 [000] 194167.205652203: ([kernel.kallsyms])              __x86_indirect_thunk_rax
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])          group_sched_in
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])              __x86_indirect_thunk_rax
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])              event_sched_in.isra.107
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                  perf_event_set_state.part.71
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                      perf_event_update_time
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                  perf_pmu_disable
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                  perf_log_itrace_start
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                  __x86_indirect_thunk_rax
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                      perf_event_update_userpage
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                          calc_timer_values
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                              sched_clock_cpu
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                          __x86_indirect_thunk_rax
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                          arch_perf_update_userpage
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                              __fentry__
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                              using_native_sched_clock
         swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                              sched_clock_stable

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Kim Phillips <kim.phillips@arm.com>
Link: http://lkml.kernel.org/r/20180920180540.14039-5-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-script.txt |  4 ++
 tools/perf/builtin-script.c              | 96 ++++++++++++++++++++++++++------
 tools/perf/util/symbol.h                 |  3 +-
 tools/perf/util/thread.h                 |  2 +
 4 files changed, 86 insertions(+), 19 deletions(-)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 805baabd238e..a2b37ce48094 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -397,6 +397,10 @@ include::itrace.txt[]
 --call-ret-trace::
 	Show call and return stream for intel_pt traces.
 
+--graph-function::
+	For itrace only show specified functions and their callees for
+	itrace. Multiple functions can be separated by comma.
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-script-perl[1],
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 566e1450898a..9d2249ea75e3 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1105,6 +1105,35 @@ out:
 	return printed;
 }
 
+static const char *resolve_branch_sym(struct perf_sample *sample,
+				      struct perf_evsel *evsel,
+				      struct thread *thread,
+				      struct addr_location *al,
+				      u64 *ip)
+{
+	struct addr_location addr_al;
+	struct perf_event_attr *attr = &evsel->attr;
+	const char *name = NULL;
+
+	if (sample->flags & (PERF_IP_FLAG_CALL | PERF_IP_FLAG_TRACE_BEGIN)) {
+		if (sample_addr_correlates_sym(attr)) {
+			thread__resolve(thread, &addr_al, sample);
+			if (addr_al.sym)
+				name = addr_al.sym->name;
+			else
+				*ip = sample->addr;
+		} else {
+			*ip = sample->addr;
+		}
+	} else if (sample->flags & (PERF_IP_FLAG_RETURN | PERF_IP_FLAG_TRACE_END)) {
+		if (al->sym)
+			name = al->sym->name;
+		else
+			*ip = sample->ip;
+	}
+	return name;
+}
+
 static int perf_sample__fprintf_callindent(struct perf_sample *sample,
 					   struct perf_evsel *evsel,
 					   struct thread *thread,
@@ -1112,7 +1141,6 @@ static int perf_sample__fprintf_callindent(struct perf_sample *sample,
 {
 	struct perf_event_attr *attr = &evsel->attr;
 	size_t depth = thread_stack__depth(thread);
-	struct addr_location addr_al;
 	const char *name = NULL;
 	static int spacing;
 	int len = 0;
@@ -1126,22 +1154,7 @@ static int perf_sample__fprintf_callindent(struct perf_sample *sample,
 	if (thread->ts && sample->flags & PERF_IP_FLAG_RETURN)
 		depth += 1;
 
-	if (sample->flags & (PERF_IP_FLAG_CALL | PERF_IP_FLAG_TRACE_BEGIN)) {
-		if (sample_addr_correlates_sym(attr)) {
-			thread__resolve(thread, &addr_al, sample);
-			if (addr_al.sym)
-				name = addr_al.sym->name;
-			else
-				ip = sample->addr;
-		} else {
-			ip = sample->addr;
-		}
-	} else if (sample->flags & (PERF_IP_FLAG_RETURN | PERF_IP_FLAG_TRACE_END)) {
-		if (al->sym)
-			name = al->sym->name;
-		else
-			ip = sample->ip;
-	}
+	name = resolve_branch_sym(sample, evsel, thread, al, &ip);
 
 	if (PRINT_FIELD(DSO) && !(PRINT_FIELD(IP) || PRINT_FIELD(ADDR))) {
 		dlen += fprintf(fp, "(");
@@ -1647,6 +1660,47 @@ static void perf_sample__fprint_metric(struct perf_script *script,
 	}
 }
 
+static bool show_event(struct perf_sample *sample,
+		       struct perf_evsel *evsel,
+		       struct thread *thread,
+		       struct addr_location *al)
+{
+	int depth = thread_stack__depth(thread);
+
+	if (!symbol_conf.graph_function)
+		return true;
+
+	if (thread->filter) {
+		if (depth <= thread->filter_entry_depth) {
+			thread->filter = false;
+			return false;
+		}
+		return true;
+	} else {
+		const char *s = symbol_conf.graph_function;
+		u64 ip;
+		const char *name = resolve_branch_sym(sample, evsel, thread, al,
+				&ip);
+		unsigned nlen;
+
+		if (!name)
+			return false;
+		nlen = strlen(name);
+		while (*s) {
+			unsigned len = strcspn(s, ",");
+			if (nlen == len && !strncmp(name, s, len)) {
+				thread->filter = true;
+				thread->filter_entry_depth = depth;
+				return true;
+			}
+			s += len;
+			if (*s == ',')
+				s++;
+		}
+		return false;
+	}
+}
+
 static void process_event(struct perf_script *script,
 			  struct perf_sample *sample, struct perf_evsel *evsel,
 			  struct addr_location *al,
@@ -1661,6 +1715,9 @@ static void process_event(struct perf_script *script,
 	if (output[type].fields == 0)
 		return;
 
+	if (!show_event(sample, evsel, thread, al))
+		return;
+
 	++es->samples;
 
 	perf_sample__fprintf_start(sample, thread, evsel,
@@ -3237,6 +3294,8 @@ int cmd_script(int argc, const char **argv)
 			"Decode calls from from itrace", parse_call_trace),
 	OPT_CALLBACK_OPTARG(0, "call-ret-trace", &itrace_synth_opts, NULL, NULL,
 			"Decode calls and returns from itrace", parse_callret_trace),
+	OPT_STRING(0, "graph-function", &symbol_conf.graph_function, "symbol[,symbol...]",
+			"Only print symbols and callees with --call-trace/--call-ret-trace"),
 	OPT_STRING(0, "stop-bt", &symbol_conf.bt_stop_list_str, "symbol[,symbol...]",
 		   "Stop display of callgraph at these symbols"),
 	OPT_STRING('C', "cpu", &cpu_list, "cpu", "list of cpus to profile"),
@@ -3494,7 +3553,8 @@ int cmd_script(int argc, const char **argv)
 	script.session = session;
 	script__setup_sample_type(&script);
 
-	if (output[PERF_TYPE_HARDWARE].fields & PERF_OUTPUT_CALLINDENT)
+	if ((output[PERF_TYPE_HARDWARE].fields & PERF_OUTPUT_CALLINDENT) ||
+	    symbol_conf.graph_function)
 		itrace_synth_opts.thread_stack = true;
 
 	session->itrace_synth_opts = &itrace_synth_opts;
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index f25fae4b5743..d726a8a7bb1b 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -123,7 +123,8 @@ struct symbol_conf {
 	const char	*vmlinux_name,
 			*kallsyms_name,
 			*source_prefix,
-			*field_sep;
+			*field_sep,
+			*graph_function;
 	const char	*default_guest_vmlinux_name,
 			*default_guest_kallsyms,
 			*default_guest_modules;
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 07606aa6998d..36c09a9904e6 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -42,6 +42,8 @@ struct thread {
 	void				*addr_space;
 	struct unwind_libunwind_ops	*unwind_libunwind_ops;
 #endif
+	bool			filter;
+	int			filter_entry_depth;
 };
 
 struct machine;

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [tip:perf/urgent] perf script: Support total cycles count
  2018-09-24 17:07   ` Andi Kleen
@ 2018-10-26  7:47     ` tip-bot for Andi Kleen
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Andi Kleen @ 2018-10-26  7:47 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: andi, linux-kernel, tglx, hpa, acme, ak, mingo, jolsa

Commit-ID:  fe57120e18a1f9124ca758c89cc54f91333d1847
Gitweb:     https://git.kernel.org/tip/fe57120e18a1f9124ca758c89cc54f91333d1847
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Mon, 24 Sep 2018 10:07:32 -0700
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 24 Oct 2018 15:29:56 -0300

perf script: Support total cycles count

For 'perf script' brstackinsn also print a running cycles count.  This
makes it easier to calculate cycle deltas for code sections measured
with LBRs.

% perf record -b -a sleep 1
% perf script -F +brstackinsn
...
        00007f73ecc41083        insn: 74 06                     # PRED 9 cycles [17] 1.11 IPC
        00007f73ecc4108b        insn: a8 10
        00007f73ecc4108d        insn: 74 71                     # PRED 1 cycles [18] 1.00 IPC
        00007f73ecc41100        insn: 48 8b 46 10
        00007f73ecc41104        insn: 4c 8b 38
        00007f73ecc41107        insn: 4d 85 ff
        00007f73ecc4110a        insn: 0f 84 b0 00 00 00
        00007f73ecc41110        insn: 83 43 58 01
        00007f73ecc41114        insn: 48 89 df
        00007f73ecc41117        insn: e8 94 73 04 00            # PRED 6 cycles [24] 1.00 IPC

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Link: http://lkml.kernel.org/r/20180924170732.GA28040@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-script.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 9d2249ea75e3..b5bc85bd0bbe 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -913,7 +913,7 @@ static int grab_bb(u8 *buffer, u64 start, u64 end,
 
 static int ip__fprintf_jump(uint64_t ip, struct branch_entry *en,
 			    struct perf_insn *x, u8 *inbuf, int len,
-			    int insn, FILE *fp)
+			    int insn, FILE *fp, int *total_cycles)
 {
 	int printed = fprintf(fp, "\t%016" PRIx64 "\t%-30s\t#%s%s%s%s", ip,
 			      dump_insn(x, ip, inbuf, len, NULL),
@@ -922,7 +922,8 @@ static int ip__fprintf_jump(uint64_t ip, struct branch_entry *en,
 			      en->flags.in_tx ? " INTX" : "",
 			      en->flags.abort ? " ABORT" : "");
 	if (en->flags.cycles) {
-		printed += fprintf(fp, " %d cycles", en->flags.cycles);
+		*total_cycles += en->flags.cycles;
+		printed += fprintf(fp, " %d cycles [%d]", en->flags.cycles, *total_cycles);
 		if (insn)
 			printed += fprintf(fp, " %.2f IPC", (float)insn / en->flags.cycles);
 	}
@@ -979,6 +980,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 	u8 buffer[MAXBB];
 	unsigned off;
 	struct symbol *lastsym = NULL;
+	int total_cycles = 0;
 
 	if (!(br && br->nr))
 		return 0;
@@ -999,7 +1001,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 		printed += ip__fprintf_sym(br->entries[nr - 1].from, thread,
 					   x.cpumode, x.cpu, &lastsym, attr, fp);
 		printed += ip__fprintf_jump(br->entries[nr - 1].from, &br->entries[nr - 1],
-					    &x, buffer, len, 0, fp);
+					    &x, buffer, len, 0, fp, &total_cycles);
 	}
 
 	/* Print all blocks */
@@ -1027,7 +1029,8 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 
 			printed += ip__fprintf_sym(ip, thread, x.cpumode, x.cpu, &lastsym, attr, fp);
 			if (ip == end) {
-				printed += ip__fprintf_jump(ip, &br->entries[i], &x, buffer + off, len - off, insn, fp);
+				printed += ip__fprintf_jump(ip, &br->entries[i], &x, buffer + off, len - off, insn, fp,
+							    &total_cycles);
 				break;
 			} else {
 				printed += fprintf(fp, "\t%016" PRIx64 "\t%s\n", ip,

^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2018-10-26  7:48 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-20 18:05 Make perf script easier to use for itrace Andi Kleen
2018-09-20 18:05 ` [PATCH v6 1/5] tools, perf, script: Add --insn-trace for instruction decoding Andi Kleen
2018-10-08 17:50   ` Arnaldo Carvalho de Melo
2018-10-08 18:03   ` Arnaldo Carvalho de Melo
2018-10-08 18:41     ` Andi Kleen
2018-10-08 19:40       ` Arnaldo Carvalho de Melo
2018-10-26  7:45   ` [tip:perf/urgent] perf " tip-bot for Andi Kleen
2018-09-20 18:05 ` [PATCH v6 2/5] perf, tools, script: Make itrace script default to all calls Andi Kleen
2018-10-26  7:46   ` [tip:perf/urgent] perf " tip-bot for Andi Kleen
2018-09-20 18:05 ` [PATCH v6 3/5] tools, perf, script: Add --call-trace and --call-ret-trace Andi Kleen
2018-09-28 10:23   ` leo.yan
2018-09-28 17:19     ` Andi Kleen
2018-09-29  7:39       ` leo.yan
2018-10-02 14:24         ` Arnaldo Carvalho de Melo
2018-10-02 23:09           ` leo.yan
2018-10-26  7:46   ` [tip:perf/urgent] tools " tip-bot for Andi Kleen
2018-09-20 18:05 ` [PATCH v6 4/5] tools, perf, script: Implement --graph-function Andi Kleen
2018-10-26  7:47   ` [tip:perf/urgent] perf " tip-bot for Andi Kleen
2018-09-20 18:05 ` [PATCH v6 5/5] perf, tools, script: Support total cycles count Andi Kleen
2018-09-24 17:07   ` Andi Kleen
2018-10-26  7:47     ` [tip:perf/urgent] perf " tip-bot for Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).