linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff
@ 2015-09-25 13:15 Adrian Hunter
  2015-09-25 13:15 ` [PATCH 01/25] perf auxtrace: Fix 'instructions' period of zero Adrian Hunter
                   ` (25 more replies)
  0 siblings, 26 replies; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

Hi

Here are some minor improvements to Intel PT related stuff.

First 3 patches are minor fixes:

      perf auxtrace: Fix 'instructions' period of zero
      perf report: Fix sample type validation for synthesized callchains
      perf intel-pt: Fix potential loop forever

Next 4 are minor improvements:

      perf intel-pt: Make logging slightly more efficient
      perf script: Allow time to be displayed in nanoseconds
      perf tools: Warn when AUX data has been lost
      perf tools: Add more documentation to export-to-postgresql.py script

Next 7 add support for branch stacks:

      perf auxtrace: Add option to synthesize branch stacks on samples
      perf report: Adjust sample type validation for synthesized branch stacks
      perf report: Also do default setup for synthesized branch stacks
      perf report: Skip events with null branch stacks
      perf inject: Set branch stack feature flag when synthesizing branch stacks
      perf intel-pt: Move branch filter logic
      perf intel-pt: Support generating branch stack

Next 6 allow for arbitrary-sized call stacks:

      perf report: Make max_stack value allow for synthesized callchains
      perf hists: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
      perf callchain: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
      perf script: Add a setting for maximum stack depth
      perf scripting python: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
      perf script: Make scripting_max_stack value allow for synthesized callchains

Final 5 let Intel PT be used with autofdo:

      perf tools: Add perf_evlist__id2evsel_strict()
      perf tools: Add perf_evlist__del()
      perf inject: Remove more aux-related stuff when processing instruction traces
      perf inject: Add --strip option to strip out non-synthesized events
      perf intel-pt: Add mispred-all config option to aid use with autofdo


Adrian Hunter (25):
      perf auxtrace: Fix 'instructions' period of zero
      perf report: Fix sample type validation for synthesized callchains
      perf intel-pt: Fix potential loop forever
      perf intel-pt: Make logging slightly more efficient
      perf script: Allow time to be displayed in nanoseconds
      perf tools: Warn when AUX data has been lost
      perf tools: Add more documentation to export-to-postgresql.py script
      perf auxtrace: Add option to synthesize branch stacks on samples
      perf report: Adjust sample type validation for synthesized branch stacks
      perf report: Also do default setup for synthesized branch stacks
      perf report: Skip events with null branch stacks
      perf inject: Set branch stack feature flag when synthesizing branch stacks
      perf intel-pt: Move branch filter logic
      perf intel-pt: Support generating branch stack
      perf report: Make max_stack value allow for synthesized callchains
      perf hists: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
      perf callchain: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
      perf script: Add a setting for maximum stack depth
      perf scripting python: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
      perf script: Make scripting_max_stack value allow for synthesized callchains
      perf tools: Add perf_evlist__id2evsel_strict()
      perf tools: Add perf_evlist__del()
      perf inject: Remove more aux-related stuff when processing instruction traces
      perf inject: Add --strip option to strip out non-synthesized events
      perf intel-pt: Add mispred-all config option to aid use with autofdo

 tools/perf/Documentation/intel-pt.txt              |  39 ++++
 tools/perf/Documentation/itrace.txt                |   4 +
 tools/perf/Documentation/perf-inject.txt           |   3 +
 tools/perf/Documentation/perf-script.txt           |   3 +
 tools/perf/builtin-inject.c                        | 125 +++++++++++-
 tools/perf/builtin-report.c                        |  31 ++-
 tools/perf/builtin-script.c                        |  18 +-
 tools/perf/scripts/python/export-to-postgresql.py  | 221 +++++++++++++++++++++
 tools/perf/util/auxtrace.c                         |  24 ++-
 tools/perf/util/auxtrace.h                         |   4 +
 tools/perf/util/event.h                            |   1 +
 tools/perf/util/evlist.c                           |  23 +++
 tools/perf/util/evlist.h                           |   3 +
 tools/perf/util/hist.c                             |   6 +-
 tools/perf/util/hist.h                             |   1 +
 .../perf/util/intel-pt-decoder/intel-pt-decoder.c  |   4 +-
 tools/perf/util/intel-pt-decoder/intel-pt-log.c    |  21 +-
 tools/perf/util/intel-pt-decoder/intel-pt-log.h    |  38 +++-
 tools/perf/util/intel-pt.c                         | 135 ++++++++++++-
 tools/perf/util/machine.c                          |   2 +-
 .../util/scripting-engines/trace-event-python.c    |   2 +-
 tools/perf/util/session.c                          |  12 +-
 tools/perf/util/trace-event.h                      |   2 +
 23 files changed, 686 insertions(+), 36 deletions(-)


Regards
Adrian


* [PATCH 01/25] perf auxtrace: Fix 'instructions' period of zero
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-28 14:12   ` Arnaldo Carvalho de Melo
  2015-09-29  8:41   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 02/25] perf report: Fix sample type validation for synthesized callchains Adrian Hunter
                   ` (24 subsequent siblings)
  25 siblings, 2 replies; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

Instruction tracing options (i.e. --itrace) include an option for
sampling instructions at an arbitrary period. e.g.

	--itrace=i10us

means make an 'instructions' sample for every 10us of trace.

Currently the logic does not distinguish between a period of
zero and no period being specified at all, so a period of zero
gets treated as the default period, which is 100000.  That
doesn't really make sense.

Fix it so that zero period is accepted and treated as meaning
"as often as possible".

In the case of Intel PT that is the same as a period of 1 and
a unit of 'instructions' (i.e. --itrace=i1i).

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/auxtrace.c | 4 +++-
 tools/perf/util/intel-pt.c | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)
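
For illustration only (not part of the patch): a rough Python model of
the change, rather than the tool's C.  The helper name parse_period and
the simplified option grammar are assumptions made for this sketch, not
perf's parser.  It shows how tracking "was a number seen?" separately
from the value lets an explicit zero survive:

```python
import re

DEFAULT_PERIOD = 100000  # default quoted in the commit message above


def parse_period(opt):
    """Return the period from a simplified 'i<number><unit>' option string.

    Hypothetical helper: it only models the 'was a period given?' check,
    not the full --itrace grammar.
    """
    match = re.match(r"i\s*(\d+)?", opt)
    if match is None:
        raise ValueError("expected an 'i' option, got %r" % opt)
    digits = match.group(1)
    period_set = digits is not None      # mirrors the new period_set flag
    period = int(digits) if period_set else 0
    if not period_set:                   # the old code tested 'not period'
        period = DEFAULT_PERIOD
    return period


print(parse_period("i"))      # no period given, so the default applies
print(parse_period("i0"))     # explicit zero is kept
print(parse_period("i10us"))  # explicit period is kept
```

With the old test ('not period'), the "i0" case would have fallen back
to the default as well.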

diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index a980e7c50ee0..c4993b2e6c50 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -950,6 +950,7 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 	const char *p;
 	char *endptr;
 	bool period_type_set = false;
+	bool period_set = false;
 
 	synth_opts->set = true;
 
@@ -971,6 +972,7 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 				p += 1;
 			if (isdigit(*p)) {
 				synth_opts->period = strtoull(p, &endptr, 10);
+				period_set = true;
 				p = endptr;
 				while (*p == ' ' || *p == ',')
 					p += 1;
@@ -1053,7 +1055,7 @@ out:
 		if (!period_type_set)
 			synth_opts->period_type =
 					PERF_ITRACE_DEFAULT_PERIOD_TYPE;
-		if (!synth_opts->period)
+		if (!period_set)
 			synth_opts->period = PERF_ITRACE_DEFAULT_PERIOD;
 	}
 
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 38942e1eac8f..c8bb5ca6a157 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -720,7 +720,7 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
 
 		if (!params.period) {
 			params.period_type = INTEL_PT_PERIOD_INSTRUCTIONS;
-			params.period = 1000;
+			params.period = 1;
 		}
 	}
 
-- 
1.9.1



* [PATCH 02/25] perf report: Fix sample type validation for synthesized callchains
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
  2015-09-25 13:15 ` [PATCH 01/25] perf auxtrace: Fix 'instructions' period of zero Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-29  8:41   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 03/25] perf intel-pt: Fix potential loop forever Adrian Hunter
                   ` (23 subsequent siblings)
  25 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

Processing instruction tracing data (e.g. Intel PT) can synthesize
callchains e.g.

	$ perf record -e intel_pt//u uname
	$ perf report --stdio --itrace=ige

However perf report's callgraph option gets extra validation, so:

	$ perf report --stdio --itrace=ige -gflat
	Error:
	Selected -g or --branch-history but no callchain data. Did
	you call 'perf record' without -g?
	# To display the perf.data header info,
	# please use --header/--header-only options.
	#

Fix the validation to know about instruction tracing options so
that the above command works.

A side-effect of the change is that the default option to
accumulate the callchain of child functions comes into force.
To get the previous behaviour the --no-children option can be
used e.g.

       $ perf report --stdio --itrace=ige -gflat --no-children

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-report.c | 6 ++++++
 1 file changed, 6 insertions(+)
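
For illustration only (not part of the patch): a small Python model of
the added condition.  The function and parameter names are made up for
this sketch; only the shape of the boolean test follows the diff below,
and PERF_SAMPLE_CALLCHAIN uses the bit value from the perf_event ABI.

```python
PERF_SAMPLE_CALLCHAIN = 1 << 5  # bit value from the perf_event ABI


def effective_sample_type(sample_type, synth_callchain, synth_opts_set,
                          is_pipe, has_auxtrace):
    """Treat callchains as present when trace decoding may synthesize them.

    Hypothetical stand-ins for the C fields:
      synth_callchain -> itrace_synth_opts->callchain
      synth_opts_set  -> itrace_synth_opts->set
      has_auxtrace    -> the HEADER_AUXTRACE feature bit in the file header
    """
    auxtrace_with_default_opts = (not is_pipe and has_auxtrace
                                  and not synth_opts_set)
    if synth_callchain or auxtrace_with_default_opts:
        sample_type |= PERF_SAMPLE_CALLCHAIN
    return sample_type


# --itrace=ige given: callchains were requested explicitly.
print(bool(effective_sample_type(0, True, True, False, True)
           & PERF_SAMPLE_CALLCHAIN))
# --itrace=ibe given: no callchains requested, so the bit stays clear.
print(bool(effective_sample_type(0, False, True, False, True)
           & PERF_SAMPLE_CALLCHAIN))
```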

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index e4e3f1432622..0d53b485a87b 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -214,6 +214,12 @@ static int report__setup_sample_type(struct report *rep)
 	u64 sample_type = perf_evlist__combined_sample_type(session->evlist);
 	bool is_pipe = perf_data_file__is_pipe(session->file);
 
+	if (session->itrace_synth_opts->callchain ||
+	    (!is_pipe &&
+	     perf_header__has_feat(&session->header, HEADER_AUXTRACE) &&
+	     !session->itrace_synth_opts->set))
+		sample_type |= PERF_SAMPLE_CALLCHAIN;
+
 	if (!is_pipe && !(sample_type & PERF_SAMPLE_CALLCHAIN)) {
 		if (sort__has_parent) {
 			ui__error("Selected --sort parent, but no "
-- 
1.9.1



* [PATCH 03/25] perf intel-pt: Fix potential loop forever
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
  2015-09-25 13:15 ` [PATCH 01/25] perf auxtrace: Fix 'instructions' period of zero Adrian Hunter
  2015-09-25 13:15 ` [PATCH 02/25] perf report: Fix sample type validation for synthesized callchains Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-29  8:42   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 04/25] perf intel-pt: Make logging slightly more efficient Adrian Hunter
                   ` (22 subsequent siblings)
  25 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

TSC packets contain only 7 bytes of TSC.  The 8th byte is assumed
to change so infrequently that its value can be inferred.  However
the logic must cater for a 7 byte wraparound, which it does by
adding 1 to the top byte.

The existing code was doing that with a while loop even though the
addition should only need to be done once.  That logic won't work
(will loop forever) if TSC wraps around at the 8th byte.
Theoretically that would take at least 10 years, unless something
else went wrong.

And what else could go wrong?  Well, if the chunks of trace data
are processed out of order, it will make it look like the 7-byte
TSC has gone backwards (i.e. wrapped).  If that happens 256 times
then the code will be stuck in the while loop.

Fix that by getting rid of the unnecessary while loop.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/intel-pt-decoder/intel-pt-decoder.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
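
For illustration only (not part of the patch): a Python model of the
wraparound fix-up, with 64-bit unsigned arithmetic emulated by masking.
The function names and the iteration cap are assumptions of this sketch;
the C code had no such cap, which is why it could spin forever.

```python
MASK64 = (1 << 64) - 1
WRAP = 1 << 56          # a 7-byte TSC payload wraps every 2**56 ticks


def fix_wrap_loop(timestamp, last, max_iters=1000):
    """Old behaviour: keep adding 2**56 (mod 2**64) while behind 'last'.

    Returns the number of additions made, or None if the cap was hit,
    i.e. the uncapped loop would not have terminated.
    """
    for i in range(max_iters):
        if timestamp >= last:
            return i
        timestamp = (timestamp + WRAP) & MASK64   # u64 arithmetic wraps
    return None


def fix_wrap_once(timestamp, last):
    """New behaviour: add 2**56 at most once."""
    if timestamp < last:
        timestamp = (timestamp + WRAP) & MASK64
    return timestamp


# Ordinary 7-byte wrap: one addition is enough.
print(fix_wrap_loop(0x0100000000000001, 0x0100000000000005))
# Top byte already 0xff: adding 2**56 overflows and never catches up.
print(fix_wrap_loop(0xff00000000000001, 0xff00000000000005))
print(hex(fix_wrap_once(0xff00000000000001, 0xff00000000000005)))
```

In the second case the value cycles through all 256 top-byte values
without ever reaching 'last', so only the single 'if' terminates.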

diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
index 22ba50224319..9409d014b46c 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -650,7 +650,7 @@ static int intel_pt_calc_cyc_cb(struct intel_pt_pkt_info *pkt_info)
 		if (data->from_mtc && timestamp < data->timestamp &&
 		    data->timestamp - timestamp < decoder->tsc_slip)
 			return 1;
-		while (timestamp < data->timestamp)
+		if (timestamp < data->timestamp)
 			timestamp += (1ULL << 56);
 		if (pkt_info->last_packet_type != INTEL_PT_CYC) {
 			if (data->from_mtc)
@@ -1191,7 +1191,7 @@ static void intel_pt_calc_tsc_timestamp(struct intel_pt_decoder *decoder)
 					timestamp);
 			timestamp = decoder->timestamp;
 		}
-		while (timestamp < decoder->timestamp) {
+		if (timestamp < decoder->timestamp) {
 			intel_pt_log_to("Wraparound timestamp", timestamp);
 			timestamp += (1ULL << 56);
 			decoder->tsc_timestamp = timestamp;
-- 
1.9.1



* [PATCH 04/25] perf intel-pt: Make logging slightly more efficient
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (2 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 03/25] perf intel-pt: Fix potential loop forever Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-29  8:42   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 05/25] perf script: Allow time to be displayed in nanoseconds Adrian Hunter
                   ` (21 subsequent siblings)
  25 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

Logging is only used for debugging.  Use macros to check whether
logging is enabled at the call site, which saves calling into the
functions only to return immediately when logging is not enabled.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/intel-pt-decoder/intel-pt-log.c | 21 +++++++-------
 tools/perf/util/intel-pt-decoder/intel-pt-log.h | 38 +++++++++++++++++++++----
 2 files changed, 43 insertions(+), 16 deletions(-)
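
For illustration only (not part of the patch): a toy Python analogue of
the idea.  The class and method names are invented for this sketch, and
Python has no macros, so log_guarded() merely stands in for the flag
test that the new macros perform at each call site.

```python
class PtLog:
    """Toy logger; 'enabled' stands in for intel_pt_enable_logging."""

    def __init__(self):
        self.enabled = False
        self.entered = 0      # how many times _log() was entered
        self.lines = []       # messages that were actually logged

    def _log(self, fmt, *args):
        # Stand-in for __intel_pt_log(): it still checks the flag itself.
        self.entered += 1
        if self.enabled:
            self.lines.append(fmt % args)

    def log_unguarded(self, fmt, *args):
        # Before the patch: always enter the logging function.
        self._log(fmt, *args)

    def log_guarded(self, fmt, *args):
        # After the patch: test the flag first, like the new macros.
        if self.enabled:
            self._log(fmt, *args)


log = PtLog()
log.log_unguarded("%s at 0x%x", "Wraparound timestamp", 1 << 56)
log.log_guarded("%s at 0x%x", "Wraparound timestamp", 1 << 56)
print(log.entered)   # only the unguarded call entered _log()
```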

diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-log.c b/tools/perf/util/intel-pt-decoder/intel-pt-log.c
index d09c7d9f9050..319bef33a64b 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-log.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-log.c
@@ -29,18 +29,18 @@
 
 static FILE *f;
 static char log_name[MAX_LOG_NAME];
-static bool enable_logging;
+bool intel_pt_enable_logging;
 
 void intel_pt_log_enable(void)
 {
-	enable_logging = true;
+	intel_pt_enable_logging = true;
 }
 
 void intel_pt_log_disable(void)
 {
 	if (f)
 		fflush(f);
-	enable_logging = false;
+	intel_pt_enable_logging = false;
 }
 
 void intel_pt_log_set_name(const char *name)
@@ -80,7 +80,7 @@ static void intel_pt_print_no_data(uint64_t pos, int indent)
 
 static int intel_pt_log_open(void)
 {
-	if (!enable_logging)
+	if (!intel_pt_enable_logging)
 		return -1;
 
 	if (f)
@@ -91,15 +91,15 @@ static int intel_pt_log_open(void)
 
 	f = fopen(log_name, "w+");
 	if (!f) {
-		enable_logging = false;
+		intel_pt_enable_logging = false;
 		return -1;
 	}
 
 	return 0;
 }
 
-void intel_pt_log_packet(const struct intel_pt_pkt *packet, int pkt_len,
-			 uint64_t pos, const unsigned char *buf)
+void __intel_pt_log_packet(const struct intel_pt_pkt *packet, int pkt_len,
+			   uint64_t pos, const unsigned char *buf)
 {
 	char desc[INTEL_PT_PKT_DESC_MAX];
 
@@ -111,7 +111,7 @@ void intel_pt_log_packet(const struct intel_pt_pkt *packet, int pkt_len,
 	fprintf(f, "%s\n", desc);
 }
 
-void intel_pt_log_insn(struct intel_pt_insn *intel_pt_insn, uint64_t ip)
+void __intel_pt_log_insn(struct intel_pt_insn *intel_pt_insn, uint64_t ip)
 {
 	char desc[INTEL_PT_INSN_DESC_MAX];
 	size_t len = intel_pt_insn->length;
@@ -128,7 +128,8 @@ void intel_pt_log_insn(struct intel_pt_insn *intel_pt_insn, uint64_t ip)
 		fprintf(f, "Bad instruction!\n");
 }
 
-void intel_pt_log_insn_no_data(struct intel_pt_insn *intel_pt_insn, uint64_t ip)
+void __intel_pt_log_insn_no_data(struct intel_pt_insn *intel_pt_insn,
+				 uint64_t ip)
 {
 	char desc[INTEL_PT_INSN_DESC_MAX];
 
@@ -142,7 +143,7 @@ void intel_pt_log_insn_no_data(struct intel_pt_insn *intel_pt_insn, uint64_t ip)
 		fprintf(f, "Bad instruction!\n");
 }
 
-void intel_pt_log(const char *fmt, ...)
+void __intel_pt_log(const char *fmt, ...)
 {
 	va_list args;
 
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-log.h b/tools/perf/util/intel-pt-decoder/intel-pt-log.h
index db3942f83677..debe751dc3d6 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-log.h
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-log.h
@@ -25,20 +25,46 @@ void intel_pt_log_enable(void);
 void intel_pt_log_disable(void);
 void intel_pt_log_set_name(const char *name);
 
-void intel_pt_log_packet(const struct intel_pt_pkt *packet, int pkt_len,
-			 uint64_t pos, const unsigned char *buf);
+void __intel_pt_log_packet(const struct intel_pt_pkt *packet, int pkt_len,
+			   uint64_t pos, const unsigned char *buf);
 
 struct intel_pt_insn;
 
-void intel_pt_log_insn(struct intel_pt_insn *intel_pt_insn, uint64_t ip);
-void intel_pt_log_insn_no_data(struct intel_pt_insn *intel_pt_insn,
-			       uint64_t ip);
+void __intel_pt_log_insn(struct intel_pt_insn *intel_pt_insn, uint64_t ip);
+void __intel_pt_log_insn_no_data(struct intel_pt_insn *intel_pt_insn,
+				 uint64_t ip);
 
 __attribute__((format(printf, 1, 2)))
-void intel_pt_log(const char *fmt, ...);
+void __intel_pt_log(const char *fmt, ...);
+
+#define intel_pt_log(fmt, ...) \
+	do { \
+		if (intel_pt_enable_logging) \
+			__intel_pt_log(fmt, ##__VA_ARGS__); \
+	} while (0)
+
+#define intel_pt_log_packet(arg, ...) \
+	do { \
+		if (intel_pt_enable_logging) \
+			__intel_pt_log_packet(arg, ##__VA_ARGS__); \
+	} while (0)
+
+#define intel_pt_log_insn(arg, ...) \
+	do { \
+		if (intel_pt_enable_logging) \
+			__intel_pt_log_insn(arg, ##__VA_ARGS__); \
+	} while (0)
+
+#define intel_pt_log_insn_no_data(arg, ...) \
+	do { \
+		if (intel_pt_enable_logging) \
+			__intel_pt_log_insn_no_data(arg, ##__VA_ARGS__); \
+	} while (0)
 
 #define x64_fmt "0x%" PRIx64
 
+extern bool intel_pt_enable_logging;
+
 static inline void intel_pt_log_at(const char *msg, uint64_t u)
 {
 	intel_pt_log("%s at " x64_fmt "\n", msg, u);
-- 
1.9.1



* [PATCH 05/25] perf script: Allow time to be displayed in nanoseconds
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (3 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 04/25] perf intel-pt: Make logging slightly more efficient Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-29  8:42   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 06/25] perf tools: Warn when AUX data has been lost Adrian Hunter
                   ` (20 subsequent siblings)
  25 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

Add option --ns to display time to 9 decimal places.
That is useful in some cases, for example when using
Intel PT cycle accurate mode.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-script.txt | 3 +++
 tools/perf/builtin-script.c              | 8 +++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)
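
For illustration only (not part of the patch): a Python sketch of the
two time formats.  format_time is a made-up helper; the constants and
format widths follow print_sample_start() in the diff below.

```python
NSECS_PER_SEC = 1000000000
NSECS_PER_USEC = 1000


def format_time(time_ns, nanosecs=False):
    """Format a nanosecond timestamp with 6 or 9 decimal places."""
    secs, nsecs = divmod(time_ns, NSECS_PER_SEC)
    if nanosecs:                       # what --ns selects
        return "%5d.%09d: " % (secs, nsecs)
    return "%5d.%06d: " % (secs, nsecs // NSECS_PER_USEC)


print(format_time(1234567890123))
print(format_time(1234567890123, nanosecs=True))
```

The default output truncates to microseconds, so two samples less than
1us apart print the same time unless --ns is given.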

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index dc3ec783b7bd..b3b42f9285df 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -249,6 +249,9 @@ include::itrace.txt[]
 --full-source-path::
 	Show the full path for source files for srcline output.
 
+--ns::
+	Use 9 decimal places when displaying time (i.e. show the nanoseconds)
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-script-perl[1],
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 284a76e04628..092843968791 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -29,6 +29,7 @@ static bool			no_callchain;
 static bool			latency_format;
 static bool			system_wide;
 static bool			print_flags;
+static bool			nanosecs;
 static const char		*cpu_list;
 static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
 
@@ -415,7 +416,10 @@ static void print_sample_start(struct perf_sample *sample,
 		secs = nsecs / NSECS_PER_SEC;
 		nsecs -= secs * NSECS_PER_SEC;
 		usecs = nsecs / NSECS_PER_USEC;
-		printf("%5lu.%06lu: ", secs, usecs);
+		if (nanosecs)
+			printf("%5lu.%09llu: ", secs, nsecs);
+		else
+			printf("%5lu.%06lu: ", secs, usecs);
 	}
 }
 
@@ -1695,6 +1699,8 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_BOOLEAN('\0', "show-switch-events", &script.show_switch_events,
 		    "Show context switch events (if recorded)"),
 	OPT_BOOLEAN('f', "force", &file.force, "don't complain, do it"),
+	OPT_BOOLEAN(0, "ns", &nanosecs,
+		    "Use 9 decimal places when displaying time"),
 	OPT_CALLBACK_OPTARG(0, "itrace", &itrace_synth_opts, NULL, "opts",
 			    "Instruction Tracing options",
 			    itrace_parse_synth_opts),
-- 
1.9.1



* [PATCH 06/25] perf tools: Warn when AUX data has been lost
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (4 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 05/25] perf script: Allow time to be displayed in nanoseconds Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-29  8:43   ` [tip:perf/core] perf session: " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 07/25] perf tools: Add more documentation to export-to-postgresql.py script Adrian Hunter
                   ` (19 subsequent siblings)
  25 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

By default perf record will postprocess the perf.data file
to determine build-ids.  When that happens, the number of lost
perf events is displayed.

Make that also happen for AUX events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/event.h   |  1 +
 tools/perf/util/session.c | 10 ++++++++++
 2 files changed, 11 insertions(+)
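
For illustration only (not part of the patch): a Python model of the
accounting.  aux_lost_warning is a made-up helper that takes the 'flags'
field of each PERF_RECORD_AUX event; the flag value is the one from the
perf_event ABI and the message mirrors the ui__warning() text below.

```python
PERF_AUX_FLAG_TRUNCATED = 0x01  # flag bit from the perf_event ABI


def aux_lost_warning(aux_flags):
    """Return the warning text, or None if no AUX record was truncated."""
    flags = list(aux_flags)
    lost = sum(1 for f in flags if f & PERF_AUX_FLAG_TRUNCATED)
    if lost == 0:
        return None
    return "AUX data lost %d times out of %d!" % (lost, len(flags))


# Four AUX records, two of them with the truncated bit set.
print(aux_lost_warning([0x0, 0x1, 0x0, 0x3]))
```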

diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index be5cbc7be889..a0dbcbd4f6d8 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -257,6 +257,7 @@ struct events_stats {
 	u64 total_non_filtered_period;
 	u64 total_lost;
 	u64 total_lost_samples;
+	u64 total_aux_lost;
 	u64 total_invalid_chains;
 	u32 nr_events[PERF_RECORD_HEADER_MAX];
 	u32 nr_non_filtered_samples;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index f5e000030a5e..15c84cad213a 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1101,6 +1101,9 @@ static int machines__deliver_event(struct machines *machines,
 	case PERF_RECORD_UNTHROTTLE:
 		return tool->unthrottle(tool, event, sample, machine);
 	case PERF_RECORD_AUX:
+		if (tool->aux == perf_event__process_aux &&
+		    (event->aux.flags & PERF_AUX_FLAG_TRUNCATED))
+			evlist->stats.total_aux_lost += 1;
 		return tool->aux(tool, event, sample, machine);
 	case PERF_RECORD_ITRACE_START:
 		return tool->itrace_start(tool, event, sample, machine);
@@ -1346,6 +1349,13 @@ static void perf_session__warn_about_errors(const struct perf_session *session)
 		}
 	}
 
+	if (session->tool->aux == perf_event__process_aux &&
+	    stats->total_aux_lost != 0) {
+		ui__warning("AUX data lost %" PRIu64 " times out of %u!\n\n",
+			    stats->total_aux_lost,
+			    stats->nr_events[PERF_RECORD_AUX]);
+	}
+
 	if (stats->nr_unknown_events != 0) {
 		ui__warning("Found %u unknown events!\n\n"
 			    "Is this an older tool processing a perf.data "
-- 
1.9.1



* [PATCH 07/25] perf tools: Add more documentation to export-to-postgresql.py script
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (5 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 06/25] perf tools: Warn when AUX data has been lost Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-29  8:43   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 08/25] perf auxtrace: Add option to synthesize branch stacks on samples Adrian Hunter
                   ` (18 subsequent siblings)
  25 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

Add some comments to the script and some 'views' to the created
database that better illustrate the database structure and how it
can be used.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/scripts/python/export-to-postgresql.py | 221 ++++++++++++++++++++++
 1 file changed, 221 insertions(+)
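
For illustration only (not part of the patch): a Python 3 sketch of the
call_paths walk that the new comments describe.  It uses an in-memory
SQLite database as a stand-in for PostgreSQL, and the table is reduced
to made-up (id, parent_id, symbol) rows; the exported schema has more
columns and stores symbol ids rather than names.

```python
import sqlite3


def call_stack(conn, call_path_id):
    """Walk parent_id links from call_path_id up towards the root.

    Assumes, like the example in the patch, that ids 0 and 1 mark the
    end of the chain.
    """
    stack = []
    while call_path_id not in (0, 1):
        row = conn.execute(
            "SELECT symbol, parent_id FROM call_paths WHERE id = ?",
            (call_path_id,)).fetchone()
        if row is None:
            raise LookupError("no call_paths row with id %d" % call_path_id)
        stack.append(row[0])
        call_path_id = row[1]
    return stack


# Tiny stand-in for the exported database (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE call_paths (id INTEGER PRIMARY KEY, "
             "parent_id INTEGER, symbol TEXT)")
conn.executemany("INSERT INTO call_paths VALUES (?, ?, ?)",
                 [(1, 0, "root"), (2, 1, "main"),
                  (3, 2, "do_work"), (4, 3, "malloc")])
print(call_stack(conn, 4))   # innermost function first
```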

diff --git a/tools/perf/scripts/python/export-to-postgresql.py b/tools/perf/scripts/python/export-to-postgresql.py
index 84a32037a80f..1b02cdc0cab6 100644
--- a/tools/perf/scripts/python/export-to-postgresql.py
+++ b/tools/perf/scripts/python/export-to-postgresql.py
@@ -61,6 +61,142 @@ import datetime
 #
 # An example of using the database is provided by the script
 # call-graph-from-postgresql.py.  Refer to that script for details.
+#
+# Tables:
+#
+#	The tables largely correspond to perf tools' data structures.  They are largely self-explanatory.
+#
+#	samples
+#
+#		'samples' is the main table. It represents what instruction was executing at a point in time
+#		when something (a selected event) happened.  The memory address is the instruction pointer or 'ip'.
+#
+#	calls
+#
+#		'calls' represents function calls and is related to 'samples' by 'call_id' and 'return_id'.
+#		'calls' is only created when the 'calls' option to this script is specified.
+#
+#	call_paths
+#
+#		'call_paths' represents all the call stacks.  Each 'call' has an associated record in 'call_paths'.
+#		'call_paths' is only created when the 'calls' option to this script is specified.
+#
+#	branch_types
+#
+#		'branch_types' provides descriptions for each type of branch.
+#
+#	comm_threads
+#
+#		'comm_threads' shows how 'comms' relates to 'threads'.
+#
+#	comms
+#
+#		'comms' contains a record for each 'comm' - the name given to the executable that is running.
+#
+#	dsos
+#
+#		'dsos' contains a record for each executable file or library.
+#
+#	machines
+#
+#		'machines' can be used to distinguish virtual machines if virtualization is supported.
+#
+#	selected_events
+#
+#		'selected_events' contains a record for each kind of event that has been sampled.
+#
+#	symbols
+#
+#		'symbols' contains a record for each symbol.  Only symbols that have samples are present.
+#
+#	threads
+#
+#		'threads' contains a record for each thread.
+#
+# Views:
+#
+#	Most of the tables have views for more friendly display.  The views are:
+#
+#		calls_view
+#		call_paths_view
+#		comm_threads_view
+#		dsos_view
+#		machines_view
+#		samples_view
+#		symbols_view
+#		threads_view
+#
+# More examples of browsing the database with psql:
+#   Note that some of the examples are not the most optimal SQL query.
+#   Note that call information is only available if the script's 'calls' option has been used.
+#
+#	Top 10 function calls (not aggregated by symbol):
+#
+#		SELECT * FROM calls_view ORDER BY elapsed_time DESC LIMIT 10;
+#
+#	Top 10 function calls (aggregated by symbol):
+#
+#		SELECT symbol_id,(SELECT name FROM symbols WHERE id = symbol_id) AS symbol,
+#			SUM(elapsed_time) AS tot_elapsed_time,SUM(branch_count) AS tot_branch_count
+#			FROM calls_view GROUP BY symbol_id ORDER BY tot_elapsed_time DESC LIMIT 10;
+#
+#		Note that the branch count gives a rough estimation of cpu usage, so functions
+#		that took a long time but have a relatively low branch count must have spent time
+#		waiting.
+#
+#	Find symbols by pattern matching on part of the name (e.g. names containing 'alloc'):
+#
+#		SELECT * FROM symbols_view WHERE name LIKE '%alloc%';
+#
+#	Top 10 function calls for a specific symbol (e.g. whose symbol_id is 187):
+#
+#		SELECT * FROM calls_view WHERE symbol_id = 187 ORDER BY elapsed_time DESC LIMIT 10;
+#
+#	Show function calls made by function in the same context (i.e. same call path) (e.g. one with call_path_id 254):
+#
+#		SELECT * FROM calls_view WHERE parent_call_path_id = 254;
+#
+#	Show branches made during a function call (e.g. where call_id is 29357 and return_id is 29370 and tid is 29670)
+#
+#		SELECT * FROM samples_view WHERE id >= 29357 AND id <= 29370 AND tid = 29670 AND event LIKE 'branches%';
+#
+#	Show transactions:
+#
+#		SELECT * FROM samples_view WHERE event = 'transactions';
+#
+#		Note that a transaction start has 'in_tx' true, whereas a transaction end has 'in_tx' false.
+#		Transaction aborts have branch_type_name 'transaction abort'
+#
+#	Show transaction aborts:
+#
+#		SELECT * FROM samples_view WHERE event = 'transactions' AND branch_type_name = 'transaction abort';
+#
+# To print a call stack requires walking the call_paths table.  For example this python script:
+#   #!/usr/bin/python2
+#
+#   import sys
+#   from PySide.QtSql import *
+#
+#   if __name__ == '__main__':
+#           if (len(sys.argv) < 3):
+#                   print >> sys.stderr, "Usage is: printcallstack.py <database name> <call_path_id>"
+#                   raise Exception("Too few arguments")
+#           dbname = sys.argv[1]
+#           call_path_id = sys.argv[2]
+#           db = QSqlDatabase.addDatabase('QPSQL')
+#           db.setDatabaseName(dbname)
+#           if not db.open():
+#                   raise Exception("Failed to open database " + dbname + " error: " + db.lastError().text())
+#           query = QSqlQuery(db)
+#           print "    id          ip  symbol_id  symbol                          dso_id  dso_short_name"
+#           while call_path_id != 0 and call_path_id != 1:
+#                   ret = query.exec_('SELECT * FROM call_paths_view WHERE id = ' + str(call_path_id))
+#                   if not ret:
+#                           raise Exception("Query failed: " + query.lastError().text())
+#                   if not query.next():
+#                           raise Exception("Query failed")
+#                   print "{0:>6}  {1:>10}  {2:>9}  {3:<30}  {4:>6}  {5:<30}".format(query.value(0), query.value(1), query.value(2), query.value(3), query.value(4), query.value(5))
+#                   call_path_id = query.value(6)
 
 from PySide.QtSql import *
 
@@ -244,6 +380,91 @@ if perf_db_export_calls:
 		'parent_call_path_id	bigint,'
 		'flags		integer)')
 
+do_query(query, 'CREATE VIEW machines_view AS '
+	'SELECT '
+		'id,'
+		'pid,'
+		'root_dir,'
+		'CASE WHEN id=0 THEN \'unknown\' WHEN pid=-1 THEN \'host\' ELSE \'guest\' END AS host_or_guest'
+	' FROM machines')
+
+do_query(query, 'CREATE VIEW dsos_view AS '
+	'SELECT '
+		'id,'
+		'machine_id,'
+		'(SELECT host_or_guest FROM machines_view WHERE id = machine_id) AS host_or_guest,'
+		'short_name,'
+		'long_name,'
+		'build_id'
+	' FROM dsos')
+
+do_query(query, 'CREATE VIEW symbols_view AS '
+	'SELECT '
+		'id,'
+		'name,'
+		'(SELECT short_name FROM dsos WHERE id=dso_id) AS dso,'
+		'dso_id,'
+		'sym_start,'
+		'sym_end,'
+		'CASE WHEN binding=0 THEN \'local\' WHEN binding=1 THEN \'global\' ELSE \'weak\' END AS binding'
+	' FROM symbols')
+
+do_query(query, 'CREATE VIEW threads_view AS '
+	'SELECT '
+		'id,'
+		'machine_id,'
+		'(SELECT host_or_guest FROM machines_view WHERE id = machine_id) AS host_or_guest,'
+		'process_id,'
+		'pid,'
+		'tid'
+	' FROM threads')
+
+do_query(query, 'CREATE VIEW comm_threads_view AS '
+	'SELECT '
+		'comm_id,'
+		'(SELECT comm FROM comms WHERE id = comm_id) AS command,'
+		'thread_id,'
+		'(SELECT pid FROM threads WHERE id = thread_id) AS pid,'
+		'(SELECT tid FROM threads WHERE id = thread_id) AS tid'
+	' FROM comm_threads')
+
+if perf_db_export_calls:
+	do_query(query, 'CREATE VIEW call_paths_view AS '
+		'SELECT '
+			'c.id,'
+			'to_hex(c.ip) AS ip,'
+			'c.symbol_id,'
+			'(SELECT name FROM symbols WHERE id = c.symbol_id) AS symbol,'
+			'(SELECT dso_id FROM symbols WHERE id = c.symbol_id) AS dso_id,'
+			'(SELECT dso FROM symbols_view  WHERE id = c.symbol_id) AS dso_short_name,'
+			'c.parent_id,'
+			'to_hex(p.ip) AS parent_ip,'
+			'p.symbol_id AS parent_symbol_id,'
+			'(SELECT name FROM symbols WHERE id = p.symbol_id) AS parent_symbol,'
+			'(SELECT dso_id FROM symbols WHERE id = p.symbol_id) AS parent_dso_id,'
+			'(SELECT dso FROM symbols_view  WHERE id = p.symbol_id) AS parent_dso_short_name'
+		' FROM call_paths c INNER JOIN call_paths p ON p.id = c.parent_id')
+	do_query(query, 'CREATE VIEW calls_view AS '
+		'SELECT '
+			'calls.id,'
+			'thread_id,'
+			'(SELECT pid FROM threads WHERE id = thread_id) AS pid,'
+			'(SELECT tid FROM threads WHERE id = thread_id) AS tid,'
+			'(SELECT comm FROM comms WHERE id = comm_id) AS command,'
+			'call_path_id,'
+			'to_hex(ip) AS ip,'
+			'symbol_id,'
+			'(SELECT name FROM symbols WHERE id = symbol_id) AS symbol,'
+			'call_time,'
+			'return_time,'
+			'return_time - call_time AS elapsed_time,'
+			'branch_count,'
+			'call_id,'
+			'return_id,'
+			'CASE WHEN flags=1 THEN \'no call\' WHEN flags=2 THEN \'no return\' WHEN flags=3 THEN \'no call/return\' ELSE \'\' END AS flags,'
+			'parent_call_path_id'
+		' FROM calls INNER JOIN call_paths ON call_paths.id = call_path_id')
+
 do_query(query, 'CREATE VIEW samples_view AS '
 	'SELECT '
 		'id,'
-- 
1.9.1



* [PATCH 08/25] perf auxtrace: Add option to synthesize branch stacks on samples
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (6 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 07/25] perf tools: Add more documentation to export-to-postgresql.py script Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-29  8:43   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 09/25] perf report: Adjust sample type validation for synthesized branch stacks Adrian Hunter
                   ` (17 subsequent siblings)
  25 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

Add AUX area tracing option 'l' to synthesize branch stacks on samples
just like sample type PERF_SAMPLE_BRANCH_STACK.  This is taken into use
by Intel PT in a subsequent patch.
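
As a rough illustration of how the 'l' option and its optional size are
handled, here is a minimal standalone sketch.  The struct lbr_opts type and
the parse_last_branch() helper are hypothetical names made up for this
example; only the default (64), the maximum (1024) and the rejection of zero
are taken from the patch below.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEFAULT_LAST_BRANCH_SZ	64	/* mirrors PERF_ITRACE_DEFAULT_LAST_BRANCH_SZ */
#define MAX_LAST_BRANCH_SZ	1024	/* mirrors PERF_ITRACE_MAX_LAST_BRANCH_SZ */

/* Hypothetical, cut-down stand-in for struct itrace_synth_opts. */
struct lbr_opts {
	bool last_branch;
	unsigned int last_branch_sz;
};

/*
 * Parse the text that follows an 'l' in an --itrace string, e.g. "10" in
 * "il10".  Returns 0 on success and sets *end to the first unconsumed
 * character.  Returns -1 if the size is zero or above the maximum.
 */
static int parse_last_branch(const char *p, struct lbr_opts *opts,
			     const char **end)
{
	assert(p && opts && end);

	opts->last_branch = true;
	opts->last_branch_sz = DEFAULT_LAST_BRANCH_SZ;

	/* Separators between the letter and the number are skipped. */
	while (*p == ' ' || *p == ',')
		p++;

	if (isdigit((unsigned char)*p)) {
		char *endptr;
		unsigned long val = strtoul(p, &endptr, 10);

		p = endptr;
		if (!val || val > MAX_LAST_BRANCH_SZ)
			return -1;
		opts->last_branch_sz = (unsigned int)val;
	}

	*end = p;
	return 0;
}
```

With this sketch, parsing "10e" would select 10 entries and leave 'e' for
the caller to handle, while an empty string keeps the default of 64.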

Based-on-patch-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/itrace.txt |  4 ++++
 tools/perf/util/auxtrace.c          | 20 ++++++++++++++++++++
 tools/perf/util/auxtrace.h          |  4 ++++
 3 files changed, 28 insertions(+)

diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
index 2ff946677e3b..65453f4c7006 100644
--- a/tools/perf/Documentation/itrace.txt
+++ b/tools/perf/Documentation/itrace.txt
@@ -6,6 +6,7 @@
 		e	synthesize error events
 		d	create a debug log
 		g	synthesize a call chain (use with i or x)
+		l	synthesize last branch entries (use with i or x)
 
 	The default is all events i.e. the same as --itrace=ibxe
 
@@ -20,3 +21,6 @@
 
 	Also the call chain size (default 16, max. 1024) for instructions or
 	transactions events can be specified.
+
+	Also the number of last branch entries (default 64, max. 1024) for
+	instructions or transactions events can be specified.
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index c4993b2e6c50..7f10430af39c 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -926,6 +926,8 @@ s64 perf_event__process_auxtrace(struct perf_tool *tool,
 #define PERF_ITRACE_DEFAULT_PERIOD		100000
 #define PERF_ITRACE_DEFAULT_CALLCHAIN_SZ	16
 #define PERF_ITRACE_MAX_CALLCHAIN_SZ		1024
+#define PERF_ITRACE_DEFAULT_LAST_BRANCH_SZ	64
+#define PERF_ITRACE_MAX_LAST_BRANCH_SZ		1024
 
 void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts)
 {
@@ -936,6 +938,7 @@ void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts)
 	synth_opts->period_type = PERF_ITRACE_DEFAULT_PERIOD_TYPE;
 	synth_opts->period = PERF_ITRACE_DEFAULT_PERIOD;
 	synth_opts->callchain_sz = PERF_ITRACE_DEFAULT_CALLCHAIN_SZ;
+	synth_opts->last_branch_sz = PERF_ITRACE_DEFAULT_LAST_BRANCH_SZ;
 }
 
 /*
@@ -1043,6 +1046,23 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 				synth_opts->callchain_sz = val;
 			}
 			break;
+		case 'l':
+			synth_opts->last_branch = true;
+			synth_opts->last_branch_sz =
+					PERF_ITRACE_DEFAULT_LAST_BRANCH_SZ;
+			while (*p == ' ' || *p == ',')
+				p += 1;
+			if (isdigit(*p)) {
+				unsigned int val;
+
+				val = strtoul(p, &endptr, 10);
+				p = endptr;
+				if (!val ||
+				    val > PERF_ITRACE_MAX_LAST_BRANCH_SZ)
+					goto out_err;
+				synth_opts->last_branch_sz = val;
+			}
+			break;
 		case ' ':
 		case ',':
 			break;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index bf72b77a588a..b86f90db1352 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -63,7 +63,9 @@ enum itrace_period_type {
  * @calls: limit branch samples to calls (can be combined with @returns)
  * @returns: limit branch samples to returns (can be combined with @calls)
  * @callchain: add callchain to 'instructions' events
+ * @last_branch: add branch context to 'instructions' events
  * @callchain_sz: maximum callchain size
+ * @last_branch_sz: branch context size
  * @period: 'instructions' events period
  * @period_type: 'instructions' events period type
  */
@@ -79,7 +81,9 @@ struct itrace_synth_opts {
 	bool			calls;
 	bool			returns;
 	bool			callchain;
+	bool			last_branch;
 	unsigned int		callchain_sz;
+	unsigned int		last_branch_sz;
 	unsigned long long	period;
 	enum itrace_period_type	period_type;
 };
-- 
1.9.1



* [PATCH 09/25] perf report: Adjust sample type validation for synthesized branch stacks
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (7 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 08/25] perf auxtrace: Add option to synthesize branch stacks on samples Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-29  8:44   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 10/25] perf report: Also do default setup " Adrian Hunter
                   ` (16 subsequent siblings)
  25 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

perf report looks at event sample types to determine if branch stacks have
been sampled.  Adjust the validation to know about instruction tracing
options.

This change allows the use of the -b option, which would otherwise fail
with an error like:

	Error:
	Selected -b but no branch data. Did you call perf record without -b?
	# To display the perf.data header info,
	# please use --header/--header-only options.
	#

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-report.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 0d53b485a87b..7af35af5a5e5 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -220,6 +220,9 @@ static int report__setup_sample_type(struct report *rep)
 	     !session->itrace_synth_opts->set))
 		sample_type |= PERF_SAMPLE_CALLCHAIN;
 
+	if (session->itrace_synth_opts->last_branch)
+		sample_type |= PERF_SAMPLE_BRANCH_STACK;
+
 	if (!is_pipe && !(sample_type & PERF_SAMPLE_CALLCHAIN)) {
 		if (sort__has_parent) {
 			ui__error("Selected --sort parent, but no "
-- 
1.9.1



* [PATCH 10/25] perf report: Also do default setup for synthesized branch stacks
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (8 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 09/25] perf report: Adjust sample type validation for synthesized branch stacks Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-29  8:44   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 11/25] perf report: Skip events with null " Adrian Hunter
                   ` (15 subsequent siblings)
  25 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

perf report will default to displaying branch stacks (-b option) if they
are present.  Make that also happen for synthesized branch stacks.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-report.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 7af35af5a5e5..92f7c5a75208 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -829,6 +829,9 @@ repeat:
 	has_br_stack = perf_header__has_feat(&session->header,
 					     HEADER_BRANCH_STACK);
 
+	if (itrace_synth_opts.last_branch)
+		has_br_stack = true;
+
 	/*
 	 * Branch mode is a tristate:
 	 * -1 means default, so decide based on the file having branch data.
-- 
1.9.1



* [PATCH 11/25] perf report: Skip events with null branch stacks
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (9 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 10/25] perf report: Also do default setup " Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-29  8:44   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 12/25] perf inject: Set branch stack feature flag when synthesizing " Adrian Hunter
                   ` (14 subsequent siblings)
  25 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

A non-synthesized event might not have a branch stack if branch
stacks have been synthesized (using itrace options).

An example of that is when Intel PT records sched_switch events
for decoding purposes.  Those sched_switch events do not have
branch stacks, even though the Intel PT decoder may be synthesizing
other events that do have them because of the itrace options.

Skip such events when perf report is sorting in branch mode.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-report.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 92f7c5a75208..e94e5c7155af 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -163,14 +163,21 @@ static int process_sample_event(struct perf_tool *tool,
 	if (rep->cpu_list && !test_bit(sample->cpu, rep->cpu_bitmap))
 		goto out_put;
 
-	if (sort__mode == SORT_MODE__BRANCH)
+	if (sort__mode == SORT_MODE__BRANCH) {
+		/*
+		 * A non-synthesized event might not have a branch stack if
+		 * branch stacks have been synthesized (using itrace options).
+		 */
+		if (!sample->branch_stack)
+			goto out_put;
 		iter.ops = &hist_iter_branch;
-	else if (rep->mem_mode)
+	} else if (rep->mem_mode) {
 		iter.ops = &hist_iter_mem;
-	else if (symbol_conf.cumulate_callchain)
+	} else if (symbol_conf.cumulate_callchain) {
 		iter.ops = &hist_iter_cumulative;
-	else
+	} else {
 		iter.ops = &hist_iter_normal;
+	}
 
 	if (al.map != NULL)
 		al.map->dso->hit = 1;
-- 
1.9.1



* [PATCH 12/25] perf inject: Set branch stack feature flag when synthesizing branch stacks
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (10 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 11/25] perf report: Skip events with null " Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-29  8:45   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 13/25] perf intel-pt: Move branch filter logic Adrian Hunter
                   ` (13 subsequent siblings)
  25 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

The branch stack feature flag is set by 'perf record' when recording
data that contains branch stacks.  Consequently, when 'perf inject'
synthesizes branch stacks, the feature flag should be set also.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-inject.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index f62c49b35be0..8638fad8a085 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -537,9 +537,13 @@ static int __cmd_inject(struct perf_inject *inject)
 		 * The AUX areas have been removed and replaced with
 		 * synthesized hardware events, so clear the feature flag.
 		 */
-		if (inject->itrace_synth_opts.set)
+		if (inject->itrace_synth_opts.set) {
 			perf_header__clear_feat(&session->header,
 						HEADER_AUXTRACE);
+			if (inject->itrace_synth_opts.last_branch)
+				perf_header__set_feat(&session->header,
+						      HEADER_BRANCH_STACK);
+		}
 		session->header.data_offset = output_data_offset;
 		session->header.data_size = inject->bytes_written;
 		perf_session__write_header(session, session->evlist, fd, true);
-- 
1.9.1



* [PATCH 13/25] perf intel-pt: Move branch filter logic
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (11 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 12/25] perf inject: Set branch stack feature flag when synthesizing " Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-29  8:45   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 14/25] perf intel-pt: Support generating branch stack Adrian Hunter
                   ` (12 subsequent siblings)
  25 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

intel_pt_synth_branch_sample() skips synthesizing if the branch
does not match the branch filter.  That logic was sitting in the
middle of the function but is more efficiently placed at the
start of the function, so move it.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/intel-pt.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index c8bb5ca6a157..2c01e723826a 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -891,6 +891,9 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
 	union perf_event *event = ptq->event_buf;
 	struct perf_sample sample = { .ip = 0, };
 
+	if (pt->branches_filter && !(pt->branches_filter & ptq->flags))
+		return 0;
+
 	event->sample.header.type = PERF_RECORD_SAMPLE;
 	event->sample.header.misc = PERF_RECORD_MISC_USER;
 	event->sample.header.size = sizeof(struct perf_event_header);
@@ -909,9 +912,6 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
 	sample.flags = ptq->flags;
 	sample.insn_len = ptq->insn_len;
 
-	if (pt->branches_filter && !(pt->branches_filter & ptq->flags))
-		return 0;
-
 	if (pt->synth_opts.inject) {
 		ret = intel_pt_inject_event(event, &sample,
 					    pt->branches_sample_type,
-- 
1.9.1



* [PATCH 14/25] perf intel-pt: Support generating branch stack
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (12 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 13/25] perf intel-pt: Move branch filter logic Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-29  8:45   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 15/25] perf report: Make max_stack value allow for synthesized callchains Adrian Hunter
                   ` (11 subsequent siblings)
  25 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

Add support for generating branch stack context for PT samples.
The decoder reports a configurable number of branches as branch
context for each sample. Internally it keeps track of them by
using a simple sliding window.  We also flush the last branch
buffer on each sample to avoid overlapping intervals.
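
The sliding window can be pictured with the following standalone sketch.  It
is a simplified model, not the decoder code itself: struct entry, struct ring
and the ring_*() helpers are hypothetical names, and RB_SZ stands in for the
configurable last_branch_sz.  New branches are written backwards through the
array so that a copy starting at the current position yields the newest
branch first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RB_SZ 4	/* stand-in for synth_opts.last_branch_sz */

struct entry {			/* simplified stand-in for struct branch_entry */
	unsigned long from, to;
};

struct ring {
	size_t nr;		/* number of valid entries, at most RB_SZ */
	size_t pos;		/* index of the most recent entry */
	struct entry entries[RB_SZ];
};

/* Record a branch: step backwards, wrapping from index 0 to the end. */
static void ring_update(struct ring *rb, unsigned long from, unsigned long to)
{
	if (!rb->pos)
		rb->pos = RB_SZ;
	rb->pos -= 1;
	rb->entries[rb->pos].from = from;
	rb->entries[rb->pos].to = to;
	if (rb->nr < RB_SZ)
		rb->nr += 1;
}

/*
 * Copy the window into dst[] (room for RB_SZ entries), newest first.
 * Returns the number of valid entries.
 */
static size_t ring_copy(const struct ring *rb, struct entry *dst)
{
	size_t tail;

	assert(rb && dst);
	if (!rb->nr)
		return 0;

	tail = RB_SZ - rb->pos;	/* entries from pos to the end of the array */
	memcpy(dst, &rb->entries[rb->pos], sizeof(*dst) * tail);
	if (rb->nr >= RB_SZ)	/* wrapped: the older entries sit at the front */
		memcpy(dst + tail, rb->entries, sizeof(*dst) * rb->pos);
	return rb->nr;
}

/* Flush the window after a sample so consecutive samples do not overlap. */
static void ring_reset(struct ring *rb)
{
	rb->pos = 0;
	rb->nr = 0;
}
```

Writing backwards means the linearized copy needs at most two memcpy() calls
and no per-entry reordering, at the cost of keeping a second buffer to copy
into.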

This is useful for:

- Reporting accurate basic block edge frequencies through the perf
report branch view
- Using with --branch-history to get the wider context of samples
- Other users of LBRs

The documentation is also updated.

Examples:

	Record with Intel PT:

		perf record -e intel_pt//u ls

	Branch stacks are used by default if synthesized so:

		perf report --itrace=ile

	is the same as:

		perf report --itrace=ile -b

	Branch history can be requested also:

		perf report --itrace=igle --branch-history

Based-on-patch-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/intel-pt.txt |  10 +++
 tools/perf/util/intel-pt.c            | 115 ++++++++++++++++++++++++++++++++++
 2 files changed, 125 insertions(+)

diff --git a/tools/perf/Documentation/intel-pt.txt b/tools/perf/Documentation/intel-pt.txt
index 4a0501d7a3b4..f4d8e706619c 100644
--- a/tools/perf/Documentation/intel-pt.txt
+++ b/tools/perf/Documentation/intel-pt.txt
@@ -686,6 +686,7 @@ The letters are:
 	e	synthesize tracing error events
 	d	create a debug log
 	g	synthesize a call chain (use with i or x)
+	l	synthesize last branch entries (use with i or x)
 
 "Instructions" events look like they were recorded by "perf record -e
 instructions".
@@ -728,6 +729,15 @@ transactions events can be specified. e.g.
 	--itrace=ig32
 	--itrace=xg32
 
+Also the number of last branch entries (default 64, max. 1024) for instructions or
+transactions events can be specified. e.g.
+
+       --itrace=il10
+       --itrace=xl10
+
+Note that last branch entries are cleared for each sample, so there is no overlap
+from one sample to the next.
+
 To disable trace decoding entirely, use the option --no-itrace.
 
 
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 2c01e723826a..05e8fcc5188b 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -22,6 +22,7 @@
 #include "../perf.h"
 #include "session.h"
 #include "machine.h"
+#include "sort.h"
 #include "tool.h"
 #include "event.h"
 #include "evlist.h"
@@ -115,6 +116,9 @@ struct intel_pt_queue {
 	void *decoder;
 	const struct intel_pt_state *state;
 	struct ip_callchain *chain;
+	struct branch_stack *last_branch;
+	struct branch_stack *last_branch_rb;
+	size_t last_branch_pos;
 	union perf_event *event_buf;
 	bool on_heap;
 	bool stop;
@@ -675,6 +679,19 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
 			goto out_free;
 	}
 
+	if (pt->synth_opts.last_branch) {
+		size_t sz = sizeof(struct branch_stack);
+
+		sz += pt->synth_opts.last_branch_sz *
+		      sizeof(struct branch_entry);
+		ptq->last_branch = zalloc(sz);
+		if (!ptq->last_branch)
+			goto out_free;
+		ptq->last_branch_rb = zalloc(sz);
+		if (!ptq->last_branch_rb)
+			goto out_free;
+	}
+
 	ptq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
 	if (!ptq->event_buf)
 		goto out_free;
@@ -732,6 +749,8 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
 
 out_free:
 	zfree(&ptq->event_buf);
+	zfree(&ptq->last_branch);
+	zfree(&ptq->last_branch_rb);
 	zfree(&ptq->chain);
 	free(ptq);
 	return NULL;
@@ -746,6 +765,8 @@ static void intel_pt_free_queue(void *priv)
 	thread__zput(ptq->thread);
 	intel_pt_decoder_free(ptq->decoder);
 	zfree(&ptq->event_buf);
+	zfree(&ptq->last_branch);
+	zfree(&ptq->last_branch_rb);
 	zfree(&ptq->chain);
 	free(ptq);
 }
@@ -876,6 +897,57 @@ static int intel_pt_setup_queues(struct intel_pt *pt)
 	return 0;
 }
 
+static inline void intel_pt_copy_last_branch_rb(struct intel_pt_queue *ptq)
+{
+	struct branch_stack *bs_src = ptq->last_branch_rb;
+	struct branch_stack *bs_dst = ptq->last_branch;
+	size_t nr = 0;
+
+	bs_dst->nr = bs_src->nr;
+
+	if (!bs_src->nr)
+		return;
+
+	nr = ptq->pt->synth_opts.last_branch_sz - ptq->last_branch_pos;
+	memcpy(&bs_dst->entries[0],
+	       &bs_src->entries[ptq->last_branch_pos],
+	       sizeof(struct branch_entry) * nr);
+
+	if (bs_src->nr >= ptq->pt->synth_opts.last_branch_sz) {
+		memcpy(&bs_dst->entries[nr],
+		       &bs_src->entries[0],
+		       sizeof(struct branch_entry) * ptq->last_branch_pos);
+	}
+}
+
+static inline void intel_pt_reset_last_branch_rb(struct intel_pt_queue *ptq)
+{
+	ptq->last_branch_pos = 0;
+	ptq->last_branch_rb->nr = 0;
+}
+
+static void intel_pt_update_last_branch_rb(struct intel_pt_queue *ptq)
+{
+	const struct intel_pt_state *state = ptq->state;
+	struct branch_stack *bs = ptq->last_branch_rb;
+	struct branch_entry *be;
+
+	if (!ptq->last_branch_pos)
+		ptq->last_branch_pos = ptq->pt->synth_opts.last_branch_sz;
+
+	ptq->last_branch_pos -= 1;
+
+	be              = &bs->entries[ptq->last_branch_pos];
+	be->from        = state->from_ip;
+	be->to          = state->to_ip;
+	be->flags.abort = !!(state->flags & INTEL_PT_ABORT_TX);
+	be->flags.in_tx = !!(state->flags & INTEL_PT_IN_TX);
+	/* No support for mispredict */
+
+	if (bs->nr < ptq->pt->synth_opts.last_branch_sz)
+		bs->nr += 1;
+}
+
 static int intel_pt_inject_event(union perf_event *event,
 				 struct perf_sample *sample, u64 type,
 				 bool swapped)
@@ -890,6 +962,10 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
 	struct intel_pt *pt = ptq->pt;
 	union perf_event *event = ptq->event_buf;
 	struct perf_sample sample = { .ip = 0, };
+	struct dummy_branch_stack {
+		u64			nr;
+		struct branch_entry	entries;
+	} dummy_bs;
 
 	if (pt->branches_filter && !(pt->branches_filter & ptq->flags))
 		return 0;
@@ -912,6 +988,21 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
 	sample.flags = ptq->flags;
 	sample.insn_len = ptq->insn_len;
 
+	/*
+	 * perf report cannot handle events without a branch stack when using
+	 * SORT_MODE__BRANCH so make a dummy one.
+	 */
+	if (pt->synth_opts.last_branch && sort__mode == SORT_MODE__BRANCH) {
+		dummy_bs = (struct dummy_branch_stack){
+			.nr = 1,
+			.entries = {
+				.from = sample.ip,
+				.to = sample.addr,
+			},
+		};
+		sample.branch_stack = (struct branch_stack *)&dummy_bs;
+	}
+
 	if (pt->synth_opts.inject) {
 		ret = intel_pt_inject_event(event, &sample,
 					    pt->branches_sample_type,
@@ -961,6 +1052,11 @@ static int intel_pt_synth_instruction_sample(struct intel_pt_queue *ptq)
 		sample.callchain = ptq->chain;
 	}
 
+	if (pt->synth_opts.last_branch) {
+		intel_pt_copy_last_branch_rb(ptq);
+		sample.branch_stack = ptq->last_branch;
+	}
+
 	if (pt->synth_opts.inject) {
 		ret = intel_pt_inject_event(event, &sample,
 					    pt->instructions_sample_type,
@@ -974,6 +1070,9 @@ static int intel_pt_synth_instruction_sample(struct intel_pt_queue *ptq)
 		pr_err("Intel Processor Trace: failed to deliver instruction event, error %d\n",
 		       ret);
 
+	if (pt->synth_opts.last_branch)
+		intel_pt_reset_last_branch_rb(ptq);
+
 	return ret;
 }
 
@@ -1008,6 +1107,11 @@ static int intel_pt_synth_transaction_sample(struct intel_pt_queue *ptq)
 		sample.callchain = ptq->chain;
 	}
 
+	if (pt->synth_opts.last_branch) {
+		intel_pt_copy_last_branch_rb(ptq);
+		sample.branch_stack = ptq->last_branch;
+	}
+
 	if (pt->synth_opts.inject) {
 		ret = intel_pt_inject_event(event, &sample,
 					    pt->transactions_sample_type,
@@ -1021,6 +1125,9 @@ static int intel_pt_synth_transaction_sample(struct intel_pt_queue *ptq)
 		pr_err("Intel Processor Trace: failed to deliver transaction event, error %d\n",
 		       ret);
 
+	if (pt->synth_opts.last_branch)
+		intel_pt_reset_last_branch_rb(ptq);
+
 	return ret;
 }
 
@@ -1116,6 +1223,9 @@ static int intel_pt_sample(struct intel_pt_queue *ptq)
 			return err;
 	}
 
+	if (pt->synth_opts.last_branch)
+		intel_pt_update_last_branch_rb(ptq);
+
 	if (!pt->sync_switch)
 		return 0;
 
@@ -1763,6 +1873,8 @@ static int intel_pt_synth_events(struct intel_pt *pt,
 		pt->instructions_sample_period = attr.sample_period;
 		if (pt->synth_opts.callchain)
 			attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
+		if (pt->synth_opts.last_branch)
+			attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
 		pr_debug("Synthesizing 'instructions' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
 			 id, (u64)attr.sample_type);
 		err = intel_pt_synth_event(session, &attr, id);
@@ -1782,6 +1894,8 @@ static int intel_pt_synth_events(struct intel_pt *pt,
 		attr.sample_period = 1;
 		if (pt->synth_opts.callchain)
 			attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
+		if (pt->synth_opts.last_branch)
+			attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
 		pr_debug("Synthesizing 'transactions' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
 			 id, (u64)attr.sample_type);
 		err = intel_pt_synth_event(session, &attr, id);
@@ -1808,6 +1922,7 @@ static int intel_pt_synth_events(struct intel_pt *pt,
 		attr.sample_period = 1;
 		attr.sample_type |= PERF_SAMPLE_ADDR;
 		attr.sample_type &= ~(u64)PERF_SAMPLE_CALLCHAIN;
+		attr.sample_type &= ~(u64)PERF_SAMPLE_BRANCH_STACK;
 		pr_debug("Synthesizing 'branches' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
 			 id, (u64)attr.sample_type);
 		err = intel_pt_synth_event(session, &attr, id);
-- 
1.9.1



* [PATCH 15/25] perf report: Make max_stack value allow for synthesized callchains
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (13 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 14/25] perf intel-pt: Support generating branch stack Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-28 20:03   ` Arnaldo Carvalho de Melo
  2015-09-29  8:46   ` [tip:perf/core] perf report: Make max_stack value allow for " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 16/25] perf hists: Allow for max_stack greater than PERF_MAX_STACK_DEPTH Adrian Hunter
                   ` (10 subsequent siblings)
  25 siblings, 2 replies; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

perf report has an option (--max-stack) to set the maximum stack depth
when processing callchains.  The option defaults to the hard-coded
maximum definition PERF_MAX_STACK_DEPTH which is 127.  The intention of
the option is to allow the user to reduce the processing time by
reducing the amount of the callchain that is processed.

It is also possible, when processing instruction traces, to synthesize
callchains.  Synthesized callchains do not have the kernel size
limitation and are whatever size the user requests, although validation
presently prevents the user from requesting a value greater than 1024.  The
default value is 16.

To allow for synthesized callchains, make the max_stack value at least
the same size as the synthesized callchain size.
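
The adjustment amounts to taking the larger of the two values whenever
callchains are being synthesized.  A minimal sketch, where
effective_max_stack() is a hypothetical helper written for this example
rather than a function in perf:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return the stack depth to use: the requested max_stack, raised to the
 * synthesized callchain size when callchains are being synthesized.
 */
static int effective_max_stack(int max_stack, bool synth_callchain,
			       unsigned int callchain_sz)
{
	assert(max_stack >= 0);

	if (synth_callchain && (int)callchain_sz > max_stack)
		return (int)callchain_sz;
	return max_stack;
}
```

One consequence, assuming the sketch matches the change below, is that a
--max-stack value smaller than the synthesized callchain size is also raised
to that size.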

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-report.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index e94e5c7155af..37c9f5125887 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -809,6 +809,10 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (report.inverted_callchain)
 		callchain_param.order = ORDER_CALLER;
 
+	if (itrace_synth_opts.callchain &&
+	    (int)itrace_synth_opts.callchain_sz > report.max_stack)
+		report.max_stack = itrace_synth_opts.callchain_sz;
+
 	if (!input_name || !strlen(input_name)) {
 		if (!fstat(STDIN_FILENO, &st) && S_ISFIFO(st.st_mode))
 			input_name = "-";
-- 
1.9.1



* [PATCH 16/25] perf hists: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (14 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 15/25] perf report: Make max_stack value allow for synthesized callchains Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-29  8:46   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 17/25] perf callchain: " Adrian Hunter
                   ` (9 subsequent siblings)
  25 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

Use the max_stack value instead of PERF_MAX_STACK_DEPTH so that
arbitrary-sized callchains can be supported.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/hist.c | 6 ++++--
 tools/perf/util/hist.h | 1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index b3567a25f0c4..0cad9e07c5b4 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -695,7 +695,7 @@ iter_finish_normal_entry(struct hist_entry_iter *iter,
 }
 
 static int
-iter_prepare_cumulative_entry(struct hist_entry_iter *iter __maybe_unused,
+iter_prepare_cumulative_entry(struct hist_entry_iter *iter,
 			      struct addr_location *al __maybe_unused)
 {
 	struct hist_entry **he_cache;
@@ -707,7 +707,7 @@ iter_prepare_cumulative_entry(struct hist_entry_iter *iter __maybe_unused,
 	 * cumulated only one time to prevent entries more than 100%
 	 * overhead.
 	 */
-	he_cache = malloc(sizeof(*he_cache) * (PERF_MAX_STACK_DEPTH + 1));
+	he_cache = malloc(sizeof(*he_cache) * (iter->max_stack + 1));
 	if (he_cache == NULL)
 		return -ENOMEM;
 
@@ -868,6 +868,8 @@ int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
 	if (err)
 		return err;
 
+	iter->max_stack = max_stack_depth;
+
 	err = iter->ops->prepare_entry(iter, al);
 	if (err)
 		goto out;
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 4d6aa1dbdaee..8c20a8f6b214 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -90,6 +90,7 @@ struct hist_entry_iter {
 	int curr;
 
 	bool hide_unresolved;
+	int max_stack;
 
 	struct perf_evsel *evsel;
 	struct perf_sample *sample;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 17/25] perf callchain: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (15 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 16/25] perf hists: Allow for max_stack greater than PERF_MAX_STACK_DEPTH Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-28 20:08   ` Arnaldo Carvalho de Melo
  2015-10-03  7:49   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 18/25] perf script: Add a setting for maximum stack depth Adrian Hunter
                   ` (8 subsequent siblings)
  25 siblings, 2 replies; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

Adjust the validation to allow for max_stack greater than
PERF_MAX_STACK_DEPTH.
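
The adjusted check can be read as "reject only when the callchain is longer
than the larger of the two limits".  A minimal sketch of that reading follows.
sketch_callchain_too_long() and SKETCH_MAX_STACK_DEPTH are hypothetical names,
and the comparison is done in 64 bits rather than with the (int) cast used in
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_STACK_DEPTH 127	/* stand-in for PERF_MAX_STACK_DEPTH */

/*
 * A callchain is treated as corrupted only when it is longer than both the
 * fixed limit and the requested max_stack, i.e. longer than the larger of
 * the two.
 */
bool sketch_callchain_too_long(uint64_t nr, int max_stack)
{
	assert(max_stack >= 0);

	return nr > SKETCH_MAX_STACK_DEPTH && nr > (uint64_t)max_stack;
}
```

With max_stack at 16 the old limit of 127 still applies; with max_stack at
1024 a chain of up to 1024 entries is accepted.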

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/machine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index fd1efeafb343..d7bd9a304535 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1831,7 +1831,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 	}
 
 check_calls:
-	if (chain->nr > PERF_MAX_STACK_DEPTH) {
+	if (chain->nr > PERF_MAX_STACK_DEPTH && (int)chain->nr > max_stack) {
 		pr_warning("corrupted callchain. skipping...\n");
 		return 0;
 	}
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 18/25] perf script: Add a setting for maximum stack depth
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (16 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 17/25] perf callchain: " Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-29  8:46   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 19/25] perf scripting python: Allow for max_stack greater than PERF_MAX_STACK_DEPTH Adrian Hunter
                   ` (7 subsequent siblings)
  25 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

Add a setting for maximum stack depth in preparation for
allowing for synthesized callchains.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-script.c | 6 ++++--
 tools/perf/util/session.c   | 2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 092843968791..a65b498df097 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -33,6 +33,8 @@ static bool			nanosecs;
 static const char		*cpu_list;
 static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
 
+static unsigned int scripting_max_stack = PERF_MAX_STACK_DEPTH;
+
 enum perf_output_field {
 	PERF_OUTPUT_COMM            = 1U << 0,
 	PERF_OUTPUT_TID             = 1U << 1,
@@ -475,7 +477,7 @@ static void print_sample_bts(union perf_event *event,
 			}
 		}
 		perf_evsel__print_ip(evsel, sample, al, print_opts,
-				     PERF_MAX_STACK_DEPTH);
+				     scripting_max_stack);
 	}
 
 	/* print branch_to information */
@@ -552,7 +554,7 @@ static void process_event(union perf_event *event, struct perf_sample *sample,
 
 		perf_evsel__print_ip(evsel, sample, al,
 				     output[attr->type].print_ip_opts,
-				     PERF_MAX_STACK_DEPTH);
+				     scripting_max_stack);
 	}
 
 	if (PRINT_FIELD(IREGS))
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 15c84cad213a..84a02eae4394 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1800,7 +1800,7 @@ void perf_evsel__print_ip(struct perf_evsel *evsel, struct perf_sample *sample,
 
 		if (thread__resolve_callchain(al->thread, evsel,
 					      sample, NULL, NULL,
-					      PERF_MAX_STACK_DEPTH) != 0) {
+					      stack_depth) != 0) {
 			if (verbose)
 				error("Failed to resolve callchain. Skipping\n");
 			return;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 19/25] perf scripting python: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (17 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 18/25] perf script: Add a setting for maximum stack depth Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-29  8:47   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 20/25] perf script: Make scripting_max_stack value allow for synthesized callchains Adrian Hunter
                   ` (6 subsequent siblings)
  25 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

Use the scripting_max_stack value to allow for values greater than
PERF_MAX_STACK_DEPTH.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-script.c                            | 2 +-
 tools/perf/util/scripting-engines/trace-event-python.c | 2 +-
 tools/perf/util/trace-event.h                          | 2 ++
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index a65b498df097..5c3c02d5af53 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -33,7 +33,7 @@ static bool			nanosecs;
 static const char		*cpu_list;
 static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
 
-static unsigned int scripting_max_stack = PERF_MAX_STACK_DEPTH;
+unsigned int scripting_max_stack = PERF_MAX_STACK_DEPTH;
 
 enum perf_output_field {
 	PERF_OUTPUT_COMM            = 1U << 0,
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index aa9e1257c1ee..a8e825fca42a 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -319,7 +319,7 @@ static PyObject *python_process_callchain(struct perf_sample *sample,
 
 	if (thread__resolve_callchain(al->thread, evsel,
 				      sample, NULL, NULL,
-				      PERF_MAX_STACK_DEPTH) != 0) {
+				      scripting_max_stack) != 0) {
 		pr_err("Failed to resolve callchain. Skipping\n");
 		goto exit;
 	}
diff --git a/tools/perf/util/trace-event.h b/tools/perf/util/trace-event.h
index da6cc4cc2a4f..b85ee55cca0c 100644
--- a/tools/perf/util/trace-event.h
+++ b/tools/perf/util/trace-event.h
@@ -78,6 +78,8 @@ struct scripting_ops {
 	int (*generate_script) (struct pevent *pevent, const char *outfile);
 };
 
+extern unsigned int scripting_max_stack;
+
 int script_spec_register(const char *spec, struct scripting_ops *ops);
 
 void setup_perl_scripting(void);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 20/25] perf script: Make scripting_max_stack value allow for synthesized callchains
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (18 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 19/25] perf scripting python: Allow for max_stack greater than PERF_MAX_STACK_DEPTH Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-29  8:47   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 21/25] perf tools: Add perf_evlist__id2evsel_strict() Adrian Hunter
                   ` (5 subsequent siblings)
  25 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

perf script has a setting to set the maximum stack depth when processing
callchains.  The setting defaults to the hard-coded maximum definition
PERF_MAX_STACK_DEPTH which is 127.

It is possible, when processing instruction traces, to synthesize
callchains.  Synthesized callchains do not have the kernel size
limitation and are whatever size the user requests, although validation
presently prevents the user from requesting a value greater than 1024.  The
default value is 16.

To allow for synthesized callchains, make the scripting_max_stack value
at least the same size as the synthesized callchain size.
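
The adjustment only ever raises the limit.  A small sketch of that rule,
assuming a hypothetical sketch_synth_opts structure that carries just the two
fields the check needs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the instruction-trace synthesis options. */
struct sketch_synth_opts {
	bool callchain;			/* are callchains being synthesized? */
	unsigned int callchain_sz;	/* requested synthesized callchain size */
};

/* Raise the limit to the synthesized callchain size; never lower it. */
unsigned int sketch_adjust_max_stack(unsigned int max_stack,
				     const struct sketch_synth_opts *opts)
{
	assert(opts != NULL);

	if (opts->callchain && opts->callchain_sz > max_stack)
		return opts->callchain_sz;

	return max_stack;
}
```

Starting from the default of 127, a requested size of 1024 yields 1024, while
the default synthesized size of 16 leaves the limit at 127.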

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-script.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 5c3c02d5af53..8ce1c6bbfa45 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1748,6 +1748,10 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 		}
 	}
 
+	if (itrace_synth_opts.callchain &&
+	    itrace_synth_opts.callchain_sz > scripting_max_stack)
+		scripting_max_stack = itrace_synth_opts.callchain_sz;
+
 	/* make sure PERF_EXEC_PATH is set for scripts */
 	perf_set_argv_exec_path(perf_exec_path());
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 21/25] perf tools: Add perf_evlist__id2evsel_strict()
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (19 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 20/25] perf script: Make scripting_max_stack value allow for synthesized callchains Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-29  8:47   ` [tip:perf/core] perf evlist: " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 22/25] perf tools: Add perf_evlist__del() Adrian Hunter
                   ` (4 subsequent siblings)
  25 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

perf_evlist__id2evsel_strict() is the same as perf_evlist__id2evsel()
except that it ensures that the id must match.

This will be used by perf inject to find a specific evsel that is to
be deleted, hence the need to match exactly.
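
To show why an exact match matters when the result is about to be deleted,
here is a toy comparison of a strict and a lenient lookup.  All names are
hypothetical, a linear search stands in for the real id lookup, and the
lenient fallback to the first entry is an assumption made for contrast only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for an evsel and the evlist that holds it. */
struct sketch_evsel {
	uint64_t id;
	const char *name;
};

struct sketch_evlist {
	struct sketch_evsel *entries;
	size_t nr_entries;
};

/* Strict lookup: NULL unless a non-zero id matches an entry exactly. */
struct sketch_evsel *sketch_id2evsel_strict(struct sketch_evlist *evlist,
					    uint64_t id)
{
	size_t i;

	assert(evlist != NULL);

	if (!id)
		return NULL;

	for (i = 0; i < evlist->nr_entries; i++) {
		if (evlist->entries[i].id == id)
			return &evlist->entries[i];
	}

	return NULL;
}

/*
 * Lenient lookup (assumed behaviour, for contrast only): fall back to the
 * first entry when the id is zero or does not match.
 */
struct sketch_evsel *sketch_id2evsel_lenient(struct sketch_evlist *evlist,
					     uint64_t id)
{
	struct sketch_evsel *evsel = sketch_id2evsel_strict(evlist, id);

	if (!evsel && evlist->nr_entries)
		evsel = &evlist->entries[0];

	return evsel;
}
```

A caller that deletes whatever the lenient version returns could remove the
wrong entry when the id is zero or unknown; the strict version returns NULL
in those cases.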

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/evlist.c | 15 +++++++++++++++
 tools/perf/util/evlist.h |  2 ++
 2 files changed, 17 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index a8643735dcea..e6760380d731 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -617,6 +617,21 @@ struct perf_evsel *perf_evlist__id2evsel(struct perf_evlist *evlist, u64 id)
 	return NULL;
 }
 
+struct perf_evsel *perf_evlist__id2evsel_strict(struct perf_evlist *evlist,
+						u64 id)
+{
+	struct perf_sample_id *sid;
+
+	if (!id)
+		return NULL;
+
+	sid = perf_evlist__id2sid(evlist, id);
+	if (sid)
+		return sid->evsel;
+
+	return NULL;
+}
+
 static int perf_evlist__event2id(struct perf_evlist *evlist,
 				 union perf_event *event, u64 *id)
 {
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 115d8b53c601..0edf0d4f4efa 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -104,6 +104,8 @@ int perf_evlist__filter_pollfd(struct perf_evlist *evlist, short revents_and_mas
 int perf_evlist__poll(struct perf_evlist *evlist, int timeout);
 
 struct perf_evsel *perf_evlist__id2evsel(struct perf_evlist *evlist, u64 id);
+struct perf_evsel *perf_evlist__id2evsel_strict(struct perf_evlist *evlist,
+						u64 id);
 
 struct perf_sample_id *perf_evlist__id2sid(struct perf_evlist *evlist, u64 id);
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 22/25] perf tools: Add perf_evlist__del()
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (20 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 21/25] perf tools: Add perf_evlist__id2evsel_strict() Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-28 13:33   ` Arnaldo Carvalho de Melo
  2015-09-29  8:48   ` [tip:perf/core] perf evlist: Add perf_evlist__remove() tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 23/25] perf inject: Remove more aux-related stuff when processing instruction traces Adrian Hunter
                   ` (3 subsequent siblings)
  25 siblings, 2 replies; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

Add a counterpart to perf_evlist__add() that does the opposite
and deletes the evsel.

This will be used by perf inject to remove unwanted evsels.
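
The list bookkeeping involved can be sketched with a minimal circular
doubly-linked list.  This is a simplified stand-in, not the kernel list API:
sketch_node, sketch_list and the three functions are hypothetical, and unlike
the patch the sketch leaves freeing the removed entry to the caller.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_node {
	struct sketch_node *prev, *next;
};

struct sketch_list {
	struct sketch_node head;	/* circular sentinel */
	int nr_entries;
};

void sketch_list_init(struct sketch_list *list)
{
	list->head.prev = list->head.next = &list->head;
	list->nr_entries = 0;
}

/* Append a node and bump the entry count. */
void sketch_list_add_tail(struct sketch_list *list, struct sketch_node *node)
{
	node->prev = list->head.prev;
	node->next = &list->head;
	list->head.prev->next = node;
	list->head.prev = node;
	list->nr_entries += 1;
}

/* Opposite of add: unlink the node, reinitialize it, drop the entry count. */
void sketch_list_remove(struct sketch_list *list, struct sketch_node *node)
{
	assert(list->nr_entries > 0);

	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->prev = node->next = node;	/* node now points at itself */
	list->nr_entries -= 1;
}
```

Keeping the count in step with the links is the important part: anything that
later iterates the list or sizes output from nr_entries sees a consistent view.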

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/evlist.c | 8 ++++++++
 tools/perf/util/evlist.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index e6760380d731..0bb15e6d12a0 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -165,6 +165,14 @@ void perf_evlist__add(struct perf_evlist *evlist, struct perf_evsel *entry)
 	__perf_evlist__propagate_maps(evlist, entry);
 }
 
+void perf_evlist__del(struct perf_evlist *evlist, struct perf_evsel *evsel)
+{
+	evsel->evlist = NULL;
+	list_del_init(&evsel->node);
+	evlist->nr_entries -= 1;
+	perf_evsel__delete(evsel);
+}
+
 void perf_evlist__splice_list_tail(struct perf_evlist *evlist,
 				   struct list_head *list)
 {
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 0edf0d4f4efa..7fab57d85fa1 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -73,6 +73,7 @@ void perf_evlist__exit(struct perf_evlist *evlist);
 void perf_evlist__delete(struct perf_evlist *evlist);
 
 void perf_evlist__add(struct perf_evlist *evlist, struct perf_evsel *entry);
+void perf_evlist__del(struct perf_evlist *evlist, struct perf_evsel *evsel);
 int perf_evlist__add_default(struct perf_evlist *evlist);
 int __perf_evlist__add_default_attrs(struct perf_evlist *evlist,
 				     struct perf_event_attr *attrs, size_t nr_attrs);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 23/25] perf inject: Remove more aux-related stuff when processing instruction traces
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (21 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 22/25] perf tools: Add perf_evlist__del() Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-29  8:48   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 24/25] perf inject: Add --strip option to strip out non-synthesized events Adrian Hunter
                   ` (2 subsequent siblings)
  25 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

perf inject can process instruction traces (using the --itrace option)
which removes aux-related events and replaces them with the requested
synthesized events.

However there are still some leftovers, namely PERF_RECORD_ITRACE_START
events and the original evsel (selected event) e.g. intel_pt//

For the sake of completeness, remove them too.
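
The handler added below both drops the event and notes which evsel it belonged
to.  A minimal sketch of that "remember the first non-zero id" pattern, using
hypothetical names (sketch_inject, sketch_drop_aux) and a plain id argument in
place of the sample structure:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the part of struct perf_inject used here. */
struct sketch_inject {
	uint64_t aux_id;	/* id of the evsel to delete later; 0 = not seen yet */
};

/*
 * Drop the event (nothing is written out) but remember the first non-zero
 * sample id that was seen, so the matching evsel can be removed afterwards.
 */
int sketch_drop_aux(struct sketch_inject *inject, uint64_t sample_id)
{
	assert(inject != NULL);

	if (!inject->aux_id)
		inject->aux_id = sample_id;

	return 0;
}
```

Because zero doubles as "not seen yet", a later strict lookup with aux_id
still at zero has to return nothing rather than some default entry.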

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-inject.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 8638fad8a085..ecd69fae587e 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -31,6 +31,7 @@ struct perf_inject {
 	const char		*input_name;
 	struct perf_data_file	output;
 	u64			bytes_written;
+	u64			aux_id;
 	struct list_head	samples;
 	struct itrace_synth_opts itrace_synth_opts;
 };
@@ -176,6 +177,19 @@ static int perf_event__repipe(struct perf_tool *tool,
 	return perf_event__repipe_synth(tool, event);
 }
 
+static int perf_event__drop_aux(struct perf_tool *tool,
+				union perf_event *event __maybe_unused,
+				struct perf_sample *sample,
+				struct machine *machine __maybe_unused)
+{
+	struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
+
+	if (!inject->aux_id)
+		inject->aux_id = sample->id;
+
+	return 0;
+}
+
 typedef int (*inject_handler)(struct perf_tool *tool,
 			      union perf_event *event,
 			      struct perf_sample *sample,
@@ -512,6 +526,8 @@ static int __cmd_inject(struct perf_inject *inject)
 		inject->tool.id_index	    = perf_event__repipe_id_index;
 		inject->tool.auxtrace_info  = perf_event__process_auxtrace_info;
 		inject->tool.auxtrace	    = perf_event__process_auxtrace;
+		inject->tool.aux	    = perf_event__drop_aux;
+		inject->tool.itrace_start   = perf_event__drop_aux;
 		inject->tool.ordered_events = true;
 		inject->tool.ordering_requires_timestamps = true;
 		/* Allow space in the header for new attributes */
@@ -535,14 +551,24 @@ static int __cmd_inject(struct perf_inject *inject)
 		}
 		/*
 		 * The AUX areas have been removed and replaced with
-		 * synthesized hardware events, so clear the feature flag.
+		 * synthesized hardware events, so clear the feature flag and
+		 * remove the evsel.
 		 */
 		if (inject->itrace_synth_opts.set) {
+			struct perf_evsel *evsel;
+
 			perf_header__clear_feat(&session->header,
 						HEADER_AUXTRACE);
 			if (inject->itrace_synth_opts.last_branch)
 				perf_header__set_feat(&session->header,
 						      HEADER_BRANCH_STACK);
+			evsel = perf_evlist__id2evsel_strict(session->evlist,
+							     inject->aux_id);
+			if (evsel) {
+				pr_debug("Deleting %s\n",
+					 perf_evsel__name(evsel));
+				perf_evlist__del(session->evlist, evsel);
+			}
 		}
 		session->header.data_offset = output_data_offset;
 		session->header.data_size = inject->bytes_written;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 24/25] perf inject: Add --strip option to strip out non-synthesized events
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (22 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 23/25] perf inject: Remove more aux-related stuff when processing instruction traces Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-29  8:49   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-09-25 13:15 ` [PATCH 25/25] perf intel-pt: Add mispred-all config option to aid use with autofdo Adrian Hunter
  2015-09-28 20:33 ` [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Arnaldo Carvalho de Melo
  25 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

Add a new option --strip which is used with --itrace to strip out
non-synthesized events.  This results in a perf.data file that is
simpler for external tools to parse.  In particular, this can be used
to prepare a perf.data file for consumption by autofdo.

A subsequent patch also changes Intel PT to enable use with autofdo, and
gives an example of that use.
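
The rule for when a stripped evsel may actually be removed is the subtle part
of this patch.  The sketch below restates it on toy data: an evsel without
tracking events can always go, and one with tracking events can go only if
exactly one evsel is kept and its sample type is compatible under the mask.
The structure, the SK_* bit values and the function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample-type bits; the real values live in perf_event.h. */
#define SK_SAMPLE_TID	(1U << 0)
#define SK_SAMPLE_TIME	(1U << 1)
#define SK_SAMPLE_ID	(1U << 2)
#define SK_SAMPLE_CPU	(1U << 3)
#define SK_SAMPLE_ADDR	(1U << 4)	/* deliberately outside the mask */

#define SK_COMPAT_MASK	(SK_SAMPLE_TID | SK_SAMPLE_TIME | \
			 SK_SAMPLE_ID | SK_SAMPLE_CPU)

struct strip_evsel {
	uint32_t sample_type;
	bool tracking;	/* carries MMAP/COMM/task style tracking events */
	bool dropped;	/* its samples are being stripped */
};

/* May 'victim' be removed without leaving tracking events unparsable? */
bool sketch_ok_to_remove(const struct strip_evsel *evsels, size_t nr,
			 const struct strip_evsel *victim)
{
	size_t i, kept = 0;
	bool compatible = false;

	assert(victim != NULL);

	if (!victim->tracking)
		return true;

	for (i = 0; i < nr; i++) {
		if (evsels[i].dropped)
			continue;
		kept++;
		if ((evsels[i].sample_type & SK_COMPAT_MASK) ==
		    (victim->sample_type & SK_COMPAT_MASK))
			compatible = true;
	}

	return compatible && kept == 1;
}
```

Bits outside the mask, such as the address bit here, do not affect the
decision; only the fields needed to parse the tracking events have to agree.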

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-inject.txt |  3 ++
 tools/perf/builtin-inject.c              | 91 ++++++++++++++++++++++++++++++++
 2 files changed, 94 insertions(+)

diff --git a/tools/perf/Documentation/perf-inject.txt b/tools/perf/Documentation/perf-inject.txt
index 0c721c3e37e1..0b1cedeef895 100644
--- a/tools/perf/Documentation/perf-inject.txt
+++ b/tools/perf/Documentation/perf-inject.txt
@@ -50,6 +50,9 @@ OPTIONS
 
 include::itrace.txt[]
 
+--strip::
+	Use with --itrace to strip out non-synthesized events.
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-archive[1]
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index ecd69fae587e..7cd64b7f3118 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -28,6 +28,7 @@ struct perf_inject {
 	bool			build_ids;
 	bool			sched_stat;
 	bool			have_auxtrace;
+	bool			strip;
 	const char		*input_name;
 	struct perf_data_file	output;
 	u64			bytes_written;
@@ -177,6 +178,14 @@ static int perf_event__repipe(struct perf_tool *tool,
 	return perf_event__repipe_synth(tool, event);
 }
 
+static int perf_event__drop(struct perf_tool *tool __maybe_unused,
+			    union perf_event *event __maybe_unused,
+			    struct perf_sample *sample __maybe_unused,
+			    struct machine *machine __maybe_unused)
+{
+	return 0;
+}
+
 static int perf_event__drop_aux(struct perf_tool *tool,
 				union perf_event *event __maybe_unused,
 				struct perf_sample *sample,
@@ -480,6 +489,77 @@ static int perf_evsel__check_stype(struct perf_evsel *evsel,
 	return 0;
 }
 
+static int drop_sample(struct perf_tool *tool __maybe_unused,
+		       union perf_event *event __maybe_unused,
+		       struct perf_sample *sample __maybe_unused,
+		       struct perf_evsel *evsel __maybe_unused,
+		       struct machine *machine __maybe_unused)
+{
+	return 0;
+}
+
+static void strip_init(struct perf_inject *inject)
+{
+	struct perf_evlist *evlist = inject->session->evlist;
+	struct perf_evsel *evsel;
+
+	inject->tool.context_switch = perf_event__drop;
+
+	evlist__for_each(evlist, evsel)
+		evsel->handler = drop_sample;
+}
+
+static bool has_tracking(struct perf_evsel *evsel)
+{
+	return evsel->attr.mmap || evsel->attr.mmap2 || evsel->attr.comm ||
+	       evsel->attr.task;
+}
+
+#define COMPAT_MASK (PERF_SAMPLE_ID | PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \
+		     PERF_SAMPLE_ID | PERF_SAMPLE_CPU | PERF_SAMPLE_IDENTIFIER)
+
+/*
+ * In order that the perf.data file is parsable, tracking events like MMAP need
+ * their selected event to exist, except if there is only 1 selected event left
+ * and it has a compatible sample type.
+ */
+static bool ok_to_remove(struct perf_evlist *evlist,
+			 struct perf_evsel *evsel_to_remove)
+{
+	struct perf_evsel *evsel;
+	int cnt = 0;
+	bool ok = false;
+
+	if (!has_tracking(evsel_to_remove))
+		return true;
+
+	evlist__for_each(evlist, evsel) {
+		if (evsel->handler != drop_sample) {
+			cnt += 1;
+			if ((evsel->attr.sample_type & COMPAT_MASK) ==
+			    (evsel_to_remove->attr.sample_type & COMPAT_MASK))
+				ok = true;
+		}
+	}
+
+	return ok && cnt == 1;
+}
+
+static void strip_fini(struct perf_inject *inject)
+{
+	struct perf_evlist *evlist = inject->session->evlist;
+	struct perf_evsel *evsel, *tmp;
+
+	/* Remove non-synthesized evsels if possible */
+	evlist__for_each_safe(evlist, tmp, evsel) {
+		if (evsel->handler == drop_sample &&
+		    ok_to_remove(evlist, evsel)) {
+			pr_debug("Deleting %s\n", perf_evsel__name(evsel));
+			perf_evlist__del(evlist, evsel);
+		}
+	}
+}
+
 static int __cmd_inject(struct perf_inject *inject)
 {
 	int ret = -EINVAL;
@@ -532,6 +612,8 @@ static int __cmd_inject(struct perf_inject *inject)
 		inject->tool.ordering_requires_timestamps = true;
 		/* Allow space in the header for new attributes */
 		output_data_offset = 4096;
+		if (inject->strip)
+			strip_init(inject);
 	}
 
 	if (!inject->itrace_synth_opts.set)
@@ -569,6 +651,8 @@ static int __cmd_inject(struct perf_inject *inject)
 					 perf_evsel__name(evsel));
 				perf_evlist__del(session->evlist, evsel);
 			}
+			if (inject->strip)
+				strip_fini(inject);
 		}
 		session->header.data_offset = output_data_offset;
 		session->header.data_size = inject->bytes_written;
@@ -634,6 +718,8 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused)
 		OPT_CALLBACK_OPTARG(0, "itrace", &inject.itrace_synth_opts,
 				    NULL, "opts", "Instruction Tracing options",
 				    itrace_parse_synth_opts),
+		OPT_BOOLEAN(0, "strip", &inject.strip,
+			    "strip non-synthesized events (use with --itrace)"),
 		OPT_END()
 	};
 	const char * const inject_usage[] = {
@@ -649,6 +735,11 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (argc)
 		usage_with_options(inject_usage, options);
 
+	if (inject.strip && !inject.itrace_synth_opts.set) {
+		pr_err("--strip option requires --itrace option\n");
+		return -1;
+	}
+
 	if (perf_data_file__open(&inject.output)) {
 		perror("failed to create output file");
 		return -1;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 25/25] perf intel-pt: Add mispred-all config option to aid use with autofdo
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (23 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 24/25] perf inject: Add --strip option to strip out non-synthesized events Adrian Hunter
@ 2015-09-25 13:15 ` Adrian Hunter
  2015-09-29  8:49   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-09-28 20:33 ` [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Arnaldo Carvalho de Melo
  25 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-25 13:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

autofdo incorrectly expects the branch flags to have either mispred or
predicted set.  In fact mispred = predicted = 0 is valid and means that
the flags are not supported, which is the case for Intel PT.

To make autofdo work, add a config option which will cause Intel
PT decoder to set the mispred flag on all branches.
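
The mechanism is a config callback that recognizes one "section.key" name and
stores a boolean.  A self-contained sketch follows.  sketch_pt,
sketch_config_bool(), sketch_pt_config() and sketch_branch_mispred() are
hypothetical, and the rule that a key with no value counts as true is an
assumption modelled on the bare "mispred-all" line in the .perfconfig example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the decoder state that holds the option. */
struct sketch_pt {
	bool mispred_all;
};

/*
 * Hypothetical boolean parser: a key with no value counts as true (an
 * assumption), otherwise a few common spellings of "true" are accepted.
 */
bool sketch_config_bool(const char *value)
{
	if (!value)
		return true;

	return !strcmp(value, "true") || !strcmp(value, "yes") ||
	       !strcmp(value, "on") || !strcmp(value, "1");
}

/* Config callback: called once per "section.key" name with its value. */
int sketch_pt_config(const char *var, const char *value, void *data)
{
	struct sketch_pt *pt = data;

	assert(var != NULL && pt != NULL);

	if (!strcmp(var, "intel-pt.mispred-all"))
		pt->mispred_all = sketch_config_bool(value);

	return 0;
}

/* Value of the mispred flag reported on every synthesized branch entry. */
bool sketch_branch_mispred(const struct sketch_pt *pt)
{
	return pt->mispred_all;
}
```

Unrecognized keys are ignored and the callback still returns 0, so other
sections of the config file are unaffected.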

Below is an example of using Intel PT with autofdo.  The example is
also added to the Intel PT documentation.  It requires autofdo
(https://github.com/google/autofdo) and gcc version 5.  The bubble
sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial)
amended to take the number of elements as a parameter.

	$ gcc-5 -O3 sort.c -o sort_optimized
	$ ./sort_optimized 30000
	Bubble sorting array of 30000 elements
	2254 ms

	$ cat ~/.perfconfig
	[intel-pt]
		mispred-all

	$ perf record -e intel_pt//u ./sort 3000
	Bubble sorting array of 3000 elements
	58 ms
	[ perf record: Woken up 2 times to write data ]
	[ perf record: Captured and wrote 3.939 MB perf.data ]
	$ perf inject -i perf.data -o inj --itrace=i100usle --strip
	$ ./create_gcov --binary=./sort --profile=inj --gcov=sort.gcov -gcov_version=1
	$ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
	$ ./sort_autofdo 30000
	Bubble sorting array of 30000 elements
	2155 ms

Note there is currently no advantage to using Intel PT instead of LBR, but
that may change in the future if greater use is made of the data.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/intel-pt.txt | 29 +++++++++++++++++++++++++++++
 tools/perf/util/intel-pt.c            | 14 ++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/tools/perf/Documentation/intel-pt.txt b/tools/perf/Documentation/intel-pt.txt
index f4d8e706619c..7a2a3f3d9818 100644
--- a/tools/perf/Documentation/intel-pt.txt
+++ b/tools/perf/Documentation/intel-pt.txt
@@ -774,3 +774,32 @@ perf inject also accepts the --itrace option in which case tracing data is
 removed and replaced with the synthesized events. e.g.
 
 	perf inject --itrace -i perf.data -o perf.data.new
+
+Below is an example of using Intel PT with autofdo.  It requires autofdo
+(https://github.com/google/autofdo) and gcc version 5.  The bubble
+sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial)
+amended to take the number of elements as a parameter.
+
+	$ gcc-5 -O3 sort.c -o sort_optimized
+	$ ./sort_optimized 30000
+	Bubble sorting array of 30000 elements
+	2254 ms
+
+	$ cat ~/.perfconfig
+	[intel-pt]
+		mispred-all
+
+	$ perf record -e intel_pt//u ./sort 3000
+	Bubble sorting array of 3000 elements
+	58 ms
+	[ perf record: Woken up 2 times to write data ]
+	[ perf record: Captured and wrote 3.939 MB perf.data ]
+	$ perf inject -i perf.data -o inj --itrace=i100usle --strip
+	$ ./create_gcov --binary=./sort --profile=inj --gcov=sort.gcov -gcov_version=1
+	$ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
+	$ ./sort_autofdo 30000
+	Bubble sorting array of 30000 elements
+	2155 ms
+
+Note there is currently no advantage to using Intel PT instead of LBR, but
+that may change in the future if greater use is made of the data.
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 05e8fcc5188b..03ff072b5993 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -64,6 +64,7 @@ struct intel_pt {
 	bool data_queued;
 	bool est_tsc;
 	bool sync_switch;
+	bool mispred_all;
 	int have_sched_switch;
 	u32 pmu_type;
 	u64 kernel_start;
@@ -943,6 +944,7 @@ static void intel_pt_update_last_branch_rb(struct intel_pt_queue *ptq)
 	be->flags.abort = !!(state->flags & INTEL_PT_ABORT_TX);
 	be->flags.in_tx = !!(state->flags & INTEL_PT_IN_TX);
 	/* No support for mispredict */
+	be->flags.mispred = ptq->pt->mispred_all;
 
 	if (bs->nr < ptq->pt->synth_opts.last_branch_sz)
 		bs->nr += 1;
@@ -1967,6 +1969,16 @@ static bool intel_pt_find_switch(struct perf_evlist *evlist)
 	return false;
 }
 
+static int intel_pt_perf_config(const char *var, const char *value, void *data)
+{
+	struct intel_pt *pt = data;
+
+	if (!strcmp(var, "intel-pt.mispred-all"))
+		pt->mispred_all = perf_config_bool(var, value);
+
+	return 0;
+}
+
 static const char * const intel_pt_info_fmts[] = {
 	[INTEL_PT_PMU_TYPE]		= "  PMU Type            %"PRId64"\n",
 	[INTEL_PT_TIME_SHIFT]		= "  Time Shift          %"PRIu64"\n",
@@ -2011,6 +2023,8 @@ int intel_pt_process_auxtrace_info(union perf_event *event,
 	if (!pt)
 		return -ENOMEM;
 
+	perf_config(intel_pt_perf_config, pt);
+
 	err = auxtrace_queues__init(&pt->queues);
 	if (err)
 		goto err_free;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH 22/25] perf tools: Add perf_evlist__del()
  2015-09-25 13:15 ` [PATCH 22/25] perf tools: Add perf_evlist__del() Adrian Hunter
@ 2015-09-28 13:33   ` Arnaldo Carvalho de Melo
  2015-09-28 20:14     ` Arnaldo Carvalho de Melo
  2015-09-29  8:48   ` [tip:perf/core] perf evlist: Add perf_evlist__remove() tip-bot for Adrian Hunter
  1 sibling, 1 reply; 66+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-09-28 13:33 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Jiri Olsa, linux-kernel

Em Fri, Sep 25, 2015 at 04:15:53PM +0300, Adrian Hunter escreveu:
> Add a counterpart to perf_evlist__add() that does the opposite
> and deletes the evsel.
> 
> This will be used by perf inject to remove unwanted evsels

I think perf_evlist__remove() is better, as __del() looks like a shortcut
for __delete(), which has different semantics than removing an entry
from a list.

I'll fix up the patches.

- Arnaldo
 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/util/evlist.c | 8 ++++++++
>  tools/perf/util/evlist.h | 1 +
>  2 files changed, 9 insertions(+)
> 
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index e6760380d731..0bb15e6d12a0 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -165,6 +165,14 @@ void perf_evlist__add(struct perf_evlist *evlist, struct perf_evsel *entry)
>  	__perf_evlist__propagate_maps(evlist, entry);
>  }
>  
> +void perf_evlist__del(struct perf_evlist *evlist, struct perf_evsel *evsel)
> +{
> +	evsel->evlist = NULL;
> +	list_del_init(&evsel->node);
> +	evlist->nr_entries -= 1;
> +	perf_evsel__delete(evsel);
> +}
> +
>  void perf_evlist__splice_list_tail(struct perf_evlist *evlist,
>  				   struct list_head *list)
>  {
> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> index 0edf0d4f4efa..7fab57d85fa1 100644
> --- a/tools/perf/util/evlist.h
> +++ b/tools/perf/util/evlist.h
> @@ -73,6 +73,7 @@ void perf_evlist__exit(struct perf_evlist *evlist);
>  void perf_evlist__delete(struct perf_evlist *evlist);
>  
>  void perf_evlist__add(struct perf_evlist *evlist, struct perf_evsel *entry);
> +void perf_evlist__del(struct perf_evlist *evlist, struct perf_evsel *evsel);
>  int perf_evlist__add_default(struct perf_evlist *evlist);
>  int __perf_evlist__add_default_attrs(struct perf_evlist *evlist,
>  				     struct perf_event_attr *attrs, size_t nr_attrs);
> -- 
> 1.9.1

* Re: [PATCH 01/25] perf auxtrace: Fix 'instructions' period of zero
  2015-09-25 13:15 ` [PATCH 01/25] perf auxtrace: Fix 'instructions' period of zero Adrian Hunter
@ 2015-09-28 14:12   ` Arnaldo Carvalho de Melo
  2015-09-28 14:16     ` Arnaldo Carvalho de Melo
  2015-09-29  8:41   ` [tip:perf/core] " tip-bot for Adrian Hunter
  1 sibling, 1 reply; 66+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-09-28 14:12 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Jiri Olsa, linux-kernel

On Fri, Sep 25, 2015 at 04:15:32PM +0300, Adrian Hunter wrote:
> Instruction tracing options (i.e. --itrace) include an option for
> sampling instructions at an arbitrary period. e.g.
> 
> 	--itrace=i10us
> 
> means make an 'instructions' sample for every 10us of trace.
> 
> Currently the logic does not distinguish between a period of
> zero and no period being specified at all, so it gets treated
> as the default period which is 100000.  That doesn't really
> make sense.
> 
> Fix it so that zero period is accepted and treated as meaning
> "as often as possible".

Don't we have to update the documentation for this?
 
> In the case of Intel PT that is the same as a period of 1 and
> a unit of 'instructions' (i.e. --itrace=i1i).
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/util/auxtrace.c | 4 +++-
>  tools/perf/util/intel-pt.c | 2 +-
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
> index a980e7c50ee0..c4993b2e6c50 100644
> --- a/tools/perf/util/auxtrace.c
> +++ b/tools/perf/util/auxtrace.c
> @@ -950,6 +950,7 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
>  	const char *p;
>  	char *endptr;
>  	bool period_type_set = false;
> +	bool period_set = false;
>  
>  	synth_opts->set = true;
>  
> @@ -971,6 +972,7 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
>  				p += 1;
>  			if (isdigit(*p)) {
>  				synth_opts->period = strtoull(p, &endptr, 10);
> +				period_set = true;
>  				p = endptr;
>  				while (*p == ' ' || *p == ',')
>  					p += 1;
> @@ -1053,7 +1055,7 @@ out:
>  		if (!period_type_set)
>  			synth_opts->period_type =
>  					PERF_ITRACE_DEFAULT_PERIOD_TYPE;
> -		if (!synth_opts->period)
> +		if (!period_set)
>  			synth_opts->period = PERF_ITRACE_DEFAULT_PERIOD;
>  	}
>  
> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
> index 38942e1eac8f..c8bb5ca6a157 100644
> --- a/tools/perf/util/intel-pt.c
> +++ b/tools/perf/util/intel-pt.c
> @@ -720,7 +720,7 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
>  
>  		if (!params.period) {
>  			params.period_type = INTEL_PT_PERIOD_INSTRUCTIONS;
> -			params.period = 1000;
> +			params.period = 1;
>  		}
>  	}
>  
> -- 
> 1.9.1

* Re: [PATCH 01/25] perf auxtrace: Fix 'instructions' period of zero
  2015-09-28 14:12   ` Arnaldo Carvalho de Melo
@ 2015-09-28 14:16     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 66+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-09-28 14:16 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Jiri Olsa, linux-kernel

On Mon, Sep 28, 2015 at 11:12:16AM -0300, Arnaldo Carvalho de Melo wrote:
> On Fri, Sep 25, 2015 at 04:15:32PM +0300, Adrian Hunter wrote:
> > Instruction tracing options (i.e. --itrace) include an option for
> > sampling instructions at an arbitrary period. e.g.
> > 
> > 	--itrace=i10us
> > 
> > means make an 'instructions' sample for every 10us of trace.
> > 
> > Currently the logic does not distinguish between a period of
> > zero and no period being specified at all, so it gets treated
> > as the default period which is 100000.  That doesn't really
> > make sense.
> > 
> > Fix it so that zero period is accepted and treated as meaning
> > "as often as possible".
> 
> Don't we have to update the documentation for this?
>  
> > In the case of Intel PT that is the same as a period of 1 and
> > a unit of 'instructions' (i.e. --itrace=i1i).

I.e. may I fold the following patch into this one? Should we also update
the per-tool man pages?

diff --git a/tools/perf/Documentation/intel-pt.txt b/tools/perf/Documentation/intel-pt.txt
index 4a0501d7a3b4..05707d9bfdda 100644
--- a/tools/perf/Documentation/intel-pt.txt
+++ b/tools/perf/Documentation/intel-pt.txt
@@ -722,6 +722,11 @@ on the sample is *not* adjusted and reflects the last known value of TSC.
 
 For Intel PT, the default period is 100us.
 
+Setting it to a zero period means "as often as possible".
+
+In the case of Intel PT that is the same as a period of 1 and a unit of
+'instructions' (i.e. --itrace=i1i).
+
 Also the call chain size (default 16, max. 1024) for instructions or
 transactions events can be specified. e.g.
 

* Re: [PATCH 15/25] perf report: Make max_stack value allow for synthesized callchains
  2015-09-25 13:15 ` [PATCH 15/25] perf report: Make max_stack value allow for synthesized callchains Adrian Hunter
@ 2015-09-28 20:03   ` Arnaldo Carvalho de Melo
  2015-09-29  8:52     ` Adrian Hunter
  2015-09-29  8:46   ` [tip:perf/core] perf report: Make max_stack value allow for " tip-bot for Adrian Hunter
  1 sibling, 1 reply; 66+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-09-28 20:03 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Jiri Olsa, linux-kernel

On Fri, Sep 25, 2015 at 04:15:46PM +0300, Adrian Hunter wrote:
> perf report has an option (--max-stack) to set the maximum stack depth
> when processing callchains.  The option defaults to the hard-coded
> maximum definition PERF_MAX_STACK_DEPTH which is 127.  The intention of
> the option is to allow the user to reduce the processing time by
> reducing the amount of the callchain that is processed.
> 
> It is also possible, when processing instruction traces, to synthesize
> callchains.  Synthesized callchains do not have the kernel size
> limitation and are whatever size the user requests, although validation
> presently prevents the user requesting a value greater than 1024.  The
> default value is 16.

So, I haven't checked the options, but one can possibly use both the way
itrace has to ask for a max stack size and also --max-stack, right?

In that case we had better emit a warning, or plainly state that one should
use either one way of setting the max stack or the other.

I'm applying the patch, because it is unlikely that this gets specified,
but would be good to close this gap.
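
A minimal sketch of how the two settings interact after this patch,
assuming the logic quoted below is the only adjustment made. The helper
name effective_max_stack() is hypothetical and is not part of perf; it
only mirrors the condition added to cmd_report().

```c
#include <assert.h>
#include <stdbool.h>

#define PERF_MAX_STACK_DEPTH 127	/* value quoted in the commit message */

/*
 * Hypothetical helper (not a perf function): returns the max_stack value
 * that results once the synthesized callchain size is taken into account.
 */
int effective_max_stack(int max_stack, bool synth_callchain,
			unsigned int callchain_sz)
{
	/* Same condition as the hunk added to builtin-report.c */
	if (synth_callchain && (int)callchain_sz > max_stack)
		max_stack = (int)callchain_sz;
	return max_stack;
}
```

Under that assumption the larger of the two values wins, so a smaller
--max-stack is silently raised to the itrace callchain size when
callchains are synthesized.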

- Arnaldo
 
> To allow for synthesized callchains, make the max_stack value at least
> the same size as the synthesized callchain size.
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/builtin-report.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index e94e5c7155af..37c9f5125887 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -809,6 +809,10 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
>  	if (report.inverted_callchain)
>  		callchain_param.order = ORDER_CALLER;
>  
> +	if (itrace_synth_opts.callchain &&
> +	    (int)itrace_synth_opts.callchain_sz > report.max_stack)
> +		report.max_stack = itrace_synth_opts.callchain_sz;
> +
>  	if (!input_name || !strlen(input_name)) {
>  		if (!fstat(STDIN_FILENO, &st) && S_ISFIFO(st.st_mode))
>  			input_name = "-";
> -- 
> 1.9.1

* Re: [PATCH 17/25] perf callchain: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
  2015-09-25 13:15 ` [PATCH 17/25] perf callchain: " Adrian Hunter
@ 2015-09-28 20:08   ` Arnaldo Carvalho de Melo
  2015-09-29  8:16     ` Adrian Hunter
  2015-10-03  7:49   ` [tip:perf/core] " tip-bot for Adrian Hunter
  1 sibling, 1 reply; 66+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-09-28 20:08 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Jiri Olsa, linux-kernel

On Fri, Sep 25, 2015 at 04:15:48PM +0300, Adrian Hunter wrote:
> Adjust the validation to allow for max_stack greater than
> PERF_MAX_STACK_DEPTH.
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/util/machine.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index fd1efeafb343..d7bd9a304535 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -1831,7 +1831,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
>  	}
>  
>  check_calls:
> -	if (chain->nr > PERF_MAX_STACK_DEPTH) {
> +	if (chain->nr > PERF_MAX_STACK_DEPTH && (int)chain->nr > max_stack) {

Both?

>  		pr_warning("corrupted callchain. skipping...\n");
>  		return 0;
>  	}
> -- 
> 1.9.1

* Re: [PATCH 22/25] perf tools: Add perf_evlist__del()
  2015-09-28 13:33   ` Arnaldo Carvalho de Melo
@ 2015-09-28 20:14     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 66+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-09-28 20:14 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Jiri Olsa, linux-kernel

On Mon, Sep 28, 2015 at 10:33:32AM -0300, Arnaldo Carvalho de Melo wrote:
> On Fri, Sep 25, 2015 at 04:15:53PM +0300, Adrian Hunter wrote:
> > Add a counterpart to perf_evlist__add() that does the opposite
> > and deletes the evsel.
> > 
> > This will be used by perf inject to remove unwanted evsels
> 
> I think perf_evlist__remove() is better, as __del() looks like a shortcut
> for __delete(), which has different semantics than removing an entry
> from a list.
> 
> I'll fix up the patches.

Ouch, but it also deletes the evsel. I think in this case it is better
to either have a perf_evlist__delete_evsel() or plainly use
perf_evlist__remove() + perf_evsel__delete(). Doing the latter, we can
rediscuss if you feel strongly about it.
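
The split between "remove" (unlink from the list) and "delete" (free the
object) could look roughly like the sketch below. It is an illustration
only: the types and functions are simplified, hypothetical stand-ins that
use a hand-rolled circular list, not perf's struct perf_evlist, struct
perf_evsel or the kernel list helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the evlist/evsel types. */
struct evlist;

struct evsel {
	struct evsel *prev, *next;	/* circular list linkage */
	struct evlist *evlist;		/* owning list, NULL when unlinked */
};

struct evlist {
	struct evsel head;		/* sentinel node */
	int nr_entries;
};

void evlist_init(struct evlist *evlist)
{
	evlist->head.prev = evlist->head.next = &evlist->head;
	evlist->head.evlist = evlist;
	evlist->nr_entries = 0;
}

struct evsel *evsel_new(void)
{
	struct evsel *evsel = calloc(1, sizeof(*evsel));

	if (evsel)
		evsel->prev = evsel->next = evsel;
	return evsel;
}

/* Append to the tail of the list. */
void evlist_add(struct evlist *evlist, struct evsel *evsel)
{
	evsel->prev = evlist->head.prev;
	evsel->next = &evlist->head;
	evlist->head.prev->next = evsel;
	evlist->head.prev = evsel;
	evsel->evlist = evlist;
	evlist->nr_entries += 1;
}

/* Remove only unlinks: the caller still owns the evsel afterwards. */
void evlist_remove(struct evlist *evlist, struct evsel *evsel)
{
	evsel->prev->next = evsel->next;
	evsel->next->prev = evsel->prev;
	evsel->prev = evsel->next = evsel;
	evsel->evlist = NULL;
	evlist->nr_entries -= 1;
}

/* Delete frees the object; it is expected to be unlinked already. */
void evsel_delete(struct evsel *evsel)
{
	free(evsel);
}
```

A caller that wants the behaviour of the original perf_evlist__del()
would then call evlist_remove() followed by evsel_delete().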

- Arnaldo
 
> - Arnaldo
>  
> > Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> > ---
> >  tools/perf/util/evlist.c | 8 ++++++++
> >  tools/perf/util/evlist.h | 1 +
> >  2 files changed, 9 insertions(+)
> > 
> > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> > index e6760380d731..0bb15e6d12a0 100644
> > --- a/tools/perf/util/evlist.c
> > +++ b/tools/perf/util/evlist.c
> > @@ -165,6 +165,14 @@ void perf_evlist__add(struct perf_evlist *evlist, struct perf_evsel *entry)
> >  	__perf_evlist__propagate_maps(evlist, entry);
> >  }
> >  
> > +void perf_evlist__del(struct perf_evlist *evlist, struct perf_evsel *evsel)
> > +{
> > +	evsel->evlist = NULL;
> > +	list_del_init(&evsel->node);
> > +	evlist->nr_entries -= 1;
> > +	perf_evsel__delete(evsel);
> > +}
> > +
> >  void perf_evlist__splice_list_tail(struct perf_evlist *evlist,
> >  				   struct list_head *list)
> >  {
> > diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> > index 0edf0d4f4efa..7fab57d85fa1 100644
> > --- a/tools/perf/util/evlist.h
> > +++ b/tools/perf/util/evlist.h
> > @@ -73,6 +73,7 @@ void perf_evlist__exit(struct perf_evlist *evlist);
> >  void perf_evlist__delete(struct perf_evlist *evlist);
> >  
> >  void perf_evlist__add(struct perf_evlist *evlist, struct perf_evsel *entry);
> > +void perf_evlist__del(struct perf_evlist *evlist, struct perf_evsel *evsel);
> >  int perf_evlist__add_default(struct perf_evlist *evlist);
> >  int __perf_evlist__add_default_attrs(struct perf_evlist *evlist,
> >  				     struct perf_event_attr *attrs, size_t nr_attrs);
> > -- 
> > 1.9.1

* Re: [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff
  2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
                   ` (24 preceding siblings ...)
  2015-09-25 13:15 ` [PATCH 25/25] perf intel-pt: Add mispred-all config option to aid use with autofdo Adrian Hunter
@ 2015-09-28 20:33 ` Arnaldo Carvalho de Melo
  2015-09-29 11:13   ` Adrian Hunter
  25 siblings, 1 reply; 66+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-09-28 20:33 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Jiri Olsa, linux-kernel

On Fri, Sep 25, 2015 at 04:15:31PM +0300, Adrian Hunter wrote:
> Hi
> 
> Here are some minor improvements to Intel PT related stuff.

Thanks, applied all but:

 [PATCH 17/25] perf callchain: Allow for max_stack greater than PERF_MAX_STACK_DEPTH

Please take a look at the comments I made on this and a few others that
I applied.

- Arnaldo
 
> First 3 patches are minor fixes:
> 
>       perf auxtrace: Fix 'instructions' period of zero
>       perf report: Fix sample type validation for synthesized callchains
>       perf intel-pt: Fix potential loop forever
> 
> Next 4 are minor improvements:
> 
>       perf intel-pt: Make logging slightly more efficient
>       perf script: Allow time to be displayed in nanoseconds
>       perf tools: Warn when AUX data has been lost
>       perf tools: Add more documentation to export-to-postgresql.py script
> 
> Next 7 add support for branch stacks:
> 
>       perf auxtrace: Add option to synthesize branch stacks on samples
>       perf report: Adjust sample type validation for synthesized branch stacks
>       perf report: Also do default setup for synthesized branch stacks
>       perf report: Skip events with null branch stacks
>       perf inject: Set branch stack feature flag when synthesizing branch stacks
>       perf intel-pt: Move branch filter logic
>       perf intel-pt: Support generating branch stack
> 
> Next 6 allow for arbitrary-sized call stacks:
> 
>       perf report: Make max_stack value allow for synthesized callchains
>       perf hists: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
>       perf callchain: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
>       perf script: Add a setting for maximum stack depth
>       perf scripting python: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
>       perf script: Make scripting_max_stack value allow for synthesized callchains
> 
> Final 5 let Intel PT be used with autofdo:
> 
>       perf tools: Add perf_evlist__id2evsel_strict()
>       perf tools: Add perf_evlist__del()
>       perf inject: Remove more aux-related stuff when processing instruction traces
>       perf inject: Add --strip option to strip out non-synthesized events
>       perf intel-pt: Add mispred-all config option to aid use with autofdo
> 
> 
> Adrian Hunter (25):
>       perf auxtrace: Fix 'instructions' period of zero
>       perf report: Fix sample type validation for synthesized callchains
>       perf intel-pt: Fix potential loop forever
>       perf intel-pt: Make logging slightly more efficient
>       perf script: Allow time to be displayed in nanoseconds
>       perf tools: Warn when AUX data has been lost
>       perf tools: Add more documentation to export-to-postgresql.py script
>       perf auxtrace: Add option to synthesize branch stacks on samples
>       perf report: Adjust sample type validation for synthesized branch stacks
>       perf report: Also do default setup for synthesized branch stacks
>       perf report: Skip events with null branch stacks
>       perf inject: Set branch stack feature flag when synthesizing branch stacks
>       perf intel-pt: Move branch filter logic
>       perf intel-pt: Support generating branch stack
>       perf report: Make max_stack value allow for synthesized callchains
>       perf hists: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
>       perf callchain: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
>       perf script: Add a setting for maximum stack depth
>       perf scripting python: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
>       perf script: Make scripting_max_stack value allow for synthesized callchains
>       perf tools: Add perf_evlist__id2evsel_strict()
>       perf tools: Add perf_evlist__del()
>       perf inject: Remove more aux-related stuff when processing instruction traces
>       perf inject: Add --strip option to strip out non-synthesized events
>       perf intel-pt: Add mispred-all config option to aid use with autofdo
> 
>  tools/perf/Documentation/intel-pt.txt              |  39 ++++
>  tools/perf/Documentation/itrace.txt                |   4 +
>  tools/perf/Documentation/perf-inject.txt           |   3 +
>  tools/perf/Documentation/perf-script.txt           |   3 +
>  tools/perf/builtin-inject.c                        | 125 +++++++++++-
>  tools/perf/builtin-report.c                        |  31 ++-
>  tools/perf/builtin-script.c                        |  18 +-
>  tools/perf/scripts/python/export-to-postgresql.py  | 221 +++++++++++++++++++++
>  tools/perf/util/auxtrace.c                         |  24 ++-
>  tools/perf/util/auxtrace.h                         |   4 +
>  tools/perf/util/event.h                            |   1 +
>  tools/perf/util/evlist.c                           |  23 +++
>  tools/perf/util/evlist.h                           |   3 +
>  tools/perf/util/hist.c                             |   6 +-
>  tools/perf/util/hist.h                             |   1 +
>  .../perf/util/intel-pt-decoder/intel-pt-decoder.c  |   4 +-
>  tools/perf/util/intel-pt-decoder/intel-pt-log.c    |  21 +-
>  tools/perf/util/intel-pt-decoder/intel-pt-log.h    |  38 +++-
>  tools/perf/util/intel-pt.c                         | 135 ++++++++++++-
>  tools/perf/util/machine.c                          |   2 +-
>  .../util/scripting-engines/trace-event-python.c    |   2 +-
>  tools/perf/util/session.c                          |  12 +-
>  tools/perf/util/trace-event.h                      |   2 +
>  23 files changed, 686 insertions(+), 36 deletions(-)
> 
> 
> Regards
> Adrian

* Re: [PATCH 17/25] perf callchain: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
  2015-09-28 20:08   ` Arnaldo Carvalho de Melo
@ 2015-09-29  8:16     ` Adrian Hunter
  2015-10-01 11:45       ` Adrian Hunter
  0 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-29  8:16 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

On 28/09/15 23:08, Arnaldo Carvalho de Melo wrote:
> On Fri, Sep 25, 2015 at 04:15:48PM +0300, Adrian Hunter wrote:
>> Adjust the validation to allow for max_stack greater than
>> PERF_MAX_STACK_DEPTH.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>  tools/perf/util/machine.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
>> index fd1efeafb343..d7bd9a304535 100644
>> --- a/tools/perf/util/machine.c
>> +++ b/tools/perf/util/machine.c
>> @@ -1831,7 +1831,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
>>  	}
>>  
>>  check_calls:
>> -	if (chain->nr > PERF_MAX_STACK_DEPTH) {
>> +	if (chain->nr > PERF_MAX_STACK_DEPTH && (int)chain->nr > max_stack) {
> 
> Both?

Yes.

In the case of a hardware generated callchain, the callchain can be up to
PERF_MAX_STACK_DEPTH but max_stack can be less than PERF_MAX_STACK_DEPTH to
limit the number processed.

In the case of a synthesized callchain, the callchain can be up to max_stack
which might be more than PERF_MAX_STACK_DEPTH.
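
To make the two cases concrete, here is a small standalone sketch of the
check. The function name callchain_looks_corrupted() is made up for the
example; only the condition itself is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PERF_MAX_STACK_DEPTH 127

/*
 * Hypothetical wrapper around the patched condition: a chain is only
 * treated as corrupted when it exceeds both limits, i.e. when it is
 * larger than the greater of PERF_MAX_STACK_DEPTH and max_stack.
 */
bool callchain_looks_corrupted(uint64_t nr, int max_stack)
{
	return nr > PERF_MAX_STACK_DEPTH && (int)nr > max_stack;
}
```

A hardware callchain of 127 entries with max_stack of 10 passes, a
synthesized callchain of 200 entries with max_stack of 1024 passes, and
200 entries with max_stack of 127 is rejected.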

> 
>>  		pr_warning("corrupted callchain. skipping...\n");
>>  		return 0;
>>  	}
>> -- 
>> 1.9.1
> 


* [tip:perf/core] perf auxtrace: Fix 'instructions' period of zero
  2015-09-25 13:15 ` [PATCH 01/25] perf auxtrace: Fix 'instructions' period of zero Adrian Hunter
  2015-09-28 14:12   ` Arnaldo Carvalho de Melo
@ 2015-09-29  8:41   ` tip-bot for Adrian Hunter
  1 sibling, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, linux-kernel, adrian.hunter, tglx, jolsa, acme, hpa

Commit-ID:  e1791347b5d57d13326cf0114df1a3f3b1c4ca24
Gitweb:     http://git.kernel.org/tip/e1791347b5d57d13326cf0114df1a3f3b1c4ca24
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:32 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 15:50:56 -0300

perf auxtrace: Fix 'instructions' period of zero

Instruction tracing options (i.e. --itrace) include an option for
sampling instructions at an arbitrary period. e.g.

	--itrace=i10us

means make an 'instructions' sample for every 10us of trace.

Currently the logic does not distinguish between a period of
zero and no period being specified at all, so it gets treated
as the default period which is 100000.  That doesn't really
make sense.

Fix it so that zero period is accepted and treated as meaning
"as often as possible".

In the case of Intel PT that is the same as a period of 1 and
a unit of 'instructions' (i.e. --itrace=i1i).
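
The distinction between "zero" and "not specified" can be sketched
outside of perf as follows. This is an illustration under assumptions:
parse_period() and DEFAULT_PERIOD are hypothetical names, and the real
parsing in itrace_parse_synth_opts() handles far more than a leading
number.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEFAULT_PERIOD 100000ULL	/* default value quoted above */

/*
 * Hypothetical helper: parse an optional decimal period at the start of
 * str. Returns DEFAULT_PERIOD when no digits are present; an explicit
 * "0" is kept as 0 instead of being replaced by the default.
 */
unsigned long long parse_period(const char *str)
{
	unsigned long long period = 0;
	bool period_set = false;

	if (isdigit((unsigned char)*str)) {
		period = strtoull(str, NULL, 10);
		period_set = true;
	}

	/* Testing !period here would conflate "0" with "not given". */
	if (!period_set)
		period = DEFAULT_PERIOD;

	return period;
}
```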

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-2-git-send-email-adrian.hunter@intel.com
[ Add a few lines describing this in the Documentation/intel-pt.txt file ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/intel-pt.txt | 5 +++++
 tools/perf/util/auxtrace.c            | 4 +++-
 tools/perf/util/intel-pt.c            | 2 +-
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/intel-pt.txt b/tools/perf/Documentation/intel-pt.txt
index c94c9de..886612b 100644
--- a/tools/perf/Documentation/intel-pt.txt
+++ b/tools/perf/Documentation/intel-pt.txt
@@ -707,6 +707,11 @@ on the sample is *not* adjusted and reflects the last known value of TSC.
 
 For Intel PT, the default period is 100us.
 
+Setting it to a zero period means "as often as possible".
+
+In the case of Intel PT that is the same as a period of 1 and a unit of
+'instructions' (i.e. --itrace=i1i).
+
 Also the call chain size (default 16, max. 1024) for instructions or
 transactions events can be specified. e.g.
 
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index a980e7c..c4993b2 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -950,6 +950,7 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 	const char *p;
 	char *endptr;
 	bool period_type_set = false;
+	bool period_set = false;
 
 	synth_opts->set = true;
 
@@ -971,6 +972,7 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 				p += 1;
 			if (isdigit(*p)) {
 				synth_opts->period = strtoull(p, &endptr, 10);
+				period_set = true;
 				p = endptr;
 				while (*p == ' ' || *p == ',')
 					p += 1;
@@ -1053,7 +1055,7 @@ out:
 		if (!period_type_set)
 			synth_opts->period_type =
 					PERF_ITRACE_DEFAULT_PERIOD_TYPE;
-		if (!synth_opts->period)
+		if (!period_set)
 			synth_opts->period = PERF_ITRACE_DEFAULT_PERIOD;
 	}
 
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 38942e1..c8bb5ca 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -720,7 +720,7 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
 
 		if (!params.period) {
 			params.period_type = INTEL_PT_PERIOD_INSTRUCTIONS;
-			params.period = 1000;
+			params.period = 1;
 		}
 	}
 

* [tip:perf/core] perf report: Fix sample type validation for synthesized callchains
  2015-09-25 13:15 ` [PATCH 02/25] perf report: Fix sample type validation for synthesized callchains Adrian Hunter
@ 2015-09-29  8:41   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, adrian.hunter, acme, tglx, linux-kernel, jolsa, mingo

Commit-ID:  d062ac16f53d1a24047bcc9eded5514a71c363b8
Gitweb:     http://git.kernel.org/tip/d062ac16f53d1a24047bcc9eded5514a71c363b8
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:33 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 16:42:38 -0300

perf report: Fix sample type validation for synthesized callchains

Processing instruction tracing data (e.g. Intel PT) can synthesize
callchains e.g.

	$ perf record -e intel_pt//u uname
	$ perf report --stdio --itrace=ige

However perf report's callgraph option gets extra validation, so:

	$ perf report --stdio --itrace=ige -gflat
	Error:
	Selected -g or --branch-history but no callchain data. Did
	you call 'perf record' without -g?
	# To display the perf.data header info,
	# please use --header/--header-only options.
	#

Fix the validation to know about instruction tracing options so
that the above command works.

A side-effect of the change is that the default option to
accumulate the callchain of child functions comes into force.
To get the previous behaviour the --no-children option can be
used e.g.

       $ perf report --stdio --itrace=ige -gflat --no-children

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-report.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index e4e3f14..0d53b48 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -214,6 +214,12 @@ static int report__setup_sample_type(struct report *rep)
 	u64 sample_type = perf_evlist__combined_sample_type(session->evlist);
 	bool is_pipe = perf_data_file__is_pipe(session->file);
 
+	if (session->itrace_synth_opts->callchain ||
+	    (!is_pipe &&
+	     perf_header__has_feat(&session->header, HEADER_AUXTRACE) &&
+	     !session->itrace_synth_opts->set))
+		sample_type |= PERF_SAMPLE_CALLCHAIN;
+
 	if (!is_pipe && !(sample_type & PERF_SAMPLE_CALLCHAIN)) {
 		if (sort__has_parent) {
 			ui__error("Selected --sort parent, but no "

* [tip:perf/core] perf intel-pt: Fix potential loop forever
  2015-09-25 13:15 ` [PATCH 03/25] perf intel-pt: Fix potential loop forever Adrian Hunter
@ 2015-09-29  8:42   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jolsa, mingo, adrian.hunter, tglx, hpa, linux-kernel, acme

Commit-ID:  9992c2d50a73f442653968a98a9e5f3bf4e769e9
Gitweb:     http://git.kernel.org/tip/9992c2d50a73f442653968a98a9e5f3bf4e769e9
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:34 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 16:44:31 -0300

perf intel-pt: Fix potential loop forever

TSC packets contain only 7 bytes of TSC.  The 8th byte is assumed to
change so infrequently that its value can be inferred.  However the
logic must cater for a 7 byte wraparound, which it does by adding 1 to
the top byte.

The existing code was doing that with a while loop even though the
addition should only need to be done once.  That logic won't work (will
loop forever) if TSC wraps around at the 8th byte.  Theoretically that
would take at least 10 years, unless something else went wrong.

And what else could go wrong?  Well, if the chunks of trace data are
processed out of order, it will make it look like the 7-byte TSC has
gone backwards (i.e. wrapped).  If that happens 256 times then stuck in
the while loop it will be.

Fix that by getting rid of the unnecessary while loop.
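
A rough standalone sketch of the wraparound handling, assuming the full
timestamp is rebuilt from the 7-byte payload plus the top byte of the
previous timestamp. The helper name tsc_extend() is hypothetical and the
decoder state is reduced to a single "last" value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: rebuild a 64-bit timestamp from a 56-bit TSC
 * payload and the previously known timestamp.
 */
uint64_t tsc_extend(uint64_t last, uint64_t payload56)
{
	/* Infer the 8th byte from the previous timestamp */
	uint64_t timestamp = payload56 | (last & (0xffULL << 56));

	/*
	 * A smaller value means the low 7 bytes wrapped, so add 1 to the
	 * top byte. This is done once: a while loop here never terminates
	 * if the addition overflows 64 bits and "last" stays larger.
	 */
	if (timestamp < last)
		timestamp += (1ULL << 56);

	return timestamp;
}
```

With last = 0xffffffffffffff00 and a payload of 0x10, the single addition
overflows and yields 0x10, whereas repeating the addition while
timestamp < last would cycle through all 256 top-byte values without
ever exceeding last.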

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-4-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/intel-pt-decoder/intel-pt-decoder.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
index 22ba502..9409d01 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -650,7 +650,7 @@ static int intel_pt_calc_cyc_cb(struct intel_pt_pkt_info *pkt_info)
 		if (data->from_mtc && timestamp < data->timestamp &&
 		    data->timestamp - timestamp < decoder->tsc_slip)
 			return 1;
-		while (timestamp < data->timestamp)
+		if (timestamp < data->timestamp)
 			timestamp += (1ULL << 56);
 		if (pkt_info->last_packet_type != INTEL_PT_CYC) {
 			if (data->from_mtc)
@@ -1191,7 +1191,7 @@ static void intel_pt_calc_tsc_timestamp(struct intel_pt_decoder *decoder)
 					timestamp);
 			timestamp = decoder->timestamp;
 		}
-		while (timestamp < decoder->timestamp) {
+		if (timestamp < decoder->timestamp) {
 			intel_pt_log_to("Wraparound timestamp", timestamp);
 			timestamp += (1ULL << 56);
 			decoder->tsc_timestamp = timestamp;

* [tip:perf/core] perf intel-pt: Make logging slightly more efficient
  2015-09-25 13:15 ` [PATCH 04/25] perf intel-pt: Make logging slightly more efficient Adrian Hunter
@ 2015-09-29  8:42   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, mingo, acme, linux-kernel, jolsa, tglx, adrian.hunter

Commit-ID:  116f349c5bf8c7aec4047dd6e06c310354b46e4f
Gitweb:     http://git.kernel.org/tip/116f349c5bf8c7aec4047dd6e06c310354b46e4f
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:35 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 16:45:26 -0300

perf intel-pt: Make logging slightly more efficient

Logging is only used for debugging. Use macros to save calling into the
functions only to return immediately when logging is not enabled.
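
The pattern can be sketched in isolation as below. The names log_enabled,
log_impl() and log_msg() are invented for the example, and the call
counter exists only to make the effect observable; the actual macros are
in intel-pt-log.h.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

bool log_enabled;	/* hypothetical flag, in the role of intel_pt_enable_logging */
int log_calls;		/* counts how often the out-of-line function is entered */

/* Out-of-line logging function: only reached when logging is enabled. */
void log_impl(const char *fmt, ...)
{
	va_list args;

	log_calls += 1;
	va_start(args, fmt);
	vfprintf(stderr, fmt, args);
	va_end(args);
}

/* The flag is tested at the call site, so a disabled log costs no call. */
#define log_msg(...) \
	do { \
		if (log_enabled) \
			log_impl(__VA_ARGS__); \
	} while (0)
```

The trade-off is that the flag can no longer be static, which is why the
patch renames it and makes it visible outside intel-pt-log.c.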

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-5-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/intel-pt-decoder/intel-pt-log.c | 21 +++++++-------
 tools/perf/util/intel-pt-decoder/intel-pt-log.h | 38 +++++++++++++++++++++----
 2 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-log.c b/tools/perf/util/intel-pt-decoder/intel-pt-log.c
index d09c7d9..319bef3 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-log.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-log.c
@@ -29,18 +29,18 @@
 
 static FILE *f;
 static char log_name[MAX_LOG_NAME];
-static bool enable_logging;
+bool intel_pt_enable_logging;
 
 void intel_pt_log_enable(void)
 {
-	enable_logging = true;
+	intel_pt_enable_logging = true;
 }
 
 void intel_pt_log_disable(void)
 {
 	if (f)
 		fflush(f);
-	enable_logging = false;
+	intel_pt_enable_logging = false;
 }
 
 void intel_pt_log_set_name(const char *name)
@@ -80,7 +80,7 @@ static void intel_pt_print_no_data(uint64_t pos, int indent)
 
 static int intel_pt_log_open(void)
 {
-	if (!enable_logging)
+	if (!intel_pt_enable_logging)
 		return -1;
 
 	if (f)
@@ -91,15 +91,15 @@ static int intel_pt_log_open(void)
 
 	f = fopen(log_name, "w+");
 	if (!f) {
-		enable_logging = false;
+		intel_pt_enable_logging = false;
 		return -1;
 	}
 
 	return 0;
 }
 
-void intel_pt_log_packet(const struct intel_pt_pkt *packet, int pkt_len,
-			 uint64_t pos, const unsigned char *buf)
+void __intel_pt_log_packet(const struct intel_pt_pkt *packet, int pkt_len,
+			   uint64_t pos, const unsigned char *buf)
 {
 	char desc[INTEL_PT_PKT_DESC_MAX];
 
@@ -111,7 +111,7 @@ void intel_pt_log_packet(const struct intel_pt_pkt *packet, int pkt_len,
 	fprintf(f, "%s\n", desc);
 }
 
-void intel_pt_log_insn(struct intel_pt_insn *intel_pt_insn, uint64_t ip)
+void __intel_pt_log_insn(struct intel_pt_insn *intel_pt_insn, uint64_t ip)
 {
 	char desc[INTEL_PT_INSN_DESC_MAX];
 	size_t len = intel_pt_insn->length;
@@ -128,7 +128,8 @@ void intel_pt_log_insn(struct intel_pt_insn *intel_pt_insn, uint64_t ip)
 		fprintf(f, "Bad instruction!\n");
 }
 
-void intel_pt_log_insn_no_data(struct intel_pt_insn *intel_pt_insn, uint64_t ip)
+void __intel_pt_log_insn_no_data(struct intel_pt_insn *intel_pt_insn,
+				 uint64_t ip)
 {
 	char desc[INTEL_PT_INSN_DESC_MAX];
 
@@ -142,7 +143,7 @@ void intel_pt_log_insn_no_data(struct intel_pt_insn *intel_pt_insn, uint64_t ip)
 		fprintf(f, "Bad instruction!\n");
 }
 
-void intel_pt_log(const char *fmt, ...)
+void __intel_pt_log(const char *fmt, ...)
 {
 	va_list args;
 
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-log.h b/tools/perf/util/intel-pt-decoder/intel-pt-log.h
index db3942f..debe751 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-log.h
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-log.h
@@ -25,20 +25,46 @@ void intel_pt_log_enable(void);
 void intel_pt_log_disable(void);
 void intel_pt_log_set_name(const char *name);
 
-void intel_pt_log_packet(const struct intel_pt_pkt *packet, int pkt_len,
-			 uint64_t pos, const unsigned char *buf);
+void __intel_pt_log_packet(const struct intel_pt_pkt *packet, int pkt_len,
+			   uint64_t pos, const unsigned char *buf);
 
 struct intel_pt_insn;
 
-void intel_pt_log_insn(struct intel_pt_insn *intel_pt_insn, uint64_t ip);
-void intel_pt_log_insn_no_data(struct intel_pt_insn *intel_pt_insn,
-			       uint64_t ip);
+void __intel_pt_log_insn(struct intel_pt_insn *intel_pt_insn, uint64_t ip);
+void __intel_pt_log_insn_no_data(struct intel_pt_insn *intel_pt_insn,
+				 uint64_t ip);
 
 __attribute__((format(printf, 1, 2)))
-void intel_pt_log(const char *fmt, ...);
+void __intel_pt_log(const char *fmt, ...);
+
+#define intel_pt_log(fmt, ...) \
+	do { \
+		if (intel_pt_enable_logging) \
+			__intel_pt_log(fmt, ##__VA_ARGS__); \
+	} while (0)
+
+#define intel_pt_log_packet(arg, ...) \
+	do { \
+		if (intel_pt_enable_logging) \
+			__intel_pt_log_packet(arg, ##__VA_ARGS__); \
+	} while (0)
+
+#define intel_pt_log_insn(arg, ...) \
+	do { \
+		if (intel_pt_enable_logging) \
+			__intel_pt_log_insn(arg, ##__VA_ARGS__); \
+	} while (0)
+
+#define intel_pt_log_insn_no_data(arg, ...) \
+	do { \
+		if (intel_pt_enable_logging) \
+			__intel_pt_log_insn_no_data(arg, ##__VA_ARGS__); \
+	} while (0)
 
 #define x64_fmt "0x%" PRIx64
 
+extern bool intel_pt_enable_logging;
+
 static inline void intel_pt_log_at(const char *msg, uint64_t u)
 {
 	intel_pt_log("%s at " x64_fmt "\n", msg, u);

^ permalink raw reply related	[flat|nested] 66+ messages in thread
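
The macro guard in the patch above can be illustrated with a minimal
standalone sketch. The names demo_enable_logging, demo_log_impl and
demo_log are hypothetical and are not part of perf; the sketch assumes a
GNU-compatible compiler because it relies on the ##__VA_ARGS__ extension,
as the patch does. The point is that when the flag is false, the function
is not called and its arguments are not evaluated.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical global flag, standing in for intel_pt_enable_logging. */
bool demo_enable_logging;

/* Counts how often the real logging function was entered. */
static int demo_log_calls;

/* The out-of-line logging function; only reached when logging is on. */
__attribute__((format(printf, 1, 2)))
void demo_log_impl(const char *fmt, ...)
{
	va_list args;

	demo_log_calls += 1;
	va_start(args, fmt);
	vfprintf(stderr, fmt, args);
	va_end(args);
}

int demo_log_call_count(void)
{
	return demo_log_calls;
}

/*
 * The flag is tested at the call site, so a disabled log statement costs
 * one branch instead of a function call, and its arguments are never
 * evaluated.
 */
#define demo_log(fmt, ...) \
	do { \
		if (demo_enable_logging) \
			demo_log_impl(fmt, ##__VA_ARGS__); \
	} while (0)
```

The do { } while (0) wrapper keeps the macro safe to use as a single
statement, for example in an unbraced if/else.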

* [tip:perf/core] perf script: Allow time to be displayed in nanoseconds
  2015-09-25 13:15 ` [PATCH 05/25] perf script: Allow time to be displayed in nanoseconds Adrian Hunter
@ 2015-09-29  8:42   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, linux-kernel, mingo, jolsa, acme, adrian.hunter, hpa

Commit-ID:  83e1986032dfcd3f9e9fc0d06e11d9153edae19b
Gitweb:     http://git.kernel.org/tip/83e1986032dfcd3f9e9fc0d06e11d9153edae19b
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:36 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 16:46:05 -0300

perf script: Allow time to be displayed in nanoseconds

Add option --ns to display time to 9 decimal places.  That is useful in
some cases, for example when using Intel PT cycle accurate mode.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-6-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-script.txt | 3 +++
 tools/perf/builtin-script.c              | 8 +++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index dc3ec78..b3b42f9 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -249,6 +249,9 @@ include::itrace.txt[]
 --full-source-path::
 	Show the full path for source files for srcline output.
 
+--ns::
+	Use 9 decimal places when displaying time (i.e. show the nanoseconds)
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-script-perl[1],
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 284a76e..0928439 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -29,6 +29,7 @@ static bool			no_callchain;
 static bool			latency_format;
 static bool			system_wide;
 static bool			print_flags;
+static bool			nanosecs;
 static const char		*cpu_list;
 static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
 
@@ -415,7 +416,10 @@ static void print_sample_start(struct perf_sample *sample,
 		secs = nsecs / NSECS_PER_SEC;
 		nsecs -= secs * NSECS_PER_SEC;
 		usecs = nsecs / NSECS_PER_USEC;
-		printf("%5lu.%06lu: ", secs, usecs);
+		if (nanosecs)
+			printf("%5lu.%09llu: ", secs, nsecs);
+		else
+			printf("%5lu.%06lu: ", secs, usecs);
 	}
 }
 
@@ -1695,6 +1699,8 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_BOOLEAN('\0', "show-switch-events", &script.show_switch_events,
 		    "Show context switch events (if recorded)"),
 	OPT_BOOLEAN('f', "force", &file.force, "don't complain, do it"),
+	OPT_BOOLEAN(0, "ns", &nanosecs,
+		    "Use 9 decimal places when displaying time"),
 	OPT_CALLBACK_OPTARG(0, "itrace", &itrace_synth_opts, NULL, "opts",
 			    "Instruction Tracing options",
 			    itrace_parse_synth_opts),

^ permalink raw reply related	[flat|nested] 66+ messages in thread
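
To show what the --ns option changes in the output, here is a small
standalone sketch of the same arithmetic. The function demo_format_time
and the DEMO_* constants are hypothetical names chosen for illustration;
the sketch writes into a buffer rather than calling printf() so that the
result can be inspected.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_NSECS_PER_SEC	1000000000ULL
#define DEMO_NSECS_PER_USEC	1000ULL

/*
 * Format a timestamp given in nanoseconds as "secs.fraction: ", using
 * 9 decimal places when nanosecs is true and 6 otherwise.
 * Returns the snprintf() result.
 */
int demo_format_time(char *buf, size_t size, uint64_t t, bool nanosecs)
{
	uint64_t secs = t / DEMO_NSECS_PER_SEC;
	uint64_t nsecs = t - secs * DEMO_NSECS_PER_SEC;
	uint64_t usecs = nsecs / DEMO_NSECS_PER_USEC;

	if (nanosecs)
		return snprintf(buf, size, "%5" PRIu64 ".%09" PRIu64 ": ",
				secs, nsecs);

	return snprintf(buf, size, "%5" PRIu64 ".%06" PRIu64 ": ",
			secs, usecs);
}
```

For a timestamp of 1234567890123 ns this yields " 1234.567890: " by
default and " 1234.567890123: " with the nanosecond flag set.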

* [tip:perf/core] perf session: Warn when AUX data has been lost
  2015-09-25 13:15 ` [PATCH 06/25] perf tools: Warn when AUX data has been lost Adrian Hunter
@ 2015-09-29  8:43   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, tglx, linux-kernel, adrian.hunter, jolsa, mingo, hpa

Commit-ID:  a38f48e300f9dac30a9b2d2ce958c8dbd7def351
Gitweb:     http://git.kernel.org/tip/a38f48e300f9dac30a9b2d2ce958c8dbd7def351
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:37 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 16:51:33 -0300

perf session: Warn when AUX data has been lost

By default 'perf record' will postprocess the perf.data file to
determine build-ids.  When that happens, the number of lost perf events
is displayed.

Make that also happen for AUX events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-7-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/event.h   |  1 +
 tools/perf/util/session.c | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index be5cbc7..a0dbcbd 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -257,6 +257,7 @@ struct events_stats {
 	u64 total_non_filtered_period;
 	u64 total_lost;
 	u64 total_lost_samples;
+	u64 total_aux_lost;
 	u64 total_invalid_chains;
 	u32 nr_events[PERF_RECORD_HEADER_MAX];
 	u32 nr_non_filtered_samples;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index f5e0000..15c84ca 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1101,6 +1101,9 @@ static int machines__deliver_event(struct machines *machines,
 	case PERF_RECORD_UNTHROTTLE:
 		return tool->unthrottle(tool, event, sample, machine);
 	case PERF_RECORD_AUX:
+		if (tool->aux == perf_event__process_aux &&
+		    (event->aux.flags & PERF_AUX_FLAG_TRUNCATED))
+			evlist->stats.total_aux_lost += 1;
 		return tool->aux(tool, event, sample, machine);
 	case PERF_RECORD_ITRACE_START:
 		return tool->itrace_start(tool, event, sample, machine);
@@ -1346,6 +1349,13 @@ static void perf_session__warn_about_errors(const struct perf_session *session)
 		}
 	}
 
+	if (session->tool->aux == perf_event__process_aux &&
+	    stats->total_aux_lost != 0) {
+		ui__warning("AUX data lost %" PRIu64 " times out of %u!\n\n",
+			    stats->total_aux_lost,
+			    stats->nr_events[PERF_RECORD_AUX]);
+	}
+
 	if (stats->nr_unknown_events != 0) {
 		ui__warning("Found %u unknown events!\n\n"
 			    "Is this an older tool processing a perf.data "

^ permalink raw reply related	[flat|nested] 66+ messages in thread
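
The accounting added by the patch above can be sketched in isolation as
follows. Everything here is hypothetical: struct demo_stats,
DEMO_AUX_FLAG_TRUNCATED (whose value is an assumption made for the
sketch) and both functions only mirror the shape of the change, and the
check on tool->aux is left out.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed flag value, for illustration only. */
#define DEMO_AUX_FLAG_TRUNCATED	0x01

struct demo_stats {
	uint64_t total_aux_lost;	/* AUX events flagged as truncated */
	uint32_t nr_aux_events;		/* all AUX events seen */
};

/* Called once per AUX event while the data is being processed. */
void demo_account_aux(struct demo_stats *stats, uint64_t flags)
{
	stats->nr_aux_events += 1;
	if (flags & DEMO_AUX_FLAG_TRUNCATED)
		stats->total_aux_lost += 1;
}

/*
 * Called once at the end. Writes a warning into buf only if something
 * was lost; returns 0 when there is nothing to report.
 */
int demo_warn_aux_lost(const struct demo_stats *stats, char *buf, size_t size)
{
	if (stats->total_aux_lost == 0)
		return 0;

	return snprintf(buf, size,
			"AUX data lost %" PRIu64 " times out of %" PRIu32 "!",
			stats->total_aux_lost, stats->nr_aux_events);
}
```

Counting per event and warning once at the end keeps the per-event cost
to a flag test and an increment.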

* [tip:perf/core] perf tools: Add more documentation to export-to-postgresql.py script
  2015-09-25 13:15 ` [PATCH 07/25] perf tools: Add more documentation to export-to-postgresql.py script Adrian Hunter
@ 2015-09-29  8:43   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, acme, jolsa, tglx, adrian.hunter, hpa, linux-kernel

Commit-ID:  35ca01c117da9b8e5b60204f730cdde414735596
Gitweb:     http://git.kernel.org/tip/35ca01c117da9b8e5b60204f730cdde414735596
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:38 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 16:53:07 -0300

perf tools: Add more documentation to export-to-postgresql.py script

Add some comments to the script and some 'views' to the created database
that better illustrate the database structure and how it can be used.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-8-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/scripts/python/export-to-postgresql.py | 221 ++++++++++++++++++++++
 1 file changed, 221 insertions(+)

diff --git a/tools/perf/scripts/python/export-to-postgresql.py b/tools/perf/scripts/python/export-to-postgresql.py
index 84a3203..1b02cdc 100644
--- a/tools/perf/scripts/python/export-to-postgresql.py
+++ b/tools/perf/scripts/python/export-to-postgresql.py
@@ -61,6 +61,142 @@ import datetime
 #
 # An example of using the database is provided by the script
 # call-graph-from-postgresql.py.  Refer to that script for details.
+#
+# Tables:
+#
+#	The tables largely correspond to perf tools' data structures.  They are largely self-explanatory.
+#
+#	samples
+#
+#		'samples' is the main table. It represents what instruction was executing at a point in time
+#		when something (a selected event) happened.  The memory address is the instruction pointer or 'ip'.
+#
+#	calls
+#
+#		'calls' represents function calls and is related to 'samples' by 'call_id' and 'return_id'.
+#		'calls' is only created when the 'calls' option to this script is specified.
+#
+#	call_paths
+#
+#		'call_paths' represents all the call stacks.  Each 'call' has an associated record in 'call_paths'.
+#		'call_paths' is only created when the 'calls' option to this script is specified.
+#
+#	branch_types
+#
+#		'branch_types' provides descriptions for each type of branch.
+#
+#	comm_threads
+#
+#		'comm_threads' shows how 'comms' relates to 'threads'.
+#
+#	comms
+#
+#		'comms' contains a record for each 'comm' - the name given to the executable that is running.
+#
+#	dsos
+#
+#		'dsos' contains a record for each executable file or library.
+#
+#	machines
+#
+#		'machines' can be used to distinguish virtual machines if virtualization is supported.
+#
+#	selected_events
+#
+#		'selected_events' contains a record for each kind of event that has been sampled.
+#
+#	symbols
+#
+#		'symbols' contains a record for each symbol.  Only symbols that have samples are present.
+#
+#	threads
+#
+#		'threads' contains a record for each thread.
+#
+# Views:
+#
+#	Most of the tables have views for more friendly display.  The views are:
+#
+#		calls_view
+#		call_paths_view
+#		comm_threads_view
+#		dsos_view
+#		machines_view
+#		samples_view
+#		symbols_view
+#		threads_view
+#
+# More examples of browsing the database with psql:
+#   Note that some of the examples are not the most efficient SQL queries.
+#   Note that call information is only available if the script's 'calls' option has been used.
+#
+#	Top 10 function calls (not aggregated by symbol):
+#
+#		SELECT * FROM calls_view ORDER BY elapsed_time DESC LIMIT 10;
+#
+#	Top 10 function calls (aggregated by symbol):
+#
+#		SELECT symbol_id,(SELECT name FROM symbols WHERE id = symbol_id) AS symbol,
+#			SUM(elapsed_time) AS tot_elapsed_time,SUM(branch_count) AS tot_branch_count
+#			FROM calls_view GROUP BY symbol_id ORDER BY tot_elapsed_time DESC LIMIT 10;
+#
+#		Note that the branch count gives a rough estimation of cpu usage, so functions
+#		that took a long time but have a relatively low branch count must have spent time
+#		waiting.
+#
+#	Find symbols by pattern matching on part of the name (e.g. names containing 'alloc'):
+#
+#		SELECT * FROM symbols_view WHERE name LIKE '%alloc%';
+#
+#	Top 10 function calls for a specific symbol (e.g. whose symbol_id is 187):
+#
+#		SELECT * FROM calls_view WHERE symbol_id = 187 ORDER BY elapsed_time DESC LIMIT 10;
+#
+#	Show function calls made by a function in the same context (i.e. same call path) (e.g. one with call_path_id 254):
+#
+#		SELECT * FROM calls_view WHERE parent_call_path_id = 254;
+#
+#	Show branches made during a function call (e.g. where call_id is 29357 and return_id is 29370 and tid is 29670):
+#
+#		SELECT * FROM samples_view WHERE id >= 29357 AND id <= 29370 AND tid = 29670 AND event LIKE 'branches%';
+#
+#	Show transactions:
+#
+#		SELECT * FROM samples_view WHERE event = 'transactions';
+#
+#		Note that a transaction start has 'in_tx' true, whereas a transaction end has 'in_tx' false.
+#		Transaction aborts have branch_type_name 'transaction abort'
+#
+#	Show transaction aborts:
+#
+#		SELECT * FROM samples_view WHERE event = 'transactions' AND branch_type_name = 'transaction abort';
+#
+# Printing a call stack requires walking the call_paths table.  For example, with this python script:
+#   #!/usr/bin/python2
+#
+#   import sys
+#   from PySide.QtSql import *
+#
+#   if __name__ == '__main__':
+#           if (len(sys.argv) < 3):
+#                   print >> sys.stderr, "Usage is: printcallstack.py <database name> <call_path_id>"
+#                   raise Exception("Too few arguments")
+#           dbname = sys.argv[1]
+#           call_path_id = sys.argv[2]
+#           db = QSqlDatabase.addDatabase('QPSQL')
+#           db.setDatabaseName(dbname)
+#           if not db.open():
+#                   raise Exception("Failed to open database " + dbname + " error: " + db.lastError().text())
+#           query = QSqlQuery(db)
+#           print "    id          ip  symbol_id  symbol                          dso_id  dso_short_name"
+#           while call_path_id != 0 and call_path_id != 1:
+#                   ret = query.exec_('SELECT * FROM call_paths_view WHERE id = ' + str(call_path_id))
+#                   if not ret:
+#                           raise Exception("Query failed: " + query.lastError().text())
+#                   if not query.next():
+#                           raise Exception("Query failed")
+#                   print "{0:>6}  {1:>10}  {2:>9}  {3:<30}  {4:>6}  {5:<30}".format(query.value(0), query.value(1), query.value(2), query.value(3), query.value(4), query.value(5))
+#                   call_path_id = query.value(6)
 
 from PySide.QtSql import *
 
@@ -244,6 +380,91 @@ if perf_db_export_calls:
 		'parent_call_path_id	bigint,'
 		'flags		integer)')
 
+do_query(query, 'CREATE VIEW machines_view AS '
+	'SELECT '
+		'id,'
+		'pid,'
+		'root_dir,'
+		'CASE WHEN id=0 THEN \'unknown\' WHEN pid=-1 THEN \'host\' ELSE \'guest\' END AS host_or_guest'
+	' FROM machines')
+
+do_query(query, 'CREATE VIEW dsos_view AS '
+	'SELECT '
+		'id,'
+		'machine_id,'
+		'(SELECT host_or_guest FROM machines_view WHERE id = machine_id) AS host_or_guest,'
+		'short_name,'
+		'long_name,'
+		'build_id'
+	' FROM dsos')
+
+do_query(query, 'CREATE VIEW symbols_view AS '
+	'SELECT '
+		'id,'
+		'name,'
+		'(SELECT short_name FROM dsos WHERE id=dso_id) AS dso,'
+		'dso_id,'
+		'sym_start,'
+		'sym_end,'
+		'CASE WHEN binding=0 THEN \'local\' WHEN binding=1 THEN \'global\' ELSE \'weak\' END AS binding'
+	' FROM symbols')
+
+do_query(query, 'CREATE VIEW threads_view AS '
+	'SELECT '
+		'id,'
+		'machine_id,'
+		'(SELECT host_or_guest FROM machines_view WHERE id = machine_id) AS host_or_guest,'
+		'process_id,'
+		'pid,'
+		'tid'
+	' FROM threads')
+
+do_query(query, 'CREATE VIEW comm_threads_view AS '
+	'SELECT '
+		'comm_id,'
+		'(SELECT comm FROM comms WHERE id = comm_id) AS command,'
+		'thread_id,'
+		'(SELECT pid FROM threads WHERE id = thread_id) AS pid,'
+		'(SELECT tid FROM threads WHERE id = thread_id) AS tid'
+	' FROM comm_threads')
+
+if perf_db_export_calls:
+	do_query(query, 'CREATE VIEW call_paths_view AS '
+		'SELECT '
+			'c.id,'
+			'to_hex(c.ip) AS ip,'
+			'c.symbol_id,'
+			'(SELECT name FROM symbols WHERE id = c.symbol_id) AS symbol,'
+			'(SELECT dso_id FROM symbols WHERE id = c.symbol_id) AS dso_id,'
+			'(SELECT dso FROM symbols_view  WHERE id = c.symbol_id) AS dso_short_name,'
+			'c.parent_id,'
+			'to_hex(p.ip) AS parent_ip,'
+			'p.symbol_id AS parent_symbol_id,'
+			'(SELECT name FROM symbols WHERE id = p.symbol_id) AS parent_symbol,'
+			'(SELECT dso_id FROM symbols WHERE id = p.symbol_id) AS parent_dso_id,'
+			'(SELECT dso FROM symbols_view  WHERE id = p.symbol_id) AS parent_dso_short_name'
+		' FROM call_paths c INNER JOIN call_paths p ON p.id = c.parent_id')
+	do_query(query, 'CREATE VIEW calls_view AS '
+		'SELECT '
+			'calls.id,'
+			'thread_id,'
+			'(SELECT pid FROM threads WHERE id = thread_id) AS pid,'
+			'(SELECT tid FROM threads WHERE id = thread_id) AS tid,'
+			'(SELECT comm FROM comms WHERE id = comm_id) AS command,'
+			'call_path_id,'
+			'to_hex(ip) AS ip,'
+			'symbol_id,'
+			'(SELECT name FROM symbols WHERE id = symbol_id) AS symbol,'
+			'call_time,'
+			'return_time,'
+			'return_time - call_time AS elapsed_time,'
+			'branch_count,'
+			'call_id,'
+			'return_id,'
+			'CASE WHEN flags=1 THEN \'no call\' WHEN flags=2 THEN \'no return\' WHEN flags=3 THEN \'no call/return\' ELSE \'\' END AS flags,'
+			'parent_call_path_id'
+		' FROM calls INNER JOIN call_paths ON call_paths.id = call_path_id')
+
 do_query(query, 'CREATE VIEW samples_view AS '
 	'SELECT '
 		'id,'

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [tip:perf/core] perf auxtrace: Add option to synthesize branch stacks on samples
  2015-09-25 13:15 ` [PATCH 08/25] perf auxtrace: Add option to synthesize branch stacks on samples Adrian Hunter
@ 2015-09-29  8:43   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: adrian.hunter, tglx, mingo, linux-kernel, ak, hpa, acme, jolsa

Commit-ID:  601897b54c7ed492a89b262dccd7c6f7faf12b30
Gitweb:     http://git.kernel.org/tip/601897b54c7ed492a89b262dccd7c6f7faf12b30
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:39 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 16:53:44 -0300

perf auxtrace: Add option to synthesize branch stacks on samples

Add AUX area tracing option 'l' to synthesize branch stacks on samples
just like sample type PERF_SAMPLE_BRANCH_STACK.  This is used by Intel PT
in a subsequent patch.

Based-on-patch-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-9-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/itrace.txt |  4 ++++
 tools/perf/util/auxtrace.c          | 20 ++++++++++++++++++++
 tools/perf/util/auxtrace.h          |  4 ++++
 3 files changed, 28 insertions(+)

diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
index 2ff9466..65453f4 100644
--- a/tools/perf/Documentation/itrace.txt
+++ b/tools/perf/Documentation/itrace.txt
@@ -6,6 +6,7 @@
 		e	synthesize error events
 		d	create a debug log
 		g	synthesize a call chain (use with i or x)
+		l	synthesize last branch entries (use with i or x)
 
 	The default is all events i.e. the same as --itrace=ibxe
 
@@ -20,3 +21,6 @@
 
 	Also the call chain size (default 16, max. 1024) for instructions or
 	transactions events can be specified.
+
+	Also the number of last branch entries (default 64, max. 1024) for
+	instructions or transactions events can be specified.
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index c4993b2..7f10430 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -926,6 +926,8 @@ s64 perf_event__process_auxtrace(struct perf_tool *tool,
 #define PERF_ITRACE_DEFAULT_PERIOD		100000
 #define PERF_ITRACE_DEFAULT_CALLCHAIN_SZ	16
 #define PERF_ITRACE_MAX_CALLCHAIN_SZ		1024
+#define PERF_ITRACE_DEFAULT_LAST_BRANCH_SZ	64
+#define PERF_ITRACE_MAX_LAST_BRANCH_SZ		1024
 
 void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts)
 {
@@ -936,6 +938,7 @@ void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts)
 	synth_opts->period_type = PERF_ITRACE_DEFAULT_PERIOD_TYPE;
 	synth_opts->period = PERF_ITRACE_DEFAULT_PERIOD;
 	synth_opts->callchain_sz = PERF_ITRACE_DEFAULT_CALLCHAIN_SZ;
+	synth_opts->last_branch_sz = PERF_ITRACE_DEFAULT_LAST_BRANCH_SZ;
 }
 
 /*
@@ -1043,6 +1046,23 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 				synth_opts->callchain_sz = val;
 			}
 			break;
+		case 'l':
+			synth_opts->last_branch = true;
+			synth_opts->last_branch_sz =
+					PERF_ITRACE_DEFAULT_LAST_BRANCH_SZ;
+			while (*p == ' ' || *p == ',')
+				p += 1;
+			if (isdigit(*p)) {
+				unsigned int val;
+
+				val = strtoul(p, &endptr, 10);
+				p = endptr;
+				if (!val ||
+				    val > PERF_ITRACE_MAX_LAST_BRANCH_SZ)
+					goto out_err;
+				synth_opts->last_branch_sz = val;
+			}
+			break;
 		case ' ':
 		case ',':
 			break;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index bf72b77..b86f90db 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -63,7 +63,9 @@ enum itrace_period_type {
  * @calls: limit branch samples to calls (can be combined with @returns)
  * @returns: limit branch samples to returns (can be combined with @calls)
  * @callchain: add callchain to 'instructions' events
+ * @last_branch: add branch context to 'instructions' events
  * @callchain_sz: maximum callchain size
+ * @last_branch_sz: branch context size
  * @period: 'instructions' events period
  * @period_type: 'instructions' events period type
  */
@@ -79,7 +81,9 @@ struct itrace_synth_opts {
 	bool			calls;
 	bool			returns;
 	bool			callchain;
+	bool			last_branch;
 	unsigned int		callchain_sz;
+	unsigned int		last_branch_sz;
 	unsigned long long	period;
 	enum itrace_period_type	period_type;
 };

^ permalink raw reply related	[flat|nested] 66+ messages in thread
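
The parsing rule for the new 'l' option (optional count, default 64,
rejected if zero or above 1024) can be sketched on its own as below. The
function demo_parse_last_branch and the DEMO_* constants are hypothetical
and handle only a string that starts with 'l'; the real parser walks the
whole option string and handles the other letters too.

```c
#include <ctype.h>
#include <stdlib.h>

#define DEMO_DEFAULT_LAST_BRANCH_SZ	64
#define DEMO_MAX_LAST_BRANCH_SZ		1024

/*
 * Parse "l" optionally followed by separators and a decimal count.
 * On success store the count (or the default) in *sz and return 0.
 * Return -1 if the string does not start with 'l' or the count is
 * zero or larger than the maximum.
 */
int demo_parse_last_branch(const char *str, unsigned int *sz)
{
	const char *p = str;
	char *endptr;

	if (*p != 'l')
		return -1;
	p += 1;

	*sz = DEMO_DEFAULT_LAST_BRANCH_SZ;

	/* Spaces and commas are allowed between the letter and the number. */
	while (*p == ' ' || *p == ',')
		p += 1;

	if (isdigit((unsigned char)*p)) {
		unsigned long val = strtoul(p, &endptr, 10);

		if (!val || val > DEMO_MAX_LAST_BRANCH_SZ)
			return -1;
		*sz = (unsigned int)val;
	}

	return 0;
}
```

The sketch keeps the value in an unsigned long until after the range
check, so an oversized number is rejected rather than truncated.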

* [tip:perf/core] perf report: Adjust sample type validation for synthesized branch stacks
  2015-09-25 13:15 ` [PATCH 09/25] perf report: Adjust sample type validation for synthesized branch stacks Adrian Hunter
@ 2015-09-29  8:44   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:44 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, tglx, linux-kernel, adrian.hunter, hpa, jolsa, acme

Commit-ID:  c7eced63f2f67bd06ceb2269062416db9d81d29d
Gitweb:     http://git.kernel.org/tip/c7eced63f2f67bd06ceb2269062416db9d81d29d
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:40 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 16:54:21 -0300

perf report: Adjust sample type validation for synthesized branch stacks

perf report looks at event sample types to determine if branch stacks
have been sampled.  Adjust the validation to know about instruction
tracing options.

This change allows the use of the -b option which otherwise would
complain with an error like:

	Error:
	Selected -b but no branch data. Did you call perf record without -b?
	# To display the perf.data header info,
	# please use --header/--header-only options.
	#

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-10-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-report.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 0d53b48..7af35af 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -220,6 +220,9 @@ static int report__setup_sample_type(struct report *rep)
 	     !session->itrace_synth_opts->set))
 		sample_type |= PERF_SAMPLE_CALLCHAIN;
 
+	if (session->itrace_synth_opts->last_branch)
+		sample_type |= PERF_SAMPLE_BRANCH_STACK;
+
 	if (!is_pipe && !(sample_type & PERF_SAMPLE_CALLCHAIN)) {
 		if (sort__has_parent) {
 			ui__error("Selected --sort parent, but no "

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [tip:perf/core] perf report: Also do default setup for synthesized branch stacks
  2015-09-25 13:15 ` [PATCH 10/25] perf report: Also do default setup " Adrian Hunter
@ 2015-09-29  8:44   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:44 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: adrian.hunter, jolsa, hpa, linux-kernel, acme, tglx, mingo

Commit-ID:  fb9fab66e6e3ee737e521c899684c6d684b24a22
Gitweb:     http://git.kernel.org/tip/fb9fab66e6e3ee737e521c899684c6d684b24a22
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:41 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 16:54:45 -0300

perf report: Also do default setup for synthesized branch stacks

The 'perf report' tool will default to displaying branch stacks (-b
option) if they are present.  Make that also happen for synthesized
branch stacks.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-11-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-report.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 7af35af..92f7c5a 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -829,6 +829,9 @@ repeat:
 	has_br_stack = perf_header__has_feat(&session->header,
 					     HEADER_BRANCH_STACK);
 
+	if (itrace_synth_opts.last_branch)
+		has_br_stack = true;
+
 	/*
 	 * Branch mode is a tristate:
 	 * -1 means default, so decide based on the file having branch data.

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [tip:perf/core] perf report: Skip events with null branch stacks
  2015-09-25 13:15 ` [PATCH 11/25] perf report: Skip events with null " Adrian Hunter
@ 2015-09-29  8:44   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:44 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: adrian.hunter, hpa, tglx, mingo, linux-kernel, acme, jolsa

Commit-ID:  f86225db3aa0e394915af45eea1c3cca6f3e2dba
Gitweb:     http://git.kernel.org/tip/f86225db3aa0e394915af45eea1c3cca6f3e2dba
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:42 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 16:57:01 -0300

perf report: Skip events with null branch stacks

A non-synthesized event might not have a branch stack if branch stacks
have been synthesized (using itrace options).

An example of that is when Intel PT records sched_switch events for
decoding purposes.  Those sched_switch events do not have branch stacks
even though the Intel PT decoder may be synthesizing other events that
do due to the itrace options.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-12-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-report.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 92f7c5a..e94e5c7 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -163,14 +163,21 @@ static int process_sample_event(struct perf_tool *tool,
 	if (rep->cpu_list && !test_bit(sample->cpu, rep->cpu_bitmap))
 		goto out_put;
 
-	if (sort__mode == SORT_MODE__BRANCH)
+	if (sort__mode == SORT_MODE__BRANCH) {
+		/*
+		 * A non-synthesized event might not have a branch stack if
+		 * branch stacks have been synthesized (using itrace options).
+		 */
+		if (!sample->branch_stack)
+			goto out_put;
 		iter.ops = &hist_iter_branch;
-	else if (rep->mem_mode)
+	} else if (rep->mem_mode) {
 		iter.ops = &hist_iter_mem;
-	else if (symbol_conf.cumulate_callchain)
+	} else if (symbol_conf.cumulate_callchain) {
 		iter.ops = &hist_iter_cumulative;
-	else
+	} else {
 		iter.ops = &hist_iter_normal;
+	}
 
 	if (al.map != NULL)
 		al.map->dso->hit = 1;

^ permalink raw reply related	[flat|nested] 66+ messages in thread
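
The check added above amounts to a simple filter, sketched here with
hypothetical types. struct demo_sample, struct demo_branch_stack and
demo_should_process are illustration only and do not match the perf
structures beyond having a branch_stack pointer that may be NULL.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_branch_stack {
	unsigned long long nr;	/* number of branch entries */
};

struct demo_sample {
	/* NULL for events that carry no branch stack */
	const struct demo_branch_stack *branch_stack;
};

enum demo_sort_mode {
	DEMO_SORT_NORMAL,
	DEMO_SORT_BRANCH,
};

/*
 * In branch mode, samples without a branch stack are skipped instead
 * of being passed on to code that would dereference the pointer.
 */
bool demo_should_process(enum demo_sort_mode mode,
			 const struct demo_sample *sample)
{
	if (mode == DEMO_SORT_BRANCH && !sample->branch_stack)
		return false;

	return true;
}
```

In other sort modes the sample is processed whether or not it has a
branch stack, which matches the behaviour described in the commit message.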

* [tip:perf/core] perf inject: Set branch stack feature flag when synthesizing branch stacks
  2015-09-25 13:15 ` [PATCH 12/25] perf inject: Set branch stack feature flag when synthesizing " Adrian Hunter
@ 2015-09-29  8:45   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:45 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, acme, tglx, linux-kernel, adrian.hunter, jolsa, hpa

Commit-ID:  051a01b9a2c1c1ef3049973a43d9ed4ddcc946f3
Gitweb:     http://git.kernel.org/tip/051a01b9a2c1c1ef3049973a43d9ed4ddcc946f3
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:43 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 16:57:59 -0300

perf inject: Set branch stack feature flag when synthesizing branch stacks

The branch stack feature flag is set by 'perf record' when recording
data that contains branch stacks.  Consequently, when 'perf inject'
synthesizes branch stacks, the feature flag should be set also.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-13-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-inject.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index f62c49b..8638fad 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -537,9 +537,13 @@ static int __cmd_inject(struct perf_inject *inject)
 		 * The AUX areas have been removed and replaced with
 		 * synthesized hardware events, so clear the feature flag.
 		 */
-		if (inject->itrace_synth_opts.set)
+		if (inject->itrace_synth_opts.set) {
 			perf_header__clear_feat(&session->header,
 						HEADER_AUXTRACE);
+			if (inject->itrace_synth_opts.last_branch)
+				perf_header__set_feat(&session->header,
+						      HEADER_BRANCH_STACK);
+		}
 		session->header.data_offset = output_data_offset;
 		session->header.data_size = inject->bytes_written;
 		perf_session__write_header(session, session->evlist, fd, true);

^ permalink raw reply related	[flat|nested] 66+ messages in thread
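
The header bookkeeping in this patch can be sketched as follows, assuming the feature flags behave like a plain bitmap. The names toy_header, toy_set_feat, toy_clear_feat, toy_has_feat and toy_fixup_after_synthesis, as well as the bit positions, are hypothetical and do not reflect perf's real header layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; the real HEADER_* values differ. */
enum toy_feat {
	TOY_FEAT_BRANCH_STACK = 3,
	TOY_FEAT_AUXTRACE     = 5,
	TOY_FEAT_MAX          = 64,
};

struct toy_header {
	uint64_t feat_bits;
};

static void toy_set_feat(struct toy_header *h, enum toy_feat f)
{
	assert(f < TOY_FEAT_MAX);
	h->feat_bits |= UINT64_C(1) << f;
}

static void toy_clear_feat(struct toy_header *h, enum toy_feat f)
{
	assert(f < TOY_FEAT_MAX);
	h->feat_bits &= ~(UINT64_C(1) << f);
}

static bool toy_has_feat(const struct toy_header *h, enum toy_feat f)
{
	assert(f < TOY_FEAT_MAX);
	return (h->feat_bits >> f) & 1;
}

/*
 * Same shape as the patched code: once events have been synthesized the
 * AUX data is gone, and if branch stacks were synthesized the output now
 * contains them, so the header has to say so.
 */
static void toy_fixup_after_synthesis(struct toy_header *h, bool itrace_set,
				      bool last_branch)
{
	if (!itrace_set)
		return;
	toy_clear_feat(h, TOY_FEAT_AUXTRACE);
	if (last_branch)
		toy_set_feat(h, TOY_FEAT_BRANCH_STACK);
}
```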

* [tip:perf/core] perf intel-pt: Move branch filter logic
  2015-09-25 13:15 ` [PATCH 13/25] perf intel-pt: Move branch filter logic Adrian Hunter
@ 2015-09-29  8:45   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:45 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, adrian.hunter, mingo, acme, linux-kernel, tglx, jolsa

Commit-ID:  385e33063fb963f5cccb0a37fe539319b6481fa5
Gitweb:     http://git.kernel.org/tip/385e33063fb963f5cccb0a37fe539319b6481fa5
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:44 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 16:58:27 -0300

perf intel-pt: Move branch filter logic

intel_pt_synth_branch_sample() skips synthesizing if the branch does not
match the branch filter.  That logic was sitting in the middle of the
function but is more efficiently placed at the start of the function, so
move it.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-14-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/intel-pt.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index c8bb5ca..2c01e72 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -891,6 +891,9 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
 	union perf_event *event = ptq->event_buf;
 	struct perf_sample sample = { .ip = 0, };
 
+	if (pt->branches_filter && !(pt->branches_filter & ptq->flags))
+		return 0;
+
 	event->sample.header.type = PERF_RECORD_SAMPLE;
 	event->sample.header.misc = PERF_RECORD_MISC_USER;
 	event->sample.header.size = sizeof(struct perf_event_header);
@@ -909,9 +912,6 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
 	sample.flags = ptq->flags;
 	sample.insn_len = ptq->insn_len;
 
-	if (pt->branches_filter && !(pt->branches_filter & ptq->flags))
-		return 0;
-
 	if (pt->synth_opts.inject) {
 		ret = intel_pt_inject_event(event, &sample,
 					    pt->branches_sample_type,

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [tip:perf/core] perf intel-pt: Support generating branch stack
  2015-09-25 13:15 ` [PATCH 14/25] perf intel-pt: Support generating branch stack Adrian Hunter
@ 2015-09-29  8:45   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:45 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jolsa, hpa, ak, acme, adrian.hunter, mingo, tglx, linux-kernel

Commit-ID:  f14445ee72c59f32aa5cbf4d0f0330a5f62a752d
Gitweb:     http://git.kernel.org/tip/f14445ee72c59f32aa5cbf4d0f0330a5f62a752d
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:45 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 16:59:14 -0300

perf intel-pt: Support generating branch stack

Add support for generating branch stack context for PT samples.  The
decoder reports a configurable number of branches as branch context for
each sample. Internally it keeps track of them by using a simple sliding
window.  We also flush the last branch buffer on each sample to avoid
overlapping intervals.

This is useful for:

- Reporting accurate basic block edge frequencies through the perf
  report branch view
- Using with --branch-history to get the wider context of samples
- Other users of LBRs

Also the Documentation is updated.

Examples:

	Record with Intel PT:

		perf record -e intel_pt//u ls

	Branch stacks are used by default if synthesized so:

		perf report --itrace=ile

	is the same as:

		perf report --itrace=ile -b

	Branch history can be requested also:

		perf report --itrace=igle --branch-history

Based-on-patch-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-15-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/intel-pt.txt |  10 +++
 tools/perf/util/intel-pt.c            | 115 ++++++++++++++++++++++++++++++++++
 2 files changed, 125 insertions(+)

diff --git a/tools/perf/Documentation/intel-pt.txt b/tools/perf/Documentation/intel-pt.txt
index 886612b..a0fbb5d 100644
--- a/tools/perf/Documentation/intel-pt.txt
+++ b/tools/perf/Documentation/intel-pt.txt
@@ -671,6 +671,7 @@ The letters are:
 	e	synthesize tracing error events
 	d	create a debug log
 	g	synthesize a call chain (use with i or x)
+	l	synthesize last branch entries (use with i or x)
 
 "Instructions" events look like they were recorded by "perf record -e
 instructions".
@@ -718,6 +719,15 @@ transactions events can be specified. e.g.
 	--itrace=ig32
 	--itrace=xg32
 
+Also the number of last branch entries (default 64, max. 1024) for instructions or
+transactions events can be specified. e.g.
+
+       --itrace=il10
+       --itrace=xl10
+
+Note that last branch entries are cleared for each sample, so there is no overlap
+from one sample to the next.
+
 To disable trace decoding entirely, use the option --no-itrace.
 
 
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 2c01e72..05e8fcc51 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -22,6 +22,7 @@
 #include "../perf.h"
 #include "session.h"
 #include "machine.h"
+#include "sort.h"
 #include "tool.h"
 #include "event.h"
 #include "evlist.h"
@@ -115,6 +116,9 @@ struct intel_pt_queue {
 	void *decoder;
 	const struct intel_pt_state *state;
 	struct ip_callchain *chain;
+	struct branch_stack *last_branch;
+	struct branch_stack *last_branch_rb;
+	size_t last_branch_pos;
 	union perf_event *event_buf;
 	bool on_heap;
 	bool stop;
@@ -675,6 +679,19 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
 			goto out_free;
 	}
 
+	if (pt->synth_opts.last_branch) {
+		size_t sz = sizeof(struct branch_stack);
+
+		sz += pt->synth_opts.last_branch_sz *
+		      sizeof(struct branch_entry);
+		ptq->last_branch = zalloc(sz);
+		if (!ptq->last_branch)
+			goto out_free;
+		ptq->last_branch_rb = zalloc(sz);
+		if (!ptq->last_branch_rb)
+			goto out_free;
+	}
+
 	ptq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
 	if (!ptq->event_buf)
 		goto out_free;
@@ -732,6 +749,8 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
 
 out_free:
 	zfree(&ptq->event_buf);
+	zfree(&ptq->last_branch);
+	zfree(&ptq->last_branch_rb);
 	zfree(&ptq->chain);
 	free(ptq);
 	return NULL;
@@ -746,6 +765,8 @@ static void intel_pt_free_queue(void *priv)
 	thread__zput(ptq->thread);
 	intel_pt_decoder_free(ptq->decoder);
 	zfree(&ptq->event_buf);
+	zfree(&ptq->last_branch);
+	zfree(&ptq->last_branch_rb);
 	zfree(&ptq->chain);
 	free(ptq);
 }
@@ -876,6 +897,57 @@ static int intel_pt_setup_queues(struct intel_pt *pt)
 	return 0;
 }
 
+static inline void intel_pt_copy_last_branch_rb(struct intel_pt_queue *ptq)
+{
+	struct branch_stack *bs_src = ptq->last_branch_rb;
+	struct branch_stack *bs_dst = ptq->last_branch;
+	size_t nr = 0;
+
+	bs_dst->nr = bs_src->nr;
+
+	if (!bs_src->nr)
+		return;
+
+	nr = ptq->pt->synth_opts.last_branch_sz - ptq->last_branch_pos;
+	memcpy(&bs_dst->entries[0],
+	       &bs_src->entries[ptq->last_branch_pos],
+	       sizeof(struct branch_entry) * nr);
+
+	if (bs_src->nr >= ptq->pt->synth_opts.last_branch_sz) {
+		memcpy(&bs_dst->entries[nr],
+		       &bs_src->entries[0],
+		       sizeof(struct branch_entry) * ptq->last_branch_pos);
+	}
+}
+
+static inline void intel_pt_reset_last_branch_rb(struct intel_pt_queue *ptq)
+{
+	ptq->last_branch_pos = 0;
+	ptq->last_branch_rb->nr = 0;
+}
+
+static void intel_pt_update_last_branch_rb(struct intel_pt_queue *ptq)
+{
+	const struct intel_pt_state *state = ptq->state;
+	struct branch_stack *bs = ptq->last_branch_rb;
+	struct branch_entry *be;
+
+	if (!ptq->last_branch_pos)
+		ptq->last_branch_pos = ptq->pt->synth_opts.last_branch_sz;
+
+	ptq->last_branch_pos -= 1;
+
+	be              = &bs->entries[ptq->last_branch_pos];
+	be->from        = state->from_ip;
+	be->to          = state->to_ip;
+	be->flags.abort = !!(state->flags & INTEL_PT_ABORT_TX);
+	be->flags.in_tx = !!(state->flags & INTEL_PT_IN_TX);
+	/* No support for mispredict */
+
+	if (bs->nr < ptq->pt->synth_opts.last_branch_sz)
+		bs->nr += 1;
+}
+
 static int intel_pt_inject_event(union perf_event *event,
 				 struct perf_sample *sample, u64 type,
 				 bool swapped)
@@ -890,6 +962,10 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
 	struct intel_pt *pt = ptq->pt;
 	union perf_event *event = ptq->event_buf;
 	struct perf_sample sample = { .ip = 0, };
+	struct dummy_branch_stack {
+		u64			nr;
+		struct branch_entry	entries;
+	} dummy_bs;
 
 	if (pt->branches_filter && !(pt->branches_filter & ptq->flags))
 		return 0;
@@ -912,6 +988,21 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
 	sample.flags = ptq->flags;
 	sample.insn_len = ptq->insn_len;
 
+	/*
+	 * perf report cannot handle events without a branch stack when using
+	 * SORT_MODE__BRANCH so make a dummy one.
+	 */
+	if (pt->synth_opts.last_branch && sort__mode == SORT_MODE__BRANCH) {
+		dummy_bs = (struct dummy_branch_stack){
+			.nr = 1,
+			.entries = {
+				.from = sample.ip,
+				.to = sample.addr,
+			},
+		};
+		sample.branch_stack = (struct branch_stack *)&dummy_bs;
+	}
+
 	if (pt->synth_opts.inject) {
 		ret = intel_pt_inject_event(event, &sample,
 					    pt->branches_sample_type,
@@ -961,6 +1052,11 @@ static int intel_pt_synth_instruction_sample(struct intel_pt_queue *ptq)
 		sample.callchain = ptq->chain;
 	}
 
+	if (pt->synth_opts.last_branch) {
+		intel_pt_copy_last_branch_rb(ptq);
+		sample.branch_stack = ptq->last_branch;
+	}
+
 	if (pt->synth_opts.inject) {
 		ret = intel_pt_inject_event(event, &sample,
 					    pt->instructions_sample_type,
@@ -974,6 +1070,9 @@ static int intel_pt_synth_instruction_sample(struct intel_pt_queue *ptq)
 		pr_err("Intel Processor Trace: failed to deliver instruction event, error %d\n",
 		       ret);
 
+	if (pt->synth_opts.last_branch)
+		intel_pt_reset_last_branch_rb(ptq);
+
 	return ret;
 }
 
@@ -1008,6 +1107,11 @@ static int intel_pt_synth_transaction_sample(struct intel_pt_queue *ptq)
 		sample.callchain = ptq->chain;
 	}
 
+	if (pt->synth_opts.last_branch) {
+		intel_pt_copy_last_branch_rb(ptq);
+		sample.branch_stack = ptq->last_branch;
+	}
+
 	if (pt->synth_opts.inject) {
 		ret = intel_pt_inject_event(event, &sample,
 					    pt->transactions_sample_type,
@@ -1021,6 +1125,9 @@ static int intel_pt_synth_transaction_sample(struct intel_pt_queue *ptq)
 		pr_err("Intel Processor Trace: failed to deliver transaction event, error %d\n",
 		       ret);
 
+	if (pt->synth_opts.last_branch)
+		intel_pt_reset_last_branch_rb(ptq);
+
 	return ret;
 }
 
@@ -1116,6 +1223,9 @@ static int intel_pt_sample(struct intel_pt_queue *ptq)
 			return err;
 	}
 
+	if (pt->synth_opts.last_branch)
+		intel_pt_update_last_branch_rb(ptq);
+
 	if (!pt->sync_switch)
 		return 0;
 
@@ -1763,6 +1873,8 @@ static int intel_pt_synth_events(struct intel_pt *pt,
 		pt->instructions_sample_period = attr.sample_period;
 		if (pt->synth_opts.callchain)
 			attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
+		if (pt->synth_opts.last_branch)
+			attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
 		pr_debug("Synthesizing 'instructions' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
 			 id, (u64)attr.sample_type);
 		err = intel_pt_synth_event(session, &attr, id);
@@ -1782,6 +1894,8 @@ static int intel_pt_synth_events(struct intel_pt *pt,
 		attr.sample_period = 1;
 		if (pt->synth_opts.callchain)
 			attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
+		if (pt->synth_opts.last_branch)
+			attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
 		pr_debug("Synthesizing 'transactions' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
 			 id, (u64)attr.sample_type);
 		err = intel_pt_synth_event(session, &attr, id);
@@ -1808,6 +1922,7 @@ static int intel_pt_synth_events(struct intel_pt *pt,
 		attr.sample_period = 1;
 		attr.sample_type |= PERF_SAMPLE_ADDR;
 		attr.sample_type &= ~(u64)PERF_SAMPLE_CALLCHAIN;
+		attr.sample_type &= ~(u64)PERF_SAMPLE_BRANCH_STACK;
 		pr_debug("Synthesizing 'branches' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
 			 id, (u64)attr.sample_type);
 		err = intel_pt_synth_event(session, &attr, id);

^ permalink raw reply related	[flat|nested] 66+ messages in thread
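
The sliding window described in the commit message can be sketched outside of perf. This is a simplified model, assuming a fixed window size; the names toy_queue, toy_update, toy_copy, toy_reset and TOY_LBR_SZ are hypothetical, with TOY_LBR_SZ standing in for synth_opts.last_branch_sz:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_LBR_SZ 4	/* hypothetical window size */

struct toy_branch {
	uint64_t from, to;
};

struct toy_stack {
	uint64_t nr;
	struct toy_branch entries[TOY_LBR_SZ];
};

struct toy_queue {
	struct toy_stack rb;	/* ring buffer, filled from the end backwards */
	struct toy_stack out;	/* linear copy, newest branch first */
	size_t pos;		/* index of the newest entry in rb */
};

/* Record one branch; the oldest entry is overwritten once the ring is full. */
static void toy_update(struct toy_queue *q, uint64_t from, uint64_t to)
{
	if (!q->pos)
		q->pos = TOY_LBR_SZ;
	q->pos -= 1;

	q->rb.entries[q->pos].from = from;
	q->rb.entries[q->pos].to = to;

	if (q->rb.nr < TOY_LBR_SZ)
		q->rb.nr += 1;
}

/* Linearize the ring into q->out so that entries[0] is the newest branch. */
static void toy_copy(struct toy_queue *q)
{
	size_t nr;

	assert(q->pos < TOY_LBR_SZ);

	q->out.nr = q->rb.nr;
	if (!q->rb.nr)
		return;

	/* Newest part: from pos up to the end of the ring. */
	nr = TOY_LBR_SZ - q->pos;
	memcpy(&q->out.entries[0], &q->rb.entries[q->pos],
	       sizeof(struct toy_branch) * nr);

	/* Older part: only valid once the ring has wrapped. */
	if (q->rb.nr >= TOY_LBR_SZ)
		memcpy(&q->out.entries[nr], &q->rb.entries[0],
		       sizeof(struct toy_branch) * q->pos);
}

/* Start a fresh window so consecutive samples do not overlap. */
static void toy_reset(struct toy_queue *q)
{
	q->pos = 0;
	q->rb.nr = 0;
}
```

Writing backwards means the newest branch always sits at the lowest live index, so linearizing takes at most two memcpy() calls and no per-entry reordering.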

* [tip:perf/core] perf report: Make max_stack value allow for synthesized callchains
  2015-09-25 13:15 ` [PATCH 15/25] perf report: Make max_stack value allow for synthesized callchains Adrian Hunter
  2015-09-28 20:03   ` Arnaldo Carvalho de Melo
@ 2015-09-29  8:46   ` tip-bot for Adrian Hunter
  1 sibling, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, acme, linux-kernel, jolsa, adrian.hunter, tglx, hpa

Commit-ID:  188bb5e2ce112463428994f91291e5df6fc05521
Gitweb:     http://git.kernel.org/tip/188bb5e2ce112463428994f91291e5df6fc05521
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:46 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 17:03:20 -0300

perf report: Make max_stack value allow for synthesized callchains

perf report has an option (--max-stack) to set the maximum stack depth
when processing callchains.  The option defaults to the hard-coded
maximum definition PERF_MAX_STACK_DEPTH which is 127.  The intention of
the option is to allow the user to reduce the processing time by
reducing the amount of the callchain that is processed.

It is also possible, when processing instruction traces, to synthesize
callchains.  Synthesized callchains do not have the kernel size
limitation and are whatever size the user requests, although validation
presently prevents the user from requesting a value greater than 1024.
The default value is 16.

To allow for synthesized callchains, make the max_stack value at least
the same size as the synthesized callchain size.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-16-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-report.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index e94e5c7..37c9f51 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -809,6 +809,10 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (report.inverted_callchain)
 		callchain_param.order = ORDER_CALLER;
 
+	if (itrace_synth_opts.callchain &&
+	    (int)itrace_synth_opts.callchain_sz > report.max_stack)
+		report.max_stack = itrace_synth_opts.callchain_sz;
+
 	if (!input_name || !strlen(input_name)) {
 		if (!fstat(STDIN_FILENO, &st) && S_ISFIFO(st.st_mode))
 			input_name = "-";

^ permalink raw reply related	[flat|nested] 66+ messages in thread
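
The adjustment made by this patch amounts to taking the larger of two limits. A minimal sketch, assuming the limits stated in the commit message (127 and 1024); toy_effective_max_stack and the TOY_* macros are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_STACK_DEPTH	127	/* PERF_MAX_STACK_DEPTH per the message */
#define TOY_MAX_SYNTH_SZ	1024	/* validation limit per the message */

/*
 * Return the stack depth to use: the configured max_stack, raised to the
 * synthesized callchain size when callchains are being synthesized.
 */
static int toy_effective_max_stack(int max_stack, bool synth_callchain,
				   unsigned int callchain_sz)
{
	assert(callchain_sz <= TOY_MAX_SYNTH_SZ);

	if (synth_callchain && (int)callchain_sz > max_stack)
		return (int)callchain_sz;
	return max_stack;
}
```

Note that the value is only ever raised, so a user who lowers --max-stack below the synthesized callchain size will still have it lifted to that size.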

* [tip:perf/core] perf hists: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
  2015-09-25 13:15 ` [PATCH 16/25] perf hists: Allow for max_stack greater than PERF_MAX_STACK_DEPTH Adrian Hunter
@ 2015-09-29  8:46   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, mingo, linux-kernel, acme, hpa, adrian.hunter, jolsa

Commit-ID:  96b40f3c05f36e061fd4dde920b9e9c795a88b69
Gitweb:     http://git.kernel.org/tip/96b40f3c05f36e061fd4dde920b9e9c795a88b69
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:47 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 17:06:16 -0300

perf hists: Allow for max_stack greater than PERF_MAX_STACK_DEPTH

Use the max_stack value instead of PERF_MAX_STACK_DEPTH so that
arbitrary-sized callchains can be supported.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-17-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/hist.c | 6 ++++--
 tools/perf/util/hist.h | 1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index b3567a2..0cad9e0 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -695,7 +695,7 @@ iter_finish_normal_entry(struct hist_entry_iter *iter,
 }
 
 static int
-iter_prepare_cumulative_entry(struct hist_entry_iter *iter __maybe_unused,
+iter_prepare_cumulative_entry(struct hist_entry_iter *iter,
 			      struct addr_location *al __maybe_unused)
 {
 	struct hist_entry **he_cache;
@@ -707,7 +707,7 @@ iter_prepare_cumulative_entry(struct hist_entry_iter *iter __maybe_unused,
 	 * cumulated only one time to prevent entries more than 100%
 	 * overhead.
 	 */
-	he_cache = malloc(sizeof(*he_cache) * (PERF_MAX_STACK_DEPTH + 1));
+	he_cache = malloc(sizeof(*he_cache) * (iter->max_stack + 1));
 	if (he_cache == NULL)
 		return -ENOMEM;
 
@@ -868,6 +868,8 @@ int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
 	if (err)
 		return err;
 
+	iter->max_stack = max_stack_depth;
+
 	err = iter->ops->prepare_entry(iter, al);
 	if (err)
 		goto out;
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 4d6aa1d..8c20a8f 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -90,6 +90,7 @@ struct hist_entry_iter {
 	int curr;
 
 	bool hide_unresolved;
+	int max_stack;
 
 	struct perf_evsel *evsel;
 	struct perf_sample *sample;

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [tip:perf/core] perf script: Add a setting for maximum stack depth
  2015-09-25 13:15 ` [PATCH 18/25] perf script: Add a setting for maximum stack depth Adrian Hunter
@ 2015-09-29  8:46   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: adrian.hunter, linux-kernel, mingo, jolsa, tglx, acme, hpa

Commit-ID:  03cd1fed2b8730271d3a8dbabd87989abddc33c4
Gitweb:     http://git.kernel.org/tip/03cd1fed2b8730271d3a8dbabd87989abddc33c4
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:49 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 17:08:48 -0300

perf script: Add a setting for maximum stack depth

Add a setting for maximum stack depth in preparation for allowing for
synthesized callchains.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-19-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-script.c | 6 ++++--
 tools/perf/util/session.c   | 2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 0928439..a65b498 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -33,6 +33,8 @@ static bool			nanosecs;
 static const char		*cpu_list;
 static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
 
+static unsigned int scripting_max_stack = PERF_MAX_STACK_DEPTH;
+
 enum perf_output_field {
 	PERF_OUTPUT_COMM            = 1U << 0,
 	PERF_OUTPUT_TID             = 1U << 1,
@@ -475,7 +477,7 @@ static void print_sample_bts(union perf_event *event,
 			}
 		}
 		perf_evsel__print_ip(evsel, sample, al, print_opts,
-				     PERF_MAX_STACK_DEPTH);
+				     scripting_max_stack);
 	}
 
 	/* print branch_to information */
@@ -552,7 +554,7 @@ static void process_event(union perf_event *event, struct perf_sample *sample,
 
 		perf_evsel__print_ip(evsel, sample, al,
 				     output[attr->type].print_ip_opts,
-				     PERF_MAX_STACK_DEPTH);
+				     scripting_max_stack);
 	}
 
 	if (PRINT_FIELD(IREGS))
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 15c84ca..84a02eae 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1800,7 +1800,7 @@ void perf_evsel__print_ip(struct perf_evsel *evsel, struct perf_sample *sample,
 
 		if (thread__resolve_callchain(al->thread, evsel,
 					      sample, NULL, NULL,
-					      PERF_MAX_STACK_DEPTH) != 0) {
+					      stack_depth) != 0) {
 			if (verbose)
 				error("Failed to resolve callchain. Skipping\n");
 			return;

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [tip:perf/core] perf scripting python: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
  2015-09-25 13:15 ` [PATCH 19/25] perf scripting python: Allow for max_stack greater than PERF_MAX_STACK_DEPTH Adrian Hunter
@ 2015-09-29  8:47   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:47 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, adrian.hunter, jolsa, acme, tglx, mingo

Commit-ID:  44cbe7295c3808977159f500a5bcdebf12a7db5f
Gitweb:     http://git.kernel.org/tip/44cbe7295c3808977159f500a5bcdebf12a7db5f
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:50 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 17:09:12 -0300

perf scripting python: Allow for max_stack greater than PERF_MAX_STACK_DEPTH

Use the scripting_max_stack value to allow for values greater than
PERF_MAX_STACK_DEPTH.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-20-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-script.c                            | 2 +-
 tools/perf/util/scripting-engines/trace-event-python.c | 2 +-
 tools/perf/util/trace-event.h                          | 2 ++
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index a65b498..5c3c02d 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -33,7 +33,7 @@ static bool			nanosecs;
 static const char		*cpu_list;
 static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
 
-static unsigned int scripting_max_stack = PERF_MAX_STACK_DEPTH;
+unsigned int scripting_max_stack = PERF_MAX_STACK_DEPTH;
 
 enum perf_output_field {
 	PERF_OUTPUT_COMM            = 1U << 0,
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index aa9e125..a8e825f 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -319,7 +319,7 @@ static PyObject *python_process_callchain(struct perf_sample *sample,
 
 	if (thread__resolve_callchain(al->thread, evsel,
 				      sample, NULL, NULL,
-				      PERF_MAX_STACK_DEPTH) != 0) {
+				      scripting_max_stack) != 0) {
 		pr_err("Failed to resolve callchain. Skipping\n");
 		goto exit;
 	}
diff --git a/tools/perf/util/trace-event.h b/tools/perf/util/trace-event.h
index da6cc4c..b85ee55 100644
--- a/tools/perf/util/trace-event.h
+++ b/tools/perf/util/trace-event.h
@@ -78,6 +78,8 @@ struct scripting_ops {
 	int (*generate_script) (struct pevent *pevent, const char *outfile);
 };
 
+extern unsigned int scripting_max_stack;
+
 int script_spec_register(const char *spec, struct scripting_ops *ops);
 
 void setup_perl_scripting(void);

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [tip:perf/core] perf script: Make scripting_max_stack value allow for synthesized callchains
  2015-09-25 13:15 ` [PATCH 20/25] perf script: Make scripting_max_stack value allow for synthesized callchains Adrian Hunter
@ 2015-09-29  8:47   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:47 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jolsa, adrian.hunter, linux-kernel, mingo, tglx, acme, hpa

Commit-ID:  3c5b645faee7afbd417f6127694adbd26778a9eb
Gitweb:     http://git.kernel.org/tip/3c5b645faee7afbd417f6127694adbd26778a9eb
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:51 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 17:09:41 -0300

perf script: Make scripting_max_stack value allow for synthesized callchains

perf script has a setting to set the maximum stack depth when processing
callchains.  The setting defaults to the hard-coded maximum definition
PERF_MAX_STACK_DEPTH which is 127.

It is possible, when processing instruction traces, to synthesize
callchains.  Synthesized callchains do not have the kernel size
limitation and are whatever size the user requests, although validation
presently prevents the user from requesting a value greater than 1024.
The default value is 16.

To allow for synthesized callchains, make the scripting_max_stack value
at least the same size as the synthesized callchain size.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-21-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-script.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 5c3c02d..8ce1c6b 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1748,6 +1748,10 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 		}
 	}
 
+	if (itrace_synth_opts.callchain &&
+	    itrace_synth_opts.callchain_sz > scripting_max_stack)
+		scripting_max_stack = itrace_synth_opts.callchain_sz;
+
 	/* make sure PERF_EXEC_PATH is set for scripts */
 	perf_set_argv_exec_path(perf_exec_path());
 

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [tip:perf/core] perf evlist: Add perf_evlist__id2evsel_strict()
  2015-09-25 13:15 ` [PATCH 21/25] perf tools: Add perf_evlist__id2evsel_strict() Adrian Hunter
@ 2015-09-29  8:47   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:47 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jolsa, hpa, adrian.hunter, mingo, linux-kernel, acme, tglx

Commit-ID:  dddcf6abbf5946f9ec1183dd2099cede6dbe12fc
Gitweb:     http://git.kernel.org/tip/dddcf6abbf5946f9ec1183dd2099cede6dbe12fc
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:52 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 17:11:00 -0300

perf evlist: Add perf_evlist__id2evsel_strict()

perf_evlist__id2evsel_strict() is the same as perf_evlist__id2evsel()
except that it returns an evsel only when the id matches exactly.

This will be used by perf inject to find a specific evsel that is to be
deleted, hence the need to match exactly.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-22-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evlist.c | 15 +++++++++++++++
 tools/perf/util/evlist.h |  2 ++
 2 files changed, 17 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index a864373..e676038 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -617,6 +617,21 @@ struct perf_evsel *perf_evlist__id2evsel(struct perf_evlist *evlist, u64 id)
 	return NULL;
 }
 
+struct perf_evsel *perf_evlist__id2evsel_strict(struct perf_evlist *evlist,
+						u64 id)
+{
+	struct perf_sample_id *sid;
+
+	if (!id)
+		return NULL;
+
+	sid = perf_evlist__id2sid(evlist, id);
+	if (sid)
+		return sid->evsel;
+
+	return NULL;
+}
+
 static int perf_evlist__event2id(struct perf_evlist *evlist,
 				 union perf_event *event, u64 *id)
 {
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 115d8b5..0edf0d4 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -104,6 +104,8 @@ int perf_evlist__filter_pollfd(struct perf_evlist *evlist, short revents_and_mas
 int perf_evlist__poll(struct perf_evlist *evlist, int timeout);
 
 struct perf_evsel *perf_evlist__id2evsel(struct perf_evlist *evlist, u64 id);
+struct perf_evsel *perf_evlist__id2evsel_strict(struct perf_evlist *evlist,
+						u64 id);
 
 struct perf_sample_id *perf_evlist__id2sid(struct perf_evlist *evlist, u64 id);
 

^ permalink raw reply related	[flat|nested] 66+ messages in thread
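
The difference between a lenient and a strict lookup can be sketched with a small table. This is an illustration only: the lenient variant here is assumed to fall back to the first entry when nothing matches, and toy_evsel, toy_id2evsel and toy_id2evsel_strict are hypothetical names using a linear scan rather than perf's id hash:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_evsel {
	uint64_t id;
	const char *name;
};

/* Lenient lookup: assumed to fall back to the first entry on no match. */
static const struct toy_evsel *toy_id2evsel(const struct toy_evsel *tbl,
					    size_t n, uint64_t id)
{
	assert(tbl != NULL && n > 0);

	for (size_t i = 0; i < n; i++)
		if (id && tbl[i].id == id)
			return &tbl[i];
	return &tbl[0];
}

/* Strict lookup: NULL unless a non-zero id matches exactly. */
static const struct toy_evsel *toy_id2evsel_strict(const struct toy_evsel *tbl,
						   size_t n, uint64_t id)
{
	assert(tbl != NULL && n > 0);

	if (!id)
		return NULL;
	for (size_t i = 0; i < n; i++)
		if (tbl[i].id == id)
			return &tbl[i];
	return NULL;
}
```

A caller that is about to delete the result needs the strict form, since a fallback entry would be the wrong thing to delete.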

* [tip:perf/core] perf evlist: Add perf_evlist__remove()
  2015-09-25 13:15 ` [PATCH 22/25] perf tools: Add perf_evlist__del() Adrian Hunter
  2015-09-28 13:33   ` Arnaldo Carvalho de Melo
@ 2015-09-29  8:48   ` tip-bot for Adrian Hunter
  1 sibling, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: adrian.hunter, tglx, acme, mingo, linux-kernel, jolsa, hpa

Commit-ID:  4768230ad57d4e4fc6d36c44e98e0062c89b0dc0
Gitweb:     http://git.kernel.org/tip/4768230ad57d4e4fc6d36c44e98e0062c89b0dc0
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:53 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 17:15:31 -0300

perf evlist: Add perf_evlist__remove()

Add a counterpart to perf_evlist__add() that does the opposite and
deletes the evsel.

This will be used by perf inject to remove unwanted evsels.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-23-git-send-email-adrian.hunter@intel.com
[ Renamed it from perf_evlist__del() to perf_evlist__remove() and removed the perf_evsel__delete() call ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evlist.c | 7 +++++++
 tools/perf/util/evlist.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index e676038..8954622 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -165,6 +165,13 @@ void perf_evlist__add(struct perf_evlist *evlist, struct perf_evsel *entry)
 	__perf_evlist__propagate_maps(evlist, entry);
 }
 
+void perf_evlist__remove(struct perf_evlist *evlist, struct perf_evsel *evsel)
+{
+	evsel->evlist = NULL;
+	list_del_init(&evsel->node);
+	evlist->nr_entries -= 1;
+}
+
 void perf_evlist__splice_list_tail(struct perf_evlist *evlist,
 				   struct list_head *list)
 {
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 0edf0d4..66bc9d4 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -73,6 +73,7 @@ void perf_evlist__exit(struct perf_evlist *evlist);
 void perf_evlist__delete(struct perf_evlist *evlist);
 
 void perf_evlist__add(struct perf_evlist *evlist, struct perf_evsel *entry);
+void perf_evlist__remove(struct perf_evlist *evlist, struct perf_evsel *evsel);
 int perf_evlist__add_default(struct perf_evlist *evlist);
 int __perf_evlist__add_default_attrs(struct perf_evlist *evlist,
 				     struct perf_event_attr *attrs, size_t nr_attrs);

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [tip:perf/core] perf inject: Remove more aux-related stuff when processing instruction traces
  2015-09-25 13:15 ` [PATCH 23/25] perf inject: Remove more aux-related stuff when processing instruction traces Adrian Hunter
@ 2015-09-29  8:48   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, mingo, acme, adrian.hunter, tglx, linux-kernel, jolsa

Commit-ID:  73117308f953afb60a1383725b7d5372feeb2433
Gitweb:     http://git.kernel.org/tip/73117308f953afb60a1383725b7d5372feeb2433
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:54 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 17:17:15 -0300

perf inject: Remove more aux-related stuff when processing instruction traces

perf inject can process instruction traces (using the --itrace option)
which removes aux-related events and replaces them with the requested
synthesized events.

However there are still some leftovers, namely PERF_RECORD_ITRACE_START
events and the original evsel (selected event) e.g. intel_pt//

For the sake of completeness, remove them too.
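
The handler added below does two things at once: it drops the event by
not repiping it, and it records the sample id of the first aux-related
event so the matching evsel can be looked up and removed afterwards.  A
standalone sketch of that "remember the first id" idiom follows; struct
inject_state, struct sample and drop_aux() are hypothetical
simplifications, not the perf types.

```c
#include <stdint.h>

/* Hypothetical, cut-down versions of the state used by the handler. */
struct inject_state {
	uint64_t aux_id;	/* 0 means "not seen yet" */
};

struct sample {
	uint64_t id;
};

/* Drop the event, but latch the id of the first one that is seen. */
static int drop_aux(struct inject_state *st, const struct sample *s)
{
	if (!st->aux_id)
		st->aux_id = s->id;

	return 0;	/* returning without writing anything drops the event */
}
```

Note the sketch, like the patch, treats an id of zero as "unset", so a
sample whose id really is zero would never be latched.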

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-24-git-send-email-adrian.hunter@intel.com
[ Made it use perf_evlist__remove() + perf_evsel__delete() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-inject.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 8638fad..9b6119f 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -31,6 +31,7 @@ struct perf_inject {
 	const char		*input_name;
 	struct perf_data_file	output;
 	u64			bytes_written;
+	u64			aux_id;
 	struct list_head	samples;
 	struct itrace_synth_opts itrace_synth_opts;
 };
@@ -176,6 +177,19 @@ static int perf_event__repipe(struct perf_tool *tool,
 	return perf_event__repipe_synth(tool, event);
 }
 
+static int perf_event__drop_aux(struct perf_tool *tool,
+				union perf_event *event __maybe_unused,
+				struct perf_sample *sample,
+				struct machine *machine __maybe_unused)
+{
+	struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
+
+	if (!inject->aux_id)
+		inject->aux_id = sample->id;
+
+	return 0;
+}
+
 typedef int (*inject_handler)(struct perf_tool *tool,
 			      union perf_event *event,
 			      struct perf_sample *sample,
@@ -512,6 +526,8 @@ static int __cmd_inject(struct perf_inject *inject)
 		inject->tool.id_index	    = perf_event__repipe_id_index;
 		inject->tool.auxtrace_info  = perf_event__process_auxtrace_info;
 		inject->tool.auxtrace	    = perf_event__process_auxtrace;
+		inject->tool.aux	    = perf_event__drop_aux;
+		inject->tool.itrace_start   = perf_event__drop_aux;
 		inject->tool.ordered_events = true;
 		inject->tool.ordering_requires_timestamps = true;
 		/* Allow space in the header for new attributes */
@@ -535,14 +551,25 @@ static int __cmd_inject(struct perf_inject *inject)
 		}
 		/*
 		 * The AUX areas have been removed and replaced with
-		 * synthesized hardware events, so clear the feature flag.
+		 * synthesized hardware events, so clear the feature flag and
+		 * remove the evsel.
 		 */
 		if (inject->itrace_synth_opts.set) {
+			struct perf_evsel *evsel;
+
 			perf_header__clear_feat(&session->header,
 						HEADER_AUXTRACE);
 			if (inject->itrace_synth_opts.last_branch)
 				perf_header__set_feat(&session->header,
 						      HEADER_BRANCH_STACK);
+			evsel = perf_evlist__id2evsel_strict(session->evlist,
+							     inject->aux_id);
+			if (evsel) {
+				pr_debug("Deleting %s\n",
+					 perf_evsel__name(evsel));
+				perf_evlist__remove(session->evlist, evsel);
+				perf_evsel__delete(evsel);
+			}
 		}
 		session->header.data_offset = output_data_offset;
 		session->header.data_size = inject->bytes_written;

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [tip:perf/core] perf inject: Add --strip option to strip out non-synthesized events
  2015-09-25 13:15 ` [PATCH 24/25] perf inject: Add --strip option to strip out non-synthesized events Adrian Hunter
@ 2015-09-29  8:49   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: adrian.hunter, mingo, jolsa, linux-kernel, hpa, tglx, acme

Commit-ID:  f56fb9864c501dc85ebe40af5bf925dd07d990c0
Gitweb:     http://git.kernel.org/tip/f56fb9864c501dc85ebe40af5bf925dd07d990c0
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:55 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 17:19:51 -0300

perf inject: Add --strip option to strip out non-synthesized events

Add a new option --strip which is used with --itrace to strip out
non-synthesized events.  This results in a perf.data file that is
simpler for external tools to parse.  In particular, this can be used to
prepare a perf.data file for consumption by autofdo.

A subsequent patch makes a change to Intel PT also to enable use with
autofdo and gives an example of that use.
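
The subtle part of the patch is deciding when a non-synthesized evsel
can really be removed, because tracking events such as MMAP still need
a selected event to refer to.  The rule can be sketched standalone as
below.  struct ev, its fields and can_remove() are hypothetical names
for illustration; the mask is passed in rather than built from the
PERF_SAMPLE_* bits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one selected event. */
struct ev {
	bool kept;		/* true if its samples are kept (synthesized) */
	bool tracking;		/* true if it carries mmap/comm/task events */
	uint64_t sample_type;
};

/*
 * An event with no tracking duties can always go.  A tracking event can
 * only go if exactly one kept event remains and that event agrees with
 * it on the sample_type bits selected by mask.
 */
static bool can_remove(const struct ev *evs, size_t n,
		       const struct ev *victim, uint64_t mask)
{
	size_t kept = 0;
	bool compatible = false;
	size_t i;

	if (!victim->tracking)
		return true;

	for (i = 0; i < n; i++) {
		if (!evs[i].kept)
			continue;
		kept += 1;
		if ((evs[i].sample_type & mask) ==
		    (victim->sample_type & mask))
			compatible = true;
	}

	return compatible && kept == 1;
}
```

With two or more kept events the tracking evsel stays, even when one of
them has a compatible sample type.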

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-25-git-send-email-adrian.hunter@intel.com
[ Made it use perf_evlist__remove() + perf_evsel__delete() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-inject.txt |  3 ++
 tools/perf/builtin-inject.c              | 92 ++++++++++++++++++++++++++++++++
 2 files changed, 95 insertions(+)

diff --git a/tools/perf/Documentation/perf-inject.txt b/tools/perf/Documentation/perf-inject.txt
index 0c721c3..0b1cede 100644
--- a/tools/perf/Documentation/perf-inject.txt
+++ b/tools/perf/Documentation/perf-inject.txt
@@ -50,6 +50,9 @@ OPTIONS
 
 include::itrace.txt[]
 
+--strip::
+	Use with --itrace to strip out non-synthesized events.
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-archive[1]
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 9b6119f..0a945d2 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -28,6 +28,7 @@ struct perf_inject {
 	bool			build_ids;
 	bool			sched_stat;
 	bool			have_auxtrace;
+	bool			strip;
 	const char		*input_name;
 	struct perf_data_file	output;
 	u64			bytes_written;
@@ -177,6 +178,14 @@ static int perf_event__repipe(struct perf_tool *tool,
 	return perf_event__repipe_synth(tool, event);
 }
 
+static int perf_event__drop(struct perf_tool *tool __maybe_unused,
+			    union perf_event *event __maybe_unused,
+			    struct perf_sample *sample __maybe_unused,
+			    struct machine *machine __maybe_unused)
+{
+	return 0;
+}
+
 static int perf_event__drop_aux(struct perf_tool *tool,
 				union perf_event *event __maybe_unused,
 				struct perf_sample *sample,
@@ -480,6 +489,78 @@ static int perf_evsel__check_stype(struct perf_evsel *evsel,
 	return 0;
 }
 
+static int drop_sample(struct perf_tool *tool __maybe_unused,
+		       union perf_event *event __maybe_unused,
+		       struct perf_sample *sample __maybe_unused,
+		       struct perf_evsel *evsel __maybe_unused,
+		       struct machine *machine __maybe_unused)
+{
+	return 0;
+}
+
+static void strip_init(struct perf_inject *inject)
+{
+	struct perf_evlist *evlist = inject->session->evlist;
+	struct perf_evsel *evsel;
+
+	inject->tool.context_switch = perf_event__drop;
+
+	evlist__for_each(evlist, evsel)
+		evsel->handler = drop_sample;
+}
+
+static bool has_tracking(struct perf_evsel *evsel)
+{
+	return evsel->attr.mmap || evsel->attr.mmap2 || evsel->attr.comm ||
+	       evsel->attr.task;
+}
+
+#define COMPAT_MASK (PERF_SAMPLE_ID | PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \
+		     PERF_SAMPLE_ID | PERF_SAMPLE_CPU | PERF_SAMPLE_IDENTIFIER)
+
+/*
+ * In order that the perf.data file is parsable, tracking events like MMAP need
+ * their selected event to exist, except if there is only 1 selected event left
+ * and it has a compatible sample type.
+ */
+static bool ok_to_remove(struct perf_evlist *evlist,
+			 struct perf_evsel *evsel_to_remove)
+{
+	struct perf_evsel *evsel;
+	int cnt = 0;
+	bool ok = false;
+
+	if (!has_tracking(evsel_to_remove))
+		return true;
+
+	evlist__for_each(evlist, evsel) {
+		if (evsel->handler != drop_sample) {
+			cnt += 1;
+			if ((evsel->attr.sample_type & COMPAT_MASK) ==
+			    (evsel_to_remove->attr.sample_type & COMPAT_MASK))
+				ok = true;
+		}
+	}
+
+	return ok && cnt == 1;
+}
+
+static void strip_fini(struct perf_inject *inject)
+{
+	struct perf_evlist *evlist = inject->session->evlist;
+	struct perf_evsel *evsel, *tmp;
+
+	/* Remove non-synthesized evsels if possible */
+	evlist__for_each_safe(evlist, tmp, evsel) {
+		if (evsel->handler == drop_sample &&
+		    ok_to_remove(evlist, evsel)) {
+			pr_debug("Deleting %s\n", perf_evsel__name(evsel));
+			perf_evlist__remove(evlist, evsel);
+			perf_evsel__delete(evsel);
+		}
+	}
+}
+
 static int __cmd_inject(struct perf_inject *inject)
 {
 	int ret = -EINVAL;
@@ -532,6 +613,8 @@ static int __cmd_inject(struct perf_inject *inject)
 		inject->tool.ordering_requires_timestamps = true;
 		/* Allow space in the header for new attributes */
 		output_data_offset = 4096;
+		if (inject->strip)
+			strip_init(inject);
 	}
 
 	if (!inject->itrace_synth_opts.set)
@@ -570,6 +653,8 @@ static int __cmd_inject(struct perf_inject *inject)
 				perf_evlist__remove(session->evlist, evsel);
 				perf_evsel__delete(evsel);
 			}
+			if (inject->strip)
+				strip_fini(inject);
 		}
 		session->header.data_offset = output_data_offset;
 		session->header.data_size = inject->bytes_written;
@@ -635,6 +720,8 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused)
 		OPT_CALLBACK_OPTARG(0, "itrace", &inject.itrace_synth_opts,
 				    NULL, "opts", "Instruction Tracing options",
 				    itrace_parse_synth_opts),
+		OPT_BOOLEAN(0, "strip", &inject.strip,
+			    "strip non-synthesized events (use with --itrace)"),
 		OPT_END()
 	};
 	const char * const inject_usage[] = {
@@ -650,6 +737,11 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (argc)
 		usage_with_options(inject_usage, options);
 
+	if (inject.strip && !inject.itrace_synth_opts.set) {
+		pr_err("--strip option requires --itrace option\n");
+		return -1;
+	}
+
 	if (perf_data_file__open(&inject.output)) {
 		perror("failed to create output file");
 		return -1;

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [tip:perf/core] perf intel-pt: Add mispred-all config option to aid use with autofdo
  2015-09-25 13:15 ` [PATCH 25/25] perf intel-pt: Add mispred-all config option to aid use with autofdo Adrian Hunter
@ 2015-09-29  8:49   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-09-29  8:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jolsa, linux-kernel, mingo, tglx, adrian.hunter, hpa, acme

Commit-ID:  ba11ba65e02836c475427ae199adfc2d8cc4a900
Gitweb:     http://git.kernel.org/tip/ba11ba65e02836c475427ae199adfc2d8cc4a900
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:56 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 17:21:00 -0300

perf intel-pt: Add mispred-all config option to aid use with autofdo

autofdo incorrectly expects branch flags to include either mispred or
predicted.  In fact mispred = predicted = 0 is valid and means the flags
are not supported, which they aren't by Intel PT.

To make autofdo work, add a config option which will cause Intel PT
decoder to set the mispred flag on all branches.
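
A standalone sketch of how such a config callback can work is shown
below.  It is not the perf code: struct pt_opts, parse_bool() and
pt_config() are hypothetical, and parse_bool() is a simplified stand-in
for perf_config_bool() that assumes, as in git-style config files, that
a key given without a value reads as true.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for the decoder option. */
struct pt_opts {
	bool mispred_all;
};

/* Simplified boolean parser: a bare key (NULL value) counts as true. */
static bool parse_bool(const char *value)
{
	if (!value)
		return true;
	return !strcmp(value, "true") || !strcmp(value, "yes") ||
	       !strcmp(value, "on") || !strcmp(value, "1");
}

/* Called once per "section.key" pair; unknown keys are ignored. */
static int pt_config(const char *var, const char *value, void *data)
{
	struct pt_opts *opts = data;

	if (!strcmp(var, "intel-pt.mispred-all"))
		opts->mispred_all = parse_bool(value);

	return 0;
}
```

The decoder then only has to copy opts->mispred_all into the mispred
flag of every synthesized branch entry, which is what the one-line
change to intel_pt_update_last_branch_rb() does.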

Below is an example of using Intel PT with autofdo.  The example is
also added to the Intel PT documentation.  It requires autofdo
(https://github.com/google/autofdo) and gcc version 5.  The bubble
sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial)
amended to take the number of elements as a parameter.

	$ gcc-5 -O3 sort.c -o sort_optimized
	$ ./sort_optimized 30000
	Bubble sorting array of 30000 elements
	2254 ms

	$ cat ~/.perfconfig
	[intel-pt]
		mispred-all

	$ perf record -e intel_pt//u ./sort 3000
	Bubble sorting array of 3000 elements
	58 ms
	[ perf record: Woken up 2 times to write data ]
	[ perf record: Captured and wrote 3.939 MB perf.data ]
	$ perf inject -i perf.data -o inj --itrace=i100usle --strip
	$ ./create_gcov --binary=./sort --profile=inj --gcov=sort.gcov -gcov_version=1
	$ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
	$ ./sort_autofdo 30000
	Bubble sorting array of 30000 elements
	2155 ms

Note there is currently no advantage to using Intel PT instead of LBR,
but that may change in the future if greater use is made of the data.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-26-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/intel-pt.txt | 29 +++++++++++++++++++++++++++++
 tools/perf/util/intel-pt.c            | 14 ++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/tools/perf/Documentation/intel-pt.txt b/tools/perf/Documentation/intel-pt.txt
index a0fbb5d..be764f9 100644
--- a/tools/perf/Documentation/intel-pt.txt
+++ b/tools/perf/Documentation/intel-pt.txt
@@ -764,3 +764,32 @@ perf inject also accepts the --itrace option in which case tracing data is
 removed and replaced with the synthesized events. e.g.
 
 	perf inject --itrace -i perf.data -o perf.data.new
+
+Below is an example of using Intel PT with autofdo.  It requires autofdo
+(https://github.com/google/autofdo) and gcc version 5.  The bubble
+sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial)
+amended to take the number of elements as a parameter.
+
+	$ gcc-5 -O3 sort.c -o sort_optimized
+	$ ./sort_optimized 30000
+	Bubble sorting array of 30000 elements
+	2254 ms
+
+	$ cat ~/.perfconfig
+	[intel-pt]
+		mispred-all
+
+	$ perf record -e intel_pt//u ./sort 3000
+	Bubble sorting array of 3000 elements
+	58 ms
+	[ perf record: Woken up 2 times to write data ]
+	[ perf record: Captured and wrote 3.939 MB perf.data ]
+	$ perf inject -i perf.data -o inj --itrace=i100usle --strip
+	$ ./create_gcov --binary=./sort --profile=inj --gcov=sort.gcov -gcov_version=1
+	$ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
+	$ ./sort_autofdo 30000
+	Bubble sorting array of 30000 elements
+	2155 ms
+
+Note there is currently no advantage to using Intel PT instead of LBR, but
+that may change in the future if greater use is made of the data.
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 05e8fcc51..03ff072 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -64,6 +64,7 @@ struct intel_pt {
 	bool data_queued;
 	bool est_tsc;
 	bool sync_switch;
+	bool mispred_all;
 	int have_sched_switch;
 	u32 pmu_type;
 	u64 kernel_start;
@@ -943,6 +944,7 @@ static void intel_pt_update_last_branch_rb(struct intel_pt_queue *ptq)
 	be->flags.abort = !!(state->flags & INTEL_PT_ABORT_TX);
 	be->flags.in_tx = !!(state->flags & INTEL_PT_IN_TX);
 	/* No support for mispredict */
+	be->flags.mispred = ptq->pt->mispred_all;
 
 	if (bs->nr < ptq->pt->synth_opts.last_branch_sz)
 		bs->nr += 1;
@@ -1967,6 +1969,16 @@ static bool intel_pt_find_switch(struct perf_evlist *evlist)
 	return false;
 }
 
+static int intel_pt_perf_config(const char *var, const char *value, void *data)
+{
+	struct intel_pt *pt = data;
+
+	if (!strcmp(var, "intel-pt.mispred-all"))
+		pt->mispred_all = perf_config_bool(var, value);
+
+	return 0;
+}
+
 static const char * const intel_pt_info_fmts[] = {
 	[INTEL_PT_PMU_TYPE]		= "  PMU Type            %"PRId64"\n",
 	[INTEL_PT_TIME_SHIFT]		= "  Time Shift          %"PRIu64"\n",
@@ -2011,6 +2023,8 @@ int intel_pt_process_auxtrace_info(union perf_event *event,
 	if (!pt)
 		return -ENOMEM;
 
+	perf_config(intel_pt_perf_config, pt);
+
 	err = auxtrace_queues__init(&pt->queues);
 	if (err)
 		goto err_free;

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH 15/25] perf report: Make max_stack value allow for synthesized callchains
  2015-09-28 20:03   ` Arnaldo Carvalho de Melo
@ 2015-09-29  8:52     ` Adrian Hunter
  2015-09-29 15:51       ` Arnaldo Carvalho de Melo
  2015-10-01  7:10       ` [tip:perf/core] perf report: Amend documentation about max_stack and " tip-bot for Adrian Hunter
  0 siblings, 2 replies; 66+ messages in thread
From: Adrian Hunter @ 2015-09-29  8:52 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

On 28/09/15 23:03, Arnaldo Carvalho de Melo wrote:
> Em Fri, Sep 25, 2015 at 04:15:46PM +0300, Adrian Hunter escreveu:
>> perf report has an option (--max-stack) to set the maximum stack depth
>> when processing callchains.  The option defaults to the hard-coded
>> maximum definition PERF_MAX_STACK_DEPTH which is 127.  The intention of
>> the option is to allow the user to reduce the processing time by
>> reducing the amount of the callchain that is processed.
>>
>> It is also possible, when processing instruction traces, to synthesize
>> callchains.  Synthesized callchains do not have the kernel size
>> limitation and are whatever size the user requests, although validation
>> presently prevents the user from requesting a value greater than 1024.  The
>> default value is 16.
> 
> So, haven't checked the options, but one can possibly use both the way
> itrace has to ask for a max stack size and also via --max-stack, right?

Possibly, but it would not be a common paradigm.

> 
> In that case we better emit a warning or plain state that one either
> uses one way of setting the max stack or the other?

max_stack was added as an optimization to reduce processing time, so
people specifying --max-stack might get an increased processing time
if combined with synthesized callchains, but otherwise no real harm.

A warning seems like overkill.  Could amend the documentation, e.g.


diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index b941d5e07e28..ce499035e6d8 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -205,6 +205,8 @@ OPTIONS
 	beyond the specified depth will be ignored. This is a trade-off
 	between information loss and faster processing especially for
 	workloads that can have a very long callchain stack.
+	Note that when using the --itrace option the synthesized callchain size
+	will override this value if the synthesized callchain size is bigger.
 
 	Default: 127
 



> 
> I'm applying the patch, because it is unlikely that this gets specified,
> but would be good to close this gap.
> 
> - Arnaldo
>  
>> To allow for synthesized callchains, make the max_stack value at least
>> the same size as the synthesized callchain size.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>  tools/perf/builtin-report.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
>> index e94e5c7155af..37c9f5125887 100644
>> --- a/tools/perf/builtin-report.c
>> +++ b/tools/perf/builtin-report.c
>> @@ -809,6 +809,10 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
>>  	if (report.inverted_callchain)
>>  		callchain_param.order = ORDER_CALLER;
>>  
>> +	if (itrace_synth_opts.callchain &&
>> +	    (int)itrace_synth_opts.callchain_sz > report.max_stack)
>> +		report.max_stack = itrace_synth_opts.callchain_sz;
>> +
>>  	if (!input_name || !strlen(input_name)) {
>>  		if (!fstat(STDIN_FILENO, &st) && S_ISFIFO(st.st_mode))
>>  			input_name = "-";
>> -- 
>> 1.9.1
> 


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff
  2015-09-28 20:33 ` [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Arnaldo Carvalho de Melo
@ 2015-09-29 11:13   ` Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: Adrian Hunter @ 2015-09-29 11:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

On 28/09/15 23:33, Arnaldo Carvalho de Melo wrote:
> Em Fri, Sep 25, 2015 at 04:15:31PM +0300, Adrian Hunter escreveu:
>> Hi
>>
>> Here are some minor improvements to Intel PT related stuff.
> 
> Thanks, applied all but:
> 
>  [PATCH 17/25] perf callchain: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
> 
> Please take a look at the comments I made on this and a few others that
> I applied,

I checked the final patches.  Everything looks good.
Thank you!


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 15/25] perf report: Make max_stack value allow for synthesized callchains
  2015-09-29  8:52     ` Adrian Hunter
@ 2015-09-29 15:51       ` Arnaldo Carvalho de Melo
  2015-09-30  8:43         ` Adrian Hunter
  2015-10-01  7:10       ` [tip:perf/core] perf report: Amend documentation about max_stack and " tip-bot for Adrian Hunter
  1 sibling, 1 reply; 66+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-09-29 15:51 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Jiri Olsa, linux-kernel

Em Tue, Sep 29, 2015 at 11:52:37AM +0300, Adrian Hunter escreveu:
> On 28/09/15 23:03, Arnaldo Carvalho de Melo wrote:
> > Em Fri, Sep 25, 2015 at 04:15:46PM +0300, Adrian Hunter escreveu:
> >> perf report has an option (--max-stack) to set the maximum stack depth
> >> when processing callchains.  The option defaults to the hard-coded
> >> maximum definition PERF_MAX_STACK_DEPTH which is 127.  The intention of
> >> the option is to allow the user to reduce the processing time by
> >> reducing the amount of the callchain that is processed.
> >>
> >> It is also possible, when processing instruction traces, to synthesize
> >> callchains.  Synthesized callchains do not have the kernel size
> >> limitation and are whatever size the user requests, although validation
> >> presently prevents the user from requesting a value greater than 1024.  The
> >> default value is 16.
> > 
> > So, haven't checked the options, but one can possibly use both the way
> > itrace has to ask for a max stack size and also via --max-stack, right?
> 
> Possibly, but it would not be a common paradigm.
> 
> > 
> > In that case we better emit a warning or plain state that one either
> > uses one way of setting the max stack or the other?
> 
> max_stack was added as an optimization to reduce processing time, so
> people specifying --max-stack might get an increased processing time
> if combined with synthesized callchains, but otherwise no real harm.
> 
> A warning seems like overkill.  Could amend the documentation, e.g.

Adding the doc part helps, but actually telling that what they are
trying to do is not possible, even for unlikely scenarios like this,
seems cleaner, but no biggie.

I'll add the patch below with your s-o-b, ack?

- Arnaldo
 
> 
> diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
> index b941d5e07e28..ce499035e6d8 100644
> --- a/tools/perf/Documentation/perf-report.txt
> +++ b/tools/perf/Documentation/perf-report.txt
> @@ -205,6 +205,8 @@ OPTIONS
>  	beyond the specified depth will be ignored. This is a trade-off
>  	between information loss and faster processing especially for
>  	workloads that can have a very long callchain stack.
> +	Note that when using the --itrace option the synthesized callchain size
> +	will override this value if the synthesized callchain size is bigger.
>  
>  	Default: 127
>  
> 
> 
> 
> > 
> > I'm applying the patch, because it is unlikely that this gets specified,
> > but would be good to close this gap.
> > 
> > - Arnaldo
> >  
> >> To allow for synthesized callchains, make the max_stack value at least
> >> the same size as the synthesized callchain size.
> >>
> >> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> >> ---
> >>  tools/perf/builtin-report.c | 4 ++++
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> >> index e94e5c7155af..37c9f5125887 100644
> >> --- a/tools/perf/builtin-report.c
> >> +++ b/tools/perf/builtin-report.c
> >> @@ -809,6 +809,10 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
> >>  	if (report.inverted_callchain)
> >>  		callchain_param.order = ORDER_CALLER;
> >>  
> >> +	if (itrace_synth_opts.callchain &&
> >> +	    (int)itrace_synth_opts.callchain_sz > report.max_stack)
> >> +		report.max_stack = itrace_synth_opts.callchain_sz;
> >> +
> >>  	if (!input_name || !strlen(input_name)) {
> >>  		if (!fstat(STDIN_FILENO, &st) && S_ISFIFO(st.st_mode))
> >>  			input_name = "-";
> >> -- 
> >> 1.9.1
> > 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 15/25] perf report: Make max_stack value allow for synthesized callchains
  2015-09-29 15:51       ` Arnaldo Carvalho de Melo
@ 2015-09-30  8:43         ` Adrian Hunter
  2015-09-30 13:17           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 66+ messages in thread
From: Adrian Hunter @ 2015-09-30  8:43 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

On 29/09/15 18:51, Arnaldo Carvalho de Melo wrote:
> Em Tue, Sep 29, 2015 at 11:52:37AM +0300, Adrian Hunter escreveu:
>> On 28/09/15 23:03, Arnaldo Carvalho de Melo wrote:
>>> Em Fri, Sep 25, 2015 at 04:15:46PM +0300, Adrian Hunter escreveu:
>>>> perf report has an option (--max-stack) to set the maximum stack depth
>>>> when processing callchains.  The option defaults to the hard-coded
>>>> maximum definition PERF_MAX_STACK_DEPTH which is 127.  The intention of
>>>> the option is to allow the user to reduce the processing time by
>>>> reducing the amount of the callchain that is processed.
>>>>
>>>> It is also possible, when processing instruction traces, to synthesize
>>>> callchains.  Synthesized callchains do not have the kernel size
>>>> limitation and are whatever size the user requests, although validation
>>>> presently prevents the user from requesting a value greater than 1024.  The
>>>> default value is 16.
>>>
>>> So, haven't checked the options, but one can possibly use both the way
>>> itrace has to ask for a max stack size and also via --max-stack, right?
>>
>> Possibly, but it would not be a common paradigm.
>>
>>>
>>> In that case we better emit a warning or plain state that one either
>>> uses one way of setting the max stack or the other?
>>
>> max_stack was added as an optimization to reduce processing time, so
>> people specifying --max-stack might get an increased processing time
>> if combined with synthesized callchains, but otherwise no real harm.
>>
>> A warning seems like overkill.  Could amend the documentation, e.g.
> 
> Adding the doc part helps, but actually telling that what they are
> trying to do is not possible, even for unlikely scenarios like this,
> seems cleaner, but no biggie.
> 
> I'll add the patch below with your s-o-b, ack?

Yes thank you.

> 
> - Arnaldo
>  
>>
>> diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
>> index b941d5e07e28..ce499035e6d8 100644
>> --- a/tools/perf/Documentation/perf-report.txt
>> +++ b/tools/perf/Documentation/perf-report.txt
>> @@ -205,6 +205,8 @@ OPTIONS
>>  	beyond the specified depth will be ignored. This is a trade-off
>>  	between information loss and faster processing especially for
>>  	workloads that can have a very long callchain stack.
>> +	Note that when using the --itrace option the synthesized callchain size
>> +	will override this value if the synthesized callchain size is bigger.
>>  
>>  	Default: 127
>>  
>>
>>
>>
>>>
>>> I'm applying the patch, because it is unlikely that this gets specified,
>>> but would be good to close this gap.
>>>
>>> - Arnaldo
>>>  
>>>> To allow for synthesized callchains, make the max_stack value at least
>>>> the same size as the synthesized callchain size.
>>>>
>>>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>>>> ---
>>>>  tools/perf/builtin-report.c | 4 ++++
>>>>  1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
>>>> index e94e5c7155af..37c9f5125887 100644
>>>> --- a/tools/perf/builtin-report.c
>>>> +++ b/tools/perf/builtin-report.c
>>>> @@ -809,6 +809,10 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
>>>>  	if (report.inverted_callchain)
>>>>  		callchain_param.order = ORDER_CALLER;
>>>>  
>>>> +	if (itrace_synth_opts.callchain &&
>>>> +	    (int)itrace_synth_opts.callchain_sz > report.max_stack)
>>>> +		report.max_stack = itrace_synth_opts.callchain_sz;
>>>> +
>>>>  	if (!input_name || !strlen(input_name)) {
>>>>  		if (!fstat(STDIN_FILENO, &st) && S_ISFIFO(st.st_mode))
>>>>  			input_name = "-";
>>>> -- 
>>>> 1.9.1
>>>
> 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 15/25] perf report: Make max_stack value allow for synthesized callchains
  2015-09-30  8:43         ` Adrian Hunter
@ 2015-09-30 13:17           ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 66+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-09-30 13:17 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Jiri Olsa, linux-kernel

Em Wed, Sep 30, 2015 at 11:43:10AM +0300, Adrian Hunter escreveu:
> On 29/09/15 18:51, Arnaldo Carvalho de Melo wrote:
> > I'll add the patch below with your s-o-b, ack?
> 
> Yes thank you.

Done, pushed out.

- Arnaldo

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [tip:perf/core] perf report: Amend documentation about max_stack and synthesized callchains
  2015-09-29  8:52     ` Adrian Hunter
  2015-09-29 15:51       ` Arnaldo Carvalho de Melo
@ 2015-10-01  7:10       ` tip-bot for Adrian Hunter
  1 sibling, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-10-01  7:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jolsa, adrian.hunter, mingo, acme, tglx, linux-kernel, hpa

Commit-ID:  40862a7b793945c7080d1566ca3dc6249f3c6354
Gitweb:     http://git.kernel.org/tip/40862a7b793945c7080d1566ca3dc6249f3c6354
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Tue, 29 Sep 2015 11:52:37 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 30 Sep 2015 18:34:26 -0300

perf report: Amend documentation about max_stack and synthesized callchains

The --max-stack option was added as an optimization to reduce processing time,
so people specifying --max-stack might get an increased processing time if
combined with synthesized callchains, but otherwise no real harm.

A warning about setting both --max_stack and the synthesized callchains max
depth seems like overkill.  Amend the documentation.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/560A5155.4060105@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-report.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index b941d5e..ce49903 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -205,6 +205,8 @@ OPTIONS
 	beyond the specified depth will be ignored. This is a trade-off
 	between information loss and faster processing especially for
 	workloads that can have a very long callchain stack.
+	Note that when using the --itrace option the synthesized callchain size
+	will override this value if the synthesized callchain size is bigger.
 
 	Default: 127
 


* Re: [PATCH 17/25] perf callchain: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
  2015-09-29  8:16     ` Adrian Hunter
@ 2015-10-01 11:45       ` Adrian Hunter
  0 siblings, 0 replies; 66+ messages in thread
From: Adrian Hunter @ 2015-10-01 11:45 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel

On 29/09/15 11:16, Adrian Hunter wrote:
> On 28/09/15 23:08, Arnaldo Carvalho de Melo wrote:
>> On Fri, Sep 25, 2015 at 04:15:48PM +0300, Adrian Hunter wrote:
>>> Adjust the validation to allow for max_stack greater than
>>> PERF_MAX_STACK_DEPTH.
>>>
>>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>>> ---
>>>  tools/perf/util/machine.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
>>> index fd1efeafb343..d7bd9a304535 100644
>>> --- a/tools/perf/util/machine.c
>>> +++ b/tools/perf/util/machine.c
>>> @@ -1831,7 +1831,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
>>>  	}
>>>  
>>>  check_calls:
>>> -	if (chain->nr > PERF_MAX_STACK_DEPTH) {
>>> +	if (chain->nr > PERF_MAX_STACK_DEPTH && (int)chain->nr > max_stack) {
>>
>> Both?
> 
> Yes.
> 
> In the case of a hardware generated callchain, the callchain can be up to
> PERF_MAX_STACK_DEPTH but max_stack can be less than PERF_MAX_STACK_DEPTH to
> limit the number processed.
> 
> In the case of a synthesized callchain, the callchain can be up to max_stack
> which might be more than PERF_MAX_STACK_DEPTH.

Is this ok?

> 
>>
>>>  		pr_warning("corrupted callchain. skipping...\n");
>>>  		return 0;
>>>  	}
>>> -- 
>>> 1.9.1
>>
> 
> 
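
The two cases described in the reply above can be sketched as a single predicate. This is an illustration only: the function name is hypothetical, and the value 127 for PERF_MAX_STACK_DEPTH is assumed here to keep the sketch self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value, defined locally so the sketch compiles on its own. */
#define PERF_MAX_STACK_DEPTH 127

/*
 * A callchain is treated as corrupted only when its length exceeds BOTH
 * limits:
 *  - hardware-generated chains never exceed PERF_MAX_STACK_DEPTH, even
 *    when max_stack is set lower to limit how many entries are processed;
 *  - synthesized chains may exceed PERF_MAX_STACK_DEPTH, but never
 *    max_stack.
 */
static bool callchain_looks_corrupted(uint64_t nr, int max_stack)
{
	return nr > PERF_MAX_STACK_DEPTH && (int)nr > max_stack;
}
```

A hardware chain of 100 entries with max_stack set to 10 passes, as does a synthesized chain of 1024 entries with max_stack set to 1024. A chain of 2000 entries with max_stack set to 1024 is rejected.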



* [tip:perf/core] perf callchain: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
  2015-09-25 13:15 ` [PATCH 17/25] perf callchain: " Adrian Hunter
  2015-09-28 20:08   ` Arnaldo Carvalho de Melo
@ 2015-10-03  7:49   ` tip-bot for Adrian Hunter
  1 sibling, 0 replies; 66+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-10-03  7:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, linux-kernel, jolsa, adrian.hunter, mingo, hpa, acme

Commit-ID:  0edd453368c6b9cdb756bde2b6675bb0d5d0eb0a
Gitweb:     http://git.kernel.org/tip/0edd453368c6b9cdb756bde2b6675bb0d5d0eb0a
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:48 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 1 Oct 2015 09:56:06 -0300

perf callchain: Allow for max_stack greater than PERF_MAX_STACK_DEPTH

Adjust the validation to allow for max_stack greater than
PERF_MAX_STACK_DEPTH.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-18-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/machine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 76fe167..5ef90be 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1831,7 +1831,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 	}
 
 check_calls:
-	if (chain->nr > PERF_MAX_STACK_DEPTH) {
+	if (chain->nr > PERF_MAX_STACK_DEPTH && (int)chain->nr > max_stack) {
 		pr_warning("corrupted callchain. skipping...\n");
 		return 0;
 	}


Thread overview: 66+ messages
2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
2015-09-25 13:15 ` [PATCH 01/25] perf auxtrace: Fix 'instructions' period of zero Adrian Hunter
2015-09-28 14:12   ` Arnaldo Carvalho de Melo
2015-09-28 14:16     ` Arnaldo Carvalho de Melo
2015-09-29  8:41   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 02/25] perf report: Fix sample type validation for synthesized callchains Adrian Hunter
2015-09-29  8:41   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 03/25] perf intel-pt: Fix potential loop forever Adrian Hunter
2015-09-29  8:42   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 04/25] perf intel-pt: Make logging slightly more efficient Adrian Hunter
2015-09-29  8:42   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 05/25] perf script: Allow time to be displayed in nanoseconds Adrian Hunter
2015-09-29  8:42   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 06/25] perf tools: Warn when AUX data has been lost Adrian Hunter
2015-09-29  8:43   ` [tip:perf/core] perf session: " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 07/25] perf tools: Add more documentation to export-to-postgresql.py script Adrian Hunter
2015-09-29  8:43   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 08/25] perf auxtrace: Add option to synthesize branch stacks on samples Adrian Hunter
2015-09-29  8:43   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 09/25] perf report: Adjust sample type validation for synthesized branch stacks Adrian Hunter
2015-09-29  8:44   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 10/25] perf report: Also do default setup " Adrian Hunter
2015-09-29  8:44   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 11/25] perf report: Skip events with null " Adrian Hunter
2015-09-29  8:44   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 12/25] perf inject: Set branch stack feature flag when synthesizing " Adrian Hunter
2015-09-29  8:45   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 13/25] perf intel-pt: Move branch filter logic Adrian Hunter
2015-09-29  8:45   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 14/25] perf intel-pt: Support generating branch stack Adrian Hunter
2015-09-29  8:45   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 15/25] perf report: Make max_stack value allow for synthesized callchains Adrian Hunter
2015-09-28 20:03   ` Arnaldo Carvalho de Melo
2015-09-29  8:52     ` Adrian Hunter
2015-09-29 15:51       ` Arnaldo Carvalho de Melo
2015-09-30  8:43         ` Adrian Hunter
2015-09-30 13:17           ` Arnaldo Carvalho de Melo
2015-10-01  7:10       ` [tip:perf/core] perf report: Amend documentation about max_stack and " tip-bot for Adrian Hunter
2015-09-29  8:46   ` [tip:perf/core] perf report: Make max_stack value allow for " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 16/25] perf hists: Allow for max_stack greater than PERF_MAX_STACK_DEPTH Adrian Hunter
2015-09-29  8:46   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 17/25] perf callchain: " Adrian Hunter
2015-09-28 20:08   ` Arnaldo Carvalho de Melo
2015-09-29  8:16     ` Adrian Hunter
2015-10-01 11:45       ` Adrian Hunter
2015-10-03  7:49   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 18/25] perf script: Add a setting for maximum stack depth Adrian Hunter
2015-09-29  8:46   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 19/25] perf scripting python: Allow for max_stack greater than PERF_MAX_STACK_DEPTH Adrian Hunter
2015-09-29  8:47   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 20/25] perf script: Make scripting_max_stack value allow for synthesized callchains Adrian Hunter
2015-09-29  8:47   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 21/25] perf tools: Add perf_evlist__id2evsel_strict() Adrian Hunter
2015-09-29  8:47   ` [tip:perf/core] perf evlist: " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 22/25] perf tools: Add perf_evlist__del() Adrian Hunter
2015-09-28 13:33   ` Arnaldo Carvalho de Melo
2015-09-28 20:14     ` Arnaldo Carvalho de Melo
2015-09-29  8:48   ` [tip:perf/core] perf evlist: Add perf_evlist__remove() tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 23/25] perf inject: Remove more aux-related stuff when processing instruction traces Adrian Hunter
2015-09-29  8:48   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 24/25] perf inject: Add --strip option to strip out non-synthesized events Adrian Hunter
2015-09-29  8:49   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 25/25] perf intel-pt: Add mispred-all config option to aid use with autofdo Adrian Hunter
2015-09-29  8:49   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-28 20:33 ` [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Arnaldo Carvalho de Melo
2015-09-29 11:13   ` Adrian Hunter
