linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/3] perf script: Add callindent option
@ 2016-06-16 12:34 Adrian Hunter
  2016-06-16 12:34 ` [PATCH RFC 1/3] perf: script: Fix documentation of '-f' when it should be '-F' Adrian Hunter
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Adrian Hunter @ 2016-06-16 12:34 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel, Andi Kleen

Hi

Andi Kleen sent a couple of patches to add a callindent option to
perf script. If Andi is agreeable, I would like to propose an
alternative implementation.

While there are some differences in the resulting output, the main
differences are:

 1. Tell the decoder to feed branches to the thread stack, which has the
 advantage that it happens before branch filtering and so can be used
 with different itrace options (e.g. it still works when only showing
 calls, even though the thread stack needs to see calls and returns). Also
 it does not conflict with using the thread stack to get callchains.

 2. Print "call" or "ret" instead of using a different event


Adrian Hunter (3):
      perf: script: Fix documentation of '-f' when it should be '-F'
      perf auxtrace: Add option to feed branches to the thread stack
      perf script: Add callindent option

 tools/perf/Documentation/perf-script.txt | 26 +++++++-----
 tools/perf/builtin-script.c              | 70 ++++++++++++++++++++++++++++++++
 tools/perf/util/auxtrace.h               |  2 +
 tools/perf/util/intel-bts.c              | 22 +++++++---
 tools/perf/util/intel-pt.c               |  5 ++-
 tools/perf/util/thread-stack.c           |  7 ++++
 tools/perf/util/thread-stack.h           |  1 +
 7 files changed, 117 insertions(+), 16 deletions(-)


Regards
Adrian

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH RFC 1/3] perf: script: Fix documentation of '-f' when it should be '-F'
  2016-06-16 12:34 [PATCH RFC 0/3] perf script: Add callindent option Adrian Hunter
@ 2016-06-16 12:34 ` Adrian Hunter
  2016-06-16 19:53   ` Arnaldo Carvalho de Melo
  2016-06-16 12:34 ` [PATCH RFC 2/3] perf auxtrace: Add option to feed branches to the thread stack Adrian Hunter
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Adrian Hunter @ 2016-06-16 12:34 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel, Andi Kleen

The documentation for perf script mixes up '-f' and '-F'. Fix it.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-script.txt | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 4fc44c75263f..4f34379ebd77 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -119,13 +119,13 @@ OPTIONS
 	srcline, period, iregs, brstack, brstacksym, flags.
         Field list can be prepended with the type, trace, sw or hw,
         to indicate to which event type the field list applies.
-        e.g., -f sw:comm,tid,time,ip,sym  and -f trace:time,cpu,trace
+        e.g., -F sw:comm,tid,time,ip,sym  and -F trace:time,cpu,trace
 
-		perf script -f <fields>
+		perf script -F <fields>
 
 	is equivalent to:
 
-		perf script -f trace:<fields> -f sw:<fields> -f hw:<fields>
+		perf script -F trace:<fields> -F sw:<fields> -F hw:<fields>
 
 	i.e., the specified fields apply to all event types if the type string
 	is not given.
@@ -133,9 +133,9 @@ OPTIONS
 	The arguments are processed in the order received. A later usage can
 	reset a prior request. e.g.:
 
-		-f trace: -f comm,tid,time,ip,sym
+		-F trace: -F comm,tid,time,ip,sym
 
-	The first -f suppresses trace events (field list is ""), but then the
+	The first -F suppresses trace events (field list is ""), but then the
 	second invocation sets the fields to comm,tid,time,ip,sym. In this case a
 	warning is given to the user:
 
@@ -143,9 +143,9 @@ OPTIONS
 
 	Alternatively, consider the order:
 
-		-f comm,tid,time,ip,sym -f trace:
+		-F comm,tid,time,ip,sym -F trace:
 
-	The first -f sets the fields for all events and the second -f
+	The first -F sets the fields for all events and the second -F
 	suppresses trace events. The user is given a warning message about
 	the override, and the result of the above is that only S/W and H/W
 	events are displayed with the given fields.
@@ -154,14 +154,14 @@ OPTIONS
 	event type, a message is displayed to the user that the option is
 	ignored for that type. For example:
 
-		$ perf script -f comm,tid,trace
+		$ perf script -F comm,tid,trace
 		'trace' not valid for hardware events. Ignoring.
 		'trace' not valid for software events. Ignoring.
 
 	Alternatively, if the type is given an invalid field is specified it
 	is an error. For example:
 
-        perf script -v -f sw:comm,tid,trace
+        perf script -v -F sw:comm,tid,trace
         'trace' not valid for software events.
 
 	At this point usage is displayed, and perf-script exits.
@@ -173,7 +173,7 @@ OPTIONS
 	respectively.
 
 	Finally, a user may not set fields to none for all event types.
-	i.e., -f "" is not allowed.
+	i.e., -F "" is not allowed.
 
 	The brstack output includes branch related information with raw addresses using the
 	/v/v/v/v/ syntax in the following order:
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH RFC 2/3] perf auxtrace: Add option to feed branches to the thread stack
  2016-06-16 12:34 [PATCH RFC 0/3] perf script: Add callindent option Adrian Hunter
  2016-06-16 12:34 ` [PATCH RFC 1/3] perf: script: Fix documentation of '-f' when it should be '-F' Adrian Hunter
@ 2016-06-16 12:34 ` Adrian Hunter
  2016-06-16 12:34 ` [PATCH RFC 3/3] perf script: Add callindent option Adrian Hunter
  2016-06-16 16:37 ` [PATCH RFC 0/3] " Arnaldo Carvalho de Melo
  3 siblings, 0 replies; 8+ messages in thread
From: Adrian Hunter @ 2016-06-16 12:34 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel, Andi Kleen

In preparation for using the thread stack to print an indent representing
the stack depth in perf script, add an option to tell decoders to feed
branches to the thread stack. Add support for that option to Intel PT and
Intel BTS.

The advantage of using the decoder to feed the thread stack is that it
happens before branch filtering and so can be used with different itrace
options (e.g. it still works when only showing calls, even though the
thread stack needs to see calls and returns). Also it does not conflict
with using the thread stack to get callchains.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/auxtrace.h  |  2 ++
 tools/perf/util/intel-bts.c | 22 +++++++++++++++++-----
 tools/perf/util/intel-pt.c  |  5 ++++-
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 767989e0e312..ac5f0d7167e6 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -63,6 +63,7 @@ enum itrace_period_type {
  * @calls: limit branch samples to calls (can be combined with @returns)
  * @returns: limit branch samples to returns (can be combined with @calls)
  * @callchain: add callchain to 'instructions' events
+ * @thread_stack: feed branches to the thread_stack
  * @last_branch: add branch context to 'instruction' events
  * @callchain_sz: maximum callchain size
  * @last_branch_sz: branch context size
@@ -82,6 +83,7 @@ struct itrace_synth_opts {
 	bool			calls;
 	bool			returns;
 	bool			callchain;
+	bool			thread_stack;
 	bool			last_branch;
 	unsigned int		callchain_sz;
 	unsigned int		last_branch_sz;
diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
index 9df996085563..6d16b5c059b9 100644
--- a/tools/perf/util/intel-bts.c
+++ b/tools/perf/util/intel-bts.c
@@ -422,7 +422,8 @@ static int intel_bts_get_branch_type(struct intel_bts_queue *btsq,
 }
 
 static int intel_bts_process_buffer(struct intel_bts_queue *btsq,
-				    struct auxtrace_buffer *buffer)
+				    struct auxtrace_buffer *buffer,
+				    struct thread *thread)
 {
 	struct branch *branch;
 	size_t sz, bsz = sizeof(struct branch);
@@ -444,6 +445,12 @@ static int intel_bts_process_buffer(struct intel_bts_queue *btsq,
 		if (!branch->from && !branch->to)
 			continue;
 		intel_bts_get_branch_type(btsq, branch);
+		if (btsq->bts->synth_opts.thread_stack)
+			thread_stack__event(thread, btsq->sample_flags,
+					    le64_to_cpu(branch->from),
+					    le64_to_cpu(branch->to),
+					    btsq->intel_pt_insn.length,
+					    buffer->buffer_nr + 1);
 		if (filter && !(filter & btsq->sample_flags))
 			continue;
 		err = intel_bts_synth_branch_sample(btsq, branch);
@@ -507,12 +514,13 @@ static int intel_bts_process_queue(struct intel_bts_queue *btsq, u64 *timestamp)
 		goto out_put;
 	}
 
-	if (!btsq->bts->synth_opts.callchain && thread &&
+	if (!btsq->bts->synth_opts.callchain &&
+	    !btsq->bts->synth_opts.thread_stack && thread &&
 	    (!old_buffer || btsq->bts->sampling_mode ||
 	     (btsq->bts->snapshot_mode && !buffer->consecutive)))
 		thread_stack__set_trace_nr(thread, buffer->buffer_nr + 1);
 
-	err = intel_bts_process_buffer(btsq, buffer);
+	err = intel_bts_process_buffer(btsq, buffer, thread);
 
 	auxtrace_buffer__drop_data(buffer);
 
@@ -905,10 +913,14 @@ int intel_bts_process_auxtrace_info(union perf_event *event,
 	if (dump_trace)
 		return 0;
 
-	if (session->itrace_synth_opts && session->itrace_synth_opts->set)
+	if (session->itrace_synth_opts && session->itrace_synth_opts->set) {
 		bts->synth_opts = *session->itrace_synth_opts;
-	else
+	} else {
 		itrace_synth_opts__set_default(&bts->synth_opts);
+		if (session->itrace_synth_opts)
+			bts->synth_opts.thread_stack =
+				session->itrace_synth_opts->thread_stack;
+	}
 
 	if (bts->synth_opts.calls)
 		bts->branches_filter |= PERF_IP_FLAG_CALL | PERF_IP_FLAG_ASYNC |
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 137196990012..b2f7e5474f3c 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -1233,7 +1233,7 @@ static int intel_pt_sample(struct intel_pt_queue *ptq)
 	if (!(state->type & INTEL_PT_BRANCH))
 		return 0;
 
-	if (pt->synth_opts.callchain)
+	if (pt->synth_opts.callchain || pt->synth_opts.thread_stack)
 		thread_stack__event(ptq->thread, ptq->flags, state->from_ip,
 				    state->to_ip, ptq->insn_len,
 				    state->trace_nr);
@@ -2136,6 +2136,9 @@ int intel_pt_process_auxtrace_info(union perf_event *event,
 			pt->synth_opts.branches = false;
 			pt->synth_opts.callchain = true;
 		}
+		if (session->itrace_synth_opts)
+			pt->synth_opts.thread_stack =
+				session->itrace_synth_opts->thread_stack;
 	}
 
 	if (pt->synth_opts.log)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH RFC 3/3] perf script: Add callindent option
  2016-06-16 12:34 [PATCH RFC 0/3] perf script: Add callindent option Adrian Hunter
  2016-06-16 12:34 ` [PATCH RFC 1/3] perf: script: Fix documentation of '-f' when it should be '-F' Adrian Hunter
  2016-06-16 12:34 ` [PATCH RFC 2/3] perf auxtrace: Add option to feed branches to the thread stack Adrian Hunter
@ 2016-06-16 12:34 ` Adrian Hunter
  2016-06-17  8:22   ` [PATCH RFC V2 " Adrian Hunter
  2016-06-16 16:37 ` [PATCH RFC 0/3] " Arnaldo Carvalho de Melo
  3 siblings, 1 reply; 8+ messages in thread
From: Adrian Hunter @ 2016-06-16 12:34 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel, Andi Kleen

Based on patches from Andi Kleen.

When printing PT instruction traces with perf script it is rather useful to
see some indentation for the call tree. This patch adds a new callindent
field to perf script that prints spaces for the function call stack depth.

We already have code to track the function call stack for PT, that we can
reuse with minor modifications.

The resulting output is not quite as nice as ftrace yet, but a lot better
than what was there before.

Note there are some corner cases when the thread stack gets code confused
and prints incorrect indentation. Even with that it is fairly useful.

When displaying kernel code traces it is recommended to run as root, as
otherwise perf doesn't understand the kernel addresses properly, and may
not reset the call stack correctly on kernel boundaries.

Example output:

	sudo perf-with-kcore record eg2 -a -e intel_pt// -- sleep 1
	sudo perf-with-kcore script eg2 --ns -F callindent,time,comm,pid,sym,ip,addr,event,cpu --itrace=cre | less
	...
         swapper     0 [000] 21191.973043157:   branches:     call irq_exit                                                ffffffff8104e880 smp_call_function_single_interrupt+0x30 => ffffffff8107f2b0 irq_exit
         swapper     0 [000] 21191.973043157:   branches:         call idle_cpu                                            ffffffff8107f2f9 irq_exit+0x49 => ffffffff810a5890 idle_cpu
         swapper     0 [000] 21191.973043157:   branches:         ret  idle_cpu                                            ffffffff810a58d7 idle_cpu+0x47 => ffffffff8107f2fe irq_exit
         swapper     0 [000] 21191.973043157:   branches:         call tick_nohz_irq_exit                                  ffffffff8107f34d irq_exit+0x9d => ffffffff810f4850 tick_nohz_irq_exit
         swapper     0 [000] 21191.973043157:   branches:             call __tick_nohz_idle_enter                          ffffffff810f4870 tick_nohz_irq_exit+0x20 => ffffffff810f4160 __tick_nohz_idle_enter
         swapper     0 [000] 21191.973043157:   branches:                 call ktime_get                                   ffffffff810f4181 __tick_nohz_idle_enter+0x21 => ffffffff810eb750 ktime_get
         swapper     0 [000] 21191.973043157:   branches:                     call read_tsc                                ffffffff810eb786 ktime_get+0x36 => ffffffff810362e0 read_tsc
         swapper     0 [000] 21191.973043157:   branches:                     ret  read_tsc                                ffffffff810362f4 read_tsc+0x14 => ffffffff810eb78c ktime_get
         swapper     0 [000] 21191.973043157:   branches:                 ret  ktime_get                                   ffffffff810eb7d6 ktime_get+0x86 => ffffffff810f4186 __tick_nohz_idle_enter
         swapper     0 [000] 21191.973043157:   branches:                 call sched_clock_idle_sleep_event                ffffffff810f419b __tick_nohz_idle_enter+0x3b => ffffffff810a9220 sched_clock_idle_sleep_event
         swapper     0 [000] 21191.973043157:   branches:                     call sched_clock_cpu                         ffffffff810a922b sched_clock_idle_sleep_event+0xb => ffffffff810a9180 sched_clock_cpu
         swapper     0 [000] 21191.973043157:   branches:                         call sched_clock                         ffffffff810a91ed sched_clock_cpu+0x6d => ffffffff810369d0 sched_clock
         swapper     0 [000] 21191.973043157:   branches:                             call native_sched_clock              ffffffff810369d4 sched_clock+0x4 => ffffffff810368c0 native_sched_clock
         swapper     0 [000] 21191.973043157:   branches:                             ret  native_sched_clock              ffffffff8103690c native_sched_clock+0x4c => ffffffff810369d9 sched_clock
         swapper     0 [000] 21191.973043157:   branches:                         ret  sched_clock                         ffffffff810369dc sched_clock+0xc => ffffffff810a91f2 sched_clock_cpu
         swapper     0 [000] 21191.973043157:   branches:                     ret  sched_clock_cpu                         ffffffff810a91f6 sched_clock_cpu+0x76 => ffffffff810a9230 sched_clock_idle_sleep_event
         swapper     0 [000] 21191.973043157:   branches:                 ret  sched_clock_idle_sleep_event                ffffffff810a9231 sched_clock_idle_sleep_event+0x11 => ffffffff810f41a0 __tick_nohz_idle_enter

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-script.txt |  6 +++
 tools/perf/builtin-script.c              | 70 ++++++++++++++++++++++++++++++++
 tools/perf/util/thread-stack.c           |  7 ++++
 tools/perf/util/thread-stack.h           |  1 +
 4 files changed, 84 insertions(+)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 4f34379ebd77..576a85c16f4c 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -172,6 +172,12 @@ OPTIONS
 	transaction abort, trace begin, trace end, and in transaction,
 	respectively.
 
+	The callindent field is synthesized and may have a value when
+	Instruction Trace decoding. For calls and returns, it will display the
+	name of the symbol indented with spaces to reflect the stack depth. To
+	differentiate: calls are prefixed by "call", returns by "ret", trace
+	start by "strt" and trace end by "end".
+
 	Finally, a user may not set fields to none for all event types.
 	i.e., -F "" is not allowed.
 
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 46011235af5d..05e088560c44 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -21,6 +21,7 @@
 #include "util/cpumap.h"
 #include "util/thread_map.h"
 #include "util/stat.h"
+#include "util/thread-stack.h"
 #include <linux/bitmap.h>
 #include <linux/stringify.h>
 #include "asm/bug.h"
@@ -63,6 +64,7 @@ enum perf_output_field {
 	PERF_OUTPUT_DATA_SRC	    = 1U << 17,
 	PERF_OUTPUT_WEIGHT	    = 1U << 18,
 	PERF_OUTPUT_BPF_OUTPUT	    = 1U << 19,
+	PERF_OUTPUT_CALLINDENT	    = 1U << 20,
 };
 
 struct output_option {
@@ -89,6 +91,7 @@ struct output_option {
 	{.str = "data_src", .field = PERF_OUTPUT_DATA_SRC},
 	{.str = "weight",   .field = PERF_OUTPUT_WEIGHT},
 	{.str = "bpf-output",   .field = PERF_OUTPUT_BPF_OUTPUT},
+	{.str = "callindent", .field = PERF_OUTPUT_CALLINDENT},
 };
 
 /* default set to maintain compatibility with current format */
@@ -562,6 +565,70 @@ static void print_sample_addr(struct perf_sample *sample,
 	}
 }
 
+static void print_sample_callindent(struct perf_sample *sample,
+				    struct perf_evsel *evsel,
+				    struct thread *thread,
+				    struct addr_location *al)
+{
+	struct perf_event_attr *attr = &evsel->attr;
+	size_t depth = thread_stack__depth(thread);
+	const char *prefix = NULL, *name = NULL;
+	struct addr_location addr_al;
+	static int spacing;
+	int len = 0;
+	u64 ip = 0;
+
+	/*
+	 * The 'return' has already been popped off the stack so the depth has
+	 * to be adjusted to match the 'call'.
+	 */
+	if (thread->ts && sample->flags & PERF_IP_FLAG_RETURN)
+		depth += 1;
+
+	if (sample->flags & (PERF_IP_FLAG_CALL | PERF_IP_FLAG_TRACE_BEGIN)) {
+		if (sample->flags & PERF_IP_FLAG_TRACE_BEGIN)
+			prefix = "strt";
+		else
+			prefix = "call";
+		if (sample_addr_correlates_sym(attr)) {
+			thread__resolve(thread, &addr_al, sample);
+			if (addr_al.sym)
+				name = addr_al.sym->name;
+			else
+				ip = sample->addr;
+		} else {
+			ip = sample->addr;
+		}
+	} else if (sample->flags & (PERF_IP_FLAG_RETURN | PERF_IP_FLAG_TRACE_END)) {
+		if (sample->flags & PERF_IP_FLAG_TRACE_END)
+			prefix = "end";
+		else
+			prefix = "ret";
+		if (al->sym)
+			name = al->sym->name;
+		else
+			ip = sample->ip;
+	}
+
+	if (name)
+		len = printf("%*s%-4s %s", (int)depth * 4, "", prefix, name);
+	else if (prefix)
+		len = printf("%*s%-4s %16" PRIx64, (int)depth * 4, "", prefix, ip);
+
+	if (len < 0)
+		return;
+
+	/*
+	 * Try to keep the output length from changing frequently so that the
+	 * output lines up more nicely.
+	 */
+	if (len > spacing || (len && len < spacing - 52))
+		spacing = round_up(len + 4, 32);
+
+	if (len < spacing)
+		printf("%*s", spacing - len, "");
+}
+
 static void print_sample_bts(struct perf_sample *sample,
 			     struct perf_evsel *evsel,
 			     struct thread *thread,
@@ -2259,6 +2326,9 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 	script.session = session;
 	script__setup_sample_type(&script);
 
+	if (output[PERF_TYPE_HARDWARE].fields & PERF_OUTPUT_CALLINDENT)
+		itrace_synth_opts.thread_stack = true;
+
 	session->itrace_synth_opts = &itrace_synth_opts;
 
 	if (cpu_list) {
diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index 825086aa9a08..d3301529f6a7 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -616,3 +616,10 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
 
 	return err;
 }
+
+size_t thread_stack__depth(struct thread *thread)
+{
+	if (!thread->ts)
+		return 0;
+	return thread->ts->cnt;
+}
diff --git a/tools/perf/util/thread-stack.h b/tools/perf/util/thread-stack.h
index ad44c7944b8e..b7e41c4ebfdd 100644
--- a/tools/perf/util/thread-stack.h
+++ b/tools/perf/util/thread-stack.h
@@ -87,6 +87,7 @@ void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
 			  size_t sz, u64 ip);
 int thread_stack__flush(struct thread *thread);
 void thread_stack__free(struct thread *thread);
+size_t thread_stack__depth(struct thread *thread);
 
 struct call_return_processor *
 call_return_processor__new(int (*process)(struct call_return *cr, void *data),
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC 0/3] perf script: Add callindent option
  2016-06-16 12:34 [PATCH RFC 0/3] perf script: Add callindent option Adrian Hunter
                   ` (2 preceding siblings ...)
  2016-06-16 12:34 ` [PATCH RFC 3/3] perf script: Add callindent option Adrian Hunter
@ 2016-06-16 16:37 ` Arnaldo Carvalho de Melo
  3 siblings, 0 replies; 8+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-06-16 16:37 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Adrian Hunter, Jiri Olsa, linux-kernel, Andi Kleen

Em Thu, Jun 16, 2016 at 03:34:31PM +0300, Adrian Hunter escreveu:
> Hi
> 
> Andi Kleen sent a couple of patches to add a callindent option to
> perf script. If Andi is agreeable, I would like to propose an
> alternative implementation.

Andi?

- Arnaldo
 
> While there are some differences in the resulting output, the main
> differences are:
> 
>  1. Tell the decoder to feed branches to the thread stack, which has the
>  advantage that it happens before branch filtering and so can be used
>  with different itrace options (e.g. it still works when only showing
>  calls, even though the thread stack needs to see calls and returns). Also
>  it does not conflict with using the thread stack to get callchains.
> 
>  2. Print "call" or "ret" instead of using a different event
> 
> 
> Adrian Hunter (3):
>       perf: script: Fix documentation of '-f' when it should be '-F'
>       perf auxtrace: Add option to feed branches to the thread stack
>       perf script: Add callindent option
> 
>  tools/perf/Documentation/perf-script.txt | 26 +++++++-----
>  tools/perf/builtin-script.c              | 70 ++++++++++++++++++++++++++++++++
>  tools/perf/util/auxtrace.h               |  2 +
>  tools/perf/util/intel-bts.c              | 22 +++++++---
>  tools/perf/util/intel-pt.c               |  5 ++-
>  tools/perf/util/thread-stack.c           |  7 ++++
>  tools/perf/util/thread-stack.h           |  1 +
>  7 files changed, 117 insertions(+), 16 deletions(-)
> 
> 
> Regards
> Adrian

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC 1/3] perf: script: Fix documentation of '-f' when it should be '-F'
  2016-06-16 12:34 ` [PATCH RFC 1/3] perf: script: Fix documentation of '-f' when it should be '-F' Adrian Hunter
@ 2016-06-16 19:53   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 8+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-06-16 19:53 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Jiri Olsa, linux-kernel, Andi Kleen

Em Thu, Jun 16, 2016 at 03:34:32PM +0300, Adrian Hunter escreveu:
> The documentation for perf script mixes up '-f' and '-F'. Fix it.

Applied, thanks,

- Arnaldo

> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/Documentation/perf-script.txt | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
> index 4fc44c75263f..4f34379ebd77 100644
> --- a/tools/perf/Documentation/perf-script.txt
> +++ b/tools/perf/Documentation/perf-script.txt
> @@ -119,13 +119,13 @@ OPTIONS
>  	srcline, period, iregs, brstack, brstacksym, flags.
>          Field list can be prepended with the type, trace, sw or hw,
>          to indicate to which event type the field list applies.
> -        e.g., -f sw:comm,tid,time,ip,sym  and -f trace:time,cpu,trace
> +        e.g., -F sw:comm,tid,time,ip,sym  and -F trace:time,cpu,trace
>  
> -		perf script -f <fields>
> +		perf script -F <fields>
>  
>  	is equivalent to:
>  
> -		perf script -f trace:<fields> -f sw:<fields> -f hw:<fields>
> +		perf script -F trace:<fields> -F sw:<fields> -F hw:<fields>
>  
>  	i.e., the specified fields apply to all event types if the type string
>  	is not given.
> @@ -133,9 +133,9 @@ OPTIONS
>  	The arguments are processed in the order received. A later usage can
>  	reset a prior request. e.g.:
>  
> -		-f trace: -f comm,tid,time,ip,sym
> +		-F trace: -F comm,tid,time,ip,sym
>  
> -	The first -f suppresses trace events (field list is ""), but then the
> +	The first -F suppresses trace events (field list is ""), but then the
>  	second invocation sets the fields to comm,tid,time,ip,sym. In this case a
>  	warning is given to the user:
>  
> @@ -143,9 +143,9 @@ OPTIONS
>  
>  	Alternatively, consider the order:
>  
> -		-f comm,tid,time,ip,sym -f trace:
> +		-F comm,tid,time,ip,sym -F trace:
>  
> -	The first -f sets the fields for all events and the second -f
> +	The first -F sets the fields for all events and the second -F
>  	suppresses trace events. The user is given a warning message about
>  	the override, and the result of the above is that only S/W and H/W
>  	events are displayed with the given fields.
> @@ -154,14 +154,14 @@ OPTIONS
>  	event type, a message is displayed to the user that the option is
>  	ignored for that type. For example:
>  
> -		$ perf script -f comm,tid,trace
> +		$ perf script -F comm,tid,trace
>  		'trace' not valid for hardware events. Ignoring.
>  		'trace' not valid for software events. Ignoring.
>  
>  	Alternatively, if the type is given an invalid field is specified it
>  	is an error. For example:
>  
> -        perf script -v -f sw:comm,tid,trace
> +        perf script -v -F sw:comm,tid,trace
>          'trace' not valid for software events.
>  
>  	At this point usage is displayed, and perf-script exits.
> @@ -173,7 +173,7 @@ OPTIONS
>  	respectively.
>  
>  	Finally, a user may not set fields to none for all event types.
> -	i.e., -f "" is not allowed.
> +	i.e., -F "" is not allowed.
>  
>  	The brstack output includes branch related information with raw addresses using the
>  	/v/v/v/v/ syntax in the following order:
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH RFC V2 3/3] perf script: Add callindent option
  2016-06-16 12:34 ` [PATCH RFC 3/3] perf script: Add callindent option Adrian Hunter
@ 2016-06-17  8:22   ` Adrian Hunter
  2016-06-20 23:50     ` Andi Kleen
  0 siblings, 1 reply; 8+ messages in thread
From: Adrian Hunter @ 2016-06-17  8:22 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel, Andi Kleen

Based on patches from Andi Kleen.

When printing PT instruction traces with perf script it is rather useful to
see some indentation for the call tree. This patch adds a new callindent
field to perf script that prints spaces for the function call stack depth.

We already have code to track the function call stack for PT, that we can
reuse with minor modifications.

The resulting output is not quite as nice as ftrace yet, but a lot better
than what was there before.

Note there are some corner cases when the thread stack gets code confused
and prints incorrect indentation. Even with that it is fairly useful.

When displaying kernel code traces it is recommended to run as root, as
otherwise perf doesn't understand the kernel addresses properly, and may
not reset the call stack correctly on kernel boundaries.

Example output:

	sudo perf-with-kcore record eg2 -a -e intel_pt// -- sleep 1
	sudo perf-with-kcore script eg2 --ns -F callindent,time,comm,pid,sym,ip,addr,event,cpu --itrace=cre | less
	...
         swapper     0 [000] 21191.973043157:   branches:     call irq_exit                                                ffffffff8104e880 smp_call_function_single_interrupt+0x30 => ffffffff8107f2b0 irq_exit
         swapper     0 [000] 21191.973043157:   branches:         call idle_cpu                                            ffffffff8107f2f9 irq_exit+0x49 => ffffffff810a5890 idle_cpu
         swapper     0 [000] 21191.973043157:   branches:         ret  idle_cpu                                            ffffffff810a58d7 idle_cpu+0x47 => ffffffff8107f2fe irq_exit
         swapper     0 [000] 21191.973043157:   branches:         call tick_nohz_irq_exit                                  ffffffff8107f34d irq_exit+0x9d => ffffffff810f4850 tick_nohz_irq_exit
         swapper     0 [000] 21191.973043157:   branches:             call __tick_nohz_idle_enter                          ffffffff810f4870 tick_nohz_irq_exit+0x20 => ffffffff810f4160 __tick_nohz_idle_enter
         swapper     0 [000] 21191.973043157:   branches:                 call ktime_get                                   ffffffff810f4181 __tick_nohz_idle_enter+0x21 => ffffffff810eb750 ktime_get
         swapper     0 [000] 21191.973043157:   branches:                     call read_tsc                                ffffffff810eb786 ktime_get+0x36 => ffffffff810362e0 read_tsc
         swapper     0 [000] 21191.973043157:   branches:                     ret  read_tsc                                ffffffff810362f4 read_tsc+0x14 => ffffffff810eb78c ktime_get
         swapper     0 [000] 21191.973043157:   branches:                 ret  ktime_get                                   ffffffff810eb7d6 ktime_get+0x86 => ffffffff810f4186 __tick_nohz_idle_enter
         swapper     0 [000] 21191.973043157:   branches:                 call sched_clock_idle_sleep_event                ffffffff810f419b __tick_nohz_idle_enter+0x3b => ffffffff810a9220 sched_clock_idle_sleep_event
         swapper     0 [000] 21191.973043157:   branches:                     call sched_clock_cpu                         ffffffff810a922b sched_clock_idle_sleep_event+0xb => ffffffff810a9180 sched_clock_cpu
         swapper     0 [000] 21191.973043157:   branches:                         call sched_clock                         ffffffff810a91ed sched_clock_cpu+0x6d => ffffffff810369d0 sched_clock
         swapper     0 [000] 21191.973043157:   branches:                             call native_sched_clock              ffffffff810369d4 sched_clock+0x4 => ffffffff810368c0 native_sched_clock
         swapper     0 [000] 21191.973043157:   branches:                             ret  native_sched_clock              ffffffff8103690c native_sched_clock+0x4c => ffffffff810369d9 sched_clock
         swapper     0 [000] 21191.973043157:   branches:                         ret  sched_clock                         ffffffff810369dc sched_clock+0xc => ffffffff810a91f2 sched_clock_cpu
         swapper     0 [000] 21191.973043157:   branches:                     ret  sched_clock_cpu                         ffffffff810a91f6 sched_clock_cpu+0x76 => ffffffff810a9230 sched_clock_idle_sleep_event
         swapper     0 [000] 21191.973043157:   branches:                 ret  sched_clock_idle_sleep_event                ffffffff810a9231 sched_clock_idle_sleep_event+0x11 => ffffffff810f41a0 __tick_nohz_idle_enter

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---



Changes in V2:

	Original patch had 2 chunks missing



 tools/perf/Documentation/perf-script.txt |  6 +++
 tools/perf/builtin-script.c              | 76 +++++++++++++++++++++++++++++++-
 tools/perf/util/thread-stack.c           |  7 +++
 tools/perf/util/thread-stack.h           |  1 +
 4 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 4f34379ebd77..576a85c16f4c 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -172,6 +172,12 @@ OPTIONS
 	transaction abort, trace begin, trace end, and in transaction,
 	respectively.
 
+	The callindent field is synthesized and may have a value when
+	Instruction Trace decoding. For calls and returns, it will display the
+	name of the symbol indented with spaces to reflect the stack depth. To
+	differentiate: calls are prefixed by "call", returns by "ret", trace
+	start by "strt" and trace end by "end".
+
 	Finally, a user may not set fields to none for all event types.
 	i.e., -F "" is not allowed.
 
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 46011235af5d..de2b176c1535 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -21,6 +21,7 @@
 #include "util/cpumap.h"
 #include "util/thread_map.h"
 #include "util/stat.h"
+#include "util/thread-stack.h"
 #include <linux/bitmap.h>
 #include <linux/stringify.h>
 #include "asm/bug.h"
@@ -63,6 +64,7 @@ enum perf_output_field {
 	PERF_OUTPUT_DATA_SRC	    = 1U << 17,
 	PERF_OUTPUT_WEIGHT	    = 1U << 18,
 	PERF_OUTPUT_BPF_OUTPUT	    = 1U << 19,
+	PERF_OUTPUT_CALLINDENT	    = 1U << 20,
 };
 
 struct output_option {
@@ -89,6 +91,7 @@ struct output_option {
 	{.str = "data_src", .field = PERF_OUTPUT_DATA_SRC},
 	{.str = "weight",   .field = PERF_OUTPUT_WEIGHT},
 	{.str = "bpf-output",   .field = PERF_OUTPUT_BPF_OUTPUT},
+	{.str = "callindent", .field = PERF_OUTPUT_CALLINDENT},
 };
 
 /* default set to maintain compatibility with current format */
@@ -562,6 +565,70 @@ static void print_sample_addr(struct perf_sample *sample,
 	}
 }
 
+static void print_sample_callindent(struct perf_sample *sample,
+				    struct perf_evsel *evsel,
+				    struct thread *thread,
+				    struct addr_location *al)
+{
+	struct perf_event_attr *attr = &evsel->attr;
+	size_t depth = thread_stack__depth(thread);
+	const char *prefix = NULL, *name = NULL;
+	struct addr_location addr_al;
+	static int spacing;
+	int len = 0;
+	u64 ip = 0;
+
+	/*
+	 * The 'return' has already been popped off the stack so the depth has
+	 * to be adjusted to match the 'call'.
+	 */
+	if (thread->ts && sample->flags & PERF_IP_FLAG_RETURN)
+		depth += 1;
+
+	if (sample->flags & (PERF_IP_FLAG_CALL | PERF_IP_FLAG_TRACE_BEGIN)) {
+		if (sample->flags & PERF_IP_FLAG_TRACE_BEGIN)
+			prefix = "strt";
+		else
+			prefix = "call";
+		if (sample_addr_correlates_sym(attr)) {
+			thread__resolve(thread, &addr_al, sample);
+			if (addr_al.sym)
+				name = addr_al.sym->name;
+			else
+				ip = sample->addr;
+		} else {
+			ip = sample->addr;
+		}
+	} else if (sample->flags & (PERF_IP_FLAG_RETURN | PERF_IP_FLAG_TRACE_END)) {
+		if (sample->flags & PERF_IP_FLAG_TRACE_END)
+			prefix = "end";
+		else
+			prefix = "ret";
+		if (al->sym)
+			name = al->sym->name;
+		else
+			ip = sample->ip;
+	}
+
+	if (name)
+		len = printf("%*s%-4s %s", (int)depth * 4, "", prefix, name);
+	else if (prefix)
+		len = printf("%*s%-4s %16" PRIx64, (int)depth * 4, "", prefix, ip);
+
+	if (len < 0)
+		return;
+
+	/*
+	 * Try to keep the output length from changing frequently so that the
+	 * output lines up more nicely.
+	 */
+	if (len > spacing || (len && len < spacing - 52))
+		spacing = round_up(len + 4, 32);
+
+	if (len < spacing)
+		printf("%*s", spacing - len, "");
+}
+
 static void print_sample_bts(struct perf_sample *sample,
 			     struct perf_evsel *evsel,
 			     struct thread *thread,
@@ -570,6 +637,9 @@ static void print_sample_bts(struct perf_sample *sample,
 	struct perf_event_attr *attr = &evsel->attr;
 	bool print_srcline_last = false;
 
+	if (PRINT_FIELD(CALLINDENT))
+		print_sample_callindent(sample, evsel, thread, al);
+
 	/* print branch_from information */
 	if (PRINT_FIELD(IP)) {
 		unsigned int print_opts = output[attr->type].print_ip_opts;
@@ -2020,7 +2090,8 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 		     "comma separated output fields prepend with 'type:'. "
 		     "Valid types: hw,sw,trace,raw. "
 		     "Fields: comm,tid,pid,time,cpu,event,trace,ip,sym,dso,"
-		     "addr,symoff,period,iregs,brstack,brstacksym,flags", parse_output_fields),
+		     "addr,symoff,period,iregs,brstack,brstacksym,flags,"
+		     "callindent", parse_output_fields),
 	OPT_BOOLEAN('a', "all-cpus", &system_wide,
 		    "system-wide collection from all CPUs"),
 	OPT_STRING('S', "symbols", &symbol_conf.sym_list_str, "symbol[,symbol...]",
@@ -2259,6 +2330,9 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 	script.session = session;
 	script__setup_sample_type(&script);
 
+	if (output[PERF_TYPE_HARDWARE].fields & PERF_OUTPUT_CALLINDENT)
+		itrace_synth_opts.thread_stack = true;
+
 	session->itrace_synth_opts = &itrace_synth_opts;
 
 	if (cpu_list) {
diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index 825086aa9a08..d3301529f6a7 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -616,3 +616,10 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
 
 	return err;
 }
+
+size_t thread_stack__depth(struct thread *thread)
+{
+	if (!thread->ts)
+		return 0;
+	return thread->ts->cnt;
+}
diff --git a/tools/perf/util/thread-stack.h b/tools/perf/util/thread-stack.h
index ad44c7944b8e..b7e41c4ebfdd 100644
--- a/tools/perf/util/thread-stack.h
+++ b/tools/perf/util/thread-stack.h
@@ -87,6 +87,7 @@ void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
 			  size_t sz, u64 ip);
 int thread_stack__flush(struct thread *thread);
 void thread_stack__free(struct thread *thread);
+size_t thread_stack__depth(struct thread *thread);
 
 struct call_return_processor *
 call_return_processor__new(int (*process)(struct call_return *cr, void *data),
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC V2 3/3] perf script: Add callindent option
  2016-06-17  8:22   ` [PATCH RFC V2 " Adrian Hunter
@ 2016-06-20 23:50     ` Andi Kleen
  0 siblings, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2016-06-20 23:50 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

On Fri, Jun 17, 2016 at 11:22:56AM +0300, Adrian Hunter wrote:
> Based on patches from Andi Kleen.
> 
> When printing PT instruction traces with perf script it is rather useful to
> see some indentation for the call tree. This patch adds a new callindent
> field to perf script that prints spaces for the function call stack depth.
> 
> We already have code to track the function call stack for PT, that we can
> reuse with minor modifications.
> 
> The resulting output is not quite as nice as ftrace yet, but a lot better
> than what was there before.
> 
> Note there are some corner cases when the thread stack gets code confused
> and prints incorrect indentation. Even with that it is fairly useful.
> 
> When displaying kernel code traces it is recommended to run as root, as
> otherwise perf doesn't understand the kernel addresses properly, and may
> not reset the call stack correctly on kernel boundaries.

I still would prefer to have call returns as event names instead of branches,
and not implicitly in the output, mainly so that it can be disabled if needed,
and give easier to read output.

The rest of the patch looks good to me. Perhaps you could just drop
the prefix for now.

-Andi

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-06-20 23:54 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-16 12:34 [PATCH RFC 0/3] perf script: Add callindent option Adrian Hunter
2016-06-16 12:34 ` [PATCH RFC 1/3] perf: script: Fix documentation of '-f' when it should be '-F' Adrian Hunter
2016-06-16 19:53   ` Arnaldo Carvalho de Melo
2016-06-16 12:34 ` [PATCH RFC 2/3] perf auxtrace: Add option to feed branches to the thread stack Adrian Hunter
2016-06-16 12:34 ` [PATCH RFC 3/3] perf script: Add callindent option Adrian Hunter
2016-06-17  8:22   ` [PATCH RFC V2 " Adrian Hunter
2016-06-20 23:50     ` Andi Kleen
2016-06-16 16:37 ` [PATCH RFC 0/3] " Arnaldo Carvalho de Melo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).