linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] perf tool: Haswell LBR call stack support (user)
@ 2014-11-06 14:58 kan.liang
  2014-11-06 14:58 ` [PATCH 1/2] perf tools: enable LBR call stack support kan.liang
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: kan.liang @ 2014-11-06 14:58 UTC (permalink / raw)
  To: acme, jolsa, a.p.zijlstra, eranian
  Cc: linux-kernel, mingo, paulus, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

This is the user space patch for Haswell LBR call stack support.
For many profiling tasks we need the callgraph. For example we often
need to see the caller of a lock or the caller of a memcpy or other
library function to actually tune the program. Frame pointer unwinding
is efficient and works well. But frame pointers are off by default on
64bit code (and on modern 32bit gccs), so there are many binaries around
that do not use frame pointers. Profiling unchanged production code is
very useful in practice. On some CPUs frame pointer also has a high
cost. Dwarf2 unwinding also does not always work and is extremely slow
(upto 20% overhead).

Haswell has a new feature that utilizes the existing Last Branch Record
facility to record call chains. When the feature is enabled, function
call will be collected as normal, but as return instructions are
executed the last captured branch record is popped from the on-chip LBR
registers. The LBR call stack facility provides an alternative to get
callgraph. It has some limitations too, but should work in most cases
and is significantly faster than dwarf. Frame pointer unwinding is still
the best default, but LBR call stack is a good alternative when nothing
else works.

A new call chain recording option "lbr" is introduced into perf tool for
LBR call stack. The user can use --call-graph lbr to get the call stack
information from hardware.

When profiling bc(1) on Fedora 19:
echo 'scale=2000; 4*a(1)' > cmd; perf record --call-graph lbr bc -l < cmd
If enabling LBR, perf report output looks like:
    50.36%       bc  bc                 [.] bc_divide
                 |
                 --- bc_divide
                     execute
                     run_code
                     yyparse
                     main
                     __libc_start_main
                     _start
    33.66%       bc  bc                 [.] _one_mult
                 |
                 --- _one_mult
                     bc_divide
                     execute
                     run_code
                     yyparse
                     main
                     __libc_start_main
                     _start
     7.62%       bc  bc                 [.] _bc_do_add
                 |
                 --- _bc_do_add
                    |
                    |--99.89%-- 0x2000186a8
                     --0.11%-- [...]
     6.83%       bc  bc                 [.] _bc_do_sub
                 |
                 --- _bc_do_sub
                    |
                    |--99.94%-- bc_add
                    |          execute
                    |          run_code
                    |          yyparse
                    |          main
                    |          __libc_start_main
                    |          _start
                     --0.06%-- [...]
     0.46%       bc  libc-2.17.so       [.] __memset_sse2
                 |
                 --- __memset_sse2
                    |
                    |--54.13%-- bc_new_num
                    |          |
                    |          |--51.00%-- bc_divide
                    |          |          execute
                    |          |          run_code
                    |          |          yyparse
                    |          |          main
                    |          |          __libc_start_main
                    |          |          _start
                    |          |
                    |          |--30.46%-- _bc_do_sub
                    |          |          bc_add
                    |          |          execute
                    |          |          run_code
                    |          |          yyparse
                    |          |          main
                    |          |          __libc_start_main
                    |          |          _start
                    |          |
                    |           --18.55%-- _bc_do_add
                    |                     bc_add
                    |                     execute
                    |                     run_code
                    |                     yyparse
                    |                     main
                    |                     __libc_start_main
                    |                     _start
                    |
                     --45.87%-- bc_divide
                               execute
                               run_code
                               yyparse
                               main
                               __libc_start_main
                               _start
If using FP, perf report output looks like:
echo 'scale=2000; 4*a(1)' > cmd; perf record --call-graph fp bc -l < cmd
    50.49%       bc  bc                 [.] bc_divide
                 |
                 --- bc_divide
    33.57%       bc  bc                 [.] _one_mult
                 |
                 --- _one_mult
     7.61%       bc  bc                 [.] _bc_do_add
                 |
                 --- _bc_do_add
                     0x2000186a8
     6.88%       bc  bc                 [.] _bc_do_sub
                 |
                 --- _bc_do_sub
     0.42%       bc  libc-2.17.so       [.] __memcpy_ssse3_back
                 |
                 --- __memcpy_ssse3_back

The LBR call stack has following known limitations
 - Zero length calls are not filtered out by hardware
 - Exception handing such as setjmp/longjmp will have calls/returns not
   match
 - Pushing different return address onto the stack will have calls/returns
   not match
 - If callstack is deeper than the LBR, only the last entries are captured

Kan Liang (2):
  perf tools: enable LBR call stack support
  perf tools: Construct LBR call chain

 tools/perf/builtin-record.c |   6 +-
 tools/perf/builtin-report.c |   2 +
 tools/perf/util/callchain.c |   8 ++
 tools/perf/util/callchain.h |   1 +
 tools/perf/util/evsel.c     |  15 +++-
 tools/perf/util/machine.c   | 194 +++++++++++++++++++++++++++++---------------
 tools/perf/util/session.c   |  41 ++++++++--
 7 files changed, 190 insertions(+), 77 deletions(-)

-- 
1.8.3.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] perf tools: enable LBR call stack support
  2014-11-06 14:58 [PATCH 0/2] perf tool: Haswell LBR call stack support (user) kan.liang
@ 2014-11-06 14:58 ` kan.liang
  2014-11-12  7:50   ` Jiri Olsa
  2014-11-12  7:50   ` Jiri Olsa
  2014-11-06 14:58 ` [PATCH 2/2] perf tools: Construct LBR call chain kan.liang
  2014-11-10 10:54 ` [PATCH 0/2] perf tool: Haswell LBR call stack support (user) Peter Zijlstra
  2 siblings, 2 replies; 15+ messages in thread
From: kan.liang @ 2014-11-06 14:58 UTC (permalink / raw)
  To: acme, jolsa, a.p.zijlstra, eranian
  Cc: linux-kernel, mingo, paulus, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Currently, there are two call chain recording options, fp and dwarf.
Haswell has a new feature that utilizes the existing LBR facility to
record call chains. So it provides the third options to record call
chain. This patch enables the lbr call stack support.

LBR call stack has some limitations. It reuses current LBR facility, so
LBR call stack and branch record can not be enabled at the same time. It
is only available for user callchain.
However, LBR call stack can work on the user app which doesn't have
frame-pointer or dwarf debug info compiled. It is a good alternative
when nothing else works.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/builtin-record.c |  6 +++---
 tools/perf/builtin-report.c |  2 ++
 tools/perf/util/callchain.c |  8 ++++++++
 tools/perf/util/callchain.h |  1 +
 tools/perf/util/evsel.c     | 15 +++++++++++++--
 5 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 582c4da..e486627 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -639,7 +639,7 @@ error:
 
 static void callchain_debug(void)
 {
-	static const char *str[CALLCHAIN_MAX] = { "NONE", "FP", "DWARF" };
+	static const char *str[CALLCHAIN_MAX] = { "NONE", "FP", "DWARF", "LBR" };
 
 	pr_debug("callchain: type %s\n", str[callchain_param.record_mode]);
 
@@ -725,9 +725,9 @@ static struct record record = {
 #define CALLCHAIN_HELP "setup and enables call-graph (stack chain/backtrace) recording: "
 
 #ifdef HAVE_DWARF_UNWIND_SUPPORT
-const char record_callchain_help[] = CALLCHAIN_HELP "fp dwarf";
+const char record_callchain_help[] = CALLCHAIN_HELP "fp dwarf lbr";
 #else
-const char record_callchain_help[] = CALLCHAIN_HELP "fp";
+const char record_callchain_help[] = CALLCHAIN_HELP "fp lbr";
 #endif
 
 /*
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 140a6cd..43babdb 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -261,6 +261,8 @@ static int report__setup_sample_type(struct report *rep)
 		if ((sample_type & PERF_SAMPLE_REGS_USER) &&
 		    (sample_type & PERF_SAMPLE_STACK_USER))
 			callchain_param.record_mode = CALLCHAIN_DWARF;
+		else if (sample_type & PERF_SAMPLE_BRANCH_STACK)
+			callchain_param.record_mode = CALLCHAIN_LBR;
 		else
 			callchain_param.record_mode = CALLCHAIN_FP;
 	}
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 0022980..72ed39b 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -97,6 +97,14 @@ int parse_callchain_record_opt(const char *arg)
 				callchain_param.dump_size = size;
 			}
 #endif /* HAVE_DWARF_UNWIND_SUPPORT */
+		} else if (!strncmp(name, "lbr", sizeof("lbr"))) {
+			if (!strtok_r(NULL, ",", &saveptr)) {
+				callchain_param.record_mode = CALLCHAIN_LBR;
+				ret = 0;
+			} else
+				pr_err("callchain: No more arguments "
+					"needed for -g lbr\n");
+			break;
 		} else {
 			pr_err("callchain: Unknown --call-graph option "
 			       "value: %s\n", arg);
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 3caccc2..c0d026f 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -11,6 +11,7 @@ enum perf_call_graph_mode {
 	CALLCHAIN_NONE,
 	CALLCHAIN_FP,
 	CALLCHAIN_DWARF,
+	CALLCHAIN_LBR,
 	CALLCHAIN_MAX
 };
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 2f9e680..8c23607 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -537,13 +537,24 @@ int perf_evsel__group_desc(struct perf_evsel *evsel, char *buf, size_t size)
 }
 
 static void
-perf_evsel__config_callgraph(struct perf_evsel *evsel)
+perf_evsel__config_callgraph(struct perf_evsel *evsel,
+			     struct record_opts *opts)
 {
 	bool function = perf_evsel__is_function_event(evsel);
 	struct perf_event_attr *attr = &evsel->attr;
 
 	perf_evsel__set_sample_bit(evsel, CALLCHAIN);
 
+	if (callchain_param.record_mode == CALLCHAIN_LBR) {
+		if (!opts->branch_stack) {
+			perf_evsel__set_sample_bit(evsel, BRANCH_STACK);
+			attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER |
+						PERF_SAMPLE_BRANCH_CALL_STACK;
+			attr->exclude_user = 0;
+		} else
+			 pr_info("Cannot use LBR callstack with branch stack\n");
+	}
+
 	if (callchain_param.record_mode == CALLCHAIN_DWARF) {
 		if (!function) {
 			perf_evsel__set_sample_bit(evsel, REGS_USER);
@@ -659,7 +670,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts)
 	}
 
 	if (callchain_param.enabled && !evsel->no_aux_samples)
-		perf_evsel__config_callgraph(evsel);
+		perf_evsel__config_callgraph(evsel, opts);
 
 	if (target__has_cpu(&opts->target))
 		perf_evsel__set_sample_bit(evsel, CPU);
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] perf tools: Construct LBR call chain
  2014-11-06 14:58 [PATCH 0/2] perf tool: Haswell LBR call stack support (user) kan.liang
  2014-11-06 14:58 ` [PATCH 1/2] perf tools: enable LBR call stack support kan.liang
@ 2014-11-06 14:58 ` kan.liang
  2014-11-12  8:58   ` Jiri Olsa
                     ` (3 more replies)
  2014-11-10 10:54 ` [PATCH 0/2] perf tool: Haswell LBR call stack support (user) Peter Zijlstra
  2 siblings, 4 replies; 15+ messages in thread
From: kan.liang @ 2014-11-06 14:58 UTC (permalink / raw)
  To: acme, jolsa, a.p.zijlstra, eranian
  Cc: linux-kernel, mingo, paulus, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

LBR call stack only has user callchain. It is output as
PERF_SAMPLE_BRANCH_STACK data format. For the kernel callchain, it's
still from PERF_SAMPLE_CALLCHAIN.
The perf tool has to handle both data sources to construct a
complete callstack.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/util/machine.c | 194 ++++++++++++++++++++++++++++++----------------
 tools/perf/util/session.c |  41 ++++++++--
 2 files changed, 163 insertions(+), 72 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 52e9490..f51014f 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1399,18 +1399,70 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
 	return bi;
 }
 
+static inline int __thread__resolve_callchain_sample(struct thread *thread,
+						     u64 ip, u8 *cpumode,
+						     struct symbol **parent,
+						     struct addr_location *root_al,
+						     struct addr_location *al)
+{
+	if (ip >= PERF_CONTEXT_MAX) {
+		switch (ip) {
+		case PERF_CONTEXT_HV:
+			*cpumode = PERF_RECORD_MISC_HYPERVISOR;
+			break;
+		case PERF_CONTEXT_KERNEL:
+			*cpumode = PERF_RECORD_MISC_KERNEL;
+			break;
+		case PERF_CONTEXT_USER:
+			*cpumode = PERF_RECORD_MISC_USER;
+			break;
+		default:
+			pr_debug("invalid callchain context: "
+				 "%"PRId64"\n", (s64) ip);
+			/*
+			 * It seems the callchain is corrupted.
+			 * Discard all.
+			 */
+			callchain_cursor_reset(&callchain_cursor);
+			return 1;
+		}
+		return 0;
+	}
+
+	al->filtered = 0;
+	thread__find_addr_location(thread, *cpumode,
+				   MAP__FUNCTION, ip, al);
+	if (al->sym != NULL) {
+		if (sort__has_parent && !*parent &&
+		    symbol__match_regex(al->sym, &parent_regex))
+			*parent = al->sym;
+		else if (have_ignore_callees && root_al &&
+		  symbol__match_regex(al->sym, &ignore_callees_regex)) {
+			/* Treat this symbol as the root,
+			   forgetting its callees. */
+			*root_al = *al;
+			callchain_cursor_reset(&callchain_cursor);
+		}
+	}
+
+	return callchain_cursor_append(&callchain_cursor,
+				      ip, al->map, al->sym);
+}
+
 static int thread__resolve_callchain_sample(struct thread *thread,
-					     struct ip_callchain *chain,
-					     struct symbol **parent,
-					     struct addr_location *root_al,
-					     int max_stack)
+					    struct perf_evsel *evsel,
+					    struct perf_sample *sample,
+					    struct symbol **parent,
+					    struct addr_location *root_al,
+					    int max_stack)
 {
+	struct ip_callchain *chain = sample->callchain;
 	u8 cpumode = PERF_RECORD_MISC_USER;
 	int chain_nr = min(max_stack, (int)chain->nr);
-	int i;
-	int j;
-	int err;
+	int i, j, err = 0;
 	int skip_idx __maybe_unused;
+	int lbr = 0;
+	u64 ip;
 
 	callchain_cursor_reset(&callchain_cursor);
 
@@ -1419,74 +1471,81 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 		return 0;
 	}
 
-	/*
-	 * Based on DWARF debug information, some architectures skip
-	 * a callchain entry saved by the kernel.
-	 */
-	skip_idx = arch_skip_callchain_idx(thread, chain);
+	if (evsel->attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK)
+		lbr = 1;
 
-	for (i = 0; i < chain_nr; i++) {
-		u64 ip;
-		struct addr_location al;
+again:
+	/* LBR call stack */
+	if (lbr) {
+		struct branch_stack *lbr_stack = sample->branch_stack;
+		int lbr_nr = lbr_stack->nr;
+		int mix_chain_nr;
 
-		if (callchain_param.order == ORDER_CALLEE)
-			j = i;
-		else
-			j = chain->nr - i - 1;
+		for (i = 0; i < chain_nr; i++) {
+			if (chain->ips[i] == PERF_CONTEXT_USER)
+				break;
+		}
 
-#ifdef HAVE_SKIP_CALLCHAIN_IDX
-		if (j == skip_idx)
-			continue;
-#endif
-		ip = chain->ips[j];
+		/* LBR only affects the user callchain */
+		if (i == chain_nr) {
+			lbr = 0;
+			goto again;
+		}
 
-		if (ip >= PERF_CONTEXT_MAX) {
-			switch (ip) {
-			case PERF_CONTEXT_HV:
-				cpumode = PERF_RECORD_MISC_HYPERVISOR;
-				break;
-			case PERF_CONTEXT_KERNEL:
-				cpumode = PERF_RECORD_MISC_KERNEL;
-				break;
-			case PERF_CONTEXT_USER:
-				cpumode = PERF_RECORD_MISC_USER;
-				break;
-			default:
-				pr_debug("invalid callchain context: "
-					 "%"PRId64"\n", (s64) ip);
-				/*
-				 * It seems the callchain is corrupted.
-				 * Discard all.
-				 */
-				callchain_cursor_reset(&callchain_cursor);
-				return 0;
-			}
-			continue;
+		mix_chain_nr = i + 2 + lbr_nr;
+		if (mix_chain_nr > PERF_MAX_STACK_DEPTH) {
+			pr_warning("corrupted callchain. skipping...\n");
+			return 0;
 		}
 
-		al.filtered = 0;
-		thread__find_addr_location(thread, cpumode,
-					   MAP__FUNCTION, ip, &al);
-		if (al.sym != NULL) {
-			if (sort__has_parent && !*parent &&
-			    symbol__match_regex(al.sym, &parent_regex))
-				*parent = al.sym;
-			else if (have_ignore_callees && root_al &&
-			  symbol__match_regex(al.sym, &ignore_callees_regex)) {
-				/* Treat this symbol as the root,
-				   forgetting its callees. */
-				*root_al = al;
-				callchain_cursor_reset(&callchain_cursor);
+		for (j = 0; j < mix_chain_nr; j++) {
+			struct addr_location al;
+
+			if (callchain_param.order == ORDER_CALLEE) {
+				if (j < i + 2)
+					ip = chain->ips[j];
+				else
+					ip = lbr_stack->entries[j - i - 2].from;
+			} else {
+				if (j < lbr_nr)
+					ip = lbr_stack->entries[lbr_nr - j - 1].from;
+				else
+					ip = chain->ips[i + 1 - (j - lbr_nr)];
 			}
+			err = __thread__resolve_callchain_sample(thread,
+				ip, &cpumode, parent, root_al, &al);
+			if (err)
+				goto exit;
 		}
+	} else {
 
-		err = callchain_cursor_append(&callchain_cursor,
-					      ip, al.map, al.sym);
-		if (err)
-			return err;
-	}
+		/*
+		 * Based on DWARF debug information, some architectures skip
+		 * a callchain entry saved by the kernel.
+		 */
+		skip_idx = arch_skip_callchain_idx(thread, chain);
 
-	return 0;
+		for (i = 0; i < chain_nr; i++) {
+			struct addr_location al;
+
+			if (callchain_param.order == ORDER_CALLEE)
+				j = i;
+			else
+				j = chain->nr - i - 1;
+
+#ifdef HAVE_SKIP_CALLCHAIN_IDX
+			if (j == skip_idx)
+				continue;
+#endif
+			ip = chain->ips[j];
+			err = __thread__resolve_callchain_sample(thread,
+				ip, &cpumode, parent, root_al, &al);
+			if (err)
+				goto exit;
+		}
+	}
+exit:
+	return (err < 0) ? err : 0;
 }
 
 static int unwind_entry(struct unwind_entry *entry, void *arg)
@@ -1503,8 +1562,9 @@ int thread__resolve_callchain(struct thread *thread,
 			      struct addr_location *root_al,
 			      int max_stack)
 {
-	int ret = thread__resolve_callchain_sample(thread, sample->callchain,
-						   parent, root_al, max_stack);
+	int ret = thread__resolve_callchain_sample(thread, evsel,
+						   sample, parent,
+						   root_al, max_stack);
 	if (ret)
 		return ret;
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index f4478ce..fa476be 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -557,15 +557,45 @@ int perf_session_queue_event(struct perf_session *s, union perf_event *event,
 	return 0;
 }
 
-static void callchain__printf(struct perf_sample *sample)
+static void callchain__printf(struct perf_evsel *evsel,
+			      struct perf_sample *sample)
 {
 	unsigned int i;
+	u64 total_nr, callchain_nr;
+	int lbr = 0;
 
-	printf("... chain: nr:%" PRIu64 "\n", sample->callchain->nr);
+	total_nr = callchain_nr = sample->callchain->nr;
 
-	for (i = 0; i < sample->callchain->nr; i++)
+	if (evsel->attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK)
+		lbr = 1;
+
+	if (lbr) {
+		struct branch_stack *lbr_stack = sample->branch_stack;
+
+		for (i = 0; i < callchain_nr; i++) {
+			if (sample->callchain->ips[i] == PERF_CONTEXT_USER)
+				break;
+		}
+
+		if (i != callchain_nr) {
+			total_nr = i + 1 + lbr_stack->nr;
+			callchain_nr = i + 1;
+		}
+	}
+
+	printf("... chain: nr:%" PRIu64 "\n", total_nr);
+
+	for (i = 0; i < callchain_nr + 1; i++)
 		printf("..... %2d: %016" PRIx64 "\n",
 		       i, sample->callchain->ips[i]);
+
+	if (total_nr > callchain_nr) {
+		struct branch_stack *lbr_stack = sample->branch_stack;
+
+		for (i = 0; i < lbr_stack->nr; i++)
+			printf("..... %2d: %016" PRIx64 "\n",
+				(int)(i + callchain_nr + 1), lbr_stack->entries[i].from);
+	}
 }
 
 static void branch_stack__printf(struct perf_sample *sample)
@@ -691,9 +721,10 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event,
 	sample_type = evsel->attr.sample_type;
 
 	if (sample_type & PERF_SAMPLE_CALLCHAIN)
-		callchain__printf(sample);
+		callchain__printf(evsel, sample);
 
-	if (sample_type & PERF_SAMPLE_BRANCH_STACK)
+	if ((sample_type & PERF_SAMPLE_BRANCH_STACK) &&
+		!(evsel->attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK))
 		branch_stack__printf(sample);
 
 	if (sample_type & PERF_SAMPLE_REGS_USER)
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] perf tool: Haswell LBR call stack support (user)
  2014-11-06 14:58 [PATCH 0/2] perf tool: Haswell LBR call stack support (user) kan.liang
  2014-11-06 14:58 ` [PATCH 1/2] perf tools: enable LBR call stack support kan.liang
  2014-11-06 14:58 ` [PATCH 2/2] perf tools: Construct LBR call chain kan.liang
@ 2014-11-10 10:54 ` Peter Zijlstra
  2014-11-10 14:08   ` Liang, Kan
  2 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2014-11-10 10:54 UTC (permalink / raw)
  To: kan.liang; +Cc: acme, jolsa, eranian, linux-kernel, mingo, paulus, ak



acme, jolsa, ACK on these two?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 0/2] perf tool: Haswell LBR call stack support (user)
  2014-11-10 10:54 ` [PATCH 0/2] perf tool: Haswell LBR call stack support (user) Peter Zijlstra
@ 2014-11-10 14:08   ` Liang, Kan
  0 siblings, 0 replies; 15+ messages in thread
From: Liang, Kan @ 2014-11-10 14:08 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: acme, jolsa, eranian, linux-kernel, mingo, paulus, ak

 
> acme, jolsa, ACK on these two?

These patches are pure user tool patches. I usually sent the tool
patches to them for review. Also, Jolsa had some comments on
the previous perf tool part. So I would like them have a look
at the new changes of the user tool.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] perf tools: enable LBR call stack support
  2014-11-06 14:58 ` [PATCH 1/2] perf tools: enable LBR call stack support kan.liang
@ 2014-11-12  7:50   ` Jiri Olsa
  2014-11-12  7:50   ` Jiri Olsa
  1 sibling, 0 replies; 15+ messages in thread
From: Jiri Olsa @ 2014-11-12  7:50 UTC (permalink / raw)
  To: kan.liang; +Cc: acme, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak

On Thu, Nov 06, 2014 at 09:58:05AM -0500, kan.liang@intel.com wrote:

SNIP

>  		    (sample_type & PERF_SAMPLE_STACK_USER))
>  			callchain_param.record_mode = CALLCHAIN_DWARF;
> +		else if (sample_type & PERF_SAMPLE_BRANCH_STACK)
> +			callchain_param.record_mode = CALLCHAIN_LBR;
>  		else
>  			callchain_param.record_mode = CALLCHAIN_FP;
>  	}
> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> index 0022980..72ed39b 100644
> --- a/tools/perf/util/callchain.c
> +++ b/tools/perf/util/callchain.c
> @@ -97,6 +97,14 @@ int parse_callchain_record_opt(const char *arg)
>  				callchain_param.dump_size = size;
>  			}
>  #endif /* HAVE_DWARF_UNWIND_SUPPORT */
> +		} else if (!strncmp(name, "lbr", sizeof("lbr"))) {
> +			if (!strtok_r(NULL, ",", &saveptr)) {
> +				callchain_param.record_mode = CALLCHAIN_LBR;
> +				ret = 0;
> +			} else
> +				pr_err("callchain: No more arguments "
> +					"needed for -g lbr\n");

should it be 'needed for --call-graph' option?   ^^^^^^^^^^^

jirka

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] perf tools: enable LBR call stack support
  2014-11-06 14:58 ` [PATCH 1/2] perf tools: enable LBR call stack support kan.liang
  2014-11-12  7:50   ` Jiri Olsa
@ 2014-11-12  7:50   ` Jiri Olsa
  2014-11-12 14:44     ` Liang, Kan
  1 sibling, 1 reply; 15+ messages in thread
From: Jiri Olsa @ 2014-11-12  7:50 UTC (permalink / raw)
  To: kan.liang; +Cc: acme, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak

On Thu, Nov 06, 2014 at 09:58:05AM -0500, kan.liang@intel.com wrote:

SNIP

>  };
>  
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 2f9e680..8c23607 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -537,13 +537,24 @@ int perf_evsel__group_desc(struct perf_evsel *evsel, char *buf, size_t size)
>  }
>  
>  static void
> -perf_evsel__config_callgraph(struct perf_evsel *evsel)
> +perf_evsel__config_callgraph(struct perf_evsel *evsel,
> +			     struct record_opts *opts)
>  {
>  	bool function = perf_evsel__is_function_event(evsel);
>  	struct perf_event_attr *attr = &evsel->attr;
>  
>  	perf_evsel__set_sample_bit(evsel, CALLCHAIN);
>  
> +	if (callchain_param.record_mode == CALLCHAIN_LBR) {
> +		if (!opts->branch_stack) {
> +			perf_evsel__set_sample_bit(evsel, BRANCH_STACK);
> +			attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER |
> +						PERF_SAMPLE_BRANCH_CALL_STACK;
> +			attr->exclude_user = 0;

I think we shouldn't siletly change attr->exclude_user,
if it was defined, we need to display warning that
we are changing that or fail

jirka

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] perf tools: Construct LBR call chain
  2014-11-06 14:58 ` [PATCH 2/2] perf tools: Construct LBR call chain kan.liang
@ 2014-11-12  8:58   ` Jiri Olsa
  2014-11-12  8:58   ` Jiri Olsa
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Jiri Olsa @ 2014-11-12  8:58 UTC (permalink / raw)
  To: kan.liang; +Cc: acme, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak

On Thu, Nov 06, 2014 at 09:58:06AM -0500, kan.liang@intel.com wrote:

SNIP

> +	struct ip_callchain *chain = sample->callchain;
>  	u8 cpumode = PERF_RECORD_MISC_USER;
>  	int chain_nr = min(max_stack, (int)chain->nr);
> -	int i;
> -	int j;
> -	int err;
> +	int i, j, err = 0;
>  	int skip_idx __maybe_unused;
> +	int lbr = 0;
> +	u64 ip;
>  
>  	callchain_cursor_reset(&callchain_cursor);
>  
> @@ -1419,74 +1471,81 @@ static int thread__resolve_callchain_sample(struct thread *thread,
>  		return 0;
>  	}
>  
> -	/*
> -	 * Based on DWARF debug information, some architectures skip
> -	 * a callchain entry saved by the kernel.
> -	 */
> -	skip_idx = arch_skip_callchain_idx(thread, chain);
> +	if (evsel->attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK)
> +		lbr = 1;

please put this in a function.. like

bool has_branch_callstack(evsel)

thanks,
jirka

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] perf tools: Construct LBR call chain
  2014-11-06 14:58 ` [PATCH 2/2] perf tools: Construct LBR call chain kan.liang
  2014-11-12  8:58   ` Jiri Olsa
@ 2014-11-12  8:58   ` Jiri Olsa
  2014-11-12  8:59   ` Jiri Olsa
  2014-11-12 12:33   ` Jiri Olsa
  3 siblings, 0 replies; 15+ messages in thread
From: Jiri Olsa @ 2014-11-12  8:58 UTC (permalink / raw)
  To: kan.liang; +Cc: acme, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak

On Thu, Nov 06, 2014 at 09:58:06AM -0500, kan.liang@intel.com wrote:

SNIP

> +			callchain_nr = i + 1;
> +		}
> +	}
> +
> +	printf("... chain: nr:%" PRIu64 "\n", total_nr);
> +
> +	for (i = 0; i < callchain_nr + 1; i++)
>  		printf("..... %2d: %016" PRIx64 "\n",
>  		       i, sample->callchain->ips[i]);
> +
> +	if (total_nr > callchain_nr) {
> +		struct branch_stack *lbr_stack = sample->branch_stack;
> +
> +		for (i = 0; i < lbr_stack->nr; i++)
> +			printf("..... %2d: %016" PRIx64 "\n",
> +				(int)(i + callchain_nr + 1), lbr_stack->entries[i].from);

could u describe in comment why only ->from entries are interesting?

thanks,
jirka

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] perf tools: Construct LBR call chain
  2014-11-06 14:58 ` [PATCH 2/2] perf tools: Construct LBR call chain kan.liang
  2014-11-12  8:58   ` Jiri Olsa
  2014-11-12  8:58   ` Jiri Olsa
@ 2014-11-12  8:59   ` Jiri Olsa
  2014-11-12 14:37     ` Liang, Kan
  2014-11-12 12:33   ` Jiri Olsa
  3 siblings, 1 reply; 15+ messages in thread
From: Jiri Olsa @ 2014-11-12  8:59 UTC (permalink / raw)
  To: kan.liang; +Cc: acme, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak

On Thu, Nov 06, 2014 at 09:58:06AM -0500, kan.liang@intel.com wrote:

SNIP

>  
> -static void callchain__printf(struct perf_sample *sample)
> +static void callchain__printf(struct perf_evsel *evsel,
> +			      struct perf_sample *sample)
>  {
>  	unsigned int i;
> +	u64 total_nr, callchain_nr;
> +	int lbr = 0;
>  
> -	printf("... chain: nr:%" PRIu64 "\n", sample->callchain->nr);
> +	total_nr = callchain_nr = sample->callchain->nr;
>  
> -	for (i = 0; i < sample->callchain->nr; i++)
> +	if (evsel->attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK)
> +		lbr = 1;
> +
> +	if (lbr) {
> +		struct branch_stack *lbr_stack = sample->branch_stack;
> +
> +		for (i = 0; i < callchain_nr; i++) {
> +			if (sample->callchain->ips[i] == PERF_CONTEXT_USER)
> +				break;
> +		}
> +
> +		if (i != callchain_nr) {
> +			total_nr = i + 1 + lbr_stack->nr;
> +			callchain_nr = i + 1;
> +		}
> +	}
> +
> +	printf("... chain: nr:%" PRIu64 "\n", total_nr);
> +
> +	for (i = 0; i < callchain_nr + 1; i++)
>  		printf("..... %2d: %016" PRIx64 "\n",
>  		       i, sample->callchain->ips[i]);

so if there's lbr callstack info we dont display user stack
part from standard callchain? I think the dump code should
dump out all the info..

jirka

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] perf tools: Construct LBR call chain
  2014-11-06 14:58 ` [PATCH 2/2] perf tools: Construct LBR call chain kan.liang
                     ` (2 preceding siblings ...)
  2014-11-12  8:59   ` Jiri Olsa
@ 2014-11-12 12:33   ` Jiri Olsa
  3 siblings, 0 replies; 15+ messages in thread
From: Jiri Olsa @ 2014-11-12 12:33 UTC (permalink / raw)
  To: kan.liang; +Cc: acme, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak

On Thu, Nov 06, 2014 at 09:58:06AM -0500, kan.liang@intel.com wrote:
> From: Kan Liang <kan.liang@intel.com>
> 

SNIP

> +	/* LBR call stack */
> +	if (lbr) {
> +		struct branch_stack *lbr_stack = sample->branch_stack;
> +		int lbr_nr = lbr_stack->nr;
> +		int mix_chain_nr;
>  
> -		if (callchain_param.order == ORDER_CALLEE)
> -			j = i;
> -		else
> -			j = chain->nr - i - 1;
> +		for (i = 0; i < chain_nr; i++) {
> +			if (chain->ips[i] == PERF_CONTEXT_USER)
> +				break;
> +		}
>  
> -#ifdef HAVE_SKIP_CALLCHAIN_IDX
> -		if (j == skip_idx)
> -			continue;
> -#endif
> -		ip = chain->ips[j];
> +		/* LBR only affects the user callchain */
> +		if (i == chain_nr) {
> +			lbr = 0;
> +			goto again;
> +		}

'goto again' sounds like u want to try something again,
but you prepare the condition already with lbr = 0,
could you please restruct the code in another way?


SNIP

> +				goto exit;
>  		}
> +	} else {
>  
> -		err = callchain_cursor_append(&callchain_cursor,
> -					      ip, al.map, al.sym);
> -		if (err)
> -			return err;
> -	}
> +		/*
> +		 * Based on DWARF debug information, some architectures skip
> +		 * a callchain entry saved by the kernel.
> +		 */
> +		skip_idx = arch_skip_callchain_idx(thread, chain);
>  
> -	return 0;
> +		for (i = 0; i < chain_nr; i++) {
> +			struct addr_location al;
> +
> +			if (callchain_param.order == ORDER_CALLEE)
> +				j = i;
> +			else
> +				j = chain->nr - i - 1;
> +
> +#ifdef HAVE_SKIP_CALLCHAIN_IDX
> +			if (j == skip_idx)
> +				continue;
> +#endif
> +			ip = chain->ips[j];
> +			err = __thread__resolve_callchain_sample(thread,

Uou factored out the common functionality into __thread__resolve_callchain_sample
function and added your own functionality.. could you please split this into 2
separate patches? (first the new function, then your change)
IMO It'll make the change simple and more obvious.

thanks,
jirka

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 2/2] perf tools: Construct LBR call chain
  2014-11-12  8:59   ` Jiri Olsa
@ 2014-11-12 14:37     ` Liang, Kan
  2014-11-12 15:05       ` Peter Zijlstra
  2014-11-12 18:31       ` Jiri Olsa
  0 siblings, 2 replies; 15+ messages in thread
From: Liang, Kan @ 2014-11-12 14:37 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: acme, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak



> > +
> > +	printf("... chain: nr:%" PRIu64 "\n", total_nr);
> > +
> > +	for (i = 0; i < callchain_nr + 1; i++)
> >  		printf("..... %2d: %016" PRIx64 "\n",
> >  		       i, sample->callchain->ips[i]);
> 
> so if there's lbr callstack info we dont display user stack part from standard
> callchain? I think the dump code should dump out all the info..
> 
 
Right, we don't display user stack part from fp if there is lbr callstack info.
The lbr callstack info can only be captured when the user set --call-graph
lbr. If --call-graph is set to fp and dwarf, there will be no lbr callstack info.

If the user set lbr, I think he really want the lbr info. So I think if we display
both lbr and fp, the fp chain might be meaningless and it will confuse them.
If the user want to do compare, they can do perf record twice with 
different --call-graph.


Thanks,
Kan

> jirka

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 1/2] perf tools: enable LBR call stack support
  2014-11-12  7:50   ` Jiri Olsa
@ 2014-11-12 14:44     ` Liang, Kan
  0 siblings, 0 replies; 15+ messages in thread
From: Liang, Kan @ 2014-11-12 14:44 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: acme, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak



> PERF_SAMPLE_BRANCH_USER |
> > +
> 	PERF_SAMPLE_BRANCH_CALL_STACK;
> > +			attr->exclude_user = 0;
> 
> I think we shouldn't siletly change attr->exclude_user, if it was defined, we
> need to display warning that we are changing that or fail
> 

Right, I will display a warning here.

Thanks,
Kan

> jirka

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] perf tools: Construct LBR call chain
  2014-11-12 14:37     ` Liang, Kan
@ 2014-11-12 15:05       ` Peter Zijlstra
  2014-11-12 18:31       ` Jiri Olsa
  1 sibling, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2014-11-12 15:05 UTC (permalink / raw)
  To: Liang, Kan; +Cc: Jiri Olsa, acme, eranian, linux-kernel, mingo, paulus, ak

On Wed, Nov 12, 2014 at 02:37:13PM +0000, Liang, Kan wrote:
> If the user set lbr, I think he really want the lbr info. So I think if we display
> both lbr and fp, the fp chain might be meaningless and it will confuse them.
> If the user want to do compare, they can do perf record twice with 
> different --call-graph.

Or fix the tool to do both. Having both from the exact same context is
far better to compare the quality of the actual backtraces.

But that is not something we have to implement now; but the kernel
interface does allow it.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] perf tools: Construct LBR call chain
  2014-11-12 14:37     ` Liang, Kan
  2014-11-12 15:05       ` Peter Zijlstra
@ 2014-11-12 18:31       ` Jiri Olsa
  1 sibling, 0 replies; 15+ messages in thread
From: Jiri Olsa @ 2014-11-12 18:31 UTC (permalink / raw)
  To: Liang, Kan; +Cc: acme, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak

On Wed, Nov 12, 2014 at 02:37:13PM +0000, Liang, Kan wrote:
> 
> 
> > > +
> > > +	printf("... chain: nr:%" PRIu64 "\n", total_nr);
> > > +
> > > +	for (i = 0; i < callchain_nr + 1; i++)
> > >  		printf("..... %2d: %016" PRIx64 "\n",
> > >  		       i, sample->callchain->ips[i]);
> > 
> > so if there's lbr callstack info we dont display user stack part from standard
> > callchain? I think the dump code should dump out all the info..
> > 
>  
> Right, we don't display user stack part from fp if there is lbr callstack info.
> The lbr callstack info can only be captured when the user set --call-graph
> lbr. If --call-graph is set to fp and dwarf, there will be no lbr callstack info.
> 
> If the user set lbr, I think he really want the lbr info. So I think if we display
> both lbr and fp, the fp chain might be meaningless and it will confuse them.
> If the user want to do compare, they can do perf record twice with 
> different --call-graph.

hum, IMO if the user wants the dump (report -D), she wants to see
everything she got from kernel

jirka

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-11-12 18:32 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-06 14:58 [PATCH 0/2] perf tool: Haswell LBR call stack support (user) kan.liang
2014-11-06 14:58 ` [PATCH 1/2] perf tools: enable LBR call stack support kan.liang
2014-11-12  7:50   ` Jiri Olsa
2014-11-12  7:50   ` Jiri Olsa
2014-11-12 14:44     ` Liang, Kan
2014-11-06 14:58 ` [PATCH 2/2] perf tools: Construct LBR call chain kan.liang
2014-11-12  8:58   ` Jiri Olsa
2014-11-12  8:58   ` Jiri Olsa
2014-11-12  8:59   ` Jiri Olsa
2014-11-12 14:37     ` Liang, Kan
2014-11-12 15:05       ` Peter Zijlstra
2014-11-12 18:31       ` Jiri Olsa
2014-11-12 12:33   ` Jiri Olsa
2014-11-10 10:54 ` [PATCH 0/2] perf tool: Haswell LBR call stack support (user) Peter Zijlstra
2014-11-10 14:08   ` Liang, Kan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).