linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V3 0/3] perf tool: Haswell LBR call stack support (user)
@ 2014-11-14 13:44 kan.liang
  2014-11-14 13:44 ` [PATCH V3 1/3] perf tools: enable LBR call stack support kan.liang
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: kan.liang @ 2014-11-14 13:44 UTC (permalink / raw)
  To: acme, jolsa, a.p.zijlstra, eranian
  Cc: linux-kernel, mingo, paulus, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

This is the user space patch for Haswell LBR call stack support.
For many profiling tasks we need the callgraph. For example we often
need to see the caller of a lock or the caller of a memcpy or other
library function to actually tune the program. Frame pointer unwinding
is efficient and works well. But frame pointers are off by default on
64bit code (and on modern 32bit gccs), so there are many binaries around
that do not use frame pointers. Profiling unchanged production code is
very useful in practice. On some CPUs frame pointer also has a high
cost. Dwarf2 unwinding also does not always work and is extremely slow
(upto 20% overhead).

Haswell has a new feature that utilizes the existing Last Branch Record
facility to record call chains. When the feature is enabled, function
call will be collected as normal, but as return instructions are
executed the last captured branch record is popped from the on-chip LBR
registers. The LBR call stack facility provides an alternative to get
callgraph. It has some limitations too, but should work in most cases
and is significantly faster than dwarf. Frame pointer unwinding is still
the best default, but LBR call stack is a good alternative when nothing
else works.

A new call chain recording option "lbr" is introduced into perf tool for
LBR call stack. The user can use --call-graph lbr to get the call stack
information from hardware.

When profiling bc(1) on Fedora 19:
echo 'scale=2000; 4*a(1)' > cmd; perf record --call-graph lbr bc -l < cmd
If enabling LBR, perf report output looks like:
    50.36%       bc  bc                 [.] bc_divide
                 |
                 --- bc_divide
                     execute
                     run_code
                     yyparse
                     main
                     __libc_start_main
                     _start
    33.66%       bc  bc                 [.] _one_mult
                 |
                 --- _one_mult
                     bc_divide
                     execute
                     run_code
                     yyparse
                     main
                     __libc_start_main
                     _start
     7.62%       bc  bc                 [.] _bc_do_add
                 |
                 --- _bc_do_add
                    |
                    |--99.89%-- 0x2000186a8
                     --0.11%-- [...]
     6.83%       bc  bc                 [.] _bc_do_sub
                 |
                 --- _bc_do_sub
                    |
                    |--99.94%-- bc_add
                    |          execute
                    |          run_code
                    |          yyparse
                    |          main
                    |          __libc_start_main
                    |          _start
                     --0.06%-- [...]
     0.46%       bc  libc-2.17.so       [.] __memset_sse2
                 |
                 --- __memset_sse2
                    |
                    |--54.13%-- bc_new_num
                    |          |
                    |          |--51.00%-- bc_divide
                    |          |          execute
                    |          |          run_code
                    |          |          yyparse
                    |          |          main
                    |          |          __libc_start_main
                    |          |          _start
                    |          |
                    |          |--30.46%-- _bc_do_sub
                    |          |          bc_add
                    |          |          execute
                    |          |          run_code
                    |          |          yyparse
                    |          |          main
                    |          |          __libc_start_main
                    |          |          _start
                    |          |
                    |           --18.55%-- _bc_do_add
                    |                     bc_add
                    |                     execute
                    |                     run_code
                    |                     yyparse
                    |                     main
                    |                     __libc_start_main
                    |                     _start
                    |
                     --45.87%-- bc_divide
                               execute
                               run_code
                               yyparse
                               main
                               __libc_start_main
                               _start
If using FP, perf report output looks like:
echo 'scale=2000; 4*a(1)' > cmd; perf record --call-graph fp bc -l < cmd
    50.49%       bc  bc                 [.] bc_divide
                 |
                 --- bc_divide
    33.57%       bc  bc                 [.] _one_mult
                 |
                 --- _one_mult
     7.61%       bc  bc                 [.] _bc_do_add
                 |
                 --- _bc_do_add
                     0x2000186a8
     6.88%       bc  bc                 [.] _bc_do_sub
                 |
                 --- _bc_do_sub
     0.42%       bc  libc-2.17.so       [.] __memcpy_ssse3_back
                 |
                 --- __memcpy_ssse3_back

If using LBR, perf report -D output looks like:
11739295893248 0x4d0 [0xe0]: PERF_RECORD_SAMPLE(IP, 0x2): 10505/10505:
0x40054d period: 39255 addr: 0
... LBR call chain: nr:7
.....  0: fffffffffffffe00
.....  1: 0000000000400540
.....  2: 0000000000400587
.....  3: 00000000004005b3
.....  4: 00000000004005ef
.....  5: 0000003d1cc21b43
.....  6: 0000000000400474
... FP chain: nr:6
.....  0: fffffffffffffe00
.....  1: 000000000040054d
.....  2: 000000000040058c
.....  3: 00000000004005b8
.....  4: 00000000004005f4
.....  5: 0000003d1cc21b45
 ... thread: a.out:10505
 ...... dso: /home/lk/a.out


The LBR call stack has following known limitations
 - Zero length calls are not filtered out by hardware
 - Exception handing such as setjmp/longjmp will have calls/returns not
   match
 - Pushing different return address onto the stack will have calls/returns
   not match
 - If callstack is deeper than the LBR, only the last entries are captured

Changes since v1
 - Update help document
 - Force exclude_user to 0 with warning in LBR call stack
 - Dump both lbr and fp info when report -D
 - Reconstruct thread__resolve_callchain_sample and split it into two patches
 - Use has_branch_callstack function to check LBR call stack available

Changes since v2
 - Rebase to 025ce5d33373

Kan Liang (3):
  perf tools: enable LBR call stack support
  perf tool: Move cpumode resolve code to add_callchain_ip
  perf tools: Construct LBR call chain

 tools/perf/Documentation/perf-record.txt |   8 +-
 tools/perf/builtin-record.c              |   6 +-
 tools/perf/builtin-report.c              |   2 +
 tools/perf/util/callchain.c              |  10 ++-
 tools/perf/util/callchain.h              |   1 +
 tools/perf/util/evsel.c                  |  21 ++++-
 tools/perf/util/evsel.h                  |   4 +
 tools/perf/util/machine.c                | 134 +++++++++++++++++++++----------
 tools/perf/util/session.c                |  60 ++++++++++++--
 9 files changed, 192 insertions(+), 54 deletions(-)

-- 
1.8.3.2


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH V3 1/3] perf tools: enable LBR call stack support
  2014-11-14 13:44 [PATCH V3 0/3] perf tool: Haswell LBR call stack support (user) kan.liang
@ 2014-11-14 13:44 ` kan.liang
  2014-11-18  5:54   ` Namhyung Kim
  2014-11-14 13:44 ` [PATCH V3 2/3] perf tool: Move cpumode resolve code to add_callchain_ip kan.liang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: kan.liang @ 2014-11-14 13:44 UTC (permalink / raw)
  To: acme, jolsa, a.p.zijlstra, eranian
  Cc: linux-kernel, mingo, paulus, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Currently, there are two call chain recording options, fp and dwarf.
Haswell has a new feature that utilizes the existing LBR facility to
record call chains. So it provides the third options to record call
chain. This patch enables the lbr call stack support.

LBR call stack has some limitations. It reuses current LBR facility, so
LBR call stack and branch record can not be enabled at the same time. It
is only available for user callchain.
However, LBR call stack can work on the user app which doesn't have
frame-pointer or dwarf debug info compiled. It is a good alternative
when nothing else works.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/Documentation/perf-record.txt |  8 +++++++-
 tools/perf/builtin-record.c              |  6 +++---
 tools/perf/builtin-report.c              |  2 ++
 tools/perf/util/callchain.c              | 10 +++++++++-
 tools/perf/util/callchain.h              |  1 +
 tools/perf/util/evsel.c                  | 21 +++++++++++++++++++--
 6 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 398f8d5..03d9939 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -100,13 +100,19 @@ OPTIONS
 	implies -g.
 
 	Allows specifying "fp" (frame pointer) or "dwarf"
-	(DWARF's CFI - Call Frame Information) as the method to collect
+	(DWARF's CFI - Call Frame Information) or "lbr"
+	(Hardware Last Branch Record facility) as the method to collect
 	the information used to show the call graphs.
 
 	In some systems, where binaries are build with gcc
 	--fomit-frame-pointer, using the "fp" method will produce bogus
 	call graphs, using "dwarf", if available (perf tools linked to
 	the libunwind library) should be used instead.
+	Using the "lbr" method doesn't require any compiler options. It
+	will produce call graphs from the hardware LBR registers. The
+	main limition is that it is only available on new Intel
+	platforms, such as Haswell. It can only get user call chain. It
+	doesn't work with branch stack sampling at the same time.
 
 -q::
 --quiet::
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 582c4da..e486627 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -639,7 +639,7 @@ error:
 
 static void callchain_debug(void)
 {
-	static const char *str[CALLCHAIN_MAX] = { "NONE", "FP", "DWARF" };
+	static const char *str[CALLCHAIN_MAX] = { "NONE", "FP", "DWARF", "LBR" };
 
 	pr_debug("callchain: type %s\n", str[callchain_param.record_mode]);
 
@@ -725,9 +725,9 @@ static struct record record = {
 #define CALLCHAIN_HELP "setup and enables call-graph (stack chain/backtrace) recording: "
 
 #ifdef HAVE_DWARF_UNWIND_SUPPORT
-const char record_callchain_help[] = CALLCHAIN_HELP "fp dwarf";
+const char record_callchain_help[] = CALLCHAIN_HELP "fp dwarf lbr";
 #else
-const char record_callchain_help[] = CALLCHAIN_HELP "fp";
+const char record_callchain_help[] = CALLCHAIN_HELP "fp lbr";
 #endif
 
 /*
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 140a6cd..43babdb 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -261,6 +261,8 @@ static int report__setup_sample_type(struct report *rep)
 		if ((sample_type & PERF_SAMPLE_REGS_USER) &&
 		    (sample_type & PERF_SAMPLE_STACK_USER))
 			callchain_param.record_mode = CALLCHAIN_DWARF;
+		else if (sample_type & PERF_SAMPLE_BRANCH_STACK)
+			callchain_param.record_mode = CALLCHAIN_LBR;
 		else
 			callchain_param.record_mode = CALLCHAIN_FP;
 	}
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 38da69c..138c5d6 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -77,7 +77,7 @@ int parse_callchain_record_opt(const char *arg)
 				ret = 0;
 			} else
 				pr_err("callchain: No more arguments "
-				       "needed for -g fp\n");
+				       "needed for --call-graph fp\n");
 			break;
 
 #ifdef HAVE_DWARF_UNWIND_SUPPORT
@@ -97,6 +97,14 @@ int parse_callchain_record_opt(const char *arg)
 				callchain_param.dump_size = size;
 			}
 #endif /* HAVE_DWARF_UNWIND_SUPPORT */
+		} else if (!strncmp(name, "lbr", sizeof("lbr"))) {
+			if (!strtok_r(NULL, ",", &saveptr)) {
+				callchain_param.record_mode = CALLCHAIN_LBR;
+				ret = 0;
+			} else
+				pr_err("callchain: No more arguments "
+					"needed for --call-graph lbr\n");
+			break;
 		} else {
 			pr_err("callchain: Unknown --call-graph option "
 			       "value: %s\n", arg);
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 3e1ed15..a60a7b5 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -11,6 +11,7 @@ enum perf_call_graph_mode {
 	CALLCHAIN_NONE,
 	CALLCHAIN_FP,
 	CALLCHAIN_DWARF,
+	CALLCHAIN_LBR,
 	CALLCHAIN_MAX
 };
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 12b4396..7cbe2e9 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -537,13 +537,30 @@ int perf_evsel__group_desc(struct perf_evsel *evsel, char *buf, size_t size)
 }
 
 static void
-perf_evsel__config_callgraph(struct perf_evsel *evsel)
+perf_evsel__config_callgraph(struct perf_evsel *evsel,
+			     struct record_opts *opts)
 {
 	bool function = perf_evsel__is_function_event(evsel);
 	struct perf_event_attr *attr = &evsel->attr;
 
 	perf_evsel__set_sample_bit(evsel, CALLCHAIN);
 
+	if (callchain_param.record_mode == CALLCHAIN_LBR) {
+		if (!opts->branch_stack) {
+			perf_evsel__set_sample_bit(evsel, BRANCH_STACK);
+			attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER |
+						PERF_SAMPLE_BRANCH_CALL_STACK;
+			if (attr->exclude_user) {
+				attr->exclude_user = 0;
+
+				pr_warning("LBR callstack option is only available"
+					   " to get user callchain information."
+					   " Force exclude_user to 0.\n");
+			}
+		} else
+			 pr_info("Cannot use LBR callstack with branch stack\n");
+	}
+
 	if (callchain_param.record_mode == CALLCHAIN_DWARF) {
 		if (!function) {
 			perf_evsel__set_sample_bit(evsel, REGS_USER);
@@ -659,7 +676,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts)
 	}
 
 	if (callchain_param.enabled && !evsel->no_aux_samples)
-		perf_evsel__config_callgraph(evsel);
+		perf_evsel__config_callgraph(evsel, opts);
 
 	if (target__has_cpu(&opts->target))
 		perf_evsel__set_sample_bit(evsel, CPU);
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH V3 2/3] perf tool: Move cpumode resolve code to add_callchain_ip
  2014-11-14 13:44 [PATCH V3 0/3] perf tool: Haswell LBR call stack support (user) kan.liang
  2014-11-14 13:44 ` [PATCH V3 1/3] perf tools: enable LBR call stack support kan.liang
@ 2014-11-14 13:44 ` kan.liang
  2014-11-17 13:57   ` Jiri Olsa
                     ` (2 more replies)
  2014-11-14 13:44 ` [PATCH V3 3/3] perf tools: Construct LBR call chain kan.liang
  2014-11-17 16:01 ` [PATCH V3 0/3] perf tool: Haswell LBR call stack support (user) Jiri Olsa
  3 siblings, 3 replies; 25+ messages in thread
From: kan.liang @ 2014-11-14 13:44 UTC (permalink / raw)
  To: acme, jolsa, a.p.zijlstra, eranian
  Cc: linux-kernel, mingo, paulus, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Move the cpumode resolve code to add_callchain_ip function.
No change in behavior.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/util/machine.c | 62 ++++++++++++++++++++++-------------------------
 1 file changed, 29 insertions(+), 33 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index d97309c..dd8496a 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1384,11 +1384,35 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
 static int add_callchain_ip(struct thread *thread,
 			    struct symbol **parent,
 			    struct addr_location *root_al,
-			    int cpumode,
+			    u8 cpumode,
 			    u64 ip)
 {
 	struct addr_location al;
 
+	if (ip >= PERF_CONTEXT_MAX) {
+		switch (ip) {
+		case PERF_CONTEXT_HV:
+			cpumode = PERF_RECORD_MISC_HYPERVISOR;
+			break;
+		case PERF_CONTEXT_KERNEL:
+			cpumode = PERF_RECORD_MISC_KERNEL;
+			break;
+		case PERF_CONTEXT_USER:
+			cpumode = PERF_RECORD_MISC_USER;
+			break;
+		default:
+			pr_debug("invalid callchain context: "
+				 "%"PRId64"\n", (s64) ip);
+			/*
+			 * It seems the callchain is corrupted.
+			 * Discard all.
+			 */
+			callchain_cursor_reset(&callchain_cursor);
+			return 1;
+		}
+		return 0;	
+	}
+
 	al.filtered = 0;
 	al.sym = NULL;
 	thread__find_addr_location(thread, cpumode, MAP__FUNCTION,
@@ -1435,9 +1459,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 {
 	u8 cpumode = PERF_RECORD_MISC_USER;
 	int chain_nr = min(max_stack, (int)chain->nr);
-	int i;
-	int j;
-	int err;
+	int i, j, err = 0;
 	int skip_idx __maybe_unused;
 
 	callchain_cursor_reset(&callchain_cursor);
@@ -1467,39 +1489,13 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 #endif
 		ip = chain->ips[j];
 
-		if (ip >= PERF_CONTEXT_MAX) {
-			switch (ip) {
-			case PERF_CONTEXT_HV:
-				cpumode = PERF_RECORD_MISC_HYPERVISOR;
-				break;
-			case PERF_CONTEXT_KERNEL:
-				cpumode = PERF_RECORD_MISC_KERNEL;
-				break;
-			case PERF_CONTEXT_USER:
-				cpumode = PERF_RECORD_MISC_USER;
-				break;
-			default:
-				pr_debug("invalid callchain context: "
-					 "%"PRId64"\n", (s64) ip);
-				/*
-				 * It seems the callchain is corrupted.
-				 * Discard all.
-				 */
-				callchain_cursor_reset(&callchain_cursor);
-				return 0;
-			}
-			continue;
-		}
-
 		err = add_callchain_ip(thread, parent, root_al,
 				       cpumode, ip);
-		if (err == -EINVAL)
-			break;
 		if (err)
-			return err;
+			goto exit;
 	}
-
-	return 0;
+exit:
+	return (err < 0) ? err : 0;
 }
 
 static int unwind_entry(struct unwind_entry *entry, void *arg)
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH V3 3/3] perf tools: Construct LBR call chain
  2014-11-14 13:44 [PATCH V3 0/3] perf tool: Haswell LBR call stack support (user) kan.liang
  2014-11-14 13:44 ` [PATCH V3 1/3] perf tools: enable LBR call stack support kan.liang
  2014-11-14 13:44 ` [PATCH V3 2/3] perf tool: Move cpumode resolve code to add_callchain_ip kan.liang
@ 2014-11-14 13:44 ` kan.liang
  2014-11-17 15:54   ` Jiri Olsa
                     ` (2 more replies)
  2014-11-17 16:01 ` [PATCH V3 0/3] perf tool: Haswell LBR call stack support (user) Jiri Olsa
  3 siblings, 3 replies; 25+ messages in thread
From: kan.liang @ 2014-11-14 13:44 UTC (permalink / raw)
  To: acme, jolsa, a.p.zijlstra, eranian
  Cc: linux-kernel, mingo, paulus, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

LBR call stack only has user callchain. It is output as
PERF_SAMPLE_BRANCH_STACK data format. For the kernel callchain, it's
still from PERF_SAMPLE_CALLCHAIN.
The perf tool has to handle both data sources to construct a
complete callstack.
For perf report -D option, both lbr and fp information will be
displayed.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/util/evsel.h   |  4 +++
 tools/perf/util/machine.c | 72 +++++++++++++++++++++++++++++++++++++++++------
 tools/perf/util/session.c | 60 +++++++++++++++++++++++++++++++++++----
 3 files changed, 122 insertions(+), 14 deletions(-)

diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 9797909..1bbaa74 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -368,4 +368,8 @@ for ((_evsel) = list_entry((_leader)->node.next, struct perf_evsel, node); 	\
      (_evsel) && (_evsel)->leader == (_leader);					\
      (_evsel) = list_entry((_evsel)->node.next, struct perf_evsel, node))
 
+static inline bool has_branch_callstack(struct perf_evsel *evsel)
+{
+	return evsel->attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK;
+}
 #endif /* __PERF_EVSEL_H */
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index dd8496a..a26c9e0 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1452,15 +1452,19 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
 }
 
 static int thread__resolve_callchain_sample(struct thread *thread,
-					     struct ip_callchain *chain,
-					     struct symbol **parent,
-					     struct addr_location *root_al,
-					     int max_stack)
+					    struct perf_evsel *evsel,
+					    struct perf_sample *sample,
+					    struct symbol **parent,
+					    struct addr_location *root_al,
+					    int max_stack)
 {
+	struct ip_callchain *chain = sample->callchain;
 	u8 cpumode = PERF_RECORD_MISC_USER;
 	int chain_nr = min(max_stack, (int)chain->nr);
 	int i, j, err = 0;
 	int skip_idx __maybe_unused;
+	u64 ip;
+	bool lbr = has_branch_callstack(evsel);
 
 	callchain_cursor_reset(&callchain_cursor);
 
@@ -1469,6 +1473,59 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 		return 0;
 	}
 
+	if (lbr) {
+		for (i = 0; i < chain_nr; i++) {
+			if (chain->ips[i] == PERF_CONTEXT_USER)
+				break;
+		}
+
+		/* LBR only affects the user callchain */
+		if (i != chain_nr) {
+			struct branch_stack *lbr_stack = sample->branch_stack;
+			int lbr_nr = lbr_stack->nr;
+			/*
+			 * LBR callstack can only get user call chain.
+			 * The mix_chain_nr is kernel call chain
+			 * number plus LBR user call chain number.
+			 * i is kernel call chain number,
+			 * 1 is PERF_CONTEXT_USER,
+			 * lbr_nr + 1 is the user call chain number.
+			 * For details, please refer to the comments
+			 * in callchain__printf
+			 */
+			int mix_chain_nr = i + 1 + lbr_nr + 1;
+
+			if (mix_chain_nr > PERF_MAX_STACK_DEPTH) {
+				pr_warning("corrupted callchain. skipping...\n");
+				return 0;
+			}
+
+			for (j = 0; j < mix_chain_nr; j++) {
+				if (callchain_param.order == ORDER_CALLEE) {
+					if (j < i + 1)
+						ip = chain->ips[j];
+					else if (j > i + 1)
+						ip = lbr_stack->entries[j - i - 2].from;
+					else
+						ip = lbr_stack->entries[0].to;
+				} else {
+					if (j < lbr_nr)
+						ip = lbr_stack->entries[lbr_nr - j - 1].from;
+					else if (j > lbr_nr)
+						ip = chain->ips[i + 1 - (j - lbr_nr)];
+					else
+						ip = lbr_stack->entries[0].to;
+				}
+
+				err = add_callchain_ip(thread, parent, root_al,
+						       cpumode, ip);
+				if (err)
+					goto exit;
+			}
+			return 0;
+		}
+	}
+
 	/*
 	 * Based on DWARF debug information, some architectures skip
 	 * a callchain entry saved by the kernel.
@@ -1476,8 +1533,6 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 	skip_idx = arch_skip_callchain_idx(thread, chain);
 
 	for (i = 0; i < chain_nr; i++) {
-		u64 ip;
-
 		if (callchain_param.order == ORDER_CALLEE)
 			j = i;
 		else
@@ -1512,8 +1567,9 @@ int thread__resolve_callchain(struct thread *thread,
 			      struct addr_location *root_al,
 			      int max_stack)
 {
-	int ret = thread__resolve_callchain_sample(thread, sample->callchain,
-						   parent, root_al, max_stack);
+	int ret = thread__resolve_callchain_sample(thread, evsel,
+						   sample, parent,
+						   root_al, max_stack);
 	if (ret)
 		return ret;
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index f4478ce..335c3a9 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -557,15 +557,63 @@ int perf_session_queue_event(struct perf_session *s, union perf_event *event,
 	return 0;
 }
 
-static void callchain__printf(struct perf_sample *sample)
+static void callchain__printf(struct perf_evsel *evsel,
+			      struct perf_sample *sample)
 {
 	unsigned int i;
+	struct ip_callchain *callchain = sample->callchain;
+	bool lbr = has_branch_callstack(evsel);
 
-	printf("... chain: nr:%" PRIu64 "\n", sample->callchain->nr);
+	if (lbr) {
+		struct branch_stack *lbr_stack = sample->branch_stack;
+		u64 kernel_callchain_nr = callchain->nr;
 
-	for (i = 0; i < sample->callchain->nr; i++)
+		for (i = 0; i < kernel_callchain_nr; i++) {
+			if (callchain->ips[i] == PERF_CONTEXT_USER)
+				break;
+		}
+
+		if ((i != kernel_callchain_nr) && lbr_stack->nr) {
+			u64 total_nr;
+			/*
+			 * LBR callstack can only get user call chain,
+			 * i is kernel call chain number,
+			 * 1 is PERF_CONTEXT_USER.
+			 *
+			 * The user call chain is stored in LBR registers.
+			 * LBR are pair registers. The caller is stored
+			 * in "from" register, while the callee is stored
+			 * in "to" register.
+			 * For example, there is a call stack
+			 * "A"->"B"->"C"->"D".
+			 * The LBR registers will recorde like
+			 * "C"->"D", "B"->"C", "A"->"B".
+			 * So only the first "to" register and all "from"
+			 * registers are needed to construct the whole stack.
+			 */
+			total_nr = i + 1 + lbr_stack->nr + 1;
+			kernel_callchain_nr = i + 1;
+
+			printf("... LBR call chain: nr:%" PRIu64 "\n", total_nr);
+
+			for (i = 0; i < kernel_callchain_nr; i++)
+				printf("..... %2d: %016" PRIx64 "\n",
+				       i, callchain->ips[i]);
+
+			printf("..... %2d: %016" PRIx64 "\n",
+			       (int)(kernel_callchain_nr), lbr_stack->entries[0].to);
+			for (i = 0; i < lbr_stack->nr; i++)
+				printf("..... %2d: %016" PRIx64 "\n",
+				       (int)(i + kernel_callchain_nr + 1), lbr_stack->entries[i].from);
+		}
+
+	}
+
+	printf("... FP chain: nr:%" PRIu64 "\n", callchain->nr);
+
+	for (i = 0; i < callchain->nr; i++)
 		printf("..... %2d: %016" PRIx64 "\n",
-		       i, sample->callchain->ips[i]);
+		       i, callchain->ips[i]);
 }
 
 static void branch_stack__printf(struct perf_sample *sample)
@@ -691,9 +739,9 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event,
 	sample_type = evsel->attr.sample_type;
 
 	if (sample_type & PERF_SAMPLE_CALLCHAIN)
-		callchain__printf(sample);
+		callchain__printf(evsel, sample);
 
-	if (sample_type & PERF_SAMPLE_BRANCH_STACK)
+	if ((sample_type & PERF_SAMPLE_BRANCH_STACK) && !has_branch_callstack(evsel))
 		branch_stack__printf(sample);
 
 	if (sample_type & PERF_SAMPLE_REGS_USER)
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH V3 2/3] perf tool: Move cpumode resolve code to add_callchain_ip
  2014-11-14 13:44 ` [PATCH V3 2/3] perf tool: Move cpumode resolve code to add_callchain_ip kan.liang
@ 2014-11-17 13:57   ` Jiri Olsa
  2014-11-17 14:00   ` Jiri Olsa
  2014-11-18  8:24   ` Jiri Olsa
  2 siblings, 0 replies; 25+ messages in thread
From: Jiri Olsa @ 2014-11-17 13:57 UTC (permalink / raw)
  To: kan.liang; +Cc: acme, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak

On Fri, Nov 14, 2014 at 08:44:11AM -0500, kan.liang@intel.com wrote:
> From: Kan Liang <kan.liang@intel.com>

SNIP

> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -1384,11 +1384,35 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
>  static int add_callchain_ip(struct thread *thread,
>  			    struct symbol **parent,
>  			    struct addr_location *root_al,
> -			    int cpumode,
> +			    u8 cpumode,
>  			    u64 ip)
>  {
>  	struct addr_location al;
>  
> +	if (ip >= PERF_CONTEXT_MAX) {
> +		switch (ip) {
> +		case PERF_CONTEXT_HV:
> +			cpumode = PERF_RECORD_MISC_HYPERVISOR;
> +			break;
> +		case PERF_CONTEXT_KERNEL:
> +			cpumode = PERF_RECORD_MISC_KERNEL;
> +			break;
> +		case PERF_CONTEXT_USER:
> +			cpumode = PERF_RECORD_MISC_USER;
> +			break;
> +		default:
> +			pr_debug("invalid callchain context: "
> +				 "%"PRId64"\n", (s64) ip);
> +			/*
> +			 * It seems the callchain is corrupted.
> +			 * Discard all.
> +			 */
> +			callchain_cursor_reset(&callchain_cursor);
> +			return 1;
> +		}
> +		return 0;	

extra whitespace         ^^^^^^^

> +	}
> +
>  	al.filtered = 0;
>  	al.sym = NULL;
>  	thread__find_addr_location(thread, cpumode, MAP__FUNCTION,

SNIP

> -				 * Discard all.
> -				 */
> -				callchain_cursor_reset(&callchain_cursor);
> -				return 0;
> -			}
> -			continue;
> -		}
> -
>  		err = add_callchain_ip(thread, parent, root_al,
>  				       cpumode, ip);

hm, why do we need to pass cpumode in here? it should be handled within
add_callchain_ip in add_callchain_ip function, right?

jirka

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V3 2/3] perf tool: Move cpumode resolve code to add_callchain_ip
  2014-11-14 13:44 ` [PATCH V3 2/3] perf tool: Move cpumode resolve code to add_callchain_ip kan.liang
  2014-11-17 13:57   ` Jiri Olsa
@ 2014-11-17 14:00   ` Jiri Olsa
  2014-11-18  8:24   ` Jiri Olsa
  2 siblings, 0 replies; 25+ messages in thread
From: Jiri Olsa @ 2014-11-17 14:00 UTC (permalink / raw)
  To: kan.liang; +Cc: acme, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak

On Fri, Nov 14, 2014 at 08:44:11AM -0500, kan.liang@intel.com wrote:

SNIP

> -	int i;
> -	int j;
> -	int err;
> +	int i, j, err = 0;
>  	int skip_idx __maybe_unused;
>  
>  	callchain_cursor_reset(&callchain_cursor);
> @@ -1467,39 +1489,13 @@ static int thread__resolve_callchain_sample(struct thread *thread,
>  #endif
>  		ip = chain->ips[j];
>  
> -		if (ip >= PERF_CONTEXT_MAX) {
> -			switch (ip) {
> -			case PERF_CONTEXT_HV:
> -				cpumode = PERF_RECORD_MISC_HYPERVISOR;
> -				break;
> -			case PERF_CONTEXT_KERNEL:
> -				cpumode = PERF_RECORD_MISC_KERNEL;
> -				break;
> -			case PERF_CONTEXT_USER:
> -				cpumode = PERF_RECORD_MISC_USER;
> -				break;
> -			default:
> -				pr_debug("invalid callchain context: "
> -					 "%"PRId64"\n", (s64) ip);
> -				/*
> -				 * It seems the callchain is corrupted.
> -				 * Discard all.
> -				 */
> -				callchain_cursor_reset(&callchain_cursor);
> -				return 0;
> -			}
> -			continue;
> -		}
> -
>  		err = add_callchain_ip(thread, parent, root_al,
>  				       cpumode, ip);
> -		if (err == -EINVAL)
> -			break;

yea, I was wondering, why we did not spot this nop before ;-)

>  		if (err)
> -			return err;
> +			goto exit;

I dont understand this change seems like extra nop complication..
why not return in here and rather recheck at the end again?

jirka

>  	}
> -
> -	return 0;
> +exit:
> +	return (err < 0) ? err : 0;
>  }

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V3 3/3] perf tools: Construct LBR call chain
  2014-11-14 13:44 ` [PATCH V3 3/3] perf tools: Construct LBR call chain kan.liang
@ 2014-11-17 15:54   ` Jiri Olsa
  2014-11-17 17:41     ` Liang, Kan
  2014-11-17 15:55   ` Jiri Olsa
  2014-11-18  6:25   ` Namhyung Kim
  2 siblings, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2014-11-17 15:54 UTC (permalink / raw)
  To: kan.liang; +Cc: acme, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak

On Fri, Nov 14, 2014 at 08:44:12AM -0500, kan.liang@intel.com wrote:

SNIP

> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index f4478ce..335c3a9 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -557,15 +557,63 @@ int perf_session_queue_event(struct perf_session *s, union perf_event *event,
>  	return 0;
>  }
>  
> -static void callchain__printf(struct perf_sample *sample)
> +static void callchain__printf(struct perf_evsel *evsel,
> +			      struct perf_sample *sample)
>  {
>  	unsigned int i;
> +	struct ip_callchain *callchain = sample->callchain;
> +	bool lbr = has_branch_callstack(evsel);
>  
> -	printf("... chain: nr:%" PRIu64 "\n", sample->callchain->nr);
> +	if (lbr) {
> +		struct branch_stack *lbr_stack = sample->branch_stack;
> +		u64 kernel_callchain_nr = callchain->nr;
>  
> -	for (i = 0; i < sample->callchain->nr; i++)
> +		for (i = 0; i < kernel_callchain_nr; i++) {
> +			if (callchain->ips[i] == PERF_CONTEXT_USER)
> +				break;
> +		}
> +
> +		if ((i != kernel_callchain_nr) && lbr_stack->nr) {
> +			u64 total_nr;
> +			/*
> +			 * LBR callstack can only get user call chain,
> +			 * i is kernel call chain number,
> +			 * 1 is PERF_CONTEXT_USER.
> +			 *
> +			 * The user call chain is stored in LBR registers.
> +			 * LBR are pair registers. The caller is stored
> +			 * in "from" register, while the callee is stored
> +			 * in "to" register.
> +			 * For example, there is a call stack
> +			 * "A"->"B"->"C"->"D".
> +			 * The LBR registers will recorde like
> +			 * "C"->"D", "B"->"C", "A"->"B".
> +			 * So only the first "to" register and all "from"
> +			 * registers are needed to construct the whole stack.
> +			 */

Andi is using some sanity checks:
http://marc.info/?l=linux-kernel&m=141584447819894&w=2
I guess this could be applied in here, once his patch gets in.

jirka

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V3 3/3] perf tools: Construct LBR call chain
  2014-11-14 13:44 ` [PATCH V3 3/3] perf tools: Construct LBR call chain kan.liang
  2014-11-17 15:54   ` Jiri Olsa
@ 2014-11-17 15:55   ` Jiri Olsa
  2014-11-18  6:14     ` Namhyung Kim
  2014-11-18  6:25   ` Namhyung Kim
  2 siblings, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2014-11-17 15:55 UTC (permalink / raw)
  To: kan.liang; +Cc: acme, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak

On Fri, Nov 14, 2014 at 08:44:12AM -0500, kan.liang@intel.com wrote:

SNIP

> +	if (lbr) {
> +		for (i = 0; i < chain_nr; i++) {
> +			if (chain->ips[i] == PERF_CONTEXT_USER)
> +				break;
> +		}
> +
> +		/* LBR only affects the user callchain */
> +		if (i != chain_nr) {
> +			struct branch_stack *lbr_stack = sample->branch_stack;
> +			int lbr_nr = lbr_stack->nr;
> +			/*
> +			 * LBR callstack can only get user call chain.
> +			 * The mix_chain_nr is kernel call chain
> +			 * number plus LBR user call chain number.
> +			 * i is kernel call chain number,
> +			 * 1 is PERF_CONTEXT_USER,
> +			 * lbr_nr + 1 is the user call chain number.
> +			 * For details, please refer to the comments
> +			 * in callchain__printf
> +			 */
> +			int mix_chain_nr = i + 1 + lbr_nr + 1;
> +
> +			if (mix_chain_nr > PERF_MAX_STACK_DEPTH) {
> +				pr_warning("corrupted callchain. skipping...\n");
> +				return 0;
> +			}
> +
> +			for (j = 0; j < mix_chain_nr; j++) {
> +				if (callchain_param.order == ORDER_CALLEE) {
> +					if (j < i + 1)
> +						ip = chain->ips[j];
> +					else if (j > i + 1)
> +						ip = lbr_stack->entries[j - i - 2].from;
> +					else
> +						ip = lbr_stack->entries[0].to;
> +				} else {
> +					if (j < lbr_nr)
> +						ip = lbr_stack->entries[lbr_nr - j - 1].from;
> +					else if (j > lbr_nr)
> +						ip = chain->ips[i + 1 - (j - lbr_nr)];
> +					else
> +						ip = lbr_stack->entries[0].to;
> +				}
> +
> +				err = add_callchain_ip(thread, parent, root_al,
> +						       cpumode, ip);
> +				if (err)
> +					goto exit;
> +			}
> +			return 0;
> +		}
> +	}

also could you please move whole block above into separated function?
Andi has another change for this function and it's becoming really big.

other then that I think it's ok IMO and I could ack next version

thanks,
jirka

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V3 0/3] perf tool: Haswell LBR call stack support (user)
  2014-11-14 13:44 [PATCH V3 0/3] perf tool: Haswell LBR call stack support (user) kan.liang
                   ` (2 preceding siblings ...)
  2014-11-14 13:44 ` [PATCH V3 3/3] perf tools: Construct LBR call chain kan.liang
@ 2014-11-17 16:01 ` Jiri Olsa
  3 siblings, 0 replies; 25+ messages in thread
From: Jiri Olsa @ 2014-11-17 16:01 UTC (permalink / raw)
  To: kan.liang; +Cc: acme, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak

On Fri, Nov 14, 2014 at 08:44:09AM -0500, kan.liang@intel.com wrote:
> From: Kan Liang <kan.liang@intel.com>
> 
> This is the user space patch for Haswell LBR call stack support.
> For many profiling tasks we need the callgraph. For example we often
> need to see the caller of a lock or the caller of a memcpy or other
> library function to actually tune the program. Frame pointer unwinding
> is efficient and works well. But frame pointers are off by default on
> 64bit code (and on modern 32bit gccs), so there are many binaries around
> that do not use frame pointers. Profiling unchanged production code is
> very useful in practice. On some CPUs frame pointer also has a high
> cost. Dwarf2 unwinding also does not always work and is extremely slow
> (upto 20% overhead).
> 
> Haswell has a new feature that utilizes the existing Last Branch Record
> facility to record call chains. When the feature is enabled, function
> call will be collected as normal, but as return instructions are
> executed the last captured branch record is popped from the on-chip LBR
> registers. The LBR call stack facility provides an alternative to get
> callgraph. It has some limitations too, but should work in most cases
> and is significantly faster than dwarf. Frame pointer unwinding is still
> the best default, but LBR call stack is a good alternative when nothing
> else works.
> 


---
> A new call chain recording option "lbr" is introduced into perf tool for
> LBR call stack. The user can use --call-graph lbr to get the call stack
> information from hardware.
> 
> When profiling bc(1) on Fedora 19:
> echo 'scale=2000; 4*a(1)' > cmd; perf record --call-graph lbr bc -l < cmd
> If enabling LBR, perf report output looks like:
>     50.36%       bc  bc                 [.] bc_divide
>                  |
>                  --- bc_divide
>                      execute
>                      run_code
>                      yyparse
>                      main
>                      __libc_start_main
>                      _start
>     33.66%       bc  bc                 [.] _one_mult
>                  |
>                  --- _one_mult
>                      bc_divide
>                      execute
>                      run_code
>                      yyparse
>                      main
>                      __libc_start_main
>                      _start
>      7.62%       bc  bc                 [.] _bc_do_add
>                  |
>                  --- _bc_do_add
>                     |
>                     |--99.89%-- 0x2000186a8
>                      --0.11%-- [...]
>      6.83%       bc  bc                 [.] _bc_do_sub
>                  |
>                  --- _bc_do_sub
>                     |
>                     |--99.94%-- bc_add
>                     |          execute
>                     |          run_code
>                     |          yyparse
>                     |          main
>                     |          __libc_start_main
>                     |          _start
>                      --0.06%-- [...]
>      0.46%       bc  libc-2.17.so       [.] __memset_sse2
>                  |
>                  --- __memset_sse2
>                     |
>                     |--54.13%-- bc_new_num
>                     |          |
>                     |          |--51.00%-- bc_divide
>                     |          |          execute
>                     |          |          run_code
>                     |          |          yyparse
>                     |          |          main
>                     |          |          __libc_start_main
>                     |          |          _start
>                     |          |
>                     |          |--30.46%-- _bc_do_sub
>                     |          |          bc_add
>                     |          |          execute
>                     |          |          run_code
>                     |          |          yyparse
>                     |          |          main
>                     |          |          __libc_start_main
>                     |          |          _start
>                     |          |
>                     |           --18.55%-- _bc_do_add
>                     |                     bc_add
>                     |                     execute
>                     |                     run_code
>                     |                     yyparse
>                     |                     main
>                     |                     __libc_start_main
>                     |                     _start
>                     |
>                      --45.87%-- bc_divide
>                                execute
>                                run_code
>                                yyparse
>                                main
>                                __libc_start_main
>                                _start
> If using FP, perf report output looks like:
> echo 'scale=2000; 4*a(1)' > cmd; perf record --call-graph fp bc -l < cmd
>     50.49%       bc  bc                 [.] bc_divide
>                  |
>                  --- bc_divide
>     33.57%       bc  bc                 [.] _one_mult
>                  |
>                  --- _one_mult
>      7.61%       bc  bc                 [.] _bc_do_add
>                  |
>                  --- _bc_do_add
>                      0x2000186a8
>      6.88%       bc  bc                 [.] _bc_do_sub
>                  |
>                  --- _bc_do_sub
>      0.42%       bc  libc-2.17.so       [.] __memcpy_ssse3_back
>                  |
>                  --- __memcpy_ssse3_back
> 
> If using LBR, perf report -D output looks like:
> 11739295893248 0x4d0 [0xe0]: PERF_RECORD_SAMPLE(IP, 0x2): 10505/10505:
> 0x40054d period: 39255 addr: 0
> ... LBR call chain: nr:7
> .....  0: fffffffffffffe00
> .....  1: 0000000000400540
> .....  2: 0000000000400587
> .....  3: 00000000004005b3
> .....  4: 00000000004005ef
> .....  5: 0000003d1cc21b43
> .....  6: 0000000000400474
> ... FP chain: nr:6
> .....  0: fffffffffffffe00
> .....  1: 000000000040054d
> .....  2: 000000000040058c
> .....  3: 00000000004005b8
> .....  4: 00000000004005f4
> .....  5: 0000003d1cc21b45
>  ... thread: a.out:10505
>  ...... dso: /home/lk/a.out
> 
> 
> The LBR call stack has following known limitations
>  - Zero length calls are not filtered out by hardware
>  - Exception handing such as setjmp/longjmp will have calls/returns not
>    match
>  - Pushing different return address onto the stack will have calls/returns
>    not match
>  - If callstack is deeper than the LBR, only the last entries are captured
---

also could you please add all above ^^^ as an additional text
for patch 3/3 changelog (perf tools: Construct LBR call chain)?

looks too nice to lose it ;-)

thanks,
jirka

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH V3 3/3] perf tools: Construct LBR call chain
  2014-11-17 15:54   ` Jiri Olsa
@ 2014-11-17 17:41     ` Liang, Kan
  2014-11-18  6:13       ` Namhyung Kim
  0 siblings, 1 reply; 25+ messages in thread
From: Liang, Kan @ 2014-11-17 17:41 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: acme, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak



> SNIP
> 
> > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> > index f4478ce..335c3a9 100644
> > --- a/tools/perf/util/session.c
> > +++ b/tools/perf/util/session.c
> > @@ -557,15 +557,63 @@ int perf_session_queue_event(struct
> perf_session *s, union perf_event *event,
> >  	return 0;
> >  }
> >
> > -static void callchain__printf(struct perf_sample *sample)
> > +static void callchain__printf(struct perf_evsel *evsel,
> > +			      struct perf_sample *sample)
> >  {
> >  	unsigned int i;
> > +	struct ip_callchain *callchain = sample->callchain;
> > +	bool lbr = has_branch_callstack(evsel);
> >
> > -	printf("... chain: nr:%" PRIu64 "\n", sample->callchain->nr);
> > +	if (lbr) {
> > +		struct branch_stack *lbr_stack = sample->branch_stack;
> > +		u64 kernel_callchain_nr = callchain->nr;
> >
> > -	for (i = 0; i < sample->callchain->nr; i++)
> > +		for (i = 0; i < kernel_callchain_nr; i++) {
> > +			if (callchain->ips[i] == PERF_CONTEXT_USER)
> > +				break;
> > +		}
> > +
> > +		if ((i != kernel_callchain_nr) && lbr_stack->nr) {
> > +			u64 total_nr;
> > +			/*
> > +			 * LBR callstack can only get user call chain,
> > +			 * i is kernel call chain number,
> > +			 * 1 is PERF_CONTEXT_USER.
> > +			 *
> > +			 * The user call chain is stored in LBR registers.
> > +			 * LBR are pair registers. The caller is stored
> > +			 * in "from" register, while the callee is stored
> > +			 * in "to" register.
> > +			 * For example, there is a call stack
> > +			 * "A"->"B"->"C"->"D".
> > +			 * The LBR registers will recorde like
> > +			 * "C"->"D", "B"->"C", "A"->"B".
> > +			 * So only the first "to" register and all "from"
> > +			 * registers are needed to construct the whole
> stack.
> > +			 */
> 
> Andi is using some sanity checks:
> http://marc.info/?l=linux-kernel&m=141584447819894&w=2
> I guess this could be applied in here, once his patch gets in.
> 

Are you suggesting me to remove the comments,
or rebase the whole patch to Andi's patch once it's merged?

The branch history in Andi's patch is different as the call stack,
although they are both from LBR.
Andi's branch history recording branch records for
taken branches, interrupts, and exceptions.
While the LBR call stack records for the call stack.

Thanks,
Kan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V3 1/3] perf tools: enable LBR call stack support
  2014-11-14 13:44 ` [PATCH V3 1/3] perf tools: enable LBR call stack support kan.liang
@ 2014-11-18  5:54   ` Namhyung Kim
  2014-11-18 13:57     ` Liang, Kan
  0 siblings, 1 reply; 25+ messages in thread
From: Namhyung Kim @ 2014-11-18  5:54 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, jolsa, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak

On Fri, 14 Nov 2014 08:44:10 -0500, kan liang wrote:
> From: Kan Liang <kan.liang@intel.com>
>
> Currently, there are two call chain recording options, fp and dwarf.
> Haswell has a new feature that utilizes the existing LBR facility to
> record call chains. So it provides the third options to record call
> chain. This patch enables the lbr call stack support.
>
> LBR call stack has some limitations. It reuses current LBR facility, so
> LBR call stack and branch record can not be enabled at the same time. It
> is only available for user callchain.
> However, LBR call stack can work on the user app which doesn't have
> frame-pointer or dwarf debug info compiled. It is a good alternative
> when nothing else works.

So this is only for Haswell or later, right?  What if user gives it to
the other CPUs - is it fail or ignore silently?  I think we need to warn
user about it more kindly..

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V3 3/3] perf tools: Construct LBR call chain
  2014-11-17 17:41     ` Liang, Kan
@ 2014-11-18  6:13       ` Namhyung Kim
  2014-11-18  7:55         ` Jiri Olsa
  0 siblings, 1 reply; 25+ messages in thread
From: Namhyung Kim @ 2014-11-18  6:13 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Jiri Olsa, acme, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak

On Mon, 17 Nov 2014 17:41:32 +0000, Kan Liang wrote:
>> SNIP
>> 
>> > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
>> > index f4478ce..335c3a9 100644
>> > --- a/tools/perf/util/session.c
>> > +++ b/tools/perf/util/session.c
>> > @@ -557,15 +557,63 @@ int perf_session_queue_event(struct
>> perf_session *s, union perf_event *event,
>> >  	return 0;
>> >  }
>> >
>> > -static void callchain__printf(struct perf_sample *sample)
>> > +static void callchain__printf(struct perf_evsel *evsel,
>> > +			      struct perf_sample *sample)
>> >  {
>> >  	unsigned int i;
>> > +	struct ip_callchain *callchain = sample->callchain;
>> > +	bool lbr = has_branch_callstack(evsel);
>> >
>> > -	printf("... chain: nr:%" PRIu64 "\n", sample->callchain->nr);
>> > +	if (lbr) {
>> > +		struct branch_stack *lbr_stack = sample->branch_stack;
>> > +		u64 kernel_callchain_nr = callchain->nr;
>> >
>> > -	for (i = 0; i < sample->callchain->nr; i++)
>> > +		for (i = 0; i < kernel_callchain_nr; i++) {
>> > +			if (callchain->ips[i] == PERF_CONTEXT_USER)
>> > +				break;
>> > +		}
>> > +
>> > +		if ((i != kernel_callchain_nr) && lbr_stack->nr) {
>> > +			u64 total_nr;
>> > +			/*
>> > +			 * LBR callstack can only get user call chain,
>> > +			 * i is kernel call chain number,
>> > +			 * 1 is PERF_CONTEXT_USER.
>> > +			 *
>> > +			 * The user call chain is stored in LBR registers.
>> > +			 * LBR are pair registers. The caller is stored
>> > +			 * in "from" register, while the callee is stored
>> > +			 * in "to" register.
>> > +			 * For example, there is a call stack
>> > +			 * "A"->"B"->"C"->"D".
>> > +			 * The LBR registers will recorde like
>> > +			 * "C"->"D", "B"->"C", "A"->"B".
>> > +			 * So only the first "to" register and all "from"
>> > +			 * registers are needed to construct the whole
>> stack.
>> > +			 */
>> 
>> Andi is using some sanity checks:
>> http://marc.info/?l=linux-kernel&m=141584447819894&w=2
>> I guess this could be applied in here, once his patch gets in.
>> 
>
> Are you suggesting me to remove the comments,
> or rebase the whole patch to Andi's patch once it's merged?
>
> The branch history in Andi's patch is different as the call stack,
> although they are both from LBR.
> Andi's branch history recording branch records for
> taken branches, interrupts, and exceptions.
> While the LBR call stack records for the call stack.

Right.  And branch history can overlap with normal callchains so
additional check in there is to remove duplication.  While LBR call
stack is separated to user only so there should be no overlap.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V3 3/3] perf tools: Construct LBR call chain
  2014-11-17 15:55   ` Jiri Olsa
@ 2014-11-18  6:14     ` Namhyung Kim
  0 siblings, 0 replies; 25+ messages in thread
From: Namhyung Kim @ 2014-11-18  6:14 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: kan.liang, acme, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak

On Mon, 17 Nov 2014 16:55:53 +0100, Jiri Olsa wrote:
> On Fri, Nov 14, 2014 at 08:44:12AM -0500, kan.liang@intel.com wrote:
>
> SNIP
>
>> +	if (lbr) {
>> +		for (i = 0; i < chain_nr; i++) {
>> +			if (chain->ips[i] == PERF_CONTEXT_USER)
>> +				break;
>> +		}
>> +
>> +		/* LBR only affects the user callchain */
>> +		if (i != chain_nr) {
>> +			struct branch_stack *lbr_stack = sample->branch_stack;
>> +			int lbr_nr = lbr_stack->nr;
>> +			/*
>> +			 * LBR callstack can only get user call chain.
>> +			 * The mix_chain_nr is kernel call chain
>> +			 * number plus LBR user call chain number.
>> +			 * i is kernel call chain number,
>> +			 * 1 is PERF_CONTEXT_USER,
>> +			 * lbr_nr + 1 is the user call chain number.
>> +			 * For details, please refer to the comments
>> +			 * in callchain__printf
>> +			 */
>> +			int mix_chain_nr = i + 1 + lbr_nr + 1;
>> +
>> +			if (mix_chain_nr > PERF_MAX_STACK_DEPTH) {
>> +				pr_warning("corrupted callchain. skipping...\n");
>> +				return 0;
>> +			}
>> +
>> +			for (j = 0; j < mix_chain_nr; j++) {
>> +				if (callchain_param.order == ORDER_CALLEE) {
>> +					if (j < i + 1)
>> +						ip = chain->ips[j];
>> +					else if (j > i + 1)
>> +						ip = lbr_stack->entries[j - i - 2].from;
>> +					else
>> +						ip = lbr_stack->entries[0].to;
>> +				} else {
>> +					if (j < lbr_nr)
>> +						ip = lbr_stack->entries[lbr_nr - j - 1].from;
>> +					else if (j > lbr_nr)
>> +						ip = chain->ips[i + 1 - (j - lbr_nr)];
>> +					else
>> +						ip = lbr_stack->entries[0].to;
>> +				}
>> +
>> +				err = add_callchain_ip(thread, parent, root_al,
>> +						       cpumode, ip);
>> +				if (err)
>> +					goto exit;
>> +			}
>> +			return 0;
>> +		}
>> +	}
>
> also could you please move whole block above into separated function?
> Andi has another change for this function and it's becoming really big.

Agreed.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V3 3/3] perf tools: Construct LBR call chain
  2014-11-14 13:44 ` [PATCH V3 3/3] perf tools: Construct LBR call chain kan.liang
  2014-11-17 15:54   ` Jiri Olsa
  2014-11-17 15:55   ` Jiri Olsa
@ 2014-11-18  6:25   ` Namhyung Kim
  2014-11-18 14:01     ` Liang, Kan
  2 siblings, 1 reply; 25+ messages in thread
From: Namhyung Kim @ 2014-11-18  6:25 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, jolsa, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak

On Fri, 14 Nov 2014 08:44:12 -0500, kan liang wrote:
> +		/* LBR only affects the user callchain */
> +		if (i != chain_nr) {
> +			struct branch_stack *lbr_stack = sample->branch_stack;
> +			int lbr_nr = lbr_stack->nr;
> +			/*
> +			 * LBR callstack can only get user call chain.
> +			 * The mix_chain_nr is kernel call chain
> +			 * number plus LBR user call chain number.
> +			 * i is kernel call chain number,
> +			 * 1 is PERF_CONTEXT_USER,
> +			 * lbr_nr + 1 is the user call chain number.
> +			 * For details, please refer to the comments
> +			 * in callchain__printf
> +			 */
> +			int mix_chain_nr = i + 1 + lbr_nr + 1;
> +
> +			if (mix_chain_nr > PERF_MAX_STACK_DEPTH) {
> +				pr_warning("corrupted callchain. skipping...\n");
> +				return 0;
> +			}

I'm not sure whether this is really a corrupted callchain.  If a single
chain is greater than the max it should be corrupted, but we're now
summing up with other values..

What about checking callchain_nr and lbr_nr separately or mix_nr with
2*max?

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V3 3/3] perf tools: Construct LBR call chain
  2014-11-18  6:13       ` Namhyung Kim
@ 2014-11-18  7:55         ` Jiri Olsa
  2014-11-18 14:37           ` Liang, Kan
  0 siblings, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2014-11-18  7:55 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Liang, Kan, acme, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak

On Tue, Nov 18, 2014 at 03:13:50PM +0900, Namhyung Kim wrote:

SNIP

> >> > +			 * in "from" register, while the callee is stored
> >> > +			 * in "to" register.
> >> > +			 * For example, there is a call stack
> >> > +			 * "A"->"B"->"C"->"D".
> >> > +			 * The LBR registers will recorde like
> >> > +			 * "C"->"D", "B"->"C", "A"->"B".
> >> > +			 * So only the first "to" register and all "from"
> >> > +			 * registers are needed to construct the whole
> >> stack.
> >> > +			 */
> >> 
> >> Andi is using some sanity checks:
> >> http://marc.info/?l=linux-kernel&m=141584447819894&w=2
> >> I guess this could be applied in here, once his patch gets in.
> >> 
> >
> > Are you suggesting me to remove the comments,
> > or rebase the whole patch to Andi's patch once it's merged?
> >
> > The branch history in Andi's patch is different as the call stack,
> > although they are both from LBR.
> > Andi's branch history recording branch records for
> > taken branches, interrupts, and exceptions.
> > While the LBR call stack records for the call stack.
> 
> Right.  And branch history can overlap with normal callchains so
> additional check in there is to remove duplication.  While LBR call
> stack is separated to user only so there should be no overlap.

hum, it seemed to me like the remove_loops function could be used
for this one as well.. but anyway I meant that this can be introduced
later after Andi's change gets in

thanks,
jirka

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V3 2/3] perf tool: Move cpumode resolve code to add_callchain_ip
  2014-11-14 13:44 ` [PATCH V3 2/3] perf tool: Move cpumode resolve code to add_callchain_ip kan.liang
  2014-11-17 13:57   ` Jiri Olsa
  2014-11-17 14:00   ` Jiri Olsa
@ 2014-11-18  8:24   ` Jiri Olsa
  2014-11-21 15:06     ` Liang, Kan
  2 siblings, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2014-11-18  8:24 UTC (permalink / raw)
  To: kan.liang; +Cc: acme, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak

On Fri, Nov 14, 2014 at 08:44:11AM -0500, kan.liang@intel.com wrote:
> From: Kan Liang <kan.liang@intel.com>
> 
> Move the cpumode resolve code to add_callchain_ip function.
> No change in behavior.
> 
> Signed-off-by: Kan Liang <kan.liang@intel.com>
> ---
>  tools/perf/util/machine.c | 62 ++++++++++++++++++++++-------------------------
>  1 file changed, 29 insertions(+), 33 deletions(-)
> 
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index d97309c..dd8496a 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -1384,11 +1384,35 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
>  static int add_callchain_ip(struct thread *thread,
>  			    struct symbol **parent,
>  			    struct addr_location *root_al,
> -			    int cpumode,
> +			    u8 cpumode,

Andi's patch got in:
b3340a5 perf callchain: Support handling complete branch stacks as histograms

and it uses the cpumode check for -1, so it needs to stay int, otherwise
there's a compilation failure:

  CC       util/machine.o
util/machine.c: In function ‘add_callchain_ip’:
util/machine.c:1419:14: error: comparison is always false due to limited range of data type [-Werror=type-limits]
  if (cpumode == -1)
              ^


jirka

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH V3 1/3] perf tools: enable LBR call stack support
  2014-11-18  5:54   ` Namhyung Kim
@ 2014-11-18 13:57     ` Liang, Kan
  0 siblings, 0 replies; 25+ messages in thread
From: Liang, Kan @ 2014-11-18 13:57 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: acme, jolsa, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak


> 
> On Fri, 14 Nov 2014 08:44:10 -0500, kan liang wrote:
> > From: Kan Liang <kan.liang@intel.com>
> >
> > Currently, there are two call chain recording options, fp and dwarf.
> > Haswell has a new feature that utilizes the existing LBR facility to
> > record call chains. So it provides the third options to record call
> > chain. This patch enables the lbr call stack support.
> >
> > LBR call stack has some limitations. It reuses current LBR facility,
> > so LBR call stack and branch record can not be enabled at the same
> > time. It is only available for user callchain.
> > However, LBR call stack can work on the user app which doesn't have
> > frame-pointer or dwarf debug info compiled. It is a good alternative
> > when nothing else works.
> 
> So this is only for Haswell or later, right?  

Right.

>What if user gives it to the other
> CPUs - is it fail or ignore silently?  I think we need to warn user about it
> more kindly..

The kernel will handle it. EOPNOTSUPP will be return if there is no LBR
Call stack support.

Thanks,
Kan

> 
> Thanks,
> Namhyung

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH V3 3/3] perf tools: Construct LBR call chain
  2014-11-18  6:25   ` Namhyung Kim
@ 2014-11-18 14:01     ` Liang, Kan
  2014-11-19  6:01       ` Namhyung Kim
  0 siblings, 1 reply; 25+ messages in thread
From: Liang, Kan @ 2014-11-18 14:01 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: acme, jolsa, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak



> On Fri, 14 Nov 2014 08:44:12 -0500, kan liang wrote:
> > +		/* LBR only affects the user callchain */
> > +		if (i != chain_nr) {
> > +			struct branch_stack *lbr_stack = sample-
> >branch_stack;
> > +			int lbr_nr = lbr_stack->nr;
> > +			/*
> > +			 * LBR callstack can only get user call chain.
> > +			 * The mix_chain_nr is kernel call chain
> > +			 * number plus LBR user call chain number.
> > +			 * i is kernel call chain number,
> > +			 * 1 is PERF_CONTEXT_USER,
> > +			 * lbr_nr + 1 is the user call chain number.
> > +			 * For details, please refer to the comments
> > +			 * in callchain__printf
> > +			 */
> > +			int mix_chain_nr = i + 1 + lbr_nr + 1;
> > +
> > +			if (mix_chain_nr > PERF_MAX_STACK_DEPTH) {
> > +				pr_warning("corrupted callchain.
> skipping...\n");
> > +				return 0;
> > +			}
> 
> I'm not sure whether this is really a corrupted callchain.  If a single chain is
> greater than the max it should be corrupted, but we're now summing up
> with other values..
> 
> What about checking callchain_nr and lbr_nr separately or mix_nr with
> 2*max?

The lbr_nr max is 16. I will change it to max+16.

Thanks,
Kan

> 
> Thanks,
> Namhyung

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH V3 3/3] perf tools: Construct LBR call chain
  2014-11-18  7:55         ` Jiri Olsa
@ 2014-11-18 14:37           ` Liang, Kan
  2014-11-18 19:40             ` Liang, Kan
  0 siblings, 1 reply; 25+ messages in thread
From: Liang, Kan @ 2014-11-18 14:37 UTC (permalink / raw)
  To: Jiri Olsa, Namhyung Kim
  Cc: acme, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak



> On Tue, Nov 18, 2014 at 03:13:50PM +0900, Namhyung Kim wrote:
> 
> SNIP
> 
> > >> > +			 * in "from" register, while the callee is
> stored
> > >> > +			 * in "to" register.
> > >> > +			 * For example, there is a call stack
> > >> > +			 * "A"->"B"->"C"->"D".
> > >> > +			 * The LBR registers will recorde like
> > >> > +			 * "C"->"D", "B"->"C", "A"->"B".
> > >> > +			 * So only the first "to" register and all
> "from"
> > >> > +			 * registers are needed to construct the
> whole
> > >> stack.
> > >> > +			 */
> > >>
> > >> Andi is using some sanity checks:
> > >> http://marc.info/?l=linux-kernel&m=141584447819894&w=2
> > >> I guess this could be applied in here, once his patch gets in.
> > >>
> > >
> > > Are you suggesting me to remove the comments, or rebase the whole
> > > patch to Andi's patch once it's merged?
> > >
> > > The branch history in Andi's patch is different as the call stack,
> > > although they are both from LBR.
> > > Andi's branch history recording branch records for taken branches,
> > > interrupts, and exceptions.
> > > While the LBR call stack records for the call stack.
> >
> > Right.  And branch history can overlap with normal callchains so
> > additional check in there is to remove duplication.  While LBR call
> > stack is separated to user only so there should be no overlap.
> 
> hum, it seemed to me like the remove_loops function could be used for
> this one as well.. but anyway I meant that this can be introduced later after
> Andi's change gets in

I see. I will apply Andi's remove_loops.

Thanks,
Kan

> 
> thanks,
> jirka

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH V3 3/3] perf tools: Construct LBR call chain
  2014-11-18 14:37           ` Liang, Kan
@ 2014-11-18 19:40             ` Liang, Kan
  2014-11-19  5:57               ` Namhyung Kim
  0 siblings, 1 reply; 25+ messages in thread
From: Liang, Kan @ 2014-11-18 19:40 UTC (permalink / raw)
  To: 'Jiri Olsa', 'Namhyung Kim'
  Cc: 'acme@kernel.org', 'a.p.zijlstra@chello.nl',
	'eranian@google.com',
	'linux-kernel@vger.kernel.org',
	'mingo@redhat.com', 'paulus@samba.org',
	'ak@linux.intel.com'



> > whole
> > > >> stack.
> > > >> > +			 */
> > > >>
> > > >> Andi is using some sanity checks:
> > > >> http://marc.info/?l=linux-kernel&m=141584447819894&w=2
> > > >> I guess this could be applied in here, once his patch gets in.
> > > >>
> > > >
> > > > Are you suggesting me to remove the comments, or rebase the
> whole
> > > > patch to Andi's patch once it's merged?
> > > >
> > > > The branch history in Andi's patch is different as the call stack,
> > > > although they are both from LBR.
> > > > Andi's branch history recording branch records for taken branches,
> > > > interrupts, and exceptions.
> > > > While the LBR call stack records for the call stack.
> > >
> > > Right.  And branch history can overlap with normal callchains so
> > > additional check in there is to remove duplication.  While LBR call
> > > stack is separated to user only so there should be no overlap.
> >
> > hum, it seemed to me like the remove_loops function could be used for
> > this one as well.. but anyway I meant that this can be introduced
> > later after Andi's change gets in
> 
> I see. I will apply Andi's remove_loops.
> 

As Namhyung said, there is no overlap for LBR call stack. The user callchain 
is not a mix. It's from either LBR or FP. so remove_loops
doesn't fit to the LBR call stack. 
Sorry for the confusion from my last reply.

Thanks,
Kan  

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V3 3/3] perf tools: Construct LBR call chain
  2014-11-18 19:40             ` Liang, Kan
@ 2014-11-19  5:57               ` Namhyung Kim
  0 siblings, 0 replies; 25+ messages in thread
From: Namhyung Kim @ 2014-11-19  5:57 UTC (permalink / raw)
  To: Liang, Kan
  Cc: 'Jiri Olsa', 'acme@kernel.org',
	'a.p.zijlstra@chello.nl', 'eranian@google.com',
	'linux-kernel@vger.kernel.org',
	'mingo@redhat.com', 'paulus@samba.org',
	'ak@linux.intel.com'

Hi Kan,

On Tue, 18 Nov 2014 19:40:23 +0000, Kan Liang wrote:
>> > whole
>> > > >> stack.
>> > > >> > +			 */
>> > > >>
>> > > >> Andi is using some sanity checks:
>> > > >> http://marc.info/?l=linux-kernel&m=141584447819894&w=2
>> > > >> I guess this could be applied in here, once his patch gets in.
>> > > >>
>> > > >
>> > > > Are you suggesting me to remove the comments, or rebase the
>> whole
>> > > > patch to Andi's patch once it's merged?
>> > > >
>> > > > The branch history in Andi's patch is different as the call stack,
>> > > > although they are both from LBR.
>> > > > Andi's branch history recording branch records for taken branches,
>> > > > interrupts, and exceptions.
>> > > > While the LBR call stack records for the call stack.
>> > >
>> > > Right.  And branch history can overlap with normal callchains so
>> > > additional check in there is to remove duplication.  While LBR call
>> > > stack is separated to user only so there should be no overlap.
>> >
>> > hum, it seemed to me like the remove_loops function could be used for
>> > this one as well.. but anyway I meant that this can be introduced
>> > later after Andi's change gets in
>> 
>> I see. I will apply Andi's remove_loops.
>> 
>
> As Namhyung said, there is no overlap for LBR call stack. The user callchain 
> is not a mix. It's from either LBR or FP. so remove_loops
> doesn't fit to the LBR call stack. 

IIUC the remove_loops() is for eliminating repeated branches in a loop
in a single function (and it only checks LBRs not FP callchain).  It's
needed since branch history saves all branches, but not needed for LBR
callstack as it saves CALLs only.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V3 3/3] perf tools: Construct LBR call chain
  2014-11-18 14:01     ` Liang, Kan
@ 2014-11-19  6:01       ` Namhyung Kim
  2014-11-19 13:37         ` Liang, Kan
  0 siblings, 1 reply; 25+ messages in thread
From: Namhyung Kim @ 2014-11-19  6:01 UTC (permalink / raw)
  To: Liang, Kan
  Cc: acme, jolsa, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak

On Tue, 18 Nov 2014 14:01:06 +0000, Kan Liang wrote:
>> On Fri, 14 Nov 2014 08:44:12 -0500, kan liang wrote:
>> > +		/* LBR only affects the user callchain */
>> > +		if (i != chain_nr) {
>> > +			struct branch_stack *lbr_stack = sample-
>> >branch_stack;
>> > +			int lbr_nr = lbr_stack->nr;
>> > +			/*
>> > +			 * LBR callstack can only get user call chain.
>> > +			 * The mix_chain_nr is kernel call chain
>> > +			 * number plus LBR user call chain number.
>> > +			 * i is kernel call chain number,
>> > +			 * 1 is PERF_CONTEXT_USER,
>> > +			 * lbr_nr + 1 is the user call chain number.
>> > +			 * For details, please refer to the comments
>> > +			 * in callchain__printf
>> > +			 */
>> > +			int mix_chain_nr = i + 1 + lbr_nr + 1;
>> > +
>> > +			if (mix_chain_nr > PERF_MAX_STACK_DEPTH) {
>> > +				pr_warning("corrupted callchain.
>> skipping...\n");
>> > +				return 0;
>> > +			}
>> 
>> I'm not sure whether this is really a corrupted callchain.  If a single chain is
>> greater than the max it should be corrupted, but we're now summing up
>> with other values..
>> 
>> What about checking callchain_nr and lbr_nr separately or mix_nr with
>> 2*max?
>
> The lbr_nr max is 16. I will change it to max+16.

Shouldn't it be 18 (16 + 1 + 1) at least?

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH V3 3/3] perf tools: Construct LBR call chain
  2014-11-19  6:01       ` Namhyung Kim
@ 2014-11-19 13:37         ` Liang, Kan
  0 siblings, 0 replies; 25+ messages in thread
From: Liang, Kan @ 2014-11-19 13:37 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: acme, jolsa, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak



> 
> On Tue, 18 Nov 2014 14:01:06 +0000, Kan Liang wrote:
> >> On Fri, 14 Nov 2014 08:44:12 -0500, kan liang wrote:
> >> > +		/* LBR only affects the user callchain */
> >> > +		if (i != chain_nr) {
> >> > +			struct branch_stack *lbr_stack = sample-
> >> >branch_stack;
> >> > +			int lbr_nr = lbr_stack->nr;
> >> > +			/*
> >> > +			 * LBR callstack can only get user call chain.
> >> > +			 * The mix_chain_nr is kernel call chain
> >> > +			 * number plus LBR user call chain number.
> >> > +			 * i is kernel call chain number,
> >> > +			 * 1 is PERF_CONTEXT_USER,
> >> > +			 * lbr_nr + 1 is the user call chain number.
> >> > +			 * For details, please refer to the comments
> >> > +			 * in callchain__printf
> >> > +			 */
> >> > +			int mix_chain_nr = i + 1 + lbr_nr + 1;
> >> > +
> >> > +			if (mix_chain_nr > PERF_MAX_STACK_DEPTH) {
> >> > +				pr_warning("corrupted callchain.
> >> skipping...\n");
> >> > +				return 0;
> >> > +			}
> >>
> >> I'm not sure whether this is really a corrupted callchain.  If a
> >> single chain is greater than the max it should be corrupted, but
> >> we're now summing up with other values..
> >>
> >> What about checking callchain_nr and lbr_nr separately or mix_nr with
> >> 2*max?
> >
> > The lbr_nr max is 16. I will change it to max+16.
> 
> Shouldn't it be 18 (16 + 1 + 1) at least?

In the new patch, I use PERF_MAX_BRANCH_DEPTH from Andi's patch. It's
127. It should be enough.

 +		if (mix_chain_nr > PERF_MAX_STACK_DEPTH +
 PERF_MAX_BRANCH_DEPTH) {
 +			pr_warning("corrupted callchain. skipping...\n");
 +			return 0;
 +		}

Thanks,
Kan
> 
> Thanks,
> Namhyung

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH V3 2/3] perf tool: Move cpumode resolve code to add_callchain_ip
  2014-11-18  8:24   ` Jiri Olsa
@ 2014-11-21 15:06     ` Liang, Kan
  2014-11-21 15:19       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 25+ messages in thread
From: Liang, Kan @ 2014-11-21 15:06 UTC (permalink / raw)
  To: acme, Jiri Olsa; +Cc: a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2308 bytes --]



> -----Original Message-----
> From: Jiri Olsa [mailto:jolsa@redhat.com]
> Sent: Tuesday, November 18, 2014 3:25 AM
> To: Liang, Kan
> Cc: acme@kernel.org; a.p.zijlstra@chello.nl; eranian@google.com; linux-
> kernel@vger.kernel.org; mingo@redhat.com; paulus@samba.org;
> ak@linux.intel.com
> Subject: Re: [PATCH V3 2/3] perf tool: Move cpumode resolve code to
> add_callchain_ip
> 
> On Fri, Nov 14, 2014 at 08:44:11AM -0500, kan.liang@intel.com wrote:
> > From: Kan Liang <kan.liang@intel.com>
> >
> > Move the cpumode resolve code to add_callchain_ip function.
> > No change in behavior.
> >
> > Signed-off-by: Kan Liang <kan.liang@intel.com>
> > ---
> >  tools/perf/util/machine.c | 62
> > ++++++++++++++++++++++-------------------------
> >  1 file changed, 29 insertions(+), 33 deletions(-)
> >
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index d97309c..dd8496a 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -1384,11 +1384,35 @@ struct mem_info
> *sample__resolve_mem(struct
> > perf_sample *sample,  static int add_callchain_ip(struct thread *thread,
> >  			    struct symbol **parent,
> >  			    struct addr_location *root_al,
> > -			    int cpumode,
> > +			    u8 cpumode,
> 
> Andi's patch got in:
> b3340a5 perf callchain: Support handling complete branch stacks as
> histograms
> 
> and it uses the cpumode check for -1, so it needs to stay int, otherwise
> there's a compilation failure:
> 
>   CC       util/machine.o
> util/machine.c: In function ‘add_callchain_ip’:
> util/machine.c:1419:14: error: comparison is always false due to limited
> range of data type [-Werror=type-limits]
>   if (cpumode == -1)
>               ^
> 

Hi Arnaldo and Jirka,

According to recent feedback, the LBR call stack V4 patch only need to do small
change. However, it looks the patch "b3340a5 perf callchain: Support handling
complete branch stacks as histograms" has been removed from perf/core.

Should I wait for the patch to merge again and submit the V5 patch then, OR
Should I rebase the V5 to the latest perf/core and submit it today?

Thanks,
Kan


> 
> jirka
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V3 2/3] perf tool: Move cpumode resolve code to add_callchain_ip
  2014-11-21 15:06     ` Liang, Kan
@ 2014-11-21 15:19       ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-11-21 15:19 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Jiri Olsa, a.p.zijlstra, eranian, linux-kernel, mingo, paulus, ak

Em Fri, Nov 21, 2014 at 03:06:08PM +0000, Liang, Kan escreveu:
> 
> 
> > -----Original Message-----
> > From: Jiri Olsa [mailto:jolsa@redhat.com]
> > Sent: Tuesday, November 18, 2014 3:25 AM
> > To: Liang, Kan
> > Cc: acme@kernel.org; a.p.zijlstra@chello.nl; eranian@google.com; linux-
> > kernel@vger.kernel.org; mingo@redhat.com; paulus@samba.org;
> > ak@linux.intel.com
> > Subject: Re: [PATCH V3 2/3] perf tool: Move cpumode resolve code to
> > add_callchain_ip
> > 
> > On Fri, Nov 14, 2014 at 08:44:11AM -0500, kan.liang@intel.com wrote:
> > > From: Kan Liang <kan.liang@intel.com>
> > >
> > > Move the cpumode resolve code to add_callchain_ip function.
> > > No change in behavior.
> > >
> > > Signed-off-by: Kan Liang <kan.liang@intel.com>
> > > ---
> > >  tools/perf/util/machine.c | 62
> > > ++++++++++++++++++++++-------------------------
> > >  1 file changed, 29 insertions(+), 33 deletions(-)
> > >
> > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > > index d97309c..dd8496a 100644
> > > --- a/tools/perf/util/machine.c
> > > +++ b/tools/perf/util/machine.c
> > > @@ -1384,11 +1384,35 @@ struct mem_info
> > *sample__resolve_mem(struct
> > > perf_sample *sample,  static int add_callchain_ip(struct thread *thread,
> > >  			    struct symbol **parent,
> > >  			    struct addr_location *root_al,
> > > -			    int cpumode,
> > > +			    u8 cpumode,
> > 
> > Andi's patch got in:
> > b3340a5 perf callchain: Support handling complete branch stacks as
> > histograms
> > 
> > and it uses the cpumode check for -1, so it needs to stay int, otherwise
> > there's a compilation failure:
> > 
> >   CC       util/machine.o
> > util/machine.c: In function ‘add_callchain_ip’:
> > util/machine.c:1419:14: error: comparison is always false due to limited
> > range of data type [-Werror=type-limits]
> >   if (cpumode == -1)
> >               ^
> > 
> 
> Hi Arnaldo and Jirka,
> 
> According to recent feedback, the LBR call stack V4 patch only need to do small
> change. However, it looks the patch "b3340a5 perf callchain: Support handling
> complete branch stacks as histograms" has been removed from perf/core.

Yeah, I should have left it on a separate branch while testing was
underway, my bad.
 
> Should I wait for the patch to merge again and submit the V5 patch then, OR
> Should I rebase the V5 to the latest perf/core and submit it today?

I'll try to send a screenshot of the differences among

  perf report --branch-report --tui
  perf report --branch-report --stdio (or --gtk, those two matches)

As requested by Andi, that isn't noticing any difference among these
three output modes.

After this is clarified (and possibly fixed) I'll push to Ingo and then
you can move on.

- Arnaldo

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2014-11-21 15:19 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-14 13:44 [PATCH V3 0/3] perf tool: Haswell LBR call stack support (user) kan.liang
2014-11-14 13:44 ` [PATCH V3 1/3] perf tools: enable LBR call stack support kan.liang
2014-11-18  5:54   ` Namhyung Kim
2014-11-18 13:57     ` Liang, Kan
2014-11-14 13:44 ` [PATCH V3 2/3] perf tool: Move cpumode resolve code to add_callchain_ip kan.liang
2014-11-17 13:57   ` Jiri Olsa
2014-11-17 14:00   ` Jiri Olsa
2014-11-18  8:24   ` Jiri Olsa
2014-11-21 15:06     ` Liang, Kan
2014-11-21 15:19       ` Arnaldo Carvalho de Melo
2014-11-14 13:44 ` [PATCH V3 3/3] perf tools: Construct LBR call chain kan.liang
2014-11-17 15:54   ` Jiri Olsa
2014-11-17 17:41     ` Liang, Kan
2014-11-18  6:13       ` Namhyung Kim
2014-11-18  7:55         ` Jiri Olsa
2014-11-18 14:37           ` Liang, Kan
2014-11-18 19:40             ` Liang, Kan
2014-11-19  5:57               ` Namhyung Kim
2014-11-17 15:55   ` Jiri Olsa
2014-11-18  6:14     ` Namhyung Kim
2014-11-18  6:25   ` Namhyung Kim
2014-11-18 14:01     ` Liang, Kan
2014-11-19  6:01       ` Namhyung Kim
2014-11-19 13:37         ` Liang, Kan
2014-11-17 16:01 ` [PATCH V3 0/3] perf tool: Haswell LBR call stack support (user) Jiri Olsa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).