linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Implement lbr-as-callgraph v10
@ 2014-11-13  2:05 Andi Kleen
  2014-11-13  2:05 ` [PATCH 01/10] perf, tools: Factor out adding new call chain entries Andi Kleen
                   ` (10 more replies)
  0 siblings, 11 replies; 55+ messages in thread
From: Andi Kleen @ 2014-11-13  2:05 UTC (permalink / raw)
  To: jolsa; +Cc: linux-kernel, namhyung, acme

[Reworks to address all the review feedback. Rebased to latest tree]
[Just a repost after a rebase]
[Even more review feedback and some bugs addressed.]
[Only port to changes in perf/core. No other changes.]
[Rebase to latest perf/core]
[Another rebase. No changes]

This patchkit implements lbr-as-callgraphs in per freport,
as an alternative way to present LBR information.

Current perf report does a histogram over the branch edges,
which is useful to look at basic blocks, but doesn't tell
you anything about the larger control flow behaviour.

This patchkit adds a new option --branch-history that
adds the branch paths to the callgraph history instead.

This allows to reason about individual branch paths leading
to specific samples.

Also available at
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/lbr-callgraph8

v2:
- rebased on perf/core
- fix various issues
- rename the option to --branch-history
- various fixes to display the information more concise
v3:
- White space changes
- Consolidate some patches
- Update some descriptions
v4:
- Fix various display problems
- Unknown srcline is now printed as symbol+offset
- Refactor some code to address review feedback
- Merge with latest tip
- Fix missing srcline display in stdio hist output.
v5:
- Rename functions
- Fix gtk build problem
- Fix crash without -g
- Improve error messages
- Improve srcline display in various ways
v6:
- Port to latest perf/core
v7:
- Really port to latest perf/core
v8:
- Rebased on 3.16-rc1
v9:
- Rebase on 3.17-rc* tip/perf/core
v10:
- Rebase to latest tree.
- Address review feedback (except where commented). See
the individual patches for changelog. I dropped the -v 
option support.

Example output:

    % perf record -b -g ./tsrc/tcall
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
    % perf report --branch-history
    ...
        54.91%  tcall.c:6  [.] f2                      tcall
                |
                |--65.53%-- f2 tcall.c:5
                |          |
                |          |--70.83%-- f1 tcall.c:11
                |          |          f1 tcall.c:10
                |          |          main tcall.c:18
                |          |          main tcall.c:18
                |          |          main tcall.c:17
                |          |          main tcall.c:17
                |          |          f1 tcall.c:13
                |          |          f1 tcall.c:13
                |          |          f2 tcall.c:7
                |          |          f2 tcall.c:5
                |          |          f1 tcall.c:12
                |          |          f1 tcall.c:12
                |          |          f2 tcall.c:7
                |          |          f2 tcall.c:5
                |          |          f1 tcall.c:11



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 01/10] perf, tools: Factor out adding new call chain entries
  2014-11-13  2:05 Implement lbr-as-callgraph v10 Andi Kleen
@ 2014-11-13  2:05 ` Andi Kleen
  2014-11-13 19:14   ` Arnaldo Carvalho de Melo
  2014-11-20  7:37   ` [tip:perf/core] perf callchain: " tip-bot for Andi Kleen
  2014-11-13  2:05 ` [PATCH 02/10] perf, tools: Support handling complete branch stacks as histograms Andi Kleen
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 55+ messages in thread
From: Andi Kleen @ 2014-11-13  2:05 UTC (permalink / raw)
  To: jolsa; +Cc: linux-kernel, namhyung, acme, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Move the code to resolve and add a new callchain entry
into a new add_callchain_ip function. This will be used
in the next patches to add LBRs too.

No change in behavior.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/util/machine.c | 51 +++++++++++++++++++++++++++++------------------
 1 file changed, 32 insertions(+), 19 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 52e9490..84390ee 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1381,6 +1381,34 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
 	return mi;
 }
 
+static int add_callchain_ip(struct thread *thread,
+			    struct symbol **parent,
+			    struct addr_location *root_al,
+			    int cpumode,
+			    u64 ip)
+{
+	struct addr_location al;
+
+	al.filtered = 0;
+	al.sym = NULL;
+	thread__find_addr_location(thread, cpumode, MAP__FUNCTION,
+				   ip, &al);
+	if (al.sym != NULL) {
+		if (sort__has_parent && !*parent &&
+		    symbol__match_regex(al.sym, &parent_regex))
+			*parent = al.sym;
+		else if (have_ignore_callees && root_al &&
+		  symbol__match_regex(al.sym, &ignore_callees_regex)) {
+			/* Treat this symbol as the root,
+			   forgetting its callees. */
+			*root_al = al;
+			callchain_cursor_reset(&callchain_cursor);
+		}
+	}
+
+	return callchain_cursor_append(&callchain_cursor, ip, al.map, al.sym);
+}
+
 struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
 					   struct addr_location *al)
 {
@@ -1427,7 +1455,6 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 
 	for (i = 0; i < chain_nr; i++) {
 		u64 ip;
-		struct addr_location al;
 
 		if (callchain_param.order == ORDER_CALLEE)
 			j = i;
@@ -1464,24 +1491,10 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 			continue;
 		}
 
-		al.filtered = 0;
-		thread__find_addr_location(thread, cpumode,
-					   MAP__FUNCTION, ip, &al);
-		if (al.sym != NULL) {
-			if (sort__has_parent && !*parent &&
-			    symbol__match_regex(al.sym, &parent_regex))
-				*parent = al.sym;
-			else if (have_ignore_callees && root_al &&
-			  symbol__match_regex(al.sym, &ignore_callees_regex)) {
-				/* Treat this symbol as the root,
-				   forgetting its callees. */
-				*root_al = al;
-				callchain_cursor_reset(&callchain_cursor);
-			}
-		}
-
-		err = callchain_cursor_append(&callchain_cursor,
-					      ip, al.map, al.sym);
+		err = add_callchain_ip(thread, parent, root_al,
+				       cpumode, ip);
+		if (err == -EINVAL)
+			break;
 		if (err)
 			return err;
 	}
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 02/10] perf, tools: Support handling complete branch stacks as histograms
  2014-11-13  2:05 Implement lbr-as-callgraph v10 Andi Kleen
  2014-11-13  2:05 ` [PATCH 01/10] perf, tools: Factor out adding new call chain entries Andi Kleen
@ 2014-11-13  2:05 ` Andi Kleen
  2014-11-13 19:14   ` Arnaldo Carvalho de Melo
  2014-12-08  6:53   ` [tip:perf/core] perf callchain: " tip-bot for Andi Kleen
  2014-11-13  2:05 ` [PATCH 03/10] perf, tools: Use al.addr to set up call chain Andi Kleen
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 55+ messages in thread
From: Andi Kleen @ 2014-11-13  2:05 UTC (permalink / raw)
  To: jolsa; +Cc: linux-kernel, namhyung, acme, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Currently branch stacks can be only shown as edge histograms for
individual branches. I never found this display particularly useful.

This implements an alternative mode that creates histograms over complete
branch traces, instead of individual branches, similar to how normal
callgraphs are handled. This is done by putting it in
front of the normal callgraph and then using the normal callgraph
histogram infrastructure to unify them.

This way in complex functions we can understand the control flow
that lead to a particular sample, and may even see some control
flow in the caller for short functions.

Example (simplified, of course for such simple code this
is usually not needed):

tcall.c:

volatile a = 10000, b = 100000, c;

__attribute__((noinline)) f2()
{
	c = a / b;
}

__attribute__((noinline)) f1()
{
	f2();
	f2();
}
main()
{
	int i;
	for (i = 0; i < 1000000; i++)
		f1();
}

% perf record -b -g ./tsrc/tcall
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
% perf report --branch-history
...
    54.91%  tcall.c:6  [.] f2                      tcall
            |
            |--65.53%-- f2 tcall.c:5
            |          |
            |          |--70.83%-- f1 tcall.c:11
            |          |          f1 tcall.c:10
            |          |          main tcall.c:18
            |          |          main tcall.c:18
            |          |          main tcall.c:17
            |          |          main tcall.c:17
            |          |          f1 tcall.c:13
            |          |          f1 tcall.c:13
            |          |          f2 tcall.c:7
            |          |          f2 tcall.c:5
            |          |          f1 tcall.c:12
            |          |          f1 tcall.c:12
            |          |          f2 tcall.c:7
            |          |          f2 tcall.c:5
            |          |          f1 tcall.c:11
            |          |
            |           --29.17%-- f1 tcall.c:12
            |                     f1 tcall.c:12
            |                     f2 tcall.c:7
            |                     f2 tcall.c:5
            |                     f1 tcall.c:11
            |                     f1 tcall.c:10
            |                     main tcall.c:18
            |                     main tcall.c:18
            |                     main tcall.c:17
            |                     main tcall.c:17
            |                     f1 tcall.c:13
            |                     f1 tcall.c:13
            |                     f2 tcall.c:7
            |                     f2 tcall.c:5
            |                     f1 tcall.c:12

The default output is unchanged.

This is only implemented in perf report, no change to record
or anywhere else.

This adds the basic code to report:
- add a new "branch" option to the -g option parser to enable this mode
- when the flag is set include the LBR into the callstack in machine.c.
The rest of the history code is unchanged and doesn't know the difference
between LBR entry and normal call entry.
- detect overlaps with the callchain
- remove small loop duplicates in the LBR

Current limitations:
- The LBR flags (mispredict etc.) are not shown in the history
and LBR entries have no special marker.
- It would be nice if annotate marked the LBR entries somehow
(e.g. with arrows)

v2: Various fixes.
v3: Merge further patches into this one. Fix white space.
v4: Improve manpage. Address review feedback.
v5: Rename functions. Better error message without -g. Fix crash without
    -b.
v6: Rebase
v7: Rebase. Use NO_ENTRY in memset.
v8: Port to latest tip. Move add_callchain_ip to separate
    patch. Skip initial entries in callchain. Minor cleanups.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-report.txt |   7 +-
 tools/perf/builtin-report.c              |   4 +-
 tools/perf/util/callchain.c              |   4 +
 tools/perf/util/callchain.h              |   1 +
 tools/perf/util/machine.c                | 125 ++++++++++++++++++++++++++++---
 tools/perf/util/symbol.h                 |   3 +-
 6 files changed, 131 insertions(+), 13 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 0927bf4..22706be 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -159,7 +159,7 @@ OPTIONS
 --dump-raw-trace::
         Dump raw trace in ASCII.
 
--g [type,min[,limit],order[,key]]::
+-g [type,min[,limit],order[,key][,branch]]::
 --call-graph::
         Display call chains using type, min percent threshold, optional print
 	limit and order.
@@ -177,6 +177,11 @@ OPTIONS
 	- function: compare on functions
 	- address: compare on individual code addresses
 
+	branch can be:
+	- branch: include last branch information in callgraph
+	when available. Usually more convenient to use --branch-history
+	for this.
+
 	Default: fractal,0.5,callee,function.
 
 --children::
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 140a6cd..410d44f 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -637,8 +637,8 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 		   "regex filter to identify parent, see: '--sort parent'"),
 	OPT_BOOLEAN('x', "exclude-other", &symbol_conf.exclude_other,
 		    "Only display entries with parent-match"),
-	OPT_CALLBACK_DEFAULT('g', "call-graph", &report, "output_type,min_percent[,print_limit],call_order",
-		     "Display callchains using output_type (graph, flat, fractal, or none) , min percent threshold, optional print limit, callchain order, key (function or address). "
+	OPT_CALLBACK_DEFAULT('g', "call-graph", &report, "output_type,min_percent[,print_limit],call_order[,branch]",
+		     "Display callchains using output_type (graph, flat, fractal, or none) , min percent threshold, optional print limit, callchain order, key (function or address), add branches. "
 		     "Default: fractal,0.5,callee,function", &report_parse_callchain_opt, callchain_default_opt),
 	OPT_BOOLEAN(0, "children", &symbol_conf.cumulate_callchain,
 		    "Accumulate callchains of children and show total overhead as well"),
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 0022980..7131ee3 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -149,6 +149,10 @@ static int parse_callchain_sort_key(const char *value)
 		callchain_param.key = CCKEY_ADDRESS;
 		return 0;
 	}
+	if (!strncmp(value, "branch", strlen(value))) {
+		callchain_param.branch_callstack = 1;
+		return 0;
+	}
 	return -1;
 }
 
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 3caccc2..f570b8e 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -63,6 +63,7 @@ struct callchain_param {
 	sort_chain_func_t	sort;
 	enum chain_order	order;
 	enum chain_key		key;
+	bool			branch_callstack;
 };
 
 extern struct callchain_param callchain_param;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 84390ee..2e16d69 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -12,6 +12,7 @@
 #include <stdbool.h>
 #include <symbol/kallsyms.h>
 #include "unwind.h"
+#include "linux/hash.h"
 
 static void dsos__init(struct dsos *dsos)
 {
@@ -1391,7 +1392,11 @@ static int add_callchain_ip(struct thread *thread,
 
 	al.filtered = 0;
 	al.sym = NULL;
-	thread__find_addr_location(thread, cpumode, MAP__FUNCTION,
+	if (cpumode == -1)
+		thread__find_cpumode_addr_location(thread, MAP__FUNCTION,
+						   ip, &al);
+	else
+		thread__find_addr_location(thread, cpumode, MAP__FUNCTION,
 				   ip, &al);
 	if (al.sym != NULL) {
 		if (sort__has_parent && !*parent &&
@@ -1427,8 +1432,49 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
 	return bi;
 }
 
+#define CHASHSZ 127
+#define CHASHBITS 7
+#define NO_ENTRY 0xff
+
+#define PERF_MAX_BRANCH_DEPTH 127
+
+/* Remove loops. */
+static int remove_loops(struct branch_entry *l, int nr)
+{
+	int i, j, off;
+	unsigned char chash[CHASHSZ];
+
+	memset(chash, NO_ENTRY, sizeof(chash));
+
+	BUG_ON(nr >= 256);
+	for (i = 0; i < nr; i++) {
+		int h = hash_64(l[i].from, CHASHBITS) % CHASHSZ;
+
+		/* no collision handling for now */
+		if (chash[h] == NO_ENTRY) {
+			chash[h] = i;
+		} else if (l[chash[h]].from == l[i].from) {
+			bool is_loop = true;
+			/* check if it is a real loop */
+			off = 0;
+			for (j = chash[h]; j < i && i + off < nr; j++, off++)
+				if (l[j].from != l[i + off].from) {
+					is_loop = false;
+					break;
+				}
+			if (is_loop) {
+				memmove(l + i, l + i + off,
+					(nr - (i + off)) * sizeof(*l));
+				nr -= off;
+			}
+		}
+	}
+	return nr;
+}
+
 static int thread__resolve_callchain_sample(struct thread *thread,
 					     struct ip_callchain *chain,
+					     struct branch_stack *branch,
 					     struct symbol **parent,
 					     struct addr_location *root_al,
 					     int max_stack)
@@ -1438,22 +1484,82 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 	int i;
 	int j;
 	int err;
-	int skip_idx __maybe_unused;
+	int skip_idx = -1;
+	int first_call = 0;
+
+	/*
+	 * Based on DWARF debug information, some architectures skip
+	 * a callchain entry saved by the kernel.
+	 */
+	if (chain->nr < PERF_MAX_STACK_DEPTH)
+		skip_idx = arch_skip_callchain_idx(thread, chain);
 
 	callchain_cursor_reset(&callchain_cursor);
 
+	/*
+	 * Add branches to call stack for easier browsing. This gives
+	 * more context for a sample than just the callers.
+	 *
+	 * This uses individual histograms of paths compared to the
+	 * aggregated histograms the normal LBR mode uses.
+	 *
+	 * Limitations for now:
+	 * - No extra filters
+	 * - No annotations (should annotate somehow)
+	 */
+
+	if (branch && callchain_param.branch_callstack) {
+		int nr = min(max_stack, (int)branch->nr);
+		struct branch_entry be[nr];
+
+		if (branch->nr > PERF_MAX_BRANCH_DEPTH) {
+			pr_warning("corrupted branch chain. skipping...\n");
+			goto check_calls;
+		}
+
+		for (i = 0; i < nr; i++) {
+			if (callchain_param.order == ORDER_CALLEE) {
+				be[i] = branch->entries[i];
+				/*
+				 * Check for overlap into the callchain.
+				 * The return address is one off compared to
+				 * the branch entry. To adjust for this
+				 * assume the calling instruction is not longer
+				 * than 8 bytes.
+				 */
+				if (i == skip_idx ||
+				    chain->ips[first_call] >= PERF_CONTEXT_MAX)
+					first_call++;
+				else if (be[i].from < chain->ips[first_call] &&
+				    be[i].from >= chain->ips[first_call] - 8)
+					first_call++;
+			} else
+				be[i] = branch->entries[branch->nr - i - 1];
+		}
+
+		nr = remove_loops(be, nr);
+
+		for (i = 0; i < nr; i++) {
+			err = add_callchain_ip(thread, parent, root_al,
+					       -1, be[i].to);
+			if (!err)
+				err = add_callchain_ip(thread, parent, root_al,
+						       -1, be[i].from);
+			if (err == -EINVAL)
+				break;
+			if (err)
+				return err;
+		}
+		chain_nr -= nr;
+	}
+
+check_calls:
 	if (chain->nr > PERF_MAX_STACK_DEPTH) {
 		pr_warning("corrupted callchain. skipping...\n");
 		return 0;
 	}
 
-	/*
-	 * Based on DWARF debug information, some architectures skip
-	 * a callchain entry saved by the kernel.
-	 */
-	skip_idx = arch_skip_callchain_idx(thread, chain);
-
-	for (i = 0; i < chain_nr; i++) {
+	for (i = first_call; i < chain_nr; i++) {
 		u64 ip;
 
 		if (callchain_param.order == ORDER_CALLEE)
@@ -1517,6 +1623,7 @@ int thread__resolve_callchain(struct thread *thread,
 			      int max_stack)
 {
 	int ret = thread__resolve_callchain_sample(thread, sample->callchain,
+						   sample->branch_stack,
 						   parent, root_al, max_stack);
 	if (ret)
 		return ret;
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index ded3ca7..b364d96 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -123,7 +123,8 @@ struct symbol_conf {
 			demangle,
 			demangle_kernel,
 			filter_relative,
-			show_hist_headers;
+			show_hist_headers,
+			branch_callstack;
 	const char	*vmlinux_name,
 			*kallsyms_name,
 			*source_prefix,
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 03/10] perf, tools: Use al.addr to set up call chain
  2014-11-13  2:05 Implement lbr-as-callgraph v10 Andi Kleen
  2014-11-13  2:05 ` [PATCH 01/10] perf, tools: Factor out adding new call chain entries Andi Kleen
  2014-11-13  2:05 ` [PATCH 02/10] perf, tools: Support handling complete branch stacks as histograms Andi Kleen
@ 2014-11-13  2:05 ` Andi Kleen
  2014-11-13 19:16   ` Arnaldo Carvalho de Melo
  2014-11-20  7:38   ` [tip:perf/core] perf callchain: " tip-bot for Andi Kleen
  2014-11-13  2:05 ` [PATCH 04/10] perf, tools: Add --branch-history option to report Andi Kleen
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 55+ messages in thread
From: Andi Kleen @ 2014-11-13  2:05 UTC (permalink / raw)
  To: jolsa; +Cc: linux-kernel, namhyung, acme, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Use the relative address, this makes get_srcline work correctly
in the end.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/util/machine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 2e16d69..066e963 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1411,7 +1411,7 @@ static int add_callchain_ip(struct thread *thread,
 		}
 	}
 
-	return callchain_cursor_append(&callchain_cursor, ip, al.map, al.sym);
+	return callchain_cursor_append(&callchain_cursor, al.addr, al.map, al.sym);
 }
 
 struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 04/10] perf, tools: Add --branch-history option to report
  2014-11-13  2:05 Implement lbr-as-callgraph v10 Andi Kleen
                   ` (2 preceding siblings ...)
  2014-11-13  2:05 ` [PATCH 03/10] perf, tools: Use al.addr to set up call chain Andi Kleen
@ 2014-11-13  2:05 ` Andi Kleen
  2014-12-08  6:53   ` [tip:perf/core] perf report: Add --branch-history option tip-bot for Andi Kleen
  2014-11-13  2:05 ` [PATCH 05/10] perf, tools: Use a common function to resolve symbol or name Andi Kleen
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2014-11-13  2:05 UTC (permalink / raw)
  To: jolsa; +Cc: linux-kernel, namhyung, acme, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add a --branch-history option to perf report that changes all
the settings necessary for using the branches in callstacks.

This is just a short cut to make this nicer to use, it does
not enable any functionality by itself.

v2: Change sort order. Rename option to --branch-history to
be less confusing.
v3: Updates
v4: Fix conflict with newer perf base
v5: Port to latest tip
v6: Add more comments. Remove CCKEY_ADDRESS setting. Remove
unnecessary branch_mode setting. Use a boolean.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-report.txt |  5 +++++
 tools/perf/builtin-report.c              | 26 ++++++++++++++++++++++----
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 22706be..dd7cccd 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -271,6 +271,11 @@ OPTIONS
 	branch stacks and it will automatically switch to the branch view mode,
 	unless --no-branch-stack is used.
 
+--branch-history::
+	Add the addresses of sampled taken branches to the callstack.
+	This allows to examine the path the program took to each sample.
+	The data collection must have used -b (or -j) and -g.
+
 --objdump=<path>::
         Path to objdump binary.
 
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 410d44f..fb272ff 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -226,8 +226,9 @@ static int report__setup_sample_type(struct report *rep)
 			return -EINVAL;
 		}
 		if (symbol_conf.use_callchain) {
-			ui__error("Selected -g but no callchain data. Did "
-				    "you call 'perf record' without -g?\n");
+			ui__error("Selected -g or --branch-history but no "
+				  "callchain data. Did\n"
+				  "you call 'perf record' without -g?\n");
 			return -1;
 		}
 	} else if (!rep->dont_use_callchains &&
@@ -575,6 +576,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 	struct stat st;
 	bool has_br_stack = false;
 	int branch_mode = -1;
+	bool branch_call_mode = false;
 	char callchain_default_opt[] = "fractal,0.5,callee";
 	const char * const report_usage[] = {
 		"perf report [<options>]",
@@ -684,7 +686,10 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_BOOLEAN(0, "group", &symbol_conf.event_group,
 		    "Show event group information together"),
 	OPT_CALLBACK_NOOPT('b', "branch-stack", &branch_mode, "",
-		    "use branch records for histogram filling", parse_branch_mode),
+		    "use branch records for per branch histogram filling",
+		    parse_branch_mode),
+	OPT_BOOLEAN(0, "branch-history", &branch_call_mode,
+		    "add last branch records to call history"),
 	OPT_STRING(0, "objdump", &objdump_path, "path",
 		   "objdump binary to use for disassembly and annotations"),
 	OPT_BOOLEAN(0, "demangle", &symbol_conf.demangle,
@@ -745,10 +750,23 @@ repeat:
 	has_br_stack = perf_header__has_feat(&session->header,
 					     HEADER_BRANCH_STACK);
 
-	if ((branch_mode == -1 && has_br_stack) || branch_mode == 1) {
+	/*
+	 * Branch mode is a tristate:
+	 * -1 means default, so decide based on the file having branch data.
+	 * 0/1 means the user chose a mode.
+	 */
+	if (((branch_mode == -1 && has_br_stack) || branch_mode == 1) &&
+	    branch_call_mode == -1) {
 		sort__mode = SORT_MODE__BRANCH;
 		symbol_conf.cumulate_callchain = false;
 	}
+	if (branch_call_mode) {
+		callchain_param.branch_callstack = 1;
+		symbol_conf.use_callchain = true;
+		callchain_register_param(&callchain_param);
+		if (sort_order == NULL)
+			sort_order = "srcline,symbol,dso";
+	}
 
 	if (report.mem_mode) {
 		if (sort__mode == SORT_MODE__BRANCH) {
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 05/10] perf, tools: Use a common function to resolve symbol or name
  2014-11-13  2:05 Implement lbr-as-callgraph v10 Andi Kleen
                   ` (3 preceding siblings ...)
  2014-11-13  2:05 ` [PATCH 04/10] perf, tools: Add --branch-history option to report Andi Kleen
@ 2014-11-13  2:05 ` Andi Kleen
  2014-11-13 19:17   ` Arnaldo Carvalho de Melo
  2014-11-20  7:38   ` [tip:perf/core] perf callchain: " tip-bot for Andi Kleen
  2014-11-13  2:05 ` [PATCH 06/10] perf, tools: Enable printing the srcline in the history Andi Kleen
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 55+ messages in thread
From: Andi Kleen @ 2014-11-13  2:05 UTC (permalink / raw)
  To: jolsa; +Cc: linux-kernel, namhyung, acme, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Refactor the duplicated code to resolve the symbol name or
the address of a symbol into a single function.

Used in next patch to add common functionality.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/ui/browsers/hists.c | 17 -----------------
 tools/perf/ui/gtk/hists.c      | 11 +----------
 tools/perf/ui/stdio/hist.c     | 23 +++++++++--------------
 tools/perf/util/callchain.c    | 19 +++++++++++++++++++
 tools/perf/util/callchain.h    |  3 +++
 5 files changed, 32 insertions(+), 41 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index cfb976b..12c17c5 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -463,23 +463,6 @@ out:
 	return key;
 }
 
-static char *callchain_list__sym_name(struct callchain_list *cl,
-				      char *bf, size_t bfsize, bool show_dso)
-{
-	int printed;
-
-	if (cl->ms.sym)
-		printed = scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
-	else
-		printed = scnprintf(bf, bfsize, "%#" PRIx64, cl->ip);
-
-	if (show_dso)
-		scnprintf(bf + printed, bfsize - printed, " %s",
-			  cl->ms.map ? cl->ms.map->dso->short_name : "unknown");
-
-	return bf;
-}
-
 struct callchain_print_arg {
 	/* for hists browser */
 	off_t	row_offset;
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index fc654fb..4b3585e 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -89,15 +89,6 @@ void perf_gtk__init_hpp(void)
 				perf_gtk__hpp_color_overhead_acc;
 }
 
-static void callchain_list__sym_name(struct callchain_list *cl,
-				     char *bf, size_t bfsize)
-{
-	if (cl->ms.sym)
-		scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
-	else
-		scnprintf(bf, bfsize, "%#" PRIx64, cl->ip);
-}
-
 static void perf_gtk__add_callchain(struct rb_root *root, GtkTreeStore *store,
 				    GtkTreeIter *parent, int col, u64 total)
 {
@@ -128,7 +119,7 @@ static void perf_gtk__add_callchain(struct rb_root *root, GtkTreeStore *store,
 			scnprintf(buf, sizeof(buf), "%5.2f%%", percent);
 			gtk_tree_store_set(store, &iter, 0, buf, -1);
 
-			callchain_list__sym_name(chain, buf, sizeof(buf));
+			callchain_list__sym_name(chain, buf, sizeof(buf), false);
 			gtk_tree_store_set(store, &iter, col, buf, -1);
 
 			if (need_new_parent) {
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 15b451a..dfcbc90 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -41,6 +41,7 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_list *chain,
 {
 	int i;
 	size_t ret = 0;
+	char bf[1024];
 
 	ret += callchain__fprintf_left_margin(fp, left_margin);
 	for (i = 0; i < depth; i++) {
@@ -56,11 +57,8 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_list *chain,
 		} else
 			ret += fprintf(fp, "%s", "          ");
 	}
-	if (chain->ms.sym)
-		ret += fprintf(fp, "%s\n", chain->ms.sym->name);
-	else
-		ret += fprintf(fp, "0x%0" PRIx64 "\n", chain->ip);
-
+	fputs(callchain_list__sym_name(chain, bf, sizeof(bf), false), fp);
+	fputc('\n', fp);
 	return ret;
 }
 
@@ -168,6 +166,7 @@ static size_t callchain__fprintf_graph(FILE *fp, struct rb_root *root,
 	struct rb_node *node;
 	int i = 0;
 	int ret = 0;
+	char bf[1024];
 
 	/*
 	 * If have one single callchain root, don't bother printing
@@ -196,10 +195,8 @@ static size_t callchain__fprintf_graph(FILE *fp, struct rb_root *root,
 			} else
 				ret += callchain__fprintf_left_margin(fp, left_margin);
 
-			if (chain->ms.sym)
-				ret += fprintf(fp, " %s\n", chain->ms.sym->name);
-			else
-				ret += fprintf(fp, " %p\n", (void *)(long)chain->ip);
+			ret += fprintf(fp, "%s\n", callchain_list__sym_name(chain, bf, sizeof(bf),
+							false));
 
 			if (++entries_printed == callchain_param.print_limit)
 				break;
@@ -219,6 +216,7 @@ static size_t __callchain__fprintf_flat(FILE *fp, struct callchain_node *node,
 {
 	struct callchain_list *chain;
 	size_t ret = 0;
+	char bf[1024];
 
 	if (!node)
 		return 0;
@@ -229,11 +227,8 @@ static size_t __callchain__fprintf_flat(FILE *fp, struct callchain_node *node,
 	list_for_each_entry(chain, &node->val, list) {
 		if (chain->ip >= PERF_CONTEXT_MAX)
 			continue;
-		if (chain->ms.sym)
-			ret += fprintf(fp, "                %s\n", chain->ms.sym->name);
-		else
-			ret += fprintf(fp, "                %p\n",
-					(void *)(long)chain->ip);
+		ret += fprintf(fp, "                %s\n", callchain_list__sym_name(chain,
+					bf, sizeof(bf), false));
 	}
 
 	return ret;
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 7131ee3..10870c9 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -812,3 +812,22 @@ int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *
 out:
 	return 1;
 }
+
+char *callchain_list__sym_name(struct callchain_list *cl,
+			       char *bf, size_t bfsize, bool show_dso)
+{
+	int printed;
+
+	if (cl->ms.sym) {
+		printed = scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
+	} else
+		printed = scnprintf(bf, bfsize, "%#" PRIx64, cl->ip);
+
+	if (show_dso)
+		scnprintf(bf + printed, bfsize - printed, " %s",
+			  cl->ms.map ?
+			  cl->ms.map->dso->short_name :
+			  "unknown");
+
+	return bf;
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index f570b8e..5d5200c 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -194,4 +194,7 @@ static inline int arch_skip_callchain_idx(struct thread *thread __maybe_unused,
 }
 #endif
 
+char *callchain_list__sym_name(struct callchain_list *cl,
+			       char *bf, size_t bfsize, bool show_dso);
+
 #endif	/* __PERF_CALLCHAIN_H */
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 06/10] perf, tools: Enable printing the srcline in the history
  2014-11-13  2:05 Implement lbr-as-callgraph v10 Andi Kleen
                   ` (4 preceding siblings ...)
  2014-11-13  2:05 ` [PATCH 05/10] perf, tools: Use a common function to resolve symbol or name Andi Kleen
@ 2014-11-13  2:05 ` Andi Kleen
  2014-11-13 19:20   ` Arnaldo Carvalho de Melo
  2014-12-08  6:48   ` [tip:perf/core] perf callchain: " tip-bot for Andi Kleen
  2014-11-13  2:05 ` [PATCH 07/10] perf, tools: Only print base source file for srcline Andi Kleen
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 55+ messages in thread
From: Andi Kleen @ 2014-11-13  2:05 UTC (permalink / raw)
  To: jolsa; +Cc: linux-kernel, namhyung, acme, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

For lbr-as-callgraph we need to see the line number in the history,
because many LBR entries can be in a single function, and just
showing the same function name many times is not useful.

When the history code is configured to sort by address, also try to
resolve the address to a file:srcline and display this in the browser.
If that doesn't work still display the address.

This can be also useful without LBRs for understanding which call in a large
function (or in which inlined function) called something else.

Contains fixes from Namhyung Kim

v2: Refactor code into common function
v3: Fix GTK build
v4: Rebase
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/util/callchain.c | 11 ++++++++++-
 tools/perf/util/callchain.h |  1 +
 tools/perf/util/srcline.c   |  6 ++++--
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 10870c9..8046eb9 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -819,7 +819,16 @@ char *callchain_list__sym_name(struct callchain_list *cl,
 	int printed;
 
 	if (cl->ms.sym) {
-		printed = scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
+		if (callchain_param.key == CCKEY_ADDRESS &&
+		    cl->ms.map && !cl->srcline)
+			cl->srcline = get_srcline(cl->ms.map->dso,
+						  map__rip_2objdump(cl->ms.map,
+								    cl->ip));
+		if (cl->srcline)
+			printed = scnprintf(bf, bfsize, "%s %s",
+					cl->ms.sym->name, cl->srcline);
+		else
+			printed = scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
 	} else
 		printed = scnprintf(bf, bfsize, "%#" PRIx64, cl->ip);
 
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 5d5200c..dbc08cf 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -71,6 +71,7 @@ extern struct callchain_param callchain_param;
 struct callchain_list {
 	u64			ip;
 	struct map_symbol	ms;
+	char		       *srcline;
 	struct list_head	list;
 };
 
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index f3e4bc5..c6a7cdc 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -258,7 +258,7 @@ char *get_srcline(struct dso *dso, unsigned long addr)
 	const char *dso_name;
 
 	if (!dso->has_srcline)
-		return SRCLINE_UNKNOWN;
+		goto out;
 
 	if (dso->symsrc_filename)
 		dso_name = dso->symsrc_filename;
@@ -289,7 +289,9 @@ out:
 		dso->has_srcline = 0;
 		dso__free_a2l(dso);
 	}
-	return SRCLINE_UNKNOWN;
+	if (asprintf(&srcline, "%s[%lx]", dso->short_name, addr) < 0)
+		return SRCLINE_UNKNOWN;
+	return srcline;
 }
 
 void free_srcline(char *srcline)
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 07/10] perf, tools: Only print base source file for srcline
  2014-11-13  2:05 Implement lbr-as-callgraph v10 Andi Kleen
                   ` (5 preceding siblings ...)
  2014-11-13  2:05 ` [PATCH 06/10] perf, tools: Enable printing the srcline in the history Andi Kleen
@ 2014-11-13  2:05 ` Andi Kleen
  2014-11-13 19:22   ` Arnaldo Carvalho de Melo
  2014-11-20  7:38   ` [tip:perf/core] perf " tip-bot for Andi Kleen
  2014-11-13  2:05 ` [PATCH 08/10] perf, tools: Support source line numbers in annotate Andi Kleen
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 55+ messages in thread
From: Andi Kleen @ 2014-11-13  2:05 UTC (permalink / raw)
  To: jolsa; +Cc: linux-kernel, namhyung, acme, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

For perf report with --sort srcline only print the base source file
name. This makes the results generally fit much better to the
screen. The path is usually not that useful anyways because it is
often from different systems.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/util/srcline.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index c6a7cdc..ac877f9 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -274,7 +274,7 @@ char *get_srcline(struct dso *dso, unsigned long addr)
 	if (!addr2line(dso_name, addr, &file, &line, dso))
 		goto out;
 
-	if (asprintf(&srcline, "%s:%u", file, line) < 0) {
+	if (asprintf(&srcline, "%s:%u", basename(file), line) < 0) {
 		free(file);
 		goto out;
 	}
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 08/10] perf, tools: Support source line numbers in annotate
  2014-11-13  2:05 Implement lbr-as-callgraph v10 Andi Kleen
                   ` (6 preceding siblings ...)
  2014-11-13  2:05 ` [PATCH 07/10] perf, tools: Only print base source file for srcline Andi Kleen
@ 2014-11-13  2:05 ` Andi Kleen
  2014-11-13 20:52   ` Arnaldo Carvalho de Melo
  2014-11-20  7:39   ` [tip:perf/core] perf annotate: " tip-bot for Andi Kleen
  2014-11-13  2:05 ` [PATCH 09/10] tools, perf: Make get_srcline fall back to sym+offset Andi Kleen
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 55+ messages in thread
From: Andi Kleen @ 2014-11-13  2:05 UTC (permalink / raw)
  To: jolsa; +Cc: linux-kernel, namhyung, acme, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

With srcline key/sort'ing it's useful to have line numbers
in the annotate window. This patch implements this.

Use objdump -l to request the line numbers and
save them in the line structure. Then the browser
displays them for source lines.

The line numbers are not displayed by default, but can be
toggled on with 'k'

There is one unfortunate problem with this setup. For
lines not containing source and which are outside functions
objdump -l reports line numbers off by a few: it always reports
the first line number in the next function even for lines
that are outside the function.
I haven't found a nice way to detect/correct this. Probably objdump
has to be fixed.
See https://sourceware.org/bugzilla/show_bug.cgi?id=16433

The line numbers are still useful even with these problems,
as most are correct and the ones which are not are nearby.

v2: Fix help text. Handle (discriminator...) output in objdump.
Left align the line numbers.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/ui/browsers/annotate.c | 13 ++++++++++++-
 tools/perf/util/annotate.c        | 30 +++++++++++++++++++++++++-----
 tools/perf/util/annotate.h        |  1 +
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index f0697a3..1e0a2fd 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -27,6 +27,7 @@ static struct annotate_browser_opt {
 	bool hide_src_code,
 	     use_offset,
 	     jump_arrows,
+	     show_linenr,
 	     show_nr_jumps;
 } annotate_browser__opts = {
 	.use_offset	= true,
@@ -128,7 +129,11 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int
 	if (!*dl->line)
 		slsmg_write_nstring(" ", width - pcnt_width);
 	else if (dl->offset == -1) {
-		printed = scnprintf(bf, sizeof(bf), "%*s  ",
+		if (dl->line_nr && annotate_browser__opts.show_linenr)
+			printed = scnprintf(bf, sizeof(bf), "%-*d ",
+					ab->addr_width + 1, dl->line_nr);
+		else
+			printed = scnprintf(bf, sizeof(bf), "%*s  ",
 				    ab->addr_width, " ");
 		slsmg_write_nstring(bf, printed);
 		slsmg_write_nstring(dl->line, width - printed - pcnt_width + 1);
@@ -733,6 +738,7 @@ static int annotate_browser__run(struct annotate_browser *browser,
 		"o             Toggle disassembler output/simplified view\n"
 		"s             Toggle source code view\n"
 		"/             Search string\n"
+		"k             Toggle line numbers\n"
 		"r             Run available scripts\n"
 		"?             Search string backwards\n");
 			continue;
@@ -741,6 +747,10 @@ static int annotate_browser__run(struct annotate_browser *browser,
 				script_browse(NULL);
 				continue;
 			}
+		case 'k':
+			annotate_browser__opts.show_linenr =
+				!annotate_browser__opts.show_linenr;
+			break;
 		case 'H':
 			nd = browser->curr_hot;
 			break;
@@ -984,6 +994,7 @@ static struct annotate_config {
 } annotate__configs[] = {
 	ANNOTATE_CFG(hide_src_code),
 	ANNOTATE_CFG(jump_arrows),
+	ANNOTATE_CFG(show_linenr),
 	ANNOTATE_CFG(show_nr_jumps),
 	ANNOTATE_CFG(use_offset),
 };
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 7dabde1..d821223 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -17,11 +17,13 @@
 #include "debug.h"
 #include "annotate.h"
 #include "evsel.h"
+#include <regex.h>
 #include <pthread.h>
 #include <linux/bitops.h>
 
 const char 	*disassembler_style;
 const char	*objdump_path;
+static regex_t	 file_lineno;
 
 static struct ins *ins__find(const char *name);
 static int disasm_line__parse(char *line, char **namep, char **rawp);
@@ -570,13 +572,15 @@ out_free_name:
 	return -1;
 }
 
-static struct disasm_line *disasm_line__new(s64 offset, char *line, size_t privsize)
+static struct disasm_line *disasm_line__new(s64 offset, char *line,
+					size_t privsize, int line_nr)
 {
 	struct disasm_line *dl = zalloc(sizeof(*dl) + privsize);
 
 	if (dl != NULL) {
 		dl->offset = offset;
 		dl->line = strdup(line);
+		dl->line_nr = line_nr;
 		if (dl->line == NULL)
 			goto out_delete;
 
@@ -788,13 +792,15 @@ static int disasm_line__print(struct disasm_line *dl, struct symbol *sym, u64 st
  * The ops.raw part will be parsed further according to type of the instruction.
  */
 static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
-				      FILE *file, size_t privsize)
+				      FILE *file, size_t privsize,
+				      int *line_nr)
 {
 	struct annotation *notes = symbol__annotation(sym);
 	struct disasm_line *dl;
 	char *line = NULL, *parsed_line, *tmp, *tmp2, *c;
 	size_t line_len;
 	s64 line_ip, offset = -1;
+	regmatch_t match[2];
 
 	if (getline(&line, &line_len, file) < 0)
 		return -1;
@@ -812,6 +818,12 @@ static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
 	line_ip = -1;
 	parsed_line = line;
 
+	/* /filename:linenr ? Save line number and ignore. */
+	if (regexec(&file_lineno, line, 2, match, 0) == 0) {
+		*line_nr = atoi(line + match[1].rm_so);
+		return 0;
+	}
+
 	/*
 	 * Strip leading spaces:
 	 */
@@ -842,8 +854,9 @@ static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
 			parsed_line = tmp2 + 1;
 	}
 
-	dl = disasm_line__new(offset, parsed_line, privsize);
+	dl = disasm_line__new(offset, parsed_line, privsize, *line_nr);
 	free(line);
+	(*line_nr)++;
 
 	if (dl == NULL)
 		return -1;
@@ -869,6 +882,11 @@ static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
 	return 0;
 }
 
+static __attribute__((constructor)) void symbol__init_regexpr(void)
+{
+	regcomp(&file_lineno, "^/[^:]+:([0-9]+)", REG_EXTENDED);
+}
+
 static void delete_last_nop(struct symbol *sym)
 {
 	struct annotation *notes = symbol__annotation(sym);
@@ -904,6 +922,7 @@ int symbol__annotate(struct symbol *sym, struct map *map, size_t privsize)
 	char symfs_filename[PATH_MAX];
 	struct kcore_extract kce;
 	bool delete_extract = false;
+	int lineno = 0;
 
 	if (filename)
 		symbol__join_symfs(symfs_filename, filename);
@@ -982,7 +1001,7 @@ fallback:
 	snprintf(command, sizeof(command),
 		 "%s %s%s --start-address=0x%016" PRIx64
 		 " --stop-address=0x%016" PRIx64
-		 " -d %s %s -C %s 2>/dev/null|grep -v %s|expand",
+		 " -l -d %s %s -C %s 2>/dev/null|grep -v %s|expand",
 		 objdump_path ? objdump_path : "objdump",
 		 disassembler_style ? "-M " : "",
 		 disassembler_style ? disassembler_style : "",
@@ -999,7 +1018,8 @@ fallback:
 		goto out_free_filename;
 
 	while (!feof(file))
-		if (symbol__parse_objdump_line(sym, map, file, privsize) < 0)
+		if (symbol__parse_objdump_line(sym, map, file, privsize,
+			    &lineno) < 0)
 			break;
 
 	/*
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 112d6e2..0784a94 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -58,6 +58,7 @@ struct disasm_line {
 	char		    *line;
 	char		    *name;
 	struct ins	    *ins;
+	int		    line_nr;
 	struct ins_operands ops;
 };
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 09/10] tools, perf: Make get_srcline fall back to sym+offset
  2014-11-13  2:05 Implement lbr-as-callgraph v10 Andi Kleen
                   ` (7 preceding siblings ...)
  2014-11-13  2:05 ` [PATCH 08/10] perf, tools: Support source line numbers in annotate Andi Kleen
@ 2014-11-13  2:05 ` Andi Kleen
  2014-12-08  6:49   ` [tip:perf/core] perf callchain: " tip-bot for Andi Kleen
  2014-11-13  2:05 ` [PATCH 10/10] tools, perf: Add asprintf replacement Andi Kleen
  2014-11-17 21:34 ` Implement lbr-as-callgraph v10 Arnaldo Carvalho de Melo
  10 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2014-11-13  2:05 UTC (permalink / raw)
  To: jolsa; +Cc: linux-kernel, namhyung, acme, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

When the source line is not found fall back to sym + offset.
This is generally much more useful than a raw address.
For this we need to pass in the symbol from the caller.
For some callers it's awkward to compute, so we stay
at the old behaviour.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/util/annotate.c  |  2 +-
 tools/perf/util/callchain.c |  3 ++-
 tools/perf/util/map.c       |  2 +-
 tools/perf/util/sort.c      |  6 ++++--
 tools/perf/util/srcline.c   | 12 +++++++++---
 tools/perf/util/util.h      |  4 +++-
 6 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index d821223..9837adf 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -1190,7 +1190,7 @@ static int symbol__get_source_line(struct symbol *sym, struct map *map,
 			goto next;
 
 		offset = start + i;
-		src_line->path = get_srcline(map->dso, offset);
+		src_line->path = get_srcline(map->dso, offset, NULL, false);
 		insert_source_line(&tmp_root, src_line);
 
 	next:
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 8046eb9..cf524a3 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -823,7 +823,8 @@ char *callchain_list__sym_name(struct callchain_list *cl,
 		    cl->ms.map && !cl->srcline)
 			cl->srcline = get_srcline(cl->ms.map->dso,
 						  map__rip_2objdump(cl->ms.map,
-								    cl->ip));
+								    cl->ip),
+						  cl->ms.sym, false);
 		if (cl->srcline)
 			printed = scnprintf(bf, bfsize, "%s %s",
 					cl->ms.sym->name, cl->srcline);
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 040a785..62ca9f2 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -360,7 +360,7 @@ int map__fprintf_srcline(struct map *map, u64 addr, const char *prefix,
 
 	if (map && map->dso) {
 		srcline = get_srcline(map->dso,
-				      map__rip_2objdump(map, addr));
+				      map__rip_2objdump(map, addr), NULL, true);
 		if (srcline != SRCLINE_UNKNOWN)
 			ret = fprintf(fp, "%s%s", prefix, srcline);
 		free_srcline(srcline);
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 9402885..95efaaf 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -291,7 +291,8 @@ sort__srcline_cmp(struct hist_entry *left, struct hist_entry *right)
 		else {
 			struct map *map = left->ms.map;
 			left->srcline = get_srcline(map->dso,
-					    map__rip_2objdump(map, left->ip));
+					   map__rip_2objdump(map, left->ip),
+						    left->ms.sym, true);
 		}
 	}
 	if (!right->srcline) {
@@ -300,7 +301,8 @@ sort__srcline_cmp(struct hist_entry *left, struct hist_entry *right)
 		else {
 			struct map *map = right->ms.map;
 			right->srcline = get_srcline(map->dso,
-					    map__rip_2objdump(map, right->ip));
+					     map__rip_2objdump(map, right->ip),
+						     right->ms.sym, true);
 		}
 	}
 	return strcmp(right->srcline, left->srcline);
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index ac877f9..36a7aff 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -8,12 +8,13 @@
 #include "util/util.h"
 #include "util/debug.h"
 
+#include "symbol.h"
+
 #ifdef HAVE_LIBBFD_SUPPORT
 
 /*
  * Implement addr2line using libbfd.
  */
-#define PACKAGE "perf"
 #include <bfd.h>
 
 struct a2l_data {
@@ -250,7 +251,8 @@ void dso__free_a2l(struct dso *dso __maybe_unused)
  */
 #define A2L_FAIL_LIMIT 123
 
-char *get_srcline(struct dso *dso, unsigned long addr)
+char *get_srcline(struct dso *dso, unsigned long addr, struct symbol *sym,
+		  bool show_sym)
 {
 	char *file = NULL;
 	unsigned line = 0;
@@ -289,7 +291,11 @@ out:
 		dso->has_srcline = 0;
 		dso__free_a2l(dso);
 	}
-	if (asprintf(&srcline, "%s[%lx]", dso->short_name, addr) < 0)
+	if (sym) {
+		if (asprintf(&srcline, "%s+%ld", show_sym ? sym->name : "",
+					addr - sym->start) < 0)
+			return SRCLINE_UNKNOWN;
+	} else if (asprintf(&srcline, "%s[%lx]", dso->short_name, addr) < 0)
 		return SRCLINE_UNKNOWN;
 	return srcline;
 }
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 7dc44cf..921c48b 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -338,8 +338,10 @@ static inline int path__join3(char *bf, size_t size,
 }
 
 struct dso;
+struct symbol;
 
-char *get_srcline(struct dso *dso, unsigned long addr);
+char *get_srcline(struct dso *dso, unsigned long addr, struct symbol *sym,
+		  bool show_sym);
 void free_srcline(char *srcline);
 
 int filename__read_int(const char *filename, int *value);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 10/10] tools, perf: Add asprintf replacement
  2014-11-13  2:05 Implement lbr-as-callgraph v10 Andi Kleen
                   ` (8 preceding siblings ...)
  2014-11-13  2:05 ` [PATCH 09/10] tools, perf: Make get_srcline fall back to sym+offset Andi Kleen
@ 2014-11-13  2:05 ` Andi Kleen
  2014-11-13 20:53   ` Arnaldo Carvalho de Melo
  2014-11-17 21:34 ` Implement lbr-as-callgraph v10 Arnaldo Carvalho de Melo
  10 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2014-11-13  2:05 UTC (permalink / raw)
  To: jolsa; +Cc: linux-kernel, namhyung, acme, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

asprintf corrupts memory on some older glibc versions.
Provide a replacement. This fixes various segfaults
with --branch-history on older Fedoras.

v2: Remove bogus hunk.
    Support arbitrary size (Geert Uytterhoeven)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Makefile.perf    |  1 +
 tools/perf/builtin-report.c |  1 +
 tools/perf/util/asprintf.c  | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+)
 create mode 100644 tools/perf/util/asprintf.c

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index aecf61dc..9db4ff1 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -395,6 +395,7 @@ LIB_OBJS += $(OUTPUT)util/vdso.o
 LIB_OBJS += $(OUTPUT)util/stat.o
 LIB_OBJS += $(OUTPUT)util/record.o
 LIB_OBJS += $(OUTPUT)util/srcline.o
+LIB_OBJS += $(OUTPUT)util/asprintf.o
 LIB_OBJS += $(OUTPUT)util/data.o
 LIB_OBJS += $(OUTPUT)util/tsc.o
 LIB_OBJS += $(OUTPUT)util/cloexec.o
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index fb272ff..493f011 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -762,6 +762,7 @@ repeat:
 	}
 	if (branch_call_mode) {
 		callchain_param.branch_callstack = 1;
+		callchain_param.key = CCKEY_ADDRESS;
 		symbol_conf.use_callchain = true;
 		callchain_register_param(&callchain_param);
 		if (sort_order == NULL)
diff --git a/tools/perf/util/asprintf.c b/tools/perf/util/asprintf.c
new file mode 100644
index 0000000..6177ee8
--- /dev/null
+++ b/tools/perf/util/asprintf.c
@@ -0,0 +1,34 @@
+/* Replacement for asprintf as it's buggy in older glibc versions */
+#include <stdio.h>
+#include <stdarg.h>
+#include <stdlib.h>
+#include <string.h>
+
+int vasprintf(char **str, const char *fmt, va_list ap)
+{
+	va_list ao;
+	char buf[1024];
+	int len;
+
+	va_copy(ao, ap);
+	len = vsnprintf(buf, sizeof(buf), fmt, ap);
+	*str = malloc(len + 1);
+	if (!*str)
+		return -1;
+	if ((size_t)len < sizeof(buf)) {
+		strcpy(*str, buf);
+		return len;
+	}
+	return vsnprintf(*str, len + 1, fmt, ao);
+}
+
+int asprintf(char **str, const char *fmt, ...)
+{
+	va_list ap;
+	int ret;
+
+	va_start(ap, fmt);
+	ret = vasprintf(str, fmt, ap);
+	va_end(ap);
+	return ret;
+}
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/10] perf, tools: Support handling complete branch stacks as histograms
  2014-11-13  2:05 ` [PATCH 02/10] perf, tools: Support handling complete branch stacks as histograms Andi Kleen
@ 2014-11-13 19:14   ` Arnaldo Carvalho de Melo
  2014-11-13 19:52     ` Andi Kleen
  2014-12-08  6:53   ` [tip:perf/core] perf callchain: " tip-bot for Andi Kleen
  1 sibling, 1 reply; 55+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-11-13 19:14 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jolsa, linux-kernel, namhyung, Andi Kleen

Em Wed, Nov 12, 2014 at 06:05:20PM -0800, Andi Kleen escreveu:
> +static int remove_loops(struct branch_entry *l, int nr)
> +{
> +	int i, j, off;
> +	unsigned char chash[CHASHSZ];
> +
> +	memset(chash, NO_ENTRY, sizeof(chash));
> +
> +	BUG_ON(nr >= 256);

What is wrong with return -1 and propagating the error, so that the user
is informed if the data being processed is bogus, stop processing with a
warning or continue processing if finding the next valid record is
possible?

In general just aborting (and this only does that when NDEBUG is set)
should be avoided...

I can go on and help you and do that for you, but why introduce it in
the first place?

- Arnaldo

> +	for (i = 0; i < nr; i++) {
> +		int h = hash_64(l[i].from, CHASHBITS) % CHASHSZ;
> +
> +		/* no collision handling for now */
> +		if (chash[h] == NO_ENTRY) {
> +			chash[h] = i;
> +		} else if (l[chash[h]].from == l[i].from) {
> +			bool is_loop = true;
> +			/* check if it is a real loop */
> +			off = 0;
> +			for (j = chash[h]; j < i && i + off < nr; j++, off++)
> +				if (l[j].from != l[i + off].from) {
> +					is_loop = false;
> +					break;
> +				}
> +			if (is_loop) {
> +				memmove(l + i, l + i + off,
> +					(nr - (i + off)) * sizeof(*l));
> +				nr -= off;
> +			}
> +		}
> +	}
> +	return nr;
> +}
> +
>  static int thread__resolve_callchain_sample(struct thread *thread,
>  					     struct ip_callchain *chain,
> +					     struct branch_stack *branch,
>  					     struct symbol **parent,
>  					     struct addr_location *root_al,
>  					     int max_stack)
> @@ -1438,22 +1484,82 @@ static int thread__resolve_callchain_sample(struct thread *thread,
>  	int i;
>  	int j;
>  	int err;
> -	int skip_idx __maybe_unused;
> +	int skip_idx = -1;
> +	int first_call = 0;
> +
> +	/*
> +	 * Based on DWARF debug information, some architectures skip
> +	 * a callchain entry saved by the kernel.
> +	 */
> +	if (chain->nr < PERF_MAX_STACK_DEPTH)
> +		skip_idx = arch_skip_callchain_idx(thread, chain);
>  
>  	callchain_cursor_reset(&callchain_cursor);
>  
> +	/*
> +	 * Add branches to call stack for easier browsing. This gives
> +	 * more context for a sample than just the callers.
> +	 *
> +	 * This uses individual histograms of paths compared to the
> +	 * aggregated histograms the normal LBR mode uses.
> +	 *
> +	 * Limitations for now:
> +	 * - No extra filters
> +	 * - No annotations (should annotate somehow)
> +	 */
> +
> +	if (branch && callchain_param.branch_callstack) {
> +		int nr = min(max_stack, (int)branch->nr);
> +		struct branch_entry be[nr];
> +
> +		if (branch->nr > PERF_MAX_BRANCH_DEPTH) {
> +			pr_warning("corrupted branch chain. skipping...\n");
> +			goto check_calls;
> +		}
> +
> +		for (i = 0; i < nr; i++) {
> +			if (callchain_param.order == ORDER_CALLEE) {
> +				be[i] = branch->entries[i];
> +				/*
> +				 * Check for overlap into the callchain.
> +				 * The return address is one off compared to
> +				 * the branch entry. To adjust for this
> +				 * assume the calling instruction is not longer
> +				 * than 8 bytes.
> +				 */
> +				if (i == skip_idx ||
> +				    chain->ips[first_call] >= PERF_CONTEXT_MAX)
> +					first_call++;
> +				else if (be[i].from < chain->ips[first_call] &&
> +				    be[i].from >= chain->ips[first_call] - 8)
> +					first_call++;
> +			} else
> +				be[i] = branch->entries[branch->nr - i - 1];
> +		}
> +
> +		nr = remove_loops(be, nr);
> +
> +		for (i = 0; i < nr; i++) {
> +			err = add_callchain_ip(thread, parent, root_al,
> +					       -1, be[i].to);
> +			if (!err)
> +				err = add_callchain_ip(thread, parent, root_al,
> +						       -1, be[i].from);
> +			if (err == -EINVAL)
> +				break;
> +			if (err)
> +				return err;
> +		}
> +		chain_nr -= nr;
> +	}
> +
> +check_calls:
>  	if (chain->nr > PERF_MAX_STACK_DEPTH) {
>  		pr_warning("corrupted callchain. skipping...\n");
>  		return 0;
>  	}
>  
> -	/*
> -	 * Based on DWARF debug information, some architectures skip
> -	 * a callchain entry saved by the kernel.
> -	 */
> -	skip_idx = arch_skip_callchain_idx(thread, chain);
> -
> -	for (i = 0; i < chain_nr; i++) {
> +	for (i = first_call; i < chain_nr; i++) {
>  		u64 ip;
>  
>  		if (callchain_param.order == ORDER_CALLEE)
> @@ -1517,6 +1623,7 @@ int thread__resolve_callchain(struct thread *thread,
>  			      int max_stack)
>  {
>  	int ret = thread__resolve_callchain_sample(thread, sample->callchain,
> +						   sample->branch_stack,
>  						   parent, root_al, max_stack);
>  	if (ret)
>  		return ret;
> diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> index ded3ca7..b364d96 100644
> --- a/tools/perf/util/symbol.h
> +++ b/tools/perf/util/symbol.h
> @@ -123,7 +123,8 @@ struct symbol_conf {
>  			demangle,
>  			demangle_kernel,
>  			filter_relative,
> -			show_hist_headers;
> +			show_hist_headers,
> +			branch_callstack;
>  	const char	*vmlinux_name,
>  			*kallsyms_name,
>  			*source_prefix,
> -- 
> 1.9.3

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 01/10] perf, tools: Factor out adding new call chain entries
  2014-11-13  2:05 ` [PATCH 01/10] perf, tools: Factor out adding new call chain entries Andi Kleen
@ 2014-11-13 19:14   ` Arnaldo Carvalho de Melo
  2014-11-20  7:37   ` [tip:perf/core] perf callchain: " tip-bot for Andi Kleen
  1 sibling, 0 replies; 55+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-11-13 19:14 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jolsa, linux-kernel, namhyung, Andi Kleen

Em Wed, Nov 12, 2014 at 06:05:19PM -0800, Andi Kleen escreveu:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Move the code to resolve and add a new callchain entry
> into a new add_callchain_ip function. This will be used
> in the next patches to add LBRs too.
> 
> No change in behavior.

Applied.

- Arnaldo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 03/10] perf, tools: Use al.addr to set up call chain
  2014-11-13  2:05 ` [PATCH 03/10] perf, tools: Use al.addr to set up call chain Andi Kleen
@ 2014-11-13 19:16   ` Arnaldo Carvalho de Melo
  2014-12-11 21:46     ` Jiri Olsa
  2014-11-20  7:38   ` [tip:perf/core] perf callchain: " tip-bot for Andi Kleen
  1 sibling, 1 reply; 55+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-11-13 19:16 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jolsa, linux-kernel, namhyung, Andi Kleen

Em Wed, Nov 12, 2014 at 06:05:21PM -0800, Andi Kleen escreveu:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Use the relative address, this makes get_srcline work correctly
> in the end.

Applied.
 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  tools/perf/util/machine.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 2e16d69..066e963 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -1411,7 +1411,7 @@ static int add_callchain_ip(struct thread *thread,
>  		}
>  	}
>  
> -	return callchain_cursor_append(&callchain_cursor, ip, al.map, al.sym);
> +	return callchain_cursor_append(&callchain_cursor, al.addr, al.map, al.sym);
>  }
>  
>  struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
> -- 
> 1.9.3

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 05/10] perf, tools: Use a common function to resolve symbol or name
  2014-11-13  2:05 ` [PATCH 05/10] perf, tools: Use a common function to resolve symbol or name Andi Kleen
@ 2014-11-13 19:17   ` Arnaldo Carvalho de Melo
  2014-11-20  7:38   ` [tip:perf/core] perf callchain: " tip-bot for Andi Kleen
  1 sibling, 0 replies; 55+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-11-13 19:17 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jolsa, linux-kernel, namhyung, Andi Kleen

Em Wed, Nov 12, 2014 at 06:05:23PM -0800, Andi Kleen escreveu:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Refactor the duplicated code to resolve the symbol name or
> the address of a symbol into a single function.
> 
> Used in next patch to add common functionality.

Applied.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 06/10] perf, tools: Enable printing the srcline in the history
  2014-11-13  2:05 ` [PATCH 06/10] perf, tools: Enable printing the srcline in the history Andi Kleen
@ 2014-11-13 19:20   ` Arnaldo Carvalho de Melo
  2014-12-08  6:48   ` [tip:perf/core] perf callchain: " tip-bot for Andi Kleen
  1 sibling, 0 replies; 55+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-11-13 19:20 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jolsa, linux-kernel, namhyung, Andi Kleen

Em Wed, Nov 12, 2014 at 06:05:24PM -0800, Andi Kleen escreveu:
> From: Andi Kleen <ak@linux.intel.com>
> 
> For lbr-as-callgraph we need to see the line number in the history,
> because many LBR entries can be in a single function, and just
> showing the same function name many times is not useful.
> 
> When the history code is configured to sort by address, also try to
> resolve the address to a file:srcline and display this in the browser.
> If that doesn't work still display the address.
> 
> This can be also useful without LBRs for understanding which call in a large
> function (or in which inlined function) called something else.
> 
> Contains fixes from Namhyung Kim

Applied.

- Arnaldo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 07/10] perf, tools: Only print base source file for srcline
  2014-11-13  2:05 ` [PATCH 07/10] perf, tools: Only print base source file for srcline Andi Kleen
@ 2014-11-13 19:22   ` Arnaldo Carvalho de Melo
  2014-11-20  7:38   ` [tip:perf/core] perf " tip-bot for Andi Kleen
  1 sibling, 0 replies; 55+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-11-13 19:22 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jolsa, linux-kernel, namhyung, Andi Kleen

Em Wed, Nov 12, 2014 at 06:05:25PM -0800, Andi Kleen escreveu:
> From: Andi Kleen <ak@linux.intel.com>
> 
> For perf report with --sort srcline only print the base source file
> name. This makes the results generally fit much better to the
> screen. The path is usually not that useful anyways because it is
> often from different systems.

Applied
 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  tools/perf/util/srcline.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
> index c6a7cdc..ac877f9 100644
> --- a/tools/perf/util/srcline.c
> +++ b/tools/perf/util/srcline.c
> @@ -274,7 +274,7 @@ char *get_srcline(struct dso *dso, unsigned long addr)
>  	if (!addr2line(dso_name, addr, &file, &line, dso))
>  		goto out;
>  
> -	if (asprintf(&srcline, "%s:%u", file, line) < 0) {
> +	if (asprintf(&srcline, "%s:%u", basename(file), line) < 0) {
>  		free(file);
>  		goto out;
>  	}
> -- 
> 1.9.3

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/10] perf, tools: Support handling complete branch stacks as histograms
  2014-11-13 19:14   ` Arnaldo Carvalho de Melo
@ 2014-11-13 19:52     ` Andi Kleen
  2014-11-13 20:08       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2014-11-13 19:52 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Andi Kleen, jolsa, linux-kernel, namhyung, Andi Kleen

On Thu, Nov 13, 2014 at 04:14:17PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Nov 12, 2014 at 06:05:20PM -0800, Andi Kleen escreveu:
> > +static int remove_loops(struct branch_entry *l, int nr)
> > +{
> > +	int i, j, off;
> > +	unsigned char chash[CHASHSZ];
> > +
> > +	memset(chash, NO_ENTRY, sizeof(chash));
> > +
> > +	BUG_ON(nr >= 256);
> 
> What is wrong with return -1 and propagating the error, so that the user
> is informed if the data being processed is bogus, stop processing with a
> warning or continue processing if finding the next valid record is
> possible?

The error doesn't depend on the record. There is a check for the record
being < 127 in front of this. This is merely to catch that 
if someone increases PERF_MAX_BRANCH_DEPTH to below 255 they need
to increase the type of the hash table from u8.


-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/10] perf, tools: Support handling complete branch stacks as histograms
  2014-11-13 19:52     ` Andi Kleen
@ 2014-11-13 20:08       ` Arnaldo Carvalho de Melo
  2014-11-13 20:15         ` Andi Kleen
  0 siblings, 1 reply; 55+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-11-13 20:08 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jolsa, linux-kernel, namhyung, Andi Kleen

Em Thu, Nov 13, 2014 at 08:52:08PM +0100, Andi Kleen escreveu:
> On Thu, Nov 13, 2014 at 04:14:17PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Nov 12, 2014 at 06:05:20PM -0800, Andi Kleen escreveu:
> > > +static int remove_loops(struct branch_entry *l, int nr)
> > > +{
> > > +	int i, j, off;
> > > +	unsigned char chash[CHASHSZ];
> > > +
> > > +	memset(chash, NO_ENTRY, sizeof(chash));
> > > +
> > > +	BUG_ON(nr >= 256);
> > 
> > What is wrong with return -1 and propagating the error, so that the user
> > is informed if the data being processed is bogus, stop processing with a
> > warning or continue processing if finding the next valid record is
> > possible?
> 
> The error doesn't depend on the record. There is a check for the record
> being < 127 in front of this. This is merely to catch that 
> if someone increases PERF_MAX_BRANCH_DEPTH to below 255 they need
> to increase the type of the hash table from u8.

Ok, so this would be better as a BUILD_BUG_ON? 

Like:

#define CHASHSZ 127
#define CHASHBITS 7
#define NO_ENTRY 0xff

#define PERF_MAX_BRANCH_DEPTH 127

/* Remove loops. */
static int remove_loops(struct branch_entry *l, int nr)
{
	int i, j, off;
	unsigned char chash[CHASHSZ];

	memset(chash, NO_ENTRY, sizeof(chash));

	/* Change the type of the chash table, u8 is not enough now */
	BUILD_BUG_ON(PERF_MAX_BRANCH_DEPTH >= 256);
	for (i = 0; i < nr; i++) {
		int h = hash_64(l[i].from, CHASHBITS) % CHASHSZ;

-------------------------------------

- Arnaldo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/10] perf, tools: Support handling complete branch stacks as histograms
  2014-11-13 20:08       ` Arnaldo Carvalho de Melo
@ 2014-11-13 20:15         ` Andi Kleen
  2014-11-13 20:42           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2014-11-13 20:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Andi Kleen, jolsa, linux-kernel, namhyung, Andi Kleen

On Thu, Nov 13, 2014 at 05:08:33PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Nov 13, 2014 at 08:52:08PM +0100, Andi Kleen escreveu:
> > On Thu, Nov 13, 2014 at 04:14:17PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Wed, Nov 12, 2014 at 06:05:20PM -0800, Andi Kleen escreveu:
> > > > +static int remove_loops(struct branch_entry *l, int nr)
> > > > +{
> > > > +	int i, j, off;
> > > > +	unsigned char chash[CHASHSZ];
> > > > +
> > > > +	memset(chash, NO_ENTRY, sizeof(chash));
> > > > +
> > > > +	BUG_ON(nr >= 256);
> > > 
> > > What is wrong with return -1 and propagating the error, so that the user
> > > is informed if the data being processed is bogus, stop processing with a
> > > warning or continue processing if finding the next valid record is
> > > possible?
> > 
> > The error doesn't depend on the record. There is a check for the record
> > being < 127 in front of this. This is merely to catch that 
> > if someone increases PERF_MAX_BRANCH_DEPTH to below 255 they need
> > to increase the type of the hash table from u8.
> 
> Ok, so this would be better as a BUILD_BUG_ON? 

Yes that should work.

-Andi

> 
> Like:
> 
> #define CHASHSZ 127
> #define CHASHBITS 7
> #define NO_ENTRY 0xff
> 
> #define PERF_MAX_BRANCH_DEPTH 127
> 
> /* Remove loops. */
> static int remove_loops(struct branch_entry *l, int nr)
> {
> 	int i, j, off;
> 	unsigned char chash[CHASHSZ];
> 
> 	memset(chash, NO_ENTRY, sizeof(chash));
> 
> 	/* Change the type of the chash table, u8 is not enough now */
> 	BUILD_BUG_ON(PERF_MAX_BRANCH_DEPTH >= 256);
> 	for (i = 0; i < nr; i++) {
> 		int h = hash_64(l[i].from, CHASHBITS) % CHASHSZ;
> 
> -------------------------------------
> 
> - Arnaldo
> 

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/10] perf, tools: Support handling complete branch stacks as histograms
  2014-11-13 20:15         ` Andi Kleen
@ 2014-11-13 20:42           ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 55+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-11-13 20:42 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jolsa, linux-kernel, namhyung, Andi Kleen

Em Thu, Nov 13, 2014 at 09:15:24PM +0100, Andi Kleen escreveu:
> On Thu, Nov 13, 2014 at 05:08:33PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Thu, Nov 13, 2014 at 08:52:08PM +0100, Andi Kleen escreveu:
> > > On Thu, Nov 13, 2014 at 04:14:17PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > Em Wed, Nov 12, 2014 at 06:05:20PM -0800, Andi Kleen escreveu:
> > > > > +static int remove_loops(struct branch_entry *l, int nr)
> > > > > +{
> > > > > +	int i, j, off;
> > > > > +	unsigned char chash[CHASHSZ];
> > > > > +
> > > > > +	memset(chash, NO_ENTRY, sizeof(chash));
> > > > > +
> > > > > +	BUG_ON(nr >= 256);
> > > > 
> > > > What is wrong with return -1 and propagating the error, so that the user
> > > > is informed if the data being processed is bogus, stop processing with a
> > > > warning or continue processing if finding the next valid record is
> > > > possible?
> > > 
> > > The error doesn't depend on the record. There is a check for the record
> > > being < 127 in front of this. This is merely to catch that 
> > > if someone increases PERF_MAX_BRANCH_DEPTH to below 255 they need
> > > to increase the type of the hash table from u8.
> > 
> > Ok, so this would be better as a BUILD_BUG_ON? 
> 
> Yes that should work.

Ok, I'll do that, test by bumping it to 256, etc and merge the resulting
patch, thanks for checking.

- Arnaldo
 
> -Andi
> 
> > 
> > Like:
> > 
> > #define CHASHSZ 127
> > #define CHASHBITS 7
> > #define NO_ENTRY 0xff
> > 
> > #define PERF_MAX_BRANCH_DEPTH 127
> > 
> > /* Remove loops. */
> > static int remove_loops(struct branch_entry *l, int nr)
> > {
> > 	int i, j, off;
> > 	unsigned char chash[CHASHSZ];
> > 
> > 	memset(chash, NO_ENTRY, sizeof(chash));
> > 
> > 	/* Change the type of the chash table, u8 is not enough now */
> > 	BUILD_BUG_ON(PERF_MAX_BRANCH_DEPTH >= 256);
> > 	for (i = 0; i < nr; i++) {
> > 		int h = hash_64(l[i].from, CHASHBITS) % CHASHSZ;
> > 
> > -------------------------------------
> > 
> > - Arnaldo
> > 
> 
> -- 
> ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 08/10] perf, tools: Support source line numbers in annotate
  2014-11-13  2:05 ` [PATCH 08/10] perf, tools: Support source line numbers in annotate Andi Kleen
@ 2014-11-13 20:52   ` Arnaldo Carvalho de Melo
  2014-11-20  7:39   ` [tip:perf/core] perf annotate: " tip-bot for Andi Kleen
  1 sibling, 0 replies; 55+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-11-13 20:52 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jolsa, linux-kernel, namhyung, Andi Kleen

Em Wed, Nov 12, 2014 at 06:05:26PM -0800, Andi Kleen escreveu:
> From: Andi Kleen <ak@linux.intel.com>
> 
> With srcline key/sort'ing it's useful to have line numbers
> in the annotate window. This patch implements this.
> 
> Use objdump -l to request the line numbers and
> save them in the line structure. Then the browser
> displays them for source lines.
> 
> The line numbers are not displayed by default, but can be
> toggled on with 'k'

Ok, I tested this, cool stuff, I'll apply after some more testing,
issues:

it is right now left justified, I did a perf top -> annotate
cpu_idle_loop() and I see up to 3 digits __LINE__, would be nice to
figure out the max number of digits and right justify as the jump target
labels are.

Also the numbers sometimes change dramatically, that is because at those
places we see inlined stuff, so another toogle for the basename source
code would be as well nice.

Also it is on the same column as the jump target labels, which is kinda
OK because it makes it more compact, and one can figure out one from the
other because labels are colon suffixed.

Anyway, I think it is good to go now, we can improve these things later.

- Arnaldo

 
> There is one unfortunate problem with this setup. For
> lines not containing source and which are outside functions
> objdump -l reports line numbers off by a few: it always reports
> the first line number in the next function even for lines
> that are outside the function.
> I haven't found a nice way to detect/correct this. Probably objdump
> has to be fixed.
> See https://sourceware.org/bugzilla/show_bug.cgi?id=16433
> 
> The line numbers are still useful even with these problems,
> as most are correct and the ones which are not are nearby.
> 
> v2: Fix help text. Handle (discriminator...) output in objdump.
> Left align the line numbers.
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  tools/perf/ui/browsers/annotate.c | 13 ++++++++++++-
>  tools/perf/util/annotate.c        | 30 +++++++++++++++++++++++++-----
>  tools/perf/util/annotate.h        |  1 +
>  3 files changed, 38 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
> index f0697a3..1e0a2fd 100644
> --- a/tools/perf/ui/browsers/annotate.c
> +++ b/tools/perf/ui/browsers/annotate.c
> @@ -27,6 +27,7 @@ static struct annotate_browser_opt {
>  	bool hide_src_code,
>  	     use_offset,
>  	     jump_arrows,
> +	     show_linenr,
>  	     show_nr_jumps;
>  } annotate_browser__opts = {
>  	.use_offset	= true,
> @@ -128,7 +129,11 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int
>  	if (!*dl->line)
>  		slsmg_write_nstring(" ", width - pcnt_width);
>  	else if (dl->offset == -1) {
> -		printed = scnprintf(bf, sizeof(bf), "%*s  ",
> +		if (dl->line_nr && annotate_browser__opts.show_linenr)
> +			printed = scnprintf(bf, sizeof(bf), "%-*d ",
> +					ab->addr_width + 1, dl->line_nr);
> +		else
> +			printed = scnprintf(bf, sizeof(bf), "%*s  ",
>  				    ab->addr_width, " ");
>  		slsmg_write_nstring(bf, printed);
>  		slsmg_write_nstring(dl->line, width - printed - pcnt_width + 1);
> @@ -733,6 +738,7 @@ static int annotate_browser__run(struct annotate_browser *browser,
>  		"o             Toggle disassembler output/simplified view\n"
>  		"s             Toggle source code view\n"
>  		"/             Search string\n"
> +		"k             Toggle line numbers\n"
>  		"r             Run available scripts\n"
>  		"?             Search string backwards\n");
>  			continue;
> @@ -741,6 +747,10 @@ static int annotate_browser__run(struct annotate_browser *browser,
>  				script_browse(NULL);
>  				continue;
>  			}
> +		case 'k':
> +			annotate_browser__opts.show_linenr =
> +				!annotate_browser__opts.show_linenr;
> +			break;
>  		case 'H':
>  			nd = browser->curr_hot;
>  			break;
> @@ -984,6 +994,7 @@ static struct annotate_config {
>  } annotate__configs[] = {
>  	ANNOTATE_CFG(hide_src_code),
>  	ANNOTATE_CFG(jump_arrows),
> +	ANNOTATE_CFG(show_linenr),
>  	ANNOTATE_CFG(show_nr_jumps),
>  	ANNOTATE_CFG(use_offset),
>  };
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index 7dabde1..d821223 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -17,11 +17,13 @@
>  #include "debug.h"
>  #include "annotate.h"
>  #include "evsel.h"
> +#include <regex.h>
>  #include <pthread.h>
>  #include <linux/bitops.h>
>  
>  const char 	*disassembler_style;
>  const char	*objdump_path;
> +static regex_t	 file_lineno;
>  
>  static struct ins *ins__find(const char *name);
>  static int disasm_line__parse(char *line, char **namep, char **rawp);
> @@ -570,13 +572,15 @@ out_free_name:
>  	return -1;
>  }
>  
> -static struct disasm_line *disasm_line__new(s64 offset, char *line, size_t privsize)
> +static struct disasm_line *disasm_line__new(s64 offset, char *line,
> +					size_t privsize, int line_nr)
>  {
>  	struct disasm_line *dl = zalloc(sizeof(*dl) + privsize);
>  
>  	if (dl != NULL) {
>  		dl->offset = offset;
>  		dl->line = strdup(line);
> +		dl->line_nr = line_nr;
>  		if (dl->line == NULL)
>  			goto out_delete;
>  
> @@ -788,13 +792,15 @@ static int disasm_line__print(struct disasm_line *dl, struct symbol *sym, u64 st
>   * The ops.raw part will be parsed further according to type of the instruction.
>   */
>  static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
> -				      FILE *file, size_t privsize)
> +				      FILE *file, size_t privsize,
> +				      int *line_nr)
>  {
>  	struct annotation *notes = symbol__annotation(sym);
>  	struct disasm_line *dl;
>  	char *line = NULL, *parsed_line, *tmp, *tmp2, *c;
>  	size_t line_len;
>  	s64 line_ip, offset = -1;
> +	regmatch_t match[2];
>  
>  	if (getline(&line, &line_len, file) < 0)
>  		return -1;
> @@ -812,6 +818,12 @@ static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
>  	line_ip = -1;
>  	parsed_line = line;
>  
> +	/* /filename:linenr ? Save line number and ignore. */
> +	if (regexec(&file_lineno, line, 2, match, 0) == 0) {
> +		*line_nr = atoi(line + match[1].rm_so);
> +		return 0;
> +	}
> +
>  	/*
>  	 * Strip leading spaces:
>  	 */
> @@ -842,8 +854,9 @@ static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
>  			parsed_line = tmp2 + 1;
>  	}
>  
> -	dl = disasm_line__new(offset, parsed_line, privsize);
> +	dl = disasm_line__new(offset, parsed_line, privsize, *line_nr);
>  	free(line);
> +	(*line_nr)++;
>  
>  	if (dl == NULL)
>  		return -1;
> @@ -869,6 +882,11 @@ static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
>  	return 0;
>  }
>  
> +static __attribute__((constructor)) void symbol__init_regexpr(void)
> +{
> +	regcomp(&file_lineno, "^/[^:]+:([0-9]+)", REG_EXTENDED);
> +}
> +
>  static void delete_last_nop(struct symbol *sym)
>  {
>  	struct annotation *notes = symbol__annotation(sym);
> @@ -904,6 +922,7 @@ int symbol__annotate(struct symbol *sym, struct map *map, size_t privsize)
>  	char symfs_filename[PATH_MAX];
>  	struct kcore_extract kce;
>  	bool delete_extract = false;
> +	int lineno = 0;
>  
>  	if (filename)
>  		symbol__join_symfs(symfs_filename, filename);
> @@ -982,7 +1001,7 @@ fallback:
>  	snprintf(command, sizeof(command),
>  		 "%s %s%s --start-address=0x%016" PRIx64
>  		 " --stop-address=0x%016" PRIx64
> -		 " -d %s %s -C %s 2>/dev/null|grep -v %s|expand",
> +		 " -l -d %s %s -C %s 2>/dev/null|grep -v %s|expand",
>  		 objdump_path ? objdump_path : "objdump",
>  		 disassembler_style ? "-M " : "",
>  		 disassembler_style ? disassembler_style : "",
> @@ -999,7 +1018,8 @@ fallback:
>  		goto out_free_filename;
>  
>  	while (!feof(file))
> -		if (symbol__parse_objdump_line(sym, map, file, privsize) < 0)
> +		if (symbol__parse_objdump_line(sym, map, file, privsize,
> +			    &lineno) < 0)
>  			break;
>  
>  	/*
> diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
> index 112d6e2..0784a94 100644
> --- a/tools/perf/util/annotate.h
> +++ b/tools/perf/util/annotate.h
> @@ -58,6 +58,7 @@ struct disasm_line {
>  	char		    *line;
>  	char		    *name;
>  	struct ins	    *ins;
> +	int		    line_nr;
>  	struct ins_operands ops;
>  };
>  
> -- 
> 1.9.3

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 10/10] tools, perf: Add asprintf replacement
  2014-11-13  2:05 ` [PATCH 10/10] tools, perf: Add asprintf replacement Andi Kleen
@ 2014-11-13 20:53   ` Arnaldo Carvalho de Melo
  2014-11-13 21:14     ` Andi Kleen
  0 siblings, 1 reply; 55+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-11-13 20:53 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jolsa, linux-kernel, namhyung, Andi Kleen

Em Wed, Nov 12, 2014 at 06:05:28PM -0800, Andi Kleen escreveu:
> From: Andi Kleen <ak@linux.intel.com>
> 
> asprintf corrupts memory on some older glibc versions.
> Provide a replacement. This fixes various segfaults
> with --branch-history on older Fedoras.
> 
> v2: Remove bogus hunk.
>     Support arbitrary size (Geert Uytterhoeven)
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  tools/perf/Makefile.perf    |  1 +
>  tools/perf/builtin-report.c |  1 +
>  tools/perf/util/asprintf.c  | 34 ++++++++++++++++++++++++++++++++++
>  3 files changed, 36 insertions(+)
>  create mode 100644 tools/perf/util/asprintf.c
> 
> diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
> index aecf61dc..9db4ff1 100644
> --- a/tools/perf/Makefile.perf
> +++ b/tools/perf/Makefile.perf
> @@ -395,6 +395,7 @@ LIB_OBJS += $(OUTPUT)util/vdso.o
>  LIB_OBJS += $(OUTPUT)util/stat.o
>  LIB_OBJS += $(OUTPUT)util/record.o
>  LIB_OBJS += $(OUTPUT)util/srcline.o
> +LIB_OBJS += $(OUTPUT)util/asprintf.o
>  LIB_OBJS += $(OUTPUT)util/data.o
>  LIB_OBJS += $(OUTPUT)util/tsc.o
>  LIB_OBJS += $(OUTPUT)util/cloexec.o
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index fb272ff..493f011 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -762,6 +762,7 @@ repeat:
>  	}
>  	if (branch_call_mode) {
>  		callchain_param.branch_callstack = 1;
> +		callchain_param.key = CCKEY_ADDRESS;

unrelated hunk?

- Arnaldo

>  		symbol_conf.use_callchain = true;
>  		callchain_register_param(&callchain_param);
>  		if (sort_order == NULL)
> diff --git a/tools/perf/util/asprintf.c b/tools/perf/util/asprintf.c
> new file mode 100644
> index 0000000..6177ee8
> --- /dev/null
> +++ b/tools/perf/util/asprintf.c
> @@ -0,0 +1,34 @@
> +/* Replacement for asprintf as it's buggy in older glibc versions */
> +#include <stdio.h>
> +#include <stdarg.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +int vasprintf(char **str, const char *fmt, va_list ap)
> +{
> +	va_list ao;
> +	char buf[1024];
> +	int len;
> +
> +	va_copy(ao, ap);
> +	len = vsnprintf(buf, sizeof(buf), fmt, ap);
> +	*str = malloc(len + 1);
> +	if (!*str)
> +		return -1;
> +	if ((size_t)len < sizeof(buf)) {
> +		strcpy(*str, buf);
> +		return len;
> +	}
> +	return vsnprintf(*str, len + 1, fmt, ao);
> +}
> +
> +int asprintf(char **str, const char *fmt, ...)
> +{
> +	va_list ap;
> +	int ret;
> +
> +	va_start(ap, fmt);
> +	ret = vasprintf(str, fmt, ap);
> +	va_end(ap);
> +	return ret;
> +}
> -- 
> 1.9.3

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 10/10] tools, perf: Add asprintf replacement
  2014-11-13 20:53   ` Arnaldo Carvalho de Melo
@ 2014-11-13 21:14     ` Andi Kleen
  0 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2014-11-13 21:14 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Andi Kleen, jolsa, linux-kernel, namhyung

> > diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> > index fb272ff..493f011 100644
> > --- a/tools/perf/builtin-report.c
> > +++ b/tools/perf/builtin-report.c
> > @@ -762,6 +762,7 @@ repeat:
> >  	}
> >  	if (branch_call_mode) {
> >  		callchain_param.branch_callstack = 1;
> > +		callchain_param.key = CCKEY_ADDRESS;
> 
> unrelated hunk?

Yes. Please remove.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Implement lbr-as-callgraph v10
  2014-11-13  2:05 Implement lbr-as-callgraph v10 Andi Kleen
                   ` (9 preceding siblings ...)
  2014-11-13  2:05 ` [PATCH 10/10] tools, perf: Add asprintf replacement Andi Kleen
@ 2014-11-17 21:34 ` Arnaldo Carvalho de Melo
  2014-11-18  1:56   ` Andi Kleen
  2014-11-18 10:44   ` Jiri Olsa
  10 siblings, 2 replies; 55+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-11-17 21:34 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jiri Olsa, linux-kernel, Namhyung Kim, Liang, Kan

Em Wed, Nov 12, 2014 at 06:05:18PM -0800, Andi Kleen escreveu:
> [Reworks to address all the review feedback. Rebased to latest tree]
> [Just a repost after a rebase]
> [Even more review feedback and some bugs addressed.]
> [Only port to changes in perf/core. No other changes.]
> [Rebase to latest perf/core]
> [Another rebase. No changes]
> 
> This patchkit implements lbr-as-callgraphs in per freport,
> as an alternative way to present LBR information.

Ok, I have this in my perf/core branch, but I need to test it further,
as I couldn't get the output that appears on some of the changelogs.

If you could take a look at

git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux perf/core

To see if I made some mistake, that would be of help,

Thanks

- Arnaldo
 
> Current perf report does a histogram over the branch edges,
> which is useful to look at basic blocks, but doesn't tell
> you anything about the larger control flow behaviour.
> 
> This patchkit adds a new option --branch-history that
> adds the branch paths to the callgraph history instead.
> 
> This allows to reason about individual branch paths leading
> to specific samples.
> 
> Also available at
> git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/lbr-callgraph8
> 
> v2:
> - rebased on perf/core
> - fix various issues
> - rename the option to --branch-history
> - various fixes to display the information more concise
> v3:
> - White space changes
> - Consolidate some patches
> - Update some descriptions
> v4:
> - Fix various display problems
> - Unknown srcline is now printed as symbol+offset
> - Refactor some code to address review feedback
> - Merge with latest tip
> - Fix missing srcline display in stdio hist output.
> v5:
> - Rename functions
> - Fix gtk build problem
> - Fix crash without -g
> - Improve error messages
> - Improve srcline display in various ways
> v6:
> - Port to latest perf/core
> v7:
> - Really port to latest perf/core
> v8:
> - Rebased on 3.16-rc1
> v9:
> - Rebase on 3.17-rc* tip/perf/core
> v10:
> - Rebase to latest tree.
> - Address review feedback (except where commented). See
> the individual patches for changelog. I dropped the -v 
> option support.
> 
> Example output:
> 
>     % perf record -b -g ./tsrc/tcall
>     [ perf record: Woken up 1 times to write data ]
>     [ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
>     % perf report --branch-history
>     ...
>         54.91%  tcall.c:6  [.] f2                      tcall
>                 |
>                 |--65.53%-- f2 tcall.c:5
>                 |          |
>                 |          |--70.83%-- f1 tcall.c:11
>                 |          |          f1 tcall.c:10
>                 |          |          main tcall.c:18
>                 |          |          main tcall.c:18
>                 |          |          main tcall.c:17
>                 |          |          main tcall.c:17
>                 |          |          f1 tcall.c:13
>                 |          |          f1 tcall.c:13
>                 |          |          f2 tcall.c:7
>                 |          |          f2 tcall.c:5
>                 |          |          f1 tcall.c:12
>                 |          |          f1 tcall.c:12
>                 |          |          f2 tcall.c:7
>                 |          |          f2 tcall.c:5
>                 |          |          f1 tcall.c:11
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Implement lbr-as-callgraph v10
  2014-11-17 21:34 ` Implement lbr-as-callgraph v10 Arnaldo Carvalho de Melo
@ 2014-11-18  1:56   ` Andi Kleen
  2014-11-18 10:44   ` Jiri Olsa
  1 sibling, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2014-11-18  1:56 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Andi Kleen, Jiri Olsa, linux-kernel, Namhyung Kim, Liang, Kan

On Mon, Nov 17, 2014 at 06:34:57PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Nov 12, 2014 at 06:05:18PM -0800, Andi Kleen escreveu:
> > [Reworks to address all the review feedback. Rebased to latest tree]
> > [Just a repost after a rebase]
> > [Even more review feedback and some bugs addressed.]
> > [Only port to changes in perf/core. No other changes.]
> > [Rebase to latest perf/core]
> > [Another rebase. No changes]
> > 
> > This patchkit implements lbr-as-callgraphs in per freport,
> > as an alternative way to present LBR information.
> 
> Ok, I have this in my perf/core branch, but I need to test it further,
> as I couldn't get the output that appears on some of the changelogs.
> 
> If you could take a look at
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux perf/core
> 
> To see if I made some mistake, that would be of help,

Sorry the CCKEY_ADDRESS hunk was actually needed. With that I get
the srclines as expected.

Also the BUILD_BUG_ON change breaks the compilation with optimization
off (DEBUG=1). I'll send patches for both.

Also please consider the adding the symbol offset fix patch
I sent Friday.

-Andi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Implement lbr-as-callgraph v10
  2014-11-17 21:34 ` Implement lbr-as-callgraph v10 Arnaldo Carvalho de Melo
  2014-11-18  1:56   ` Andi Kleen
@ 2014-11-18 10:44   ` Jiri Olsa
  2014-11-18 11:00     ` Jiri Olsa
  1 sibling, 1 reply; 55+ messages in thread
From: Jiri Olsa @ 2014-11-18 10:44 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Andi Kleen, Jiri Olsa, linux-kernel, Namhyung Kim, Liang, Kan

On Mon, Nov 17, 2014 at 06:34:57PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Nov 12, 2014 at 06:05:18PM -0800, Andi Kleen escreveu:
> > [Reworks to address all the review feedback. Rebased to latest tree]
> > [Just a repost after a rebase]
> > [Even more review feedback and some bugs addressed.]
> > [Only port to changes in perf/core. No other changes.]
> > [Rebase to latest perf/core]
> > [Another rebase. No changes]
> > 
> > This patchkit implements lbr-as-callgraphs in per freport,
> > as an alternative way to present LBR information.
> 
> Ok, I have this in my perf/core branch, but I need to test it further,
> as I couldn't get the output that appears on some of the changelogs.
> 
> If you could take a look at
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux perf/core
> 
> To see if I made some mistake, that would be of help,
> 

I'm getting compile error for DEBUG=1

jirka


[jolsa@krava perf]$ make DEBUG=1 JOBS=1
  BUILD:   Doing 'make -j1' parallel build
  CC       util/machine.o
In file included from /home/jolsa/kernel.org/linux-perf/tools/perf/util/include/../../../../include/uapi/linux/swab.h:5:0,
                 from /home/jolsa/kernel.org/linux-perf/tools/perf/util/include/asm/byteorder.h:2,
                 from /home/jolsa/kernel.org/linux-perf/include/uapi/linux/perf_event.h:19,
                 from util/../perf.h:7,
                 from util/callchain.h:4,
                 from util/machine.c:1:
util/machine.c: In function ‘remove_loops’:
/home/jolsa/kernel.org/linux-perf/tools/include/linux/compiler.h:64:20: error: call to ‘__compiletime_assert_1450’ declared with attribute error: Please upgrade the type of the hash table from unsigned char
    prefix ## suffix();    \
                    ^
/home/jolsa/kernel.org/linux-perf/tools/include/linux/compiler.h:69:2: note: in expansion of macro ‘__compiletime_assert’
  __compiletime_assert(condition, msg, prefix, suffix)
  ^
/home/jolsa/kernel.org/linux-perf/tools/include/linux/compiler.h:81:2: note: in expansion of macro ‘_compiletime_assert’
  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
  ^
/home/jolsa/kernel.org/linux-perf/tools/include/linux/bug.h:13:37: note: in expansion of macro ‘compiletime_assert’
 #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                     ^
util/machine.c:1449:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
  BUILD_BUG_ON_MSG(PERF_MAX_BRANCH_DEPTH > 127,
  ^
make[1]: *** [util/machine.o] Error 1
make: *** [all] Error 2

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Implement lbr-as-callgraph v10
  2014-11-18 10:44   ` Jiri Olsa
@ 2014-11-18 11:00     ` Jiri Olsa
  2014-11-18 13:37       ` Arnaldo Carvalho de Melo
  2014-11-19  6:21       ` Namhyung Kim
  0 siblings, 2 replies; 55+ messages in thread
From: Jiri Olsa @ 2014-11-18 11:00 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Andi Kleen, Jiri Olsa, linux-kernel, Namhyung Kim, Liang, Kan

On Tue, Nov 18, 2014 at 11:44:16AM +0100, Jiri Olsa wrote:
> On Mon, Nov 17, 2014 at 06:34:57PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Nov 12, 2014 at 06:05:18PM -0800, Andi Kleen escreveu:
> > > [Reworks to address all the review feedback. Rebased to latest tree]
> > > [Just a repost after a rebase]
> > > [Even more review feedback and some bugs addressed.]
> > > [Only port to changes in perf/core. No other changes.]
> > > [Rebase to latest perf/core]
> > > [Another rebase. No changes]
> > > 
> > > This patchkit implements lbr-as-callgraphs in per freport,
> > > as an alternative way to present LBR information.
> > 
> > Ok, I have this in my perf/core branch, but I need to test it further,
> > as I couldn't get the output that appears on some of the changelogs.
> > 
> > If you could take a look at
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux perf/core
> > 
> > To see if I made some mistake, that would be of help,
> > 
> 
> I'm getting compile error for DEBUG=1

looks like the BUILD_BUG_ON_MSG does not work correctly with -O6 we use

jirka

> 
> jirka
> 
> 
> [jolsa@krava perf]$ make DEBUG=1 JOBS=1
>   BUILD:   Doing 'make -j1' parallel build
>   CC       util/machine.o
> In file included from /home/jolsa/kernel.org/linux-perf/tools/perf/util/include/../../../../include/uapi/linux/swab.h:5:0,
>                  from /home/jolsa/kernel.org/linux-perf/tools/perf/util/include/asm/byteorder.h:2,
>                  from /home/jolsa/kernel.org/linux-perf/include/uapi/linux/perf_event.h:19,
>                  from util/../perf.h:7,
>                  from util/callchain.h:4,
>                  from util/machine.c:1:
> util/machine.c: In function ‘remove_loops’:
> /home/jolsa/kernel.org/linux-perf/tools/include/linux/compiler.h:64:20: error: call to ‘__compiletime_assert_1450’ declared with attribute error: Please upgrade the type of the hash table from unsigned char
>     prefix ## suffix();    \
>                     ^
> /home/jolsa/kernel.org/linux-perf/tools/include/linux/compiler.h:69:2: note: in expansion of macro ‘__compiletime_assert’
>   __compiletime_assert(condition, msg, prefix, suffix)
>   ^
> /home/jolsa/kernel.org/linux-perf/tools/include/linux/compiler.h:81:2: note: in expansion of macro ‘_compiletime_assert’
>   _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
>   ^
> /home/jolsa/kernel.org/linux-perf/tools/include/linux/bug.h:13:37: note: in expansion of macro ‘compiletime_assert’
>  #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
>                                      ^
> util/machine.c:1449:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
>   BUILD_BUG_ON_MSG(PERF_MAX_BRANCH_DEPTH > 127,
>   ^
> make[1]: *** [util/machine.o] Error 1
> make: *** [all] Error 2

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Implement lbr-as-callgraph v10
  2014-11-18 11:00     ` Jiri Olsa
@ 2014-11-18 13:37       ` Arnaldo Carvalho de Melo
  2014-11-19 15:31         ` Andi Kleen
  2014-11-19  6:21       ` Namhyung Kim
  1 sibling, 1 reply; 55+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-11-18 13:37 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Andi Kleen, Jiri Olsa, linux-kernel, Namhyung Kim, Liang, Kan

Em Tue, Nov 18, 2014 at 12:00:07PM +0100, Jiri Olsa escreveu:
> On Tue, Nov 18, 2014 at 11:44:16AM +0100, Jiri Olsa wrote:
> > On Mon, Nov 17, 2014 at 06:34:57PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Wed, Nov 12, 2014 at 06:05:18PM -0800, Andi Kleen escreveu:
> > > > [Reworks to address all the review feedback. Rebased to latest tree]
> > > > [Just a repost after a rebase]
> > > > [Even more review feedback and some bugs addressed.]
> > > > [Only port to changes in perf/core. No other changes.]
> > > > [Rebase to latest perf/core]
> > > > [Another rebase. No changes]

> > > > This patchkit implements lbr-as-callgraphs in per freport,
> > > > as an alternative way to present LBR information.

> > > Ok, I have this in my perf/core branch, but I need to test it further,
> > > as I couldn't get the output that appears on some of the changelogs.

> > > If you could take a look at

> > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux perf/core

> > > To see if I made some mistake, that would be of help,

> > I'm getting compile error for DEBUG=1

> looks like the BUILD_BUG_ON_MSG does not work correctly with -O6 we use

Yeah, I'll probably just take the brute force approach Andi originally
used and switch to BUILD_BUG_ON_MSG once we get it working in the tools/
environment, i.e. using -O6, etc.

- Arnaldo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Implement lbr-as-callgraph v10
  2014-11-18 11:00     ` Jiri Olsa
  2014-11-18 13:37       ` Arnaldo Carvalho de Melo
@ 2014-11-19  6:21       ` Namhyung Kim
  2014-11-19  9:23         ` Jiri Olsa
  1 sibling, 1 reply; 55+ messages in thread
From: Namhyung Kim @ 2014-11-19  6:21 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Andi Kleen, Jiri Olsa, linux-kernel,
	Liang, Kan

Hi Jiri,

On Tue, 18 Nov 2014 12:00:07 +0100, Jiri Olsa wrote:
> On Tue, Nov 18, 2014 at 11:44:16AM +0100, Jiri Olsa wrote:
>> On Mon, Nov 17, 2014 at 06:34:57PM -0300, Arnaldo Carvalho de Melo wrote:
>> > Em Wed, Nov 12, 2014 at 06:05:18PM -0800, Andi Kleen escreveu:
>> > > [Reworks to address all the review feedback. Rebased to latest tree]
>> > > [Just a repost after a rebase]
>> > > [Even more review feedback and some bugs addressed.]
>> > > [Only port to changes in perf/core. No other changes.]
>> > > [Rebase to latest perf/core]
>> > > [Another rebase. No changes]
>> > > 
>> > > This patchkit implements lbr-as-callgraphs in per freport,
>> > > as an alternative way to present LBR information.
>> > 
>> > Ok, I have this in my perf/core branch, but I need to test it further,
>> > as I couldn't get the output that appears on some of the changelogs.
>> > 
>> > If you could take a look at
>> > 
>> > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux perf/core
>> > 
>> > To see if I made some mistake, that would be of help,
>> > 
>> 
>> I'm getting compile error for DEBUG=1
>
> looks like the BUILD_BUG_ON_MSG does not work correctly with -O6 we use

Nope, DEBUG=1 prevents -O6 from applied.  we have below in config/Makefile:

  ifndef DEBUG
    DEBUG := 0
  endif

  ifeq ($(DEBUG),0)
    CFLAGS += -O6
  endif


Looking at include/linux/bug.h, BUILD_BUG_ON is guarded under
__OPTIMIZE__ but BUILD_BUG_ON_MSG is not.  What about changing it to
BUILD_BUG_ON then?

Thanks,
Namhyung



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Implement lbr-as-callgraph v10
  2014-11-19  6:21       ` Namhyung Kim
@ 2014-11-19  9:23         ` Jiri Olsa
  2014-11-19 10:54           ` Jiri Olsa
  0 siblings, 1 reply; 55+ messages in thread
From: Jiri Olsa @ 2014-11-19  9:23 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Andi Kleen, Jiri Olsa, linux-kernel,
	Liang, Kan

On Wed, Nov 19, 2014 at 03:21:39PM +0900, Namhyung Kim wrote:
> Hi Jiri,
> 
> On Tue, 18 Nov 2014 12:00:07 +0100, Jiri Olsa wrote:
> > On Tue, Nov 18, 2014 at 11:44:16AM +0100, Jiri Olsa wrote:
> >> On Mon, Nov 17, 2014 at 06:34:57PM -0300, Arnaldo Carvalho de Melo wrote:
> >> > Em Wed, Nov 12, 2014 at 06:05:18PM -0800, Andi Kleen escreveu:
> >> > > [Reworks to address all the review feedback. Rebased to latest tree]
> >> > > [Just a repost after a rebase]
> >> > > [Even more review feedback and some bugs addressed.]
> >> > > [Only port to changes in perf/core. No other changes.]
> >> > > [Rebase to latest perf/core]
> >> > > [Another rebase. No changes]
> >> > > 
> >> > > This patchkit implements lbr-as-callgraphs in per freport,
> >> > > as an alternative way to present LBR information.
> >> > 
> >> > Ok, I have this in my perf/core branch, but I need to test it further,
> >> > as I couldn't get the output that appears on some of the changelogs.
> >> > 
> >> > If you could take a look at
> >> > 
> >> > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux perf/core
> >> > 
> >> > To see if I made some mistake, that would be of help,
> >> > 
> >> 
> >> I'm getting compile error for DEBUG=1
> >
> > looks like the BUILD_BUG_ON_MSG does not work correctly with -O6 we use
> 
> Nope, DEBUG=1 prevents -O6 from applied.  we have below in config/Makefile:

right.. the other way around ;))

> 
>   ifndef DEBUG
>     DEBUG := 0
>   endif
> 
>   ifeq ($(DEBUG),0)
>     CFLAGS += -O6
>   endif
> 
> 
> Looking at include/linux/bug.h, BUILD_BUG_ON is guarded under
> __OPTIMIZE__ but BUILD_BUG_ON_MSG is not.  What about changing it to
> BUILD_BUG_ON then?

sounds gut

jirka

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Implement lbr-as-callgraph v10
  2014-11-19  9:23         ` Jiri Olsa
@ 2014-11-19 10:54           ` Jiri Olsa
  2014-11-19 14:10             ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 55+ messages in thread
From: Jiri Olsa @ 2014-11-19 10:54 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Andi Kleen, Jiri Olsa, linux-kernel,
	Liang, Kan

On Wed, Nov 19, 2014 at 10:23:23AM +0100, Jiri Olsa wrote:
> On Wed, Nov 19, 2014 at 03:21:39PM +0900, Namhyung Kim wrote:
> > Hi Jiri,
> > 
> > On Tue, 18 Nov 2014 12:00:07 +0100, Jiri Olsa wrote:
> > > On Tue, Nov 18, 2014 at 11:44:16AM +0100, Jiri Olsa wrote:
> > >> On Mon, Nov 17, 2014 at 06:34:57PM -0300, Arnaldo Carvalho de Melo wrote:
> > >> > Em Wed, Nov 12, 2014 at 06:05:18PM -0800, Andi Kleen escreveu:
> > >> > > [Reworks to address all the review feedback. Rebased to latest tree]
> > >> > > [Just a repost after a rebase]
> > >> > > [Even more review feedback and some bugs addressed.]
> > >> > > [Only port to changes in perf/core. No other changes.]
> > >> > > [Rebase to latest perf/core]
> > >> > > [Another rebase. No changes]
> > >> > > 
> > >> > > This patchkit implements lbr-as-callgraphs in per freport,
> > >> > > as an alternative way to present LBR information.
> > >> > 
> > >> > Ok, I have this in my perf/core branch, but I need to test it further,
> > >> > as I couldn't get the output that appears on some of the changelogs.
> > >> > 
> > >> > If you could take a look at
> > >> > 
> > >> > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux perf/core
> > >> > 
> > >> > To see if I made some mistake, that would be of help,
> > >> > 
> > >> 
> > >> I'm getting compile error for DEBUG=1
> > >
> > > looks like the BUILD_BUG_ON_MSG does not work correctly with -O6 we use
> > 
> > Nope, DEBUG=1 prevents -O6 from applied.  we have below in config/Makefile:
> 
> right.. the other way around ;))
> 
> > 
> >   ifndef DEBUG
> >     DEBUG := 0
> >   endif
> > 
> >   ifeq ($(DEBUG),0)
> >     CFLAGS += -O6
> >   endif
> > 
> > 
> > Looking at include/linux/bug.h, BUILD_BUG_ON is guarded under
> > __OPTIMIZE__ but BUILD_BUG_ON_MSG is not.  What about changing it to
> > BUILD_BUG_ON then?
> 
> sounds gut

hum, acme/perf/core changed and so has the compile error ;-)
we dont overload the <linux/bug.h>, so the kernel one got
included, which is wrong.. attached patch fixes that

jirka


[jolsa@krava perf]$ make JOBS=1
  BUILD:   Doing 'make -j1' parallel build
  CC       perf.o
  CC       builtin-annotate.o
In file included from util/machine.h:5:0,
                 from util/session.h:7,
                 from builtin-annotate.c:29:
/home/jolsa/kernel.org/linux-perf/include/linux/bug.h: In function ‘report_bug’:
/home/jolsa/kernel.org/linux-perf/include/linux/bug.h:105:59: error: unused parameter ‘bug_addr’ [-Werror=unused-parameter]
 static inline enum bug_trap_type report_bug(unsigned long bug_addr,
                                                           ^
/home/jolsa/kernel.org/linux-perf/include/linux/bug.h:106:26: error: unused parameter ‘regs’ [-Werror=unused-parameter]
          struct pt_regs *regs)
                          ^
cc1: all warnings being treated as errors
make[1]: *** [builtin-annotate.o] Error 1
make: *** [all] Error 2


---
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 180d16c19a2b..e8b7779a0a3f 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -2,7 +2,6 @@
 #define __PERF_MACHINE_H
 
 #include <sys/types.h>
-#include <linux/bug.h>
 #include <linux/rbtree.h>
 #include "map.h"
 #include "dso.h"

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: Implement lbr-as-callgraph v10
  2014-11-19 10:54           ` Jiri Olsa
@ 2014-11-19 14:10             ` Arnaldo Carvalho de Melo
  2014-11-19 16:04               ` Arnaldo Carvalho de Melo
  2014-11-19 21:50               ` Andi Kleen
  0 siblings, 2 replies; 55+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-11-19 14:10 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Namhyung Kim, Andi Kleen, Jiri Olsa, linux-kernel, Liang, Kan

Em Wed, Nov 19, 2014 at 11:54:50AM +0100, Jiri Olsa escreveu:
> > > BUILD_BUG_ON then?
> > 
> > sounds gut
> 
> hum, acme/perf/core changed and so has the compile error ;-)
> we dont overload the <linux/bug.h>, so the kernel one got
> included, which is wrong.. attached patch fixes that

I had fixed this but not force pushed out, sorry.

Now I mistakenly tried running:

perf report --stdio --no-children --branch-history

on a file that has no BRANCH_STACK, i.e. a perf.data file on a wrong
directory since I'm comparing the output of --stdio, --tui and --gtk,
since it looks --gtk is wrong, still unsure about what the problem is in
that case, but stumbled on:

--branch-history works with this perf.data:

[acme@zoo ~]$ perf evlist --verbose
cycles: sample_freq=4000, size: 96, sample_type:
IP|TID|TIME|CALLCHAIN|PERIOD|BRANCH_STACK, disabled: 1, inherit: 1,
mmap: 1, mmap2: 1, comm: 1, comm_exec: 1, freq: 1, enable_on_exec: 1,
sample_id_all: 1, exclude_guest: 1, branch_sample_type: 8
[acme@zoo ~]$


But not with this one:

[root@zoo ~]# perf evlist --verbose
cycles: sample_freq=4000, size: 96, sample_type:
IP|TID|TIME|CALLCHAIN|CPU|PERIOD, disabled: 1, inherit: 1, mmap: 1,
mmap2: 1, comm: 1, comm_exec: 1, freq: 1, sample_id_all: 1,
exclude_guest: 1
[root@zoo ~]# 

Where I get tons of:

[root@zoo ~]# perf report --stdio --no-children --branch-history
# To display the perf.data header info, please use --header/--header-only options.
#
BFD: Dwarf Error: Offset (994443) greater than or equal to .debug_str size (156124).
BFD: Dwarf Error: Offset (1162200) greater than or equal to .debug_str size (156124).
BFD: Dwarf Error: Offset (1929446660) greater than or equal to .debug_str size (1940537).
BFD: Dwarf Error: Offset (465055236) greater than or equal to .debug_str size (156124).
BFD: Dwarf Error: Could not find abbrev number 736412.
BFD: Dwarf Error: Offset (470186753) greater than or equal to .debug_str size (1940537).
<SNIP>
BFD: Dwarf Error: Offset (3525378816) greater than or equal to .debug_str size (156124).
BFD: Dwarf Error: Offset (1483770) greater than or equal to .debug_str size (156124).
BFD: Dwarf Error: Could not find abbrev number 127.
^C^C^C^C^C^C
^C^C^C^Z

and finally a:


Segmentation fault (core dumped)
[root@zoo ~]#

Rebuilding with DEBUG=1 I get this backtrace:

Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `perf report --stdio --no-children --branch-history'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000053ab10 in read_unsigned_leb128 ()
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-9.fc20.x86_64 elfutils-libelf-0.160-1.fc20.x86_64 elfutils-libs-0.160-1.fc20.x86_64 libunwind-1.1-3.fc20.x86_64 nss-softokn-freebl-3.17.2-1.fc20.x86_64 numactl-libs-2.0.9-2.fc20.x86_64 slang-2.2.4-11.fc20.x86_64 xz-libs-5.1.2-12alpha.fc20.x86_64
(gdb) bt
#0  0x000000000053ab10 in read_unsigned_leb128 ()
#1  0x00000000005485d7 in find_abstract_instance_name.isra ()
#2  0x0000000000548756 in find_abstract_instance_name.isra ()
#3  0x0000000000548756 in find_abstract_instance_name.isra ()
#4  0x0000000000548d31 in scan_unit_for_symbols ()
#5  0x00000000005496c9 in comp_unit_find_nearest_line ()
#6  0x000000000054a6ca in find_line.part ()
#7  0x00000000005606ca in _bfd_elf_find_nearest_line_discriminator ()
#8  0x00000000005607db in _bfd_elf_find_nearest_line ()
#9  0x00000000004dabf1 in find_address_in_section (abfd=0xfd522f0, section=0xe8be528, data=0xfa50c30) at util/srcline.c:100
#10 0x000000000053cbdc in bfd_map_over_sections ()
#11 0x00000000004dae84 in addr2line (dso_name=0xe89fe70 "/usr/lib/debug/usr/lib64/firefox/libxul.so.debug", addr=24667432, file=0x7fff3981f9c0, line=0x7fff3981f9bc, dso=0x22a0b50) at util/srcline.c:168
#12 0x00000000004db017 in get_srcline (dso=0x22a0b50, addr=24667432, sym=0xf6b93f0, show_sym=true) at util/srcline.c:276
#13 0x00000000004c5a31 in sort__srcline_cmp (left=0x39494a0, right=0x7fff3981fb80) at util/sort.c:303
#14 0x00000000004cbe96 in hist_entry__cmp (left=0x39494a0, right=0x7fff3981fb80) at util/hist.c:916
#15 0x00000000004caa97 in add_hist_entry (hists=0x229e688, entry=0x7fff3981fb80, al=0x7fff3981fe00, sample_self=true) at util/hist.c:397
#16 0x00000000004cae19 in __hists__add_entry (hists=0x229e688, al=0x7fff3981fe00, sym_parent=0x0, bi=0x0, mi=0x0, period=770586, weight=0, transaction=0, sample_self=true) at util/hist.c:477
#17 0x00000000004cb585 in iter_add_single_normal_entry (iter=0x7fff3981fe30, al=0x7fff3981fe00) at util/hist.c:668
#18 0x00000000004cbcfe in hist_entry_iter__add (iter=0x7fff3981fe30, al=0x7fff3981fe00, evsel=0x229e570, sample=0x7fff39820020, max_stack_depth=127, arg=0x7fff39820510) at util/hist.c:876
#19 0x000000000043206c in process_sample_event (tool=0x7fff39820510, event=0x7fcb0c4d8040, sample=0x7fff39820020, evsel=0x229e570, machine=0x229c9d0) at builtin-report.c:182
#20 0x00000000004afe4e in perf_session__deliver_sample (session=0x229c910, tool=0x7fff39820510, event=0x7fcb0c4d8040, sample=0x7fff39820020, evsel=0x229e570, machine=0x229c9d0) at util/session.c:805
#21 0x00000000004affeb in perf_session__deliver_event (session=0x229c910, event=0x7fcb0c4d8040, sample=0x7fff39820020, tool=0x7fff39820510, file_offset=1146944) at util/session.c:842
#22 0x00000000004b2ee4 in __ordered_events__flush (s=0x229c910, tool=0x7fff39820510) at util/ordered-events.c:186
#23 0x00000000004b31e6 in ordered_events__flush (s=0x229c910, tool=0x7fff39820510, how=OE_FLUSH__FINAL) at util/ordered-events.c:251
#24 0x00000000004b1236 in __perf_session__process_events (session=0x229c910, data_offset=216, data_size=1220024, file_size=1220240, tool=0x7fff39820510) at util/session.c:1318
#25 0x00000000004b132e in perf_session__process_events (session=0x229c910, tool=0x7fff39820510) at util/session.c:1337
#26 0x0000000000432d5b in __cmd_report (rep=0x7fff39820510) at builtin-report.c:483
#27 0x00000000004341ca in cmd_report (argc=0, argv=0x7fff39821950, prefix=0x0) at builtin-report.c:851
#28 0x000000000041c8bc in run_builtin (p=0x857ec8 <commands+168>, argc=4, argv=0x7fff39821950) at perf.c:331
#29 0x000000000041cb1b in handle_internal_command (argc=4, argv=0x7fff39821950) at perf.c:390
#30 0x000000000041cc67 in run_argv (argcp=0x7fff398217ac, argv=0x7fff398217a0) at perf.c:434
#31 0x000000000041cfbe in main (argc=4, argv=0x7fff39821950) at perf.c:549
(gdb)

Haven't checked yet, but I guess we need to check, when --branch-history is
passed, if the sample_type has BRANCH_STACK, etc.

Now back to checking why the --gtk output doesn't match, then I go back to this
other bug.

- Arnaldo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Implement lbr-as-callgraph v10
  2014-11-18 13:37       ` Arnaldo Carvalho de Melo
@ 2014-11-19 15:31         ` Andi Kleen
  0 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2014-11-19 15:31 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, Jiri Olsa, linux-kernel, Namhyung Kim, Liang, Kan

> > > I'm getting compile error for DEBUG=1
> 
> > looks like the BUILD_BUG_ON_MSG does not work correctly with -O6 we use
> 
> Yeah, I'll probably just take the brute force approach Andi originally
> used and switch to BUILD_BUG_ON_MSG once we get it working in the tools/
> environment, i.e. using -O6, etc.

For new enough gcc you could use _Static_assert()

-Andi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Implement lbr-as-callgraph v10
  2014-11-19 14:10             ` Arnaldo Carvalho de Melo
@ 2014-11-19 16:04               ` Arnaldo Carvalho de Melo
  2014-11-19 21:48                 ` Andi Kleen
  2014-11-19 21:50               ` Andi Kleen
  1 sibling, 1 reply; 55+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-11-19 16:04 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Namhyung Kim, Andi Kleen, Jiri Olsa, linux-kernel, Liang, Kan

Em Wed, Nov 19, 2014 at 11:10:27AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Nov 19, 2014 at 11:54:50AM +0100, Jiri Olsa escreveu:
> > > > BUILD_BUG_ON then?
> > > 
> > > sounds gut
> > 
> > hum, acme/perf/core changed and so has the compile error ;-)
> > we dont overload the <linux/bug.h>, so the kernel one got
> > included, which is wrong.. attached patch fixes that
> 
> I had fixed this but not force pushed out, sorry.
> 
> Now I mistakenly tried running:
> 
> perf report --stdio --no-children --branch-history
> 
> on a file that has no BRANCH_STACK, i.e. a perf.data file on a wrong
> directory since I'm comparing the output of --stdio, --tui and --gtk,
> since it looks --gtk is wrong, still unsure about what the problem is in
> that case, but stumbled on:
> 

I need to investigate this further, so I created a perf/branch-history
branch that has the patches I need to test more rebased on top of my
perf/core branch I just pushed out to Ingo.

- Arnaldo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Implement lbr-as-callgraph v10
  2014-11-19 16:04               ` Arnaldo Carvalho de Melo
@ 2014-11-19 21:48                 ` Andi Kleen
  2014-11-20 19:33                   ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2014-11-19 21:48 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Andi Kleen, Jiri Olsa, linux-kernel, Liang, Kan

On Wed, Nov 19, 2014 at 01:04:58PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Nov 19, 2014 at 11:10:27AM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Wed, Nov 19, 2014 at 11:54:50AM +0100, Jiri Olsa escreveu:
> > > > > BUILD_BUG_ON then?
> > > > 
> > > > sounds gut
> > > 
> > > hum, acme/perf/core changed and so has the compile error ;-)
> > > we dont overload the <linux/bug.h>, so the kernel one got
> > > included, which is wrong.. attached patch fixes that
> > 
> > I had fixed this but not force pushed out, sorry.
> > 
> > Now I mistakenly tried running:
> > 
> > perf report --stdio --no-children --branch-history
> > 
> > on a file that has no BRANCH_STACK, i.e. a perf.data file on a wrong

It works with -g.
Without -g it will just give an error message. I think that is ok, isn't it?

> > directory since I'm comparing the output of --stdio, --tui and --gtk,
> > since it looks --gtk is wrong, still unsure about what the problem is in
> > that case, but stumbled on:
> > 
> 
> I need to investigate this further, so I created a perf/branch-history
> branch that has the patches I need to test more rebased on top of my
> perf/core branch I just pushed out to Ingo.


I tested --gtk and I don't see any differences to the console mode
with --branch-history.  What problem do you see?

-Andi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Implement lbr-as-callgraph v10
  2014-11-19 14:10             ` Arnaldo Carvalho de Melo
  2014-11-19 16:04               ` Arnaldo Carvalho de Melo
@ 2014-11-19 21:50               ` Andi Kleen
  2014-11-20 20:36                 ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2014-11-19 21:50 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Andi Kleen, Jiri Olsa, linux-kernel, Liang, Kan

> Segmentation fault (core dumped)
> [root@zoo ~]#
> 
> Rebuilding with DEBUG=1 I get this backtrace:

Ok it crashes inside libbfd while resolving the line. 
Not much I can do about it. 

Could file a bug with the binutils people, if you have a test case?

> 
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `perf report --stdio --no-children --branch-history'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x000000000053ab10 in read_unsigned_leb128 ()
> Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-9.fc20.x86_64 elfutils-libelf-0.160-1.fc20.x86_64 elfutils-libs-0.160-1.fc20.x86_64 libunwind-1.1-3.fc20.x86_64 nss-softokn-freebl-3.17.2-1.fc20.x86_64 numactl-libs-2.0.9-2.fc20.x86_64 slang-2.2.4-11.fc20.x86_64 xz-libs-5.1.2-12alpha.fc20.x86_64
> (gdb) bt
> #0  0x000000000053ab10 in read_unsigned_leb128 ()
> #1  0x00000000005485d7 in find_abstract_instance_name.isra ()
> #2  0x0000000000548756 in find_abstract_instance_name.isra ()
> #3  0x0000000000548756 in find_abstract_instance_name.isra ()
> #4  0x0000000000548d31 in scan_unit_for_symbols ()
> #5  0x00000000005496c9 in comp_unit_find_nearest_line ()
> #6  0x000000000054a6ca in find_line.part ()
> #7  0x00000000005606ca in _bfd_elf_find_nearest_line_discriminator ()
> #8  0x00000000005607db in _bfd_elf_find_nearest_line ()
> #9  0x00000000004dabf1 in find_address_in_section (abfd=0xfd522f0, section=0xe8be528, data=0xfa50c30) at util/srcline.c:100
> #10 0x000000000053cbdc in bfd_map_over_sections ()
> #11 0x00000000004dae84 in addr2line (dso_name=0xe89fe70 "/usr/lib/debug/usr/lib64/firefox/libxul.so.debug", addr=24667432, file=0x7fff3981f9c0, line=0x7fff3981f9bc, dso=0x22a0b50) at util/srcline.c:168
> #12 0x00000000004db017 in get_srcline (dso=0x22a0b50, addr=24667432, sym=0xf6b93f0, show_sym=true) at util/srcline.c:276

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [tip:perf/core] perf callchain: Factor out adding new call chain entries
  2014-11-13  2:05 ` [PATCH 01/10] perf, tools: Factor out adding new call chain entries Andi Kleen
  2014-11-13 19:14   ` Arnaldo Carvalho de Melo
@ 2014-11-20  7:37   ` tip-bot for Andi Kleen
  1 sibling, 0 replies; 55+ messages in thread
From: tip-bot for Andi Kleen @ 2014-11-20  7:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, ak, mingo, hpa, namhyung, tglx, acme, jolsa

Commit-ID:  37592b8afb7151994e760d1727c264329d9c13c8
Gitweb:     http://git.kernel.org/tip/37592b8afb7151994e760d1727c264329d9c13c8
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Wed, 12 Nov 2014 18:05:19 -0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 19 Nov 2014 12:33:47 -0300

perf callchain: Factor out adding new call chain entries

Move the code to resolve and add a new callchain entry into a new
add_callchain_ip function. This will be used in the next patches to add
LBRs too.

No change in behavior.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-2-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/machine.c | 51 +++++++++++++++++++++++++++++------------------
 1 file changed, 32 insertions(+), 19 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 52e9490..84390ee 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1381,6 +1381,34 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
 	return mi;
 }
 
+static int add_callchain_ip(struct thread *thread,
+			    struct symbol **parent,
+			    struct addr_location *root_al,
+			    int cpumode,
+			    u64 ip)
+{
+	struct addr_location al;
+
+	al.filtered = 0;
+	al.sym = NULL;
+	thread__find_addr_location(thread, cpumode, MAP__FUNCTION,
+				   ip, &al);
+	if (al.sym != NULL) {
+		if (sort__has_parent && !*parent &&
+		    symbol__match_regex(al.sym, &parent_regex))
+			*parent = al.sym;
+		else if (have_ignore_callees && root_al &&
+		  symbol__match_regex(al.sym, &ignore_callees_regex)) {
+			/* Treat this symbol as the root,
+			   forgetting its callees. */
+			*root_al = al;
+			callchain_cursor_reset(&callchain_cursor);
+		}
+	}
+
+	return callchain_cursor_append(&callchain_cursor, ip, al.map, al.sym);
+}
+
 struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
 					   struct addr_location *al)
 {
@@ -1427,7 +1455,6 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 
 	for (i = 0; i < chain_nr; i++) {
 		u64 ip;
-		struct addr_location al;
 
 		if (callchain_param.order == ORDER_CALLEE)
 			j = i;
@@ -1464,24 +1491,10 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 			continue;
 		}
 
-		al.filtered = 0;
-		thread__find_addr_location(thread, cpumode,
-					   MAP__FUNCTION, ip, &al);
-		if (al.sym != NULL) {
-			if (sort__has_parent && !*parent &&
-			    symbol__match_regex(al.sym, &parent_regex))
-				*parent = al.sym;
-			else if (have_ignore_callees && root_al &&
-			  symbol__match_regex(al.sym, &ignore_callees_regex)) {
-				/* Treat this symbol as the root,
-				   forgetting its callees. */
-				*root_al = al;
-				callchain_cursor_reset(&callchain_cursor);
-			}
-		}
-
-		err = callchain_cursor_append(&callchain_cursor,
-					      ip, al.map, al.sym);
+		err = add_callchain_ip(thread, parent, root_al,
+				       cpumode, ip);
+		if (err == -EINVAL)
+			break;
 		if (err)
 			return err;
 	}

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [tip:perf/core] perf callchain: Use al.addr to set up call chain
  2014-11-13  2:05 ` [PATCH 03/10] perf, tools: Use al.addr to set up call chain Andi Kleen
  2014-11-13 19:16   ` Arnaldo Carvalho de Melo
@ 2014-11-20  7:38   ` tip-bot for Andi Kleen
  1 sibling, 0 replies; 55+ messages in thread
From: tip-bot for Andi Kleen @ 2014-11-20  7:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, jolsa, ak, tglx, linux-kernel, namhyung, acme, hpa

Commit-ID:  5550171b2a9f8df26ff483051d060db06376b26d
Gitweb:     http://git.kernel.org/tip/5550171b2a9f8df26ff483051d060db06376b26d
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Wed, 12 Nov 2014 18:05:21 -0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 19 Nov 2014 12:33:47 -0300

perf callchain: Use al.addr to set up call chain

Use the relative address, this makes get_srcline work correctly in the
end.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-4-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/machine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 84390ee..d97309c 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1406,7 +1406,7 @@ static int add_callchain_ip(struct thread *thread,
 		}
 	}
 
-	return callchain_cursor_append(&callchain_cursor, ip, al.map, al.sym);
+	return callchain_cursor_append(&callchain_cursor, al.addr, al.map, al.sym);
 }
 
 struct branch_info *sample__resolve_bstack(struct perf_sample *sample,

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [tip:perf/core] perf callchain: Use a common function to resolve symbol or name
  2014-11-13  2:05 ` [PATCH 05/10] perf, tools: Use a common function to resolve symbol or name Andi Kleen
  2014-11-13 19:17   ` Arnaldo Carvalho de Melo
@ 2014-11-20  7:38   ` tip-bot for Andi Kleen
  1 sibling, 0 replies; 55+ messages in thread
From: tip-bot for Andi Kleen @ 2014-11-20  7:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: namhyung, mingo, jolsa, tglx, acme, ak, hpa, linux-kernel

Commit-ID:  2989ccaac48f8c3da7f77101bbf98e0ea8773d83
Gitweb:     http://git.kernel.org/tip/2989ccaac48f8c3da7f77101bbf98e0ea8773d83
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Wed, 12 Nov 2014 18:05:23 -0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 19 Nov 2014 12:33:47 -0300

perf callchain: Use a common function to resolve symbol or name

Refactor the duplicated code to resolve the symbol name or
the address of a symbol into a single function.

Used in next patch to add common functionality.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-6-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/ui/browsers/hists.c | 17 -----------------
 tools/perf/ui/gtk/hists.c      | 11 +----------
 tools/perf/ui/stdio/hist.c     | 23 +++++++++--------------
 tools/perf/util/callchain.c    | 19 +++++++++++++++++++
 tools/perf/util/callchain.h    |  3 +++
 5 files changed, 32 insertions(+), 41 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index cfb976b..12c17c5 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -463,23 +463,6 @@ out:
 	return key;
 }
 
-static char *callchain_list__sym_name(struct callchain_list *cl,
-				      char *bf, size_t bfsize, bool show_dso)
-{
-	int printed;
-
-	if (cl->ms.sym)
-		printed = scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
-	else
-		printed = scnprintf(bf, bfsize, "%#" PRIx64, cl->ip);
-
-	if (show_dso)
-		scnprintf(bf + printed, bfsize - printed, " %s",
-			  cl->ms.map ? cl->ms.map->dso->short_name : "unknown");
-
-	return bf;
-}
-
 struct callchain_print_arg {
 	/* for hists browser */
 	off_t	row_offset;
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index fc654fb..4b3585e 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -89,15 +89,6 @@ void perf_gtk__init_hpp(void)
 				perf_gtk__hpp_color_overhead_acc;
 }
 
-static void callchain_list__sym_name(struct callchain_list *cl,
-				     char *bf, size_t bfsize)
-{
-	if (cl->ms.sym)
-		scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
-	else
-		scnprintf(bf, bfsize, "%#" PRIx64, cl->ip);
-}
-
 static void perf_gtk__add_callchain(struct rb_root *root, GtkTreeStore *store,
 				    GtkTreeIter *parent, int col, u64 total)
 {
@@ -128,7 +119,7 @@ static void perf_gtk__add_callchain(struct rb_root *root, GtkTreeStore *store,
 			scnprintf(buf, sizeof(buf), "%5.2f%%", percent);
 			gtk_tree_store_set(store, &iter, 0, buf, -1);
 
-			callchain_list__sym_name(chain, buf, sizeof(buf));
+			callchain_list__sym_name(chain, buf, sizeof(buf), false);
 			gtk_tree_store_set(store, &iter, col, buf, -1);
 
 			if (need_new_parent) {
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 15b451a..dfcbc90 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -41,6 +41,7 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_list *chain,
 {
 	int i;
 	size_t ret = 0;
+	char bf[1024];
 
 	ret += callchain__fprintf_left_margin(fp, left_margin);
 	for (i = 0; i < depth; i++) {
@@ -56,11 +57,8 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_list *chain,
 		} else
 			ret += fprintf(fp, "%s", "          ");
 	}
-	if (chain->ms.sym)
-		ret += fprintf(fp, "%s\n", chain->ms.sym->name);
-	else
-		ret += fprintf(fp, "0x%0" PRIx64 "\n", chain->ip);
-
+	fputs(callchain_list__sym_name(chain, bf, sizeof(bf), false), fp);
+	fputc('\n', fp);
 	return ret;
 }
 
@@ -168,6 +166,7 @@ static size_t callchain__fprintf_graph(FILE *fp, struct rb_root *root,
 	struct rb_node *node;
 	int i = 0;
 	int ret = 0;
+	char bf[1024];
 
 	/*
 	 * If have one single callchain root, don't bother printing
@@ -196,10 +195,8 @@ static size_t callchain__fprintf_graph(FILE *fp, struct rb_root *root,
 			} else
 				ret += callchain__fprintf_left_margin(fp, left_margin);
 
-			if (chain->ms.sym)
-				ret += fprintf(fp, " %s\n", chain->ms.sym->name);
-			else
-				ret += fprintf(fp, " %p\n", (void *)(long)chain->ip);
+			ret += fprintf(fp, "%s\n", callchain_list__sym_name(chain, bf, sizeof(bf),
+							false));
 
 			if (++entries_printed == callchain_param.print_limit)
 				break;
@@ -219,6 +216,7 @@ static size_t __callchain__fprintf_flat(FILE *fp, struct callchain_node *node,
 {
 	struct callchain_list *chain;
 	size_t ret = 0;
+	char bf[1024];
 
 	if (!node)
 		return 0;
@@ -229,11 +227,8 @@ static size_t __callchain__fprintf_flat(FILE *fp, struct callchain_node *node,
 	list_for_each_entry(chain, &node->val, list) {
 		if (chain->ip >= PERF_CONTEXT_MAX)
 			continue;
-		if (chain->ms.sym)
-			ret += fprintf(fp, "                %s\n", chain->ms.sym->name);
-		else
-			ret += fprintf(fp, "                %p\n",
-					(void *)(long)chain->ip);
+		ret += fprintf(fp, "                %s\n", callchain_list__sym_name(chain,
+					bf, sizeof(bf), false));
 	}
 
 	return ret;
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 0022980..38da69c 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -808,3 +808,22 @@ int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *
 out:
 	return 1;
 }
+
+char *callchain_list__sym_name(struct callchain_list *cl,
+			       char *bf, size_t bfsize, bool show_dso)
+{
+	int printed;
+
+	if (cl->ms.sym) {
+		printed = scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
+	} else
+		printed = scnprintf(bf, bfsize, "%#" PRIx64, cl->ip);
+
+	if (show_dso)
+		scnprintf(bf + printed, bfsize - printed, " %s",
+			  cl->ms.map ?
+			  cl->ms.map->dso->short_name :
+			  "unknown");
+
+	return bf;
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 3caccc2..3e1ed15 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -193,4 +193,7 @@ static inline int arch_skip_callchain_idx(struct thread *thread __maybe_unused,
 }
 #endif
 
+char *callchain_list__sym_name(struct callchain_list *cl,
+			       char *bf, size_t bfsize, bool show_dso);
+
 #endif	/* __PERF_CALLCHAIN_H */

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [tip:perf/core] perf tools: Only print base source file for srcline
  2014-11-13  2:05 ` [PATCH 07/10] perf, tools: Only print base source file for srcline Andi Kleen
  2014-11-13 19:22   ` Arnaldo Carvalho de Melo
@ 2014-11-20  7:38   ` tip-bot for Andi Kleen
  1 sibling, 0 replies; 55+ messages in thread
From: tip-bot for Andi Kleen @ 2014-11-20  7:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: namhyung, linux-kernel, ak, tglx, hpa, jolsa, mingo, acme

Commit-ID:  2de217688e8f086bf6d920d530401b56fcbc6eff
Gitweb:     http://git.kernel.org/tip/2de217688e8f086bf6d920d530401b56fcbc6eff
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Wed, 12 Nov 2014 18:05:25 -0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 19 Nov 2014 12:33:47 -0300

perf tools: Only print base source file for srcline

For perf report with --sort srcline only print the base source file
name. This makes the results generally fit much better to the screen.
The path is usually not that useful anyways because it is often from
different systems.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-8-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/srcline.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index f3e4bc5..77c1806 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -274,7 +274,7 @@ char *get_srcline(struct dso *dso, unsigned long addr)
 	if (!addr2line(dso_name, addr, &file, &line, dso))
 		goto out;
 
-	if (asprintf(&srcline, "%s:%u", file, line) < 0) {
+	if (asprintf(&srcline, "%s:%u", basename(file), line) < 0) {
 		free(file);
 		goto out;
 	}

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [tip:perf/core] perf annotate: Support source line numbers in annotate
  2014-11-13  2:05 ` [PATCH 08/10] perf, tools: Support source line numbers in annotate Andi Kleen
  2014-11-13 20:52   ` Arnaldo Carvalho de Melo
@ 2014-11-20  7:39   ` tip-bot for Andi Kleen
  1 sibling, 0 replies; 55+ messages in thread
From: tip-bot for Andi Kleen @ 2014-11-20  7:39 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jolsa, tglx, linux-kernel, hpa, acme, namhyung, mingo, ak

Commit-ID:  e592488c01d51763de847fcecb3d969231a483a9
Gitweb:     http://git.kernel.org/tip/e592488c01d51763de847fcecb3d969231a483a9
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Wed, 12 Nov 2014 18:05:26 -0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 19 Nov 2014 12:33:48 -0300

perf annotate: Support source line numbers in annotate

With srcline key/sort'ing it's useful to have line numbers in the
annotate window. This patch implements this.

Use objdump -l to request the line numbers and save them in the line
structure. Then the browser displays them for source lines.

The line numbers are not displayed by default, but can be toggled on
with 'k'

There is one unfortunate problem with this setup. For lines not
containing source and which are outside functions objdump -l reports
line numbers off by a few: it always reports the first line number in
the next function even for lines that are outside the function.

I haven't found a nice way to detect/correct this. Probably objdump has
to be fixed.

See https://sourceware.org/bugzilla/show_bug.cgi?id=16433

The line numbers are still useful even with these problems, as most are
correct and the ones which are not are nearby.

v2: Fix help text. Handle (discriminator...) output in objdump.
Left align the line numbers.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-9-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/ui/browsers/annotate.c | 13 ++++++++++++-
 tools/perf/util/annotate.c        | 30 +++++++++++++++++++++++++-----
 tools/perf/util/annotate.h        |  1 +
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index f0697a3..1e0a2fd 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -27,6 +27,7 @@ static struct annotate_browser_opt {
 	bool hide_src_code,
 	     use_offset,
 	     jump_arrows,
+	     show_linenr,
 	     show_nr_jumps;
 } annotate_browser__opts = {
 	.use_offset	= true,
@@ -128,7 +129,11 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int
 	if (!*dl->line)
 		slsmg_write_nstring(" ", width - pcnt_width);
 	else if (dl->offset == -1) {
-		printed = scnprintf(bf, sizeof(bf), "%*s  ",
+		if (dl->line_nr && annotate_browser__opts.show_linenr)
+			printed = scnprintf(bf, sizeof(bf), "%-*d ",
+					ab->addr_width + 1, dl->line_nr);
+		else
+			printed = scnprintf(bf, sizeof(bf), "%*s  ",
 				    ab->addr_width, " ");
 		slsmg_write_nstring(bf, printed);
 		slsmg_write_nstring(dl->line, width - printed - pcnt_width + 1);
@@ -733,6 +738,7 @@ static int annotate_browser__run(struct annotate_browser *browser,
 		"o             Toggle disassembler output/simplified view\n"
 		"s             Toggle source code view\n"
 		"/             Search string\n"
+		"k             Toggle line numbers\n"
 		"r             Run available scripts\n"
 		"?             Search string backwards\n");
 			continue;
@@ -741,6 +747,10 @@ static int annotate_browser__run(struct annotate_browser *browser,
 				script_browse(NULL);
 				continue;
 			}
+		case 'k':
+			annotate_browser__opts.show_linenr =
+				!annotate_browser__opts.show_linenr;
+			break;
 		case 'H':
 			nd = browser->curr_hot;
 			break;
@@ -984,6 +994,7 @@ static struct annotate_config {
 } annotate__configs[] = {
 	ANNOTATE_CFG(hide_src_code),
 	ANNOTATE_CFG(jump_arrows),
+	ANNOTATE_CFG(show_linenr),
 	ANNOTATE_CFG(show_nr_jumps),
 	ANNOTATE_CFG(use_offset),
 };
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 873c877..e5670f1 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -17,11 +17,13 @@
 #include "debug.h"
 #include "annotate.h"
 #include "evsel.h"
+#include <regex.h>
 #include <pthread.h>
 #include <linux/bitops.h>
 
 const char 	*disassembler_style;
 const char	*objdump_path;
+static regex_t	 file_lineno;
 
 static struct ins *ins__find(const char *name);
 static int disasm_line__parse(char *line, char **namep, char **rawp);
@@ -570,13 +572,15 @@ out_free_name:
 	return -1;
 }
 
-static struct disasm_line *disasm_line__new(s64 offset, char *line, size_t privsize)
+static struct disasm_line *disasm_line__new(s64 offset, char *line,
+					size_t privsize, int line_nr)
 {
 	struct disasm_line *dl = zalloc(sizeof(*dl) + privsize);
 
 	if (dl != NULL) {
 		dl->offset = offset;
 		dl->line = strdup(line);
+		dl->line_nr = line_nr;
 		if (dl->line == NULL)
 			goto out_delete;
 
@@ -788,13 +792,15 @@ static int disasm_line__print(struct disasm_line *dl, struct symbol *sym, u64 st
  * The ops.raw part will be parsed further according to type of the instruction.
  */
 static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
-				      FILE *file, size_t privsize)
+				      FILE *file, size_t privsize,
+				      int *line_nr)
 {
 	struct annotation *notes = symbol__annotation(sym);
 	struct disasm_line *dl;
 	char *line = NULL, *parsed_line, *tmp, *tmp2, *c;
 	size_t line_len;
 	s64 line_ip, offset = -1;
+	regmatch_t match[2];
 
 	if (getline(&line, &line_len, file) < 0)
 		return -1;
@@ -812,6 +818,12 @@ static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
 	line_ip = -1;
 	parsed_line = line;
 
+	/* /filename:linenr ? Save line number and ignore. */
+	if (regexec(&file_lineno, line, 2, match, 0) == 0) {
+		*line_nr = atoi(line + match[1].rm_so);
+		return 0;
+	}
+
 	/*
 	 * Strip leading spaces:
 	 */
@@ -842,8 +854,9 @@ static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
 			parsed_line = tmp2 + 1;
 	}
 
-	dl = disasm_line__new(offset, parsed_line, privsize);
+	dl = disasm_line__new(offset, parsed_line, privsize, *line_nr);
 	free(line);
+	(*line_nr)++;
 
 	if (dl == NULL)
 		return -1;
@@ -869,6 +882,11 @@ static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
 	return 0;
 }
 
+static __attribute__((constructor)) void symbol__init_regexpr(void)
+{
+	regcomp(&file_lineno, "^/[^:]+:([0-9]+)", REG_EXTENDED);
+}
+
 static void delete_last_nop(struct symbol *sym)
 {
 	struct annotation *notes = symbol__annotation(sym);
@@ -904,6 +922,7 @@ int symbol__annotate(struct symbol *sym, struct map *map, size_t privsize)
 	char symfs_filename[PATH_MAX];
 	struct kcore_extract kce;
 	bool delete_extract = false;
+	int lineno = 0;
 
 	if (filename)
 		symbol__join_symfs(symfs_filename, filename);
@@ -984,7 +1003,7 @@ fallback:
 	snprintf(command, sizeof(command),
 		 "%s %s%s --start-address=0x%016" PRIx64
 		 " --stop-address=0x%016" PRIx64
-		 " -d %s %s -C %s 2>/dev/null|grep -v %s|expand",
+		 " -l -d %s %s -C %s 2>/dev/null|grep -v %s|expand",
 		 objdump_path ? objdump_path : "objdump",
 		 disassembler_style ? "-M " : "",
 		 disassembler_style ? disassembler_style : "",
@@ -1001,7 +1020,8 @@ fallback:
 		goto out_free_filename;
 
 	while (!feof(file))
-		if (symbol__parse_objdump_line(sym, map, file, privsize) < 0)
+		if (symbol__parse_objdump_line(sym, map, file, privsize,
+			    &lineno) < 0)
 			break;
 
 	/*
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 112d6e2..0784a94 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -58,6 +58,7 @@ struct disasm_line {
 	char		    *line;
 	char		    *name;
 	struct ins	    *ins;
+	int		    line_nr;
 	struct ins_operands ops;
 };
 

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: Implement lbr-as-callgraph v10
  2014-11-19 21:48                 ` Andi Kleen
@ 2014-11-20 19:33                   ` Arnaldo Carvalho de Melo
  2014-11-20 20:46                     ` Andi Kleen
  2014-11-21 20:30                     ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 55+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-11-20 19:33 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jiri Olsa, Namhyung Kim, Jiri Olsa, linux-kernel, Liang, Kan

Em Wed, Nov 19, 2014 at 10:48:22PM +0100, Andi Kleen escreveu:
> On Wed, Nov 19, 2014 at 01:04:58PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Nov 19, 2014 at 11:10:27AM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Wed, Nov 19, 2014 at 11:54:50AM +0100, Jiri Olsa escreveu:
> > > > > > BUILD_BUG_ON then?
> > > > > 
> > > > > sounds gut
> > > > 
> > > > hum, acme/perf/core changed and so has the compile error ;-)
> > > > we dont overload the <linux/bug.h>, so the kernel one got
> > > > included, which is wrong.. attached patch fixes that
> > > 
> > > I had fixed this but not force pushed out, sorry.
> > > 
> > > Now I mistakenly tried running:
> > > 
> > > perf report --stdio --no-children --branch-history
> > > 
> > > on a file that has no BRANCH_STACK, i.e. a perf.data file on a wrong
 
> It works with -g.

What works with -g? perf report?


> Without -g it will just give an error message. I think that is ok, isn't it?
 
> > > directory since I'm comparing the output of --stdio, --tui and --gtk,
> > > since it looks --gtk is wrong, still unsure about what the problem is in
> > > that case, but stumbled on:
> > > 
> > 
> > I need to investigate this further, so I created a perf/branch-history
> > branch that has the patches I need to test more rebased on top of my
> > perf/core branch I just pushed out to Ingo.
> 
> 
> I tested --gtk and I don't see any differences to the console mode
> with --branch-history.  What problem do you see?

The difference is with --tui, but I haven't checked if this is a problem
introduced by your patchkit or if this is something that was there
before it was applied.

- Arnaldo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Implement lbr-as-callgraph v10
  2014-11-19 21:50               ` Andi Kleen
@ 2014-11-20 20:36                 ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 55+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-11-20 20:36 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jiri Olsa, Namhyung Kim, Jiri Olsa, linux-kernel, Liang, Kan

Em Wed, Nov 19, 2014 at 10:50:41PM +0100, Andi Kleen escreveu:
> > Segmentation fault (core dumped)
> > [root@zoo ~]#
> > 
> > Rebuilding with DEBUG=1 I get this backtrace:
> 
> Ok it crashes inside libbfd while resolving the line. 
> Not much I can do about it. 

> Could file a bug with the binutils people, if you have a test case?

The file I have is:

  [root@zoo ~]# perf report --stdio --header-only
  # ========
  # captured on: Tue Nov 18 17:42:26 2014
  # hostname : zoo.ghostprotocols.net
  # os release : 3.17.2-200.fc20.x86_64
  # perf version : 3.18.rc1.g5a0227
  # arch : x86_64
  # nrcpus online : 4
  # nrcpus avail : 4
  # cpudesc : Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz
  # cpuid : GenuineIntel,6,58,9
  # total memory : 8080676 kB
  # cmdline : /home/acme/bin/perf record -a -g 
  # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0, excl_guest = 1, precise_ip = 0, attr_mmap2 = 1, attr_mmap  = 1, attr_mmap_data = 0
  # HEADER_CPU_TOPOLOGY info available, use -I to display
  # HEADER_NUMA_TOPOLOGY info available, use -I to display
  # pmu mappings: cpu = 4, software = 1, power = 9, uncore_imc = 8, tracepoint = 2, uncore_cbox_0 = 6, uncore_cbox_1 = 7, breakpoint = 5
  # ========
  #
  [root@zoo ~]#

So the command line used was:

  perf record -a -g

And the perf tool was one built when HEAD was:

  [acme@zoo linux]$ git show 5a0227
   commit 5a0227c085cbee8af4ce228da80a99cdc124329e
  Author: Andi Kleen <ak@linux.intel.com>
  Date:   Mon Nov 17 17:58:54 2014 -0800

      perf report: In branch stack mode use address history sorting

  Which, due to rebases, in my perf/branch-history is:

  commit aef48d981d85f996a47715d44111220843999a4e
  Author: Andi Kleen <ak@linux.intel.com>
  Date:   Mon Nov 17 17:58:54 2014 -0800

      perf report: In branch stack mode use address history sorting

So let me try to reproduce it again...

Yeah, reproduced, so, what lead me to send this report to you was:

1. yes, it has -g in the 'perf record' command line generating the perf.data file
   that when fed to perf report --branch-history causes a segfault
2. no, it doesn't have -b, so, it ends up with PERF_SAMPLE_BRANCH_STACK:

   [root@zoo ~]# perf evlist -v
   cycles: sample_freq=4000, size: 104,
   sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD, disabled: 1,
   inherit: 1, mmap: 1, mmap2: 1, comm: 1, comm_exec: 1,
   freq: 1, sample_id_all: 1, exclude_guest: 1
   [root@zoo ~]#

3. When I then pass it to 'perf report' telling that I want --branch-history,
   it segfaults

What I thought this patch should do is to check if the perf.data sample_type has
PERF_SAMPLE_BRANCH_STACK and if not, tell the user that such mode can't be chosen
since there is no branch stack.

This is all without looking at the code to see what kind of data is being fed to
libbfd, I fear it is bogus data that will then cause that segfault, no?

If I add that '-b' to the 'perf record -a -g'  command line, I get:

  [root@zoo ~]# perf record -a -g -b
  ^C[ perf record: Woken up 3 times to write data ]
  [ perf record: Captured and wrote 2.422 MB perf.data (~105832 samples) ]

  [root@zoo ~]# perf evlist -v
  cycles: sample_freq=4000, size: 104,
  sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK, disabled: 1,
  inherit: 1, mmap: 1, mmap2: 1, comm: 1, comm_exec: 1, freq: 1,
  sample_id_all: 1, exclude_guest: 1, branch_sample_type: 8
  [root@zoo ~]#

And then, specifying --branch-history to perf report is no problem it is happy
finding whatever info it needs to properly spoon into libbfd...

No, that is not the case, when I passed the above perf.data to 'perf report
--stdio --branch-history' it happily exploded with double frees and segfaults...

So please try, as root:

  perf record -a -g -b

then interrupt with control+C and try:

  perf report --stdio --branch-history

On my perf/branch-history branch.

- Arnaldo
 
> > 
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> > Core was generated by `perf report --stdio --no-children --branch-history'.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  0x000000000053ab10 in read_unsigned_leb128 ()
> > Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-9.fc20.x86_64 elfutils-libelf-0.160-1.fc20.x86_64 elfutils-libs-0.160-1.fc20.x86_64 libunwind-1.1-3.fc20.x86_64 nss-softokn-freebl-3.17.2-1.fc20.x86_64 numactl-libs-2.0.9-2.fc20.x86_64 slang-2.2.4-11.fc20.x86_64 xz-libs-5.1.2-12alpha.fc20.x86_64
> > (gdb) bt
> > #0  0x000000000053ab10 in read_unsigned_leb128 ()
> > #1  0x00000000005485d7 in find_abstract_instance_name.isra ()
> > #2  0x0000000000548756 in find_abstract_instance_name.isra ()
> > #3  0x0000000000548756 in find_abstract_instance_name.isra ()
> > #4  0x0000000000548d31 in scan_unit_for_symbols ()
> > #5  0x00000000005496c9 in comp_unit_find_nearest_line ()
> > #6  0x000000000054a6ca in find_line.part ()
> > #7  0x00000000005606ca in _bfd_elf_find_nearest_line_discriminator ()
> > #8  0x00000000005607db in _bfd_elf_find_nearest_line ()
> > #9  0x00000000004dabf1 in find_address_in_section (abfd=0xfd522f0, section=0xe8be528, data=0xfa50c30) at util/srcline.c:100
> > #10 0x000000000053cbdc in bfd_map_over_sections ()
> > #11 0x00000000004dae84 in addr2line (dso_name=0xe89fe70 "/usr/lib/debug/usr/lib64/firefox/libxul.so.debug", addr=24667432, file=0x7fff3981f9c0, line=0x7fff3981f9bc, dso=0x22a0b50) at util/srcline.c:168
> > #12 0x00000000004db017 in get_srcline (dso=0x22a0b50, addr=24667432, sym=0xf6b93f0, show_sym=true) at util/srcline.c:276

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Implement lbr-as-callgraph v10
  2014-11-20 19:33                   ` Arnaldo Carvalho de Melo
@ 2014-11-20 20:46                     ` Andi Kleen
  2014-11-21 20:30                     ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2014-11-20 20:46 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Andi Kleen, Jiri Olsa, Namhyung Kim, Jiri Olsa, linux-kernel, Liang, Kan

On Thu, Nov 20, 2014 at 04:33:47PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Nov 19, 2014 at 10:48:22PM +0100, Andi Kleen escreveu:
> > On Wed, Nov 19, 2014 at 01:04:58PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Wed, Nov 19, 2014 at 11:10:27AM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > Em Wed, Nov 19, 2014 at 11:54:50AM +0100, Jiri Olsa escreveu:
> > > > > > > BUILD_BUG_ON then?
> > > > > > 
> > > > > > sounds gut
> > > > > 
> > > > > hum, acme/perf/core changed and so has the compile error ;-)
> > > > > we dont overload the <linux/bug.h>, so the kernel one got
> > > > > included, which is wrong.. attached patch fixes that
> > > > 
> > > > I had fixed this but not force pushed out, sorry.
> > > > 
> > > > Now I mistakenly tried running:
> > > > 
> > > > perf report --stdio --no-children --branch-history
> > > > 
> > > > on a file that has no BRANCH_STACK, i.e. a perf.data file on a wrong
>  
> > It works with -g.
> 
> What works with -g? perf report?

--branch-history needs perf record -g.
If there is no -g in the record it will error out.

That's what you did and see, right?

> 
> 
> > Without -g it will just give an error message. I think that is ok, isn't it?
>  
> > > > directory since I'm comparing the output of --stdio, --tui and --gtk,
> > > > since it looks --gtk is wrong, still unsure about what the problem is in
> > > > that case, but stumbled on:
> > > > 
> > > 
> > > I need to investigate this further, so I created a perf/branch-history
> > > branch that has the patches I need to test more rebased on top of my
> > > perf/core branch I just pushed out to Ingo.
> > 
> > 
> > I tested --gtk and I don't see any differences to the console mode
> > with --branch-history.  What problem do you see?
> 
> The difference is with --tui, but I haven't checked if this is a problem
> introduced by your patchkit or if this is something that was there
> before it was applied.

What difference do you see?


For me all of --stdio, --gtk, --tui look similar.

-Andi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Implement lbr-as-callgraph v10
  2014-11-20 19:33                   ` Arnaldo Carvalho de Melo
  2014-11-20 20:46                     ` Andi Kleen
@ 2014-11-21 20:30                     ` Arnaldo Carvalho de Melo
  2014-11-22  1:25                       ` Andi Kleen
  1 sibling, 1 reply; 55+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-11-21 20:30 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jiri Olsa, Namhyung Kim, Jiri Olsa, linux-kernel, Liang, Kan

Em Thu, Nov 20, 2014 at 04:33:47PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Nov 19, 2014 at 10:48:22PM +0100, Andi Kleen escreveu:
> > > > directory since I'm comparing the output of --stdio, --tui and --gtk,
> > > > since it looks --gtk is wrong, still unsure about what the problem is in
> > > > that case, but stumbled on:

> > > I need to investigate this further, so I created a perf/branch-history
> > > branch that has the patches I need to test more rebased on top of my
> > > perf/core branch I just pushed out to Ingo.

> > I tested --gtk and I don't see any differences to the console mode
> > with --branch-history.  What problem do you see?

> The difference is with --tui, but I haven't checked if this is a problem
> introduced by your patchkit or if this is something that was there
> before it was applied.

So, here it is, --gtk looks like --stdio:

$ perf report --no-children --branch-history --stdio

# Samples: 43  of event 'cycles'
# Event count (approx.): 26843162
#
# Overhead  Source:Line  Symbol                     Shared Object   
# ........  ...........  .........................  ................
#
    68.42%  tcall.c:5    [.] f2                     tcall           
            |          
            |--87.65%-- f2 tcall.c:4
            |          |          
            |          |--67.41%-- f1 tcall.c:10
            |          |          f1 tcall.c:9
            |          |          main tcall.c:17
            |          |          main tcall.c:17
            |          |          main tcall.c:16
            |          |          main tcall.c:16
            |          |          f1 tcall.c:12
            |          |          f1 tcall.c:12
            |          |          f2 tcall.c:6
            |          |          f2 tcall.c:4
            |          |          f1 tcall.c:11
            |          |          f1 tcall.c:11
            |          |          f2 tcall.c:6
            |          |          f2 tcall.c:4
            |          |          f1 tcall.c:10
            |          |          
            |           --32.59%-- f1 tcall.c:11
            |                     f1 tcall.c:11
            |                     f1 tcall.c:11
            |                     f2 tcall.c:6
            |                     f2 tcall.c:4
            |                     f1 tcall.c:10
            |                     f1 tcall.c:9
            |                     main tcall.c:17
            |                     main tcall.c:17
            |                     main tcall.c:16
            |                     main tcall.c:16
            |                     f1 tcall.c:12
            |                     f1 tcall.c:12
            |                     f2 tcall.c:6
            |                     f2 tcall.c:4
            |                     f1 tcall.c:11
            |          
             --12.35%-- f1 tcall.c:9
                       main tcall.c:17
                       main tcall.c:17
                       main tcall.c:16
                       main tcall.c:16
                       f1 tcall.c:12
                       f1 tcall.c:12
                       f2 tcall.c:6
                       f2 tcall.c:4
                       f1 tcall.c:11
                       f1 tcall.c:11
                       f2 tcall.c:6
                       f2 tcall.c:4
                       f1 tcall.c:10
                       f1 tcall.c:9
                       main tcall.c:17
<SNIP>

But:

$ perf report --no-children --branch-history --tui
# Then expand a few callchains and press 'P' to generate a perf.hist.0
# file:
-   68.42%  tcall.c:5    [.] f2                     tcall
   - f2 tcall.c:4
      - 67.41% f1 tcall.c:10
           f1 tcall.c:9
           main tcall.c:17
           main tcall.c:17
           main tcall.c:16
           main tcall.c:16
           f1 tcall.c:12
           f1 tcall.c:12
           f2 tcall.c:6
           f2 tcall.c:4
           f1 tcall.c:11
           f1 tcall.c:11
           f2 tcall.c:6
           f2 tcall.c:4
           f1 tcall.c:10
      - 32.59% f1 tcall.c:11
           f1 tcall.c:11
           f2 tcall.c:6
           f2 tcall.c:4
           f1 tcall.c:10
           f1 tcall.c:9
           main tcall.c:17
           main tcall.c:17
           main tcall.c:16
           main tcall.c:16
           f1 tcall.c:12
           f1 tcall.c:12
           f2 tcall.c:6
           f2 tcall.c:4
           f1 tcall.c:11
     f1 tcall.c:9
     main tcall.c:17
     main tcall.c:17
     main tcall.c:16
     main tcall.c:16
     f1 tcall.c:12
     f1 tcall.c:12
     f2 tcall.c:6
     f2 tcall.c:4
     f1 tcall.c:11
     f1 tcall.c:11
     f2 tcall.c:6
     f2 tcall.c:4
     f1 tcall.c:10
     f1 tcall.c:9
     main tcall.c:17
<SNIP>


Do you see the diff?  The 87.65% and 12.35% doesn't appear on the --tui
output.

But I don't know if this is due to your patchkit, trying to check.

- Arnaldo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Implement lbr-as-callgraph v10
  2014-11-21 20:30                     ` Arnaldo Carvalho de Melo
@ 2014-11-22  1:25                       ` Andi Kleen
  2014-11-24  7:40                         ` Namhyung Kim
  0 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2014-11-22  1:25 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Andi Kleen, Jiri Olsa, Namhyung Kim, Jiri Olsa, linux-kernel, Liang, Kan

>      f1 tcall.c:9
>      main tcall.c:17
>      main tcall.c:17
>      main tcall.c:16
>      main tcall.c:16
>      f1 tcall.c:12
>      f1 tcall.c:12
>      f2 tcall.c:6
>      f2 tcall.c:4
>      f1 tcall.c:11
>      f1 tcall.c:11
>      f2 tcall.c:6
>      f2 tcall.c:4
>      f1 tcall.c:10
>      f1 tcall.c:9
>      main tcall.c:17
> <SNIP>
> 
> 
> Do you see the diff?  The 87.65% and 12.35% doesn't appear on the --tui
> output.

I see the problem. It's some issue in hist_browser__show_callchain.
--stdio doesn't show it because it doesn't seem to use that (?)

With this patch it shows percent for the first entry

@@ -791,7 +791,7 @@ static int hist_browser__show_entry(struct hist_browser *browser,
                };
 
                printed += hist_browser__show_callchain(browser,
-                                       &entry->sorted_chain, 1, row, total,
+                                       &entry->sorted_chain, 2, row, total,
                                        hist_browser__show_callchain_entry, &arg,
                                        hist_browser__check_output_full);

But the numbers are still different from what --stdio outputs,
so there are some deeper issues.

I doubt I caused this, probably some latent bug that just got triggered.

Namhyung?

-Andi

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Implement lbr-as-callgraph v10
  2014-11-22  1:25                       ` Andi Kleen
@ 2014-11-24  7:40                         ` Namhyung Kim
  0 siblings, 0 replies; 55+ messages in thread
From: Namhyung Kim @ 2014-11-24  7:40 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Jiri Olsa, linux-kernel, Liang, Kan

Hi Andi and Arnaldo,

On Sat, 22 Nov 2014 02:25:19 +0100, Andi Kleen wrote:
>>      f1 tcall.c:9
>>      main tcall.c:17
>>      main tcall.c:17
>>      main tcall.c:16
>>      main tcall.c:16
>>      f1 tcall.c:12
>>      f1 tcall.c:12
>>      f2 tcall.c:6
>>      f2 tcall.c:4
>>      f1 tcall.c:11
>>      f1 tcall.c:11
>>      f2 tcall.c:6
>>      f2 tcall.c:4
>>      f1 tcall.c:10
>>      f1 tcall.c:9
>>      main tcall.c:17
>> <SNIP>
>> 
>> 
>> Do you see the diff?  The 87.65% and 12.35% doesn't appear on the --tui
>> output.
>
> I see the problem. It's some issue in hist_browser__show_callchain.
> --stdio doesn't show it because it doesn't seem to use that (?)
>
> With this patch it shows percent for the first entry
>
> @@ -791,7 +791,7 @@ static int hist_browser__show_entry(struct hist_browser *browser,
>                 };
>  
>                 printed += hist_browser__show_callchain(browser,
> -                                       &entry->sorted_chain, 1, row, total,
> +                                       &entry->sorted_chain, 2, row, total,
>                                         hist_browser__show_callchain_entry, &arg,
>                                         hist_browser__check_output_full);
>
> But the numbers are still different from what --stdio outputs,
> so there are some deeper issues.
>
> I doubt I caused this, probably some latent bug that just got triggered.
>
> Namhyung?

I think it's an old bug even before my callchain cleanup patch series.
It only prints percent if the level is greater than 1 which I guess it
assumes there's only a single callchain path for the first level.

I'll post a patch for that soon.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [tip:perf/core] perf callchain: Enable printing the srcline in the history
  2014-11-13  2:05 ` [PATCH 06/10] perf, tools: Enable printing the srcline in the history Andi Kleen
  2014-11-13 19:20   ` Arnaldo Carvalho de Melo
@ 2014-12-08  6:48   ` tip-bot for Andi Kleen
  1 sibling, 0 replies; 55+ messages in thread
From: tip-bot for Andi Kleen @ 2014-12-08  6:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, namhyung, jolsa, acme, tglx, ak, linux-kernel, mingo

Commit-ID:  23f0981bbd89fcc1496d0490ec39ca7c91599e32
Gitweb:     http://git.kernel.org/tip/23f0981bbd89fcc1496d0490ec39ca7c91599e32
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Wed, 12 Nov 2014 18:05:24 -0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 24 Nov 2014 18:03:46 -0300

perf callchain: Enable printing the srcline in the history

For lbr-as-callgraph we need to see the line number in the history,
because many LBR entries can be in a single function, and just
showing the same function name many times is not useful.

When the history code is configured to sort by address, also try to
resolve the address to a file:srcline and display this in the browser.
If that doesn't work still display the address.

This can be also useful without LBRs for understanding which call in a large
function (or in which inlined function) called something else.

Contains fixes from Namhyung Kim

v2: Refactor code into common function
v3: Fix GTK build
v4: Rebase

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-7-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/callchain.c | 11 ++++++++++-
 tools/perf/util/callchain.h |  1 +
 tools/perf/util/srcline.c   |  6 ++++--
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 38da69c..b6624ae 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -815,7 +815,16 @@ char *callchain_list__sym_name(struct callchain_list *cl,
 	int printed;
 
 	if (cl->ms.sym) {
-		printed = scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
+		if (callchain_param.key == CCKEY_ADDRESS &&
+		    cl->ms.map && !cl->srcline)
+			cl->srcline = get_srcline(cl->ms.map->dso,
+						  map__rip_2objdump(cl->ms.map,
+								    cl->ip));
+		if (cl->srcline)
+			printed = scnprintf(bf, bfsize, "%s %s",
+					cl->ms.sym->name, cl->srcline);
+		else
+			printed = scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
 	} else
 		printed = scnprintf(bf, bfsize, "%#" PRIx64, cl->ip);
 
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 3e1ed15..3f15847 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -70,6 +70,7 @@ extern struct callchain_param callchain_param;
 struct callchain_list {
 	u64			ip;
 	struct map_symbol	ms;
+	char		       *srcline;
 	struct list_head	list;
 };
 
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index 77c1806..ac877f9 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -258,7 +258,7 @@ char *get_srcline(struct dso *dso, unsigned long addr)
 	const char *dso_name;
 
 	if (!dso->has_srcline)
-		return SRCLINE_UNKNOWN;
+		goto out;
 
 	if (dso->symsrc_filename)
 		dso_name = dso->symsrc_filename;
@@ -289,7 +289,9 @@ out:
 		dso->has_srcline = 0;
 		dso__free_a2l(dso);
 	}
-	return SRCLINE_UNKNOWN;
+	if (asprintf(&srcline, "%s[%lx]", dso->short_name, addr) < 0)
+		return SRCLINE_UNKNOWN;
+	return srcline;
 }
 
 void free_srcline(char *srcline)

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [tip:perf/core] perf callchain: Make get_srcline fall back to sym+offset
  2014-11-13  2:05 ` [PATCH 09/10] tools, perf: Make get_srcline fall back to sym+offset Andi Kleen
@ 2014-12-08  6:49   ` tip-bot for Andi Kleen
  0 siblings, 0 replies; 55+ messages in thread
From: tip-bot for Andi Kleen @ 2014-12-08  6:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, tglx, namhyung, linux-kernel, mingo, ak, jolsa, acme

Commit-ID:  85c116a6cb91a5c09b7a6c95ffc6a6cbd32cd237
Gitweb:     http://git.kernel.org/tip/85c116a6cb91a5c09b7a6c95ffc6a6cbd32cd237
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Wed, 12 Nov 2014 18:05:27 -0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 24 Nov 2014 18:03:47 -0300

perf callchain: Make get_srcline fall back to sym+offset

When the source line is not found fall back to sym + offset.  This is
generally much more useful than a raw address.

For this we need to pass in the symbol from the caller.

For some callers it's awkward to compute, so we stay at the old
behaviour.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-10-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/annotate.c  |  2 +-
 tools/perf/util/callchain.c |  3 ++-
 tools/perf/util/map.c       |  2 +-
 tools/perf/util/sort.c      |  6 ++++--
 tools/perf/util/srcline.c   | 11 +++++++++--
 tools/perf/util/util.h      |  4 +++-
 6 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index e5670f1..79999ce 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -1192,7 +1192,7 @@ static int symbol__get_source_line(struct symbol *sym, struct map *map,
 			goto next;
 
 		offset = start + i;
-		src_line->path = get_srcline(map->dso, offset);
+		src_line->path = get_srcline(map->dso, offset, NULL, false);
 		insert_source_line(&tmp_root, src_line);
 
 	next:
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index b6624ae..517ed84 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -819,7 +819,8 @@ char *callchain_list__sym_name(struct callchain_list *cl,
 		    cl->ms.map && !cl->srcline)
 			cl->srcline = get_srcline(cl->ms.map->dso,
 						  map__rip_2objdump(cl->ms.map,
-								    cl->ip));
+								    cl->ip),
+						  cl->ms.sym, false);
 		if (cl->srcline)
 			printed = scnprintf(bf, bfsize, "%s %s",
 					cl->ms.sym->name, cl->srcline);
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 040a785..62ca9f2 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -360,7 +360,7 @@ int map__fprintf_srcline(struct map *map, u64 addr, const char *prefix,
 
 	if (map && map->dso) {
 		srcline = get_srcline(map->dso,
-				      map__rip_2objdump(map, addr));
+				      map__rip_2objdump(map, addr), NULL, true);
 		if (srcline != SRCLINE_UNKNOWN)
 			ret = fprintf(fp, "%s%s", prefix, srcline);
 		free_srcline(srcline);
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 82a5596..9139dda 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -291,7 +291,8 @@ sort__srcline_cmp(struct hist_entry *left, struct hist_entry *right)
 		else {
 			struct map *map = left->ms.map;
 			left->srcline = get_srcline(map->dso,
-					    map__rip_2objdump(map, left->ip));
+					   map__rip_2objdump(map, left->ip),
+						    left->ms.sym, true);
 		}
 	}
 	if (!right->srcline) {
@@ -300,7 +301,8 @@ sort__srcline_cmp(struct hist_entry *left, struct hist_entry *right)
 		else {
 			struct map *map = right->ms.map;
 			right->srcline = get_srcline(map->dso,
-					    map__rip_2objdump(map, right->ip));
+					     map__rip_2objdump(map, right->ip),
+						     right->ms.sym, true);
 		}
 	}
 	return strcmp(right->srcline, left->srcline);
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index ac877f9..e73b6a5 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -8,6 +8,8 @@
 #include "util/util.h"
 #include "util/debug.h"
 
+#include "symbol.h"
+
 #ifdef HAVE_LIBBFD_SUPPORT
 
 /*
@@ -250,7 +252,8 @@ void dso__free_a2l(struct dso *dso __maybe_unused)
  */
 #define A2L_FAIL_LIMIT 123
 
-char *get_srcline(struct dso *dso, unsigned long addr)
+char *get_srcline(struct dso *dso, unsigned long addr, struct symbol *sym,
+		  bool show_sym)
 {
 	char *file = NULL;
 	unsigned line = 0;
@@ -289,7 +292,11 @@ out:
 		dso->has_srcline = 0;
 		dso__free_a2l(dso);
 	}
-	if (asprintf(&srcline, "%s[%lx]", dso->short_name, addr) < 0)
+	if (sym) {
+		if (asprintf(&srcline, "%s+%ld", show_sym ? sym->name : "",
+					addr - sym->start) < 0)
+			return SRCLINE_UNKNOWN;
+	} else if (asprintf(&srcline, "%s[%lx]", dso->short_name, addr) < 0)
 		return SRCLINE_UNKNOWN;
 	return srcline;
 }
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 76d23d8..419bee0 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -337,8 +337,10 @@ static inline int path__join3(char *bf, size_t size,
 }
 
 struct dso;
+struct symbol;
 
-char *get_srcline(struct dso *dso, unsigned long addr);
+char *get_srcline(struct dso *dso, unsigned long addr, struct symbol *sym,
+		  bool show_sym);
 void free_srcline(char *srcline);
 
 int filename__read_int(const char *filename, int *value);

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [tip:perf/core] perf callchain: Support handling complete branch stacks as histograms
  2014-11-13  2:05 ` [PATCH 02/10] perf, tools: Support handling complete branch stacks as histograms Andi Kleen
  2014-11-13 19:14   ` Arnaldo Carvalho de Melo
@ 2014-12-08  6:53   ` tip-bot for Andi Kleen
  1 sibling, 0 replies; 55+ messages in thread
From: tip-bot for Andi Kleen @ 2014-12-08  6:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, tglx, mingo, ak, jolsa, linux-kernel, hpa, namhyung

Commit-ID:  8b7bad58efb7e3aaff60f7c1fa4361fb8c23181d
Gitweb:     http://git.kernel.org/tip/8b7bad58efb7e3aaff60f7c1fa4361fb8c23181d
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Wed, 12 Nov 2014 18:05:20 -0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 1 Dec 2014 20:00:31 -0300

perf callchain: Support handling complete branch stacks as histograms

Currently branch stacks can be only shown as edge histograms for
individual branches. I never found this display particularly useful.

This implements an alternative mode that creates histograms over
complete branch traces, instead of individual branches, similar to how
normal callgraphs are handled. This is done by putting it in front of
the normal callgraph and then using the normal callgraph histogram
infrastructure to unify them.

This way in complex functions we can understand the control flow that
lead to a particular sample, and may even see some control flow in the
caller for short functions.

Example (simplified, of course for such simple code this is usually not
needed), please run this after the whole patchkit is in, as at this
point in the patch order there is no --branch-history, that will be
added in a patch after this one:

tcall.c:

volatile a = 10000, b = 100000, c;

__attribute__((noinline)) f2()
{
	c = a / b;
}

__attribute__((noinline)) f1()
{
	f2();
	f2();
}
main()
{
	int i;
	for (i = 0; i < 1000000; i++)
		f1();
}

% perf record -b -g ./tsrc/tcall
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
% perf report --no-children --branch-history
...
    54.91%  tcall.c:6  [.] f2                      tcall
            |
            |--65.53%-- f2 tcall.c:5
            |          |
            |          |--70.83%-- f1 tcall.c:11
            |          |          f1 tcall.c:10
            |          |          main tcall.c:18
            |          |          main tcall.c:18
            |          |          main tcall.c:17
            |          |          main tcall.c:17
            |          |          f1 tcall.c:13
            |          |          f1 tcall.c:13
            |          |          f2 tcall.c:7
            |          |          f2 tcall.c:5
            |          |          f1 tcall.c:12
            |          |          f1 tcall.c:12
            |          |          f2 tcall.c:7
            |          |          f2 tcall.c:5
            |          |          f1 tcall.c:11
            |          |
            |           --29.17%-- f1 tcall.c:12
            |                     f1 tcall.c:12
            |                     f2 tcall.c:7
            |                     f2 tcall.c:5
            |                     f1 tcall.c:11
            |                     f1 tcall.c:10
            |                     main tcall.c:18
            |                     main tcall.c:18
            |                     main tcall.c:17
            |                     main tcall.c:17
            |                     f1 tcall.c:13
            |                     f1 tcall.c:13
            |                     f2 tcall.c:7
            |                     f2 tcall.c:5
            |                     f1 tcall.c:12

The default output is unchanged.

This is only implemented in perf report, no change to record or anywhere
else.

This adds the basic code to report:

- add a new "branch" option to the -g option parser to enable this mode
- when the flag is set include the LBR into the callstack in machine.c.

The rest of the history code is unchanged and doesn't know the
difference between LBR entry and normal call entry.

- detect overlaps with the callchain
- remove small loop duplicates in the LBR

Current limitations:

- The LBR flags (mispredict etc.) are not shown in the history
and LBR entries have no special marker.
- It would be nice if annotate marked the LBR entries somehow
(e.g. with arrows)

v2: Various fixes.
v3: Merge further patches into this one. Fix white space.
v4: Improve manpage. Address review feedback.
v5: Rename functions. Better error message without -g. Fix crash without
    -b.
v6: Rebase
v7: Rebase. Use NO_ENTRY in memset.
v8: Port to latest tip. Move add_callchain_ip to separate
    patch. Skip initial entries in callchain. Minor cleanups.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-3-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-report.txt |   7 +-
 tools/perf/builtin-report.c              |   4 +-
 tools/perf/util/callchain.c              |   4 +
 tools/perf/util/callchain.h              |   1 +
 tools/perf/util/machine.c                | 126 ++++++++++++++++++++++++++++---
 tools/perf/util/symbol.h                 |   3 +-
 6 files changed, 132 insertions(+), 13 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 0927bf4..22706be 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -159,7 +159,7 @@ OPTIONS
 --dump-raw-trace::
         Dump raw trace in ASCII.
 
--g [type,min[,limit],order[,key]]::
+-g [type,min[,limit],order[,key][,branch]]::
 --call-graph::
         Display call chains using type, min percent threshold, optional print
 	limit and order.
@@ -177,6 +177,11 @@ OPTIONS
 	- function: compare on functions
 	- address: compare on individual code addresses
 
+	branch can be:
+	- branch: include last branch information in callgraph
+	when available. Usually more convenient to use --branch-history
+	for this.
+
 	Default: fractal,0.5,callee,function.
 
 --children::
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 140a6cd..410d44f 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -637,8 +637,8 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 		   "regex filter to identify parent, see: '--sort parent'"),
 	OPT_BOOLEAN('x', "exclude-other", &symbol_conf.exclude_other,
 		    "Only display entries with parent-match"),
-	OPT_CALLBACK_DEFAULT('g', "call-graph", &report, "output_type,min_percent[,print_limit],call_order",
-		     "Display callchains using output_type (graph, flat, fractal, or none) , min percent threshold, optional print limit, callchain order, key (function or address). "
+	OPT_CALLBACK_DEFAULT('g', "call-graph", &report, "output_type,min_percent[,print_limit],call_order[,branch]",
+		     "Display callchains using output_type (graph, flat, fractal, or none) , min percent threshold, optional print limit, callchain order, key (function or address), add branches. "
 		     "Default: fractal,0.5,callee,function", &report_parse_callchain_opt, callchain_default_opt),
 	OPT_BOOLEAN(0, "children", &symbol_conf.cumulate_callchain,
 		    "Accumulate callchains of children and show total overhead as well"),
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 517ed84..cf524a3 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -149,6 +149,10 @@ static int parse_callchain_sort_key(const char *value)
 		callchain_param.key = CCKEY_ADDRESS;
 		return 0;
 	}
+	if (!strncmp(value, "branch", strlen(value))) {
+		callchain_param.branch_callstack = 1;
+		return 0;
+	}
 	return -1;
 }
 
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 3f15847..dbc08cf 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -63,6 +63,7 @@ struct callchain_param {
 	sort_chain_func_t	sort;
 	enum chain_order	order;
 	enum chain_key		key;
+	bool			branch_callstack;
 };
 
 extern struct callchain_param callchain_param;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index b75b487..15dd0a9 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -12,6 +12,7 @@
 #include <stdbool.h>
 #include <symbol/kallsyms.h>
 #include "unwind.h"
+#include "linux/hash.h"
 
 static void dsos__init(struct dsos *dsos)
 {
@@ -1391,7 +1392,11 @@ static int add_callchain_ip(struct thread *thread,
 
 	al.filtered = 0;
 	al.sym = NULL;
-	thread__find_addr_location(thread, cpumode, MAP__FUNCTION,
+	if (cpumode == -1)
+		thread__find_cpumode_addr_location(thread, MAP__FUNCTION,
+						   ip, &al);
+	else
+		thread__find_addr_location(thread, cpumode, MAP__FUNCTION,
 				   ip, &al);
 	if (al.sym != NULL) {
 		if (sort__has_parent && !*parent &&
@@ -1427,8 +1432,50 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
 	return bi;
 }
 
+#define CHASHSZ 127
+#define CHASHBITS 7
+#define NO_ENTRY 0xff
+
+#define PERF_MAX_BRANCH_DEPTH 127
+
+/* Remove loops. */
+static int remove_loops(struct branch_entry *l, int nr)
+{
+	int i, j, off;
+	unsigned char chash[CHASHSZ];
+
+	memset(chash, NO_ENTRY, sizeof(chash));
+
+	BUG_ON(PERF_MAX_BRANCH_DEPTH > 255);
+
+	for (i = 0; i < nr; i++) {
+		int h = hash_64(l[i].from, CHASHBITS) % CHASHSZ;
+
+		/* no collision handling for now */
+		if (chash[h] == NO_ENTRY) {
+			chash[h] = i;
+		} else if (l[chash[h]].from == l[i].from) {
+			bool is_loop = true;
+			/* check if it is a real loop */
+			off = 0;
+			for (j = chash[h]; j < i && i + off < nr; j++, off++)
+				if (l[j].from != l[i + off].from) {
+					is_loop = false;
+					break;
+				}
+			if (is_loop) {
+				memmove(l + i, l + i + off,
+					(nr - (i + off)) * sizeof(*l));
+				nr -= off;
+			}
+		}
+	}
+	return nr;
+}
+
 static int thread__resolve_callchain_sample(struct thread *thread,
 					     struct ip_callchain *chain,
+					     struct branch_stack *branch,
 					     struct symbol **parent,
 					     struct addr_location *root_al,
 					     int max_stack)
@@ -1438,22 +1485,82 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 	int i;
 	int j;
 	int err;
-	int skip_idx __maybe_unused;
+	int skip_idx = -1;
+	int first_call = 0;
+
+	/*
+	 * Based on DWARF debug information, some architectures skip
+	 * a callchain entry saved by the kernel.
+	 */
+	if (chain->nr < PERF_MAX_STACK_DEPTH)
+		skip_idx = arch_skip_callchain_idx(thread, chain);
 
 	callchain_cursor_reset(&callchain_cursor);
 
+	/*
+	 * Add branches to call stack for easier browsing. This gives
+	 * more context for a sample than just the callers.
+	 *
+	 * This uses individual histograms of paths compared to the
+	 * aggregated histograms the normal LBR mode uses.
+	 *
+	 * Limitations for now:
+	 * - No extra filters
+	 * - No annotations (should annotate somehow)
+	 */
+
+	if (branch && callchain_param.branch_callstack) {
+		int nr = min(max_stack, (int)branch->nr);
+		struct branch_entry be[nr];
+
+		if (branch->nr > PERF_MAX_BRANCH_DEPTH) {
+			pr_warning("corrupted branch chain. skipping...\n");
+			goto check_calls;
+		}
+
+		for (i = 0; i < nr; i++) {
+			if (callchain_param.order == ORDER_CALLEE) {
+				be[i] = branch->entries[i];
+				/*
+				 * Check for overlap into the callchain.
+				 * The return address is one off compared to
+				 * the branch entry. To adjust for this
+				 * assume the calling instruction is not longer
+				 * than 8 bytes.
+				 */
+				if (i == skip_idx ||
+				    chain->ips[first_call] >= PERF_CONTEXT_MAX)
+					first_call++;
+				else if (be[i].from < chain->ips[first_call] &&
+				    be[i].from >= chain->ips[first_call] - 8)
+					first_call++;
+			} else
+				be[i] = branch->entries[branch->nr - i - 1];
+		}
+
+		nr = remove_loops(be, nr);
+
+		for (i = 0; i < nr; i++) {
+			err = add_callchain_ip(thread, parent, root_al,
+					       -1, be[i].to);
+			if (!err)
+				err = add_callchain_ip(thread, parent, root_al,
+						       -1, be[i].from);
+			if (err == -EINVAL)
+				break;
+			if (err)
+				return err;
+		}
+		chain_nr -= nr;
+	}
+
+check_calls:
 	if (chain->nr > PERF_MAX_STACK_DEPTH) {
 		pr_warning("corrupted callchain. skipping...\n");
 		return 0;
 	}
 
-	/*
-	 * Based on DWARF debug information, some architectures skip
-	 * a callchain entry saved by the kernel.
-	 */
-	skip_idx = arch_skip_callchain_idx(thread, chain);
-
-	for (i = 0; i < chain_nr; i++) {
+	for (i = first_call; i < chain_nr; i++) {
 		u64 ip;
 
 		if (callchain_param.order == ORDER_CALLEE)
@@ -1517,6 +1624,7 @@ int thread__resolve_callchain(struct thread *thread,
 			      int max_stack)
 {
 	int ret = thread__resolve_callchain_sample(thread, sample->callchain,
+						   sample->branch_stack,
 						   parent, root_al, max_stack);
 	if (ret)
 		return ret;
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index e0b297c..9d602e9 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -102,7 +102,8 @@ struct symbol_conf {
 			demangle,
 			demangle_kernel,
 			filter_relative,
-			show_hist_headers;
+			show_hist_headers,
+			branch_callstack;
 	const char	*vmlinux_name,
 			*kallsyms_name,
 			*source_prefix,

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [tip:perf/core] perf report: Add --branch-history option
  2014-11-13  2:05 ` [PATCH 04/10] perf, tools: Add --branch-history option to report Andi Kleen
@ 2014-12-08  6:53   ` tip-bot for Andi Kleen
  0 siblings, 0 replies; 55+ messages in thread
From: tip-bot for Andi Kleen @ 2014-12-08  6:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, tglx, namhyung, jolsa, ak, linux-kernel, hpa, mingo

Commit-ID:  fa94c36c29ed8bb4749b5fd7ea51a593f673dcef
Gitweb:     http://git.kernel.org/tip/fa94c36c29ed8bb4749b5fd7ea51a593f673dcef
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Wed, 12 Nov 2014 18:05:22 -0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 1 Dec 2014 20:00:31 -0300

perf report: Add --branch-history option

Add a --branch-history option to perf report that changes all the
settings necessary for using the branches in callstacks.

This is just a short cut to make this nicer to use, it does not enable
any functionality by itself.

v2: Change sort order. Rename option to --branch-history to
    be less confusing.
v3: Updates
v4: Fix conflict with newer perf base
v5: Port to latest tip
v6: Add more comments. Remove CCKEY_ADDRESS setting. Remove
    unnecessary branch_mode setting. Use a boolean.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-5-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-report.txt |  5 +++++
 tools/perf/builtin-report.c              | 26 ++++++++++++++++++++++----
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 22706be..dd7cccd 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -271,6 +271,11 @@ OPTIONS
 	branch stacks and it will automatically switch to the branch view mode,
 	unless --no-branch-stack is used.
 
+--branch-history::
+	Add the addresses of sampled taken branches to the callstack.
+	This allows to examine the path the program took to each sample.
+	The data collection must have used -b (or -j) and -g.
+
 --objdump=<path>::
         Path to objdump binary.
 
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 410d44f..fb272ff 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -226,8 +226,9 @@ static int report__setup_sample_type(struct report *rep)
 			return -EINVAL;
 		}
 		if (symbol_conf.use_callchain) {
-			ui__error("Selected -g but no callchain data. Did "
-				    "you call 'perf record' without -g?\n");
+			ui__error("Selected -g or --branch-history but no "
+				  "callchain data. Did\n"
+				  "you call 'perf record' without -g?\n");
 			return -1;
 		}
 	} else if (!rep->dont_use_callchains &&
@@ -575,6 +576,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 	struct stat st;
 	bool has_br_stack = false;
 	int branch_mode = -1;
+	bool branch_call_mode = false;
 	char callchain_default_opt[] = "fractal,0.5,callee";
 	const char * const report_usage[] = {
 		"perf report [<options>]",
@@ -684,7 +686,10 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_BOOLEAN(0, "group", &symbol_conf.event_group,
 		    "Show event group information together"),
 	OPT_CALLBACK_NOOPT('b', "branch-stack", &branch_mode, "",
-		    "use branch records for histogram filling", parse_branch_mode),
+		    "use branch records for per branch histogram filling",
+		    parse_branch_mode),
+	OPT_BOOLEAN(0, "branch-history", &branch_call_mode,
+		    "add last branch records to call history"),
 	OPT_STRING(0, "objdump", &objdump_path, "path",
 		   "objdump binary to use for disassembly and annotations"),
 	OPT_BOOLEAN(0, "demangle", &symbol_conf.demangle,
@@ -745,10 +750,23 @@ repeat:
 	has_br_stack = perf_header__has_feat(&session->header,
 					     HEADER_BRANCH_STACK);
 
-	if ((branch_mode == -1 && has_br_stack) || branch_mode == 1) {
+	/*
+	 * Branch mode is a tristate:
+	 * -1 means default, so decide based on the file having branch data.
+	 * 0/1 means the user chose a mode.
+	 */
+	if (((branch_mode == -1 && has_br_stack) || branch_mode == 1) &&
+	    branch_call_mode == -1) {
 		sort__mode = SORT_MODE__BRANCH;
 		symbol_conf.cumulate_callchain = false;
 	}
+	if (branch_call_mode) {
+		callchain_param.branch_callstack = 1;
+		symbol_conf.use_callchain = true;
+		callchain_register_param(&callchain_param);
+		if (sort_order == NULL)
+			sort_order = "srcline,symbol,dso";
+	}
 
 	if (report.mem_mode) {
 		if (sort__mode == SORT_MODE__BRANCH) {

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 03/10] perf, tools: Use al.addr to set up call chain
  2014-11-13 19:16   ` Arnaldo Carvalho de Melo
@ 2014-12-11 21:46     ` Jiri Olsa
  2014-12-11 22:27       ` Andi Kleen
  0 siblings, 1 reply; 55+ messages in thread
From: Jiri Olsa @ 2014-12-11 21:46 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Andi Kleen, linux-kernel, namhyung, Andi Kleen

On Thu, Nov 13, 2014 at 04:16:33PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Nov 12, 2014 at 06:05:21PM -0800, Andi Kleen escreveu:
> > From: Andi Kleen <ak@linux.intel.com>
> > 
> > Use the relative address, this makes get_srcline work correctly
> > in the end.
> 
> Applied.
>  
> > Signed-off-by: Andi Kleen <ak@linux.intel.com>
> > ---
> >  tools/perf/util/machine.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index 2e16d69..066e963 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -1411,7 +1411,7 @@ static int add_callchain_ip(struct thread *thread,
> >  		}
> >  	}
> >  
> > -	return callchain_cursor_append(&callchain_cursor, ip, al.map, al.sym);
> > +	return callchain_cursor_append(&callchain_cursor, al.addr, al.map, al.sym);
> >  }
> >  
> >  struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
> > -- 
> > 1.9.3

hi,
this patch also changed the output of callchain entries
with 'map' but with no symbol, like in following diff:

---
                  ---get_next_seq
                     |
-                    |--70.16%-- 0x40e1cf
+                    |--70.16%-- 0xe1cf
                     |          0x841f0f
                     |
-                    |--25.83%-- 0x40e153
+                    |--25.83%-- 0xe153
                     |          0x841f0f
                     |
-                     --4.00%-- 0x40e27f
+                     --4.00%-- 0xe27f
--

I'm guessing this change was unintentional..? in case we have a map
but no symbol seeing full address is more clear than relative at
least for binary, not sure about DSOs..

thoughts?

thanks,
jirka


---
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 64b377e591e4..a4fb25fb26f5 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -816,22 +816,30 @@ out:
 char *callchain_list__sym_name(struct callchain_list *cl,
 			       char *bf, size_t bfsize, bool show_dso)
 {
+	struct map *map = cl->ms.map;
 	int printed;
 
 	if (cl->ms.sym) {
 		if (callchain_param.key == CCKEY_ADDRESS &&
-		    cl->ms.map && !cl->srcline)
-			cl->srcline = get_srcline(cl->ms.map->dso,
-						  map__rip_2objdump(cl->ms.map,
-								    cl->ip),
+		    map && !cl->srcline) {
+			cl->srcline = get_srcline(map->dso,
+						  map__rip_2objdump(map, cl->ip),
 						  cl->ms.sym, false);
+		}
 		if (cl->srcline)
 			printed = scnprintf(bf, bfsize, "%s %s",
 					cl->ms.sym->name, cl->srcline);
 		else
 			printed = scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
-	} else
-		printed = scnprintf(bf, bfsize, "%#" PRIx64, cl->ip);
+	} else {
+		/*
+		 * The cl->ip value is unbased ip (applied map->map_ip).
+		 * Display the unmap ip in case we have no symbol.
+		 */
+		u64 addr = map ? map->unmap_ip(map, cl->ip) : cl->ip;
+
+		printed = scnprintf(bf, bfsize, "%#" PRIx64, addr);
+	}
 
 	if (show_dso)
 		scnprintf(bf + printed, bfsize - printed, " %s",


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 03/10] perf, tools: Use al.addr to set up call chain
  2014-12-11 21:46     ` Jiri Olsa
@ 2014-12-11 22:27       ` Andi Kleen
  0 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2014-12-11 22:27 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Arnaldo Carvalho de Melo, Andi Kleen, linux-kernel, namhyung

> this patch also changed the output of callchain entries
> with 'map' but with no symbol, like in following diff:

Was unintentional. Patch looks good. thanks.

-Andi
> 
> ---
>                   ---get_next_seq
>                      |
> -                    |--70.16%-- 0x40e1cf
> +                    |--70.16%-- 0xe1cf
>                      |          0x841f0f
>                      |
> -                    |--25.83%-- 0x40e153
> +                    |--25.83%-- 0xe153
>                      |          0x841f0f
>                      |
> -                     --4.00%-- 0x40e27f
> +                     --4.00%-- 0xe27f
> --
> 
> I'm guessing this change was unintentional..? in case we have a map
> but no symbol seeing full address is more clear than relative at
> least for binary, not sure about DSOs..
> 
> thoughts?
> 
> thanks,
> jirka
> 
> 
> ---
> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> index 64b377e591e4..a4fb25fb26f5 100644
> --- a/tools/perf/util/callchain.c
> +++ b/tools/perf/util/callchain.c
> @@ -816,22 +816,30 @@ out:
>  char *callchain_list__sym_name(struct callchain_list *cl,
>  			       char *bf, size_t bfsize, bool show_dso)
>  {
> +	struct map *map = cl->ms.map;
>  	int printed;
>  
>  	if (cl->ms.sym) {
>  		if (callchain_param.key == CCKEY_ADDRESS &&
> -		    cl->ms.map && !cl->srcline)
> -			cl->srcline = get_srcline(cl->ms.map->dso,
> -						  map__rip_2objdump(cl->ms.map,
> -								    cl->ip),
> +		    map && !cl->srcline) {
> +			cl->srcline = get_srcline(map->dso,
> +						  map__rip_2objdump(map, cl->ip),
>  						  cl->ms.sym, false);
> +		}
>  		if (cl->srcline)
>  			printed = scnprintf(bf, bfsize, "%s %s",
>  					cl->ms.sym->name, cl->srcline);
>  		else
>  			printed = scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
> -	} else
> -		printed = scnprintf(bf, bfsize, "%#" PRIx64, cl->ip);
> +	} else {
> +		/*
> +		 * The cl->ip value is unbased ip (applied map->map_ip).
> +		 * Display the unmap ip in case we have no symbol.
> +		 */
> +		u64 addr = map ? map->unmap_ip(map, cl->ip) : cl->ip;
> +
> +		printed = scnprintf(bf, bfsize, "%#" PRIx64, addr);
> +	}
>  
>  	if (show_dso)
>  		scnprintf(bf + printed, bfsize - printed, " %s",
> 

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Implement lbr-as-callgraph v10
@ 2014-09-26 23:37 Andi Kleen
  0 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2014-09-26 23:37 UTC (permalink / raw)
  To: jolsa; +Cc: linux-kernel, namhyung, acme

This has been reviewed before. Any change to merge it now?

[Rebase. Fix one conflict with an earlier tip change]
[Just a repost after a rebase]
[Even more review feedback and some bugs addressed.]
[Only port to changes in perf/core. No other changes.]
[Rebase to latest perf/core]
[Another rebase. No changes]

This patchkit implements lbr-as-callgraphs in per freport,
as an alternative way to present LBR information.

Current perf report does a histogram over the branch edges,
which is useful to look at basic blocks, but doesn't tell
you anything about the larger control flow behaviour.

This patchkit adds a new option --branch-history that
adds the branch paths to the callgraph history instead.

This allows to reason about individual branch paths leading
to specific samples.

Also available at
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/lbr-callgraph7

v2:
- rebased on perf/core
- fix various issues
- rename the option to --branch-history
- various fixes to display the information more concise
v3:
- White space changes
- Consolidate some patches
- Update some descriptions
v4:
- Fix various display problems
- Unknown srcline is now printed as symbol+offset
- Refactor some code to address review feedback
- Merge with latest tip
- Fix missing srcline display in stdio hist output.
v5:
- Rename functions
- Fix gtk build problem
- Fix crash without -g
- Improve error messages
- Improve srcline display in various ways
v6:
- Port to latest perf/core
v7:
- Really port to latest perf/core
v8:
- Rebased on 3.16-rc1
v9:
- Rebase on 3.17-rc* tip/perf/core
v10:
- Rebase to tip/perf/core.
- Add missing check to make --branch-history work again
with latest tip.


Example output:

    % perf record -b -g ./tsrc/tcall
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
    % perf report --branch-history
    ...
        54.91%  tcall.c:6  [.] f2                      tcall
                |
                |--65.53%-- f2 tcall.c:5
                |          |
                |          |--70.83%-- f1 tcall.c:11
                |          |          f1 tcall.c:10
                |          |          main tcall.c:18
                |          |          main tcall.c:18
                |          |          main tcall.c:17
                |          |          main tcall.c:17
                |          |          f1 tcall.c:13
                |          |          f1 tcall.c:13
                |          |          f2 tcall.c:7
                |          |          f2 tcall.c:5
                |          |          f1 tcall.c:12
                |          |          f1 tcall.c:12
                |          |          f2 tcall.c:7
                |          |          f2 tcall.c:5
                |          |          f1 tcall.c:11



^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2014-12-11 22:27 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-13  2:05 Implement lbr-as-callgraph v10 Andi Kleen
2014-11-13  2:05 ` [PATCH 01/10] perf, tools: Factor out adding new call chain entries Andi Kleen
2014-11-13 19:14   ` Arnaldo Carvalho de Melo
2014-11-20  7:37   ` [tip:perf/core] perf callchain: " tip-bot for Andi Kleen
2014-11-13  2:05 ` [PATCH 02/10] perf, tools: Support handling complete branch stacks as histograms Andi Kleen
2014-11-13 19:14   ` Arnaldo Carvalho de Melo
2014-11-13 19:52     ` Andi Kleen
2014-11-13 20:08       ` Arnaldo Carvalho de Melo
2014-11-13 20:15         ` Andi Kleen
2014-11-13 20:42           ` Arnaldo Carvalho de Melo
2014-12-08  6:53   ` [tip:perf/core] perf callchain: " tip-bot for Andi Kleen
2014-11-13  2:05 ` [PATCH 03/10] perf, tools: Use al.addr to set up call chain Andi Kleen
2014-11-13 19:16   ` Arnaldo Carvalho de Melo
2014-12-11 21:46     ` Jiri Olsa
2014-12-11 22:27       ` Andi Kleen
2014-11-20  7:38   ` [tip:perf/core] perf callchain: " tip-bot for Andi Kleen
2014-11-13  2:05 ` [PATCH 04/10] perf, tools: Add --branch-history option to report Andi Kleen
2014-12-08  6:53   ` [tip:perf/core] perf report: Add --branch-history option tip-bot for Andi Kleen
2014-11-13  2:05 ` [PATCH 05/10] perf, tools: Use a common function to resolve symbol or name Andi Kleen
2014-11-13 19:17   ` Arnaldo Carvalho de Melo
2014-11-20  7:38   ` [tip:perf/core] perf callchain: " tip-bot for Andi Kleen
2014-11-13  2:05 ` [PATCH 06/10] perf, tools: Enable printing the srcline in the history Andi Kleen
2014-11-13 19:20   ` Arnaldo Carvalho de Melo
2014-12-08  6:48   ` [tip:perf/core] perf callchain: " tip-bot for Andi Kleen
2014-11-13  2:05 ` [PATCH 07/10] perf, tools: Only print base source file for srcline Andi Kleen
2014-11-13 19:22   ` Arnaldo Carvalho de Melo
2014-11-20  7:38   ` [tip:perf/core] perf " tip-bot for Andi Kleen
2014-11-13  2:05 ` [PATCH 08/10] perf, tools: Support source line numbers in annotate Andi Kleen
2014-11-13 20:52   ` Arnaldo Carvalho de Melo
2014-11-20  7:39   ` [tip:perf/core] perf annotate: " tip-bot for Andi Kleen
2014-11-13  2:05 ` [PATCH 09/10] tools, perf: Make get_srcline fall back to sym+offset Andi Kleen
2014-12-08  6:49   ` [tip:perf/core] perf callchain: " tip-bot for Andi Kleen
2014-11-13  2:05 ` [PATCH 10/10] tools, perf: Add asprintf replacement Andi Kleen
2014-11-13 20:53   ` Arnaldo Carvalho de Melo
2014-11-13 21:14     ` Andi Kleen
2014-11-17 21:34 ` Implement lbr-as-callgraph v10 Arnaldo Carvalho de Melo
2014-11-18  1:56   ` Andi Kleen
2014-11-18 10:44   ` Jiri Olsa
2014-11-18 11:00     ` Jiri Olsa
2014-11-18 13:37       ` Arnaldo Carvalho de Melo
2014-11-19 15:31         ` Andi Kleen
2014-11-19  6:21       ` Namhyung Kim
2014-11-19  9:23         ` Jiri Olsa
2014-11-19 10:54           ` Jiri Olsa
2014-11-19 14:10             ` Arnaldo Carvalho de Melo
2014-11-19 16:04               ` Arnaldo Carvalho de Melo
2014-11-19 21:48                 ` Andi Kleen
2014-11-20 19:33                   ` Arnaldo Carvalho de Melo
2014-11-20 20:46                     ` Andi Kleen
2014-11-21 20:30                     ` Arnaldo Carvalho de Melo
2014-11-22  1:25                       ` Andi Kleen
2014-11-24  7:40                         ` Namhyung Kim
2014-11-19 21:50               ` Andi Kleen
2014-11-20 20:36                 ` Arnaldo Carvalho de Melo
  -- strict thread matches above, loose matches on Subject: below --
2014-09-26 23:37 Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).