All of lore.kernel.org
 help / color / mirror / Atom feed
* perf: Implement lbr-as-callgraph v4
@ 2014-02-28  4:22 Andi Kleen
  2014-02-28  4:22 ` [PATCH 1/8] perf, tools: fix BFD detection on opensuse Andi Kleen
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Andi Kleen @ 2014-02-28  4:22 UTC (permalink / raw)
  To: acme; +Cc: mingo, linux-kernel, eranian, namhyung, jolsa

[All review feedback addressed.]

This patchkit implements lbr-as-callgraphs in per freport,
as an alternative way to present LBR information.

Current perf report does a histogram over the branch edges,
which is useful to look at basic blocks, but doesn't tell
you anything about the larger control flow behaviour.

This patchkit adds a new option --branch-history that
adds the branch paths to the callgraph history instead.

This allows to reason about individual branch paths leading
to specific samples.

Also improves srcline display in various ways to make
branches better understandable.

Also available at
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/lbr-callgraph4

Example output:

    % perf record -b -g ./tsrc/tcall
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
    % perf report --branch-history
    ...
        54.91%  tcall.c:6  [.] f2                      tcall
                |
                |--65.53%-- f2 tcall.c:5
                |          |
                |          |--70.83%-- f1 tcall.c:11
                |          |          f1 tcall.c:10
                |          |          main tcall.c:18
                |          |          main tcall.c:18
                |          |          main tcall.c:17
                |          |          main tcall.c:17
                |          |          f1 tcall.c:13
                |          |          f1 tcall.c:13
                |          |          f2 tcall.c:7
                |          |          f2 tcall.c:5
                |          |          f1 tcall.c:12
                |          |          f1 tcall.c:12
                |          |          f2 tcall.c:7
                |          |          f2 tcall.c:5
                |          |          f1 tcall.c:11


v2:
- rebased on perf/core
- fix various issues
- rename the option to --branch-history
- various fixes to display the information more concise
v3:
- White space changes
- Consolidate some patches
- Update some descriptions
v4:
- Fix various display problems
- Unknown srcline is now printed as symbol+offset
- Refactor some code to address review feedback
- Merge with latest tip
- Fix missing srcline display in stdio hist output.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/8] perf, tools: fix BFD detection on opensuse
  2014-02-28  4:22 perf: Implement lbr-as-callgraph v4 Andi Kleen
@ 2014-02-28  4:22 ` Andi Kleen
  2014-02-28  4:22 ` [PATCH 2/8] perf, tools: Support handling complete branch stacks as histograms v4 Andi Kleen
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Andi Kleen @ 2014-02-28  4:22 UTC (permalink / raw)
  To: acme; +Cc: mingo, linux-kernel, eranian, namhyung, jolsa, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

opensuse libbfd requires -lz -liberty to build. Add those
to the BFD feature detection.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/config/Makefile                | 2 +-
 tools/perf/config/feature-checks/Makefile | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index f7c81d3..c234182 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -503,7 +503,7 @@ else
 endif
 
 ifeq ($(feature-libbfd), 1)
-  EXTLIBS += -lbfd
+  EXTLIBS += -lbfd -lz -liberty
 endif
 
 ifdef NO_DEMANGLE
diff --git a/tools/perf/config/feature-checks/Makefile b/tools/perf/config/feature-checks/Makefile
index 2b492f5..2da103c 100644
--- a/tools/perf/config/feature-checks/Makefile
+++ b/tools/perf/config/feature-checks/Makefile
@@ -122,7 +122,7 @@ test-libpython-version.bin:
 	$(BUILD) $(FLAGS_PYTHON_EMBED)
 
 test-libbfd.bin:
-	$(BUILD) -DPACKAGE='"perf"' -lbfd -ldl
+	$(BUILD) -DPACKAGE='"perf"' -lbfd -lz -liberty -ldl
 
 test-liberty.bin:
 	$(CC) -o $(OUTPUT)$@ test-libbfd.c -DPACKAGE='"perf"' -lbfd -ldl -liberty
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/8] perf, tools: Support handling complete branch stacks as histograms v4
  2014-02-28  4:22 perf: Implement lbr-as-callgraph v4 Andi Kleen
  2014-02-28  4:22 ` [PATCH 1/8] perf, tools: fix BFD detection on opensuse Andi Kleen
@ 2014-02-28  4:22 ` Andi Kleen
  2014-03-05 10:29   ` Jiri Olsa
  2014-03-07 13:19   ` Jiri Olsa
  2014-02-28  4:22 ` [PATCH 3/8] perf, tools: Add --branch-history option to report v2 Andi Kleen
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 22+ messages in thread
From: Andi Kleen @ 2014-02-28  4:22 UTC (permalink / raw)
  To: acme; +Cc: mingo, linux-kernel, eranian, namhyung, jolsa, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Currently branch stacks can be only shown as edge histograms for
individual branches. I never found this display particularly useful.

This implements an alternative mode that creates histograms over complete
branch traces, instead of individual branches, similar to how normal
callgraphs are handled. This is done by putting it in
front of the normal callgraph and then using the normal callgraph
histogram infrastructure to unify them.

This way in complex functions we can understand the control flow
that lead to a particular sample, and may even see some control
flow in the caller for short functions.

Example (simplified, of course for such simple code this
is usually not needed):

tcall.c:

volatile a = 10000, b = 100000, c;

__attribute__((noinline)) f2()
{
	c = a / b;
}

__attribute__((noinline)) f1()
{
	f2();
	f2();
}
main()
{
	int i;
	for (i = 0; i < 1000000; i++)
		f1();
}

% perf record -b -g ./tsrc/tcall
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
% perf report --branch-history
...
    54.91%  tcall.c:6  [.] f2                      tcall
            |
            |--65.53%-- f2 tcall.c:5
            |          |
            |          |--70.83%-- f1 tcall.c:11
            |          |          f1 tcall.c:10
            |          |          main tcall.c:18
            |          |          main tcall.c:18
            |          |          main tcall.c:17
            |          |          main tcall.c:17
            |          |          f1 tcall.c:13
            |          |          f1 tcall.c:13
            |          |          f2 tcall.c:7
            |          |          f2 tcall.c:5
            |          |          f1 tcall.c:12
            |          |          f1 tcall.c:12
            |          |          f2 tcall.c:7
            |          |          f2 tcall.c:5
            |          |          f1 tcall.c:11
            |          |
            |           --29.17%-- f1 tcall.c:12
            |                     f1 tcall.c:12
            |                     f2 tcall.c:7
            |                     f2 tcall.c:5
            |                     f1 tcall.c:11
            |                     f1 tcall.c:10
            |                     main tcall.c:18
            |                     main tcall.c:18
            |                     main tcall.c:17
            |                     main tcall.c:17
            |                     f1 tcall.c:13
            |                     f1 tcall.c:13
            |                     f2 tcall.c:7
            |                     f2 tcall.c:5
            |                     f1 tcall.c:12

The default output is unchanged.

This is only implemented in perf report, no change to record
or anywhere else.

This adds the basic code to report:
- add a new "branch" option to the -g option parser to enable this mode
- when the flag is set include the LBR into the callstack in machine.c.
The rest of the history code is unchanged and doesn't know the difference
between LBR entry and normal call entry.
- detect overlaps with the callchain
- remove small loop duplicates in the LBR

Current limitations:
- The LBR flags (mispredict etc.) are not shown in the history
and LBR entries have no special marker.
- It would be nice if annotate marked the LBR entries somehow
(e.g. with arrows)

v2: Various fixes.
v3: Merge further patches into this one. Fix white space.
v4: Improve manpage. Address review feedback.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-report.txt |   7 +-
 tools/perf/builtin-report.c              |  15 ++-
 tools/perf/util/callchain.h              |   1 +
 tools/perf/util/machine.c                | 188 ++++++++++++++++++++++++++-----
 tools/perf/util/symbol.h                 |   3 +-
 5 files changed, 177 insertions(+), 37 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 8eab8a4..349cd2a 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -121,7 +121,7 @@ OPTIONS
 --dump-raw-trace::
         Dump raw trace in ASCII.
 
--g [type,min[,limit],order[,key]]::
+-g [type,min[,limit],order[,key][,branch]]::
 --call-graph::
         Display call chains using type, min percent threshold, optional print
 	limit and order.
@@ -139,6 +139,11 @@ OPTIONS
 	- function: compare on functions
 	- address: compare on individual code addresses
 
+	branch can be:
+	- branch: include last branch information in callgraph
+	when available. Usually more convenient to use --branch-history
+	for this.
+
 	Default: fractal,0.5,callee,function.
 
 --max-stack::
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index d882b6f..f7a9e3d 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -634,7 +634,7 @@ parse_callchain_opt(const struct option *opt, const char *arg, int unset)
 		callchain_param.order = ORDER_CALLER;
 	else if (!strncmp(tok2, "callee", strlen("callee")))
 		callchain_param.order = ORDER_CALLEE;
-	else
+	else if (tok2[0] != 0)
 		return -1;
 
 	/* Get the sort key */
@@ -645,8 +645,15 @@ parse_callchain_opt(const struct option *opt, const char *arg, int unset)
 		callchain_param.key = CCKEY_FUNCTION;
 	else if (!strncmp(tok2, "address", strlen("address")))
 		callchain_param.key = CCKEY_ADDRESS;
-	else
+	else if (tok2[0] != 0)
 		return -1;
+
+	tok2 = strtok(NULL, ",");
+	if (!tok2)
+		goto setup;
+	if (!strncmp(tok2, "branch", 6))
+		callchain_param.branch_callstack = 1;
+
 setup:
 	if (callchain_register_param(&callchain_param) < 0) {
 		pr_err("Can't register callchain params\n");
@@ -762,8 +769,8 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 		   "regex filter to identify parent, see: '--sort parent'"),
 	OPT_BOOLEAN('x', "exclude-other", &symbol_conf.exclude_other,
 		    "Only display entries with parent-match"),
-	OPT_CALLBACK_DEFAULT('g', "call-graph", &report, "output_type,min_percent[,print_limit],call_order",
-		     "Display callchains using output_type (graph, flat, fractal, or none) , min percent threshold, optional print limit, callchain order, key (function or address). "
+	OPT_CALLBACK_DEFAULT('g', "call-graph", &report, "output_type,min_percent[,print_limit],call_order[,branch]",
+		     "Display callchains using output_type (graph, flat, fractal, or none) , min percent threshold, optional print limit, callchain order, key (function or address), add branches. "
 		     "Default: fractal,0.5,callee,function", &parse_callchain_opt, callchain_default_opt),
 	OPT_INTEGER(0, "max-stack", &report.max_stack,
 		    "Set the maximum stack depth when parsing the callchain, "
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 8ad97e9..be29bb8 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -53,6 +53,7 @@ struct callchain_param {
 	sort_chain_func_t	sort;
 	enum chain_order	order;
 	enum chain_key		key;
+	bool			branch_callstack;
 };
 
 struct callchain_list {
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index ac37d78..6eff65e 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -11,6 +11,7 @@
 #include <stdbool.h>
 #include <symbol/kallsyms.h>
 #include "unwind.h"
+#include "linux/hash.h"
 
 int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 {
@@ -1192,17 +1193,16 @@ static const u8 cpumodes[] = {
 };
 #define NCPUMODES (sizeof(cpumodes)/sizeof(u8))
 
-static void ip__resolve_ams(struct machine *machine, struct thread *thread,
-			    struct addr_map_symbol *ams,
-			    u64 ip)
+static void resolve_branch_ams(struct machine *machine, struct thread *thread,
+			       u64 ip,
+			       struct addr_location *al)
 {
-	struct addr_location al;
 	size_t i;
 	u8 m;
 
-	memset(&al, 0, sizeof(al));
+	memset(al, 0, sizeof(*(al)));
 
-	for (i = 0; i < NCPUMODES; i++) {
+	for (i = 0; i < NCPUMODES && !al->sym; i++) {
 		m = cpumodes[i];
 		/*
 		 * We cannot use the header.misc hint to determine whether a
@@ -1212,11 +1212,17 @@ static void ip__resolve_ams(struct machine *machine, struct thread *thread,
 		 * or else, the symbol is unknown
 		 */
 		thread__find_addr_location(thread, machine, m, MAP__FUNCTION,
-				ip, &al);
-		if (al.sym)
-			goto found;
+				ip, al);
 	}
-found:
+}
+
+static void ip__resolve_ams(struct machine *machine, struct thread *thread,
+			    struct addr_map_symbol *ams,
+			    u64 ip)
+{
+	struct addr_location al;
+
+	resolve_branch_ams(machine, thread, ip, &al);
 	ams->addr = ip;
 	ams->al_addr = al.addr;
 	ams->sym = al.sym;
@@ -1272,9 +1278,85 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
 	return bi;
 }
 
+static int add_callchain_ip(struct machine *machine,
+			    struct thread *thread,
+			    struct symbol **parent,
+			    struct addr_location *root_al,
+			    int cpumode,
+			    u64 ip)
+{
+	struct addr_location al;
+
+	al.filtered = false;
+	al.sym = NULL;
+	if (cpumode == -1) {
+		resolve_branch_ams(machine, thread, ip, &al);
+	} else {
+		thread__find_addr_location(thread, machine, cpumode,
+					   MAP__FUNCTION, ip, &al);
+	}
+	if (al.sym != NULL) {
+		if (sort__has_parent && !*parent &&
+		    symbol__match_regex(al.sym, &parent_regex))
+			*parent = al.sym;
+		else if (have_ignore_callees && root_al &&
+		  symbol__match_regex(al.sym, &ignore_callees_regex)) {
+			/* Treat this symbol as the root,
+			   forgetting its callees. */
+			*root_al = al;
+			callchain_cursor_reset(&callchain_cursor);
+		}
+		if (!symbol_conf.use_callchain)
+			return -EINVAL;
+	}
+
+	return callchain_cursor_append(&callchain_cursor, ip, al.map, al.sym);
+}
+
+#define CHASHSZ 127
+#define CHASHBITS 7
+#define NO_ENTRY 0xff
+
+#define PERF_MAX_BRANCH_DEPTH 127
+
+/* Remove loops. */
+static int remove_loops(struct branch_entry *l, int nr)
+{
+	int i, j, off;
+	unsigned char chash[CHASHSZ];
+	memset(chash, -1, sizeof(chash));
+
+	BUG_ON(nr >= 256);
+	for (i = 0; i < nr; i++) {
+		int h = hash_64(l[i].from, CHASHBITS) % CHASHSZ;
+
+		/* no collision handling for now */
+		if (chash[h] == NO_ENTRY) {
+			chash[h] = i;
+		} else if (l[chash[h]].from == l[i].from) {
+			bool is_loop = true;
+			/* check if it is a real loop */
+			off = 0;
+			for (j = chash[h]; j < i && i + off < nr; j++, off++)
+				if (l[j].from != l[i + off].from) {
+					is_loop = false;
+					break;
+				}
+			if (is_loop) {
+				memmove(l + i, l + i + off,
+					(nr - (i + off))
+					* sizeof(struct branch_entry));
+				nr -= off;
+			}
+		}
+	}
+	return nr;
+}
+
 static int machine__resolve_callchain_sample(struct machine *machine,
 					     struct thread *thread,
 					     struct ip_callchain *chain,
+					     struct branch_stack *branch,
 					     struct symbol **parent,
 					     struct addr_location *root_al,
 					     int max_stack)
@@ -1283,17 +1365,73 @@ static int machine__resolve_callchain_sample(struct machine *machine,
 	int chain_nr = min(max_stack, (int)chain->nr);
 	int i;
 	int err;
+	int first_call = 0;
 
 	callchain_cursor_reset(&callchain_cursor);
 
+	/*
+	 * Add branches to call stack for easier browsing. This gives
+	 * more context for a sample than just the callers.
+	 *
+	 * This uses individual histograms of paths compared to the
+	 * aggregated histograms the normal LBR mode uses.
+	 *
+	 * Limitations for now:
+	 * - No extra filters
+	 * - No annotations (should annotate somehow)
+	 */
+
+	if (branch->nr > PERF_MAX_BRANCH_DEPTH) {
+		pr_warning("corrupted branch chain. skipping...\n");
+		return 0;
+	}
+
+	if (callchain_param.branch_callstack) {
+		int nr = min(max_stack, (int)branch->nr);
+		struct branch_entry be[nr];
+
+		for (i = 0; i < nr; i++) {
+			if (callchain_param.order == ORDER_CALLEE) {
+				be[i] = branch->entries[i];
+				/*
+				 * Check for overlap into the callchain.
+				 * The return address is one off compared to
+				 * the branch entry. To adjust for this
+				 * assume the calling instruction is not longer
+				 * than 8 bytes.
+				 */
+				if (be[i].from < chain->ips[first_call] &&
+				    be[i].from >= chain->ips[first_call] - 8)
+					first_call++;
+			} else
+				be[i] = branch->entries[branch->nr - i - 1];
+		}
+
+		nr = remove_loops(be, nr);
+
+		for (i = 0; i < nr; i++) {
+			err = add_callchain_ip(machine, thread, parent,
+					       root_al,
+					       -1, be[i].to);
+			if (!err)
+				err = add_callchain_ip(machine, thread,
+						       parent, root_al,
+						       -1, be[i].from);
+			if (err == -EINVAL)
+				break;
+			if (err)
+				return err;
+		}
+		chain_nr -= nr;
+	}
+
 	if (chain->nr > PERF_MAX_STACK_DEPTH) {
 		pr_warning("corrupted callchain. skipping...\n");
 		return 0;
 	}
 
-	for (i = 0; i < chain_nr; i++) {
+	for (i = first_call; i < chain_nr; i++) {
 		u64 ip;
-		struct addr_location al;
 
 		if (callchain_param.order == ORDER_CALLEE)
 			ip = chain->ips[i];
@@ -1324,24 +1462,10 @@ static int machine__resolve_callchain_sample(struct machine *machine,
 			continue;
 		}
 
-		al.filtered = false;
-		thread__find_addr_location(thread, machine, cpumode,
-					   MAP__FUNCTION, ip, &al);
-		if (al.sym != NULL) {
-			if (sort__has_parent && !*parent &&
-			    symbol__match_regex(al.sym, &parent_regex))
-				*parent = al.sym;
-			else if (have_ignore_callees && root_al &&
-			  symbol__match_regex(al.sym, &ignore_callees_regex)) {
-				/* Treat this symbol as the root,
-				   forgetting its callees. */
-				*root_al = al;
-				callchain_cursor_reset(&callchain_cursor);
-			}
-		}
-
-		err = callchain_cursor_append(&callchain_cursor,
-					      ip, al.map, al.sym);
+		err = add_callchain_ip(machine, thread, parent, root_al,
+				       cpumode, ip);
+		if (err == -EINVAL)
+			break;
 		if (err)
 			return err;
 	}
@@ -1367,7 +1491,9 @@ int machine__resolve_callchain(struct machine *machine,
 	int ret;
 
 	ret = machine__resolve_callchain_sample(machine, thread,
-						sample->callchain, parent,
+						sample->callchain,
+						sample->branch_stack,
+						parent,
 						root_al, max_stack);
 	if (ret)
 		return ret;
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 2553ae0..33b2be7 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -115,7 +115,8 @@ struct symbol_conf {
 			annotate_asm_raw,
 			annotate_src,
 			event_group,
-			demangle;
+			demangle,
+			branch_callstack;
 	const char	*vmlinux_name,
 			*kallsyms_name,
 			*source_prefix,
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/8] perf, tools: Add --branch-history option to report v2
  2014-02-28  4:22 perf: Implement lbr-as-callgraph v4 Andi Kleen
  2014-02-28  4:22 ` [PATCH 1/8] perf, tools: fix BFD detection on opensuse Andi Kleen
  2014-02-28  4:22 ` [PATCH 2/8] perf, tools: Support handling complete branch stacks as histograms v4 Andi Kleen
@ 2014-02-28  4:22 ` Andi Kleen
  2014-03-07 13:19   ` Jiri Olsa
  2014-02-28  4:22 ` [PATCH 4/8] perf, tools: Enable printing the srcline in the history v2 Andi Kleen
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Andi Kleen @ 2014-02-28  4:22 UTC (permalink / raw)
  To: acme; +Cc: mingo, linux-kernel, eranian, namhyung, jolsa, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add a --branch-history option to perf report that changes all
the settings necessary for using the branches in callstacks.

This is just a short cut to make this nicer to use, it does
not enable any functionality by itself.

v2: Change sort order. Rename option to --branch-history to
be less confusing.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-report.txt |  5 +++++
 tools/perf/builtin-report.c              | 28 +++++++++++++++++++++++++---
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 349cd2a..5b270fe 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -228,6 +228,11 @@ OPTIONS
 	branch stacks and it will automatically switch to the branch view mode,
 	unless --no-branch-stack is used.
 
+--branch-history::
+	Add the addresses of sampled taken branches to the callstack.
+	This allows to examine the path the program took to each sample.
+	The data collection must have used -b (or -j) and -g.
+
 --objdump=<path>::
         Path to objdump binary.
 
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index f7a9e3d..302c4e8 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -691,6 +691,16 @@ parse_branch_mode(const struct option *opt __maybe_unused,
 }
 
 static int
+parse_branch_call_mode(const struct option *opt __maybe_unused,
+		  const char *str __maybe_unused, int unset)
+{
+	int *branch_mode = opt->value;
+
+	*branch_mode = !unset;
+	return 0;
+}
+
+static int
 parse_percent_limit(const struct option *opt, const char *str,
 		    int unset __maybe_unused)
 {
@@ -705,7 +715,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 	struct perf_session *session;
 	struct stat st;
 	bool has_br_stack = false;
-	int branch_mode = -1;
+	int branch_mode = -1, branch_call_mode = -1;
 	int ret = -1;
 	char callchain_default_opt[] = "fractal,0.5,callee";
 	const char * const report_usage[] = {
@@ -814,7 +824,11 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_BOOLEAN(0, "group", &symbol_conf.event_group,
 		    "Show event group information together"),
 	OPT_CALLBACK_NOOPT('b', "branch-stack", &branch_mode, "",
-		    "use branch records for histogram filling", parse_branch_mode),
+		    "use branch records for per branch histogram filling",
+		    parse_branch_mode),
+	OPT_CALLBACK_NOOPT(0, "branch-history", &branch_call_mode, "",
+		    "add last branch records to call history",
+		    parse_branch_call_mode),
 	OPT_STRING(0, "objdump", &objdump_path, "path",
 		   "objdump binary to use for disassembly and annotations"),
 	OPT_BOOLEAN(0, "demangle", &symbol_conf.demangle,
@@ -862,8 +876,16 @@ repeat:
 	has_br_stack = perf_header__has_feat(&session->header,
 					     HEADER_BRANCH_STACK);
 
-	if (branch_mode == -1 && has_br_stack)
+	if (branch_mode == -1 && has_br_stack && branch_call_mode == -1)
 		sort__mode = SORT_MODE__BRANCH;
+	if (branch_call_mode != -1) {
+		callchain_param.branch_callstack = 1;
+		callchain_param.key = CCKEY_ADDRESS;
+		symbol_conf.use_callchain = true;
+		callchain_register_param(&callchain_param);
+		if (sort_order == default_sort_order)
+			sort_order = "srcline,symbol,dso";
+	}
 
 	/* sort__mode could be NORMAL if --no-branch-stack */
 	if (sort__mode == SORT_MODE__BRANCH) {
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/8] perf, tools: Enable printing the srcline in the history v2
  2014-02-28  4:22 perf: Implement lbr-as-callgraph v4 Andi Kleen
                   ` (2 preceding siblings ...)
  2014-02-28  4:22 ` [PATCH 3/8] perf, tools: Add --branch-history option to report v2 Andi Kleen
@ 2014-02-28  4:22 ` Andi Kleen
  2014-03-01 16:22   ` Andi Kleen
  2014-02-28  4:22 ` [PATCH 5/8] perf, tools: Only print base source file for srcline Andi Kleen
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Andi Kleen @ 2014-02-28  4:22 UTC (permalink / raw)
  To: acme; +Cc: mingo, linux-kernel, eranian, namhyung, jolsa, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

For lbr-as-callgraph we need to see the line number in the history,
because many LBR entries can be in a single function, and just
showing the same function name many times is not useful.

When the history code is configured to sort by address, also try to
resolve the address to a file:srcline and display this in the browser.
If that doesn't work still display the address.

This can be also useful without LBRs for understanding which call in a large
function (or in which inlined function) called something else.

Contains fixes from Namhyung Kim

v2: Refactor code into common function
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/ui/browsers/hists.c | 17 -----------------
 tools/perf/ui/stdio/hist.c     | 23 +++++++++--------------
 tools/perf/util/callchain.c    | 29 +++++++++++++++++++++++++++++
 tools/perf/util/callchain.h    |  5 +++++
 tools/perf/util/machine.c      |  2 +-
 tools/perf/util/srcline.c      |  6 ++++--
 6 files changed, 48 insertions(+), 34 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index b720b92..ec69142 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -394,23 +394,6 @@ out:
 	return key;
 }
 
-static char *callchain_list__sym_name(struct callchain_list *cl,
-				      char *bf, size_t bfsize, bool show_dso)
-{
-	int printed;
-
-	if (cl->ms.sym)
-		printed = scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
-	else
-		printed = scnprintf(bf, bfsize, "%#" PRIx64, cl->ip);
-
-	if (show_dso)
-		scnprintf(bf + printed, bfsize - printed, " %s",
-			  cl->ms.map ? cl->ms.map->dso->short_name : "unknown");
-
-	return bf;
-}
-
 #define LEVEL_OFFSET_STEP 3
 
 static int hist_browser__show_callchain_node_rb_tree(struct hist_browser *browser,
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 831fbb7..3ac9ace 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -41,6 +41,7 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_list *chain,
 {
 	int i;
 	size_t ret = 0;
+	char bf[1024];
 
 	ret += callchain__fprintf_left_margin(fp, left_margin);
 	for (i = 0; i < depth; i++) {
@@ -56,11 +57,8 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_list *chain,
 		} else
 			ret += fprintf(fp, "%s", "          ");
 	}
-	if (chain->ms.sym)
-		ret += fprintf(fp, "%s\n", chain->ms.sym->name);
-	else
-		ret += fprintf(fp, "0x%0" PRIx64 "\n", chain->ip);
-
+	fputs(callchain_list__sym_name(chain, bf, sizeof(bf), false), fp);
+	fputc('\n', fp);
 	return ret;
 }
 
@@ -168,6 +166,7 @@ static size_t callchain__fprintf_graph(FILE *fp, struct rb_root *root,
 	struct rb_node *node;
 	int i = 0;
 	int ret = 0;
+	char bf[1024];
 
 	/*
 	 * If have one single callchain root, don't bother printing
@@ -195,10 +194,8 @@ static size_t callchain__fprintf_graph(FILE *fp, struct rb_root *root,
 			} else
 				ret += callchain__fprintf_left_margin(fp, left_margin);
 
-			if (chain->ms.sym)
-				ret += fprintf(fp, " %s\n", chain->ms.sym->name);
-			else
-				ret += fprintf(fp, " %p\n", (void *)(long)chain->ip);
+			ret += fprintf(fp, "%s\n", callchain_list__sym_name(chain, bf, sizeof(bf),
+							false));
 
 			if (++entries_printed == callchain_param.print_limit)
 				break;
@@ -218,6 +215,7 @@ static size_t __callchain__fprintf_flat(FILE *fp, struct callchain_node *node,
 {
 	struct callchain_list *chain;
 	size_t ret = 0;
+	char bf[1024];
 
 	if (!node)
 		return 0;
@@ -228,11 +226,8 @@ static size_t __callchain__fprintf_flat(FILE *fp, struct callchain_node *node,
 	list_for_each_entry(chain, &node->val, list) {
 		if (chain->ip >= PERF_CONTEXT_MAX)
 			continue;
-		if (chain->ms.sym)
-			ret += fprintf(fp, "                %s\n", chain->ms.sym->name);
-		else
-			ret += fprintf(fp, "                %p\n",
-					(void *)(long)chain->ip);
+		ret += fprintf(fp, "                %s\n", callchain_list__sym_name(chain,
+					bf, sizeof(bf), false));
 	}
 
 	return ret;
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 8d9db45..56ab517 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -551,3 +551,32 @@ int hist_entry__append_callchain(struct hist_entry *he, struct perf_sample *samp
 		return 0;
 	return callchain_append(he->callchain, &callchain_cursor, sample->period);
 }
+
+char *callchain_list__sym_name(struct callchain_list *cl,
+			       char *bf, size_t bfsize, bool show_dso)
+{
+	int printed;
+
+	if (cl->ms.sym) {
+		if (callchain_param.key == CCKEY_ADDRESS &&
+		    cl->ms.map && !cl->srcline)
+			cl->srcline = get_srcline(cl->ms.map->dso,
+						  map__rip_2objdump(cl->ms.map,
+								    cl->ip));
+		if (cl->srcline)
+			printed = scnprintf(bf, bfsize, "%s %s",
+					cl->ms.sym->name, cl->srcline);
+		else
+			printed = scnprintf(bf, bfsize, "%s",
+					    cl->ms.sym->name);
+	} else
+		printed = scnprintf(bf, bfsize, "%#" PRIx64, cl->ip);
+
+	if (show_dso)
+		scnprintf(bf + printed, bfsize - printed, " %s",
+			  cl->ms.map ?
+			  cl->ms.map->dso->short_name :
+			  "unknown");
+
+	return bf;
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index be29bb8..39322ef 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -59,6 +59,7 @@ struct callchain_param {
 struct callchain_list {
 	u64			ip;
 	struct map_symbol	ms;
+	char		       *srcline;
 	struct list_head	list;
 };
 
@@ -158,4 +159,8 @@ int sample__resolve_callchain(struct perf_sample *sample, struct symbol **parent
 int hist_entry__append_callchain(struct hist_entry *he, struct perf_sample *sample);
 
 extern const char record_callchain_help[];
+
+char *callchain_list__sym_name(struct callchain_list *cl,
+			       char *bf, size_t bfsize, bool show_dso);
+
 #endif	/* __PERF_CALLCHAIN_H */
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 6eff65e..ef198c3 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1310,7 +1310,7 @@ static int add_callchain_ip(struct machine *machine,
 			return -EINVAL;
 	}
 
-	return callchain_cursor_append(&callchain_cursor, ip, al.map, al.sym);
+	return callchain_cursor_append(&callchain_cursor, al.addr, al.map, al.sym);
 }
 
 #define CHASHSZ 127
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index f3e4bc5..c6a7cdc 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -258,7 +258,7 @@ char *get_srcline(struct dso *dso, unsigned long addr)
 	const char *dso_name;
 
 	if (!dso->has_srcline)
-		return SRCLINE_UNKNOWN;
+		goto out;
 
 	if (dso->symsrc_filename)
 		dso_name = dso->symsrc_filename;
@@ -289,7 +289,9 @@ out:
 		dso->has_srcline = 0;
 		dso__free_a2l(dso);
 	}
-	return SRCLINE_UNKNOWN;
+	if (asprintf(&srcline, "%s[%lx]", dso->short_name, addr) < 0)
+		return SRCLINE_UNKNOWN;
+	return srcline;
 }
 
 void free_srcline(char *srcline)
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 5/8] perf, tools: Only print base source file for srcline
  2014-02-28  4:22 perf: Implement lbr-as-callgraph v4 Andi Kleen
                   ` (3 preceding siblings ...)
  2014-02-28  4:22 ` [PATCH 4/8] perf, tools: Enable printing the srcline in the history v2 Andi Kleen
@ 2014-02-28  4:22 ` Andi Kleen
  2014-02-28  4:22 ` [PATCH 6/8] perf, tools: Support source line numbers in annotate Andi Kleen
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Andi Kleen @ 2014-02-28  4:22 UTC (permalink / raw)
  To: acme; +Cc: mingo, linux-kernel, eranian, namhyung, jolsa, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

For perf report with --sort srcline only print the base source file
name. This makes the results generally fit much better to the
screen. The path is usually not that useful anyways because it is
often from different systems.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/util/srcline.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index c6a7cdc..ac877f9 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -274,7 +274,7 @@ char *get_srcline(struct dso *dso, unsigned long addr)
 	if (!addr2line(dso_name, addr, &file, &line, dso))
 		goto out;
 
-	if (asprintf(&srcline, "%s:%u", file, line) < 0) {
+	if (asprintf(&srcline, "%s:%u", basename(file), line) < 0) {
 		free(file);
 		goto out;
 	}
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 6/8] perf, tools: Support source line numbers in annotate
  2014-02-28  4:22 perf: Implement lbr-as-callgraph v4 Andi Kleen
                   ` (4 preceding siblings ...)
  2014-02-28  4:22 ` [PATCH 5/8] perf, tools: Only print base source file for srcline Andi Kleen
@ 2014-02-28  4:22 ` Andi Kleen
  2014-02-28  4:22 ` [PATCH 7/8] perf, tools: Fix srcline sort key output to use width Andi Kleen
  2014-02-28  4:22 ` [PATCH 8/8] tools, perf: Make get_srcline fall back to sym+offset Andi Kleen
  7 siblings, 0 replies; 22+ messages in thread
From: Andi Kleen @ 2014-02-28  4:22 UTC (permalink / raw)
  To: acme; +Cc: mingo, linux-kernel, eranian, namhyung, jolsa, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

With srcline key/sort'ing it's useful to have line numbers
in the annotate window. This patch implements this.

Use objdump -l to request the line numbers and
save them in the line structure. Then the browser
displays them for source lines.

The line numbers are not displayed by default, but can be
toggled on with 'k'

There is one unfortunate problem with this setup. For
lines not containing source and which are outside functions
objdump -l reports line numbers off by a few: it always reports
the first line number in the next function even for lines
that are outside the function.
I haven't find a nice way to detect/correct this. Probably objdump
has to be fixed.
See https://sourceware.org/bugzilla/show_bug.cgi?id=16433

The line numbers are still useful even with these problems,
as most are correct and the ones which are not are nearby.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/ui/browsers/annotate.c | 13 ++++++++++++-
 tools/perf/util/annotate.c        | 30 +++++++++++++++++++++++++-----
 tools/perf/util/annotate.h        |  1 +
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index f0697a3..8df6787 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -27,6 +27,7 @@ static struct annotate_browser_opt {
 	bool hide_src_code,
 	     use_offset,
 	     jump_arrows,
+	     show_linenr,
 	     show_nr_jumps;
 } annotate_browser__opts = {
 	.use_offset	= true,
@@ -128,7 +129,11 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int
 	if (!*dl->line)
 		slsmg_write_nstring(" ", width - pcnt_width);
 	else if (dl->offset == -1) {
-		printed = scnprintf(bf, sizeof(bf), "%*s  ",
+		if (dl->line_nr && annotate_browser__opts.show_linenr)
+			printed = scnprintf(bf, sizeof(bf), "%*s %-5d ",
+					ab->addr_width, " ", dl->line_nr);
+		else
+			printed = scnprintf(bf, sizeof(bf), "%*s  ",
 				    ab->addr_width, " ");
 		slsmg_write_nstring(bf, printed);
 		slsmg_write_nstring(dl->line, width - printed - pcnt_width + 1);
@@ -733,6 +738,7 @@ static int annotate_browser__run(struct annotate_browser *browser,
 		"o             Toggle disassembler output/simplified view\n"
 		"s             Toggle source code view\n"
 		"/             Search string\n"
+		"k	       Toggle line numbers\n"
 		"r             Run available scripts\n"
 		"?             Search string backwards\n");
 			continue;
@@ -741,6 +747,10 @@ static int annotate_browser__run(struct annotate_browser *browser,
 				script_browse(NULL);
 				continue;
 			}
+		case 'k':
+			annotate_browser__opts.show_linenr =
+				!annotate_browser__opts.show_linenr;
+			break;
 		case 'H':
 			nd = browser->curr_hot;
 			break;
@@ -984,6 +994,7 @@ static struct annotate_config {
 } annotate__configs[] = {
 	ANNOTATE_CFG(hide_src_code),
 	ANNOTATE_CFG(jump_arrows),
+	ANNOTATE_CFG(show_linenr),
 	ANNOTATE_CFG(show_nr_jumps),
 	ANNOTATE_CFG(use_offset),
 };
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 469eb67..f020110 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -15,11 +15,13 @@
 #include "debug.h"
 #include "annotate.h"
 #include "evsel.h"
+#include <regex.h>
 #include <pthread.h>
 #include <linux/bitops.h>
 
 const char 	*disassembler_style;
 const char	*objdump_path;
+static regex_t	 file_lineno;
 
 static struct ins *ins__find(const char *name);
 static int disasm_line__parse(char *line, char **namep, char **rawp);
@@ -562,13 +564,15 @@ out_free_name:
 	return -1;
 }
 
-static struct disasm_line *disasm_line__new(s64 offset, char *line, size_t privsize)
+static struct disasm_line *disasm_line__new(s64 offset, char *line,
+					size_t privsize, int line_nr)
 {
 	struct disasm_line *dl = zalloc(sizeof(*dl) + privsize);
 
 	if (dl != NULL) {
 		dl->offset = offset;
 		dl->line = strdup(line);
+		dl->line_nr = line_nr;
 		if (dl->line == NULL)
 			goto out_delete;
 
@@ -780,13 +784,15 @@ static int disasm_line__print(struct disasm_line *dl, struct symbol *sym, u64 st
  * The ops.raw part will be parsed further according to type of the instruction.
  */
 static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
-				      FILE *file, size_t privsize)
+				      FILE *file, size_t privsize,
+				      int *line_nr)
 {
 	struct annotation *notes = symbol__annotation(sym);
 	struct disasm_line *dl;
 	char *line = NULL, *parsed_line, *tmp, *tmp2, *c;
 	size_t line_len;
 	s64 line_ip, offset = -1;
+	regmatch_t match[2];
 
 	if (getline(&line, &line_len, file) < 0)
 		return -1;
@@ -804,6 +810,12 @@ static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
 	line_ip = -1;
 	parsed_line = line;
 
+	/* /filename:linenr ? Save line number and ignore. */
+	if (regexec(&file_lineno, line, 2, match, 0) == 0) {
+		*line_nr = atoi(line + match[1].rm_so);
+		return 0;
+	}
+
 	/*
 	 * Strip leading spaces:
 	 */
@@ -834,8 +846,9 @@ static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
 			parsed_line = tmp2 + 1;
 	}
 
-	dl = disasm_line__new(offset, parsed_line, privsize);
+	dl = disasm_line__new(offset, parsed_line, privsize, *line_nr);
 	free(line);
+	(*line_nr)++;
 
 	if (dl == NULL)
 		return -1;
@@ -861,6 +874,11 @@ static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
 	return 0;
 }
 
+static __attribute__((constructor)) void symbol__init_regexpr(void)
+{
+	regcomp(&file_lineno, "^/[^:]+:([0-9]+)$", REG_EXTENDED);
+}
+
 static void delete_last_nop(struct symbol *sym)
 {
 	struct annotation *notes = symbol__annotation(sym);
@@ -896,6 +914,7 @@ int symbol__annotate(struct symbol *sym, struct map *map, size_t privsize)
 	char symfs_filename[PATH_MAX];
 	struct kcore_extract kce;
 	bool delete_extract = false;
+	int lineno = 0;
 
 	if (filename) {
 		snprintf(symfs_filename, sizeof(symfs_filename), "%s%s",
@@ -977,7 +996,7 @@ fallback:
 	snprintf(command, sizeof(command),
 		 "%s %s%s --start-address=0x%016" PRIx64
 		 " --stop-address=0x%016" PRIx64
-		 " -d %s %s -C %s 2>/dev/null|grep -v %s|expand",
+		 " -l -d %s %s -C %s 2>/dev/null|grep -v %s|expand",
 		 objdump_path ? objdump_path : "objdump",
 		 disassembler_style ? "-M " : "",
 		 disassembler_style ? disassembler_style : "",
@@ -994,7 +1013,8 @@ fallback:
 		goto out_free_filename;
 
 	while (!feof(file))
-		if (symbol__parse_objdump_line(sym, map, file, privsize) < 0)
+		if (symbol__parse_objdump_line(sym, map, file, privsize,
+			    &lineno) < 0)
 			break;
 
 	/*
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index b2aef59..a124e7e 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -58,6 +58,7 @@ struct disasm_line {
 	char		    *line;
 	char		    *name;
 	struct ins	    *ins;
+	int		    line_nr;
 	struct ins_operands ops;
 };
 
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 7/8] perf, tools: Fix srcline sort key output to use width
  2014-02-28  4:22 perf: Implement lbr-as-callgraph v4 Andi Kleen
                   ` (5 preceding siblings ...)
  2014-02-28  4:22 ` [PATCH 6/8] perf, tools: Support source line numbers in annotate Andi Kleen
@ 2014-02-28  4:22 ` Andi Kleen
  2014-02-28  4:22 ` [PATCH 8/8] tools, perf: Make get_srcline fall back to sym+offset Andi Kleen
  7 siblings, 0 replies; 22+ messages in thread
From: Andi Kleen @ 2014-02-28  4:22 UTC (permalink / raw)
  To: acme; +Cc: mingo, linux-kernel, eranian, namhyung, jolsa, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

The srcline sort output ignored the width, which caused
various problems with displaying srcline in the tui
browser. Just cut it off at width.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/util/sort.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 635cd8f..d273cf2 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -284,7 +284,7 @@ static int hist_entry__srcline_snprintf(struct hist_entry *he, char *bf,
 					size_t size,
 					unsigned int width __maybe_unused)
 {
-	return repsep_snprintf(bf, size, "%s", he->srcline);
+	return repsep_snprintf(bf, size, "%.*s", width, he->srcline);
 }
 
 struct sort_entry sort_srcline = {
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 8/8] tools, perf: Make get_srcline fall back to sym+offset
  2014-02-28  4:22 perf: Implement lbr-as-callgraph v4 Andi Kleen
                   ` (6 preceding siblings ...)
  2014-02-28  4:22 ` [PATCH 7/8] perf, tools: Fix srcline sort key output to use width Andi Kleen
@ 2014-02-28  4:22 ` Andi Kleen
  2014-03-07 13:19   ` Jiri Olsa
  7 siblings, 1 reply; 22+ messages in thread
From: Andi Kleen @ 2014-02-28  4:22 UTC (permalink / raw)
  To: acme; +Cc: mingo, linux-kernel, eranian, namhyung, jolsa, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

When the source line is not found fall back to sym + offset.
This is generally much more useful than a raw address.
For this we need to pass in the symbol from the caller.
For some callers it's awkward to compute, so we stay
at the old behaviour.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/util/annotate.c  |  2 +-
 tools/perf/util/callchain.c |  3 ++-
 tools/perf/util/map.c       |  2 +-
 tools/perf/util/sort.c      |  6 ++++--
 tools/perf/util/srcline.c   | 12 +++++++++---
 tools/perf/util/util.h      |  4 +++-
 6 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index f020110..6bc23bf 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -1185,7 +1185,7 @@ static int symbol__get_source_line(struct symbol *sym, struct map *map,
 			goto next;
 
 		offset = start + i;
-		src_line->path = get_srcline(map->dso, offset);
+		src_line->path = get_srcline(map->dso, offset, NULL, false);
 		insert_source_line(&tmp_root, src_line);
 
 	next:
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 56ab517..9ad5fc7 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -562,7 +562,8 @@ char *callchain_list__sym_name(struct callchain_list *cl,
 		    cl->ms.map && !cl->srcline)
 			cl->srcline = get_srcline(cl->ms.map->dso,
 						  map__rip_2objdump(cl->ms.map,
-								    cl->ip));
+								    cl->ip),
+						  cl->ms.sym, false);
 		if (cl->srcline)
 			printed = scnprintf(bf, bfsize, "%s %s",
 					cl->ms.sym->name, cl->srcline);
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 39cd2d0..9638e68 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -262,7 +262,7 @@ int map__fprintf_srcline(struct map *map, u64 addr, const char *prefix,
 
 	if (map && map->dso) {
 		srcline = get_srcline(map->dso,
-				      map__rip_2objdump(map, addr));
+				      map__rip_2objdump(map, addr), NULL, true);
 		if (srcline != SRCLINE_UNKNOWN)
 			ret = fprintf(fp, "%s%s", prefix, srcline);
 		free_srcline(srcline);
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index d273cf2..80e6dfa 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -265,7 +265,8 @@ sort__srcline_cmp(struct hist_entry *left, struct hist_entry *right)
 		else {
 			struct map *map = left->ms.map;
 			left->srcline = get_srcline(map->dso,
-					    map__rip_2objdump(map, left->ip));
+					   map__rip_2objdump(map, left->ip),
+						    left->ms.sym, true);
 		}
 	}
 	if (!right->srcline) {
@@ -274,7 +275,8 @@ sort__srcline_cmp(struct hist_entry *left, struct hist_entry *right)
 		else {
 			struct map *map = right->ms.map;
 			right->srcline = get_srcline(map->dso,
-					    map__rip_2objdump(map, right->ip));
+					     map__rip_2objdump(map, right->ip),
+						     right->ms.sym, true);
 		}
 	}
 	return strcmp(left->srcline, right->srcline);
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index ac877f9..36a7aff 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -8,12 +8,13 @@
 #include "util/util.h"
 #include "util/debug.h"
 
+#include "symbol.h"
+
 #ifdef HAVE_LIBBFD_SUPPORT
 
 /*
  * Implement addr2line using libbfd.
  */
-#define PACKAGE "perf"
 #include <bfd.h>
 
 struct a2l_data {
@@ -250,7 +251,8 @@ void dso__free_a2l(struct dso *dso __maybe_unused)
  */
 #define A2L_FAIL_LIMIT 123
 
-char *get_srcline(struct dso *dso, unsigned long addr)
+char *get_srcline(struct dso *dso, unsigned long addr, struct symbol *sym,
+		  bool show_sym)
 {
 	char *file = NULL;
 	unsigned line = 0;
@@ -289,7 +291,11 @@ out:
 		dso->has_srcline = 0;
 		dso__free_a2l(dso);
 	}
-	if (asprintf(&srcline, "%s[%lx]", dso->short_name, addr) < 0)
+	if (sym) {
+		if (asprintf(&srcline, "%s+%ld", show_sym ? sym->name : "",
+					addr - sym->start) < 0)
+			return SRCLINE_UNKNOWN;
+	} else if (asprintf(&srcline, "%s[%lx]", dso->short_name, addr) < 0)
 		return SRCLINE_UNKNOWN;
 	return srcline;
 }
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 6995d66..4b29ed1 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -317,8 +317,10 @@ unsigned long parse_tag_value(const char *str, struct parse_tag *tags);
 #define SRCLINE_UNKNOWN  ((char *) "??:0")
 
 struct dso;
+struct symbol;
 
-char *get_srcline(struct dso *dso, unsigned long addr);
+char *get_srcline(struct dso *dso, unsigned long addr, struct symbol *sym,
+		  bool show_sym);
 void free_srcline(char *srcline);
 
 int filename__read_int(const char *filename, int *value);
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/8] perf, tools: Enable printing the srcline in the history v2
  2014-02-28  4:22 ` [PATCH 4/8] perf, tools: Enable printing the srcline in the history v2 Andi Kleen
@ 2014-03-01 16:22   ` Andi Kleen
  0 siblings, 0 replies; 22+ messages in thread
From: Andi Kleen @ 2014-03-01 16:22 UTC (permalink / raw)
  To: acme; +Cc: mingo, linux-kernel, eranian, namhyung, jolsa

Andi Kleen <andi@firstfloor.org> writes:

> From: Andi Kleen <ak@linux.intel.com>
>
> For lbr-as-callgraph we need to see the line number in the history,
> because many LBR entries can be in a single function, and just
> showing the same function name many times is not useful.

This patch broke the GTK build, fixed with this followon patch:

>From fd2b85cc85534f3a3b025b36bab6c42d31edc660 Mon Sep 17 00:00:00 2001
From: Andi Kleen <ak@linux.intel.com>
Date: Fri, 28 Feb 2014 16:53:40 -0800
Subject: [PATCH] perf, tools: Fix gtk compilation with src line patch.

Convert the gtk browser code to the new shared callchain_list__sym_name
function too.

Should be folded into "perf, tools: Enable printing the srcline in the history"

Signed-off-by: Andi Kleen <ak@linux.intel.com>

diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 5b95c44..cafe105 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -123,15 +123,6 @@ void perf_gtk__init_hpp(void)
 				perf_gtk__hpp_color_overhead_guest_us;
 }
 
-static void callchain_list__sym_name(struct callchain_list *cl,
-				     char *bf, size_t bfsize)
-{
-	if (cl->ms.sym)
-		scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
-	else
-		scnprintf(bf, bfsize, "%#" PRIx64, cl->ip);
-}
-
 static void perf_gtk__add_callchain(struct rb_root *root, GtkTreeStore *store,
 				    GtkTreeIter *parent, int col, u64 total)
 {
@@ -162,7 +153,7 @@ static void perf_gtk__add_callchain(struct rb_root *root, GtkTreeStore *store,
 			scnprintf(buf, sizeof(buf), "%5.2f%%", percent);
 			gtk_tree_store_set(store, &iter, 0, buf, -1);
 
-			callchain_list__sym_name(chain, buf, sizeof(buf));
+			callchain_list__sym_name(chain, buf, sizeof(buf), false);
 			gtk_tree_store_set(store, &iter, col, buf, -1);
 
 			if (need_new_parent) {



-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/8] perf, tools: Support handling complete branch stacks as histograms v4
  2014-02-28  4:22 ` [PATCH 2/8] perf, tools: Support handling complete branch stacks as histograms v4 Andi Kleen
@ 2014-03-05 10:29   ` Jiri Olsa
  2014-03-05 15:31     ` Andi Kleen
  2014-03-07 13:19   ` Jiri Olsa
  1 sibling, 1 reply; 22+ messages in thread
From: Jiri Olsa @ 2014-03-05 10:29 UTC (permalink / raw)
  To: Andi Kleen; +Cc: acme, mingo, linux-kernel, eranian, namhyung, Andi Kleen

On Thu, Feb 27, 2014 at 08:22:26PM -0800, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 

SNIP

> +
>  static int machine__resolve_callchain_sample(struct machine *machine,
>  					     struct thread *thread,
>  					     struct ip_callchain *chain,
> +					     struct branch_stack *branch,
>  					     struct symbol **parent,
>  					     struct addr_location *root_al,
>  					     int max_stack)
> @@ -1283,17 +1365,73 @@ static int machine__resolve_callchain_sample(struct machine *machine,
>  	int chain_nr = min(max_stack, (int)chain->nr);
>  	int i;
>  	int err;
> +	int first_call = 0;
>  
>  	callchain_cursor_reset(&callchain_cursor);
>  
> +	/*
> +	 * Add branches to call stack for easier browsing. This gives
> +	 * more context for a sample than just the callers.
> +	 *
> +	 * This uses individual histograms of paths compared to the
> +	 * aggregated histograms the normal LBR mode uses.
> +	 *
> +	 * Limitations for now:
> +	 * - No extra filters
> +	 * - No annotations (should annotate somehow)
> +	 */
> +
> +	if (branch->nr > PERF_MAX_BRANCH_DEPTH) {
> +		pr_warning("corrupted branch chain. skipping...\n");
> +		return 0;
> +	}

segfaults here..

[jolsa@krava perf]$ ./perf record -g sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.013 MB perf.data (~567 samples) ]
[jolsa@krava perf]$ ./perf report
perf: Segmentation fault
[jolsa@krava perf]$ 

jirka

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/8] perf, tools: Support handling complete branch stacks as histograms v4
  2014-03-05 10:29   ` Jiri Olsa
@ 2014-03-05 15:31     ` Andi Kleen
  2014-03-07 13:19       ` Jiri Olsa
  0 siblings, 1 reply; 22+ messages in thread
From: Andi Kleen @ 2014-03-05 15:31 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Andi Kleen, acme, mingo, linux-kernel, eranian, namhyung

> segfaults here..
> 
> [jolsa@krava perf]$ ./perf record -g sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.013 MB perf.data (~567 samples) ]
> [jolsa@krava perf]$ ./perf report
> perf: Segmentation fault
> [jolsa@krava perf]$ 

Thanks. Fixed with this patch:

commit 124014960a8fb09043914fb79ea86e0444408ab0
Author: Andi Kleen <ak@linux.intel.com>
Date:   Wed Mar 5 07:28:56 2014 -0800

    perf, tools, report: Fix perf report without -b
    
    Fix report crash when -b is not used. Reported by Jiri Olsa.
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index ef198c3..04ec451 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1381,15 +1381,15 @@ static int machine__resolve_callchain_sample(struct machine *machine,
 	 * - No annotations (should annotate somehow)
 	 */
 
-	if (branch->nr > PERF_MAX_BRANCH_DEPTH) {
-		pr_warning("corrupted branch chain. skipping...\n");
-		return 0;
-	}
-
-	if (callchain_param.branch_callstack) {
+	if (branch && callchain_param.branch_callstack) {
 		int nr = min(max_stack, (int)branch->nr);
 		struct branch_entry be[nr];
 
+		if (branch->nr > PERF_MAX_BRANCH_DEPTH) {
+			pr_warning("corrupted branch chain. skipping...\n");
+			return 0;
+		}
+
 		for (i = 0; i < nr; i++) {
 			if (callchain_param.order == ORDER_CALLEE) {
 				be[i] = branch->entries[i];


-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/8] perf, tools: Support handling complete branch stacks as histograms v4
  2014-03-05 15:31     ` Andi Kleen
@ 2014-03-07 13:19       ` Jiri Olsa
  0 siblings, 0 replies; 22+ messages in thread
From: Jiri Olsa @ 2014-03-07 13:19 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andi Kleen, acme, mingo, linux-kernel, eranian, namhyung

On Wed, Mar 05, 2014 at 07:31:49AM -0800, Andi Kleen wrote:
> > segfaults here..
> > 
> > [jolsa@krava perf]$ ./perf record -g sleep 1
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.013 MB perf.data (~567 samples) ]
> > [jolsa@krava perf]$ ./perf report
> > perf: Segmentation fault
> > [jolsa@krava perf]$ 
> 
> Thanks. Fixed with this patch:
> 
> commit 124014960a8fb09043914fb79ea86e0444408ab0
> Author: Andi Kleen <ak@linux.intel.com>
> Date:   Wed Mar 5 07:28:56 2014 -0800
> 
>     perf, tools, report: Fix perf report without -b
>     
>     Fix report crash when -b is not used. Reported by Jiri Olsa.
>     
>     Signed-off-by: Andi Kleen <ak@linux.intel.com>

you need to merge it with the original one,
so we dont have broken build.. git bisect wise

same for the gtk stuff

jirka

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/8] perf, tools: Add --branch-history option to report v2
  2014-02-28  4:22 ` [PATCH 3/8] perf, tools: Add --branch-history option to report v2 Andi Kleen
@ 2014-03-07 13:19   ` Jiri Olsa
  2014-03-11  0:31     ` Andi Kleen
  0 siblings, 1 reply; 22+ messages in thread
From: Jiri Olsa @ 2014-03-07 13:19 UTC (permalink / raw)
  To: Andi Kleen; +Cc: acme, mingo, linux-kernel, eranian, namhyung, Andi Kleen

On Thu, Feb 27, 2014 at 08:22:27PM -0800, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Add a --branch-history option to perf report that changes all
> the settings necessary for using the branches in callstacks.
> 
> This is just a short cut to make this nicer to use, it does
> not enable any functionality by itself.
> 
> v2: Change sort order. Rename option to --branch-history to
> be less confusing.
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  tools/perf/Documentation/perf-report.txt |  5 +++++
>  tools/perf/builtin-report.c              | 28 +++++++++++++++++++++++++---
>  2 files changed, 30 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
> index 349cd2a..5b270fe 100644
> --- a/tools/perf/Documentation/perf-report.txt
> +++ b/tools/perf/Documentation/perf-report.txt
> @@ -228,6 +228,11 @@ OPTIONS
>  	branch stacks and it will automatically switch to the branch view mode,
>  	unless --no-branch-stack is used.
>  
> +--branch-history::
> +	Add the addresses of sampled taken branches to the callstack.
> +	This allows to examine the path the program took to each sample.
> +	The data collection must have used -b (or -j) and -g.
> +
>  --objdump=<path>::
>          Path to objdump binary.
>  
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index f7a9e3d..302c4e8 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -691,6 +691,16 @@ parse_branch_mode(const struct option *opt __maybe_unused,
>  }
>  
>  static int
> +parse_branch_call_mode(const struct option *opt __maybe_unused,
> +		  const char *str __maybe_unused, int unset)
> +{
> +	int *branch_mode = opt->value;
> +
> +	*branch_mode = !unset;
> +	return 0;
> +}

this is *same* as parse_branch_mode function and seems to be not needed,
I think both branch-stack and branch-history options could be handled
by OPT_BOOLEAN

jirka

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/8] perf, tools: Support handling complete branch stacks as histograms v4
  2014-02-28  4:22 ` [PATCH 2/8] perf, tools: Support handling complete branch stacks as histograms v4 Andi Kleen
  2014-03-05 10:29   ` Jiri Olsa
@ 2014-03-07 13:19   ` Jiri Olsa
  2014-03-07 19:51     ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 22+ messages in thread
From: Jiri Olsa @ 2014-03-07 13:19 UTC (permalink / raw)
  To: Andi Kleen; +Cc: acme, mingo, linux-kernel, eranian, namhyung, Andi Kleen

On Thu, Feb 27, 2014 at 08:22:26PM -0800, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>

SNIP

> index 8ad97e9..be29bb8 100644
> --- a/tools/perf/util/callchain.h
> +++ b/tools/perf/util/callchain.h
> @@ -53,6 +53,7 @@ struct callchain_param {
>  	sort_chain_func_t	sort;
>  	enum chain_order	order;
>  	enum chain_key		key;
> +	bool			branch_callstack;
>  };
>  
>  struct callchain_list {
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index ac37d78..6eff65e 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -11,6 +11,7 @@
>  #include <stdbool.h>
>  #include <symbol/kallsyms.h>
>  #include "unwind.h"
> +#include "linux/hash.h"
>  
>  int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
>  {
> @@ -1192,17 +1193,16 @@ static const u8 cpumodes[] = {
>  };
>  #define NCPUMODES (sizeof(cpumodes)/sizeof(u8))
>  
> -static void ip__resolve_ams(struct machine *machine, struct thread *thread,
> -			    struct addr_map_symbol *ams,
> -			    u64 ip)
> +static void resolve_branch_ams(struct machine *machine, struct thread *thread,
> +			       u64 ip,
> +			       struct addr_location *al)

the function name is confusing.. there's no branch or ams
being dealt with in here

maybe something like resolve_ip_al or resolve_ip ?

jirka

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 8/8] tools, perf: Make get_srcline fall back to sym+offset
  2014-02-28  4:22 ` [PATCH 8/8] tools, perf: Make get_srcline fall back to sym+offset Andi Kleen
@ 2014-03-07 13:19   ` Jiri Olsa
  0 siblings, 0 replies; 22+ messages in thread
From: Jiri Olsa @ 2014-03-07 13:19 UTC (permalink / raw)
  To: Andi Kleen; +Cc: acme, mingo, linux-kernel, eranian, namhyung, Andi Kleen

On Thu, Feb 27, 2014 at 08:22:32PM -0800, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> When the source line is not found fall back to sym + offset.
> This is generally much more useful than a raw address.
> For this we need to pass in the symbol from the caller.
> For some callers it's awkward to compute, so we stay
> at the old behaviour.

could we fallback for 0 src lines as well? I'm getting following:

            |          |--99.85%-- _IO_file_xsputn@@GLIBC_2.2.5 .:0
            |          |          __GI___mempcpy .:0
            |          |          __GI___mempcpy .:0
            |          |          __GI___mempcpy .:0
            |          |          __GI___mempcpy .:0
            |          |          __GI___mempcpy .:0
            |          |          __GI___mempcpy .:0
            |          |          __GI___mempcpy .:0
            |          |          __GI___mempcpy .:0
            |          |          __GI___mempcpy .:0

also it would be handy to show raw address (object based).. maybe for -v option?

jirka

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/8] perf, tools: Support handling complete branch stacks as histograms v4
  2014-03-07 13:19   ` Jiri Olsa
@ 2014-03-07 19:51     ` Arnaldo Carvalho de Melo
  2014-03-07 19:54       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 22+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-03-07 19:51 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Andi Kleen, mingo, linux-kernel, eranian, namhyung, Andi Kleen

Em Fri, Mar 07, 2014 at 02:19:49PM +0100, Jiri Olsa escreveu:
> On Thu, Feb 27, 2014 at 08:22:26PM -0800, Andi Kleen wrote:
> > From: Andi Kleen <ak@linux.intel.com>
> 
> SNIP
> 
> > index 8ad97e9..be29bb8 100644
> > --- a/tools/perf/util/callchain.h
> > +++ b/tools/perf/util/callchain.h
> > @@ -53,6 +53,7 @@ struct callchain_param {
> >  	sort_chain_func_t	sort;
> >  	enum chain_order	order;
> >  	enum chain_key		key;
> > +	bool			branch_callstack;
> >  };
> >  
> >  struct callchain_list {
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index ac37d78..6eff65e 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -11,6 +11,7 @@
> >  #include <stdbool.h>
> >  #include <symbol/kallsyms.h>
> >  #include "unwind.h"
> > +#include "linux/hash.h"
> >  
> >  int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
> >  {
> > @@ -1192,17 +1193,16 @@ static const u8 cpumodes[] = {
> >  };
> >  #define NCPUMODES (sizeof(cpumodes)/sizeof(u8))
> >  
> > -static void ip__resolve_ams(struct machine *machine, struct thread *thread,
> > -			    struct addr_map_symbol *ams,
> > -			    u64 ip)
> > +static void resolve_branch_ams(struct machine *machine, struct thread *thread,
> > +			       u64 ip,
> > +			       struct addr_location *al)
> 
> the function name is confusing.. there's no branch or ams
> being dealt with in here

yeah, it should be renamed, and all it does is to try to find something
in all the possible cpu modes, so I think we should rename this to:

  thread__find_cpumode_addr_location()

And move it to right after thread__find_addr_location(), then use it
here, doing that and combining the other patches, etc, should have it in
a place for testing soon.

- Arnaldo
 
> maybe something like resolve_ip_al or resolve_ip ?
> 
> jirka

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/8] perf, tools: Support handling complete branch stacks as histograms v4
  2014-03-07 19:51     ` Arnaldo Carvalho de Melo
@ 2014-03-07 19:54       ` Arnaldo Carvalho de Melo
  2014-03-11  0:43         ` Andi Kleen
  0 siblings, 1 reply; 22+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-03-07 19:54 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Andi Kleen, mingo, linux-kernel, eranian, namhyung, Andi Kleen

Em Fri, Mar 07, 2014 at 04:51:00PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, Mar 07, 2014 at 02:19:49PM +0100, Jiri Olsa escreveu:
> > On Thu, Feb 27, 2014 at 08:22:26PM -0800, Andi Kleen wrote:
> > > +static void resolve_branch_ams(struct machine *machine, struct thread *thread,
> > > +			       u64 ip,
> > > +			       struct addr_location *al)

> > the function name is confusing.. there's no branch or ams
> > being dealt with in here

> yeah, it should be renamed, and all it does is to try to find something
> in all the possible cpu modes, so I think we should rename this to:

>   thread__find_cpumode_addr_location()

> And move it to right after thread__find_addr_location(), then use it
> here, doing that and combining the other patches, etc, should have it in
> a place for testing soon.
>  
> > maybe something like resolve_ip_al or resolve_ip ?

Humm, then it gets overly long and we have
PERF_RECORD_MISC_CPUMODE_UNKNOWN, so I'll try making it so that if
cpumode is passed as PERF_RECORD_MISC_CPUMODE_UNKNOWN,
thread__find_addr_location() will try to figure it out.

- Arnaldo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/8] perf, tools: Add --branch-history option to report v2
  2014-03-07 13:19   ` Jiri Olsa
@ 2014-03-11  0:31     ` Andi Kleen
  2014-03-11 15:25       ` Jiri Olsa
  0 siblings, 1 reply; 22+ messages in thread
From: Andi Kleen @ 2014-03-11  0:31 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andi Kleen, acme, mingo, linux-kernel, eranian, namhyung, Andi Kleen

> this is *same* as parse_branch_mode function and seems to be not needed,
> I think both branch-stack and branch-history options could be handled
> by OPT_BOOLEAN

They can't because the type checker requires bool, and these variables
are tristate (UNSET, TRUE, FALSE)

-Andi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/8] perf, tools: Support handling complete branch stacks as histograms v4
  2014-03-07 19:54       ` Arnaldo Carvalho de Melo
@ 2014-03-11  0:43         ` Andi Kleen
  0 siblings, 0 replies; 22+ messages in thread
From: Andi Kleen @ 2014-03-11  0:43 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, mingo, linux-kernel, eranian, namhyung,
	Andi Kleen

> Humm, then it gets overly long and we have
> PERF_RECORD_MISC_CPUMODE_UNKNOWN, so I'll try making it so that if
> cpumode is passed as PERF_RECORD_MISC_CPUMODE_UNKNOWN,
> thread__find_addr_location() will try to figure it out.

I implemented Jiri's suggestion for now, just renaming the functions.

-Andi

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/8] perf, tools: Add --branch-history option to report v2
  2014-03-11  0:31     ` Andi Kleen
@ 2014-03-11 15:25       ` Jiri Olsa
  2014-03-11 20:21         ` Andi Kleen
  0 siblings, 1 reply; 22+ messages in thread
From: Jiri Olsa @ 2014-03-11 15:25 UTC (permalink / raw)
  To: Andi Kleen; +Cc: acme, mingo, linux-kernel, eranian, namhyung, Andi Kleen

On Tue, Mar 11, 2014 at 01:31:54AM +0100, Andi Kleen wrote:
> > this is *same* as parse_branch_mode function and seems to be not needed,
> > I think both branch-stack and branch-history options could be handled
> > by OPT_BOOLEAN
> 
> They can't because the type checker requires bool, and these variables
> are tristate (UNSET, TRUE, FALSE)
> 
> -Andi

hum, I can see only 2 states there (please check patch below).. what do I miss?

also, I think we need to fix the check for sort_mode setup,
because following: 

  $ perf report -b

wouldn't set sort__mode to SORT_MODE__BRANCH

how about -b and --branch-history .. do we allow?

jirka


---
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 16675a0..1433b9c 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -682,26 +682,6 @@ report_parse_ignore_callees_opt(const struct option *opt __maybe_unused,
 }
 
 static int
-parse_branch_mode(const struct option *opt __maybe_unused,
-		  const char *str __maybe_unused, int unset)
-{
-	int *branch_mode = opt->value;
-
-	*branch_mode = !unset;
-	return 0;
-}
-
-static int
-parse_branch_call_mode(const struct option *opt __maybe_unused,
-		  const char *str __maybe_unused, int unset)
-{
-	int *branch_mode = opt->value;
-
-	*branch_mode = !unset;
-	return 0;
-}
-
-static int
 parse_percent_limit(const struct option *opt, const char *str,
 		    int unset __maybe_unused)
 {
@@ -716,7 +696,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 	struct perf_session *session;
 	struct stat st;
 	bool has_br_stack = false;
-	int branch_mode = -1, branch_call_mode = -1;
+	bool branch_mode = false, branch_call_mode = false;
 	int ret = -1;
 	char callchain_default_opt[] = "fractal,0.5,callee";
 	const char * const report_usage[] = {
@@ -824,12 +804,10 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "Show a column with the sum of periods"),
 	OPT_BOOLEAN(0, "group", &symbol_conf.event_group,
 		    "Show event group information together"),
-	OPT_CALLBACK_NOOPT('b', "branch-stack", &branch_mode, "",
-		    "use branch records for per branch histogram filling",
-		    parse_branch_mode),
-	OPT_CALLBACK_NOOPT(0, "branch-history", &branch_call_mode, "",
-		    "add last branch records to call history",
-		    parse_branch_call_mode),
+	OPT_BOOLEAN('b', "branch-stack", &branch_mode,
+		    "use branch records for per branch histogram filling"),
+	OPT_BOOLEAN(0, "branch-history", &branch_call_mode,
+		    "add last branch records to call history"),
 	OPT_STRING(0, "objdump", &objdump_path, "path",
 		   "objdump binary to use for disassembly and annotations"),
 	OPT_BOOLEAN(0, "demangle", &symbol_conf.demangle,
@@ -877,9 +855,9 @@ repeat:
 	has_br_stack = perf_header__has_feat(&session->header,
 					     HEADER_BRANCH_STACK);
 
-	if (branch_mode == -1 && has_br_stack && branch_call_mode == -1)
+	if (!branch_mode && has_br_stack && !branch_call_mode)
 		sort__mode = SORT_MODE__BRANCH;
-	if (branch_call_mode != -1) {
+	if (branch_call_mode) {
 		callchain_param.branch_callstack = 1;
 		callchain_param.key = CCKEY_ADDRESS;
 		symbol_conf.use_callchain = true;

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/8] perf, tools: Add --branch-history option to report v2
  2014-03-11 15:25       ` Jiri Olsa
@ 2014-03-11 20:21         ` Andi Kleen
  0 siblings, 0 replies; 22+ messages in thread
From: Andi Kleen @ 2014-03-11 20:21 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andi Kleen, acme, mingo, linux-kernel, eranian, namhyung, Andi Kleen

On Tue, Mar 11, 2014 at 04:25:02PM +0100, Jiri Olsa wrote:
> On Tue, Mar 11, 2014 at 01:31:54AM +0100, Andi Kleen wrote:
> > > this is *same* as parse_branch_mode function and seems to be not needed,
> > > I think both branch-stack and branch-history options could be handled
> > > by OPT_BOOLEAN
> > 
> > They can't because the type checker requires bool, and these variables
> > are tristate (UNSET, TRUE, FALSE)
> > 
> > -Andi
> 
> hum, I can see only 2 states there (please check patch below).. what do I miss?

-1 means undefined. Your patch changes things that the override would
happen even if the user explicitely set these options. It's not
equivalent.


-Andi

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2014-03-11 20:21 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-28  4:22 perf: Implement lbr-as-callgraph v4 Andi Kleen
2014-02-28  4:22 ` [PATCH 1/8] perf, tools: fix BFD detection on opensuse Andi Kleen
2014-02-28  4:22 ` [PATCH 2/8] perf, tools: Support handling complete branch stacks as histograms v4 Andi Kleen
2014-03-05 10:29   ` Jiri Olsa
2014-03-05 15:31     ` Andi Kleen
2014-03-07 13:19       ` Jiri Olsa
2014-03-07 13:19   ` Jiri Olsa
2014-03-07 19:51     ` Arnaldo Carvalho de Melo
2014-03-07 19:54       ` Arnaldo Carvalho de Melo
2014-03-11  0:43         ` Andi Kleen
2014-02-28  4:22 ` [PATCH 3/8] perf, tools: Add --branch-history option to report v2 Andi Kleen
2014-03-07 13:19   ` Jiri Olsa
2014-03-11  0:31     ` Andi Kleen
2014-03-11 15:25       ` Jiri Olsa
2014-03-11 20:21         ` Andi Kleen
2014-02-28  4:22 ` [PATCH 4/8] perf, tools: Enable printing the srcline in the history v2 Andi Kleen
2014-03-01 16:22   ` Andi Kleen
2014-02-28  4:22 ` [PATCH 5/8] perf, tools: Only print base source file for srcline Andi Kleen
2014-02-28  4:22 ` [PATCH 6/8] perf, tools: Support source line numbers in annotate Andi Kleen
2014-02-28  4:22 ` [PATCH 7/8] perf, tools: Fix srcline sort key output to use width Andi Kleen
2014-02-28  4:22 ` [PATCH 8/8] tools, perf: Make get_srcline fall back to sym+offset Andi Kleen
2014-03-07 13:19   ` Jiri Olsa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.