linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/6] perf report: Show branch flags/cycles in --branch-history callgraph view
@ 2016-10-31  1:19 Jin Yao
  2016-10-31  1:19 ` [PATCH v4 1/6] perf report: Add branch flag to callchain cursor node Jin Yao
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Jin Yao @ 2016-10-31  1:19 UTC (permalink / raw)
  To: acme, jolsa; +Cc: Linux-kernel, ak, kan.liang, Jin Yao

v4: Updated according to Andi's comments. The requirement is to display
    the average number of iterations, not the number of removed loops.
    The number of iterations is computed by counting the removed loops.

v3: 1. Display the count of TSX aborts and remove the abort percentage.

    2. The branch history code has a loop detection pass,
       util/machine.c:remove_loops(), that removes small loops. It
       would be nice to note how many loops were removed, so a note is
       added to some callchain entries.

v2: Just a rebase to Arnaldo's perf/core branch, no functional changes.

Initial post

perf record -g -b ...
perf report --branch-history

Currently only the branches from the LBR are shown in the callgraph view.
It would be useful to also annotate branch predictions, TSX aborts and
timed LBR cycles in the callgraph view.

This would give a quick overview of how well branches are predicted and
how costly basic blocks are.

For example:

# Overhead  Source:Line                  Symbol                       Shared Object      Predicted  Abort  Cycles
# ........  ...........................  ...........................  .................  .........  .....  ......
#
    38.25%  div.c:45                     [.] main                     div                97.6%      0      3
            |
            ---main div.c:42 (cycles:2)
               compute_flag div.c:28 (cycles:2)
               compute_flag div.c:27 (cycles:1)
               rand rand.c:28 (cycles:1)
               rand rand.c:28 (cycles:1)
               __random random.c:298 (cycles:1)
               __random random.c:297 (cycles:1)
               __random random.c:295 (cycles:1)
               __random random.c:295 (cycles:1)
               __random random.c:295 (cycles:1)
               __random random.c:295 (cycles:9)
               |
               |--36.73%--__random_r random_r.c:392 (cycles:9)
               |          __random_r random_r.c:357 (cycles:1)
               |          __random random.c:293 (cycles:1)
               |          __random random.c:293 (cycles:1)
               |          __random random.c:291 (cycles:1)
               |          __random random.c:291 (cycles:1)
               |          __random random.c:291 (cycles:1)
               |          __random random.c:288 (cycles:1)
               |          rand rand.c:27 (cycles:1)
               |          rand rand.c:26 (cycles:1)
               |          rand@plt +4194304 (cycles:1)
               |          rand@plt +4194304 (cycles:1)
               |          compute_flag div.c:25 (cycles:1)
               |          compute_flag div.c:22 (cycles:1)
               |          main div.c:40 (cycles:1)
               |          main div.c:40 (cycles:16)
               |          main div.c:39 (cycles:16)
               |          |
               |          |--29.93%--main div.c:39 (predicted:50.6%, cycles:1, iterations:18)
               |          |          main div.c:44 (predicted:50.6%, cycles:1)
               |          |          |
               |          |           --22.69%--main div.c:42 (cycles:2, iterations:17)
               |          |                     compute_flag div.c:28 (cycles:2)
               |          |                     |
               |          |                      --10.52%--compute_flag div.c:27 (cycles:1)
               |          |                                rand rand.c:28 (cycles:1)

Jin Yao (6):
  perf report: Add branch flag to callchain cursor node
  perf report: Create a symbol_conf flag for showing branch flag
    counting
  perf report: Calculate and return the branch flag counting
  perf report: Show branch info in callchain entry for stdio mode
  perf report: Show branch info in callchain entry for browser mode
  perf report: Display columns Predicted/Abort/Cycles in
    --branch-history

 tools/perf/Documentation/perf-report.txt |   9 ++
 tools/perf/builtin-report.c              |   9 +-
 tools/perf/ui/browsers/hists.c           |  20 ++-
 tools/perf/ui/stdio/hist.c               |  35 +++++-
 tools/perf/util/callchain.c              | 203 ++++++++++++++++++++++++++++++-
 tools/perf/util/callchain.h              |  22 +++-
 tools/perf/util/hist.c                   |   3 +
 tools/perf/util/hist.h                   |   3 +
 tools/perf/util/machine.c                |  82 ++++++++++---
 tools/perf/util/sort.c                   | 113 ++++++++++++++++-
 tools/perf/util/sort.h                   |   3 +
 tools/perf/util/symbol.h                 |   1 +
 12 files changed, 476 insertions(+), 27 deletions(-)

-- 
2.7.4


* [PATCH v4 1/6] perf report: Add branch flag to callchain cursor node
  2016-10-31  1:19 [PATCH v4 0/6] perf report: Show branch flags/cycles in --branch-history callgraph view Jin Yao
@ 2016-10-31  1:19 ` Jin Yao
  2016-10-31  1:19 ` [PATCH v4 2/6] perf report: Create a symbol_conf flag for showing branch flag counting Jin Yao
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Jin Yao @ 2016-10-31  1:19 UTC (permalink / raw)
  To: acme, jolsa; +Cc: Linux-kernel, ak, kan.liang, Jin Yao

Since the branch ip has been added to the call stack for easier browsing,
this patch adds more branch information: a flag indicating whether the
ip is a branch, together with the branch flags.

With this we can tell whether a cursor node represents a branch and
which branch flags it has.

The branch history code has a loop detection pass that removes loops.
It would be nice to know how many loops were removed, so that in later
steps we can compute the average number of iterations.

For example:

Before remove_loops(),
entry0: from = 0x100, to = 0x200
entry1: from = 0x300, to = 0x250
entry2: from = 0x300, to = 0x250
entry3: from = 0x300, to = 0x250
entry4: from = 0x700, to = 0x800

After remove_loops()
entry0: from = 0x100, to = 0x200
entry1: from = 0x300, to = 0x250
entry2: from = 0x700, to = 0x800

The original entry2 and entry3 are removed. So the number of iterations
of the loop (from = 0x300, to = 0x250) is equal to the removed number + 1
(2 + 1 = 3).

iterations = removed number + 1;
average iterations = Sum(iterations) / number of samples

This formula ignores other cases, for example iterations that cross
multiple buffers, or one buffer that contains 2+ loops. In practice it
is good enough.
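
The arithmetic above can be tried out in isolation with a small sketch.
The names below (struct demo_branch, demo_collapse_repeats() and so on)
are hypothetical and only for illustration. In particular,
demo_collapse_repeats() is a simplified stand-in that drops adjacent
identical records; it does not reproduce the real remove_loops() in
util/machine.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a branch record; not perf's struct branch_entry. */
struct demo_branch {
	uint64_t from;
	uint64_t to;
};

/*
 * Collapse runs of adjacent identical (from, to) records in place and
 * return the new length. Simplification for illustration only: the real
 * remove_loops() is not reproduced here.
 */
size_t demo_collapse_repeats(struct demo_branch *be, size_t nr)
{
	size_t in, out = 0;

	for (in = 0; in < nr; in++) {
		if (out > 0 &&
		    be[out - 1].from == be[in].from &&
		    be[out - 1].to == be[in].to)
			continue;	/* repeat of the previous record: drop it */
		be[out++] = be[in];
	}
	return out;
}

/* iterations = removed number + 1, or 0 when nothing was removed. */
size_t demo_iterations(size_t nr_before, size_t nr_after)
{
	assert(nr_after <= nr_before);
	return nr_before > nr_after ? nr_before - nr_after + 1 : 0;
}

/* average iterations = Sum(iterations) / number of samples (integer division). */
uint64_t demo_average_iterations(uint64_t iter_sum, uint64_t samples)
{
	return samples ? iter_sum / samples : 0;
}
```

Feeding the five entries of the example through demo_collapse_repeats()
leaves three entries, and demo_iterations(5, 3) then gives 3, which
matches the "2 + 1" above.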

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/callchain.c | 14 ++++++--
 tools/perf/util/callchain.h |  8 ++++-
 tools/perf/util/machine.c   | 82 ++++++++++++++++++++++++++++++++++++---------
 3 files changed, 86 insertions(+), 18 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 07fd30b..9508023 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -730,7 +730,8 @@ merge_chain_branch(struct callchain_cursor *cursor,
 
 	list_for_each_entry_safe(list, next_list, &src->val, list) {
 		callchain_cursor_append(cursor, list->ip,
-					list->ms.map, list->ms.sym);
+					list->ms.map, list->ms.sym,
+					false, NULL, 0, 0);
 		list_del(&list->list);
 		free(list);
 	}
@@ -767,7 +768,9 @@ int callchain_merge(struct callchain_cursor *cursor,
 }
 
 int callchain_cursor_append(struct callchain_cursor *cursor,
-			    u64 ip, struct map *map, struct symbol *sym)
+			    u64 ip, struct map *map, struct symbol *sym,
+			    bool branch, struct branch_flags *flags,
+			    int iter, int samples)
 {
 	struct callchain_cursor_node *node = *cursor->last;
 
@@ -782,6 +785,13 @@ int callchain_cursor_append(struct callchain_cursor *cursor,
 	node->ip = ip;
 	node->map = map;
 	node->sym = sym;
+	node->branch = branch;
+	node->iter = iter;
+	node->samples = samples;
+
+	if (flags)
+		memcpy(&node->branch_flags, flags,
+			sizeof(struct branch_flags));
 
 	cursor->nr++;
 
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 13e7554..3bbf616 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -129,6 +129,10 @@ struct callchain_cursor_node {
 	u64				ip;
 	struct map			*map;
 	struct symbol			*sym;
+	bool				branch;
+	struct branch_flags		branch_flags;
+	int				iter;
+	int				samples;
 	struct callchain_cursor_node	*next;
 };
 
@@ -183,7 +187,9 @@ static inline void callchain_cursor_reset(struct callchain_cursor *cursor)
 }
 
 int callchain_cursor_append(struct callchain_cursor *cursor, u64 ip,
-			    struct map *map, struct symbol *sym);
+			    struct map *map, struct symbol *sym,
+			    bool branch, struct branch_flags *flags,
+			    int iter, int samples);
 
 /* Close a cursor writing session. Initialize for the reader */
 static inline void callchain_cursor_commit(struct callchain_cursor *cursor)
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index df85b9e..185c85a 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1616,7 +1616,11 @@ static int add_callchain_ip(struct thread *thread,
 			    struct symbol **parent,
 			    struct addr_location *root_al,
 			    u8 *cpumode,
-			    u64 ip)
+			    u64 ip,
+			    bool branch,
+			    struct branch_flags *flags,
+			    int iter,
+			    int samples)
 {
 	struct addr_location al;
 
@@ -1668,7 +1672,8 @@ static int add_callchain_ip(struct thread *thread,
 
 	if (symbol_conf.hide_unresolved && al.sym == NULL)
 		return 0;
-	return callchain_cursor_append(cursor, al.addr, al.map, al.sym);
+	return callchain_cursor_append(cursor, al.addr, al.map, al.sym,
+				       branch, flags, iter, samples);
 }
 
 struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
@@ -1757,7 +1762,9 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 	/* LBR only affects the user callchain */
 	if (i != chain_nr) {
 		struct branch_stack *lbr_stack = sample->branch_stack;
-		int lbr_nr = lbr_stack->nr, j;
+		int lbr_nr = lbr_stack->nr, j, k;
+		bool branch;
+		struct branch_flags *flags;
 		/*
 		 * LBR callstack can only get user call chain.
 		 * The mix_chain_nr is kernel call chain
@@ -1772,23 +1779,41 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 
 		for (j = 0; j < mix_chain_nr; j++) {
 			int err;
+			branch = false;
+			flags = NULL;
+
 			if (callchain_param.order == ORDER_CALLEE) {
 				if (j < i + 1)
 					ip = chain->ips[j];
-				else if (j > i + 1)
-					ip = lbr_stack->entries[j - i - 2].from;
-				else
+				else if (j > i + 1) {
+					k = j - i - 2;
+					ip = lbr_stack->entries[k].from;
+					branch = true;
+					flags = &lbr_stack->entries[k].flags;
+				} else {
 					ip = lbr_stack->entries[0].to;
+					branch = true;
+					flags = &lbr_stack->entries[0].flags;
+				}
 			} else {
-				if (j < lbr_nr)
-					ip = lbr_stack->entries[lbr_nr - j - 1].from;
+				if (j < lbr_nr) {
+					k = lbr_nr - j - 1;
+					ip = lbr_stack->entries[k].from;
+					branch = true;
+					flags = &lbr_stack->entries[k].flags;
+				}
 				else if (j > lbr_nr)
 					ip = chain->ips[i + 1 - (j - lbr_nr)];
-				else
+				else {
 					ip = lbr_stack->entries[0].to;
+					branch = true;
+					flags = &lbr_stack->entries[0].flags;
+				}
 			}
 
-			err = add_callchain_ip(thread, cursor, parent, root_al, &cpumode, ip);
+			err = add_callchain_ip(thread, cursor, parent,
+					       root_al, &cpumode, ip,
+					       branch, flags, 0, 0);
 			if (err)
 				return (err < 0) ? err : 0;
 		}
@@ -1813,6 +1838,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 	int i, j, err, nr_entries;
 	int skip_idx = -1;
 	int first_call = 0;
+	int iter;
 
 	if (perf_evsel__has_branch_callstack(evsel)) {
 		err = resolve_lbr_callchain_sample(thread, cursor, sample, parent,
@@ -1868,14 +1894,37 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 				be[i] = branch->entries[branch->nr - i - 1];
 		}
 
+		iter = nr;
 		nr = remove_loops(be, nr);
 
+		/*
+		 * Get the number of iterations.
+		 * It's only approximation, but good enough in practice.
+		 */
+		if (iter > nr)
+			iter = iter - nr + 1;
+		else
+			iter = 0;
+
 		for (i = 0; i < nr; i++) {
-			err = add_callchain_ip(thread, cursor, parent, root_al,
-					       NULL, be[i].to);
+			if (i == nr - 1)
+				err = add_callchain_ip(thread, cursor, parent,
+						       root_al,
+						       NULL, be[i].to,
+						       true, &be[i].flags,
+						       iter, 1);
+			else
+				err = add_callchain_ip(thread, cursor, parent,
+						       root_al,
+						       NULL, be[i].to,
+						       true, &be[i].flags,
+						       0, 0);
+
 			if (!err)
 				err = add_callchain_ip(thread, cursor, parent, root_al,
-						       NULL, be[i].from);
+						       NULL, be[i].from,
+						       true, &be[i].flags,
+						       0, 0);
 			if (err == -EINVAL)
 				break;
 			if (err)
@@ -1903,7 +1952,9 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 		if (ip < PERF_CONTEXT_MAX)
                        ++nr_entries;
 
-		err = add_callchain_ip(thread, cursor, parent, root_al, &cpumode, ip);
+		err = add_callchain_ip(thread, cursor, parent,
+				       root_al, &cpumode, ip,
+				       false, NULL, 0, 0);
 
 		if (err)
 			return (err < 0) ? err : 0;
@@ -1919,7 +1970,8 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 	if (symbol_conf.hide_unresolved && entry->sym == NULL)
 		return 0;
 	return callchain_cursor_append(cursor, entry->ip,
-				       entry->map, entry->sym);
+				       entry->map, entry->sym,
+				       false, NULL, 0, 0);
 }
 
 static int thread__resolve_callchain_unwind(struct thread *thread,
-- 
2.7.4


* [PATCH v4 2/6] perf report: Create a symbol_conf flag for showing branch flag counting
  2016-10-31  1:19 [PATCH v4 0/6] perf report: Show branch flags/cycles in --branch-history callgraph view Jin Yao
  2016-10-31  1:19 ` [PATCH v4 1/6] perf report: Add branch flag to callchain cursor node Jin Yao
@ 2016-10-31  1:19 ` Jin Yao
  2016-11-15 10:47   ` [tip:perf/core] " tip-bot for Jin Yao
  2016-10-31  1:19 ` [PATCH v4 3/6] perf report: Calculate and return the " Jin Yao
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Jin Yao @ 2016-10-31  1:19 UTC (permalink / raw)
  To: acme, jolsa; +Cc: Linux-kernel, ak, kan.liang, Jin Yao

Create a new flag show_branchflag_count in symbol_conf. The flag controls
whether the branch flag counting information is shown. It is set only
when perf.data has branch data and the user chooses the
"--branch-history" option on the perf report command line.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/builtin-report.c | 3 +++
 tools/perf/util/symbol.h    | 1 +
 2 files changed, 4 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 8064de8..3dfbfff 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -911,6 +911,9 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (itrace_synth_opts.last_branch)
 		has_br_stack = true;
 
+	if (has_br_stack && branch_call_mode)
+		symbol_conf.show_branchflag_count = true;
+
 	/*
 	 * Branch mode is a tristate:
 	 * -1 means default, so decide based on the file having branch data.
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index d964844..2d0a905 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -100,6 +100,7 @@ struct symbol_conf {
 			show_total_period,
 			use_callchain,
 			cumulate_callchain,
+			show_branchflag_count,
 			exclude_other,
 			show_cpu_utilization,
 			initialized,
-- 
2.7.4


* [PATCH v4 3/6] perf report: Calculate and return the branch flag counting
  2016-10-31  1:19 [PATCH v4 0/6] perf report: Show branch flags/cycles in --branch-history callgraph view Jin Yao
  2016-10-31  1:19 ` [PATCH v4 1/6] perf report: Add branch flag to callchain cursor node Jin Yao
  2016-10-31  1:19 ` [PATCH v4 2/6] perf report: Create a symbol_conf flag for showing branch flag counting Jin Yao
@ 2016-10-31  1:19 ` Jin Yao
  2016-11-15 10:47   ` [tip:perf/core] perf report: Calculate " tip-bot for Jin Yao
  2016-10-31  1:19 ` [PATCH v4 4/6] perf report: Show branch info in callchain entry for stdio mode Jin Yao
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Jin Yao @ 2016-10-31  1:19 UTC (permalink / raw)
  To: acme, jolsa; +Cc: Linux-kernel, ak, kan.liang, Jin Yao

Create some branch counters in each callchain list entry. Each counter
corresponds to a branch flag. For example, predicted_count counts all the
*predicted* branches. The counters are updated while processing the
callchain cursor nodes.

Functions are also provided to retrieve or print the counter values
of a callchain list.

Besides counting the branch flags, it also counts and returns the
average number of iterations.
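
As a rough illustration of how such counters accumulate and how the
displayed values can be derived from them, here is a minimal standalone
sketch. The struct and function names (demo_counts, demo_counts_add()
and friends) are hypothetical; only the counter field names are taken
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the per-entry counters; field names follow the patch. */
struct demo_counts {
	uint64_t branch_count;
	uint64_t predicted_count;
	uint64_t abort_count;
	uint64_t cycles_count;
};

/* Fold one branch observation into the counters. */
void demo_counts_add(struct demo_counts *c, bool predicted, bool aborted,
		     uint64_t cycles)
{
	assert(c != NULL);

	c->branch_count++;
	if (predicted)
		c->predicted_count++;
	if (aborted)
		c->abort_count++;
	c->cycles_count += cycles;
}

/* Percentage of predicted branches; 0.0 when no branch was counted. */
double demo_predicted_percent(const struct demo_counts *c)
{
	if (!c->branch_count)
		return 0.0;
	return c->predicted_count * 100.0 / c->branch_count;
}

/* Average cycles per branch (integer division); 0 when no branch was counted. */
uint64_t demo_avg_cycles(const struct demo_counts *c)
{
	return c->branch_count ? c->cycles_count / c->branch_count : 0;
}
```

Keeping raw counts and dividing only at print time means that two
entries can be merged by simply adding their counters, which is what
the match_chain() hunk below does for matching nodes.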

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/callchain.c | 189 +++++++++++++++++++++++++++++++++++++++++++-
 tools/perf/util/callchain.h |  14 ++++
 2 files changed, 202 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 9508023..dcdb737 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -440,6 +440,21 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor)
 		call->ip = cursor_node->ip;
 		call->ms.sym = cursor_node->sym;
 		call->ms.map = cursor_node->map;
+
+		if (cursor_node->branch) {
+			call->branch_count = 1;
+
+			if (cursor_node->branch_flags.predicted)
+				call->predicted_count = 1;
+
+			if (cursor_node->branch_flags.abort)
+				call->abort_count = 1;
+
+			call->cycles_count = cursor_node->branch_flags.cycles;
+			call->iter_count = cursor_node->iter;
+			call->samples_count = cursor_node->samples;
+		}
+
 		list_add_tail(&call->list, &node->val);
 
 		callchain_cursor_advance(cursor);
@@ -499,8 +514,23 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
 		right = node->ip;
 	}
 
-	if (left == right)
+	if (left == right) {
+		if (node->branch) {
+			cnode->branch_count++;
+
+			if (node->branch_flags.predicted)
+				cnode->predicted_count++;
+
+			if (node->branch_flags.abort)
+				cnode->abort_count++;
+
+			cnode->cycles_count += node->branch_flags.cycles;
+			cnode->iter_count += node->iter;
+			cnode->samples_count += node->samples;
+		}
+
 		return MATCH_EQ;
+	}
 
 	return left > right ? MATCH_GT : MATCH_LT;
 }
@@ -949,6 +979,163 @@ int callchain_node__fprintf_value(struct callchain_node *node,
 	return 0;
 }
 
+static void callchain_counts_value(struct callchain_node *node,
+				   u64 *branch_count, u64 *predicted_count,
+				   u64 *abort_count, u64 *cycles_count)
+{
+	struct callchain_list *clist;
+
+	list_for_each_entry(clist, &node->val, list) {
+		if (branch_count)
+			*branch_count += clist->branch_count;
+
+		if (predicted_count)
+			*predicted_count += clist->predicted_count;
+
+		if (abort_count)
+			*abort_count += clist->abort_count;
+
+		if (cycles_count)
+			*cycles_count += clist->cycles_count;
+	}
+}
+
+static int callchain_node_branch_counts_cumul(struct callchain_node *node,
+					      u64 *branch_count,
+					      u64 *predicted_count,
+					      u64 *abort_count,
+					      u64 *cycles_count)
+{
+	struct callchain_node *child;
+	struct rb_node *n;
+
+	n = rb_first(&node->rb_root_in);
+	while (n) {
+		child = rb_entry(n, struct callchain_node, rb_node_in);
+		n = rb_next(n);
+
+		callchain_node_branch_counts_cumul(child, branch_count,
+						   predicted_count,
+						   abort_count,
+						   cycles_count);
+
+		callchain_counts_value(child, branch_count,
+				       predicted_count, abort_count,
+				       cycles_count);
+	}
+
+	return 0;
+}
+
+int callchain_branch_counts(struct callchain_root *root,
+			    u64 *branch_count, u64 *predicted_count,
+			    u64 *abort_count, u64 *cycles_count)
+{
+	if (branch_count)
+		*branch_count = 0;
+
+	if (predicted_count)
+		*predicted_count = 0;
+
+	if (abort_count)
+		*abort_count = 0;
+
+	if (cycles_count)
+		*cycles_count = 0;
+
+	return callchain_node_branch_counts_cumul(&root->node,
+						  branch_count,
+						  predicted_count,
+						  abort_count,
+						  cycles_count);
+}
+
+static int callchain_counts_printf(FILE *fp, char *bf, int bfsize,
+				   u64 branch_count, u64 predicted_count,
+				   u64 abort_count, u64 cycles_count,
+				   u64 iter_count, u64 samples_count)
+{
+	double predicted_percent = 0.0;
+	const char *null_str = "";
+	char iter_str[32];
+	char *str;
+	u64 cycles = 0;
+
+	if (branch_count == 0) {
+		if (fp)
+			return fprintf(fp, " (calltrace)");
+
+		return scnprintf(bf, bfsize, " (calltrace)");
+	}
+
+	if (iter_count && samples_count) {
+		scnprintf(iter_str, sizeof(iter_str),
+			 ", iterations:%" PRId64 "",
+			 iter_count / samples_count);
+		str = iter_str;
+	} else
+		str = (char *)null_str;
+
+	predicted_percent = predicted_count * 100.0 / branch_count;
+	cycles = cycles_count / branch_count;
+
+	if ((predicted_percent >= 100.0) && (abort_count == 0)) {
+		if (fp)
+			return fprintf(fp, " (cycles:%" PRId64 "%s)",
+				       cycles, str);
+
+		return scnprintf(bf, bfsize, " (cycles:%" PRId64 "%s)",
+				 cycles, str);
+	}
+
+	if ((predicted_percent < 100.0) && (abort_count == 0)) {
+		if (fp)
+			return fprintf(fp,
+				" (predicted:%.1f%%, cycles:%" PRId64 "%s)",
+				predicted_percent, cycles, str);
+
+		return scnprintf(bf, bfsize,
+			" (predicted:%.1f%%, cycles:%" PRId64 "%s)",
+			predicted_percent, cycles, str);
+	}
+
+	if (fp)
+		return fprintf(fp,
+		" (predicted:%.1f%%, abort:%" PRId64 ", cycles:%" PRId64 "%s)",
+			predicted_percent, abort_count, cycles, str);
+
+	return scnprintf(bf, bfsize,
+		" (predicted:%.1f%%, abort:%" PRId64 ", cycles:%" PRId64 "%s)",
+		predicted_percent, abort_count, cycles, str);
+}
+
+int callchain_list_counts__printf_value(struct callchain_node *node,
+					struct callchain_list *clist,
+					FILE *fp, char *bf, int bfsize)
+{
+	u64 branch_count, predicted_count;
+	u64 abort_count, cycles_count;
+	u64 iter_count = 0, samples_count = 0;
+
+	branch_count = clist->branch_count;
+	predicted_count = clist->predicted_count;
+	abort_count = clist->abort_count;
+	cycles_count = clist->cycles_count;
+
+	if (node) {
+		struct callchain_list *call;
+
+		list_for_each_entry(call, &node->val, list) {
+			iter_count += call->iter_count;
+			samples_count += call->samples_count;
+		}
+	}
+
+	return callchain_counts_printf(fp, bf, bfsize, branch_count,
+				       predicted_count, abort_count,
+				       cycles_count, iter_count, samples_count);
+}
+
 static void free_callchain_node(struct callchain_node *node)
 {
 	struct callchain_list *list, *tmp;
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 3bbf616..82159f9 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -115,6 +115,12 @@ struct callchain_list {
 		bool		unfolded;
 		bool		has_children;
 	};
+	u64			branch_count;
+	u64			predicted_count;
+	u64			abort_count;
+	u64			cycles_count;
+	u64			iter_count;
+	u64			samples_count;
 	char		       *srcline;
 	struct list_head	list;
 };
@@ -267,8 +273,16 @@ char *callchain_node__scnprintf_value(struct callchain_node *node,
 int callchain_node__fprintf_value(struct callchain_node *node,
 				  FILE *fp, u64 total);
 
+int callchain_list_counts__printf_value(struct callchain_node *node,
+					struct callchain_list *clist,
+					FILE *fp, char *bf, int bfsize);
+
 void free_callchain(struct callchain_root *root);
 void decay_callchain(struct callchain_root *root);
 int callchain_node__make_parent_list(struct callchain_node *node);
 
+int callchain_branch_counts(struct callchain_root *root,
+			    u64 *branch_count, u64 *predicted_count,
+			    u64 *abort_count, u64 *cycles_count);
+
 #endif	/* __PERF_CALLCHAIN_H */
-- 
2.7.4


* [PATCH v4 4/6] perf report: Show branch info in callchain entry for stdio mode
  2016-10-31  1:19 [PATCH v4 0/6] perf report: Show branch flags/cycles in --branch-history callgraph view Jin Yao
                   ` (2 preceding siblings ...)
  2016-10-31  1:19 ` [PATCH v4 3/6] perf report: Calculate and return the " Jin Yao
@ 2016-10-31  1:19 ` Jin Yao
  2016-11-14 16:34   ` Arnaldo Carvalho de Melo
  2016-11-15 10:48   ` [tip:perf/core] " tip-bot for Jin Yao
  2016-10-31  1:19 ` [PATCH v4 5/6] perf report: Show branch info in callchain entry for browser mode Jin Yao
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 15+ messages in thread
From: Jin Yao @ 2016-10-31  1:19 UTC (permalink / raw)
  To: acme, jolsa; +Cc: Linux-kernel, ak, kan.liang, Jin Yao

If the branch is 100% predicted then "predicted" is hidden.
Similarly, if there is no TSX abort, "abort" is hidden.
In that case only the cycles are shown (cycles are supported on the
Skylake platform; older platforms report 0).

If there are no iterations, "iterations" is hidden.

For example:

|--29.93%--main div.c:39 (predicted:50.6%, cycles:1, iterations:18)
|          main div.c:44 (predicted:50.6%, cycles:1)
|          |
|           --22.69%--main div.c:42 (cycles:2, iterations:17)
|                     compute_flag div.c:28 (cycles:2)
|                     |
|                      --10.52%--compute_flag div.c:27 (cycles:1)
|                                rand rand.c:28 (cycles:1)
|                                rand rand.c:28 (cycles:1)
|                                __random random.c:298 (cycles:1)
|                                __random random.c:297 (cycles:1)
|                                __random random.c:295 (cycles:1)
|                                __random random.c:295 (cycles:1)
|                                __random random.c:295 (cycles:1)
|                                __random random.c:295 (cycles:6)
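
The hiding rules described above can be sketched as a small formatting
helper. This is only an illustration under assumptions: the helper name
demo_format_branch_info() is hypothetical, and it applies each rule
independently as the text states them, whereas callchain_counts_printf()
from patch 3/6 groups the cases somewhat differently (it prints
"predicted" whenever there is an abort).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Format the annotation for one callchain entry into buf and return the
 * length snprintf() reports. Hypothetical helper for illustration only.
 */
int demo_format_branch_info(char *buf, size_t size, double predicted_percent,
			    uint64_t abort_count, uint64_t cycles,
			    uint64_t iterations)
{
	char pred[64] = "", abrt[40] = "", iter[40] = "";

	assert(buf != NULL && size > 0);

	/* "predicted" is hidden when every branch was predicted. */
	if (predicted_percent < 100.0)
		snprintf(pred, sizeof(pred), "predicted:%.1f%%, ",
			 predicted_percent);

	/* "abort" is hidden when there was no TSX abort. */
	if (abort_count)
		snprintf(abrt, sizeof(abrt), "abort:%" PRIu64 ", ",
			 abort_count);

	/* "iterations" is hidden when no iterations were recorded. */
	if (iterations)
		snprintf(iter, sizeof(iter), ", iterations:%" PRIu64,
			 iterations);

	/* Cycles are always shown. */
	return snprintf(buf, size, " (%s%scycles:%" PRIu64 "%s)",
			pred, abrt, cycles, iter);
}
```

Called with 50.6% predicted, no aborts, 1 cycle and 18 iterations, the
helper produces the same annotation as the first line of the example
output.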

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/ui/stdio/hist.c | 35 +++++++++++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 89d8441..668f4ae 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -41,7 +41,9 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_node *node,
 {
 	int i;
 	size_t ret = 0;
-	char bf[1024];
+	char bf[1024], *alloc_str = NULL;
+	char buf[64];
+	const char *str;
 
 	ret += callchain__fprintf_left_margin(fp, left_margin);
 	for (i = 0; i < depth; i++) {
@@ -56,8 +58,26 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_node *node,
 		} else
 			ret += fprintf(fp, "%s", "          ");
 	}
-	fputs(callchain_list__sym_name(chain, bf, sizeof(bf), false), fp);
+
+	str = callchain_list__sym_name(chain, bf, sizeof(bf), false);
+
+	if (symbol_conf.show_branchflag_count) {
+		if (!period)
+			callchain_list_counts__printf_value(node, chain, NULL,
+							    buf, sizeof(buf));
+		else
+			callchain_list_counts__printf_value(NULL, chain, NULL,
+							    buf, sizeof(buf));
+
+		if (asprintf(&alloc_str, "%s%s", str, buf) < 0)
+			str = "Not enough memory!";
+		else
+			str = alloc_str;
+	}
+
+	fputs(str, fp);
 	fputc('\n', fp);
+	free(alloc_str);
 	return ret;
 }
 
@@ -219,8 +239,15 @@ static size_t callchain__fprintf_graph(FILE *fp, struct rb_root *root,
 			} else
 				ret += callchain__fprintf_left_margin(fp, left_margin);
 
-			ret += fprintf(fp, "%s\n", callchain_list__sym_name(chain, bf, sizeof(bf),
-							false));
+			ret += fprintf(fp, "%s",
+				       callchain_list__sym_name(chain, bf,
+								sizeof(bf),
+								false));
+
+			if (symbol_conf.show_branchflag_count)
+				ret += callchain_list_counts__printf_value(
+						NULL, chain, fp, NULL, 0);
+			ret += fprintf(fp, "\n");
 
 			if (++entries_printed == callchain_param.print_limit)
 				break;
-- 
2.7.4


* [PATCH v4 5/6] perf report: Show branch info in callchain entry for browser mode
  2016-10-31  1:19 [PATCH v4 0/6] perf report: Show branch flags/cycles in --branch-history callgraph view Jin Yao
                   ` (3 preceding siblings ...)
  2016-10-31  1:19 ` [PATCH v4 4/6] perf report: Show branch info in callchain entry for stdio mode Jin Yao
@ 2016-10-31  1:19 ` Jin Yao
  2016-11-15 10:49   ` [tip:perf/core] " tip-bot for Jin Yao
  2016-10-31  1:19 ` [PATCH v4 6/6] perf report: Display columns Predicted/Abort/Cycles in --branch-history Jin Yao
  2016-11-14 14:30 ` [PATCH v4 0/6] perf report: Show branch flags/cycles in --branch-history callgraph view Arnaldo Carvalho de Melo
  6 siblings, 1 reply; 15+ messages in thread
From: Jin Yao @ 2016-10-31  1:19 UTC (permalink / raw)
  To: acme, jolsa; +Cc: Linux-kernel, ak, kan.liang, Jin Yao

If the branch is 100% predicted then "predicted" is hidden.
Similarly, if there is no TSX abort, "abort" is hidden.
In that case only the cycles are shown (cycles are supported on the
Skylake platform; older platforms report 0).

If there are no iterations, "iterations" is hidden.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/ui/browsers/hists.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 84f5dd2..66676cb 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -738,6 +738,7 @@ static int hist_browser__show_callchain_list(struct hist_browser *browser,
 					     struct callchain_print_arg *arg)
 {
 	char bf[1024], *alloc_str;
+	char buf[64], *alloc_str2;
 	const char *str;
 
 	if (arg->row_offset != 0) {
@@ -746,12 +747,26 @@ static int hist_browser__show_callchain_list(struct hist_browser *browser,
 	}
 
 	alloc_str = NULL;
+	alloc_str2 = NULL;
+
 	str = callchain_list__sym_name(chain, bf, sizeof(bf),
 				       browser->show_dso);
 
-	if (need_percent) {
-		char buf[64];
+	if (symbol_conf.show_branchflag_count) {
+		if (need_percent)
+			callchain_list_counts__printf_value(node, chain, NULL,
+							    buf, sizeof(buf));
+		else
+			callchain_list_counts__printf_value(NULL, chain, NULL,
+							    buf, sizeof(buf));
+
+		if (asprintf(&alloc_str2, "%s%s", str, buf) < 0)
+			str = "Not enough memory!";
+		else
+			str = alloc_str2;
+	}
 
+	if (need_percent) {
 		callchain_node__scnprintf_value(node, buf, sizeof(buf),
 						total);
 
@@ -764,6 +779,7 @@ static int hist_browser__show_callchain_list(struct hist_browser *browser,
 	print(browser, chain, str, offset, row, arg);
 
 	free(alloc_str);
+	free(alloc_str2);
 	return 1;
 }
 
-- 
2.7.4


* [PATCH v4 6/6] perf report: Display columns Predicted/Abort/Cycles in --branch-history
  2016-10-31  1:19 [PATCH v4 0/6] perf report: Show branch flags/cycles in --branch-history callgraph view Jin Yao
                   ` (4 preceding siblings ...)
  2016-10-31  1:19 ` [PATCH v4 5/6] perf report: Show branch info in callchain entry for browser mode Jin Yao
@ 2016-10-31  1:19 ` Jin Yao
  2016-11-14 14:30 ` [PATCH v4 0/6] perf report: Show branch flags/cycles in --branch-history callgraph view Arnaldo Carvalho de Melo
  6 siblings, 0 replies; 15+ messages in thread
From: Jin Yao @ 2016-10-31  1:19 UTC (permalink / raw)
  To: acme, jolsa; +Cc: Linux-kernel, ak, kan.liang, Jin Yao

Reuse the existing sort mechanism to add the columns "Predicted", "Abort"
and "Cycles" to the display. Their .se_cmp() callbacks always return 0, so
these keys are shown as columns but never affect the ordering of entries.

For example:

Overhead  Source:Line   Symbol    Shared Object  Predicted  Abort  Cycles
........  ............  ........  .............  .........  .....  ......

  38.25%  div.c:45      [.] main  div            97.6%      0      3

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/Documentation/perf-report.txt |   9 +++
 tools/perf/builtin-report.c              |   6 +-
 tools/perf/util/hist.c                   |   3 +
 tools/perf/util/hist.h                   |   3 +
 tools/perf/util/sort.c                   | 113 ++++++++++++++++++++++++++++++-
 tools/perf/util/sort.h                   |   3 +
 6 files changed, 135 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 2d17462..f728936 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -335,6 +335,15 @@ OPTIONS
 --branch-history::
 	Add the addresses of sampled taken branches to the callstack.
 	This allows to examine the path the program took to each sample.
+
+	Also show some branch flags, which can be:
+	- Predicted: display the average percentage of predicted branches.
+		     (predicted number / total number)
+	- Abort: display the number of tsx aborted branches.
+	- Cycles: cycles in basic block.
+
+	- iterations: display the average number of iterations in callchain list.
+
 	The data collection must have used -b (or -j) and -g.
 
 --objdump=<path>::
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 3dfbfff..fbcd035 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -670,6 +670,10 @@ const char report_callchain_help[] = "Display call graph (stack chain/backtrace)
 				     CALLCHAIN_REPORT_HELP
 				     "\n\t\t\t\tDefault: " CALLCHAIN_DEFAULT_OPT;
 
+#define CALLCHAIN_BRANCH_SORT_ORDER	\
+	"srcline,symbol,dso,callchain_branch_predicted," \
+	"callchain_branch_abort,callchain_branch_cycles"
+
 int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 {
 	struct perf_session *session;
@@ -930,7 +934,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 		symbol_conf.use_callchain = true;
 		callchain_register_param(&callchain_param);
 		if (sort_order == NULL)
-			sort_order = "srcline,symbol,dso";
+			sort_order = CALLCHAIN_BRANCH_SORT_ORDER;
 	}
 
 	if (report.mem_mode) {
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index e1be413..2470fff 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -176,6 +176,9 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 	hists__new_col_len(hists, HISTC_MEM_LVL, 21 + 3);
 	hists__new_col_len(hists, HISTC_LOCAL_WEIGHT, 12);
 	hists__new_col_len(hists, HISTC_GLOBAL_WEIGHT, 12);
+	hists__new_col_len(hists, HISTC_CALLCHAIN_BRANCH_PREDICTED, 9);
+	hists__new_col_len(hists, HISTC_CALLCHAIN_BRANCH_ABORT, 5);
+	hists__new_col_len(hists, HISTC_CALLCHAIN_BRANCH_CYCLES, 6);
 
 	if (h->srcline) {
 		len = MAX(strlen(h->srcline), strlen(sort_srcline.se_header));
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index d4b6514..74e1dd4 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -57,6 +57,9 @@ enum hist_column {
 	HISTC_SRCLINE_FROM,
 	HISTC_SRCLINE_TO,
 	HISTC_TRACE,
+	HISTC_CALLCHAIN_BRANCH_PREDICTED,
+	HISTC_CALLCHAIN_BRANCH_ABORT,
+	HISTC_CALLCHAIN_BRANCH_CYCLES,
 	HISTC_NR_COLS, /* Last entry */
 };
 
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index df622f4..96bec7f 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -435,6 +435,102 @@ struct sort_entry sort_srcline_to = {
 	.se_width_idx	= HISTC_SRCLINE_TO,
 };
 
+/* --sort callchain_branch_predicted */
+
+static int64_t
+sort__callchain_branch_predicted_cmp(struct hist_entry *left __maybe_unused,
+				     struct hist_entry *right __maybe_unused)
+{
+	return 0;
+}
+
+static int hist_entry__callchain_branch_predicted_snprintf(
+	struct hist_entry *he, char *bf, size_t size, unsigned int width)
+{
+	u64 branch_count, predicted_count;
+	double percent = 0.0;
+	char str[32];
+
+	callchain_branch_counts(he->callchain, &branch_count,
+				&predicted_count, NULL, NULL);
+
+	if (branch_count)
+		percent = predicted_count * 100.0 / branch_count;
+
+	snprintf(str, sizeof(str), "%.1f%%", percent);
+	return repsep_snprintf(bf, size, "%-*.*s", width, width, str);
+}
+
+struct sort_entry sort_callchain_branch_predicted = {
+	.se_header	= "Predicted",
+	.se_cmp		= sort__callchain_branch_predicted_cmp,
+	.se_snprintf	= hist_entry__callchain_branch_predicted_snprintf,
+	.se_width_idx	= HISTC_CALLCHAIN_BRANCH_PREDICTED,
+};
+
+/* --sort callchain_branch_abort */
+
+static int64_t
+sort__callchain_branch_abort_cmp(struct hist_entry *left __maybe_unused,
+				 struct hist_entry *right __maybe_unused)
+{
+	return 0;
+}
+
+static int hist_entry__callchain_branch_abort_snprintf(struct hist_entry *he,
+						       char *bf, size_t size,
+						       unsigned int width)
+{
+	u64 branch_count, abort_count;
+	char str[32];
+
+	callchain_branch_counts(he->callchain, &branch_count,
+				NULL, &abort_count, NULL);
+
+	snprintf(str, sizeof(str), "%" PRId64, abort_count);
+	return repsep_snprintf(bf, size, "%-*.*s", width, width, str);
+}
+
+struct sort_entry sort_callchain_branch_abort = {
+	.se_header	= "Abort",
+	.se_cmp		= sort__callchain_branch_abort_cmp,
+	.se_snprintf	= hist_entry__callchain_branch_abort_snprintf,
+	.se_width_idx	= HISTC_CALLCHAIN_BRANCH_ABORT,
+};
+
+/* --sort callchain_branch_cycles */
+
+static int64_t
+sort__callchain_branch_cycles_cmp(struct hist_entry *left __maybe_unused,
+				  struct hist_entry *right __maybe_unused)
+{
+	return 0;
+}
+
+static int hist_entry__callchain_branch_cycles_snprintf(struct hist_entry *he,
+							char *bf, size_t size,
+							unsigned int width)
+{
+	u64 branch_count, cycles_count, cycles = 0;
+	char str[32];
+
+	callchain_branch_counts(he->callchain, &branch_count,
+				NULL, NULL, &cycles_count);
+
+	if (branch_count)
+		cycles = cycles_count / branch_count;
+
+	snprintf(str, sizeof(str), "%" PRId64 "", cycles);
+	return repsep_snprintf(bf, size, "%-*.*s", width, width, str);
+}
+
+struct sort_entry sort_callchain_branch_cycles = {
+	.se_header	= "Cycles",
+	.se_cmp		= sort__callchain_branch_cycles_cmp,
+	.se_snprintf	= hist_entry__callchain_branch_cycles_snprintf,
+	.se_width_idx	= HISTC_CALLCHAIN_BRANCH_CYCLES,
+};
+
 /* --sort srcfile */
 
 static char no_srcfile[1];
@@ -1435,6 +1531,15 @@ static struct sort_dimension bstack_sort_dimensions[] = {
 	DIM(SORT_CYCLES, "cycles", sort_cycles),
 	DIM(SORT_SRCLINE_FROM, "srcline_from", sort_srcline_from),
 	DIM(SORT_SRCLINE_TO, "srcline_to", sort_srcline_to),
+	DIM(SORT_CALLCHAIN_BRANCH_PREDICTED,
+		"callchain_branch_predicted",
+		sort_callchain_branch_predicted),
+	DIM(SORT_CALLCHAIN_BRANCH_ABORT,
+		"callchain_branch_abort",
+		sort_callchain_branch_abort),
+	DIM(SORT_CALLCHAIN_BRANCH_CYCLES,
+		"callchain_branch_cycles",
+		sort_callchain_branch_cycles),
 };
 
 #undef DIM
@@ -2369,7 +2474,13 @@ int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
 		if (strncasecmp(tok, sd->name, strlen(tok)))
 			continue;
 
-		if (sort__mode != SORT_MODE__BRANCH)
+		if ((sort__mode != SORT_MODE__BRANCH) &&
+			strncasecmp(tok, "callchain_branch_predicted",
+				    strlen(tok)) &&
+			strncasecmp(tok, "callchain_branch_abort",
+				    strlen(tok)) &&
+			strncasecmp(tok, "callchain_branch_cycles",
+				    strlen(tok)))
 			return -EINVAL;
 
 		if (sd->entry == &sort_sym_from || sd->entry == &sort_sym_to)
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 7aff317..30c6e97 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -224,6 +224,9 @@ enum sort_type {
 	SORT_CYCLES,
 	SORT_SRCLINE_FROM,
 	SORT_SRCLINE_TO,
+	SORT_CALLCHAIN_BRANCH_PREDICTED,
+	SORT_CALLCHAIN_BRANCH_ABORT,
+	SORT_CALLCHAIN_BRANCH_CYCLES,
 
 	/* memory mode specific sort keys */
 	__SORT_MEMORY_MODE,
-- 
2.7.4
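
The display-only columns described in the commit message above can be
illustrated with a small standalone sketch. This is not perf code: the
struct demo_entry and the demo_* function names are hypothetical stand-ins,
and only the arithmetic (predicted / total as a percentage, cycles averaged
per branch, a compare callback that always returns 0) follows the helpers
in the diff.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the branch counters gathered per hist entry. */
struct demo_entry {
	uint64_t branch_count;
	uint64_t predicted_count;
	uint64_t cycles_count;
};

/* Display-only key: every pair compares equal, so ordering is untouched. */
int64_t demo_display_only_cmp(const struct demo_entry *left,
			      const struct demo_entry *right)
{
	(void)left;
	(void)right;
	return 0;
}

/* "Predicted" column: predicted / total in percent, 0.0% without branches. */
int demo_predicted_snprintf(const struct demo_entry *e, char *bf, size_t size)
{
	double percent = 0.0;

	if (e->branch_count)
		percent = e->predicted_count * 100.0 / e->branch_count;

	return snprintf(bf, size, "%.1f%%", percent);
}

/* "Cycles" column: integer average of cycles per counted branch. */
int demo_cycles_snprintf(const struct demo_entry *e, char *bf, size_t size)
{
	uint64_t cycles = 0;

	if (e->branch_count)
		cycles = e->cycles_count / e->branch_count;

	return snprintf(bf, size, "%" PRIu64, cycles);
}
```

Because the compare callback reports every pair as equal, a sort that
consults it falls through to the remaining keys (srcline, symbol, dso in
the default order set by this patch), which is why the new columns show up
without changing how entries are grouped.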

* Re: [PATCH v4 0/6] perf report: Show branch flags/cycles in --branch-history callgraph view
  2016-10-31  1:19 [PATCH v4 0/6] perf report: Show branch flags/cycles in --branch-history callgraph view Jin Yao
                   ` (5 preceding siblings ...)
  2016-10-31  1:19 ` [PATCH v4 6/6] perf report: Display columns Predicted/Abort/Cycles in --branch-history Jin Yao
@ 2016-11-14 14:30 ` Arnaldo Carvalho de Melo
  2016-11-14 14:49   ` Andi Kleen
  6 siblings, 1 reply; 15+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-11-14 14:30 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jin Yao, jolsa, Linux-kernel, ak, kan.liang

Em Mon, Oct 31, 2016 at 09:19:48AM +0800, Jin Yao escreveu:
> v4: Update according to Andi's comments. The requirement is not displaying
>     the number of removed loops. It needs to display the average number of
>     iterations. It computes out the number of iterations by counting
>     the removed loops. 
> 
> v3: 1. Display the count for tsx abort, remove the abort percentage.
> 
>     2. Since the branch history code has a loop detection that removes
>        small loops in util/machine.c:remove_loops(). It would be nice to
>        note how many loops were removed. So it adds the note on some
>        callchain entries.
> 
> v2: Just a rebase to Arnaldo's perf/core branch, no functional changes.

Andi, are you ok with this now? Can I have your Acked-by or Tested-by?

Thanks,

- Arnaldo
 
> Initial post
> 
> perf record -g -b ...
> perf report --branch-history
> 
> Currently it only shows the branches from the LBR in the callgraph view.
> It would be useful to annotate branch predictions and TSX aborts and
> also timed LBR cycles also in the callgraph view.
> 
> This would allow a quick overview where branch predictions are and how
> costly basic blocks are.
> 
> For example:
> 
> # Overhead  Source:Line                  Symbol                       Shared Object      Predicted  Abort  Cycles
> # ........  ...........................  ...........................  .................  .........  .....  ......
> #
>     38.25%  div.c:45                     [.] main                     div                97.6%      0      3
>             |
>             ---main div.c:42 (cycles:2)
>                compute_flag div.c:28 (cycles:2)
>                compute_flag div.c:27 (cycles:1)
>                rand rand.c:28 (cycles:1)
>                rand rand.c:28 (cycles:1)
>                __random random.c:298 (cycles:1)
>                __random random.c:297 (cycles:1)
>                __random random.c:295 (cycles:1)
>                __random random.c:295 (cycles:1)
>                __random random.c:295 (cycles:1)
>                __random random.c:295 (cycles:9)
>                |
>                |--36.73%--__random_r random_r.c:392 (cycles:9)
>                |          __random_r random_r.c:357 (cycles:1)
>                |          __random random.c:293 (cycles:1)
>                |          __random random.c:293 (cycles:1)
>                |          __random random.c:291 (cycles:1)
>                |          __random random.c:291 (cycles:1)
>                |          __random random.c:291 (cycles:1)
>                |          __random random.c:288 (cycles:1)
>                |          rand rand.c:27 (cycles:1)
>                |          rand rand.c:26 (cycles:1)
>                |          rand@plt +4194304 (cycles:1)
>                |          rand@plt +4194304 (cycles:1)
>                |          compute_flag div.c:25 (cycles:1)
>                |          compute_flag div.c:22 (cycles:1)
>                |          main div.c:40 (cycles:1)
>                |          main div.c:40 (cycles:16)
>                |          main div.c:39 (cycles:16)
>                |          |
>                |          |--29.93%--main div.c:39 (predicted:50.6%, cycles:1, iterations:18)
>                |          |          main div.c:44 (predicted:50.6%, cycles:1)
>                |          |          |
>                |          |           --22.69%--main div.c:42 (cycles:2, iterations:17)
>                |          |                     compute_flag div.c:28 (cycles:2)
>                |          |                     |
>                |          |                      --10.52%--compute_flag div.c:27 (cycles:1)
>                |          |                                rand rand.c:28 (cycles:1)
> 
> Jin Yao (6):
>   perf report: Add branch flag to callchain cursor node
>   perf report: Create a symbol_conf flag for showing branch flag
>     counting
>   perf report: Caculate and return the branch flag counting
>   perf report: Show branch info in callchain entry for stdio mode
>   perf report: Show branch info in callchain entry for browser mode
>   perf report: Display columns Predicted/Abort/Cycles in
>     --branch-history
> 
>  tools/perf/Documentation/perf-report.txt |   9 ++
>  tools/perf/builtin-report.c              |   9 +-
>  tools/perf/ui/browsers/hists.c           |  20 ++-
>  tools/perf/ui/stdio/hist.c               |  35 +++++-
>  tools/perf/util/callchain.c              | 203 ++++++++++++++++++++++++++++++-
>  tools/perf/util/callchain.h              |  22 +++-
>  tools/perf/util/hist.c                   |   3 +
>  tools/perf/util/hist.h                   |   3 +
>  tools/perf/util/machine.c                |  82 ++++++++++---
>  tools/perf/util/sort.c                   | 113 ++++++++++++++++-
>  tools/perf/util/sort.h                   |   3 +
>  tools/perf/util/symbol.h                 |   1 +
>  12 files changed, 476 insertions(+), 27 deletions(-)
> 
> -- 
> 2.7.4
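
For readers following the "iterations" figure in the example output above,
here is a minimal sketch of how an average can be derived from per-sample
loop counts. The struct and function names are hypothetical; the only parts
taken from the series are the two accumulators (an iteration total and a
sample total, as in patch 3/6) and the integer division between them.

```c
#include <stdint.h>

/* Hypothetical accumulator pair, modelled on iter_count/samples_count. */
struct demo_iter_stats {
	uint64_t iter_count;	/* sum of loop iterations seen */
	uint64_t samples_count;	/* number of samples contributing */
};

/* Fold one sample's removed-loop iteration count into the totals. */
void demo_iter_add(struct demo_iter_stats *st, uint64_t nr_loop_iter,
		   uint64_t samples)
{
	st->iter_count += nr_loop_iter;
	st->samples_count += samples;
}

/* Average iterations per sample; 0 means "nothing to display". */
uint64_t demo_iter_average(const struct demo_iter_stats *st)
{
	if (!st->iter_count || !st->samples_count)
		return 0;

	return st->iter_count / st->samples_count;
}
```

The division truncates, so two samples with 17 and 19 iterations average to
18, while adding a third sample with a single iteration gives 12.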

* Re: [PATCH v4 0/6] perf report: Show branch flags/cycles in --branch-history callgraph view
  2016-11-14 14:30 ` [PATCH v4 0/6] perf report: Show branch flags/cycles in --branch-history callgraph view Arnaldo Carvalho de Melo
@ 2016-11-14 14:49   ` Andi Kleen
  0 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2016-11-14 14:49 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Andi Kleen, Jin Yao, jolsa, Linux-kernel, kan.liang

On Mon, Nov 14, 2016 at 11:30:56AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Oct 31, 2016 at 09:19:48AM +0800, Jin Yao escreveu:
> > v4: Update according to Andi's comments. The requirement is not displaying
> >     the number of removed loops. It needs to display the average number of
> >     iterations. It computes out the number of iterations by counting
> >     the removed loops. 
> > 
> > v3: 1. Display the count for tsx abort, remove the abort percentage.
> > 
> >     2. Since the branch history code has a loop detection that removes
> >        small loops in util/machine.c:remove_loops(). It would be nice to
> >        note how many loops were removed. So it adds the note on some
> >        callchain entries.
> > 
> > v2: Just a rebase to Arnaldo's perf/core branch, no functional changes.
> 
> Andi, are you ok with this now? Can I have your Acked-by or Tested-by?

Yes it looks good to me now.

Acked-by: Andi Kleen <ak@linux.intel.com>

-Andi

* Re: [PATCH v4 4/6] perf report: Show branch info in callchain entry for stdio mode
  2016-10-31  1:19 ` [PATCH v4 4/6] perf report: Show branch info in callchain entry for stdio mode Jin Yao
@ 2016-11-14 16:34   ` Arnaldo Carvalho de Melo
  2016-11-15  0:45     ` Jin, Yao
  2016-11-15 10:48   ` [tip:perf/core] " tip-bot for Jin Yao
  1 sibling, 1 reply; 15+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-11-14 16:34 UTC (permalink / raw)
  To: Jin Yao; +Cc: jolsa, Linux-kernel, ak, kan.liang

Em Mon, Oct 31, 2016 at 09:19:52AM +0800, Jin Yao escreveu:
> If the branch is 100% predicated then the "predicated" is hide.

"predicated"?  Changing this to "predicted".

Also changing "is hide" to "is hidden".

- Arnaldo

> Similarly, if there is no branch tsx abort, the "abort" is hide.
> There is only cycles shown (cycle is supported on skylake platform,
> older platform would be 0).
> 
> If no iterations, the "iterations" is hide.
> 
> For example:
> 
> |--29.93%--main div.c:39 (predicted:50.6%, cycles:1, iterations:18)
> |          main div.c:44 (predicted:50.6%, cycles:1)
> |          |
> |           --22.69%--main div.c:42 (cycles:2, iterations:17)
> |                     compute_flag div.c:28 (cycles:2)
> |                     |
> |                      --10.52%--compute_flag div.c:27 (cycles:1)
> |                                rand rand.c:28 (cycles:1)
> |                                rand rand.c:28 (cycles:1)
> |                                __random random.c:298 (cycles:1)
> |                                __random random.c:297 (cycles:1)
> |                                __random random.c:295 (cycles:1)
> |                                __random random.c:295 (cycles:1)
> |                                __random random.c:295 (cycles:1)
> |                                __random random.c:295 (cycles:6)
> 
> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
> ---
>  tools/perf/ui/stdio/hist.c | 35 +++++++++++++++++++++++++++++++----
>  1 file changed, 31 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
> index 89d8441..668f4ae 100644
> --- a/tools/perf/ui/stdio/hist.c
> +++ b/tools/perf/ui/stdio/hist.c
> @@ -41,7 +41,9 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_node *node,
>  {
>  	int i;
>  	size_t ret = 0;
> -	char bf[1024];
> +	char bf[1024], *alloc_str = NULL;
> +	char buf[64];
> +	const char *str;
>  
>  	ret += callchain__fprintf_left_margin(fp, left_margin);
>  	for (i = 0; i < depth; i++) {
> @@ -56,8 +58,26 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_node *node,
>  		} else
>  			ret += fprintf(fp, "%s", "          ");
>  	}
> -	fputs(callchain_list__sym_name(chain, bf, sizeof(bf), false), fp);
> +
> +	str = callchain_list__sym_name(chain, bf, sizeof(bf), false);
> +
> +	if (symbol_conf.show_branchflag_count) {
> +		if (!period)
> +			callchain_list_counts__printf_value(node, chain, NULL,
> +							    buf, sizeof(buf));
> +		else
> +			callchain_list_counts__printf_value(NULL, chain, NULL,
> +							    buf, sizeof(buf));
> +
> +		if (asprintf(&alloc_str, "%s%s", str, buf) < 0)
> +			str = "Not enough memory!";
> +		else
> +			str = alloc_str;
> +	}
> +
> +	fputs(str, fp);
>  	fputc('\n', fp);
> +	free(alloc_str);
>  	return ret;
>  }
>  
> @@ -219,8 +239,15 @@ static size_t callchain__fprintf_graph(FILE *fp, struct rb_root *root,
>  			} else
>  				ret += callchain__fprintf_left_margin(fp, left_margin);
>  
> -			ret += fprintf(fp, "%s\n", callchain_list__sym_name(chain, bf, sizeof(bf),
> -							false));
> +			ret += fprintf(fp, "%s",
> +				       callchain_list__sym_name(chain, bf,
> +								sizeof(bf),
> +								false));
> +
> +			if (symbol_conf.show_branchflag_count)
> +				ret += callchain_list_counts__printf_value(
> +						NULL, chain, fp, NULL, 0);
> +			ret += fprintf(fp, "\n");
>  
>  			if (++entries_printed == callchain_param.print_limit)
>  				break;
> -- 
> 2.7.4
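
The hiding rules from the quoted commit message can be restated as a small
standalone sketch. It is a simplified illustration and not the function
from this series: demo_format_counts() and demo_append() are hypothetical
names, each field is decided independently (an assumption based on the
prose, not on the patch code), and entries without branch data simply get
an empty string here.

```c
#include <inttypes.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

/* Append formatted text at bf[pos], never writing past bf[size - 1]. */
static int demo_append(char *bf, size_t size, int pos, const char *fmt, ...)
{
	va_list ap;
	int n;

	if ((size_t)pos + 1 >= size)
		return pos;	/* buffer already full */

	va_start(ap, fmt);
	n = vsnprintf(bf + pos, size - (size_t)pos, fmt, ap);
	va_end(ap);

	if (n < 0)
		return pos;
	if ((size_t)n >= size - (size_t)pos)
		return (int)size - 1;	/* output was truncated */
	return pos + n;
}

/* Build the " (...)" suffix for one entry; returns the string length. */
int demo_format_counts(char *bf, size_t size, uint64_t branch_count,
		       uint64_t predicted_count, uint64_t abort_count,
		       uint64_t cycles_count, uint64_t iterations)
{
	double predicted;
	int pos = 0;

	if (size == 0)
		return 0;
	bf[0] = '\0';
	if (branch_count == 0)
		return 0;	/* no branch data: nothing printed in this sketch */

	predicted = predicted_count * 100.0 / branch_count;

	pos = demo_append(bf, size, pos, " (");
	if (predicted < 100.0)		/* hidden when 100% predicted */
		pos = demo_append(bf, size, pos, "predicted:%.1f%%, ", predicted);
	if (abort_count)		/* hidden when there were no aborts */
		pos = demo_append(bf, size, pos, "abort:%" PRIu64 ", ", abort_count);
	pos = demo_append(bf, size, pos, "cycles:%" PRIu64,
			  cycles_count / branch_count);
	if (iterations)			/* hidden when there are no iterations */
		pos = demo_append(bf, size, pos, ", iterations:%" PRIu64, iterations);
	return demo_append(bf, size, pos, ")");
}
```

With 1000 branches of which 506 were predicted, 1000 cycles in total and an
iteration average of 18, the sketch yields the same kind of suffix as the
first line of the example above; a fully predicted entry with no aborts
collapses to just the cycles field.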

* Re: [PATCH v4 4/6] perf report: Show branch info in callchain entry for stdio mode
  2016-11-14 16:34   ` Arnaldo Carvalho de Melo
@ 2016-11-15  0:45     ` Jin, Yao
  0 siblings, 0 replies; 15+ messages in thread
From: Jin, Yao @ 2016-11-15  0:45 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: jolsa, Linux-kernel, ak, kan.liang

Sorry, spelling mistake.

It should be "predicted" and "is hidden". Thanks for correcting on that.

Thanks

Jin Yao

On 11/15/2016 12:34 AM, Arnaldo Carvalho de Melo wrote:
> Em Mon, Oct 31, 2016 at 09:19:52AM +0800, Jin Yao escreveu:
>> If the branch is 100% predicated then the "predicated" is hide.
> "predicated"?  Changing this to "predicted".
>
> Also changing "is hide" to "is hidden".
>
> - Arnaldo
>
>> Similarly, if there is no branch tsx abort, the "abort" is hide.
>> There is only cycles shown (cycle is supported on skylake platform,
>> older platform would be 0).
>>
>> If no iterations, the "iterations" is hide.
>>
>> For example:
>>
>> |--29.93%--main div.c:39 (predicted:50.6%, cycles:1, iterations:18)
>> |          main div.c:44 (predicted:50.6%, cycles:1)
>> |          |
>> |           --22.69%--main div.c:42 (cycles:2, iterations:17)
>> |                     compute_flag div.c:28 (cycles:2)
>> |                     |
>> |                      --10.52%--compute_flag div.c:27 (cycles:1)
>> |                                rand rand.c:28 (cycles:1)
>> |                                rand rand.c:28 (cycles:1)
>> |                                __random random.c:298 (cycles:1)
>> |                                __random random.c:297 (cycles:1)
>> |                                __random random.c:295 (cycles:1)
>> |                                __random random.c:295 (cycles:1)
>> |                                __random random.c:295 (cycles:1)
>> |                                __random random.c:295 (cycles:6)
>>
>> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
>> ---
>>   tools/perf/ui/stdio/hist.c | 35 +++++++++++++++++++++++++++++++----
>>   1 file changed, 31 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
>> index 89d8441..668f4ae 100644
>> --- a/tools/perf/ui/stdio/hist.c
>> +++ b/tools/perf/ui/stdio/hist.c
>> @@ -41,7 +41,9 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_node *node,
>>   {
>>   	int i;
>>   	size_t ret = 0;
>> -	char bf[1024];
>> +	char bf[1024], *alloc_str = NULL;
>> +	char buf[64];
>> +	const char *str;
>>   
>>   	ret += callchain__fprintf_left_margin(fp, left_margin);
>>   	for (i = 0; i < depth; i++) {
>> @@ -56,8 +58,26 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_node *node,
>>   		} else
>>   			ret += fprintf(fp, "%s", "          ");
>>   	}
>> -	fputs(callchain_list__sym_name(chain, bf, sizeof(bf), false), fp);
>> +
>> +	str = callchain_list__sym_name(chain, bf, sizeof(bf), false);
>> +
>> +	if (symbol_conf.show_branchflag_count) {
>> +		if (!period)
>> +			callchain_list_counts__printf_value(node, chain, NULL,
>> +							    buf, sizeof(buf));
>> +		else
>> +			callchain_list_counts__printf_value(NULL, chain, NULL,
>> +							    buf, sizeof(buf));
>> +
>> +		if (asprintf(&alloc_str, "%s%s", str, buf) < 0)
>> +			str = "Not enough memory!";
>> +		else
>> +			str = alloc_str;
>> +	}
>> +
>> +	fputs(str, fp);
>>   	fputc('\n', fp);
>> +	free(alloc_str);
>>   	return ret;
>>   }
>>   
>> @@ -219,8 +239,15 @@ static size_t callchain__fprintf_graph(FILE *fp, struct rb_root *root,
>>   			} else
>>   				ret += callchain__fprintf_left_margin(fp, left_margin);
>>   
>> -			ret += fprintf(fp, "%s\n", callchain_list__sym_name(chain, bf, sizeof(bf),
>> -							false));
>> +			ret += fprintf(fp, "%s",
>> +				       callchain_list__sym_name(chain, bf,
>> +								sizeof(bf),
>> +								false));
>> +
>> +			if (symbol_conf.show_branchflag_count)
>> +				ret += callchain_list_counts__printf_value(
>> +						NULL, chain, fp, NULL, 0);
>> +			ret += fprintf(fp, "\n");
>>   
>>   			if (++entries_printed == callchain_param.print_limit)
>>   				break;
>> -- 
>> 2.7.4

* [tip:perf/core] perf report: Create a symbol_conf flag for showing branch flag counting
  2016-10-31  1:19 ` [PATCH v4 2/6] perf report: Create a symbol_conf flag for showing branch flag counting Jin Yao
@ 2016-11-15 10:47   ` tip-bot for Jin Yao
  0 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Jin Yao @ 2016-11-15 10:47 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: kan.liang, ak, hpa, acme, mingo, tglx, jolsa, linux-kernel, yao.jin

Commit-ID:  f9a7be7c024319423623f58f5233234cad714e6b
Gitweb:     http://git.kernel.org/tip/f9a7be7c024319423623f58f5233234cad714e6b
Author:     Jin Yao <yao.jin@linux.intel.com>
AuthorDate: Mon, 31 Oct 2016 09:19:50 +0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 14 Nov 2016 13:23:42 -0300

perf report: Create a symbol_conf flag for showing branch flag counting

Create a new flag show_branchflag_count in symbol_conf. The flag controls
whether the branch flag counting information is shown. It is set when
perf.data has branch data and the user chooses the "branch-history" option
on the perf report command line.

Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linux-kernel@vger.kernel.org
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/1477876794-30749-3-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-report.c | 3 +++
 tools/perf/util/symbol.h    | 1 +
 2 files changed, 4 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 8064de8..3dfbfff 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -911,6 +911,9 @@ repeat:
 	if (itrace_synth_opts.last_branch)
 		has_br_stack = true;
 
+	if (has_br_stack && branch_call_mode)
+		symbol_conf.show_branchflag_count = true;
+
 	/*
 	 * Branch mode is a tristate:
 	 * -1 means default, so decide based on the file having branch data.
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index d964844..2d0a905 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -100,6 +100,7 @@ struct symbol_conf {
 			show_total_period,
 			use_callchain,
 			cumulate_callchain,
+			show_branchflag_count,
 			exclude_other,
 			show_cpu_utilization,
 			initialized,

* [tip:perf/core] perf report: Calculate and return the branch flag counting
  2016-10-31  1:19 ` [PATCH v4 3/6] perf report: Caculate and return the " Jin Yao
@ 2016-11-15 10:47   ` tip-bot for Jin Yao
  0 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Jin Yao @ 2016-11-15 10:47 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: kan.liang, yao.jin, ak, jolsa, acme, tglx, hpa, linux-kernel, mingo

Commit-ID:  3dd029ef94018dfa499c05778dd67d03c00b637c
Gitweb:     http://git.kernel.org/tip/3dd029ef94018dfa499c05778dd67d03c00b637c
Author:     Jin Yao <yao.jin@linux.intel.com>
AuthorDate: Mon, 31 Oct 2016 09:19:51 +0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 14 Nov 2016 13:25:58 -0300

perf report: Calculate and return the branch flag counting

Create some branch counters in each callchain list entry. Each counter
is for a branch flag. For example, predicted_count counts all the
*predicted* branches. The counters get updated by processing the
callchain cursor nodes.

It also provides functions to retrieve or print the values of the counters
in a callchain list.

Besides counting the branch flags, it also counts and returns the
average number of iterations.

Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linux-kernel@vger.kernel.org
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/1477876794-30749-4-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/callchain.c | 189 +++++++++++++++++++++++++++++++++++++++++++-
 tools/perf/util/callchain.h |  14 ++++
 2 files changed, 202 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 138a415..823befd 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -438,6 +438,21 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor)
 		call->ip = cursor_node->ip;
 		call->ms.sym = cursor_node->sym;
 		call->ms.map = cursor_node->map;
+
+		if (cursor_node->branch) {
+			call->branch_count = 1;
+
+			if (cursor_node->branch_flags.predicted)
+				call->predicted_count = 1;
+
+			if (cursor_node->branch_flags.abort)
+				call->abort_count = 1;
+
+			call->cycles_count = cursor_node->branch_flags.cycles;
+			call->iter_count = cursor_node->nr_loop_iter;
+			call->samples_count = cursor_node->samples;
+		}
+
 		list_add_tail(&call->list, &node->val);
 
 		callchain_cursor_advance(cursor);
@@ -497,8 +512,23 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
 		right = node->ip;
 	}
 
-	if (left == right)
+	if (left == right) {
+		if (node->branch) {
+			cnode->branch_count++;
+
+			if (node->branch_flags.predicted)
+				cnode->predicted_count++;
+
+			if (node->branch_flags.abort)
+				cnode->abort_count++;
+
+			cnode->cycles_count += node->branch_flags.cycles;
+			cnode->iter_count += node->nr_loop_iter;
+			cnode->samples_count += node->samples;
+		}
+
 		return MATCH_EQ;
+	}
 
 	return left > right ? MATCH_GT : MATCH_LT;
 }
@@ -947,6 +977,163 @@ int callchain_node__fprintf_value(struct callchain_node *node,
 	return 0;
 }
 
+static void callchain_counts_value(struct callchain_node *node,
+				   u64 *branch_count, u64 *predicted_count,
+				   u64 *abort_count, u64 *cycles_count)
+{
+	struct callchain_list *clist;
+
+	list_for_each_entry(clist, &node->val, list) {
+		if (branch_count)
+			*branch_count += clist->branch_count;
+
+		if (predicted_count)
+			*predicted_count += clist->predicted_count;
+
+		if (abort_count)
+			*abort_count += clist->abort_count;
+
+		if (cycles_count)
+			*cycles_count += clist->cycles_count;
+	}
+}
+
+static int callchain_node_branch_counts_cumul(struct callchain_node *node,
+					      u64 *branch_count,
+					      u64 *predicted_count,
+					      u64 *abort_count,
+					      u64 *cycles_count)
+{
+	struct callchain_node *child;
+	struct rb_node *n;
+
+	n = rb_first(&node->rb_root_in);
+	while (n) {
+		child = rb_entry(n, struct callchain_node, rb_node_in);
+		n = rb_next(n);
+
+		callchain_node_branch_counts_cumul(child, branch_count,
+						   predicted_count,
+						   abort_count,
+						   cycles_count);
+
+		callchain_counts_value(child, branch_count,
+				       predicted_count, abort_count,
+				       cycles_count);
+	}
+
+	return 0;
+}
+
+int callchain_branch_counts(struct callchain_root *root,
+			    u64 *branch_count, u64 *predicted_count,
+			    u64 *abort_count, u64 *cycles_count)
+{
+	if (branch_count)
+		*branch_count = 0;
+
+	if (predicted_count)
+		*predicted_count = 0;
+
+	if (abort_count)
+		*abort_count = 0;
+
+	if (cycles_count)
+		*cycles_count = 0;
+
+	return callchain_node_branch_counts_cumul(&root->node,
+						  branch_count,
+						  predicted_count,
+						  abort_count,
+						  cycles_count);
+}
+
+static int callchain_counts_printf(FILE *fp, char *bf, int bfsize,
+				   u64 branch_count, u64 predicted_count,
+				   u64 abort_count, u64 cycles_count,
+				   u64 iter_count, u64 samples_count)
+{
+	double predicted_percent = 0.0;
+	const char *null_str = "";
+	char iter_str[32];
+	char *str;
+	u64 cycles = 0;
+
+	if (branch_count == 0) {
+		if (fp)
+			return fprintf(fp, " (calltrace)");
+
+		return scnprintf(bf, bfsize, " (calltrace)");
+	}
+
+	if (iter_count && samples_count) {
+		scnprintf(iter_str, sizeof(iter_str),
+			 ", iterations:%" PRId64 "",
+			 iter_count / samples_count);
+		str = iter_str;
+	} else
+		str = (char *)null_str;
+
+	predicted_percent = predicted_count * 100.0 / branch_count;
+	cycles = cycles_count / branch_count;
+
+	if ((predicted_percent >= 100.0) && (abort_count == 0)) {
+		if (fp)
+			return fprintf(fp, " (cycles:%" PRId64 "%s)",
+				       cycles, str);
+
+		return scnprintf(bf, bfsize, " (cycles:%" PRId64 "%s)",
+				 cycles, str);
+	}
+
+	if ((predicted_percent < 100.0) && (abort_count == 0)) {
+		if (fp)
+			return fprintf(fp,
+				" (predicted:%.1f%%, cycles:%" PRId64 "%s)",
+				predicted_percent, cycles, str);
+
+		return scnprintf(bf, bfsize,
+			" (predicted:%.1f%%, cycles:%" PRId64 "%s)",
+			predicted_percent, cycles, str);
+	}
+
+	if (fp)
+		return fprintf(fp,
+		" (predicted:%.1f%%, abort:%" PRId64 ", cycles:%" PRId64 "%s)",
+			predicted_percent, abort_count, cycles, str);
+
+	return scnprintf(bf, bfsize,
+		" (predicted:%.1f%%, abort:%" PRId64 ", cycles:%" PRId64 "%s)",
+		predicted_percent, abort_count, cycles, str);
+}
+
+int callchain_list_counts__printf_value(struct callchain_node *node,
+					struct callchain_list *clist,
+					FILE *fp, char *bf, int bfsize)
+{
+	u64 branch_count, predicted_count;
+	u64 abort_count, cycles_count;
+	u64 iter_count = 0, samples_count = 0;
+
+	branch_count = clist->branch_count;
+	predicted_count = clist->predicted_count;
+	abort_count = clist->abort_count;
+	cycles_count = clist->cycles_count;
+
+	if (node) {
+		struct callchain_list *call;
+
+		list_for_each_entry(call, &node->val, list) {
+			iter_count += call->iter_count;
+			samples_count += call->samples_count;
+		}
+	}
+
+	return callchain_counts_printf(fp, bf, bfsize, branch_count,
+				       predicted_count, abort_count,
+				       cycles_count, iter_count, samples_count);
+}
+
 static void free_callchain_node(struct callchain_node *node)
 {
 	struct callchain_list *list, *tmp;
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index df6329d..d9c70dc 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -111,6 +111,12 @@ struct callchain_list {
 		bool		unfolded;
 		bool		has_children;
 	};
+	u64			branch_count;
+	u64			predicted_count;
+	u64			abort_count;
+	u64			cycles_count;
+	u64			iter_count;
+	u64			samples_count;
 	char		       *srcline;
 	struct list_head	list;
 };
@@ -263,8 +269,16 @@ char *callchain_node__scnprintf_value(struct callchain_node *node,
 int callchain_node__fprintf_value(struct callchain_node *node,
 				  FILE *fp, u64 total);
 
+int callchain_list_counts__printf_value(struct callchain_node *node,
+					struct callchain_list *clist,
+					FILE *fp, char *bf, int bfsize);
+
 void free_callchain(struct callchain_root *root);
 void decay_callchain(struct callchain_root *root);
 int callchain_node__make_parent_list(struct callchain_node *node);
 
+int callchain_branch_counts(struct callchain_root *root,
+			    u64 *branch_count, u64 *predicted_count,
+			    u64 *abort_count, u64 *cycles_count);
+
 #endif	/* __PERF_CALLCHAIN_H */

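The cumulative counting in callchain_branch_counts() above zeroes each
requested output and then walks every descendant of the root, adding its
counters. A minimal standalone sketch of that pattern follows. It is not
perf code: struct node, add_node_counts(), sum_descendants() and
tree_branch_counts() are hypothetical names, a plain child array stands
in for the rb-tree, and only two of the four counters are carried.
callchain_counts_value() is not shown in this part of the thread, so the
sketch assumes it adds one node's counters into whichever output
pointers are non-NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified node; perf's struct callchain_node uses rb-trees. */
struct node {
	uint64_t branch_count;
	uint64_t cycles_count;
	struct node **children;
	size_t nr_children;
};

/* Assumed behaviour of the per-node helper: add into non-NULL outputs only. */
static void add_node_counts(const struct node *n,
			    uint64_t *branch_count, uint64_t *cycles_count)
{
	if (branch_count)
		*branch_count += n->branch_count;
	if (cycles_count)
		*cycles_count += n->cycles_count;
}

/* Recurse into each child first, then add the child's own counters. */
static void sum_descendants(const struct node *n,
			    uint64_t *branch_count, uint64_t *cycles_count)
{
	size_t i;

	for (i = 0; i < n->nr_children; i++) {
		const struct node *child = n->children[i];

		sum_descendants(child, branch_count, cycles_count);
		add_node_counts(child, branch_count, cycles_count);
	}
}

/* Zero the requested totals, then accumulate over the whole tree. */
static void tree_branch_counts(const struct node *root,
			       uint64_t *branch_count, uint64_t *cycles_count)
{
	if (branch_count)
		*branch_count = 0;
	if (cycles_count)
		*cycles_count = 0;

	sum_descendants(root, branch_count, cycles_count);
}
```

As in the patch, the root's own counters are not added; only its
descendants contribute. A caller that needs a single total can pass NULL
for the other output.
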

* [tip:perf/core] perf report: Show branch info in callchain entry for stdio mode
  2016-10-31  1:19 ` [PATCH v4 4/6] perf report: Show branch info in callchain entry for stdio mode Jin Yao
  2016-11-14 16:34   ` Arnaldo Carvalho de Melo
@ 2016-11-15 10:48   ` tip-bot for Jin Yao
  1 sibling, 0 replies; 15+ messages in thread
From: tip-bot for Jin Yao @ 2016-11-15 10:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, ak, kan.liang, tglx, acme, hpa, jolsa, linux-kernel, yao.jin

Commit-ID:  8577ae6b040022ed3ecd11dc395df7af59cce503
Gitweb:     http://git.kernel.org/tip/8577ae6b040022ed3ecd11dc395df7af59cce503
Author:     Jin Yao <yao.jin@linux.intel.com>
AuthorDate: Mon, 31 Oct 2016 09:19:52 +0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 14 Nov 2016 13:33:47 -0300

perf report: Show branch info in callchain entry for stdio mode

If the branch is 100% predicted, "predicted" is hidden. Similarly, if
there are no TSX aborts on the branch, "abort" is hidden. In that case
only cycles are shown (cycle counts are supported on Skylake; older
platforms report 0).

If there are no iterations, "iterations" is hidden.

For example:

|--29.93%--main div.c:39 (predicted:50.6%, cycles:1, iterations:18)
|          main div.c:44 (predicted:50.6%, cycles:1)
|          |
|           --22.69%--main div.c:42 (cycles:2, iterations:17)
|                     compute_flag div.c:28 (cycles:2)
|                     |
|                      --10.52%--compute_flag div.c:27 (cycles:1)
|                                rand rand.c:28 (cycles:1)
|                                rand rand.c:28 (cycles:1)
|                                __random random.c:298 (cycles:1)
|                                __random random.c:297 (cycles:1)
|                                __random random.c:295 (cycles:1)
|                                __random random.c:295 (cycles:1)
|                                __random random.c:295 (cycles:1)
|                                __random random.c:295 (cycles:6)

Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linux-kernel@vger.kernel.org
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/1477876794-30749-5-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/ui/stdio/hist.c | 35 +++++++++++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 89d8441..668f4ae 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -41,7 +41,9 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_node *node,
 {
 	int i;
 	size_t ret = 0;
-	char bf[1024];
+	char bf[1024], *alloc_str = NULL;
+	char buf[64];
+	const char *str;
 
 	ret += callchain__fprintf_left_margin(fp, left_margin);
 	for (i = 0; i < depth; i++) {
@@ -56,8 +58,26 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_node *node,
 		} else
 			ret += fprintf(fp, "%s", "          ");
 	}
-	fputs(callchain_list__sym_name(chain, bf, sizeof(bf), false), fp);
+
+	str = callchain_list__sym_name(chain, bf, sizeof(bf), false);
+
+	if (symbol_conf.show_branchflag_count) {
+		if (!period)
+			callchain_list_counts__printf_value(node, chain, NULL,
+							    buf, sizeof(buf));
+		else
+			callchain_list_counts__printf_value(NULL, chain, NULL,
+							    buf, sizeof(buf));
+
+		if (asprintf(&alloc_str, "%s%s", str, buf) < 0)
+			str = "Not enough memory!";
+		else
+			str = alloc_str;
+	}
+
+	fputs(str, fp);
 	fputc('\n', fp);
+	free(alloc_str);
 	return ret;
 }
 
@@ -219,8 +239,15 @@ static size_t callchain__fprintf_graph(FILE *fp, struct rb_root *root,
 			} else
 				ret += callchain__fprintf_left_margin(fp, left_margin);
 
-			ret += fprintf(fp, "%s\n", callchain_list__sym_name(chain, bf, sizeof(bf),
-							false));
+			ret += fprintf(fp, "%s",
+				       callchain_list__sym_name(chain, bf,
+								sizeof(bf),
+								false));
+
+			if (symbol_conf.show_branchflag_count)
+				ret += callchain_list_counts__printf_value(
+						NULL, chain, fp, NULL, 0);
+			ret += fprintf(fp, "\n");
 
 			if (++entries_printed == callchain_param.print_limit)
 				break;

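The display rules in the commit message above can be sketched as a
standalone function. This is an illustration modelled on
callchain_counts_printf() from patch 3/6, not the perf code itself:
format_branch_info() is a hypothetical name, plain snprintf() stands in
for perf's scnprintf(), and PRIu64 is used because the counters are
unsigned.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of perf: build the per-entry annotation.
 * "predicted" appears only below 100% (or when aborts are present),
 * "abort" only when non-zero, "iterations" only when both iteration
 * counters are non-zero.
 */
static int format_branch_info(char *bf, size_t size,
			      uint64_t branch_count, uint64_t predicted_count,
			      uint64_t abort_count, uint64_t cycles_count,
			      uint64_t iter_count, uint64_t samples_count)
{
	char iter[32] = "";
	double predicted;
	uint64_t cycles;

	/* No branch data for this entry: mark it as a plain call trace. */
	if (branch_count == 0)
		return snprintf(bf, size, " (calltrace)");

	/* Average iterations per sample (integer division). */
	if (iter_count && samples_count)
		snprintf(iter, sizeof(iter), ", iterations:%" PRIu64,
			 iter_count / samples_count);

	predicted = predicted_count * 100.0 / branch_count;
	cycles = cycles_count / branch_count;	/* average cycles per branch */

	if (abort_count)
		return snprintf(bf, size,
				" (predicted:%.1f%%, abort:%" PRIu64
				", cycles:%" PRIu64 "%s)",
				predicted, abort_count, cycles, iter);

	if (predicted < 100.0)
		return snprintf(bf, size,
				" (predicted:%.1f%%, cycles:%" PRIu64 "%s)",
				predicted, cycles, iter);

	return snprintf(bf, size, " (cycles:%" PRIu64 "%s)", cycles, iter);
}
```

With made-up counts, format_branch_info(bf, sizeof(bf), 1000, 506, 0,
1000, 18, 1) produces " (predicted:50.6%, cycles:1, iterations:18)",
the same shape as the first entry in the example output. Note that, as
in the patch, "predicted" is still printed when aborts are present, even
at 100%.
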

* [tip:perf/core] perf report: Show branch info in callchain entry for browser mode
  2016-10-31  1:19 ` [PATCH v4 5/6] perf report: Show branch info in callchain entry for browser mode Jin Yao
@ 2016-11-15 10:49   ` tip-bot for Jin Yao
  0 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Jin Yao @ 2016-11-15 10:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, jolsa, hpa, yao.jin, ak, acme, kan.liang, mingo, tglx

Commit-ID:  fef51ecd1056b5e090c9fb73e0833bd751389572
Gitweb:     http://git.kernel.org/tip/fef51ecd1056b5e090c9fb73e0833bd751389572
Author:     Jin Yao <yao.jin@linux.intel.com>
AuthorDate: Mon, 31 Oct 2016 09:19:53 +0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 14 Nov 2016 13:34:08 -0300

perf report: Show branch info in callchain entry for browser mode

If the branch is 100% predicted, "predicted" is hidden. Similarly, if
there are no TSX aborts on the branch, "abort" is hidden. In that case
only cycles are shown (cycle counts are supported on Skylake; older
platforms report 0).

If there are no iterations, "iterations" is hidden.

Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linux-kernel@vger.kernel.org
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/1477876794-30749-6-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/ui/browsers/hists.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 84f5dd2..66676cb 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -738,6 +738,7 @@ static int hist_browser__show_callchain_list(struct hist_browser *browser,
 					     struct callchain_print_arg *arg)
 {
 	char bf[1024], *alloc_str;
+	char buf[64], *alloc_str2;
 	const char *str;
 
 	if (arg->row_offset != 0) {
@@ -746,12 +747,26 @@ static int hist_browser__show_callchain_list(struct hist_browser *browser,
 	}
 
 	alloc_str = NULL;
+	alloc_str2 = NULL;
+
 	str = callchain_list__sym_name(chain, bf, sizeof(bf),
 				       browser->show_dso);
 
-	if (need_percent) {
-		char buf[64];
+	if (symbol_conf.show_branchflag_count) {
+		if (need_percent)
+			callchain_list_counts__printf_value(node, chain, NULL,
+							    buf, sizeof(buf));
+		else
+			callchain_list_counts__printf_value(NULL, chain, NULL,
+							    buf, sizeof(buf));
+
+		if (asprintf(&alloc_str2, "%s%s", str, buf) < 0)
+			str = "Not enough memory!";
+		else
+			str = alloc_str2;
+	}
 
+	if (need_percent) {
 		callchain_node__scnprintf_value(node, buf, sizeof(buf),
 						total);
 
@@ -764,6 +779,7 @@ static int hist_browser__show_callchain_list(struct hist_browser *browser,
 	print(browser, chain, str, offset, row, arg);
 
 	free(alloc_str);
+	free(alloc_str2);
 	return 1;
 }
 


Thread overview: 15+ messages
2016-10-31  1:19 [PATCH v4 0/6] perf report: Show branch flags/cycles in --branch-history callgraph view Jin Yao
2016-10-31  1:19 ` [PATCH v4 1/6] perf report: Add branch flag to callchain cursor node Jin Yao
2016-10-31  1:19 ` [PATCH v4 2/6] perf report: Create a symbol_conf flag for showing branch flag counting Jin Yao
2016-11-15 10:47   ` [tip:perf/core] " tip-bot for Jin Yao
2016-10-31  1:19 ` [PATCH v4 3/6] perf report: Caculate and return the " Jin Yao
2016-11-15 10:47   ` [tip:perf/core] perf report: Calculate " tip-bot for Jin Yao
2016-10-31  1:19 ` [PATCH v4 4/6] perf report: Show branch info in callchain entry for stdio mode Jin Yao
2016-11-14 16:34   ` Arnaldo Carvalho de Melo
2016-11-15  0:45     ` Jin, Yao
2016-11-15 10:48   ` [tip:perf/core] " tip-bot for Jin Yao
2016-10-31  1:19 ` [PATCH v4 5/6] perf report: Show branch info in callchain entry for browser mode Jin Yao
2016-11-15 10:49   ` [tip:perf/core] " tip-bot for Jin Yao
2016-10-31  1:19 ` [PATCH v4 6/6] perf report: Display columns Predicted/Abort/Cycles in --branch-history Jin Yao
2016-11-14 14:30 ` [PATCH v4 0/6] perf report: Show branch flags/cycles in --branch-history callgraph view Arnaldo Carvalho de Melo
2016-11-14 14:49   ` Andi Kleen
