linux-kernel.vger.kernel.org archive mirror
* Cycles annotation support for perf tools v3
@ 2015-07-18 15:24 Andi Kleen
  2015-07-18 15:24 ` [PATCH 1/9] perf, tools: Add tools support for cycles, weight branch_info field Andi Kleen
                   ` (9 more replies)
  0 siblings, 10 replies; 21+ messages in thread
From: Andi Kleen @ 2015-07-18 15:24 UTC (permalink / raw)
  To: acme; +Cc: jolsa, linux-kernel, namhyung

[v2: Addressed review comments. Fixed display problems and the IPC
computation. See patches for detailed changes.]
[v3: Merged with Arnaldo's current perf/core and added Acked-by tags.]

[Note: the corresponding kernel patches that report cycles are in
peterz's perf/core queue, but not yet in tip. The patchkit can,
however, be tested with the "fake cycles" debug patch added at
the end.]

The upcoming Skylake CPU has a new timed branch stack feature
that reports cycle counts for individual branches in the
last branch record (LBR).

This makes it possible to get fine-grained cost information for
code, and also to compute fine-grained IPC.

Available from
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git perf/skl-tools3

This patchkit adds support for this in the perf tools:
- Basic support for the cycles field like other branch fields
- Show cycles in the standard branch sort view (no IPC here,
  as IPC needs the instruction counts from annotation)
- Annotate cycles and IPC in the assembler annotate view
- Add branch support to top, so we can do live annotation.
- Misc support, like dumping it in perf report -D

Example output for annotate (with made-up numbers). The first column is
the sample percentage, the second is the IPC, and the third is the
average cycle count for the basic block:

                   │    static int hex(char ch)
                   │    {
        0.12       │      push   %rbp
        0.12       │      mov    %rsp,%rbp
        0.12       │      sub    $0x20,%rsp
        0.12       │      mov    %edi,%eax
        0.12       │      mov    %al,-0x14(%rbp)
        0.12       │      mov    %fs:0x28,%rax
        0.12       │      mov    %rax,-0x8(%rbp)
        0.12       │      xor    %eax,%eax
                   │            if ((ch >= '0') && (ch <= '9'))
        0.12       │      cmpb   $0x2f,-0x14(%rbp)
 66.67  0.12   123 │    ↓ jle    31
        0.12       │      cmpb   $0x39,-0x14(%rbp)
        0.12   123 │    ↓ jg     31
                   │                    return ch - '0';
 22.22  0.12       │      movsbl -0x14(%rbp),%eax
        0.12       │      sub    $0x30,%eax
        0.12   123 │    ↓ jmp    60
                   │            if ((ch >= 'a') && (ch <= 'f'))
        0.06       │31:   cmpb   $0x60,-0x14(%rbp)
        0.06   123 │    ↓ jle    46
        0.06       │      cmpb   $0x66,-0x14(%rbp)
        0.06       │    ↓ jg     46
                   │                    return ch - 'a' + 10;
        0.06       │      movsbl -0x14(%rbp),%eax

Example output for branch view (again with fake data):

Overhead  Command  Source Shared Object  Source Symbol                               Target Symbol                               Basic Block Cycles
  30.08%  tcall    tcall                 [.] f1                                      [.] f2                                      123
  27.44%  tcall    tcall                 [.] f2                                      [.] f1                                      123
  15.60%  tcall    tcall                 [.] main                                    [.] f1                                      123
  12.96%  tcall    tcall                 [.] f1                                      [.] main                                    123
  12.86%  tcall    tcall                 [.] main                                    [.] main                                    123
   0.08%  tcall    [kernel.kallsyms]     [k] hrtimer_interrupt                       [k] hrtimer_interrupt                       123

IPC computation has a few limitations (see the comments in the respective
patches); in particular, it punts on overlapping basic blocks.

The cycle annotation only works in the interactive annotate browser. It
currently does not work in the non-interactive (tty) perf annotate, which
lacks much of the infrastructure needed for per-instruction state.

It would be nice to add column headers to annotate.

There is no support yet in --branch-history or in perf script.



* [PATCH 1/9] perf, tools: Add tools support for cycles, weight branch_info field
  2015-07-18 15:24 Cycles annotation support for perf tools v3 Andi Kleen
@ 2015-07-18 15:24 ` Andi Kleen
  2015-08-07  7:19   ` [tip:perf/core] perf tools: Add " tip-bot for Andi Kleen
  2015-07-18 15:24 ` [PATCH 2/9] perf, tools, report: Add flag for non ANY branch mode Andi Kleen
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2015-07-18 15:24 UTC (permalink / raw)
  To: acme; +Cc: jolsa, linux-kernel, namhyung, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

cycles is a new branch_info field, available on some CPUs,
that indicates the time deltas (in cycles) between branches in the LBR.

Add a sort key and output code for the cycles field, so that the
basic block cycles can be displayed individually in perf report.
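A minimal standalone sketch of what the new cycles sort key does: compare
entries by their 16-bit cycles field, and print "-" when the hardware
reported no count. struct bflags, cycles_cmp() and cycles_snprintf() are
hypothetical stand-ins for struct branch_flags, sort__cycles_cmp() and
hist_entry__cycles_snprintf(); they are not perf's actual internals.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mirror of the branch flags word after this patch. */
struct bflags {
	uint64_t mispred:1;
	uint64_t predicted:1;
	uint64_t in_tx:1;
	uint64_t abort:1;
	uint64_t cycles:16;
	uint64_t reserved:44;
};

/* Order two entries by cycle count: <0, 0 or >0 like a qsort comparator. */
static int64_t cycles_cmp(const struct bflags *l, const struct bflags *r)
{
	return (int64_t)l->cycles - (int64_t)r->cycles;
}

/* Left-justified column; a zero count means "not reported", shown as "-". */
static int cycles_snprintf(const struct bflags *f, char *bf, size_t size,
			   int width)
{
	if (f->cycles == 0)
		return snprintf(bf, size, "%-*s", width, "-");
	return snprintf(bf, size, "%-*u", width, (unsigned)f->cycles);
}
```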

We also pass in the cycles as the weight when LBRs are processed,
which provides both global and local weights and so gives an estimate
of the total cost.
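A sketch of the weighting rule, assuming a plain array of per-entry cycle
counts: each branch entry is weighted by its cycles, falling back to the
pseudo period of 1 when no count was reported, as in the
iter_add_next_branch_entry() hunk below. branch_weight() and
total_weight() are hypothetical helpers, not perf functions.

```c
#include <stdint.h>

/* Weight of one branch entry: its cycles, or the pseudo period 1. */
static uint64_t branch_weight(unsigned cycles)
{
	return cycles ? cycles : 1;
}

/* Summing the weights gives a rough total-cost estimate (illustrative). */
static uint64_t total_weight(const unsigned *cycles, int n)
{
	uint64_t sum = 0;

	for (int i = 0; i < n; i++)
		sum += branch_weight(cycles[i]);
	return sum;
}
```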

Also print the cycles information in perf report -D. I also
added printing of the previously missing LBR flags (mispredict etc.).
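A standalone sketch of the new perf report -D line, using a hypothetical
simplified struct lbr_entry in place of struct branch_entry. The format
string follows the branch_stack__printf() hunk below, minus the trailing
raw reserved bits; format_lbr() is an illustrative name.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flattened LBR entry. */
struct lbr_entry {
	uint64_t from, to;
	unsigned cycles;
	int mispred, predicted, abort, in_tx;
};

/* Addresses, cycle count, then one letter per flag (blank if clear). */
static int format_lbr(char *buf, size_t size, uint64_t i,
		      const struct lbr_entry *e)
{
	return snprintf(buf, size,
			"..... %2" PRIu64 ": %016" PRIx64 " -> %016" PRIx64
			" %u cycles %s%s%s%s",
			i, e->from, e->to, e->cycles,
			e->mispred ? "M" : " ",
			e->predicted ? "P" : " ",
			e->abort ? "A" : " ",
			e->in_tx ? "T" : " ");
}
```

A mispredicted branch inside a transaction therefore ends in "M  T".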

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-report.txt |  1 +
 tools/perf/util/event.h                  |  3 ++-
 tools/perf/util/hist.c                   |  3 ++-
 tools/perf/util/hist.h                   |  1 +
 tools/perf/util/session.c                | 16 ++++++++++++----
 tools/perf/util/sort.c                   | 24 ++++++++++++++++++++++++
 tools/perf/util/sort.h                   |  1 +
 7 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index c33b69f..960da20 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -109,6 +109,7 @@ OPTIONS
 	- mispredict: "N" for predicted branch, "Y" for mispredicted branch
 	- in_tx: branch in TSX transaction
 	- abort: TSX transaction abort.
+	- cycles: Cycles in basic block
 
 	And default sort keys are changed to comm, dso_from, symbol_from, dso_to
 	and symbol_to, see '--branch-stack'.
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index c53f363..ec175ca 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -134,7 +134,8 @@ struct branch_flags {
 	u64 predicted:1;
 	u64 in_tx:1;
 	u64 abort:1;
-	u64 reserved:60;
+	u64 cycles:16;
+	u64 reserved:44;
 };
 
 struct branch_entry {
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 6f28d53..54fc003 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -618,7 +618,8 @@ iter_add_next_branch_entry(struct hist_entry_iter *iter, struct addr_location *a
 	 * and not events sampled. Thus we use a pseudo period of 1.
 	 */
 	he = __hists__add_entry(hists, al, iter->parent, &bi[i], NULL,
-				1, 1, 0, true);
+				1, bi->flags.cycles ? bi->flags.cycles : 1,
+				0, true);
 	if (he == NULL)
 		return -ENOMEM;
 
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 5ed8d9c..3881d98 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -47,6 +47,7 @@ enum hist_column {
 	HISTC_MEM_SNOOP,
 	HISTC_MEM_DCACHELINE,
 	HISTC_TRANSACTION,
+	HISTC_CYCLES,
 	HISTC_NR_COLS, /* Last entry */
 };
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index ed9dc25..e495127c 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -766,10 +766,18 @@ static void branch_stack__printf(struct perf_sample *sample)
 
 	printf("... branch stack: nr:%" PRIu64 "\n", sample->branch_stack->nr);
 
-	for (i = 0; i < sample->branch_stack->nr; i++)
-		printf("..... %2"PRIu64": %016" PRIx64 " -> %016" PRIx64 "\n",
-			i, sample->branch_stack->entries[i].from,
-			sample->branch_stack->entries[i].to);
+	for (i = 0; i < sample->branch_stack->nr; i++) {
+		struct branch_entry *e = &sample->branch_stack->entries[i];
+
+		printf("..... %2"PRIu64": %016" PRIx64 " -> %016" PRIx64 " %hu cycles %s%s%s%s %x\n",
+			i, e->from, e->to,
+			e->flags.cycles,
+			e->flags.mispred ? "M" : " ",
+			e->flags.predicted ? "P" : " ",
+			e->flags.abort ? "A" : " ",
+			e->flags.in_tx ? "T" : " ",
+			(unsigned)e->flags.reserved);
+	}
 }
 
 static void regs_dump__printf(u64 mask, u64 *regs)
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 4c65a14..5b7a50c 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -526,6 +526,29 @@ static int hist_entry__mispredict_snprintf(struct hist_entry *he, char *bf,
 	return repsep_snprintf(bf, size, "%-*.*s", width, width, out);
 }
 
+static int64_t
+sort__cycles_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	return left->branch_info->flags.cycles -
+		right->branch_info->flags.cycles;
+}
+
+static int hist_entry__cycles_snprintf(struct hist_entry *he, char *bf,
+				    size_t size, unsigned int width)
+{
+	if (he->branch_info->flags.cycles == 0)
+		return repsep_snprintf(bf, size, "%-*s", width, "-");
+	return repsep_snprintf(bf, size, "%-*hd", width,
+			       he->branch_info->flags.cycles);
+}
+
+struct sort_entry sort_cycles = {
+	.se_header	= "Basic Block Cycles",
+	.se_cmp		= sort__cycles_cmp,
+	.se_snprintf	= hist_entry__cycles_snprintf,
+	.se_width_idx	= HISTC_CYCLES,
+};
+
 /* --sort daddr_sym */
 static int64_t
 sort__daddr_cmp(struct hist_entry *left, struct hist_entry *right)
@@ -1190,6 +1213,7 @@ static struct sort_dimension bstack_sort_dimensions[] = {
 	DIM(SORT_MISPREDICT, "mispredict", sort_mispredict),
 	DIM(SORT_IN_TX, "in_tx", sort_in_tx),
 	DIM(SORT_ABORT, "abort", sort_abort),
+	DIM(SORT_CYCLES, "cycles", sort_cycles),
 };
 
 #undef DIM
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index e97cd47..bc6c87a 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -185,6 +185,7 @@ enum sort_type {
 	SORT_MISPREDICT,
 	SORT_ABORT,
 	SORT_IN_TX,
+	SORT_CYCLES,
 
 	/* memory mode specific sort keys */
 	__SORT_MEMORY_MODE,
-- 
2.4.3



* [PATCH 2/9] perf, tools, report: Add flag for non ANY branch mode
  2015-07-18 15:24 Cycles annotation support for perf tools v3 Andi Kleen
  2015-07-18 15:24 ` [PATCH 1/9] perf, tools: Add tools support for cycles, weight branch_info field Andi Kleen
@ 2015-07-18 15:24 ` Andi Kleen
  2015-08-07  7:19   ` [tip:perf/core] perf " tip-bot for Andi Kleen
  2015-07-18 15:24 ` [PATCH 3/9] perf, tools, report: Add infrastructure for a cycles histogram Andi Kleen
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2015-07-18 15:24 UTC (permalink / raw)
  To: acme; +Cc: jolsa, linux-kernel, namhyung, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Later patches need to cheaply check whether the branch mode is ANY.
Add a new function that combines the branch sample types of all event
attrs, and a nonany_branch_mode flag in the report state that is set
when no event samples ANY branches.
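A sketch of the combined check, assuming a hypothetical struct event that
holds only attr.branch_sample_type. SAMPLE_BRANCH_ANY redefines
PERF_SAMPLE_BRANCH_ANY (1U << 3 in <linux/perf_event.h>) so the example
builds without kernel headers; the function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as PERF_SAMPLE_BRANCH_ANY in <linux/perf_event.h>. */
#define SAMPLE_BRANCH_ANY (1ULL << 3)

/* Hypothetical event holding just its branch sample type. */
struct event {
	uint64_t branch_sample_type;
};

/* OR the branch sample types of all events together. */
static uint64_t combined_branch_type(const struct event *ev, int n)
{
	uint64_t type = 0;

	for (int i = 0; i < n; i++)
		type |= ev[i].branch_sample_type;
	return type;
}

/* True when no event requested ANY branches. */
static bool nonany_branch_mode(const struct event *ev, int n)
{
	return !(combined_branch_type(ev, n) & SAMPLE_BRANCH_ANY);
}
```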

v2: Rename flag
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/builtin-report.c |  7 +++++++
 tools/perf/util/evlist.c    | 10 ++++++++++
 tools/perf/util/evlist.h    |  1 +
 3 files changed, 18 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 95a4771..3ba0e97 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -53,6 +53,7 @@ struct report {
 	bool			mem_mode;
 	bool			header;
 	bool			header_only;
+	bool			nonany_branch_mode;
 	int			max_stack;
 	struct perf_read_values	show_threads_values;
 	const char		*pretty_printing_style;
@@ -258,6 +259,12 @@ static int report__setup_sample_type(struct report *rep)
 		else
 			callchain_param.record_mode = CALLCHAIN_FP;
 	}
+
+	/* ??? handle more cases than just ANY? */
+	if (!(perf_evlist__combined_branch_type(session->evlist) &
+				PERF_SAMPLE_BRANCH_ANY))
+		rep->nonany_branch_mode = true;
+
 	return 0;
 }
 
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index f7d9c77..cba8069 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1255,6 +1255,16 @@ u64 perf_evlist__combined_sample_type(struct perf_evlist *evlist)
 	return __perf_evlist__combined_sample_type(evlist);
 }
 
+u64 perf_evlist__combined_branch_type(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+	u64 branch_type = 0;
+
+	evlist__for_each(evlist, evsel)
+		branch_type |= evsel->attr.branch_sample_type;
+	return branch_type;
+}
+
 bool perf_evlist__valid_read_format(struct perf_evlist *evlist)
 {
 	struct perf_evsel *first = perf_evlist__first(evlist), *pos = first;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 037633c..c1cdb86 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -169,6 +169,7 @@ void perf_evlist__set_leader(struct perf_evlist *evlist);
 u64 perf_evlist__read_format(struct perf_evlist *evlist);
 u64 __perf_evlist__combined_sample_type(struct perf_evlist *evlist);
 u64 perf_evlist__combined_sample_type(struct perf_evlist *evlist);
+u64 perf_evlist__combined_branch_type(struct perf_evlist *evlist);
 bool perf_evlist__sample_id_all(struct perf_evlist *evlist);
 u16 perf_evlist__id_hdr_size(struct perf_evlist *evlist);
 
-- 
2.4.3



* [PATCH 3/9] perf, tools, report: Add infrastructure for a cycles histogram
  2015-07-18 15:24 Cycles annotation support for perf tools v3 Andi Kleen
  2015-07-18 15:24 ` [PATCH 1/9] perf, tools: Add tools support for cycles, weight branch_info field Andi Kleen
  2015-07-18 15:24 ` [PATCH 2/9] perf, tools, report: Add flag for non ANY branch mode Andi Kleen
@ 2015-07-18 15:24 ` Andi Kleen
  2015-08-07  7:20   ` [tip:perf/core] perf " tip-bot for Andi Kleen
  2015-07-18 15:24 ` [PATCH 4/9] perf, tools, report: Add processing for cycle histograms Andi Kleen
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2015-07-18 15:24 UTC (permalink / raw)
  To: acme; +Cc: jolsa, linux-kernel, namhyung, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

This adds the basic infrastructure to keep track of cycle counts
per basic block for annotate. We allocate an array similar to the
normal accounting, and then account branch cycles there.

We handle two cases: cycles per basic block with a known start, and
cycles per branch (these are later used for either IPC or just cycles
per basic block).

In the start case we cannot handle overlaps, so the longest
basic block always wins.

In the cycles-per-branch case, everything is accounted accurately.
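The accounting rule can be sketched as a simplified transcription of
__symbol__account_cycles() from the hunk below, indexed directly by
offset; the name account_cycles() is hypothetical. The aggregate
counters always grow, while the start-based counters keep only the
longest block (the smallest start offset) and count how often they
were reset.

```c
#include <stdint.h>

/* Simplified copy of struct cyc_hist from this patch. */
struct cyc_hist {
	uint64_t start;
	uint64_t cycles;
	uint64_t cycles_aggr;
	uint32_t num;
	uint32_t num_aggr;
	uint8_t  have_start;
	uint16_t reset;
};

/* Account one basic block ending at 'offset'. */
static void account_cycles(struct cyc_hist *ch, uint64_t offset,
			   uint64_t start, unsigned cycles, int have_start)
{
	struct cyc_hist *h = &ch[offset];

	h->num_aggr++;                  /* per-branch totals: always exact */
	h->cycles_aggr += cycles;

	if (!have_start && h->have_start)
		return;                 /* keep the block with a known start */
	if (h->num) {
		if (have_start && (!h->have_start || h->start > start)) {
			/* a longer block: drop what was accounted so far */
			h->have_start = 0;
			h->cycles = 0;
			h->num = 0;
			if (h->reset < 0xffff)
				h->reset++;
		} else if (have_start && h->start < start) {
			return;         /* a shorter block: ignore it */
		}
	}
	h->have_start = have_start;
	h->start = start;
	h->cycles += cycles;
	h->num++;
}
```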

v2: Remove unnecessary checks. Slight restructure. Move
symbol__get_annotation to another patch. Move histogram allocation.
v3: Merged with current tree
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/builtin-annotate.c |   1 +
 tools/perf/util/annotate.c    | 127 +++++++++++++++++++++++++++++++++++++++++-
 tools/perf/util/annotate.h    |  17 ++++++
 3 files changed, 142 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 2c1bec3..467a23b 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -187,6 +187,7 @@ find_next:
 			 * symbol, free he->ms.sym->src to signal we already
 			 * processed this symbol.
 			 */
+			zfree(&notes->src->cycles_hist);
 			zfree(&notes->src);
 		}
 	}
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 03b7bc70..e0b6146 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -473,17 +473,73 @@ int symbol__alloc_hist(struct symbol *sym)
 	return 0;
 }
 
+/* The cycles histogram is lazily allocated. */
+static int symbol__alloc_hist_cycles(struct symbol *sym)
+{
+	struct annotation *notes = symbol__annotation(sym);
+	const size_t size = symbol__size(sym);
+
+	notes->src->cycles_hist = calloc(size, sizeof(struct cyc_hist));
+	if (notes->src->cycles_hist == NULL)
+		return -1;
+	return 0;
+}
+
 void symbol__annotate_zero_histograms(struct symbol *sym)
 {
 	struct annotation *notes = symbol__annotation(sym);
 
 	pthread_mutex_lock(&notes->lock);
-	if (notes->src != NULL)
+	if (notes->src != NULL) {
 		memset(notes->src->histograms, 0,
 		       notes->src->nr_histograms * notes->src->sizeof_sym_hist);
+		if (notes->src->cycles_hist)
+			memset(notes->src->cycles_hist, 0,
+				symbol__size(sym) * sizeof(struct cyc_hist));
+	}
 	pthread_mutex_unlock(&notes->lock);
 }
 
+static int __symbol__account_cycles(struct annotation *notes,
+				    u64 start,
+				    unsigned offset, unsigned cycles,
+				    unsigned have_start)
+{
+	struct cyc_hist *ch;
+
+	ch = notes->src->cycles_hist;
+	/*
+	 * For now we can only account one basic block per
+	 * final jump. But multiple could be overlapping.
+	 * Always account the longest one. So when
+	 * a shorter one has been already seen throw it away.
+	 *
+	 * We separately always account the full cycles.
+	 */
+	ch[offset].num_aggr++;
+	ch[offset].cycles_aggr += cycles;
+
+	if (!have_start && ch[offset].have_start)
+		return 0;
+	if (ch[offset].num) {
+		if (have_start && (!ch[offset].have_start ||
+				   ch[offset].start > start)) {
+			ch[offset].have_start = 0;
+			ch[offset].cycles = 0;
+			ch[offset].num = 0;
+			if (ch[offset].reset < 0xffff)
+				ch[offset].reset++;
+		} else if (have_start &&
+			   ch[offset].start < start)
+			return 0;
+	}
+	ch[offset].have_start = have_start;
+	ch[offset].start = start;
+	ch[offset].cycles += cycles;
+	ch[offset].num++;
+	return 0;
+}
+
 static int __symbol__inc_addr_samples(struct symbol *sym, struct map *map,
 				      struct annotation *notes, int evidx, u64 addr)
 {
@@ -506,7 +562,7 @@ static int __symbol__inc_addr_samples(struct symbol *sym, struct map *map,
 	return 0;
 }
 
-static struct annotation *symbol__get_annotation(struct symbol *sym)
+static struct annotation *symbol__get_annotation(struct symbol *sym, bool cycles)
 {
 	struct annotation *notes = symbol__annotation(sym);
 
@@ -514,6 +570,10 @@ static struct annotation *symbol__get_annotation(struct symbol *sym)
 		if (symbol__alloc_hist(sym) < 0)
 			return NULL;
 	}
+	if (!notes->src->cycles_hist && cycles) {
+		if (symbol__alloc_hist_cycles(sym) < 0)
+			return NULL;
+	}
 	return notes;
 }
 
@@ -524,12 +584,73 @@ static int symbol__inc_addr_samples(struct symbol *sym, struct map *map,
 
 	if (sym == NULL)
 		return 0;
-	notes = symbol__get_annotation(sym);
+	notes = symbol__get_annotation(sym, false);
 	if (notes == NULL)
 		return -ENOMEM;
 	return __symbol__inc_addr_samples(sym, map, notes, evidx, addr);
 }
 
+static int symbol__account_cycles(u64 addr, u64 start,
+				  struct symbol *sym, unsigned cycles)
+{
+	struct annotation *notes;
+	unsigned offset;
+
+	if (sym == NULL)
+		return 0;
+	notes = symbol__get_annotation(sym, true);
+	if (notes == NULL)
+		return -ENOMEM;
+	if (addr < sym->start || addr >= sym->end)
+		return -ERANGE;
+
+	if (start) {
+		if (start < sym->start || start >= sym->end)
+			return -ERANGE;
+		if (start >= addr)
+			start = 0;
+	}
+	offset = addr - sym->start;
+	return __symbol__account_cycles(notes,
+					start ? start - sym->start : 0,
+					offset, cycles,
+					!!start);
+}
+
+int addr_map_symbol__account_cycles(struct addr_map_symbol *ams,
+				    struct addr_map_symbol *start,
+				    unsigned cycles)
+{
+	unsigned long saddr = 0;
+	int err;
+
+	if (!cycles)
+		return 0;
+
+	/*
+	 * Only set start when IPC can be computed. We can only
+	 * compute it when the basic block is completely in a single
+	 * function.
+	 * Special case the case when the jump is elsewhere, but
+	 * it starts on the function start.
+	 */
+	if (start &&
+		(start->sym == ams->sym ||
+		 (ams->sym &&
+		   start->addr == ams->sym->start + ams->map->start)))
+		saddr = start->al_addr;
+	if (saddr == 0)
+		pr_debug2("BB with bad start: addr %lx start %lx sym %lx saddr %lx\n",
+			ams->addr,
+			start ? start->addr : 0,
+			ams->sym ? ams->sym->start + ams->map->start : 0,
+			saddr);
+	err = symbol__account_cycles(ams->al_addr, saddr, ams->sym, cycles);
+	if (err)
+		pr_debug2("account_cycles failed %d\n", err);
+	return err;
+}
+
 int addr_map_symbol__inc_samples(struct addr_map_symbol *ams, int evidx)
 {
 	return symbol__inc_addr_samples(ams->sym, ams->map, evidx, ams->al_addr);
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 7e78e6c..a06518d 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -79,6 +79,17 @@ struct sym_hist {
 	u64		addr[0];
 };
 
+struct cyc_hist {
+	u64	start;
+	u64	cycles;
+	u64	cycles_aggr;
+	u32	num;
+	u32	num_aggr;
+	u8	have_start;
+	/* 1 byte padding */
+	u16	reset;
+};
+
 struct source_line_samples {
 	double		percent;
 	double		percent_sum;
@@ -97,6 +108,7 @@ struct source_line {
  * @histogram: Array of addr hit histograms per event being monitored
  * @lines: If 'print_lines' is specified, per source code line percentages
  * @source: source parsed from a disassembler like objdump -dS
+ * @cyc_hist: Average cycles per basic block
  *
  * lines is allocated, percentages calculated and all sorted by percentage
  * when the annotation is about to be presented, so the percentages are for
@@ -109,6 +121,7 @@ struct annotated_source {
 	struct source_line *lines;
 	int    		   nr_histograms;
 	int    		   sizeof_sym_hist;
+	struct cyc_hist	   *cycles_hist;
 	struct sym_hist	   histograms[0];
 };
 
@@ -130,6 +143,10 @@ static inline struct annotation *symbol__annotation(struct symbol *sym)
 
 int addr_map_symbol__inc_samples(struct addr_map_symbol *ams, int evidx);
 
+int addr_map_symbol__account_cycles(struct addr_map_symbol *ams,
+				    struct addr_map_symbol *start,
+				    unsigned cycles);
+
 int hist_entry__inc_addr_samples(struct hist_entry *he, int evidx, u64 addr);
 
 int symbol__alloc_hist(struct symbol *sym);
-- 
2.4.3



* [PATCH 4/9] perf, tools, report: Add processing for cycle histograms
  2015-07-18 15:24 Cycles annotation support for perf tools v3 Andi Kleen
                   ` (2 preceding siblings ...)
  2015-07-18 15:24 ` [PATCH 3/9] perf, tools, report: Add infrastructure for a cycles histogram Andi Kleen
@ 2015-07-18 15:24 ` Andi Kleen
  2015-08-07  7:20   ` [tip:perf/core] perf " tip-bot for Andi Kleen
  2015-07-18 15:24 ` [PATCH 5/9] perf, tools: Compute IPC and basic block cycles for annotate Andi Kleen
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2015-07-18 15:24 UTC (permalink / raw)
  To: acme; +Cc: jolsa, linux-kernel, namhyung, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Call the previously added cycle histogram infrastructure from the perf report
hist iter callback. To do this, we walk the branch records.

This makes it possible to use cycle histograms when browsing annotation
from perf report.
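A sketch of the walk, assuming a hypothetical simplified struct branch
with already-resolved from/to addresses. Because perf stores branches
newest-first, the loop runs backwards, and the target of each older
branch becomes the start of the block ending at the next branch source.
In non-ANY mode no start is passed on, so no IPC can be computed.
walk_blocks() and struct block are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified resolved LBR entry. */
struct branch {
	uint64_t from, to;
	unsigned cycles;
};

/* One basic block: ends at a branch source, optionally with a start. */
struct block {
	uint64_t start, end;
	unsigned cycles;
	int have_start;
};

/* bs[0] is the newest branch. Returns the number of blocks in out[]. */
static int walk_blocks(const struct branch *bs, int nr, int nonany,
		       struct block *out)
{
	const struct branch *prev = NULL;
	int n = 0;

	for (int i = nr - 1; i >= 0; i--, n++) {
		out[n].end = bs[i].from;
		out[n].cycles = bs[i].cycles;
		out[n].have_start = !nonany && prev != NULL;
		out[n].start = out[n].have_start ? prev->to : 0;
		prev = &bs[i];
	}
	return n;
}
```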

v2: Rename flag
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/builtin-report.c |  3 +++
 tools/perf/util/hist.c      | 33 +++++++++++++++++++++++++++++++++
 tools/perf/util/hist.h      |  3 +++
 3 files changed, 39 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 3ba0e97..3a9d1b6 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -103,6 +103,9 @@ static int hist_iter__report_callback(struct hist_entry_iter *iter,
 	if (!ui__has_annotation())
 		return 0;
 
+	hist__account_cycles(iter->sample->branch_stack, al, iter->sample,
+			     rep->nonany_branch_mode);
+
 	if (sort__mode == SORT_MODE__BRANCH) {
 		bi = he->branch_info;
 		err = addr_map_symbol__inc_samples(&bi->from, evsel->idx);
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 54fc003..a6e9ddd 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -1415,6 +1415,39 @@ int hists__link(struct hists *leader, struct hists *other)
 	return 0;
 }
 
+void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
+			  struct perf_sample *sample, bool nonany_branch_mode)
+{
+	struct branch_info *bi;
+
+	/* If we have branch cycles always annotate them. */
+	if (bs && bs->nr && bs->entries[0].flags.cycles) {
+		int i;
+
+		bi = sample__resolve_bstack(sample, al);
+		if (bi) {
+			struct addr_map_symbol *prev = NULL;
+
+			/*
+			 * Ignore errors, still want to process the
+			 * other entries.
+			 *
+			 * For non standard branch modes always
+			 * force no IPC (prev == NULL)
+			 *
+			 * Note that perf stores branches reversed from
+			 * program order!
+			 */
+			for (i = bs->nr - 1; i >= 0; i--) {
+				addr_map_symbol__account_cycles(&bi[i].from,
+					nonany_branch_mode ? NULL : prev,
+					bi[i].flags.cycles);
+				prev = &bi[i].to;
+			}
+			free(bi);
+		}
+	}
+}
 
 size_t perf_evlist__fprintf_nr_events(struct perf_evlist *evlist, FILE *fp)
 {
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 3881d98..e2f712f 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -350,6 +350,9 @@ static inline int script_browse(const char *script_opt __maybe_unused)
 
 unsigned int hists__sort_list_width(struct hists *hists);
 
+void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
+			  struct perf_sample *sample, bool nonany_branch_mode);
+
 struct option;
 int parse_filter_percentage(const struct option *opt __maybe_unused,
 			    const char *arg, int unset __maybe_unused);
-- 
2.4.3



* [PATCH 5/9] perf, tools: Compute IPC and basic block cycles for annotate
  2015-07-18 15:24 Cycles annotation support for perf tools v3 Andi Kleen
                   ` (3 preceding siblings ...)
  2015-07-18 15:24 ` [PATCH 4/9] perf, tools, report: Add processing for cycle histograms Andi Kleen
@ 2015-07-18 15:24 ` Andi Kleen
  2015-08-07  7:20   ` [tip:perf/core] perf annotate: Compute IPC and basic block cycles tip-bot for Andi Kleen
  2016-06-30  8:53   ` [PATCH 5/9] perf, tools: Compute IPC and basic block cycles for annotate Peter Zijlstra
  2015-07-18 15:24 ` [PATCH 6/9] perf, tools, annotate: Finally display IPC and cycle accounting Andi Kleen
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 21+ messages in thread
From: Andi Kleen @ 2015-07-18 15:24 UTC (permalink / raw)
  To: acme; +Cc: jolsa, linux-kernel, namhyung, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Compute the IPC and the basic block cycles for the annotate display.

IPC is computed by counting the instructions in the basic block and
dividing that count by the average number of cycles accounted to it.

The actual IPC computation can only be done at annotate time,
because we need to parse the objdump output first to know
the number of instructions in the basic block.

The cycles and IPC are also stored in the per-line disassembly
annotation so that the display code can show them.

Again, basic block overlaps are not handled (the longest block wins),
but some heuristics hide the IPC when the longest block is not
the most common one.
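A sketch of the per-block computation under simplified assumptions:
is_insn[] marks the offsets where an instruction starts (standing in for
the browser's offsets[] array), and struct bb_cycles is a hypothetical
subset of struct cyc_hist. It applies the same hiding heuristic as
count_and_fill() in the hunk below.

```c
#include <stdint.h>

/* Hypothetical subset of struct cyc_hist. */
struct bb_cycles {
	uint64_t cycles;
	uint32_t num;
	uint16_t reset;
};

/* Count instruction starts in the inclusive range [start, end]. */
static unsigned count_insn(const unsigned char *is_insn, uint64_t start,
			   uint64_t end)
{
	unsigned n = 0;

	for (uint64_t off = start; off <= end; off++)
		n += is_insn[off] != 0;
	return n;
}

/* IPC = instructions / average cycles per execution; 0 means hidden. */
static double block_ipc(const unsigned char *is_insn, uint64_t start,
			uint64_t end, const struct bb_cycles *ch)
{
	unsigned n = count_insn(is_insn, start, end);

	if (!n || !ch->num || !ch->cycles)
		return 0;
	/* too many overlap resets: the data is not trustworthy */
	if (ch->reset >= 0x7fff || ch->reset >= ch->num / 2)
		return 0;
	return n / ((double)ch->cycles / ch->num);
}
```

For example, a 4-instruction block that averages 4 cycles per execution
has an IPC of 1.0.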

v2: Compute IPC correctly.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/ui/browsers/annotate.c | 73 ++++++++++++++++++++++++++++++++++++++-
 tools/perf/util/annotate.h        |  2 ++
 2 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index 5995a8b..6ec1795 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -53,6 +53,7 @@ struct annotate_browser {
 	int		    max_jump_sources;
 	int		    nr_jumps;
 	bool		    searching_backwards;
+	bool		    have_cycles;
 	u8		    addr_width;
 	u8		    jumps_width;
 	u8		    target_width;
@@ -390,7 +391,7 @@ static void annotate_browser__calc_percent(struct annotate_browser *browser,
 				max_percent = bpos->samples[i].percent;
 		}
 
-		if (max_percent < 0.01) {
+		if (max_percent < 0.01 && pos->ipc == 0) {
 			RB_CLEAR_NODE(&bpos->rb_node);
 			continue;
 		}
@@ -869,6 +870,75 @@ int hist_entry__tui_annotate(struct hist_entry *he, struct perf_evsel *evsel,
 	return map_symbol__tui_annotate(&he->ms, evsel, hbt);
 }
 
+
+static unsigned count_insn(struct annotate_browser *browser, u64 start, u64 end)
+{
+	unsigned n_insn = 0;
+	u64 offset;
+
+	for (offset = start; offset <= end; offset++) {
+		if (browser->offsets[offset])
+			n_insn++;
+	}
+	return n_insn;
+}
+
+static void count_and_fill(struct annotate_browser *browser, u64 start, u64 end,
+			   struct cyc_hist *ch)
+{
+	unsigned n_insn;
+	u64 offset;
+
+	n_insn = count_insn(browser, start, end);
+	if (n_insn && ch->num && ch->cycles) {
+		float ipc = n_insn / ((double)ch->cycles / (double)ch->num);
+
+		/* Hide data when there are too many overlaps. */
+		if (ch->reset >= 0x7fff || ch->reset >= ch->num / 2)
+			return;
+
+		for (offset = start; offset <= end; offset++) {
+			struct disasm_line *dl = browser->offsets[offset];
+
+			if (dl)
+				dl->ipc = ipc;
+		}
+	}
+}
+
+/*
+ * This should probably be in util/annotate.c to share with the tty
+ * annotate, but right now we need the per byte offsets arrays,
+ * which are only here.
+ */
+static void annotate__compute_ipc(struct annotate_browser *browser, size_t size,
+			   struct symbol *sym)
+{
+	u64 offset;
+	struct annotation *notes = symbol__annotation(sym);
+
+	if (!notes->src || !notes->src->cycles_hist)
+		return;
+
+	pthread_mutex_lock(&notes->lock);
+	for (offset = 0; offset < size; ++offset) {
+		struct cyc_hist *ch;
+
+		ch = &notes->src->cycles_hist[offset];
+		if (ch && ch->cycles) {
+			struct disasm_line *dl;
+
+			if (ch->have_start)
+				count_and_fill(browser, ch->start, offset, ch);
+			dl = browser->offsets[offset];
+			if (dl && ch->num_aggr)
+				dl->cycles = ch->cycles_aggr / ch->num_aggr;
+			browser->have_cycles = true;
+		}
+	}
+	pthread_mutex_unlock(&notes->lock);
+}
+
 static void annotate_browser__mark_jump_targets(struct annotate_browser *browser,
 						size_t size)
 {
@@ -991,6 +1061,7 @@ int symbol__tui_annotate(struct symbol *sym, struct map *map,
 	}
 
 	annotate_browser__mark_jump_targets(&browser, size);
+	annotate__compute_ipc(&browser, size, sym);
 
 	browser.addr_width = browser.target_width = browser.min_addr_width = hex_width(size);
 	browser.max_addr_width = hex_width(sym->end);
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index a06518d..e999609 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -59,6 +59,8 @@ struct disasm_line {
 	char		    *name;
 	struct ins	    *ins;
 	int		    line_nr;
+	float		    ipc;
+	u64		    cycles;
 	struct ins_operands ops;
 };
 
-- 
2.4.3



* [PATCH 6/9] perf, tools, annotate: Finally display IPC and cycle accounting
  2015-07-18 15:24 Cycles annotation support for perf tools v3 Andi Kleen
                   ` (4 preceding siblings ...)
  2015-07-18 15:24 ` [PATCH 5/9] perf, tools: Compute IPC and basic block cycles for annotate Andi Kleen
@ 2015-07-18 15:24 ` Andi Kleen
  2015-08-07  7:21   ` [tip:perf/core] perf " tip-bot for Andi Kleen
  2015-07-18 15:24 ` [PATCH 7/9] perf, tools, top: Add branch annotation code to top Andi Kleen
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2015-07-18 15:24 UTC (permalink / raw)
  To: acme; +Cc: jolsa, linux-kernel, namhyung, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add two new columns to the annotate display that show the average cycles
and the computed IPC, if available.

When the LBR was not in ANY branch mode, the IPC computation is
automatically disabled. The cycle information is still displayed.

Example output (with made up numbers):

The second column is the IPC and the third the average cycles.

                 │    __attribute__((noinline)) f2()
                 │    {
  5.15  0.07     │       push   %rbp
  0.01  0.07     │       mov    %rsp,%rbp
                 │            c = a / b;
  9.87  0.07     │       mov    a,%eax
        0.07     │       mov    b,%ecx
        0.07     │       cltd
  4.92  0.07  123│       idiv   %ecx
 70.79  0.07     │       mov    %eax,__TMC_END__
                 │    }
  9.25  0.07     │       pop    %rbp
  0.01  0.07  123│     ← retq
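
A sketch of how the two new cells of one row can be formatted, following
the widths used in the patch (6 characters each, value right-aligned in 5
plus a separating space, blank when the value is zero). It writes into a
caller-supplied buffer with snprintf instead of the slang calls used by the
real browser; `format_cycle_columns` is a hypothetical helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

#define IPC_WIDTH 6
#define CYCLES_WIDTH 6

/*
 * Format the IPC and cycles cells of one annotate row into buf and
 * return the number of characters produced. Zero values become blank
 * cells so the following disassembly column stays aligned. Values wider
 * than the column are not clipped here.
 */
static int format_cycle_columns(char *buf, size_t len, double ipc,
				uint64_t cycles)
{
	int n, m;

	assert(len >= IPC_WIDTH + CYCLES_WIDTH + 1);
	if (ipc != 0.0)
		n = snprintf(buf, len, "%*.2f ", IPC_WIDTH - 1, ipc);
	else
		n = snprintf(buf, len, "%*s", IPC_WIDTH, "");
	if (n < 0 || (size_t)n >= len)
		return n;
	if (cycles)
		m = snprintf(buf + n, len - n, "%*" PRIu64 " ",
			     CYCLES_WIDTH - 1, cycles);
	else
		m = snprintf(buf + n, len - n, "%*s", CYCLES_WIDTH, "");
	return n + m;
}
```

With an IPC of 0.07 and 123 cycles this produces " 0.07   123 ", which
matches the layout of the `idiv` row in the example above.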

v2: Fix display problems.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/ui/browsers/annotate.c | 57 +++++++++++++++++++++++++++------------
 1 file changed, 40 insertions(+), 17 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index 6ec1795..b5fc847 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -16,6 +16,9 @@ struct disasm_line_samples {
 	u64		nr;
 };
 
+#define IPC_WIDTH 6
+#define CYCLES_WIDTH 6
+
 struct browser_disasm_line {
 	struct rb_node			rb_node;
 	u32				idx;
@@ -97,6 +100,15 @@ static int annotate_browser__set_jumps_percent_color(struct annotate_browser *br
 	 return ui_browser__set_color(&browser->b, color);
 }
 
+static int annotate_browser__pcnt_width(struct annotate_browser *ab)
+{
+	int w = 7 * ab->nr_events;
+
+	if (ab->have_cycles)
+		w += IPC_WIDTH + CYCLES_WIDTH;
+	return w;
+}
+
 static void annotate_browser__write(struct ui_browser *browser, void *entry, int row)
 {
 	struct annotate_browser *ab = container_of(browser, struct annotate_browser, b);
@@ -107,7 +119,7 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int
 			     (!current_entry || (browser->use_navkeypressed &&
 					         !browser->navkeypressed)));
 	int width = browser->width, printed;
-	int i, pcnt_width = 7 * ab->nr_events;
+	int i, pcnt_width = annotate_browser__pcnt_width(ab);
 	double percent_max = 0.0;
 	char bf[256];
 
@@ -117,19 +129,34 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int
 	}
 
 	if (dl->offset != -1 && percent_max != 0.0) {
-		for (i = 0; i < ab->nr_events; i++) {
-			ui_browser__set_percent_color(browser,
-						      bdl->samples[i].percent,
-						      current_entry);
-			if (annotate_browser__opts.show_total_period)
-				slsmg_printf("%6" PRIu64 " ",
-					     bdl->samples[i].nr);
-			else
-				slsmg_printf("%6.2f ", bdl->samples[i].percent);
+		if (percent_max != 0.0) {
+			for (i = 0; i < ab->nr_events; i++) {
+				ui_browser__set_percent_color(browser,
+							bdl->samples[i].percent,
+							current_entry);
+				if (annotate_browser__opts.show_total_period)
+					slsmg_printf("%6" PRIu64 " ",
+						     bdl->samples[i].nr);
+				else
+					slsmg_printf("%6.2f ", bdl->samples[i].percent);
+			}
+		} else {
+			slsmg_write_nstring(" ", 7 * ab->nr_events);
 		}
 	} else {
 		ui_browser__set_percent_color(browser, 0, current_entry);
-		slsmg_write_nstring(" ", pcnt_width);
+		slsmg_write_nstring(" ", 7 * ab->nr_events);
+	}
+	if (ab->have_cycles) {
+		if (dl->ipc)
+			slsmg_printf("%*.2f ", IPC_WIDTH - 1, dl->ipc);
+		else
+			slsmg_write_nstring(" ", IPC_WIDTH);
+		if (dl->cycles)
+			slsmg_printf("%*" PRIu64 " ",
+				     CYCLES_WIDTH - 1, dl->cycles);
+		else
+			slsmg_write_nstring(" ", CYCLES_WIDTH);
 	}
 
 	SLsmg_write_char(' ');
@@ -232,7 +259,7 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser)
 	unsigned int from, to;
 	struct map_symbol *ms = ab->b.priv;
 	struct symbol *sym = ms->sym;
-	u8 pcnt_width = 7;
+	u8 pcnt_width = annotate_browser__pcnt_width(ab);
 
 	/* PLT symbols contain external offsets */
 	if (strstr(sym->name, "@plt"))
@@ -256,8 +283,6 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser)
 		to = (u64)btarget->idx;
 	}
 
-	pcnt_width *= ab->nr_events;
-
 	ui_browser__set_color(browser, HE_COLORSET_CODE);
 	__ui_browser__line_arrow(browser, pcnt_width + 2 + ab->addr_width,
 				 from, to);
@@ -267,9 +292,7 @@ static unsigned int annotate_browser__refresh(struct ui_browser *browser)
 {
 	struct annotate_browser *ab = container_of(browser, struct annotate_browser, b);
 	int ret = ui_browser__list_head_refresh(browser);
-	int pcnt_width;
-
-	pcnt_width = 7 * ab->nr_events;
+	int pcnt_width = annotate_browser__pcnt_width(ab);
 
 	if (annotate_browser__opts.jump_arrows)
 		annotate_browser__draw_current_jump(browser);
-- 
2.4.3



* [PATCH 7/9] perf, tools, top: Add branch annotation code to top
  2015-07-18 15:24 Cycles annotation support for perf tools v3 Andi Kleen
                   ` (5 preceding siblings ...)
  2015-07-18 15:24 ` [PATCH 6/9] perf, tools, annotate: Finally display IPC and cycle accounting Andi Kleen
@ 2015-07-18 15:24 ` Andi Kleen
  2015-08-07  7:21   ` [tip:perf/core] perf " tip-bot for Andi Kleen
  2015-07-18 15:24 ` [PATCH 8/9] perf, tools, report: Display cycles in branch sort mode Andi Kleen
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2015-07-18 15:24 UTC (permalink / raw)
  To: acme; +Cc: jolsa, linux-kernel, namhyung, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Now that we can process branch data in annotate, it makes sense to support
enabling branch recording from top too. Most of the code needed for
this is already shared with report, but we need to add:

- The option parsing code (using shared code from the previous patch)
- Document the options
- Set up the IPC/cycles accounting state in the top session
- Call the accounting code in the hist iter callback
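
The option parsing and the "non-ANY" check used in the top callback can be
sketched as follows. This is not perf's parse_branch_stack(): the parser,
its token table (a subset of the documented modes) and the `BR_*` names are
hypothetical. The bit values are meant to mirror PERF_SAMPLE_BRANCH_* in
<linux/perf_event.h>, defined locally so the sketch builds without kernel
headers; check them against your headers before relying on them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed to mirror PERF_SAMPLE_BRANCH_* from <linux/perf_event.h>. */
#define BR_USER      (1ULL << 0)
#define BR_KERNEL    (1ULL << 1)
#define BR_HV        (1ULL << 2)
#define BR_ANY       (1ULL << 3)
#define BR_ANY_CALL  (1ULL << 4)
#define BR_ANY_RET   (1ULL << 5)
#define BR_IND_CALL  (1ULL << 6)
#define BR_COND      (1ULL << 10)
#define BR_PRIV_MASK (BR_USER | BR_KERNEL | BR_HV)

struct br_mode { const char *name; uint64_t bit; };

static const struct br_mode br_modes[] = {
	{ "u", BR_USER }, { "k", BR_KERNEL }, { "hv", BR_HV },
	{ "any", BR_ANY }, { "any_call", BR_ANY_CALL },
	{ "any_ret", BR_ANY_RET }, { "ind_call", BR_IND_CALL },
	{ "cond", BR_COND },
};

/*
 * Parse a comma separated list such as "any_ret,u,k" into a mask.
 * Returns 0 for an unknown token or when no branch type (only privilege
 * levels) was given, since at least one branch type is required.
 */
static uint64_t parse_branch_filter(const char *s)
{
	uint64_t mask = 0;

	assert(s != NULL);
	while (*s) {
		size_t len = strcspn(s, ",");
		uint64_t bit = 0;

		for (size_t i = 0; i < sizeof(br_modes) / sizeof(br_modes[0]); i++)
			if (strlen(br_modes[i].name) == len &&
			    !strncmp(s, br_modes[i].name, len))
				bit = br_modes[i].bit;
		if (!bit)
			return 0;
		mask |= bit;
		s += len;
		if (*s == ',')
			s++;
	}
	if (!(mask & ~BR_PRIV_MASK))
		return 0;
	return mask;
}

/* Same test as the top callback: cycles only, no IPC, outside ANY mode. */
static int nonany_branch_mode(uint64_t mask)
{
	return !(mask & BR_ANY);
}
```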

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-top.txt | 21 +++++++++++++++++++++
 tools/perf/builtin-top.c              | 10 +++++++++-
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index 776aec4..f6a23eb 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -208,6 +208,27 @@ Default is to monitor all CPUS.
 	This option sets the time out limit. The default value is 500 ms.
 
 
+-b::
+--branch-any::
+	Enable taken branch stack sampling. Any type of taken branch may be sampled.
+	This is a shortcut for --branch-filter any. See --branch-filter for more infos.
+
+-j::
+--branch-filter::
+	Enable taken branch stack sampling. Each sample captures a series of consecutive
+	taken branches. The number of branches captured with each sample depends on the
+	underlying hardware, the type of branches of interest, and the executed code.
+	It is possible to select the types of branches captured by enabling filters.
+	For a full list of modifiers please see the perf record manpage.
+
+	The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
+	The privilege levels may be omitted, in which case, the privilege levels of the associated
+	event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege
+	levels are subject to permissions.  When sampling on multiple events, branch stack sampling
+	is enabled for all the sampling events. The sampled branch type is the same for all events.
+	The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k
+	Note that this feature may not be available on all processors.
+
 INTERACTIVE PROMPTING KEYS
 --------------------------
 
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index ecf3197..f0a5240 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -40,6 +40,7 @@
 #include "util/xyarray.h"
 #include "util/sort.h"
 #include "util/intlist.h"
+#include "util/parse-branch-options.h"
 #include "arch/common.h"
 
 #include "util/debug.h"
@@ -695,6 +696,8 @@ static int hist_iter__top_callback(struct hist_entry_iter *iter,
 		perf_top__record_precise_ip(top, he, evsel->idx, ip);
 	}
 
+	hist__account_cycles(iter->sample->branch_stack, al, iter->sample,
+		     !(top->record_opts.branch_stack & PERF_SAMPLE_BRANCH_ANY));
 	return 0;
 }
 
@@ -932,7 +935,6 @@ static int perf_top__setup_sample_type(struct perf_top *top __maybe_unused)
 			return -EINVAL;
 		}
 	}
-
 	return 0;
 }
 
@@ -1171,6 +1173,12 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
 		   "don't try to adjust column width, use these fixed values"),
 	OPT_UINTEGER(0, "proc-map-timeout", &opts->proc_map_timeout,
 			"per thread proc mmap processing timeout in ms"),
+	OPT_CALLBACK_NOOPT('b', "branch-any", &opts->branch_stack,
+		     "branch any", "sample any taken branches",
+		     parse_branch_stack),
+	OPT_CALLBACK('j', "branch-filter", &opts->branch_stack,
+		     "branch filter mask", "branch stack filter modes",
+		     parse_branch_stack),
 	OPT_END()
 	};
 	const char * const top_usage[] = {
-- 
2.4.3



* [PATCH 8/9] perf, tools, report: Display cycles in branch sort mode
  2015-07-18 15:24 Cycles annotation support for perf tools v3 Andi Kleen
                   ` (6 preceding siblings ...)
  2015-07-18 15:24 ` [PATCH 7/9] perf, tools, top: Add branch annotation code to top Andi Kleen
@ 2015-07-18 15:24 ` Andi Kleen
  2015-08-07  7:21   ` [tip:perf/core] perf " tip-bot for Andi Kleen
  2015-07-18 15:24 ` [PATCH 9/9] test patch: Add fake branch cycles to input data in report/top Andi Kleen
  2015-08-06 19:44 ` Cycles annotation support for perf tools v3 Arnaldo Carvalho de Melo
  9 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2015-07-18 15:24 UTC (permalink / raw)
  To: acme; +Cc: jolsa, linux-kernel, namhyung, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Display the cycles by default in branch sort mode.

To make room for the new column, I removed dso_to, which is usually
redundant with dso_from.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/util/sort.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 5b7a50c..5177088 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -9,7 +9,7 @@ regex_t		parent_regex;
 const char	default_parent_pattern[] = "^sys_|^do_page_fault";
 const char	*parent_pattern = default_parent_pattern;
 const char	default_sort_order[] = "comm,dso,symbol";
-const char	default_branch_sort_order[] = "comm,dso_from,symbol_from,dso_to,symbol_to";
+const char	default_branch_sort_order[] = "comm,dso_from,symbol_from,symbol_to,cycles";
 const char	default_mem_sort_order[] = "local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked";
 const char	default_top_sort_order[] = "dso,symbol";
 const char	default_diff_sort_order[] = "dso,symbol";
-- 
2.4.3



* [PATCH 9/9] test patch: Add fake branch cycles to input data in report/top
  2015-07-18 15:24 Cycles annotation support for perf tools v3 Andi Kleen
                   ` (7 preceding siblings ...)
  2015-07-18 15:24 ` [PATCH 8/9] perf, tools, report: Display cycles in branch sort mode Andi Kleen
@ 2015-07-18 15:24 ` Andi Kleen
  2015-08-06 19:44 ` Cycles annotation support for perf tools v3 Arnaldo Carvalho de Melo
  9 siblings, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2015-07-18 15:24 UTC (permalink / raw)
  To: acme; +Cc: jolsa, linux-kernel, namhyung, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Not to be merged, but useful for testing if you don't have
hardware with cycles branch stack support.
---
 tools/perf/util/hist.c    | 2 +-
 tools/perf/util/machine.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index a6e9ddd..8a4bf84 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -1421,7 +1421,7 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
 	struct branch_info *bi;
 
 	/* If we have branch cycles always annotate them. */
-	if (bs && bs->nr && bs->entries[0].flags.cycles) {
+	if (bs && bs->nr /* && bs->entries[0].flags.cycles */) {
 		int i;
 
 		bi = sample__resolve_bstack(sample, al);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 7ff6827..1351f19 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1597,6 +1597,8 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
 		ip__resolve_ams(al->thread, &bi[i].to, bs->entries[i].to);
 		ip__resolve_ams(al->thread, &bi[i].from, bs->entries[i].from);
 		bi[i].flags = bs->entries[i].flags;
+		if (bi[i].flags.cycles == 0)
+			bi[i].flags.cycles = 123;
 	}
 	return bi;
 }
-- 
2.4.3



* Re: Cycles annotation support for perf tools v3
  2015-07-18 15:24 Cycles annotation support for perf tools v3 Andi Kleen
                   ` (8 preceding siblings ...)
  2015-07-18 15:24 ` [PATCH 9/9] test patch: Add fake branch cycles to input data in report/top Andi Kleen
@ 2015-08-06 19:44 ` Arnaldo Carvalho de Melo
  9 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-08-06 19:44 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jolsa, linux-kernel, namhyung

Em Sat, Jul 18, 2015 at 08:24:45AM -0700, Andi Kleen escreveu:
> [v2: Addressed review comments. Fixed display problems and 
> correctly compute IPC now. See patches for detailed changes.]
> [v3: Merged with current Arnaldo perf/core and added acked-by.]
> 
> [Note the respective kernel patches to report cycles are in
> peterz's perf/core queue, but so far not in tip. The patchkit
> can be tested however with the "fake cycles" debug patch added at
> the end]
> 
> The upcoming Skylake CPU has a new timed branch stack feature,
> that reports cycle counts for individual branches in the
> last branch record.
> 
> This allows getting fine-grained cost information for code, and also
> computing fine-grained IPC.

Thanks, applied.

- Arnaldo
 
> Available from
> git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git perf/skl-tools3
> 
> This patchkit adds support for this in the perf tools:
> - Basic support for the cycles field like other branch fields
> - Show cycles in the standard branch sort view (no IPC here,
>   as IPC needs the instruction counts from annotation)
> - Annotate cycles and IPC in the assembler annotate view
> - Add branch support to top, so we can do live annotation.
> - Misc support, like dumping it in perf report -D
> 
> Example output for annotate (with made up numbers):
>     
> The second column is the IPC and third average cycles for the basic block.
> 
>                    │    static int hex(char ch)
>                    │    {
>         0.12       │      push   %rbp
>         0.12       │      mov    %rsp,%rbp
>         0.12       │      sub    $0x20,%rsp
>         0.12       │      mov    %edi,%eax
>         0.12       │      mov    %al,-0x14(%rbp)
>         0.12       │      mov    %fs:0x28,%rax
>         0.12       │      mov    %rax,-0x8(%rbp)
>         0.12       │      xor    %eax,%eax
>                    │            if ((ch >= '0') && (ch <= '9'))
>         0.12       │      cmpb   $0x2f,-0x14(%rbp)
>  66.67  0.12   123 │    ↓ jle    31
>         0.12       │      cmpb   $0x39,-0x14(%rbp)
>         0.12   123 │    ↓ jg     31
>                    │                    return ch - '0';
>  22.22  0.12       │      movsbl -0x14(%rbp),%eax
>         0.12       │      sub    $0x30,%eax
>         0.12   123 │    ↓ jmp    60
>                    │            if ((ch >= 'a') && (ch <= 'f'))
>         0.06       │31:   cmpb   $0x60,-0x14(%rbp)
>         0.06   123 │    ↓ jle    46
>         0.06       │      cmpb   $0x66,-0x14(%rbp)
>         0.06       │    ↓ jg     46
>                    │                    return ch - 'a' + 10;
>         0.06       │      movsbl -0x14(%rbp),%eax
> 
> Example output for branch view (again with fake data):
> 
> Overhead  Command  Source Shared Object  Source Symbol                               Target Symbol                               Basic Block Cycles
>   30.08%  tcall    tcall                 [.] f1                                      [.] f2                                      123
>   27.44%  tcall    tcall                 [.] f2                                      [.] f1                                      123
>   15.60%  tcall    tcall                 [.] main                                    [.] f1                                      123
>   12.96%  tcall    tcall                 [.] f1                                      [.] main                                    123
>   12.86%  tcall    tcall                 [.] main                                    [.] main                                    123
>    0.08%  tcall    [kernel.kallsyms]     [k] hrtimer_interrupt                       [k] hrtimer_interrupt                       123
> 
> IPC computation has a few limitations (see the comments in the respective patches);
> in particular, it punts on overlapping basic blocks.
> 
> The annotation only works for the interactive annotation. Currently it is not
> working in the scripted perf annotate, as that is missing a lot of the
> infrastructure needed for per instruction state.
> 
> It would be nice to add column headers to annotate.
> 
> So far no support in --branch-history or in perf script.


* [tip:perf/core] perf tools: Add support for cycles, weight branch_info field
  2015-07-18 15:24 ` [PATCH 1/9] perf, tools: Add tools support for cycles, weight branch_info field Andi Kleen
@ 2015-08-07  7:19   ` tip-bot for Andi Kleen
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Andi Kleen @ 2015-08-07  7:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, linux-kernel, mingo, hpa, namhyung, tglx, jolsa, ak

Commit-ID:  0e332f033a8216fa03792fde69882f66500848c7
Gitweb:     http://git.kernel.org/tip/0e332f033a8216fa03792fde69882f66500848c7
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Sat, 18 Jul 2015 08:24:46 -0700
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 6 Aug 2015 16:29:45 -0300

perf tools: Add support for cycles, weight branch_info field

cycles is a new branch_info field available on some CPUs that indicates
the time deltas between branches in the LBR.

Add a sort key and output code for the cycles, to allow displaying the
basic block cycles individually in perf report.

We also pass in the cycles as the weight when LBRs are processed, which
provides global and local weights and thus an estimate of the total
cost.
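
A sketch of the new field and how it can feed the sort key and the weight.
The struct below is a local copy of the branch_flags layout after this
patch (a 16-bit cycles field taken out of the reserved bits); the helper
names are hypothetical, and the fallback weight of 1 mirrors the pseudo
period used when no cycles were reported.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of struct branch_flags after this patch. */
struct branch_flags {
	uint64_t mispred:1;
	uint64_t predicted:1;
	uint64_t in_tx:1;
	uint64_t abort:1;
	uint64_t cycles:16;
	uint64_t reserved:44;
};

/* Weight of one LBR entry: its cycle count, or 1 if none was reported. */
static uint64_t branch_weight(const struct branch_flags *f)
{
	return f->cycles ? f->cycles : 1;
}

/* Sort comparator: negative, zero or positive, like strcmp(). */
static int64_t cycles_cmp(const struct branch_flags *l,
			  const struct branch_flags *r)
{
	assert(l && r);
	return (int64_t)l->cycles - (int64_t)r->cycles;
}
```

Because the field is only 16 bits wide, the difference always fits in an
int64_t, so the subtraction cannot overflow.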

Also print the cycles information for perf report -D. I also added
printing for the previously missing LBR flags (mispredict etc.).
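
The extended dump line can be sketched as below: one letter per set flag
(M = mispredicted, P = predicted, A = abort, T = in transaction) and a blank
otherwise. `struct lbr_entry` and `format_lbr_entry` are hypothetical; the
real code reads the bitfields of struct branch_entry directly and also
prints the reserved bits, which this sketch leaves out.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical flattened view of one LBR entry. */
struct lbr_entry {
	uint64_t from, to;
	unsigned cycles;
	int mispred, predicted, abort, in_tx;
};

/* Format one entry in the style of the perf report -D branch stack dump. */
static int format_lbr_entry(char *buf, size_t len, uint64_t idx,
			    const struct lbr_entry *e)
{
	assert(buf && e);
	return snprintf(buf, len,
			"..... %2" PRIu64 ": %016" PRIx64 " -> %016" PRIx64
			" %u cycles %s%s%s%s",
			idx, e->from, e->to, e->cycles,
			e->mispred ? "M" : " ",
			e->predicted ? "P" : " ",
			e->abort ? "A" : " ",
			e->in_tx ? "T" : " ");
}
```

A mispredicted branch inside a transaction, taking 123 cycles, would print
as ".....  3: 0000000000400000 -> 0000000000400010 123 cycles M  T".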

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1437233094-12844-2-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-report.txt |  1 +
 tools/perf/util/event.h                  |  3 ++-
 tools/perf/util/hist.c                   |  3 ++-
 tools/perf/util/hist.h                   |  1 +
 tools/perf/util/session.c                | 16 ++++++++++++----
 tools/perf/util/sort.c                   | 24 ++++++++++++++++++++++++
 tools/perf/util/sort.h                   |  1 +
 7 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index c33b69f..960da20 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -109,6 +109,7 @@ OPTIONS
 	- mispredict: "N" for predicted branch, "Y" for mispredicted branch
 	- in_tx: branch in TSX transaction
 	- abort: TSX transaction abort.
+	- cycles: Cycles in basic block
 
 	And default sort keys are changed to comm, dso_from, symbol_from, dso_to
 	and symbol_to, see '--branch-stack'.
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 4bb2ae8..f729df5 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -134,7 +134,8 @@ struct branch_flags {
 	u64 predicted:1;
 	u64 in_tx:1;
 	u64 abort:1;
-	u64 reserved:60;
+	u64 cycles:16;
+	u64 reserved:44;
 };
 
 struct branch_entry {
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 6f28d53..54fc003 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -618,7 +618,8 @@ iter_add_next_branch_entry(struct hist_entry_iter *iter, struct addr_location *a
 	 * and not events sampled. Thus we use a pseudo period of 1.
 	 */
 	he = __hists__add_entry(hists, al, iter->parent, &bi[i], NULL,
-				1, 1, 0, true);
+				1, bi->flags.cycles ? bi->flags.cycles : 1,
+				0, true);
 	if (he == NULL)
 		return -ENOMEM;
 
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 5ed8d9c..3881d98 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -47,6 +47,7 @@ enum hist_column {
 	HISTC_MEM_SNOOP,
 	HISTC_MEM_DCACHELINE,
 	HISTC_TRANSACTION,
+	HISTC_CYCLES,
 	HISTC_NR_COLS, /* Last entry */
 };
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index f51eb54..18722e7 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -784,10 +784,18 @@ static void branch_stack__printf(struct perf_sample *sample)
 
 	printf("... branch stack: nr:%" PRIu64 "\n", sample->branch_stack->nr);
 
-	for (i = 0; i < sample->branch_stack->nr; i++)
-		printf("..... %2"PRIu64": %016" PRIx64 " -> %016" PRIx64 "\n",
-			i, sample->branch_stack->entries[i].from,
-			sample->branch_stack->entries[i].to);
+	for (i = 0; i < sample->branch_stack->nr; i++) {
+		struct branch_entry *e = &sample->branch_stack->entries[i];
+
+		printf("..... %2"PRIu64": %016" PRIx64 " -> %016" PRIx64 " %hu cycles %s%s%s%s %x\n",
+			i, e->from, e->to,
+			e->flags.cycles,
+			e->flags.mispred ? "M" : " ",
+			e->flags.predicted ? "P" : " ",
+			e->flags.abort ? "A" : " ",
+			e->flags.in_tx ? "T" : " ",
+			(unsigned)e->flags.reserved);
+	}
 }
 
 static void regs_dump__printf(u64 mask, u64 *regs)
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 4c65a14..5b7a50c 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -526,6 +526,29 @@ static int hist_entry__mispredict_snprintf(struct hist_entry *he, char *bf,
 	return repsep_snprintf(bf, size, "%-*.*s", width, width, out);
 }
 
+static int64_t
+sort__cycles_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	return left->branch_info->flags.cycles -
+		right->branch_info->flags.cycles;
+}
+
+static int hist_entry__cycles_snprintf(struct hist_entry *he, char *bf,
+				    size_t size, unsigned int width)
+{
+	if (he->branch_info->flags.cycles == 0)
+		return repsep_snprintf(bf, size, "%-*s", width, "-");
+	return repsep_snprintf(bf, size, "%-*hd", width,
+			       he->branch_info->flags.cycles);
+}
+
+struct sort_entry sort_cycles = {
+	.se_header	= "Basic Block Cycles",
+	.se_cmp		= sort__cycles_cmp,
+	.se_snprintf	= hist_entry__cycles_snprintf,
+	.se_width_idx	= HISTC_CYCLES,
+};
+
 /* --sort daddr_sym */
 static int64_t
 sort__daddr_cmp(struct hist_entry *left, struct hist_entry *right)
@@ -1190,6 +1213,7 @@ static struct sort_dimension bstack_sort_dimensions[] = {
 	DIM(SORT_MISPREDICT, "mispredict", sort_mispredict),
 	DIM(SORT_IN_TX, "in_tx", sort_in_tx),
 	DIM(SORT_ABORT, "abort", sort_abort),
+	DIM(SORT_CYCLES, "cycles", sort_cycles),
 };
 
 #undef DIM
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index e97cd47..bc6c87a 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -185,6 +185,7 @@ enum sort_type {
 	SORT_MISPREDICT,
 	SORT_ABORT,
 	SORT_IN_TX,
+	SORT_CYCLES,
 
 	/* memory mode specific sort keys */
 	__SORT_MEMORY_MODE,


* [tip:perf/core] perf report: Add flag for non ANY branch mode
  2015-07-18 15:24 ` [PATCH 2/9] perf, tools, report: Add flag for non ANY branch mode Andi Kleen
@ 2015-08-07  7:19   ` tip-bot for Andi Kleen
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Andi Kleen @ 2015-08-07  7:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, mingo, hpa, ak, namhyung, jolsa, tglx, acme

Commit-ID:  98df858ed46ddaaf9be3573eb2b63b57a68c6af7
Gitweb:     http://git.kernel.org/tip/98df858ed46ddaaf9be3573eb2b63b57a68c6af7
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Sat, 18 Jul 2015 08:24:47 -0700
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 6 Aug 2015 16:31:39 -0300

perf report: Add flag for non ANY branch mode

Later patches need to cheaply check whether the branch mode is ANY. Add
a new function that combines the branch types of all event attrs, and add
a flag to the report state, which is initialized from it.
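
A sketch of the combined branch type and the resulting flag, with the
per-event `branch_sample_type` values passed as a plain array instead of
walking a perf_evlist. The helper names are hypothetical, and
`SAMPLE_BRANCH_ANY` is assumed to mirror PERF_SAMPLE_BRANCH_ANY.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_BRANCH_ANY (1ULL << 3) /* assumed PERF_SAMPLE_BRANCH_ANY */

/* OR together the branch_sample_type of every event. */
static uint64_t combined_branch_type(const uint64_t *types, size_t n)
{
	uint64_t all = 0;

	assert(types || n == 0);
	for (size_t i = 0; i < n; i++)
		all |= types[i];
	return all;
}

/* True when no event samples ANY branches, so IPC should be hidden. */
static int nonany_branch_mode(const uint64_t *types, size_t n)
{
	return !(combined_branch_type(types, n) & SAMPLE_BRANCH_ANY);
}
```

Because the types are OR-ed, a single event in ANY mode is enough to clear
the flag for the whole session; that is the coarse behavior the "handle more
cases than just ANY?" comment in the patch hints at.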

v2: Rename flag

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1437233094-12844-3-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-report.c |  7 +++++++
 tools/perf/util/evlist.c    | 10 ++++++++++
 tools/perf/util/evlist.h    |  1 +
 3 files changed, 18 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 95a4771..3ba0e97 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -53,6 +53,7 @@ struct report {
 	bool			mem_mode;
 	bool			header;
 	bool			header_only;
+	bool			nonany_branch_mode;
 	int			max_stack;
 	struct perf_read_values	show_threads_values;
 	const char		*pretty_printing_style;
@@ -258,6 +259,12 @@ static int report__setup_sample_type(struct report *rep)
 		else
 			callchain_param.record_mode = CALLCHAIN_FP;
 	}
+
+	/* ??? handle more cases than just ANY? */
+	if (!(perf_evlist__combined_branch_type(session->evlist) &
+				PERF_SAMPLE_BRANCH_ANY))
+		rep->nonany_branch_mode = true;
+
 	return 0;
 }
 
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 3b9f411..373f65b 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1273,6 +1273,16 @@ u64 perf_evlist__combined_sample_type(struct perf_evlist *evlist)
 	return __perf_evlist__combined_sample_type(evlist);
 }
 
+u64 perf_evlist__combined_branch_type(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+	u64 branch_type = 0;
+
+	evlist__for_each(evlist, evsel)
+		branch_type |= evsel->attr.branch_sample_type;
+	return branch_type;
+}
+
 bool perf_evlist__valid_read_format(struct perf_evlist *evlist)
 {
 	struct perf_evsel *first = perf_evlist__first(evlist), *pos = first;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index a8930b6..3977570 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -165,6 +165,7 @@ void perf_evlist__set_leader(struct perf_evlist *evlist);
 u64 perf_evlist__read_format(struct perf_evlist *evlist);
 u64 __perf_evlist__combined_sample_type(struct perf_evlist *evlist);
 u64 perf_evlist__combined_sample_type(struct perf_evlist *evlist);
+u64 perf_evlist__combined_branch_type(struct perf_evlist *evlist);
 bool perf_evlist__sample_id_all(struct perf_evlist *evlist);
 u16 perf_evlist__id_hdr_size(struct perf_evlist *evlist);
 

* [tip:perf/core] perf report: Add infrastructure for a cycles histogram
  2015-07-18 15:24 ` [PATCH 3/9] perf, tools, report: Add infrastructure for a cycles histogram Andi Kleen
@ 2015-08-07  7:20   ` tip-bot for Andi Kleen
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Andi Kleen @ 2015-08-07  7:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: ak, namhyung, jolsa, mingo, hpa, linux-kernel, tglx, acme

Commit-ID:  d4957633bf9dab70e566e7dbb2b8d0c61c3a2f1e
Gitweb:     http://git.kernel.org/tip/d4957633bf9dab70e566e7dbb2b8d0c61c3a2f1e
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Sat, 18 Jul 2015 08:24:48 -0700
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 6 Aug 2015 16:32:45 -0300

perf report: Add infrastructure for a cycles histogram

This adds the basic infrastructure to keep track of cycle counts per
basic block for annotate. We lazily allocate a per-byte array, similar to
the normal sample accounting, and then account branch cycles there.

We handle two cases: cycles per basic block when the block start is
known, and cycles per branch. These are later used for either IPC or
just cycles per basic block.

In the start case we cannot handle overlapping blocks, so the longest
basic block always wins.

For the cycles-per-branch case everything is accounted accurately.
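To make the "longest block wins" rule concrete, here is a standalone C sketch of the per-offset accounting. It is transcribed from __symbol__account_cycles() in the patch, with perf's annotation structures stripped out. The sketch_ names are hypothetical, and offsets are byte offsets from the symbol start:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same fields as struct cyc_hist in this patch. */
struct sketch_cyc_hist {
	uint64_t start;       /* block start offset (valid if have_start) */
	uint64_t cycles;      /* cycles of the recorded (longest) block */
	uint64_t cycles_aggr; /* cycles of every branch ending here */
	uint32_t num;         /* samples behind 'cycles' */
	uint32_t num_aggr;    /* samples behind 'cycles_aggr' */
	uint8_t  have_start;
	uint16_t reset;       /* times the recorded data was thrown away */
};

/* One entry per byte offset of the function. */
static struct sketch_cyc_hist *sketch_alloc_cycles_hist(size_t sym_size)
{
	return calloc(sym_size, sizeof(struct sketch_cyc_hist));
}

/* Account 'cycles' to the branch at 'offset'; 'start' is the block start
 * offset when have_start is nonzero. The longest block wins. */
static void sketch_account_cycles(struct sketch_cyc_hist *hist,
				  uint64_t start, unsigned offset,
				  unsigned cycles, int have_start)
{
	struct sketch_cyc_hist *ch;

	assert(hist != NULL);
	ch = &hist[offset];

	/* Per-branch totals are always kept. */
	ch->num_aggr++;
	ch->cycles_aggr += cycles;

	/* A block with unknown start never replaces one with a known start. */
	if (!have_start && ch->have_start)
		return;
	if (ch->num) {
		if (have_start && (!ch->have_start || ch->start > start)) {
			/* Longer block (earlier start): drop the old data. */
			ch->have_start = 0;
			ch->cycles = 0;
			ch->num = 0;
			if (ch->reset < 0xffff)
				ch->reset++;
		} else if (have_start && ch->start < start) {
			/* Shorter than the recorded block: ignore it. */
			return;
		}
	}
	ch->have_start = have_start ? 1 : 0;
	ch->start = start;
	ch->cycles += cycles;
	ch->num++;
}
```

For example, suppose blocks ending at offset 10 are seen starting at offsets 4, 2, 6 and 2. Only the two samples starting at offset 2 remain in cycles/num, and reset counts one discard. cycles_aggr/num_aggr still include every branch.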

v2: Remove unnecessary checks. Slight restructure. Move
symbol__get_annotation to another patch. Move histogram allocation.
v3: Merged with current tree

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1437233094-12844-4-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-annotate.c |   1 +
 tools/perf/util/annotate.c    | 127 +++++++++++++++++++++++++++++++++++++++++-
 tools/perf/util/annotate.h    |  17 ++++++
 3 files changed, 142 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 2c1bec3..467a23b 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -187,6 +187,7 @@ find_next:
 			 * symbol, free he->ms.sym->src to signal we already
 			 * processed this symbol.
 			 */
+			zfree(&notes->src->cycles_hist);
 			zfree(&notes->src);
 		}
 	}
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 03b7bc70..e0b6146 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -473,17 +473,73 @@ int symbol__alloc_hist(struct symbol *sym)
 	return 0;
 }
 
+/* The cycles histogram is lazily allocated. */
+static int symbol__alloc_hist_cycles(struct symbol *sym)
+{
+	struct annotation *notes = symbol__annotation(sym);
+	const size_t size = symbol__size(sym);
+
+	notes->src->cycles_hist = calloc(size, sizeof(struct cyc_hist));
+	if (notes->src->cycles_hist == NULL)
+		return -1;
+	return 0;
+}
+
 void symbol__annotate_zero_histograms(struct symbol *sym)
 {
 	struct annotation *notes = symbol__annotation(sym);
 
 	pthread_mutex_lock(&notes->lock);
-	if (notes->src != NULL)
+	if (notes->src != NULL) {
 		memset(notes->src->histograms, 0,
 		       notes->src->nr_histograms * notes->src->sizeof_sym_hist);
+		if (notes->src->cycles_hist)
+			memset(notes->src->cycles_hist, 0,
+				symbol__size(sym) * sizeof(struct cyc_hist));
+	}
 	pthread_mutex_unlock(&notes->lock);
 }
 
+static int __symbol__account_cycles(struct annotation *notes,
+				    u64 start,
+				    unsigned offset, unsigned cycles,
+				    unsigned have_start)
+{
+	struct cyc_hist *ch;
+
+	ch = notes->src->cycles_hist;
+	/*
+	 * For now we can only account one basic block per
+	 * final jump. But multiple could be overlapping.
+	 * Always account the longest one. So when
+	 * a shorter one has been already seen throw it away.
+	 *
+	 * We separately always account the full cycles.
+	 */
+	ch[offset].num_aggr++;
+	ch[offset].cycles_aggr += cycles;
+
+	if (!have_start && ch[offset].have_start)
+		return 0;
+	if (ch[offset].num) {
+		if (have_start && (!ch[offset].have_start ||
+				   ch[offset].start > start)) {
+			ch[offset].have_start = 0;
+			ch[offset].cycles = 0;
+			ch[offset].num = 0;
+			if (ch[offset].reset < 0xffff)
+				ch[offset].reset++;
+		} else if (have_start &&
+			   ch[offset].start < start)
+			return 0;
+	}
+	ch[offset].have_start = have_start;
+	ch[offset].start = start;
+	ch[offset].cycles += cycles;
+	ch[offset].num++;
+	return 0;
+}
+
 static int __symbol__inc_addr_samples(struct symbol *sym, struct map *map,
 				      struct annotation *notes, int evidx, u64 addr)
 {
@@ -506,7 +562,7 @@ static int __symbol__inc_addr_samples(struct symbol *sym, struct map *map,
 	return 0;
 }
 
-static struct annotation *symbol__get_annotation(struct symbol *sym)
+static struct annotation *symbol__get_annotation(struct symbol *sym, bool cycles)
 {
 	struct annotation *notes = symbol__annotation(sym);
 
@@ -514,6 +570,10 @@ static struct annotation *symbol__get_annotation(struct symbol *sym)
 		if (symbol__alloc_hist(sym) < 0)
 			return NULL;
 	}
+	if (!notes->src->cycles_hist && cycles) {
+		if (symbol__alloc_hist_cycles(sym) < 0)
+			return NULL;
+	}
 	return notes;
 }
 
@@ -524,12 +584,73 @@ static int symbol__inc_addr_samples(struct symbol *sym, struct map *map,
 
 	if (sym == NULL)
 		return 0;
-	notes = symbol__get_annotation(sym);
+	notes = symbol__get_annotation(sym, false);
 	if (notes == NULL)
 		return -ENOMEM;
 	return __symbol__inc_addr_samples(sym, map, notes, evidx, addr);
 }
 
+static int symbol__account_cycles(u64 addr, u64 start,
+				  struct symbol *sym, unsigned cycles)
+{
+	struct annotation *notes;
+	unsigned offset;
+
+	if (sym == NULL)
+		return 0;
+	notes = symbol__get_annotation(sym, true);
+	if (notes == NULL)
+		return -ENOMEM;
+	if (addr < sym->start || addr >= sym->end)
+		return -ERANGE;
+
+	if (start) {
+		if (start < sym->start || start >= sym->end)
+			return -ERANGE;
+		if (start >= addr)
+			start = 0;
+	}
+	offset = addr - sym->start;
+	return __symbol__account_cycles(notes,
+					start ? start - sym->start : 0,
+					offset, cycles,
+					!!start);
+}
+
+int addr_map_symbol__account_cycles(struct addr_map_symbol *ams,
+				    struct addr_map_symbol *start,
+				    unsigned cycles)
+{
+	unsigned long saddr = 0;
+	int err;
+
+	if (!cycles)
+		return 0;
+
+	/*
+	 * Only set start when IPC can be computed. We can only
+	 * compute it when the basic block is completely in a single
+	 * function.
+	 * Special case the case when the jump is elsewhere, but
+	 * it starts on the function start.
+	 */
+	if (start &&
+		(start->sym == ams->sym ||
+		 (ams->sym &&
+		   start->addr == ams->sym->start + ams->map->start)))
+		saddr = start->al_addr;
+	if (saddr == 0)
+		pr_debug2("BB with bad start: addr %lx start %lx sym %lx saddr %lx\n",
+			ams->addr,
+			start ? start->addr : 0,
+			ams->sym ? ams->sym->start + ams->map->start : 0,
+			saddr);
+	err = symbol__account_cycles(ams->al_addr, saddr, ams->sym, cycles);
+	if (err)
+		pr_debug2("account_cycles failed %d\n", err);
+	return err;
+}
+
 int addr_map_symbol__inc_samples(struct addr_map_symbol *ams, int evidx)
 {
 	return symbol__inc_addr_samples(ams->sym, ams->map, evidx, ams->al_addr);
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 7e78e6c..a06518d 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -79,6 +79,17 @@ struct sym_hist {
 	u64		addr[0];
 };
 
+struct cyc_hist {
+	u64	start;
+	u64	cycles;
+	u64	cycles_aggr;
+	u32	num;
+	u32	num_aggr;
+	u8	have_start;
+	/* 1 byte padding */
+	u16	reset;
+};
+
 struct source_line_samples {
 	double		percent;
 	double		percent_sum;
@@ -97,6 +108,7 @@ struct source_line {
  * @histogram: Array of addr hit histograms per event being monitored
  * @lines: If 'print_lines' is specified, per source code line percentages
  * @source: source parsed from a disassembler like objdump -dS
+ * @cyc_hist: Average cycles per basic block
  *
  * lines is allocated, percentages calculated and all sorted by percentage
  * when the annotation is about to be presented, so the percentages are for
@@ -109,6 +121,7 @@ struct annotated_source {
 	struct source_line *lines;
 	int    		   nr_histograms;
 	int    		   sizeof_sym_hist;
+	struct cyc_hist	   *cycles_hist;
 	struct sym_hist	   histograms[0];
 };
 
@@ -130,6 +143,10 @@ static inline struct annotation *symbol__annotation(struct symbol *sym)
 
 int addr_map_symbol__inc_samples(struct addr_map_symbol *ams, int evidx);
 
+int addr_map_symbol__account_cycles(struct addr_map_symbol *ams,
+				    struct addr_map_symbol *start,
+				    unsigned cycles);
+
 int hist_entry__inc_addr_samples(struct hist_entry *he, int evidx, u64 addr);
 
 int symbol__alloc_hist(struct symbol *sym);

* [tip:perf/core] perf report: Add processing for cycle histograms
  2015-07-18 15:24 ` [PATCH 4/9] perf, tools, report: Add processing for cycle histograms Andi Kleen
@ 2015-08-07  7:20   ` tip-bot for Andi Kleen
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Andi Kleen @ 2015-08-07  7:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, linux-kernel, acme, hpa, ak, namhyung, jolsa, tglx

Commit-ID:  57849998e2cd24d50295076a1bbd2f029e2d7c38
Gitweb:     http://git.kernel.org/tip/57849998e2cd24d50295076a1bbd2f029e2d7c38
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Sat, 18 Jul 2015 08:24:49 -0700
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 6 Aug 2015 16:35:30 -0300

perf report: Add processing for cycle histograms

Call the previously added cycle histogram infrastructure from the perf
report hist iter callback, walking the branch records of each sample.

This allows cycle histograms to be used when browsing annotations in
perf report.
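The walk over the branch records can be sketched as follows. perf stores the most recent branch first, so the loop runs from the last entry to the first. The target of the previous (older) branch is the start of the basic block that ends at the current branch's source. This is a hypothetical standalone version: sketch_branch, sketch_bb and sketch_walk_branches are not perf names. It collects the blocks instead of calling addr_map_symbol__account_cycles(), and a start of 0 means "unknown", which is what non-ANY branch modes force. The real code also drops the start when it is not in the same function; that check is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One LBR entry: branch source, target and the reported cycles. */
struct sketch_branch {
	uint64_t from;
	uint64_t to;
	unsigned cycles;
};

/* A basic block as it would be accounted. */
struct sketch_bb {
	uint64_t end;    /* source address of the branch closing the block */
	uint64_t start;  /* 0 when the start is unknown (no IPC) */
	unsigned cycles;
};

/* entries[0] is the most recent branch; out must hold nr entries.
 * Returns the number of blocks written. */
static size_t sketch_walk_branches(const struct sketch_branch *entries,
				   size_t nr, bool nonany_branch_mode,
				   struct sketch_bb *out)
{
	uint64_t prev_to = 0;	/* target of the previous (older) branch */
	size_t n = 0;

	assert(entries != NULL || nr == 0);
	for (size_t i = nr; i-- > 0;) {	/* oldest -> newest */
		if (entries[i].cycles) {
			out[n].end = entries[i].from;
			out[n].start = nonany_branch_mode ? 0 : prev_to;
			out[n].cycles = entries[i].cycles;
			n++;
		}
		prev_to = entries[i].to;
	}
	return n;
}
```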

v2: Rename flag

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1437233094-12844-5-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-report.c |  3 +++
 tools/perf/util/hist.c      | 33 +++++++++++++++++++++++++++++++++
 tools/perf/util/hist.h      |  3 +++
 3 files changed, 39 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 3ba0e97..3a9d1b6 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -103,6 +103,9 @@ static int hist_iter__report_callback(struct hist_entry_iter *iter,
 	if (!ui__has_annotation())
 		return 0;
 
+	hist__account_cycles(iter->sample->branch_stack, al, iter->sample,
+			     rep->nonany_branch_mode);
+
 	if (sort__mode == SORT_MODE__BRANCH) {
 		bi = he->branch_info;
 		err = addr_map_symbol__inc_samples(&bi->from, evsel->idx);
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 54fc003..a6e9ddd 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -1415,6 +1415,39 @@ int hists__link(struct hists *leader, struct hists *other)
 	return 0;
 }
 
+void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
+			  struct perf_sample *sample, bool nonany_branch_mode)
+{
+	struct branch_info *bi;
+
+	/* If we have branch cycles always annotate them. */
+	if (bs && bs->nr && bs->entries[0].flags.cycles) {
+		int i;
+
+		bi = sample__resolve_bstack(sample, al);
+		if (bi) {
+			struct addr_map_symbol *prev = NULL;
+
+			/*
+			 * Ignore errors, still want to process the
+			 * other entries.
+			 *
+			 * For non standard branch modes always
+			 * force no IPC (prev == NULL)
+			 *
+			 * Note that perf stores branches reversed from
+			 * program order!
+			 */
+			for (i = bs->nr - 1; i >= 0; i--) {
+				addr_map_symbol__account_cycles(&bi[i].from,
+					nonany_branch_mode ? NULL : prev,
+					bi[i].flags.cycles);
+				prev = &bi[i].to;
+			}
+			free(bi);
+		}
+	}
+}
 
 size_t perf_evlist__fprintf_nr_events(struct perf_evlist *evlist, FILE *fp)
 {
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 3881d98..e2f712f 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -350,6 +350,9 @@ static inline int script_browse(const char *script_opt __maybe_unused)
 
 unsigned int hists__sort_list_width(struct hists *hists);
 
+void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
+			  struct perf_sample *sample, bool nonany_branch_mode);
+
 struct option;
 int parse_filter_percentage(const struct option *opt __maybe_unused,
 			    const char *arg, int unset __maybe_unused);

* [tip:perf/core] perf annotate: Compute IPC and basic block cycles
  2015-07-18 15:24 ` [PATCH 5/9] perf, tools: Compute IPC and basic block cycles for annotate Andi Kleen
@ 2015-08-07  7:20   ` tip-bot for Andi Kleen
  2016-06-30  8:53   ` [PATCH 5/9] perf, tools: Compute IPC and basic block cycles for annotate Peter Zijlstra
  1 sibling, 0 replies; 21+ messages in thread
From: tip-bot for Andi Kleen @ 2015-08-07  7:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jolsa, linux-kernel, acme, ak, namhyung, mingo, hpa, tglx

Commit-ID:  30e863bb6f708c0abd422fbb0e6b295f5ee6407b
Gitweb:     http://git.kernel.org/tip/30e863bb6f708c0abd422fbb0e6b295f5ee6407b
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Sat, 18 Jul 2015 08:24:50 -0700
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 6 Aug 2015 16:36:12 -0300

perf annotate: Compute IPC and basic block cycles

Compute the IPC and the basic block cycles for the annotate display.

IPC is computed by counting the instructions in the basic block and
dividing that count by the average number of cycles accounted to the
block.

The actual IPC computation can only be done at annotate time, because we
need to parse the objdump output first to know the number of
instructions in the basic block.
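As a worked example of the formula, take a block of 4 instructions that accumulated 16 cycles over 2 samples. That averages 8 cycles per execution, giving IPC = 4 / 8 = 0.5. Below is a minimal sketch in C. It assumes a per-byte boolean array marking instruction starts, as a stand-in for the browser's offsets[] array of disasm_line pointers. The function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* insn_start[off] is true when an instruction begins at byte offset off.
 * Count instructions in [start, end]; end is the offset of the branch. */
static unsigned sketch_count_insn(const bool *insn_start, uint64_t start,
				  uint64_t end)
{
	unsigned n = 0;

	for (uint64_t off = start; off <= end; off++)
		if (insn_start[off])
			n++;
	return n;
}

/* IPC = instructions in the block / average cycles per execution. */
static double sketch_block_ipc(unsigned n_insn, uint64_t cycles, uint32_t num)
{
	assert(num > 0 && cycles > 0);
	return n_insn / ((double)cycles / (double)num);
}
```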

The cycles and IPC are stored in the per-instruction disassembly lines
(struct disasm_line) so that the display code can show them.

As before, overlapping basic blocks are not handled (the longest one
wins), but a heuristic hides the IPC when the longest block is not the
most common one.
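The hiding heuristic boils down to a threshold on how often the recorded block was displaced (reset) compared with how many samples it kept (num). Here is a sketch with the thresholds copied from count_and_fill() in the patch. The sketch_ names are hypothetical, and the per-byte arrays stand in for the browser's offsets[] array:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* IPC is shown only if the recorded block was not displaced too often. */
static bool sketch_ipc_is_reliable(uint16_t reset, uint32_t num)
{
	return !(reset >= 0x7fff || reset >= num / 2);
}

/* Store ipc for every instruction start in [start, end], mirroring the
 * loop that sets dl->ipc; ipc_out[] is indexed by byte offset. */
static void sketch_fill_ipc(const bool *insn_start, float *ipc_out,
			    uint64_t start, uint64_t end, float ipc)
{
	assert(start <= end);
	for (uint64_t off = start; off <= end; off++)
		if (insn_start[off])
			ipc_out[off] = ipc;
}
```

With these thresholds, a block seen only once (num == 1, so num / 2 == 0) is always hidden.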

v2: Compute IPC correctly.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1437233094-12844-6-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/ui/browsers/annotate.c | 73 ++++++++++++++++++++++++++++++++++++++-
 tools/perf/util/annotate.h        |  2 ++
 2 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index 5995a8b..6ec1795 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -53,6 +53,7 @@ struct annotate_browser {
 	int		    max_jump_sources;
 	int		    nr_jumps;
 	bool		    searching_backwards;
+	bool		    have_cycles;
 	u8		    addr_width;
 	u8		    jumps_width;
 	u8		    target_width;
@@ -390,7 +391,7 @@ static void annotate_browser__calc_percent(struct annotate_browser *browser,
 				max_percent = bpos->samples[i].percent;
 		}
 
-		if (max_percent < 0.01) {
+		if (max_percent < 0.01 && pos->ipc == 0) {
 			RB_CLEAR_NODE(&bpos->rb_node);
 			continue;
 		}
@@ -869,6 +870,75 @@ int hist_entry__tui_annotate(struct hist_entry *he, struct perf_evsel *evsel,
 	return map_symbol__tui_annotate(&he->ms, evsel, hbt);
 }
 
+
+static unsigned count_insn(struct annotate_browser *browser, u64 start, u64 end)
+{
+	unsigned n_insn = 0;
+	u64 offset;
+
+	for (offset = start; offset <= end; offset++) {
+		if (browser->offsets[offset])
+			n_insn++;
+	}
+	return n_insn;
+}
+
+static void count_and_fill(struct annotate_browser *browser, u64 start, u64 end,
+			   struct cyc_hist *ch)
+{
+	unsigned n_insn;
+	u64 offset;
+
+	n_insn = count_insn(browser, start, end);
+	if (n_insn && ch->num && ch->cycles) {
+		float ipc = n_insn / ((double)ch->cycles / (double)ch->num);
+
+		/* Hide data when there are too many overlaps. */
+		if (ch->reset >= 0x7fff || ch->reset >= ch->num / 2)
+			return;
+
+		for (offset = start; offset <= end; offset++) {
+			struct disasm_line *dl = browser->offsets[offset];
+
+			if (dl)
+				dl->ipc = ipc;
+		}
+	}
+}
+
+/*
+ * This should probably be in util/annotate.c to share with the tty
+ * annotate, but right now we need the per byte offsets arrays,
+ * which are only here.
+ */
+static void annotate__compute_ipc(struct annotate_browser *browser, size_t size,
+			   struct symbol *sym)
+{
+	u64 offset;
+	struct annotation *notes = symbol__annotation(sym);
+
+	if (!notes->src || !notes->src->cycles_hist)
+		return;
+
+	pthread_mutex_lock(&notes->lock);
+	for (offset = 0; offset < size; ++offset) {
+		struct cyc_hist *ch;
+
+		ch = &notes->src->cycles_hist[offset];
+		if (ch && ch->cycles) {
+			struct disasm_line *dl;
+
+			if (ch->have_start)
+				count_and_fill(browser, ch->start, offset, ch);
+			dl = browser->offsets[offset];
+			if (dl && ch->num_aggr)
+				dl->cycles = ch->cycles_aggr / ch->num_aggr;
+			browser->have_cycles = true;
+		}
+	}
+	pthread_mutex_unlock(&notes->lock);
+}
+
 static void annotate_browser__mark_jump_targets(struct annotate_browser *browser,
 						size_t size)
 {
@@ -991,6 +1061,7 @@ int symbol__tui_annotate(struct symbol *sym, struct map *map,
 	}
 
 	annotate_browser__mark_jump_targets(&browser, size);
+	annotate__compute_ipc(&browser, size, sym);
 
 	browser.addr_width = browser.target_width = browser.min_addr_width = hex_width(size);
 	browser.max_addr_width = hex_width(sym->end);
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index a06518d..e999609 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -59,6 +59,8 @@ struct disasm_line {
 	char		    *name;
 	struct ins	    *ins;
 	int		    line_nr;
+	float		    ipc;
+	u64		    cycles;
 	struct ins_operands ops;
 };
 

* [tip:perf/core] perf annotate: Finally display IPC and cycle accounting
  2015-07-18 15:24 ` [PATCH 6/9] perf, tools, annotate: Finally display IPC and cycle accounting Andi Kleen
@ 2015-08-07  7:21   ` tip-bot for Andi Kleen
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Andi Kleen @ 2015-08-07  7:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, tglx, linux-kernel, namhyung, acme, ak, mingo, jolsa

Commit-ID:  f8f4aaead579c947fb8fc051c9d242037025caf3
Gitweb:     http://git.kernel.org/tip/f8f4aaead579c947fb8fc051c9d242037025caf3
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Sat, 18 Jul 2015 08:24:51 -0700
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 6 Aug 2015 16:37:22 -0300

perf annotate: Finally display IPC and cycle accounting

Add two new columns to the annotate display that show the computed IPC
and the average cycles, if available.

When the LBR was not in ANY branch mode, the IPC computation is
automatically disabled. The cycle information is still displayed.

Example output (with made-up numbers):

The second column is the IPC and the third the average cycles.

                 │    __attribute__((noinline)) f2()
                 │    {
  5.15  0.07     │       push   %rbp
  0.01  0.07     │       mov    %rsp,%rbp
                 │            c = a / b;
  9.87  0.07     │       mov    a,%eax
        0.07     │       mov    b,%ecx
        0.07     │       cltd
  4.92  0.07  123│       idiv   %ecx
 70.79  0.07     │       mov    %eax,__TMC_END__
                 │    }
  9.25  0.07     │       pop    %rbp
  0.01  0.07  123│     ← retq
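The column layout in the example can be sketched in plain C, with snprintf() in place of the slang output calls. IPC_WIDTH and CYCLES_WIDTH match the patch; the helper names are hypothetical:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define IPC_WIDTH 6
#define CYCLES_WIDTH 6

/* Width of the left-hand columns: 7 chars per event percentage, plus the
 * IPC and cycles columns when cycle data exists. */
static int sketch_pcnt_width(int nr_events, bool have_cycles)
{
	int w = 7 * nr_events;

	if (have_cycles)
		w += IPC_WIDTH + CYCLES_WIDTH;
	return w;
}

/* Write the IPC and cycles columns into buf; a zero value is shown as
 * blank padding of the same width. Returns the number of chars written,
 * or -1 if buf is too small. */
static int sketch_format_cycle_cols(char *buf, size_t len, float ipc,
				    uint64_t cycles)
{
	int a, b;

	assert(buf != NULL);
	if (ipc)
		a = snprintf(buf, len, "%*.2f ", IPC_WIDTH - 1, ipc);
	else
		a = snprintf(buf, len, "%*s", IPC_WIDTH, "");
	if (a < 0 || (size_t)a >= len)
		return -1;
	if (cycles)
		b = snprintf(buf + a, len - (size_t)a, "%*" PRIu64 " ",
			     CYCLES_WIDTH - 1, cycles);
	else
		b = snprintf(buf + a, len - (size_t)a, "%*s", CYCLES_WIDTH, "");
	if (b < 0 || (size_t)b >= len - (size_t)a)
		return -1;
	return a + b;
}
```

Keeping every column at a fixed width is what allows the jump arrows to be drawn at pcnt_width + 2 + addr_width regardless of whether a given line has IPC or cycle data.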

v2: Fix display problems.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1437233094-12844-7-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/ui/browsers/annotate.c | 57 +++++++++++++++++++++++++++------------
 1 file changed, 40 insertions(+), 17 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index 6ec1795..b5fc847 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -16,6 +16,9 @@ struct disasm_line_samples {
 	u64		nr;
 };
 
+#define IPC_WIDTH 6
+#define CYCLES_WIDTH 6
+
 struct browser_disasm_line {
 	struct rb_node			rb_node;
 	u32				idx;
@@ -97,6 +100,15 @@ static int annotate_browser__set_jumps_percent_color(struct annotate_browser *br
 	 return ui_browser__set_color(&browser->b, color);
 }
 
+static int annotate_browser__pcnt_width(struct annotate_browser *ab)
+{
+	int w = 7 * ab->nr_events;
+
+	if (ab->have_cycles)
+		w += IPC_WIDTH + CYCLES_WIDTH;
+	return w;
+}
+
 static void annotate_browser__write(struct ui_browser *browser, void *entry, int row)
 {
 	struct annotate_browser *ab = container_of(browser, struct annotate_browser, b);
@@ -107,7 +119,7 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int
 			     (!current_entry || (browser->use_navkeypressed &&
 					         !browser->navkeypressed)));
 	int width = browser->width, printed;
-	int i, pcnt_width = 7 * ab->nr_events;
+	int i, pcnt_width = annotate_browser__pcnt_width(ab);
 	double percent_max = 0.0;
 	char bf[256];
 
@@ -117,19 +129,34 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int
 	}
 
 	if (dl->offset != -1 && percent_max != 0.0) {
-		for (i = 0; i < ab->nr_events; i++) {
-			ui_browser__set_percent_color(browser,
-						      bdl->samples[i].percent,
-						      current_entry);
-			if (annotate_browser__opts.show_total_period)
-				slsmg_printf("%6" PRIu64 " ",
-					     bdl->samples[i].nr);
-			else
-				slsmg_printf("%6.2f ", bdl->samples[i].percent);
+		if (percent_max != 0.0) {
+			for (i = 0; i < ab->nr_events; i++) {
+				ui_browser__set_percent_color(browser,
+							bdl->samples[i].percent,
+							current_entry);
+				if (annotate_browser__opts.show_total_period)
+					slsmg_printf("%6" PRIu64 " ",
+						     bdl->samples[i].nr);
+				else
+					slsmg_printf("%6.2f ", bdl->samples[i].percent);
+			}
+		} else {
+			slsmg_write_nstring(" ", 7 * ab->nr_events);
 		}
 	} else {
 		ui_browser__set_percent_color(browser, 0, current_entry);
-		slsmg_write_nstring(" ", pcnt_width);
+		slsmg_write_nstring(" ", 7 * ab->nr_events);
+	}
+	if (ab->have_cycles) {
+		if (dl->ipc)
+			slsmg_printf("%*.2f ", IPC_WIDTH - 1, dl->ipc);
+		else
+			slsmg_write_nstring(" ", IPC_WIDTH);
+		if (dl->cycles)
+			slsmg_printf("%*" PRIu64 " ",
+				     CYCLES_WIDTH - 1, dl->cycles);
+		else
+			slsmg_write_nstring(" ", CYCLES_WIDTH);
 	}
 
 	SLsmg_write_char(' ');
@@ -232,7 +259,7 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser)
 	unsigned int from, to;
 	struct map_symbol *ms = ab->b.priv;
 	struct symbol *sym = ms->sym;
-	u8 pcnt_width = 7;
+	u8 pcnt_width = annotate_browser__pcnt_width(ab);
 
 	/* PLT symbols contain external offsets */
 	if (strstr(sym->name, "@plt"))
@@ -256,8 +283,6 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser)
 		to = (u64)btarget->idx;
 	}
 
-	pcnt_width *= ab->nr_events;
-
 	ui_browser__set_color(browser, HE_COLORSET_CODE);
 	__ui_browser__line_arrow(browser, pcnt_width + 2 + ab->addr_width,
 				 from, to);
@@ -267,9 +292,7 @@ static unsigned int annotate_browser__refresh(struct ui_browser *browser)
 {
 	struct annotate_browser *ab = container_of(browser, struct annotate_browser, b);
 	int ret = ui_browser__list_head_refresh(browser);
-	int pcnt_width;
-
-	pcnt_width = 7 * ab->nr_events;
+	int pcnt_width = annotate_browser__pcnt_width(ab);
 
 	if (annotate_browser__opts.jump_arrows)
 		annotate_browser__draw_current_jump(browser);

* [tip:perf/core] perf top: Add branch annotation code to top
  2015-07-18 15:24 ` [PATCH 7/9] perf, tools, top: Add branch annotation code to top Andi Kleen
@ 2015-08-07  7:21   ` tip-bot for Andi Kleen
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Andi Kleen @ 2015-08-07  7:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jolsa, mingo, hpa, acme, linux-kernel, ak, namhyung, tglx

Commit-ID:  a18b027efe1a2a502d98a8d0ea0391a72bf3f696
Gitweb:     http://git.kernel.org/tip/a18b027efe1a2a502d98a8d0ea0391a72bf3f696
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Sat, 18 Jul 2015 08:24:52 -0700
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 6 Aug 2015 16:39:22 -0300

perf top: Add branch annotation code to top

Now that we can process branch data in annotate, it makes sense to
support enabling branch recording from top too. Most of the code needed
for this is already shared with report, but we need to add:

- The option parsing code (using shared code from the previous patch)
- Document the options
- Set up the IPC/cycles accounting state in the top session
- Call the accounting code in the hist iter callback
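For illustration, here is a sketch of the comma-separated --branch-filter syntax and the no-IPC decision that perf top makes from it. This is a hypothetical, simplified parser, not perf's parse_branch_stack(). It accepts only the mode names listed in the perf-top documentation added by this patch, and the bit positions are assumed to follow enum perf_branch_sample_type in <linux/perf_event.h>:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed to match the uapi PERF_SAMPLE_BRANCH_* bit positions. */
#define SK_BR_USER     (1ULL << 0)
#define SK_BR_KERNEL   (1ULL << 1)
#define SK_BR_HV       (1ULL << 2)
#define SK_BR_ANY      (1ULL << 3)
#define SK_BR_ANY_CALL (1ULL << 4)
#define SK_BR_ANY_RET  (1ULL << 5)
#define SK_BR_IND_CALL (1ULL << 6)
#define SK_BR_COND     (1ULL << 10)
#define SK_BR_PRIV     (SK_BR_USER | SK_BR_KERNEL | SK_BR_HV)

struct sketch_br_mode {
	const char *name;
	uint64_t bit;
};

static const struct sketch_br_mode sketch_modes[] = {
	{ "u", SK_BR_USER }, { "k", SK_BR_KERNEL }, { "hv", SK_BR_HV },
	{ "any", SK_BR_ANY }, { "any_call", SK_BR_ANY_CALL },
	{ "any_ret", SK_BR_ANY_RET }, { "ind_call", SK_BR_IND_CALL },
	{ "cond", SK_BR_COND },
};
#define SKETCH_NR_MODES (sizeof(sketch_modes) / sizeof(sketch_modes[0]))

/* Parse e.g. "any_ret,u,k" into a mask. Returns 0 on an unknown token or
 * when only privilege levels were given (a branch type is required). */
static uint64_t sketch_parse_branch_filter(const char *s)
{
	uint64_t mask = 0;

	assert(s != NULL);
	while (*s) {
		size_t len = strcspn(s, ",");
		size_t i;

		for (i = 0; i < SKETCH_NR_MODES; i++)
			if (strlen(sketch_modes[i].name) == len &&
			    strncmp(s, sketch_modes[i].name, len) == 0)
				break;
		if (i == SKETCH_NR_MODES)
			return 0;	/* unknown or empty token */
		mask |= sketch_modes[i].bit;
		s += len;
		if (*s == ',')
			s++;
	}
	return (mask & ~SK_BR_PRIV) ? mask : 0;
}
```

The top callback then passes !(mask & ANY) as nonany_branch_mode, as hist_iter__top_callback() does in the patch, so a filter such as any_ret,u,k still records cycles but disables IPC.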

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1437233094-12844-8-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-top.txt | 21 +++++++++++++++++++++
 tools/perf/builtin-top.c              |  9 +++++++++
 2 files changed, 30 insertions(+)

diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index 776aec4..f6a23eb 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -208,6 +208,27 @@ Default is to monitor all CPUS.
 	This option sets the time out limit. The default value is 500 ms.
 
 
+-b::
+--branch-any::
+	Enable taken branch stack sampling. Any type of taken branch may be sampled.
+	This is a shortcut for --branch-filter any. See --branch-filter for more infos.
+
+-j::
+--branch-filter::
+	Enable taken branch stack sampling. Each sample captures a series of consecutive
+	taken branches. The number of branches captured with each sample depends on the
+	underlying hardware, the type of branches of interest, and the executed code.
+	It is possible to select the types of branches captured by enabling filters.
+	For a full list of modifiers please see the perf record manpage.
+
+	The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
+	The privilege levels may be omitted, in which case, the privilege levels of the associated
+	event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege
+	levels are subject to permissions.  When sampling on multiple events, branch stack sampling
+	is enabled for all the sampling events. The sampled branch type is the same for all events.
+	The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k
+	Note that this feature may not be available on all processors.
+
 INTERACTIVE PROMPTING KEYS
 --------------------------
 
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index ecf3197..bfe24f1 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -40,6 +40,7 @@
 #include "util/xyarray.h"
 #include "util/sort.h"
 #include "util/intlist.h"
+#include "util/parse-branch-options.h"
 #include "arch/common.h"
 
 #include "util/debug.h"
@@ -695,6 +696,8 @@ static int hist_iter__top_callback(struct hist_entry_iter *iter,
 		perf_top__record_precise_ip(top, he, evsel->idx, ip);
 	}
 
+	hist__account_cycles(iter->sample->branch_stack, al, iter->sample,
+		     !(top->record_opts.branch_stack & PERF_SAMPLE_BRANCH_ANY));
 	return 0;
 }
 
@@ -1171,6 +1174,12 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
 		   "don't try to adjust column width, use these fixed values"),
 	OPT_UINTEGER(0, "proc-map-timeout", &opts->proc_map_timeout,
 			"per thread proc mmap processing timeout in ms"),
+	OPT_CALLBACK_NOOPT('b', "branch-any", &opts->branch_stack,
+		     "branch any", "sample any taken branches",
+		     parse_branch_stack),
+	OPT_CALLBACK('j', "branch-filter", &opts->branch_stack,
+		     "branch filter mask", "branch stack filter modes",
+		     parse_branch_stack),
 	OPT_END()
 	};
 	const char * const top_usage[] = {

* [tip:perf/core] perf report: Display cycles in branch sort mode
  2015-07-18 15:24 ` [PATCH 8/9] perf, tools, report: Display cycles in branch sort mode Andi Kleen
@ 2015-08-07  7:21   ` tip-bot for Andi Kleen
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Andi Kleen @ 2015-08-07  7:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jolsa, acme, linux-kernel, mingo, tglx, hpa, namhyung, ak

Commit-ID:  40997d6cf9fc40c85dba479e162a89e7530eb360
Gitweb:     http://git.kernel.org/tip/40997d6cf9fc40c85dba479e162a89e7530eb360
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Sat, 18 Jul 2015 08:24:53 -0700
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 6 Aug 2015 16:39:53 -0300

perf report: Display cycles in branch sort mode

Display the cycles by default in branch sort mode.

To make enough room for the new column I removed dso_to. It is usually
redundant with dso_from.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1437233094-12844-9-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/sort.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 5b7a50c..5177088 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -9,7 +9,7 @@ regex_t		parent_regex;
 const char	default_parent_pattern[] = "^sys_|^do_page_fault";
 const char	*parent_pattern = default_parent_pattern;
 const char	default_sort_order[] = "comm,dso,symbol";
-const char	default_branch_sort_order[] = "comm,dso_from,symbol_from,dso_to,symbol_to";
+const char	default_branch_sort_order[] = "comm,dso_from,symbol_from,symbol_to,cycles";
 const char	default_mem_sort_order[] = "local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked";
 const char	default_top_sort_order[] = "dso,symbol";
 const char	default_diff_sort_order[] = "dso,symbol";

* Re: [PATCH 5/9] perf, tools: Compute IPC and basic block cycles for annotate
  2015-07-18 15:24 ` [PATCH 5/9] perf, tools: Compute IPC and basic block cycles for annotate Andi Kleen
  2015-08-07  7:20   ` [tip:perf/core] perf annotate: Compute IPC and basic block cycles tip-bot for Andi Kleen
@ 2016-06-30  8:53   ` Peter Zijlstra
  2016-07-02 20:38     ` Andi Kleen
  1 sibling, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2016-06-30  8:53 UTC (permalink / raw)
  To: Andi Kleen; +Cc: acme, jolsa, linux-kernel, namhyung, Andi Kleen

On Sat, Jul 18, 2015 at 08:24:50AM -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Compute the IPC and the basic block cycles for the annotate display.
> 
> IPC is computed by counting the instructions, and then dividing the
> accounted cycles by that count.
> 
> The actual IPC computation can only be done at annotate time,
> because we need to parse the objdump output first to know
> the number of instructions in the basic block.
> 
> The cycles/IPC are also put into the perf function annotation
> so that the display code can show them.
> 
> Again basic block overlaps are not handled, with the longest winning,
> but there are some heuristics to hide the IPC when the longest is not
> the most common.

I'm looking at basic block support, but this all seems to depend on the
cycles stuff. Can we get the basic block stuff without that?

I'm looking to plot the hottest path through a branchy function.


* Re: [PATCH 5/9] perf, tools: Compute IPC and basic block cycles for annotate
  2016-06-30  8:53   ` [PATCH 5/9] perf, tools: Compute IPC and basic block cycles for annotate Peter Zijlstra
@ 2016-07-02 20:38     ` Andi Kleen
  0 siblings, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2016-07-02 20:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, acme, jolsa, linux-kernel, namhyung, Andi Kleen

> I'm looking at basic block support, but this all seems to depend on the
> cycles stuff. Can we get the basic block stuff without that?

Not sure what you mean by the basic block stuff, but ...
> 
> I'm looking to plot the hottest path through a branchy function.

Use --branch-history to get a histogram of the hottest paths.
If you want a longer history than the 16/32 LBR entries, you can also
use PT (Intel Processor Trace) and tell the decoder to synthesize larger LBRs.

However, it currently cannot output metadata such as cycles.

Alternatively, apply the following patches to show the path for
individual samples with perf script
(unfortunately the disassembler support was rejected, so this
won't be available in mainline):

https://git.kernel.org/cgit/linux/kernel/git/ak/linux-misc.git/log/?h=perf/disassembler-1

-Andi


end of thread, other threads:[~2016-07-02 20:38 UTC | newest]

Thread overview: 21+ messages
2015-07-18 15:24 Cycles annotation support for perf tools v3 Andi Kleen
2015-07-18 15:24 ` [PATCH 1/9] perf, tools: Add tools support for cycles, weight branch_info field Andi Kleen
2015-08-07  7:19   ` [tip:perf/core] perf tools: Add " tip-bot for Andi Kleen
2015-07-18 15:24 ` [PATCH 2/9] perf, tools, report: Add flag for non ANY branch mode Andi Kleen
2015-08-07  7:19   ` [tip:perf/core] perf " tip-bot for Andi Kleen
2015-07-18 15:24 ` [PATCH 3/9] perf, tools, report: Add infrastructure for a cycles histogram Andi Kleen
2015-08-07  7:20   ` [tip:perf/core] perf " tip-bot for Andi Kleen
2015-07-18 15:24 ` [PATCH 4/9] perf, tools, report: Add processing for cycle histograms Andi Kleen
2015-08-07  7:20   ` [tip:perf/core] perf " tip-bot for Andi Kleen
2015-07-18 15:24 ` [PATCH 5/9] perf, tools: Compute IPC and basic block cycles for annotate Andi Kleen
2015-08-07  7:20   ` [tip:perf/core] perf annotate: Compute IPC and basic block cycles tip-bot for Andi Kleen
2016-06-30  8:53   ` [PATCH 5/9] perf, tools: Compute IPC and basic block cycles for annotate Peter Zijlstra
2016-07-02 20:38     ` Andi Kleen
2015-07-18 15:24 ` [PATCH 6/9] perf, tools, annotate: Finally display IPC and cycle accounting Andi Kleen
2015-08-07  7:21   ` [tip:perf/core] perf " tip-bot for Andi Kleen
2015-07-18 15:24 ` [PATCH 7/9] perf, tools, top: Add branch annotation code to top Andi Kleen
2015-08-07  7:21   ` [tip:perf/core] perf " tip-bot for Andi Kleen
2015-07-18 15:24 ` [PATCH 8/9] perf, tools, report: Display cycles in branch sort mode Andi Kleen
2015-08-07  7:21   ` [tip:perf/core] perf " tip-bot for Andi Kleen
2015-07-18 15:24 ` [PATCH 9/9] test patch: Add fake branch cycles to input data in report/top Andi Kleen
2015-08-06 19:44 ` Cycles annotation support for perf tools v3 Arnaldo Carvalho de Melo
