All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/5] perf report: Support sorting all blocks by cycles
@ 2019-10-15  5:33 Jin Yao
  2019-10-15  5:33 ` [PATCH v2 1/5] perf util: Create new block.h/block.c for block related functions Jin Yao
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Jin Yao @ 2019-10-15  5:33 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

It would be useful to support sorting for all blocks by the
sampled cycles percent per block. This is useful to concentrate
on the globally hottest blocks.

This patch series implements a new sort option "total_cycles" which
sorts all blocks by 'Sampled Cycles%'. The 'Sampled Cycles%' is
block sampled cycles aggregation / total sampled cycles

For example,

perf record -b ./div
perf report -s total_cycles --stdio

 # To display the perf.data header info, please use --header/--header-only options.
 #
 #
 # Total Lost Samples: 0
 #
 # Samples: 2M of event 'cycles'
 # Event count (approx.): 2753248
 #
 # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                              [Program Block Range]         Shared Object
 # ...............  ..............  ...........  ..........  .................................................................  ....................
 #
            26.04%            2.8M        0.40%          18                                             [div.c:42 -> div.c:39]                   div
            15.17%            1.2M        0.16%           7                                 [random_r.c:357 -> random_r.c:380]          libc-2.27.so
             5.11%          402.0K        0.04%           2                                             [div.c:27 -> div.c:28]                   div
             4.87%          381.6K        0.04%           2                                     [random.c:288 -> random.c:291]          libc-2.27.so
             4.53%          381.0K        0.04%           2                                             [div.c:40 -> div.c:40]                   div
             3.85%          300.9K        0.02%           1                                             [div.c:22 -> div.c:25]                   div
             3.08%          241.1K        0.02%           1                                           [rand.c:26 -> rand.c:27]          libc-2.27.so
             3.06%          240.0K        0.02%           1                                     [random.c:291 -> random.c:291]          libc-2.27.so
             2.78%          215.7K        0.02%           1                                     [random.c:298 -> random.c:298]          libc-2.27.so
             2.52%          198.3K        0.02%           1                                     [random.c:293 -> random.c:293]          libc-2.27.so
             2.36%          184.8K        0.02%           1                                           [rand.c:28 -> rand.c:28]          libc-2.27.so
             2.33%          180.5K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
             2.28%          176.7K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
             2.20%          168.8K        0.02%           1                                         [rand@plt+0 -> rand@plt+0]                   div
             1.98%          158.2K        0.02%           1                                 [random_r.c:388 -> random_r.c:388]          libc-2.27.so
             1.57%          123.3K        0.02%           1                                             [div.c:42 -> div.c:44]                   div
             1.44%          116.0K        0.42%          19                                 [random_r.c:357 -> random_r.c:394]          libc-2.27.so
 ......

This patch series supports both stdio and tui. And also with the supporting
of --percent-limit.

 v2:
 ---
 Rebase to perf/core branch

Jin Yao (5):
  perf util: Create new block.h/block.c for block related functions
  perf util: Count the total cycles of all samples
  perf report: Sort by sampled cycles percent per block for stdio
  perf report: Support --percent-limit for total_cycles
  perf report: Sort by sampled cycles percent per block for tui

 tools/perf/Documentation/perf-report.txt |  10 +
 tools/perf/builtin-annotate.c            |   2 +-
 tools/perf/builtin-diff.c                |  41 +--
 tools/perf/builtin-report.c              | 445 ++++++++++++++++++++++-
 tools/perf/builtin-top.c                 |   3 +-
 tools/perf/ui/browsers/hists.c           |  62 +++-
 tools/perf/ui/browsers/hists.h           |   2 +
 tools/perf/ui/stdio/hist.c               |  29 +-
 tools/perf/util/Build                    |   1 +
 tools/perf/util/block.c                  |  73 ++++
 tools/perf/util/block.h                  |  40 ++
 tools/perf/util/hist.c                   |  11 +-
 tools/perf/util/hist.h                   |  15 +-
 tools/perf/util/sort.c                   |   5 +
 tools/perf/util/sort.h                   |   1 +
 tools/perf/util/symbol.c                 |  22 --
 tools/perf/util/symbol.h                 |  24 --
 tools/perf/util/symbol_conf.h            |   1 +
 18 files changed, 694 insertions(+), 93 deletions(-)
 create mode 100644 tools/perf/util/block.c
 create mode 100644 tools/perf/util/block.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/5] perf util: Create new block.h/block.c for block related functions
  2019-10-15  5:33 [PATCH v2 0/5] perf report: Support sorting all blocks by cycles Jin Yao
@ 2019-10-15  5:33 ` Jin Yao
  2019-10-21 16:08   ` Jiri Olsa
  2019-10-15  5:33 ` [PATCH v2 2/5] perf util: Count the total cycles of all samples Jin Yao
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Jin Yao @ 2019-10-15  5:33 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

We have already implemented some block related functions. Now it's
time to do some cleanup, and move the functions and structures to
the new block.h/block.c.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/builtin-diff.c | 38 ++---------------------
 tools/perf/util/Build     |  1 +
 tools/perf/util/block.c   | 63 +++++++++++++++++++++++++++++++++++++++
 tools/perf/util/block.h   | 37 +++++++++++++++++++++++
 tools/perf/util/hist.c    |  1 +
 tools/perf/util/symbol.c  | 22 --------------
 tools/perf/util/symbol.h  | 24 ---------------
 7 files changed, 104 insertions(+), 82 deletions(-)
 create mode 100644 tools/perf/util/block.c
 create mode 100644 tools/perf/util/block.h

diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 5281629c27b1..05925b39718a 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -24,6 +24,7 @@
 #include "util/annotate.h"
 #include "util/map.h"
 #include "util/spark.h"
+#include "util/block.h"
 #include <linux/err.h>
 #include <linux/zalloc.h>
 #include <subcmd/pager.h>
@@ -537,41 +538,6 @@ static void hists__baseline_only(struct hists *hists)
 	}
 }
 
-static int64_t block_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
-			 struct hist_entry *left, struct hist_entry *right)
-{
-	struct block_info *bi_l = left->block_info;
-	struct block_info *bi_r = right->block_info;
-	int cmp;
-
-	if (!bi_l->sym || !bi_r->sym) {
-		if (!bi_l->sym && !bi_r->sym)
-			return 0;
-		else if (!bi_l->sym)
-			return -1;
-		else
-			return 1;
-	}
-
-	if (bi_l->sym == bi_r->sym) {
-		if (bi_l->start == bi_r->start) {
-			if (bi_l->end == bi_r->end)
-				return 0;
-			else
-				return (int64_t)(bi_r->end - bi_l->end);
-		} else
-			return (int64_t)(bi_r->start - bi_l->start);
-	} else {
-		cmp = strcmp(bi_l->sym->name, bi_r->sym->name);
-		return cmp;
-	}
-
-	if (bi_l->sym->start != bi_r->sym->start)
-		return (int64_t)(bi_r->sym->start - bi_l->sym->start);
-
-	return (int64_t)(bi_r->sym->end - bi_l->sym->end);
-}
-
 static int64_t block_cycles_diff_cmp(struct hist_entry *left,
 				     struct hist_entry *right)
 {
@@ -600,7 +566,7 @@ static void init_block_hist(struct block_hist *bh)
 
 	INIT_LIST_HEAD(&bh->block_fmt.list);
 	INIT_LIST_HEAD(&bh->block_fmt.sort_list);
-	bh->block_fmt.cmp = block_cmp;
+	bh->block_fmt.cmp = block_info__cmp;
 	bh->block_fmt.sort = block_sort;
 	perf_hpp_list__register_sort_field(&bh->block_list,
 					   &bh->block_fmt);
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 39814b1806a6..3121c0055950 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -1,4 +1,5 @@
 perf-y += annotate.o
+perf-y += block.o
 perf-y += block-range.o
 perf-y += build-id.o
 perf-y += cacheline.o
diff --git a/tools/perf/util/block.c b/tools/perf/util/block.c
new file mode 100644
index 000000000000..e5e6f941f040
--- /dev/null
+++ b/tools/perf/util/block.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <stdlib.h>
+#include <string.h>
+#include <linux/zalloc.h>
+#include "block.h"
+#include "sort.h"
+
+struct block_info *block_info__get(struct block_info *bi)
+{
+	if (bi)
+		refcount_inc(&bi->refcnt);
+	return bi;
+}
+
+void block_info__put(struct block_info *bi)
+{
+	if (bi && refcount_dec_and_test(&bi->refcnt))
+		free(bi);
+}
+
+struct block_info *block_info__new(void)
+{
+	struct block_info *bi = zalloc(sizeof(*bi));
+
+	if (bi)
+		refcount_set(&bi->refcnt, 1);
+	return bi;
+}
+
+int64_t block_info__cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+			struct hist_entry *left, struct hist_entry *right)
+{
+	struct block_info *bi_l = left->block_info;
+	struct block_info *bi_r = right->block_info;
+	int cmp;
+
+	if (!bi_l->sym || !bi_r->sym) {
+		if (!bi_l->sym && !bi_r->sym)
+			return 0;
+		else if (!bi_l->sym)
+			return -1;
+		else
+			return 1;
+	}
+
+	if (bi_l->sym == bi_r->sym) {
+		if (bi_l->start == bi_r->start) {
+			if (bi_l->end == bi_r->end)
+				return 0;
+			else
+				return (int64_t)(bi_r->end - bi_l->end);
+		} else
+			return (int64_t)(bi_r->start - bi_l->start);
+	} else {
+		cmp = strcmp(bi_l->sym->name, bi_r->sym->name);
+		return cmp;
+	}
+
+	if (bi_l->sym->start != bi_r->sym->start)
+		return (int64_t)(bi_r->sym->start - bi_l->sym->start);
+
+	return (int64_t)(bi_r->sym->end - bi_l->sym->end);
+}
diff --git a/tools/perf/util/block.h b/tools/perf/util/block.h
new file mode 100644
index 000000000000..b6730204d0b9
--- /dev/null
+++ b/tools/perf/util/block.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PERF_BLOCK_H
+#define __PERF_BLOCK_H
+
+#include <linux/types.h>
+#include <linux/refcount.h>
+#include "util/hist.h"
+#include "util/symbol.h"
+
+struct block_info {
+	struct symbol		*sym;
+	u64			start;
+	u64			end;
+	u64			cycles;
+	u64			cycles_aggr;
+	s64			cycles_spark[NUM_SPARKS];
+	int			num;
+	int			num_aggr;
+	refcount_t		refcnt;
+};
+
+struct block_info *block_info__new(void);
+struct block_info *block_info__get(struct block_info *bi);
+void   block_info__put(struct block_info *bi);
+
+static inline void __block_info__zput(struct block_info **bi)
+{
+	block_info__put(*bi);
+	*bi = NULL;
+}
+
+#define block_info__zput(bi) __block_info__zput(&bi)
+
+int64_t block_info__cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+			struct hist_entry *left, struct hist_entry *right);
+
+#endif /* __PERF_BLOCK_H */
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 679a1d75090c..26ee45a3e5d0 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -18,6 +18,7 @@
 #include "srcline.h"
 #include "symbol.h"
 #include "thread.h"
+#include "block.h"
 #include "ui/progress.h"
 #include <errno.h>
 #include <math.h>
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index a8f80e427674..fac8887a6759 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -2371,25 +2371,3 @@ struct mem_info *mem_info__new(void)
 		refcount_set(&mi->refcnt, 1);
 	return mi;
 }
-
-struct block_info *block_info__get(struct block_info *bi)
-{
-	if (bi)
-		refcount_inc(&bi->refcnt);
-	return bi;
-}
-
-void block_info__put(struct block_info *bi)
-{
-	if (bi && refcount_dec_and_test(&bi->refcnt))
-		free(bi);
-}
-
-struct block_info *block_info__new(void)
-{
-	struct block_info *bi = zalloc(sizeof(*bi));
-
-	if (bi)
-		refcount_set(&bi->refcnt, 1);
-	return bi;
-}
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index cc2a89b99d3d..c3bd16d75d5d 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -106,18 +106,6 @@ struct ref_reloc_sym {
 	u64		unrelocated_addr;
 };
 
-struct block_info {
-	struct symbol		*sym;
-	u64			start;
-	u64			end;
-	u64			cycles;
-	u64			cycles_aggr;
-	s64			cycles_spark[NUM_SPARKS];
-	int			num;
-	int			num_aggr;
-	refcount_t		refcnt;
-};
-
 struct addr_location {
 	struct machine *machine;
 	struct thread *thread;
@@ -291,16 +279,4 @@ static inline void __mem_info__zput(struct mem_info **mi)
 
 #define mem_info__zput(mi) __mem_info__zput(&mi)
 
-struct block_info *block_info__new(void);
-struct block_info *block_info__get(struct block_info *bi);
-void   block_info__put(struct block_info *bi);
-
-static inline void __block_info__zput(struct block_info **bi)
-{
-	block_info__put(*bi);
-	*bi = NULL;
-}
-
-#define block_info__zput(bi) __block_info__zput(&bi)
-
 #endif /* __PERF_SYMBOL */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 2/5] perf util: Count the total cycles of all samples
  2019-10-15  5:33 [PATCH v2 0/5] perf report: Support sorting all blocks by cycles Jin Yao
  2019-10-15  5:33 ` [PATCH v2 1/5] perf util: Create new block.h/block.c for block related functions Jin Yao
@ 2019-10-15  5:33 ` Jin Yao
  2019-10-15  5:33 ` [PATCH v2 3/5] perf report: Sort by sampled cycles percent per block for stdio Jin Yao
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Jin Yao @ 2019-10-15  5:33 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

We can get the per sample cycles by hist__account_cycles(). It's also
useful to know the total cycles of all samples in order to get the
cycles coverage for a single program block in further. For example,

coverage = per block sampled cycles / total sampled cycles

This patch creates a new argument 'total_cycles' in hist__account_cycles(),
which will be added with the cycles of each sample.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/builtin-annotate.c | 2 +-
 tools/perf/builtin-diff.c     | 3 ++-
 tools/perf/builtin-report.c   | 2 +-
 tools/perf/builtin-top.c      | 3 ++-
 tools/perf/util/hist.c        | 6 +++++-
 tools/perf/util/hist.h        | 3 ++-
 6 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 8db8fc9bddef..6ab0cc45b287 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -201,7 +201,7 @@ static int process_branch_callback(struct evsel *evsel,
 	if (a.map != NULL)
 		a.map->dso->hit = 1;
 
-	hist__account_cycles(sample->branch_stack, al, sample, false);
+	hist__account_cycles(sample->branch_stack, al, sample, false, NULL);
 
 	ret = hist_entry_iter__add(&iter, &a, PERF_MAX_STACK_DEPTH, ann);
 	return ret;
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 05925b39718a..7487e113454f 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -428,7 +428,8 @@ static int diff__process_sample_event(struct perf_tool *tool,
 			goto out_put;
 		}
 
-		hist__account_cycles(sample->branch_stack, &al, sample, false);
+		hist__account_cycles(sample->branch_stack, &al, sample, false,
+				     NULL);
 	}
 
 	/*
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 7accaf8ef689..cdb436d6e11f 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -292,7 +292,7 @@ static int process_sample_event(struct perf_tool *tool,
 
 	if (ui__has_annotation() || rep->symbol_ipc) {
 		hist__account_cycles(sample->branch_stack, &al, sample,
-				     rep->nonany_branch_mode);
+				     rep->nonany_branch_mode, NULL);
 	}
 
 	ret = hist_entry_iter__add(&iter, &al, rep->max_stack, rep);
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index d96f24c8770d..14c52e4d47f6 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -725,7 +725,8 @@ static int hist_iter__top_callback(struct hist_entry_iter *iter,
 		perf_top__record_precise_ip(top, he, iter->sample, evsel, al->addr);
 
 	hist__account_cycles(iter->sample->branch_stack, al, iter->sample,
-		     !(top->record_opts.branch_stack & PERF_SAMPLE_BRANCH_ANY));
+		     !(top->record_opts.branch_stack & PERF_SAMPLE_BRANCH_ANY),
+		     NULL);
 	return 0;
 }
 
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 26ee45a3e5d0..af65ce950ba2 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -2570,7 +2570,8 @@ int hists__unlink(struct hists *hists)
 }
 
 void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
-			  struct perf_sample *sample, bool nonany_branch_mode)
+			  struct perf_sample *sample, bool nonany_branch_mode,
+			  u64 *total_cycles)
 {
 	struct branch_info *bi;
 
@@ -2597,6 +2598,9 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
 					nonany_branch_mode ? NULL : prev,
 					bi[i].flags.cycles);
 				prev = &bi[i].to;
+
+				if (total_cycles)
+					*total_cycles += bi[i].flags.cycles;
 			}
 			free(bi);
 		}
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 6a186b668303..4d87c7b4c1b2 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -527,7 +527,8 @@ unsigned int hists__sort_list_width(struct hists *hists);
 unsigned int hists__overhead_width(struct hists *hists);
 
 void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
-			  struct perf_sample *sample, bool nonany_branch_mode);
+			  struct perf_sample *sample, bool nonany_branch_mode,
+			  u64 *total_cycles);
 
 struct option;
 int parse_filter_percentage(const struct option *opt, const char *arg, int unset);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 3/5] perf report: Sort by sampled cycles percent per block for stdio
  2019-10-15  5:33 [PATCH v2 0/5] perf report: Support sorting all blocks by cycles Jin Yao
  2019-10-15  5:33 ` [PATCH v2 1/5] perf util: Create new block.h/block.c for block related functions Jin Yao
  2019-10-15  5:33 ` [PATCH v2 2/5] perf util: Count the total cycles of all samples Jin Yao
@ 2019-10-15  5:33 ` Jin Yao
  2019-10-15  8:41   ` Jiri Olsa
                     ` (2 more replies)
  2019-10-15  5:33 ` [PATCH v2 4/5] perf report: Support --percent-limit for total_cycles Jin Yao
  2019-10-15  5:33 ` [PATCH v2 5/5] perf report: Sort by sampled cycles percent per block for tui Jin Yao
  4 siblings, 3 replies; 20+ messages in thread
From: Jin Yao @ 2019-10-15  5:33 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

It would be useful to support sorting for all blocks by the
sampled cycles percent per block. This is useful to concentrate
on the globally hottest blocks.

This patch implements a new sort option "total_cycles" which sorts
all blocks by 'Sampled Cycles%'. The 'Sampled Cycles%' is the
percent that is,
	block sampled cycles aggregation / total sampled cycles

Note that, this patch only supports "--stdio" mode.

For example,

perf record -b ./div
perf report -s total_cycles --stdio

 # To display the perf.data header info, please use --header/--header-only options.
 #
 #
 # Total Lost Samples: 0
 #
 # Samples: 2M of event 'cycles'
 # Event count (approx.): 2753248
 #
 # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                              [Program Block Range]         Shared Object
 # ...............  ..............  ...........  ..........  .................................................................  ....................
 #
            26.04%            2.8M        0.40%          18                                             [div.c:42 -> div.c:39]                   div
            15.17%            1.2M        0.16%           7                                 [random_r.c:357 -> random_r.c:380]          libc-2.27.so
             5.11%          402.0K        0.04%           2                                             [div.c:27 -> div.c:28]                   div
             4.87%          381.6K        0.04%           2                                     [random.c:288 -> random.c:291]          libc-2.27.so
             4.53%          381.0K        0.04%           2                                             [div.c:40 -> div.c:40]                   div
             3.85%          300.9K        0.02%           1                                             [div.c:22 -> div.c:25]                   div
             3.08%          241.1K        0.02%           1                                           [rand.c:26 -> rand.c:27]          libc-2.27.so
             3.06%          240.0K        0.02%           1                                     [random.c:291 -> random.c:291]          libc-2.27.so
             2.78%          215.7K        0.02%           1                                     [random.c:298 -> random.c:298]          libc-2.27.so
             2.52%          198.3K        0.02%           1                                     [random.c:293 -> random.c:293]          libc-2.27.so
             2.36%          184.8K        0.02%           1                                           [rand.c:28 -> rand.c:28]          libc-2.27.so
             2.33%          180.5K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
             2.28%          176.7K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
             2.20%          168.8K        0.02%           1                                         [rand@plt+0 -> rand@plt+0]                   div
             1.98%          158.2K        0.02%           1                                 [random_r.c:388 -> random_r.c:388]          libc-2.27.so
             1.57%          123.3K        0.02%           1                                             [div.c:42 -> div.c:44]                   div
             1.44%          116.0K        0.42%          19                                 [random_r.c:357 -> random_r.c:394]          libc-2.27.so
             0.25%          182.5K        0.02%           1                                 [random_r.c:388 -> random_r.c:391]          libc-2.27.so
             0.00%              48        1.07%          48                         [x86_pmu_enable+284 -> x86_pmu_enable+298]     [kernel.kallsyms]
             0.00%              74        1.64%          74                              [vm_mmap_pgoff+0 -> vm_mmap_pgoff+92]     [kernel.kallsyms]
             0.00%              73        1.62%          73                                          [vm_mmap+0 -> vm_mmap+48]     [kernel.kallsyms]
             0.00%              63        0.69%          31                                        [up_write+0 -> up_write+34]     [kernel.kallsyms]
             0.00%              13        0.29%          13                       [setup_arg_pages+396 -> setup_arg_pages+413]     [kernel.kallsyms]
             0.00%               3        0.07%           3                       [setup_arg_pages+418 -> setup_arg_pages+450]     [kernel.kallsyms]
             0.00%             616        6.84%         308                    [security_mmap_file+0 -> security_mmap_file+72]     [kernel.kallsyms]
             0.00%              23        0.51%          23                   [security_mmap_file+77 -> security_mmap_file+87]     [kernel.kallsyms]
             0.00%               4        0.02%           1                                   [sched_clock+0 -> sched_clock+4]     [kernel.kallsyms]
             0.00%               4        0.02%           1                                  [sched_clock+9 -> sched_clock+12]     [kernel.kallsyms]
             0.00%               1        0.02%           1                                 [rcu_nmi_exit+0 -> rcu_nmi_exit+9]     [kernel.kallsyms]
             0.00%               1        0.02%           1                               [rcu_nmi_exit+14 -> rcu_nmi_exit+15]     [kernel.kallsyms]
             0.00%               5        0.11%           5                                [rcu_irq_exit+0 -> rcu_irq_exit+79]     [kernel.kallsyms]
             0.00%               2        0.04%           2                          [printk_nmi_exit+0 -> printk_nmi_exit+16]     [kernel.kallsyms]
             0.00%               3        0.07%           3            [perf_sample_event_took+0 -> perf_sample_event_took+56]     [kernel.kallsyms]
             0.00%               1        0.02%           1         [perf_sample_event_took+184 -> perf_sample_event_took+185]     [kernel.kallsyms]
             0.00%              72        1.60%          72                        [perf_iterate_ctx+0 -> perf_iterate_ctx+50]     [kernel.kallsyms]
             0.00%               1        0.02%           1           [perf_event_nmi_handler+50 -> perf_event_nmi_handler+52]     [kernel.kallsyms]
             0.00%               1        0.02%           1           [perf_event_nmi_handler+57 -> perf_event_nmi_handler+63]     [kernel.kallsyms]
             0.00%               1        0.02%           1           [perf_event_nmi_handler+68 -> perf_event_nmi_handler+74]     [kernel.kallsyms]

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/Documentation/perf-report.txt |  10 +
 tools/perf/builtin-report.c              | 423 ++++++++++++++++++++++-
 tools/perf/ui/stdio/hist.c               |  22 ++
 tools/perf/util/block.h                  |   1 +
 tools/perf/util/hist.c                   |   4 +
 tools/perf/util/sort.c                   |   5 +
 tools/perf/util/sort.h                   |   1 +
 tools/perf/util/symbol_conf.h            |   1 +
 8 files changed, 463 insertions(+), 4 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 7315f155803f..b1f9c93a91fd 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -124,6 +124,7 @@ OPTIONS
 	- in_tx: branch in TSX transaction
 	- abort: TSX transaction abort.
 	- cycles: Cycles in basic block
+	- total_cycles: sort all blocks by the 'Sampled Cycles%'.
 
 	And default sort keys are changed to comm, dso_from, symbol_from, dso_to
 	and symbol_to, see '--branch-stack'.
@@ -136,6 +137,15 @@ OPTIONS
 	executed, such as a memory access bottleneck. If a function has high overhead
 	and low IPC, it's worth further analyzing it to optimize its performance.
 
+	When the total_cycles is specified, it supports sorting for all blocks by the
+	'Sampled Cycles%'. This is useful to concentrate on the globally slowest blocks.
+	In output, there are some new columns:
+	'Sampled Cycles%' - block sampled cycles aggregation / total sampled cycles
+	'Sampled Cycles' - block sampled cycles aggregation
+	'Avg Cycles%' - block average sampled cycles / sum of total block average
+	sampled cycles
+	'Avg Cycles' - block average sampled cycles
+
 	If the --mem-mode option is used, the following sort keys are also available
 	(incompatible with --branch-stack):
 	symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline.
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index cdb436d6e11f..f04a7b225a81 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -51,6 +51,7 @@
 #include "util/util.h" // perf_tip()
 #include "ui/ui.h"
 #include "ui/progress.h"
+#include "util/block.h"
 
 #include <dlfcn.h>
 #include <errno.h>
@@ -96,10 +97,64 @@ struct report {
 	float			min_percent;
 	u64			nr_entries;
 	u64			queue_size;
+	u64			cycles_count;
+	u64			block_cycles;
 	int			socket_filter;
 	DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
 	struct branch_type_stat	brtype_stat;
 	bool			symbol_ipc;
+	bool			total_cycles;
+	struct block_hist	block_hist;
+};
+
+struct block_fmt {
+	struct perf_hpp_fmt	fmt;
+	int			idx;
+	int			width;
+	const char		*header;
+	struct report		*rep;
+};
+
+enum {
+	PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV,
+	PERF_HPP_REPORT__BLOCK_LBR_CYCLES,
+	PERF_HPP_REPORT__BLOCK_CYCLES_PCT,
+	PERF_HPP_REPORT__BLOCK_AVG_CYCLES,
+	PERF_HPP_REPORT__BLOCK_RANGE,
+	PERF_HPP_REPORT__BLOCK_DSO,
+	PERF_HPP_REPORT__BLOCK_MAX_INDEX
+};
+
+static struct block_fmt block_fmts[PERF_HPP_REPORT__BLOCK_MAX_INDEX];
+
+static struct block_header_column{
+	const char *name;
+	int width;
+} block_columns[PERF_HPP_REPORT__BLOCK_MAX_INDEX] = {
+	[PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV] = {
+		.name = "Sampled Cycles%",
+		.width = 15,
+	},
+	[PERF_HPP_REPORT__BLOCK_LBR_CYCLES] = {
+		.name = "Sampled Cycles",
+		.width = 14,
+	},
+	[PERF_HPP_REPORT__BLOCK_CYCLES_PCT] = {
+		.name = "Avg Cycles%",
+		.width = 11,
+	},
+	[PERF_HPP_REPORT__BLOCK_AVG_CYCLES] = {
+		.name = "Avg Cycles",
+		.width = 10,
+	},
+	[PERF_HPP_REPORT__BLOCK_RANGE] = {
+		.name = "[Program Block Range]",
+		.width = 70,
+	},
+	[PERF_HPP_REPORT__BLOCK_DSO] = {
+		.name = "Shared Object",
+		.width = 20,
+	}
 };
 
 static int report__config(const char *var, const char *value, void *cb)
@@ -277,7 +332,8 @@ static int process_sample_event(struct perf_tool *tool,
 		if (!sample->branch_stack)
 			goto out_put;
 
-		iter.add_entry_cb = hist_iter__branch_callback;
+		if (!rep->total_cycles)
+			iter.add_entry_cb = hist_iter__branch_callback;
 		iter.ops = &hist_iter_branch;
 	} else if (rep->mem_mode) {
 		iter.ops = &hist_iter_mem;
@@ -290,9 +346,10 @@ static int process_sample_event(struct perf_tool *tool,
 	if (al.map != NULL)
 		al.map->dso->hit = 1;
 
-	if (ui__has_annotation() || rep->symbol_ipc) {
+	if (ui__has_annotation() || rep->symbol_ipc || rep->total_cycles) {
 		hist__account_cycles(sample->branch_stack, &al, sample,
-				     rep->nonany_branch_mode, NULL);
+				     rep->nonany_branch_mode,
+				     &rep->cycles_count);
 	}
 
 	ret = hist_entry_iter__add(&iter, &al, rep->max_stack, rep);
@@ -480,6 +537,349 @@ static size_t hists__fprintf_nr_sample_events(struct hists *hists, struct report
 	return ret + fprintf(fp, "\n#\n");
 }
 
+static int block_column_header(struct perf_hpp_fmt *fmt __maybe_unused,
+			       struct perf_hpp *hpp __maybe_unused,
+			       struct hists *hists __maybe_unused,
+			       int line __maybe_unused,
+			       int *span __maybe_unused)
+{
+	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
+
+	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
+			 block_fmt->header);
+}
+
+static int block_column_width(struct perf_hpp_fmt *fmt __maybe_unused,
+			      struct perf_hpp *hpp __maybe_unused,
+			      struct hists *hists __maybe_unused)
+{
+	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
+
+	return block_fmt->width;
+}
+
+static int block_cycles_cov_entry(struct perf_hpp_fmt *fmt,
+				  struct perf_hpp *hpp, struct hist_entry *he)
+{
+	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
+	struct report *rep = block_fmt->rep;
+	struct block_info *bi = he->block_info;
+	double ratio = 0.0;
+	char buf[16];
+
+	if (rep->cycles_count)
+		ratio = (double)bi->cycles / (double)rep->cycles_count;
+
+	sprintf(buf, "%.2f%%", 100.0 * ratio);
+
+	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, buf);
+}
+
+static int64_t block_cycles_cov_sort(struct perf_hpp_fmt *fmt,
+				     struct hist_entry *left,
+				     struct hist_entry *right)
+{
+	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
+	struct report *rep = block_fmt->rep;
+	struct block_info *bi_l = left->block_info;
+	struct block_info *bi_r = right->block_info;
+	double l, r;
+
+	if (rep->cycles_count) {
+		l = ((double)bi_l->cycles / (double)rep->cycles_count) * 1000.0;
+		r = ((double)bi_r->cycles / (double)rep->cycles_count) * 1000.0;
+		return (int64_t)l - (int64_t)r;
+	}
+
+	return 0;
+}
+
+static void cycles_string(u64 cycles, char *buf, int size)
+{
+	if (cycles >= 1000000)
+		scnprintf(buf, size, "%.1fM", (double)cycles / 1000000.0);
+	else if (cycles >= 1000)
+		scnprintf(buf, size, "%.1fK", (double)cycles / 1000.0);
+	else
+		scnprintf(buf, size, "%1d", cycles);
+}
+
+static int block_cycles_lbr_entry(struct perf_hpp_fmt *fmt,
+				  struct perf_hpp *hpp, struct hist_entry *he)
+{
+	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
+	struct block_info *bi = he->block_info;
+	char cycles_buf[16];
+
+	cycles_string(bi->cycles_aggr, cycles_buf, sizeof(cycles_buf));
+
+	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
+			 cycles_buf);
+}
+
+static int block_cycles_pct_entry(struct perf_hpp_fmt *fmt,
+				  struct perf_hpp *hpp, struct hist_entry *he)
+{
+	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
+	struct report *rep = block_fmt->rep;
+	struct block_info *bi = he->block_info;
+	double ratio = 0.0;
+	u64 avg;
+	char buf[16];
+
+	if (rep->block_cycles && bi->num_aggr) {
+		avg = bi->cycles_aggr / bi->num_aggr;
+		ratio = (double)avg / (double)rep->block_cycles;
+	}
+
+	sprintf(buf, "%.2f%%", 100.0 * ratio);
+
+	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, buf);
+}
+
+static int block_avg_cycles_entry(struct perf_hpp_fmt *fmt,
+				  struct perf_hpp *hpp,
+				  struct hist_entry *he)
+{
+	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
+	struct block_info *bi = he->block_info;
+	char cycles_buf[16];
+
+	cycles_string(bi->cycles_aggr / bi->num_aggr, cycles_buf,
+		      sizeof(cycles_buf));
+
+	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
+			 cycles_buf);
+}
+
+static int block_range_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+			     struct hist_entry *he)
+{
+	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
+	struct block_info *bi = he->block_info;
+	char buf[128];
+	char *start_line, *end_line;
+
+	symbol_conf.disable_add2line_warn = true;
+
+	start_line = map__srcline(he->ms.map, bi->sym->start + bi->start,
+				  he->ms.sym);
+
+	end_line = map__srcline(he->ms.map, bi->sym->start + bi->end,
+				he->ms.sym);
+
+	if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) {
+		scnprintf(buf, sizeof(buf), "[%s -> %s]",
+			  start_line, end_line);
+	} else {
+		scnprintf(buf, sizeof(buf), "[%7lx -> %7lx]",
+			  bi->start, bi->end);
+	}
+
+	free_srcline(start_line);
+	free_srcline(end_line);
+
+	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, buf);
+}
+
+static int block_dso_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+			   struct hist_entry *he)
+{
+	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
+	struct map *map = he->ms.map;
+
+	if (map && map->dso) {
+		return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
+				 map->dso->short_name);
+	}
+
+	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
+			 "[unknown]");
+}
+
+static void init_block_header(struct block_fmt *block_fmt)
+{
+	struct perf_hpp_fmt *fmt = &block_fmt->fmt;
+
+	BUG_ON(block_fmt->idx >= PERF_HPP_REPORT__BLOCK_MAX_INDEX);
+
+	block_fmt->header = block_columns[block_fmt->idx].name;
+	block_fmt->width = block_columns[block_fmt->idx].width;
+
+	fmt->header = block_column_header;
+	fmt->width = block_column_width;
+}
+
+static void block_hpp_register(struct block_fmt *block_fmt, int idx,
+			       struct perf_hpp_list *hpp_list,
+			       struct report *rep)
+{
+	struct perf_hpp_fmt *fmt = &block_fmt->fmt;
+
+	block_fmt->rep = rep;
+	block_fmt->idx = idx;
+	INIT_LIST_HEAD(&fmt->list);
+	INIT_LIST_HEAD(&fmt->sort_list);
+
+	switch (idx) {
+	case PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV:
+		fmt->entry = block_cycles_cov_entry;
+		fmt->cmp = block_info__cmp;
+		fmt->sort = block_cycles_cov_sort;
+		break;
+	case PERF_HPP_REPORT__BLOCK_LBR_CYCLES:
+		fmt->entry = block_cycles_lbr_entry;
+		break;
+	case PERF_HPP_REPORT__BLOCK_CYCLES_PCT:
+		fmt->entry = block_cycles_pct_entry;
+		break;
+	case PERF_HPP_REPORT__BLOCK_AVG_CYCLES:
+		fmt->entry = block_avg_cycles_entry;
+		break;
+	case PERF_HPP_REPORT__BLOCK_RANGE:
+		fmt->entry = block_range_entry;
+		break;
+	case PERF_HPP_REPORT__BLOCK_DSO:
+		fmt->entry = block_dso_entry;
+		break;
+	default:
+		return;
+	}
+
+	init_block_header(block_fmt);
+	perf_hpp_list__column_register(hpp_list, fmt);
+}
+
+static void register_block_columns(struct perf_hpp_list *hpp_list,
+				   struct report *rep)
+{
+	for (int i = 0; i < PERF_HPP_REPORT__BLOCK_MAX_INDEX; i++)
+		block_hpp_register(&block_fmts[i], i, hpp_list, rep);
+}
+
+static void init_block_hist(struct block_hist *bh, struct report *rep)
+{
+	__hists__init(&bh->block_hists, &bh->block_list);
+	perf_hpp_list__init(&bh->block_list);
+	bh->block_list.nr_header_lines = 1;
+
+	register_block_columns(&bh->block_list, rep);
+
+	perf_hpp_list__register_sort_field(&bh->block_list,
+		&block_fmts[PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV].fmt);
+}
+
+static void init_block_info(struct block_info *bi, struct symbol *sym,
+			    struct cyc_hist *ch, int offset,
+			    u64 total_cycles)
+{
+	bi->sym = sym;
+	bi->start = ch->start;
+	bi->end = offset;
+	bi->cycles = ch->cycles;
+	bi->cycles_aggr = ch->cycles_aggr;
+	bi->num = ch->num;
+	bi->num_aggr = ch->num_aggr;
+	bi->total_cycles = total_cycles;
+}
+
+static int add_block_per_sym(struct hist_entry *he, struct block_hist *bh,
+			     u64 *block_cycles, u64 total_cycles)
+{
+	struct annotation *notes;
+	struct cyc_hist *ch;
+	static struct addr_location al;
+	u64 cycles = 0;
+
+	if (!he->ms.map || !he->ms.sym)
+		return 0;
+
+	memset(&al, 0, sizeof(al));
+	al.map = he->ms.map;
+	al.sym = he->ms.sym;
+
+	notes = symbol__annotation(he->ms.sym);
+	if (!notes || !notes->src || !notes->src->cycles_hist)
+		return 0;
+	ch = notes->src->cycles_hist;
+	for (unsigned int i = 0; i < symbol__size(he->ms.sym); i++) {
+		if (ch[i].num_aggr) {
+			struct block_info *bi;
+			struct hist_entry *he_block;
+
+			bi = block_info__new();
+			if (!bi)
+				return -1;
+
+			init_block_info(bi, he->ms.sym, &ch[i], i,
+					total_cycles);
+			cycles += bi->cycles_aggr / bi->num_aggr;
+
+			he_block = hists__add_entry_block(&bh->block_hists,
+							  &al, bi);
+			if (!he_block) {
+				block_info__put(bi);
+				return -1;
+			}
+		}
+	}
+
+	if (block_cycles)
+		*block_cycles += cycles;
+
+	return 0;
+}
+
+static int resort_cb(struct hist_entry *he, void *arg __maybe_unused)
+{
+	/* Skip the calculation of column length in output_resort */
+	he->filtered = true;
+	return 0;
+}
+
+static void hists__clear_filtered(struct hists *hists)
+{
+	struct rb_node *next = rb_first_cached(&hists->entries);
+	struct hist_entry *he;
+
+	while (next) {
+		he = rb_entry(next, struct hist_entry, rb_node);
+		he->filtered = false;
+		next = rb_next(&he->rb_node);
+	}
+}
+
+static void get_block_hists(struct hists *hists, struct block_hist *bh,
+			    struct report *rep)
+{
+	struct rb_node *next = rb_first_cached(&hists->entries);
+	struct hist_entry *he;
+
+	init_block_hist(bh, rep);
+
+	while (next) {
+		he = rb_entry(next, struct hist_entry, rb_node);
+		add_block_per_sym(he, bh, &rep->block_cycles,
+				  rep->cycles_count);
+		next = rb_next(&he->rb_node);
+	}
+
+	hists__output_resort_cb(&bh->block_hists, NULL, resort_cb);
+	hists__clear_filtered(&bh->block_hists);
+}
+
+static int hists__fprintf_all_blocks(struct hists *hists, struct report *rep)
+{
+	struct block_hist *bh = &rep->block_hist;
+
+	get_block_hists(hists, bh, rep);
+	symbol_conf.report_individual_block = true;
+	hists__fprintf(&bh->block_hists, true, 0, 0, 0,
+		       stdout, true);
+	hists__delete_entries(&bh->block_hists);
+	return 0;
+}
+
 static int perf_evlist__tty_browse_hists(struct evlist *evlist,
 					 struct report *rep,
 					 const char *help)
@@ -500,6 +900,12 @@ static int perf_evlist__tty_browse_hists(struct evlist *evlist,
 			continue;
 
 		hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
+
+		if (rep->total_cycles) {
+			hists__fprintf_all_blocks(hists, rep);
+			continue;
+		}
+
 		hists__fprintf(hists, !quiet, 0, 0, rep->min_percent, stdout,
 			       !(symbol_conf.use_callchain ||
 			         symbol_conf.show_branchflag_count));
@@ -1373,6 +1779,15 @@ int cmd_report(int argc, const char **argv)
 		goto error;
 	}
 
+	if (sort_order && strstr(sort_order, "total_cycles") &&
+	    (sort__mode == SORT_MODE__BRANCH)) {
+		report.total_cycles = true;
+		if (!report.use_stdio) {
+			pr_err("Error: -s total_cycles can be only used together with --stdio\n");
+			goto error;
+		}
+	}
+
 	if (strcmp(input_name, "-") != 0)
 		setup_browser(true);
 	else
@@ -1423,7 +1838,7 @@ int cmd_report(int argc, const char **argv)
 	 * so don't allocate extra space that won't be used in the stdio
 	 * implementation.
 	 */
-	if (ui__has_annotation() || report.symbol_ipc) {
+	if (ui__has_annotation() || report.symbol_ipc || report.total_cycles) {
 		ret = symbol__annotation_init();
 		if (ret < 0)
 			goto error;
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 5365606e9dad..655ef7708cd0 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -558,6 +558,25 @@ static int hist_entry__block_fprintf(struct hist_entry *he,
 	return ret;
 }
 
+static int hist_entry__individual_block_fprintf(struct hist_entry *he,
+						char *bf, size_t size,
+						FILE *fp)
+{
+	int ret = 0;
+
+	struct perf_hpp hpp = {
+		.buf		= bf,
+		.size		= size,
+		.skip		= false,
+	};
+
+	hist_entry__snprintf(he, &hpp);
+	if (!hpp.skip)
+		ret += fprintf(fp, "%s\n", bf);
+
+	return ret;
+}
+
 static int hist_entry__fprintf(struct hist_entry *he, size_t size,
 			       char *bf, size_t bfsz, FILE *fp,
 			       bool ignore_callchains)
@@ -580,6 +599,9 @@ static int hist_entry__fprintf(struct hist_entry *he, size_t size,
 	if (symbol_conf.report_block)
 		return hist_entry__block_fprintf(he, bf, size, fp);
 
+	if (symbol_conf.report_individual_block)
+		return hist_entry__individual_block_fprintf(he, bf, size, fp);
+
 	hist_entry__snprintf(he, &hpp);
 
 	ret = fprintf(fp, "%s\n", bf);
diff --git a/tools/perf/util/block.h b/tools/perf/util/block.h
index b6730204d0b9..bdd21354d26e 100644
--- a/tools/perf/util/block.h
+++ b/tools/perf/util/block.h
@@ -13,6 +13,7 @@ struct block_info {
 	u64			end;
 	u64			cycles;
 	u64			cycles_aggr;
+	u64			total_cycles;
 	s64			cycles_spark[NUM_SPARKS];
 	int			num;
 	int			num_aggr;
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index af65ce950ba2..521f7185a94f 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -756,6 +756,10 @@ struct hist_entry *hists__add_entry_block(struct hists *hists,
 	struct hist_entry entry = {
 		.block_info = block_info,
 		.hists = hists,
+		.ms = {
+			.map = al->map,
+			.sym = al->sym,
+		},
 	}, *he = hists__findnew_entry(hists, &entry, al, false);
 
 	return he;
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 43d1d410854a..eb286700a8a9 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -492,6 +492,10 @@ struct sort_entry sort_sym_ipc_null = {
 	.se_width_idx	= HISTC_SYMBOL_IPC,
 };
 
+struct sort_entry sort_block_cycles = {
+	.se_cmp		= sort__sym_cmp,
+};
+
 /* --sort srcfile */
 
 static char no_srcfile[1];
@@ -1695,6 +1699,7 @@ static struct sort_dimension bstack_sort_dimensions[] = {
 	DIM(SORT_SRCLINE_FROM, "srcline_from", sort_srcline_from),
 	DIM(SORT_SRCLINE_TO, "srcline_to", sort_srcline_to),
 	DIM(SORT_SYM_IPC, "ipc_lbr", sort_sym_ipc),
+	DIM(SORT_BLOCK_CYCLES, "total_cycles", sort_block_cycles),
 };
 
 #undef DIM
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 5aff9542d9b7..2ede6c70ad56 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -239,6 +239,7 @@ enum sort_type {
 	SORT_SRCLINE_FROM,
 	SORT_SRCLINE_TO,
 	SORT_SYM_IPC,
+	SORT_BLOCK_CYCLES,
 
 	/* memory mode specific sort keys */
 	__SORT_MEMORY_MODE,
diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
index e6880789864c..10f1ec3e0349 100644
--- a/tools/perf/util/symbol_conf.h
+++ b/tools/perf/util/symbol_conf.h
@@ -40,6 +40,7 @@ struct symbol_conf {
 			raw_trace,
 			report_hierarchy,
 			report_block,
+			report_individual_block,
 			inline_name,
 			disable_add2line_warn;
 	const char	*vmlinux_name,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 4/5] perf report: Support --percent-limit for total_cycles
  2019-10-15  5:33 [PATCH v2 0/5] perf report: Support sorting all blocks by cycles Jin Yao
                   ` (2 preceding siblings ...)
  2019-10-15  5:33 ` [PATCH v2 3/5] perf report: Sort by sampled cycles percent per block for stdio Jin Yao
@ 2019-10-15  5:33 ` Jin Yao
  2019-10-15  5:33 ` [PATCH v2 5/5] perf report: Sort by sampled cycles percent per block for tui Jin Yao
  4 siblings, 0 replies; 20+ messages in thread
From: Jin Yao @ 2019-10-15  5:33 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

We have already supported the 'total_cycles' option in previous
patch. It's also useful to show entries only above a threshold
percent.

This patch enables '--percent-limit' for not showing entries
under that percent.

For example,

 perf report -s total_cycles --stdio --percent-limit 1

 # To display the perf.data header info, please use --header/--header-only options.
 #
 #
 # Total Lost Samples: 0
 #
 # Samples: 2M of event 'cycles'
 # Event count (approx.): 2753248
 #
 # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                              [Program Block Range]         Shared Object
 # ...............  ..............  ...........  ..........  .................................................................  ....................
 #
            26.04%            2.8M        0.40%          18                                             [div.c:42 -> div.c:39]                   div
            15.17%            1.2M        0.16%           7                                 [random_r.c:357 -> random_r.c:380]          libc-2.27.so
             5.11%          402.0K        0.04%           2                                             [div.c:27 -> div.c:28]                   div
             4.87%          381.6K        0.04%           2                                     [random.c:288 -> random.c:291]          libc-2.27.so
             4.53%          381.0K        0.04%           2                                             [div.c:40 -> div.c:40]                   div
             3.85%          300.9K        0.02%           1                                             [div.c:22 -> div.c:25]                   div
             3.08%          241.1K        0.02%           1                                           [rand.c:26 -> rand.c:27]          libc-2.27.so
             3.06%          240.0K        0.02%           1                                     [random.c:291 -> random.c:291]          libc-2.27.so
             2.78%          215.7K        0.02%           1                                     [random.c:298 -> random.c:298]          libc-2.27.so
             2.52%          198.3K        0.02%           1                                     [random.c:293 -> random.c:293]          libc-2.27.so
             2.36%          184.8K        0.02%           1                                           [rand.c:28 -> rand.c:28]          libc-2.27.so
             2.33%          180.5K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
             2.28%          176.7K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
             2.20%          168.8K        0.02%           1                                         [rand@plt+0 -> rand@plt+0]                   div
             1.98%          158.2K        0.02%           1                                 [random_r.c:388 -> random_r.c:388]          libc-2.27.so
             1.57%          123.3K        0.02%           1                                             [div.c:42 -> div.c:44]                   div
             1.44%          116.0K        0.42%          19                                 [random_r.c:357 -> random_r.c:394]          libc-2.27.so

It only shows the entries which 'Sampled Cycles%' > 1%.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/builtin-report.c |  2 +-
 tools/perf/ui/stdio/hist.c  |  7 ++++++-
 tools/perf/util/block.c     | 10 ++++++++++
 tools/perf/util/block.h     |  2 ++
 4 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index f04a7b225a81..5dbde541b797 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -874,7 +874,7 @@ static int hists__fprintf_all_blocks(struct hists *hists, struct report *rep)
 
 	get_block_hists(hists, bh, rep);
 	symbol_conf.report_individual_block = true;
-	hists__fprintf(&bh->block_hists, true, 0, 0, 0,
+	hists__fprintf(&bh->block_hists, true, 0, 0, rep->min_percent,
 		       stdout, true);
 	hists__delete_entries(&bh->block_hists);
 	return 0;
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 655ef7708cd0..d6991fc1d54e 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -15,6 +15,7 @@
 #include "../../util/srcline.h"
 #include "../../util/string2.h"
 #include "../../util/thread.h"
+#include "../../util/block.h"
 #include <linux/ctype.h>
 #include <linux/zalloc.h>
 
@@ -856,7 +857,11 @@ size_t hists__fprintf(struct hists *hists, bool show_header, int max_rows,
 		if (h->filtered)
 			continue;
 
-		percent = hist_entry__get_percent_limit(h);
+		if (symbol_conf.report_individual_block)
+			percent = block_total_cycles_percent(h);
+		else
+			percent = hist_entry__get_percent_limit(h);
+
 		if (percent < min_pcnt)
 			continue;
 
diff --git a/tools/perf/util/block.c b/tools/perf/util/block.c
index e5e6f941f040..791a16da2102 100644
--- a/tools/perf/util/block.c
+++ b/tools/perf/util/block.c
@@ -61,3 +61,13 @@ int64_t block_info__cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 
 	return (int64_t)(bi_r->sym->end - bi_l->sym->end);
 }
+
+float block_total_cycles_percent(struct hist_entry *he)
+{
+	struct block_info *bi = he->block_info;
+
+	if (bi->total_cycles)
+		return bi->cycles * 100.0 / bi->total_cycles;
+
+	return 0.0;
+}
diff --git a/tools/perf/util/block.h b/tools/perf/util/block.h
index bdd21354d26e..4c2101b1d114 100644
--- a/tools/perf/util/block.h
+++ b/tools/perf/util/block.h
@@ -35,4 +35,6 @@ static inline void __block_info__zput(struct block_info **bi)
 int64_t block_info__cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 			struct hist_entry *left, struct hist_entry *right);
 
+float block_total_cycles_percent(struct hist_entry *he);
+
 #endif /* __PERF_BLOCK_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 5/5] perf report: Sort by sampled cycles percent per block for tui
  2019-10-15  5:33 [PATCH v2 0/5] perf report: Support sorting all blocks by cycles Jin Yao
                   ` (3 preceding siblings ...)
  2019-10-15  5:33 ` [PATCH v2 4/5] perf report: Support --percent-limit for total_cycles Jin Yao
@ 2019-10-15  5:33 ` Jin Yao
  4 siblings, 0 replies; 20+ messages in thread
From: Jin Yao @ 2019-10-15  5:33 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

Previous patch has implemented a new sort option "total_cycles".
But there was only stdio mode supported.

This patch supports the tui mode and support '--percent-limit'.

For example,

 perf record -b ./div
 perf report -s total_cycles --percent-limit 1

 # Samples: 2753248 of event 'cycles'
 Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                              [Program Block Range]         Shared Object
          26.04%            2.8M        0.40%          18                                             [div.c:42 -> div.c:39]                   div
          15.17%            1.2M        0.16%           7                                 [random_r.c:357 -> random_r.c:380]          libc-2.27.so
           5.11%          402.0K        0.04%           2                                             [div.c:27 -> div.c:28]                   div
           4.87%          381.6K        0.04%           2                                     [random.c:288 -> random.c:291]          libc-2.27.so
           4.53%          381.0K        0.04%           2                                             [div.c:40 -> div.c:40]                   div
           3.85%          300.9K        0.02%           1                                             [div.c:22 -> div.c:25]                   div
           3.08%          241.1K        0.02%           1                                           [rand.c:26 -> rand.c:27]          libc-2.27.so
           3.06%          240.0K        0.02%           1                                     [random.c:291 -> random.c:291]          libc-2.27.so
           2.78%          215.7K        0.02%           1                                     [random.c:298 -> random.c:298]          libc-2.27.so
           2.52%          198.3K        0.02%           1                                     [random.c:293 -> random.c:293]          libc-2.27.so
           2.36%          184.8K        0.02%           1                                           [rand.c:28 -> rand.c:28]          libc-2.27.so
           2.33%          180.5K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
           2.28%          176.7K        0.02%           1                                     [random.c:295 -> random.c:295]          libc-2.27.so
           2.20%          168.8K        0.02%           1                                         [rand@plt+0 -> rand@plt+0]                   div
           1.98%          158.2K        0.02%           1                                 [random_r.c:388 -> random_r.c:388]          libc-2.27.so
           1.57%          123.3K        0.02%           1                                             [div.c:42 -> div.c:44]                   div
           1.44%          116.0K        0.42%          19                                 [random_r.c:357 -> random_r.c:394]          libc-2.27.so

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/builtin-report.c    | 30 +++++++++++++---
 tools/perf/ui/browsers/hists.c | 62 +++++++++++++++++++++++++++++++++-
 tools/perf/ui/browsers/hists.h |  2 ++
 tools/perf/util/hist.h         | 12 +++++++
 4 files changed, 101 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 5dbde541b797..7dc74d794265 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -880,6 +880,27 @@ static int hists__fprintf_all_blocks(struct hists *hists, struct report *rep)
 	return 0;
 }
 
+static int perf_evlist__tui_block_hists_browse(struct evlist *evlist,
+					       struct report *rep)
+{
+	struct block_hist *bh = &rep->block_hist;
+	struct evsel *pos;
+	int ret;
+
+	evlist__for_each_entry(evlist, pos) {
+		struct hists *hists = evsel__hists(pos);
+
+		get_block_hists(hists, bh, rep);
+		symbol_conf.report_individual_block = true;
+		ret = block_hists_tui_browse(bh, pos, rep->min_percent);
+		hists__delete_entries(&bh->block_hists);
+		if (ret != 0)
+			return ret;
+	}
+
+	return ret;
+}
+
 static int perf_evlist__tty_browse_hists(struct evlist *evlist,
 					 struct report *rep,
 					 const char *help)
@@ -988,6 +1009,11 @@ static int report__browse_hists(struct report *rep)
 
 	switch (use_browser) {
 	case 1:
+		if (rep->total_cycles) {
+			ret = perf_evlist__tui_block_hists_browse(evlist, rep);
+			break;
+		}
+
 		ret = perf_evlist__tui_browse_hists(evlist, help, NULL,
 						    rep->min_percent,
 						    &session->header.env,
@@ -1782,10 +1808,6 @@ int cmd_report(int argc, const char **argv)
 	if (sort_order && strstr(sort_order, "total_cycles") &&
 	    (sort__mode == SORT_MODE__BRANCH)) {
 		report.total_cycles = true;
-		if (!report.use_stdio) {
-			pr_err("Error: -s total_cycles can be only used together with --stdio\n");
-			goto error;
-		}
 	}
 
 	if (strcmp(input_name, "-") != 0)
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 7a7187e069b4..e0f3b1d4a43b 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -26,6 +26,7 @@
 #include "../../util/sort.h"
 #include "../../util/top.h"
 #include "../../util/thread.h"
+#include "../../util/block.h"
 #include "../../arch/common.h"
 #include "../../perf.h"
 
@@ -1783,7 +1784,11 @@ static unsigned int hist_browser__refresh(struct ui_browser *browser)
 			continue;
 		}
 
-		percent = hist_entry__get_percent_limit(h);
+		if (symbol_conf.report_individual_block)
+			percent = block_total_cycles_percent(h);
+		else
+			percent = hist_entry__get_percent_limit(h);
+
 		if (percent < hb->min_pcnt)
 			continue;
 
@@ -3443,3 +3448,58 @@ int perf_evlist__tui_browse_hists(struct evlist *evlist, const char *help,
 					       warn_lost_event,
 					       annotation_opts);
 }
+
+static int block_hists_browser__title(struct hist_browser *browser, char *bf,
+				      size_t size)
+{
+	struct hists *hists = evsel__hists(browser->block_evsel);
+	const char *evname = perf_evsel__name(browser->block_evsel);
+	unsigned long nr_samples = hists->stats.nr_events[PERF_RECORD_SAMPLE];
+	int ret;
+
+	ret = scnprintf(bf, size, "# Samples: %lu", nr_samples);
+	if (evname)
+		scnprintf(bf + ret, size -  ret, " of event '%s'", evname);
+
+	return 0;
+}
+
+int block_hists_tui_browse(struct block_hist *bh, struct evsel *evsel,
+			   float min_percent)
+{
+	struct hists *hists = &bh->block_hists;
+	struct hist_browser *browser;
+	int key = -1;
+	static const char help[] =
+	" q             Quit \n";
+
+	browser = hist_browser__new(hists);
+	if (!browser)
+		return -1;
+
+	browser->block_evsel = evsel;
+	browser->title = block_hists_browser__title;
+	browser->min_pcnt = min_percent;
+
+	/* reset abort key so that it can get Ctrl-C as a key */
+	SLang_reset_tty();
+	SLang_init_tty(0, 0, 0);
+
+	while (1) {
+		key = hist_browser__run(browser, "? - help", true);
+
+		switch (key) {
+		case 'q':
+			goto out;
+		case '?':
+			ui_browser__help_window(&browser->b, help);
+			break;
+		default:
+			break;
+		}
+	}
+
+out:
+	hist_browser__delete(browser);
+	return 0;
+}
diff --git a/tools/perf/ui/browsers/hists.h b/tools/perf/ui/browsers/hists.h
index 91d3e18b50aa..078f2f2c7abd 100644
--- a/tools/perf/ui/browsers/hists.h
+++ b/tools/perf/ui/browsers/hists.h
@@ -5,6 +5,7 @@
 #include "ui/browser.h"
 
 struct annotation_options;
+struct evsel;
 
 struct hist_browser {
 	struct ui_browser   b;
@@ -15,6 +16,7 @@ struct hist_browser {
 	struct pstack	    *pstack;
 	struct perf_env	    *env;
 	struct annotation_options *annotation_opts;
+	struct evsel	    *block_evsel;
 	int		     print_seq;
 	bool		     show_dso;
 	bool		     show_headers;
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 4d87c7b4c1b2..f254fa349ad6 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -449,6 +449,8 @@ enum rstype {
 	A_SOURCE
 };
 
+struct block_hist;
+
 #ifdef HAVE_SLANG_SUPPORT
 #include "../ui/keysyms.h"
 void attr_to_script(char *buf, struct perf_event_attr *attr);
@@ -474,6 +476,9 @@ void run_script(char *cmd);
 int res_sample_browse(struct res_sample *res_samples, int num_res,
 		      struct evsel *evsel, enum rstype rstype);
 void res_sample_init(void);
+
+int block_hists_tui_browse(struct block_hist *bh, struct evsel *evsel,
+			   float min_percent);
 #else
 static inline
 int perf_evlist__tui_browse_hists(struct evlist *evlist __maybe_unused,
@@ -516,6 +521,13 @@ static inline int res_sample_browse(struct res_sample *res_samples __maybe_unuse
 	return 0;
 }
 
+static inline int block_hists_tui_browse(struct block_hist *bh __maybe_unused,
+					 struct evsel *evsel __maybe_unused,
+					 float min_percent __maybe_unused)
+{
+	return 0;
+}
+
 static inline void res_sample_init(void) {}
 
 #define K_LEFT  -1000
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/5] perf report: Sort by sampled cycles percent per block for stdio
  2019-10-15  5:33 ` [PATCH v2 3/5] perf report: Sort by sampled cycles percent per block for stdio Jin Yao
@ 2019-10-15  8:41   ` Jiri Olsa
  2019-10-15 14:53     ` Jin, Yao
  2019-10-21 16:07   ` Jiri Olsa
  2019-10-21 16:08   ` Jiri Olsa
  2 siblings, 1 reply; 20+ messages in thread
From: Jiri Olsa @ 2019-10-15  8:41 UTC (permalink / raw)
  To: Jin Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Tue, Oct 15, 2019 at 01:33:48PM +0800, Jin Yao wrote:

SNIP

> +enum {
> +	PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV,
> +	PERF_HPP_REPORT__BLOCK_LBR_CYCLES,
> +	PERF_HPP_REPORT__BLOCK_CYCLES_PCT,
> +	PERF_HPP_REPORT__BLOCK_AVG_CYCLES,
> +	PERF_HPP_REPORT__BLOCK_RANGE,
> +	PERF_HPP_REPORT__BLOCK_DSO,
> +	PERF_HPP_REPORT__BLOCK_MAX_INDEX
> +};
> +
> +static struct block_fmt block_fmts[PERF_HPP_REPORT__BLOCK_MAX_INDEX];
> +
> +static struct block_header_column{
> +	const char *name;
> +	int width;
> +} block_columns[PERF_HPP_REPORT__BLOCK_MAX_INDEX] = {
> +	[PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV] = {
> +		.name = "Sampled Cycles%",
> +		.width = 15,
> +	},
> +	[PERF_HPP_REPORT__BLOCK_LBR_CYCLES] = {
> +		.name = "Sampled Cycles",
> +		.width = 14,
> +	},
> +	[PERF_HPP_REPORT__BLOCK_CYCLES_PCT] = {
> +		.name = "Avg Cycles%",
> +		.width = 11,
> +	},
> +	[PERF_HPP_REPORT__BLOCK_AVG_CYCLES] = {
> +		.name = "Avg Cycles",
> +		.width = 10,
> +	},
> +	[PERF_HPP_REPORT__BLOCK_RANGE] = {
> +		.name = "[Program Block Range]",
> +		.width = 70,
> +	},
> +	[PERF_HPP_REPORT__BLOCK_DSO] = {
> +		.name = "Shared Object",
> +		.width = 20,
> +	}
>  };

so we already have support for multiple columns,
why don't you add those as 'struct sort_entry' objects?

SNIP

> +{
> +	struct block_hist *bh = &rep->block_hist;
> +
> +	get_block_hists(hists, bh, rep);
> +	symbol_conf.report_individual_block = true;
> +	hists__fprintf(&bh->block_hists, true, 0, 0, 0,
> +		       stdout, true);
> +	hists__delete_entries(&bh->block_hists);
> +	return 0;
> +}
> +
>  static int perf_evlist__tty_browse_hists(struct evlist *evlist,
>  					 struct report *rep,
>  					 const char *help)
> @@ -500,6 +900,12 @@ static int perf_evlist__tty_browse_hists(struct evlist *evlist,
>  			continue;
>  
>  		hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
> +
> +		if (rep->total_cycles) {
> +			hists__fprintf_all_blocks(hists, rep);

so this call kicks all the block info setup/count/print, right?

I thingk it shouldn't be in the output code, but in the code before..
from what I see you could count block_info counts during the sample
processing, no?

jirka

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/5] perf report: Sort by sampled cycles percent per block for stdio
  2019-10-15  8:41   ` Jiri Olsa
@ 2019-10-15 14:53     ` Jin, Yao
  2019-10-16 10:15       ` Jiri Olsa
  0 siblings, 1 reply; 20+ messages in thread
From: Jin, Yao @ 2019-10-15 14:53 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin



On 10/15/2019 4:41 PM, Jiri Olsa wrote:
> On Tue, Oct 15, 2019 at 01:33:48PM +0800, Jin Yao wrote:
> 
> SNIP
> 
>> +enum {
>> +	PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV,
>> +	PERF_HPP_REPORT__BLOCK_LBR_CYCLES,
>> +	PERF_HPP_REPORT__BLOCK_CYCLES_PCT,
>> +	PERF_HPP_REPORT__BLOCK_AVG_CYCLES,
>> +	PERF_HPP_REPORT__BLOCK_RANGE,
>> +	PERF_HPP_REPORT__BLOCK_DSO,
>> +	PERF_HPP_REPORT__BLOCK_MAX_INDEX
>> +};
>> +
>> +static struct block_fmt block_fmts[PERF_HPP_REPORT__BLOCK_MAX_INDEX];
>> +
>> +static struct block_header_column{
>> +	const char *name;
>> +	int width;
>> +} block_columns[PERF_HPP_REPORT__BLOCK_MAX_INDEX] = {
>> +	[PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV] = {
>> +		.name = "Sampled Cycles%",
>> +		.width = 15,
>> +	},
>> +	[PERF_HPP_REPORT__BLOCK_LBR_CYCLES] = {
>> +		.name = "Sampled Cycles",
>> +		.width = 14,
>> +	},
>> +	[PERF_HPP_REPORT__BLOCK_CYCLES_PCT] = {
>> +		.name = "Avg Cycles%",
>> +		.width = 11,
>> +	},
>> +	[PERF_HPP_REPORT__BLOCK_AVG_CYCLES] = {
>> +		.name = "Avg Cycles",
>> +		.width = 10,
>> +	},
>> +	[PERF_HPP_REPORT__BLOCK_RANGE] = {
>> +		.name = "[Program Block Range]",
>> +		.width = 70,
>> +	},
>> +	[PERF_HPP_REPORT__BLOCK_DSO] = {
>> +		.name = "Shared Object",
>> +		.width = 20,
>> +	}
>>   };
> 
> so we already have support for multiple columns,
> why don't you add those as 'struct sort_entry' objects?
> 

For 'struct sort_entry' objects, do you mean I should reuse the 
"sort_dso" which has been implemented yet in util/sort.c?

For other columns, it looks we can't reuse the existing sort_entry objects.

> SNIP
> 
>> +{
>> +	struct block_hist *bh = &rep->block_hist;
>> +
>> +	get_block_hists(hists, bh, rep);
>> +	symbol_conf.report_individual_block = true;
>> +	hists__fprintf(&bh->block_hists, true, 0, 0, 0,
>> +		       stdout, true);
>> +	hists__delete_entries(&bh->block_hists);
>> +	return 0;
>> +}
>> +
>>   static int perf_evlist__tty_browse_hists(struct evlist *evlist,
>>   					 struct report *rep,
>>   					 const char *help)
>> @@ -500,6 +900,12 @@ static int perf_evlist__tty_browse_hists(struct evlist *evlist,
>>   			continue;
>>   
>>   		hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
>> +
>> +		if (rep->total_cycles) {
>> +			hists__fprintf_all_blocks(hists, rep);
> 
> so this call kicks all the block info setup/count/print, right?
> 

Yes, all in this call.

> I thingk it shouldn't be in the output code, but in the code before..
> from what I see you could count block_info counts during the sample
> processing, no?
>

In sample processing, we just get all symbols and account the cycles per 
symbol. We need to create/count the block_info at some points after the 
sample processing.

Maybe it's not very good to put block info setup/count/print in a call, 
but it's really not easy to process the block_info during the sample 
processing.

Thanks
Jin Yao

> jirka
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/5] perf report: Sort by sampled cycles percent per block for stdio
  2019-10-15 14:53     ` Jin, Yao
@ 2019-10-16 10:15       ` Jiri Olsa
  2019-10-16 10:51         ` Jin, Yao
  0 siblings, 1 reply; 20+ messages in thread
From: Jiri Olsa @ 2019-10-16 10:15 UTC (permalink / raw)
  To: Jin, Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Tue, Oct 15, 2019 at 10:53:18PM +0800, Jin, Yao wrote:

SNIP

> > > +static struct block_header_column{
> > > +	const char *name;
> > > +	int width;
> > > +} block_columns[PERF_HPP_REPORT__BLOCK_MAX_INDEX] = {
> > > +	[PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV] = {
> > > +		.name = "Sampled Cycles%",
> > > +		.width = 15,
> > > +	},
> > > +	[PERF_HPP_REPORT__BLOCK_LBR_CYCLES] = {
> > > +		.name = "Sampled Cycles",
> > > +		.width = 14,
> > > +	},
> > > +	[PERF_HPP_REPORT__BLOCK_CYCLES_PCT] = {
> > > +		.name = "Avg Cycles%",
> > > +		.width = 11,
> > > +	},
> > > +	[PERF_HPP_REPORT__BLOCK_AVG_CYCLES] = {
> > > +		.name = "Avg Cycles",
> > > +		.width = 10,
> > > +	},
> > > +	[PERF_HPP_REPORT__BLOCK_RANGE] = {
> > > +		.name = "[Program Block Range]",
> > > +		.width = 70,
> > > +	},
> > > +	[PERF_HPP_REPORT__BLOCK_DSO] = {
> > > +		.name = "Shared Object",
> > > +		.width = 20,
> > > +	}
> > >   };
> > 
> > so we already have support for multiple columns,
> > why don't you add those as 'struct sort_entry' objects?
> > 
> 
> For 'struct sort_entry' objects, do you mean I should reuse the "sort_dso"
> which has been implemented yet in util/sort.c?
> 
> For other columns, it looks we can't reuse the existing sort_entry objects.

I did not mean reuse, just add new sort entries
to current sort framework

> 
> > SNIP
> > 
> > > +{
> > > +	struct block_hist *bh = &rep->block_hist;
> > > +
> > > +	get_block_hists(hists, bh, rep);
> > > +	symbol_conf.report_individual_block = true;
> > > +	hists__fprintf(&bh->block_hists, true, 0, 0, 0,
> > > +		       stdout, true);
> > > +	hists__delete_entries(&bh->block_hists);
> > > +	return 0;
> > > +}
> > > +
> > >   static int perf_evlist__tty_browse_hists(struct evlist *evlist,
> > >   					 struct report *rep,
> > >   					 const char *help)
> > > @@ -500,6 +900,12 @@ static int perf_evlist__tty_browse_hists(struct evlist *evlist,
> > >   			continue;
> > >   		hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
> > > +
> > > +		if (rep->total_cycles) {
> > > +			hists__fprintf_all_blocks(hists, rep);
> > 
> > so this call kicks all the block info setup/count/print, right?
> > 
> 
> Yes, all in this call.
> 
> > I thingk it shouldn't be in the output code, but in the code before..
> > from what I see you could count block_info counts during the sample
> > processing, no?
> > 
> 
> In sample processing, we just get all symbols and account the cycles per
> symbol. We need to create/count the block_info at some points after the
> sample processing.

understand, but it needs to be outside display function

also, can't you gather the block_info data gradually
during the sample processing?

jirka

> 
> Maybe it's not very good to put block info setup/count/print in a call, but
> it's really not easy to process the block_info during the sample processing.
> 
> Thanks
> Jin Yao
> 
> > jirka
> > 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/5] perf report: Sort by sampled cycles percent per block for stdio
  2019-10-16 10:15       ` Jiri Olsa
@ 2019-10-16 10:51         ` Jin, Yao
  2019-10-16 12:53           ` Jiri Olsa
  0 siblings, 1 reply; 20+ messages in thread
From: Jin, Yao @ 2019-10-16 10:51 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin



On 10/16/2019 6:15 PM, Jiri Olsa wrote:
> On Tue, Oct 15, 2019 at 10:53:18PM +0800, Jin, Yao wrote:
> 
> SNIP
> 
>>>> +static struct block_header_column{
>>>> +	const char *name;
>>>> +	int width;
>>>> +} block_columns[PERF_HPP_REPORT__BLOCK_MAX_INDEX] = {
>>>> +	[PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV] = {
>>>> +		.name = "Sampled Cycles%",
>>>> +		.width = 15,
>>>> +	},
>>>> +	[PERF_HPP_REPORT__BLOCK_LBR_CYCLES] = {
>>>> +		.name = "Sampled Cycles",
>>>> +		.width = 14,
>>>> +	},
>>>> +	[PERF_HPP_REPORT__BLOCK_CYCLES_PCT] = {
>>>> +		.name = "Avg Cycles%",
>>>> +		.width = 11,
>>>> +	},
>>>> +	[PERF_HPP_REPORT__BLOCK_AVG_CYCLES] = {
>>>> +		.name = "Avg Cycles",
>>>> +		.width = 10,
>>>> +	},
>>>> +	[PERF_HPP_REPORT__BLOCK_RANGE] = {
>>>> +		.name = "[Program Block Range]",
>>>> +		.width = 70,
>>>> +	},
>>>> +	[PERF_HPP_REPORT__BLOCK_DSO] = {
>>>> +		.name = "Shared Object",
>>>> +		.width = 20,
>>>> +	}
>>>>    };
>>>
>>> so we already have support for multiple columns,
>>> why don't you add those as 'struct sort_entry' objects?
>>>
>>
>> For 'struct sort_entry' objects, do you mean I should reuse the "sort_dso"
>> which has been implemented yet in util/sort.c?
>>
>> For other columns, it looks we can't reuse the existing sort_entry objects.
> 
> I did not mean reuse, just add new sort entries
> to current sort framework
> 

Does it seem like what the c2c does?

For example, my new update is like:

struct block_dimension {
        const char              *header;
        int                     idx;
        int                     width;
        struct sort_entry       *se;
        int64_t (*cmp)(struct perf_hpp_fmt *,
                       struct hist_entry *, struct hist_entry *);
        int64_t (*sort)(struct perf_hpp_fmt *,
                        struct hist_entry *, struct hist_entry *);
        int   (*entry)(struct perf_hpp_fmt *, struct perf_hpp *,
                       struct hist_entry *);
};

struct block_fmt {
        struct perf_hpp_fmt     fmt;
        struct report           *rep;
        struct block_dimension  *dim;
};

static struct block_dimension dim_total_cycles_pct {
        .header = "Sampled Cycles%",
        .idx = PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_PCT,
        .width = 15,
        .cmp = block_info__cmp;
        .sort = block_cycles_cov_sort;
        .entry = block_cycles_cov_entry;
};

......

Is above new update correct?

>>
>>> SNIP
>>>
>>>> +{
>>>> +	struct block_hist *bh = &rep->block_hist;
>>>> +
>>>> +	get_block_hists(hists, bh, rep);
>>>> +	symbol_conf.report_individual_block = true;
>>>> +	hists__fprintf(&bh->block_hists, true, 0, 0, 0,
>>>> +		       stdout, true);
>>>> +	hists__delete_entries(&bh->block_hists);
>>>> +	return 0;
>>>> +}
>>>> +
>>>>    static int perf_evlist__tty_browse_hists(struct evlist *evlist,
>>>>    					 struct report *rep,
>>>>    					 const char *help)
>>>> @@ -500,6 +900,12 @@ static int perf_evlist__tty_browse_hists(struct evlist *evlist,
>>>>    			continue;
>>>>    		hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
>>>> +
>>>> +		if (rep->total_cycles) {
>>>> +			hists__fprintf_all_blocks(hists, rep);
>>>
>>> so this call kicks all the block info setup/count/print, right?
>>>
>>
>> Yes, all in this call.
>>
>>> I thingk it shouldn't be in the output code, but in the code before..
>>> from what I see you could count block_info counts during the sample
>>> processing, no?
>>>
>>
>> In sample processing, we just get all symbols and account the cycles per
>> symbol. We need to create/count the block_info at some points after the
>> sample processing.
> 
> understand, but it needs to be outside display function
> 

OK, I will move the code for block_info collection outside of display 
function.

> also, can't you gather the block_info data gradually
> during the sample processing?
> 

Looks we have to gather the block_info after the sample processing. 
That's because in each sample processing, we will call 
hist__account_cycles(). Then finally __symbol__account_cycles() gets 
called for accounting one basic block in this symbol.

ch[offset].num_aggr++;
ch[offset].cycles_aggr += cycles;

So actually, after the sample processing, we will get the num_aggr and 
cycles_aggr for this basic block and compute the average cycles for this 
block (cycles_aggr / num_aggr).

That's why I think we should gather the block_info after the sample 
processing.

Thanks
Jin Yao

> jirka
> 
>>
>> Maybe it's not very good to put block info setup/count/print in a call, but
>> it's really not easy to process the block_info during the sample processing.
>>
>> Thanks
>> Jin Yao
>>
>>> jirka
>>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/5] perf report: Sort by sampled cycles percent per block for stdio
  2019-10-16 10:51         ` Jin, Yao
@ 2019-10-16 12:53           ` Jiri Olsa
  2019-10-16 13:09             ` Jin, Yao
  2019-10-21  6:56             ` Jin, Yao
  0 siblings, 2 replies; 20+ messages in thread
From: Jiri Olsa @ 2019-10-16 12:53 UTC (permalink / raw)
  To: Jin, Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Wed, Oct 16, 2019 at 06:51:07PM +0800, Jin, Yao wrote:
> 
> 
> On 10/16/2019 6:15 PM, Jiri Olsa wrote:
> > On Tue, Oct 15, 2019 at 10:53:18PM +0800, Jin, Yao wrote:
> > 
> > SNIP
> > 
> > > > > +static struct block_header_column{
> > > > > +	const char *name;
> > > > > +	int width;
> > > > > +} block_columns[PERF_HPP_REPORT__BLOCK_MAX_INDEX] = {
> > > > > +	[PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV] = {
> > > > > +		.name = "Sampled Cycles%",
> > > > > +		.width = 15,
> > > > > +	},
> > > > > +	[PERF_HPP_REPORT__BLOCK_LBR_CYCLES] = {
> > > > > +		.name = "Sampled Cycles",
> > > > > +		.width = 14,
> > > > > +	},
> > > > > +	[PERF_HPP_REPORT__BLOCK_CYCLES_PCT] = {
> > > > > +		.name = "Avg Cycles%",
> > > > > +		.width = 11,
> > > > > +	},
> > > > > +	[PERF_HPP_REPORT__BLOCK_AVG_CYCLES] = {
> > > > > +		.name = "Avg Cycles",
> > > > > +		.width = 10,
> > > > > +	},
> > > > > +	[PERF_HPP_REPORT__BLOCK_RANGE] = {
> > > > > +		.name = "[Program Block Range]",
> > > > > +		.width = 70,
> > > > > +	},
> > > > > +	[PERF_HPP_REPORT__BLOCK_DSO] = {
> > > > > +		.name = "Shared Object",
> > > > > +		.width = 20,
> > > > > +	}
> > > > >    };
> > > > 
> > > > so we already have support for multiple columns,
> > > > why don't you add those as 'struct sort_entry' objects?
> > > > 
> > > 
> > > For 'struct sort_entry' objects, do you mean I should reuse the "sort_dso"
> > > which has been implemented yet in util/sort.c?
> > > 
> > > For other columns, it looks we can't reuse the existing sort_entry objects.
> > 
> > I did not mean reuse, just add new sort entries
> > to current sort framework
> > 
> 
> Does it seem like what the c2c does?

well c2c has its own data output with multiline column titles,
hence it has its own separate dimension stuff, but your code
output is within the standard perf report right? single column
output.. why couldn't you use just sort_entry ?

jirka

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/5] perf report: Sort by sampled cycles percent per block for stdio
  2019-10-16 12:53           ` Jiri Olsa
@ 2019-10-16 13:09             ` Jin, Yao
  2019-10-21  6:56             ` Jin, Yao
  1 sibling, 0 replies; 20+ messages in thread
From: Jin, Yao @ 2019-10-16 13:09 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin



On 10/16/2019 8:53 PM, Jiri Olsa wrote:
> On Wed, Oct 16, 2019 at 06:51:07PM +0800, Jin, Yao wrote:
>>
>>
>> On 10/16/2019 6:15 PM, Jiri Olsa wrote:
>>> On Tue, Oct 15, 2019 at 10:53:18PM +0800, Jin, Yao wrote:
>>>
>>> SNIP
>>>
>>>>>> +static struct block_header_column{
>>>>>> +	const char *name;
>>>>>> +	int width;
>>>>>> +} block_columns[PERF_HPP_REPORT__BLOCK_MAX_INDEX] = {
>>>>>> +	[PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV] = {
>>>>>> +		.name = "Sampled Cycles%",
>>>>>> +		.width = 15,
>>>>>> +	},
>>>>>> +	[PERF_HPP_REPORT__BLOCK_LBR_CYCLES] = {
>>>>>> +		.name = "Sampled Cycles",
>>>>>> +		.width = 14,
>>>>>> +	},
>>>>>> +	[PERF_HPP_REPORT__BLOCK_CYCLES_PCT] = {
>>>>>> +		.name = "Avg Cycles%",
>>>>>> +		.width = 11,
>>>>>> +	},
>>>>>> +	[PERF_HPP_REPORT__BLOCK_AVG_CYCLES] = {
>>>>>> +		.name = "Avg Cycles",
>>>>>> +		.width = 10,
>>>>>> +	},
>>>>>> +	[PERF_HPP_REPORT__BLOCK_RANGE] = {
>>>>>> +		.name = "[Program Block Range]",
>>>>>> +		.width = 70,
>>>>>> +	},
>>>>>> +	[PERF_HPP_REPORT__BLOCK_DSO] = {
>>>>>> +		.name = "Shared Object",
>>>>>> +		.width = 20,
>>>>>> +	}
>>>>>>     };
>>>>>
>>>>> so we already have support for multiple columns,
>>>>> why don't you add those as 'struct sort_entry' objects?
>>>>>
>>>>
>>>> For 'struct sort_entry' objects, do you mean I should reuse the "sort_dso"
>>>> which has been implemented yet in util/sort.c?
>>>>
>>>> For other columns, it looks we can't reuse the existing sort_entry objects.
>>>
>>> I did not mean reuse, just add new sort entries
>>> to current sort framework
>>>
>>
>> Does it seem like what the c2c does?
> 
> well c2c has its own data output with multiline column titles,
> hence it has its own separate dimension stuff, but your code
> output is within the standard perf report right? single column
> output.. why couldn't you use just sort_entry ?
> 
> jirka
> 

No, my output is probably not within standard perf report. They are
totally 6 columns but only one column ('Sampled Cycles%') needs to be 
sorted. Other columns are not sorted. They only output some information.

For sort_entry, maybe the sorted column ('Sampled Cycles%') can use 
sort_entry, but others may not need it. So I don't know if I really need 
the sort_entry for my case.

I just feel maybe I still misunderstand what you have suggested. :(

Thanks
Jin Yao

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/5] perf report: Sort by sampled cycles percent per block for stdio
  2019-10-16 12:53           ` Jiri Olsa
  2019-10-16 13:09             ` Jin, Yao
@ 2019-10-21  6:56             ` Jin, Yao
  2019-10-21 14:04               ` Jiri Olsa
  1 sibling, 1 reply; 20+ messages in thread
From: Jin, Yao @ 2019-10-21  6:56 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin



On 10/16/2019 8:53 PM, Jiri Olsa wrote:
> On Wed, Oct 16, 2019 at 06:51:07PM +0800, Jin, Yao wrote:
>>
>>
>> On 10/16/2019 6:15 PM, Jiri Olsa wrote:
>>> On Tue, Oct 15, 2019 at 10:53:18PM +0800, Jin, Yao wrote:
>>>
>>> SNIP
>>>
>>>>>> +static struct block_header_column{
>>>>>> +	const char *name;
>>>>>> +	int width;
>>>>>> +} block_columns[PERF_HPP_REPORT__BLOCK_MAX_INDEX] = {
>>>>>> +	[PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV] = {
>>>>>> +		.name = "Sampled Cycles%",
>>>>>> +		.width = 15,
>>>>>> +	},
>>>>>> +	[PERF_HPP_REPORT__BLOCK_LBR_CYCLES] = {
>>>>>> +		.name = "Sampled Cycles",
>>>>>> +		.width = 14,
>>>>>> +	},
>>>>>> +	[PERF_HPP_REPORT__BLOCK_CYCLES_PCT] = {
>>>>>> +		.name = "Avg Cycles%",
>>>>>> +		.width = 11,
>>>>>> +	},
>>>>>> +	[PERF_HPP_REPORT__BLOCK_AVG_CYCLES] = {
>>>>>> +		.name = "Avg Cycles",
>>>>>> +		.width = 10,
>>>>>> +	},
>>>>>> +	[PERF_HPP_REPORT__BLOCK_RANGE] = {
>>>>>> +		.name = "[Program Block Range]",
>>>>>> +		.width = 70,
>>>>>> +	},
>>>>>> +	[PERF_HPP_REPORT__BLOCK_DSO] = {
>>>>>> +		.name = "Shared Object",
>>>>>> +		.width = 20,
>>>>>> +	}
>>>>>>     };
>>>>>
>>>>> so we already have support for multiple columns,
>>>>> why don't you add those as 'struct sort_entry' objects?
>>>>>
>>>>
>>>> For 'struct sort_entry' objects, do you mean I should reuse the "sort_dso"
>>>> which has been implemented yet in util/sort.c?
>>>>
>>>> For other columns, it looks we can't reuse the existing sort_entry objects.
>>>
>>> I did not mean reuse, just add new sort entries
>>> to current sort framework
>>>
>>
>> Does it seem like what the c2c does?
> 
> well c2c has its own data output with multiline column titles,
> hence it has its own separate dimension stuff, but your code
> output is within the standard perf report right? single column
> output.. why couldn't you use just sort_entry ?
> 
> jirka
> 

Hi Jiri,

I've being thinking how to use sort_entry but I have some troubles.

In v2, I used "struct perf_hpp_fmt" to pass extra argument. For example,

static int64_t block_cycles_cov_sort(struct perf_hpp_fmt *fmt,
				     struct hist_entry *left,
				     struct hist_entry *right)
{
	struct block_fmt *block_fmt = container_of(fmt, ...);
	struct report *rep = block_fmt->rep;
	...
}

But if I just use sort_entry, I can't pass extra argument (it's not a 
good idea to add more fields in struct hist_entry).

int64_t sort__xxx_sort(struct hist_entry *left,
		       struct hist_entry *right)

And for entry print it's similar, I can't pass extra argument in.

In v2,
static int block_cycles_pct_entry(struct perf_hpp_fmt *fmt,
				  struct perf_hpp *hpp,
				  struct hist_entry *he)
{
	struct block_fmt *block_fmt = container_of(fmt,...);
	struct report *rep = block_fmt->rep;
	...
}

But for se_snprintf, I can't pass extra argument in.

hist_entry__xxx_snprintf(struct hist_entry *he, char *bf,
			 size_t size, unsigned int width)

That's why I feel headache for just using the sort_entry. :(

Thanks
Jin Yao

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/5] perf report: Sort by sampled cycles percent per block for stdio
  2019-10-21  6:56             ` Jin, Yao
@ 2019-10-21 14:04               ` Jiri Olsa
  2019-10-21 16:07                 ` Jiri Olsa
  0 siblings, 1 reply; 20+ messages in thread
From: Jiri Olsa @ 2019-10-21 14:04 UTC (permalink / raw)
  To: Jin, Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Mon, Oct 21, 2019 at 02:56:57PM +0800, Jin, Yao wrote:

SNIP

> > > Does it seem like what the c2c does?
> > 
> > well c2c has its own data output with multiline column titles,
> > hence it has its own separate dimension stuff, but your code
> > output is within the standard perf report right? single column
> > output.. why couldn't you use just sort_entry ?
> > 
> > jirka
> > 
> 
> Hi Jiri,
> 
> I've being thinking how to use sort_entry but I have some troubles.
> 
> In v2, I used "struct perf_hpp_fmt" to pass extra argument. For example,
> 
> static int64_t block_cycles_cov_sort(struct perf_hpp_fmt *fmt,
> 				     struct hist_entry *left,
> 				     struct hist_entry *right)
> {
> 	struct block_fmt *block_fmt = container_of(fmt, ...);
> 	struct report *rep = block_fmt->rep;
> 	...
> }
> 
> But if I just use sort_entry, I can't pass extra argument (it's not a good
> idea to add more fields in struct hist_entry).
> 
> int64_t sort__xxx_sort(struct hist_entry *left,
> 		       struct hist_entry *right)
> 
> And for entry print it's similar, I can't pass extra argument in.
> 
> In v2,
> static int block_cycles_pct_entry(struct perf_hpp_fmt *fmt,
> 				  struct perf_hpp *hpp,
> 				  struct hist_entry *he)
> {
> 	struct block_fmt *block_fmt = container_of(fmt,...);
> 	struct report *rep = block_fmt->rep;
> 	...
> }
> 
> But for se_snprintf, I can't pass extra argument in.
> 
> hist_entry__xxx_snprintf(struct hist_entry *he, char *bf,
> 			 size_t size, unsigned int width)
> 
> That's why I feel headache for just using the sort_entry. :(

you might be right, I just want to omit another field output framework ;-) 
I'm checking on this and will let you know if I find some way

jirka


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/5] perf report: Sort by sampled cycles percent per block for stdio
  2019-10-21 14:04               ` Jiri Olsa
@ 2019-10-21 16:07                 ` Jiri Olsa
  0 siblings, 0 replies; 20+ messages in thread
From: Jiri Olsa @ 2019-10-21 16:07 UTC (permalink / raw)
  To: Jin, Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Mon, Oct 21, 2019 at 04:04:39PM +0200, Jiri Olsa wrote:
> On Mon, Oct 21, 2019 at 02:56:57PM +0800, Jin, Yao wrote:
> 
> SNIP
> 
> > > > Does it seem like what the c2c does?
> > > 
> > > well c2c has its own data output with multiline column titles,
> > > hence it has its own separate dimension stuff, but your code
> > > output is within the standard perf report right? single column
> > > output.. why couldn't you use just sort_entry ?
> > > 
> > > jirka
> > > 
> > 
> > Hi Jiri,
> > 
> > I've being thinking how to use sort_entry but I have some troubles.
> > 
> > In v2, I used "struct perf_hpp_fmt" to pass extra argument. For example,
> > 
> > static int64_t block_cycles_cov_sort(struct perf_hpp_fmt *fmt,
> > 				     struct hist_entry *left,
> > 				     struct hist_entry *right)
> > {
> > 	struct block_fmt *block_fmt = container_of(fmt, ...);
> > 	struct report *rep = block_fmt->rep;
> > 	...
> > }
> > 
> > But if I just use sort_entry, I can't pass extra argument (it's not a good
> > idea to add more fields in struct hist_entry).
> > 
> > int64_t sort__xxx_sort(struct hist_entry *left,
> > 		       struct hist_entry *right)
> > 
> > And for entry print it's similar, I can't pass extra argument in.
> > 
> > In v2,
> > static int block_cycles_pct_entry(struct perf_hpp_fmt *fmt,
> > 				  struct perf_hpp *hpp,
> > 				  struct hist_entry *he)
> > {
> > 	struct block_fmt *block_fmt = container_of(fmt,...);
> > 	struct report *rep = block_fmt->rep;
> > 	...
> > }
> > 
> > But for se_snprintf, I can't pass extra argument in.
> > 
> > hist_entry__xxx_snprintf(struct hist_entry *he, char *bf,
> > 			 size_t size, unsigned int width)
> > 
> > That's why I feel headache for just using the sort_entry. :(
> 
> you might be right, I just want to omit another field output framework ;-) 
> I'm checking on this and will let you know if I find some way

ok, because it's based on branch data the sort entry would not be
convenient.. I'm sending some other comments for the patch

thanks,
jirka


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/5] perf report: Sort by sampled cycles percent per block for stdio
  2019-10-15  5:33 ` [PATCH v2 3/5] perf report: Sort by sampled cycles percent per block for stdio Jin Yao
  2019-10-15  8:41   ` Jiri Olsa
@ 2019-10-21 16:07   ` Jiri Olsa
  2019-10-22  0:57     ` Jin, Yao
  2019-10-21 16:08   ` Jiri Olsa
  2 siblings, 1 reply; 20+ messages in thread
From: Jiri Olsa @ 2019-10-21 16:07 UTC (permalink / raw)
  To: Jin Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Tue, Oct 15, 2019 at 01:33:48PM +0800, Jin Yao wrote:

SNIP

> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -51,6 +51,7 @@
>  #include "util/util.h" // perf_tip()
>  #include "ui/ui.h"
>  #include "ui/progress.h"
> +#include "util/block.h"
>  
>  #include <dlfcn.h>
>  #include <errno.h>
> @@ -96,10 +97,64 @@ struct report {
>  	float			min_percent;
>  	u64			nr_entries;
>  	u64			queue_size;
> +	u64			cycles_count;
> +	u64			block_cycles;
>  	int			socket_filter;
>  	DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
>  	struct branch_type_stat	brtype_stat;
>  	bool			symbol_ipc;
> +	bool			total_cycles;
> +	struct block_hist	block_hist;
> +};
> +

please put all this (everything that has *block* in the name ;-) )
to the block_info.[ch] and try to reuse some of the functions in
builtin-diff.c, looks like there's some duplicity like for
init_block_hist init_block_info

I understand that report and diff interface would be different
at some point, but there seems to be many common parts

jirka

> +struct block_fmt {
> +	struct perf_hpp_fmt	fmt;
> +	int			idx;
> +	int			width;
> +	const char		*header;
> +	struct report		*rep;
> +};
> +
> +enum {
> +	PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV,
> +	PERF_HPP_REPORT__BLOCK_LBR_CYCLES,
> +	PERF_HPP_REPORT__BLOCK_CYCLES_PCT,
> +	PERF_HPP_REPORT__BLOCK_AVG_CYCLES,
> +	PERF_HPP_REPORT__BLOCK_RANGE,
> +	PERF_HPP_REPORT__BLOCK_DSO,
> +	PERF_HPP_REPORT__BLOCK_MAX_INDEX
> +};
> +
> +static struct block_fmt block_fmts[PERF_HPP_REPORT__BLOCK_MAX_INDEX];
> +
> +static struct block_header_column{
> +	const char *name;
> +	int width;
> +} block_columns[PERF_HPP_REPORT__BLOCK_MAX_INDEX] = {
> +	[PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV] = {
> +		.name = "Sampled Cycles%",
> +		.width = 15,
> +	},
> +	[PERF_HPP_REPORT__BLOCK_LBR_CYCLES] = {
> +		.name = "Sampled Cycles",
> +		.width = 14,
> +	},
> +	[PERF_HPP_REPORT__BLOCK_CYCLES_PCT] = {
> +		.name = "Avg Cycles%",
> +		.width = 11,
> +	},
> +	[PERF_HPP_REPORT__BLOCK_AVG_CYCLES] = {
> +		.name = "Avg Cycles",
> +		.width = 10,
> +	},
> +	[PERF_HPP_REPORT__BLOCK_RANGE] = {
> +		.name = "[Program Block Range]",
> +		.width = 70,
> +	},
> +	[PERF_HPP_REPORT__BLOCK_DSO] = {
> +		.name = "Shared Object",
> +		.width = 20,
> +	}
>  };
>  
>  static int report__config(const char *var, const char *value, void *cb)
> @@ -277,7 +332,8 @@ static int process_sample_event(struct perf_tool *tool,
>  		if (!sample->branch_stack)
>  			goto out_put;
>  
> -		iter.add_entry_cb = hist_iter__branch_callback;
> +		if (!rep->total_cycles)
> +			iter.add_entry_cb = hist_iter__branch_callback;
>  		iter.ops = &hist_iter_branch;
>  	} else if (rep->mem_mode) {
>  		iter.ops = &hist_iter_mem;
> @@ -290,9 +346,10 @@ static int process_sample_event(struct perf_tool *tool,
>  	if (al.map != NULL)
>  		al.map->dso->hit = 1;
>  
> -	if (ui__has_annotation() || rep->symbol_ipc) {
> +	if (ui__has_annotation() || rep->symbol_ipc || rep->total_cycles) {
>  		hist__account_cycles(sample->branch_stack, &al, sample,
> -				     rep->nonany_branch_mode, NULL);
> +				     rep->nonany_branch_mode,
> +				     &rep->cycles_count);
>  	}
>  
>  	ret = hist_entry_iter__add(&iter, &al, rep->max_stack, rep);
> @@ -480,6 +537,349 @@ static size_t hists__fprintf_nr_sample_events(struct hists *hists, struct report
>  	return ret + fprintf(fp, "\n#\n");
>  }
>  
> +static int block_column_header(struct perf_hpp_fmt *fmt __maybe_unused,
> +			       struct perf_hpp *hpp __maybe_unused,
> +			       struct hists *hists __maybe_unused,
> +			       int line __maybe_unused,
> +			       int *span __maybe_unused)
> +{
> +	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
> +
> +	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
> +			 block_fmt->header);
> +}
> +
> +static int block_column_width(struct perf_hpp_fmt *fmt __maybe_unused,
> +			      struct perf_hpp *hpp __maybe_unused,
> +			      struct hists *hists __maybe_unused)
> +{
> +	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
> +
> +	return block_fmt->width;
> +}
> +
> +static int block_cycles_cov_entry(struct perf_hpp_fmt *fmt,
> +				  struct perf_hpp *hpp, struct hist_entry *he)
> +{
> +	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
> +	struct report *rep = block_fmt->rep;
> +	struct block_info *bi = he->block_info;
> +	double ratio = 0.0;
> +	char buf[16];
> +
> +	if (rep->cycles_count)
> +		ratio = (double)bi->cycles / (double)rep->cycles_count;
> +
> +	sprintf(buf, "%.2f%%", 100.0 * ratio);
> +
> +	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, buf);
> +}
> +
> +static int64_t block_cycles_cov_sort(struct perf_hpp_fmt *fmt,
> +				     struct hist_entry *left,
> +				     struct hist_entry *right)
> +{
> +	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
> +	struct report *rep = block_fmt->rep;
> +	struct block_info *bi_l = left->block_info;
> +	struct block_info *bi_r = right->block_info;
> +	double l, r;
> +
> +	if (rep->cycles_count) {
> +		l = ((double)bi_l->cycles / (double)rep->cycles_count) * 1000.0;
> +		r = ((double)bi_r->cycles / (double)rep->cycles_count) * 1000.0;
> +		return (int64_t)l - (int64_t)r;
> +	}
> +
> +	return 0;
> +}
> +
> +static void cycles_string(u64 cycles, char *buf, int size)
> +{
> +	if (cycles >= 1000000)
> +		scnprintf(buf, size, "%.1fM", (double)cycles / 1000000.0);
> +	else if (cycles >= 1000)
> +		scnprintf(buf, size, "%.1fK", (double)cycles / 1000.0);
> +	else
> +		scnprintf(buf, size, "%1d", cycles);
> +}
> +
> +static int block_cycles_lbr_entry(struct perf_hpp_fmt *fmt,
> +				  struct perf_hpp *hpp, struct hist_entry *he)
> +{
> +	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
> +	struct block_info *bi = he->block_info;
> +	char cycles_buf[16];
> +
> +	cycles_string(bi->cycles_aggr, cycles_buf, sizeof(cycles_buf));
> +
> +	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
> +			 cycles_buf);
> +}
> +
> +static int block_cycles_pct_entry(struct perf_hpp_fmt *fmt,
> +				  struct perf_hpp *hpp, struct hist_entry *he)
> +{
> +	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
> +	struct report *rep = block_fmt->rep;
> +	struct block_info *bi = he->block_info;
> +	double ratio = 0.0;
> +	u64 avg;
> +	char buf[16];
> +
> +	if (rep->block_cycles && bi->num_aggr) {
> +		avg = bi->cycles_aggr / bi->num_aggr;
> +		ratio = (double)avg / (double)rep->block_cycles;
> +	}
> +
> +	sprintf(buf, "%.2f%%", 100.0 * ratio);
> +
> +	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, buf);
> +}
> +
> +static int block_avg_cycles_entry(struct perf_hpp_fmt *fmt,
> +				  struct perf_hpp *hpp,
> +				  struct hist_entry *he)
> +{
> +	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
> +	struct block_info *bi = he->block_info;
> +	char cycles_buf[16];
> +
> +	cycles_string(bi->cycles_aggr / bi->num_aggr, cycles_buf,
> +		      sizeof(cycles_buf));
> +
> +	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
> +			 cycles_buf);
> +}
> +
> +static int block_range_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
> +			     struct hist_entry *he)
> +{
> +	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
> +	struct block_info *bi = he->block_info;
> +	char buf[128];
> +	char *start_line, *end_line;
> +
> +	symbol_conf.disable_add2line_warn = true;
> +
> +	start_line = map__srcline(he->ms.map, bi->sym->start + bi->start,
> +				  he->ms.sym);
> +
> +	end_line = map__srcline(he->ms.map, bi->sym->start + bi->end,
> +				he->ms.sym);
> +
> +	if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) {
> +		scnprintf(buf, sizeof(buf), "[%s -> %s]",
> +			  start_line, end_line);
> +	} else {
> +		scnprintf(buf, sizeof(buf), "[%7lx -> %7lx]",
> +			  bi->start, bi->end);
> +	}
> +
> +	free_srcline(start_line);
> +	free_srcline(end_line);
> +
> +	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, buf);
> +}
> +
> +static int block_dso_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
> +			   struct hist_entry *he)
> +{
> +	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
> +	struct map *map = he->ms.map;
> +
> +	if (map && map->dso) {
> +		return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
> +				 map->dso->short_name);
> +	}
> +
> +	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
> +			 "[unknown]");
> +}
> +
> +static void init_block_header(struct block_fmt *block_fmt)
> +{
> +	struct perf_hpp_fmt *fmt = &block_fmt->fmt;
> +
> +	BUG_ON(block_fmt->idx >= PERF_HPP_REPORT__BLOCK_MAX_INDEX);
> +
> +	block_fmt->header = block_columns[block_fmt->idx].name;
> +	block_fmt->width = block_columns[block_fmt->idx].width;
> +
> +	fmt->header = block_column_header;
> +	fmt->width = block_column_width;
> +}
> +
> +static void block_hpp_register(struct block_fmt *block_fmt, int idx,
> +			       struct perf_hpp_list *hpp_list,
> +			       struct report *rep)
> +{
> +	struct perf_hpp_fmt *fmt = &block_fmt->fmt;
> +
> +	block_fmt->rep = rep;
> +	block_fmt->idx = idx;
> +	INIT_LIST_HEAD(&fmt->list);
> +	INIT_LIST_HEAD(&fmt->sort_list);
> +
> +	switch (idx) {
> +	case PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV:
> +		fmt->entry = block_cycles_cov_entry;
> +		fmt->cmp = block_info__cmp;
> +		fmt->sort = block_cycles_cov_sort;
> +		break;
> +	case PERF_HPP_REPORT__BLOCK_LBR_CYCLES:
> +		fmt->entry = block_cycles_lbr_entry;
> +		break;
> +	case PERF_HPP_REPORT__BLOCK_CYCLES_PCT:
> +		fmt->entry = block_cycles_pct_entry;
> +		break;
> +	case PERF_HPP_REPORT__BLOCK_AVG_CYCLES:
> +		fmt->entry = block_avg_cycles_entry;
> +		break;
> +	case PERF_HPP_REPORT__BLOCK_RANGE:
> +		fmt->entry = block_range_entry;
> +		break;
> +	case PERF_HPP_REPORT__BLOCK_DSO:
> +		fmt->entry = block_dso_entry;
> +		break;
> +	default:
> +		return;
> +	}
> +
> +	init_block_header(block_fmt);
> +	perf_hpp_list__column_register(hpp_list, fmt);
> +}
> +
> +static void register_block_columns(struct perf_hpp_list *hpp_list,
> +				   struct report *rep)
> +{
> +	for (int i = 0; i < PERF_HPP_REPORT__BLOCK_MAX_INDEX; i++)
> +		block_hpp_register(&block_fmts[i], i, hpp_list, rep);
> +}
> +
> +static void init_block_hist(struct block_hist *bh, struct report *rep)
> +{
> +	__hists__init(&bh->block_hists, &bh->block_list);
> +	perf_hpp_list__init(&bh->block_list);
> +	bh->block_list.nr_header_lines = 1;
> +
> +	register_block_columns(&bh->block_list, rep);
> +
> +	perf_hpp_list__register_sort_field(&bh->block_list,
> +		&block_fmts[PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV].fmt);
> +}
> +
> +static void init_block_info(struct block_info *bi, struct symbol *sym,
> +			    struct cyc_hist *ch, int offset,
> +			    u64 total_cycles)
> +{
> +	bi->sym = sym;
> +	bi->start = ch->start;
> +	bi->end = offset;
> +	bi->cycles = ch->cycles;
> +	bi->cycles_aggr = ch->cycles_aggr;
> +	bi->num = ch->num;
> +	bi->num_aggr = ch->num_aggr;
> +	bi->total_cycles = total_cycles;
> +}
> +
> +static int add_block_per_sym(struct hist_entry *he, struct block_hist *bh,
> +			     u64 *block_cycles, u64 total_cycles)
> +{
> +	struct annotation *notes;
> +	struct cyc_hist *ch;
> +	static struct addr_location al;
> +	u64 cycles = 0;
> +
> +	if (!he->ms.map || !he->ms.sym)
> +		return 0;
> +
> +	memset(&al, 0, sizeof(al));
> +	al.map = he->ms.map;
> +	al.sym = he->ms.sym;
> +
> +	notes = symbol__annotation(he->ms.sym);
> +	if (!notes || !notes->src || !notes->src->cycles_hist)
> +		return 0;
> +	ch = notes->src->cycles_hist;
> +	for (unsigned int i = 0; i < symbol__size(he->ms.sym); i++) {
> +		if (ch[i].num_aggr) {
> +			struct block_info *bi;
> +			struct hist_entry *he_block;
> +
> +			bi = block_info__new();
> +			if (!bi)
> +				return -1;
> +
> +			init_block_info(bi, he->ms.sym, &ch[i], i,
> +					total_cycles);
> +			cycles += bi->cycles_aggr / bi->num_aggr;
> +
> +			he_block = hists__add_entry_block(&bh->block_hists,
> +							  &al, bi);
> +			if (!he_block) {
> +				block_info__put(bi);
> +				return -1;
> +			}
> +		}
> +	}
> +
> +	if (block_cycles)
> +		*block_cycles += cycles;
> +
> +	return 0;
> +}
> +
> +static int resort_cb(struct hist_entry *he, void *arg __maybe_unused)
> +{
> +	/* Skip the calculation of column length in output_resort */
> +	he->filtered = true;
> +	return 0;
> +}
> +
> +static void hists__clear_filtered(struct hists *hists)
> +{
> +	struct rb_node *next = rb_first_cached(&hists->entries);
> +	struct hist_entry *he;
> +
> +	while (next) {
> +		he = rb_entry(next, struct hist_entry, rb_node);
> +		he->filtered = false;
> +		next = rb_next(&he->rb_node);
> +	}
> +}
> +
> +static void get_block_hists(struct hists *hists, struct block_hist *bh,
> +			    struct report *rep)
> +{
> +	struct rb_node *next = rb_first_cached(&hists->entries);
> +	struct hist_entry *he;
> +
> +	init_block_hist(bh, rep);
> +
> +	while (next) {
> +		he = rb_entry(next, struct hist_entry, rb_node);
> +		add_block_per_sym(he, bh, &rep->block_cycles,
> +				  rep->cycles_count);
> +		next = rb_next(&he->rb_node);
> +	}
> +
> +	hists__output_resort_cb(&bh->block_hists, NULL, resort_cb);
> +	hists__clear_filtered(&bh->block_hists);
> +}
> +
> +static int hists__fprintf_all_blocks(struct hists *hists, struct report *rep)
> +{
> +	struct block_hist *bh = &rep->block_hist;
> +
> +	get_block_hists(hists, bh, rep);
> +	symbol_conf.report_individual_block = true;
> +	hists__fprintf(&bh->block_hists, true, 0, 0, 0,
> +		       stdout, true);
> +	hists__delete_entries(&bh->block_hists);
> +	return 0;
> +}
> +
>  static int perf_evlist__tty_browse_hists(struct evlist *evlist,
>  					 struct report *rep,
>  					 const char *help)
> @@ -500,6 +900,12 @@ static int perf_evlist__tty_browse_hists(struct evlist *evlist,
>  			continue;
>  
>  		hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
> +
> +		if (rep->total_cycles) {
> +			hists__fprintf_all_blocks(hists, rep);
> +			continue;
> +		}
> +
>  		hists__fprintf(hists, !quiet, 0, 0, rep->min_percent, stdout,
>  			       !(symbol_conf.use_callchain ||
>  			         symbol_conf.show_branchflag_count));
> @@ -1373,6 +1779,15 @@ int cmd_report(int argc, const char **argv)
>  		goto error;
>  	}
>  
> +	if (sort_order && strstr(sort_order, "total_cycles") &&
> +	    (sort__mode == SORT_MODE__BRANCH)) {
> +		report.total_cycles = true;
> +		if (!report.use_stdio) {
> +			pr_err("Error: -s total_cycles can be only used together with --stdio\n");
> +			goto error;
> +		}
> +	}
> +
>  	if (strcmp(input_name, "-") != 0)
>  		setup_browser(true);
>  	else
> @@ -1423,7 +1838,7 @@ int cmd_report(int argc, const char **argv)
>  	 * so don't allocate extra space that won't be used in the stdio
>  	 * implementation.
>  	 */
> -	if (ui__has_annotation() || report.symbol_ipc) {
> +	if (ui__has_annotation() || report.symbol_ipc || report.total_cycles) {
>  		ret = symbol__annotation_init();
>  		if (ret < 0)
>  			goto error;
> diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
> index 5365606e9dad..655ef7708cd0 100644
> --- a/tools/perf/ui/stdio/hist.c
> +++ b/tools/perf/ui/stdio/hist.c
> @@ -558,6 +558,25 @@ static int hist_entry__block_fprintf(struct hist_entry *he,
>  	return ret;
>  }
>  
> +static int hist_entry__individual_block_fprintf(struct hist_entry *he,
> +						char *bf, size_t size,
> +						FILE *fp)
> +{
> +	int ret = 0;
> +
> +	struct perf_hpp hpp = {
> +		.buf		= bf,
> +		.size		= size,
> +		.skip		= false,
> +	};
> +
> +	hist_entry__snprintf(he, &hpp);
> +	if (!hpp.skip)
> +		ret += fprintf(fp, "%s\n", bf);
> +
> +	return ret;
> +}
> +
>  static int hist_entry__fprintf(struct hist_entry *he, size_t size,
>  			       char *bf, size_t bfsz, FILE *fp,
>  			       bool ignore_callchains)
> @@ -580,6 +599,9 @@ static int hist_entry__fprintf(struct hist_entry *he, size_t size,
>  	if (symbol_conf.report_block)
>  		return hist_entry__block_fprintf(he, bf, size, fp);
>  
> +	if (symbol_conf.report_individual_block)
> +		return hist_entry__individual_block_fprintf(he, bf, size, fp);
> +
>  	hist_entry__snprintf(he, &hpp);
>  
>  	ret = fprintf(fp, "%s\n", bf);
> diff --git a/tools/perf/util/block.h b/tools/perf/util/block.h
> index b6730204d0b9..bdd21354d26e 100644
> --- a/tools/perf/util/block.h
> +++ b/tools/perf/util/block.h
> @@ -13,6 +13,7 @@ struct block_info {
>  	u64			end;
>  	u64			cycles;
>  	u64			cycles_aggr;
> +	u64			total_cycles;
>  	s64			cycles_spark[NUM_SPARKS];
>  	int			num;
>  	int			num_aggr;
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> index af65ce950ba2..521f7185a94f 100644
> --- a/tools/perf/util/hist.c
> +++ b/tools/perf/util/hist.c
> @@ -756,6 +756,10 @@ struct hist_entry *hists__add_entry_block(struct hists *hists,
>  	struct hist_entry entry = {
>  		.block_info = block_info,
>  		.hists = hists,
> +		.ms = {
> +			.map = al->map,
> +			.sym = al->sym,
> +		},
>  	}, *he = hists__findnew_entry(hists, &entry, al, false);
>  
>  	return he;
> diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> index 43d1d410854a..eb286700a8a9 100644
> --- a/tools/perf/util/sort.c
> +++ b/tools/perf/util/sort.c
> @@ -492,6 +492,10 @@ struct sort_entry sort_sym_ipc_null = {
>  	.se_width_idx	= HISTC_SYMBOL_IPC,
>  };
>  
> +struct sort_entry sort_block_cycles = {
> +	.se_cmp		= sort__sym_cmp,
> +};
> +
>  /* --sort srcfile */
>  
>  static char no_srcfile[1];
> @@ -1695,6 +1699,7 @@ static struct sort_dimension bstack_sort_dimensions[] = {
>  	DIM(SORT_SRCLINE_FROM, "srcline_from", sort_srcline_from),
>  	DIM(SORT_SRCLINE_TO, "srcline_to", sort_srcline_to),
>  	DIM(SORT_SYM_IPC, "ipc_lbr", sort_sym_ipc),
> +	DIM(SORT_BLOCK_CYCLES, "total_cycles", sort_block_cycles),
>  };
>  
>  #undef DIM
> diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
> index 5aff9542d9b7..2ede6c70ad56 100644
> --- a/tools/perf/util/sort.h
> +++ b/tools/perf/util/sort.h
> @@ -239,6 +239,7 @@ enum sort_type {
>  	SORT_SRCLINE_FROM,
>  	SORT_SRCLINE_TO,
>  	SORT_SYM_IPC,
> +	SORT_BLOCK_CYCLES,
>  
>  	/* memory mode specific sort keys */
>  	__SORT_MEMORY_MODE,
> diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
> index e6880789864c..10f1ec3e0349 100644
> --- a/tools/perf/util/symbol_conf.h
> +++ b/tools/perf/util/symbol_conf.h
> @@ -40,6 +40,7 @@ struct symbol_conf {
>  			raw_trace,
>  			report_hierarchy,
>  			report_block,
> +			report_individual_block,
>  			inline_name,
>  			disable_add2line_warn;
>  	const char	*vmlinux_name,
> -- 
> 2.17.1
> {


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/5] perf report: Sort by sampled cycles percent per block for stdio
  2019-10-15  5:33 ` [PATCH v2 3/5] perf report: Sort by sampled cycles percent per block for stdio Jin Yao
  2019-10-15  8:41   ` Jiri Olsa
  2019-10-21 16:07   ` Jiri Olsa
@ 2019-10-21 16:08   ` Jiri Olsa
  2019-10-22  1:04     ` Jin, Yao
  2 siblings, 1 reply; 20+ messages in thread
From: Jiri Olsa @ 2019-10-21 16:08 UTC (permalink / raw)
  To: Jin Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Tue, Oct 15, 2019 at 01:33:48PM +0800, Jin Yao wrote:

SNIP

> +			cycles += bi->cycles_aggr / bi->num_aggr;
> +
> +			he_block = hists__add_entry_block(&bh->block_hists,
> +							  &al, bi);
> +			if (!he_block) {
> +				block_info__put(bi);
> +				return -1;
> +			}
> +		}
> +	}
> +
> +	if (block_cycles)
> +		*block_cycles += cycles;
> +
> +	return 0;
> +}
> +
> +static int resort_cb(struct hist_entry *he, void *arg __maybe_unused)
> +{
> +	/* Skip the calculation of column length in output_resort */
> +	he->filtered = true;

that's a nasty hack ;-) together with setting it back to false just below

why do you want to skip the columns calculation? we could add those columns
to the output as well no?

jirka

> +	return 0;
> +}
> +
> +static void hists__clear_filtered(struct hists *hists)
> +{
> +	struct rb_node *next = rb_first_cached(&hists->entries);
> +	struct hist_entry *he;
> +
> +	while (next) {
> +		he = rb_entry(next, struct hist_entry, rb_node);
> +		he->filtered = false;
> +		next = rb_next(&he->rb_node);
> +	}
> +}

SNIP

jirka


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/5] perf util: Create new block.h/block.c for block related functions
  2019-10-15  5:33 ` [PATCH v2 1/5] perf util: Create new block.h/block.c for block related functions Jin Yao
@ 2019-10-21 16:08   ` Jiri Olsa
  0 siblings, 0 replies; 20+ messages in thread
From: Jiri Olsa @ 2019-10-21 16:08 UTC (permalink / raw)
  To: Jin Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Tue, Oct 15, 2019 at 01:33:46PM +0800, Jin Yao wrote:

SNIP

> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
> index 39814b1806a6..3121c0055950 100644
> --- a/tools/perf/util/Build
> +++ b/tools/perf/util/Build
> @@ -1,4 +1,5 @@
>  perf-y += annotate.o
> +perf-y += block.o
>  perf-y += block-range.o
>  perf-y += build-id.o
>  perf-y += cacheline.o
> diff --git a/tools/perf/util/block.c b/tools/perf/util/block.c
> new file mode 100644
> index 000000000000..e5e6f941f040
> --- /dev/null
> +++ b/tools/perf/util/block.c

please use block_info.c, block.c is misleading wrt what
we are doing here..

jirka


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/5] perf report: Sort by sampled cycles percent per block for stdio
  2019-10-21 16:07   ` Jiri Olsa
@ 2019-10-22  0:57     ` Jin, Yao
  0 siblings, 0 replies; 20+ messages in thread
From: Jin, Yao @ 2019-10-22  0:57 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin



On 10/22/2019 12:07 AM, Jiri Olsa wrote:
> On Tue, Oct 15, 2019 at 01:33:48PM +0800, Jin Yao wrote:
> 
> SNIP
> 
>> --- a/tools/perf/builtin-report.c
>> +++ b/tools/perf/builtin-report.c
>> @@ -51,6 +51,7 @@
>>   #include "util/util.h" // perf_tip()
>>   #include "ui/ui.h"
>>   #include "ui/progress.h"
>> +#include "util/block.h"
>>   
>>   #include <dlfcn.h>
>>   #include <errno.h>
>> @@ -96,10 +97,64 @@ struct report {
>>   	float			min_percent;
>>   	u64			nr_entries;
>>   	u64			queue_size;
>> +	u64			cycles_count;
>> +	u64			block_cycles;
>>   	int			socket_filter;
>>   	DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
>>   	struct branch_type_stat	brtype_stat;
>>   	bool			symbol_ipc;
>> +	bool			total_cycles;
>> +	struct block_hist	block_hist;
>> +};
>> +
> 
> please put all this (everything that has *block* in the name ;-) )
> to the block_info.[ch] and try to reuse some of the functions in
> builtin-diff.c, looks like there's some duplicity like for
> init_block_hist init_block_info
> 
> I understand that report and diff interface would be different
> at some point, but there seems to be many common parts
> 
> jirka
> 

OK, got it. I will try to reuse the common parts.

Thanks
Jin Yao

>> +struct block_fmt {
>> +	struct perf_hpp_fmt	fmt;
>> +	int			idx;
>> +	int			width;
>> +	const char		*header;
>> +	struct report		*rep;
>> +};
>> +
>> +enum {
>> +	PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV,
>> +	PERF_HPP_REPORT__BLOCK_LBR_CYCLES,
>> +	PERF_HPP_REPORT__BLOCK_CYCLES_PCT,
>> +	PERF_HPP_REPORT__BLOCK_AVG_CYCLES,
>> +	PERF_HPP_REPORT__BLOCK_RANGE,
>> +	PERF_HPP_REPORT__BLOCK_DSO,
>> +	PERF_HPP_REPORT__BLOCK_MAX_INDEX
>> +};
>> +
>> +static struct block_fmt block_fmts[PERF_HPP_REPORT__BLOCK_MAX_INDEX];
>> +
>> +static struct block_header_column{
>> +	const char *name;
>> +	int width;
>> +} block_columns[PERF_HPP_REPORT__BLOCK_MAX_INDEX] = {
>> +	[PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV] = {
>> +		.name = "Sampled Cycles%",
>> +		.width = 15,
>> +	},
>> +	[PERF_HPP_REPORT__BLOCK_LBR_CYCLES] = {
>> +		.name = "Sampled Cycles",
>> +		.width = 14,
>> +	},
>> +	[PERF_HPP_REPORT__BLOCK_CYCLES_PCT] = {
>> +		.name = "Avg Cycles%",
>> +		.width = 11,
>> +	},
>> +	[PERF_HPP_REPORT__BLOCK_AVG_CYCLES] = {
>> +		.name = "Avg Cycles",
>> +		.width = 10,
>> +	},
>> +	[PERF_HPP_REPORT__BLOCK_RANGE] = {
>> +		.name = "[Program Block Range]",
>> +		.width = 70,
>> +	},
>> +	[PERF_HPP_REPORT__BLOCK_DSO] = {
>> +		.name = "Shared Object",
>> +		.width = 20,
>> +	}
>>   };
>>   
>>   static int report__config(const char *var, const char *value, void *cb)
>> @@ -277,7 +332,8 @@ static int process_sample_event(struct perf_tool *tool,
>>   		if (!sample->branch_stack)
>>   			goto out_put;
>>   
>> -		iter.add_entry_cb = hist_iter__branch_callback;
>> +		if (!rep->total_cycles)
>> +			iter.add_entry_cb = hist_iter__branch_callback;
>>   		iter.ops = &hist_iter_branch;
>>   	} else if (rep->mem_mode) {
>>   		iter.ops = &hist_iter_mem;
>> @@ -290,9 +346,10 @@ static int process_sample_event(struct perf_tool *tool,
>>   	if (al.map != NULL)
>>   		al.map->dso->hit = 1;
>>   
>> -	if (ui__has_annotation() || rep->symbol_ipc) {
>> +	if (ui__has_annotation() || rep->symbol_ipc || rep->total_cycles) {
>>   		hist__account_cycles(sample->branch_stack, &al, sample,
>> -				     rep->nonany_branch_mode, NULL);
>> +				     rep->nonany_branch_mode,
>> +				     &rep->cycles_count);
>>   	}
>>   
>>   	ret = hist_entry_iter__add(&iter, &al, rep->max_stack, rep);
>> @@ -480,6 +537,349 @@ static size_t hists__fprintf_nr_sample_events(struct hists *hists, struct report
>>   	return ret + fprintf(fp, "\n#\n");
>>   }
>>   
>> +static int block_column_header(struct perf_hpp_fmt *fmt __maybe_unused,
>> +			       struct perf_hpp *hpp __maybe_unused,
>> +			       struct hists *hists __maybe_unused,
>> +			       int line __maybe_unused,
>> +			       int *span __maybe_unused)
>> +{
>> +	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
>> +
>> +	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
>> +			 block_fmt->header);
>> +}
>> +
>> +static int block_column_width(struct perf_hpp_fmt *fmt __maybe_unused,
>> +			      struct perf_hpp *hpp __maybe_unused,
>> +			      struct hists *hists __maybe_unused)
>> +{
>> +	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
>> +
>> +	return block_fmt->width;
>> +}
>> +
>> +static int block_cycles_cov_entry(struct perf_hpp_fmt *fmt,
>> +				  struct perf_hpp *hpp, struct hist_entry *he)
>> +{
>> +	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
>> +	struct report *rep = block_fmt->rep;
>> +	struct block_info *bi = he->block_info;
>> +	double ratio = 0.0;
>> +	char buf[16];
>> +
>> +	if (rep->cycles_count)
>> +		ratio = (double)bi->cycles / (double)rep->cycles_count;
>> +
>> +	sprintf(buf, "%.2f%%", 100.0 * ratio);
>> +
>> +	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, buf);
>> +}
>> +
>> +static int64_t block_cycles_cov_sort(struct perf_hpp_fmt *fmt,
>> +				     struct hist_entry *left,
>> +				     struct hist_entry *right)
>> +{
>> +	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
>> +	struct report *rep = block_fmt->rep;
>> +	struct block_info *bi_l = left->block_info;
>> +	struct block_info *bi_r = right->block_info;
>> +	double l, r;
>> +
>> +	if (rep->cycles_count) {
>> +		l = ((double)bi_l->cycles / (double)rep->cycles_count) * 1000.0;
>> +		r = ((double)bi_r->cycles / (double)rep->cycles_count) * 1000.0;
>> +		return (int64_t)l - (int64_t)r;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static void cycles_string(u64 cycles, char *buf, int size)
>> +{
>> +	if (cycles >= 1000000)
>> +		scnprintf(buf, size, "%.1fM", (double)cycles / 1000000.0);
>> +	else if (cycles >= 1000)
>> +		scnprintf(buf, size, "%.1fK", (double)cycles / 1000.0);
>> +	else
>> +		scnprintf(buf, size, "%1d", cycles);
>> +}
>> +
>> +static int block_cycles_lbr_entry(struct perf_hpp_fmt *fmt,
>> +				  struct perf_hpp *hpp, struct hist_entry *he)
>> +{
>> +	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
>> +	struct block_info *bi = he->block_info;
>> +	char cycles_buf[16];
>> +
>> +	cycles_string(bi->cycles_aggr, cycles_buf, sizeof(cycles_buf));
>> +
>> +	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
>> +			 cycles_buf);
>> +}
>> +
>> +static int block_cycles_pct_entry(struct perf_hpp_fmt *fmt,
>> +				  struct perf_hpp *hpp, struct hist_entry *he)
>> +{
>> +	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
>> +	struct report *rep = block_fmt->rep;
>> +	struct block_info *bi = he->block_info;
>> +	double ratio = 0.0;
>> +	u64 avg;
>> +	char buf[16];
>> +
>> +	if (rep->block_cycles && bi->num_aggr) {
>> +		avg = bi->cycles_aggr / bi->num_aggr;
>> +		ratio = (double)avg / (double)rep->block_cycles;
>> +	}
>> +
>> +	sprintf(buf, "%.2f%%", 100.0 * ratio);
>> +
>> +	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, buf);
>> +}
>> +
>> +static int block_avg_cycles_entry(struct perf_hpp_fmt *fmt,
>> +				  struct perf_hpp *hpp,
>> +				  struct hist_entry *he)
>> +{
>> +	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
>> +	struct block_info *bi = he->block_info;
>> +	char cycles_buf[16];
>> +
>> +	cycles_string(bi->cycles_aggr / bi->num_aggr, cycles_buf,
>> +		      sizeof(cycles_buf));
>> +
>> +	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
>> +			 cycles_buf);
>> +}
>> +
>> +static int block_range_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
>> +			     struct hist_entry *he)
>> +{
>> +	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
>> +	struct block_info *bi = he->block_info;
>> +	char buf[128];
>> +	char *start_line, *end_line;
>> +
>> +	symbol_conf.disable_add2line_warn = true;
>> +
>> +	start_line = map__srcline(he->ms.map, bi->sym->start + bi->start,
>> +				  he->ms.sym);
>> +
>> +	end_line = map__srcline(he->ms.map, bi->sym->start + bi->end,
>> +				he->ms.sym);
>> +
>> +	if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) {
>> +		scnprintf(buf, sizeof(buf), "[%s -> %s]",
>> +			  start_line, end_line);
>> +	} else {
>> +		scnprintf(buf, sizeof(buf), "[%7lx -> %7lx]",
>> +			  bi->start, bi->end);
>> +	}
>> +
>> +	free_srcline(start_line);
>> +	free_srcline(end_line);
>> +
>> +	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, buf);
>> +}
>> +
>> +static int block_dso_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
>> +			   struct hist_entry *he)
>> +{
>> +	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
>> +	struct map *map = he->ms.map;
>> +
>> +	if (map && map->dso) {
>> +		return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
>> +				 map->dso->short_name);
>> +	}
>> +
>> +	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
>> +			 "[unknown]");
>> +}
>> +
>> +static void init_block_header(struct block_fmt *block_fmt)
>> +{
>> +	struct perf_hpp_fmt *fmt = &block_fmt->fmt;
>> +
>> +	BUG_ON(block_fmt->idx >= PERF_HPP_REPORT__BLOCK_MAX_INDEX);
>> +
>> +	block_fmt->header = block_columns[block_fmt->idx].name;
>> +	block_fmt->width = block_columns[block_fmt->idx].width;
>> +
>> +	fmt->header = block_column_header;
>> +	fmt->width = block_column_width;
>> +}
>> +
>> +static void block_hpp_register(struct block_fmt *block_fmt, int idx,
>> +			       struct perf_hpp_list *hpp_list,
>> +			       struct report *rep)
>> +{
>> +	struct perf_hpp_fmt *fmt = &block_fmt->fmt;
>> +
>> +	block_fmt->rep = rep;
>> +	block_fmt->idx = idx;
>> +	INIT_LIST_HEAD(&fmt->list);
>> +	INIT_LIST_HEAD(&fmt->sort_list);
>> +
>> +	switch (idx) {
>> +	case PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV:
>> +		fmt->entry = block_cycles_cov_entry;
>> +		fmt->cmp = block_info__cmp;
>> +		fmt->sort = block_cycles_cov_sort;
>> +		break;
>> +	case PERF_HPP_REPORT__BLOCK_LBR_CYCLES:
>> +		fmt->entry = block_cycles_lbr_entry;
>> +		break;
>> +	case PERF_HPP_REPORT__BLOCK_CYCLES_PCT:
>> +		fmt->entry = block_cycles_pct_entry;
>> +		break;
>> +	case PERF_HPP_REPORT__BLOCK_AVG_CYCLES:
>> +		fmt->entry = block_avg_cycles_entry;
>> +		break;
>> +	case PERF_HPP_REPORT__BLOCK_RANGE:
>> +		fmt->entry = block_range_entry;
>> +		break;
>> +	case PERF_HPP_REPORT__BLOCK_DSO:
>> +		fmt->entry = block_dso_entry;
>> +		break;
>> +	default:
>> +		return;
>> +	}
>> +
>> +	init_block_header(block_fmt);
>> +	perf_hpp_list__column_register(hpp_list, fmt);
>> +}
>> +
>> +static void register_block_columns(struct perf_hpp_list *hpp_list,
>> +				   struct report *rep)
>> +{
>> +	for (int i = 0; i < PERF_HPP_REPORT__BLOCK_MAX_INDEX; i++)
>> +		block_hpp_register(&block_fmts[i], i, hpp_list, rep);
>> +}
>> +
>> +static void init_block_hist(struct block_hist *bh, struct report *rep)
>> +{
>> +	__hists__init(&bh->block_hists, &bh->block_list);
>> +	perf_hpp_list__init(&bh->block_list);
>> +	bh->block_list.nr_header_lines = 1;
>> +
>> +	register_block_columns(&bh->block_list, rep);
>> +
>> +	perf_hpp_list__register_sort_field(&bh->block_list,
>> +		&block_fmts[PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV].fmt);
>> +}
>> +
>> +static void init_block_info(struct block_info *bi, struct symbol *sym,
>> +			    struct cyc_hist *ch, int offset,
>> +			    u64 total_cycles)
>> +{
>> +	bi->sym = sym;
>> +	bi->start = ch->start;
>> +	bi->end = offset;
>> +	bi->cycles = ch->cycles;
>> +	bi->cycles_aggr = ch->cycles_aggr;
>> +	bi->num = ch->num;
>> +	bi->num_aggr = ch->num_aggr;
>> +	bi->total_cycles = total_cycles;
>> +}
>> +
>> +static int add_block_per_sym(struct hist_entry *he, struct block_hist *bh,
>> +			     u64 *block_cycles, u64 total_cycles)
>> +{
>> +	struct annotation *notes;
>> +	struct cyc_hist *ch;
>> +	static struct addr_location al;
>> +	u64 cycles = 0;
>> +
>> +	if (!he->ms.map || !he->ms.sym)
>> +		return 0;
>> +
>> +	memset(&al, 0, sizeof(al));
>> +	al.map = he->ms.map;
>> +	al.sym = he->ms.sym;
>> +
>> +	notes = symbol__annotation(he->ms.sym);
>> +	if (!notes || !notes->src || !notes->src->cycles_hist)
>> +		return 0;
>> +	ch = notes->src->cycles_hist;
>> +	for (unsigned int i = 0; i < symbol__size(he->ms.sym); i++) {
>> +		if (ch[i].num_aggr) {
>> +			struct block_info *bi;
>> +			struct hist_entry *he_block;
>> +
>> +			bi = block_info__new();
>> +			if (!bi)
>> +				return -1;
>> +
>> +			init_block_info(bi, he->ms.sym, &ch[i], i,
>> +					total_cycles);
>> +			cycles += bi->cycles_aggr / bi->num_aggr;
>> +
>> +			he_block = hists__add_entry_block(&bh->block_hists,
>> +							  &al, bi);
>> +			if (!he_block) {
>> +				block_info__put(bi);
>> +				return -1;
>> +			}
>> +		}
>> +	}
>> +
>> +	if (block_cycles)
>> +		*block_cycles += cycles;
>> +
>> +	return 0;
>> +}
>> +
>> +static int resort_cb(struct hist_entry *he, void *arg __maybe_unused)
>> +{
>> +	/* Skip the calculation of column length in output_resort */
>> +	he->filtered = true;
>> +	return 0;
>> +}
>> +
>> +static void hists__clear_filtered(struct hists *hists)
>> +{
>> +	struct rb_node *next = rb_first_cached(&hists->entries);
>> +	struct hist_entry *he;
>> +
>> +	while (next) {
>> +		he = rb_entry(next, struct hist_entry, rb_node);
>> +		he->filtered = false;
>> +		next = rb_next(&he->rb_node);
>> +	}
>> +}
>> +
>> +static void get_block_hists(struct hists *hists, struct block_hist *bh,
>> +			    struct report *rep)
>> +{
>> +	struct rb_node *next = rb_first_cached(&hists->entries);
>> +	struct hist_entry *he;
>> +
>> +	init_block_hist(bh, rep);
>> +
>> +	while (next) {
>> +		he = rb_entry(next, struct hist_entry, rb_node);
>> +		add_block_per_sym(he, bh, &rep->block_cycles,
>> +				  rep->cycles_count);
>> +		next = rb_next(&he->rb_node);
>> +	}
>> +
>> +	hists__output_resort_cb(&bh->block_hists, NULL, resort_cb);
>> +	hists__clear_filtered(&bh->block_hists);
>> +}
>> +
>> +static int hists__fprintf_all_blocks(struct hists *hists, struct report *rep)
>> +{
>> +	struct block_hist *bh = &rep->block_hist;
>> +
>> +	get_block_hists(hists, bh, rep);
>> +	symbol_conf.report_individual_block = true;
>> +	hists__fprintf(&bh->block_hists, true, 0, 0, 0,
>> +		       stdout, true);
>> +	hists__delete_entries(&bh->block_hists);
>> +	return 0;
>> +}
>> +
>>   static int perf_evlist__tty_browse_hists(struct evlist *evlist,
>>   					 struct report *rep,
>>   					 const char *help)
>> @@ -500,6 +900,12 @@ static int perf_evlist__tty_browse_hists(struct evlist *evlist,
>>   			continue;
>>   
>>   		hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
>> +
>> +		if (rep->total_cycles) {
>> +			hists__fprintf_all_blocks(hists, rep);
>> +			continue;
>> +		}
>> +
>>   		hists__fprintf(hists, !quiet, 0, 0, rep->min_percent, stdout,
>>   			       !(symbol_conf.use_callchain ||
>>   			         symbol_conf.show_branchflag_count));
>> @@ -1373,6 +1779,15 @@ int cmd_report(int argc, const char **argv)
>>   		goto error;
>>   	}
>>   
>> +	if (sort_order && strstr(sort_order, "total_cycles") &&
>> +	    (sort__mode == SORT_MODE__BRANCH)) {
>> +		report.total_cycles = true;
>> +		if (!report.use_stdio) {
>> +			pr_err("Error: -s total_cycles can be only used together with --stdio\n");
>> +			goto error;
>> +		}
>> +	}
>> +
>>   	if (strcmp(input_name, "-") != 0)
>>   		setup_browser(true);
>>   	else
>> @@ -1423,7 +1838,7 @@ int cmd_report(int argc, const char **argv)
>>   	 * so don't allocate extra space that won't be used in the stdio
>>   	 * implementation.
>>   	 */
>> -	if (ui__has_annotation() || report.symbol_ipc) {
>> +	if (ui__has_annotation() || report.symbol_ipc || report.total_cycles) {
>>   		ret = symbol__annotation_init();
>>   		if (ret < 0)
>>   			goto error;
>> diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
>> index 5365606e9dad..655ef7708cd0 100644
>> --- a/tools/perf/ui/stdio/hist.c
>> +++ b/tools/perf/ui/stdio/hist.c
>> @@ -558,6 +558,25 @@ static int hist_entry__block_fprintf(struct hist_entry *he,
>>   	return ret;
>>   }
>>   
>> +static int hist_entry__individual_block_fprintf(struct hist_entry *he,
>> +						char *bf, size_t size,
>> +						FILE *fp)
>> +{
>> +	int ret = 0;
>> +
>> +	struct perf_hpp hpp = {
>> +		.buf		= bf,
>> +		.size		= size,
>> +		.skip		= false,
>> +	};
>> +
>> +	hist_entry__snprintf(he, &hpp);
>> +	if (!hpp.skip)
>> +		ret += fprintf(fp, "%s\n", bf);
>> +
>> +	return ret;
>> +}
>> +
>>   static int hist_entry__fprintf(struct hist_entry *he, size_t size,
>>   			       char *bf, size_t bfsz, FILE *fp,
>>   			       bool ignore_callchains)
>> @@ -580,6 +599,9 @@ static int hist_entry__fprintf(struct hist_entry *he, size_t size,
>>   	if (symbol_conf.report_block)
>>   		return hist_entry__block_fprintf(he, bf, size, fp);
>>   
>> +	if (symbol_conf.report_individual_block)
>> +		return hist_entry__individual_block_fprintf(he, bf, size, fp);
>> +
>>   	hist_entry__snprintf(he, &hpp);
>>   
>>   	ret = fprintf(fp, "%s\n", bf);
>> diff --git a/tools/perf/util/block.h b/tools/perf/util/block.h
>> index b6730204d0b9..bdd21354d26e 100644
>> --- a/tools/perf/util/block.h
>> +++ b/tools/perf/util/block.h
>> @@ -13,6 +13,7 @@ struct block_info {
>>   	u64			end;
>>   	u64			cycles;
>>   	u64			cycles_aggr;
>> +	u64			total_cycles;
>>   	s64			cycles_spark[NUM_SPARKS];
>>   	int			num;
>>   	int			num_aggr;
>> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
>> index af65ce950ba2..521f7185a94f 100644
>> --- a/tools/perf/util/hist.c
>> +++ b/tools/perf/util/hist.c
>> @@ -756,6 +756,10 @@ struct hist_entry *hists__add_entry_block(struct hists *hists,
>>   	struct hist_entry entry = {
>>   		.block_info = block_info,
>>   		.hists = hists,
>> +		.ms = {
>> +			.map = al->map,
>> +			.sym = al->sym,
>> +		},
>>   	}, *he = hists__findnew_entry(hists, &entry, al, false);
>>   
>>   	return he;
>> diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
>> index 43d1d410854a..eb286700a8a9 100644
>> --- a/tools/perf/util/sort.c
>> +++ b/tools/perf/util/sort.c
>> @@ -492,6 +492,10 @@ struct sort_entry sort_sym_ipc_null = {
>>   	.se_width_idx	= HISTC_SYMBOL_IPC,
>>   };
>>   
>> +struct sort_entry sort_block_cycles = {
>> +	.se_cmp		= sort__sym_cmp,
>> +};
>> +
>>   /* --sort srcfile */
>>   
>>   static char no_srcfile[1];
>> @@ -1695,6 +1699,7 @@ static struct sort_dimension bstack_sort_dimensions[] = {
>>   	DIM(SORT_SRCLINE_FROM, "srcline_from", sort_srcline_from),
>>   	DIM(SORT_SRCLINE_TO, "srcline_to", sort_srcline_to),
>>   	DIM(SORT_SYM_IPC, "ipc_lbr", sort_sym_ipc),
>> +	DIM(SORT_BLOCK_CYCLES, "total_cycles", sort_block_cycles),
>>   };
>>   
>>   #undef DIM
>> diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
>> index 5aff9542d9b7..2ede6c70ad56 100644
>> --- a/tools/perf/util/sort.h
>> +++ b/tools/perf/util/sort.h
>> @@ -239,6 +239,7 @@ enum sort_type {
>>   	SORT_SRCLINE_FROM,
>>   	SORT_SRCLINE_TO,
>>   	SORT_SYM_IPC,
>> +	SORT_BLOCK_CYCLES,
>>   
>>   	/* memory mode specific sort keys */
>>   	__SORT_MEMORY_MODE,
>> diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
>> index e6880789864c..10f1ec3e0349 100644
>> --- a/tools/perf/util/symbol_conf.h
>> +++ b/tools/perf/util/symbol_conf.h
>> @@ -40,6 +40,7 @@ struct symbol_conf {
>>   			raw_trace,
>>   			report_hierarchy,
>>   			report_block,
>> +			report_individual_block,
>>   			inline_name,
>>   			disable_add2line_warn;
>>   	const char	*vmlinux_name,
>> -- 
>> 2.17.1
>> {
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/5] perf report: Sort by sampled cycles percent per block for stdio
  2019-10-21 16:08   ` Jiri Olsa
@ 2019-10-22  1:04     ` Jin, Yao
  0 siblings, 0 replies; 20+ messages in thread
From: Jin, Yao @ 2019-10-22  1:04 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin



On 10/22/2019 12:08 AM, Jiri Olsa wrote:
> On Tue, Oct 15, 2019 at 01:33:48PM +0800, Jin Yao wrote:
> 
> SNIP
> 
>> +			cycles += bi->cycles_aggr / bi->num_aggr;
>> +
>> +			he_block = hists__add_entry_block(&bh->block_hists,
>> +							  &al, bi);
>> +			if (!he_block) {
>> +				block_info__put(bi);
>> +				return -1;
>> +			}
>> +		}
>> +	}
>> +
>> +	if (block_cycles)
>> +		*block_cycles += cycles;
>> +
>> +	return 0;
>> +}
>> +
>> +static int resort_cb(struct hist_entry *he, void *arg __maybe_unused)
>> +{
>> +	/* Skip the calculation of column length in output_resort */
>> +	he->filtered = true;
> 
> that's a nasty hack ;-) together with setting it back to false just below
> 
> why do you want to skip the columns calculation? we could add those columns
> to the output as well no?
> 
> jirka
> 

The columns calculation for this case causes the crash. The current 
columns calculation requires some information but we don't provide. So I 
just want to skip the columns calculation.

OK, I will check how to avoid this nasty hack.

Thanks
Jin Yao

>> +	return 0;
>> +}
>> +
>> +static void hists__clear_filtered(struct hists *hists)
>> +{
>> +	struct rb_node *next = rb_first_cached(&hists->entries);
>> +	struct hist_entry *he;
>> +
>> +	while (next) {
>> +		he = rb_entry(next, struct hist_entry, rb_node);
>> +		he->filtered = false;
>> +		next = rb_next(&he->rb_node);
>> +	}
>> +}
> 
> SNIP
> 
> jirka
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2019-10-22  1:04 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-15  5:33 [PATCH v2 0/5] perf report: Support sorting all blocks by cycles Jin Yao
2019-10-15  5:33 ` [PATCH v2 1/5] perf util: Create new block.h/block.c for block related functions Jin Yao
2019-10-21 16:08   ` Jiri Olsa
2019-10-15  5:33 ` [PATCH v2 2/5] perf util: Count the total cycles of all samples Jin Yao
2019-10-15  5:33 ` [PATCH v2 3/5] perf report: Sort by sampled cycles percent per block for stdio Jin Yao
2019-10-15  8:41   ` Jiri Olsa
2019-10-15 14:53     ` Jin, Yao
2019-10-16 10:15       ` Jiri Olsa
2019-10-16 10:51         ` Jin, Yao
2019-10-16 12:53           ` Jiri Olsa
2019-10-16 13:09             ` Jin, Yao
2019-10-21  6:56             ` Jin, Yao
2019-10-21 14:04               ` Jiri Olsa
2019-10-21 16:07                 ` Jiri Olsa
2019-10-21 16:07   ` Jiri Olsa
2019-10-22  0:57     ` Jin, Yao
2019-10-21 16:08   ` Jiri Olsa
2019-10-22  1:04     ` Jin, Yao
2019-10-15  5:33 ` [PATCH v2 4/5] perf report: Support --percent-limit for total_cycles Jin Yao
2019-10-15  5:33 ` [PATCH v2 5/5] perf report: Sort by sampled cycles percent per block for tui Jin Yao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.