* [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods
@ 2012-09-13  7:19 Namhyung Kim
  2012-09-13  7:19 ` [PATCH 01/15] perf hists: Add missing period_* fields when collapsing a hist entry Namhyung Kim
                   ` (17 more replies)
  0 siblings, 18 replies; 30+ messages in thread
From: Namhyung Kim @ 2012-09-13  7:19 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, LKML,
	Frederic Weisbecker, Arun Sharma, David Ahern, Jiri Olsa

Hi,

This is my first attempt at implementing a cumulative hist period
report.  The work started from Arun's SORT_INCLUSIVE patch [1], but I
rewrote it completely from scratch.

It basically adds the period of a sample to every node in its
callchain.  A hist_entry now has an additional field to keep the
cumulative period when the --cumulate option is given to perf report.
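
To illustrate the arithmetic: each sample's period is charged to the
sampled function's own entry as usual, and additionally to an
accumulated counter of every caller on its callchain.  A toy sketch
with hypothetical types (not the actual perf code):

  #include <stdint.h>

  struct entry {
  	const char *sym;
  	uint64_t period;	/* self period, as before */
  	uint64_t period_acc;	/* accumulated period, new with --cumulate */
  };

  /* chain[0] is the sampled function, chain[1..nr-1] are its callers */
  static void account_sample(struct entry *chain[], int nr, uint64_t period)
  {
  	int i;

  	chain[0]->period += period;		/* only the leaf gets self period */
  	for (i = 0; i < nr; i++)
  		chain[i]->period_acc += period;	/* every node accumulates it */
  }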

Let me show you an example:

  $ cat abc.c
  #define barrier() asm volatile("" ::: "memory")
  
  void a(void)
  {
  	int i;
  
  	for (i = 0; i < 1000000; i++)
  		barrier();
  }
  
  void b(void)
  {
  	a();
  }
  
  void c(void)
  {
  	b();
  }
  
  int main(void)
  {
  	c();
  
  	return 0;
  }
  
With this simple program I ran perf record and report:

  $ perf record -g -e cycles:u ./abc
  $ perf report -g none --stdio
  [snip]
  # Overhead  Command       Shared Object                      Symbol
  # ........  .......  ..................  ..........................
  #
      93.35%      abc  abc                 [.] a                     
       5.17%      abc  ld-2.15.so          [.] _dl_map_object_from_fd
       1.13%      abc  ld-2.15.so          [.] _dl_start             
       0.29%      abc  libpthread-2.15.so  [.] __libc_close          
       0.07%      abc  [kernel.kallsyms]   [k] page_fault            
       0.00%      abc  ld-2.15.so          [.] _start                
  
When the --cumulate option is given, it'll be shown like this:

   $ perf report --cumulate
   (...)
   +  93.63%  abc  libc-2.15.so        [.] __libc_start_main
   +  93.35%  abc  abc                 [.] main
   +  93.35%  abc  abc                 [.] c
   +  93.35%  abc  abc                 [.] b
   +  93.35%  abc  abc                 [.] a
   +   5.17%  abc  ld-2.15.so          [.] _dl_map_object
   +   5.17%  abc  ld-2.15.so          [.] _dl_map_object_from_fd
   +   1.13%  abc  ld-2.15.so          [.] _dl_start_user
   +   1.13%  abc  ld-2.15.so          [.] _dl_start
   +   0.29%  abc  perf                [.] main
   +   0.29%  abc  perf                [.] run_builtin
   +   0.29%  abc  perf                [.] cmd_record
   +   0.29%  abc  libpthread-2.15.so  [.] __libc_close
   +   0.07%  abc  ld-2.15.so          [.] _start
   +   0.07%  abc  [kernel.kallsyms]   [k] page_fault
   
(This output came from the TUI, since the stdio output gets cluttered
by callchains.)

As you can see, the __libc_start_main -> main -> c -> b -> a callchain
shows up in the output: main, c and b get the same 93.35% as a, since
every sample taken in a() is accumulated into each of its callers.

It might have some rough edges or even bugs, but I really wanted to
release it and get reviews.  In fact I saw some very large percentages
or 'inf' on some callchain nodes when expanding them.

It currently ignores samples that don't have symbol info when
accumulating periods along the callchain.  Otherwise the output became
strangely large, since every such node in the callchain would be added
to a single entry with a NULL dso/sym.  Simply ignoring them solved the
problem, and I couldn't come up with a better solution.

This patchset is based on current acme/perf/core + my small fixes [2],[3].
You can also get this series on my tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git  perf/cumulate-v1

Any comments are welcome, thanks.
Namhyung

[1] https://lkml.org/lkml/2012/3/31/6
[2] https://lkml.org/lkml/2012/9/11/546
[3] https://lkml.org/lkml/2012/9/12/51


Namhyung Kim (15):
  perf hists: Add missing period_* fields when collapsing a hist entry
  perf hists: Introduce struct he_stat
  perf hists: Move he->stat.nr_events initialization to a template
  perf hists: Convert hist entry functions to use struct he_stat
  perf hists: Add more helpers for hist entry stat
  perf hists: Add support for accumulated stat of hist entry
  perf hists: Check if accumulated when adding a hist entry
  perf callchain: Add a couple of callchain helpers
  perf hists: Let add_hist_entry to make a hist entry template
  perf hists: Accumulate hist entry stat based on the callchain
  perf hists: Sort hist entries by accumulated period
  perf ui/hist: Add support to accumulated hist stat
  perf ui/browser: Add support to accumulated hist stat
  perf ui/gtk: Add support to accumulated hist stat
  perf report: Add --cumulate option

 tools/perf/builtin-report.c    |   8 ++
 tools/perf/ui/browsers/hists.c |  12 +-
 tools/perf/ui/gtk/browser.c    |   5 +-
 tools/perf/ui/hist.c           |  74 ++++++++++---
 tools/perf/ui/stdio/hist.c     |   2 +-
 tools/perf/util/callchain.c    |  15 +++
 tools/perf/util/callchain.h    |  17 +++
 tools/perf/util/hist.c         | 242 +++++++++++++++++++++++++++++++++--------
 tools/perf/util/sort.h         |  17 ++-
 tools/perf/util/symbol.h       |   1 +
 10 files changed, 318 insertions(+), 75 deletions(-)

-- 
1.7.11.4



* [PATCH 01/15] perf hists: Add missing period_* fields when collapsing a hist entry
  2012-09-13  7:19 [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Namhyung Kim
@ 2012-09-13  7:19 ` Namhyung Kim
  2012-09-13  7:19 ` [PATCH 02/15] perf hists: Introduce struct he_stat Namhyung Kim
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Namhyung Kim @ 2012-09-13  7:19 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, LKML,
	Frederic Weisbecker, Arun Sharma, David Ahern, Jiri Olsa,
	Namhyung Kim

From: Namhyung Kim <namhyung.kim@lge.com>

So that perf report won't lose the cpu utilization information when
collapsing hist entries.

For example, when there are two processes that have the same name:

  $ perf report --stdio --showcpuutilization -s pid
  [SNIP]
  #   Overhead       sys        us  Command:  Pid
  #   ........  ........  ........  .............
  #
        55.12%     0.01%    55.10%  noploop:28781
        44.88%     0.06%    44.83%  noploop:28782

Before (the merged entry's sys/us breakdown is lost when collapsing,
so the columns don't add up to the total overhead):
  $ perf report --stdio --showcpuutilization -s comm
  [SNIP]
  #   Overhead       sys        us
  #   ........  ........  ........
  #
       100.00%     0.06%    44.83%

After:
  $ perf report --stdio --showcpuutilization -s comm
  [SNIP]
  #   Overhead       sys        us
  #   ........  ........  ........
  #
       100.00%     0.07%    99.93%

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/hist.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 6ec5398de89d..236bc9d98ff2 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -410,8 +410,13 @@ static bool hists__collapse_insert_entry(struct hists *hists __maybe_unused,
 		cmp = hist_entry__collapse(iter, he);
 
 		if (!cmp) {
-			iter->period += he->period;
-			iter->nr_events += he->nr_events;
+			iter->period		+= he->period;
+			iter->period_sys	+= he->period_sys;
+			iter->period_us		+= he->period_us;
+			iter->period_guest_sys	+= he->period_guest_sys;
+			iter->period_guest_us	+= he->period_guest_us;
+			iter->nr_events		+= he->nr_events;
+
 			if (symbol_conf.use_callchain) {
 				callchain_cursor_reset(&callchain_cursor);
 				callchain_merge(&callchain_cursor,
-- 
1.7.11.4



* [PATCH 02/15] perf hists: Introduce struct he_stat
  2012-09-13  7:19 [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Namhyung Kim
  2012-09-13  7:19 ` [PATCH 01/15] perf hists: Add missing period_* fields when collapsing a hist entry Namhyung Kim
@ 2012-09-13  7:19 ` Namhyung Kim
  2012-09-13  7:19 ` [PATCH 03/15] perf hists: Move he->stat.nr_events initialization to a template Namhyung Kim
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Namhyung Kim @ 2012-09-13  7:19 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, LKML,
	Frederic Weisbecker, Arun Sharma, David Ahern, Jiri Olsa,
	Namhyung Kim

From: Namhyung Kim <namhyung.kim@lge.com>

The struct he_stat separates out the statistics data of a hist entry.
It is required for later changes.

This is just a mechanical change and should introduce no functional
difference.

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/browsers/hists.c |  8 +++----
 tools/perf/ui/gtk/browser.c    |  2 +-
 tools/perf/ui/hist.c           | 32 +++++++++++++-------------
 tools/perf/ui/stdio/hist.c     |  2 +-
 tools/perf/util/hist.c         | 52 +++++++++++++++++++++++-------------------
 tools/perf/util/sort.h         | 16 ++++++++-----
 6 files changed, 60 insertions(+), 52 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index a21f40bebbac..b9b1b173637c 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -569,7 +569,7 @@ static int hist_browser__show_callchain(struct hist_browser *browser,
 static int hist_browser__hpp_color_ ## _name(struct perf_hpp *hpp,	\
 					     struct hist_entry *he)	\
 {									\
-	double percent = 100.0 * he->_field / hpp->total_period;	\
+	double percent = 100.0 * he->stat._field / hpp->total_period;	\
 	*(double *)hpp->ptr = percent;					\
 	return scnprintf(hpp->buf, hpp->size, "%6.2f%%", percent);	\
 }
@@ -982,7 +982,7 @@ static int hist_browser__fprintf_entry(struct hist_browser *browser,
 		folded_sign = hist_entry__folded(he);
 
 	hist_entry__sort_snprintf(he, s, sizeof(s), browser->hists);
-	percent = (he->period * 100.0) / browser->hists->stats.total_period;
+	percent = (he->stat.period * 100.0) / browser->hists->stats.total_period;
 
 	if (symbol_conf.use_callchain)
 		printed += fprintf(fp, "%c ", folded_sign);
@@ -990,10 +990,10 @@ static int hist_browser__fprintf_entry(struct hist_browser *browser,
 	printed += fprintf(fp, " %5.2f%%", percent);
 
 	if (symbol_conf.show_nr_samples)
-		printed += fprintf(fp, " %11u", he->nr_events);
+		printed += fprintf(fp, " %11u", he->stat.nr_events);
 
 	if (symbol_conf.show_total_period)
-		printed += fprintf(fp, " %12" PRIu64, he->period);
+		printed += fprintf(fp, " %12" PRIu64, he->stat.period);
 
 	printed += fprintf(fp, "%s\n", rtrim(s));
 
diff --git a/tools/perf/ui/gtk/browser.c b/tools/perf/ui/gtk/browser.c
index 7ff99ec1d95e..7107edc6c08d 100644
--- a/tools/perf/ui/gtk/browser.c
+++ b/tools/perf/ui/gtk/browser.c
@@ -49,7 +49,7 @@ static const char *perf_gtk__get_percent_color(double percent)
 static int perf_gtk__hpp_color_ ## _name(struct perf_hpp *hpp,			\
 					 struct hist_entry *he)			\
 {										\
-	double percent = 100.0 * he->_field / hpp->total_period;		\
+	double percent = 100.0 * he->stat._field / hpp->total_period;		\
 	const char *markup;							\
 	int ret = 0;								\
 										\
diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index e3f8cd46e7d7..d6ddeb10e678 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -20,12 +20,12 @@ static int hpp__width_overhead(struct perf_hpp *hpp __maybe_unused)
 
 static int hpp__color_overhead(struct perf_hpp *hpp, struct hist_entry *he)
 {
-	double percent = 100.0 * he->period / hpp->total_period;
+	double percent = 100.0 * he->stat.period / hpp->total_period;
 
 	if (hpp->ptr) {
 		struct hists *old_hists = hpp->ptr;
 		u64 total_period = old_hists->stats.total_period;
-		u64 base_period = he->pair ? he->pair->period : 0;
+		u64 base_period = he->pair ? he->pair->stat.period : 0;
 
 		if (total_period)
 			percent = 100.0 * base_period / total_period;
@@ -38,13 +38,13 @@ static int hpp__color_overhead(struct perf_hpp *hpp, struct hist_entry *he)
 
 static int hpp__entry_overhead(struct perf_hpp *hpp, struct hist_entry *he)
 {
-	double percent = 100.0 * he->period / hpp->total_period;
+	double percent = 100.0 * he->stat.period / hpp->total_period;
 	const char *fmt = symbol_conf.field_sep ? "%.2f" : " %6.2f%%";
 
 	if (hpp->ptr) {
 		struct hists *old_hists = hpp->ptr;
 		u64 total_period = old_hists->stats.total_period;
-		u64 base_period = he->pair ? he->pair->period : 0;
+		u64 base_period = he->pair ? he->pair->stat.period : 0;
 
 		if (total_period)
 			percent = 100.0 * base_period / total_period;
@@ -69,13 +69,13 @@ static int hpp__width_overhead_sys(struct perf_hpp *hpp __maybe_unused)
 
 static int hpp__color_overhead_sys(struct perf_hpp *hpp, struct hist_entry *he)
 {
-	double percent = 100.0 * he->period_sys / hpp->total_period;
+	double percent = 100.0 * he->stat.period_sys / hpp->total_period;
 	return percent_color_snprintf(hpp->buf, hpp->size, "%6.2f%%", percent);
 }
 
 static int hpp__entry_overhead_sys(struct perf_hpp *hpp, struct hist_entry *he)
 {
-	double percent = 100.0 * he->period_sys / hpp->total_period;
+	double percent = 100.0 * he->stat.period_sys / hpp->total_period;
 	const char *fmt = symbol_conf.field_sep ? "%.2f" : "%6.2f%%";
 
 	return scnprintf(hpp->buf, hpp->size, fmt, percent);
@@ -95,13 +95,13 @@ static int hpp__width_overhead_us(struct perf_hpp *hpp __maybe_unused)
 
 static int hpp__color_overhead_us(struct perf_hpp *hpp, struct hist_entry *he)
 {
-	double percent = 100.0 * he->period_us / hpp->total_period;
+	double percent = 100.0 * he->stat.period_us / hpp->total_period;
 	return percent_color_snprintf(hpp->buf, hpp->size, "%6.2f%%", percent);
 }
 
 static int hpp__entry_overhead_us(struct perf_hpp *hpp, struct hist_entry *he)
 {
-	double percent = 100.0 * he->period_us / hpp->total_period;
+	double percent = 100.0 * he->stat.period_us / hpp->total_period;
 	const char *fmt = symbol_conf.field_sep ? "%.2f" : "%6.2f%%";
 
 	return scnprintf(hpp->buf, hpp->size, fmt, percent);
@@ -120,14 +120,14 @@ static int hpp__width_overhead_guest_sys(struct perf_hpp *hpp __maybe_unused)
 static int hpp__color_overhead_guest_sys(struct perf_hpp *hpp,
 					 struct hist_entry *he)
 {
-	double percent = 100.0 * he->period_guest_sys / hpp->total_period;
+	double percent = 100.0 * he->stat.period_guest_sys / hpp->total_period;
 	return percent_color_snprintf(hpp->buf, hpp->size, " %6.2f%% ", percent);
 }
 
 static int hpp__entry_overhead_guest_sys(struct perf_hpp *hpp,
 					 struct hist_entry *he)
 {
-	double percent = 100.0 * he->period_guest_sys / hpp->total_period;
+	double percent = 100.0 * he->stat.period_guest_sys / hpp->total_period;
 	const char *fmt = symbol_conf.field_sep ? "%.2f" : " %6.2f%% ";
 
 	return scnprintf(hpp->buf, hpp->size, fmt, percent);
@@ -146,14 +146,14 @@ static int hpp__width_overhead_guest_us(struct perf_hpp *hpp __maybe_unused)
 static int hpp__color_overhead_guest_us(struct perf_hpp *hpp,
 					struct hist_entry *he)
 {
-	double percent = 100.0 * he->period_guest_us / hpp->total_period;
+	double percent = 100.0 * he->stat.period_guest_us / hpp->total_period;
 	return percent_color_snprintf(hpp->buf, hpp->size, " %6.2f%% ", percent);
 }
 
 static int hpp__entry_overhead_guest_us(struct perf_hpp *hpp,
 					struct hist_entry *he)
 {
-	double percent = 100.0 * he->period_guest_us / hpp->total_period;
+	double percent = 100.0 * he->stat.period_guest_us / hpp->total_period;
 	const char *fmt = symbol_conf.field_sep ? "%.2f" : " %6.2f%% ";
 
 	return scnprintf(hpp->buf, hpp->size, fmt, percent);
@@ -175,7 +175,7 @@ static int hpp__entry_samples(struct perf_hpp *hpp, struct hist_entry *he)
 {
 	const char *fmt = symbol_conf.field_sep ? "%" PRIu64 : "%11" PRIu64;
 
-	return scnprintf(hpp->buf, hpp->size, fmt, he->nr_events);
+	return scnprintf(hpp->buf, hpp->size, fmt, he->stat.nr_events);
 }
 
 static int hpp__header_period(struct perf_hpp *hpp)
@@ -194,7 +194,7 @@ static int hpp__entry_period(struct perf_hpp *hpp, struct hist_entry *he)
 {
 	const char *fmt = symbol_conf.field_sep ? "%" PRIu64 : "%12" PRIu64;
 
-	return scnprintf(hpp->buf, hpp->size, fmt, he->period);
+	return scnprintf(hpp->buf, hpp->size, fmt, he->stat.period);
 }
 
 static int hpp__header_delta(struct perf_hpp *hpp)
@@ -220,11 +220,11 @@ static int hpp__entry_delta(struct perf_hpp *hpp, struct hist_entry *he)
 
 	old_total = pair_hists->stats.total_period;
 	if (old_total > 0 && he->pair)
-		old_percent = 100.0 * he->pair->period / old_total;
+		old_percent = 100.0 * he->pair->stat.period / old_total;
 
 	new_total = hpp->total_period;
 	if (new_total > 0)
-		new_percent = 100.0 * he->period / new_total;
+		new_percent = 100.0 * he->stat.period / new_total;
 
 	diff = new_percent - old_percent;
 	if (fabs(diff) >= 0.01)
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 882461a42830..4382a1995cda 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -271,7 +271,7 @@ static size_t hist_entry_callchain__fprintf(struct hist_entry *he,
 {
 	switch (callchain_param.mode) {
 	case CHAIN_GRAPH_REL:
-		return callchain__fprintf_graph(fp, &he->sorted_chain, he->period,
+		return callchain__fprintf_graph(fp, &he->sorted_chain, he->stat.period,
 						left_margin);
 		break;
 	case CHAIN_GRAPH_ABS:
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 236bc9d98ff2..ab3d11491991 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -135,16 +135,16 @@ static void hist_entry__add_cpumode_period(struct hist_entry *he,
 {
 	switch (cpumode) {
 	case PERF_RECORD_MISC_KERNEL:
-		he->period_sys += period;
+		he->stat.period_sys += period;
 		break;
 	case PERF_RECORD_MISC_USER:
-		he->period_us += period;
+		he->stat.period_us += period;
 		break;
 	case PERF_RECORD_MISC_GUEST_KERNEL:
-		he->period_guest_sys += period;
+		he->stat.period_guest_sys += period;
 		break;
 	case PERF_RECORD_MISC_GUEST_USER:
-		he->period_guest_us += period;
+		he->stat.period_guest_us += period;
 		break;
 	default:
 		break;
@@ -153,13 +153,13 @@ static void hist_entry__add_cpumode_period(struct hist_entry *he,
 
 static void hist_entry__decay(struct hist_entry *he)
 {
-	he->period = (he->period * 7) / 8;
-	he->nr_events = (he->nr_events * 7) / 8;
+	he->stat.period = (he->stat.period * 7) / 8;
+	he->stat.nr_events = (he->stat.nr_events * 7) / 8;
 }
 
 static bool hists__decay_entry(struct hists *hists, struct hist_entry *he)
 {
-	u64 prev_period = he->period;
+	u64 prev_period = he->stat.period;
 
 	if (prev_period == 0)
 		return true;
@@ -167,9 +167,9 @@ static bool hists__decay_entry(struct hists *hists, struct hist_entry *he)
 	hist_entry__decay(he);
 
 	if (!he->filtered)
-		hists->stats.total_period -= prev_period - he->period;
+		hists->stats.total_period -= prev_period - he->stat.period;
 
-	return he->period == 0;
+	return he->stat.period == 0;
 }
 
 static void __hists__decay_entries(struct hists *hists, bool zap_user,
@@ -223,7 +223,7 @@ static struct hist_entry *hist_entry__new(struct hist_entry *template)
 
 	if (he != NULL) {
 		*he = *template;
-		he->nr_events = 1;
+		he->stat.nr_events = 1;
 		if (he->ms.map)
 			he->ms.map->referenced = true;
 		if (symbol_conf.use_callchain)
@@ -238,7 +238,7 @@ static void hists__inc_nr_entries(struct hists *hists, struct hist_entry *h)
 	if (!h->filtered) {
 		hists__calc_col_len(hists, h);
 		++hists->nr_entries;
-		hists->stats.total_period += h->period;
+		hists->stats.total_period += h->stat.period;
 	}
 }
 
@@ -270,8 +270,8 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
 		cmp = hist_entry__cmp(entry, he);
 
 		if (!cmp) {
-			he->period += period;
-			++he->nr_events;
+			he->stat.period += period;
+			++he->stat.nr_events;
 
 			/* If the map of an existing hist_entry has
 			 * become out-of-date due to an exec() or
@@ -321,7 +321,9 @@ struct hist_entry *__hists__add_branch_entry(struct hists *self,
 		.cpu	= al->cpu,
 		.ip	= bi->to.addr,
 		.level	= al->level,
-		.period	= period,
+		.stat = {
+			.period	= period,
+		},
 		.parent = sym_parent,
 		.filtered = symbol__parent_filter(sym_parent),
 		.branch_info = bi,
@@ -343,7 +345,9 @@ struct hist_entry *__hists__add_entry(struct hists *self,
 		.cpu	= al->cpu,
 		.ip	= al->addr,
 		.level	= al->level,
-		.period	= period,
+		.stat = {
+			.period	= period,
+		},
 		.parent = sym_parent,
 		.filtered = symbol__parent_filter(sym_parent),
 	};
@@ -410,12 +414,12 @@ static bool hists__collapse_insert_entry(struct hists *hists __maybe_unused,
 		cmp = hist_entry__collapse(iter, he);
 
 		if (!cmp) {
-			iter->period		+= he->period;
-			iter->period_sys	+= he->period_sys;
-			iter->period_us		+= he->period_us;
-			iter->period_guest_sys	+= he->period_guest_sys;
-			iter->period_guest_us	+= he->period_guest_us;
-			iter->nr_events		+= he->nr_events;
+			iter->stat.period		+= he->stat.period;
+			iter->stat.period_sys		+= he->stat.period_sys;
+			iter->stat.period_us		+= he->stat.period_us;
+			iter->stat.period_guest_sys	+= he->stat.period_guest_sys;
+			iter->stat.period_guest_us	+= he->stat.period_guest_us;
+			iter->stat.nr_events		+= he->stat.nr_events;
 
 			if (symbol_conf.use_callchain) {
 				callchain_cursor_reset(&callchain_cursor);
@@ -518,7 +522,7 @@ static void __hists__insert_output_entry(struct rb_root *entries,
 		parent = *p;
 		iter = rb_entry(parent, struct hist_entry, rb_node);
 
-		if (he->period > iter->period)
+		if (he->stat.period > iter->stat.period)
 			p = &(*p)->rb_left;
 		else
 			p = &(*p)->rb_right;
@@ -579,8 +583,8 @@ static void hists__remove_entry_filter(struct hists *hists, struct hist_entry *h
 	if (h->ms.unfolded)
 		hists->nr_entries += h->nr_rows;
 	h->row_offset = 0;
-	hists->stats.total_period += h->period;
-	hists->stats.nr_events[PERF_RECORD_SAMPLE] += h->nr_events;
+	hists->stats.total_period += h->stat.period;
+	hists->stats.nr_events[PERF_RECORD_SAMPLE] += h->stat.nr_events;
 
 	hists__calc_col_len(hists, h);
 }
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index e459c981b039..2356d085bc54 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -42,6 +42,15 @@ extern struct sort_entry sort_sym_from;
 extern struct sort_entry sort_sym_to;
 extern enum sort_type sort__first_dimension;
 
+struct he_stat {
+	u64			period;
+	u64			period_sys;
+	u64			period_us;
+	u64			period_guest_sys;
+	u64			period_guest_us;
+	u32			nr_events;
+};
+
 /**
  * struct hist_entry - histogram entry
  *
@@ -51,16 +60,11 @@ extern enum sort_type sort__first_dimension;
 struct hist_entry {
 	struct rb_node		rb_node_in;
 	struct rb_node		rb_node;
-	u64			period;
-	u64			period_sys;
-	u64			period_us;
-	u64			period_guest_sys;
-	u64			period_guest_us;
+	struct he_stat		stat;
 	struct map_symbol	ms;
 	struct thread		*thread;
 	u64			ip;
 	s32			cpu;
-	u32			nr_events;
 
 	/* XXX These two should move to some tree widget lib */
 	u16			row_offset;
-- 
1.7.11.4



* [PATCH 03/15] perf hists: Move he->stat.nr_events initialization to a template
  2012-09-13  7:19 [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Namhyung Kim
  2012-09-13  7:19 ` [PATCH 01/15] perf hists: Add missing period_* fields when collapsing a hist entry Namhyung Kim
  2012-09-13  7:19 ` [PATCH 02/15] perf hists: Introduce struct he_stat Namhyung Kim
@ 2012-09-13  7:19 ` Namhyung Kim
  2012-09-13  7:20 ` [PATCH 04/15] perf hists: Convert hist entry functions to use struct he_stat Namhyung Kim
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Namhyung Kim @ 2012-09-13  7:19 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, LKML,
	Frederic Weisbecker, Arun Sharma, David Ahern, Jiri Olsa,
	Namhyung Kim

From: Namhyung Kim <namhyung.kim@lge.com>

Since it is set to 1 for every new hist entry, there is no need to set
it separately.  Move the initialization into the template entry.

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/hist.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index ab3d11491991..ef39e6714cbb 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -223,7 +223,7 @@ static struct hist_entry *hist_entry__new(struct hist_entry *template)
 
 	if (he != NULL) {
 		*he = *template;
-		he->stat.nr_events = 1;
+
 		if (he->ms.map)
 			he->ms.map->referenced = true;
 		if (symbol_conf.use_callchain)
@@ -323,6 +323,7 @@ struct hist_entry *__hists__add_branch_entry(struct hists *self,
 		.level	= al->level,
 		.stat = {
 			.period	= period,
+			.nr_events = 1,
 		},
 		.parent = sym_parent,
 		.filtered = symbol__parent_filter(sym_parent),
@@ -347,6 +348,7 @@ struct hist_entry *__hists__add_entry(struct hists *self,
 		.level	= al->level,
 		.stat = {
 			.period	= period,
+			.nr_events = 1,
 		},
 		.parent = sym_parent,
 		.filtered = symbol__parent_filter(sym_parent),
-- 
1.7.11.4



* [PATCH 04/15] perf hists: Convert hist entry functions to use struct he_stat
  2012-09-13  7:19 [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Namhyung Kim
                   ` (2 preceding siblings ...)
  2012-09-13  7:19 ` [PATCH 03/15] perf hists: Move he->stat.nr_events initialization to a template Namhyung Kim
@ 2012-09-13  7:20 ` Namhyung Kim
  2012-09-13  7:20 ` [PATCH 05/15] perf hists: Add more helpers for hist entry stat Namhyung Kim
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Namhyung Kim @ 2012-09-13  7:20 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, LKML,
	Frederic Weisbecker, Arun Sharma, David Ahern, Jiri Olsa,
	Namhyung Kim

From: Namhyung Kim <namhyung.kim@lge.com>

hist_entry__add_cpumode_period() and hist_entry__decay() deal only
with hist_entry's stat fields, so make them take the struct he_stat
directly.

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/hist.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index ef39e6714cbb..3a37f8c5c297 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -130,31 +130,31 @@ void hists__output_recalc_col_len(struct hists *hists, int max_rows)
 	}
 }
 
-static void hist_entry__add_cpumode_period(struct hist_entry *he,
+static void hist_entry__add_cpumode_period(struct he_stat *he_stat,
 					   unsigned int cpumode, u64 period)
 {
 	switch (cpumode) {
 	case PERF_RECORD_MISC_KERNEL:
-		he->stat.period_sys += period;
+		he_stat->period_sys += period;
 		break;
 	case PERF_RECORD_MISC_USER:
-		he->stat.period_us += period;
+		he_stat->period_us += period;
 		break;
 	case PERF_RECORD_MISC_GUEST_KERNEL:
-		he->stat.period_guest_sys += period;
+		he_stat->period_guest_sys += period;
 		break;
 	case PERF_RECORD_MISC_GUEST_USER:
-		he->stat.period_guest_us += period;
+		he_stat->period_guest_us += period;
 		break;
 	default:
 		break;
 	}
 }
 
-static void hist_entry__decay(struct hist_entry *he)
+static void hist_entry__decay(struct he_stat *he_stat)
 {
-	he->stat.period = (he->stat.period * 7) / 8;
-	he->stat.nr_events = (he->stat.nr_events * 7) / 8;
+	he_stat->period = (he_stat->period * 7) / 8;
+	he_stat->nr_events = (he_stat->nr_events * 7) / 8;
 }
 
 static bool hists__decay_entry(struct hists *hists, struct hist_entry *he)
@@ -164,7 +164,7 @@ static bool hists__decay_entry(struct hists *hists, struct hist_entry *he)
 	if (prev_period == 0)
 		return true;
 
-	hist_entry__decay(he);
+	hist_entry__decay(&he->stat);
 
 	if (!he->filtered)
 		hists->stats.total_period -= prev_period - he->stat.period;
@@ -300,7 +300,7 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
 	rb_link_node(&he->rb_node_in, parent, p);
 	rb_insert_color(&he->rb_node_in, hists->entries_in);
 out:
-	hist_entry__add_cpumode_period(he, al->cpumode, period);
+	hist_entry__add_cpumode_period(&he->stat, al->cpumode, period);
 out_unlock:
 	pthread_mutex_unlock(&hists->lock);
 	return he;
-- 
1.7.11.4



* [PATCH 05/15] perf hists: Add more helpers for hist entry stat
  2012-09-13  7:19 [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Namhyung Kim
                   ` (3 preceding siblings ...)
  2012-09-13  7:20 ` [PATCH 04/15] perf hists: Convert hist entry functions to use struct he_stat Namhyung Kim
@ 2012-09-13  7:20 ` Namhyung Kim
  2012-09-13  7:20 ` [PATCH 06/15] perf hists: Add support for accumulated stat of hist entry Namhyung Kim
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Namhyung Kim @ 2012-09-13  7:20 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, LKML,
	Frederic Weisbecker, Arun Sharma, David Ahern, Jiri Olsa,
	Namhyung Kim

From: Namhyung Kim <namhyung.kim@lge.com>

Add and use hist_entry__add_{period,stat}() for updating a hist
entry's stat.  They will also be used for accumulated stats later.

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/hist.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 3a37f8c5c297..9df0a503e159 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -130,6 +130,12 @@ void hists__output_recalc_col_len(struct hists *hists, int max_rows)
 	}
 }
 
+static void hist_entry__add_period(struct he_stat *he_stat, u64 period)
+{
+	he_stat->period		+= period;
+	he_stat->nr_events	+= 1;
+}
+
 static void hist_entry__add_cpumode_period(struct he_stat *he_stat,
 					   unsigned int cpumode, u64 period)
 {
@@ -151,6 +157,16 @@ static void hist_entry__add_cpumode_period(struct he_stat *he_stat,
 	}
 }
 
+static void hist_entry__add_stat(struct he_stat *dest, struct he_stat *src)
+{
+	dest->period		+= src->period;
+	dest->period_sys	+= src->period_sys;
+	dest->period_us		+= src->period_us;
+	dest->period_guest_sys	+= src->period_guest_sys;
+	dest->period_guest_us	+= src->period_guest_us;
+	dest->nr_events		+= src->nr_events;
+}
+
 static void hist_entry__decay(struct he_stat *he_stat)
 {
 	he_stat->period = (he_stat->period * 7) / 8;
@@ -270,8 +286,7 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
 		cmp = hist_entry__cmp(entry, he);
 
 		if (!cmp) {
-			he->stat.period += period;
-			++he->stat.nr_events;
+			hist_entry__add_period(&he->stat, period);
 
 			/* If the map of an existing hist_entry has
 			 * become out-of-date due to an exec() or
@@ -416,12 +431,7 @@ static bool hists__collapse_insert_entry(struct hists *hists __maybe_unused,
 		cmp = hist_entry__collapse(iter, he);
 
 		if (!cmp) {
-			iter->stat.period		+= he->stat.period;
-			iter->stat.period_sys		+= he->stat.period_sys;
-			iter->stat.period_us		+= he->stat.period_us;
-			iter->stat.period_guest_sys	+= he->stat.period_guest_sys;
-			iter->stat.period_guest_us	+= he->stat.period_guest_us;
-			iter->stat.nr_events		+= he->stat.nr_events;
+			hist_entry__add_stat(&iter->stat, &he->stat);
 
 			if (symbol_conf.use_callchain) {
 				callchain_cursor_reset(&callchain_cursor);
-- 
1.7.11.4



* [PATCH 06/15] perf hists: Add support for accumulated stat of hist entry
  2012-09-13  7:19 [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Namhyung Kim
                   ` (4 preceding siblings ...)
  2012-09-13  7:20 ` [PATCH 05/15] perf hists: Add more helpers for hist entry stat Namhyung Kim
@ 2012-09-13  7:20 ` Namhyung Kim
  2012-09-13  7:20 ` [PATCH 07/15] perf hists: Check if accumulated when adding a " Namhyung Kim
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Namhyung Kim @ 2012-09-13  7:20 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, LKML,
	Frederic Weisbecker, Arun Sharma, David Ahern, Jiri Olsa,
	Namhyung Kim

From: Namhyung Kim <namhyung.kim@lge.com>

Maintain accumulated stat information in hist_entry->stat_acc if
symbol_conf.cumulate_callchain is set.  The fields in ->stat_acc start
with the same values as ->stat, and will be updated as the callchain
is processed later.

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/hist.c   | 18 ++++++++++++++++++
 tools/perf/util/sort.h   |  1 +
 tools/perf/util/symbol.h |  1 +
 3 files changed, 20 insertions(+)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 9df0a503e159..2802302e5904 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -181,6 +181,8 @@ static bool hists__decay_entry(struct hists *hists, struct hist_entry *he)
 		return true;
 
 	hist_entry__decay(&he->stat);
+	if (symbol_conf.cumulate_callchain)
+		hist_entry__decay(he->stat_acc);
 
 	if (!he->filtered)
 		hists->stats.total_period -= prev_period - he->stat.period;
@@ -239,6 +241,14 @@ static struct hist_entry *hist_entry__new(struct hist_entry *template)
 
 	if (he != NULL) {
 		*he = *template;
+		if (symbol_conf.cumulate_callchain) {
+			he->stat_acc = malloc(sizeof(he->stat));
+			if (he->stat_acc == NULL) {
+				free(he);
+				return NULL;
+			}
+			memcpy(he->stat_acc, &he->stat, sizeof(he->stat));
+		}
 
 		if (he->ms.map)
 			he->ms.map->referenced = true;
@@ -287,6 +297,8 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
 
 		if (!cmp) {
 			hist_entry__add_period(&he->stat, period);
+			if (symbol_conf.cumulate_callchain)
+				hist_entry__add_period(he->stat_acc, period);
 
 			/* If the map of an existing hist_entry has
 			 * become out-of-date due to an exec() or
@@ -316,6 +328,9 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
 	rb_insert_color(&he->rb_node_in, hists->entries_in);
 out:
 	hist_entry__add_cpumode_period(&he->stat, al->cpumode, period);
+	if (symbol_conf.cumulate_callchain)
+		hist_entry__add_cpumode_period(he->stat_acc, al->cpumode,
+					       period);
 out_unlock:
 	pthread_mutex_unlock(&hists->lock);
 	return he;
@@ -432,6 +447,9 @@ static bool hists__collapse_insert_entry(struct hists *hists __maybe_unused,
 
 		if (!cmp) {
 			hist_entry__add_stat(&iter->stat, &he->stat);
+			if (symbol_conf.cumulate_callchain)
+				hist_entry__add_stat(iter->stat_acc,
+						     he->stat_acc);
 
 			if (symbol_conf.use_callchain) {
 				callchain_cursor_reset(&callchain_cursor);
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 2356d085bc54..65b69824bc87 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -61,6 +61,7 @@ struct hist_entry {
 	struct rb_node		rb_node_in;
 	struct rb_node		rb_node;
 	struct he_stat		stat;
+	struct he_stat		*stat_acc;
 	struct map_symbol	ms;
 	struct thread		*thread;
 	u64			ip;
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 4ff45e30c726..d0d65c4ebe6a 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -91,6 +91,7 @@ struct symbol_conf {
 			show_nr_samples,
 			show_total_period,
 			use_callchain,
+			cumulate_callchain,
 			exclude_other,
 			show_cpu_utilization,
 			initialized,
-- 
1.7.11.4



* [PATCH 07/15] perf hists: Check if accumulated when adding a hist entry
  2012-09-13  7:19 [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Namhyung Kim
                   ` (5 preceding siblings ...)
  2012-09-13  7:20 ` [PATCH 06/15] perf hists: Add support for accumulated stat of hist entry Namhyung Kim
@ 2012-09-13  7:20 ` Namhyung Kim
  2012-09-13  7:20 ` [PATCH 08/15] perf callchain: Add a couple of callchain helpers Namhyung Kim
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Namhyung Kim @ 2012-09-13  7:20 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, LKML,
	Frederic Weisbecker, Arun Sharma, David Ahern, Jiri Olsa,
	Namhyung Kim

From: Namhyung Kim <namhyung.kim@lge.com>

To support callchain accumulation, add_hist_entry() needs to know
whether @entry is added for the sample itself or only for accumulation.
The period of an accumulation-only entry should be added to ->stat_acc
but not to ->stat.  Add a @sample_self argument for that.
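
For illustration, the intended call pattern is roughly the following
(a simplified fragment, mirroring the later accumulation patch in this
series):

  /* the sampled entry itself: period goes to ->stat (and ->stat_acc) */
  he = add_hist_entry(hists, &entry, al, period, true);

  /* an entry added only to accumulate a callchain node: period goes
   * to ->stat_acc but not to ->stat */
  he = add_hist_entry(hists, &entry, al, period, false);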

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/hist.c | 62 +++++++++++++++++++++++++++++++++++---------------
 1 file changed, 44 insertions(+), 18 deletions(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 2802302e5904..46433a0830dd 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -234,26 +234,50 @@ void hists__decay_entries_threaded(struct hists *hists,
  * histogram, sorted on item, collects periods
  */
 
-static struct hist_entry *hist_entry__new(struct hist_entry *template)
+static struct hist_entry *hist_entry__new_callchain(struct hist_entry *template,
+						    bool sample_self)
 {
-	size_t callchain_size = symbol_conf.use_callchain ? sizeof(struct callchain_root) : 0;
+	const size_t callchain_size = sizeof(struct callchain_root);
 	struct hist_entry *he = malloc(sizeof(*he) + callchain_size);
 
+	if (he == NULL)
+		return NULL;
+
+	*he = *template;
+
+	if (symbol_conf.cumulate_callchain) {
+		he->stat_acc = malloc(sizeof(he->stat));
+		if (he->stat_acc == NULL) {
+			free(he);
+			return NULL;
+		}
+		memcpy(he->stat_acc, &he->stat, sizeof(he->stat));
+		if (!sample_self)
+			memset(&he->stat, 0, sizeof(he->stat));
+	}
+
+	if (he->ms.map)
+		he->ms.map->referenced = true;
+
+	callchain_init(he->callchain);
+
+	return he;
+}
+
+static struct hist_entry *hist_entry__new(struct hist_entry *template,
+					  bool sample_self)
+{
+	struct hist_entry *he;
+
+	if (symbol_conf.use_callchain)
+		return hist_entry__new_callchain(template, sample_self);
+
+	he = malloc(sizeof(*he));
 	if (he != NULL) {
 		*he = *template;
-		if (symbol_conf.cumulate_callchain) {
-			he->stat_acc = malloc(sizeof(he->stat));
-			if (he->stat_acc == NULL) {
-				free(he);
-				return NULL;
-			}
-			memcpy(he->stat_acc, &he->stat, sizeof(he->stat));
-		}
 
 		if (he->ms.map)
 			he->ms.map->referenced = true;
-		if (symbol_conf.use_callchain)
-			callchain_init(he->callchain);
 	}
 
 	return he;
@@ -278,7 +302,7 @@ static u8 symbol__parent_filter(const struct symbol *parent)
 static struct hist_entry *add_hist_entry(struct hists *hists,
 				      struct hist_entry *entry,
 				      struct addr_location *al,
-				      u64 period)
+				      u64 period, bool sample_self)
 {
 	struct rb_node **p;
 	struct rb_node *parent = NULL;
@@ -296,7 +320,8 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
 		cmp = hist_entry__cmp(entry, he);
 
 		if (!cmp) {
-			hist_entry__add_period(&he->stat, period);
+			if (sample_self)
+				hist_entry__add_period(&he->stat, period);
 			if (symbol_conf.cumulate_callchain)
 				hist_entry__add_period(he->stat_acc, period);
 
@@ -320,14 +345,15 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
 			p = &(*p)->rb_right;
 	}
 
-	he = hist_entry__new(entry);
+	he = hist_entry__new(entry, sample_self);
 	if (!he)
 		goto out_unlock;
 
 	rb_link_node(&he->rb_node_in, parent, p);
 	rb_insert_color(&he->rb_node_in, hists->entries_in);
 out:
-	hist_entry__add_cpumode_period(&he->stat, al->cpumode, period);
+	if (sample_self)
+		hist_entry__add_cpumode_period(&he->stat, al->cpumode, period);
 	if (symbol_conf.cumulate_callchain)
 		hist_entry__add_cpumode_period(he->stat_acc, al->cpumode,
 					       period);
@@ -360,7 +386,7 @@ struct hist_entry *__hists__add_branch_entry(struct hists *self,
 		.branch_info = bi,
 	};
 
-	return add_hist_entry(self, &entry, al, period);
+	return add_hist_entry(self, &entry, al, period, true);
 }
 
 struct hist_entry *__hists__add_entry(struct hists *self,
@@ -384,7 +410,7 @@ struct hist_entry *__hists__add_entry(struct hists *self,
 		.filtered = symbol__parent_filter(sym_parent),
 	};
 
-	return add_hist_entry(self, &entry, al, period);
+	return add_hist_entry(self, &entry, al, period, true);
 }
 
 int64_t
-- 
1.7.11.4



* [PATCH 08/15] perf callchain: Add a couple of callchain helpers
  2012-09-13  7:19 [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Namhyung Kim
                   ` (6 preceding siblings ...)
  2012-09-13  7:20 ` [PATCH 07/15] perf hists: Check if accumulated when adding a " Namhyung Kim
@ 2012-09-13  7:20 ` Namhyung Kim
  2012-09-13  7:20 ` [PATCH 09/15] perf hists: Let add_hist_entry to make a hist entry template Namhyung Kim
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Namhyung Kim @ 2012-09-13  7:20 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, LKML,
	Frederic Weisbecker, Arun Sharma, David Ahern, Jiri Olsa,
	Namhyung Kim

From: Namhyung Kim <namhyung.kim@lge.com>

To accumulate the callchain information of a hist entry, the following
helper functions are needed.
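
For illustration, a caller is expected to use them together roughly
like this (a simplified sketch of how a later patch in this series
walks the chain):

  struct callchain_cursor cursor;
  struct addr_location al;

  /* work on a copy of the global cursor so it is not consumed */
  callchain_cursor_copy(&cursor, &callchain_cursor);

  /* peek at each node and advance until the chain is exhausted */
  while (callchain_cursor_peek_al(&cursor, &al) == 0) {
  	/* ... use al.map, al.sym and al.addr ... */
  	callchain_cursor_next(&cursor);
  }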

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/callchain.c | 15 +++++++++++++++
 tools/perf/util/callchain.h | 17 +++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index d3b3f5d82137..255c4a8266be 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -461,3 +461,18 @@ int callchain_cursor_append(struct callchain_cursor *cursor,
 
 	return 0;
 }
+
+int callchain_cursor_peek_al(struct callchain_cursor *cursor,
+			     struct addr_location *al)
+{
+	struct callchain_cursor_node *node = cursor->first;
+
+	if (node == NULL || cursor->nr == 0)
+		return -1;
+
+	al->map = node->map;
+	al->sym = node->sym;
+	al->addr = node->ip;
+
+	return 0;
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 3bdb407f9cd9..46ca2b4360e3 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -4,6 +4,7 @@
 #include "../perf.h"
 #include <linux/list.h>
 #include <linux/rbtree.h>
+#include <string.h>
 #include "event.h"
 #include "symbol.h"
 
@@ -143,4 +144,20 @@ static inline void callchain_cursor_advance(struct callchain_cursor *cursor)
 	cursor->curr = cursor->curr->next;
 	cursor->pos++;
 }
+
+/* NOTE: This function can leak a cursor node.  Use with caution. */
+static inline void callchain_cursor_next(struct callchain_cursor *cursor)
+{
+	cursor->first = cursor->first->next;
+	cursor->nr--;
+}
+
+static inline void callchain_cursor_copy(struct callchain_cursor *dest,
+					 struct callchain_cursor *src)
+{
+	memcpy(dest, src, sizeof(*src));
+}
+
+int callchain_cursor_peek_al(struct callchain_cursor *cursor,
+			     struct addr_location *al);
 #endif	/* __PERF_CALLCHAIN_H */
-- 
1.7.11.4



* [PATCH 09/15] perf hists: Let add_hist_entry to make a hist entry template
  2012-09-13  7:19 [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Namhyung Kim
                   ` (7 preceding siblings ...)
  2012-09-13  7:20 ` [PATCH 08/15] perf callchain: Add a couple of callchain helpers Namhyung Kim
@ 2012-09-13  7:20 ` Namhyung Kim
  2012-09-13  7:20 ` [PATCH 10/15] perf hists: Accumulate hist entry stat based on the callchain Namhyung Kim
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Namhyung Kim @ 2012-09-13  7:20 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, LKML,
	Frederic Weisbecker, Arun Sharma, David Ahern, Jiri Olsa,
	Namhyung Kim

From: Namhyung Kim <namhyung.kim@lge.com>

Change add_hist_entry() to build the hist entry template from the
given @al.  This makes the callchain accumulation code simpler.

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/hist.c | 50 +++++++++++++++++++++++++++++---------------------
 1 file changed, 29 insertions(+), 21 deletions(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 46433a0830dd..f96298f59e49 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -299,10 +299,10 @@ static u8 symbol__parent_filter(const struct symbol *parent)
 	return 0;
 }
 
-static struct hist_entry *add_hist_entry(struct hists *hists,
-				      struct hist_entry *entry,
-				      struct addr_location *al,
-				      u64 period, bool sample_self)
+static struct hist_entry *__add_hist_entry(struct hists *hists,
+					   struct hist_entry *entry,
+					   struct addr_location *al,
+					   u64 period, bool sample_self)
 {
 	struct rb_node **p;
 	struct rb_node *parent = NULL;
@@ -362,20 +362,19 @@ out_unlock:
 	return he;
 }
 
-struct hist_entry *__hists__add_branch_entry(struct hists *self,
-					     struct addr_location *al,
-					     struct symbol *sym_parent,
-					     struct branch_info *bi,
-					     u64 period)
+static struct hist_entry *add_hist_entry(struct hists *hists,
+					 struct addr_location *al,
+					 struct symbol *sym_parent,
+					 u64 period, bool sample_self)
 {
 	struct hist_entry entry = {
 		.thread	= al->thread,
 		.ms = {
-			.map	= bi->to.map,
-			.sym	= bi->to.sym,
+			.map	= al->map,
+			.sym	= al->sym,
 		},
 		.cpu	= al->cpu,
-		.ip	= bi->to.addr,
+		.ip	= al->addr,
 		.level	= al->level,
 		.stat = {
 			.period	= period,
@@ -383,24 +382,25 @@ struct hist_entry *__hists__add_branch_entry(struct hists *self,
 		},
 		.parent = sym_parent,
 		.filtered = symbol__parent_filter(sym_parent),
-		.branch_info = bi,
 	};
 
-	return add_hist_entry(self, &entry, al, period, true);
+	return __add_hist_entry(hists, &entry, al, period, sample_self);
 }
 
-struct hist_entry *__hists__add_entry(struct hists *self,
-				      struct addr_location *al,
-				      struct symbol *sym_parent, u64 period)
+struct hist_entry *__hists__add_branch_entry(struct hists *self,
+					     struct addr_location *al,
+					     struct symbol *sym_parent,
+					     struct branch_info *bi,
+					     u64 period)
 {
 	struct hist_entry entry = {
 		.thread	= al->thread,
 		.ms = {
-			.map	= al->map,
-			.sym	= al->sym,
+			.map	= bi->to.map,
+			.sym	= bi->to.sym,
 		},
 		.cpu	= al->cpu,
-		.ip	= al->addr,
+		.ip	= bi->to.addr,
 		.level	= al->level,
 		.stat = {
 			.period	= period,
@@ -408,9 +408,17 @@ struct hist_entry *__hists__add_entry(struct hists *self,
 		},
 		.parent = sym_parent,
 		.filtered = symbol__parent_filter(sym_parent),
+		.branch_info = bi,
 	};
 
-	return add_hist_entry(self, &entry, al, period, true);
+	return __add_hist_entry(self, &entry, al, period, true);
+}
+
+struct hist_entry *__hists__add_entry(struct hists *self,
+				      struct addr_location *al,
+				      struct symbol *sym_parent, u64 period)
+{
+	return add_hist_entry(self, al, sym_parent, period, true);
 }
 
 int64_t
-- 
1.7.11.4



* [PATCH 10/15] perf hists: Accumulate hist entry stat based on the callchain
  2012-09-13  7:19 [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Namhyung Kim
                   ` (8 preceding siblings ...)
  2012-09-13  7:20 ` [PATCH 09/15] perf hists: Let add_hist_entry to make a hist entry template Namhyung Kim
@ 2012-09-13  7:20 ` Namhyung Kim
  2012-09-13  7:20 ` [PATCH 11/15] perf hists: Sort hist entries by accumulated period Namhyung Kim
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Namhyung Kim @ 2012-09-13  7:20 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, LKML,
	Frederic Weisbecker, Arun Sharma, David Ahern, Jiri Olsa,
	Namhyung Kim

From: Namhyung Kim <namhyung.kim@lge.com>

Call add_hist_entry() for each callchain node to get accumulated stats
for an entry.  However, skip nodes which do not have symbol info, as
they caused subtle problems.

AFAICS the current sort methods cannot distinguish entries with NULL
dso/sym well, so processing a callchain for an entry that doesn't have
symbol info might add a period to the same entry multiple times.  That
ended up with entries having more than 100% of accumulated period,
which is not good.

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/hist.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index f96298f59e49..2b629f460889 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -387,6 +387,61 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
 	return __add_hist_entry(hists, &entry, al, period, sample_self);
 }
 
+static struct hist_entry *cumulate_hist_entry(struct hists *self,
+					      struct addr_location *al,
+					      struct symbol *sym_parent,
+					      u64 period)
+{
+	unsigned int i;
+	struct hist_entry *he, *he_self;
+	struct callchain_cursor cursor;
+	struct addr_location al_child;
+
+	/*
+	 * make a copy of callchain cursor since callchain_cursor_next()
+	 * can leak callchain cursor nodes otherwise.
+	 */
+	callchain_cursor_copy(&cursor, &callchain_cursor);
+
+	he_self = add_hist_entry(self, al, sym_parent, period, true);
+	if (he_self == NULL)
+		return NULL;
+
+	callchain_cursor_next(&cursor);
+
+	/*
+	 * map, sym and addr in al_child will be updated on a loop below.
+	 * The other fields are kept same as original @al.
+	 */
+	al_child = *al;
+
+	/* add all entries in the original callchain */
+	for (i = 1; i < callchain_cursor.nr; i++) {
+		if (callchain_cursor_peek_al(&cursor, &al_child))
+			return NULL;
+
+		/*
+		 * XXX: Adding an entry without symbol information caused
+		 * subtle problems.  Just skip it for now.
+		 */
+		if (al_child.sym == NULL) {
+			callchain_cursor_next(&cursor);
+			continue;
+		}
+
+		he = add_hist_entry(self, &al_child, sym_parent, period, false);
+		if (he == NULL)
+			return NULL;
+
+		if (callchain_append(he->callchain, &cursor, period))
+			return NULL;
+
+		callchain_cursor_next(&cursor);
+	}
+
+	return he_self;
+}
+
 struct hist_entry *__hists__add_branch_entry(struct hists *self,
 					     struct addr_location *al,
 					     struct symbol *sym_parent,
@@ -418,6 +473,9 @@ struct hist_entry *__hists__add_entry(struct hists *self,
 				      struct addr_location *al,
 				      struct symbol *sym_parent, u64 period)
 {
+	if (symbol_conf.cumulate_callchain)
+		return cumulate_hist_entry(self, al, sym_parent, period);
+
 	return add_hist_entry(self, al, sym_parent, period, true);
 }
 
-- 
1.7.11.4



* [PATCH 11/15] perf hists: Sort hist entries by accumulated period
  2012-09-13  7:19 [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Namhyung Kim
                   ` (9 preceding siblings ...)
  2012-09-13  7:20 ` [PATCH 10/15] perf hists: Accumulate hist entry stat based on the callchain Namhyung Kim
@ 2012-09-13  7:20 ` Namhyung Kim
  2012-09-13  7:20 ` [PATCH 12/15] perf ui/hist: Add support to accumulated hist stat Namhyung Kim
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Namhyung Kim @ 2012-09-13  7:20 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, LKML,
	Frederic Weisbecker, Arun Sharma, David Ahern, Jiri Olsa,
	Namhyung Kim

From: Namhyung Kim <namhyung.kim@lge.com>

When symbol_conf.cumulate_callchain is requested, we need to sort the
entries by the accumulated period value.  To do this, separate out the
compare routine into hists__output_cmp().

In addition, if the accumulated periods of two entries are the same
(i.e. a single-path callchain), put the caller above the callee, since
accumulation tends to put callers in a higher position for the obvious
reason.

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/hist.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 2b629f460889..53ca74f1979a 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -628,6 +628,23 @@ void hists__collapse_resort_threaded(struct hists *hists)
  * reverse the map, sort on period.
  */
 
+static int hists__output_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	int cmp = left->stat.period > right->stat.period;
+
+	if (symbol_conf.cumulate_callchain) {
+		/*
+		 * Put caller above callee when they have equal period.
+		 */
+		if (left->stat_acc->period == right->stat_acc->period)
+			cmp = left->callchain->max_depth < right->callchain->max_depth;
+		else
+			cmp = left->stat_acc->period > right->stat_acc->period;
+	}
+
+	return cmp;
+}
+
 static void __hists__insert_output_entry(struct rb_root *entries,
 					 struct hist_entry *he,
 					 u64 min_callchain_hits)
@@ -644,7 +661,7 @@ static void __hists__insert_output_entry(struct rb_root *entries,
 		parent = *p;
 		iter = rb_entry(parent, struct hist_entry, rb_node);
 
-		if (he->stat.period > iter->stat.period)
+		if (hists__output_cmp(he, iter))
 			p = &(*p)->rb_left;
 		else
 			p = &(*p)->rb_right;
-- 
1.7.11.4



* [PATCH 12/15] perf ui/hist: Add support to accumulated hist stat
  2012-09-13  7:19 [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Namhyung Kim
                   ` (10 preceding siblings ...)
  2012-09-13  7:20 ` [PATCH 11/15] perf hists: Sort hist entries by accumulated period Namhyung Kim
@ 2012-09-13  7:20 ` Namhyung Kim
  2012-09-13  7:20 ` [PATCH 13/15] perf ui/browser: " Namhyung Kim
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Namhyung Kim @ 2012-09-13  7:20 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, LKML,
	Frederic Weisbecker, Arun Sharma, David Ahern, Jiri Olsa,
	Namhyung Kim

From: Namhyung Kim <namhyung.kim@lge.com>

Print the accumulated stat of a hist entry if requested.

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/hist.c | 46 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 44 insertions(+), 2 deletions(-)

diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index d6ddeb10e678..7bf2999d2d17 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -22,6 +22,9 @@ static int hpp__color_overhead(struct perf_hpp *hpp, struct hist_entry *he)
 {
 	double percent = 100.0 * he->stat.period / hpp->total_period;
 
+	if (symbol_conf.cumulate_callchain)
+		percent = 100.0 * he->stat_acc->period / hpp->total_period;
+
 	if (hpp->ptr) {
 		struct hists *old_hists = hpp->ptr;
 		u64 total_period = old_hists->stats.total_period;
@@ -41,6 +44,9 @@ static int hpp__entry_overhead(struct perf_hpp *hpp, struct hist_entry *he)
 	double percent = 100.0 * he->stat.period / hpp->total_period;
 	const char *fmt = symbol_conf.field_sep ? "%.2f" : " %6.2f%%";
 
+	if (symbol_conf.cumulate_callchain)
+		percent = 100.0 * he->stat_acc->period / hpp->total_period;
+
 	if (hpp->ptr) {
 		struct hists *old_hists = hpp->ptr;
 		u64 total_period = old_hists->stats.total_period;
@@ -70,6 +76,10 @@ static int hpp__width_overhead_sys(struct perf_hpp *hpp __maybe_unused)
 static int hpp__color_overhead_sys(struct perf_hpp *hpp, struct hist_entry *he)
 {
 	double percent = 100.0 * he->stat.period_sys / hpp->total_period;
+
+	if (symbol_conf.cumulate_callchain)
+		percent = 100.0 * he->stat_acc->period_sys / hpp->total_period;
+
 	return percent_color_snprintf(hpp->buf, hpp->size, "%6.2f%%", percent);
 }
 
@@ -78,6 +88,9 @@ static int hpp__entry_overhead_sys(struct perf_hpp *hpp, struct hist_entry *he)
 	double percent = 100.0 * he->stat.period_sys / hpp->total_period;
 	const char *fmt = symbol_conf.field_sep ? "%.2f" : "%6.2f%%";
 
+	if (symbol_conf.cumulate_callchain)
+		percent = 100.0 * he->stat_acc->period_sys / hpp->total_period;
+
 	return scnprintf(hpp->buf, hpp->size, fmt, percent);
 }
 
@@ -96,6 +109,10 @@ static int hpp__width_overhead_us(struct perf_hpp *hpp __maybe_unused)
 static int hpp__color_overhead_us(struct perf_hpp *hpp, struct hist_entry *he)
 {
 	double percent = 100.0 * he->stat.period_us / hpp->total_period;
+
+	if (symbol_conf.cumulate_callchain)
+		percent = 100.0 * he->stat_acc->period_us / hpp->total_period;
+
 	return percent_color_snprintf(hpp->buf, hpp->size, "%6.2f%%", percent);
 }
 
@@ -104,6 +121,9 @@ static int hpp__entry_overhead_us(struct perf_hpp *hpp, struct hist_entry *he)
 	double percent = 100.0 * he->stat.period_us / hpp->total_period;
 	const char *fmt = symbol_conf.field_sep ? "%.2f" : "%6.2f%%";
 
+	if (symbol_conf.cumulate_callchain)
+		percent = 100.0 * he->stat_acc->period_us / hpp->total_period;
+
 	return scnprintf(hpp->buf, hpp->size, fmt, percent);
 }
 
@@ -121,6 +141,10 @@ static int hpp__color_overhead_guest_sys(struct perf_hpp *hpp,
 					 struct hist_entry *he)
 {
 	double percent = 100.0 * he->stat.period_guest_sys / hpp->total_period;
+
+	if (symbol_conf.cumulate_callchain)
+		percent = 100.0 * he->stat_acc->period_guest_sys / hpp->total_period;
+
 	return percent_color_snprintf(hpp->buf, hpp->size, " %6.2f%% ", percent);
 }
 
@@ -130,6 +154,9 @@ static int hpp__entry_overhead_guest_sys(struct perf_hpp *hpp,
 	double percent = 100.0 * he->stat.period_guest_sys / hpp->total_period;
 	const char *fmt = symbol_conf.field_sep ? "%.2f" : " %6.2f%% ";
 
+	if (symbol_conf.cumulate_callchain)
+		percent = 100.0 * he->stat_acc->period_guest_sys / hpp->total_period;
+
 	return scnprintf(hpp->buf, hpp->size, fmt, percent);
 }
 
@@ -147,6 +174,10 @@ static int hpp__color_overhead_guest_us(struct perf_hpp *hpp,
 					struct hist_entry *he)
 {
 	double percent = 100.0 * he->stat.period_guest_us / hpp->total_period;
+
+	if (symbol_conf.cumulate_callchain)
+		percent = 100.0 * he->stat_acc->period_guest_us / hpp->total_period;
+
 	return percent_color_snprintf(hpp->buf, hpp->size, " %6.2f%% ", percent);
 }
 
@@ -156,6 +187,9 @@ static int hpp__entry_overhead_guest_us(struct perf_hpp *hpp,
 	double percent = 100.0 * he->stat.period_guest_us / hpp->total_period;
 	const char *fmt = symbol_conf.field_sep ? "%.2f" : " %6.2f%% ";
 
+	if (symbol_conf.cumulate_callchain)
+		percent = 100.0 * he->stat_acc->period_guest_us / hpp->total_period;
+
 	return scnprintf(hpp->buf, hpp->size, fmt, percent);
 }
 
@@ -174,8 +208,12 @@ static int hpp__width_samples(struct perf_hpp *hpp __maybe_unused)
 static int hpp__entry_samples(struct perf_hpp *hpp, struct hist_entry *he)
 {
 	const char *fmt = symbol_conf.field_sep ? "%" PRIu64 : "%11" PRIu64;
+	u64 samples = he->stat.nr_events;
 
-	return scnprintf(hpp->buf, hpp->size, fmt, he->stat.nr_events);
+	if (symbol_conf.cumulate_callchain)
+		samples = he->stat_acc->nr_events;
+
+	return scnprintf(hpp->buf, hpp->size, fmt, samples);
 }
 
 static int hpp__header_period(struct perf_hpp *hpp)
@@ -193,8 +231,12 @@ static int hpp__width_period(struct perf_hpp *hpp __maybe_unused)
 static int hpp__entry_period(struct perf_hpp *hpp, struct hist_entry *he)
 {
 	const char *fmt = symbol_conf.field_sep ? "%" PRIu64 : "%12" PRIu64;
+	u64 period = he->stat.period;
+
+	if (symbol_conf.cumulate_callchain)
+		period = he->stat_acc->period;
 
-	return scnprintf(hpp->buf, hpp->size, fmt, he->stat.period);
+	return scnprintf(hpp->buf, hpp->size, fmt, period);
 }
 
 static int hpp__header_delta(struct perf_hpp *hpp)
-- 
1.7.11.4



* [PATCH 13/15] perf ui/browser: Add support to accumulated hist stat
  2012-09-13  7:19 [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Namhyung Kim
                   ` (11 preceding siblings ...)
  2012-09-13  7:20 ` [PATCH 12/15] perf ui/hist: Add support to accumulated hist stat Namhyung Kim
@ 2012-09-13  7:20 ` Namhyung Kim
  2012-09-13  7:20 ` [PATCH 14/15] perf ui/gtk: " Namhyung Kim
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Namhyung Kim @ 2012-09-13  7:20 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, LKML,
	Frederic Weisbecker, Arun Sharma, David Ahern, Jiri Olsa,
	Namhyung Kim

From: Namhyung Kim <namhyung.kim@lge.com>

Print the accumulated stat of a hist entry when --cumulate is requested.

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/browsers/hists.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index b9b1b173637c..1dcdccf2edac 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -570,6 +570,10 @@ static int hist_browser__hpp_color_ ## _name(struct perf_hpp *hpp,	\
 					     struct hist_entry *he)	\
 {									\
 	double percent = 100.0 * he->stat._field / hpp->total_period;	\
+									\
+	if (symbol_conf.cumulate_callchain)				\
+		percent = 100.0 * he->stat_acc->_field / hpp->total_period; \
+									\
 	*(double *)hpp->ptr = percent;					\
 	return scnprintf(hpp->buf, hpp->size, "%6.2f%%", percent);	\
 }
-- 
1.7.11.4



* [PATCH 14/15] perf ui/gtk: Add support to accumulated hist stat
  2012-09-13  7:19 [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Namhyung Kim
                   ` (12 preceding siblings ...)
  2012-09-13  7:20 ` [PATCH 13/15] perf ui/browser: " Namhyung Kim
@ 2012-09-13  7:20 ` Namhyung Kim
  2012-09-13  7:20 ` [PATCH 15/15] perf report: Add --cumulate option Namhyung Kim
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Namhyung Kim @ 2012-09-13  7:20 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, LKML,
	Frederic Weisbecker, Arun Sharma, David Ahern, Jiri Olsa,
	Namhyung Kim

From: Namhyung Kim <namhyung.kim@lge.com>

Print the accumulated stat of a hist entry when --cumulate is requested.

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/gtk/browser.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/ui/gtk/browser.c b/tools/perf/ui/gtk/browser.c
index 7107edc6c08d..2fdd2f675194 100644
--- a/tools/perf/ui/gtk/browser.c
+++ b/tools/perf/ui/gtk/browser.c
@@ -53,6 +53,9 @@ static int perf_gtk__hpp_color_ ## _name(struct perf_hpp *hpp,			\
 	const char *markup;							\
 	int ret = 0;								\
 										\
+	if (symbol_conf.cumulate_callchain)					\
+		percent = 100.0 * he->stat_acc->_field / hpp->total_period;	\
+										\
 	markup = perf_gtk__get_percent_color(percent);				\
 	if (markup)								\
 		ret += scnprintf(hpp->buf, hpp->size, "%s", markup);		\
-- 
1.7.11.4



* [PATCH 15/15] perf report: Add --cumulate option
  2012-09-13  7:19 [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Namhyung Kim
                   ` (13 preceding siblings ...)
  2012-09-13  7:20 ` [PATCH 14/15] perf ui/gtk: " Namhyung Kim
@ 2012-09-13  7:20 ` Namhyung Kim
  2012-09-20 17:33 ` [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Arun Sharma
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Namhyung Kim @ 2012-09-13  7:20 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, LKML,
	Frederic Weisbecker, Arun Sharma, David Ahern, Jiri Olsa,
	Namhyung Kim

From: Namhyung Kim <namhyung.kim@lge.com>

When the --cumulate option is given, perf report will print the accumulated
hist stat information.  It requires callchain information.
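
Minimal usage would look something like this (./foo stands for any
workload recorded with callchains):

  $ perf record -g ./foo
  $ perf report --cumulate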

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-report.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 97b2e6300f4c..92da05c2151f 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -282,6 +282,12 @@ static int perf_report__setup_sample_type(struct perf_report *rep)
 		}
 	}
 
+	if (!symbol_conf.use_callchain && symbol_conf.cumulate_callchain) {
+		ui__error("Selected --cumulate but no callchain data. "
+			  "Did you call 'perf record' without -g?\n");
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -641,6 +647,8 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "use branch records for histogram filling", parse_branch_mode),
 	OPT_STRING(0, "objdump", &objdump_path, "path",
 		   "objdump binary to use for disassembly and annotations"),
+	OPT_BOOLEAN(0, "cumulate", &symbol_conf.cumulate_callchain,
+		    "Accumulate period along the callchain"),
 	OPT_END()
 	};
 
-- 
1.7.11.4



* Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods
  2012-09-13  7:19 [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Namhyung Kim
                   ` (14 preceding siblings ...)
  2012-09-13  7:20 ` [PATCH 15/15] perf report: Add --cumulate option Namhyung Kim
@ 2012-09-20 17:33 ` Arun Sharma
  2012-09-25  4:57 ` Namhyung Kim
  2012-10-29 19:08 ` Peter Zijlstra
  17 siblings, 0 replies; 30+ messages in thread
From: Arun Sharma @ 2012-09-20 17:33 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, LKML, Frederic Weisbecker, David Ahern, Jiri Olsa

On 9/13/12 12:19 AM, Namhyung Kim wrote:
> Hi,
>
> This is my first attempt to implement cumulative hist period report.
> This work begins from Arun's SORT_INCLUSIVE patch [1] but I completely
> rewrote it from scratch.

Tested-by: Arun Sharma <asharma@fb.com>

Our typical use case:

perf record -g fp ./foo
perf report --stdio --cumulate -g graph,100,callee

  -Arun


* Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods
  2012-09-13  7:19 [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Namhyung Kim
                   ` (15 preceding siblings ...)
  2012-09-20 17:33 ` [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Arun Sharma
@ 2012-09-25  4:57 ` Namhyung Kim
  2012-09-27 23:01   ` Frederic Weisbecker
  2012-10-29 19:08 ` Peter Zijlstra
  17 siblings, 1 reply; 30+ messages in thread
From: Namhyung Kim @ 2012-09-25  4:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Arun Sharma
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, LKML,
	Frederic Weisbecker, David Ahern, Jiri Olsa

Ping.  Any comments for this?

Arun, thanks for testing!
Namhyung


On Thu, 13 Sep 2012 16:19:56 +0900, Namhyung Kim wrote:
> [snip]
> You can also get this series on my tree at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git  perf/cumulate-v1
>
> Any comments are welcome, thanks.
> Namhyung
>
> [1] https://lkml.org/lkml/2012/3/31/6
> [2] https://lkml.org/lkml/2012/9/11/546
> [3] https://lkml.org/lkml/2012/9/12/51
>
>
> Namhyung Kim (15):
>   perf hists: Add missing period_* fields when collapsing a hist entry
>   perf hists: Introduce struct he_stat
>   perf hists: Move he->stat.nr_events initialization to a template
>   perf hists: Convert hist entry functions to use struct he_stat
>   perf hists: Add more helpers for hist entry stat
>   perf hists: Add support for accumulated stat of hist entry
>   perf hists: Check if accumulated when adding a hist entry
>   perf callchain: Add a couple of callchain helpers
>   perf hists: Let add_hist_entry to make a hist entry template
>   perf hists: Accumulate hist entry stat based on the callchain
>   perf hists: Sort hist entries by accumulated period
>   perf ui/hist: Add support to accumulated hist stat
>   perf ui/browser: Add support to accumulated hist stat
>   perf ui/gtk: Add support to accumulated hist stat
>   perf report: Add --cumulate option
>
>  tools/perf/builtin-report.c    |   8 ++
>  tools/perf/ui/browsers/hists.c |  12 +-
>  tools/perf/ui/gtk/browser.c    |   5 +-
>  tools/perf/ui/hist.c           |  74 ++++++++++---
>  tools/perf/ui/stdio/hist.c     |   2 +-
>  tools/perf/util/callchain.c    |  15 +++
>  tools/perf/util/callchain.h    |  17 +++
>  tools/perf/util/hist.c         | 242 +++++++++++++++++++++++++++++++++--------
>  tools/perf/util/sort.h         |  17 ++-
>  tools/perf/util/symbol.h       |   1 +
>  10 files changed, 318 insertions(+), 75 deletions(-)


* Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods
  2012-09-25  4:57 ` Namhyung Kim
@ 2012-09-27 23:01   ` Frederic Weisbecker
  2012-09-28  5:49     ` Namhyung Kim
  0 siblings, 1 reply; 30+ messages in thread
From: Frederic Weisbecker @ 2012-09-27 23:01 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Arun Sharma, Peter Zijlstra,
	Paul Mackerras, Ingo Molnar, LKML, David Ahern, Jiri Olsa

On Tue, Sep 25, 2012 at 01:57:26PM +0900, Namhyung Kim wrote:
> Ping.  Any comments for this?
> 
> Arun, thanks for testing!
> Namhyung

When Arun was working on this, I asked him to explore whether it could make sense to reuse
the "-b, --branch-stack" perf report option.  After all, this feature does about the same
thing as "-b", except that it uses callchains instead of full branch tracing.  But callchains
are branches, just a limited subset of all branches taken during execution.  So you can
probably reuse some of the interface and even some ground code there.

What do you think?
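
To make the comparison concrete, today the two flows look roughly like this
(just a sketch; how callchains would plug into the -b path is exactly the
open question):

  # branch stack sampling (existing)
  $ perf record -b ./foo
  $ perf report -b

  # accumulated callchains (this series)
  $ perf record -g ./foo
  $ perf report --cumulate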


* Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods
  2012-09-27 23:01   ` Frederic Weisbecker
@ 2012-09-28  5:49     ` Namhyung Kim
  2012-09-28  7:07       ` Stephane Eranian
  2012-09-28 15:27       ` Frederic Weisbecker
  0 siblings, 2 replies; 30+ messages in thread
From: Namhyung Kim @ 2012-09-28  5:49 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Arnaldo Carvalho de Melo, Arun Sharma, Peter Zijlstra,
	Paul Mackerras, Ingo Molnar, LKML, David Ahern, Jiri Olsa,
	Stephane Eranian

Hi Frederic,

On Fri, 28 Sep 2012 01:01:48 +0200, Frederic Weisbecker wrote:
> When Arun was working on this, I asked him to explore if it could make sense to reuse
> the "-b, --branch-stack"  perf report option. Because after all, this feature is doing
> about the same than "-b" except it's using callchains instead of full branch tracing.
> But callchains are branches. Just a limited subset of all branches taken on excecution.
> So you can probably reuse some interface and even ground code there.
>
> What do you think?

Umm.. first of all, I'm not familiar with the branch stack thing.  It's
Intel-specific, right?

Also I don't understand what exactly you want here.  What kind of
interface did you mean?  Can you elaborate on it a bit more?

And AFAIK the branch stack can collect much more branch information than
just callstacks.  Can we differentiate which is which easily?  Is there
any limitation on using it?  What if callstacks are not sync'ed with
branch stacks - is that even possible?

But I think it'd be good if the branch stack could be turned into a call
stack in general.  Did you mean this?

Thanks,
Namhyung


* Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods
  2012-09-28  5:49     ` Namhyung Kim
@ 2012-09-28  7:07       ` Stephane Eranian
  2012-09-28 15:14         ` Frederic Weisbecker
  2012-09-28 15:27       ` Frederic Weisbecker
  1 sibling, 1 reply; 30+ messages in thread
From: Stephane Eranian @ 2012-09-28  7:07 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Frederic Weisbecker, Arnaldo Carvalho de Melo, Arun Sharma,
	Peter Zijlstra, Paul Mackerras, Ingo Molnar, LKML, David Ahern,
	Jiri Olsa

On Fri, Sep 28, 2012 at 7:49 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> Hi Frederic,
>
> On Fri, 28 Sep 2012 01:01:48 +0200, Frederic Weisbecker wrote:
>> When Arun was working on this, I asked him to explore if it could make sense to reuse
>> the "-b, --branch-stack"  perf report option. Because after all, this feature is doing
>> about the same than "-b" except it's using callchains instead of full branch tracing.
>> But callchains are branches. Just a limited subset of all branches taken on excecution.
>> So you can probably reuse some interface and even ground code there.
>>
>> What do you think?
>
> Umm.. first of all, I'm not familiar with the branch stack thing.  It's
> intel-specific, right?
>
The kernel API is NOT specific to Intel.  It is abstracted to be portable
across architectures.  The implementation only exists on certain Intel
x86 processors.

> Also I don't understand what exactly you want here.  What kind of
> interface did you say?  Can you elaborate it bit more?
>
Not clear to me either.

> And AFAIK branch stack can collect much more branch information than
> just callstacks.  Can we differentiate which is which easily?  Is there
> any limitation on using it?  What if callstacks are not sync'ed with
> branch stacks - is it possible though?
>
First of all, branch stack is not a branch tracing mechanism; it is a
branch sampling mechanism.  Not all branches are captured.  Only the
last N consecutive branches leading to a PMU interrupt are captured
in each sample.

Yes, the branch stack mechanism as it exists on Intel processors
can capture more than call branches.  It is HW based and provides
a branch type filter.  The filtering capability is exposed at the API
level in a generic fashion.  The hw filter is based on opcodes.  The call
branch filter covers call and syscall instructions.  As such, the branch
stack mechanism cannot be used to capture callstacks into shared libraries,
simply because there is a non-call instruction in the trampoline.  To obtain
a better quality callstack you instead have to sample return branches.  So
yes, callstacks are not sync'ed with the branch stack even when it is
limited to call branches.
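
(For reference, that opcode-based filter is selected at record time; just a
sketch, and which modifiers are accepted depends on the CPU:)

  $ perf record -j any_call,u ./foo    # user level call branches only
  $ perf record -j any_ret,u ./foo     # user level returns only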



> But I think it'd be good if the branch stack can be changed to call
> stack in general.  Did you mean this?
>
That's not going to happen. The mechanism is much more generic than
that.

Quite frankly, I don't understand Frederic's motivation here.  The mechanisms
are not quite the same.


* Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods
  2012-09-28  7:07       ` Stephane Eranian
@ 2012-09-28 15:14         ` Frederic Weisbecker
  2012-09-28 16:36           ` Stephane Eranian
  0 siblings, 1 reply; 30+ messages in thread
From: Frederic Weisbecker @ 2012-09-28 15:14 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Arun Sharma,
	Peter Zijlstra, Paul Mackerras, Ingo Molnar, LKML, David Ahern,
	Jiri Olsa

On Fri, Sep 28, 2012 at 09:07:57AM +0200, Stephane Eranian wrote:
> On Fri, Sep 28, 2012 at 7:49 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> > Hi Frederic,
> >
> > On Fri, 28 Sep 2012 01:01:48 +0200, Frederic Weisbecker wrote:
> >> When Arun was working on this, I asked him to explore if it could make sense to reuse
> >> the "-b, --branch-stack"  perf report option. Because after all, this feature is doing
> >> about the same than "-b" except it's using callchains instead of full branch tracing.
> >> But callchains are branches. Just a limited subset of all branches taken on excecution.
> >> So you can probably reuse some interface and even ground code there.
> >>
> >> What do you think?
> >
> > Umm.. first of all, I'm not familiar with the branch stack thing.  It's
> > intel-specific, right?
> >
> The kernel API is NOT specific to Intel. It is abstracted to be portable
> across architecture. The implementation only exists on certain Intel
> X86 processors.
> 
> > Also I don't understand what exactly you want here.  What kind of
> > interface did you say?  Can you elaborate it bit more?
> >
> Not clear to me either.
> 
> > And AFAIK branch stack can collect much more branch information than
> > just callstacks.  Can we differentiate which is which easily?  Is there
> > any limitation on using it?  What if callstacks are not sync'ed with
> > branch stacks - is it possible though?
> >
> First of all branch stack is not a branch tracing mechanism. This is a
> branch sampling mechanism. Not all branches are captured. Only the
> last N consecutive branches leading to a PMU interrupt are captured
> in each sample.
> 
> Yes, the branch stack mechanism as it exists on Intel processors
> can capture more then call branches. It is HW based and provides
> a branch type filter. Filtering capability is exposed at the API level
> in a generic fashion. The hw filter is based on opcodes. Call branches
> all cover call, syscall instructions. As such, the branch stack mechanism
> cannot be used to capture callstacks to shared libraries, simply because
> there a a non call instruction in the trampoline. To obtain a better quality
> callstack you have instead to sample return branches. So yes, callstacks
> are not sync'ed with branch stack even if limited to call branches.
> 

You're right.  One doesn't simply sample callchains on top of branch tracing, not easily at least.
But that's not what we want here.  We want it the other way round: use callchains as branch sampling.
And a callchain _is_ branch sampling, just a specialized kind.

PERF_SAMPLE_BRANCH_STACK either records only calls, only returns, or everything, or....
You can define the filter with the "-j" option.  Now callchains can be considered the result
of a specific "-j" filter option.  It's just high level filtering, i.e. not just based on opcode
types but on semantic post-processing.  As if we applied a specific filter on a pure branch trace
that cancelled calls that had a matching ret.

But in the end, what we have is just branches.  Some branch layout that is biased, that already passed
through a semantic wheel, but still just _branches_.

Note I'm not arguing for adding a "-j callchain" option, just trying to show you that callchains
are not really different from any other filtered source of branch samples.


> > But I think it'd be good if the branch stack can be changed to call
> > stack in general.  Did you mean this?
> >
> That's not going to happen. The mechanism is much more generic than
> that.
> 
> Quite frankly, I don't understand Frederic's motivation here. The mechanism
> are not quite the same.

So, considering that callchains are just "branches", why can't we use them as
a branch source, just like PERF_SAMPLE_BRANCH_STACK data samples, so that they
can be reused in "perf report -b"?

Look at commit b50311dc2ac1c04ad19163c2359910b25e16caf6
"perf report: Add support for taken branch sampling".  It does (except for a few details
like the period weight of branch samples) the same as Namhyung's patch, just with
PERF_SAMPLE_BRANCH_STACK instead of callchains.

I don't understand what justifies this duplication.
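
In other words, a callchain could in principle be fed to that same code as a
list of from/to pairs.  A rough sketch (the field names follow the existing
struct perf_branch_entry; the conversion itself is hypothetical):

	struct perf_branch_entry br[PERF_MAX_STACK_DEPTH];
	u64 i;

	for (i = 0; i + 1 < chain->nr; i++) {
		br[i].from = chain->ips[i + 1];	/* caller */
		br[i].to   = chain->ips[i];	/* callee */
		/* no prediction info for these synthesized entries */
	}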


* Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods
  2012-09-28  5:49     ` Namhyung Kim
  2012-09-28  7:07       ` Stephane Eranian
@ 2012-09-28 15:27       ` Frederic Weisbecker
  1 sibling, 0 replies; 30+ messages in thread
From: Frederic Weisbecker @ 2012-09-28 15:27 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Arun Sharma, Peter Zijlstra,
	Paul Mackerras, Ingo Molnar, LKML, David Ahern, Jiri Olsa,
	Stephane Eranian

On Fri, Sep 28, 2012 at 02:49:55PM +0900, Namhyung Kim wrote:
> Hi Frederic,
> 
> On Fri, 28 Sep 2012 01:01:48 +0200, Frederic Weisbecker wrote:
> > When Arun was working on this, I asked him to explore if it could make sense to reuse
> > the "-b, --branch-stack"  perf report option. Because after all, this feature is doing
> > about the same than "-b" except it's using callchains instead of full branch tracing.
> > But callchains are branches. Just a limited subset of all branches taken on excecution.
> > So you can probably reuse some interface and even ground code there.
> >
> > What do you think?
> 
> Umm.. first of all, I'm not familiar with the branch stack thing.  It's
> intel-specific, right?
> 
> Also I don't understand what exactly you want here.  What kind of
> interface did you say?  Can you elaborate it bit more?

Look at commit b50311dc2ac1c04ad19163c2359910b25e16caf6
"perf report: Add support for taken branch sampling". It's doing almost
the same as you do, just using PERF_SAMPLE_BRANCH_STACK instead of
callchains.

> And AFAIK branch stack can collect much more branch information than
> just callstacks.

That's not a problem.  Callchains are just a high-level filtered source of
branch samples.  You don't need full branches to use "-b".  Just use whatever
flavour of branch samples makes sense for your branch sampling.

> Can we differentiate which is which easily?

Sure.  If you have both sources in your perf.data (PERF_SAMPLE_BRANCH_STACK and
callchains), ask the user which one he wants.  Otherwise default to what's there.

> Is there
> any limitation on using it?  What if callstacks are not sync'ed with
> branch stacks - is it possible though?

It's better to make both sources mutually exclusive.  Otherwise it's going
to be over-complicated.

> 
> But I think it'd be good if the branch stack can be changed to call
> stack in general.  Did you mean this?

That's a different thing.  We might be able to post-process branch tracing and
build a callchain on top of it (following calls and rets).  Maybe we will
one day.  But they are different issues altogether.

Thanks.


* Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods
  2012-09-28 15:14         ` Frederic Weisbecker
@ 2012-09-28 16:36           ` Stephane Eranian
  0 siblings, 0 replies; 30+ messages in thread
From: Stephane Eranian @ 2012-09-28 16:36 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Arun Sharma,
	Peter Zijlstra, Paul Mackerras, Ingo Molnar, LKML, David Ahern,
	Jiri Olsa

On Fri, Sep 28, 2012 at 5:14 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Fri, Sep 28, 2012 at 09:07:57AM +0200, Stephane Eranian wrote:
>> On Fri, Sep 28, 2012 at 7:49 AM, Namhyung Kim <namhyung@kernel.org> wrote:
>> > Hi Frederic,
>> >
>> > On Fri, 28 Sep 2012 01:01:48 +0200, Frederic Weisbecker wrote:
>> >> When Arun was working on this, I asked him to explore if it could make sense to reuse
>> >> the "-b, --branch-stack"  perf report option. Because after all, this feature is doing
>> >> about the same than "-b" except it's using callchains instead of full branch tracing.
>> >> But callchains are branches. Just a limited subset of all branches taken on excecution.
>> >> So you can probably reuse some interface and even ground code there.
>> >>
>> >> What do you think?
>> >
>> > Umm.. first of all, I'm not familiar with the branch stack thing.  It's
>> > intel-specific, right?
>> >
>> The kernel API is NOT specific to Intel. It is abstracted to be portable
>> across architecture. The implementation only exists on certain Intel
>> X86 processors.
>>
>> > Also I don't understand what exactly you want here.  What kind of
>> > interface did you say?  Can you elaborate it bit more?
>> >
>> Not clear to me either.
>>
>> > And AFAIK branch stack can collect much more branch information than
>> > just callstacks.  Can we differentiate which is which easily?  Is there
>> > any limitation on using it?  What if callstacks are not sync'ed with
>> > branch stacks - is it possible though?
>> >
>> First of all branch stack is not a branch tracing mechanism. This is a
>> branch sampling mechanism. Not all branches are captured. Only the
>> last N consecutive branches leading to a PMU interrupt are captured
>> in each sample.
>>
>> Yes, the branch stack mechanism as it exists on Intel processors
>> can capture more then call branches. It is HW based and provides
>> a branch type filter. Filtering capability is exposed at the API level
>> in a generic fashion. The hw filter is based on opcodes. Call branches
>> all cover call, syscall instructions. As such, the branch stack mechanism
>> cannot be used to capture callstacks to shared libraries, simply because
>> there a a non call instruction in the trampoline. To obtain a better quality
>> callstack you have instead to sample return branches. So yes, callstacks
>> are not sync'ed with branch stack even if limited to call branches.
>>
>
> You're right. One doesn't simply sample callchains on top of branch tracing. Not easily at least.
> But that's not what we want here. We want the other way round: use callchains as branch sampling.
> And a callchain _is_ a branch sampling. Just a specialized one.
>
> PERF_SAMPLE_BRANCH_STACK either records only calls, only ret, or everything, or....
> You can define the filter with "-j" option. Now callchains can be considered as the result
> of a specific "-j" filter option. It's just a high level filtering. ie: not just based on opcode
> types but on semantic post-processing. As if we applied a specific filter on a pure branch tracing
> that cancelled calls that had matching ret.
>
A callstack mode will be added to the PERF_SAMPLE_BRANCH_STACK generic
filter because this becomes available in HW starting with Haswell (see
Vol3b August 2012, section 17.8).  This will still be a statistical
approach and not a complete callstack trace (only the last 16 calls).

So yes, you could piggyback your callstack on top of that.  You could
return the full trace with the existing perf_branch_entry data structure.
You'd have to fill in the prediction flags as N/A.

But now with Haswell, one would have to decide whether to use the 'SW
callstack' or the 'HW callstack'.  It all depends on the quality of the
data returned by the HW callstack.


> But in the end, what we have is just branches. Some branch layout that is biased, that already passed
> through a semantic wheel, still it's just _branches_.
>
> Note I'm not arguing about adding a "-j callchain" option, just trying to show you that callchains
> are not really different from other filtered source of branch sampling.
>
>
>> > But I think it'd be good if the branch stack can be changed to call
>> > stack in general.  Did you mean this?
>> >
>> That's not going to happen. The mechanism is much more generic than
>> that.
>>
>> Quite frankly, I don't understand Frederic's motivation here. The mechanism
>> are not quite the same.
>
> So, considering that callchains are just "branches", why can't we use them as
> a branch source, just like PERF_SAMPLE_BRANCH_STACK data samples, that we
> can reuse in "perf report -b".
>
> Look at commit b50311dc2ac1c04ad19163c2359910b25e16caf6
> "perf report: Add support for taken branch sampling". It's doing (except for a few details
> like the period weight of branch samples) the same than in Namhyung patch, just with
> PERF_SAMPLE_BRANCH_STACK instead of callchains.
>
> I don't understand what justifies this duplication.


* Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods
  2012-09-13  7:19 [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Namhyung Kim
                   ` (16 preceding siblings ...)
  2012-09-25  4:57 ` Namhyung Kim
@ 2012-10-29 19:08 ` Peter Zijlstra
  2012-10-29 21:36   ` Arun Sharma
  17 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2012-10-29 19:08 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Paul Mackerras, Ingo Molnar, LKML,
	Frederic Weisbecker, Arun Sharma, David Ahern, Jiri Olsa

On Thu, 2012-09-13 at 16:19 +0900, Namhyung Kim wrote:
> When --cumulate option is given, it'll be shown like this:
> 
>    $ perf report --cumulate
>    (...)
>    +  93.63%  abc  libc-2.15.so        [.] __libc_start_main
>    +  93.35%  abc  abc                 [.] main
>    +  93.35%  abc  abc                 [.] c
>    +  93.35%  abc  abc                 [.] b
>    +  93.35%  abc  abc                 [.] a
>    +   5.17%  abc  ld-2.15.so          [.] _dl_map_object
>    +   5.17%  abc  ld-2.15.so          [.] _dl_map_object_from_fd
>    +   1.13%  abc  ld-2.15.so          [.] _dl_start_user
>    +   1.13%  abc  ld-2.15.so          [.] _dl_start
>    +   0.29%  abc  perf                [.] main
>    +   0.29%  abc  perf                [.] run_builtin
>    +   0.29%  abc  perf                [.] cmd_record
>    +   0.29%  abc  libpthread-2.15.so  [.] __libc_close
>    +   0.07%  abc  ld-2.15.so          [.] _start
>    +   0.07%  abc  [kernel.kallsyms]   [k] page_fault
>    
> (This output came from TUI since stdio bothered by callchains) 

Right, so I tried this and I would expect the callchains to be inverted
too, so that when I expand, say, 'c' I would see that 'c' calls 'b' for
100%, which calls 'a' for 100%.

Instead I get the regular callchains: expanding 'c' gives me main calling
it for 100%.

Adding -G (invert callchains) doesn't make it better; in that case, when
I expand 'c' we start at '__libc_start_main' instead of 'c'.

Is there anything I'm missing?
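
Something along these lines when expanding 'c' is what I'd expect (a mockup,
not actual output):

  -  93.35%  abc  abc                 [.] c
     - c
        - 100.00% b
             a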


* Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods
  2012-10-29 19:08 ` Peter Zijlstra
@ 2012-10-29 21:36   ` Arun Sharma
  2012-10-30  6:59     ` Namhyung Kim
  0 siblings, 1 reply; 30+ messages in thread
From: Arun Sharma @ 2012-10-29 21:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Paul Mackerras,
	Ingo Molnar, LKML, Frederic Weisbecker, David Ahern, Jiri Olsa

On 10/29/12 12:08 PM, Peter Zijlstra wrote:

> Right, so I tried this and I would expect the callchains to be inverted
> too, so that when I expand say 'c' I would see that 'c' calls 'b' for
> 100% which calls 'a' for 100%.
>
> Instead I get the regular callchains, expanding 'c' gives me main calls
> it for 100%.
>
> Adding -G (invert callchains) doesn't make it better, in that case, when
> I expand 'c' we start at '__libc_start_main' instead of 'c'.
>
> Is there anything I'm missing?
>

Sounds like a reasonable expectation.

I tested mainly:

perf report --cumulate  -g graph,100,callee

to find the functions with a large amount of CPU time underneath, then
examined the callgraph without --cumulate.  But yeah - it'd be nice to be
able to do both in a single invocation.

Also, when callgraphs are displayed, the percentages are off (> 100%). 
Namhyung probably needs to use he->stat_acc->period in a few places as 
the denominator instead of he->period.
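
Something like this wherever the callchain percentages are computed (just a
sketch; I haven't checked the exact spot):

	u64 total = he->stat.period;

	if (symbol_conf.cumulate_callchain)
		total = he->stat_acc->period;

	/* then use 'total' as the denominator for callchain percentages */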

  -Arun


* Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods
  2012-10-29 21:36   ` Arun Sharma
@ 2012-10-30  6:59     ` Namhyung Kim
  2012-10-30  8:17       ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Namhyung Kim @ 2012-10-30  6:59 UTC (permalink / raw)
  To: Arun Sharma
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Paul Mackerras,
	Ingo Molnar, LKML, Frederic Weisbecker, David Ahern, Jiri Olsa,
	Stephane Eranian

Hi Arun and Peter,

On Mon, 29 Oct 2012 14:36:01 -0700, Arun Sharma wrote:
> On 10/29/12 12:08 PM, Peter Zijlstra wrote:
>
>> Right, so I tried this and I would expect the callchains to be inverted
>> too, so that when I expand say 'c' I would see that 'c' calls 'b' for
>> 100% which calls 'a' for 100%.
>>
>> Instead I get the regular callchains, expanding 'c' gives me main calls
>> it for 100%.
>>
>> Adding -G (invert callchains) doesn't make it better, in that case, when
>> I expand 'c' we start at '__libc_start_main' instead of 'c'.
>>
>> Is there anything I'm missing?
>>
>
> Sounds like a reasonable expectation.
>
> I tested mainly:
>
> perf report --cumulate  -g graph,100,callee
>
> to find the functions with a large amount of CPU time underneath. Then
> examined the callgraph without --cumulate. But yeah - it'd be nice to
> be able to do both in a single invocation.

Yes, the callchain part needs to be improved.  Peter's idea indeed looks
good to me too.

But before doing that, I'd like to reach agreement on how to
design/implement this feature.

Sorry to Frederic (and Stephane), I'm bothering you multiple times with
this, but I didn't get exactly what you want.  IIUC you don't want a
separate --cumulate option, but rather want to share the branch sampling
code to implement it, right?

But the branch sampling output doesn't look like a good fit for the
--cumulate usage IMHO.  Could you give me some advice?

>
> Also, when callgraphs are displayed, the percentages are off (>
> 100%). Namhyung probably needs to use he->stat_acc->period in a few
> places as the denominator instead of he->period.

I will look into it later.

Thanks,
Namhyung


* Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods
  2012-10-30  6:59     ` Namhyung Kim
@ 2012-10-30  8:17       ` Peter Zijlstra
  2012-10-30  9:01         ` Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2012-10-30  8:17 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arun Sharma, Arnaldo Carvalho de Melo, Paul Mackerras,
	Ingo Molnar, LKML, Frederic Weisbecker, David Ahern, Jiri Olsa,
	Stephane Eranian

On Tue, 2012-10-30 at 15:59 +0900, Namhyung Kim wrote:
> Yes, the callchain part needs to be improved.  Peter's idea indeed looks
> good to me too. 

FWIW, I think this is exactly what sysprof does, except that tool isn't
usable for other reasons.. You might want to look at it though.


* Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods
  2012-10-30  8:17       ` Peter Zijlstra
@ 2012-10-30  9:01         ` Ingo Molnar
  2012-10-31  7:24           ` Namhyung Kim
  0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2012-10-30  9:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Namhyung Kim, Arun Sharma, Arnaldo Carvalho de Melo,
	Paul Mackerras, LKML, Frederic Weisbecker, David Ahern,
	Jiri Olsa, Stephane Eranian


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Tue, 2012-10-30 at 15:59 +0900, Namhyung Kim wrote:

> > Yes, the callchain part needs to be improved.  Peter's idea 
> > indeed looks good to me too.
> 
> FWIW, I think this is exactly what sysprof does, except that 
> tool isn't usable for other reasons.. You might want to look 
> at it though.

I always found the fundamental sysprof system-wide call graph
profiling output/view superior - and so do many Xorg developers
using SysProf that I have talked to - so I'd strongly
encourage using that ordering and grouping for the default perf
call-graph profiling output/view.

Thanks,

	Ingo


* Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods
  2012-10-30  9:01         ` Ingo Molnar
@ 2012-10-31  7:24           ` Namhyung Kim
  0 siblings, 0 replies; 30+ messages in thread
From: Namhyung Kim @ 2012-10-31  7:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Arun Sharma, Arnaldo Carvalho de Melo,
	Paul Mackerras, LKML, Frederic Weisbecker, David Ahern,
	Jiri Olsa, Stephane Eranian

On Tue, 30 Oct 2012 10:01:10 +0100, Ingo Molnar wrote:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
>> On Tue, 2012-10-30 at 15:59 +0900, Namhyung Kim wrote:
>
>> > Yes, the callchain part needs to be improved.  Peter's idea 
>> > indeed looks good to me too.
>> 
>> FWIW, I think this is exactly what sysprof does, except that 
>> tool isn't usable for other reasons.. You might want to look 
>> at it though.
>
> I always found the fundamental sysprof system-wide call graph 
> profiling output/view superior - and so do many Xorg developers 
> who are using SysProf that I talked to - so I'd strongly 
> encourage to use that ordering and grouping for the default perf 
> call-graph profiling output/view.

Okay, I'll look at sysprof.

Anyway, do you have any other comments on the general --cumulate
approach in this series (esp. regarding --branch-stack)?

Thanks,
Namhyung


end of thread

Thread overview: 30+ messages
2012-09-13  7:19 [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Namhyung Kim
2012-09-13  7:19 ` [PATCH 01/15] perf hists: Add missing period_* fields when collapsing a hist entry Namhyung Kim
2012-09-13  7:19 ` [PATCH 02/15] perf hists: Introduce struct he_stat Namhyung Kim
2012-09-13  7:19 ` [PATCH 03/15] perf hists: Move he->stat.nr_events initialization to a template Namhyung Kim
2012-09-13  7:20 ` [PATCH 04/15] perf hists: Convert hist entry functions to use struct he_stat Namhyung Kim
2012-09-13  7:20 ` [PATCH 05/15] perf hists: Add more helpers for hist entry stat Namhyung Kim
2012-09-13  7:20 ` [PATCH 06/15] perf hists: Add support for accumulated stat of hist entry Namhyung Kim
2012-09-13  7:20 ` [PATCH 07/15] perf hists: Check if accumulated when adding a " Namhyung Kim
2012-09-13  7:20 ` [PATCH 08/15] perf callchain: Add a couple of callchain helpers Namhyung Kim
2012-09-13  7:20 ` [PATCH 09/15] perf hists: Let add_hist_entry to make a hist entry template Namhyung Kim
2012-09-13  7:20 ` [PATCH 10/15] perf hists: Accumulate hist entry stat based on the callchain Namhyung Kim
2012-09-13  7:20 ` [PATCH 11/15] perf hists: Sort hist entries by accumulated period Namhyung Kim
2012-09-13  7:20 ` [PATCH 12/15] perf ui/hist: Add support to accumulated hist stat Namhyung Kim
2012-09-13  7:20 ` [PATCH 13/15] perf ui/browser: " Namhyung Kim
2012-09-13  7:20 ` [PATCH 14/15] perf ui/gtk: " Namhyung Kim
2012-09-13  7:20 ` [PATCH 15/15] perf report: Add --cumulate option Namhyung Kim
2012-09-20 17:33 ` [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods Arun Sharma
2012-09-25  4:57 ` Namhyung Kim
2012-09-27 23:01   ` Frederic Weisbecker
2012-09-28  5:49     ` Namhyung Kim
2012-09-28  7:07       ` Stephane Eranian
2012-09-28 15:14         ` Frederic Weisbecker
2012-09-28 16:36           ` Stephane Eranian
2012-09-28 15:27       ` Frederic Weisbecker
2012-10-29 19:08 ` Peter Zijlstra
2012-10-29 21:36   ` Arun Sharma
2012-10-30  6:59     ` Namhyung Kim
2012-10-30  8:17       ` Peter Zijlstra
2012-10-30  9:01         ` Ingo Molnar
2012-10-31  7:24           ` Namhyung Kim
