All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] perf: add sort by inclusive time functionality (v2)
@ 2012-03-07 22:41 Arun Sharma
  2012-03-07 22:41 ` [PATCH 1/2] perf: refactor add_hist_entry() code paths Arun Sharma
                   ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: Arun Sharma @ 2012-03-07 22:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Arun Sharma, Ingo Molnar, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Mike Galbraith, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian, Namhyung Kim, Tom Zanussi,
	linux-perf-users

This patch series refactors existing code a bit and adds sort by 
inclusive time (time spent in the function + callees).

Sample command lines:

# perf record -ag -- sleep 1
# perf report -g graph,0.5,callee -n -s inclusive

Known bugs:

total_period computation is broken for order=callee

Signed-off-by: Arun Sharma <asharma@fb.com>
Cc: Ingo Molnar <mingo@elte.hu>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/2] perf: refactor add_hist_entry() code paths
  2012-03-07 22:41 [PATCH 0/2] perf: add sort by inclusive time functionality (v2) Arun Sharma
@ 2012-03-07 22:41 ` Arun Sharma
  2012-03-07 22:41 ` [PATCH 2/2] perf: Add a new sort order: SORT_INCLUSIVE Arun Sharma
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 30+ messages in thread
From: Arun Sharma @ 2012-03-07 22:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Arun Sharma, Ingo Molnar, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Mike Galbraith, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian, Namhyung Kim, Tom Zanussi,
	linux-perf-users

This makes perf__evsel_add_hist_entry() less busy in preparation
for support for sorting by inclusive times.

Signed-off-by: Arun Sharma <asharma@fb.com>
Cc: Ingo Molnar <mingo@elte.hu>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
---
 builtin-report.c |   12 ++++--------
 util/hist.c      |   20 ++++++++++++++++++++
 util/hist.h      |    5 +++++
 3 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/builtin-report.c b/builtin-report.c
index 25d34d4..72490a4 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -61,6 +61,7 @@ static int perf_evsel__add_hist_entry(struct perf_evsel *evsel,
 	struct symbol *parent = NULL;
 	int err = 0;
 	struct hist_entry *he;
+	struct callchain_cursor *cursor;
 
 	if ((sort__has_parent || symbol_conf.use_callchain) && sample->callchain) {
 		err = machine__resolve_callchain(machine, evsel, al->thread,
@@ -69,17 +70,12 @@ static int perf_evsel__add_hist_entry(struct perf_evsel *evsel,
 			return err;
 	}
 
-	he = __hists__add_entry(&evsel->hists, al, parent, sample->period);
+	cursor = &evsel->hists.callchain_cursor;
+	he = __hists__add_entry_single(&evsel->hists, al, parent,
+				       cursor, sample->period);
 	if (he == NULL)
 		return -ENOMEM;
 
-	if (symbol_conf.use_callchain) {
-		err = callchain_append(he->callchain,
-				       &evsel->hists.callchain_cursor,
-				       sample->period);
-		if (err)
-			return err;
-	}
 	/*
 	 * Only in the newt browser we are doing integrated annotation,
 	 * so we don't allocated the extra space needed because the stdio
diff --git a/util/hist.c b/util/hist.c
index 6f505d1..ca2314a 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -252,6 +252,26 @@ out_unlock:
 	return he;
 }
 
+struct hist_entry *__hists__add_entry_single(struct hists *hists,
+					     struct addr_location *al,
+					     struct symbol *sym_parent,
+					     struct callchain_cursor *cursor,
+					     u64 period)
+{
+	struct hist_entry *he;
+	int err;
+
+	he = __hists__add_entry(hists, al, sym_parent, period);
+	if (he == NULL)
+		return NULL;
+	if (symbol_conf.use_callchain) {
+		err = callchain_append(he->callchain, cursor, period);
+		if (err)
+			return NULL;
+	}
+	return he;
+}
+
 int64_t
 hist_entry__cmp(struct hist_entry *left, struct hist_entry *right)
 {
diff --git a/util/hist.h b/util/hist.h
index 48e5acd..3ed726b 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -68,6 +68,11 @@ struct hists {
 struct hist_entry *__hists__add_entry(struct hists *self,
 				      struct addr_location *al,
 				      struct symbol *parent, u64 period);
+struct hist_entry *__hists__add_entry_single(struct hists *self,
+				      struct addr_location *al,
+				      struct symbol *parent,
+				      struct callchain_cursor *cursor,
+				      u64 period);
 int64_t hist_entry__cmp(struct hist_entry *left, struct hist_entry *right);
 int64_t hist_entry__collapse(struct hist_entry *left, struct hist_entry *right);
 int hist_entry__snprintf(struct hist_entry *self, char *bf, size_t size,
-- 
1.7.8.rc1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/2] perf: Add a new sort order: SORT_INCLUSIVE
  2012-03-07 22:41 [PATCH 0/2] perf: add sort by inclusive time functionality (v2) Arun Sharma
  2012-03-07 22:41 ` [PATCH 1/2] perf: refactor add_hist_entry() code paths Arun Sharma
@ 2012-03-07 22:41 ` Arun Sharma
  2012-03-07 23:13   ` Arun Sharma
                     ` (2 more replies)
  2012-03-08  7:29 ` [PATCH 0/2] perf: add sort by inclusive time functionality (v2) Ingo Molnar
  2012-03-12  7:43 ` Namhyung Kim
  3 siblings, 3 replies; 30+ messages in thread
From: Arun Sharma @ 2012-03-07 22:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Arun Sharma, Ingo Molnar, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Mike Galbraith, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian, Namhyung Kim, Tom Zanussi,
	linux-perf-users

Each entry that used to get added once to the histogram, now is added
chain->nr times, each time with one less entry in the
callchain.

This will result in a non-leaf function that appears in a lot of
samples to get a histogram entry with lots of hits.

The user can then drill down into the callchains of functions that
have high inclusive times.

Signed-off-by: Arun Sharma <asharma@fb.com>
Cc: Ingo Molnar <mingo@elte.hu>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
---
 builtin-report.c |    9 +++++++--
 util/callchain.c |   14 ++++++++++++++
 util/callchain.h |    4 ++++
 util/hist.c      |   44 +++++++++++++++++++++++++++++++++++++++++++-
 util/hist.h      |    5 +++++
 util/sort.c      |    3 +++
 util/sort.h      |    2 ++
 7 files changed, 78 insertions(+), 3 deletions(-)

diff --git a/builtin-report.c b/builtin-report.c
index 72490a4..87318c8 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -71,8 +71,12 @@ static int perf_evsel__add_hist_entry(struct perf_evsel *evsel,
 	}
 
 	cursor = &evsel->hists.callchain_cursor;
-	he = __hists__add_entry_single(&evsel->hists, al, parent,
-				       cursor, sample->period);
+	if (sort__first_dimension == SORT_INCLUSIVE)
+		he = __hists__add_entry_inclusive(&evsel->hists, al, parent,
+						  cursor, sample->period);
+	else
+		he = __hists__add_entry_single(&evsel->hists, al, parent,
+					       cursor, sample->period);
 	if (he == NULL)
 		return -ENOMEM;
 
@@ -591,6 +595,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __used)
 	sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "dso", stdout);
 	sort_entry__setup_elide(&sort_comm, symbol_conf.comm_list, "comm", stdout);
 	sort_entry__setup_elide(&sort_sym, symbol_conf.sym_list, "symbol", stdout);
+	sort_entry__setup_elide(&sort_sym, symbol_conf.sym_list, "inclusive", stdout);
 
 	return __cmd_report(&report);
 }
diff --git a/util/callchain.c b/util/callchain.c
index 9f7106a..aa4acde 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -459,3 +459,17 @@ int callchain_cursor_append(struct callchain_cursor *cursor,
 
 	return 0;
 }
+
+int callchain_get(struct callchain_cursor *cursor,
+		  struct addr_location *al)
+{
+	struct callchain_cursor_node *node = cursor->curr;
+
+	if (node == NULL) return -1;
+
+	al->map = node->map;
+	al->sym = node->sym;
+	al->addr = node->ip;
+
+	return 0;
+}
diff --git a/util/callchain.h b/util/callchain.h
index 7f9c0f1..dcff6ec 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -103,9 +103,13 @@ int callchain_merge(struct callchain_cursor *cursor,
 
 struct ip_callchain;
 union perf_event;
+struct addr_location;
 
 bool ip_callchain__valid(struct ip_callchain *chain,
 			 const union perf_event *event);
+
+int callchain_get(struct callchain_cursor *cursor, struct addr_location *al);
+
 /*
  * Initialize a cursor before adding entries inside, but keep
  * the previously allocated entries as a cache.
diff --git a/util/hist.c b/util/hist.c
index ca2314a..9741d48 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -174,6 +174,7 @@ static struct hist_entry *hist_entry__new(struct hist_entry *template)
 			he->ms.map->referenced = true;
 		if (symbol_conf.use_callchain)
 			callchain_init(he->callchain);
+		he->inclusive = false;
 	}
 
 	return he;
@@ -184,7 +185,8 @@ static void hists__inc_nr_entries(struct hists *hists, struct hist_entry *h)
 	if (!h->filtered) {
 		hists__calc_col_len(hists, h);
 		++hists->nr_entries;
-		hists->stats.total_period += h->period;
+		if (!h->inclusive)
+			hists->stats.total_period += h->period;
 	}
 }
 
@@ -252,6 +254,46 @@ out_unlock:
 	return he;
 }
 
+struct hist_entry *__hists__add_entry_inclusive(struct hists *hists,
+						struct addr_location *al,
+						struct symbol *sym_parent,
+						struct callchain_cursor *cursor,
+						u64 period)
+{
+	struct callchain_cursor iter = *cursor;
+	struct callchain_cursor new_cursor = *cursor;
+	struct hist_entry *he, *orig_he = NULL;
+	int err;
+	u64 i;
+
+	iter.pos = 0;
+	iter.curr = iter.first;
+	for (i = 0; i < cursor->nr; i++) {
+		struct addr_location al_child = *al;
+
+		err = callchain_get(&iter, &al_child);
+		if (err)
+			return NULL;
+		he = __hists__add_entry(hists, &al_child, sym_parent, period);
+		if (he == NULL)
+			return NULL;
+
+		new_cursor.first = iter.curr;
+		new_cursor.nr = cursor->nr - i;
+		if (i != 0)
+			he->inclusive = 1;
+		else
+			orig_he = he;
+		err = callchain_append(he->callchain,
+				       &new_cursor,
+				       period);
+		if (err)
+			return NULL;
+		callchain_cursor_advance(&iter);
+	}
+	return orig_he;
+}
+
 struct hist_entry *__hists__add_entry_single(struct hists *hists,
 					     struct addr_location *al,
 					     struct symbol *sym_parent,
diff --git a/util/hist.h b/util/hist.h
index 3ed726b..8b8b096 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -73,6 +73,11 @@ struct hist_entry *__hists__add_entry_single(struct hists *self,
 				      struct symbol *parent,
 				      struct callchain_cursor *cursor,
 				      u64 period);
+struct hist_entry *__hists__add_entry_inclusive(struct hists *self,
+				      struct addr_location *al,
+				      struct symbol *parent,
+				      struct callchain_cursor *cursor,
+				      u64 period);
 int64_t hist_entry__cmp(struct hist_entry *left, struct hist_entry *right);
 int64_t hist_entry__collapse(struct hist_entry *left, struct hist_entry *right);
 int hist_entry__snprintf(struct hist_entry *self, char *bf, size_t size,
diff --git a/util/sort.c b/util/sort.c
index 16da30d..1440ad4 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -259,6 +259,7 @@ static struct sort_dimension sort_dimensions[] = {
 	{ .name = "symbol",	.entry = &sort_sym,	},
 	{ .name = "parent",	.entry = &sort_parent,	},
 	{ .name = "cpu",	.entry = &sort_cpu,	},
+	{ .name = "inclusive",	.entry = &sort_sym, 	},
 };
 
 int sort_dimension__add(const char *tok)
@@ -298,6 +299,8 @@ int sort_dimension__add(const char *tok)
 				sort__first_dimension = SORT_DSO;
 			else if (!strcmp(sd->name, "symbol"))
 				sort__first_dimension = SORT_SYM;
+			else if (!strcmp(sd->name, "inclusive"))
+				sort__first_dimension = SORT_INCLUSIVE;
 			else if (!strcmp(sd->name, "parent"))
 				sort__first_dimension = SORT_PARENT;
 			else if (!strcmp(sd->name, "cpu"))
diff --git a/util/sort.h b/util/sort.h
index 3f67ae3..2d35c11 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -65,6 +65,7 @@ struct hist_entry {
 	bool			init_have_children;
 	char			level;
 	bool			used;
+	bool			inclusive;
 	u8			filtered;
 	struct symbol		*parent;
 	union {
@@ -82,6 +83,7 @@ enum sort_type {
 	SORT_SYM,
 	SORT_PARENT,
 	SORT_CPU,
+	SORT_INCLUSIVE,
 };
 
 /*
-- 
1.7.8.rc1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] perf: Add a new sort order: SORT_INCLUSIVE
  2012-03-07 22:41 ` [PATCH 2/2] perf: Add a new sort order: SORT_INCLUSIVE Arun Sharma
@ 2012-03-07 23:13   ` Arun Sharma
  2012-03-08  7:22   ` Ingo Molnar
  2012-03-08 15:39   ` Frederic Weisbecker
  2 siblings, 0 replies; 30+ messages in thread
From: Arun Sharma @ 2012-03-07 23:13 UTC (permalink / raw)
  To: Arun Sharma
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Mike Galbraith, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian, Namhyung Kim, Tom Zanussi,
	linux-perf-users


Further testing showed one more bad memory reference when using
sort inclusive. Fixed by the patch below.

commit 799f80b6e8168d4813de094f2327367d6d226a8c
Author: Arun Sharma <asharma@fb.com>
Date:   Wed Mar 7 15:05:00 2012 -0800

    perf: fix some bad memory references in tui
    
    Check for he->ms.sym since that's what we're dereferencing
    
    Signed-off-by: Arun Sharma <asharma@fb.com>

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index b9d0fb2..d4e0ec5 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -86,7 +86,7 @@ static int perf_evsel__add_hist_entry(struct perf_evsel *evsel,
 	 * so we don't allocated the extra space needed because the stdio
 	 * code will not use it.
 	 */
-	if (al->sym != NULL && use_browser > 0) {
+	if (he->ms.sym != NULL && use_browser > 0) {
 		struct annotation *notes = symbol__annotation(he->ms.sym);
 
 		assert(evsel != NULL);

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] perf: Add a new sort order: SORT_INCLUSIVE
  2012-03-07 22:41 ` [PATCH 2/2] perf: Add a new sort order: SORT_INCLUSIVE Arun Sharma
  2012-03-07 23:13   ` Arun Sharma
@ 2012-03-08  7:22   ` Ingo Molnar
  2012-03-12 19:35       ` Arun Sharma
  2012-03-08 15:39   ` Frederic Weisbecker
  2 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2012-03-08  7:22 UTC (permalink / raw)
  To: Arun Sharma
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Frederic Weisbecker,
	Mike Galbraith, Paul Mackerras, Peter Zijlstra, Stephane Eranian,
	Namhyung Kim, Tom Zanussi, linux-perf-users


* Arun Sharma <asharma@fb.com> wrote:

> @@ -71,8 +71,12 @@ static int perf_evsel__add_hist_entry(struct perf_evsel *evsel,
>  	}
>  
>  	cursor = &evsel->hists.callchain_cursor;
> -	he = __hists__add_entry_single(&evsel->hists, al, parent,
> -				       cursor, sample->period);
> +	if (sort__first_dimension == SORT_INCLUSIVE)
> +		he = __hists__add_entry_inclusive(&evsel->hists, al, parent,
> +						  cursor, sample->period);
> +	else
> +		he = __hists__add_entry_single(&evsel->hists, al, parent,
> +					       cursor, sample->period);

If sort__first_dimension is available to hist.c then I think 
there should still be a simple __hists__add_entry() function in 
hist.c, which perf_evsel__add_hist_entry() calls - which then 
calls the static inline __hists__add_entry_inclusive() and 
__hists__add_entry_single() functions within hist.c.

I.e. this refactoring and splitup of the function into two parts 
is not a detail that should matter to builtin-report.c's 
perf_evsel__add_hist_entry().

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] perf: add sort by inclusive time functionality (v2)
  2012-03-07 22:41 [PATCH 0/2] perf: add sort by inclusive time functionality (v2) Arun Sharma
  2012-03-07 22:41 ` [PATCH 1/2] perf: refactor add_hist_entry() code paths Arun Sharma
  2012-03-07 22:41 ` [PATCH 2/2] perf: Add a new sort order: SORT_INCLUSIVE Arun Sharma
@ 2012-03-08  7:29 ` Ingo Molnar
  2012-03-08 15:31   ` Frederic Weisbecker
  2012-03-12  7:43 ` Namhyung Kim
  3 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2012-03-08  7:29 UTC (permalink / raw)
  To: Arun Sharma
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Frederic Weisbecker,
	Mike Galbraith, Paul Mackerras, Peter Zijlstra, Stephane Eranian,
	Namhyung Kim, Tom Zanussi, linux-perf-users


* Arun Sharma <asharma@fb.com> wrote:

> This patch series refactors existing code a bit and adds sort by 
> inclusive time (time spent in the function + callees).
> 
> Sample command lines:
> 
> # perf record -ag -- sleep 1
> # perf report -g graph,0.5,callee -n -s inclusive

So I tried this out with:

  $ taskset 1 perf record -g git gc

and got entries above 100% (in the TUI):

  $ perf report -g graph,0.5,callee -n -s inclusive

 +  321.11%        5628  [.] 0x392b609269
 +  142.27%        3774  [.] create_delta
 +  78.86%        1248  [.] lookup_object
 +  40.54%        1348  [k] system_call_fastpath
 [...]

Is that expected?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] perf: add sort by inclusive time functionality (v2)
  2012-03-08  7:29 ` [PATCH 0/2] perf: add sort by inclusive time functionality (v2) Ingo Molnar
@ 2012-03-08 15:31   ` Frederic Weisbecker
  2012-03-08 18:49       ` Arun Sharma
  0 siblings, 1 reply; 30+ messages in thread
From: Frederic Weisbecker @ 2012-03-08 15:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arun Sharma, linux-kernel, Arnaldo Carvalho de Melo,
	Mike Galbraith, Paul Mackerras, Peter Zijlstra, Stephane Eranian,
	Namhyung Kim, Tom Zanussi, linux-perf-users

On Thu, Mar 08, 2012 at 08:29:01AM +0100, Ingo Molnar wrote:
> 
> * Arun Sharma <asharma@fb.com> wrote:
> 
> > This patch series refactors existing code a bit and adds sort by 
> > inclusive time (time spent in the function + callees).
> > 
> > Sample command lines:
> > 
> > # perf record -ag -- sleep 1
> > # perf report -g graph,0.5,callee -n -s inclusive
> 
> So I tried this out with:
> 
>   $ taskset 1 perf record -g git gc
> 
> and got entries above 100% (in the TUI):
> 
>   $ perf report -g graph,0.5,callee -n -s inclusive
> 
>  +  321.11%        5628  [.] 0x392b609269
>  +  142.27%        3774  [.] create_delta
>  +  78.86%        1248  [.] lookup_object
>  +  40.54%        1348  [k] system_call_fastpath
>  [...]
> 
> Is that expected?

I think this happens because of this:

-               hists->stats.total_period += h->period;                                                                                
+               if (!h->inclusive)                                                                                                     
+                       hists->stats.total_period += h->period;

Which I'm not sure why it is needed btw.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] perf: Add a new sort order: SORT_INCLUSIVE
  2012-03-07 22:41 ` [PATCH 2/2] perf: Add a new sort order: SORT_INCLUSIVE Arun Sharma
  2012-03-07 23:13   ` Arun Sharma
  2012-03-08  7:22   ` Ingo Molnar
@ 2012-03-08 15:39   ` Frederic Weisbecker
  2012-03-08 18:22     ` Arun Sharma
  2 siblings, 1 reply; 30+ messages in thread
From: Frederic Weisbecker @ 2012-03-08 15:39 UTC (permalink / raw)
  To: Arun Sharma
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mike Galbraith, Paul Mackerras, Peter Zijlstra, Stephane Eranian,
	Namhyung Kim, Tom Zanussi, linux-perf-users

On Wed, Mar 07, 2012 at 02:41:19PM -0800, Arun Sharma wrote:
> Each entry that used to get added once to the histogram, now is added
> chain->nr times, each time with one less entry in the
> callchain.
> 
> This will result in a non-leaf function that appears in a lot of
> samples to get a histogram entry with lots of hits.
> 
> The user can then drill down into the callchains of functions that
> have high inclusive times.

I don't yet understand the point of this.

Imagine those three hists:

a -> b -> c
a -> b -> d
a-> e -> f

The fractal inverted mode (-G) will report this:

a--
  |
  ----- b
  |     |
  |     -----c
  |     |
  |     -----d
  |
  ----- e
        |
        -----f

The branch sorting is recursive so outstanding callers appear
in the first branches already.

So if your goal is to find important callers, I don't see why
the existing functionalities aren't enough.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] perf: Add a new sort order: SORT_INCLUSIVE
  2012-03-08 15:39   ` Frederic Weisbecker
@ 2012-03-08 18:22     ` Arun Sharma
  2012-03-13 13:44       ` Frederic Weisbecker
  0 siblings, 1 reply; 30+ messages in thread
From: Arun Sharma @ 2012-03-08 18:22 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Arun Sharma, linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mike Galbraith, Paul Mackerras, Peter Zijlstra, Stephane Eranian,
	Namhyung Kim, Tom Zanussi, linux-perf-users

On Thu, Mar 08, 2012 at 04:39:36PM +0100, Frederic Weisbecker wrote:
> 
> I don't yet understand the point of this.
> 
> Imagine those three hists:
> 
> a -> b -> c
> a -> b -> d
> a-> e -> f
> 
> The fractal inverted mode (-G) will report this:
> 
> a--
>   |
>   ----- b
>   |     |
>   |     -----c
>   |     |
>   |     -----d
>   |
>   ----- e
>         |
>         -----f
> 

Please see the test program attached. It has only two paths from main()
to c(). But for this discussion, imagine a callgraph with 10 different
paths.

With -G, c() appears 10 times in the callgraph and the user is required
to manually sum up the samples to realize that the callgraph under c()
is very expensive.

With -s inclusive, c() will show up at the very top after main().

 -Arun

#include <stdio.h>

int sum = 0;

#define LOOP(n)                                         \
        {                                               \
                int j;                                  \
                for (j = 0; j < 10000; j++) {           \
                        sum += j;                       \
                }                                       \
        }

int f()
{
        LOOP(100);
}

int d()
{
        LOOP(100);
        f();
}

int e()
{
	LOOP(100);
	f();
}

int c()
{
	LOOP(100);
	d();
	LOOP(100);
	e();
}

int b() 
{
	LOOP(70);
	c();
}

int a() 
{
	LOOP(30);
	c();
}

int main()
{
	int i;
	for (i = 0; i < 10000; i++) {
		a();
		b();
	}
}


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] perf: add sort by inclusive time functionality (v2)
  2012-03-08 15:31   ` Frederic Weisbecker
@ 2012-03-08 18:49       ` Arun Sharma
  0 siblings, 0 replies; 30+ messages in thread
From: Arun Sharma @ 2012-03-08 18:49 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, linux-kernel, Arnaldo Carvalho de Melo,
	Mike Galbraith, Paul Mackerras, Peter Zijlstra, Stephane Eranian,
	Namhyung Kim, Tom Zanussi, linux-perf-users

On 3/8/12 7:31 AM, Frederic Weisbecker wrote:
> On Thu, Mar 08, 2012 at 08:29:01AM +0100, Ingo Molnar wrote:
>>
>> * Arun Sharma<asharma@fb.com>  wrote:
>>
>>> This patch series refactors existing code a bit and adds sort by
>>> inclusive time (time spent in the function + callees).
>>>
>>> Sample command lines:
>>>
>>> # perf record -ag -- sleep 1
>>> # perf report -g graph,0.5,callee -n -s inclusive
>>
>> So I tried this out with:
>>
>>    $ taskset 1 perf record -g git gc
>>
>> and got entries above 100% (in the TUI):
>>
>>    $ perf report -g graph,0.5,callee -n -s inclusive
>>
>>   +  321.11%        5628  [.] 0x392b609269
>>   +  142.27%        3774  [.] create_delta
>>   +  78.86%        1248  [.] lookup_object
>>   +  40.54%        1348  [k] system_call_fastpath
>>   [...]
>>
>> Is that expected?

Yes - this is the "known bug" I noted in the cover letter

The second column (samples) is still accurate and could be used for the 
analysis.

>
> I think this happens because of this:
>
> -               hists->stats.total_period += h->period;
> +               if (!h->inclusive)
> +                       hists->stats.total_period += h->period;
>
> Which I'm not sure why it is needed btw.

Suppose the perf.data file had 1000 samples each with a period of 1e6 
events. total_period would be 1e9 without -s inclusive. Further, let's 
say the callchains had an average length of 10.

Now, after adding extra entries to the histogram, total_period would be 
1e10, which screws up the percentages. I'd like to differentiate between 
the hist entries that were in the event stream vs the ones added for 
inclusive time computation. Desired end result: the total_period remains 
unchanged at 1e9.

This is done via:

+		if (i != 0)
+			he->inclusive = 1;
+		else
+			orig_he = he;

Either (i != 0) is not a good enough test, or the inclusive bit is not 
getting propagated properly after histogram collapsing/resorting. This 
is the part I need to better understand and debug.

I tried to explain this problem in my first RFC message as well:

http://thread.gmane.org/gmane.linux.kernel/1262289

The problem Ingo is running into (and I've reproduced it on my end as 
well) is that total_period is smaller than without -s inclusive i.e. 
h->inclusive is 1 when it shouldn't be.

  -Arun

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] perf: add sort by inclusive time functionality (v2)
@ 2012-03-08 18:49       ` Arun Sharma
  0 siblings, 0 replies; 30+ messages in thread
From: Arun Sharma @ 2012-03-08 18:49 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, linux-kernel, Arnaldo Carvalho de Melo,
	Mike Galbraith, Paul Mackerras, Peter Zijlstra, Stephane Eranian,
	Namhyung Kim, Tom Zanussi, linux-perf-users

On 3/8/12 7:31 AM, Frederic Weisbecker wrote:
> On Thu, Mar 08, 2012 at 08:29:01AM +0100, Ingo Molnar wrote:
>>
>> * Arun Sharma<asharma@fb.com>  wrote:
>>
>>> This patch series refactors existing code a bit and adds sort by
>>> inclusive time (time spent in the function + callees).
>>>
>>> Sample command lines:
>>>
>>> # perf record -ag -- sleep 1
>>> # perf report -g graph,0.5,callee -n -s inclusive
>>
>> So I tried this out with:
>>
>>    $ taskset 1 perf record -g git gc
>>
>> and got entries above 100% (in the TUI):
>>
>>    $ perf report -g graph,0.5,callee -n -s inclusive
>>
>>   +  321.11%        5628  [.] 0x392b609269
>>   +  142.27%        3774  [.] create_delta
>>   +  78.86%        1248  [.] lookup_object
>>   +  40.54%        1348  [k] system_call_fastpath
>>   [...]
>>
>> Is that expected?

Yes - this is the "known bug" I noted in the cover letter

The second column (samples) is still accurate and could be used for the 
analysis.

>
> I think this happens because of this:
>
> -               hists->stats.total_period += h->period;
> +               if (!h->inclusive)
> +                       hists->stats.total_period += h->period;
>
> Which I'm not sure why it is needed btw.

Suppose the perf.data file had 1000 samples each with a period of 1e6 
events. total_period would be 1e9 without -s inclusive. Further, let's 
say the callchains had an average length of 10.

Now, after adding extra entries to the histogram, total_period would be 
1e10, which screws up the percentages. I'd like to differentiate between 
the hist entries that were in the event stream vs the ones added for 
inclusive time computation. Desired end result: the total_period remains 
unchanged at 1e9.

This is done via:

+		if (i != 0)
+			he->inclusive = 1;
+		else
+			orig_he = he;

Either (i != 0) is not a good enough test, or the inclusive bit is not 
getting propagated properly after histogram collapsing/resorting. This 
is the part I need to better understand and debug.

I tried to explain this problem in my first RFC message as well:

http://thread.gmane.org/gmane.linux.kernel/1262289

The problem Ingo is running into (and I've reproduced it on my end as 
well) is that total_period is smaller than without -s inclusive i.e. 
h->inclusive is 1 when it shouldn't be.

  -Arun

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] perf: add sort by inclusive time functionality (v2)
  2012-03-08 18:49       ` Arun Sharma
  (?)
@ 2012-03-12  7:15       ` Namhyung Kim
  2012-03-12 18:05           ` Arun Sharma
  -1 siblings, 1 reply; 30+ messages in thread
From: Namhyung Kim @ 2012-03-12  7:15 UTC (permalink / raw)
  To: Arun Sharma
  Cc: Frederic Weisbecker, Ingo Molnar, linux-kernel,
	Arnaldo Carvalho de Melo, Mike Galbraith, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian, Tom Zanussi, linux-perf-users

Hi,

2012-03-09 3:49 AM, Arun Sharma wrote:
> On 3/8/12 7:31 AM, Frederic Weisbecker wrote:
>> On Thu, Mar 08, 2012 at 08:29:01AM +0100, Ingo Molnar wrote:
>>>
>>> * Arun Sharma<asharma@fb.com> wrote:
>>>
>>>> This patch series refactors existing code a bit and adds sort by
>>>> inclusive time (time spent in the function + callees).
>>>>
>>>> Sample command lines:
>>>>
>>>> # perf record -ag -- sleep 1
>>>> # perf report -g graph,0.5,callee -n -s inclusive
>>>
>>> So I tried this out with:
>>>
>>> $ taskset 1 perf record -g git gc
>>>
>>> and got entries above 100% (in the TUI):
>>>
>>> $ perf report -g graph,0.5,callee -n -s inclusive
>>>
>>> + 321.11% 5628 [.] 0x392b609269
>>> + 142.27% 3774 [.] create_delta
>>> + 78.86% 1248 [.] lookup_object
>>> + 40.54% 1348 [k] system_call_fastpath
>>> [...]
>>>
>>> Is that expected?
>
> Yes - this is the "known bug" I noted in the cover letter
>
> The second column (samples) is still accurate and could be used for the analysis.
>
>>
>> I think this happens because of this:
>>
>> - hists->stats.total_period += h->period;
>> + if (!h->inclusive)
>> + hists->stats.total_period += h->period;
>>
>> Which I'm not sure why it is needed btw.
>
> Suppose the perf.data file had 1000 samples each with a period of 1e6 events.
> total_period would be 1e9 without -s inclusive. Further, let's say the
> callchains had an average length of 10.
>
> Now, after adding extra entries to the histogram, total_period would be 1e10,
> which screws up the percentages. I'd like to differentiate between the hist
> entries that were in the event stream vs the ones added for inclusive time
> computation. Desired end result: the total_period remains unchanged at 1e9.
>
> This is done via:
>
> + if (i != 0)
> + he->inclusive = 1;
> + else
> + orig_he = he;
>
> Either (i != 0) is not a good enough test, or the inclusive bit is not getting
> propagated properly after histogram collapsing/resorting. This is the part I
> need to better understand and debug.
>
> I tried to explain this problem in my first RFC message as well:
>
> http://thread.gmane.org/gmane.linux.kernel/1262289
>
> The problem Ingo is running into (and I've reproduced it on my end as well) is
> that total_period is smaller than without -s inclusive i.e. h->inclusive is 1
> when it shouldn't be.
>

I think it's because of the shared hist_entry. If a callchain is a subset of 
another, it will be marked as inclusive so that it cannot be contributed to 
total period. Say, there're two chains - X (a -> b -> c) and Y (a -> b), once 
__hists__add_entry_inclusive() was called on X, we have:

  a -> b -> c	
  a -> b 	(inclusive)
  a		(inclusive)

And then, calling the function on Y should make:

  a -> b
  a		(inclusive)

However, since both callchains are in tree already they'll be shared and 
marked *inclusive*. Thus the total period will not increased at all for Y. 
Also I guess the reverse case - add Y first, and then X - will have the same 
result.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] perf: add sort by inclusive time functionality (v2)
  2012-03-07 22:41 [PATCH 0/2] perf: add sort by inclusive time functionality (v2) Arun Sharma
                   ` (2 preceding siblings ...)
  2012-03-08  7:29 ` [PATCH 0/2] perf: add sort by inclusive time functionality (v2) Ingo Molnar
@ 2012-03-12  7:43 ` Namhyung Kim
  2012-03-12 18:21     ` Arun Sharma
  3 siblings, 1 reply; 30+ messages in thread
From: Namhyung Kim @ 2012-03-12  7:43 UTC (permalink / raw)
  To: Arun Sharma
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Mike Galbraith, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian, Tom Zanussi, linux-perf-users

Hi,

2012-03-08 7:41 AM, Arun Sharma wrote:
> This patch series refactors existing code a bit and adds sort by
> inclusive time (time spent in the function + callees).
>
> Sample command lines:
>
> # perf record -ag -- sleep 1
> # perf report -g graph,0.5,callee -n -s inclusive
>
> Known bugs:
>
> total_period computation is broken for order=callee
>

I'd like to add two more :).

* If perf record misses callchain info, perf report will get stuck.
* If it's used with "symbol" sort order, it'll get stuck too.


BTW, I don't like the name 'inclusive' as a sort key. If it cares about time, 
IMHO, the name should contain 'time' - something like 'itime' or 'inctime'?

Furthermore, I don't think it is a sort key. As it doesn't sort anything,  and 
only affects the way calculating symbol's period value, wouldn't it be better 
making it a separate switch rather than a sort key? Plus, checking whether it 
has callchain data and symbol sort key might be added also.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] perf: add sort by inclusive time functionality (v2)
  2012-03-12  7:15       ` Namhyung Kim
@ 2012-03-12 18:05           ` Arun Sharma
  0 siblings, 0 replies; 30+ messages in thread
From: Arun Sharma @ 2012-03-12 18:05 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Frederic Weisbecker, Ingo Molnar, linux-kernel,
	Arnaldo Carvalho de Melo, Mike Galbraith, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian, Tom Zanussi, linux-perf-users

On 3/12/12 12:15 AM, Namhyung Kim wrote:

>
> I think it's because of the shared hist_entry. If a callchain is a
> subset of another, it will be marked as inclusive so that it cannot be
> contributed to total period. Say, there're two chains - X (a -> b -> c)
> and Y (a -> b), once __hists__add_entry_inclusive() was called on X, we
> have:
>
> a -> b -> c
> a -> b (inclusive)
> a (inclusive)
>
> And then, calling the function on Y should make:
>
> a -> b
> a (inclusive)
>
> However, since both callchains are in tree already they'll be shared and
> marked *inclusive*. Thus the total period will not increased at all for
> Y. Also I guess the reverse case - add Y first, and then X - will have
> the same result.

Thanks for figuring this out. Looks like using a single bit 
(he->inclusive) is insufficient. How about:

struct hist_entry {
	u64                     period;
         u64                     period_self;
	..
};

Normal mode: period_self == period.
Inclusive mode: period_self will be zero for inclusive hist_entries.
Shared entries: we sum up both period and period_self.

We can then compute total_period by summing up period_self.

  -Arun


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] perf: add sort by inclusive time functionality (v2)
@ 2012-03-12 18:05           ` Arun Sharma
  0 siblings, 0 replies; 30+ messages in thread
From: Arun Sharma @ 2012-03-12 18:05 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Frederic Weisbecker, Ingo Molnar, linux-kernel,
	Arnaldo Carvalho de Melo, Mike Galbraith, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian, Tom Zanussi, linux-perf-users

On 3/12/12 12:15 AM, Namhyung Kim wrote:

>
> I think it's because of the shared hist_entry. If a callchain is a
> subset of another, it will be marked as inclusive so that it cannot be
> contributed to total period. Say, there're two chains - X (a -> b -> c)
> and Y (a -> b), once __hists__add_entry_inclusive() was called on X, we
> have:
>
> a -> b -> c
> a -> b (inclusive)
> a (inclusive)
>
> And then, calling the function on Y should make:
>
> a -> b
> a (inclusive)
>
> However, since both callchains are in tree already they'll be shared and
> marked *inclusive*. Thus the total period will not increased at all for
> Y. Also I guess the reverse case - add Y first, and then X - will have
> the same result.

Thanks for figuring this out. Looks like using a single bit 
(he->inclusive) is insufficient. How about:

struct hist_entry {
	u64                     period;
         u64                     period_self;
	..
};

Normal mode: period_self == period.
Inclusive mode: period_self will be zero for inclusive hist_entries.
Shared entries: we sum up both period and period_self.

We can then compute total_period by summing up period_self.

  -Arun

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] perf: add sort by inclusive time functionality (v2)
  2012-03-12  7:43 ` Namhyung Kim
@ 2012-03-12 18:21     ` Arun Sharma
  0 siblings, 0 replies; 30+ messages in thread
From: Arun Sharma @ 2012-03-12 18:21 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Mike Galbraith, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian, Tom Zanussi, linux-perf-users

On 3/12/12 12:43 AM, Namhyung Kim wrote:

>> Known bugs:
>>
>> total_period computation is broken for order=callee
>>
>
> I'd like to add two more :).
>
> * If perf record misses callchain info, perf report will get stuck.
> * If it's used with "symbol" sort order, it'll get stuck too.
>

Can you post the command lines you used to reproduce this? A backtrace 
on where perf report is getting stuck would be useful as well.

Since I didn't make any changes to the perf record or perf report code 
paths that don't involve "-s inclusive", I wonder if you see the same 
issues without my patch.

For missing callchain info cases, I'd expect cursor->nr to be zero. So 
loops such as:

+       for (i = 0; i < cursor->nr; i++) {

should terminate immediately.

>
> BTW, I don't like the name 'inclusive' as a sort key. If it cares about
> time, IMHO, the name should contain 'time' - something like 'itime' or
> 'inctime'?

The existing sort orders: pid, comm, dso, symbol, parent -- all care 
about time, but none of them have time in their name?

>
> Furthermore, I don't think it is a sort key. As it doesn't sort
> anything, and only affects the way calculating symbol's period value,
> wouldn't it be better making it a separate switch rather than a sort
> key? Plus, checking whether it has callchain data and symbol sort key
> might be added also.

Yes, we're still sorting by period, but a redefined meaning of period 
(please see my other mail about period vs period_self). Previous 
iterations of this code did use a top level flag, but that namespace is 
getting a bit crowded.

  -Arun

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] perf: add sort by inclusive time functionality (v2)
@ 2012-03-12 18:21     ` Arun Sharma
  0 siblings, 0 replies; 30+ messages in thread
From: Arun Sharma @ 2012-03-12 18:21 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Mike Galbraith, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian, Tom Zanussi, linux-perf-users

On 3/12/12 12:43 AM, Namhyung Kim wrote:

>> Known bugs:
>>
>> total_period computation is broken for order=callee
>>
>
> I'd like to add two more :).
>
> * If perf record misses callchain info, perf report will get stuck.
> * If it's used with "symbol" sort order, it'll get stuck too.
>

Can you post the command lines you used to reproduce this? A backtrace 
on where perf report is getting stuck would be useful as well.

Since I didn't make any changes to the perf record or perf report code 
paths that don't involve "-s inclusive", I wonder if you see the same 
issues without my patch.

For missing callchain info cases, I'd expect cursor->nr to be zero. So 
loops such as:

+       for (i = 0; i < cursor->nr; i++) {

should terminate immediately.

>
> BTW, I don't like the name 'inclusive' as a sort key. If it cares about
> time, IMHO, the name should contain 'time' - something like 'itime' or
> 'inctime'?

The existing sort orders: pid, comm, dso, symbol, parent -- all care 
about time, but none of them have time in their name?

>
> Furthermore, I don't think it is a sort key. As it doesn't sort
> anything, and only affects the way calculating symbol's period value,
> wouldn't it be better making it a separate switch rather than a sort
> key? Plus, checking whether it has callchain data and symbol sort key
> might be added also.

Yes, we're still sorting by period, but a redefined meaning of period 
(please see my other mail about period vs period_self). Previous 
iterations of this code did use a top level flag, but that namespace is 
getting a bit crowded.

  -Arun

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] perf: Add a new sort order: SORT_INCLUSIVE
  2012-03-08  7:22   ` Ingo Molnar
@ 2012-03-12 19:35       ` Arun Sharma
  0 siblings, 0 replies; 30+ messages in thread
From: Arun Sharma @ 2012-03-12 19:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Frederic Weisbecker,
	Mike Galbraith, Paul Mackerras, Peter Zijlstra, Stephane Eranian,
	Namhyung Kim, Tom Zanussi, linux-perf-users

On 3/7/12 11:22 PM, Ingo Molnar wrote:

>
> I.e. this refactoring and splitup of the function into two parts
> is not a detail that should matter to builtin-report.c's
> perf_evsel__add_hist_entry().
>

Agreed. I'll fold this into the next rev.

  -Arun

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] perf: Add a new sort order: SORT_INCLUSIVE
@ 2012-03-12 19:35       ` Arun Sharma
  0 siblings, 0 replies; 30+ messages in thread
From: Arun Sharma @ 2012-03-12 19:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Frederic Weisbecker,
	Mike Galbraith, Paul Mackerras, Peter Zijlstra, Stephane Eranian,
	Namhyung Kim, Tom Zanussi, linux-perf-users

On 3/7/12 11:22 PM, Ingo Molnar wrote:

>
> I.e. this refactoring and splitup of the function into two parts
> is not a detail that should matter to builtin-report.c's
> perf_evsel__add_hist_entry().
>

Agreed. I'll fold this into the next rev.

  -Arun

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] perf: add sort by inclusive time functionality (v2)
  2012-03-12 18:21     ` Arun Sharma
@ 2012-03-12 19:58       ` Arun Sharma
  -1 siblings, 0 replies; 30+ messages in thread
From: Arun Sharma @ 2012-03-12 19:58 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Mike Galbraith, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian, Tom Zanussi, linux-perf-users

On 3/12/12 11:21 AM, Arun Sharma wrote:

>>
>> BTW, I don't like the name 'inclusive' as a sort key. If it cares about
>> time, IMHO, the name should contain 'time' - something like 'itime' or
>> 'inctime'?
>
> The existing sort orders: pid, comm, dso, symbol, parent -- all care
> about time, but none of them have time in their name?

I'll take that back. What they sort on depends on the event.

perf record -ge cache-misses
perf report -s inclusive

will sort by the number of cache-misses and not time.

  -Arun

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] perf: add sort by inclusive time functionality (v2)
@ 2012-03-12 19:58       ` Arun Sharma
  0 siblings, 0 replies; 30+ messages in thread
From: Arun Sharma @ 2012-03-12 19:58 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Mike Galbraith, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian, Tom Zanussi, linux-perf-users

On 3/12/12 11:21 AM, Arun Sharma wrote:

>>
>> BTW, I don't like the name 'inclusive' as a sort key. If it cares about
>> time, IMHO, the name should contain 'time' - something like 'itime' or
>> 'inctime'?
>
> The existing sort orders: pid, comm, dso, symbol, parent -- all care
> about time, but none of them have time in their name?

I'll take that back. What they sort on depends on the event.

perf record -ge cache-misses
perf report -s inclusive

will sort by the number of cache-misses and not time.

  -Arun

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] perf: add sort by inclusive time functionality (v2)
  2012-03-12 18:05           ` Arun Sharma
  (?)
@ 2012-03-13  1:11           ` Namhyung Kim
  -1 siblings, 0 replies; 30+ messages in thread
From: Namhyung Kim @ 2012-03-13  1:11 UTC (permalink / raw)
  To: Arun Sharma
  Cc: Frederic Weisbecker, Ingo Molnar, linux-kernel,
	Arnaldo Carvalho de Melo, Mike Galbraith, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian, Tom Zanussi, linux-perf-users

2012-03-13 3:05 AM, Arun Sharma wrote:
> On 3/12/12 12:15 AM, Namhyung Kim wrote:
>> I think it's because of the shared hist_entry. If a callchain is a
>> subset of another, it will be marked as inclusive so that it cannot be
>> contributed to total period. Say, there're two chains - X (a -> b -> c)
>> and Y (a -> b), once __hists__add_entry_inclusive() was called on X, we
>> have:
>>
>> a -> b -> c
>> a -> b (inclusive)
>> a (inclusive)
>>
>> And then, calling the function on Y should make:
>>
>> a -> b
>> a (inclusive)
>>
>> However, since both callchains are in tree already they'll be shared and
>> marked *inclusive*. Thus the total period will not increased at all for
>> Y. Also I guess the reverse case - add Y first, and then X - will have
>> the same result.
>
> Thanks for figuring this out. Looks like using a single bit (he->inclusive) is
> insufficient. How about:
>
> struct hist_entry {
> 	u64 	period;
> 	u64 	period_self;
> 	..
> };
>
> Normal mode: period_self == period.
> Inclusive mode: period_self will be zero for inclusive hist_entries.
> Shared entries: we sum up both period and period_self.
>
> We can then compute total_period by summing up period_self.
>

Yeah, I agree that we need that kind of code to handle total_period properly. 
Looking forward to your v3 :).

Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] perf: add sort by inclusive time functionality (v2)
  2012-03-12 18:21     ` Arun Sharma
  (?)
  (?)
@ 2012-03-13  1:16     ` Namhyung Kim
  2012-03-13 18:45         ` Arun Sharma
  -1 siblings, 1 reply; 30+ messages in thread
From: Namhyung Kim @ 2012-03-13  1:16 UTC (permalink / raw)
  To: Arun Sharma
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Mike Galbraith, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian, Tom Zanussi, linux-perf-users

2012-03-13 3:21 AM, Arun Sharma wrote:
> On 3/12/12 12:43 AM, Namhyung Kim wrote:
>
>>> Known bugs:
>>>
>>> total_period computation is broken for order=callee
>>>
>>
>> I'd like to add two more :).
>>
>> * If perf record misses callchain info, perf report will get stuck.
>> * If it's used with "symbol" sort order, it'll get stuck too.
>>
>
> Can you post the command lines you used to reproduce this? A backtrace on
> where perf report is getting stuck would be useful as well.
>

$ perf record sleep 1
$ perf report -s inclusive

and

$ perf record -g sleep 1
$ perf report -s inclusive,symbol


Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] perf: add sort by inclusive time functionality (v2)
  2012-03-12 19:58       ` Arun Sharma
  (?)
@ 2012-03-13  1:36       ` Namhyung Kim
  2012-03-15 14:50         ` Frederic Weisbecker
  -1 siblings, 1 reply; 30+ messages in thread
From: Namhyung Kim @ 2012-03-13  1:36 UTC (permalink / raw)
  To: Arun Sharma
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Mike Galbraith, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian, Tom Zanussi, linux-perf-users

Hi,

2012-03-13 4:58 AM, Arun Sharma wrote:
> On 3/12/12 11:21 AM, Arun Sharma wrote:
>>> BTW, I don't like the name 'inclusive' as a sort key. If it cares about
>>> time, IMHO, the name should contain 'time' - something like 'itime' or
>>> 'inctime'?
>>
>> The existing sort orders: pid, comm, dso, symbol, parent -- all care
>> about time, but none of them have time in their name?
>
> I'll take that back. What they sort on depends on the event.
>
> perf record -ge cache-misses
> perf report -s inclusive
>
> will sort by the number of cache-misses and not time.
>
> -Arun

AFAIK, "sort" here means how perf identifies a sample event from others: 
"comm" will collect samples have same pid/comm, then "dso" will group samples 
belong to same library, and "symbol" will group again samples have same symbol 
name. This is what default sort order (comm,dso,symbol) does.

In case of "inclusive" it does same thing with "symbol" as you use sort_sym 
for the inclusive sort_dimension. Sorting by period for output is done at 
final stage and it won't be affected by any sort orders. It maybe sound 
confusing but that's how it works.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] perf: Add a new sort order: SORT_INCLUSIVE
  2012-03-08 18:22     ` Arun Sharma
@ 2012-03-13 13:44       ` Frederic Weisbecker
  0 siblings, 0 replies; 30+ messages in thread
From: Frederic Weisbecker @ 2012-03-13 13:44 UTC (permalink / raw)
  To: Arun Sharma
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mike Galbraith, Paul Mackerras, Peter Zijlstra, Stephane Eranian,
	Namhyung Kim, Tom Zanussi, linux-perf-users

On Thu, Mar 08, 2012 at 10:22:26AM -0800, Arun Sharma wrote:
> On Thu, Mar 08, 2012 at 04:39:36PM +0100, Frederic Weisbecker wrote:
> > 
> > I don't yet understand the point of this.
> > 
> > Imagine those three hists:
> > 
> > a -> b -> c
> > a -> b -> d
> > a-> e -> f
> > 
> > The fractal inverted mode (-G) will report this:
> > 
> > a--
> >   |
> >   ----- b
> >   |     |
> >   |     -----c
> >   |     |
> >   |     -----d
> >   |
> >   ----- e
> >         |
> >         -----f
> > 
> 
> Please see the test program attached. It has only two paths from main()
> to c(). But for this discussion, imagine a callgraph with 10 different
> paths.

Ok that makes sense.

Thanks.

> 
> With -G, c() appears 10 times in the callgraph and the user is required
> to manually sum up the samples to realize that the callgraph under c()
> is very expensive.
> 
> With -s inclusive, c() will show up at the very top after main().
> 
>  -Arun
> 
> #include <stdio.h>
> 
> int sum = 0;
> 
> #define LOOP(n)                                         \
>         {                                               \
>                 int j;                                  \
>                 for (j = 0; j < 10000; j++) {           \
>                         sum += j;                       \
>                 }                                       \
>         }
> 
> int f()
> {
>         LOOP(100);
> }
> 
> int d()
> {
>         LOOP(100);
>         f();
> }
> 
> int e()
> {
> 	LOOP(100);
> 	f();
> }
> 
> int c()
> {
> 	LOOP(100);
> 	d();
> 	LOOP(100);
> 	e();
> }
> 
> int b() 
> {
> 	LOOP(70);
> 	c();
> }
> 
> int a() 
> {
> 	LOOP(30);
> 	c();
> }
> 
> int main()
> {
> 	int i;
> 	for (i = 0; i < 10000; i++) {
> 		a();
> 		b();
> 	}
> }
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] perf: add sort by inclusive time functionality (v2)
  2012-03-13  1:16     ` Namhyung Kim
@ 2012-03-13 18:45         ` Arun Sharma
  0 siblings, 0 replies; 30+ messages in thread
From: Arun Sharma @ 2012-03-13 18:45 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Mike Galbraith, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian, Tom Zanussi, linux-perf-users

On 3/12/12 6:16 PM, Namhyung Kim wrote:

> $ perf record -g sleep 1
> $ perf report -s inclusive,symbol

Thanks. I was able to reproduce the hangs and they went away after I 
added sort_sym_inclusive (a copy of sort_sym, except for the se_header).

The output of perf report now better describes how the symbols are 
sorted. The text reads: "Symbol (Inclusive)".

v3 is coming via my next mail.

  -Arun


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] perf: add sort by inclusive time functionality (v2)
@ 2012-03-13 18:45         ` Arun Sharma
  0 siblings, 0 replies; 30+ messages in thread
From: Arun Sharma @ 2012-03-13 18:45 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Mike Galbraith, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian, Tom Zanussi, linux-perf-users

On 3/12/12 6:16 PM, Namhyung Kim wrote:

> $ perf record -g sleep 1
> $ perf report -s inclusive,symbol

Thanks. I was able to reproduce the hangs and they went away after I 
added sort_sym_inclusive (a copy of sort_sym, except for the se_header).

The output of perf report now better describes how the symbols are 
sorted. The text reads: "Symbol (Inclusive)".

v3 is coming via my next mail.

  -Arun

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] perf: add sort by inclusive time functionality (v2)
  2012-03-13  1:36       ` Namhyung Kim
@ 2012-03-15 14:50         ` Frederic Weisbecker
  2012-03-15 17:41             ` Arun Sharma
  0 siblings, 1 reply; 30+ messages in thread
From: Frederic Weisbecker @ 2012-03-15 14:50 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arun Sharma, linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mike Galbraith, Paul Mackerras, Peter Zijlstra, Stephane Eranian,
	Tom Zanussi, linux-perf-users

On Tue, Mar 13, 2012 at 10:36:31AM +0900, Namhyung Kim wrote:
> Hi,
> 
> 2012-03-13 4:58 AM, Arun Sharma wrote:
> >On 3/12/12 11:21 AM, Arun Sharma wrote:
> >>>BTW, I don't like the name 'inclusive' as a sort key. If it cares about
> >>>time, IMHO, the name should contain 'time' - something like 'itime' or
> >>>'inctime'?
> >>
> >>The existing sort orders: pid, comm, dso, symbol, parent -- all care
> >>about time, but none of them have time in their name?
> >
> >I'll take that back. What they sort on depends on the event.
> >
> >perf record -ge cache-misses
> >perf report -s inclusive
> >
> >will sort by the number of cache-misses and not time.
> >
> >-Arun
> 
> AFAIK, "sort" here means how perf identifies a sample event from
> others: "comm" will collect samples have same pid/comm, then "dso"
> will group samples belong to same library, and "symbol" will group
> again samples have same symbol name. This is what default sort order
> (comm,dso,symbol) does.

Right this is about how we group the events into histograms.
If you sort by dso, you'll have one histogram per dso and events
will be added to the histogram matching their dso.

Multiple sorting does the same with an "AND" between sort entries.
If you sort by dso,pid, you'll have one histogram per possible couple
of (dso,pid).

Say you have dso1, dso2 and pid1 and pid2, then you get 4 possible histograms:
(dso1,pid1), (dso1,pid2), (dso2,pid1), (dso2,pid2)
...assuming that over your events you have all these combinations.

So this is how we group samples into histograms.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] perf: add sort by inclusive time functionality (v2)
  2012-03-15 14:50         ` Frederic Weisbecker
@ 2012-03-15 17:41             ` Arun Sharma
  0 siblings, 0 replies; 30+ messages in thread
From: Arun Sharma @ 2012-03-15 17:41 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Namhyung Kim, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mike Galbraith, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian, Tom Zanussi, linux-perf-users

On 3/15/12 7:50 AM, Frederic Weisbecker wrote:

>> AFAIK, "sort" here means how perf identifies a sample event from
>> others: "comm" will collect samples have same pid/comm, then "dso"
>> will group samples belong to same library, and "symbol" will group
>> again samples have same symbol name. This is what default sort order
>> (comm,dso,symbol) does.
>
> Right this is about how we group the events into histograms.
> If you sort by dso, you'll have one histogram per dso and events
> will be added to the histogram matching their dso.
>
> Multiple sorting does the same with an "AND" between sort entries.
> If you sort by dso,pid, you'll have one histogram per possible couple
> of (dso,pid).
>
> Say you have dso1, dso2 and pid1 and pid2, then you get 4 possible histograms:
> (dso1,pid1), (dso1,pid2), (dso2,pid1), (dso2,pid2)
> ...assuming that over your events you have all these combinations.
>
> So this is how we group samples into histograms.

Agreed. It's a different sort order, but not necessarily a different 
sort dimension. I'm thinking "group by" vs "order by" in SQL here.

Re: adding a new top level flag, struct option options[] in 
builtin-report.c looks busy (especially for short options. Eg: -i is 
already taken).

I can see parallels between this and sort__branch_mode. Stephane: what 
do you think about putting this in a different namespace?

--sort-mode branch (sort__branch_mode)
--sort-mode inclusive (sort__inclusive_mode)

  -Arun

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] perf: add sort by inclusive time functionality (v2)
@ 2012-03-15 17:41             ` Arun Sharma
  0 siblings, 0 replies; 30+ messages in thread
From: Arun Sharma @ 2012-03-15 17:41 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Namhyung Kim, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mike Galbraith, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian, Tom Zanussi, linux-perf-users

On 3/15/12 7:50 AM, Frederic Weisbecker wrote:

>> AFAIK, "sort" here means how perf identifies a sample event from
>> others: "comm" will collect samples have same pid/comm, then "dso"
>> will group samples belong to same library, and "symbol" will group
>> again samples have same symbol name. This is what default sort order
>> (comm,dso,symbol) does.
>
> Right this is about how we group the events into histograms.
> If you sort by dso, you'll have one histogram per dso and events
> will be added to the histogram matching their dso.
>
> Multiple sorting does the same with an "AND" between sort entries.
> If you sort by dso,pid, you'll have one histogram per possible couple
> of (dso,pid).
>
> Say you have dso1, dso2 and pid1 and pid2, then you get 4 possible histograms:
> (dso1,pid1), (dso1,pid2), (dso2,pid1), (dso2,pid2)
> ...assuming that over your events you have all these combinations.
>
> So this is how we group samples into histograms.

Agreed. It's a different sort order, but not necessarily a different 
sort dimension. I'm thinking "group by" vs "order by" in SQL here.

Re: adding a new top level flag, struct option options[] in 
builtin-report.c looks busy (especially for short options. Eg: -i is 
already taken).

I can see parallels between this and sort__branch_mode. Stephane: what 
do you think about putting this in a different namespace?

--sort-mode branch (sort__branch_mode)
--sort-mode inclusive (sort__inclusive_mode)

  -Arun

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2012-03-15 17:41 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-07 22:41 [PATCH 0/2] perf: add sort by inclusive time functionality (v2) Arun Sharma
2012-03-07 22:41 ` [PATCH 1/2] perf: refactor add_hist_entry() code paths Arun Sharma
2012-03-07 22:41 ` [PATCH 2/2] perf: Add a new sort order: SORT_INCLUSIVE Arun Sharma
2012-03-07 23:13   ` Arun Sharma
2012-03-08  7:22   ` Ingo Molnar
2012-03-12 19:35     ` Arun Sharma
2012-03-12 19:35       ` Arun Sharma
2012-03-08 15:39   ` Frederic Weisbecker
2012-03-08 18:22     ` Arun Sharma
2012-03-13 13:44       ` Frederic Weisbecker
2012-03-08  7:29 ` [PATCH 0/2] perf: add sort by inclusive time functionality (v2) Ingo Molnar
2012-03-08 15:31   ` Frederic Weisbecker
2012-03-08 18:49     ` Arun Sharma
2012-03-08 18:49       ` Arun Sharma
2012-03-12  7:15       ` Namhyung Kim
2012-03-12 18:05         ` Arun Sharma
2012-03-12 18:05           ` Arun Sharma
2012-03-13  1:11           ` Namhyung Kim
2012-03-12  7:43 ` Namhyung Kim
2012-03-12 18:21   ` Arun Sharma
2012-03-12 18:21     ` Arun Sharma
2012-03-12 19:58     ` Arun Sharma
2012-03-12 19:58       ` Arun Sharma
2012-03-13  1:36       ` Namhyung Kim
2012-03-15 14:50         ` Frederic Weisbecker
2012-03-15 17:41           ` Arun Sharma
2012-03-15 17:41             ` Arun Sharma
2012-03-13  1:16     ` Namhyung Kim
2012-03-13 18:45       ` Arun Sharma
2012-03-13 18:45         ` Arun Sharma

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.