* [PATCH] perf: Add a new sort order: SORT_INCLUSIVE (v4)
@ 2012-03-14 17:36 Arun Sharma
  2012-03-15  1:02   ` Namhyung Kim
  2012-03-15 14:14 ` Frederic Weisbecker
  0 siblings, 2 replies; 13+ messages in thread
From: Arun Sharma @ 2012-03-14 17:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Arun Sharma, Ingo Molnar, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Mike Galbraith, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian, Namhyung Kim, Tom Zanussi,
	linux-perf-users

Each entry that used to get added once to the histogram is now added
chain->nr times, each time with one less entry in the callchain.

As a result, a non-leaf function that appears in many samples gets a
histogram entry with a correspondingly large number of hits.

The user can then drill down into the callchains of functions that
have high inclusive times.

Sample command lines:

$ perf record -ag -- sleep 1
$ perf report -g graph,0.5,callee -n -s inclusive
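
For example (an illustrative sketch; a, b and c are hypothetical functions):
a single sample with period P whose resolved callchain is a -> b -> c
(c being the sampled leaf) now produces three histogram entries:

  c: callchain a -> b -> c, period += P, period_self += P
  b: callchain a -> b,      period += P, period_self unchanged
  a: callchain a,           period += P, period_self unchanged

Only period_self feeds total_period, so the callers are credited
inclusively without inflating the sample total.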

Signed-off-by: Arun Sharma <asharma@fb.com>
Cc: Ingo Molnar <mingo@elte.hu>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
---
 tools/perf/builtin-annotate.c |    2 +-
 tools/perf/builtin-diff.c     |    2 +-
 tools/perf/builtin-report.c   |   13 +++----
 tools/perf/builtin-top.c      |    2 +-
 tools/perf/util/callchain.c   |   15 ++++++++
 tools/perf/util/callchain.h   |    4 ++
 tools/perf/util/hist.c        |   80 +++++++++++++++++++++++++++++++++++++++-
 tools/perf/util/hist.h        |    4 ++-
 tools/perf/util/sort.c        |   14 +++++++
 tools/perf/util/sort.h        |    4 ++
 10 files changed, 126 insertions(+), 14 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 806e0a2..5651b7b 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -62,7 +62,7 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
 		return 0;
 	}
 
-	he = __hists__add_entry(&evsel->hists, al, NULL, 1);
+	he = __hists__add_entry(&evsel->hists, al, NULL, NULL, 1);
 	if (he == NULL)
 		return -ENOMEM;
 
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 4f19513..4a30856 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -27,7 +27,7 @@ static bool show_displacement;
 static int hists__add_entry(struct hists *self,
 			    struct addr_location *al, u64 period)
 {
-	if (__hists__add_entry(self, al, NULL, period) != NULL)
+	if (__hists__add_entry(self, al, NULL, NULL, period) != NULL)
 		return 0;
 	return -ENOMEM;
 }
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 25d34d4..f20587f 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -61,6 +61,7 @@ static int perf_evsel__add_hist_entry(struct perf_evsel *evsel,
 	struct symbol *parent = NULL;
 	int err = 0;
 	struct hist_entry *he;
+	struct callchain_cursor *cursor;
 
 	if ((sort__has_parent || symbol_conf.use_callchain) && sample->callchain) {
 		err = machine__resolve_callchain(machine, evsel, al->thread,
@@ -69,17 +70,12 @@ static int perf_evsel__add_hist_entry(struct perf_evsel *evsel,
 			return err;
 	}
 
-	he = __hists__add_entry(&evsel->hists, al, parent, sample->period);
+	cursor = &evsel->hists.callchain_cursor;
+	he = __hists__add_entry(&evsel->hists, al, parent,
+			        cursor, sample->period);
 	if (he == NULL)
 		return -ENOMEM;
 
-	if (symbol_conf.use_callchain) {
-		err = callchain_append(he->callchain,
-				       &evsel->hists.callchain_cursor,
-				       sample->period);
-		if (err)
-			return err;
-	}
 	/*
 	 * Only in the newt browser we are doing integrated annotation,
 	 * so we don't allocated the extra space needed because the stdio
@@ -595,6 +591,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __used)
 	sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "dso", stdout);
 	sort_entry__setup_elide(&sort_comm, symbol_conf.comm_list, "comm", stdout);
 	sort_entry__setup_elide(&sort_sym, symbol_conf.sym_list, "symbol", stdout);
+	sort_entry__setup_elide(&sort_sym_inclusive, symbol_conf.sym_list, "inclusive", stdout);
 
 	return __cmd_report(&report);
 }
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index e3c63ae..41e7153 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -235,7 +235,7 @@ static struct hist_entry *perf_evsel__add_hist_entry(struct perf_evsel *evsel,
 {
 	struct hist_entry *he;
 
-	he = __hists__add_entry(&evsel->hists, al, NULL, sample->period);
+	he = __hists__add_entry(&evsel->hists, al, NULL, NULL, sample->period);
 	if (he == NULL)
 		return NULL;
 
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 9f7106a..2b824a5 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -459,3 +459,18 @@ int callchain_cursor_append(struct callchain_cursor *cursor,
 
 	return 0;
 }
+
+int callchain_get(struct callchain_cursor *cursor,
+		  struct addr_location *al)
+{
+	struct callchain_cursor_node *node = cursor->curr;
+
+	if (node == NULL)
+		return -1;
+
+	al->map = node->map;
+	al->sym = node->sym;
+	al->addr = node->ip;
+
+	return 0;
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 7f9c0f1..dcff6ec 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -103,9 +103,13 @@ int callchain_merge(struct callchain_cursor *cursor,
 
 struct ip_callchain;
 union perf_event;
+struct addr_location;
 
 bool ip_callchain__valid(struct ip_callchain *chain,
 			 const union perf_event *event);
+
+int callchain_get(struct callchain_cursor *cursor, struct addr_location *al);
+
 /*
  * Initialize a cursor before adding entries inside, but keep
  * the previously allocated entries as a cache.
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 6f505d1..f3200ea 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -184,7 +184,7 @@ static void hists__inc_nr_entries(struct hists *hists, struct hist_entry *h)
 	if (!h->filtered) {
 		hists__calc_col_len(hists, h);
 		++hists->nr_entries;
-		hists->stats.total_period += h->period;
+		hists->stats.total_period += h->period_self;
 	}
 }
 
@@ -195,7 +195,7 @@ static u8 symbol__parent_filter(const struct symbol *parent)
 	return 0;
 }
 
-struct hist_entry *__hists__add_entry(struct hists *hists,
+static struct hist_entry *___hists__add_entry(struct hists *hists,
 				      struct addr_location *al,
 				      struct symbol *sym_parent, u64 period)
 {
@@ -252,6 +252,82 @@ out_unlock:
 	return he;
 }
 
+static struct hist_entry *__hists__add_entry_inclusive(struct hists *hists,
+				struct addr_location *al,
+				struct symbol *sym_parent,
+				struct callchain_cursor *cursor,
+				u64 period)
+{
+	struct callchain_cursor iter = *cursor;
+	struct callchain_cursor new_cursor = *cursor;
+	struct hist_entry *he, *orig_he = NULL;
+	int err;
+	u64 i;
+
+	iter.pos = 0;
+	iter.curr = iter.first;
+	for (i = 0; i < cursor->nr; i++) {
+		struct addr_location al_child = *al;
+
+		err = callchain_get(&iter, &al_child);
+		if (err)
+			return NULL;
+		he = ___hists__add_entry(hists, &al_child, sym_parent, period);
+		if (he == NULL)
+			return NULL;
+
+		new_cursor.first = iter.curr;
+		new_cursor.nr = cursor->nr - i;
+		if (i == 0) {
+			he->period_self += period;
+			orig_he = he;
+		}
+		err = callchain_append(he->callchain, &new_cursor, period);
+		if (err)
+			return NULL;
+		callchain_cursor_advance(&iter);
+	}
+	return orig_he;
+}
+
+static struct hist_entry *__hists__add_entry_single(struct hists *hists,
+					     struct addr_location *al,
+					     struct symbol *sym_parent,
+					     struct callchain_cursor *cursor,
+					     u64 period)
+{
+	struct hist_entry *he;
+	int err;
+
+	he = ___hists__add_entry(hists, al, sym_parent, period);
+	if (he == NULL)
+		return NULL;
+	he->period_self += period;
+	if (symbol_conf.use_callchain) {
+		err = callchain_append(he->callchain, cursor, period);
+		if (err)
+			return NULL;
+	}
+	return he;
+}
+
+struct hist_entry *__hists__add_entry(struct hists *hists,
+				      struct addr_location *al,
+				      struct symbol *parent, 
+				      struct callchain_cursor *cursor,
+				      u64 period)
+{
+	struct hist_entry *he;
+
+	if (sort__has_inclusive && symbol_conf.use_callchain)
+		he = __hists__add_entry_inclusive(hists, al, parent,
+						  cursor, period);
+	else
+		he = __hists__add_entry_single(hists, al, parent,
+					       cursor, period);
+	return he;
+}
+
 int64_t
 hist_entry__cmp(struct hist_entry *left, struct hist_entry *right)
 {
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 48e5acd..4d3c6fb 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -67,7 +67,9 @@ struct hists {
 
 struct hist_entry *__hists__add_entry(struct hists *self,
 				      struct addr_location *al,
-				      struct symbol *parent, u64 period);
+				      struct symbol *parent, 
+				      struct callchain_cursor *cursor,
+				      u64 period);
 int64_t hist_entry__cmp(struct hist_entry *left, struct hist_entry *right);
 int64_t hist_entry__collapse(struct hist_entry *left, struct hist_entry *right);
 int hist_entry__snprintf(struct hist_entry *self, char *bf, size_t size,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 16da30d..9033a62 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -8,6 +8,7 @@ const char	default_sort_order[] = "comm,dso,symbol";
 const char	*sort_order = default_sort_order;
 int		sort__need_collapse = 0;
 int		sort__has_parent = 0;
+int		sort__has_inclusive = 0;
 
 enum sort_type	sort__first_dimension;
 
@@ -197,6 +198,13 @@ struct sort_entry sort_sym = {
 	.se_width_idx	= HISTC_SYMBOL,
 };
 
+struct sort_entry sort_sym_inclusive = {
+	.se_header	= "Symbol (Inclusive)",
+	.se_cmp		= sort__sym_cmp,
+	.se_snprintf	= hist_entry__sym_snprintf,
+	.se_width_idx	= HISTC_SYMBOL,
+};
+
 /* --sort parent */
 
 static int64_t
@@ -259,6 +267,7 @@ static struct sort_dimension sort_dimensions[] = {
 	{ .name = "symbol",	.entry = &sort_sym,	},
 	{ .name = "parent",	.entry = &sort_parent,	},
 	{ .name = "cpu",	.entry = &sort_cpu,	},
+	{ .name = "inclusive",	.entry = &sort_sym_inclusive, },
 };
 
 int sort_dimension__add(const char *tok)
@@ -283,6 +292,9 @@ int sort_dimension__add(const char *tok)
 			sort__has_parent = 1;
 		}
 
+		if (sd->entry == &sort_sym_inclusive)
+			sort__has_inclusive = 1;
+
 		if (sd->taken)
 			return 0;
 
@@ -298,6 +310,8 @@ int sort_dimension__add(const char *tok)
 				sort__first_dimension = SORT_DSO;
 			else if (!strcmp(sd->name, "symbol"))
 				sort__first_dimension = SORT_SYM;
+			else if (!strcmp(sd->name, "inclusive"))
+				sort__first_dimension = SORT_INCLUSIVE;
 			else if (!strcmp(sd->name, "parent"))
 				sort__first_dimension = SORT_PARENT;
 			else if (!strcmp(sd->name, "cpu"))
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 3f67ae3..97f67d5 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -31,10 +31,12 @@ extern const char *parent_pattern;
 extern const char default_sort_order[];
 extern int sort__need_collapse;
 extern int sort__has_parent;
+extern int sort__has_inclusive;
 extern char *field_sep;
 extern struct sort_entry sort_comm;
 extern struct sort_entry sort_dso;
 extern struct sort_entry sort_sym;
+extern struct sort_entry sort_sym_inclusive;
 extern struct sort_entry sort_parent;
 extern enum sort_type sort__first_dimension;
 
@@ -48,6 +50,7 @@ struct hist_entry {
 	struct rb_node		rb_node_in;
 	struct rb_node		rb_node;
 	u64			period;
+	u64			period_self;
 	u64			period_sys;
 	u64			period_us;
 	u64			period_guest_sys;
@@ -82,6 +85,7 @@ enum sort_type {
 	SORT_SYM,
 	SORT_PARENT,
 	SORT_CPU,
+	SORT_INCLUSIVE,
 };
 
 /*
-- 
1.7.8.4



* Re: [PATCH] perf: Add a new sort order: SORT_INCLUSIVE (v4)
  2012-03-14 17:36 [PATCH] perf: Add a new sort order: SORT_INCLUSIVE (v4) Arun Sharma
@ 2012-03-15  1:02   ` Namhyung Kim
  2012-03-15 14:14 ` Frederic Weisbecker
  1 sibling, 0 replies; 13+ messages in thread
From: Namhyung Kim @ 2012-03-15  1:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-perf-users

2012-03-15 2:36 AM, Arun Sharma wrote:
> Each entry that used to get added once to the histogram, now is added
> chain->nr times, each time with one less entry in the
> callchain.
>
> This will result in a non-leaf function that appears in a lot of
> samples to get a histogram entry with lots of hits.
>
> The user can then drill down into the callchains of functions that
> have high inclusive times.
>
> Sample command lines:
>
> $ perf record -ag -- sleep 1
> $ perf report -g graph,0.5,callee -n -s inclusive
>

Reviewed-by: Namhyung Kim <namhyung.kim@lge.com>

Thanks, looks good to me now - although I would still prefer making it a
switch that alters the behavior of the existing "symbol" sort order to do an
"inclusive" calculation of its periods, but I can live with this :).


> Signed-off-by: Arun Sharma <asharma@fb.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> CC: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Stephane Eranian <eranian@google.com>
> Cc: Namhyung Kim <namhyung.kim@lge.com>
> Cc: Tom Zanussi <tzanussi@gmail.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-perf-users@vger.kernel.org
> ---
>   tools/perf/builtin-annotate.c |    2 +-
>   tools/perf/builtin-diff.c     |    2 +-
>   tools/perf/builtin-report.c   |   13 +++----
>   tools/perf/builtin-top.c      |    2 +-
>   tools/perf/util/callchain.c   |   15 ++++++++
>   tools/perf/util/callchain.h   |    4 ++
>   tools/perf/util/hist.c        |   80 +++++++++++++++++++++++++++++++++++++++-
>   tools/perf/util/hist.h        |    4 ++-
>   tools/perf/util/sort.c        |   14 +++++++
>   tools/perf/util/sort.h        |    4 ++
>   10 files changed, 126 insertions(+), 14 deletions(-)
>



* Re: [PATCH] perf: Add a new sort order: SORT_INCLUSIVE (v4)
  2012-03-14 17:36 [PATCH] perf: Add a new sort order: SORT_INCLUSIVE (v4) Arun Sharma
  2012-03-15  1:02   ` Namhyung Kim
@ 2012-03-15 14:14 ` Frederic Weisbecker
  2012-03-15 17:58     ` Arun Sharma
  1 sibling, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2012-03-15 14:14 UTC (permalink / raw)
  To: Arun Sharma
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mike Galbraith, Paul Mackerras, Peter Zijlstra, Stephane Eranian,
	Namhyung Kim, Tom Zanussi, linux-perf-users

On Wed, Mar 14, 2012 at 10:36:47AM -0700, Arun Sharma wrote:
> Each entry that used to get added once to the histogram, now is added
> chain->nr times, each time with one less entry in the
> callchain.
> 
> This will result in a non-leaf function that appears in a lot of
> samples to get a histogram entry with lots of hits.
> 
> The user can then drill down into the callchains of functions that
> have high inclusive times.
> 
> Sample command lines:
> 
> $ perf record -ag -- sleep 1
> $ perf report -g graph,0.5,callee -n -s inclusive
> 
> Signed-off-by: Arun Sharma <asharma@fb.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> CC: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Stephane Eranian <eranian@google.com>
> Cc: Namhyung Kim <namhyung.kim@lge.com>
> Cc: Tom Zanussi <tzanussi@gmail.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-perf-users@vger.kernel.org
> ---
>  tools/perf/builtin-annotate.c |    2 +-
>  tools/perf/builtin-diff.c     |    2 +-
>  tools/perf/builtin-report.c   |   13 +++----
>  tools/perf/builtin-top.c      |    2 +-
>  tools/perf/util/callchain.c   |   15 ++++++++
>  tools/perf/util/callchain.h   |    4 ++
>  tools/perf/util/hist.c        |   80 +++++++++++++++++++++++++++++++++++++++-
>  tools/perf/util/hist.h        |    4 ++-
>  tools/perf/util/sort.c        |   14 +++++++
>  tools/perf/util/sort.h        |    4 ++
>  10 files changed, 126 insertions(+), 14 deletions(-)
> 
> diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
> index 806e0a2..5651b7b 100644
> --- a/tools/perf/builtin-annotate.c
> +++ b/tools/perf/builtin-annotate.c
> @@ -62,7 +62,7 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
>  		return 0;
>  	}
>  
> -	he = __hists__add_entry(&evsel->hists, al, NULL, 1);
> +	he = __hists__add_entry(&evsel->hists, al, NULL, NULL, 1);
>  	if (he == NULL)
>  		return -ENOMEM;
>  
> diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
> index 4f19513..4a30856 100644
> --- a/tools/perf/builtin-diff.c
> +++ b/tools/perf/builtin-diff.c
> @@ -27,7 +27,7 @@ static bool show_displacement;
>  static int hists__add_entry(struct hists *self,
>  			    struct addr_location *al, u64 period)
>  {
> -	if (__hists__add_entry(self, al, NULL, period) != NULL)
> +	if (__hists__add_entry(self, al, NULL, NULL, period) != NULL)
>  		return 0;
>  	return -ENOMEM;
>  }
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index 25d34d4..f20587f 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -61,6 +61,7 @@ static int perf_evsel__add_hist_entry(struct perf_evsel *evsel,
>  	struct symbol *parent = NULL;
>  	int err = 0;
>  	struct hist_entry *he;
> +	struct callchain_cursor *cursor;
>  
>  	if ((sort__has_parent || symbol_conf.use_callchain) && sample->callchain) {
>  		err = machine__resolve_callchain(machine, evsel, al->thread,
> @@ -69,17 +70,12 @@ static int perf_evsel__add_hist_entry(struct perf_evsel *evsel,
>  			return err;
>  	}
>  
> -	he = __hists__add_entry(&evsel->hists, al, parent, sample->period);
> +	cursor = &evsel->hists.callchain_cursor;
> +	he = __hists__add_entry(&evsel->hists, al, parent,
> +			        cursor, sample->period);
>  	if (he == NULL)
>  		return -ENOMEM;
>  
> -	if (symbol_conf.use_callchain) {
> -		err = callchain_append(he->callchain,
> -				       &evsel->hists.callchain_cursor,
> -				       sample->period);
> -		if (err)
> -			return err;
> -	}
>  	/*
>  	 * Only in the newt browser we are doing integrated annotation,
>  	 * so we don't allocated the extra space needed because the stdio
> @@ -595,6 +591,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __used)
>  	sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "dso", stdout);
>  	sort_entry__setup_elide(&sort_comm, symbol_conf.comm_list, "comm", stdout);
>  	sort_entry__setup_elide(&sort_sym, symbol_conf.sym_list, "symbol", stdout);
> +	sort_entry__setup_elide(&sort_sym_inclusive, symbol_conf.sym_list, "inclusive", stdout);
>  
>  	return __cmd_report(&report);
>  }
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index e3c63ae..41e7153 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -235,7 +235,7 @@ static struct hist_entry *perf_evsel__add_hist_entry(struct perf_evsel *evsel,
>  {
>  	struct hist_entry *he;
>  
> -	he = __hists__add_entry(&evsel->hists, al, NULL, sample->period);
> +	he = __hists__add_entry(&evsel->hists, al, NULL, NULL, sample->period);
>  	if (he == NULL)
>  		return NULL;
>  
> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> index 9f7106a..2b824a5 100644
> --- a/tools/perf/util/callchain.c
> +++ b/tools/perf/util/callchain.c
> @@ -459,3 +459,18 @@ int callchain_cursor_append(struct callchain_cursor *cursor,
>  
>  	return 0;
>  }
> +
> +int callchain_get(struct callchain_cursor *cursor,
> +		  struct addr_location *al)
> +{
> +	struct callchain_cursor_node *node = cursor->curr;
> +
> +	if (node == NULL)
> +		return -1;
> +
> +	al->map = node->map;
> +	al->sym = node->sym;
> +	al->addr = node->ip;
> +
> +	return 0;
> +}
> diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
> index 7f9c0f1..dcff6ec 100644
> --- a/tools/perf/util/callchain.h
> +++ b/tools/perf/util/callchain.h
> @@ -103,9 +103,13 @@ int callchain_merge(struct callchain_cursor *cursor,
>  
>  struct ip_callchain;
>  union perf_event;
> +struct addr_location;
>  
>  bool ip_callchain__valid(struct ip_callchain *chain,
>  			 const union perf_event *event);
> +
> +int callchain_get(struct callchain_cursor *cursor, struct addr_location *al);
> +
>  /*
>   * Initialize a cursor before adding entries inside, but keep
>   * the previously allocated entries as a cache.
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> index 6f505d1..f3200ea 100644
> --- a/tools/perf/util/hist.c
> +++ b/tools/perf/util/hist.c
> @@ -184,7 +184,7 @@ static void hists__inc_nr_entries(struct hists *hists, struct hist_entry *h)
>  	if (!h->filtered) {
>  		hists__calc_col_len(hists, h);
>  		++hists->nr_entries;
> -		hists->stats.total_period += h->period;
> +		hists->stats.total_period += h->period_self;

I still feel concerned about this.

If I have only one event with a period of 1 and with that callchain:

	a -> b -> c

Then I produce three hists

	1) a -> b -> c
	2) a -> b
	3) a

Each hist have a period of 1, but the total period is 1.
So the end result should be (IIUC):

100%    foo     a
100%    foo     b
                |
                --- a
100%    foo     c
                |
                --- b
                    |
                    --- c
		
And the percentages on callchain branches will have the same kind
of weird things.

So I'm not sure this is a good direction. I'd rather advocate to create
true hists for each callers, all having the same real period as the leaf.

Also this feature reminds me a lot the -b option in perf report.
Branch sorting and callchain inclusive sorting are a bit different in
the way they handle the things but the core idea is the same. Callchains
are branches as well.

Branch sorting (-b) adds a hist for every branch taken, and the period
is always 1. I wonder if this makes more sense than using the original
period of the event for all branches of the event. Not sure.

Anyway I wonder if both features can be better integrated. After all
they are about the same thing. The difference is that the source of
the branches is not the same and that callchains can be depicted into
trees.

So perhaps we can have -b specifying the desired source, in case both
are present: -b callchain and -b branch. Both at the same time wouldn't
make much sense I think.

And the source could default to either if we don't have callchain and
branch at the same time in the events.

Just an idea...


* Re: [PATCH] perf: Add a new sort order: SORT_INCLUSIVE (v4)
  2012-03-15 14:14 ` Frederic Weisbecker
@ 2012-03-15 17:58     ` Arun Sharma
  0 siblings, 0 replies; 13+ messages in thread
From: Arun Sharma @ 2012-03-15 17:58 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mike Galbraith, Paul Mackerras, Peter Zijlstra, Stephane Eranian,
	Namhyung Kim, Tom Zanussi, linux-perf-users

On 3/15/12 7:14 AM, Frederic Weisbecker wrote:

> I still feel concerned about this.
>
> If I have only one event with a period of 1 and with that callchain:
>
> 	a ->  b ->  c
>
> Then I produce three hists
>
> 	1) a ->  b ->  c
> 	2) a ->  b
> 	3) a
>
> Each hist have a period of 1, but the total period is 1.
> So the end result should be (IIUC):
>
> 100%    foo     a
> 100%    foo     b
>                  |
>                  --- a
> 100%    foo     c
>                  |
>                  --- b
>                      |
>                      --- c
>

That is correct. The first column no longer adds up to 100%.
  		
> And the percentages on callchain branches will have the same kind
> of weird things.

I expect --sort inclusive to be used with -g graph,0.5,caller. I can
polish this in the next rev where a single top level flag will set this up.

The percentages on the branches should still be accurate (as a 
percentage of total_period). Please let me know if this is not the case.

>
> So I'm not sure this is a good direction. I'd rather advocate to create
> true hists for each callers, all having the same real period as the leaf.
>

Please see the v5 I just posted. The callers have a true histogram entry 
in every sense, except that period_self == 0.

If we don't do this, total_period will be inflated.
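
(A sketch of the arithmetic, assuming a single sample of period P and a
callchain of depth 3: the three entries' inclusive periods sum to 3P, but
total_period is accumulated from period_self - P for the leaf, 0 for the
callers - so it stays at P and the callchain percentages remain fractions
of the real sample total.)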

> Also this feature reminds me a lot the -b option in perf report.
> Branch sorting and callchain inclusive sorting are a bit different in
> the way they handle the things but the core idea is the same. Callchains
> are branches as well.
>

Yes - I kept asking why the branch stack stuff doesn't use the existing 
callchain logic.

> Branch sorting (-b) adds a hist for every branch taken, and the period
> is always 1. I wonder if this makes more sense than using the original
> period of the event for all branches of the event. Not sure.
>
> Anyway I wonder if both features can be better integrated. After all
> they are about the same thing. The difference is that the source of
> the branches is not the same and that callchains can be depicted into
> trees.
>
> So perhaps we can have -b specifying the desired source, in case both
> are present: -b callchain and -b branch. Both at the same time wouldn't
> make much sense I think.
>
> And the source could default to either if we don't have callchain and
> branch at the same time in the events.
>
> Just an idea...

I haven't played much with the branch stack logic. Will do so and get back.

In the meanwhile, my impression is that there are two high level use cases:

* Compiler optimizers, tracing JITs etc

Which try to focus on a single branch and try to understand what 
happened with that branch

* Programmers who're trying to understand the behavior of the code they 
wrote in production

I think the branch-stack stuff primarily caters to the former and 
inclusive callchain stuff to the latter. I was thinking that getting the 
branch-stack data into callchains will make the data more useful to more 
people.

  -Arun


* Re: [PATCH] perf: Add a new sort order: SORT_INCLUSIVE (v4)
  2012-03-15 17:58     ` Arun Sharma
  (?)
@ 2012-03-19 15:57     ` Frederic Weisbecker
  2012-03-20 23:28         ` Arun Sharma
  -1 siblings, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2012-03-19 15:57 UTC (permalink / raw)
  To: Arun Sharma
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mike Galbraith, Paul Mackerras, Peter Zijlstra, Stephane Eranian,
	Namhyung Kim, Tom Zanussi, linux-perf-users

On Thu, Mar 15, 2012 at 10:58:38AM -0700, Arun Sharma wrote:
> On 3/15/12 7:14 AM, Frederic Weisbecker wrote:
> 
> >I still feel concerned about this.
> >
> >If I have only one event with a period of 1 and with that callchain:
> >
> >	a ->  b ->  c
> >
> >Then I produce three hists
> >
> >	1) a ->  b ->  c
> >	2) a ->  b
> >	3) a
> >
> >Each hist have a period of 1, but the total period is 1.
> >So the end result should be (IIUC):
> >
> >100%    foo     a
> >100%    foo     b
> >                 |
> >                 --- a
> >100%    foo     c
> >                 |
> >                 --- b
> >                     |
> >                     --- c
> >
> 
> That is correct. The first column no longer adds up to 100%.

So do we really want this?

>  		
> >And the percentages on callchain branches will have the same kind
> >of weird things.
> 
> I expect --sort inclusive to be used with -g graph,0.5,caller. I can
> polish this in the next rev where a single top level flag will set this up.
>
> The percentages on the branches should still be accurate (as a
> percentage of total_period). Please let me know if this is not the
> case.
> >
> >So I'm not sure this is a good direction. I'd rather advocate to create
> >true hists for each callers, all having the same real period as the leaf.
> >
> 
> Please see the v5 I just posted. The callers have a true histogram
> entry in every sense, except that period_self == 0.
> 
> If we don't do this, total_period will be inflated.

Yeah right I've just tried and callchains look right. I'm just puzzled
by the percentages:

+  98,99%  [k] execve
+  98,99%  [k] stub_execve
+  98,99%  [k] do_execve
+  98,99%  [k] do_execve_common
+  98,99%  [k] sys_execve
+  53,12%  [k] __libc_start_main
+  53,12%  [k] cmd_record
+  53,12%  [k] T.101
+  53,12%  [k] main
+  53,12%  [k] run_builtin
+  52,11%  [k] perf_evlist__prepare_workload
+  52,09%  [k] T.1163

> 
> >Also this feature reminds me a lot the -b option in perf report.
> >Branch sorting and callchain inclusive sorting are a bit different in
> >the way they handle the things but the core idea is the same. Callchains
> >are branches as well.
> >
> 
> Yes - I kept asking why the branch stack stuff doesn't use the
> existing callchain logic.

Because I fear that loops branches could make the tree representation useless.

> 
> >Branch sorting (-b) adds a hist for every branch taken, and the period
> >is always 1. I wonder if this makes more sense than using the original
> >period of the event for all branches of the event. Not sure.
> >
> >Anyway I wonder if both features can be better integrated. After all
> >they are about the same thing. The difference is that the source of
> >the branches is not the same and that callchains can be depicted into
> >trees.
> >
> >So perhaps we can have -b specifying the desired source, in case both
> >are present: -b callchain and -b branch. Both at the same time wouldn't
> >make much sense I think.
> >
> >And the source could default to either if we don't have callchain and
> >branch at the same time in the events.
> >
> >Just an idea...
> 
> I haven't played much with the branch stack logic. Will do so and get back.
> 
> In the meanwhile, my impression is that there are two high level use cases:
> 
> * Compiler optimizers, tracing JITs etc
> 
> Which try to focus on a single branch and try to understand what
> happened with that branch
> 
> * Programmers who're trying to understand the behavior of the code
> they wrote in production
> 
> I think the branch-stack stuff primarily caters to the former and
> inclusive callchain stuff to the latter. I was thinking that getting
> the branch-stack data into callchains will make the data more useful
> to more people.

I don't know. "if/else" generated branch could be relevant when represented
in a tree like we do for callchains. But I fear this doesn't work anymore
once we deal with loops.

> 
>  -Arun


* Re: [PATCH] perf: Add a new sort order: SORT_INCLUSIVE (v4)
  2012-03-19 15:57     ` Frederic Weisbecker
@ 2012-03-20 23:28         ` Arun Sharma
  0 siblings, 0 replies; 13+ messages in thread
From: Arun Sharma @ 2012-03-20 23:28 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mike Galbraith, Paul Mackerras, Peter Zijlstra, Stephane Eranian,
	Namhyung Kim, Tom Zanussi, linux-perf-users

On 3/19/12 8:57 AM, Frederic Weisbecker wrote:

>>> Each hist have a period of 1, but the total period is 1.
>>> So the end result should be (IIUC):
>>>
>>> 100%    foo     a
>>> 100%    foo     b
>>>                  |
>>>                  --- a
>>> 100%    foo     c
>>>                  |
>>>                  --- b
>>>                      |
>>>                      --- c
>>>
>>
>> That is correct. The first column no longer adds up to 100%.
>
> So do we really want this?
>

I think so. It's a different way of presenting the data. Pie chart vs a 
bar chart of OS market share where people may be using more than one OS.

I'll post some documentation updates.

>> If we don't do this, total_period will be inflated.
>
> Yeah right I've just tried and callchains look right. I'm just puzzled
> by the percentages:
>

Thanks for testing this!

> +  98,99%  [k] execve
> +  98,99%  [k] stub_execve
> +  98,99%  [k] do_execve
> +  98,99%  [k] do_execve_common
> +  98,99%  [k] sys_execve
> +  53,12%  [k] __libc_start_main
> +  53,12%  [k] cmd_record

These look like they belong to the perf binary and are incorrectly 
classified as kernel samples. Problem is that callchain_get() is not 
populating the privilege level - it's simply propagating the privilege 
level of the sample:


+       for (i = 0; i < cursor->nr; i++) {
+               struct addr_location al_child = *al;
+
+               err = callchain_get(&iter, &al_child);

Not all fields of al_child are populated by callchain_get().

> +  53,12%  [k] T.101
> +  53,12%  [k] main
> +  53,12%  [k] run_builtin
> +  52,11%  [k] perf_evlist__prepare_workload
> +  52,09%  [k] T.1163

The rest of them look ok to me. If something doesn't make sense, please 
point me at the output of "perf script".

>
>>
>>> Also this feature reminds me a lot the -b option in perf report.
>>> Branch sorting and callchain inclusive sorting are a bit different in
>>> the way they handle the things but the core idea is the same. Callchains
>>> are branches as well.
>>>
>>
>> Yes - I kept asking why the branch stack stuff doesn't use the
>> existing callchain logic.
>
> Because I fear that loops branches could make the tree representation useless.
>

The loops could happen in callgraphs too right (eg: recursive programs)? 
The other problem in branch stacks/LBR is that they're sampled branches. 
Just because I got a sample with:

a -> b
b -> c

doesn't necessarily mean that the callchain was a -> b -> c.

I still don't have the branch stack setup working properly. But I'm now 
more sympathetic to the view that last branch sampling and callchains 
may have different representations in perf.

  -Arun


* Re: [PATCH] perf: Add a new sort order: SORT_INCLUSIVE (v4)
  2012-03-20 23:28         ` Arun Sharma
  (?)
@ 2012-03-25  2:14         ` Frederic Weisbecker
  2012-03-27 18:09             ` Arun Sharma
  -1 siblings, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2012-03-25  2:14 UTC (permalink / raw)
  To: Arun Sharma
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mike Galbraith, Paul Mackerras, Peter Zijlstra, Stephane Eranian,
	Namhyung Kim, Tom Zanussi, linux-perf-users

On Tue, Mar 20, 2012 at 04:28:09PM -0700, Arun Sharma wrote:
> On 3/19/12 8:57 AM, Frederic Weisbecker wrote:
> 
> >>>Each hist have a period of 1, but the total period is 1.
> >>>So the end result should be (IIUC):
> >>>
> >>>100%    foo     a
> >>>100%    foo     b
> >>>                 |
> >>>                 --- a
> >>>100%    foo     c
> >>>                 |
> >>>                 --- b
> >>>                     |
> >>>                     --- c
> >>>
> >>
> >>That is correct. The first column no longer adds up to 100%.
> >
> >So do we really want this?
> >
> 
> I think so. It's a different way of presenting the data. Pie chart
> vs a bar chart of OS market share where people may be using more
> than one OS.
> 
> I'll post some documentation updates.

Ok this is one way to do this. The -b option has chosen to use a
period of 1 for everyone.

But both are doing the same thing, except that in the case of
callchains we also rewind the stacktrace at each entry.

But otherwise they deal with the same idea. I wish we had
a unified way of representing both. Either both should use
a period of 1 and add to the total period, or we should use
your way - I don't know. But I have the feeling they shouldn't
be treated differently.

> 
> >>If we don't do this, total_period will be inflated.
> >
> >Yeah right I've just tried and callchains look right. I'm just puzzled
> >by the percentages:
> >
> 
> Thanks for testing this!
> 
> >+  98,99%  [k] execve
> >+  98,99%  [k] stub_execve
> >+  98,99%  [k] do_execve
> >+  98,99%  [k] do_execve_common
> >+  98,99%  [k] sys_execve
> >+  53,12%  [k] __libc_start_main
> >+  53,12%  [k] cmd_record
> 
> These look like they belong to the perf binary and are incorrectly
> classified as kernel samples. Problem is that callchain_get() is not
> populating the privilege level - it's simply propagating the
> privilege level of the sample:

Ah so you should perhaps rather look at the raw callchain from the
sample. It contains the PERF_CONTEXT_* things
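
Roughly (a sketch only, not part of this patch, mirroring what
machine__resolve_callchain() already does before filling the cursor): the raw
chain interleaves PERF_CONTEXT_* markers with the real IPs, so a walker can
track the privilege level as it goes:

static u8 chain_ip__cpumode(u64 ip, u8 cpumode)
{
	if (ip >= PERF_CONTEXT_MAX) {	/* context marker, not a real IP */
		switch (ip) {
		case PERF_CONTEXT_HV:
			return PERF_RECORD_MISC_HYPERVISOR;
		case PERF_CONTEXT_KERNEL:
			return PERF_RECORD_MISC_KERNEL;
		case PERF_CONTEXT_USER:
			return PERF_RECORD_MISC_USER;
		default:
			break;
		}
	}
	return cpumode;	/* keep the current mode */
}

(chain_ip__cpumode is a made-up name here.) The cursor, or callchain_get(),
could then carry that mode so each node's map/symbol is resolved with the
right privilege level instead of inheriting the sample's.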

> >>>Also this feature reminds me a lot the -b option in perf report.
> >>>Branch sorting and callchain inclusive sorting are a bit different in
> >>>the way they handle the things but the core idea is the same. Callchains
> >>>are branches as well.
> >>>
> >>
> >>Yes - I kept asking why the branch stack stuff doesn't use the
> >>existing callchain logic.
> >
> >Because I fear that loops branches could make the tree representation useless.
> >
> 
> The loops could happen in callgraphs too right (eg: recursive
> programs)?

Right, but loops are a common construct used in most programs. Recursive
functions are much rarer - rare enough for us to assume that we can build a
tree where branches aren't typically hit more than one time.

Another thing with callchains vs. branches: with callchains we generalize
the sample IPs to the symbol of the function they are contained in. We
want this kind of generalization for callchains.

This is not true with branches, where the details are zoomed in, so there is
less chance for different branch samples to match each other inside a tree.

> The other problem in branch stacks/LBR is that they're
> sampled branches. Just because I got a sample with:
> 
> a -> b
> b -> c
> 
> doesn't necessarily mean that the callchain was a -> b -> c.

Not sure what you mean. If you have a -> b, b -> c in single
LBR sample it means you got a -> b -> c.

> 
> I still don't have the branch stack setup working properly. But I'm
> now more sympathetic to the view that last branch sampling and
> callchains may have different representations in perf.
> 
>  -Arun


* Re: [PATCH] perf: Add a new sort order: SORT_INCLUSIVE (v4)
  2012-03-25  2:14         ` Frederic Weisbecker
@ 2012-03-27 18:09             ` Arun Sharma
  0 siblings, 0 replies; 13+ messages in thread
From: Arun Sharma @ 2012-03-27 18:09 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mike Galbraith, Paul Mackerras, Peter Zijlstra, Stephane Eranian,
	Namhyung Kim, Tom Zanussi, linux-perf-users

On 3/24/12 7:14 PM, Frederic Weisbecker wrote:

>> The other problem in branch stacks/LBR is that they're
>> sampled branches. Just because I got a sample with:
>>
>> a ->  b
>> b ->  c
>>
>> doesn't necessarily mean that the callchain was a ->  b ->  c.
>
> Not sure what you mean. If you have a ->  b, b ->  c in single
> LBR sample it means you got a ->  b ->  c.
>

I was going by Stephane's commit message here:

http://article.gmane.org/gmane.linux.kernel/1236999

 > Statistical sampling of taken branch should not be confused
 > for branch tracing. Not all branches are necessarily captured

Stephane, could you please explain if the 16 filtered branches in LBR 
are guaranteed to be from a given callchain to the leaf function? My 
understanding is that it's not.

callchain1: a -> b -> d -> e (sample a->b)
callchain2: a -> c -> b -> f (sample b->f)

on PMU interrupt can we end up with:

b -> f <- top of stack
a -> b
...

even though a -> b -> f can never happen in the actual program flow?

  -Arun


* Re: [PATCH] perf: Add a new sort order: SORT_INCLUSIVE (v4)
  2012-03-27 18:09             ` Arun Sharma
  (?)
@ 2012-03-27 19:38             ` Peter Zijlstra
  -1 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2012-03-27 19:38 UTC (permalink / raw)
  To: Arun Sharma
  Cc: Frederic Weisbecker, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mike Galbraith, Paul Mackerras,
	Stephane Eranian, Namhyung Kim, Tom Zanussi, linux-perf-users

On Tue, 2012-03-27 at 11:09 -0700, Arun Sharma wrote:
> On 3/24/12 7:14 PM, Frederic Weisbecker wrote:
> 
> >> The other problem in branch stacks/LBR is that they're
> >> sampled branches. Just because I got a sample with:
> >>
> >> a ->  b
> >> b ->  c
> >>
> >> doesn't necessarily mean that the callchain was a ->  b ->  c.
> >
> > Not sure what you mean. If you have a ->  b, b ->  c in single
> > LBR sample it means you got a ->  b ->  c.
> >
> 
> I was going by Stephane's commit message here:
> 
> http://article.gmane.org/gmane.linux.kernel/1236999
> 
>  > Statistical sampling of taken branch should not be confused
>  > for branch tracing. Not all branches are necessarily captured
> 
> Stephane, could you please explain if the 16 filtered branches in LBR 
> are guaranteed to be from a given callchain to the leaf function? My 
> understanding is that it's not.
> 
> callchain1: a -> b -> d -> e (sample a->b)
> callchain2: a -> c -> b -> f (sample b->f)
> 
> on PMU interrupt can we end up with:
> 
> b -> f <- top of stack
> a -> b
> ...
> 
> even though a -> b -> f can never happen in the actual program flow?

Right, so the LBR is a queue not a stack. A program like:

foo() {
	bar1();
	bar2();
}

will, using the lbr, look like: foo->bar1->bar2 (if you filter returns),
or foo->bar1->foo+x->bar2 if you include returns.

A callchain is a pure stack: a return pops the topmost entry, so the above
program can only give 3 possible callchains:

a) foo
b) foo, bar1
c) foo, bar2

Furthermore, the LBR is about any branch, callchains are about function
calls.

