* [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
@ 2013-10-31  6:56 Namhyung Kim
  2013-10-31  6:56 ` [PATCH 01/14] perf tools: Consolidate __hists__add_*entry() Namhyung Kim
                   ` (14 more replies)
  0 siblings, 15 replies; 64+ messages in thread
From: Namhyung Kim @ 2013-10-31  6:56 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Namhyung Kim, LKML,
	Frederic Weisbecker, Stephane Eranian, Jiri Olsa, Rodrigo Campos,
	Arun Sharma

Hi,

This is my second attempt at implementing cumulative hist period
reporting.  This work began from Arun's SORT_INCLUSIVE patch [1], but I
completely rewrote it from scratch.

Please see the first two patches.  I refactored the functions that add
hist entries using struct add_entry_iter.  While I converted all of the
functions carefully, it'd be better if someone could test and confirm
that I didn't mess anything up - especially the branch stack and mem
code.

This patchset basically adds the period of each sample to every node in
its callchain.  A hist_entry now has an additional field to keep the
cumulative period, used when the -g cumulative option is given to perf
report.

Let me show you an example:

  $ cat abc.c
  #define barrier() asm volatile("" ::: "memory")

  void a(void)
  {
  	int i;
  	for (i = 0; i < 1000000; i++)
  		barrier();
  }
  void b(void)
  {
  	a();
  }
  void c(void)
  {
  	b();
  }
  int main(void)
  {
  	c();
  	return 0;
  }

With this simple program I ran perf record and report:

  $ perf record -g -e cycles:u ./abc

  $ perf report --stdio
      88.29%      abc  abc                [.] a                  
                  |
                  --- a
                      b
                      c
                      main
                      __libc_start_main

       9.43%      abc  ld-2.17.so         [.] _dl_relocate_object
                  |
                  --- _dl_relocate_object
                      dl_main
                      _dl_sysdep_start

       2.27%      abc  [kernel.kallsyms]  [k] page_fault         
                  |
                  --- page_fault
                     |          
                     |--95.94%-- _dl_sysdep_start
                     |          _dl_start_user
                     |          
                      --4.06%-- _start

       0.00%      abc  ld-2.17.so         [.] _start             
                  |
                  --- _start


When the -g cumulative option is given, it'll be shown like this:

  $ perf report -g cumulative --stdio

  # Overhead  Overhead (Acc)  Command      Shared Object                   Symbol
  # ........  ..............  .......  .................  .......................
  #
       0.00%          88.29%      abc  libc-2.17.so       [.] __libc_start_main  
       0.00%          88.29%      abc  abc                [.] main               
       0.00%          88.29%      abc  abc                [.] c                  
       0.00%          88.29%      abc  abc                [.] b                  
      88.29%          88.29%      abc  abc                [.] a                  
       0.00%          11.61%      abc  ld-2.17.so         [k] _dl_sysdep_start   
       0.00%           9.43%      abc  ld-2.17.so         [.] dl_main            
       9.43%           9.43%      abc  ld-2.17.so         [.] _dl_relocate_object
       2.27%           2.27%      abc  [kernel.kallsyms]  [k] page_fault         
       0.00%           2.18%      abc  ld-2.17.so         [k] _dl_start_user     
       0.00%           0.10%      abc  ld-2.17.so         [.] _start             

As you can see, the __libc_start_main -> main -> c -> b -> a callchain
shows up in the output.

I know it has some rough edges or even bugs, but I really want to
release it and get reviews.  It does not handle event groups or
annotations yet, and it has a bug in the TUI.

You can also get this series on 'perf/cumulate-v2' branch in my tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git


Any comments are welcome, thanks.
Namhyung


Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>

[1] https://lkml.org/lkml/2012/3/31/6

Namhyung Kim (14):
  perf tools: Consolidate __hists__add_*entry()
  perf tools: Introduce struct add_entry_iter
  perf hists: Convert hist entry functions to use struct he_stat
  perf hists: Add support for accumulated stat of hist entry
  perf hists: Check if accumulated when adding a hist entry
  perf hists: Accumulate hist entry stat based on the callchain
  perf tools: Update cpumode for each cumulative entry
  perf report: Cache cumulative callchains
  perf hists: Sort hist entries by accumulated period
  perf ui/hist: Add support to accumulated hist stat
  perf ui/browser: Add support to accumulated hist stat
  perf ui/gtk: Add support to accumulated hist stat
  perf tools: Apply percent-limit to cumulative percentage
  perf report: Add -g cumulative option

 tools/perf/Documentation/perf-report.txt |   2 +
 tools/perf/builtin-annotate.c            |   3 +-
 tools/perf/builtin-diff.c                |   3 +-
 tools/perf/builtin-report.c              | 659 ++++++++++++++++++++++++-------
 tools/perf/builtin-top.c                 |   5 +-
 tools/perf/tests/hists_link.c            |   6 +-
 tools/perf/ui/browsers/hists.c           |  32 +-
 tools/perf/ui/gtk/hists.c                |  20 +
 tools/perf/ui/hist.c                     |  41 ++
 tools/perf/ui/stdio/hist.c               |   5 +
 tools/perf/util/callchain.c              |  12 +
 tools/perf/util/callchain.h              |   3 +-
 tools/perf/util/hist.c                   | 142 +++----
 tools/perf/util/hist.h                   |  22 +-
 tools/perf/util/sort.h                   |   1 +
 15 files changed, 701 insertions(+), 255 deletions(-)

-- 
1.7.11.7



* [PATCH 01/14] perf tools: Consolidate __hists__add_*entry()
  2013-10-31  6:56 [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Namhyung Kim
@ 2013-10-31  6:56 ` Namhyung Kim
  2013-11-01 11:56   ` Jiri Olsa
  2013-11-06  5:43   ` [tip:perf/core] perf hists: " tip-bot for Namhyung Kim
  2013-10-31  6:56 ` [PATCH 02/14] perf tools: Introduce struct add_entry_iter Namhyung Kim
                   ` (13 subsequent siblings)
  14 siblings, 2 replies; 64+ messages in thread
From: Namhyung Kim @ 2013-10-31  6:56 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Namhyung Kim, LKML,
	Frederic Weisbecker, Stephane Eranian, Jiri Olsa, Rodrigo Campos

From: Namhyung Kim <namhyung.kim@lge.com>

The __hists__add_{branch,mem}_entry() functions did almost the same
thing as __hists__add_entry().  Consolidate them into one.

Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-annotate.c |  2 +-
 tools/perf/builtin-diff.c     |  3 +-
 tools/perf/builtin-report.c   | 16 +++++++---
 tools/perf/builtin-top.c      |  5 +--
 tools/perf/tests/hists_link.c |  6 ++--
 tools/perf/util/hist.c        | 73 +++++--------------------------------------
 tools/perf/util/hist.h        | 18 ++---------
 7 files changed, 30 insertions(+), 93 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 6c5ae57831f6..4087ab19823c 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -65,7 +65,7 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
 		return 0;
 	}
 
-	he = __hists__add_entry(&evsel->hists, al, NULL, 1, 1, 0);
+	he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL, 1, 1, 0);
 	if (he == NULL)
 		return -ENOMEM;
 
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index b605009e803f..3b67ea2444bd 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -307,7 +307,8 @@ static int hists__add_entry(struct hists *hists,
 			    struct addr_location *al, u64 period,
 			    u64 weight, u64 transaction)
 {
-	if (__hists__add_entry(hists, al, NULL, period, weight, transaction) != NULL)
+	if (__hists__add_entry(hists, al, NULL, NULL, NULL, period, weight,
+			       transaction) != NULL)
 		return 0;
 	return -ENOMEM;
 }
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 98d3891392e2..a7a8f7769629 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -115,7 +115,8 @@ static int perf_report__add_mem_hist_entry(struct perf_tool *tool,
 	 * and this is indirectly achieved by passing period=weight here
 	 * and the he_stat__add_period() function.
 	 */
-	he = __hists__add_mem_entry(&evsel->hists, al, parent, mi, cost, cost);
+	he = __hists__add_entry(&evsel->hists, al, parent, NULL, mi,
+				cost, cost, 0);
 	if (!he)
 		return -ENOMEM;
 
@@ -200,12 +201,16 @@ static int perf_report__add_branch_hist_entry(struct perf_tool *tool,
 
 		err = -ENOMEM;
 
+		/* overwrite the 'al' to branch-to info */
+		al->map = bi[i].to.map;
+		al->sym = bi[i].to.sym;
+		al->addr = bi[i].to.addr;
 		/*
 		 * The report shows the percentage of total branches captured
 		 * and not events sampled. Thus we use a pseudo period of 1.
 		 */
-		he = __hists__add_branch_entry(&evsel->hists, al, parent,
-				&bi[i], 1, 1);
+		he = __hists__add_entry(&evsel->hists, al, parent, &bi[i], NULL,
+					1, 1, 0);
 		if (he) {
 			struct annotation *notes;
 			bx = he->branch_info;
@@ -266,8 +271,9 @@ static int perf_evsel__add_hist_entry(struct perf_tool *tool,
 			return err;
 	}
 
-	he = __hists__add_entry(&evsel->hists, al, parent, sample->period,
-				sample->weight, sample->transaction);
+	he = __hists__add_entry(&evsel->hists, al, parent, NULL, NULL,
+				sample->period, sample->weight,
+				sample->transaction);
 	if (he == NULL)
 		return -ENOMEM;
 
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 04f5bf2d8e10..5c538a43f263 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -246,8 +246,9 @@ static struct hist_entry *perf_evsel__add_hist_entry(struct perf_evsel *evsel,
 	struct hist_entry *he;
 
 	pthread_mutex_lock(&evsel->hists.lock);
-	he = __hists__add_entry(&evsel->hists, al, NULL, sample->period,
-				sample->weight, sample->transaction);
+	he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL,
+				sample->period, sample->weight,
+				sample->transaction);
 	pthread_mutex_unlock(&evsel->hists.lock);
 	if (he == NULL)
 		return NULL;
diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
index b51abcb2c243..76992aba00a7 100644
--- a/tools/perf/tests/hists_link.c
+++ b/tools/perf/tests/hists_link.c
@@ -223,7 +223,7 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
 				goto out;
 
 			he = __hists__add_entry(&evsel->hists, &al, NULL,
-						1, 1, 0);
+						NULL, NULL, 1, 1, 0);
 			if (he == NULL)
 				goto out;
 
@@ -245,8 +245,8 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
 							  &sample) < 0)
 				goto out;
 
-			he = __hists__add_entry(&evsel->hists, &al, NULL, 1, 1,
-						0);
+			he = __hists__add_entry(&evsel->hists, &al, NULL,
+						NULL, NULL, 1, 1, 0);
 			if (he == NULL)
 				goto out;
 
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 7e80253074b0..e6f953190443 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -407,71 +407,12 @@ out:
 	return he;
 }
 
-struct hist_entry *__hists__add_mem_entry(struct hists *hists,
-					  struct addr_location *al,
-					  struct symbol *sym_parent,
-					  struct mem_info *mi,
-					  u64 period,
-					  u64 weight)
-{
-	struct hist_entry entry = {
-		.thread	= al->thread,
-		.ms = {
-			.map	= al->map,
-			.sym	= al->sym,
-		},
-		.stat = {
-			.period	= period,
-			.weight = weight,
-			.nr_events = 1,
-		},
-		.cpu	= al->cpu,
-		.ip	= al->addr,
-		.level	= al->level,
-		.parent = sym_parent,
-		.filtered = symbol__parent_filter(sym_parent),
-		.hists = hists,
-		.mem_info = mi,
-		.branch_info = NULL,
-	};
-	return add_hist_entry(hists, &entry, al, period, weight);
-}
-
-struct hist_entry *__hists__add_branch_entry(struct hists *hists,
-					     struct addr_location *al,
-					     struct symbol *sym_parent,
-					     struct branch_info *bi,
-					     u64 period,
-					     u64 weight)
-{
-	struct hist_entry entry = {
-		.thread	= al->thread,
-		.ms = {
-			.map	= bi->to.map,
-			.sym	= bi->to.sym,
-		},
-		.cpu	= al->cpu,
-		.ip	= bi->to.addr,
-		.level	= al->level,
-		.stat = {
-			.period	= period,
-			.nr_events = 1,
-			.weight = weight,
-		},
-		.parent = sym_parent,
-		.filtered = symbol__parent_filter(sym_parent),
-		.branch_info = bi,
-		.hists	= hists,
-		.mem_info = NULL,
-	};
-
-	return add_hist_entry(hists, &entry, al, period, weight);
-}
-
 struct hist_entry *__hists__add_entry(struct hists *hists,
 				      struct addr_location *al,
-				      struct symbol *sym_parent, u64 period,
-				      u64 weight, u64 transaction)
+				      struct symbol *sym_parent,
+				      struct branch_info *bi,
+				      struct mem_info *mi,
+				      u64 period, u64 weight, u64 transaction)
 {
 	struct hist_entry entry = {
 		.thread	= al->thread,
@@ -483,15 +424,15 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
 		.ip	= al->addr,
 		.level	= al->level,
 		.stat = {
-			.period	= period,
 			.nr_events = 1,
+			.period	= period,
 			.weight = weight,
 		},
 		.parent = sym_parent,
 		.filtered = symbol__parent_filter(sym_parent),
 		.hists	= hists,
-		.branch_info = NULL,
-		.mem_info = NULL,
+		.branch_info = bi,
+		.mem_info = mi,
 		.transaction = transaction,
 	};
 
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 9d2d022cdb79..307f1c742563 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -86,7 +86,9 @@ struct hists {
 
 struct hist_entry *__hists__add_entry(struct hists *self,
 				      struct addr_location *al,
-				      struct symbol *parent, u64 period,
+				      struct symbol *parent,
+				      struct branch_info *bi,
+				      struct mem_info *mi, u64 period,
 				      u64 weight, u64 transaction);
 int64_t hist_entry__cmp(struct hist_entry *left, struct hist_entry *right);
 int64_t hist_entry__collapse(struct hist_entry *left, struct hist_entry *right);
@@ -95,20 +97,6 @@ int hist_entry__sort_snprintf(struct hist_entry *self, char *bf, size_t size,
 			      struct hists *hists);
 void hist_entry__free(struct hist_entry *);
 
-struct hist_entry *__hists__add_branch_entry(struct hists *self,
-					     struct addr_location *al,
-					     struct symbol *sym_parent,
-					     struct branch_info *bi,
-					     u64 period,
-					     u64 weight);
-
-struct hist_entry *__hists__add_mem_entry(struct hists *self,
-					  struct addr_location *al,
-					  struct symbol *sym_parent,
-					  struct mem_info *mi,
-					  u64 period,
-					  u64 weight);
-
 void hists__output_resort(struct hists *self);
 void hists__collapse_resort(struct hists *self, struct ui_progress *prog);
 
-- 
1.7.11.7



* [PATCH 02/14] perf tools: Introduce struct add_entry_iter
  2013-10-31  6:56 [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Namhyung Kim
  2013-10-31  6:56 ` [PATCH 01/14] perf tools: Consolidate __hists__add_*entry() Namhyung Kim
@ 2013-10-31  6:56 ` Namhyung Kim
  2013-11-01 12:07   ` Jiri Olsa
  2013-11-01 12:09   ` Jiri Olsa
  2013-10-31  6:56 ` [PATCH 03/14] perf hists: Convert hist entry functions to use struct he_stat Namhyung Kim
                   ` (12 subsequent siblings)
  14 siblings, 2 replies; 64+ messages in thread
From: Namhyung Kim @ 2013-10-31  6:56 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Namhyung Kim, LKML,
	Frederic Weisbecker, Stephane Eranian, Jiri Olsa, Rodrigo Campos

From: Namhyung Kim <namhyung.kim@lge.com>

There is some duplicated code when adding hist entries.  The variants
differ in that some have branch info or mem info, but they generally do
the same thing.  So introduce a new struct add_entry_iter with
callbacks to customize each case in a generic way.

The new perf_evsel__add_entry() function will look like:

  iter->prepare_entry();
  iter->add_single_entry();

  while (iter->next_entry())
    iter->add_next_entry();

  iter->finish_entry();

This will help further work like the cumulative callchain patchset.

Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-report.c | 435 +++++++++++++++++++++++++++++---------------
 1 file changed, 284 insertions(+), 151 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index a7a8f7769629..f18cd43687d9 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -73,38 +73,74 @@ static int perf_report_config(const char *var, const char *value, void *cb)
 	return perf_default_config(var, value, cb);
 }
 
-static int perf_report__add_mem_hist_entry(struct perf_tool *tool,
-					   struct addr_location *al,
-					   struct perf_sample *sample,
-					   struct perf_evsel *evsel,
-					   struct machine *machine,
-					   union perf_event *event)
-{
-	struct perf_report *rep = container_of(tool, struct perf_report, tool);
-	struct symbol *parent = NULL;
-	u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
-	int err = 0;
+struct add_entry_iter {
+	int total;
+	int curr;
+
+	struct perf_report *rep;
+	struct perf_evsel *evsel;
+	struct perf_sample *sample;
 	struct hist_entry *he;
-	struct mem_info *mi, *mx;
-	uint64_t cost;
+	struct symbol *parent;
+	void *priv;
+
+	int (*prepare_entry)(struct add_entry_iter *, struct machine *,
+			     struct perf_evsel *, struct addr_location *,
+			     struct perf_sample *);
+	int (*add_single_entry)(struct add_entry_iter *, struct addr_location *);
+	int (*next_entry)(struct add_entry_iter *, struct addr_location *);
+	int (*add_next_entry)(struct add_entry_iter *, struct addr_location *);
+	int (*finish_entry)(struct add_entry_iter *, struct addr_location *);
+};
 
-	if ((sort__has_parent || symbol_conf.use_callchain) &&
-	    sample->callchain) {
-		err = machine__resolve_callchain(machine, evsel, al->thread,
-						 sample, &parent, al,
-						 rep->max_stack);
-		if (err)
-			return err;
-	}
+static int
+iter_next_nop_entry(struct add_entry_iter *iter __maybe_unused,
+		    struct addr_location *al __maybe_unused)
+{
+	return 0;
+}
+
+static int
+iter_add_next_nop_entry(struct add_entry_iter *iter __maybe_unused,
+			struct addr_location *al __maybe_unused)
+{
+	return 0;
+}
+
+static int
+iter_prepare_mem_entry(struct add_entry_iter *iter, struct machine *machine,
+		       struct perf_evsel *evsel, struct addr_location *al,
+		       struct perf_sample *sample)
+{
+	union perf_event *event = iter->priv;
+	struct mem_info *mi;
+	u8 cpumode;
+
+	BUG_ON(event == NULL);
+
+	cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
 
 	mi = machine__resolve_mem(machine, al->thread, sample, cpumode);
-	if (!mi)
+	if (mi == NULL)
 		return -ENOMEM;
 
-	if (rep->hide_unresolved && !al->sym)
+	iter->evsel = evsel;
+	iter->sample = sample;
+	iter->priv = mi;
+	return 0;
+}
+
+static int
+iter_add_single_mem_entry(struct add_entry_iter *iter, struct addr_location *al)
+{
+	u64 cost;
+	struct mem_info *mi = iter->priv;
+	struct hist_entry *he;
+
+	if (iter->rep->hide_unresolved && !al->sym)
 		return 0;
 
-	cost = sample->weight;
+	cost = iter->sample->weight;
 	if (!cost)
 		cost = 1;
 
@@ -115,11 +151,24 @@ static int perf_report__add_mem_hist_entry(struct perf_tool *tool,
 	 * and this is indirectly achieved by passing period=weight here
 	 * and the he_stat__add_period() function.
 	 */
-	he = __hists__add_entry(&evsel->hists, al, parent, NULL, mi,
+	he = __hists__add_entry(&iter->evsel->hists, al, iter->parent, NULL, mi,
 				cost, cost, 0);
 	if (!he)
 		return -ENOMEM;
 
+	iter->he = he;
+	return 0;
+}
+
+static int
+iter_finish_mem_entry(struct add_entry_iter *iter, struct addr_location *al)
+{
+	struct perf_evsel *evsel = iter->evsel;
+	struct hist_entry *he = iter->he;
+	struct mem_info *mi = iter->priv;
+	int err = -ENOMEM;
+	u64 cost;
+
 	/*
 	 * In the TUI browser, we are doing integrated annotation,
 	 * so we don't allocate the extra space needed because the stdio
@@ -138,152 +187,179 @@ static int perf_report__add_mem_hist_entry(struct perf_tool *tool,
 			goto out;
 	}
 
-	if (sort__has_sym && he->mem_info->daddr.sym && use_browser > 0) {
+	if (sort__has_sym && mi->daddr.sym && use_browser > 0) {
 		struct annotation *notes;
 
-		mx = he->mem_info;
-
-		notes = symbol__annotation(mx->daddr.sym);
-		if (notes->src == NULL && symbol__alloc_hist(mx->daddr.sym) < 0)
+		notes = symbol__annotation(mi->daddr.sym);
+		if (notes->src == NULL && symbol__alloc_hist(mi->daddr.sym) < 0)
 			goto out;
 
-		err = symbol__inc_addr_samples(mx->daddr.sym,
-					       mx->daddr.map,
-					       evsel->idx,
-					       mx->daddr.al_addr);
+		err = symbol__inc_addr_samples(mi->daddr.sym, mi->daddr.map,
+					       evsel->idx, mi->daddr.al_addr);
 		if (err)
 			goto out;
 	}
 
+	cost = iter->sample->weight;
+	if (!cost)
+		cost = 1;
+
 	evsel->hists.stats.total_period += cost;
 	hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
 	err = 0;
 
 	if (symbol_conf.use_callchain) {
-		err = callchain_append(he->callchain,
-				       &callchain_cursor,
-				       sample->period);
+		err = callchain_append(he->callchain, &callchain_cursor,
+				       iter->sample->period);
 	}
+
 out:
 	return err;
 }
 
-static int perf_report__add_branch_hist_entry(struct perf_tool *tool,
-					struct addr_location *al,
-					struct perf_sample *sample,
-					struct perf_evsel *evsel,
-				      struct machine *machine)
+static int
+iter_prepare_branch_entry(struct add_entry_iter *iter, struct machine *machine,
+			  struct perf_evsel *evsel, struct addr_location *al,
+			  struct perf_sample *sample)
 {
-	struct perf_report *rep = container_of(tool, struct perf_report, tool);
-	struct symbol *parent = NULL;
-	int err = 0;
-	unsigned i;
-	struct hist_entry *he;
-	struct branch_info *bi, *bx;
-
-	if ((sort__has_parent || symbol_conf.use_callchain)
-	    && sample->callchain) {
-		err = machine__resolve_callchain(machine, evsel, al->thread,
-						 sample, &parent, al,
-						 rep->max_stack);
-		if (err)
-			return err;
-	}
+	struct branch_info *bi;
 
 	bi = machine__resolve_bstack(machine, al->thread,
 				     sample->branch_stack);
 	if (!bi)
 		return -ENOMEM;
 
-	for (i = 0; i < sample->branch_stack->nr; i++) {
-		if (rep->hide_unresolved && !(bi[i].from.sym && bi[i].to.sym))
-			continue;
+	iter->curr = 0;
+	iter->total = sample->branch_stack->nr;
 
-		err = -ENOMEM;
+	iter->evsel = evsel;
+	iter->sample = sample;
+	iter->priv = bi;
+	return 0;
+}
 
-		/* overwrite the 'al' to branch-to info */
-		al->map = bi[i].to.map;
-		al->sym = bi[i].to.sym;
-		al->addr = bi[i].to.addr;
-		/*
-		 * The report shows the percentage of total branches captured
-		 * and not events sampled. Thus we use a pseudo period of 1.
-		 */
-		he = __hists__add_entry(&evsel->hists, al, parent, &bi[i], NULL,
-					1, 1, 0);
-		if (he) {
-			struct annotation *notes;
-			bx = he->branch_info;
-			if (bx->from.sym && use_browser == 1 && sort__has_sym) {
-				notes = symbol__annotation(bx->from.sym);
-				if (!notes->src
-				    && symbol__alloc_hist(bx->from.sym) < 0)
-					goto out;
-
-				err = symbol__inc_addr_samples(bx->from.sym,
-							       bx->from.map,
-							       evsel->idx,
-							       bx->from.al_addr);
-				if (err)
-					goto out;
-			}
+static int
+iter_add_single_branch_entry(struct add_entry_iter *iter __maybe_unused,
+			     struct addr_location *al __maybe_unused)
+{
+	return 0;
+}
 
-			if (bx->to.sym && use_browser == 1 && sort__has_sym) {
-				notes = symbol__annotation(bx->to.sym);
-				if (!notes->src
-				    && symbol__alloc_hist(bx->to.sym) < 0)
-					goto out;
-
-				err = symbol__inc_addr_samples(bx->to.sym,
-							       bx->to.map,
-							       evsel->idx,
-							       bx->to.al_addr);
-				if (err)
-					goto out;
-			}
-			evsel->hists.stats.total_period += 1;
-			hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
-		} else
-			goto out;
-	}
-	err = 0;
-out:
-	free(bi);
-	return err;
+static int
+iter_next_branch_entry(struct add_entry_iter *iter, struct addr_location *al)
+{
+	struct branch_info *bi = iter->priv;
+	int i = iter->curr;
+
+	if (iter->curr >= iter->total)
+		return 0;
+
+	al->map = bi[i].to.map;
+	al->sym = bi[i].to.sym;
+	al->addr = bi[i].to.addr;
+	return 1;
 }
 
-static int perf_evsel__add_hist_entry(struct perf_tool *tool,
-				      struct perf_evsel *evsel,
-				      struct addr_location *al,
-				      struct perf_sample *sample,
-				      struct machine *machine)
+static int
+iter_add_next_branch_entry(struct add_entry_iter *iter, struct addr_location *al)
 {
-	struct perf_report *rep = container_of(tool, struct perf_report, tool);
-	struct symbol *parent = NULL;
-	int err = 0;
+	struct branch_info *bi, *bx;
+	struct annotation *notes;
+	struct perf_evsel *evsel = iter->evsel;
 	struct hist_entry *he;
+	int i = iter->curr;
+	int err;
 
-	if ((sort__has_parent || symbol_conf.use_callchain) && sample->callchain) {
-		err = machine__resolve_callchain(machine, evsel, al->thread,
-						 sample, &parent, al,
-						 rep->max_stack);
+	bi = iter->priv;
+
+	if (iter->rep->hide_unresolved && !(bi[i].from.sym && bi[i].to.sym))
+		goto out;
+
+	/*
+	 * The report shows the percentage of total branches captured
+	 * and not events sampled. Thus we use a pseudo period of 1.
+	 */
+	he = __hists__add_entry(&evsel->hists, al, iter->parent, &bi[i], NULL,
+				1, 1, 0);
+	if (he == NULL)
+		return -ENOMEM;
+
+	bx = he->branch_info;
+	if (bx->from.sym && use_browser == 1 && sort__has_sym) {
+		notes = symbol__annotation(bx->from.sym);
+		if (!notes->src && symbol__alloc_hist(bx->from.sym) < 0)
+			return -ENOMEM;
+
+		err = symbol__inc_addr_samples(bx->from.sym, bx->from.map,
+					       evsel->idx, bx->from.al_addr);
+		if (err)
+			return err;
+	}
+
+	if (bx->to.sym && use_browser == 1 && sort__has_sym) {
+		notes = symbol__annotation(bx->to.sym);
+		if (!notes->src && symbol__alloc_hist(bx->to.sym) < 0)
+			return -ENOMEM;
+
+		err = symbol__inc_addr_samples(bx->to.sym, bx->to.map,
+					       evsel->idx, bx->to.al_addr);
 		if (err)
 			return err;
 	}
+	evsel->hists.stats.total_period += 1;
+	hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+out:
+	iter->curr++;
+	return 0;
+}
+
+static int
+iter_finish_branch_entry(struct add_entry_iter *iter,
+			 struct addr_location *al __maybe_unused)
+{
+	free(iter->priv);
+
+	return iter->curr >= iter->total ? 0 : -1;
+}
+
+static int
+iter_prepare_normal_entry(struct add_entry_iter *iter,
+			  struct machine *machine __maybe_unused,
+			  struct perf_evsel *evsel,
+			  struct addr_location *al __maybe_unused,
+			  struct perf_sample *sample)
+{
+	iter->evsel = evsel;
+	iter->sample = sample;
+	return 0;
+}
+
+static int
+iter_add_single_normal_entry(struct add_entry_iter *iter, struct addr_location *al)
+{
+	struct perf_evsel *evsel = iter->evsel;
+	struct perf_sample *sample = iter->sample;
+	struct hist_entry *he;
 
-	he = __hists__add_entry(&evsel->hists, al, parent, NULL, NULL,
+	he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
 				sample->period, sample->weight,
 				sample->transaction);
 	if (he == NULL)
 		return -ENOMEM;
 
-	if (symbol_conf.use_callchain) {
-		err = callchain_append(he->callchain,
-				       &callchain_cursor,
-				       sample->period);
-		if (err)
-			return err;
-	}
+	iter->he = he;
+	return 0;
+}
+
+static int
+iter_finish_normal_entry(struct add_entry_iter *iter, struct addr_location *al)
+{
+	int err = 0;
+	struct hist_entry *he = iter->he;
+	struct perf_evsel *evsel = iter->evsel;
+	struct perf_sample *sample = iter->sample;
+
 	/*
 	 * Only in the TUI browser we are doing integrated annotation,
 	 * so we don't allocated the extra space needed because the stdio
@@ -294,19 +370,79 @@ static int perf_evsel__add_hist_entry(struct perf_tool *tool,
 
 		assert(evsel != NULL);
 
-		err = -ENOMEM;
 		if (notes->src == NULL && symbol__alloc_hist(he->ms.sym) < 0)
-			goto out;
+			return -ENOMEM;
 
 		err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
 	}
 
 	evsel->hists.stats.total_period += sample->period;
 	hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
-out:
+
+	if (symbol_conf.use_callchain) {
+		err = callchain_append(he->callchain, &callchain_cursor,
+				       sample->period);
+	}
 	return err;
 }
 
+static struct add_entry_iter mem_iter = {
+	.prepare_entry 		= iter_prepare_mem_entry,
+	.add_single_entry 	= iter_add_single_mem_entry,
+	.next_entry 		= iter_next_nop_entry,
+	.add_next_entry 	= iter_add_next_nop_entry,
+	.finish_entry 		= iter_finish_mem_entry,
+};
+
+static struct add_entry_iter branch_iter = {
+	.prepare_entry 		= iter_prepare_branch_entry,
+	.add_single_entry 	= iter_add_single_branch_entry,
+	.next_entry 		= iter_next_branch_entry,
+	.add_next_entry 	= iter_add_next_branch_entry,
+	.finish_entry 		= iter_finish_branch_entry,
+};
+
+static struct add_entry_iter normal_iter = {
+	.prepare_entry 		= iter_prepare_normal_entry,
+	.add_single_entry 	= iter_add_single_normal_entry,
+	.next_entry 		= iter_next_nop_entry,
+	.add_next_entry 	= iter_add_next_nop_entry,
+	.finish_entry 		= iter_finish_normal_entry,
+};
+
+static int
+perf_evsel__add_entry(struct perf_evsel *evsel, struct addr_location *al,
+		      struct perf_sample *sample, struct machine *machine,
+		      struct add_entry_iter *iter)
+{
+	int err;
+
+	if ((sort__has_parent || symbol_conf.use_callchain) && sample->callchain) {
+		err = machine__resolve_callchain(machine, evsel, al->thread,
+						 sample, &iter->parent, al,
+						 iter->rep->max_stack);
+		if (err)
+			return err;
+	}
+
+	err = iter->prepare_entry(iter, machine, evsel, al, sample);
+	if (err)
+		return err;
+
+	err = iter->add_single_entry(iter, al);
+	if (err)
+		return err;
+
+	while (iter->next_entry(iter, al)) {
+		err = iter->add_next_entry(iter, al);
+		if (err)
+			break;
+	}
+
+	err = iter->finish_entry(iter, al);
+
+	return err;
+}
 
 static int process_sample_event(struct perf_tool *tool,
 				union perf_event *event,
@@ -316,6 +452,7 @@ static int process_sample_event(struct perf_tool *tool,
 {
 	struct perf_report *rep = container_of(tool, struct perf_report, tool);
 	struct addr_location al;
+	struct add_entry_iter *iter;
 	int ret;
 
 	if (perf_event__preprocess_sample(event, machine, &al, sample) < 0) {
@@ -330,25 +467,21 @@ static int process_sample_event(struct perf_tool *tool,
 	if (rep->cpu_list && !test_bit(sample->cpu, rep->cpu_bitmap))
 		return 0;
 
-	if (sort__mode == SORT_MODE__BRANCH) {
-		ret = perf_report__add_branch_hist_entry(tool, &al, sample,
-							 evsel, machine);
-		if (ret < 0)
-			pr_debug("problem adding lbr entry, skipping event\n");
-	} else if (rep->mem_mode == 1) {
-		ret = perf_report__add_mem_hist_entry(tool, &al, sample,
-						      evsel, machine, event);
-		if (ret < 0)
-			pr_debug("problem adding mem entry, skipping event\n");
-	} else {
-		if (al.map != NULL)
-			al.map->dso->hit = 1;
-
-		ret = perf_evsel__add_hist_entry(tool, evsel, &al, sample,
-						 machine);
-		if (ret < 0)
-			pr_debug("problem incrementing symbol period, skipping event\n");
-	}
+	if (sort__mode == SORT_MODE__BRANCH)
+		iter = &branch_iter;
+	else if (rep->mem_mode == 1)
+		iter = &mem_iter;
+	else
+		iter = &normal_iter;
+
+	if (al.map != NULL)
+		al.map->dso->hit = 1;
+
+	iter->rep = rep;
+	ret = perf_evsel__add_entry(evsel, &al, sample, machine, iter);
+	if (ret < 0)
+		pr_debug("problem adding hist entry, skipping event\n");
+
 	return ret;
 }
 
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 03/14] perf hists: Convert hist entry functions to use struct he_stat
  2013-10-31  6:56 [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Namhyung Kim
  2013-10-31  6:56 ` [PATCH 01/14] perf tools: Consolidate __hists__add_*entry() Namhyung Kim
  2013-10-31  6:56 ` [PATCH 02/14] perf tools: Introduce struct add_entry_iter Namhyung Kim
@ 2013-10-31  6:56 ` Namhyung Kim
  2013-11-04 23:45   ` Arnaldo Carvalho de Melo
  2013-10-31  6:56 ` [PATCH 04/14] perf hists: Add support for accumulated stat of hist entry Namhyung Kim
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 64+ messages in thread
From: Namhyung Kim @ 2013-10-31  6:56 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Namhyung Kim, LKML,
	Frederic Weisbecker, Stephane Eranian, Jiri Olsa, Rodrigo Campos,
	Arun Sharma

From: Namhyung Kim <namhyung.kim@lge.com>

hist_entry__add_cpumode_period() and hist_entry__decay() deal with
hist_entry's stat fields only.  So make them take the struct he_stat
directly.
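
The pattern can be sketched as below; the struct and function names are
simplified stand-ins for the perf internals, not the real API, so this
only illustrates the refactoring, not the patch itself:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for perf's hist_entry / he_stat. */
struct he_stat {
	uint64_t period;
	uint64_t nr_events;
};

struct hist_entry {
	struct he_stat stat;
};

/* Before: the helper took the whole hist_entry even though it only
 * touched he->stat. */
static void hist_entry_decay_old(struct hist_entry *he)
{
	he->stat.period = (he->stat.period * 7) / 8;
	he->stat.nr_events = (he->stat.nr_events * 7) / 8;
}

/* After: it takes only the he_stat it works on, so the same helper
 * can later be pointed at a second stat block as well. */
static void he_stat_decay(struct he_stat *stat)
{
	stat->period = (stat->period * 7) / 8;
	stat->nr_events = (stat->nr_events * 7) / 8;
}
```

Narrowing the parameter is what lets the later patches reuse these
helpers unchanged for the accumulated stat block.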

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/hist.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index e6f953190443..3a06b3326f12 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -182,21 +182,21 @@ void hists__output_recalc_col_len(struct hists *hists, int max_rows)
 	}
 }
 
-static void hist_entry__add_cpumode_period(struct hist_entry *he,
+static void hist_entry__add_cpumode_period(struct he_stat *he_stat,
 					   unsigned int cpumode, u64 period)
 {
 	switch (cpumode) {
 	case PERF_RECORD_MISC_KERNEL:
-		he->stat.period_sys += period;
+		he_stat->period_sys += period;
 		break;
 	case PERF_RECORD_MISC_USER:
-		he->stat.period_us += period;
+		he_stat->period_us += period;
 		break;
 	case PERF_RECORD_MISC_GUEST_KERNEL:
-		he->stat.period_guest_sys += period;
+		he_stat->period_guest_sys += period;
 		break;
 	case PERF_RECORD_MISC_GUEST_USER:
-		he->stat.period_guest_us += period;
+		he_stat->period_guest_us += period;
 		break;
 	default:
 		break;
@@ -223,10 +223,10 @@ static void he_stat__add_stat(struct he_stat *dest, struct he_stat *src)
 	dest->weight		+= src->weight;
 }
 
-static void hist_entry__decay(struct hist_entry *he)
+static void hist_entry__decay(struct he_stat *he_stat)
 {
-	he->stat.period = (he->stat.period * 7) / 8;
-	he->stat.nr_events = (he->stat.nr_events * 7) / 8;
+	he_stat->period = (he_stat->period * 7) / 8;
+	he_stat->nr_events = (he_stat->nr_events * 7) / 8;
 	/* XXX need decay for weight too? */
 }
 
@@ -237,7 +237,7 @@ static bool hists__decay_entry(struct hists *hists, struct hist_entry *he)
 	if (prev_period == 0)
 		return true;
 
-	hist_entry__decay(he);
+	hist_entry__decay(&he->stat);
 
 	if (!he->filtered)
 		hists->stats.total_period -= prev_period - he->stat.period;
@@ -403,7 +403,7 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
 	rb_link_node(&he->rb_node_in, parent, p);
 	rb_insert_color(&he->rb_node_in, hists->entries_in);
 out:
-	hist_entry__add_cpumode_period(he, al->cpumode, period);
+	hist_entry__add_cpumode_period(&he->stat, al->cpumode, period);
 	return he;
 }
 
-- 
1.7.11.7



* [PATCH 04/14] perf hists: Add support for accumulated stat of hist entry
  2013-10-31  6:56 [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Namhyung Kim
                   ` (2 preceding siblings ...)
  2013-10-31  6:56 ` [PATCH 03/14] perf hists: Convert hist entry functions to use struct he_stat Namhyung Kim
@ 2013-10-31  6:56 ` Namhyung Kim
  2013-10-31  6:56 ` [PATCH 05/14] perf hists: Check if accumulated when adding a " Namhyung Kim
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 64+ messages in thread
From: Namhyung Kim @ 2013-10-31  6:56 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Namhyung Kim, LKML,
	Frederic Weisbecker, Stephane Eranian, Jiri Olsa, Rodrigo Campos,
	Arun Sharma

From: Namhyung Kim <namhyung.kim@lge.com>

Maintain accumulated stat information in hist_entry->stat_acc if
symbol_conf.cumulate_callchain is set.  Fields in ->stat_acc have the
same values initially, and will be updated as the callchain is
processed later.
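
A minimal sketch of the allocation scheme, using simplified stand-in
types rather than perf's real hist_entry (the real code also has to
clean up ->stat_acc on the branch_info error path):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct he_stat {
	uint64_t period;
	uint32_t nr_events;
};

/* Simplified hist_entry: a self stat plus an optional accumulated one. */
struct hist_entry {
	struct he_stat stat;
	struct he_stat *stat_acc;	/* only when cumulation is enabled */
};

static struct hist_entry *entry_new(uint64_t period, int cumulate)
{
	struct hist_entry *he = calloc(1, sizeof(*he));

	if (he == NULL)
		return NULL;
	he->stat.period = period;
	he->stat.nr_events = 1;
	if (cumulate) {
		/* Start ->stat_acc as a copy of ->stat; callchain
		 * processing will add to it later. */
		he->stat_acc = malloc(sizeof(he->stat));
		if (he->stat_acc == NULL) {
			free(he);
			return NULL;
		}
		memcpy(he->stat_acc, &he->stat, sizeof(he->stat));
	}
	return he;
}
```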

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/stdio/hist.c  |  1 +
 tools/perf/util/callchain.c |  2 ++
 tools/perf/util/callchain.h |  3 ++-
 tools/perf/util/hist.c      | 18 ++++++++++++++++++
 tools/perf/util/sort.h      |  1 +
 5 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 6c152686e837..302f08afd6a2 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -283,6 +283,7 @@ static size_t hist_entry_callchain__fprintf(struct hist_entry *he,
 		return callchain__fprintf_flat(fp, &he->sorted_chain, total_samples);
 		break;
 	case CHAIN_NONE:
+	case CHAIN_CUMULATIVE:
 		break;
 	default:
 		pr_err("Bad callchain mode\n");
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index e3970e3eaacf..c10052c6e73c 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -52,6 +52,7 @@ rb_insert_callchain(struct rb_root *root, struct callchain_node *chain,
 				p = &(*p)->rb_right;
 			break;
 		case CHAIN_NONE:
+		case CHAIN_CUMULATIVE:
 		default:
 			break;
 		}
@@ -162,6 +163,7 @@ int callchain_register_param(struct callchain_param *param)
 		param->sort = sort_chain_flat;
 		break;
 	case CHAIN_NONE:
+	case CHAIN_CUMULATIVE:
 	default:
 		return -1;
 	}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 7bb36022377f..08cf367c2357 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -11,7 +11,8 @@ enum chain_mode {
 	CHAIN_NONE,
 	CHAIN_FLAT,
 	CHAIN_GRAPH_ABS,
-	CHAIN_GRAPH_REL
+	CHAIN_GRAPH_REL,
+	CHAIN_CUMULATIVE,
 };
 
 enum chain_order {
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 3a06b3326f12..c38655dedad3 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -238,6 +238,8 @@ static bool hists__decay_entry(struct hists *hists, struct hist_entry *he)
 		return true;
 
 	hist_entry__decay(&he->stat);
+	if (callchain_param.mode == CHAIN_CUMULATIVE)
+		hist_entry__decay(he->stat_acc);
 
 	if (!he->filtered)
 		hists->stats.total_period -= prev_period - he->stat.period;
@@ -285,6 +287,15 @@ static struct hist_entry *hist_entry__new(struct hist_entry *template)
 	if (he != NULL) {
 		*he = *template;
 
+		if (callchain_param.mode == CHAIN_CUMULATIVE) {
+			he->stat_acc = malloc(sizeof(he->stat));
+			if (he->stat_acc == NULL) {
+				free(he);
+				return NULL;
+			}
+			memcpy(he->stat_acc, &he->stat, sizeof(he->stat));
+		}
+
 		if (he->ms.map)
 			he->ms.map->referenced = true;
 
@@ -296,6 +307,7 @@ static struct hist_entry *hist_entry__new(struct hist_entry *template)
 			 */
 			he->branch_info = malloc(sizeof(*he->branch_info));
 			if (he->branch_info == NULL) {
+				free(he->stat_acc);
 				free(he);
 				return NULL;
 			}
@@ -368,6 +380,8 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
 
 		if (!cmp) {
 			he_stat__add_period(&he->stat, period, weight);
+			if (callchain_param.mode == CHAIN_CUMULATIVE)
+				he_stat__add_period(he->stat_acc, period, weight);
 
 			/*
 			 * This mem info was allocated from machine__resolve_mem
@@ -404,6 +418,8 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
 	rb_insert_color(&he->rb_node_in, hists->entries_in);
 out:
 	hist_entry__add_cpumode_period(&he->stat, al->cpumode, period);
+	if (callchain_param.mode == CHAIN_CUMULATIVE)
+		hist_entry__add_cpumode_period(he->stat_acc, al->cpumode, period);
 	return he;
 }
 
@@ -502,6 +518,8 @@ static bool hists__collapse_insert_entry(struct hists *hists __maybe_unused,
 
 		if (!cmp) {
 			he_stat__add_stat(&iter->stat, &he->stat);
+			if (callchain_param.mode == CHAIN_CUMULATIVE)
+				he_stat__add_stat(iter->stat_acc, he->stat_acc);
 
 			if (symbol_conf.use_callchain) {
 				callchain_cursor_reset(&callchain_cursor);
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index bf4333694d3a..2e2f0f5a1502 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -82,6 +82,7 @@ struct hist_entry {
 		struct list_head head;
 	} pairs;
 	struct he_stat		stat;
+	struct he_stat		*stat_acc;
 	struct map_symbol	ms;
 	struct thread		*thread;
 	u64			ip;
-- 
1.7.11.7



* [PATCH 05/14] perf hists: Check if accumulated when adding a hist entry
  2013-10-31  6:56 [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Namhyung Kim
                   ` (3 preceding siblings ...)
  2013-10-31  6:56 ` [PATCH 04/14] perf hists: Add support for accumulated stat of hist entry Namhyung Kim
@ 2013-10-31  6:56 ` Namhyung Kim
  2013-10-31  6:56 ` [PATCH 06/14] perf hists: Accumulate hist entry stat based on the callchain Namhyung Kim
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 64+ messages in thread
From: Namhyung Kim @ 2013-10-31  6:56 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Namhyung Kim, LKML,
	Frederic Weisbecker, Stephane Eranian, Jiri Olsa, Rodrigo Campos,
	Arun Sharma

From: Namhyung Kim <namhyung.kim@lge.com>

To support callchain accumulation, add_hist_entry() needs to know
whether @entry is an accumulated one or not, since the period of an
accumulated entry should be added to ->stat_acc but not to ->stat.
Add a @sample_self argument for that.
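
The intended accounting can be sketched as follows; the types are
simplified stand-ins (the real ->stat_acc is a pointer and only
allocated in cumulative mode), so this shows the @sample_self split
only:

```c
#include <assert.h>
#include <stdint.h>

struct he_stat {
	uint64_t period;
};

struct hist_entry {
	struct he_stat stat;		/* periods sampled at this entry */
	struct he_stat stat_acc;	/* periods including callees */
};

/* sample_self is true when the sample hit this entry directly and
 * false when the period is only propagated up from a callee. */
static void entry_add_period(struct hist_entry *he, uint64_t period,
			     int sample_self)
{
	if (sample_self)
		he->stat.period += period;
	he->stat_acc.period += period;
}
```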

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-annotate.c |  3 ++-
 tools/perf/builtin-diff.c     |  2 +-
 tools/perf/builtin-report.c   |  6 +++---
 tools/perf/builtin-top.c      |  2 +-
 tools/perf/tests/hists_link.c |  4 ++--
 tools/perf/util/hist.c        | 23 ++++++++++++++---------
 tools/perf/util/hist.h        |  3 ++-
 7 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 4087ab19823c..dcc7706e2136 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -65,7 +65,8 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
 		return 0;
 	}
 
-	he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL, 1, 1, 0);
+	he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL, 1, 1, 0,
+				true);
 	if (he == NULL)
 		return -ENOMEM;
 
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 3b67ea2444bd..dd2638001372 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -308,7 +308,7 @@ static int hists__add_entry(struct hists *hists,
 			    u64 weight, u64 transaction)
 {
 	if (__hists__add_entry(hists, al, NULL, NULL, NULL, period, weight,
-			       transaction) != NULL)
+			       transaction, true) != NULL)
 		return 0;
 	return -ENOMEM;
 }
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index f18cd43687d9..d171f4d18b67 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -152,7 +152,7 @@ iter_add_single_mem_entry(struct add_entry_iter *iter, struct addr_location *al)
 	 * and the he_stat__add_period() function.
 	 */
 	he = __hists__add_entry(&iter->evsel->hists, al, iter->parent, NULL, mi,
-				cost, cost, 0);
+				cost, cost, 0, true);
 	if (!he)
 		return -ENOMEM;
 
@@ -280,7 +280,7 @@ iter_add_next_branch_entry(struct add_entry_iter *iter, struct addr_location *al
 	 * and not events sampled. Thus we use a pseudo period of 1.
 	 */
 	he = __hists__add_entry(&evsel->hists, al, iter->parent, &bi[i], NULL,
-				1, 1, 0);
+				1, 1, 0, true);
 	if (he == NULL)
 		return -ENOMEM;
 
@@ -344,7 +344,7 @@ iter_add_single_normal_entry(struct add_entry_iter *iter, struct addr_location *
 
 	he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
 				sample->period, sample->weight,
-				sample->transaction);
+				sample->transaction, true);
 	if (he == NULL)
 		return -ENOMEM;
 
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 5c538a43f263..bd4cb3800943 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -248,7 +248,7 @@ static struct hist_entry *perf_evsel__add_hist_entry(struct perf_evsel *evsel,
 	pthread_mutex_lock(&evsel->hists.lock);
 	he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL,
 				sample->period, sample->weight,
-				sample->transaction);
+				sample->transaction, true);
 	pthread_mutex_unlock(&evsel->hists.lock);
 	if (he == NULL)
 		return NULL;
diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
index 76992aba00a7..4ea3000e7663 100644
--- a/tools/perf/tests/hists_link.c
+++ b/tools/perf/tests/hists_link.c
@@ -223,7 +223,7 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
 				goto out;
 
 			he = __hists__add_entry(&evsel->hists, &al, NULL,
-						NULL, NULL, 1, 1, 0);
+						NULL, NULL, 1, 1, 0, true);
 			if (he == NULL)
 				goto out;
 
@@ -246,7 +246,7 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
 				goto out;
 
 			he = __hists__add_entry(&evsel->hists, &al, NULL,
-						NULL, NULL, 1, 1, 0);
+						NULL, NULL, 1, 1, 0, true);
 			if (he == NULL)
 				goto out;
 
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index c38655dedad3..229329a20ba5 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -279,7 +279,8 @@ void hists__decay_entries(struct hists *hists, bool zap_user, bool zap_kernel)
  * histogram, sorted on item, collects periods
  */
 
-static struct hist_entry *hist_entry__new(struct hist_entry *template)
+static struct hist_entry *hist_entry__new(struct hist_entry *template,
+					  bool sample_self)
 {
 	size_t callchain_size = symbol_conf.use_callchain ? sizeof(struct callchain_root) : 0;
 	struct hist_entry *he = zalloc(sizeof(*he) + callchain_size);
@@ -294,6 +295,8 @@ static struct hist_entry *hist_entry__new(struct hist_entry *template)
 				return NULL;
 			}
 			memcpy(he->stat_acc, &he->stat, sizeof(he->stat));
+			if (!sample_self)
+				memset(&he->stat, 0, sizeof(he->stat));
 		}
 
 		if (he->ms.map)
@@ -356,8 +359,7 @@ static u8 symbol__parent_filter(const struct symbol *parent)
 static struct hist_entry *add_hist_entry(struct hists *hists,
 				      struct hist_entry *entry,
 				      struct addr_location *al,
-				      u64 period,
-				      u64 weight)
+				      u64 period, u64 weight, bool sample_self)
 {
 	struct rb_node **p;
 	struct rb_node *parent = NULL;
@@ -379,7 +381,8 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
 		cmp = hist_entry__cmp(he, entry);
 
 		if (!cmp) {
-			he_stat__add_period(&he->stat, period, weight);
+			if (sample_self)
+				he_stat__add_period(&he->stat, period, weight);
 			if (callchain_param.mode == CHAIN_CUMULATIVE)
 				he_stat__add_period(he->stat_acc, period, weight);
 
@@ -409,7 +412,7 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
 			p = &(*p)->rb_right;
 	}
 
-	he = hist_entry__new(entry);
+	he = hist_entry__new(entry, sample_self);
 	if (!he)
 		return NULL;
 
@@ -417,7 +420,8 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
 	rb_link_node(&he->rb_node_in, parent, p);
 	rb_insert_color(&he->rb_node_in, hists->entries_in);
 out:
-	hist_entry__add_cpumode_period(&he->stat, al->cpumode, period);
+	if (sample_self)
+		hist_entry__add_cpumode_period(&he->stat, al->cpumode, period);
 	if (callchain_param.mode == CHAIN_CUMULATIVE)
 		hist_entry__add_cpumode_period(he->stat_acc, al->cpumode, period);
 	return he;
@@ -428,7 +432,8 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
 				      struct symbol *sym_parent,
 				      struct branch_info *bi,
 				      struct mem_info *mi,
-				      u64 period, u64 weight, u64 transaction)
+				      u64 period, u64 weight, u64 transaction,
+				      bool sample_self)
 {
 	struct hist_entry entry = {
 		.thread	= al->thread,
@@ -452,7 +457,7 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
 		.transaction = transaction,
 	};
 
-	return add_hist_entry(hists, &entry, al, period, weight);
+	return add_hist_entry(hists, &entry, al, period, weight, sample_self);
 }
 
 int64_t
@@ -876,7 +881,7 @@ static struct hist_entry *hists__add_dummy_entry(struct hists *hists,
 			p = &(*p)->rb_right;
 	}
 
-	he = hist_entry__new(pair);
+	he = hist_entry__new(pair, true);
 	if (he) {
 		memset(&he->stat, 0, sizeof(he->stat));
 		he->hists = hists;
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 307f1c742563..3a31f5e72cf7 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -89,7 +89,8 @@ struct hist_entry *__hists__add_entry(struct hists *self,
 				      struct symbol *parent,
 				      struct branch_info *bi,
 				      struct mem_info *mi, u64 period,
-				      u64 weight, u64 transaction);
+				      u64 weight, u64 transaction,
+				      bool sample_self);
 int64_t hist_entry__cmp(struct hist_entry *left, struct hist_entry *right);
 int64_t hist_entry__collapse(struct hist_entry *left, struct hist_entry *right);
 int hist_entry__transaction_len(void);
-- 
1.7.11.7



* [PATCH 06/14] perf hists: Accumulate hist entry stat based on the callchain
  2013-10-31  6:56 [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Namhyung Kim
                   ` (4 preceding siblings ...)
  2013-10-31  6:56 ` [PATCH 05/14] perf hists: Check if accumulated when adding a " Namhyung Kim
@ 2013-10-31  6:56 ` Namhyung Kim
  2013-10-31  6:56 ` [PATCH 07/14] perf tools: Update cpumode for each cumulative entry Namhyung Kim
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 64+ messages in thread
From: Namhyung Kim @ 2013-10-31  6:56 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Namhyung Kim, LKML,
	Frederic Weisbecker, Stephane Eranian, Jiri Olsa, Rodrigo Campos,
	Arun Sharma

From: Namhyung Kim <namhyung.kim@lge.com>

Call __hists__add_entry() for each callchain node to get an
accumulated stat for an entry.  However, skip nodes which do not have
symbol info as they caused subtle problems.

AFAICS the current sort methods cannot distinguish entries with a
NULL dso/sym well, so processing a callchain for an entry that
doesn't have symbol info might add a period to the same entry
multiple times.  It ended up with an entry that has more than 100% of
accumulated period value, which is not good.  So just stop processing
when such entries are met.

Introduce new cumulative_iter ops to process them properly.
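
The core walk can be sketched with a toy callchain; the node layout
and function name below are illustrative stand-ins, not perf's
callchain_cursor API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy callchain node: just a symbol name (NULL when unresolved). */
struct node {
	const char *sym;
};

/* Add the sample period once per resolved node, stopping at the first
 * node without symbol info, mirroring the rule described above. */
static int accumulate(const struct node *chain, int nr, uint64_t period,
		      uint64_t *acc_total)
{
	int i;

	for (i = 0; i < nr; i++) {
		if (chain[i].sym == NULL)
			break;		/* unresolved: stop processing */
		*acc_total += period;
	}
	return i;	/* number of nodes accounted */
}
```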

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-report.c | 142 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 142 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index d171f4d18b67..92cbd5cd1ab1 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -386,6 +386,138 @@ iter_finish_normal_entry(struct add_entry_iter *iter, struct addr_location *al)
 	return err;
 }
 
+static int
+iter_prepare_cumulative_entry(struct add_entry_iter *iter,
+			      struct machine *machine __maybe_unused,
+			      struct perf_evsel *evsel,
+			      struct addr_location *al __maybe_unused,
+			      struct perf_sample *sample)
+{
+	callchain_cursor_commit(&callchain_cursor);
+
+	/*
+	 * The first callchain node always contains same information
+	 * as a hist entry itself.  So skip it in order to prevent
+	 * double accounting.
+	 */
+	callchain_cursor_advance(&callchain_cursor);
+
+	iter->evsel = evsel;
+	iter->sample = sample;
+	return 0;
+}
+
+static int
+iter_add_single_cumulative_entry(struct add_entry_iter *iter,
+				 struct addr_location *al)
+{
+	struct perf_evsel *evsel = iter->evsel;
+	struct perf_sample *sample = iter->sample;
+	struct hist_entry *he;
+	int err = 0;
+
+	he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
+				sample->period, sample->weight,
+				sample->transaction, true);
+	if (he == NULL)
+		return -ENOMEM;
+
+	/*
+	 * This is for putting parents upward during output resort iff
+	 * only a child gets sampled.  See hist_entry__sort_on_period().
+	 */
+	he->callchain->max_depth = PERF_MAX_STACK_DEPTH + 1;
+
+	/*
+	 * Only in the TUI browser we are doing integrated annotation,
+	 * so we don't allocated the extra space needed because the stdio
+	 * code will not use it.
+	 */
+	if (he->ms.sym != NULL && use_browser == 1 && sort__has_sym) {
+		struct annotation *notes = symbol__annotation(he->ms.sym);
+
+		assert(evsel != NULL);
+
+		if (notes->src == NULL && symbol__alloc_hist(he->ms.sym) < 0)
+			return -ENOMEM;
+
+		err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+	}
+
+	return err;
+}
+
+static int
+iter_next_cumulative_entry(struct add_entry_iter *iter __maybe_unused,
+			   struct addr_location *al)
+{
+	struct callchain_cursor_node *node;
+
+	node = callchain_cursor_current(&callchain_cursor);
+	if (node == NULL)
+		return 0;
+
+	al->map = node->map;
+	al->sym = node->sym;
+	al->addr = node->ip;
+
+	/*
+	 * XXX: Adding an entry without symbol info caused subtle
+	 * problems.  Stop it.
+	 */
+	if (al->sym == NULL)
+		return 0;
+
+	callchain_cursor_advance(&callchain_cursor);
+	return 1;
+}
+
+static int
+iter_add_next_cumulative_entry(struct add_entry_iter *iter,
+			       struct addr_location *al)
+{
+	struct perf_evsel *evsel = iter->evsel;
+	struct perf_sample *sample = iter->sample;
+	struct hist_entry *he;
+	int err = 0;
+
+	he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
+				sample->period, sample->weight,
+				sample->transaction, false);
+	if (he == NULL)
+		return -ENOMEM;
+
+	/*
+	 * Only in the TUI browser we are doing integrated annotation,
+	 * so we don't allocated the extra space needed because the stdio
+	 * code will not use it.
+	 */
+	if (he->ms.sym != NULL && use_browser == 1 && sort__has_sym) {
+		struct annotation *notes = symbol__annotation(he->ms.sym);
+
+		assert(evsel != NULL);
+
+		if (notes->src == NULL && symbol__alloc_hist(he->ms.sym) < 0)
+			return -ENOMEM;
+
+		err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+	}
+	return err;
+}
+
+static int
+iter_finish_cumulative_entry(struct add_entry_iter *iter,
+			     struct addr_location *al __maybe_unused)
+{
+	struct perf_evsel *evsel = iter->evsel;
+	struct perf_sample *sample = iter->sample;
+
+	evsel->hists.stats.total_period += sample->period;
+	hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+	return 0;
+}
+
 static struct add_entry_iter mem_iter = {
 	.prepare_entry 		= iter_prepare_mem_entry,
 	.add_single_entry 	= iter_add_single_mem_entry,
@@ -410,6 +542,14 @@ static struct add_entry_iter normal_iter = {
 	.finish_entry 		= iter_finish_normal_entry,
 };
 
+static struct add_entry_iter cumulative_iter = {
+	.prepare_entry 		= iter_prepare_cumulative_entry,
+	.add_single_entry 	= iter_add_single_cumulative_entry,
+	.next_entry 		= iter_next_cumulative_entry,
+	.add_next_entry 	= iter_add_next_cumulative_entry,
+	.finish_entry 		= iter_finish_cumulative_entry,
+};
+
 static int
 perf_evsel__add_entry(struct perf_evsel *evsel, struct addr_location *al,
 		      struct perf_sample *sample, struct machine *machine,
@@ -471,6 +611,8 @@ static int process_sample_event(struct perf_tool *tool,
 		iter = &branch_iter;
 	else if (rep->mem_mode == 1)
 		iter = &mem_iter;
+	else if (callchain_param.mode == CHAIN_CUMULATIVE)
+		iter = &cumulative_iter;
 	else
 		iter = &normal_iter;
 
-- 
1.7.11.7



* [PATCH 07/14] perf tools: Update cpumode for each cumulative entry
  2013-10-31  6:56 [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Namhyung Kim
                   ` (5 preceding siblings ...)
  2013-10-31  6:56 ` [PATCH 06/14] perf hists: Accumulate hist entry stat based on the callchain Namhyung Kim
@ 2013-10-31  6:56 ` Namhyung Kim
  2013-11-01 12:55   ` Jiri Olsa
  2013-10-31  6:56 ` [PATCH 08/14] perf report: Cache cumulative callchains Namhyung Kim
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 64+ messages in thread
From: Namhyung Kim @ 2013-10-31  6:56 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Namhyung Kim, LKML,
	Frederic Weisbecker, Stephane Eranian, Jiri Olsa, Rodrigo Campos,
	Arun Sharma

From: Namhyung Kim <namhyung.kim@lge.com>

The cpumode and level in struct addr_location were set for a sample
but not updated as cumulative callchain entries were added.  This led
to non-matching symbol and cpumode in the output.

Update it accordingly based on whether the map is part of the kernel
or not.
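
The decision table in the patch boils down to the sketch below; the
enum values and function name are illustrative stand-ins for the
PERF_RECORD_MISC_* constants and the open-coded logic in
iter_next_cumulative_entry():

```c
#include <assert.h>

/* Stand-ins for PERF_RECORD_MISC_*; values are illustrative. */
enum cpumode {
	MODE_KERNEL,
	MODE_USER,
	MODE_GUEST_KERNEL,
	MODE_GUEST_USER,
	MODE_HYPERVISOR,
};

/* Decide the cpumode for a callchain entry from where its map lives
 * (kernel vs user maps) and whether the machine is the host. */
static enum cpumode entry_cpumode(int kernel_map, int host, int guest)
{
	if (kernel_map)
		return host ? MODE_KERNEL : MODE_GUEST_KERNEL;
	if (host)
		return MODE_USER;
	return guest ? MODE_GUEST_USER : MODE_HYPERVISOR;
}
```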

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-report.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 92cbd5cd1ab1..1b152a8b7f51 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -80,6 +80,7 @@ struct add_entry_iter {
 	struct perf_report *rep;
 	struct perf_evsel *evsel;
 	struct perf_sample *sample;
+	struct machine *machine;
 	struct hist_entry *he;
 	struct symbol *parent;
 	void *priv;
@@ -388,7 +389,7 @@ iter_finish_normal_entry(struct add_entry_iter *iter, struct addr_location *al)
 
 static int
 iter_prepare_cumulative_entry(struct add_entry_iter *iter,
-			      struct machine *machine __maybe_unused,
+			      struct machine *machine,
 			      struct perf_evsel *evsel,
 			      struct addr_location *al __maybe_unused,
 			      struct perf_sample *sample)
@@ -404,6 +405,7 @@ iter_prepare_cumulative_entry(struct add_entry_iter *iter,
 
 	iter->evsel = evsel;
 	iter->sample = sample;
+	iter->machine = machine;
 	return 0;
 }
 
@@ -468,6 +470,27 @@ iter_next_cumulative_entry(struct add_entry_iter *iter __maybe_unused,
 	if (al->sym == NULL)
 		return 0;
 
+	if (al->map->groups == &iter->machine->kmaps) {
+		if (machine__is_host(iter->machine)) {
+			al->cpumode = PERF_RECORD_MISC_KERNEL;
+			al->level = 'k';
+		} else {
+			al->cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
+			al->level = 'g';
+		}
+	} else {
+		if (machine__is_host(iter->machine)) {
+			al->cpumode = PERF_RECORD_MISC_USER;
+			al->level = '.';
+		} else if (perf_guest) {
+			al->cpumode = PERF_RECORD_MISC_GUEST_USER;
+			al->level = 'u';
+		} else {
+			al->cpumode = PERF_RECORD_MISC_HYPERVISOR;
+			al->level = 'H';
+		}
+	}
+
 	callchain_cursor_advance(&callchain_cursor);
 	return 1;
 }
-- 
1.7.11.7



* [PATCH 08/14] perf report: Cache cumulative callchains
  2013-10-31  6:56 [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Namhyung Kim
                   ` (6 preceding siblings ...)
  2013-10-31  6:56 ` [PATCH 07/14] perf tools: Update cpumode for each cumulative entry Namhyung Kim
@ 2013-10-31  6:56 ` Namhyung Kim
  2013-10-31 11:13   ` Rodrigo Campos
  2013-11-01 12:29   ` Jiri Olsa
  2013-10-31  6:56 ` [PATCH 09/14] perf hists: Sort hist entries by accumulated period Namhyung Kim
                   ` (6 subsequent siblings)
  14 siblings, 2 replies; 64+ messages in thread
From: Namhyung Kim @ 2013-10-31  6:56 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Namhyung Kim, LKML,
	Frederic Weisbecker, Stephane Eranian, Jiri Olsa, Rodrigo Campos,
	Arun Sharma

From: Namhyung Kim <namhyung.kim@lge.com>

It is possible that a callchain has cycles or recursive calls.  In
that case it'll end up with entries having more than 100% overhead in
the output.  In order to prevent such entries, cache each callchain
node and skip a node if the same entry was already accumulated.
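
A minimal sketch of the dedup cache, keyed on the symbol only (the
patch also falls back to comparing the dso when there is no symbol
sort key); the names are simplified stand-ins for the patch's
cumulative_cache:

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 127		/* stand-in for PERF_MAX_STACK_DEPTH */

/* One slot per callchain node already accumulated for this sample. */
struct ccache {
	const char *sym;
};

struct cumulative_iter {
	struct ccache seen[MAX_DEPTH];
	int curr;
};

/* Return 0 (skip) when the symbol was already accumulated for this
 * sample, 1 when it is new and has been recorded. */
static int ccache_add(struct cumulative_iter *it, const char *sym)
{
	int i;

	for (i = 0; i < it->curr; i++)
		if (strcmp(it->seen[i].sym, sym) == 0)
			return 0;
	it->seen[it->curr++].sym = sym;
	return 1;
}
```

Because the cache lives per sample, recursion inflates an entry's
accumulated period at most once per sample rather than once per frame.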

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-report.c | 49 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 1b152a8b7f51..1a0de9a4a568 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -387,6 +387,11 @@ iter_finish_normal_entry(struct add_entry_iter *iter, struct addr_location *al)
 	return err;
 }
 
+struct cumulative_cache {
+	struct dso *dso;
+	struct symbol *sym;
+};
+
 static int
 iter_prepare_cumulative_entry(struct add_entry_iter *iter,
 			      struct machine *machine,
@@ -394,9 +399,31 @@ iter_prepare_cumulative_entry(struct add_entry_iter *iter,
 			      struct addr_location *al __maybe_unused,
 			      struct perf_sample *sample)
 {
+	struct callchain_cursor_node *node;
+	struct cumulative_cache *ccache;
+
 	callchain_cursor_commit(&callchain_cursor);
 
 	/*
+	 * This is for detecting cycles or recursions so that they're
+	 * cumulated only one time to prevent entries more than 100%
+	 * overhead.
+	 */
+	ccache = malloc(sizeof(*ccache) * PERF_MAX_STACK_DEPTH);
+	if (ccache == NULL)
+		return -ENOMEM;
+
+	node = callchain_cursor_current(&callchain_cursor);
+	if (node == NULL)
+		return 0;
+
+	ccache[0].dso = node->map->dso;
+	ccache[0].sym = node->sym;
+
+	iter->priv = ccache;
+	iter->curr = 1;
+
+	/*
 	 * The first callchain node always contains same information
 	 * as a hist entry itself.  So skip it in order to prevent
 	 * double accounting.
@@ -501,8 +528,29 @@ iter_add_next_cumulative_entry(struct add_entry_iter *iter,
 {
 	struct perf_evsel *evsel = iter->evsel;
 	struct perf_sample *sample = iter->sample;
+	struct cumulative_cache *ccache = iter->priv;
 	struct hist_entry *he;
 	int err = 0;
+	int i;
+
+	/*
+	 * Check if there's duplicate entries in the callchain.
+	 * It's possible that it has cycles or recursive calls.
+	 */
+	for (i = 0; i < iter->curr; i++) {
+		if (sort__has_sym) {
+			if (ccache[i].sym == al->sym)
+				return 0;
+		} else {
+			/* Not much we can do - just compare the dso. */
+			if (ccache[i].dso == al->map->dso)
+				return 0;
+		}
+	}
+
+	ccache[i].dso = al->map->dso;
+	ccache[i].sym = al->sym;
+	iter->curr++;
 
 	he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
 				sample->period, sample->weight,
@@ -538,6 +586,7 @@ iter_finish_cumulative_entry(struct add_entry_iter *iter,
 	evsel->hists.stats.total_period += sample->period;
 	hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
 
+	free(iter->priv);
 	return 0;
 }
 
-- 
1.7.11.7



* [PATCH 09/14] perf hists: Sort hist entries by accumulated period
  2013-10-31  6:56 [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Namhyung Kim
                   ` (7 preceding siblings ...)
  2013-10-31  6:56 ` [PATCH 08/14] perf report: Cache cumulative callchains Namhyung Kim
@ 2013-10-31  6:56 ` Namhyung Kim
  2013-10-31  6:56 ` [PATCH 10/14] perf ui/hist: Add support to accumulated hist stat Namhyung Kim
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 64+ messages in thread
From: Namhyung Kim @ 2013-10-31  6:56 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Namhyung Kim, LKML,
	Frederic Weisbecker, Stephane Eranian, Jiri Olsa, Rodrigo Campos,
	Arun Sharma

From: Namhyung Kim <namhyung.kim@lge.com>

When callchain accumulation is requested, we need to sort the entries
by accumulated period value.  When the accumulated periods of two
entries are the same (i.e. a single-path callchain), put the caller
above the callee, since accumulation naturally tends to push callers
toward higher positions.

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-report.c | 13 +++++++++++++
 tools/perf/util/hist.c      | 12 ++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 1a0de9a4a568..3b626127c8d6 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -349,6 +349,13 @@ iter_add_single_normal_entry(struct add_entry_iter *iter, struct addr_location *
 	if (he == NULL)
 		return -ENOMEM;
 
+	/*
+	 * This is for putting parents upward during output resort iff
+	 * only a child gets sampled.  See hist_entry__sort_on_period().
+	 */
+	he->callchain->max_depth = callchain_cursor.nr - callchain_cursor.pos;
+	he->callchain->max_depth += PERF_MAX_STACK_DEPTH + 1;
+
 	iter->he = he;
 	return 0;
 }
@@ -559,6 +566,12 @@ iter_add_next_cumulative_entry(struct add_entry_iter *iter,
 		return -ENOMEM;
 
 	/*
+	 * This is for putting parents upward during output resort iff
+	 * only a child gets sampled.  See hist_entry__sort_on_period().
+	 */
+	he->callchain->max_depth = callchain_cursor.nr - callchain_cursor.pos;
+
+	/*
 	 * Only in the TUI browser we are doing integrated annotation,
 	 * so we don't allocated the extra space needed because the stdio
 	 * code will not use it.
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 229329a20ba5..c1d5d6b43bba 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -623,6 +623,18 @@ static int hist_entry__sort_on_period(struct hist_entry *a,
 	struct hist_entry *pair;
 	u64 *periods_a, *periods_b;
 
+	if (callchain_param.mode == CHAIN_CUMULATIVE) {
+		/*
+		 * Put caller above callee when they have equal period.
+		 */
+		if (a->stat_acc->period != b->stat_acc->period)
+			return a->stat_acc->period > b->stat_acc->period ? 1 : -1;
+
+		if (a->callchain->max_depth != b->callchain->max_depth)
+			return a->callchain->max_depth < b->callchain->max_depth ?
+				1 : -1;
+	}
+
 	ret = period_cmp(a->stat.period, b->stat.period);
 	if (ret || !symbol_conf.event_group)
 		return ret;
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 10/14] perf ui/hist: Add support to accumulated hist stat
  2013-10-31  6:56 [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Namhyung Kim
                   ` (8 preceding siblings ...)
  2013-10-31  6:56 ` [PATCH 09/14] perf hists: Sort hist entries by accumulated period Namhyung Kim
@ 2013-10-31  6:56 ` Namhyung Kim
  2013-10-31  6:56 ` [PATCH 11/14] perf ui/browser: " Namhyung Kim
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 64+ messages in thread
From: Namhyung Kim @ 2013-10-31  6:56 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Namhyung Kim, LKML,
	Frederic Weisbecker, Stephane Eranian, Jiri Olsa, Rodrigo Campos,
	Arun Sharma

From: Namhyung Kim <namhyung.kim@lge.com>

Print accumulated stat of a hist entry if requested.

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/hist.c   | 41 +++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/hist.h |  1 +
 2 files changed, 42 insertions(+)

diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index 0a193281eba8..e32a954c5c97 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -129,6 +129,28 @@ static int hpp__entry_##_type(struct perf_hpp_fmt *_fmt __maybe_unused,		\
 			  scnprintf, true);					\
 }
 
+#define __HPP_COLOR_ACC_PERCENT_FN(_type, _field)				\
+static u64 he_get_acc_##_field(struct hist_entry *he)				\
+{										\
+	return he->stat_acc->_field;						\
+}										\
+										\
+static int hpp__color_acc_##_type(struct perf_hpp_fmt *fmt __maybe_unused,	\
+			      struct perf_hpp *hpp, struct hist_entry *he) 	\
+{										\
+	return __hpp__fmt(hpp, he, he_get_acc_##_field, " %12.2f%%",		\
+			  (hpp_snprint_fn)percent_color_snprintf, true);	\
+}
+
+#define __HPP_ENTRY_ACC_PERCENT_FN(_type, _field)				\
+static int hpp__entry_acc_##_type(struct perf_hpp_fmt *_fmt __maybe_unused,	\
+			      struct perf_hpp *hpp, struct hist_entry *he) 	\
+{										\
+	const char *fmt = symbol_conf.field_sep ? " %.2f" : " %12.2f%%";	\
+	return __hpp__fmt(hpp, he, he_get_acc_##_field, fmt,			\
+			  scnprintf, true);					\
+}
+
 #define __HPP_ENTRY_RAW_FN(_type, _field)					\
 static u64 he_get_raw_##_field(struct hist_entry *he)				\
 {										\
@@ -148,6 +170,12 @@ __HPP_WIDTH_FN(_type, _min_width, _unit_width)				\
 __HPP_COLOR_PERCENT_FN(_type, _field)					\
 __HPP_ENTRY_PERCENT_FN(_type, _field)
 
+#define HPP_PERCENT_ACC_FNS(_type, _str, _field, _min_width, _unit_width)\
+__HPP_HEADER_FN(_type, _str, _min_width, _unit_width)			\
+__HPP_WIDTH_FN(_type, _min_width, _unit_width)				\
+__HPP_COLOR_ACC_PERCENT_FN(_type, _field)				\
+__HPP_ENTRY_ACC_PERCENT_FN(_type, _field)
+
 #define HPP_RAW_FNS(_type, _str, _field, _min_width, _unit_width)	\
 __HPP_HEADER_FN(_type, _str, _min_width, _unit_width)			\
 __HPP_WIDTH_FN(_type, _min_width, _unit_width)				\
@@ -159,6 +187,7 @@ HPP_PERCENT_FNS(overhead_sys, "sys", period_sys, 8, 8)
 HPP_PERCENT_FNS(overhead_us, "usr", period_us, 8, 8)
 HPP_PERCENT_FNS(overhead_guest_sys, "guest sys", period_guest_sys, 9, 8)
 HPP_PERCENT_FNS(overhead_guest_us, "guest usr", period_guest_us, 9, 8)
+HPP_PERCENT_ACC_FNS(overhead_acc, "Overhead (Acc)", period, 14, 8)
 
 HPP_RAW_FNS(samples, "Samples", nr_events, 12, 12)
 HPP_RAW_FNS(period, "Period", period, 12, 12)
@@ -171,6 +200,14 @@ HPP_RAW_FNS(period, "Period", period, 12, 12)
 		.entry	= hpp__entry_ ## _name		\
 	}
 
+#define HPP__COLOR_ACC_PRINT_FNS(_name)			\
+	{						\
+		.header	= hpp__header_ ## _name,	\
+		.width	= hpp__width_ ## _name,		\
+		.color	= hpp__color_acc_ ## _name,	\
+		.entry	= hpp__entry_acc_ ## _name	\
+	}
+
 #define HPP__PRINT_FNS(_name)				\
 	{						\
 		.header	= hpp__header_ ## _name,	\
@@ -184,6 +221,7 @@ struct perf_hpp_fmt perf_hpp__format[] = {
 	HPP__COLOR_PRINT_FNS(overhead_us),
 	HPP__COLOR_PRINT_FNS(overhead_guest_sys),
 	HPP__COLOR_PRINT_FNS(overhead_guest_us),
+	HPP__COLOR_ACC_PRINT_FNS(overhead_acc),
 	HPP__PRINT_FNS(samples),
 	HPP__PRINT_FNS(period)
 };
@@ -208,6 +246,9 @@ void perf_hpp__init(void)
 {
 	perf_hpp__column_enable(PERF_HPP__OVERHEAD);
 
+	if (callchain_param.mode == CHAIN_CUMULATIVE)
+		perf_hpp__column_enable(PERF_HPP__OVERHEAD_ACC);
+
 	if (symbol_conf.show_cpu_utilization) {
 		perf_hpp__column_enable(PERF_HPP__OVERHEAD_SYS);
 		perf_hpp__column_enable(PERF_HPP__OVERHEAD_US);
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 3a31f5e72cf7..707442fe5c4f 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -160,6 +160,7 @@ enum {
 	PERF_HPP__OVERHEAD_US,
 	PERF_HPP__OVERHEAD_GUEST_SYS,
 	PERF_HPP__OVERHEAD_GUEST_US,
+	PERF_HPP__OVERHEAD_ACC,
 	PERF_HPP__SAMPLES,
 	PERF_HPP__PERIOD,
 
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 11/14] perf ui/browser: Add support to accumulated hist stat
  2013-10-31  6:56 [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Namhyung Kim
                   ` (9 preceding siblings ...)
  2013-10-31  6:56 ` [PATCH 10/14] perf ui/hist: Add support to accumulated hist stat Namhyung Kim
@ 2013-10-31  6:56 ` Namhyung Kim
  2013-10-31  6:56 ` [PATCH 12/14] perf ui/gtk: " Namhyung Kim
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 64+ messages in thread
From: Namhyung Kim @ 2013-10-31  6:56 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Namhyung Kim, LKML,
	Frederic Weisbecker, Stephane Eranian, Jiri Olsa, Rodrigo Campos,
	Arun Sharma

From: Namhyung Kim <namhyung.kim@lge.com>

Print accumulated stat of a hist entry if requested.

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/browsers/hists.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 7ef36c360471..2cea6cd9824e 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -693,11 +693,26 @@ hist_browser__hpp_color_##_type(struct perf_hpp_fmt *fmt __maybe_unused,\
 	return __hpp__color_fmt(hpp, he, __hpp_get_##_field, _cb);	\
 }
 
+#define __HPP_COLOR_ACC_PERCENT_FN(_type, _field, _cb)			\
+static u64 __hpp_get_acc_##_field(struct hist_entry *he)		\
+{									\
+	return he->stat_acc->_field;					\
+}									\
+									\
+static int								\
+hist_browser__hpp_color_##_type(struct perf_hpp_fmt *fmt __maybe_unused,\
+				struct perf_hpp *hpp,			\
+				struct hist_entry *he)			\
+{									\
+	return __hpp__color_fmt(hpp, he, __hpp_get_acc_##_field, _cb);	\
+}
+
 __HPP_COLOR_PERCENT_FN(overhead, period, __hpp__color_callchain)
 __HPP_COLOR_PERCENT_FN(overhead_sys, period_sys, NULL)
 __HPP_COLOR_PERCENT_FN(overhead_us, period_us, NULL)
 __HPP_COLOR_PERCENT_FN(overhead_guest_sys, period_guest_sys, NULL)
 __HPP_COLOR_PERCENT_FN(overhead_guest_us, period_guest_us, NULL)
+__HPP_COLOR_ACC_PERCENT_FN(overhead_acc, period, NULL)
 
 #undef __HPP_COLOR_PERCENT_FN
 
@@ -715,6 +730,8 @@ void hist_browser__init_hpp(void)
 				hist_browser__hpp_color_overhead_guest_sys;
 	perf_hpp__format[PERF_HPP__OVERHEAD_GUEST_US].color =
 				hist_browser__hpp_color_overhead_guest_us;
+	perf_hpp__format[PERF_HPP__OVERHEAD_ACC].color =
+				hist_browser__hpp_color_overhead_acc;
 }
 
 static int hist_browser__show_entry(struct hist_browser *browser,
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 12/14] perf ui/gtk: Add support to accumulated hist stat
  2013-10-31  6:56 [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Namhyung Kim
                   ` (10 preceding siblings ...)
  2013-10-31  6:56 ` [PATCH 11/14] perf ui/browser: " Namhyung Kim
@ 2013-10-31  6:56 ` Namhyung Kim
  2013-10-31  6:56 ` [PATCH 13/14] perf tools: Apply percent-limit to cumulative percentage Namhyung Kim
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 64+ messages in thread
From: Namhyung Kim @ 2013-10-31  6:56 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Namhyung Kim, LKML,
	Frederic Weisbecker, Stephane Eranian, Jiri Olsa, Rodrigo Campos,
	Arun Sharma

From: Namhyung Kim <namhyung.kim@lge.com>

Print accumulated stat of a hist entry if requested.

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/gtk/hists.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 2ca66cc1160f..70ed0d5e1b94 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -98,11 +98,25 @@ static int perf_gtk__hpp_color_##_type(struct perf_hpp_fmt *fmt __maybe_unused,
 	return __hpp__color_fmt(hpp, he, he_get_##_field);			\
 }
 
+#define __HPP_COLOR_ACC_PERCENT_FN(_type, _field)				\
+static u64 he_get_acc_##_field(struct hist_entry *he)				\
+{										\
+	return he->stat_acc->_field;						\
+}										\
+										\
+static int perf_gtk__hpp_color_##_type(struct perf_hpp_fmt *fmt __maybe_unused,	\
+				       struct perf_hpp *hpp,			\
+				       struct hist_entry *he)			\
+{										\
+	return __hpp__color_fmt(hpp, he, he_get_acc_##_field);			\
+}
+
 __HPP_COLOR_PERCENT_FN(overhead, period)
 __HPP_COLOR_PERCENT_FN(overhead_sys, period_sys)
 __HPP_COLOR_PERCENT_FN(overhead_us, period_us)
 __HPP_COLOR_PERCENT_FN(overhead_guest_sys, period_guest_sys)
 __HPP_COLOR_PERCENT_FN(overhead_guest_us, period_guest_us)
+__HPP_COLOR_ACC_PERCENT_FN(overhead_acc, period)
 
 #undef __HPP_COLOR_PERCENT_FN
 
@@ -121,6 +135,8 @@ void perf_gtk__init_hpp(void)
 				perf_gtk__hpp_color_overhead_guest_sys;
 	perf_hpp__format[PERF_HPP__OVERHEAD_GUEST_US].color =
 				perf_gtk__hpp_color_overhead_guest_us;
+	perf_hpp__format[PERF_HPP__OVERHEAD_ACC].color =
+				perf_gtk__hpp_color_overhead_acc;
 }
 
 static void callchain_list__sym_name(struct callchain_list *cl,
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 13/14] perf tools: Apply percent-limit to cumulative percentage
  2013-10-31  6:56 [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Namhyung Kim
                   ` (11 preceding siblings ...)
  2013-10-31  6:56 ` [PATCH 12/14] perf ui/gtk: " Namhyung Kim
@ 2013-10-31  6:56 ` Namhyung Kim
  2013-10-31  6:56 ` [PATCH 14/14] perf report: Add -g cumulative option Namhyung Kim
  2013-10-31  8:09 ` [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Ingo Molnar
  14 siblings, 0 replies; 64+ messages in thread
From: Namhyung Kim @ 2013-10-31  6:56 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Namhyung Kim, LKML,
	Frederic Weisbecker, Stephane Eranian, Jiri Olsa, Rodrigo Campos,
	Arun Sharma

From: Namhyung Kim <namhyung.kim@lge.com>

If the -g cumulative option is given, entries which don't have any
self overhead still need to show up.  So apply the percent limit to
the accumulated overhead percentage in this case.

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/browsers/hists.c | 15 ++++++++++++---
 tools/perf/ui/gtk/hists.c      |  4 ++++
 tools/perf/ui/stdio/hist.c     |  4 ++++
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 2cea6cd9824e..07f500d8df1b 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -834,6 +834,10 @@ static unsigned int hist_browser__refresh(struct ui_browser *browser)
 		if (h->filtered)
 			continue;
 
+		if (callchain_param.mode == CHAIN_CUMULATIVE)
+			percent = h->stat_acc->period * 100.0 /
+					hb->hists->stats.total_period;
+
 		if (percent < hb->min_pcnt)
 			continue;
 
@@ -854,10 +858,11 @@ static struct rb_node *hists__filter_entries(struct rb_node *nd,
 		float percent = h->stat.period * 100.0 /
 					hists->stats.total_period;
 
-		if (percent < min_pcnt)
-			return NULL;
+		if (callchain_param.mode == CHAIN_CUMULATIVE)
+			percent = h->stat_acc->period * 100.0 /
+					hists->stats.total_period;
 
-		if (!h->filtered)
+		if (!h->filtered && percent >= min_pcnt)
 			return nd;
 
 		nd = rb_next(nd);
@@ -875,6 +880,10 @@ static struct rb_node *hists__filter_prev_entries(struct rb_node *nd,
 		float percent = h->stat.period * 100.0 /
 					hists->stats.total_period;
 
+		if (callchain_param.mode == CHAIN_CUMULATIVE)
+			percent = h->stat_acc->period * 100.0 /
+					hists->stats.total_period;
+
 		if (!h->filtered && percent >= min_pcnt)
 			return nd;
 
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 70ed0d5e1b94..b93302840cce 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -302,6 +302,10 @@ static void perf_gtk__show_hists(GtkWidget *window, struct hists *hists,
 		if (h->filtered)
 			continue;
 
+		if (callchain_param.mode == CHAIN_CUMULATIVE)
+			percent = h->stat_acc->period * 100.0 /
+					hists->stats.total_period;
+
 		if (percent < min_pcnt)
 			continue;
 
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 302f08afd6a2..49db435f0cc2 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -496,6 +496,10 @@ print_entries:
 		if (h->filtered)
 			continue;
 
+		if (callchain_param.mode == CHAIN_CUMULATIVE)
+			percent = h->stat_acc->period * 100.0 /
+					hists->stats.total_period;
+
 		if (percent < min_pcnt)
 			continue;
 
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 14/14] perf report: Add -g cumulative option
  2013-10-31  6:56 [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Namhyung Kim
                   ` (12 preceding siblings ...)
  2013-10-31  6:56 ` [PATCH 13/14] perf tools: Apply percent-limit to cumulative percentage Namhyung Kim
@ 2013-10-31  6:56 ` Namhyung Kim
  2013-11-01 13:17   ` Jiri Olsa
  2013-10-31  8:09 ` [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Ingo Molnar
  14 siblings, 1 reply; 64+ messages in thread
From: Namhyung Kim @ 2013-10-31  6:56 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Namhyung Kim, LKML,
	Frederic Weisbecker, Stephane Eranian, Jiri Olsa, Rodrigo Campos,
	Arun Sharma

From: Namhyung Kim <namhyung.kim@lge.com>

The -g cumulative option shows the accumulated overhead (period)
value as well as the self overhead.

Cc: Arun Sharma <asharma@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-report.txt |  2 ++
 tools/perf/builtin-report.c              |  3 +++
 tools/perf/util/callchain.c              | 12 +++++++++++-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 10a279871251..b150bfb734f4 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -130,6 +130,8 @@ OPTIONS
 	- graph: use a graph tree, displaying absolute overhead rates.
 	- fractal: like graph, but displays relative rates. Each branch of
 		 the tree is considered as a new profiled object. +
+        - cumulative: accumulate callchain to parent entry so that they can
+		    show up in the output.
 
 	order can be either:
 	- callee: callee based call graph.
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 3b626127c8d6..281053b28898 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1030,6 +1030,9 @@ parse_callchain_opt(const struct option *opt, const char *arg, int unset)
 	else if (!strncmp(tok, "fractal", strlen(arg)))
 		callchain_param.mode = CHAIN_GRAPH_REL;
 
+	else if (!strncmp(tok, "cumulative", strlen(arg)))
+		callchain_param.mode = CHAIN_CUMULATIVE;
+
 	else if (!strncmp(tok, "none", strlen(arg))) {
 		callchain_param.mode = CHAIN_NONE;
 		symbol_conf.use_callchain = false;
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index c10052c6e73c..c3c73eb839df 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -150,6 +150,14 @@ sort_chain_graph_rel(struct rb_root *rb_root, struct callchain_root *chain_root,
 	rb_root->rb_node = chain_root->node.rb_root.rb_node;
 }
 
+static void
+sort_chain_cumulative(struct rb_root *rb_root __maybe_unused,
+		      struct callchain_root *chain_root __maybe_unused,
+		      u64 min_hit __maybe_unused,
+		      struct callchain_param *param __maybe_unused)
+{
+}
+
 int callchain_register_param(struct callchain_param *param)
 {
 	switch (param->mode) {
@@ -162,8 +170,10 @@ int callchain_register_param(struct callchain_param *param)
 	case CHAIN_FLAT:
 		param->sort = sort_chain_flat;
 		break;
-	case CHAIN_NONE:
 	case CHAIN_CUMULATIVE:
+		param->sort = sort_chain_cumulative;
+		break;
+	case CHAIN_NONE:
 	default:
 		return -1;
 	}
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-10-31  6:56 [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Namhyung Kim
                   ` (13 preceding siblings ...)
  2013-10-31  6:56 ` [PATCH 14/14] perf report: Add -g cumulative option Namhyung Kim
@ 2013-10-31  8:09 ` Ingo Molnar
  2013-11-01  6:48   ` Namhyung Kim
                     ` (2 more replies)
  14 siblings, 3 replies; 64+ messages in thread
From: Ingo Molnar @ 2013-10-31  8:09 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Paul Mackerras,
	Namhyung Kim, LKML, Frederic Weisbecker, Stephane Eranian,
	Jiri Olsa, Rodrigo Campos, Arun Sharma



* Namhyung Kim <namhyung@kernel.org> wrote:

> When the -g cumulative option is given, it'll be shown like this:
> 
>   $ perf report -g cumulative --stdio
> 
>   # Overhead  Overhead (Acc)  Command      Shared Object                   Symbol
>   # ........  ..............  .......  .................  .......................
>   #
>        0.00%          88.29%      abc  libc-2.17.so       [.] __libc_start_main  
>        0.00%          88.29%      abc  abc                [.] main               
>        0.00%          88.29%      abc  abc                [.] c                  
>        0.00%          88.29%      abc  abc                [.] b                  
>       88.29%          88.29%      abc  abc                [.] a                  
>        0.00%          11.61%      abc  ld-2.17.so         [k] _dl_sysdep_start   
>        0.00%           9.43%      abc  ld-2.17.so         [.] dl_main            
>        9.43%           9.43%      abc  ld-2.17.so         [.] _dl_relocate_object
>        2.27%           2.27%      abc  [kernel.kallsyms]  [k] page_fault         
>        0.00%           2.18%      abc  ld-2.17.so         [k] _dl_start_user     
>        0.00%           0.10%      abc  ld-2.17.so         [.] _start             
> 
> As you can see __libc_start_main -> main -> c -> b -> a callchain 
> show up in the output.

This looks really useful!

A couple of details:

1)

This is pretty close to SysProf output, right? So why not use the 
well-known SysProf naming and call the first column 'self' and the 
second column 'total'? I think those names are pretty intuitive and 
it would help people who come from SysProf over to perf.

2)

Is it possible to configure the default 'report -g' style, so that 
people who'd like to use it all the time don't have to type '-g 
cumulative' all the time?

3)

I'd even argue that we enable this reporting feature by default, if 
a data file includes call-chain data: the first column will still 
show the well-known percentage that perf report produces today, the 
second column will be a new feature in essence.

The only open question would be, by which column should we sort: 
'sysprof style' sorts by 'total', 'perf style' sorts by 'self'. 
Agreed?

4)

This is not directly related to the new feature you added: 
call-graph profiling still takes quite a bit of time. It might make 
sense to save the ordered histogram to a perf.data.ordered file, so 
that repeat invocations of 'perf report' don't have to recalculate 
everything again and again?

This file would be maintained transparently and would only be 
re-created when the perf.data file changes, or something like that.

5)

I realize that this is an early RFC, still there are some usability 
complaints I have about call-graph recording/reporting which should 
be addressed before adding new features.

For example I tried to get a list of the -g output modi via:

   $ perf report -g help

Which produced a lot of options - I think it should produce only a 
list of -g options. It also doesn't list cumulative:

    -g, --call-graph <output_type,min_percent[,print_limit],call_order>
                          Display callchains using output_type 
(graph, flat, fractal, or none) , min percent threshold, optional 
print limit, callchain order, key (function or address). Default: 
fractal,0.5,callee,function

Also, the list is very long and not very readable - I think there 
should be more newlines.

Then I tried to do:

   $ perf report -g

which, somewhat surprisingly, was accepted. Given that call-graph 
perf.data is recognized automatically by 'perf report', the -g 
option should only accept -g <type> syntax and provide a list of 
options when '-g' or '-g help' is provided.

6)

A similar UI problem exists on the 'perf record' side: 'perf record 
--call-graph help' should produce a specific list of call-graph 
possibilities, not the two screens full output it does today.

> I know it has some rough edges or even bugs, but I really want to 
> release it and get reviews.  It does not handle event groups and 
> annotations and it has a bug on TUI.
> 
> You can also get this series on 'perf/cumulate-v2' branch in my tree at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

So I tried it out on top of tip:master, with your testcase, and in 
the --stdio case it works very well:

# Overhead  Overhead (Acc)  Command      Shared Object                                      Symbol
# ........  ..............  .......  .................  ..........................................
#
     0.00%         100.00%      abc  abc                [.] _start                                
     0.00%         100.00%      abc  libc-2.17.so       [.] __libc_start_main                     
     0.00%         100.00%      abc  abc                [.] main                                  
     0.00%         100.00%      abc  abc                [.] c                                     
     0.00%         100.00%      abc  abc                [.] b                                     
    99.79%         100.00%      abc  abc                [.] a                                     
     0.01%           0.21%      abc  [kernel.kallsyms]  [k] apic_timer_interrupt     

In the TUI output the 'c' entry is not visible:

   99.79%  100.00%  abc  abc                [.] a                                                               
    0.00%   99.79%  abc  abc                [.] b                                                               
    0.01%    0.21%  abc  [kernel.kallsyms]  [k] apic_timer_interrupt                                            
    0.00%    0.19%  abc  [kernel.kallsyms]  [k] smp_apic_timer_interrupt                 

I suspect this is the 'TUI bug' you mentioned?

> Any comments are welcome, thanks.

Cool stuff, let's fix & merge it ASAP! :-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 08/14] perf report: Cache cumulative callchains
  2013-10-31  6:56 ` [PATCH 08/14] perf report: Cache cumulative callchains Namhyung Kim
@ 2013-10-31 11:13   ` Rodrigo Campos
  2013-11-01  7:07     ` Namhyung Kim
  2013-11-01 12:29   ` Jiri Olsa
  1 sibling, 1 reply; 64+ messages in thread
From: Rodrigo Campos @ 2013-10-31 11:13 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Jiri Olsa, Arun Sharma

[-- Attachment #1: Type: text/plain, Size: 5264 bytes --]

On Thu, Oct 31, 2013 at 03:56:10PM +0900, Namhyung Kim wrote:
> From: Namhyung Kim <namhyung.kim@lge.com>
> 
> It is possible that a callchain has cycles or recursive calls.  In that
> case it'll end up with entries having more than 100% overhead in the
> output.  In order to prevent such entries, cache each callchain node
> and skip it if the same entry was already accumulated.
> 
> Cc: Arun Sharma <asharma@fb.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-report.c | 49 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 49 insertions(+)
> 
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index 1b152a8b7f51..1a0de9a4a568 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -387,6 +387,11 @@ iter_finish_normal_entry(struct add_entry_iter *iter, struct addr_location *al)
>  	return err;
>  }
>  
> +struct cumulative_cache {
> +	struct dso *dso;
> +	struct symbol *sym;
> +};
> +
>  static int
>  iter_prepare_cumulative_entry(struct add_entry_iter *iter,
>  			      struct machine *machine,
> @@ -394,9 +399,31 @@ iter_prepare_cumulative_entry(struct add_entry_iter *iter,
>  			      struct addr_location *al __maybe_unused,
>  			      struct perf_sample *sample)
>  {
> +	struct callchain_cursor_node *node;
> +	struct cumulative_cache *ccache;
> +
>  	callchain_cursor_commit(&callchain_cursor);
>  
>  	/*
> +	 * This is for detecting cycles or recursions so that they're
> +	 * cumulated only one time to prevent entries more than 100%
> +	 * overhead.
> +	 */
> +	ccache = malloc(sizeof(*ccache) * PERF_MAX_STACK_DEPTH);
> +	if (ccache == NULL)
> +		return -ENOMEM;
> +
> +	node = callchain_cursor_current(&callchain_cursor);
> +	if (node == NULL)
> +		return 0;

Here you return without assigning iter->priv nor iter->priv->dso iter->priv->sym

> +
> +	ccache[0].dso = node->map->dso;
> +	ccache[0].sym = node->sym;
> +
> +	iter->priv = ccache;
> +	iter->curr = 1;

Because the assignment is done here.

> +
> +	/*
>  	 * The first callchain node always contains same information
>  	 * as a hist entry itself.  So skip it in order to prevent
>  	 * double accounting.
> @@ -501,8 +528,29 @@ iter_add_next_cumulative_entry(struct add_entry_iter *iter,
>  {
>  	struct perf_evsel *evsel = iter->evsel;
>  	struct perf_sample *sample = iter->sample;
> +	struct cumulative_cache *ccache = iter->priv;
>  	struct hist_entry *he;
>  	int err = 0;
> +	int i;
> +
> +	/*
> +	 * Check if there's duplicate entries in the callchain.
> +	 * It's possible that it has cycles or recursive calls.
> +	 */
> +	for (i = 0; i < iter->curr; i++) {
> +		if (sort__has_sym) {
> +			if (ccache[i].sym == al->sym)
> +				return 0;
> +		} else {
> +			/* Not much we can do - just compare the dso. */
> +			if (ccache[i].dso == al->map->dso)

sym and dso are used here

> +				return 0;
> +		}
> +	}
> +
> +	ccache[i].dso = al->map->dso;
> +	ccache[i].sym = al->sym;
> +	iter->curr++;
>  
>  	he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
>  				sample->period, sample->weight,
> @@ -538,6 +586,7 @@ iter_finish_cumulative_entry(struct add_entry_iter *iter,
>  	evsel->hists.stats.total_period += sample->period;
>  	hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
>  
> +	free(iter->priv);

And here I'm seeing a double free when trying the patchset with other examples.
I added a printf to the "if (node == NULL)" case and I'm hitting it.  So it
seems to me that when the entry is reused, every user frees it, which leads to
the double free.

This is my first time looking at perf code, so I might be missing a LOT of
things, sorry in advance :)

I tried copying the dso and sym to the newly allocated memory (and assigning
iter->priv = ccache before the return when "node == NULL"), as shown in the
attached patch, but running it under valgrind also showed some invalid reads
and segfaults (without valgrind it didn't segfault, but I must have been
"lucky").

So if there is no node (node == NULL) and we cannot read the dso and sym from
the current values of iter->priv (they show up as invalid reads in valgrind),
I'm not sure where we can read them from.  And, IIUC, we should initialize them
because they are used later.  So maybe there are only some cases where
iter->priv can be read, and in the other cases we should just initialize them
to something (although that doesn't feel possible, since they are the dso and
sym)?  Or should we read/copy them from some other place (maybe before some
other thing is freed)?  Or maybe skip the malloc when node == NULL, just use
iter->priv, and only run the free once iter->curr == 1?  I added that check
for the free, but it didn't help, although I didn't really look at how
iter->curr is used.  What am I missing?

I'm not really sure what the right fix for this is. Also, just in case, I tried
assigning "iter->priv = NULL" after it's freed, and that """fixes""" it.

Just reverting the patch (it reverts without conflict) also solves the double
free problem for me (although it probably reintroduces the problem the patch
tries to fix =) and seems to make valgrind happy too.

Thanks a lot, and sorry again if I'm completely missing some "rules/invariants" -
I'm really new to perf :)

Rodrigo

[-- Attachment #2: broken-patch.diff --]
[-- Type: text/x-diff, Size: 820 bytes --]

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 281053b..a067be8 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -421,15 +421,20 @@ iter_prepare_cumulative_entry(struct add_entry_iter *iter,
 		return -ENOMEM;
 
 	node = callchain_cursor_current(&callchain_cursor);
-	if (node == NULL)
+	if (node == NULL) {
+		struct cumulative_cache *tmp = iter->priv;
+		ccache[0].dso = tmp[0].dso;
+		ccache[0].sym = tmp[0].sym;
+		iter->priv = ccache;
 		return 0;
-
-	ccache[0].dso = node->map->dso;
-	ccache[0].sym = node->sym;
+	}
 
 	iter->priv = ccache;
 	iter->curr = 1;
 
+	ccache[0].dso = node->map->dso;
+	ccache[0].sym = node->sym;
+
 	/*
 	 * The first callchain node always contains same information
 	 * as a hist entry itself.  So skip it in order to prevent

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-10-31  8:09 ` [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Ingo Molnar
@ 2013-11-01  6:48   ` Namhyung Kim
  2013-11-01  7:55     ` Ingo Molnar
  2013-11-04 13:27     ` Frederic Weisbecker
  2013-11-04 13:23   ` Frederic Weisbecker
  2013-11-04 13:34   ` Frederic Weisbecker
  2 siblings, 2 replies; 64+ messages in thread
From: Namhyung Kim @ 2013-11-01  6:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Paul Mackerras,
	Namhyung Kim, LKML, Frederic Weisbecker, Stephane Eranian,
	Jiri Olsa, Rodrigo Campos, Arun Sharma

Hi Ingo,

On Thu, 31 Oct 2013 09:09:32 +0100, Ingo Molnar wrote:
> * Namhyung Kim <namhyung@kernel.org> wrote:
>
>> When the -g cumulative option is given, it'll be shown like this:
>> 
>>   $ perf report -g cumulative --stdio
>> 
>>   # Overhead  Overhead (Acc)  Command      Shared Object                   Symbol
>>   # ........  ..............  .......  .................  .......................
>>   #
>>        0.00%          88.29%      abc  libc-2.17.so       [.] __libc_start_main  
>>        0.00%          88.29%      abc  abc                [.] main               
>>        0.00%          88.29%      abc  abc                [.] c                  
>>        0.00%          88.29%      abc  abc                [.] b                  
>>       88.29%          88.29%      abc  abc                [.] a                  
>>        0.00%          11.61%      abc  ld-2.17.so         [k] _dl_sysdep_start   
>>        0.00%           9.43%      abc  ld-2.17.so         [.] dl_main            
>>        9.43%           9.43%      abc  ld-2.17.so         [.] _dl_relocate_object
>>        2.27%           2.27%      abc  [kernel.kallsyms]  [k] page_fault         
>>        0.00%           2.18%      abc  ld-2.17.so         [k] _dl_start_user     
>>        0.00%           0.10%      abc  ld-2.17.so         [.] _start             
>> 
>> As you can see __libc_start_main -> main -> c -> b -> a callchain 
>> show up in the output.
>
> This looks really useful!

Thanks! :)

>
> A couple of details:
>
> 1)
>
> This is pretty close to SysProf output, right? So why not use the 
> well-known SysProf naming and call the first column 'self' and the 
> second column 'total'? I think those names are pretty intuitive and 
> it would help people who come from SysProf over to perf.

Okay, I can do it.  (Although sysprof seems to call it 'cumulative'
rather than 'total' - but I think 'total' is better since it's
simpler and shorter.)

>
> 2)
>
> Is it possible to configure the default 'report -g' style, so that 
> people who'd like to use it all the time don't have to type '-g 
> cumulative' all the time?

Hmm.. maybe I can add support for the 'report.call-graph' config option.
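For what it's worth, such a setting would presumably live in the [report]
section of ~/.perfconfig; the key name and value below are only a guess at
the option being proposed here, not something that exists at this point:

```ini
# Hypothetical ~/.perfconfig fragment: make 'perf report' default to the
# cumulative call-graph mode without passing '-g cumulative' every time.
[report]
	call-graph = cumulative
```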

>
> 3)
>
> I'd even argue that we enable this reporting feature by default, if 
> a data file includes call-chain data: the first column will still 
> show the well-known percentage that perf report produces today, the 
> second column will be a new feature in essence.
>
> The only open question would be, by which column should we sort: 
> 'sysprof style' sorts by 'total', 'perf style' sorts by 'self'. 
> Agreed?

Right, I made it sort by 'total' by default.  But we can add an option for it.

>
> 4)
>
> This is not directly related to the new feature you added: 
> call-graph profiling still takes quite a bit of time. It might make 
> sense to save the ordered histogram to a perf.data.ordered file, so 
> that repeat invocations of 'perf report' don't have to recalculate 
> everything again and again?
>
> This file would be maintained transparently and would only be 
> re-created when the perf.data file changes, or something like that.

Hmm.. good idea.  We may discuss it along with Jiri's multiple file
storage patches.  I haven't had time to review them yet - maybe next week.

>
> 5)
>
> I realize that this is an early RFC, still there are some usability 
> complaints I have about call-graph recording/reporting which should 
> be addressed before adding new features.
>
> For example I tried to get a list of the -g output modi via:
>
>    $ perf report -g help
>
> Which produced a lot of options - I think it should produce only a 
> list of -g options.

Right.  I have a patchset for this.  Will send it soon.


> It also doesn't list cumulative:
>
>     -g, --call-graph <output_type,min_percent[,print_limit],call_order>
>                           Display callchains using output_type 
> (graph, flat, fractal, or none) , min percent threshold, optional 
> print limit, callchain order, key (function or address). Default: 
> fractal,0.5,callee,function

Ah, I forgot to add it.  Will fix!

>
> Also, the list is very long and not very readable - I think there 
> should be more newlines.
>
> Then I tried to do:
>
>    $ perf report -g
>
> which, somewhat surprisingly, was accepted. Given that call-graph 
> perf.data is recognized automatically by 'perf report', the -g 
> option should only accept -g <type> syntax and provide a list of 
> options when '-g' or '-g help' is provided.

Will check.

>
> 6)
>
> A similar UI problem exists on the 'perf record' side: 'perf record 
> --call-graph help' should produce a specific list of call-graph 
> possibilities, not the two screens full output it does today.

Right.  The patch will come soonish. :)

>
>> I know it has some rough edges or even bugs, but I really want to 
>> release it and get reviews.  It does not handle event groups and 
>> annotations and it has a bug on TUI.
>> 
>> You can also get this series on 'perf/cumulate-v2' branch in my tree at:
>> 
>>   git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>
> So I tried it out on top of tip:master, with your testcase, and in 
> the --stdio case it works very well:
>
> # Overhead  Overhead (Acc)  Command      Shared Object                                      Symbol
> # ........  ..............  .......  .................  ..........................................
> #
>      0.00%         100.00%      abc  abc                [.] _start                                
>      0.00%         100.00%      abc  libc-2.17.so       [.] __libc_start_main                     
>      0.00%         100.00%      abc  abc                [.] main                                  
>      0.00%         100.00%      abc  abc                [.] c                                     
>      0.00%         100.00%      abc  abc                [.] b                                     
>     99.79%         100.00%      abc  abc                [.] a                                     
>      0.01%           0.21%      abc  [kernel.kallsyms]  [k] apic_timer_interrupt     
>
> In the TUI output the 'c' entry is not visible:
>
>    99.79%  100.00%  abc  abc                [.] a                                                               
>     0.00%   99.79%  abc  abc                [.] b                                                               
>     0.01%    0.21%  abc  [kernel.kallsyms]  [k] apic_timer_interrupt                                            
>     0.00%    0.19%  abc  [kernel.kallsyms]  [k] smp_apic_timer_interrupt                 
>
> I suspect this is the 'TUI bug' you mentioned?

Exactly, I'll dig into it.

>
>> Any comments are welcome, thanks.
>
> Cool stuff, let's fix & merge it ASAP! :-)

hehe, Thanks!
Namhyung


* Re: [PATCH 08/14] perf report: Cache cumulative callchains
  2013-10-31 11:13   ` Rodrigo Campos
@ 2013-11-01  7:07     ` Namhyung Kim
  2013-11-01 14:24       ` Rodrigo Campos
  2013-11-01 15:16       ` Rodrigo Campos
  0 siblings, 2 replies; 64+ messages in thread
From: Namhyung Kim @ 2013-11-01  7:07 UTC (permalink / raw)
  To: Rodrigo Campos
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Jiri Olsa, Arun Sharma

Hi Rodrigo,

On Thu, 31 Oct 2013 11:13:34 +0000, Rodrigo Campos wrote:
> On Thu, Oct 31, 2013 at 03:56:10PM +0900, Namhyung Kim wrote:
>> From: Namhyung Kim <namhyung.kim@lge.com>
>>  	/*
>> +	 * This is for detecting cycles or recursions so that they're
>> +	 * cumulated only one time to prevent entries more than 100%
>> +	 * overhead.
>> +	 */
>> +	ccache = malloc(sizeof(*ccache) * PERF_MAX_STACK_DEPTH);
>> +	if (ccache == NULL)
>> +		return -ENOMEM;
>> +
>> +	node = callchain_cursor_current(&callchain_cursor);
>> +	if (node == NULL)
>> +		return 0;
>
> Here you return without assigning iter->priv, nor iter->priv->dso / iter->priv->sym

Right!  I forgot to set iter->priv to ccache in this case.

>
>> +
>> +	ccache[0].dso = node->map->dso;
>> +	ccache[0].sym = node->sym;
>> +
>> +	iter->priv = ccache;
>> +	iter->curr = 1;
>
> Because the assignment is done here.
>
>> +
>> +	/*
>>  	 * The first callchain node always contains same information
>>  	 * as a hist entry itself.  So skip it in order to prevent
>>  	 * double accounting.
>> @@ -501,8 +528,29 @@ iter_add_next_cumulative_entry(struct add_entry_iter *iter,
>>  {
>>  	struct perf_evsel *evsel = iter->evsel;
>>  	struct perf_sample *sample = iter->sample;
>> +	struct cumulative_cache *ccache = iter->priv;
>>  	struct hist_entry *he;
>>  	int err = 0;
>> +	int i;
>> +
>> +	/*
>> +	 * Check if there's duplicate entries in the callchain.
>> +	 * It's possible that it has cycles or recursive calls.
>> +	 */
>> +	for (i = 0; i < iter->curr; i++) {
>> +		if (sort__has_sym) {
>> +			if (ccache[i].sym == al->sym)
>> +				return 0;
>> +		} else {
>> +			/* Not much we can do - just compare the dso. */
>> +			if (ccache[i].dso == al->map->dso)
>
> sym and dso are used here
>
>> +				return 0;
>> +		}
>> +	}
>> +
>> +	ccache[i].dso = al->map->dso;
>> +	ccache[i].sym = al->sym;
>> +	iter->curr++;
>>  
>>  	he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
>>  				sample->period, sample->weight,
>> @@ -538,6 +586,7 @@ iter_finish_cumulative_entry(struct add_entry_iter *iter,
>>  	evsel->hists.stats.total_period += sample->period;
>>  	hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
>>  
>> +	free(iter->priv);
>
> And here I'm seeing a double free when trying the patchset with other examples.
> I added a printf to the "if (node == NULL)" case and I'm hitting it. So it seems
> to me that, when reusing the entry, every user is freeing it and then the double
> free.
>
> This is my first time looking at perf code, so I might be missing LOT of things,
> sorry in advance :)

Don't say sorry!  You're very helpful and found a real bug!

>
> I tried copying the dso and sym to the new allocated mem (and assigning
> iter->priv = ccache before the return if "node == NULL"), as shown in the
> attached patch, but when running with valgrind it also added some invalid reads
> and segfaults (without valgrind it didn't segfault, but I must be "lucky").
>
> So if there is no node (node == NULL) and we cannot read the dso and sym from
> the current values of iter->priv (they show invalid reads in valgrind), I'm not
> sure where can we read them. And, IIUC, we should initialize them because they
> are used later. So maybe there are only some cases where we can read iter->priv
> and for the other cases just initialize to something (although doesn't feel
> possible because it's the dso and sym) ? Or should we read/copy them from some
> other place (maybe before some other thing is free'd) ? Or maybe forget about
> the malloc when node == NULL and just use iter->priv and the free shouldn't be
> executed till iter->curr == 1 ? I added that if for the free, but didn't help.
> Although I didn't really check how iter->curr is used. What am I missing ?

If node == NULL, it means there are no valid callchains, so there's no
need to go into the loop - iter_next_cumulative_entry() returns 0, so
iter_add_next_cumulative_entry() is never called.  So don't worry about
the sym and dso in this case.

The problem is freeing iter->priv unconditionally.  Since it still holds
the previous ccache pointer (which was already freed), it can lead to a
double free if the next entry has no valid callchains.

>
> I'm not really sure which is the fix for this. Also just in case I tried
> assigning "iter->priv = NULL" after it's free'd and it """fixes""" it.

I think the right fix is assigning "iter->priv = NULL" as you said.  But
I changed this patch a bit for v3, so I need to check it again.

>
> Just reverting the patch (reverts without conflict) also solves the double free
> problem for me (although it probably introduces the problem the patch tries to
> fix =) and seems to make valgrind happy too.
>
> Thanks a lot and sorry again if I'm completely missing some "rules/invariants",
> I'm really new to perf :)

You didn't miss anything and I'd really appreciate your review. :)

Thanks,
Namhyung


* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-01  6:48   ` Namhyung Kim
@ 2013-11-01  7:55     ` Ingo Molnar
  2013-11-01  9:18       ` Pekka Enberg
  2013-11-01  9:22       ` Namhyung Kim
  2013-11-04 13:27     ` Frederic Weisbecker
  1 sibling, 2 replies; 64+ messages in thread
From: Ingo Molnar @ 2013-11-01  7:55 UTC (permalink / raw)
  To: Namhyung Kim, Pekka Enberg
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Paul Mackerras,
	Namhyung Kim, LKML, Frederic Weisbecker, Stephane Eranian,
	Jiri Olsa, Rodrigo Campos, Arun Sharma


* Namhyung Kim <namhyung@kernel.org> wrote:

> > A couple of details:
> >
> > 1)
> >
> > This is pretty close to SysProf output, right? So why not use the 
> > well-known SysProf naming and call the first column 'self' and the 
> > second column 'total'? I think those names are pretty intuitive and 
> > it would help people who come from SysProf over to perf.
> 
> Okay, I can do it. (Although sysprof seems to call it 'cumulative'
> rather than 'total' - but I think the 'total' is better since it's
> simpler and shorter.)

So sysprof-1.2 has the following two windows:

 'functions',   with 'self' and 'total' fields
 'descendants', with 'self' and 'cumulative' fields

'descendants' appears to be similar to the perf 'dso' concept.

> > 2)
> >
> > Is it possible to configure the default 'report -g' style, so that 
> > people who'd like to use it all the time don't have to type '-g 
> > cumulative' all the time?
> 
> Hmm.. maybe I can add support for the 'report.call-graph' config option.

If we display your new 'total' field by default then it's not as 
pressing to me :)

> > 3)
> >
> > I'd even argue that we enable this reporting feature by default, if 
> > a data file includes call-chain data: the first column will still 
> > show the well-known percentage that perf report produces today, the 
> > second column will be a new feature in essence.
> >
> > The only open question would be, by which column should we sort: 
> > 'sysprof style' sorts by 'total', 'perf style' sorts by 'self'. 
> > Agreed?
> 
> Right, I defaulted to go by 'total'.  But we can add an option for 
> it.

The purpose would be to allow people to do old-style 'sort by 
function overhead' output, while still seeing the 'total' field as 
well.

Btw., if anyone is interested in improving the GTK front-end, it 
would be _really_ nice if it had a 'start profiling' button like 
sysprof has today, with a 'samples' field showing the current number 
of samples. (We could even improve upon sysprof by adding 'stop' 
functionality as well ;-)

A bit like perf top, except the reporting session is hidden until 
the user actively requests the profile.

Maybe it could even be called a gtk version of 'perf top', with a 
button to start/stop collection, with another button to 
activate/deactivate reporting output, and yet another button to 
reset the profiling buffer.

With that feature set perf would be a ready sysprof workflow 
replacement I think. (I've Cc:-ed Pekka, just in case! :-)

> > 4)
> >
> > This is not directly related to the new feature you added: 
> > call-graph profiling still takes quite a bit of time. It might 
> > make sense to save the ordered histogram to a perf.data.ordered 
> > file, so that repeat invocations of 'perf report' don't have to 
> > recalculate everything again and again?
> >
> > This file would be maintained transparently and would only be 
> > re-created when the perf.data file changes, or something like 
> > that.
> 
> Hmm.. good idea.  We may discuss it along with Jiri's multiple 
> file storage patches.  I haven't had a time to review - maybe next 
> week.

So Arnaldo tells me that with your and Frederic's latest 
callgraph-speedup patches the parsing of perf.data got _really_ 
fast, so maybe my performance complaint is moot and we should delay 
complicating the primary perf.data file model with a 'cache' until 
your patches are in and we see the full impact.

Thanks,

	Ingo


* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-01  7:55     ` Ingo Molnar
@ 2013-11-01  9:18       ` Pekka Enberg
  2013-11-01  9:22       ` Namhyung Kim
  1 sibling, 0 replies; 64+ messages in thread
From: Pekka Enberg @ 2013-11-01  9:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Paul Mackerras, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma

On Fri, Nov 1, 2013 at 9:55 AM, Ingo Molnar <mingo@kernel.org> wrote:
> Btw., if anyone is interested in improving the GTK front-end, it
> would be _really_ nice if it had a 'start profiling' button like
> sysprof has today, with a 'samples' field showing the current number
> of samples. (We could even improve upon sysprof by adding 'stop'
> functionality as well ;-)
>
> A bit like perf top, except the reporting session is hidden until
> the user actively requests the profile.
>
> Maybe it could even be called a gtk version of 'perf top', with a
> button to start/stop collection, with another button to
> activate/deactivate reporting output, and yet another button to
> reset the profiling buffer.
>
> With that feature set perf would be a ready sysprof workflow
> replacement I think. (I've Cc:-ed Pekka, just in case! :-)

Sure, a "start/stop" button is useful for system-wide profiling, but you
also want to be able to start an application for profiling from the UI.
I don't remember if SysProf supports that but the Shark profiler on Mac
OS X does.

Btw, to make GTK-front end even more useful, we should:

  (1) Have a drop-down for selecting a specific process from system-wide
      profile.

  (2) Show 'perf annotate' from 'perf report' when a function is
      double-clicked.

                        Pekka


* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-01  7:55     ` Ingo Molnar
  2013-11-01  9:18       ` Pekka Enberg
@ 2013-11-01  9:22       ` Namhyung Kim
  2013-11-01  9:27         ` Ingo Molnar
  1 sibling, 1 reply; 64+ messages in thread
From: Namhyung Kim @ 2013-11-01  9:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Paul Mackerras, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma

On Fri, 1 Nov 2013 08:55:02 +0100, Ingo Molnar wrote:
> * Namhyung Kim <namhyung@kernel.org> wrote:
>
>> > A couple of details:
>> >
>> > 1)
>> >
>> > This is pretty close to SysProf output, right? So why not use the 
>> > well-known SysProf naming and call the first column 'self' and the 
>> > second column 'total'? I think those names are pretty intuitive and 
>> > it would help people who come from SysProf over to perf.
>> 
>> Okay, I can do it. (Although sysprof seems to call it 'cumulative'
>> rather than 'total' - but I think the 'total' is better since it's
>> simpler and shorter.)
>
> So sysprof-1.2 has the following two windows:
>
>  'functions',   with 'self' and 'total' fields
>  'descendants', with 'self' and 'cumulative' fields
>
> 'descendants' appears to be similar to the perf 'dso' concept.

Ah, okay.  Thanks for the info.

>
>> > 2)
>> >
>> > Is it possible to configure the default 'report -g' style, so that 
>> > people who'd like to use it all the time don't have to type '-g 
>> > cumulative' all the time?
>> 
>> Hmm.. maybe I can add support for the 'report.call-graph' config option.
>
> If we display your new 'total' field by default then it's not as 
> pressing to me :)

Do you mean -g cumulative without 'self' column?

>
>> > 3)
>> >
>> > I'd even argue that we enable this reporting feature by default, if 
>> > a data file includes call-chain data: the first column will still 
>> > show the well-known percentage that perf report produces today, the 
>> > second column will be a new feature in essence.
>> >
>> > The only open question would be, by which column should we sort: 
>> > 'sysprof style' sorts by 'total', 'perf style' sorts by 'self'. 
>> > Agreed?
>> 
>> Right, I defaulted to go by 'total'.  But we can add an option for 
>> it.
>
> The purpose would be to allow people to do old-style 'sort by 
> function overhead' output, while still seeing the 'total' field as 
> well.

Right.

>
> Btw., if anyone is interested in improving the GTK front-end, it 
> would be _really_ nice if it had a 'start profiling' button like 
> sysprof has today, with a 'samples' field showing the current number 
> of samples. (We could even improve upon sysprof by adding 'stop' 
> functionality as well ;-)

Wow, I'm surprised that sysprof doesn't have one. :)

>
> A bit like perf top, except the reporting session is hidden until 
> the user actively requests the profile.
>
> Maybe it could even be called a gtk version of 'perf top', with a 
> button to start/stop collection, with another button to 
> activate/deactivate reporting output, and yet another button to 
> reset the profiling buffer.
>
> With that feature set perf would be a ready sysprof workflow 
> replacement I think. (I've Cc:-ed Pekka, just in case! :-)

Sounds nice.  I'm not sure I'll have time to do it anytime soon, though.

>
>> > 4)
>> >
>> > This is not directly related to the new feature you added: 
>> > call-graph profiling still takes quite a bit of time. It might 
>> > make sense to save the ordered histogram to a perf.data.ordered 
>> > file, so that repeat invocations of 'perf report' don't have to 
>> > recalculate everything again and again?
>> >
>> > This file would be maintained transparently and would only be 
>> > re-created when the perf.data file changes, or something like 
>> > that.
>> 
>> Hmm.. good idea.  We may discuss it along with Jiri's multiple 
>> file storage patches.  I haven't had a time to review - maybe next 
>> week.
>
> So Arnaldo tells me that with your and Frederic's latest 
> callgraph-speedup patches the parsing of perf.data got _really_ 
> fast, so maybe my performance complaint is moot and we should delay 
> complicating the primary perf.data file model with a 'cache' until 
> your patches are in and we see the full impact.

Okay, let's see what happens. :)

Thanks,
Namhyung


* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-01  9:22       ` Namhyung Kim
@ 2013-11-01  9:27         ` Ingo Molnar
  2013-11-05  7:31           ` Namhyung Kim
  0 siblings, 1 reply; 64+ messages in thread
From: Ingo Molnar @ 2013-11-01  9:27 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Pekka Enberg, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Paul Mackerras, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma


* Namhyung Kim <namhyung@kernel.org> wrote:

> >> > 2)
> >> >
> >> > Is it possible to configure the default 'report -g' style, so that 
> >> > people who'd like to use it all the time don't have to type '-g 
> >> > cumulative' all the time?
> >> 
> >> Hmm.. maybe I can add support for the 'report.call-graph' config option.
> >
> > If we display your new 'total' field by default then it's not as 
> > pressing to me :)
> 
> Do you mean -g cumulative without 'self' column?

So, if by default we display both 'self' and 'total' and sort by 
'total', then I'm personally a happy camper: while it's a change of 
the default perf report output, it's also a step forward.

But some people might complain about it once this hits v3.13 (or 
v3.14) and might want to hide individual columns and have different 
sorting preferences.

So out of defensive caution you might want to provide toggles for 
such things, maybe even generalize it a bit to allow arbitrary 
hiding/display of individual columns in perf report.

That would probably also make it easier to do minimal tweaks to the 
upstream reporting defaults.

> > Btw., if anyone is interested in improving the GTK front-end, it 
> > would be _really_ nice if it had a 'start profiling' button like 
> > sysprof has today, with a 'samples' field showing the current 
> > number of samples. (We could even improve upon sysprof by adding 
> > 'stop' functionality as well ;-)
> 
> Wow, I'm impressed that the sysprof doesn't have one. :)

At least I haven't found it: I tried pressing 'start' once more but 
that doesn't do it, it just keeps collecting data.

Still, many developers love sysprof, so I think there would be quite 
some benefit in providing a gtk perf top version with the controls 
Pekka and I listed.

Thanks,

	Ingo


* Re: [PATCH 01/14] perf tools: Consolidate __hists__add_*entry()
  2013-10-31  6:56 ` [PATCH 01/14] perf tools: Consolidate __hists__add_*entry() Namhyung Kim
@ 2013-11-01 11:56   ` Jiri Olsa
  2013-11-06  5:43   ` [tip:perf/core] perf hists: " tip-bot for Namhyung Kim
  1 sibling, 0 replies; 64+ messages in thread
From: Jiri Olsa @ 2013-11-01 11:56 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Rodrigo Campos

On Thu, Oct 31, 2013 at 03:56:03PM +0900, Namhyung Kim wrote:
> From: Namhyung Kim <namhyung.kim@lge.com>
> 
> The __hists__add_{branch,mem}_entry() did almost same thing that
> __hists__add_entry() does.  Consolidate them into one.
> 
> Cc: Jiri Olsa <jolsa@redhat.com>

Acked-by: Jiri Olsa <jolsa@redhat.com>

jirka


* Re: [PATCH 02/14] perf tools: Introduce struct add_entry_iter
  2013-10-31  6:56 ` [PATCH 02/14] perf tools: Introduce struct add_entry_iter Namhyung Kim
@ 2013-11-01 12:07   ` Jiri Olsa
  2013-11-05  7:09     ` Namhyung Kim
  2013-11-01 12:09   ` Jiri Olsa
  1 sibling, 1 reply; 64+ messages in thread
From: Jiri Olsa @ 2013-11-01 12:07 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Rodrigo Campos

On Thu, Oct 31, 2013 at 03:56:04PM +0900, Namhyung Kim wrote:
> From: Namhyung Kim <namhyung.kim@lge.com>

SNIP

> +
> +static int
> +perf_evsel__add_entry(struct perf_evsel *evsel, struct addr_location *al,
> +		      struct perf_sample *sample, struct machine *machine,
> +		      struct add_entry_iter *iter)
> +{
> +	int err;
> +
> +	if ((sort__has_parent || symbol_conf.use_callchain) && sample->callchain) {
> +		err = machine__resolve_callchain(machine, evsel, al->thread,
> +						 sample, &iter->parent, al,
> +						 iter->rep->max_stack);
> +		if (err)
> +			return err;
> +	}
> +
> +	err = iter->prepare_entry(iter, machine, evsel, al, sample);
> +	if (err)
> +		return err;
> +
> +	err = iter->add_single_entry(iter, al);
> +	if (err)
> +		return err;
> +
> +	while (iter->next_entry(iter, al)) {
> +		err = iter->add_next_entry(iter, al);
> +		if (err)
> +			break;
> +	}
> +
> +	err = iter->finish_entry(iter, al);
> +
> +	return err;

	return iter->finish_entry(iter, al);  ?



* Re: [PATCH 02/14] perf tools: Introduce struct add_entry_iter
  2013-10-31  6:56 ` [PATCH 02/14] perf tools: Introduce struct add_entry_iter Namhyung Kim
  2013-11-01 12:07   ` Jiri Olsa
@ 2013-11-01 12:09   ` Jiri Olsa
  2013-11-05  7:16     ` Namhyung Kim
  1 sibling, 1 reply; 64+ messages in thread
From: Jiri Olsa @ 2013-11-01 12:09 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Rodrigo Campos

On Thu, Oct 31, 2013 at 03:56:04PM +0900, Namhyung Kim wrote:
> From: Namhyung Kim <namhyung.kim@lge.com>

SNIP

> +}
> +
> +static int
> +iter_add_next_nop_entry(struct add_entry_iter *iter __maybe_unused,
> +			struct addr_location *al __maybe_unused)
> +{
> +	return 0;
> +}
> +
> +static int
> +iter_prepare_mem_entry(struct add_entry_iter *iter, struct machine *machine,
> +		       struct perf_evsel *evsel, struct addr_location *al,
> +		       struct perf_sample *sample)
> +{
> +	union perf_event *event = iter->priv;
> +	struct mem_info *mi;
> +	u8 cpumode;
> +
> +	BUG_ON(event == NULL);

the priv does not get assigned.. 'perf mem rep' aborts

[jolsa@krava perf]$ ./perf mem rep --stdio
Aborted
perf: builtin-report.c:120: iter_prepare_mem_entry: Assertion `!(event == ((void *)0))' failed.

jirka


* Re: [PATCH 08/14] perf report: Cache cumulative callchains
  2013-10-31  6:56 ` [PATCH 08/14] perf report: Cache cumulative callchains Namhyung Kim
  2013-10-31 11:13   ` Rodrigo Campos
@ 2013-11-01 12:29   ` Jiri Olsa
  2013-11-01 12:57     ` Jiri Olsa
  1 sibling, 1 reply; 64+ messages in thread
From: Jiri Olsa @ 2013-11-01 12:29 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Rodrigo Campos, Arun Sharma

On Thu, Oct 31, 2013 at 03:56:10PM +0900, Namhyung Kim wrote:

SNIP

>  	 * double accounting.
> @@ -501,8 +528,29 @@ iter_add_next_cumulative_entry(struct add_entry_iter *iter,
>  {
>  	struct perf_evsel *evsel = iter->evsel;
>  	struct perf_sample *sample = iter->sample;
> +	struct cumulative_cache *ccache = iter->priv;
>  	struct hist_entry *he;
>  	int err = 0;
> +	int i;
> +
> +	/*
> +	 * Check if there's duplicate entries in the callchain.
> +	 * It's possible that it has cycles or recursive calls.
> +	 */
> +	for (i = 0; i < iter->curr; i++) {
> +		if (sort__has_sym) {
> +			if (ccache[i].sym == al->sym)
> +				return 0;
> +		} else {
> +			/* Not much we can do - just compare the dso. */
> +			if (ccache[i].dso == al->map->dso)
> +				return 0;
> +		}
> +	}

hum, do we want to prevent recursion totally?
what about intended recursion?

also, should the dso be checked together with the sym?
because a symbol is identified as dso::sym

jirka
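The second point above (comparing dso and sym together) could look like the sketch below. The cache layout mirrors the cumulative_cache from the patch, but the struct definitions are simplified stand-ins and ccache__has_entry is a hypothetical helper name:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the perf types referenced in the patch. */
struct symbol { const char *name; };
struct dso    { const char *name; };

struct cumulative_cache {
	struct dso    *dso;
	struct symbol *sym;
};

/*
 * A symbol is only unique within its dso (dso::sym), so compare both
 * fields when looking for a duplicate callchain entry.
 */
static bool ccache__has_entry(const struct cumulative_cache *ccache, int nr,
			      const struct dso *dso, const struct symbol *sym)
{
	int i;

	for (i = 0; i < nr; i++) {
		if (ccache[i].dso == dso && ccache[i].sym == sym)
			return true;
	}
	return false;
}
```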


* Re: [PATCH 07/14] perf tools: Update cpumode for each cumulative entry
  2013-10-31  6:56 ` [PATCH 07/14] perf tools: Update cpumode for each cumulative entry Namhyung Kim
@ 2013-11-01 12:55   ` Jiri Olsa
  2013-11-05  7:41     ` Namhyung Kim
  0 siblings, 1 reply; 64+ messages in thread
From: Jiri Olsa @ 2013-11-01 12:55 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Rodrigo Campos, Arun Sharma

On Thu, Oct 31, 2013 at 03:56:09PM +0900, Namhyung Kim wrote:
> From: Namhyung Kim <namhyung.kim@lge.com>
> 
> The cpumode and level in struct addr_location were set once for a sample
> but not updated as cumulative callchain entries were added.  This led to
> non-matching symbol and cpumode in the output.
> 
> Update them based on whether the map is part of the kernel or not.
> 
> Cc: Arun Sharma <asharma@fb.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-report.c | 25 ++++++++++++++++++++++++-
>  1 file changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index 92cbd5cd1ab1..1b152a8b7f51 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -80,6 +80,7 @@ struct add_entry_iter {
>  	struct perf_report *rep;
>  	struct perf_evsel *evsel;
>  	struct perf_sample *sample;
> +	struct machine *machine;
>  	struct hist_entry *he;
>  	struct symbol *parent;
>  	void *priv;
> @@ -388,7 +389,7 @@ iter_finish_normal_entry(struct add_entry_iter *iter, struct addr_location *al)
>  
>  static int
>  iter_prepare_cumulative_entry(struct add_entry_iter *iter,
> -			      struct machine *machine __maybe_unused,
> +			      struct machine *machine,
>  			      struct perf_evsel *evsel,
>  			      struct addr_location *al __maybe_unused,
>  			      struct perf_sample *sample)
> @@ -404,6 +405,7 @@ iter_prepare_cumulative_entry(struct add_entry_iter *iter,
>  
>  	iter->evsel = evsel;
>  	iter->sample = sample;
> +	iter->machine = machine;
>  	return 0;
>  }
>  
> @@ -468,6 +470,27 @@ iter_next_cumulative_entry(struct add_entry_iter *iter __maybe_unused,
>  	if (al->sym == NULL)
>  		return 0;
>  
> +	if (al->map->groups == &iter->machine->kmaps) {
> +		if (machine__is_host(iter->machine)) {
> +			al->cpumode = PERF_RECORD_MISC_KERNEL;
> +			al->level = 'k';
> +		} else {
> +			al->cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
> +			al->level = 'g';
> +		}
> +	} else {
> +		if (machine__is_host(iter->machine)) {
> +			al->cpumode = PERF_RECORD_MISC_USER;
> +			al->level = '.';
> +		} else if (perf_guest) {
> +			al->cpumode = PERF_RECORD_MISC_GUEST_USER;
> +			al->level = 'u';
> +		} else {
> +			al->cpumode = PERF_RECORD_MISC_HYPERVISOR;
> +			al->level = 'H';
> +		}
> +	}
> +

Looks like this is what thread__find_addr_map does as well.
Could above code go into a function used by both places?

jirka
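The branch structure above could be factored into a helper along these lines. A sketch only: the helper name and bool parameters are hypothetical, while the PERF_RECORD_MISC_* values match the perf event ABI:

```c
#include <stdbool.h>

/* PERF_RECORD_MISC_* cpumode values from the perf event ABI. */
#define PERF_RECORD_MISC_KERNEL		1
#define PERF_RECORD_MISC_USER		2
#define PERF_RECORD_MISC_HYPERVISOR	3
#define PERF_RECORD_MISC_GUEST_KERNEL	4
#define PERF_RECORD_MISC_GUEST_USER	5

typedef unsigned char u8;

/*
 * Map a (kernel map?, host machine?, guest running?) triple to the
 * cpumode and display level, mirroring the if/else chain in the patch.
 */
static u8 map__guess_cpumode(bool kernel_map, bool is_host, bool has_guest,
			     char *level)
{
	if (kernel_map) {
		if (is_host) {
			*level = 'k';
			return PERF_RECORD_MISC_KERNEL;
		}
		*level = 'g';
		return PERF_RECORD_MISC_GUEST_KERNEL;
	}
	if (is_host) {
		*level = '.';
		return PERF_RECORD_MISC_USER;
	}
	if (has_guest) {
		*level = 'u';
		return PERF_RECORD_MISC_GUEST_USER;
	}
	*level = 'H';
	return PERF_RECORD_MISC_HYPERVISOR;
}
```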


* Re: [PATCH 08/14] perf report: Cache cumulative callchains
  2013-11-01 12:29   ` Jiri Olsa
@ 2013-11-01 12:57     ` Jiri Olsa
  0 siblings, 0 replies; 64+ messages in thread
From: Jiri Olsa @ 2013-11-01 12:57 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Rodrigo Campos, Arun Sharma

On Fri, Nov 01, 2013 at 01:29:42PM +0100, Jiri Olsa wrote:
> On Thu, Oct 31, 2013 at 03:56:10PM +0900, Namhyung Kim wrote:
> 
> SNIP
> 
> >  	 * double accounting.
> > @@ -501,8 +528,29 @@ iter_add_next_cumulative_entry(struct add_entry_iter *iter,
> >  {
> >  	struct perf_evsel *evsel = iter->evsel;
> >  	struct perf_sample *sample = iter->sample;
> > +	struct cumulative_cache *ccache = iter->priv;
> >  	struct hist_entry *he;
> >  	int err = 0;
> > +	int i;
> > +
> > +	/*
> > +	 * Check if there's duplicate entries in the callchain.
> > +	 * It's possible that it has cycles or recursive calls.
> > +	 */
> > +	for (i = 0; i < iter->curr; i++) {
> > +		if (sort__has_sym) {
> > +			if (ccache[i].sym == al->sym)
> > +				return 0;
> > +		} else {
> > +			/* Not much we can do - just compare the dso. */
> > +			if (ccache[i].dso == al->map->dso)
> > +				return 0;
> > +		}
> > +	}
> 
> hum, do we want to prevent recursion totally?
> what about intended recursion?

ugh... just managed to read the whole patch,
please forget above comment ;-)

> 
> also should the dso be checked together with sym?
> because the symbol is defined like dso::sym
> 
> jirka


* Re: [PATCH 14/14] perf report: Add -g cumulative option
  2013-10-31  6:56 ` [PATCH 14/14] perf report: Add -g cumulative option Namhyung Kim
@ 2013-11-01 13:17   ` Jiri Olsa
  2013-11-05  7:44     ` Namhyung Kim
  0 siblings, 1 reply; 64+ messages in thread
From: Jiri Olsa @ 2013-11-01 13:17 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Rodrigo Campos, Arun Sharma

On Thu, Oct 31, 2013 at 03:56:16PM +0900, Namhyung Kim wrote:
> From: Namhyung Kim <namhyung.kim@lge.com>
> 
> The -g cumulative option is for showing accumulated overhead (period)
> value as well as self overhead.
> 
> Cc: Arun Sharma <asharma@fb.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/Documentation/perf-report.txt |  2 ++
>  tools/perf/builtin-report.c              |  3 +++
>  tools/perf/util/callchain.c              | 12 +++++++++++-
>  3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
> index 10a279871251..b150bfb734f4 100644
> --- a/tools/perf/Documentation/perf-report.txt
> +++ b/tools/perf/Documentation/perf-report.txt
> @@ -130,6 +130,8 @@ OPTIONS
>  	- graph: use a graph tree, displaying absolute overhead rates.
>  	- fractal: like graph, but displays relative rates. Each branch of
>  		 the tree is considered as a new profiled object. +
> +        - cumulative: accumulate callchain to parent entry so that they can
> +		    show up in the output.
>  
>  	order can be either:
>  	- callee: callee based call graph.
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index 3b626127c8d6..281053b28898 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -1030,6 +1030,9 @@ parse_callchain_opt(const struct option *opt, const char *arg, int unset)
>  	else if (!strncmp(tok, "fractal", strlen(arg)))
>  		callchain_param.mode = CHAIN_GRAPH_REL;
>  
> +	else if (!strncmp(tok, "cumulative", strlen(arg)))
> +		callchain_param.mode = CHAIN_CUMULATIVE;
> +
>  	else if (!strncmp(tok, "none", strlen(arg))) {
>  		callchain_param.mode = CHAIN_NONE;
>  		symbol_conf.use_callchain = false;
> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> index c10052c6e73c..c3c73eb839df 100644
> --- a/tools/perf/util/callchain.c
> +++ b/tools/perf/util/callchain.c
> @@ -150,6 +150,14 @@ sort_chain_graph_rel(struct rb_root *rb_root, struct callchain_root *chain_root,
>  	rb_root->rb_node = chain_root->node.rb_root.rb_node;
>  }
>  
> +static void
> +sort_chain_cumulative(struct rb_root *rb_root __maybe_unused,
> +		      struct callchain_root *chain_root __maybe_unused,
> +		      u64 min_hit __maybe_unused,
> +		      struct callchain_param *param __maybe_unused)
> +{
> +}

maybe add a comment explaining that it's intentionally empty

or maybe dont set it and do following check
in __hists__insert_output_entry:

        if (symbol_conf.use_callchain && callchain_param.sort)
                callchain_param.sort(&he->sorted_chain, he->callchain,
                                      min_callchain_hits, &callchain_param);


jirka


* Re: [PATCH 08/14] perf report: Cache cumulative callchains
  2013-11-01  7:07     ` Namhyung Kim
@ 2013-11-01 14:24       ` Rodrigo Campos
  2013-11-01 15:16       ` Rodrigo Campos
  1 sibling, 0 replies; 64+ messages in thread
From: Rodrigo Campos @ 2013-11-01 14:24 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Jiri Olsa, Arun Sharma

On Fri, Nov 01, 2013 at 04:07:22PM +0900, Namhyung Kim wrote:
> Hi Rodrigo,
> 
> On Thu, 31 Oct 2013 11:13:34 +0000, Rodrigo Campos wrote:
> > On Thu, Oct 31, 2013 at 03:56:10PM +0900, Namhyung Kim wrote:
> >> @@ -538,6 +586,7 @@ iter_finish_cumulative_entry(struct add_entry_iter *iter,
> >>  	evsel->hists.stats.total_period += sample->period;
> >>  	hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
> >>  
> >> +	free(iter->priv);
> >
> > And here I'm seeing a double free when trying the patchset with other examples.
> > I added a printf to the "if (node == NULL)" case and I'm hitting it. So it seems
> > to me that, when reusing the entry, every user is freeing it and then the double
> > free.
> >
> > This is my first time looking at perf code, so I might be missing LOT of things,
> > sorry in advance :)
> 
> Don't say sorry!  You're very helpful and found a real bug!

:-)


> > I'm not really sure which is the fix for this. Also just in case I tried
> > assigning "iter->priv = NULL" after it's free'd and it """fixes""" it.
> 
> I think the right fix is assigning "iter->priv = NULL" as you said.  But
> I changed this patch a bit for v3 so I need to check it again.

Ohh, great. I'll wait for v3 to play more with this :-)





Thanks a lot,
Rodrigo


* Re: [PATCH 08/14] perf report: Cache cumulative callchains
  2013-11-01  7:07     ` Namhyung Kim
  2013-11-01 14:24       ` Rodrigo Campos
@ 2013-11-01 15:16       ` Rodrigo Campos
  1 sibling, 0 replies; 64+ messages in thread
From: Rodrigo Campos @ 2013-11-01 15:16 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Jiri Olsa, Arun Sharma

On Fri, Nov 01, 2013 at 04:07:22PM +0900, Namhyung Kim wrote:
> Hi Rodrigo,
> 
> On Thu, 31 Oct 2013 11:13:34 +0000, Rodrigo Campos wrote:
> > On Thu, Oct 31, 2013 at 03:56:10PM +0900, Namhyung Kim wrote:
> 
> I think the right fix is assigning "iter->priv = NULL" as you said.  But
> I changed this patch a bit for v3 so I need to check it again.

Btw, that would have a race condition if this is run on multiple threads (I
mean, if different threads can free the same "iter->priv" pointer), at least
if we just assign NULL after the free().  But maybe that's not an issue?
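For the single-threaded case, the free-then-clear pattern looks like the sketch below (iter__free_priv is a hypothetical wrapper name; as noted, it does not help if several threads can reach the same iter):

```c
#include <stdlib.h>
#include <stddef.h>

/* Minimal stand-in for the iterator struct from the patch. */
struct add_entry_iter {
	void *priv;
};

/*
 * free(NULL) is a no-op, so clearing the pointer after freeing makes a
 * second (buggy) finish call harmless in single-threaded code.  It is
 * not a fix for concurrent callers: two threads could both load the
 * pointer before either stores NULL.
 */
static void iter__free_priv(struct add_entry_iter *iter)
{
	free(iter->priv);
	iter->priv = NULL;
}
```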





Thanks a lot,
Rodrigo


* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-10-31  8:09 ` [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Ingo Molnar
  2013-11-01  6:48   ` Namhyung Kim
@ 2013-11-04 13:23   ` Frederic Weisbecker
  2013-11-04 13:34   ` Frederic Weisbecker
  2 siblings, 0 replies; 64+ messages in thread
From: Frederic Weisbecker @ 2013-11-04 13:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Paul Mackerras, Namhyung Kim, LKML, Stephane Eranian, Jiri Olsa,
	Rodrigo Campos, Arun Sharma

On Thu, Oct 31, 2013 at 09:09:32AM +0100, Ingo Molnar wrote:
> 
> 
> * Namhyung Kim <namhyung@kernel.org> wrote:
> 
> > When the -g cumulative option is given, it'll be shown like this:
> > 
> >   $ perf report -g cumulative --stdio
> > 
> >   # Overhead  Overhead (Acc)  Command      Shared Object                   Symbol
> >   # ........  ..............  .......  .................  .......................
> >   #
> >        0.00%          88.29%      abc  libc-2.17.so       [.] __libc_start_main  
> >        0.00%          88.29%      abc  abc                [.] main               
> >        0.00%          88.29%      abc  abc                [.] c                  
> >        0.00%          88.29%      abc  abc                [.] b                  
> >       88.29%          88.29%      abc  abc                [.] a                  
> >        0.00%          11.61%      abc  ld-2.17.so         [k] _dl_sysdep_start   
> >        0.00%           9.43%      abc  ld-2.17.so         [.] dl_main            
> >        9.43%           9.43%      abc  ld-2.17.so         [.] _dl_relocate_object
> >        2.27%           2.27%      abc  [kernel.kallsyms]  [k] page_fault         
> >        0.00%           2.18%      abc  ld-2.17.so         [k] _dl_start_user     
> >        0.00%           0.10%      abc  ld-2.17.so         [.] _start             
> > 
> > As you can see __libc_start_main -> main -> c -> b -> a callchain 
> > show up in the output.
> 
> This looks really useful!
> 
> A couple of details:
> 
> 1)
> 
> This is pretty close to SysProf output, right? So why not use the 
> well-known SysProf naming and call the first column 'self' and the 
> second column 'total'? I think those names are pretty intuitive and 
> it would help people who come from SysProf over to perf.

Makes sense. Or "Overhead" and "Total overhead"?


* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-01  6:48   ` Namhyung Kim
  2013-11-01  7:55     ` Ingo Molnar
@ 2013-11-04 13:27     ` Frederic Weisbecker
  1 sibling, 0 replies; 64+ messages in thread
From: Frederic Weisbecker @ 2013-11-04 13:27 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Paul Mackerras, Namhyung Kim, LKML, Stephane Eranian, Jiri Olsa,
	Rodrigo Campos, Arun Sharma

On Fri, Nov 01, 2013 at 03:48:37PM +0900, Namhyung Kim wrote:
> Hi Ingo,
> 
> On Thu, 31 Oct 2013 09:09:32 +0100, Ingo Molnar wrote:
> > * Namhyung Kim <namhyung@kernel.org> wrote:
> >
> >> When the -g cumulative option is given, it'll be shown like this:
> >> 
> >>   $ perf report -g cumulative --stdio
> >> 
> >>   # Overhead  Overhead (Acc)  Command      Shared Object                   Symbol
> >>   # ........  ..............  .......  .................  .......................
> >>   #
> >>        0.00%          88.29%      abc  libc-2.17.so       [.] __libc_start_main  
> >>        0.00%          88.29%      abc  abc                [.] main               
> >>        0.00%          88.29%      abc  abc                [.] c                  
> >>        0.00%          88.29%      abc  abc                [.] b                  
> >>       88.29%          88.29%      abc  abc                [.] a                  
> >>        0.00%          11.61%      abc  ld-2.17.so         [k] _dl_sysdep_start   
> >>        0.00%           9.43%      abc  ld-2.17.so         [.] dl_main            
> >>        9.43%           9.43%      abc  ld-2.17.so         [.] _dl_relocate_object
> >>        2.27%           2.27%      abc  [kernel.kallsyms]  [k] page_fault         
> >>        0.00%           2.18%      abc  ld-2.17.so         [k] _dl_start_user     
> >>        0.00%           0.10%      abc  ld-2.17.so         [.] _start             
> >> 
> >> As you can see __libc_start_main -> main -> c -> b -> a callchain 
> >> show up in the output.
> >
> > This looks really useful!
> 
> Thanks! :)
> 
> >
> > A couple of details:
> >
> > 1)
> >
> > This is pretty close to SysProf output, right? So why not use the 
> > well-known SysProf naming and call the first column 'self' and the 
> > second column 'total'? I think those names are pretty intuitive and 
> > it would help people who come from SysProf over to perf.
> 
> Okay, I can do it. (Although sysprof seems to call it 'cumulative'
> rather than 'total' - but I think the 'total' is better since it's
> simpler and shorter.)

OTOH 'cumulative' probably expresses better what it is about.  Or maybe
'branch cumulative'.

'Total' is confusing because we don't know what it is measured against.


* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-10-31  8:09 ` [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Ingo Molnar
  2013-11-01  6:48   ` Namhyung Kim
  2013-11-04 13:23   ` Frederic Weisbecker
@ 2013-11-04 13:34   ` Frederic Weisbecker
  2 siblings, 0 replies; 64+ messages in thread
From: Frederic Weisbecker @ 2013-11-04 13:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Paul Mackerras, Namhyung Kim, LKML, Stephane Eranian, Jiri Olsa,
	Rodrigo Campos, Arun Sharma

On Thu, Oct 31, 2013 at 09:09:32AM +0100, Ingo Molnar wrote:
> 
> 
> * Namhyung Kim <namhyung@kernel.org> wrote:
> 
> > When the -g cumulative option is given, it'll be shown like this:
> > 
> >   $ perf report -g cumulative --stdio
> > 
> >   # Overhead  Overhead (Acc)  Command      Shared Object                   Symbol
> >   # ........  ..............  .......  .................  .......................
> >   #
> >        0.00%          88.29%      abc  libc-2.17.so       [.] __libc_start_main  
> >        0.00%          88.29%      abc  abc                [.] main               
> >        0.00%          88.29%      abc  abc                [.] c                  
> >        0.00%          88.29%      abc  abc                [.] b                  
> >       88.29%          88.29%      abc  abc                [.] a                  
> >        0.00%          11.61%      abc  ld-2.17.so         [k] _dl_sysdep_start   
> >        0.00%           9.43%      abc  ld-2.17.so         [.] dl_main            
> >        9.43%           9.43%      abc  ld-2.17.so         [.] _dl_relocate_object
> >        2.27%           2.27%      abc  [kernel.kallsyms]  [k] page_fault         
> >        0.00%           2.18%      abc  ld-2.17.so         [k] _dl_start_user     
> >        0.00%           0.10%      abc  ld-2.17.so         [.] _start             
> > 
> > As you can see __libc_start_main -> main -> c -> b -> a callchain 
> > show up in the output.
> 
> This looks really useful!
> 
> A couple of details:
> 
> 1)
> 
> This is pretty close to SysProf output, right? So why not use the 
> well-known SysProf naming and call the first column 'self' and the 
> second column 'total'? I think those names are pretty intuitive and 
> it would help people who come from SysProf over to perf.
> 
> 2)
> 
> Is it possible to configure the default 'report -g' style, so that 
> people who'd like to use it all the time don't have to type '-g 
> cumulative' all the time?
> 
> 3)
> 
> I'd even argue that we enable this reporting feature by default, if 
> a data file includes call-chain data: the first column will still 
> show the well-known percentage that perf report produces today, the 
> second column will be a new feature in essence.
> 
> The only open question would be, by which column should we sort: 
> 'sysprof style' sorts by 'total', 'perf style' sorts by 'self'. 
> Agreed?

Defaulting that behaviour may make sense, but we can expect some visible
overhead from it.  Adding one hist entry per callchain entry probably has
some measurable impact.

How about we wait for the option to mature upstream a bit first, and then
decide about making it the default once we have measured and maybe
optimized the resulting overhead?

Thanks.


* Re: [PATCH 03/14] perf hists: Convert hist entry functions to use struct he_stat
  2013-10-31  6:56 ` [PATCH 03/14] perf hists: Convert hist entry functions to use struct he_stat Namhyung Kim
@ 2013-11-04 23:45   ` Arnaldo Carvalho de Melo
  2013-11-05  7:17     ` Namhyung Kim
  0 siblings, 1 reply; 64+ messages in thread
From: Arnaldo Carvalho de Melo @ 2013-11-04 23:45 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Namhyung Kim, LKML,
	Frederic Weisbecker, Stephane Eranian, Jiri Olsa, Rodrigo Campos,
	Arun Sharma

Em Thu, Oct 31, 2013 at 03:56:05PM +0900, Namhyung Kim escreveu:

> -static void hist_entry__add_cpumode_period(struct hist_entry *he,
> +static void hist_entry__add_cpumode_period(struct he_stat *he_stat,
>  					   unsigned int cpumode, u64 period)

So it is not anymore a 'struct hist_entry' method, but a 'struct
he_stat' one, thus these functions need to be renamed accordingly, in
the above case it should be:

-static void hist_entry__add_cpumode_period(struct hist_entry *he,
+static void he_stat__add_cpumode_period(struct he_stat *he_stat,
+				 	 unsigned int cpumode, u64 period)

>  {
>  	switch (cpumode) {
>  	case PERF_RECORD_MISC_KERNEL:
> -		he->stat.period_sys += period;
> +		he_stat->period_sys += period;
>  		break;
>  	case PERF_RECORD_MISC_USER:
> -		he->stat.period_us += period;
> +		he_stat->period_us += period;
>  		break;
>  	case PERF_RECORD_MISC_GUEST_KERNEL:
> -		he->stat.period_guest_sys += period;
> +		he_stat->period_guest_sys += period;
>  		break;
>  	case PERF_RECORD_MISC_GUEST_USER:
> -		he->stat.period_guest_us += period;
> +		he_stat->period_guest_us += period;
>  		break;
>  	default:
>  		break;
> @@ -223,10 +223,10 @@ static void he_stat__add_stat(struct he_stat *dest, struct he_stat *src)
>  	dest->weight		+= src->weight;
>  }
>  
> -static void hist_entry__decay(struct hist_entry *he)
> +static void hist_entry__decay(struct he_stat *he_stat)

Ditto

>  {
> -	he->stat.period = (he->stat.period * 7) / 8;
> -	he->stat.nr_events = (he->stat.nr_events * 7) / 8;
> +	he_stat->period = (he_stat->period * 7) / 8;


* Re: [PATCH 02/14] perf tools: Introduce struct add_entry_iter
  2013-11-01 12:07   ` Jiri Olsa
@ 2013-11-05  7:09     ` Namhyung Kim
  0 siblings, 0 replies; 64+ messages in thread
From: Namhyung Kim @ 2013-11-05  7:09 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Rodrigo Campos

Hi Jiri,

On Fri, 1 Nov 2013 13:07:35 +0100, Jiri Olsa wrote:
> On Thu, Oct 31, 2013 at 03:56:04PM +0900, Namhyung Kim wrote:
>> From: Namhyung Kim <namhyung.kim@lge.com>
>
> SNIP
>
>> +
>> +static int
>> +perf_evsel__add_entry(struct perf_evsel *evsel, struct addr_location *al,
>> +		      struct perf_sample *sample, struct machine *machine,
>> +		      struct add_entry_iter *iter)
>> +{
>> +	int err;
>> +
>> +	if ((sort__has_parent || symbol_conf.use_callchain) && sample->callchain) {
>> +		err = machine__resolve_callchain(machine, evsel, al->thread,
>> +						 sample, &iter->parent, al,
>> +						 iter->rep->max_stack);
>> +		if (err)
>> +			return err;
>> +	}
>> +
>> +	err = iter->prepare_entry(iter, machine, evsel, al, sample);
>> +	if (err)
>> +		return err;
>> +
>> +	err = iter->add_single_entry(iter, al);
>> +	if (err)
>> +		return err;
>> +
>> +	while (iter->next_entry(iter, al)) {
>> +		err = iter->add_next_entry(iter, al);
>> +		if (err)
>> +			break;
>> +	}
>> +
>> +	err = iter->finish_entry(iter, al);
>> +
>> +	return err;
>
> 	return iter->finish_entry(iter, al);  ?

Right.  Will fix in v3.

Thanks,
Namhyung


* Re: [PATCH 02/14] perf tools: Introduce struct add_entry_iter
  2013-11-01 12:09   ` Jiri Olsa
@ 2013-11-05  7:16     ` Namhyung Kim
  0 siblings, 0 replies; 64+ messages in thread
From: Namhyung Kim @ 2013-11-05  7:16 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Rodrigo Campos

On Fri, 1 Nov 2013 13:09:09 +0100, Jiri Olsa wrote:
> On Thu, Oct 31, 2013 at 03:56:04PM +0900, Namhyung Kim wrote:
>> From: Namhyung Kim <namhyung.kim@lge.com>
>
> SNIP
>
>> +}
>> +
>> +static int
>> +iter_add_next_nop_entry(struct add_entry_iter *iter __maybe_unused,
>> +			struct addr_location *al __maybe_unused)
>> +{
>> +	return 0;
>> +}
>> +
>> +static int
>> +iter_prepare_mem_entry(struct add_entry_iter *iter, struct machine *machine,
>> +		       struct perf_evsel *evsel, struct addr_location *al,
>> +		       struct perf_sample *sample)
>> +{
>> +	union perf_event *event = iter->priv;
>> +	struct mem_info *mi;
>> +	u8 cpumode;
>> +
>> +	BUG_ON(event == NULL);
>
> the priv does not get assigned.. 'perf mem rep' aborts
>
> [jolsa@krava perf]$ ./perf mem rep --stdio
> Aborted
> perf: builtin-report.c:120: iter_prepare_mem_entry: Assertion `!(event == ((void *)0))' failed.

Argh... forgot to set it for mem entries after code refactoring,
sorry. :-/  Will fix in v3.

Thanks,
Namhyung


* Re: [PATCH 03/14] perf hists: Convert hist entry functions to use struct he_stat
  2013-11-04 23:45   ` Arnaldo Carvalho de Melo
@ 2013-11-05  7:17     ` Namhyung Kim
  0 siblings, 0 replies; 64+ messages in thread
From: Namhyung Kim @ 2013-11-05  7:17 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Namhyung Kim, LKML,
	Frederic Weisbecker, Stephane Eranian, Jiri Olsa, Rodrigo Campos,
	Arun Sharma

Hi Arnaldo,

On Mon, 4 Nov 2013 20:45:51 -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Oct 31, 2013 at 03:56:05PM +0900, Namhyung Kim escreveu:
>
>> -static void hist_entry__add_cpumode_period(struct hist_entry *he,
>> +static void hist_entry__add_cpumode_period(struct he_stat *he_stat,
>>  					   unsigned int cpumode, u64 period)
>
> So it is not anymore a 'struct hist_entry' method, but a 'struct
> he_stat' one, thus these functions need to be renamed accordingly, in
> the above case it should be:
>
> -static void hist_entry__add_cpumode_period(struct hist_entry *he,
> +static void he_stat__add_cpumode_period(struct he_stat *he_stat,
> +				 	 unsigned int cpumode, u64 period)

I see your point.  Will change the both.

Thanks,
Namhyung

>
>>  {
>>  	switch (cpumode) {
>>  	case PERF_RECORD_MISC_KERNEL:
>> -		he->stat.period_sys += period;
>> +		he_stat->period_sys += period;
>>  		break;
>>  	case PERF_RECORD_MISC_USER:
>> -		he->stat.period_us += period;
>> +		he_stat->period_us += period;
>>  		break;
>>  	case PERF_RECORD_MISC_GUEST_KERNEL:
>> -		he->stat.period_guest_sys += period;
>> +		he_stat->period_guest_sys += period;
>>  		break;
>>  	case PERF_RECORD_MISC_GUEST_USER:
>> -		he->stat.period_guest_us += period;
>> +		he_stat->period_guest_us += period;
>>  		break;
>>  	default:
>>  		break;
>> @@ -223,10 +223,10 @@ static void he_stat__add_stat(struct he_stat *dest, struct he_stat *src)
>>  	dest->weight		+= src->weight;
>>  }
>>  
>> -static void hist_entry__decay(struct hist_entry *he)
>> +static void hist_entry__decay(struct he_stat *he_stat)
>
> Ditto
>
>>  {
>> -	he->stat.period = (he->stat.period * 7) / 8;
>> -	he->stat.nr_events = (he->stat.nr_events * 7) / 8;
>> +	he_stat->period = (he_stat->period * 7) / 8;


* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-01  9:27         ` Ingo Molnar
@ 2013-11-05  7:31           ` Namhyung Kim
  2013-11-05  7:46             ` Ingo Molnar
  0 siblings, 1 reply; 64+ messages in thread
From: Namhyung Kim @ 2013-11-05  7:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Paul Mackerras, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma

Hi Ingo,

On Fri, 1 Nov 2013 10:27:59 +0100, Ingo Molnar wrote:
> * Namhyung Kim <namhyung@kernel.org> wrote:
>
>> >> > 2)
>> >> >
>> >> > Is it possible to configure the default 'report -g' style, so that 
>> >> > people who'd like to use it all the time don't have to type '-g 
>> >> > cumulative' all the time?
>> >> 
>> >> Hmm.. maybe I can add support for the 'report.call-graph' config option.
>> >
>> > If we display your new 'total' field by default then it's not as 
>> > pressing to me :)
>> 
>> Do you mean -g cumulative without 'self' column?
>
> So, if by default we display both 'self' and 'total' and sort by 
> 'total', then I'm personally a happy camper: while it's a change of 
> the default perf report output, it's also a step forward.
>
> But some people might complain about it once this hits v3.13 (or 
> v3.14) and might want to hide individual columns and have different 
> sorting preferences.
>
> So out of defensive caution you might want to provide toggles for 
> such things, maybe even generalize it a bit to allow arbitrary 
> hiding/display of individual colums in perf report.
>
> That would probably also make it easier to do minimal tweaks to the 
> upstream reporting defaults.

Okay, so what would the interface look like?

I think it'd be better to have a separate option that takes a column and an
(optional) sort key as its argument.

  --cumulative both,total (default)
  --cumulative both,self
  --cumulative total
  --cumulative self (meaningless?)

Maybe we need a config option and a single letter option for the default
case like --call-graph and -g options do.

What do you think?

Thanks,
Namhyung


* Re: [PATCH 07/14] perf tools: Update cpumode for each cumulative entry
  2013-11-01 12:55   ` Jiri Olsa
@ 2013-11-05  7:41     ` Namhyung Kim
  0 siblings, 0 replies; 64+ messages in thread
From: Namhyung Kim @ 2013-11-05  7:41 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Rodrigo Campos, Arun Sharma

On Fri, 1 Nov 2013 13:55:51 +0100, Jiri Olsa wrote:
> On Thu, Oct 31, 2013 at 03:56:09PM +0900, Namhyung Kim wrote:
>> From: Namhyung Kim <namhyung.kim@lge.com>
>> 
>> The cpumode and level in struct addr_location were set once for a sample
>> but not updated as cumulative callchain entries were added.  This led to
>> non-matching symbol and cpumode in the output.
>> 
>> Update them based on whether the map is part of the kernel or not.

[SNIP]

>> @@ -468,6 +470,27 @@ iter_next_cumulative_entry(struct add_entry_iter *iter __maybe_unused,
>>  	if (al->sym == NULL)
>>  		return 0;
>>  
>> +	if (al->map->groups == &iter->machine->kmaps) {
>> +		if (machine__is_host(iter->machine)) {
>> +			al->cpumode = PERF_RECORD_MISC_KERNEL;
>> +			al->level = 'k';
>> +		} else {
>> +			al->cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
>> +			al->level = 'g';
>> +		}
>> +	} else {
>> +		if (machine__is_host(iter->machine)) {
>> +			al->cpumode = PERF_RECORD_MISC_USER;
>> +			al->level = '.';
>> +		} else if (perf_guest) {
>> +			al->cpumode = PERF_RECORD_MISC_GUEST_USER;
>> +			al->level = 'u';
>> +		} else {
>> +			al->cpumode = PERF_RECORD_MISC_HYPERVISOR;
>> +			al->level = 'H';
>> +		}
>> +	}
>> +
>
> Looks like this is what thread__find_addr_map does as well.
> Could above code go into a function used by both places?

In fact, it does the reverse - thread__find_addr_map() tries to find a
map group (and map) using the cpumode, whereas this code wants to know
the cpumode using the map (and map group).

Thanks,
Namhyung


* Re: [PATCH 14/14] perf report: Add -g cumulative option
  2013-11-01 13:17   ` Jiri Olsa
@ 2013-11-05  7:44     ` Namhyung Kim
  0 siblings, 0 replies; 64+ messages in thread
From: Namhyung Kim @ 2013-11-05  7:44 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Rodrigo Campos, Arun Sharma

On Fri, 1 Nov 2013 14:17:34 +0100, Jiri Olsa wrote:
> On Thu, Oct 31, 2013 at 03:56:16PM +0900, Namhyung Kim wrote:
>> From: Namhyung Kim <namhyung.kim@lge.com>
>> 
>> The -g cumulative option is for showing accumulated overhead (period)
>> value as well as self overhead.

[SNIP]

>> +static void
>> +sort_chain_cumulative(struct rb_root *rb_root __maybe_unused,
>> +		      struct callchain_root *chain_root __maybe_unused,
>> +		      u64 min_hit __maybe_unused,
>> +		      struct callchain_param *param __maybe_unused)
>> +{
>> +}
>
> maybe add some commentary explaining that it's intentionally empty
>
> or maybe dont set it and do following check
> in __hists__insert_output_entry:
>
>         if (symbol_conf.use_callchain && callchain_param.sort)
>                 callchain_param.sort(&he->sorted_chain, he->callchain,
>                                       min_callchain_hits, &callchain_param);

Yeah, I'm fine with either way.

Hmm.. but now I think that checking for the existence of the sort function
would be better than having an empty function.  Will change in v3.
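
Jiri's suggested NULL check amounts to something like the tiny sketch
below - the struct and function names are simplified stand-ins, not the
real perf types:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for callchain_param: only the sort callback matters here. */
struct callchain_param {
	void (*sort)(int *chain);
};

/* Dummy sort callback used for illustration: marks the chain as sorted. */
static void sort_by_value(int *chain)
{
	*chain = 1;
}

/* Only invoke the sort callback when one is actually set, instead of
 * installing an intentionally empty function. */
static void insert_output_entry(struct callchain_param *param, int *chain)
{
	if (param->sort)
		param->sort(chain);
}
```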

Thanks,
Namhyung


* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-05  7:31           ` Namhyung Kim
@ 2013-11-05  7:46             ` Ingo Molnar
  2013-11-05  9:05               ` Namhyung Kim
  0 siblings, 1 reply; 64+ messages in thread
From: Ingo Molnar @ 2013-11-05  7:46 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Pekka Enberg, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Paul Mackerras, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma


* Namhyung Kim <namhyung@kernel.org> wrote:

> Hi Ingo,
> 
> On Fri, 1 Nov 2013 10:27:59 +0100, Ingo Molnar wrote:
> > * Namhyung Kim <namhyung@kernel.org> wrote:
> >
> >> >> > 2)
> >> >> >
> >> >> > Is it possible to configure the default 'report -g' style, so that 
> >> >> > people who'd like to use it all the time don't have to type '-g 
> >> >> > cumulative' all the time?
> >> >> 
> >> >> Hmm.. maybe I can add support for the 'report.call-graph' config option.
> >> >
> >> > If we display your new 'total' field by default then it's not as 
> >> > pressing to me :)
> >> 
> >> Do you mean -g cumulative without 'self' column?
> >
> > So, if by default we display both 'self' and 'total' and sort by 
> > 'total', then I'm personally a happy camper: while it's a change of 
> > the default perf report output, it's also a step forward.
> >
> > But some people might complain about it once this hits v3.13 (or 
> > v3.14) and might want to hide individual columns and have different 
> > sorting preferences.
> >
> > So out of defensive caution you might want to provide toggles for 
> > such things, maybe even generalize it a bit to allow arbitrary 
> > hiding/display of individual colums in perf report.
> >
> > That would probably also make it easier to do minimal tweaks to the 
> > upstream reporting defaults.
> 
> Okay, so what would the interface look like?
> 
> I think it'd better to separate the option and pass column and
> (optional) sort key argument.
> 
>   --cumulative both,total (default)
>   --cumulative both,self
>   --cumulative total
>   --cumulative self (meaningless?)
> 
> Maybe we need a config option and a single letter option for the default
> case like --call-graph and -g options do.
> 
> What do you think?

So why restrict it to 'cumulative'? Why not have a generic --fields/-F, 
with a good default. The ordering of the fields determines sorting 
behavior.

The default would be something like:

  -F total,self,process,dso,name

Whether 'cumulative' data is calculated is not a function of any direct 
option, but simply a function of whether the 'total' field is in the -F 
list of columns displayed or not.

With that scheme we could also do things like this to get old-style 
sorting:

 -F self,process,dso,name

Or a really frugal 'readprofile'-style output:

 -F self,name

if one is only interested in percentages and raw function names.

Wrt. sorting order, by default the first column in the list of columns 
would be the primary (and only) sort key.

(The -F field setup list could also be specified in the .perfconfig.)

With this method we could do away with all this geometrical explosion of 
somewhat inconsistent formatting and sorting options...

If there's demand then we could decouple sort keys from the display order, 
by slightly augmenting the field format:

 -F total,self:2,process:0,dso:1,name

This would sort by 'process' field as the primary key, 'dso' the secondary 
key and 'self' as the tertiary key.

And we could also keep the -s/--sort option:

 -s process,dso,self

So the above -F line would be equivalent to:

 -F total,self,process,dso,name -s process,dso,self
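
The proposed "field[:sort_priority]" syntax could be parsed along these
lines - an illustrative, self-contained sketch only, not perf code
(struct and function names are made up; a priority of -1 means the field
is displayed but not used as a sort key):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed representation of one -F entry. */
struct field_spec {
	char name[16];
	int prio;	/* -1: display only; >= 0: sort key priority */
};

/*
 * Parse a spec like "total,self:2,process:0,dso:1,name" into field_spec
 * entries.  Returns the number of fields parsed.
 */
static int parse_fields(const char *spec, struct field_spec *out, int max)
{
	char buf[256];
	int n = 0;

	snprintf(buf, sizeof(buf), "%s", spec);
	for (char *tok = strtok(buf, ","); tok && n < max;
	     tok = strtok(NULL, ",")) {
		char *colon = strchr(tok, ':');

		out[n].prio = -1;
		if (colon) {
			*colon = '\0';
			out[n].prio = atoi(colon + 1);
		}
		snprintf(out[n].name, sizeof(out[n].name), "%s", tok);
		n++;
	}
	return n;
}
```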

What do you think?

Thanks,

	Ingo


* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-05  7:46             ` Ingo Molnar
@ 2013-11-05  9:05               ` Namhyung Kim
  2013-11-05 11:58                 ` Ingo Molnar
  0 siblings, 1 reply; 64+ messages in thread
From: Namhyung Kim @ 2013-11-05  9:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Paul Mackerras, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma

On Tue, 5 Nov 2013 08:46:50 +0100, Ingo Molnar wrote:
> * Namhyung Kim <namhyung@kernel.org> wrote:
>> I think it'd better to separate the option and pass column and
>> (optional) sort key argument.
>> 
>>   --cumulative both,total (default)
>>   --cumulative both,self
>>   --cumulative total
>>   --cumulative self (meaningless?)
>> 
>> Maybe we need a config option and a single letter option for the default
>> case like --call-graph and -g options do.
>> 
>> What do you think?
>
> So why restrict it to 'cumulative'? Why not have a generic --fields/-F, 
> with a good default. The ordering of the fields determines sorting 
> behavior.

Ah, I didn't know you meant that too. :)

But the 'cumulative' (btw, I find this word a bit hard to type..) is
different in that it *generates* entries that didn't get sampled
originally.  And as it requires callchains, the total field will not work
if callchains are missing.

So I tried to make it a standalone option.

>
> The default would be something like:
>
>   -F total,self,process,dso,name
>
> Whether 'cumulative' data is calculated is not a function of any direct 
> option, but simply a function of whether the 'total' field is in the -F 
> list of columns displayed or not.

So you want to turn the cumulative behavior always on, right?

But as Frederic noted, it might affect the performance of perf report, so
it might be better to delay making this behavior the default until users
feel comfortable with the option?

>
> With that scheme we could also do things like this to get old-style 
> sorting:
>
>  -F self,process,dso,name
>
> Or a really frugal 'readprofile'-style output:
>
>  -F self,name
>
> if one is only interested in percentages and raw function names.

s/name/sym(bol)/ :)

Yes, this is what we do with -s option now.

>
> Wrt. sorting order, by default the first column in the list of columns 
> would be the primary (and only) sort key.

Ah, I never thought of it this way.  It makes sense as sort orders
really affect the output sorting.

>
> (The -F field setup list could also be specified in the .perfconfig.)
>
> With this method we could do away with all this geometrical explosion of 
> somewhat inconsistent formatting and sorting options...

For now, there are two kinds of columns:

- one for showing an entry's overhead percentage: self, sys, user,
  guest_sys and guest_user.  So the 'total' column should go into this
  category.  I named these hpp (hist_entry period percentage) functions
  and yes, I know it's an awfully bad name. :)  Please see perf_hpp__format.

  They're controlled by a couple of options:  --show-total-period,
  --show-nr-samples and --showcpuutilization (I hate this!).  And event
  groups can also affect the output.

- one for grouping entries: cpu, pid, comm, dso, symbol, srcline and
  parent.  We call these "sort keys" but confusingly they don't affect
  output sorting for now.


So I think cleaning this up with -F option is good and I've been wanting
this discussion for a long time. :)

>
> If there's demand then we could decouple sort keys from the display order, 
> by slightly augmenting the field format:
>
>  -F total,self:2,process:0,dso:1,name
>
> This would sort by 'process' field as the primary key, 'dso' the secondary 
> key and 'self' as the tertiary key.
>
> And we could also keep the -s/--sort option:
>
>  -s process,dso,self
>
> So the above -F line would be equivalent to:
>
>  -F total,self,process,dso,name -s process,dso,self
>
> What do you think?

I like the second one.  It can preserve the old way while supporting the
new way easily.

But for compatibility we need to use the 'self' sort key internally iff
neither the -F option nor the config option was given by the user.  And it
might warn (or notify) users to add the 'self' column to the sort keys for
future use.

Thanks,
Namhyung


* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-05  9:05               ` Namhyung Kim
@ 2013-11-05 11:58                 ` Ingo Molnar
  2013-11-06  7:56                   ` Namhyung Kim
  0 siblings, 1 reply; 64+ messages in thread
From: Ingo Molnar @ 2013-11-05 11:58 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Pekka Enberg, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Paul Mackerras, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma


* Namhyung Kim <namhyung@kernel.org> wrote:

> On Tue, 5 Nov 2013 08:46:50 +0100, Ingo Molnar wrote:
> > * Namhyung Kim <namhyung@kernel.org> wrote:
> >> I think it'd better to separate the option and pass column and
> >> (optional) sort key argument.
> >> 
> >>   --cumulative both,total (default)
> >>   --cumulative both,self
> >>   --cumulative total
> >>   --cumulative self (meaningless?)
> >> 
> >> Maybe we need a config option and a single letter option for the default
> >> case like --call-graph and -g options do.
> >> 
> >> What do you think?
> >
> > So why restrict it to 'cumulative'? Why not have a generic --fields/-F, 
> > with a good default. The ordering of the fields determines sorting 
> > behavior.
> 
> Ah, I didn't know you meant that too. :)
> 
> But the 'cumulative' (btw, I feel a bit hard to type this word..) is 
> different in that it *generates* entries didn't get sampled originally. 
> And as it requires callchains, total field will not work if callchains 
> are missing.

Well, 'total' should disappear if it's not available.

We already have some 'column elimination/optimization' logic - like the 
'dso' will disappear already if it's a single dso everywhere, IIRC?

> So I tried to make it a standalone option.
> 
> >
> > The default would be something like:
> >
> >   -F total,self,process,dso,name
> >
> > Whether 'cumulative' data is calculated is not a function of any direct 
> > option, but simply a function of whether the 'total' field is in the -F 
> > list of columns displayed or not.
> 
> So you want to turn the cumulative behavior always on, right?

Yes.

> But as Frederic noted, it might affect the performance of perf report, 
> so it might be better to delay this behavior to make default after users 
> feel comfortable with an option?

I think with call-chain speedups it should be fast enough, right?

We can argue about the default separately - if it's all done correctly 
then it should be really easy to change the default layout of 'perf 
report'.

> > With that scheme we could also do things like this to get old-style 
> > sorting:
> >
> >  -F self,process,dso,name
> >
> > Or a really frugal 'readprofile'-style output:
> >
> >  -F self,name
> >
> > if one is only interested in percentages and raw function names.
> 
> s/name/sym(bol)/ :)

Yeah.

> Yes, this is what we do with -s option now.
> 
> > Wrt. sorting order, by default the first column in the list of columns 
> > would be the primary (and only) sort key.
> 
> Ah, I never thought it like this way.  It makes sense as sort orders 
> really affect the output sorting.
> 
> > (The -F field setup list could also be specified in the .perfconfig.)
> >
> > With this method we could do away with all this geometrical explosion 
> > of somewhat inconsistent formatting and sorting options...
> 
> For now, there're two kind of columns:
> 
> - one for showing entry's overhead percentage: self, sys, user,
>   guest_sys and guest_user.  So the 'total' should go into this
>   category.  I named it hpp (hist_entry period percentage) functions and
>   yes, I know it's an awfully bad name. :)  Please see perf_hpp__format.
> 
>   There're controlled by a couple of options:  --show-total-period,
>   --show-nr-samples and --showcpuutilization (I hate this!).  And event
>   group also can affect its output.
> 
> - one for grouping entries: cpu, pid, comm, dso, symbol, srcline and
>   parent.  We call it "sort keys" but confusingly it doesn't affect 
>   output sorting for now.

Well, it's still a sort key in a sense, a string lexicographical ordering 
in essence, right?

> So I think cleaning this up with -F option is good and I've been wanting 
> this discussion for a long time. :)

Okay :-)

> > If there's demand then we could decouple sort keys from the display 
> > order, by slightly augmenting the field format:
> >
> >  -F total,self:2,process:0,dso:1,name
> >
> > This would sort by 'process' field as the primary key, 'dso' the secondary 
> > key and 'self' as the tertiary key.
> >
> > And we could also keep the -s/--sort option:
> >
> >  -s process,dso,self
> >
> > So the above -F line would be equivalent to:
> >
> >  -F total,self,process,dso,name -s process,dso,self
> >
> > What do you think?
> 
> I like the second one.  It can sustain the old way but can support the 
> new way easily.
>
> But for compatibility we need to use 'self' sort key internally iff 
> neither the -F option nor the config option was given by user.  And it 
> might warn (or notice) users to add 'self' column in the sort key for 
> future use.

Mind explaining what the problem here is? I don't think I get it.

Thanks,

	Ingo


* [tip:perf/core] perf hists: Consolidate __hists__add_*entry()
  2013-10-31  6:56 ` [PATCH 01/14] perf tools: Consolidate __hists__add_*entry() Namhyung Kim
  2013-11-01 11:56   ` Jiri Olsa
@ 2013-11-06  5:43   ` tip-bot for Namhyung Kim
  1 sibling, 0 replies; 64+ messages in thread
From: tip-bot for Namhyung Kim @ 2013-11-06  5:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, linux-kernel, eranian, paulus, hpa, mingo, a.p.zijlstra,
	namhyung.kim, rodrigo, namhyung, jolsa, fweisbec, tglx

Commit-ID:  41a4e6e2a0237e8ac895f43158ef7c91ab7af157
Gitweb:     http://git.kernel.org/tip/41a4e6e2a0237e8ac895f43158ef7c91ab7af157
Author:     Namhyung Kim <namhyung.kim@lge.com>
AuthorDate: Thu, 31 Oct 2013 15:56:03 +0900
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 4 Nov 2013 20:59:09 -0300

perf hists: Consolidate __hists__add_*entry()

The __hists__add_{branch,mem}_entry() does almost the same thing that
__hists__add_entry() does.  Consolidate them into one.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rodrigo Campos <rodrigo@sdfg.com.ar>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1383202576-28141-2-git-send-email-namhyung@kernel.org
[ Fixup clash with new COMM infrastructure ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-annotate.c |  2 +-
 tools/perf/builtin-diff.c     |  3 +-
 tools/perf/builtin-report.c   | 16 ++++++---
 tools/perf/builtin-top.c      |  5 +--
 tools/perf/tests/hists_link.c |  6 ++--
 tools/perf/util/hist.c        | 75 ++++---------------------------------------
 tools/perf/util/hist.h        | 18 ++---------
 7 files changed, 30 insertions(+), 95 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 6c5ae57..4087ab1 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -65,7 +65,7 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
 		return 0;
 	}
 
-	he = __hists__add_entry(&evsel->hists, al, NULL, 1, 1, 0);
+	he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL, 1, 1, 0);
 	if (he == NULL)
 		return -ENOMEM;
 
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index b605009..3b67ea2 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -307,7 +307,8 @@ static int hists__add_entry(struct hists *hists,
 			    struct addr_location *al, u64 period,
 			    u64 weight, u64 transaction)
 {
-	if (__hists__add_entry(hists, al, NULL, period, weight, transaction) != NULL)
+	if (__hists__add_entry(hists, al, NULL, NULL, NULL, period, weight,
+			       transaction) != NULL)
 		return 0;
 	return -ENOMEM;
 }
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 25f83d5..8cf8e66 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -115,7 +115,8 @@ static int perf_report__add_mem_hist_entry(struct perf_tool *tool,
 	 * and this is indirectly achieved by passing period=weight here
 	 * and the he_stat__add_period() function.
 	 */
-	he = __hists__add_mem_entry(&evsel->hists, al, parent, mi, cost, cost);
+	he = __hists__add_entry(&evsel->hists, al, parent, NULL, mi,
+				cost, cost, 0);
 	if (!he)
 		return -ENOMEM;
 
@@ -200,12 +201,16 @@ static int perf_report__add_branch_hist_entry(struct perf_tool *tool,
 
 		err = -ENOMEM;
 
+		/* overwrite the 'al' to branch-to info */
+		al->map = bi[i].to.map;
+		al->sym = bi[i].to.sym;
+		al->addr = bi[i].to.addr;
 		/*
 		 * The report shows the percentage of total branches captured
 		 * and not events sampled. Thus we use a pseudo period of 1.
 		 */
-		he = __hists__add_branch_entry(&evsel->hists, al, parent,
-				&bi[i], 1, 1);
+		he = __hists__add_entry(&evsel->hists, al, parent, &bi[i], NULL,
+					1, 1, 0);
 		if (he) {
 			struct annotation *notes;
 			bx = he->branch_info;
@@ -266,8 +271,9 @@ static int perf_evsel__add_hist_entry(struct perf_tool *tool,
 			return err;
 	}
 
-	he = __hists__add_entry(&evsel->hists, al, parent, sample->period,
-				sample->weight, sample->transaction);
+	he = __hists__add_entry(&evsel->hists, al, parent, NULL, NULL,
+				sample->period, sample->weight,
+				sample->transaction);
 	if (he == NULL)
 		return -ENOMEM;
 
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index ca5ca37..21897f0 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -246,8 +246,9 @@ static struct hist_entry *perf_evsel__add_hist_entry(struct perf_evsel *evsel,
 	struct hist_entry *he;
 
 	pthread_mutex_lock(&evsel->hists.lock);
-	he = __hists__add_entry(&evsel->hists, al, NULL, sample->period,
-				sample->weight, sample->transaction);
+	he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL,
+				sample->period, sample->weight,
+				sample->transaction);
 	pthread_mutex_unlock(&evsel->hists.lock);
 	if (he == NULL)
 		return NULL;
diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
index 6c337e6..173bf42 100644
--- a/tools/perf/tests/hists_link.c
+++ b/tools/perf/tests/hists_link.c
@@ -223,7 +223,7 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
 				goto out;
 
 			he = __hists__add_entry(&evsel->hists, &al, NULL,
-						1, 1, 0);
+						NULL, NULL, 1, 1, 0);
 			if (he == NULL)
 				goto out;
 
@@ -245,8 +245,8 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
 							  &sample) < 0)
 				goto out;
 
-			he = __hists__add_entry(&evsel->hists, &al, NULL, 1, 1,
-						0);
+			he = __hists__add_entry(&evsel->hists, &al, NULL,
+						NULL, NULL, 1, 1, 0);
 			if (he == NULL)
 				goto out;
 
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 30793f9..822903e 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -407,73 +407,12 @@ out:
 	return he;
 }
 
-struct hist_entry *__hists__add_mem_entry(struct hists *hists,
-					  struct addr_location *al,
-					  struct symbol *sym_parent,
-					  struct mem_info *mi,
-					  u64 period,
-					  u64 weight)
-{
-	struct hist_entry entry = {
-		.thread	= al->thread,
-		.comm = thread__comm(al->thread),
-		.ms = {
-			.map	= al->map,
-			.sym	= al->sym,
-		},
-		.stat = {
-			.period	= period,
-			.weight = weight,
-			.nr_events = 1,
-		},
-		.cpu	= al->cpu,
-		.ip	= al->addr,
-		.level	= al->level,
-		.parent = sym_parent,
-		.filtered = symbol__parent_filter(sym_parent),
-		.hists = hists,
-		.mem_info = mi,
-		.branch_info = NULL,
-	};
-	return add_hist_entry(hists, &entry, al, period, weight);
-}
-
-struct hist_entry *__hists__add_branch_entry(struct hists *hists,
-					     struct addr_location *al,
-					     struct symbol *sym_parent,
-					     struct branch_info *bi,
-					     u64 period,
-					     u64 weight)
-{
-	struct hist_entry entry = {
-		.thread	= al->thread,
-		.comm = thread__comm(al->thread),
-		.ms = {
-			.map	= bi->to.map,
-			.sym	= bi->to.sym,
-		},
-		.cpu	= al->cpu,
-		.ip	= bi->to.addr,
-		.level	= al->level,
-		.stat = {
-			.period	= period,
-			.nr_events = 1,
-			.weight = weight,
-		},
-		.parent = sym_parent,
-		.filtered = symbol__parent_filter(sym_parent),
-		.branch_info = bi,
-		.hists	= hists,
-		.mem_info = NULL,
-	};
-
-	return add_hist_entry(hists, &entry, al, period, weight);
-}
-
 struct hist_entry *__hists__add_entry(struct hists *hists,
 				      struct addr_location *al,
-				      struct symbol *sym_parent, u64 period,
-				      u64 weight, u64 transaction)
+				      struct symbol *sym_parent,
+				      struct branch_info *bi,
+				      struct mem_info *mi,
+				      u64 period, u64 weight, u64 transaction)
 {
 	struct hist_entry entry = {
 		.thread	= al->thread,
@@ -486,15 +425,15 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
 		.ip	= al->addr,
 		.level	= al->level,
 		.stat = {
-			.period	= period,
 			.nr_events = 1,
+			.period	= period,
 			.weight = weight,
 		},
 		.parent = sym_parent,
 		.filtered = symbol__parent_filter(sym_parent),
 		.hists	= hists,
-		.branch_info = NULL,
-		.mem_info = NULL,
+		.branch_info = bi,
+		.mem_info = mi,
 		.transaction = transaction,
 	};
 
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 9d2d022..307f1c7 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -86,7 +86,9 @@ struct hists {
 
 struct hist_entry *__hists__add_entry(struct hists *self,
 				      struct addr_location *al,
-				      struct symbol *parent, u64 period,
+				      struct symbol *parent,
+				      struct branch_info *bi,
+				      struct mem_info *mi, u64 period,
 				      u64 weight, u64 transaction);
 int64_t hist_entry__cmp(struct hist_entry *left, struct hist_entry *right);
 int64_t hist_entry__collapse(struct hist_entry *left, struct hist_entry *right);
@@ -95,20 +97,6 @@ int hist_entry__sort_snprintf(struct hist_entry *self, char *bf, size_t size,
 			      struct hists *hists);
 void hist_entry__free(struct hist_entry *);
 
-struct hist_entry *__hists__add_branch_entry(struct hists *self,
-					     struct addr_location *al,
-					     struct symbol *sym_parent,
-					     struct branch_info *bi,
-					     u64 period,
-					     u64 weight);
-
-struct hist_entry *__hists__add_mem_entry(struct hists *self,
-					  struct addr_location *al,
-					  struct symbol *sym_parent,
-					  struct mem_info *mi,
-					  u64 period,
-					  u64 weight);
-
 void hists__output_resort(struct hists *self);
 void hists__collapse_resort(struct hists *self, struct ui_progress *prog);
 


* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-05 11:58                 ` Ingo Molnar
@ 2013-11-06  7:56                   ` Namhyung Kim
  2013-11-06  8:30                     ` Ingo Molnar
  0 siblings, 1 reply; 64+ messages in thread
From: Namhyung Kim @ 2013-11-06  7:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Paul Mackerras, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma

Hi Ingo,

On Tue, 5 Nov 2013 12:58:02 +0100, Ingo Molnar wrote:
> * Namhyung Kim <namhyung@kernel.org> wrote:
>> But the 'cumulative' (btw, I feel a bit hard to type this word..) is 
>> different in that it *generates* entries didn't get sampled originally. 
>> And as it requires callchains, total field will not work if callchains 
>> are missing.
>
> Well, 'total' should disappear if it's not available.

But what if it's the only sort key the user gave?

>
> We already have some 'column elimination/optimization' logic - like the 
> 'dso' will disappear already if it's a single dso everywhere, IIRC?

Only when the user explicitly gives a single name as the column filter
with the -c, -d and/or -S options.

But it seems to have the same issue that I mentioned above:

  $ perf report -s comm -c perf --stdio
  (...)
  # Overhead
  # ........
  #
     100.00%


And the TUI even shows noise in the output.

>
>> But as Frederic noted, it might affect the performance of perf report, 
>> so it might be better to delay this behavior to make default after users 
>> feel comfortable with an option?
>
> I think with call-chain speedups it should be fast enough, right?

Yeah, it should speed things up significantly.

>
> We can argue about the default separately - if it's all done correctly 
> then it should be really easy to change the default layout of 'perf 
> report'.
>

I just think that the perf tools are moving so fast. ;-)


>> For now, there're two kind of columns:
>> 
>> - one for showing entry's overhead percentage: self, sys, user,
>>   guest_sys and guest_user.  So the 'total' should go into this
>>   category.  I named it hpp (hist_entry period percentage) functions and
>>   yes, I know it's an awfully bad name. :)  Please see perf_hpp__format.
>> 
>>   There're controlled by a couple of options:  --show-total-period,
>>   --show-nr-samples and --showcpuutilization (I hate this!).  And event
>>   group also can affect its output.
>> 
>> - one for grouping entries: cpu, pid, comm, dso, symbol, srcline and
>>   parent.  We call it "sort keys" but confusingly it doesn't affect 
>>   output sorting for now.
>
> Well, it's still a sort key in a sense, a string lexicographical ordering 
> in essence, right?

Right.  But it only affects grouping entries when they are added and
collapsed, not the output ordering.

>
>> > If there's demand then we could decouple sort keys from the display 
>> > order, by slightly augmenting the field format:
>> >
>> >  -F total,self:2,process:0,dso:1,name
>> >
>> > This would sort by 'process' field as the primary key, 'dso' the secondary 
>> > key and 'self' as the tertiary key.
>> >
>> > And we could also keep the -s/--sort option:
>> >
>> >  -s process,dso,self
>> >
>> > So the above -F line would be equivalent to:
>> >
>> >  -F total,self,process,dso,name -s process,dso,self
>> >
>> > What do you think?
>> 
>> I like the second one.  It can sustain the old way but can support the 
>> new way easily.
>>
>> But for compatibility we need to use 'self' sort key internally iff 
>> neither the -F option nor the config option was given by user.  And it 
>> might warn (or notice) users to add 'self' column in the sort key for 
>> future use.
>
> Mind explaining what the problem here is? I don't think I get it.

Well, normal users will still use it as they used to - like 
'perf report -s comm,dso' without the -F option and the config.

In that case, what would the output look like?  According to the above
proposal it'd look like below.

  # Command  Shared object
  # .......  .............
        aaa  aaa
        aaa  libc.so
        bbb  bbb
        bbb  libc.so


But the user might want see this:

  # Overhead (self)  Command  Shared object
  # ...............  .......  .............
             30.00%      bbb  bbb
             25.00%      aaa  aaa
             25.00%      aaa  libc.so
             20.00%      bbb  libc.so


If she really wants to see it sorted by comm and dso, the command line
should be 'perf report -F self,comm,dso -s comm,dso'
(or just 'perf report -F self -s comm,dso' could do the same).

  # Overhead (self)  Command  Shared object
  # ...............  .......  .............
             25.00%      aaa  aaa
             25.00%      aaa  libc.so
             30.00%      bbb  bbb
             20.00%      bbb  libc.so


Thanks,
Namhyung


* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-06  7:56                   ` Namhyung Kim
@ 2013-11-06  8:30                     ` Ingo Molnar
  2013-11-06  9:17                       ` Namhyung Kim
  2013-11-06 12:10                       ` Frederic Weisbecker
  0 siblings, 2 replies; 64+ messages in thread
From: Ingo Molnar @ 2013-11-06  8:30 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Pekka Enberg, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Paul Mackerras, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma


* Namhyung Kim <namhyung@kernel.org> wrote:

> Hi Ingo,
> 
> On Tue, 5 Nov 2013 12:58:02 +0100, Ingo Molnar wrote:
> > * Namhyung Kim <namhyung@kernel.org> wrote:
> >> But the 'cumulative' (btw, I feel a bit hard to type this word..) is 
> >> different in that it *generates* entries didn't get sampled originally. 
> >> And as it requires callchains, total field will not work if callchains 
> >> are missing.
> >
> > Well, 'total' should disappear if it's not available.
> 
> But what if it's the only sort key user gave?

Do you mean something like:

  -F self,name -s total

i.e. if a sort key is not displayed?

I think sort keys should be automatically added to the displayed fields 
list.

This rule is obviously met with the -F total:2,self:1,name:0 kind of 
sorting syntax (you can only sort by fields that get displayed) - if mixed 
with -s then it should be implicit I think.

> >> But for compatibility we need to use 'self' sort key internally iff 
> >> neither the -F option nor the config option was given by user.  And 
> >> it might warn (or notice) users to add 'self' column in the sort key 
> >> for future use.
> >
> > Mind explaining what the problem here is? I don't think I get it.
> 
> Well, normal users still use it as they used to - like 
> 'perf report -s comm,dso' without -F option and the config.
> 
> In that case, what would the output look like?  According to the above
> proposal it'd look like below.
> 
>   # Command  Shared object
>   # .......  .............
>         aaa  aaa
>         aaa  libc.so
>         bbb  bbb
>         bbb  libc.so
> 
> 
> But the user might want see this:
> 
>   # Overhead (self)  Command  Shared object
>   # ...............  .......  .............
>              30.00%      bbb  bbb
>              25.00%      aaa  aaa
>              25.00%      aaa  libc.so
>              20.00%      bbb  libc.so
> 
> 
> If she really wants to see it sorted by comm and dso, the command line
> should be 'perf report -F self,comm,dso -s comm,dso'
> (or just 'perf report -F self -s comm,dso' could do the same).
> 
>   # Overhead (self)  Command  Shared object
>   # ...............  .......  .............
>              25.00%      aaa  aaa
>              25.00%      aaa  libc.so
>              30.00%      bbb  bbb
>              20.00%      bbb  libc.so

This problem should be solved if all -s fields are displayed - i.e. they 
are added to the -F list, right?

Basically there's just a single concept: the -F list. The -s option simply 
modifies and extends the -F list but internally perf report would not know 
anything about '-s', it only knows about fields to display and it would 
know which of those fields are to be sorted and in what order.

Does that make sense to you? Does it cover everything needed?

Thanks,

	Ingo

* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-06  8:30                     ` Ingo Molnar
@ 2013-11-06  9:17                       ` Namhyung Kim
  2013-11-06 11:47                         ` Ingo Molnar
  2013-11-06 12:10                       ` Frederic Weisbecker
  1 sibling, 1 reply; 64+ messages in thread
From: Namhyung Kim @ 2013-11-06  9:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Paul Mackerras, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma

On Wed, 6 Nov 2013 09:30:46 +0100, Ingo Molnar wrote:
> * Namhyung Kim <namhyung@kernel.org> wrote:
>
>> Hi Ingo,
>> 
>> On Tue, 5 Nov 2013 12:58:02 +0100, Ingo Molnar wrote:
>> > * Namhyung Kim <namhyung@kernel.org> wrote:
>> >> But the 'cumulative' (btw, I feel a bit hard to type this word..) is 
>> >> different in that it *generates* entries didn't get sampled originally. 
>> >> And as it requires callchains, total field will not work if callchains 
>> >> are missing.
>> >
>> > Well, 'total' should disappear if it's not available.
>> 
>> But what if it's the only sort key user gave?
>
> Do you mean something like:
>
>   -F self,name -s total
>
> i.e. if a sort key not displayed?

What I worry about is when no -F option is given at all.

>
> I think sort keys should be automatically added to the displayed fields 
> list.

Agreed.

>
> This rule is obviously met with the -F total:2,self:1,name:0 kind of 
> sorting syntax (you can only sort by fields that get displayed) - if mixed 
> with -s then it should be implicit I think.
>
>> >> But for compatibility we need to use 'self' sort key internally iff 
>> >> neither the -F option nor the config option was given by user.  And 
>> >> it might warn (or notice) users to add 'self' column in the sort key 
>> >> for future use.
>> >
>> > Mind explaining what the problem here is? I don't think I get it.
>> 
>> Well, normal users still use it as they used to - like 
>> 'perf report -s comm,dso' without -F option and the config.
>> 
>> In that case, what would the output look like?  According to the above
>> proposal it'd look like below.
>> 
>>   # Command  Shared object
>>   # .......  .............
>>         aaa  aaa
>>         aaa  libc.so
>>         bbb  bbb
>>         bbb  libc.so
>> 
>> 
>> But the user might want see this:
>> 
>>   # Overhead (self)  Command  Shared object
>>   # ...............  .......  .............
>>              30.00%      bbb  bbb
>>              25.00%      aaa  aaa
>>              25.00%      aaa  libc.so
>>              20.00%      bbb  libc.so
>> 
>> 
>> If she really wants to see it sorted by comm and dso, the command line
>> should be 'perf report -F self,comm,dso -s comm,dso'
>> (or just 'perf report -F self -s comm,dso' could do the same).
>> 
>>   # Overhead (self)  Command  Shared object
>>   # ...............  .......  .............
>>              25.00%      aaa  aaa
>>              25.00%      aaa  libc.so
>>              30.00%      bbb  bbb
>>              20.00%      bbb  libc.so
>
> This problem should be solved if all -s fields are displayed - i.e. they 
> are added to the -F list, right?

But old users might not be aware of the new -F option, and use the -s option
only.  If so, she will get output like the first example, right?

>
> Basically there's just a single concept: the -F list. The -s option simply 
> modifies and extends the -F list but internally perf report would not know 
> anything about '-s', it only knows about fields to display and it would 
> know which of those fields are to be sorted and in what order.
>
> Does that make sense to you? Does it cover everything needed?

I like the concept.  I'm just looking for a way to add it without
upsetting old users. :)

Thanks,
Namhyung

* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-06  9:17                       ` Namhyung Kim
@ 2013-11-06 11:47                         ` Ingo Molnar
  2013-11-06 12:14                           ` Frederic Weisbecker
                                             ` (2 more replies)
  0 siblings, 3 replies; 64+ messages in thread
From: Ingo Molnar @ 2013-11-06 11:47 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Pekka Enberg, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Paul Mackerras, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma


* Namhyung Kim <namhyung@kernel.org> wrote:

> On Wed, 6 Nov 2013 09:30:46 +0100, Ingo Molnar wrote:
> > * Namhyung Kim <namhyung@kernel.org> wrote:
> >
> >> Hi Ingo,
> >> 
> >> On Tue, 5 Nov 2013 12:58:02 +0100, Ingo Molnar wrote:
> >> > * Namhyung Kim <namhyung@kernel.org> wrote:
> >> >> But the 'cumulative' (btw, I feel a bit hard to type this word..) is 
> >> >> different in that it *generates* entries didn't get sampled originally. 
> >> >> And as it requires callchains, total field will not work if callchains 
> >> >> are missing.
> >> >
> >> > Well, 'total' should disappear if it's not available.
> >> 
> >> But what if it's the only sort key user gave?
> >
> > Do you mean something like:
> >
> >   -F self,name -s total
> >
> > i.e. if a sort key not displayed?
> 
> What I worry is when no -F option was given at all.

In that case the default list applies, plus whatever new fields are 
mentioned in -s are also added (appended or prepended).

The display order of columns should _probably_ be something like:

  key1 key2 ... non-key1 non-key2

there's not much point in sorting and then displaying the key not in 
front, right?
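The merge rule described here (a sketch of the idea only, not perf's actual
code; the default field list below is a stand-in) amounts to prepending the
-s keys to the default or given -F list and dropping duplicates:

```python
DEFAULT_FIELDS = ["self", "comm", "dso"]  # stand-in for perf's default list

def effective_fields(sort_keys, fields=None):
    """Combine -s keys with the -F list (default if -F was not given):
    sort keys come first, then the remaining display-only fields, i.e.
    'key1 key2 ... non-key1 non-key2'."""
    base = fields if fields is not None else DEFAULT_FIELDS
    return list(sort_keys) + [f for f in base if f not in sort_keys]
```

So a bare 'perf report -s comm,dso' would still show an overhead column:
effective_fields(["comm", "dso"]) gives ["comm", "dso", "self"].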

> > I think sort keys should be automatically added to the displayed 
> > fields list.
> 
> Agreed.

> > This problem should be solved if all -s fields are displayed - i.e. 
> > they are added to the -F list, right?
> 
> But old users might not aware of the new -F option, and use -s option 
> only.  If so, she will get output like the first example, right?

Well, there's a default -F list that applies - so this shouldn't be a 
problem, agreed? So output should be like the second (expected) example.

> > Basically there's just a single concept: the -F list. The -s option 
> > simply modifies and extends the -F list but internally perf report 
> > would not know anything about '-s', it only knows about fields to 
> > display and it would know which of those fields are to be sorted and 
> > in what order.
> >
> > Does that make sense to you? Does it cover everything needed?
> 
> I like the concept.  I'm just looking for a way to add it without 
> upsetting old users. :)

If the default -F list matches our current displayed fields list then 
there should not be much change in behavior (beyond the addition of total 
for call-graph outputs - which can be kept completely separate).

I'm not too worried about call-graph 'legacies': it generates such huge 
perf.data files, which are parsed so slowly at the moment, that there's very 
little user base ... Anyone who absolutely needs call-graph profiling uses 
SysProf, which performs well.

Thanks,

	Ingo

* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-06  8:30                     ` Ingo Molnar
  2013-11-06  9:17                       ` Namhyung Kim
@ 2013-11-06 12:10                       ` Frederic Weisbecker
  2013-11-11 12:12                         ` Ingo Molnar
  1 sibling, 1 reply; 64+ messages in thread
From: Frederic Weisbecker @ 2013-11-06 12:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Namhyung Kim, Pekka Enberg, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Paul Mackerras, Namhyung Kim, LKML,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma

On Wed, Nov 06, 2013 at 09:30:46AM +0100, Ingo Molnar wrote:
> 
> * Namhyung Kim <namhyung@kernel.org> wrote:
> 
> > Hi Ingo,
> > 
> > On Tue, 5 Nov 2013 12:58:02 +0100, Ingo Molnar wrote:
> > > * Namhyung Kim <namhyung@kernel.org> wrote:
> > >> But the 'cumulative' (btw, I feel a bit hard to type this word..) is 
> > >> different in that it *generates* entries didn't get sampled originally. 
> > >> And as it requires callchains, total field will not work if callchains 
> > >> are missing.
> > >
> > > Well, 'total' should disappear if it's not available.
> > 
> > But what if it's the only sort key user gave?
> 
> Do you mean something like:
> 
>   -F self,name -s total
> 
> i.e. if a sort key not displayed?
> 
> I think sort keys should be automatically added to the displayed fields 
> list.
> 
> This rule is obviously met with the -F total:2,self:1,name:0 kind of 
> sorting syntax (you can only sort by fields that get displayed) - if mixed 
> with -s then it should be implicit I think.

I'm not sure why you want to add a new -F that adds a new way to display fields.
Isn't -s enough for that?

* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-06 11:47                         ` Ingo Molnar
@ 2013-11-06 12:14                           ` Frederic Weisbecker
  2013-11-11 12:13                             ` Ingo Molnar
  2013-11-06 15:33                           ` David Ahern
  2013-11-06 16:09                           ` Peter Zijlstra
  2 siblings, 1 reply; 64+ messages in thread
From: Frederic Weisbecker @ 2013-11-06 12:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Namhyung Kim, Pekka Enberg, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Paul Mackerras, Namhyung Kim, LKML,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma

On Wed, Nov 06, 2013 at 12:47:01PM +0100, Ingo Molnar wrote:
> 
> * Namhyung Kim <namhyung@kernel.org> wrote:
> 
> > On Wed, 6 Nov 2013 09:30:46 +0100, Ingo Molnar wrote:
> > > * Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > >> Hi Ingo,
> > >> 
> > >> On Tue, 5 Nov 2013 12:58:02 +0100, Ingo Molnar wrote:
> > >> > * Namhyung Kim <namhyung@kernel.org> wrote:
> > >> >> But the 'cumulative' (btw, I feel a bit hard to type this word..) is 
> > >> >> different in that it *generates* entries didn't get sampled originally. 
> > >> >> And as it requires callchains, total field will not work if callchains 
> > >> >> are missing.
> > >> >
> > >> > Well, 'total' should disappear if it's not available.
> > >> 
> > >> But what if it's the only sort key user gave?
> > >
> > > Do you mean something like:
> > >
> > >   -F self,name -s total
> > >
> > > i.e. if a sort key not displayed?
> > 
> > What I worry is when no -F option was given at all.
> 
> In that case the default list applied, plus whatever new fields are 
> mentioned in -s would also be added (appended or prepended).
> 
> The display order of columns should _probably_ be something like:
> 
>   key1 key2 ... non-key1 non-key2
> 
> there's not much point in sorting and then displaying the key not in 
> front, right?
> 
> > > I think sort keys should be automatically added to the displayed 
> > > fields list.
> > 
> > Agreed.
> 
> > > This problem should be solved if all -s fields are displayed - i.e. 
> > > they are added to the -F list, right?
> > 
> > But old users might not aware of the new -F option, and use -s option 
> > only.  If so, she will get output like the first example, right?
> 
> Well, there's a default -F list that applies - so this shouldn't be a 
> problem, agreed? So output should be like the second (expected) example.
> 
> > > Basically there's just a single concept: the -F list. The -s option 
> > > simply modifies and extends the -F list but internally perf report 
> > > would not know anything about '-s', it only knows about fields to 
> > > display and it would know which of those fields are to be sorted and 
> > > in what order.
> > >
> > > Does that make sense to you? Does it cover everything needed?
> > 
> > I like the concept.  I'm just looking for a way to add it without 
> > upsetting old users. :)
> 
> If the default -F list matches our current displayed fields list then 
> there should not be much change in behavior (beyond the addition of total 
> for call-graph outputs - which can be kept completely separate).
> 
> I'm not too worried about call-graph 'legacies': it generates such huge 
> perf.data files which is parsed so slowly at the moment that there's very 
> little user base ... Anyone who absolutely needs call-graph profiling uses 
> SysProf which performs well.

I'm a bit confused by what will be changed with call-graph here. Also I've
seen perf callgraph reports quite often in emails not even related to perf
development. It doesn't appear to me like an irrelevant feature...

* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-06 11:47                         ` Ingo Molnar
  2013-11-06 12:14                           ` Frederic Weisbecker
@ 2013-11-06 15:33                           ` David Ahern
  2013-11-11 12:19                             ` Ingo Molnar
  2013-11-12 12:08                             ` Pekka Enberg
  2013-11-06 16:09                           ` Peter Zijlstra
  2 siblings, 2 replies; 64+ messages in thread
From: David Ahern @ 2013-11-06 15:33 UTC (permalink / raw)
  To: Ingo Molnar, Namhyung Kim
  Cc: Pekka Enberg, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Paul Mackerras, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma

On 11/6/13, 4:47 AM, Ingo Molnar wrote:
> I'm not too worried about call-graph 'legacies': it generates such huge
> perf.data files which is parsed so slowly at the moment that there's very
> little user base ... Anyone who absolutely needs call-graph profiling uses
> SysProf which performs well.

Actually, perf with callchains is used quite heavily on my products. One 
of the selling points of perf.

David


* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-06 11:47                         ` Ingo Molnar
  2013-11-06 12:14                           ` Frederic Weisbecker
  2013-11-06 15:33                           ` David Ahern
@ 2013-11-06 16:09                           ` Peter Zijlstra
  2013-11-11 12:17                             ` Ingo Molnar
  2 siblings, 1 reply; 64+ messages in thread
From: Peter Zijlstra @ 2013-11-06 16:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Namhyung Kim, Pekka Enberg, Arnaldo Carvalho de Melo,
	Paul Mackerras, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma

On Wed, Nov 06, 2013 at 12:47:01PM +0100, Ingo Molnar wrote:
> I'm not too worried about call-graph 'legacies': it generates such huge 
> perf.data files which is parsed so slowly at the moment that there's very 
> little user base ... Anyone who absolutely needs call-graph profiling uses 
> SysProf which performs well.

Uhm, say what? I use it, and I don't use sysprof since that thing is
totally not usable ;-)



* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-06 12:10                       ` Frederic Weisbecker
@ 2013-11-11 12:12                         ` Ingo Molnar
  2013-11-11 13:01                           ` Frederic Weisbecker
  0 siblings, 1 reply; 64+ messages in thread
From: Ingo Molnar @ 2013-11-11 12:12 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Namhyung Kim, Pekka Enberg, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Paul Mackerras, Namhyung Kim, LKML,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Wed, Nov 06, 2013 at 09:30:46AM +0100, Ingo Molnar wrote:
> > 
> > * Namhyung Kim <namhyung@kernel.org> wrote:
> > 
> > > Hi Ingo,
> > > 
> > > On Tue, 5 Nov 2013 12:58:02 +0100, Ingo Molnar wrote:
> > > > * Namhyung Kim <namhyung@kernel.org> wrote:
> > > >> But the 'cumulative' (btw, I feel a bit hard to type this word..) is 
> > > >> different in that it *generates* entries didn't get sampled originally. 
> > > >> And as it requires callchains, total field will not work if callchains 
> > > >> are missing.
> > > >
> > > > Well, 'total' should disappear if it's not available.
> > > 
> > > But what if it's the only sort key user gave?
> > 
> > Do you mean something like:
> > 
> >   -F self,name -s total
> > 
> > i.e. if a sort key not displayed?
> > 
> > I think sort keys should be automatically added to the displayed fields 
> > list.
> > 
> > This rule is obviously met with the -F total:2,self:1,name:0 kind of 
> > sorting syntax (you can only sort by fields that get displayed) - if 
> > mixed with -s then it should be implicit I think.
> 
> I'm not sure why you want to add a new -F that adds a new way to display 
> fields. Isn't -s enough for that?

Well, -s implies sorting.

With -F we could decouple sorting from display order, and allow output 
like:

  # Symbol   Command    Shared Object   Overhead

Where we still sort by 'overhead', yet display things by having 'overhead' 
last.

So basically have maximum flexibility of output and sorting - into which 
the new 'total' field for accumulated stats would fit automatically.
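Decoupling the two can be shown in a few lines of Python (a sketch of the
idea with made-up numbers, not perf code or output):

```python
rows = [
    {"symbol": "b", "comm": "abc", "dso": "abc", "overhead": 9.43},
    {"symbol": "a", "comm": "abc", "dso": "abc", "overhead": 88.29},
]
display = ["symbol", "comm", "dso", "overhead"]  # -F: overhead shown last
sort_keys = ["overhead"]                         # -s: still the sort key

# Sort by the -s keys (descending), independently of the column order.
rows.sort(key=lambda r: [r[k] for k in sort_keys], reverse=True)
for row in rows:
    print("  ".join(str(row[col]) for col in display))
```

The highest-overhead row comes out first even though the overhead column is
printed last.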

Thanks,

	Ingo

* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-06 12:14                           ` Frederic Weisbecker
@ 2013-11-11 12:13                             ` Ingo Molnar
  2013-11-11 13:08                               ` Frederic Weisbecker
  0 siblings, 1 reply; 64+ messages in thread
From: Ingo Molnar @ 2013-11-11 12:13 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Namhyung Kim, Pekka Enberg, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Paul Mackerras, Namhyung Kim, LKML,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Wed, Nov 06, 2013 at 12:47:01PM +0100, Ingo Molnar wrote:
> > 
> > * Namhyung Kim <namhyung@kernel.org> wrote:
> > 
> > > On Wed, 6 Nov 2013 09:30:46 +0100, Ingo Molnar wrote:
> > > > * Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > >> Hi Ingo,
> > > >> 
> > > >> On Tue, 5 Nov 2013 12:58:02 +0100, Ingo Molnar wrote:
> > > >> > * Namhyung Kim <namhyung@kernel.org> wrote:
> > > >> >> But the 'cumulative' (btw, I feel a bit hard to type this word..) is 
> > > >> >> different in that it *generates* entries didn't get sampled originally. 
> > > >> >> And as it requires callchains, total field will not work if callchains 
> > > >> >> are missing.
> > > >> >
> > > >> > Well, 'total' should disappear if it's not available.
> > > >> 
> > > >> But what if it's the only sort key user gave?
> > > >
> > > > Do you mean something like:
> > > >
> > > >   -F self,name -s total
> > > >
> > > > i.e. if a sort key not displayed?
> > > 
> > > What I worry is when no -F option was given at all.
> > 
> > In that case the default list applied, plus whatever new fields are 
> > mentioned in -s would also be added (appended or prepended).
> > 
> > The display order of columns should _probably_ be something like:
> > 
> >   key1 key2 ... non-key1 non-key2
> > 
> > there's not much point in sorting and then displaying the key not in 
> > front, right?
> > 
> > > > I think sort keys should be automatically added to the displayed 
> > > > fields list.
> > > 
> > > Agreed.
> > 
> > > > This problem should be solved if all -s fields are displayed - i.e. 
> > > > they are added to the -F list, right?
> > > 
> > > But old users might not aware of the new -F option, and use -s option 
> > > only.  If so, she will get output like the first example, right?
> > 
> > Well, there's a default -F list that applies - so this shouldn't be a 
> > problem, agreed? So output should be like the second (expected) example.
> > 
> > > > Basically there's just a single concept: the -F list. The -s option 
> > > > simply modifies and extends the -F list but internally perf report 
> > > > would not know anything about '-s', it only knows about fields to 
> > > > display and it would know which of those fields are to be sorted and 
> > > > in what order.
> > > >
> > > > Does that make sense to you? Does it cover everything needed?
> > > 
> > > I like the concept.  I'm just looking for a way to add it without 
> > > upsetting old users. :)
> > 
> > If the default -F list matches our current displayed fields list then 
> > there should not be much change in behavior (beyond the addition of total 
> > for call-graph outputs - which can be kept completely separate).
> > 
> > I'm not too worried about call-graph 'legacies': it generates such huge 
> > perf.data files which is parsed so slowly at the moment that there's very 
> > little user base ... Anyone who absolutely needs call-graph profiling uses 
> > SysProf which performs well.
> 
> I'm a bit confused by what will be changed with call-graph here. Also 
> I've seen perf callgraph reports quite often in emails not even related 
> to perf development. It doesn't appear to me like an irrelevant 
> feature...

It's not an irrelevant feature at all! :-)

It's just that for any sort of longer profile it was pretty 
difficult/frustrating to use, which I think held back adoption.

That performance problem got fixed now by you and Namhyung, so I think 
we'll see even wider adoption of call-graph profiling...

Thanks,

	Ingo

* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-06 16:09                           ` Peter Zijlstra
@ 2013-11-11 12:17                             ` Ingo Molnar
  0 siblings, 0 replies; 64+ messages in thread
From: Ingo Molnar @ 2013-11-11 12:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Namhyung Kim, Pekka Enberg, Arnaldo Carvalho de Melo,
	Paul Mackerras, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Nov 06, 2013 at 12:47:01PM +0100, Ingo Molnar wrote:
>
> > I'm not too worried about call-graph 'legacies': it generates such 
> > huge perf.data files which is parsed so slowly at the moment that 
> > there's very little user base ... Anyone who absolutely needs 
> > call-graph profiling uses SysProf which performs well.
> 
> Uhm, say what? I use it, and I don't use sysprof since that thing is 
> totally not usable ;-)

You aren't a typical case at all! :-)

Just look back the example where Linus tried to use call-graph profiling 
to profile a mild 60-seconds workload (a kernel build) and came away 
reporting that his perf session locked up.

I think many other people ran into that performance problem. Those who are 
using it must be using it for far shorter workloads.

Anyway, that's all fixed now, and I do think that call-graph profiling is 
one of perf's killer features - I thought that from day 1 on when I 
suggested to Frederic that it would be really important to implement it 
;-)

Thanks,

	Ingo

* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-06 15:33                           ` David Ahern
@ 2013-11-11 12:19                             ` Ingo Molnar
  2013-11-11 14:44                               ` David Ahern
  2013-11-12 12:08                             ` Pekka Enberg
  1 sibling, 1 reply; 64+ messages in thread
From: Ingo Molnar @ 2013-11-11 12:19 UTC (permalink / raw)
  To: David Ahern
  Cc: Namhyung Kim, Pekka Enberg, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Paul Mackerras, Namhyung Kim, LKML,
	Frederic Weisbecker, Stephane Eranian, Jiri Olsa, Rodrigo Campos,
	Arun Sharma


* David Ahern <dsahern@gmail.com> wrote:

> On 11/6/13, 4:47 AM, Ingo Molnar wrote:
> >I'm not too worried about call-graph 'legacies': it generates such huge
> >perf.data files which is parsed so slowly at the moment that there's very
> >little user base ... Anyone who absolutely needs call-graph profiling uses
> >SysProf which performs well.
> 
> Actually, perf with callchains is used quite heavily on my products.
> One of the selling points of perf.

That's nice to hear :)

In what way is call-graph profiling utilized typically?

Is it system-wide, i.e. something like:

	perf record -a -g sleep 10

? If yes then that would explain why scalability problems rarely surfaced: 
it takes a longer user-space profile to get to the event counts where 
scalability starts hurting.

Thanks,

	Ingo

* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-11 12:12                         ` Ingo Molnar
@ 2013-11-11 13:01                           ` Frederic Weisbecker
  0 siblings, 0 replies; 64+ messages in thread
From: Frederic Weisbecker @ 2013-11-11 13:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Namhyung Kim, Pekka Enberg, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Paul Mackerras, Namhyung Kim, LKML,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma

On Mon, Nov 11, 2013 at 01:12:12PM +0100, Ingo Molnar wrote:
> > I'm not sure why you want to add a new -F that adds news way to display 
> > fields. Isn't -s enough for that?
> 
> Well, -s implies sorting.
> 
> With -F we could decouple sorting from display order, and allow output 
> like:
> 
>   # Symbol   Command    Shared Object   Overhead
> 
> Where we still sort by 'overhead', yet display things by having 'overhead' 
> last.
> 
> So basically have maximum flexibility of output and sorting - into which 
> the new 'total' field for accumulated stats would fit automatically.

Ok, I haven't followed the details on why we want this to display the cumulative
overhead.

But reordering the columns should be ok as long as we have the same fields present in -F and -s.

* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-11 12:13                             ` Ingo Molnar
@ 2013-11-11 13:08                               ` Frederic Weisbecker
  2013-11-11 13:56                                 ` Ingo Molnar
  0 siblings, 1 reply; 64+ messages in thread
From: Frederic Weisbecker @ 2013-11-11 13:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Namhyung Kim, Pekka Enberg, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Paul Mackerras, Namhyung Kim, LKML,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma

On Mon, Nov 11, 2013 at 01:13:52PM +0100, Ingo Molnar wrote:
> 
> It's not an irrelevant feature at all! :-)
> 
> It's just that for any sort of longer profile it was pretty 
> difficult/frustrating to use, which I think held back adoption.
> 
> That performance problem got fixed now by you and Namhyung, so I think 
> we'll see even wider adoption of call-graph profiling...

Ah I see now. At the time Linus reported his issue, I had the feeling his
usecase was a bit "extreme", but I actually have no idea how far perf can be
used given that I'm mostly used to short benchmarks, typically hackbench,
perf bench sched messaging et al. Thing is I don't use it enough for my
real usecases :)

* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-11 13:08                               ` Frederic Weisbecker
@ 2013-11-11 13:56                                 ` Ingo Molnar
  2013-11-11 15:45                                   ` Frederic Weisbecker
  0 siblings, 1 reply; 64+ messages in thread
From: Ingo Molnar @ 2013-11-11 13:56 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Namhyung Kim, Pekka Enberg, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Paul Mackerras, Namhyung Kim, LKML,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Mon, Nov 11, 2013 at 01:13:52PM +0100, Ingo Molnar wrote:
> > 
> > It's not an irrelevant feature at all! :-)
> > 
> > It's just that for any sort of longer profile it was pretty 
> > difficult/frustrating to use, which I think held back adoption.
> > 
> > That performance problem got fixed now by you and Namhyung, so I think 
> > we'll see even wider adoption of call-graph profiling...
> 
> Ah I see now. At the time Linus reported his issue, I had the feeling 
> his usecase was a bit "extreme", but I actually have no idea how far 
> perf can be used given that I'm mostly used to short benchmarks, 
> typically hackbench, perf bench sched messaging et al. Thing is I don't 
> use it enough for my real usecases :)

Well, it's a bit of a catch-22: if there are severe scalability problems 
for a usecase then people won't use it because they cannot use it. So 
developers should usually try to over-measure things and go for extreme 
uses and such.

Thanks,

	Ingo

* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-11 12:19                             ` Ingo Molnar
@ 2013-11-11 14:44                               ` David Ahern
  0 siblings, 0 replies; 64+ messages in thread
From: David Ahern @ 2013-11-11 14:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Namhyung Kim, Pekka Enberg, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Paul Mackerras, Namhyung Kim, LKML,
	Frederic Weisbecker, Stephane Eranian, Jiri Olsa, Rodrigo Campos,
	Arun Sharma

On 11/11/13, 5:19 AM, Ingo Molnar wrote:
>
> In what way is call-graph profiling utilized typically?
>
> Is it system-wide, i.e. something like:
>
> 	perf record -a -g sleep 10
>
> ? If yes then that would explain why scalability problems rarely surfaced,
> it takes a longer user-space profile to get to the event counts where
> scalability started hurting.

Both. But not time periods long enough to generate GB-sized files 
(limitations in the product). I get various reports of hung commands 
(usually perf not terminated properly (now fixed), or the file being 
corrupted on transfer), but no one has complained to me about perf-report 
appearing to hang or taking 20-30 minutes to generate a result.

David

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-11 13:56                                 ` Ingo Molnar
@ 2013-11-11 15:45                                   ` Frederic Weisbecker
  0 siblings, 0 replies; 64+ messages in thread
From: Frederic Weisbecker @ 2013-11-11 15:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Namhyung Kim, Pekka Enberg, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Paul Mackerras, Namhyung Kim, LKML,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma

On Mon, Nov 11, 2013 at 02:56:37PM +0100, Ingo Molnar wrote:
> 
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > On Mon, Nov 11, 2013 at 01:13:52PM +0100, Ingo Molnar wrote:
> > > 
> > > It's not an irrelevant feature at all! :-)
> > > 
> > > It's just that for any sort of longer profile it was pretty 
> > > difficult/frustrating to use, which I think held back adoption.
> > > 
> > > That performance problem got fixed now by you and Namhyung, so I think 
> > > we'll see even wider adoption of call-graph profiling...
> > 
> > Ah I see now. At the time Linus reported his issue, I had the feeling 
> > his usecase was a bit "extreme", but I actually have no idea how far 
> > perf can be used given that I'm mostly used to short benchmarks, 
> > typically hackbench, perf bench sched messaging et al. Thing is I don't 
> > use it enough for my real usecases :)
> 
> Well, it's a bit of a catch-22: if there are severe scalability problems 
> for a usecase then people won't use it because they cannot use it. So 
> developers should usually try to over-measure things and go for extreme 
> uses and such.

Agreed :)

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2)
  2013-11-06 15:33                           ` David Ahern
  2013-11-11 12:19                             ` Ingo Molnar
@ 2013-11-12 12:08                             ` Pekka Enberg
  1 sibling, 0 replies; 64+ messages in thread
From: Pekka Enberg @ 2013-11-12 12:08 UTC (permalink / raw)
  To: David Ahern, Ingo Molnar, Namhyung Kim
  Cc: Pekka Enberg, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Paul Mackerras, Namhyung Kim, LKML, Frederic Weisbecker,
	Stephane Eranian, Jiri Olsa, Rodrigo Campos, Arun Sharma

On 11/06/2013 05:33 PM, David Ahern wrote:
> On 11/6/13, 4:47 AM, Ingo Molnar wrote:
>> I'm not too worried about call-graph 'legacies': it generates such huge
>> perf.data files which is parsed so slowly at the moment that there's 
>> very
>> little user base ... Anyone who absolutely needs call-graph profiling 
>> uses
>> SysProf which performs well.
>
> Actually, perf with callchains is used quite heavily on my products. 
> One of the selling points of perf.

I use perf with callchains all the time as well.

                     Pekka

^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2013-11-12 12:08 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-31  6:56 [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Namhyung Kim
2013-10-31  6:56 ` [PATCH 01/14] perf tools: Consolidate __hists__add_*entry() Namhyung Kim
2013-11-01 11:56   ` Jiri Olsa
2013-11-06  5:43   ` [tip:perf/core] perf hists: " tip-bot for Namhyung Kim
2013-10-31  6:56 ` [PATCH 02/14] perf tools: Introduce struct add_entry_iter Namhyung Kim
2013-11-01 12:07   ` Jiri Olsa
2013-11-05  7:09     ` Namhyung Kim
2013-11-01 12:09   ` Jiri Olsa
2013-11-05  7:16     ` Namhyung Kim
2013-10-31  6:56 ` [PATCH 03/14] perf hists: Convert hist entry functions to use struct he_stat Namhyung Kim
2013-11-04 23:45   ` Arnaldo Carvalho de Melo
2013-11-05  7:17     ` Namhyung Kim
2013-10-31  6:56 ` [PATCH 04/14] perf hists: Add support for accumulated stat of hist entry Namhyung Kim
2013-10-31  6:56 ` [PATCH 05/14] perf hists: Check if accumulated when adding a " Namhyung Kim
2013-10-31  6:56 ` [PATCH 06/14] perf hists: Accumulate hist entry stat based on the callchain Namhyung Kim
2013-10-31  6:56 ` [PATCH 07/14] perf tools: Update cpumode for each cumulative entry Namhyung Kim
2013-11-01 12:55   ` Jiri Olsa
2013-11-05  7:41     ` Namhyung Kim
2013-10-31  6:56 ` [PATCH 08/14] perf report: Cache cumulative callchains Namhyung Kim
2013-10-31 11:13   ` Rodrigo Campos
2013-11-01  7:07     ` Namhyung Kim
2013-11-01 14:24       ` Rodrigo Campos
2013-11-01 15:16       ` Rodrigo Campos
2013-11-01 12:29   ` Jiri Olsa
2013-11-01 12:57     ` Jiri Olsa
2013-10-31  6:56 ` [PATCH 09/14] perf hists: Sort hist entries by accumulated period Namhyung Kim
2013-10-31  6:56 ` [PATCH 10/14] perf ui/hist: Add support to accumulated hist stat Namhyung Kim
2013-10-31  6:56 ` [PATCH 11/14] perf ui/browser: " Namhyung Kim
2013-10-31  6:56 ` [PATCH 12/14] perf ui/gtk: " Namhyung Kim
2013-10-31  6:56 ` [PATCH 13/14] perf tools: Apply percent-limit to cumulative percentage Namhyung Kim
2013-10-31  6:56 ` [PATCH 14/14] perf report: Add -g cumulative option Namhyung Kim
2013-11-01 13:17   ` Jiri Olsa
2013-11-05  7:44     ` Namhyung Kim
2013-10-31  8:09 ` [RFC/PATCHSET 00/14] perf report: Add support to accumulate hist periods (v2) Ingo Molnar
2013-11-01  6:48   ` Namhyung Kim
2013-11-01  7:55     ` Ingo Molnar
2013-11-01  9:18       ` Pekka Enberg
2013-11-01  9:22       ` Namhyung Kim
2013-11-01  9:27         ` Ingo Molnar
2013-11-05  7:31           ` Namhyung Kim
2013-11-05  7:46             ` Ingo Molnar
2013-11-05  9:05               ` Namhyung Kim
2013-11-05 11:58                 ` Ingo Molnar
2013-11-06  7:56                   ` Namhyung Kim
2013-11-06  8:30                     ` Ingo Molnar
2013-11-06  9:17                       ` Namhyung Kim
2013-11-06 11:47                         ` Ingo Molnar
2013-11-06 12:14                           ` Frederic Weisbecker
2013-11-11 12:13                             ` Ingo Molnar
2013-11-11 13:08                               ` Frederic Weisbecker
2013-11-11 13:56                                 ` Ingo Molnar
2013-11-11 15:45                                   ` Frederic Weisbecker
2013-11-06 15:33                           ` David Ahern
2013-11-11 12:19                             ` Ingo Molnar
2013-11-11 14:44                               ` David Ahern
2013-11-12 12:08                             ` Pekka Enberg
2013-11-06 16:09                           ` Peter Zijlstra
2013-11-11 12:17                             ` Ingo Molnar
2013-11-06 12:10                       ` Frederic Weisbecker
2013-11-11 12:12                         ` Ingo Molnar
2013-11-11 13:01                           ` Frederic Weisbecker
2013-11-04 13:27     ` Frederic Weisbecker
2013-11-04 13:23   ` Frederic Weisbecker
2013-11-04 13:34   ` Frederic Weisbecker
