* [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)
@ 2015-11-02 12:57 Namhyung Kim
  2015-11-02 12:57 ` [RFC/PATCH v2 1/4] perf report: Support folded callchain mode on --stdio Namhyung Kim
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Namhyung Kim @ 2015-11-02 12:57 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Brendan Gregg,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

Hello,

This is what Brendan requested on the perf-users mailing list [1] to
support FlameGraphs [2] more efficiently.  This patchset adds a few
more callchain options to adjust the output for it.

First, a 'folded' output mode is added.  The folded output puts all
callchain nodes on one line, separated by semicolons, followed by a
space and the value.  For now it only supports --stdio, as the other
UIs already provide a way of folding/expanding callchains dynamically.
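
To make the format concrete, here is a minimal standalone sketch (not
code from this patchset) that joins symbol names with semicolons and
appends the value.  The helper name fold_callchain() and its signature
are assumptions made for this illustration only.

```c
#include <stdio.h>
#include <string.h>

/*
 * Illustration only: fold_callchain() is a hypothetical helper, not
 * part of perf.  It writes "sym1;sym2;...;symN <value>" into buf and
 * returns the number of characters written (excluding the trailing
 * NUL), or -1 if buf is too small.
 */
static int fold_callchain(char *buf, size_t size,
			  const char *const *syms, size_t nr,
			  const char *value)
{
	size_t len = 0;
	size_t i;
	int n;

	for (i = 0; i < nr; i++) {
		/* no separator before the first entry */
		n = snprintf(buf + len, size - len, "%s%s",
			     i ? ";" : "", syms[i]);
		if (n < 0 || (size_t)n >= size - len)
			return -1;
		len += n;
	}

	/* a single space separates the stack from the value */
	n = snprintf(buf + len, size - len, " %s", value);
	if (n < 0 || (size_t)n >= size - len)
		return -1;

	return (int)(len + n);
}
```

The patch itself prints straight to a FILE and walks the parent nodes
first; the buffer-based form here is only meant to show the layout of
one output line.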

The value can now be one of 'percent', 'period', or 'count'.  Percent
is the current default output, period is the raw number of sample
periods, and count is the number of samples for each callchain.

Here's an example:

  $ perf report --no-children --show-nr-samples --stdio -g folded,count
  ...
    39.93%     80  swapper  [kernel.vmlinux]  [k] intel_idle
  intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
  intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23


  $ perf report --no-children --stdio -g percent
  ...
    39.93%  swapper  [kernel.vmlinux]  [k] intel_idle
            |
            ---intel_idle
               cpuidle_enter_state
               cpuidle_enter
               call_cpuidle
               cpu_startup_entry
               |
               |--28.63%-- start_secondary
               |
                --11.30%-- rest_init


  $ perf report --no-children --stdio --show-total-period -g period
  ...
    39.93%   13018705  swapper  [kernel.vmlinux]  [k] intel_idle
            |
            ---intel_idle
               cpuidle_enter_state
               cpuidle_enter
               call_cpuidle
               cpu_startup_entry
               |
               |--9334403-- start_secondary
               |
                --3684302-- rest_init


  $ perf report --no-children --stdio --show-nr-samples -g count
  ...
    39.93%     80  swapper  [kernel.vmlinux]  [k] intel_idle
            |
            ---intel_idle
               cpuidle_enter_state
               cpuidle_enter
               call_cpuidle
               cpu_startup_entry
               |
               |--57-- start_secondary
               |
                --23-- rest_init


You can get it from 'perf/callchain-fold-v2' branch on my tree:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Any comments are welcome, thanks
Namhyung


[1] http://www.spinics.net/lists/linux-perf-users/msg02498.html
[2] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html


Namhyung Kim (4):
  perf report: Support folded callchain mode on --stdio
  perf callchain: Abstract callchain print function
  perf callchain: Add count fields to struct callchain_node
  perf report: Add callchain value option

 tools/perf/Documentation/perf-report.txt | 13 +++--
 tools/perf/builtin-report.c              |  4 +-
 tools/perf/ui/browsers/hists.c           |  8 +--
 tools/perf/ui/gtk/hists.c                |  8 +--
 tools/perf/ui/stdio/hist.c               | 91 ++++++++++++++++++++++++++------
 tools/perf/util/callchain.c              | 87 +++++++++++++++++++++++++++++-
 tools/perf/util/callchain.h              | 24 ++++++++-
 tools/perf/util/util.c                   |  3 +-
 8 files changed, 204 insertions(+), 34 deletions(-)

-- 
2.6.2



* [RFC/PATCH v2 1/4] perf report: Support folded callchain mode on --stdio
  2015-11-02 12:57 [RFC/PATCH 0/4] perf report: Support folded callchain output (v2) Namhyung Kim
@ 2015-11-02 12:57 ` Namhyung Kim
  2015-11-02 12:57 ` [RFC/PATCH v2 2/4] perf callchain: Abstract callchain print function Namhyung Kim
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2015-11-02 12:57 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Brendan Gregg,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

Add a new call chain option (-g) 'folded' to print each callchain on a
single line.  The callchain entries are separated by semicolons,
followed by a space and the (absolute) percent value, as in 'flat' mode.

For example, the following 20 lines can be printed in 3 lines with the
folded output mode:

  $ perf report -g flat --no-children | grep -v ^# | head -20
      60.48%  swapper  [kernel.vmlinux]  [k] intel_idle
              54.60%
                 intel_idle
                 cpuidle_enter_state
                 cpuidle_enter
                 call_cpuidle
                 cpu_startup_entry
                 start_secondary

              5.88%
                 intel_idle
                 cpuidle_enter_state
                 cpuidle_enter
                 call_cpuidle
                 cpu_startup_entry
                 rest_init
                 start_kernel
                 x86_64_start_reservations
                 x86_64_start_kernel

  $ perf report -g folded --no-children | grep -v ^# | head -3
      60.48%  swapper  [kernel.vmlinux]  [k] intel_idle
  intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 54.60%
  intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel  5.88%

This mode is currently supported only for --stdio and is intended to be
used by scripts such as the ones in FlameGraphs [1].  Support for other
UIs might be added later.
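
As a rough idea of what a consuming script has to do with such output,
the sketch below splits one folded line into its stack and its value.
It is not taken from perf or from the FlameGraph scripts; the helper
name split_folded_line() is hypothetical, and the caller is assumed to
have stripped the trailing newline already.

```c
#include <string.h>

/*
 * Illustration only: split_folded_line() is a hypothetical helper.
 * It splits "a;b;c  5.88%" in place into the stack ("a;b;c") and the
 * value ("5.88%").  The split is done at the last space because symbol
 * names may themselves contain spaces.
 * Returns 0 on success, -1 if the line has no stack or no value.
 */
static int split_folded_line(char *line, char **stack, char **value)
{
	char *sep = strrchr(line, ' ');

	if (!sep || sep == line || sep[1] == '\0')
		return -1;

	*value = sep + 1;

	/* trim the padding spaces between the stack and the value */
	while (sep > line && sep[-1] == ' ')
		sep--;
	if (sep == line)
		return -1;
	*sep = '\0';

	*stack = line;
	return 0;
}
```

The padding loop is there because the percent value is printed with a
fixed width, so more than one space can precede it.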

[1] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html

Requested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/stdio/hist.c  | 53 +++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/callchain.c |  6 +++++
 tools/perf/util/callchain.h |  3 ++-
 3 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index dfcbc90146ef..2c7436241912 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -260,6 +260,56 @@ static size_t callchain__fprintf_flat(FILE *fp, struct rb_root *tree,
 	return ret;
 }
 
+static size_t __callchain__fprintf_folded(FILE *fp, struct callchain_node *node)
+{
+	struct callchain_list *chain;
+	size_t ret = 0;
+	char bf[1024];
+	bool first;
+
+	if (!node)
+		return 0;
+
+	ret += __callchain__fprintf_folded(fp, node->parent);
+
+	first = (ret == 0);
+	list_for_each_entry(chain, &node->val, list) {
+		if (chain->ip >= PERF_CONTEXT_MAX)
+			continue;
+		ret += fprintf(fp, "%s%s", first ? "" : ";",
+			       callchain_list__sym_name(chain,
+						bf, sizeof(bf), false));
+		first = false;
+	}
+
+	return ret;
+}
+
+static size_t callchain__fprintf_folded(FILE *fp, struct rb_root *tree,
+					u64 total_samples)
+{
+	size_t ret = 0;
+	u32 entries_printed = 0;
+	struct callchain_node *chain;
+	struct rb_node *rb_node = rb_first(tree);
+
+	while (rb_node) {
+		double percent;
+
+		chain = rb_entry(rb_node, struct callchain_node, rb_node);
+		percent = chain->hit * 100.0 / total_samples;
+
+		ret += __callchain__fprintf_folded(fp, chain);
+		ret += fprintf(fp, " %6.2f%%\n", percent);
+		if (++entries_printed == callchain_param.print_limit)
+			break;
+
+		rb_node = rb_next(rb_node);
+	}
+
+	return ret;
+}
+
 static size_t hist_entry_callchain__fprintf(struct hist_entry *he,
 					    u64 total_samples, int left_margin,
 					    FILE *fp)
@@ -278,6 +328,9 @@ static size_t hist_entry_callchain__fprintf(struct hist_entry *he,
 	case CHAIN_FLAT:
 		return callchain__fprintf_flat(fp, &he->sorted_chain, total_samples);
 		break;
+	case CHAIN_FOLDED:
+		return callchain__fprintf_folded(fp, &he->sorted_chain, total_samples);
+		break;
 	case CHAIN_NONE:
 		break;
 	default:
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 735ad48e1858..08cb220ba5ea 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -44,6 +44,10 @@ static int parse_callchain_mode(const char *value)
 		callchain_param.mode = CHAIN_GRAPH_REL;
 		return 0;
 	}
+	if (!strncmp(value, "folded", strlen(value))) {
+		callchain_param.mode = CHAIN_FOLDED;
+		return 0;
+	}
 	return -1;
 }
 
@@ -218,6 +222,7 @@ rb_insert_callchain(struct rb_root *root, struct callchain_node *chain,
 
 		switch (mode) {
 		case CHAIN_FLAT:
+		case CHAIN_FOLDED:
 			if (rnode->hit < chain->hit)
 				p = &(*p)->rb_left;
 			else
@@ -338,6 +343,7 @@ int callchain_register_param(struct callchain_param *param)
 		param->sort = sort_chain_graph_rel;
 		break;
 	case CHAIN_FLAT:
+	case CHAIN_FOLDED:
 		param->sort = sort_chain_flat;
 		break;
 	case CHAIN_NONE:
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index fce8161e54db..2f305384531f 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -43,7 +43,8 @@ enum chain_mode {
 	CHAIN_NONE,
 	CHAIN_FLAT,
 	CHAIN_GRAPH_ABS,
-	CHAIN_GRAPH_REL
+	CHAIN_GRAPH_REL,
+	CHAIN_FOLDED,
 };
 
 enum chain_order {
-- 
2.6.2



* [RFC/PATCH v2 2/4] perf callchain: Abstract callchain print function
  2015-11-02 12:57 [RFC/PATCH 0/4] perf report: Support folded callchain output (v2) Namhyung Kim
  2015-11-02 12:57 ` [RFC/PATCH v2 1/4] perf report: Support folded callchain mode on --stdio Namhyung Kim
@ 2015-11-02 12:57 ` Namhyung Kim
  2015-11-02 12:57 ` [RFC/PATCH v2 3/4] perf callchain: Add count fields to struct callchain_node Namhyung Kim
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2015-11-02 12:57 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Brendan Gregg,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

This is a preparation for printing other types of callchain values,
such as count or period.

Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/browsers/hists.c |  8 +++++---
 tools/perf/ui/gtk/hists.c      |  8 ++------
 tools/perf/ui/stdio/hist.c     | 36 ++++++++++++++++++------------------
 tools/perf/util/callchain.c    | 25 +++++++++++++++++++++++++
 tools/perf/util/callchain.h    |  4 ++++
 5 files changed, 54 insertions(+), 27 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index e5afb8936040..a8897aab4c4a 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -592,7 +592,6 @@ static int hist_browser__show_callchain(struct hist_browser *browser,
 	while (node) {
 		struct callchain_node *child = rb_entry(node, struct callchain_node, rb_node);
 		struct rb_node *next = rb_next(node);
-		u64 cumul = callchain_cumul_hits(child);
 		struct callchain_list *chain;
 		char folded_sign = ' ';
 		int first = true;
@@ -619,9 +618,12 @@ static int hist_browser__show_callchain(struct hist_browser *browser,
 						       browser->show_dso);
 
 			if (was_first && need_percent) {
-				double percent = cumul * 100.0 / total;
+				char buf[64];
 
-				if (asprintf(&alloc_str, "%2.2f%% %s", percent, str) < 0)
+				callchain_node__sprintf_value(child, buf, sizeof(buf),
+							      total);
+
+				if (asprintf(&alloc_str, "%s %s", buf, str) < 0)
 					str = "Not enough memory!";
 				else
 					str = alloc_str;
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 4b3585eed1e8..d8037b7023e8 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -100,14 +100,10 @@ static void perf_gtk__add_callchain(struct rb_root *root, GtkTreeStore *store,
 		struct callchain_list *chain;
 		GtkTreeIter iter, new_parent;
 		bool need_new_parent;
-		double percent;
-		u64 hits, child_total;
+		u64 child_total;
 
 		node = rb_entry(nd, struct callchain_node, rb_node);
 
-		hits = callchain_cumul_hits(node);
-		percent = 100.0 * hits / total;
-
 		new_parent = *parent;
 		need_new_parent = !has_single_node && (node->val_nr > 1);
 
@@ -116,7 +112,7 @@ static void perf_gtk__add_callchain(struct rb_root *root, GtkTreeStore *store,
 
 			gtk_tree_store_append(store, &iter, &new_parent);
 
-			scnprintf(buf, sizeof(buf), "%5.2f%%", percent);
+			callchain_node__sprintf_value(node, buf, sizeof(buf), total);
 			gtk_tree_store_set(store, &iter, 0, buf, -1);
 
 			callchain_list__sym_name(chain, buf, sizeof(buf), false);
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 2c7436241912..e84ca21252d3 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -34,10 +34,10 @@ static size_t ipchain__fprintf_graph_line(FILE *fp, int depth, int depth_mask,
 	return ret;
 }
 
-static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_list *chain,
+static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_node *node,
+				     struct callchain_list *chain,
 				     int depth, int depth_mask, int period,
-				     u64 total_samples, u64 hits,
-				     int left_margin)
+				     u64 total_samples, int left_margin)
 {
 	int i;
 	size_t ret = 0;
@@ -50,10 +50,9 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_list *chain,
 		else
 			ret += fprintf(fp, " ");
 		if (!period && i == depth - 1) {
-			double percent;
-
-			percent = hits * 100.0 / total_samples;
-			ret += percent_color_fprintf(fp, "--%2.2f%%-- ", percent);
+			ret += fprintf(fp, "--");
+			ret += callchain_node__fprintf_value(node, fp, total_samples);
+			ret += fprintf(fp, "--");
 		} else
 			ret += fprintf(fp, "%s", "          ");
 	}
@@ -120,10 +119,9 @@ static size_t __callchain__fprintf_graph(FILE *fp, struct rb_root *root,
 						   left_margin);
 		i = 0;
 		list_for_each_entry(chain, &child->val, list) {
-			ret += ipchain__fprintf_graph(fp, chain, depth,
+			ret += ipchain__fprintf_graph(fp, child, chain, depth,
 						      new_depth_mask, i++,
 						      total_samples,
-						      cumul,
 						      left_margin);
 		}
 
@@ -143,14 +141,17 @@ static size_t __callchain__fprintf_graph(FILE *fp, struct rb_root *root,
 
 	if (callchain_param.mode == CHAIN_GRAPH_REL &&
 		remaining && remaining != total_samples) {
+		struct callchain_node rem_node = {
+			.hit = remaining,
+		};
 
 		if (!rem_sq_bracket)
 			return ret;
 
 		new_depth_mask &= ~(1 << (depth - 1));
-		ret += ipchain__fprintf_graph(fp, &rem_hits, depth,
+		ret += ipchain__fprintf_graph(fp, &rem_node, &rem_hits, depth,
 					      new_depth_mask, 0, total_samples,
-					      remaining, left_margin);
+					      left_margin);
 	}
 
 	return ret;
@@ -243,12 +244,11 @@ static size_t callchain__fprintf_flat(FILE *fp, struct rb_root *tree,
 	struct rb_node *rb_node = rb_first(tree);
 
 	while (rb_node) {
-		double percent;
-
 		chain = rb_entry(rb_node, struct callchain_node, rb_node);
-		percent = chain->hit * 100.0 / total_samples;
 
-		ret = percent_color_fprintf(fp, "           %6.2f%%\n", percent);
+		ret += fprintf(fp, "           ");
+		ret += callchain_node__fprintf_value(chain, fp, total_samples);
+		ret += fprintf(fp, "\n");
 		ret += __callchain__fprintf_flat(fp, chain, total_samples);
 		ret += fprintf(fp, "\n");
 		if (++entries_printed == callchain_param.print_limit)
@@ -294,13 +294,13 @@ static size_t callchain__fprintf_folded(FILE *fp, struct rb_root *tree,
 	struct rb_node *rb_node = rb_first(tree);
 
 	while (rb_node) {
-		double percent;
 
 		chain = rb_entry(rb_node, struct callchain_node, rb_node);
-		percent = chain->hit * 100.0 / total_samples;
 
 		ret += __callchain__fprintf_folded(fp, chain);
-		ret += fprintf(fp, " %6.2f%%\n", percent);
+		ret += fprintf(fp, " ");
+		ret += callchain_node__fprintf_value(chain, fp, total_samples);
+		ret += fprintf(fp, "\n");
 		if (++entries_printed == callchain_param.print_limit)
 			break;
 
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 08cb220ba5ea..44184d198855 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -805,6 +805,31 @@ char *callchain_list__sym_name(struct callchain_list *cl,
 	return bf;
 }
 
+char *callchain_node__sprintf_value(struct callchain_node *node,
+				    char *bf, size_t bfsize, u64 total)
+{
+	double percent = 0.0;
+	u64 cumul = callchain_cumul_hits(node);
+
+	if (total)
+		percent = cumul * 100.0 / total;
+
+	scnprintf(bf, bfsize, "%6.2f%%", percent);
+	return bf;
+}
+
+int callchain_node__fprintf_value(struct callchain_node *node,
+				 FILE *fp, u64 total)
+{
+	double percent = 0.0;
+	u64 cumul = callchain_cumul_hits(node);
+
+	if (total)
+		percent = cumul * 100.0 / total;
+
+	return percent_color_fprintf(fp, "%.2f%%", percent);
+}
+
 static void free_callchain_node(struct callchain_node *node)
 {
 	struct callchain_list *list, *tmp;
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 2f305384531f..3a90a57f6213 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -230,6 +230,10 @@ static inline int arch_skip_callchain_idx(struct thread *thread __maybe_unused,
 
 char *callchain_list__sym_name(struct callchain_list *cl,
 			       char *bf, size_t bfsize, bool show_dso);
+char *callchain_node__sprintf_value(struct callchain_node *node,
+				    char *bf, size_t bfsize, u64 total);
+int callchain_node__fprintf_value(struct callchain_node *node,
+				  FILE *fp, u64 total);
 
 void free_callchain(struct callchain_root *root);
 
-- 
2.6.2



* [RFC/PATCH v2 3/4] perf callchain: Add count fields to struct callchain_node
  2015-11-02 12:57 [RFC/PATCH 0/4] perf report: Support folded callchain output (v2) Namhyung Kim
  2015-11-02 12:57 ` [RFC/PATCH v2 1/4] perf report: Support folded callchain mode on --stdio Namhyung Kim
  2015-11-02 12:57 ` [RFC/PATCH v2 2/4] perf callchain: Abstract callchain print function Namhyung Kim
@ 2015-11-02 12:57 ` Namhyung Kim
  2015-11-02 12:57 ` [RFC/PATCH v2 4/4] perf report: Add callchain value option Namhyung Kim
  2015-11-02 20:37 ` [RFC/PATCH 0/4] perf report: Support folded callchain output (v2) Brendan Gregg
  4 siblings, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2015-11-02 12:57 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Brendan Gregg,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

Add count fields to track the number of occurrences of each callchain.
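
The relation between the two counters can be sketched with a
stripped-down node.  This is a simplified model for illustration, not
the code in this patch: struct node and node_add_sample() are
hypothetical, and the real accounting is spread over add_child(),
split_add_child(), append_chain_children() and append_chain().

```c
/*
 * Illustration only: a stand-in for struct callchain_node that keeps
 * just the counters discussed here.
 */
struct node {
	struct node *parent;
	unsigned int count;          /* samples whose callchain ends here */
	unsigned int children_count; /* samples ending in any descendant */
};

/* same idea as callchain_cumul_counts(): self plus descendants */
static unsigned int node_cumul_counts(const struct node *n)
{
	return n->count + n->children_count;
}

/* hypothetical helper: account one sample whose chain ends at 'leaf' */
static void node_add_sample(struct node *leaf)
{
	struct node *n;

	leaf->count++;
	for (n = leaf->parent; n; n = n->parent)
		n->children_count++;
}
```

With 57 samples ending in one child and 23 in another, the shared
parent has a count of 0 and a cumulative count of 80.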

Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/callchain.c | 10 ++++++++++
 tools/perf/util/callchain.h |  7 +++++++
 2 files changed, 17 insertions(+)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 44184d198855..0a97d77509bd 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -437,6 +437,8 @@ add_child(struct callchain_node *parent,
 
 	new->children_hit = 0;
 	new->hit = period;
+	new->children_count = 0;
+	new->count = 1;
 	return new;
 }
 
@@ -484,6 +486,9 @@ split_add_child(struct callchain_node *parent,
 	parent->children_hit = callchain_cumul_hits(new);
 	new->val_nr = parent->val_nr - idx_local;
 	parent->val_nr = idx_local;
+	new->count = parent->count;
+	new->children_count = parent->children_count;
+	parent->children_count = callchain_cumul_counts(new);
 
 	/* create a new child for the new branch if any */
 	if (idx_total < cursor->nr) {
@@ -494,6 +499,8 @@ split_add_child(struct callchain_node *parent,
 
 		parent->hit = 0;
 		parent->children_hit += period;
+		parent->count = 0;
+		parent->children_count += 1;
 
 		node = callchain_cursor_current(cursor);
 		new = add_child(parent, cursor, period);
@@ -516,6 +523,7 @@ split_add_child(struct callchain_node *parent,
 		rb_insert_color(&new->rb_node_in, &parent->rb_root_in);
 	} else {
 		parent->hit = period;
+		parent->count = 1;
 	}
 }
 
@@ -562,6 +570,7 @@ append_chain_children(struct callchain_node *root,
 
 inc_children_hit:
 	root->children_hit += period;
+	root->children_count++;
 }
 
 static int
@@ -614,6 +623,7 @@ append_chain(struct callchain_node *root,
 	/* we match 100% of the path, increment the hit */
 	if (matches == root->val_nr && cursor->pos == cursor->nr) {
 		root->hit += period;
+		root->count++;
 		return 0;
 	}
 
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 3a90a57f6213..2f948f0ff034 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -60,6 +60,8 @@ struct callchain_node {
 	struct rb_root		rb_root_in; /* input tree of children */
 	struct rb_root		rb_root;    /* sorted output tree of children */
 	unsigned int		val_nr;
+	unsigned int		count;
+	unsigned int		children_count;
 	u64			hit;
 	u64			children_hit;
 };
@@ -145,6 +147,11 @@ static inline u64 callchain_cumul_hits(struct callchain_node *node)
 	return node->hit + node->children_hit;
 }
 
+static inline int callchain_cumul_counts(struct callchain_node *node)
+{
+	return node->count + node->children_count;
+}
+
 int callchain_register_param(struct callchain_param *param);
 int callchain_append(struct callchain_root *root,
 		     struct callchain_cursor *cursor,
-- 
2.6.2



* [RFC/PATCH v2 4/4] perf report: Add callchain value option
  2015-11-02 12:57 [RFC/PATCH 0/4] perf report: Support folded callchain output (v2) Namhyung Kim
                   ` (2 preceding siblings ...)
  2015-11-02 12:57 ` [RFC/PATCH v2 3/4] perf callchain: Add count fields to struct callchain_node Namhyung Kim
@ 2015-11-02 12:57 ` Namhyung Kim
  2015-11-02 20:37 ` [RFC/PATCH 0/4] perf report: Support folded callchain output (v2) Brendan Gregg
  4 siblings, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2015-11-02 12:57 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Brendan Gregg,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

The -g/--call-graph option now accepts a value type that controls how
callchain values are displayed.  Possible values are 'percent',
'period' and 'count'.  Percent is the same as before and remains the
default.  Period displays the raw period value rather than the
percentage.  Count displays the number of occurrences.
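
The selection between the three value types can be sketched as a
standalone function.  This is an illustration under assumptions, not
the patch code: enum value_kind mirrors enum chain_value, while
format_value() and its parameters are made up for the example.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* mirrors enum chain_value from the patch; the rest is illustrative */
enum value_kind { VAL_PERCENT, VAL_PERIOD, VAL_COUNT };

/*
 * Illustration only: format one callchain value into bf according to
 * the requested kind.  Returns what snprintf() returns.
 */
static int format_value(char *bf, size_t size, enum value_kind kind,
			uint64_t period, unsigned int count, uint64_t total)
{
	double percent = 0.0;

	/* avoid dividing by zero when there are no samples at all */
	if (total)
		percent = period * 100.0 / total;

	switch (kind) {
	case VAL_PERIOD:
		return snprintf(bf, size, "%" PRIu64, period);
	case VAL_COUNT:
		return snprintf(bf, size, "%u", count);
	case VAL_PERCENT:
	default:
		return snprintf(bf, size, "%.2f%%", percent);
	}
}
```

Percent stays the fallback case so that an unset or unknown kind keeps
the behavior users already rely on.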

  $ perf report --no-children --stdio -g percent
  ...
    39.93%  swapper  [kernel.vmlinux]  [k] intel_idle
            |
            ---intel_idle
               cpuidle_enter_state
               cpuidle_enter
               call_cpuidle
               cpu_startup_entry
               |
               |--28.63%-- start_secondary
               |
                --11.30%-- rest_init

  $ perf report --no-children --show-total-period --stdio -g period
  ...
    39.93%   13018705  swapper  [kernel.vmlinux]  [k] intel_idle
            |
            ---intel_idle
               cpuidle_enter_state
               cpuidle_enter
               call_cpuidle
               cpu_startup_entry
               |
               |--9334403-- start_secondary
               |
                --3684302-- rest_init

  $ perf report --no-children --show-nr-samples --stdio -g count
  ...
    39.93%     80  swapper  [kernel.vmlinux]  [k] intel_idle
            |
            ---intel_idle
               cpuidle_enter_state
               cpuidle_enter
               call_cpuidle
               cpu_startup_entry
               |
               |--57-- start_secondary
               |
                --23-- rest_init

Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-report.txt | 13 ++++---
 tools/perf/builtin-report.c              |  4 +--
 tools/perf/ui/stdio/hist.c               |  8 +++++
 tools/perf/util/callchain.c              | 60 +++++++++++++++++++++++++++-----
 tools/perf/util/callchain.h              | 10 +++++-
 tools/perf/util/util.c                   |  3 +-
 6 files changed, 82 insertions(+), 16 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 5ce8da1e1256..bb9fd23a105e 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -170,11 +170,11 @@ OPTIONS
         Dump raw trace in ASCII.
 
 -g::
---call-graph=<print_type,threshold[,print_limit],order,sort_key,branch>::
+--call-graph=<print_type,threshold[,print_limit],order,sort_key[,branch],value>::
         Display call chains using type, min percent threshold, print limit,
-	call order, sort key and branch.  Note that ordering of parameters is not
-	fixed so any parement can be given in an arbitraty order.  One exception
-	is the print_limit which should be preceded by threshold.
+	call order, sort key, optional branch and value.  Note that ordering of
+	parameters is not fixed so any parameter can be given in an arbitrary order.
+	One exception is the print_limit which should be preceded by threshold.
 
 	print_type can be either:
 	- flat: single column, linear exposure of call chains.
@@ -204,6 +204,11 @@ OPTIONS
 	- branch: include last branch information in callgraph when available.
 	          Usually more convenient to use --branch-history for this.
 
+	value can be:
+	- percent: display overhead percent (default)
+	- period: display event period
+	- count: display event count
+
 --children::
 	Accumulate callchain of children to parent entry so that then can
 	show up in the output.  The output will have a new "Children" column
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 2853ad2bd435..3dd4bb4ded1a 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -625,7 +625,7 @@ parse_percent_limit(const struct option *opt, const char *str,
 	return 0;
 }
 
-#define CALLCHAIN_DEFAULT_OPT  "graph,0.5,caller,function"
+#define CALLCHAIN_DEFAULT_OPT  "graph,0.5,caller,function,percent"
 
 const char report_callchain_help[] = "Display call graph (stack chain/backtrace):\n\n"
 				     CALLCHAIN_REPORT_HELP
@@ -708,7 +708,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_BOOLEAN('x', "exclude-other", &symbol_conf.exclude_other,
 		    "Only display entries with parent-match"),
 	OPT_CALLBACK_DEFAULT('g', "call-graph", &report,
-			     "print_type,threshold[,print_limit],order,sort_key[,branch]",
+			     "print_type,threshold[,print_limit],order,sort_key[,branch],value",
 			     report_callchain_help, &report_parse_callchain_opt,
 			     callchain_default_opt),
 	OPT_BOOLEAN(0, "children", &symbol_conf.cumulate_callchain,
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index e84ca21252d3..2104b09d41a8 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -88,6 +88,7 @@ static size_t __callchain__fprintf_graph(FILE *fp, struct rb_root *root,
 	size_t ret = 0;
 	int i;
 	uint entries_printed = 0;
+	int cumul_count = 0;
 
 	remaining = total_samples;
 
@@ -99,6 +100,7 @@ static size_t __callchain__fprintf_graph(FILE *fp, struct rb_root *root,
 		child = rb_entry(node, struct callchain_node, rb_node);
 		cumul = callchain_cumul_hits(child);
 		remaining -= cumul;
+		cumul_count += callchain_cumul_counts(child);
 
 		/*
 		 * The depth mask manages the output of pipes that show
@@ -148,6 +150,12 @@ static size_t __callchain__fprintf_graph(FILE *fp, struct rb_root *root,
 		if (!rem_sq_bracket)
 			return ret;
 
+		if (callchain_param.value == CCVAL_COUNT) {
+			rem_node.count = child->parent->children_count - cumul_count;
+			if (rem_node.count <= 0)
+				return ret;
+		}
+
 		new_depth_mask &= ~(1 << (depth - 1));
 		ret += ipchain__fprintf_graph(fp, &rem_node, &rem_hits, depth,
 					      new_depth_mask, 0, total_samples,
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 0a97d77509bd..7f0a89584f1b 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -83,6 +83,23 @@ static int parse_callchain_sort_key(const char *value)
 	return -1;
 }
 
+static int parse_callchain_value(const char *value)
+{
+	if (!strncmp(value, "percent", strlen(value))) {
+		callchain_param.value = CCVAL_PERCENT;
+		return 0;
+	}
+	if (!strncmp(value, "period", strlen(value))) {
+		callchain_param.value = CCVAL_PERIOD;
+		return 0;
+	}
+	if (!strncmp(value, "count", strlen(value))) {
+		callchain_param.value = CCVAL_COUNT;
+		return 0;
+	}
+	return -1;
+}
+
 static int
 __parse_callchain_report_opt(const char *arg, bool allow_record_opt)
 {
@@ -106,7 +123,8 @@ __parse_callchain_report_opt(const char *arg, bool allow_record_opt)
 
 		if (!parse_callchain_mode(tok) ||
 		    !parse_callchain_order(tok) ||
-		    !parse_callchain_sort_key(tok)) {
+		    !parse_callchain_sort_key(tok) ||
+		    !parse_callchain_value(tok)) {
 			/* parsing ok - move on to the next */
 			try_stack_size = false;
 			goto next;
@@ -819,12 +837,26 @@ char *callchain_node__sprintf_value(struct callchain_node *node,
 				    char *bf, size_t bfsize, u64 total)
 {
 	double percent = 0.0;
-	u64 cumul = callchain_cumul_hits(node);
+	u64 period = callchain_cumul_hits(node);
+	int count = callchain_cumul_counts(node);
 
 	if (total)
-		percent = cumul * 100.0 / total;
+		percent = period * 100.0 / total;
+	if (callchain_param.mode == CHAIN_FOLDED)
+		count = node->count;
 
-	scnprintf(bf, bfsize, "%6.2f%%", percent);
+	switch (callchain_param.value) {
+	case CCVAL_PERIOD:
+		scnprintf(bf, bfsize, "%"PRIu64, period);
+		break;
+	case CCVAL_COUNT:
+		scnprintf(bf, bfsize, "%d", count);
+		break;
+	case CCVAL_PERCENT:
+	default:
+		scnprintf(bf, bfsize, "%.2f%%", percent);
+		break;
+	}
 	return bf;
 }
 
@@ -832,12 +864,24 @@ int callchain_node__fprintf_value(struct callchain_node *node,
 				 FILE *fp, u64 total)
 {
 	double percent = 0.0;
-	u64 cumul = callchain_cumul_hits(node);
+	u64 period = callchain_cumul_hits(node);
+	int count = callchain_cumul_counts(node);
 
 	if (total)
-		percent = cumul * 100.0 / total;
-
-	return percent_color_fprintf(fp, "%.2f%%", percent);
+		percent = period * 100.0 / total;
+	if (callchain_param.mode == CHAIN_FOLDED)
+		count = node->count;
+
+	switch (callchain_param.value) {
+	case CCVAL_PERIOD:
+		return fprintf(fp, "%"PRIu64, period);
+	case CCVAL_COUNT:
+		return fprintf(fp, "%d", count);
+	case CCVAL_PERCENT:
+	default:
+		return percent_color_fprintf(fp, "%.2f%%", percent);
+	}
+	return 0;
 }
 
 static void free_callchain_node(struct callchain_node *node)
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 2f948f0ff034..e8533e328a47 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -29,7 +29,8 @@
 	HELP_PAD "print_limit:\tmaximum number of call graph entry (<number>)\n" \
 	HELP_PAD "order:\t\tcall graph order (caller|callee)\n" \
 	HELP_PAD "sort_key:\tcall graph sort key (function|address)\n"	\
-	HELP_PAD "branch:\t\tinclude last branch info to call graph (branch)\n"
+	HELP_PAD "branch:\t\tinclude last branch info to call graph (branch)\n" \
+	HELP_PAD "value:\t\tcall graph value (percent|period|count)\n"
 
 enum perf_call_graph_mode {
 	CALLCHAIN_NONE,
@@ -81,6 +82,12 @@ enum chain_key {
 	CCKEY_ADDRESS
 };
 
+enum chain_value {
+	CCVAL_PERCENT,
+	CCVAL_PERIOD,
+	CCVAL_COUNT,
+};
+
 struct callchain_param {
 	bool			enabled;
 	enum perf_call_graph_mode record_mode;
@@ -93,6 +100,7 @@ struct callchain_param {
 	bool			order_set;
 	enum chain_key		key;
 	bool			branch_callstack;
+	enum chain_value	value;
 };
 
 extern struct callchain_param callchain_param;
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index cd12c25e4ea4..174912f87913 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -20,7 +20,8 @@ struct callchain_param	callchain_param = {
 	.mode	= CHAIN_GRAPH_ABS,
 	.min_percent = 0.5,
 	.order  = ORDER_CALLEE,
-	.key	= CCKEY_FUNCTION
+	.key	= CCKEY_FUNCTION,
+	.value	= CCVAL_PERCENT,
 };
 
 /*
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)
  2015-11-02 12:57 [RFC/PATCH 0/4] perf report: Support folded callchain output (v2) Namhyung Kim
                   ` (3 preceding siblings ...)
  2015-11-02 12:57 ` [RFC/PATCH v2 4/4] perf report: Add callchain value option Namhyung Kim
@ 2015-11-02 20:37 ` Brendan Gregg
  2015-11-02 21:30   ` Arnaldo Carvalho de Melo
  4 siblings, 1 reply; 17+ messages in thread
From: Brendan Gregg @ 2015-11-02 20:37 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

G'Day Namhyung,

On Mon, Nov 2, 2015 at 4:57 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> Hello,
>
> This is what Brendan requested on the perf-users mailing list [1] to
> support FlameGraphs [2] more efficiently.  This patchset adds a few
> more callchain options to adjust the output for it.
>
> At first, 'folded' output mode was added.  The folded output puts all
> calchain nodes in a line separated by semicolons, a space and the
> value.  Now it only supports --stdio as other UI provides some way of
> folding/expanding callchains dynamically.
>
> The value is now can be one of 'percent', 'period', or 'count'.  The
> percent is current default output and the period is the raw number of
> sample periods.  The count is the number of samples for each callchain.
>
> Here's an example:
>
>   $ perf report --no-children --show-nr-samples --stdio -g folded,count
>   ...
>     39.93%     80  swapper  [kernel.vmlinux]  [k] intel_idel
>   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
>   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23

Thanks!

So for the folded output I don't need the summary line (the row of
columns printed by hist_entry__snprintf()), and don't need anything
except folded stacks and the counts. If working with the existing
stdio interface is making it harder than it needs to be, might it be
easier to make it a separate interface (ui/folded), that just emitted
the folded output? Just an idea. This existing patchset is working for
me, I'd just be filtering the output.
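
For illustration, the filtering mentioned above could look roughly like the following. This is a minimal sketch, not part of the patchset: the helper name folded_line_count() is hypothetical, and it assumes that symbol names contain no spaces and that a folded line ends with a decimal count after the last space.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of perf): decide whether one line of
 * 'perf report --stdio -g folded,count' output is a folded callchain,
 * i.e. "<sym>;<sym>;... <count>", and extract the trailing count.
 * Assumes symbol names contain no spaces.
 */
static bool folded_line_count(const char *line, unsigned long long *count)
{
	const char *start, *sep, *p;

	assert(line != NULL && count != NULL);

	/* Skip indentation. */
	start = line;
	while (*start == ' ' || *start == '\t')
		start++;

	/* The value is whatever follows the last space. */
	sep = strrchr(start, ' ');
	if (sep == NULL || sep == start)
		return false;

	/* The stack part must be a single token without blanks. */
	if (memchr(start, ' ', (size_t)(sep - start)) != NULL)
		return false;

	/* The value must be digits only (rejects "39.93%" and symbols). */
	if (!isdigit((unsigned char)sep[1]))
		return false;
	for (p = sep + 1; *p != '\0' && *p != '\n'; p++)
		if (!isdigit((unsigned char)*p))
			return false;

	*count = strtoull(sep + 1, NULL, 10);
	return true;
}
```

With this heuristic the hist entry line is dropped because it contains several blank-separated columns, while the callchain lines pass through with their count.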

Having the option for percentages and periods is nice. I can envisage
using periods (for latency flame graphs).

Brendan


* Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)
  2015-11-02 20:37 ` [RFC/PATCH 0/4] perf report: Support folded callchain output (v2) Brendan Gregg
@ 2015-11-02 21:30   ` Arnaldo Carvalho de Melo
  2015-11-02 22:12     ` Namhyung Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-11-02 21:30 UTC (permalink / raw)
  To: Brendan Gregg
  Cc: Namhyung Kim, Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

Em Mon, Nov 02, 2015 at 12:37:28PM -0800, Brendan Gregg escreveu:
> G'Day Namhyung,
> 
> On Mon, Nov 2, 2015 at 4:57 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> > Hello,
> >
> > This is what Brendan requested on the perf-users mailing list [1] to
> > support FlameGraphs [2] more efficiently.  This patchset adds a few
> > more callchain options to adjust the output for it.
> >
> > At first, 'folded' output mode was added.  The folded output puts all
> > calchain nodes in a line separated by semicolons, a space and the
> > value.  Now it only supports --stdio as other UI provides some way of
> > folding/expanding callchains dynamically.
> >
> > The value is now can be one of 'percent', 'period', or 'count'.  The
> > percent is current default output and the period is the raw number of
> > sample periods.  The count is the number of samples for each callchain.
> >
> > Here's an example:
> >
> >   $ perf report --no-children --show-nr-samples --stdio -g folded,count
> >   ...
> >     39.93%     80  swapper  [kernel.vmlinux]  [k] intel_idel
> >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23
> 
> Thanks!
> 
> So for the folded output I don't need the summary line (the row of
> columns printed by hist_entry__snprintf()), and don't need anything
> except folded stacks and the counts. If working with the existing
> stdio interface is making it harder than it needs to be, might it be

I don't think so, just add some flag asking for that
hist_entry__snprintf() to be suppressed, ideas for a long option name?

Having it as Namhyung did may have value for some people as a more
compact way to show the callchains together with the hist_entry line.

With this in mind, do you have any other issues with Namhyung's
patchkit? An acked-by/tested-by you would be nice to have, and then we
could work out the new option to suppress that hist_entry__snprintf()
in a follow up patch.

> easier to make it a separate interface (ui/folded), that just emitted
> the folded output? Just an idea. This existing patchset is working for
> me, I'd just be filtering the output.
> 
> Having the option for percentages and periods is nice. I can envisage
> using periods (for latency flame graphs).

You mean in the callchain lines?

- Arnaldo


* Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)
  2015-11-02 21:30   ` Arnaldo Carvalho de Melo
@ 2015-11-02 22:12     ` Namhyung Kim
  2015-11-02 22:28       ` Arnaldo Carvalho de Melo
  2015-11-02 22:43       ` Brendan Gregg
  0 siblings, 2 replies; 17+ messages in thread
From: Namhyung Kim @ 2015-11-02 22:12 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Brendan Gregg, Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

Hi Arnaldo,

On Mon, Nov 02, 2015 at 06:30:21PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Nov 02, 2015 at 12:37:28PM -0800, Brendan Gregg escreveu:
> > G'Day Namhyung,
> > 
> > On Mon, Nov 2, 2015 at 4:57 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> > > Hello,
> > >
> > > This is what Brendan requested on the perf-users mailing list [1] to
> > > support FlameGraphs [2] more efficiently.  This patchset adds a few
> > > more callchain options to adjust the output for it.
> > >
> > > At first, 'folded' output mode was added.  The folded output puts all
> > > calchain nodes in a line separated by semicolons, a space and the
> > > value.  Now it only supports --stdio as other UI provides some way of
> > > folding/expanding callchains dynamically.
> > >
> > > The value is now can be one of 'percent', 'period', or 'count'.  The
> > > percent is current default output and the period is the raw number of
> > > sample periods.  The count is the number of samples for each callchain.
> > >
> > > Here's an example:
> > >
> > >   $ perf report --no-children --show-nr-samples --stdio -g folded,count
> > >   ...
> > >     39.93%     80  swapper  [kernel.vmlinux]  [k] intel_idel
> > >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> > >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23
> > 
> > Thanks!
> > 
> > So for the folded output I don't need the summary line (the row of
> > columns printed by hist_entry__snprintf()), and don't need anything
> > except folded stacks and the counts. If working with the existing
> > stdio interface is making it harder than it needs to be, might it be
> 
> I don't think it so, just add some flag asking for that
> hist_entry__snprintf() to be supressed, ideas for a long option name?
> 
> Having it as Namhyung did may have value for some people as a more
> compact way to show the callchains together with the hist_entry line.

Yeah, I'd keep the hist entry line unless it's too hard to
parse/filter.  IMHO it's just a way to show callchains, so no need to
have separate output mode..

Brendan, I guess you still need to know other info like cpu or pid, no?

And I feel like it'd be better to put the count before the callchains
for consistency like below.  Is that OK with you?

  $ perf report --no-children --show-nr-samples --stdio -g folded,count
  ...
    39.93%     80  swapper  [kernel.vmlinux]  [k] intel_idle
  57 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
  23 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;...


> 
> With this in mind, do you have any other issues with Namhyung's
> patchkit? An acked-by/tested-by you would be nice to have, and then we
> could work out the new option to suppress that hist_entry__snprintf()
> in a follow up patch.
> 
> > easier to make it a separate interface (ui/folded), that just emitted
> > the folded output? Just an idea. This existing patchset is working for
> > me, I'd just be filtering the output.
> > 
> > Having the option for percentages and periods is nice. I can envisage
> > using periods (for latency flame graphs).

Glad to see you like it. :)

Thanks,
Namhyung


* Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)
  2015-11-02 22:12     ` Namhyung Kim
@ 2015-11-02 22:28       ` Arnaldo Carvalho de Melo
  2015-11-02 22:49         ` Namhyung Kim
  2015-11-02 22:43       ` Brendan Gregg
  1 sibling, 1 reply; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-11-02 22:28 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Brendan Gregg, Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

Hi Namhyung,

Em Tue, Nov 03, 2015 at 07:12:04AM +0900, Namhyung Kim escreveu:
> On Mon, Nov 02, 2015 at 06:30:21PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Mon, Nov 02, 2015 at 12:37:28PM -0800, Brendan Gregg escreveu:
> > > On Mon, Nov 2, 2015 at 4:57 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> > > > This is what Brendan requested on the perf-users mailing list [1] to
> > > > support FlameGraphs [2] more efficiently.  This patchset adds a few
> > > > more callchain options to adjust the output for it.

> > > > At first, 'folded' output mode was added.  The folded output puts all
> > > > calchain nodes in a line separated by semicolons, a space and the
> > > > value.  Now it only supports --stdio as other UI provides some way of
> > > > folding/expanding callchains dynamically.

> > > > The value is now can be one of 'percent', 'period', or 'count'.  The
> > > > percent is current default output and the period is the raw number of
> > > > sample periods.  The count is the number of samples for each callchain.

> > > > Here's an example:

> > > >   $ perf report --no-children --show-nr-samples --stdio -g folded,count
> > > >   ...
> > > >     39.93%     80  swapper  [kernel.vmlinux]  [k] intel_idel
> > > >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> > > >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23

> > > So for the folded output I don't need the summary line (the row of
> > > columns printed by hist_entry__snprintf()), and don't need anything
> > > except folded stacks and the counts. If working with the existing
> > > stdio interface is making it harder than it needs to be, might it be

> > I don't think it so, just add some flag asking for that
> > hist_entry__snprintf() to be supressed, ideas for a long option name?

> > Having it as Namhyung did may have value for some people as a more
> > compact way to show the callchains together with the hist_entry line.

> Yeah, I'd keep the hist entry line unless it's too hard to
> parse/filter.  IMHO it's just a way to show callchains, so no need to

What I suggested was to have something like:

  $ perf report --no-children --no-hists --stdio -g folded,count
                              ^^^^^^^^^^
                              ^^^^^^^^^^
  ...
  intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
  intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23

I.e. the first entry in the callchain is 'intel_idle', just like in what
Brendan called the 'summary line', i.e. redundant when what he wants is
just all the callchains and how many times they were sampled.

> have separate output mode..
 
> Brendan, I guess you still need to know other info like cpu or pid, no?

Possibly, but just with the callchains he has enough info for the basic
flame graph, no?
 
> And I feel like it'd be better to put the count before the callchains
> for consistency like below.  Is it OK to you?

Consistency with what?

The main thing here is the callchain, all the other stuff are things
related to it, so showing it first makes sense to me.

Having some way to list the desired info to have for each callchain may
be interesting, and if he could do it like:

   -g folded,count,cpu,other,fields

then he would know how to parse the per-callchain info at the end of
each line, right?
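
A script-side view of that idea, as a rough sketch only: the line shape "<stack> <field>,<field>,..." is a proposal in this thread and not existing perf output, and split_callchain_fields() is a hypothetical name. Note that strtok() skips empty fields, so a real consumer that needs them would have to split differently.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical parser for a line shaped like
 *   "<sym>;<sym>;... <field>,<field>,..."
 * Modifies 'line' in place.  Returns the number of fields stored in
 * 'fields' (at most 'max'), or -1 if there is no field list.
 */
static int split_callchain_fields(char *line, char **stack,
				  char *fields[], int max)
{
	char *sep, *tok;
	int n = 0;

	assert(line != NULL && stack != NULL && fields != NULL);

	/* Drop a trailing newline, if any. */
	line[strcspn(line, "\n")] = '\0';

	/* Everything after the last space is the field list. */
	sep = strrchr(line, ' ');
	if (sep == NULL || sep[1] == '\0')
		return -1;
	*sep = '\0';
	*stack = line;

	/* Split the field list on commas (empty fields are skipped). */
	for (tok = strtok(sep + 1, ","); tok != NULL && n < max;
	     tok = strtok(NULL, ","))
		fields[n++] = tok;

	return n;
}
```

Because the callchain comes first and the fields come last, the consumer only needs the order of the names given to -g to know which trailing field is which.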

- Arnaldo

> 
>   $ perf report --no-children --show-nr-samples --stdio -g folded,count
>   ...
>     39.93%     80  swapper  [kernel.vmlinux]  [k] intel_idel
>   57 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
>   23 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;...
> 
> 
> > 
> > With this in mind, do you have any other issues with Namhyung's
> > patchkit? An acked-by/tested-by you would be nice to have, and then we
> > could work out the new option to suppress that hist_entry__snprintf()
> > in a follow up patch.
> > 
> > > easier to make it a separate interface (ui/folded), that just emitted
> > > the folded output? Just an idea. This existing patchset is working for
> > > me, I'd just be filtering the output.
> > > 
> > > Having the option for percentages and periods is nice. I can envisage
> > > using periods (for latency flame graphs).
> 
> Glad to see you like it. :)
> 
> Thanks,
> Namhyung


* Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)
  2015-11-02 22:12     ` Namhyung Kim
  2015-11-02 22:28       ` Arnaldo Carvalho de Melo
@ 2015-11-02 22:43       ` Brendan Gregg
  1 sibling, 0 replies; 17+ messages in thread
From: Brendan Gregg @ 2015-11-02 22:43 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

On Mon, Nov 2, 2015 at 2:12 PM, Namhyung Kim <namhyung@kernel.org> wrote:
> Hi Arnaldo,
>
> On Mon, Nov 02, 2015 at 06:30:21PM -0300, Arnaldo Carvalho de Melo wrote:
>> Em Mon, Nov 02, 2015 at 12:37:28PM -0800, Brendan Gregg escreveu:
>> > G'Day Namhyung,
>> >
>> > On Mon, Nov 2, 2015 at 4:57 AM, Namhyung Kim <namhyung@kernel.org> wrote:
>> > > Hello,
>> > >
>> > > This is what Brendan requested on the perf-users mailing list [1] to
>> > > support FlameGraphs [2] more efficiently.  This patchset adds a few
>> > > more callchain options to adjust the output for it.
>> > >
>> > > At first, 'folded' output mode was added.  The folded output puts all
>> > > calchain nodes in a line separated by semicolons, a space and the
>> > > value.  Now it only supports --stdio as other UI provides some way of
>> > > folding/expanding callchains dynamically.
>> > >
>> > > The value is now can be one of 'percent', 'period', or 'count'.  The
>> > > percent is current default output and the period is the raw number of
>> > > sample periods.  The count is the number of samples for each callchain.
>> > >
>> > > Here's an example:
>> > >
>> > >   $ perf report --no-children --show-nr-samples --stdio -g folded,count
>> > >   ...
>> > >     39.93%     80  swapper  [kernel.vmlinux]  [k] intel_idel
>> > >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
>> > >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23
>> >
>> > Thanks!
>> >
>> > So for the folded output I don't need the summary line (the row of
>> > columns printed by hist_entry__snprintf()), and don't need anything
>> > except folded stacks and the counts. If working with the existing
>> > stdio interface is making it harder than it needs to be, might it be
>>
>> I don't think it so, just add some flag asking for that
>> hist_entry__snprintf() to be supressed, ideas for a long option name?
>>
>> Having it as Namhyung did may have value for some people as a more
>> compact way to show the callchains together with the hist_entry line.
>
> Yeah, I'd keep the hist entry line unless it's too hard to
> parse/filter.  IMHO it's just a way to show callchains, so no need to
> have separate output mode..

Ok, good point, it can be thought of as a different stack representation format.

>
> Brendan, I guess you still need to know other info like cpu or pid, no?
>

Yes, I just realized that I either include the process name (Command
column) or name-PID, as the first folded element. Eg, output can be:

mkdir;getopt_long;page_fault;do_page_fault;__do_page_fault;filemap_map_pages 3

Or:

mkdir-21918;getopt_long;page_fault;do_page_fault;__do_page_fault;filemap_map_pages 2

Usually the first, but sometimes it's helpful to split on PID as well.

As for what to call such options (which may be a follow on patch
anyway) ... maybe something like:

"folded": fold stacks as single lines
"nameonly,folded": suppress summary line and include process name in
the folded stack
"pidonly,folded": suppress summary line and include process_name-PID
in the folded stack
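
To make the two line shapes above concrete, here is a minimal sketch of how such a line could be assembled. format_folded_entry() and its parameters are hypothetical names for illustration; nothing here is existing perf code, and the "nameonly"/"pidonly" options are only suggestions at this point.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical formatter (names are illustrative, not perf code):
 * prefix a folded stack with the command name, or with "comm-pid"
 * when pid >= 0, and append the sample count after a space.
 * Returns the snprintf() result, or -1 if 'comm' contains the ';'
 * separator and would corrupt the folded line.
 */
static int format_folded_entry(char *buf, size_t size, const char *comm,
			       int pid, const char *stack,
			       unsigned long count)
{
	assert(buf != NULL && comm != NULL && stack != NULL);

	if (strchr(comm, ';') != NULL)
		return -1;

	if (pid >= 0)
		return snprintf(buf, size, "%s-%d;%s %lu",
				comm, pid, stack, count);

	return snprintf(buf, size, "%s;%s %lu", comm, stack, count);
}
```

Passing a negative pid gives the name-only form; a non-negative pid gives the name-PID form.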

> And I feel like it'd be better to put the count before the callchains
> for consistency like below.  Is it OK to you?
>
>   $ perf report --no-children --show-nr-samples --stdio -g folded,count
>   ...
>     39.93%     80  swapper  [kernel.vmlinux]  [k] intel_idel
>   57 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
>   23 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;...
>

If it was printing with the perf report summary, sure, but if we have
a way to only emit folded output, then counts last would be perfect
and maybe a bit more intuitive (key then value).

>
>>
>> With this in mind, do you have any other issues with Namhyung's
>> patchkit? An acked-by/tested-by you would be nice to have, and then we
>> could work out the new option to suppress that hist_entry__snprintf()
>> in a follow up patch.

Acked and tested, yes.

Looks like I'd be using caller ordering, eg, to get lines like this:

__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;check_events;xen_hypercall_xen_version 91

Which I can do just by using "-g folded,count,caller".
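
For a single folded chain, caller ordering is just the callee-ordered line with its frames reversed. The sketch below shows that relationship on a string; reverse_folded() is a hypothetical helper for illustration, and perf itself does not work this way internally (it applies the order while building the callchain tree).

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: reverse the frame order of a ';'-separated
 * stack, e.g. "c;b;a" becomes "a;b;c".  Writes a NUL-terminated
 * result into 'out'.  Returns 0 on success, -1 if 'out' is too small.
 */
static int reverse_folded(const char *stack, char *out, size_t size)
{
	size_t len, end, pos = 0;

	assert(stack != NULL && out != NULL);

	len = strlen(stack);
	if (len + 1 > size)
		return -1;

	/* Walk backwards, copying one frame at a time. */
	end = len;
	while (end > 0) {
		size_t start = end;

		while (start > 0 && stack[start - 1] != ';')
			start--;

		memcpy(out + pos, stack + start, end - start);
		pos += end - start;

		if (start > 0) {
			out[pos++] = ';';
			end = start - 1;	/* step over the separator */
		} else {
			end = 0;
		}
	}
	out[pos] = '\0';
	return 0;
}
```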

>>
>> > easier to make it a separate interface (ui/folded), that just emitted
>> > the folded output? Just an idea. This existing patchset is working for
>> > me, I'd just be filtering the output.
>> >
>> > Having the option for percentages and periods is nice. I can envisage
>> > using periods (for latency flame graphs).
>
> Glad to see you like it. :)
>
> Thanks,
> Namhyung


* Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)
  2015-11-02 22:28       ` Arnaldo Carvalho de Melo
@ 2015-11-02 22:49         ` Namhyung Kim
  2015-11-02 23:04           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 17+ messages in thread
From: Namhyung Kim @ 2015-11-02 22:49 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Brendan Gregg, Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

On Mon, Nov 02, 2015 at 07:28:42PM -0300, Arnaldo Carvalho de Melo wrote:
> Hi Namhyung,
> 
> Em Tue, Nov 03, 2015 at 07:12:04AM +0900, Namhyung Kim escreveu:
> > On Mon, Nov 02, 2015 at 06:30:21PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Mon, Nov 02, 2015 at 12:37:28PM -0800, Brendan Gregg escreveu:
> > > > On Mon, Nov 2, 2015 at 4:57 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > This is what Brendan requested on the perf-users mailing list [1] to
> > > > > support FlameGraphs [2] more efficiently.  This patchset adds a few
> > > > > more callchain options to adjust the output for it.
> 
> > > > > At first, 'folded' output mode was added.  The folded output puts all
> > > > > calchain nodes in a line separated by semicolons, a space and the
> > > > > value.  Now it only supports --stdio as other UI provides some way of
> > > > > folding/expanding callchains dynamically.
> 
> > > > > The value is now can be one of 'percent', 'period', or 'count'.  The
> > > > > percent is current default output and the period is the raw number of
> > > > > sample periods.  The count is the number of samples for each callchain.
> 
> > > > > Here's an example:
> 
> > > > >   $ perf report --no-children --show-nr-samples --stdio -g folded,count
> > > > >   ...
> > > > >     39.93%     80  swapper  [kernel.vmlinux]  [k] intel_idel
> > > > >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> > > > >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23
> 
> > > > So for the folded output I don't need the summary line (the row of
> > > > columns printed by hist_entry__snprintf()), and don't need anything
> > > > except folded stacks and the counts. If working with the existing
> > > > stdio interface is making it harder than it needs to be, might it be
> 
> > > I don't think it so, just add some flag asking for that
> > > hist_entry__snprintf() to be supressed, ideas for a long option name?
> 
> > > Having it as Namhyung did may have value for some people as a more
> > > compact way to show the callchains together with the hist_entry line.
> 
> > Yeah, I'd keep the hist entry line unless it's too hard to
> > parse/filter.  IMHO it's just a way to show callchains, so no need to
> 
> What I suggested was to have something like:
> 
>   $ perf report --no-children --no-hists --stdio -g folded,count
>                               ^^^^^^^^^^
>                               ^^^^^^^^^^
>   ...
>   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
>   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23
> 
> I.e. the first entry in the callchain is 'intel_idle', just like in what
> Brendan called the 'summary line', i.e. reduntant when what he wants its
> just all the callchains and how many times they were sampled.

Yep, I know.  But isn't 'perf report' all for seeing hist lines? :)

I'm not insisting on it strongly, but it seems a bit strange to me if
perf report doesn't show any hist lines..


> 
> > have separate output mode..
>  
> > Brendan, I guess you still need to know other info like cpu or pid, no?
> 
> Possibly, but just with the callchains he has enough info for the basic
> flame graph, no?
>  
> > And I feel like it'd be better to put the count before the callchains
> > for consistency like below.  Is it OK to you?
> 
> Consistency with what?

Oh, I meant consistency with the other callchain output styles like
graph, fractal or flat - they all show the numbers before the
callchains.  And I think it's easier for humans to read. :)


> 
> The main thing here is the callchain, all the other stuff are things
> related to it, so showing it first makes sense to me.
> 
> Having some way to list the desired info to have for each callchain may
> be interesting, and if he could do it like:
> 
>    -g folded,count,cpu,other,fields
> 
> then he would know how to parse the per-callchain info at the end of
> each line, right?

Hmm.. it looks like that ends up having redundant info.  I don't think
it's generally useful for other 'perf report' use cases.  Wouldn't it
be better to just add minimal support and let the external tool parse
the output?

Thanks,
Namhyung


* Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)
  2015-11-02 22:49         ` Namhyung Kim
@ 2015-11-02 23:04           ` Arnaldo Carvalho de Melo
  2015-11-02 23:46             ` Namhyung Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-11-02 23:04 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Brendan Gregg, Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

Em Tue, Nov 03, 2015 at 07:49:27AM +0900, Namhyung Kim escreveu:
> On Mon, Nov 02, 2015 at 07:28:42PM -0300, Arnaldo Carvalho de Melo wrote:
> > Hi Namhyung,
> > 
> > Em Tue, Nov 03, 2015 at 07:12:04AM +0900, Namhyung Kim escreveu:
> > > On Mon, Nov 02, 2015 at 06:30:21PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > Em Mon, Nov 02, 2015 at 12:37:28PM -0800, Brendan Gregg escreveu:
> > > > > On Mon, Nov 2, 2015 at 4:57 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > > This is what Brendan requested on the perf-users mailing list [1] to
> > > > > > support FlameGraphs [2] more efficiently.  This patchset adds a few
> > > > > > more callchain options to adjust the output for it.
> > 
> > > > > > At first, 'folded' output mode was added.  The folded output puts all
> > > > > > calchain nodes in a line separated by semicolons, a space and the
> > > > > > value.  Now it only supports --stdio as other UI provides some way of
> > > > > > folding/expanding callchains dynamically.
> > 
> > > > > > The value is now can be one of 'percent', 'period', or 'count'.  The
> > > > > > percent is current default output and the period is the raw number of
> > > > > > sample periods.  The count is the number of samples for each callchain.
> > 
> > > > > > Here's an example:
> > 
> > > > > >   $ perf report --no-children --show-nr-samples --stdio -g folded,count
> > > > > >   ...
> > > > > >     39.93%     80  swapper  [kernel.vmlinux]  [k] intel_idel
> > > > > >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> > > > > >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23
> > 
> > > > > So for the folded output I don't need the summary line (the row of
> > > > > columns printed by hist_entry__snprintf()), and don't need anything
> > > > > except folded stacks and the counts. If working with the existing
> > > > > stdio interface is making it harder than it needs to be, might it be
> > 
> > > > I don't think it so, just add some flag asking for that
> > > > hist_entry__snprintf() to be supressed, ideas for a long option name?
> > 
> > > > Having it as Namhyung did may have value for some people as a more
> > > > compact way to show the callchains together with the hist_entry line.
> > 
> > > Yeah, I'd keep the hist entry line unless it's too hard to
> > > parse/filter.  IMHO it's just a way to show callchains, so no need to
> > 
> > What I suggested was to have something like:
> > 
> >   $ perf report --no-children --no-hists --stdio -g folded,count
> >                               ^^^^^^^^^^
> >                               ^^^^^^^^^^
> >   ...
> >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23
> > 
> > I.e. the first entry in the callchain is 'intel_idle', just like in what
> > Brendan called the 'summary line', i.e. reduntant when what he wants its
> > just all the callchains and how many times they were sampled.
> 
> Yep, I know.  But isn't 'perf report' all for seeing hist lines? :)

Well, so far, yes, but he is presenting a usecase where what we want to
see is just callchains, and we can achieve that rather easily, no?
 
> I'm not insisting it strongly, but it's a bit strange for me if perf
> report doesn't show any hist lines..

If that is of no use in this use case, why not?

> > > have separate output mode..
> >  
> > > Brendan, I guess you still need to know other info like cpu or pid, no?
> > 
> > Possibly, but just with the callchains he has enough info for the basic
> > flame graph, no?
> >  
> > > And I feel like it'd be better to put the count before the callchains
> > > for consistency like below.  Is it OK to you?
> > 
> > Consistency with what?
> 
> Oh, I meant consistency with other callchain output style like graph,
> fractal or flat - They all show the numbers before callchains.  And I
> think it's easier to read for human. :)

Well, as I said, isn't the main object here the callchain? :-)

And Brendan's request is for something to be consumed by scripts, i.e.
something like we have for perf stat:

For humans:

[root@felicio ~]# perf stat -e cycles -I 1000 -a
#           time             counts unit events
     1.000304391          1,820,038      cycles                   
     2.000490191      1,005,477,007      cycles                   
     3.000657813          1,717,007      cycles                   
^C     3.917890293          2,804,034      cycles                   

For machines/scripts:

[root@felicio ~]# perf stat -x, -e cycles -I 1000 -a
     1.000291954,1923360,,cycles,3998167210,100.00
     2.000477154,1005608105,,cycles,3998475482,100.00
     3.000612612,1345483,,cycles,3998332391,100.00
     4.000744469,1005046913,,cycles,3998258199,100.00
^C     4.331684347,1551327,,cycles,3463190970,100.00

[root@felicio ~]#

 
> > The main thing here is the callchain, all the other stuff are things
> > related to it, so showing it first makes sense to me.
> > 
> > Having some way to list the desired info to have for each callchain may
> > be interesting, and if he could do it like:
> > 
> >    -g folded,count,cpu,other,fields
> > 
> > then he would know how to parse the per-callchain info at the end of
> > each line, right?
> 
> Hmm.. looks like that it ends up having redundant info.  I don't think

What is redundant, and with what?

> it's generally useful to other 'perf report' stuffs.  Wouldn't it be
> better just adding minimal support and let the external tool parse the
> output?

Oh well, perhaps we could have a 'perf callchain' tool that would be
centered on callchains and would provide one line per callchain, which
would have:

callchain;separated;by;semicolons series,of,desired,fields,for,this,callchain

Which would reuse heavily the 'perf report' / 'perf top' code for
histograms, no?

I still think that this is a 'perf report' thing, but one that is
centered on callchains, and that is to be consumed by scripts, not
humans.

- Arnaldo


* Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)
  2015-11-02 23:04           ` Arnaldo Carvalho de Melo
@ 2015-11-02 23:46             ` Namhyung Kim
  2015-11-03  0:46               ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 17+ messages in thread
From: Namhyung Kim @ 2015-11-02 23:46 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Brendan Gregg, Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

On Mon, Nov 02, 2015 at 08:04:36PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Nov 03, 2015 at 07:49:27AM +0900, Namhyung Kim escreveu:
> > On Mon, Nov 02, 2015 at 07:28:42PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Hi Namhyung,
> > > 
> > > Em Tue, Nov 03, 2015 at 07:12:04AM +0900, Namhyung Kim escreveu:
> > > > On Mon, Nov 02, 2015 at 06:30:21PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > Em Mon, Nov 02, 2015 at 12:37:28PM -0800, Brendan Gregg escreveu:
> > > > > > On Mon, Nov 2, 2015 at 4:57 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > > > This is what Brendan requested on the perf-users mailing list [1] to
> > > > > > > support FlameGraphs [2] more efficiently.  This patchset adds a few
> > > > > > > more callchain options to adjust the output for it.
> > > 
> > > > > > > At first, 'folded' output mode was added.  The folded output puts all
> > > > > > > callchain nodes in a line separated by semicolons, a space and the
> > > > > > > value.  Now it only supports --stdio as other UI provides some way of
> > > > > > > folding/expanding callchains dynamically.
> > > 
> > > > > > > The value can now be one of 'percent', 'period', or 'count'.  The
> > > > > > > percent is current default output and the period is the raw number of
> > > > > > > sample periods.  The count is the number of samples for each callchain.
> > > 
> > > > > > > Here's an example:
> > > 
> > > > > > >   $ perf report --no-children --show-nr-samples --stdio -g folded,count
> > > > > > >   ...
> > > > > > >     39.93%     80  swapper  [kernel.vmlinux]  [k] intel_idle
> > > > > > >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> > > > > > >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23
> > > 
> > > > > > So for the folded output I don't need the summary line (the row of
> > > > > > columns printed by hist_entry__snprintf()), and don't need anything
> > > > > > except folded stacks and the counts. If working with the existing
> > > > > > stdio interface is making it harder than it needs to be, might it be
> > > 
> > > > > I don't think so, just add some flag asking for that
> > > > > hist_entry__snprintf() to be suppressed, ideas for a long option name?
> > > 
> > > > > Having it as Namhyung did may have value for some people as a more
> > > > > compact way to show the callchains together with the hist_entry line.
> > > 
> > > > Yeah, I'd keep the hist entry line unless it's too hard to
> > > > parse/filter.  IMHO it's just a way to show callchains, so no need to
> > > 
> > > What I suggested was to have something like:
> > > 
> > >   $ perf report --no-children --no-hists --stdio -g folded,count
> > >                               ^^^^^^^^^^
> > >                               ^^^^^^^^^^
> > >   ...
> > >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> > >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23
> > > 
> > > I.e. the first entry in the callchain is 'intel_idle', just like in what
> > > Brendan called the 'summary line', i.e. redundant when what he wants is
> > > just all the callchains and how many times they were sampled.
> > 
> > Yep, I know.  But isn't 'perf report' all for seeing hist lines? :)
> 
> Well, so far, yes, but he is presenting a usecase where what we want to
> see is just callchains, and we can achieve that rather easily, no?

But it's also easy to filter from the script side.
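
As a rough illustration of filtering on the script side, the sketch below
skips the hist entry lines and keeps only the folded stacks with their counts.
It assumes the layout shown in the example earlier in this thread (hist lines
start with a percentage column, folded lines end with a space and an integer);
the function name is made up for this sketch.

```python
SAMPLE = """\
  ...
    39.93%     80  swapper  [kernel.vmlinux]  [k] intel_idle
  intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
  intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23
"""


def parse_folded_report(lines):
    """Return {tuple_of_frames: count}, ignoring hist entry lines."""
    stacks = {}
    for raw in lines:
        line = raw.strip()
        # Skip blanks, comment headers and the elision marker.
        if not line or line.startswith("#") or line == "...":
            continue
        # Hist entry lines begin with the overhead column, e.g. "39.93%".
        if line.split()[0].endswith("%"):
            continue
        # Folded lines: everything before the last space is the stack.
        chain, _, count = line.rpartition(" ")
        if not chain or not count.isdigit():
            continue
        frames = tuple(chain.split(";"))
        stacks[frames] = stacks.get(frames, 0) + int(count)
    return stacks


stacks = parse_folded_report(SAMPLE.splitlines())
```

Splitting on the last space, rather than on every space, keeps symbol names
that themselves contain spaces in one piece.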


>  
> > I'm not insisting it strongly, but it's a bit strange for me if perf
> > report doesn't show any hist lines..
> 
> If that is of no use in this use case, why not?

Well, I think FlameGraphs are a rather unusual case, and the folded
output seems useful for other use cases too.


> 
> > > > have separate output mode..
> > >  
> > > > Brendan, I guess you still need to know other info like cpu or pid, no?
> > > 
> > > Possibly, but just with the callchains he has enough info for the basic
> > > flame graph, no?
> > >  
> > > > And I feel like it'd be better to put the count before the callchains
> > > > for consistency like below.  Is it OK to you?
> > > 
> > > Consistency with what?
> > 
> > Oh, I meant consistency with other callchain output style like graph,
> > fractal or flat - They all show the numbers before callchains.  And I
> > think it's easier to read for human. :)
> 
> Well, As I said, isn't the main object here the callchain? :-)
> 
> And Brendan's request is for a something to be consumed by scripts, i.e.
> something like we have for perf stat:
> 
> For humans:
> 
> [root@felicio ~]# perf stat -e cycles -I 1000 -a
> #           time             counts unit events
>      1.000304391          1,820,038      cycles                   
>      2.000490191      1,005,477,007      cycles                   
>      3.000657813          1,717,007      cycles                   
> ^C     3.917890293          2,804,034      cycles                   
> 
> For machines/scripts:
> 
> [root@felicio ~]# perf stat -x, -e cycles -I 1000 -a
>      1.000291954,1923360,,cycles,3998167210,100.00
>      2.000477154,1005608105,,cycles,3998475482,100.00
>      3.000612612,1345483,,cycles,3998332391,100.00
>      4.000744469,1005046913,,cycles,3998258199,100.00
> ^C     4.331684347,1551327,,cycles,3463190970,100.00
> 
> [root@felicio ~]#

Yes, I thought about it too.  Maybe -t/--field-separator option can be
used to separate folded callchains too.

> 
>  
> > > The main thing here is the callchain, all the other stuff are things
> > > related to it, so showing it first makes sense to me.
> > > 
> > > Having some way to list the desired info to have for each callchain may
> > > be interesting, and if he could do it like:
> > > 
> > >    -g folded,count,cpu,other,fields
> > > 
> > > then he would know how to parse the per-callchain info at the end of
> > > each line, right?
> > 
> > Hmm.. looks like that it ends up having redundant info.  I don't think
> 
> What is redundant, and with what?

In normal perf report use, the other info in the callchain lines is
redundant with the hist lines.  Also, if a hist entry has many
callchains, every callchain line will carry the same info in the other
fields.


> 
> > it's generally useful to other 'perf report' stuffs.  Wouldn't it be
> > better just adding minimal support and let the external tool parse the
> > output?
> 
> Oh well, perhaps we could have a 'perf callchain' tool that would be
> centered on callchains and would provide one line per callchain, which
> would have:
> 
> callchain;separated;by;semicolons series,of,desired,fields,for,this,callchain
> 
> Which would reuse heavily the 'perf report' / 'perf top' code for
> histograms, no?

I guess the callchain code is pretty isolated or can be isolated
easily though.


> 
> I still think that this is a 'perf report' thing, but one that is
> centered in callchains, and that is to be consumed by scripts, not
> humans.

Agreed.

I'm just looking for a way to support it with minimal change. :)

Thanks,
Namhyung


* Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)
  2015-11-02 23:46             ` Namhyung Kim
@ 2015-11-03  0:46               ` Arnaldo Carvalho de Melo
  2015-11-03  1:35                 ` Namhyung Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-11-03  0:46 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Brendan Gregg, Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

Em Tue, Nov 03, 2015 at 08:46:06AM +0900, Namhyung Kim escreveu:
> On Mon, Nov 02, 2015 at 08:04:36PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Nov 03, 2015 at 07:49:27AM +0900, Namhyung Kim escreveu:
> > > On Mon, Nov 02, 2015 at 07:28:42PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > Em Tue, Nov 03, 2015 at 07:12:04AM +0900, Namhyung Kim escreveu:
> > > > > On Mon, Nov 02, 2015 at 06:30:21PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > > Em Mon, Nov 02, 2015 at 12:37:28PM -0800, Brendan Gregg escreveu:
> > > > > > > On Mon, Nov 2, 2015 at 4:57 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > > > > This is what Brendan requested on the perf-users mailing list [1] to
> > > > > > > > support FlameGraphs [2] more efficiently.  This patchset adds a few
> > > > > > > > more callchain options to adjust the output for it.
> > > > 
> > > > > > > > At first, 'folded' output mode was added.  The folded output puts all
> > > > > > > > callchain nodes in a line separated by semicolons, a space and the
> > > > > > > > value.  Now it only supports --stdio as other UI provides some way of
> > > > > > > > folding/expanding callchains dynamically.
> > > > 
> > > > > > > > The value can now be one of 'percent', 'period', or 'count'.  The
> > > > > > > > percent is current default output and the period is the raw number of
> > > > > > > > sample periods.  The count is the number of samples for each callchain.
> > > > 
> > > > > > > > Here's an example:
> > > > 
> > > > > > > >   $ perf report --no-children --show-nr-samples --stdio -g folded,count
> > > > > > > >   ...
> > > > > > > >     39.93%     80  swapper  [kernel.vmlinux]  [k] intel_idle
> > > > > > > >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> > > > > > > >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23

> > > > > > > So for the folded output I don't need the summary line (the row of
> > > > > > > columns printed by hist_entry__snprintf()), and don't need anything
> > > > > > > except folded stacks and the counts. If working with the existing
> > > > > > > stdio interface is making it harder than it needs to be, might it be

> > > > > > I don't think so, just add some flag asking for that
> > > > > > hist_entry__snprintf() to be suppressed, ideas for a long option name?

> > > > > > Having it as Namhyung did may have value for some people as a more
> > > > > > compact way to show the callchains together with the hist_entry line.

> > > > > Yeah, I'd keep the hist entry line unless it's too hard to
> > > > > parse/filter.  IMHO it's just a way to show callchains, so no need to

> > > > What I suggested was to have something like:

> > > >   $ perf report --no-children --no-hists --stdio -g folded,count
> > > >                               ^^^^^^^^^^
> > > >                               ^^^^^^^^^^
> > > >   ...
> > > >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary 57
> > > >   intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;... 23
> > > > 
> > > > I.e. the first entry in the callchain is 'intel_idle', just like in what
> > > > Brendan called the 'summary line', i.e. redundant when what he wants is
> > > > just all the callchains and how many times they were sampled.

> > > Yep, I know.  But isn't 'perf report' all for seeing hist lines? :)

> > Well, so far, yes, but he is presenting a usecase where what we want to
> > see is just callchains, and we can achieve that rather easily, no?
 
> But it's also easy to filter from the script side.

Why not go all the way and provide just what the script wants?
 
> > > I'm not insisting it strongly, but it's a bit strange for me if perf
> > > report doesn't show any hist lines..
> > 
> > If that is of no use in this use case, why not?
> 
> Well, I think FlameGraphs is a rather unusual case and folded output
> seems useful to other use cases too.

Sure thing, I agree with that; it's just one flag to tell whether
hist_entry__snprintf() should be used or not.

> > > > > have separate output mode..
> > > >  
> > > > > Brendan, I guess you still need to know other info like cpu or pid, no?
> > > > 
> > > > Possibly, but just with the callchains he has enough info for the basic
> > > > flame graph, no?
> > > >  
> > > > > And I feel like it'd be better to put the count before the callchains
> > > > > for consistency like below.  Is it OK to you?
> > > > 
> > > > Consistency with what?
> > > 
> > > Oh, I meant consistency with other callchain output style like graph,
> > > fractal or flat - They all show the numbers before callchains.  And I
> > > think it's easier to read for human. :)
> > 
> > Well, As I said, isn't the main object here the callchain? :-)
> > 
> > And Brendan's request is for a something to be consumed by scripts, i.e.
> > something like we have for perf stat:
> > 
> > For humans:
> > 
> > [root@felicio ~]# perf stat -e cycles -I 1000 -a
> > #           time             counts unit events
> >      1.000304391          1,820,038      cycles                   
> >      2.000490191      1,005,477,007      cycles                   
> >      3.000657813          1,717,007      cycles                   
> > ^C     3.917890293          2,804,034      cycles                   
> > 
> > For machines/scripts:
> > 
> > [root@felicio ~]# perf stat -x, -e cycles -I 1000 -a
> >      1.000291954,1923360,,cycles,3998167210,100.00
> >      2.000477154,1005608105,,cycles,3998475482,100.00
> >      3.000612612,1345483,,cycles,3998332391,100.00
> >      4.000744469,1005046913,,cycles,3998258199,100.00
> > ^C     4.331684347,1551327,,cycles,3463190970,100.00
> > 
> > [root@felicio ~]#
 
> Yes, I thought about it too.  Maybe -t/--field-separator option can be
> used to separate folded callchains too.

What I meant here was: for humans, we don't want a field separator, but
we do want headers, alignment, etc., while for scripts it's better to
have something easily parseable, with one record per line and no
alignment needed.
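
To show why the 'perf stat -x,' form is the easier one for scripts, here is a
small sketch that reads the interval lines quoted above. The field meanings
(timestamp, counter value, unit, event name, counter run time, percentage of
time the counter was running) are my reading of the CSV layout and should be
treated as an assumption; the type and function names are invented for the
sketch.

```python
import csv
from typing import NamedTuple

SAMPLE = """\
     1.000291954,1923360,,cycles,3998167210,100.00
     2.000477154,1005608105,,cycles,3998475482,100.00
     3.000612612,1345483,,cycles,3998332391,100.00
     4.000744469,1005046913,,cycles,3998258199,100.00
^C     4.331684347,1551327,,cycles,3463190970,100.00
"""


class Interval(NamedTuple):
    time: float         # seconds since start (assumed meaning)
    count: int          # raw counter value
    unit: str           # empty for 'cycles'
    event: str
    run_time: int       # assumed: time the counter was running
    pct_running: float  # assumed: share of the interval it was running


def parse_stat_csv(text):
    """Parse 'perf stat -x, -I ...' style lines into Interval records."""
    rows = []
    for fields in csv.reader(text.splitlines()):
        # Skip blank or short lines and non-numeric counts.
        if len(fields) < 6 or not fields[1].isdigit():
            continue
        # The terminal echoes ^C in front of the last record.
        stamp = fields[0].replace("^C", "").strip()
        rows.append(Interval(float(stamp), int(fields[1]), fields[2],
                             fields[3], int(fields[4]), float(fields[5])))
    return rows


rows = parse_stat_csv(SAMPLE)
```

No header skipping or column alignment handling is needed: each record is one
line and a plain CSV reader is enough.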
 
> > > > The main thing here is the callchain, all the other stuff are things
> > > > related to it, so showing it first makes sense to me.
> > > > 
> > > > Having some way to list the desired info to have for each callchain may
> > > > be interesting, and if he could do it like:
> > > > 
> > > >    -g folded,count,cpu,other,fields
> > > > 
> > > > then he would know how to parse the per-callchain info at the end of
> > > > each line, right?
> > > 
> > > Hmm.. looks like that it ends up having redundant info.  I don't think
> > 
> > What is redundant, and with what?
> 
> When it's used with normal perf report cases, those other info in
> callchain lines are redundant to hist lines.  Also if a hist entry has

Sure, but if the user doesn't want to see the output of
hist_entry__snprintf()... :-)

> many callchains, each callchain lines will have same info in other fields.

Sure, but that would be what the script expects to consume, i.e. one
line per callchain.

> > > it's generally useful to other 'perf report' stuffs.  Wouldn't it be
> > > better just adding minimal support and let the external tool parse the
> > > output?
> > 
> > Oh well, perhaps we could have a 'perf callchain' tool that would be
> > centered on callchains and would provide one line per callchain, which
> > would have:
> > 
> > callchain;separated;by;semicolons series,of,desired,fields,for,this,callchain
> > 
> > Which would reuse heavily the 'perf report' / 'perf top' code for
> > histograms, no?
 
> I guess the callchain code is pretty isolated or can be isolated
> easily though.
 
> > I still think that this is a 'perf report' thing, but one that is
> > centered in callchains, and that is to be consumed by scripts, not
> > humans.
 
> Agreed.
 
> I'm just looking for a way to support it with minimal change. :)

Hey, me too. A --no-hists flag looks like a quickie, no need to isolate
callchain code, or anything like that, just one long option switch and
we get what we need.

- Arnaldo


* Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)
  2015-11-03  0:46               ` Arnaldo Carvalho de Melo
@ 2015-11-03  1:35                 ` Namhyung Kim
  2015-11-03  1:46                   ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 17+ messages in thread
From: Namhyung Kim @ 2015-11-03  1:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Brendan Gregg, Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

On Mon, Nov 02, 2015 at 09:46:47PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Nov 03, 2015 at 08:46:06AM +0900, Namhyung Kim escreveu:
> > On Mon, Nov 02, 2015 at 08:04:36PM -0300, Arnaldo Carvalho de Melo wrote:
> > > I still think that this is a 'perf report' thing, but one that is
> > > centered in callchains, and that is to be consumed by scripts, not
> > > humans.
>  
> > Agreed.
>  
> > I'm just looking for a way to support it with minimal change. :)
> 
> Hey, me too. A --no-hists flag looks like a quickie, no need to isolate
> callchain code, or anything like that, just one long option switch and
> we get what we need.

Hmm.. okay.  Let me think about the --no-hists flag then.

What do you want to do if the --no-hists flag is used without folded
callchain mode, or with a UI other than --stdio?

And if you want to print other info in the callchains, what would be
the output of non-folded mode?

I think the simplest solution would be to support the folded mode only
and error out in other cases.  Is that OK with you?

Thanks,
Namhyung


* Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)
  2015-11-03  1:35                 ` Namhyung Kim
@ 2015-11-03  1:46                   ` Arnaldo Carvalho de Melo
  2015-11-03  3:17                     ` Namhyung Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-11-03  1:46 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Brendan Gregg, Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

Em Tue, Nov 03, 2015 at 10:35:35AM +0900, Namhyung Kim escreveu:
> On Mon, Nov 02, 2015 at 09:46:47PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Nov 03, 2015 at 08:46:06AM +0900, Namhyung Kim escreveu:
> > > On Mon, Nov 02, 2015 at 08:04:36PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > I still think that this is a 'perf report' thing, but one that is
> > > > centered in callchains, and that is to be consumed by scripts, not
> > > > humans.
> >  
> > > Agreed.
> >  
> > > I'm just looking for a way to support it with minimal change. :)
> > 
> > Hey, me too. A --no-hists flag looks like a quickie, no need to isolate
> > callchain code, or anything like that, just one long option switch and
> > we get what we need.
> 
> Hmm.. okay.  Let me think about the --no-hists flags then.
> 
> What do you want to do if the --no-hists flags is used without folded
> callchain mode or other than --stdio?

What the user asked for: don't show what hist_entry__snprintf()
produces, i.e. show just the callchains.

It's left to the user to decide if that output is good for whatever
purpose they have in mind.

We, from this discussion, know that suppressing it when using with
folded callchains, is useful at least for Brendan's scripts :-)
 
> And if you want to print other info in the callchains, what would be
> the output of non-folded mode?
 
> I think the simplest solution would be supporting the folded mode only
> and error out other cases.  Is it ok to you?

Well, the other info, if it comes at the end, may even be useful in non
folded mode, no?

If it is not, then the user will not use it, i.e. some combinations may
not produce useful results, but if we want to have more flexibility to
support usecases like Brendan's, and I think we want, without making the
existing code overly complex, then why not?

- Arnaldo


* Re: [RFC/PATCH 0/4] perf report: Support folded callchain output (v2)
  2015-11-03  1:46                   ` Arnaldo Carvalho de Melo
@ 2015-11-03  3:17                     ` Namhyung Kim
  0 siblings, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2015-11-03  3:17 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Brendan Gregg, Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

On Mon, Nov 02, 2015 at 10:46:00PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Nov 03, 2015 at 10:35:35AM +0900, Namhyung Kim escreveu:
> > On Mon, Nov 02, 2015 at 09:46:47PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Tue, Nov 03, 2015 at 08:46:06AM +0900, Namhyung Kim escreveu:
> > > > On Mon, Nov 02, 2015 at 08:04:36PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > I still think that this is a 'perf report' thing, but one that is
> > > > > centered in callchains, and that is to be consumed by scripts, not
> > > > > humans.
> > >  
> > > > Agreed.
> > >  
> > > > I'm just looking for a way to support it with minimal change. :)
> > > 
> > > Hey, me too. A --no-hists flag looks like a quickie, no need to isolate
> > > callchain code, or anything like that, just one long option switch and
> > > we get what we need.
> > 
> > Hmm.. okay.  Let me think about the --no-hists flags then.
> > 
> > What do you want to do if the --no-hists flags is used without folded
> > callchain mode or other than --stdio?
> 
> What the user asked it to, to not show what hist_entry__snprintf()
> produces, i.e. just the callchains.
> 
> Its left to the user to decide if that output is good for whatever
> purpose it has in mind.

OK, will add it in a follow-up patch after checking TUI and GTK.

> 
> We, from this discussion, know that suppressing it when using with
> folded callchains, is useful at least for Brendan's scripts :-)

OK

>  
> > And if you want to print other info in the callchains, what would be
> > the output of non-folded mode?
>  
> > I think the simplest solution would be supporting the folded mode only
> > and error out other cases.  Is it ok to you?
> 
> Well, the other info, if it comes at the end, may even be useful in non
> folded mode, no?

At the end?  Brendan wanted to have it first and I think it'd be
better to show it first.

Anyway, this other info depends on the sort keys - IOW it cannot show
the task comm name if the user gave sort keys without comm, like
'-s cpu'.  So how about adding an 'info' or 'context' (or whatever we
name it) option to -g/--call-graph to show the info selected by the
sort keys?

For example,

  $ perf report --no-children --stdio -s comm,dso -g folded,info --no-hists
  28.63% swapper,[kernel.vmlinux] intel_idle;cpuidle_enter_state;...
  11.30% swapper,[kernel.vmlinux] intel_idle;cpuidle_enter_state;...


  $ perf report --no-children --stdio -s pid,sym -g info
  ...
    39.93%  swapper  [k] intel_idle
            <0:swapper,intel_idle>
            |
            |---intel_idle
                cpuidle_enter_state
                ...

What do you think?
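
For what it's worth, a script could consume the proposed 'folded,info' lines
roughly as sketched below. This only mirrors the example above: the 'info'
option does not exist yet, the function name is made up, and splitting on
single spaces assumes that none of the info values (e.g. a comm name)
contains a space.

```python
SAMPLE = """\
  28.63% swapper,[kernel.vmlinux] intel_idle;cpuidle_enter_state;...
  11.30% swapper,[kernel.vmlinux] intel_idle;cpuidle_enter_state;...
"""


def parse_info_line(line, sort_keys):
    """Split 'value info,fields chain;of;frames' into its three parts.

    sort_keys is the list passed to -s (e.g. ["comm", "dso"]); the info
    column is assumed to follow the same order.
    """
    value, info, chain = line.strip().split(" ", 2)
    values = info.split(",")
    if len(values) != len(sort_keys):
        raise ValueError("info column does not match sort keys: %r" % line)
    return {
        "percent": float(value.rstrip("%")),
        "info": dict(zip(sort_keys, values)),
        # The elided "..." frame from the example is kept as-is.
        "frames": chain.split(";"),
    }


records = [parse_info_line(l, ["comm", "dso"])
           for l in SAMPLE.splitlines() if l.strip()]
```

Since the info column depends on the sort keys, the script has to be told
which keys were used, which is why they are passed in explicitly here.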

Thanks,
Namhyung


> 
> If it is not, then the user will not use it, i.e. some combinations may
> not produce useful results, but if we want to have more flexibility to
> support usecases like Brendan's, and I think we want, without making the
> existing code overly complex, then why not?
> 
> - Arnaldo


Thread overview: 17+ messages
2015-11-02 12:57 [RFC/PATCH 0/4] perf report: Support folded callchain output (v2) Namhyung Kim
2015-11-02 12:57 ` [RFC/PATCH v2 1/4] perf report: Support folded callchain mode on --stdio Namhyung Kim
2015-11-02 12:57 ` [RFC/PATCH v2 2/4] perf callchain: Abstract callchain print function Namhyung Kim
2015-11-02 12:57 ` [RFC/PATCH v2 3/4] perf callchain: Add count fields to struct callchain_node Namhyung Kim
2015-11-02 12:57 ` [RFC/PATCH v2 4/4] perf report: Add callchain value option Namhyung Kim
2015-11-02 20:37 ` [RFC/PATCH 0/4] perf report: Support folded callchain output (v2) Brendan Gregg
2015-11-02 21:30   ` Arnaldo Carvalho de Melo
2015-11-02 22:12     ` Namhyung Kim
2015-11-02 22:28       ` Arnaldo Carvalho de Melo
2015-11-02 22:49         ` Namhyung Kim
2015-11-02 23:04           ` Arnaldo Carvalho de Melo
2015-11-02 23:46             ` Namhyung Kim
2015-11-03  0:46               ` Arnaldo Carvalho de Melo
2015-11-03  1:35                 ` Namhyung Kim
2015-11-03  1:46                   ` Arnaldo Carvalho de Melo
2015-11-03  3:17                     ` Namhyung Kim
2015-11-02 22:43       ` Brendan Gregg
