All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET 0/4] perf report: Support folded callchain output (v4)
@ 2015-11-03 12:52 Namhyung Kim
  2015-11-03 12:52 ` [PATCH v4 1/4] perf report: Support folded callchain mode on --stdio Namhyung Kim
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Namhyung Kim @ 2015-11-03 12:52 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Brendan Gregg,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

Hello,

This is what Brendan requested on the perf-users mailing list [1] to
support FlameGraphs [2] more efficiently.  This patchset adds a few
more callchain options to adjust the output for it.

 * changes in v4)
  - add missing doc update
  - cleanup/fix callchain value print code
  - add Acked-by from Brendan and Jiri

 * changes in v3)
  - put the value before callchains
  - fix compile error


At first, 'folded' output mode was added.  The folded output puts the
value, a space and all calchain nodes separated by semicolons.  Now it
only supports --stdio as other UI provides some way of folding and/or
expanding callchains dynamically.

The value is now can be one of 'percent', 'period', or 'count'.  The
percent is current default output and the period is the raw number of
sample periods.  The count is the number of samples for each callchain.

Here's an example:

  $ perf report --no-children --show-nr-samples --stdio -g folded,count
  ...
    39.93%     80  swapper  [kernel.vmlinux]  [k] intel_idel
  57 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
  23 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;...


  $ perf report --no-children --stdio -g percent
  ...
    39.93%  swapper  [kernel.vmlinux]  [k] intel_idel
            |
            ---intel_idle
               cpuidle_enter_state
               cpuidle_enter
               call_cpuidle
               cpu_startup_entry
               |
               |--28.63%-- start_secondary
               |
                --11.30%-- rest_init


  $ perf report --no-children --stdio --show-total-period -g period
  ...
    39.93%   13018705  swapper  [kernel.vmlinux]  [k] intel_idel
            |
            ---intel_idle
               cpuidle_enter_state
               cpuidle_enter
               call_cpuidle
               cpu_startup_entry
               |
               |--9334403-- start_secondary
               |
                --3684302-- rest_init


  $ perf report --no-children --stdio --show-nr-samples -g count
  ...
    39.93%     80  swapper  [kernel.vmlinux]  [k] intel_idel
            |
            ---intel_idle
               cpuidle_enter_state
               cpuidle_enter
               call_cpuidle
               cpu_startup_entry
               |
               |--57-- start_secondary
               |
                --23-- rest_init


You can get it from 'perf/callchain-fold-v4' branch on my tree:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Any comments are welcome, thanks
Namhyung


[1] http://www.spinics.net/lists/linux-perf-users/msg02498.html
[2] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html


Namhyung Kim (4):
  perf report: Support folded callchain mode on --stdio
  perf callchain: Abstract callchain print function
  perf callchain: Add count fields to struct callchain_node
  perf report: Add callchain value option

 tools/perf/Documentation/perf-report.txt | 13 +++--
 tools/perf/builtin-report.c              |  4 +-
 tools/perf/ui/browsers/hists.c           |  8 +--
 tools/perf/ui/gtk/hists.c                |  8 +--
 tools/perf/ui/stdio/hist.c               | 93 ++++++++++++++++++++++++++------
 tools/perf/util/callchain.c              | 87 +++++++++++++++++++++++++++++-
 tools/perf/util/callchain.h              | 24 ++++++++-
 tools/perf/util/util.c                   |  3 +-
 8 files changed, 205 insertions(+), 35 deletions(-)

-- 
2.6.2

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v4 1/4] perf report: Support folded callchain mode on --stdio
  2015-11-03 12:52 [PATCHSET 0/4] perf report: Support folded callchain output (v4) Namhyung Kim
@ 2015-11-03 12:52 ` Namhyung Kim
  2015-11-03 12:52 ` [PATCH v4 2/4] perf callchain: Abstract callchain print function Namhyung Kim
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Namhyung Kim @ 2015-11-03 12:52 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Brendan Gregg,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

Add new call chain option (-g) 'folded' to print callchains in a line.
The callchains are separated by semicolons, and preceded by (absolute)
percent values and a space.

For example, following 20 lines can be printed in 3 lines with the
folded output mode;

  $ perf report -g flat --no-children | grep -v ^# | head -20
      60.48%  swapper  [kernel.vmlinux]  [k] intel_idle
              54.60%
                 intel_idle
                 cpuidle_enter_state
                 cpuidle_enter
                 call_cpuidle
                 cpu_startup_entry
                 start_secondary

              5.88%
                 intel_idle
                 cpuidle_enter_state
                 cpuidle_enter
                 call_cpuidle
                 cpu_startup_entry
                 rest_init
                 start_kernel
                 x86_64_start_reservations
                 x86_64_start_kernel

  $ perf report -g folded --no-children | grep -v ^# | head -3
      60.48%  swapper  [kernel.vmlinux]  [k] intel_idle
  54.60% intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
  5.88% intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel

This mode is supported only for --stdio now and intended to be used by
some scripts like in FlameGraphs[1].  Support for other UI might be
added later.

[1] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html

Requested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Acked-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-report.txt |  1 +
 tools/perf/ui/stdio/hist.c               | 54 ++++++++++++++++++++++++++++++++
 tools/perf/util/callchain.c              |  6 ++++
 tools/perf/util/callchain.h              |  5 +--
 4 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 5ce8da1e1256..f7d81aac9188 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -181,6 +181,7 @@ OPTIONS
 	- graph: use a graph tree, displaying absolute overhead rates. (default)
 	- fractal: like graph, but displays relative rates. Each branch of
 		 the tree is considered as a new profiled object.
+	- folded: call chains are displayed in a line, separated by semicolons
 	- none: disable call chain display.
 
 	threshold is a percentage value which specifies a minimum percent to be
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index dfcbc90146ef..d04c068f4acf 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -260,6 +260,57 @@ static size_t callchain__fprintf_flat(FILE *fp, struct rb_root *tree,
 	return ret;
 }
 
+static size_t __callchain__fprintf_folded(FILE *fp, struct callchain_node *node)
+{
+	struct callchain_list *chain;
+	size_t ret = 0;
+	char bf[1024];
+	bool first;
+
+	if (!node)
+		return 0;
+
+	ret += __callchain__fprintf_folded(fp, node->parent);
+
+	first = (ret == 0);
+	list_for_each_entry(chain, &node->val, list) {
+		if (chain->ip >= PERF_CONTEXT_MAX)
+			continue;
+		ret += fprintf(fp, "%s%s", first ? "" : ";",
+			       callchain_list__sym_name(chain,
+						bf, sizeof(bf), false));
+		first = false;
+	}
+
+	return ret;
+}
+
+static size_t callchain__fprintf_folded(FILE *fp, struct rb_root *tree,
+					u64 total_samples)
+{
+	size_t ret = 0;
+	u32 entries_printed = 0;
+	struct callchain_node *chain;
+	struct rb_node *rb_node = rb_first(tree);
+
+	while (rb_node) {
+		double percent;
+
+		chain = rb_entry(rb_node, struct callchain_node, rb_node);
+		percent = chain->hit * 100.0 / total_samples;
+
+		ret += fprintf(fp, "%.2f%% ", percent);
+		ret += __callchain__fprintf_folded(fp, chain);
+		ret += fprintf(fp, "\n");
+		if (++entries_printed == callchain_param.print_limit)
+			break;
+
+		rb_node = rb_next(rb_node);
+	}
+
+	return ret;
+}
+
 static size_t hist_entry_callchain__fprintf(struct hist_entry *he,
 					    u64 total_samples, int left_margin,
 					    FILE *fp)
@@ -278,6 +329,9 @@ static size_t hist_entry_callchain__fprintf(struct hist_entry *he,
 	case CHAIN_FLAT:
 		return callchain__fprintf_flat(fp, &he->sorted_chain, total_samples);
 		break;
+	case CHAIN_FOLDED:
+		return callchain__fprintf_folded(fp, &he->sorted_chain, total_samples);
+		break;
 	case CHAIN_NONE:
 		break;
 	default:
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 735ad48e1858..08cb220ba5ea 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -44,6 +44,10 @@ static int parse_callchain_mode(const char *value)
 		callchain_param.mode = CHAIN_GRAPH_REL;
 		return 0;
 	}
+	if (!strncmp(value, "folded", strlen(value))) {
+		callchain_param.mode = CHAIN_FOLDED;
+		return 0;
+	}
 	return -1;
 }
 
@@ -218,6 +222,7 @@ rb_insert_callchain(struct rb_root *root, struct callchain_node *chain,
 
 		switch (mode) {
 		case CHAIN_FLAT:
+		case CHAIN_FOLDED:
 			if (rnode->hit < chain->hit)
 				p = &(*p)->rb_left;
 			else
@@ -338,6 +343,7 @@ int callchain_register_param(struct callchain_param *param)
 		param->sort = sort_chain_graph_rel;
 		break;
 	case CHAIN_FLAT:
+	case CHAIN_FOLDED:
 		param->sort = sort_chain_flat;
 		break;
 	case CHAIN_NONE:
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index fce8161e54db..544d99ac169c 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -24,7 +24,7 @@
 #define CALLCHAIN_RECORD_HELP  CALLCHAIN_HELP RECORD_MODE_HELP RECORD_SIZE_HELP
 
 #define CALLCHAIN_REPORT_HELP						\
-	HELP_PAD "print_type:\tcall graph printing style (graph|flat|fractal|none)\n" \
+	HELP_PAD "print_type:\tcall graph printing style (graph|flat|fractal|folded|none)\n" \
 	HELP_PAD "threshold:\tminimum call graph inclusion threshold (<percent>)\n" \
 	HELP_PAD "print_limit:\tmaximum number of call graph entry (<number>)\n" \
 	HELP_PAD "order:\t\tcall graph order (caller|callee)\n" \
@@ -43,7 +43,8 @@ enum chain_mode {
 	CHAIN_NONE,
 	CHAIN_FLAT,
 	CHAIN_GRAPH_ABS,
-	CHAIN_GRAPH_REL
+	CHAIN_GRAPH_REL,
+	CHAIN_FOLDED,
 };
 
 enum chain_order {
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 2/4] perf callchain: Abstract callchain print function
  2015-11-03 12:52 [PATCHSET 0/4] perf report: Support folded callchain output (v4) Namhyung Kim
  2015-11-03 12:52 ` [PATCH v4 1/4] perf report: Support folded callchain mode on --stdio Namhyung Kim
@ 2015-11-03 12:52 ` Namhyung Kim
  2015-11-03 12:52 ` [PATCH v4 3/4] perf callchain: Add count fields to struct callchain_node Namhyung Kim
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Namhyung Kim @ 2015-11-03 12:52 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Brendan Gregg,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

This is a preparation to support for printing other type of callchain
value like count or period.

Acked-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/browsers/hists.c |  8 +++++---
 tools/perf/ui/gtk/hists.c      |  8 ++------
 tools/perf/ui/stdio/hist.c     | 35 +++++++++++++++++------------------
 tools/perf/util/callchain.c    | 29 +++++++++++++++++++++++++++++
 tools/perf/util/callchain.h    |  4 ++++
 5 files changed, 57 insertions(+), 27 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index e5afb8936040..a8897aab4c4a 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -592,7 +592,6 @@ static int hist_browser__show_callchain(struct hist_browser *browser,
 	while (node) {
 		struct callchain_node *child = rb_entry(node, struct callchain_node, rb_node);
 		struct rb_node *next = rb_next(node);
-		u64 cumul = callchain_cumul_hits(child);
 		struct callchain_list *chain;
 		char folded_sign = ' ';
 		int first = true;
@@ -619,9 +618,12 @@ static int hist_browser__show_callchain(struct hist_browser *browser,
 						       browser->show_dso);
 
 			if (was_first && need_percent) {
-				double percent = cumul * 100.0 / total;
+				char buf[64];
 
-				if (asprintf(&alloc_str, "%2.2f%% %s", percent, str) < 0)
+				callchain_node__sprintf_value(child, buf, sizeof(buf),
+							      total);
+
+				if (asprintf(&alloc_str, "%s %s", buf, str) < 0)
 					str = "Not enough memory!";
 				else
 					str = alloc_str;
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 4b3585eed1e8..d8037b7023e8 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -100,14 +100,10 @@ static void perf_gtk__add_callchain(struct rb_root *root, GtkTreeStore *store,
 		struct callchain_list *chain;
 		GtkTreeIter iter, new_parent;
 		bool need_new_parent;
-		double percent;
-		u64 hits, child_total;
+		u64 child_total;
 
 		node = rb_entry(nd, struct callchain_node, rb_node);
 
-		hits = callchain_cumul_hits(node);
-		percent = 100.0 * hits / total;
-
 		new_parent = *parent;
 		need_new_parent = !has_single_node && (node->val_nr > 1);
 
@@ -116,7 +112,7 @@ static void perf_gtk__add_callchain(struct rb_root *root, GtkTreeStore *store,
 
 			gtk_tree_store_append(store, &iter, &new_parent);
 
-			scnprintf(buf, sizeof(buf), "%5.2f%%", percent);
+			callchain_node__sprintf_value(node, buf, sizeof(buf), total);
 			gtk_tree_store_set(store, &iter, 0, buf, -1);
 
 			callchain_list__sym_name(chain, buf, sizeof(buf), false);
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index d04c068f4acf..9b5a0d8a43dc 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -34,10 +34,10 @@ static size_t ipchain__fprintf_graph_line(FILE *fp, int depth, int depth_mask,
 	return ret;
 }
 
-static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_list *chain,
+static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_node *node,
+				     struct callchain_list *chain,
 				     int depth, int depth_mask, int period,
-				     u64 total_samples, u64 hits,
-				     int left_margin)
+				     u64 total_samples, int left_margin)
 {
 	int i;
 	size_t ret = 0;
@@ -50,10 +50,9 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_list *chain,
 		else
 			ret += fprintf(fp, " ");
 		if (!period && i == depth - 1) {
-			double percent;
-
-			percent = hits * 100.0 / total_samples;
-			ret += percent_color_fprintf(fp, "--%2.2f%%-- ", percent);
+			ret += fprintf(fp, "--");
+			ret += callchain_node__fprintf_value(node, fp, total_samples);
+			ret += fprintf(fp, "--");
 		} else
 			ret += fprintf(fp, "%s", "          ");
 	}
@@ -120,10 +119,9 @@ static size_t __callchain__fprintf_graph(FILE *fp, struct rb_root *root,
 						   left_margin);
 		i = 0;
 		list_for_each_entry(chain, &child->val, list) {
-			ret += ipchain__fprintf_graph(fp, chain, depth,
+			ret += ipchain__fprintf_graph(fp, child, chain, depth,
 						      new_depth_mask, i++,
 						      total_samples,
-						      cumul,
 						      left_margin);
 		}
 
@@ -143,14 +141,17 @@ static size_t __callchain__fprintf_graph(FILE *fp, struct rb_root *root,
 
 	if (callchain_param.mode == CHAIN_GRAPH_REL &&
 		remaining && remaining != total_samples) {
+		struct callchain_node rem_node = {
+			.hit = remaining,
+		};
 
 		if (!rem_sq_bracket)
 			return ret;
 
 		new_depth_mask &= ~(1 << (depth - 1));
-		ret += ipchain__fprintf_graph(fp, &rem_hits, depth,
+		ret += ipchain__fprintf_graph(fp, &rem_node, &rem_hits, depth,
 					      new_depth_mask, 0, total_samples,
-					      remaining, left_margin);
+					      left_margin);
 	}
 
 	return ret;
@@ -243,12 +244,11 @@ static size_t callchain__fprintf_flat(FILE *fp, struct rb_root *tree,
 	struct rb_node *rb_node = rb_first(tree);
 
 	while (rb_node) {
-		double percent;
-
 		chain = rb_entry(rb_node, struct callchain_node, rb_node);
-		percent = chain->hit * 100.0 / total_samples;
 
-		ret = percent_color_fprintf(fp, "           %6.2f%%\n", percent);
+		ret += fprintf(fp, "           ");
+		ret += callchain_node__fprintf_value(chain, fp, total_samples);
+		ret += fprintf(fp, "\n");
 		ret += __callchain__fprintf_flat(fp, chain, total_samples);
 		ret += fprintf(fp, "\n");
 		if (++entries_printed == callchain_param.print_limit)
@@ -294,12 +294,11 @@ static size_t callchain__fprintf_folded(FILE *fp, struct rb_root *tree,
 	struct rb_node *rb_node = rb_first(tree);
 
 	while (rb_node) {
-		double percent;
 
 		chain = rb_entry(rb_node, struct callchain_node, rb_node);
-		percent = chain->hit * 100.0 / total_samples;
 
-		ret += fprintf(fp, "%.2f%% ", percent);
+		ret += callchain_node__fprintf_value(chain, fp, total_samples);
+		ret += fprintf(fp, " ");
 		ret += __callchain__fprintf_folded(fp, chain);
 		ret += fprintf(fp, "\n");
 		if (++entries_printed == callchain_param.print_limit)
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 08cb220ba5ea..e2ef9b38acb6 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -805,6 +805,35 @@ char *callchain_list__sym_name(struct callchain_list *cl,
 	return bf;
 }
 
+char *callchain_node__sprintf_value(struct callchain_node *node,
+				    char *bf, size_t bfsize, u64 total)
+{
+	double percent = 0.0;
+	u64 period = callchain_cumul_hits(node);
+
+	if (callchain_param.mode == CHAIN_FOLDED)
+		period = node->hit;
+	if (total)
+		percent = period * 100.0 / total;
+
+	scnprintf(bf, bfsize, "%.2f%%", percent);
+	return bf;
+}
+
+int callchain_node__fprintf_value(struct callchain_node *node,
+				 FILE *fp, u64 total)
+{
+	double percent = 0.0;
+	u64 period = callchain_cumul_hits(node);
+
+	if (callchain_param.mode == CHAIN_FOLDED)
+		period = node->hit;
+	if (total)
+		percent = period * 100.0 / total;
+
+	return percent_color_fprintf(fp, "%.2f%%", percent);
+}
+
 static void free_callchain_node(struct callchain_node *node)
 {
 	struct callchain_list *list, *tmp;
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 544d99ac169c..f9e00e3d1243 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -230,6 +230,10 @@ static inline int arch_skip_callchain_idx(struct thread *thread __maybe_unused,
 
 char *callchain_list__sym_name(struct callchain_list *cl,
 			       char *bf, size_t bfsize, bool show_dso);
+char *callchain_node__sprintf_value(struct callchain_node *node,
+				    char *bf, size_t bfsize, u64 total);
+int callchain_node__fprintf_value(struct callchain_node *node,
+				  FILE *fp, u64 total);
 
 void free_callchain(struct callchain_root *root);
 
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 3/4] perf callchain: Add count fields to struct callchain_node
  2015-11-03 12:52 [PATCHSET 0/4] perf report: Support folded callchain output (v4) Namhyung Kim
  2015-11-03 12:52 ` [PATCH v4 1/4] perf report: Support folded callchain mode on --stdio Namhyung Kim
  2015-11-03 12:52 ` [PATCH v4 2/4] perf callchain: Abstract callchain print function Namhyung Kim
@ 2015-11-03 12:52 ` Namhyung Kim
  2015-11-03 12:52 ` [PATCH v4 4/4] perf report: Add callchain value option Namhyung Kim
  2015-11-03 14:40 ` [PATCHSET 0/4] perf report: Support folded callchain output (v4) Arnaldo Carvalho de Melo
  4 siblings, 0 replies; 13+ messages in thread
From: Namhyung Kim @ 2015-11-03 12:52 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Brendan Gregg,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

It's to track the count of occurrences of the callchains.

Acked-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/callchain.c | 10 ++++++++++
 tools/perf/util/callchain.h |  7 +++++++
 2 files changed, 17 insertions(+)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index e2ef9b38acb6..60754de700d4 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -437,6 +437,8 @@ add_child(struct callchain_node *parent,
 
 	new->children_hit = 0;
 	new->hit = period;
+	new->children_count = 0;
+	new->count = 1;
 	return new;
 }
 
@@ -484,6 +486,9 @@ split_add_child(struct callchain_node *parent,
 	parent->children_hit = callchain_cumul_hits(new);
 	new->val_nr = parent->val_nr - idx_local;
 	parent->val_nr = idx_local;
+	new->count = parent->count;
+	new->children_count = parent->children_count;
+	parent->children_count = callchain_cumul_counts(new);
 
 	/* create a new child for the new branch if any */
 	if (idx_total < cursor->nr) {
@@ -494,6 +499,8 @@ split_add_child(struct callchain_node *parent,
 
 		parent->hit = 0;
 		parent->children_hit += period;
+		parent->count = 0;
+		parent->children_count += 1;
 
 		node = callchain_cursor_current(cursor);
 		new = add_child(parent, cursor, period);
@@ -516,6 +523,7 @@ split_add_child(struct callchain_node *parent,
 		rb_insert_color(&new->rb_node_in, &parent->rb_root_in);
 	} else {
 		parent->hit = period;
+		parent->count = 1;
 	}
 }
 
@@ -562,6 +570,7 @@ append_chain_children(struct callchain_node *root,
 
 inc_children_hit:
 	root->children_hit += period;
+	root->children_count++;
 }
 
 static int
@@ -614,6 +623,7 @@ append_chain(struct callchain_node *root,
 	/* we match 100% of the path, increment the hit */
 	if (matches == root->val_nr && cursor->pos == cursor->nr) {
 		root->hit += period;
+		root->count++;
 		return 0;
 	}
 
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index f9e00e3d1243..0e6cc83f1a46 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -60,6 +60,8 @@ struct callchain_node {
 	struct rb_root		rb_root_in; /* input tree of children */
 	struct rb_root		rb_root;    /* sorted output tree of children */
 	unsigned int		val_nr;
+	unsigned int		count;
+	unsigned int		children_count;
 	u64			hit;
 	u64			children_hit;
 };
@@ -145,6 +147,11 @@ static inline u64 callchain_cumul_hits(struct callchain_node *node)
 	return node->hit + node->children_hit;
 }
 
+static inline unsigned callchain_cumul_counts(struct callchain_node *node)
+{
+	return node->count + node->children_count;
+}
+
 int callchain_register_param(struct callchain_param *param);
 int callchain_append(struct callchain_root *root,
 		     struct callchain_cursor *cursor,
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 4/4] perf report: Add callchain value option
  2015-11-03 12:52 [PATCHSET 0/4] perf report: Support folded callchain output (v4) Namhyung Kim
                   ` (2 preceding siblings ...)
  2015-11-03 12:52 ` [PATCH v4 3/4] perf callchain: Add count fields to struct callchain_node Namhyung Kim
@ 2015-11-03 12:52 ` Namhyung Kim
  2015-11-03 14:40 ` [PATCHSET 0/4] perf report: Support folded callchain output (v4) Arnaldo Carvalho de Melo
  4 siblings, 0 replies; 13+ messages in thread
From: Namhyung Kim @ 2015-11-03 12:52 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Brendan Gregg,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

Now -g/--call-graph option supports how to display callchain values.
Possible values are 'percent', 'period' and 'count'.  The percent is
same as before and it's the default behavior.  The period displays the
raw period value rather than the percentage.  The count displays the
number of occurrences.

  $ perf report --no-children --stdio -g percent
  ...
    39.93%  swapper  [kernel.vmlinux]  [k] intel_idel
            |
            ---intel_idle
               cpuidle_enter_state
               cpuidle_enter
               call_cpuidle
               cpu_startup_entry
               |
               |--28.63%-- start_secondary
               |
                --11.30%-- rest_init

  $ perf report --no-children --show-total-period --stdio -g period
  ...
    39.93%   13018705  swapper  [kernel.vmlinux]  [k] intel_idel
            |
            ---intel_idle
               cpuidle_enter_state
               cpuidle_enter
               call_cpuidle
               cpu_startup_entry
               |
               |--9334403-- start_secondary
               |
                --3684302-- rest_init

  $ perf report --no-children --show-nr-samples --stdio -g count
  ...
    39.93%     80  swapper  [kernel.vmlinux]  [k] intel_idel
            |
            ---intel_idle
               cpuidle_enter_state
               cpuidle_enter
               call_cpuidle
               cpu_startup_entry
               |
               |--57-- start_secondary
               |
                --23-- rest_init

Acked-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-report.txt | 13 ++++---
 tools/perf/builtin-report.c              |  4 +--
 tools/perf/ui/stdio/hist.c               | 10 +++++-
 tools/perf/util/callchain.c              | 62 +++++++++++++++++++++++++++-----
 tools/perf/util/callchain.h              | 10 +++++-
 tools/perf/util/util.c                   |  3 +-
 6 files changed, 84 insertions(+), 18 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index f7d81aac9188..dab99ed2b339 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -170,11 +170,11 @@ OPTIONS
         Dump raw trace in ASCII.
 
 -g::
---call-graph=<print_type,threshold[,print_limit],order,sort_key,branch>::
+--call-graph=<print_type,threshold[,print_limit],order,sort_key[,branch],value>::
         Display call chains using type, min percent threshold, print limit,
-	call order, sort key and branch.  Note that ordering of parameters is not
-	fixed so any parement can be given in an arbitraty order.  One exception
-	is the print_limit which should be preceded by threshold.
+	call order, sort key, optional branch and value.  Note that ordering of
+	parameters is not fixed so any parement can be given in an arbitraty order.
+	One exception is the print_limit which should be preceded by threshold.
 
 	print_type can be either:
 	- flat: single column, linear exposure of call chains.
@@ -205,6 +205,11 @@ OPTIONS
 	- branch: include last branch information in callgraph when available.
 	          Usually more convenient to use --branch-history for this.
 
+	value can be:
+	- percent: diplay overhead percent (default)
+	- period: display event period
+	- count: display event count
+
 --children::
 	Accumulate callchain of children to parent entry so that then can
 	show up in the output.  The output will have a new "Children" column
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 2853ad2bd435..3dd4bb4ded1a 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -625,7 +625,7 @@ parse_percent_limit(const struct option *opt, const char *str,
 	return 0;
 }
 
-#define CALLCHAIN_DEFAULT_OPT  "graph,0.5,caller,function"
+#define CALLCHAIN_DEFAULT_OPT  "graph,0.5,caller,function,percent"
 
 const char report_callchain_help[] = "Display call graph (stack chain/backtrace):\n\n"
 				     CALLCHAIN_REPORT_HELP
@@ -708,7 +708,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_BOOLEAN('x', "exclude-other", &symbol_conf.exclude_other,
 		    "Only display entries with parent-match"),
 	OPT_CALLBACK_DEFAULT('g', "call-graph", &report,
-			     "print_type,threshold[,print_limit],order,sort_key[,branch]",
+			     "print_type,threshold[,print_limit],order,sort_key[,branch],value",
 			     report_callchain_help, &report_parse_callchain_opt,
 			     callchain_default_opt),
 	OPT_BOOLEAN(0, "children", &symbol_conf.cumulate_callchain,
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 9b5a0d8a43dc..c0580f5ea7ea 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -81,13 +81,14 @@ static size_t __callchain__fprintf_graph(FILE *fp, struct rb_root *root,
 					 int depth_mask, int left_margin)
 {
 	struct rb_node *node, *next;
-	struct callchain_node *child;
+	struct callchain_node *child = NULL;
 	struct callchain_list *chain;
 	int new_depth_mask = depth_mask;
 	u64 remaining;
 	size_t ret = 0;
 	int i;
 	uint entries_printed = 0;
+	int cumul_count = 0;
 
 	remaining = total_samples;
 
@@ -99,6 +100,7 @@ static size_t __callchain__fprintf_graph(FILE *fp, struct rb_root *root,
 		child = rb_entry(node, struct callchain_node, rb_node);
 		cumul = callchain_cumul_hits(child);
 		remaining -= cumul;
+		cumul_count += callchain_cumul_counts(child);
 
 		/*
 		 * The depth mask manages the output of pipes that show
@@ -148,6 +150,12 @@ static size_t __callchain__fprintf_graph(FILE *fp, struct rb_root *root,
 		if (!rem_sq_bracket)
 			return ret;
 
+		if (callchain_param.value == CCVAL_COUNT && child && child->parent) {
+			rem_node.count = child->parent->children_count - cumul_count;
+			if (rem_node.count <= 0)
+				return ret;
+		}
+
 		new_depth_mask &= ~(1 << (depth - 1));
 		ret += ipchain__fprintf_graph(fp, &rem_node, &rem_hits, depth,
 					      new_depth_mask, 0, total_samples,
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 60754de700d4..f3f1b95b808e 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -83,6 +83,23 @@ static int parse_callchain_sort_key(const char *value)
 	return -1;
 }
 
+static int parse_callchain_value(const char *value)
+{
+	if (!strncmp(value, "percent", strlen(value))) {
+		callchain_param.value = CCVAL_PERCENT;
+		return 0;
+	}
+	if (!strncmp(value, "period", strlen(value))) {
+		callchain_param.value = CCVAL_PERIOD;
+		return 0;
+	}
+	if (!strncmp(value, "count", strlen(value))) {
+		callchain_param.value = CCVAL_COUNT;
+		return 0;
+	}
+	return -1;
+}
+
 static int
 __parse_callchain_report_opt(const char *arg, bool allow_record_opt)
 {
@@ -106,7 +123,8 @@ __parse_callchain_report_opt(const char *arg, bool allow_record_opt)
 
 		if (!parse_callchain_mode(tok) ||
 		    !parse_callchain_order(tok) ||
-		    !parse_callchain_sort_key(tok)) {
+		    !parse_callchain_sort_key(tok) ||
+		    !parse_callchain_value(tok)) {
 			/* parsing ok - move on to the next */
 			try_stack_size = false;
 			goto next;
@@ -820,13 +838,27 @@ char *callchain_node__sprintf_value(struct callchain_node *node,
 {
 	double percent = 0.0;
 	u64 period = callchain_cumul_hits(node);
+	unsigned count = callchain_cumul_counts(node);
 
-	if (callchain_param.mode == CHAIN_FOLDED)
+	if (callchain_param.mode == CHAIN_FOLDED) {
 		period = node->hit;
-	if (total)
-		percent = period * 100.0 / total;
+		count = node->count;
+	}
 
-	scnprintf(bf, bfsize, "%.2f%%", percent);
+	switch (callchain_param.value) {
+	case CCVAL_PERIOD:
+		scnprintf(bf, bfsize, "%"PRIu64, period);
+		break;
+	case CCVAL_COUNT:
+		scnprintf(bf, bfsize, "%u", count);
+		break;
+	case CCVAL_PERCENT:
+	default:
+		if (total)
+			percent = period * 100.0 / total;
+		scnprintf(bf, bfsize, "%.2f%%", percent);
+		break;
+	}
 	return bf;
 }
 
@@ -835,13 +867,25 @@ int callchain_node__fprintf_value(struct callchain_node *node,
 {
 	double percent = 0.0;
 	u64 period = callchain_cumul_hits(node);
+	unsigned count = callchain_cumul_counts(node);
 
-	if (callchain_param.mode == CHAIN_FOLDED)
+	if (callchain_param.mode == CHAIN_FOLDED) {
 		period = node->hit;
-	if (total)
-		percent = period * 100.0 / total;
+		count = node->count;
+	}
 
-	return percent_color_fprintf(fp, "%.2f%%", percent);
+	switch (callchain_param.value) {
+	case CCVAL_PERIOD:
+		return fprintf(fp, "%"PRIu64, period);
+	case CCVAL_COUNT:
+		return fprintf(fp, "%u", count);
+	case CCVAL_PERCENT:
+	default:
+		if (total)
+			percent = period * 100.0 / total;
+		return percent_color_fprintf(fp, "%.2f%%", percent);
+	}
+	return 0;
 }
 
 static void free_callchain_node(struct callchain_node *node)
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 0e6cc83f1a46..b14d760fc4e3 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -29,7 +29,8 @@
 	HELP_PAD "print_limit:\tmaximum number of call graph entry (<number>)\n" \
 	HELP_PAD "order:\t\tcall graph order (caller|callee)\n" \
 	HELP_PAD "sort_key:\tcall graph sort key (function|address)\n"	\
-	HELP_PAD "branch:\t\tinclude last branch info to call graph (branch)\n"
+	HELP_PAD "branch:\t\tinclude last branch info to call graph (branch)\n" \
+	HELP_PAD "value:\t\tcall graph value (percent|period|count)\n"
 
 enum perf_call_graph_mode {
 	CALLCHAIN_NONE,
@@ -81,6 +82,12 @@ enum chain_key {
 	CCKEY_ADDRESS
 };
 
+enum chain_value {
+	CCVAL_PERCENT,
+	CCVAL_PERIOD,
+	CCVAL_COUNT,
+};
+
 struct callchain_param {
 	bool			enabled;
 	enum perf_call_graph_mode record_mode;
@@ -93,6 +100,7 @@ struct callchain_param {
 	bool			order_set;
 	enum chain_key		key;
 	bool			branch_callstack;
+	enum chain_value	value;
 };
 
 extern struct callchain_param callchain_param;
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index cd12c25e4ea4..174912f87913 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -20,7 +20,8 @@ struct callchain_param	callchain_param = {
 	.mode	= CHAIN_GRAPH_ABS,
 	.min_percent = 0.5,
 	.order  = ORDER_CALLEE,
-	.key	= CCKEY_FUNCTION
+	.key	= CCKEY_FUNCTION,
+	.value	= CCVAL_PERCENT,
 };
 
 /*
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)
  2015-11-03 12:52 [PATCHSET 0/4] perf report: Support folded callchain output (v4) Namhyung Kim
                   ` (3 preceding siblings ...)
  2015-11-03 12:52 ` [PATCH v4 4/4] perf report: Add callchain value option Namhyung Kim
@ 2015-11-03 14:40 ` Arnaldo Carvalho de Melo
  2015-11-03 21:33   ` Brendan Gregg
  4 siblings, 1 reply; 13+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-11-03 14:40 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Brendan Gregg,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

Em Tue, Nov 03, 2015 at 09:52:07PM +0900, Namhyung Kim escreveu:
> Hello,
> 
> This is what Brendan requested on the perf-users mailing list [1] to
> support FlameGraphs [2] more efficiently.  This patchset adds a few
> more callchain options to adjust the output for it.
> 
>  * changes in v4)
>   - add missing doc update
>   - cleanup/fix callchain value print code
>   - add Acked-by from Brendan and Jiri

Do those Acked-by stand? Things changed, the values moved from the end
of the line to the start, etc.

You said you would consider having a --no-hists, but I see nothing about
it in this patchkit.

Some more comments below.

- Arnaldo

>  * changes in v3)
>   - put the value before callchains
>   - fix compile error
> 
> 
> At first, 'folded' output mode was added.  The folded output puts the
> value, a space and all calchain nodes separated by semicolons.  Now it
> only supports --stdio as other UI provides some way of folding and/or
> expanding callchains dynamically.
> 
> The value is now can be one of 'percent', 'period', or 'count'.  The
> percent is current default output and the period is the raw number of
> sample periods.  The count is the number of samples for each callchain.
> 
> Here's an example:
> 
>   $ perf report --no-children --show-nr-samples --stdio -g folded,count
>   ...
>     39.93%     80  swapper  [kernel.vmlinux]  [k] intel_idel
>   57 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
>   23 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;...
> 
> 
>   $ perf report --no-children --stdio -g percent

So, in this first one you show the percent in both

>   ...
>     39.93%  swapper  [kernel.vmlinux]  [k] intel_idel
>             |
>             ---intel_idle
>                cpuidle_enter_state
>                cpuidle_enter
>                call_cpuidle
>                cpu_startup_entry
>                |
>                |--28.63%-- start_secondary
>                |
>                 --11.30%-- rest_init
> 
> 
>   $ perf report --no-children --stdio --show-total-period -g period
>   ...

then here you _add_ the period to the hist_entry line, but...

>     39.93%   13018705  swapper  [kernel.vmlinux]  [k] intel_idel
>             |
>             ---intel_idle
>                cpuidle_enter_state
>                cpuidle_enter
>                call_cpuidle
>                cpu_startup_entry
>                |

_replace_ the percentage with the period in the callchains.

Can't we have the same effect in both? I.e. I would expect the 39.93% to
simply be replaced with that 13018705.

>                |--9334403-- start_secondary
>                |
>                 --3684302-- rest_init
> 
> 
>   $ perf report --no-children --stdio --show-nr-samples -g count
>   ...
>     39.93%     80  swapper  [kernel.vmlinux]  [k] intel_idel

Ditto for count

>             |
>             ---intel_idle
>                cpuidle_enter_state
>                cpuidle_enter
>                call_cpuidle
>                cpu_startup_entry
>                |
>                |--57-- start_secondary
>                |
>                 --23-- rest_init
> 
> 
> You can get it from 'perf/callchain-fold-v4' branch on my tree:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
> 
> Any comments are welcome, thanks
> Namhyung
> 
> 
> [1] http://www.spinics.net/lists/linux-perf-users/msg02498.html
> [2] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
> 
> 
> Namhyung Kim (4):
>   perf report: Support folded callchain mode on --stdio
>   perf callchain: Abstract callchain print function
>   perf callchain: Add count fields to struct callchain_node
>   perf report: Add callchain value option
> 
>  tools/perf/Documentation/perf-report.txt | 13 +++--
>  tools/perf/builtin-report.c              |  4 +-
>  tools/perf/ui/browsers/hists.c           |  8 +--
>  tools/perf/ui/gtk/hists.c                |  8 +--
>  tools/perf/ui/stdio/hist.c               | 93 ++++++++++++++++++++++++++------
>  tools/perf/util/callchain.c              | 87 +++++++++++++++++++++++++++++-
>  tools/perf/util/callchain.h              | 24 ++++++++-
>  tools/perf/util/util.c                   |  3 +-
>  8 files changed, 205 insertions(+), 35 deletions(-)
> 
> -- 
> 2.6.2

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)
  2015-11-03 14:40 ` [PATCHSET 0/4] perf report: Support folded callchain output (v4) Arnaldo Carvalho de Melo
@ 2015-11-03 21:33   ` Brendan Gregg
  2015-11-04  1:54     ` Namhyung Kim
  0 siblings, 1 reply; 13+ messages in thread
From: Brendan Gregg @ 2015-11-03 21:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Namhyung Kim, Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

On Tue, Nov 3, 2015 at 6:40 AM, Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
> Em Tue, Nov 03, 2015 at 09:52:07PM +0900, Namhyung Kim escreveu:
>> Hello,
>>
>> This is what Brendan requested on the perf-users mailing list [1] to
>> support FlameGraphs [2] more efficiently.  This patchset adds a few
>> more callchain options to adjust the output for it.
>>
>>  * changes in v4)
>>   - add missing doc update
>>   - cleanup/fix callchain value print code
>>   - add Acked-by from Brendan and Jiri
>
> Do those Acked-by stand? Things changed, the values moved from the end
> of the line to the start, etc.
>
[...]

I'd Ack this change as it's a useful addition. It doesn't quite
address the folded-only output, but it's a step in that direction. I
think having the value at the start of a line only makes sense for the
perf report output containing the hist summary lines, for consistency.

Here's how I'd shuffle the output of this patch (ignore word wrap
issues with this email):

# ./perf report --stdio -g folded,count,caller -F pid | \
    awk '/^ / { n = $1 }
    /^[0-9]/ { split(n,a,":"); print a[2] "-" a[1] ";" $2,$1 }'
swapper-0;cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
809
swapper-0;xen_start_kernel;x86_64_start_reservations;start_kernel;rest_init;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
135
dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;check_events;xen_hypercall_xen_version
63
dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf
54
dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;memset_erms
3
dd-30551;xen_irq_enable_direct_end;check_events;xen_hypercall_xen_version 3

So the output is folded stacks, prefixed by comm-PID. Shuffling the
summarized output is a lot better than doing a "perf script" dump and
re-processing call chains. (Note that since I'm using -F, I didn't
need --no-children; and with "-g count", I didn't need
--show-nr-samples.)

I notice the fields (-F) option already has this precedent:

- "comm": prints PID:comm
- "pid": prints PID

If these were added to -g, along with a no-hists, then the two types
of folded-only output could be generated using:

perf report --stdio -g folded,count,comm,no-hists,caller
perf report --stdio -g folded,count,pid,no-hists,caller

... although "no-hists" doesn't hit me as intuitive. How about "-F
none" to specify zero columns? ie:

perf report --stdio -g folded,count,comm,caller -F none
perf report --stdio -g folded,count,pid,caller -F none

Brendan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)
  2015-11-03 21:33   ` Brendan Gregg
@ 2015-11-04  1:54     ` Namhyung Kim
  2015-11-04  6:02       ` Brendan Gregg
  0 siblings, 1 reply; 13+ messages in thread
From: Namhyung Kim @ 2015-11-04  1:54 UTC (permalink / raw)
  To: Brendan Gregg
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

Hi Brendan,

On Tue, Nov 03, 2015 at 01:33:43PM -0800, Brendan Gregg wrote:
> On Tue, Nov 3, 2015 at 6:40 AM, Arnaldo Carvalho de Melo
> <arnaldo.melo@gmail.com> wrote:
> > Em Tue, Nov 03, 2015 at 09:52:07PM +0900, Namhyung Kim escreveu:
> >> Hello,
> >>
> >> This is what Brendan requested on the perf-users mailing list [1] to
> >> support FlameGraphs [2] more efficiently.  This patchset adds a few
> >> more callchain options to adjust the output for it.
> >>
> >>  * changes in v4)
> >>   - add missing doc update
> >>   - cleanup/fix callchain value print code
> >>   - add Acked-by from Brendan and Jiri
> >
> > Do those Acked-by stand? Things changed, the values moved from the end
> > of the line to the start, etc.
> >
> [...]
> 
> I'd Ack this change as it's a useful addition. It doesn't quite
> address the folded-only output, but it's a step in that direction. I
> think having the value at the start of a line only makes sense for the
> perf report output containing the hist summary lines, for consistency.

Right, thanks!


> 
> Here's how I'd shuffle the output of this patch (ignore word wrap
> issues with this email):
> 
> # ./perf report --stdio -g folded,count,caller -F pid | \
>     awk '/^ / { n = $1 }
>     /^[0-9]/ { split(n,a,":"); print a[2] "-" a[1] ";" $2,$1 }'
> swapper-0;cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
> 809
> swapper-0;xen_start_kernel;x86_64_start_reservations;start_kernel;rest_init;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
> 135
> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;check_events;xen_hypercall_xen_version
> 63
> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf
> 54
> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;memset_erms
> 3
> dd-30551;xen_irq_enable_direct_end;check_events;xen_hypercall_xen_version 3
> 
> So the output is folded stacks, prefixed by comm-PID. Shuffling the
> summarized output is a lot better than doing a "perf script" dump and
> re-processing call chains. (Note that since I'm using -F, I didn't
> need --no-children;

Nope.  The '-F pid' doesn't affect --children.  It doesn't show the
children overhead column but we still have hist entries for
(synthesized) children..

  $ perf report --no-children | wc -l
  998

  $ perf report --no-children -F pid,dso,sym | wc -l
  998

  $ perf report --children | wc -l
  3229

  $ perf report --children -F pid,dso,sym | wc -l
  3202

So I think you still need to use --no-children (or set report.children
config variable to false) for your script.


> and with "-g count", I didn't need --show-nr-samples.)

Yes, I used -n/--show-nr-samples just to check the number is correct.


> 
> I notice the fields (-F) option already has this precedent:
> 
> - "comm": prints PID:comm
> - "pid": prints PID

It's opposite:  "comm" prints comm, "pid" prints PID:comm. :)


> 
> If these were added to -g, along with a no-hists, then the two types
> of folded-only output could be generated using:
> 
> perf report --stdio -g folded,count,comm,no-hists,caller
> perf report --stdio -g folded,count,pid,no-hists,caller

As I said, using fields like comm, pid requires to have same keys in
--sort option.  So it's basically unreliable to use those specific
field names in the -g option IMHO.  I suggested to use 'info' (yes, it
needs better name) to print all sort keys.


> 
> ... although "no-hists" doesn't hit me as intuitive. How about "-F
> none" to specify zero columns? ie:
> 
> perf report --stdio -g folded,count,comm,caller -F none
> perf report --stdio -g folded,count,pid,caller -F none

Ah, makes sense.  So it'd look like

  $ perf report --stdio -g folded,count,info -F none -s comm
  $ perf report --stdio -g folded,count,info -F none -s pid

The output would be

  809 swapper-0 cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op

Thoughts?

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)
  2015-11-04  1:54     ` Namhyung Kim
@ 2015-11-04  6:02       ` Brendan Gregg
  2015-11-04 14:51         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 13+ messages in thread
From: Brendan Gregg @ 2015-11-04  6:02 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

On Tue, Nov 3, 2015 at 5:54 PM, Namhyung Kim <namhyung@kernel.org> wrote:
> Hi Brendan,
>
> On Tue, Nov 03, 2015 at 01:33:43PM -0800, Brendan Gregg wrote:
>> On Tue, Nov 3, 2015 at 6:40 AM, Arnaldo Carvalho de Melo
>> <arnaldo.melo@gmail.com> wrote:
>> > Em Tue, Nov 03, 2015 at 09:52:07PM +0900, Namhyung Kim escreveu:
>> >> Hello,
>> >>
>> >> This is what Brendan requested on the perf-users mailing list [1] to
>> >> support FlameGraphs [2] more efficiently.  This patchset adds a few
>> >> more callchain options to adjust the output for it.
>> >>
>> >>  * changes in v4)
>> >>   - add missing doc update
>> >>   - cleanup/fix callchain value print code
>> >>   - add Acked-by from Brendan and Jiri
>> >
>> > Do those Acked-by stand? Things changed, the values moved from the end
>> > of the line to the start, etc.
>> >
>> [...]
>>
>> I'd Ack this change as it's a useful addition. It doesn't quite
>> address the folded-only output, but it's a step in that direction. I
>> think having the value at the start of a line only makes sense for the
>> perf report output containing the hist summary lines, for consistency.
>
> Right, thanks!
>
>
>>
>> Here's how I'd shuffle the output of this patch (ignore word wrap
>> issues with this email):
>>
>> # ./perf report --stdio -g folded,count,caller -F pid | \
>>     awk '/^ / { n = $1 }
>>     /^[0-9]/ { split(n,a,":"); print a[2] "-" a[1] ";" $2,$1 }'
>> swapper-0;cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
>> 809
>> swapper-0;xen_start_kernel;x86_64_start_reservations;start_kernel;rest_init;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
>> 135
>> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;check_events;xen_hypercall_xen_version
>> 63
>> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf
>> 54
>> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;memset_erms
>> 3
>> dd-30551;xen_irq_enable_direct_end;check_events;xen_hypercall_xen_version 3
>>
>> So the output is folded stacks, prefixed by comm-PID. Shuffling the
>> summarized output is a lot better than doing a "perf script" dump and
>> re-processing call chains. (Note that since I'm using -F, I didn't
>> need --no-children;
>
> Nope.  The '-F pid' doesn't affect --children.  It doesn't show the
> children overhead column but we still have hist entries for
> (synthesized) children..
>
>   $ perf report --no-children | wc -l
>   998
>
>   $ perf report --no-children -F pid,dso,sym | wc -l
>   998
>
>   $ perf report --children | wc -l
>   3229
>
>   $ perf report --children -F pid,dso,sym | wc -l
>   3202
>
> So I think you still need to use --no-children (or set report.children
> config variable to false) for your script.

Ok, good to know, thanks.

>
>
>> and with "-g count", I didn't need --show-nr-samples.)
>
> Yes, I used -n/--show-nr-samples just to check the number is correct.
>
>
>>
>> I notice the fields (-F) option already has this precedent:
>>
>> - "comm": prints PID:comm
>> - "pid": prints PID
>
> It's opposite:  "comm" prints comm, "pid" prints PID:comm. :)

Ah, right, sorry, I'd typed those the wrong way around. :)

>
>
>>
>> If these were added to -g, along with a no-hists, then the two types
>> of folded-only output could be generated using:
>>
>> perf report --stdio -g folded,count,comm,no-hists,caller
>> perf report --stdio -g folded,count,pid,no-hists,caller
>
> As I said, using fields like comm, pid requires to have same keys in
> --sort option.  So it's basically unreliable to use those specific
> field names in the -g option IMHO.  I suggested to use 'info' (yes, it
> needs better name) to print all sort keys.
>
>
>>
>> ... although "no-hists" doesn't hit me as intuitive. How about "-F
>> none" to specify zero columns? ie:
>>
>> perf report --stdio -g folded,count,comm,caller -F none
>> perf report --stdio -g folded,count,pid,caller -F none
>
> Ah, makes sense.  So it'd look like
>
>   $ perf report --stdio -g folded,count,info -F none -s comm
>   $ perf report --stdio -g folded,count,info -F none -s pid
>
> The output would be
>
>   809 swapper-0 cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
>

Thanks, looks almost right: a couple of minor changes:

1. If perf already has the precedent of "PID:comm", instead of my
"comm-PID", then maybe it should use "PID:comm" for perf consistency.
Doesn't make much difference to me.
2. The second space, delimiting "PID:comm" (or comm) and the stack...
I'm nervous about using space as a delimiter any more than once, since
it can also appear in comm (eg, "java main") and frames (eg,
"JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*,
Thread*)" -- that's direct from "perf script"!). I'd consider making
it a semicolon:

809 swapper-0;cpu_bringup_and_idle;cpu_startup_entry;...

So the output is "value key", and key is a semicolon delimited stack
with an optional comm or PID:comm frame at the start.

Brendan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)
  2015-11-04  6:02       ` Brendan Gregg
@ 2015-11-04 14:51         ` Arnaldo Carvalho de Melo
  2015-11-04 15:34           ` Namhyung Kim
  0 siblings, 1 reply; 13+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-11-04 14:51 UTC (permalink / raw)
  To: Brendan Gregg
  Cc: Namhyung Kim, Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

Em Tue, Nov 03, 2015 at 10:02:32PM -0800, Brendan Gregg escreveu:
> On Tue, Nov 3, 2015 at 5:54 PM, Namhyung Kim <namhyung@kernel.org> wrote:
> > Ah, makes sense.  So it'd look like

> >   $ perf report --stdio -g folded,count,info -F none -s comm
> >   $ perf report --stdio -g folded,count,info -F none -s pid

> > The output would be

> >   809 swapper-0 cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
 
> Thanks, looks almost right: a couple of minor changes:
 
> 1. If perf already has the precedent of "PID:comm", instead of my
> "comm-PID", then maybe it should use "PID:comm" for perf consistency.
> Doesn't make much difference to me.
> 2. The second space, delimiting "PID:comm" (or comm) and the stack...
> I'm nervous about using space as a delimiter any more than once, since
> it can also appear in comm (eg, "java main") and frames (eg,
> "JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*,
> Thread*)" -- that's direct from "perf script"!). I'd consider making
> it a semicolon:
 
> 809 swapper-0;cpu_bringup_and_idle;cpu_startup_entry;...
 
> So the output is "value key", and key is a semicolon delimited stack
> with an optional comm or PID:comm frame at the start.

Agreed, but then, we can have some sort of default and also be able to,
using -F, specify what are the fields we want, and in which order, and I
liked your suggestion of being able to specify "-F none" and that mean
no hist line to be produced.

Likewise, the way that each callchain line should be formatted should be
programmable via the command line, via the -g option, no? Then script
writers could use it in a way that doesn't requires further processing,
as Brendan showed.

But yeah, the value is the semicolon delimited stack all the way to the
comm/PID:comm if there are more than one or if the user asks it to be
there via a -g keyword, all the other counts/info are just relative to
that, CSV or whatever other delimiter the user asks it to, and space is
not an option, as we know it can appear in the middle of a COMM:

[root@zoo ~]# perf report -s comm | grep '[a-zA-Z] [a-zA-Z]'
# To display the perf.data header info, please use
# --header/--header-only options.
# Total Lost Samples: 0
# Samples: 164K of event 'cycles:pp'
# Event count (approx.): 34422160859
     0.11%  DOM Worker     
     0.10%  JS Helper      
     0.01%  Qt bearer threa
     0.00%  Socket Thread  
     0.00%  dconf worker   
     0.00%  JS Watchdog    
[root@zoo ~]#

- Arnaldo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)
  2015-11-04 14:51         ` Arnaldo Carvalho de Melo
@ 2015-11-04 15:34           ` Namhyung Kim
  2015-11-04 18:08             ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 13+ messages in thread
From: Namhyung Kim @ 2015-11-04 15:34 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Brendan Gregg, Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

Hi Arnaldo and Brendan,

On Wed, Nov 04, 2015 at 11:51:31AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Nov 03, 2015 at 10:02:32PM -0800, Brendan Gregg escreveu:
> > On Tue, Nov 3, 2015 at 5:54 PM, Namhyung Kim <namhyung@kernel.org> wrote:
> > > Ah, makes sense.  So it'd look like
> 
> > >   $ perf report --stdio -g folded,count,info -F none -s comm
> > >   $ perf report --stdio -g folded,count,info -F none -s pid
> 
> > > The output would be
> 
> > >   809 swapper-0 cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
>  
> > Thanks, looks almost right: a couple of minor changes:
>  
> > 1. If perf already has the precedent of "PID:comm", instead of my
> > "comm-PID", then maybe it should use "PID:comm" for perf consistency.
> > Doesn't make much difference to me.

Right.  Actually I'd like to write it that way.. ;-)


> > 2. The second space, delimiting "PID:comm" (or comm) and the stack...
> > I'm nervous about using space as a delimiter any more than once, since
> > it can also appear in comm (eg, "java main") and frames (eg,
> > "JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*,
> > Thread*)" -- that's direct from "perf script"!). I'd consider making
> > it a semicolon:

Fair enough.


>  
> > 809 swapper-0;cpu_bringup_and_idle;cpu_startup_entry;...
>  
> > So the output is "value key", and key is a semicolon delimited stack
> > with an optional comm or PID:comm frame at the start.
> 
> Agreed, but then, we can have some sort of default and also be able to,
> using -F, specify what are the fields we want, and in which order, and I
> liked your suggestion of being able to specify "-F none" and that mean
> no hist line to be produced.
> 
> Likewise, the way that each callchain line should be formatted should be
> programmable via the command line, via the -g option, no? Then script
> writers could use it in a way that doesn't requires further processing,
> as Brendan showed.

Right.  So '-s <key1>[,<key2>,...] -g info' can control which info is
displayed along with the callchains.

  $ perf report -s comm,dso -g folded,count,info -F none
  809 swapper;[kernel.vmlinux];cpu_bringup_and_idle;cpu_startup_entry;...

Note that the info part (swapper;[kernel.vmlinux]) is also separated
by a semicolon.  But I think it's ok since it's controlled by command
line, so script can know how many entries will be.


> 
> But yeah, the value is the semicolon delimited stack all the way to the
> comm/PID:comm if there are more than one or if the user asks it to be
> there via a -g keyword, all the other counts/info are just relative to
> that, CSV or whatever other delimiter the user asks it to, and space is
> not an option, as we know it can appear in the middle of a COMM:

Yes, I think that we should use a given separator (using -t option)
instead of hard-coded semicolon.  Although it'd be rare, it seems
possible to use semicolons in the comm name too.

Thanks,
Namhyung


> 
> [root@zoo ~]# perf report -s comm | grep '[a-zA-Z] [a-zA-Z]'
> # To display the perf.data header info, please use
> # --header/--header-only options.
> # Total Lost Samples: 0
> # Samples: 164K of event 'cycles:pp'
> # Event count (approx.): 34422160859
>      0.11%  DOM Worker     
>      0.10%  JS Helper      
>      0.01%  Qt bearer threa
>      0.00%  Socket Thread  
>      0.00%  dconf worker   
>      0.00%  JS Watchdog    
> [root@zoo ~]#
> 
> - Arnaldo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)
  2015-11-04 15:34           ` Namhyung Kim
@ 2015-11-04 18:08             ` Arnaldo Carvalho de Melo
  2015-11-05 11:33               ` Namhyung Kim
  0 siblings, 1 reply; 13+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-11-04 18:08 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Brendan Gregg, Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

Em Thu, Nov 05, 2015 at 12:34:57AM +0900, Namhyung Kim escreveu:
> Hi Arnaldo and Brendan,
> 
> On Wed, Nov 04, 2015 at 11:51:31AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Nov 03, 2015 at 10:02:32PM -0800, Brendan Gregg escreveu:
> > > On Tue, Nov 3, 2015 at 5:54 PM, Namhyung Kim <namhyung@kernel.org> wrote:
> > > > Ah, makes sense.  So it'd look like
> > 
> > > >   $ perf report --stdio -g folded,count,info -F none -s comm
> > > >   $ perf report --stdio -g folded,count,info -F none -s pid
> > 
> > > > The output would be
> > 
> > > >   809 swapper-0 cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
> >  
> > > Thanks, looks almost right: a couple of minor changes:
> >  
> > > 1. If perf already has the precedent of "PID:comm", instead of my
> > > "comm-PID", then maybe it should use "PID:comm" for perf consistency.
> > > Doesn't make much difference to me.
 
> Right.  Actually I'd like to write it that way.. ;-)

Well, those are two pieces of information: "comm" and "pid", so it would
be nice that we could take this opportunity to remove it, i.e. just
treat it as any other field and separate it via the designated
separator, and only show the ones specified.
 
> > > 2. The second space, delimiting "PID:comm" (or comm) and the stack...
> > > I'm nervous about using space as a delimiter any more than once, since
> > > it can also appear in comm (eg, "java main") and frames (eg,
> > > "JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*,
> > > Thread*)" -- that's direct from "perf script"!). I'd consider making
> > > it a semicolon:

The C++ symbol names are the biggest challenge here for a single line in
CSV ("comma" quoted) record :-\
 
> Fair enough.

> > > 809 swapper-0;cpu_bringup_and_idle;cpu_startup_entry;...
> >  
> > > So the output is "value key", and key is a semicolon delimited stack
> > > with an optional comm or PID:comm frame at the start.
> > 
> > Agreed, but then, we can have some sort of default and also be able to,
> > using -F, specify what are the fields we want, and in which order, and I
> > liked your suggestion of being able to specify "-F none" and that mean
> > no hist line to be produced.
> > 
> > Likewise, the way that each callchain line should be formatted should be
> > programmable via the command line, via the -g option, no? Then script
> > writers could use it in a way that doesn't requires further processing,
> > as Brendan showed.
> 
> Right.  So '-s <key1>[,<key2>,...] -g info' can control which info is
> displayed along with the callchains.

So you force the same selection of fields to be used for both the
hist_entry and the callchains?

And why is that some of the fields will be selected via -s (comm, dso)
and other fields will be selected via -g (count, this "info" thing)?

Why not be flexible and allow any set of fields to be used in both
cases, without one being tied to the other?

I.e. instead of:

    -s <key1>[,<key2>,...] -g info

We use:

    -s <key1>[,<key2>,...] -g [<keyA>[,<keyB>],...]

If one would want to have the same set for both, then yeah, a keyword
for that would be interesting, reusing your "info":

    -s <key1>[,<key2>,...] -g info

Would mean:

    -s <key1>[,<key2>,...] -g [<key1>[,<key2>],...]

With both ... equal

But "info" is way too vague, perhaps "hist_keys", or something more
compact, like: "\-s", to reuse the semantic of regular expression groups
(\1).
 
>   $ perf report -s comm,dso -g folded,count,info -F none
>   809 swapper;[kernel.vmlinux];cpu_bringup_and_idle;cpu_startup_entry;...
 
> Note that the info part (swapper;[kernel.vmlinux]) is also separated
> by a semicolon.  But I think it's ok since it's controlled by command
> line, so script can know how many entries will be.
 
> > But yeah, the value is the semicolon delimited stack all the way to the
> > comm/PID:comm if there are more than one or if the user asks it to be
> > there via a -g keyword, all the other counts/info are just relative to
> > that, CSV or whatever other delimiter the user asks it to, and space is
> > not an option, as we know it can appear in the middle of a COMM:
> 
> Yes, I think that we should use a given separator (using -t option)
> instead of hard-coded semicolon.  Although it'd be rare, it seems
> possible to use semicolons in the comm name too.

Well, we can have an option to specify what would be the separator for
the callchains.

- Arnaldo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)
  2015-11-04 18:08             ` Arnaldo Carvalho de Melo
@ 2015-11-05 11:33               ` Namhyung Kim
  0 siblings, 0 replies; 13+ messages in thread
From: Namhyung Kim @ 2015-11-05 11:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Brendan Gregg, Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML,
	David Ahern, Frederic Weisbecker, Andi Kleen, Kan Liang

Hi Arnaldo,

On Wed, Nov 04, 2015 at 03:08:58PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Nov 05, 2015 at 12:34:57AM +0900, Namhyung Kim escreveu:
> > Hi Arnaldo and Brendan,
> > 
> > On Wed, Nov 04, 2015 at 11:51:31AM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Tue, Nov 03, 2015 at 10:02:32PM -0800, Brendan Gregg escreveu:
> > > > On Tue, Nov 3, 2015 at 5:54 PM, Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > Ah, makes sense.  So it'd look like
> > > 
> > > > >   $ perf report --stdio -g folded,count,info -F none -s comm
> > > > >   $ perf report --stdio -g folded,count,info -F none -s pid
> > > 
> > > > > The output would be
> > > 
> > > > >   809 swapper-0 cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
> > >  
> > > > Thanks, looks almost right: a couple of minor changes:
> > >  
> > > > 1. If perf already has the precedent of "PID:comm", instead of my
> > > > "comm-PID", then maybe it should use "PID:comm" for perf consistency.
> > > > Doesn't make much difference to me.
>  
> > Right.  Actually I'd like to write it that way.. ;-)
> 
> Well, those are two pieces of information: "comm" and "pid", so it would
> be nice that we could take this opportunity to remove it, i.e. just
> treat it as any other field and separate it via the designated
> separator, and only show the ones specified.

So do you want to change '-s pid' to print 'PID' part only?


>  
> > > > 2. The second space, delimiting "PID:comm" (or comm) and the stack...
> > > > I'm nervous about using space as a delimiter any more than once, since
> > > > it can also appear in comm (eg, "java main") and frames (eg,
> > > > "JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*,
> > > > Thread*)" -- that's direct from "perf script"!). I'd consider making
> > > > it a semicolon:
> 
> The C++ symbol names are the biggest challenge here for a single line in
> CSV ("comma" quoted) record :-\
>  
> > Fair enough.
> 
> > > > 809 swapper-0;cpu_bringup_and_idle;cpu_startup_entry;...
> > >  
> > > > So the output is "value key", and key is a semicolon delimited stack
> > > > with an optional comm or PID:comm frame at the start.
> > > 
> > > Agreed, but then, we can have some sort of default and also be able to,
> > > using -F, specify what are the fields we want, and in which order, and I
> > > liked your suggestion of being able to specify "-F none" and that mean
> > > no hist line to be produced.
> > > 
> > > Likewise, the way that each callchain line should be formatted should be
> > > programmable via the command line, via the -g option, no? Then script
> > > writers could use it in a way that doesn't requires further processing,
> > > as Brendan showed.
> > 
> > Right.  So '-s <key1>[,<key2>,...] -g info' can control which info is
> > displayed along with the callchains.
> 
> So you force the same selection of fields to be used for both the
> hist_entry and the callchains?

Yes.


> 
> And why is that some of the fields will be selected via -s (comm, dso)
> and other fields will be selected via -g (count, this "info" thing)?

Because it affects how hist entries are aggregated..


> 
> Why not be flexible and allow any set of fields to be used in both
> cases, without one being tied to the other?
> 
> I.e. instead of:
> 
>     -s <key1>[,<key2>,...] -g info
> 
> We use:
> 
>     -s <key1>[,<key2>,...] -g [<keyA>[,<keyB>],...]

But then we need to aggregate hist entries using all of key1, key2,
keyA, keyB and so on.  Otherwise callchain info with keyA and keyB
might be stale.

If so, we need to group hist entries again using key1 and key2 only
for printing hist part.  For example, entries for (1,2,A,B) and
(1,2,C,D) should be shown as single entry for (1,2).

I think this 'info' part is only needed when hist entries are omitted
(i.e. -F none).  If so, no need to bother with new options..

> 
> If one would want to have the same set for both, then yeah, a keyword
> for that would be interesting, reusing your "info":
> 
>     -s <key1>[,<key2>,...] -g info
> 
> Would mean:
> 
>     -s <key1>[,<key2>,...] -g [<key1>[,<key2>],...]
> 
> With both ... equal
> 
> But "info" is way too vague, perhaps "hist_keys", or something more
> compact, like: "\-s", to reuse the semantic of regular expression groups
> (\1).

I prefer "hist_keys".


>  
> >   $ perf report -s comm,dso -g folded,count,info -F none
> >   809 swapper;[kernel.vmlinux];cpu_bringup_and_idle;cpu_startup_entry;...
>  
> > Note that the info part (swapper;[kernel.vmlinux]) is also separated
> > by a semicolon.  But I think it's ok since it's controlled by command
> > line, so script can know how many entries will be.
>  
> > > But yeah, the value is the semicolon delimited stack all the way to the
> > > comm/PID:comm if there are more than one or if the user asks it to be
> > > there via a -g keyword, all the other counts/info are just relative to
> > > that, CSV or whatever other delimiter the user asks it to, and space is
> > > not an option, as we know it can appear in the middle of a COMM:
> > 
> > Yes, I think that we should use a given separator (using -t option)
> > instead of hard-coded semicolon.  Although it'd be rare, it seems
> > possible to use semicolons in the comm name too.
> 
> Well, we can have an option to specify what would be the separator for
> the callchains.

What's the problem of using a single separator?

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2015-11-05 11:33 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-03 12:52 [PATCHSET 0/4] perf report: Support folded callchain output (v4) Namhyung Kim
2015-11-03 12:52 ` [PATCH v4 1/4] perf report: Support folded callchain mode on --stdio Namhyung Kim
2015-11-03 12:52 ` [PATCH v4 2/4] perf callchain: Abstract callchain print function Namhyung Kim
2015-11-03 12:52 ` [PATCH v4 3/4] perf callchain: Add count fields to struct callchain_node Namhyung Kim
2015-11-03 12:52 ` [PATCH v4 4/4] perf report: Add callchain value option Namhyung Kim
2015-11-03 14:40 ` [PATCHSET 0/4] perf report: Support folded callchain output (v4) Arnaldo Carvalho de Melo
2015-11-03 21:33   ` Brendan Gregg
2015-11-04  1:54     ` Namhyung Kim
2015-11-04  6:02       ` Brendan Gregg
2015-11-04 14:51         ` Arnaldo Carvalho de Melo
2015-11-04 15:34           ` Namhyung Kim
2015-11-04 18:08             ` Arnaldo Carvalho de Melo
2015-11-05 11:33               ` Namhyung Kim

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.