All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] perfcounter: callchains with perf report
@ 2009-06-26 14:27 Frederic Weisbecker
  2009-06-26 14:28 ` [PATCH 1/2] perfcounter: prepare a small callchain framework Frederic Weisbecker
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Frederic Weisbecker @ 2009-06-26 14:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Mike Galbraith, Paul Mackerras,
	Frederic Weisbecker

Hi,

Here is a first shot for the sorted callchains per entries handling
with per report.

I'll continue to improve it:

- symbol resolution
- profit we have a tree to display a better graph hierarchy
- let the user provide a limit for hit percentage, depth, number of
  backtraces, etc...
- better output
- colors
- and so on....

Thanks.

Frederic Weisbecker (2):
  perfcounter: prepare a small callchain framework
  perfcounter: print sorted callchains per histogram entries

 tools/perf/Makefile         |    1 +
 tools/perf/builtin-report.c |   87 +++++++++++++++++----
 tools/perf/perf.h           |    5 +
 tools/perf/util/callchain.c |  174 +++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/callchain.h |   33 ++++++++
 5 files changed, 284 insertions(+), 16 deletions(-)
 create mode 100644 tools/perf/util/callchain.c
 create mode 100644 tools/perf/util/callchain.h


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] perfcounter: prepare a small callchain framework
  2009-06-26 14:27 [PATCH 0/2] perfcounter: callchains with perf report Frederic Weisbecker
@ 2009-06-26 14:28 ` Frederic Weisbecker
  2009-06-26 15:51   ` [tip:perfcounters/urgent] perf_counter tools: Prepare " tip-bot for Frederic Weisbecker
  2009-06-26 14:28 ` [PATCH 2/2] perfcounter: print sorted callchains per histogram entries Frederic Weisbecker
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Frederic Weisbecker @ 2009-06-26 14:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Mike Galbraith, Paul Mackerras,
	Frederic Weisbecker

We plan to display the callchains depending on some user-configurable
parameters.
To gather the callchains stats from the recorded stream in a fast way,
this patch introduces an ad hoc radix tree adapted for callchains and also
a rbtree to sort these callchains once we have gathered every events
from the stream.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 tools/perf/Makefile         |    1 +
 tools/perf/builtin-report.c |    5 -
 tools/perf/perf.h           |    5 +
 tools/perf/util/callchain.c |  174 +++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/callchain.h |   33 ++++++++
 5 files changed, 213 insertions(+), 5 deletions(-)
 create mode 100644 tools/perf/util/callchain.c
 create mode 100644 tools/perf/util/callchain.h

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index d3887ed..1c1296d 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -329,6 +329,7 @@ LIB_OBJS += util/symbol.o
 LIB_OBJS += util/color.o
 LIB_OBJS += util/pager.o
 LIB_OBJS += util/header.o
+LIB_OBJS += util/callchain.o
 
 BUILTIN_OBJS += builtin-annotate.o
 BUILTIN_OBJS += builtin-help.o
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 681c223..28d1cb2 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -62,11 +62,6 @@ struct ip_event {
 	unsigned char __more_data[];
 };
 
-struct ip_callchain {
-	u64 nr;
-	u64 ips[0];
-};
-
 struct mmap_event {
 	struct perf_event_header header;
 	u32 pid, tid;
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 0f32a2e..ce39419 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -72,4 +72,9 @@ sys_perf_counter_open(struct perf_counter_attr *attr,
 #define MAX_COUNTERS			256
 #define MAX_NR_CPUS			256
 
+struct ip_callchain {
+	u64 nr;
+	u64 ips[0];
+};
+
 #endif
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
new file mode 100644
index 0000000..ad3c285
--- /dev/null
+++ b/tools/perf/util/callchain.c
@@ -0,0 +1,174 @@
+/*
+ * Copyright (C) 2009, Frederic Weisbecker <fweisbec@gmail.com>
+ *
+ * Handle the callchains from the stream in an ad-hoc radix tree and then
+ * sort them in an rbtree.
+ *
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <stdbool.h>
+#include <errno.h>
+
+#include "callchain.h"
+
+
+static void rb_insert_callchain(struct rb_root *root, struct callchain_node *chain)
+{
+	struct rb_node **p = &root->rb_node;
+	struct rb_node *parent = NULL;
+	struct callchain_node *rnode;
+
+	while (*p) {
+		parent = *p;
+		rnode = rb_entry(parent, struct callchain_node, rb_node);
+
+		if (rnode->hit < chain->hit)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+
+	rb_link_node(&chain->rb_node, parent, p);
+	rb_insert_color(&chain->rb_node, root);
+}
+
+/*
+ * Once we get every callchains from the stream, we can now
+ * sort them by hit
+ */
+void sort_chain_to_rbtree(struct rb_root *rb_root, struct callchain_node *node)
+{
+	struct callchain_node *child;
+
+	list_for_each_entry(child, &node->children, brothers)
+		sort_chain_to_rbtree(rb_root, child);
+
+	if (node->hit)
+		rb_insert_callchain(rb_root, node);
+}
+
+static struct callchain_node *create_child(struct callchain_node *parent)
+{
+	struct callchain_node *new;
+
+	new = malloc(sizeof(*new));
+	if (!new) {
+		perror("not enough memory to create child for code path tree");
+		return NULL;
+	}
+	new->parent = parent;
+	INIT_LIST_HEAD(&new->children);
+	INIT_LIST_HEAD(&new->val);
+	list_add_tail(&new->brothers, &parent->children);
+
+	return new;
+}
+
+static void
+fill_node(struct callchain_node *node, struct ip_callchain *chain, int start)
+{
+	int i;
+
+	for (i = start; i < chain->nr; i++) {
+		struct callchain_list *call;
+
+		call = malloc(sizeof(*chain));
+		if (!call) {
+			perror("not enough memory for the code path tree");
+			return;
+		}
+		call->ip = chain->ips[i];
+		list_add_tail(&call->list, &node->val);
+	}
+	node->val_nr = i - start;
+}
+
+static void add_child(struct callchain_node *parent, struct ip_callchain *chain)
+{
+	struct callchain_node *new;
+
+	new = create_child(parent);
+	fill_node(new, chain, parent->val_nr);
+
+	new->hit = 1;
+}
+
+static void
+split_add_child(struct callchain_node *parent, struct ip_callchain *chain,
+		struct callchain_list *to_split, int idx)
+{
+	struct callchain_node *new;
+
+	/* split */
+	new = create_child(parent);
+	list_move_tail(&to_split->list, &new->val);
+	new->hit = parent->hit;
+	parent->hit = 0;
+	parent->val_nr = idx;
+
+	/* create the new one */
+	add_child(parent, chain);
+}
+
+static int
+__append_chain(struct callchain_node *root, struct ip_callchain *chain,
+		int start);
+
+static int
+__append_chain_children(struct callchain_node *root, struct ip_callchain *chain)
+{
+	struct callchain_node *rnode;
+
+	/* lookup in childrens */
+	list_for_each_entry(rnode, &root->children, brothers) {
+		int ret = __append_chain(rnode, chain, root->val_nr);
+		if (!ret)
+			return 0;
+	}
+	return -1;
+}
+
+static int
+__append_chain(struct callchain_node *root, struct ip_callchain *chain,
+		int start)
+{
+	struct callchain_list *cnode;
+	int i = start;
+	bool found = false;
+
+	/* lookup in the current node */
+	list_for_each_entry(cnode, &root->val, list) {
+		if (cnode->ip != chain->ips[i++])
+			break;
+		if (!found)
+			found = true;
+		if (i == chain->nr)
+			break;
+	}
+
+	/* matches not, relay on the parent */
+	if (!found)
+		return -1;
+
+	/* we match only a part of the node. Split it and add the new chain */
+	if (i < root->val_nr) {
+		split_add_child(root, chain, cnode, i);
+		return 0;
+	}
+
+	/* we match 100% of the path, increment the hit */
+	if (i == root->val_nr) {
+		root->hit++;
+		return 0;
+	}
+
+	return __append_chain_children(root, chain);
+}
+
+void append_chain(struct callchain_node *root, struct ip_callchain *chain)
+{
+	if (__append_chain_children(root, chain) == -1)
+		add_child(root, chain);
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
new file mode 100644
index 0000000..fa1cd2f
--- /dev/null
+++ b/tools/perf/util/callchain.h
@@ -0,0 +1,33 @@
+#ifndef __PERF_CALLCHAIN_H
+#define __PERF_CALLCHAIN_H
+
+#include "../perf.h"
+#include "list.h"
+#include "rbtree.h"
+
+
+struct callchain_node {
+	struct callchain_node	*parent;
+	struct list_head	brothers;
+	struct list_head 	children;
+	struct list_head 	val;
+	struct rb_node		rb_node;
+	int			val_nr;
+	int			hit;
+};
+
+struct callchain_list {
+	unsigned long		ip;
+	struct list_head	list;
+};
+
+static inline void callchain_init(struct callchain_node *node)
+{
+	INIT_LIST_HEAD(&node->brothers);
+	INIT_LIST_HEAD(&node->children);
+	INIT_LIST_HEAD(&node->val);
+}
+
+void append_chain(struct callchain_node *root, struct ip_callchain *chain);
+void sort_chain_to_rbtree(struct rb_root *rb_root, struct callchain_node *node);
+#endif
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] perfcounter: print sorted callchains per histogram entries
  2009-06-26 14:27 [PATCH 0/2] perfcounter: callchains with perf report Frederic Weisbecker
  2009-06-26 14:28 ` [PATCH 1/2] perfcounter: prepare a small callchain framework Frederic Weisbecker
@ 2009-06-26 14:28 ` Frederic Weisbecker
  2009-06-26 15:52   ` [tip:perfcounters/urgent] perf report: Print " tip-bot for Frederic Weisbecker
  2009-06-26 14:48 ` [PATCH 0/2] perfcounter: callchains with perf report Ingo Molnar
  2009-06-27  1:12 ` Paul Mackerras
  3 siblings, 1 reply; 9+ messages in thread
From: Frederic Weisbecker @ 2009-06-26 14:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Mike Galbraith, Paul Mackerras,
	Frederic Weisbecker

Use the newly created callchains radix tree to gather the chains stats
from the recorded events and then print the callchains for all of them,
sorted by hits, using the "-c" parameter with perf report.

Example:

 66.15%  [k] atm_clip_exit
            63.08%
                0xffffffffffffff80
                0xffffffff810196a8
                0xffffffff810c14c8
                0xffffffff8101a79c
                0xffffffff810194f3
                0xffffffff8106ab7f
                0xffffffff8106abe5
                0xffffffff8106acde
                0xffffffff8100d94b
                0xffffffff8153e7ea
                [...]

             1.54%
                0xffffffffffffff80
                0xffffffff810196a8
                0xffffffff810c14c8
                0xffffffff8101a79c
		[...]

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 tools/perf/builtin-report.c |   82 +++++++++++++++++++++++++++++++++++++------
 1 files changed, 71 insertions(+), 11 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 28d1cb2..ed391db 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -15,6 +15,7 @@
 #include "util/rbtree.h"
 #include "util/symbol.h"
 #include "util/string.h"
+#include "util/callchain.h"
 
 #include "perf.h"
 #include "util/header.h"
@@ -52,6 +53,7 @@ static char		*parent_pattern = default_parent_pattern;
 static regex_t		parent_regex;
 
 static int		exclude_other = 1;
+static int		callchain;
 
 static u64		sample_type;
 
@@ -488,17 +490,19 @@ static size_t threads__fprintf(FILE *fp)
 static struct rb_root hist;
 
 struct hist_entry {
-	struct rb_node	 rb_node;
-
-	struct thread	 *thread;
-	struct map	 *map;
-	struct dso	 *dso;
-	struct symbol	 *sym;
-	struct symbol	 *parent;
-	u64		 ip;
-	char		 level;
-
-	u64		 count;
+	struct rb_node		rb_node;
+
+	struct thread		*thread;
+	struct map		*map;
+	struct dso		*dso;
+	struct symbol		*sym;
+	struct symbol		*parent;
+	u64			ip;
+	char			level;
+	struct callchain_node	callchain;
+	struct rb_root		sorted_chain;
+
+	u64			count;
 };
 
 /*
@@ -769,6 +773,48 @@ hist_entry__collapse(struct hist_entry *left, struct hist_entry *right)
 }
 
 static size_t
+callchain__fprintf(FILE *fp, struct callchain_node *self, u64 total_samples)
+{
+	struct callchain_list *chain;
+	size_t ret = 0;
+
+	if (!self)
+		return 0;
+
+	ret += callchain__fprintf(fp, self->parent, total_samples);
+
+
+	list_for_each_entry(chain, &self->val, list)
+		ret += fprintf(fp, "                %p\n", (void *)chain->ip);
+
+	return ret;
+}
+
+static size_t
+hist_entry_callchain__fprintf(FILE *fp, struct hist_entry *self,
+			      u64 total_samples)
+{
+	struct rb_node *rb_node;
+	struct callchain_node *chain;
+	size_t ret = 0;
+
+	rb_node = rb_first(&self->sorted_chain);
+	while (rb_node) {
+		double percent;
+
+		chain = rb_entry(rb_node, struct callchain_node, rb_node);
+		percent = chain->hit * 100.0 / total_samples;
+		ret += fprintf(fp, "           %6.2f%%\n", percent);
+		ret += callchain__fprintf(fp, chain, total_samples);
+		ret += fprintf(fp, "\n");
+		rb_node = rb_next(rb_node);
+	}
+
+	return ret;
+}
+
+
+static size_t
 hist_entry__fprintf(FILE *fp, struct hist_entry *self, u64 total_samples)
 {
 	struct sort_entry *se;
@@ -808,6 +854,9 @@ hist_entry__fprintf(FILE *fp, struct hist_entry *self, u64 total_samples)
 
 	ret += fprintf(fp, "\n");
 
+	if (callchain)
+		hist_entry_callchain__fprintf(fp, self, total_samples);
+
 	return ret;
 }
 
@@ -892,6 +941,7 @@ hist_entry__add(struct thread *thread, struct map *map, struct dso *dso,
 		.level	= level,
 		.count	= count,
 		.parent = NULL,
+		.sorted_chain = RB_ROOT
 	};
 	int cmp;
 
@@ -934,6 +984,8 @@ hist_entry__add(struct thread *thread, struct map *map, struct dso *dso,
 
 		if (!cmp) {
 			he->count += count;
+			if (callchain)
+				append_chain(&he->callchain, chain);
 			return 0;
 		}
 
@@ -947,6 +999,10 @@ hist_entry__add(struct thread *thread, struct map *map, struct dso *dso,
 	if (!he)
 		return -ENOMEM;
 	*he = entry;
+	if (callchain) {
+		callchain_init(&he->callchain);
+		append_chain(&he->callchain, chain);
+	}
 	rb_link_node(&he->rb_node, parent, p);
 	rb_insert_color(&he->rb_node, &hist);
 
@@ -1023,6 +1079,9 @@ static void output__insert_entry(struct hist_entry *he)
 	struct rb_node *parent = NULL;
 	struct hist_entry *iter;
 
+	if (callchain)
+		sort_chain_to_rbtree(&he->sorted_chain, &he->callchain);
+
 	while (*p != NULL) {
 		parent = *p;
 		iter = rb_entry(parent, struct hist_entry, rb_node);
@@ -1599,6 +1658,7 @@ static const struct option options[] = {
 		   "regex filter to identify parent, see: '--sort parent'"),
 	OPT_BOOLEAN('x', "exclude-other", &exclude_other,
 		    "Only display entries with parent-match"),
+	OPT_BOOLEAN('c', "callchain", &callchain, "Display callchains"),
 	OPT_END()
 };
 
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] perfcounter: callchains with perf report
  2009-06-26 14:27 [PATCH 0/2] perfcounter: callchains with perf report Frederic Weisbecker
  2009-06-26 14:28 ` [PATCH 1/2] perfcounter: prepare a small callchain framework Frederic Weisbecker
  2009-06-26 14:28 ` [PATCH 2/2] perfcounter: print sorted callchains per histogram entries Frederic Weisbecker
@ 2009-06-26 14:48 ` Ingo Molnar
  2009-06-27  1:12 ` Paul Mackerras
  3 siblings, 0 replies; 9+ messages in thread
From: Ingo Molnar @ 2009-06-26 14:48 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Peter Zijlstra, Mike Galbraith, Paul Mackerras,
	Arnaldo Carvalho de Melo


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> Hi,
> 
> Here is a first shot for the sorted callchains per entries handling
> with per report.

ok, this is already useful at displaying the raw data.

> I'll continue to improve it:
> 
> - symbol resolution
> - profit we have a tree to display a better graph hierarchy
> - let the user provide a limit for hit percentage, depth, number of
>   backtraces, etc...
> - better output
> - colors
> - and so on....

nice plans! :)

	Ingo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [tip:perfcounters/urgent] perf_counter tools: Prepare a small callchain framework
  2009-06-26 14:28 ` [PATCH 1/2] perfcounter: prepare a small callchain framework Frederic Weisbecker
@ 2009-06-26 15:51   ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 9+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2009-06-26 15:51 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, paulus, hpa, mingo, a.p.zijlstra, efault, fweisbec,
	tglx, mingo

Commit-ID:  8cb76d99d715741637b6d0884f389e17e9cb05d2
Gitweb:     http://git.kernel.org/tip/8cb76d99d715741637b6d0884f389e17e9cb05d2
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Fri, 26 Jun 2009 16:28:00 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 26 Jun 2009 16:47:00 +0200

perf_counter tools: Prepare a small callchain framework

We plan to display the callchains depending on some user-configurable
parameters.

To gather the callchains stats from the recorded stream in a fast way,
this patch introduces an ad hoc radix tree adapted for callchains and also
a rbtree to sort these callchains once we have gathered every events
from the stream.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1246026481-8314-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 tools/perf/Makefile         |    1 +
 tools/perf/builtin-report.c |    5 -
 tools/perf/perf.h           |    5 +
 tools/perf/util/callchain.c |  174 +++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/callchain.h |   33 ++++++++
 5 files changed, 213 insertions(+), 5 deletions(-)

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index d3887ed..1c1296d 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -329,6 +329,7 @@ LIB_OBJS += util/symbol.o
 LIB_OBJS += util/color.o
 LIB_OBJS += util/pager.o
 LIB_OBJS += util/header.o
+LIB_OBJS += util/callchain.o
 
 BUILTIN_OBJS += builtin-annotate.o
 BUILTIN_OBJS += builtin-help.o
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 681c223..28d1cb2 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -62,11 +62,6 @@ struct ip_event {
 	unsigned char __more_data[];
 };
 
-struct ip_callchain {
-	u64 nr;
-	u64 ips[0];
-};
-
 struct mmap_event {
 	struct perf_event_header header;
 	u32 pid, tid;
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 16c84fd..a49842b 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -66,4 +66,9 @@ sys_perf_counter_open(struct perf_counter_attr *attr,
 #define MAX_COUNTERS			256
 #define MAX_NR_CPUS			256
 
+struct ip_callchain {
+	u64 nr;
+	u64 ips[0];
+};
+
 #endif
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
new file mode 100644
index 0000000..ad3c285
--- /dev/null
+++ b/tools/perf/util/callchain.c
@@ -0,0 +1,174 @@
+/*
+ * Copyright (C) 2009, Frederic Weisbecker <fweisbec@gmail.com>
+ *
+ * Handle the callchains from the stream in an ad-hoc radix tree and then
+ * sort them in an rbtree.
+ *
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <stdbool.h>
+#include <errno.h>
+
+#include "callchain.h"
+
+
+static void rb_insert_callchain(struct rb_root *root, struct callchain_node *chain)
+{
+	struct rb_node **p = &root->rb_node;
+	struct rb_node *parent = NULL;
+	struct callchain_node *rnode;
+
+	while (*p) {
+		parent = *p;
+		rnode = rb_entry(parent, struct callchain_node, rb_node);
+
+		if (rnode->hit < chain->hit)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+
+	rb_link_node(&chain->rb_node, parent, p);
+	rb_insert_color(&chain->rb_node, root);
+}
+
+/*
+ * Once we get every callchains from the stream, we can now
+ * sort them by hit
+ */
+void sort_chain_to_rbtree(struct rb_root *rb_root, struct callchain_node *node)
+{
+	struct callchain_node *child;
+
+	list_for_each_entry(child, &node->children, brothers)
+		sort_chain_to_rbtree(rb_root, child);
+
+	if (node->hit)
+		rb_insert_callchain(rb_root, node);
+}
+
+static struct callchain_node *create_child(struct callchain_node *parent)
+{
+	struct callchain_node *new;
+
+	new = malloc(sizeof(*new));
+	if (!new) {
+		perror("not enough memory to create child for code path tree");
+		return NULL;
+	}
+	new->parent = parent;
+	INIT_LIST_HEAD(&new->children);
+	INIT_LIST_HEAD(&new->val);
+	list_add_tail(&new->brothers, &parent->children);
+
+	return new;
+}
+
+static void
+fill_node(struct callchain_node *node, struct ip_callchain *chain, int start)
+{
+	int i;
+
+	for (i = start; i < chain->nr; i++) {
+		struct callchain_list *call;
+
+		call = malloc(sizeof(*chain));
+		if (!call) {
+			perror("not enough memory for the code path tree");
+			return;
+		}
+		call->ip = chain->ips[i];
+		list_add_tail(&call->list, &node->val);
+	}
+	node->val_nr = i - start;
+}
+
+static void add_child(struct callchain_node *parent, struct ip_callchain *chain)
+{
+	struct callchain_node *new;
+
+	new = create_child(parent);
+	fill_node(new, chain, parent->val_nr);
+
+	new->hit = 1;
+}
+
+static void
+split_add_child(struct callchain_node *parent, struct ip_callchain *chain,
+		struct callchain_list *to_split, int idx)
+{
+	struct callchain_node *new;
+
+	/* split */
+	new = create_child(parent);
+	list_move_tail(&to_split->list, &new->val);
+	new->hit = parent->hit;
+	parent->hit = 0;
+	parent->val_nr = idx;
+
+	/* create the new one */
+	add_child(parent, chain);
+}
+
+static int
+__append_chain(struct callchain_node *root, struct ip_callchain *chain,
+		int start);
+
+static int
+__append_chain_children(struct callchain_node *root, struct ip_callchain *chain)
+{
+	struct callchain_node *rnode;
+
+	/* lookup in childrens */
+	list_for_each_entry(rnode, &root->children, brothers) {
+		int ret = __append_chain(rnode, chain, root->val_nr);
+		if (!ret)
+			return 0;
+	}
+	return -1;
+}
+
+static int
+__append_chain(struct callchain_node *root, struct ip_callchain *chain,
+		int start)
+{
+	struct callchain_list *cnode;
+	int i = start;
+	bool found = false;
+
+	/* lookup in the current node */
+	list_for_each_entry(cnode, &root->val, list) {
+		if (cnode->ip != chain->ips[i++])
+			break;
+		if (!found)
+			found = true;
+		if (i == chain->nr)
+			break;
+	}
+
+	/* matches not, relay on the parent */
+	if (!found)
+		return -1;
+
+	/* we match only a part of the node. Split it and add the new chain */
+	if (i < root->val_nr) {
+		split_add_child(root, chain, cnode, i);
+		return 0;
+	}
+
+	/* we match 100% of the path, increment the hit */
+	if (i == root->val_nr) {
+		root->hit++;
+		return 0;
+	}
+
+	return __append_chain_children(root, chain);
+}
+
+void append_chain(struct callchain_node *root, struct ip_callchain *chain)
+{
+	if (__append_chain_children(root, chain) == -1)
+		add_child(root, chain);
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
new file mode 100644
index 0000000..fa1cd2f
--- /dev/null
+++ b/tools/perf/util/callchain.h
@@ -0,0 +1,33 @@
+#ifndef __PERF_CALLCHAIN_H
+#define __PERF_CALLCHAIN_H
+
+#include "../perf.h"
+#include "list.h"
+#include "rbtree.h"
+
+
+struct callchain_node {
+	struct callchain_node	*parent;
+	struct list_head	brothers;
+	struct list_head 	children;
+	struct list_head 	val;
+	struct rb_node		rb_node;
+	int			val_nr;
+	int			hit;
+};
+
+struct callchain_list {
+	unsigned long		ip;
+	struct list_head	list;
+};
+
+static inline void callchain_init(struct callchain_node *node)
+{
+	INIT_LIST_HEAD(&node->brothers);
+	INIT_LIST_HEAD(&node->children);
+	INIT_LIST_HEAD(&node->val);
+}
+
+void append_chain(struct callchain_node *root, struct ip_callchain *chain);
+void sort_chain_to_rbtree(struct rb_root *rb_root, struct callchain_node *node);
+#endif

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [tip:perfcounters/urgent] perf report: Print sorted callchains per histogram entries
  2009-06-26 14:28 ` [PATCH 2/2] perfcounter: print sorted callchains per histogram entries Frederic Weisbecker
@ 2009-06-26 15:52   ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 9+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2009-06-26 15:52 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, paulus, hpa, mingo, a.p.zijlstra, efault, fweisbec,
	tglx, mingo

Commit-ID:  f55c555226b1010b249730ec6b232e5470286950
Gitweb:     http://git.kernel.org/tip/f55c555226b1010b249730ec6b232e5470286950
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Fri, 26 Jun 2009 16:28:01 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 26 Jun 2009 16:47:01 +0200

perf report: Print sorted callchains per histogram entries

Use the newly created callchains radix tree to gather the chains stats
from the recorded events and then print the callchains for all of them,
sorted by hits, using the "-c" parameter with perf report.

Example:

 66.15%  [k] atm_clip_exit
            63.08%
                0xffffffffffffff80
                0xffffffff810196a8
                0xffffffff810c14c8
                0xffffffff8101a79c
                0xffffffff810194f3
                0xffffffff8106ab7f
                0xffffffff8106abe5
                0xffffffff8106acde
                0xffffffff8100d94b
                0xffffffff8153e7ea
                [...]

             1.54%
                0xffffffffffffff80
                0xffffffff810196a8
                0xffffffff810c14c8
                0xffffffff8101a79c
		[...]

Symbols are not yet resolved.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1246026481-8314-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 tools/perf/builtin-report.c |   82 +++++++++++++++++++++++++++++++++++++------
 1 files changed, 71 insertions(+), 11 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 28d1cb2..ed391db 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -15,6 +15,7 @@
 #include "util/rbtree.h"
 #include "util/symbol.h"
 #include "util/string.h"
+#include "util/callchain.h"
 
 #include "perf.h"
 #include "util/header.h"
@@ -52,6 +53,7 @@ static char		*parent_pattern = default_parent_pattern;
 static regex_t		parent_regex;
 
 static int		exclude_other = 1;
+static int		callchain;
 
 static u64		sample_type;
 
@@ -488,17 +490,19 @@ static size_t threads__fprintf(FILE *fp)
 static struct rb_root hist;
 
 struct hist_entry {
-	struct rb_node	 rb_node;
-
-	struct thread	 *thread;
-	struct map	 *map;
-	struct dso	 *dso;
-	struct symbol	 *sym;
-	struct symbol	 *parent;
-	u64		 ip;
-	char		 level;
-
-	u64		 count;
+	struct rb_node		rb_node;
+
+	struct thread		*thread;
+	struct map		*map;
+	struct dso		*dso;
+	struct symbol		*sym;
+	struct symbol		*parent;
+	u64			ip;
+	char			level;
+	struct callchain_node	callchain;
+	struct rb_root		sorted_chain;
+
+	u64			count;
 };
 
 /*
@@ -769,6 +773,48 @@ hist_entry__collapse(struct hist_entry *left, struct hist_entry *right)
 }
 
 static size_t
+callchain__fprintf(FILE *fp, struct callchain_node *self, u64 total_samples)
+{
+	struct callchain_list *chain;
+	size_t ret = 0;
+
+	if (!self)
+		return 0;
+
+	ret += callchain__fprintf(fp, self->parent, total_samples);
+
+
+	list_for_each_entry(chain, &self->val, list)
+		ret += fprintf(fp, "                %p\n", (void *)chain->ip);
+
+	return ret;
+}
+
+static size_t
+hist_entry_callchain__fprintf(FILE *fp, struct hist_entry *self,
+			      u64 total_samples)
+{
+	struct rb_node *rb_node;
+	struct callchain_node *chain;
+	size_t ret = 0;
+
+	rb_node = rb_first(&self->sorted_chain);
+	while (rb_node) {
+		double percent;
+
+		chain = rb_entry(rb_node, struct callchain_node, rb_node);
+		percent = chain->hit * 100.0 / total_samples;
+		ret += fprintf(fp, "           %6.2f%%\n", percent);
+		ret += callchain__fprintf(fp, chain, total_samples);
+		ret += fprintf(fp, "\n");
+		rb_node = rb_next(rb_node);
+	}
+
+	return ret;
+}
+
+
+static size_t
 hist_entry__fprintf(FILE *fp, struct hist_entry *self, u64 total_samples)
 {
 	struct sort_entry *se;
@@ -808,6 +854,9 @@ hist_entry__fprintf(FILE *fp, struct hist_entry *self, u64 total_samples)
 
 	ret += fprintf(fp, "\n");
 
+	if (callchain)
+		hist_entry_callchain__fprintf(fp, self, total_samples);
+
 	return ret;
 }
 
@@ -892,6 +941,7 @@ hist_entry__add(struct thread *thread, struct map *map, struct dso *dso,
 		.level	= level,
 		.count	= count,
 		.parent = NULL,
+		.sorted_chain = RB_ROOT
 	};
 	int cmp;
 
@@ -934,6 +984,8 @@ hist_entry__add(struct thread *thread, struct map *map, struct dso *dso,
 
 		if (!cmp) {
 			he->count += count;
+			if (callchain)
+				append_chain(&he->callchain, chain);
 			return 0;
 		}
 
@@ -947,6 +999,10 @@ hist_entry__add(struct thread *thread, struct map *map, struct dso *dso,
 	if (!he)
 		return -ENOMEM;
 	*he = entry;
+	if (callchain) {
+		callchain_init(&he->callchain);
+		append_chain(&he->callchain, chain);
+	}
 	rb_link_node(&he->rb_node, parent, p);
 	rb_insert_color(&he->rb_node, &hist);
 
@@ -1023,6 +1079,9 @@ static void output__insert_entry(struct hist_entry *he)
 	struct rb_node *parent = NULL;
 	struct hist_entry *iter;
 
+	if (callchain)
+		sort_chain_to_rbtree(&he->sorted_chain, &he->callchain);
+
 	while (*p != NULL) {
 		parent = *p;
 		iter = rb_entry(parent, struct hist_entry, rb_node);
@@ -1599,6 +1658,7 @@ static const struct option options[] = {
 		   "regex filter to identify parent, see: '--sort parent'"),
 	OPT_BOOLEAN('x', "exclude-other", &exclude_other,
 		    "Only display entries with parent-match"),
+	OPT_BOOLEAN('c', "callchain", &callchain, "Display callchains"),
 	OPT_END()
 };
 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] perfcounter: callchains with perf report
  2009-06-26 14:27 [PATCH 0/2] perfcounter: callchains with perf report Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2009-06-26 14:48 ` [PATCH 0/2] perfcounter: callchains with perf report Ingo Molnar
@ 2009-06-27  1:12 ` Paul Mackerras
  2009-06-28 21:05   ` Frederic Weisbecker
  3 siblings, 1 reply; 9+ messages in thread
From: Paul Mackerras @ 2009-06-27  1:12 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Ingo Molnar, LKML, Peter Zijlstra, Mike Galbraith

Frederic Weisbecker writes:

> Here is a first shot for the sorted callchains per entries handling
> with per report.
> 
> I'll continue to improve it:
> 
> - symbol resolution
> - profit we have a tree to display a better graph hierarchy
> - let the user provide a limit for hit percentage, depth, number of
>   backtraces, etc...
> - better output
> - colors
> - and so on....

Nice!

I have just about finished doing the kernel piece of callchain support
on powerpc.  Because of the way function calls and returns work on
powerpc, working out the first one or two return addresses can be
tricky.  We potentially have a valid return address in the link
register (LR), or in the LR save area in the second stack frame, or
both, and you need extra information such as DWARF unwind tables to
work out which of those three possibilities you have, in general.
This is the case at each point where an interrupt or signal has
occurred.

Because I didn't want to go trawling through CFI tables at interrupt
time, particularly for user code, I made the kernel save both possible
return addresses in the callchain.  For the kernel part of the
callchain, I check those two addresses to see if they're valid kernel
addresses and set them to 0 if not, or if they're equal.

That means I need to make some changes to builtin-report.c to ignore
zero addresses.  I may need to add stuff to look for and use unwind
tables as well, if we want completely accurate call chains.

The other thing I did is to put PERF_CONTEXT_KERNEL markers in the
callchain every time we find an interrupt frame, and PERF_CONTEXT_USER
markers every time we find a signal frame, so that userspace knows
when it needs to do the unwinding.

Oh, and a third point is that on powerpc the sampled IP recorded if
you ask for PERF_SAMPLE_IP won't in general be the same as the first
IP in the callchain.  The reason is that the PERF_SAMPLE_IP value
points to the instruction that caused the counter overflow whereas the
first IP in the callchain tells you where the CPU took the interrupt.
That is almost always a few instructions further on, and can be quite
a way further on if interrupts were disabled when the counter overflow
occur.  In fact we regularly see the PERF_SAMPLE_IP value being in the
hypervisor but the first IP in the callchain being in the kernel.

Paul.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] perfcounter: callchains with perf report
  2009-06-27  1:12 ` Paul Mackerras
@ 2009-06-28 21:05   ` Frederic Weisbecker
  2009-06-28 23:40     ` Paul Mackerras
  0 siblings, 1 reply; 9+ messages in thread
From: Frederic Weisbecker @ 2009-06-28 21:05 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Ingo Molnar, LKML, Peter Zijlstra, Mike Galbraith

On Sat, Jun 27, 2009 at 11:12:47AM +1000, Paul Mackerras wrote:
> Frederic Weisbecker writes:
> 
> > Here is a first shot for the sorted callchains per entries handling
> > with per report.
> > 
> > I'll continue to improve it:
> > 
> > - symbol resolution
> > - profit we have a tree to display a better graph hierarchy
> > - let the user provide a limit for hit percentage, depth, number of
> >   backtraces, etc...
> > - better output
> > - colors
> > - and so on....
> 
> Nice!
> 
> I have just about finished doing the kernel piece of callchain support
> on powerpc.  Because of the way function calls and returns work on
> powerpc, working out the first one or two return addresses can be
> tricky.  We potentially have a valid return address in the link
> register (LR), or in the LR save area in the second stack frame, or
> both, and you need extra information such as DWARF unwind tables to
> work out which of those three possibilities you have, in general.
> This is the case at each point where an interrupt or signal has
> occurred.
> 
> Because I didn't want to go trawling through CFI tables at interrupt
> time, particularly for user code, I made the kernel save both possible
> return addresses in the callchain.  For the kernel part of the
> callchain, I check those two addresses to see if they're valid kernel
> addresses and set them to 0 if not, or if they're equal.
> 
> That means I need to make some changes to builtin-report.c to ignore
> zero addresses.  I may need to add stuff to look for and use unwind
> tables as well, if we want completely accurate call chains.


Well, I guess I can ignore them in my further patches.
But wouldn't it be better to discard them from the kernel?
Unless it's somewhat useful to know we had an unknown entry?

 
> The other thing I did is to put PERF_CONTEXT_KERNEL markers in the
> callchain every time we find an interrupt frame, and PERF_CONTEXT_USER
> markers every time we find a signal frame, so that userspace knows
> when it needs to do the unwinding.
> 
> Oh, and a third point is that on powerpc the sampled IP recorded if
> you ask for PERF_SAMPLE_IP won't in general be the same as the first
> IP in the callchain.  The reason is that the PERF_SAMPLE_IP value
> points to the instruction that caused the counter overflow whereas the
> first IP in the callchain tells you where the CPU took the interrupt.
> That is almost always a few instructions further on, and can be quite
> a way further on if interrupts were disabled when the counter overflow
> occur.  In fact we regularly see the PERF_SAMPLE_IP value being in the
> hypervisor but the first IP in the callchain being in the kernel.


Ok.

> 
> Paul.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] perfcounter: callchains with perf report
  2009-06-28 21:05   ` Frederic Weisbecker
@ 2009-06-28 23:40     ` Paul Mackerras
  0 siblings, 0 replies; 9+ messages in thread
From: Paul Mackerras @ 2009-06-28 23:40 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Ingo Molnar, LKML, Peter Zijlstra, Mike Galbraith

Frederic Weisbecker writes:

> On Sat, Jun 27, 2009 at 11:12:47AM +1000, Paul Mackerras wrote:
>
> > That means I need to make some changes to builtin-report.c to ignore
> > zero addresses.  I may need to add stuff to look for and use unwind
> > tables as well, if we want completely accurate call chains.
> 
> 
> Well, I guess I can ignore them in my further patches.
> But wouldn't it be better to discard them from the kernel?
> Unless it's somewhat useful to know we had an unknown entry?

If we discard the entries then userspace doesn't know which entries
were discarded, or whether any were discarded.  As it is, userspace
can know that the second value after PERF_CONTEXT_KERNEL/USER is a LR
value or zero, and the third value is from the second stack frame (or
zero).  If we discarded the entries then userspace wouldn't know
exactly where the second and third values came from, which would make
it harder to use unwind or traceback tables to work out more
accurately what the call chain was.

I would be open to replacing the bogus entries with some other
distinguishable value rather than zero if you think that would be
better.

Paul.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2009-06-28 23:41 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-26 14:27 [PATCH 0/2] perfcounter: callchains with perf report Frederic Weisbecker
2009-06-26 14:28 ` [PATCH 1/2] perfcounter: prepare a small callchain framework Frederic Weisbecker
2009-06-26 15:51   ` [tip:perfcounters/urgent] perf_counter tools: Prepare " tip-bot for Frederic Weisbecker
2009-06-26 14:28 ` [PATCH 2/2] perfcounter: print sorted callchains per histogram entries Frederic Weisbecker
2009-06-26 15:52   ` [tip:perfcounters/urgent] perf report: Print " tip-bot for Frederic Weisbecker
2009-06-26 14:48 ` [PATCH 0/2] perfcounter: callchains with perf report Ingo Molnar
2009-06-27  1:12 ` Paul Mackerras
2009-06-28 21:05   ` Frederic Weisbecker
2009-06-28 23:40     ` Paul Mackerras

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.