linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior
@ 2009-12-15 11:17 Xiao Guangrong
  2009-12-15 11:19 ` [PATCH 1/4] trace_event: record task' real_timer in itimer_state tracepoint Xiao Guangrong
                   ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-15 11:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Peter Zijlstra, Frederic Weisbecker,
	Steven Rostedt, LKML

Hi,

We introduce 'perf timer' in this patchset, it can analyze timer
latency and timer function handle time, the usage and result is
like below:

# perf timer record
# perf timer lat --print-lat --print-handle
-------------------------------------------------------------------------------------------------------
|   Timer            |   TYPE   |   Avg-latency  | Max-latency  |  Max-latency-at-TS |Max-lat-at-Task |
|0xf7ad1f5c          |hrtimer   |996068.500    ns|1607650     ns|10270128658526      |init            |
|0xf7903f04          |timer     |0.625         HZ|2           HZ|10270344082394      |swapper         |
|0xf787a05c          |hrtimer   |200239.500    ns|359929      ns|10269316024808      |main            |
|main      :[   PROF]|itimer    |0.000         HZ|0           HZ|10237021270557      |main            |
|main      :[VIRTUAL]|itimer    |0.000         HZ|0           HZ|10257314773501      |main            |

......

-------------------------------------------------------------------------------------------------------
|   Timer            |   TYPE   | Avg-handle (ms)|Max-handle(ms)| Max-handle-at-TS(s)|Max-lat-at-func |
|0xf7ad1f5c          |hrtimer   |0.025           |0.025         |10270.129           |0xc016b5b0      |
|0xf7903f04          |timer     |0.009           |0.011         |10260.342           |0xc0159240      |
|0xf787a05c          |hrtimer   |0.031           |0.062         |10267.018           |0xc014cc40      |

And, in current code, it'll complain with below message when we use
'perf timer lat':
 
  # ./perf timer lat
  Warning: unknown op '{'
  Warning: Error: expected type 5 but read 1
  Warning: failed to read event print fmt for hrtimer_start
  Warning: unknown op '{'
  Warning: Error: expected type 5 but read 1
  Warning: failed to read event print fmt for hrtimer_expire_entry
 
 It's because perf parse "hrtimer_start" and "hrtimer_expire_entry" fail,
 but it not hurt using.
 
 See: http://lkml.org/lkml/2009/10/12/726
 
 include/trace/events/timer.h            |    8 
 tools/perf/Documentation/perf-timer.txt |   40 +
 tools/perf/Makefile                     |    1 
 tools/perf/builtin-sched.c              |    3 
 tools/perf/builtin-timer.c              |  954 ++++++++++++++++++++++++++++++++
 tools/perf/builtin.h                    |    1 
 tools/perf/command-list.txt             |    1 
 tools/perf/perf.c                       |    1 
 tools/perf/util/trace-event-parse.c     |   25 
 tools/perf/util/trace-event.h           |    4 
 10 files changed, 1030 insertions(+), 8 deletions(-)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 1/4] trace_event: record task' real_timer in itimer_state tracepoint
  2009-12-15 11:17 [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior Xiao Guangrong
@ 2009-12-15 11:19 ` Xiao Guangrong
  2009-12-15 11:20   ` [PATCH 2/4] perf_event: fix getting point Xiao Guangrong
  2009-12-15 14:15 ` [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior Frederic Weisbecker
  2009-12-15 14:23 ` Frederic Weisbecker
  2 siblings, 1 reply; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-15 11:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Peter Zijlstra, Frederic Weisbecker,
	Steven Rostedt, LKML

When itimer is ITIMER_REAL, we should cooperate with hrtimer's
tracepoit to get it's latency, this patch can let us get itimer's
hrtimer easy

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 include/trace/events/timer.h |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index 9496b96..13ec15a 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -285,6 +285,7 @@ TRACE_EVENT(itimer_state,
 
 	TP_STRUCT__entry(
 		__field(	int,		which		)
+		__field(	void *,		timer		)
 		__field(	cputime_t,	expires		)
 		__field(	long,		value_sec	)
 		__field(	long,		value_usec	)
@@ -294,6 +295,8 @@ TRACE_EVENT(itimer_state,
 
 	TP_fast_assign(
 		__entry->which		= which;
+		__entry->timer          = which == ITIMER_REAL ?
+					  &current->signal->real_timer : NULL;
 		__entry->expires	= expires;
 		__entry->value_sec	= value->it_value.tv_sec;
 		__entry->value_usec	= value->it_value.tv_usec;
@@ -301,8 +304,9 @@ TRACE_EVENT(itimer_state,
 		__entry->interval_usec	= value->it_interval.tv_usec;
 	),
 
-	TP_printk("which=%d expires=%llu it_value=%ld.%ld it_interval=%ld.%ld",
-		  __entry->which, (unsigned long long)__entry->expires,
+	TP_printk("which=%d timer=%p expires=%llu it_value=%ld.%ld it_interval="
+		  "%ld.%ld", __entry->which, __entry->timer,
+		  (unsigned long long)__entry->expires,
 		  __entry->value_sec, __entry->value_usec,
 		  __entry->interval_sec, __entry->interval_usec)
 );
-- 
1.6.1.2



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 2/4] perf_event: fix getting point
  2009-12-15 11:19 ` [PATCH 1/4] trace_event: record task' real_timer in itimer_state tracepoint Xiao Guangrong
@ 2009-12-15 11:20   ` Xiao Guangrong
  2009-12-15 11:21     ` [PATCH 3/4] perf/timer: add document Xiao Guangrong
  2009-12-15 13:58     ` [PATCH 2/4] perf_event: fix getting point Frederic Weisbecker
  0 siblings, 2 replies; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-15 11:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Peter Zijlstra, Frederic Weisbecker,
	Steven Rostedt, LKML

The point that got by raw_field_ptr() is point to @data offset not
tracepoint saved.

Introduce raw_field_arry() to get arry data is tracepoint save
arry to it.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 tools/perf/builtin-sched.c          |    3 +--
 tools/perf/util/trace-event-parse.c |   25 ++++++++++++++++++++++---
 tools/perf/util/trace-event.h       |    3 +++
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index d67f274..8df982e 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -630,8 +630,7 @@ static void test_calibrations(void)
 
 #define FILL_ARRAY(ptr, array, event, data)			\
 do {								\
-	void *__array = raw_field_ptr(event, #array, data);	\
-	memcpy(ptr.array, __array, sizeof(ptr.array));	\
+	raw_field_arry(event, #array, data, ptr.array, sizeof(ptr.array));\
 } while(0)
 
 #define FILL_COMMON_FIELDS(ptr, event, data)			\
diff --git a/tools/perf/util/trace-event-parse.c b/tools/perf/util/trace-event-parse.c
index c5c32be..b0ee64f 100644
--- a/tools/perf/util/trace-event-parse.c
+++ b/tools/perf/util/trace-event-parse.c
@@ -840,7 +840,7 @@ static int event_read_fields(struct event *event, struct format_field **fields)
 			field->flags |= FIELD_IS_ARRAY;
 
 			type = read_token(&token);
-		        while (strcmp(token, "]") != 0) {
+			while (strcmp(token, "]") != 0) {
 				if (last_type == EVENT_ITEM &&
 				    type == EVENT_ITEM)
 					len = 2;
@@ -1919,13 +1919,32 @@ raw_field_value(struct event *event, const char *name, void *data)
 
 void *raw_field_ptr(struct event *event, const char *name, void *data)
 {
+	void *ptr;
+	unsigned long long value;
+
+	value = raw_field_value(event, name, data);
+
+	if (!value)
+		return NULL;
+
+	memcpy(&ptr, &value, sizeof(ptr));
+	return ptr;
+}
+
+unsigned long long
+raw_field_arry(struct event *event, const char *name, void *data,
+	       void *arry, int arry_size)
+{
+	int len;
 	struct format_field *field;
 
 	field = find_any_field(event, name);
 	if (!field)
-		return NULL;
+		return 0ULL;
 
-	return data + field->offset;
+	len = min(arry_size, field->size);
+	memcpy(arry, data + field->offset, len);
+	return len;
 }
 
 static int get_common_info(const char *type, int *offset, int *size)
diff --git a/tools/perf/util/trace-event.h b/tools/perf/util/trace-event.h
index 81698d5..3f88010 100644
--- a/tools/perf/util/trace-event.h
+++ b/tools/perf/util/trace-event.h
@@ -255,6 +255,9 @@ unsigned long long read_size(void *ptr, int size);
 unsigned long long
 raw_field_value(struct event *event, const char *name, void *data);
 void *raw_field_ptr(struct event *event, const char *name, void *data);
+unsigned long long
+raw_field_arry(struct event *event, const char *name, void *data,
+	       void *arry, int arry_size);
 unsigned long long eval_flag(const char *flag);
 
 int read_tracing_data(int fd, struct perf_event_attr *pattrs, int nb_events);
-- 
1.6.1.2



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 3/4] perf/timer: add document
  2009-12-15 11:20   ` [PATCH 2/4] perf_event: fix getting point Xiao Guangrong
@ 2009-12-15 11:21     ` Xiao Guangrong
  2009-12-15 11:22       ` [PATCH 4/4] perf/timer: 'perf timer' core code Xiao Guangrong
  2009-12-15 13:58     ` [PATCH 2/4] perf_event: fix getting point Frederic Weisbecker
  1 sibling, 1 reply; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-15 11:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Peter Zijlstra, Frederic Weisbecker,
	Steven Rostedt, LKML

Add document for 'perf timer'

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 tools/perf/Documentation/perf-timer.txt |   40 +++++++++++++++++++++++++++++++
 1 files changed, 40 insertions(+), 0 deletions(-)
 create mode 100644 tools/perf/Documentation/perf-timer.txt

diff --git a/tools/perf/Documentation/perf-timer.txt b/tools/perf/Documentation/perf-timer.txt
new file mode 100644
index 0000000..fbff936
--- /dev/null
+++ b/tools/perf/Documentation/perf-timer.txt
@@ -0,0 +1,40 @@
+perf-timer(1)
+==============
+
+NAME
+----
+perf-timer - Tool to trace/measure kernel timer properties
+
+SYNOPSIS
+--------
+[verse]
+'perf timer' {record|latency} [<options>]
+
+DESCRIPTION
+-----------
+There are two variants of perf timer:
+
+  'perf timer record <command>' to record the timer events
+  of an arbitrary workload.
+
+  'perf timer latency' to report kernel timer latency and
+  timer function handle time.
+
+OPTIONS
+-------
+-D::
+--dump-raw-trace=::
+        Display verbose dump of the sched data.
+
+-i <file>::
+--input=<file>::
+	Select the input file (default: perf.data)
+
+-s <key[,key2...]>::
+--sort=<key[,key2...]>::
+	Sort the output (default: timer, type, avg-timer-latency,
+	max-timer-latency, avg-handle-time, max-handle-time)
+
+SEE ALSO
+--------
+linkperf:perf-record[1]
-- 
1.6.1.2



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 4/4] perf/timer: 'perf timer' core code
  2009-12-15 11:21     ` [PATCH 3/4] perf/timer: add document Xiao Guangrong
@ 2009-12-15 11:22       ` Xiao Guangrong
  2009-12-15 17:44         ` Thomas Gleixner
  0 siblings, 1 reply; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-15 11:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Peter Zijlstra, Frederic Weisbecker,
	Steven Rostedt, LKML

It's the core code of 'perf timer', we can use it to analyse timer's
behavior:
# perf timer record
# perf timer lat --print-lat --print-handle
-------------------------------------------------------------------------------------------------------
|   Timer            |   TYPE   |   Avg-latency  | Max-latency  |  Max-latency-at-TS |Max-lat-at-Task |
|0xf7ad1f5c          |hrtimer   |996068.500    ns|1607650     ns|10270128658526      |init            |
|0xf7903f04          |timer     |0.625         HZ|2           HZ|10270344082394      |swapper         |
|0xf787a05c          |hrtimer   |200239.500    ns|359929      ns|10269316024808      |main            |
|main      :[   PROF]|itimer    |0.000         HZ|0           HZ|10237021270557      |main            |
|main      :[VIRTUAL]|itimer    |0.000         HZ|0           HZ|10257314773501      |main            |

......

-------------------------------------------------------------------------------------------------------
|   Timer            |   TYPE   | Avg-handle (ms)|Max-handle(ms)| Max-handle-at-TS(s)|Max-lat-at-func |
|0xf7ad1f5c          |hrtimer   |0.025           |0.025         |10270.129           |0xc016b5b0      |
|0xf7903f04          |timer     |0.009           |0.011         |10260.342           |0xc0159240      |
|0xf787a05c          |hrtimer   |0.031           |0.062         |10267.018           |0xc014cc40      |

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 tools/perf/Makefile         |    1 +
 tools/perf/builtin-timer.c  |  954 +++++++++++++++++++++++++++++++++++++++++++
 tools/perf/builtin.h        |    1 +
 tools/perf/command-list.txt |    1 +
 tools/perf/perf.c           |    1 +
 5 files changed, 958 insertions(+), 0 deletions(-)
 create mode 100644 tools/perf/builtin-timer.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index a4cb792..797ab53 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -439,6 +439,7 @@ BUILTIN_OBJS += builtin-top.o
 BUILTIN_OBJS += builtin-trace.o
 BUILTIN_OBJS += builtin-probe.o
 BUILTIN_OBJS += builtin-kmem.o
+BUILTIN_OBJS += builtin-timer.o
 
 PERFLIBS = $(LIB_FILE)
 
diff --git a/tools/perf/builtin-timer.c b/tools/perf/builtin-timer.c
new file mode 100644
index 0000000..d35315f
--- /dev/null
+++ b/tools/perf/builtin-timer.c
@@ -0,0 +1,954 @@
+#include "builtin.h"
+#include "perf.h"
+
+#include "util/util.h"
+#include "util/cache.h"
+#include "util/symbol.h"
+#include "util/thread.h"
+#include "util/header.h"
+#include "util/parse-options.h"
+#include "util/trace-event.h"
+#include "util/debug.h"
+#include "util/session.h"
+
+#include <sys/prctl.h>
+#include <semaphore.h>
+#include <pthread.h>
+#include <math.h>
+
+#ifdef TDEBUG
+static char const *timer_state_string[] = {
+	"TIMER_INIT",
+	"TIMER_STARE",
+	"TIMER_EXPIRE_ENTRY",
+	"TIMER_EXPIRE_EXIT",
+};
+
+#define dprintf(format, args...)	printf(format, ## args)
+#else
+#define dprintf(format, args...)
+#endif
+
+#define SORT_KEY \
+	"timer, type, avg-timer-latency, max-timer-latency,"	\
+	" avg-handle-time, max-handle-time"
+
+static char const	*input_name = "perf.data";
+static char		default_sort_order[] = SORT_KEY;
+
+static char		*sort_order = default_sort_order;
+static int		profile_cpu = -1;
+static int		print_lat;
+static int		print_handle;
+static int		bug_nr;
+static struct perf_session *timer_session;
+
+static const struct option timer_options[] = {
+	OPT_STRING('i', "input", &input_name, "file",
+		    "input file name"),
+	OPT_BOOLEAN('v', "verbose", &verbose,
+		    "be more verbose (show symbol address, etc)"),
+	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
+		    "dump raw trace in ASCII"),
+	OPT_END()
+};
+
+static const char * const timer_usage[] = {
+	"perf timer [<options>] {record|latency}",
+	NULL
+};
+
+static const struct option latency_options[] = {
+	OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
+		   "sort by key(s): "SORT_KEY),
+	OPT_BOOLEAN('v', "verbose", &verbose,
+		    "be more verbose (show symbol address, etc)"),
+	OPT_INTEGER('C', "CPU", &profile_cpu,
+		    "CPU to profile on"),
+	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
+		    "dump raw trace in ASCII"),
+	OPT_BOOLEAN(0, "print-lat", &print_lat, "show timer latency"),
+	OPT_BOOLEAN(0, "print-handle", &print_handle,
+		    "show timer function handle time"),
+	OPT_END()
+};
+
+static const char * const latency_usage[] = {
+	"perf timer latency [<options>]",
+	NULL
+};
+
+static const char *record_args[] = {
+	"record",
+	"-a",
+	"-R",
+	"-M",
+	"-f",
+	"-m", "1024",
+	"-c", "1",
+
+	/* timer */
+	"-e", "timer:timer_start",
+	"-e", "timer:timer_expire_entry",
+	"-e", "timer:timer_expire_exit",
+
+	/* hrtimer */
+	"-e", "timer:hrtimer_start",
+	"-e", "timer:hrtimer_expire_entry",
+	"-e", "timer:hrtimer_expire_exit",
+
+	/* itimer **/
+	"-e", "timer:itimer_state",
+	"-e", "timer:itimer_expire",
+};
+
+static LIST_HEAD(default_cmp);
+
+enum timer_type {
+	TIMER,
+	HRTIMER,
+	ITIMER,
+};
+
+static char const *timer_type_string[] = {
+	"timer",
+	"hrtimer",
+	"itimer",
+};
+
+enum timer_state {
+	TIMER_INIT,
+	TIMER_START,
+	TIMER_EXPIRE_ENTRY,
+	TIMER_EXPIRE_EXIT,
+};
+
+struct timer_info {
+	struct rb_node node;
+
+	void			*timer;
+	int			itimer_type;
+	void			*function;
+	enum timer_type		type;
+	enum timer_state	state;
+	int expire_nr;
+	int bug;
+
+	struct timer_info 	*itimer_hrtimer;
+
+	u64			 timer_last_lat;
+	u64			timer_expire_ts;
+	u64			timer_total_lat;
+	double			timer_avg_lat;
+	u64			timer_max_lat;
+	u64			timer_max_lat_at_ts;
+	struct			thread *timer_max_lat_at_thread;
+
+	u64			handler_entry_ts;
+	u64			handler_total_time;
+	double			handler_avg_time;
+	u64			handler_max_time;
+	u64		handler_max_time_at_ts;
+	void		*handler_max_time_at_func;
+};
+
+#define MAX_ITIMER_TYPE	(0xFFFF)
+typedef int (*sort_fn_t)(struct timer_info *, struct timer_info *);
+
+struct sort_dimension {
+	const char		*name;
+	sort_fn_t		cmp;
+	struct list_head	list;
+};
+
+#define CMP_FUNCTION(value)						\
+static int cmp_##value(struct timer_info *v1, struct timer_info *v2)	\
+{									\
+	if (v1->value < v2->value)					\
+		return -1;						\
+	if (v1->value > v2->value)					\
+		return 1;						\
+	return 0;							\
+}
+
+#define DECLARE_SORT_DIMENSION(sort_name, field)			\
+CMP_FUNCTION(field)							\
+static struct sort_dimension sort_##field = {				\
+	.name = sort_name,						\
+	.cmp = cmp_##field,						\
+}
+
+DECLARE_SORT_DIMENSION("timer", timer);
+DECLARE_SORT_DIMENSION("type", type);
+DECLARE_SORT_DIMENSION("avg-timer-latency", timer_avg_lat);
+DECLARE_SORT_DIMENSION("max-timer-latency", timer_max_lat);
+DECLARE_SORT_DIMENSION("avg-handle-time", handler_avg_time);
+DECLARE_SORT_DIMENSION("max-handle-time", handler_max_time);
+
+static struct sort_dimension *available_sorts[] = {
+	&sort_timer,
+	&sort_type,
+	&sort_timer_avg_lat,
+	&sort_timer_max_lat,
+	&sort_handler_avg_time,
+	&sort_handler_max_time,
+};
+
+#define NB_AVAILABLE_SORTS	(int)(sizeof(available_sorts) / sizeof(struct sort_dimension *))
+static int sort_dimension__add(const char *tok, struct list_head *list)
+{
+	int i;
+
+	for (i = 0; i < NB_AVAILABLE_SORTS; i++) {
+		if (!strcmp(available_sorts[i]->name, tok)) {
+			list_add_tail(&available_sorts[i]->list, list);
+
+			return 0;
+		}
+	}
+
+	return -1;
+}
+
+static LIST_HEAD(sort_list);
+
+static void setup_sorting(void)
+{
+	char *tmp, *tok, *str = strdup(sort_order);
+
+	for (tok = strtok_r(str, ", ", &tmp);
+			tok; tok = strtok_r(NULL, ", ", &tmp)) {
+		if (sort_dimension__add(tok, &sort_list) < 0) {
+			error("Unknown --sort key: `%s'", tok);
+			usage_with_options(latency_usage, latency_options);
+		}
+	}
+
+	free(str);
+
+	sort_dimension__add("timer", &default_cmp);
+	sort_dimension__add("itimer-type", &default_cmp);
+}
+
+static struct rb_root timer_root;
+
+static int timer_key_cmp(struct list_head *list, struct timer_info *l,
+			 struct timer_info *r)
+{
+	struct sort_dimension *sort;
+	int ret = 0;
+
+	BUG_ON(list_empty(list));
+
+	list_for_each_entry(sort, list, list) {
+		ret = sort->cmp(l, r);
+		if (ret)
+			return ret;
+	}
+
+	return ret;
+}
+
+static void timer_traversal(struct rb_root *root,
+			    void (*fun)(struct timer_info *))
+{
+	struct rb_node *next;
+
+	next = rb_first(root);
+
+	while (next) {
+		struct timer_info *timer;
+
+		timer = rb_entry(next, struct timer_info, node);
+		fun(timer);
+		next = rb_next(next);
+	}
+}
+
+static struct timer_info *
+__timer_search(struct rb_root *root, struct timer_info *timer,
+	       struct list_head *_sort_list)
+{
+	struct rb_node *node = root->rb_node;
+
+	while (node) {
+		struct timer_info *timer_info;
+		int cmp;
+
+		timer_info = container_of(node, struct timer_info, node);
+
+		cmp = timer_key_cmp(_sort_list, timer, timer_info);
+		if (cmp > 0)
+			node = node->rb_left;
+		else if (cmp < 0)
+			node = node->rb_right;
+		else
+			return timer_info;
+	}
+	return NULL;
+}
+
+static struct timer_info *
+timer_search(struct timer_info *timer)
+{
+	return __timer_search(&timer_root, timer, &default_cmp);
+}
+
+static void
+__timer_insert(struct rb_root *root, struct timer_info *data,
+			 struct list_head *_sort_list)
+{
+	struct rb_node **new = &(root->rb_node), *parent = NULL;
+
+	while (*new) {
+		struct timer_info *this;
+		int cmp;
+
+		this = container_of(*new, struct timer_info, node);
+		parent = *new;
+
+		cmp = timer_key_cmp(_sort_list, data, this);
+
+		if (cmp > 0)
+			new = &((*new)->rb_left);
+		else
+			new = &((*new)->rb_right);
+	}
+
+	rb_link_node(&data->node, parent, new);
+	rb_insert_color(&data->node, root);
+}
+
+static void timer_insert(struct timer_info *timer_info)
+{
+	dprintf("timer %p insert...\n", timer_info->timer);
+	__timer_insert(&timer_root, timer_info, &default_cmp);
+}
+
+static struct timer_info
+*__timer_find(void *timer, enum timer_type type, int itimer_type)
+{
+	struct timer_info *find = NULL;
+	struct timer_info timer_info = {
+			.timer = timer,
+			.itimer_type = itimer_type,
+		};
+
+	find = timer_search(&timer_info);
+	if (find && find->type != type) {
+
+		dprintf("find timer[%p], but type[%s] is not we expect[%s],"
+			"set to initializtion state.\n", find->timer,
+			timer_type_string[find->type], timer_type_string[type]);
+
+		find->type = type;
+		find->bug++;
+		find->state = TIMER_INIT;
+	}
+
+	dprintf("find timer[%p] %s\n", timer, find ? "success" : "fail");
+	return find;
+}
+
+static struct timer_info *
+timer_findnew(void *timer, enum timer_type type, int itimer_type)
+{
+	struct timer_info *find;
+
+	find = __timer_find(timer, type, itimer_type);
+	if (find)
+		return find;
+
+	find = zalloc(sizeof(*find));
+	if (!find)
+		die("No memory");
+
+	find->type = type;
+	find->timer = timer;
+	find->itimer_type = itimer_type;
+	timer_insert(find);
+
+	return find;
+}
+
+static struct rb_root timer_result;
+
+static void sort_result(void)
+{
+	struct rb_node *node;
+
+	for (;;) {
+		struct timer_info *data;
+		node = rb_first(&timer_root);
+		if (!node)
+			break;
+
+		rb_erase(node, &timer_root);
+		data = rb_entry(node, struct timer_info, node);
+		data->timer_avg_lat = (double)data->timer_total_lat /
+					(double)data->expire_nr;
+		data->handler_avg_time = (double)data->handler_total_time /
+					(double)data->expire_nr;
+		if (data->expire_nr)
+			__timer_insert(&timer_result, data, &sort_list);
+	}
+}
+
+static int timer_check_set_state(struct timer_info *timer_info,
+				 enum timer_state new_state)
+{
+	enum timer_state current_state = timer_info->state;
+	int ret = 1;
+
+	/*
+	 * if current state is TIMER_INIT, it means it's the first time we
+	 * catched, no need check.
+	 */
+	if (current_state == TIMER_INIT) {
+		dprintf("timer[%p] initialize, set state to %s\n",
+			timer_info->timer, timer_state_string[new_state]);
+		ret = 0;
+	}
+
+	timer_info->state = new_state;
+	return ret;
+}
+
+#define FILL_RAM_FIELD_PTR(event, field, data)				\
+	field = raw_field_ptr(event, #field, data)
+#define FILLL_RAW_FIELD_VALUE(event, field, data)			\
+	field = (typeof(field))raw_field_value(event, #field, data)
+
+static void *get_timer(enum timer_type type, struct event *event, void *data)
+{
+	if (type == HRTIMER) {
+		void *hrtimer = NULL;
+
+		FILL_RAM_FIELD_PTR(event, hrtimer, data);
+		return hrtimer;
+	} else {
+		void *timer = NULL;
+
+		FILL_RAM_FIELD_PTR(event, timer, data);
+		return timer;
+	}
+}
+
+static void
+timer_hrtimer_start(enum timer_type type, void *data, struct event *event)
+{
+	void *timer, *function = NULL;
+	u64 expires;
+	struct timer_info *timer_info;
+
+	timer = get_timer(type, event, data);
+
+	FILL_RAM_FIELD_PTR(event, function, data);
+	FILLL_RAW_FIELD_VALUE(event, expires, data);
+
+	dprintf("timer-start: Timer[%p], type:%s, function:%p, expires:%llu\n",
+		timer, timer_type_string[type], function, expires);
+
+	timer_info = timer_findnew(timer, type, MAX_ITIMER_TYPE);
+	timer_check_set_state(timer_info, TIMER_START);
+	timer_info->function = function;
+	timer_info->timer_expire_ts = expires;
+}
+
+static void
+timer_hrtimer_expires_entry(enum timer_type type, void *data,
+			    struct event *event, u64 timestamp,
+			    struct thread *thread)
+{
+	void *timer;
+	int ret;
+	u64 now;
+	s64 delta;
+	struct timer_info *timer_info;
+
+	FILLL_RAW_FIELD_VALUE(event, now, data);
+	timer = get_timer(type, event, data);
+	timer_info = timer_findnew(timer, type, MAX_ITIMER_TYPE);
+	timer_info->handler_entry_ts = timestamp;
+	ret = timer_check_set_state(timer_info, TIMER_EXPIRE_ENTRY);
+
+	dprintf("timer-expires-entry: Timer[%p], type:%s, now:%llu, ts:%llu\n",
+		timer, timer_type_string[type], now, timestamp);
+	if (ret == 0)
+		return;
+
+	delta = (s64)(now - timer_info->timer_expire_ts);
+	if (delta < 0) {
+		timer_info->bug++;
+		dprintf("timer-expires-entry: Timer[%p], type:%s,"
+			" timer latency < 0, bug=%d.\n", timer,
+			timer_type_string[type], timer_info->bug);
+		return ;
+	}
+
+	timer_info->expire_nr++;
+	timer_info->timer_last_lat = delta;
+	timer_info->timer_total_lat += delta;
+	if (timer_info->timer_max_lat <= (u64)delta) {
+		timer_info->timer_max_lat = delta;
+		timer_info->timer_max_lat_at_ts = timestamp;
+		timer_info->timer_max_lat_at_thread = thread;
+	}
+	return;
+}
+
+static void
+timer_hrtimer_expires_exit(enum timer_type type, void *data,
+			   struct event *event, u64 timestamp,
+			   struct thread *thread __used)
+{
+	struct timer_info *timer_info;
+	s64 delta;
+	int ret;
+	void *timer;
+
+	timer = get_timer(type, event, data);
+	timer_info = timer_findnew(timer, type, MAX_ITIMER_TYPE);
+
+	dprintf("timer expires exit: Timer[%p], type:%s, ts:%llu\n",
+		timer, timer_type_string[type], timestamp);
+
+	ret = timer_check_set_state(timer_info, TIMER_EXPIRE_EXIT);
+	if (ret == 0)
+		return;
+
+	delta = timestamp - timer_info->handler_entry_ts;
+	if (delta < 0) {
+		timer_info->bug++;
+		dprintf("timer expires exit: Timer[%p], type:%s, handle time"
+			" < 0, expires-ts:%llu, current-ts:%llu, bug=%d.\n",
+			timer, timer_type_string[type],
+			timer_info->handler_entry_ts, timestamp,
+			timer_info->bug);
+		return;
+	}
+
+	timer_info->handler_total_time += delta;
+	if (timer_info->handler_max_time < (u64)delta) {
+		timer_info->handler_max_time = delta;
+		timer_info->handler_max_time_at_ts = timestamp;
+		timer_info->handler_max_time_at_func = timer_info->function;
+	}
+
+	return;
+}
+
+static void
+timer_start_handler(void *data, struct event *event, int this_cpu __used,
+		    u64 timestamp __used, struct thread *thread __used)
+{
+	return timer_hrtimer_start(TIMER, data, event);
+}
+
+static void
+timer_expire_entry_handler(void *data, struct event *event, int this_cpu __used,
+			   u64 timestamp, struct thread *thread)
+{
+	return timer_hrtimer_expires_entry(TIMER, data, event, timestamp, thread);
+}
+
+static void
+timer_expire_exit_handler(void *data, struct event *event, int this_cpu __used,
+			  u64 timestamp, struct thread *thread)
+{
+	return timer_hrtimer_expires_exit(TIMER, data, event, timestamp, thread);
+}
+
+static void
+hrtimer_start_handler(void *data, struct event *event, int this_cpu __used,
+		      u64 timestamp __used, struct thread *thread __used)
+{
+	return timer_hrtimer_start(HRTIMER, data, event);
+}
+
+static void
+hrtimer_expire_entry_handler(void *data, struct event *event, int this_cpu __used,
+			     u64 timestamp, struct thread *thread)
+{
+	return timer_hrtimer_expires_entry(HRTIMER, data, event, timestamp, thread);
+}
+
+static void
+hrtimer_expire_exit_handler(void *data, struct event *event, int this_cpu __used,
+			    u64 timestamp, struct thread *thread)
+{
+	return timer_hrtimer_expires_exit(HRTIMER, data, event, timestamp, thread);
+}
+
+static void
+itimer_state_handler(void *data, struct event *event, int this_cpu __used,
+		     u64 timestamp __used, struct thread *thread)
+{
+	u64 value_sec, value_usec, expires;
+	struct timer_info *timer_info;
+	void *timer = NULL;
+	int which;
+
+	FILLL_RAW_FIELD_VALUE(event, value_sec, data);
+	FILLL_RAW_FIELD_VALUE(event, value_usec, data);
+	FILLL_RAW_FIELD_VALUE(event, expires, data);
+	FILLL_RAW_FIELD_VALUE(event, which, data);
+	FILL_RAM_FIELD_PTR(event, timer, data);
+
+	timer_info = timer_findnew(thread, ITIMER, which);
+
+	/* itimer canceled, we skip this event */
+	if (!value_sec && !value_usec)
+		return ;
+
+	dprintf("timer-start: Timer[%p], type:%s, task:%s, expires:%llu",
+		 timer_info->timer, "itimer", thread->comm, expires);
+
+	/* itimer started */
+	timer_info->timer_expire_ts = expires;
+	timer_check_set_state(timer_info, TIMER_START);
+	if (which == ITIMER_REAL) {
+		timer = timer_findnew(timer, HRTIMER, MAX_ITIMER_TYPE);
+		timer_info->itimer_hrtimer = timer;
+		dprintf(" hrtimer:%p\n", timer);
+	} else {
+		dprintf("\n");
+	}
+}
+
+static void
+itimer_expire_handler(void *data, struct event *event, int this_cpu __used,
+		      u64 timestamp, struct thread *thread)
+{
+	int which;
+	u64 now;
+	pid_t pid;
+	struct thread *itimer_thread;
+	struct timer_info *timer_info;
+	s64 delta = 0;
+	int ret;
+
+	FILLL_RAW_FIELD_VALUE(event, which, data);
+	FILLL_RAW_FIELD_VALUE(event, now, data);
+	FILLL_RAW_FIELD_VALUE(event, pid, data);
+
+	itimer_thread = perf_session__findnew(timer_session, pid);
+	timer_info = timer_findnew(itimer_thread, ITIMER, which);
+
+	dprintf("timer-expires-entry: Timer[%p], type:%s, ts:%llu\n",
+		timer_info->timer, "itimer", timestamp);
+
+	ret = timer_check_set_state(timer_info, TIMER_EXPIRE_ENTRY);
+	if (ret == 0)
+		return;
+
+	if (which == ITIMER_REAL) {
+		if (timer_info->itimer_hrtimer->state == TIMER_INIT)
+			return;
+		delta = timer_info->itimer_hrtimer->timer_last_lat;
+	} else
+		delta = now - timer_info->timer_expire_ts;
+
+	if (delta < 0) {
+		timer_info->bug++;
+		dprintf("timer-expires-entry: Timer[%p], type:%s, handle time < 0,"
+			"bug=%d.\n",
+			timer_info->timer, "itimer", timer_info->bug);
+		return;
+	}
+
+	timer_info->expire_nr++;
+	timer_info->timer_total_lat += delta;
+	if (timer_info->timer_max_lat <= (u64)delta) {
+		timer_info->timer_max_lat = delta;
+		timer_info->timer_max_lat_at_ts = timestamp;
+		timer_info->timer_max_lat_at_thread = thread;
+	}
+
+	return;
+}
+
+struct event_handler_unit {
+	const char *event_name;
+	void (*handler) (void *data, struct event *event, int this_cpu,
+			 u64 timestamp, struct thread *thread);
+} timer_handler[] = {
+	/* timer */
+	{ "timer_start", 	timer_start_handler		},
+	{ "timer_expire_entry",	timer_expire_entry_handler	},
+	{ "timer_expire_exit",	timer_expire_exit_handler	},
+
+	/* hrtimer */
+	{ "hrtimer_start",	hrtimer_start_handler		},
+	{ "hrtimer_expire_entry", hrtimer_expire_entry_handler	},
+	{ "hrtimer_expire_exit",  hrtimer_expire_exit_handler},
+
+	/* itimer */
+	{ "itimer_state",	itimer_state_handler		},
+	{ "itimer_expire",	itimer_expire_handler		},
+};
+
+static void
+process_raw_event(event_t *raw_event __used, struct perf_session *session __used,
+		  void *data, int cpu, u64 timestamp, struct thread *thread)
+{
+	struct event *event;
+	int type;
+	int i;
+	int handler_item = ARRAY_SIZE(timer_handler);
+
+	type = trace_parse_common_type(data);
+	event = trace_find_event(type);
+
+	for (i = 0; i < handler_item; i++)
+		if (!strcmp(timer_handler[i].event_name, event->name))
+			timer_handler[i].handler(data, event, cpu, timestamp, thread);
+}
+
+static int process_sample_event(event_t *event, struct perf_session *session)
+{
+	struct sample_data data;
+	struct thread *thread;
+
+	if (!(session->sample_type & PERF_SAMPLE_RAW))
+		return 0;
+
+	memset(&data, 0, sizeof(data));
+	data.time = -1;
+	data.cpu = -1;
+	data.period = -1;
+
+	event__parse_sample(event, session->sample_type, &data);
+
+	dump_printf("(IP, %d): %d/%d: %p period: %Ld\n",
+		event->header.misc,
+		data.pid, data.tid,
+		(void *)(long)data.ip,
+		(long long)data.period);
+
+	thread = perf_session__findnew(session, data.pid);
+	if (thread == NULL) {
+		pr_debug("problem processing %d event, skipping it.\n",
+			 event->header.type);
+		return -1;
+	}
+
+	dump_printf(" ... thread: %s:%d\n", thread->comm, thread->pid);
+
+	if (profile_cpu != -1 && profile_cpu != (int)data.cpu)
+		return 0;
+
+	process_raw_event(event, session, data.raw_data, data.cpu, data.time, thread);
+	return 0;
+}
+
+static int sample_type_check(struct perf_session *session)
+{
+	if (!(session->sample_type & PERF_SAMPLE_RAW)) {
+		fprintf(stderr,
+			"No trace sample to read. Did you call perf record "
+			"without -R?");
+		return -1;
+	}
+
+	return 0;
+}
+
+static struct perf_event_ops event_ops = {
+	.process_sample_event	= process_sample_event,
+	.process_comm_event	= event__process_comm,
+	.sample_type_check	= sample_type_check,
+};
+
+static int read_events(void)
+{
+	int err;
+	timer_session = perf_session__new(input_name, O_RDONLY, 0, NULL);
+
+	if (timer_session == NULL)
+		return -ENOMEM;
+
+	err = perf_session__process_events(timer_session, &event_ops);
+	perf_session__delete(timer_session);
+	return err;
+}
+
+static inline const char *get_itimer_type(int which)
+{
+	switch (which) {
+	case ITIMER_REAL:
+		return "REAL";
+	case ITIMER_VIRTUAL:
+		return "VIRTUAL";
+	case ITIMER_PROF:
+		return "PROF";
+	default:
+		return "Unknow";
+	}
+}
+
+static inline const char *get_lat_unit(struct timer_info *timer_info)
+{
+	switch (timer_info->type) {
+	case TIMER:
+		return "HZ";
+	case HRTIMER:
+		return "ns";
+	case ITIMER:
+		if (timer_info->itimer_type == ITIMER_REAL)
+			return "ns";
+		return "HZ";
+	default:
+		return "ERR";
+	}
+}
+
+static void __print_timer_lat(struct timer_info *timer_info)
+{
+	bug_nr += timer_info->bug;
+
+	if (timer_info->type == ITIMER) {
+		printf("|%-10.10s:[%7s]", ((struct thread *)(timer_info->timer))->comm,
+						   get_itimer_type(timer_info->itimer_type));
+	} else
+		printf("|%-20p", timer_info->timer);
+
+	printf("|%-10s", timer_type_string[timer_info->type]);
+	printf("|%-12.3f %3s", timer_info->timer_avg_lat, get_lat_unit(timer_info));
+	printf("|%-10llu %3s", timer_info->timer_max_lat, get_lat_unit(timer_info));
+	printf("|%-20llu", timer_info->timer_max_lat_at_ts);
+	printf("|%-16.16s|\n", timer_info->timer_max_lat_at_thread->comm);
+}
+
+static void print_timer_lat(void)
+{
+	printf("\n");
+	printf("-------------------------------------------------------------------------------------------------------\n");
+	printf("|   Timer            |   TYPE   |   Avg-latency  | Max-latency  |  Max-latency-at-TS |Max-lat-at-Task |\n");
+	timer_traversal(&timer_result, __print_timer_lat);
+}
+
+static void __print_handle_time(struct timer_info *timer_info)
+{
+	void *function;
+
+	bug_nr += timer_info->bug;
+
+	if (timer_info->type == ITIMER)
+		return;
+
+	function = timer_info->handler_max_time_at_func;
+
+	printf("|%-20p", timer_info->timer);
+	printf("|%-10s", timer_type_string[timer_info->type]);
+	printf("|%-16.3f", timer_info->handler_avg_time / 1e6);
+	printf("|%-14.3f", (double)timer_info->handler_max_time / 1e6);
+	printf("|%-20.3f", (double)timer_info->handler_max_time_at_ts / 1e9);
+	if (!function) {
+		printf("|%-16s|\n", "Not - Catch");
+		return;
+	}
+
+	printf("|%-16p|\n", function);
+}
+
+static void print_handle_time(void)
+{
+	printf("\n");
+	printf("-------------------------------------------------------------------------------------------------------\n");
+	printf("|   Timer            |   TYPE   | Avg-handle (ms)|Max-handle(ms)| Max-handle-at-TS(s)|Max-lat-at-func |\n");
+	timer_traversal(&timer_result, __print_handle_time);
+}
+
+static void __print_bug_timer(struct timer_info *timer_info)
+{
+	if (!timer_info->bug)
+		return;
+
+	if (timer_info->type == ITIMER) {
+		printf("|%-10.10s:[%7s]", ((struct thread *)(timer_info->timer))->comm,
+						   get_itimer_type(timer_info->itimer_type));
+	} else
+		printf("|%-20p", timer_info->timer);
+
+	printf("|%-10s", timer_type_string[timer_info->type]);
+	printf("|%-10d|\n", timer_info->bug);
+}
+
+static void print_bug_timer(void)
+{
+	if (bug_nr) {
+		printf("\nWarning: detect %d bug(s):\n", bug_nr);
+		printf("--------------------------------------------\n");
+		printf("|   Timer            |   TYPE   |    Bug   |\n");
+		timer_traversal(&timer_result, __print_bug_timer);
+	}
+}
+
+static void print_result(void)
+{
+	if (!print_lat && !print_handle)
+		print_lat = 1;
+
+	if (print_lat)
+		print_timer_lat();
+
+	if (print_handle) {
+		bug_nr = 0;
+		print_handle_time();
+	}
+
+	print_bug_timer();
+}
+
+static void __cmd_lat(void)
+{
+	setup_pager();
+	read_events();
+	sort_result();
+	print_result();
+}
+
+static int __cmd_record(int argc, const char **argv)
+{
+	unsigned int rec_argc, i, j;
+	const char **rec_argv;
+
+	rec_argc = ARRAY_SIZE(record_args) + argc - 1;
+	rec_argv = calloc(rec_argc + 1, sizeof(char *));
+
+	for (i = 0; i < ARRAY_SIZE(record_args); i++)
+		rec_argv[i] = strdup(record_args[i]);
+
+	for (j = 1; j < (unsigned int)argc; j++, i++)
+		rec_argv[i] = argv[j];
+
+	BUG_ON(i != rec_argc);
+
+	return cmd_record(i, rec_argv, NULL);
+}
+
+int cmd_timer(int argc, const char **argv, const char *prefix __used)
+{
+	argc = parse_options(argc, argv, timer_options, timer_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+	if (!argc)
+		usage_with_options(timer_usage, timer_options);
+
+	symbol__init(0);
+	if (!strncmp(argv[0], "rec", 3)) {
+		return __cmd_record(argc, argv);
+	} else if (!strncmp(argv[0], "lat", 3)) {
+		if (argc > 1) {
+			argc = parse_options(argc, argv, latency_options, latency_usage, 0);
+			if (argc)
+				usage_with_options(latency_usage, latency_options);
+		}
+		setup_sorting();
+		__cmd_lat();
+	} else {
+		usage_with_options(timer_usage, timer_options);
+	}
+
+	return 0;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index a3d8bf6..afbe9b1 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -29,5 +29,6 @@ extern int cmd_trace(int argc, const char **argv, const char *prefix);
 extern int cmd_version(int argc, const char **argv, const char *prefix);
 extern int cmd_probe(int argc, const char **argv, const char *prefix);
 extern int cmd_kmem(int argc, const char **argv, const char *prefix);
+extern int cmd_timer(int argc, const char **argv, const char *prefix);
 
 #endif
diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt
index 02b09ea..d49bc55 100644
--- a/tools/perf/command-list.txt
+++ b/tools/perf/command-list.txt
@@ -15,3 +15,4 @@ perf-top			mainporcelain common
 perf-trace			mainporcelain common
 perf-probe			mainporcelain common
 perf-kmem			mainporcelain common
+perf-timer			mainporcelain common
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index cf64049..b43268a 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -300,6 +300,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "sched",	cmd_sched,	0 },
 		{ "probe",	cmd_probe,	0 },
 		{ "kmem",	cmd_kmem,	0 },
+		{ "timer",	cmd_timer,	0 },
 	};
 	unsigned int i;
 	static const char ext[] = STRIP_EXTENSION;
-- 
1.6.1.2



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/4] perf_event: fix getting point
  2009-12-15 11:20   ` [PATCH 2/4] perf_event: fix getting point Xiao Guangrong
  2009-12-15 11:21     ` [PATCH 3/4] perf/timer: add document Xiao Guangrong
@ 2009-12-15 13:58     ` Frederic Weisbecker
  2009-12-16  1:03       ` Xiao Guangrong
  1 sibling, 1 reply; 42+ messages in thread
From: Frederic Weisbecker @ 2009-12-15 13:58 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Steven Rostedt, LKML

On Tue, Dec 15, 2009 at 07:20:26PM +0800, Xiao Guangrong wrote:
> The point that got by raw_field_ptr() is point to @data offset not
> tracepoint saved.
> 
> Introduce raw_field_arry() to get arry data is tracepoint save
> arry to it.


Could you please rename it into raw_field_array()?
I feel like...a kind of loss there :)



> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
> ---
>  tools/perf/builtin-sched.c          |    3 +--
>  tools/perf/util/trace-event-parse.c |   25 ++++++++++++++++++++++---
>  tools/perf/util/trace-event.h       |    3 +++
>  3 files changed, 26 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index d67f274..8df982e 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
> @@ -630,8 +630,7 @@ static void test_calibrations(void)
>  
>  #define FILL_ARRAY(ptr, array, event, data)			\
>  do {								\
> -	void *__array = raw_field_ptr(event, #array, data);	\
> -	memcpy(ptr.array, __array, sizeof(ptr.array));	\
> +	raw_field_arry(event, #array, data, ptr.array, sizeof(ptr.array));\
>  } while(0)
>  
>  #define FILL_COMMON_FIELDS(ptr, event, data)			\
> diff --git a/tools/perf/util/trace-event-parse.c b/tools/perf/util/trace-event-parse.c
> index c5c32be..b0ee64f 100644
> --- a/tools/perf/util/trace-event-parse.c
> +++ b/tools/perf/util/trace-event-parse.c
> @@ -840,7 +840,7 @@ static int event_read_fields(struct event *event, struct format_field **fields)
>  			field->flags |= FIELD_IS_ARRAY;
>  
>  			type = read_token(&token);
> -		        while (strcmp(token, "]") != 0) {
> +			while (strcmp(token, "]") != 0) {
>  				if (last_type == EVENT_ITEM &&
>  				    type == EVENT_ITEM)
>  					len = 2;
> @@ -1919,13 +1919,32 @@ raw_field_value(struct event *event, const char *name, void *data)
>  
>  void *raw_field_ptr(struct event *event, const char *name, void *data)
>  {
> +	void *ptr;
> +	unsigned long long value;
> +
> +	value = raw_field_value(event, name, data);
> +
> +	if (!value)
> +		return NULL;
> +
> +	memcpy(&ptr, &value, sizeof(ptr));



Could you perhaps just do

ptr = (void *)value; ?

This memcpy suggest we are dereferencing some field while
actually just copy a pointer.

Thanks!


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior
  2009-12-15 11:17 [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior Xiao Guangrong
  2009-12-15 11:19 ` [PATCH 1/4] trace_event: record task' real_timer in itimer_state tracepoint Xiao Guangrong
@ 2009-12-15 14:15 ` Frederic Weisbecker
  2009-12-16  1:19   ` Xiao Guangrong
  2009-12-15 14:23 ` Frederic Weisbecker
  2 siblings, 1 reply; 42+ messages in thread
From: Frederic Weisbecker @ 2009-12-15 14:15 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Steven Rostedt, LKML

On Tue, Dec 15, 2009 at 07:17:03PM +0800, Xiao Guangrong wrote:
> Hi,
> 
> We introduce 'perf timer' in this patchset, it can analyze timer
> latency and timer function handle time, the usage and result is
> like below:
> 
> # perf timer record
> # perf timer lat --print-lat --print-handle
> -------------------------------------------------------------------------------------------------------
> |   Timer            |   TYPE   |   Avg-latency  | Max-latency  |  Max-latency-at-TS |Max-lat-at-Task |
> |0xf7ad1f5c          |hrtimer   |996068.500    ns|1607650     ns|10270128658526      |init            |
> |0xf7903f04          |timer     |0.625         HZ|2           HZ|10270344082394      |swapper         |
> |0xf787a05c          |hrtimer   |200239.500    ns|359929      ns|10269316024808      |main            |
> |main      :[   PROF]|itimer    |0.000         HZ|0           HZ|10237021270557      |main            |
> |main      :[VIRTUAL]|itimer    |0.000         HZ|0           HZ|10257314773501      |main            |



Cool! This is really a good work and a good idea.

Just have some neats in mind. hrtimers and timers don't have the same latency and
granularity requirements.

As you show it, timers have an HZ granularity and hrtimers are about nanoseconds,
and mixing them up in the same array of latency report is too messy.
They don't have the same granularity/latency scope so they should
be reported separately.

Otherwise it's like mixing slothes and flies latency results after
a fly swat attack test ;-)


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior
  2009-12-15 11:17 [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior Xiao Guangrong
  2009-12-15 11:19 ` [PATCH 1/4] trace_event: record task' real_timer in itimer_state tracepoint Xiao Guangrong
  2009-12-15 14:15 ` [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior Frederic Weisbecker
@ 2009-12-15 14:23 ` Frederic Weisbecker
  2009-12-22 13:00   ` [PATCH v2 0/5] " Xiao Guangrong
  2 siblings, 1 reply; 42+ messages in thread
From: Frederic Weisbecker @ 2009-12-15 14:23 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Steven Rostedt, LKML

On Tue, Dec 15, 2009 at 07:17:03PM +0800, Xiao Guangrong wrote:
> Hi,
> 
> We introduce 'perf timer' in this patchset, it can analyze timer
> latency and timer function handle time, the usage and result is
> like below:
> 
> # perf timer record
> # perf timer lat --print-lat --print-handle
> -------------------------------------------------------------------------------------------------------
> |   Timer            |   TYPE   |   Avg-latency  | Max-latency  |  Max-latency-at-TS |Max-lat-at-Task |
> |0xf7ad1f5c          |hrtimer   |996068.500    ns|1607650     ns|10270128658526      |init            |
> |0xf7903f04          |timer     |0.625         HZ|2           HZ|10270344082394      |swapper         |
> |0xf787a05c          |hrtimer   |200239.500    ns|359929      ns|10269316024808      |main            |
> |main      :[   PROF]|itimer    |0.000         HZ|0           HZ|10237021270557      |main            |
> |main      :[VIRTUAL]|itimer    |0.000         HZ|0           HZ|10257314773501      |main            |
> 
> ......
> 
> -------------------------------------------------------------------------------------------------------
> |   Timer            |   TYPE   | Avg-handle (ms)|Max-handle(ms)| Max-handle-at-TS(s)|Max-lat-at-func |
> |0xf7ad1f5c          |hrtimer   |0.025           |0.025         |10270.129           |0xc016b5b0      |
> |0xf7903f04          |timer     |0.009           |0.011         |10260.342           |0xc0159240      |
> |0xf787a05c          |hrtimer   |0.031           |0.062         |10267.018           |0xc014cc40      |
> 
> And, in current code, it'll complain with below message when we use
> 'perf timer lat':
>  
>   # ./perf timer lat
>   Warning: unknown op '{'
>   Warning: Error: expected type 5 but read 1
>   Warning: failed to read event print fmt for hrtimer_start
>   Warning: unknown op '{'
>   Warning: Error: expected type 5 but read 1
>   Warning: failed to read event print fmt for hrtimer_expire_entry


Oh and indeed we have some event format complexities to solve
there.

cat /debug/tracing/events/timer/hrtimer_start/format

print fmt: "hrtimer=%p function=%pf expires=%llu softexpires=%llu", REC->hrtimer, REC->function, (unsigned long 
long)(((ktime_t) { .tv64 = REC->expires }).tv64), (unsigned long long)(((ktime_t) { .tv64 = REC->softexpires 
}).tv64)

We should try to simplify this, may be we should just expose
__entry->expires as is.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/4] perf/timer: 'perf timer' core code
  2009-12-15 11:22       ` [PATCH 4/4] perf/timer: 'perf timer' core code Xiao Guangrong
@ 2009-12-15 17:44         ` Thomas Gleixner
  2009-12-16  5:56           ` Xiao Guangrong
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Gleixner @ 2009-12-15 17:44 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Ingo Molnar, Peter Zijlstra, Frederic Weisbecker, Steven Rostedt, LKML


On Tue, 15 Dec 2009, Xiao Guangrong wrote:

> It's the core code of 'perf timer', we can use it to analyse timer's
> behavior:
> # perf timer record
> # perf timer lat --print-lat --print-handle
> -------------------------------------------------------------------------------------------------------
> |   Timer            |   TYPE   |   Avg-latency  | Max-latency  |  Max-latency-at-TS |Max-lat-at-Task |
> |0xf7ad1f5c          |hrtimer   |996068.500    ns|1607650     ns|10270128658526      |init            |
> |0xf7903f04          |timer     |0.625         HZ|2           HZ|10270344082394      |swapper         |
> |0xf787a05c          |hrtimer   |200239.500    ns|359929      ns|10269316024808      |main            |
> |main      :[   PROF]|itimer    |0.000         HZ|0           HZ|10237021270557      |main            |
> |main      :[VIRTUAL]|itimer    |0.000         HZ|0           HZ|10257314773501      |main            |
> 
> ......
> 
> -------------------------------------------------------------------------------------------------------
> |   Timer            |   TYPE   | Avg-handle (ms)|Max-handle(ms)| Max-handle-at-TS(s)|Max-lat-at-func |
> |0xf7ad1f5c          |hrtimer   |0.025           |0.025         |10270.129           |0xc016b5b0      |
> |0xf7903f04          |timer     |0.009           |0.011         |10260.342           |0xc0159240      |
> |0xf787a05c          |hrtimer   |0.031           |0.062         |10267.018           |0xc014cc40      |

The output is confusing in several aspects:

  1) Different time units:

     Please use consistent time units for everything. micro seconds
     are fine and we definitely do not care about nanosecond
     fractions.

  2) Timer description:
     
     Why do we have hex addresses and process names intermingled ? Why
     don't we print the process/thread name which owns the timer
     always ? [PROF/VIRTUAL] is not a property of the Timer, it
     belongs into type.
     
  3) Max-lat-at-Task:

     What does this field tell ?
   
  4) *handle:

     That should be a more descriptive name, e.g. function runtime

  5) Max-lat-at-func:

     Is this the callback function which ran the longest time ? Why
     is it named latency ? Why is it not decoded into a symbol ?

Btw, fitting the output into 80chars allows to use the tool on a non
graphical terminal as well.

Also there are other metrics of timers which are interesting and
should be retrieved by such a tool:

     number of activated timers
     number of canceled timers
     number of expired timers

in the form of simple statistics.

The canceled timers are interesting also in a list, so we can see
which timers are canceled after which time and how long before the
expiry.

> +#ifdef TDEBUG
> +static char const *timer_state_string[] = {
> +	"TIMER_INIT",
> +	"TIMER_STARE",

  If I _stare_ longer, I might understand it :)

> +#define SORT_KEY \
> +	"timer, type, avg-timer-latency, max-timer-latency,"	\
> +	" avg-handle-time, max-handle-time"

The code has at least two places where these strings are used. They
should be synced.

> +static char const	*input_name = "perf.data";
> +static char		default_sort_order[] = SORT_KEY;
> +
> +static char		*sort_order = default_sort_order;
> +static int		profile_cpu = -1;
> +static int		print_lat;
> +static int		print_handle;
> +static int		bug_nr;
> +static struct perf_session *timer_session;
> +
> +static const struct option timer_options[] = {
> +	OPT_STRING('i', "input", &input_name, "file",
> +		    "input file name"),
> +	OPT_BOOLEAN('v', "verbose", &verbose,
> +		    "be more verbose (show symbol address, etc)"),
> +	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
> +		    "dump raw trace in ASCII"),
> +	OPT_END()
> +};
> +
> +static const char * const timer_usage[] = {
> +	"perf timer [<options>] {record|latency}",
> +	NULL
> +};

Your example above uses "perf timer lat". What's correct ?

> +static const struct option latency_options[] = {
> +	OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
> +		   "sort by key(s): "SORT_KEY),

Do we really need a sort order ? A single sort key should be
sufficient.

I'd prefer to have a selector instead which lets me filter timer
types. If I debug hrtimers then I have no interest in itimers or timer
list timers.

> +enum timer_type {
> +	TIMER,
> +	HRTIMER,
> +	ITIMER,
> +};

Hmm. I don't think that's correct. You should split the itimer type
into the various subimplementations.

> +static char const *timer_type_string[] = {
> +	"timer",
> +	"hrtimer",
> +	"itimer",
> +};

Ditto.

> +struct sort_dimension {
> +	const char		*name;
> +	sort_fn_t		cmp;
> +	struct list_head	list;
> +};
> +
> +#define CMP_FUNCTION(value)						\

Please use something like "field". value confused me for a while, as I
expected a real value.

> +static int cmp_##value(struct timer_info *v1, struct timer_info *v2)	\
> +{									\
> +	if (v1->value < v2->value)					\
> +		return -1;						\
> +	if (v1->value > v2->value)					\
> +		return 1;						\
> +	return 0;							\
> +}
> +
> +#define DECLARE_SORT_DIMENSION(sort_name, field)			\

That should be DEFINE_... You are defining the function, not declaring
it.

> +CMP_FUNCTION(field)							\
> +static struct sort_dimension sort_##field = {				\
> +	.name = sort_name,						\
> +	.cmp = cmp_##field,						\
> +}
> +
> +DECLARE_SORT_DIMENSION("timer", timer);
> +DECLARE_SORT_DIMENSION("type", type);
> +DECLARE_SORT_DIMENSION("avg-timer-latency", timer_avg_lat);
> +DECLARE_SORT_DIMENSION("max-timer-latency", timer_max_lat);
> +DECLARE_SORT_DIMENSION("avg-handle-time", handler_avg_time);
> +DECLARE_SORT_DIMENSION("max-handle-time", handler_max_time);
> +
> +static struct sort_dimension *available_sorts[] = {
> +	&sort_timer,
> +	&sort_type,
> +	&sort_timer_avg_lat,
> +	&sort_timer_max_lat,
> +	&sort_handler_avg_time,
> +	&sort_handler_max_time,
> +};

This indirection horrible.

#define SORT(field,_name)	\
{	.func = cmp_##field,	\
	.name = _name, }

static struct sort_key sorts[] = {
       SORT(timer, "timer"),
       ....
};

> +#define NB_AVAILABLE_SORTS	(int)(sizeof(available_sorts) / sizeof(struct sort_dimension *))

Please use NR_ as abbreviation for number. Also the size calculation
for arrays is:

     ARRAY_SIZE(available_sorts)

> +static int sort_dimension__add(const char *tok, struct list_head *list)

  No double underscores please.

> +{
> +	int i;
> +
> +	for (i = 0; i < NB_AVAILABLE_SORTS; i++) {
> +		if (!strcmp(available_sorts[i]->name, tok)) {
> +			list_add_tail(&available_sorts[i]->list, list);
> +
> +			return 0;
> +		}
> +	}
> +
> +	return -1;
> +}
> +
> +static LIST_HEAD(sort_list);
> +
> +static void setup_sorting(void)
> +{
> +	char *tmp, *tok, *str = strdup(sort_order);

  Please hand sort_order in as an argument.

> +	for (tok = strtok_r(str, ", ", &tmp);
> +			tok; tok = strtok_r(NULL, ", ", &tmp)) {
> +		if (sort_dimension__add(tok, &sort_list) < 0) {
> +			error("Unknown --sort key: `%s'", tok);
> +			usage_with_options(latency_usage, latency_options);
> +		}
> +	}
> +
> +	free(str);
> +
> +	sort_dimension__add("timer", &default_cmp);
> +	sort_dimension__add("itimer-type", &default_cmp);

  That obviously does not match any of the sort types.

> +}
> +
> +static struct rb_root timer_root;
> +
> +static int timer_key_cmp(struct list_head *list, struct timer_info *l,
> +			 struct timer_info *r)
> +{
> +	struct sort_dimension *sort;
> +	int ret = 0;
> +
> +	BUG_ON(list_empty(list));
> +
> +	list_for_each_entry(sort, list, list) {
> +		ret = sort->cmp(l, r);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return ret;
> +}
> +
> +static void timer_traversal(struct rb_root *root,
> +			    void (*fun)(struct timer_info *))
> +{
> +	struct rb_node *next;
> +
> +	next = rb_first(root);
> +
> +	while (next) {
> +		struct timer_info *timer;
> +
> +		timer = rb_entry(next, struct timer_info, node);
> +		fun(timer);
> +		next = rb_next(next);
> +	}
> +}
> +
> +static struct timer_info *
> +__timer_search(struct rb_root *root, struct timer_info *timer,
> +	       struct list_head *_sort_list)
> +{
> +	struct rb_node *node = root->rb_node;
> +
> +	while (node) {
> +		struct timer_info *timer_info;
> +		int cmp;
> +
> +		timer_info = container_of(node, struct timer_info, node);
> +
> +		cmp = timer_key_cmp(_sort_list, timer, timer_info);
> +		if (cmp > 0)
> +			node = node->rb_left;
> +		else if (cmp < 0)
> +			node = node->rb_right;
> +		else
> +			return timer_info;

  This looks more than odd. You search for a timer in the list by
  using the compare functions which are used for sorting ?

  How should that work ?

> +	}
> +	return NULL;
> +}
> +
> +static struct timer_info *
> +timer_search(struct timer_info *timer)
> +{
> +	return __timer_search(&timer_root, timer, &default_cmp);
> +}
> +
> +static void
> +__timer_insert(struct rb_root *root, struct timer_info *data,
> +			 struct list_head *_sort_list)
> +{
> +	struct rb_node **new = &(root->rb_node), *parent = NULL;
> +
> +	while (*new) {
> +		struct timer_info *this;
> +		int cmp;
> +
> +		this = container_of(*new, struct timer_info, node);
> +		parent = *new;
> +
> +		cmp = timer_key_cmp(_sort_list, data, this);
> +
> +		if (cmp > 0)
> +			new = &((*new)->rb_left);
> +		else
> +			new = &((*new)->rb_right);
> +	}
> +
> +	rb_link_node(&data->node, parent, new);
> +	rb_insert_color(&data->node, root);
> +}
> +
> +static void timer_insert(struct timer_info *timer_info)
> +{
> +	dprintf("timer %p insert...\n", timer_info->timer);
> +	__timer_insert(&timer_root, timer_info, &default_cmp);
> +}
> +
> +static struct timer_info
> +*__timer_find(void *timer, enum timer_type type, int itimer_type)

  As I said before, that's wrong. Enumerate the itimer subtypes in the
  timer types and you get rid of all that extra itimer_type business.

> +{
> +	struct timer_info *find = NULL;
> +	struct timer_info timer_info = {
> +			.timer = timer,
> +			.itimer_type = itimer_type,
> +		};
> +
> +	find = timer_search(&timer_info);
> +	if (find && find->type != type) {
> +
> +		dprintf("find timer[%p], but type[%s] is not we expect[%s],"
> +			"set to initializtion state.\n", find->timer,
> +			timer_type_string[find->type], timer_type_string[type]);
> +
> +		find->type = type;
> +		find->bug++;
> +		find->state = TIMER_INIT;

  Why does a timer_search fail ? And why is fixing up the type if it
  is not matching a good idea ?

> +	}
> +
> +	dprintf("find timer[%p] %s\n", timer, find ? "success" : "fail");
> +	return find;
> +}
> +
> +static struct timer_info *
> +timer_findnew(void *timer, enum timer_type type, int itimer_type)
> +{
> +	struct timer_info *find;
> +
> +	find = __timer_find(timer, type, itimer_type);
> +	if (find)
> +		return find;
> +
> +	find = zalloc(sizeof(*find));
> +	if (!find)
> +		die("No memory");
> +
> +	find->type = type;
> +	find->timer = timer;
> +	find->itimer_type = itimer_type;
> +	timer_insert(find);
> +
> +	return find;
> +}
> +
> +static struct rb_root timer_result;
> +
> +static void sort_result(void)
> +{
> +	struct rb_node *node;
> +
> +	for (;;) {
> +		struct timer_info *data;
> +		node = rb_first(&timer_root);
> +		if (!node)
> +			break;
> +
> +		rb_erase(node, &timer_root);
> +		data = rb_entry(node, struct timer_info, node);
> +		data->timer_avg_lat = (double)data->timer_total_lat /
> +					(double)data->expire_nr;
> +		data->handler_avg_time = (double)data->handler_total_time /
> +					(double)data->expire_nr;
> +		if (data->expire_nr)
> +			__timer_insert(&timer_result, data, &sort_list);
> +	}
> +}
> +
> +static int timer_check_set_state(struct timer_info *timer_info,
> +				 enum timer_state new_state)
> +{
> +	enum timer_state current_state = timer_info->state;
> +	int ret = 1;
> +
> +	/*
> +	 * if current state is TIMER_INIT, it means it's the first time we
> +	 * catched, no need check.

  Check whatfor ?

> +	 */
> +	if (current_state == TIMER_INIT) {
> +		dprintf("timer[%p] initialize, set state to %s\n",
> +			timer_info->timer, timer_state_string[new_state]);
> +		ret = 0;
> +	}
> +
> +	timer_info->state = new_state;
> +	return ret;
> +}
> +
> +#define FILL_RAM_FIELD_PTR(event, field, data)				\
> +	field = raw_field_ptr(event, #field, data)
> +#define FILLL_RAW_FIELD_VALUE(event, field, data)			\
> +	field = (typeof(field))raw_field_value(event, #field, data)

  I can neither figure out what FILL_RAM_FIELD means nor why
  FILLL_RAW_FIELD needs 3 L.

> +static void *get_timer(enum timer_type type, struct event *event, void *data)
> +{
> +	if (type == HRTIMER) {
> +		void *hrtimer = NULL;
> +
> +		FILL_RAM_FIELD_PTR(event, hrtimer, data);
> +		return hrtimer;

  Shudder.

	return raw_field_ptr(event, "hrtimer", data);

> +	} else {
> +		void *timer = NULL;
> +
> +		FILL_RAM_FIELD_PTR(event, timer, data);
> +		return timer;
> +	}
> +}
> +
> +static void
> +timer_hrtimer_start(enum timer_type type, void *data, struct event *event)
> +{
> +	void *timer, *function = NULL;
> +	u64 expires;
> +	struct timer_info *timer_info;
> +
> +	timer = get_timer(type, event, data);
> +
> +	FILL_RAM_FIELD_PTR(event, function, data);
> +	FILLL_RAW_FIELD_VALUE(event, expires, data);
> +
> +	dprintf("timer-start: Timer[%p], type:%s, function:%p, expires:%llu\n",
> +		timer, timer_type_string[type], function, expires);
> +
> +	timer_info = timer_findnew(timer, type, MAX_ITIMER_TYPE);
> +	timer_check_set_state(timer_info, TIMER_START);
> +	timer_info->function = function;
> +	timer_info->timer_expire_ts = expires;
> +}
> +
> +static void
> +timer_hrtimer_expires_entry(enum timer_type type, void *data,
> +			    struct event *event, u64 timestamp,
> +			    struct thread *thread)
> +{
> +	void *timer;
> +	int ret;
> +	u64 now;
> +	s64 delta;
> +	struct timer_info *timer_info;
> +
> +	FILLL_RAW_FIELD_VALUE(event, now, data);
> +	timer = get_timer(type, event, data);
> +	timer_info = timer_findnew(timer, type, MAX_ITIMER_TYPE);
> +	timer_info->handler_entry_ts = timestamp;
> +	ret = timer_check_set_state(timer_info, TIMER_EXPIRE_ENTRY);
> +
> +	dprintf("timer-expires-entry: Timer[%p], type:%s, now:%llu, ts:%llu\n",
> +		timer, timer_type_string[type], now, timestamp);
> +	if (ret == 0)
> +		return;
> +
> +	delta = (s64)(now - timer_info->timer_expire_ts);
> +	if (delta < 0) {
> +		timer_info->bug++;
> +		dprintf("timer-expires-entry: Timer[%p], type:%s,"
> +			" timer latency < 0, bug=%d.\n", timer,
> +			timer_type_string[type], timer_info->bug);
> +		return ;
> +	}
> +
> +	timer_info->expire_nr++;
> +	timer_info->timer_last_lat = delta;
> +	timer_info->timer_total_lat += delta;
> +	if (timer_info->timer_max_lat <= (u64)delta) {

  Why can't we just make timer_max_lat s64 so we get rid of those
  silly typecasts ?

> +		timer_info->timer_max_lat = delta;
> +		timer_info->timer_max_lat_at_ts = timestamp;
> +		timer_info->timer_max_lat_at_thread = thread;

  What does us the information in which thread context the latency
  happend show ? Actually nothing. All it tells us that thread X was
  on the CPU at that point. That's useless.

> +static void
> +itimer_state_handler(void *data, struct event *event, int this_cpu __used,
> +		     u64 timestamp __used, struct thread *thread)
> +{
> +	u64 value_sec, value_usec, expires;
> +	struct timer_info *timer_info;
> +	void *timer = NULL;
> +	int which;
> +
> +	FILLL_RAW_FIELD_VALUE(event, value_sec, data);
> +	FILLL_RAW_FIELD_VALUE(event, value_usec, data);
> +	FILLL_RAW_FIELD_VALUE(event, expires, data);
> +	FILLL_RAW_FIELD_VALUE(event, which, data);
> +	FILL_RAM_FIELD_PTR(event, timer, data);

  This is complete obfuscated, while

  value_sec = get_value(data, event, "value_sec");

  is obvious.

> +	timer_info = timer_findnew(thread, ITIMER, which);
> +
> +	/* itimer canceled, we skip this event */
> +	if (!value_sec && !value_usec)
> +		return ;

  You throw away valuable information here about canceled timers.

> +	dprintf("timer-start: Timer[%p], type:%s, task:%s, expires:%llu",
> +		 timer_info->timer, "itimer", thread->comm, expires);
> +
> +	/* itimer started */
> +	timer_info->timer_expire_ts = expires;
> +	timer_check_set_state(timer_info, TIMER_START);
> +	if (which == ITIMER_REAL) {
> +		timer = timer_findnew(timer, HRTIMER, MAX_ITIMER_TYPE);
> +		timer_info->itimer_hrtimer = timer;
> +		dprintf(" hrtimer:%p\n", timer);
> +	} else {
> +		dprintf("\n");
> +	}
> +}
> +
> +static void
> +itimer_expire_handler(void *data, struct event *event, int this_cpu __used,
> +		      u64 timestamp, struct thread *thread)
> +{
> +	int which;
> +	u64 now;
> +	pid_t pid;
> +	struct thread *itimer_thread;
> +	struct timer_info *timer_info;
> +	s64 delta = 0;

  No need to initialize that.

> +	int ret;

  Please group identical types

  int ret, which;

> +struct event_handler_unit {
> +	const char *event_name;
> +	void (*handler) (void *data, struct event *event, int this_cpu,
> +			 u64 timestamp, struct thread *thread);
> +} timer_handler[] = {
> +	/* timer */
> +	{ "timer_start", 	timer_start_handler		},
> +	{ "timer_expire_entry",	timer_expire_entry_handler	},
> +	{ "timer_expire_exit",	timer_expire_exit_handler	},
> +
> +	/* hrtimer */
> +	{ "hrtimer_start",	hrtimer_start_handler		},
> +	{ "hrtimer_expire_entry", hrtimer_expire_entry_handler	},
> +	{ "hrtimer_expire_exit",  hrtimer_expire_exit_handler},
> +
> +	/* itimer */
> +	{ "itimer_state",	itimer_state_handler		},
> +	{ "itimer_expire",	itimer_expire_handler		},
> +};
> +
> +static void
> +process_raw_event(event_t *raw_event __used, struct perf_session *session __used,
> +		  void *data, int cpu, u64 timestamp, struct thread *thread)
> +{
> +	struct event *event;
> +	int type;
> +	int i;
> +	int handler_item = ARRAY_SIZE(timer_handler);

  See above.

> +	type = trace_parse_common_type(data);
> +	event = trace_find_event(type);
> +
> +	for (i = 0; i < handler_item; i++)
> +		if (!strcmp(timer_handler[i].event_name, event->name))
> +			timer_handler[i].handler(data, event, cpu, timestamp, thread);
> +}
> +
> +static int read_events(void)
> +{
> +	int err;

  New line between variables and code.

> +	timer_session = perf_session__new(input_name, O_RDONLY, 0, NULL);
> +
> +	if (timer_session == NULL)

  	if (!timer_session)

> +		return -ENOMEM;
> +
> +	err = perf_session__process_events(timer_session, &event_ops);
> +	perf_session__delete(timer_session);
> +	return err;
> +}
> +

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/4] perf_event: fix getting point
  2009-12-15 13:58     ` [PATCH 2/4] perf_event: fix getting point Frederic Weisbecker
@ 2009-12-16  1:03       ` Xiao Guangrong
  2009-12-16  1:22         ` Frederic Weisbecker
  0 siblings, 1 reply; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-16  1:03 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Steven Rostedt, LKML



Frederic Weisbecker wrote:
> On Tue, Dec 15, 2009 at 07:20:26PM +0800, Xiao Guangrong wrote:
>> The point that got by raw_field_ptr() is point to @data offset not
>> tracepoint saved.
>>
>> Introduce raw_field_arry() to get arry data is tracepoint save
>> arry to it.
> 
> 
> Could you please rename it into raw_field_array()?
> I feel like...a kind of loss there :)
> 

Ah, it's my typo, i will fix it.

> 
> 
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
>> ---
>>  tools/perf/builtin-sched.c          |    3 +--
>>  tools/perf/util/trace-event-parse.c |   25 ++++++++++++++++++++++---
>>  tools/perf/util/trace-event.h       |    3 +++
>>  3 files changed, 26 insertions(+), 5 deletions(-)
>>
>> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
>> index d67f274..8df982e 100644
>> --- a/tools/perf/builtin-sched.c
>> +++ b/tools/perf/builtin-sched.c
>> @@ -630,8 +630,7 @@ static void test_calibrations(void)
>>  
>>  #define FILL_ARRAY(ptr, array, event, data)			\
>>  do {								\
>> -	void *__array = raw_field_ptr(event, #array, data);	\
>> -	memcpy(ptr.array, __array, sizeof(ptr.array));	\
>> +	raw_field_arry(event, #array, data, ptr.array, sizeof(ptr.array));\
>>  } while(0)
>>  
>>  #define FILL_COMMON_FIELDS(ptr, event, data)			\
>> diff --git a/tools/perf/util/trace-event-parse.c b/tools/perf/util/trace-event-parse.c
>> index c5c32be..b0ee64f 100644
>> --- a/tools/perf/util/trace-event-parse.c
>> +++ b/tools/perf/util/trace-event-parse.c
>> @@ -840,7 +840,7 @@ static int event_read_fields(struct event *event, struct format_field **fields)
>>  			field->flags |= FIELD_IS_ARRAY;
>>  
>>  			type = read_token(&token);
>> -		        while (strcmp(token, "]") != 0) {
>> +			while (strcmp(token, "]") != 0) {
>>  				if (last_type == EVENT_ITEM &&
>>  				    type == EVENT_ITEM)
>>  					len = 2;
>> @@ -1919,13 +1919,32 @@ raw_field_value(struct event *event, const char *name, void *data)
>>  
>>  void *raw_field_ptr(struct event *event, const char *name, void *data)
>>  {
>> +	void *ptr;
>> +	unsigned long long value;
>> +
>> +	value = raw_field_value(event, name, data);
>> +
>> +	if (!value)
>> +		return NULL;
>> +
>> +	memcpy(&ptr, &value, sizeof(ptr));
> 
> 
> 
> Could you perhaps just do
> 
> ptr = (void *)value; ?
> 

We can't do it in x86_32:
error: cast to pointer from integer of different size

Thanks,
Xiao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior
  2009-12-15 14:15 ` [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior Frederic Weisbecker
@ 2009-12-16  1:19   ` Xiao Guangrong
  2009-12-16  7:32     ` Ingo Molnar
  0 siblings, 1 reply; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-16  1:19 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Steven Rostedt, LKML



Frederic Weisbecker wrote:
> On Tue, Dec 15, 2009 at 07:17:03PM +0800, Xiao Guangrong wrote:
>> Hi,
>>
>> We introduce 'perf timer' in this patchset, it can analyze timer
>> latency and timer function handle time, the usage and result is
>> like below:
>>
>> # perf timer record
>> # perf timer lat --print-lat --print-handle
>> -------------------------------------------------------------------------------------------------------
>> |   Timer            |   TYPE   |   Avg-latency  | Max-latency  |  Max-latency-at-TS |Max-lat-at-Task |
>> |0xf7ad1f5c          |hrtimer   |996068.500    ns|1607650     ns|10270128658526      |init            |
>> |0xf7903f04          |timer     |0.625         HZ|2           HZ|10270344082394      |swapper         |
>> |0xf787a05c          |hrtimer   |200239.500    ns|359929      ns|10269316024808      |main            |
>> |main      :[   PROF]|itimer    |0.000         HZ|0           HZ|10237021270557      |main            |
>> |main      :[VIRTUAL]|itimer    |0.000         HZ|0           HZ|10257314773501      |main            |
> 
> 
> 
> Cool! This is really a good work and a good idea.
> 
> Just have some neats in mind. hrtimers and timers don't have the same latency and
> granularity requirements.
> 
> As you show it, timers have an HZ granularity and hrtimers are about nanoseconds,
> and mixing them up in the same array of latency report is too messy.
> They don't have the same granularity/latency scope so they should
> be reported separately.
> 

Yeah, it has different unit and looks ugly :-(

but, the problem is we can't get HZ in userspace now, i'll export HZ by proc or other
way and rectify 'perf timer' output in my next work

Thanks,
Xiao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/4] perf_event: fix getting point
  2009-12-16  1:03       ` Xiao Guangrong
@ 2009-12-16  1:22         ` Frederic Weisbecker
  2009-12-16  1:32           ` Xiao Guangrong
  0 siblings, 1 reply; 42+ messages in thread
From: Frederic Weisbecker @ 2009-12-16  1:22 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Steven Rostedt, LKML

On Wed, Dec 16, 2009 at 09:03:47AM +0800, Xiao Guangrong wrote:
> >> @@ -1919,13 +1919,32 @@ raw_field_value(struct event *event, const char *name, void *data)
> >>  
> >>  void *raw_field_ptr(struct event *event, const char *name, void *data)
> >>  {
> >> +	void *ptr;
> >> +	unsigned long long value;
> >> +
> >> +	value = raw_field_value(event, name, data);
> >> +
> >> +	if (!value)
> >> +		return NULL;
> >> +
> >> +	memcpy(&ptr, &value, sizeof(ptr));
> > 
> > 
> > 
> > Could you perhaps just do
> > 
> > ptr = (void *)value; ?
> > 
> 
> We can't do it in x86_32:
> error: cast to pointer from integer of different size


You can just do:

ptr = (void *)(unsigned long)value;


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/4] perf_event: fix getting point
  2009-12-16  1:22         ` Frederic Weisbecker
@ 2009-12-16  1:32           ` Xiao Guangrong
  0 siblings, 0 replies; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-16  1:32 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Steven Rostedt, LKML



Frederic Weisbecker wrote:

>> We can't do it in x86_32:
>> error: cast to pointer from integer of different size
> 
> 
> You can just do:
> 
> ptr = (void *)(unsigned long)value;
> 

Yeah, i forget this, it work well both in 32/64 machine.

Thanks,
Xiao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/4] perf/timer: 'perf timer' core code
  2009-12-15 17:44         ` Thomas Gleixner
@ 2009-12-16  5:56           ` Xiao Guangrong
  2009-12-16 15:59             ` Thomas Gleixner
  0 siblings, 1 reply; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-16  5:56 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Peter Zijlstra, Frederic Weisbecker, Steven Rostedt, LKML

Hi Thomas,

Sorry for many mistakes(typos and bad ideas) in this patch, i'll cook it
more and be careful later. Thanks very much.

Thomas Gleixner wrote:
> The output is confusing in several aspects:
> 
>   1) Different time units:
> 
>      Please use consistent time units for everything. micro seconds
>      are fine and we definitely do not care about nanosecond
>      fractions.

OK, i'll change ns to ms, and for timer, the unit is HZ, do we have the
way to read kernel HZ in userspace? if not, i'll export HZ by proc/debugfs
or other way

> 
>   2) Timer description:
>      
>      Why do we have hex addresses and process names intermingled ? Why
>      don't we print the process/thread name which owns the timer
>      always ? [PROF/VIRTUAL] is not a property of the Timer, it
>      belongs into type.

Um, but not every timer has it's owner task, for example, if we start
a timer in interrupt handle function, this timer in not owns any tasks.
And itimer is started by userspace task so we can get it's owner, that
why i print hex address for timer/hrtimer, and print task name for itimer.

>
>   3) Max-lat-at-Task:
> 
>      What does this field tell ?

It means that which task in running when the maximum latency occurs.
but, as you noticed, this is useless, i'll remove it in next version
patch.

>    
>   4) *handle:
> 
>      That should be a more descriptive name, e.g. function runtime
> 

OK, will fix

>   5) Max-lat-at-func:
> 
>      Is this the callback function which ran the longest time ? Why
>      is it named latency ? Why is it not decoded into a symbol ?

it's my typo, i'll export it using right/better way

> 
> Btw, fitting the output into 80chars allows to use the tool on a non
> graphical terminal as well.
> 

OK, will fix

> Also there are other metrics of timers which are interesting and
> should be retrieved by such a tool:
> 
>      number of activated timers
>      number of canceled timers
>      number of expired timers
> 
> in the form of simple statistics.
> 

OK. will support it

> The canceled timers are interesting also in a list, so we can see
> which timers are canceled after which time and how long before the
> expiry.
> 

Um, i'll cook timer tracepoints to get the time when timer canceled,
and support this function.

>> +static const char * const timer_usage[] = {
>> +	"perf timer [<options>] {record|latency}",
>> +	NULL
>> +};
> 
> Your example above uses "perf timer lat". What's correct ?
> 

Actually, we only compare frontal 3 characters:
	strncmp(argv[0], "lat", 3)

And 'perf sched' and other commands also use this way.

>> +static const struct option latency_options[] = {
>> +	OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
>> +		   "sort by key(s): "SORT_KEY),
> 
> Do we really need a sort order ? A single sort key should be
> sufficient.
> 

i think it's necessary.

for example, if we interest in timer's max latency, then we can
use '-s max-timer-latency' to sort it.

And, if it has many timer with the some max latency, then we can
use '-s max-timer-latency,avg-timer-latency' to sort

> I'd prefer to have a selector instead which lets me filter timer
> types. If I debug hrtimers then I have no interest in itimers or timer
> list timers.
> 

OK, will support this filter

>> +static LIST_HEAD(sort_list);
>> +
>> +static void setup_sorting(void)
>> +{
>> +	char *tmp, *tok, *str = strdup(sort_order);
> 
>   Please hand sort_order in as an argument.
> 

Sorry for my stupid question:
'sort_order' is a global variable and setup_sorting() only called
one time, why need hand sort_order in as an argument?

>> +static struct timer_info *
>> +__timer_search(struct rb_root *root, struct timer_info *timer,
>> +	       struct list_head *_sort_list)
>> +{
>> +	struct rb_node *node = root->rb_node;
>> +
>> +	while (node) {
>> +		struct timer_info *timer_info;
>> +		int cmp;
>> +
>> +		timer_info = container_of(node, struct timer_info, node);
>> +
>> +		cmp = timer_key_cmp(_sort_list, timer, timer_info);
>> +		if (cmp > 0)
>> +			node = node->rb_left;
>> +		else if (cmp < 0)
>> +			node = node->rb_right;
>> +		else
>> +			return timer_info;
> 
>   This looks more than odd. You search for a timer in the list by
>   using the compare functions which are used for sorting ?
> 
>   How should that work ?
> 

We put/get timer in a rb-tree base on the specify order, for example:
we default use this order:

sort_dimension__add("timer", &default_cmp);
sort_dimension__add("itimer-type", &default_cmp);

if timer_info->timer is bigger, we put it to left child, littler to right
child, if the timer_info->timer is the same, then we compare
timer_info->itimer_type.

>> +{
>> +	struct timer_info *find = NULL;
>> +	struct timer_info timer_info = {
>> +			.timer = timer,
>> +			.itimer_type = itimer_type,
>> +		};
>> +
>> +	find = timer_search(&timer_info);
>> +	if (find && find->type != type) {
>> +
>> +		dprintf("find timer[%p], but type[%s] is not we expect[%s],"
>> +			"set to initializtion state.\n", find->timer,
>> +			timer_type_string[find->type], timer_type_string[type]);
>> +
>> +		find->type = type;
>> +		find->bug++;
>> +		find->state = TIMER_INIT;
> 
>   Why does a timer_search fail ? And why is fixing up the type if it
>   is not matching a good idea ?
> 

We search timer base on timer_info->timer and timer_info->itimer_type(not timer_info->type),
if we find the timer's type is changed(for example, the timer is "ITIMER" before, and change
to "HRTIMER" later), is should a bug. this case is hardly to happen but should catch it.

>> +static void *get_timer(enum timer_type type, struct event *event, void *data)
>> +{
>> +	if (type == HRTIMER) {
>> +		void *hrtimer = NULL;
>> +
>> +		FILL_RAM_FIELD_PTR(event, hrtimer, data);
>> +		return hrtimer;
> 
>   Shudder.
> 
> 	return raw_field_ptr(event, "hrtimer", data);
> 

Yeah, it's a clear way.

>> +static void
>> +itimer_state_handler(void *data, struct event *event, int this_cpu __used,
>> +		     u64 timestamp __used, struct thread *thread)
>> +{
>> +	u64 value_sec, value_usec, expires;
>> +	struct timer_info *timer_info;
>> +	void *timer = NULL;
>> +	int which;
>> +
>> +	FILLL_RAW_FIELD_VALUE(event, value_sec, data);
>> +	FILLL_RAW_FIELD_VALUE(event, value_usec, data);
>> +	FILLL_RAW_FIELD_VALUE(event, expires, data);
>> +	FILLL_RAW_FIELD_VALUE(event, which, data);
>> +	FILL_RAM_FIELD_PTR(event, timer, data);
> 
>   This is complete obfuscated, while
> 
>   value_sec = get_value(data, event, "value_sec");
> 
>   is obvious.
> 

Sorry, i cannot get this. As i understand:

#define FILL_RAW_FIELD_VALUE(event, field, data)		\
	field = (typeof(field))raw_field_value(event, #field, data)

After FILL_RAW_FIELD_VALUE(event, value_sec, data) expanded, it's:
  value_sec = raw_field_value(event, "value_sec", data)

Why it's wrong? :-(

>> +	timer_info = timer_findnew(thread, ITIMER, which);
>> +
>> +	/* itimer canceled, we skip this event */
>> +	if (!value_sec && !value_usec)
>> +		return ;
> 
>   You throw away valuable information here about canceled timers.
> 

We are not catch *_cancel event in this patch, i'll catch it to support
'number of canceled timers' in the next version.

Thanks,
Xiao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior
  2009-12-16  1:19   ` Xiao Guangrong
@ 2009-12-16  7:32     ` Ingo Molnar
  2009-12-16  7:40       ` Xiao Guangrong
  0 siblings, 1 reply; 42+ messages in thread
From: Ingo Molnar @ 2009-12-16  7:32 UTC (permalink / raw)
  To: Xiao Guangrong, Paul Mackerras
  Cc: Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, LKML


* Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> wrote:

> 
> 
> Frederic Weisbecker wrote:
> > On Tue, Dec 15, 2009 at 07:17:03PM +0800, Xiao Guangrong wrote:
> >> Hi,
> >>
> >> We introduce 'perf timer' in this patchset, it can analyze timer
> >> latency and timer function handle time, the usage and result is
> >> like below:
> >>
> >> # perf timer record
> >> # perf timer lat --print-lat --print-handle
> >> -------------------------------------------------------------------------------------------------------
> >> |   Timer            |   TYPE   |   Avg-latency  | Max-latency  |  Max-latency-at-TS |Max-lat-at-Task |
> >> |0xf7ad1f5c          |hrtimer   |996068.500    ns|1607650     ns|10270128658526      |init            |
> >> |0xf7903f04          |timer     |0.625         HZ|2           HZ|10270344082394      |swapper         |
> >> |0xf787a05c          |hrtimer   |200239.500    ns|359929      ns|10269316024808      |main            |
> >> |main      :[   PROF]|itimer    |0.000         HZ|0           HZ|10237021270557      |main            |
> >> |main      :[VIRTUAL]|itimer    |0.000         HZ|0           HZ|10257314773501      |main            |
> > 
> > 
> > 
> > Cool! This is really a good work and a good idea.
> > 
> > Just have some neats in mind. hrtimers and timers don't have the same latency and
> > granularity requirements.
> > 
> > As you show it, timers have an HZ granularity and hrtimers are about nanoseconds,
> > and mixing them up in the same array of latency report is too messy.
> > They don't have the same granularity/latency scope so they should
> > be reported separately.
> > 
> 
> Yeah, it has different unit and looks ugly :-(
> 
> but, the problem is we can't get HZ in userspace now, i'll export HZ by proc 
> or other way and rectify 'perf timer' output in my next work

We could export HZ as part of the ftrace event format file, in 
/debug/tracing/events/timers/ or so. Not sure what the best way would be to do 
it (without causing runtime overhead in event processing), but we should 
reduce our reliance on /proc and export such instrumentation information via a 
single, coherent interface.

There's a couple of other 'system settings' values that would be useful to 
expose in such a way - for example the number of online CPUs. (in fact that 
could be exposed as a set of online/offline events plus a way to read the 
current count as well)

	Ingo

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior
  2009-12-16  7:32     ` Ingo Molnar
@ 2009-12-16  7:40       ` Xiao Guangrong
  2009-12-16  7:46         ` Ingo Molnar
  0 siblings, 1 reply; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-16  7:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Paul Mackerras, Frederic Weisbecker, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, LKML



Ingo Molnar wrote:

>>>
>> Yeah, it has different unit and looks ugly :-(
>>
>> but, the problem is we can't get HZ in userspace now, i'll export HZ by proc 
>> or other way and rectify 'perf timer' output in my next work
> 
> We could export HZ as part of the ftrace event format file, in 
> /debug/tracing/events/timers/ or so. Not sure what the best way would be to do 
> it (without causing runtime overhead in event processing), but we should 
> reduce our reliance on /proc and export such instrumentation information via a 
> single, coherent interface.
> 

Hi ingo,

Um, if we export this value in tracepoint format, it should add for every timer's
tracepoint.

> There's a couple of other 'system settings' values that would be useful to 
> expose in such a way - for example the number of online CPUs. (in fact that 
> could be exposed as a set of online/offline events plus a way to read the 
> current count as well)
> 


How about export those values in debugfs/tracing/*? perf tracepoint self depend on
this filesystem. 

Thanks,
Xiao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior
  2009-12-16  7:40       ` Xiao Guangrong
@ 2009-12-16  7:46         ` Ingo Molnar
  0 siblings, 0 replies; 42+ messages in thread
From: Ingo Molnar @ 2009-12-16  7:46 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Paul Mackerras, Frederic Weisbecker, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, LKML


* Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> wrote:

> 
> 
> Ingo Molnar wrote:
> 
> >>>
> >> Yeah, it has different unit and looks ugly :-(
> >>
> >> but, the problem is we can't get HZ in userspace now, i'll export HZ by proc 
> >> or other way and rectify 'perf timer' output in my next work
> > 
> > We could export HZ as part of the ftrace event format file, in 
> > /debug/tracing/events/timers/ or so. Not sure what the best way would be to do 
> > it (without causing runtime overhead in event processing), but we should 
> > reduce our reliance on /proc and export such instrumentation information via a 
> > single, coherent interface.
> 
> Hi ingo,
> 
> Um, if we export this value in tracepoint format, it should add for every 
> timer's tracepoint.

Not if it's a special timer tracepoint 'HZ' dedicated for this. (We might even 
get events through it on dynticks kernels, or on dynamic-sched-tick systems 
[in the future] when the scheduler HZ value gets modified - or so.)

> > There's a couple of other 'system settings' values that would be useful to 
> > expose in such a way - for example the number of online CPUs. (in fact 
> > that could be exposed as a set of online/offline events plus a way to read 
> > the current count as well)
> 
> How about export those values in debugfs/tracing/*? perf tracepoint self 
> depend on this filesystem.

No - in the long run most of those APIs will go away, and what remains are the 
ftrace event format descriptors in essence. (Which we then might move to 
/proc/events/ or so.)

	Ingo

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/4] perf/timer: 'perf timer' core code
  2009-12-16  5:56           ` Xiao Guangrong
@ 2009-12-16 15:59             ` Thomas Gleixner
  2009-12-17  7:26               ` Xiao Guangrong
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Gleixner @ 2009-12-16 15:59 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Ingo Molnar, Peter Zijlstra, Frederic Weisbecker, Steven Rostedt, LKML

Xiao,

On Wed, 16 Dec 2009, Xiao Guangrong wrote:
>
> Sorry for many mistakes(typos and bad ideas) in this patch, i'll cook it
> more and be careful later. Thanks very much.

Nothing to be sorry about. That's why we review code.
 
> Thomas Gleixner wrote:
> > The output is confusing in several aspects:
> >
> >   2) Timer description:
> >      
> >      Why do we have hex addresses and process names intermingled ? Why
> >      don't we print the process/thread name which owns the timer
> >      always ? [PROF/VIRTUAL] is not a property of the Timer, it
> >      belongs into type.
> 
> Um, but not every timer has it's owner task, for example, if we start
> a timer in interrupt handle function, this timer in not owns any tasks.
> And itimer is started by userspace task so we can get it's owner, that
> why i print hex address for timer/hrtimer, and print task name for itimer.

Well, lot's of timers have an owner task. At least all user space
related ones. And if the timer is rearmed in interrupt context, then
this does not change the owner at all.

> >> +static LIST_HEAD(sort_list);
> >> +
> >> +static void setup_sorting(void)
> >> +{
> >> +	char *tmp, *tok, *str = strdup(sort_order);
> > 
> >   Please hand sort_order in as an argument.
> > 
> 
> Sorry for my stupid question:
> 'sort_order' is a global variable and setup_sorting() only called
> one time, why need hand sort_order in as an argument?

Fair enough.

> >   How should that work ?
> > 
> 
> We put/get timer in a rb-tree base on the specify order, for example:
> we default use this order:
> 
> sort_dimension__add("timer", &default_cmp);
> sort_dimension__add("itimer-type", &default_cmp);
> 
> if timer_info->timer is bigger, we put it to left child, littler to right
> child, if the timer_info->timer is the same, then we compare
> timer_info->itimer_type.

Hmm, I wonder whether a hash table would be more efficient for the
recording side.

> >> +{
> >> +	struct timer_info *find = NULL;
> >> +	struct timer_info timer_info = {
> >> +			.timer = timer,
> >> +			.itimer_type = itimer_type,
> >> +		};
> >> +
> >> +	find = timer_search(&timer_info);
> >> +	if (find && find->type != type) {
> >> +
> >> +		dprintf("find timer[%p], but type[%s] is not we expect[%s],"
> >> +			"set to initializtion state.\n", find->timer,
> >> +			timer_type_string[find->type], timer_type_string[type]);
> >> +
> >> +		find->type = type;
> >> +		find->bug++;
> >> +		find->state = TIMER_INIT;
> > 
> >   Why does a timer_search fail ? And why is fixing up the type if it
> >   is not matching a good idea ?
> > 

> We search timer base on timer_info->timer and
> timer_info->itimer_type(not timer_info->type), if we find the
> timer's type is changed(for example, the timer is "ITIMER" before,
> and change to "HRTIMER" later), is should a bug. this case is hardly
> to happen but should catch it.

No, it's not a bug at all. You _can_ have a hrtimer and a timer_list
timer at the same address in a trace. There are two ways to make that
happen:

 1) kmalloc'ed memory contains a timer_list. timer operation is done,
    memory is kfreed. Now another kmalloc gets the just freed memory
    and has a hrtimer at the same address which was used by the
    timer_list before.

 2) timer_list and hrtimer are also allocated on stack. There is no
    guarantee that they are at different addresses.
 
> >> +static void *get_timer(enum timer_type type, struct event *event, void *data)
> >> +{
> >> +	if (type == HRTIMER) {
> >> +		void *hrtimer = NULL;
> >> +
> >> +		FILL_RAM_FIELD_PTR(event, hrtimer, data);
> >> +		return hrtimer;
> > 
> >   Shudder.
> > 
> > 	return raw_field_ptr(event, "hrtimer", data);
> > 
> 
> Yeah, it's a clear way.
> 
> >> +static void
> >> +itimer_state_handler(void *data, struct event *event, int this_cpu __used,
> >> +		     u64 timestamp __used, struct thread *thread)
> >> +{
> >> +	u64 value_sec, value_usec, expires;
> >> +	struct timer_info *timer_info;
> >> +	void *timer = NULL;
> >> +	int which;
> >> +
> >> +	FILLL_RAW_FIELD_VALUE(event, value_sec, data);
> >> +	FILLL_RAW_FIELD_VALUE(event, value_usec, data);
> >> +	FILLL_RAW_FIELD_VALUE(event, expires, data);
> >> +	FILLL_RAW_FIELD_VALUE(event, which, data);
> >> +	FILL_RAM_FIELD_PTR(event, timer, data);
> > 
> >   This is complete obfuscated, while
> > 
> >   value_sec = get_value(data, event, "value_sec");
> > 
> >   is obvious.
> > 
> 
> Sorry, i cannot get this. As i understand:
> 
> #define FILL_RAW_FIELD_VALUE(event, field, data)		\
> 	field = (typeof(field))raw_field_value(event, #field, data)
> 
> After FILL_RAW_FIELD_VALUE(event, value_sec, data) expanded, it's:
>   value_sec = raw_field_value(event, "value_sec", data)
> 
> Why it's wrong? :-(

Simply because the macro hides the fact that this is an assignment of
a value to a variable. That makes the code harder to read.

 	FILLL_RAW_FIELD_VALUE(event, value_sec, data);
vs.	
	value_sec = get_value(data, event, "value_sec");

The latter is fast to parse and entirely clear.

Btw, you agreed above that the open coded call of raw_field_value()
is clearer than the macro magic. :)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/4] perf/timer: 'perf timer' core code
  2009-12-16 15:59             ` Thomas Gleixner
@ 2009-12-17  7:26               ` Xiao Guangrong
  0 siblings, 0 replies; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-17  7:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Peter Zijlstra, Frederic Weisbecker, Steven Rostedt, LKML

Hi Thomas,

Thomas Gleixner wrote:
> 
> Nothing to be sorry about. That's why we review code.
>  

Thanks.

>> Um, but not every timer has it's owner task, for example, if we start
>> a timer in interrupt handle function, this timer in not owns any tasks.
>> And itimer is started by userspace task so we can get it's owner, that
>> why i print hex address for timer/hrtimer, and print task name for itimer.
> 
> Well, lot's of timers have an owner task. At least all user space
> related ones. And if the timer is rearmed in interrupt context, then
> this does not change the owner at all.
> 

Sorry, i'm confused, for example, has below sequence:

Task1 running----->| (interrupt)
                   |------------- start timerT(start timerT in interrupt handler)

......
( After a while, schedule to another task, and interruption coming )
......

Task2 running----->| (interrupt)
                   |------------- start timerT again

Then, which task is the timerT owner?

Am I missed something?

>>>   How should that work ?
>>>
>> We put/get timer in a rb-tree base on the specify order, for example:
>> we default use this order:
>>
>> sort_dimension__add("timer", &default_cmp);
>> sort_dimension__add("itimer-type", &default_cmp);
>>
>> if timer_info->timer is bigger, we put it to left child, littler to right
>> child, if the timer_info->timer is the same, then we compare
>> timer_info->itimer_type.
> 
> Hmm, I wonder whether a hash table would be more efficient for the
> recording side.
> 

Um. i'll record it address your way.

> 
>> We search timer base on timer_info->timer and
>> timer_info->itimer_type(not timer_info->type), if we find the
>> timer's type is changed(for example, the timer is "ITIMER" before,
>> and change to "HRTIMER" later), is should a bug. this case is hardly
>> to happen but should catch it.
> 
> No, it's not a bug at all. You _can_ have a hrtimer and a timer_list
> timer at the same address in a trace. There are two ways to make that
> happen:
> 
>  1) kmalloc'ed memory contains a timer_list. timer operation is done,
>     memory is kfreed. Now another kmalloc gets the just freed memory
>     and has a hrtimer at the same address which was used by the
>     timer_list before.
> 
>  2) timer_list and hrtimer are also allocated on stack. There is no
>     guarantee that they are at different addresses.
>  

Yeah, my mistake.

> Simply because the macro hides the fact that this is an assignment of
> a value to a variable. That makes the code harder to read.
> 
>  	FILLL_RAW_FIELD_VALUE(event, value_sec, data);
> vs.	
> 	value_sec = get_value(data, event, "value_sec");
> 
> The latter is fast to parse and entirely clear.
> 

Yeah.

> Btw, you agreed above that the open coded call of raw_field_value()
> is clearer than the macro magic. :)
> 

Sorry, i misunderstand your mean before.

Thanks,
Xiao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 0/5] perf_event: introduce 'perf timer' to analyze timer's behavior
  2009-12-15 14:23 ` Frederic Weisbecker
@ 2009-12-22 13:00   ` Xiao Guangrong
  2009-12-22 13:01     ` [PATCH v2 1/5] perf_event: fix getting point Xiao Guangrong
                       ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-22 13:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, Paul Mackerras, LKML

Hi,

We introduce 'perf timer' in this patchset, it can report
number of activated/expired/canceled timers, timer latency
and timer function runtime.

Changlog v1->v2:
- export HZ in timer's tracepoint, we can use it to get
  TIMER/ITIMER_VIRTUAL/ITIMER_PROF exact latency as Ingo's
  suggestion

- rename 'perf timer latency' to 'perf timer report',
  because it not only show timer latency but also show timer
  function runtime, timer activated/canceled/expired statistics
  and canceled timer list(TODO)

Below changes are all from Thomas's suggestion:
- fix a lot of typos and bad ideas

- use hash table instead of RB tree to record timer

- let output information more readable

- report number of activated/canceled/expired timers

- support to filter timer types:
  '--type' can do it

TODO:
- Show canceled timer list that is suggested by Thomas

- Any suggestion are welcome.

===============================================
Usage:
- record timer events:
  #perf timer record
  #^C

- show timer statistics:
  #perf timer report
 
Statistics
==========
Activated timers number:
	HRTIMER: 37199 TIMER: 828 ITIMER_PROF: 2 ITIMER_VIRTUAL: 2 ITIMER_REAL: 18 

Expired timers number:
 	HRTIMER: 34912 TIMER: 617 ITIMER_PROF: 1 ITIMER_VIRTUAL: 1 ITIMER_REAL: 1 

Canceled timers number:
  	HRTIMER: 37200 TIMER: 829 ITIMER_PROF: 1 ITIMER_VIRTUAL: 1 ITIMER_REAL: 17
 
- show timer latency:
  #perf timer report --print-lat
                            Timer Latency List
[ Abbreviations: Avg = average; lat = latency; ts = timestamp ]
-------------------------------------------------------------------------
     Timer     |     TYPE      | Avg-lat   ms| Max-lat   ms| Max-lat-ts s
-------------------------------------------------------------------------
smbd           |HRTIMER        |1.563        |1.563        |40666.379    
irqbalance     |HRTIMER        |0.842        |2.484        |40670.231    
	...... 
vi             |HRTIMER        |0.123        |0.123        |40637.065    
events/1       |TIMER          |168.420      |1008.000     |40677.006    
events/0       |TIMER          |87.759       |1008.000     |40618.009    
	......
main           |ITIMER_PROF    |0.000        |0.000        |40661.023    
main           |ITIMER_VIRTUAL |0.000        |0.000        |40642.829    
main           |ITIMER_REAL    |0.133        |0.133        |40622.731 


- show timer function runtime:
 #perf timer report --print-runtime
                            Timer Function Runtime List
[ Abbreviations: Avg = average; func = function; rt = runtime; ts = timestamp ]
-------------------------------------------------------------------------------
     Timer     | Type  |Avg-rt ms|Max-rt ms|      Max-rt-func      | Max-rt-ts s
-------------------------------------------------------------------------------
smbd           |HRTIMER|0.020    |0.020    |hrtimer_wakeup         |40666.379
irqbalance     |HRTIMER|0.012    |0.020    |hrtimer_wakeup         |40670.231
	......
events/1       |TIMER  |0.009    |0.017    |delayed_work_timer_fn  |40646.090
events/0       |TIMER  |0.009    |0.091    |delayed_work_timer_fn  |40651.105

- you can use '--print-lat' and '--print-runtime' at the same time:
 #perf timer report --print-runtime --print-lat

                          Timer Latency List
[ Abbreviations: Avg = average; lat = latency; ts = timestamp ]
-------------------------------------------------------------------------
     Timer     |     TYPE      | Avg-lat   ms| Max-lat   ms| Max-lat-ts s
-------------------------------------------------------------------------
smbd           |HRTIMER        |1.563        |1.563        |40666.379    
irqbalance     |HRTIMER        |0.842        |2.484        |40670.231    
	......
	                       Timer Function Runtime List
[ Abbreviations: Avg = average; func = function; rt = runtime; ts = timestamp ]
-------------------------------------------------------------------------------
     Timer     | Type  |Avg-rt ms|Max-rt ms|      Max-rt-func      | Max-rt-ts s
-------------------------------------------------------------------------------
smbd           |HRTIMER|0.020    |0.020    |hrtimer_wakeup         |40666.379
irqbalance     |HRTIMER|0.012    |0.020    |hrtimer_wakeup         |40670.231

- Select timer type which you are interesting:
  #perf timer report --print-lat --type timer
                          Timer Latency List
[ Abbreviations: Avg = average; lat = latency; ts = timestamp ]
-------------------------------------------------------------------------
     Timer     |     TYPE      | Avg-lat   ms| Max-lat   ms| Max-lat-ts s
-------------------------------------------------------------------------
events/1       |TIMER          |168.420      |1008.000     |40677.006    
events/0       |TIMER          |87.759       |1008.000     |40618.009    
events/1       |TIMER          |57.000       |153.000      |40634.571   

You can specify more types, such as: '--type timer,hrtimer'


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 1/5] perf_event: fix getting point
  2009-12-22 13:00   ` [PATCH v2 0/5] " Xiao Guangrong
@ 2009-12-22 13:01     ` Xiao Guangrong
  2009-12-22 13:03     ` [PATCH v2 2/5]: trace_event: export HZ in timer's tracepoint format Xiao Guangrong
  2009-12-22 13:04     ` [PATCH v2 3/5] trace_event: record task' real_timer in itimer_state tracepoint Xiao Guangrong
  2 siblings, 0 replies; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-22 13:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, Paul Mackerras, LKML

The point that got by raw_field_ptr() is point to @data offset not
tracepoint saved.

Introduce raw_field_array() to get array data

Changlog v1->v2:
- fix the typo 'arry' -> 'array' that is pointed out by Frederic Weisbecker

- use '(void *)(unsigned long)value' instead of 'memcpy()' as
  Frederic Weisbecker's suggestion

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 tools/perf/builtin-sched.c          |    3 +--
 tools/perf/util/trace-event-parse.c |   23 ++++++++++++++++++++---
 tools/perf/util/trace-event.h       |    3 +++
 3 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 80209df..c083e88 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -630,8 +630,7 @@ static void test_calibrations(void)
 
 #define FILL_ARRAY(ptr, array, event, data)			\
 do {								\
-	void *__array = raw_field_ptr(event, #array, data);	\
-	memcpy(ptr.array, __array, sizeof(ptr.array));	\
+	raw_field_array(event, #array, data, ptr.array, sizeof(ptr.array));\
 } while(0)
 
 #define FILL_COMMON_FIELDS(ptr, event, data)			\
diff --git a/tools/perf/util/trace-event-parse.c b/tools/perf/util/trace-event-parse.c
index c5c32be..037bbe3 100644
--- a/tools/perf/util/trace-event-parse.c
+++ b/tools/perf/util/trace-event-parse.c
@@ -840,7 +840,7 @@ static int event_read_fields(struct event *event, struct format_field **fields)
 			field->flags |= FIELD_IS_ARRAY;
 
 			type = read_token(&token);
-		        while (strcmp(token, "]") != 0) {
+			while (strcmp(token, "]") != 0) {
 				if (last_type == EVENT_ITEM &&
 				    type == EVENT_ITEM)
 					len = 2;
@@ -1919,13 +1919,30 @@ raw_field_value(struct event *event, const char *name, void *data)
 
 void *raw_field_ptr(struct event *event, const char *name, void *data)
 {
+	unsigned long long value;
+
+	value = raw_field_value(event, name, data);
+
+	if (!value)
+		return NULL;
+
+	return (void *)(unsigned long)value;
+}
+
+unsigned long long
+raw_field_array(struct event *event, const char *name, void *data,
+	       void *array, int array_size)
+{
+	int len;
 	struct format_field *field;
 
 	field = find_any_field(event, name);
 	if (!field)
-		return NULL;
+		return 0ULL;
 
-	return data + field->offset;
+	len = min(array_size, field->size);
+	memcpy(array, data + field->offset, len);
+	return len;
 }
 
 static int get_common_info(const char *type, int *offset, int *size)
diff --git a/tools/perf/util/trace-event.h b/tools/perf/util/trace-event.h
index 6ad4056..c3b5c39 100644
--- a/tools/perf/util/trace-event.h
+++ b/tools/perf/util/trace-event.h
@@ -255,6 +255,9 @@ unsigned long long read_size(void *ptr, int size);
 unsigned long long
 raw_field_value(struct event *event, const char *name, void *data);
 void *raw_field_ptr(struct event *event, const char *name, void *data);
+unsigned long long
+raw_field_array(struct event *event, const char *name, void *data,
+	       void *array, int array_size);
 unsigned long long eval_flag(const char *flag);
 
 int read_tracing_data(int fd, struct perf_event_attr *pattrs, int nb_events);
-- 
1.6.1.2



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 2/5]: trace_event: export HZ in timer's tracepoint format
  2009-12-22 13:00   ` [PATCH v2 0/5] " Xiao Guangrong
  2009-12-22 13:01     ` [PATCH v2 1/5] perf_event: fix getting point Xiao Guangrong
@ 2009-12-22 13:03     ` Xiao Guangrong
  2009-12-22 13:20       ` Xiao Guangrong
  2009-12-28  7:54       ` Ingo Molnar
  2009-12-22 13:04     ` [PATCH v2 3/5] trace_event: record task' real_timer in itimer_state tracepoint Xiao Guangrong
  2 siblings, 2 replies; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-22 13:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, Paul Mackerras, LKML

Export HZ in timer's tracepoint, we can use it to get
TIMER/ITIMER_VIRTUAL/ITIMER_PROF exact latency and it's
suggested by Ingo

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 include/trace/events/timer.h |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index 13ec15a..7749ae5 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -74,14 +74,17 @@ TRACE_EVENT(timer_expire_entry,
 	TP_STRUCT__entry(
 		__field( void *,	timer	)
 		__field( unsigned long,	now	)
+		__field( int,		hz	)
 	),
 
 	TP_fast_assign(
 		__entry->timer		= timer;
 		__entry->now		= jiffies;
+		__entry->hz		= HZ;
 	),
 
-	TP_printk("timer=%p now=%lu", __entry->timer, __entry->now)
+	TP_printk("timer=%p now=%lu HZ=%d", __entry->timer, __entry->now,
+		  __entry->hz)
 );
 
 /**
-- 
1.6.1.2



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 3/5] trace_event: record task' real_timer in itimer_state tracepoint
  2009-12-22 13:00   ` [PATCH v2 0/5] " Xiao Guangrong
  2009-12-22 13:01     ` [PATCH v2 1/5] perf_event: fix getting point Xiao Guangrong
  2009-12-22 13:03     ` [PATCH v2 2/5]: trace_event: export HZ in timer's tracepoint format Xiao Guangrong
@ 2009-12-22 13:04     ` Xiao Guangrong
  2009-12-22 13:06       ` [PATCH v2 4/5] perf/timer: add document for 'perf timer' Xiao Guangrong
  2 siblings, 1 reply; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-22 13:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, Paul Mackerras, LKML

When itimer is ITIMER_REAL, we should cooperate with hrtimer's
tracepoit to get it's latency, this patch can let us get itimer's
hrtimer easy

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 include/trace/events/timer.h |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index 9496b96..13ec15a 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -285,6 +285,7 @@ TRACE_EVENT(itimer_state,
 
 	TP_STRUCT__entry(
 		__field(	int,		which		)
+		__field(	void *,		timer		)
 		__field(	cputime_t,	expires		)
 		__field(	long,		value_sec	)
 		__field(	long,		value_usec	)
@@ -294,6 +295,8 @@ TRACE_EVENT(itimer_state,
 
 	TP_fast_assign(
 		__entry->which		= which;
+		__entry->timer          = which == ITIMER_REAL ?
+					  &current->signal->real_timer : NULL;
 		__entry->expires	= expires;
 		__entry->value_sec	= value->it_value.tv_sec;
 		__entry->value_usec	= value->it_value.tv_usec;
@@ -301,8 +304,9 @@ TRACE_EVENT(itimer_state,
 		__entry->interval_usec	= value->it_interval.tv_usec;
 	),
 
-	TP_printk("which=%d expires=%llu it_value=%ld.%ld it_interval=%ld.%ld",
-		  __entry->which, (unsigned long long)__entry->expires,
+	TP_printk("which=%d timer=%p expires=%llu it_value=%ld.%ld it_interval="
+		  "%ld.%ld", __entry->which, __entry->timer,
+		  (unsigned long long)__entry->expires,
 		  __entry->value_sec, __entry->value_usec,
 		  __entry->interval_sec, __entry->interval_usec)
 );
-- 
1.6.1.2



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 4/5] perf/timer: add document for 'perf timer'
  2009-12-22 13:04     ` [PATCH v2 3/5] trace_event: record task' real_timer in itimer_state tracepoint Xiao Guangrong
@ 2009-12-22 13:06       ` Xiao Guangrong
  2009-12-22 13:08         ` [PATCH v2 5/5] perf/timer: add 'perf timer' core code Xiao Guangrong
  0 siblings, 1 reply; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-22 13:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, Paul Mackerras, LKML

Add document for 'perf timer', it let people know how to use it

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 tools/perf/Documentation/perf-timer.txt |   50 +++++++++++++++++++++++++++++++
 1 files changed, 50 insertions(+), 0 deletions(-)
 create mode 100644 tools/perf/Documentation/perf-timer.txt

diff --git a/tools/perf/Documentation/perf-timer.txt b/tools/perf/Documentation/perf-timer.txt
new file mode 100644
index 0000000..5bd26d4
--- /dev/null
+++ b/tools/perf/Documentation/perf-timer.txt
@@ -0,0 +1,50 @@
+perf-timer(1)
+==============
+
+NAME
+----
+perf-timer - Tool to trace/measure timer properties
+
+SYNOPSIS
+--------
+[verse]
+'perf timer' {record|report} [<options>]
+
+DESCRIPTION
+-----------
+There are two variants of perf timer:
+
+  'perf timer record <command>' to record the timer events
+  of an arbitrary workload.
+
+  'perf timer report' to report number of activated/expired/canceled
+  timers, timer latency and timer function runtime.
+
+OPTIONS
+-------
+-D::
+--dump-raw-trace=::
+        Display verbose dump of the sched data.
+
+-i <file>::
+--input=<file>::
+	Select the input file (default: perf.data)
+
+-s <key[,key2...]>::
+--sort=<key[,key2...]>::
+	Sort the output (default: type, avg-lat, max-lat,
+	avg-runtime, max-runtime)
+
+--print-lat::
+	Print timer latency
+
+--print-runtime::
+	Print timer function runtime
+
+--type::
+	Timer type selector, you can specify one or more types
+	(timer, hrtimer, itimer-real, itimer-virtual, itimer-prof)
+
+SEE ALSO
+--------
+linkperf:perf-record[1]
-- 
1.6.1.2



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 5/5] perf/timer: add 'perf timer' core code
  2009-12-22 13:06       ` [PATCH v2 4/5] perf/timer: add document for 'perf timer' Xiao Guangrong
@ 2009-12-22 13:08         ` Xiao Guangrong
  0 siblings, 0 replies; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-22 13:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, Paul Mackerras, LKML

It's the core code of 'perf timer', it can report number of
activated/expired/canceled timers, timer latency and timer function
runtime

Sample output:
- Record timer event:

#perf timer record
^C

- Show timer latency:
#perf timer report --print-lat
                          Timer Latency List
[ Abbreviations: Avg = average; lat = latency; ts = timestamp ]
-------------------------------------------------------------------------
     Timer     |     TYPE      | Avg-lat   ms| Max-lat   ms| Max-lat-ts s
-------------------------------------------------------------------------
swapper        |HRTIMER        |0.180        |2.404        |42722.586    
swapper        |TIMER          |2.000        |2.000        |42722.586    
events/0       |TIMER          |2.000        |2.000        |42723.002    
events/0       |TIMER          |0.000        |0.000        |42723.641    
sshd           |TIMER          |0.000        |0.000        |42723.641    

Statistics
==========
Activated timers number:
  HRTIMER: 105 TIMER: 11 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0 

Expired timers number:
  HRTIMER: 94 TIMER: 12 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0 

Canceled timers number:
  HRTIMER: 105 TIMER: 12 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0 
 
- Show timer function runtime:
#perf timer report --print-runtime
                          Timer Function Runtime List
[ Abbreviations: Avg = average; func = function; rt = runtime; ts = timestamp ]
-------------------------------------------------------------------------------
     Timer     | Type  |Avg-rt ms|Max-rt ms|      Max-rt-func      | Max-rt-ts s
-------------------------------------------------------------------------------
swapper        |HRTIMER|0.010    |0.026    |tick_sched_timer       |42722.586
0xc15f2b70     |HRTIMER|0.009    |0.009    |Not - Catch            |42722.000
swapper        |TIMER  |0.058    |0.111    |tcp_write_timer        |42721.930
events/0       |TIMER  |0.007    |0.010    |delayed_work_timer_fn  |42723.002
swapper        |TIMER  |0.037    |0.037    |dev_watchdog           |42723.002
events/0       |TIMER  |0.028    |0.028    |delayed_work_timer_fn  |42723.641
swapper        |TIMER  |0.023    |0.023    |pcnet32_watchdog       |42722.001
events/1       |TIMER  |0.014    |0.014    |Not - Catch            |42723.003
events/0       |TIMER  |0.010    |0.010    |Not - Catch            |42722.001
sshd           |TIMER  |0.009    |0.009    |delayed_work_timer_fn  |42723.641
swapper        |TIMER  |0.004    |0.004    |Not - Catch            |42721.638
events/1       |TIMER  |0.003    |0.003    |Not - Catch            |42723.003

Statistics
==========
Activated timers number:
  HRTIMER: 105 TIMER: 11 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0 

Expired timers number:
  HRTIMER: 94 TIMER: 12 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0 

Canceled timers number:
  HRTIMER: 105 TIMER: 12 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0 

- Specify types which you are interesting and choose one or more of
  "timer, hrtimer, itimer-real, itimer-virtual, itimer-prof"

                          Timer Latency List
[ Abbreviations: Avg = average; lat = latency; ts = timestamp ]
-------------------------------------------------------------------------
     Timer     |     TYPE      | Avg-lat   ms| Max-lat   ms| Max-lat-ts s
-------------------------------------------------------------------------
swapper        |TIMER          |2.000        |2.000        |42722.586    
events/0       |TIMER          |2.000        |2.000        |42723.002    
events/0       |TIMER          |0.000        |0.000        |42723.641    
sshd           |TIMER          |0.000        |0.000        |42723.641  

Changlog v1->v2:
- rename 'perf timer latency' to 'perf timer report',
  because it not only show timer latency but also show timer
  function runtime, timer activated/canceled/expired statistics
  and canceled timer list(TODO)

Below changes are all from Thomas's suggestion:
- fix a lot of typos and bad ideas

- use hash table instead of RB tree to record timer

- let output information more readable

- report number of activated/canceled/expired timers

- support to filter timer types:
  '--type' can do it

TODO:
- Show canceled timer list that is suggested by Thomas

- Any suggestion are welcome.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 tools/perf/Makefile         |    1 +
 tools/perf/builtin-timer.c  | 1000 +++++++++++++++++++++++++++++++++++++++++++
 tools/perf/builtin.h        |    1 +
 tools/perf/command-list.txt |    1 +
 tools/perf/perf.c           |    1 +
 5 files changed, 1004 insertions(+), 0 deletions(-)
 create mode 100644 tools/perf/builtin-timer.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 4390d22..9b39528 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -440,6 +440,7 @@ BUILTIN_OBJS += builtin-top.o
 BUILTIN_OBJS += builtin-trace.o
 BUILTIN_OBJS += builtin-probe.o
 BUILTIN_OBJS += builtin-kmem.o
+BUILTIN_OBJS += builtin-timer.o
 
 PERFLIBS = $(LIB_FILE)
 
diff --git a/tools/perf/builtin-timer.c b/tools/perf/builtin-timer.c
new file mode 100644
index 0000000..66afa05
--- /dev/null
+++ b/tools/perf/builtin-timer.c
@@ -0,0 +1,1000 @@
+#include "builtin.h"
+#include "perf.h"
+
+#include "util/util.h"
+#include "util/cache.h"
+#include "util/symbol.h"
+#include "util/thread.h"
+#include "util/header.h"
+#include "util/parse-options.h"
+#include "util/trace-event.h"
+#include "util/debug.h"
+#include "util/session.h"
+
+#include <sys/prctl.h>
+#include <semaphore.h>
+#include <pthread.h>
+#include <math.h>
+
+#include "../../include/linux/hash.h"
+
+#define SORT_KEY	"type, avg-lat, max-lat, avg-runtime, max-runtime"
+
+static char const *input_name = "perf.data";
+static const char *sort_order = SORT_KEY;
+static int print_lat, print_runtime, profile_cpu = -1;
+static struct perf_session *timer_session;
+static int bug_nr;
+
+enum timer_type {
+	REAL_ITIMER,
+	VIRTUAL_ITIMER,
+	PROF_ITIMER,
+	TIMER,
+	HRTIMER,
+
+	MAX_TIMER_TYPE,
+};
+
+static char const *timer_type_str[] = {
+	"ITIMER_REAL",
+	"ITIMER_VIRTUAL",
+	"ITIMER_PROF",
+	"TIMER",
+	"HRTIMER",
+};
+
+enum stat_type {
+	ACTIVATED,
+	EXPIRED,
+	CANCELED,
+
+	MAX_STATS,
+};
+
+static u32 stats[MAX_STATS][MAX_TIMER_TYPE];
+static inline void
+stats_update(enum timer_type timer_type, enum stat_type stat_type)
+{
+	stats[stat_type][timer_type]++;
+}
+
+static int hz;
+static inline void update_hz(int nhz)
+{
+	if (!hz)
+		hz = nhz;
+}
+
+static const struct option timer_options[] = {
+	OPT_STRING('i', "input", &input_name, "file",
+		   "input file name"),
+	OPT_BOOLEAN('v', "verbose", &verbose,
+		    "be more verbose (show symbol address, etc)"),
+	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
+		    "dump raw trace in ASCII"),
+	OPT_END()
+};
+
+static const char *const timer_usage[] = {
+	"perf timer [<options>] {record|report}",
+	NULL
+};
+
+static int filter_type;
+static int parse_type_opt(const struct option *opt __used,
+			  const char *arg, int unset __used)
+{
+	char *tok, *str = strdup(arg);
+	int ret = 0;
+
+	if (!str)
+		die("strdup");
+
+	while ((tok = strsep(&str, ",")) != NULL) {
+
+		if (!strcmp(tok, "timer"))
+			filter_type |= 1 << TIMER;
+		else if (!strcmp(tok, "hrtimer"))
+			filter_type |= 1 << HRTIMER;
+		else if (!strcmp(tok, "itimer-real"))
+			filter_type |= 1 << REAL_ITIMER;
+		else if (!strcmp(tok, "itimer-virtual"))
+			filter_type |= 1 << VIRTUAL_ITIMER;
+		else if (!strcmp(tok, "itimer-prof"))
+			filter_type |= 1 << PROF_ITIMER;
+		else {
+			printf("Unknow --type: %s\n", tok);
+			ret = -1;
+			break;
+		}
+	}
+
+	free(str);
+	return ret;
+}
+
+static const struct option report_options[] = {
+	OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
+		   "sort by key(s): "SORT_KEY),
+	OPT_BOOLEAN('v', "verbose", &verbose,
+		    "be more verbose (show symbol address, etc)"),
+	OPT_INTEGER('C', "CPU", &profile_cpu,
+		    "CPU to profile on"),
+	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
+		    "dump raw trace in ASCII"),
+	OPT_BOOLEAN(0, "print-lat", &print_lat, "show timer latency"),
+	OPT_BOOLEAN(0, "print-runtime", &print_runtime,
+		    "show timer function runtime"),
+	OPT_CALLBACK(0, "type", NULL, "timer types",
+		     "timer type selector, specify one or more types(timer, "
+		     "hrtimer, itimer-real, itimer-virtual, itimer-prof)",
+		     parse_type_opt),
+	OPT_END()
+};
+
+static const char *const report_usage[] = {
+	"perf timer report [<options>]",
+	NULL
+};
+
+struct timer_entry {
+	struct timer_entry *next;
+	struct rb_node node;
+
+	void			*timer;
+	enum timer_type		type;
+	u64			function;
+	struct thread		*timer_thread;
+	int expire_nr;
+	int run_func_nr;
+	int bug;
+
+	struct timer_entry	*itimer_hrtimer;
+
+	u64			timer_last_lat;
+	u64			timer_expire_ts;
+	u64			timer_total_lat;
+	double			timer_avg_lat;
+	s64			timer_max_lat;
+	u64			timer_max_lat_at_ts;
+
+	u64			func_entry_ts;
+	u64			func_total_runtime;
+	double			func_avg_runtime;
+	s64			func_max_runtime;
+	u64			func_max_runtime_at_ts;
+	u64			max_runtime_at_func;
+};
+
+typedef int (*sort_func)(struct timer_entry *, struct timer_entry *);
+struct sort_dimension {
+	const char		*name;
+	sort_func		cmp;
+	struct list_head	list;
+};
+
+#define DEFINE_CMP_FUNCTION(field)					\
+static int cmp_##field(struct timer_entry *v1, struct timer_entry *v2)	\
+{									\
+	if (v1->field < v2->field)					\
+		return -1;						\
+	if (v1->field > v2->field)					\
+		return 1;						\
+	return 0;							\
+}
+
+DEFINE_CMP_FUNCTION(type);
+DEFINE_CMP_FUNCTION(timer_avg_lat);
+DEFINE_CMP_FUNCTION(timer_max_lat);
+DEFINE_CMP_FUNCTION(func_avg_runtime);
+DEFINE_CMP_FUNCTION(func_max_runtime);
+
+#define SORT(field, _name)		\
+{	.cmp = cmp_##field,		\
+	.name = _name,			\
+}
+
+static struct sort_dimension sorts[] = {
+	SORT(type, "type"),
+	SORT(timer_avg_lat, "avg-lat"),
+	SORT(timer_max_lat, "max-lat"),
+	SORT(func_avg_runtime, "avg-runtime"),
+	SORT(func_max_runtime, "max-runtime"),
+};
+
+static int sort_dimension_add(const char *tok, struct list_head *list)
+{
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(sorts); i++) {
+		if (!strcmp(sorts[i].name, tok)) {
+			list_add_tail(&sorts[i].list, list);
+			return 0;
+		}
+	}
+
+	return -1;
+}
+
+static LIST_HEAD(sort);
+static void setup_sorting(char const *sort_string)
+{
+	char *tmp, *tok, *str = strdup(sort_string);
+
+	for (tok = strtok_r(str, ",", &tmp);
+			tok; tok = strtok_r(NULL, ", ", &tmp)) {
+		if (sort_dimension_add(tok, &sort) < 0) {
+			error("Unknown --sort key: `%s'", tok);
+			usage_with_options(report_usage, report_options);
+		}
+	}
+
+	free(str);
+}
+
+static int timer_key_cmp(struct list_head *list, struct timer_entry *l,
+			 struct timer_entry *r)
+{
+	struct sort_dimension *sortp;
+	int ret = 0;
+
+	BUG_ON(list_empty(list));
+
+	list_for_each_entry(sortp, list, list) {
+		ret = sortp->cmp(l, r);
+		if (ret)
+			return ret;
+	}
+
+	return ret;
+}
+
+static void timer_rb_insert(struct rb_root *root, struct timer_entry *timer,
+			    struct list_head *sort_list)
+{
+	struct rb_node **new = &(root->rb_node), *parent = NULL;
+
+	while (*new) {
+		struct timer_entry *this;
+		int cmp;
+
+		this = container_of(*new, struct timer_entry, node);
+		parent = *new;
+
+		cmp = timer_key_cmp(sort_list, timer, this);
+
+		if (cmp > 0)
+			new = &((*new)->rb_left);
+		else
+			new = &((*new)->rb_right);
+	}
+
+	rb_link_node(&timer->node, parent, new);
+	rb_insert_color(&timer->node, root);
+}
+
+static void timer_rb_traversal(struct rb_root *root,
+			       void (*func)(struct timer_entry *))
+{
+	struct rb_node *next;
+
+	next = rb_first(root);
+
+	while (next) {
+		struct timer_entry *timer;
+
+		timer = rb_entry(next, struct timer_entry, node);
+		func(timer);
+		next = rb_next(next);
+	}
+}
+
+#define TIMER_HASH_BITS		10
+#define TIMER_HASH_SIZE		(1UL << TIMER_HASH_BITS)
+
+static struct timer_entry *timer_hash_table[TIMER_HASH_SIZE];
+
+static inline int timer_hash_func(void *timer)
+{
+	return hash_ptr(timer, TIMER_HASH_BITS);
+}
+
+static struct timer_entry *
+timer_hash_findnew(void *timer, enum timer_type type)
+{
+	int hash = timer_hash_func(timer);
+	struct timer_entry **head, *curr, *prev = NULL;
+
+	head = timer_hash_table + hash;
+	curr = *head;
+
+	while (curr) {
+		if (curr->timer == timer && curr->type == type)
+			return curr;
+
+		prev = curr;
+		curr = curr->next;
+	}
+
+	curr = zalloc(sizeof(*curr));
+	if (!curr)
+		die("No memory");
+
+	curr->timer = timer;
+	curr->type = type;
+
+	if (prev)
+		prev->next = curr;
+	else
+		*head = curr;
+
+	return curr;
+}
+
+static inline void *
+get_timer_hrtimer(enum timer_type type, struct event *event, void *data)
+{
+	if (type == HRTIMER)
+		return raw_field_ptr(event, "hrtimer", data);
+
+	return raw_field_ptr(event, "timer", data);
+}
+
+/*
+ * if expiry time is not recorded, it means we not caught the start event,
+ * we can't calculate timer latency
+ */
+static inline bool check_timer_expire_ts(struct timer_entry *timer_entry)
+{
+	return !!timer_entry->timer_expire_ts;
+}
+
+/*
+ * if function entry time is not recorded, it means we not caught the
+ * expire entry event, we can't calculate function runtime
+ */
+static inline bool check_timer_func_entry_ts(struct timer_entry *timer_entry)
+{
+	return !!timer_entry->func_entry_ts;
+}
+
+static void timer_hrtimer_start(enum timer_type type, void *data,
+				struct event *event, struct thread *thread)
+{
+	void *timer;
+	struct timer_entry *timer_entry;
+	u64 expires, function;
+
+	timer = get_timer_hrtimer(type, event, data);
+	function = raw_field_value(event, "function", data);
+	expires = raw_field_value(event, "expires", data);
+
+	stats_update(type, ACTIVATED);
+	timer_entry = timer_hash_findnew(timer, type);
+
+	if (!timer_entry->timer_thread)
+		timer_entry->timer_thread = thread;
+
+	timer_entry->function = function;
+	timer_entry->timer_expire_ts = expires;
+}
+
+static void timer_hrtimer_expires_entry(enum timer_type type, void *data,
+					struct event *event, u64 timestamp)
+{
+	void *timer;
+	struct timer_entry *timer_entry;
+	u64 now;
+	s64 delta;
+
+	timer = get_timer_hrtimer(type, event, data);
+	now = raw_field_value(event, "now", data);
+
+	stats_update(type, EXPIRED);
+	timer_entry = timer_hash_findnew(timer, type);
+	timer_entry->func_entry_ts = timestamp;
+
+	if (!check_timer_expire_ts(timer_entry))
+		return;
+
+	delta = now - timer_entry->timer_expire_ts;
+	if (delta < 0) {
+		timer_entry->bug++;
+		printf("WARNING: timer[%p] type[%s]: latency < 0, expires"
+		       "[%llu] now[%llu]", timer_entry->timer,
+		       timer_type_str[timer_entry->type],
+		       timer_entry->timer_expire_ts, now);
+		return ;
+	}
+
+	timer_entry->expire_nr++;
+	timer_entry->timer_last_lat = delta;
+	timer_entry->timer_total_lat += delta;
+	if (timer_entry->timer_max_lat <= delta) {
+		timer_entry->timer_max_lat = delta;
+		timer_entry->timer_max_lat_at_ts = timestamp;
+	}
+	return;
+}
+
+static void timer_hrtimer_expires_exit(enum timer_type type, void *data,
+				       struct event *event, u64 timestamp)
+{
+	void *timer;
+	struct timer_entry *timer_entry;
+	s64 delta;
+
+	timer = get_timer_hrtimer(type, event, data);
+	timer_entry = timer_hash_findnew(timer, type);
+
+	if (!check_timer_func_entry_ts(timer_entry))
+		return;
+
+	delta = timestamp - timer_entry->func_entry_ts;
+	if (delta < 0) {
+		timer_entry->bug++;
+		printf("WARNING: timer[%p] type[%s]: runtime < 0, "
+		       "func_entry_ts[%llu] now[%llu]", timer_entry->timer,
+		       timer_type_str[timer_entry->type],
+		       timer_entry->func_entry_ts, timestamp);
+		return;
+	}
+
+	timer_entry->run_func_nr++;
+	timer_entry->func_total_runtime += delta;
+	if (timer_entry->func_max_runtime <= delta) {
+		timer_entry->func_max_runtime = delta;
+		timer_entry->func_max_runtime_at_ts = timestamp;
+		timer_entry->max_runtime_at_func = timer_entry->function;
+	}
+
+	return;
+}
+
+static void timer_hrtimer_cancel(enum timer_type type)
+{
+	stats_update(type, CANCELED);
+}
+
+static void timer_start_handler(void *data, struct event *event,
+				u64 timestamp __used, struct thread *thread)
+{
+	return timer_hrtimer_start(TIMER, data, event, thread);
+}
+
+static void
+timer_expire_entry_handler(void *data, struct event *event, u64 timestamp,
+			   struct thread *thread __used)
+{
+	int nhz;
+
+	nhz = raw_field_value(event, "hz", data);
+	update_hz(nhz);
+	return timer_hrtimer_expires_entry(TIMER, data, event, timestamp);
+}
+
+static void
+timer_expire_exit_handler(void *data, struct event *event, u64 timestamp,
+			  struct thread *thread __used)
+{
+	return timer_hrtimer_expires_exit(TIMER, data, event, timestamp);
+}
+
+static void timer_cancel(void *data __used, struct event *event __used,
+			 u64 timestamp __used, struct thread *thread __used)
+{
+	return timer_hrtimer_cancel(TIMER);
+}
+
+static void hrtimer_start_handler(void *data, struct event *event,
+				  u64 timestamp __used, struct thread *thread)
+{
+	return timer_hrtimer_start(HRTIMER, data, event, thread);
+}
+
+static void
+hrtimer_expire_entry_handler(void *data, struct event *event, u64 timestamp,
+			     struct thread *thread __used)
+{
+	return timer_hrtimer_expires_entry(HRTIMER, data, event, timestamp);
+}
+
+static void
+hrtimer_expire_exit_handler(void *data, struct event *event, u64 timestamp,
+			    struct thread *thread __used)
+{
+	return timer_hrtimer_expires_exit(HRTIMER, data, event, timestamp);
+}
+
+static void
+hrtimer_cancel(void *data __used, struct event *event __used,
+	       u64 timestamp __used, struct thread *thread __used)
+{
+	return timer_hrtimer_cancel(HRTIMER);
+}
+
+static void itimer_state_handler(void *data, struct event *event,
+				 u64 timestamp __used, struct thread *thread)
+{
+	struct timer_entry *timer_entry;
+	u64 value_sec, value_usec, expires;
+	int which;
+
+	value_sec = raw_field_value(event, "value_sec", data);
+	value_usec = raw_field_value(event, "value_usec", data);
+	expires = raw_field_value(event, "expires", data);
+	which = raw_field_value(event, "which", data);
+
+	timer_entry = timer_hash_findnew(thread, which);
+
+	/* itimer canceled */
+	if (!value_sec && !value_usec) {
+		stats_update(which, CANCELED);
+		return ;
+	}
+
+	/* itimer started */
+	stats_update(which, ACTIVATED);
+
+	if (which == ITIMER_REAL) {
+		void *hrtimer;
+
+		hrtimer = raw_field_ptr(event, "timer", data);
+		timer_entry->itimer_hrtimer = timer_hash_findnew(hrtimer,
+								 HRTIMER);
+	} else
+		timer_entry->timer_expire_ts = expires;
+}
+
+static void
+itimer_expire_handler(void *data, struct event *event, u64 timestamp,
+		      struct thread *thread __used)
+{
+	struct thread *itimer_thread;
+	struct timer_entry *timer_entry;
+	u64 now;
+	s64 delta;
+	pid_t pid;
+	int which;
+
+	which = raw_field_value(event, "which", data);
+	now = raw_field_value(event, "now", data);
+	pid = raw_field_value(event, "pid", data);
+
+	itimer_thread = perf_session__findnew(timer_session, pid);
+	timer_entry = timer_hash_findnew(itimer_thread, which);
+
+	stats_update(which, EXPIRED);
+
+	if (which == ITIMER_REAL) {
+		if (!timer_entry->itimer_hrtimer ||
+		    !check_timer_expire_ts(timer_entry->itimer_hrtimer))
+			return;
+		delta = timer_entry->itimer_hrtimer->timer_last_lat;
+	} else {
+		if (!check_timer_expire_ts(timer_entry))
+			return;
+		delta = now - timer_entry->timer_expire_ts;
+	}
+
+	if (delta < 0) {
+		timer_entry->bug++;
+		printf("WARNING: timer[%p] type[%s]: latency < 0, expires:"
+		       "[%llu] now[%llu]", timer_entry->timer,
+		       timer_type_str[timer_entry->type],
+		       timer_entry->timer_expire_ts, now);
+		return;
+	}
+
+	timer_entry->expire_nr++;
+	timer_entry->timer_total_lat += delta;
+	if (timer_entry->timer_max_lat <= delta) {
+		timer_entry->timer_max_lat = delta;
+		timer_entry->timer_max_lat_at_ts = timestamp;
+	}
+
+	return;
+}
+
+struct event_handler {
+	const char *event_name;
+	void (*handler) (void *data, struct event *event, u64 timestamp,
+			 struct thread *thread);
+} timer_handler[] = {
+	/* timer */
+	{ "timer_start", 	timer_start_handler		},
+	{ "timer_expire_entry",	timer_expire_entry_handler	},
+	{ "timer_expire_exit",	timer_expire_exit_handler	},
+	{ "timer_cancel",	timer_cancel			},
+
+	/* hrtimer */
+	{ "hrtimer_start",	hrtimer_start_handler		},
+	{ "hrtimer_expire_entry", hrtimer_expire_entry_handler	},
+	{ "hrtimer_expire_exit",  hrtimer_expire_exit_handler	},
+	{ "hrtimer_cancel",	hrtimer_cancel			},
+
+	/* itimer */
+	{ "itimer_state",	itimer_state_handler		},
+	{ "itimer_expire",	itimer_expire_handler		},
+};
+
+static void process_raw_event(event_t *raw_event __used,
+			      struct perf_session *session __used, void *data,
+			      int cpu __used, u64 timestamp,
+			      struct thread *thread)
+{
+	struct event *event;
+	unsigned int i;
+	int type;
+
+	type = trace_parse_common_type(data);
+	event = trace_find_event(type);
+
+	for (i = 0; i < ARRAY_SIZE(timer_handler); i++)
+		if (!strcmp(timer_handler[i].event_name, event->name))
+			timer_handler[i].handler(data, event,
+						 timestamp, thread);
+}
+
+static int process_sample_event(event_t *event, struct perf_session *session)
+{
+	struct sample_data data;
+	struct thread *thread;
+
+	if (!(session->sample_type & PERF_SAMPLE_RAW))
+		return 0;
+
+	memset(&data, 0, sizeof(data));
+	data.time = -1;
+	data.cpu = -1;
+	data.period = -1;
+
+	event__parse_sample(event, session->sample_type, &data);
+
+	dump_printf("(IP, %d): %d/%d: %p period: %lld\n",
+		event->header.misc,
+		data.pid, data.tid,
+		(void *)(long)data.ip,
+		(long long)data.period);
+
+	thread = perf_session__findnew(session, data.pid);
+	if (thread == NULL) {
+		pr_debug("problem processing %d event, skipping it.\n",
+			 event->header.type);
+		return -1;
+	}
+
+	dump_printf(" ... thread: %s:%d\n", thread->comm, thread->pid);
+
+	if (profile_cpu != -1 && profile_cpu != (int)data.cpu)
+		return 0;
+
+	process_raw_event(event, session, data.raw_data, data.cpu,
+			  data.time, thread);
+	return 0;
+}
+
+static int sample_type_check(struct perf_session *session)
+{
+	if (!(session->sample_type & PERF_SAMPLE_RAW)) {
+		fprintf(stderr, "No trace sample to read. Did you call "
+			"perf record without -R?");
+		return -1;
+	}
+
+	return 0;
+}
+
+static struct perf_event_ops event_ops = {
+	.process_sample_event	= process_sample_event,
+	.process_comm_event	= event__process_comm,
+	.sample_type_check	= sample_type_check,
+};
+
+static int read_events(void)
+{
+	int err;
+
+	timer_session = perf_session__new(input_name, O_RDONLY, 0);
+	if (!timer_session)
+		return -ENOMEM;
+
+	err = perf_session__process_events(timer_session, &event_ops);
+	return err;
+}
+
+static inline bool timer_hz_unit(struct timer_entry *entry)
+{
+	return entry->type == TIMER || entry->type == VIRTUAL_ITIMER
+		|| entry->type == PROF_ITIMER;
+}
+
+static void timer_adjust(struct timer_entry *entry)
+{
+	if (entry->expire_nr) {
+		entry->timer_avg_lat = (double)entry->timer_total_lat /
+				       (double)entry->expire_nr;
+		if (timer_hz_unit(entry) && hz) {
+			/* change unit to nanosecond */
+			entry->timer_avg_lat  = entry->timer_avg_lat * 1e9 / hz;
+			entry->timer_max_lat = entry->timer_max_lat * 1e9 / hz;
+		}
+	}
+
+	if (entry->run_func_nr)
+		entry->func_avg_runtime = (double)entry->func_total_runtime /
+					  (double)entry->run_func_nr;
+}
+
+static struct rb_root timer_result;
+static void sort_result(void)
+{
+	unsigned int i;
+
+	if (!filter_type)
+		filter_type = 0xFF;
+
+	for (i = 0; i < TIMER_HASH_SIZE; i++) {
+		struct timer_entry *entry;
+
+		for (entry = *(timer_hash_table + i); entry;
+				entry = entry->next) {
+			if (!(filter_type & 1 << entry->type))
+				continue;
+
+			bug_nr += entry->bug;
+			timer_adjust(entry);
+			timer_rb_insert(&timer_result, entry, &sort);
+		}
+	}
+}
+
+static inline void
+print_timer_name(struct timer_entry *timer_entry, char *buf, int buf_len)
+{
+	enum timer_type type;
+
+	type = timer_entry->type;
+	if (type != TIMER && type != HRTIMER) {
+		snprintf(buf, buf_len, "%s",
+			 ((struct thread *)timer_entry->timer)->comm);
+		return;
+	}
+
+	if (timer_entry->timer_thread)
+		snprintf(buf, buf_len, "%s", timer_entry->timer_thread->comm);
+	else
+		snprintf(buf, buf_len, "%p", timer_entry->timer);
+}
+
+static void __print_timer_lat(struct timer_entry *timer_entry)
+{
+	char buf[20];
+
+	if (!timer_entry->expire_nr)
+		return;
+
+	print_timer_name(timer_entry, buf, sizeof(buf));
+	printf("%-15s", buf);
+
+	printf("|%-15s", timer_type_str[timer_entry->type]);
+	if (timer_hz_unit(timer_entry) && !hz) {
+		printf("|%-10.3f %2s", timer_entry->timer_avg_lat, "HZ");
+		printf("|%-10llu %2s", timer_entry->timer_max_lat, "HZ");
+	} else {
+		printf("|%-13.3f", timer_entry->timer_avg_lat / 1e6);
+		printf("|%-13.3f", timer_entry->timer_max_lat / 1e6);
+	}
+
+	printf("|%-13.3f\n", timer_entry->timer_max_lat_at_ts / 1e9);
+}
+
+static void print_timer_lat(void)
+{
+	printf("\n");
+	printf("                          Timer Latency List\n");
+	printf(
+"[ Abbreviations: Avg = average; lat = latency; ts = timestamp ]\n");
+	printf(
+"-------------------------------------------------------------------------\n");
+	printf(
+"     Timer     |     TYPE      | Avg-lat   ms| Max-lat   ms| Max-lat-ts s\n");
+	printf(
+"-------------------------------------------------------------------------\n");
+	timer_rb_traversal(&timer_result, __print_timer_lat);
+}
+
+static void __print_func_runtime(struct timer_entry *timer_entry)
+{
+	u64 function;
+	char buf[20];
+
+	if (!timer_entry->run_func_nr)
+		return;
+
+	print_timer_name(timer_entry, buf, sizeof(buf));
+	printf("%-15s", buf);
+	printf("|%-7s", timer_type_str[timer_entry->type]);
+	printf("|%-9.3f", timer_entry->func_avg_runtime / 1e6);
+	printf("|%-9.3f", timer_entry->func_max_runtime / 1e6);
+	function = timer_entry->max_runtime_at_func;
+	if (function) {
+		struct symbol *sym;
+
+		sym = map_groups__find_function(&timer_session->kmaps,
+						timer_session, function, NULL);
+		if (sym)
+			printf("|%-23.23s", sym->name);
+		else
+			printf("|%-23llx", function);
+	} else
+		printf("|%-23s", "Not - Catch");
+
+	printf("|%-8.3f\n", timer_entry->func_max_runtime_at_ts / 1e9);
+}
+
+static void print_func_runtime(void)
+{
+	printf("\n");
+	printf("                           Timer Function Runtime List\n");
+	printf(
+"[ Abbreviations: Avg = average; func = function; rt = runtime; ts = timestamp ]\n");
+
+	printf(
+"-------------------------------------------------------------------------------\n");
+	printf(
+"     Timer     | Type  |Avg-rt ms|Max-rt ms|      Max-rt-func      | Max-rt-ts s\n");
+	printf(
+"-------------------------------------------------------------------------------\n");
+	timer_rb_traversal(&timer_result, __print_func_runtime);
+}
+
+static void __print_bug_timer(struct timer_entry *timer_entry)
+{
+	char buf[20];
+
+	if (!timer_entry->bug)
+		return;
+
+	print_timer_name(timer_entry, buf, sizeof(buf));
+	printf("%-15s", buf);
+
+	printf("|%-15s", timer_type_str[timer_entry->type]);
+	printf("|%-10d\n", timer_entry->bug);
+}
+
+static void print_bug_timer(void)
+{
+	if (bug_nr) {
+		printf("\nWarning: detect %d bug(s):\n", bug_nr);
+		printf("--------------------------------------------\n");
+		printf("     Timer     |     Type      |    Bug   \n");
+		printf("--------------------------------------------\n");
+		timer_rb_traversal(&timer_result, __print_bug_timer);
+	}
+}
+
+static void
+__print_stats(enum stat_type stat_type, char *buf, int buf_len)
+{
+	int i;
+
+	for (i = MAX_TIMER_TYPE - 1; i >= 0; i--) {
+		int plen;
+
+		if (!(filter_type & 1 << i))
+			continue;
+		plen = snprintf(buf, buf_len, "%s: %d ", timer_type_str[i],
+				stats[stat_type][i]);
+		buf += plen;
+		buf_len -= plen;
+	}
+}
+
+static void print_stats(void)
+{
+	char buf[128];
+
+	printf("\nStatistics\n==========\n");
+	__print_stats(ACTIVATED, buf, sizeof(buf));
+	printf("Activated timers number:\n  %s\n\n", buf);
+	__print_stats(EXPIRED, buf, sizeof(buf));
+	printf("Expired timers number:\n  %s\n\n", buf);
+	__print_stats(CANCELED, buf, sizeof(buf));
+	printf("Canceled timers number:\n  %s\n\n", buf);
+}
+
+static void print_result(void)
+{
+	if (print_lat)
+		print_timer_lat();
+
+	if (print_runtime)
+		print_func_runtime();
+
+	print_bug_timer();
+	print_stats();
+}
+
+static void __cmd_report(void)
+{
+	setup_pager();
+	read_events();
+	setup_sorting(sort_order);
+	sort_result();
+	print_result();
+	perf_session__delete(timer_session);
+}
+
+static const char *record_args[] = {
+	"record",
+	"-a",
+	"-R",
+	"-M",
+	"-f",
+	"-m", "1024",
+	"-c", "1",
+
+	/* timer */
+	"-e", "timer:timer_start",
+	"-e", "timer:timer_expire_entry",
+	"-e", "timer:timer_expire_exit",
+	"-e", "timer:timer_cancel",
+
+	/* hrtimer */
+	"-e", "timer:hrtimer_start",
+	"-e", "timer:hrtimer_expire_entry",
+	"-e", "timer:hrtimer_expire_exit",
+	"-e", "timer:hrtimer_cancel",
+
+	/* itimer **/
+	"-e", "timer:itimer_state",
+	"-e", "timer:itimer_expire",
+};
+
+static int __cmd_record(int argc, const char **argv)
+{
+	unsigned int rec_argc, i, j;
+	const char **rec_argv;
+
+	rec_argc = ARRAY_SIZE(record_args) + argc - 1;
+	rec_argv = calloc(rec_argc + 1, sizeof(char *));
+
+	for (i = 0; i < ARRAY_SIZE(record_args); i++)
+		rec_argv[i] = strdup(record_args[i]);
+
+	for (j = 1; j < (unsigned int)argc; j++, i++)
+		rec_argv[i] = argv[j];
+
+	BUG_ON(i != rec_argc);
+
+	return cmd_record(i, rec_argv, NULL);
+}
+
+int cmd_timer(int argc, const char **argv, const char *prefix __used)
+{
+	argc = parse_options(argc, argv, timer_options, timer_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+	if (!argc)
+		usage_with_options(timer_usage, timer_options);
+
+	symbol__init();
+	if (!strncmp(argv[0], "rec", 3))
+		return __cmd_record(argc, argv);
+
+	if (!strcmp(argv[0], "report")) {
+		if (argc > 1) {
+			argc = parse_options(argc, argv, report_options,
+					     report_usage, 0);
+			if (argc)
+				usage_with_options(report_usage,
+						   report_options);
+		}
+		__cmd_report();
+
+	} else
+		usage_with_options(timer_usage, timer_options);
+
+	return 0;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index 18035b1..e294128 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -30,5 +30,6 @@ extern int cmd_trace(int argc, const char **argv, const char *prefix);
 extern int cmd_version(int argc, const char **argv, const char *prefix);
 extern int cmd_probe(int argc, const char **argv, const char *prefix);
 extern int cmd_kmem(int argc, const char **argv, const char *prefix);
+extern int cmd_timer(int argc, const char **argv, const char *prefix);
 
 #endif
diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt
index 71dc7c3..e522f5d 100644
--- a/tools/perf/command-list.txt
+++ b/tools/perf/command-list.txt
@@ -16,3 +16,4 @@ perf-top			mainporcelain common
 perf-trace			mainporcelain common
 perf-probe			mainporcelain common
 perf-kmem			mainporcelain common
+perf-timer			mainporcelain common
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index 873e55f..688d292 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -301,6 +301,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "sched",	cmd_sched,	0 },
 		{ "probe",	cmd_probe,	0 },
 		{ "kmem",	cmd_kmem,	0 },
+		{ "timer",	cmd_timer,	0 },
 	};
 	unsigned int i;
 	static const char ext[] = STRIP_EXTENSION;
-- 
1.6.1.2



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 2/5]: trace_event: export HZ in timer's tracepoint format
  2009-12-22 13:03     ` [PATCH v2 2/5]: trace_event: export HZ in timer's tracepoint format Xiao Guangrong
@ 2009-12-22 13:20       ` Xiao Guangrong
  2009-12-28  7:54       ` Ingo Molnar
  1 sibling, 0 replies; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-22 13:20 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Ingo Molnar, Frederic Weisbecker, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, Paul Mackerras, LKML

Ah, sorry for this title error. :-(

s/[PATCH v2 2/5]:/[PATCH v2 2/5]

Xiao Guangrong wrote:
> Export HZ in timer's tracepoint, we can use it to get
> TIMER/ITIMER_VIRTUAL/ITIMER_PROF exact latency and it's
> suggested by Ingo
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 2/5]: trace_event: export HZ in timer's tracepoint format
  2009-12-22 13:03     ` [PATCH v2 2/5]: trace_event: export HZ in timer's tracepoint format Xiao Guangrong
  2009-12-22 13:20       ` Xiao Guangrong
@ 2009-12-28  7:54       ` Ingo Molnar
  2009-12-28 10:40         ` Xiao Guangrong
                           ` (6 more replies)
  1 sibling, 7 replies; 42+ messages in thread
From: Ingo Molnar @ 2009-12-28  7:54 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, Paul Mackerras, LKML


* Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> wrote:

> Export HZ in timer's tracepoint, we can use it to get
> TIMER/ITIMER_VIRTUAL/ITIMER_PROF exact latency and it's
> suggested by Ingo
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
> ---
>  include/trace/events/timer.h |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
> 
> diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
> index 13ec15a..7749ae5 100644
> --- a/include/trace/events/timer.h
> +++ b/include/trace/events/timer.h
> @@ -74,14 +74,17 @@ TRACE_EVENT(timer_expire_entry,
>  	TP_STRUCT__entry(
>  		__field( void *,	timer	)
>  		__field( unsigned long,	now	)
> +		__field( int,		hz	)
>  	),
>  
>  	TP_fast_assign(
>  		__entry->timer		= timer;
>  		__entry->now		= jiffies;
> +		__entry->hz		= HZ;
>  	),
>  
> -	TP_printk("timer=%p now=%lu", __entry->timer, __entry->now)
> +	TP_printk("timer=%p now=%lu HZ=%d", __entry->timer, __entry->now,
> +		  __entry->hz)
>  );

I think we can do something slightly different and more efficient: just create 
a new timer event to report the value of HZ.

That way we dont clutter the timer_expire_entry record format with a 
repetitive HZ field. It's an extra 4 bytes overhead: that has to be written, 
passed along, copied and thrown away in 99.9999% of the cases - such overhead 
should be avoided.

If you created a special timer_params event, which would produce precisely one 
event when triggered via say a new perf ioctl. I.e. add something like this to 
perf_event.h:

  #define PERF_EVENT_IOC_INJECT       _IOW('$', 7, __u64)

and add code to kernel/perf_event.c's perf_ioctl() function that takes that 
__u64 parameter as an event ID and injects an 'artificial' event.

Such a new feature would be useful for other things as well: backtesting rare 
events, injecting other types of 'parameter/query events', etc.

There might be more details to this, but it would be a useful scheme IMO - and 
it would still integrate nicely with the whole ftrace event enumeration scheme 
so tooling support would be easier.

What do you think?

	Ingo

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 2/5]: trace_event: export HZ in timer's tracepoint format
  2009-12-28  7:54       ` Ingo Molnar
@ 2009-12-28 10:40         ` Xiao Guangrong
  2009-12-29  5:20         ` [PATCH v3 0/5] perf tools: introduce 'perf timer' to analyze timer's behavior Xiao Guangrong
                           ` (5 subsequent siblings)
  6 siblings, 0 replies; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-28 10:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, Paul Mackerras, LKML



Ingo Molnar wrote:

> 
> I think we can do something slightly different and more efficient: just create 
> a new timer event to report the value of HZ.
> 
> That way we dont clutter the timer_expire_entry record format with a 
> repetitive HZ field. It's an extra 4 bytes overhead: that has to be written, 
> passed along, copied and thrown away in 99.9999% of the cases - such overhead 
> should be avoided.
> 
> If you created a special timer_params event, which would produce precisely one 
> event when triggered via say a new perf ioctl. I.e. add something like this to 
> perf_event.h:
> 
>   #define PERF_EVENT_IOC_INJECT       _IOW('$', 7, __u64)
> 
> and add code to kernel/perf_event.c's perf_ioctl() function that takes that 
> __u64 parameter as an event ID and injects an 'artificial' event.
> 
> Such a new feature would be useful for other things as well: backtesting rare 
> events, injecting other types of 'parameter/query events', etc.
> 
> There might be more details to this, but it would be a useful scheme IMO - and 
> it would still integrate nicely with the whole ftrace event enumeration scheme 
> so tooling support would be easier.
> 
> What do you think?
> 

Sure, this is the better way, i'll fix it address your suggestion in the next version.

Thanks,
Xiao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v3 0/5] perf tools: introduce 'perf timer' to analyze timer's behavior
  2009-12-28  7:54       ` Ingo Molnar
  2009-12-28 10:40         ` Xiao Guangrong
@ 2009-12-29  5:20         ` Xiao Guangrong
  2009-12-29  5:21         ` [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ Xiao Guangrong
                           ` (4 subsequent siblings)
  6 siblings, 0 replies; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-29  5:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, Paul Mackerras, LKML

Changelog v2->v3:

introduce 'inject' event to get kernel HZ address Ingo's suggestion
[ see http://lkml.org/lkml/2009/12/28/31 ]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ
  2009-12-28  7:54       ` Ingo Molnar
  2009-12-28 10:40         ` Xiao Guangrong
  2009-12-29  5:20         ` [PATCH v3 0/5] perf tools: introduce 'perf timer' to analyze timer's behavior Xiao Guangrong
@ 2009-12-29  5:21         ` Xiao Guangrong
  2009-12-30  9:19           ` Peter Zijlstra
  2009-12-29  5:21         ` [PATCH v3 2/5] trace_event: record task' real_timer in itimer_state tracepoint Xiao Guangrong
                           ` (3 subsequent siblings)
  6 siblings, 1 reply; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-29  5:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, Paul Mackerras, LKML

'inject' event is a very useful feature and it's suggested by Ingo
[ See http://lkml.org/lkml/2009/12/28/31 ]

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 include/linux/perf_event.h  |   13 +++++++++++
 kernel/perf_event.c         |   47 +++++++++++++++++++++++++++++++++++++++++++
 tools/perf/builtin-record.c |   13 +++++++++++
 tools/perf/util/session.c   |    5 ++++
 tools/perf/util/session.h   |    3 +-
 5 files changed, 80 insertions(+), 1 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 9a1d276..6c93f88 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -228,6 +228,7 @@ struct perf_event_attr {
 #define PERF_EVENT_IOC_PERIOD		_IOW('$', 4, __u64)
 #define PERF_EVENT_IOC_SET_OUTPUT	_IO ('$', 5)
 #define PERF_EVENT_IOC_SET_FILTER	_IOW('$', 6, char *)
+#define PERF_EVENT_IOC_INJECT		_IO ('$', 7)
 
 enum perf_event_ioc_flags {
 	PERF_IOC_FLAG_GROUP		= 1U << 0,
@@ -413,10 +414,22 @@ enum perf_event_type {
 	 * };
 	 */
 	PERF_RECORD_SAMPLE		= 9,
+	/*
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u32 inject_event_id;
+	 * 	u64 value;
+	 * };
+	 */
+	PERF_RECORD_INJECT		= 10,
 
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
+enum perf_inject_event {
+	PERF_INJECT_HZ = 0x01,
+};
+
 enum perf_callchain_context {
 	PERF_CONTEXT_HV			= (__u64)-32,
 	PERF_CONTEXT_KERNEL		= (__u64)-128,
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 8984afd..9343c6c 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -2012,6 +2012,50 @@ unlock:
 	return ret;
 }
 
+static int perf_inject_get_hz(u64 *hz)
+{
+	*hz = HZ;
+	return 0;
+}
+
+static int perf_inject_event(struct perf_event *event, u32 inject_event_id,
+			     int (*get_value)(u64 *))
+{
+	struct perf_output_handle handle;
+	struct perf_inject_event {
+		struct perf_event_header header;
+		u32 inject_event_id;
+		u64 value;
+	} inject_event;
+	int ret = 0;
+
+	inject_event.header.type = PERF_RECORD_INJECT;
+	inject_event.header.misc = 0;
+	inject_event.header.size = sizeof(inject_event);
+	inject_event.inject_event_id = inject_event_id;
+
+	ret = get_value(&inject_event.value);
+	if (ret)
+		goto exit;
+
+	ret = perf_output_begin(&handle, event, inject_event.header.size, 0, 0);
+	if (ret)
+		goto exit;
+
+	perf_output_put(&handle, inject_event);
+	perf_output_end(&handle);
+exit:
+	return ret;
+}
+
+static int perf_inject_ioctl(struct perf_event *event, unsigned int arg)
+{
+	if (!arg || arg & ~PERF_INJECT_HZ)
+		return -EINVAL;
+
+	return perf_inject_event(event, PERF_INJECT_HZ, perf_inject_get_hz);
+}
+
 static int perf_event_set_output(struct perf_event *event, int output_fd);
 static int perf_event_set_filter(struct perf_event *event, void __user *arg);
 
@@ -2044,6 +2088,9 @@ static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	case PERF_EVENT_IOC_SET_FILTER:
 		return perf_event_set_filter(event, (void __user *)arg);
 
+	case PERF_EVENT_IOC_INJECT:
+		return perf_inject_ioctl(event, arg);
+
 	default:
 		return -ENOTTY;
 	}
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 2654253..d13601d 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -65,6 +65,8 @@ static int			file_new			=      1;
 
 static struct perf_session	*session;
 
+u32 inject_events;
+
 struct mmap_data {
 	int			counter;
 	void			*base;
@@ -381,6 +383,17 @@ try_again:
 		}
 	}
 
+	if (inject_events) {
+		ret = ioctl(fd[nr_cpu][counter], PERF_EVENT_IOC_INJECT,
+			    inject_events);
+		if (ret) {
+			error("failed to inject event(%u) with %d (%s)\n",
+			      inject_events, errno, strerror(errno));
+			exit(-1);
+		}
+		inject_events = 0;
+	}
+
 	ioctl(fd[nr_cpu][counter], PERF_EVENT_IOC_ENABLE);
 }
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 7f0537d..74f43af 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -179,6 +179,8 @@ static void perf_event_ops__fill_defaults(struct perf_event_ops *handler)
 		handler->throttle = process_event_stub;
 	if (handler->unthrottle == NULL)
 		handler->unthrottle = process_event_stub;
+	if (handler->inject == NULL)
+		handler->inject = process_event_stub;
 }
 
 static const char *event__name[] = {
@@ -192,6 +194,7 @@ static const char *event__name[] = {
 	[PERF_RECORD_FORK]	 = "FORK",
 	[PERF_RECORD_READ]	 = "READ",
 	[PERF_RECORD_SAMPLE]	 = "SAMPLE",
+	[PERF_RECORD_INJECT]	 = "INJECT",
 };
 
 unsigned long event__total[PERF_RECORD_MAX];
@@ -239,6 +242,8 @@ static int perf_session__process_event(struct perf_session *self,
 		return ops->throttle(event, self);
 	case PERF_RECORD_UNTHROTTLE:
 		return ops->unthrottle(event, self);
+	case PERF_RECORD_INJECT:
+		return ops->inject(event, self);
 	default:
 		self->unknown_events++;
 		return -1;
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 77c5ee2..8742354 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -40,7 +40,8 @@ struct perf_event_ops {
 		 lost,
 		 read,
 		 throttle,
-		 unthrottle;
+		 unthrottle,
+		 inject;
 };
 
 struct perf_session *perf_session__new(const char *filename, int mode, bool force);
-- 
1.6.1.2



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 2/5] trace_event: record task' real_timer in itimer_state tracepoint
  2009-12-28  7:54       ` Ingo Molnar
                           ` (2 preceding siblings ...)
  2009-12-29  5:21         ` [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ Xiao Guangrong
@ 2009-12-29  5:21         ` Xiao Guangrong
  2009-12-29  5:21         ` [PATCH v3 3/5] perf tools: fix getting point Xiao Guangrong
                           ` (2 subsequent siblings)
  6 siblings, 0 replies; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-29  5:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, Paul Mackerras, LKML

When itimer is ITIMER_REAL, we should cooperate with hrtimer's
tracepoit to get it's latency, this patch can let us get itimer's
hrtimer easy

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 include/trace/events/timer.h |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index 9496b96..13ec15a 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -285,6 +285,7 @@ TRACE_EVENT(itimer_state,
 
 	TP_STRUCT__entry(
 		__field(	int,		which		)
+		__field(	void *,		timer		)
 		__field(	cputime_t,	expires		)
 		__field(	long,		value_sec	)
 		__field(	long,		value_usec	)
@@ -294,6 +295,8 @@ TRACE_EVENT(itimer_state,
 
 	TP_fast_assign(
 		__entry->which		= which;
+		__entry->timer          = which == ITIMER_REAL ?
+					  &current->signal->real_timer : NULL;
 		__entry->expires	= expires;
 		__entry->value_sec	= value->it_value.tv_sec;
 		__entry->value_usec	= value->it_value.tv_usec;
@@ -301,8 +304,9 @@ TRACE_EVENT(itimer_state,
 		__entry->interval_usec	= value->it_interval.tv_usec;
 	),
 
-	TP_printk("which=%d expires=%llu it_value=%ld.%ld it_interval=%ld.%ld",
-		  __entry->which, (unsigned long long)__entry->expires,
+	TP_printk("which=%d timer=%p expires=%llu it_value=%ld.%ld it_interval="
+		  "%ld.%ld", __entry->which, __entry->timer,
+		  (unsigned long long)__entry->expires,
 		  __entry->value_sec, __entry->value_usec,
 		  __entry->interval_sec, __entry->interval_usec)
 );
-- 
1.6.1.2



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 3/5] perf tools: fix getting point
  2009-12-28  7:54       ` Ingo Molnar
                           ` (3 preceding siblings ...)
  2009-12-29  5:21         ` [PATCH v3 2/5] trace_event: record task' real_timer in itimer_state tracepoint Xiao Guangrong
@ 2009-12-29  5:21         ` Xiao Guangrong
  2009-12-29  5:21         ` [PATCH v3 4/5] perf timer: add document for 'perf timer' Xiao Guangrong
  2009-12-29  5:22         ` [PATCH v3 5/5] perf timer: add 'perf timer' core code Xiao Guangrong
  6 siblings, 0 replies; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-29  5:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, Paul Mackerras, LKML

The point that got by raw_field_ptr() is point to @data offset not
tracepoint saved.

Introduce raw_field_array() to get array data

Changlog v1->v2:
- fix the typo 'arry' -> 'array' that is pointed out by Frederic Weisbecker

- use '(void *)(unsigned long)value' instead of 'memcpy()' as
  Frederic Weisbecker's suggestion

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 tools/perf/builtin-sched.c          |    3 +--
 tools/perf/util/trace-event-parse.c |   23 ++++++++++++++++++++---
 tools/perf/util/trace-event.h       |    3 +++
 3 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 702322f..9b91836 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -630,8 +630,7 @@ static void test_calibrations(void)
 
 #define FILL_ARRAY(ptr, array, event, data)			\
 do {								\
-	void *__array = raw_field_ptr(event, #array, data);	\
-	memcpy(ptr.array, __array, sizeof(ptr.array));	\
+	raw_field_array(event, #array, data, ptr.array, sizeof(ptr.array));\
 } while(0)
 
 #define FILL_COMMON_FIELDS(ptr, event, data)			\
diff --git a/tools/perf/util/trace-event-parse.c b/tools/perf/util/trace-event-parse.c
index c5c32be..037bbe3 100644
--- a/tools/perf/util/trace-event-parse.c
+++ b/tools/perf/util/trace-event-parse.c
@@ -840,7 +840,7 @@ static int event_read_fields(struct event *event, struct format_field **fields)
 			field->flags |= FIELD_IS_ARRAY;
 
 			type = read_token(&token);
-		        while (strcmp(token, "]") != 0) {
+			while (strcmp(token, "]") != 0) {
 				if (last_type == EVENT_ITEM &&
 				    type == EVENT_ITEM)
 					len = 2;
@@ -1919,13 +1919,30 @@ raw_field_value(struct event *event, const char *name, void *data)
 
 void *raw_field_ptr(struct event *event, const char *name, void *data)
 {
+	unsigned long long value;
+
+	value = raw_field_value(event, name, data);
+
+	if (!value)
+		return NULL;
+
+	return (void *)(unsigned long)value;
+}
+
+unsigned long long
+raw_field_array(struct event *event, const char *name, void *data,
+	       void *array, int array_size)
+{
+	int len;
 	struct format_field *field;
 
 	field = find_any_field(event, name);
 	if (!field)
-		return NULL;
+		return 0ULL;
 
-	return data + field->offset;
+	len = min(array_size, field->size);
+	memcpy(array, data + field->offset, len);
+	return len;
 }
 
 static int get_common_info(const char *type, int *offset, int *size)
diff --git a/tools/perf/util/trace-event.h b/tools/perf/util/trace-event.h
index 6ad4056..c3b5c39 100644
--- a/tools/perf/util/trace-event.h
+++ b/tools/perf/util/trace-event.h
@@ -255,6 +255,9 @@ unsigned long long read_size(void *ptr, int size);
 unsigned long long
 raw_field_value(struct event *event, const char *name, void *data);
 void *raw_field_ptr(struct event *event, const char *name, void *data);
+unsigned long long
+raw_field_array(struct event *event, const char *name, void *data,
+	       void *array, int array_size);
 unsigned long long eval_flag(const char *flag);
 
 int read_tracing_data(int fd, struct perf_event_attr *pattrs, int nb_events);
-- 
1.6.1.2



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 4/5] perf timer: add document for 'perf timer'
  2009-12-28  7:54       ` Ingo Molnar
                           ` (4 preceding siblings ...)
  2009-12-29  5:21         ` [PATCH v3 3/5] perf tools: fix getting point Xiao Guangrong
@ 2009-12-29  5:21         ` Xiao Guangrong
  2009-12-29  5:22         ` [PATCH v3 5/5] perf timer: add 'perf timer' core code Xiao Guangrong
  6 siblings, 0 replies; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-29  5:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, Paul Mackerras, LKML

Add document for 'perf timer', it let people know how to use it

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 tools/perf/Documentation/perf-timer.txt |   50 +++++++++++++++++++++++++++++++
 1 files changed, 50 insertions(+), 0 deletions(-)
 create mode 100644 tools/perf/Documentation/perf-timer.txt

diff --git a/tools/perf/Documentation/perf-timer.txt b/tools/perf/Documentation/perf-timer.txt
new file mode 100644
index 0000000..5bd26d4
--- /dev/null
+++ b/tools/perf/Documentation/perf-timer.txt
@@ -0,0 +1,50 @@
+perf-timer(1)
+==============
+
+NAME
+----
+perf-timer - Tool to trace/measure timer properties
+
+SYNOPSIS
+--------
+[verse]
+'perf timer' {record|report} [<options>]
+
+DESCRIPTION
+-----------
+There are two variants of perf timer:
+
+  'perf timer record <command>' to record the timer events
+  of an arbitrary workload.
+
+  'perf timer report' to report number of activated/expired/canceled
+  timers, timer latency and timer function runtime.
+
+OPTIONS
+-------
+-D::
+--dump-raw-trace=::
+        Display verbose dump of the sched data.
+
+-i <file>::
+--input=<file>::
+	Select the input file (default: perf.data)
+
+-s <key[,key2...]>::
+--sort=<key[,key2...]>::
+	Sort the output (default: type, avg-lat, max-lat,
+	avg-runtime, max-runtime)
+
+--print-lat::
+	Print timer latency
+
+--print-runtime::
+	Print timer function runtime
+
+--type::
+	Timer type selector, you can specify one or more types
+	(timer, hrtimer, itimer-real, itimer-virtual, itimer-prof)
+
+SEE ALSO
+--------
+linkperf:perf-record[1]
-- 
1.6.1.2



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 5/5] perf timer: add 'perf timer' core code
  2009-12-28  7:54       ` Ingo Molnar
                           ` (5 preceding siblings ...)
  2009-12-29  5:21         ` [PATCH v3 4/5] perf timer: add document for 'perf timer' Xiao Guangrong
@ 2009-12-29  5:22         ` Xiao Guangrong
  6 siblings, 0 replies; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-29  5:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt, Paul Mackerras, LKML

It's the core code of 'perf timer', it can report number of
activated/expired/canceled timers, timer latency and timer function
runtime

Sample output:
- Record timer event:

#perf timer record
^C

- Show timer latency:
#perf timer report --print-lat
                          Timer Latency List
[ Abbreviations: Avg = average; lat = latency; ts = timestamp ]
-------------------------------------------------------------------------
     Timer     |     TYPE      | Avg-lat   ms| Max-lat   ms| Max-lat-ts s
-------------------------------------------------------------------------
swapper        |HRTIMER        |0.180        |2.404        |42722.586    
swapper        |TIMER          |2.000        |2.000        |42722.586    
events/0       |TIMER          |2.000        |2.000        |42723.002    
events/0       |TIMER          |0.000        |0.000        |42723.641    
sshd           |TIMER          |0.000        |0.000        |42723.641    

Statistics
==========
Activated timers number:
  HRTIMER: 105 TIMER: 11 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0 

Expired timers number:
  HRTIMER: 94 TIMER: 12 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0 

Canceled timers number:
  HRTIMER: 105 TIMER: 12 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0 
 
- Show timer function runtime:
#perf timer report --print-runtime
                          Timer Function Runtime List
[ Abbreviations: Avg = average; func = function; rt = runtime; ts = timestamp ]
-------------------------------------------------------------------------------
     Timer     | Type  |Avg-rt ms|Max-rt ms|      Max-rt-func      | Max-rt-ts s
-------------------------------------------------------------------------------
swapper        |HRTIMER|0.010    |0.026    |tick_sched_timer       |42722.586
0xc15f2b70     |HRTIMER|0.009    |0.009    |Not - Catch            |42722.000
swapper        |TIMER  |0.058    |0.111    |tcp_write_timer        |42721.930
events/0       |TIMER  |0.007    |0.010    |delayed_work_timer_fn  |42723.002
swapper        |TIMER  |0.037    |0.037    |dev_watchdog           |42723.002
events/0       |TIMER  |0.028    |0.028    |delayed_work_timer_fn  |42723.641
swapper        |TIMER  |0.023    |0.023    |pcnet32_watchdog       |42722.001
events/1       |TIMER  |0.014    |0.014    |Not - Catch            |42723.003
events/0       |TIMER  |0.010    |0.010    |Not - Catch            |42722.001
sshd           |TIMER  |0.009    |0.009    |delayed_work_timer_fn  |42723.641
swapper        |TIMER  |0.004    |0.004    |Not - Catch            |42721.638
events/1       |TIMER  |0.003    |0.003    |Not - Catch            |42723.003

Statistics
==========
Activated timers number:
  HRTIMER: 105 TIMER: 11 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0 

Expired timers number:
  HRTIMER: 94 TIMER: 12 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0 

Canceled timers number:
  HRTIMER: 105 TIMER: 12 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0 

- Specify types which you are interesting and choose one or more of
  "timer, hrtimer, itimer-real, itimer-virtual, itimer-prof"

#perf timer report --print-lat --type timer

                          Timer Latency List
[ Abbreviations: Avg = average; lat = latency; ts = timestamp ]
-------------------------------------------------------------------------
     Timer     |     TYPE      | Avg-lat   ms| Max-lat   ms| Max-lat-ts s
-------------------------------------------------------------------------
swapper        |TIMER          |2.000        |2.000        |42722.586    
events/0       |TIMER          |2.000        |2.000        |42723.002    
events/0       |TIMER          |0.000        |0.000        |42723.641    
sshd           |TIMER          |0.000        |0.000        |42723.641  

Changlog v1->v2:
- rename 'perf timer latency' to 'perf timer report',
  because it not only show timer latency but also show timer
  function runtime, timer activated/canceled/expired statistics
  and canceled timer list(TODO)

Below changes are all from Thomas's suggestion:
- fix a lot of typos and bad ideas

- use hash table instead of RB tree to record timer

- let output information more readable

- report number of activated/canceled/expired timers

- support to filter timer types:
  '--type' can do it

TODO:
- Show canceled timer list that is suggested by Thomas

- Any suggestion are welcome.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 tools/perf/Makefile         |    1 +
 tools/perf/builtin-timer.c  |  996 +++++++++++++++++++++++++++++++++++++++++++
 tools/perf/builtin.h        |    1 +
 tools/perf/command-list.txt |    1 +
 tools/perf/perf.c           |    1 +
 tools/perf/util/event.h     |    7 +
 6 files changed, 1007 insertions(+), 0 deletions(-)
 create mode 100644 tools/perf/builtin-timer.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 7c84642..b5bbcea 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -449,6 +449,7 @@ BUILTIN_OBJS += builtin-top.o
 BUILTIN_OBJS += builtin-trace.o
 BUILTIN_OBJS += builtin-probe.o
 BUILTIN_OBJS += builtin-kmem.o
+BUILTIN_OBJS += builtin-timer.o
 
 PERFLIBS = $(LIB_FILE)
 
diff --git a/tools/perf/builtin-timer.c b/tools/perf/builtin-timer.c
new file mode 100644
index 0000000..17db588
--- /dev/null
+++ b/tools/perf/builtin-timer.c
@@ -0,0 +1,996 @@
+#include "builtin.h"
+#include "perf.h"
+
+#include "util/util.h"
+#include "util/cache.h"
+#include "util/symbol.h"
+#include "util/thread.h"
+#include "util/header.h"
+#include "util/parse-options.h"
+#include "util/trace-event.h"
+#include "util/debug.h"
+#include "util/session.h"
+
+#include <sys/prctl.h>
+#include <semaphore.h>
+#include <pthread.h>
+#include <math.h>
+
+#include "../../include/linux/hash.h"
+
+#define SORT_KEY	"type, avg-lat, max-lat, avg-runtime, max-runtime"
+
+extern u32 inject_events;
+
+static char const *input_name = "perf.data";
+static const char *sort_order = SORT_KEY;
+static int print_lat, print_runtime, profile_cpu = -1;
+static struct perf_session *timer_session;
+static int bug_nr;
+
+enum timer_type {
+	REAL_ITIMER,
+	VIRTUAL_ITIMER,
+	PROF_ITIMER,
+	TIMER,
+	HRTIMER,
+
+	MAX_TIMER_TYPE,
+};
+
+static char const *timer_type_str[] = {
+	"ITIMER_REAL",
+	"ITIMER_VIRTUAL",
+	"ITIMER_PROF",
+	"TIMER",
+	"HRTIMER",
+};
+
+enum stat_type {
+	ACTIVATED,
+	EXPIRED,
+	CANCELED,
+
+	MAX_STATS,
+};
+
+static u32 stats[MAX_STATS][MAX_TIMER_TYPE];
+static inline void
+stats_update(enum timer_type timer_type, enum stat_type stat_type)
+{
+	stats[stat_type][timer_type]++;
+}
+
+static u32 hz;
+static inline void update_hz(u64 *nhz)
+{
+	hz = *nhz;
+}
+
+static const struct option timer_options[] = {
+	OPT_STRING('i', "input", &input_name, "file",
+		   "input file name"),
+	OPT_BOOLEAN('v', "verbose", &verbose,
+		    "be more verbose (show symbol address, etc)"),
+	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
+		    "dump raw trace in ASCII"),
+	OPT_END()
+};
+
+static const char *const timer_usage[] = {
+	"perf timer [<options>] {record|report}",
+	NULL
+};
+
+static int filter_type;
+static int parse_type_opt(const struct option *opt __used,
+			  const char *arg, int unset __used)
+{
+	char *tok, *str = strdup(arg);
+	int ret = 0;
+
+	if (!str)
+		die("strdup");
+
+	while ((tok = strsep(&str, ",")) != NULL) {
+
+		if (!strcmp(tok, "timer"))
+			filter_type |= 1 << TIMER;
+		else if (!strcmp(tok, "hrtimer"))
+			filter_type |= 1 << HRTIMER;
+		else if (!strcmp(tok, "itimer-real"))
+			filter_type |= 1 << REAL_ITIMER;
+		else if (!strcmp(tok, "itimer-virtual"))
+			filter_type |= 1 << VIRTUAL_ITIMER;
+		else if (!strcmp(tok, "itimer-prof"))
+			filter_type |= 1 << PROF_ITIMER;
+		else {
+			printf("Unknow --type: %s\n", tok);
+			ret = -1;
+			break;
+		}
+	}
+
+	free(str);
+	return ret;
+}
+
+static const struct option report_options[] = {
+	OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
+		   "sort by key(s): "SORT_KEY),
+	OPT_BOOLEAN('v', "verbose", &verbose,
+		    "be more verbose (show symbol address, etc)"),
+	OPT_INTEGER('C', "CPU", &profile_cpu,
+		    "CPU to profile on"),
+	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
+		    "dump raw trace in ASCII"),
+	OPT_BOOLEAN(0, "print-lat", &print_lat, "show timer latency"),
+	OPT_BOOLEAN(0, "print-runtime", &print_runtime,
+		    "show timer function runtime"),
+	OPT_CALLBACK(0, "type", NULL, "timer types",
+		     "timer type selector, specify one or more types(timer, "
+		     "hrtimer, itimer-real, itimer-virtual, itimer-prof)",
+		     parse_type_opt),
+	OPT_END()
+};
+
+static const char *const report_usage[] = {
+	"perf timer report [<options>]",
+	NULL
+};
+
+struct timer_entry {
+	struct timer_entry *next;
+	struct rb_node node;
+
+	void			*timer;
+	enum timer_type		type;
+	u64			function;
+	struct thread		*timer_thread;
+	int expire_nr;
+	int run_func_nr;
+	int bug;
+
+	struct timer_entry	*itimer_hrtimer;
+
+	u64			timer_last_lat;
+	u64			timer_expire_ts;
+	u64			timer_total_lat;
+	double			timer_avg_lat;
+	s64			timer_max_lat;
+	u64			timer_max_lat_at_ts;
+
+	u64			func_entry_ts;
+	u64			func_total_runtime;
+	double			func_avg_runtime;
+	s64			func_max_runtime;
+	u64			func_max_runtime_at_ts;
+	u64			max_runtime_at_func;
+};
+
+typedef int (*sort_func)(struct timer_entry *, struct timer_entry *);
+struct sort_dimension {
+	const char		*name;
+	sort_func		cmp;
+	struct list_head	list;
+};
+
+#define DEFINE_CMP_FUNCTION(field)					\
+static int cmp_##field(struct timer_entry *v1, struct timer_entry *v2)	\
+{									\
+	if (v1->field < v2->field)					\
+		return -1;						\
+	if (v1->field > v2->field)					\
+		return 1;						\
+	return 0;							\
+}
+
+DEFINE_CMP_FUNCTION(type);
+DEFINE_CMP_FUNCTION(timer_avg_lat);
+DEFINE_CMP_FUNCTION(timer_max_lat);
+DEFINE_CMP_FUNCTION(func_avg_runtime);
+DEFINE_CMP_FUNCTION(func_max_runtime);
+
+#define SORT(field, _name)		\
+{	.cmp = cmp_##field,		\
+	.name = _name,			\
+}
+
+static struct sort_dimension sorts[] = {
+	SORT(type, "type"),
+	SORT(timer_avg_lat, "avg-lat"),
+	SORT(timer_max_lat, "max-lat"),
+	SORT(func_avg_runtime, "avg-runtime"),
+	SORT(func_max_runtime, "max-runtime"),
+};
+
+static int sort_dimension_add(const char *tok, struct list_head *list)
+{
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(sorts); i++) {
+		if (!strcmp(sorts[i].name, tok)) {
+			list_add_tail(&sorts[i].list, list);
+			return 0;
+		}
+	}
+
+	return -1;
+}
+
+static LIST_HEAD(sort);
+static void setup_sorting(char const *sort_string)
+{
+	char *tmp, *tok, *str = strdup(sort_string);
+
+	for (tok = strtok_r(str, ",", &tmp);
+			tok; tok = strtok_r(NULL, ", ", &tmp)) {
+		if (sort_dimension_add(tok, &sort) < 0) {
+			error("Unknown --sort key: `%s'", tok);
+			usage_with_options(report_usage, report_options);
+		}
+	}
+
+	free(str);
+}
+
+static int timer_key_cmp(struct list_head *list, struct timer_entry *l,
+			 struct timer_entry *r)
+{
+	struct sort_dimension *sortp;
+	int ret = 0;
+
+	BUG_ON(list_empty(list));
+
+	list_for_each_entry(sortp, list, list) {
+		ret = sortp->cmp(l, r);
+		if (ret)
+			return ret;
+	}
+
+	return ret;
+}
+
+static void timer_rb_insert(struct rb_root *root, struct timer_entry *timer,
+			    struct list_head *sort_list)
+{
+	struct rb_node **new = &(root->rb_node), *parent = NULL;
+
+	while (*new) {
+		struct timer_entry *this;
+		int cmp;
+
+		this = container_of(*new, struct timer_entry, node);
+		parent = *new;
+
+		cmp = timer_key_cmp(sort_list, timer, this);
+
+		if (cmp > 0)
+			new = &((*new)->rb_left);
+		else
+			new = &((*new)->rb_right);
+	}
+
+	rb_link_node(&timer->node, parent, new);
+	rb_insert_color(&timer->node, root);
+}
+
+static void timer_rb_traversal(struct rb_root *root,
+			       void (*func)(struct timer_entry *))
+{
+	struct rb_node *next;
+
+	next = rb_first(root);
+
+	while (next) {
+		struct timer_entry *timer;
+
+		timer = rb_entry(next, struct timer_entry, node);
+		func(timer);
+		next = rb_next(next);
+	}
+}
+
+#define TIMER_HASH_BITS		10
+#define TIMER_HASH_SIZE		(1UL << TIMER_HASH_BITS)
+
+static struct timer_entry *timer_hash_table[TIMER_HASH_SIZE];
+
+static inline int timer_hash_func(void *timer)
+{
+	return hash_ptr(timer, TIMER_HASH_BITS);
+}
+
+static struct timer_entry *
+timer_hash_findnew(void *timer, enum timer_type type)
+{
+	int hash = timer_hash_func(timer);
+	struct timer_entry **head, *curr, *prev = NULL;
+
+	head = timer_hash_table + hash;
+	curr = *head;
+
+	while (curr) {
+		if (curr->timer == timer && curr->type == type)
+			return curr;
+
+		prev = curr;
+		curr = curr->next;
+	}
+
+	curr = zalloc(sizeof(*curr));
+	if (!curr)
+		die("No memory");
+
+	curr->timer = timer;
+	curr->type = type;
+
+	if (prev)
+		prev->next = curr;
+	else
+		*head = curr;
+
+	return curr;
+}
+
+static inline void *
+get_timer_hrtimer(enum timer_type type, struct event *event, void *data)
+{
+	if (type == HRTIMER)
+		return raw_field_ptr(event, "hrtimer", data);
+
+	return raw_field_ptr(event, "timer", data);
+}
+
+/*
+ * if expiry time is not recorded, it means we not caught the start event,
+ * we can't calculate timer latency
+ */
+static inline bool check_timer_expire_ts(struct timer_entry *timer_entry)
+{
+	return !!timer_entry->timer_expire_ts;
+}
+
+/*
+ * if function entry time is not recorded, it means we not caught the
+ * expire entry event, we can't calculate function runtime
+ */
+static inline bool check_timer_func_entry_ts(struct timer_entry *timer_entry)
+{
+	return !!timer_entry->func_entry_ts;
+}
+
+static void timer_hrtimer_start(enum timer_type type, void *data,
+				struct event *event, struct thread *thread)
+{
+	void *timer;
+	struct timer_entry *timer_entry;
+	u64 expires, function;
+
+	timer = get_timer_hrtimer(type, event, data);
+	function = raw_field_value(event, "function", data);
+	expires = raw_field_value(event, "expires", data);
+
+	stats_update(type, ACTIVATED);
+	timer_entry = timer_hash_findnew(timer, type);
+
+	if (!timer_entry->timer_thread)
+		timer_entry->timer_thread = thread;
+
+	timer_entry->function = function;
+	timer_entry->timer_expire_ts = expires;
+}
+
+static void timer_hrtimer_expires_entry(enum timer_type type, void *data,
+					struct event *event, u64 timestamp)
+{
+	void *timer;
+	struct timer_entry *timer_entry;
+	u64 now;
+	s64 delta;
+
+	timer = get_timer_hrtimer(type, event, data);
+	now = raw_field_value(event, "now", data);
+
+	stats_update(type, EXPIRED);
+	timer_entry = timer_hash_findnew(timer, type);
+	timer_entry->func_entry_ts = timestamp;
+
+	if (!check_timer_expire_ts(timer_entry))
+		return;
+
+	delta = now - timer_entry->timer_expire_ts;
+	if (delta < 0) {
+		timer_entry->bug++;
+		printf("WARNING: timer[%p] type[%s]: latency < 0, expires"
+		       "[%llu] now[%llu]\n", timer_entry->timer,
+		       timer_type_str[timer_entry->type],
+		       timer_entry->timer_expire_ts, now);
+		return ;
+	}
+
+	timer_entry->expire_nr++;
+	timer_entry->timer_last_lat = delta;
+	timer_entry->timer_total_lat += delta;
+	if (timer_entry->timer_max_lat <= delta) {
+		timer_entry->timer_max_lat = delta;
+		timer_entry->timer_max_lat_at_ts = timestamp;
+	}
+	return;
+}
+
+static void timer_hrtimer_expires_exit(enum timer_type type, void *data,
+				       struct event *event, u64 timestamp)
+{
+	void *timer;
+	struct timer_entry *timer_entry;
+	s64 delta;
+
+	timer = get_timer_hrtimer(type, event, data);
+	timer_entry = timer_hash_findnew(timer, type);
+
+	if (!check_timer_func_entry_ts(timer_entry))
+		return;
+
+	delta = timestamp - timer_entry->func_entry_ts;
+	if (delta < 0) {
+		timer_entry->bug++;
+		printf("WARNING: timer[%p] type[%s]: runtime < 0, "
+		       "func_entry_ts[%llu] now[%llu]\n", timer_entry->timer,
+		       timer_type_str[timer_entry->type],
+		       timer_entry->func_entry_ts, timestamp);
+		return;
+	}
+
+	timer_entry->run_func_nr++;
+	timer_entry->func_total_runtime += delta;
+	if (timer_entry->func_max_runtime <= delta) {
+		timer_entry->func_max_runtime = delta;
+		timer_entry->func_max_runtime_at_ts = timestamp;
+		timer_entry->max_runtime_at_func = timer_entry->function;
+	}
+
+	return;
+}
+
+static void timer_hrtimer_cancel(enum timer_type type)
+{
+	stats_update(type, CANCELED);
+}
+
+static void timer_start_handler(void *data, struct event *event,
+				u64 timestamp __used, struct thread *thread)
+{
+	return timer_hrtimer_start(TIMER, data, event, thread);
+}
+
+static void
+timer_expire_entry_handler(void *data, struct event *event, u64 timestamp,
+			   struct thread *thread __used)
+{
+	return timer_hrtimer_expires_entry(TIMER, data, event, timestamp);
+}
+
+static void
+timer_expire_exit_handler(void *data, struct event *event, u64 timestamp,
+			  struct thread *thread __used)
+{
+	return timer_hrtimer_expires_exit(TIMER, data, event, timestamp);
+}
+
+static void timer_cancel(void *data __used, struct event *event __used,
+			 u64 timestamp __used, struct thread *thread __used)
+{
+	return timer_hrtimer_cancel(TIMER);
+}
+
+static void hrtimer_start_handler(void *data, struct event *event,
+				  u64 timestamp __used, struct thread *thread)
+{
+	return timer_hrtimer_start(HRTIMER, data, event, thread);
+}
+
+static void
+hrtimer_expire_entry_handler(void *data, struct event *event, u64 timestamp,
+			     struct thread *thread __used)
+{
+	return timer_hrtimer_expires_entry(HRTIMER, data, event, timestamp);
+}
+
+static void
+hrtimer_expire_exit_handler(void *data, struct event *event, u64 timestamp,
+			    struct thread *thread __used)
+{
+	return timer_hrtimer_expires_exit(HRTIMER, data, event, timestamp);
+}
+
+static void
+hrtimer_cancel(void *data __used, struct event *event __used,
+	       u64 timestamp __used, struct thread *thread __used)
+{
+	return timer_hrtimer_cancel(HRTIMER);
+}
+
+static void itimer_state_handler(void *data, struct event *event,
+				 u64 timestamp __used, struct thread *thread)
+{
+	struct timer_entry *timer_entry;
+	u64 value_sec, value_usec, expires;
+	int which;
+
+	value_sec = raw_field_value(event, "value_sec", data);
+	value_usec = raw_field_value(event, "value_usec", data);
+	expires = raw_field_value(event, "expires", data);
+	which = raw_field_value(event, "which", data);
+
+	timer_entry = timer_hash_findnew(thread, which);
+
+	/* itimer canceled */
+	if (!value_sec && !value_usec) {
+		stats_update(which, CANCELED);
+		return ;
+	}
+
+	/* itimer started */
+	stats_update(which, ACTIVATED);
+
+	if (which == ITIMER_REAL) {
+		void *hrtimer;
+
+		hrtimer = raw_field_ptr(event, "timer", data);
+		timer_entry->itimer_hrtimer = timer_hash_findnew(hrtimer,
+								 HRTIMER);
+	} else
+		timer_entry->timer_expire_ts = expires;
+}
+
+static void
+itimer_expire_handler(void *data, struct event *event, u64 timestamp,
+		      struct thread *thread __used)
+{
+	struct thread *itimer_thread;
+	struct timer_entry *timer_entry;
+	u64 now;
+	s64 delta;
+	pid_t pid;
+	int which;
+
+	which = raw_field_value(event, "which", data);
+	now = raw_field_value(event, "now", data);
+	pid = raw_field_value(event, "pid", data);
+
+	itimer_thread = perf_session__findnew(timer_session, pid);
+	timer_entry = timer_hash_findnew(itimer_thread, which);
+
+	stats_update(which, EXPIRED);
+
+	if (which == ITIMER_REAL) {
+		if (!timer_entry->itimer_hrtimer ||
+		    !check_timer_expire_ts(timer_entry->itimer_hrtimer))
+			return;
+		delta = timer_entry->itimer_hrtimer->timer_last_lat;
+	} else {
+		if (!check_timer_expire_ts(timer_entry))
+			return;
+		delta = now - timer_entry->timer_expire_ts;
+	}
+
+	if (delta < 0) {
+		timer_entry->bug++;
+		printf("WARNING: timer[%p] type[%s]: latency < 0, expires:"
+		       "[%llu] now[%llu]\n", timer_entry->timer,
+		       timer_type_str[timer_entry->type],
+		       timer_entry->timer_expire_ts, now);
+		return;
+	}
+
+	timer_entry->expire_nr++;
+	timer_entry->timer_total_lat += delta;
+	if (timer_entry->timer_max_lat <= delta) {
+		timer_entry->timer_max_lat = delta;
+		timer_entry->timer_max_lat_at_ts = timestamp;
+	}
+
+	return;
+}
+
+struct event_handler {
+	const char *event_name;
+	void (*handler) (void *data, struct event *event, u64 timestamp,
+			 struct thread *thread);
+} timer_handler[] = {
+	/* timer */
+	{ "timer_start", 	timer_start_handler		},
+	{ "timer_expire_entry",	timer_expire_entry_handler	},
+	{ "timer_expire_exit",	timer_expire_exit_handler	},
+	{ "timer_cancel",	timer_cancel			},
+
+	/* hrtimer */
+	{ "hrtimer_start",	hrtimer_start_handler		},
+	{ "hrtimer_expire_entry", hrtimer_expire_entry_handler	},
+	{ "hrtimer_expire_exit",  hrtimer_expire_exit_handler	},
+	{ "hrtimer_cancel",	hrtimer_cancel			},
+
+	/* itimer */
+	{ "itimer_state",	itimer_state_handler		},
+	{ "itimer_expire",	itimer_expire_handler		},
+};
+
+static void process_raw_event(event_t *raw_event __used,
+			      struct perf_session *session __used, void *data,
+			      int cpu __used, u64 timestamp,
+			      struct thread *thread)
+{
+	struct event *event;
+	unsigned int i;
+	int type;
+
+	type = trace_parse_common_type(data);
+	event = trace_find_event(type);
+
+	for (i = 0; i < ARRAY_SIZE(timer_handler); i++)
+		if (!strcmp(timer_handler[i].event_name, event->name))
+			timer_handler[i].handler(data, event,
+						 timestamp, thread);
+}
+
+static int process_sample_event(event_t *event, struct perf_session *session)
+{
+	struct sample_data data;
+	struct thread *thread;
+
+	if (!(session->sample_type & PERF_SAMPLE_RAW))
+		return 0;
+
+	memset(&data, 0, sizeof(data));
+	data.time = -1;
+	data.cpu = -1;
+	data.period = -1;
+
+	event__parse_sample(event, session->sample_type, &data);
+
+	dump_printf("(IP, %d): %d/%d: %p period: %lld\n",
+		event->header.misc,
+		data.pid, data.tid,
+		(void *)(long)data.ip,
+		(long long)data.period);
+
+	thread = perf_session__findnew(session, data.pid);
+	if (thread == NULL) {
+		pr_debug("problem processing %d event, skipping it.\n",
+			 event->header.type);
+		return -1;
+	}
+
+	dump_printf(" ... thread: %s:%d\n", thread->comm, thread->pid);
+
+	if (profile_cpu != -1 && profile_cpu != (int)data.cpu)
+		return 0;
+
+	process_raw_event(event, session, data.raw_data, data.cpu,
+			  data.time, thread);
+	return 0;
+}
+
+static int
+process_inject_event(event_t *event, struct perf_session *session __used)
+{
+	if (event->inject.inject_event_id == PERF_INJECT_HZ)
+		update_hz(&event->inject.value);
+
+	return 0;
+}
+
+static struct perf_event_ops event_ops = {
+	.sample	= process_sample_event,
+	.comm	= event__process_comm,
+	.inject	= process_inject_event,
+};
+
+static int read_events(void)
+{
+	int err;
+
+	timer_session = perf_session__new(input_name, O_RDONLY, 0);
+	if (!timer_session)
+		return -ENOMEM;
+
+	err = perf_session__process_events(timer_session, &event_ops);
+	return err;
+}
+
+static inline bool timer_hz_unit(struct timer_entry *entry)
+{
+	return entry->type == TIMER || entry->type == VIRTUAL_ITIMER
+		|| entry->type == PROF_ITIMER;
+}
+
+static void timer_adjust(struct timer_entry *entry)
+{
+	if (entry->expire_nr) {
+		entry->timer_avg_lat = (double)entry->timer_total_lat /
+				       (double)entry->expire_nr;
+		if (timer_hz_unit(entry) && hz) {
+			/* change unit to nanosecond */
+			entry->timer_avg_lat  = entry->timer_avg_lat * 1e9 / hz;
+			entry->timer_max_lat = entry->timer_max_lat * 1e9 / hz;
+		}
+	}
+
+	if (entry->run_func_nr)
+		entry->func_avg_runtime = (double)entry->func_total_runtime /
+					  (double)entry->run_func_nr;
+}
+
+static struct rb_root timer_result;
+static void sort_result(void)
+{
+	unsigned int i;
+
+	if (!filter_type)
+		filter_type = 0xFF;
+
+	for (i = 0; i < TIMER_HASH_SIZE; i++) {
+		struct timer_entry *entry;
+
+		for (entry = *(timer_hash_table + i); entry;
+				entry = entry->next) {
+			if (!(filter_type & 1 << entry->type))
+				continue;
+
+			bug_nr += entry->bug;
+			timer_adjust(entry);
+			timer_rb_insert(&timer_result, entry, &sort);
+		}
+	}
+}
+
+static inline void
+print_timer_name(struct timer_entry *timer_entry, char *buf, int buf_len)
+{
+	enum timer_type type;
+
+	type = timer_entry->type;
+	if (type != TIMER && type != HRTIMER) {
+		snprintf(buf, buf_len, "%s",
+			 ((struct thread *)timer_entry->timer)->comm);
+		return;
+	}
+
+	if (timer_entry->timer_thread)
+		snprintf(buf, buf_len, "%s", timer_entry->timer_thread->comm);
+	else
+		snprintf(buf, buf_len, "%p", timer_entry->timer);
+}
+
+static void __print_timer_lat(struct timer_entry *timer_entry)
+{
+	char buf[20];
+
+	if (!timer_entry->expire_nr)
+		return;
+
+	print_timer_name(timer_entry, buf, sizeof(buf));
+	printf("%-15s", buf);
+
+	printf("|%-15s", timer_type_str[timer_entry->type]);
+	if (timer_hz_unit(timer_entry) && !hz) {
+		printf("|%-10.3f %2s", timer_entry->timer_avg_lat, "HZ");
+		printf("|%-10llu %2s", timer_entry->timer_max_lat, "HZ");
+	} else {
+		printf("|%-13.3f", timer_entry->timer_avg_lat / 1e6);
+		printf("|%-13.3f", timer_entry->timer_max_lat / 1e6);
+	}
+
+	printf("|%-13.3f\n", timer_entry->timer_max_lat_at_ts / 1e9);
+}
+
+static void print_timer_lat(void)
+{
+	printf("\n");
+	printf("                          Timer Latency List\n");
+	printf(
+"[ Abbreviations: Avg = average; lat = latency; ts = timestamp ]\n");
+	printf(
+"-------------------------------------------------------------------------\n");
+	printf(
+"     Timer     |     TYPE      | Avg-lat   ms| Max-lat   ms| Max-lat-ts s\n");
+	printf(
+"-------------------------------------------------------------------------\n");
+	timer_rb_traversal(&timer_result, __print_timer_lat);
+}
+
+static void __print_func_runtime(struct timer_entry *timer_entry)
+{
+	u64 function;
+	char buf[20];
+
+	if (!timer_entry->run_func_nr)
+		return;
+
+	print_timer_name(timer_entry, buf, sizeof(buf));
+	printf("%-15s", buf);
+	printf("|%-7s", timer_type_str[timer_entry->type]);
+	printf("|%-9.3f", timer_entry->func_avg_runtime / 1e6);
+	printf("|%-9.3f", timer_entry->func_max_runtime / 1e6);
+	function = timer_entry->max_runtime_at_func;
+	if (function) {
+		struct symbol *sym;
+
+		sym = map_groups__find_function(&timer_session->kmaps,
+						timer_session, function, NULL);
+		if (sym)
+			printf("|%-23.23s", sym->name);
+		else
+			printf("|%-23llx", function);
+	} else
+		printf("|%-23s", "Not - Catch");
+
+	printf("|%-8.3f\n", timer_entry->func_max_runtime_at_ts / 1e9);
+}
+
+static void print_func_runtime(void)
+{
+	printf("\n");
+	printf("                           Timer Function Runtime List\n");
+	printf(
+"[ Abbreviations: Avg = average; func = function; rt = runtime; ts = timestamp ]\n");
+
+	printf(
+"-------------------------------------------------------------------------------\n");
+	printf(
+"     Timer     | Type  |Avg-rt ms|Max-rt ms|      Max-rt-func      | Max-rt-ts s\n");
+	printf(
+"-------------------------------------------------------------------------------\n");
+	timer_rb_traversal(&timer_result, __print_func_runtime);
+}
+
+static void __print_bug_timer(struct timer_entry *timer_entry)
+{
+	char buf[20];
+
+	if (!timer_entry->bug)
+		return;
+
+	print_timer_name(timer_entry, buf, sizeof(buf));
+	printf("%-15s", buf);
+
+	printf("|%-15s", timer_type_str[timer_entry->type]);
+	printf("|%-10d\n", timer_entry->bug);
+}
+
+static void print_bug_timer(void)
+{
+	if (bug_nr) {
+		printf("\nWarning: detect %d bug(s):\n", bug_nr);
+		printf("--------------------------------------------\n");
+		printf("     Timer     |     Type      |    Bug   \n");
+		printf("--------------------------------------------\n");
+		timer_rb_traversal(&timer_result, __print_bug_timer);
+	}
+}
+
+static void
+__print_stats(enum stat_type stat_type, char *buf, int buf_len)
+{
+	int i;
+
+	for (i = MAX_TIMER_TYPE - 1; i >= 0; i--) {
+		int plen;
+
+		if (!(filter_type & 1 << i))
+			continue;
+		plen = snprintf(buf, buf_len, "%s: %d ", timer_type_str[i],
+				stats[stat_type][i]);
+		buf += plen;
+		buf_len -= plen;
+	}
+}
+
+static void print_stats(void)
+{
+	char buf[128];
+
+	printf("\nStatistics\n==========\n");
+	__print_stats(ACTIVATED, buf, sizeof(buf));
+	printf("Activated timers number:\n  %s\n\n", buf);
+	__print_stats(EXPIRED, buf, sizeof(buf));
+	printf("Expired timers number:\n  %s\n\n", buf);
+	__print_stats(CANCELED, buf, sizeof(buf));
+	printf("Canceled timers number:\n  %s\n\n", buf);
+}
+
+static void print_result(void)
+{
+	if (print_lat)
+		print_timer_lat();
+
+	if (print_runtime)
+		print_func_runtime();
+
+	print_bug_timer();
+	print_stats();
+}
+
+static void __cmd_report(void)
+{
+	setup_pager();
+	read_events();
+	setup_sorting(sort_order);
+	sort_result();
+	print_result();
+	perf_session__delete(timer_session);
+}
+
+static const char *record_args[] = {
+	"record",
+	"-a",
+	"-R",
+	"-M",
+	"-f",
+	"-m", "1024",
+	"-c", "1",
+
+	/* timer */
+	"-e", "timer:timer_start",
+	"-e", "timer:timer_expire_entry",
+	"-e", "timer:timer_expire_exit",
+	"-e", "timer:timer_cancel",
+
+	/* hrtimer */
+	"-e", "timer:hrtimer_start",
+	"-e", "timer:hrtimer_expire_entry",
+	"-e", "timer:hrtimer_expire_exit",
+	"-e", "timer:hrtimer_cancel",
+
+	/* itimer **/
+	"-e", "timer:itimer_state",
+	"-e", "timer:itimer_expire",
+};
+
+static int __cmd_record(int argc, const char **argv)
+{
+	unsigned int rec_argc, i, j;
+	const char **rec_argv;
+
+	inject_events = PERF_INJECT_HZ;
+	rec_argc = ARRAY_SIZE(record_args) + argc - 1;
+	rec_argv = calloc(rec_argc + 1, sizeof(char *));
+
+	for (i = 0; i < ARRAY_SIZE(record_args); i++)
+		rec_argv[i] = strdup(record_args[i]);
+
+	for (j = 1; j < (unsigned int)argc; j++, i++)
+		rec_argv[i] = argv[j];
+
+	BUG_ON(i != rec_argc);
+
+	return cmd_record(i, rec_argv, NULL);
+}
+
+int cmd_timer(int argc, const char **argv, const char *prefix __used)
+{
+	argc = parse_options(argc, argv, timer_options, timer_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+	if (!argc)
+		usage_with_options(timer_usage, timer_options);
+
+	symbol__init();
+	if (!strncmp(argv[0], "rec", 3))
+		return __cmd_record(argc, argv);
+
+	if (!strcmp(argv[0], "report")) {
+		if (argc > 1) {
+			argc = parse_options(argc, argv, report_options,
+					     report_usage, 0);
+			if (argc)
+				usage_with_options(report_usage,
+						   report_options);
+		}
+		__cmd_report();
+
+	} else
+		usage_with_options(timer_usage, timer_options);
+
+	return 0;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index 18035b1..e294128 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -30,5 +30,6 @@ extern int cmd_trace(int argc, const char **argv, const char *prefix);
 extern int cmd_version(int argc, const char **argv, const char *prefix);
 extern int cmd_probe(int argc, const char **argv, const char *prefix);
 extern int cmd_kmem(int argc, const char **argv, const char *prefix);
+extern int cmd_timer(int argc, const char **argv, const char *prefix);
 
 #endif
diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt
index 71dc7c3..e522f5d 100644
--- a/tools/perf/command-list.txt
+++ b/tools/perf/command-list.txt
@@ -16,3 +16,4 @@ perf-top			mainporcelain common
 perf-trace			mainporcelain common
 perf-probe			mainporcelain common
 perf-kmem			mainporcelain common
+perf-timer			mainporcelain common
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index fc89005..dfb2732 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -301,6 +301,7 @@ static void handle_internal_command(int argc, const char **argv)
 		{ "sched",	cmd_sched,	0 },
 		{ "probe",	cmd_probe,	0 },
 		{ "kmem",	cmd_kmem,	0 },
+		{ "timer",	cmd_timer,	0 },
 	};
 	unsigned int i;
 	static const char ext[] = STRIP_EXTENSION;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 80fb365..6432c75 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -75,6 +75,12 @@ struct sample_data {
 	void *raw_data;
 };
 
+struct inject_event {
+	struct perf_event_header header;
+	u32 inject_event_id;
+	u64 value;
+};
+
 #define BUILD_ID_SIZE 20
 
 struct build_id_event {
@@ -92,6 +98,7 @@ typedef union event_union {
 	struct lost_event		lost;
 	struct read_event		read;
 	struct sample_event		sample;
+	struct inject_event		inject;
 } event_t;
 
 struct events_stats {
-- 
1.6.1.2



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ
  2009-12-29  5:21         ` [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ Xiao Guangrong
@ 2009-12-30  9:19           ` Peter Zijlstra
  2009-12-30  9:28             ` Ingo Molnar
  2009-12-30  9:37             ` Xiao Guangrong
  0 siblings, 2 replies; 42+ messages in thread
From: Peter Zijlstra @ 2009-12-30  9:19 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Ingo Molnar, Frederic Weisbecker, Thomas Gleixner,
	Steven Rostedt, Paul Mackerras, LKML

On Tue, 2009-12-29 at 13:21 +0800, Xiao Guangrong wrote:
> 'inject' event is a very useful feature and it's suggested by Ingo
> [ See http://lkml.org/lkml/2009/12/28/31 ]

I really dislike the name, event injection to me would be like a write()
interface where you provide the actual event data to be stuffed in the
output stream.

This just seems like a very weird way of getting data out. A kind of
like sysconf() but done very strange.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ
  2009-12-30  9:19           ` Peter Zijlstra
@ 2009-12-30  9:28             ` Ingo Molnar
  2009-12-30  9:36               ` Peter Zijlstra
  2009-12-30  9:37             ` Xiao Guangrong
  1 sibling, 1 reply; 42+ messages in thread
From: Ingo Molnar @ 2009-12-30  9:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Xiao Guangrong, Frederic Weisbecker, Thomas Gleixner,
	Steven Rostedt, Paul Mackerras, LKML


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, 2009-12-29 at 13:21 +0800, Xiao Guangrong wrote:
> > 'inject' event is a very useful feature and it's suggested by Ingo
> > [ See http://lkml.org/lkml/2009/12/28/31 ]
> 
> I really dislike the name, event injection to me would be like a write() 
> interface where you provide the actual event data to be stuffed in the 
> output stream.
> 
> This just seems like a very weird way of getting data out. A kind of like 
> sysconf() but done very strange.

What kind of API would you suggest?

	Ingo

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ
  2009-12-30  9:28             ` Ingo Molnar
@ 2009-12-30  9:36               ` Peter Zijlstra
  2009-12-30  9:44                 ` Ingo Molnar
  2009-12-30 10:06                 ` Peter Zijlstra
  0 siblings, 2 replies; 42+ messages in thread
From: Peter Zijlstra @ 2009-12-30  9:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Xiao Guangrong, Frederic Weisbecker, Thomas Gleixner,
	Steven Rostedt, Paul Mackerras, LKML

On Wed, 2009-12-30 at 10:28 +0100, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Tue, 2009-12-29 at 13:21 +0800, Xiao Guangrong wrote:
> > > 'inject' event is a very useful feature and it's suggested by Ingo
> > > [ See http://lkml.org/lkml/2009/12/28/31 ]
> > 
> > I really dislike the name, event injection to me would be like a write() 
> > interface where you provide the actual event data to be stuffed in the 
> > output stream.
> > 
> > This just seems like a very weird way of getting data out. A kind of like 
> > sysconf() but done very strange.
> 
> What kind of API would you suggest?

sysconf() seems ideal for getting single, mostly constant variables out
of the kernel, we already have non POSIX (read Linux specific) names in
there.

_SC_KERNEL_RELOCATION_OFFSET
_SC_KERNEL_HZ

or somethings like that comes to mind.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ
  2009-12-30  9:19           ` Peter Zijlstra
  2009-12-30  9:28             ` Ingo Molnar
@ 2009-12-30  9:37             ` Xiao Guangrong
  2009-12-30  9:45               ` Peter Zijlstra
  1 sibling, 1 reply; 42+ messages in thread
From: Xiao Guangrong @ 2009-12-30  9:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Frederic Weisbecker, Thomas Gleixner,
	Steven Rostedt, Paul Mackerras, LKML



Peter Zijlstra wrote:
> On Tue, 2009-12-29 at 13:21 +0800, Xiao Guangrong wrote:
>> 'inject' event is a very useful feature and it's suggested by Ingo
>> [ See http://lkml.org/lkml/2009/12/28/31 ]
> 
> I really dislike the name, event injection to me would be like a write()
> interface where you provide the actual event data to be stuffed in the
> output stream.
> 

Yes, it like write and it writes something form kernel to perf.
And, i think this name is suitable for it's doing that injects an
'artificial' event, it's well if you have other name for it :-)

> This just seems like a very weird way of getting data out. A kind of
> like sysconf() but done very strange.
> 

It's since some parameter is only used by perf, suce as HZ in this patch and
'relocation offset' in my other patchset, it also well if has better way to
do it :-)

Thanks,
Xiao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ
  2009-12-30  9:36               ` Peter Zijlstra
@ 2009-12-30  9:44                 ` Ingo Molnar
  2009-12-30 10:06                 ` Peter Zijlstra
  1 sibling, 0 replies; 42+ messages in thread
From: Ingo Molnar @ 2009-12-30  9:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Xiao Guangrong, Frederic Weisbecker, Thomas Gleixner,
	Steven Rostedt, Paul Mackerras, LKML


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, 2009-12-30 at 10:28 +0100, Ingo Molnar wrote:
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > On Tue, 2009-12-29 at 13:21 +0800, Xiao Guangrong wrote:
> > > > 'inject' event is a very useful feature and it's suggested by Ingo
> > > > [ See http://lkml.org/lkml/2009/12/28/31 ]
> > > 
> > > I really dislike the name, event injection to me would be like a write() 
> > > interface where you provide the actual event data to be stuffed in the 
> > > output stream.
> > > 
> > > This just seems like a very weird way of getting data out. A kind of like 
> > > sysconf() but done very strange.
> > 
> > What kind of API would you suggest?
> 
> sysconf() seems ideal for getting single, mostly constant variables out
> of the kernel, we already have non POSIX (read Linux specific) names in
> there.
> 
> _SC_KERNEL_RELOCATION_OFFSET
> _SC_KERNEL_HZ
> 
> or somethings like that comes to mind.

Nah. HZ might be static here but we'll have more dynamic parameters so the 
sysconf API is rather inflexible there.

I was thinking of something that fits into existing perf API schemes, to keep 
it simple and self-sufficient, and to allow for extensions/etc.

	Ingo

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ
  2009-12-30  9:37             ` Xiao Guangrong
@ 2009-12-30  9:45               ` Peter Zijlstra
  0 siblings, 0 replies; 42+ messages in thread
From: Peter Zijlstra @ 2009-12-30  9:45 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Ingo Molnar, Frederic Weisbecker, Thomas Gleixner,
	Steven Rostedt, Paul Mackerras, LKML

On Wed, 2009-12-30 at 17:37 +0800, Xiao Guangrong wrote:
> > I really dislike the name, event injection to me would be like a write()
> > interface where you provide the actual event data to be stuffed in the
> > output stream.
> > 
> 
> Yes, it like write and it writes something form kernel to perf.
> And, i think this name is suitable for it's doing that injects an
> 'artificial' event, it's well if you have other name for it :-)

No, what I means is that you cannot actually inject an arbitrary event.
Sure it injects something, but that's pretty limited.

> > This just seems like a very weird way of getting data out. A kind of
> > like sysconf() but done very strange.
> > 
> 
> It's since some parameter is only used by perf, suce as HZ in this patch and
> 'relocation offset' in my other patchset, it also well if has better way to
> do it :-)

The thing is, this proposed interface is very much tied to perf and
makes it basically impossible for !perf to use. Whereas I can imagine
esp. the relocation offset to be interesting for other things. Basically
everything needing to resolve a kernel symbol.




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ
  2009-12-30  9:36               ` Peter Zijlstra
  2009-12-30  9:44                 ` Ingo Molnar
@ 2009-12-30 10:06                 ` Peter Zijlstra
  2009-12-30 11:30                   ` Ingo Molnar
  1 sibling, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2009-12-30 10:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Xiao Guangrong, Frederic Weisbecker, Thomas Gleixner,
	Steven Rostedt, Paul Mackerras, LKML

On Wed, 2009-12-30 at 10:36 +0100, Peter Zijlstra wrote:
> On Wed, 2009-12-30 at 10:28 +0100, Ingo Molnar wrote:
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > On Tue, 2009-12-29 at 13:21 +0800, Xiao Guangrong wrote:
> > > > 'inject' event is a very useful feature and it's suggested by Ingo
> > > > [ See http://lkml.org/lkml/2009/12/28/31 ]
> > > 
> > > I really dislike the name, event injection to me would be like a write() 
> > > interface where you provide the actual event data to be stuffed in the 
> > > output stream.
> > > 
> > > This just seems like a very weird way of getting data out. A kind of like 
> > > sysconf() but done very strange.
> > 
> > What kind of API would you suggest?
> 
> sysconf() seems ideal for getting single, mostly constant variables out
> of the kernel, we already have non POSIX (read Linux specific) names in
> there.
> 
> _SC_KERNEL_RELOCATION_OFFSET
> _SC_KERNEL_HZ
> 
> or somethings like that comes to mind.

OK so there is no sysconf() syscall and its all implemented in glibc,
which is utter suckage.. will have to come up with something saner then.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ
  2009-12-30 10:06                 ` Peter Zijlstra
@ 2009-12-30 11:30                   ` Ingo Molnar
  0 siblings, 0 replies; 42+ messages in thread
From: Ingo Molnar @ 2009-12-30 11:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Xiao Guangrong, Frederic Weisbecker, Thomas Gleixner,
	Steven Rostedt, Paul Mackerras, LKML


* Peter Zijlstra <peterz@infradead.org> wrote:

> OK so there is no sysconf() syscall and its all implemented in glibc, which 
> is utter suckage.. will have to come up with something saner then.

I think integrating it into the perf stream of events would be the most 
desirable approach - tooling could make use of it easily.

For data record (and hierarchy) description /debug/tracing/events/ would be 
useful.

We dont want to actually 'activate' them in an 'event' fashion - but we could 
use event IDs and the record format and the transport to describe and recover 
the values.

For transport we could use the mmap ring-buffer and it would also be nice to 
add (or resurrect) some 'quick & easy' read()/write() based transport. It a 
bit like a value read-out from a counter - just here it's not a regular 
counter but some sort of kernel-internal value that we want to provide.

Hm?

	Ingo

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2009-12-30 11:30 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-12-15 11:17 [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior Xiao Guangrong
2009-12-15 11:19 ` [PATCH 1/4] trace_event: record task' real_timer in itimer_state tracepoint Xiao Guangrong
2009-12-15 11:20   ` [PATCH 2/4] perf_event: fix getting point Xiao Guangrong
2009-12-15 11:21     ` [PATCH 3/4] perf/timer: add document Xiao Guangrong
2009-12-15 11:22       ` [PATCH 4/4] perf/timer: 'perf timer' core code Xiao Guangrong
2009-12-15 17:44         ` Thomas Gleixner
2009-12-16  5:56           ` Xiao Guangrong
2009-12-16 15:59             ` Thomas Gleixner
2009-12-17  7:26               ` Xiao Guangrong
2009-12-15 13:58     ` [PATCH 2/4] perf_event: fix getting point Frederic Weisbecker
2009-12-16  1:03       ` Xiao Guangrong
2009-12-16  1:22         ` Frederic Weisbecker
2009-12-16  1:32           ` Xiao Guangrong
2009-12-15 14:15 ` [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior Frederic Weisbecker
2009-12-16  1:19   ` Xiao Guangrong
2009-12-16  7:32     ` Ingo Molnar
2009-12-16  7:40       ` Xiao Guangrong
2009-12-16  7:46         ` Ingo Molnar
2009-12-15 14:23 ` Frederic Weisbecker
2009-12-22 13:00   ` [PATCH v2 0/5] " Xiao Guangrong
2009-12-22 13:01     ` [PATCH v2 1/5] perf_event: fix getting point Xiao Guangrong
2009-12-22 13:03     ` [PATCH v2 2/5]: trace_event: export HZ in timer's tracepoint format Xiao Guangrong
2009-12-22 13:20       ` Xiao Guangrong
2009-12-28  7:54       ` Ingo Molnar
2009-12-28 10:40         ` Xiao Guangrong
2009-12-29  5:20         ` [PATCH v3 0/5] perf tools: introduce 'perf timer' to analyze timer's behavior Xiao Guangrong
2009-12-29  5:21         ` [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ Xiao Guangrong
2009-12-30  9:19           ` Peter Zijlstra
2009-12-30  9:28             ` Ingo Molnar
2009-12-30  9:36               ` Peter Zijlstra
2009-12-30  9:44                 ` Ingo Molnar
2009-12-30 10:06                 ` Peter Zijlstra
2009-12-30 11:30                   ` Ingo Molnar
2009-12-30  9:37             ` Xiao Guangrong
2009-12-30  9:45               ` Peter Zijlstra
2009-12-29  5:21         ` [PATCH v3 2/5] trace_event: record task' real_timer in itimer_state tracepoint Xiao Guangrong
2009-12-29  5:21         ` [PATCH v3 3/5] perf tools: fix getting point Xiao Guangrong
2009-12-29  5:21         ` [PATCH v3 4/5] perf timer: add document for 'perf timer' Xiao Guangrong
2009-12-29  5:22         ` [PATCH v3 5/5] perf timer: add 'perf timer' core code Xiao Guangrong
2009-12-22 13:04     ` [PATCH v2 3/5] trace_event: record task' real_timer in itimer_state tracepoint Xiao Guangrong
2009-12-22 13:06       ` [PATCH v2 4/5] perf/timer: add document for 'perf timer' Xiao Guangrong
2009-12-22 13:08         ` [PATCH v2 5/5] perf/timer: add 'perf timer' core code Xiao Guangrong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).