* [GIT PULL] perf tools updates
@ 2010-04-24  2:05 Frederic Weisbecker
  2010-04-24  2:05 ` [PATCH 1/9] perf lock: Fix state machine to recognize lock sequence Frederic Weisbecker
                   ` (10 more replies)
  0 siblings, 11 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2010-04-24  2:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, William Cohen, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Paul Mackerras, Hitoshi Mitake,
	Masami Hiramatsu, Tom Zanussi, Arjan van de Ven, Pekka Enberg,
	Li Zefan, Stephane Eranian, Jens Axboe, Jason Baron,
	Xiao Guangrong

Ingo,

Please pull the perf/core branch that can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	perf/core

Thanks,
	Frederic
---

Frederic Weisbecker (6):
      perf: Generalize perf lock's sample event reordering to the session layer
      perf: Use generic sample reordering in perf sched
      perf: Use generic sample reordering in perf kmem
      perf: Use generic sample reordering in perf trace
      perf: Use generic sample reordering in perf timechart
      perf: Add a perf trace option to check samples ordering reliability

Hitoshi Mitake (1):
      perf lock: Fix state machine to recognize lock sequence

Stephane Eranian (1):
      perf: Fix initialization bug in parse_single_tracepoint_event()

William Cohen (1):
      perf: Some perf-kvm documentation edits


 tools/perf/Documentation/perf-kvm.txt |    9 +-
 tools/perf/builtin-kmem.c             |    6 +-
 tools/perf/builtin-lock.c             |  595 ++++++++++++++++++++-------------
 tools/perf/builtin-sched.c            |    8 +-
 tools/perf/builtin-timechart.c        |  112 +------
 tools/perf/builtin-trace.c            |   13 +
 tools/perf/util/parse-events.c        |   13 +-
 tools/perf/util/session.c             |  179 ++++++++++-
 tools/perf/util/session.h             |   10 +
 9 files changed, 583 insertions(+), 362 deletions(-)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 1/9] perf lock: Fix state machine to recognize lock sequence
  2010-04-24  2:05 [GIT PULL] perf tools updates Frederic Weisbecker
@ 2010-04-24  2:05 ` Frederic Weisbecker
  2010-04-24 10:43   ` [PATCH] perf lock: add "info" subcommand for dumping misc information Hitoshi Mitake
  2010-04-24 10:46   ` [PATCH v2] " Hitoshi Mitake
  2010-04-24  2:05 ` [PATCH 2/9] perf: Fix initialization bug in parse_single_tracepoint_event() Frederic Weisbecker
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2010-04-24  2:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Hitoshi Mitake, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, Jens Axboe, Jason Baron,
	Xiao Guangrong, Ingo Molnar, Frederic Weisbecker

From: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>

The previous state machine of perf lock was really broken;
this patch improves it a little.

It prepares a list of state machines, one per thread, that
represent lock sequences.

Each state machine recognizes one of the following sequences:

      1) acquire -> acquired -> release
      2) acquire -> contended -> acquired -> release
      3) acquire (w/ try) -> release
      4) acquire (w/ read) -> release

The case of 4) is a little special: double acquire of a read lock is
allowed, so the state machine counts the number of read locks taken
and permits nested acquire and release.

But things are not so simple; something in my model is still wrong.
I counted the number of lock instances with a bad sequence, and the
ratio looks like this (when tracing whoami): bad:233, total:2279
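
The sequence rules above can be sketched as a tiny standalone state
machine. This is an illustrative model only, not the code from the
patch: the real implementation keeps one struct lock_seq_stat per
(thread, lock address) pair and discards a lock's statistics entirely
once a sequence breaks.

```c
#include <assert.h>

/* Illustrative per-lock sequence state machine (names are not the
 * ones used in builtin-lock.c). */
enum seq_state {
	SEQ_UNINITIALIZED,	/* first event for this lock not seen yet */
	SEQ_ACQUIRING,
	SEQ_CONTENDED,
	SEQ_ACQUIRED,
	SEQ_READ_ACQUIRED,
	SEQ_RELEASED,
	SEQ_BROKEN,		/* sequence violated, discard this lock */
};

struct seq {
	enum seq_state state;
	int read_count;		/* nested read-lock acquisitions */
};

#define FLAG_TRY  1
#define FLAG_READ 2

static void on_acquire(struct seq *s, int flag)
{
	switch (s->state) {
	case SEQ_UNINITIALIZED:
	case SEQ_RELEASED:
		if (!flag) {
			s->state = SEQ_ACQUIRING;	/* cases 1) and 2) */
		} else {
			/* try and/or read acquire succeeds immediately */
			s->state = SEQ_READ_ACQUIRED;
			s->read_count = 1;
		}
		break;
	case SEQ_READ_ACQUIRED:
		if (flag & FLAG_READ)
			s->read_count++;	/* double read acquire is legal */
		else
			s->state = SEQ_BROKEN;
		break;
	default:
		s->state = SEQ_BROKEN;		/* broken lock sequence */
	}
}

static void on_acquired(struct seq *s)
{
	if (s->state == SEQ_ACQUIRING || s->state == SEQ_CONTENDED)
		s->state = SEQ_ACQUIRED;
	else if (s->state != SEQ_UNINITIALIZED)	/* ignore orphan events */
		s->state = SEQ_BROKEN;
}

static void on_release(struct seq *s)
{
	if (s->state == SEQ_UNINITIALIZED)
		return;			/* orphan event, ignore */
	if (s->state == SEQ_READ_ACQUIRED && --s->read_count > 0)
		return;			/* still held via an outer read lock */
	if (s->state == SEQ_ACQUIRED || s->state == SEQ_READ_ACQUIRED)
		s->state = SEQ_RELEASED;
	else
		s->state = SEQ_BROKEN;	/* e.g. release while still acquiring */
}
```

Running sequence 1) through this model ends in SEQ_RELEASED, while a
doubly read-acquired lock only reaches SEQ_RELEASED after both
releases.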

version 2:
 * threads are now identified by tid, not pid
 * added SEQ_STATE_READ_ACQUIRED for read locks
 * the struct lock_seq_stat instances are now kept in a linked list
 * enhanced debug information (this has to be removed someday)
   e.g.
     | === output for debug===
     |
     | bad:233, total:2279
     | bad rate:0.000000
     | histogram of events caused bad sequence
     |     acquire: 165
     |    acquired: 0
     |   contended: 0
     |     release: 68

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <1271852634-9351-1-git-send-email-mitake@dcl.info.waseda.ac.jp>
[rename SEQ_STATE_UNINITED to SEQ_STATE_UNINITIALIZED]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 tools/perf/builtin-lock.c |  410 +++++++++++++++++++++++++++++++++++++--------
 1 files changed, 342 insertions(+), 68 deletions(-)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index 6c38e4f..716d8c5 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -23,6 +23,8 @@
 #include <linux/list.h>
 #include <linux/hash.h>
 
+static struct perf_session *session;
+
 /* based on kernel/lockdep.c */
 #define LOCKHASH_BITS		12
 #define LOCKHASH_SIZE		(1UL << LOCKHASH_BITS)
@@ -32,9 +34,6 @@ static struct list_head lockhash_table[LOCKHASH_SIZE];
 #define __lockhashfn(key)	hash_long((unsigned long)key, LOCKHASH_BITS)
 #define lockhashentry(key)	(lockhash_table + __lockhashfn((key)))
 
-#define LOCK_STATE_UNLOCKED	0	       /* initial state */
-#define LOCK_STATE_LOCKED	1
-
 struct lock_stat {
 	struct list_head	hash_entry;
 	struct rb_node		rb;		/* used for sorting */
@@ -47,20 +46,151 @@ struct lock_stat {
 	void			*addr;		/* address of lockdep_map, used as ID */
 	char			*name;		/* for strcpy(), we cannot use const */
 
-	int			state;
-	u64			prev_event_time; /* timestamp of previous event */
-
-	unsigned int		nr_acquired;
 	unsigned int		nr_acquire;
+	unsigned int		nr_acquired;
 	unsigned int		nr_contended;
 	unsigned int		nr_release;
 
+	unsigned int		nr_readlock;
+	unsigned int		nr_trylock;
 	/* these times are in nano sec. */
 	u64			wait_time_total;
 	u64			wait_time_min;
 	u64			wait_time_max;
+
+	int			discard; /* flag of blacklist */
+};
+
+/*
+ * States of lock_seq_stat
+ *
+ * UNINITIALIZED is required for detecting first event of acquire.
+ * As the nature of lock events, there is no guarantee
+ * that the first event for the locks are acquire,
+ * it can be acquired, contended or release.
+ */
+#define SEQ_STATE_UNINITIALIZED      0	       /* initial state */
+#define SEQ_STATE_RELEASED	1
+#define SEQ_STATE_ACQUIRING	2
+#define SEQ_STATE_ACQUIRED	3
+#define SEQ_STATE_READ_ACQUIRED	4
+#define SEQ_STATE_CONTENDED	5
+
+/*
+ * MAX_LOCK_DEPTH
+ * Imported from include/linux/sched.h.
+ * Should this be synchronized?
+ */
+#define MAX_LOCK_DEPTH 48
+
+/*
+ * struct lock_seq_stat:
+ * Place to put on state of one lock sequence
+ * 1) acquire -> acquired -> release
+ * 2) acquire -> contended -> acquired -> release
+ * 3) acquire (with read or try) -> release
+ * 4) Are there other patterns?
+ */
+struct lock_seq_stat {
+	struct list_head        list;
+	int			state;
+	u64			prev_event_time;
+	void                    *addr;
+
+	int                     read_count;
 };
 
+struct thread_stat {
+	struct rb_node		rb;
+
+	u32                     tid;
+	struct list_head        seq_list;
+};
+
+static struct rb_root		thread_stats;
+
+static struct thread_stat *thread_stat_find(u32 tid)
+{
+	struct rb_node *node;
+	struct thread_stat *st;
+
+	node = thread_stats.rb_node;
+	while (node) {
+		st = container_of(node, struct thread_stat, rb);
+		if (st->tid == tid)
+			return st;
+		else if (tid < st->tid)
+			node = node->rb_left;
+		else
+			node = node->rb_right;
+	}
+
+	return NULL;
+}
+
+static void thread_stat_insert(struct thread_stat *new)
+{
+	struct rb_node **rb = &thread_stats.rb_node;
+	struct rb_node *parent = NULL;
+	struct thread_stat *p;
+
+	while (*rb) {
+		p = container_of(*rb, struct thread_stat, rb);
+		parent = *rb;
+
+		if (new->tid < p->tid)
+			rb = &(*rb)->rb_left;
+		else if (new->tid > p->tid)
+			rb = &(*rb)->rb_right;
+		else
+			BUG_ON("inserting invalid thread_stat\n");
+	}
+
+	rb_link_node(&new->rb, parent, rb);
+	rb_insert_color(&new->rb, &thread_stats);
+}
+
+static struct thread_stat *thread_stat_findnew_after_first(u32 tid)
+{
+	struct thread_stat *st;
+
+	st = thread_stat_find(tid);
+	if (st)
+		return st;
+
+	st = zalloc(sizeof(struct thread_stat));
+	if (!st)
+		die("memory allocation failed\n");
+
+	st->tid = tid;
+	INIT_LIST_HEAD(&st->seq_list);
+
+	thread_stat_insert(st);
+
+	return st;
+}
+
+static struct thread_stat *thread_stat_findnew_first(u32 tid);
+static struct thread_stat *(*thread_stat_findnew)(u32 tid) =
+	thread_stat_findnew_first;
+
+static struct thread_stat *thread_stat_findnew_first(u32 tid)
+{
+	struct thread_stat *st;
+
+	st = zalloc(sizeof(struct thread_stat));
+	if (!st)
+		die("memory allocation failed\n");
+	st->tid = tid;
+	INIT_LIST_HEAD(&st->seq_list);
+
+	rb_link_node(&st->rb, NULL, &thread_stats.rb_node);
+	rb_insert_color(&st->rb, &thread_stats);
+
+	thread_stat_findnew = thread_stat_findnew_after_first;
+	return st;
+}
+
 /* build simple key function one is bigger than two */
 #define SINGLE_KEY(member)						\
 	static int lock_stat_key_ ## member(struct lock_stat *one,	\
@@ -175,8 +305,6 @@ static struct lock_stat *lock_stat_findnew(void *addr, const char *name)
 		goto alloc_failed;
 	strcpy(new->name, name);
 
-	/* LOCK_STATE_UNLOCKED == 0 isn't guaranteed forever */
-	new->state = LOCK_STATE_UNLOCKED;
 	new->wait_time_min = ULLONG_MAX;
 
 	list_add(&new->hash_entry, entry);
@@ -198,6 +326,7 @@ struct raw_event_sample {
 struct trace_acquire_event {
 	void			*addr;
 	const char		*name;
+	int			flag;
 };
 
 struct trace_acquired_event {
@@ -241,120 +370,246 @@ struct trace_lock_handler {
 			      struct thread *thread);
 };
 
+static struct lock_seq_stat *get_seq(struct thread_stat *ts, void *addr)
+{
+	struct lock_seq_stat *seq;
+
+	list_for_each_entry(seq, &ts->seq_list, list) {
+		if (seq->addr == addr)
+			return seq;
+	}
+
+	seq = zalloc(sizeof(struct lock_seq_stat));
+	if (!seq)
+		die("Not enough memory\n");
+	seq->state = SEQ_STATE_UNINITIALIZED;
+	seq->addr = addr;
+
+	list_add(&seq->list, &ts->seq_list);
+	return seq;
+}
+
+static int bad_hist[4];
+
 static void
 report_lock_acquire_event(struct trace_acquire_event *acquire_event,
 			struct event *__event __used,
 			int cpu __used,
-			u64 timestamp,
+			u64 timestamp __used,
 			struct thread *thread __used)
 {
-	struct lock_stat *st;
+	struct lock_stat *ls;
+	struct thread_stat *ts;
+	struct lock_seq_stat *seq;
+
+	ls = lock_stat_findnew(acquire_event->addr, acquire_event->name);
+	if (ls->discard)
+		return;
 
-	st = lock_stat_findnew(acquire_event->addr, acquire_event->name);
+	ts = thread_stat_findnew(thread->pid);
+	seq = get_seq(ts, acquire_event->addr);
 
-	switch (st->state) {
-	case LOCK_STATE_UNLOCKED:
+	switch (seq->state) {
+	case SEQ_STATE_UNINITIALIZED:
+	case SEQ_STATE_RELEASED:
+		if (!acquire_event->flag) {
+			seq->state = SEQ_STATE_ACQUIRING;
+		} else {
+			if (acquire_event->flag & 1)
+				ls->nr_trylock++;
+			if (acquire_event->flag & 2)
+				ls->nr_readlock++;
+			seq->state = SEQ_STATE_READ_ACQUIRED;
+			seq->read_count = 1;
+			ls->nr_acquired++;
+		}
+		break;
+	case SEQ_STATE_READ_ACQUIRED:
+		if (acquire_event->flag & 2) {
+			seq->read_count++;
+			ls->nr_acquired++;
+			goto end;
+		} else {
+			goto broken;
+		}
 		break;
-	case LOCK_STATE_LOCKED:
+	case SEQ_STATE_ACQUIRED:
+	case SEQ_STATE_ACQUIRING:
+	case SEQ_STATE_CONTENDED:
+broken:
+		/* broken lock sequence, discard it */
+		ls->discard = 1;
+		bad_hist[0]++;
+		list_del(&seq->list);
+		free(seq);
+		goto end;
 		break;
 	default:
-		BUG_ON(1);
+		BUG_ON("Unknown state of lock sequence found!\n");
 		break;
 	}
 
-	st->prev_event_time = timestamp;
+	ls->nr_acquire++;
+	seq->prev_event_time = timestamp;
+end:
+	return;
 }
 
 static void
 report_lock_acquired_event(struct trace_acquired_event *acquired_event,
 			 struct event *__event __used,
 			 int cpu __used,
-			 u64 timestamp,
+			 u64 timestamp __used,
 			 struct thread *thread __used)
 {
-	struct lock_stat *st;
+	struct lock_stat *ls;
+	struct thread_stat *ts;
+	struct lock_seq_stat *seq;
+	u64 contended_term;
 
-	st = lock_stat_findnew(acquired_event->addr, acquired_event->name);
+	ls = lock_stat_findnew(acquired_event->addr, acquired_event->name);
+	if (ls->discard)
+		return;
+
+	ts = thread_stat_findnew(thread->pid);
+	seq = get_seq(ts, acquired_event->addr);
 
-	switch (st->state) {
-	case LOCK_STATE_UNLOCKED:
-		st->state = LOCK_STATE_LOCKED;
-		st->nr_acquired++;
+	switch (seq->state) {
+	case SEQ_STATE_UNINITIALIZED:
+		/* orphan event, do nothing */
+		return;
+	case SEQ_STATE_ACQUIRING:
 		break;
-	case LOCK_STATE_LOCKED:
+	case SEQ_STATE_CONTENDED:
+		contended_term = timestamp - seq->prev_event_time;
+		ls->wait_time_total += contended_term;
+
+		if (contended_term < ls->wait_time_min)
+			ls->wait_time_min = contended_term;
+		else if (ls->wait_time_max < contended_term)
+			ls->wait_time_max = contended_term;
 		break;
+	case SEQ_STATE_RELEASED:
+	case SEQ_STATE_ACQUIRED:
+	case SEQ_STATE_READ_ACQUIRED:
+		/* broken lock sequence, discard it */
+		ls->discard = 1;
+		bad_hist[1]++;
+		list_del(&seq->list);
+		free(seq);
+		goto end;
+		break;
+
 	default:
-		BUG_ON(1);
+		BUG_ON("Unknown state of lock sequence found!\n");
 		break;
 	}
 
-	st->prev_event_time = timestamp;
+	seq->state = SEQ_STATE_ACQUIRED;
+	ls->nr_acquired++;
+	seq->prev_event_time = timestamp;
+end:
+	return;
 }
 
 static void
 report_lock_contended_event(struct trace_contended_event *contended_event,
 			  struct event *__event __used,
 			  int cpu __used,
-			  u64 timestamp,
+			  u64 timestamp __used,
 			  struct thread *thread __used)
 {
-	struct lock_stat *st;
+	struct lock_stat *ls;
+	struct thread_stat *ts;
+	struct lock_seq_stat *seq;
+
+	ls = lock_stat_findnew(contended_event->addr, contended_event->name);
+	if (ls->discard)
+		return;
 
-	st = lock_stat_findnew(contended_event->addr, contended_event->name);
+	ts = thread_stat_findnew(thread->pid);
+	seq = get_seq(ts, contended_event->addr);
 
-	switch (st->state) {
-	case LOCK_STATE_UNLOCKED:
+	switch (seq->state) {
+	case SEQ_STATE_UNINITIALIZED:
+		/* orphan event, do nothing */
+		return;
+	case SEQ_STATE_ACQUIRING:
 		break;
-	case LOCK_STATE_LOCKED:
-		st->nr_contended++;
+	case SEQ_STATE_RELEASED:
+	case SEQ_STATE_ACQUIRED:
+	case SEQ_STATE_READ_ACQUIRED:
+	case SEQ_STATE_CONTENDED:
+		/* broken lock sequence, discard it */
+		ls->discard = 1;
+		bad_hist[2]++;
+		list_del(&seq->list);
+		free(seq);
+		goto end;
 		break;
 	default:
-		BUG_ON(1);
+		BUG_ON("Unknown state of lock sequence found!\n");
 		break;
 	}
 
-	st->prev_event_time = timestamp;
+	seq->state = SEQ_STATE_CONTENDED;
+	ls->nr_contended++;
+	seq->prev_event_time = timestamp;
+end:
+	return;
 }
 
 static void
 report_lock_release_event(struct trace_release_event *release_event,
 			struct event *__event __used,
 			int cpu __used,
-			u64 timestamp,
+			u64 timestamp __used,
 			struct thread *thread __used)
 {
-	struct lock_stat *st;
-	u64 hold_time;
+	struct lock_stat *ls;
+	struct thread_stat *ts;
+	struct lock_seq_stat *seq;
 
-	st = lock_stat_findnew(release_event->addr, release_event->name);
+	ls = lock_stat_findnew(release_event->addr, release_event->name);
+	if (ls->discard)
+		return;
 
-	switch (st->state) {
-	case LOCK_STATE_UNLOCKED:
-		break;
-	case LOCK_STATE_LOCKED:
-		st->state = LOCK_STATE_UNLOCKED;
-		hold_time = timestamp - st->prev_event_time;
+	ts = thread_stat_findnew(thread->pid);
+	seq = get_seq(ts, release_event->addr);
 
-		if (timestamp < st->prev_event_time) {
-			/* terribly, this can happen... */
+	switch (seq->state) {
+	case SEQ_STATE_UNINITIALIZED:
+		goto end;
+		break;
+	case SEQ_STATE_ACQUIRED:
+		break;
+	case SEQ_STATE_READ_ACQUIRED:
+		seq->read_count--;
+		BUG_ON(seq->read_count < 0);
+		if (!seq->read_count) {
+			ls->nr_release++;
 			goto end;
 		}
-
-		if (st->wait_time_min > hold_time)
-			st->wait_time_min = hold_time;
-		if (st->wait_time_max < hold_time)
-			st->wait_time_max = hold_time;
-		st->wait_time_total += hold_time;
-
-		st->nr_release++;
+		break;
+	case SEQ_STATE_ACQUIRING:
+	case SEQ_STATE_CONTENDED:
+	case SEQ_STATE_RELEASED:
+		/* broken lock sequence, discard it */
+		ls->discard = 1;
+		bad_hist[3]++;
+		goto free_seq;
 		break;
 	default:
-		BUG_ON(1);
+		BUG_ON("Unknown state of lock sequence found!\n");
 		break;
 	}
 
+	ls->nr_release++;
+free_seq:
+	list_del(&seq->list);
+	free(seq);
 end:
-	st->prev_event_time = timestamp;
+	return;
 }
 
 /* lock oriented handlers */
@@ -381,6 +636,7 @@ process_lock_acquire_event(void *data,
 	tmp = raw_field_value(event, "lockdep_addr", data);
 	memcpy(&acquire_event.addr, &tmp, sizeof(void *));
 	acquire_event.name = (char *)raw_field_ptr(event, "name", data);
+	acquire_event.flag = (int)raw_field_value(event, "flag", data);
 
 	if (trace_handler->acquire_event)
 		trace_handler->acquire_event(&acquire_event, event, cpu, timestamp, thread);
@@ -441,8 +697,8 @@ process_lock_release_event(void *data,
 }
 
 static void
-process_raw_event(void *data, int cpu,
-		  u64 timestamp, struct thread *thread)
+process_raw_event(void *data, int cpu __used,
+		  u64 timestamp __used, struct thread *thread __used)
 {
 	struct event *event;
 	int type;
@@ -604,14 +860,15 @@ static void queue_raw_event(void *data, int raw_size, int cpu,
 	}
 }
 
-static int process_sample_event(event_t *event, struct perf_session *session)
+static int process_sample_event(event_t *event, struct perf_session *s)
 {
 	struct thread *thread;
 	struct sample_data data;
 
 	bzero(&data, sizeof(struct sample_data));
-	event__parse_sample(event, session->sample_type, &data);
-	thread = perf_session__findnew(session, data.pid);
+	event__parse_sample(event, s->sample_type, &data);
+	/* CAUTION: using tid as thread.pid */
+	thread = perf_session__findnew(s, data.tid);
 
 	if (thread == NULL) {
 		pr_debug("problem processing %d event, skipping it.\n",
@@ -634,8 +891,8 @@ static void print_result(void)
 {
 	struct lock_stat *st;
 	char cut_name[20];
+	int bad, total;
 
-	printf("%18s ", "ID");
 	printf("%20s ", "Name");
 	printf("%10s ", "acquired");
 	printf("%10s ", "contended");
@@ -646,11 +903,15 @@ static void print_result(void)
 
 	printf("\n\n");
 
+	bad = total = 0;
 	while ((st = pop_from_result())) {
+		total++;
+		if (st->discard) {
+			bad++;
+			continue;
+		}
 		bzero(cut_name, 20);
 
-		printf("%p ", st->addr);
-
 		if (strlen(st->name) < 16) {
 			/* output raw name */
 			printf("%20s ", st->name);
@@ -673,6 +934,21 @@ static void print_result(void)
 		       0 : st->wait_time_min);
 		printf("\n");
 	}
+
+	{
+		/* Output for debug, this have to be removed */
+		int i;
+		const char *name[4] =
+			{ "acquire", "acquired", "contended", "release" };
+
+		printf("\n=== output for debug===\n\n");
+		printf("bad:%d, total:%d\n", bad, total);
+		printf("bad rate:%f\n", (double)(bad / total));
+
+		printf("histogram of events caused bad sequence\n");
+		for (i = 0; i < 4; i++)
+			printf(" %10s: %d\n", name[i], bad_hist[i]);
+	}
 }
 
 static void dump_map(void)
@@ -692,8 +968,6 @@ static struct perf_event_ops eops = {
 	.comm			= event__process_comm,
 };
 
-static struct perf_session *session;
-
 static int read_events(void)
 {
 	session = perf_session__new(input_name, O_RDONLY, 0);
-- 
1.6.2.3



* [PATCH 2/9] perf: Fix initialization bug in parse_single_tracepoint_event()
  2010-04-24  2:05 [GIT PULL] perf tools updates Frederic Weisbecker
  2010-04-24  2:05 ` [PATCH 1/9] perf lock: Fix state machine to recognize lock sequence Frederic Weisbecker
@ 2010-04-24  2:05 ` Frederic Weisbecker
  2010-04-24  2:05 ` [PATCH 3/9] perf: Generalize perf lock's sample event reordering to the session layer Frederic Weisbecker
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2010-04-24  2:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Stephane Eranian, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Paul Mackerras, Ingo Molnar, Frederic Weisbecker

From: Stephane Eranian <eranian@google.com>

parse_single_tracepoint_event() was setting some attributes before it
had validated that the event was indeed a tracepoint event. This
caused problems for other initialization routines, such as the one in
builtin-top.c, where sample_period is only given a default value if
it is still 0.
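
As a sketch of the interaction the changelog describes (an assumed,
simplified shape of the builtin-top.c logic, with illustrative names,
not the actual perf code): a caller-supplied default period is applied
only when the parser left sample_period at 0, so pre-setting it to 1
before validation defeated the default.

```c
#include <assert.h>

/* Hypothetical attr with only the field relevant here. */
struct fake_attr {
	unsigned long long sample_period;
};

/* Apply a default period only if the parser did not set one:
 * this mirrors "sample_period is not set if not 0". */
static void apply_default_period(struct fake_attr *attr,
				 unsigned long long def)
{
	if (attr->sample_period == 0)
		attr->sample_period = def;
}
```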

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <4bcf232b.698fd80a.6fbe.ffffb737@mx.google.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 tools/perf/util/parse-events.c |   13 +++++++------
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 3b4ec67..600d327 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -418,12 +418,6 @@ parse_single_tracepoint_event(char *sys_name,
 	u64 id;
 	int fd;
 
-	attr->sample_type |= PERF_SAMPLE_RAW;
-	attr->sample_type |= PERF_SAMPLE_TIME;
-	attr->sample_type |= PERF_SAMPLE_CPU;
-
-	attr->sample_period = 1;
-
 	snprintf(evt_path, MAXPATHLEN, "%s/%s/%s/id", debugfs_path,
 		 sys_name, evt_name);
 
@@ -442,6 +436,13 @@ parse_single_tracepoint_event(char *sys_name,
 	attr->type = PERF_TYPE_TRACEPOINT;
 	*strp = evt_name + evt_length;
 
+	attr->sample_type |= PERF_SAMPLE_RAW;
+	attr->sample_type |= PERF_SAMPLE_TIME;
+	attr->sample_type |= PERF_SAMPLE_CPU;
+
+	attr->sample_period = 1;
+
+
 	return EVT_HANDLED;
 }
 
-- 
1.6.2.3



* [PATCH 3/9] perf: Generalize perf lock's sample event reordering to the session layer
  2010-04-24  2:05 [GIT PULL] perf tools updates Frederic Weisbecker
  2010-04-24  2:05 ` [PATCH 1/9] perf lock: Fix state machine to recognize lock sequence Frederic Weisbecker
  2010-04-24  2:05 ` [PATCH 2/9] perf: Fix initialization bug in parse_single_tracepoint_event() Frederic Weisbecker
@ 2010-04-24  2:05 ` Frederic Weisbecker
  2010-04-24  2:05 ` [PATCH 4/9] perf: Use generic sample reordering in perf sched Frederic Weisbecker
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2010-04-24  2:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Paul Mackerras, Hitoshi Mitake,
	Ingo Molnar, Masami Hiramatsu, Tom Zanussi

The sample events recorded by perf record are not time-ordered,
because we have one buffer per cpu for each event (even demultiplexed
per task/per cpu for task-bound events). But when we read trace
events, we want them ordered by time, because many state machines are
involved.

There are currently two ways perf tools deal with that:

- use -M to multiplex all the buffers (perf sched, perf kmem).
  But this creates a lot of contention on SMP machines at record
  time.

- use post-processing time reordering (perf timechart, perf lock).
  The reordering used by timechart is simple but doesn't scale to
  huge flows of events, in terms of both performance and memory use
  (it is unusable with perf lock, for example).
  Perf lock has its own sample reordering that flushes its memory
  use on a regular basis and sorts relative to the previously queued
  event (a newly queued event is close to the previous one most of
  the time).

This patch exports perf lock's sample reordering facility to the
session layer that reads the events. If a tool wants ordered sample
events, it only needs to set its
struct perf_event_ops::ordered_samples to true.

This prepares tracing-based perf tools to get rid of buffer
multiplexing (-M) and of their own ad hoc reordering.

Also lower the flush period to 2 seconds, as that is already
sufficient.
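
The flush-window heuristic implemented in the session layer below can
be sketched in isolation (an illustrative, unitless model; the real
code works in nanoseconds with FLUSH_PERIOD = 2 * NSEC_PER_SEC): a
slice of 2 * PERIOD worth of events stays buffered, and once the
newest timestamp runs PERIOD past the current limit, the older half
of the slice becomes safe to flush in order.

```c
#include <assert.h>
#include <limits.h>

#define PERIOD 2ULL	/* illustrative; really 2 * NSEC_PER_SEC */

static unsigned long long flush_limit = ULLONG_MAX;

/* Queue an event timestamp; return the timestamp up to which it is
 * now safe to flush the sorted queue, or 0 if nothing should be
 * flushed yet. */
static unsigned long long queue_event(unsigned long long ts)
{
	if (flush_limit == ULLONG_MAX)	/* first event seen */
		flush_limit = ts + PERIOD;

	/* Only the first half of the 2 * PERIOD slice is flushed; the
	 * second half may still receive out-of-order events. */
	if (ts > flush_limit && ts - flush_limit > PERIOD) {
		flush_limit += PERIOD;
		return flush_limit;
	}
	return 0;
}
```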

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
---
 tools/perf/builtin-lock.c |  197 +++++----------------------------------------
 tools/perf/util/session.c |  179 ++++++++++++++++++++++++++++++++++++++++-
 tools/perf/util/session.h |   10 +++
 3 files changed, 210 insertions(+), 176 deletions(-)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index 716d8c5..ce27675 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -316,8 +316,6 @@ alloc_failed:
 
 static char			const *input_name = "perf.data";
 
-static int			profile_cpu = -1;
-
 struct raw_event_sample {
 	u32			size;
 	char			data[0];
@@ -697,8 +695,7 @@ process_lock_release_event(void *data,
 }
 
 static void
-process_raw_event(void *data, int cpu __used,
-		  u64 timestamp __used, struct thread *thread __used)
+process_raw_event(void *data, int cpu, u64 timestamp, struct thread *thread)
 {
 	struct event *event;
 	int type;
@@ -716,176 +713,6 @@ process_raw_event(void *data, int cpu __used,
 		process_lock_release_event(data, event, cpu, timestamp, thread);
 }
 
-struct raw_event_queue {
-	u64			timestamp;
-	int			cpu;
-	void			*data;
-	struct thread		*thread;
-	struct list_head	list;
-};
-
-static LIST_HEAD(raw_event_head);
-
-#define FLUSH_PERIOD	(5 * NSEC_PER_SEC)
-
-static u64 flush_limit = ULLONG_MAX;
-static u64 last_flush = 0;
-struct raw_event_queue *last_inserted;
-
-static void flush_raw_event_queue(u64 limit)
-{
-	struct raw_event_queue *tmp, *iter;
-
-	list_for_each_entry_safe(iter, tmp, &raw_event_head, list) {
-		if (iter->timestamp > limit)
-			return;
-
-		if (iter == last_inserted)
-			last_inserted = NULL;
-
-		process_raw_event(iter->data, iter->cpu, iter->timestamp,
-				  iter->thread);
-
-		last_flush = iter->timestamp;
-		list_del(&iter->list);
-		free(iter->data);
-		free(iter);
-	}
-}
-
-static void __queue_raw_event_end(struct raw_event_queue *new)
-{
-	struct raw_event_queue *iter;
-
-	list_for_each_entry_reverse(iter, &raw_event_head, list) {
-		if (iter->timestamp < new->timestamp) {
-			list_add(&new->list, &iter->list);
-			return;
-		}
-	}
-
-	list_add(&new->list, &raw_event_head);
-}
-
-static void __queue_raw_event_before(struct raw_event_queue *new,
-				     struct raw_event_queue *iter)
-{
-	list_for_each_entry_continue_reverse(iter, &raw_event_head, list) {
-		if (iter->timestamp < new->timestamp) {
-			list_add(&new->list, &iter->list);
-			return;
-		}
-	}
-
-	list_add(&new->list, &raw_event_head);
-}
-
-static void __queue_raw_event_after(struct raw_event_queue *new,
-				     struct raw_event_queue *iter)
-{
-	list_for_each_entry_continue(iter, &raw_event_head, list) {
-		if (iter->timestamp > new->timestamp) {
-			list_add_tail(&new->list, &iter->list);
-			return;
-		}
-	}
-	list_add_tail(&new->list, &raw_event_head);
-}
-
-/* The queue is ordered by time */
-static void __queue_raw_event(struct raw_event_queue *new)
-{
-	if (!last_inserted) {
-		__queue_raw_event_end(new);
-		return;
-	}
-
-	/*
-	 * Most of the time the current event has a timestamp
-	 * very close to the last event inserted, unless we just switched
-	 * to another event buffer. Having a sorting based on a list and
-	 * on the last inserted event that is close to the current one is
-	 * probably more efficient than an rbtree based sorting.
-	 */
-	if (last_inserted->timestamp >= new->timestamp)
-		__queue_raw_event_before(new, last_inserted);
-	else
-		__queue_raw_event_after(new, last_inserted);
-}
-
-static void queue_raw_event(void *data, int raw_size, int cpu,
-			    u64 timestamp, struct thread *thread)
-{
-	struct raw_event_queue *new;
-
-	if (flush_limit == ULLONG_MAX)
-		flush_limit = timestamp + FLUSH_PERIOD;
-
-	if (timestamp < last_flush) {
-		printf("Warning: Timestamp below last timeslice flush\n");
-		return;
-	}
-
-	new = malloc(sizeof(*new));
-	if (!new)
-		die("Not enough memory\n");
-
-	new->timestamp = timestamp;
-	new->cpu = cpu;
-	new->thread = thread;
-
-	new->data = malloc(raw_size);
-	if (!new->data)
-		die("Not enough memory\n");
-
-	memcpy(new->data, data, raw_size);
-
-	__queue_raw_event(new);
-	last_inserted = new;
-
-	/*
-	 * We want to have a slice of events covering 2 * FLUSH_PERIOD
-	 * If FLUSH_PERIOD is big enough, it ensures every events that occured
-	 * in the first half of the timeslice have all been buffered and there
-	 * are none remaining (we need that because of the weakly ordered
-	 * event recording we have). Then once we reach the 2 * FLUSH_PERIOD
-	 * timeslice, we flush the first half to be gentle with the memory
-	 * (the second half can still get new events in the middle, so wait
-	 * another period to flush it)
-	 */
-	if (new->timestamp > flush_limit &&
-		new->timestamp - flush_limit > FLUSH_PERIOD) {
-		flush_limit += FLUSH_PERIOD;
-		flush_raw_event_queue(flush_limit);
-	}
-}
-
-static int process_sample_event(event_t *event, struct perf_session *s)
-{
-	struct thread *thread;
-	struct sample_data data;
-
-	bzero(&data, sizeof(struct sample_data));
-	event__parse_sample(event, s->sample_type, &data);
-	/* CAUTION: using tid as thread.pid */
-	thread = perf_session__findnew(s, data.tid);
-
-	if (thread == NULL) {
-		pr_debug("problem processing %d event, skipping it.\n",
-			 event->header.type);
-		return -1;
-	}
-
-	dump_printf(" ... thread: %s:%d\n", thread->comm, thread->pid);
-
-	if (profile_cpu != -1 && profile_cpu != (int) data.cpu)
-		return 0;
-
-	queue_raw_event(data.raw_data, data.raw_size, data.cpu, data.time, thread);
-
-	return 0;
-}
-
 /* TODO: various way to print, coloring, nano or milli sec */
 static void print_result(void)
 {
@@ -963,9 +790,30 @@ static void dump_map(void)
 	}
 }
 
+static int process_sample_event(event_t *self, struct perf_session *s)
+{
+	struct sample_data data;
+	struct thread *thread;
+
+	bzero(&data, sizeof(data));
+	event__parse_sample(self, s->sample_type, &data);
+
+	thread = perf_session__findnew(s, data.tid);
+	if (thread == NULL) {
+		pr_debug("problem processing %d event, skipping it.\n",
+			self->header.type);
+		return -1;
+	}
+
+	process_raw_event(data.raw_data, data.cpu, data.time, thread);
+
+	return 0;
+}
+
 static struct perf_event_ops eops = {
 	.sample			= process_sample_event,
 	.comm			= event__process_comm,
+	.ordered_samples	= true,
 };
 
 static int read_events(void)
@@ -994,7 +842,6 @@ static void __cmd_report(void)
 	setup_pager();
 	select_key();
 	read_events();
-	flush_raw_event_queue(ULLONG_MAX);
 	sort_result();
 	print_result();
 }
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 7d88ae5..b7aade2 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -98,6 +98,8 @@ struct perf_session *perf_session__new(const char *filename, int mode, bool forc
 	self->cwdlen = 0;
 	self->unknown_events = 0;
 	self->kerninfo_root = RB_ROOT;
+	self->ordered_samples.flush_limit = ULLONG_MAX;
+	INIT_LIST_HEAD(&self->ordered_samples.samples_head);
 
 	if (mode == O_RDONLY) {
 		if (perf_session__open(self, force) < 0)
@@ -351,6 +353,178 @@ static event__swap_op event__swap_ops[] = {
 	[PERF_RECORD_HEADER_MAX]    = NULL,
 };
 
+struct sample_queue {
+	u64			timestamp;
+	struct sample_event	*event;
+	struct list_head	list;
+};
+
+#define FLUSH_PERIOD	(2 * NSEC_PER_SEC)
+
+static void flush_sample_queue(struct perf_session *s,
+			       struct perf_event_ops *ops)
+{
+	struct list_head *head = &s->ordered_samples.samples_head;
+	u64 limit = s->ordered_samples.flush_limit;
+	struct sample_queue *tmp, *iter;
+
+	if (!ops->ordered_samples)
+		return;
+
+	list_for_each_entry_safe(iter, tmp, head, list) {
+		if (iter->timestamp > limit)
+			return;
+
+		if (iter == s->ordered_samples.last_inserted)
+			s->ordered_samples.last_inserted = NULL;
+
+		ops->sample((event_t *)iter->event, s);
+
+		s->ordered_samples.last_flush = iter->timestamp;
+		list_del(&iter->list);
+		free(iter->event);
+		free(iter);
+	}
+}
+
+static void __queue_sample_end(struct sample_queue *new, struct list_head *head)
+{
+	struct sample_queue *iter;
+
+	list_for_each_entry_reverse(iter, head, list) {
+		if (iter->timestamp < new->timestamp) {
+			list_add(&new->list, &iter->list);
+			return;
+		}
+	}
+
+	list_add(&new->list, head);
+}
+
+static void __queue_sample_before(struct sample_queue *new,
+				  struct sample_queue *iter,
+				  struct list_head *head)
+{
+	list_for_each_entry_continue_reverse(iter, head, list) {
+		if (iter->timestamp < new->timestamp) {
+			list_add(&new->list, &iter->list);
+			return;
+		}
+	}
+
+	list_add(&new->list, head);
+}
+
+static void __queue_sample_after(struct sample_queue *new,
+				 struct sample_queue *iter,
+				 struct list_head *head)
+{
+	list_for_each_entry_continue(iter, head, list) {
+		if (iter->timestamp > new->timestamp) {
+			list_add_tail(&new->list, &iter->list);
+			return;
+		}
+	}
+	list_add_tail(&new->list, head);
+}
+
+/* The queue is ordered by time */
+static void __queue_sample_event(struct sample_queue *new,
+				 struct perf_session *s)
+{
+	struct sample_queue *last_inserted = s->ordered_samples.last_inserted;
+	struct list_head *head = &s->ordered_samples.samples_head;
+
+
+	if (!last_inserted) {
+		__queue_sample_end(new, head);
+		return;
+	}
+
+	/*
+	 * Most of the time the current event has a timestamp
+	 * very close to the last event inserted, unless we just switched
+	 * to another event buffer. So a list-based sort that starts from
+	 * the last inserted event, which is close to the current one, is
+	 * probably more efficient than an rbtree based sort.
+	 */
+	if (last_inserted->timestamp >= new->timestamp)
+		__queue_sample_before(new, last_inserted, head);
+	else
+		__queue_sample_after(new, last_inserted, head);
+}
+
+static int queue_sample_event(event_t *event, struct sample_data *data,
+			      struct perf_session *s,
+			      struct perf_event_ops *ops)
+{
+	u64 timestamp = data->time;
+	struct sample_queue *new;
+	u64 flush_limit;
+
+
+	if (s->ordered_samples.flush_limit == ULLONG_MAX)
+		s->ordered_samples.flush_limit = timestamp + FLUSH_PERIOD;
+
+	if (timestamp < s->ordered_samples.last_flush) {
+		printf("Warning: Timestamp below last timeslice flush\n");
+		return -EINVAL;
+	}
+
+	new = malloc(sizeof(*new));
+	if (!new)
+		return -ENOMEM;
+
+	new->timestamp = timestamp;
+
+	new->event = malloc(event->header.size);
+	if (!new->event) {
+		free(new);
+		return -ENOMEM;
+	}
+
+	memcpy(new->event, event, event->header.size);
+
+	__queue_sample_event(new, s);
+	s->ordered_samples.last_inserted = new;
+
+	/*
+	 * We want to have a slice of events covering 2 * FLUSH_PERIOD.
+	 * If FLUSH_PERIOD is big enough, it ensures that all events that
+	 * occurred in the first half of the timeslice have been buffered
+	 * and none remain (we need that because of the weakly ordered
+	 * event recording we have). Then once we reach the 2 * FLUSH_PERIOD
+	 * timeslice, we flush the first half to be gentle with the memory
+	 * (the second half can still get new events in the middle, so wait
+	 * another period to flush it).
+	 */
+	flush_limit = s->ordered_samples.flush_limit;
+
+	if (new->timestamp > flush_limit &&
+		new->timestamp - flush_limit > FLUSH_PERIOD) {
+		s->ordered_samples.flush_limit += FLUSH_PERIOD;
+		flush_sample_queue(s, ops);
+	}
+
+	return 0;
+}
+
+static int perf_session__process_sample(event_t *event, struct perf_session *s,
+					struct perf_event_ops *ops)
+{
+	struct sample_data data;
+
+	if (!ops->ordered_samples)
+		return ops->sample(event, s);
+
+	bzero(&data, sizeof(struct sample_data));
+	event__parse_sample(event, s->sample_type, &data);
+
+	queue_sample_event(event, &data, s, ops);
+
+	return 0;
+}
+
 static int perf_session__process_event(struct perf_session *self,
 				       event_t *event,
 				       struct perf_event_ops *ops,
@@ -371,7 +545,7 @@ static int perf_session__process_event(struct perf_session *self,
 
 	switch (event->header.type) {
 	case PERF_RECORD_SAMPLE:
-		return ops->sample(event, self);
+		return perf_session__process_sample(event, self, ops);
 	case PERF_RECORD_MMAP:
 		return ops->mmap(event, self);
 	case PERF_RECORD_COMM:
@@ -611,6 +785,9 @@ more:
 		goto more;
 done:
 	err = 0;
+	/* do the final flush for ordered samples */
+	self->ordered_samples.flush_limit = ULLONG_MAX;
+	flush_sample_queue(self, ops);
 out_err:
 	ui_progress__delete(progress);
 	return err;
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 5e47c87..796e229 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -8,9 +8,17 @@
 #include <linux/rbtree.h>
 #include "../../../include/linux/perf_event.h"
 
+struct sample_queue;
 struct ip_callchain;
 struct thread;
 
+struct ordered_samples {
+	u64			last_flush;
+	u64			flush_limit;
+	struct list_head	samples_head;
+	struct sample_queue	*last_inserted;
+};
+
 struct perf_session {
 	struct perf_header	header;
 	unsigned long		size;
@@ -28,6 +36,7 @@ struct perf_session {
 	bool			fd_pipe;
 	int			cwdlen;
 	char			*cwd;
+	struct ordered_samples	ordered_samples;
 	char filename[0];
 };
 
@@ -47,6 +56,7 @@ struct perf_event_ops {
 		 event_type,
 		 tracing_data,
 		 build_id;
+	bool	ordered_samples;
 };
 
 struct perf_session *perf_session__new(const char *filename, int mode, bool force);
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 34+ messages in thread
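[Editorial sketch] The two-period flush window implemented in the session.c hunk above can be modeled in a few lines. This is a minimal Python model, not the actual perf code; `SampleQueue` and its method names are illustrative. Samples are kept time-sorted (the real code walks a list from a `last_inserted` hint; `bisect` gives the same ordering guarantee here), and the first FLUSH_PERIOD worth of samples is flushed only once timestamps run a full period past the flush limit:

```python
import bisect

FLUSH_PERIOD = 2_000_000_000  # 2 seconds in nanoseconds, as in the patch

class SampleQueue:
    """Illustrative model of the ordered_samples logic in util/session.c."""
    def __init__(self):
        self.samples = []        # kept sorted by timestamp
        self.flush_limit = None  # lazily set from the first timestamp
        self.last_flush = 0
        self.flushed = []        # stands in for ops->sample() deliveries

    def queue(self, timestamp):
        if self.flush_limit is None:
            self.flush_limit = timestamp + FLUSH_PERIOD
        if timestamp < self.last_flush:
            raise ValueError("timestamp below last timeslice flush")
        bisect.insort(self.samples, timestamp)
        # Flush the old slice once we are a full period past the limit.
        if (timestamp > self.flush_limit and
                timestamp - self.flush_limit > FLUSH_PERIOD):
            self.flush_limit += FLUSH_PERIOD
            self._flush(self.flush_limit)

    def finish(self):
        # Final flush, mirroring flush_limit = ULLONG_MAX at end of file.
        self._flush(float("inf"))

    def _flush(self, limit):
        # Deliver everything at or before the limit, oldest first.
        while self.samples and self.samples[0] <= limit:
            ts = self.samples.pop(0)
            self.flushed.append(ts)
            self.last_flush = ts
```

With this model, out-of-order samples inside a slice come back time-ordered, and a sample older than the last flushed slice is rejected, just like the "Timestamp below last timeslice flush" warning above.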

* [PATCH 4/9] perf: Use generic sample reordering in perf sched
  2010-04-24  2:05 [GIT PULL] perf tools updates Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2010-04-24  2:05 ` [PATCH 3/9] perf: Generalize perf lock's sample event reordering to the session layer Frederic Weisbecker
@ 2010-04-24  2:05 ` Frederic Weisbecker
  2010-04-24  2:05 ` [PATCH 5/9] perf: Use generic sample reordering in perf kmem Frederic Weisbecker
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2010-04-24  2:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Paul Mackerras, Hitoshi Mitake,
	Ingo Molnar, Masami Hiramatsu, Tom Zanussi

Use the new generic sample event reordering in perf sched.
This drops the need to multiplex the buffers at record time,
improving the scalability of perf sched.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
---
 tools/perf/builtin-sched.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 09ddc8e..94453f1 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -1651,9 +1651,10 @@ static int process_lost_event(event_t *event __used,
 }
 
 static struct perf_event_ops event_ops = {
-	.sample	= process_sample_event,
-	.comm	= event__process_comm,
-	.lost	= process_lost_event,
+	.sample			= process_sample_event,
+	.comm			= event__process_comm,
+	.lost			= process_lost_event,
+	.ordered_samples	= true,
 };
 
 static int read_events(void)
@@ -1850,7 +1851,6 @@ static const char *record_args[] = {
 	"record",
 	"-a",
 	"-R",
-	"-M",
 	"-f",
 	"-m", "1024",
 	"-c", "1",
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 5/9] perf: Use generic sample reordering in perf kmem
  2010-04-24  2:05 [GIT PULL] perf tools updates Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2010-04-24  2:05 ` [PATCH 4/9] perf: Use generic sample reordering in perf sched Frederic Weisbecker
@ 2010-04-24  2:05 ` Frederic Weisbecker
  2010-04-24  2:05 ` [PATCH 6/9] perf: Use generic sample reordering in perf trace Frederic Weisbecker
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2010-04-24  2:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Paul Mackerras, Hitoshi Mitake,
	Ingo Molnar, Masami Hiramatsu, Tom Zanussi, Pekka Enberg,
	Li Zefan

Use the new generic sample event reordering in perf kmem.
This drops the need to multiplex the buffers at record time,
improving the scalability of perf kmem.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Li Zefan <lizf@cn.fujitsu.com>
---
 tools/perf/builtin-kmem.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index db474bb..ab906cb 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -335,8 +335,9 @@ static int process_sample_event(event_t *event, struct perf_session *session)
 }
 
 static struct perf_event_ops event_ops = {
-	.sample	= process_sample_event,
-	.comm	= event__process_comm,
+	.sample			= process_sample_event,
+	.comm			= event__process_comm,
+	.ordered_samples	= true,
 };
 
 static double fragmentation(unsigned long n_req, unsigned long n_alloc)
@@ -730,7 +731,6 @@ static const char *record_args[] = {
 	"record",
 	"-a",
 	"-R",
-	"-M",
 	"-f",
 	"-c", "1",
 	"-e", "kmem:kmalloc",
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 6/9] perf: Use generic sample reordering in perf trace
  2010-04-24  2:05 [GIT PULL] perf tools updates Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2010-04-24  2:05 ` [PATCH 5/9] perf: Use generic sample reordering in perf kmem Frederic Weisbecker
@ 2010-04-24  2:05 ` Frederic Weisbecker
  2010-04-24  2:05 ` [PATCH 7/9] perf: Use generic sample reordering in perf timechart Frederic Weisbecker
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2010-04-24  2:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Paul Mackerras, Hitoshi Mitake,
	Ingo Molnar, Masami Hiramatsu, Tom Zanussi

Use the new generic sample event reordering in perf trace.
Before this, the displayed traces were ordered as they appeared
in the input recorded by perf record (not time ordered).

With this change, perf trace finally displays the events in
time order.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
---
 tools/perf/builtin-trace.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 2eefb33..1d034f6 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -108,6 +108,7 @@ static struct perf_event_ops event_ops = {
 	.event_type = event__process_event_type,
 	.tracing_data = event__process_tracing_data,
 	.build_id = event__process_build_id,
+	.ordered_samples = true,
 };
 
 extern volatile int session_done;
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 7/9] perf: Use generic sample reordering in perf timechart
  2010-04-24  2:05 [GIT PULL] perf tools updates Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2010-04-24  2:05 ` [PATCH 6/9] perf: Use generic sample reordering in perf trace Frederic Weisbecker
@ 2010-04-24  2:05 ` Frederic Weisbecker
  2010-04-24  2:05 ` [PATCH 8/9] perf: Add a perf trace option to check samples ordering reliability Frederic Weisbecker
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2010-04-24  2:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Paul Mackerras, Hitoshi Mitake,
	Ingo Molnar, Masami Hiramatsu, Tom Zanussi, Arjan van de Ven

Use the new generic sample event reordering in perf timechart.
This drops the ad hoc sample reordering it was using before.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
---
 tools/perf/builtin-timechart.c |  112 ++--------------------------------------
 1 files changed, 5 insertions(+), 107 deletions(-)

diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
index 96f4a09..c35aa44 100644
--- a/tools/perf/builtin-timechart.c
+++ b/tools/perf/builtin-timechart.c
@@ -143,9 +143,6 @@ struct wake_event {
 static struct power_event    *power_events;
 static struct wake_event     *wake_events;
 
-struct sample_wrapper *all_samples;
-
-
 struct process_filter;
 struct process_filter {
 	char			*name;
@@ -566,88 +563,6 @@ static void end_sample_processing(void)
 	}
 }
 
-static u64 sample_time(event_t *event, const struct perf_session *session)
-{
-	int cursor;
-
-	cursor = 0;
-	if (session->sample_type & PERF_SAMPLE_IP)
-		cursor++;
-	if (session->sample_type & PERF_SAMPLE_TID)
-		cursor++;
-	if (session->sample_type & PERF_SAMPLE_TIME)
-		return event->sample.array[cursor];
-	return 0;
-}
-
-
-/*
- * We first queue all events, sorted backwards by insertion.
- * The order will get flipped later.
- */
-static int queue_sample_event(event_t *event, struct perf_session *session)
-{
-	struct sample_wrapper *copy, *prev;
-	int size;
-
-	size = event->sample.header.size + sizeof(struct sample_wrapper) + 8;
-
-	copy = malloc(size);
-	if (!copy)
-		return 1;
-
-	memset(copy, 0, size);
-
-	copy->next = NULL;
-	copy->timestamp = sample_time(event, session);
-
-	memcpy(&copy->data, event, event->sample.header.size);
-
-	/* insert in the right place in the list */
-
-	if (!all_samples) {
-		/* first sample ever */
-		all_samples = copy;
-		return 0;
-	}
-
-	if (all_samples->timestamp < copy->timestamp) {
-		/* insert at the head of the list */
-		copy->next = all_samples;
-		all_samples = copy;
-		return 0;
-	}
-
-	prev = all_samples;
-	while (prev->next) {
-		if (prev->next->timestamp < copy->timestamp) {
-			copy->next = prev->next;
-			prev->next = copy;
-			return 0;
-		}
-		prev = prev->next;
-	}
-	/* insert at the end of the list */
-	prev->next = copy;
-
-	return 0;
-}
-
-static void sort_queued_samples(void)
-{
-	struct sample_wrapper *cursor, *next;
-
-	cursor = all_samples;
-	all_samples = NULL;
-
-	while (cursor) {
-		next = cursor->next;
-		cursor->next = all_samples;
-		all_samples = cursor;
-		cursor = next;
-	}
-}
-
 /*
  * Sort the pid datastructure
  */
@@ -1011,26 +926,12 @@ static void write_svg_file(const char *filename)
 	svg_close();
 }
 
-static void process_samples(struct perf_session *session)
-{
-	struct sample_wrapper *cursor;
-	event_t *event;
-
-	sort_queued_samples();
-
-	cursor = all_samples;
-	while (cursor) {
-		event = (void *)&cursor->data;
-		cursor = cursor->next;
-		process_sample_event(event, session);
-	}
-}
-
 static struct perf_event_ops event_ops = {
-	.comm	= process_comm_event,
-	.fork	= process_fork_event,
-	.exit	= process_exit_event,
-	.sample	= queue_sample_event,
+	.comm			= process_comm_event,
+	.fork			= process_fork_event,
+	.exit			= process_exit_event,
+	.sample			= process_sample_event,
+	.ordered_samples	= true,
 };
 
 static int __cmd_timechart(void)
@@ -1048,8 +949,6 @@ static int __cmd_timechart(void)
 	if (ret)
 		goto out_delete;
 
-	process_samples(session);
-
 	end_sample_processing();
 
 	sort_pids();
@@ -1072,7 +971,6 @@ static const char *record_args[] = {
 	"record",
 	"-a",
 	"-R",
-	"-M",
 	"-f",
 	"-c", "1",
 	"-e", "power:power_start",
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 8/9] perf: Add a perf trace option to check samples ordering reliability
  2010-04-24  2:05 [GIT PULL] perf tools updates Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2010-04-24  2:05 ` [PATCH 7/9] perf: Use generic sample reordering in perf timechart Frederic Weisbecker
@ 2010-04-24  2:05 ` Frederic Weisbecker
  2010-04-24 16:13   ` Masami Hiramatsu
  2010-04-24  2:05 ` [PATCH 9/9] perf: Some perf-kvm documentation edits Frederic Weisbecker
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2010-04-24  2:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Paul Mackerras, Hitoshi Mitake,
	Ingo Molnar, Masami Hiramatsu, Tom Zanussi

To ensure that sample event time reordering is reliable, add a -d
option to perf trace that checks it automatically.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
---
 tools/perf/builtin-trace.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 1d034f6..77f556f 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -11,6 +11,8 @@
 
 static char const		*script_name;
 static char const		*generate_script_lang;
+static bool			debug_ordering;
+static u64			last_timestamp;
 
 static int default_start_script(const char *script __unused,
 				int argc __unused,
@@ -87,6 +89,14 @@ static int process_sample_event(event_t *event, struct perf_session *session)
 	}
 
 	if (session->sample_type & PERF_SAMPLE_RAW) {
+		if (debug_ordering) {
+			if (data.time < last_timestamp) {
+				pr_err("Samples misordered, previous: %llu "
+					"this: %llu\n", last_timestamp,
+					data.time);
+			}
+			last_timestamp = data.time;
+		}
 		/*
 		 * FIXME: better resolve from pid from the struct trace_entry
 		 * field, although it should be the same than this perf
@@ -532,6 +542,8 @@ static const struct option options[] = {
 		   "generate perf-trace.xx script in specified language"),
 	OPT_STRING('i', "input", &input_name, "file",
 		    "input file name"),
+	OPT_BOOLEAN('d', "debug-ordering", &debug_ordering,
+		   "check that samples time ordering is monotonic"),
 
 	OPT_END()
 };
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 34+ messages in thread
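[Editorial sketch] The -d check added above reduces to remembering the previous sample's timestamp and flagging any step backwards. The same monotonicity test as a standalone sketch (Python; the function name is illustrative, not part of the perf source):

```python
def check_sample_ordering(timestamps):
    """Collect (previous, current) pairs where time goes backwards,
    mirroring the debug_ordering check in builtin-trace.c."""
    misordered = []
    last_timestamp = 0
    for ts in timestamps:
        if ts < last_timestamp:
            # The patch prints these with pr_err(); here we record them.
            misordered.append((last_timestamp, ts))
        last_timestamp = ts
    return misordered
```

An empty result means the reordering held up: every sample's timestamp was at or after its predecessor's.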

* [PATCH 9/9] perf: Some perf-kvm documentation edits
  2010-04-24  2:05 [GIT PULL] perf tools updates Frederic Weisbecker
                   ` (7 preceding siblings ...)
  2010-04-24  2:05 ` [PATCH 8/9] perf: Add a perf trace option to check samples ordering reliability Frederic Weisbecker
@ 2010-04-24  2:05 ` Frederic Weisbecker
  2010-04-24  2:27 ` [GIT PULL] perf tools updates Frederic Weisbecker
  2010-04-27  9:15 ` Ingo Molnar
  10 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2010-04-24  2:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, William Cohen, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Paul Mackerras, Ingo Molnar, Frederic Weisbecker

From: William Cohen <wcohen@redhat.com>

asciidoc does not allow the "===" underline to be longer than the
line above it.
Also fix a couple of typos and formatting errors.

Signed-off-by: William Cohen <wcohen@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <4BD204C5.9000504@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 tools/perf/Documentation/perf-kvm.txt |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/perf/Documentation/perf-kvm.txt b/tools/perf/Documentation/perf-kvm.txt
index 93400a0..d004e19 100644
--- a/tools/perf/Documentation/perf-kvm.txt
+++ b/tools/perf/Documentation/perf-kvm.txt
@@ -1,5 +1,5 @@
 perf-kvm(1)
-==============
+===========
 
 NAME
 ----
@@ -41,9 +41,9 @@ There are a couple of variants of perf kvm:
 OPTIONS
 -------
 --host=::
-        Collect host side perforamnce profile.
+        Collect host side performance profile.
 --guest=::
-        Collect guest side perforamnce profile.
+        Collect guest side performance profile.
 --guestmount=<path>::
 	Guest os root file system mount directory. Users mounts guest os
         root directories under <path> by a specific filesystem access method,
@@ -64,4 +64,5 @@ OPTIONS
 
 SEE ALSO
 --------
-linkperf:perf-top[1] perf-record[1] perf-report[1] perf-diff[1] perf-buildid-list[1]
+linkperf:perf-top[1], linkperf:perf-record[1], linkperf:perf-report[1],
+linkperf:perf-diff[1], linkperf:perf-buildid-list[1]
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [GIT PULL] perf tools updates
  2010-04-24  2:05 [GIT PULL] perf tools updates Frederic Weisbecker
                   ` (8 preceding siblings ...)
  2010-04-24  2:05 ` [PATCH 9/9] perf: Some perf-kvm documentation edits Frederic Weisbecker
@ 2010-04-24  2:27 ` Frederic Weisbecker
  2010-04-27  9:15 ` Ingo Molnar
  10 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2010-04-24  2:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, William Cohen, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Paul Mackerras, Hitoshi Mitake, Masami Hiramatsu, Tom Zanussi,
	Arjan van de Ven, Pekka Enberg, Li Zefan, Stephane Eranian,
	Jens Axboe, Jason Baron, Xiao Guangrong

On Sat, Apr 24, 2010 at 04:05:33AM +0200, Frederic Weisbecker wrote:
> Ingo,
> 
> Please pull the perf/core branch that can be found at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
> 	perf/core
> 
> Thanks,
> 	Frederic



I forgot to highlight some things here.

- The -M option is not used anymore. Well, actually I just checked and it's
  still used by the record perl/python scripts. But it's not needed there
  anymore, so I'll drop it in another pass. Globally though, the need for
  buffer multiplexing is gone.

- But I haven't plugged the reordering into the live mode, because I'm
  not sure whether that would be welcome. With live mode we want the
  events as they arrive; using the reordering there would deliver the
  events in batches of 2-second slices. I guess we'll figure out a
  solution for that.

- Perf lock is getting into better shape. There is still some work needed
  to make it truly usable though: I need to unearth the event injection
  work to reduce the size of the events, profile by lock classes, etc...

- Various important fixes

Thanks.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH] perf lock: add "info" subcommand for dumping misc information
  2010-04-24  2:05 ` [PATCH 1/9] perf lock: Fix state machine to recognize lock sequence Frederic Weisbecker
@ 2010-04-24 10:43   ` Hitoshi Mitake
  2010-04-24 10:46     ` Hitoshi Mitake
  2010-04-24 10:46   ` [PATCH v2] " Hitoshi Mitake
  1 sibling, 1 reply; 34+ messages in thread
From: Hitoshi Mitake @ 2010-04-24 10:43 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, mitake, h.mitake, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Jens Axboe,
	Jason Baron, Xiao Guangrong

Hi Frederic,

I added an "info" subcommand to perf lock;
it can be used to dump metadata such as the threads or the addresses of lock instances.
"map" was removed because info now does its work.

This will be useful not only for debugging but also for ordinary analysis.

I made this patch on top of perf/core of your tree, could you queue it?

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 tools/perf/builtin-lock.c |   68 +++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index ce27675..c54211e 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -778,18 +778,61 @@ static void print_result(void)
 	}
 }
 
+static int			info_threads;
+static int			info_map;
+
+static void rec_dump_threads(struct rb_node *node)
+{
+	struct thread_stat *st;
+	struct thread *t;
+
+	if (!node)
+		return;
+
+	if (node->rb_left)
+		rec_dump_threads(node->rb_left);
+
+	st = container_of(node, struct thread_stat, rb);
+	BUG_ON(!st);
+	t = perf_session__findnew(session, st->tid);
+	BUG_ON(!t);
+
+	printf("%10d: %s\n", st->tid, t->comm);
+
+	if (node->rb_right)
+		rec_dump_threads(node->rb_right);
+}
+
+static void dump_threads(void)
+{
+	printf("%10s: comm\n", "Thread ID");
+	rec_dump_threads(thread_stats.rb_node);
+}
+
 static void dump_map(void)
 {
 	unsigned int i;
 	struct lock_stat *st;
 
+	printf("Address of instance: name of class\n");
 	for (i = 0; i < LOCKHASH_SIZE; i++) {
 		list_for_each_entry(st, &lockhash_table[i], hash_entry) {
-			printf("%p: %s\n", st->addr, st->name);
+			printf(" %p: %s\n", st->addr, st->name);
 		}
 	}
 }
 
+static void dump_info(void)
+{
+	/* ugly... */
+	if (info_threads)
+		dump_threads();
+	else if (info_map)
+		dump_map();
+	else
+		die("Unknown type of information\n");
+}
+
 static int process_sample_event(event_t *self, struct perf_session *s)
 {
 	struct sample_data data;
@@ -858,6 +901,19 @@ static const struct option report_options[] = {
 	OPT_END()
 };
 
+static const char * const info_usage[] = {
+	"perf lock info [<options>]",
+	NULL
+};
+
+static const struct option info_options[] = {
+	OPT_BOOLEAN('t', "threads", &info_threads,
+		    "dump thread list in perf.data"),
+	OPT_BOOLEAN('m', "map", &info_map,
+		    "map of lock instances (name:address table)"),
+	OPT_END()
+};
+
 static const char * const lock_usage[] = {
 	"perf lock [<options>] {record|trace|report}",
 	NULL
@@ -929,12 +985,18 @@ int cmd_lock(int argc, const char **argv, const char *prefix __used)
 	} else if (!strcmp(argv[0], "trace")) {
 		/* Aliased to 'perf trace' */
 		return cmd_trace(argc, argv, prefix);
-	} else if (!strcmp(argv[0], "map")) {
+	} else if (!strcmp(argv[0], "info")) {
+		if (argc) {
+			argc = parse_options(argc, argv,
+					     info_options, info_usage, 0);
+			if (argc)
+				usage_with_options(info_usage, info_options);
+		}
 		/* recycling report_lock_ops */
 		trace_handler = &report_lock_ops;
 		setup_pager();
 		read_events();
-		dump_map();
+		dump_info();
 	} else {
 		usage_with_options(lock_usage, lock_options);
 	}
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 34+ messages in thread
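[Editorial sketch] rec_dump_threads above is a plain recursive in-order walk of the thread_stats rbtree, so the thread list comes out sorted by the tree's key. The same shape in Python (the `Node` type and tid key are illustrative stand-ins, not the kernel rbtree API):

```python
class Node:
    """Illustrative stand-in for an rb_node embedded in a thread_stat."""
    def __init__(self, tid, comm, left=None, right=None):
        self.tid, self.comm = tid, comm
        self.left, self.right = left, right

def dump_threads(node, out):
    # In-order walk: left subtree, this node, right subtree,
    # so entries come out sorted by the tree key (tid here).
    if node is None:
        return
    dump_threads(node.left, out)
    out.append((node.tid, node.comm))
    dump_threads(node.right, out)
```

Running it on a small tree yields the tid-ordered listing that `perf lock info -t` prints.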

* Re: [PATCH] perf lock: add "info" subcommand for dumping misc information
  2010-04-24 10:43   ` [PATCH] perf lock: add "info" subcommand for dumping misc information Hitoshi Mitake
@ 2010-04-24 10:46     ` Hitoshi Mitake
  0 siblings, 0 replies; 34+ messages in thread
From: Hitoshi Mitake @ 2010-04-24 10:46 UTC (permalink / raw)
  To: Hitoshi Mitake
  Cc: Frederic Weisbecker, linux-kernel, h.mitake, Ingo Molnar,
	Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
	Jens Axboe, Jason Baron, Xiao Guangrong

On 04/24/10 19:43, Hitoshi Mitake wrote:
 > Hi Frederic,
 >
 > I added an "info" subcommand to perf lock;
 > it can be used to dump metadata such as the threads or the addresses
 > of lock instances.
 > "map" was removed because info now does its work.
 >
 > This will be useful not only for debugging but also for ordinary
 > analysis.
 >
 > I made this patch on top of perf/core of your tree, could you queue it?

Sorry, I should have added an example of usage :(
I'll resend as v2, please discard this one.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v2] perf lock: add "info" subcommand for dumping misc information
  2010-04-24  2:05 ` [PATCH 1/9] perf lock: Fix state machine to recognize lock sequence Frederic Weisbecker
  2010-04-24 10:43   ` [PATCH] perf lock: add "info" subcommand for dumping misc information Hitoshi Mitake
@ 2010-04-24 10:46   ` Hitoshi Mitake
  2010-04-24 13:41     ` Frederic Weisbecker
  2010-04-30 18:49     ` Frederic Weisbecker
  1 sibling, 2 replies; 34+ messages in thread
From: Hitoshi Mitake @ 2010-04-24 10:46 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, mitake, h.mitake, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Jens Axboe,
	Jason Baron, Xiao Guangrong

Hi Frederic,

I added an "info" subcommand to perf lock;
it can be used to dump metadata such as the threads or the addresses of lock instances.
"map" was removed because info now does its work.

This will be useful not only for debugging but also for ordinary analysis.

I made this patch on top of perf/core of your tree, could you queue it?

v2: added an example of usage
% sudo ./perf lock info -t
 | Thread ID: comm
 | 	 0: swapper
 |         1: init
 |        18: migration/5
 |        29: events/2
 |        32: events/5
 |        33: events/6
...

% sudo ./perf lock info -m
| Address of instance: name of class
|  0xffff8800b95adae0: &(&sighand->siglock)->rlock
|  0xffff8800bbb41ae0: &(&sighand->siglock)->rlock
|  0xffff8800bf165ae0: &(&sighand->siglock)->rlock
|  0xffff8800b9576a98: &p->cred_guard_mutex
|  0xffff8800bb890a08: &(&p->alloc_lock)->rlock
|  0xffff8800b9522a08: &(&p->alloc_lock)->rlock
|  0xffff8800bb8aaa08: &(&p->alloc_lock)->rlock
|  0xffff8800bba72a08: &(&p->alloc_lock)->rlock
|  0xffff8800bf18ea08: &(&p->alloc_lock)->rlock
|  0xffff8800b8a0d8a0: &(&ip->i_lock)->mr_lock
|  0xffff88009bf818a0: &(&ip->i_lock)->mr_lock
|  0xffff88004c66b8a0: &(&ip->i_lock)->mr_lock
|  0xffff8800bb6478a0: &(shost->host_lock)->rlock
Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 tools/perf/builtin-lock.c |   68 +++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index ce27675..c54211e 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -778,18 +778,61 @@ static void print_result(void)
 	}
 }
 
+static int			info_threads;
+static int			info_map;
+
+static void rec_dump_threads(struct rb_node *node)
+{
+	struct thread_stat *st;
+	struct thread *t;
+
+	if (!node)
+		return;
+
+	if (node->rb_left)
+		rec_dump_threads(node->rb_left);
+
+	st = container_of(node, struct thread_stat, rb);
+	BUG_ON(!st);
+	t = perf_session__findnew(session, st->tid);
+	BUG_ON(!t);
+
+	printf("%10d: %s\n", st->tid, t->comm);
+
+	if (node->rb_right)
+		rec_dump_threads(node->rb_right);
+}
+
+static void dump_threads(void)
+{
+	printf("%10s: comm\n", "Thread ID");
+	rec_dump_threads(thread_stats.rb_node);
+}
+
 static void dump_map(void)
 {
 	unsigned int i;
 	struct lock_stat *st;
 
+	printf("Address of instance: name of class\n");
 	for (i = 0; i < LOCKHASH_SIZE; i++) {
 		list_for_each_entry(st, &lockhash_table[i], hash_entry) {
-			printf("%p: %s\n", st->addr, st->name);
+			printf(" %p: %s\n", st->addr, st->name);
 		}
 	}
 }
 
+static void dump_info(void)
+{
+	/* ugly... */
+	if (info_threads)
+		dump_threads();
+	else if (info_map)
+		dump_map();
+	else
+		die("Unknown type of information\n");
+}
+
 static int process_sample_event(event_t *self, struct perf_session *s)
 {
 	struct sample_data data;
@@ -858,6 +901,19 @@ static const struct option report_options[] = {
 	OPT_END()
 };
 
+static const char * const info_usage[] = {
+	"perf lock info [<options>]",
+	NULL
+};
+
+static const struct option info_options[] = {
+	OPT_BOOLEAN('t', "threads", &info_threads,
+		    "dump thread list in perf.data"),
+	OPT_BOOLEAN('m', "map", &info_map,
+		    "map of lock instances (name:address table)"),
+	OPT_END()
+};
+
 static const char * const lock_usage[] = {
 	"perf lock [<options>] {record|trace|report}",
 	NULL
@@ -929,12 +985,18 @@ int cmd_lock(int argc, const char **argv, const char *prefix __used)
 	} else if (!strcmp(argv[0], "trace")) {
 		/* Aliased to 'perf trace' */
 		return cmd_trace(argc, argv, prefix);
-	} else if (!strcmp(argv[0], "map")) {
+	} else if (!strcmp(argv[0], "info")) {
+		if (argc) {
+			argc = parse_options(argc, argv,
+					     info_options, info_usage, 0);
+			if (argc)
+				usage_with_options(info_usage, info_options);
+		}
 		/* recycling report_lock_ops */
 		trace_handler = &report_lock_ops;
 		setup_pager();
 		read_events();
-		dump_map();
+		dump_info();
 	} else {
 		usage_with_options(lock_usage, lock_options);
 	}
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] perf lock: add "info" subcommand for dumping misc information
  2010-04-24 10:46   ` [PATCH v2] " Hitoshi Mitake
@ 2010-04-24 13:41     ` Frederic Weisbecker
  2010-04-30 18:49     ` Frederic Weisbecker
  1 sibling, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2010-04-24 13:41 UTC (permalink / raw)
  To: Hitoshi Mitake
  Cc: linux-kernel, h.mitake, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Jens Axboe,
	Jason Baron, Xiao Guangrong

On Sat, Apr 24, 2010 at 07:46:41PM +0900, Hitoshi Mitake wrote:
> Hi Frederic,
> 
> I added "info" subcommand to perf lock,
> this can be used as dumping metadata like thread or address of lock instances.
> "map" was removed because info should do the work of it.
> 
> This will be useful not only for debugging but also for ordinary analyzing.
> 
> I made this patch on perf/core of your tree, could you queue this?
> 
> v2: adding example of usage
> % sudo ./perf lock info -t
>  | Thread ID: comm
>  | 	 0: swapper
>  |         1: init
>  |        18: migration/5
>  |        29: events/2
>  |        32: events/5
>  |        33: events/6
> ...
> 
> % sudo ./perf lock info -m
> | Address of instance: name of class
> |  0xffff8800b95adae0: &(&sighand->siglock)->rlock
> |  0xffff8800bbb41ae0: &(&sighand->siglock)->rlock
> |  0xffff8800bf165ae0: &(&sighand->siglock)->rlock
> |  0xffff8800b9576a98: &p->cred_guard_mutex
> |  0xffff8800bb890a08: &(&p->alloc_lock)->rlock
> |  0xffff8800b9522a08: &(&p->alloc_lock)->rlock
> |  0xffff8800bb8aaa08: &(&p->alloc_lock)->rlock
> |  0xffff8800bba72a08: &(&p->alloc_lock)->rlock
> |  0xffff8800bf18ea08: &(&p->alloc_lock)->rlock
> |  0xffff8800b8a0d8a0: &(&ip->i_lock)->mr_lock
> |  0xffff88009bf818a0: &(&ip->i_lock)->mr_lock
> |  0xffff88004c66b8a0: &(&ip->i_lock)->mr_lock
> |  0xffff8800bb6478a0: &(shost->host_lock)->rlock



Thanks, I'll queue it for the next pass.



* Re: [PATCH 8/9] perf: Add a perf trace option to check samples ordering reliability
  2010-04-24  2:05 ` [PATCH 8/9] perf: Add a perf trace option to check samples ordering reliability Frederic Weisbecker
@ 2010-04-24 16:13   ` Masami Hiramatsu
  2010-04-25 18:08     ` Frederic Weisbecker
  0 siblings, 1 reply; 34+ messages in thread
From: Masami Hiramatsu @ 2010-04-24 16:13 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Paul Mackerras, Hitoshi Mitake, Tom Zanussi

Hi Frederic,

Frederic Weisbecker wrote:
> To ensure sample events time reordering is reliable, add a -d option
> to perf trace to check that automatically.

Just out of curiosity, is there any possibility of adding
other debugging options to perf-trace?
If so, I think we could pass arguments to
this -d option, e.g. "-d ordering,events" and so on.
Just an idea.

Thank you,



-- 
Masami Hiramatsu
e-mail: mhiramat@redhat.com


* Re: [PATCH 8/9] perf: Add a perf trace option to check samples ordering reliability
  2010-04-24 16:13   ` Masami Hiramatsu
@ 2010-04-25 18:08     ` Frederic Weisbecker
  0 siblings, 0 replies; 34+ messages in thread
From: Frederic Weisbecker @ 2010-04-25 18:08 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, LKML, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Paul Mackerras, Hitoshi Mitake, Tom Zanussi

On Sat, Apr 24, 2010 at 12:13:05PM -0400, Masami Hiramatsu wrote:
> Hi Frederic,
> 
> Frederic Weisbecker wrote:
> > To ensure sample events time reordering is reliable, add a -d option
> > to perf trace to check that automatically.
> 
> Just out of curiosity, is there any possibility to add 
> other debugging options to perf-trace?
> If so, I think we can pass some arguments for
> this -d option, e.g. "-d ordering,events" and so on.
> Just an idea.
> 
> Thank you,



Sure, if somebody wants to come up with another debugging option
for perf trace, no problem.



* Re: [GIT PULL] perf tools updates
  2010-04-24  2:05 [GIT PULL] perf tools updates Frederic Weisbecker
                   ` (9 preceding siblings ...)
  2010-04-24  2:27 ` [GIT PULL] perf tools updates Frederic Weisbecker
@ 2010-04-27  9:15 ` Ingo Molnar
  10 siblings, 0 replies; 34+ messages in thread
From: Ingo Molnar @ 2010-04-27  9:15 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, William Cohen, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Paul Mackerras, Hitoshi Mitake, Masami Hiramatsu, Tom Zanussi,
	Arjan van de Ven, Pekka Enberg, Li Zefan, Stephane Eranian,
	Jens Axboe, Jason Baron, Xiao Guangrong


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> Ingo,
> 
> Please pull the perf/core branch that can be found at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
> 	perf/core
> 
> Thanks,
> 	Frederic
> ---
> 
> Frederic Weisbecker (6):
>       perf: Generalize perf lock's sample event reordering to the session layer
>       perf: Use generic sample reordering in perf sched
>       perf: Use generic sample reordering in perf kmem
>       perf: Use generic sample reordering in perf trace
>       perf: Use generic sample reordering in perf timechart
>       perf: Add a perf trace option to check samples ordering reliability
> 
> Hitoshi Mitake (1):
>       perf lock: Fix state machine to recognize lock sequence
> 
> Stephane Eranian (1):
>       perf: Fix initialization bug in parse_single_tracepoint_event()
> 
> William Cohen (1):
>       perf: Some perf-kvm documentation edits
> 
> 
>  tools/perf/Documentation/perf-kvm.txt |    9 +-
>  tools/perf/builtin-kmem.c             |    6 +-
>  tools/perf/builtin-lock.c             |  595 ++++++++++++++++++++-------------
>  tools/perf/builtin-sched.c            |    8 +-
>  tools/perf/builtin-timechart.c        |  112 +------
>  tools/perf/builtin-trace.c            |   13 +
>  tools/perf/util/parse-events.c        |   13 +-
>  tools/perf/util/session.c             |  179 ++++++++++-
>  tools/perf/util/session.h             |   10 +
>  9 files changed, 583 insertions(+), 362 deletions(-)

Pulled, thanks a lot Frederic!

	Ingo


* Re: [PATCH v2] perf lock: add "info" subcommand for dumping misc information
  2010-04-24 10:46   ` [PATCH v2] " Hitoshi Mitake
  2010-04-24 13:41     ` Frederic Weisbecker
@ 2010-04-30 18:49     ` Frederic Weisbecker
  2010-05-03  5:11       ` Hitoshi Mitake
  1 sibling, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2010-04-30 18:49 UTC (permalink / raw)
  To: Hitoshi Mitake
  Cc: linux-kernel, h.mitake, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Jens Axboe,
	Jason Baron, Xiao Guangrong

On Sat, Apr 24, 2010 at 07:46:41PM +0900, Hitoshi Mitake wrote:
> Hi Frederic,
> 
> I added "info" subcommand to perf lock,
> this can be used as dumping metadata like thread or address of lock instances.
> "map" was removed because info should do the work of it.
> 
> This will be useful not only for debugging but also for ordinary analyzing.
> 
> I made this patch on perf/core of your tree, could you queue this?
> 
> v2: adding example of usage
> % sudo ./perf lock info -t
>  | Thread ID: comm
>  | 	 0: swapper
>  |         1: init
>  |        18: migration/5
>  |        29: events/2
>  |        32: events/5
>  |        33: events/6
> ...
> 
> % sudo ./perf lock info -m
> | Address of instance: name of class
> |  0xffff8800b95adae0: &(&sighand->siglock)->rlock
> |  0xffff8800bbb41ae0: &(&sighand->siglock)->rlock
> |  0xffff8800bf165ae0: &(&sighand->siglock)->rlock
> |  0xffff8800b9576a98: &p->cred_guard_mutex
> |  0xffff8800bb890a08: &(&p->alloc_lock)->rlock
> |  0xffff8800b9522a08: &(&p->alloc_lock)->rlock
> |  0xffff8800bb8aaa08: &(&p->alloc_lock)->rlock
> |  0xffff8800bba72a08: &(&p->alloc_lock)->rlock
> |  0xffff8800bf18ea08: &(&p->alloc_lock)->rlock
> |  0xffff8800b8a0d8a0: &(&ip->i_lock)->mr_lock
> |  0xffff88009bf818a0: &(&ip->i_lock)->mr_lock
> |  0xffff88004c66b8a0: &(&ip->i_lock)->mr_lock
> |  0xffff8800bb6478a0: &(shost->host_lock)->rlock
> Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Jens Axboe <jens.axboe@oracle.com>
> Cc: Jason Baron <jbaron@redhat.com>
> Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>


I eventually didn't queue it because of various
problems, see below:



> ---
>  tools/perf/builtin-lock.c |   68 +++++++++++++++++++++++++++++++++++++++++++--
>  1 files changed, 65 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
> index ce27675..c54211e 100644
> --- a/tools/perf/builtin-lock.c
> +++ b/tools/perf/builtin-lock.c
> @@ -778,18 +778,61 @@ static void print_result(void)
>  	}
>  }
>  
> +static int			info_threads;
> +static int			info_map;
> +
> +static void rec_dump_threads(struct rb_node *node)
> +{
> +	struct thread_stat *st;
> +	struct thread *t;
> +
> +	if (!node)
> +		return;
> +
> +	if (node->rb_left)
> +		rec_dump_threads(node->rb_left);


That only walks the left nodes of the rbtree, imagine the following
rbtree, W are visited nodes, U are the unvisited:

                   Root
                 /      \
                W        U
              /  \      / \
             W    U    U   U


Better to iterate using rb_first() and then rb_next() until it returns NULL.



> +
> +	st = container_of(node, struct thread_stat, rb);
> +	BUG_ON(!st);


You won't ever have !st, because container_of() computes an address
based on a struct type and the address of a member contained inside that struct.

struct thread_stat {
	struct list_head	hash_entry;
	struct rb_node		rb;
	[...]
} ts;

If &ts.rb == 1000, then &ts == 1000 - 16, or something like that.

What matters is the "if (!node)" check you did before.



> +	t = perf_session__findnew(session, st->tid);
> +	BUG_ON(!t);
> +
> +	printf("%10d: %s\n", st->tid, t->comm);



Please don't use printf() anymore (I made the same mistake lately):
now that we are using a TUI, and might use a GUI one day, we
can't assume anymore that we are dealing with a normal stdout.

So better to use pr_debug, pr_err, pr_warning, etc...



> +
> +	if (node->rb_right)
> +		rec_dump_threads(node->rb_right);
> +}
> +
> +static void dump_threads(void)
> +{
> +	printf("%10s: comm\n", "Thread ID");



Same here and below.



> +	rec_dump_threads(thread_stats.rb_node);
> +}
> +
>  static void dump_map(void)
>  {
>  	unsigned int i;
>  	struct lock_stat *st;
>  
> +	printf("Address of instance: name of class\n");
>  	for (i = 0; i < LOCKHASH_SIZE; i++) {
>  		list_for_each_entry(st, &lockhash_table[i], hash_entry) {
> -			printf("%p: %s\n", st->addr, st->name);
> +			printf(" %p: %s\n", st->addr, st->name);
>  		}
>  	}
>  }
>  
> +static void dump_info(void)
> +{
> +	/* ugly... */
> +	if (info_threads)
> +		dump_threads();


No it's not ugly, it's ok, we do this everywhere in perf tools :)

Thanks.



* Re: [PATCH v2] perf lock: add "info" subcommand for dumping misc information
  2010-04-30 18:49     ` Frederic Weisbecker
@ 2010-05-03  5:11       ` Hitoshi Mitake
  2010-05-03  5:12         ` [PATCH v3] " Hitoshi Mitake
  0 siblings, 1 reply; 34+ messages in thread
From: Hitoshi Mitake @ 2010-05-03  5:11 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, h.mitake, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Jens Axboe,
	Jason Baron, Xiao Guangrong

On 05/01/10 03:49, Frederic Weisbecker wrote:
 > On Sat, Apr 24, 2010 at 07:46:41PM +0900, Hitoshi Mitake wrote:
 >> Hi Frederic,
 >>
 >> I added "info" subcommand to perf lock,
 >> this can be used as dumping metadata like thread or address of lock 
instances.
 >> "map" was removed because info should do the work of it.
 >>
 >> This will be useful not only for debugging but also for ordinary 
analyzing.
 >>
 >> I made this patch on perf/core of your tree, could you queue this?
 >>
 >> v2: adding example of usage
 >> % sudo ./perf lock info -t
 >>   | Thread ID: comm
 >>   | 	 0: swapper
 >>   |         1: init
 >>   |        18: migration/5
 >>   |        29: events/2
 >>   |        32: events/5
 >>   |        33: events/6
 >> ...
 >>
 >> % sudo ./perf lock info -m
 >> | Address of instance: name of class
 >> |  0xffff8800b95adae0:&(&sighand->siglock)->rlock
 >> |  0xffff8800bbb41ae0:&(&sighand->siglock)->rlock
 >> |  0xffff8800bf165ae0:&(&sighand->siglock)->rlock
 >> |  0xffff8800b9576a98:&p->cred_guard_mutex
 >> |  0xffff8800bb890a08:&(&p->alloc_lock)->rlock
 >> |  0xffff8800b9522a08:&(&p->alloc_lock)->rlock
 >> |  0xffff8800bb8aaa08:&(&p->alloc_lock)->rlock
 >> |  0xffff8800bba72a08:&(&p->alloc_lock)->rlock
 >> |  0xffff8800bf18ea08:&(&p->alloc_lock)->rlock
 >> |  0xffff8800b8a0d8a0:&(&ip->i_lock)->mr_lock
 >> |  0xffff88009bf818a0:&(&ip->i_lock)->mr_lock
 >> |  0xffff88004c66b8a0:&(&ip->i_lock)->mr_lock
 >> |  0xffff8800bb6478a0:&(shost->host_lock)->rlock
 >> Signed-off-by: Hitoshi Mitake<mitake@dcl.info.waseda.ac.jp>
 >> Cc: Ingo Molnar<mingo@elte.hu>
 >> Cc: Peter Zijlstra<a.p.zijlstra@chello.nl>
 >> Cc: Paul Mackerras<paulus@samba.org>
 >> Cc: Arnaldo Carvalho de Melo<acme@redhat.com>
 >> Cc: Jens Axboe<jens.axboe@oracle.com>
 >> Cc: Jason Baron<jbaron@redhat.com>
 >> Cc: Xiao Guangrong<xiaoguangrong@cn.fujitsu.com>
 >
 >
 > I've eventually not queued it because of some various
 > problems, see below:

Thanks for your advice, Frederic!

 >
 >
 >
 >> ---
 >>   tools/perf/builtin-lock.c |   68 
+++++++++++++++++++++++++++++++++++++++++++--
 >>   1 files changed, 65 insertions(+), 3 deletions(-)
 >>
 >> diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
 >> index ce27675..c54211e 100644
 >> --- a/tools/perf/builtin-lock.c
 >> +++ b/tools/perf/builtin-lock.c
 >> @@ -778,18 +778,61 @@ static void print_result(void)
 >>   	}
 >>   }
 >>
 >> +static int			info_threads;
 >> +static int			info_map;
 >> +
 >> +static void rec_dump_threads(struct rb_node *node)
 >> +{
 >> +	struct thread_stat *st;
 >> +	struct thread *t;
 >> +
 >> +	if (!node)
 >> +		return;
 >> +
 >> +	if (node->rb_left)
 >> +		rec_dump_threads(node->rb_left);
 >
 >
 > That only walks the left nodes of the rbtree, imagine the following
 > rbtree, W are visited nodes, U are the unvisited:
 >
 >                     Root
 >                   /      \
 >                  W        U
 >                /  \      / \
 >               W    U    U   U
 >
 >
 > Better iterate using rb_first() then rb_next() until it is NULL.

Ah, yeah, the recursive traversal was too primitive.
I rewrote it with rb_first() and rb_next().

 >
 >
 >
 >> +
 >> +	st = container_of(node, struct thread_stat, rb);
 >> +	BUG_ON(!st);
 >
 >
 > You won't ever have !st because container_of computes an address
 > based on a struct type and a contained address inside this struct.
 >
 > struct thread_stat {
 > 	struct list_head	hash_entry;
 > 	struct rb_node		rb;
 > 	[...]
 > } ts;
 >
 > If ts->rb == 1000, ts == 1000 - 16 or something like this.
 >
 > What matters is the "if (!node)" check you did before.

Yeah... my BUG_ON() check was really silly and meaningless :(
I'll remove it.

 >
 >
 >
 >> +	t = perf_session__findnew(session, st->tid);
 >> +	BUG_ON(!t);
 >> +
 >> +	printf("%10d: %s\n", st->tid, t->comm);
 >
 >
 >
 > Please don't use printf anymore (I did the same mistakes lately),
 > now that are using a TUI and we might use a GUI one day, we
 > can't assume anymore we are dealing with a normal stdout.
 >
 > So better use pr_debug, pr_err, pr_warning, etc...

OK. I'll replace printf() with pr_info() or pr_debug().

 >
 >
 >
 >> +
 >> +	if (node->rb_right)
 >> +		rec_dump_threads(node->rb_right);
 >> +}
 >> +
 >> +static void dump_threads(void)
 >> +{
 >> +	printf("%10s: comm\n", "Thread ID");
 >
 >
 >
 > Same here and below.
 >
 >
 >
 >> +	rec_dump_threads(thread_stats.rb_node);
 >> +}
 >> +
 >>   static void dump_map(void)
 >>   {
 >>   	unsigned int i;
 >>   	struct lock_stat *st;
 >>
 >> +	printf("Address of instance: name of class\n");
 >>   	for (i = 0; i<  LOCKHASH_SIZE; i++) {
 >>   		list_for_each_entry(st,&lockhash_table[i], hash_entry) {
 >> -			printf("%p: %s\n", st->addr, st->name);
 >> +			printf(" %p: %s\n", st->addr, st->name);
 >>   		}
 >>   	}
 >>   }
 >>
 >> +static void dump_info(void)
 >> +{
 >> +	/* ugly... */
 >> +	if (info_threads)
 >> +		dump_threads();
 >
 >
 > No it's not ugly, it's ok, we do this everywhere in perf tools :)
 >

Thanks :) I'll resend v3 patch later.

        Hitoshi


* [PATCH v3] perf lock: add "info" subcommand for dumping misc information
  2010-05-03  5:11       ` Hitoshi Mitake
@ 2010-05-03  5:12         ` Hitoshi Mitake
  2010-05-05 21:10           ` Frederic Weisbecker
  2010-05-10  7:19           ` [tip:perf/core] perf lock: Add "info" subcommand for dumping misc information tip-bot for Hitoshi Mitake
  0 siblings, 2 replies; 34+ messages in thread
From: Hitoshi Mitake @ 2010-05-03  5:12 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, mitake, h.mitake, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Jens Axboe,
	Jason Baron, Xiao Guangrong

Hi Frederic,

I added "info" subcommand to perf lock,
this can be used as dumping metadata like thread or address of lock instances.
"map" was removed because info should do the work of it.

This will be useful not only for debugging but also for ordinary analyzing.

I made this patch on perf/core of your tree, could you queue this?

v2: adding example of usage
% sudo ./perf lock info -t
 | Thread ID: comm
 | 	 0: swapper
 |         1: init
 |        18: migration/5
 |        29: events/2
 |        32: events/5
 |        33: events/6
...

% sudo ./perf lock info -m
| Address of instance: name of class
|  0xffff8800b95adae0: &(&sighand->siglock)->rlock
|  0xffff8800bbb41ae0: &(&sighand->siglock)->rlock
|  0xffff8800bf165ae0: &(&sighand->siglock)->rlock
|  0xffff8800b9576a98: &p->cred_guard_mutex
|  0xffff8800bb890a08: &(&p->alloc_lock)->rlock
|  0xffff8800b9522a08: &(&p->alloc_lock)->rlock
|  0xffff8800bb8aaa08: &(&p->alloc_lock)->rlock
|  0xffff8800bba72a08: &(&p->alloc_lock)->rlock
|  0xffff8800bf18ea08: &(&p->alloc_lock)->rlock
|  0xffff8800b8a0d8a0: &(&ip->i_lock)->mr_lock
|  0xffff88009bf818a0: &(&ip->i_lock)->mr_lock
|  0xffff88004c66b8a0: &(&ip->i_lock)->mr_lock
|  0xffff8800bb6478a0: &(shost->host_lock)->rlock

v3: fixed some problems Frederic pointed out
 * better rbtree traversal in dump_threads()
 * replaced printf() with pr_info() and pr_debug()

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 tools/perf/builtin-lock.c |   96 ++++++++++++++++++++++++++++++++++-----------
 1 files changed, 73 insertions(+), 23 deletions(-)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index ce27675..010a7c8 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -720,15 +720,15 @@ static void print_result(void)
 	char cut_name[20];
 	int bad, total;
 
-	printf("%20s ", "Name");
-	printf("%10s ", "acquired");
-	printf("%10s ", "contended");
+	pr_info("%20s ", "Name");
+	pr_info("%10s ", "acquired");
+	pr_info("%10s ", "contended");
 
-	printf("%15s ", "total wait (ns)");
-	printf("%15s ", "max wait (ns)");
-	printf("%15s ", "min wait (ns)");
+	pr_info("%15s ", "total wait (ns)");
+	pr_info("%15s ", "max wait (ns)");
+	pr_info("%15s ", "min wait (ns)");
 
-	printf("\n\n");
+	pr_info("\n\n");
 
 	bad = total = 0;
 	while ((st = pop_from_result())) {
@@ -741,7 +741,7 @@ static void print_result(void)
 
 		if (strlen(st->name) < 16) {
 			/* output raw name */
-			printf("%20s ", st->name);
+			pr_info("%20s ", st->name);
 		} else {
 			strncpy(cut_name, st->name, 16);
 			cut_name[16] = '.';
@@ -749,17 +749,17 @@ static void print_result(void)
 			cut_name[18] = '.';
 			cut_name[19] = '\0';
 			/* cut off name for saving output style */
-			printf("%20s ", cut_name);
+			pr_info("%20s ", cut_name);
 		}
 
-		printf("%10u ", st->nr_acquired);
-		printf("%10u ", st->nr_contended);
+		pr_info("%10u ", st->nr_acquired);
+		pr_info("%10u ", st->nr_contended);
 
-		printf("%15llu ", st->wait_time_total);
-		printf("%15llu ", st->wait_time_max);
-		printf("%15llu ", st->wait_time_min == ULLONG_MAX ?
+		pr_info("%15llu ", st->wait_time_total);
+		pr_info("%15llu ", st->wait_time_max);
+		pr_info("%15llu ", st->wait_time_min == ULLONG_MAX ?
 		       0 : st->wait_time_min);
-		printf("\n");
+		pr_info("\n");
 	}
 
 	{
@@ -768,28 +768,59 @@ static void print_result(void)
 		const char *name[4] =
 			{ "acquire", "acquired", "contended", "release" };
 
-		printf("\n=== output for debug===\n\n");
-		printf("bad:%d, total:%d\n", bad, total);
-		printf("bad rate:%f\n", (double)(bad / total));
+		pr_debug("\n=== output for debug===\n\n");
+		pr_debug("bad:%d, total:%d\n", bad, total);
+		pr_debug("bad rate:%f\n", (double)(bad / total));
 
-		printf("histogram of events caused bad sequence\n");
+		pr_debug("histogram of events caused bad sequence\n");
 		for (i = 0; i < 4; i++)
-			printf(" %10s: %d\n", name[i], bad_hist[i]);
+			pr_debug(" %10s: %d\n", name[i], bad_hist[i]);
 	}
 }
 
+static int			info_threads;
+static int			info_map;
+
+static void dump_threads(void)
+{
+	struct thread_stat *st;
+	struct rb_node *node;
+	struct thread *t;
+
+	pr_info("%10s: comm\n", "Thread ID");
+
+	node = rb_first(&thread_stats);
+	while (node) {
+		st = container_of(node, struct thread_stat, rb);
+		t = perf_session__findnew(session, st->tid);
+		pr_info("%10d: %s\n", st->tid, t->comm);
+		node = rb_next(node);
+	};
+}
+
 static void dump_map(void)
 {
 	unsigned int i;
 	struct lock_stat *st;
 
+	pr_info("Address of instance: name of class\n");
 	for (i = 0; i < LOCKHASH_SIZE; i++) {
 		list_for_each_entry(st, &lockhash_table[i], hash_entry) {
-			printf("%p: %s\n", st->addr, st->name);
+			pr_info(" %p: %s\n", st->addr, st->name);
 		}
 	}
 }
 
+static void dump_info(void)
+{
+	if (info_threads)
+		dump_threads();
+	else if (info_map)
+		dump_map();
+	else
+		die("Unknown type of information\n");
+}
+
 static int process_sample_event(event_t *self, struct perf_session *s)
 {
 	struct sample_data data;
@@ -858,6 +889,19 @@ static const struct option report_options[] = {
 	OPT_END()
 };
 
+static const char * const info_usage[] = {
+	"perf lock info [<options>]",
+	NULL
+};
+
+static const struct option info_options[] = {
+	OPT_BOOLEAN('t', "threads", &info_threads,
+		    "dump thread list in perf.data"),
+	OPT_BOOLEAN('m', "map", &info_map,
+		    "map of lock instances (name:address table)"),
+	OPT_END()
+};
+
 static const char * const lock_usage[] = {
 	"perf lock [<options>] {record|trace|report}",
 	NULL
@@ -929,12 +973,18 @@ int cmd_lock(int argc, const char **argv, const char *prefix __used)
 	} else if (!strcmp(argv[0], "trace")) {
 		/* Aliased to 'perf trace' */
 		return cmd_trace(argc, argv, prefix);
-	} else if (!strcmp(argv[0], "map")) {
+	} else if (!strcmp(argv[0], "info")) {
+		if (argc) {
+			argc = parse_options(argc, argv,
+					     info_options, info_usage, 0);
+			if (argc)
+				usage_with_options(info_usage, info_options);
+		}
 		/* recycling report_lock_ops */
 		trace_handler = &report_lock_ops;
 		setup_pager();
 		read_events();
-		dump_map();
+		dump_info();
 	} else {
 		usage_with_options(lock_usage, lock_options);
 	}
-- 
1.6.5.2



* Re: [PATCH v3] perf lock: add "info" subcommand for dumping misc information
  2010-05-03  5:12         ` [PATCH v3] " Hitoshi Mitake
@ 2010-05-05 21:10           ` Frederic Weisbecker
  2010-05-06  9:31             ` Hitoshi Mitake
  2010-05-10  7:19           ` [tip:perf/core] perf lock: Add "info" subcommand for dumping misc information tip-bot for Hitoshi Mitake
  1 sibling, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2010-05-05 21:10 UTC (permalink / raw)
  To: Hitoshi Mitake
  Cc: linux-kernel, h.mitake, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Jens Axboe,
	Jason Baron, Xiao Guangrong

On Mon, May 03, 2010 at 02:12:00PM +0900, Hitoshi Mitake wrote:
> Hi Frederic,
> 
> I added "info" subcommand to perf lock,
> this can be used as dumping metadata like thread or address of lock instances.
> "map" was removed because info should do the work of it.
> 
> This will be useful not only for debugging but also for ordinary analyzing.
> 
> I made this patch on perf/core of your tree, could you queue this?
> 
> v2: adding example of usage
> % sudo ./perf lock info -t
>  | Thread ID: comm
>  | 	 0: swapper
>  |         1: init
>  |        18: migration/5
>  |        29: events/2
>  |        32: events/5
>  |        33: events/6
> ...
> 
> % sudo ./perf lock info -m
> | Address of instance: name of class
> |  0xffff8800b95adae0: &(&sighand->siglock)->rlock
> |  0xffff8800bbb41ae0: &(&sighand->siglock)->rlock
> |  0xffff8800bf165ae0: &(&sighand->siglock)->rlock
> |  0xffff8800b9576a98: &p->cred_guard_mutex
> |  0xffff8800bb890a08: &(&p->alloc_lock)->rlock
> |  0xffff8800b9522a08: &(&p->alloc_lock)->rlock
> |  0xffff8800bb8aaa08: &(&p->alloc_lock)->rlock
> |  0xffff8800bba72a08: &(&p->alloc_lock)->rlock
> |  0xffff8800bf18ea08: &(&p->alloc_lock)->rlock
> |  0xffff8800b8a0d8a0: &(&ip->i_lock)->mr_lock
> |  0xffff88009bf818a0: &(&ip->i_lock)->mr_lock
> |  0xffff88004c66b8a0: &(&ip->i_lock)->mr_lock
> |  0xffff8800bb6478a0: &(shost->host_lock)->rlock
> 
> v3: fixed some problems Frederic pointed out
>  * better rbtree tracking in dump_threads()
>  * removed printf() and used pr_info() and pr_debug()



Applied, thanks Hitoshi!



* Re: [PATCH v3] perf lock: add "info" subcommand for dumping misc information
  2010-05-05 21:10           ` Frederic Weisbecker
@ 2010-05-06  9:31             ` Hitoshi Mitake
  2010-05-06  9:32               ` [PATCH] perf lock: track only specified threads Hitoshi Mitake
  0 siblings, 1 reply; 34+ messages in thread
From: Hitoshi Mitake @ 2010-05-06  9:31 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, h.mitake, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Jens Axboe,
	Jason Baron, Xiao Guangrong

On 05/06/10 06:10, Frederic Weisbecker wrote:
 > On Mon, May 03, 2010 at 02:12:00PM +0900, Hitoshi Mitake wrote:
 >> Hi Frederic,
 >>
 >> I added "info" subcommand to perf lock,
 >> this can be used as dumping metadata like thread or address of lock 
instances.
 >> "map" was removed because info should do the work of it.
 >>
 >> This will be useful not only for debugging but also for ordinary 
analyzing.
 >>
 >> I made this patch on perf/core of your tree, could you queue this?
 >>
 >> v2: adding example of usage
 >> % sudo ./perf lock info -t
 >>   | Thread ID: comm
 >>   | 	 0: swapper
 >>   |         1: init
 >>   |        18: migration/5
 >>   |        29: events/2
 >>   |        32: events/5
 >>   |        33: events/6
 >> ...
 >>
 >> % sudo ./perf lock info -m
 >> | Address of instance: name of class
 >> |  0xffff8800b95adae0:&(&sighand->siglock)->rlock
 >> |  0xffff8800bbb41ae0:&(&sighand->siglock)->rlock
 >> |  0xffff8800bf165ae0:&(&sighand->siglock)->rlock
 >> |  0xffff8800b9576a98:&p->cred_guard_mutex
 >> |  0xffff8800bb890a08:&(&p->alloc_lock)->rlock
 >> |  0xffff8800b9522a08:&(&p->alloc_lock)->rlock
 >> |  0xffff8800bb8aaa08:&(&p->alloc_lock)->rlock
 >> |  0xffff8800bba72a08:&(&p->alloc_lock)->rlock
 >> |  0xffff8800bf18ea08:&(&p->alloc_lock)->rlock
 >> |  0xffff8800b8a0d8a0:&(&ip->i_lock)->mr_lock
 >> |  0xffff88009bf818a0:&(&ip->i_lock)->mr_lock
 >> |  0xffff88004c66b8a0:&(&ip->i_lock)->mr_lock
 >> |  0xffff8800bb6478a0:&(shost->host_lock)->rlock
 >>
 >> v3: fixed some problems Frederic pointed out
 >>   * better rbtree tracking in dump_threads()
 >>   * removed printf() and used pr_info() and pr_debug()
 >
 >
 >
 > Applied, thanks Hitoshi!
 >

Thanks Frederic!

And I prepared another patch for tracking only specified threads.

I'll send the new patch in this thread later.
But the new one is not so big, and it barely depends on the "info -t" patch.
If you think the new patch and the previous one should be a single patch,
please tell me, and I'll merge them.

Thanks,
	Hitoshi


* [PATCH] perf lock: track only specified threads
  2010-05-06  9:31             ` Hitoshi Mitake
@ 2010-05-06  9:32               ` Hitoshi Mitake
  2010-05-07  0:49                 ` Frederic Weisbecker
  0 siblings, 1 reply; 34+ messages in thread
From: Hitoshi Mitake @ 2010-05-06  9:32 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, mitake, h.mitake, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Jens Axboe,
	Jason Baron, Xiao Guangrong

I implemented a feature in perf lock for tracking only specified threads.
With the -t option, users can specify which threads should be tracked.

Example of usage:
| % sudo ./perf lock info -t                    # info -t is convenient with this feature
|  Thread ID: comm
|          0: swapper
|          1: init
|         12: migration/3
|         13: ksoftirqd/3
|         27: events/0
|         28: events/1
|         29: events/2
|         30: events/3
|         31: events/4
|        857: kondemand/0
|        858: kondemand/1
|        859: kondemand/2
| ...
| % sudo ./perf lock -t 27,28,29,30,31 report   # track only these threads
|                 Name   acquired  contended total wait (ns)   max wait (ns)   min wait (ns)
|
|  &(&cwq->lock)->r...          4          0               0               0               0
|  &(&cwq->lock)->r...          4          0               0               0               0
|                  key          2          0               0               0               0
|                  key          2          0               0               0               0
|  &(&tty->buf.lock...          2          0               0               0               0
|  &(&tty->buf.lock...          2          0               0               0               0
|       tty_ldisc_lock          2          0               0               0               0
|  &(&base->lock)->...          2          0               0               0               0
|  &(&(&(*({ do { c...          2          0               0               0               0
|  &(&(&(*({ do { c...          2          0               0               0               0
|  &(&cwq->lock)->r...          2          0               0               0               0

I believe this is a convenient feature, because perf itself acquires lots of locks.
Could you queue this, Frederic?

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
---
 tools/perf/builtin-lock.c |   71 +++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 68 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index 010a7c8..7c75c44 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -23,6 +23,7 @@
 #include <linux/list.h>
 #include <linux/hash.h>
 
+static int do_info;
 static struct perf_session *session;
 
 /* based on kernel/lockdep.c */
@@ -34,6 +35,27 @@ static struct list_head lockhash_table[LOCKHASH_SIZE];
 #define __lockhashfn(key)	hash_long((unsigned long)key, LOCKHASH_BITS)
 #define lockhashentry(key)	(lockhash_table + __lockhashfn((key)))
 
+static u32 *track_thread_array;
+static int track_thread_array_length;
+
+static int is_track_thread_do(u32 tid)
+{
+	int i;
+
+	for (i = 0; i < track_thread_array_length; i++) {
+		if (track_thread_array[i] == tid)
+			return 1;
+	}
+	return 0;
+}
+
+static int is_track_thread_nop(u32 tid __used)
+{
+	return 1;
+}
+
+static int (*is_track_thread)(u32) = is_track_thread_nop;
+
 struct lock_stat {
 	struct list_head	hash_entry;
 	struct rb_node		rb;		/* used for sorting */
@@ -315,6 +337,7 @@ alloc_failed:
 }
 
 static char			const *input_name = "perf.data";
+static char			const *track_threads_str;
 
 struct raw_event_sample {
 	u32			size;
@@ -829,6 +852,9 @@ static int process_sample_event(event_t *self, struct perf_session *s)
 	bzero(&data, sizeof(data));
 	event__parse_sample(self, s->sample_type, &data);
 
+	if (!do_info && !is_track_thread(data.tid))
+		return 0;
+
 	thread = perf_session__findnew(s, data.tid);
 	if (thread == NULL) {
 		pr_debug("problem processing %d event, skipping it.\n",
@@ -868,8 +894,41 @@ static void sort_result(void)
 	}
 }
 
+static void parse_track_threads(void)
+{
+	char *tmp, *tok, *str;
+
+	if (!track_threads_str)
+		return;
+
+	str = strdup(track_threads_str);
+	if (!str)
+		die("Memory allocation failed\n");
+
+	tok = strtok_r(str, ", ", &tmp);
+	if (!tok)
+		return;
+	track_thread_array = zalloc(sizeof(u32));
+	if (!track_thread_array)
+		die("Memory allocation failed\n");
+	track_thread_array[0] = atoi(tok);
+	track_thread_array_length = 1;
+	is_track_thread = is_track_thread_do;
+
+	for (tok = strtok_r(NULL, ", ", &tmp);
+	     tok; tok = strtok_r(NULL, ", ", &tmp)) {
+		track_thread_array = realloc(track_thread_array,
+					     sizeof(u32) *
+					     ++track_thread_array_length);
+		if (!track_thread_array)
+			die("Memory allocation failed\n");
+		track_thread_array[track_thread_array_length - 1] = atoi(tok);
+	}
+}
+
 static void __cmd_report(void)
 {
+	parse_track_threads();
 	setup_pager();
 	select_key();
 	read_events();
@@ -908,9 +967,14 @@ static const char * const lock_usage[] = {
 };
 
 static const struct option lock_options[] = {
-	OPT_STRING('i', "input", &input_name, "file", "input file name"),
-	OPT_INCR('v', "verbose", &verbose, "be more verbose (show symbol address, etc)"),
-	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace, "dump raw trace in ASCII"),
+	OPT_STRING('i', "input", &input_name,
+		   "file", "input file name"),
+	OPT_INCR('v', "verbose", &verbose,
+		 "be more verbose (show symbol address, etc)"),
+	OPT_BOOLEAN('D', "dump-raw-trace",
+		    &dump_trace, "dump raw trace in ASCII"),
+	OPT_STRING('t', "threads", &track_threads_str,
+		   NULL, "thread IDs to track"),
 	OPT_END()
 };
 
@@ -981,6 +1045,7 @@ int cmd_lock(int argc, const char **argv, const char *prefix __used)
 				usage_with_options(info_usage, info_options);
 		}
 		/* recycling report_lock_ops */
+		do_info = 1;
 		trace_handler = &report_lock_ops;
 		setup_pager();
 		read_events();
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH] perf lock: track only specified threads
  2010-05-06  9:32               ` [PATCH] perf lock: track only specified threads Hitoshi Mitake
@ 2010-05-07  0:49                 ` Frederic Weisbecker
  2010-05-08  8:02                   ` Hitoshi Mitake
  0 siblings, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2010-05-07  0:49 UTC (permalink / raw)
  To: Hitoshi Mitake
  Cc: linux-kernel, h.mitake, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Jens Axboe,
	Jason Baron, Xiao Guangrong

On Thu, May 06, 2010 at 06:32:56PM +0900, Hitoshi Mitake wrote:
> I implemented the feature of tracking only specified threads to perf lock.
> With -t option, users can specify which threads should be tracked.
> 
> Example of usage:
> | % sudo ./perf lock info -t                    # info -t is convenient with this feature
> |  Thread ID: comm
> |          0: swapper
> |          1: init
> |         12: migration/3
> |         13: ksoftirqd/3
> |         27: events/0
> |         28: events/1
> |         29: events/2
> |         30: events/3
> |         31: events/4
> |        857: kondemand/0
> |        858: kondemand/1
> |        859: kondemand/2
> | ...
> | % sudo ./perf lock -t 27,28,29,30,31 report   # track only these threads
> |                 Name   acquired  contended total wait (ns)   max wait (ns)   min wait (ns)



I'm not sure we want such per-thread granularity filtering; I'm not
sure it will be very useful. But per-process filtering, yes.

And actually we should do that at trace time rather than in
post-processing. This will lower the tracing overhead a lot.

Ideally I think we need:
	./perf lock record ls -R /

This would trace locks taken by this instance of ls only, i.e. drop
the -a if we pass a command line.

What do you think?


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] perf lock: track only specified threads
  2010-05-07  0:49                 ` Frederic Weisbecker
@ 2010-05-08  8:02                   ` Hitoshi Mitake
  2010-05-08  8:10                     ` [PATCH] perf lock: Drop "-a" option from set of default arguments to cmd_record() Hitoshi Mitake
  0 siblings, 1 reply; 34+ messages in thread
From: Hitoshi Mitake @ 2010-05-08  8:02 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, h.mitake, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Jens Axboe,
	Jason Baron, Xiao Guangrong

On 05/07/10 09:49, Frederic Weisbecker wrote:
 > On Thu, May 06, 2010 at 06:32:56PM +0900, Hitoshi Mitake wrote:
 >> I implemented the feature of tracking only specified threads to perf lock.
 >> With -t option, users can specify which threads should be tracked.
 >>
 >> Example of usage:
 >> | % sudo ./perf lock info -t                    # info -t is convenient with this feature
 >> |  Thread ID: comm
 >> |          0: swapper
 >> |          1: init
 >> |         12: migration/3
 >> |         13: ksoftirqd/3
 >> |         27: events/0
 >> |         28: events/1
 >> |         29: events/2
 >> |         30: events/3
 >> |         31: events/4
 >> |        857: kondemand/0
 >> |        858: kondemand/1
 >> |        859: kondemand/2
 >> | ...
 >> | % sudo ./perf lock -t 27,28,29,30,31 report   # track only these threads
 >> |                 Name   acquired  contended total wait (ns)   max wait (ns)   min wait (ns)
 >
 >
 >
 > I'm not sure we want such per-thread granularity filtering; I'm not
 > sure it will be very useful. But per-process filtering, yes.
 >
 > And actually we should do that at trace time rather than in post-processing.
 > This will lower the tracing overhead a lot.
 >
 > Ideally I think we need:
 > 	./perf lock record ls -R /
 >
 > This would trace locks taken by this instance of ls only, ie: drop the -a
 > if we pass a command line.
 >
 > What do you think?
 >
 >

Ah, I completely agree with your opinion :)
Even if the user wants to append "-a",
   ./perf lock record -a ls -R /
is enough. "-a" should not be in the default set of arguments.

I'll send the patch to drop "-a" later.

Thanks,
	Hitoshi

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH] perf lock: Drop "-a" option from set of default arguments to cmd_record()
  2010-05-08  8:02                   ` Hitoshi Mitake
@ 2010-05-08  8:10                     ` Hitoshi Mitake
  2010-05-08 16:14                       ` Frederic Weisbecker
  0 siblings, 1 reply; 34+ messages in thread
From: Hitoshi Mitake @ 2010-05-08  8:10 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, mitake, h.mitake, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Jens Axboe,
	Jason Baron, Xiao Guangrong

This patch drops "-a" from record_args, which is passed to cmd_record().

Even if the user wants to record all lock events while a process runs,
        perf lock record -a <program> <argument> ...
is enough for this purpose.

This can reduce the size of perf.data.

% sudo ./perf lock record whoami
root
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.439 MB perf.data (~19170 samples) ]
% sudo ./perf lock record -a whoami   # with -a option
root
[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 48.962 MB perf.data (~2139197 samples) ]

This patch was made on perf/test of random-tracing.git,
could you queue this, Frederic?

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>

---
 tools/perf/builtin-lock.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index d7dde9c..7f72abc 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -930,7 +930,6 @@ static const struct option lock_options[] = {
 
 static const char *record_args[] = {
 	"record",
-	"-a",
 	"-R",
 	"-f",
 	"-m", "1024",
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH] perf lock: Drop "-a" option from set of default arguments to cmd_record()
  2010-05-08  8:10                     ` [PATCH] perf lock: Drop "-a" option from set of default arguments to cmd_record() Hitoshi Mitake
@ 2010-05-08 16:14                       ` Frederic Weisbecker
  2010-05-09 14:53                         ` Hitoshi Mitake
  0 siblings, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2010-05-08 16:14 UTC (permalink / raw)
  To: Hitoshi Mitake
  Cc: linux-kernel, h.mitake, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Jens Axboe,
	Jason Baron, Xiao Guangrong

On Sat, May 08, 2010 at 05:10:29PM +0900, Hitoshi Mitake wrote:
> This patch drops "-a" from record_args, which is passed to cmd_record().
> 
> Even if user wants to record all lock events during process runs,
>         perf lock record -a <program> <argument> ...
> is enough for this purpose.
> 
> This can reduce size of perf.data.
> 
> % sudo ./perf lock record whoami
> root
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.439 MB perf.data (~19170 samples) ]
> % sudo ./perf lock record -a whoami   # with -a option
> root
> [ perf record: Woken up 0 times to write data ]
> [ perf record: Captured and wrote 48.962 MB perf.data (~2139197 samples) ]
> 
> This patch was made on perf/test of random-tracing.git,
> could you queue this, Frederic?
> 
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Jens Axboe <jens.axboe@oracle.com>
> Cc: Jason Baron <jbaron@redhat.com>
> Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
> Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>


Thanks, will test it and if it's fine I'll queue.

I did a lot of tests these last days to understand what was going on
with perf lock, I mean the fact we have various bad locking scenario.

So far, the state machine looks rather good. In fact, the real problem
is that we don't get every event. We lose a _lot_ of them, because
the frequency of lock events is too high and perf record
can't keep up.

I think I'm going to unearth the injection code to reduce the size
of these events.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] perf lock: Drop "-a" option from set of default arguments to cmd_record()
  2010-05-08 16:14                       ` Frederic Weisbecker
@ 2010-05-09 14:53                         ` Hitoshi Mitake
  2010-05-11  6:48                           ` Frederic Weisbecker
  0 siblings, 1 reply; 34+ messages in thread
From: Hitoshi Mitake @ 2010-05-09 14:53 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, h.mitake, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Jens Axboe,
	Jason Baron, Xiao Guangrong

On 05/09/10 01:14, Frederic Weisbecker wrote:
 > On Sat, May 08, 2010 at 05:10:29PM +0900, Hitoshi Mitake wrote:
 >> This patch drops "-a" from record_args, which is passed to cmd_record().
 >>
 >> Even if user wants to record all lock events during process runs,
 >>          perf lock record -a <program> <argument> ...
 >> is enough for this purpose.
 >>
 >> This can reduce size of perf.data.
 >>
 >> % sudo ./perf lock record whoami
 >> root
 >> [ perf record: Woken up 1 times to write data ]
 >> [ perf record: Captured and wrote 0.439 MB perf.data (~19170 samples) ]
 >> % sudo ./perf lock record -a whoami   # with -a option
 >> root
 >> [ perf record: Woken up 0 times to write data ]
 >> [ perf record: Captured and wrote 48.962 MB perf.data (~2139197 samples) ]
 >>
 >> This patch was made on perf/test of random-tracing.git,
 >> could you queue this, Frederic?
 >>
 >> Cc: Ingo Molnar<mingo@elte.hu>
 >> Cc: Peter Zijlstra<a.p.zijlstra@chello.nl>
 >> Cc: Paul Mackerras<paulus@samba.org>
 >> Cc: Arnaldo Carvalho de Melo<acme@redhat.com>
 >> Cc: Jens Axboe<jens.axboe@oracle.com>
 >> Cc: Jason Baron<jbaron@redhat.com>
 >> Cc: Xiao Guangrong<xiaoguangrong@cn.fujitsu.com>
 >> Signed-off-by: Hitoshi Mitake<mitake@dcl.info.waseda.ac.jp>
 >
 >
 > Thanks, will test it and if it's fine I'll queue.
 >
 > I did a lot of tests these last days to understand what was going on
 > with perf lock, I mean the fact we have various bad locking scenario.
 >
 > So far, the state machine looks rather good. In fact, the real problem
 > is that we don't have every events. We lose a _lot_ of them and that's
 > because the frequency of lock events is too high and perf record
 > can't keep up.

Really, I didn't think about the lack of events :(

 >
 > I think I'm going to unearth the injection code to reduce the size
 > of these events.
 >
 >

Yeah, injection will be a really helpful thing.

And I have a rough idea for reducing event frequency.

Many lock event sequences take one of these forms:
 * acquire -> acquired -> release
 * acquire -> contended -> acquired -> release
I think that emitting 3 or 4 events per lock sequence
is a waste of CPU time and memory space.

If each thread stores the time of each event
and emits only one event at release time,
we will be able to save a lot of time and space.

For example, the ID of each lock instance is 8 bytes on x86_64.
In this scheme the 8 * 4 bytes for IDs would shrink to only 8 bytes.
I think this optimization is worth considering because of
the high frequency of lock events.

What do you think?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [tip:perf/core] perf lock: Add "info" subcommand for dumping misc information
  2010-05-03  5:12         ` [PATCH v3] " Hitoshi Mitake
  2010-05-05 21:10           ` Frederic Weisbecker
@ 2010-05-10  7:19           ` tip-bot for Hitoshi Mitake
  1 sibling, 0 replies; 34+ messages in thread
From: tip-bot for Hitoshi Mitake @ 2010-05-10  7:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, acme, paulus, hpa, mingo, a.p.zijlstra, mitake,
	xiaoguangrong, jens.axboe, fweisbec, jbaron, tglx, mingo

Commit-ID:  26242d859c9be9eea61f7f19514e9d272ae8ce26
Gitweb:     http://git.kernel.org/tip/26242d859c9be9eea61f7f19514e9d272ae8ce26
Author:     Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
AuthorDate: Mon, 3 May 2010 14:12:00 +0900
Committer:  Frederic Weisbecker <fweisbec@gmail.com>
CommitDate: Sun, 9 May 2010 13:45:24 +0200

perf lock: Add "info" subcommand for dumping misc information

This adds the "info" subcommand to perf lock which can be used
to dump metadata like threads or addresses of lock instances.
"map" was removed because info should do the work for it.

This will be useful not only for debugging but also for ordinary
analysis.

v2: adding example of usage
% sudo ./perf lock info -t
 | Thread ID: comm
 | 	 0: swapper
 |         1: init
 |        18: migration/5
 |        29: events/2
 |        32: events/5
 |        33: events/6
...

% sudo ./perf lock info -m
| Address of instance: name of class
|  0xffff8800b95adae0: &(&sighand->siglock)->rlock
|  0xffff8800bbb41ae0: &(&sighand->siglock)->rlock
|  0xffff8800bf165ae0: &(&sighand->siglock)->rlock
|  0xffff8800b9576a98: &p->cred_guard_mutex
|  0xffff8800bb890a08: &(&p->alloc_lock)->rlock
|  0xffff8800b9522a08: &(&p->alloc_lock)->rlock
|  0xffff8800bb8aaa08: &(&p->alloc_lock)->rlock
|  0xffff8800bba72a08: &(&p->alloc_lock)->rlock
|  0xffff8800bf18ea08: &(&p->alloc_lock)->rlock
|  0xffff8800b8a0d8a0: &(&ip->i_lock)->mr_lock
|  0xffff88009bf818a0: &(&ip->i_lock)->mr_lock
|  0xffff88004c66b8a0: &(&ip->i_lock)->mr_lock
|  0xffff8800bb6478a0: &(shost->host_lock)->rlock

v3: fixed some problems Frederic pointed out
 * better rbtree tracking in dump_threads()
 * removed printf() and used pr_info() and pr_debug()

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
LKML-Reference: <1272863520-16179-1-git-send-email-mitake@dcl.info.waseda.ac.jp>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 tools/perf/builtin-lock.c |   96 ++++++++++++++++++++++++++++++++++-----------
 1 files changed, 73 insertions(+), 23 deletions(-)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index 6605000..c4eb854 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -720,15 +720,15 @@ static void print_result(void)
 	char cut_name[20];
 	int bad, total;
 
-	printf("%20s ", "Name");
-	printf("%10s ", "acquired");
-	printf("%10s ", "contended");
+	pr_info("%20s ", "Name");
+	pr_info("%10s ", "acquired");
+	pr_info("%10s ", "contended");
 
-	printf("%15s ", "total wait (ns)");
-	printf("%15s ", "max wait (ns)");
-	printf("%15s ", "min wait (ns)");
+	pr_info("%15s ", "total wait (ns)");
+	pr_info("%15s ", "max wait (ns)");
+	pr_info("%15s ", "min wait (ns)");
 
-	printf("\n\n");
+	pr_info("\n\n");
 
 	bad = total = 0;
 	while ((st = pop_from_result())) {
@@ -741,7 +741,7 @@ static void print_result(void)
 
 		if (strlen(st->name) < 16) {
 			/* output raw name */
-			printf("%20s ", st->name);
+			pr_info("%20s ", st->name);
 		} else {
 			strncpy(cut_name, st->name, 16);
 			cut_name[16] = '.';
@@ -749,17 +749,17 @@ static void print_result(void)
 			cut_name[18] = '.';
 			cut_name[19] = '\0';
 			/* cut off name for saving output style */
-			printf("%20s ", cut_name);
+			pr_info("%20s ", cut_name);
 		}
 
-		printf("%10u ", st->nr_acquired);
-		printf("%10u ", st->nr_contended);
+		pr_info("%10u ", st->nr_acquired);
+		pr_info("%10u ", st->nr_contended);
 
-		printf("%15llu ", st->wait_time_total);
-		printf("%15llu ", st->wait_time_max);
-		printf("%15llu ", st->wait_time_min == ULLONG_MAX ?
+		pr_info("%15llu ", st->wait_time_total);
+		pr_info("%15llu ", st->wait_time_max);
+		pr_info("%15llu ", st->wait_time_min == ULLONG_MAX ?
 		       0 : st->wait_time_min);
-		printf("\n");
+		pr_info("\n");
 	}
 
 	{
@@ -768,28 +768,59 @@ static void print_result(void)
 		const char *name[4] =
 			{ "acquire", "acquired", "contended", "release" };
 
-		printf("\n=== output for debug===\n\n");
-		printf("bad:%d, total:%d\n", bad, total);
-		printf("bad rate:%f\n", (double)(bad / total));
+		pr_debug("\n=== output for debug===\n\n");
+		pr_debug("bad:%d, total:%d\n", bad, total);
+		pr_debug("bad rate:%f\n", (double)(bad / total));
 
-		printf("histogram of events caused bad sequence\n");
+		pr_debug("histogram of events caused bad sequence\n");
 		for (i = 0; i < 4; i++)
-			printf(" %10s: %d\n", name[i], bad_hist[i]);
+			pr_debug(" %10s: %d\n", name[i], bad_hist[i]);
 	}
 }
 
+static int			info_threads;
+static int			info_map;
+
+static void dump_threads(void)
+{
+	struct thread_stat *st;
+	struct rb_node *node;
+	struct thread *t;
+
+	pr_info("%10s: comm\n", "Thread ID");
+
+	node = rb_first(&thread_stats);
+	while (node) {
+		st = container_of(node, struct thread_stat, rb);
+		t = perf_session__findnew(session, st->tid);
+		pr_info("%10d: %s\n", st->tid, t->comm);
+		node = rb_next(node);
+	};
+}
+
 static void dump_map(void)
 {
 	unsigned int i;
 	struct lock_stat *st;
 
+	pr_info("Address of instance: name of class\n");
 	for (i = 0; i < LOCKHASH_SIZE; i++) {
 		list_for_each_entry(st, &lockhash_table[i], hash_entry) {
-			printf("%p: %s\n", st->addr, st->name);
+			pr_info(" %p: %s\n", st->addr, st->name);
 		}
 	}
 }
 
+static void dump_info(void)
+{
+	if (info_threads)
+		dump_threads();
+	else if (info_map)
+		dump_map();
+	else
+		die("Unknown type of information\n");
+}
+
 static int process_sample_event(event_t *self, struct perf_session *s)
 {
 	struct sample_data data;
@@ -858,6 +889,19 @@ static const struct option report_options[] = {
 	OPT_END()
 };
 
+static const char * const info_usage[] = {
+	"perf lock info [<options>]",
+	NULL
+};
+
+static const struct option info_options[] = {
+	OPT_BOOLEAN('t', "threads", &info_threads,
+		    "dump thread list in perf.data"),
+	OPT_BOOLEAN('m', "map", &info_map,
+		    "map of lock instances (name:address table)"),
+	OPT_END()
+};
+
 static const char * const lock_usage[] = {
 	"perf lock [<options>] {record|trace|report}",
 	NULL
@@ -929,12 +973,18 @@ int cmd_lock(int argc, const char **argv, const char *prefix __used)
 	} else if (!strcmp(argv[0], "trace")) {
 		/* Aliased to 'perf trace' */
 		return cmd_trace(argc, argv, prefix);
-	} else if (!strcmp(argv[0], "map")) {
+	} else if (!strcmp(argv[0], "info")) {
+		if (argc) {
+			argc = parse_options(argc, argv,
+					     info_options, info_usage, 0);
+			if (argc)
+				usage_with_options(info_usage, info_options);
+		}
 		/* recycling report_lock_ops */
 		trace_handler = &report_lock_ops;
 		setup_pager();
 		read_events();
-		dump_map();
+		dump_info();
 	} else {
 		usage_with_options(lock_usage, lock_options);
 	}

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH] perf lock: Drop "-a" option from set of default arguments to cmd_record()
  2010-05-09 14:53                         ` Hitoshi Mitake
@ 2010-05-11  6:48                           ` Frederic Weisbecker
  2010-05-12 10:23                             ` Hitoshi Mitake
  0 siblings, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2010-05-11  6:48 UTC (permalink / raw)
  To: Hitoshi Mitake
  Cc: linux-kernel, h.mitake, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Jens Axboe,
	Jason Baron, Xiao Guangrong

On Sun, May 09, 2010 at 11:53:19PM +0900, Hitoshi Mitake wrote:
> On 05/09/10 01:14, Frederic Weisbecker wrote:
> > On Sat, May 08, 2010 at 05:10:29PM +0900, Hitoshi Mitake wrote:
> >> This patch drops "-a" from record_args, which is passed to cmd_record().
> >>
> >> Even if user wants to record all lock events during process runs,
> >>          perf lock record -a <program> <argument> ...
> >> is enough for this purpose.
> >>
> >> This can reduce size of perf.data.
> >>
> >> % sudo ./perf lock record whoami
> >> root
> >> [ perf record: Woken up 1 times to write data ]
> >> [ perf record: Captured and wrote 0.439 MB perf.data (~19170 samples) ]
> >> % sudo ./perf lock record -a whoami   # with -a option
> >> root
> >> [ perf record: Woken up 0 times to write data ]
> >> [ perf record: Captured and wrote 48.962 MB perf.data (~2139197 samples) ]
> >>
> >> This patch was made on perf/test of random-tracing.git,
> >> could you queue this, Frederic?
> >>
> >> Cc: Ingo Molnar<mingo@elte.hu>
> >> Cc: Peter Zijlstra<a.p.zijlstra@chello.nl>
> >> Cc: Paul Mackerras<paulus@samba.org>
> >> Cc: Arnaldo Carvalho de Melo<acme@redhat.com>
> >> Cc: Jens Axboe<jens.axboe@oracle.com>
> >> Cc: Jason Baron<jbaron@redhat.com>
> >> Cc: Xiao Guangrong<xiaoguangrong@cn.fujitsu.com>
> >> Signed-off-by: Hitoshi Mitake<mitake@dcl.info.waseda.ac.jp>
> >
> >
> > Thanks, will test it and if it's fine I'll queue.
> >
> > I did a lot of tests these last days to understand what was going on
> > with perf lock, I mean the fact we have various bad locking scenario.
> >
> > So far, the state machine looks rather good. In fact, the real problem
> > is that we don't have every events. We lose a _lot_ of them and that's
> > because the frequency of lock events is too high and perf record
> > can't keep up.
>
> Really, I didn't think about lack of events :(



You can observe this by looking at perf trace output and searching
for long series of the same type of event without the others.

For example it's common to find long series of lock_contended
events without the corresponding lock_acquire/release/acquired.

This is because lock_contended is the least frequent lock event:
when the others have their buffers full, this one still has room
for samples.



> >
> > I think I'm going to unearth the injection code to reduce the size
> > of these events.
> >
> >
>
> Yeah, injection will be really helpful thing.
>
> And I have a rough idea for reducing event frequency.
>
> Many lock event sequences are like this form:
>  * acquire -> acquired -> release
>  * acquire -> contended -> acquired -> release
> I think that making 3 or 4 events per each lock sequences
> is waste of CPU time and memory space.
>
> If threads store time of each events
> and make only 1 event at time of release,
> we will be able to reduce lots of time and space.
>
> For example, ID of each lock instance is 8 byte in x86_64.
> In this scheme 8 * 4 byte for ID will be only 8 byte.
> I think this optimization has worth to consider because of
> high frequency of lock events.
>
> How do you think?


You're right, we could optimize the lock event sequence layout.
What I'm afraid of is breaking userspace, but ripping the name from
the lock events while introducing injection will break userspace anyway :-(

Maybe we can think about providing new lock events, announce the
deprecation of the old ones, and remove them later. I'm not sure yet.

But summing everything up in only one event is not possible. Having
only one lock_release event (and a lock init event for name mapping)
is suitable for latency measurements only (timestamp + lock addr +
wait time + acquired time).
And once you dig into finer-grained analysis like latency induced
by dependencies (take lock A and then take lock B under A; the latency
of B depends on A), then you're screwed, because you only know
you've released locks at given times but you don't know when you
acquired them, hence you can't build a tree of dependencies with
sequences inside.

Ideally, I think we need to remove lock_contended and only
trigger lock_acquired when we have contended. We obviously don't need
lock_contended/lock_acquired if the lock wasn't contended.

We could even remove lock_acquire and only play with lock_acquired,
changing the rules a bit, but that would make us lose all the
"intra-lock" dependencies. I mean there are locks whose slowpath is
implemented on top of other locks: mutexes use the mutex->wait_lock
spinlock for example.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] perf lock: Drop "-a" option from set of default arguments to cmd_record()
  2010-05-11  6:48                           ` Frederic Weisbecker
@ 2010-05-12 10:23                             ` Hitoshi Mitake
  2010-05-12 15:51                               ` Frederic Weisbecker
  0 siblings, 1 reply; 34+ messages in thread
From: Hitoshi Mitake @ 2010-05-12 10:23 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, h.mitake, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Jens Axboe,
	Jason Baron, Xiao Guangrong

On 05/11/10 15:48, Frederic Weisbecker wrote:

 >>>
 >>> I think I'm going to unearth the injection code to reduce the size
 >>> of these events.
 >>>
 >>>
 >>
 >> Yeah, injection will be really helpful thing.
 >>
 >> And I have a rough idea for reducing event frequency.
 >>
 >> Many lock event sequences are like this form:
 >>   * acquire ->  acquired ->  release
 >>   * acquire ->  contended ->  acquired ->  release
 >> I think that making 3 or 4 events per each lock sequences
 >> is waste of CPU time and memory space.
 >>
 >> If threads store time of each events
 >> and make only 1 event at time of release,
 >> we will be able to reduce lots of time and space.
 >>
 >> For example, ID of each lock instance is 8 byte in x86_64.
 >> In this scheme 8 * 4 byte for ID will be only 8 byte.
 >> I think this optimization has worth to consider because of
 >> high frequency of lock events.
 >>
 >> How do you think?
 >
 >
 > You're right, we could optimize the lock events sequence layout.
 > What I'm afraid of is breaking userspace, but ripping the name from
 > the lock events while introducing injection will break userspace anyway :-(

Really? To me, at least, ripping the name out with injection
doesn't seem to do anything bad to userspace.
What does the word "break" mean in this context?

 >
 > May be we can think about providing new lock events and announce the
 > deprecation of the old ones and remove them later. I'm not sure yet.
 >
 > But summing up in only one event is not possible. Having only one
 > lock_release event (and a lock init for name mapping) is suitable
 > for latency measurements only (timestamp + lock addr + wait time +
 > acquired time).
 > And once you dig into finer grained analysis like latency induced
 > by dependencies (take lock A and then take lock B under A, latency
 > of B depends of A), then you're screwed, because you only know
 > you've released locks at given times but you don't know when you
 > acquired them, hence you can't build a tree of dependencies with
 > sequences inside.

In my imagination, composing the 3 or 4 events into one means
timestamp of the event itself (which is also the release timestamp)
+ lock addr + timestamp of acquire + timestamp of acquired
(+ timestamp of contended) + misc information
such as flags.

I'd like to call this new event a "batch event" below.

If perf lock reads one batch event, the original 3 or 4 events
can be reconstructed in userspace.
So I think the dependency relations between locks can be obtained
by sorting the reconstructed events by timestamp.

For this approach, each thread has to hold memory of
size of batch event * MAX_LOCK_DEPTH.

I'm not sure about the feasibility and effect of this approach :(
and if I misunderstood something about your opinion, please correct me.
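A rough userspace sketch of that reconstruction step, with made-up record
fields (the batch layout and field names here are purely illustrative, not
any real perf ABI):

```python
# Hypothetical sketch: expand one "batch event" carrying all the
# timestamps of a single lock sequence back into separate lock events,
# then merge-sort the expanded events by timestamp.
# The dict layout below is invented for illustration only.

def unpack_batch(batch):
    """Expand a batch record into (timestamp, event_name, lock_addr) tuples."""
    events = [
        (batch["t_acquire"], "lock_acquire", batch["addr"]),
        (batch["t_acquired"], "lock_acquired", batch["addr"]),
        (batch["t_release"], "lock_release", batch["addr"]),
    ]
    if batch.get("t_contended") is not None:  # only present when contended
        events.append((batch["t_contended"], "lock_contended", batch["addr"]))
    return events

def reconstruct(batches):
    """Rebuild a global, timestamp-ordered event stream from batch events."""
    stream = [ev for b in batches for ev in unpack_batch(b)]
    stream.sort(key=lambda ev: ev[0])
    return stream

# Two nested locks: a contended mutex (0xA) whose slowpath takes an
# inner spinlock (0xB), as in the mutex->wait_lock discussion.
batches = [
    {"addr": 0xA, "t_acquire": 10, "t_contended": 11,
     "t_acquired": 14, "t_release": 20},
    {"addr": 0xB, "t_acquire": 12, "t_contended": None,
     "t_acquired": 13, "t_release": 15},
]
for t, name, addr in reconstruct(batches):
    print(t, name, hex(addr))
```

Sorting the expanded events recovers the interleaving (acquire A, contended
A, acquire B, ... release A), which is what a dependency analysis would need.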

 >
 > Ideally, I think that we need to remove lock contended and only
 > trigger lock_acquired when we contended. We obviously don't need
 > lock_contended/lock_acquired if the lock wasn't contended.

Yeah, lock events can be reduced more.

 >
 > We could even remove lock_acquire and only play with lock_acquired,
 > changing a bit the rules, but that will make us lose all the dependencies
 > "intra-lock". I mean there are locks which slowpath are implemented 
on top
 > of other locks: mutexes use mutex->wait_lock spinlock for example.
 >
 >

Do you mean that the relation like acquire(mutex) -> acquire(spinlock)
-> acquired(spinlock) -> acquired(mutex) -> release(spinlock)
will be lost?

It seems that the only locks built on top of other locks are mutex and rwsem.
I think we have a way to rewrite the lock events
of mutex and rwsem to preserve intra-lock dependencies.

But I cannot measure the actual cost of it :(
So I cannot say easily this is possible...

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] perf lock: Drop "-a" option from set of default arguments to cmd_record()
  2010-05-12 10:23                             ` Hitoshi Mitake
@ 2010-05-12 15:51                               ` Frederic Weisbecker
  2010-05-15  8:54                                 ` Hitoshi Mitake
  0 siblings, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2010-05-12 15:51 UTC (permalink / raw)
  To: Hitoshi Mitake
  Cc: linux-kernel, h.mitake, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Jens Axboe,
	Jason Baron, Xiao Guangrong

On Wed, May 12, 2010 at 07:23:23PM +0900, Hitoshi Mitake wrote:
> On 05/11/10 15:48, Frederic Weisbecker wrote:
>
> >>>
> >>> I think I'm going to unearth the injection code to reduce the size
> >>> of these events.
> >>>
> >>>
> >>
> >> Yeah, injection will be really helpful thing.
> >>
> >> And I have a rough idea for reducing event frequency.
> >>
> >> Many lock event sequences are like this form:
> >>   * acquire ->  acquired ->  release
> >>   * acquire ->  contended ->  acquired ->  release
> >> I think that making 3 or 4 events per each lock sequences
> >> is waste of CPU time and memory space.
> >>
> >> If threads store time of each events
> >> and make only 1 event at time of release,
> >> we will be able to reduce lots of time and space.
> >>
> >> For example, ID of each lock instance is 8 byte in x86_64.
> >> In this scheme 8 * 4 byte for ID will be only 8 byte.
> >> I think this optimization has worth to consider because of
> >> high frequency of lock events.
> >>
> >> How do you think?
> >
> >
> > You're right, we could optimize the lock events sequence layout.
> > What I'm afraid of is to break userspace, but ripping the name from
> > the lock events while introducing injection will break userspace anyway :-(
>
> Really? As far as I can see, ripping out the name with injection
> doesn't cause problems for userspace.
> What does the word "break" mean in this context?


Tools that relied on the name field in the lock events
won't work anymore, as the name will be present only in lock_init_class.



> >
> > May be we can think about providing new lock events and announce the
> > deprecation of the old ones and remove them later. I'm not sure yet.
> >
> > But summing up in only one event is not possible. Having only one
> > lock_release event (and a lock init for name mapping) is suitable
> > for latency measurements only (timestamp + lock addr + wait time +
> > acquired time).
> > And once you dig into finer grained analysis like latency induced
> > by dependencies (take lock A and then take lock B under A, latency
> > of B depends of A), then you're screwed, because you only know
> > you've released locks at given times but you don't know when you
> > acquired them, hence you can't build a tree of dependencies with
> > sequences inside.
>
> In my imagination, composing the 3 or 4 events into one means
> timestamp of the event itself (which is also the release timestamp)
> + lock addr + timestamp of acquire + timestamp of acquired
> (+ timestamp of contended) + misc information
> such as flags.



Ah I see.



> I'd like to call this new event a "batch event" below.
>
> If perf lock reads one batch event, the original 3 or 4 events
> can be reconstructed in userspace.
> So I think the dependency relations between locks can be obtained
> by sorting the reconstructed events by timestamp.
>
> For this approach, each thread has to hold memory of
> size of batch event * MAX_LOCK_DEPTH.
>
> I'm not sure about the feasibility and effect of this approach :(
> and if I misunderstood something about your opinion, please correct me.



OK, it would indeed be more efficient for what we want with perf lock.
But I see drawbacks: other people might be interested
in only a few of these events. Someone may want to count only
lock_contended events to produce simple contention stats, for example,
and this user would in turn get larger traces than expected with
this batch event.
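That contention-stats use case only needs the individual lock_contended
records. A minimal sketch of such a pass (the flat (timestamp, name, addr)
tuple format is invented for this example, not perf's real record layout):

```python
from collections import Counter

# Illustrative sketch: count lock_contended events per lock address
# from a flat stream of individual lock events.  With fine-grained
# events, a user can record and process just this one event type;
# a combined "batch event" would force them to carry the rest too.
def contention_stats(events):
    stats = Counter()
    for _ts, name, addr in events:
        if name == "lock_contended":
            stats[addr] += 1
    return stats

events = [
    (1, "lock_acquire",   0xA),
    (2, "lock_contended", 0xA),
    (3, "lock_acquired",  0xA),
    (4, "lock_release",   0xA),
    (5, "lock_acquire",   0xB),
    (6, "lock_contended", 0xB),
    (7, "lock_contended", 0xA),
]
print(contention_stats(events))
```

Note the pass never needs acquire/release records at all, which is the point
of keeping the events separate.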

I think it's usually better to divide the problems into smaller
problems. And here it's about dividing a high level meaning
(a lock sequence) into smaller and lower level meanings (a lock
event). Following this rule makes tracing much more powerful IMO.

A lock_acquire event also has an isolated meaning outside the
whole lock sequence; as in my lock_contended example above,
it can be useful on its own.



> > We could even remove lock_acquire and only play with lock_acquired,
> > changing a bit the rules, but that will make us lose all the dependencies
> > "intra-lock". I mean there are locks which slowpath are implemented on 
> top
> > of other locks: mutexes use mutex->wait_lock spinlock for example.
> >
> >
>
> Do you mean that the relation like acquire(mutex) -> acquire(spinlock)
> -> acquired(spinlock) -> acquired(mutex) -> release(spinlock)
> will be lost?


Yep.


> It seems that the only locks built on top of other locks are mutex and rwsem.
> I think we have a way to rewrite the lock events
> of mutex and rwsem to preserve intra-lock dependencies.
>
> But I cannot measure the actual cost of it :(
> So I cannot say easily this is possible...


But still I think it's useful to keep lock_contended and
lock_acquired. They have an important meaning alone.



* Re: [PATCH] perf lock: Drop "-a" option from set of default arguments to cmd_record()
  2010-05-12 15:51                               ` Frederic Weisbecker
@ 2010-05-15  8:54                                 ` Hitoshi Mitake
  0 siblings, 0 replies; 34+ messages in thread
From: Hitoshi Mitake @ 2010-05-15  8:54 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, h.mitake, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Jens Axboe,
	Jason Baron, Xiao Guangrong

On 05/13/10 00:51, Frederic Weisbecker wrote:
 > On Wed, May 12, 2010 at 07:23:23PM +0900, Hitoshi Mitake wrote:
 >> On 05/11/10 15:48, Frederic Weisbecker wrote:
 >>
 >>>>>
 >>>>> I think I'm going to unearth the injection code to reduce the size
 >>>>> of these events.
 >>>>>
 >>>>>
 >>>>
 >>>> Yeah, injection will be really helpful thing.
 >>>>
 >>>> And I have a rough idea for reducing event frequency.
 >>>>
 >>>> Many lock event sequences are like this form:
 >>>>    * acquire ->   acquired ->   release
 >>>>    * acquire ->   contended ->   acquired ->   release
 >>>> I think that making 3 or 4 events per each lock sequences
 >>>> is waste of CPU time and memory space.
 >>>>
 >>>> If threads store time of each events
 >>>> and make only 1 event at time of release,
 >>>> we will be able to reduce lots of time and space.
 >>>>
 >>>> For example, ID of each lock instance is 8 byte in x86_64.
 >>>> In this scheme 8 * 4 byte for ID will be only 8 byte.
 >>>> I think this optimization has worth to consider because of
 >>>> high frequency of lock events.
 >>>>
 >>>> How do you think?
 >>>
 >>>
 >>> You're right, we could optimize the lock events sequence layout.
 >>> What I'm afraid of is to break userspace, but ripping the name from
 >>> the lock events while introducing injection will break userspace anyway :-(
 >>
 >> Really? As far as I can see, ripping out the name with injection
 >> doesn't cause problems for userspace.
 >> What does the word "break" mean in this context?
 >
 >
 > Tools that relied on the name field in the lock events
 > won't work anymore, as the name will be present only in lock_init_class.

Ah, but rewriting perf lock will be required anyway
because it is still a very alpha program :)

 >
 >
 >
 >>>
 >>> May be we can think about providing new lock events and announce the
 >>> deprecation of the old ones and remove them later. I'm not sure yet.
 >>>
 >>> But summing up in only one event is not possible. Having only one
 >>> lock_release event (and a lock init for name mapping) is suitable
 >>> for latency measurements only (timestamp + lock addr + wait time +
 >>> acquired time).
 >>> And once you dig into finer grained analysis like latency induced
 >>> by dependencies (take lock A and then take lock B under A, latency
 >>> of B depends of A), then you're screwed, because you only know
 >>> you've released locks at given times but you don't know when you
 >>> acquired them, hence you can't build a tree of dependencies with
 >>> sequences inside.
 >>
 >> In my imagination, composing the 3 or 4 events into one means
 >> timestamp of the event itself (which is also the release timestamp)
 >> + lock addr + timestamp of acquire + timestamp of acquired
 >> (+ timestamp of contended) + misc information
 >> such as flags.
 >
 >
 >
 > Ah I see.
 >
 >
 >
 >> I'd like to call this new event a "batch event" below.
 >>
 >> If perf lock reads one batch event, the original 3 or 4 events
 >> can be reconstructed in userspace.
 >> So I think the dependency relations between locks can be obtained
 >> by sorting the reconstructed events by timestamp.
 >>
 >> For this approach, each thread has to hold memory of
 >> size of batch event * MAX_LOCK_DEPTH.
 >>
 >> I'm not sure about the feasibility and effect of this approach :(
 >> and if I misunderstood something about your opinion, please correct me.
 >
 >
 >
 > OK, it would indeed be more efficient for what we want with perf lock.
 > But I see drawbacks: other people might be interested
 > in only a few of these events. Someone may want to count only
 > lock_contended events to produce simple contention stats, for example,
 > and this user would in turn get larger traces than expected with
 > this batch event.
 >
 > I think it's usually better to divide the problems into smaller
 > problems. And here it's about dividing a high level meaning
 > (a lock sequence) into smaller and lower level meanings (a lock
 > event). Following this rule makes tracing much more powerful IMO.
 >
 > A lock_acquire event also has an isolated meaning outside the
 > whole lock sequence; as in my lock_contended example above,
 > it can be useful on its own.

Ah, I think I understand your opinion now.
The current perf lock records all types of lock events,
but that is not efficient.

For example, users who want to shorten critical sections
might require only lock_acquire and lock_release.
As another example, users who want to improve the scalability of a
multi-thread/process program might require recording only lock_contended.

So, perf lock should record only the types of events required
for the purpose of the lock analysis.
Is this your opinion?

If so, I completely agree with this.
Limiting the types of events to record is not only
smart but will also reduce overhead :)
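As a sketch, per-tracepoint selection of this kind is already expressible
with perf record's -e option (assuming a kernel with the lock trace events
available, e.g. CONFIG_LOCKDEP/CONFIG_LOCK_STAT; exact event names may vary
by kernel version):

```sh
# Record only contention events, system-wide, for 10 seconds:
perf record -e lock:lock_contended -a -- sleep 10

# Or only acquire/release, e.g. to study critical-section length:
perf record -e lock:lock_acquire -e lock:lock_release -a -- sleep 10
```

These are command-line fragments for illustration; perf lock itself would
need to grow an option to pass such a restricted event set to cmd_record().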

 >
 >
 >
 >>> We could even remove lock_acquire and only play with lock_acquired,
 >>> changing a bit the rules, but that will make us lose all the dependencies
 >>> "intra-lock". I mean there are locks which slowpath are implemented on top
 >>> of other locks: mutexes use mutex->wait_lock spinlock for example.
 >>>
 >>>
 >>
 >> Do you mean that the relation like acquire(mutex) ->  acquire(spinlock)
 >> ->  acquired(spinlock) ->  acquired(mutex) ->  release(spinlock)
 >> will be lost?
 >
 >
 > Yep.
 >
 >
 >> It seems that the only locks built on top of other locks are mutex and rwsem.
 >> I think we have a way to rewrite the lock events
 >> of mutex and rwsem to preserve intra-lock dependencies.
 >>
 >> But I cannot measure the actual cost of it :(
 >> So I cannot say easily this is possible...
 >
 >
 > But still I think it's useful to keep lock_contended and
 > lock_acquired. They have an important meaning alone.
 >
 >

Yeah, I agree with this too.


end of thread, other threads:[~2010-05-15  8:54 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-24  2:05 [GIT PULL] perf tools updates Frederic Weisbecker
2010-04-24  2:05 ` [PATCH 1/9] perf lock: Fix state machine to recognize lock sequence Frederic Weisbecker
2010-04-24 10:43   ` [PATCH] perf lock: add "info" subcommand for dumping misc information Hitoshi Mitake
2010-04-24 10:46     ` Hitoshi Mitake
2010-04-24 10:46   ` [PATCH v2] " Hitoshi Mitake
2010-04-24 13:41     ` Frederic Weisbecker
2010-04-30 18:49     ` Frederic Weisbecker
2010-05-03  5:11       ` Hitoshi Mitake
2010-05-03  5:12         ` [PATCH v3] " Hitoshi Mitake
2010-05-05 21:10           ` Frederic Weisbecker
2010-05-06  9:31             ` Hitoshi Mitake
2010-05-06  9:32               ` [PATCH] perf lock: track only specified threads Hitoshi Mitake
2010-05-07  0:49                 ` Frederic Weisbecker
2010-05-08  8:02                   ` Hitoshi Mitake
2010-05-08  8:10                     ` [PATCH] perf lock: Drop "-a" option from set of default arguments to cmd_record() Hitoshi Mitake
2010-05-08 16:14                       ` Frederic Weisbecker
2010-05-09 14:53                         ` Hitoshi Mitake
2010-05-11  6:48                           ` Frederic Weisbecker
2010-05-12 10:23                             ` Hitoshi Mitake
2010-05-12 15:51                               ` Frederic Weisbecker
2010-05-15  8:54                                 ` Hitoshi Mitake
2010-05-10  7:19           ` [tip:perf/core] perf lock: Add "info" subcommand for dumping misc information tip-bot for Hitoshi Mitake
2010-04-24  2:05 ` [PATCH 2/9] perf: Fix initialization bug in parse_single_tracepoint_event() Frederic Weisbecker
2010-04-24  2:05 ` [PATCH 3/9] perf: Generalize perf lock's sample event reordering to the session layer Frederic Weisbecker
2010-04-24  2:05 ` [PATCH 4/9] perf: Use generic sample reordering in perf sched Frederic Weisbecker
2010-04-24  2:05 ` [PATCH 5/9] perf: Use generic sample reordering in perf kmem Frederic Weisbecker
2010-04-24  2:05 ` [PATCH 6/9] perf: Use generic sample reordering in perf trace Frederic Weisbecker
2010-04-24  2:05 ` [PATCH 7/9] perf: Use generic sample reordering in perf timechart Frederic Weisbecker
2010-04-24  2:05 ` [PATCH 8/9] perf: Add a perf trace option to check samples ordering reliability Frederic Weisbecker
2010-04-24 16:13   ` Masami Hiramatsu
2010-04-25 18:08     ` Frederic Weisbecker
2010-04-24  2:05 ` [PATCH 9/9] perf: Some perf-kvm documentation edits Frederic Weisbecker
2010-04-24  2:27 ` [GIT PULL] perf tools updates Frederic Weisbecker
2010-04-27  9:15 ` Ingo Molnar
