linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* perf: Timestamp COMM and MMAP events for correct attribution
@ 2010-11-23  5:35 Ian Munsie
  2010-11-23  5:35 ` [PATCH 1/6] perf: Display unknown rather than IP for unidentified DSO Ian Munsie
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Ian Munsie @ 2010-11-23  5:35 UTC (permalink / raw)
  To: linux-kernel

This patch series fixes a bug introduced with the per-task-per-cpu event
support in perf where PERF_RECORD_COMM and PERF_RECORD_MMAP events would arrive
and be recorded via one CPUs buffer much later than samples corresponding to
them arriving on another CPUs buffer, and would cause perf to mis-attribute
those events.

This has been observed on multiple systems, but is more common the more CPUs
present - on 64 CPU systems it is extremely common (one system tested occurred
~50% of the time, while another was >99%), while on my quad Intel i7 it is much
rarer (but I have observed it several times). Examining the data file with perf
report -D shows the COMM and MMAP events a long way through the file (80% or
more), well after many previous events.

This series introduces an ABI change to perf to add the timestamps to the MMAP
and COMM events so that they can be correlated with the samples corresponding
to them. The timestamps are requested from the kernel by setting the all_timed
bit in the attributes when opening the events and is placed just after the
event header of the events in question.

While not necessary to fix this bug (or any other AFAIK), I have also included
changes to the PERF_RECORD_READ and PERF_RECORD_LOST events to add the
timestamp in the same manner. This could be useful to support and if we did
eventually want to add it it would be easier to change all the events in
question at once. Those changes are in separate patches from the COMM and MMAP
changes so they can easily be dropped when merging this upstream if they are
unwanted.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/6] perf: Display unknown rather than IP for unidentified DSO
  2010-11-23  5:35 perf: Timestamp COMM and MMAP events for correct attribution Ian Munsie
@ 2010-11-23  5:35 ` Ian Munsie
  2010-11-23  5:35 ` [PATCH 2/6] perf: Move all output for perf report -D into trace_event Ian Munsie
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Ian Munsie @ 2010-11-23  5:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ian Munsie, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, Arun Sharma

From: Ian Munsie <imunsie@au1.ibm.com>

In the event that a DSO has not been identified, just print out
<unknown> instead of the instruction pointer as we previously were
doing, which is pretty meaningless for a shared object (at least to the
users perspective).

The IP we print out is fairly meaningless in general anyway - it's just
one (the first) of the many addresses that were lumped together as
unidentified, and could span many shared objects and symbols. In reality
if we see this <unknown> output the debugging output (report -D) is
going to be more useful anyway as we can see all the different address
that it represents.

If we are printing the symbols we are still going to see this IP in that
column anyway since they shouldn't resolve either.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
---
 tools/perf/util/sort.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index b62a553..68dad55 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -170,7 +170,7 @@ static int hist_entry__dso_snprintf(struct hist_entry *self, char *bf,
 		return repsep_snprintf(bf, size, "%-*s", width, dso_name);
 	}
 
-	return repsep_snprintf(bf, size, "%*Lx", width, self->ip);
+	return repsep_snprintf(bf, size, "%-*s", width, "<unknown>");
 }
 
 /* --sort symbol */
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/6] perf: Move all output for perf report -D into trace_event
  2010-11-23  5:35 perf: Timestamp COMM and MMAP events for correct attribution Ian Munsie
  2010-11-23  5:35 ` [PATCH 1/6] perf: Display unknown rather than IP for unidentified DSO Ian Munsie
@ 2010-11-23  5:35 ` Ian Munsie
  2010-11-23  5:35 ` [PATCH 3/6] perf: Add timestamp to COMM and MMAP events Ian Munsie
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Ian Munsie @ 2010-11-23  5:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ian Munsie, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, David S. Miller, Andy Isaacson,
	Tom Zanussi, Thomas Gleixner

From: Ian Munsie <imunsie@au1.ibm.com>

This patch moves all the places that would produce output while doing a
perf report -D into the trace_event routine. This will be necessary to
allow perf report to process events ordered by timestamp and still
produce sane debugging output.

The offset of ordered events are stored in the sample queue to allow
them to be printed in the debugging output as before. This also now
print out the timestamp (if present) in the debugging output.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
---
 tools/perf/util/debug.c   |   16 +++++++++++++++-
 tools/perf/util/debug.h   |    2 +-
 tools/perf/util/session.c |   40 +++++++++++++++++++---------------------
 3 files changed, 35 insertions(+), 23 deletions(-)

diff --git a/tools/perf/util/debug.c b/tools/perf/util/debug.c
index c8d81b0..353d5ed 100644
--- a/tools/perf/util/debug.c
+++ b/tools/perf/util/debug.c
@@ -61,7 +61,7 @@ static int dump_printf_color(const char *fmt, const char *color, ...)
 }
 
 
-void trace_event(event_t *event)
+void trace_event(event_t *event, u64 offset, u64 timestamp)
 {
 	unsigned char *raw_event = (void *)event;
 	const char *color = PERF_COLOR_BLUE;
@@ -70,6 +70,14 @@ void trace_event(event_t *event)
 	if (!dump_trace)
 		return;
 
+	dump_printf("\n%#llx ", offset);
+
+	if (timestamp)
+		dump_printf("(%lld) ", timestamp);
+
+	dump_printf("[%#x]: event: %d\n",
+		    event->header.size, event->header.type);
+
 	dump_printf(".");
 	dump_printf_color("\n. ... raw event: size %d bytes\n", color,
 			  event->header.size);
@@ -95,4 +103,10 @@ void trace_event(event_t *event)
 		}
 	}
 	dump_printf(".\n");
+
+	if (event->header.type < PERF_RECORD_HEADER_MAX) {
+		dump_printf("%#llx [%#x]: PERF_RECORD_%s",
+			    offset, event->header.size,
+			    event__name[event->header.type]);
+	}
 }
diff --git a/tools/perf/util/debug.h b/tools/perf/util/debug.h
index 7b51408..2826bd7 100644
--- a/tools/perf/util/debug.h
+++ b/tools/perf/util/debug.h
@@ -9,7 +9,7 @@ extern int verbose;
 extern bool quiet, dump_trace;
 
 int dump_printf(const char *fmt, ...) __attribute__((format(printf, 1, 2)));
-void trace_event(event_t *event);
+void trace_event(event_t *event, u64 offset, u64 timestamp);
 
 struct ui_progress;
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 0073b5b..40d2193 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -400,6 +400,7 @@ static event__swap_op event__swap_ops[] = {
 
 struct sample_queue {
 	u64			timestamp;
+	u64			offset;
 	struct sample_event	*event;
 	struct list_head	list;
 };
@@ -421,6 +422,8 @@ static void flush_sample_queue(struct perf_session *s,
 		if (iter == s->ordered_samples.last_inserted)
 			s->ordered_samples.last_inserted = NULL;
 
+		trace_event((event_t *)iter->event, iter->offset, iter->timestamp);
+
 		ops->sample((event_t *)iter->event, s);
 
 		s->ordered_samples.last_flush = iter->timestamp;
@@ -547,7 +550,7 @@ static void __queue_sample_event(struct sample_queue *new,
 }
 
 static int queue_sample_event(event_t *event, struct sample_data *data,
-			      struct perf_session *s)
+		       struct perf_session *s, u64 offset, u64 head)
 {
 	u64 timestamp = data->time;
 	struct sample_queue *new;
@@ -563,6 +566,7 @@ static int queue_sample_event(event_t *event, struct sample_data *data,
 		return -ENOMEM;
 
 	new->timestamp = timestamp;
+	new->offset    = offset + head;
 
 	new->event = malloc(event->header.size);
 	if (!new->event) {
@@ -582,17 +586,20 @@ static int queue_sample_event(event_t *event, struct sample_data *data,
 }
 
 static int perf_session__process_sample(event_t *event, struct perf_session *s,
-					struct perf_event_ops *ops)
+					struct perf_event_ops *ops,
+					u64 offset, u64 head)
 {
 	struct sample_data data;
 
-	if (!ops->ordered_samples)
-		return ops->sample(event, s);
-
 	bzero(&data, sizeof(struct sample_data));
 	event__parse_sample(event, s->sample_type, &data);
 
-	queue_sample_event(event, &data, s);
+	if (!ops->ordered_samples) {
+		trace_event(event, offset + head, data.time);
+		return ops->sample(event, s);
+	}
+
+	queue_sample_event(event, &data, s, offset, head);
 
 	return 0;
 }
@@ -602,21 +609,18 @@ static int perf_session__process_event(struct perf_session *self,
 				       struct perf_event_ops *ops,
 				       u64 offset, u64 head)
 {
-	trace_event(event);
-
-	if (event->header.type < PERF_RECORD_HEADER_MAX) {
-		dump_printf("%#Lx [%#x]: PERF_RECORD_%s",
-			    offset + head, event->header.size,
-			    event__name[event->header.type]);
+	if (event->header.type < PERF_RECORD_HEADER_MAX)
 		hists__inc_nr_events(&self->hists, event->header.type);
-	}
 
 	if (self->header.needs_swap && event__swap_ops[event->header.type])
 		event__swap_ops[event->header.type](event);
 
+	if (event->header.type == PERF_RECORD_SAMPLE)
+		return perf_session__process_sample(event, self, ops,
+						    offset, head);
+	trace_event(event, offset + head, 0);
+
 	switch (event->header.type) {
-	case PERF_RECORD_SAMPLE:
-		return perf_session__process_sample(event, self, ops);
 	case PERF_RECORD_MMAP:
 		return ops->mmap(event, self);
 	case PERF_RECORD_COMM:
@@ -754,9 +758,6 @@ more:
 
 	head += size;
 
-	dump_printf("\n%#Lx [%#x]: event: %d\n",
-		    head, event.header.size, event.header.type);
-
 	if (skip > 0)
 		head += skip;
 
@@ -834,9 +835,6 @@ more:
 
 	size = event->header.size;
 
-	dump_printf("\n%#Lx [%#x]: event: %d\n",
-		    offset + head, event->header.size, event->header.type);
-
 	if (size == 0 ||
 	    perf_session__process_event(self, event, ops, offset, head) < 0) {
 		dump_printf("%#Lx [%#x]: skipping unknown header type: %d\n",
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/6] perf: Add timestamp to COMM and MMAP events
  2010-11-23  5:35 perf: Timestamp COMM and MMAP events for correct attribution Ian Munsie
  2010-11-23  5:35 ` [PATCH 1/6] perf: Display unknown rather than IP for unidentified DSO Ian Munsie
  2010-11-23  5:35 ` [PATCH 2/6] perf: Move all output for perf report -D into trace_event Ian Munsie
@ 2010-11-23  5:35 ` Ian Munsie
  2010-11-23 11:08   ` Peter Zijlstra
  2010-11-23  5:35 ` [PATCH 4/6] perf: Add timestamp to READ and LOST events Ian Munsie
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Ian Munsie @ 2010-11-23  5:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ian Munsie, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo

From: Ian Munsie <imunsie@au1.ibm.com>

This patch adds extra ABI to the perf interface to allow the
PERF_RECORD_COMM and PERF_RECORD_MMAP events to optionally include a
timestamp, which they were previously missing.

This has become necessary since the move to per-task-per-cpu events in
perf as those events would not necessarily be recorded prior to any
samples associated with them and would cause perf to miss-attribute
those events and produce bogus reports. This behaviour can be observed
almost every run on 64 CPU PowerPC systems, but has also been observed
on a simple quad core Intel i7.

The timestamp is requested by setting the all_timed flag in the
event attributes when opening the event and will be placed just after
the header of the events in question.

This patch also ensures that the PERF_RECORD_COMM event is dispatched
before enabling the counters for enable_on_exec.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
---
 include/linux/perf_event.h |    5 ++-
 kernel/perf_event.c        |   95 +++++++++++++++++++++++++------------------
 2 files changed, 59 insertions(+), 41 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 40150f3..ef99af4 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -215,8 +215,9 @@ struct perf_event_attr {
 				 */
 				precise_ip     :  2, /* skid constraint       */
 				mmap_data      :  1, /* non-exec mmap data    */
+				all_timed      :  1, /* timestamp every event */
 
-				__reserved_1   : 46;
+				__reserved_1   : 47;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -332,6 +333,7 @@ enum perf_event_type {
 	 *
 	 * struct {
 	 *	struct perf_event_header	header;
+	 *	{ u64			time;     } && all_timed
 	 *
 	 *	u32				pid, tid;
 	 *	u64				addr;
@@ -354,6 +356,7 @@ enum perf_event_type {
 	/*
 	 * struct {
 	 *	struct perf_event_header	header;
+	 *	{ u64			time;     } && all_timed
 	 *
 	 *	u32				pid, tid;
 	 *	char				comm[];
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 8d099f3..c8ebab2 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -3884,9 +3884,9 @@ struct perf_comm_event {
 	char			*comm;
 	int			comm_size;
 
+	struct perf_event_header	header;
+	u64				time; /* && all_timed */
 	struct {
-		struct perf_event_header	header;
-
 		u32				pid;
 		u32				tid;
 	} event_id;
@@ -3896,16 +3896,23 @@ static void perf_event_comm_output(struct perf_event *event,
 				     struct perf_comm_event *comm_event)
 {
 	struct perf_output_handle handle;
-	int size = comm_event->event_id.header.size;
-	int ret = perf_output_begin(&handle, event, size, 0, 0);
 
-	if (ret)
+	if (event->attr.all_timed) {
+		comm_event->header.size += sizeof(u64);
+		comm_event->time = perf_clock();
+	}
+
+	if (perf_output_begin(&handle, event, comm_event->header.size, 0, 0))
 		return;
 
 	comm_event->event_id.pid = perf_event_pid(event, comm_event->task);
 	comm_event->event_id.tid = perf_event_tid(event, comm_event->task);
 
+	perf_output_put(&handle, comm_event->header);
+	if (event->attr.all_timed)
+		perf_output_put(&handle, comm_event->time);
 	perf_output_put(&handle, comm_event->event_id);
+
 	perf_output_copy(&handle, comm_event->comm,
 				   comm_event->comm_size);
 	perf_output_end(&handle);
@@ -3913,9 +3920,6 @@ static void perf_event_comm_output(struct perf_event *event,
 
 static int perf_event_comm_match(struct perf_event *event)
 {
-	if (event->state < PERF_EVENT_STATE_INACTIVE)
-		return 0;
-
 	if (event->cpu != -1 && event->cpu != smp_processor_id())
 		return 0;
 
@@ -3952,7 +3956,8 @@ static void perf_event_comm_event(struct perf_comm_event *comm_event)
 	comm_event->comm = comm;
 	comm_event->comm_size = size;
 
-	comm_event->event_id.header.size = sizeof(comm_event->event_id) + size;
+	comm_event->header.size = sizeof(comm_event->header)
+				+ sizeof(comm_event->event_id) + size;
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(pmu, &pmus, entry) {
@@ -3978,33 +3983,34 @@ void perf_event_comm(struct task_struct *task)
 	struct perf_event_context *ctx;
 	int ctxn;
 
-	for_each_task_context_nr(ctxn) {
-		ctx = task->perf_event_ctxp[ctxn];
-		if (!ctx)
-			continue;
-
-		perf_event_enable_on_exec(ctx);
-	}
 
-	if (!atomic_read(&nr_comm_events))
-		return;
-
-	comm_event = (struct perf_comm_event){
-		.task	= task,
-		/* .comm      */
-		/* .comm_size */
-		.event_id  = {
+	if (atomic_read(&nr_comm_events)) {
+		comm_event = (struct perf_comm_event){
+			.task	= task,
+			/* .comm      */
+			/* .comm_size */
 			.header = {
 				.type = PERF_RECORD_COMM,
 				.misc = 0,
 				/* .size */
 			},
-			/* .pid */
-			/* .tid */
-		},
-	};
+			/* .time && all_timed */
+			.event_id  = {
+				/* .pid */
+				/* .tid */
+			},
+		};
+
+		perf_event_comm_event(&comm_event);
+	}
+
+	for_each_task_context_nr(ctxn) {
+		ctx = task->perf_event_ctxp[ctxn];
+		if (!ctx)
+			continue;
 
-	perf_event_comm_event(&comm_event);
+		perf_event_enable_on_exec(ctx);
+	}
 }
 
 /*
@@ -4017,9 +4023,9 @@ struct perf_mmap_event {
 	const char		*file_name;
 	int			file_size;
 
+	struct perf_event_header	header;
+	u64				time; /* && all_timed */
 	struct {
-		struct perf_event_header	header;
-
 		u32				pid;
 		u32				tid;
 		u64				start;
@@ -4032,16 +4038,23 @@ static void perf_event_mmap_output(struct perf_event *event,
 				     struct perf_mmap_event *mmap_event)
 {
 	struct perf_output_handle handle;
-	int size = mmap_event->event_id.header.size;
-	int ret = perf_output_begin(&handle, event, size, 0, 0);
 
-	if (ret)
+	if (event->attr.all_timed) {
+		mmap_event->header.size += sizeof(u64);
+		mmap_event->time = perf_clock();
+	}
+
+	if (perf_output_begin(&handle, event, mmap_event->header.size, 0, 0))
 		return;
 
 	mmap_event->event_id.pid = perf_event_pid(event, current);
 	mmap_event->event_id.tid = perf_event_tid(event, current);
 
+	perf_output_put(&handle, mmap_event->header);
+	if (event->attr.all_timed)
+		perf_output_put(&handle, mmap_event->time);
 	perf_output_put(&handle, mmap_event->event_id);
+
 	perf_output_copy(&handle, mmap_event->file_name,
 				   mmap_event->file_size);
 	perf_output_end(&handle);
@@ -4137,7 +4150,8 @@ got_name:
 	mmap_event->file_name = name;
 	mmap_event->file_size = size;
 
-	mmap_event->event_id.header.size = sizeof(mmap_event->event_id) + size;
+	mmap_event->header.size = sizeof(mmap_event->header)
+				+ sizeof(mmap_event->event_id) + size;
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(pmu, &pmus, entry) {
@@ -4173,12 +4187,13 @@ void perf_event_mmap(struct vm_area_struct *vma)
 		.vma	= vma,
 		/* .file_name */
 		/* .file_size */
+		.header = {
+			.type = PERF_RECORD_MMAP,
+			.misc = PERF_RECORD_MISC_USER,
+			/* .size */
+		},
+		/* .time && all_timed */
 		.event_id  = {
-			.header = {
-				.type = PERF_RECORD_MMAP,
-				.misc = PERF_RECORD_MISC_USER,
-				/* .size */
-			},
 			/* .pid */
 			/* .tid */
 			.start  = vma->vm_start,
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/6] perf: Add timestamp to READ and LOST events
  2010-11-23  5:35 perf: Timestamp COMM and MMAP events for correct attribution Ian Munsie
                   ` (2 preceding siblings ...)
  2010-11-23  5:35 ` [PATCH 3/6] perf: Add timestamp to COMM and MMAP events Ian Munsie
@ 2010-11-23  5:35 ` Ian Munsie
  2010-11-23  5:35 ` [PATCH 5/6] perf tool: Process COMM and MMAP events in order Ian Munsie
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Ian Munsie @ 2010-11-23  5:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ian Munsie, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo

From: Ian Munsie <imunsie@au1.ibm.com>

This patch adds timestamps to the remaining events previously missing
them (PERF_RECORD_READ and PERF_RECORD_LOST) if requested by the
all_timed flag in the event attributes when opening the event.

This is not strictly necessary to fix any known bugs, but it is
foreseeable that knowing the timestamp of these events may be desirable
in the future and if we are going to add a timestamp to them, we may as
well do it now while we are changing the ABI to add timestamps to COMM
and MMAP events anyway.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
---
 include/linux/perf_event.h |    2 +
 kernel/perf_event.c        |   56 +++++++++++++++++++++++++++++++++-----------
 2 files changed, 44 insertions(+), 14 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ef99af4..8b646aa 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -347,6 +347,7 @@ enum perf_event_type {
 	/*
 	 * struct {
 	 *	struct perf_event_header	header;
+	 *	{ u64			time;     } && all_timed
 	 *	u64				id;
 	 *	u64				lost;
 	 * };
@@ -398,6 +399,7 @@ enum perf_event_type {
 	/*
 	 * struct {
 	 *	struct perf_event_header	header;
+	 *	{ u64			time;     } && all_timed
 	 *	u32				pid, tid;
 	 *
 	 *	struct read_format		values;
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index c8ebab2..310b54a 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -3310,10 +3310,14 @@ int perf_output_begin(struct perf_output_handle *handle,
 	struct perf_buffer *buffer;
 	unsigned long tail, offset, head;
 	int have_lost;
+	unsigned int lostsize;
 	struct {
 		struct perf_event_header header;
-		u64			 id;
-		u64			 lost;
+		u64			time; /* && all_timed */
+		struct {
+			u64			 id;
+			u64			 lost;
+		} event_id;
 	} lost_event;
 
 	rcu_read_lock();
@@ -3336,8 +3340,14 @@ int perf_output_begin(struct perf_output_handle *handle,
 		goto out;
 
 	have_lost = local_read(&buffer->lost);
-	if (have_lost)
-		size += sizeof(lost_event);
+	if (have_lost) {
+		lostsize = sizeof(lost_event.header) + sizeof(lost_event.event_id);
+		if (event->attr.all_timed) {
+			lostsize += sizeof(u64);
+			lost_event.time = perf_clock();
+		}
+		size += lostsize;
+	}
 
 	perf_output_get_handle(handle);
 
@@ -3368,11 +3378,14 @@ int perf_output_begin(struct perf_output_handle *handle,
 	if (have_lost) {
 		lost_event.header.type = PERF_RECORD_LOST;
 		lost_event.header.misc = 0;
-		lost_event.header.size = sizeof(lost_event);
-		lost_event.id          = event->id;
-		lost_event.lost        = local_xchg(&buffer->lost, 0);
+		lost_event.header.size = lostsize;
+		lost_event.event_id.id = event->id;
+		lost_event.event_id.lost = local_xchg(&buffer->lost, 0);
 
-		perf_output_put(handle, lost_event);
+		perf_output_put(handle, lost_event.header);
+		if (event->attr.all_timed)
+			perf_output_put(handle, lost_event.time);
+		perf_output_put(handle, lost_event.event_id);
 	}
 
 	return 0;
@@ -3710,9 +3723,12 @@ exit:
 
 struct perf_read_event {
 	struct perf_event_header	header;
+	u64				time; /* && all_timed */
 
-	u32				pid;
-	u32				tid;
+	struct {
+		u32				pid;
+		u32				tid;
+	} event_id;
 };
 
 static void
@@ -3724,18 +3740,30 @@ perf_event_read_event(struct perf_event *event,
 		.header = {
 			.type = PERF_RECORD_READ,
 			.misc = 0,
-			.size = sizeof(read_event) + perf_event_read_size(event),
+			.size = sizeof(read_event.header) + sizeof(read_event.event_id)
+				+ perf_event_read_size(event),
+		},
+		/* .time && all_timed */
+		.event_id = {
+			.pid = perf_event_pid(event, task),
+			.tid = perf_event_tid(event, task),
 		},
-		.pid = perf_event_pid(event, task),
-		.tid = perf_event_tid(event, task),
 	};
 	int ret;
 
+	if (event->attr.all_timed) {
+		read_event.header.size += sizeof(u64);
+		read_event.time = perf_clock();
+	}
+
 	ret = perf_output_begin(&handle, event, read_event.header.size, 0, 0);
 	if (ret)
 		return;
 
-	perf_output_put(&handle, read_event);
+	perf_output_put(&handle, read_event.header);
+	if (event->attr.all_timed)
+		perf_output_put(&handle, read_event.time);
+	perf_output_put(&handle, read_event.event_id);
 	perf_output_read(&handle, event);
 
 	perf_output_end(&handle);
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/6] perf tool: Process COMM and MMAP events in order
  2010-11-23  5:35 perf: Timestamp COMM and MMAP events for correct attribution Ian Munsie
                   ` (3 preceding siblings ...)
  2010-11-23  5:35 ` [PATCH 4/6] perf: Add timestamp to READ and LOST events Ian Munsie
@ 2010-11-23  5:35 ` Ian Munsie
  2010-11-23  5:35 ` [PATCH 6/6] perf tool: Process READ and LOST " Ian Munsie
  2010-11-23  6:37 ` perf: Timestamp COMM and MMAP events for correct attribution Ian Munsie
  6 siblings, 0 replies; 16+ messages in thread
From: Ian Munsie @ 2010-11-23  5:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ian Munsie, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Tom Zanussi, Thomas Gleixner

From: Ian Munsie <imunsie@au1.ibm.com>

This patch changes the perf tool to request timestamps on the COMM, MMAP
and SAMPLE events if perf is opened multiple times. This is necessary to
ensure that the COMM and MMAP events can be processed prior to samples
corresponding to them so that those samples are correctly attributed.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
---
 tools/perf/builtin-record.c |    5 ++
 tools/perf/builtin-report.c |    1 +
 tools/perf/util/event.h     |   12 ++++++
 tools/perf/util/session.c   |   92 ++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 100 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 9426383..75956c6 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -287,6 +287,11 @@ static void create_counter(int counter, int cpu)
 		attr->sample_type	|= PERF_SAMPLE_CPU;
 	}
 
+	if (system_wide || !no_inherit || cpu_list) {
+		attr->sample_type	|= PERF_SAMPLE_TIME;
+		attr->all_timed		= 1;
+	}
+
 	attr->mmap		= track;
 	attr->comm		= track;
 	attr->inherit		= !no_inherit;
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 5de405d..1760187 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -243,6 +243,7 @@ static struct perf_event_ops event_ops = {
 	.event_type = event__process_event_type,
 	.tracing_data = event__process_tracing_data,
 	.build_id = event__process_build_id,
+	.ordered_samples = true,
 };
 
 extern volatile int session_done;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 8e790da..b9c2b2d 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -16,8 +16,18 @@ struct ip_event {
 	unsigned char __more_data[];
 };
 
+/*
+ * all_timed
+ */
+struct timed_event {
+	struct perf_event_header header;
+	u64 time;
+	unsigned char __more_data[];
+};
+
 struct mmap_event {
 	struct perf_event_header header;
+	u64 time; /* && all_timed */
 	u32 pid, tid;
 	u64 start;
 	u64 len;
@@ -27,6 +37,7 @@ struct mmap_event {
 
 struct comm_event {
 	struct perf_event_header header;
+	u64 time; /* && all_timed */
 	u32 pid, tid;
 	char comm[16];
 };
@@ -119,6 +130,7 @@ struct tracing_data_event {
 typedef union event_union {
 	struct perf_event_header	header;
 	struct ip_event			ip;
+	struct timed_event		timed;
 	struct mmap_event		mmap;
 	struct comm_event		comm;
 	struct fork_event		fork;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 40d2193..09d0f3a 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -318,12 +318,14 @@ static void event__all64_swap(event_t *self)
 
 static void event__comm_swap(event_t *self)
 {
+	self->comm.time = bswap_64(self->comm.time);
 	self->comm.pid = bswap_32(self->comm.pid);
 	self->comm.tid = bswap_32(self->comm.tid);
 }
 
 static void event__mmap_swap(event_t *self)
 {
+	self->mmap.time	 = bswap_64(self->mmap.time);
 	self->mmap.pid	 = bswap_32(self->mmap.pid);
 	self->mmap.tid	 = bswap_32(self->mmap.tid);
 	self->mmap.start = bswap_64(self->mmap.start);
@@ -424,7 +426,19 @@ static void flush_sample_queue(struct perf_session *s,
 
 		trace_event((event_t *)iter->event, iter->offset, iter->timestamp);
 
-		ops->sample((event_t *)iter->event, s);
+		switch (((event_t *)iter->event)->header.type) {
+		case PERF_RECORD_SAMPLE:
+			ops->sample((event_t *)iter->event, s);
+			break;
+		case PERF_RECORD_COMM:
+			ops->comm((event_t *)iter->event, s);
+			break;
+		case PERF_RECORD_MMAP:
+			ops->mmap((event_t *)iter->event, s);
+			break;
+		default:
+			printf("Warning: Unrecognised event in flush_sample_queue\n");
+		}
 
 		s->ordered_samples.last_flush = iter->timestamp;
 		list_del(&iter->list);
@@ -549,10 +563,9 @@ static void __queue_sample_event(struct sample_queue *new,
 		__queue_sample_after(new, last_inserted, head);
 }
 
-static int queue_sample_event(event_t *event, struct sample_data *data,
+static int queue_event(event_t *event, u64 timestamp,
 		       struct perf_session *s, u64 offset, u64 head)
 {
-	u64 timestamp = data->time;
 	struct sample_queue *new;
 
 
@@ -599,12 +612,26 @@ static int perf_session__process_sample(event_t *event, struct perf_session *s,
 		return ops->sample(event, s);
 	}
 
-	queue_sample_event(event, &data, s, offset, head);
+	queue_event(event, data.time, s, offset, head);
 
 	return 0;
 }
 
-static int perf_session__process_event(struct perf_session *self,
+static int perf_session__process_timed(event_t *event, struct perf_session *s,
+					struct perf_event_ops *ops,
+					event_op def_op, u64 offset, u64 head)
+{
+	if (!ops->ordered_samples) {
+		trace_event(event, offset + head, event->timed.time);
+		return def_op(event, s);
+	}
+
+	queue_event(event, event->timed.time, s, offset, head);
+
+	return 0;
+}
+
+static int __perf_session__process_event(struct perf_session *self,
 				       event_t *event,
 				       struct perf_event_ops *ops,
 				       u64 offset, u64 head)
@@ -615,16 +642,22 @@ static int perf_session__process_event(struct perf_session *self,
 	if (self->header.needs_swap && event__swap_ops[event->header.type])
 		event__swap_ops[event->header.type](event);
 
-	if (event->header.type == PERF_RECORD_SAMPLE)
+	switch (event->header.type) {
+	case PERF_RECORD_SAMPLE:
 		return perf_session__process_sample(event, self, ops,
 						    offset, head);
+	case PERF_RECORD_MMAP:
+		return perf_session__process_timed(event, self, ops,
+						   ops->mmap, offset, head);
+	case PERF_RECORD_COMM:
+		return perf_session__process_timed(event, self, ops,
+						   ops->comm, offset, head);
+	default: break;
+	}
+
 	trace_event(event, offset + head, 0);
 
 	switch (event->header.type) {
-	case PERF_RECORD_MMAP:
-		return ops->mmap(event, self);
-	case PERF_RECORD_COMM:
-		return ops->comm(event, self);
 	case PERF_RECORD_FORK:
 		return ops->fork(event, self);
 	case PERF_RECORD_EXIT:
@@ -655,6 +688,45 @@ static int perf_session__process_event(struct perf_session *self,
 	}
 }
 
+static int perf_session__process_event(struct perf_session *self,
+				       event_t *event,
+				       struct perf_event_ops *ops,
+				       u64 offset, u64 head)
+{
+	int size, ret;
+	event_t *event_tmp;
+
+	if (self->header.attr[0]->attr.all_timed)
+		return __perf_session__process_event(self, event, ops, offset, head);
+
+	switch (event->header.type) {
+	case PERF_RECORD_MMAP: case PERF_RECORD_COMM:
+		/* These events are coming in with the old ABI (missing time
+		 * field), so make a temporary copy following the new ABI data
+		 * structure and set the time field to 0.
+		 */
+		size = event->header.size + sizeof(u64);
+
+		event_tmp = malloc(size);
+		if (!event_tmp)
+			return -1;
+
+		memcpy(event_tmp, event, sizeof(struct perf_event_header));
+		memcpy(event_tmp->timed.__more_data, &event->timed.time,
+			event->header.size - sizeof(struct perf_event_header));
+
+		event_tmp->header.size = size;
+		event_tmp->timed.time = 0;
+
+		ret = __perf_session__process_event(self, event, ops, offset, head);
+
+		free(event_tmp);
+		return ret;
+	default:
+		return __perf_session__process_event(self, event, ops, offset, head);
+	}
+}
+
 void perf_event_header__bswap(struct perf_event_header *self)
 {
 	self->type = bswap_32(self->type);
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 6/6] perf tool: Process READ and LOST events in order
  2010-11-23  5:35 perf: Timestamp COMM and MMAP events for correct attribution Ian Munsie
                   ` (4 preceding siblings ...)
  2010-11-23  5:35 ` [PATCH 5/6] perf tool: Process COMM and MMAP events in order Ian Munsie
@ 2010-11-23  5:35 ` Ian Munsie
  2010-11-23  6:37 ` perf: Timestamp COMM and MMAP events for correct attribution Ian Munsie
  6 siblings, 0 replies; 16+ messages in thread
From: Ian Munsie @ 2010-11-23  5:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ian Munsie, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Tom Zanussi, Thomas Gleixner

From: Ian Munsie <imunsie@au1.ibm.com>

This patches changes the perf tool so that READ and LOST events will be
processed in order of their timestamp, if present.

This is not strictly necessary to fix any known bugs, but it is
foreseeable that knowing the timestamp of these events may be desirable
in the future and if we are going to add a timestamp to them, we may as
well do it now while we are changing the ABI to add timestamps to COMM
and MMAP events anyway.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
---
 tools/perf/util/event.h   |    2 ++
 tools/perf/util/session.c |   18 ++++++++++++++----
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index b9c2b2d..ba0a761 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -51,6 +51,7 @@ struct fork_event {
 
 struct lost_event {
 	struct perf_event_header header;
+	u64 time; /* && all_timed */
 	u64 id;
 	u64 lost;
 };
@@ -60,6 +61,7 @@ struct lost_event {
  */
 struct read_event {
 	struct perf_event_header header;
+	u64 time; /* && all_timed */
 	u32 pid, tid;
 	u64 value;
 	u64 time_enabled;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 09d0f3a..025da8f 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -344,6 +344,7 @@ static void event__task_swap(event_t *self)
 
 static void event__read_swap(event_t *self)
 {
+	self->read.time		= bswap_64(self->mmap.time);
 	self->read.pid		= bswap_32(self->read.pid);
 	self->read.tid		= bswap_32(self->read.tid);
 	self->read.value	= bswap_64(self->read.value);
@@ -436,6 +437,12 @@ static void flush_sample_queue(struct perf_session *s,
 		case PERF_RECORD_MMAP:
 			ops->mmap((event_t *)iter->event, s);
 			break;
+		case PERF_RECORD_LOST:
+			ops->lost((event_t *)iter->event, s);
+			break;
+		case PERF_RECORD_READ:
+			ops->read((event_t *)iter->event, s);
+			break;
 		default:
 			printf("Warning: Unrecognised event in flush_sample_queue\n");
 		}
@@ -652,6 +659,12 @@ static int __perf_session__process_event(struct perf_session *self,
 	case PERF_RECORD_COMM:
 		return perf_session__process_timed(event, self, ops,
 						   ops->comm, offset, head);
+	case PERF_RECORD_LOST:
+		return perf_session__process_timed(event, self, ops,
+						   ops->lost, offset, head);
+	case PERF_RECORD_READ:
+		return perf_session__process_timed(event, self, ops,
+						   ops->read, offset, head);
 	default: break;
 	}
 
@@ -662,10 +675,6 @@ static int __perf_session__process_event(struct perf_session *self,
 		return ops->fork(event, self);
 	case PERF_RECORD_EXIT:
 		return ops->exit(event, self);
-	case PERF_RECORD_LOST:
-		return ops->lost(event, self);
-	case PERF_RECORD_READ:
-		return ops->read(event, self);
 	case PERF_RECORD_THROTTLE:
 		return ops->throttle(event, self);
 	case PERF_RECORD_UNTHROTTLE:
@@ -701,6 +710,7 @@ static int perf_session__process_event(struct perf_session *self,
 
 	switch (event->header.type) {
 	case PERF_RECORD_MMAP: case PERF_RECORD_COMM:
+	case PERF_RECORD_LOST: case PERF_RECORD_READ:
 		/* These events are coming in with the old ABI (missing time
 		 * field), so make a temporary copy following the new ABI data
 		 * structure and set the time field to 0.
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: perf: Timestamp COMM and MMAP events for correct attribution
  2010-11-23  5:35 perf: Timestamp COMM and MMAP events for correct attribution Ian Munsie
                   ` (5 preceding siblings ...)
  2010-11-23  5:35 ` [PATCH 6/6] perf tool: Process READ and LOST " Ian Munsie
@ 2010-11-23  6:37 ` Ian Munsie
  6 siblings, 0 replies; 16+ messages in thread
From: Ian Munsie @ 2010-11-23  6:37 UTC (permalink / raw)
  To: linux-kernel

I forgot to mention the base is tip/master

Cheers,
-Ian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6] perf: Add timestamp to COMM and MMAP events
  2010-11-23  5:35 ` [PATCH 3/6] perf: Add timestamp to COMM and MMAP events Ian Munsie
@ 2010-11-23 11:08   ` Peter Zijlstra
  2010-11-23 12:41     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2010-11-23 11:08 UTC (permalink / raw)
  To: Ian Munsie
  Cc: linux-kernel, Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo

On Tue, 2010-11-23 at 16:35 +1100, Ian Munsie wrote:
> From: Ian Munsie <imunsie@au1.ibm.com>
> 
> This patch adds extra ABI to the perf interface to allow the
> PERF_RECORD_COMM and PERF_RECORD_MMAP events to optionally include a
> timestamp, which they were previously missing.
> 
> This has become necessary since the move to per-task-per-cpu events in
> perf as those events would not necessarily be recorded prior to any
> samples associated with them and would cause perf to miss-attribute
> those events and produce bogus reports. This behaviour can be observed
> almost every run on 64 CPU PowerPC systems, but has also been observed
> on a simple quad core Intel i7.
> 
> The timestamp is requested by setting the all_timed flag in the
> event attributes when opening the event and will be placed just after
> the header of the events in question.
> 
> This patch also ensures that the PERF_RECORD_COMM event is dispatched
> before enabling the counters for enable_on_exec.
> 

> @@ -332,6 +333,7 @@ enum perf_event_type {
>  	 *
>  	 * struct {
>  	 *	struct perf_event_header	header;
> +	 *	{ u64			time;     } && all_timed
>  	 *
>  	 *	u32				pid, tid;
>  	 *	u64				addr;
> @@ -354,6 +356,7 @@ enum perf_event_type {
>  	/*
>  	 * struct {
>  	 *	struct perf_event_header	header;
> +	 *	{ u64			time;     } && all_timed
>  	 *
>  	 *	u32				pid, tid;
>  	 *	char				comm[];


Right, so acme had a patch adding

PERF_SAMPLE_TID
PERF_SAMPLE_TIME
PERF_SAMPLE_ID
PERF_SAMPLE_CPU
PERF_SAMPLE_STREAM_ID

(IIRC) to every record.. so solve this same problem. Except he got
interrupted by KS and things and didn't get around to posting it.

Arnaldo?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6] perf: Add timestamp to COMM and MMAP events
  2010-11-23 11:08   ` Peter Zijlstra
@ 2010-11-23 12:41     ` Arnaldo Carvalho de Melo
  2010-11-25 15:47       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2010-11-23 12:41 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ian Munsie, linux-kernel, Paul Mackerras, Ingo Molnar

Em Tue, Nov 23, 2010 at 12:08:32PM +0100, Peter Zijlstra escreveu:
> Right, so acme had a patch adding
> 
> PERF_SAMPLE_TID
> PERF_SAMPLE_TIME
> PERF_SAMPLE_ID
> PERF_SAMPLE_CPU
> PERF_SAMPLE_STREAM_ID
> 
> (IIRC) to every record.. so solve this same problem. Except he got
> interrupted by KS and things and didn't get around to posting it.
> 
> Arnaldo?

Will post later today, IIRC I have to break it down because I did some
refactorings, etc.

- Arnaldo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6] perf: Add timestamp to COMM and MMAP events
  2010-11-23 12:41     ` Arnaldo Carvalho de Melo
@ 2010-11-25 15:47       ` Arnaldo Carvalho de Melo
  2010-11-26 21:54         ` Arnaldo Carvalho de Melo
  2010-11-29  6:06         ` Ian Munsie
  0 siblings, 2 replies; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2010-11-25 15:47 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ian Munsie, linux-kernel, Paul Mackerras, Ingo Molnar

Em Tue, Nov 23, 2010 at 10:41:21AM -0200, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Nov 23, 2010 at 12:08:32PM +0100, Peter Zijlstra escreveu:
> > Right, so acme had a patch adding
> > 
> > PERF_SAMPLE_TID
> > PERF_SAMPLE_TIME
> > PERF_SAMPLE_ID
> > PERF_SAMPLE_CPU
> > PERF_SAMPLE_STREAM_ID
> > 
> > (IIRC) to every record.. so solve this same problem. Except he got
> > interrupted by KS and things and didn't get around to posting it.
> > 
> > Arnaldo?
> 
> Will post later today, IIRC I have to break it down because I did some
> refactorings, etc.

Ian,

	Please take a look at:

http://git.kernel.org/?p=linux/kernel/git/acme/linux-2.6.git;a=shortlog;h=refs/heads/perf/sample_type_id_all

I'm still testing and Peter is reviewing them, I know of a problem with
unconditionally stting attr->sample_type_all and then trying the new
perf tool on an older kernel, that will fail with:

[root@felicio ~]# perf record -a -g

  Error: sys_perf_event_open() syscall returned with -1 (Invalid
argument).  /bin/dmesg may provide additional information.

  Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?

[root@felicio ~]# 

Because it doesn't expects that extra bit to be set. I'll use that to
make sure the binary works on older kernels without this feature.

But apart from this it seems ok, end result can be seen on the HEAD:

http://git.kernel.org/?p=linux/kernel/git/acme/linux-2.6.git;a=commit;h=27ebd1697fe03268d75fbe44ba1648ec34cf572b

$ perf report -D | grep PERF_RECORD_
<SNIP>
 3   5951915425 0x47530 [0x58]: PERF_RECORD_SAMPLE(IP, 1): 16811/16811: 0xffffffff8138c1a2 period: 215979 cpu:3
 3   5952026879 0x47588 [0x90]: PERF_RECORD_SAMPLE(IP, 1): 16811/16811: 0xffffffff810cb480 period: 215979 cpu:3
 3   5952059959 0x47618 [0x38]: PERF_RECORD_FORK(6853:6853):(16811:16811)
 3   5952138878 0x47650 [0x78]: PERF_RECORD_SAMPLE(IP, 1): 16811/16811: 0xffffffff811bac35 period: 431478 cpu:3
 3   5952375068 0x476c8 [0x30]: PERF_RECORD_COMM: find:6853
 3   5952395923 0x476f8 [0x50]: PERF_RECORD_MMAP 6853/6853: [0x400000(0x25000) @ 0]: /usr/bin/find
 3   5952413756 0x47748 [0xa0]: PERF_RECORD_SAMPLE(IP, 1): 6853/6853: 0xffffffff810d080f period: 859332 cpu:3
 3   5952419837 0x477e8 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f44600000(0x21d000) @ 0]: /lib64/ld-2.5.so
 3   5952437929 0x47840 [0x48]: PERF_RECORD_MMAP 6853/6853: [0x7fff7e1c9000(0x1000) @ 0x7fff7e1c9000]: [vdso]
 3   5952570127 0x47888 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f46200000(0x218000) @ 0]: /lib64/libselinux.so.1
 3   5952623637 0x478e0 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f44a00000(0x356000) @ 0]: /lib64/libc-2.5.so
 3   5952675720 0x47938 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f44e00000(0x204000) @ 0]: /lib64/libdl-2.5.so
 3   5952710080 0x47990 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f45a00000(0x246000) @ 0]: /lib64/libsepol.so.1
 3   5952847802 0x479e8 [0x58]: PERF_RECORD_SAMPLE(IP, 1): 6853/6853: 0xffffffff813897f0 period: 1142536 cpu:3
<SNIP>
cpu   tstamp

- Arnaldo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6] perf: Add timestamp to COMM and MMAP events
  2010-11-25 15:47       ` Arnaldo Carvalho de Melo
@ 2010-11-26 21:54         ` Arnaldo Carvalho de Melo
  2010-11-29  6:06         ` Ian Munsie
  1 sibling, 0 replies; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2010-11-26 21:54 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ian Munsie, linux-kernel, Paul Mackerras, Ingo Molnar

Em Thu, Nov 25, 2010 at 01:47:18PM -0200, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Nov 23, 2010 at 10:41:21AM -0200, Arnaldo Carvalho de Melo escreveu:
> > Em Tue, Nov 23, 2010 at 12:08:32PM +0100, Peter Zijlstra escreveu:
> > > Right, so acme had a patch adding:
> > > PERF_SAMPLE_{TID,TIME,ID,CPU,STREAM_ID}

> I know of a problem with unconditionally stting attr->sample_type_all
> and then trying the new perf tool on an older kernel, that will fail
> with:
> [root@felicio ~]# perf record -a -g
>   Error: sys_perf_event_open() syscall returned with -1 (Invalid
> argument).  /bin/dmesg may provide additional information.
>   Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?
> 
> Because it doesn't expects that extra bit to be set. I'll use that to
> make sure the binary works on older kernels without this feature.

This got fixed.

http://git.kernel.org/?p=linux/kernel/git/acme/linux-2.6.git;a=shortlog;h=refs/heads/perf/sample_id_all.v1

Has the latest, that I was about to submit but I'll change it so that
older perf tools can cope with newer perf.data files, by stashing the
new fields after the existing layout, not between perf_event_header and
the mmap/fork/comm/etc specific stuff.

I'll rework that, test newer perf.data files with older perf tools and
then submit.

Thanks,

- Arnaldo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6] perf: Add timestamp to COMM and MMAP events
  2010-11-25 15:47       ` Arnaldo Carvalho de Melo
  2010-11-26 21:54         ` Arnaldo Carvalho de Melo
@ 2010-11-29  6:06         ` Ian Munsie
  2010-11-29 11:54           ` Peter Zijlstra
  1 sibling, 1 reply; 16+ messages in thread
From: Ian Munsie @ 2010-11-29  6:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, linux-kernel, Paul Mackerras, Ingo Molnar

Excerpts from Arnaldo Carvalho de Melo's message of Fri Nov 26 02:47:18 +1100 2010:
> Ian,
> 
>     Please take a look at:
> 
> http://git.kernel.org/?p=linux/kernel/git/acme/linux-2.6.git;a=shortlog;h=refs/heads/perf/sample_type_id_all

Overall it looks pretty good, except (running on a 64way Power7 system):

$ taskset 2 perf record -T ./test 10000000
$ perf report -D|grep PERF_RECORD
<SNIP>
     1009028 0x9c70 [0x28]: PERF_RECORD_SAMPLE(IP, 2): 5563/5563: 0xfefdfe8 period: 1 cpu:-1
     1010032 0x9c98 [0x28]: PERF_RECORD_SAMPLE(IP, 2): 5563/5563: 0xfefdf6c period: 1 cpu:-1
     1011060 0x9cc0 [0x28]: PERF_RECORD_SAMPLE(IP, 2): 5563/5563: 0x100007dc period: 1 cpu:-1
     1012082 0x9ce8 [0x28]: PERF_RECORD_SAMPLE(IP, 2): 5563/5563: 0xfefe024 period: 1 cpu:-1
     1013112 0x9d10 [0x28]: PERF_RECORD_SAMPLE(IP, 2): 5563/5563: 0xfefde64 period: 1 cpu:-1
     1014150 0x9d38 [0x28]: PERF_RECORD_SAMPLE(IP, 2): 5563/5563: 0xfefdec8 period: 1 cpu:-1
     1015184 0x9d60 [0x28]: PERF_RECORD_SAMPLE(IP, 2): 5563/5563: 0xfefde64 period: 1 cpu:-1
     1016202 0x9d88 [0x28]: PERF_RECORD_SAMPLE(IP, 2): 5563/5563: 0xfefdf58 period: 1 cpu:-1
     1017230 0x9db0 [0x28]: PERF_RECORD_SAMPLE(IP, 2): 5563/5563: 0x100007e4 period: 1 cpu:-1
     1018254 0x9dd8 [0x28]: PERF_RECORD_SAMPLE(IP, 2): 5563/5563: 0xfefde64 period: 1 cpu:-1
     1019292 0x9e00 [0x28]: PERF_RECORD_SAMPLE(IP, 2): 5563/5563: 0xfefdf3c period: 1 cpu:-1
     1020330 0x9e28 [0x28]: PERF_RECORD_SAMPLE(IP, 2): 5563/5563: 0x100007e4 period: 1 cpu:-1
     1021390 0x9e50 [0x30]: PERF_RECORD_THROTTLE: unhandled!
     1021682 0x9e80 [0x28]: PERF_RECORD_SAMPLE(IP, 2): 5563/5563: 0x10000e2c period: 1 cpu:-1
     5160466 0x9ea8 [0x30]: PERF_RECORD_UNTHROTTLE: unhandled!
   251846992 0x9ed8 [0x30]: PERF_RECORD_EXIT(5563:5563):(5563:5563)
   251848132 0x9f08 [0x30]: PERF_RECORD_EXIT(5563:5563):(5563:5563)
   251848944 0x9f38 [0x30]: PERF_RECORD_EXIT(5563:5563):(5563:5563)
   251849684 0x9f68 [0x30]: PERF_RECORD_EXIT(5563:5563):(5563:5563)
   251850428 0x9f98 [0x30]: PERF_RECORD_EXIT(5563:5563):(5563:5563)
   251851164 0x9fc8 [0x30]: PERF_RECORD_EXIT(5563:5563):(5563:5563)
18446744073707910238 0x9ff8 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc000000000038a38 period: 1 cpu:-1
18446744073707913124 0xa020 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc0000000000394a8 period: 1 cpu:-1
18446744073707913946 0xa048 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc000000000039524 period: 1 cpu:-1
18446744073707914612 0xa070 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc000000000008a5c period: 1 cpu:-1
18446744073707915242 0xa098 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc000000000008af0 period: 1 cpu:-1
18446744073707915894 0xa0c0 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc000000000008b4c period: 1 cpu:-1
18446744073707916926 0xa0e8 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc00000000012b754 period: 1 cpu:-1
18446744073707917932 0xa110 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc00000000002c13c period: 1 cpu:-1
18446744073707918894 0xa138 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc000000000008ac0 period: 1 cpu:-1
18446744073707919814 0xa160 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc000000000008b34 period: 1 cpu:-1
18446744073707920708 0xa188 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc00000000012b74c period: 1 cpu:-1
18446744073707921612 0xa1b0 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc000000000048270 period: 1 cpu:-1
18446744073707922550 0xa1d8 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc000000000047d84 period: 1 cpu:-1
18446744073707923488 0xa200 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc00000000012b7d0 period: 1 cpu:-1
18446744073707924446 0xa228 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc00000000012b27c period: 1 cpu:-1
18446744073707925426 0xa250 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc00000000012b460 period: 1 cpu:-1
18446744073707926370 0xa278 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc00000000012b43c period: 1 cpu:-1
18446744073707927332 0xa2a0 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc00000000012b460 period: 1 cpu:-1
18446744073707928302 0xa2c8 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc00000000012b43c period: 1 cpu:-1
18446744073707929300 0xa2f0 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc000000000037ae0 period: 1 cpu:-1
18446744073707930302 0xa318 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc000000000125ae8 period: 1 cpu:-1
18446744073707929794 0xa340 [0x28]: PERF_RECORD_COMM: test:5563
18446744073707931268 0xa368 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc0000000000a38d8 period: 1 cpu:-1
18446744073707932268 0xa390 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc000000000127c7c period: 1 cpu:-1
18446744073707933238 0xa3b8 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc000000000127cf0 period: 1 cpu:-1
18446744073707934226 0xa3e0 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc000000000127d7c period: 1 cpu:-1
18446744073707935176 0xa408 [0x28]: PERF_RECORD_SAMPLE(IP, 1): 5563/5563: 0xc000000000127e90 period: 1 cpu:-1
<SNIP>

This goes away if PERF_SAMPLE_CPU is added to the mix so that the
timestamps are offset from the first event of *their* CPU, otherwise
time-epoch can go negative, as it has done here.

I'm not sure that using per CPU epochs really makes much sense either -
if the first event on CPU1 happens (say) 10 minutes after the first
event on CPU0 the timestamps printed out by perf report -D would not
correlate, even for events that occurred within seconds of each other.


This also doesn't provide the userspace code to process every event in
order (to ensure that COMM and MMAP events have been processed prior to
any relevant samples), so the bug I was working on is still present, but
it does provide the ABI changes necessary to enable us fix it.

I'm happy to rebase the userspace side of my patches on top of this tree
to resolve that - I was extending the existing infrastructure to order
samples to handle other events as well.

Cheers,
-Ian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6] perf: Add timestamp to COMM and MMAP events
  2010-11-29  6:06         ` Ian Munsie
@ 2010-11-29 11:54           ` Peter Zijlstra
  2010-11-29 23:45             ` Ian Munsie
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2010-11-29 11:54 UTC (permalink / raw)
  To: Ian Munsie
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Paul Mackerras, Ingo Molnar

On Mon, 2010-11-29 at 17:06 +1100, Ian Munsie wrote:
> 
> 
> This goes away if PERF_SAMPLE_CPU is added to the mix so that the
> timestamps are offset from the first event of *their* CPU, otherwise
> time-epoch can go negative, as it has done here. 

How does that happen, I though the power7 sched_clock() was fully
synchronized and monotonic across all cores?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6] perf: Add timestamp to COMM and MMAP events
  2010-11-29 11:54           ` Peter Zijlstra
@ 2010-11-29 23:45             ` Ian Munsie
  2010-11-30 14:02               ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Munsie @ 2010-11-29 23:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Paul Mackerras, Ingo Molnar

Hi Peter,

Excerpts from Peter Zijlstra's message of Mon Nov 29 22:54:50 +1100 2010:
> On Mon, 2010-11-29 at 17:06 +1100, Ian Munsie wrote:
> > This goes away if PERF_SAMPLE_CPU is added to the mix so that the
> > timestamps are offset from the first event of *their* CPU, otherwise
> > time-epoch can go negative, as it has done here. 
> 
> How does that happen, I though the power7 sched_clock() was fully
> synchronized and monotonic across all cores?

Oh the timestamps coming from the kernel are fine, I should have been
clearer in my email. This bug is purely restricted to the userspace code
that prints them out - in the perf_session__print_tstamp function. It
tries to print the timestamps out so that they start at 0 and if it
doesn't know what CPU the events came from it uses the very first event
it sees as epoch, but at the moment that is not necessarily going to be
the earliest event.

Thinking about it overnight, I realise that this should also go away if
we sort them before printing things out at all, which is exactly what
I'm proposing to do in my patches.

Cheers,
-Ian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6] perf: Add timestamp to COMM and MMAP events
  2010-11-29 23:45             ` Ian Munsie
@ 2010-11-30 14:02               ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2010-11-30 14:02 UTC (permalink / raw)
  To: Ian Munsie; +Cc: Peter Zijlstra, linux-kernel, Paul Mackerras, Ingo Molnar

Em Tue, Nov 30, 2010 at 10:45:40AM +1100, Ian Munsie escreveu:
> Excerpts from Peter Zijlstra's message of Mon Nov 29 22:54:50 +1100 2010:
> > On Mon, 2010-11-29 at 17:06 +1100, Ian Munsie wrote:
> > > This goes away if PERF_SAMPLE_CPU is added to the mix so that the
> > > timestamps are offset from the first event of *their* CPU,
> > > otherwise time-epoch can go negative, as it has done here. 

> > How does that happen, I though the power7 sched_clock() was fully
> > synchronized and monotonic across all cores?

> Oh the timestamps coming from the kernel are fine, I should have been
> clearer in my email. This bug is purely restricted to the userspace code
> that prints them out - in the perf_session__print_tstamp function. It
> tries to print the timestamps out so that they start at 0 and if it
> doesn't know what CPU the events came from it uses the very first event
> it sees as epoch, but at the moment that is not necessarily going to be
> the earliest event.
 
> Thinking about it overnight, I realise that this should also go away if
> we sort them before printing things out at all, which is exactly what
> I'm proposing to do in my patches.

I removed that in the current version of this patch series:

http://git.kernel.org/?p=linux/kernel/git/acme/linux-2.6.git;a=shortlog;h=refs/heads/perf/sample_id_all.v3

Now I'm just testing if a new perf.data file can be processed just fine
by older tools (not possible in the sample_id_all.v1 series) and other
combinations of older kernels with the new tool.

Probably I'll fold two patches and then submit for review.

- Arnaldo

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2010-11-30 14:02 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-23  5:35 perf: Timestamp COMM and MMAP events for correct attribution Ian Munsie
2010-11-23  5:35 ` [PATCH 1/6] perf: Display unknown rather than IP for unidentified DSO Ian Munsie
2010-11-23  5:35 ` [PATCH 2/6] perf: Move all output for perf report -D into trace_event Ian Munsie
2010-11-23  5:35 ` [PATCH 3/6] perf: Add timestamp to COMM and MMAP events Ian Munsie
2010-11-23 11:08   ` Peter Zijlstra
2010-11-23 12:41     ` Arnaldo Carvalho de Melo
2010-11-25 15:47       ` Arnaldo Carvalho de Melo
2010-11-26 21:54         ` Arnaldo Carvalho de Melo
2010-11-29  6:06         ` Ian Munsie
2010-11-29 11:54           ` Peter Zijlstra
2010-11-29 23:45             ` Ian Munsie
2010-11-30 14:02               ` Arnaldo Carvalho de Melo
2010-11-23  5:35 ` [PATCH 4/6] perf: Add timestamp to READ and LOST events Ian Munsie
2010-11-23  5:35 ` [PATCH 5/6] perf tool: Process COMM and MMAP events in order Ian Munsie
2010-11-23  5:35 ` [PATCH 6/6] perf tool: Process READ and LOST " Ian Munsie
2010-11-23  6:37 ` perf: Timestamp COMM and MMAP events for correct attribution Ian Munsie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).