All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
@ 2010-04-08 20:45 Stephane Eranian
  2010-04-08 20:55 ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Stephane Eranian @ 2010-04-08 20:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, mingo, paulus, davem, fweisbec, robert.richter,
	perfmon2-devel, eranian, eranian

	There is a warn_on_once() check for PERF_SAMPLE_RAW which trips
	when using PEBS on both Core and Nehalem. Core PEBS sample size is 144
	bytes and 176 bytes for Nehalem. Both are multiples of 8, but the size
	field is encoded as int, thus the total is never a multiple of 8 which
	trips the check. I think the size should have been u64, but now it is
	too late to change given it is ABI.

	Signed-off-by: Stephane Eranian <eranian@google.com>

diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 8143e77..fffeb95 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -3311,7 +3311,6 @@ void perf_prepare_sample(struct perf_event_header *header,
 		else
 			size += sizeof(u32);
 
-		WARN_ON_ONCE(size & (sizeof(u64)-1));
 		header->size += size;
 	}
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
  2010-04-08 20:45 [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample() Stephane Eranian
@ 2010-04-08 20:55 ` Peter Zijlstra
  2010-04-08 21:03   ` Peter Zijlstra
  2010-04-08 21:08   ` Stephane Eranian
  0 siblings, 2 replies; 18+ messages in thread
From: Peter Zijlstra @ 2010-04-08 20:55 UTC (permalink / raw)
  To: eranian
  Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
	perfmon2-devel, eranian

On Thu, 2010-04-08 at 22:45 +0200, Stephane Eranian wrote:
> 	There is a warn_on_once() check for PERF_SAMPLE_RAW which trips
> 	when using PEBS on both Core and Nehalem. Core PEBS sample size is 144
> 	bytes and 176 bytes for Nehalem. Both are multiples of 8, but the size
> 	field is encoded as int, thus the total is never a multiple of 8 which
> 	trips the check. I think the size should have been u64, but now it is
> 	too late to change given it is ABI.

PEBS hasn't seen -linus yet, so we can fix that.

There's various things that do indeed rely on the perf buffer to always
be u64 aligned, so this warning isn't bogus at all.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
  2010-04-08 20:55 ` Peter Zijlstra
@ 2010-04-08 21:03   ` Peter Zijlstra
  2010-04-08 21:18     ` Stephane Eranian
  2010-04-08 21:08   ` Stephane Eranian
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2010-04-08 21:03 UTC (permalink / raw)
  To: eranian
  Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
	perfmon2-devel, eranian

On Thu, 2010-04-08 at 22:55 +0200, Peter Zijlstra wrote:
> On Thu, 2010-04-08 at 22:45 +0200, Stephane Eranian wrote:
> > 	There is a warn_on_once() check for PERF_SAMPLE_RAW which trips
> > 	when using PEBS on both Core and Nehalem. Core PEBS sample size is 144
> > 	bytes and 176 bytes for Nehalem. Both are multiples of 8, but the size
> > 	field is encoded as int, thus the total is never a multiple of 8 which
> > 	trips the check. I think the size should have been u64, but now it is
> > 	too late to change given it is ABI.
> 
> PEBS hasn't seen -linus yet, so we can fix that.
> 
> There's various things that do indeed rely on the perf buffer to always
> be u64 aligned, so this warning isn't bogus at all.

On the subject of PEBS, we need to change the ABI before it does hit
-linus, I've got something like the below,. but I'm not quite sure of
it.

We could keep the PERF_RECORD_MISC_EXACT bit and allow non-fixed up IPs
in PERF_SAMPLE_EVENT_IP fields, that way we can avoid having to add both
IP and EVENT_IP to samples.

---
Subject: perf: Change the PEBS interface
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Tue Mar 30 13:35:58 CEST 2010

This patch changes the PEBS interface from perf_event_attr::precise
and PERF_RECORD_MISC_EXACT to perf_event_attr::precise and
PERF_SAMPLE_EVENT_IP.

The rationale is that .precise=1 !PERF_SAMPLE_EVENT_IP could be used
to implement buffered PEBS.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
---
 arch/x86/include/asm/perf_event.h         |    9 --
 arch/x86/kernel/cpu/perf_event_intel_ds.c |  133 +++++++++++++-----------------
 include/linux/perf_event.h                |    7 +
 kernel/perf_event.c                       |    6 +
 tools/perf/builtin-top.c                  |    9 +-
 5 files changed, 79 insertions(+), 85 deletions(-)

Index: linux-2.6/arch/x86/include/asm/perf_event.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/perf_event.h
+++ linux-2.6/arch/x86/include/asm/perf_event.h
@@ -128,21 +128,12 @@ extern void perf_events_lapic_init(void)
 
 #define PERF_EVENT_INDEX_OFFSET			0
 
-/*
- * Abuse bit 3 of the cpu eflags register to indicate proper PEBS IP fixups.
- * This flag is otherwise unused and ABI specified to be 0, so nobody should
- * care what we do with it.
- */
-#define PERF_EFLAGS_EXACT	(1UL << 3)
-
 #define perf_misc_flags(regs)				\
 ({	int misc = 0;					\
 	if (user_mode(regs))				\
 		misc |= PERF_RECORD_MISC_USER;		\
 	else						\
 		misc |= PERF_RECORD_MISC_KERNEL;	\
-	if (regs->flags & PERF_EFLAGS_EXACT)		\
-		misc |= PERF_RECORD_MISC_EXACT;		\
 	misc; })
 
 #define perf_instruction_pointer(regs)	((regs)->ip)
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel_ds.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -330,7 +330,8 @@ static void intel_pmu_pebs_enable(struct
 	cpuc->pebs_enabled |= 1ULL << hwc->idx;
 	WARN_ON_ONCE(cpuc->enabled);
 
-	if (x86_pmu.intel_cap.pebs_trap)
+	if (x86_pmu.intel_cap.pebs_trap &&
+			(event->attr.sample_type & PERF_SAMPLE_EVENT_IP))
 		intel_pmu_lbr_enable(event);
 }
 
@@ -345,7 +346,8 @@ static void intel_pmu_pebs_disable(struc
 
 	hwc->config |= ARCH_PERFMON_EVENTSEL_INT;
 
-	if (x86_pmu.intel_cap.pebs_trap)
+	if (x86_pmu.intel_cap.pebs_trap &&
+			(event->attr.sample_type & PERF_SAMPLE_EVENT_IP))
 		intel_pmu_lbr_disable(event);
 }
 
@@ -376,12 +378,12 @@ static inline bool kernel_ip(unsigned lo
 #endif
 }
 
-static int intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
+static int intel_pmu_pebs_fixup_ip(unsigned long *ipp)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	unsigned long from = cpuc->lbr_entries[0].from;
 	unsigned long old_to, to = cpuc->lbr_entries[0].to;
-	unsigned long ip = regs->ip;
+	unsigned long ip = *ipp;
 
 	/*
 	 * We don't need to fixup if the PEBS assist is fault like
@@ -412,7 +414,7 @@ static int intel_pmu_pebs_fixup_ip(struc
 	 * We sampled a branch insn, rewind using the LBR stack
 	 */
 	if (ip == to) {
-		regs->ip = from;
+		*ipp = from;
 		return 1;
 	}
 
@@ -439,7 +441,7 @@ static int intel_pmu_pebs_fixup_ip(struc
 	} while (to < ip);
 
 	if (to == ip) {
-		regs->ip = old_to;
+		*ipp = old_to;
 		return 1;
 	}
 
@@ -452,15 +454,62 @@ static int intel_pmu_pebs_fixup_ip(struc
 
 static int intel_pmu_save_and_restart(struct perf_event *event);
 
+static void __intel_pmu_pebs_event(struct perf_event *event,
+				   struct pt_regs *iregs, void *__pebs)
+{
+	/*
+	 * We cast to pebs_record_core since that is a subset of
+	 * both formats and we don't use the other fields in this
+	 * routine.
+	 */
+	struct pebs_record_core *pebs = __pebs;
+	struct perf_sample_data data;
+	struct perf_raw_record raw;
+	struct pt_regs regs;
+
+	if (!intel_pmu_save_and_restart(event))
+		return;
+
+	perf_sample_data_init(&data, 0);
+	data.period = event->hw.last_period;
+
+	if (event->attr.sample_type & PERF_SAMPLE_RAW) {
+		raw.size = x86_pmu.pebs_record_size;
+		raw.data = pebs;
+		data.raw = &raw;
+	}
+
+	/*
+	 * We use the interrupt regs as a base because the PEBS record
+	 * does not contain a full regs set, specifically it seems to
+	 * lack segment descriptors, which get used by things like
+	 * user_mode().
+	 *
+	 * In the simple case fix up only the IP and BP,SP regs, for
+	 * PERF_SAMPLE_IP and PERF_SAMPLE_CALLCHAIN to function properly.
+	 * A possible PERF_SAMPLE_REGS will have to transfer all regs.
+	 */
+	regs = *iregs;
+	regs.ip = pebs->ip;
+	regs.bp = pebs->bp;
+	regs.sp = pebs->sp;
+
+	if (event->attr.sample_type & PERF_SAMPLE_EVENT_IP) {
+		unsigned long event_ip = pebs->ip;
+		if (intel_pmu_pebs_fixup_ip(&event_ip))
+			data.event_ip = event_ip;
+	}
+
+	if (perf_event_overflow(event, 1, &data, &regs))
+		x86_pmu_stop(event);
+}
+
 static void intel_pmu_drain_pebs_core(struct pt_regs *iregs)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	struct debug_store *ds = cpuc->ds;
 	struct perf_event *event = cpuc->events[0]; /* PMC0 only */
 	struct pebs_record_core *at, *top;
-	struct perf_sample_data data;
-	struct perf_raw_record raw;
-	struct pt_regs regs;
 	int n;
 
 	if (!ds || !x86_pmu.pebs)
@@ -486,9 +535,6 @@ static void intel_pmu_drain_pebs_core(st
 	if (n <= 0)
 		return;
 
-	if (!intel_pmu_save_and_restart(event))
-		return;
-
 	/*
 	 * Should not happen, we program the threshold at 1 and do not
 	 * set a reset value.
@@ -496,37 +542,7 @@ static void intel_pmu_drain_pebs_core(st
 	WARN_ON_ONCE(n > 1);
 	at += n - 1;
 
-	perf_sample_data_init(&data, 0);
-	data.period = event->hw.last_period;
-
-	if (event->attr.sample_type & PERF_SAMPLE_RAW) {
-		raw.size = x86_pmu.pebs_record_size;
-		raw.data = at;
-		data.raw = &raw;
-	}
-
-	/*
-	 * We use the interrupt regs as a base because the PEBS record
-	 * does not contain a full regs set, specifically it seems to
-	 * lack segment descriptors, which get used by things like
-	 * user_mode().
-	 *
-	 * In the simple case fix up only the IP and BP,SP regs, for
-	 * PERF_SAMPLE_IP and PERF_SAMPLE_CALLCHAIN to function properly.
-	 * A possible PERF_SAMPLE_REGS will have to transfer all regs.
-	 */
-	regs = *iregs;
-	regs.ip = at->ip;
-	regs.bp = at->bp;
-	regs.sp = at->sp;
-
-	if (intel_pmu_pebs_fixup_ip(&regs))
-		regs.flags |= PERF_EFLAGS_EXACT;
-	else
-		regs.flags &= ~PERF_EFLAGS_EXACT;
-
-	if (perf_event_overflow(event, 1, &data, &regs))
-		x86_pmu_stop(event);
+	__intel_pmu_pebs_event(event, iregs, at);
 }
 
 static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
@@ -534,10 +550,7 @@ static void intel_pmu_drain_pebs_nhm(str
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	struct debug_store *ds = cpuc->ds;
 	struct pebs_record_nhm *at, *top;
-	struct perf_sample_data data;
 	struct perf_event *event = NULL;
-	struct perf_raw_record raw;
-	struct pt_regs regs;
 	u64 status = 0;
 	int bit, n;
 
@@ -579,33 +592,7 @@ static void intel_pmu_drain_pebs_nhm(str
 		if (!event || bit >= MAX_PEBS_EVENTS)
 			continue;
 
-		if (!intel_pmu_save_and_restart(event))
-			continue;
-
-		perf_sample_data_init(&data, 0);
-		data.period = event->hw.last_period;
-
-		if (event->attr.sample_type & PERF_SAMPLE_RAW) {
-			raw.size = x86_pmu.pebs_record_size;
-			raw.data = at;
-			data.raw = &raw;
-		}
-
-		/*
-		 * See the comment in intel_pmu_drain_pebs_core()
-		 */
-		regs = *iregs;
-		regs.ip = at->ip;
-		regs.bp = at->bp;
-		regs.sp = at->sp;
-
-		if (intel_pmu_pebs_fixup_ip(&regs))
-			regs.flags |= PERF_EFLAGS_EXACT;
-		else
-			regs.flags &= ~PERF_EFLAGS_EXACT;
-
-		if (perf_event_overflow(event, 1, &data, &regs))
-			x86_pmu_stop(event);
+		__intel_pmu_pebs_event(event, iregs, at);
 	}
 }
 
Index: linux-2.6/include/linux/perf_event.h
===================================================================
--- linux-2.6.orig/include/linux/perf_event.h
+++ linux-2.6/include/linux/perf_event.h
@@ -125,8 +125,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_PERIOD			= 1U << 8,
 	PERF_SAMPLE_STREAM_ID			= 1U << 9,
 	PERF_SAMPLE_RAW				= 1U << 10,
+	PERF_SAMPLE_EVENT_IP			= 1U << 11,
 
-	PERF_SAMPLE_MAX = 1U << 11,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 12,		/* non-ABI */
 };
 
 /*
@@ -294,7 +295,6 @@ struct perf_event_mmap_page {
 #define PERF_RECORD_MISC_USER			(2 << 0)
 #define PERF_RECORD_MISC_HYPERVISOR		(3 << 0)
 
-#define PERF_RECORD_MISC_EXACT			(1 << 14)
 /*
  * Reserve the last bit to indicate some extended misc field
  */
@@ -389,6 +389,7 @@ enum perf_event_type {
 	 *	struct perf_event_header	header;
 	 *
 	 *	{ u64			ip;	  } && PERF_SAMPLE_IP
+	 *	{ u64			event_ip; } && PERF_SAMPLE_EVENT_IP
 	 *	{ u32			pid, tid; } && PERF_SAMPLE_TID
 	 *	{ u64			time;     } && PERF_SAMPLE_TIME
 	 *	{ u64			addr;     } && PERF_SAMPLE_ADDR
@@ -804,6 +805,7 @@ struct perf_sample_data {
 	u64				type;
 
 	u64				ip;
+	u64				event_ip;
 	struct {
 		u32	pid;
 		u32	tid;
@@ -824,6 +826,7 @@ struct perf_sample_data {
 static inline
 void perf_sample_data_init(struct perf_sample_data *data, u64 addr)
 {
+	data->event_ip = 0;
 	data->addr = addr;
 	data->raw  = NULL;
 }
Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -3158,6 +3158,9 @@ void perf_output_sample(struct perf_outp
 	if (sample_type & PERF_SAMPLE_IP)
 		perf_output_put(handle, data->ip);
 
+	if (sample_type & PERF_SAMPLE_EVENT_IP)
+		perf_output_put(handle, data->event_ip);
+
 	if (sample_type & PERF_SAMPLE_TID)
 		perf_output_put(handle, data->tid_entry);
 
@@ -3237,6 +3240,9 @@ void perf_prepare_sample(struct perf_eve
 		header->size += sizeof(data->ip);
 	}
 
+	if (sample_type & PERF_SAMPLE_EVENT_IP)
+		header->size += sizeof(data->event_ip);
+
 	if (sample_type & PERF_SAMPLE_TID) {
 		/* namespace issues */
 		data->tid_entry.pid = perf_event_pid(event, current);
Index: linux-2.6/tools/perf/builtin-top.c
===================================================================
--- linux-2.6.orig/tools/perf/builtin-top.c
+++ linux-2.6/tools/perf/builtin-top.c
@@ -975,8 +975,10 @@ static void event__process_sample(const 
 		return;
 	}
 
-	if (self->header.misc & PERF_RECORD_MISC_EXACT)
+	if (ip)
 		exact_samples++;
+	else
+		return;
 
 	if (event__preprocess_sample(self, session, &al, symbol_filter) < 0 ||
 	    al.filtered)
@@ -1169,6 +1171,11 @@ static void start_counter(int i, int cou
 
 	attr->sample_type	= PERF_SAMPLE_IP | PERF_SAMPLE_TID;
 
+	if (attr->precise) {
+		attr->sample_type &= ~PERF_SAMPLE_IP;
+		attr->sample_type |= PERF_SAMPLE_EVENT_IP;
+	}
+
 	if (freq) {
 		attr->sample_type	|= PERF_SAMPLE_PERIOD;
 		attr->freq		= 1;



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf_events: fix bogus warn_on(_once) in  perf_prepare_sample()
  2010-04-08 20:55 ` Peter Zijlstra
  2010-04-08 21:03   ` Peter Zijlstra
@ 2010-04-08 21:08   ` Stephane Eranian
  2010-04-08 21:11     ` Peter Zijlstra
                       ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Stephane Eranian @ 2010-04-08 21:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
	perfmon2-devel, eranian

On Thu, Apr 8, 2010 at 10:55 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2010-04-08 at 22:45 +0200, Stephane Eranian wrote:
>>       There is a warn_on_once() check for PERF_SAMPLE_RAW which trips
>>       when using PEBS on both Core and Nehalem. Core PEBS sample size is 144
>>       bytes and 176 bytes for Nehalem. Both are multiples of 8, but the size
>>       field is encoded as int, thus the total is never a multiple of 8 which
>>       trips the check. I think the size should have been u64, but now it is
>>       too late to change given it is ABI.
>
> PEBS hasn't seen -linus yet, so we can fix that.
>
Are you suggesting you add some padding the PEBS raw sample you
return as PERF_SAMPLE_RAW? Then you need to define what RAW
actually means? Seems here, it would mean more than what the
HW returns.

> There's various things that do indeed rely on the perf buffer to always
> be u64 aligned, so this warning isn't bogus at all.
>
I assume this has to do with the wrap-around detection.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf_events: fix bogus warn_on(_once) in  perf_prepare_sample()
  2010-04-08 21:08   ` Stephane Eranian
@ 2010-04-08 21:11     ` Peter Zijlstra
  2010-04-08 21:14       ` Stephane Eranian
  2010-04-08 21:12     ` Peter Zijlstra
  2010-04-08 21:15     ` Peter Zijlstra
  2 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2010-04-08 21:11 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
	perfmon2-devel, eranian

On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
> 
> Are you suggesting you add some padding the PEBS raw sample you
> return as PERF_SAMPLE_RAW? Then you need to define what RAW
> actually means? Seems here, it would mean more than what the
> HW returns. 

Well, RAW doesn't mean anything much at all, its really a fugly pass
some crap around thing.

So yeah, adding padding seems just fine.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf_events: fix bogus warn_on(_once) in  perf_prepare_sample()
  2010-04-08 21:08   ` Stephane Eranian
  2010-04-08 21:11     ` Peter Zijlstra
@ 2010-04-08 21:12     ` Peter Zijlstra
  2010-04-08 21:15     ` Peter Zijlstra
  2 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2010-04-08 21:12 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
	perfmon2-devel, eranian

On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
> 
> > There's various things that do indeed rely on the perf buffer to always
> > be u64 aligned, so this warning isn't bogus at all.
> >
> I assume this has to do with the wrap-around detection.

Nah, mostly dealing with architectures that don't do well with unaligned
accesses.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf_events: fix bogus warn_on(_once) in  perf_prepare_sample()
  2010-04-08 21:11     ` Peter Zijlstra
@ 2010-04-08 21:14       ` Stephane Eranian
  2010-04-08 21:17         ` Frederic Weisbecker
  2010-04-08 21:23         ` Peter Zijlstra
  0 siblings, 2 replies; 18+ messages in thread
From: Stephane Eranian @ 2010-04-08 21:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
	perfmon2-devel, eranian

On Thu, Apr 8, 2010 at 11:11 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
>>
>> Are you suggesting you add some padding the PEBS raw sample you
>> return as PERF_SAMPLE_RAW? Then you need to define what RAW
>> actually means? Seems here, it would mean more than what the
>> HW returns.
>
> Well, RAW doesn't mean anything much at all, its really a fugly pass
> some crap around thing.
>
> So yeah, adding padding seems just fine.
>
I would rather see size as u64. Who's using raw today anyway?

The PEBS record itself is always a multiple of u64. The size header is the one
causing the problems.

>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf_events: fix bogus warn_on(_once) in  perf_prepare_sample()
  2010-04-08 21:08   ` Stephane Eranian
  2010-04-08 21:11     ` Peter Zijlstra
  2010-04-08 21:12     ` Peter Zijlstra
@ 2010-04-08 21:15     ` Peter Zijlstra
  2010-04-08 21:29       ` Stephane Eranian
  2 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2010-04-08 21:15 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
	perfmon2-devel, eranian

On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
> On Thu, Apr 8, 2010 at 10:55 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, 2010-04-08 at 22:45 +0200, Stephane Eranian wrote:
> >>       There is a warn_on_once() check for PERF_SAMPLE_RAW which trips
> >>       when using PEBS on both Core and Nehalem. Core PEBS sample size is 144
> >>       bytes and 176 bytes for Nehalem. Both are multiples of 8, but the size
> >>       field is encoded as int, thus the total is never a multiple of 8 which
> >>       trips the check. I think the size should have been u64, but now it is
> >>       too late to change given it is ABI.
> >
> > PEBS hasn't seen -linus yet, so we can fix that.
> >
> Are you suggesting you add some padding the PEBS raw sample you
> return as PERF_SAMPLE_RAW? Then you need to define what RAW
> actually means? Seems here, it would mean more than what the
> HW returns. 

The best fix here is to simply remove the PERF_SAMPLE_RAW support and
implement PERF_SAMPLE_REGS. Its just that we need to come up with a way
to deal with compat pt_regs muck.




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
  2010-04-08 21:14       ` Stephane Eranian
@ 2010-04-08 21:17         ` Frederic Weisbecker
  2010-04-08 21:22           ` Stephane Eranian
  2010-04-08 21:23         ` Peter Zijlstra
  1 sibling, 1 reply; 18+ messages in thread
From: Frederic Weisbecker @ 2010-04-08 21:17 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Peter Zijlstra, linux-kernel, mingo, paulus, davem,
	robert.richter, perfmon2-devel, eranian

On Thu, Apr 08, 2010 at 11:14:15PM +0200, Stephane Eranian wrote:
> On Thu, Apr 8, 2010 at 11:11 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
> >>
> >> Are you suggesting you add some padding the PEBS raw sample you
> >> return as PERF_SAMPLE_RAW? Then you need to define what RAW
> >> actually means? Seems here, it would mean more than what the
> >> HW returns.
> >
> > Well, RAW doesn't mean anything much at all, its really a fugly pass
> > some crap around thing.
> >
> > So yeah, adding padding seems just fine.
> >
> I would rather see size as u64. Who's using raw today anyway?


The trace events. Hence the size of the size shouldn't be touched, it
is an ABI now.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf_events: fix bogus warn_on(_once) in  perf_prepare_sample()
  2010-04-08 21:03   ` Peter Zijlstra
@ 2010-04-08 21:18     ` Stephane Eranian
  2010-04-08 21:24       ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Stephane Eranian @ 2010-04-08 21:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
	perfmon2-devel, eranian

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 16495 bytes --]

On Thu, Apr 8, 2010 at 11:03 PM, Peter Zijlstra <peterz@infradead.org> wrote:> On Thu, 2010-04-08 at 22:55 +0200, Peter Zijlstra wrote:>> On Thu, 2010-04-08 at 22:45 +0200, Stephane Eranian wrote:>> >     There is a warn_on_once() check for PERF_SAMPLE_RAW which trips>> >     when using PEBS on both Core and Nehalem. Core PEBS sample size is 144>> >     bytes and 176 bytes for Nehalem. Both are multiples of 8, but the size>> >     field is encoded as int, thus the total is never a multiple of 8 which>> >     trips the check. I think the size should have been u64, but now it is>> >     too late to change given it is ABI.>>>> PEBS hasn't seen -linus yet, so we can fix that.>>>> There's various things that do indeed rely on the perf buffer to always>> be u64 aligned, so this warning isn't bogus at all.>> On the subject of PEBS, we need to change the ABI before it does hit> -linus, I've got something like the below,. but I'm not quite sure of> it.>> We could keep the PERF_RECORD_MISC_EXACT bit and allow non-fixed up IPs> in PERF_SAMPLE_EVENT_IP fields, that way we can avoid having to add both> IP and EVENT_IP to samples.>You always have the original IP in the PEBS sample, so why add yet another wayof getting to the same data?
> ---> Subject: perf: Change the PEBS interface> From: Peter Zijlstra <a.p.zijlstra@chello.nl>> Date: Tue Mar 30 13:35:58 CEST 2010>> This patch changes the PEBS interface from perf_event_attr::precise> and PERF_RECORD_MISC_EXACT to perf_event_attr::precise and> PERF_SAMPLE_EVENT_IP.>> The rationale is that .precise=1 !PERF_SAMPLE_EVENT_IP could be used> to implement buffered PEBS.>I am not sure I understand what you mean by buffered PEBS. Are you talkingabout using PEBS buffer bigger than one entry?If so, how can you do:  - the LBR based fixups for multiple samples (on PMU interrupt)  - attribute PID/TID in system-wide mode

> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>> LKML-Reference: <new-submission>> --->  arch/x86/include/asm/perf_event.h         |    9 -->  arch/x86/kernel/cpu/perf_event_intel_ds.c |  133 +++++++++++++----------------->  include/linux/perf_event.h                |    7 +>  kernel/perf_event.c                       |    6 +>  tools/perf/builtin-top.c                  |    9 +->  5 files changed, 79 insertions(+), 85 deletions(-)>> Index: linux-2.6/arch/x86/include/asm/perf_event.h> ===================================================================> --- linux-2.6.orig/arch/x86/include/asm/perf_event.h> +++ linux-2.6/arch/x86/include/asm/perf_event.h> @@ -128,21 +128,12 @@ extern void perf_events_lapic_init(void)>>  #define PERF_EVENT_INDEX_OFFSET                        0>> -/*> - * Abuse bit 3 of the cpu eflags register to indicate proper PEBS IP fixups.> - * This flag is otherwise unused and ABI specified to be 0, so nobody should> - * care what we do with it.> - */> -#define PERF_EFLAGS_EXACT      (1UL << 3)> ->  #define perf_misc_flags(regs)                          \>  ({     int misc = 0;                                   \>        if (user_mode(regs))                            \>                misc |= PERF_RECORD_MISC_USER;          \>        else                                            \>                misc |= PERF_RECORD_MISC_KERNEL;        \> -       if (regs->flags & PERF_EFLAGS_EXACT)            \> -               misc |= PERF_RECORD_MISC_EXACT;         \>        misc; })>>  #define perf_instruction_pointer(regs) ((regs)->ip)> Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel_ds.c> ===================================================================> --- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel_ds.c> +++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel_ds.c> @@ -330,7 +330,8 @@ static void intel_pmu_pebs_enable(struct>        cpuc->pebs_enabled |= 1ULL << hwc->idx;>        WARN_ON_ONCE(cpuc->enabled);>> -       if (x86_pmu.intel_cap.pebs_trap)> +       if (x86_pmu.intel_cap.pebs_trap &&> +                       (event->attr.sample_type & PERF_SAMPLE_EVENT_IP))>                intel_pmu_lbr_enable(event);>  }>> @@ -345,7 +346,8 @@ static void intel_pmu_pebs_disable(struc>>        hwc->config |= ARCH_PERFMON_EVENTSEL_INT;>> -       if (x86_pmu.intel_cap.pebs_trap)> +       if (x86_pmu.intel_cap.pebs_trap &&> +                       (event->attr.sample_type & PERF_SAMPLE_EVENT_IP))>                intel_pmu_lbr_disable(event);>  }>> @@ -376,12 +378,12 @@ static inline bool kernel_ip(unsigned lo>  #endif>  }>> -static int intel_pmu_pebs_fixup_ip(struct pt_regs *regs)> +static int intel_pmu_pebs_fixup_ip(unsigned long *ipp)>  {>        struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);>        unsigned long from = cpuc->lbr_entries[0].from;>        unsigned long old_to, to = cpuc->lbr_entries[0].to;> -       unsigned long ip = regs->ip;> +       unsigned long ip = *ipp;>>        /*>         * We don't need to fixup if the PEBS assist is fault like> @@ -412,7 +414,7 @@ static int intel_pmu_pebs_fixup_ip(struc>         * We sampled a branch insn, rewind using the LBR stack>         */>        if (ip == to) {> -               regs->ip = from;> +               *ipp = from;>                return 1;>        }>> @@ -439,7 +441,7 @@ static int intel_pmu_pebs_fixup_ip(struc>        } while (to < ip);>>        if (to == ip) {> -               regs->ip = old_to;> +               *ipp = old_to;>                return 1;>        }>> @@ -452,15 +454,62 @@ static int intel_pmu_pebs_fixup_ip(struc>>  static int intel_pmu_save_and_restart(struct perf_event *event);>> +static void __intel_pmu_pebs_event(struct perf_event *event,> +                                  struct pt_regs *iregs, void *__pebs)> +{> +       /*> +        * We cast to pebs_record_core since that is a subset of> +        * both formats and we don't use the other fields in this> +        * routine.> +        */> +       struct pebs_record_core *pebs = __pebs;> +       struct perf_sample_data data;> +       struct perf_raw_record raw;> +       struct pt_regs regs;> +> +       if (!intel_pmu_save_and_restart(event))> +               return;> +> +       perf_sample_data_init(&data, 0);> +       data.period = event->hw.last_period;> +> +       if (event->attr.sample_type & PERF_SAMPLE_RAW) {> +               raw.size = x86_pmu.pebs_record_size;> +               raw.data = pebs;> +               data.raw = &raw;> +       }> +> +       /*> +        * We use the interrupt regs as a base because the PEBS record> +        * does not contain a full regs set, specifically it seems to> +        * lack segment descriptors, which get used by things like> +        * user_mode().> +        *> +        * In the simple case fix up only the IP and BP,SP regs, for> +        * PERF_SAMPLE_IP and PERF_SAMPLE_CALLCHAIN to function properly.> +        * A possible PERF_SAMPLE_REGS will have to transfer all regs.> +        */> +       regs = *iregs;> +       regs.ip = pebs->ip;> +       regs.bp = pebs->bp;> +       regs.sp = pebs->sp;> +> +       if (event->attr.sample_type & PERF_SAMPLE_EVENT_IP) {> +               unsigned long event_ip = pebs->ip;> +               if (intel_pmu_pebs_fixup_ip(&event_ip))> +                       data.event_ip = event_ip;> +       }> +> +       if (perf_event_overflow(event, 1, &data, &regs))> +               x86_pmu_stop(event);> +}> +>  static void intel_pmu_drain_pebs_core(struct pt_regs *iregs)>  {>        struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);>        struct debug_store *ds = cpuc->ds;>        struct perf_event *event = cpuc->events[0]; /* PMC0 only */>        struct pebs_record_core *at, *top;> -       struct perf_sample_data data;> -       struct perf_raw_record raw;> -       struct pt_regs regs;>        int n;>>        if (!ds || !x86_pmu.pebs)> @@ -486,9 +535,6 @@ static void intel_pmu_drain_pebs_core(st>        if (n <= 0)>                return;>> -       if (!intel_pmu_save_and_restart(event))> -               return;> ->        /*>         * Should not happen, we program the threshold at 1 and do not>         * set a reset value.> @@ -496,37 +542,7 @@ static void intel_pmu_drain_pebs_core(st>        WARN_ON_ONCE(n > 1);>        at += n - 1;>> -       perf_sample_data_init(&data, 0);> -       data.period = event->hw.last_period;> -> -       if (event->attr.sample_type & PERF_SAMPLE_RAW) {> -               raw.size = x86_pmu.pebs_record_size;> -               raw.data = at;> -               data.raw = &raw;> -       }> -> -       /*> -        * We use the interrupt regs as a base because the PEBS record> -        * does not contain a full regs set, specifically it seems to> -        * lack segment descriptors, which get used by things like> -        * user_mode().> -        *> -        * In the simple case fix up only the IP and BP,SP regs, for> -        * PERF_SAMPLE_IP and PERF_SAMPLE_CALLCHAIN to function properly.> -        * A possible PERF_SAMPLE_REGS will have to transfer all regs.> -        */> -       regs = *iregs;> -       regs.ip = at->ip;> -       regs.bp = at->bp;> -       regs.sp = at->sp;> -> -       if (intel_pmu_pebs_fixup_ip(&regs))> -               regs.flags |= PERF_EFLAGS_EXACT;> -       else> -               regs.flags &= ~PERF_EFLAGS_EXACT;> -> -       if (perf_event_overflow(event, 1, &data, &regs))> -               x86_pmu_stop(event);> +       __intel_pmu_pebs_event(event, iregs, at);>  }>>  static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)> @@ -534,10 +550,7 @@ static void intel_pmu_drain_pebs_nhm(str>        struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);>        struct debug_store *ds = cpuc->ds;>        struct pebs_record_nhm *at, *top;> -       struct perf_sample_data data;>        struct perf_event *event = NULL;> -       struct perf_raw_record raw;> -       struct pt_regs regs;>        u64 status = 0;>        int bit, n;>> @@ -579,33 +592,7 @@ static void intel_pmu_drain_pebs_nhm(str>                if (!event || bit >= MAX_PEBS_EVENTS)>                        continue;>> -               if (!intel_pmu_save_and_restart(event))> -                       continue;> -> -               perf_sample_data_init(&data, 0);> -               data.period = event->hw.last_period;> -> -               if (event->attr.sample_type & PERF_SAMPLE_RAW) {> -                       raw.size = x86_pmu.pebs_record_size;> -                       raw.data = at;> -                       data.raw = &raw;> -               }> -> -               /*> -                * See the comment in intel_pmu_drain_pebs_core()> -                */> -               regs = *iregs;> -               regs.ip = at->ip;> -               regs.bp = at->bp;> -               regs.sp = at->sp;> -> -               if (intel_pmu_pebs_fixup_ip(&regs))> -                       regs.flags |= PERF_EFLAGS_EXACT;> -               else> -                       regs.flags &= ~PERF_EFLAGS_EXACT;> -> -               if (perf_event_overflow(event, 1, &data, &regs))> -                       x86_pmu_stop(event);> +               __intel_pmu_pebs_event(event, iregs, at);>        }>  }>> Index: linux-2.6/include/linux/perf_event.h> ===================================================================> --- linux-2.6.orig/include/linux/perf_event.h> +++ linux-2.6/include/linux/perf_event.h> @@ -125,8 +125,9 @@ enum perf_event_sample_format {>        PERF_SAMPLE_PERIOD                      = 1U << 8,>        PERF_SAMPLE_STREAM_ID                   = 1U << 9,>        PERF_SAMPLE_RAW                         = 1U << 10,> +       PERF_SAMPLE_EVENT_IP                    = 1U << 11,>> -       PERF_SAMPLE_MAX = 1U << 11,             /* non-ABI */> +       PERF_SAMPLE_MAX = 1U << 12,             /* non-ABI */>  };>>  /*> @@ -294,7 +295,6 @@ struct perf_event_mmap_page {>  #define PERF_RECORD_MISC_USER                  (2 << 0)>  #define PERF_RECORD_MISC_HYPERVISOR            (3 << 0)>> -#define PERF_RECORD_MISC_EXACT                 (1 << 14)>  /*>  * Reserve the last bit to indicate some extended misc field>  */> @@ -389,6 +389,7 @@ enum perf_event_type {>         *      struct perf_event_header        header;>         *>         *      { u64                   ip;       } && PERF_SAMPLE_IP> +        *      { u64                   event_ip; } && PERF_SAMPLE_EVENT_IP>         *      { u32                   pid, tid; } && PERF_SAMPLE_TID>         *      { u64                   time;     } && PERF_SAMPLE_TIME>         *      { u64                   addr;     } && PERF_SAMPLE_ADDR> @@ -804,6 +805,7 @@ struct perf_sample_data {>        u64                             type;>>        u64                             ip;> +       u64                             event_ip;>        struct {>                u32     pid;>                u32     tid;> @@ -824,6 +826,7 @@ struct perf_sample_data {>  static inline>  void perf_sample_data_init(struct perf_sample_data *data, u64 addr)>  {> +       data->event_ip = 0;>        data->addr = addr;>        data->raw  = NULL;>  }> Index: linux-2.6/kernel/perf_event.c> ===================================================================> --- linux-2.6.orig/kernel/perf_event.c> +++ linux-2.6/kernel/perf_event.c> @@ -3158,6 +3158,9 @@ void perf_output_sample(struct perf_outp>        if (sample_type & PERF_SAMPLE_IP)>                perf_output_put(handle, data->ip);>> +       if (sample_type & PERF_SAMPLE_EVENT_IP)> +               perf_output_put(handle, data->event_ip);> +>        if (sample_type & PERF_SAMPLE_TID)>                perf_output_put(handle, data->tid_entry);>> @@ -3237,6 +3240,9 @@ void perf_prepare_sample(struct perf_eve>                header->size += sizeof(data->ip);>        }>> +       if (sample_type & PERF_SAMPLE_EVENT_IP)> +               header->size += sizeof(data->event_ip);> +>        if (sample_type & PERF_SAMPLE_TID) {>                /* namespace issues */>                data->tid_entry.pid = perf_event_pid(event, current);> Index: linux-2.6/tools/perf/builtin-top.c> ===================================================================> --- linux-2.6.orig/tools/perf/builtin-top.c> +++ linux-2.6/tools/perf/builtin-top.c> @@ -975,8 +975,10 @@ static void event__process_sample(const>                return;>        }>> -       if (self->header.misc & PERF_RECORD_MISC_EXACT)> +       if (ip)>                exact_samples++;> +       else> +               return;>>        if (event__preprocess_sample(self, session, &al, symbol_filter) < 0 ||>            al.filtered)> @@ -1169,6 +1171,11 @@ static void start_counter(int i, int cou>>        attr->sample_type       = PERF_SAMPLE_IP | PERF_SAMPLE_TID;>> +       if (attr->precise) {> +               attr->sample_type &= ~PERF_SAMPLE_IP;> +               attr->sample_type |= PERF_SAMPLE_EVENT_IP;> +       }> +>        if (freq) {>                attr->sample_type       |= PERF_SAMPLE_PERIOD;>                attr->freq              = 1;>>>ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf_events: fix bogus warn_on(_once) in  perf_prepare_sample()
  2010-04-08 21:17         ` Frederic Weisbecker
@ 2010-04-08 21:22           ` Stephane Eranian
  2010-04-08 21:42             ` Frederic Weisbecker
  2010-04-08 21:46             ` Frederic Weisbecker
  0 siblings, 2 replies; 18+ messages in thread
From: Stephane Eranian @ 2010-04-08 21:22 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, linux-kernel, mingo, paulus, davem,
	robert.richter, perfmon2-devel, eranian

On Thu, Apr 8, 2010 at 11:17 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Thu, Apr 08, 2010 at 11:14:15PM +0200, Stephane Eranian wrote:
>> On Thu, Apr 8, 2010 at 11:11 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
>> >>
>> >> Are you suggesting you add some padding the PEBS raw sample you
>> >> return as PERF_SAMPLE_RAW? Then you need to define what RAW
>> >> actually means? Seems here, it would mean more than what the
>> >> HW returns.
>> >
>> > Well, RAW doesn't mean anything much at all, its really a fugly pass
>> > some crap around thing.
>> >
>> > So yeah, adding padding seems just fine.
>> >
>> I would rather see size as u64. Who's using raw today anyway?
>
>
> The trace events. Hence the size of the size shouldn't be touched, it
> is an ABI now.

Given your alignment constraints, it seems like it was a bad choice to pick
u32 for size to begin with.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf_events: fix bogus warn_on(_once) in  perf_prepare_sample()
  2010-04-08 21:14       ` Stephane Eranian
  2010-04-08 21:17         ` Frederic Weisbecker
@ 2010-04-08 21:23         ` Peter Zijlstra
  1 sibling, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2010-04-08 21:23 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
	perfmon2-devel, eranian

On Thu, 2010-04-08 at 23:14 +0200, Stephane Eranian wrote:
> On Thu, Apr 8, 2010 at 11:11 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
> >>
> >> Are you suggesting you add some padding the PEBS raw sample you
> >> return as PERF_SAMPLE_RAW? Then you need to define what RAW
> >> actually means? Seems here, it would mean more than what the
> >> HW returns.
> >
> > Well, RAW doesn't mean anything much at all, its really a fugly pass
> > some crap around thing.
> >
> > So yeah, adding padding seems just fine.
> >
> I would rather see size as u64. Who's using raw today anyway?

the tracepoint stuff does


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf_events: fix bogus warn_on(_once) in  perf_prepare_sample()
  2010-04-08 21:18     ` Stephane Eranian
@ 2010-04-08 21:24       ` Peter Zijlstra
  2010-04-08 21:33         ` Stephane Eranian
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2010-04-08 21:24 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
	perfmon2-devel, eranian

On Thu, 2010-04-08 at 23:18 +0200, Stephane Eranian wrote:
> I am not sure I understand what you mean by buffered PEBS. Are you talking
> about using PEBS buffer bigger than one entry?

Yep.

> If so, how can you do:
>   - the LBR based fixups for multiple samples (on PMU interrupt)

Not, so it would be the offset on.

>   - attribute PID/TID in system-wide mode

flush on context switches.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf_events: fix bogus warn_on(_once) in  perf_prepare_sample()
  2010-04-08 21:15     ` Peter Zijlstra
@ 2010-04-08 21:29       ` Stephane Eranian
  2010-04-08 21:42         ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Stephane Eranian @ 2010-04-08 21:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
	perfmon2-devel, eranian

On Thu, Apr 8, 2010 at 11:15 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
>> On Thu, Apr 8, 2010 at 10:55 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Thu, 2010-04-08 at 22:45 +0200, Stephane Eranian wrote:
>> >>       There is a warn_on_once() check for PERF_SAMPLE_RAW which trips
>> >>       when using PEBS on both Core and Nehalem. Core PEBS sample size is 144
>> >>       bytes and 176 bytes for Nehalem. Both are multiples of 8, but the size
>> >>       field is encoded as int, thus the total is never a multiple of 8 which
>> >>       trips the check. I think the size should have been u64, but now it is
>> >>       too late to change given it is ABI.
>> >
>> > PEBS hasn't seen -linus yet, so we can fix that.
>> >
>> Are you suggesting you add some padding the PEBS raw sample you
>> return as PERF_SAMPLE_RAW? Then you need to define what RAW
>> actually means? Seems here, it would mean more than what the
>> HW returns.
>
> The best fix here is to simply remove the PERF_SAMPLE_RAW support and
> implement PERF_SAMPLE_REGS. Its just that we need to come up with a way
> to deal with compat pt_regs muck.
>
>
But isn't RAW already part of the ABI?

It is true that you need to also provide the interrupted state (or
part of it) to the user, i.e.,
beyond just the IP.

But the interrupt state and PEBS state are distinct and should not be
swapped. In fact,
both may be useful at the same time to evaluate the skid.

So yes, you need PERF_SAMPLE_REGS. But just like your proposal with the two IPs.
I think you need to export two versions of REGS: IREGS and PREGS for
instance. The
issue is that PEBS does not record the whole state but only a very small subset.

Then, on Nehalem, there is PEBS more where you collect more that just
machine state,
you collect cache miss information. That's not regs anymore.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf_events: fix bogus warn_on(_once) in  perf_prepare_sample()
  2010-04-08 21:24       ` Peter Zijlstra
@ 2010-04-08 21:33         ` Stephane Eranian
  0 siblings, 0 replies; 18+ messages in thread
From: Stephane Eranian @ 2010-04-08 21:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
	perfmon2-devel, eranian

On Thu, Apr 8, 2010 at 11:24 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2010-04-08 at 23:18 +0200, Stephane Eranian wrote:
>> I am not sure I understand what you mean by buffered PEBS. Are you talking
>> about using PEBS buffer bigger than one entry?
>
> Yep.
>
>> If so, how can you do:
>>   - the LBR based fixups for multiple samples (on PMU interrupt)
>
> Not, so it would be the offset on.
>
Not sure I understand what you mean here.

>>   - attribute PID/TID in system-wide mode
>
> flush on context switches.
>
cost?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
  2010-04-08 21:22           ` Stephane Eranian
@ 2010-04-08 21:42             ` Frederic Weisbecker
  2010-04-08 21:46             ` Frederic Weisbecker
  1 sibling, 0 replies; 18+ messages in thread
From: Frederic Weisbecker @ 2010-04-08 21:42 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Peter Zijlstra, linux-kernel, mingo, paulus, davem,
	robert.richter, perfmon2-devel, eranian

On Thu, Apr 08, 2010 at 11:22:05PM +0200, Stephane Eranian wrote:
> On Thu, Apr 8, 2010 at 11:17 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > On Thu, Apr 08, 2010 at 11:14:15PM +0200, Stephane Eranian wrote:
> >> On Thu, Apr 8, 2010 at 11:11 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> > On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
> >> >>
> >> >> Are you suggesting you add some padding the PEBS raw sample you
> >> >> return as PERF_SAMPLE_RAW? Then you need to define what RAW
> >> >> actually means? Seems here, it would mean more than what the
> >> >> HW returns.
> >> >
> >> > Well, RAW doesn't mean anything much at all, its really a fugly pass
> >> > some crap around thing.
> >> >
> >> > So yeah, adding padding seems just fine.
> >> >
> >> I would rather see size as u64. Who's using raw today anyway?
> >
> >
> > The trace events. Hence the size of the size shouldn't be touched, it
> > is an ABI now.
> 
> Given your alignment constraints, it seems like it was a bad choice to pick
> u32 for size to begin with.


Indeed, I'm not exactly sure how this is dealt since this is indeed u32
and the buffer requires to align to u64...


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf_events: fix bogus warn_on(_once) in  perf_prepare_sample()
  2010-04-08 21:29       ` Stephane Eranian
@ 2010-04-08 21:42         ` Peter Zijlstra
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2010-04-08 21:42 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
	perfmon2-devel, eranian

On Thu, 2010-04-08 at 23:29 +0200, Stephane Eranian wrote:
> On Thu, Apr 8, 2010 at 11:15 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
> >> On Thu, Apr 8, 2010 at 10:55 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> > On Thu, 2010-04-08 at 22:45 +0200, Stephane Eranian wrote:
> >> >>       There is a warn_on_once() check for PERF_SAMPLE_RAW which trips
> >> >>       when using PEBS on both Core and Nehalem. Core PEBS sample size is 144
> >> >>       bytes and 176 bytes for Nehalem. Both are multiples of 8, but the size
> >> >>       field is encoded as int, thus the total is never a multiple of 8 which
> >> >>       trips the check. I think the size should have been u64, but now it is
> >> >>       too late to change given it is ABI.
> >> >
> >> > PEBS hasn't seen -linus yet, so we can fix that.
> >> >
> >> Are you suggesting you add some padding the PEBS raw sample you
> >> return as PERF_SAMPLE_RAW? Then you need to define what RAW
> >> actually means? Seems here, it would mean more than what the
> >> HW returns.
> >
> > The best fix here is to simply remove the PERF_SAMPLE_RAW support and
> > implement PERF_SAMPLE_REGS. Its just that we need to come up with a way
> > to deal with compat pt_regs muck.
> >
> >
> But isn't RAW already part of the ABI?

The existance of it, yes, the content less so.

> It is true that you need to also provide the interrupted state (or
> part of it) to the user, i.e., beyond just the IP.

Why?

The only thing I would do is maybe use the interrupt state to fill out
the PEBS reg data, its mostly things like segment regs that go missing,
and for those the interrupt state ought to be good enough.

> But the interrupt state and PEBS state are distinct and should not be
> swapped. In fact, both may be useful at the same time to evaluate the skid.

That doesn't seem like something a regular person would be interested
in, and for those who do they can easily hack the kernel.

I don't think that is a use-case worth expanding the ABI over.

> So yes, you need PERF_SAMPLE_REGS. But just like your proposal with the two IPs.
> I think you need to export two versions of REGS: IREGS and PREGS for
> instance. The issue is that PEBS does not record the whole state but only a very small subset.
> 
> Then, on Nehalem, there is PEBS more where you collect more that just
> machine state, you collect cache miss information. That's not regs anymore.

Yes, the whole load-latency train-wreck is something we need to come up
with a sensible interface for, that's definitely not something I would
do through RAW.




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
  2010-04-08 21:22           ` Stephane Eranian
  2010-04-08 21:42             ` Frederic Weisbecker
@ 2010-04-08 21:46             ` Frederic Weisbecker
  1 sibling, 0 replies; 18+ messages in thread
From: Frederic Weisbecker @ 2010-04-08 21:46 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Peter Zijlstra, linux-kernel, mingo, paulus, davem,
	robert.richter, perfmon2-devel, eranian

On Thu, Apr 08, 2010 at 11:22:05PM +0200, Stephane Eranian wrote:
> On Thu, Apr 8, 2010 at 11:17 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > On Thu, Apr 08, 2010 at 11:14:15PM +0200, Stephane Eranian wrote:
> >> On Thu, Apr 8, 2010 at 11:11 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> > On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
> >> >>
> >> >> Are you suggesting you add some padding the PEBS raw sample you
> >> >> return as PERF_SAMPLE_RAW? Then you need to define what RAW
> >> >> actually means? Seems here, it would mean more than what the
> >> >> HW returns.
> >> >
> >> > Well, RAW doesn't mean anything much at all, its really a fugly pass
> >> > some crap around thing.
> >> >
> >> > So yeah, adding padding seems just fine.
> >> >
> >> I would rather see size as u64. Who's using raw today anyway?
> >
> >
> > The trace events. Hence the size of the size shouldn't be touched, it
> > is an ABI now.
> 
> Given your alignment constraints, it seems like it was a bad choice to pick
> u32 for size to begin with.


Ah I remember now how we did that. We align the trace events raw sample size
such that sizeof(raw_sample) + sizeof(size) is aligned to u64.

Well, indeed that complexifies a bit the raw_sample size handling but at least
it makes the traces a bit more compact.


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2010-04-08 21:46 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-08 20:45 [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample() Stephane Eranian
2010-04-08 20:55 ` Peter Zijlstra
2010-04-08 21:03   ` Peter Zijlstra
2010-04-08 21:18     ` Stephane Eranian
2010-04-08 21:24       ` Peter Zijlstra
2010-04-08 21:33         ` Stephane Eranian
2010-04-08 21:08   ` Stephane Eranian
2010-04-08 21:11     ` Peter Zijlstra
2010-04-08 21:14       ` Stephane Eranian
2010-04-08 21:17         ` Frederic Weisbecker
2010-04-08 21:22           ` Stephane Eranian
2010-04-08 21:42             ` Frederic Weisbecker
2010-04-08 21:46             ` Frederic Weisbecker
2010-04-08 21:23         ` Peter Zijlstra
2010-04-08 21:12     ` Peter Zijlstra
2010-04-08 21:15     ` Peter Zijlstra
2010-04-08 21:29       ` Stephane Eranian
2010-04-08 21:42         ` Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.