* [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
@ 2010-04-08 20:45 Stephane Eranian
2010-04-08 20:55 ` Peter Zijlstra
0 siblings, 1 reply; 18+ messages in thread
From: Stephane Eranian @ 2010-04-08 20:45 UTC (permalink / raw)
To: linux-kernel
Cc: peterz, mingo, paulus, davem, fweisbec, robert.richter,
perfmon2-devel, eranian, eranian
There is a warn_on_once() check for PERF_SAMPLE_RAW which trips
when using PEBS on both Core and Nehalem. Core PEBS sample size is 144
bytes and 176 bytes for Nehalem. Both are multiples of 8, but the size
field is encoded as int, thus the total is never a multiple of 8 which
trips the check. I think the size should have been u64, but now it is
too late to change given it is ABI.
Signed-off-by: Stephane Eranian <eranian@google.com>
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 8143e77..fffeb95 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -3311,7 +3311,6 @@ void perf_prepare_sample(struct perf_event_header *header,
else
size += sizeof(u32);
- WARN_ON_ONCE(size & (sizeof(u64)-1));
header->size += size;
}
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
2010-04-08 20:45 [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample() Stephane Eranian
@ 2010-04-08 20:55 ` Peter Zijlstra
2010-04-08 21:03 ` Peter Zijlstra
2010-04-08 21:08 ` Stephane Eranian
0 siblings, 2 replies; 18+ messages in thread
From: Peter Zijlstra @ 2010-04-08 20:55 UTC (permalink / raw)
To: eranian
Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
perfmon2-devel, eranian
On Thu, 2010-04-08 at 22:45 +0200, Stephane Eranian wrote:
> There is a warn_on_once() check for PERF_SAMPLE_RAW which trips
> when using PEBS on both Core and Nehalem. Core PEBS sample size is 144
> bytes and 176 bytes for Nehalem. Both are multiples of 8, but the size
> field is encoded as int, thus the total is never a multiple of 8 which
> trips the check. I think the size should have been u64, but now it is
> too late to change given it is ABI.
PEBS hasn't seen -linus yet, so we can fix that.
There's various things that do indeed rely on the perf buffer to always
be u64 aligned, so this warning isn't bogus at all.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
2010-04-08 20:55 ` Peter Zijlstra
@ 2010-04-08 21:03 ` Peter Zijlstra
2010-04-08 21:18 ` Stephane Eranian
2010-04-08 21:08 ` Stephane Eranian
1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2010-04-08 21:03 UTC (permalink / raw)
To: eranian
Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
perfmon2-devel, eranian
On Thu, 2010-04-08 at 22:55 +0200, Peter Zijlstra wrote:
> On Thu, 2010-04-08 at 22:45 +0200, Stephane Eranian wrote:
> > There is a warn_on_once() check for PERF_SAMPLE_RAW which trips
> > when using PEBS on both Core and Nehalem. Core PEBS sample size is 144
> > bytes and 176 bytes for Nehalem. Both are multiples of 8, but the size
> > field is encoded as int, thus the total is never a multiple of 8 which
> > trips the check. I think the size should have been u64, but now it is
> > too late to change given it is ABI.
>
> PEBS hasn't seen -linus yet, so we can fix that.
>
> There's various things that do indeed rely on the perf buffer to always
> be u64 aligned, so this warning isn't bogus at all.
On the subject of PEBS, we need to change the ABI before it does hit
-linus, I've got something like the below,. but I'm not quite sure of
it.
We could keep the PERF_RECORD_MISC_EXACT bit and allow non-fixed up IPs
in PERF_SAMPLE_EVENT_IP fields, that way we can avoid having to add both
IP and EVENT_IP to samples.
---
Subject: perf: Change the PEBS interface
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Tue Mar 30 13:35:58 CEST 2010
This patch changes the PEBS interface from perf_event_attr::precise
and PERF_RECORD_MISC_EXACT to perf_event_attr::precise and
PERF_SAMPLE_EVENT_IP.
The rationale is that .precise=1 !PERF_SAMPLE_EVENT_IP could be used
to implement buffered PEBS.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
---
arch/x86/include/asm/perf_event.h | 9 --
arch/x86/kernel/cpu/perf_event_intel_ds.c | 133 +++++++++++++-----------------
include/linux/perf_event.h | 7 +
kernel/perf_event.c | 6 +
tools/perf/builtin-top.c | 9 +-
5 files changed, 79 insertions(+), 85 deletions(-)
Index: linux-2.6/arch/x86/include/asm/perf_event.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/perf_event.h
+++ linux-2.6/arch/x86/include/asm/perf_event.h
@@ -128,21 +128,12 @@ extern void perf_events_lapic_init(void)
#define PERF_EVENT_INDEX_OFFSET 0
-/*
- * Abuse bit 3 of the cpu eflags register to indicate proper PEBS IP fixups.
- * This flag is otherwise unused and ABI specified to be 0, so nobody should
- * care what we do with it.
- */
-#define PERF_EFLAGS_EXACT (1UL << 3)
-
#define perf_misc_flags(regs) \
({ int misc = 0; \
if (user_mode(regs)) \
misc |= PERF_RECORD_MISC_USER; \
else \
misc |= PERF_RECORD_MISC_KERNEL; \
- if (regs->flags & PERF_EFLAGS_EXACT) \
- misc |= PERF_RECORD_MISC_EXACT; \
misc; })
#define perf_instruction_pointer(regs) ((regs)->ip)
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel_ds.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -330,7 +330,8 @@ static void intel_pmu_pebs_enable(struct
cpuc->pebs_enabled |= 1ULL << hwc->idx;
WARN_ON_ONCE(cpuc->enabled);
- if (x86_pmu.intel_cap.pebs_trap)
+ if (x86_pmu.intel_cap.pebs_trap &&
+ (event->attr.sample_type & PERF_SAMPLE_EVENT_IP))
intel_pmu_lbr_enable(event);
}
@@ -345,7 +346,8 @@ static void intel_pmu_pebs_disable(struc
hwc->config |= ARCH_PERFMON_EVENTSEL_INT;
- if (x86_pmu.intel_cap.pebs_trap)
+ if (x86_pmu.intel_cap.pebs_trap &&
+ (event->attr.sample_type & PERF_SAMPLE_EVENT_IP))
intel_pmu_lbr_disable(event);
}
@@ -376,12 +378,12 @@ static inline bool kernel_ip(unsigned lo
#endif
}
-static int intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
+static int intel_pmu_pebs_fixup_ip(unsigned long *ipp)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
unsigned long from = cpuc->lbr_entries[0].from;
unsigned long old_to, to = cpuc->lbr_entries[0].to;
- unsigned long ip = regs->ip;
+ unsigned long ip = *ipp;
/*
* We don't need to fixup if the PEBS assist is fault like
@@ -412,7 +414,7 @@ static int intel_pmu_pebs_fixup_ip(struc
* We sampled a branch insn, rewind using the LBR stack
*/
if (ip == to) {
- regs->ip = from;
+ *ipp = from;
return 1;
}
@@ -439,7 +441,7 @@ static int intel_pmu_pebs_fixup_ip(struc
} while (to < ip);
if (to == ip) {
- regs->ip = old_to;
+ *ipp = old_to;
return 1;
}
@@ -452,15 +454,62 @@ static int intel_pmu_pebs_fixup_ip(struc
static int intel_pmu_save_and_restart(struct perf_event *event);
+static void __intel_pmu_pebs_event(struct perf_event *event,
+ struct pt_regs *iregs, void *__pebs)
+{
+ /*
+ * We cast to pebs_record_core since that is a subset of
+ * both formats and we don't use the other fields in this
+ * routine.
+ */
+ struct pebs_record_core *pebs = __pebs;
+ struct perf_sample_data data;
+ struct perf_raw_record raw;
+ struct pt_regs regs;
+
+ if (!intel_pmu_save_and_restart(event))
+ return;
+
+ perf_sample_data_init(&data, 0);
+ data.period = event->hw.last_period;
+
+ if (event->attr.sample_type & PERF_SAMPLE_RAW) {
+ raw.size = x86_pmu.pebs_record_size;
+ raw.data = pebs;
+ data.raw = &raw;
+ }
+
+ /*
+ * We use the interrupt regs as a base because the PEBS record
+ * does not contain a full regs set, specifically it seems to
+ * lack segment descriptors, which get used by things like
+ * user_mode().
+ *
+ * In the simple case fix up only the IP and BP,SP regs, for
+ * PERF_SAMPLE_IP and PERF_SAMPLE_CALLCHAIN to function properly.
+ * A possible PERF_SAMPLE_REGS will have to transfer all regs.
+ */
+ regs = *iregs;
+ regs.ip = pebs->ip;
+ regs.bp = pebs->bp;
+ regs.sp = pebs->sp;
+
+ if (event->attr.sample_type & PERF_SAMPLE_EVENT_IP) {
+ unsigned long event_ip = pebs->ip;
+ if (intel_pmu_pebs_fixup_ip(&event_ip))
+ data.event_ip = event_ip;
+ }
+
+ if (perf_event_overflow(event, 1, &data, ®s))
+ x86_pmu_stop(event);
+}
+
static void intel_pmu_drain_pebs_core(struct pt_regs *iregs)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
struct debug_store *ds = cpuc->ds;
struct perf_event *event = cpuc->events[0]; /* PMC0 only */
struct pebs_record_core *at, *top;
- struct perf_sample_data data;
- struct perf_raw_record raw;
- struct pt_regs regs;
int n;
if (!ds || !x86_pmu.pebs)
@@ -486,9 +535,6 @@ static void intel_pmu_drain_pebs_core(st
if (n <= 0)
return;
- if (!intel_pmu_save_and_restart(event))
- return;
-
/*
* Should not happen, we program the threshold at 1 and do not
* set a reset value.
@@ -496,37 +542,7 @@ static void intel_pmu_drain_pebs_core(st
WARN_ON_ONCE(n > 1);
at += n - 1;
- perf_sample_data_init(&data, 0);
- data.period = event->hw.last_period;
-
- if (event->attr.sample_type & PERF_SAMPLE_RAW) {
- raw.size = x86_pmu.pebs_record_size;
- raw.data = at;
- data.raw = &raw;
- }
-
- /*
- * We use the interrupt regs as a base because the PEBS record
- * does not contain a full regs set, specifically it seems to
- * lack segment descriptors, which get used by things like
- * user_mode().
- *
- * In the simple case fix up only the IP and BP,SP regs, for
- * PERF_SAMPLE_IP and PERF_SAMPLE_CALLCHAIN to function properly.
- * A possible PERF_SAMPLE_REGS will have to transfer all regs.
- */
- regs = *iregs;
- regs.ip = at->ip;
- regs.bp = at->bp;
- regs.sp = at->sp;
-
- if (intel_pmu_pebs_fixup_ip(®s))
- regs.flags |= PERF_EFLAGS_EXACT;
- else
- regs.flags &= ~PERF_EFLAGS_EXACT;
-
- if (perf_event_overflow(event, 1, &data, ®s))
- x86_pmu_stop(event);
+ __intel_pmu_pebs_event(event, iregs, at);
}
static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
@@ -534,10 +550,7 @@ static void intel_pmu_drain_pebs_nhm(str
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
struct debug_store *ds = cpuc->ds;
struct pebs_record_nhm *at, *top;
- struct perf_sample_data data;
struct perf_event *event = NULL;
- struct perf_raw_record raw;
- struct pt_regs regs;
u64 status = 0;
int bit, n;
@@ -579,33 +592,7 @@ static void intel_pmu_drain_pebs_nhm(str
if (!event || bit >= MAX_PEBS_EVENTS)
continue;
- if (!intel_pmu_save_and_restart(event))
- continue;
-
- perf_sample_data_init(&data, 0);
- data.period = event->hw.last_period;
-
- if (event->attr.sample_type & PERF_SAMPLE_RAW) {
- raw.size = x86_pmu.pebs_record_size;
- raw.data = at;
- data.raw = &raw;
- }
-
- /*
- * See the comment in intel_pmu_drain_pebs_core()
- */
- regs = *iregs;
- regs.ip = at->ip;
- regs.bp = at->bp;
- regs.sp = at->sp;
-
- if (intel_pmu_pebs_fixup_ip(®s))
- regs.flags |= PERF_EFLAGS_EXACT;
- else
- regs.flags &= ~PERF_EFLAGS_EXACT;
-
- if (perf_event_overflow(event, 1, &data, ®s))
- x86_pmu_stop(event);
+ __intel_pmu_pebs_event(event, iregs, at);
}
}
Index: linux-2.6/include/linux/perf_event.h
===================================================================
--- linux-2.6.orig/include/linux/perf_event.h
+++ linux-2.6/include/linux/perf_event.h
@@ -125,8 +125,9 @@ enum perf_event_sample_format {
PERF_SAMPLE_PERIOD = 1U << 8,
PERF_SAMPLE_STREAM_ID = 1U << 9,
PERF_SAMPLE_RAW = 1U << 10,
+ PERF_SAMPLE_EVENT_IP = 1U << 11,
- PERF_SAMPLE_MAX = 1U << 11, /* non-ABI */
+ PERF_SAMPLE_MAX = 1U << 12, /* non-ABI */
};
/*
@@ -294,7 +295,6 @@ struct perf_event_mmap_page {
#define PERF_RECORD_MISC_USER (2 << 0)
#define PERF_RECORD_MISC_HYPERVISOR (3 << 0)
-#define PERF_RECORD_MISC_EXACT (1 << 14)
/*
* Reserve the last bit to indicate some extended misc field
*/
@@ -389,6 +389,7 @@ enum perf_event_type {
* struct perf_event_header header;
*
* { u64 ip; } && PERF_SAMPLE_IP
+ * { u64 event_ip; } && PERF_SAMPLE_EVENT_IP
* { u32 pid, tid; } && PERF_SAMPLE_TID
* { u64 time; } && PERF_SAMPLE_TIME
* { u64 addr; } && PERF_SAMPLE_ADDR
@@ -804,6 +805,7 @@ struct perf_sample_data {
u64 type;
u64 ip;
+ u64 event_ip;
struct {
u32 pid;
u32 tid;
@@ -824,6 +826,7 @@ struct perf_sample_data {
static inline
void perf_sample_data_init(struct perf_sample_data *data, u64 addr)
{
+ data->event_ip = 0;
data->addr = addr;
data->raw = NULL;
}
Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -3158,6 +3158,9 @@ void perf_output_sample(struct perf_outp
if (sample_type & PERF_SAMPLE_IP)
perf_output_put(handle, data->ip);
+ if (sample_type & PERF_SAMPLE_EVENT_IP)
+ perf_output_put(handle, data->event_ip);
+
if (sample_type & PERF_SAMPLE_TID)
perf_output_put(handle, data->tid_entry);
@@ -3237,6 +3240,9 @@ void perf_prepare_sample(struct perf_eve
header->size += sizeof(data->ip);
}
+ if (sample_type & PERF_SAMPLE_EVENT_IP)
+ header->size += sizeof(data->event_ip);
+
if (sample_type & PERF_SAMPLE_TID) {
/* namespace issues */
data->tid_entry.pid = perf_event_pid(event, current);
Index: linux-2.6/tools/perf/builtin-top.c
===================================================================
--- linux-2.6.orig/tools/perf/builtin-top.c
+++ linux-2.6/tools/perf/builtin-top.c
@@ -975,8 +975,10 @@ static void event__process_sample(const
return;
}
- if (self->header.misc & PERF_RECORD_MISC_EXACT)
+ if (ip)
exact_samples++;
+ else
+ return;
if (event__preprocess_sample(self, session, &al, symbol_filter) < 0 ||
al.filtered)
@@ -1169,6 +1171,11 @@ static void start_counter(int i, int cou
attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID;
+ if (attr->precise) {
+ attr->sample_type &= ~PERF_SAMPLE_IP;
+ attr->sample_type |= PERF_SAMPLE_EVENT_IP;
+ }
+
if (freq) {
attr->sample_type |= PERF_SAMPLE_PERIOD;
attr->freq = 1;
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
2010-04-08 20:55 ` Peter Zijlstra
2010-04-08 21:03 ` Peter Zijlstra
@ 2010-04-08 21:08 ` Stephane Eranian
2010-04-08 21:11 ` Peter Zijlstra
` (2 more replies)
1 sibling, 3 replies; 18+ messages in thread
From: Stephane Eranian @ 2010-04-08 21:08 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
perfmon2-devel, eranian
On Thu, Apr 8, 2010 at 10:55 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2010-04-08 at 22:45 +0200, Stephane Eranian wrote:
>> There is a warn_on_once() check for PERF_SAMPLE_RAW which trips
>> when using PEBS on both Core and Nehalem. Core PEBS sample size is 144
>> bytes and 176 bytes for Nehalem. Both are multiples of 8, but the size
>> field is encoded as int, thus the total is never a multiple of 8 which
>> trips the check. I think the size should have been u64, but now it is
>> too late to change given it is ABI.
>
> PEBS hasn't seen -linus yet, so we can fix that.
>
Are you suggesting you add some padding the PEBS raw sample you
return as PERF_SAMPLE_RAW? Then you need to define what RAW
actually means? Seems here, it would mean more than what the
HW returns.
> There's various things that do indeed rely on the perf buffer to always
> be u64 aligned, so this warning isn't bogus at all.
>
I assume this has to do with the wrap-around detection.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
2010-04-08 21:08 ` Stephane Eranian
@ 2010-04-08 21:11 ` Peter Zijlstra
2010-04-08 21:14 ` Stephane Eranian
2010-04-08 21:12 ` Peter Zijlstra
2010-04-08 21:15 ` Peter Zijlstra
2 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2010-04-08 21:11 UTC (permalink / raw)
To: Stephane Eranian
Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
perfmon2-devel, eranian
On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
>
> Are you suggesting you add some padding the PEBS raw sample you
> return as PERF_SAMPLE_RAW? Then you need to define what RAW
> actually means? Seems here, it would mean more than what the
> HW returns.
Well, RAW doesn't mean anything much at all, its really a fugly pass
some crap around thing.
So yeah, adding padding seems just fine.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
2010-04-08 21:08 ` Stephane Eranian
2010-04-08 21:11 ` Peter Zijlstra
@ 2010-04-08 21:12 ` Peter Zijlstra
2010-04-08 21:15 ` Peter Zijlstra
2 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2010-04-08 21:12 UTC (permalink / raw)
To: Stephane Eranian
Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
perfmon2-devel, eranian
On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
>
> > There's various things that do indeed rely on the perf buffer to always
> > be u64 aligned, so this warning isn't bogus at all.
> >
> I assume this has to do with the wrap-around detection.
Nah, mostly dealing with architectures that don't do well with unaligned
accesses.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
2010-04-08 21:11 ` Peter Zijlstra
@ 2010-04-08 21:14 ` Stephane Eranian
2010-04-08 21:17 ` Frederic Weisbecker
2010-04-08 21:23 ` Peter Zijlstra
0 siblings, 2 replies; 18+ messages in thread
From: Stephane Eranian @ 2010-04-08 21:14 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
perfmon2-devel, eranian
On Thu, Apr 8, 2010 at 11:11 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
>>
>> Are you suggesting you add some padding the PEBS raw sample you
>> return as PERF_SAMPLE_RAW? Then you need to define what RAW
>> actually means? Seems here, it would mean more than what the
>> HW returns.
>
> Well, RAW doesn't mean anything much at all, its really a fugly pass
> some crap around thing.
>
> So yeah, adding padding seems just fine.
>
I would rather see size as u64. Who's using raw today anyway?
The PEBS record itself is always a multiple of u64. The size header is the one
causing the problems.
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
2010-04-08 21:08 ` Stephane Eranian
2010-04-08 21:11 ` Peter Zijlstra
2010-04-08 21:12 ` Peter Zijlstra
@ 2010-04-08 21:15 ` Peter Zijlstra
2010-04-08 21:29 ` Stephane Eranian
2 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2010-04-08 21:15 UTC (permalink / raw)
To: Stephane Eranian
Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
perfmon2-devel, eranian
On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
> On Thu, Apr 8, 2010 at 10:55 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, 2010-04-08 at 22:45 +0200, Stephane Eranian wrote:
> >> There is a warn_on_once() check for PERF_SAMPLE_RAW which trips
> >> when using PEBS on both Core and Nehalem. Core PEBS sample size is 144
> >> bytes and 176 bytes for Nehalem. Both are multiples of 8, but the size
> >> field is encoded as int, thus the total is never a multiple of 8 which
> >> trips the check. I think the size should have been u64, but now it is
> >> too late to change given it is ABI.
> >
> > PEBS hasn't seen -linus yet, so we can fix that.
> >
> Are you suggesting you add some padding the PEBS raw sample you
> return as PERF_SAMPLE_RAW? Then you need to define what RAW
> actually means? Seems here, it would mean more than what the
> HW returns.
The best fix here is to simply remove the PERF_SAMPLE_RAW support and
implement PERF_SAMPLE_REGS. Its just that we need to come up with a way
to deal with compat pt_regs muck.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
2010-04-08 21:14 ` Stephane Eranian
@ 2010-04-08 21:17 ` Frederic Weisbecker
2010-04-08 21:22 ` Stephane Eranian
2010-04-08 21:23 ` Peter Zijlstra
1 sibling, 1 reply; 18+ messages in thread
From: Frederic Weisbecker @ 2010-04-08 21:17 UTC (permalink / raw)
To: Stephane Eranian
Cc: Peter Zijlstra, linux-kernel, mingo, paulus, davem,
robert.richter, perfmon2-devel, eranian
On Thu, Apr 08, 2010 at 11:14:15PM +0200, Stephane Eranian wrote:
> On Thu, Apr 8, 2010 at 11:11 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
> >>
> >> Are you suggesting you add some padding the PEBS raw sample you
> >> return as PERF_SAMPLE_RAW? Then you need to define what RAW
> >> actually means? Seems here, it would mean more than what the
> >> HW returns.
> >
> > Well, RAW doesn't mean anything much at all, its really a fugly pass
> > some crap around thing.
> >
> > So yeah, adding padding seems just fine.
> >
> I would rather see size as u64. Who's using raw today anyway?
The trace events. Hence the size of the size shouldn't be touched, it
is an ABI now.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
2010-04-08 21:03 ` Peter Zijlstra
@ 2010-04-08 21:18 ` Stephane Eranian
2010-04-08 21:24 ` Peter Zijlstra
0 siblings, 1 reply; 18+ messages in thread
From: Stephane Eranian @ 2010-04-08 21:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
perfmon2-devel, eranian
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 16495 bytes --]
On Thu, Apr 8, 2010 at 11:03 PM, Peter Zijlstra <peterz@infradead.org> wrote:> On Thu, 2010-04-08 at 22:55 +0200, Peter Zijlstra wrote:>> On Thu, 2010-04-08 at 22:45 +0200, Stephane Eranian wrote:>> > Â Â There is a warn_on_once() check for PERF_SAMPLE_RAW which trips>> > Â Â when using PEBS on both Core and Nehalem. Core PEBS sample size is 144>> > Â Â bytes and 176 bytes for Nehalem. Both are multiples of 8, but the size>> > Â Â field is encoded as int, thus the total is never a multiple of 8 which>> > Â Â trips the check. I think the size should have been u64, but now it is>> > Â Â too late to change given it is ABI.>>>> PEBS hasn't seen -linus yet, so we can fix that.>>>> There's various things that do indeed rely on the perf buffer to always>> be u64 aligned, so this warning isn't bogus at all.>> On the subject of PEBS, we need to change the ABI before it does hit> -linus, I've got something like the below,. but I'm not quite sure of> it.>> We could keep the PERF_RECORD_MISC_EXACT bit and allow non-fixed up IPs> in PERF_SAMPLE_EVENT_IP fields, that way we can avoid having to add both> IP and EVENT_IP to samples.>You always have the original IP in the PEBS sample, so why add yet another wayof getting to the same data?
> ---> Subject: perf: Change the PEBS interface> From: Peter Zijlstra <a.p.zijlstra@chello.nl>> Date: Tue Mar 30 13:35:58 CEST 2010>> This patch changes the PEBS interface from perf_event_attr::precise> and PERF_RECORD_MISC_EXACT to perf_event_attr::precise and> PERF_SAMPLE_EVENT_IP.>> The rationale is that .precise=1 !PERF_SAMPLE_EVENT_IP could be used> to implement buffered PEBS.>I am not sure I understand what you mean by buffered PEBS. Are you talkingabout using PEBS buffer bigger than one entry?If so, how can you do: - the LBR based fixups for multiple samples (on PMU interrupt) - attribute PID/TID in system-wide mode
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>> LKML-Reference: <new-submission>> --->  arch/x86/include/asm/perf_event.h     |   9 -->  arch/x86/kernel/cpu/perf_event_intel_ds.c |  133 +++++++++++++----------------->  include/linux/perf_event.h         |   7 +>  kernel/perf_event.c            |   6 +>  tools/perf/builtin-top.c          |   9 +->  5 files changed, 79 insertions(+), 85 deletions(-)>> Index: linux-2.6/arch/x86/include/asm/perf_event.h> ===================================================================> --- linux-2.6.orig/arch/x86/include/asm/perf_event.h> +++ linux-2.6/arch/x86/include/asm/perf_event.h> @@ -128,21 +128,12 @@ extern void perf_events_lapic_init(void)>>  #define PERF_EVENT_INDEX_OFFSET             0>> -/*> - * Abuse bit 3 of the cpu eflags register to indicate proper PEBS IP fixups.> - * This flag is otherwise unused and ABI specified to be 0, so nobody should> - * care what we do with it.> - */> -#define PERF_EFLAGS_EXACT    (1UL << 3)> ->  #define perf_misc_flags(regs)              \>  ({   int misc = 0;                  \>     if (user_mode(regs))               \>         misc |= PERF_RECORD_MISC_USER;      \>     else                       \>         misc |= PERF_RECORD_MISC_KERNEL;     \> -    if (regs->flags & PERF_EFLAGS_EXACT)       \> -        misc |= PERF_RECORD_MISC_EXACT;     \>     misc; })>>  #define perf_instruction_pointer(regs) ((regs)->ip)> Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel_ds.c> ===================================================================> --- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel_ds.c> +++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel_ds.c> @@ -330,7 +330,8 @@ static void intel_pmu_pebs_enable(struct>     cpuc->pebs_enabled |= 1ULL << hwc->idx;>     WARN_ON_ONCE(cpuc->enabled);>> -    if (x86_pmu.intel_cap.pebs_trap)> +    if (x86_pmu.intel_cap.pebs_trap &&> +            (event->attr.sample_type & PERF_SAMPLE_EVENT_IP))>         intel_pmu_lbr_enable(event);>  }>> @@ -345,7 +346,8 @@ static void intel_pmu_pebs_disable(struc>>     hwc->config |= ARCH_PERFMON_EVENTSEL_INT;>> -    if (x86_pmu.intel_cap.pebs_trap)> +    if (x86_pmu.intel_cap.pebs_trap &&> +            (event->attr.sample_type & PERF_SAMPLE_EVENT_IP))>         intel_pmu_lbr_disable(event);>  }>> @@ -376,12 +378,12 @@ static inline bool kernel_ip(unsigned lo>  #endif>  }>> -static int intel_pmu_pebs_fixup_ip(struct pt_regs *regs)> +static int intel_pmu_pebs_fixup_ip(unsigned long *ipp)>  {>     struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);>     unsigned long from = cpuc->lbr_entries[0].from;>     unsigned long old_to, to = cpuc->lbr_entries[0].to;> -    unsigned long ip = regs->ip;> +    unsigned long ip = *ipp;>>     /*>     * We don't need to fixup if the PEBS assist is fault like> @@ -412,7 +414,7 @@ static int intel_pmu_pebs_fixup_ip(struc>     * We sampled a branch insn, rewind using the LBR stack>     */>     if (ip == to) {> -        regs->ip = from;> +        *ipp = from;>         return 1;>     }>> @@ -439,7 +441,7 @@ static int intel_pmu_pebs_fixup_ip(struc>     } while (to < ip);>>     if (to == ip) {> -        regs->ip = old_to;> +        *ipp = old_to;>         return 1;>     }>> @@ -452,15 +454,62 @@ static int intel_pmu_pebs_fixup_ip(struc>>  static int intel_pmu_save_and_restart(struct perf_event *event);>> +static void __intel_pmu_pebs_event(struct perf_event *event,> +                  struct pt_regs *iregs, void *__pebs)> +{> +    /*> +     * We cast to pebs_record_core since that is a subset of> +     * both formats and we don't use the other fields in this> +     * routine.> +     */> +    struct pebs_record_core *pebs = __pebs;> +    struct perf_sample_data data;> +    struct perf_raw_record raw;> +    struct pt_regs regs;> +> +    if (!intel_pmu_save_and_restart(event))> +        return;> +> +    perf_sample_data_init(&data, 0);> +    data.period = event->hw.last_period;> +> +    if (event->attr.sample_type & PERF_SAMPLE_RAW) {> +        raw.size = x86_pmu.pebs_record_size;> +        raw.data = pebs;> +        data.raw = &raw;> +    }> +> +    /*> +     * We use the interrupt regs as a base because the PEBS record> +     * does not contain a full regs set, specifically it seems to> +     * lack segment descriptors, which get used by things like> +     * user_mode().> +     *> +     * In the simple case fix up only the IP and BP,SP regs, for> +     * PERF_SAMPLE_IP and PERF_SAMPLE_CALLCHAIN to function properly.> +     * A possible PERF_SAMPLE_REGS will have to transfer all regs.> +     */> +    regs = *iregs;> +    regs.ip = pebs->ip;> +    regs.bp = pebs->bp;> +    regs.sp = pebs->sp;> +> +    if (event->attr.sample_type & PERF_SAMPLE_EVENT_IP) {> +        unsigned long event_ip = pebs->ip;> +        if (intel_pmu_pebs_fixup_ip(&event_ip))> +            data.event_ip = event_ip;> +    }> +> +    if (perf_event_overflow(event, 1, &data, ®s))> +        x86_pmu_stop(event);> +}> +>  static void intel_pmu_drain_pebs_core(struct pt_regs *iregs)>  {>     struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);>     struct debug_store *ds = cpuc->ds;>     struct perf_event *event = cpuc->events[0]; /* PMC0 only */>     struct pebs_record_core *at, *top;> -    struct perf_sample_data data;> -    struct perf_raw_record raw;> -    struct pt_regs regs;>     int n;>>     if (!ds || !x86_pmu.pebs)> @@ -486,9 +535,6 @@ static void intel_pmu_drain_pebs_core(st>     if (n <= 0)>         return;>> -    if (!intel_pmu_save_and_restart(event))> -        return;> ->     /*>     * Should not happen, we program the threshold at 1 and do not>     * set a reset value.> @@ -496,37 +542,7 @@ static void intel_pmu_drain_pebs_core(st>     WARN_ON_ONCE(n > 1);>     at += n - 1;>> -    perf_sample_data_init(&data, 0);> -    data.period = event->hw.last_period;> -> -    if (event->attr.sample_type & PERF_SAMPLE_RAW) {> -        raw.size = x86_pmu.pebs_record_size;> -        raw.data = at;> -        data.raw = &raw;> -    }> -> -    /*> -     * We use the interrupt regs as a base because the PEBS record> -     * does not contain a full regs set, specifically it seems to> -     * lack segment descriptors, which get used by things like> -     * user_mode().> -     *> -     * In the simple case fix up only the IP and BP,SP regs, for> -     * PERF_SAMPLE_IP and PERF_SAMPLE_CALLCHAIN to function properly.> -     * A possible PERF_SAMPLE_REGS will have to transfer all regs.> -     */> -    regs = *iregs;> -    regs.ip = at->ip;> -    regs.bp = at->bp;> -    regs.sp = at->sp;> -> -    if (intel_pmu_pebs_fixup_ip(®s))> -        regs.flags |= PERF_EFLAGS_EXACT;> -    else> -        regs.flags &= ~PERF_EFLAGS_EXACT;> -> -    if (perf_event_overflow(event, 1, &data, ®s))> -        x86_pmu_stop(event);> +    __intel_pmu_pebs_event(event, iregs, at);>  }>>  static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)> @@ -534,10 +550,7 @@ static void intel_pmu_drain_pebs_nhm(str>     struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);>     struct debug_store *ds = cpuc->ds;>     struct pebs_record_nhm *at, *top;> -    struct perf_sample_data data;>     struct perf_event *event = NULL;> -    struct perf_raw_record raw;> -    struct pt_regs regs;>     u64 status = 0;>     int bit, n;>> @@ -579,33 +592,7 @@ static void intel_pmu_drain_pebs_nhm(str>         if (!event || bit >= MAX_PEBS_EVENTS)>             continue;>> -        if (!intel_pmu_save_and_restart(event))> -            continue;> -> -        perf_sample_data_init(&data, 0);> -        data.period = event->hw.last_period;> -> -        if (event->attr.sample_type & PERF_SAMPLE_RAW) {> -            raw.size = x86_pmu.pebs_record_size;> -            raw.data = at;> -            data.raw = &raw;> -        }> -> -        /*> -         * See the comment in intel_pmu_drain_pebs_core()> -         */> -        regs = *iregs;> -        regs.ip = at->ip;> -        regs.bp = at->bp;> -        regs.sp = at->sp;> -> -        if (intel_pmu_pebs_fixup_ip(®s))> -            regs.flags |= PERF_EFLAGS_EXACT;> -        else> -            regs.flags &= ~PERF_EFLAGS_EXACT;> -> -        if (perf_event_overflow(event, 1, &data, ®s))> -            x86_pmu_stop(event);> +        __intel_pmu_pebs_event(event, iregs, at);>     }>  }>> Index: linux-2.6/include/linux/perf_event.h> ===================================================================> --- linux-2.6.orig/include/linux/perf_event.h> +++ linux-2.6/include/linux/perf_event.h> @@ -125,8 +125,9 @@ enum perf_event_sample_format {>     PERF_SAMPLE_PERIOD            = 1U << 8,>     PERF_SAMPLE_STREAM_ID          = 1U << 9,>     PERF_SAMPLE_RAW             = 1U << 10,> +    PERF_SAMPLE_EVENT_IP           = 1U << 11,>> -    PERF_SAMPLE_MAX = 1U << 11,       /* non-ABI */> +    PERF_SAMPLE_MAX = 1U << 12,       /* non-ABI */>  };>>  /*> @@ -294,7 +295,6 @@ struct perf_event_mmap_page {>  #define PERF_RECORD_MISC_USER          (2 << 0)>  #define PERF_RECORD_MISC_HYPERVISOR       (3 << 0)>> -#define PERF_RECORD_MISC_EXACT         (1 << 14)>  /*>  * Reserve the last bit to indicate some extended misc field>  */> @@ -389,6 +389,7 @@ enum perf_event_type {>     *    struct perf_event_header     header;>     *>     *    { u64          ip;    } && PERF_SAMPLE_IP> +     *    { u64          event_ip; } && PERF_SAMPLE_EVENT_IP>     *    { u32          pid, tid; } && PERF_SAMPLE_TID>     *    { u64          time;   } && PERF_SAMPLE_TIME>     *    { u64          addr;   } && PERF_SAMPLE_ADDR> @@ -804,6 +805,7 @@ struct perf_sample_data {>     u64               type;>>     u64               ip;> +    u64               event_ip;>     struct {>         u32   pid;>         u32   tid;> @@ -824,6 +826,7 @@ struct perf_sample_data {>  static inline>  void perf_sample_data_init(struct perf_sample_data *data, u64 addr)>  {> +    data->event_ip = 0;>     data->addr = addr;>     data->raw  = NULL;>  }> Index: linux-2.6/kernel/perf_event.c> ===================================================================> --- linux-2.6.orig/kernel/perf_event.c> +++ linux-2.6/kernel/perf_event.c> @@ -3158,6 +3158,9 @@ void perf_output_sample(struct perf_outp>     if (sample_type & PERF_SAMPLE_IP)>         perf_output_put(handle, data->ip);>> +    if (sample_type & PERF_SAMPLE_EVENT_IP)> +        perf_output_put(handle, data->event_ip);> +>     if (sample_type & PERF_SAMPLE_TID)>         perf_output_put(handle, data->tid_entry);>> @@ -3237,6 +3240,9 @@ void perf_prepare_sample(struct perf_eve>         header->size += sizeof(data->ip);>     }>> +    if (sample_type & PERF_SAMPLE_EVENT_IP)> +        header->size += sizeof(data->event_ip);> +>     if (sample_type & PERF_SAMPLE_TID) {>         /* namespace issues */>         data->tid_entry.pid = perf_event_pid(event, current);> Index: linux-2.6/tools/perf/builtin-top.c> ===================================================================> --- linux-2.6.orig/tools/perf/builtin-top.c> +++ linux-2.6/tools/perf/builtin-top.c> @@ -975,8 +975,10 @@ static void event__process_sample(const>         return;>     }>> -    if (self->header.misc & PERF_RECORD_MISC_EXACT)> +    if (ip)>         exact_samples++;> +    else> +        return;>>     if (event__preprocess_sample(self, session, &al, symbol_filter) < 0 ||>       al.filtered)> @@ -1169,6 +1171,11 @@ static void start_counter(int i, int cou>>     attr->sample_type    = PERF_SAMPLE_IP | PERF_SAMPLE_TID;>> +    if (attr->precise) {> +        attr->sample_type &= ~PERF_SAMPLE_IP;> +        attr->sample_type |= PERF_SAMPLE_EVENT_IP;> +    }> +>     if (freq) {>         attr->sample_type    |= PERF_SAMPLE_PERIOD;>         attr->freq        = 1;>>>ÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
2010-04-08 21:17 ` Frederic Weisbecker
@ 2010-04-08 21:22 ` Stephane Eranian
2010-04-08 21:42 ` Frederic Weisbecker
2010-04-08 21:46 ` Frederic Weisbecker
0 siblings, 2 replies; 18+ messages in thread
From: Stephane Eranian @ 2010-04-08 21:22 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Peter Zijlstra, linux-kernel, mingo, paulus, davem,
robert.richter, perfmon2-devel, eranian
On Thu, Apr 8, 2010 at 11:17 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Thu, Apr 08, 2010 at 11:14:15PM +0200, Stephane Eranian wrote:
>> On Thu, Apr 8, 2010 at 11:11 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
>> >>
>> >> Are you suggesting you add some padding the PEBS raw sample you
>> >> return as PERF_SAMPLE_RAW? Then you need to define what RAW
>> >> actually means? Seems here, it would mean more than what the
>> >> HW returns.
>> >
>> > Well, RAW doesn't mean anything much at all, its really a fugly pass
>> > some crap around thing.
>> >
>> > So yeah, adding padding seems just fine.
>> >
>> I would rather see size as u64. Who's using raw today anyway?
>
>
> The trace events. Hence the size of the size shouldn't be touched, it
> is an ABI now.
Given your alignment constraints, it seems like it was a bad choice to pick
u32 for size to begin with.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
2010-04-08 21:14 ` Stephane Eranian
2010-04-08 21:17 ` Frederic Weisbecker
@ 2010-04-08 21:23 ` Peter Zijlstra
1 sibling, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2010-04-08 21:23 UTC (permalink / raw)
To: Stephane Eranian
Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
perfmon2-devel, eranian
On Thu, 2010-04-08 at 23:14 +0200, Stephane Eranian wrote:
> On Thu, Apr 8, 2010 at 11:11 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
> >>
> >> Are you suggesting you add some padding the PEBS raw sample you
> >> return as PERF_SAMPLE_RAW? Then you need to define what RAW
> >> actually means? Seems here, it would mean more than what the
> >> HW returns.
> >
> > Well, RAW doesn't mean anything much at all, its really a fugly pass
> > some crap around thing.
> >
> > So yeah, adding padding seems just fine.
> >
> I would rather see size as u64. Who's using raw today anyway?
the tracepoint stuff does
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
2010-04-08 21:18 ` Stephane Eranian
@ 2010-04-08 21:24 ` Peter Zijlstra
2010-04-08 21:33 ` Stephane Eranian
0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2010-04-08 21:24 UTC (permalink / raw)
To: Stephane Eranian
Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
perfmon2-devel, eranian
On Thu, 2010-04-08 at 23:18 +0200, Stephane Eranian wrote:
> I am not sure I understand what you mean by buffered PEBS. Are you talking
> about using PEBS buffer bigger than one entry?
Yep.
> If so, how can you do:
> - the LBR based fixups for multiple samples (on PMU interrupt)
Not, so it would be the offset on.
> - attribute PID/TID in system-wide mode
flush on context switches.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
2010-04-08 21:15 ` Peter Zijlstra
@ 2010-04-08 21:29 ` Stephane Eranian
2010-04-08 21:42 ` Peter Zijlstra
0 siblings, 1 reply; 18+ messages in thread
From: Stephane Eranian @ 2010-04-08 21:29 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
perfmon2-devel, eranian
On Thu, Apr 8, 2010 at 11:15 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
>> On Thu, Apr 8, 2010 at 10:55 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Thu, 2010-04-08 at 22:45 +0200, Stephane Eranian wrote:
>> >> There is a warn_on_once() check for PERF_SAMPLE_RAW which trips
>> >> when using PEBS on both Core and Nehalem. Core PEBS sample size is 144
>> >> bytes and 176 bytes for Nehalem. Both are multiples of 8, but the size
>> >> field is encoded as int, thus the total is never a multiple of 8 which
>> >> trips the check. I think the size should have been u64, but now it is
>> >> too late to change given it is ABI.
>> >
>> > PEBS hasn't seen -linus yet, so we can fix that.
>> >
>> Are you suggesting you add some padding the PEBS raw sample you
>> return as PERF_SAMPLE_RAW? Then you need to define what RAW
>> actually means? Seems here, it would mean more than what the
>> HW returns.
>
> The best fix here is to simply remove the PERF_SAMPLE_RAW support and
> implement PERF_SAMPLE_REGS. Its just that we need to come up with a way
> to deal with compat pt_regs muck.
>
>
But isn't RAW already part of the ABI?
It is true that you need to also provide the interrupted state (or
part of it) to the user, i.e.,
beyond just the IP.
But the interrupt state and PEBS state are distinct and should not be
swapped. In fact,
both may be useful at the same time to evaluate the skid.
So yes, you need PERF_SAMPLE_REGS. But just like your proposal with the two IPs.
I think you need to export two versions of REGS: IREGS and PREGS for
instance. The
issue is that PEBS does not record the whole state but only a very small subset.
Then, on Nehalem, there is PEBS more where you collect more that just
machine state,
you collect cache miss information. That's not regs anymore.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
2010-04-08 21:24 ` Peter Zijlstra
@ 2010-04-08 21:33 ` Stephane Eranian
0 siblings, 0 replies; 18+ messages in thread
From: Stephane Eranian @ 2010-04-08 21:33 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
perfmon2-devel, eranian
On Thu, Apr 8, 2010 at 11:24 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2010-04-08 at 23:18 +0200, Stephane Eranian wrote:
>> I am not sure I understand what you mean by buffered PEBS. Are you talking
>> about using PEBS buffer bigger than one entry?
>
> Yep.
>
>> If so, how can you do:
>> - the LBR based fixups for multiple samples (on PMU interrupt)
>
> Not, so it would be the offset on.
>
Not sure I understand what you mean here.
>> - attribute PID/TID in system-wide mode
>
> flush on context switches.
>
cost?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
2010-04-08 21:22 ` Stephane Eranian
@ 2010-04-08 21:42 ` Frederic Weisbecker
2010-04-08 21:46 ` Frederic Weisbecker
1 sibling, 0 replies; 18+ messages in thread
From: Frederic Weisbecker @ 2010-04-08 21:42 UTC (permalink / raw)
To: Stephane Eranian
Cc: Peter Zijlstra, linux-kernel, mingo, paulus, davem,
robert.richter, perfmon2-devel, eranian
On Thu, Apr 08, 2010 at 11:22:05PM +0200, Stephane Eranian wrote:
> On Thu, Apr 8, 2010 at 11:17 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > On Thu, Apr 08, 2010 at 11:14:15PM +0200, Stephane Eranian wrote:
> >> On Thu, Apr 8, 2010 at 11:11 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> > On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
> >> >>
> >> >> Are you suggesting you add some padding the PEBS raw sample you
> >> >> return as PERF_SAMPLE_RAW? Then you need to define what RAW
> >> >> actually means? Seems here, it would mean more than what the
> >> >> HW returns.
> >> >
> >> > Well, RAW doesn't mean anything much at all, its really a fugly pass
> >> > some crap around thing.
> >> >
> >> > So yeah, adding padding seems just fine.
> >> >
> >> I would rather see size as u64. Who's using raw today anyway?
> >
> >
> > The trace events. Hence the size of the size shouldn't be touched, it
> > is an ABI now.
>
> Given your alignment constraints, it seems like it was a bad choice to pick
> u32 for size to begin with.
Indeed, I'm not exactly sure how this is dealt since this is indeed u32
and the buffer requires to align to u64...
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
2010-04-08 21:29 ` Stephane Eranian
@ 2010-04-08 21:42 ` Peter Zijlstra
0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2010-04-08 21:42 UTC (permalink / raw)
To: Stephane Eranian
Cc: linux-kernel, mingo, paulus, davem, fweisbec, robert.richter,
perfmon2-devel, eranian
On Thu, 2010-04-08 at 23:29 +0200, Stephane Eranian wrote:
> On Thu, Apr 8, 2010 at 11:15 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
> >> On Thu, Apr 8, 2010 at 10:55 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> > On Thu, 2010-04-08 at 22:45 +0200, Stephane Eranian wrote:
> >> >> There is a warn_on_once() check for PERF_SAMPLE_RAW which trips
> >> >> when using PEBS on both Core and Nehalem. Core PEBS sample size is 144
> >> >> bytes and 176 bytes for Nehalem. Both are multiples of 8, but the size
> >> >> field is encoded as int, thus the total is never a multiple of 8 which
> >> >> trips the check. I think the size should have been u64, but now it is
> >> >> too late to change given it is ABI.
> >> >
> >> > PEBS hasn't seen -linus yet, so we can fix that.
> >> >
> >> Are you suggesting you add some padding the PEBS raw sample you
> >> return as PERF_SAMPLE_RAW? Then you need to define what RAW
> >> actually means? Seems here, it would mean more than what the
> >> HW returns.
> >
> > The best fix here is to simply remove the PERF_SAMPLE_RAW support and
> > implement PERF_SAMPLE_REGS. Its just that we need to come up with a way
> > to deal with compat pt_regs muck.
> >
> >
> But isn't RAW already part of the ABI?
The existance of it, yes, the content less so.
> It is true that you need to also provide the interrupted state (or
> part of it) to the user, i.e., beyond just the IP.
Why?
The only thing I would do is maybe use the interrupt state to fill out
the PEBS reg data, its mostly things like segment regs that go missing,
and for those the interrupt state ought to be good enough.
> But the interrupt state and PEBS state are distinct and should not be
> swapped. In fact, both may be useful at the same time to evaluate the skid.
That doesn't seem like something a regular person would be interested
in, and for those who do they can easily hack the kernel.
I don't think that is a use-case worth expanding the ABI over.
> So yes, you need PERF_SAMPLE_REGS. But just like your proposal with the two IPs.
> I think you need to export two versions of REGS: IREGS and PREGS for
> instance. The issue is that PEBS does not record the whole state but only a very small subset.
>
> Then, on Nehalem, there is PEBS more where you collect more that just
> machine state, you collect cache miss information. That's not regs anymore.
Yes, the whole load-latency train-wreck is something we need to come up
with a sensible interface for, that's definitely not something I would
do through RAW.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample()
2010-04-08 21:22 ` Stephane Eranian
2010-04-08 21:42 ` Frederic Weisbecker
@ 2010-04-08 21:46 ` Frederic Weisbecker
1 sibling, 0 replies; 18+ messages in thread
From: Frederic Weisbecker @ 2010-04-08 21:46 UTC (permalink / raw)
To: Stephane Eranian
Cc: Peter Zijlstra, linux-kernel, mingo, paulus, davem,
robert.richter, perfmon2-devel, eranian
On Thu, Apr 08, 2010 at 11:22:05PM +0200, Stephane Eranian wrote:
> On Thu, Apr 8, 2010 at 11:17 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > On Thu, Apr 08, 2010 at 11:14:15PM +0200, Stephane Eranian wrote:
> >> On Thu, Apr 8, 2010 at 11:11 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> > On Thu, 2010-04-08 at 23:08 +0200, Stephane Eranian wrote:
> >> >>
> >> >> Are you suggesting you add some padding the PEBS raw sample you
> >> >> return as PERF_SAMPLE_RAW? Then you need to define what RAW
> >> >> actually means? Seems here, it would mean more than what the
> >> >> HW returns.
> >> >
> >> > Well, RAW doesn't mean anything much at all, its really a fugly pass
> >> > some crap around thing.
> >> >
> >> > So yeah, adding padding seems just fine.
> >> >
> >> I would rather see size as u64. Who's using raw today anyway?
> >
> >
> > The trace events. Hence the size of the size shouldn't be touched, it
> > is an ABI now.
>
> Given your alignment constraints, it seems like it was a bad choice to pick
> u32 for size to begin with.
Ah I remember now how we did that. We align the trace events raw sample size
such that sizeof(raw_sample) + sizeof(size) is aligned to u64.
Well, indeed that complexifies a bit the raw_sample size handling but at least
it makes the traces a bit more compact.
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2010-04-08 21:46 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-08 20:45 [PATCH] perf_events: fix bogus warn_on(_once) in perf_prepare_sample() Stephane Eranian
2010-04-08 20:55 ` Peter Zijlstra
2010-04-08 21:03 ` Peter Zijlstra
2010-04-08 21:18 ` Stephane Eranian
2010-04-08 21:24 ` Peter Zijlstra
2010-04-08 21:33 ` Stephane Eranian
2010-04-08 21:08 ` Stephane Eranian
2010-04-08 21:11 ` Peter Zijlstra
2010-04-08 21:14 ` Stephane Eranian
2010-04-08 21:17 ` Frederic Weisbecker
2010-04-08 21:22 ` Stephane Eranian
2010-04-08 21:42 ` Frederic Weisbecker
2010-04-08 21:46 ` Frederic Weisbecker
2010-04-08 21:23 ` Peter Zijlstra
2010-04-08 21:12 ` Peter Zijlstra
2010-04-08 21:15 ` Peter Zijlstra
2010-04-08 21:29 ` Stephane Eranian
2010-04-08 21:42 ` Peter Zijlstra
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.