* [RFC][PATCH 00/11] Another stab at PEBS and LBR support
@ 2010-03-03 16:39 Peter Zijlstra
2010-03-03 16:39 ` [RFC][PATCH 01/11] perf, x86: Remove superfluous arguments to x86_perf_event_set_period() Peter Zijlstra
` (10 more replies)
0 siblings, 11 replies; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-03 16:39 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: paulus, eranian, robert.richter, fweisbec, Peter Zijlstra
Sorta works; the PEBS-LBR fixup stuff makes my machine unhappy, but I could
have made a silly mistake there...
Can be tested using the below patchlet and something like: perf top -e r00c0p
---
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 05d0c5c..f8314e6 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -656,6 +656,11 @@ parse_raw_event(const char **strp, struct perf_event_attr *attr)
return EVT_FAILED;
n = hex2u64(str + 1, &config);
if (n > 0) {
+ if (str[n+1] == 'p') {
+ attr->precise = 1;
+ printf("precise\n");
+ n++;
+ }
*strp = str + n + 1;
attr->type = PERF_TYPE_RAW;
attr->config = config;
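The patchlet's parsing logic, an optional trailing 'p' after the raw hex config that marks the event as precise (PEBS), can be reproduced as a small standalone C sketch. Names here are hypothetical stand-ins; the real code uses hex2u64() and struct perf_event_attr in tools/perf/util/parse-events.c:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal stand-in for the perf attr fields the patchlet touches. */
struct fake_attr {
	uint64_t config;
	int precise;
};

/*
 * Parse "rNNNN[p]": a hex config after the 'r', plus an optional
 * trailing 'p' that sets the precise flag. Returns 0 on success,
 * -1 on failure.
 */
static int parse_raw(const char *str, struct fake_attr *attr)
{
	char *end;
	uint64_t config;

	if (*str != 'r')
		return -1;

	config = strtoull(str + 1, &end, 16);
	if (end == str + 1)
		return -1;

	if (*end == 'p') {
		attr->precise = 1;
		end++;
	}
	if (*end != '\0')
		return -1;

	attr->config = config;
	return 0;
}
```

So "r00c0p" yields config 0xc0 (INSTR_RETIRED.ANY on Intel) with precise set, matching the perf top invocation above.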
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [RFC][PATCH 01/11] perf, x86: Remove superfluous arguments to x86_perf_event_set_period()
2010-03-03 16:39 [RFC][PATCH 00/11] Another stab at PEBS and LBR support Peter Zijlstra
@ 2010-03-03 16:39 ` Peter Zijlstra
2010-03-03 16:39 ` [RFC][PATCH 02/11] perf, x86: Remove superfluous arguments to x86_perf_event_update() Peter Zijlstra
` (9 subsequent siblings)
10 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-03 16:39 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: paulus, eranian, robert.richter, fweisbec, Peter Zijlstra
[-- Attachment #1: perf-x86-cleanup-args.patch --]
[-- Type: text/plain, Size: 2698 bytes --]
The second and third arguments to x86_perf_event_set_period() are
superfluous, since they are simple expressions of the first argument.
Hence remove them.
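The dropped arguments are trivially recomputed inside the function from the one remaining argument; a minimal userspace sketch with stand-in types (not the kernel's real structs):

```c
#include <assert.h>

/* Stand-ins for the kernel's perf_event / hw_perf_event pairing:
 * the hw state is embedded in the event, so both old arguments are
 * reachable from the event pointer alone. */
struct hw_perf_event {
	int idx;
};

struct perf_event {
	struct hw_perf_event hw;
};

/* After the patch: derive hwc and idx locally instead of taking
 * them as parameters. */
static int set_period(struct perf_event *event)
{
	struct hw_perf_event *hwc = &event->hw;
	int idx = hwc->idx;

	return idx; /* stand-in for programming the real counter */
}
```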
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/x86/kernel/cpu/perf_event.c | 15 +++++++--------
arch/x86/kernel/cpu/perf_event_intel.c | 2 +-
2 files changed, 8 insertions(+), 9 deletions(-)
Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -165,8 +165,7 @@ static DEFINE_PER_CPU(struct cpu_hw_even
.enabled = 1,
};
-static int x86_perf_event_set_period(struct perf_event *event,
- struct hw_perf_event *hwc, int idx);
+static int x86_perf_event_set_period(struct perf_event *event);
/*
* Generalized hw caching related hw_event table, filled
@@ -830,7 +829,7 @@ void hw_perf_enable(void)
if (hwc->idx == -1) {
x86_assign_hw_event(event, cpuc, i);
- x86_perf_event_set_period(event, hwc, hwc->idx);
+ x86_perf_event_set_period(event);
}
/*
* need to mark as active because x86_pmu_disable()
@@ -871,12 +870,12 @@ static DEFINE_PER_CPU(u64 [X86_PMC_IDX_M
* To be called with the event disabled in hw:
*/
static int
-x86_perf_event_set_period(struct perf_event *event,
- struct hw_perf_event *hwc, int idx)
+x86_perf_event_set_period(struct perf_event *event)
{
+ struct hw_perf_event *hwc = &event->hw;
s64 left = atomic64_read(&hwc->period_left);
s64 period = hwc->sample_period;
- int err, ret = 0;
+ int err, ret = 0, idx = hwc->idx;
if (idx == X86_PMC_IDX_FIXED_BTS)
return 0;
@@ -974,7 +973,7 @@ static int x86_pmu_start(struct perf_eve
if (hwc->idx == -1)
return -EAGAIN;
- x86_perf_event_set_period(event, hwc, hwc->idx);
+ x86_perf_event_set_period(event);
x86_pmu.enable(hwc, hwc->idx);
return 0;
@@ -1119,7 +1118,7 @@ static int x86_pmu_handle_irq(struct pt_
handled = 1;
data.period = event->hw.last_period;
- if (!x86_perf_event_set_period(event, hwc, idx))
+ if (!x86_perf_event_set_period(event))
continue;
if (perf_event_overflow(event, 1, &data, regs))
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
@@ -700,7 +700,7 @@ static int intel_pmu_save_and_restart(st
int ret;
x86_perf_event_update(event, hwc, idx);
- ret = x86_perf_event_set_period(event, hwc, idx);
+ ret = x86_perf_event_set_period(event);
return ret;
}
--
* [RFC][PATCH 02/11] perf, x86: Remove superfluous arguments to x86_perf_event_update()
2010-03-03 16:39 [RFC][PATCH 00/11] Another stab at PEBS and LBR support Peter Zijlstra
2010-03-03 16:39 ` [RFC][PATCH 01/11] perf, x86: Remove superfluous arguments to x86_perf_event_set_period() Peter Zijlstra
@ 2010-03-03 16:39 ` Peter Zijlstra
2010-03-03 16:39 ` [RFC][PATCH 03/11] perf, x86: Change x86_pmu.{enable,disable} calling convention Peter Zijlstra
` (8 subsequent siblings)
10 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-03 16:39 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: paulus, eranian, robert.richter, fweisbec, Peter Zijlstra
[-- Attachment #1: perf-x86-cleanup-args1.patch --]
[-- Type: text/plain, Size: 2493 bytes --]
The second and third arguments to x86_perf_event_update() are
superfluous, since they are simple expressions of the first argument.
Hence remove them.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/x86/kernel/cpu/perf_event.c | 11 ++++++-----
arch/x86/kernel/cpu/perf_event_intel.c | 10 ++--------
2 files changed, 8 insertions(+), 13 deletions(-)
Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -188,11 +188,12 @@ static u64 __read_mostly hw_cache_event_
* Returns the delta events processed.
*/
static u64
-x86_perf_event_update(struct perf_event *event,
- struct hw_perf_event *hwc, int idx)
+x86_perf_event_update(struct perf_event *event)
{
+ struct hw_perf_event *hwc = &event->hw;
int shift = 64 - x86_pmu.event_bits;
u64 prev_raw_count, new_raw_count;
+ int idx = hwc->idx;
s64 delta;
if (idx == X86_PMC_IDX_FIXED_BTS)
@@ -1059,7 +1060,7 @@ static void x86_pmu_stop(struct perf_eve
* Drain the remaining delta count out of a event
* that we are disabling:
*/
- x86_perf_event_update(event, hwc, idx);
+ x86_perf_event_update(event);
cpuc->events[idx] = NULL;
}
@@ -1108,7 +1109,7 @@ static int x86_pmu_handle_irq(struct pt_
event = cpuc->events[idx];
hwc = &event->hw;
- val = x86_perf_event_update(event, hwc, idx);
+ val = x86_perf_event_update(event);
if (val & (1ULL << (x86_pmu.event_bits - 1)))
continue;
@@ -1419,7 +1420,7 @@ void __init init_hw_perf_events(void)
static inline void x86_pmu_read(struct perf_event *event)
{
- x86_perf_event_update(event, &event->hw, event->hw.idx);
+ x86_perf_event_update(event);
}
static const struct pmu pmu = {
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
@@ -695,14 +695,8 @@ static void intel_pmu_enable_event(struc
*/
static int intel_pmu_save_and_restart(struct perf_event *event)
{
- struct hw_perf_event *hwc = &event->hw;
- int idx = hwc->idx;
- int ret;
-
- x86_perf_event_update(event, hwc, idx);
- ret = x86_perf_event_set_period(event);
-
- return ret;
+ x86_perf_event_update(event);
+ return x86_perf_event_set_period(event);
}
static void intel_pmu_reset(void)
--
* [RFC][PATCH 03/11] perf, x86: Change x86_pmu.{enable,disable} calling convention
2010-03-03 16:39 [RFC][PATCH 00/11] Another stab at PEBS and LBR support Peter Zijlstra
2010-03-03 16:39 ` [RFC][PATCH 01/11] perf, x86: Remove superfluous arguments to x86_perf_event_set_period() Peter Zijlstra
2010-03-03 16:39 ` [RFC][PATCH 02/11] perf, x86: Remove superfluous arguments to x86_perf_event_update() Peter Zijlstra
@ 2010-03-03 16:39 ` Peter Zijlstra
2010-03-03 16:39 ` [RFC][PATCH 04/11] perf, x86: Use unlocked bitops Peter Zijlstra
` (7 subsequent siblings)
10 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-03 16:39 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: paulus, eranian, robert.richter, fweisbec, Peter Zijlstra
[-- Attachment #1: perf-x86-cleanup-args2.patch --]
[-- Type: text/plain, Size: 7246 bytes --]
Pass the full perf_event into the x86_pmu functions so that those may
make use of more than the hw_perf_event, and while doing this, remove
the superfluous second argument.
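The calling-convention change can be sketched with stand-in types: the old callbacks received (hwc, idx), the new ones receive the whole perf_event and derive what they need, which also lets future callbacks reach non-hw state (names here are hypothetical, not the kernel's real structs):

```c
#include <assert.h>

struct hw_perf_event { int idx; };
struct perf_event { struct hw_perf_event hw; };

/* Old convention: callers had to unpack hwc and idx themselves.
 * struct x86_pmu_old {
 *	void (*enable)(struct hw_perf_event *hwc, int idx);
 * };
 */

/* New convention: the callback gets the whole event and can reach
 * anything hanging off it, not just the hw part. */
struct x86_pmu_new {
	void (*enable)(struct perf_event *event);
};

static int last_enabled_idx = -1;

static void enable_event(struct perf_event *event)
{
	last_enabled_idx = event->hw.idx;
}
```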
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/x86/kernel/cpu/perf_event.c | 31 +++++++++++++++----------------
arch/x86/kernel/cpu/perf_event_intel.c | 30 +++++++++++++++++-------------
arch/x86/kernel/cpu/perf_event_p6.c | 10 ++++++----
3 files changed, 38 insertions(+), 33 deletions(-)
Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -133,8 +133,8 @@ struct x86_pmu {
int (*handle_irq)(struct pt_regs *);
void (*disable_all)(void);
void (*enable_all)(void);
- void (*enable)(struct hw_perf_event *, int);
- void (*disable)(struct hw_perf_event *, int);
+ void (*enable)(struct perf_event *);
+ void (*disable)(struct perf_event *);
unsigned eventsel;
unsigned perfctr;
u64 (*event_map)(int);
@@ -840,7 +840,7 @@ void hw_perf_enable(void)
set_bit(hwc->idx, cpuc->active_mask);
cpuc->events[hwc->idx] = event;
- x86_pmu.enable(hwc, hwc->idx);
+ x86_pmu.enable(event);
perf_event_update_userpage(event);
}
cpuc->n_added = 0;
@@ -853,15 +853,16 @@ void hw_perf_enable(void)
x86_pmu.enable_all();
}
-static inline void __x86_pmu_enable_event(struct hw_perf_event *hwc, int idx)
+static inline void __x86_pmu_enable_event(struct hw_perf_event *hwc)
{
- (void)checking_wrmsrl(hwc->config_base + idx,
+ (void)checking_wrmsrl(hwc->config_base + hwc->idx,
hwc->config | ARCH_PERFMON_EVENTSEL_ENABLE);
}
-static inline void x86_pmu_disable_event(struct hw_perf_event *hwc, int idx)
+static inline void x86_pmu_disable_event(struct perf_event *event)
{
- (void)checking_wrmsrl(hwc->config_base + idx, hwc->config);
+ struct hw_perf_event *hwc = &event->hw;
+ (void)checking_wrmsrl(hwc->config_base + hwc->idx, hwc->config);
}
static DEFINE_PER_CPU(u64 [X86_PMC_IDX_MAX], pmc_prev_left);
@@ -922,11 +923,11 @@ x86_perf_event_set_period(struct perf_ev
return ret;
}
-static void x86_pmu_enable_event(struct hw_perf_event *hwc, int idx)
+static void x86_pmu_enable_event(struct perf_event *event)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
if (cpuc->enabled)
- __x86_pmu_enable_event(hwc, idx);
+ __x86_pmu_enable_event(&event->hw);
}
/*
@@ -969,13 +970,11 @@ static int x86_pmu_enable(struct perf_ev
static int x86_pmu_start(struct perf_event *event)
{
- struct hw_perf_event *hwc = &event->hw;
-
- if (hwc->idx == -1)
+ if (event->hw.idx == -1)
return -EAGAIN;
x86_perf_event_set_period(event);
- x86_pmu.enable(hwc, hwc->idx);
+ x86_pmu.enable(event);
return 0;
}
@@ -989,7 +988,7 @@ static void x86_pmu_unthrottle(struct pe
cpuc->events[hwc->idx] != event))
return;
- x86_pmu.enable(hwc, hwc->idx);
+ x86_pmu.enable(event);
}
void perf_event_print_debug(void)
@@ -1054,7 +1053,7 @@ static void x86_pmu_stop(struct perf_eve
* could reenable again:
*/
clear_bit(idx, cpuc->active_mask);
- x86_pmu.disable(hwc, idx);
+ x86_pmu.disable(event);
/*
* Drain the remaining delta count out of a event
@@ -1123,7 +1122,7 @@ static int x86_pmu_handle_irq(struct pt_
continue;
if (perf_event_overflow(event, 1, &data, regs))
- x86_pmu.disable(hwc, idx);
+ x86_pmu.disable(event);
}
if (handled)
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
@@ -548,9 +548,9 @@ static inline void intel_pmu_ack_status(
}
static inline void
-intel_pmu_disable_fixed(struct hw_perf_event *hwc, int __idx)
+intel_pmu_disable_fixed(struct hw_perf_event *hwc)
{
- int idx = __idx - X86_PMC_IDX_FIXED;
+ int idx = hwc->idx - X86_PMC_IDX_FIXED;
u64 ctrl_val, mask;
mask = 0xfULL << (idx * 4);
@@ -622,26 +622,28 @@ static void intel_pmu_drain_bts_buffer(v
}
static inline void
-intel_pmu_disable_event(struct hw_perf_event *hwc, int idx)
+intel_pmu_disable_event(struct perf_event *event)
{
- if (unlikely(idx == X86_PMC_IDX_FIXED_BTS)) {
+ struct hw_perf_event *hwc = &event->hw;
+
+ if (unlikely(hwc->idx == X86_PMC_IDX_FIXED_BTS)) {
intel_pmu_disable_bts();
intel_pmu_drain_bts_buffer();
return;
}
if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
- intel_pmu_disable_fixed(hwc, idx);
+ intel_pmu_disable_fixed(hwc);
return;
}
- x86_pmu_disable_event(hwc, idx);
+ x86_pmu_disable_event(event);
}
static inline void
-intel_pmu_enable_fixed(struct hw_perf_event *hwc, int __idx)
+intel_pmu_enable_fixed(struct hw_perf_event *hwc)
{
- int idx = __idx - X86_PMC_IDX_FIXED;
+ int idx = hwc->idx - X86_PMC_IDX_FIXED;
u64 ctrl_val, bits, mask;
int err;
@@ -671,9 +673,11 @@ intel_pmu_enable_fixed(struct hw_perf_ev
err = checking_wrmsrl(hwc->config_base, ctrl_val);
}
-static void intel_pmu_enable_event(struct hw_perf_event *hwc, int idx)
+static void intel_pmu_enable_event(struct perf_event *event)
{
- if (unlikely(idx == X86_PMC_IDX_FIXED_BTS)) {
+ struct hw_perf_event *hwc = &event->hw;
+
+ if (unlikely(hwc->idx == X86_PMC_IDX_FIXED_BTS)) {
if (!__get_cpu_var(cpu_hw_events).enabled)
return;
@@ -682,11 +686,11 @@ static void intel_pmu_enable_event(struc
}
if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
- intel_pmu_enable_fixed(hwc, idx);
+ intel_pmu_enable_fixed(hwc);
return;
}
- __x86_pmu_enable_event(hwc, idx);
+ __x86_pmu_enable_event(hwc);
}
/*
@@ -774,7 +778,7 @@ again:
data.period = event->hw.last_period;
if (perf_event_overflow(event, 1, &data, regs))
- intel_pmu_disable_event(&event->hw, bit);
+ intel_pmu_disable_event(event);
}
intel_pmu_ack_status(ack);
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_p6.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_p6.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_p6.c
@@ -77,27 +77,29 @@ static void p6_pmu_enable_all(void)
}
static inline void
-p6_pmu_disable_event(struct hw_perf_event *hwc, int idx)
+p6_pmu_disable_event(struct perf_event *event)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ struct hw_perf_event *hwc = &event->hw;
u64 val = P6_NOP_EVENT;
if (cpuc->enabled)
val |= ARCH_PERFMON_EVENTSEL_ENABLE;
- (void)checking_wrmsrl(hwc->config_base + idx, val);
+ (void)checking_wrmsrl(hwc->config_base + hwc->idx, val);
}
-static void p6_pmu_enable_event(struct hw_perf_event *hwc, int idx)
+static void p6_pmu_enable_event(struct perf_event *event)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ struct hw_perf_event *hwc = &event->hw;
u64 val;
val = hwc->config;
if (cpuc->enabled)
val |= ARCH_PERFMON_EVENTSEL_ENABLE;
- (void)checking_wrmsrl(hwc->config_base + idx, val);
+ (void)checking_wrmsrl(hwc->config_base + hwc->idx, val);
}
static __initconst struct x86_pmu p6_pmu = {
--
* [RFC][PATCH 04/11] perf, x86: Use unlocked bitops
2010-03-03 16:39 [RFC][PATCH 00/11] Another stab at PEBS and LBR support Peter Zijlstra
` (2 preceding siblings ...)
2010-03-03 16:39 ` [RFC][PATCH 03/11] perf, x86: Change x86_pmu.{enable,disable} calling convention Peter Zijlstra
@ 2010-03-03 16:39 ` Peter Zijlstra
2010-03-03 16:39 ` [RFC][PATCH 05/11] perf: Generic perf_sample_data initialization Peter Zijlstra
` (6 subsequent siblings)
10 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-03 16:39 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: paulus, eranian, robert.richter, fweisbec, Peter Zijlstra
[-- Attachment #1: perf-x86-unlocked-bitops.patch --]
[-- Type: text/plain, Size: 2593 bytes --]
There is no concurrency on these variables, so don't use LOCK'ed ops.
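__set_bit() and __clear_bit() are the non-atomic variants: same result as set_bit()/clear_bit(), but a plain read-modify-write instead of a LOCK-prefixed instruction, which is valid here because nothing else modifies these masks concurrently. A rough userspace approximation of the semantics (the real kernel versions are asm):

```c
#include <assert.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Non-atomic set/clear/test on a multi-word bitmap. Only safe when
 * no other context touches the same word concurrently -- which is
 * the patch's point for these per-cpu masks. */
static void nonatomic_set_bit(unsigned long nr, unsigned long *addr)
{
	addr[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

static void nonatomic_clear_bit(unsigned long nr, unsigned long *addr)
{
	addr[nr / BITS_PER_LONG] &= ~(1UL << (nr % BITS_PER_LONG));
}

static int nonatomic_test_bit(unsigned long nr, const unsigned long *addr)
{
	return (addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1;
}
```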
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/x86/kernel/cpu/perf_event.c | 8 ++++----
arch/x86/kernel/cpu/perf_event_amd.c | 2 +-
arch/x86/kernel/cpu/perf_event_intel.c | 2 +-
3 files changed, 6 insertions(+), 6 deletions(-)
Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -638,7 +638,7 @@ static int x86_schedule_events(struct cp
if (test_bit(hwc->idx, used_mask))
break;
- set_bit(hwc->idx, used_mask);
+ __set_bit(hwc->idx, used_mask);
if (assign)
assign[i] = hwc->idx;
}
@@ -687,7 +687,7 @@ static int x86_schedule_events(struct cp
if (j == X86_PMC_IDX_MAX)
break;
- set_bit(j, used_mask);
+ __set_bit(j, used_mask);
if (assign)
assign[i] = j;
@@ -837,7 +837,7 @@ void hw_perf_enable(void)
* clear active_mask and events[] yet it preserves
* idx
*/
- set_bit(hwc->idx, cpuc->active_mask);
+ __set_bit(hwc->idx, cpuc->active_mask);
cpuc->events[hwc->idx] = event;
x86_pmu.enable(event);
@@ -1052,7 +1052,7 @@ static void x86_pmu_stop(struct perf_eve
* Must be done before we disable, otherwise the nmi handler
* could reenable again:
*/
- clear_bit(idx, cpuc->active_mask);
+ __clear_bit(idx, cpuc->active_mask);
x86_pmu.disable(event);
/*
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_amd.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_amd.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_amd.c
@@ -309,7 +309,7 @@ static struct amd_nb *amd_alloc_nb(int c
* initialize all possible NB constraints
*/
for (i = 0; i < x86_pmu.num_events; i++) {
- set_bit(i, nb->event_constraints[i].idxmsk);
+ __set_bit(i, nb->event_constraints[i].idxmsk);
nb->event_constraints[i].weight = 1;
}
return nb;
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
@@ -768,7 +768,7 @@ again:
for_each_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
struct perf_event *event = cpuc->events[bit];
- clear_bit(bit, (unsigned long *) &status);
+ __clear_bit(bit, (unsigned long *) &status);
if (!test_bit(bit, cpuc->active_mask))
continue;
--
* [RFC][PATCH 05/11] perf: Generic perf_sample_data initialization
2010-03-03 16:39 [RFC][PATCH 00/11] Another stab at PEBS and LBR support Peter Zijlstra
` (3 preceding siblings ...)
2010-03-03 16:39 ` [RFC][PATCH 04/11] perf, x86: Use unlocked bitops Peter Zijlstra
@ 2010-03-03 16:39 ` Peter Zijlstra
2010-03-03 16:49 ` David Miller
` (2 more replies)
2010-03-03 16:39 ` [RFC][PATCH 06/11] perf, x86: PEBS infrastructure Peter Zijlstra
` (5 subsequent siblings)
10 siblings, 3 replies; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-03 16:39 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: paulus, eranian, robert.richter, fweisbec, Jamie Iles,
Jean Pihet, David S. Miller, stable, Peter Zijlstra
[-- Attachment #1: perf-fixup-data.patch --]
[-- Type: text/plain, Size: 6207 bytes --]
This makes it easier to extend perf_sample_data, and fixes a bug on
arm and sparc, which failed to set ->raw to NULL; that can cause
crashes when combined with PERF_SAMPLE_RAW.
It also optimizes the PowerPC and tracepoint paths, because a struct
initializer is forced to zero out the whole structure.
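The helper sets only the two fields every consumer relies on; a designated initializer like `= { .addr = ~0ULL }` also zero-fills every unnamed member, which is the PowerPC/tracepoint overhead mentioned above. A cut-down sketch with a stand-in struct (not the full perf_sample_data):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Cut-down stand-in for perf_sample_data. */
struct sample_data {
	uint64_t addr;
	uint64_t period;
	void *raw;
};

/* Mirror of the patch's perf_sample_data_init(): initialize only
 * addr and raw; the remaining fields are filled lazily depending on
 * sample_type, so zeroing them up front is wasted work. */
static inline void sample_data_init(struct sample_data *data, uint64_t addr)
{
	data->addr = addr;
	data->raw = NULL;
}
```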
CC: Jamie Iles <jamie.iles@picochip.com>
CC: Jean Pihet <jpihet@mvista.com>
CC: Paul Mackerras <paulus@samba.org>
CC: Ingo Molnar <mingo@elte.hu>
CC: David S. Miller <davem@davemloft.net>
CC: Stephane Eranian <eranian@google.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: stable@kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/arm/kernel/perf_event.c | 4 ++--
arch/powerpc/kernel/perf_event.c | 8 ++++----
arch/sparc/kernel/perf_event.c | 2 +-
arch/x86/kernel/cpu/perf_event.c | 3 +--
arch/x86/kernel/cpu/perf_event_intel.c | 6 ++----
include/linux/perf_event.h | 7 +++++++
kernel/perf_event.c | 21 ++++++++-------------
7 files changed, 25 insertions(+), 26 deletions(-)
Index: linux-2.6/arch/arm/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/arm/kernel/perf_event.c
+++ linux-2.6/arch/arm/kernel/perf_event.c
@@ -965,7 +965,7 @@ armv6pmu_handle_irq(int irq_num,
*/
armv6_pmcr_write(pmcr);
- data.addr = 0;
+ perf_sample_data_init(&data, 0);
cpuc = &__get_cpu_var(cpu_hw_events);
for (idx = 0; idx <= armpmu->num_events; ++idx) {
@@ -1945,7 +1945,7 @@ static irqreturn_t armv7pmu_handle_irq(i
*/
regs = get_irq_regs();
- data.addr = 0;
+ perf_sample_data_init(&data, 0);
cpuc = &__get_cpu_var(cpu_hw_events);
for (idx = 0; idx <= armpmu->num_events; ++idx) {
Index: linux-2.6/arch/powerpc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/perf_event.c
+++ linux-2.6/arch/powerpc/kernel/perf_event.c
@@ -1164,10 +1164,10 @@ static void record_and_restart(struct pe
* Finally record data if requested.
*/
if (record) {
- struct perf_sample_data data = {
- .addr = ~0ULL,
- .period = event->hw.last_period,
- };
+ struct perf_sample_data data;
+
+ perf_sample_data_init(&data, ~0ULL);
+ data.period = event->hw.last_period;
if (event->attr.sample_type & PERF_SAMPLE_ADDR)
perf_get_data_addr(regs, &data.addr);
Index: linux-2.6/arch/sparc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/perf_event.c
+++ linux-2.6/arch/sparc/kernel/perf_event.c
@@ -1189,7 +1189,7 @@ static int __kprobes perf_event_nmi_hand
regs = args->regs;
- data.addr = 0;
+ perf_sample_data_init(&data, 0);
cpuc = &__get_cpu_var(cpu_hw_events);
Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -1096,8 +1096,7 @@ static int x86_pmu_handle_irq(struct pt_
int idx, handled = 0;
u64 val;
- data.addr = 0;
- data.raw = NULL;
+ perf_sample_data_init(&data, 0);
cpuc = &__get_cpu_var(cpu_hw_events);
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
@@ -590,10 +590,9 @@ static void intel_pmu_drain_bts_buffer(v
ds->bts_index = ds->bts_buffer_base;
+ perf_sample_data_init(&data, 0);
data.period = event->hw.last_period;
- data.addr = 0;
- data.raw = NULL;
regs.ip = 0;
/*
@@ -740,8 +739,7 @@ static int intel_pmu_handle_irq(struct p
int bit, loops;
u64 ack, status;
- data.addr = 0;
- data.raw = NULL;
+ perf_sample_data_init(&data, 0);
cpuc = &__get_cpu_var(cpu_hw_events);
Index: linux-2.6/include/linux/perf_event.h
===================================================================
--- linux-2.6.orig/include/linux/perf_event.h
+++ linux-2.6/include/linux/perf_event.h
@@ -801,6 +801,13 @@ struct perf_sample_data {
struct perf_raw_record *raw;
};
+static inline
+void perf_sample_data_init(struct perf_sample_data *data, u64 addr)
+{
+ data->addr = addr;
+ data->raw = NULL;
+}
+
extern void perf_output_sample(struct perf_output_handle *handle,
struct perf_event_header *header,
struct perf_sample_data *data,
Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -4108,8 +4108,7 @@ void __perf_sw_event(u32 event_id, u64 n
if (rctx < 0)
return;
- data.addr = addr;
- data.raw = NULL;
+ perf_sample_data_init(&data, addr);
do_perf_sw_event(PERF_TYPE_SOFTWARE, event_id, nr, nmi, &data, regs);
@@ -4154,11 +4153,10 @@ static enum hrtimer_restart perf_swevent
struct perf_event *event;
u64 period;
- event = container_of(hrtimer, struct perf_event, hw.hrtimer);
+ event = container_of(hrtimer, struct perf_event, hw.hrtimer);
event->pmu->read(event);
- data.addr = 0;
- data.raw = NULL;
+ perf_sample_data_init(&data, 0);
data.period = event->hw.last_period;
regs = get_irq_regs();
/*
@@ -4322,17 +4320,15 @@ static const struct pmu perf_ops_task_cl
void perf_tp_event(int event_id, u64 addr, u64 count, void *record,
int entry_size)
{
+ struct pt_regs *regs = get_irq_regs();
+ struct perf_sample_data data;
struct perf_raw_record raw = {
.size = entry_size,
.data = record,
};
- struct perf_sample_data data = {
- .addr = addr,
- .raw = &raw,
- };
-
- struct pt_regs *regs = get_irq_regs();
+ perf_sample_data_init(&data, addr);
+ data.raw = &raw;
if (!regs)
regs = task_pt_regs(current);
@@ -4448,8 +4444,7 @@ void perf_bp_event(struct perf_event *bp
struct perf_sample_data sample;
struct pt_regs *regs = data;
- sample.raw = NULL;
- sample.addr = bp->attr.bp_addr;
+ perf_sample_data_init(&sample, bp->attr.bp_addr);
if (!perf_exclude_event(bp, regs))
perf_swevent_add(bp, 1, 1, &sample, regs);
--
* [RFC][PATCH 06/11] perf, x86: PEBS infrastructure
2010-03-03 16:39 [RFC][PATCH 00/11] Another stab at PEBS and LBR support Peter Zijlstra
` (4 preceding siblings ...)
2010-03-03 16:39 ` [RFC][PATCH 05/11] perf: Generic perf_sample_data initialization Peter Zijlstra
@ 2010-03-03 16:39 ` Peter Zijlstra
2010-03-03 17:38 ` Robert Richter
2010-03-03 16:39 ` [RFC][PATCH 07/11] perf: Provide PERF_SAMPLE_REGS Peter Zijlstra
` (4 subsequent siblings)
10 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-03 16:39 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: paulus, eranian, robert.richter, fweisbec, Peter Zijlstra
[-- Attachment #1: pebs.patch --]
[-- Type: text/plain, Size: 30155 bytes --]
Implement a simple PEBS model that always takes a single PEBS event at
a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, LBR).
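The single-record policy shows up in how the DS area is programmed in reserve_ds_buffers() below: the PEBS interrupt threshold sits one record past the buffer base, so the hardware raises a PMI after every record. A userspace sketch of that arithmetic (stand-in struct; PEBS_BUFFER_SIZE is one page, as in the patch):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL
#define PEBS_BUFFER_SIZE PAGE_SIZE

/* Stand-in for the PEBS half of the debug_store fields. */
struct ds_pebs {
	uint64_t base;
	uint64_t index;
	uint64_t absolute_maximum;
	uint64_t interrupt_threshold;
};

/* Program the PEBS part of the DS area for single-record operation:
 * threshold = base + one record, so each record triggers a PMI. */
static void setup_pebs(struct ds_pebs *ds, uint64_t buffer,
		       uint64_t record_size)
{
	uint64_t max = PEBS_BUFFER_SIZE / record_size;

	ds->base = buffer;
	ds->index = buffer;
	ds->absolute_maximum = buffer + max * record_size;
	ds->interrupt_threshold = buffer + record_size;
}
```

With a 128-byte record (the Core-style layout), one page holds 32 records, but the threshold still interrupts after the first.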
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/x86/kernel/cpu/perf_event.c | 223 +++--------
arch/x86/kernel/cpu/perf_event_intel.c | 152 +------
arch/x86/kernel/cpu/perf_event_intel_ds.c | 594 ++++++++++++++++++++++++++++++
include/linux/perf_event.h | 3
4 files changed, 709 insertions(+), 263 deletions(-)
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel_ds.c
===================================================================
--- /dev/null
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -0,0 +1,594 @@
+#ifdef CONFIG_CPU_SUP_INTEL
+
+/* The maximal number of PEBS events: */
+#define MAX_PEBS_EVENTS 4
+
+/* The size of a BTS record in bytes: */
+#define BTS_RECORD_SIZE 24
+
+#define BTS_BUFFER_SIZE (PAGE_SIZE << 4)
+#define PEBS_BUFFER_SIZE PAGE_SIZE
+
+/*
+ * pebs_record_32 for p4 and core not supported
+
+struct pebs_record_32 {
+ u32 flags, ip;
+ u32 ax, bc, cx, dx;
+ u32 si, di, bp, sp;
+};
+
+ */
+
+struct pebs_record_core {
+ u64 flags, ip;
+ u64 ax, bx, cx, dx;
+ u64 si, di, bp, sp;
+ u64 r8, r9, r10, r11;
+ u64 r12, r13, r14, r15;
+};
+
+struct pebs_record_nhm {
+ u64 flags, ip;
+ u64 ax, bx, cx, dx;
+ u64 si, di, bp, sp;
+ u64 r8, r9, r10, r11;
+ u64 r12, r13, r14, r15;
+ u64 status, dla, dse, lat;
+};
+
+/*
+ * Bits in the debugctlmsr controlling branch tracing.
+ */
+#define X86_DEBUGCTL_TR (1 << 6)
+#define X86_DEBUGCTL_BTS (1 << 7)
+#define X86_DEBUGCTL_BTINT (1 << 8)
+#define X86_DEBUGCTL_BTS_OFF_OS (1 << 9)
+#define X86_DEBUGCTL_BTS_OFF_USR (1 << 10)
+
+/*
+ * A debug store configuration.
+ *
+ * We only support architectures that use 64bit fields.
+ */
+struct debug_store {
+ u64 bts_buffer_base;
+ u64 bts_index;
+ u64 bts_absolute_maximum;
+ u64 bts_interrupt_threshold;
+ u64 pebs_buffer_base;
+ u64 pebs_index;
+ u64 pebs_absolute_maximum;
+ u64 pebs_interrupt_threshold;
+ u64 pebs_event_reset[MAX_PEBS_EVENTS];
+};
+
+static inline void init_debug_store_on_cpu(int cpu)
+{
+ struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
+
+ if (!ds)
+ return;
+
+ wrmsr_on_cpu(cpu, MSR_IA32_DS_AREA,
+ (u32)((u64)(unsigned long)ds),
+ (u32)((u64)(unsigned long)ds >> 32));
+}
+
+static inline void fini_debug_store_on_cpu(int cpu)
+{
+ if (!per_cpu(cpu_hw_events, cpu).ds)
+ return;
+
+ wrmsr_on_cpu(cpu, MSR_IA32_DS_AREA, 0, 0);
+}
+
+static void release_ds_buffers(void)
+{
+ int cpu;
+
+ if (!x86_pmu.bts && !x86_pmu.pebs)
+ return;
+
+ get_online_cpus();
+
+ for_each_online_cpu(cpu)
+ fini_debug_store_on_cpu(cpu);
+
+ for_each_possible_cpu(cpu) {
+ struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
+
+ if (!ds)
+ continue;
+
+ per_cpu(cpu_hw_events, cpu).ds = NULL;
+
+ kfree((void *)(unsigned long)ds->pebs_buffer_base);
+ kfree((void *)(unsigned long)ds->bts_buffer_base);
+ kfree(ds);
+ }
+
+ put_online_cpus();
+}
+
+static int reserve_ds_buffers(void)
+{
+ int cpu, err = 0;
+
+ if (!x86_pmu.bts && !x86_pmu.pebs)
+ return 0;
+
+ get_online_cpus();
+
+ for_each_possible_cpu(cpu) {
+ struct debug_store *ds;
+ void *buffer;
+ int max, thresh;
+
+ err = -ENOMEM;
+ ds = kzalloc(sizeof(*ds), GFP_KERNEL);
+ if (unlikely(!ds)) {
+ kfree(buffer);
+ break;
+ }
+ per_cpu(cpu_hw_events, cpu).ds = ds;
+
+ if (x86_pmu.bts) {
+ buffer = kzalloc(BTS_BUFFER_SIZE, GFP_KERNEL);
+ if (unlikely(!buffer))
+ break;
+
+ max = BTS_BUFFER_SIZE / BTS_RECORD_SIZE;
+ thresh = max / 16;
+
+ ds->bts_buffer_base = (u64)(unsigned long)buffer;
+ ds->bts_index = ds->bts_buffer_base;
+ ds->bts_absolute_maximum = ds->bts_buffer_base +
+ max * BTS_RECORD_SIZE;
+ ds->bts_interrupt_threshold = ds->bts_absolute_maximum -
+ thresh * BTS_RECORD_SIZE;
+ }
+
+ if (x86_pmu.pebs) {
+ buffer = kzalloc(PEBS_BUFFER_SIZE, GFP_KERNEL);
+ if (unlikely(!buffer))
+ break;
+
+ max = PEBS_BUFFER_SIZE / x86_pmu.pebs_record_size;
+
+ ds->pebs_buffer_base = (u64)(unsigned long)buffer;
+ ds->pebs_index = ds->pebs_buffer_base;
+ ds->pebs_absolute_maximum = ds->pebs_buffer_base +
+ max * x86_pmu.pebs_record_size;
+ /*
+ * Always use single record PEBS
+ */
+ ds->pebs_interrupt_threshold = ds->pebs_buffer_base +
+ x86_pmu.pebs_record_size;
+ }
+
+ err = 0;
+ }
+
+ if (err)
+ release_ds_buffers();
+ else {
+ for_each_online_cpu(cpu)
+ init_debug_store_on_cpu(cpu);
+ }
+
+ put_online_cpus();
+
+ return err;
+}
+
+/*
+ * BTS
+ */
+
+static struct event_constraint bts_constraint =
+ EVENT_CONSTRAINT(0, 1ULL << X86_PMC_IDX_FIXED_BTS, 0);
+
+static void intel_pmu_enable_bts(u64 config)
+{
+ unsigned long debugctlmsr;
+
+ debugctlmsr = get_debugctlmsr();
+
+ debugctlmsr |= X86_DEBUGCTL_TR;
+ debugctlmsr |= X86_DEBUGCTL_BTS;
+ debugctlmsr |= X86_DEBUGCTL_BTINT;
+
+ if (!(config & ARCH_PERFMON_EVENTSEL_OS))
+ debugctlmsr |= X86_DEBUGCTL_BTS_OFF_OS;
+
+ if (!(config & ARCH_PERFMON_EVENTSEL_USR))
+ debugctlmsr |= X86_DEBUGCTL_BTS_OFF_USR;
+
+ update_debugctlmsr(debugctlmsr);
+}
+
+static void intel_pmu_disable_bts(void)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ unsigned long debugctlmsr;
+
+ if (!cpuc->ds)
+ return;
+
+ debugctlmsr = get_debugctlmsr();
+
+ debugctlmsr &=
+ ~(X86_DEBUGCTL_TR | X86_DEBUGCTL_BTS | X86_DEBUGCTL_BTINT |
+ X86_DEBUGCTL_BTS_OFF_OS | X86_DEBUGCTL_BTS_OFF_USR);
+
+ update_debugctlmsr(debugctlmsr);
+}
+
+static void intel_pmu_drain_bts_buffer(void)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ struct debug_store *ds = cpuc->ds;
+ struct bts_record {
+ u64 from;
+ u64 to;
+ u64 flags;
+ };
+ struct perf_event *event = cpuc->events[X86_PMC_IDX_FIXED_BTS];
+ struct bts_record *at, *top;
+ struct perf_output_handle handle;
+ struct perf_event_header header;
+ struct perf_sample_data data;
+ struct pt_regs regs;
+
+ if (!event)
+ return;
+
+ if (!ds)
+ return;
+
+ at = (struct bts_record *)(unsigned long)ds->bts_buffer_base;
+ top = (struct bts_record *)(unsigned long)ds->bts_index;
+
+ if (top <= at)
+ return;
+
+ ds->bts_index = ds->bts_buffer_base;
+
+ perf_sample_data_init(&data, 0);
+ data.period = event->hw.last_period;
+ regs.ip = 0;
+
+ /*
+ * Prepare a generic sample, i.e. fill in the invariant fields.
+ * We will overwrite the from and to address before we output
+ * the sample.
+ */
+ perf_prepare_sample(&header, &data, event, &regs);
+
+ if (perf_output_begin(&handle, event, header.size * (top - at), 1, 1))
+ return;
+
+ for (; at < top; at++) {
+ data.ip = at->from;
+ data.addr = at->to;
+
+ perf_output_sample(&handle, &header, &data, event);
+ }
+
+ perf_output_end(&handle);
+
+ /* There's new data available. */
+ event->hw.interrupts++;
+ event->pending_kill = POLL_IN;
+}
+
+/*
+ * PEBS
+ */
+
+static struct event_constraint intel_core_pebs_events[] = {
+ PEBS_EVENT_CONSTRAINT(0x00c0, 0x1), /* INSTR_RETIRED.ANY */
+ PEBS_EVENT_CONSTRAINT(0xfec1, 0x1), /* X87_OPS_RETIRED.ANY */
+ PEBS_EVENT_CONSTRAINT(0x00c5, 0x1), /* BR_INST_RETIRED.MISPRED */
+ PEBS_EVENT_CONSTRAINT(0x1fc7, 0x1), /* SIMD_INST_RETIRED.ANY */
+ PEBS_EVENT_CONSTRAINT(0x01cb, 0x1), /* MEM_LOAD_RETIRED.L1D_MISS */
+ PEBS_EVENT_CONSTRAINT(0x02cb, 0x1), /* MEM_LOAD_RETIRED.L1D_LINE_MISS */
+ PEBS_EVENT_CONSTRAINT(0x04cb, 0x1), /* MEM_LOAD_RETIRED.L2_MISS */
+ PEBS_EVENT_CONSTRAINT(0x08cb, 0x1), /* MEM_LOAD_RETIRED.L2_LINE_MISS */
+ PEBS_EVENT_CONSTRAINT(0x10cb, 0x1), /* MEM_LOAD_RETIRED.DTLB_MISS */
+ EVENT_CONSTRAINT_END
+};
+
+static struct event_constraint intel_nehalem_pebs_events[] = {
+ PEBS_EVENT_CONSTRAINT(0x00c0, 0xf), /* INSTR_RETIRED.ANY */
+ PEBS_EVENT_CONSTRAINT(0xfec1, 0xf), /* X87_OPS_RETIRED.ANY */
+ PEBS_EVENT_CONSTRAINT(0x00c5, 0xf), /* BR_INST_RETIRED.MISPRED */
+ PEBS_EVENT_CONSTRAINT(0x1fc7, 0xf), /* SIMD_INST_RETIRED.ANY */
+ PEBS_EVENT_CONSTRAINT(0x01cb, 0xf), /* MEM_LOAD_RETIRED.L1D_MISS */
+ PEBS_EVENT_CONSTRAINT(0x02cb, 0xf), /* MEM_LOAD_RETIRED.L1D_LINE_MISS */
+ PEBS_EVENT_CONSTRAINT(0x04cb, 0xf), /* MEM_LOAD_RETIRED.L2_MISS */
+ PEBS_EVENT_CONSTRAINT(0x08cb, 0xf), /* MEM_LOAD_RETIRED.L2_LINE_MISS */
+ PEBS_EVENT_CONSTRAINT(0x10cb, 0xf), /* MEM_LOAD_RETIRED.DTLB_MISS */
+ EVENT_CONSTRAINT_END
+};
+
+static struct event_constraint *
+intel_pebs_constraints(struct perf_event *event)
+{
+ struct event_constraint *c;
+
+ if (!event->attr.precise)
+ return NULL;
+
+ if (x86_pmu.pebs_constraints) {
+ for_each_event_constraint(c, x86_pmu.pebs_constraints) {
+ if ((event->hw.config & c->cmask) == c->code)
+ return c;
+ }
+ }
+
+ return &emptyconstraint;
+}
+
+static void intel_pmu_pebs_enable(struct hw_perf_event *hwc)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ u64 val = cpuc->pebs_enabled;
+
+ hwc->config &= ~ARCH_PERFMON_EVENTSEL_INT;
+
+ val |= 1ULL << hwc->idx;
+ wrmsrl(MSR_IA32_PEBS_ENABLE, val);
+}
+
+static void intel_pmu_pebs_disable(struct hw_perf_event *hwc)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ u64 val = cpuc->pebs_enabled;
+
+ val &= ~(1ULL << hwc->idx);
+ wrmsrl(MSR_IA32_PEBS_ENABLE, val);
+
+ hwc->config |= ARCH_PERFMON_EVENTSEL_INT;
+}
+
+static void intel_pmu_pebs_enable_all(void)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+
+ if (cpuc->pebs_enabled)
+ wrmsrl(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
+}
+
+static void intel_pmu_pebs_disable_all(void)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+
+ if (cpuc->pebs_enabled)
+ wrmsrl(MSR_IA32_PEBS_ENABLE, 0);
+}
+
+#define CC(pebs, regs, reg) (regs)->reg = (pebs)->reg
+
+#ifdef CONFIG_X86_32
+
+#define PEBS_TO_REGS(pebs, regs) \
+do { \
+ memset((regs), 0, sizeof(*regs)); \
+ CC((pebs), (regs), ax); \
+ CC((pebs), (regs), bx); \
+ CC((pebs), (regs), cx); \
+ CC((pebs), (regs), dx); \
+ CC((pebs), (regs), si); \
+ CC((pebs), (regs), di); \
+ CC((pebs), (regs), bp); \
+ CC((pebs), (regs), sp); \
+ CC((pebs), (regs), flags); \
+ CC((pebs), (regs), ip); \
+} while (0)
+
+#else /* CONFIG_X86_64 */
+
+#define PEBS_TO_REGS(pebs, regs) \
+do { \
+ memset((regs), 0, sizeof(*regs)); \
+ CC((pebs), (regs), ax); \
+ CC((pebs), (regs), bx); \
+ CC((pebs), (regs), cx); \
+ CC((pebs), (regs), dx); \
+ CC((pebs), (regs), si); \
+ CC((pebs), (regs), di); \
+ CC((pebs), (regs), bp); \
+ CC((pebs), (regs), sp); \
+ CC((pebs), (regs), r8); \
+ CC((pebs), (regs), r9); \
+ CC((pebs), (regs), r10); \
+ CC((pebs), (regs), r11); \
+ CC((pebs), (regs), r12); \
+ CC((pebs), (regs), r13); \
+ CC((pebs), (regs), r14); \
+ CC((pebs), (regs), r15); \
+ CC((pebs), (regs), flags); \
+ CC((pebs), (regs), ip); \
+} while (0)
+
+#endif
+
+static int intel_pmu_save_and_restart(struct perf_event *event);
+static void intel_pmu_disable_event(struct perf_event *event);
+
+static void intel_pmu_drain_pebs_core(void)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ struct debug_store *ds = cpuc->ds;
+ struct perf_event *event = cpuc->events[0]; /* PMC0 only */
+ struct pebs_record_core *at, *top;
+ struct perf_sample_data data;
+ struct pt_regs regs;
+ int n;
+
+ if (!event || !ds || !x86_pmu.pebs)
+ return;
+
+ intel_pmu_pebs_disable_all();
+
+ at = (struct pebs_record_core *)(unsigned long)ds->pebs_buffer_base;
+ top = (struct pebs_record_core *)(unsigned long)ds->pebs_index;
+
+ if (top <= at)
+ goto out;
+
+ ds->pebs_index = ds->pebs_buffer_base;
+
+ if (!intel_pmu_save_and_restart(event))
+ goto out;
+
+ perf_sample_data_init(&data, 0);
+ data.period = event->hw.last_period;
+
+ n = top - at;
+
+ /*
+ * Should not happen, we program the threshold at 1 and do not
+ * set a reset value.
+ */
+ if (unlikely(n > 1)) {
+ trace_printk("PEBS: too many events: %d\n", n);
+ at += n-1;
+ }
+
+ PEBS_TO_REGS(at, &regs);
+
+ if (perf_event_overflow(event, 1, &data, &regs))
+ intel_pmu_disable_event(event);
+
+out:
+ intel_pmu_pebs_enable_all();
+}
+
+static void intel_pmu_drain_pebs_nhm(void)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ struct debug_store *ds = cpuc->ds;
+ struct pebs_record_nhm *at, *top;
+ struct perf_sample_data data;
+ struct perf_event *event = NULL;
+ struct pt_regs regs;
+ int bit, n;
+
+ if (!ds || !x86_pmu.pebs)
+ return;
+
+ intel_pmu_pebs_disable_all();
+
+ at = (struct pebs_record_nhm *)(unsigned long)ds->pebs_buffer_base;
+ top = (struct pebs_record_nhm *)(unsigned long)ds->pebs_index;
+
+ if (top <= at)
+ goto out;
+
+ ds->pebs_index = ds->pebs_buffer_base;
+
+ n = top - at;
+
+ /*
+ * Should not happen, we program the threshold at 1 and do not
+ * set a reset value.
+ */
+ if (unlikely(n > MAX_PEBS_EVENTS))
+ trace_printk("PEBS: too many events: %d\n", n);
+
+ for ( ; at < top; at++) {
+ for_each_bit(bit, (unsigned long *)&at->status, MAX_PEBS_EVENTS) {
+ if (!cpuc->events[bit]->attr.precise)
+ continue;
+
+ if (event)
+ trace_printk("PEBS: status: %Lx\n", at->status);
+
+ event = cpuc->events[bit];
+ }
+
+ if (!event) {
+ trace_printk("PEBS: interrupt, status: %Lx\n",
+ at->status);
+ continue;
+ }
+
+ if (!intel_pmu_save_and_restart(event))
+ continue;
+
+ perf_sample_data_init(&data, 0);
+ data.period = event->hw.last_period;
+
+ PEBS_TO_REGS(at, &regs);
+
+ if (perf_event_overflow(event, 1, &data, &regs))
+ intel_pmu_disable_event(event);
+ }
+out:
+ intel_pmu_pebs_enable_all();
+}
+
+/*
+ * BTS, PEBS probe and setup
+ */
+
+static void intel_ds_init(void)
+{
+ /*
+ * No support for 32bit formats
+ */
+ if (!boot_cpu_has(X86_FEATURE_DTES64))
+ return;
+
+ x86_pmu.bts = boot_cpu_has(X86_FEATURE_BTS);
+ x86_pmu.pebs = boot_cpu_has(X86_FEATURE_PEBS);
+ if (x86_pmu.pebs) {
+ int format = 0;
+
+ if (x86_pmu.version > 1) {
+ u64 capabilities;
+ /*
+ * v2+ has a PEBS format field
+ */
+ rdmsrl(MSR_IA32_PERF_CAPABILITIES, capabilities);
+ format = (capabilities >> 8) & 0xf;
+ }
+
+ switch (format) {
+ case 0:
+ printk(KERN_CONT "PEBS v0, ");
+ x86_pmu.pebs_record_size = sizeof(struct pebs_record_core);
+ x86_pmu.drain_pebs = intel_pmu_drain_pebs_core;
+ x86_pmu.pebs_constraints = intel_core_pebs_events;
+ break;
+
+ case 1:
+ printk(KERN_CONT "PEBS v1, ");
+ x86_pmu.pebs_record_size = sizeof(struct pebs_record_nhm);
+ x86_pmu.drain_pebs = intel_pmu_drain_pebs_nhm;
+ x86_pmu.pebs_constraints = intel_nehalem_pebs_events;
+ break;
+
+ default:
+ printk(KERN_CONT "PEBS unknown format: %d, ", format);
+ x86_pmu.pebs = 0;
+ break;
+ }
+ }
+}
+
+#else /* CONFIG_CPU_SUP_INTEL */
+
+static int reserve_ds_buffers(void)
+{
+ return 0;
+}
+
+static void release_ds_buffers(void)
+{
+}
+
+#endif /* CONFIG_CPU_SUP_INTEL */
Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -31,45 +31,6 @@
static u64 perf_event_mask __read_mostly;
-/* The maximal number of PEBS events: */
-#define MAX_PEBS_EVENTS 4
-
-/* The size of a BTS record in bytes: */
-#define BTS_RECORD_SIZE 24
-
-/* The size of a per-cpu BTS buffer in bytes: */
-#define BTS_BUFFER_SIZE (BTS_RECORD_SIZE * 2048)
-
-/* The BTS overflow threshold in bytes from the end of the buffer: */
-#define BTS_OVFL_TH (BTS_RECORD_SIZE * 128)
-
-
-/*
- * Bits in the debugctlmsr controlling branch tracing.
- */
-#define X86_DEBUGCTL_TR (1 << 6)
-#define X86_DEBUGCTL_BTS (1 << 7)
-#define X86_DEBUGCTL_BTINT (1 << 8)
-#define X86_DEBUGCTL_BTS_OFF_OS (1 << 9)
-#define X86_DEBUGCTL_BTS_OFF_USR (1 << 10)
-
-/*
- * A debug store configuration.
- *
- * We only support architectures that use 64bit fields.
- */
-struct debug_store {
- u64 bts_buffer_base;
- u64 bts_index;
- u64 bts_absolute_maximum;
- u64 bts_interrupt_threshold;
- u64 pebs_buffer_base;
- u64 pebs_index;
- u64 pebs_absolute_maximum;
- u64 pebs_interrupt_threshold;
- u64 pebs_event_reset[MAX_PEBS_EVENTS];
-};
-
struct event_constraint {
union {
unsigned long idxmsk[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
@@ -88,17 +49,29 @@ struct amd_nb {
};
struct cpu_hw_events {
+ /*
+ * Generic x86 PMC bits
+ */
struct perf_event *events[X86_PMC_IDX_MAX]; /* in counter order */
unsigned long active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
unsigned long interrupts;
int enabled;
- struct debug_store *ds;
int n_events;
int n_added;
int assign[X86_PMC_IDX_MAX]; /* event to counter assignment */
u64 tags[X86_PMC_IDX_MAX];
struct perf_event *event_list[X86_PMC_IDX_MAX]; /* in enabled order */
+
+ /*
+ * Intel DebugStore bits
+ */
+ struct debug_store *ds;
+ u64 pebs_enabled;
+
+ /*
+ * AMD specific bits
+ */
struct amd_nb *amd_nb;
};
@@ -112,12 +85,24 @@ struct cpu_hw_events {
#define EVENT_CONSTRAINT(c, n, m) \
__EVENT_CONSTRAINT(c, n, m, HWEIGHT(n))
+/*
+ * Constraint on the Event code.
+ */
#define INTEL_EVENT_CONSTRAINT(c, n) \
EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVTSEL_MASK)
+/*
+ * Constraint on the Event code + UMask + fixed-mask
+ */
#define FIXED_EVENT_CONSTRAINT(c, n) \
EVENT_CONSTRAINT(c, (1ULL << (32+n)), INTEL_ARCH_FIXED_MASK)
+/*
+ * Constraint on the Event code + UMask
+ */
+#define PEBS_EVENT_CONSTRAINT(c, n) \
+ EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK)
+
#define EVENT_CONSTRAINT_END \
EVENT_CONSTRAINT(0, 0, 0)
@@ -128,6 +113,9 @@ struct cpu_hw_events {
* struct x86_pmu - generic x86 pmu
*/
struct x86_pmu {
+ /*
+ * Generic x86 PMC bits
+ */
const char *name;
int version;
int (*handle_irq)(struct pt_regs *);
@@ -146,10 +134,6 @@ struct x86_pmu {
u64 event_mask;
int apic;
u64 max_period;
- u64 intel_ctrl;
- void (*enable_bts)(u64 config);
- void (*disable_bts)(void);
-
struct event_constraint *
(*get_event_constraints)(struct cpu_hw_events *cpuc,
struct perf_event *event);
@@ -157,6 +141,19 @@ struct x86_pmu {
void (*put_event_constraints)(struct cpu_hw_events *cpuc,
struct perf_event *event);
struct event_constraint *event_constraints;
+
+ /*
+ * Intel Arch Perfmon v2+
+ */
+ u64 intel_ctrl;
+
+ /*
+ * Intel DebugStore bits
+ */
+ int bts, pebs;
+ int pebs_record_size;
+ void (*drain_pebs)(void);
+ struct event_constraint *pebs_constraints;
};
static struct x86_pmu x86_pmu __read_mostly;
@@ -288,110 +285,14 @@ static void release_pmc_hardware(void)
#endif
}
-static inline bool bts_available(void)
-{
- return x86_pmu.enable_bts != NULL;
-}
-
-static inline void init_debug_store_on_cpu(int cpu)
-{
- struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
-
- if (!ds)
- return;
-
- wrmsr_on_cpu(cpu, MSR_IA32_DS_AREA,
- (u32)((u64)(unsigned long)ds),
- (u32)((u64)(unsigned long)ds >> 32));
-}
-
-static inline void fini_debug_store_on_cpu(int cpu)
-{
- if (!per_cpu(cpu_hw_events, cpu).ds)
- return;
-
- wrmsr_on_cpu(cpu, MSR_IA32_DS_AREA, 0, 0);
-}
-
-static void release_bts_hardware(void)
-{
- int cpu;
-
- if (!bts_available())
- return;
-
- get_online_cpus();
-
- for_each_online_cpu(cpu)
- fini_debug_store_on_cpu(cpu);
-
- for_each_possible_cpu(cpu) {
- struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
-
- if (!ds)
- continue;
-
- per_cpu(cpu_hw_events, cpu).ds = NULL;
-
- kfree((void *)(unsigned long)ds->bts_buffer_base);
- kfree(ds);
- }
-
- put_online_cpus();
-}
-
-static int reserve_bts_hardware(void)
-{
- int cpu, err = 0;
-
- if (!bts_available())
- return 0;
-
- get_online_cpus();
-
- for_each_possible_cpu(cpu) {
- struct debug_store *ds;
- void *buffer;
-
- err = -ENOMEM;
- buffer = kzalloc(BTS_BUFFER_SIZE, GFP_KERNEL);
- if (unlikely(!buffer))
- break;
-
- ds = kzalloc(sizeof(*ds), GFP_KERNEL);
- if (unlikely(!ds)) {
- kfree(buffer);
- break;
- }
-
- ds->bts_buffer_base = (u64)(unsigned long)buffer;
- ds->bts_index = ds->bts_buffer_base;
- ds->bts_absolute_maximum =
- ds->bts_buffer_base + BTS_BUFFER_SIZE;
- ds->bts_interrupt_threshold =
- ds->bts_absolute_maximum - BTS_OVFL_TH;
-
- per_cpu(cpu_hw_events, cpu).ds = ds;
- err = 0;
- }
-
- if (err)
- release_bts_hardware();
- else {
- for_each_online_cpu(cpu)
- init_debug_store_on_cpu(cpu);
- }
-
- put_online_cpus();
-
- return err;
-}
+static int reserve_ds_buffers(void);
+static void release_ds_buffers(void);
static void hw_perf_event_destroy(struct perf_event *event)
{
if (atomic_dec_and_mutex_lock(&active_events, &pmc_reserve_mutex)) {
release_pmc_hardware();
- release_bts_hardware();
+ release_ds_buffers();
mutex_unlock(&pmc_reserve_mutex);
}
}
@@ -454,7 +355,7 @@ static int __hw_perf_event_init(struct p
if (!reserve_pmc_hardware())
err = -EBUSY;
else
- err = reserve_bts_hardware();
+ err = reserve_ds_buffers();
}
if (!err)
atomic_inc(&active_events);
@@ -532,7 +433,7 @@ static int __hw_perf_event_init(struct p
if ((attr->config == PERF_COUNT_HW_BRANCH_INSTRUCTIONS) &&
(hwc->sample_period == 1)) {
/* BTS is not supported by this architecture. */
- if (!bts_available())
+ if (!x86_pmu.bts)
return -EOPNOTSUPP;
/* BTS is currently only allowed for user-mode. */
@@ -994,6 +895,7 @@ static void x86_pmu_unthrottle(struct pe
void perf_event_print_debug(void)
{
u64 ctrl, status, overflow, pmc_ctrl, pmc_count, prev_left, fixed;
+ u64 pebs;
struct cpu_hw_events *cpuc;
unsigned long flags;
int cpu, idx;
@@ -1011,12 +913,14 @@ void perf_event_print_debug(void)
rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, status);
rdmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, overflow);
rdmsrl(MSR_ARCH_PERFMON_FIXED_CTR_CTRL, fixed);
+ rdmsrl(MSR_IA32_PEBS_ENABLE, pebs);
pr_info("\n");
pr_info("CPU#%d: ctrl: %016llx\n", cpu, ctrl);
pr_info("CPU#%d: status: %016llx\n", cpu, status);
pr_info("CPU#%d: overflow: %016llx\n", cpu, overflow);
pr_info("CPU#%d: fixed: %016llx\n", cpu, fixed);
+ pr_info("CPU#%d: pebs: %016llx\n", cpu, pebs);
}
pr_info("CPU#%d: active: %016llx\n", cpu, *(u64 *)cpuc->active_mask);
@@ -1334,6 +1238,7 @@ undo:
#include "perf_event_amd.c"
#include "perf_event_p6.c"
+#include "perf_event_intel_ds.c"
#include "perf_event_intel.c"
static void __init pmu_check_apic(void)
@@ -1431,6 +1336,32 @@ static const struct pmu pmu = {
};
/*
+ * validate that we can schedule this event
+ */
+static int validate_event(struct perf_event *event)
+{
+ struct cpu_hw_events *fake_cpuc;
+ struct event_constraint *c;
+ int ret = 0;
+
+ fake_cpuc = kmalloc(sizeof(*fake_cpuc), GFP_KERNEL | __GFP_ZERO);
+ if (!fake_cpuc)
+ return -ENOMEM;
+
+ c = x86_pmu.get_event_constraints(fake_cpuc, event);
+
+ if (!c || !c->weight)
+ ret = -ENOSPC;
+
+ if (x86_pmu.put_event_constraints)
+ x86_pmu.put_event_constraints(fake_cpuc, event);
+
+ kfree(fake_cpuc);
+
+ return ret;
+}
+
+/*
* validate a single event group
*
* validation include:
@@ -1495,6 +1426,8 @@ const struct pmu *hw_perf_event_init(str
if (event->group_leader != event)
err = validate_group(event);
+ else
+ err = validate_event(event);
event->pmu = tmp;
}
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
@@ -470,42 +470,6 @@ static u64 intel_pmu_raw_event(u64 hw_ev
return hw_event & CORE_EVNTSEL_MASK;
}
-static void intel_pmu_enable_bts(u64 config)
-{
- unsigned long debugctlmsr;
-
- debugctlmsr = get_debugctlmsr();
-
- debugctlmsr |= X86_DEBUGCTL_TR;
- debugctlmsr |= X86_DEBUGCTL_BTS;
- debugctlmsr |= X86_DEBUGCTL_BTINT;
-
- if (!(config & ARCH_PERFMON_EVENTSEL_OS))
- debugctlmsr |= X86_DEBUGCTL_BTS_OFF_OS;
-
- if (!(config & ARCH_PERFMON_EVENTSEL_USR))
- debugctlmsr |= X86_DEBUGCTL_BTS_OFF_USR;
-
- update_debugctlmsr(debugctlmsr);
-}
-
-static void intel_pmu_disable_bts(void)
-{
- struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
- unsigned long debugctlmsr;
-
- if (!cpuc->ds)
- return;
-
- debugctlmsr = get_debugctlmsr();
-
- debugctlmsr &=
- ~(X86_DEBUGCTL_TR | X86_DEBUGCTL_BTS | X86_DEBUGCTL_BTINT |
- X86_DEBUGCTL_BTS_OFF_OS | X86_DEBUGCTL_BTS_OFF_USR);
-
- update_debugctlmsr(debugctlmsr);
-}
-
static void intel_pmu_disable_all(void)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
@@ -514,6 +478,8 @@ static void intel_pmu_disable_all(void)
if (test_bit(X86_PMC_IDX_FIXED_BTS, cpuc->active_mask))
intel_pmu_disable_bts();
+
+ intel_pmu_pebs_disable_all();
}
static void intel_pmu_enable_all(void)
@@ -531,6 +497,8 @@ static void intel_pmu_enable_all(void)
intel_pmu_enable_bts(event->hw.config);
}
+
+ intel_pmu_pebs_enable_all();
}
static inline u64 intel_pmu_get_status(void)
@@ -547,8 +515,7 @@ static inline void intel_pmu_ack_status(
wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, ack);
}
-static inline void
-intel_pmu_disable_fixed(struct hw_perf_event *hwc)
+static void intel_pmu_disable_fixed(struct hw_perf_event *hwc)
{
int idx = hwc->idx - X86_PMC_IDX_FIXED;
u64 ctrl_val, mask;
@@ -560,68 +527,7 @@ intel_pmu_disable_fixed(struct hw_perf_e
(void)checking_wrmsrl(hwc->config_base, ctrl_val);
}
-static void intel_pmu_drain_bts_buffer(void)
-{
- struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
- struct debug_store *ds = cpuc->ds;
- struct bts_record {
- u64 from;
- u64 to;
- u64 flags;
- };
- struct perf_event *event = cpuc->events[X86_PMC_IDX_FIXED_BTS];
- struct bts_record *at, *top;
- struct perf_output_handle handle;
- struct perf_event_header header;
- struct perf_sample_data data;
- struct pt_regs regs;
-
- if (!event)
- return;
-
- if (!ds)
- return;
-
- at = (struct bts_record *)(unsigned long)ds->bts_buffer_base;
- top = (struct bts_record *)(unsigned long)ds->bts_index;
-
- if (top <= at)
- return;
-
- ds->bts_index = ds->bts_buffer_base;
-
- perf_sample_data_init(&data, 0);
-
- data.period = event->hw.last_period;
- regs.ip = 0;
-
- /*
- * Prepare a generic sample, i.e. fill in the invariant fields.
- * We will overwrite the from and to address before we output
- * the sample.
- */
- perf_prepare_sample(&header, &data, event, &regs);
-
- if (perf_output_begin(&handle, event,
- header.size * (top - at), 1, 1))
- return;
-
- for (; at < top; at++) {
- data.ip = at->from;
- data.addr = at->to;
-
- perf_output_sample(&handle, &header, &data, event);
- }
-
- perf_output_end(&handle);
-
- /* There's new data available. */
- event->hw.interrupts++;
- event->pending_kill = POLL_IN;
-}
-
-static inline void
-intel_pmu_disable_event(struct perf_event *event)
+static void intel_pmu_disable_event(struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
@@ -637,10 +543,12 @@ intel_pmu_disable_event(struct perf_even
}
x86_pmu_disable_event(event);
+
+ if (unlikely(event->attr.precise))
+ intel_pmu_pebs_disable(hwc);
}
-static inline void
-intel_pmu_enable_fixed(struct hw_perf_event *hwc)
+static void intel_pmu_enable_fixed(struct hw_perf_event *hwc)
{
int idx = hwc->idx - X86_PMC_IDX_FIXED;
u64 ctrl_val, bits, mask;
@@ -689,6 +597,9 @@ static void intel_pmu_enable_event(struc
return;
}
+ if (unlikely(event->attr.precise))
+ intel_pmu_pebs_enable(hwc);
+
__x86_pmu_enable_event(hwc);
}
@@ -763,10 +674,17 @@ again:
inc_irq_stat(apic_perf_irqs);
ack = status;
+
+ /*
+ * PEBS overflow sets bit 62 in the global status register
+ */
+ if (__test_and_clear_bit(62, (unsigned long *)&status))
+ x86_pmu.drain_pebs();
+
for_each_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
struct perf_event *event = cpuc->events[bit];
- __clear_bit(bit, (unsigned long *) &status);
+ __clear_bit(bit, (unsigned long *)&status);
if (!test_bit(bit, cpuc->active_mask))
continue;
@@ -793,22 +711,18 @@ again:
return 1;
}
-static struct event_constraint bts_constraint =
- EVENT_CONSTRAINT(0, 1ULL << X86_PMC_IDX_FIXED_BTS, 0);
-
static struct event_constraint *
-intel_special_constraints(struct perf_event *event)
+intel_bts_constraints(struct perf_event *event)
{
- unsigned int hw_event;
-
- hw_event = event->hw.config & INTEL_ARCH_EVENT_MASK;
+ struct hw_perf_event *hwc = &event->hw;
+ unsigned int hw_event, bts_event;
- if (unlikely((hw_event ==
- x86_pmu.event_map(PERF_COUNT_HW_BRANCH_INSTRUCTIONS)) &&
- (event->hw.sample_period == 1))) {
+ hw_event = hwc->config & INTEL_ARCH_EVENT_MASK;
+ bts_event = x86_pmu.event_map(PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
+ if (unlikely(hw_event == bts_event && hwc->sample_period == 1))
return &bts_constraint;
- }
+
return NULL;
}
@@ -817,7 +731,11 @@ intel_get_event_constraints(struct cpu_h
{
struct event_constraint *c;
- c = intel_special_constraints(event);
+ c = intel_bts_constraints(event);
+ if (c)
+ return c;
+
+ c = intel_pebs_constraints(event);
if (c)
return c;
@@ -866,8 +784,6 @@ static __initconst struct x86_pmu intel_
* the generic event period:
*/
.max_period = (1ULL << 31) - 1,
- .enable_bts = intel_pmu_enable_bts,
- .disable_bts = intel_pmu_disable_bts,
.get_event_constraints = intel_get_event_constraints
};
@@ -914,6 +830,8 @@ static __init int intel_pmu_init(void)
if (version > 1)
x86_pmu.num_events_fixed = max((int)edx.split.num_events_fixed, 3);
+ intel_ds_init();
+
/*
* Install the hw-cache-events table:
*/
Index: linux-2.6/include/linux/perf_event.h
===================================================================
--- linux-2.6.orig/include/linux/perf_event.h
+++ linux-2.6/include/linux/perf_event.h
@@ -203,8 +203,9 @@ struct perf_event_attr {
enable_on_exec : 1, /* next exec enables */
task : 1, /* trace fork/exit */
watermark : 1, /* wakeup_watermark */
+ precise : 1, /* OoO invariant counter */
- __reserved_1 : 49;
+ __reserved_1 : 48;
union {
__u32 wakeup_events; /* wakeup every n events */
--
^ permalink raw reply [flat|nested] 44+ messages in thread
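[Editor's aside, not part of the patch: the DS-area setup in the PEBS patch above programs the interrupt threshold at base + one record, so the hardware raises a PMI after every PEBS record ("single record PEBS"). A stand-alone sketch of that arithmetic follows; the buffer and record sizes are illustrative assumptions, not the kernel's actual values.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes for illustration only; the kernel uses
 * PEBS_BUFFER_SIZE and x86_pmu.pebs_record_size. */
#define PEBS_BUFFER_SIZE (1 << 16)
#define PEBS_RECORD_SIZE 144

struct ds_layout {
	uint64_t base, index, abs_max, threshold;
};

/* Mirrors the reserve_ds_buffers() setup: index starts at base, the
 * absolute maximum is rounded down to a whole number of records, and
 * the threshold one record past base forces a PMI per record. */
static void pebs_layout(struct ds_layout *ds, uint64_t buffer_base)
{
	uint64_t max = PEBS_BUFFER_SIZE / PEBS_RECORD_SIZE;

	ds->base      = buffer_base;
	ds->index     = buffer_base;
	ds->abs_max   = buffer_base + max * PEBS_RECORD_SIZE;
	ds->threshold = buffer_base + PEBS_RECORD_SIZE;
}
```

With this layout, the drain paths above expect at most one record between `base` and `index`, which is why `n > 1` is flagged as "should not happen".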
* [RFC][PATCH 07/11] perf: Provide PERF_SAMPLE_REGS
2010-03-03 16:39 [RFC][PATCH 00/11] Another stab at PEBS and LBR support Peter Zijlstra
` (5 preceding siblings ...)
2010-03-03 16:39 ` [RFC][PATCH 06/11] perf, x86: PEBS infrastructure Peter Zijlstra
@ 2010-03-03 16:39 ` Peter Zijlstra
2010-03-03 17:30 ` Stephane Eranian
2010-03-03 22:02 ` Frederic Weisbecker
2010-03-03 16:39 ` [RFC][PATCH 08/11] perf, x86: Implement simple LBR support Peter Zijlstra
` (3 subsequent siblings)
10 siblings, 2 replies; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-03 16:39 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: paulus, eranian, robert.richter, fweisbec, Peter Zijlstra
[-- Attachment #1: perf-sample-regs.patch --]
[-- Type: text/plain, Size: 2365 bytes --]
Simply copy out the provided pt_regs in a u64 aligned fashion.
XXX: do task_pt_regs() and get_irq_regs() always clear everything or
are we now leaking data?
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/perf_event.h | 5 ++++-
kernel/perf_event.c | 17 +++++++++++++++++
2 files changed, 21 insertions(+), 1 deletion(-)
Index: linux-2.6/include/linux/perf_event.h
===================================================================
--- linux-2.6.orig/include/linux/perf_event.h
+++ linux-2.6/include/linux/perf_event.h
@@ -125,8 +125,9 @@ enum perf_event_sample_format {
PERF_SAMPLE_PERIOD = 1U << 8,
PERF_SAMPLE_STREAM_ID = 1U << 9,
PERF_SAMPLE_RAW = 1U << 10,
+ PERF_SAMPLE_REGS = 1U << 11,
- PERF_SAMPLE_MAX = 1U << 11, /* non-ABI */
+ PERF_SAMPLE_MAX = 1U << 12, /* non-ABI */
};
/*
@@ -392,6 +393,7 @@ enum perf_event_type {
* { u64 period; } && PERF_SAMPLE_PERIOD
*
* { struct read_format values; } && PERF_SAMPLE_READ
+ * { struct pt_regs regs; } && PERF_SAMPLE_REGS
*
* { u64 nr,
* u64 ips[nr]; } && PERF_SAMPLE_CALLCHAIN
@@ -800,6 +802,7 @@ struct perf_sample_data {
u64 period;
struct perf_callchain_entry *callchain;
struct perf_raw_record *raw;
+ struct pt_regs *regs;
};
static inline
Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -3176,6 +3176,17 @@ void perf_output_sample(struct perf_outp
if (sample_type & PERF_SAMPLE_READ)
perf_output_read(handle, event);
+ if (sample_type & PERF_SAMPLE_REGS) {
+ int size = DIV_ROUND_UP(sizeof(struct pt_regs), sizeof(u64)) *
+ sizeof(u64) - sizeof(struct pt_regs);
+
+ perf_output_put(handle, *data->regs);
+ if (size) {
+ u64 zero = 0;
+ perf_output_copy(handle, &zero, size);
+ }
+ }
+
if (sample_type & PERF_SAMPLE_CALLCHAIN) {
if (data->callchain) {
int size = 1;
@@ -3273,6 +3284,12 @@ void perf_prepare_sample(struct perf_eve
if (sample_type & PERF_SAMPLE_READ)
header->size += perf_event_read_size(event);
+ if (sample_type & PERF_SAMPLE_REGS) {
+ data->regs = regs;
+ header->size += DIV_ROUND_UP(sizeof(struct pt_regs),
+ sizeof(u64)) * sizeof(u64);
+ }
+
if (sample_type & PERF_SAMPLE_CALLCHAIN) {
int size = 1;
--
^ permalink raw reply [flat|nested] 44+ messages in thread
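[Editor's aside, not part of the patch: the PERF_SAMPLE_REGS patch pads the copied-out `pt_regs` to the next u64 boundary, since `header->size` must stay u64-aligned. A minimal sketch of that alignment math, parameterised on an arbitrary struct size rather than `sizeof(struct pt_regs)`:]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same helper the kernel uses: ceiling division. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Bytes the record occupies in the ring buffer: the struct rounded
 * up to a whole number of u64 words. */
static size_t regs_sample_size(size_t regs_size)
{
	return DIV_ROUND_UP(regs_size, sizeof(uint64_t)) * sizeof(uint64_t);
}

/* Zero padding appended after the raw struct, if any. */
static size_t regs_pad_bytes(size_t regs_size)
{
	return regs_sample_size(regs_size) - regs_size;
}
```

On x86-64 `struct pt_regs` is already a multiple of 8 bytes, so the pad is zero there; the rounding matters for architectures where it is not.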
* [RFC][PATCH 08/11] perf, x86: Implement simple LBR support
2010-03-03 16:39 [RFC][PATCH 00/11] Another stab at PEBS and LBR support Peter Zijlstra
` (6 preceding siblings ...)
2010-03-03 16:39 ` [RFC][PATCH 07/11] perf: Provide PERF_SAMPLE_REGS Peter Zijlstra
@ 2010-03-03 16:39 ` Peter Zijlstra
2010-03-03 21:52 ` Stephane Eranian
2010-03-03 21:57 ` Stephane Eranian
2010-03-03 16:39 ` [RFC][PATCH 09/11] perf, x86: Implement PERF_SAMPLE_BRANCH_STACK Peter Zijlstra
` (2 subsequent siblings)
10 siblings, 2 replies; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-03 16:39 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: paulus, eranian, robert.richter, fweisbec, Peter Zijlstra
[-- Attachment #1: perf-lbr.patch --]
[-- Type: text/plain, Size: 8982 bytes --]
Implement support for Intel LBR stacks that support
FREEZE_LBRS_ON_PMI. We do not (yet?) support the LBR config register
because that is SMT wide and would also put undue restraints on the
PEBS users.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/x86/kernel/cpu/perf_event.c | 22 ++
arch/x86/kernel/cpu/perf_event_intel.c | 13 +
arch/x86/kernel/cpu/perf_event_intel_lbr.c | 228 +++++++++++++++++++++++++++++
3 files changed, 263 insertions(+)
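[Editor's aside, not part of the patch: the LBR read path below walks the branch stack as a ring buffer, starting from the top-of-stack MSR and stepping backwards modulo the stack depth. A stand-alone sketch of that indexing, with a plain array standing in for the `MSR_LASTBRANCH_*` registers and a hypothetical depth of 16:]

```c
#include <assert.h>
#include <stdint.h>

#define LBR_NR 16	/* power-of-two stack depth, as on Core2/Nehalem */

/* Copy the LBR stack most-recent-first. msr[] stands in for the
 * LBR "from" MSRs; tos is the value read from the TOS MSR. The mask
 * makes (tos - i) wrap, which only works because LBR_NR is a power
 * of two. */
static int lbr_walk(const uint64_t *msr, uint64_t tos, uint64_t *out)
{
	uint64_t mask = LBR_NR - 1;
	int i;

	for (i = 0; i < LBR_NR; i++) {
		uint64_t idx = (tos - i) & mask;
		out[i] = msr[idx];
	}
	return i;
}
```

Note that the index expression `(tos - i) & mask` already steps backwards, so the loop itself must not decrement `tos` as well.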
Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -48,6 +48,12 @@ struct amd_nb {
struct event_constraint event_constraints[X86_PMC_IDX_MAX];
};
+#define MAX_LBR_ENTRIES 16
+
+struct lbr_entry {
+ u64 from, to, flags;
+};
+
struct cpu_hw_events {
/*
* Generic x86 PMC bits
@@ -70,6 +76,14 @@ struct cpu_hw_events {
u64 pebs_enabled;
/*
+ * Intel LBR bits
+ */
+ int lbr_users;
+ int lbr_entries;
+ struct lbr_entry lbr_stack[MAX_LBR_ENTRIES];
+ void *lbr_context;
+
+ /*
* AMD specific bits
*/
struct amd_nb *amd_nb;
@@ -154,6 +168,13 @@ struct x86_pmu {
int pebs_record_size;
void (*drain_pebs)(void);
struct event_constraint *pebs_constraints;
+
+ /*
+ * Intel LBR
+ */
+ unsigned long lbr_tos, lbr_from, lbr_to; /* MSR base regs */
+ int lbr_nr; /* hardware stack size */
+ int lbr_format; /* hardware format */
};
static struct x86_pmu x86_pmu __read_mostly;
@@ -1238,6 +1259,7 @@ undo:
#include "perf_event_amd.c"
#include "perf_event_p6.c"
+#include "perf_event_intel_lbr.c"
#include "perf_event_intel_ds.c"
#include "perf_event_intel.c"
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
@@ -480,6 +480,7 @@ static void intel_pmu_disable_all(void)
intel_pmu_disable_bts();
intel_pmu_pebs_disable_all();
+ intel_pmu_lbr_disable_all();
}
static void intel_pmu_enable_all(void)
@@ -499,6 +500,7 @@ static void intel_pmu_enable_all(void)
}
intel_pmu_pebs_enable_all();
+ intel_pmu_lbr_enable_all();
}
static inline u64 intel_pmu_get_status(void)
@@ -675,6 +677,8 @@ again:
inc_irq_stat(apic_perf_irqs);
ack = status;
+ intel_pmu_lbr_read();
+
/*
* PEBS overflow sets bit 62 in the global status register
*/
@@ -847,6 +851,8 @@ static __init int intel_pmu_init(void)
memcpy(hw_cache_event_ids, core2_hw_cache_event_ids,
sizeof(hw_cache_event_ids));
+ intel_pmu_lbr_init_core();
+
x86_pmu.event_constraints = intel_core2_event_constraints;
pr_cont("Core2 events, ");
break;
@@ -856,13 +862,18 @@ static __init int intel_pmu_init(void)
memcpy(hw_cache_event_ids, nehalem_hw_cache_event_ids,
sizeof(hw_cache_event_ids));
+ intel_pmu_lbr_init_nhm();
+
x86_pmu.event_constraints = intel_nehalem_event_constraints;
pr_cont("Nehalem/Corei7 events, ");
break;
+
case 28: /* Atom */
memcpy(hw_cache_event_ids, atom_hw_cache_event_ids,
sizeof(hw_cache_event_ids));
+ intel_pmu_lbr_init_atom();
+
x86_pmu.event_constraints = intel_gen_event_constraints;
pr_cont("Atom events, ");
break;
@@ -872,6 +883,8 @@ static __init int intel_pmu_init(void)
memcpy(hw_cache_event_ids, westmere_hw_cache_event_ids,
sizeof(hw_cache_event_ids));
+ intel_pmu_lbr_init_nhm();
+
x86_pmu.event_constraints = intel_westmere_event_constraints;
pr_cont("Westmere events, ");
break;
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel_lbr.c
===================================================================
--- /dev/null
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -0,0 +1,228 @@
+#ifdef CONFIG_CPU_SUP_INTEL
+
+enum {
+ LBR_FORMAT_32 = 0x00,
+ LBR_FORMAT_LIP = 0x01,
+ LBR_FORMAT_EIP = 0x02,
+ LBR_FORMAT_EIP_FLAGS = 0x03,
+};
+
+/*
+ * We only support LBR implementations that have FREEZE_LBRS_ON_PMI
+ * otherwise it becomes near impossible to get a reliable stack.
+ */
+
+#define X86_DEBUGCTL_LBR (1 << 0)
+#define X86_DEBUGCTL_FREEZE_LBRS_ON_PMI (1 << 11)
+
+static void __intel_pmu_lbr_enable(void)
+{
+ u64 debugctl;
+
+ rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
+ debugctl |= (X86_DEBUGCTL_LBR | X86_DEBUGCTL_FREEZE_LBRS_ON_PMI);
+ wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
+}
+
+static void __intel_pmu_lbr_disable(void)
+{
+ u64 debugctl;
+
+ rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
+ debugctl &= ~(X86_DEBUGCTL_LBR | X86_DEBUGCTL_FREEZE_LBRS_ON_PMI);
+ wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
+}
+
+static void intel_pmu_lbr_reset_32(void)
+{
+ int i;
+
+ for (i = 0; i < x86_pmu.lbr_nr; i++)
+ wrmsrl(x86_pmu.lbr_from + i, 0);
+}
+
+static void intel_pmu_lbr_reset_64(void)
+{
+ int i;
+
+ for (i = 0; i < x86_pmu.lbr_nr; i++) {
+ wrmsrl(x86_pmu.lbr_from + i, 0);
+ wrmsrl(x86_pmu.lbr_to + i, 0);
+ }
+}
+
+static void intel_pmu_lbr_reset(void)
+{
+ if (x86_pmu.lbr_format == LBR_FORMAT_32)
+ intel_pmu_lbr_reset_32();
+ else
+ intel_pmu_lbr_reset_64();
+}
+
+static void intel_pmu_lbr_enable(struct perf_event *event)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+
+ if (!x86_pmu.lbr_nr)
+ return;
+
+ WARN_ON(cpuc->enabled);
+
+ /*
+ * Reset the LBR stack if this is the first LBR user or
+ * we changed task context so as to avoid data leaks.
+ */
+
+ if (!cpuc->lbr_users ||
+ (event->ctx->task && cpuc->lbr_context != event->ctx)) {
+ intel_pmu_lbr_reset();
+ cpuc->lbr_context = event->ctx;
+ }
+
+ cpuc->lbr_users++;
+}
+
+static void intel_pmu_lbr_disable(struct perf_event *event)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+
+ if (!x86_pmu.lbr_nr)
+ return;
+
+ cpuc->lbr_users--;
+
+ BUG_ON(cpuc->lbr_users < 0);
+ WARN_ON(cpuc->enabled);
+}
+
+static void intel_pmu_lbr_enable_all(void)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+
+ if (cpuc->lbr_users)
+ __intel_pmu_lbr_enable();
+}
+
+static void intel_pmu_lbr_disable_all(void)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+
+ if (cpuc->lbr_users)
+ __intel_pmu_lbr_disable();
+}
+
+static inline u64 intel_pmu_lbr_tos(void)
+{
+ u64 tos;
+
+ rdmsrl(x86_pmu.lbr_tos, tos);
+
+ return tos;
+}
+
+static void intel_pmu_lbr_read_32(struct cpu_hw_events *cpuc)
+{
+ unsigned long mask = x86_pmu.lbr_nr - 1;
+ u64 tos = intel_pmu_lbr_tos();
+ int i;
+
+ for (i = 0; i < x86_pmu.lbr_nr; i++, tos--) {
+ unsigned long lbr_idx = (tos - i) & mask;
+ union {
+ struct {
+ u32 from;
+ u32 to;
+ };
+ u64 lbr;
+ } msr_lastbranch;
+
+ rdmsrl(x86_pmu.lbr_from + lbr_idx, msr_lastbranch.lbr);
+
+ cpuc->lbr_stack[i].from = msr_lastbranch.from;
+ cpuc->lbr_stack[i].to = msr_lastbranch.to;
+ cpuc->lbr_stack[i].flags = 0;
+ }
+ cpuc->lbr_entries = i;
+}
+
+#define LBR_FROM_FLAG_MISPRED (1ULL << 63)
+
+/*
+ * Due to lack of segmentation in Linux the effective address (offset)
+ * is the same as the linear address, allowing us to merge the LIP and EIP
+ * LBR formats.
+ */
+static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
+{
+ unsigned long mask = x86_pmu.lbr_nr - 1;
+ u64 tos = intel_pmu_lbr_tos();
+ int i;
+
+ for (i = 0; i < x86_pmu.lbr_nr; i++, tos--) {
+ unsigned long lbr_idx = (tos - i) & mask;
+ u64 from, to, flags = 0;
+
+ rdmsrl(x86_pmu.lbr_from + lbr_idx, from);
+ rdmsrl(x86_pmu.lbr_to + lbr_idx, to);
+
+ if (x86_pmu.lbr_format == LBR_FORMAT_EIP_FLAGS) {
+ flags = !!(from & LBR_FROM_FLAG_MISPRED);
+ from = (u64)((((s64)from) << 1) >> 1);
+ }
+
+ cpuc->lbr_stack[i].from = from;
+ cpuc->lbr_stack[i].to = to;
+ cpuc->lbr_stack[i].flags = flags;
+ }
+ cpuc->lbr_entries = i;
+}
+
+static void intel_pmu_lbr_read(void)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+
+ if (!cpuc->lbr_users)
+ return;
+
+ if (x86_pmu.lbr_format == LBR_FORMAT_32)
+ intel_pmu_lbr_read_32(cpuc);
+ else
+ intel_pmu_lbr_read_64(cpuc);
+}
+
+static int intel_pmu_lbr_format(void)
+{
+ u64 capabilities;
+
+ rdmsrl(MSR_IA32_PERF_CAPABILITIES, capabilities);
+ return capabilities & 0x1f;
+}
+
+static void intel_pmu_lbr_init_core(void)
+{
+ x86_pmu.lbr_format = intel_pmu_lbr_format();
+ x86_pmu.lbr_nr = 4;
+ x86_pmu.lbr_tos = 0x01c9;
+ x86_pmu.lbr_from = 0x40;
+ x86_pmu.lbr_to = 0x60;
+}
+
+static void intel_pmu_lbr_init_nhm(void)
+{
+ x86_pmu.lbr_format = intel_pmu_lbr_format();
+ x86_pmu.lbr_nr = 16;
+ x86_pmu.lbr_tos = 0x01c9;
+ x86_pmu.lbr_from = 0x680;
+ x86_pmu.lbr_to = 0x6c0;
+}
+
+static void intel_pmu_lbr_init_atom(void)
+{
+ x86_pmu.lbr_format = intel_pmu_lbr_format();
+ x86_pmu.lbr_nr = 8;
+ x86_pmu.lbr_tos = 0x01c9;
+ x86_pmu.lbr_from = 0x40;
+ x86_pmu.lbr_to = 0x60;
+}
+
+#endif /* CONFIG_CPU_SUP_INTEL */
--
^ permalink raw reply [flat|nested] 44+ messages in thread
* [RFC][PATCH 09/11] perf, x86: Implement PERF_SAMPLE_BRANCH_STACK
2010-03-03 16:39 [RFC][PATCH 00/11] Another stab at PEBS and LBR support Peter Zijlstra
` (7 preceding siblings ...)
2010-03-03 16:39 ` [RFC][PATCH 08/11] perf, x86: Implement simple LBR support Peter Zijlstra
@ 2010-03-03 16:39 ` Peter Zijlstra
2010-03-03 21:08 ` Frederic Weisbecker
2010-03-03 16:39 ` [RFC][PATCH 10/11] perf, x86: use LBR for PEBS IP+1 fixup Peter Zijlstra
2010-03-03 16:39 ` [RFC][PATCH 11/11] perf, x86: Clean up IA32_PERF_CAPABILITIES usage Peter Zijlstra
10 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-03 16:39 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: paulus, eranian, robert.richter, fweisbec, Peter Zijlstra
[-- Attachment #1: perf-sample-lbr.patch --]
[-- Type: text/plain, Size: 9664 bytes --]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/x86/kernel/cpu/perf_event.c | 14 +++-------
arch/x86/kernel/cpu/perf_event_intel.c | 10 ++++++-
arch/x86/kernel/cpu/perf_event_intel_ds.c | 16 ++++--------
arch/x86/kernel/cpu/perf_event_intel_lbr.c | 20 ++++++++-------
include/linux/perf_event.h | 27 +++++++++++++++++---
kernel/perf_event.c | 38 ++++++++++++++++++++++-------
6 files changed, 83 insertions(+), 42 deletions(-)
Index: linux-2.6/include/linux/perf_event.h
===================================================================
--- linux-2.6.orig/include/linux/perf_event.h
+++ linux-2.6/include/linux/perf_event.h
@@ -126,8 +126,9 @@ enum perf_event_sample_format {
PERF_SAMPLE_STREAM_ID = 1U << 9,
PERF_SAMPLE_RAW = 1U << 10,
PERF_SAMPLE_REGS = 1U << 11,
+ PERF_SAMPLE_BRANCH_STACK = 1U << 12,
- PERF_SAMPLE_MAX = 1U << 12, /* non-ABI */
+ PERF_SAMPLE_MAX = 1U << 13, /* non-ABI */
};
/*
@@ -395,9 +396,14 @@ enum perf_event_type {
* { struct read_format values; } && PERF_SAMPLE_READ
* { struct pt_regs regs; } && PERF_SAMPLE_REGS
*
- * { u64 nr,
+ * { u64 nr;
* u64 ips[nr]; } && PERF_SAMPLE_CALLCHAIN
*
+ * { u64 nr;
+ * { u64 from, to, flags;
+ * } lbr[nr]; } && PERF_SAMPLE_BRANCH_STACK
+ *
+ *
* #
* # The RAW record below is opaque data wrt the ABI
* #
@@ -469,6 +475,17 @@ struct perf_raw_record {
void *data;
};
+struct perf_branch_entry {
+ __u64 from;
+ __u64 to;
+ __u64 flags;
+};
+
+struct perf_branch_stack {
+ __u64 nr;
+ struct perf_branch_entry entries[0];
+};
+
struct task_struct;
/**
@@ -803,13 +820,15 @@ struct perf_sample_data {
struct perf_callchain_entry *callchain;
struct perf_raw_record *raw;
struct pt_regs *regs;
+ struct perf_branch_stack *branches;
};
static inline
void perf_sample_data_init(struct perf_sample_data *data, u64 addr)
{
- data->addr = addr;
- data->raw = NULL;
+ data->addr = addr;
+ data->raw = NULL;
+ data->branches = NULL;
}
extern void perf_output_sample(struct perf_output_handle *handle,
Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -3189,12 +3189,9 @@ void perf_output_sample(struct perf_outp
if (sample_type & PERF_SAMPLE_CALLCHAIN) {
if (data->callchain) {
- int size = 1;
+ int size = sizeof(u64);
- if (data->callchain)
- size += data->callchain->nr;
-
- size *= sizeof(u64);
+ size += data->callchain->nr * sizeof(u64);
perf_output_copy(handle, data->callchain, size);
} else {
@@ -3203,6 +3200,20 @@ void perf_output_sample(struct perf_outp
}
}
+ if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
+ if (data->branches) {
+ int size = sizeof(u64);
+
+ size += data->branches->nr *
+ sizeof(struct perf_branch_entry);
+
+ perf_output_copy(handle, data->branches, size);
+ } else {
+ u64 nr = 0;
+ perf_output_put(handle, nr);
+ }
+ }
+
if (sample_type & PERF_SAMPLE_RAW) {
if (data->raw) {
perf_output_put(handle, data->raw->size);
@@ -3291,14 +3302,25 @@ void perf_prepare_sample(struct perf_eve
}
if (sample_type & PERF_SAMPLE_CALLCHAIN) {
- int size = 1;
+ int size = sizeof(u64);
data->callchain = perf_callchain(regs);
if (data->callchain)
- size += data->callchain->nr;
+ size += data->callchain->nr * sizeof(u64);
+
+ header->size += size;
+ }
- header->size += size * sizeof(u64);
+ if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
+ int size = sizeof(u64);
+
+ if (data->branches) {
+ size += data->branches->nr *
+ sizeof(struct perf_branch_entry);
+ }
+
+ header->size += size;
}
if (sample_type & PERF_SAMPLE_RAW) {
Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -50,10 +50,6 @@ struct amd_nb {
#define MAX_LBR_ENTRIES 16
-struct lbr_entry {
- u64 from, to, flags;
-};
-
struct cpu_hw_events {
/*
* Generic x86 PMC bits
@@ -78,10 +74,10 @@ struct cpu_hw_events {
/*
* Intel LBR bits
*/
- int lbr_users;
- int lbr_entries;
- struct lbr_entry lbr_stack[MAX_LBR_ENTRIES];
- void *lbr_context;
+ int lbr_users;
+ void *lbr_context;
+ struct perf_branch_stack lbr_stack;
+ struct perf_branch_entry lbr_entries[MAX_LBR_ENTRIES];
/*
* AMD specific bits
@@ -166,7 +162,7 @@ struct x86_pmu {
*/
int bts, pebs;
int pebs_record_size;
- void (*drain_pebs)(void);
+ void (*drain_pebs)(struct perf_sample_data *data);
struct event_constraint *pebs_constraints;
/*
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel_lbr.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -138,11 +138,11 @@ static void intel_pmu_lbr_read_32(struct
rdmsrl(x86_pmu.lbr_from + lbr_idx, msr_lastbranch.lbr);
- cpuc->lbr_stack[i].from = msr_lastbranch.from;
- cpuc->lbr_stack[i].to = msr_lastbranch.to;
- cpuc->lbr_stack[i].flags = 0;
+ cpuc->lbr_entries[i].from = msr_lastbranch.from;
+ cpuc->lbr_entries[i].to = msr_lastbranch.to;
+ cpuc->lbr_entries[i].flags = 0;
}
- cpuc->lbr_entries = i;
+ cpuc->lbr_stack.nr = i;
}
#define LBR_FROM_FLAG_MISPRED (1ULL << 63)
@@ -170,14 +170,14 @@ static void intel_pmu_lbr_read_64(struct
from = (u64)((((s64)from) << 1) >> 1);
}
- cpuc->lbr_stack[i].from = from;
- cpuc->lbr_stack[i].to = to;
- cpuc->lbr_stack[i].flags = flags;
+ cpuc->lbr_entries[i].from = from;
+ cpuc->lbr_entries[i].to = to;
+ cpuc->lbr_entries[i].flags = flags;
}
- cpuc->lbr_entries = i;
+ cpuc->lbr_stack.nr = i;
}
-static void intel_pmu_lbr_read(void)
+static void intel_pmu_lbr_read(struct perf_sample_data *data)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
@@ -188,6 +188,8 @@ static void intel_pmu_lbr_read(void)
intel_pmu_lbr_read_32(cpuc);
else
intel_pmu_lbr_read_64(cpuc);
+
+ data->branches = &cpuc->lbr_stack;
}
static int intel_pmu_lbr_format(void)
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
@@ -548,6 +548,9 @@ static void intel_pmu_disable_event(stru
if (unlikely(event->attr.precise))
intel_pmu_pebs_disable(hwc);
+
+ if (event->attr.sample_type & PERF_SAMPLE_BRANCH_STACK)
+ intel_pmu_lbr_disable(event);
}
static void intel_pmu_enable_fixed(struct hw_perf_event *hwc)
@@ -602,6 +605,9 @@ static void intel_pmu_enable_event(struc
if (unlikely(event->attr.precise))
intel_pmu_pebs_enable(hwc);
+ if (event->attr.sample_type & PERF_SAMPLE_BRANCH_STACK)
+ intel_pmu_lbr_enable(event);
+
__x86_pmu_enable_event(hwc);
}
@@ -677,13 +683,13 @@ again:
inc_irq_stat(apic_perf_irqs);
ack = status;
- intel_pmu_lbr_read();
+ intel_pmu_lbr_read(&data);
/*
* PEBS overflow sets bit 62 in the global status register
*/
if (__test_and_clear_bit(62, (unsigned long *)&status))
- x86_pmu.drain_pebs();
+ x86_pmu.drain_pebs(&data);
for_each_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
struct perf_event *event = cpuc->events[bit];
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel_ds.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -418,13 +418,12 @@ do { \
static int intel_pmu_save_and_restart(struct perf_event *event);
static void intel_pmu_disable_event(struct perf_event *event);
-static void intel_pmu_drain_pebs_core(void)
+static void intel_pmu_drain_pebs_core(struct perf_sample_data *data)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
struct debug_store *ds = cpuc->ds;
struct perf_event *event = cpuc->events[0]; /* PMC0 only */
struct pebs_record_core *at, *top;
- struct perf_sample_data data;
struct pt_regs regs;
int n;
@@ -444,8 +443,7 @@ static void intel_pmu_drain_pebs_core(vo
if (!intel_pmu_save_and_restart(event))
goto out;
- perf_sample_data_init(&data, 0);
- data.period = event->hw.last_period;
+ data->period = event->hw.last_period;
n = top - at;
@@ -460,19 +458,18 @@ static void intel_pmu_drain_pebs_core(vo
PEBS_TO_REGS(at, &regs);
- if (perf_event_overflow(event, 1, &data, &regs))
+ if (perf_event_overflow(event, 1, data, &regs))
intel_pmu_disable_event(event);
out:
intel_pmu_pebs_enable_all();
}
-static void intel_pmu_drain_pebs_nhm(void)
+static void intel_pmu_drain_pebs_nhm(struct perf_sample_data *data)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
struct debug_store *ds = cpuc->ds;
struct pebs_record_nhm *at, *top;
- struct perf_sample_data data;
struct perf_event *event = NULL;
struct pt_regs regs;
int bit, n;
@@ -519,12 +516,11 @@ static void intel_pmu_drain_pebs_nhm(voi
if (!intel_pmu_save_and_restart(event))
continue;
- perf_sample_data_init(&data, 0);
- data.period = event->hw.last_period;
+ data->period = event->hw.last_period;
PEBS_TO_REGS(at, &regs);
- if (perf_event_overflow(event, 1, &data, &regs))
+ if (perf_event_overflow(event, 1, data, &regs))
intel_pmu_disable_event(event);
}
out:
--
^ permalink raw reply [flat|nested] 44+ messages in thread
* [RFC][PATCH 10/11] perf, x86: use LBR for PEBS IP+1 fixup
2010-03-03 16:39 [RFC][PATCH 00/11] Another stab at PEBS and LBR support Peter Zijlstra
` (8 preceding siblings ...)
2010-03-03 16:39 ` [RFC][PATCH 09/11] perf, x86: Implement PERF_SAMPLE_BRANCH_STACK Peter Zijlstra
@ 2010-03-03 16:39 ` Peter Zijlstra
2010-03-03 18:05 ` Masami Hiramatsu
2010-03-03 16:39 ` [RFC][PATCH 11/11] perf, x86: Clean up IA32_PERF_CAPABILITIES usage Peter Zijlstra
10 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-03 16:39 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: paulus, eranian, robert.richter, fweisbec, Masami Hiramatsu,
Peter Zijlstra
[-- Attachment #1: perf-pebs-lbr.patch --]
[-- Type: text/plain, Size: 6648 bytes --]
PEBS always reports IP+1, that is, the address of the instruction after
the one that got sampled. Cure this by using the LBR to reliably rewind
the instruction stream.
CC: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/x86/kernel/cpu/perf_event.c | 70 ++++++++++++-------------
arch/x86/kernel/cpu/perf_event_intel.c | 4 -
arch/x86/kernel/cpu/perf_event_intel_ds.c | 81 +++++++++++++++++++++++++++++-
3 files changed, 116 insertions(+), 39 deletions(-)
Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -29,6 +29,41 @@
#include <asm/stacktrace.h>
#include <asm/nmi.h>
+/*
+ * best effort, GUP based copy_from_user() that assumes IRQ or NMI context
+ */
+static unsigned long
+copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
+{
+ unsigned long offset, addr = (unsigned long)from;
+ int type = in_nmi() ? KM_NMI : KM_IRQ0;
+ unsigned long size, len = 0;
+ struct page *page;
+ void *map;
+ int ret;
+
+ do {
+ ret = __get_user_pages_fast(addr, 1, 0, &page);
+ if (!ret)
+ break;
+
+ offset = addr & (PAGE_SIZE - 1);
+ size = min(PAGE_SIZE - offset, n - len);
+
+ map = kmap_atomic(page, type);
+ memcpy(to, map+offset, size);
+ kunmap_atomic(map, type);
+ put_page(page);
+
+ len += size;
+ to += size;
+ addr += size;
+
+ } while (len < n);
+
+ return len;
+}
+
static u64 perf_event_mask __read_mostly;
struct event_constraint {
@@ -1516,41 +1551,6 @@ perf_callchain_kernel(struct pt_regs *re
dump_trace(NULL, regs, NULL, regs->bp, &backtrace_ops, entry);
}
-/*
- * best effort, GUP based copy_from_user() that assumes IRQ or NMI context
- */
-static unsigned long
-copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
-{
- unsigned long offset, addr = (unsigned long)from;
- int type = in_nmi() ? KM_NMI : KM_IRQ0;
- unsigned long size, len = 0;
- struct page *page;
- void *map;
- int ret;
-
- do {
- ret = __get_user_pages_fast(addr, 1, 0, &page);
- if (!ret)
- break;
-
- offset = addr & (PAGE_SIZE - 1);
- size = min(PAGE_SIZE - offset, n - len);
-
- map = kmap_atomic(page, type);
- memcpy(to, map+offset, size);
- kunmap_atomic(map, type);
- put_page(page);
-
- len += size;
- to += size;
- addr += size;
-
- } while (len < n);
-
- return len;
-}
-
static int copy_stack_frame(const void __user *fp, struct stack_frame *frame)
{
unsigned long bytes;
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
@@ -547,7 +547,7 @@ static void intel_pmu_disable_event(stru
x86_pmu_disable_event(event);
if (unlikely(event->attr.precise))
- intel_pmu_pebs_disable(hwc);
+ intel_pmu_pebs_disable(event);
if (event->attr.sample_type & PERF_SAMPLE_BRANCH_STACK)
intel_pmu_lbr_disable(event);
@@ -603,7 +603,7 @@ static void intel_pmu_enable_event(struc
}
if (unlikely(event->attr.precise))
- intel_pmu_pebs_enable(hwc);
+ intel_pmu_pebs_enable(event);
if (event->attr.sample_type & PERF_SAMPLE_BRANCH_STACK)
intel_pmu_lbr_enable(event);
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel_ds.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -331,26 +331,32 @@ intel_pebs_constraints(struct perf_event
return &emptyconstraint;
}
-static void intel_pmu_pebs_enable(struct hw_perf_event *hwc)
+static void intel_pmu_pebs_enable(struct perf_event *event)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ struct hw_perf_event *hwc = &event->hw;
u64 val = cpuc->pebs_enabled;
hwc->config &= ~ARCH_PERFMON_EVENTSEL_INT;
val |= 1ULL << hwc->idx;
wrmsrl(MSR_IA32_PEBS_ENABLE, val);
+
+ intel_pmu_lbr_enable(event);
}
-static void intel_pmu_pebs_disable(struct hw_perf_event *hwc)
+static void intel_pmu_pebs_disable(struct perf_event *event)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ struct hw_perf_event *hwc = &event->hw;
u64 val = cpuc->pebs_enabled;
val &= ~(1ULL << hwc->idx);
wrmsrl(MSR_IA32_PEBS_ENABLE, val);
hwc->config |= ARCH_PERFMON_EVENTSEL_INT;
+
+ intel_pmu_lbr_disable(event);
}
static void intel_pmu_pebs_enable_all(void)
@@ -415,6 +421,74 @@ do { \
#endif
+#include <asm/insn.h>
+
+#define MAX_INSN_SIZE 16
+
+static void intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
+{
+#if 0
+ /*
+ * Broken, makes the machine explode at times trying to
+ * dereference funny userspace addresses.
+ *
+ * Should we always fwd decode from @to, instead of trying
+ * to rewind as implemented?
+ */
+
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ unsigned long from = cpuc->lbr_entries[0].from;
+ unsigned long to = cpuc->lbr_entries[0].to;
+ unsigned long ip = regs->ip;
+ u8 buf[2*MAX_INSN_SIZE];
+ u8 *kaddr;
+ int i;
+
+ if (from && to) {
+ /*
+ * We sampled a branch insn, rewind using the LBR stack
+ */
+ if (ip == to) {
+ regs->ip = from;
+ return;
+ }
+ }
+
+ if (user_mode(regs)) {
+ int bytes = copy_from_user_nmi(buf,
+ (void __user *)(ip - MAX_INSN_SIZE),
+ 2*MAX_INSN_SIZE);
+
+ /*
+ * If we fail to copy the insn stream, give up
+ */
+ if (bytes != 2*MAX_INSN_SIZE)
+ return;
+
+ kaddr = buf;
+ } else
+ kaddr = (void *)(ip - MAX_INSN_SIZE);
+
+ /*
+ * Try to find the longest insn ending up at the given IP
+ */
+ for (i = MAX_INSN_SIZE; i > 0; i--) {
+ struct insn insn;
+
+ kernel_insn_init(&insn, kaddr + MAX_INSN_SIZE - i);
+ insn_get_length(&insn);
+ if (insn.length == i) {
+ regs->ip -= i;
+ return;
+ }
+ }
+
+ /*
+ * We failed to find a match for the previous insn.. give up
+ */
+#endif
+}
+
static int intel_pmu_save_and_restart(struct perf_event *event);
static void intel_pmu_disable_event(struct perf_event *event);
@@ -458,6 +532,8 @@ static void intel_pmu_drain_pebs_core(st
PEBS_TO_REGS(at, &regs);
+ intel_pmu_pebs_fixup_ip(&regs);
+
if (perf_event_overflow(event, 1, data, &regs))
intel_pmu_disable_event(event);
@@ -519,6 +595,7 @@ static void intel_pmu_drain_pebs_nhm(str
data->period = event->hw.last_period;
PEBS_TO_REGS(at, &regs);
+ intel_pmu_pebs_fixup_ip(&regs);
if (perf_event_overflow(event, 1, data, &regs))
intel_pmu_disable_event(event);
--
^ permalink raw reply [flat|nested] 44+ messages in thread
* [RFC][PATCH 11/11] perf, x86: Clean up IA32_PERF_CAPABILITIES usage
2010-03-03 16:39 [RFC][PATCH 00/11] Another stab at PEBS and LBR support Peter Zijlstra
` (9 preceding siblings ...)
2010-03-03 16:39 ` [RFC][PATCH 10/11] perf, x86: use LBR for PEBS IP+1 fixup Peter Zijlstra
@ 2010-03-03 16:39 ` Peter Zijlstra
10 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-03 16:39 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: paulus, eranian, robert.richter, fweisbec, Peter Zijlstra
[-- Attachment #1: perf-capabilities.patch --]
[-- Type: text/plain, Size: 6302 bytes --]
Saner PERF_CAPABILITIES support
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/x86/kernel/cpu/perf_event.c | 15 +++++++++++++--
arch/x86/kernel/cpu/perf_event_intel.c | 10 ++++++++++
arch/x86/kernel/cpu/perf_event_intel_ds.c | 26 +++++++++++++-------------
arch/x86/kernel/cpu/perf_event_intel_lbr.c | 18 ++++--------------
4 files changed, 40 insertions(+), 29 deletions(-)
Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -154,6 +154,17 @@ struct cpu_hw_events {
#define for_each_event_constraint(e, c) \
for ((e) = (c); (e)->cmask; (e)++)
+union perf_capabilities {
+ struct {
+ u64 lbr_format : 6;
+ u64 pebs_trap : 1;
+ u64 pebs_arch_reg : 1;
+ u64 pebs_format : 4;
+ u64 smm_freeze : 1;
+ };
+ u64 capabilities;
+};
+
/*
* struct x86_pmu - generic x86 pmu
*/
@@ -190,7 +201,8 @@ struct x86_pmu {
/*
* Intel Arch Perfmon v2+
*/
- u64 intel_ctrl;
+ u64 intel_ctrl;
+ union perf_capabilities intel_perf_capabilities;
/*
* Intel DebugStore bits
@@ -205,7 +217,6 @@ struct x86_pmu {
*/
unsigned long lbr_tos, lbr_from, lbr_to; /* MSR base regs */
int lbr_nr; /* hardware stack size */
- int lbr_format; /* hardware format */
};
static struct x86_pmu x86_pmu __read_mostly;
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
@@ -840,6 +840,16 @@ static __init int intel_pmu_init(void)
if (version > 1)
x86_pmu.num_events_fixed = max((int)edx.split.num_events_fixed, 3);
+ /*
+ * v2 and above have a perf capabilities MSR
+ */
+ if (version > 1) {
+ u64 capabilities;
+
+ rdmsrl(MSR_IA32_PERF_CAPABILITIES, capabilities);
+ x86_pmu.intel_perf_capabilities.capabilities = capabilities;
+ }
+
intel_ds_init();
/*
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel_ds.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -444,6 +444,12 @@ static void intel_pmu_pebs_fixup_ip(stru
u8 *kaddr;
int i;
+ /*
+ * We don't need to fixup if the PEBS assist is fault like
+ */
+ if (!x86_pmu.intel_perf_capabilities.pebs_trap)
+ return;
+
if (from && to) {
/*
* We sampled a branch insn, rewind using the LBR stack
@@ -619,34 +625,28 @@ static void intel_ds_init(void)
x86_pmu.bts = boot_cpu_has(X86_FEATURE_BTS);
x86_pmu.pebs = boot_cpu_has(X86_FEATURE_PEBS);
if (x86_pmu.pebs) {
- int format = 0;
-
- if (x86_pmu.version > 1) {
- u64 capabilities;
- /*
- * v2+ has a PEBS format field
- */
- rdmsrl(MSR_IA32_PERF_CAPABILITIES, capabilities);
- format = (capabilities >> 8) & 0xf;
- }
+ int format = x86_pmu.intel_perf_capabilities.pebs_format;
+ char pebs_type =
+ x86_pmu.intel_perf_capabilities.pebs_trap ? '+' : '-';
switch (format) {
case 0:
- printk(KERN_CONT "PEBS v0, ");
+ printk(KERN_CONT "PEBS fmt0%c, ", pebs_type);
x86_pmu.pebs_record_size = sizeof(struct pebs_record_core);
x86_pmu.drain_pebs = intel_pmu_drain_pebs_core;
x86_pmu.pebs_constraints = intel_core_pebs_events;
break;
case 1:
- printk(KERN_CONT "PEBS v1, ");
+ printk(KERN_CONT "PEBS fmt1%c, ", pebs_type);
x86_pmu.pebs_record_size = sizeof(struct pebs_record_nhm);
x86_pmu.drain_pebs = intel_pmu_drain_pebs_nhm;
x86_pmu.pebs_constraints = intel_nehalem_pebs_events;
break;
default:
- printk(KERN_CONT "PEBS unknown format: %d, ", format);
+ printk(KERN_CONT "no PEBS fmt%d%c, ",
+ format, pebs_type);
x86_pmu.pebs = 0;
break;
}
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel_lbr.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -53,7 +53,7 @@ static void intel_pmu_lbr_reset_64(void)
static void intel_pmu_lbr_reset(void)
{
- if (x86_pmu.lbr_format == LBR_FORMAT_32)
+ if (x86_pmu.intel_perf_capabilities.lbr_format == LBR_FORMAT_32)
intel_pmu_lbr_reset_32();
else
intel_pmu_lbr_reset_64();
@@ -155,6 +155,7 @@ static void intel_pmu_lbr_read_32(struct
static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
{
unsigned long mask = x86_pmu.lbr_nr - 1;
+ int lbr_format = x86_pmu.intel_perf_capabilities.lbr_format;
u64 tos = intel_pmu_lbr_tos();
int i;
@@ -165,7 +166,7 @@ static void intel_pmu_lbr_read_64(struct
rdmsrl(x86_pmu.lbr_from + lbr_idx, from);
rdmsrl(x86_pmu.lbr_to + lbr_idx, to);
- if (x86_pmu.lbr_format == LBR_FORMAT_EIP_FLAGS) {
+ if (lbr_format == LBR_FORMAT_EIP_FLAGS) {
flags = !!(from & LBR_FROM_FLAG_MISPRED);
from = (u64)((((s64)from) << 1) >> 1);
}
@@ -184,7 +185,7 @@ static void intel_pmu_lbr_read(struct pe
if (!cpuc->lbr_users)
return;
- if (x86_pmu.lbr_format == LBR_FORMAT_32)
+ if (x86_pmu.intel_perf_capabilities.lbr_format == LBR_FORMAT_32)
intel_pmu_lbr_read_32(cpuc);
else
intel_pmu_lbr_read_64(cpuc);
@@ -192,17 +193,8 @@ static void intel_pmu_lbr_read(struct pe
data->branches = &cpuc->lbr_stack;
}
-static int intel_pmu_lbr_format(void)
-{
- u64 capabilities;
-
- rdmsrl(MSR_IA32_PERF_CAPABILITIES, capabilities);
- return capabilities & 0x1f;
-}
-
static void intel_pmu_lbr_init_core(void)
{
- x86_pmu.lbr_format = intel_pmu_lbr_format();
x86_pmu.lbr_nr = 4;
x86_pmu.lbr_tos = 0x01c9;
x86_pmu.lbr_from = 0x40;
@@ -211,7 +203,6 @@ static void intel_pmu_lbr_init_core(void
static void intel_pmu_lbr_init_nhm(void)
{
- x86_pmu.lbr_format = intel_pmu_lbr_format();
x86_pmu.lbr_nr = 16;
x86_pmu.lbr_tos = 0x01c9;
x86_pmu.lbr_from = 0x680;
@@ -220,7 +211,6 @@ static void intel_pmu_lbr_init_nhm(void)
static void intel_pmu_lbr_init_atom(void)
{
- x86_pmu.lbr_format = intel_pmu_lbr_format();
x86_pmu.lbr_nr = 8;
x86_pmu.lbr_tos = 0x01c9;
x86_pmu.lbr_from = 0x40;
--
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC][PATCH 05/11] perf: Generic perf_sample_data initialization
2010-03-03 16:39 ` [RFC][PATCH 05/11] perf: Generic perf_sample_data initialization Peter Zijlstra
@ 2010-03-03 16:49 ` David Miller
2010-03-03 21:14 ` Frederic Weisbecker
2010-03-05 8:44 ` Jean Pihet
2 siblings, 0 replies; 44+ messages in thread
From: David Miller @ 2010-03-03 16:49 UTC (permalink / raw)
To: a.p.zijlstra
Cc: mingo, linux-kernel, paulus, eranian, robert.richter, fweisbec,
jamie.iles, jpihet, stable
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Wed, 03 Mar 2010 17:39:41 +0100
> This makes it easier to extend perf_sample_data and fixes a bug on
> arm and sparc, which failed to set ->raw to NULL, which can cause
> crashes when combined with PERF_SAMPLE_RAW.
>
> It also optimizes PowerPC and tracepoint, because the struct
> initialization is forced to zero out the whole structure.
>
> CC: Jamie Iles <jamie.iles@picochip.com>
> CC: Jean Pihet <jpihet@mvista.com>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Ingo Molnar <mingo@elte.hu>
> CC: David S. Miller <davem@davemloft.net>
> CC: Stephane Eranian <eranian@google.com>
> CC: Frederic Weisbecker <fweisbec@gmail.com>
> CC: stable@kernel.org
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: David S. Miller <davem@davemloft.net>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC][PATCH 07/11] perf: Provide PERF_SAMPLE_REGS
2010-03-03 16:39 ` [RFC][PATCH 07/11] perf: Provide PERF_SAMPLE_REGS Peter Zijlstra
@ 2010-03-03 17:30 ` Stephane Eranian
2010-03-03 17:39 ` Peter Zijlstra
2010-03-03 22:02 ` Frederic Weisbecker
1 sibling, 1 reply; 44+ messages in thread
From: Stephane Eranian @ 2010-03-03 17:30 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, paulus, robert.richter, fweisbec, David S. Miller
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 4218 bytes --]
This assumes struct pt_regs is somehow exported to userland. Is that the case?
I would clearly spell out that the REGS are the interrupted REGS, not the overflow REGS. Maybe PERF_SAMPLE_IREGS.
On Wed, Mar 3, 2010 at 8:39 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> Simply copy out the provided pt_regs in a u64 aligned fashion.
>
> XXX: do task_pt_regs() and get_irq_regs() always clear everything or
>   are we now leaking data?
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  include/linux/perf_event.h |   5 ++++-
>  kernel/perf_event.c        |  17 +++++++++++++++++
>  2 files changed, 21 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/include/linux/perf_event.h
> ===================================================================
> --- linux-2.6.orig/include/linux/perf_event.h
> +++ linux-2.6/include/linux/perf_event.h
> @@ -125,8 +125,9 @@ enum perf_event_sample_format {
>     PERF_SAMPLE_PERIOD    = 1U << 8,
>     PERF_SAMPLE_STREAM_ID = 1U << 9,
>     PERF_SAMPLE_RAW       = 1U << 10,
> +   PERF_SAMPLE_REGS      = 1U << 11,
>
> -   PERF_SAMPLE_MAX = 1U << 11,   /* non-ABI */
> +   PERF_SAMPLE_MAX = 1U << 12,   /* non-ABI */
>  };
>
>  /*
> @@ -392,6 +393,7 @@ enum perf_event_type {
>     *   { u64          period;  } && PERF_SAMPLE_PERIOD
>     *
>     *   { struct read_format   values;  } && PERF_SAMPLE_READ
> +   *   { struct pt_regs       regs;    } && PERF_SAMPLE_REGS
>     *
>     *   { u64          nr,
>     *     u64          ips[nr];  } && PERF_SAMPLE_CALLCHAIN
> @@ -800,6 +802,7 @@ struct perf_sample_data {
>     u64               period;
>     struct perf_callchain_entry   *callchain;
>     struct perf_raw_record        *raw;
> +   struct pt_regs                *regs;
>  };
>
>  static inline
> Index: linux-2.6/kernel/perf_event.c
> ===================================================================
> --- linux-2.6.orig/kernel/perf_event.c
> +++ linux-2.6/kernel/perf_event.c
> @@ -3176,6 +3176,17 @@ void perf_output_sample(struct perf_outp
>     if (sample_type & PERF_SAMPLE_READ)
>         perf_output_read(handle, event);
>
> +   if (sample_type & PERF_SAMPLE_REGS) {
> +       int size = DIV_ROUND_UP(sizeof(struct pt_regs), sizeof(u64)) -
> +              sizeof(struct pt_regs);
> +
> +       perf_output_put(handle, *data->regs);
> +       if (size) {
> +           u64 zero = 0;
> +           perf_output_copy(handle, &zero, size);
> +       }
> +   }
> +
>     if (sample_type & PERF_SAMPLE_CALLCHAIN) {
>         if (data->callchain) {
>             int size = 1;
> @@ -3273,6 +3284,12 @@ void perf_prepare_sample(struct perf_eve
>     if (sample_type & PERF_SAMPLE_READ)
>         header->size += perf_event_read_size(event);
>
> +   if (sample_type & PERF_SAMPLE_REGS) {
> +       data->regs = regs;
> +       header->size += DIV_ROUND_UP(sizeof(struct pt_regs),
> +                      sizeof(u64));
> +   }
> +
>     if (sample_type & PERF_SAMPLE_CALLCHAIN) {
>         int size = 1;
>
>
> --
>
--
Stephane Eranian | EMEA Software Engineering
Google France | 38 avenue de l'Opéra | 75002 Paris
Tel : +33 (0) 1 42 68 53 00
This email may be confidential or privileged. If you received this
communication by mistake, please don't forward it to anyone else, please
erase all copies and attachments, and please let me know that it went to
the wrong person. Thanks
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC][PATCH 06/11] perf, x86: PEBS infrastructure
2010-03-03 16:39 ` [RFC][PATCH 06/11] perf, x86: PEBS infrastructure Peter Zijlstra
@ 2010-03-03 17:38 ` Robert Richter
2010-03-03 17:42 ` Peter Zijlstra
0 siblings, 1 reply; 44+ messages in thread
From: Robert Richter @ 2010-03-03 17:38 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: mingo, linux-kernel, paulus, eranian, fweisbec
On 03.03.10 17:39:42, Peter Zijlstra wrote:
> Implement a simple PEBS model that always takes a single PEBS event at
> a time. This is done so that the interaction with the rest of the
> system is as expected (freq adjust, period randomization, lbr).
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
[...]
> +static int validate_event(struct perf_event *event)
> +{
> + struct cpu_hw_events *fake_cpuc;
> + struct event_constraint *c;
> + int ret = 0;
> +
> + fake_cpuc = kmalloc(sizeof(*fake_cpuc), GFP_KERNEL | __GFP_ZERO);
> + if (!fake_cpuc)
> + return -ENOMEM;
> +
> + c = x86_pmu.get_event_constraints(fake_cpuc, event);
> +
> + if (!c || !c->weight)
> + ret = -ENOSPC;
> +
> + if (x86_pmu.put_event_constraints)
> + x86_pmu.put_event_constraints(fake_cpuc, event);
A fake cpuc with the struct filled with zeros will cause a null pointer
exception in amd_get_event_constraints():
struct amd_nb *nb = cpuc->amd_nb;
Shouldn't x86_schedule_events() be sufficient to decide whether a single
counter is available? I did not yet look at group events; this might
happen there too.
-Robert
> +
> + kfree(fake_cpuc);
> +
> + return ret;
> +}
> +
> +/*
> * validate a single event group
> *
> * validation include:
> @@ -1495,6 +1426,8 @@ const struct pmu *hw_perf_event_init(str
>
> if (event->group_leader != event)
> err = validate_group(event);
> + else
> + err = validate_event(event);
>
> event->pmu = tmp;
> }
--
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC][PATCH 07/11] perf: Provide PERF_SAMPLE_REGS
2010-03-03 17:30 ` Stephane Eranian
@ 2010-03-03 17:39 ` Peter Zijlstra
2010-03-03 17:49 ` Stephane Eranian
0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-03 17:39 UTC (permalink / raw)
To: Stephane Eranian
Cc: mingo, linux-kernel, paulus, robert.richter, fweisbec, David S. Miller
On Wed, 2010-03-03 at 09:30 -0800, Stephane Eranian wrote:
> This assumes struct pt_regs is somehow exported to userland.
> Is that the case?
I seem to have understood they were, and asm/ptrace.h seems to agree
with that: it has !__KERNEL__ definitions for struct pt_regs.
> I would clearly spell out that the REGS are the interrupted REGS,
> not the overflow REGS. Maybe PERF_SAMPLE_IREGS.
They can be both: for PEBS they are the overflow trap regs (until PEBS
becomes fault-like).
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC][PATCH 06/11] perf, x86: PEBS infrastructure
2010-03-03 17:38 ` Robert Richter
@ 2010-03-03 17:42 ` Peter Zijlstra
2010-03-04 8:50 ` Robert Richter
0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-03 17:42 UTC (permalink / raw)
To: Robert Richter; +Cc: mingo, linux-kernel, paulus, eranian, fweisbec
On Wed, 2010-03-03 at 18:38 +0100, Robert Richter wrote:
> > + fake_cpuc = kmalloc(sizeof(*fake_cpuc), GFP_KERNEL | __GFP_ZERO);
> > + if (!fake_cpuc)
> > + return -ENOMEM;
> > +
> > + c = x86_pmu.get_event_constraints(fake_cpuc, event);
> > +
> > + if (!c || !c->weight)
> > + ret = -ENOSPC;
> > +
> > + if (x86_pmu.put_event_constraints)
> > + x86_pmu.put_event_constraints(fake_cpuc, event);
>
> A fake cpu with the struct filled with zeros will cause a null pointer
> exception in amd_get_event_constraints():
>
> struct amd_nb *nb = cpuc->amd_nb;
That should result in nb == NULL, right? Which is checked slightly
further down in the function.
> Shouldn't x86_schedule_events() be sufficient to decide whether a single
> counter is available? I did not yet look at group events; this might
> happen there too.
Sure, but we will only attempt to schedule them at enable time; this is a
creation-time check, and failing to create an unschedulable event seems
prudent.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC][PATCH 07/11] perf: Provide PERF_SAMPLE_REGS
2010-03-03 17:39 ` Peter Zijlstra
@ 2010-03-03 17:49 ` Stephane Eranian
2010-03-03 17:55 ` David Miller
0 siblings, 1 reply; 44+ messages in thread
From: Stephane Eranian @ 2010-03-03 17:49 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, paulus, robert.richter, fweisbec, David S. Miller
On Wed, Mar 3, 2010 at 9:39 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2010-03-03 at 09:30 -0800, Stephane Eranian wrote:
>> This assumes struct pt_regs is somehow exported to userland.
>> Is that the case?
>
> I seem to have understood they were, and asm/ptrace.h seems to agree
> with that: it has !__KERNEL__ definitions for struct pt_regs.
>
Seems to be the case, indeed.
>> I would clearly spell out that the REGS are the interrupted REGS,
>> not the overflow REGS. Maybe PERF_SAMPLE_IREGS.
>
> They can be both: for PEBS they are the overflow trap regs (until PEBS
> becomes fault-like).
You're saying: without PEBS = interrupted state, with PEBS = overflow state.
That precludes requesting both the interrupted and the overflow state when
PEBS is enabled. That may be interesting for looking at differences, i.e.
the distance between the two IPs.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC][PATCH 07/11] perf: Provide PERF_SAMPLE_REGS
2010-03-03 17:49 ` Stephane Eranian
@ 2010-03-03 17:55 ` David Miller
2010-03-03 18:18 ` Stephane Eranian
` (2 more replies)
0 siblings, 3 replies; 44+ messages in thread
From: David Miller @ 2010-03-03 17:55 UTC (permalink / raw)
To: eranian; +Cc: peterz, mingo, linux-kernel, paulus, robert.richter, fweisbec
From: Stephane Eranian <eranian@google.com>
Date: Wed, 3 Mar 2010 09:49:33 -0800
> On Wed, Mar 3, 2010 at 9:39 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Wed, 2010-03-03 at 09:30 -0800, Stephane Eranian wrote:
>>> This assumes struct pt_regs is somehow exported to userland.
>>> Is that the case?
>>
>> I seem to have understood they were, and asm/ptrace.h seems to agree
>> with that: it has !__KERNEL__ definitions for struct pt_regs.
>>
> Seems to be the case, indeed.
BTW, how are you going to cope with compat systems?
If I build 'perf' on a sparc64 kernel build, it's going to get the
64-bit pt_regs. So I can't then use that binary on a sparc box
running a 32-bit kernel.
And vice versa.
And more generally aren't we supposed to be able to eventually analyze
perf dumps on any platform not just the one 'perf' was built under?
We'll need to do something about the encoding of pt_regs, therefore.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC][PATCH 10/11] perf, x86: use LBR for PEBS IP+1 fixup
2010-03-03 16:39 ` [RFC][PATCH 10/11] perf, x86: use LBR for PEBS IP+1 fixup Peter Zijlstra
@ 2010-03-03 18:05 ` Masami Hiramatsu
2010-03-03 19:37 ` Peter Zijlstra
0 siblings, 1 reply; 44+ messages in thread
From: Masami Hiramatsu @ 2010-03-03 18:05 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, paulus, eranian, robert.richter, fweisbec
Peter Zijlstra wrote:
> PEBS always reports the IP+1, that is the instruction after the one
> that got sampled, cure this by using the LBR to reliably rewind the
> instruction stream.
Hmm, does PEBS always report one byte past the end address of the
sampled instruction, or the address of the instruction that will be
executed next?
[...]
> +#include <asm/insn.h>
> +
> +#define MAX_INSN_SIZE 16
Hmm, we'd better integrate these kinds of definitions into
asm/insn.h... (several features define it)
> +
> +static void intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
> +{
> +#if 0
> + /*
> + * Borken, makes the machine expode at times trying to
> + * derefence funny userspace addresses.
> + *
> + * Should we always fwd decode from @to, instead of trying
> + * to rewind as implemented?
> + */
> +
> + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> + unsigned long from = cpuc->lbr_entries[0].from;
> + unsigned long to = cpuc->lbr_entries[0].to;
Ah, I see. For branch instruction case, we can use LBR to
find previous IP...
> + unsigned long ip = regs->ip;
> + u8 buf[2*MAX_INSN_SIZE];
> + u8 *kaddr;
> + int i;
> +
> + if (from && to) {
> + /*
> + * We sampled a branch insn, rewind using the LBR stack
> + */
> + if (ip == to) {
> + regs->ip = from;
> + return;
> + }
> + }
> +
> + if (user_mode(regs)) {
> + int bytes = copy_from_user_nmi(buf,
> + (void __user *)(ip - MAX_INSN_SIZE),
> + 2*MAX_INSN_SIZE);
> +
Maybe you'd better check that the source address range is within the
user address range, e.g. that ip < MAX_INSN_SIZE.
> + /*
> + * If we fail to copy the insn stream, give up
> + */
> + if (bytes != 2*MAX_INSN_SIZE)
> + return;
> +
> + kaddr = buf;
> + } else
> + kaddr = (void *)(ip - MAX_INSN_SIZE);
It also needs to be checked this address within kernel text.
> +
> + /*
> + * Try to find the longest insn ending up at the given IP
> + */
> + for (i = MAX_INSN_SIZE; i > 0; i--) {
> + struct insn insn;
> +
> + kernel_insn_init(&insn, kaddr + MAX_INSN_SIZE - i);
> + insn_get_length(&insn);
> + if (insn.length == i) {
> + regs->ip -= i;
> + return;
> + }
> + }
Hmm, this will not work correctly on x86, since the decoder can
miss-decode the tail bytes of previous instruction as prefix bytes. :(
Thus, if you want to rewind instruction stream, you need to decode
a function (or basic block) entirely.
Thank you,
> +
> + /*
> + * We failed to find a match for the previous insn.. give up
> + */
> +#endif
> +}
> +
> static int intel_pmu_save_and_restart(struct perf_event *event);
> static void intel_pmu_disable_event(struct perf_event *event);
>
> @@ -458,6 +532,8 @@ static void intel_pmu_drain_pebs_core(st
>
> PEBS_TO_REGS(at, ®s);
>
> + intel_pmu_pebs_fixup_ip(®s);
> +
> if (perf_event_overflow(event, 1, data, ®s))
> intel_pmu_disable_event(event);
>
> @@ -519,6 +595,7 @@ static void intel_pmu_drain_pebs_nhm(str
> data->period = event->hw.last_period;
>
> PEBS_TO_REGS(at, ®s);
> + intel_pmu_pebs_fixup_ip(®s);
>
> if (perf_event_overflow(event, 1, data, ®s))
> intel_pmu_disable_event(event);
>
> --
--
Masami Hiramatsu
e-mail: mhiramat@redhat.com
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC][PATCH 07/11] perf: Provide PERF_SAMPLE_REGS
2010-03-03 17:55 ` David Miller
@ 2010-03-03 18:18 ` Stephane Eranian
2010-03-03 19:18 ` Peter Zijlstra
2010-03-04 2:59 ` Ingo Molnar
2 siblings, 0 replies; 44+ messages in thread
From: Stephane Eranian @ 2010-03-03 18:18 UTC (permalink / raw)
To: David Miller
Cc: peterz, mingo, linux-kernel, paulus, robert.richter, fweisbec
On Wed, Mar 3, 2010 at 9:55 AM, David Miller <davem@davemloft.net> wrote:
> From: Stephane Eranian <eranian@google.com>
> Date: Wed, 3 Mar 2010 09:49:33 -0800
>
>> On Wed, Mar 3, 2010 at 9:39 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> On Wed, 2010-03-03 at 09:30 -0800, Stephane Eranian wrote:
>>>> This assumes struct pt_regs is somehow exported to userland.
>>>> Is that the case?
>>>
>>> I seem to have understood they were, and asm/ptrace.h seems to agree
>>> with that: it has !__KERNEL__ definitions for struct pt_regs.
>>>
>> Seems to be the case, indeed.
>
> BTW, how are you going to cope with compat systems?
>
> If I build 'perf' on a sparc64 kernel build, it's going to get the
> 64-bit pt_regs. So I can't then use that binary on a sparc box
> running a 32-bit kernel.
>
> And vice versa.
>
That was going to be my next question. The pt_regs you return depend on
the binary you are monitoring (32 vs. 64 bit) if the interrupt occurred
in userland. But what if it happens in kernel mode?
> And more generally aren't we supposed to be able to eventually analyze
> perf dumps on any platform not just the one 'perf' was built under?
>
> We'll need to do something about the encoding of pt_regs, therefore.
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC][PATCH 07/11] perf: Provide PERF_SAMPLE_REGS
2010-03-03 17:55 ` David Miller
2010-03-03 18:18 ` Stephane Eranian
@ 2010-03-03 19:18 ` Peter Zijlstra
2010-03-04 2:59 ` Ingo Molnar
2 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-03 19:18 UTC (permalink / raw)
To: David Miller
Cc: eranian, mingo, linux-kernel, paulus, robert.richter, fweisbec
On Wed, 2010-03-03 at 09:55 -0800, David Miller wrote:
> From: Stephane Eranian <eranian@google.com>
> Date: Wed, 3 Mar 2010 09:49:33 -0800
>
> > On Wed, Mar 3, 2010 at 9:39 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> On Wed, 2010-03-03 at 09:30 -0800, Stephane Eranian wrote:
> >>> This assumes struct pt_regs is somehow exported to userland.
> >>> Is that the case?
> >>
> >> I seem to have understood they were, and asm/ptrace.h seems to agree
> >> with that: it has !__KERNEL__ definitions for struct pt_regs.
> >>
> > Seems to be the case, indeed.
>
> BTW, how are you going to cope with compat systems?
>
> If I build 'perf' on a sparc64 kernel build, it's going to get the
> 64-bit pt_regs. So I can't then use that binary on a sparc box
> running a 32-bit kernel.
>
> And vice versa.
>
> And more generally aren't we supposed to be able to eventually analyze
> perf dumps on any platform not just the one 'perf' was built under?
>
> We'll need to do something about the encoding of pt_regs, therefore.
Hrm, yes... what I can do for the moment is 'cheat' and make the raw
PEBS record available through PERF_SAMPLE_RAW (that also has CAP_ADMIN,
which I guess is a good idea for full reg sets), and then we can work
out how to expose pt_regs later.
If someone has a better suggestion than this, which is basically blurp
out host native pt_regs and cope, please tell ;-)
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC][PATCH 10/11] perf, x86: use LBR for PEBS IP+1 fixup
2010-03-03 18:05 ` Masami Hiramatsu
@ 2010-03-03 19:37 ` Peter Zijlstra
2010-03-03 21:11 ` Masami Hiramatsu
0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-03 19:37 UTC (permalink / raw)
To: Masami Hiramatsu
Cc: mingo, linux-kernel, paulus, eranian, robert.richter, fweisbec
On Wed, 2010-03-03 at 13:05 -0500, Masami Hiramatsu wrote:
> Peter Zijlstra wrote:
> > PEBS always reports the IP+1, that is the instruction after the one
> > that got sampled, cure this by using the LBR to reliably rewind the
> > instruction stream.
>
> Hmm, does PEBS always report one byte after the end address of the
> sampled instruction? Or the instruction which will be executed next
> step?
The next instruction; it's trap-like.
> [...]
> > +#include <asm/insn.h>
> > +
> > +#define MAX_INSN_SIZE 16
>
> Hmm, we'd better integrate these kinds of definitions into
> asm/insn.h... (several features define it)
Agreed, I'll look at doing a patch to collect them all into asm/insn.h
if nobody beats me to it :-)
> > +
> > +static void intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
> > +{
> > +#if 0
> > + /*
> > + * Borken, makes the machine expode at times trying to
> > + * derefence funny userspace addresses.
> > + *
> > + * Should we always fwd decode from @to, instead of trying
> > + * to rewind as implemented?
> > + */
> > +
> > + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> > + unsigned long from = cpuc->lbr_entries[0].from;
> > + unsigned long to = cpuc->lbr_entries[0].to;
>
> Ah, I see. For branch instruction case, we can use LBR to
> find previous IP...
Right, we use the LBR to find the basic block.
> > + unsigned long ip = regs->ip;
> > + u8 buf[2*MAX_INSN_SIZE];
> > + u8 *kaddr;
> > + int i;
> > +
> > + if (from && to) {
> > + /*
> > + * We sampled a branch insn, rewind using the LBR stack
> > + */
> > + if (ip == to) {
> > + regs->ip = from;
> > + return;
> > + }
> > + }
> > +
> > + if (user_mode(regs)) {
> > + int bytes = copy_from_user_nmi(buf,
> > + (void __user *)(ip - MAX_INSN_SIZE),
> > + 2*MAX_INSN_SIZE);
> > +
>
> maybe, you'd better check the source address range is within
> the user address range. e.g. ip < MAX_INSN_SIZE.
Not only that, I realized user_mode() checks regs->cs, which is not set
by the PEBS code, so I added some helpers.
> > +
> > + /*
> > + * Try to find the longest insn ending up at the given IP
> > + */
> > + for (i = MAX_INSN_SIZE; i > 0; i--) {
> > + struct insn insn;
> > +
> > + kernel_insn_init(&insn, kaddr + MAX_INSN_SIZE - i);
> > + insn_get_length(&insn);
> > + if (insn.length == i) {
> > + regs->ip -= i;
> > + return;
> > + }
> > + }
>
> Hmm, this will not work correctly on x86, since the decoder can
> miss-decode the tail bytes of previous instruction as prefix bytes. :(
>
> Thus, if you want to rewind instruction stream, you need to decode
> a function (or basic block) entirely.
Something like the below?
#ifdef CONFIG_X86_32
static bool kernel_ip(unsigned long ip)
{
return ip > TASK_SIZE;
}
#else
static bool kernel_ip(unsigned long ip)
{
return (long)ip < 0;
}
#endif
static int intel_pmu_pebs_fixup_ip(unsigned long *ipp)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
unsigned long from = cpuc->lbr_entries[0].from;
unsigned long old_to, to = cpuc->lbr_entries[0].to;
unsigned long ip = *ipp;
int i;
/*
* We don't need to fixup if the PEBS assist is fault like
*/
if (!x86_pmu.intel_perf_capabilities.pebs_trap)
return 0;
if (!cpuc->lbr_stack.nr || !from || !to)
return 0;
if (ip < to)
return 0;
/*
* We sampled a branch insn, rewind using the LBR stack
*/
if (ip == to) {
*ipp = from;
return 1;
}
do {
struct insn insn;
u8 buf[MAX_INSN_SIZE];
void *kaddr;
old_to = to;
if (!kernel_ip(ip)) {
int bytes = copy_from_user_nmi(buf, (void __user *)to,
MAX_INSN_SIZE);
if (bytes != MAX_INSN_SIZE)
return 0;
kaddr = buf;
} else kaddr = (void *)to;
kernel_insn_init(&insn, kaddr);
insn_get_length(&insn);
to += insn.length;
} while (to < ip);
if (to == ip) {
*ipp = old_to;
return 1;
}
return 0;
}
I thought about exposing the success of this fixup as a PERF_RECORD_MISC
bit.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC][PATCH 09/11] perf, x86: Implement PERF_SAMPLE_BRANCH_STACK
2010-03-03 16:39 ` [RFC][PATCH 09/11] perf, x86: Implement PERF_SAMPLE_BRANCH_STACK Peter Zijlstra
@ 2010-03-03 21:08 ` Frederic Weisbecker
0 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2010-03-03 21:08 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: mingo, linux-kernel, paulus, eranian, robert.richter
On Wed, Mar 03, 2010 at 05:39:45PM +0100, Peter Zijlstra wrote:
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> arch/x86/kernel/cpu/perf_event.c | 14 +++-------
> arch/x86/kernel/cpu/perf_event_intel.c | 10 ++++++-
> arch/x86/kernel/cpu/perf_event_intel_ds.c | 16 ++++--------
> arch/x86/kernel/cpu/perf_event_intel_lbr.c | 20 ++++++++-------
> include/linux/perf_event.h | 27 +++++++++++++++++---
> kernel/perf_event.c | 38 ++++++++++++++++++++++-------
> 6 files changed, 83 insertions(+), 42 deletions(-)
>
> Index: linux-2.6/include/linux/perf_event.h
> ===================================================================
> --- linux-2.6.orig/include/linux/perf_event.h
> +++ linux-2.6/include/linux/perf_event.h
> @@ -126,8 +126,9 @@ enum perf_event_sample_format {
> PERF_SAMPLE_STREAM_ID = 1U << 9,
> PERF_SAMPLE_RAW = 1U << 10,
> PERF_SAMPLE_REGS = 1U << 11,
> + PERF_SAMPLE_BRANCH_STACK = 1U << 12,
>
> - PERF_SAMPLE_MAX = 1U << 12, /* non-ABI */
> + PERF_SAMPLE_MAX = 1U << 13, /* non-ABI */
> };
>
> /*
> @@ -395,9 +396,14 @@ enum perf_event_type {
> * { struct read_format values; } && PERF_SAMPLE_READ
> * { struct pt_regs regs; } && PERF_SAMPLE_REGS
> *
> - * { u64 nr,
> + * { u64 nr;
> * u64 ips[nr]; } && PERF_SAMPLE_CALLCHAIN
> *
> + * { u64 nr;
> + * { u64 from, to, flags;
> + * } lbr[nr]; } && PERF_SAMPLE_BRANCH_STACK
> + *
> + *
> * #
> * # The RAW record below is opaque data wrt the ABI
> * #
> @@ -469,6 +475,17 @@ struct perf_raw_record {
> void *data;
> };
>
> +struct perf_branch_entry {
> + __u64 from;
> + __u64 to;
> + __u64 flags;
> +};
> +
> +struct perf_branch_stack {
> + __u64 nr;
> + struct perf_branch_entry entries[0];
> +};
> +
> struct task_struct;
>
> /**
> @@ -803,13 +820,15 @@ struct perf_sample_data {
> struct perf_callchain_entry *callchain;
> struct perf_raw_record *raw;
> struct pt_regs *regs;
> + struct perf_branch_stack *branches;
> };
>
> static inline
> void perf_sample_data_init(struct perf_sample_data *data, u64 addr)
> {
> - data->addr = addr;
> - data->raw = NULL;
> + data->addr = addr;
> + data->raw = NULL;
> + data->branches = NULL;
> }
>
> extern void perf_output_sample(struct perf_output_handle *handle,
> Index: linux-2.6/kernel/perf_event.c
> ===================================================================
> --- linux-2.6.orig/kernel/perf_event.c
> +++ linux-2.6/kernel/perf_event.c
> @@ -3189,12 +3189,9 @@ void perf_output_sample(struct perf_outp
>
> if (sample_type & PERF_SAMPLE_CALLCHAIN) {
> if (data->callchain) {
> - int size = 1;
> + int size = sizeof(u64);
>
> - if (data->callchain)
> - size += data->callchain->nr;
> -
> - size *= sizeof(u64);
> + size += data->callchain->nr * sizeof(u64);
>
> perf_output_copy(handle, data->callchain, size);
> } else {
> @@ -3203,6 +3200,20 @@ void perf_output_sample(struct perf_outp
> }
> }
>
> + if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
> + if (data->branches) {
> + int size = sizeof(u64);
> +
> + size += data->branches->nr *
> + sizeof(struct perf_branch_entry);
> +
> + perf_output_copy(handle, data->branches, size);
> + } else {
> + u64 nr = 0;
> + perf_output_put(handle, nr);
> + }
> + }
> +
> if (sample_type & PERF_SAMPLE_RAW) {
> if (data->raw) {
> perf_output_put(handle, data->raw->size);
> @@ -3291,14 +3302,25 @@ void perf_prepare_sample(struct perf_eve
> }
>
> if (sample_type & PERF_SAMPLE_CALLCHAIN) {
> - int size = 1;
> + int size = sizeof(u64);
>
> data->callchain = perf_callchain(regs);
>
> if (data->callchain)
> - size += data->callchain->nr;
> + size += data->callchain->nr * sizeof(u64);
> +
> + header->size += size;
> + }
>
> - header->size += size * sizeof(u64);
> + if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
> + int size = sizeof(u64);
> +
> + if (data->branches) {
> + size += data->branches->nr *
> + sizeof(struct perf_branch_entry);
> + }
> +
> + header->size += size;
> }
That looks good to me (at least the generic part; I don't know the x86
part well enough to tell).
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC][PATCH 10/11] perf, x86: use LBR for PEBS IP+1 fixup
2010-03-03 19:37 ` Peter Zijlstra
@ 2010-03-03 21:11 ` Masami Hiramatsu
2010-03-03 21:50 ` Stephane Eranian
0 siblings, 1 reply; 44+ messages in thread
From: Masami Hiramatsu @ 2010-03-03 21:11 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, paulus, eranian, robert.richter, fweisbec
Peter Zijlstra wrote:
> On Wed, 2010-03-03 at 13:05 -0500, Masami Hiramatsu wrote:
>> Peter Zijlstra wrote:
>>> PEBS always reports the IP+1, that is the instruction after the one
>>> that got sampled, cure this by using the LBR to reliably rewind the
>>> instruction stream.
>>
>> Hmm, does PEBS always report one byte after the end address of the
>> sampled instruction? Or the instruction which will be executed next
>> step?
>
> The next instruction; it's trap-like.
>
>> [...]
>>> +#include <asm/insn.h>
>>> +
>>> +#define MAX_INSN_SIZE 16
>>
>> Hmm, we'd better integrate these kinds of definitions into
>> asm/insn.h... (several features define it)
>
> Agreed, I'll look at doing a patch to collect them all into asm/insn.h
> if nobody beats me to it :-)
At least kprobes doesn't :)
>>> +
>>> +static void intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
>>> +{
>>> +#if 0
>>> + /*
>>> + * Borken, makes the machine expode at times trying to
>>> + * derefence funny userspace addresses.
>>> + *
>>> + * Should we always fwd decode from @to, instead of trying
>>> + * to rewind as implemented?
>>> + */
>>> +
>>> + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
>>> + unsigned long from = cpuc->lbr_entries[0].from;
>>> + unsigned long to = cpuc->lbr_entries[0].to;
>>
>> Ah, I see. For branch instruction case, we can use LBR to
>> find previous IP...
>
> Right, we use the LBR to find the basic block.
Hm, that's a good idea :)
>>> + unsigned long ip = regs->ip;
>>> + u8 buf[2*MAX_INSN_SIZE];
>>> + u8 *kaddr;
>>> + int i;
>>> +
>>> + if (from && to) {
>>> + /*
>>> + * We sampled a branch insn, rewind using the LBR stack
>>> + */
>>> + if (ip == to) {
>>> + regs->ip = from;
>>> + return;
>>> + }
>>> + }
>>> +
>>> + if (user_mode(regs)) {
>>> + int bytes = copy_from_user_nmi(buf,
>>> + (void __user *)(ip - MAX_INSN_SIZE),
>>> + 2*MAX_INSN_SIZE);
>>> +
>>
>> maybe, you'd better check the source address range is within
>> the user address range. e.g. ip < MAX_INSN_SIZE.
>
> Not only that, I realized user_mode() checks regs->cs, which is not set
> by the PEBS code, so I added some helpers.
>
>>> +
>>> + /*
>>> + * Try to find the longest insn ending up at the given IP
>>> + */
>>> + for (i = MAX_INSN_SIZE; i > 0; i--) {
>>> + struct insn insn;
>>> +
>>> + kernel_insn_init(&insn, kaddr + MAX_INSN_SIZE - i);
>>> + insn_get_length(&insn);
>>> + if (insn.length == i) {
>>> + regs->ip -= i;
>>> + return;
>>> + }
>>> + }
>>
>> Hmm, this will not work correctly on x86, since the decoder can
>> miss-decode the tail bytes of previous instruction as prefix bytes. :(
>>
>> Thus, if you want to rewind instruction stream, you need to decode
>> a function (or basic block) entirely.
>
> Something like the below?
Great! it looks good to me.
Yeah, LBR.to should always be smaller than the current ip (if no one
disabled the LBR).
Thank you,
>
> #ifdef CONFIG_X86_32
> static bool kernel_ip(unsigned long ip)
> {
> return ip > TASK_SIZE;
> }
> #else
> static bool kernel_ip(unsigned long ip)
> {
> return (long)ip < 0;
> }
> #endif
>
> static int intel_pmu_pebs_fixup_ip(unsigned long *ipp)
> {
> struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> unsigned long from = cpuc->lbr_entries[0].from;
> unsigned long old_to, to = cpuc->lbr_entries[0].to;
> unsigned long ip = *ipp;
> int i;
>
> /*
> * We don't need to fixup if the PEBS assist is fault like
> */
> if (!x86_pmu.intel_perf_capabilities.pebs_trap)
> return 0;
>
> if (!cpuc->lbr_stack.nr || !from || !to)
> return 0;
>
> if (ip < to)
> return 0;
>
> /*
> * We sampled a branch insn, rewind using the LBR stack
> */
> if (ip == to) {
> *ipp = from;
> return 1;
> }
>
> do {
> struct insn insn;
> u8 buf[MAX_INSN_SIZE];
> void *kaddr;
>
> old_to = to;
> if (!kernel_ip(ip)) {
> int bytes = copy_from_user_nmi(buf, (void __user *)to,
> MAX_INSN_SIZE);
>
> if (bytes != MAX_INSN_SIZE)
> return 0;
>
> kaddr = buf;
> } else kaddr = (void *)to;
>
> kernel_insn_init(&insn, kaddr);
> insn_get_length(&insn);
> to += insn.length;
> } while (to < ip);
>
> if (to == ip) {
> *ipp = old_to;
> return 1;
> }
>
> return 0;
> }
>
> I thought about exposing the success of this fixup as a PERF_RECORD_MISC
> bit.
>
--
Masami Hiramatsu
e-mail: mhiramat@redhat.com
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC][PATCH 05/11] perf: Generic perf_sample_data initialization
2010-03-03 16:39 ` [RFC][PATCH 05/11] perf: Generic perf_sample_data initialization Peter Zijlstra
2010-03-03 16:49 ` David Miller
@ 2010-03-03 21:14 ` Frederic Weisbecker
2010-03-05 8:44 ` Jean Pihet
2 siblings, 0 replies; 44+ messages in thread
From: Frederic Weisbecker @ 2010-03-03 21:14 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, paulus, eranian, robert.richter, Jamie Iles,
Jean Pihet, David S. Miller, stable
On Wed, Mar 03, 2010 at 05:39:41PM +0100, Peter Zijlstra wrote:
> This makes it easier to extend perf_sample_data and fixes a bug on
> arm and sparc, which failed to set ->raw to NULL, which can cause
> crashes when combined with PERF_SAMPLE_RAW.
>
> It also optimizes PowerPC and tracepoint, because the struct
> initialization is forced to zero out the whole structure.
>
> CC: Jamie Iles <jamie.iles@picochip.com>
> CC: Jean Pihet <jpihet@mvista.com>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Ingo Molnar <mingo@elte.hu>
> CC: David S. Miller <davem@davemloft.net>
> CC: Stephane Eranian <eranian@google.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC][PATCH 10/11] perf, x86: use LBR for PEBS IP+1 fixup
2010-03-03 21:11 ` Masami Hiramatsu
@ 2010-03-03 21:50 ` Stephane Eranian
2010-03-04 8:57 ` Peter Zijlstra
0 siblings, 1 reply; 44+ messages in thread
From: Stephane Eranian @ 2010-03-03 21:50 UTC (permalink / raw)
To: Masami Hiramatsu
Cc: Peter Zijlstra, mingo, linux-kernel, paulus, robert.richter, fweisbec
I think systematically and transparently using the LBR to correct the
PEBS off-by-one problem is not such a good idea: you've basically
hijacked the LBR, and the user cannot use it in any other way.
There are PEBS+LBR measurements where you care about extracting the LBR data.
There are PEBS measurements where you don't care about getting the correct IP.
I don't necessarily want to pay the price, especially when this could be
done offline in the tool.
On Wed, Mar 3, 2010 at 10:11 PM, Masami Hiramatsu <mhiramat@redhat.com> wrote:
> Peter Zijlstra wrote:
>> On Wed, 2010-03-03 at 13:05 -0500, Masami Hiramatsu wrote:
>>> Peter Zijlstra wrote:
>>>> PEBS always reports the IP+1, that is the instruction after the one
>>>> that got sampled, cure this by using the LBR to reliably rewind the
>>>> instruction stream.
>>>
>>> Hmm, does PEBS always report one byte after the end address of the
>>> sampled instruction? Or the instruction which will be executed next
>>> step?
>>
>> The next instruction; it's trap-like.
>>
>>> [...]
>>>> +#include <asm/insn.h>
>>>> +
>>>> +#define MAX_INSN_SIZE 16
>>>
>>> Hmm, we'd better integrate these kinds of definitions into
>>> asm/insn.h... (several features define it)
>>
>> Agreed, I'll look at doing a patch to collect them all into asm/insn.h
>> if nobody beats me to it :-)
>
> At least kprobes doesn't :)
>
>>>> +
>>>> +static void intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
>>>> +{
>>>> +#if 0
>>>> + /*
>>>> + * Borken, makes the machine expode at times trying to
>>>> + * derefence funny userspace addresses.
>>>> + *
>>>> + * Should we always fwd decode from @to, instead of trying
>>>> + * to rewind as implemented?
>>>> + */
>>>> +
>>>> + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
>>>> + unsigned long from = cpuc->lbr_entries[0].from;
>>>> + unsigned long to = cpuc->lbr_entries[0].to;
>>>
>>> Ah, I see. For branch instruction case, we can use LBR to
>>> find previous IP...
>>
>> Right, we use the LBR to find the basic block.
>
> Hm, that's a good idea :)
>
>>>> + unsigned long ip = regs->ip;
>>>> + u8 buf[2*MAX_INSN_SIZE];
>>>> + u8 *kaddr;
>>>> + int i;
>>>> +
>>>> + if (from && to) {
>>>> + /*
>>>> + * We sampled a branch insn, rewind using the LBR stack
>>>> + */
>>>> + if (ip == to) {
>>>> + regs->ip = from;
>>>> + return;
>>>> + }
>>>> + }
>>>> +
>>>> + if (user_mode(regs)) {
>>>> + int bytes = copy_from_user_nmi(buf,
>>>> + (void __user *)(ip - MAX_INSN_SIZE),
>>>> + 2*MAX_INSN_SIZE);
>>>> +
>>>
>>> maybe, you'd better check the source address range is within
>>> the user address range. e.g. ip < MAX_INSN_SIZE.
>>
>> Not only that, I realized user_mode() checks regs->cs, which is not set
>> by the PEBS code, so I added some helpers.
>>
>>>> +
>>>> + /*
>>>> + * Try to find the longest insn ending up at the given IP
>>>> + */
>>>> + for (i = MAX_INSN_SIZE; i > 0; i--) {
>>>> + struct insn insn;
>>>> +
>>>> + kernel_insn_init(&insn, kaddr + MAX_INSN_SIZE - i);
>>>> + insn_get_length(&insn);
>>>> + if (insn.length == i) {
>>>> + regs->ip -= i;
>>>> + return;
>>>> + }
>>>> + }
>>>
>>> Hmm, this will not work correctly on x86, since the decoder can
>>> miss-decode the tail bytes of previous instruction as prefix bytes. :(
>>>
>>> Thus, if you want to rewind instruction stream, you need to decode
>>> a function (or basic block) entirely.
>>
>> Something like the below?
>
> Great! it looks good to me.
> Yeah, LBR.to may always be smaller than the current ip (if no one disabled LBR).
>
> Thank you,
>
>>
>> #ifdef CONFIG_X86_32
>> static bool kernel_ip(unsigned long ip)
>> {
>> return ip > TASK_SIZE;
>> }
>> #else
>> static bool kernel_ip(unsigned long ip)
>> {
>> return (long)ip < 0;
>> }
>> #endif
>>
>> static int intel_pmu_pebs_fixup_ip(unsigned long *ipp)
>> {
>> struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
>> unsigned long from = cpuc->lbr_entries[0].from;
>> unsigned long old_to, to = cpuc->lbr_entries[0].to;
>> unsigned long ip = *ipp;
>> int i;
>>
>> /*
>> * We don't need to fixup if the PEBS assist is fault like
>> */
>> if (!x86_pmu.intel_perf_capabilities.pebs_trap)
>> return 0;
>>
>> if (!cpuc->lbr_stack.nr || !from || !to)
>> return 0;
>>
>> if (ip < to)
>> return 0;
>>
>> /*
>> * We sampled a branch insn, rewind using the LBR stack
>> */
>> if (ip == to) {
>> *ipp = from;
>> return 1;
>> }
>>
>> do {
>> struct insn insn;
>> u8 buf[MAX_INSN_SIZE];
>> void *kaddr;
>>
>> old_to = to;
>> if (!kernel_ip(ip)) {
>> int bytes = copy_from_user_nmi(buf, (void __user *)to,
>> MAX_INSN_SIZE);
>>
>> if (bytes != MAX_INSN_SIZE)
>> return 0;
>>
>> kaddr = buf;
>> } else kaddr = (void *)to;
>>
>> kernel_insn_init(&insn, kaddr);
>> insn_get_length(&insn);
>> to += insn.length;
>> } while (to < ip);
>>
>> if (to == ip) {
>> *ipp = old_to;
>> return 1;
>> }
>>
>> return 0;
>> }
>>
>> I thought about exposing the success of this fixup as a PERF_RECORD_MISC
>> bit.
>>
>
> --
> Masami Hiramatsu
> e-mail: mhiramat@redhat.com
>
--
Stephane Eranian | EMEA Software Engineering
Google France | 38 avenue de l'Opéra | 75002 Paris
Tel : +33 (0) 1 42 68 53 00
This email may be confidential or privileged. If you received this
communication by mistake, please don't forward it to anyone else, please
erase all copies and attachments, and please let me know that it went to
the wrong person. Thanks
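The forward-decode scheme quoted above can be modeled in miniature (a
hypothetical Python sketch; `insn_len_at` stands in for the real x86 decoder,
and the instruction lengths are faked):

```python
# Model of intel_pmu_pebs_fixup_ip(): walk forward from a known-good
# anchor (the LBR branch target 'to'), decoding one instruction at a
# time, until we land exactly on the PEBS-reported ip.  The previous
# decode position is then the start of the sampled instruction.
# The ip == to case (PEBS sampled the branch itself) is handled
# separately in the kernel by rewinding to the LBR 'from' address.
def fixup_ip(insn_len_at, to, ip):
    """insn_len_at(addr) -> length of the insn starting at addr."""
    old_to = to
    while to < ip:
        old_to = to
        to += insn_len_at(to)
    # Overshooting ip means the decode went wrong; report failure.
    return old_to if to == ip else None

# Toy instruction stream: insn start address -> insn length.
lengths = {0x100: 3, 0x103: 1, 0x104: 5, 0x109: 2}
assert fixup_ip(lambda a: lengths[a], 0x100, 0x109) == 0x104
assert fixup_ip(lambda a: lengths[a], 0x100, 0x108) is None
```

This also illustrates why the rewinding `#if 0` variant is unreliable: x86
decoding is only self-synchronizing going forward, so tail bytes of a previous
instruction can masquerade as prefixes when scanned backwards, exactly as
Masami points out.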
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC][PATCH 08/11] perf, x86: Implement simple LBR support
2010-03-03 16:39 ` [RFC][PATCH 08/11] perf, x86: Implement simple LBR support Peter Zijlstra
@ 2010-03-03 21:52 ` Stephane Eranian
2010-03-04 8:58 ` Peter Zijlstra
2010-03-03 21:57 ` Stephane Eranian
1 sibling, 1 reply; 44+ messages in thread
From: Stephane Eranian @ 2010-03-03 21:52 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: mingo, linux-kernel, paulus, robert.richter, fweisbec
On Wed, Mar 3, 2010 at 5:39 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> Implement support for Intel LBR stacks that support
> FREEZE_LBRS_ON_PMI. We do not (yet?) support the LBR config register
> because that is SMT wide and would also put undue restraints on the
> PEBS users.
>
You're saying PEBS users take priority over pure LBR users?
Why is that?
Without coding this, how would you expose LBR configuration to userland
given you're using the PERF_SAMPLE_BRANCH_STACK approach?
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> arch/x86/kernel/cpu/perf_event.c | 22 ++
> arch/x86/kernel/cpu/perf_event_intel.c | 13 +
> arch/x86/kernel/cpu/perf_event_intel_lbr.c | 228 +++++++++++++++++++++++++++++
> 3 files changed, 263 insertions(+)
>
> Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
> +++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
> @@ -48,6 +48,12 @@ struct amd_nb {
> struct event_constraint event_constraints[X86_PMC_IDX_MAX];
> };
>
> +#define MAX_LBR_ENTRIES 16
> +
> +struct lbr_entry {
> + u64 from, to, flags;
> +};
> +
> struct cpu_hw_events {
> /*
> * Generic x86 PMC bits
> @@ -70,6 +76,14 @@ struct cpu_hw_events {
> u64 pebs_enabled;
>
> /*
> + * Intel LBR bits
> + */
> + int lbr_users;
> + int lbr_entries;
> + struct lbr_entry lbr_stack[MAX_LBR_ENTRIES];
> + void *lbr_context;
> +
> + /*
> * AMD specific bits
> */
> struct amd_nb *amd_nb;
> @@ -154,6 +168,13 @@ struct x86_pmu {
> int pebs_record_size;
> void (*drain_pebs)(void);
> struct event_constraint *pebs_constraints;
> +
> + /*
> + * Intel LBR
> + */
> + unsigned long lbr_tos, lbr_from, lbr_to; /* MSR base regs */
> + int lbr_nr; /* hardware stack size */
> + int lbr_format; /* hardware format */
> };
>
> static struct x86_pmu x86_pmu __read_mostly;
> @@ -1238,6 +1259,7 @@ undo:
>
> #include "perf_event_amd.c"
> #include "perf_event_p6.c"
> +#include "perf_event_intel_lbr.c"
> #include "perf_event_intel_ds.c"
> #include "perf_event_intel.c"
>
> Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel.c
> +++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
> @@ -480,6 +480,7 @@ static void intel_pmu_disable_all(void)
> intel_pmu_disable_bts();
>
> intel_pmu_pebs_disable_all();
> + intel_pmu_lbr_disable_all();
> }
>
> static void intel_pmu_enable_all(void)
> @@ -499,6 +500,7 @@ static void intel_pmu_enable_all(void)
> }
>
> intel_pmu_pebs_enable_all();
> + intel_pmu_lbr_enable_all();
> }
>
> static inline u64 intel_pmu_get_status(void)
> @@ -675,6 +677,8 @@ again:
> inc_irq_stat(apic_perf_irqs);
> ack = status;
>
> + intel_pmu_lbr_read();
> +
> /*
> * PEBS overflow sets bit 62 in the global status register
> */
> @@ -847,6 +851,8 @@ static __init int intel_pmu_init(void)
> memcpy(hw_cache_event_ids, core2_hw_cache_event_ids,
> sizeof(hw_cache_event_ids));
>
> + intel_pmu_lbr_init_core();
> +
> x86_pmu.event_constraints = intel_core2_event_constraints;
> pr_cont("Core2 events, ");
> break;
> @@ -856,13 +862,18 @@ static __init int intel_pmu_init(void)
> memcpy(hw_cache_event_ids, nehalem_hw_cache_event_ids,
> sizeof(hw_cache_event_ids));
>
> + intel_pmu_lbr_init_nhm();
> +
> x86_pmu.event_constraints = intel_nehalem_event_constraints;
> pr_cont("Nehalem/Corei7 events, ");
> break;
> +
> case 28: /* Atom */
> memcpy(hw_cache_event_ids, atom_hw_cache_event_ids,
> sizeof(hw_cache_event_ids));
>
> + intel_pmu_lbr_init_atom();
> +
> x86_pmu.event_constraints = intel_gen_event_constraints;
> pr_cont("Atom events, ");
> break;
> @@ -872,6 +883,8 @@ static __init int intel_pmu_init(void)
> memcpy(hw_cache_event_ids, westmere_hw_cache_event_ids,
> sizeof(hw_cache_event_ids));
>
> + intel_pmu_lbr_init_nhm();
> +
> x86_pmu.event_constraints = intel_westmere_event_constraints;
> pr_cont("Westmere events, ");
> break;
> Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel_lbr.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel_lbr.c
> @@ -0,0 +1,228 @@
> +#ifdef CONFIG_CPU_SUP_INTEL
> +
> +enum {
> + LBR_FORMAT_32 = 0x00,
> + LBR_FORMAT_LIP = 0x01,
> + LBR_FORMAT_EIP = 0x02,
> + LBR_FORMAT_EIP_FLAGS = 0x03,
> +};
> +
> +/*
> + * We only support LBR implementations that have FREEZE_LBRS_ON_PMI
> + * otherwise it becomes near impossible to get a reliable stack.
> + */
> +
> +#define X86_DEBUGCTL_LBR (1 << 0)
> +#define X86_DEBUGCTL_FREEZE_LBRS_ON_PMI (1 << 11)
> +
> +static void __intel_pmu_lbr_enable(void)
> +{
> + u64 debugctl;
> +
> + rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
> + debugctl |= (X86_DEBUGCTL_LBR | X86_DEBUGCTL_FREEZE_LBRS_ON_PMI);
> + wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
> +}
> +
> +static void __intel_pmu_lbr_disable(void)
> +{
> + u64 debugctl;
> +
> + rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
> + debugctl &= ~(X86_DEBUGCTL_LBR | X86_DEBUGCTL_FREEZE_LBRS_ON_PMI);
> + wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
> +}
> +
> +static void intel_pmu_lbr_reset_32(void)
> +{
> + int i;
> +
> + for (i = 0; i < x86_pmu.lbr_nr; i++)
> + wrmsrl(x86_pmu.lbr_from + i, 0);
> +}
> +
> +static void intel_pmu_lbr_reset_64(void)
> +{
> + int i;
> +
> + for (i = 0; i < x86_pmu.lbr_nr; i++) {
> + wrmsrl(x86_pmu.lbr_from + i, 0);
> + wrmsrl(x86_pmu.lbr_to + i, 0);
> + }
> +}
> +
> +static void intel_pmu_lbr_reset(void)
> +{
> + if (x86_pmu.lbr_format == LBR_FORMAT_32)
> + intel_pmu_lbr_reset_32();
> + else
> + intel_pmu_lbr_reset_64();
> +}
> +
> +static void intel_pmu_lbr_enable(struct perf_event *event)
> +{
> + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> +
> + if (!x86_pmu.lbr_nr)
> + return;
> +
> + WARN_ON(cpuc->enabled);
> +
> + /*
> + * Reset the LBR stack if this is the first LBR user or
> + * we changed task context so as to avoid data leaks.
> + */
> +
> + if (!cpuc->lbr_users ||
> + (event->ctx->task && cpuc->lbr_context != event->ctx)) {
> + intel_pmu_lbr_reset();
> + cpuc->lbr_context = event->ctx;
> + }
> +
> + cpuc->lbr_users++;
> +}
> +
> +static void intel_pmu_lbr_disable(struct perf_event *event)
> +{
> + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> +
> + if (!x86_pmu.lbr_nr)
> + return;
> +
> + cpuc->lbr_users--;
> +
> + BUG_ON(cpuc->lbr_users < 0);
> + WARN_ON(cpuc->enabled);
> +}
> +
> +static void intel_pmu_lbr_enable_all(void)
> +{
> + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> +
> + if (cpuc->lbr_users)
> + __intel_pmu_lbr_enable();
> +}
> +
> +static void intel_pmu_lbr_disable_all(void)
> +{
> + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> +
> + if (cpuc->lbr_users)
> + __intel_pmu_lbr_disable();
> +}
> +
> +static inline u64 intel_pmu_lbr_tos(void)
> +{
> + u64 tos;
> +
> + rdmsrl(x86_pmu.lbr_tos, tos);
> +
> + return tos;
> +}
> +
> +static void intel_pmu_lbr_read_32(struct cpu_hw_events *cpuc)
> +{
> + unsigned long mask = x86_pmu.lbr_nr - 1;
> + u64 tos = intel_pmu_lbr_tos();
> + int i;
> +
> + for (i = 0; i < x86_pmu.lbr_nr; i++, tos--) {
> + unsigned long lbr_idx = (tos - i) & mask;
> + union {
> + struct {
> + u32 from;
> + u32 to;
> + };
> + u64 lbr;
> + } msr_lastbranch;
> +
> + rdmsrl(x86_pmu.lbr_from + lbr_idx, msr_lastbranch.lbr);
> +
> + cpuc->lbr_stack[i].from = msr_lastbranch.from;
> + cpuc->lbr_stack[i].to = msr_lastbranch.to;
> + cpuc->lbr_stack[i].flags = 0;
> + }
> + cpuc->lbr_entries = i;
> +}
> +
> +#define LBR_FROM_FLAG_MISPRED (1ULL << 63)
> +
> +/*
> + * Due to lack of segmentation in Linux the effective address (offset)
> + * is the same as the linear address, allowing us to merge the LIP and EIP
> + * LBR formats.
> + */
> +static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
> +{
> + unsigned long mask = x86_pmu.lbr_nr - 1;
> + u64 tos = intel_pmu_lbr_tos();
> + int i;
> +
> + for (i = 0; i < x86_pmu.lbr_nr; i++, tos--) {
> + unsigned long lbr_idx = (tos - i) & mask;
> + u64 from, to, flags = 0;
> +
> + rdmsrl(x86_pmu.lbr_from + lbr_idx, from);
> + rdmsrl(x86_pmu.lbr_to + lbr_idx, to);
> +
> + if (x86_pmu.lbr_format == LBR_FORMAT_EIP_FLAGS) {
> + flags = !!(from & LBR_FROM_FLAG_MISPRED);
> + from = (u64)((((s64)from) << 1) >> 1);
> + }
> +
> + cpuc->lbr_stack[i].from = from;
> + cpuc->lbr_stack[i].to = to;
> + cpuc->lbr_stack[i].flags = flags;
> + }
> + cpuc->lbr_entries = i;
> +}
> +
> +static void intel_pmu_lbr_read(void)
> +{
> + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> +
> + if (!cpuc->lbr_users)
> + return;
> +
> + if (x86_pmu.lbr_format == LBR_FORMAT_32)
> + intel_pmu_lbr_read_32(cpuc);
> + else
> + intel_pmu_lbr_read_64(cpuc);
> +}
> +
> +static int intel_pmu_lbr_format(void)
> +{
> + u64 capabilities;
> +
> + rdmsrl(MSR_IA32_PERF_CAPABILITIES, capabilities);
> + return capabilities & 0x1f;
> +}
> +
> +static void intel_pmu_lbr_init_core(void)
> +{
> + x86_pmu.lbr_format = intel_pmu_lbr_format();
> + x86_pmu.lbr_nr = 4;
> + x86_pmu.lbr_tos = 0x01c9;
> + x86_pmu.lbr_from = 0x40;
> + x86_pmu.lbr_to = 0x60;
> +}
> +
> +static void intel_pmu_lbr_init_nhm(void)
> +{
> + x86_pmu.lbr_format = intel_pmu_lbr_format();
> + x86_pmu.lbr_nr = 16;
> + x86_pmu.lbr_tos = 0x01c9;
> + x86_pmu.lbr_from = 0x680;
> + x86_pmu.lbr_to = 0x6c0;
> +}
> +
> +static void intel_pmu_lbr_init_atom(void)
> +{
> + x86_pmu.lbr_format = intel_pmu_lbr_format();
> + x86_pmu.lbr_nr = 8;
> + x86_pmu.lbr_tos = 0x01c9;
> + x86_pmu.lbr_from = 0x40;
> + x86_pmu.lbr_to = 0x60;
> +}
> +
> +#endif /* CONFIG_CPU_SUP_INTEL */
>
> --
>
>
--
Stephane Eranian | EMEA Software Engineering
Google France | 38 avenue de l'Opéra | 75002 Paris
Tel : +33 (0) 1 42 68 53 00
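The TOS-relative walk and the EIP_FLAGS bit-63 handling in the patch can be
modeled like this (a hypothetical Python sketch; `split_from` uses plain
masking, whereas the kernel's arithmetic-shift pair also sign-extends bit 62
to restore canonical kernel addresses):

```python
LBR_FROM_FLAG_MISPRED = 1 << 63

def lbr_walk(nr, tos):
    """Ring-buffer slots newest-to-oldest for an nr-deep LBR stack
    (nr a power of two) whose top-of-stack index is tos, mirroring
    the '(tos - i) & mask' indexing in intel_pmu_lbr_read_*()."""
    mask = nr - 1
    return [(tos - i) & mask for i in range(nr)]

def split_from(from_val):
    """EIP_FLAGS format: bit 63 of the LBR 'from' MSR carries the
    mispredict flag.  Strip it and return (mispred, address); this
    masking covers the common positive-address case only."""
    mispred = 1 if from_val & LBR_FROM_FLAG_MISPRED else 0
    return mispred, from_val & ~LBR_FROM_FLAG_MISPRED

# 4-deep stack, TOS at slot 1: newest is slot 1, then 0, wrap to 3, 2.
assert lbr_walk(4, 1) == [1, 0, 3, 2]
assert split_from((1 << 63) | 0x400123) == (1, 0x400123)
assert split_from(0x400123) == (0, 0x400123)
```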
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC][PATCH 08/11] perf, x86: Implement simple LBR support
2010-03-03 16:39 ` [RFC][PATCH 08/11] perf, x86: Implement simple LBR support Peter Zijlstra
2010-03-03 21:52 ` Stephane Eranian
@ 2010-03-03 21:57 ` Stephane Eranian
2010-03-04 8:58 ` Peter Zijlstra
1 sibling, 1 reply; 44+ messages in thread
From: Stephane Eranian @ 2010-03-03 21:57 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: mingo, linux-kernel, paulus, robert.richter, fweisbec
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 12897 bytes --]
I don't understand how LBR state is migrated when a per-thread event is moved
from one CPU to another. It seems LBR is managed per-cpu.
Can you explain this to me?
On Wed, Mar 3, 2010 at 5:39 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> Implement support for Intel LBR stacks that support
> FREEZE_LBRS_ON_PMI. We do not (yet?) support the LBR config register
> because that is SMT wide and would also put undue restraints on the
> PEBS users.
>
> [quoted patch trimmed; identical to the copy in the sibling reply above]
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC][PATCH 07/11] perf: Provide PERF_SAMPLE_REGS
2010-03-03 16:39 ` [RFC][PATCH 07/11] perf: Provide PERF_SAMPLE_REGS Peter Zijlstra
2010-03-03 17:30 ` Stephane Eranian
@ 2010-03-03 22:02 ` Frederic Weisbecker
2010-03-04 8:58 ` Peter Zijlstra
1 sibling, 1 reply; 44+ messages in thread
From: Frederic Weisbecker @ 2010-03-03 22:02 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: mingo, linux-kernel, paulus, eranian, robert.richter
On Wed, Mar 03, 2010 at 05:39:43PM +0100, Peter Zijlstra wrote:
> Simply copy out the provided pt_regs in a u64 aligned fashion.
>
> XXX: do task_pt_regs() and get_irq_regs() always clear everything or
> are we now leaking data?
It looks like there is a leak in the case of non-traced syscalls,
where we don't appear to save r12-r15.
Then task_pt_regs() may leak the top of a process stack...?
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> include/linux/perf_event.h | 5 ++++-
> kernel/perf_event.c | 17 +++++++++++++++++
> 2 files changed, 21 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/include/linux/perf_event.h
> ===================================================================
> --- linux-2.6.orig/include/linux/perf_event.h
> +++ linux-2.6/include/linux/perf_event.h
> @@ -125,8 +125,9 @@ enum perf_event_sample_format {
> PERF_SAMPLE_PERIOD = 1U << 8,
> PERF_SAMPLE_STREAM_ID = 1U << 9,
> PERF_SAMPLE_RAW = 1U << 10,
> + PERF_SAMPLE_REGS = 1U << 11,
>
> - PERF_SAMPLE_MAX = 1U << 11, /* non-ABI */
> + PERF_SAMPLE_MAX = 1U << 12, /* non-ABI */
> };
>
> /*
> @@ -392,6 +393,7 @@ enum perf_event_type {
> * { u64 period; } && PERF_SAMPLE_PERIOD
> *
> * { struct read_format values; } && PERF_SAMPLE_READ
> + * { struct pt_regs regs; } && PERF_SAMPLE_REGS
> *
> * { u64 nr,
> * u64 ips[nr]; } && PERF_SAMPLE_CALLCHAIN
> @@ -800,6 +802,7 @@ struct perf_sample_data {
> u64 period;
> struct perf_callchain_entry *callchain;
> struct perf_raw_record *raw;
> + struct pt_regs *regs;
> };
>
> static inline
> Index: linux-2.6/kernel/perf_event.c
> ===================================================================
> --- linux-2.6.orig/kernel/perf_event.c
> +++ linux-2.6/kernel/perf_event.c
> @@ -3176,6 +3176,17 @@ void perf_output_sample(struct perf_outp
> if (sample_type & PERF_SAMPLE_READ)
> perf_output_read(handle, event);
>
> + if (sample_type & PERF_SAMPLE_REGS) {
> + int size = DIV_ROUND_UP(sizeof(struct pt_regs), sizeof(u64)) -
> + sizeof(struct pt_regs);
> +
> + perf_output_put(handle, *data->regs);
> + if (size) {
> + u64 zero = 0;
> + perf_output_copy(handle, &zero, size);
> + }
> + }
> +
> if (sample_type & PERF_SAMPLE_CALLCHAIN) {
> if (data->callchain) {
> int size = 1;
> @@ -3273,6 +3284,12 @@ void perf_prepare_sample(struct perf_eve
> if (sample_type & PERF_SAMPLE_READ)
> header->size += perf_event_read_size(event);
>
> + if (sample_type & PERF_SAMPLE_REGS) {
> + data->regs = regs;
> + header->size += DIV_ROUND_UP(sizeof(struct pt_regs),
> + sizeof(u64));
> + }
> +
> if (sample_type & PERF_SAMPLE_CALLCHAIN) {
> int size = 1;
>
>
> --
>
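As an aside, the size arithmetic in the quoted hunks looks unit-confused:
perf_output_sample() subtracts a byte count from a count of u64s, and
perf_prepare_sample() adds a u64 count to a byte-sized header. The presumable
intent, rounding the record up to a u64 boundary and zero-padding the tail,
can be sketched as (hypothetical Python):

```python
def div_round_up(n, d):
    """Ceiling division, as the kernel's DIV_ROUND_UP() macro."""
    return (n + d - 1) // d

def u64_padded_size(nbytes):
    """Bytes a struct of nbytes occupies in the ring buffer once
    padded to u64 alignment, plus the trailing zero bytes needed."""
    total = div_round_up(nbytes, 8) * 8
    return total, total - nbytes

assert u64_padded_size(168) == (168, 0)  # already a multiple of 8
assert u64_padded_size(171) == (176, 5)  # 5 zero bytes of padding
```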
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC][PATCH 07/11] perf: Provide PERF_SAMPLE_REGS
2010-03-03 17:55 ` David Miller
2010-03-03 18:18 ` Stephane Eranian
2010-03-03 19:18 ` Peter Zijlstra
@ 2010-03-04 2:59 ` Ingo Molnar
2010-03-04 12:58 ` Arnaldo Carvalho de Melo
2 siblings, 1 reply; 44+ messages in thread
From: Ingo Molnar @ 2010-03-04 2:59 UTC (permalink / raw)
To: David Miller, Arnaldo Carvalho de Melo
Cc: eranian, peterz, linux-kernel, paulus, robert.richter, fweisbec
* David Miller <davem@davemloft.net> wrote:
> And more generally aren't we supposed to be able to eventually analyze perf
> dumps on any platform not just the one 'perf' was built under?
A sidenote: in this cycle Arnaldo improved this aspect of perf (and those
changes are now upstream). In theory you should be able to do a 'perf record'
+ 'perf archive' on your Sparc box and then analyze it via 'perf report' on an
x86 box - and vice versa.
( Note, it was not tested in that specific combination - another combination
was tested by Arnaldo: 32-bit PA-RISC profile interpreted on 64-bit x86. )
So yes, i agree that at minimum perf should be able to tell apart the nature
of any recording and flag combinations it cannot handle (yet).
Btw, i think the most popular use of PEBS is its precise nature, not the
register dumping aspect per se. If the kernel can provide that transparently
then that's a usecase that does not need a register dump (in user-space that
is). It's borderline doable on x86 ...
Thanks,
Ingo
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC][PATCH 06/11] perf, x86: PEBS infrastructure
2010-03-03 17:42 ` Peter Zijlstra
@ 2010-03-04 8:50 ` Robert Richter
0 siblings, 0 replies; 44+ messages in thread
From: Robert Richter @ 2010-03-04 8:50 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: mingo, linux-kernel, paulus, eranian, fweisbec
On 03.03.10 18:42:48, Peter Zijlstra wrote:
> On Wed, 2010-03-03 at 18:38 +0100, Robert Richter wrote:
> > > + fake_cpuc = kmalloc(sizeof(*fake_cpuc), GFP_KERNEL | __GFP_ZERO);
> > > + if (!fake_cpuc)
> > > + return -ENOMEM;
> > > +
> > > + c = x86_pmu.get_event_constraints(fake_cpuc, event);
> > > +
> > > + if (!c || !c->weight)
> > > + ret = -ENOSPC;
> > > +
> > > + if (x86_pmu.put_event_constraints)
> > > + x86_pmu.put_event_constraints(fake_cpuc, event);
> >
> > A fake cpu with the struct filled with zeros will cause a null pointer
> > exception in amd_get_event_constraints():
> >
> > struct amd_nb *nb = cpuc->amd_nb;
>
> That should result in nb == NULL, right? which is checked slightly
> further in the function.
Yes, right. The problem was in your earlier version of this code where
fake_cpuc was a null pointer. The check in amd_get_event_constraints()
for nb should work.
-Robert
>
> > Shouldn't x86_schedule_events() sufficient to decide if a single
> > counter is available? I did not yet look at group events, this might
> > happen there too.
>
> Sure, but we will only attempt scheduling them at enable time; this is a
> creation-time check, and failing to create an unschedulable event seems
> prudent.
>
>
--
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC][PATCH 10/11] perf, x86: use LBR for PEBS IP+1 fixup
2010-03-03 21:50 ` Stephane Eranian
@ 2010-03-04 8:57 ` Peter Zijlstra
2010-03-09 1:41 ` Stephane Eranian
0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-04 8:57 UTC (permalink / raw)
To: Stephane Eranian
Cc: Masami Hiramatsu, mingo, linux-kernel, paulus, robert.richter, fweisbec
On Wed, 2010-03-03 at 22:50 +0100, Stephane Eranian wrote:
> I think systematically and transparently using LBR to correct the PEBS
> off-by-one problem is not such a good idea. You've basically hijacked the
> LBR and the user cannot use it in a different way.
Well, they could, it just makes scheduling the stuff more interesting.
> There are PEBS+LBR measurements where you care about extracting the LBR data.
> There are PEBS measurements where you don't care about getting the correct IP.
> I don't necessarily want to pay the price, especially when this could
> be done offline in the tool.
There are some people who argue that fixing up that +1 insn issue is
critical; sadly, they don't appear to want to argue their case in public.
What we can do is make it optional, I guess.
* Re: [RFC][PATCH 08/11] perf, x86: Implement simple LBR support
2010-03-03 21:52 ` Stephane Eranian
@ 2010-03-04 8:58 ` Peter Zijlstra
0 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-04 8:58 UTC (permalink / raw)
To: Stephane Eranian; +Cc: mingo, linux-kernel, paulus, robert.richter, fweisbec
On Wed, 2010-03-03 at 22:52 +0100, Stephane Eranian wrote:
> On Wed, Mar 3, 2010 at 5:39 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > Implement support for Intel LBR stacks that support
> > FREEZE_LBRS_ON_PMI. We do not (yet?) support the LBR config register
> > because that is SMT wide and would also put undue restraints on the
> > PEBS users.
> >
> You're saying PEBS users have priorities over pure LBR users?
> Why is that?
I say no such thing; I only say it would make scheduling the PEBS things
more interesting.
> Without coding this, how would you expose LBR configuration to userland
> given you're using the PERF_SAMPLE_BRANCH_STACK approach?
Possibly using a second config word in the attr, but given how sucky the
hardware currently is (sharing the config between SMT) I'd be inclined
to pretend it doesn't exist for the moment.
* Re: [RFC][PATCH 07/11] perf: Provide PERF_SAMPLE_REGS
2010-03-03 22:02 ` Frederic Weisbecker
@ 2010-03-04 8:58 ` Peter Zijlstra
2010-03-04 11:04 ` Ingo Molnar
0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-04 8:58 UTC (permalink / raw)
To: Frederic Weisbecker; +Cc: mingo, linux-kernel, paulus, eranian, robert.richter
On Wed, 2010-03-03 at 23:02 +0100, Frederic Weisbecker wrote:
> On Wed, Mar 03, 2010 at 05:39:43PM +0100, Peter Zijlstra wrote:
> > Simply copy out the provided pt_regs in a u64 aligned fashion.
> >
> > XXX: do task_pt_regs() and get_irq_regs() always clear everything or
> > are we now leaking data?
>
>
> It looks like there is a leak in the case of non-traced syscalls,
> where we don't appear to save r12-r15.
>
> Then task_pt_regs() may leak the top of a process stack...?
Right, I was afraid of that. I've put this PERF_SAMPLE_REGS thing in the
freezer for now as people seem unsure how to deal with it.
* Re: [RFC][PATCH 08/11] perf, x86: Implement simple LBR support
2010-03-03 21:57 ` Stephane Eranian
@ 2010-03-04 8:58 ` Peter Zijlstra
2010-03-04 17:54 ` Stephane Eranian
0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-04 8:58 UTC (permalink / raw)
To: Stephane Eranian; +Cc: mingo, linux-kernel, paulus, robert.richter, fweisbec
On Wed, 2010-03-03 at 22:57 +0100, Stephane Eranian wrote:
> I don't understand how LBR state is migrated when a per-thread event is moved
> from one CPU to another. It seems LBR is managed per-cpu.
>
> Can you explain this to me?
It is not; it's basically impossible to do, given that the TOS doesn't
count more bits than strictly needed.
Or we should stop supporting cpu and task users at the same time.
* Re: [RFC][PATCH 07/11] perf: Provide PERF_SAMPLE_REGS
2010-03-04 8:58 ` Peter Zijlstra
@ 2010-03-04 11:04 ` Ingo Molnar
0 siblings, 0 replies; 44+ messages in thread
From: Ingo Molnar @ 2010-03-04 11:04 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Frederic Weisbecker, linux-kernel, paulus, eranian, robert.richter
* Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2010-03-03 at 23:02 +0100, Frederic Weisbecker wrote:
> > On Wed, Mar 03, 2010 at 05:39:43PM +0100, Peter Zijlstra wrote:
> > > Simply copy out the provided pt_regs in a u64 aligned fashion.
> > >
> > > XXX: do task_pt_regs() and get_irq_regs() always clear everything or
> > > are we now leaking data?
> >
> >
> > It looks like there is a leak in the case of non-traced syscalls,
> > where we don't appear to save r12-r15.
> >
> > Then task_pt_regs() may leak the top of a process stack...?
>
> Right, I was afraid of that. I've put this PERF_SAMPLE_REGS thing in the
> freezer for now as people seem unsure how to deal with it.
Also, we don't want to expose PEBS or LBR on an ABI level without there being
a user-space component making good use of it.
For example tools/perf/ support would qualify. Raw libraries alone don't really
count, as they generally lag, plus there's no guarantee of a full feedback loop
either.
Adding ABI details is always a tricky business and we only want to do it if
there's direct, immediate, close involvement with the user-space side, and
real, immediate benefits to users.
Thanks,
Ingo
* Re: [RFC][PATCH 07/11] perf: Provide PERF_SAMPLE_REGS
2010-03-04 2:59 ` Ingo Molnar
@ 2010-03-04 12:58 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 44+ messages in thread
From: Arnaldo Carvalho de Melo @ 2010-03-04 12:58 UTC (permalink / raw)
To: Ingo Molnar
Cc: David Miller, eranian, peterz, linux-kernel, paulus,
robert.richter, fweisbec
Em Thu, Mar 04, 2010 at 03:59:08AM +0100, Ingo Molnar escreveu:
>
> * David Miller <davem@davemloft.net> wrote:
>
> > And more generally aren't we supposed to be able to eventually analyze perf
> > dumps on any platform not just the one 'perf' was built under?
>
> A sidenote: in this cycle Arnaldo improved this aspect of perf (and those
> changes are now upstream). In theory you should be able to do a 'perf record'
> + 'perf archive' on your Sparc box and then analyze it via 'perf report' on an
> x86 box - and vice versa.
>
> ( Note, it was not tested in that specific combination - another combination
> was tested by Arnaldo: 32-bit PA-RISC profile interpreted on 64-bit x86. )
It was the other way around, 64-bit x86 interpreted on 64-bit PARISC.
Should work in any direction.
Caveats:
perf archive requires build-ids; the kernel has them in distros whose
toolchain supports them, enabled unconditionally since about 2.6.24.
If vmlinux is available, it will be used; if not, a copy of
/proc/kallsyms is made, which is likewise keyed by build-id.
I have plans to cope with build-id-less systems, but no code yet.
- Arnaldo
* Re: [RFC][PATCH 08/11] perf, x86: Implement simple LBR support
2010-03-04 8:58 ` Peter Zijlstra
@ 2010-03-04 17:54 ` Stephane Eranian
2010-03-04 18:18 ` Peter Zijlstra
0 siblings, 1 reply; 44+ messages in thread
From: Stephane Eranian @ 2010-03-04 17:54 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: mingo, linux-kernel, paulus, robert.richter, fweisbec
On Thu, Mar 4, 2010 at 12:58 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2010-03-03 at 22:57 +0100, Stephane Eranian wrote:
>> I don't understand how LBR state is migrated when a per-thread event is moved
>> from one CPU to another. It seems LBR is managed per-cpu.
>>
>> Can you explain this to me?
>
> > It is not; it's basically impossible to do, given that the TOS doesn't
> > count more bits than strictly needed.
>
I don't get that about the TOS.
So you are saying that on context switch out, you drop the current
content of the LBR. When you are scheduled back in on another CPU,
you grab whatever is there?
> Or we should stop supporting cpu and task users at the same time.
>
Or you should consider LBR as an event which has a constraint that
it can only run on one pseudo counter (similar to what you do with
BTS). Scheduling would take care of the mutual exclusion. Multiplexing
would provide the work-around.
* Re: [RFC][PATCH 08/11] perf, x86: Implement simple LBR support
2010-03-04 17:54 ` Stephane Eranian
@ 2010-03-04 18:18 ` Peter Zijlstra
2010-03-04 20:23 ` Peter Zijlstra
0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-04 18:18 UTC (permalink / raw)
To: Stephane Eranian; +Cc: mingo, linux-kernel, paulus, robert.richter, fweisbec
On Thu, 2010-03-04 at 09:54 -0800, Stephane Eranian wrote:
> On Thu, Mar 4, 2010 at 12:58 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Wed, 2010-03-03 at 22:57 +0100, Stephane Eranian wrote:
> >> I don't understand how LBR state is migrated when a per-thread event is moved
> >> from one CPU to another. It seems LBR is managed per-cpu.
> >>
> >> Can you explain this to me?
> >
> > It is not; it's basically impossible to do, given that the TOS doesn't
> > count more bits than strictly needed.
> >
> I don't get that about the TOS.
>
> So you are saying that on context switch out, you drop the current
> content of the LBR. When you are scheduled back in on another CPU,
> you grab whatever is there?
What is currently implemented is that we lose history at the point a
new task schedules in an LBR-using event.
If we had a wider TOS we could try and stitch partial stacks together
because we could detect overflow.
We could also preserve the LBR because we would be able to know where a
task got scheduled in and not release information of the previous task
while still allowing a cpu-wide user to see everything.
> > Or we should stop supporting cpu and task users at the same time.
> >
> Or you should consider LBR as an event which has a constraint that
> it can only run on one pseudo counter (similar to what you do with
> BTS). Scheduling would take care of the mutual exclusion. Multiplexing
> would provide the work-around.
Yes, that's an even more limited case than not sharing it between task and
cpu context, which is basically the strongest guarantee you need.
If you do that you can store the LBR stack on unschedule and put it back
on schedule (on whichever cpu that may be).
But since we do not support LBR config, that'll be of very limited use,
since there are enough branches between the point where we schedule the
counter and hitting userspace to cycle the LBR several times.
* Re: [RFC][PATCH 08/11] perf, x86: Implement simple LBR support
2010-03-04 18:18 ` Peter Zijlstra
@ 2010-03-04 20:23 ` Peter Zijlstra
2010-03-04 20:57 ` Stephane Eranian
0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2010-03-04 20:23 UTC (permalink / raw)
To: Stephane Eranian; +Cc: mingo, linux-kernel, paulus, robert.richter, fweisbec
On Thu, 2010-03-04 at 19:18 +0100, Peter Zijlstra wrote:
> What is currently implemented is that we lose history at the point a
> new task schedules in an LBR-using event.
>
This also matches CPU errata AX14, AJ52 and AAK109, which state that a
task switch may produce faulty LBR state, so clearing history after a
task switch seems the best thing to do.
* Re: [RFC][PATCH 08/11] perf, x86: Implement simple LBR support
2010-03-04 20:23 ` Peter Zijlstra
@ 2010-03-04 20:57 ` Stephane Eranian
0 siblings, 0 replies; 44+ messages in thread
From: Stephane Eranian @ 2010-03-04 20:57 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: mingo, linux-kernel, paulus, robert.richter, fweisbec
On Thu, Mar 4, 2010 at 12:23 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2010-03-04 at 19:18 +0100, Peter Zijlstra wrote:
>> What is currently implemented is that we lose history at the point a
>> new task schedules in an LBR-using event.
>>
> This also matches CPU errata AX14, AJ52 and AAK109, which state that a
> task switch may produce faulty LBR state, so clearing history after a
> task switch seems the best thing to do.
>
>
You would save the LBR before the task switch and restore after the
task switch, so I don't see how you would be impacted by this. You
would not pick up the bogus LBR content.
Given that you seem to be interested in LBR only at the user level,
I think what you have right now should work. But I don't like a design
that precludes supporting LBR config, regardless of whether the MSR
is shared or not, because that prevents some interesting
measurements.
* Re: [RFC][PATCH 05/11] perf: Generic perf_sample_data initialization
2010-03-03 16:39 ` [RFC][PATCH 05/11] perf: Generic perf_sample_data initialization Peter Zijlstra
2010-03-03 16:49 ` David Miller
2010-03-03 21:14 ` Frederic Weisbecker
@ 2010-03-05 8:44 ` Jean Pihet
2 siblings, 0 replies; 44+ messages in thread
From: Jean Pihet @ 2010-03-05 8:44 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, paulus, eranian, robert.richter, fweisbec,
Jamie Iles, David S. Miller, stable
On Wednesday 03 March 2010 17:39:41 Peter Zijlstra wrote:
> This makes it easier to extend perf_sample_data and fixes a bug on
> arm and sparc, which failed to set ->raw to NULL, which can cause
> crashes when combined with PERF_SAMPLE_RAW.
>
> It also optimizes PowerPC and tracepoint, because the struct
> initialization is forced to zero out the whole structure.
>
> CC: Jamie Iles <jamie.iles@picochip.com>
> CC: Jean Pihet <jpihet@mvista.com>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Ingo Molnar <mingo@elte.hu>
> CC: David S. Miller <davem@davemloft.net>
> CC: Stephane Eranian <eranian@google.com>
> CC: Frederic Weisbecker <fweisbec@gmail.com>
> CC: stable@kernel.org
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Jean Pihet <jpihet@mvista.com>
Thanks!
* Re: [RFC][PATCH 10/11] perf, x86: use LBR for PEBS IP+1 fixup
2010-03-04 8:57 ` Peter Zijlstra
@ 2010-03-09 1:41 ` Stephane Eranian
0 siblings, 0 replies; 44+ messages in thread
From: Stephane Eranian @ 2010-03-09 1:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Masami Hiramatsu, mingo, linux-kernel, paulus, robert.richter, fweisbec
On Thu, Mar 4, 2010 at 9:57 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2010-03-03 at 22:50 +0100, Stephane Eranian wrote:
>
>> I think systematically and transparently using LBR to correct the PEBS
>> off-by-one problem is not such a good idea. You've basically hijacked the
>> LBR and the user cannot use it in a different way.
>
> Well, they could, it just makes scheduling the stuff more interesting.
>
>> There are PEBS+LBR measurements where you care about extracting the LBR data.
>> There are PEBS measurements where you don't care about getting the correct IP.
>> I don't necessarily want to pay the price, especially when this could
>> be done offline in the tool.
>
> There are some people who argue that fixing up that +1 insn issue is
> critical; sadly, they don't appear to want to argue their case in public.
> What we can do is make it optional, I guess.
I can see why they would want IP instead of IP+1. But what I am saying
is that there are certain measurements where you need to use the LBR in
another way. For instance, you may want to combine PEBS + LBR to capture the
path that leads to a cache miss. For that you would need to configure the LBR
to record only call branches, and then do the correction of the IP offline
in the tool. In this case, the path is more important than the IP+1 error.
This is why I think you need to provide a config field to disable the IP+1
correction, and thus free the LBR for other usage. I understand this also
means you cannot share the LBR with other competing events (on the same or
distinct CPUs), but that's what event scheduling is good for.
Thread overview: 44+ messages
2010-03-03 16:39 [RFC][PATCH 00/11] Another stab at PEBS and LBR support Peter Zijlstra
2010-03-03 16:39 ` [RFC][PATCH 01/11] perf, x86: Remove superfluous arguments to x86_perf_event_set_period() Peter Zijlstra
2010-03-03 16:39 ` [RFC][PATCH 02/11] perf, x86: Remove superfluous arguments to x86_perf_event_update() Peter Zijlstra
2010-03-03 16:39 ` [RFC][PATCH 03/11] perf, x86: Change x86_pmu.{enable,disable} calling convention Peter Zijlstra
2010-03-03 16:39 ` [RFC][PATCH 04/11] perf, x86: Use unlocked bitops Peter Zijlstra
2010-03-03 16:39 ` [RFC][PATCH 05/11] perf: Generic perf_sample_data initialization Peter Zijlstra
2010-03-03 16:49 ` David Miller
2010-03-03 21:14 ` Frederic Weisbecker
2010-03-05 8:44 ` Jean Pihet
2010-03-03 16:39 ` [RFC][PATCH 06/11] perf, x86: PEBS infrastructure Peter Zijlstra
2010-03-03 17:38 ` Robert Richter
2010-03-03 17:42 ` Peter Zijlstra
2010-03-04 8:50 ` Robert Richter
2010-03-03 16:39 ` [RFC][PATCH 07/11] perf: Provide PERF_SAMPLE_REGS Peter Zijlstra
2010-03-03 17:30 ` Stephane Eranian
2010-03-03 17:39 ` Peter Zijlstra
2010-03-03 17:49 ` Stephane Eranian
2010-03-03 17:55 ` David Miller
2010-03-03 18:18 ` Stephane Eranian
2010-03-03 19:18 ` Peter Zijlstra
2010-03-04 2:59 ` Ingo Molnar
2010-03-04 12:58 ` Arnaldo Carvalho de Melo
2010-03-03 22:02 ` Frederic Weisbecker
2010-03-04 8:58 ` Peter Zijlstra
2010-03-04 11:04 ` Ingo Molnar
2010-03-03 16:39 ` [RFC][PATCH 08/11] perf, x86: Implement simple LBR support Peter Zijlstra
2010-03-03 21:52 ` Stephane Eranian
2010-03-04 8:58 ` Peter Zijlstra
2010-03-03 21:57 ` Stephane Eranian
2010-03-04 8:58 ` Peter Zijlstra
2010-03-04 17:54 ` Stephane Eranian
2010-03-04 18:18 ` Peter Zijlstra
2010-03-04 20:23 ` Peter Zijlstra
2010-03-04 20:57 ` Stephane Eranian
2010-03-03 16:39 ` [RFC][PATCH 09/11] perf, x86: Implement PERF_SAMPLE_BRANCH_STACK Peter Zijlstra
2010-03-03 21:08 ` Frederic Weisbecker
2010-03-03 16:39 ` [RFC][PATCH 10/11] perf, x86: use LBR for PEBS IP+1 fixup Peter Zijlstra
2010-03-03 18:05 ` Masami Hiramatsu
2010-03-03 19:37 ` Peter Zijlstra
2010-03-03 21:11 ` Masami Hiramatsu
2010-03-03 21:50 ` Stephane Eranian
2010-03-04 8:57 ` Peter Zijlstra
2010-03-09 1:41 ` Stephane Eranian
2010-03-03 16:39 ` [RFC][PATCH 11/11] perf, x86: Clean up IA32_PERF_CAPABILITIES usage Peter Zijlstra