All of lore.kernel.org
* [PATCH 0/4] perf, x86: Fixes for v2.6.39
@ 2011-04-16  0:27 Robert Richter
  2011-04-16  0:27 ` [PATCH 1/4] perf, x86: Fix pre-defined cache-misses event for AMD family 15h cpus Robert Richter
                   ` (3 more replies)
  0 siblings, 4 replies; 38+ messages in thread
From: Robert Richter @ 2011-04-16  0:27 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Stephane Eranian, LKML

This patch series contains fixes for v2.6.39. They are related to the
AMD family 15h PMU but also modify x86 generic code. See the diffstat
below.

If possible I would like to see patch #4 in .39 as well, though it is
more than just a one-liner, with many changes in the x86 event
scheduling code. Without this patch, scheduling multiple events on AMD
family 15h may fail under certain conditions even though enough
counters are available.

-Robert



The following changes since commit e566b76ed30768140df8f0023904aed5a41244f7:

  perf_event: Fix cgrp event scheduling bug in perf_enable_on_exec() (2011-04-11 11:07:55 +0200)

Andre Przywara (1):
      perf, x86: Fix pre-defined cache-misses event for AMD family 15h cpus

Robert Richter (3):
      perf, x86: Fix AMD family 15h FPU event constraints
      perf, x86: Use ALTERNATIVE() to check for X86_FEATURE_PERFCTR_CORE
      perf, x86: Fix event scheduler to solve complex scheduling problems

 arch/x86/kernel/cpu/perf_event.c     |   63 +++++++++++++++++++++++++++------
 arch/x86/kernel/cpu/perf_event_amd.c |   23 ++++++++++--
 2 files changed, 70 insertions(+), 16 deletions(-)




^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 1/4] perf, x86: Fix pre-defined cache-misses event for AMD family 15h cpus
  2011-04-16  0:27 [PATCH 0/4] perf, x86: Fixes for v2.6.39 Robert Richter
@ 2011-04-16  0:27 ` Robert Richter
  2011-04-19 12:03   ` [tip:perf/urgent] " tip-bot for Andre Przywara
  2011-04-16  0:27 ` [PATCH 2/4] perf, x86: Fix AMD family 15h FPU event constraints Robert Richter
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 38+ messages in thread
From: Robert Richter @ 2011-04-16  0:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Stephane Eranian, LKML, Andre Przywara, Robert Richter

From: Andre Przywara <andre.przywara@amd.com>

With AMD CPU family 15h, a unit mask was introduced for the Data Cache
Miss event (0x041/L1-dcache-load-misses). We need to enable bit 0
(first data cache miss or streaming store to a 64 B cache line) of
this mask to properly count data cache misses.

We now set this bit for all families and models. If a PMU does not
implement a unit mask for event 0x041, the bit is ignored.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event_amd.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 461f62b..4e16138 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -8,7 +8,7 @@ static __initconst const u64 amd_hw_cache_event_ids
  [ C(L1D) ] = {
 	[ C(OP_READ) ] = {
 		[ C(RESULT_ACCESS) ] = 0x0040, /* Data Cache Accesses        */
-		[ C(RESULT_MISS)   ] = 0x0041, /* Data Cache Misses          */
+		[ C(RESULT_MISS)   ] = 0x0141, /* Data Cache Misses          */
 	},
 	[ C(OP_WRITE) ] = {
 		[ C(RESULT_ACCESS) ] = 0x0142, /* Data Cache Refills :system */
-- 
1.7.3.4




* [PATCH 2/4] perf, x86: Fix AMD family 15h FPU event constraints
  2011-04-16  0:27 [PATCH 0/4] perf, x86: Fixes for v2.6.39 Robert Richter
  2011-04-16  0:27 ` [PATCH 1/4] perf, x86: Fix pre-defined cache-misses event for AMD family 15h cpus Robert Richter
@ 2011-04-16  0:27 ` Robert Richter
  2011-04-19 12:04   ` [tip:perf/urgent] " tip-bot for Robert Richter
  2011-04-16  0:27 ` [PATCH 3/4] perf, x86: Use ALTERNATIVE() to check for X86_FEATURE_PERFCTR_CORE Robert Richter
  2011-04-16  0:27 ` [PATCH 4/4] perf, x86: Fix event scheduler to solve complex scheduling problems Robert Richter
  3 siblings, 1 reply; 38+ messages in thread
From: Robert Richter @ 2011-04-16  0:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Stephane Eranian, LKML, Robert Richter, Stephane Eranian

Depending on their unit mask settings, some FPU events may only be
scheduled on cpu counter #3. This patch fixes the event constraints to
reflect this.

Cc: Stephane Eranian <eranian@googlemail.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event_amd.c |   21 ++++++++++++++++++---
 1 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 4e16138..758cccc 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -427,7 +427,9 @@ static __initconst const struct x86_pmu amd_pmu = {
  *
  * Exceptions:
  *
+ * 0x000	FP	PERF_CTL[3], PERF_CTL[5:3] (*)
  * 0x003	FP	PERF_CTL[3]
+ * 0x004	FP	PERF_CTL[3], PERF_CTL[5:3] (*)
  * 0x00B	FP	PERF_CTL[3]
  * 0x00D	FP	PERF_CTL[3]
  * 0x023	DE	PERF_CTL[2:0]
@@ -448,6 +450,8 @@ static __initconst const struct x86_pmu amd_pmu = {
  * 0x0DF	LS	PERF_CTL[5:0]
  * 0x1D6	EX	PERF_CTL[5:0]
  * 0x1D8	EX	PERF_CTL[5:0]
+ *
+ * (*) depending on the umask all FPU counters may be used
  */
 
 static struct event_constraint amd_f15_PMC0  = EVENT_CONSTRAINT(0, 0x01, 0);
@@ -460,18 +464,29 @@ static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
 static struct event_constraint *
 amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *event)
 {
-	unsigned int event_code = amd_get_event_code(&event->hw);
+	struct hw_perf_event *hwc = &event->hw;
+	unsigned int event_code = amd_get_event_code(hwc);
 
 	switch (event_code & AMD_EVENT_TYPE_MASK) {
 	case AMD_EVENT_FP:
 		switch (event_code) {
+		case 0x000:
+			if (!(hwc->config & 0x0000F000ULL))
+				break;
+			if (!(hwc->config & 0x00000F00ULL))
+				break;
+			return &amd_f15_PMC3;
+		case 0x004:
+			if (hweight_long(hwc->config &
+					 ARCH_PERFMON_EVENTSEL_UMASK) <= 1)
+				break;
+			return &amd_f15_PMC3;
 		case 0x003:
 		case 0x00B:
 		case 0x00D:
 			return &amd_f15_PMC3;
-		default:
-			return &amd_f15_PMC53;
 		}
+		return &amd_f15_PMC53;
 	case AMD_EVENT_LS:
 	case AMD_EVENT_DC:
 	case AMD_EVENT_EX_LS:
-- 
1.7.3.4




* [PATCH 3/4] perf, x86: Use ALTERNATIVE() to check for X86_FEATURE_PERFCTR_CORE
  2011-04-16  0:27 [PATCH 0/4] perf, x86: Fixes for v2.6.39 Robert Richter
  2011-04-16  0:27 ` [PATCH 1/4] perf, x86: Fix pre-defined cache-misses event for AMD family 15h cpus Robert Richter
  2011-04-16  0:27 ` [PATCH 2/4] perf, x86: Fix AMD family 15h FPU event constraints Robert Richter
@ 2011-04-16  0:27 ` Robert Richter
  2011-04-18 20:00   ` Andi Kleen
  2011-04-19 12:04   ` [tip:perf/core] " tip-bot for Robert Richter
  2011-04-16  0:27 ` [PATCH 4/4] perf, x86: Fix event scheduler to solve complex scheduling problems Robert Richter
  3 siblings, 2 replies; 38+ messages in thread
From: Robert Richter @ 2011-04-16  0:27 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Stephane Eranian, LKML, Robert Richter

Using ALTERNATIVE() when checking for X86_FEATURE_PERFCTR_CORE avoids
an extra pointer chase and data cache hit.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event.c |   15 +++++++++++----
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index eed3673a..224a84f 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -31,6 +31,7 @@
 #include <asm/nmi.h>
 #include <asm/compat.h>
 #include <asm/smp.h>
+#include <asm/alternative.h>
 
 #if 0
 #undef wrmsrl
@@ -363,12 +364,18 @@ again:
 	return new_raw_count;
 }
 
-/* using X86_FEATURE_PERFCTR_CORE to later implement ALTERNATIVE() here */
 static inline int x86_pmu_addr_offset(int index)
 {
-	if (boot_cpu_has(X86_FEATURE_PERFCTR_CORE))
-		return index << 1;
-	return index;
+	int offset;
+
+	/* offset = X86_FEATURE_PERFCTR_CORE ? index << 1 : index */
+	alternative_io(ASM_NOP2,
+		       "shll $1, %%eax",
+		       X86_FEATURE_PERFCTR_CORE,
+		       "=a" (offset),
+		       "a"  (index));
+
+	return offset;
 }
 
 static inline unsigned int x86_pmu_config_addr(int index)
-- 
1.7.3.4




* [PATCH 4/4] perf, x86: Fix event scheduler to solve complex scheduling problems
  2011-04-16  0:27 [PATCH 0/4] perf, x86: Fixes for v2.6.39 Robert Richter
                   ` (2 preceding siblings ...)
  2011-04-16  0:27 ` [PATCH 3/4] perf, x86: Use ALTERNATIVE() to check for X86_FEATURE_PERFCTR_CORE Robert Richter
@ 2011-04-16  0:27 ` Robert Richter
  2011-04-16  8:51   ` Peter Zijlstra
                     ` (2 more replies)
  3 siblings, 3 replies; 38+ messages in thread
From: Robert Richter @ 2011-04-16  0:27 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Stephane Eranian, LKML, Robert Richter

The current x86 event scheduler fails to resolve scheduling problems
of certain combinations of events and constraints. This happens
especially for events with complex constraints, such as those of the
AMD family 15h PMU. The scheduler then fails to find an existing
solution. Examples:

	event code	counter		failure		possible solution

1)	0x043		PMC[2:0]	0		1
	0x02E		PMC[3,0]	3		0
	0x003		PMC3		FAIL		3

2)	0x02E		PMC[3,0]	0		3
	0x043		PMC[2:0]	1		0
	0x045		PMC[2:0]	2		1
	0x046		PMC[2:0]	FAIL		2

Scheduling events on counters is a Hamiltonian path problem. To find a
possible solution we must traverse all existing paths. This patch
implements this.

We need to save the states of all paths already walked. If we fail to
schedule an event, we now roll back to the previous state and try
another free counter, until all paths have been analysed.

We might consider removing the constraint weight implementation
completely later on, but I left that out as it is a much bigger and
riskier change than this fix.

Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event.c |   48 +++++++++++++++++++++++++++++++------
 1 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 224a84f..887a500 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -770,11 +770,19 @@ static inline int is_x86_event(struct perf_event *event)
 	return event->pmu == &pmu;
 }
 
+struct sched_log
+{
+	int	i;
+	int	w;
+	int	idx;
+};
+
 static int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 {
 	struct event_constraint *c, *constraints[X86_PMC_IDX_MAX];
 	unsigned long used_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
-	int i, j, w, wmax, num = 0;
+	struct sched_log sched_log[X86_PMC_IDX_MAX];
+	int i, idx, w, wmax, num = 0, scheduled = 0;
 	struct hw_perf_event *hwc;
 
 	bitmap_zero(used_mask, X86_PMC_IDX_MAX);
@@ -815,6 +823,7 @@ static int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 	 */
 
 	bitmap_zero(used_mask, X86_PMC_IDX_MAX);
+	memset(&sched_log, 0, sizeof(sched_log));
 
 	/*
 	 * weight = number of possible counters
@@ -838,25 +847,48 @@ static int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 	for (w = 1, num = n; num && w <= wmax; w++) {
 		/* for each event */
 		for (i = 0; num && i < n; i++) {
+		redo:
 			c = constraints[i];
 			hwc = &cpuc->event_list[i]->hw;
 
 			if (c->weight != w)
 				continue;
 
-			for_each_set_bit(j, c->idxmsk, X86_PMC_IDX_MAX) {
-				if (!test_bit(j, used_mask))
+			idx = sched_log[scheduled].idx;
+			/* for each bit in idxmsk starting from idx */
+			while (idx < X86_PMC_IDX_MAX) {
+				idx = find_next_bit(c->idxmsk, X86_PMC_IDX_MAX,
+						    idx);
+				if (idx == X86_PMC_IDX_MAX)
+					break;
+				if (!__test_and_set_bit(idx, used_mask))
 					break;
+				idx++;
 			}
 
-			if (j == X86_PMC_IDX_MAX)
-				break;
-
-			__set_bit(j, used_mask);
+			if (idx >= X86_PMC_IDX_MAX) {
+				/* roll back and try next free counter */
+				if (!scheduled)
+					/* no free counters anymore */
+					break;
+				sched_log[scheduled].idx = 0;
+				scheduled--;
+				num++;
+				clear_bit(sched_log[scheduled].idx, used_mask);
+				i = sched_log[scheduled].i;
+				w = sched_log[scheduled].w;
+				sched_log[scheduled].idx++;
+				goto redo;
+			}
 
 			if (assign)
-				assign[i] = j;
+				assign[i] = idx;
+
 			num--;
+			sched_log[scheduled].i = i;
+			sched_log[scheduled].w = w;
+			sched_log[scheduled].idx = idx;
+			scheduled++;
 		}
 	}
 done:
-- 
1.7.3.4




* Re: [PATCH 4/4] perf, x86: Fix event scheduler to solve complex scheduling problems
  2011-04-16  0:27 ` [PATCH 4/4] perf, x86: Fix event scheduler to solve complex scheduling problems Robert Richter
@ 2011-04-16  8:51   ` Peter Zijlstra
  2011-04-16  9:43     ` Ingo Molnar
  2011-04-17  8:15     ` Robert Richter
  2011-04-16 15:52   ` Stephane Eranian
  2011-04-19 10:26   ` [PATCH v2] perf, x86: Fix event scheduler for constraints with overlapping counters Robert Richter
  2 siblings, 2 replies; 38+ messages in thread
From: Peter Zijlstra @ 2011-04-16  8:51 UTC (permalink / raw)
  To: Robert Richter; +Cc: Ingo Molnar, Stephane Eranian, LKML

On Sat, 2011-04-16 at 02:27 +0200, Robert Richter wrote:
> The current x86 event scheduler fails to resolve scheduling problems
> of certain combinations of events and constraints. This happens
> especially for events with complex constraints, such as those of the
> AMD family 15h PMU. The scheduler then fails to find an existing
> solution. Examples:
> 
>         event code      counter         failure         possible solution
> 
> 1)      0x043           PMC[2:0]        0               1
>         0x02E           PMC[3,0]        3               0
>         0x003           PMC3            FAIL            3
> 
> 2)      0x02E           PMC[3,0]        0               3
>         0x043           PMC[2:0]        1               0
>         0x045           PMC[2:0]        2               1
>         0x046           PMC[2:0]        FAIL            2
> 
> Scheduling events on counters is a Hamiltonian path problem. To find a
> possible solution we must traverse all existing paths. This patch
> implements this.
> 
> We need to save the states of all paths already walked. If we fail to
> schedule an event, we now roll back to the previous state and try
> another free counter, until all paths have been analysed.
> 
> We might consider removing the constraint weight implementation
> completely later on, but I left that out as it is a much bigger and
> riskier change than this fix.

Argh, crap. That's because AMD is now the first with overlapping
constraints. Be sure to let your hardware guys know that they went from
top to bottom om my appreciation list. AMD used to have no constraints
and now they have the absolute worst.

I'd really prefer not to do this for .39, and I'll have to sit down and
actually read this code. It looks like we went from O(n^2) to O(n!) or
somesuch, also not much of an improvement. I'll have to analyze the
solver to see what it does for 'simple' constraint sets and whether it
will indeed be more expensive than the O(n^2) solver we had.

Also, I think this code could do with a tiny bit of comments ;-)


* Re: [PATCH 4/4] perf, x86: Fix event scheduler to solve complex scheduling problems
  2011-04-16  8:51   ` Peter Zijlstra
@ 2011-04-16  9:43     ` Ingo Molnar
  2011-04-16 10:08       ` Peter Zijlstra
  2011-04-17  8:15     ` Robert Richter
  1 sibling, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2011-04-16  9:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Robert Richter, Stephane Eranian, LKML, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Sat, 2011-04-16 at 02:27 +0200, Robert Richter wrote:
> > The current x86 event scheduler fails to resolve scheduling problems
> > of certain combinations of events and constraints. This happens
> > especially for events with complex constraints, such as those of the
> > AMD family 15h PMU. The scheduler then fails to find an existing
> > solution. Examples:
> > 
> >         event code      counter         failure         possible solution
> > 
> > 1)      0x043           PMC[2:0]        0               1
> >         0x02E           PMC[3,0]        3               0
> >         0x003           PMC3            FAIL            3
> > 
> > 2)      0x02E           PMC[3,0]        0               3
> >         0x043           PMC[2:0]        1               0
> >         0x045           PMC[2:0]        2               1
> >         0x046           PMC[2:0]        FAIL            2
> > 
> > Scheduling events on counters is a Hamiltonian path problem. To find a
> > possible solution we must traverse all existing paths. This patch
> > implements this.
> > 
> > We need to save the states of all paths already walked. If we fail to
> > schedule an event, we now roll back to the previous state and try
> > another free counter, until all paths have been analysed.
> > 
> > We might consider removing the constraint weight implementation
> > completely later on, but I left that out as it is a much bigger and
> > riskier change than this fix.
> 
> Argh, crap. That's because AMD is now the first with overlapping
> constraints. Be sure to let your hardware guys know that they went from
> top to bottom om my appreciation list. AMD used to have no constraints
> and now they have the absolute worst.
> 
> I'd really prefer not to do this for .39, and I'll have to sit down and
> actually read this code. It looks like we went from O(n^2) to O(n!) or
> somesuch, also not much of an improvement. I'll have to analyze the
> solver to see what it does for 'simple' constraint sets and whether it
> will indeed be more expensive than the O(n^2) solver we had.
> 
> Also, I think this code could do with a tiny bit of comments ;-)

I'd also prefer if we first had actual testcases in 'perf test' for all these 
failures - it took an *awfully* long time to find these regressions (the event 
scheduler code has been committed for months), while with proper testcases it 
would only take a second to run 'perf test'.

Thanks,

	Ingo


* Re: [PATCH 4/4] perf, x86: Fix event scheduler to solve complex scheduling problems
  2011-04-16  9:43     ` Ingo Molnar
@ 2011-04-16 10:08       ` Peter Zijlstra
  2011-04-16 10:14         ` Ingo Molnar
  2011-04-16 14:26         ` Valdis.Kletnieks
  0 siblings, 2 replies; 38+ messages in thread
From: Peter Zijlstra @ 2011-04-16 10:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Robert Richter, Stephane Eranian, LKML, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On Sat, 2011-04-16 at 11:43 +0200, Ingo Molnar wrote:
> I'd also prefer if we first had actual testcases in 'perf test' for all these 
> failures - it took an *awfully* long time to find these regressions (the event 
> scheduler code has been committed for months), while with proper testcases it 
> would only take a second to run 'perf test'. 

These cases only exist on AMD F15, I don't think there's many people
with such systems around.


* Re: [PATCH 4/4] perf, x86: Fix event scheduler to solve complex scheduling problems
  2011-04-16 10:08       ` Peter Zijlstra
@ 2011-04-16 10:14         ` Ingo Molnar
  2011-04-16 10:15           ` Peter Zijlstra
  2011-04-16 14:26         ` Valdis.Kletnieks
  1 sibling, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2011-04-16 10:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Robert Richter, Stephane Eranian, LKML, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Sat, 2011-04-16 at 11:43 +0200, Ingo Molnar wrote:
> > I'd also prefer if we first had actual testcases in 'perf test' for all these 
> > failures - it took an *awfully* long time to find these regressions (the event 
> > scheduler code has been committed for months), while with proper testcases it 
> > would only take a second to run 'perf test'. 
> 
> These cases only exist on AMD F15, I don't think there's many people
> with such systems around.

Well, if the trend continues we'll have more twisted constraints and more bugs 
of this sort, so having a testsuite sure cannot hurt, right?

Thanks,

	Ingo


* Re: [PATCH 4/4] perf, x86: Fix event scheduler to solve complex scheduling problems
  2011-04-16 10:14         ` Ingo Molnar
@ 2011-04-16 10:15           ` Peter Zijlstra
  0 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2011-04-16 10:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Robert Richter, Stephane Eranian, LKML, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On Sat, 2011-04-16 at 12:14 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Sat, 2011-04-16 at 11:43 +0200, Ingo Molnar wrote:
> > > I'd also prefer if we first had actual testcases in 'perf test' for all these 
> > > failures - it took an *awfully* long time to find these regressions (the event 
> > > scheduler code has been committed for months), while with proper testcases it 
> > > would only take a second to run 'perf test'. 
> > 
> > These cases only exist on AMD F15, I don't think there's many people
> > with such systems around.
> 
> Well, if the trend continues we'll have more twisted constraints and more bugs 
> of this sort, so having a testsuite sure cannot hurt, right?

For sure. But the problem with writing test cases at the perf userspace
level is that they're very hardware specific. It would be much easier to
write unit tests for the solver itself.


* Re: [PATCH 4/4] perf, x86: Fix event scheduler to solve complex scheduling problems
  2011-04-16 10:08       ` Peter Zijlstra
  2011-04-16 10:14         ` Ingo Molnar
@ 2011-04-16 14:26         ` Valdis.Kletnieks
  1 sibling, 0 replies; 38+ messages in thread
From: Valdis.Kletnieks @ 2011-04-16 14:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Robert Richter, Stephane Eranian, LKML,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


On Sat, 16 Apr 2011 12:08:40 +0200, Peter Zijlstra said:
> These cases only exist on AMD F15, I don't think there's many people
> with such systems around.

Didn't they say similar things about the first Itaniums? :)




* Re: [PATCH 4/4] perf, x86: Fix event scheduler to solve complex scheduling problems
  2011-04-16  0:27 ` [PATCH 4/4] perf, x86: Fix event scheduler to solve complex scheduling problems Robert Richter
  2011-04-16  8:51   ` Peter Zijlstra
@ 2011-04-16 15:52   ` Stephane Eranian
  2011-04-17  8:44     ` Robert Richter
  2011-04-19 10:26   ` [PATCH v2] perf, x86: Fix event scheduler for constraints with overlapping counters Robert Richter
  2 siblings, 1 reply; 38+ messages in thread
From: Stephane Eranian @ 2011-04-16 15:52 UTC (permalink / raw)
  To: Robert Richter; +Cc: Peter Zijlstra, Ingo Molnar, LKML


On Fri, Apr 15, 2011 at 5:27 PM, Robert Richter <robert.richter@amd.com> wrote:
> The current x86 event scheduler fails to resolve scheduling problems
> of certain combinations of events and constraints. This happens
> especially for events with complex constraints, such as those of the
> AMD family 15h PMU. The scheduler then fails to find an existing
> solution. Examples:
>
>        event code      counter         failure         possible solution
>
> 1)      0x043           PMC[2:0]        0               1
>        0x02E           PMC[3,0]        3               0
>        0x003           PMC3            FAIL            3
>
I am not sure I understand this failure case. If I recall correctly,
the scheduler looks at the weight of each event first:

                                        weight
1)      0x043           PMC[2:0]        3
        0x02E           PMC[3,0]        2
        0x003           PMC3            1

Then, it schedules in increasing weight order, i.e. weights 1, 2, 3.
For weight 1 it will find counter 3, for weight 2 it will take
counter 0, and for weight 3 it will take counter 1, given that 0 is
already used.

Or am I reading your example the wrong way?

The fact that counters have overlapping constraints should not
matter. In fact, this is what happens with counters without
constraints.

> 2)      0x02E           PMC[3,0]        0               3
>        0x043           PMC[2:0]        1               0
>        0x045           PMC[2:0]        2               1
>        0x046           PMC[2:0]        FAIL            2
>
> Scheduling events on counters is a Hamiltonian path problem. To find a
> possible solution we must traverse all existing paths. This patch
> implements this.
>
> We need to save the states of all paths already walked. If we fail to
> schedule an event, we now roll back to the previous state and try
> another free counter, until all paths have been analysed.
>
> We might consider removing the constraint weight implementation
> completely later on, but I left that out as it is a much bigger and
> riskier change than this fix.
>
> Cc: Stephane Eranian <eranian@google.com>
> Signed-off-by: Robert Richter <robert.richter@amd.com>
> ---
>  arch/x86/kernel/cpu/perf_event.c |   48 +++++++++++++++++++++++++++++++------
>  1 files changed, 40 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index 224a84f..887a500 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -770,11 +770,19 @@ static inline int is_x86_event(struct perf_event *event)
>        return event->pmu == &pmu;
>  }
>
> +struct sched_log
> +{
> +       int     i;
> +       int     w;
> +       int     idx;
> +};
> +
>  static int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
>  {
>        struct event_constraint *c, *constraints[X86_PMC_IDX_MAX];
>        unsigned long used_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
> -       int i, j, w, wmax, num = 0;
> +       struct sched_log sched_log[X86_PMC_IDX_MAX];
> +       int i, idx, w, wmax, num = 0, scheduled = 0;
>        struct hw_perf_event *hwc;
>
>        bitmap_zero(used_mask, X86_PMC_IDX_MAX);
> @@ -815,6 +823,7 @@ static int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
>         */
>
>        bitmap_zero(used_mask, X86_PMC_IDX_MAX);
> +       memset(&sched_log, 0, sizeof(sched_log));
>
>        /*
>         * weight = number of possible counters
> @@ -838,25 +847,48 @@ static int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
>        for (w = 1, num = n; num && w <= wmax; w++) {
>                /* for each event */
>                for (i = 0; num && i < n; i++) {
> +               redo:
>                        c = constraints[i];
>                        hwc = &cpuc->event_list[i]->hw;
>
>                        if (c->weight != w)
>                                continue;
>
> -                       for_each_set_bit(j, c->idxmsk, X86_PMC_IDX_MAX) {
> -                               if (!test_bit(j, used_mask))
> +                       idx = sched_log[scheduled].idx;
> +                       /* for each bit in idxmsk starting from idx */
> +                       while (idx < X86_PMC_IDX_MAX) {
> +                               idx = find_next_bit(c->idxmsk, X86_PMC_IDX_MAX,
> +                                                   idx);
> +                               if (idx == X86_PMC_IDX_MAX)
> +                                       break;
> +                               if (!__test_and_set_bit(idx, used_mask))
>                                        break;
> +                               idx++;
>                        }
>
> -                       if (j == X86_PMC_IDX_MAX)
> -                               break;
> -
> -                       __set_bit(j, used_mask);
> +                       if (idx >= X86_PMC_IDX_MAX) {
> +                               /* roll back and try next free counter */
> +                               if (!scheduled)
> +                                       /* no free counters anymore */
> +                                       break;
> +                               sched_log[scheduled].idx = 0;
> +                               scheduled--;
> +                               num++;
> +                               clear_bit(sched_log[scheduled].idx, used_mask);
> +                               i = sched_log[scheduled].i;
> +                               w = sched_log[scheduled].w;
> +                               sched_log[scheduled].idx++;
> +                               goto redo;
> +                       }
>
>                        if (assign)
> -                               assign[i] = j;
> +                               assign[i] = idx;
> +
>                        num--;
> +                       sched_log[scheduled].i = i;
> +                       sched_log[scheduled].w = w;
> +                       sched_log[scheduled].idx = idx;
> +                       scheduled++;
>                }
>        }
>  done:
> --
> 1.7.3.4
>
>
>


* Re: [PATCH 4/4] perf, x86: Fix event scheduler to solve complex scheduling problems
  2011-04-16  8:51   ` Peter Zijlstra
  2011-04-16  9:43     ` Ingo Molnar
@ 2011-04-17  8:15     ` Robert Richter
  2011-04-17  8:18       ` Ingo Molnar
  1 sibling, 1 reply; 38+ messages in thread
From: Robert Richter @ 2011-04-17  8:15 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Stephane Eranian, LKML

On 16.04.11 04:51:17, Peter Zijlstra wrote:
> Argh, crap. That's because AMD is now the first with overlapping
> constraints. Be sure to let your hardware guys know that they went from
> top to bottom on my appreciation list. AMD used to have no constraints
> and now they have the absolute worst.

Yes, the overlapping constraints are the problem.

> I'd really prefer not to do this for .39, and I'll have to sit down and
> actually read this code. It looks like we went from O(n^2) to O(n!) or
> somesuch, also not much of an improvement. I'll have to analyze the
> solver to see what it does for 'simple' constraint sets and whether it
> will indeed be more expensive than the O(n^2) solver we had.

It won't be more expensive if there is a solution. But if there is
none, we now walk all possible paths, which is something like O(n!).

Yes, we can postpone this general solution until after .39. I will try
to find a solution that handles the special case of family 15h as a
fix for .39.

> Also, I think this code could do with a tiny bit of comments ;-)

I will comment on the code inline in the patch.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center



* Re: [PATCH 4/4] perf, x86: Fix event scheduler to solve complex scheduling problems
  2011-04-17  8:15     ` Robert Richter
@ 2011-04-17  8:18       ` Ingo Molnar
  2011-04-17  8:53         ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2011-04-17  8:18 UTC (permalink / raw)
  To: Robert Richter; +Cc: Peter Zijlstra, Stephane Eranian, LKML


* Robert Richter <robert.richter@amd.com> wrote:

> > I'd really prefer not to do this for .39, and I'll have to sit down and 
> > actually read this code. It looks like we went from O(n^2) to O(n!) or 
> > somesuch, also not much of an improvement. I'll have to analyze the solver 
> > to see what it does for 'simple' constraints set to see if it will indeed 
> > be more expensive than the O(n^2) solver we had.
> 
> It wont be more expensive, if there is a solution. But if there is no one we 
> walk all possible ways now which is something like O(n!).

So with 6 counters it would be a loop of 720, with 8 counters a loop of 40320, 
with 10 counters a loop of 3628800 ... O(n!) is not fun.
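(Editorial aside, not part of the original mail.) These counts are simply n!, including Peter's 39916800 figure for n=11 further down:

```python
from math import factorial

# n counters -> up to n! counter orderings a naive backtracking solver may visit
for n in (6, 8, 10, 11):
    print(n, factorial(n))  # 720, 40320, 3628800, 39916800
```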

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/4] perf, x86: Fix event scheduler to solve complex scheduling problems
  2011-04-16 15:52   ` Stephane Eranian
@ 2011-04-17  8:44     ` Robert Richter
  2011-04-17  9:05       ` Stephane Eranian
  0 siblings, 1 reply; 38+ messages in thread
From: Robert Richter @ 2011-04-17  8:44 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: Peter Zijlstra, Ingo Molnar, LKML

On 16.04.11 11:52:54, Stephane Eranian wrote:
> On Fri, Apr 15, 2011 at 5:27 PM, Robert Richter <robert.richter@amd.com> wrote:

> >        event code      counter         failure         possible solution
> >
> > 1)      0x043           PMC[2:0]        0               1
> >        0x02E           PMC[3,0]        3               0
> >        0x003           PMC3            FAIL            3
> >
> I am not sure I understand this failure case. If I recall
> the scheduler looks at the weight of each event first:
> 
>                                             weight
> 1)      0x043           PMC[2:0]  3
>         0x02E           PMC[3,0]   2
>         0x003           PMC3        1
> 
> Then, it schedules in increasing weight order. So it will
> schedule weight 1, 2, 3. For weight 1, it will find counter3,
> for weight 2, it will take counter0, for  weight 3, it will
> take counter 1 given 0 is already used.
> 
> Or am I reading your example the wrong way?

No, I have to admit I made this one up and picked the wrong example
for the particular problem I was thinking about. The above works as
you described because of the constraints' weights.

I don't have an example with existing constraints, but consider the
following (theoretical) one:

counter: 3210		Failure:	Solution:

e1	   xx		  xo		  ox
e2	  xx 		 xo		 ox
e3	 x x 		o x		o x
e4	 x  x		x  F		x  o

What is special about the above is that two events (e1 and e2) must be
rescheduled to schedule e4. This means that swapping the counters of
only one already-scheduled event is not sufficient. A counter of a
third event must be freed; that counter is then taken by the second
event.
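(Editorial aside, not part of the original mail.) The failure mode described above can be reproduced with a small brute-force sketch. The counter masks below are one possible reading of the ambiguous diagram, so treat them as hypothetical:

```python
# Hypothetical counter sets read off the diagram above:
# e1 may use {0,1}, e2 {1,2}, e3 {1,3}, e4 {0,3}.
EVENTS = [{0, 1}, {1, 2}, {1, 3}, {0, 3}]

def greedy(events):
    """First-free assignment without rollback (within one weight class)."""
    used, assign = set(), []
    for allowed in events:
        free = sorted(allowed - used)
        if not free:
            return None          # a later event finds no free counter
        used.add(free[0])
        assign.append(free[0])
    return assign

def backtrack(events, i=0, used=frozenset()):
    """Exhaustive search: roll back earlier events when a later one fails."""
    if i == len(events):
        return []
    for c in sorted(events[i] - used):
        rest = backtrack(events, i + 1, used | {c})
        if rest is not None:
            return [c] + rest
    return None

print(greedy(EVENTS))     # None: first-free scheduling fails on e4
print(backtrack(EVENTS))  # [0, 2, 1, 3]: rollback finds a valid assignment
```

With these masks, greedy assignment dead-ends on e4 while the backtracking search succeeds, which is exactly the point of the example.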

> The fact that counter have overlapping constraints
> should not matter. In fact this is what happens with
> counters without constraints.

An event set containing constraints that satisfy the following
condition is problematic:

 (c1->weight <= c2->weight && ((c1->cmask & c2->cmask) != c1->cmask))

Basically this means the two constraints do not overlap completely.
You then can't select the correct counter without knowing which events
with higher weights will be scheduled later.
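(Editorial aside, not part of the original mail.) The subset test behind this condition, written out with the parentheses it needs, checked against the amd_f15 counter masks that appear in the patches in this thread:

```python
def is_subset(c1_mask, c2_mask):
    # c1's counters are all contained in c2's counters
    return (c1_mask & c2_mask) == c1_mask

# amd_f15 counter masks from this thread:
print(is_subset(0x07, 0x3F))  # True:  PMC20 lies within PMC50
print(is_subset(0x09, 0x07))  # False: PMC30 overlaps PMC20 without being a subset
print(is_subset(0x09, 0x38))  # False: likewise for PMC53
```

PMC30 (0x09) is the overlapping constraint: it is not a subset of any equal-or-higher-weight mask, which is what trips up the one-pass scheduler.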

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/4] perf, x86: Fix event scheduler to solve complex scheduling problems
  2011-04-17  8:18       ` Ingo Molnar
@ 2011-04-17  8:53         ` Peter Zijlstra
  2011-04-17 11:23           ` Robert Richter
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2011-04-17  8:53 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Robert Richter, Stephane Eranian, LKML

On Sun, 2011-04-17 at 10:18 +0200, Ingo Molnar wrote:
> * Robert Richter <robert.richter@amd.com> wrote:
> 
> > > I'd really prefer not to do this for .39, and I'll have to sit down and 
> > > actually read this code. It looks like we went from O(n^2) to O(n!) or 
> > > somesuch, also not much of an improvement. I'll have to analyze the solver 
> > > to see what it does for 'simple' constraints set to see if it will indeed 
> > > be more expensive than the O(n^2) solver we had.
> > 
> > It wont be more expensive, if there is a solution. But if there is no one we 
> > walk all possible ways now which is something like O(n!).
> 
> So with 6 counters it would be a loop of 720, with 8 counters a loop of 40320, 
> with 10 counters a loop of 3628800 ... O(n!) is not fun.

Right, and we'll hit this case at least once when scheduling an
over-committed system. Intel Sandy Bridge can have 8 counters per core +
3 fixed counters, giving an n=11 situation. You do _NOT_ want to have
one 39916800 cycle loop before we determine the PMU isn't schedulable,
that's simply unacceptable.

There's a fine point between maximum PMU utilization and acceptable
performance here, and an O(n!) algorithm is really not acceptable. If
you can find a polynomial algorithm that improves the AMD-F15 situation
we can talk.

As it stands I'm tempted to have AMD suffer its terrible PMU design
decisions, if you want this fixed, fix the silicon.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/4] perf, x86: Fix event scheduler to solve complex scheduling problems
  2011-04-17  8:44     ` Robert Richter
@ 2011-04-17  9:05       ` Stephane Eranian
  0 siblings, 0 replies; 38+ messages in thread
From: Stephane Eranian @ 2011-04-17  9:05 UTC (permalink / raw)
  To: Robert Richter; +Cc: Peter Zijlstra, Ingo Molnar, LKML

On Sun, Apr 17, 2011 at 1:44 AM, Robert Richter <robert.richter@amd.com> wrote:
> On 16.04.11 11:52:54, Stephane Eranian wrote:
>> On Fri, Apr 15, 2011 at 5:27 PM, Robert Richter <robert.richter@amd.com> wrote:
>
>> >        event code      counter         failure         possible solution
>> >
>> > 1)      0x043           PMC[2:0]        0               1
>> >        0x02E           PMC[3,0]        3               0
>> >        0x003           PMC3            FAIL            3
>> >
>> I am not sure I understand this failure case. If I recall
>> the scheduler looks at the weight of each event first:
>>
>>                                             weight
>> 1)      0x043           PMC[2:0]  3
>>         0x02E           PMC[3,0]   2
>>         0x003           PMC3        1
>>
>> Then, it schedules in increasing weight order. So it will
>> schedule weight 1, 2, 3. For weight 1, it will find counter3,
>> for weight 2, it will take counter0, for  weight 3, it will
>> take counter 1 given 0 is already used.
>>
>> Or am I reading your example the wrong way?
>
> No, I have to admit this one was taken out of my mind and I picked
> a wrong example for the special problem I was thinking about. The
> above works as you described because of the constraint's weights.
>
> I don't have an example with existing constraints, but consider the
> following (theoretical) one:
>
> counter: 3210           Failure:        Solution:
>
> e1         xx             xo              ox
> e2        xx             xo              ox
> e3       x x            o x             o x
> e4       x  x           x  F            x  o
>
> The special with the above is that two events (e1 and e2) must be
> rescheduled to schedule e4. This means that swapping counters of only
> one already scheduled event is not sufficient. A counter of a third
> event must be freed, this counter is then taken by the second event.

Ok, I understand this one better now. But, as you admit yourself, you
made it up. Is it even possible with Fam15h?

The thing is, I think, this is not a catastrophic problem. What will
happen is that e4 cannot be scheduled with e1, e2, e3. In case the
events are not in the same group, e4 will eventually get a chance to
be scheduled due to round-robin on the event list. In case the events
are all in one group, then you won't be able to add e4 to the group
because of the group scheduling sanity check on creation. So either
you have to shuffle your event groups or you will incur multiplexing.
It's not like e4 will be created but will never be able to count
anything.



>> The fact that counter have overlapping constraints
>> should not matter. In fact this is what happens with
>> counters without constraints.
>
> An event set containing constraints that satisfy the following
> condition is problematic:
>
>  (c1->weight <= c2->weight && ((c1->cmask & c2->cmask) != c1->cmask))
>
> Basically this means the two constraints do not overlap completely.
> You then can't select the correct counter without knowing which events
> with higher weights will be scheduled later.
>
> -Robert
>
> --
> Advanced Micro Devices, Inc.
> Operating System Research Center
>
>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/4] perf, x86: Fix event scheduler to solve complex scheduling problems
  2011-04-17  8:53         ` Peter Zijlstra
@ 2011-04-17 11:23           ` Robert Richter
  2011-04-18  8:17             ` Robert Richter
  0 siblings, 1 reply; 38+ messages in thread
From: Robert Richter @ 2011-04-17 11:23 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Stephane Eranian, LKML

[-- Attachment #1: Type: text/plain, Size: 1906 bytes --]

On 17.04.11 04:53:32, Peter Zijlstra wrote:
> On Sun, 2011-04-17 at 10:18 +0200, Ingo Molnar wrote:
> > So with 6 counters it would be a loop of 720, with 8 counters a loop of 40320, 
> > with 10 counters a loop of 3628800 ... O(n!) is not fun.
> 
> Right, and we'll hit this case at least once when scheduling a
> over-committed system. Intel Sandy Bridge can have 8 counters per core +
> 3 fixed counters, giving an n=11 situation. You do _NOT_ want to have
> one 39916800 cycle loop before we determine the PMU isn't schedulable,
> that's simply unacceptable.

Of course it is not that bad, since the algorithm is already optimized
and we only walk through possible paths. Also, the more constraints we
have, the less we have to walk. So let's assume a worst case of 8
unconstrained counters: I reimplemented the algorithm in the attached
perl script and counted 251 loops. These are the numbers I got,
depending on the number of counters:

 $ perl counter-scheduling.pl | grep Num
 Number of counters:  2, loops:  10, redos:   4, ratio: 2.5
 Number of counters:  3, loops:  26, redos:   7, ratio: 3.7
 Number of counters:  4, loops:  53, redos:  11, ratio: 4.8
 Number of counters:  5, loops:  89, redos:  15, ratio: 5.9
 Number of counters:  6, loops: 134, redos:  19, ratio: 7.1
 Number of counters:  7, loops: 188, redos:  23, ratio: 8.2
 Number of counters:  8, loops: 251, redos:  27, ratio: 9.3
 Number of counters:  9, loops: 323, redos:  31, ratio: 10.4
 Number of counters: 10, loops: 404, redos:  35, ratio: 11.5
 Number of counters: 11, loops: 494, redos:  39, ratio: 12.7
 Number of counters: 12, loops: 593, redos:  43, ratio: 13.8

It seems the algorithm is about number-of-counters times slower than
the current one. I think this is worth some further consideration.
There is also some room for improvement in my algorithm.

-Robert


-- 
Advanced Micro Devices, Inc.
Operating System Research Center

[-- Attachment #2: counter-scheduling.pl --]
[-- Type: text/x-perl, Size: 984 bytes --]

#! /usr/bin/perl

#$num_ctrs = 11;

for ($num_ctrs = 2; $num_ctrs <= 12; $num_ctrs++) {

$num_events = $num_ctrs + 1;

@sched_log = ();
$scheduled = 0;
$used_mask = 0;

$loops = 0;
$redos = 0;
$scheduled = 0;

while ($scheduled < $num_events) {
    for ($idx = $sched_log[$scheduled] || 0; $idx < $num_ctrs; $idx++)
    {
	$loops++;
	last if !((1 << $idx) & $used_mask);
    }

    if ($idx == $num_ctrs) {
	printf "Failed to schedule event #%d\n", $scheduled;
	last if (!$scheduled);
	$sched_log[$scheduled] = 0;
	$scheduled--;
	$idx = $sched_log[$scheduled];
	$sched_log[$scheduled]++;
	$used_mask &= ~(1 << $idx);
	printf "Rollback event #%d on counter #%d\n", $scheduled, $idx;
	$redos++;
	redo;
    }

    $used_mask |= (1 << $idx);
    push @sched_log, $idx;
    printf "Scheduling event #%d on counter #%d\n", $scheduled, $idx;
    $scheduled++;
}

printf("Number of counters: %2d, loops: %3d, redos: %3d, ratio: %.1f\n",
       $num_ctrs, $loops, $redos, $loops / $redos);

}

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/4] perf, x86: Fix event scheduler to solve complex scheduling problems
  2011-04-17 11:23           ` Robert Richter
@ 2011-04-18  8:17             ` Robert Richter
  0 siblings, 0 replies; 38+ messages in thread
From: Robert Richter @ 2011-04-18  8:17 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Stephane Eranian, LKML

[-- Attachment #1: Type: text/plain, Size: 1929 bytes --]

On 17.04.11 13:23:25, Robert Richter wrote:

>  $ perl counter-scheduling.pl | grep Num
>  Number of counters:  2, loops:  10, redos:   4, ratio: 2.5
>  Number of counters:  3, loops:  26, redos:   7, ratio: 3.7
>  Number of counters:  4, loops:  53, redos:  11, ratio: 4.8
>  Number of counters:  5, loops:  89, redos:  15, ratio: 5.9
>  Number of counters:  6, loops: 134, redos:  19, ratio: 7.1
>  Number of counters:  7, loops: 188, redos:  23, ratio: 8.2
>  Number of counters:  8, loops: 251, redos:  27, ratio: 9.3
>  Number of counters:  9, loops: 323, redos:  31, ratio: 10.4
>  Number of counters: 10, loops: 404, redos:  35, ratio: 11.5
>  Number of counters: 11, loops: 494, redos:  39, ratio: 12.7
>  Number of counters: 12, loops: 593, redos:  43, ratio: 13.8

Wrong, wrong, wrong!

Before you waste your time with this: the script has a bug in how it
stores the scheduling state:

@@ -35,7 +35,7 @@ while ($scheduled < $num_events) {
     }
 
     $used_mask |= (1 << $idx);
-    push @sched_log, $idx;
+    $sched_log[$scheduled] = $idx;
     printf "Scheduling event #%d on counter #%d\n", $scheduled, $idx;
     $scheduled++;
 }

It's not possible to do the calculation for 11 counters in reasonable
time:

 $ perl counter-scheduling.pl | grep Num
 Number of counters:  2, loops:  10, redos:   4, ratio: 2.5
 Number of counters:  3, loops:  48, redos:  15, ratio: 3.2
 Number of counters:  4, loops: 260, redos:  64, ratio: 4.1
 Number of counters:  5, loops: 1630, redos: 325, ratio: 5.0
 Number of counters:  6, loops: 11742, redos: 1956, ratio: 6.0
 Number of counters:  7, loops: 95900, redos: 13699, ratio: 7.0
 Number of counters:  8, loops: 876808, redos: 109600, ratio: 8.0
 Number of counters:  9, loops: 8877690, redos: 986409, ratio: 9.0
 Number of counters: 10, loops: 98641010, redos: 9864100, ratio: 10.0
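
(Editorial aside, not part of the original mail.) The corrected redo counts match the node count of a full permutation search tree, redos(n) = (sum over k=0..n of n!/k!) - 1, which is roughly e*n!. A quick check against the table above, extrapolated to the 11-counter case discussed earlier:

```python
from math import factorial

def redos(n):
    # nodes of the exhaustive search tree over n counters, minus the root
    return sum(factorial(n) // factorial(k) for k in range(n + 1)) - 1

measured = {2: 4, 3: 15, 4: 64, 5: 325, 6: 1956, 7: 13699,
            8: 109600, 9: 986409, 10: 9864100}
assert all(redos(n) == m for n, m in measured.items())

# loops is roughly n * redos per the measured ratio column, so n=11 gives
# on the order of 10^9 loops:
print(redos(11), 11 * redos(11))
```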

Updated script attached.

Sorry,

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

[-- Attachment #2: counter-scheduling.pl --]
[-- Type: text/x-perl, Size: 992 bytes --]

#! /usr/bin/perl

#$num_ctrs = 11;

for ($num_ctrs = 2; $num_ctrs <= 12; $num_ctrs++) {

$num_events = $num_ctrs + 1;

@sched_log = ();
$scheduled = 0;
$used_mask = 0;

$loops = 0;
$redos = 0;
$scheduled = 0;

while ($scheduled < $num_events) {
    for ($idx = $sched_log[$scheduled] || 0; $idx < $num_ctrs; $idx++)
    {
	$loops++;
	last if !((1 << $idx) & $used_mask);
    }

    if ($idx == $num_ctrs) {
	printf "Failed to schedule event #%d\n", $scheduled;
	last if (!$scheduled);
	$sched_log[$scheduled] = 0;
	$scheduled--;
	$idx = $sched_log[$scheduled];
	$sched_log[$scheduled]++;
	$used_mask &= ~(1 << $idx);
	printf "Rollback event #%d on counter #%d\n", $scheduled, $idx;
	$redos++;
	redo;
    }

    $used_mask |= (1 << $idx);
    $sched_log[$scheduled] = $idx;
    printf "Scheduling event #%d on counter #%d\n", $scheduled, $idx;
    $scheduled++;
}

printf("Number of counters: %2d, loops: %3d, redos: %3d, ratio: %.1f\n",
       $num_ctrs, $loops, $redos, $loops / $redos);

}

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/4] perf, x86: Use ALTERNATIVE() to check for X86_FEATURE_PERFCTR_CORE
  2011-04-16  0:27 ` [PATCH 3/4] perf, x86: Use ALTERNATIVE() to check for X86_FEATURE_PERFCTR_CORE Robert Richter
@ 2011-04-18 20:00   ` Andi Kleen
  2011-04-19 10:39     ` Robert Richter
  2011-04-19 12:04   ` [tip:perf/core] " tip-bot for Robert Richter
  1 sibling, 1 reply; 38+ messages in thread
From: Andi Kleen @ 2011-04-18 20:00 UTC (permalink / raw)
  To: Robert Richter; +Cc: Peter Zijlstra, Ingo Molnar, Stephane Eranian, LKML

Robert Richter <robert.richter@amd.com> writes:

> Using ALTERNATIVE() when checking for X86_FEATURE_PERFCTR_CORE avoids
> an extra pointer chase and data cache hit.

Is that really a performance critical path?

Seems more like unnecessary obfuscation to me.

-Andi


-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2] perf, x86: Fix event scheduler for constraints with overlapping counters
  2011-04-16  0:27 ` [PATCH 4/4] perf, x86: Fix event scheduler to solve complex scheduling problems Robert Richter
  2011-04-16  8:51   ` Peter Zijlstra
  2011-04-16 15:52   ` Stephane Eranian
@ 2011-04-19 10:26   ` Robert Richter
  2011-04-19 11:29     ` Ingo Molnar
  2011-05-18 21:16     ` Peter Zijlstra
  2 siblings, 2 replies; 38+ messages in thread
From: Robert Richter @ 2011-04-19 10:26 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Stephane Eranian, LKML

This version 2 implements an algorithm which does not increase the
loop count for systems with no overlapping-counter constraints. With
overlapping-counter constraints the loop count is reduced to the
minimum and also limited by a stack depth of 2 states.

See below.

-Robert


>From cd70a579995fa7da885ec25db8ebe04ba4a0c30e Mon Sep 17 00:00:00 2001
From: Robert Richter <robert.richter@amd.com>
Date: Fri, 15 Apr 2011 11:04:06 +0200
Subject: [PATCH] perf, x86: Fix event scheduler for constraints with overlapping counters

The current x86 event scheduler fails to resolve scheduling problems
of certain combinations of events and constraints. This happens if the
counter mask of such an event is not a subset of any other counter
mask of a constraint with an equal or higher weight, e.g. constraints
of the AMD family 15h pmu:

                        counter mask    weight

 amd_f15_PMC30          0x09            2  <--- overlapping counters
 amd_f15_PMC20          0x07            3
 amd_f15_PMC53          0x38            3

The scheduler then fails to find an existing solution. Here is an
example:

 event code     counter         failure         possible solution

 0x02E          PMC[3,0]        0               3
 0x043          PMC[2:0]        1               0
 0x045          PMC[2:0]        2               1
 0x046          PMC[2:0]        FAIL            2

The event scheduler may not select the correct counter in the first
cycle because it would need to know which subsequent events will be
scheduled. It may then fail to schedule the events.

To solve this, we now save the scheduler state of events with
overlapping counter constraints. If we fail to schedule the events,
we roll back to those states and try to use another free counter.

Constraints with overlapping counters are marked with a newly
introduced redo flag. We set the redo flag for such constraints to
give the scheduler a hint about which events to select for counter
rescheduling. The EVENT_CONSTRAINT_REDO() macro can be used for this.

Care must be taken, as the rescheduling algorithm is O(n!), which will
increase scheduling cycles for an over-committed system dramatically.
The number of such EVENT_CONSTRAINT_REDO() macros and their counter
masks must be kept at a minimum. Thus, the current stack is limited to
2 states to limit the number of loops the algorithm takes in the worst
case.

On systems with no overlapping-counter constraints, this
implementation does not increase the loop count compared to the
previous algorithm.

Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event.c     |   97 ++++++++++++++++++++++++++++++----
 arch/x86/kernel/cpu/perf_event_amd.c |    2 +-
 2 files changed, 87 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 224a84f..2d4cfae 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -86,6 +86,7 @@ struct event_constraint {
 	u64	code;
 	u64	cmask;
 	int	weight;
+	int	redo;
 };
 
 struct amd_nb {
@@ -144,15 +145,40 @@ struct cpu_hw_events {
 	struct amd_nb		*amd_nb;
 };
 
-#define __EVENT_CONSTRAINT(c, n, m, w) {\
+#define __EVENT_CONSTRAINT(c, n, m, w, r) {\
 	{ .idxmsk64 = (n) },		\
 	.code = (c),			\
 	.cmask = (m),			\
 	.weight = (w),			\
+	.redo = (r),			\
 }
 
 #define EVENT_CONSTRAINT(c, n, m)	\
-	__EVENT_CONSTRAINT(c, n, m, HWEIGHT(n))
+	__EVENT_CONSTRAINT(c, n, m, HWEIGHT(n), 0)
+
+/*
+ * The redo flag marks event constraints with overlapping counter
+ * masks. This is the case if the counter mask of such an event is not
+ * a subset of any other counter mask of a constraint with an equal or
+ * higher weight, e.g.:
+ *
+ *  c_overlaps = EVENT_CONSTRAINT_REDO(0, 0x09, 0);
+ *  c_another1 = EVENT_CONSTRAINT(0, 0x07, 0);
+ *  c_another2 = EVENT_CONSTRAINT(0, 0x38, 0);
+ *
+ * The event scheduler may not select the correct counter in the first
+ * cycle because it would need to know which subsequent events will be
+ * scheduled. It may then fail to schedule the events. So we set the
+ * redo flag for such constraints to give the scheduler a hint which
+ * events to select for counter rescheduling.
+ *
+ * Care must be taken as the rescheduling algorithm is O(n!) which
+ * will increase scheduling cycles for an over-committed system
+ * dramatically.  The number of such EVENT_CONSTRAINT_REDO() macros
+ * and their counter masks must be kept at a minimum.
+ */
+#define EVENT_CONSTRAINT_REDO(c, n, m)	\
+	__EVENT_CONSTRAINT(c, n, m, HWEIGHT(n), 1)
 
 /*
  * Constraint on the Event code.
@@ -770,11 +796,24 @@ static inline int is_x86_event(struct perf_event *event)
 	return event->pmu == &pmu;
 }
 
+struct sched_state
+{
+	int	i;
+	int	w;
+	int	idx;
+	int	num;
+	unsigned long used[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
+};
+
+/* Total max is X86_PMC_IDX_MAX, but we are O(n!) limited */
+#define	SCHED_STATES_MAX	2
+
 static int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 {
 	struct event_constraint *c, *constraints[X86_PMC_IDX_MAX];
 	unsigned long used_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
-	int i, j, w, wmax, num = 0;
+	struct sched_state sched_state[SCHED_STATES_MAX];
+	int i, idx, w, wmax, num = 0, state = 0;
 	struct hw_perf_event *hwc;
 
 	bitmap_zero(used_mask, X86_PMC_IDX_MAX);
@@ -838,25 +877,61 @@ static int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 	for (w = 1, num = n; num && w <= wmax; w++) {
 		/* for each event */
 		for (i = 0; num && i < n; i++) {
+			idx = 0;
+		redo:
 			c = constraints[i];
 			hwc = &cpuc->event_list[i]->hw;
 
 			if (c->weight != w)
 				continue;
 
-			for_each_set_bit(j, c->idxmsk, X86_PMC_IDX_MAX) {
-				if (!test_bit(j, used_mask))
+			/* for each bit in idxmsk starting from idx */
+			while (idx < X86_PMC_IDX_MAX) {
+				idx = find_next_bit(c->idxmsk, X86_PMC_IDX_MAX,
+						    idx);
+				if (idx == X86_PMC_IDX_MAX)
 					break;
+				if (!__test_and_set_bit(idx, used_mask))
+					break;
+				idx++;
 			}
 
-			if (j == X86_PMC_IDX_MAX)
-				break;
-
-			__set_bit(j, used_mask);
+			if (idx >= X86_PMC_IDX_MAX) {
+				/*
+				 * roll back and try to reschedule
+				 * events on other counters
+				 */
+				if (!state)
+					/* no events to reschedule */
+					goto done;
+				state--;
+				/* restore previous state: */
+				i = sched_state[state].i;
+				w = sched_state[state].w;
+				idx = sched_state[state].idx;
+				num = sched_state[state].num;
+				bitmap_copy(used_mask, sched_state[state].used, n);
+				/* try next counter: */
+				clear_bit(idx, used_mask);
+				num++;
+				idx++;
+				goto redo;
+			}
 
 			if (assign)
-				assign[i] = j;
+				assign[i] = idx;
+
 			num--;
+
+			if (c->redo && state < SCHED_STATES_MAX) {
+				/* store scheduler state: */
+				sched_state[state].i = i;
+				sched_state[state].w = w;
+				sched_state[state].idx = idx;
+				sched_state[state].num = num;
+				bitmap_copy(sched_state[state].used, used_mask, n);
+				state++;
+			}
 		}
 	}
 done:
@@ -1535,7 +1610,7 @@ static int __init init_hw_perf_events(void)
 
 	unconstrained = (struct event_constraint)
 		__EVENT_CONSTRAINT(0, (1ULL << x86_pmu.num_counters) - 1,
-				   0, x86_pmu.num_counters);
+				   0, x86_pmu.num_counters, 0);
 
 	if (x86_pmu.event_constraints) {
 		for_each_event_constraint(c, x86_pmu.event_constraints) {
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 758cccc..21821eb 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -457,7 +457,7 @@ static __initconst const struct x86_pmu amd_pmu = {
 static struct event_constraint amd_f15_PMC0  = EVENT_CONSTRAINT(0, 0x01, 0);
 static struct event_constraint amd_f15_PMC20 = EVENT_CONSTRAINT(0, 0x07, 0);
 static struct event_constraint amd_f15_PMC3  = EVENT_CONSTRAINT(0, 0x08, 0);
-static struct event_constraint amd_f15_PMC30 = EVENT_CONSTRAINT(0, 0x09, 0);
+static struct event_constraint amd_f15_PMC30 = EVENT_CONSTRAINT_REDO(0, 0x09, 0);
 static struct event_constraint amd_f15_PMC50 = EVENT_CONSTRAINT(0, 0x3F, 0);
 static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
 
-- 
1.7.3.4


-- 
Advanced Micro Devices, Inc.
Operating System Research Center


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/4] perf, x86: Use ALTERNATIVE() to check for X86_FEATURE_PERFCTR_CORE
  2011-04-18 20:00   ` Andi Kleen
@ 2011-04-19 10:39     ` Robert Richter
  2011-04-19 18:21       ` Andi Kleen
  0 siblings, 1 reply; 38+ messages in thread
From: Robert Richter @ 2011-04-19 10:39 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Peter Zijlstra, Ingo Molnar, Stephane Eranian, LKML

On 18.04.11 16:00:57, Andi Kleen wrote:
> Robert Richter <robert.richter@amd.com> writes:
> 
> > Using ALTERNATIVE() when checking for X86_FEATURE_PERFCTR_CORE avoids
> > an extra pointer chase and data cache hit.
> 
> Is that really a performance critical path?
> 
> Seems more like unnecessary obfuscation to me.

The hottest path is in perf_pmu_disable(), which is hit at least on
every task switch, when the event scheduler is called.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2] perf, x86: Fix event scheduler for constraints with overlapping counters
  2011-04-19 10:26   ` [PATCH v2] perf, x86: Fix event scheduler for constraints with overlapping counters Robert Richter
@ 2011-04-19 11:29     ` Ingo Molnar
  2011-04-19 13:55       ` Robert Richter
  2011-05-18 21:16     ` Peter Zijlstra
  1 sibling, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2011-04-19 11:29 UTC (permalink / raw)
  To: Robert Richter; +Cc: Peter Zijlstra, Stephane Eranian, LKML


* Robert Richter <robert.richter@amd.com> wrote:

> +				if (!state)
> +					/* no events to reschedule */
> +					goto done;

Hm, that's 5 levels of indentation. Would it be possible to factor out the gist 
of the logic into a helper and such? My eyes would be much happier!

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [tip:perf/urgent] perf, x86: Fix pre-defined cache-misses event for AMD family 15h cpus
  2011-04-16  0:27 ` [PATCH 1/4] perf, x86: Fix pre-defined cache-misses event for AMD family 15h cpus Robert Richter
@ 2011-04-19 12:03   ` tip-bot for Andre Przywara
  0 siblings, 0 replies; 38+ messages in thread
From: tip-bot for Andre Przywara @ 2011-04-19 12:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, robert.richter, a.p.zijlstra,
	andre.przywara, tglx, mingo

Commit-ID:  83112e688f5f05dea1e63787db9a6c16b2887a1d
Gitweb:     http://git.kernel.org/tip/83112e688f5f05dea1e63787db9a6c16b2887a1d
Author:     Andre Przywara <andre.przywara@amd.com>
AuthorDate: Sat, 16 Apr 2011 02:27:53 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Tue, 19 Apr 2011 10:07:54 +0200

perf, x86: Fix pre-defined cache-misses event for AMD family 15h cpus

With AMD cpu family 15h a unit mask was introduced for the Data Cache
Miss event (0x041/L1-dcache-load-misses). We need to enable bit 0
(first data cache miss or streaming store to a 64 B cache line) of
this mask to properly count data cache misses.

Now we set this bit for all families and models. In case a PMU does
not implement a unit mask for event 0x041 the bit is ignored.
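(Editorial aside, not part of the original mail.) The 0x041 -> 0x141 change is easy to read off the x86 event-select layout, where bits 7:0 hold the event code and bits 15:8 the unit mask:

```python
event = 0x0141                 # new pre-defined L1-dcache-load-misses encoding
evsel = event & 0xFF           # bits 7:0:  event select -> 0x41 (Data Cache Misses)
umask = (event >> 8) & 0xFF    # bits 15:8: unit mask    -> 0x01 (bit 0 set)
print(hex(evsel), hex(umask))  # 0x41 0x1
```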

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1302913676-14352-2-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/cpu/perf_event_amd.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 461f62b..4e16138 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -8,7 +8,7 @@ static __initconst const u64 amd_hw_cache_event_ids
  [ C(L1D) ] = {
 	[ C(OP_READ) ] = {
 		[ C(RESULT_ACCESS) ] = 0x0040, /* Data Cache Accesses        */
-		[ C(RESULT_MISS)   ] = 0x0041, /* Data Cache Misses          */
+		[ C(RESULT_MISS)   ] = 0x0141, /* Data Cache Misses          */
 	},
 	[ C(OP_WRITE) ] = {
 		[ C(RESULT_ACCESS) ] = 0x0142, /* Data Cache Refills :system */

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [tip:perf/urgent] perf, x86: Fix AMD family 15h FPU event constraints
  2011-04-16  0:27 ` [PATCH 2/4] perf, x86: Fix AMD family 15h FPU event constraints Robert Richter
@ 2011-04-19 12:04   ` tip-bot for Robert Richter
  0 siblings, 0 replies; 38+ messages in thread
From: tip-bot for Robert Richter @ 2011-04-19 12:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, eranian, robert.richter, a.p.zijlstra,
	tglx, mingo

Commit-ID:  855357a21744e488cbee23a47d2b124035160a87
Gitweb:     http://git.kernel.org/tip/855357a21744e488cbee23a47d2b124035160a87
Author:     Robert Richter <robert.richter@amd.com>
AuthorDate: Sat, 16 Apr 2011 02:27:54 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Tue, 19 Apr 2011 10:07:55 +0200

perf, x86: Fix AMD family 15h FPU event constraints

Depending on the unit mask settings some FPU events may be scheduled
only on cpu counter #3. This patch fixes this.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@googlemail.com>
Link: http://lkml.kernel.org/r/1302913676-14352-3-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/cpu/perf_event_amd.c |   20 +++++++++++++++++---
 1 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 4e16138..cf4e369 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -427,7 +427,9 @@ static __initconst const struct x86_pmu amd_pmu = {
  *
  * Exceptions:
  *
+ * 0x000	FP	PERF_CTL[3], PERF_CTL[5:3] (*)
  * 0x003	FP	PERF_CTL[3]
+ * 0x004	FP	PERF_CTL[3], PERF_CTL[5:3] (*)
  * 0x00B	FP	PERF_CTL[3]
  * 0x00D	FP	PERF_CTL[3]
  * 0x023	DE	PERF_CTL[2:0]
@@ -448,6 +450,8 @@ static __initconst const struct x86_pmu amd_pmu = {
  * 0x0DF	LS	PERF_CTL[5:0]
  * 0x1D6	EX	PERF_CTL[5:0]
  * 0x1D8	EX	PERF_CTL[5:0]
+ *
+ * (*) depending on the umask all FPU counters may be used
  */
 
 static struct event_constraint amd_f15_PMC0  = EVENT_CONSTRAINT(0, 0x01, 0);
@@ -460,18 +464,28 @@ static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
 static struct event_constraint *
 amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *event)
 {
-	unsigned int event_code = amd_get_event_code(&event->hw);
+	struct hw_perf_event *hwc = &event->hw;
+	unsigned int event_code = amd_get_event_code(hwc);
 
 	switch (event_code & AMD_EVENT_TYPE_MASK) {
 	case AMD_EVENT_FP:
 		switch (event_code) {
+		case 0x000:
+			if (!(hwc->config & 0x0000F000ULL))
+				break;
+			if (!(hwc->config & 0x00000F00ULL))
+				break;
+			return &amd_f15_PMC3;
+		case 0x004:
+			if (hweight_long(hwc->config & ARCH_PERFMON_EVENTSEL_UMASK) <= 1)
+				break;
+			return &amd_f15_PMC3;
 		case 0x003:
 		case 0x00B:
 		case 0x00D:
 			return &amd_f15_PMC3;
-		default:
-			return &amd_f15_PMC53;
 		}
+		return &amd_f15_PMC53;
 	case AMD_EVENT_LS:
 	case AMD_EVENT_DC:
 	case AMD_EVENT_EX_LS:

* [tip:perf/core] perf, x86: Use ALTERNATIVE() to check for X86_FEATURE_PERFCTR_CORE
  2011-04-16  0:27 ` [PATCH 3/4] perf, x86: Use ALTERNATIVE() to check for X86_FEATURE_PERFCTR_CORE Robert Richter
  2011-04-18 20:00   ` Andi Kleen
@ 2011-04-19 12:04   ` tip-bot for Robert Richter
  1 sibling, 0 replies; 38+ messages in thread
From: tip-bot for Robert Richter @ 2011-04-19 12:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, robert.richter, a.p.zijlstra, tglx, mingo

Commit-ID:  c8e5910edf8bbe2e5c6c35a4ef2a578cc7893b25
Gitweb:     http://git.kernel.org/tip/c8e5910edf8bbe2e5c6c35a4ef2a578cc7893b25
Author:     Robert Richter <robert.richter@amd.com>
AuthorDate: Sat, 16 Apr 2011 02:27:55 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Tue, 19 Apr 2011 10:08:12 +0200

perf, x86: Use ALTERNATIVE() to check for X86_FEATURE_PERFCTR_CORE

Using ALTERNATIVE() when checking for X86_FEATURE_PERFCTR_CORE avoids
an extra pointer chase and data cache hit.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1302913676-14352-4-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/cpu/perf_event.c |   15 +++++++++++----
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index eed3673a..224a84f 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -31,6 +31,7 @@
 #include <asm/nmi.h>
 #include <asm/compat.h>
 #include <asm/smp.h>
+#include <asm/alternative.h>
 
 #if 0
 #undef wrmsrl
@@ -363,12 +364,18 @@ again:
 	return new_raw_count;
 }
 
-/* using X86_FEATURE_PERFCTR_CORE to later implement ALTERNATIVE() here */
 static inline int x86_pmu_addr_offset(int index)
 {
-	if (boot_cpu_has(X86_FEATURE_PERFCTR_CORE))
-		return index << 1;
-	return index;
+	int offset;
+
+	/* offset = X86_FEATURE_PERFCTR_CORE ? index << 1 : index */
+	alternative_io(ASM_NOP2,
+		       "shll $1, %%eax",
+		       X86_FEATURE_PERFCTR_CORE,
+		       "=a" (offset),
+		       "a"  (index));
+
+	return offset;
 }
 
 static inline unsigned int x86_pmu_config_addr(int index)

* Re: [PATCH v2] perf, x86: Fix event scheduler for constraints with overlapping counters
  2011-04-19 11:29     ` Ingo Molnar
@ 2011-04-19 13:55       ` Robert Richter
  2011-04-28  9:50         ` Robert Richter
  0 siblings, 1 reply; 38+ messages in thread
From: Robert Richter @ 2011-04-19 13:55 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, Stephane Eranian, LKML

On 19.04.11 07:29:33, Ingo Molnar wrote:
> 
> * Robert Richter <robert.richter@amd.com> wrote:
> 
> > +				if (!state)
> > +					/* no events to reschedule */
> > +					goto done;
> 
> Hm, that's 5 levels of indentation. Would it be possible to factor out the gist 
> of the logic into a helper and such? My eyes would be much happier!

Hmm, I was thinking about this too, but it is not easy since we also
control the loops with gotos and updates of the loop state.
Refactoring out the inner block would mean updating state variables
via pointers and introducing return codes that carry control-flow
information. I don't think this would become easier.

We could probably merge both for-loops into one while-loop, but only
at the cost of code readability.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


* Re: [PATCH 3/4] perf, x86: Use ALTERNATIVE() to check for X86_FEATURE_PERFCTR_CORE
  2011-04-19 10:39     ` Robert Richter
@ 2011-04-19 18:21       ` Andi Kleen
  0 siblings, 0 replies; 38+ messages in thread
From: Andi Kleen @ 2011-04-19 18:21 UTC (permalink / raw)
  To: Robert Richter
  Cc: Andi Kleen, Peter Zijlstra, Ingo Molnar, Stephane Eranian, LKML

On Tue, Apr 19, 2011 at 12:39:27PM +0200, Robert Richter wrote:
> On 18.04.11 16:00:57, Andi Kleen wrote:
> > Robert Richter <robert.richter@amd.com> writes:
> > 
> > > Using ALTERNATIVE() when checking for X86_FEATURE_PERFCTR_CORE avoids
> > > an extra pointer chase and data cache hit.
> > 
> > Is that really a performance critical path?
> > 
> > Seems more like unnecessary obfuscation to me.
> 
The hottest path is in perf_pmu_disable(), which runs at least once
per task switch when the event scheduler is called.

Yes, but that's already a slow path, isn't it? It had better be, because
the MSR accesses alone are incredibly expensive. I guess your test
and jump aren't even on the radar after that ...

-Andi

* Re: [PATCH v2] perf, x86: Fix event scheduler for constraints with overlapping counters
  2011-04-19 13:55       ` Robert Richter
@ 2011-04-28  9:50         ` Robert Richter
  0 siblings, 0 replies; 38+ messages in thread
From: Robert Richter @ 2011-04-28  9:50 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, Stephane Eranian, LKML

On 19.04.11 15:55:18, Robert Richter wrote:
> On 19.04.11 07:29:33, Ingo Molnar wrote:
> > 
> > * Robert Richter <robert.richter@amd.com> wrote:
> > 
> > > +				if (!state)
> > > +					/* no events to reschedule */
> > > +					goto done;
> > 
> > Hm, that's 5 levels of indentation. Would it be possible to factor out the gist 
> > of the logic into a helper and such? My eyes would be much happier!
> 
> Hmm, I was thinking about this too, but it is not easy since we also
> control the loops with gotos and updates of the loop state.
> Refactoring out the inner block would mean updating state variables
> via pointers and introducing return codes that carry control-flow
> information. I don't think this would become easier.
> 
> We could probably merge both for-loops into one while-loop, but only
> at the cost of code readability.

Ingo, do you still want me to rework it?

Any other comments on this patch?

Thanks,

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


* Re: [PATCH v2] perf, x86: Fix event scheduler for constraints with overlapping counters
  2011-04-19 10:26   ` [PATCH v2] perf, x86: Fix event scheduler for constraints with overlapping counters Robert Richter
  2011-04-19 11:29     ` Ingo Molnar
@ 2011-05-18 21:16     ` Peter Zijlstra
  2011-05-18 21:20       ` Ingo Molnar
  1 sibling, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2011-05-18 21:16 UTC (permalink / raw)
  To: Robert Richter; +Cc: Ingo Molnar, Stephane Eranian, LKML

So I'm mostly ok with this, just a few nits..

On Tue, 2011-04-19 at 12:26 +0200, Robert Richter wrote:
> @@ -838,25 +877,61 @@ static int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
>         for (w = 1, num = n; num && w <= wmax; w++) {
>                 /* for each event */
>                 for (i = 0; num && i < n; i++) {
> +                       idx = 0;
> +               redo:

labels go on column 0.

>                         c = constraints[i];
>                         hwc = &cpuc->event_list[i]->hw;
>  
>                         if (c->weight != w)
>                                 continue;
>  
> -                       for_each_set_bit(j, c->idxmsk, X86_PMC_IDX_MAX) {
> -                               if (!test_bit(j, used_mask))
> +                       /* for each bit in idxmsk starting from idx */
> +                       while (idx < X86_PMC_IDX_MAX) {
> +                               idx = find_next_bit(c->idxmsk, X86_PMC_IDX_MAX,
> +                                                   idx);

I'd be mighty tempted to ignore that 80 column rule here ;-)

> +                               if (idx == X86_PMC_IDX_MAX)
>                                         break;
> +                               if (!__test_and_set_bit(idx, used_mask))
> +                                       break;
> +                               idx++;
>                         }
>  
> -                       if (j == X86_PMC_IDX_MAX)
> -                               break;
> -
> -                       __set_bit(j, used_mask);
> +                       if (idx >= X86_PMC_IDX_MAX) {
> +                               /*
> +                                * roll back and try to reschedule
> +                                * events on other counters
> +                                */
> +                               if (!state)
> +                                       /* no events to reschedule */
> +                                       goto done;

Multi line indents get { } even if not strictly needed, alternatively we
can place the comment on the same line as the goto.

> +                               state--;
> +                               /* restore previous state: */
> +                               i = sched_state[state].i;
> +                               w = sched_state[state].w;
> +                               idx = sched_state[state].idx;
> +                               num = sched_state[state].num;
> +                               bitmap_copy(used_mask, sched_state[state].used, n);
> +                               /* try next counter: */
> +                               clear_bit(idx, used_mask);
> +                               num++;

Suppose we put the num--; bit below the if (c->redo) block, then we can
remove this num++;, right?

> +                               idx++;
> +                               goto redo;
> +                       }
>  
>                         if (assign)
> -                               assign[i] = j;
> +                               assign[i] = idx;
> +
>                         num--;
> +
> +                       if (c->redo && state < SCHED_STATES_MAX) {
> +                               /* store scheduler state: */
> +                               sched_state[state].i = i;
> +                               sched_state[state].w = w;
> +                               sched_state[state].idx = idx;
> +                               sched_state[state].num = num;
> +                               bitmap_copy(sched_state[state].used, used_mask, n);
> +                               state++;
> +                       }
>                 }
>         } 

* Re: [PATCH v2] perf, x86: Fix event scheduler for constraints with overlapping counters
  2011-05-18 21:16     ` Peter Zijlstra
@ 2011-05-18 21:20       ` Ingo Molnar
  2011-05-18 21:36         ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2011-05-18 21:20 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Robert Richter, Stephane Eranian, LKML


* Peter Zijlstra <peterz@infradead.org> wrote:

> >                         if (c->weight != w)
> >                                 continue;
> >  
> > -                       for_each_set_bit(j, c->idxmsk, X86_PMC_IDX_MAX) {
> > -                               if (!test_bit(j, used_mask))
> > +                       /* for each bit in idxmsk starting from idx */
> > +                       while (idx < X86_PMC_IDX_MAX) {
> > +                               idx = find_next_bit(c->idxmsk, X86_PMC_IDX_MAX,
> > +                                                   idx);
> 
> I'd be mighty tempted to ignore that 80 column rule here ;-)

Please put the body of the loop into a helper function, the function is large 
and there are countless col80 uglinesses in it!

Thanks,

	Ingo

* Re: [PATCH v2] perf, x86: Fix event scheduler for constraints with overlapping counters
  2011-05-18 21:20       ` Ingo Molnar
@ 2011-05-18 21:36         ` Peter Zijlstra
  2011-05-19 10:49           ` Robert Richter
  2011-05-19 18:06           ` Ingo Molnar
  0 siblings, 2 replies; 38+ messages in thread
From: Peter Zijlstra @ 2011-05-18 21:36 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Robert Richter, Stephane Eranian, LKML

On Wed, 2011-05-18 at 23:20 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > >                         if (c->weight != w)
> > >                                 continue;
> > >  
> > > -                       for_each_set_bit(j, c->idxmsk, X86_PMC_IDX_MAX) {
> > > -                               if (!test_bit(j, used_mask))
> > > +                       /* for each bit in idxmsk starting from idx */
> > > +                       while (idx < X86_PMC_IDX_MAX) {
> > > +                               idx = find_next_bit(c->idxmsk, X86_PMC_IDX_MAX,
> > > +                                                   idx);
> > 
> > I'd be mighty tempted to ignore that 80 column rule here ;-)
> 
> Please put the body of the loop into a helper function, the function is large 
> and there are countless col80 uglinesses in it!

I just tried that; it's really ugly due to the amount of state you need to
pass around.

* Re: [PATCH v2] perf, x86: Fix event scheduler for constraints with overlapping counters
  2011-05-18 21:36         ` Peter Zijlstra
@ 2011-05-19 10:49           ` Robert Richter
  2011-05-19 18:06           ` Ingo Molnar
  1 sibling, 0 replies; 38+ messages in thread
From: Robert Richter @ 2011-05-19 10:49 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Stephane Eranian, LKML

On 18.05.11 17:36:53, Peter Zijlstra wrote:
> On Wed, 2011-05-18 at 23:20 +0200, Ingo Molnar wrote:
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > >                         if (c->weight != w)
> > > >                                 continue;
> > > >  
> > > > -                       for_each_set_bit(j, c->idxmsk, X86_PMC_IDX_MAX) {
> > > > -                               if (!test_bit(j, used_mask))
> > > > +                       /* for each bit in idxmsk starting from idx */
> > > > +                       while (idx < X86_PMC_IDX_MAX) {
> > > > +                               idx = find_next_bit(c->idxmsk, X86_PMC_IDX_MAX,
> > > > +                                                   idx);
> > > 
> > > I'd be mighty tempted to ignore that 80 column rule here ;-)
> > 
> > Please put the body of the loop into a helper function, the function is large 
> > and there are countless col80 uglinesses in it!
> 
> I just tried that; it's really ugly due to the amount of state you need to
> pass around.

I will try to improve the patch and send a new version.

Thanks for the review.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


* Re: [PATCH v2] perf, x86: Fix event scheduler for constraints with overlapping counters
  2011-05-18 21:36         ` Peter Zijlstra
  2011-05-19 10:49           ` Robert Richter
@ 2011-05-19 18:06           ` Ingo Molnar
  2011-05-20  3:18             ` Robert Richter
  1 sibling, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2011-05-19 18:06 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Robert Richter, Stephane Eranian, LKML

* Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, 2011-05-18 at 23:20 +0200, Ingo Molnar wrote:
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > >                         if (c->weight != w)
> > > >                                 continue;
> > > >  
> > > > -                       for_each_set_bit(j, c->idxmsk, X86_PMC_IDX_MAX) {
> > > > -                               if (!test_bit(j, used_mask))
> > > > +                       /* for each bit in idxmsk starting from idx */
> > > > +                       while (idx < X86_PMC_IDX_MAX) {
> > > > +                               idx = find_next_bit(c->idxmsk, X86_PMC_IDX_MAX,
> > > > +                                                   idx);
> > > 
> > > I'd be mighty tempted to ignore that 80 column rule here ;-)
> > 
> > Please put the body of the loop into a helper function, the function is large 
> > and there are countless col80 uglinesses in it!
> 
> I just tried that; it's really ugly due to the amount of state you need to
> pass around.

Does it help if you put that state into a helper structure?

Thanks,

	Ingo

* Re: [PATCH v2] perf, x86: Fix event scheduler for constraints with overlapping counters
  2011-05-19 18:06           ` Ingo Molnar
@ 2011-05-20  3:18             ` Robert Richter
  2011-09-01 12:56               ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Robert Richter @ 2011-05-20  3:18 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, Stephane Eranian, LKML

On 19.05.11 14:06:50, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Wed, 2011-05-18 at 23:20 +0200, Ingo Molnar wrote:
> > > * Peter Zijlstra <peterz@infradead.org> wrote:
> > > 
> > > > >                         if (c->weight != w)
> > > > >                                 continue;
> > > > >  
> > > > > -                       for_each_set_bit(j, c->idxmsk, X86_PMC_IDX_MAX) {
> > > > > -                               if (!test_bit(j, used_mask))
> > > > > +                       /* for each bit in idxmsk starting from idx */
> > > > > +                       while (idx < X86_PMC_IDX_MAX) {
> > > > > +                               idx = find_next_bit(c->idxmsk, X86_PMC_IDX_MAX,
> > > > > +                                                   idx);
> > > > 
> > > > I'd be mighty tempted to ignore that 80 column rule here ;-)
> > > 
> > > Please put the body of the loop into a helper function, the function is large 
> > > and there are countless col80 uglinesses in it!
> > 
> > I just tried that; it's really ugly due to the amount of state you need to
> > pass around.
> 
> Does it help if you put that state into a helper structure?

Yes, this is what I have in mind too. We could iterate on such a state
structure instead of a couple of single variables. Storing and
restoring the state would then just mean copying the structure.

-Robert


> 
> Thanks,
> 
> 	Ingo
> 

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


* Re: [PATCH v2] perf, x86: Fix event scheduler for constraints with overlapping counters
  2011-05-20  3:18             ` Robert Richter
@ 2011-09-01 12:56               ` Peter Zijlstra
  2011-09-01 14:12                 ` Robert Richter
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2011-09-01 12:56 UTC (permalink / raw)
  To: Robert Richter; +Cc: Ingo Molnar, Stephane Eranian, LKML

On Fri, 2011-05-20 at 05:18 +0200, Robert Richter wrote:
> On 19.05.11 14:06:50, Ingo Molnar wrote:
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > On Wed, 2011-05-18 at 23:20 +0200, Ingo Molnar wrote:
> > > > * Peter Zijlstra <peterz@infradead.org> wrote:
> > > > 
> > > > > >                         if (c->weight != w)
> > > > > >                                 continue;
> > > > > >  
> > > > > > -                       for_each_set_bit(j, c->idxmsk, X86_PMC_IDX_MAX) {
> > > > > > -                               if (!test_bit(j, used_mask))
> > > > > > +                       /* for each bit in idxmsk starting from idx */
> > > > > > +                       while (idx < X86_PMC_IDX_MAX) {
> > > > > > +                               idx = find_next_bit(c->idxmsk, X86_PMC_IDX_MAX,
> > > > > > +                                                   idx);
> > > > > 
> > > > > I'd be mighty tempted to ignore that 80 column rule here ;-)
> > > > 
> > > > Please put the body of the loop into a helper function, the function is large 
> > > > and there are countless col80 uglinesses in it!
> > > 
> > > I just tried that; it's really ugly due to the amount of state you need to
> > > pass around.
> > 
> > Does it help if you put that state into a helper structure?
> 
> Yes, this is what I have in mind too. We could iterate on such a state
> structure instead of a couple of single variables. Storing and
> restoring the state would then just mean copying the structure.

Any word on this work? I just noticed we actually need this for Intel
too: the fixed-purpose events have overlapping but non-identical
constraint masks.

Now we could optimize the Intel case by always iterating from the top
down, but that won't cure all cases. For example, suppose an event
(one that could run on a fixed-purpose register) previously got
scheduled on a general-purpose register and we take the fast path;
in that case we would still end up under-utilized.



* Re: [PATCH v2] perf, x86: Fix event scheduler for constraints with overlapping counters
  2011-09-01 12:56               ` Peter Zijlstra
@ 2011-09-01 14:12                 ` Robert Richter
  2011-09-01 16:37                   ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Robert Richter @ 2011-09-01 14:12 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Stephane Eranian, LKML

On 01.09.11 08:56:46, Peter Zijlstra wrote:
> On Fri, 2011-05-20 at 05:18 +0200, Robert Richter wrote:
> > On 19.05.11 14:06:50, Ingo Molnar wrote:
> > > * Peter Zijlstra <peterz@infradead.org> wrote:
> > > 
> > > > On Wed, 2011-05-18 at 23:20 +0200, Ingo Molnar wrote:
> > > > > * Peter Zijlstra <peterz@infradead.org> wrote:
> > > > > 
> > > > > > >                         if (c->weight != w)
> > > > > > >                                 continue;
> > > > > > >  
> > > > > > > -                       for_each_set_bit(j, c->idxmsk, X86_PMC_IDX_MAX) {
> > > > > > > -                               if (!test_bit(j, used_mask))
> > > > > > > +                       /* for each bit in idxmsk starting from idx */
> > > > > > > +                       while (idx < X86_PMC_IDX_MAX) {
> > > > > > > +                               idx = find_next_bit(c->idxmsk, X86_PMC_IDX_MAX,
> > > > > > > +                                                   idx);
> > > > > > 
> > > > > > I'd be mighty tempted to ignore that 80 column rule here ;-)
> > > > > 
> > > > > Please put the body of the loop into a helper function, the function is large 
> > > > > and there are countless col80 uglinesses in it!
> > > > 
> > > > I just tried that; it's really ugly due to the amount of state you need to
> > > > pass around.
> > > 
> > > Does it help if you put that state into a helper structure?
> > 
> > Yes, this is what I have in mind too. We could iterate on such a state
> > structure instead of a couple of single variables. Storing and
> > restoring the state would then just mean copying the structure.
> 
> Any word on this work? I just noticed we actually need this for Intel
> too, the fixed purpose events have overlapping but non-identical
> constraint masks.
> 
> Now we could optimize the Intel case by always iterating from the top
> down, but it won't cure all cases. For example, suppose one counter
> (that could be on a FP reg) previously got scheduled on a GP register
> and we take the fast-path, in that case we would still end up
> under-utilized.

Yes, I had this patch in mind too. Will try to post an updated version
next week. Ok?

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


* Re: [PATCH v2] perf, x86: Fix event scheduler for constraints with overlapping counters
  2011-09-01 14:12                 ` Robert Richter
@ 2011-09-01 16:37                   ` Peter Zijlstra
  0 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2011-09-01 16:37 UTC (permalink / raw)
  To: Robert Richter; +Cc: Ingo Molnar, Stephane Eranian, LKML

On Thu, 2011-09-01 at 16:12 +0200, Robert Richter wrote:
> Will try to post an updated version
> next week. Ok?
> 
Sure.. thanks!

end of thread, other threads:[~2011-09-01 16:37 UTC | newest]

Thread overview: 38+ messages
2011-04-16  0:27 [PATCH 0/4] perf, x86: Fixes for v2.6.39 Robert Richter
2011-04-16  0:27 ` [PATCH 1/4] perf, x86: Fix pre-defined cache-misses event for AMD family 15h cpus Robert Richter
2011-04-19 12:03   ` [tip:perf/urgent] " tip-bot for Andre Przywara
2011-04-16  0:27 ` [PATCH 2/4] perf, x86: Fix AMD family 15h FPU event constraints Robert Richter
2011-04-19 12:04   ` [tip:perf/urgent] " tip-bot for Robert Richter
2011-04-16  0:27 ` [PATCH 3/4] perf, x86: Use ALTERNATIVE() to check for X86_FEATURE_PERFCTR_CORE Robert Richter
2011-04-18 20:00   ` Andi Kleen
2011-04-19 10:39     ` Robert Richter
2011-04-19 18:21       ` Andi Kleen
2011-04-19 12:04   ` [tip:perf/core] " tip-bot for Robert Richter
2011-04-16  0:27 ` [PATCH 4/4] perf, x86: Fix event scheduler to solve complex scheduling problems Robert Richter
2011-04-16  8:51   ` Peter Zijlstra
2011-04-16  9:43     ` Ingo Molnar
2011-04-16 10:08       ` Peter Zijlstra
2011-04-16 10:14         ` Ingo Molnar
2011-04-16 10:15           ` Peter Zijlstra
2011-04-16 14:26         ` Valdis.Kletnieks
2011-04-17  8:15     ` Robert Richter
2011-04-17  8:18       ` Ingo Molnar
2011-04-17  8:53         ` Peter Zijlstra
2011-04-17 11:23           ` Robert Richter
2011-04-18  8:17             ` Robert Richter
2011-04-16 15:52   ` Stephane Eranian
2011-04-17  8:44     ` Robert Richter
2011-04-17  9:05       ` Stephane Eranian
2011-04-19 10:26   ` [PATCH v2] perf, x86: Fix event scheduler for constraints with overlapping counters Robert Richter
2011-04-19 11:29     ` Ingo Molnar
2011-04-19 13:55       ` Robert Richter
2011-04-28  9:50         ` Robert Richter
2011-05-18 21:16     ` Peter Zijlstra
2011-05-18 21:20       ` Ingo Molnar
2011-05-18 21:36         ` Peter Zijlstra
2011-05-19 10:49           ` Robert Richter
2011-05-19 18:06           ` Ingo Molnar
2011-05-20  3:18             ` Robert Richter
2011-09-01 12:56               ` Peter Zijlstra
2011-09-01 14:12                 ` Robert Richter
2011-09-01 16:37                   ` Peter Zijlstra
