linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters
@ 2012-11-10  1:01 Jacob Shin
  2012-11-10  1:01 ` [PATCH 1/4] perf, amd: Simplify northbridge event constraints handler Jacob Shin
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Jacob Shin @ 2012-11-10  1:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo,
	Thomas Gleixner, H. Peter Anvin, Stephane Eranian,
	Robert Richter, x86, linux-kernel, Jacob Shin

The following patchset enables 4 additional performance counters in
AMD family 15h processors that count northbridge events -- such as
DRAM accesses.

This patchset is based on previous work done by Robert Richter
<rric@kernel.org> :

https://lkml.org/lkml/2012/6/19/324

The main differences are:

- The northbridge counters are indexed contiguously right above the
  core performance counters.

- MSR address offset calculations are moved to architecture specific
  files.

- Interrupts are set up to be delivered only to a single core.

Jacob Shin (3):
  perf, amd: Refactor northbridge event constraints handler for code
    sharing
  perf, x86: Move MSR address offset calculation to architecture
    specific files
  perf, amd: Enable northbridge performance counters on AMD family 15h

Robert Richter (1):
  perf, amd: Simplify northbridge event constraints handler

 arch/x86/include/asm/cpufeature.h    |    2 +
 arch/x86/include/asm/msr-index.h     |    2 +
 arch/x86/include/asm/perf_event.h    |    6 +
 arch/x86/kernel/cpu/perf_event.h     |   21 +--
 arch/x86/kernel/cpu/perf_event_amd.c |  279 +++++++++++++++++++++++-----------
 5 files changed, 207 insertions(+), 103 deletions(-)

-- 
1.7.9.5



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/4] perf, amd: Simplify northbridge event constraints handler
  2012-11-10  1:01 [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters Jacob Shin
@ 2012-11-10  1:01 ` Jacob Shin
  2012-11-10  1:01 ` [PATCH 2/4] perf, amd: Refactor northbridge event constraints handler for code sharing Jacob Shin
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Jacob Shin @ 2012-11-10  1:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo,
	Thomas Gleixner, H. Peter Anvin, Stephane Eranian,
	Robert Richter, x86, linux-kernel, Jacob Shin

From: Robert Richter <rric@kernel.org>

Code simplification; there is no functional change.

Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
---
 arch/x86/kernel/cpu/perf_event_amd.c |   68 +++++++++++++---------------------
 1 file changed, 26 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 4528ae7..d60c5c7 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -256,9 +256,8 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	struct amd_nb *nb = cpuc->amd_nb;
-	struct perf_event *old = NULL;
-	int max = x86_pmu.num_counters;
-	int i, j, k = -1;
+	struct perf_event *old;
+	int idx, new = -1;
 
 	/*
 	 * if not NB event or no NB, then no constraints
@@ -276,48 +275,33 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 	 * because of successive calls to x86_schedule_events() from
 	 * hw_perf_group_sched_in() without hw_perf_enable()
 	 */
-	for (i = 0; i < max; i++) {
-		/*
-		 * keep track of first free slot
-		 */
-		if (k == -1 && !nb->owners[i])
-			k = i;
+	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+		if (new == -1 || hwc->idx == idx)
+			/* assign free slot, prefer hwc->idx */
+			old = cmpxchg(nb->owners + idx, NULL, event);
+		else if (nb->owners[idx] == event)
+			/* event already present */
+			old = event;
+		else
+			continue;
+
+		if (old && old != event)
+			continue;
+
+		/* reassign to this slot */
+		if (new != -1)
+			cmpxchg(nb->owners + new, event, NULL);
+		new = idx;
 
 		/* already present, reuse */
-		if (nb->owners[i] == event)
-			goto done;
-	}
-	/*
-	 * not present, so grab a new slot
-	 * starting either at:
-	 */
-	if (hwc->idx != -1) {
-		/* previous assignment */
-		i = hwc->idx;
-	} else if (k != -1) {
-		/* start from free slot found */
-		i = k;
-	} else {
-		/*
-		 * event not found, no slot found in
-		 * first pass, try again from the
-		 * beginning
-		 */
-		i = 0;
-	}
-	j = i;
-	do {
-		old = cmpxchg(nb->owners+i, NULL, event);
-		if (!old)
+		if (old == event)
 			break;
-		if (++i == max)
-			i = 0;
-	} while (i != j);
-done:
-	if (!old)
-		return &nb->event_constraints[i];
-
-	return &emptyconstraint;
+	}
+
+	if (new == -1)
+		return &emptyconstraint;
+
+	return &nb->event_constraints[new];
 }
 
 static struct amd_nb *amd_alloc_nb(int cpu)
-- 
1.7.9.5



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/4] perf, amd: Refactor northbridge event constraints handler for code sharing
  2012-11-10  1:01 [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters Jacob Shin
  2012-11-10  1:01 ` [PATCH 1/4] perf, amd: Simplify northbridge event constraints handler Jacob Shin
@ 2012-11-10  1:01 ` Jacob Shin
  2012-11-10  1:01 ` [PATCH 3/4] perf, x86: Move MSR address offset calculation to architecture specific files Jacob Shin
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Jacob Shin @ 2012-11-10  1:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo,
	Thomas Gleixner, H. Peter Anvin, Stephane Eranian,
	Robert Richter, x86, linux-kernel, Jacob Shin

Break out and generalize the family 10h northbridge event constraints
code so that we can later reuse the same code path with other AMD
processor families that have the same northbridge event constraints.

Based on previous patch by Robert Richter <rric@kernel.org>

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 arch/x86/kernel/cpu/perf_event_amd.c |   43 ++++++++++++++++++++--------------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index d60c5c7..d17debd 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -188,20 +188,13 @@ static inline int amd_has_nb(struct cpu_hw_events *cpuc)
 	return nb && nb->nb_id != -1;
 }
 
-static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
-				      struct perf_event *event)
+static void __amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
+					   struct perf_event *event)
 {
-	struct hw_perf_event *hwc = &event->hw;
 	struct amd_nb *nb = cpuc->amd_nb;
 	int i;
 
 	/*
-	 * only care about NB events
-	 */
-	if (!(amd_has_nb(cpuc) && amd_is_nb_event(hwc)))
-		return;
-
-	/*
 	 * need to scan whole list because event may not have
 	 * been assigned during scheduling
 	 *
@@ -247,12 +240,13 @@ static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
   *
   * Given that resources are allocated (cmpxchg), they must be
   * eventually freed for others to use. This is accomplished by
-  * calling amd_put_event_constraints().
+  * calling __amd_put_nb_event_constraints()
   *
   * Non NB events are not impacted by this restriction.
   */
 static struct event_constraint *
-amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
+__amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event,
+			       struct event_constraint *c)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	struct amd_nb *nb = cpuc->amd_nb;
@@ -260,12 +254,6 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 	int idx, new = -1;
 
 	/*
-	 * if not NB event or no NB, then no constraints
-	 */
-	if (!(amd_has_nb(cpuc) && amd_is_nb_event(hwc)))
-		return &unconstrained;
-
-	/*
 	 * detect if already present, if so reuse
 	 *
 	 * cannot merge with actual allocation
@@ -275,7 +263,7 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 	 * because of successive calls to x86_schedule_events() from
 	 * hw_perf_group_sched_in() without hw_perf_enable()
 	 */
-	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+	for_each_set_bit(idx, c->idxmsk, X86_PMC_IDX_MAX) {
 		if (new == -1 || hwc->idx == idx)
 			/* assign free slot, prefer hwc->idx */
 			old = cmpxchg(nb->owners + idx, NULL, event);
@@ -391,6 +379,25 @@ static void amd_pmu_cpu_dead(int cpu)
 	}
 }
 
+static struct event_constraint *
+amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
+{
+	/*
+	 * if not NB event or no NB, then no constraints
+	 */
+	if ((amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)))
+		return &unconstrained;
+
+	return __amd_get_nb_event_constraints(cpuc, event, &unconstrained);
+}
+
+static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
+				      struct perf_event *event)
+{
+	if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw))
+		__amd_put_nb_event_constraints(cpuc, event);
+}
+
 PMU_FORMAT_ATTR(event,	"config:0-7,32-35");
 PMU_FORMAT_ATTR(umask,	"config:8-15"	);
 PMU_FORMAT_ATTR(edge,	"config:18"	);
-- 
1.7.9.5



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/4] perf, x86: Move MSR address offset calculation to architecture specific files
  2012-11-10  1:01 [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters Jacob Shin
  2012-11-10  1:01 ` [PATCH 1/4] perf, amd: Simplify northbridge event constraints handler Jacob Shin
  2012-11-10  1:01 ` [PATCH 2/4] perf, amd: Refactor northbridge event constraints handler for code sharing Jacob Shin
@ 2012-11-10  1:01 ` Jacob Shin
  2012-11-10  1:01 ` [PATCH 4/4] perf, amd: Enable northbridge performance counters on AMD family 15h Jacob Shin
  2012-11-10 11:50 ` [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters Robert Richter
  4 siblings, 0 replies; 22+ messages in thread
From: Jacob Shin @ 2012-11-10  1:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo,
	Thomas Gleixner, H. Peter Anvin, Stephane Eranian,
	Robert Richter, x86, linux-kernel, Jacob Shin

Move the counter-index to MSR-address-offset calculation into
architecture specific files. This prepares the way for perf_event_amd
to support counter addresses that are not contiguous -- for example,
AMD family 15h processors have 6 core performance counters starting at
0xc0010200 and 4 northbridge performance counters starting at 0xc0010240.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
---
 arch/x86/kernel/cpu/perf_event.h     |   21 +++++---------------
 arch/x86/kernel/cpu/perf_event_amd.c |   36 ++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 271d257..aacf025 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -325,6 +325,7 @@ struct x86_pmu {
 	int		(*schedule_events)(struct cpu_hw_events *cpuc, int n, int *assign);
 	unsigned	eventsel;
 	unsigned	perfctr;
+	int             (*addr_offset)(int index);
 	u64		(*event_map)(int);
 	int		max_events;
 	int		num_counters;
@@ -444,28 +445,16 @@ extern u64 __read_mostly hw_cache_extra_regs
 
 u64 x86_perf_event_update(struct perf_event *event);
 
-static inline int x86_pmu_addr_offset(int index)
-{
-	int offset;
-
-	/* offset = X86_FEATURE_PERFCTR_CORE ? index << 1 : index */
-	alternative_io(ASM_NOP2,
-		       "shll $1, %%eax",
-		       X86_FEATURE_PERFCTR_CORE,
-		       "=a" (offset),
-		       "a"  (index));
-
-	return offset;
-}
-
 static inline unsigned int x86_pmu_config_addr(int index)
 {
-	return x86_pmu.eventsel + x86_pmu_addr_offset(index);
+	return x86_pmu.eventsel +
+		(x86_pmu.addr_offset ? x86_pmu.addr_offset(index) : index);
 }
 
 static inline unsigned int x86_pmu_event_addr(int index)
 {
-	return x86_pmu.perfctr + x86_pmu_addr_offset(index);
+	return x86_pmu.perfctr +
+		(x86_pmu.addr_offset ? x86_pmu.addr_offset(index) : index);
 }
 
 int x86_setup_perfctr(struct perf_event *event);
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index d17debd..078beb5 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -132,6 +132,41 @@ static u64 amd_pmu_event_map(int hw_event)
 	return amd_perfmon_event_map[hw_event];
 }
 
+/*
+ * Previously calculated offsets
+ */
+static unsigned int addr_offsets[X86_PMC_IDX_MAX] __read_mostly;
+
+/*
+ * Legacy CPUs:
+ *   4 counters starting at 0xc0010000 each offset by 1
+ *
+ * CPUs with core performance counter extensions:
+ *   6 counters starting at 0xc0010200 each offset by 2
+ */
+static inline int amd_pmu_addr_offset(int index)
+{
+	int offset;
+
+	if (!index)
+		return index;
+
+	offset = addr_offsets[index];
+
+	if (offset)
+		return offset;
+
+	if (!cpu_has_perfctr_core) {
+		offset = index;
+	} else {
+		offset = index << 1;
+	}
+
+	addr_offsets[index] = offset;
+
+	return offset;
+}
+
 static int amd_pmu_hw_config(struct perf_event *event)
 {
 	int ret;
@@ -570,6 +605,7 @@ static __initconst const struct x86_pmu amd_pmu = {
 	.schedule_events	= x86_schedule_events,
 	.eventsel		= MSR_K7_EVNTSEL0,
 	.perfctr		= MSR_K7_PERFCTR0,
+	.addr_offset            = amd_pmu_addr_offset,
 	.event_map		= amd_pmu_event_map,
 	.max_events		= ARRAY_SIZE(amd_perfmon_event_map),
 	.num_counters		= AMD64_NUM_COUNTERS,
-- 
1.7.9.5



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/4] perf, amd: Enable northbridge performance counters on AMD family 15h
  2012-11-10  1:01 [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters Jacob Shin
                   ` (2 preceding siblings ...)
  2012-11-10  1:01 ` [PATCH 3/4] perf, x86: Move MSR address offset calculation to architecture specific files Jacob Shin
@ 2012-11-10  1:01 ` Jacob Shin
  2012-11-10 11:50 ` [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters Robert Richter
  4 siblings, 0 replies; 22+ messages in thread
From: Jacob Shin @ 2012-11-10  1:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo,
	Thomas Gleixner, H. Peter Anvin, Stephane Eranian,
	Robert Richter, x86, linux-kernel, Jacob Shin

On AMD family 15h processors, there are 4 new performance counters
(in addition to 6 core performance counters) that can be used for
counting northbridge events (i.e. DRAM accesses). Their bit fields are
almost identical to the core performance counters. However, the same
set of MSRs is shared between the multiple cores that share the same
northbridge. We reuse the same code path as the existing family 10h
northbridge event constraints handler logic to enforce this sharing.

Based on previous patch by Robert Richter <rric@kernel.org>

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 arch/x86/include/asm/cpufeature.h    |    2 +
 arch/x86/include/asm/msr-index.h     |    2 +
 arch/x86/include/asm/perf_event.h    |    6 ++
 arch/x86/kernel/cpu/perf_event_amd.c |  142 ++++++++++++++++++++++++++--------
 4 files changed, 120 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 8c297aa..17f75b8 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -167,6 +167,7 @@
 #define X86_FEATURE_TBM		(6*32+21) /* trailing bit manipulations */
 #define X86_FEATURE_TOPOEXT	(6*32+22) /* topology extensions CPUID leafs */
 #define X86_FEATURE_PERFCTR_CORE (6*32+23) /* core performance counter extensions */
+#define X86_FEATURE_PERFCTR_NB	(6*32+24) /* nb performance counter extensions */
 
 /*
  * Auxiliary flags: Linux defined - For features scattered in various
@@ -308,6 +309,7 @@ extern const char * const x86_power_flags[32];
 #define cpu_has_hypervisor	boot_cpu_has(X86_FEATURE_HYPERVISOR)
 #define cpu_has_pclmulqdq	boot_cpu_has(X86_FEATURE_PCLMULQDQ)
 #define cpu_has_perfctr_core	boot_cpu_has(X86_FEATURE_PERFCTR_CORE)
+#define cpu_has_perfctr_nb	boot_cpu_has(X86_FEATURE_PERFCTR_NB)
 #define cpu_has_cx8		boot_cpu_has(X86_FEATURE_CX8)
 #define cpu_has_cx16		boot_cpu_has(X86_FEATURE_CX16)
 #define cpu_has_eager_fpu	boot_cpu_has(X86_FEATURE_EAGER_FPU)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 7f0edce..e67ff1e 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -157,6 +157,8 @@
 /* Fam 15h MSRs */
 #define MSR_F15H_PERF_CTL		0xc0010200
 #define MSR_F15H_PERF_CTR		0xc0010201
+#define MSR_F15H_NB_PERF_CTL		0xc0010240
+#define MSR_F15H_NB_PERF_CTR		0xc0010241
 
 /* Fam 10h MSRs */
 #define MSR_FAM10H_MMIO_CONF_BASE	0xc0010058
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 4fabcdf..75e039c 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -29,6 +29,8 @@
 #define ARCH_PERFMON_EVENTSEL_INV			(1ULL << 23)
 #define ARCH_PERFMON_EVENTSEL_CMASK			0xFF000000ULL
 
+#define AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE		(1ULL << 36)
+#define AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK		(0x0FULL << 37)
 #define AMD_PERFMON_EVENTSEL_GUESTONLY			(1ULL << 40)
 #define AMD_PERFMON_EVENTSEL_HOSTONLY			(1ULL << 41)
 
@@ -46,8 +48,12 @@
 #define AMD64_RAW_EVENT_MASK		\
 	(X86_RAW_EVENT_MASK          |  \
 	 AMD64_EVENTSEL_EVENT)
+#define AMD64_NB_EVENT_MASK		\
+	(AMD64_EVENTSEL_EVENT        |  \
+	 ARCH_PERFMON_EVENTSEL_UMASK)
 #define AMD64_NUM_COUNTERS				4
 #define AMD64_NUM_COUNTERS_CORE				6
+#define AMD64_NUM_COUNTERS_NB				4
 
 #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL		0x3c
 #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK		(0x00 << 8)
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 078beb5..adf4026 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -143,10 +143,15 @@ static unsigned int addr_offsets[X86_PMC_IDX_MAX] __read_mostly;
  *
  * CPUs with core performance counter extensions:
  *   6 counters starting at 0xc0010200 each offset by 2
+ *
+ * CPUs with north bridge performance counter extensions:
+ *   4 additional counters starting at 0xc0010240 each offset by 2
+ *   (indexed right above either one of the above core counters)
  */
 static inline int amd_pmu_addr_offset(int index)
 {
 	int offset;
+	int ncore;
 
 	if (!index)
 		return index;
@@ -158,8 +163,17 @@ static inline int amd_pmu_addr_offset(int index)
 
 	if (!cpu_has_perfctr_core) {
 		offset = index;
+		ncore = AMD64_NUM_COUNTERS;
 	} else {
 		offset = index << 1;
+		ncore = AMD64_NUM_COUNTERS_CORE;
+	}
+
+	/* find offset of NB counters with respect to x86_pmu.eventsel */
+	if (cpu_has_perfctr_nb) {
+		if (index >= ncore && index < (ncore + AMD64_NUM_COUNTERS_NB))
+			offset = (MSR_F15H_NB_PERF_CTL - x86_pmu.eventsel) +
+				 ((index - ncore) << 1);
 	}
 
 	addr_offsets[index] = offset;
@@ -167,6 +181,66 @@ static inline int amd_pmu_addr_offset(int index)
 	return offset;
 }
 
+/*
+ * AMD64 events are detected based on their event codes.
+ */
+static inline unsigned int amd_get_event_code(struct hw_perf_event *hwc)
+{
+	return ((hwc->config >> 24) & 0x0f00) | (hwc->config & 0x00ff);
+}
+
+static inline int amd_is_nb_event(struct hw_perf_event *hwc)
+{
+	return (hwc->config & 0xe0) == 0xe0;
+}
+
+static inline int amd_is_nb_perfctr_event(struct hw_perf_event *hwc)
+{
+	return cpu_has_perfctr_nb && amd_is_nb_event(hwc);
+}
+
+static inline int amd_has_nb(struct cpu_hw_events *cpuc)
+{
+	struct amd_nb *nb = cpuc->amd_nb;
+
+	return nb && nb->nb_id != -1;
+}
+
+/*
+ * AMD NB counters (MSRs 0xc0010240 etc.) do not support the following
+ * flags:
+ *
+ *  Host/Guest Only
+ *  Counter Mask
+ *  Invert Comparison
+ *  Edge Detect
+ *  Operating-System Mode
+ *  User Mode
+ *
+ * Try to fix the config for default settings, otherwise fail.
+ */
+static int amd_nb_event_config(struct perf_event *event)
+{
+	if (!amd_is_nb_perfctr_event(&event->hw))
+		return 0;
+
+	if (event->attr.exclude_host || event->attr.exclude_guest
+	    || event->attr.exclude_user || event->attr.exclude_kernel)
+		goto fail;
+
+	event->hw.config &= ~(ARCH_PERFMON_EVENTSEL_USR | ARCH_PERFMON_EVENTSEL_OS);
+
+	if (event->hw.config & ~(AMD64_NB_EVENT_MASK | ARCH_PERFMON_EVENTSEL_INT |
+				 AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE |
+				 AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK))
+		goto fail;
+
+	return 0;
+fail:
+	pr_debug("Invalid nb counter config value: %016Lx\n", event->hw.config);
+	return -EINVAL;
+}
+
 static int amd_pmu_hw_config(struct perf_event *event)
 {
 	int ret;
@@ -175,13 +249,13 @@ static int amd_pmu_hw_config(struct perf_event *event)
 	if (event->attr.precise_ip && get_ibs_caps())
 		return -ENOENT;
 
+	if (has_branch_stack(event))
+		return -EOPNOTSUPP;
+
 	ret = x86_pmu_hw_config(event);
 	if (ret)
 		return ret;
 
-	if (has_branch_stack(event))
-		return -EOPNOTSUPP;
-
 	if (event->attr.exclude_host && event->attr.exclude_guest)
 		/*
 		 * When HO == GO == 1 the hardware treats that as GO == HO == 0
@@ -195,32 +269,10 @@ static int amd_pmu_hw_config(struct perf_event *event)
 	else if (event->attr.exclude_guest)
 		event->hw.config |= AMD_PERFMON_EVENTSEL_HOSTONLY;
 
-	if (event->attr.type != PERF_TYPE_RAW)
-		return 0;
-
-	event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
-
-	return 0;
-}
-
-/*
- * AMD64 events are detected based on their event codes.
- */
-static inline unsigned int amd_get_event_code(struct hw_perf_event *hwc)
-{
-	return ((hwc->config >> 24) & 0x0f00) | (hwc->config & 0x00ff);
-}
-
-static inline int amd_is_nb_event(struct hw_perf_event *hwc)
-{
-	return (hwc->config & 0xe0) == 0xe0;
-}
-
-static inline int amd_has_nb(struct cpu_hw_events *cpuc)
-{
-	struct amd_nb *nb = cpuc->amd_nb;
+	if (event->attr.type == PERF_TYPE_RAW)
+		event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
 
-	return nb && nb->nb_id != -1;
+	return amd_nb_event_config(event);
 }
 
 static void __amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
@@ -324,6 +376,15 @@ __amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *ev
 	if (new == -1)
 		return &emptyconstraint;
 
+	/* set up interrupts to be delivered only to this core */
+	if (cpu_has_perfctr_nb) {
+		struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
+
+		hwc->config |= AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE;
+		hwc->config &= ~AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK;
+		hwc->config |= (0ULL | (c->cpu_core_id)) << 37;
+	}
+
 	return &nb->event_constraints[new];
 }
 
@@ -420,7 +481,7 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 	/*
 	 * if not NB event or no NB, then no constraints
 	 */
-	if ((amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)))
+	if (!(amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)))
 		return &unconstrained;
 
 	return __amd_get_nb_event_constraints(cpuc, event, &unconstrained);
@@ -521,6 +582,7 @@ static struct event_constraint amd_f15_PMC3  = EVENT_CONSTRAINT(0, 0x08, 0);
 static struct event_constraint amd_f15_PMC30 = EVENT_CONSTRAINT_OVERLAP(0, 0x09, 0);
 static struct event_constraint amd_f15_PMC50 = EVENT_CONSTRAINT(0, 0x3F, 0);
 static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
+static struct event_constraint amd_f15_NBPMC30 = EVENT_CONSTRAINT(0, 0x3C0, 0);
 
 static struct event_constraint *
 amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *event)
@@ -587,8 +649,11 @@ amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *ev
 			return &amd_f15_PMC20;
 		}
 	case AMD_EVENT_NB:
-		/* not yet implemented */
-		return &emptyconstraint;
+		if (cpuc->is_fake)
+			return &amd_f15_NBPMC30;
+
+		return __amd_get_nb_event_constraints(cpuc, event,
+						      &amd_f15_NBPMC30);
 	default:
 		return &emptyconstraint;
 	}
@@ -626,7 +691,7 @@ static __initconst const struct x86_pmu amd_pmu = {
 
 static int setup_event_constraints(void)
 {
-	if (boot_cpu_data.x86 >= 0x15)
+	if (boot_cpu_data.x86 == 0x15)
 		x86_pmu.get_event_constraints = amd_get_event_constraints_f15h;
 	return 0;
 }
@@ -656,6 +721,18 @@ static int setup_perfctr_core(void)
 	return 0;
 }
 
+static int setup_perfctr_nb(void)
+{
+	if (!cpu_has_perfctr_nb)
+		return -ENODEV;
+
+	x86_pmu.num_counters += AMD64_NUM_COUNTERS_NB;
+
+	printk(KERN_INFO "perf: AMD northbridge performance counters detected\n");
+
+	return 0;
+}
+
 __init int amd_pmu_init(void)
 {
 	/* Performance-monitoring supported from K7 and later: */
@@ -666,6 +743,7 @@ __init int amd_pmu_init(void)
 
 	setup_event_constraints();
 	setup_perfctr_core();
+	setup_perfctr_nb();
 
 	/* Events are common for all AMDs */
 	memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
-- 
1.7.9.5



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters
  2012-11-10  1:01 [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters Jacob Shin
                   ` (3 preceding siblings ...)
  2012-11-10  1:01 ` [PATCH 4/4] perf, amd: Enable northbridge performance counters on AMD family 15h Jacob Shin
@ 2012-11-10 11:50 ` Robert Richter
  2012-11-11 18:17   ` Stephane Eranian
  2012-11-11 18:44   ` Jacob Shin
  4 siblings, 2 replies; 22+ messages in thread
From: Robert Richter @ 2012-11-10 11:50 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Thomas Gleixner, H. Peter Anvin,
	Stephane Eranian, x86, linux-kernel

On 09.11.12 19:01:34, Jacob Shin wrote:
> The following patchset enables 4 additional performance counters in
> AMD family 15h processors that count northbridge events -- such as
> DRAM accesses.
> 
> This patchset is based on previous work done by Robert Richter
> <rric@kernel.org> :
> 
> https://lkml.org/lkml/2012/6/19/324

The original patch set of this is here (a rebased version):

 http://git.kernel.org/?p=linux/kernel/git/rric/oprofile.git;a=shortlog;h=refs/heads/perf-nb

This code was tested in detail.

> The main differences are:
> 
> - The northbridge counters are indexed contiguously right above the
>   core performance counters.
> 
> - MSR address offset calculations are moved to architecture specific
>   files.
> 
> - Interrupts are set up to be delivered only to a single core.

So I would rather suggest making delta patches on top of my patches.

Peter's main concerns were that my patch set is not in the
Intel-uncore style. I started reworking this but was not able to
finish that work. These concerns still exist.

Due to the current situation I would prefer to create a
tip:perf/amd-nb branch that includes my patches and then add all
further necessary steps for mainline acceptance on top of it.

Thanks,

-Robert

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters
  2012-11-10 11:50 ` [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters Robert Richter
@ 2012-11-11 18:17   ` Stephane Eranian
  2012-11-12 14:36     ` Robert Richter
  2012-11-11 18:44   ` Jacob Shin
  1 sibling, 1 reply; 22+ messages in thread
From: Stephane Eranian @ 2012-11-11 18:17 UTC (permalink / raw)
  To: Robert Richter
  Cc: Jacob Shin, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Thomas Gleixner, H. Peter Anvin, x86,
	LKML

On Sat, Nov 10, 2012 at 12:50 PM, Robert Richter <rric@kernel.org> wrote:
> On 09.11.12 19:01:34, Jacob Shin wrote:
>> The following patchset enables 4 additional performance counters in
>> AMD family 15h processors that count northbridge events -- such as
>> DRAM accesses.
>>
>> This patchset is based on previous work done by Robert Richter
>> <rric@kernel.org> :
>>
>> https://lkml.org/lkml/2012/6/19/324
>
> The original patch set of this is here (a rebased version):
>
>  http://git.kernel.org/?p=linux/kernel/git/rric/oprofile.git;a=shortlog;h=refs/heads/perf-nb
>
> This code was tested in detail.
>
>> The main differences are:
>>
>> - The northbridge counters are indexed contiguously right above the
>>   core performance counters.
>>
>> - MSR address offset calculations are moved to architecture specific
>>   files.
>>
>> - Interrupts are set up to be delivered only to a single core.
>
> So I would rather suggest making delta patches on top of my patches.
>
> Peter's main concerns were that my patch set is not in the
> Intel-uncore style. I started reworking this but was not able to
> finish that work. These concerns still exist.
>
That was my concern too. I don't recall exactly why it could not
be totally disconnected from the core PMU. I think hardware-wise,
it was possible. Could you refresh my memory?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters
  2012-11-10 11:50 ` [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters Robert Richter
  2012-11-11 18:17   ` Stephane Eranian
@ 2012-11-11 18:44   ` Jacob Shin
  2012-11-12 12:24     ` Stephane Eranian
  2012-11-12 14:55     ` Robert Richter
  1 sibling, 2 replies; 22+ messages in thread
From: Jacob Shin @ 2012-11-11 18:44 UTC (permalink / raw)
  To: Robert Richter
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Thomas Gleixner, H. Peter Anvin,
	Stephane Eranian, x86, linux-kernel

On Sat, Nov 10, 2012 at 12:50:27PM +0100, Robert Richter wrote:
> On 09.11.12 19:01:34, Jacob Shin wrote:
> > The following patchset enables 4 additional performance counters in
> > AMD family 15h processors that count northbridge events -- such as
> > DRAM accesses.
> > 
> > This patchset is based on previous work done by Robert Richter
> > <rric@kernel.org> :
> > 
> > https://lkml.org/lkml/2012/6/19/324
> 
> The original patch set of this is here (a rebased version):
> 
>  http://git.kernel.org/?p=linux/kernel/git/rric/oprofile.git;a=shortlog;h=refs/heads/perf-nb
> 
> This code was tested in detail.
> 
> > The main differences are:
> > 
> > - The northbridge counters are indexed contiguously right above the
> >   core performance counters.
> > 
> > - MSR address offset calculations are moved to architecture specific
> >   files.
> > 
> > - Interrupts are set up to be delivered only to a single core.
> 
> So I would rather suggest making delta patches on top of my patches.

Okay, if we have to, I can rework my patches on top of that, as long
as the end result looks something like what I'm suggesting above. In
an upcoming processor family, there are no core performance counter
extensions, but we do have northbridge performance counters. That
means the counter address base would be 0xc0010000 while the
northbridge counters live at 0xc0010240, 0x240 apart. We could make
counter masks work, but that means testing an awful lot of 0 bits for
every address offset calculation.

> 
> Peter's main concerns were that my patch set is not in the
> Intel-uncore style. I started reworking this but was not able to
> finish my work. This concerns still exist.

Right, I considered this too, and still I agree with you, Robert, that
it makes more sense to just extend AMD's x86 PMU.

1. Because the hardware interface -- the register bit fields -- is
   almost identical.

2. Because the interrupt delivery mechanism is also identical --
   delivered via the same APIC interrupt vector.

I think my proposed patchset on top of Linus's current tree is pretty
minimal and isolated to AMD code, so it should be easier to swallow.

Peter, could you take a look at the patchset and see if you still
prefer an Intel-uncore-like implementation?

> 
> Due to the current situation I would rather prefer to create a
> tip:perf/amd-nb branch that includes my patches and then add all
> further necessary steps for mainline acceptance on top of it.

Okay, Peter, let me know if this is a route to go, and I'll generate
my patchset on top of that.

Thanks,

-Jacob

> 
> Thanks,
> 
> -Robert
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters
  2012-11-11 18:44   ` Jacob Shin
@ 2012-11-12 12:24     ` Stephane Eranian
  2012-11-12 14:22       ` Robert Richter
  2012-11-12 14:55     ` Robert Richter
  1 sibling, 1 reply; 22+ messages in thread
From: Stephane Eranian @ 2012-11-12 12:24 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Robert Richter, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Thomas Gleixner, H. Peter Anvin, x86,
	LKML

Hi,

Anybody from AMD or formerly @ AMD care to submit a libpfm4 patch
to add the Fam15h NB events?

I'd like to avoid having to type them in manually.
Thanks.

On Sun, Nov 11, 2012 at 7:44 PM, Jacob Shin <jacob.shin@amd.com> wrote:
> On Sat, Nov 10, 2012 at 12:50:27PM +0100, Robert Richter wrote:
>> On 09.11.12 19:01:34, Jacob Shin wrote:
>> > The following patchset enables 4 additional performance counters in
>> > AMD family 15h processors that count northbridge events -- such as
>> > DRAM accesses.
>> >
>> > This patchset is based on previous work done by Robert Richter
>> > <rric@kernel.org> :
>> >
>> > https://lkml.org/lkml/2012/6/19/324
>>
>> The original patch set of this is here (a rebased version):
>>
>>  http://git.kernel.org/?p=linux/kernel/git/rric/oprofile.git;a=shortlog;h=refs/heads/perf-nb
>>
>> This code was tested in detail.
>>
>> > The main differences are:
>> >
>> > - The northbridge counters are indexed contiguously right above the
>> >   core performance counters.
>> >
>> > - MSR address offset calculations are moved to architecture specific
>> >   files.
>> >
>> > - Interrupts are set up to be delivered only to a single core.
>>
>> So I rather suggest to make delta patches on top of my patches.
>
> Okay, if we have to, I can rework my patches on top of that, as long
> as the end result looks something like I'm suggesting above. Because
> in an upcoming processor family, there are no core performance counter
> extensions, but we do have northbridge performance counters. Meaning
> the counter address base would be c0010000 and the northbridge counters
> live at c0010240, 0x240 apart; we could make counter masks work, but
> that means testing an awful lot of 0s on every address offset
> calculation.
>
>>
>> Peter's main concerns were that my patch set is not in the
>> Intel-uncore style. I started reworking this but was not able to
>> finish my work. These concerns still exist.
>
> Right, I considered this too, and still, I agree with you Robert that
> it makes more sense to just extend AMD's x86 PMU.
>
> 1. Because the hardware interface -- register bit fields -- are almost
>    identical
>
> 2. Because the interrupt delivery mechanism is also identical --
>    delivered via same APIC interrupt vector.
>
> I think my proposed patchset on top of current Linus's tree is pretty
> minimal, and is isolated to AMD so it should be easier to swallow.
>
> Peter, could you take a look at the patchset and let us know if you
> still prefer an Intel uncore-like implementation?
>
>>
>> Due to the current situation I would rather prefer to create a
>> tip:perf/amd-nb branch that includes my patches and then add all
>> further necessary steps for mainline acceptance on top of it.
>
> Okay, Peter, let me know if this is a route to go, and I'll generate
> my patchset on top of that.
>
> Thanks,
>
> -Jacob
>
>>
>> Thanks,
>>
>> -Robert
>>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters
  2012-11-12 12:24     ` Stephane Eranian
@ 2012-11-12 14:22       ` Robert Richter
  2012-11-12 16:13         ` Jacob Shin
  0 siblings, 1 reply; 22+ messages in thread
From: Robert Richter @ 2012-11-12 14:22 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Jacob Shin, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Thomas Gleixner, H. Peter Anvin, x86,
	LKML, Suravee Suthikulpanit

Stephane,

On 12.11.12 13:24:38, Stephane Eranian wrote:
> Anybody from AMD or formerly @ AMD care to submit a libpfm4 patch
> to add the Fam15h NB events?
> 
> I'd like to avoid having to type them in manually.

Suravee can probably help you here.

HTH

-Robert

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters
  2012-11-11 18:17   ` Stephane Eranian
@ 2012-11-12 14:36     ` Robert Richter
  0 siblings, 0 replies; 22+ messages in thread
From: Robert Richter @ 2012-11-12 14:36 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Jacob Shin, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Thomas Gleixner, H. Peter Anvin, x86,
	LKML

On 11.11.12 19:17:07, Stephane Eranian wrote:
> On Sat, Nov 10, 2012 at 12:50 PM, Robert Richter <rric@kernel.org> wrote:
> > Peter's main concerns were that my patch set is not in the
> > Intel-uncore style. I started reworking this but was not able to
> > finish my work. These concerns still exist.
> >
> That was my concern too. I don't recall exactly why it could not
> be totally disconnected from the core PMU. I think hardware-wise,
> it was possible. Could you refresh my memory?

The current implementation only allows the use of a single x86 pmu.
Multiple instances of x86 pmus in a system, like an additional pmu for
nb counters, require a complete rework. You need to remove global
variables and extend function interfaces with arguments pointing to a
struct x86_pmu. All this without adding overhead (e.g. pointer
chasing) compared to the existing code.

This is also the reason why the intel-uncore implementation is
basically a copy-and-paste of generic x86 pmu code plus uncore-specific
changes.

Avoiding code duplication and a complex rework were the reasons for
simply extending the family 10h implementation to also support family
15h nb counters. This seemed to me the best approach at the time.

-Robert

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters
  2012-11-11 18:44   ` Jacob Shin
  2012-11-12 12:24     ` Stephane Eranian
@ 2012-11-12 14:55     ` Robert Richter
  1 sibling, 0 replies; 22+ messages in thread
From: Robert Richter @ 2012-11-12 14:55 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Thomas Gleixner, H. Peter Anvin,
	Stephane Eranian, x86, linux-kernel

Jacob,

On 11.11.12 12:44:26, Jacob Shin wrote:
> Because
> in an upcoming processor family, there are no core performance counter
> extensions, but we do have northbridge performance counters. Meaning
> the counter address base would be c0010000 and the northbridge counters
> live at c0010240, 0x240 apart; we could make counter masks work, but
> that means testing an awful lot of 0s on every address offset
> calculation.

I agree with you that my implementation of counter masks does not fit
perfectly for this reason, and also since it requires too many changes
in generic code (for_each_generic_counter()) which Peter did not like
either. So it is ok to me to replace the patch that implements counter
bitmasks with your solution.

-Robert

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters
  2012-11-12 14:22       ` Robert Richter
@ 2012-11-12 16:13         ` Jacob Shin
  2012-11-12 21:08           ` Stephane Eranian
  0 siblings, 1 reply; 22+ messages in thread
From: Jacob Shin @ 2012-11-12 16:13 UTC (permalink / raw)
  To: Robert Richter
  Cc: Stephane Eranian, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Thomas Gleixner, H. Peter Anvin, x86,
	LKML, Suravee Suthikulpanit

On Mon, Nov 12, 2012 at 03:22:33PM +0100, Robert Richter wrote:
> Stephane,
> 
> On 12.11.12 13:24:38, Stephane Eranian wrote:
> > Anybody from AMD or formerly @ AMD care to submit a libpfm4 patch
> > to add the Fam15h NB events?
> > 
> > I'd like to avoid having to type them in manually.
> 
> Suravee can probably help you here.

Suravee is out of the office until the end of the month.

Is this necessary ASAP for this perf patchset to go upstream? If so,
I'll commit to getting a patch out for libpfm4 soon.

Otherwise, I'll ask Suravee to work on it when he gets back.

Thanks,

-Jacob

> 
> HTH
> 
> -Robert
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters
  2012-11-12 16:13         ` Jacob Shin
@ 2012-11-12 21:08           ` Stephane Eranian
  0 siblings, 0 replies; 22+ messages in thread
From: Stephane Eranian @ 2012-11-12 21:08 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Robert Richter, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Thomas Gleixner, H. Peter Anvin, x86,
	LKML, Suravee Suthikulpanit

On Mon, Nov 12, 2012 at 5:13 PM, Jacob Shin <jacob.shin@amd.com> wrote:
> On Mon, Nov 12, 2012 at 03:22:33PM +0100, Robert Richter wrote:
>> Stephane,
>>
>> On 12.11.12 13:24:38, Stephane Eranian wrote:
>> > Anybody from AMD or formerly @ AMD care to submit a libpfm4 patch
>> > to add the Fam15h NB events?
>> >
>> > I'd like to avoid having to type them in manually.
>>
>> Suravee can probably help you here.
>
> Suravee is out of the office until the end of the month.
>
> Is this necessary ASAP for this perf patchset to go upstream? If so,
> I'll commit to getting a patch out for libpfm4 soon.
>
> Otherwise, I'll ask Suravee to work on it when he gets back.
>
Not critical for the kernel part. Useful for user-level tools beyond
perf.

> Thanks,
>
> -Jacob
>
>>
>> HTH
>>
>> -Robert
>>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] perf, amd: Enable northbridge performance counters on AMD family 15h
  2012-11-16 19:32       ` Robert Richter
  2012-11-16 21:57         ` Jacob Shin
@ 2012-11-28 17:42         ` Jacob Shin
  1 sibling, 0 replies; 22+ messages in thread
From: Jacob Shin @ 2012-11-28 17:42 UTC (permalink / raw)
  To: Robert Richter
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Thomas Gleixner, H. Peter Anvin,
	Stephane Eranian, x86, linux-kernel

Robert,

On Fri, Nov 16, 2012 at 08:32:24PM +0100, Robert Richter wrote:
> On 16.11.12 13:00:30, Jacob Shin wrote:
> > On Fri, Nov 16, 2012 at 07:43:44PM +0100, Robert Richter wrote:
> > > On 15.11.12 15:31:53, Jacob Shin wrote:
> > > > @@ -323,6 +368,16 @@ __amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *ev
> > > >  	if (new == -1)
> > > >  		return &emptyconstraint;
> > > >  
> > > > +	/* set up interrupts to be delivered only to this core */
> > > > +	if (cpu_has_perfctr_nb) {
> > > > +		struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
> > > > +
> > > > +		hwc->config |= AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE;
> > > > +		hwc->config &= ~AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK;
> > > > +		hwc->config |= (0ULL | (c->cpu_core_id)) <<
> > > > +			AMD_PERFMON_EVENTSEL_INT_CORE_SEL_SHIFT;
> > > > +	}
> > > 
> > > Looks like a hack to me. The constaints handler is only supposed to
> > > determine constraints and not to touch anything in the event's
> > > structure. This should be done later when setting up hwc->config in
> > > amd_nb_event_config() or so.
> > 
> > Hm.. is the hwc->config called after constraints have been set up
> > already? If so, I'll change it ..
> 
> Should be, since the hw register can be setup only after the counter
> is selected.

Ahh .. looking at this further, it looks like ->config is called
before constraints are set up (before we know what cpu we are going to
run on).

Sorry for not seeing this sooner, but it really looks like the event
constraints function is the right place to set up the INT_CORE_SEL
bits. Are you okay with this?

> > > I also do not think that smp_processor_id() is the right thing to do
> > > here. Since cpu_hw_events is per-cpu the cpu is already selected.
> > 
> > Yeah, I could not figure out how to get the cpu number from cpuc. Is
> > there a container_of kind of thing that I can do to get the cpu number
> > ?
> 
> At some point event->cpu is assigned, I think.

Furthermore, event->cpu can only be used if the --cpu flag is specified
from userland; otherwise event->cpu is 0xffff. And we do not know what
cpu we are going to be running on until the schedule happens.

I tried to figure out if there was a way to get from cpu_hw_events to
a cpu number, but I didn't see any obvious way. The cpu_hw_events is
derived from __get_cpu_var from the schedule function that calls the
constraints, so smp_processor_id seems okay to use here.

..

So I'll have to change things back, unless you have any other
ideas?

Thanks,

-Jacob

> 
> -Robert
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] perf, amd: Enable northbridge performance counters on AMD family 15h
  2012-11-16 21:57         ` Jacob Shin
@ 2012-11-26 12:15           ` Robert Richter
  0 siblings, 0 replies; 22+ messages in thread
From: Robert Richter @ 2012-11-26 12:15 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Thomas Gleixner, H. Peter Anvin,
	Stephane Eranian, x86, linux-kernel

Jacob,

On 16.11.12 15:57:18, Jacob Shin wrote:
> > It looks like the paths should be defined more clearly.
> 
> Per comments above the function, I was logically going down the cases,
> 1. is this a legacy counter?
> 2. is this a perfctr_core counter?
> 3. is this a perfctr_nb counter?
> 
> To me it seems clear ..

See below...

> So here is v3, how does this look? If okay, could you add Reviewed-by or
> Acked-by ? After that, I'll send out the patchbomb again with review/ack
> on patch [3/4] and [4/4]

I will ack your resent patches if they look good to me, and we can
leave it to the maintainers to add my acked-by after your
signed-off-by.

The direction looks good to me now, but see my comments below.

> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
> index 4fabcdf..df97186 100644
> --- a/arch/x86/include/asm/perf_event.h
> +++ b/arch/x86/include/asm/perf_event.h
> @@ -29,9 +29,14 @@
>  #define ARCH_PERFMON_EVENTSEL_INV			(1ULL << 23)
>  #define ARCH_PERFMON_EVENTSEL_CMASK			0xFF000000ULL
>  
> +#define AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE		(1ULL << 36)
>  #define AMD_PERFMON_EVENTSEL_GUESTONLY			(1ULL << 40)
>  #define AMD_PERFMON_EVENTSEL_HOSTONLY			(1ULL << 41)
>  
> +#define AMD_PERFMON_EVENTSEL_INT_CORE_SEL_SHIFT		37
> +#define AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK		\
> +	(0xFULL << AMD_PERFMON_EVENTSEL_INT_CORE_SEL_SHIFT)
> +

This is a bit shorter:

 AMD64_EVENTSEL_INT_CORE_SEL_*

AMD64_* refers to AMD architectural definitions in the AMD64 manuals.
AMD_PERFMON_* is actually not consistent here. Arghh, Joerg.

>  #define AMD64_EVENTSEL_EVENT	\
>  	(ARCH_PERFMON_EVENTSEL_EVENT | (0x0FULL << 32))
>  #define INTEL_ARCH_EVENT_MASK	\
> @@ -46,8 +51,12 @@
>  #define AMD64_RAW_EVENT_MASK		\
>  	(X86_RAW_EVENT_MASK          |  \
>  	 AMD64_EVENTSEL_EVENT)
> +#define AMD64_NB_EVENT_MASK		\
> +	(AMD64_EVENTSEL_EVENT        |  \
> +	 ARCH_PERFMON_EVENTSEL_UMASK)

Since this is equivalent to AMD64_RAW_EVENT_MASK a better name would
be AMD64_RAW_EVENT_MASK_NB.

>  #define AMD64_NUM_COUNTERS				4
>  #define AMD64_NUM_COUNTERS_CORE				6
> +#define AMD64_NUM_COUNTERS_NB				4
>  
>  #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL		0x3c
>  #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK		(0x00 << 8)

>  static inline int amd_pmu_addr_offset(int index)
>  {
>  	int offset;
> +	int ncore;
>  
>  	if (!index)
>  		return index;
> @@ -156,31 +163,27 @@ static inline int amd_pmu_addr_offset(int index)
>  	if (offset)
>  		return offset;
>  
> -	if (!cpu_has_perfctr_core)
> +	if (!cpu_has_perfctr_core) {
>  		offset = index;
> -	else
> +		ncore = AMD64_NUM_COUNTERS;
> +	} else {
>  		offset = index << 1;
> +		ncore = AMD64_NUM_COUNTERS_CORE;
> +	}

We still go into a block above even if we have an nb counter. See
solution below.

> +
> +	/* find offset of NB counters with respect to x86_pmu.eventsel */
> +	if (amd_nb_event_constraint &&
> +	    test_bit(index, amd_nb_event_constraint->idxmsk))
> +		offset = (MSR_F15H_NB_PERF_CTL - x86_pmu.eventsel) +
> +			 ((index - ncore) << 1);

I prefer the following paths:

	if (amd_nb_event_constraint && ...) {
		/* nb counter */
		...
	} else if (!cpu_has_perfctr_core) {
		/* core counter */
		...
	} else {
		/* legacy counter */
		...
	}

>  
>  	addr_offsets[index] = offset;
>  
>  	return offset;
>  }
>  
> -static int amd_pmu_hw_config(struct perf_event *event)
> +static int __amd_core_hw_config(struct perf_event *event)

No need for underscores...

> +/*
> + * NB counters do not support the following event select bits:
> + *   Host/Guest only
> + *   Counter mask
> + *   Invert counter mask
> + *   Edge detect
> + *   OS/User mode
> + */
> +static int __amd_nb_hw_config(struct perf_event *event)

No need for underscores...

> +{
> +	if (event->attr.exclude_user || event->attr.exclude_kernel ||
> +	    event->attr.exclude_host || event->attr.exclude_guest)
> +		return -EINVAL;
> +
> +	/* set up interrupts to be delivered only to this core */
> +	if (cpu_has_perfctr_nb) {
> +		struct cpuinfo_x86 *c = &cpu_data(event->cpu);
> +
> +		event->hw.config |= AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE;
> +		event->hw.config &= ~AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK;
> +		event->hw.config |= (0ULL | (c->cpu_core_id)) <<

Better make the cast visible:

	(u64)(c->cpu_core_id) << AMD_PERFMON_EVENTSEL_INT_CORE_SEL_SHIFT

> +			AMD_PERFMON_EVENTSEL_INT_CORE_SEL_SHIFT;
> +	}
> +
> +	event->hw.config &= ~(ARCH_PERFMON_EVENTSEL_USR |
> +			      ARCH_PERFMON_EVENTSEL_OS);
>  
> -	event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;

Since this is calculated before ...

> +	if (event->hw.config & ~(AMD64_NB_EVENT_MASK                  |
> +				 ARCH_PERFMON_EVENTSEL_INT            |
> +				 AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE |
> +				 AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK))

... we can check:

	if (event->hw.config & ~AMD64_NB_EVENT_MASK)

if we move core_sel setup after the check.

Move comment from above here too.

> +		return -EINVAL;
>  
>  	return 0;

> @@ -422,7 +485,10 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
>  	if (!(amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)))
>  		return &unconstrained;
>  
> -	return __amd_get_nb_event_constraints(cpuc, event, &unconstrained);
> +	return __amd_get_nb_event_constraints(cpuc, event,
> +					      amd_nb_event_constraint ? 
> +					      amd_nb_event_constraint :
> +					      &unconstrained);

An option would be to always set amd_nb_event_constraint to
&unconstrained per default. Or, move the check to
__amd_get_nb_event_constraints().

-Robert

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] perf, amd: Enable northbridge performance counters on AMD family 15h
  2012-11-16 19:00     ` Jacob Shin
  2012-11-16 19:32       ` Robert Richter
@ 2012-11-18 16:35       ` Robert Richter
  1 sibling, 0 replies; 22+ messages in thread
From: Robert Richter @ 2012-11-18 16:35 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Thomas Gleixner, H. Peter Anvin,
	Stephane Eranian, x86, linux-kernel

On 16.11.12 13:00:30, Jacob Shin wrote:
> > >  static int setup_event_constraints(void)
> > >  {
> > > -	if (boot_cpu_data.x86 >= 0x15)
> > > +	if (boot_cpu_data.x86 == 0x15)
> > >  		x86_pmu.get_event_constraints = amd_get_event_constraints_f15h;
> > 
> > Since this does not cover family 16h anymore, you also need to extend
> > amd_get_event_ constraints() to setup nb counters with __amd_get_nb_
> > event_constraints() if cpu_has_perfctr_nb is set.
> 
> Yes family 16h will be covered by a separate patch at a later date.

This is fine, if functionality is added later. But if you run the
patch as it is on family 16h, you need to make sure nb counters are
not enabled at all. This is not the case, since you have several checks
for cpu_has_perfctr_nb. This code must also run on family 16h as it
is, thus you need to strictly disable nb counters in this case, or
enable them.

-Robert

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] perf, amd: Enable northbridge performance counters on AMD family 15h
  2012-11-16 19:32       ` Robert Richter
@ 2012-11-16 21:57         ` Jacob Shin
  2012-11-26 12:15           ` Robert Richter
  2012-11-28 17:42         ` Jacob Shin
  1 sibling, 1 reply; 22+ messages in thread
From: Jacob Shin @ 2012-11-16 21:57 UTC (permalink / raw)
  To: Robert Richter
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Thomas Gleixner, H. Peter Anvin,
	Stephane Eranian, x86, linux-kernel

On Fri, Nov 16, 2012 at 08:32:24PM +0100, Robert Richter wrote:
> On 16.11.12 13:00:30, Jacob Shin wrote:
> > On Fri, Nov 16, 2012 at 07:43:44PM +0100, Robert Richter wrote:
> > > On 15.11.12 15:31:53, Jacob Shin wrote:
> > > > @@ -156,31 +161,28 @@ static inline int amd_pmu_addr_offset(int index)
> > > >  	if (offset)
> > > >  		return offset;
> > > >  
> > > > -	if (!cpu_has_perfctr_core)
> > > > +	if (!cpu_has_perfctr_core) {
> > > >  		offset = index;
> > > > -	else
> > > > +		ncore = AMD64_NUM_COUNTERS;
> > > > +	} else {
> 
> First calculation:
> 
> > > >  		offset = index << 1;
> > > > +		ncore = AMD64_NUM_COUNTERS_CORE;
> > > > +	}
> > > > +
> > > > +	/* find offset of NB counters with respect to x86_pmu.eventsel */
> > > > +	if (cpu_has_perfctr_nb) {
> > > > +		if (index >= ncore && index < (ncore + AMD64_NUM_COUNTERS_NB))
> 
> Second calculation:
> 
> > > > +			offset = (MSR_F15H_NB_PERF_CTL - x86_pmu.eventsel) +
> > > > +				 ((index - ncore) << 1);
> > > > +	}
> > > 
> > > There is duplicate calculation of offset in some cases. Better we
> > > avoid this.
> > 
> > Which cases? The code calculates the offset for a given index the very
> > first time it is called, stores it, and uses that stored offset from
> > then on. My [PATCH 3/4] sets that up.
> 
> One case above.
> 
> It looks like the paths should be defined more clearly.

Per comments above the function, I was logically going down the cases,
1. is this a legacy counter?
2. is this a perfctr_core counter?
3. is this a perfctr_nb counter?

To me it seems clear ..

> 
> > > > @@ -323,6 +368,16 @@ __amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *ev
> > > >  	if (new == -1)
> > > >  		return &emptyconstraint;
> > > >  
> > > > +	/* set up interrupts to be delivered only to this core */
> > > > +	if (cpu_has_perfctr_nb) {
> > > > +		struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
> > > > +
> > > > +		hwc->config |= AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE;
> > > > +		hwc->config &= ~AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK;
> > > > +		hwc->config |= (0ULL | (c->cpu_core_id)) <<
> > > > +			AMD_PERFMON_EVENTSEL_INT_CORE_SEL_SHIFT;
> > > > +	}
> > > 
> > > Looks like a hack to me. The constaints handler is only supposed to
> > > determine constraints and not to touch anything in the event's
> > > structure. This should be done later when setting up hwc->config in
> > > amd_nb_event_config() or so.
> > 
> > Hm.. is the hwc->config called after constraints have been set up
> > already? If so, I'll change it ..
> 
> Should be, since the hw register can be setup only after the counter
> is selected.
> 
> > 
> > > 
> > > I also do not think that smp_processor_id() is the right thing to do
> > > here. Since cpu_hw_events is per-cpu the cpu is already selected.
> > 
> > Yeah, I could not figure out how to get the cpu number from cpuc. Is
> > there a container_of kind of thing that I can do to get the cpu number
> > ?
> 
> At some point event->cpu is assigned, I think.

Great, thanks for this hint!

So here is v3, how does this look? If okay, could you add Reviewed-by or
Acked-by ? After that, I'll send out the patchbomb again with review/ack
on patch [3/4] and [4/4]

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 8c297aa..b05c722 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -167,6 +167,7 @@
 #define X86_FEATURE_TBM		(6*32+21) /* trailing bit manipulations */
 #define X86_FEATURE_TOPOEXT	(6*32+22) /* topology extensions CPUID leafs */
 #define X86_FEATURE_PERFCTR_CORE (6*32+23) /* core performance counter extensions */
+#define X86_FEATURE_PERFCTR_NB  (6*32+24) /* core performance counter extensions */
 
 /*
  * Auxiliary flags: Linux defined - For features scattered in various
@@ -308,6 +309,7 @@ extern const char * const x86_power_flags[32];
 #define cpu_has_hypervisor	boot_cpu_has(X86_FEATURE_HYPERVISOR)
 #define cpu_has_pclmulqdq	boot_cpu_has(X86_FEATURE_PCLMULQDQ)
 #define cpu_has_perfctr_core	boot_cpu_has(X86_FEATURE_PERFCTR_CORE)
+#define cpu_has_perfctr_nb	boot_cpu_has(X86_FEATURE_PERFCTR_NB)
 #define cpu_has_cx8		boot_cpu_has(X86_FEATURE_CX8)
 #define cpu_has_cx16		boot_cpu_has(X86_FEATURE_CX16)
 #define cpu_has_eager_fpu	boot_cpu_has(X86_FEATURE_EAGER_FPU)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 7f0edce..e67ff1e 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -157,6 +157,8 @@
 /* Fam 15h MSRs */
 #define MSR_F15H_PERF_CTL		0xc0010200
 #define MSR_F15H_PERF_CTR		0xc0010201
+#define MSR_F15H_NB_PERF_CTL		0xc0010240
+#define MSR_F15H_NB_PERF_CTR		0xc0010241
 
 /* Fam 10h MSRs */
 #define MSR_FAM10H_MMIO_CONF_BASE	0xc0010058
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 4fabcdf..df97186 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -29,9 +29,14 @@
 #define ARCH_PERFMON_EVENTSEL_INV			(1ULL << 23)
 #define ARCH_PERFMON_EVENTSEL_CMASK			0xFF000000ULL
 
+#define AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE		(1ULL << 36)
 #define AMD_PERFMON_EVENTSEL_GUESTONLY			(1ULL << 40)
 #define AMD_PERFMON_EVENTSEL_HOSTONLY			(1ULL << 41)
 
+#define AMD_PERFMON_EVENTSEL_INT_CORE_SEL_SHIFT		37
+#define AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK		\
+	(0xFULL << AMD_PERFMON_EVENTSEL_INT_CORE_SEL_SHIFT)
+
 #define AMD64_EVENTSEL_EVENT	\
 	(ARCH_PERFMON_EVENTSEL_EVENT | (0x0FULL << 32))
 #define INTEL_ARCH_EVENT_MASK	\
@@ -46,8 +51,12 @@
 #define AMD64_RAW_EVENT_MASK		\
 	(X86_RAW_EVENT_MASK          |  \
 	 AMD64_EVENTSEL_EVENT)
+#define AMD64_NB_EVENT_MASK		\
+	(AMD64_EVENTSEL_EVENT        |  \
+	 ARCH_PERFMON_EVENTSEL_UMASK)
 #define AMD64_NUM_COUNTERS				4
 #define AMD64_NUM_COUNTERS_CORE				6
+#define AMD64_NUM_COUNTERS_NB				4
 
 #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL		0x3c
 #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK		(0x00 << 8)
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index d6e3337..80ad803 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -132,6 +132,8 @@ static u64 amd_pmu_event_map(int hw_event)
 	return amd_perfmon_event_map[hw_event];
 }
 
+static struct event_constraint *amd_nb_event_constraint;
+
 /*
  * Previously calculated offsets
  */
@@ -143,10 +145,15 @@ static unsigned int addr_offsets[X86_PMC_IDX_MAX] __read_mostly;
  *
  * CPUs with core performance counter extensions:
  *   6 counters starting at 0xc0010200 each offset by 2
+ *
+ * CPUs with north bridge performance counter extensions:
+ *   4 additional counters starting at 0xc0010240 each offset by 2
+ *   (indexed right above either one of the above core counters)
  */
 static inline int amd_pmu_addr_offset(int index)
 {
 	int offset;
+	int ncore;
 
 	if (!index)
 		return index;
@@ -156,31 +163,27 @@ static inline int amd_pmu_addr_offset(int index)
 	if (offset)
 		return offset;
 
-	if (!cpu_has_perfctr_core)
+	if (!cpu_has_perfctr_core) {
 		offset = index;
-	else
+		ncore = AMD64_NUM_COUNTERS;
+	} else {
 		offset = index << 1;
+		ncore = AMD64_NUM_COUNTERS_CORE;
+	}
+
+	/* find offset of NB counters with respect to x86_pmu.eventsel */
+	if (amd_nb_event_constraint &&
+	    test_bit(index, amd_nb_event_constraint->idxmsk))
+		offset = (MSR_F15H_NB_PERF_CTL - x86_pmu.eventsel) +
+			 ((index - ncore) << 1);
 
 	addr_offsets[index] = offset;
 
 	return offset;
 }
 
-static int amd_pmu_hw_config(struct perf_event *event)
+static int __amd_core_hw_config(struct perf_event *event)
 {
-	int ret;
-
-	/* pass precise event sampling to ibs: */
-	if (event->attr.precise_ip && get_ibs_caps())
-		return -ENOENT;
-
-	ret = x86_pmu_hw_config(event);
-	if (ret)
-		return ret;
-
-	if (has_branch_stack(event))
-		return -EOPNOTSUPP;
-
 	if (event->attr.exclude_host && event->attr.exclude_guest)
 		/*
 		 * When HO == GO == 1 the hardware treats that as GO == HO == 0
@@ -194,10 +197,41 @@ static int amd_pmu_hw_config(struct perf_event *event)
 	else if (event->attr.exclude_guest)
 		event->hw.config |= AMD_PERFMON_EVENTSEL_HOSTONLY;
 
-	if (event->attr.type != PERF_TYPE_RAW)
-		return 0;
+	return 0;
+}
+
+/*
+ * NB counters do not support the following event select bits:
+ *   Host/Guest only
+ *   Counter mask
+ *   Invert counter mask
+ *   Edge detect
+ *   OS/User mode
+ */
+static int __amd_nb_hw_config(struct perf_event *event)
+{
+	if (event->attr.exclude_user || event->attr.exclude_kernel ||
+	    event->attr.exclude_host || event->attr.exclude_guest)
+		return -EINVAL;
+
+	/* set up interrupts to be delivered only to this core */
+	if (cpu_has_perfctr_nb) {
+		struct cpuinfo_x86 *c = &cpu_data(event->cpu);
+
+		event->hw.config |= AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE;
+		event->hw.config &= ~AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK;
+		event->hw.config |= (0ULL | (c->cpu_core_id)) <<
+			AMD_PERFMON_EVENTSEL_INT_CORE_SEL_SHIFT;
+	}
+
+	event->hw.config &= ~(ARCH_PERFMON_EVENTSEL_USR |
+			      ARCH_PERFMON_EVENTSEL_OS);
 
-	event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
+	if (event->hw.config & ~(AMD64_NB_EVENT_MASK                  |
+				 ARCH_PERFMON_EVENTSEL_INT            |
+				 AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE |
+				 AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK))
+		return -EINVAL;
 
 	return 0;
 }
@@ -215,6 +249,11 @@ static inline int amd_is_nb_event(struct hw_perf_event *hwc)
 	return (hwc->config & 0xe0) == 0xe0;
 }
 
+static inline int amd_is_perfctr_nb_event(struct hw_perf_event *hwc)
+{
+	return amd_nb_event_constraint && amd_is_nb_event(hwc);
+}
+
 static inline int amd_has_nb(struct cpu_hw_events *cpuc)
 {
 	struct amd_nb *nb = cpuc->amd_nb;
@@ -222,6 +261,30 @@ static inline int amd_has_nb(struct cpu_hw_events *cpuc)
 	return nb && nb->nb_id != -1;
 }
 
+static int amd_pmu_hw_config(struct perf_event *event)
+{
+	int ret;
+
+	/* pass precise event sampling to ibs: */
+	if (event->attr.precise_ip && get_ibs_caps())
+		return -ENOENT;
+
+	if (has_branch_stack(event))
+		return -EOPNOTSUPP;
+
+	ret = x86_pmu_hw_config(event);
+	if (ret)
+		return ret;
+
+	if (event->attr.type == PERF_TYPE_RAW)
+		event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
+
+	if (amd_is_perfctr_nb_event(&event->hw))
+		return __amd_nb_hw_config(event);
+
+	return __amd_core_hw_config(event);
+}
+
 static void __amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
 					   struct perf_event *event)
 {
@@ -422,7 +485,10 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 	if (!(amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)))
 		return &unconstrained;
 
-	return __amd_get_nb_event_constraints(cpuc, event, &unconstrained);
+	return __amd_get_nb_event_constraints(cpuc, event,
+					      amd_nb_event_constraint ? 
+					      amd_nb_event_constraint :
+					      &unconstrained);
 }
 
 static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
@@ -521,6 +587,9 @@ static struct event_constraint amd_f15_PMC30 = EVENT_CONSTRAINT_OVERLAP(0, 0x09,
 static struct event_constraint amd_f15_PMC50 = EVENT_CONSTRAINT(0, 0x3F, 0);
 static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
 
+static struct event_constraint amd_NBPMC96 = EVENT_CONSTRAINT(0, 0x3C0, 0);
+static struct event_constraint amd_NBPMC74 = EVENT_CONSTRAINT(0, 0xF0, 0);
+
 static struct event_constraint *
 amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *event)
 {
@@ -586,8 +655,11 @@ amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *ev
 			return &amd_f15_PMC20;
 		}
 	case AMD_EVENT_NB:
-		/* not yet implemented */
-		return &emptyconstraint;
+		if (cpuc->is_fake)
+			return amd_nb_event_constraint;
+
+		return __amd_get_nb_event_constraints(cpuc, event,
+						      amd_nb_event_constraint);
 	default:
 		return &emptyconstraint;
 	}
@@ -625,7 +697,7 @@ static __initconst const struct x86_pmu amd_pmu = {
 
 static int setup_event_constraints(void)
 {
-	if (boot_cpu_data.x86 >= 0x15)
+	if (boot_cpu_data.x86 == 0x15)
 		x86_pmu.get_event_constraints = amd_get_event_constraints_f15h;
 	return 0;
 }
@@ -655,6 +727,23 @@ static int setup_perfctr_core(void)
 	return 0;
 }
 
+static int setup_perfctr_nb(void)
+{
+	if (!cpu_has_perfctr_nb)
+		return -ENODEV;
+
+	x86_pmu.num_counters += AMD64_NUM_COUNTERS_NB;
+
+	if (cpu_has_perfctr_core)
+		amd_nb_event_constraint = &amd_NBPMC96;
+	else
+		amd_nb_event_constraint = &amd_NBPMC74;
+
+	printk(KERN_INFO "perf: AMD northbridge performance counters detected\n");
+
+	return 0;
+}
+
 __init int amd_pmu_init(void)
 {
 	/* Performance-monitoring supported from K7 and later: */
@@ -665,6 +754,7 @@ __init int amd_pmu_init(void)
 
 	setup_event_constraints();
 	setup_perfctr_core();
+	setup_perfctr_nb();
 
 	/* Events are common for all AMDs */
 	memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] perf, amd: Enable northbridge performance counters on AMD family 15h
  2012-11-16 19:00     ` Jacob Shin
@ 2012-11-16 19:32       ` Robert Richter
  2012-11-16 21:57         ` Jacob Shin
  2012-11-28 17:42         ` Jacob Shin
  2012-11-18 16:35       ` Robert Richter
  1 sibling, 2 replies; 22+ messages in thread
From: Robert Richter @ 2012-11-16 19:32 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Thomas Gleixner, H. Peter Anvin,
	Stephane Eranian, x86, linux-kernel

On 16.11.12 13:00:30, Jacob Shin wrote:
> On Fri, Nov 16, 2012 at 07:43:44PM +0100, Robert Richter wrote:
> > On 15.11.12 15:31:53, Jacob Shin wrote:
> > > @@ -156,31 +161,28 @@ static inline int amd_pmu_addr_offset(int index)
> > >  	if (offset)
> > >  		return offset;
> > >  
> > > -	if (!cpu_has_perfctr_core)
> > > +	if (!cpu_has_perfctr_core) {
> > >  		offset = index;
> > > -	else
> > > +		ncore = AMD64_NUM_COUNTERS;
> > > +	} else {

First calculation:

> > >  		offset = index << 1;
> > > +		ncore = AMD64_NUM_COUNTERS_CORE;
> > > +	}
> > > +
> > > +	/* find offset of NB counters with respect to x86_pmu.eventsel */
> > > +	if (cpu_has_perfctr_nb) {
> > > +		if (index >= ncore && index < (ncore + AMD64_NUM_COUNTERS_NB))

Second calculation:

> > > +			offset = (MSR_F15H_NB_PERF_CTL - x86_pmu.eventsel) +
> > > +				 ((index - ncore) << 1);
> > > +	}
> > 
> > There is duplicate calculation of offset in some cases. Better we
> > avoid this.
> 
> Which cases? The code calculates the offset for a given index the very
> first time it is called, stores it, and uses that stored offset from
> then on. My [PATCH 3/4] sets that up.

One case above.

It looks like the paths should be defined more clearly.

> > > @@ -323,6 +368,16 @@ __amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *ev
> > >  	if (new == -1)
> > >  		return &emptyconstraint;
> > >  
> > > +	/* set up interrupts to be delivered only to this core */
> > > +	if (cpu_has_perfctr_nb) {
> > > +		struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
> > > +
> > > +		hwc->config |= AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE;
> > > +		hwc->config &= ~AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK;
> > > +		hwc->config |= (0ULL | (c->cpu_core_id)) <<
> > > +			AMD_PERFMON_EVENTSEL_INT_CORE_SEL_SHIFT;
> > > +	}
> > 
> > Looks like a hack to me. The constraints handler is only supposed to
> > determine constraints and not to touch anything in the event's
> > structure. This should be done later when setting up hwc->config in
> > amd_nb_event_config() or so.
> 
> Hm.. is the hwc->config set up after the constraints have already been
> determined? If so, I'll change it ..

Should be, since the hw register can be set up only after the counter
is selected.

> 
> > 
> > I also do not think that smp_processor_id() is the right thing to do
> > here. Since cpu_hw_events is per-cpu the cpu is already selected.
> 
> Yeah, I could not figure out how to get the cpu number from cpuc. Is
> there a container_of kind of thing that I can do to get the cpu
> number?

At some point event->cpu is assigned, I think.

-Robert


* Re: [PATCH 4/4] perf, amd: Enable northbridge performance counters on AMD family 15h
  2012-11-16 18:43   ` Robert Richter
@ 2012-11-16 19:00     ` Jacob Shin
  2012-11-16 19:32       ` Robert Richter
  2012-11-18 16:35       ` Robert Richter
  0 siblings, 2 replies; 22+ messages in thread
From: Jacob Shin @ 2012-11-16 19:00 UTC (permalink / raw)
  To: Robert Richter
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Thomas Gleixner, H. Peter Anvin,
	Stephane Eranian, x86, linux-kernel

On Fri, Nov 16, 2012 at 07:43:44PM +0100, Robert Richter wrote:
> Jacob,
> 
> On 15.11.12 15:31:53, Jacob Shin wrote:
> > On AMD family 15h processors, there are 4 new performance counters
> > (in addition to 6 core performance counters) that can be used for
> > counting northbridge events (i.e. DRAM accesses). Their bit fields are
> > almost identical to the core performance counters. However, unlike the
> > core performance counters, these MSRs are shared between multiple
> > cores (that share the same northbridge). We will reuse the same code
> > path as existing family 10h northbridge event constraints handler
> > logic to enforce this sharing.
> > 
> > These new counters are indexed contiguously right above the existing
> > core performance counters, and their indexes correspond to RDPMC ECX
> > values.
> > 
> > Signed-off-by: Jacob Shin <jacob.shin@amd.com>
> 
> your approach looks ok to me in general, but see my comments inline.
> 
> > @@ -156,31 +161,28 @@ static inline int amd_pmu_addr_offset(int index)
> >  	if (offset)
> >  		return offset;
> >  
> > -	if (!cpu_has_perfctr_core)
> > +	if (!cpu_has_perfctr_core) {
> >  		offset = index;
> > -	else
> > +		ncore = AMD64_NUM_COUNTERS;
> > +	} else {
> >  		offset = index << 1;
> > +		ncore = AMD64_NUM_COUNTERS_CORE;
> > +	}
> > +
> > +	/* find offset of NB counters with respect to x86_pmu.eventsel */
> > +	if (cpu_has_perfctr_nb) {
> > +		if (index >= ncore && index < (ncore + AMD64_NUM_COUNTERS_NB))
> > +			offset = (MSR_F15H_NB_PERF_CTL - x86_pmu.eventsel) +
> > +				 ((index - ncore) << 1);
> > +	}
> 
> There is duplicate calculation of offset in some cases. Better we
> avoid this.

Which cases? The code calculates the offset for a given index the very
first time it is called, stores it, and uses that stored offset from
then on. My [PATCH 3/4] sets that up.

> 
> > +static int __amd_nb_hw_config(struct perf_event *event)
> > +{
> > +	if (event->attr.exclude_user || event->attr.exclude_kernel ||
> > +	    event->attr.exclude_host || event->attr.exclude_guest)
> > +		return -EINVAL;
> >  
> > -	event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
> > +	event->hw.config &= ~ARCH_PERFMON_EVENTSEL_USR;
> > +	event->hw.config &= ~ARCH_PERFMON_EVENTSEL_OS;
> > +
> > +	if (event->hw.config & ~(AMD64_EVENTSEL_EVENT        |
> > +				 ARCH_PERFMON_EVENTSEL_UMASK |
> > +				 ARCH_PERFMON_EVENTSEL_INT   |
> > +				 AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE |
> > +				 AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK))
> > +		return -EINVAL;
> >  
> >  	return 0;
> >  }
> 
> Comments are missing and an AMD64_NB_EVENT_MASK macro should be
> defined for the above. See my previous version for reference:
> 
> /*
>  * AMD NB counters (MSRs 0xc0010240 etc.) do not support the following
>  * flags:
>  *
>  *  Host/Guest Only
>  *  Counter Mask
>  *  Invert Comparison
>  *  Edge Detect
>  *  Operating-System Mode
>  *  User Mode
>  *
>  * Try to fix the config for default settings, otherwise fail.
>  */
> static int amd_nb_event_config(struct perf_event *event)
> {
>        if (!amd_is_nb_perfctr_event(&event->hw))
>                return 0;
> 
>        if (event->attr.exclude_host || event->attr.exclude_guest
>            || event->attr.exclude_user || event->attr.exclude_kernel)
>                goto fail;
> 
>        event->hw.config &= ~(ARCH_PERFMON_EVENTSEL_USR | ARCH_PERFMON_EVENTSEL_OS);
> 
>        if (event->hw.config & ~(AMD64_NB_EVENT_MASK | ARCH_PERFMON_EVENTSEL_INT))
>                goto fail;
> 
>        return 0;
> fail:
>        pr_debug("Invalid nb counter config value: %016Lx\n", event->hw.config);
>        return -EINVAL;
> }
> 
> 
> > @@ -323,6 +368,16 @@ __amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *ev
> >  	if (new == -1)
> >  		return &emptyconstraint;
> >  
> > +	/* set up interrupts to be delivered only to this core */
> > +	if (cpu_has_perfctr_nb) {
> > +		struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
> > +
> > +		hwc->config |= AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE;
> > +		hwc->config &= ~AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK;
> > +		hwc->config |= (0ULL | (c->cpu_core_id)) <<
> > +			AMD_PERFMON_EVENTSEL_INT_CORE_SEL_SHIFT;
> > +	}
> 
> Looks like a hack to me. The constraints handler is only supposed to
> determine constraints and not to touch anything in the event's
> structure. This should be done later when setting up hwc->config in
> amd_nb_event_config() or so.

Hm.. is the hwc->config set up after the constraints have already been
determined? If so, I'll change it ..

> 
> I also do not think that smp_processor_id() is the right thing to do
> here. Since cpu_hw_events is per-cpu the cpu is already selected.

Yeah, I could not figure out how to get the cpu number from cpuc. Is
there a container_of kind of thing that I can do to get the cpu
number?

> 
> > +
> >  	return &nb->event_constraints[new];
> >  }
> >  
> > @@ -520,6 +575,7 @@ static struct event_constraint amd_f15_PMC3  = EVENT_CONSTRAINT(0, 0x08, 0);
> >  static struct event_constraint amd_f15_PMC30 = EVENT_CONSTRAINT_OVERLAP(0, 0x09, 0);
> >  static struct event_constraint amd_f15_PMC50 = EVENT_CONSTRAINT(0, 0x3F, 0);
> >  static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
> > +static struct event_constraint amd_f15_NBPMC30 = EVENT_CONSTRAINT(0, 0x3C0, 0);
> 
> The counter index mask depends on the number of core counters which is
> either 4 or 6 (depending on cpu_has_perfctr_core).
> 
> >  static int setup_event_constraints(void)
> >  {
> > -	if (boot_cpu_data.x86 >= 0x15)
> > +	if (boot_cpu_data.x86 == 0x15)
> >  		x86_pmu.get_event_constraints = amd_get_event_constraints_f15h;
> 
> Since this does not cover family 16h anymore, you also need to extend
> amd_get_event_constraints() to set up nb counters with
> __amd_get_nb_event_constraints() if cpu_has_perfctr_nb is set.

Yes family 16h will be covered by a separate patch at a later date.

> 
> >  	return 0;
> >  }
> > @@ -655,6 +714,18 @@ static int setup_perfctr_core(void)
> >  	return 0;
> >  }
> >  
> > +static int setup_perfctr_nb(void)
> > +{
> > +	if (!cpu_has_perfctr_nb)
> > +		return -ENODEV;
> > +
> > +	x86_pmu.num_counters += AMD64_NUM_COUNTERS_NB;
> 
> You should add a nb counter mask here which is used for nb counters.
> 
> The mask can be either 0x3c0 or 0x0f0 depending on cpu_has_perfctr_core.
> For later use, you will need it at various locations.
> 
> In general I would also try to write the code in a way that makes
> further cpu_has_perfctr_nb lookups unnecessary. There are many tests
> of this in your code.

Okay will spin V3 soon.

> 
> -Robert
> 



* Re: [PATCH 4/4] perf, amd: Enable northbridge performance counters on AMD family 15h
  2012-11-15 21:31 ` [PATCH 4/4] perf, amd: Enable northbridge performance counters on AMD family 15h Jacob Shin
@ 2012-11-16 18:43   ` Robert Richter
  2012-11-16 19:00     ` Jacob Shin
  0 siblings, 1 reply; 22+ messages in thread
From: Robert Richter @ 2012-11-16 18:43 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Thomas Gleixner, H. Peter Anvin,
	Stephane Eranian, x86, linux-kernel

Jacob,

On 15.11.12 15:31:53, Jacob Shin wrote:
> On AMD family 15h processors, there are 4 new performance counters
> (in addition to 6 core performance counters) that can be used for
> counting northbridge events (i.e. DRAM accesses). Their bit fields are
> almost identical to the core performance counters. However, unlike the
> core performance counters, these MSRs are shared between multiple
> cores (that share the same northbridge). We will reuse the same code
> path as existing family 10h northbridge event constraints handler
> logic to enforce this sharing.
> 
> These new counters are indexed contiguously right above the existing
> core performance counters, and their indexes correspond to RDPMC ECX
> values.
> 
> Signed-off-by: Jacob Shin <jacob.shin@amd.com>

your approach looks ok to me in general, but see my comments inline.

> @@ -156,31 +161,28 @@ static inline int amd_pmu_addr_offset(int index)
>  	if (offset)
>  		return offset;
>  
> -	if (!cpu_has_perfctr_core)
> +	if (!cpu_has_perfctr_core) {
>  		offset = index;
> -	else
> +		ncore = AMD64_NUM_COUNTERS;
> +	} else {
>  		offset = index << 1;
> +		ncore = AMD64_NUM_COUNTERS_CORE;
> +	}
> +
> +	/* find offset of NB counters with respect to x86_pmu.eventsel */
> +	if (cpu_has_perfctr_nb) {
> +		if (index >= ncore && index < (ncore + AMD64_NUM_COUNTERS_NB))
> +			offset = (MSR_F15H_NB_PERF_CTL - x86_pmu.eventsel) +
> +				 ((index - ncore) << 1);
> +	}

There is duplicate calculation of offset in some cases. Better we
avoid this.

> +static int __amd_nb_hw_config(struct perf_event *event)
> +{
> +	if (event->attr.exclude_user || event->attr.exclude_kernel ||
> +	    event->attr.exclude_host || event->attr.exclude_guest)
> +		return -EINVAL;
>  
> -	event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
> +	event->hw.config &= ~ARCH_PERFMON_EVENTSEL_USR;
> +	event->hw.config &= ~ARCH_PERFMON_EVENTSEL_OS;
> +
> +	if (event->hw.config & ~(AMD64_EVENTSEL_EVENT        |
> +				 ARCH_PERFMON_EVENTSEL_UMASK |
> +				 ARCH_PERFMON_EVENTSEL_INT   |
> +				 AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE |
> +				 AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK))
> +		return -EINVAL;
>  
>  	return 0;
>  }

Comments are missing and an AMD64_NB_EVENT_MASK macro should be
defined for the above. See my previous version for reference:

/*
 * AMD NB counters (MSRs 0xc0010240 etc.) do not support the following
 * flags:
 *
 *  Host/Guest Only
 *  Counter Mask
 *  Invert Comparison
 *  Edge Detect
 *  Operating-System Mode
 *  User Mode
 *
 * Try to fix the config for default settings, otherwise fail.
 */
static int amd_nb_event_config(struct perf_event *event)
{
       if (!amd_is_nb_perfctr_event(&event->hw))
               return 0;

       if (event->attr.exclude_host || event->attr.exclude_guest
           || event->attr.exclude_user || event->attr.exclude_kernel)
               goto fail;

       event->hw.config &= ~(ARCH_PERFMON_EVENTSEL_USR | ARCH_PERFMON_EVENTSEL_OS);

       if (event->hw.config & ~(AMD64_NB_EVENT_MASK | ARCH_PERFMON_EVENTSEL_INT))
               goto fail;

       return 0;
fail:
       pr_debug("Invalid nb counter config value: %016Lx\n", event->hw.config);
       return -EINVAL;
}


> @@ -323,6 +368,16 @@ __amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *ev
>  	if (new == -1)
>  		return &emptyconstraint;
>  
> +	/* set up interrupts to be delivered only to this core */
> +	if (cpu_has_perfctr_nb) {
> +		struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
> +
> +		hwc->config |= AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE;
> +		hwc->config &= ~AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK;
> +		hwc->config |= (0ULL | (c->cpu_core_id)) <<
> +			AMD_PERFMON_EVENTSEL_INT_CORE_SEL_SHIFT;
> +	}

Looks like a hack to me. The constraints handler is only supposed to
determine constraints and not to touch anything in the event's
structure. This should be done later when setting up hwc->config in
amd_nb_event_config() or so.

I also do not think that smp_processor_id() is the right thing to do
here. Since cpu_hw_events is per-cpu the cpu is already selected.

> +
>  	return &nb->event_constraints[new];
>  }
>  
> @@ -520,6 +575,7 @@ static struct event_constraint amd_f15_PMC3  = EVENT_CONSTRAINT(0, 0x08, 0);
>  static struct event_constraint amd_f15_PMC30 = EVENT_CONSTRAINT_OVERLAP(0, 0x09, 0);
>  static struct event_constraint amd_f15_PMC50 = EVENT_CONSTRAINT(0, 0x3F, 0);
>  static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
> +static struct event_constraint amd_f15_NBPMC30 = EVENT_CONSTRAINT(0, 0x3C0, 0);

The counter index mask depends on the number of core counters which is
either 4 or 6 (depending on cpu_has_perfctr_core).

>  static int setup_event_constraints(void)
>  {
> -	if (boot_cpu_data.x86 >= 0x15)
> +	if (boot_cpu_data.x86 == 0x15)
>  		x86_pmu.get_event_constraints = amd_get_event_constraints_f15h;

Since this does not cover family 16h anymore, you also need to extend
amd_get_event_constraints() to set up nb counters with
__amd_get_nb_event_constraints() if cpu_has_perfctr_nb is set.

>  	return 0;
>  }
> @@ -655,6 +714,18 @@ static int setup_perfctr_core(void)
>  	return 0;
>  }
>  
> +static int setup_perfctr_nb(void)
> +{
> +	if (!cpu_has_perfctr_nb)
> +		return -ENODEV;
> +
> +	x86_pmu.num_counters += AMD64_NUM_COUNTERS_NB;

You should add a nb counter mask here which is used for nb counters.

The mask can be either 0x3c0 or 0x0f0 depending on cpu_has_perfctr_core.
For later use, you will need it at various locations.

In general I would also try to write the code in a way that makes
further cpu_has_perfctr_nb lookups unnecessary. There are many tests
of this in your code.

-Robert


* [PATCH 4/4] perf, amd: Enable northbridge performance counters on AMD family 15h
  2012-11-15 21:31 [PATCH V2 " Jacob Shin
@ 2012-11-15 21:31 ` Jacob Shin
  2012-11-16 18:43   ` Robert Richter
  0 siblings, 1 reply; 22+ messages in thread
From: Jacob Shin @ 2012-11-15 21:31 UTC (permalink / raw)
  To: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Thomas Gleixner, H. Peter Anvin, Stephane Eranian,
	Robert Richter, x86, linux-kernel, Jacob Shin

On AMD family 15h processors, there are 4 new performance counters
(in addition to 6 core performance counters) that can be used for
counting northbridge events (i.e. DRAM accesses). Their bit fields are
almost identical to the core performance counters. However, unlike the
core performance counters, these MSRs are shared between multiple
cores (that share the same northbridge). We will reuse the same code
path as existing family 10h northbridge event constraints handler
logic to enforce this sharing.

These new counters are indexed contiguously right above the existing
core performance counters, and their indexes correspond to RDPMC ECX
values.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
---
 arch/x86/include/asm/cpufeature.h    |    2 +
 arch/x86/include/asm/msr-index.h     |    2 +
 arch/x86/include/asm/perf_event.h    |    6 ++
 arch/x86/kernel/cpu/perf_event_amd.c |  116 +++++++++++++++++++++++++++-------
 4 files changed, 104 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 8c297aa..b05c722 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -167,6 +167,7 @@
 #define X86_FEATURE_TBM		(6*32+21) /* trailing bit manipulations */
 #define X86_FEATURE_TOPOEXT	(6*32+22) /* topology extensions CPUID leafs */
 #define X86_FEATURE_PERFCTR_CORE (6*32+23) /* core performance counter extensions */
+#define X86_FEATURE_PERFCTR_NB  (6*32+24) /* NB performance counter extensions */
 
 /*
  * Auxiliary flags: Linux defined - For features scattered in various
@@ -308,6 +309,7 @@ extern const char * const x86_power_flags[32];
 #define cpu_has_hypervisor	boot_cpu_has(X86_FEATURE_HYPERVISOR)
 #define cpu_has_pclmulqdq	boot_cpu_has(X86_FEATURE_PCLMULQDQ)
 #define cpu_has_perfctr_core	boot_cpu_has(X86_FEATURE_PERFCTR_CORE)
+#define cpu_has_perfctr_nb	boot_cpu_has(X86_FEATURE_PERFCTR_NB)
 #define cpu_has_cx8		boot_cpu_has(X86_FEATURE_CX8)
 #define cpu_has_cx16		boot_cpu_has(X86_FEATURE_CX16)
 #define cpu_has_eager_fpu	boot_cpu_has(X86_FEATURE_EAGER_FPU)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 7f0edce..e67ff1e 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -157,6 +157,8 @@
 /* Fam 15h MSRs */
 #define MSR_F15H_PERF_CTL		0xc0010200
 #define MSR_F15H_PERF_CTR		0xc0010201
+#define MSR_F15H_NB_PERF_CTL		0xc0010240
+#define MSR_F15H_NB_PERF_CTR		0xc0010241
 
 /* Fam 10h MSRs */
 #define MSR_FAM10H_MMIO_CONF_BASE	0xc0010058
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 4fabcdf..a610ddb 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -29,9 +29,14 @@
 #define ARCH_PERFMON_EVENTSEL_INV			(1ULL << 23)
 #define ARCH_PERFMON_EVENTSEL_CMASK			0xFF000000ULL
 
+#define AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE		(1ULL << 36)
 #define AMD_PERFMON_EVENTSEL_GUESTONLY			(1ULL << 40)
 #define AMD_PERFMON_EVENTSEL_HOSTONLY			(1ULL << 41)
 
+#define AMD_PERFMON_EVENTSEL_INT_CORE_SEL_SHIFT		37
+#define AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK		\
+	(0xFULL << AMD_PERFMON_EVENTSEL_INT_CORE_SEL_SHIFT)
+
 #define AMD64_EVENTSEL_EVENT	\
 	(ARCH_PERFMON_EVENTSEL_EVENT | (0x0FULL << 32))
 #define INTEL_ARCH_EVENT_MASK	\
@@ -48,6 +53,7 @@
 	 AMD64_EVENTSEL_EVENT)
 #define AMD64_NUM_COUNTERS				4
 #define AMD64_NUM_COUNTERS_CORE				6
+#define AMD64_NUM_COUNTERS_NB				4
 
 #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL		0x3c
 #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK		(0x00 << 8)
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index d6e3337..2fb7b8c 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -143,10 +143,15 @@ static unsigned int addr_offsets[X86_PMC_IDX_MAX] __read_mostly;
  *
  * CPUs with core performance counter extensions:
  *   6 counters starting at 0xc0010200 each offset by 2
+ *
+ * CPUs with north bridge performance counter extensions:
+ *   4 additional counters starting at 0xc0010240 each offset by 2
+ *   (indexed right above either one of the above core counters)
  */
 static inline int amd_pmu_addr_offset(int index)
 {
 	int offset;
+	int ncore;
 
 	if (!index)
 		return index;
@@ -156,31 +161,28 @@ static inline int amd_pmu_addr_offset(int index)
 	if (offset)
 		return offset;
 
-	if (!cpu_has_perfctr_core)
+	if (!cpu_has_perfctr_core) {
 		offset = index;
-	else
+		ncore = AMD64_NUM_COUNTERS;
+	} else {
 		offset = index << 1;
+		ncore = AMD64_NUM_COUNTERS_CORE;
+	}
+
+	/* find offset of NB counters with respect to x86_pmu.eventsel */
+	if (cpu_has_perfctr_nb) {
+		if (index >= ncore && index < (ncore + AMD64_NUM_COUNTERS_NB))
+			offset = (MSR_F15H_NB_PERF_CTL - x86_pmu.eventsel) +
+				 ((index - ncore) << 1);
+	}
 
 	addr_offsets[index] = offset;
 
 	return offset;
 }
 
-static int amd_pmu_hw_config(struct perf_event *event)
+static int __amd_core_hw_config(struct perf_event *event)
 {
-	int ret;
-
-	/* pass precise event sampling to ibs: */
-	if (event->attr.precise_ip && get_ibs_caps())
-		return -ENOENT;
-
-	ret = x86_pmu_hw_config(event);
-	if (ret)
-		return ret;
-
-	if (has_branch_stack(event))
-		return -EOPNOTSUPP;
-
 	if (event->attr.exclude_host && event->attr.exclude_guest)
 		/*
 		 * When HO == GO == 1 the hardware treats that as GO == HO == 0
@@ -194,10 +196,24 @@ static int amd_pmu_hw_config(struct perf_event *event)
 	else if (event->attr.exclude_guest)
 		event->hw.config |= AMD_PERFMON_EVENTSEL_HOSTONLY;
 
-	if (event->attr.type != PERF_TYPE_RAW)
-		return 0;
+	return 0;
+}
+
+static int __amd_nb_hw_config(struct perf_event *event)
+{
+	if (event->attr.exclude_user || event->attr.exclude_kernel ||
+	    event->attr.exclude_host || event->attr.exclude_guest)
+		return -EINVAL;
 
-	event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
+	event->hw.config &= ~ARCH_PERFMON_EVENTSEL_USR;
+	event->hw.config &= ~ARCH_PERFMON_EVENTSEL_OS;
+
+	if (event->hw.config & ~(AMD64_EVENTSEL_EVENT        |
+				 ARCH_PERFMON_EVENTSEL_UMASK |
+				 ARCH_PERFMON_EVENTSEL_INT   |
+				 AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE |
+				 AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK))
+		return -EINVAL;
 
 	return 0;
 }
@@ -215,6 +231,11 @@ static inline int amd_is_nb_event(struct hw_perf_event *hwc)
 	return (hwc->config & 0xe0) == 0xe0;
 }
 
+static inline int amd_is_perfctr_nb_event(struct hw_perf_event *hwc)
+{
+	return cpu_has_perfctr_nb && amd_is_nb_event(hwc);
+}
+
 static inline int amd_has_nb(struct cpu_hw_events *cpuc)
 {
 	struct amd_nb *nb = cpuc->amd_nb;
@@ -222,6 +243,30 @@ static inline int amd_has_nb(struct cpu_hw_events *cpuc)
 	return nb && nb->nb_id != -1;
 }
 
+static int amd_pmu_hw_config(struct perf_event *event)
+{
+	int ret;
+
+	/* pass precise event sampling to ibs: */
+	if (event->attr.precise_ip && get_ibs_caps())
+		return -ENOENT;
+
+	if (has_branch_stack(event))
+		return -EOPNOTSUPP;
+
+	ret = x86_pmu_hw_config(event);
+	if (ret)
+		return ret;
+
+	if (event->attr.type == PERF_TYPE_RAW)
+		event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
+
+	if (amd_is_perfctr_nb_event(&event->hw))
+		return __amd_nb_hw_config(event);
+
+	return __amd_core_hw_config(event);
+}
+
 static void __amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
 					   struct perf_event *event)
 {
@@ -323,6 +368,16 @@ __amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *ev
 	if (new == -1)
 		return &emptyconstraint;
 
+	/* set up interrupts to be delivered only to this core */
+	if (cpu_has_perfctr_nb) {
+		struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
+
+		hwc->config |= AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE;
+		hwc->config &= ~AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK;
+		hwc->config |= (0ULL | (c->cpu_core_id)) <<
+			AMD_PERFMON_EVENTSEL_INT_CORE_SEL_SHIFT;
+	}
+
 	return &nb->event_constraints[new];
 }
 
@@ -520,6 +575,7 @@ static struct event_constraint amd_f15_PMC3  = EVENT_CONSTRAINT(0, 0x08, 0);
 static struct event_constraint amd_f15_PMC30 = EVENT_CONSTRAINT_OVERLAP(0, 0x09, 0);
 static struct event_constraint amd_f15_PMC50 = EVENT_CONSTRAINT(0, 0x3F, 0);
 static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
+static struct event_constraint amd_f15_NBPMC30 = EVENT_CONSTRAINT(0, 0x3C0, 0);
 
 static struct event_constraint *
 amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *event)
@@ -586,8 +642,11 @@ amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *ev
 			return &amd_f15_PMC20;
 		}
 	case AMD_EVENT_NB:
-		/* not yet implemented */
-		return &emptyconstraint;
+		if (cpuc->is_fake)
+			return &amd_f15_NBPMC30;
+
+		return __amd_get_nb_event_constraints(cpuc, event,
+						      &amd_f15_NBPMC30);
 	default:
 		return &emptyconstraint;
 	}
@@ -625,7 +684,7 @@ static __initconst const struct x86_pmu amd_pmu = {
 
 static int setup_event_constraints(void)
 {
-	if (boot_cpu_data.x86 >= 0x15)
+	if (boot_cpu_data.x86 == 0x15)
 		x86_pmu.get_event_constraints = amd_get_event_constraints_f15h;
 	return 0;
 }
@@ -655,6 +714,18 @@ static int setup_perfctr_core(void)
 	return 0;
 }
 
+static int setup_perfctr_nb(void)
+{
+	if (!cpu_has_perfctr_nb)
+		return -ENODEV;
+
+	x86_pmu.num_counters += AMD64_NUM_COUNTERS_NB;
+
+	printk(KERN_INFO "perf: AMD northbridge performance counters detected\n");
+
+	return 0;
+}
+
 __init int amd_pmu_init(void)
 {
 	/* Performance-monitoring supported from K7 and later: */
@@ -665,6 +736,7 @@ __init int amd_pmu_init(void)
 
 	setup_event_constraints();
 	setup_perfctr_core();
+	setup_perfctr_nb();
 
 	/* Events are common for all AMDs */
 	memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
-- 
1.7.9.5




end of thread, other threads:[~2012-11-28 17:42 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-10  1:01 [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters Jacob Shin
2012-11-10  1:01 ` [PATCH 1/4] perf, amd: Simplify northbridge event constraints handler Jacob Shin
2012-11-10  1:01 ` [PATCH 2/4] perf, amd: Refactor northbridge event constraints handler for code sharing Jacob Shin
2012-11-10  1:01 ` [PATCH 3/4] perf, x86: Move MSR address offset calculation to architecture specific files Jacob Shin
2012-11-10  1:01 ` [PATCH 4/4] perf, amd: Enable northbridge performance counters on AMD family 15h Jacob Shin
2012-11-10 11:50 ` [PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters Robert Richter
2012-11-11 18:17   ` Stephane Eranian
2012-11-12 14:36     ` Robert Richter
2012-11-11 18:44   ` Jacob Shin
2012-11-12 12:24     ` Stephane Eranian
2012-11-12 14:22       ` Robert Richter
2012-11-12 16:13         ` Jacob Shin
2012-11-12 21:08           ` Stephane Eranian
2012-11-12 14:55     ` Robert Richter
2012-11-15 21:31 [PATCH V2 " Jacob Shin
2012-11-15 21:31 ` [PATCH 4/4] perf, amd: Enable northbridge performance counters on AMD family 15h Jacob Shin
2012-11-16 18:43   ` Robert Richter
2012-11-16 19:00     ` Jacob Shin
2012-11-16 19:32       ` Robert Richter
2012-11-16 21:57         ` Jacob Shin
2012-11-26 12:15           ` Robert Richter
2012-11-28 17:42         ` Jacob Shin
2012-11-18 16:35       ` Robert Richter
