* [PATCH V6 0/6] perf, amd: Enable AMD family 15h northbridge counters
@ 2013-02-06 17:26 Jacob Shin
  2013-02-06 17:26 ` [PATCH 1/6] perf, amd: Rework northbridge event constraints handler Jacob Shin
                   ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Jacob Shin @ 2013-02-06 17:26 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Peter Zijlstra
  Cc: Paul Mackerras, Arnaldo Carvalho de Melo, Stephane Eranian,
	Jiri Olsa, linux-kernel, Jacob Shin

The following patchset enables 4 additional performance counters in
AMD family 15h processors that count northbridge events -- such as
the number of DRAM accesses.

This patchset is based on previous work done by Robert Richter
<rric@kernel.org> :

https://lkml.org/lkml/2012/6/19/324

The main differences are:

* The northbridge counters are indexed contiguously right above the
  core performance counters.

* MSR address offset calculations are moved to architecture specific
  files.

* Interrupts are set up to be delivered only to a single core.
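
For illustration, a minimal user-space sketch (not the kernel code) of
the resulting counter layout on a family 15h part that has both the
core and NB counter extensions -- indexes 0-5 map onto the core
counter MSRs starting at 0xc0010200 and indexes 6-9 onto the NB
counter MSRs starting at 0xc0010240, both with a stride of 2:

/* Sketch only: mirrors the layout set up by patches 4/6 and 6/6. */
#include <stdio.h>

#define MSR_F15H_PERF_CTL	0xc0010200
#define MSR_F15H_NB_PERF_CTL	0xc0010240
#define NUM_COUNTERS_CORE	6
#define NUM_COUNTERS_NB		4

static unsigned int perf_ctl_addr(int index)
{
	if (index < NUM_COUNTERS_CORE)	/* core counters, stride 2 */
		return MSR_F15H_PERF_CTL + (index << 1);
	/* NB counters are indexed right above the core counters */
	return MSR_F15H_NB_PERF_CTL + ((index - NUM_COUNTERS_CORE) << 1);
}

int main(void)
{
	int i;

	for (i = 0; i < NUM_COUNTERS_CORE + NUM_COUNTERS_NB; i++)
		printf("index %d -> PERF_CTL MSR 0x%x\n", i, perf_ctl_addr(i));
	return 0;
}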

V6:
Revised per feedback from Stephane Eranian. Updated to only allow
counting mode on northbridge counters.

V5:
Rebased against latest tip

V4:
* Moved interrupt core select set up back to the event constraints
  function, since during ->hw_config time we do not yet know which
  CPU the event will run on.
* Tested on upcoming AMD Family 16h processors and made minor revisions
  to make sure that the patchset is compatible with them, and will
  support core and NB counters without any further patches.

V3:
Addressed the following feedback/comments from Robert's review
* https://lkml.org/lkml/2012/11/16/484
* https://lkml.org/lkml/2012/11/26/162

V2:
Separated out Robert's patches, and added properly ordered certificates
of origin.

Jacob Shin (4):
  perf, amd: Use proper naming scheme for AMD bit field definitions
  perf, x86: Move MSR address offset calculation to architecture
    specific files
  perf, x86: Allow for architecture specific RDPMC indexes
  perf, amd: Enable northbridge performance counters on AMD family 15h

Robert Richter (2):
  perf, amd: Rework northbridge event constraints handler
  perf, amd: Generalize northbridge constraints code for family 15h

 arch/x86/include/asm/cpufeature.h     |    2 +
 arch/x86/include/asm/perf_event.h     |   13 +-
 arch/x86/include/uapi/asm/msr-index.h |    2 +
 arch/x86/kernel/cpu/perf_event.c      |    2 +-
 arch/x86/kernel/cpu/perf_event.h      |   25 +--
 arch/x86/kernel/cpu/perf_event_amd.c  |  322 +++++++++++++++++++++++++--------
 6 files changed, 272 insertions(+), 94 deletions(-)

-- 
1.7.9.5




* [PATCH 1/6] perf, amd: Rework northbridge event constraints handler
  2013-02-06 17:26 [PATCH V6 0/6] perf, amd: Enable AMD family 15h northbridge counters Jacob Shin
@ 2013-02-06 17:26 ` Jacob Shin
  2013-02-06 20:28   ` [tip:perf/core] perf/x86/amd: " tip-bot for Robert Richter
  2013-02-06 17:26 ` [PATCH 2/6] perf, amd: Generalize northbridge constraints code for family 15h Jacob Shin
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Jacob Shin @ 2013-02-06 17:26 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Peter Zijlstra
  Cc: Paul Mackerras, Arnaldo Carvalho de Melo, Stephane Eranian,
	Jiri Olsa, linux-kernel, Robert Richter, Jacob Shin

From: Robert Richter <rric@kernel.org>

Code simplification. No functional changes.

Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Acked-by: Stephane Eranian <eranian@google.com>
---
 arch/x86/kernel/cpu/perf_event_amd.c |   68 +++++++++++++---------------------
 1 file changed, 26 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index c93bc4e..e7963c7 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -256,9 +256,8 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	struct amd_nb *nb = cpuc->amd_nb;
-	struct perf_event *old = NULL;
-	int max = x86_pmu.num_counters;
-	int i, j, k = -1;
+	struct perf_event *old;
+	int idx, new = -1;
 
 	/*
 	 * if not NB event or no NB, then no constraints
@@ -276,48 +275,33 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 	 * because of successive calls to x86_schedule_events() from
 	 * hw_perf_group_sched_in() without hw_perf_enable()
 	 */
-	for (i = 0; i < max; i++) {
-		/*
-		 * keep track of first free slot
-		 */
-		if (k == -1 && !nb->owners[i])
-			k = i;
+	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+		if (new == -1 || hwc->idx == idx)
+			/* assign free slot, prefer hwc->idx */
+			old = cmpxchg(nb->owners + idx, NULL, event);
+		else if (nb->owners[idx] == event)
+			/* event already present */
+			old = event;
+		else
+			continue;
+
+		if (old && old != event)
+			continue;
+
+		/* reassign to this slot */
+		if (new != -1)
+			cmpxchg(nb->owners + new, event, NULL);
+		new = idx;
 
 		/* already present, reuse */
-		if (nb->owners[i] == event)
-			goto done;
-	}
-	/*
-	 * not present, so grab a new slot
-	 * starting either at:
-	 */
-	if (hwc->idx != -1) {
-		/* previous assignment */
-		i = hwc->idx;
-	} else if (k != -1) {
-		/* start from free slot found */
-		i = k;
-	} else {
-		/*
-		 * event not found, no slot found in
-		 * first pass, try again from the
-		 * beginning
-		 */
-		i = 0;
-	}
-	j = i;
-	do {
-		old = cmpxchg(nb->owners+i, NULL, event);
-		if (!old)
+		if (old == event)
 			break;
-		if (++i == max)
-			i = 0;
-	} while (i != j);
-done:
-	if (!old)
-		return &nb->event_constraints[i];
-
-	return &emptyconstraint;
+	}
+
+	if (new == -1)
+		return &emptyconstraint;
+
+	return &nb->event_constraints[new];
 }
 
 static struct amd_nb *amd_alloc_nb(int cpu)
-- 
1.7.9.5




* [PATCH 2/6] perf, amd: Generalize northbridge constraints code for family 15h
  2013-02-06 17:26 [PATCH V6 0/6] perf, amd: Enable AMD family 15h northbridge counters Jacob Shin
  2013-02-06 17:26 ` [PATCH 1/6] perf, amd: Rework northbridge event constraints handler Jacob Shin
@ 2013-02-06 17:26 ` Jacob Shin
  2013-02-06 20:29   ` [tip:perf/core] perf/x86/amd: " tip-bot for Robert Richter
  2013-02-06 17:26 ` [PATCH 3/6] perf, amd: Use proper naming scheme for AMD bit field definitions Jacob Shin
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Jacob Shin @ 2013-02-06 17:26 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Peter Zijlstra
  Cc: Paul Mackerras, Arnaldo Carvalho de Melo, Stephane Eranian,
	Jiri Olsa, linux-kernel, Robert Richter, Jacob Shin

From: Robert Richter <rric@kernel.org>

Generalize the northbridge constraints code for family 10h so that the
same code path can later be reused with other AMD processor families
that have the same northbridge event constraints.

Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
---
 arch/x86/kernel/cpu/perf_event_amd.c |   43 ++++++++++++++++++++--------------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index e7963c7..f8c9dfb 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -188,20 +188,13 @@ static inline int amd_has_nb(struct cpu_hw_events *cpuc)
 	return nb && nb->nb_id != -1;
 }
 
-static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
-				      struct perf_event *event)
+static void __amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
+					   struct perf_event *event)
 {
-	struct hw_perf_event *hwc = &event->hw;
 	struct amd_nb *nb = cpuc->amd_nb;
 	int i;
 
 	/*
-	 * only care about NB events
-	 */
-	if (!(amd_has_nb(cpuc) && amd_is_nb_event(hwc)))
-		return;
-
-	/*
 	 * need to scan whole list because event may not have
 	 * been assigned during scheduling
 	 *
@@ -247,12 +240,13 @@ static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
   *
   * Given that resources are allocated (cmpxchg), they must be
   * eventually freed for others to use. This is accomplished by
-  * calling amd_put_event_constraints().
+  * calling __amd_put_nb_event_constraints()
   *
   * Non NB events are not impacted by this restriction.
   */
 static struct event_constraint *
-amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
+__amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event,
+			       struct event_constraint *c)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	struct amd_nb *nb = cpuc->amd_nb;
@@ -260,12 +254,6 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 	int idx, new = -1;
 
 	/*
-	 * if not NB event or no NB, then no constraints
-	 */
-	if (!(amd_has_nb(cpuc) && amd_is_nb_event(hwc)))
-		return &unconstrained;
-
-	/*
 	 * detect if already present, if so reuse
 	 *
 	 * cannot merge with actual allocation
@@ -275,7 +263,7 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 	 * because of successive calls to x86_schedule_events() from
 	 * hw_perf_group_sched_in() without hw_perf_enable()
 	 */
-	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+	for_each_set_bit(idx, c->idxmsk, x86_pmu.num_counters) {
 		if (new == -1 || hwc->idx == idx)
 			/* assign free slot, prefer hwc->idx */
 			old = cmpxchg(nb->owners + idx, NULL, event);
@@ -391,6 +379,25 @@ static void amd_pmu_cpu_dead(int cpu)
 	}
 }
 
+static struct event_constraint *
+amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
+{
+	/*
+	 * if not NB event or no NB, then no constraints
+	 */
+	if (!(amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)))
+		return &unconstrained;
+
+	return __amd_get_nb_event_constraints(cpuc, event, &unconstrained);
+}
+
+static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
+				      struct perf_event *event)
+{
+	if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw))
+		__amd_put_nb_event_constraints(cpuc, event);
+}
+
 PMU_FORMAT_ATTR(event,	"config:0-7,32-35");
 PMU_FORMAT_ATTR(umask,	"config:8-15"	);
 PMU_FORMAT_ATTR(edge,	"config:18"	);
-- 
1.7.9.5




* [PATCH 3/6] perf, amd: Use proper naming scheme for AMD bit field definitions
  2013-02-06 17:26 [PATCH V6 0/6] perf, amd: Enable AMD family 15h northbridge counters Jacob Shin
  2013-02-06 17:26 ` [PATCH 1/6] perf, amd: Rework northbridge event constraints handler Jacob Shin
  2013-02-06 17:26 ` [PATCH 2/6] perf, amd: Generalize northbridge constraints code for family 15h Jacob Shin
@ 2013-02-06 17:26 ` Jacob Shin
  2013-02-06 20:30   ` [tip:perf/core] perf/x86/amd: " tip-bot for Jacob Shin
  2013-02-06 17:26 ` [PATCH 4/6] perf, x86: Move MSR address offset calculation to architecture specific files Jacob Shin
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Jacob Shin @ 2013-02-06 17:26 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Peter Zijlstra
  Cc: Paul Mackerras, Arnaldo Carvalho de Melo, Stephane Eranian,
	Jiri Olsa, linux-kernel, Jacob Shin

Update these AMD bit field names to be consistent with the naming
convention followed by the rest of the file.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Acked-by: Stephane Eranian <eranian@google.com>
---
 arch/x86/include/asm/perf_event.h    |    4 ++--
 arch/x86/kernel/cpu/perf_event_amd.c |    8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 4fabcdf..2234eaaec 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -29,8 +29,8 @@
 #define ARCH_PERFMON_EVENTSEL_INV			(1ULL << 23)
 #define ARCH_PERFMON_EVENTSEL_CMASK			0xFF000000ULL
 
-#define AMD_PERFMON_EVENTSEL_GUESTONLY			(1ULL << 40)
-#define AMD_PERFMON_EVENTSEL_HOSTONLY			(1ULL << 41)
+#define AMD64_EVENTSEL_GUESTONLY			(1ULL << 40)
+#define AMD64_EVENTSEL_HOSTONLY				(1ULL << 41)
 
 #define AMD64_EVENTSEL_EVENT	\
 	(ARCH_PERFMON_EVENTSEL_EVENT | (0x0FULL << 32))
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index f8c9dfb..aea8c20 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -156,9 +156,9 @@ static int amd_pmu_hw_config(struct perf_event *event)
 		event->hw.config &= ~(ARCH_PERFMON_EVENTSEL_USR |
 				      ARCH_PERFMON_EVENTSEL_OS);
 	else if (event->attr.exclude_host)
-		event->hw.config |= AMD_PERFMON_EVENTSEL_GUESTONLY;
+		event->hw.config |= AMD64_EVENTSEL_GUESTONLY;
 	else if (event->attr.exclude_guest)
-		event->hw.config |= AMD_PERFMON_EVENTSEL_HOSTONLY;
+		event->hw.config |= AMD64_EVENTSEL_HOSTONLY;
 
 	if (event->attr.type != PERF_TYPE_RAW)
 		return 0;
@@ -336,7 +336,7 @@ static void amd_pmu_cpu_starting(int cpu)
 	struct amd_nb *nb;
 	int i, nb_id;
 
-	cpuc->perf_ctr_virt_mask = AMD_PERFMON_EVENTSEL_HOSTONLY;
+	cpuc->perf_ctr_virt_mask = AMD64_EVENTSEL_HOSTONLY;
 
 	if (boot_cpu_data.x86_max_cores < 2)
 		return;
@@ -669,7 +669,7 @@ void amd_pmu_disable_virt(void)
 	 * SVM is disabled the Guest-only bits still gets set and the counter
 	 * will not count anything.
 	 */
-	cpuc->perf_ctr_virt_mask = AMD_PERFMON_EVENTSEL_HOSTONLY;
+	cpuc->perf_ctr_virt_mask = AMD64_EVENTSEL_HOSTONLY;
 
 	/* Reload all events */
 	x86_pmu_disable_all();
-- 
1.7.9.5




* [PATCH 4/6] perf, x86: Move MSR address offset calculation to architecture specific files
  2013-02-06 17:26 [PATCH V6 0/6] perf, amd: Enable AMD family 15h northbridge counters Jacob Shin
                   ` (2 preceding siblings ...)
  2013-02-06 17:26 ` [PATCH 3/6] perf, amd: Use proper naming scheme for AMD bit field definitions Jacob Shin
@ 2013-02-06 17:26 ` Jacob Shin
  2013-02-06 20:31   ` [tip:perf/core] perf/x86: " tip-bot for Jacob Shin
  2013-02-06 17:26 ` [PATCH 5/6] perf, x86: Allow for architecture specific RDPMC indexes Jacob Shin
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Jacob Shin @ 2013-02-06 17:26 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Peter Zijlstra
  Cc: Paul Mackerras, Arnaldo Carvalho de Melo, Stephane Eranian,
	Jiri Olsa, linux-kernel, Jacob Shin

Move the counter index to MSR address offset calculation into
architecture specific files. This prepares the way for perf_event_amd
to enable counter addresses that are not contiguous -- for example,
AMD Family 15h processors have 6 core performance counters starting at
0xc0010200 and 4 northbridge performance counters starting at
0xc0010240.
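
As a stand-alone illustration of the hook being introduced (this
mirrors the patch but is not the kernel code; it assumes the existing
core counter extension setup has already pointed x86_pmu.eventsel at
MSR_F15H_PERF_CTL):

#include <stdbool.h>
#include <stdio.h>

struct pmu_sketch {
	unsigned int eventsel;			/* base MSR of event selects */
	int (*addr_offset)(int index, bool eventsel);
};

static int fam15h_offset(int index, bool eventsel)
{
	return index << 1;			/* core counters: stride 2 */
}

static unsigned int config_addr(const struct pmu_sketch *pmu, int index)
{
	return pmu->eventsel +
	       (pmu->addr_offset ? pmu->addr_offset(index, true) : index);
}

int main(void)
{
	struct pmu_sketch legacy = { .eventsel = 0xc0010000 };	/* K7 MSRs */
	struct pmu_sketch fam15h = { .eventsel = 0xc0010200,
				     .addr_offset = fam15h_offset };

	printf("legacy index 3 -> 0x%x\n", config_addr(&legacy, 3)); /* 0xc0010003 */
	printf("fam15h index 3 -> 0x%x\n", config_addr(&fam15h, 3)); /* 0xc0010206 */
	return 0;
}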

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
---
 arch/x86/kernel/cpu/perf_event.h     |   21 ++++-------------
 arch/x86/kernel/cpu/perf_event_amd.c |   42 ++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 158f46b..c455cba 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -351,6 +351,7 @@ struct x86_pmu {
 	int		(*schedule_events)(struct cpu_hw_events *cpuc, int n, int *assign);
 	unsigned	eventsel;
 	unsigned	perfctr;
+	int		(*addr_offset)(int index, bool eventsel);
 	u64		(*event_map)(int);
 	int		max_events;
 	int		num_counters;
@@ -497,28 +498,16 @@ extern u64 __read_mostly hw_cache_extra_regs
 
 u64 x86_perf_event_update(struct perf_event *event);
 
-static inline int x86_pmu_addr_offset(int index)
-{
-	int offset;
-
-	/* offset = X86_FEATURE_PERFCTR_CORE ? index << 1 : index */
-	alternative_io(ASM_NOP2,
-		       "shll $1, %%eax",
-		       X86_FEATURE_PERFCTR_CORE,
-		       "=a" (offset),
-		       "a"  (index));
-
-	return offset;
-}
-
 static inline unsigned int x86_pmu_config_addr(int index)
 {
-	return x86_pmu.eventsel + x86_pmu_addr_offset(index);
+	return x86_pmu.eventsel + (x86_pmu.addr_offset ?
+				   x86_pmu.addr_offset(index, true) : index);
 }
 
 static inline unsigned int x86_pmu_event_addr(int index)
 {
-	return x86_pmu.perfctr + x86_pmu_addr_offset(index);
+	return x86_pmu.perfctr + (x86_pmu.addr_offset ?
+				  x86_pmu.addr_offset(index, false) : index);
 }
 
 int x86_setup_perfctr(struct perf_event *event);
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index aea8c20..b60f31c 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -132,6 +132,47 @@ static u64 amd_pmu_event_map(int hw_event)
 	return amd_perfmon_event_map[hw_event];
 }
 
+/*
+ * Previously calculated offsets
+ */
+static unsigned int event_offsets[X86_PMC_IDX_MAX] __read_mostly;
+static unsigned int count_offsets[X86_PMC_IDX_MAX] __read_mostly;
+
+/*
+ * Legacy CPUs:
+ *   4 counters starting at 0xc0010000 each offset by 1
+ *
+ * CPUs with core performance counter extensions:
+ *   6 counters starting at 0xc0010200 each offset by 2
+ */
+static inline int amd_pmu_addr_offset(int index, bool eventsel)
+{
+	int offset;
+
+	if (!index)
+		return index;
+
+	if (eventsel)
+		offset = event_offsets[index];
+	else
+		offset = count_offsets[index];
+
+	if (offset)
+		return offset;
+
+	if (!cpu_has_perfctr_core)
+		offset = index;
+	else
+		offset = index << 1;
+
+	if (eventsel)
+		event_offsets[index] = offset;
+	else
+		count_offsets[index] = offset;
+
+	return offset;
+}
+
 static int amd_pmu_hw_config(struct perf_event *event)
 {
 	int ret;
@@ -578,6 +619,7 @@ static __initconst const struct x86_pmu amd_pmu = {
 	.schedule_events	= x86_schedule_events,
 	.eventsel		= MSR_K7_EVNTSEL0,
 	.perfctr		= MSR_K7_PERFCTR0,
+	.addr_offset            = amd_pmu_addr_offset,
 	.event_map		= amd_pmu_event_map,
 	.max_events		= ARRAY_SIZE(amd_perfmon_event_map),
 	.num_counters		= AMD64_NUM_COUNTERS,
-- 
1.7.9.5




* [PATCH 5/6] perf, x86: Allow for architecture specific RDPMC indexes
  2013-02-06 17:26 [PATCH V6 0/6] perf, amd: Enable AMD family 15h northbridge counters Jacob Shin
                   ` (3 preceding siblings ...)
  2013-02-06 17:26 ` [PATCH 4/6] perf, x86: Move MSR address offset calculation to architecture specific files Jacob Shin
@ 2013-02-06 17:26 ` Jacob Shin
  2013-02-06 20:32   ` [tip:perf/core] perf/x86: " tip-bot for Jacob Shin
  2013-02-06 17:26 ` [PATCH 6/6] perf, amd: Enable northbridge performance counters on AMD family 15h Jacob Shin
  2013-02-06 17:31 ` [PATCH V6 0/6] perf, amd: Enable AMD family 15h northbridge counters Jacob Shin
  6 siblings, 1 reply; 21+ messages in thread
From: Jacob Shin @ 2013-02-06 17:26 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Peter Zijlstra
  Cc: Paul Mackerras, Arnaldo Carvalho de Melo, Stephane Eranian,
	Jiri Olsa, linux-kernel, Jacob Shin

Similar to config_base and event_base, allow architecture specific
RDPMC ECX values.
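
For context, a minimal sketch of where that ECX value ends up: RDPMC
takes the counter index in ECX and returns the count in EDX:EAX. It
assumes user-space RDPMC is permitted (CR4.PCE set, e.g. via the
kernel's rdpmc sysfs knob), otherwise the instruction faults; with
patch 6/6 applied the family 15h NB counters are reached with
ECX = NB counter number + 6:

#include <stdint.h>
#include <stdio.h>

static inline uint64_t read_pmc(uint32_t ecx)
{
	uint32_t lo, hi;

	__asm__ volatile("rdpmc" : "=a" (lo), "=d" (hi) : "c" (ecx));
	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	/* core counter 0 vs. NB counter 0 (ECX 0 + 6) on family 15h */
	printf("core PMC0: %llu\n", (unsigned long long)read_pmc(0));
	printf("NB   PMC0: %llu\n", (unsigned long long)read_pmc(6));
	return 0;
}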

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Acked-by: Stephane Eranian <eranian@google.com>
---
 arch/x86/kernel/cpu/perf_event.c     |    2 +-
 arch/x86/kernel/cpu/perf_event.h     |    6 ++++++
 arch/x86/kernel/cpu/perf_event_amd.c |    6 ++++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index c6ef37a..5ed7a4c 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -829,7 +829,7 @@ static inline void x86_assign_hw_event(struct perf_event *event,
 	} else {
 		hwc->config_base = x86_pmu_config_addr(hwc->idx);
 		hwc->event_base  = x86_pmu_event_addr(hwc->idx);
-		hwc->event_base_rdpmc = hwc->idx;
+		hwc->event_base_rdpmc = x86_pmu_rdpmc_index(hwc->idx);
 	}
 }
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index c455cba..1a2ea03 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -352,6 +352,7 @@ struct x86_pmu {
 	unsigned	eventsel;
 	unsigned	perfctr;
 	int		(*addr_offset)(int index, bool eventsel);
+	int		(*rdpmc_index)(int index);
 	u64		(*event_map)(int);
 	int		max_events;
 	int		num_counters;
@@ -510,6 +511,11 @@ static inline unsigned int x86_pmu_event_addr(int index)
 				  x86_pmu.addr_offset(index, false) : index);
 }
 
+static inline int x86_pmu_rdpmc_index(int index)
+{
+	return x86_pmu.rdpmc_index ? x86_pmu.rdpmc_index(index) : index;
+}
+
 int x86_setup_perfctr(struct perf_event *event);
 
 int x86_pmu_hw_config(struct perf_event *event);
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index b60f31c..05462f0 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -173,6 +173,11 @@ static inline int amd_pmu_addr_offset(int index, bool eventsel)
 	return offset;
 }
 
+static inline int amd_pmu_rdpmc_index(int index)
+{
+	return index;
+}
+
 static int amd_pmu_hw_config(struct perf_event *event)
 {
 	int ret;
@@ -620,6 +625,7 @@ static __initconst const struct x86_pmu amd_pmu = {
 	.eventsel		= MSR_K7_EVNTSEL0,
 	.perfctr		= MSR_K7_PERFCTR0,
 	.addr_offset            = amd_pmu_addr_offset,
+	.rdpmc_index		= amd_pmu_rdpmc_index,
 	.event_map		= amd_pmu_event_map,
 	.max_events		= ARRAY_SIZE(amd_perfmon_event_map),
 	.num_counters		= AMD64_NUM_COUNTERS,
-- 
1.7.9.5




* [PATCH 6/6] perf, amd: Enable northbridge performance counters on AMD family 15h
  2013-02-06 17:26 [PATCH V6 0/6] perf, amd: Enable AMD family 15h northbridge counters Jacob Shin
                   ` (4 preceding siblings ...)
  2013-02-06 17:26 ` [PATCH 5/6] perf, x86: Allow for architecture specific RDPMC indexes Jacob Shin
@ 2013-02-06 17:26 ` Jacob Shin
  2013-02-07 17:57   ` Jacob Shin
                     ` (2 more replies)
  2013-02-06 17:31 ` [PATCH V6 0/6] perf, amd: Enable AMD family 15h northbridge counters Jacob Shin
  6 siblings, 3 replies; 21+ messages in thread
From: Jacob Shin @ 2013-02-06 17:26 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Peter Zijlstra
  Cc: Paul Mackerras, Arnaldo Carvalho de Melo, Stephane Eranian,
	Jiri Olsa, linux-kernel, Jacob Shin

On AMD family 15h processors, there are 4 new performance counters
(in addition to the 6 core performance counters) that can be used for
counting northbridge events (e.g. DRAM accesses). Their bit fields are
almost identical to those of the core performance counters. However,
unlike the core performance counters, these MSRs are shared between
multiple cores (that share the same northbridge). We will reuse the
same code path as the existing family 10h northbridge event
constraints handler logic to enforce this sharing.
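
A minimal sketch of using one of these counters once the patch is
applied: NB counters only support system-wide counting, so the event
is opened with pid = -1 on a specific CPU, without sampling and
without any exclude_* bits (needs sufficient privileges, see
perf_event_paranoid). The raw config assumes the family 15h "DRAM
Accesses" NB event (event select 0xE0, umask 0x3f); please verify the
encoding against the BKDG for your part:

#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	struct perf_event_attr attr;
	uint64_t count;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_RAW;
	attr.config = (0x3fULL << 8) | 0xe0;	/* umask 0x3f, event 0xe0 */

	/* pid = -1, cpu = 0: system-wide counting on CPU 0 */
	fd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	sleep(1);
	if (read(fd, &count, sizeof(count)) == sizeof(count))
		printf("DRAM accesses in ~1s: %llu\n",
		       (unsigned long long)count);
	close(fd);
	return 0;
}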

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
---
 arch/x86/include/asm/cpufeature.h     |    2 +
 arch/x86/include/asm/perf_event.h     |    9 ++
 arch/x86/include/uapi/asm/msr-index.h |    2 +
 arch/x86/kernel/cpu/perf_event_amd.c  |  171 +++++++++++++++++++++++++++++----
 4 files changed, 164 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 2d9075e..93fe929 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -167,6 +167,7 @@
 #define X86_FEATURE_TBM		(6*32+21) /* trailing bit manipulations */
 #define X86_FEATURE_TOPOEXT	(6*32+22) /* topology extensions CPUID leafs */
 #define X86_FEATURE_PERFCTR_CORE (6*32+23) /* core performance counter extensions */
+#define X86_FEATURE_PERFCTR_NB  (6*32+24) /* NB performance counter extensions */
 
 /*
  * Auxiliary flags: Linux defined - For features scattered in various
@@ -309,6 +310,7 @@ extern const char * const x86_power_flags[32];
 #define cpu_has_hypervisor	boot_cpu_has(X86_FEATURE_HYPERVISOR)
 #define cpu_has_pclmulqdq	boot_cpu_has(X86_FEATURE_PCLMULQDQ)
 #define cpu_has_perfctr_core	boot_cpu_has(X86_FEATURE_PERFCTR_CORE)
+#define cpu_has_perfctr_nb	boot_cpu_has(X86_FEATURE_PERFCTR_NB)
 #define cpu_has_cx8		boot_cpu_has(X86_FEATURE_CX8)
 #define cpu_has_cx16		boot_cpu_has(X86_FEATURE_CX16)
 #define cpu_has_eager_fpu	boot_cpu_has(X86_FEATURE_EAGER_FPU)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 2234eaaec..57cb634 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -29,9 +29,14 @@
 #define ARCH_PERFMON_EVENTSEL_INV			(1ULL << 23)
 #define ARCH_PERFMON_EVENTSEL_CMASK			0xFF000000ULL
 
+#define AMD64_EVENTSEL_INT_CORE_ENABLE			(1ULL << 36)
 #define AMD64_EVENTSEL_GUESTONLY			(1ULL << 40)
 #define AMD64_EVENTSEL_HOSTONLY				(1ULL << 41)
 
+#define AMD64_EVENTSEL_INT_CORE_SEL_SHIFT		37
+#define AMD64_EVENTSEL_INT_CORE_SEL_MASK		\
+	(0xFULL << AMD64_EVENTSEL_INT_CORE_SEL_SHIFT)
+
 #define AMD64_EVENTSEL_EVENT	\
 	(ARCH_PERFMON_EVENTSEL_EVENT | (0x0FULL << 32))
 #define INTEL_ARCH_EVENT_MASK	\
@@ -46,8 +51,12 @@
 #define AMD64_RAW_EVENT_MASK		\
 	(X86_RAW_EVENT_MASK          |  \
 	 AMD64_EVENTSEL_EVENT)
+#define AMD64_RAW_EVENT_MASK_NB		\
+	(AMD64_EVENTSEL_EVENT        |  \
+	 ARCH_PERFMON_EVENTSEL_UMASK)
 #define AMD64_NUM_COUNTERS				4
 #define AMD64_NUM_COUNTERS_CORE				6
+#define AMD64_NUM_COUNTERS_NB				4
 
 #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL		0x3c
 #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK		(0x00 << 8)
diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
index 1031604..27c05d2 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -195,6 +195,8 @@
 /* Fam 15h MSRs */
 #define MSR_F15H_PERF_CTL		0xc0010200
 #define MSR_F15H_PERF_CTR		0xc0010201
+#define MSR_F15H_NB_PERF_CTL		0xc0010240
+#define MSR_F15H_NB_PERF_CTR		0xc0010241
 
 /* Fam 10h MSRs */
 #define MSR_FAM10H_MMIO_CONF_BASE	0xc0010058
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 05462f0..dfdab42 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -132,11 +132,14 @@ static u64 amd_pmu_event_map(int hw_event)
 	return amd_perfmon_event_map[hw_event];
 }
 
+static struct event_constraint *amd_nb_event_constraint;
+
 /*
  * Previously calculated offsets
  */
 static unsigned int event_offsets[X86_PMC_IDX_MAX] __read_mostly;
 static unsigned int count_offsets[X86_PMC_IDX_MAX] __read_mostly;
+static unsigned int rdpmc_indexes[X86_PMC_IDX_MAX] __read_mostly;
 
 /*
  * Legacy CPUs:
@@ -144,10 +147,14 @@ static unsigned int count_offsets[X86_PMC_IDX_MAX] __read_mostly;
  *
  * CPUs with core performance counter extensions:
  *   6 counters starting at 0xc0010200 each offset by 2
+ *
+ * CPUs with north bridge performance counter extensions:
+ *   4 additional counters starting at 0xc0010240 each offset by 2
+ *   (indexed right above either one of the above core counters)
  */
 static inline int amd_pmu_addr_offset(int index, bool eventsel)
 {
-	int offset;
+	int offset, first, base;
 
 	if (!index)
 		return index;
@@ -160,7 +167,23 @@ static inline int amd_pmu_addr_offset(int index, bool eventsel)
 	if (offset)
 		return offset;
 
-	if (!cpu_has_perfctr_core)
+	if (amd_nb_event_constraint &&
+	    test_bit(index, amd_nb_event_constraint->idxmsk)) {
+		/*
+		 * calculate the offset of NB counters with respect to
+		 * base eventsel or perfctr
+		 */
+
+		first = find_first_bit(amd_nb_event_constraint->idxmsk,
+				       X86_PMC_IDX_MAX);
+
+		if (eventsel)
+			base = MSR_F15H_NB_PERF_CTL - x86_pmu.eventsel;
+		else
+			base = MSR_F15H_NB_PERF_CTR - x86_pmu.perfctr;
+
+		offset = base + ((index - first) << 1);
+	} else if (!cpu_has_perfctr_core)
 		offset = index;
 	else
 		offset = index << 1;
@@ -175,24 +198,36 @@ static inline int amd_pmu_addr_offset(int index, bool eventsel)
 
 static inline int amd_pmu_rdpmc_index(int index)
 {
-	return index;
-}
+	int ret, first;
 
-static int amd_pmu_hw_config(struct perf_event *event)
-{
-	int ret;
+	if (!index)
+		return index;
 
-	/* pass precise event sampling to ibs: */
-	if (event->attr.precise_ip && get_ibs_caps())
-		return -ENOENT;
+	ret = rdpmc_indexes[index];
 
-	ret = x86_pmu_hw_config(event);
 	if (ret)
 		return ret;
 
-	if (has_branch_stack(event))
-		return -EOPNOTSUPP;
+	if (amd_nb_event_constraint &&
+	    test_bit(index, amd_nb_event_constraint->idxmsk)) {
+		/*
+		 * according to the manual, the ECX value of the NB counters is
+		 * the index of the NB counter (0, 1, 2 or 3) plus 6
+		 */
+
+		first = find_first_bit(amd_nb_event_constraint->idxmsk,
+				       X86_PMC_IDX_MAX);
+		ret = index - first + 6;
+	} else
+		ret = index;
+
+	rdpmc_indexes[index] = ret;
 
+	return ret;
+}
+
+static int amd_core_hw_config(struct perf_event *event)
+{
 	if (event->attr.exclude_host && event->attr.exclude_guest)
 		/*
 		 * When HO == GO == 1 the hardware treats that as GO == HO == 0
@@ -206,10 +241,33 @@ static int amd_pmu_hw_config(struct perf_event *event)
 	else if (event->attr.exclude_guest)
 		event->hw.config |= AMD64_EVENTSEL_HOSTONLY;
 
-	if (event->attr.type != PERF_TYPE_RAW)
-		return 0;
+	return 0;
+}
 
-	event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
+/*
+ * NB counters do not support the following event select bits:
+ *   Host/Guest only
+ *   Counter mask
+ *   Invert counter mask
+ *   Edge detect
+ *   OS/User mode
+ */
+static int amd_nb_hw_config(struct perf_event *event)
+{
+	/* for NB, we only allow system wide counting mode */
+	if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
+		return -EINVAL;
+
+	if (event->attr.exclude_user || event->attr.exclude_kernel ||
+	    event->attr.exclude_host || event->attr.exclude_guest)
+		return -EINVAL;
+
+	event->hw.config &= ~(ARCH_PERFMON_EVENTSEL_USR |
+			      ARCH_PERFMON_EVENTSEL_OS);
+
+	if (event->hw.config & ~(AMD64_RAW_EVENT_MASK_NB |
+				 ARCH_PERFMON_EVENTSEL_INT))
+		return -EINVAL;
 
 	return 0;
 }
@@ -227,6 +285,11 @@ static inline int amd_is_nb_event(struct hw_perf_event *hwc)
 	return (hwc->config & 0xe0) == 0xe0;
 }
 
+static inline int amd_is_perfctr_nb_event(struct hw_perf_event *hwc)
+{
+	return amd_nb_event_constraint && amd_is_nb_event(hwc);
+}
+
 static inline int amd_has_nb(struct cpu_hw_events *cpuc)
 {
 	struct amd_nb *nb = cpuc->amd_nb;
@@ -234,6 +297,30 @@ static inline int amd_has_nb(struct cpu_hw_events *cpuc)
 	return nb && nb->nb_id != -1;
 }
 
+static int amd_pmu_hw_config(struct perf_event *event)
+{
+	int ret;
+
+	/* pass precise event sampling to ibs: */
+	if (event->attr.precise_ip && get_ibs_caps())
+		return -ENOENT;
+
+	if (has_branch_stack(event))
+		return -EOPNOTSUPP;
+
+	ret = x86_pmu_hw_config(event);
+	if (ret)
+		return ret;
+
+	if (event->attr.type == PERF_TYPE_RAW)
+		event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
+
+	if (amd_is_perfctr_nb_event(&event->hw))
+		return amd_nb_hw_config(event);
+
+	return amd_core_hw_config(event);
+}
+
 static void __amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
 					   struct perf_event *event)
 {
@@ -254,6 +341,19 @@ static void __amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
 	}
 }
 
+static void amd_nb_interrupt_hw_config(struct hw_perf_event *hwc)
+{
+	int core_id = cpu_data(smp_processor_id()).cpu_core_id;
+
+	/* deliver interrupts only to this core */
+	if (hwc->config & ARCH_PERFMON_EVENTSEL_INT) {
+		hwc->config |= AMD64_EVENTSEL_INT_CORE_ENABLE;
+		hwc->config &= ~AMD64_EVENTSEL_INT_CORE_SEL_MASK;
+		hwc->config |= (u64)(core_id) <<
+			AMD64_EVENTSEL_INT_CORE_SEL_SHIFT;
+	}
+}
+
  /*
   * AMD64 NorthBridge events need special treatment because
   * counter access needs to be synchronized across all cores
@@ -299,6 +399,12 @@ __amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *ev
 	struct perf_event *old;
 	int idx, new = -1;
 
+	if (!c)
+		c = &unconstrained;
+
+	if (cpuc->is_fake)
+		return c;
+
 	/*
 	 * detect if already present, if so reuse
 	 *
@@ -335,6 +441,9 @@ __amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *ev
 	if (new == -1)
 		return &emptyconstraint;
 
+	if (amd_is_perfctr_nb_event(hwc))
+		amd_nb_interrupt_hw_config(hwc);
+
 	return &nb->event_constraints[new];
 }
 
@@ -434,7 +543,8 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 	if (!(amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)))
 		return &unconstrained;
 
-	return __amd_get_nb_event_constraints(cpuc, event, &unconstrained);
+	return __amd_get_nb_event_constraints(cpuc, event,
+					      amd_nb_event_constraint);
 }
 
 static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
@@ -533,6 +643,9 @@ static struct event_constraint amd_f15_PMC30 = EVENT_CONSTRAINT_OVERLAP(0, 0x09,
 static struct event_constraint amd_f15_PMC50 = EVENT_CONSTRAINT(0, 0x3F, 0);
 static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
 
+static struct event_constraint amd_NBPMC96 = EVENT_CONSTRAINT(0, 0x3C0, 0);
+static struct event_constraint amd_NBPMC74 = EVENT_CONSTRAINT(0, 0xF0, 0);
+
 static struct event_constraint *
 amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *event)
 {
@@ -598,8 +711,8 @@ amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *ev
 			return &amd_f15_PMC20;
 		}
 	case AMD_EVENT_NB:
-		/* not yet implemented */
-		return &emptyconstraint;
+		return __amd_get_nb_event_constraints(cpuc, event,
+						      amd_nb_event_constraint);
 	default:
 		return &emptyconstraint;
 	}
@@ -647,7 +760,7 @@ static __initconst const struct x86_pmu amd_pmu = {
 
 static int setup_event_constraints(void)
 {
-	if (boot_cpu_data.x86 >= 0x15)
+	if (boot_cpu_data.x86 == 0x15)
 		x86_pmu.get_event_constraints = amd_get_event_constraints_f15h;
 	return 0;
 }
@@ -677,6 +790,23 @@ static int setup_perfctr_core(void)
 	return 0;
 }
 
+static int setup_perfctr_nb(void)
+{
+	if (!cpu_has_perfctr_nb)
+		return -ENODEV;
+
+	x86_pmu.num_counters += AMD64_NUM_COUNTERS_NB;
+
+	if (cpu_has_perfctr_core)
+		amd_nb_event_constraint = &amd_NBPMC96;
+	else
+		amd_nb_event_constraint = &amd_NBPMC74;
+
+	printk(KERN_INFO "perf: AMD northbridge performance counters detected\n");
+
+	return 0;
+}
+
 __init int amd_pmu_init(void)
 {
 	/* Performance-monitoring supported from K7 and later: */
@@ -687,6 +817,7 @@ __init int amd_pmu_init(void)
 
 	setup_event_constraints();
 	setup_perfctr_core();
+	setup_perfctr_nb();
 
 	/* Events are common for all AMDs */
 	memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
-- 
1.7.9.5




* Re: [PATCH V6 0/6] perf, amd: Enable AMD family 15h northbridge counters
  2013-02-06 17:26 [PATCH V6 0/6] perf, amd: Enable AMD family 15h northbridge counters Jacob Shin
                   ` (5 preceding siblings ...)
  2013-02-06 17:26 ` [PATCH 6/6] perf, amd: Enable northbridge performance counters on AMD family 15h Jacob Shin
@ 2013-02-06 17:31 ` Jacob Shin
  2013-02-08 10:55   ` Stephane Eranian
  6 siblings, 1 reply; 21+ messages in thread
From: Jacob Shin @ 2013-02-06 17:31 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, perfmon2-devel

On Wed, Feb 06, 2013 at 11:26:23AM -0600, Jacob Shin wrote:
> The following patchset enables 4 additional performance counters in
> AMD family 15h processors that count northbridge events -- such as
> number of DRAM accesses.
> 

Here is the libpfm4 counterpart,

Thanks!

From acbc2e6f66dc131658a0fa1283d830327a44919f Mon Sep 17 00:00:00 2001
From: Jacob Shin <jacob.shin@amd.com>
Date: Thu, 31 Jan 2013 14:34:06 -0600
Subject: [PATCH V2] Add AMD Family 15h northbridge performance events

libpfm4 side support for the following Linux kernel patchset:
  http://lkml.org/lkml/2013/1/10/450

Reference -- BIOS and Kernel Developer Guide (BKDG) for AMD Family 15h
 Models 00h-0Fh Processors:
  http://support.amd.com/us/Processor_TechDocs/42301_15h_Mod_00h-0Fh_BKDG.pdf
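
For reference, a minimal sketch of resolving one of the new events
through libpfm4 into a perf_event_attr. The event string
"DRAM_ACCESSES:ALL" is an assumption based on the table below; adjust
it to whatever name your libpfm4 build actually exports:

#include <perfmon/pfmlib_perf_event.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	struct perf_event_attr attr;
	char *fstr = NULL;
	int ret, idx;

	ret = pfm_initialize();
	if (ret != PFM_SUCCESS) {
		fprintf(stderr, "pfm_initialize: %s\n", pfm_strerror(ret));
		return 1;
	}

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);

	/* NB events are counting-only; privilege levels do not apply */
	ret = pfm_get_perf_event_encoding("DRAM_ACCESSES:ALL",
					  PFM_PLM0 | PFM_PLM3,
					  &attr, &fstr, &idx);
	if (ret != PFM_SUCCESS) {
		fprintf(stderr, "cannot encode event: %s\n", pfm_strerror(ret));
		return 1;
	}

	printf("%s -> type %u config 0x%llx\n", fstr, attr.type,
	       (unsigned long long)attr.config);
	return 0;
}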
---
 lib/events/amd64_events_fam15h.h | 1128 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 1128 insertions(+)

diff --git a/lib/events/amd64_events_fam15h.h b/lib/events/amd64_events_fam15h.h
index 7f654e8..8700ab2 100644
--- a/lib/events/amd64_events_fam15h.h
+++ b/lib/events/amd64_events_fam15h.h
@@ -752,6 +752,910 @@ static const amd64_umask_t amd64_fam15h_l2_prefetcher_trigger_events[]={
    },
 };
 
+static const amd64_umask_t amd64_fam15h_dram_accesses[]={
+   { .uname = "DCT0_PAGE_HIT",
+     .udesc = "DCT0 Page hit",
+     .ucode = 0x1,
+   },
+   { .uname = "DCT0_PAGE_MISS",
+     .udesc = "DCT0 Page Miss",
+     .ucode = 0x2,
+   },
+   { .uname = "DCT0_PAGE_CONFLICT",
+     .udesc = "DCT0 Page Conflict",
+     .ucode = 0x4,
+   },
+   { .uname = "DCT1_PAGE_HIT",
+     .udesc = "DCT1 Page hit",
+     .ucode = 0x8,
+   },
+   { .uname = "DCT1_PAGE_MISS",
+     .udesc = "DCT1 Page Miss",
+     .ucode = 0x10,
+   },
+   { .uname = "DCT1_PAGE_CONFLICT",
+     .udesc = "DCT1 Page Conflict",
+     .ucode = 0x20,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode = 0x3f,
+     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_dram_controller_page_table_overflows[]={
+   { .uname = "DCT0_PAGE_TABLE_OVERFLOW",
+     .udesc = "DCT0 Page Table Overflow",
+     .ucode = 0x1,
+   },
+   { .uname = "DCT1_PAGE_TABLE_OVERFLOW",
+     .udesc = "DCT1 Page Table Overflow",
+     .ucode = 0x2,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode  = 0x3,
+     .uflags = AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_memory_controller_dram_command_slots_missed[]={
+   { .uname = "DCT0_COMMAND_SLOTS_MISSED",
+     .udesc = "DCT0 Command Slots Missed (in MemClks)",
+     .ucode = 0x1,
+   },
+   { .uname = "DCT1_COMMAND_SLOTS_MISSED",
+     .udesc = "DCT1 Command Slots Missed (in MemClks)",
+     .ucode = 0x2,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode  = 0x3,
+     .uflags = AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_memory_controller_turnarounds[]={
+   { .uname = "DCT0_DIMM_TURNAROUND",
+     .udesc = "DCT0 DIMM (chip select) turnaround",
+     .ucode = 0x1,
+   },
+   { .uname = "DCT0_READ_WRITE_TURNAROUND",
+     .udesc = "DCT0 Read to write turnaround",
+     .ucode = 0x2,
+   },
+   { .uname = "DCT0_WRITE_READ_TURNAROUND",
+     .udesc = "DCT0 Write to read turnaround",
+     .ucode = 0x4,
+   },
+   { .uname = "DCT1_DIMM_TURNAROUND",
+     .udesc = "DCT1 DIMM (chip select) turnaround",
+     .ucode = 0x8,
+   },
+   { .uname = "DCT1_READ_WRITE_TURNAROUND",
+     .udesc = "DCT1 Read to write turnaround",
+     .ucode = 0x10,
+   },
+   { .uname = "DCT1_WRITE_READ_TURNAROUND",
+     .udesc = "DCT1 Write to read turnaround",
+     .ucode = 0x20,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode  = 0x3f,
+     .uflags = AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_memory_controller_bypass_counter_saturation[]={
+   { .uname = "MEMORY_CONTROLLER_HIGH_PRIORITY_BYPASS",
+     .udesc = "Memory controller high priority bypass",
+     .ucode = 0x1,
+   },
+   { .uname = "MEMORY_CONTROLLER_MEDIUM_PRIORITY_BYPASS",
+     .udesc = "Memory controller medium priority bypass",
+     .ucode = 0x2,
+   },
+   { .uname = "DCT0_DCQ_BYPASS",
+     .udesc = "DCT0 DCQ bypass",
+     .ucode = 0x4,
+   },
+   { .uname = "DCT1_DCQ_BYPASS",
+     .udesc = "DCT1 DCQ bypass",
+     .ucode = 0x8,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode  = 0xf,
+     .uflags = AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_thermal_status[]={
+   { .uname = "NUM_HTC_TRIP_POINT_CROSSED",
+     .udesc = "Number of times the HTC trip point is crossed",
+     .ucode = 0x4,
+   },
+   { .uname = "NUM_CLOCKS_HTC_PSTATE_INACTIVE",
+     .udesc = "Number of clocks HTC P-state is inactive",
+     .ucode = 0x20,
+   },
+   { .uname = "NUM_CLOCKS_HTC_PSTATE_ACTIVE",
+     .udesc = "Number of clocks HTC P-state is active",
+     .ucode = 0x40,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode = 0x64,
+     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_cpu_io_requests_to_memory_io[]={
+   { .uname = "REMOTE_IO_TO_LOCAL_IO",
+     .udesc = "Remote IO to Local IO",
+     .ucode = 0x61,
+   },
+   { .uname = "REMOTE_CPU_TO_LOCAL_IO",
+     .udesc = "Remote CPU to Local IO",
+     .ucode = 0x64,
+   },
+   { .uname = "LOCAL_IO_TO_REMOTE_IO",
+     .udesc = "Local IO to Remote IO",
+     .ucode = 0x91,
+   },
+   { .uname = "LOCAL_IO_TO_REMOTE_MEM",
+     .udesc = "Local IO to Remote Mem",
+     .ucode = 0x92,
+   },
+   { .uname = "LOCAL_CPU_TO_REMOTE_IO",
+     .udesc = "Local CPU to Remote IO",
+     .ucode = 0x94,
+   },
+   { .uname = "LOCAL_CPU_TO_REMOTE_MEM",
+     .udesc = "Local CPU to Remote Mem",
+     .ucode = 0x98,
+   },
+   { .uname = "LOCAL_IO_TO_LOCAL_IO",
+     .udesc = "Local IO to Local IO",
+     .ucode = 0xa1,
+   },
+   { .uname = "LOCAL_IO_TO_LOCAL_MEM",
+     .udesc = "Local IO to Local Mem",
+     .ucode = 0xa2,
+   },
+   { .uname = "LOCAL_CPU_TO_LOCAL_IO",
+     .udesc = "Local CPU to Local IO",
+     .ucode = 0xa4,
+   },
+   { .uname = "LOCAL_CPU_TO_LOCAL_MEM",
+     .udesc = "Local CPU to Local Mem",
+     .ucode = 0xa8,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_cache_block_commands[]={
+   { .uname = "VICTIM_BLOCK",
+     .udesc = "Victim Block (Writeback)",
+     .ucode = 0x1,
+   },
+   { .uname = "READ_BLOCK",
+     .udesc = "Read Block (Dcache load miss refill)",
+     .ucode = 0x4,
+   },
+   { .uname = "READ_BLOCK_SHARED",
+     .udesc = "Read Block Shared (Icache refill)",
+     .ucode = 0x8,
+   },
+   { .uname = "READ_BLOCK_MODIFIED",
+     .udesc = "Read Block Modified (Dcache store miss refill)",
+     .ucode = 0x10,
+   },
+   { .uname = "CHANGE_TO_DIRTY",
+     .udesc = "Change-to-Dirty (first store to clean block already in cache)",
+     .ucode = 0x20,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode  = 0x3d,
+     .uflags = AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_sized_commands[]={
+   { .uname = "NON-POSTED_SZWR_BYTE",
+     .udesc = "Non-Posted SzWr Byte (1-32 bytes). Typical Usage: Legacy or mapped IO, typically 1-4 bytes.",
+     .ucode = 0x1,
+   },
+   { .uname = "NON-POSTED_SZWR_DW",
+     .udesc = "Non-Posted SzWr DW (1-16 dwords). Typical Usage: Legacy or mapped IO, typically 1",
+     .ucode = 0x2,
+   },
+   { .uname = "POSTED_SZWR_BYTE",
+     .udesc = "Posted SzWr Byte (1-32 bytes). Typical Usage: Subcache-line DMA writes, size varies; also",
+     .ucode = 0x4,
+   },
+   { .uname = "POSTED_SZWR_DW",
+     .udesc = "Posted SzWr DW (1-16 dwords). Typical Usage: Block-oriented DMA writes, often cache-line",
+     .ucode = 0x8,
+   },
+   { .uname = "SZRD_BYTE",
+     .udesc = "SzRd Byte (4 bytes). Typical Usage: Legacy or mapped IO.",
+     .ucode = 0x10,
+   },
+   { .uname = "SZRD_DW",
+     .udesc = "SzRd DW (1-16 dwords). Typical Usage: Block-oriented DMA reads, typically cache-line size.",
+     .ucode = 0x20,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode = 0x3f,
+     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_probe_responses_and_upstream_requests[]={
+   { .uname = "PROBE_MISS",
+     .udesc = "Probe miss",
+     .ucode = 0x1,
+   },
+   { .uname = "PROBE_HIT_CLEAN",
+     .udesc = "Probe hit clean",
+     .ucode = 0x2,
+   },
+   { .uname = "PROBE_HIT_DIRTY_WITHOUT_MEMORY_CANCEL",
+     .udesc = "Probe hit dirty without memory cancel (probed by Sized Write or Change2Dirty)",
+     .ucode = 0x4,
+   },
+   { .uname = "PROBE_HIT_DIRTY_WITH_MEMORY_CANCEL",
+     .udesc = "Probe hit dirty with memory cancel (probed by DMA read or cache refill request)",
+     .ucode = 0x8,
+   },
+   { .uname = "UPSTREAM_DISPLAY_REFRESH_ISOC_READS",
+     .udesc = "Upstream display refresh/ISOC reads",
+     .ucode = 0x10,
+   },
+   { .uname = "UPSTREAM_NON-DISPLAY_REFRESH_READS",
+     .udesc = "Upstream non-display refresh reads",
+     .ucode = 0x20,
+   },
+   { .uname = "UPSTREAM_ISOC_WRITES",
+     .udesc = "Upstream ISOC writes",
+     .ucode = 0x40,
+   },
+   { .uname = "UPSTREAM_NON-ISOC_WRITES",
+     .udesc = "Upstream non-ISOC writes",
+     .ucode = 0x80,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode = 0xff,
+     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_gart_events[]={
+   { .uname = "GART_APERTURE_HIT_ON_ACCESS_FROM_CPU",
+     .udesc = "GART aperture hit on access from CPU",
+     .ucode = 0x1,
+   },
+   { .uname = "GART_APERTURE_HIT_ON_ACCESS_FROM_IO",
+     .udesc = "GART aperture hit on access from IO",
+     .ucode = 0x2,
+   },
+   { .uname = "GART_MISS",
+     .udesc = "GART miss",
+     .ucode = 0x4,
+   },
+   { .uname = "GART_REQUEST_HIT_TABLE_WALK_IN_PROGRESS",
+     .udesc = "GART Request hit table walk in progress",
+     .ucode = 0x8,
+   },
+   { .uname = "GART_MULTIPLE_TABLE_WALK_IN_PROGRESS",
+     .udesc = "GART multiple table walk in progress",
+     .ucode = 0x80,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode = 0x8f,
+     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_link_transmit_bandwidth[]={
+   { .uname = "COMMAND_DW_SENT",
+     .udesc = "Command DW sent",
+     .ucode = 0x1,
+   },
+   { .uname = "DATA_DW_SENT",
+     .udesc = "Data DW sent",
+     .ucode = 0x2,
+   },
+   { .uname = "BUFFER_RELEASE_DW_SENT",
+     .udesc = "Buffer release DW sent",
+     .ucode = 0x4,
+   },
+   { .uname = "NOP_DW_SENT",
+     .udesc = "NOP DW sent (idle)",
+     .ucode = 0x8,
+   },
+   { .uname = "ADDRESS_DW_SENT",
+     .udesc = "Address (including extensions) DW sent",
+     .ucode = 0x10,
+   },
+   { .uname = "PER_PACKET_CRC_SENT",
+     .udesc = "Per packet CRC sent",
+     .ucode = 0x20,
+   },
+   { .uname = "SUBLINK_COMMAND_DW_SENT",
+     .udesc = "Sublink Command DW sent",
+     .ucode = 0x81,
+   },
+   { .uname = "SUBLINK_DATA_DW_SENT",
+     .udesc = "Sublink Data DW sent",
+     .ucode = 0x82,
+   },
+   { .uname = "SUBLINK_BUFFER_RELEASE_DW_SENT",
+     .udesc = "Sublink Buffer release DW sent",
+     .ucode = 0x84,
+   },
+   { .uname = "SUBLINK_NOP_DW_SENT",
+     .udesc = "Sublink NOP DW sent (idle)",
+     .ucode = 0x88,
+   },
+   { .uname = "SUBLINK_ADDRESS_DW_SENT",
+     .udesc = "Sublink Address (including extensions) DW sent",
+     .ucode = 0x90,
+   },
+   { .uname = "SUBLINK_PER_PACKET_CRC_SENT",
+     .udesc = "Sublink Per packet CRC sent",
+     .ucode = 0xa0,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode = 0x3f,
+     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_cpu_to_dram_requests_to_target_node[]={
+   { .uname = "LOCAL_TO_NODE_0",
+     .udesc = "From Local node to Node 0",
+     .ucode = 0x1,
+   },
+   { .uname = "LOCAL_TO_NODE_1",
+     .udesc = "From Local node to Node 1",
+     .ucode = 0x2,
+   },
+   { .uname = "LOCAL_TO_NODE_2",
+     .udesc = "From Local node to Node 2",
+     .ucode = 0x4,
+   },
+   { .uname = "LOCAL_TO_NODE_3",
+     .udesc = "From Local node to Node 3",
+     .ucode = 0x8,
+   },
+   { .uname = "LOCAL_TO_NODE_4",
+     .udesc = "From Local node to Node 4",
+     .ucode = 0x10,
+   },
+   { .uname = "LOCAL_TO_NODE_5",
+     .udesc = "From Local node to Node 5",
+     .ucode = 0x20,
+   },
+   { .uname = "LOCAL_TO_NODE_6",
+     .udesc = "From Local node to Node 6",
+     .ucode = 0x40,
+   },
+   { .uname = "LOCAL_TO_NODE_7",
+     .udesc = "From Local node to Node 7",
+     .ucode = 0x80,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode = 0xff,
+     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_io_to_dram_requests_to_target_node[]={
+   { .uname = "LOCAL_TO_NODE_0",
+     .udesc = "From Local node to Node 0",
+     .ucode = 0x1,
+   },
+   { .uname = "LOCAL_TO_NODE_1",
+     .udesc = "From Local node to Node 1",
+     .ucode = 0x2,
+   },
+   { .uname = "LOCAL_TO_NODE_2",
+     .udesc = "From Local node to Node 2",
+     .ucode = 0x4,
+   },
+   { .uname = "LOCAL_TO_NODE_3",
+     .udesc = "From Local node to Node 3",
+     .ucode = 0x8,
+   },
+   { .uname = "LOCAL_TO_NODE_4",
+     .udesc = "From Local node to Node 4",
+     .ucode = 0x10,
+   },
+   { .uname = "LOCAL_TO_NODE_5",
+     .udesc = "From Local node to Node 5",
+     .ucode = 0x20,
+   },
+   { .uname = "LOCAL_TO_NODE_6",
+     .udesc = "From Local node to Node 6",
+     .ucode = 0x40,
+   },
+   { .uname = "LOCAL_TO_NODE_7",
+     .udesc = "From Local node to Node 7",
+     .ucode = 0x80,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode = 0xff,
+     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_cpu_read_command_requests_to_target_node_0_3[]={
+   { .uname = "READ_BLOCK_LOCAL_TO_NODE_0",
+     .udesc = "Read block From Local node to Node 0",
+     .ucode = 0x11,
+   },
+   { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_0",
+     .udesc = "Read block shared From Local node to Node 0",
+     .ucode = 0x12,
+   },
+   { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_0",
+     .udesc = "Read block modified From Local node to Node 0",
+     .ucode = 0x14,
+   },
+   { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_0",
+     .udesc = "Change-to-Dirty From Local node to Node 0",
+     .ucode = 0x18,
+   },
+   { .uname = "READ_BLOCK_LOCAL_TO_NODE_1",
+     .udesc = "Read block From Local node to Node 1",
+     .ucode = 0x21,
+   },
+   { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_1",
+     .udesc = "Read block shared From Local node to Node 1",
+     .ucode = 0x22,
+   },
+   { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_1",
+     .udesc = "Read block modified From Local node to Node 1",
+     .ucode = 0x24,
+   },
+   { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_1",
+     .udesc = "Change-to-Dirty From Local node to Node 1",
+     .ucode = 0x28,
+   },
+   { .uname = "READ_BLOCK_LOCAL_TO_NODE_2",
+     .udesc = "Read block From Local node to Node 2",
+     .ucode = 0x41,
+   },
+   { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_2",
+     .udesc = "Read block shared From Local node to Node 2",
+     .ucode = 0x42,
+   },
+   { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_2",
+     .udesc = "Read block modified From Local node to Node 2",
+     .ucode = 0x44,
+   },
+   { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_2",
+     .udesc = "Change-to-Dirty From Local node to Node 2",
+     .ucode = 0x48,
+   },
+   { .uname = "READ_BLOCK_LOCAL_TO_NODE_3",
+     .udesc = "Read block From Local node to Node 3",
+     .ucode = 0x81,
+   },
+   { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_3",
+     .udesc = "Read block shared From Local node to Node 3",
+     .ucode = 0x82,
+   },
+   { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_3",
+     .udesc = "Read block modified From Local node to Node 3",
+     .ucode = 0x84,
+   },
+   { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_3",
+     .udesc = "Change-to-Dirty From Local node to Node 3",
+     .ucode = 0x88,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode = 0xff,
+     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_cpu_read_command_requests_to_target_node_4_7[]={
+   { .uname = "READ_BLOCK_LOCAL_TO_NODE_4",
+     .udesc = "Read block From Local node to Node 4",
+     .ucode = 0x11,
+   },
+   { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_4",
+     .udesc = "Read block shared From Local node to Node 4",
+     .ucode = 0x12,
+   },
+   { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_4",
+     .udesc = "Read block modified From Local node to Node 4",
+     .ucode = 0x14,
+   },
+   { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_4",
+     .udesc = "Change-to-Dirty From Local node to Node 4",
+     .ucode = 0x18,
+   },
+   { .uname = "READ_BLOCK_LOCAL_TO_NODE_5",
+     .udesc = "Read block From Local node to Node 5",
+     .ucode = 0x21,
+   },
+   { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_5",
+     .udesc = "Read block shared From Local node to Node 5",
+     .ucode = 0x22,
+   },
+   { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_5",
+     .udesc = "Read block modified From Local node to Node 5",
+     .ucode = 0x24,
+   },
+   { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_5",
+     .udesc = "Change-to-Dirty From Local node to Node 5",
+     .ucode = 0x28,
+   },
+   { .uname = "READ_BLOCK_LOCAL_TO_NODE_6",
+     .udesc = "Read block From Local node to Node 6",
+     .ucode = 0x41,
+   },
+   { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_6",
+     .udesc = "Read block shared From Local node to Node 6",
+     .ucode = 0x42,
+   },
+   { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_6",
+     .udesc = "Read block modified From Local node to Node 6",
+     .ucode = 0x44,
+   },
+   { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_6",
+     .udesc = "Change-to-Dirty From Local node to Node 6",
+     .ucode = 0x48,
+   },
+   { .uname = "READ_BLOCK_LOCAL_TO_NODE_7",
+     .udesc = "Read block From Local node to Node 7",
+     .ucode = 0x81,
+   },
+   { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_7",
+     .udesc = "Read block shared From Local node to Node 7",
+     .ucode = 0x82,
+   },
+   { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_7",
+     .udesc = "Read block modified From Local node to Node 7",
+     .ucode = 0x84,
+   },
+   { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_7",
+     .udesc = "Change-to-Dirty From Local node to Node 7",
+     .ucode = 0x88,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode = 0xff,
+     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_cpu_command_requests_to_target_node[]={
+   { .uname = "READ_SIZED_LOCAL_TO_NODE_0",
+     .udesc = "Read Sized From Local node to Node 0",
+     .ucode = 0x11,
+   },
+   { .uname = "WRITE_SIZED_LOCAL_TO_NODE_0",
+     .udesc = "Write Sized From Local node to Node 0",
+     .ucode = 0x12,
+   },
+   { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_0",
+     .udesc = "Victim Block From Local node to Node 0",
+     .ucode = 0x14,
+   },
+   { .uname = "READ_SIZED_LOCAL_TO_NODE_1",
+     .udesc = "Read Sized From Local node to Node 1",
+     .ucode = 0x21,
+   },
+   { .uname = "WRITE_SIZED_LOCAL_TO_NODE_1",
+     .udesc = "Write Sized From Local node to Node 1",
+     .ucode = 0x22,
+   },
+   { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_1",
+     .udesc = "Victim Block From Local node to Node 1",
+     .ucode = 0x24,
+   },
+   { .uname = "READ_SIZED_LOCAL_TO_NODE_2",
+     .udesc = "Read Sized From Local node to Node 2",
+     .ucode = 0x41,
+   },
+   { .uname = "WRITE_SIZED_LOCAL_TO_NODE_2",
+     .udesc = "Write Sized From Local node to Node 2",
+     .ucode = 0x42,
+   },
+   { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_2",
+     .udesc = "Victim Block From Local node to Node 2",
+     .ucode = 0x44,
+   },
+   { .uname = "READ_SIZED_LOCAL_TO_NODE_3",
+     .udesc = "Read Sized From Local node to Node 3",
+     .ucode = 0x81,
+   },
+   { .uname = "WRITE_SIZED_LOCAL_TO_NODE_3",
+     .udesc = "Write Sized From Local node to Node 3",
+     .ucode = 0x82,
+   },
+   { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_3",
+     .udesc = "Victim Block From Local node to Node 3",
+     .ucode = 0x84,
+   },
+   { .uname = "READ_SIZED_LOCAL_TO_NODE_4",
+     .udesc = "Read Sized From Local node to Node 4",
+     .ucode = 0x19,
+   },
+   { .uname = "WRITE_SIZED_LOCAL_TO_NODE_4",
+     .udesc = "Write Sized From Local node to Node 4",
+     .ucode = 0x1a,
+   },
+   { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_4",
+     .udesc = "Victim Block From Local node to Node 4",
+     .ucode = 0x1c,
+   },
+   { .uname = "READ_SIZED_LOCAL_TO_NODE_5",
+     .udesc = "Read Sized From Local node to Node 5",
+     .ucode = 0x29,
+   },
+   { .uname = "WRITE_SIZED_LOCAL_TO_NODE_5",
+     .udesc = "Write Sized From Local node to Node 5",
+     .ucode = 0x2a,
+   },
+   { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_5",
+     .udesc = "Victim Block From Local node to Node 5",
+     .ucode = 0x2c,
+   },
+   { .uname = "READ_SIZED_LOCAL_TO_NODE_6",
+     .udesc = "Read Sized From Local node to Node 6",
+     .ucode = 0x49,
+   },
+   { .uname = "WRITE_SIZED_LOCAL_TO_NODE_6",
+     .udesc = "Write Sized From Local node to Node 6",
+     .ucode = 0x4a,
+   },
+   { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_6",
+     .udesc = "Victim Block From Local node to Node 6",
+     .ucode = 0x4c,
+   },
+   { .uname = "READ_SIZED_LOCAL_TO_NODE_7",
+     .udesc = "Read Sized From Local node to Node 7",
+     .ucode = 0x89,
+   },
+   { .uname = "WRITE_SIZED_LOCAL_TO_NODE_7",
+     .udesc = "Write Sized From Local node to Node 7",
+     .ucode = 0x8a,
+   },
+   { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_7",
+     .udesc = "Victim Block From Local node to Node 7",
+     .ucode = 0x8c,
+   },
+   { .uname  = "ALL_LOCAL_TO_NODE_0_3",
+     .udesc  = "All From Local node to Node 0-3",
+     .ucode = 0xf7,
+     .uflags= AMD64_FL_NCOMBO,
+   },
+   { .uname  = "ALL_LOCAL_TO_NODE_4_7",
+     .udesc  = "All From Local node to Node 4-7",
+     .ucode = 0xff,
+     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_request_cache_status_0[]={
+   { .uname = "PROBE_HIT_S",
+     .udesc = "Probe Hit S",
+     .ucode = 0x1,
+   },
+   { .uname = "PROBE_HIT_E",
+     .udesc = "Probe Hit E",
+     .ucode = 0x2,
+   },
+   { .uname = "PROBE_HIT_MUW_OR_O",
+     .udesc = "Probe Hit MuW or O",
+     .ucode = 0x4,
+   },
+   { .uname = "PROBE_HIT_M",
+     .udesc = "Probe Hit M",
+     .ucode = 0x8,
+   },
+   { .uname = "PROBE_MISS",
+     .udesc = "Probe Miss",
+     .ucode = 0x10,
+   },
+   { .uname = "DIRECTED_PROBE",
+     .udesc = "Directed Probe",
+     .ucode = 0x20,
+   },
+   { .uname = "TRACK_CACHE_STAT_FOR_RDBLK",
+     .udesc = "Track Cache Stat for RdBlk",
+     .ucode = 0x40,
+   },
+   { .uname = "TRACK_CACHE_STAT_FOR_RDBLKS",
+     .udesc = "Track Cache Stat for RdBlkS",
+     .ucode = 0x80,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode = 0xff,
+     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_request_cache_status_1[]={
+   { .uname = "PROBE_HIT_S",
+     .udesc = "Probe Hit S",
+     .ucode = 0x1,
+   },
+   { .uname = "PROBE_HIT_E",
+     .udesc = "Probe Hit E",
+     .ucode = 0x2,
+   },
+   { .uname = "PROBE_HIT_MUW_OR_O",
+     .udesc = "Probe Hit MuW or O",
+     .ucode = 0x4,
+   },
+   { .uname = "PROBE_HIT_M",
+     .udesc = "Probe Hit M",
+     .ucode = 0x8,
+   },
+   { .uname = "PROBE_MISS",
+     .udesc = "Probe Miss",
+     .ucode = 0x10,
+   },
+   { .uname = "DIRECTED_PROBE",
+     .udesc = "Directed Probe",
+     .ucode = 0x20,
+   },
+   { .uname = "TRACK_CACHE_STAT_FOR_CHGTODIRTY",
+     .udesc = "Track Cache Stat for ChgToDirty",
+     .ucode = 0x40,
+   },
+   { .uname = "TRACK_CACHE_STAT_FOR_RDBLKM",
+     .udesc = "Track Cache Stat for RdBlkM",
+     .ucode = 0x80,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode = 0xff,
+     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_memory_controller_requests[]={
+   { .uname = "WRITE_REQUESTS_TO_DCT",
+     .udesc = "Write requests sent to the DCT",
+     .ucode = 0x1,
+   },
+   { .uname = "READ_REQUESTS_TO_DCT",
+     .udesc = "Read requests (including prefetch requests) sent to the DCT",
+     .ucode = 0x2,
+   },
+   { .uname = "PREFETCH_REQUESTS_TO_DCT",
+     .udesc = "Prefetch requests sent to the DCT",
+     .ucode = 0x4,
+   },
+   { .uname = "32_BYTES_SIZED_WRITES",
+     .udesc = "32 Bytes Sized Writes",
+     .ucode = 0x8,
+   },
+   { .uname = "64_BYTES_SIZED_WRITES",
+     .udesc = "64 Bytes Sized Writes",
+     .ucode = 0x10,
+   },
+   { .uname = "32_BYTES_SIZED_READS",
+     .udesc = "32 Bytes Sized Reads",
+     .ucode = 0x20,
+   },
+   { .uname = "64_BYTE_SIZED_READS",
+     .udesc = "64 Byte Sized Reads",
+     .ucode = 0x40,
+   },
+   { .uname = "READ_REQUESTS_TO_DCT_WHILE_WRITES_PENDING",
+     .udesc = "Read requests sent to the DCT while writes requests are pending in the DCT",
+     .ucode = 0x80,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode = 0xff,
+     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_read_request_to_l3_cache[]={
+   { .uname = "READ_BLOCK_EXCLUSIVE",
+     .udesc = "Read Block Exclusive (Data cache read)",
+     .ucode = 0xf1,
+   },
+   { .uname = "READ_BLOCK_SHARED",
+     .udesc = "Read Block Shared (Instruction cache read)",
+     .ucode = 0xf2,
+   },
+   { .uname = "READ_BLOCK_MODIFY",
+     .udesc = "Read Block Modify",
+     .ucode = 0xf4,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode = 0xf7,
+     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_l3_fills_caused_by_l2_evictions[]={
+   { .uname = "SHARED",
+     .udesc = "Shared",
+     .ucode = 0xf1,
+   },
+   { .uname = "EXCLUSIVE",
+     .udesc = "Exclusive",
+     .ucode = 0xf2,
+   },
+   { .uname = "OWNED",
+     .udesc = "Owned",
+     .ucode = 0xf4,
+   },
+   { .uname = "MODIFIED",
+     .udesc = "Modified",
+     .ucode = 0xf8,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode = 0xff,
+     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_l3_evictions[]={
+   { .uname = "SHARED",
+     .udesc = "Shared",
+     .ucode = 0x1,
+   },
+   { .uname = "EXCLUSIVE",
+     .udesc = "Exclusive",
+     .ucode = 0x2,
+   },
+   { .uname = "OWNED",
+     .udesc = "Owned",
+     .ucode = 0x4,
+   },
+   { .uname = "MODIFIED",
+     .udesc = "Modified",
+     .ucode = 0x8,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode = 0xf,
+     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
+static const amd64_umask_t amd64_fam15h_l3_latency[]={
+   { .uname = "L3_REQUEST_CYCLE",
+     .udesc = "L3 Request cycle count.",
+     .ucode = 0x1,
+   },
+   { .uname = "L3_REQUEST",
+     .udesc = "L3 request count.",
+     .ucode = 0x2,
+   },
+   { .uname  = "ALL",
+     .udesc  = "All sub-events selected",
+     .ucode = 0x3,
+     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
+   },
+};
+
 static const amd64_entry_t amd64_fam15h_pe[]={
 { .name    = "DISPATCHED_FPU_OPS",
   .desc    = "FPU Pipe Assignment",
@@ -1256,4 +2160,228 @@ static const amd64_entry_t amd64_fam15h_pe[]={
   .modmsk  = AMD64_FAM15H_ATTRS,
   .code    = 0x1d8,
 },
+{ .name    = "DRAM_ACCESSES",
+  .desc    = "DRAM Accesses",
+  .code    = 0xe0,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_dram_accesses),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_dram_accesses,
+},
+{ .name    = "DRAM_CONTROLLER_PAGE_TABLE_OVERFLOWS",
+  .desc    = "DRAM Controller Page Table Overflows",
+  .code    = 0xe1,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_dram_controller_page_table_overflows),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_dram_controller_page_table_overflows,
+},
+{ .name    = "MEMORY_CONTROLLER_DRAM_COMMAND_SLOTS_MISSED",
+  .desc    = "Memory Controller DRAM Command Slots Missed",
+  .code    = 0xe2,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_memory_controller_dram_command_slots_missed),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_memory_controller_dram_command_slots_missed,
+},
+{ .name    = "MEMORY_CONTROLLER_TURNAROUNDS",
+  .desc    = "Memory Controller Turnarounds",
+  .code    = 0xe3,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_memory_controller_turnarounds),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_memory_controller_turnarounds,
+},
+{ .name    = "MEMORY_CONTROLLER_BYPASS_COUNTER_SATURATION",
+  .desc    = "Memory Controller Bypass Counter Saturation",
+  .code    = 0xe4,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_memory_controller_bypass_counter_saturation),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_memory_controller_bypass_counter_saturation,
+},
+{ .name    = "THERMAL_STATUS",
+  .desc    = "Thermal Status",
+  .code    = 0xe8,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_thermal_status),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_thermal_status,
+},
+{ .name    = "CPU_IO_REQUESTS_TO_MEMORY_IO",
+  .desc    = "CPU/IO Requests to Memory/IO",
+  .code    = 0xe9,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_io_requests_to_memory_io),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_cpu_io_requests_to_memory_io,
+},
+{ .name    = "CACHE_BLOCK_COMMANDS",
+  .desc    = "Cache Block Commands",
+  .code    = 0xea,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cache_block_commands),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_cache_block_commands,
+},
+{ .name    = "SIZED_COMMANDS",
+  .desc    = "Sized Commands",
+  .code    = 0xeb,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_sized_commands),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_sized_commands,
+},
+{ .name    = "PROBE_RESPONSES_AND_UPSTREAM_REQUESTS",
+  .desc    = "Probe Responses and Upstream Requests",
+  .code    = 0xec,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_probe_responses_and_upstream_requests),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_probe_responses_and_upstream_requests,
+},
+{ .name    = "GART_EVENTS",
+  .desc    = "GART Events",
+  .code    = 0xee,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_gart_events),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_gart_events,
+},
+{ .name    = "LINK_TRANSMIT_BANDWIDTH_LINK_0",
+  .desc    = "Link Transmit Bandwidth Link 0",
+  .code    = 0xf6,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_link_transmit_bandwidth),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_link_transmit_bandwidth,
+},
+{ .name    = "LINK_TRANSMIT_BANDWIDTH_LINK_1",
+  .desc    = "Link Transmit Bandwidth Link 1",
+  .code    = 0xf7,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_link_transmit_bandwidth),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_link_transmit_bandwidth,
+},
+{ .name    = "LINK_TRANSMIT_BANDWIDTH_LINK_2",
+  .desc    = "Link Transmit Bandwidth Link 2",
+  .code    = 0xf8,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_link_transmit_bandwidth),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_link_transmit_bandwidth,
+},
+{ .name    = "LINK_TRANSMIT_BANDWIDTH_LINK_3",
+  .desc    = "Link Transmit Bandwidth Link 3",
+  .code    = 0x1f9,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_link_transmit_bandwidth),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_link_transmit_bandwidth,
+},
+{ .name    = "CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE",
+  .desc    = "CPU to DRAM Requests to Target Node",
+  .code    = 0x1e0,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_to_dram_requests_to_target_node),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_cpu_to_dram_requests_to_target_node,
+},
+{ .name    = "IO_TO_DRAM_REQUESTS_TO_TARGET_NODE",
+  .desc    = "IO to DRAM Requests to Target Node",
+  .code    = 0x1e1,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_io_to_dram_requests_to_target_node),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_io_to_dram_requests_to_target_node,
+},
+{ .name    = "CPU_READ_COMMAND_LATENCY_TO_TARGET_NODE_0_3",
+  .desc    = "CPU Read Command Latency to Target Node 0-3",
+  .code    = 0x1e2,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_read_command_requests_to_target_node_0_3),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_cpu_read_command_requests_to_target_node_0_3,
+},
+{ .name    = "CPU_READ_COMMAND_REQUESTS_TO_TARGET_NODE_0_3",
+  .desc    = "CPU Read Command Requests to Target Node 0-3",
+  .code    = 0x1e3,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_read_command_requests_to_target_node_0_3),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_cpu_read_command_requests_to_target_node_0_3,
+},
+{ .name    = "CPU_READ_COMMAND_LATENCY_TO_TARGET_NODE_4_7",
+  .desc    = "CPU Read Command Latency to Target Node 4-7",
+  .code    = 0x1e4,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_read_command_requests_to_target_node_4_7),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_cpu_read_command_requests_to_target_node_4_7,
+},
+{ .name    = "CPU_READ_COMMAND_REQUESTS_TO_TARGET_NODE_4_7",
+  .desc    = "CPU Read Command Requests to Target Node 4-7",
+  .code    = 0x1e5,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_read_command_requests_to_target_node_4_7),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_cpu_read_command_requests_to_target_node_4_7,
+},
+{ .name    = "CPU_COMMAND_LATENCY_TO_TARGET_NODE",
+  .desc    = "CPU Command Latency to Target Node",
+  .code    = 0x1e6,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_command_requests_to_target_node),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_cpu_command_requests_to_target_node,
+},
+{ .name    = "CPU_REQUESTS_TO_TARGET_NODE",
+  .desc    = "CPU Requests to Target Node",
+  .code    = 0x1e7,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_command_requests_to_target_node),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_cpu_command_requests_to_target_node,
+},
+{ .name    = "REQUEST_CACHE_STATUS_0",
+  .desc    = "Request Cache Status 0",
+  .code    = 0x1ea,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_request_cache_status_0),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_request_cache_status_0,
+},
+{ .name    = "REQUEST_CACHE_STATUS_1",
+  .desc    = "Request Cache Status 1",
+  .code    = 0x1eb,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_request_cache_status_1),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_request_cache_status_1,
+},
+{ .name    = "MEMORY_CONTROLLER_REQUESTS",
+  .desc    = "Memory Controller Requests",
+  .code    = 0x1f0,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_memory_controller_requests),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_memory_controller_requests,
+},
+{ .name    = "READ_REQUEST_TO_L3_CACHE",
+  .desc    = "Read Request to L3 Cache",
+  .code    = 0x4e0,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_read_request_to_l3_cache),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_read_request_to_l3_cache,
+},
+{ .name    = "L3_CACHE_MISSES",
+  .desc    = "L3 Cache Misses",
+  .code    = 0x4e1,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_read_request_to_l3_cache),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_read_request_to_l3_cache,
+},
+{ .name    = "L3_FILLS_CAUSED_BY_L2_EVICTIONS",
+  .desc    = "L3 Fills caused by L2 Evictions",
+  .code    = 0x4e2,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_l3_fills_caused_by_l2_evictions),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_l3_fills_caused_by_l2_evictions,
+},
+{ .name    = "L3_EVICTIONS",
+  .desc    = "L3 Evictions",
+  .code    = 0x4e3,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_l3_evictions),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_l3_evictions,
+},
+{ .name    = "NON_CANCELED_L3_READ_REQUESTS",
+  .desc    = "Non-canceled L3 Read Requests",
+  .code    = 0x4ed,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_read_request_to_l3_cache),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_read_request_to_l3_cache,
+},
+{ .name    = "L3_LATENCY",
+  .desc    = "L3 Latency",
+  .code    = 0x4ef,
+  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_l3_latency),
+  .ngrp    = 1,
+  .umasks  = amd64_fam15h_l3_latency,
+},
 };
-- 
1.7.9.5
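
For readers who want to exercise these new family 15h northbridge event tables from user space, a minimal libpfm4 sketch follows. It is an illustration only: the PMU prefix "amd64_fam15h_interlagos" and the default-umask selection are assumptions based on libpfm4's usual conventions rather than something stated in this patch, so adjust the event string to whatever the installed library actually exports.

/* Hedged sketch: resolve one of the fam15h events above into a
 * perf_event_attr via libpfm4. The PMU name in the event string is an
 * assumption; DRAM_ACCESSES itself comes from the table in this patch. */
#include <stdio.h>
#include <string.h>
#include <perfmon/pfmlib.h>
#include <perfmon/pfmlib_perf_event.h>

int main(void)
{
	struct perf_event_attr attr;
	pfm_perf_encode_arg_t arg;
	int ret;

	if (pfm_initialize() != PFM_SUCCESS) {
		fprintf(stderr, "pfm_initialize failed\n");
		return 1;
	}

	memset(&attr, 0, sizeof(attr));
	memset(&arg, 0, sizeof(arg));
	arg.attr = &attr;
	arg.size = sizeof(arg);

	/* no explicit umask given: the table's default umask is used */
	ret = pfm_get_os_event_encoding("amd64_fam15h_interlagos::DRAM_ACCESSES",
					PFM_PLM0 | PFM_PLM3,
					PFM_OS_PERF_EVENT, &arg);
	if (ret != PFM_SUCCESS) {
		fprintf(stderr, "encoding failed: %s\n", pfm_strerror(ret));
		return 1;
	}

	printf("type=%u config=0x%llx\n", attr.type,
	       (unsigned long long)attr.config);
	pfm_terminate();
	return 0;
}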



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [tip:perf/core] perf/x86/amd: Rework northbridge event constraints handler
  2013-02-06 17:26 ` [PATCH 1/6] perf, amd: Rework northbridge event constraints handler Jacob Shin
@ 2013-02-06 20:28   ` tip-bot for Robert Richter
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Robert Richter @ 2013-02-06 20:28 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, paulus, eranian, hpa, mingo, a.p.zijlstra, acme,
	jolsa, jacob.shin, tglx, rric

Commit-ID:  2c53c3dd0b6497484b29fd49d34ef98acbc14577
Gitweb:     http://git.kernel.org/tip/2c53c3dd0b6497484b29fd49d34ef98acbc14577
Author:     Robert Richter <rric@kernel.org>
AuthorDate: Wed, 6 Feb 2013 11:26:24 -0600
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 6 Feb 2013 19:45:22 +0100

perf/x86/amd: Rework northbridge event constraints handler

Code simplification. No functional changes.

Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Robert Richter <rric@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1360171589-6381-2-git-send-email-jacob.shin@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/perf_event_amd.c | 68 ++++++++++++++----------------------
 1 file changed, 26 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index c93bc4e..e7963c7 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -256,9 +256,8 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	struct amd_nb *nb = cpuc->amd_nb;
-	struct perf_event *old = NULL;
-	int max = x86_pmu.num_counters;
-	int i, j, k = -1;
+	struct perf_event *old;
+	int idx, new = -1;
 
 	/*
 	 * if not NB event or no NB, then no constraints
@@ -276,48 +275,33 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 	 * because of successive calls to x86_schedule_events() from
 	 * hw_perf_group_sched_in() without hw_perf_enable()
 	 */
-	for (i = 0; i < max; i++) {
-		/*
-		 * keep track of first free slot
-		 */
-		if (k == -1 && !nb->owners[i])
-			k = i;
+	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+		if (new == -1 || hwc->idx == idx)
+			/* assign free slot, prefer hwc->idx */
+			old = cmpxchg(nb->owners + idx, NULL, event);
+		else if (nb->owners[idx] == event)
+			/* event already present */
+			old = event;
+		else
+			continue;
+
+		if (old && old != event)
+			continue;
+
+		/* reassign to this slot */
+		if (new != -1)
+			cmpxchg(nb->owners + new, event, NULL);
+		new = idx;
 
 		/* already present, reuse */
-		if (nb->owners[i] == event)
-			goto done;
-	}
-	/*
-	 * not present, so grab a new slot
-	 * starting either at:
-	 */
-	if (hwc->idx != -1) {
-		/* previous assignment */
-		i = hwc->idx;
-	} else if (k != -1) {
-		/* start from free slot found */
-		i = k;
-	} else {
-		/*
-		 * event not found, no slot found in
-		 * first pass, try again from the
-		 * beginning
-		 */
-		i = 0;
-	}
-	j = i;
-	do {
-		old = cmpxchg(nb->owners+i, NULL, event);
-		if (!old)
+		if (old == event)
 			break;
-		if (++i == max)
-			i = 0;
-	} while (i != j);
-done:
-	if (!old)
-		return &nb->event_constraints[i];
-
-	return &emptyconstraint;
+	}
+
+	if (new == -1)
+		return &emptyconstraint;
+
+	return &nb->event_constraints[new];
 }
 
 static struct amd_nb *amd_alloc_nb(int cpu)
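
For readers following the reworked loop above, here is a small user-space model of the same single-pass, cmpxchg-based slot claiming (grab a free slot, prefer the previous index, release any slot claimed earlier in the pass, stop once the event's existing slot is found). GCC's __sync_val_compare_and_swap stands in for the kernel's cmpxchg(); this is a sketch of the algorithm, not kernel code.

/* User-space model of the single-pass NB slot claiming in
 * amd_get_event_constraints(); __sync_val_compare_and_swap stands in
 * for the kernel's cmpxchg(). Sketch only. */
#include <stdio.h>

#define NUM_COUNTERS 4

static void *owners[NUM_COUNTERS];	/* models nb->owners[] */

static int claim_slot(void *event, int prev_idx)
{
	void *old;
	int idx, new = -1;

	for (idx = 0; idx < NUM_COUNTERS; idx++) {
		if (new == -1 || prev_idx == idx)
			/* try to grab a free slot, prefer the previous index */
			old = __sync_val_compare_and_swap(&owners[idx], NULL, event);
		else if (owners[idx] == event)
			/* event already owns this slot */
			old = event;
		else
			continue;

		if (old && old != event)
			continue;

		/* moved to a better slot: release the one claimed earlier */
		if (new != -1)
			__sync_val_compare_and_swap(&owners[new], event, NULL);
		new = idx;

		if (old == event)	/* reused an existing slot, done */
			break;
	}
	return new;	/* -1 means no NB counter available */
}

int main(void)
{
	int e1 = 1, e2 = 2;

	printf("event1 -> slot %d\n", claim_slot(&e1, -1));
	printf("event2 -> slot %d\n", claim_slot(&e2, -1));
	printf("event1 again -> slot %d\n", claim_slot(&e1, 0));
	return 0;
}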

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [tip:perf/core] perf/x86/amd: Generalize northbridge constraints code for family 15h
  2013-02-06 17:26 ` [PATCH 2/6] perf, amd: Generalize northbridge constraints code for family 15h Jacob Shin
@ 2013-02-06 20:29   ` tip-bot for Robert Richter
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Robert Richter @ 2013-02-06 20:29 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, eranian, paulus, hpa, mingo, a.p.zijlstra, acme,
	jolsa, jacob.shin, tglx, rric

Commit-ID:  4dd4c2ae555d8a91e8c5bf1cd56807a35764436a
Gitweb:     http://git.kernel.org/tip/4dd4c2ae555d8a91e8c5bf1cd56807a35764436a
Author:     Robert Richter <rric@kernel.org>
AuthorDate: Wed, 6 Feb 2013 11:26:25 -0600
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 6 Feb 2013 19:45:23 +0100

perf/x86/amd: Generalize northbridge constraints code for family 15h

Generalize northbridge constraints code for family 10h so that
later we can reuse the same code path with other AMD processor
families that have the same northbridge event constraints.

Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Stephane Eranian <eranian@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1360171589-6381-3-git-send-email-jacob.shin@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/perf_event_amd.c | 43 +++++++++++++++++++++---------------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index e7963c7..f8c9dfb 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -188,20 +188,13 @@ static inline int amd_has_nb(struct cpu_hw_events *cpuc)
 	return nb && nb->nb_id != -1;
 }
 
-static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
-				      struct perf_event *event)
+static void __amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
+					   struct perf_event *event)
 {
-	struct hw_perf_event *hwc = &event->hw;
 	struct amd_nb *nb = cpuc->amd_nb;
 	int i;
 
 	/*
-	 * only care about NB events
-	 */
-	if (!(amd_has_nb(cpuc) && amd_is_nb_event(hwc)))
-		return;
-
-	/*
 	 * need to scan whole list because event may not have
 	 * been assigned during scheduling
 	 *
@@ -247,12 +240,13 @@ static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
   *
   * Given that resources are allocated (cmpxchg), they must be
   * eventually freed for others to use. This is accomplished by
-  * calling amd_put_event_constraints().
+  * calling __amd_put_nb_event_constraints()
   *
   * Non NB events are not impacted by this restriction.
   */
 static struct event_constraint *
-amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
+__amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event,
+			       struct event_constraint *c)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	struct amd_nb *nb = cpuc->amd_nb;
@@ -260,12 +254,6 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 	int idx, new = -1;
 
 	/*
-	 * if not NB event or no NB, then no constraints
-	 */
-	if (!(amd_has_nb(cpuc) && amd_is_nb_event(hwc)))
-		return &unconstrained;
-
-	/*
 	 * detect if already present, if so reuse
 	 *
 	 * cannot merge with actual allocation
@@ -275,7 +263,7 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 	 * because of successive calls to x86_schedule_events() from
 	 * hw_perf_group_sched_in() without hw_perf_enable()
 	 */
-	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+	for_each_set_bit(idx, c->idxmsk, x86_pmu.num_counters) {
 		if (new == -1 || hwc->idx == idx)
 			/* assign free slot, prefer hwc->idx */
 			old = cmpxchg(nb->owners + idx, NULL, event);
@@ -391,6 +379,25 @@ static void amd_pmu_cpu_dead(int cpu)
 	}
 }
 
+static struct event_constraint *
+amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
+{
+	/*
+	 * if not NB event or no NB, then no constraints
+	 */
+	if (!(amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)))
+		return &unconstrained;
+
+	return __amd_get_nb_event_constraints(cpuc, event, &unconstrained);
+}
+
+static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
+				      struct perf_event *event)
+{
+	if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw))
+		__amd_put_nb_event_constraints(cpuc, event);
+}
+
 PMU_FORMAT_ATTR(event,	"config:0-7,32-35");
 PMU_FORMAT_ATTR(umask,	"config:8-15"	);
 PMU_FORMAT_ATTR(edge,	"config:18"	);
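
The practical effect of passing a constraint down is that the slot search now only visits counters set in the constraint's index mask (via for_each_set_bit) rather than 0..num_counters-1. A trivial user-space sketch of that restriction, with a plain unsigned long standing in for the kernel bitmap, is below; the 0x3C0 mask mirrors the fam15h NB constraint introduced later in this series.

/* Sketch: walk only the counter indexes permitted by a constraint
 * mask, as the generalized __amd_get_nb_event_constraints() now does.
 * A plain unsigned long stands in for the kernel's idxmsk bitmap. */
#include <stdio.h>

int main(void)
{
	unsigned long idxmsk = 0x3C0;	/* e.g. NB counters 6-9 on fam15h */
	int num_counters = 10;		/* 6 core + 4 NB */
	int idx;

	for (idx = 0; idx < num_counters; idx++) {
		if (!(idxmsk & (1UL << idx)))
			continue;	/* counter not allowed by constraint */
		printf("may use counter %d\n", idx);
	}
	return 0;
}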

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [tip:perf/core] perf/x86/amd: Use proper naming scheme for AMD bit field definitions
  2013-02-06 17:26 ` [PATCH 3/6] perf, amd: Use proper naming scheme for AMD bit field definitions Jacob Shin
@ 2013-02-06 20:30   ` tip-bot for Jacob Shin
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Jacob Shin @ 2013-02-06 20:30 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, paulus, eranian, hpa, mingo, a.p.zijlstra, acme,
	jolsa, jacob.shin, tglx

Commit-ID:  9f19010af8c651879ac2c36f1a808a3a4419cd40
Gitweb:     http://git.kernel.org/tip/9f19010af8c651879ac2c36f1a808a3a4419cd40
Author:     Jacob Shin <jacob.shin@amd.com>
AuthorDate: Wed, 6 Feb 2013 11:26:26 -0600
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 6 Feb 2013 19:45:23 +0100

perf/x86/amd: Use proper naming scheme for AMD bit field definitions

Update these AMD bit field names to be consistent with the naming
convention followed by the rest of the file.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1360171589-6381-4-git-send-email-jacob.shin@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/perf_event.h    | 4 ++--
 arch/x86/kernel/cpu/perf_event_amd.c | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 4fabcdf..2234eaaec 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -29,8 +29,8 @@
 #define ARCH_PERFMON_EVENTSEL_INV			(1ULL << 23)
 #define ARCH_PERFMON_EVENTSEL_CMASK			0xFF000000ULL
 
-#define AMD_PERFMON_EVENTSEL_GUESTONLY			(1ULL << 40)
-#define AMD_PERFMON_EVENTSEL_HOSTONLY			(1ULL << 41)
+#define AMD64_EVENTSEL_GUESTONLY			(1ULL << 40)
+#define AMD64_EVENTSEL_HOSTONLY				(1ULL << 41)
 
 #define AMD64_EVENTSEL_EVENT	\
 	(ARCH_PERFMON_EVENTSEL_EVENT | (0x0FULL << 32))
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index f8c9dfb..aea8c20 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -156,9 +156,9 @@ static int amd_pmu_hw_config(struct perf_event *event)
 		event->hw.config &= ~(ARCH_PERFMON_EVENTSEL_USR |
 				      ARCH_PERFMON_EVENTSEL_OS);
 	else if (event->attr.exclude_host)
-		event->hw.config |= AMD_PERFMON_EVENTSEL_GUESTONLY;
+		event->hw.config |= AMD64_EVENTSEL_GUESTONLY;
 	else if (event->attr.exclude_guest)
-		event->hw.config |= AMD_PERFMON_EVENTSEL_HOSTONLY;
+		event->hw.config |= AMD64_EVENTSEL_HOSTONLY;
 
 	if (event->attr.type != PERF_TYPE_RAW)
 		return 0;
@@ -336,7 +336,7 @@ static void amd_pmu_cpu_starting(int cpu)
 	struct amd_nb *nb;
 	int i, nb_id;
 
-	cpuc->perf_ctr_virt_mask = AMD_PERFMON_EVENTSEL_HOSTONLY;
+	cpuc->perf_ctr_virt_mask = AMD64_EVENTSEL_HOSTONLY;
 
 	if (boot_cpu_data.x86_max_cores < 2)
 		return;
@@ -669,7 +669,7 @@ void amd_pmu_disable_virt(void)
 	 * SVM is disabled the Guest-only bits still gets set and the counter
 	 * will not count anything.
 	 */
-	cpuc->perf_ctr_virt_mask = AMD_PERFMON_EVENTSEL_HOSTONLY;
+	cpuc->perf_ctr_virt_mask = AMD64_EVENTSEL_HOSTONLY;
 
 	/* Reload all events */
 	x86_pmu_disable_all();

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [tip:perf/core] perf/x86: Move MSR address offset calculation to architecture specific files
  2013-02-06 17:26 ` [PATCH 4/6] perf, x86: Move MSR address offset calculation to architecture specific files Jacob Shin
@ 2013-02-06 20:31   ` tip-bot for Jacob Shin
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Jacob Shin @ 2013-02-06 20:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, eranian, paulus, hpa, mingo, a.p.zijlstra, acme,
	jolsa, jacob.shin, tglx

Commit-ID:  4c1fd17a1cb32bc4f429c7a5ff9a91a3bffdb8fa
Gitweb:     http://git.kernel.org/tip/4c1fd17a1cb32bc4f429c7a5ff9a91a3bffdb8fa
Author:     Jacob Shin <jacob.shin@amd.com>
AuthorDate: Wed, 6 Feb 2013 11:26:27 -0600
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 6 Feb 2013 19:45:24 +0100

perf/x86: Move MSR address offset calculation to architecture specific files

Move the counter-index-to-MSR-address-offset calculation to
architecture specific files. This prepares the way for
perf_event_amd to enable counter addresses that are not
contiguous -- for example AMD Family 15h processors have 6 core
performance counters starting at 0xc0010200 and 4 northbridge
performance counters starting at 0xc0010240.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Stephane Eranian <eranian@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1360171589-6381-5-git-send-email-jacob.shin@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/perf_event.h     | 21 +++++-------------
 arch/x86/kernel/cpu/perf_event_amd.c | 42 ++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 115c1ea..a7f06a9 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -325,6 +325,7 @@ struct x86_pmu {
 	int		(*schedule_events)(struct cpu_hw_events *cpuc, int n, int *assign);
 	unsigned	eventsel;
 	unsigned	perfctr;
+	int		(*addr_offset)(int index, bool eventsel);
 	u64		(*event_map)(int);
 	int		max_events;
 	int		num_counters;
@@ -446,28 +447,16 @@ extern u64 __read_mostly hw_cache_extra_regs
 
 u64 x86_perf_event_update(struct perf_event *event);
 
-static inline int x86_pmu_addr_offset(int index)
-{
-	int offset;
-
-	/* offset = X86_FEATURE_PERFCTR_CORE ? index << 1 : index */
-	alternative_io(ASM_NOP2,
-		       "shll $1, %%eax",
-		       X86_FEATURE_PERFCTR_CORE,
-		       "=a" (offset),
-		       "a"  (index));
-
-	return offset;
-}
-
 static inline unsigned int x86_pmu_config_addr(int index)
 {
-	return x86_pmu.eventsel + x86_pmu_addr_offset(index);
+	return x86_pmu.eventsel + (x86_pmu.addr_offset ?
+				   x86_pmu.addr_offset(index, true) : index);
 }
 
 static inline unsigned int x86_pmu_event_addr(int index)
 {
-	return x86_pmu.perfctr + x86_pmu_addr_offset(index);
+	return x86_pmu.perfctr + (x86_pmu.addr_offset ?
+				  x86_pmu.addr_offset(index, false) : index);
 }
 
 int x86_setup_perfctr(struct perf_event *event);
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index aea8c20..b60f31c 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -132,6 +132,47 @@ static u64 amd_pmu_event_map(int hw_event)
 	return amd_perfmon_event_map[hw_event];
 }
 
+/*
+ * Previously calculated offsets
+ */
+static unsigned int event_offsets[X86_PMC_IDX_MAX] __read_mostly;
+static unsigned int count_offsets[X86_PMC_IDX_MAX] __read_mostly;
+
+/*
+ * Legacy CPUs:
+ *   4 counters starting at 0xc0010000 each offset by 1
+ *
+ * CPUs with core performance counter extensions:
+ *   6 counters starting at 0xc0010200 each offset by 2
+ */
+static inline int amd_pmu_addr_offset(int index, bool eventsel)
+{
+	int offset;
+
+	if (!index)
+		return index;
+
+	if (eventsel)
+		offset = event_offsets[index];
+	else
+		offset = count_offsets[index];
+
+	if (offset)
+		return offset;
+
+	if (!cpu_has_perfctr_core)
+		offset = index;
+	else
+		offset = index << 1;
+
+	if (eventsel)
+		event_offsets[index] = offset;
+	else
+		count_offsets[index] = offset;
+
+	return offset;
+}
+
 static int amd_pmu_hw_config(struct perf_event *event)
 {
 	int ret;
@@ -578,6 +619,7 @@ static __initconst const struct x86_pmu amd_pmu = {
 	.schedule_events	= x86_schedule_events,
 	.eventsel		= MSR_K7_EVNTSEL0,
 	.perfctr		= MSR_K7_PERFCTR0,
+	.addr_offset            = amd_pmu_addr_offset,
 	.event_map		= amd_pmu_event_map,
 	.max_events		= ARRAY_SIZE(amd_perfmon_event_map),
 	.num_counters		= AMD64_NUM_COUNTERS,
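
As a sanity check of the layout described above, the MSR addresses this hook produces can be reproduced in user space: legacy counters step by 1 from 0xc0010000/0xc0010001, while the core-extension counters step by 2 from 0xc0010200/0xc0010201 (PERF_CTL/PERF_CTR interleaved). The sketch below only mirrors that arithmetic and assumes the core performance counter extensions are present; it is an illustration of the layout, not kernel code.

/* Reproduce the eventsel/perfctr MSR addresses that
 * x86_pmu_config_addr()/x86_pmu_event_addr() end up with once
 * amd_pmu_addr_offset() is in place. Illustration only. */
#include <stdio.h>

#define MSR_F15H_PERF_CTL	0xc0010200
#define MSR_F15H_PERF_CTR	0xc0010201

static unsigned int addr_offset(int index, int has_perfctr_core)
{
	if (!index)
		return 0;
	return has_perfctr_core ? index << 1 : index;
}

int main(void)
{
	int i;

	for (i = 0; i < 6; i++)
		printf("fam15h counter %d: ctl=0x%x ctr=0x%x\n", i,
		       MSR_F15H_PERF_CTL + addr_offset(i, 1),
		       MSR_F15H_PERF_CTR + addr_offset(i, 1));
	return 0;
}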

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [tip:perf/core] perf/x86: Allow for architecture specific RDPMC indexes
  2013-02-06 17:26 ` [PATCH 5/6] perf, x86: Allow for architecture specific RDPMC indexes Jacob Shin
@ 2013-02-06 20:32   ` tip-bot for Jacob Shin
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Jacob Shin @ 2013-02-06 20:32 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, paulus, eranian, hpa, mingo, a.p.zijlstra, acme,
	jolsa, jacob.shin, tglx

Commit-ID:  0fbdad078a70ed72248c3d30fe32e45e83be00d1
Gitweb:     http://git.kernel.org/tip/0fbdad078a70ed72248c3d30fe32e45e83be00d1
Author:     Jacob Shin <jacob.shin@amd.com>
AuthorDate: Wed, 6 Feb 2013 11:26:28 -0600
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 6 Feb 2013 19:45:24 +0100

perf/x86: Allow for architecture specific RDPMC indexes

Similar to config_base and event_base, allow architecture
specific RDPMC ECX values.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1360171589-6381-6-git-send-email-jacob.shin@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/perf_event.c     | 2 +-
 arch/x86/kernel/cpu/perf_event.h     | 6 ++++++
 arch/x86/kernel/cpu/perf_event_amd.c | 6 ++++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index c0df5ed2..bf0f01a 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -829,7 +829,7 @@ static inline void x86_assign_hw_event(struct perf_event *event,
 	} else {
 		hwc->config_base = x86_pmu_config_addr(hwc->idx);
 		hwc->event_base  = x86_pmu_event_addr(hwc->idx);
-		hwc->event_base_rdpmc = hwc->idx;
+		hwc->event_base_rdpmc = x86_pmu_rdpmc_index(hwc->idx);
 	}
 }
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index a7f06a9..7f5c75c 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -326,6 +326,7 @@ struct x86_pmu {
 	unsigned	eventsel;
 	unsigned	perfctr;
 	int		(*addr_offset)(int index, bool eventsel);
+	int		(*rdpmc_index)(int index);
 	u64		(*event_map)(int);
 	int		max_events;
 	int		num_counters;
@@ -459,6 +460,11 @@ static inline unsigned int x86_pmu_event_addr(int index)
 				  x86_pmu.addr_offset(index, false) : index);
 }
 
+static inline int x86_pmu_rdpmc_index(int index)
+{
+	return x86_pmu.rdpmc_index ? x86_pmu.rdpmc_index(index) : index;
+}
+
 int x86_setup_perfctr(struct perf_event *event);
 
 int x86_pmu_hw_config(struct perf_event *event);
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index b60f31c..05462f0 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -173,6 +173,11 @@ static inline int amd_pmu_addr_offset(int index, bool eventsel)
 	return offset;
 }
 
+static inline int amd_pmu_rdpmc_index(int index)
+{
+	return index;
+}
+
 static int amd_pmu_hw_config(struct perf_event *event)
 {
 	int ret;
@@ -620,6 +625,7 @@ static __initconst const struct x86_pmu amd_pmu = {
 	.eventsel		= MSR_K7_EVNTSEL0,
 	.perfctr		= MSR_K7_PERFCTR0,
 	.addr_offset            = amd_pmu_addr_offset,
+	.rdpmc_index		= amd_pmu_rdpmc_index,
 	.event_map		= amd_pmu_event_map,
 	.max_events		= ARRAY_SIZE(amd_perfmon_event_map),
 	.num_counters		= AMD64_NUM_COUNTERS,
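
For completeness, the ECX index installed into hwc->event_base_rdpmc here is what a user-space RDPMC would pass in ECX (patch 6/6 maps the NB counters to counter number + 6). A hedged sketch of the calling convention follows; actually executing RDPMC from ring 3 requires CR4.PCE to be set, which perf normally arranges only for self-monitoring, and without it the instruction faults, so treat this purely as an illustration.

/* Sketch: read a performance counter with RDPMC given its ECX index
 * (for fam15h NB counters the follow-on patch sets this to
 * NB counter number + 6). Assumes user-space RDPMC is permitted. */
#include <stdint.h>
#include <stdio.h>

static inline uint64_t rdpmc(uint32_t ecx)
{
	uint32_t lo, hi;

	__asm__ volatile("rdpmc" : "=a" (lo), "=d" (hi) : "c" (ecx));
	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	uint32_t idx = 6;	/* first NB counter on fam15h, per patch 6/6 */

	printf("raw count: %llu\n", (unsigned long long)rdpmc(idx));
	return 0;
}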

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/6] perf, amd: Enable northbridge performance counters on AMD family 15h
  2013-02-06 17:26 ` [PATCH 6/6] perf, amd: Enable northbridge performance counters on AMD family 15h Jacob Shin
@ 2013-02-07 17:57   ` Jacob Shin
  2013-02-07 17:58     ` Stephane Eranian
  2013-02-07 19:09     ` Ingo Molnar
  2013-02-08 11:16   ` Stephane Eranian
  2013-02-18  8:30   ` [tip:perf/core] perf/x86/amd: " tip-bot for Jacob Shin
  2 siblings, 2 replies; 21+ messages in thread
From: Jacob Shin @ 2013-02-07 17:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Peter Zijlstra
  Cc: Paul Mackerras, Arnaldo Carvalho de Melo, Stephane Eranian,
	Jiri Olsa, linux-kernel

On Wed, Feb 06, 2013 at 11:26:29AM -0600, Jacob Shin wrote:
> On AMD family 15h processors, there are 4 new performance counters
> (in addition to 6 core performance counters) that can be used for
> counting northbridge events (i.e. DRAM accesses). Their bit fields are
> almost identical to the core performance counters. However, unlike the
> core performance counters, these MSRs are shared between multiple
> cores (that share the same northbridge). We will reuse the same code
> path as the existing family 10h northbridge event constraints handler
> logic to enforce this sharing.
> 
> Signed-off-by: Jacob Shin <jacob.shin@amd.com>

Hi Ingo, could you please apply this one to tip as well? I received
tip-bot emails for all other patches in this series except for this
last one 6/6.

Or was that intentional? If so, what other changes are required/
recommended?

Thanks!

-Jacob

> ---
>  arch/x86/include/asm/cpufeature.h     |    2 +
>  arch/x86/include/asm/perf_event.h     |    9 ++
>  arch/x86/include/uapi/asm/msr-index.h |    2 +
>  arch/x86/kernel/cpu/perf_event_amd.c  |  171 +++++++++++++++++++++++++++++----
>  4 files changed, 164 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index 2d9075e..93fe929 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -167,6 +167,7 @@
>  #define X86_FEATURE_TBM		(6*32+21) /* trailing bit manipulations */
>  #define X86_FEATURE_TOPOEXT	(6*32+22) /* topology extensions CPUID leafs */
>  #define X86_FEATURE_PERFCTR_CORE (6*32+23) /* core performance counter extensions */
> +#define X86_FEATURE_PERFCTR_NB  (6*32+24) /* NB performance counter extensions */
>  
>  /*
>   * Auxiliary flags: Linux defined - For features scattered in various
> @@ -309,6 +310,7 @@ extern const char * const x86_power_flags[32];
>  #define cpu_has_hypervisor	boot_cpu_has(X86_FEATURE_HYPERVISOR)
>  #define cpu_has_pclmulqdq	boot_cpu_has(X86_FEATURE_PCLMULQDQ)
>  #define cpu_has_perfctr_core	boot_cpu_has(X86_FEATURE_PERFCTR_CORE)
> +#define cpu_has_perfctr_nb	boot_cpu_has(X86_FEATURE_PERFCTR_NB)
>  #define cpu_has_cx8		boot_cpu_has(X86_FEATURE_CX8)
>  #define cpu_has_cx16		boot_cpu_has(X86_FEATURE_CX16)
>  #define cpu_has_eager_fpu	boot_cpu_has(X86_FEATURE_EAGER_FPU)
> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
> index 2234eaaec..57cb634 100644
> --- a/arch/x86/include/asm/perf_event.h
> +++ b/arch/x86/include/asm/perf_event.h
> @@ -29,9 +29,14 @@
>  #define ARCH_PERFMON_EVENTSEL_INV			(1ULL << 23)
>  #define ARCH_PERFMON_EVENTSEL_CMASK			0xFF000000ULL
>  
> +#define AMD64_EVENTSEL_INT_CORE_ENABLE			(1ULL << 36)
>  #define AMD64_EVENTSEL_GUESTONLY			(1ULL << 40)
>  #define AMD64_EVENTSEL_HOSTONLY				(1ULL << 41)
>  
> +#define AMD64_EVENTSEL_INT_CORE_SEL_SHIFT		37
> +#define AMD64_EVENTSEL_INT_CORE_SEL_MASK		\
> +	(0xFULL << AMD64_EVENTSEL_INT_CORE_SEL_SHIFT)
> +
>  #define AMD64_EVENTSEL_EVENT	\
>  	(ARCH_PERFMON_EVENTSEL_EVENT | (0x0FULL << 32))
>  #define INTEL_ARCH_EVENT_MASK	\
> @@ -46,8 +51,12 @@
>  #define AMD64_RAW_EVENT_MASK		\
>  	(X86_RAW_EVENT_MASK          |  \
>  	 AMD64_EVENTSEL_EVENT)
> +#define AMD64_RAW_EVENT_MASK_NB		\
> +	(AMD64_EVENTSEL_EVENT        |  \
> +	 ARCH_PERFMON_EVENTSEL_UMASK)
>  #define AMD64_NUM_COUNTERS				4
>  #define AMD64_NUM_COUNTERS_CORE				6
> +#define AMD64_NUM_COUNTERS_NB				4
>  
>  #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL		0x3c
>  #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK		(0x00 << 8)
> diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
> index 1031604..27c05d2 100644
> --- a/arch/x86/include/uapi/asm/msr-index.h
> +++ b/arch/x86/include/uapi/asm/msr-index.h
> @@ -195,6 +195,8 @@
>  /* Fam 15h MSRs */
>  #define MSR_F15H_PERF_CTL		0xc0010200
>  #define MSR_F15H_PERF_CTR		0xc0010201
> +#define MSR_F15H_NB_PERF_CTL		0xc0010240
> +#define MSR_F15H_NB_PERF_CTR		0xc0010241
>  
>  /* Fam 10h MSRs */
>  #define MSR_FAM10H_MMIO_CONF_BASE	0xc0010058
> diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
> index 05462f0..dfdab42 100644
> --- a/arch/x86/kernel/cpu/perf_event_amd.c
> +++ b/arch/x86/kernel/cpu/perf_event_amd.c
> @@ -132,11 +132,14 @@ static u64 amd_pmu_event_map(int hw_event)
>  	return amd_perfmon_event_map[hw_event];
>  }
>  
> +static struct event_constraint *amd_nb_event_constraint;
> +
>  /*
>   * Previously calculated offsets
>   */
>  static unsigned int event_offsets[X86_PMC_IDX_MAX] __read_mostly;
>  static unsigned int count_offsets[X86_PMC_IDX_MAX] __read_mostly;
> +static unsigned int rdpmc_indexes[X86_PMC_IDX_MAX] __read_mostly;
>  
>  /*
>   * Legacy CPUs:
> @@ -144,10 +147,14 @@ static unsigned int count_offsets[X86_PMC_IDX_MAX] __read_mostly;
>   *
>   * CPUs with core performance counter extensions:
>   *   6 counters starting at 0xc0010200 each offset by 2
> + *
> + * CPUs with north bridge performance counter extensions:
> + *   4 additional counters starting at 0xc0010240 each offset by 2
> + *   (indexed right above either one of the above core counters)
>   */
>  static inline int amd_pmu_addr_offset(int index, bool eventsel)
>  {
> -	int offset;
> +	int offset, first, base;
>  
>  	if (!index)
>  		return index;
> @@ -160,7 +167,23 @@ static inline int amd_pmu_addr_offset(int index, bool eventsel)
>  	if (offset)
>  		return offset;
>  
> -	if (!cpu_has_perfctr_core)
> +	if (amd_nb_event_constraint &&
> +	    test_bit(index, amd_nb_event_constraint->idxmsk)) {
> +		/*
> +		 * calculate the offset of NB counters with respect to
> +		 * base eventsel or perfctr
> +		 */
> +
> +		first = find_first_bit(amd_nb_event_constraint->idxmsk,
> +				       X86_PMC_IDX_MAX);
> +
> +		if (eventsel)
> +			base = MSR_F15H_NB_PERF_CTL - x86_pmu.eventsel;
> +		else
> +			base = MSR_F15H_NB_PERF_CTR - x86_pmu.perfctr;
> +
> +		offset = base + ((index - first) << 1);
> +	} else if (!cpu_has_perfctr_core)
>  		offset = index;
>  	else
>  		offset = index << 1;
> @@ -175,24 +198,36 @@ static inline int amd_pmu_addr_offset(int index, bool eventsel)
>  
>  static inline int amd_pmu_rdpmc_index(int index)
>  {
> -	return index;
> -}
> +	int ret, first;
>  
> -static int amd_pmu_hw_config(struct perf_event *event)
> -{
> -	int ret;
> +	if (!index)
> +		return index;
>  
> -	/* pass precise event sampling to ibs: */
> -	if (event->attr.precise_ip && get_ibs_caps())
> -		return -ENOENT;
> +	ret = rdpmc_indexes[index];
>  
> -	ret = x86_pmu_hw_config(event);
>  	if (ret)
>  		return ret;
>  
> -	if (has_branch_stack(event))
> -		return -EOPNOTSUPP;
> +	if (amd_nb_event_constraint &&
> +	    test_bit(index, amd_nb_event_constraint->idxmsk)) {
> +		/*
> +		 * according to the manual, ECX value of the NB counters is
> +		 * the index of the NB counter (0, 1, 2 or 3) plus 6
> +		 */
> +
> +		first = find_first_bit(amd_nb_event_constraint->idxmsk,
> +				       X86_PMC_IDX_MAX);
> +		ret = index - first + 6;
> +	} else
> +		ret = index;
> +
> +	rdpmc_indexes[index] = ret;
>  
> +	return ret;
> +}
> +
> +static int amd_core_hw_config(struct perf_event *event)
> +{
>  	if (event->attr.exclude_host && event->attr.exclude_guest)
>  		/*
>  		 * When HO == GO == 1 the hardware treats that as GO == HO == 0
> @@ -206,10 +241,33 @@ static int amd_pmu_hw_config(struct perf_event *event)
>  	else if (event->attr.exclude_guest)
>  		event->hw.config |= AMD64_EVENTSEL_HOSTONLY;
>  
> -	if (event->attr.type != PERF_TYPE_RAW)
> -		return 0;
> +	return 0;
> +}
>  
> -	event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
> +/*
> + * NB counters do not support the following event select bits:
> + *   Host/Guest only
> + *   Counter mask
> + *   Invert counter mask
> + *   Edge detect
> + *   OS/User mode
> + */
> +static int amd_nb_hw_config(struct perf_event *event)
> +{
> +	/* for NB, we only allow system wide counting mode */
> +	if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
> +		return -EINVAL;
> +
> +	if (event->attr.exclude_user || event->attr.exclude_kernel ||
> +	    event->attr.exclude_host || event->attr.exclude_guest)
> +		return -EINVAL;
> +
> +	event->hw.config &= ~(ARCH_PERFMON_EVENTSEL_USR |
> +			      ARCH_PERFMON_EVENTSEL_OS);
> +
> +	if (event->hw.config & ~(AMD64_RAW_EVENT_MASK_NB |
> +				 ARCH_PERFMON_EVENTSEL_INT))
> +		return -EINVAL;
>  
>  	return 0;
>  }
> @@ -227,6 +285,11 @@ static inline int amd_is_nb_event(struct hw_perf_event *hwc)
>  	return (hwc->config & 0xe0) == 0xe0;
>  }
>  
> +static inline int amd_is_perfctr_nb_event(struct hw_perf_event *hwc)
> +{
> +	return amd_nb_event_constraint && amd_is_nb_event(hwc);
> +}
> +
>  static inline int amd_has_nb(struct cpu_hw_events *cpuc)
>  {
>  	struct amd_nb *nb = cpuc->amd_nb;
> @@ -234,6 +297,30 @@ static inline int amd_has_nb(struct cpu_hw_events *cpuc)
>  	return nb && nb->nb_id != -1;
>  }
>  
> +static int amd_pmu_hw_config(struct perf_event *event)
> +{
> +	int ret;
> +
> +	/* pass precise event sampling to ibs: */
> +	if (event->attr.precise_ip && get_ibs_caps())
> +		return -ENOENT;
> +
> +	if (has_branch_stack(event))
> +		return -EOPNOTSUPP;
> +
> +	ret = x86_pmu_hw_config(event);
> +	if (ret)
> +		return ret;
> +
> +	if (event->attr.type == PERF_TYPE_RAW)
> +		event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
> +
> +	if (amd_is_perfctr_nb_event(&event->hw))
> +		return amd_nb_hw_config(event);
> +
> +	return amd_core_hw_config(event);
> +}
> +
>  static void __amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
>  					   struct perf_event *event)
>  {
> @@ -254,6 +341,19 @@ static void __amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
>  	}
>  }
>  
> +static void amd_nb_interrupt_hw_config(struct hw_perf_event *hwc)
> +{
> +	int core_id = cpu_data(smp_processor_id()).cpu_core_id;
> +
> +	/* deliver interrupts only to this core */
> +	if (hwc->config & ARCH_PERFMON_EVENTSEL_INT) {
> +		hwc->config |= AMD64_EVENTSEL_INT_CORE_ENABLE;
> +		hwc->config &= ~AMD64_EVENTSEL_INT_CORE_SEL_MASK;
> +		hwc->config |= (u64)(core_id) <<
> +			AMD64_EVENTSEL_INT_CORE_SEL_SHIFT;
> +	}
> +}
> +
>   /*
>    * AMD64 NorthBridge events need special treatment because
>    * counter access needs to be synchronized across all cores
> @@ -299,6 +399,12 @@ __amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *ev
>  	struct perf_event *old;
>  	int idx, new = -1;
>  
> +	if (!c)
> +		c = &unconstrained;
> +
> +	if (cpuc->is_fake)
> +		return c;
> +
>  	/*
>  	 * detect if already present, if so reuse
>  	 *
> @@ -335,6 +441,9 @@ __amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *ev
>  	if (new == -1)
>  		return &emptyconstraint;
>  
> +	if (amd_is_perfctr_nb_event(hwc))
> +		amd_nb_interrupt_hw_config(hwc);
> +
>  	return &nb->event_constraints[new];
>  }
>  
> @@ -434,7 +543,8 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
>  	if (!(amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)))
>  		return &unconstrained;
>  
> -	return __amd_get_nb_event_constraints(cpuc, event, &unconstrained);
> +	return __amd_get_nb_event_constraints(cpuc, event,
> +					      amd_nb_event_constraint);
>  }
>  
>  static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
> @@ -533,6 +643,9 @@ static struct event_constraint amd_f15_PMC30 = EVENT_CONSTRAINT_OVERLAP(0, 0x09,
>  static struct event_constraint amd_f15_PMC50 = EVENT_CONSTRAINT(0, 0x3F, 0);
>  static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
>  
> +static struct event_constraint amd_NBPMC96 = EVENT_CONSTRAINT(0, 0x3C0, 0);
> +static struct event_constraint amd_NBPMC74 = EVENT_CONSTRAINT(0, 0xF0, 0);
> +
>  static struct event_constraint *
>  amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *event)
>  {
> @@ -598,8 +711,8 @@ amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *ev
>  			return &amd_f15_PMC20;
>  		}
>  	case AMD_EVENT_NB:
> -		/* not yet implemented */
> -		return &emptyconstraint;
> +		return __amd_get_nb_event_constraints(cpuc, event,
> +						      amd_nb_event_constraint);
>  	default:
>  		return &emptyconstraint;
>  	}
> @@ -647,7 +760,7 @@ static __initconst const struct x86_pmu amd_pmu = {
>  
>  static int setup_event_constraints(void)
>  {
> -	if (boot_cpu_data.x86 >= 0x15)
> +	if (boot_cpu_data.x86 == 0x15)
>  		x86_pmu.get_event_constraints = amd_get_event_constraints_f15h;
>  	return 0;
>  }
> @@ -677,6 +790,23 @@ static int setup_perfctr_core(void)
>  	return 0;
>  }
>  
> +static int setup_perfctr_nb(void)
> +{
> +	if (!cpu_has_perfctr_nb)
> +		return -ENODEV;
> +
> +	x86_pmu.num_counters += AMD64_NUM_COUNTERS_NB;
> +
> +	if (cpu_has_perfctr_core)
> +		amd_nb_event_constraint = &amd_NBPMC96;
> +	else
> +		amd_nb_event_constraint = &amd_NBPMC74;
> +
> +	printk(KERN_INFO "perf: AMD northbridge performance counters detected\n");
> +
> +	return 0;
> +}
> +
>  __init int amd_pmu_init(void)
>  {
>  	/* Performance-monitoring supported from K7 and later: */
> @@ -687,6 +817,7 @@ __init int amd_pmu_init(void)
>  
>  	setup_event_constraints();
>  	setup_perfctr_core();
> +	setup_perfctr_nb();
>  
>  	/* Events are common for all AMDs */
>  	memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
> -- 
> 1.7.9.5
> 
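
Since amd_nb_hw_config() in the quoted patch rejects sampling, per-task attach and all exclude_* bits, the only supported usage is system-wide counting mode. A minimal perf_event_open() sketch under those constraints is below. The raw config value encodes MEMORY_CONTROLLER_REQUESTS (event select 0x1f0, umask 0xff, per the event tables earlier in the thread) following AMD64_RAW_EVENT_MASK_NB (event select bits 7:0 plus 35:32, umask bits 15:8); that particular encoding is an assumption here, so verify it against the BKDG before relying on the numbers.

/* Hedged sketch: count a fam15h NB event system-wide in pure counting
 * mode, matching the restrictions in amd_nb_hw_config() (no sampling,
 * no task attach, no exclude_* bits). The config value below is an
 * assumed example encoding of MEMORY_CONTROLLER_REQUESTS:ALL. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
			    int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
	struct perf_event_attr attr;
	uint64_t count;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_RAW;
	/* event select 0x1f0 -> bits 7:0 = 0xf0, bits 35:32 = 0x1; umask 0xff */
	attr.config = (1ULL << 32) | (0xffULL << 8) | 0xf0;
	attr.disabled = 1;
	/* no sampling period, no exclude_* bits: counting mode only */

	fd = perf_event_open(&attr, -1, 0, -1, 0);	/* pid -1: CPU-wide on CPU 0 */
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
	sleep(1);
	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

	if (read(fd, &count, sizeof(count)) == sizeof(count))
		printf("NB event count: %llu\n", (unsigned long long)count);

	close(fd);
	return 0;
}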


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/6] perf, amd: Enable northbridge performance counters on AMD family 15h
  2013-02-07 17:57   ` Jacob Shin
@ 2013-02-07 17:58     ` Stephane Eranian
  2013-02-07 19:09     ` Ingo Molnar
  1 sibling, 0 replies; 21+ messages in thread
From: Stephane Eranian @ 2013-02-07 17:58 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
	Jiri Olsa, LKML

On Thu, Feb 7, 2013 at 6:57 PM, Jacob Shin <jacob.shin@amd.com> wrote:
> On Wed, Feb 06, 2013 at 11:26:29AM -0600, Jacob Shin wrote:
>> On AMD family 15h processors, there are 4 new performance counters
>> (in addition to 6 core performance counters) that can be used for
>> counting northbridge events (i.e. DRAM accesses). Their bit fields are
>> almost identical to the core performance counters. However, unlike the
>> core performance counters, these MSRs are shared between multiple
>> cores (that share the same northbridge). We will reuse the same code
>> path as the existing family 10h northbridge event constraints handler
>> logic to enforce this sharing.
>>
>> Signed-off-by: Jacob Shin <jacob.shin@amd.com>
>
> Hi Ingo, could you please apply this one to tip as well? I recieved
> tip-bot emails for all other patches in this series except for this
> last one 6/6.
>
> Or was that intentional? If so, what other changes are required/
> recommended?
>
I am testing this patch right now. Should be done by tomorrow.

> Thanks!
>
> -Jacob
>
>> ---
>>  arch/x86/include/asm/cpufeature.h     |    2 +
>>  arch/x86/include/asm/perf_event.h     |    9 ++
>>  arch/x86/include/uapi/asm/msr-index.h |    2 +
>>  arch/x86/kernel/cpu/perf_event_amd.c  |  171 +++++++++++++++++++++++++++++----
>>  4 files changed, 164 insertions(+), 20 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
>> index 2d9075e..93fe929 100644
>> --- a/arch/x86/include/asm/cpufeature.h
>> +++ b/arch/x86/include/asm/cpufeature.h
>> @@ -167,6 +167,7 @@
>>  #define X86_FEATURE_TBM              (6*32+21) /* trailing bit manipulations */
>>  #define X86_FEATURE_TOPOEXT  (6*32+22) /* topology extensions CPUID leafs */
>>  #define X86_FEATURE_PERFCTR_CORE (6*32+23) /* core performance counter extensions */
>> +#define X86_FEATURE_PERFCTR_NB  (6*32+24) /* NB performance counter extensions */
>>
>>  /*
>>   * Auxiliary flags: Linux defined - For features scattered in various
>> @@ -309,6 +310,7 @@ extern const char * const x86_power_flags[32];
>>  #define cpu_has_hypervisor   boot_cpu_has(X86_FEATURE_HYPERVISOR)
>>  #define cpu_has_pclmulqdq    boot_cpu_has(X86_FEATURE_PCLMULQDQ)
>>  #define cpu_has_perfctr_core boot_cpu_has(X86_FEATURE_PERFCTR_CORE)
>> +#define cpu_has_perfctr_nb   boot_cpu_has(X86_FEATURE_PERFCTR_NB)
>>  #define cpu_has_cx8          boot_cpu_has(X86_FEATURE_CX8)
>>  #define cpu_has_cx16         boot_cpu_has(X86_FEATURE_CX16)
>>  #define cpu_has_eager_fpu    boot_cpu_has(X86_FEATURE_EAGER_FPU)
>> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
>> index 2234eaaec..57cb634 100644
>> --- a/arch/x86/include/asm/perf_event.h
>> +++ b/arch/x86/include/asm/perf_event.h
>> @@ -29,9 +29,14 @@
>>  #define ARCH_PERFMON_EVENTSEL_INV                    (1ULL << 23)
>>  #define ARCH_PERFMON_EVENTSEL_CMASK                  0xFF000000ULL
>>
>> +#define AMD64_EVENTSEL_INT_CORE_ENABLE                       (1ULL << 36)
>>  #define AMD64_EVENTSEL_GUESTONLY                     (1ULL << 40)
>>  #define AMD64_EVENTSEL_HOSTONLY                              (1ULL << 41)
>>
>> +#define AMD64_EVENTSEL_INT_CORE_SEL_SHIFT            37
>> +#define AMD64_EVENTSEL_INT_CORE_SEL_MASK             \
>> +     (0xFULL << AMD64_EVENTSEL_INT_CORE_SEL_SHIFT)
>> +
>>  #define AMD64_EVENTSEL_EVENT \
>>       (ARCH_PERFMON_EVENTSEL_EVENT | (0x0FULL << 32))
>>  #define INTEL_ARCH_EVENT_MASK        \
>> @@ -46,8 +51,12 @@
>>  #define AMD64_RAW_EVENT_MASK         \
>>       (X86_RAW_EVENT_MASK          |  \
>>        AMD64_EVENTSEL_EVENT)
>> +#define AMD64_RAW_EVENT_MASK_NB              \
>> +     (AMD64_EVENTSEL_EVENT        |  \
>> +      ARCH_PERFMON_EVENTSEL_UMASK)
>>  #define AMD64_NUM_COUNTERS                           4
>>  #define AMD64_NUM_COUNTERS_CORE                              6
>> +#define AMD64_NUM_COUNTERS_NB                                4
>>
>>  #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL                0x3c
>>  #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK              (0x00 << 8)
>> diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
>> index 1031604..27c05d2 100644
>> --- a/arch/x86/include/uapi/asm/msr-index.h
>> +++ b/arch/x86/include/uapi/asm/msr-index.h
>> @@ -195,6 +195,8 @@
>>  /* Fam 15h MSRs */
>>  #define MSR_F15H_PERF_CTL            0xc0010200
>>  #define MSR_F15H_PERF_CTR            0xc0010201
>> +#define MSR_F15H_NB_PERF_CTL         0xc0010240
>> +#define MSR_F15H_NB_PERF_CTR         0xc0010241
>>
>>  /* Fam 10h MSRs */
>>  #define MSR_FAM10H_MMIO_CONF_BASE    0xc0010058
>> diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
>> index 05462f0..dfdab42 100644
>> --- a/arch/x86/kernel/cpu/perf_event_amd.c
>> +++ b/arch/x86/kernel/cpu/perf_event_amd.c
>> @@ -132,11 +132,14 @@ static u64 amd_pmu_event_map(int hw_event)
>>       return amd_perfmon_event_map[hw_event];
>>  }
>>
>> +static struct event_constraint *amd_nb_event_constraint;
>> +
>>  /*
>>   * Previously calculated offsets
>>   */
>>  static unsigned int event_offsets[X86_PMC_IDX_MAX] __read_mostly;
>>  static unsigned int count_offsets[X86_PMC_IDX_MAX] __read_mostly;
>> +static unsigned int rdpmc_indexes[X86_PMC_IDX_MAX] __read_mostly;
>>
>>  /*
>>   * Legacy CPUs:
>> @@ -144,10 +147,14 @@ static unsigned int count_offsets[X86_PMC_IDX_MAX] __read_mostly;
>>   *
>>   * CPUs with core performance counter extensions:
>>   *   6 counters starting at 0xc0010200 each offset by 2
>> + *
>> + * CPUs with north bridge performance counter extensions:
>> + *   4 additional counters starting at 0xc0010240 each offset by 2
>> + *   (indexed right above either one of the above core counters)
>>   */
>>  static inline int amd_pmu_addr_offset(int index, bool eventsel)
>>  {
>> -     int offset;
>> +     int offset, first, base;
>>
>>       if (!index)
>>               return index;
>> @@ -160,7 +167,23 @@ static inline int amd_pmu_addr_offset(int index, bool eventsel)
>>       if (offset)
>>               return offset;
>>
>> -     if (!cpu_has_perfctr_core)
>> +     if (amd_nb_event_constraint &&
>> +         test_bit(index, amd_nb_event_constraint->idxmsk)) {
>> +             /*
>> +              * calculate the offset of NB counters with respect to
>> +              * base eventsel or perfctr
>> +              */
>> +
>> +             first = find_first_bit(amd_nb_event_constraint->idxmsk,
>> +                                    X86_PMC_IDX_MAX);
>> +
>> +             if (eventsel)
>> +                     base = MSR_F15H_NB_PERF_CTL - x86_pmu.eventsel;
>> +             else
>> +                     base = MSR_F15H_NB_PERF_CTR - x86_pmu.perfctr;
>> +
>> +             offset = base + ((index - first) << 1);
>> +     } else if (!cpu_has_perfctr_core)
>>               offset = index;
>>       else
>>               offset = index << 1;
>> @@ -175,24 +198,36 @@ static inline int amd_pmu_addr_offset(int index, bool eventsel)
>>
>>  static inline int amd_pmu_rdpmc_index(int index)
>>  {
>> -     return index;
>> -}
>> +     int ret, first;
>>
>> -static int amd_pmu_hw_config(struct perf_event *event)
>> -{
>> -     int ret;
>> +     if (!index)
>> +             return index;
>>
>> -     /* pass precise event sampling to ibs: */
>> -     if (event->attr.precise_ip && get_ibs_caps())
>> -             return -ENOENT;
>> +     ret = rdpmc_indexes[index];
>>
>> -     ret = x86_pmu_hw_config(event);
>>       if (ret)
>>               return ret;
>>
>> -     if (has_branch_stack(event))
>> -             return -EOPNOTSUPP;
>> +     if (amd_nb_event_constraint &&
>> +         test_bit(index, amd_nb_event_constraint->idxmsk)) {
>> +             /*
>> +              * according to the manual, ECX value of the NB counters is
>> +              * the index of the NB counter (0, 1, 2 or 3) plus 6
>> +              */
>> +
>> +             first = find_first_bit(amd_nb_event_constraint->idxmsk,
>> +                                    X86_PMC_IDX_MAX);
>> +             ret = index - first + 6;
>> +     } else
>> +             ret = index;
>> +
>> +     rdpmc_indexes[index] = ret;
>>
>> +     return ret;
>> +}
>> +
>> +static int amd_core_hw_config(struct perf_event *event)
>> +{
>>       if (event->attr.exclude_host && event->attr.exclude_guest)
>>               /*
>>                * When HO == GO == 1 the hardware treats that as GO == HO == 0
>> @@ -206,10 +241,33 @@ static int amd_pmu_hw_config(struct perf_event *event)
>>       else if (event->attr.exclude_guest)
>>               event->hw.config |= AMD64_EVENTSEL_HOSTONLY;
>>
>> -     if (event->attr.type != PERF_TYPE_RAW)
>> -             return 0;
>> +     return 0;
>> +}
>>
>> -     event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
>> +/*
>> + * NB counters do not support the following event select bits:
>> + *   Host/Guest only
>> + *   Counter mask
>> + *   Invert counter mask
>> + *   Edge detect
>> + *   OS/User mode
>> + */
>> +static int amd_nb_hw_config(struct perf_event *event)
>> +{
>> +     /* for NB, we only allow system wide counting mode */
>> +     if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
>> +             return -EINVAL;
>> +
>> +     if (event->attr.exclude_user || event->attr.exclude_kernel ||
>> +         event->attr.exclude_host || event->attr.exclude_guest)
>> +             return -EINVAL;
>> +
>> +     event->hw.config &= ~(ARCH_PERFMON_EVENTSEL_USR |
>> +                           ARCH_PERFMON_EVENTSEL_OS);
>> +
>> +     if (event->hw.config & ~(AMD64_RAW_EVENT_MASK_NB |
>> +                              ARCH_PERFMON_EVENTSEL_INT))
>> +             return -EINVAL;
>>
>>       return 0;
>>  }
>> @@ -227,6 +285,11 @@ static inline int amd_is_nb_event(struct hw_perf_event *hwc)
>>       return (hwc->config & 0xe0) == 0xe0;
>>  }
>>
>> +static inline int amd_is_perfctr_nb_event(struct hw_perf_event *hwc)
>> +{
>> +     return amd_nb_event_constraint && amd_is_nb_event(hwc);
>> +}
>> +
>>  static inline int amd_has_nb(struct cpu_hw_events *cpuc)
>>  {
>>       struct amd_nb *nb = cpuc->amd_nb;
>> @@ -234,6 +297,30 @@ static inline int amd_has_nb(struct cpu_hw_events *cpuc)
>>       return nb && nb->nb_id != -1;
>>  }
>>
>> +static int amd_pmu_hw_config(struct perf_event *event)
>> +{
>> +     int ret;
>> +
>> +     /* pass precise event sampling to ibs: */
>> +     if (event->attr.precise_ip && get_ibs_caps())
>> +             return -ENOENT;
>> +
>> +     if (has_branch_stack(event))
>> +             return -EOPNOTSUPP;
>> +
>> +     ret = x86_pmu_hw_config(event);
>> +     if (ret)
>> +             return ret;
>> +
>> +     if (event->attr.type == PERF_TYPE_RAW)
>> +             event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
>> +
>> +     if (amd_is_perfctr_nb_event(&event->hw))
>> +             return amd_nb_hw_config(event);
>> +
>> +     return amd_core_hw_config(event);
>> +}
>> +
>>  static void __amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
>>                                          struct perf_event *event)
>>  {
>> @@ -254,6 +341,19 @@ static void __amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
>>       }
>>  }
>>
>> +static void amd_nb_interrupt_hw_config(struct hw_perf_event *hwc)
>> +{
>> +     int core_id = cpu_data(smp_processor_id()).cpu_core_id;
>> +
>> +     /* deliver interrupts only to this core */
>> +     if (hwc->config & ARCH_PERFMON_EVENTSEL_INT) {
>> +             hwc->config |= AMD64_EVENTSEL_INT_CORE_ENABLE;
>> +             hwc->config &= ~AMD64_EVENTSEL_INT_CORE_SEL_MASK;
>> +             hwc->config |= (u64)(core_id) <<
>> +                     AMD64_EVENTSEL_INT_CORE_SEL_SHIFT;
>> +     }
>> +}
>> +
>>   /*
>>    * AMD64 NorthBridge events need special treatment because
>>    * counter access needs to be synchronized across all cores
>> @@ -299,6 +399,12 @@ __amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *ev
>>       struct perf_event *old;
>>       int idx, new = -1;
>>
>> +     if (!c)
>> +             c = &unconstrained;
>> +
>> +     if (cpuc->is_fake)
>> +             return c;
>> +
>>       /*
>>        * detect if already present, if so reuse
>>        *
>> @@ -335,6 +441,9 @@ __amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *ev
>>       if (new == -1)
>>               return &emptyconstraint;
>>
>> +     if (amd_is_perfctr_nb_event(hwc))
>> +             amd_nb_interrupt_hw_config(hwc);
>> +
>>       return &nb->event_constraints[new];
>>  }
>>
>> @@ -434,7 +543,8 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
>>       if (!(amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)))
>>               return &unconstrained;
>>
>> -     return __amd_get_nb_event_constraints(cpuc, event, &unconstrained);
>> +     return __amd_get_nb_event_constraints(cpuc, event,
>> +                                           amd_nb_event_constraint);
>>  }
>>
>>  static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
>> @@ -533,6 +643,9 @@ static struct event_constraint amd_f15_PMC30 = EVENT_CONSTRAINT_OVERLAP(0, 0x09,
>>  static struct event_constraint amd_f15_PMC50 = EVENT_CONSTRAINT(0, 0x3F, 0);
>>  static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
>>
>> +static struct event_constraint amd_NBPMC96 = EVENT_CONSTRAINT(0, 0x3C0, 0);
>> +static struct event_constraint amd_NBPMC74 = EVENT_CONSTRAINT(0, 0xF0, 0);
>> +
>>  static struct event_constraint *
>>  amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *event)
>>  {
>> @@ -598,8 +711,8 @@ amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *ev
>>                       return &amd_f15_PMC20;
>>               }
>>       case AMD_EVENT_NB:
>> -             /* not yet implemented */
>> -             return &emptyconstraint;
>> +             return __amd_get_nb_event_constraints(cpuc, event,
>> +                                                   amd_nb_event_constraint);
>>       default:
>>               return &emptyconstraint;
>>       }
>> @@ -647,7 +760,7 @@ static __initconst const struct x86_pmu amd_pmu = {
>>
>>  static int setup_event_constraints(void)
>>  {
>> -     if (boot_cpu_data.x86 >= 0x15)
>> +     if (boot_cpu_data.x86 == 0x15)
>>               x86_pmu.get_event_constraints = amd_get_event_constraints_f15h;
>>       return 0;
>>  }
>> @@ -677,6 +790,23 @@ static int setup_perfctr_core(void)
>>       return 0;
>>  }
>>
>> +static int setup_perfctr_nb(void)
>> +{
>> +     if (!cpu_has_perfctr_nb)
>> +             return -ENODEV;
>> +
>> +     x86_pmu.num_counters += AMD64_NUM_COUNTERS_NB;
>> +
>> +     if (cpu_has_perfctr_core)
>> +             amd_nb_event_constraint = &amd_NBPMC96;
>> +     else
>> +             amd_nb_event_constraint = &amd_NBPMC74;
>> +
>> +     printk(KERN_INFO "perf: AMD northbridge performance counters detected\n");
>> +
>> +     return 0;
>> +}
>> +
>>  __init int amd_pmu_init(void)
>>  {
>>       /* Performance-monitoring supported from K7 and later: */
>> @@ -687,6 +817,7 @@ __init int amd_pmu_init(void)
>>
>>       setup_event_constraints();
>>       setup_perfctr_core();
>> +     setup_perfctr_nb();
>>
>>       /* Events are common for all AMDs */
>>       memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
>> --
>> 1.7.9.5
>>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread
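
To make the address arithmetic in the amd_pmu_addr_offset() hunk above
concrete, here is a small user-space sketch (not kernel code) of the same
calculation, assuming the Family 15h layout described in the comments:
core counter pairs at 0xc0010200 with a stride of 2, and the four NB
counter pairs at 0xc0010240 occupying counter indexes 6-9 right above the
six core counters. The helper name and the hard-coded first NB index are
illustrative assumptions, not part of the patch.

#include <stdio.h>

#define MSR_F15H_PERF_CTL	0xc0010200
#define MSR_F15H_PERF_CTR	0xc0010201
#define MSR_F15H_NB_PERF_CTL	0xc0010240
#define MSR_F15H_NB_PERF_CTR	0xc0010241

/* first NB counter index when the 6 core counters are present */
#define FIRST_NB_INDEX		6

/* mirrors the NB branch of amd_pmu_addr_offset(): offset from the base MSR */
static int nb_addr_offset(int index, int eventsel)
{
	int base = eventsel ? MSR_F15H_NB_PERF_CTL - MSR_F15H_PERF_CTL
			    : MSR_F15H_NB_PERF_CTR - MSR_F15H_PERF_CTR;

	return base + ((index - FIRST_NB_INDEX) << 1);
}

int main(void)
{
	int idx;

	for (idx = FIRST_NB_INDEX; idx < FIRST_NB_INDEX + 4; idx++)
		printf("index %d: eventsel 0x%x, perfctr 0x%x\n", idx,
		       MSR_F15H_PERF_CTL + nb_addr_offset(idx, 1),
		       MSR_F15H_PERF_CTR + nb_addr_offset(idx, 0));
	return 0;
}

Running this prints 0xc0010240/0xc0010241 for index 6 up through
0xc0010246/0xc0010247 for index 9, matching the "indexed right above the
core counters" scheme; the related amd_pmu_rdpmc_index() hunk maps those
same indexes to RDPMC ECX values 6-9, i.e. the manual's "NB counter
number plus 6".
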

* Re: [PATCH 6/6] perf, amd: Enable northbridge performance counters on AMD family 15h
  2013-02-07 17:57   ` Jacob Shin
  2013-02-07 17:58     ` Stephane Eranian
@ 2013-02-07 19:09     ` Ingo Molnar
  1 sibling, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2013-02-07 19:09 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
	Stephane Eranian, Jiri Olsa, linux-kernel


* Jacob Shin <jacob.shin@amd.com> wrote:

> On Wed, Feb 06, 2013 at 11:26:29AM -0600, Jacob Shin wrote:
> > On AMD family 15h processors, there are 4 new performance counters
> > (in addition to 6 core performance counters) that can be used for
> > counting northbridge events (i.e. DRAM accesses). Their bit fields are
> > almost identical to the core performance counters. However, unlike the
> > core performance counters, these MSRs are shared between multiple
> > cores (that share the same northbridge). We will reuse the same code
> > path as existing family 10h northbridge event constraints handler
> > logic to enforce this sharing.
> > 
> > Signed-off-by: Jacob Shin <jacob.shin@amd.com>
> 
> Hi Ingo, could you please apply this one to tip as well? I 
> received tip-bot emails for all other patches in this series 
> except for this last one 6/6.
> 
> Or was that intentional? If so, what other changes are 
> required/recommended?

Was waiting for Stephane's ack.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread
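
One practical consequence of the amd_nb_hw_config() restrictions quoted
earlier in the thread is that NB events can only be opened as system-wide
counting events (no sampling, no per-task attach). The sketch below is
one hypothetical way to count Family 15h DRAM accesses from user space
once this series is applied (typically run as root); the raw config value
0x3fe0 (event select 0xE0, unit mask 0x3F, i.e. DRAM_ACCESSES with all
sub-events) is taken from the libpfm4 patch later in this thread and is
an assumption here, not something this kernel patch itself defines.

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
			    int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
	struct perf_event_attr attr;
	uint64_t count;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size   = sizeof(attr);
	attr.type   = PERF_TYPE_RAW;
	attr.config = 0x3fe0;	/* assumed encoding: event 0xE0, umask 0x3F */

	/* NB events: counting mode only, system-wide (pid == -1, cpu >= 0) */
	fd = perf_event_open(&attr, -1, 0, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	sleep(1);

	if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return 1;

	printf("DRAM accesses (node of CPU0, ~1s): %llu\n",
	       (unsigned long long)count);
	return 0;
}

The perf tool equivalent would be something along the lines of
"perf stat -a -e r3fe0 -- sleep 1" on a patched kernel, again assuming
the raw event syntax of this kernel version.
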

* Re: [PATCH V6 0/6] perf, amd: Enable AMD family 15h northbridge counters
  2013-02-06 17:31 ` [PATCH V6 0/6] perf, amd: Enable AMD family 15h northbridge counters Jacob Shin
@ 2013-02-08 10:55   ` Stephane Eranian
  0 siblings, 0 replies; 21+ messages in thread
From: Stephane Eranian @ 2013-02-08 10:55 UTC (permalink / raw)
  To: Jacob Shin; +Cc: LKML, perfmon2-devel

On Wed, Feb 6, 2013 at 6:31 PM, Jacob Shin <jacob.shin@amd.com> wrote:
> On Wed, Feb 06, 2013 at 11:26:23AM -0600, Jacob Shin wrote:
>> The following patchset enables 4 additional performance counters in
>> AMD family 15h processors that count northbridge events -- such as
>> number of DRAM accesses.
>>
>
> Here is the libpfm4 counterpart,
>
Patch applied to libpfm4 with fixes to make it pass the validation test suite.
Thanks.

>
> From acbc2e6f66dc131658a0fa1283d830327a44919f Mon Sep 17 00:00:00 2001
> From: Jacob Shin <jacob.shin@amd.com>
> Date: Thu, 31 Jan 2013 14:34:06 -0600
> Subject: [PATCH V2] Add AMD Family 15h northbridge performance events
>
> libpfm4 side support for the following Linux kernel patchset:
>   http://lkml.org/lkml/2013/1/10/450
>
> Reference -- BIOS and Kernel Developer Guide (BKDG) for AMD Family 15h
>  Models 00h-0Fh Processors:
>   http://support.amd.com/us/Processor_TechDocs/42301_15h_Mod_00h-0Fh_BKDG.pdf
> ---
>  lib/events/amd64_events_fam15h.h | 1128 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 1128 insertions(+)
>
> diff --git a/lib/events/amd64_events_fam15h.h b/lib/events/amd64_events_fam15h.h
> index 7f654e8..8700ab2 100644
> --- a/lib/events/amd64_events_fam15h.h
> +++ b/lib/events/amd64_events_fam15h.h
> @@ -752,6 +752,910 @@ static const amd64_umask_t amd64_fam15h_l2_prefetcher_trigger_events[]={
>     },
>  };
>
> +static const amd64_umask_t amd64_fam15h_dram_accesses[]={
> +   { .uname = "DCT0_PAGE_HIT",
> +     .udesc = "DCT0 Page hit",
> +     .ucode = 0x1,
> +   },
> +   { .uname = "DCT0_PAGE_MISS",
> +     .udesc = "DCT0 Page Miss",
> +     .ucode = 0x2,
> +   },
> +   { .uname = "DCT0_PAGE_CONFLICT",
> +     .udesc = "DCT0 Page Conflict",
> +     .ucode = 0x4,
> +   },
> +   { .uname = "DCT1_PAGE_HIT",
> +     .udesc = "DCT1 Page hit",
> +     .ucode = 0x8,
> +   },
> +   { .uname = "DCT1_PAGE_MISS",
> +     .udesc = "DCT1 Page Miss",
> +     .ucode = 0x10,
> +   },
> +   { .uname = "DCT1_PAGE_CONFLICT",
> +     .udesc = "DCT1 Page Conflict",
> +     .ucode = 0x20,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode = 0x3f,
> +     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_dram_controller_page_table_overflows[]={
> +   { .uname = "DCT0_PAGE_TABLE_OVERFLOW",
> +     .udesc = "DCT0 Page Table Overflow",
> +     .ucode = 0x1,
> +   },
> +   { .uname = "DCT1_PAGE_TABLE_OVERFLOW",
> +     .udesc = "DCT1 Page Table Overflow",
> +     .ucode = 0x2,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode  = 0x3,
> +     .uflags = AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_memory_controller_dram_command_slots_missed[]={
> +   { .uname = "DCT0_COMMAND_SLOTS_MISSED",
> +     .udesc = "DCT0 Command Slots Missed (in MemClks)",
> +     .ucode = 0x1,
> +   },
> +   { .uname = "DCT1_COMMAND_SLOTS_MISSED",
> +     .udesc = "DCT1 Command Slots Missed (in MemClks)",
> +     .ucode = 0x2,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode  = 0x3,
> +     .uflags = AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_memory_controller_turnarounds[]={
> +   { .uname = "DCT0_DIMM_TURNAROUND",
> +     .udesc = "DCT0 DIMM (chip select) turnaround",
> +     .ucode = 0x1,
> +   },
> +   { .uname = "DCT0_READ_WRITE_TURNAROUND",
> +     .udesc = "DCT0 Read to write turnaround",
> +     .ucode = 0x2,
> +   },
> +   { .uname = "DCT0_WRITE_READ_TURNAROUND",
> +     .udesc = "DCT0 Write to read turnaround",
> +     .ucode = 0x4,
> +   },
> +   { .uname = "DCT1_DIMM_TURNAROUND",
> +     .udesc = "DCT1 DIMM (chip select) turnaround",
> +     .ucode = 0x8,
> +   },
> +   { .uname = "DCT1_READ_WRITE_TURNAROUND",
> +     .udesc = "DCT1 Read to write turnaround",
> +     .ucode = 0x10,
> +   },
> +   { .uname = "DCT1_WRITE_READ_TURNAROUND",
> +     .udesc = "DCT1 Write to read turnaround",
> +     .ucode = 0x20,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode  = 0x3f,
> +     .uflags = AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_memory_controller_bypass_counter_saturation[]={
> +   { .uname = "MEMORY_CONTROLLER_HIGH_PRIORITY_BYPASS",
> +     .udesc = "Memory controller high priority bypass",
> +     .ucode = 0x1,
> +   },
> +   { .uname = "MEMORY_CONTROLLER_MEDIUM_PRIORITY_BYPASS",
> +     .udesc = "Memory controller medium priority bypass",
> +     .ucode = 0x2,
> +   },
> +   { .uname = "DCT0_DCQ_BYPASS",
> +     .udesc = "DCT0 DCQ bypass",
> +     .ucode = 0x4,
> +   },
> +   { .uname = "DCT1_DCQ_BYPASS",
> +     .udesc = "DCT1 DCQ bypass",
> +     .ucode = 0x8,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode  = 0xf,
> +     .uflags = AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_thermal_status[]={
> +   { .uname = "NUM_HTC_TRIP_POINT_CROSSED",
> +     .udesc = "Number of times the HTC trip point is crossed",
> +     .ucode = 0x4,
> +   },
> +   { .uname = "NUM_CLOCKS_HTC_PSTATE_INACTIVE",
> +     .udesc = "Number of clocks HTC P-state is inactive",
> +     .ucode = 0x20,
> +   },
> +   { .uname = "NUM_CLOCKS_HTC_PSTATE_ACTIVE",
> +     .udesc = "Number of clocks HTC P-state is active",
> +     .ucode = 0x40,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode = 0x64,
> +     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_cpu_io_requests_to_memory_io[]={
> +   { .uname = "REMOTE_IO_TO_LOCAL_IO",
> +     .udesc = "Remote IO to Local IO",
> +     .ucode = 0x61,
> +   },
> +   { .uname = "REMOTE_CPU_TO_LOCAL_IO",
> +     .udesc = "Remote CPU to Local IO",
> +     .ucode = 0x64,
> +   },
> +   { .uname = "LOCAL_IO_TO_REMOTE_IO",
> +     .udesc = "Local IO to Remote IO",
> +     .ucode = 0x91,
> +   },
> +   { .uname = "LOCAL_IO_TO_REMOTE_MEM",
> +     .udesc = "Local IO to Remote Mem",
> +     .ucode = 0x92,
> +   },
> +   { .uname = "LOCAL_CPU_TO_REMOTE_IO",
> +     .udesc = "Local CPU to Remote IO",
> +     .ucode = 0x94,
> +   },
> +   { .uname = "LOCAL_CPU_TO_REMOTE_MEM",
> +     .udesc = "Local CPU to Remote Mem",
> +     .ucode = 0x98,
> +   },
> +   { .uname = "LOCAL_IO_TO_LOCAL_IO",
> +     .udesc = "Local IO to Local IO",
> +     .ucode = 0xa1,
> +   },
> +   { .uname = "LOCAL_IO_TO_LOCAL_MEM",
> +     .udesc = "Local IO to Local Mem",
> +     .ucode = 0xa2,
> +   },
> +   { .uname = "LOCAL_CPU_TO_LOCAL_IO",
> +     .udesc = "Local CPU to Local IO",
> +     .ucode = 0xa4,
> +   },
> +   { .uname = "LOCAL_CPU_TO_LOCAL_MEM",
> +     .udesc = "Local CPU to Local Mem",
> +     .ucode = 0xa8,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_cache_block_commands[]={
> +   { .uname = "VICTIM_BLOCK",
> +     .udesc = "Victim Block (Writeback)",
> +     .ucode = 0x1,
> +   },
> +   { .uname = "READ_BLOCK",
> +     .udesc = "Read Block (Dcache load miss refill)",
> +     .ucode = 0x4,
> +   },
> +   { .uname = "READ_BLOCK_SHARED",
> +     .udesc = "Read Block Shared (Icache refill)",
> +     .ucode = 0x8,
> +   },
> +   { .uname = "READ_BLOCK_MODIFIED",
> +     .udesc = "Read Block Modified (Dcache store miss refill)",
> +     .ucode = 0x10,
> +   },
> +   { .uname = "CHANGE_TO_DIRTY",
> +     .udesc = "Change-to-Dirty (first store to clean block already in cache)",
> +     .ucode = 0x20,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode  = 0x3d,
> +     .uflags = AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_sized_commands[]={
> +   { .uname = "NON-POSTED_SZWR_BYTE",
> +     .udesc = "Non-Posted SzWr Byte (1-32 bytes). Typical Usage: Legacy or mapped IO, typically 1-4 bytes.",
> +     .ucode = 0x1,
> +   },
> +   { .uname = "NON-POSTED_SZWR_DW",
> +     .udesc = "Non-Posted SzWr DW (1-16 dwords). Typical Usage: Legacy or mapped IO, typically 1",
> +     .ucode = 0x2,
> +   },
> +   { .uname = "POSTED_SZWR_BYTE",
> +     .udesc = "Posted SzWr Byte (1-32 bytes). Typical Usage: Subcache-line DMA writes, size varies; also",
> +     .ucode = 0x4,
> +   },
> +   { .uname = "POSTED_SZWR_DW",
> +     .udesc = "Posted SzWr DW (1-16 dwords). Typical Usage: Block-oriented DMA writes, often cache-line",
> +     .ucode = 0x8,
> +   },
> +   { .uname = "SZRD_BYTE",
> +     .udesc = "SzRd Byte (4 bytes). Typical Usage: Legacy or mapped IO.",
> +     .ucode = 0x10,
> +   },
> +   { .uname = "SZRD_DW",
> +     .udesc = "SzRd DW (1-16 dwords). Typical Usage: Block-oriented DMA reads, typically cache-line size.",
> +     .ucode = 0x20,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode = 0x3f,
> +     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_probe_responses_and_upstream_requests[]={
> +   { .uname = "PROBE_MISS",
> +     .udesc = "Probe miss",
> +     .ucode = 0x1,
> +   },
> +   { .uname = "PROBE_HIT_CLEAN",
> +     .udesc = "Probe hit clean",
> +     .ucode = 0x2,
> +   },
> +   { .uname = "PROBE_HIT_DIRTY_WITHOUT_MEMORY_CANCEL",
> +     .udesc = "Probe hit dirty without memory cancel (probed by Sized Write or Change2Dirty)",
> +     .ucode = 0x4,
> +   },
> +   { .uname = "PROBE_HIT_DIRTY_WITH_MEMORY_CANCEL",
> +     .udesc = "Probe hit dirty with memory cancel (probed by DMA read or cache refill request)",
> +     .ucode = 0x8,
> +   },
> +   { .uname = "UPSTREAM_DISPLAY_REFRESH_ISOC_READS",
> +     .udesc = "Upstream display refresh/ISOC reads",
> +     .ucode = 0x10,
> +   },
> +   { .uname = "UPSTREAM_NON-DISPLAY_REFRESH_READS",
> +     .udesc = "Upstream non-display refresh reads",
> +     .ucode = 0x20,
> +   },
> +   { .uname = "UPSTREAM_ISOC_WRITES",
> +     .udesc = "Upstream ISOC writes",
> +     .ucode = 0x40,
> +   },
> +   { .uname = "UPSTREAM_NON-ISOC_WRITES",
> +     .udesc = "Upstream non-ISOC writes",
> +     .ucode = 0x80,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode = 0xff,
> +     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_gart_events[]={
> +   { .uname = "GART_APERTURE_HIT_ON_ACCESS_FROM_CPU",
> +     .udesc = "GART aperture hit on access from CPU",
> +     .ucode = 0x1,
> +   },
> +   { .uname = "GART_APERTURE_HIT_ON_ACCESS_FROM_IO",
> +     .udesc = "GART aperture hit on access from IO",
> +     .ucode = 0x2,
> +   },
> +   { .uname = "GART_MISS",
> +     .udesc = "GART miss",
> +     .ucode = 0x4,
> +   },
> +   { .uname = "GART_REQUEST_HIT_TABLE_WALK_IN_PROGRESS",
> +     .udesc = "GART Request hit table walk in progress",
> +     .ucode = 0x8,
> +   },
> +   { .uname = "GART_MULTIPLE_TABLE_WALK_IN_PROGRESS",
> +     .udesc = "GART multiple table walk in progress",
> +     .ucode = 0x80,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode = 0x8f,
> +     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_link_transmit_bandwidth[]={
> +   { .uname = "COMMAND_DW_SENT",
> +     .udesc = "Command DW sent",
> +     .ucode = 0x1,
> +   },
> +   { .uname = "DATA_DW_SENT",
> +     .udesc = "Data DW sent",
> +     .ucode = 0x2,
> +   },
> +   { .uname = "BUFFER_RELEASE_DW_SENT",
> +     .udesc = "Buffer release DW sent",
> +     .ucode = 0x4,
> +   },
> +   { .uname = "NOP_DW_SENT",
> +     .udesc = "NOP DW sent (idle)",
> +     .ucode = 0x8,
> +   },
> +   { .uname = "ADDRESS_DW_SENT",
> +     .udesc = "Address (including extensions) DW sent",
> +     .ucode = 0x10,
> +   },
> +   { .uname = "PER_PACKET_CRC_SENT",
> +     .udesc = "Per packet CRC sent",
> +     .ucode = 0x20,
> +   },
> +   { .uname = "SUBLINK_COMMAND_DW_SENT",
> +     .udesc = "Sublink Command DW sent",
> +     .ucode = 0x81,
> +   },
> +   { .uname = "SUBLINK_DATA_DW_SENT",
> +     .udesc = "Sublink Data DW sent",
> +     .ucode = 0x82,
> +   },
> +   { .uname = "SUBLINK_BUFFER_RELEASE_DW_SENT",
> +     .udesc = "Sublink Buffer release DW sent",
> +     .ucode = 0x84,
> +   },
> +   { .uname = "SUBLINK_NOP_DW_SENT",
> +     .udesc = "Sublink NOP DW sent (idle)",
> +     .ucode = 0x88,
> +   },
> +   { .uname = "SUBLINK_ADDRESS_DW_SENT",
> +     .udesc = "Sublink Address (including extensions) DW sent",
> +     .ucode = 0x90,
> +   },
> +   { .uname = "SUBLINK_PER_PACKET_CRC_SENT",
> +     .udesc = "Sublink Per packet CRC sent",
> +     .ucode = 0xa0,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode = 0x3f,
> +     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_cpu_to_dram_requests_to_target_node[]={
> +   { .uname = "LOCAL_TO_NODE_0",
> +     .udesc = "From Local node to Node 0",
> +     .ucode = 0x1,
> +   },
> +   { .uname = "LOCAL_TO_NODE_1",
> +     .udesc = "From Local node to Node 1",
> +     .ucode = 0x2,
> +   },
> +   { .uname = "LOCAL_TO_NODE_2",
> +     .udesc = "From Local node to Node 2",
> +     .ucode = 0x4,
> +   },
> +   { .uname = "LOCAL_TO_NODE_3",
> +     .udesc = "From Local node to Node 3",
> +     .ucode = 0x8,
> +   },
> +   { .uname = "LOCAL_TO_NODE_4",
> +     .udesc = "From Local node to Node 4",
> +     .ucode = 0x10,
> +   },
> +   { .uname = "LOCAL_TO_NODE_5",
> +     .udesc = "From Local node to Node 5",
> +     .ucode = 0x20,
> +   },
> +   { .uname = "LOCAL_TO_NODE_6",
> +     .udesc = "From Local node to Node 6",
> +     .ucode = 0x40,
> +   },
> +   { .uname = "LOCAL_TO_NODE_7",
> +     .udesc = "From Local node to Node 7",
> +     .ucode = 0x80,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode = 0xff,
> +     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_io_to_dram_requests_to_target_node[]={
> +   { .uname = "LOCAL_TO_NODE_0",
> +     .udesc = "From Local node to Node 0",
> +     .ucode = 0x1,
> +   },
> +   { .uname = "LOCAL_TO_NODE_1",
> +     .udesc = "From Local node to Node 1",
> +     .ucode = 0x2,
> +   },
> +   { .uname = "LOCAL_TO_NODE_2",
> +     .udesc = "From Local node to Node 2",
> +     .ucode = 0x4,
> +   },
> +   { .uname = "LOCAL_TO_NODE_3",
> +     .udesc = "From Local node to Node 3",
> +     .ucode = 0x8,
> +   },
> +   { .uname = "LOCAL_TO_NODE_4",
> +     .udesc = "From Local node to Node 4",
> +     .ucode = 0x10,
> +   },
> +   { .uname = "LOCAL_TO_NODE_5",
> +     .udesc = "From Local node to Node 5",
> +     .ucode = 0x20,
> +   },
> +   { .uname = "LOCAL_TO_NODE_6",
> +     .udesc = "From Local node to Node 6",
> +     .ucode = 0x40,
> +   },
> +   { .uname = "LOCAL_TO_NODE_7",
> +     .udesc = "From Local node to Node 7",
> +     .ucode = 0x80,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode = 0xff,
> +     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_cpu_read_command_requests_to_target_node_0_3[]={
> +   { .uname = "READ_BLOCK_LOCAL_TO_NODE_0",
> +     .udesc = "Read block From Local node to Node 0",
> +     .ucode = 0x11,
> +   },
> +   { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_0",
> +     .udesc = "Read block shared From Local node to Node 0",
> +     .ucode = 0x12,
> +   },
> +   { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_0",
> +     .udesc = "Read block modified From Local node to Node 0",
> +     .ucode = 0x14,
> +   },
> +   { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_0",
> +     .udesc = "Change-to-Dirty From Local node to Node 0",
> +     .ucode = 0x18,
> +   },
> +   { .uname = "READ_BLOCK_LOCAL_TO_NODE_1",
> +     .udesc = "Read block From Local node to Node 1",
> +     .ucode = 0x21,
> +   },
> +   { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_1",
> +     .udesc = "Read block shared From Local node to Node 1",
> +     .ucode = 0x22,
> +   },
> +   { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_1",
> +     .udesc = "Read block modified From Local node to Node 1",
> +     .ucode = 0x24,
> +   },
> +   { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_1",
> +     .udesc = "Change-to-Dirty From Local node to Node 1",
> +     .ucode = 0x28,
> +   },
> +   { .uname = "READ_BLOCK_LOCAL_TO_NODE_2",
> +     .udesc = "Read block From Local node to Node 2",
> +     .ucode = 0x41,
> +   },
> +   { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_2",
> +     .udesc = "Read block shared From Local node to Node 2",
> +     .ucode = 0x42,
> +   },
> +   { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_2",
> +     .udesc = "Read block modified From Local node to Node 2",
> +     .ucode = 0x44,
> +   },
> +   { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_2",
> +     .udesc = "Change-to-Dirty From Local node to Node 2",
> +     .ucode = 0x48,
> +   },
> +   { .uname = "READ_BLOCK_LOCAL_TO_NODE_3",
> +     .udesc = "Read block From Local node to Node 3",
> +     .ucode = 0x81,
> +   },
> +   { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_3",
> +     .udesc = "Read block shared From Local node to Node 3",
> +     .ucode = 0x82,
> +   },
> +   { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_3",
> +     .udesc = "Read block modified From Local node to Node 3",
> +     .ucode = 0x84,
> +   },
> +   { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_3",
> +     .udesc = "Change-to-Dirty From Local node to Node 3",
> +     .ucode = 0x88,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode = 0xff,
> +     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_cpu_read_command_requests_to_target_node_4_7[]={
> +   { .uname = "READ_BLOCK_LOCAL_TO_NODE_4",
> +     .udesc = "Read block From Local node to Node 4",
> +     .ucode = 0x11,
> +   },
> +   { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_4",
> +     .udesc = "Read block shared From Local node to Node 4",
> +     .ucode = 0x12,
> +   },
> +   { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_4",
> +     .udesc = "Read block modified From Local node to Node 4",
> +     .ucode = 0x14,
> +   },
> +   { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_4",
> +     .udesc = "Change-to-Dirty From Local node to Node 4",
> +     .ucode = 0x18,
> +   },
> +   { .uname = "READ_BLOCK_LOCAL_TO_NODE_5",
> +     .udesc = "Read block From Local node to Node 5",
> +     .ucode = 0x21,
> +   },
> +   { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_5",
> +     .udesc = "Read block shared From Local node to Node 5",
> +     .ucode = 0x22,
> +   },
> +   { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_5",
> +     .udesc = "Read block modified From Local node to Node 5",
> +     .ucode = 0x24,
> +   },
> +   { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_5",
> +     .udesc = "Change-to-Dirty From Local node to Node 5",
> +     .ucode = 0x28,
> +   },
> +   { .uname = "READ_BLOCK_LOCAL_TO_NODE_6",
> +     .udesc = "Read block From Local node to Node 6",
> +     .ucode = 0x41,
> +   },
> +   { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_6",
> +     .udesc = "Read block shared From Local node to Node 6",
> +     .ucode = 0x42,
> +   },
> +   { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_6",
> +     .udesc = "Read block modified From Local node to Node 6",
> +     .ucode = 0x44,
> +   },
> +   { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_6",
> +     .udesc = "Change-to-Dirty From Local node to Node 6",
> +     .ucode = 0x48,
> +   },
> +   { .uname = "READ_BLOCK_LOCAL_TO_NODE_7",
> +     .udesc = "Read block From Local node to Node 7",
> +     .ucode = 0x81,
> +   },
> +   { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_7",
> +     .udesc = "Read block shared From Local node to Node 7",
> +     .ucode = 0x82,
> +   },
> +   { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_7",
> +     .udesc = "Read block modified From Local node to Node 7",
> +     .ucode = 0x84,
> +   },
> +   { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_7",
> +     .udesc = "Change-to-Dirty From Local node to Node 7",
> +     .ucode = 0x88,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode = 0xff,
> +     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_cpu_command_requests_to_target_node[]={
> +   { .uname = "READ_SIZED_LOCAL_TO_NODE_0",
> +     .udesc = "Read Sized From Local node to Node 0",
> +     .ucode = 0x11,
> +   },
> +   { .uname = "WRITE_SIZED_LOCAL_TO_NODE_0",
> +     .udesc = "Write Sized From Local node to Node 0",
> +     .ucode = 0x12,
> +   },
> +   { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_0",
> +     .udesc = "Victim Block From Local node to Node 0",
> +     .ucode = 0x14,
> +   },
> +   { .uname = "READ_SIZED_LOCAL_TO_NODE_1",
> +     .udesc = "Read Sized From Local node to Node 1",
> +     .ucode = 0x21,
> +   },
> +   { .uname = "WRITE_SIZED_LOCAL_TO_NODE_1",
> +     .udesc = "Write Sized From Local node to Node 1",
> +     .ucode = 0x22,
> +   },
> +   { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_1",
> +     .udesc = "Victim Block From Local node to Node 1",
> +     .ucode = 0x24,
> +   },
> +   { .uname = "READ_SIZED_LOCAL_TO_NODE_2",
> +     .udesc = "Read Sized From Local node to Node 2",
> +     .ucode = 0x41,
> +   },
> +   { .uname = "WRITE_SIZED_LOCAL_TO_NODE_2",
> +     .udesc = "Write Sized From Local node to Node 2",
> +     .ucode = 0x42,
> +   },
> +   { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_2",
> +     .udesc = "Victim Block From Local node to Node 2",
> +     .ucode = 0x44,
> +   },
> +   { .uname = "READ_SIZED_LOCAL_TO_NODE_3",
> +     .udesc = "Read Sized From Local node to Node 3",
> +     .ucode = 0x81,
> +   },
> +   { .uname = "WRITE_SIZED_LOCAL_TO_NODE_3",
> +     .udesc = "Write Sized From Local node to Node 3",
> +     .ucode = 0x82,
> +   },
> +   { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_3",
> +     .udesc = "Victim Block From Local node to Node 3",
> +     .ucode = 0x84,
> +   },
> +   { .uname = "READ_SIZED_LOCAL_TO_NODE_4",
> +     .udesc = "Read Sized From Local node to Node 4",
> +     .ucode = 0x19,
> +   },
> +   { .uname = "WRITE_SIZED_LOCAL_TO_NODE_4",
> +     .udesc = "Write Sized From Local node to Node 4",
> +     .ucode = 0x1a,
> +   },
> +   { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_4",
> +     .udesc = "Victim Block From Local node to Node 4",
> +     .ucode = 0x1c,
> +   },
> +   { .uname = "READ_SIZED_LOCAL_TO_NODE_5",
> +     .udesc = "Read Sized From Local node to Node 5",
> +     .ucode = 0x29,
> +   },
> +   { .uname = "WRITE_SIZED_LOCAL_TO_NODE_5",
> +     .udesc = "Write Sized From Local node to Node 5",
> +     .ucode = 0x2a,
> +   },
> +   { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_5",
> +     .udesc = "Victim Block From Local node to Node 5",
> +     .ucode = 0x2c,
> +   },
> +   { .uname = "READ_SIZED_LOCAL_TO_NODE_6",
> +     .udesc = "Read Sized From Local node to Node 6",
> +     .ucode = 0x49,
> +   },
> +   { .uname = "WRITE_SIZED_LOCAL_TO_NODE_6",
> +     .udesc = "Write Sized From Local node to Node 6",
> +     .ucode = 0x4a,
> +   },
> +   { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_6",
> +     .udesc = "Victim Block From Local node to Node 6",
> +     .ucode = 0x4c,
> +   },
> +   { .uname = "READ_SIZED_LOCAL_TO_NODE_7",
> +     .udesc = "Read Sized From Local node to Node 7",
> +     .ucode = 0x89,
> +   },
> +   { .uname = "WRITE_SIZED_LOCAL_TO_NODE_7",
> +     .udesc = "Write Sized From Local node to Node 7",
> +     .ucode = 0x8a,
> +   },
> +   { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_7",
> +     .udesc = "Victim Block From Local node to Node 7",
> +     .ucode = 0x8c,
> +   },
> +   { .uname  = "ALL_LOCAL_TO_NODE_0_3",
> +     .udesc  = "All From Local node to Node 0-3",
> +     .ucode = 0xf7,
> +     .uflags= AMD64_FL_NCOMBO,
> +   },
> +   { .uname  = "ALL_LOCAL_TO_NODE_4_7",
> +     .udesc  = "All From Local node to Node 4-7",
> +     .ucode = 0xff,
> +     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_request_cache_status_0[]={
> +   { .uname = "PROBE_HIT_S",
> +     .udesc = "Probe Hit S",
> +     .ucode = 0x1,
> +   },
> +   { .uname = "PROBE_HIT_E",
> +     .udesc = "Probe Hit E",
> +     .ucode = 0x2,
> +   },
> +   { .uname = "PROBE_HIT_MUW_OR_O",
> +     .udesc = "Probe Hit MuW or O",
> +     .ucode = 0x4,
> +   },
> +   { .uname = "PROBE_HIT_M",
> +     .udesc = "Probe Hit M",
> +     .ucode = 0x8,
> +   },
> +   { .uname = "PROBE_MISS",
> +     .udesc = "Probe Miss",
> +     .ucode = 0x10,
> +   },
> +   { .uname = "DIRECTED_PROBE",
> +     .udesc = "Directed Probe",
> +     .ucode = 0x20,
> +   },
> +   { .uname = "TRACK_CACHE_STAT_FOR_RDBLK",
> +     .udesc = "Track Cache Stat for RdBlk",
> +     .ucode = 0x40,
> +   },
> +   { .uname = "TRACK_CACHE_STAT_FOR_RDBLKS",
> +     .udesc = "Track Cache Stat for RdBlkS",
> +     .ucode = 0x80,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode = 0xff,
> +     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_request_cache_status_1[]={
> +   { .uname = "PROBE_HIT_S",
> +     .udesc = "Probe Hit S",
> +     .ucode = 0x1,
> +   },
> +   { .uname = "PROBE_HIT_E",
> +     .udesc = "Probe Hit E",
> +     .ucode = 0x2,
> +   },
> +   { .uname = "PROBE_HIT_MUW_OR_O",
> +     .udesc = "Probe Hit MuW or O",
> +     .ucode = 0x4,
> +   },
> +   { .uname = "PROBE_HIT_M",
> +     .udesc = "Probe Hit M",
> +     .ucode = 0x8,
> +   },
> +   { .uname = "PROBE_MISS",
> +     .udesc = "Probe Miss",
> +     .ucode = 0x10,
> +   },
> +   { .uname = "DIRECTED_PROBE",
> +     .udesc = "Directed Probe",
> +     .ucode = 0x20,
> +   },
> +   { .uname = "TRACK_CACHE_STAT_FOR_CHGTODIRTY",
> +     .udesc = "Track Cache Stat for ChgToDirty",
> +     .ucode = 0x40,
> +   },
> +   { .uname = "TRACK_CACHE_STAT_FOR_RDBLKM",
> +     .udesc = "Track Cache Stat for RdBlkM",
> +     .ucode = 0x80,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode = 0xff,
> +     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_memory_controller_requests[]={
> +   { .uname = "WRITE_REQUESTS_TO_DCT",
> +     .udesc = "Write requests sent to the DCT",
> +     .ucode = 0x1,
> +   },
> +   { .uname = "READ_REQUESTS_TO_DCT",
> +     .udesc = "Read requests (including prefetch requests) sent to the DCT",
> +     .ucode = 0x2,
> +   },
> +   { .uname = "PREFETCH_REQUESTS_TO_DCT",
> +     .udesc = "Prefetch requests sent to the DCT",
> +     .ucode = 0x4,
> +   },
> +   { .uname = "32_BYTES_SIZED_WRITES",
> +     .udesc = "32 Bytes Sized Writes",
> +     .ucode = 0x8,
> +   },
> +   { .uname = "64_BYTES_SIZED_WRITES",
> +     .udesc = "64 Bytes Sized Writes",
> +     .ucode = 0x10,
> +   },
> +   { .uname = "32_BYTES_SIZED_READS",
> +     .udesc = "32 Bytes Sized Reads",
> +     .ucode = 0x20,
> +   },
> +   { .uname = "64_BYTE_SIZED_READS",
> +     .udesc = "64 Byte Sized Reads",
> +     .ucode = 0x40,
> +   },
> +   { .uname = "READ_REQUESTS_TO_DCT_WHILE_WRITES_PENDING",
> +     .udesc = "Read requests sent to the DCT while writes requests are pending in the DCT",
> +     .ucode = 0x80,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode = 0xff,
> +     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_read_request_to_l3_cache[]={
> +   { .uname = "READ_BLOCK_EXCLUSIVE",
> +     .udesc = "Read Block Exclusive (Data cache read)",
> +     .ucode = 0xf1,
> +   },
> +   { .uname = "READ_BLOCK_SHARED",
> +     .udesc = "Read Block Shared (Instruction cache read)",
> +     .ucode = 0xf2,
> +   },
> +   { .uname = "READ_BLOCK_MODIFY",
> +     .udesc = "Read Block Modify",
> +     .ucode = 0xf4,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode = 0xf7,
> +     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_l3_fills_caused_by_l2_evictions[]={
> +   { .uname = "SHARED",
> +     .udesc = "Shared",
> +     .ucode = 0xf1,
> +   },
> +   { .uname = "EXCLUSIVE",
> +     .udesc = "Exclusive",
> +     .ucode = 0xf2,
> +   },
> +   { .uname = "OWNED",
> +     .udesc = "Owned",
> +     .ucode = 0xf4,
> +   },
> +   { .uname = "MODIFIED",
> +     .udesc = "Modified",
> +     .ucode = 0xf8,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode = 0xff,
> +     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_l3_evictions[]={
> +   { .uname = "SHARED",
> +     .udesc = "Shared",
> +     .ucode = 0x1,
> +   },
> +   { .uname = "EXCLUSIVE",
> +     .udesc = "Exclusive",
> +     .ucode = 0x2,
> +   },
> +   { .uname = "OWNED",
> +     .udesc = "Owned",
> +     .ucode = 0x4,
> +   },
> +   { .uname = "MODIFIED",
> +     .udesc = "Modified",
> +     .ucode = 0x8,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode = 0xf,
> +     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
> +static const amd64_umask_t amd64_fam15h_l3_latency[]={
> +   { .uname = "L3_REQUEST_CYCLE",
> +     .udesc = "L3 Request cycle count.",
> +     .ucode = 0x1,
> +   },
> +   { .uname = "L3_REQUEST",
> +     .udesc = "L3 request count.",
> +     .ucode = 0x2,
> +   },
> +   { .uname  = "ALL",
> +     .udesc  = "All sub-events selected",
> +     .ucode = 0x3,
> +     .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,
> +   },
> +};
> +
>  static const amd64_entry_t amd64_fam15h_pe[]={
>  { .name    = "DISPATCHED_FPU_OPS",
>    .desc    = "FPU Pipe Assignment",
> @@ -1256,4 +2160,228 @@ static const amd64_entry_t amd64_fam15h_pe[]={
>    .modmsk  = AMD64_FAM15H_ATTRS,
>    .code    = 0x1d8,
>  },
> +{ .name    = "DRAM_ACCESSES",
> +  .desc    = "DRAM Accesses",
> +  .code    = 0xe0,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_dram_accesses),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_dram_accesses,
> +},
> +{ .name    = "DRAM_CONTROLLER_PAGE_TABLE_OVERFLOWS",
> +  .desc    = "DRAM Controller Page Table Overflows",
> +  .code    = 0xe1,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_dram_controller_page_table_overflows),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_dram_controller_page_table_overflows,
> +},
> +{ .name    = "MEMORY_CONTROLLER_DRAM_COMMAND_SLOTS_MISSED",
> +  .desc    = "Memory Controller DRAM Command Slots Missed",
> +  .code    = 0xe2,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_memory_controller_dram_command_slots_missed),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_memory_controller_dram_command_slots_missed,
> +},
> +{ .name    = "MEMORY_CONTROLLER_TURNAROUNDS",
> +  .desc    = "Memory Controller Turnarounds",
> +  .code    = 0xe3,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_memory_controller_turnarounds),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_memory_controller_turnarounds,
> +},
> +{ .name    = "MEMORY_CONTROLLER_BYPASS_COUNTER_SATURATION",
> +  .desc    = "Memory Controller Bypass Counter Saturation",
> +  .code    = 0xe4,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_memory_controller_bypass_counter_saturation),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_memory_controller_bypass_counter_saturation,
> +},
> +{ .name    = "THERMAL_STATUS",
> +  .desc    = "Thermal Status",
> +  .code    = 0xe8,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_thermal_status),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_thermal_status,
> +},
> +{ .name    = "CPU_IO_REQUESTS_TO_MEMORY_IO",
> +  .desc    = "CPU/IO Requests to Memory/IO",
> +  .code    = 0xe9,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_io_requests_to_memory_io),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_cpu_io_requests_to_memory_io,
> +},
> +{ .name    = "CACHE_BLOCK_COMMANDS",
> +  .desc    = "Cache Block Commands",
> +  .code    = 0xea,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cache_block_commands),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_cache_block_commands,
> +},
> +{ .name    = "SIZED_COMMANDS",
> +  .desc    = "Sized Commands",
> +  .code    = 0xeb,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_sized_commands),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_sized_commands,
> +},
> +{ .name    = "PROBE_RESPONSES_AND_UPSTREAM_REQUESTS",
> +  .desc    = "Probe Responses and Upstream Requests",
> +  .code    = 0xec,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_probe_responses_and_upstream_requests),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_probe_responses_and_upstream_requests,
> +},
> +{ .name    = "GART_EVENTS",
> +  .desc    = "GART Events",
> +  .code    = 0xee,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_gart_events),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_gart_events,
> +},
> +{ .name    = "LINK_TRANSMIT_BANDWIDTH_LINK_0",
> +  .desc    = "Link Transmit Bandwidth Link 0",
> +  .code    = 0xf6,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_link_transmit_bandwidth),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_link_transmit_bandwidth,
> +},
> +{ .name    = "LINK_TRANSMIT_BANDWIDTH_LINK_1",
> +  .desc    = "Link Transmit Bandwidth Link 1",
> +  .code    = 0xf7,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_link_transmit_bandwidth),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_link_transmit_bandwidth,
> +},
> +{ .name    = "LINK_TRANSMIT_BANDWIDTH_LINK_2",
> +  .desc    = "Link Transmit Bandwidth Link 2",
> +  .code    = 0xf8,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_link_transmit_bandwidth),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_link_transmit_bandwidth,
> +},
> +{ .name    = "LINK_TRANSMIT_BANDWIDTH_LINK_3",
> +  .desc    = "Link Transmit Bandwidth Link 3",
> +  .code    = 0x1f9,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_link_transmit_bandwidth),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_link_transmit_bandwidth,
> +},
> +{ .name    = "CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE",
> +  .desc    = "CPU to DRAM Requests to Target Node",
> +  .code    = 0x1e0,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_to_dram_requests_to_target_node),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_cpu_to_dram_requests_to_target_node,
> +},
> +{ .name    = "IO_TO_DRAM_REQUESTS_TO_TARGET_NODE",
> +  .desc    = "IO to DRAM Requests to Target Node",
> +  .code    = 0x1e1,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_io_to_dram_requests_to_target_node),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_io_to_dram_requests_to_target_node,
> +},
> +{ .name    = "CPU_READ_COMMAND_LATENCY_TO_TARGET_NODE_0_3",
> +  .desc    = "CPU Read Command Latency to Target Node 0-3",
> +  .code    = 0x1e2,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_read_command_requests_to_target_node_0_3),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_cpu_read_command_requests_to_target_node_0_3,
> +},
> +{ .name    = "CPU_READ_COMMAND_REQUESTS_TO_TARGET_NODE_0_3",
> +  .desc    = "CPU Read Command Requests to Target Node 0-3",
> +  .code    = 0x1e3,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_read_command_requests_to_target_node_0_3),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_cpu_read_command_requests_to_target_node_0_3,
> +},
> +{ .name    = "CPU_READ_COMMAND_LATENCY_TO_TARGET_NODE_4_7",
> +  .desc    = "CPU Read Command Latency to Target Node 4-7",
> +  .code    = 0x1e4,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_read_command_requests_to_target_node_4_7),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_cpu_read_command_requests_to_target_node_4_7,
> +},
> +{ .name    = "CPU_READ_COMMAND_REQUESTS_TO_TARGET_NODE_4_7",
> +  .desc    = "CPU Read Command Requests to Target Node 4-7",
> +  .code    = 0x1e5,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_read_command_requests_to_target_node_4_7),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_cpu_read_command_requests_to_target_node_4_7,
> +},
> +{ .name    = "CPU_COMMAND_LATENCY_TO_TARGET_NODE",
> +  .desc    = "CPU Command Latency to Target Node",
> +  .code    = 0x1e6,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_command_requests_to_target_node),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_cpu_command_requests_to_target_node,
> +},
> +{ .name    = "CPU_REQUESTS_TO_TARGET_NODE",
> +  .desc    = "CPU Requests to Target Node",
> +  .code    = 0x1e7,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_command_requests_to_target_node),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_cpu_command_requests_to_target_node,
> +},
> +{ .name    = "REQUEST_CACHE_STATUS_0",
> +  .desc    = "Request Cache Status 0",
> +  .code    = 0x1ea,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_request_cache_status_0),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_request_cache_status_0,
> +},
> +{ .name    = "REQUEST_CACHE_STATUS_1",
> +  .desc    = "Request Cache Status 1",
> +  .code    = 0x1eb,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_request_cache_status_1),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_request_cache_status_1,
> +},
> +{ .name    = "MEMORY_CONTROLLER_REQUESTS",
> +  .desc    = "Memory Controller Requests",
> +  .code    = 0x1f0,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_memory_controller_requests),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_memory_controller_requests,
> +},
> +{ .name    = "READ_REQUEST_TO_L3_CACHE",
> +  .desc    = "Read Request to L3 Cache",
> +  .code    = 0x4e0,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_read_request_to_l3_cache),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_read_request_to_l3_cache,
> +},
> +{ .name    = "L3_CACHE_MISSES",
> +  .desc    = "L3 Cache Misses",
> +  .code    = 0x4e1,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_read_request_to_l3_cache),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_read_request_to_l3_cache,
> +},
> +{ .name    = "L3_FILLS_CAUSED_BY_L2_EVICTIONS",
> +  .desc    = "L3 Fills caused by L2 Evictions",
> +  .code    = 0x4e2,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_l3_fills_caused_by_l2_evictions),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_l3_fills_caused_by_l2_evictions,
> +},
> +{ .name    = "L3_EVICTIONS",
> +  .desc    = "L3 Evictions",
> +  .code    = 0x4e3,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_l3_evictions),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_l3_evictions,
> +},
> +{ .name    = "NON_CANCELED_L3_READ_REQUESTS",
> +  .desc    = "Non-canceled L3 Read Requests",
> +  .code    = 0x4ed,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_read_request_to_l3_cache),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_read_request_to_l3_cache,
> +},
> +{ .name    = "L3_LATENCY",
> +  .desc    = "L3 Latency",
> +  .code    = 0x4ef,
> +  .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_l3_latency),
> +  .ngrp    = 1,
> +  .umasks  = amd64_fam15h_l3_latency,
> +},
>  };
> --
> 1.7.9.5
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/6] perf, amd: Enable northbridge performance counters on AMD family 15h
  2013-02-06 17:26 ` [PATCH 6/6] perf, amd: Enable northbridge performance counters on AMD family 15h Jacob Shin
  2013-02-07 17:57   ` Jacob Shin
@ 2013-02-08 11:16   ` Stephane Eranian
  2013-02-11 16:26     ` Jacob Shin
  2013-02-18  8:30   ` [tip:perf/core] perf/x86/amd: " tip-bot for Jacob Shin
  2 siblings, 1 reply; 21+ messages in thread
From: Stephane Eranian @ 2013-02-08 11:16 UTC (permalink / raw)
  To: Jacob Shin
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
	Jiri Olsa, LKML

On Wed, Feb 6, 2013 at 6:26 PM, Jacob Shin <jacob.shin@amd.com> wrote:
> On AMD family 15h processors, there are 4 new performance counters
> (in addition to 6 core performance counters) that can be used for
> counting northbridge events (i.e. DRAM accesses). Their bit fields are
> almost identical to the core performance counters. However, unlike the
> core performance counters, these MSRs are shared between multiple
> cores (that share the same northbridge). We will reuse the same code
> path as existing family 10h northbridge event constraints handler
> logic to enforce this sharing.
>
> Signed-off-by: Jacob Shin <jacob.shin@amd.com>

Works for me.

I simply regret that the design decision ties uncore with core
even though hardware-wise they are separate. If I recall the earlier
discussion, the motivation was to limit code duplication. That's true,
but it comes at the expense of isolation. For instance, if the core
PMU is overcommitted but the uncore is not, then the uncore still goes
through event rescheduling for nothing.

But what matters at this point is that there is coverage
for uncore, so we can get some bandwidth measurements
out. So I recommend we merge this in. Thanks.

Acked-by: Stephane Eranian <eranian@google.com>

> ---
>  arch/x86/include/asm/cpufeature.h     |    2 +
>  arch/x86/include/asm/perf_event.h     |    9 ++
>  arch/x86/include/uapi/asm/msr-index.h |    2 +
>  arch/x86/kernel/cpu/perf_event_amd.c  |  171 +++++++++++++++++++++++++++++----
>  4 files changed, 164 insertions(+), 20 deletions(-)
>
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index 2d9075e..93fe929 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -167,6 +167,7 @@
>  #define X86_FEATURE_TBM                (6*32+21) /* trailing bit manipulations */
>  #define X86_FEATURE_TOPOEXT    (6*32+22) /* topology extensions CPUID leafs */
>  #define X86_FEATURE_PERFCTR_CORE (6*32+23) /* core performance counter extensions */
> +#define X86_FEATURE_PERFCTR_NB  (6*32+24) /* NB performance counter extensions */
>
>  /*
>   * Auxiliary flags: Linux defined - For features scattered in various
> @@ -309,6 +310,7 @@ extern const char * const x86_power_flags[32];
>  #define cpu_has_hypervisor     boot_cpu_has(X86_FEATURE_HYPERVISOR)
>  #define cpu_has_pclmulqdq      boot_cpu_has(X86_FEATURE_PCLMULQDQ)
>  #define cpu_has_perfctr_core   boot_cpu_has(X86_FEATURE_PERFCTR_CORE)
> +#define cpu_has_perfctr_nb     boot_cpu_has(X86_FEATURE_PERFCTR_NB)
>  #define cpu_has_cx8            boot_cpu_has(X86_FEATURE_CX8)
>  #define cpu_has_cx16           boot_cpu_has(X86_FEATURE_CX16)
>  #define cpu_has_eager_fpu      boot_cpu_has(X86_FEATURE_EAGER_FPU)
> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
> index 2234eaaec..57cb634 100644
> --- a/arch/x86/include/asm/perf_event.h
> +++ b/arch/x86/include/asm/perf_event.h
> @@ -29,9 +29,14 @@
>  #define ARCH_PERFMON_EVENTSEL_INV                      (1ULL << 23)
>  #define ARCH_PERFMON_EVENTSEL_CMASK                    0xFF000000ULL
>
> +#define AMD64_EVENTSEL_INT_CORE_ENABLE                 (1ULL << 36)
>  #define AMD64_EVENTSEL_GUESTONLY                       (1ULL << 40)
>  #define AMD64_EVENTSEL_HOSTONLY                                (1ULL << 41)
>
> +#define AMD64_EVENTSEL_INT_CORE_SEL_SHIFT              37
> +#define AMD64_EVENTSEL_INT_CORE_SEL_MASK               \
> +       (0xFULL << AMD64_EVENTSEL_INT_CORE_SEL_SHIFT)
> +
>  #define AMD64_EVENTSEL_EVENT   \
>         (ARCH_PERFMON_EVENTSEL_EVENT | (0x0FULL << 32))
>  #define INTEL_ARCH_EVENT_MASK  \
> @@ -46,8 +51,12 @@
>  #define AMD64_RAW_EVENT_MASK           \
>         (X86_RAW_EVENT_MASK          |  \
>          AMD64_EVENTSEL_EVENT)
> +#define AMD64_RAW_EVENT_MASK_NB                \
> +       (AMD64_EVENTSEL_EVENT        |  \
> +        ARCH_PERFMON_EVENTSEL_UMASK)
>  #define AMD64_NUM_COUNTERS                             4
>  #define AMD64_NUM_COUNTERS_CORE                                6
> +#define AMD64_NUM_COUNTERS_NB                          4
>
>  #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL          0x3c
>  #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK                (0x00 << 8)
> diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
> index 1031604..27c05d2 100644
> --- a/arch/x86/include/uapi/asm/msr-index.h
> +++ b/arch/x86/include/uapi/asm/msr-index.h
> @@ -195,6 +195,8 @@
>  /* Fam 15h MSRs */
>  #define MSR_F15H_PERF_CTL              0xc0010200
>  #define MSR_F15H_PERF_CTR              0xc0010201
> +#define MSR_F15H_NB_PERF_CTL           0xc0010240
> +#define MSR_F15H_NB_PERF_CTR           0xc0010241
>
>  /* Fam 10h MSRs */
>  #define MSR_FAM10H_MMIO_CONF_BASE      0xc0010058
> diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
> index 05462f0..dfdab42 100644
> --- a/arch/x86/kernel/cpu/perf_event_amd.c
> +++ b/arch/x86/kernel/cpu/perf_event_amd.c
> @@ -132,11 +132,14 @@ static u64 amd_pmu_event_map(int hw_event)
>         return amd_perfmon_event_map[hw_event];
>  }
>
> +static struct event_constraint *amd_nb_event_constraint;
> +
>  /*
>   * Previously calculated offsets
>   */
>  static unsigned int event_offsets[X86_PMC_IDX_MAX] __read_mostly;
>  static unsigned int count_offsets[X86_PMC_IDX_MAX] __read_mostly;
> +static unsigned int rdpmc_indexes[X86_PMC_IDX_MAX] __read_mostly;
>
>  /*
>   * Legacy CPUs:
> @@ -144,10 +147,14 @@ static unsigned int count_offsets[X86_PMC_IDX_MAX] __read_mostly;
>   *
>   * CPUs with core performance counter extensions:
>   *   6 counters starting at 0xc0010200 each offset by 2
> + *
> + * CPUs with north bridge performance counter extensions:
> + *   4 additional counters starting at 0xc0010240 each offset by 2
> + *   (indexed right above either one of the above core counters)
>   */
>  static inline int amd_pmu_addr_offset(int index, bool eventsel)
>  {
> -       int offset;
> +       int offset, first, base;
>
>         if (!index)
>                 return index;
> @@ -160,7 +167,23 @@ static inline int amd_pmu_addr_offset(int index, bool eventsel)
>         if (offset)
>                 return offset;
>
> -       if (!cpu_has_perfctr_core)
> +       if (amd_nb_event_constraint &&
> +           test_bit(index, amd_nb_event_constraint->idxmsk)) {
> +               /*
> +                * calculate the offset of NB counters with respect to
> +                * base eventsel or perfctr
> +                */
> +
> +               first = find_first_bit(amd_nb_event_constraint->idxmsk,
> +                                      X86_PMC_IDX_MAX);
> +
> +               if (eventsel)
> +                       base = MSR_F15H_NB_PERF_CTL - x86_pmu.eventsel;
> +               else
> +                       base = MSR_F15H_NB_PERF_CTR - x86_pmu.perfctr;
> +
> +               offset = base + ((index - first) << 1);
> +       } else if (!cpu_has_perfctr_core)
>                 offset = index;
>         else
>                 offset = index << 1;
> @@ -175,24 +198,36 @@ static inline int amd_pmu_addr_offset(int index, bool eventsel)
>
>  static inline int amd_pmu_rdpmc_index(int index)
>  {
> -       return index;
> -}
> +       int ret, first;
>
> -static int amd_pmu_hw_config(struct perf_event *event)
> -{
> -       int ret;
> +       if (!index)
> +               return index;
>
> -       /* pass precise event sampling to ibs: */
> -       if (event->attr.precise_ip && get_ibs_caps())
> -               return -ENOENT;
> +       ret = rdpmc_indexes[index];
>
> -       ret = x86_pmu_hw_config(event);
>         if (ret)
>                 return ret;
>
> -       if (has_branch_stack(event))
> -               return -EOPNOTSUPP;
> +       if (amd_nb_event_constraint &&
> +           test_bit(index, amd_nb_event_constraint->idxmsk)) {
> +               /*
> +                * according to the manual, ECX value of the NB counters is
> +                * the index of the NB counter (0, 1, 2 or 3) plus 6
> +                */
> +
> +               first = find_first_bit(amd_nb_event_constraint->idxmsk,
> +                                      X86_PMC_IDX_MAX);
> +               ret = index - first + 6;
> +       } else
> +               ret = index;
> +
> +       rdpmc_indexes[index] = ret;
>
> +       return ret;
> +}
> +
> +static int amd_core_hw_config(struct perf_event *event)
> +{
>         if (event->attr.exclude_host && event->attr.exclude_guest)
>                 /*
>                  * When HO == GO == 1 the hardware treats that as GO == HO == 0
> @@ -206,10 +241,33 @@ static int amd_pmu_hw_config(struct perf_event *event)
>         else if (event->attr.exclude_guest)
>                 event->hw.config |= AMD64_EVENTSEL_HOSTONLY;
>
> -       if (event->attr.type != PERF_TYPE_RAW)
> -               return 0;
> +       return 0;
> +}
>
> -       event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
> +/*
> + * NB counters do not support the following event select bits:
> + *   Host/Guest only
> + *   Counter mask
> + *   Invert counter mask
> + *   Edge detect
> + *   OS/User mode
> + */
> +static int amd_nb_hw_config(struct perf_event *event)
> +{
> +       /* for NB, we only allow system wide counting mode */
> +       if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
> +               return -EINVAL;
> +
> +       if (event->attr.exclude_user || event->attr.exclude_kernel ||
> +           event->attr.exclude_host || event->attr.exclude_guest)
> +               return -EINVAL;
> +
> +       event->hw.config &= ~(ARCH_PERFMON_EVENTSEL_USR |
> +                             ARCH_PERFMON_EVENTSEL_OS);
> +
> +       if (event->hw.config & ~(AMD64_RAW_EVENT_MASK_NB |
> +                                ARCH_PERFMON_EVENTSEL_INT))
> +               return -EINVAL;
>
>         return 0;
>  }
> @@ -227,6 +285,11 @@ static inline int amd_is_nb_event(struct hw_perf_event *hwc)
>         return (hwc->config & 0xe0) == 0xe0;
>  }
>
> +static inline int amd_is_perfctr_nb_event(struct hw_perf_event *hwc)
> +{
> +       return amd_nb_event_constraint && amd_is_nb_event(hwc);
> +}
> +
>  static inline int amd_has_nb(struct cpu_hw_events *cpuc)
>  {
>         struct amd_nb *nb = cpuc->amd_nb;
> @@ -234,6 +297,30 @@ static inline int amd_has_nb(struct cpu_hw_events *cpuc)
>         return nb && nb->nb_id != -1;
>  }
>
> +static int amd_pmu_hw_config(struct perf_event *event)
> +{
> +       int ret;
> +
> +       /* pass precise event sampling to ibs: */
> +       if (event->attr.precise_ip && get_ibs_caps())
> +               return -ENOENT;
> +
> +       if (has_branch_stack(event))
> +               return -EOPNOTSUPP;
> +
> +       ret = x86_pmu_hw_config(event);
> +       if (ret)
> +               return ret;
> +
> +       if (event->attr.type == PERF_TYPE_RAW)
> +               event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
> +
> +       if (amd_is_perfctr_nb_event(&event->hw))
> +               return amd_nb_hw_config(event);
> +
> +       return amd_core_hw_config(event);
> +}
> +
>  static void __amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
>                                            struct perf_event *event)
>  {
> @@ -254,6 +341,19 @@ static void __amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
>         }
>  }
>
> +static void amd_nb_interrupt_hw_config(struct hw_perf_event *hwc)
> +{
> +       int core_id = cpu_data(smp_processor_id()).cpu_core_id;
> +
> +       /* deliver interrupts only to this core */
> +       if (hwc->config & ARCH_PERFMON_EVENTSEL_INT) {
> +               hwc->config |= AMD64_EVENTSEL_INT_CORE_ENABLE;
> +               hwc->config &= ~AMD64_EVENTSEL_INT_CORE_SEL_MASK;
> +               hwc->config |= (u64)(core_id) <<
> +                       AMD64_EVENTSEL_INT_CORE_SEL_SHIFT;
> +       }
> +}
> +
>   /*
>    * AMD64 NorthBridge events need special treatment because
>    * counter access needs to be synchronized across all cores
> @@ -299,6 +399,12 @@ __amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *ev
>         struct perf_event *old;
>         int idx, new = -1;
>
> +       if (!c)
> +               c = &unconstrained;
> +
> +       if (cpuc->is_fake)
> +               return c;
> +
>         /*
>          * detect if already present, if so reuse
>          *
> @@ -335,6 +441,9 @@ __amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *ev
>         if (new == -1)
>                 return &emptyconstraint;
>
> +       if (amd_is_perfctr_nb_event(hwc))
> +               amd_nb_interrupt_hw_config(hwc);
> +
>         return &nb->event_constraints[new];
>  }
>
> @@ -434,7 +543,8 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
>         if (!(amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)))
>                 return &unconstrained;
>
> -       return __amd_get_nb_event_constraints(cpuc, event, &unconstrained);
> +       return __amd_get_nb_event_constraints(cpuc, event,
> +                                             amd_nb_event_constraint);
>  }
>
>  static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
> @@ -533,6 +643,9 @@ static struct event_constraint amd_f15_PMC30 = EVENT_CONSTRAINT_OVERLAP(0, 0x09,
>  static struct event_constraint amd_f15_PMC50 = EVENT_CONSTRAINT(0, 0x3F, 0);
>  static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
>
> +static struct event_constraint amd_NBPMC96 = EVENT_CONSTRAINT(0, 0x3C0, 0);
> +static struct event_constraint amd_NBPMC74 = EVENT_CONSTRAINT(0, 0xF0, 0);
> +
>  static struct event_constraint *
>  amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *event)
>  {
> @@ -598,8 +711,8 @@ amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *ev
>                         return &amd_f15_PMC20;
>                 }
>         case AMD_EVENT_NB:
> -               /* not yet implemented */
> -               return &emptyconstraint;
> +               return __amd_get_nb_event_constraints(cpuc, event,
> +                                                     amd_nb_event_constraint);
>         default:
>                 return &emptyconstraint;
>         }
> @@ -647,7 +760,7 @@ static __initconst const struct x86_pmu amd_pmu = {
>
>  static int setup_event_constraints(void)
>  {
> -       if (boot_cpu_data.x86 >= 0x15)
> +       if (boot_cpu_data.x86 == 0x15)
>                 x86_pmu.get_event_constraints = amd_get_event_constraints_f15h;
>         return 0;
>  }
> @@ -677,6 +790,23 @@ static int setup_perfctr_core(void)
>         return 0;
>  }
>
> +static int setup_perfctr_nb(void)
> +{
> +       if (!cpu_has_perfctr_nb)
> +               return -ENODEV;
> +
> +       x86_pmu.num_counters += AMD64_NUM_COUNTERS_NB;
> +
> +       if (cpu_has_perfctr_core)
> +               amd_nb_event_constraint = &amd_NBPMC96;
> +       else
> +               amd_nb_event_constraint = &amd_NBPMC74;
> +
> +       printk(KERN_INFO "perf: AMD northbridge performance counters detected\n");
> +
> +       return 0;
> +}
> +
>  __init int amd_pmu_init(void)
>  {
>         /* Performance-monitoring supported from K7 and later: */
> @@ -687,6 +817,7 @@ __init int amd_pmu_init(void)
>
>         setup_event_constraints();
>         setup_perfctr_core();
> +       setup_perfctr_nb();
>
>         /* Events are common for all AMDs */
>         memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
> --
> 1.7.9.5
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/6] perf, amd: Enable northbridge performance counters on AMD family 15h
  2013-02-08 11:16   ` Stephane Eranian
@ 2013-02-11 16:26     ` Jacob Shin
  2013-02-15 20:51       ` Jacob Shin
  0 siblings, 1 reply; 21+ messages in thread
From: Jacob Shin @ 2013-02-11 16:26 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
	Jiri Olsa, LKML

On Fri, Feb 08, 2013 at 12:16:28PM +0100, Stephane Eranian wrote:
> On Wed, Feb 6, 2013 at 6:26 PM, Jacob Shin <jacob.shin@amd.com> wrote:
> > On AMD family 15h processors, there are 4 new performance counters
> > (in addition to 6 core performance counters) that can be used for
> > counting northbridge events (i.e. DRAM accesses). Their bit fields are
> > almost identical to the core performance counters. However, unlike the
> > core performance counters, these MSRs are shared between multiple
> > cores (that share the same northbridge). We will reuse the same code
> > path as existing family 10h northbridge event constraints handler
> > logic to enforce this sharing.
> >
> > Signed-off-by: Jacob Shin <jacob.shin@amd.com>
> 
> Works for me.
> 
> I simply regret that the design decision ties uncore with core
> even though hardware-wise they are separate. If I recall the earlier
> discussion, the motivation was to limit code duplication. That's true,
> but that's at the expense of isolation. For instance, now if the core
> PMU is overcommitted, but not the uncore, then uncore still goes
> through event rescheduling for nothing.
> 
> But what matters at this point is that there is coverage
> for uncore, so we can get some bandwidth measurements
> out. So I recommend we merge this in. Thanks.
> 
> Acked-by: Stephane Eranian <eranian@google.com>

Stephane, thank you for your time reviewing/testing the patchset.

Ingo, could you please commit this patch 6/6 to tip?

Thank you,

-Jacob

> 
> > ---
> >  arch/x86/include/asm/cpufeature.h     |    2 +
> >  arch/x86/include/asm/perf_event.h     |    9 ++
> >  arch/x86/include/uapi/asm/msr-index.h |    2 +
> >  arch/x86/kernel/cpu/perf_event_amd.c  |  171 +++++++++++++++++++++++++++++----
> >  4 files changed, 164 insertions(+), 20 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> > index 2d9075e..93fe929 100644
> > --- a/arch/x86/include/asm/cpufeature.h
> > +++ b/arch/x86/include/asm/cpufeature.h
> > @@ -167,6 +167,7 @@
> >  #define X86_FEATURE_TBM                (6*32+21) /* trailing bit manipulations */
> >  #define X86_FEATURE_TOPOEXT    (6*32+22) /* topology extensions CPUID leafs */
> >  #define X86_FEATURE_PERFCTR_CORE (6*32+23) /* core performance counter extensions */
> > +#define X86_FEATURE_PERFCTR_NB  (6*32+24) /* NB performance counter extensions */
> >
> >  /*
> >   * Auxiliary flags: Linux defined - For features scattered in various
> > @@ -309,6 +310,7 @@ extern const char * const x86_power_flags[32];
> >  #define cpu_has_hypervisor     boot_cpu_has(X86_FEATURE_HYPERVISOR)
> >  #define cpu_has_pclmulqdq      boot_cpu_has(X86_FEATURE_PCLMULQDQ)
> >  #define cpu_has_perfctr_core   boot_cpu_has(X86_FEATURE_PERFCTR_CORE)
> > +#define cpu_has_perfctr_nb     boot_cpu_has(X86_FEATURE_PERFCTR_NB)
> >  #define cpu_has_cx8            boot_cpu_has(X86_FEATURE_CX8)
> >  #define cpu_has_cx16           boot_cpu_has(X86_FEATURE_CX16)
> >  #define cpu_has_eager_fpu      boot_cpu_has(X86_FEATURE_EAGER_FPU)
> > diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
> > index 2234eaaec..57cb634 100644
> > --- a/arch/x86/include/asm/perf_event.h
> > +++ b/arch/x86/include/asm/perf_event.h
> > @@ -29,9 +29,14 @@
> >  #define ARCH_PERFMON_EVENTSEL_INV                      (1ULL << 23)
> >  #define ARCH_PERFMON_EVENTSEL_CMASK                    0xFF000000ULL
> >
> > +#define AMD64_EVENTSEL_INT_CORE_ENABLE                 (1ULL << 36)
> >  #define AMD64_EVENTSEL_GUESTONLY                       (1ULL << 40)
> >  #define AMD64_EVENTSEL_HOSTONLY                                (1ULL << 41)
> >
> > +#define AMD64_EVENTSEL_INT_CORE_SEL_SHIFT              37
> > +#define AMD64_EVENTSEL_INT_CORE_SEL_MASK               \
> > +       (0xFULL << AMD64_EVENTSEL_INT_CORE_SEL_SHIFT)
> > +
> >  #define AMD64_EVENTSEL_EVENT   \
> >         (ARCH_PERFMON_EVENTSEL_EVENT | (0x0FULL << 32))
> >  #define INTEL_ARCH_EVENT_MASK  \
> > @@ -46,8 +51,12 @@
> >  #define AMD64_RAW_EVENT_MASK           \
> >         (X86_RAW_EVENT_MASK          |  \
> >          AMD64_EVENTSEL_EVENT)
> > +#define AMD64_RAW_EVENT_MASK_NB                \
> > +       (AMD64_EVENTSEL_EVENT        |  \
> > +        ARCH_PERFMON_EVENTSEL_UMASK)
> >  #define AMD64_NUM_COUNTERS                             4
> >  #define AMD64_NUM_COUNTERS_CORE                                6
> > +#define AMD64_NUM_COUNTERS_NB                          4
> >
> >  #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL          0x3c
> >  #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK                (0x00 << 8)
> > diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
> > index 1031604..27c05d2 100644
> > --- a/arch/x86/include/uapi/asm/msr-index.h
> > +++ b/arch/x86/include/uapi/asm/msr-index.h
> > @@ -195,6 +195,8 @@
> >  /* Fam 15h MSRs */
> >  #define MSR_F15H_PERF_CTL              0xc0010200
> >  #define MSR_F15H_PERF_CTR              0xc0010201
> > +#define MSR_F15H_NB_PERF_CTL           0xc0010240
> > +#define MSR_F15H_NB_PERF_CTR           0xc0010241
> >
> >  /* Fam 10h MSRs */
> >  #define MSR_FAM10H_MMIO_CONF_BASE      0xc0010058
> > diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
> > index 05462f0..dfdab42 100644
> > --- a/arch/x86/kernel/cpu/perf_event_amd.c
> > +++ b/arch/x86/kernel/cpu/perf_event_amd.c
> > @@ -132,11 +132,14 @@ static u64 amd_pmu_event_map(int hw_event)
> >         return amd_perfmon_event_map[hw_event];
> >  }
> >
> > +static struct event_constraint *amd_nb_event_constraint;
> > +
> >  /*
> >   * Previously calculated offsets
> >   */
> >  static unsigned int event_offsets[X86_PMC_IDX_MAX] __read_mostly;
> >  static unsigned int count_offsets[X86_PMC_IDX_MAX] __read_mostly;
> > +static unsigned int rdpmc_indexes[X86_PMC_IDX_MAX] __read_mostly;
> >
> >  /*
> >   * Legacy CPUs:
> > @@ -144,10 +147,14 @@ static unsigned int count_offsets[X86_PMC_IDX_MAX] __read_mostly;
> >   *
> >   * CPUs with core performance counter extensions:
> >   *   6 counters starting at 0xc0010200 each offset by 2
> > + *
> > + * CPUs with north bridge performance counter extensions:
> > + *   4 additional counters starting at 0xc0010240 each offset by 2
> > + *   (indexed right above either one of the above core counters)
> >   */
> >  static inline int amd_pmu_addr_offset(int index, bool eventsel)
> >  {
> > -       int offset;
> > +       int offset, first, base;
> >
> >         if (!index)
> >                 return index;
> > @@ -160,7 +167,23 @@ static inline int amd_pmu_addr_offset(int index, bool eventsel)
> >         if (offset)
> >                 return offset;
> >
> > -       if (!cpu_has_perfctr_core)
> > +       if (amd_nb_event_constraint &&
> > +           test_bit(index, amd_nb_event_constraint->idxmsk)) {
> > +               /*
> > +                * calculate the offset of NB counters with respect to
> > +                * base eventsel or perfctr
> > +                */
> > +
> > +               first = find_first_bit(amd_nb_event_constraint->idxmsk,
> > +                                      X86_PMC_IDX_MAX);
> > +
> > +               if (eventsel)
> > +                       base = MSR_F15H_NB_PERF_CTL - x86_pmu.eventsel;
> > +               else
> > +                       base = MSR_F15H_NB_PERF_CTR - x86_pmu.perfctr;
> > +
> > +               offset = base + ((index - first) << 1);
> > +       } else if (!cpu_has_perfctr_core)
> >                 offset = index;
> >         else
> >                 offset = index << 1;
> > @@ -175,24 +198,36 @@ static inline int amd_pmu_addr_offset(int index, bool eventsel)
> >
> >  static inline int amd_pmu_rdpmc_index(int index)
> >  {
> > -       return index;
> > -}
> > +       int ret, first;
> >
> > -static int amd_pmu_hw_config(struct perf_event *event)
> > -{
> > -       int ret;
> > +       if (!index)
> > +               return index;
> >
> > -       /* pass precise event sampling to ibs: */
> > -       if (event->attr.precise_ip && get_ibs_caps())
> > -               return -ENOENT;
> > +       ret = rdpmc_indexes[index];
> >
> > -       ret = x86_pmu_hw_config(event);
> >         if (ret)
> >                 return ret;
> >
> > -       if (has_branch_stack(event))
> > -               return -EOPNOTSUPP;
> > +       if (amd_nb_event_constraint &&
> > +           test_bit(index, amd_nb_event_constraint->idxmsk)) {
> > +               /*
> > +                * according to the manual, ECX value of the NB counters is
> > +                * the index of the NB counter (0, 1, 2 or 3) plus 6
> > +                */
> > +
> > +               first = find_first_bit(amd_nb_event_constraint->idxmsk,
> > +                                      X86_PMC_IDX_MAX);
> > +               ret = index - first + 6;
> > +       } else
> > +               ret = index;
> > +
> > +       rdpmc_indexes[index] = ret;
> >
> > +       return ret;
> > +}
> > +
> > +static int amd_core_hw_config(struct perf_event *event)
> > +{
> >         if (event->attr.exclude_host && event->attr.exclude_guest)
> >                 /*
> >                  * When HO == GO == 1 the hardware treats that as GO == HO == 0
> > @@ -206,10 +241,33 @@ static int amd_pmu_hw_config(struct perf_event *event)
> >         else if (event->attr.exclude_guest)
> >                 event->hw.config |= AMD64_EVENTSEL_HOSTONLY;
> >
> > -       if (event->attr.type != PERF_TYPE_RAW)
> > -               return 0;
> > +       return 0;
> > +}
> >
> > -       event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
> > +/*
> > + * NB counters do not support the following event select bits:
> > + *   Host/Guest only
> > + *   Counter mask
> > + *   Invert counter mask
> > + *   Edge detect
> > + *   OS/User mode
> > + */
> > +static int amd_nb_hw_config(struct perf_event *event)
> > +{
> > +       /* for NB, we only allow system wide counting mode */
> > +       if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
> > +               return -EINVAL;
> > +
> > +       if (event->attr.exclude_user || event->attr.exclude_kernel ||
> > +           event->attr.exclude_host || event->attr.exclude_guest)
> > +               return -EINVAL;
> > +
> > +       event->hw.config &= ~(ARCH_PERFMON_EVENTSEL_USR |
> > +                             ARCH_PERFMON_EVENTSEL_OS);
> > +
> > +       if (event->hw.config & ~(AMD64_RAW_EVENT_MASK_NB |
> > +                                ARCH_PERFMON_EVENTSEL_INT))
> > +               return -EINVAL;
> >
> >         return 0;
> >  }
> > @@ -227,6 +285,11 @@ static inline int amd_is_nb_event(struct hw_perf_event *hwc)
> >         return (hwc->config & 0xe0) == 0xe0;
> >  }
> >
> > +static inline int amd_is_perfctr_nb_event(struct hw_perf_event *hwc)
> > +{
> > +       return amd_nb_event_constraint && amd_is_nb_event(hwc);
> > +}
> > +
> >  static inline int amd_has_nb(struct cpu_hw_events *cpuc)
> >  {
> >         struct amd_nb *nb = cpuc->amd_nb;
> > @@ -234,6 +297,30 @@ static inline int amd_has_nb(struct cpu_hw_events *cpuc)
> >         return nb && nb->nb_id != -1;
> >  }
> >
> > +static int amd_pmu_hw_config(struct perf_event *event)
> > +{
> > +       int ret;
> > +
> > +       /* pass precise event sampling to ibs: */
> > +       if (event->attr.precise_ip && get_ibs_caps())
> > +               return -ENOENT;
> > +
> > +       if (has_branch_stack(event))
> > +               return -EOPNOTSUPP;
> > +
> > +       ret = x86_pmu_hw_config(event);
> > +       if (ret)
> > +               return ret;
> > +
> > +       if (event->attr.type == PERF_TYPE_RAW)
> > +               event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
> > +
> > +       if (amd_is_perfctr_nb_event(&event->hw))
> > +               return amd_nb_hw_config(event);
> > +
> > +       return amd_core_hw_config(event);
> > +}
> > +
> >  static void __amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
> >                                            struct perf_event *event)
> >  {
> > @@ -254,6 +341,19 @@ static void __amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
> >         }
> >  }
> >
> > +static void amd_nb_interrupt_hw_config(struct hw_perf_event *hwc)
> > +{
> > +       int core_id = cpu_data(smp_processor_id()).cpu_core_id;
> > +
> > +       /* deliver interrupts only to this core */
> > +       if (hwc->config & ARCH_PERFMON_EVENTSEL_INT) {
> > +               hwc->config |= AMD64_EVENTSEL_INT_CORE_ENABLE;
> > +               hwc->config &= ~AMD64_EVENTSEL_INT_CORE_SEL_MASK;
> > +               hwc->config |= (u64)(core_id) <<
> > +                       AMD64_EVENTSEL_INT_CORE_SEL_SHIFT;
> > +       }
> > +}
> > +
> >   /*
> >    * AMD64 NorthBridge events need special treatment because
> >    * counter access needs to be synchronized across all cores
> > @@ -299,6 +399,12 @@ __amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *ev
> >         struct perf_event *old;
> >         int idx, new = -1;
> >
> > +       if (!c)
> > +               c = &unconstrained;
> > +
> > +       if (cpuc->is_fake)
> > +               return c;
> > +
> >         /*
> >          * detect if already present, if so reuse
> >          *
> > @@ -335,6 +441,9 @@ __amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *ev
> >         if (new == -1)
> >                 return &emptyconstraint;
> >
> > +       if (amd_is_perfctr_nb_event(hwc))
> > +               amd_nb_interrupt_hw_config(hwc);
> > +
> >         return &nb->event_constraints[new];
> >  }
> >
> > @@ -434,7 +543,8 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
> >         if (!(amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)))
> >                 return &unconstrained;
> >
> > -       return __amd_get_nb_event_constraints(cpuc, event, &unconstrained);
> > +       return __amd_get_nb_event_constraints(cpuc, event,
> > +                                             amd_nb_event_constraint);
> >  }
> >
> >  static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
> > @@ -533,6 +643,9 @@ static struct event_constraint amd_f15_PMC30 = EVENT_CONSTRAINT_OVERLAP(0, 0x09,
> >  static struct event_constraint amd_f15_PMC50 = EVENT_CONSTRAINT(0, 0x3F, 0);
> >  static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
> >
> > +static struct event_constraint amd_NBPMC96 = EVENT_CONSTRAINT(0, 0x3C0, 0);
> > +static struct event_constraint amd_NBPMC74 = EVENT_CONSTRAINT(0, 0xF0, 0);
> > +
> >  static struct event_constraint *
> >  amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *event)
> >  {
> > @@ -598,8 +711,8 @@ amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *ev
> >                         return &amd_f15_PMC20;
> >                 }
> >         case AMD_EVENT_NB:
> > -               /* not yet implemented */
> > -               return &emptyconstraint;
> > +               return __amd_get_nb_event_constraints(cpuc, event,
> > +                                                     amd_nb_event_constraint);
> >         default:
> >                 return &emptyconstraint;
> >         }
> > @@ -647,7 +760,7 @@ static __initconst const struct x86_pmu amd_pmu = {
> >
> >  static int setup_event_constraints(void)
> >  {
> > -       if (boot_cpu_data.x86 >= 0x15)
> > +       if (boot_cpu_data.x86 == 0x15)
> >                 x86_pmu.get_event_constraints = amd_get_event_constraints_f15h;
> >         return 0;
> >  }
> > @@ -677,6 +790,23 @@ static int setup_perfctr_core(void)
> >         return 0;
> >  }
> >
> > +static int setup_perfctr_nb(void)
> > +{
> > +       if (!cpu_has_perfctr_nb)
> > +               return -ENODEV;
> > +
> > +       x86_pmu.num_counters += AMD64_NUM_COUNTERS_NB;
> > +
> > +       if (cpu_has_perfctr_core)
> > +               amd_nb_event_constraint = &amd_NBPMC96;
> > +       else
> > +               amd_nb_event_constraint = &amd_NBPMC74;
> > +
> > +       printk(KERN_INFO "perf: AMD northbridge performance counters detected\n");
> > +
> > +       return 0;
> > +}
> > +
> >  __init int amd_pmu_init(void)
> >  {
> >         /* Performance-monitoring supported from K7 and later: */
> > @@ -687,6 +817,7 @@ __init int amd_pmu_init(void)
> >
> >         setup_event_constraints();
> >         setup_perfctr_core();
> > +       setup_perfctr_nb();
> >
> >         /* Events are common for all AMDs */
> >         memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
> > --
> > 1.7.9.5
> >
> >
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/6] perf, amd: Enable northbridge performance counters on AMD family 15h
  2013-02-11 16:26     ` Jacob Shin
@ 2013-02-15 20:51       ` Jacob Shin
  0 siblings, 0 replies; 21+ messages in thread
From: Jacob Shin @ 2013-02-15 20:51 UTC (permalink / raw)
  To: Stephane Eranian, Ingo Molnar
  Cc: Thomas Gleixner, H. Peter Anvin, x86, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Jiri Olsa, LKML

On Mon, Feb 11, 2013 at 10:26:09AM -0600, Jacob Shin wrote:
> On Fri, Feb 08, 2013 at 12:16:28PM +0100, Stephane Eranian wrote:
> > On Wed, Feb 6, 2013 at 6:26 PM, Jacob Shin <jacob.shin@amd.com> wrote:
> > > On AMD family 15h processors, there are 4 new performance counters
> > > (in addition to 6 core performance counters) that can be used for
> > > counting northbridge events (i.e. DRAM accesses). Their bit fields are
> > > almost identical to the core performance counters. However, unlike the
> > > core performance counters, these MSRs are shared between multiple
> > > cores (that share the same northbridge). We will reuse the same code
> > > path as existing family 10h northbridge event constraints handler
> > > logic to enforce this sharing.
> > >
> > > Signed-off-by: Jacob Shin <jacob.shin@amd.com>
> > 
> > Works for me.
> > 
> > I simply regret that the design decision ties uncore with core
> > even though hardware-wise they are separate. If I recall the earlier
> > discussion, the motivation was to limit code duplication. That's true,
> > but that's at the expense of isolation. For instance, now if the core
> > PMU is overcommitted, but not the uncore, then uncore still goes
> > through event rescheduling for nothing.
> > 
> > But what matters at this point is that there is coverage
> > for uncore, so we can get some bandwidth measurements
> > out. So I recommend we merge this in. Thanks.
> > 
> > Acked-by: Stephane Eranian <eranian@google.com>
> 
> Stephane, thank you for your time reviewing/testing the patchset.
> 
> Ingo, could you please commit this patch 6/6 to tip?

Hi Ingo, just pinging again: now that we have an ACK from Stephane,
could you please commit this last one to tip? (I just want to get it
in before the merge window opens.)

Thanks for your time,

-Jacob

> 
> Thank you,
> 
> -Jacob
> 
> > 
> > > ---
> > >  arch/x86/include/asm/cpufeature.h     |    2 +
> > >  arch/x86/include/asm/perf_event.h     |    9 ++
> > >  arch/x86/include/uapi/asm/msr-index.h |    2 +
> > >  arch/x86/kernel/cpu/perf_event_amd.c  |  171 +++++++++++++++++++++++++++++----
> > >  4 files changed, 164 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> > > index 2d9075e..93fe929 100644
> > > --- a/arch/x86/include/asm/cpufeature.h
> > > +++ b/arch/x86/include/asm/cpufeature.h
> > > @@ -167,6 +167,7 @@
> > >  #define X86_FEATURE_TBM                (6*32+21) /* trailing bit manipulations */
> > >  #define X86_FEATURE_TOPOEXT    (6*32+22) /* topology extensions CPUID leafs */
> > >  #define X86_FEATURE_PERFCTR_CORE (6*32+23) /* core performance counter extensions */
> > > +#define X86_FEATURE_PERFCTR_NB  (6*32+24) /* NB performance counter extensions */
> > >
> > >  /*
> > >   * Auxiliary flags: Linux defined - For features scattered in various
> > > @@ -309,6 +310,7 @@ extern const char * const x86_power_flags[32];
> > >  #define cpu_has_hypervisor     boot_cpu_has(X86_FEATURE_HYPERVISOR)
> > >  #define cpu_has_pclmulqdq      boot_cpu_has(X86_FEATURE_PCLMULQDQ)
> > >  #define cpu_has_perfctr_core   boot_cpu_has(X86_FEATURE_PERFCTR_CORE)
> > > +#define cpu_has_perfctr_nb     boot_cpu_has(X86_FEATURE_PERFCTR_NB)
> > >  #define cpu_has_cx8            boot_cpu_has(X86_FEATURE_CX8)
> > >  #define cpu_has_cx16           boot_cpu_has(X86_FEATURE_CX16)
> > >  #define cpu_has_eager_fpu      boot_cpu_has(X86_FEATURE_EAGER_FPU)
> > > diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
> > > index 2234eaaec..57cb634 100644
> > > --- a/arch/x86/include/asm/perf_event.h
> > > +++ b/arch/x86/include/asm/perf_event.h
> > > @@ -29,9 +29,14 @@
> > >  #define ARCH_PERFMON_EVENTSEL_INV                      (1ULL << 23)
> > >  #define ARCH_PERFMON_EVENTSEL_CMASK                    0xFF000000ULL
> > >
> > > +#define AMD64_EVENTSEL_INT_CORE_ENABLE                 (1ULL << 36)
> > >  #define AMD64_EVENTSEL_GUESTONLY                       (1ULL << 40)
> > >  #define AMD64_EVENTSEL_HOSTONLY                                (1ULL << 41)
> > >
> > > +#define AMD64_EVENTSEL_INT_CORE_SEL_SHIFT              37
> > > +#define AMD64_EVENTSEL_INT_CORE_SEL_MASK               \
> > > +       (0xFULL << AMD64_EVENTSEL_INT_CORE_SEL_SHIFT)
> > > +
> > >  #define AMD64_EVENTSEL_EVENT   \
> > >         (ARCH_PERFMON_EVENTSEL_EVENT | (0x0FULL << 32))
> > >  #define INTEL_ARCH_EVENT_MASK  \
> > > @@ -46,8 +51,12 @@
> > >  #define AMD64_RAW_EVENT_MASK           \
> > >         (X86_RAW_EVENT_MASK          |  \
> > >          AMD64_EVENTSEL_EVENT)
> > > +#define AMD64_RAW_EVENT_MASK_NB                \
> > > +       (AMD64_EVENTSEL_EVENT        |  \
> > > +        ARCH_PERFMON_EVENTSEL_UMASK)
> > >  #define AMD64_NUM_COUNTERS                             4
> > >  #define AMD64_NUM_COUNTERS_CORE                                6
> > > +#define AMD64_NUM_COUNTERS_NB                          4
> > >
> > >  #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL          0x3c
> > >  #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK                (0x00 << 8)
> > > diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
> > > index 1031604..27c05d2 100644
> > > --- a/arch/x86/include/uapi/asm/msr-index.h
> > > +++ b/arch/x86/include/uapi/asm/msr-index.h
> > > @@ -195,6 +195,8 @@
> > >  /* Fam 15h MSRs */
> > >  #define MSR_F15H_PERF_CTL              0xc0010200
> > >  #define MSR_F15H_PERF_CTR              0xc0010201
> > > +#define MSR_F15H_NB_PERF_CTL           0xc0010240
> > > +#define MSR_F15H_NB_PERF_CTR           0xc0010241
> > >
> > >  /* Fam 10h MSRs */
> > >  #define MSR_FAM10H_MMIO_CONF_BASE      0xc0010058
> > > diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
> > > index 05462f0..dfdab42 100644
> > > --- a/arch/x86/kernel/cpu/perf_event_amd.c
> > > +++ b/arch/x86/kernel/cpu/perf_event_amd.c
> > > @@ -132,11 +132,14 @@ static u64 amd_pmu_event_map(int hw_event)
> > >         return amd_perfmon_event_map[hw_event];
> > >  }
> > >
> > > +static struct event_constraint *amd_nb_event_constraint;
> > > +
> > >  /*
> > >   * Previously calculated offsets
> > >   */
> > >  static unsigned int event_offsets[X86_PMC_IDX_MAX] __read_mostly;
> > >  static unsigned int count_offsets[X86_PMC_IDX_MAX] __read_mostly;
> > > +static unsigned int rdpmc_indexes[X86_PMC_IDX_MAX] __read_mostly;
> > >
> > >  /*
> > >   * Legacy CPUs:
> > > @@ -144,10 +147,14 @@ static unsigned int count_offsets[X86_PMC_IDX_MAX] __read_mostly;
> > >   *
> > >   * CPUs with core performance counter extensions:
> > >   *   6 counters starting at 0xc0010200 each offset by 2
> > > + *
> > > + * CPUs with north bridge performance counter extensions:
> > > + *   4 additional counters starting at 0xc0010240 each offset by 2
> > > + *   (indexed right above either one of the above core counters)
> > >   */
> > >  static inline int amd_pmu_addr_offset(int index, bool eventsel)
> > >  {
> > > -       int offset;
> > > +       int offset, first, base;
> > >
> > >         if (!index)
> > >                 return index;
> > > @@ -160,7 +167,23 @@ static inline int amd_pmu_addr_offset(int index, bool eventsel)
> > >         if (offset)
> > >                 return offset;
> > >
> > > -       if (!cpu_has_perfctr_core)
> > > +       if (amd_nb_event_constraint &&
> > > +           test_bit(index, amd_nb_event_constraint->idxmsk)) {
> > > +               /*
> > > +                * calculate the offset of NB counters with respect to
> > > +                * base eventsel or perfctr
> > > +                */
> > > +
> > > +               first = find_first_bit(amd_nb_event_constraint->idxmsk,
> > > +                                      X86_PMC_IDX_MAX);
> > > +
> > > +               if (eventsel)
> > > +                       base = MSR_F15H_NB_PERF_CTL - x86_pmu.eventsel;
> > > +               else
> > > +                       base = MSR_F15H_NB_PERF_CTR - x86_pmu.perfctr;
> > > +
> > > +               offset = base + ((index - first) << 1);
> > > +       } else if (!cpu_has_perfctr_core)
> > >                 offset = index;
> > >         else
> > >                 offset = index << 1;
> > > @@ -175,24 +198,36 @@ static inline int amd_pmu_addr_offset(int index, bool eventsel)
> > >
> > >  static inline int amd_pmu_rdpmc_index(int index)
> > >  {
> > > -       return index;
> > > -}
> > > +       int ret, first;
> > >
> > > -static int amd_pmu_hw_config(struct perf_event *event)
> > > -{
> > > -       int ret;
> > > +       if (!index)
> > > +               return index;
> > >
> > > -       /* pass precise event sampling to ibs: */
> > > -       if (event->attr.precise_ip && get_ibs_caps())
> > > -               return -ENOENT;
> > > +       ret = rdpmc_indexes[index];
> > >
> > > -       ret = x86_pmu_hw_config(event);
> > >         if (ret)
> > >                 return ret;
> > >
> > > -       if (has_branch_stack(event))
> > > -               return -EOPNOTSUPP;
> > > +       if (amd_nb_event_constraint &&
> > > +           test_bit(index, amd_nb_event_constraint->idxmsk)) {
> > > +               /*
> > > +                * according to the manual, ECX value of the NB counters is
> > > +                * the index of the NB counter (0, 1, 2 or 3) plus 6
> > > +                */
> > > +
> > > +               first = find_first_bit(amd_nb_event_constraint->idxmsk,
> > > +                                      X86_PMC_IDX_MAX);
> > > +               ret = index - first + 6;
> > > +       } else
> > > +               ret = index;
> > > +
> > > +       rdpmc_indexes[index] = ret;
> > >
> > > +       return ret;
> > > +}
> > > +
> > > +static int amd_core_hw_config(struct perf_event *event)
> > > +{
> > >         if (event->attr.exclude_host && event->attr.exclude_guest)
> > >                 /*
> > >                  * When HO == GO == 1 the hardware treats that as GO == HO == 0
> > > @@ -206,10 +241,33 @@ static int amd_pmu_hw_config(struct perf_event *event)
> > >         else if (event->attr.exclude_guest)
> > >                 event->hw.config |= AMD64_EVENTSEL_HOSTONLY;
> > >
> > > -       if (event->attr.type != PERF_TYPE_RAW)
> > > -               return 0;
> > > +       return 0;
> > > +}
> > >
> > > -       event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
> > > +/*
> > > + * NB counters do not support the following event select bits:
> > > + *   Host/Guest only
> > > + *   Counter mask
> > > + *   Invert counter mask
> > > + *   Edge detect
> > > + *   OS/User mode
> > > + */
> > > +static int amd_nb_hw_config(struct perf_event *event)
> > > +{
> > > +       /* for NB, we only allow system wide counting mode */
> > > +       if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
> > > +               return -EINVAL;
> > > +
> > > +       if (event->attr.exclude_user || event->attr.exclude_kernel ||
> > > +           event->attr.exclude_host || event->attr.exclude_guest)
> > > +               return -EINVAL;
> > > +
> > > +       event->hw.config &= ~(ARCH_PERFMON_EVENTSEL_USR |
> > > +                             ARCH_PERFMON_EVENTSEL_OS);
> > > +
> > > +       if (event->hw.config & ~(AMD64_RAW_EVENT_MASK_NB |
> > > +                                ARCH_PERFMON_EVENTSEL_INT))
> > > +               return -EINVAL;
> > >
> > >         return 0;
> > >  }
> > > @@ -227,6 +285,11 @@ static inline int amd_is_nb_event(struct hw_perf_event *hwc)
> > >         return (hwc->config & 0xe0) == 0xe0;
> > >  }
> > >
> > > +static inline int amd_is_perfctr_nb_event(struct hw_perf_event *hwc)
> > > +{
> > > +       return amd_nb_event_constraint && amd_is_nb_event(hwc);
> > > +}
> > > +
> > >  static inline int amd_has_nb(struct cpu_hw_events *cpuc)
> > >  {
> > >         struct amd_nb *nb = cpuc->amd_nb;
> > > @@ -234,6 +297,30 @@ static inline int amd_has_nb(struct cpu_hw_events *cpuc)
> > >         return nb && nb->nb_id != -1;
> > >  }
> > >
> > > +static int amd_pmu_hw_config(struct perf_event *event)
> > > +{
> > > +       int ret;
> > > +
> > > +       /* pass precise event sampling to ibs: */
> > > +       if (event->attr.precise_ip && get_ibs_caps())
> > > +               return -ENOENT;
> > > +
> > > +       if (has_branch_stack(event))
> > > +               return -EOPNOTSUPP;
> > > +
> > > +       ret = x86_pmu_hw_config(event);
> > > +       if (ret)
> > > +               return ret;
> > > +
> > > +       if (event->attr.type == PERF_TYPE_RAW)
> > > +               event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
> > > +
> > > +       if (amd_is_perfctr_nb_event(&event->hw))
> > > +               return amd_nb_hw_config(event);
> > > +
> > > +       return amd_core_hw_config(event);
> > > +}
> > > +
> > >  static void __amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
> > >                                            struct perf_event *event)
> > >  {
> > > @@ -254,6 +341,19 @@ static void __amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
> > >         }
> > >  }
> > >
> > > +static void amd_nb_interrupt_hw_config(struct hw_perf_event *hwc)
> > > +{
> > > +       int core_id = cpu_data(smp_processor_id()).cpu_core_id;
> > > +
> > > +       /* deliver interrupts only to this core */
> > > +       if (hwc->config & ARCH_PERFMON_EVENTSEL_INT) {
> > > +               hwc->config |= AMD64_EVENTSEL_INT_CORE_ENABLE;
> > > +               hwc->config &= ~AMD64_EVENTSEL_INT_CORE_SEL_MASK;
> > > +               hwc->config |= (u64)(core_id) <<
> > > +                       AMD64_EVENTSEL_INT_CORE_SEL_SHIFT;
> > > +       }
> > > +}
> > > +
> > >   /*
> > >    * AMD64 NorthBridge events need special treatment because
> > >    * counter access needs to be synchronized across all cores
> > > @@ -299,6 +399,12 @@ __amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *ev
> > >         struct perf_event *old;
> > >         int idx, new = -1;
> > >
> > > +       if (!c)
> > > +               c = &unconstrained;
> > > +
> > > +       if (cpuc->is_fake)
> > > +               return c;
> > > +
> > >         /*
> > >          * detect if already present, if so reuse
> > >          *
> > > @@ -335,6 +441,9 @@ __amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *ev
> > >         if (new == -1)
> > >                 return &emptyconstraint;
> > >
> > > +       if (amd_is_perfctr_nb_event(hwc))
> > > +               amd_nb_interrupt_hw_config(hwc);
> > > +
> > >         return &nb->event_constraints[new];
> > >  }
> > >
> > > @@ -434,7 +543,8 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
> > >         if (!(amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)))
> > >                 return &unconstrained;
> > >
> > > -       return __amd_get_nb_event_constraints(cpuc, event, &unconstrained);
> > > +       return __amd_get_nb_event_constraints(cpuc, event,
> > > +                                             amd_nb_event_constraint);
> > >  }
> > >
> > >  static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
> > > @@ -533,6 +643,9 @@ static struct event_constraint amd_f15_PMC30 = EVENT_CONSTRAINT_OVERLAP(0, 0x09,
> > >  static struct event_constraint amd_f15_PMC50 = EVENT_CONSTRAINT(0, 0x3F, 0);
> > >  static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
> > >
> > > +static struct event_constraint amd_NBPMC96 = EVENT_CONSTRAINT(0, 0x3C0, 0);
> > > +static struct event_constraint amd_NBPMC74 = EVENT_CONSTRAINT(0, 0xF0, 0);
> > > +
> > >  static struct event_constraint *
> > >  amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *event)
> > >  {
> > > @@ -598,8 +711,8 @@ amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *ev
> > >                         return &amd_f15_PMC20;
> > >                 }
> > >         case AMD_EVENT_NB:
> > > -               /* not yet implemented */
> > > -               return &emptyconstraint;
> > > +               return __amd_get_nb_event_constraints(cpuc, event,
> > > +                                                     amd_nb_event_constraint);
> > >         default:
> > >                 return &emptyconstraint;
> > >         }
> > > @@ -647,7 +760,7 @@ static __initconst const struct x86_pmu amd_pmu = {
> > >
> > >  static int setup_event_constraints(void)
> > >  {
> > > -       if (boot_cpu_data.x86 >= 0x15)
> > > +       if (boot_cpu_data.x86 == 0x15)
> > >                 x86_pmu.get_event_constraints = amd_get_event_constraints_f15h;
> > >         return 0;
> > >  }
> > > @@ -677,6 +790,23 @@ static int setup_perfctr_core(void)
> > >         return 0;
> > >  }
> > >
> > > +static int setup_perfctr_nb(void)
> > > +{
> > > +       if (!cpu_has_perfctr_nb)
> > > +               return -ENODEV;
> > > +
> > > +       x86_pmu.num_counters += AMD64_NUM_COUNTERS_NB;
> > > +
> > > +       if (cpu_has_perfctr_core)
> > > +               amd_nb_event_constraint = &amd_NBPMC96;
> > > +       else
> > > +               amd_nb_event_constraint = &amd_NBPMC74;
> > > +
> > > +       printk(KERN_INFO "perf: AMD northbridge performance counters detected\n");
> > > +
> > > +       return 0;
> > > +}
> > > +
> > >  __init int amd_pmu_init(void)
> > >  {
> > >         /* Performance-monitoring supported from K7 and later: */
> > > @@ -687,6 +817,7 @@ __init int amd_pmu_init(void)
> > >
> > >         setup_event_constraints();
> > >         setup_perfctr_core();
> > > +       setup_perfctr_nb();
> > >
> > >         /* Events are common for all AMDs */
> > >         memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
> > > --
> > > 1.7.9.5
> > >
> > >
> > 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [tip:perf/core] perf/x86/amd: Enable northbridge performance counters on AMD family 15h
  2013-02-06 17:26 ` [PATCH 6/6] perf, amd: Enable northbridge performance counters on AMD family 15h Jacob Shin
  2013-02-07 17:57   ` Jacob Shin
  2013-02-08 11:16   ` Stephane Eranian
@ 2013-02-18  8:30   ` tip-bot for Jacob Shin
  2 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Jacob Shin @ 2013-02-18  8:30 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, paulus, eranian, hpa, mingo, a.p.zijlstra, acme,
	jolsa, jacob.shin, tglx

Commit-ID:  e259514eef764a5286873618e34c560ecb6cff13
Gitweb:     http://git.kernel.org/tip/e259514eef764a5286873618e34c560ecb6cff13
Author:     Jacob Shin <jacob.shin@amd.com>
AuthorDate: Wed, 6 Feb 2013 11:26:29 -0600
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 16 Feb 2013 09:37:27 +0100

perf/x86/amd: Enable northbridge performance counters on AMD family 15h

On AMD family 15h processors, there are 4 new performance
counters (in addition to 6 core performance counters) that can
be used for counting northbridge events (i.e. DRAM accesses).

Their bit fields are almost identical to the core performance
counters. However, unlike the core performance counters, these
MSRs are shared between multiple cores (that share the same
northbridge).

We will reuse the same code path as existing family 10h
northbridge event constraints handler logic to enforce
this sharing.
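
A usage illustration (not part of this change): amd_nb_hw_config()
accepts exactly one shape of event -- a system-wide, counting-mode raw
event with none of the exclude_* bits set.  The sketch below opens such
an event with perf_event_open(); the 0x01e0 raw encoding is only an
assumed stand-in for a real NB event select/unit mask, which must be
taken from the BKDG of the processor.

  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/perf_event.h>

  int main(void)
  {
          struct perf_event_attr attr;
          unsigned long long count;
          int fd;

          memset(&attr, 0, sizeof(attr));
          attr.size = sizeof(attr);
          attr.type = PERF_TYPE_RAW;
          attr.config = 0x01e0;   /* assumed NB event encoding, see BKDG */
          /* counting mode only: no sample_period, no exclude_* bits */

          /* pid == -1, cpu == 0: count on one CPU of the node (needs
           * sufficient privileges for a system-wide event) */
          fd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
          if (fd < 0) {
                  perror("perf_event_open");
                  return 1;
          }

          sleep(1);
          if (read(fd, &count, sizeof(count)) == sizeof(count))
                  printf("NB events counted: %llu\n", count);
          close(fd);
          return 0;
  }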

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1360171589-6381-7-git-send-email-jacob.shin@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/cpufeature.h     |   2 +
 arch/x86/include/asm/perf_event.h     |   9 ++
 arch/x86/include/uapi/asm/msr-index.h |   2 +
 arch/x86/kernel/cpu/perf_event_amd.c  | 171 ++++++++++++++++++++++++++++++----
 4 files changed, 164 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 2d9075e..93fe929 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -167,6 +167,7 @@
 #define X86_FEATURE_TBM		(6*32+21) /* trailing bit manipulations */
 #define X86_FEATURE_TOPOEXT	(6*32+22) /* topology extensions CPUID leafs */
 #define X86_FEATURE_PERFCTR_CORE (6*32+23) /* core performance counter extensions */
+#define X86_FEATURE_PERFCTR_NB  (6*32+24) /* NB performance counter extensions */
 
 /*
  * Auxiliary flags: Linux defined - For features scattered in various
@@ -309,6 +310,7 @@ extern const char * const x86_power_flags[32];
 #define cpu_has_hypervisor	boot_cpu_has(X86_FEATURE_HYPERVISOR)
 #define cpu_has_pclmulqdq	boot_cpu_has(X86_FEATURE_PCLMULQDQ)
 #define cpu_has_perfctr_core	boot_cpu_has(X86_FEATURE_PERFCTR_CORE)
+#define cpu_has_perfctr_nb	boot_cpu_has(X86_FEATURE_PERFCTR_NB)
 #define cpu_has_cx8		boot_cpu_has(X86_FEATURE_CX8)
 #define cpu_has_cx16		boot_cpu_has(X86_FEATURE_CX16)
 #define cpu_has_eager_fpu	boot_cpu_has(X86_FEATURE_EAGER_FPU)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 2234eaaec..57cb634 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -29,9 +29,14 @@
 #define ARCH_PERFMON_EVENTSEL_INV			(1ULL << 23)
 #define ARCH_PERFMON_EVENTSEL_CMASK			0xFF000000ULL
 
+#define AMD64_EVENTSEL_INT_CORE_ENABLE			(1ULL << 36)
 #define AMD64_EVENTSEL_GUESTONLY			(1ULL << 40)
 #define AMD64_EVENTSEL_HOSTONLY				(1ULL << 41)
 
+#define AMD64_EVENTSEL_INT_CORE_SEL_SHIFT		37
+#define AMD64_EVENTSEL_INT_CORE_SEL_MASK		\
+	(0xFULL << AMD64_EVENTSEL_INT_CORE_SEL_SHIFT)
+
 #define AMD64_EVENTSEL_EVENT	\
 	(ARCH_PERFMON_EVENTSEL_EVENT | (0x0FULL << 32))
 #define INTEL_ARCH_EVENT_MASK	\
@@ -46,8 +51,12 @@
 #define AMD64_RAW_EVENT_MASK		\
 	(X86_RAW_EVENT_MASK          |  \
 	 AMD64_EVENTSEL_EVENT)
+#define AMD64_RAW_EVENT_MASK_NB		\
+	(AMD64_EVENTSEL_EVENT        |  \
+	 ARCH_PERFMON_EVENTSEL_UMASK)
 #define AMD64_NUM_COUNTERS				4
 #define AMD64_NUM_COUNTERS_CORE				6
+#define AMD64_NUM_COUNTERS_NB				4
 
 #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL		0x3c
 #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK		(0x00 << 8)
diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
index 433a59f..075a402 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -194,6 +194,8 @@
 /* Fam 15h MSRs */
 #define MSR_F15H_PERF_CTL		0xc0010200
 #define MSR_F15H_PERF_CTR		0xc0010201
+#define MSR_F15H_NB_PERF_CTL		0xc0010240
+#define MSR_F15H_NB_PERF_CTR		0xc0010241
 
 /* Fam 10h MSRs */
 #define MSR_FAM10H_MMIO_CONF_BASE	0xc0010058
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 05462f0..dfdab42 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -132,11 +132,14 @@ static u64 amd_pmu_event_map(int hw_event)
 	return amd_perfmon_event_map[hw_event];
 }
 
+static struct event_constraint *amd_nb_event_constraint;
+
 /*
  * Previously calculated offsets
  */
 static unsigned int event_offsets[X86_PMC_IDX_MAX] __read_mostly;
 static unsigned int count_offsets[X86_PMC_IDX_MAX] __read_mostly;
+static unsigned int rdpmc_indexes[X86_PMC_IDX_MAX] __read_mostly;
 
 /*
  * Legacy CPUs:
@@ -144,10 +147,14 @@ static unsigned int count_offsets[X86_PMC_IDX_MAX] __read_mostly;
  *
  * CPUs with core performance counter extensions:
  *   6 counters starting at 0xc0010200 each offset by 2
+ *
+ * CPUs with north bridge performance counter extensions:
+ *   4 additional counters starting at 0xc0010240 each offset by 2
+ *   (indexed right above either one of the above core counters)
  */
 static inline int amd_pmu_addr_offset(int index, bool eventsel)
 {
-	int offset;
+	int offset, first, base;
 
 	if (!index)
 		return index;
@@ -160,7 +167,23 @@ static inline int amd_pmu_addr_offset(int index, bool eventsel)
 	if (offset)
 		return offset;
 
-	if (!cpu_has_perfctr_core)
+	if (amd_nb_event_constraint &&
+	    test_bit(index, amd_nb_event_constraint->idxmsk)) {
+		/*
+		 * calculate the offset of NB counters with respect to
+		 * base eventsel or perfctr
+		 */
+
+		first = find_first_bit(amd_nb_event_constraint->idxmsk,
+				       X86_PMC_IDX_MAX);
+
+		if (eventsel)
+			base = MSR_F15H_NB_PERF_CTL - x86_pmu.eventsel;
+		else
+			base = MSR_F15H_NB_PERF_CTR - x86_pmu.perfctr;
+
+		offset = base + ((index - first) << 1);
+	} else if (!cpu_has_perfctr_core)
 		offset = index;
 	else
 		offset = index << 1;
@@ -175,24 +198,36 @@ static inline int amd_pmu_addr_offset(int index, bool eventsel)
 
 static inline int amd_pmu_rdpmc_index(int index)
 {
-	return index;
-}
+	int ret, first;
 
-static int amd_pmu_hw_config(struct perf_event *event)
-{
-	int ret;
+	if (!index)
+		return index;
 
-	/* pass precise event sampling to ibs: */
-	if (event->attr.precise_ip && get_ibs_caps())
-		return -ENOENT;
+	ret = rdpmc_indexes[index];
 
-	ret = x86_pmu_hw_config(event);
 	if (ret)
 		return ret;
 
-	if (has_branch_stack(event))
-		return -EOPNOTSUPP;
+	if (amd_nb_event_constraint &&
+	    test_bit(index, amd_nb_event_constraint->idxmsk)) {
+		/*
+		 * according to the manual, ECX value of the NB counters is
+		 * the index of the NB counter (0, 1, 2 or 3) plus 6
+		 */
+
+		first = find_first_bit(amd_nb_event_constraint->idxmsk,
+				       X86_PMC_IDX_MAX);
+		ret = index - first + 6;
+	} else
+		ret = index;
+
+	rdpmc_indexes[index] = ret;
 
+	return ret;
+}
+
+static int amd_core_hw_config(struct perf_event *event)
+{
 	if (event->attr.exclude_host && event->attr.exclude_guest)
 		/*
 		 * When HO == GO == 1 the hardware treats that as GO == HO == 0
@@ -206,10 +241,33 @@ static int amd_pmu_hw_config(struct perf_event *event)
 	else if (event->attr.exclude_guest)
 		event->hw.config |= AMD64_EVENTSEL_HOSTONLY;
 
-	if (event->attr.type != PERF_TYPE_RAW)
-		return 0;
+	return 0;
+}
 
-	event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
+/*
+ * NB counters do not support the following event select bits:
+ *   Host/Guest only
+ *   Counter mask
+ *   Invert counter mask
+ *   Edge detect
+ *   OS/User mode
+ */
+static int amd_nb_hw_config(struct perf_event *event)
+{
+	/* for NB, we only allow system wide counting mode */
+	if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
+		return -EINVAL;
+
+	if (event->attr.exclude_user || event->attr.exclude_kernel ||
+	    event->attr.exclude_host || event->attr.exclude_guest)
+		return -EINVAL;
+
+	event->hw.config &= ~(ARCH_PERFMON_EVENTSEL_USR |
+			      ARCH_PERFMON_EVENTSEL_OS);
+
+	if (event->hw.config & ~(AMD64_RAW_EVENT_MASK_NB |
+				 ARCH_PERFMON_EVENTSEL_INT))
+		return -EINVAL;
 
 	return 0;
 }
@@ -227,6 +285,11 @@ static inline int amd_is_nb_event(struct hw_perf_event *hwc)
 	return (hwc->config & 0xe0) == 0xe0;
 }
 
+static inline int amd_is_perfctr_nb_event(struct hw_perf_event *hwc)
+{
+	return amd_nb_event_constraint && amd_is_nb_event(hwc);
+}
+
 static inline int amd_has_nb(struct cpu_hw_events *cpuc)
 {
 	struct amd_nb *nb = cpuc->amd_nb;
@@ -234,6 +297,30 @@ static inline int amd_has_nb(struct cpu_hw_events *cpuc)
 	return nb && nb->nb_id != -1;
 }
 
+static int amd_pmu_hw_config(struct perf_event *event)
+{
+	int ret;
+
+	/* pass precise event sampling to ibs: */
+	if (event->attr.precise_ip && get_ibs_caps())
+		return -ENOENT;
+
+	if (has_branch_stack(event))
+		return -EOPNOTSUPP;
+
+	ret = x86_pmu_hw_config(event);
+	if (ret)
+		return ret;
+
+	if (event->attr.type == PERF_TYPE_RAW)
+		event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
+
+	if (amd_is_perfctr_nb_event(&event->hw))
+		return amd_nb_hw_config(event);
+
+	return amd_core_hw_config(event);
+}
+
 static void __amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
 					   struct perf_event *event)
 {
@@ -254,6 +341,19 @@ static void __amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
 	}
 }
 
+static void amd_nb_interrupt_hw_config(struct hw_perf_event *hwc)
+{
+	int core_id = cpu_data(smp_processor_id()).cpu_core_id;
+
+	/* deliver interrupts only to this core */
+	if (hwc->config & ARCH_PERFMON_EVENTSEL_INT) {
+		hwc->config |= AMD64_EVENTSEL_INT_CORE_ENABLE;
+		hwc->config &= ~AMD64_EVENTSEL_INT_CORE_SEL_MASK;
+		hwc->config |= (u64)(core_id) <<
+			AMD64_EVENTSEL_INT_CORE_SEL_SHIFT;
+	}
+}
+
  /*
   * AMD64 NorthBridge events need special treatment because
   * counter access needs to be synchronized across all cores
@@ -299,6 +399,12 @@ __amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *ev
 	struct perf_event *old;
 	int idx, new = -1;
 
+	if (!c)
+		c = &unconstrained;
+
+	if (cpuc->is_fake)
+		return c;
+
 	/*
 	 * detect if already present, if so reuse
 	 *
@@ -335,6 +441,9 @@ __amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *ev
 	if (new == -1)
 		return &emptyconstraint;
 
+	if (amd_is_perfctr_nb_event(hwc))
+		amd_nb_interrupt_hw_config(hwc);
+
 	return &nb->event_constraints[new];
 }
 
@@ -434,7 +543,8 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 	if (!(amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)))
 		return &unconstrained;
 
-	return __amd_get_nb_event_constraints(cpuc, event, &unconstrained);
+	return __amd_get_nb_event_constraints(cpuc, event,
+					      amd_nb_event_constraint);
 }
 
 static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
@@ -533,6 +643,9 @@ static struct event_constraint amd_f15_PMC30 = EVENT_CONSTRAINT_OVERLAP(0, 0x09,
 static struct event_constraint amd_f15_PMC50 = EVENT_CONSTRAINT(0, 0x3F, 0);
 static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
 
+static struct event_constraint amd_NBPMC96 = EVENT_CONSTRAINT(0, 0x3C0, 0);
+static struct event_constraint amd_NBPMC74 = EVENT_CONSTRAINT(0, 0xF0, 0);
+
 static struct event_constraint *
 amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *event)
 {
@@ -598,8 +711,8 @@ amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *ev
 			return &amd_f15_PMC20;
 		}
 	case AMD_EVENT_NB:
-		/* not yet implemented */
-		return &emptyconstraint;
+		return __amd_get_nb_event_constraints(cpuc, event,
+						      amd_nb_event_constraint);
 	default:
 		return &emptyconstraint;
 	}
@@ -647,7 +760,7 @@ static __initconst const struct x86_pmu amd_pmu = {
 
 static int setup_event_constraints(void)
 {
-	if (boot_cpu_data.x86 >= 0x15)
+	if (boot_cpu_data.x86 == 0x15)
 		x86_pmu.get_event_constraints = amd_get_event_constraints_f15h;
 	return 0;
 }
@@ -677,6 +790,23 @@ static int setup_perfctr_core(void)
 	return 0;
 }
 
+static int setup_perfctr_nb(void)
+{
+	if (!cpu_has_perfctr_nb)
+		return -ENODEV;
+
+	x86_pmu.num_counters += AMD64_NUM_COUNTERS_NB;
+
+	if (cpu_has_perfctr_core)
+		amd_nb_event_constraint = &amd_NBPMC96;
+	else
+		amd_nb_event_constraint = &amd_NBPMC74;
+
+	printk(KERN_INFO "perf: AMD northbridge performance counters detected\n");
+
+	return 0;
+}
+
 __init int amd_pmu_init(void)
 {
 	/* Performance-monitoring supported from K7 and later: */
@@ -687,6 +817,7 @@ __init int amd_pmu_init(void)
 
 	setup_event_constraints();
 	setup_perfctr_core();
+	setup_perfctr_nb();
 
 	/* Events are common for all AMDs */
 	memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
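
As a cross-check of the address arithmetic above: with the perfctr_core
layout the NB constraint mask is 0x3C0, so the first NB index is 6, the
NB event-select MSRs come out at 0xc0010240/0xc0010242/0xc0010244/
0xc0010246, and the RDPMC ECX values are 6 through 9.  The stand-alone
sketch below merely replays the same arithmetic as amd_pmu_addr_offset()
and amd_pmu_rdpmc_index(); it is illustrative only and not kernel code.

  #include <stdio.h>

  #define MSR_F15H_PERF_CTL       0xc0010200
  #define MSR_F15H_NB_PERF_CTL    0xc0010240

  int main(void)
  {
          int first = 6;  /* lowest set bit in amd_NBPMC96 (0x3C0) */
          int index;

          for (index = first; index < first + 4; index++) {
                  /* same math as amd_pmu_addr_offset(index, true) */
                  int offset = (MSR_F15H_NB_PERF_CTL - MSR_F15H_PERF_CTL) +
                               ((index - first) << 1);
                  /* same math as amd_pmu_rdpmc_index(index) */
                  int rdpmc = index - first + 6;

                  printf("NB counter %d: eventsel MSR 0x%x, rdpmc ECX %d\n",
                         index - first, MSR_F15H_PERF_CTL + offset, rdpmc);
          }
          return 0;
  }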

^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2013-02-18  8:32 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-06 17:26 [PATCH V6 0/6] perf, amd: Enable AMD family 15h northbridge counters Jacob Shin
2013-02-06 17:26 ` [PATCH 1/6] perf, amd: Rework northbridge event constraints handler Jacob Shin
2013-02-06 20:28   ` [tip:perf/core] perf/x86/amd: " tip-bot for Robert Richter
2013-02-06 17:26 ` [PATCH 2/6] perf, amd: Generalize northbridge constraints code for family 15h Jacob Shin
2013-02-06 20:29   ` [tip:perf/core] perf/x86/amd: " tip-bot for Robert Richter
2013-02-06 17:26 ` [PATCH 3/6] perf, amd: Use proper naming scheme for AMD bit field definitions Jacob Shin
2013-02-06 20:30   ` [tip:perf/core] perf/x86/amd: " tip-bot for Jacob Shin
2013-02-06 17:26 ` [PATCH 4/6] perf, x86: Move MSR address offset calculation to architecture specific files Jacob Shin
2013-02-06 20:31   ` [tip:perf/core] perf/x86: " tip-bot for Jacob Shin
2013-02-06 17:26 ` [PATCH 5/6] perf, x86: Allow for architecture specific RDPMC indexes Jacob Shin
2013-02-06 20:32   ` [tip:perf/core] perf/x86: " tip-bot for Jacob Shin
2013-02-06 17:26 ` [PATCH 6/6] perf, amd: Enable northbridge performance counters on AMD family 15h Jacob Shin
2013-02-07 17:57   ` Jacob Shin
2013-02-07 17:58     ` Stephane Eranian
2013-02-07 19:09     ` Ingo Molnar
2013-02-08 11:16   ` Stephane Eranian
2013-02-11 16:26     ` Jacob Shin
2013-02-15 20:51       ` Jacob Shin
2013-02-18  8:30   ` [tip:perf/core] perf/x86/amd: " tip-bot for Jacob Shin
2013-02-06 17:31 ` [PATCH V6 0/6] perf, amd: Enable AMD family 15h northbridge counters Jacob Shin
2013-02-08 10:55   ` Stephane Eranian

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).