* [RFC PATCH 0/5] perf: Intel uncore pmu counting support
@ 2012-03-28  6:43 Yan, Zheng
  2012-03-28  6:43 ` [PATCH 1/5] perf: Export perf_assign_events Yan, Zheng
                   ` (5 more replies)
  0 siblings, 6 replies; 28+ messages in thread
From: Yan, Zheng @ 2012-03-28  6:43 UTC (permalink / raw)
  To: a.p.zijlstra, mingo, andi, eranian; +Cc: linux-kernel, ming.m.lin

Hi, all

Here are the RFC patches that add uncore counting support for Nehalem,
Sandy Bridge and Sandy Bridge-EP, applied on top of the current tip tree.
The code is based on Lin Ming's old patches.

You can use 'perf stat' to access the uncore PMU. For example:
perf stat -a -C 0 -e 'uncore_nhm/config=0xffff/' sleep 1

For Nehalem and Sandy Bridge-EP, a few general events are exported
under the directory:
/sys/bus/event_source/devices/uncore_xxx/events/
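
For reference, the counters can also be driven directly through the
perf_event_open syscall. A minimal sketch (not part of this series; it
assumes the uncore_nhm PMU is registered and reads its dynamic type id
from sysfs):

/*
 * Count the Nehalem uncore fixed event (config=0xffff) on cpu 0 for
 * one second, roughly what the perf stat command above does.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

int main(void)
{
	struct perf_event_attr attr;
	long long count;
	FILE *f;
	int type, fd;

	/* dynamic pmu type id assigned by perf_pmu_register() */
	f = fopen("/sys/bus/event_source/devices/uncore_nhm/type", "r");
	if (!f || fscanf(f, "%d", &type) != 1)
		return 1;
	fclose(f);

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = type;
	attr.config = 0xffff;			/* fixed uncore event */

	/* pid = -1, cpu = 0: system-wide counting on cpu 0 */
	fd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
	if (fd < 0)
		return 1;

	sleep(1);
	read(fd, &count, sizeof(count));
	printf("uncore_nhm/config=0xffff/: %lld\n", count);
	close(fd);
	return 0;
}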

Any comment is appreciated.
Thank you



* [PATCH 1/5] perf: Export perf_assign_events
  2012-03-28  6:43 [RFC PATCH 0/5] perf: Intel uncore pmu counting support Yan, Zheng
@ 2012-03-28  6:43 ` Yan, Zheng
  2012-03-28  6:43 ` [PATCH 2/5] perf: generic intel uncore support Yan, Zheng
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Yan, Zheng @ 2012-03-28  6:43 UTC (permalink / raw)
  To: a.p.zijlstra, mingo, andi, eranian; +Cc: linux-kernel, ming.m.lin

From: "Yan, Zheng" <zheng.z.yan@intel.com>

Export perf_assign_events so the uncore code can use it to
schedule events.
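
The intended calling pattern, condensed from how patch 2 uses it in
uncore_assign_events() (the wrapper name here is illustrative):

/*
 * Gather one constraint per event, track the minimum and maximum
 * constraint weight, then let the exported scheduler pick a counter
 * index for each event.
 */
static int example_assign(struct event_constraint **constraints,
			  int n, int assign[])
{
	int i, wmin = X86_PMC_IDX_MAX, wmax = 0;

	for (i = 0; i < n; i++) {
		wmin = min(wmin, constraints[i]->weight);
		wmax = max(wmax, constraints[i]->weight);
	}

	/* returns 0 on success, !0 if no valid assignment exists */
	return perf_assign_events(constraints, n, wmin, wmax, assign);
}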

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
---
 arch/x86/kernel/cpu/perf_event.c |    6 +++---
 arch/x86/kernel/cpu/perf_event.h |    2 ++
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index bb8e034..70e5784 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -640,7 +640,7 @@ static bool __perf_sched_find_counter(struct perf_sched *sched)
 	c = sched->constraints[sched->state.event];
 
 	/* Prefer fixed purpose counters */
-	if (x86_pmu.num_counters_fixed) {
+	if (c->idxmsk64 & ((u64)-1 << X86_PMC_IDX_FIXED)) {
 		idx = X86_PMC_IDX_FIXED;
 		for_each_set_bit_from(idx, c->idxmsk, X86_PMC_IDX_MAX) {
 			if (!__test_and_set_bit(idx, sched->state.used))
@@ -707,8 +707,8 @@ static bool perf_sched_next_event(struct perf_sched *sched)
 /*
  * Assign a counter for each event.
  */
-static int perf_assign_events(struct event_constraint **constraints, int n,
-			      int wmin, int wmax, int *assign)
+int perf_assign_events(struct event_constraint **constraints, int n,
+			int wmin, int wmax, int *assign)
 {
 	struct perf_sched sched;
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 6638aaf..e6dfc00 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -466,6 +466,8 @@ static inline void __x86_pmu_enable_event(struct hw_perf_event *hwc,
 
 void x86_pmu_enable_all(int added);
 
+int perf_assign_events(struct event_constraint **constraints, int n,
+			int wmin, int wmax, int *assign);
 int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign);
 
 void x86_pmu_stop(struct perf_event *event, int flags);
-- 
1.7.7.6



* [PATCH 2/5] perf: generic intel uncore support
  2012-03-28  6:43 [RFC PATCH 0/5] perf: Intel uncore pmu counting support Yan, Zheng
  2012-03-28  6:43 ` [PATCH 1/5] perf: Export perf_assign_events Yan, Zheng
@ 2012-03-28  6:43 ` Yan, Zheng
  2012-03-28  9:24   ` Andi Kleen
  2012-03-31  3:18   ` Peter Zijlstra
  2012-03-28  6:43 ` [PATCH 3/5] perf: Add Nehalem and Sandy Bridge " Yan, Zheng
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 28+ messages in thread
From: Yan, Zheng @ 2012-03-28  6:43 UTC (permalink / raw)
  To: a.p.zijlstra, mingo, andi, eranian; +Cc: linux-kernel, ming.m.lin

From: "Yan, Zheng" <zheng.z.yan@intel.com>

This patch adds generic Intel uncore PMU support. The code aims to
provide uncore PMU support for Sandy Bridge-EP, but it also works
for Nehalem and Sandy Bridge. The uncore subsystem in Sandy Bridge-EP
consists of a variety of components; each component contains one or
more boxes.
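
The resulting layering, shown as an illustrative skeleton (the values
below are made up; real type definitions follow in patches 3 and 5):
an intel_uncore_type describes one component, and the code creates an
intel_uncore_pmu for each of its boxes plus an intel_uncore_box per
physical package behind each pmu.

static struct intel_uncore_type example_uncore = {
	.name		= "example",
	.num_counters	= 4,		/* general counters per box */
	.num_boxes	= 2,		/* pmus uncore_example0/1 */
	.perf_ctr_bits	= 48,		/* counter width for delta math */
	.perf_ctr	= 0x400,	/* hypothetical first counter MSR */
	.event_ctl	= 0x410,	/* hypothetical first control MSR */
	.msr_offset	= 0x20,		/* MSR stride between the boxes */
	.event_mask	= 0xffffffff,
	/* .ops and .format_group point at model specific callbacks */
};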

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
---
 arch/x86/kernel/cpu/Makefile                  |    2 +-
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |  814 +++++++++++++++++++++++++
 arch/x86/kernel/cpu/perf_event_intel_uncore.h |  200 ++++++
 3 files changed, 1015 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/perf_event_intel_uncore.c
 create mode 100644 arch/x86/kernel/cpu/perf_event_intel_uncore.h

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 6ab6aa2..9dfa9e9 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -32,7 +32,7 @@ obj-$(CONFIG_PERF_EVENTS)		+= perf_event.o
 
 ifdef CONFIG_PERF_EVENTS
 obj-$(CONFIG_CPU_SUP_AMD)		+= perf_event_amd.o
-obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_p6.o perf_event_p4.o perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o
+obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_p6.o perf_event_p4.o perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o perf_event_intel_uncore.o
 endif
 
 obj-$(CONFIG_X86_MCE)			+= mcheck/
diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
new file mode 100644
index 0000000..d159e3e
--- /dev/null
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -0,0 +1,814 @@
+#include "perf_event_intel_uncore.h"
+
+static struct intel_uncore_type *empty_uncore[] = { NULL, };
+static struct intel_uncore_type **msr_uncores = empty_uncore;
+
+/* constraint for box with 2 counters */
+static struct event_constraint unconstrained_2 =
+	EVENT_CONSTRAINT(0, 0x3, 0);
+/* constraint for box with 3 counters */
+static struct event_constraint unconstrained_3 =
+	EVENT_CONSTRAINT(0, 0x7, 0);
+/* constraint for box with 4 counters */
+static struct event_constraint unconstrained_4 =
+	EVENT_CONSTRAINT(0, 0xf, 0);
+/* constraint for box with 8 counters */
+static struct event_constraint unconstrained_8 =
+	EVENT_CONSTRAINT(0, 0xff, 0);
+/* constraint for the fixed counter */
+static struct event_constraint constraint_fixed =
+	EVENT_CONSTRAINT((u64)-1, 1 << UNCORE_PMC_IDX_FIXED, (u64)-1);
+
+static DEFINE_SPINLOCK(uncore_box_lock);
+
+static void uncore_assign_hw_event(struct intel_uncore_box *box,
+				   struct perf_event *event, int idx)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	hwc->idx = idx;
+	hwc->last_tag = ++box->tags[idx];
+
+	if (hwc->idx == UNCORE_PMC_IDX_FIXED) {
+		hwc->event_base = uncore_msr_fixed_ctr(box);
+		hwc->config_base = uncore_msr_fixed_ctl(box);
+		return;
+	}
+
+	hwc->config_base = uncore_msr_event_ctl(box, hwc->idx);
+	hwc->event_base =  uncore_msr_perf_ctr(box, hwc->idx);
+}
+
+static void __uncore_perf_event_update(struct intel_uncore_box *box,
+				      struct perf_event *event)
+{
+	u64 prev_count, new_count, delta;
+	int shift;
+
+	if (event->hw.idx >= UNCORE_PMC_IDX_FIXED)
+		shift = 64 - uncore_fixed_ctr_bits(box);
+	else
+		shift = 64 - uncore_perf_ctr_bits(box);
+
+	new_count = uncore_read_counter(box, event);
+	prev_count = local64_xchg(&event->hw.prev_count, new_count);
+
+	delta = (new_count << shift) - (prev_count << shift);
+	delta >>= shift;
+
+	local64_add(delta, &event->count);
+}
+
+static void uncore_perf_event_update(struct intel_uncore_box *box,
+				      struct perf_event *event)
+{
+	raw_spin_lock(&box->lock);
+	__uncore_perf_event_update(box, event);
+	raw_spin_unlock(&box->lock);
+}
+
+/*
+ * The overflow interrupt is unavailable for SandyBridge-EP and broken for
+ * SandyBridge, so we use a hrtimer to poll the counters periodically.
+ */
+static enum hrtimer_restart uncore_pmu_hrtimer(struct hrtimer *hrtimer)
+{
+	struct intel_uncore_box *box;
+	enum hrtimer_restart ret = HRTIMER_RESTART;
+	int bit;
+
+	box = container_of(hrtimer, struct intel_uncore_box, hrtimer);
+	raw_spin_lock(&box->lock);
+
+	if (!box->n_active) {
+		ret = HRTIMER_NORESTART;
+		goto unlock;
+	}
+
+	uncore_disable_all(box);
+
+	for_each_set_bit(bit, box->active_mask, X86_PMC_IDX_MAX)
+		__uncore_perf_event_update(box, box->events[bit]);
+
+	hrtimer_forward_now(hrtimer, ns_to_ktime(UNCORE_PMU_HRTIMER_INTERVAL));
+
+	uncore_enable_all(box);
+unlock:
+	raw_spin_unlock(&box->lock);
+	return ret;
+}
+
+static void uncore_pmu_start_hrtimer(struct intel_uncore_box *box)
+{
+	__hrtimer_start_range_ns(&box->hrtimer,
+				ns_to_ktime(UNCORE_PMU_HRTIMER_INTERVAL), 0,
+				HRTIMER_MODE_REL_PINNED, 0);
+}
+
+static void uncore_pmu_cancel_hrtimer(struct intel_uncore_box *box)
+{
+	hrtimer_cancel(&box->hrtimer);
+}
+
+static void uncore_pmu_init_hrtimer(struct intel_uncore_box *box)
+{
+	hrtimer_init(&box->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	box->hrtimer.function = uncore_pmu_hrtimer;
+}
+
+struct intel_uncore_box *alloc_uncore_box(int cpu)
+{
+	struct intel_uncore_box *box;
+
+	box = kmalloc_node(sizeof(*box), GFP_KERNEL | __GFP_ZERO,
+			cpu_to_node(cpu));
+	if (!box)
+		return NULL;
+
+	raw_spin_lock_init(&box->lock);
+	uncore_pmu_init_hrtimer(box);
+	box->refcnt = 1;
+
+	return box;
+}
+
+static struct intel_uncore_box *
+__uncore_pmu_find_box(struct intel_uncore_pmu *pmu, int phyid)
+{
+	struct intel_uncore_box *box;
+	struct hlist_head *head;
+	struct hlist_node *node;
+
+	head = &pmu->box_hash[phyid % UNCORE_BOX_HASH_SIZE];
+
+	hlist_for_each_entry_rcu(box, node, head, hlist) {
+		if (box->phy_id == phyid)
+			return box;
+	}
+
+	return NULL;
+}
+
+static struct intel_uncore_box *
+uncore_pmu_find_box(struct intel_uncore_pmu *pmu, int phyid)
+{
+	struct intel_uncore_box *box;
+
+	rcu_read_lock();
+	box = __uncore_pmu_find_box(pmu, phyid);
+	rcu_read_unlock();
+
+	return box;
+}
+
+/* caller should hold the uncore_box_lock */
+static void uncore_pmu_add_box(struct intel_uncore_pmu *pmu,
+				struct intel_uncore_box *box)
+{
+	struct hlist_head *head;
+
+	head = &pmu->box_hash[box->phy_id % UNCORE_BOX_HASH_SIZE];
+	hlist_add_head_rcu(&box->hlist, head);
+}
+
+static struct intel_uncore_pmu *uncore_event_to_pmu(struct perf_event *event)
+{
+	return container_of(event->pmu, struct intel_uncore_pmu, pmu);
+}
+
+static struct intel_uncore_box *uncore_event_to_box(struct perf_event *event)
+{
+	int phyid = topology_physical_package_id(smp_processor_id());
+	return uncore_pmu_find_box(uncore_event_to_pmu(event), phyid);
+}
+
+static int uncore_collect_events(struct intel_uncore_box *box,
+			  struct perf_event *leader, bool dogrp)
+{
+	struct perf_event *event;
+	int n, max_count;
+
+	max_count = box->pmu->type->num_counters;
+	if (box->pmu->type->fixed_ctl)
+		max_count++;
+
+	if (box->n_events >= max_count)
+		return -EINVAL;
+
+	/*
+	 * adding the same event twice to the uncore PMU may cause
+	 * a general protection fault
+	 */
+	for (n = 0; n < box->n_events; n++) {
+		event = box->event_list[n];
+		if (event->hw.config == leader->hw.config)
+			return -EINVAL;
+	}
+
+	n = box->n_events;
+	box->event_list[n] = leader;
+	n++;
+	if (!dogrp)
+		return n;
+
+	list_for_each_entry(event, &leader->sibling_list, group_entry) {
+		if (event->state <= PERF_EVENT_STATE_OFF)
+			continue;
+
+		if (n >= max_count)
+			return -EINVAL;
+
+		box->event_list[n] = event;
+		n++;
+	}
+	return n;
+}
+
+static struct event_constraint *
+uncore_event_constraint(struct intel_uncore_type *type,
+			struct perf_event *event)
+{
+	struct event_constraint *c;
+
+	if (event->hw.config == (u64)-1)
+		return &constraint_fixed;
+
+	if (type->constraints) {
+		for_each_event_constraint(c, type->constraints) {
+			if ((event->hw.config & c->cmask) == c->code)
+				return c;
+		}
+	}
+
+	if (type->num_counters == 2)
+		return &unconstrained_2;
+	if (type->num_counters == 3)
+		return &unconstrained_3;
+	if (type->num_counters == 4)
+		return &unconstrained_4;
+	if (type->num_counters == 8)
+		return &unconstrained_8;
+
+	WARN_ON_ONCE(1);
+	return &unconstrained_2;
+}
+
+static int uncore_assign_events(struct intel_uncore_box *box,
+				int assign[], int n)
+{
+	struct event_constraint *c, *constraints[UNCORE_PMC_IDX_MAX];
+	int i, ret, wmin, wmax;
+
+	for (i = 0, wmin = X86_PMC_IDX_MAX, wmax = 0; i < n; i++) {
+		c = uncore_event_constraint(box->pmu->type,
+					box->event_list[i]);
+		constraints[i] = c;
+		wmin = min(wmin, c->weight);
+		wmax = max(wmax, c->weight);
+	}
+
+	ret = perf_assign_events(constraints, n, wmin, wmax, assign);
+	return ret ? -EINVAL : 0;
+}
+
+static void __uncore_pmu_event_start(struct intel_uncore_box *box,
+				     struct perf_event *event, int flags)
+{
+	int idx = event->hw.idx;
+
+	if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
+		return;
+
+	if (WARN_ON_ONCE(idx == -1 || idx >= UNCORE_PMC_IDX_MAX))
+		return;
+
+	event->hw.state = 0;
+	__set_bit(idx, box->active_mask);
+	box->n_active++;
+	box->events[idx] = event;
+
+	local64_set(&event->hw.prev_count, uncore_read_counter(box, event));
+	uncore_enable_event(box, event);
+
+	if (box->n_active == 1)
+		uncore_pmu_start_hrtimer(box);
+}
+
+static void uncore_pmu_event_start(struct perf_event *event, int flags)
+{
+	struct intel_uncore_box *box = uncore_event_to_box(event);
+
+	raw_spin_lock(&box->lock);
+	__uncore_pmu_event_start(box, event, flags);
+	raw_spin_unlock(&box->lock);
+}
+
+static void __uncore_pmu_event_stop(struct intel_uncore_box *box,
+				    struct perf_event *event, int flags)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (__test_and_clear_bit(hwc->idx, box->active_mask)) {
+		uncore_disable_event(box, event);
+		box->n_active--;
+		box->events[hwc->idx] = NULL;
+		WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
+		hwc->state |= PERF_HES_STOPPED;
+
+		if (box->n_active == 0)
+			uncore_pmu_cancel_hrtimer(box);
+	}
+
+	if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
+		/*
+		 * Drain the remaining delta count out of an event
+		 * that we are disabling:
+		 */
+		__uncore_perf_event_update(box, event);
+		hwc->state |= PERF_HES_UPTODATE;
+	}
+}
+
+static void uncore_pmu_event_stop(struct perf_event *event, int flags)
+{
+	struct intel_uncore_box *box = uncore_event_to_box(event);
+
+	raw_spin_lock(&box->lock);
+	__uncore_pmu_event_stop(box, event, flags);
+	raw_spin_unlock(&box->lock);
+}
+
+static int uncore_pmu_event_add(struct perf_event *event, int flags)
+{
+	struct intel_uncore_box *box = uncore_event_to_box(event);
+	struct hw_perf_event *hwc = &event->hw;
+	int assign[UNCORE_PMC_IDX_MAX];
+	int i, n, ret;
+
+	if (!box)
+		return -ENODEV;
+
+	raw_spin_lock(&box->lock);
+	uncore_disable_all(box);
+
+	ret = n = uncore_collect_events(box, event, false);
+	if (ret < 0)
+		goto out;
+
+	hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
+	if (!(flags & PERF_EF_START))
+		hwc->state |= PERF_HES_ARCH;
+
+	ret = uncore_assign_events(box, assign, n);
+	if (ret)
+		goto out;
+
+	/* save events moving to new counters */
+	for (i = 0; i < box->n_events; i++) {
+		event = box->event_list[i];
+		hwc = &event->hw;
+
+		if (hwc->idx == assign[i] &&
+		    hwc->last_tag == box->tags[assign[i]])
+			continue;
+		/*
+		 * Ensure we don't accidentally enable a stopped
+		 * counter simply because we rescheduled.
+		 */
+		if (hwc->state & PERF_HES_STOPPED)
+			hwc->state |= PERF_HES_ARCH;
+
+		__uncore_pmu_event_stop(box, event, PERF_EF_UPDATE);
+	}
+
+	/* reprogram moved events into new counters */
+	for (i = 0; i < n; i++) {
+		event = box->event_list[i];
+		hwc = &event->hw;
+
+		if (hwc->idx != assign[i] ||
+		    hwc->last_tag != box->tags[assign[i]])
+			uncore_assign_hw_event(box, event, assign[i]);
+		else if (i < box->n_events)
+			continue;
+
+		if (hwc->state & PERF_HES_ARCH)
+			continue;
+
+		__uncore_pmu_event_start(box, event, 0);
+	}
+
+	box->n_events = n;
+	ret = 0;
+out:
+	uncore_enable_all(box);
+	raw_spin_unlock(&box->lock);
+	return ret;
+}
+
+static void uncore_pmu_event_del(struct perf_event *event, int flags)
+{
+	struct intel_uncore_box *box = uncore_event_to_box(event);
+	int i;
+
+	raw_spin_lock(&box->lock);
+	__uncore_pmu_event_stop(box, event, PERF_EF_UPDATE);
+
+	for (i = 0; i < box->n_events; i++) {
+		if (event == box->event_list[i]) {
+			while (++i < box->n_events)
+				box->event_list[i - 1] = box->event_list[i];
+
+			--box->n_events;
+			break;
+		}
+	}
+	raw_spin_unlock(&box->lock);
+}
+
+static void uncore_pmu_event_read(struct perf_event *event)
+{
+	struct intel_uncore_box *box = uncore_event_to_box(event);
+
+	uncore_perf_event_update(box, event);
+}
+
+/*
+ * validation ensures the group can be loaded onto the
+ * PMU if it was the only group available.
+ */
+static int uncore_validate_group(struct intel_uncore_pmu *pmu,
+				 struct perf_event *event)
+{
+	struct perf_event *leader = event->group_leader;
+	struct intel_uncore_box *fake_box;
+	int assign[UNCORE_PMC_IDX_MAX];
+	int ret = -EINVAL, n;
+
+	fake_box = alloc_uncore_box(smp_processor_id());
+	if (!fake_box)
+		return -ENOMEM;
+
+	fake_box->pmu = pmu;
+	/*
+	 * the event is not yet connected with its
+	 * siblings therefore we must first collect
+	 * existing siblings, then add the new event
+	 * before we can simulate the scheduling
+	 */
+	n = uncore_collect_events(fake_box, leader, true);
+	if (n < 0)
+		goto out;
+
+	fake_box->n_events = n;
+	n = uncore_collect_events(fake_box, event, false);
+	if (n < 0)
+		goto out;
+
+	fake_box->n_events = n;
+
+	ret = uncore_assign_events(fake_box, assign, n);
+out:
+	kfree(fake_box);
+	return ret;
+}
+
+int uncore_pmu_event_init(struct perf_event *event)
+{
+	struct intel_uncore_pmu *pmu;
+	struct hw_perf_event *hwc = &event->hw;
+	int ret = 0;
+
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	pmu = uncore_event_to_pmu(event);
+	/* no device found for this pmu */
+	if (pmu->func_id < 0)
+		return -ENOENT;
+
+	/*
+	 * The uncore PMU always counts at all privilege levels, so it
+	 * does not make sense to specify any exclude bits.
+	 */
+	if (event->attr.exclude_user || event->attr.exclude_kernel ||
+	    event->attr.exclude_hv || event->attr.exclude_idle)
+		return -EINVAL;
+
+	/* Sampling not supported yet */
+	if (hwc->sample_period)
+		return -EINVAL;
+
+	if (event->attr.config == UNCORE_FIXED_EVENT) {
+		/* no fixed counter */
+		if (!pmu->type->fixed_ctl)
+			return -EINVAL;
+		/*
+		 * if there is only one fixed counter, only the first pmu
+		 * can access the fixed counter
+		 */
+		if (pmu->type->single_fixed && pmu->pmu_idx > 0)
+			return -EINVAL;
+		hwc->config = (u64)-1;
+	} else {
+		hwc->config = event->attr.config & pmu->type->event_mask;
+	}
+
+	event->hw.idx = -1;
+	event->hw.last_tag = ~0ULL;
+
+	if (event->group_leader != event)
+		ret = uncore_validate_group(pmu, event);
+
+	return ret;
+}
+
+static int __init uncore_pmu_register(struct intel_uncore_pmu *pmu)
+{
+	int ret;
+
+	pmu->pmu.attr_groups	= pmu->type->attr_groups;
+	pmu->pmu.task_ctx_nr	= perf_invalid_context;
+	pmu->pmu.event_init	= uncore_pmu_event_init;
+	pmu->pmu.add		= uncore_pmu_event_add;
+	pmu->pmu.del		= uncore_pmu_event_del;
+	pmu->pmu.start		= uncore_pmu_event_start;
+	pmu->pmu.stop		= uncore_pmu_event_stop;
+	pmu->pmu.read		= uncore_pmu_event_read;
+
+	if (pmu->type->num_boxes == 1)
+		sprintf(pmu->name, "uncore_%s", pmu->type->name);
+	else
+		sprintf(pmu->name, "uncore_%s%d", pmu->type->name,
+			pmu->pmu_idx);
+
+	ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
+	return ret;
+}
+
+static void __init uncore_type_exit(struct intel_uncore_type *type)
+{
+	struct intel_uncore_pmu *pmus = type->pmus;
+
+	kfree(type->attr_groups[1]);
+	kfree(pmus);
+}
+
+static void __init uncore_types_exit(struct intel_uncore_type **types)
+{
+	int i;
+
+	for (i = 0; types[i]; i++)
+		uncore_type_exit(types[i]);
+}
+
+static int __init uncore_type_init(struct intel_uncore_type *type)
+{
+	struct intel_uncore_pmu *pmus;
+	struct attribute_group *events_group;
+	struct attribute **attrs;
+	int i, j;
+
+	pmus = kzalloc(sizeof(*pmus) * type->num_boxes, GFP_KERNEL);
+	if (!pmus)
+		return -ENOMEM;
+
+	for (i = 0; i < type->num_boxes; i++) {
+		pmus[i].func_id = -1;
+		pmus[i].pmu_idx = i;
+		pmus[i].type = type;
+
+		for (j = 0; j < ARRAY_SIZE(pmus[0].box_hash); j++)
+			INIT_HLIST_HEAD(&pmus[i].box_hash[j]);
+	}
+
+	if (type->event_descs) {
+		for (i = 0; ; i++) {
+			if (!type->event_descs[i].attr.attr.name)
+				break;
+		}
+
+		events_group = kzalloc(sizeof(struct attribute *) * (i + 1) +
+				sizeof(*events_group), GFP_KERNEL);
+		if (!events_group)
+			goto fail;
+
+		attrs = (struct attribute **)(events_group + 1);
+		events_group->name = "events";
+		events_group->attrs = attrs;
+
+		for (j = 0; j < i; j++)
+			attrs[j] = &type->event_descs[j].attr.attr;
+
+		type->attr_groups[1] = events_group;
+	}
+	type->pmus = pmus;
+	return 0;
+fail:
+	uncore_type_exit(type);
+	return -ENOMEM;
+}
+
+static int __init uncore_types_init(struct intel_uncore_type **types)
+{
+	int i, ret;
+
+	for (i = 0; types[i]; i++) {
+		ret = uncore_type_init(types[i]);
+		if (ret)
+			goto fail;
+	}
+	return 0;
+fail:
+	while (--i >= 0)
+		uncore_type_exit(types[i]);
+	return ret;
+}
+
+static void uncore_cpu_dying(int cpu)
+{
+	struct intel_uncore_type *type;
+	struct intel_uncore_pmu *pmu;
+	struct intel_uncore_box *box;
+	int i, j, phyid, free_it;
+
+	phyid = topology_physical_package_id(cpu);
+
+	for (i = 0; msr_uncores[i]; i++) {
+		type = msr_uncores[i];
+		for (j = 0; j < type->num_boxes; j++) {
+			pmu = &type->pmus[j];
+			box = uncore_pmu_find_box(pmu, phyid);
+			if (box) {
+				free_it = 0;
+				spin_lock(&uncore_box_lock);
+				if (--box->refcnt == 0) {
+					hlist_del_rcu(&box->hlist);
+					free_it = 1;
+				}
+				spin_unlock(&uncore_box_lock);
+				if (free_it)
+					kfree_rcu(box, rcu_head);
+			}
+		}
+	}
+}
+
+static int uncore_cpu_starting(int cpu)
+{
+	struct intel_uncore_type *type;
+	struct intel_uncore_pmu *pmu;
+	struct intel_uncore_box *box;
+	int i, j, phyid;
+
+	phyid = topology_physical_package_id(cpu);
+
+	for (i = 0; msr_uncores[i]; i++) {
+		type = msr_uncores[i];
+		for (j = 0; j < type->num_boxes; j++) {
+			pmu = &type->pmus[j];
+			box = uncore_pmu_find_box(pmu, phyid);
+			if (box)
+				uncore_box_init(box);
+		}
+	}
+	return 0;
+}
+
+static int uncore_cpu_prepare(int cpu)
+{
+	struct intel_uncore_type *type;
+	struct intel_uncore_pmu *pmu;
+	struct intel_uncore_box *exist, *box = NULL;
+	int i, j, phyid;
+
+	phyid = topology_physical_package_id(cpu);
+
+	/* pre-allocate box */
+	for (i = 0; msr_uncores[i]; i++) {
+		type = msr_uncores[i];
+		for (j = 0; j < type->num_boxes; j++) {
+			exist = NULL;
+			pmu = &type->pmus[j];
+
+			spin_lock(&uncore_box_lock);
+			if (pmu->func_id < 0)
+				pmu->func_id = j;
+			exist = __uncore_pmu_find_box(pmu, phyid);
+			if (exist)
+				exist->refcnt++;
+			spin_unlock(&uncore_box_lock);
+			if (exist)
+				continue;
+
+			if (!box)
+				box = alloc_uncore_box(cpu);
+			if (!box)
+				return -ENOMEM;
+
+			spin_lock(&uncore_box_lock);
+			exist = __uncore_pmu_find_box(pmu, phyid);
+			if (!exist) {
+				box->pmu = pmu;
+				box->phy_id = phyid;
+				uncore_pmu_add_box(pmu, box);
+				box = NULL;
+			}
+			spin_unlock(&uncore_box_lock);
+		}
+	}
+	kfree(box);
+	return 0;
+}
+
+static void __init uncore_cpu_setup(void *dummy)
+{
+	uncore_cpu_starting(smp_processor_id());
+}
+
+static int __cpuinit uncore_cpu_notifier(struct notifier_block *self,
+					 unsigned long action, void *hcpu)
+{
+	unsigned int cpu = (long)hcpu;
+
+	switch (action & ~CPU_TASKS_FROZEN) {
+	case CPU_UP_PREPARE:
+		uncore_cpu_prepare(cpu);
+		break;
+	case CPU_STARTING:
+		uncore_cpu_starting(cpu);
+		break;
+	case CPU_UP_CANCELED:
+	case CPU_DYING:
+		uncore_cpu_dying(cpu);
+		break;
+	default:
+		break;
+	}
+
+	return NOTIFY_OK;
+}
+
+static int __init uncore_cpu_init(void)
+{
+	int ret, cpu;
+
+	switch (boot_cpu_data.x86_model) {
+	default:
+		return 0;
+	}
+
+	ret = uncore_types_init(msr_uncores);
+	if (ret)
+		return ret;
+
+	get_online_cpus();
+	for_each_online_cpu(cpu)
+		uncore_cpu_prepare(cpu);
+
+	preempt_disable();
+	smp_call_function(uncore_cpu_setup, NULL, 1);
+	uncore_cpu_setup(NULL);
+	preempt_enable();
+
+	perf_cpu_notifier(uncore_cpu_notifier);
+	put_online_cpus();
+
+	return 0;
+}
+
+static int __init uncore_pmus_register(void)
+{
+	struct intel_uncore_pmu *pmu;
+	struct intel_uncore_type *type;
+	int i, j;
+
+	for (i = 0; msr_uncores[i]; i++) {
+		type = msr_uncores[i];
+		for (j = 0; j < type->num_boxes; j++) {
+			pmu = &type->pmus[j];
+			uncore_pmu_register(pmu);
+		}
+	}
+
+	return 0;
+}
+
+static int __init intel_uncore_init(void)
+{
+	int ret;
+
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		return -ENODEV;
+
+	ret = uncore_cpu_init();
+	if (ret) {
+		goto fail;
+	}
+
+	uncore_pmus_register();
+	return 0;
+fail:
+	return ret;
+}
+device_initcall(intel_uncore_init);
diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
new file mode 100644
index 0000000..b5d7124
--- /dev/null
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -0,0 +1,200 @@
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/perf_event.h>
+#include "perf_event.h"
+
+#define UNCORE_PMU_NAME_LEN		16
+#define UNCORE_BOX_HASH_SIZE		8
+
+#define UNCORE_PMU_HRTIMER_INTERVAL	(10 * NSEC_PER_SEC)
+
+#define UNCORE_FIXED_EVENT		0xffff
+#define UNCORE_PMC_IDX_MAX_GENERIC	8
+#define UNCORE_PMC_IDX_FIXED		UNCORE_PMC_IDX_MAX_GENERIC
+#define UNCORE_PMC_IDX_MAX		(UNCORE_PMC_IDX_FIXED + 1)
+
+#define UNCORE_EVENT_CONSTRAINT(c, n) EVENT_CONSTRAINT(c, n, 0xff)
+
+struct intel_uncore_ops;
+struct intel_uncore_pmu;
+struct intel_uncore_box;
+struct uncore_event_desc;
+
+struct intel_uncore_type {
+	const char *name;
+	int num_counters;
+	int num_boxes;
+	int perf_ctr_bits;
+	int fixed_ctr_bits;
+	int single_fixed;
+	unsigned perf_ctr;
+	unsigned event_ctl;
+	unsigned event_mask;
+	unsigned fixed_ctr;
+	unsigned fixed_ctl;
+	unsigned box_ctl;
+	unsigned msr_offset;
+	struct intel_uncore_ops *ops;
+	struct event_constraint *constraints;
+	struct uncore_event_desc *event_descs;
+	const struct attribute_group *attr_groups[3];
+	struct intel_uncore_pmu *pmus;
+};
+
+#define format_group attr_groups[0]
+
+struct intel_uncore_ops {
+	void (*init)(struct intel_uncore_box *);
+	void (*disable_all)(struct intel_uncore_box *);
+	void (*enable_all)(struct intel_uncore_box *);
+	void (*disable_event)(struct intel_uncore_box *, struct perf_event *);
+	void (*enable_event)(struct intel_uncore_box *, struct perf_event *);
+	u64 (*read_counter)(struct intel_uncore_box *, struct perf_event *);
+};
+
+struct intel_uncore_pmu {
+	struct pmu pmu;
+	char name[UNCORE_PMU_NAME_LEN];
+	int pmu_idx;
+	int func_id;
+	struct intel_uncore_type *type;
+	struct hlist_head box_hash[UNCORE_BOX_HASH_SIZE];
+};
+
+struct intel_uncore_box {
+	struct hlist_node hlist;
+	int phy_id;
+	int refcnt;
+	int n_active;
+	int n_events;
+	unsigned long flags;
+	struct perf_event *events[UNCORE_PMC_IDX_MAX];
+	struct perf_event *event_list[UNCORE_PMC_IDX_MAX];
+	unsigned long active_mask[BITS_TO_LONGS(UNCORE_PMC_IDX_MAX)];
+	u64 tags[UNCORE_PMC_IDX_MAX];
+	struct intel_uncore_pmu *pmu;
+	struct hrtimer hrtimer;
+	struct rcu_head rcu_head;
+	raw_spinlock_t lock;
+};
+
+#define UNCORE_BOX_FLAG_INITIATED	0
+
+struct uncore_event_desc {
+	struct kobj_attribute attr;
+	u64 config;
+};
+
+#define INTEL_UNCORE_EVENT_DESC(_name, _config)			\
+{								\
+	.attr	= __ATTR(_name, 0444, uncore_event_show, NULL),	\
+	.config	= _config,					\
+}
+
+#define DEFINE_UNCORE_FORMAT_ATTR(_var, _name, _format)			\
+static ssize_t __uncore_##_var##_show(struct kobject *kobj,		\
+				struct kobj_attribute *attr,		\
+				char *page)				\
+{									\
+	BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE);			\
+	return sprintf(page, _format "\n");				\
+}									\
+static struct kobj_attribute format_attr_##_var =			\
+	__ATTR(_name, 0444, __uncore_##_var##_show, NULL)
+
+
+static ssize_t uncore_event_show(struct kobject *kobj,
+				struct kobj_attribute *attr, char *buf)
+{
+	struct uncore_event_desc *event =
+		container_of(attr, struct uncore_event_desc, attr);
+	return sprintf(buf, "0x%llx\n", event->config);
+}
+
+static inline
+unsigned uncore_msr_box_ctl(struct intel_uncore_box *box)
+{
+	if (!box->pmu->type->box_ctl)
+		return 0;
+	return box->pmu->type->box_ctl +
+		box->pmu->type->msr_offset * box->pmu->pmu_idx;
+}
+
+static inline
+unsigned uncore_msr_fixed_ctl(struct intel_uncore_box *box)
+{
+	if (!box->pmu->type->fixed_ctl)
+		return 0;
+	return box->pmu->type->fixed_ctl +
+		box->pmu->type->msr_offset * box->pmu->pmu_idx;
+}
+
+static inline
+unsigned uncore_msr_fixed_ctr(struct intel_uncore_box *box)
+{
+	return box->pmu->type->fixed_ctr +
+		box->pmu->type->msr_offset * box->pmu->pmu_idx;
+}
+
+static inline
+unsigned uncore_msr_event_ctl(struct intel_uncore_box *box, int idx)
+{
+	return idx + box->pmu->type->event_ctl +
+		box->pmu->type->msr_offset * box->pmu->pmu_idx;
+}
+
+static inline
+unsigned uncore_msr_perf_ctr(struct intel_uncore_box *box, int idx)
+{
+	return idx + box->pmu->type->perf_ctr +
+		box->pmu->type->msr_offset * box->pmu->pmu_idx;
+}
+
+static inline int uncore_perf_ctr_bits(struct intel_uncore_box *box)
+{
+	return box->pmu->type->perf_ctr_bits;
+}
+
+static inline int uncore_fixed_ctr_bits(struct intel_uncore_box *box)
+{
+	return box->pmu->type->fixed_ctr_bits;
+}
+
+static inline int uncore_num_counters(struct intel_uncore_box *box)
+{
+	return box->pmu->type->num_counters;
+}
+
+static inline void uncore_disable_all(struct intel_uncore_box *box)
+{
+	box->pmu->type->ops->disable_all(box);
+}
+
+static inline void uncore_enable_all(struct intel_uncore_box *box)
+{
+	box->pmu->type->ops->enable_all(box);
+}
+
+static inline void uncore_disable_event(struct intel_uncore_box *box,
+				struct perf_event *event)
+{
+	box->pmu->type->ops->disable_event(box, event);
+}
+
+static inline void uncore_enable_event(struct intel_uncore_box *box,
+				struct perf_event *event)
+{
+	box->pmu->type->ops->enable_event(box, event);
+}
+
+static inline u64 uncore_read_counter(struct intel_uncore_box *box,
+				struct perf_event *event)
+{
+	return box->pmu->type->ops->read_counter(box, event);
+}
+
+static inline void uncore_box_init(struct intel_uncore_box *box)
+{
+	if (!test_and_set_bit(UNCORE_BOX_FLAG_INITIATED, &box->flags))
+		box->pmu->type->ops->init(box);
+}
-- 
1.7.7.6



* [PATCH 3/5] perf: Add Nehalem and Sandy Bridge uncore support
  2012-03-28  6:43 [RFC PATCH 0/5] perf: Intel uncore pmu counting support Yan, Zheng
  2012-03-28  6:43 ` [PATCH 1/5] perf: Export perf_assign_events Yan, Zheng
  2012-03-28  6:43 ` [PATCH 2/5] perf: generic intel uncore support Yan, Zheng
@ 2012-03-28  6:43 ` Yan, Zheng
  2012-03-28  6:43 ` [PATCH 4/5] perf: Generic pci uncore device support Yan, Zheng
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Yan, Zheng @ 2012-03-28  6:43 UTC (permalink / raw)
  To: a.p.zijlstra, mingo, andi, eranian; +Cc: linux-kernel, ming.m.lin

From: "Yan, Zheng" <zheng.z.yan@intel.com>

Add Intel Nehalem and Sandy Bridge uncore PMU support.
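
As a worked example of the raw event layout described by the format
attributes added below (event in config:0-7, umask in config:8-15,
edge bit 18, inv bit 23, cmask in config:24-31 on Nehalem), the
QMC_WRITES_FULL.ANY descriptor encodes event 0x2f with umask 0x0f.
An illustrative helper, not part of the patch:

static u64 nhm_uncore_raw_config(u8 event, u8 umask, bool edge,
				 bool inv, u8 cmask)
{
	u64 config = event | ((u64)umask << 8);

	if (edge)
		config |= SNB_UNC_CTL_EDGE_DET;
	if (inv)
		config |= SNB_UNC_CTL_INVERT;
	config |= (u64)cmask << 24;

	/* mask off anything the hardware does not implement */
	return config & NHM_UNC_RAW_EVENT_MASK;
}

/* nhm_uncore_raw_config(0x2f, 0x0f, false, false, 0) == 0xf2f */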

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |  240 +++++++++++++++++++++++++
 arch/x86/kernel/cpu/perf_event_intel_uncore.h |   50 +++++
 2 files changed, 290 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index d159e3e..39a1c0e 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -21,6 +21,237 @@ static struct event_constraint constraint_fixed =
 
 static DEFINE_SPINLOCK(uncore_box_lock);
 
+DEFINE_UNCORE_FORMAT_ATTR(event, event, "config:0-7");
+DEFINE_UNCORE_FORMAT_ATTR(umask, umask, "config:8-15");
+DEFINE_UNCORE_FORMAT_ATTR(edge, edge, "config:18");
+DEFINE_UNCORE_FORMAT_ATTR(inv, inv, "config:23");
+DEFINE_UNCORE_FORMAT_ATTR(cmask5, cmask, "config:24-28");
+DEFINE_UNCORE_FORMAT_ATTR(cmask8, cmask, "config:24-31");
+
+/* Sandy Bridge uncore support */
+static void snb_uncore_msr_disable_all(struct intel_uncore_box *box)
+{
+	u64 config;
+	unsigned msr;
+	int i;
+
+	for (i = 0; i < UNCORE_PMC_IDX_MAX; i++) {
+		if (!test_bit(i, box->active_mask))
+			continue;
+
+		if (i < UNCORE_PMC_IDX_FIXED)
+			msr = uncore_msr_event_ctl(box, i);
+		else
+			msr = uncore_msr_fixed_ctl(box);
+
+		rdmsrl(msr, config);
+		config &= ~SNB_UNC_CTL_EN;
+		wrmsrl(msr, config);
+	}
+}
+
+static void snb_uncore_msr_enable_all(struct intel_uncore_box *box)
+{
+	u64 config;
+	unsigned msr;
+	int i;
+
+	for (i = 0; i < UNCORE_PMC_IDX_MAX; i++) {
+		if (!test_bit(i, box->active_mask))
+			continue;
+
+		if (i < UNCORE_PMC_IDX_FIXED)
+			msr = uncore_msr_event_ctl(box, i);
+		else
+			msr = uncore_msr_fixed_ctl(box);
+
+		/* set the enable bit in the selected control register */
+		rdmsrl(msr, config);
+		config |= SNB_UNC_CTL_EN;
+		wrmsrl(msr, config);
+	}
+}
+
+static void snb_uncore_msr_enable_event(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (hwc->idx < UNCORE_PMC_IDX_FIXED)
+		wrmsrl(hwc->config_base, hwc->config | SNB_UNC_CTL_EN);
+	else
+		wrmsrl(hwc->config_base, SNB_UNC_CTL_EN);
+}
+
+static void snb_uncore_msr_disable_event(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	wrmsrl(event->hw.config_base, 0);
+}
+
+static u64 snb_uncore_msr_read_counter(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	u64 count;
+	rdmsrl(event->hw.event_base, count);
+	return count;
+}
+
+static void snb_uncore_msr_box_init(struct intel_uncore_box *box)
+{
+	if (box->pmu->pmu_idx == 0) {
+		wrmsrl(SNB_UNC_PERF_GLOBAL_CTL,
+			SNB_UNC_GLOBAL_CTL_EN | SNB_UNC_GLOBAL_CTL_CORE_ALL);
+	}
+}
+
+static struct attribute *snb_uncore_formats_attr[] = {
+	&format_attr_event.attr,
+	&format_attr_umask.attr,
+	&format_attr_edge.attr,
+	&format_attr_inv.attr,
+	&format_attr_cmask5.attr,
+	NULL,
+};
+
+static struct attribute_group snb_uncore_format_group = {
+	.name = "format",
+	.attrs = snb_uncore_formats_attr,
+};
+
+static struct intel_uncore_ops snb_uncore_msr_ops = {
+	.init		= snb_uncore_msr_box_init,
+	.disable_all	= snb_uncore_msr_disable_all,
+	.enable_all	= snb_uncore_msr_enable_all,
+	.disable_event	= snb_uncore_msr_disable_event,
+	.enable_event	= snb_uncore_msr_enable_event,
+	.read_counter	= snb_uncore_msr_read_counter,
+};
+
+static struct event_constraint snb_uncore_cbo_constraints[] = {
+	UNCORE_EVENT_CONSTRAINT(0x80, 0x1),
+	UNCORE_EVENT_CONSTRAINT(0x83, 0x1),
+	EVENT_CONSTRAINT_END
+};
+
+static struct intel_uncore_type snb_uncore_cbo = {
+	.name		= "cbo",
+	.num_counters   = 2,
+	.num_boxes	= 4,
+	.perf_ctr_bits	= 44,
+	.fixed_ctr_bits	= 48,
+	.perf_ctr	= SNB_UNC_CBO_0_PER_CTR0,
+	.event_ctl	= SNB_UNC_CBO_0_PERFEVTSEL0,
+	.fixed_ctr	= SNB_UNC_FIXED_CTR,
+	.fixed_ctl	= SNB_UNC_FIXED_CTR_CTRL,
+	.single_fixed	= 1,
+	.event_mask	= SNB_UNC_RAW_EVENT_MASK,
+	.msr_offset	= SNB_UNC_CBO_MSR_OFFSET,
+	.constraints	= snb_uncore_cbo_constraints,
+	.ops		= &snb_uncore_msr_ops,
+	.format_group	= &snb_uncore_format_group,
+};
+
+static struct intel_uncore_type *snb_msr_uncores[] = {
+	&snb_uncore_cbo,
+	NULL,
+};
+/* end of Sandy Bridge uncore support */
+
+/* Nehalem uncore support */
+static void nhm_uncore_msr_disable_all(struct intel_uncore_box *box)
+{
+	wrmsrl(NHM_UNC_PERF_GLOBAL_CTL, 0);
+}
+
+static void nhm_uncore_msr_enable_all(struct intel_uncore_box *box)
+{
+	wrmsrl(NHM_UNC_PERF_GLOBAL_CTL,
+		NHM_UNC_GLOBAL_CTL_EN_PC_ALL | NHM_UNC_GLOBAL_CTL_EN_FC);
+}
+
+static void nhm_uncore_msr_enable_event(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (hwc->idx < UNCORE_PMC_IDX_FIXED)
+		wrmsrl(hwc->config_base, hwc->config | SNB_UNC_CTL_EN);
+	else
+		wrmsrl(hwc->config_base, NHM_UNC_FIXED_CTR_CTL_EN);
+}
+
+static void nhm_uncore_msr_box_init(struct intel_uncore_box *box)
+{
+	wrmsrl(NHM_UNC_PERF_GLOBAL_CTL, 0);
+}
+
+static struct attribute *nhm_uncore_formats_attr[] = {
+	&format_attr_event.attr,
+	&format_attr_umask.attr,
+	&format_attr_edge.attr,
+	&format_attr_inv.attr,
+	&format_attr_cmask8.attr,
+	NULL,
+};
+
+static struct attribute_group nhm_uncore_format_group = {
+	.name = "format",
+	.attrs = nhm_uncore_formats_attr,
+};
+
+static struct uncore_event_desc nhm_uncore_events[] = {
+	INTEL_UNCORE_EVENT_DESC(CLOCKTICKS, 0xffff),
+	/* full cache line writes to DRAM */
+	INTEL_UNCORE_EVENT_DESC(QMC_WRITES_FULL.ANY, 0xf2f),
+	/* Quickpath Memory Controller medium and low priority read requests */
+	INTEL_UNCORE_EVENT_DESC(QMC_NORMAL_READS.ANY, 0xf2c),
+	/* Quickpath Home Logic read requests from the IOH */
+	INTEL_UNCORE_EVENT_DESC(QHL_REQUEST.IOH_READS, 0x120),
+	/* Quickpath Home Logic write requests from the IOH */
+	INTEL_UNCORE_EVENT_DESC(QHL_REQUEST.IOH_WRITES, 0x220),
+	/* Quickpath Home Logic read requests from a remote socket */
+	INTEL_UNCORE_EVENT_DESC(QHL_REQUEST.REMOTE_READS, 0x420),
+	/* Quickpath Home Logic write requests from a remote socket */
+	INTEL_UNCORE_EVENT_DESC(QHL_REQUEST.REMOTE_WRITES, 0x820),
+	/* Quickpath Home Logic read requests from the local socket */
+	INTEL_UNCORE_EVENT_DESC(QHL_REQUEST.LOCAL_READS, 0x1020),
+	/* Quickpath Home Logic write requests from the local socket */
+	INTEL_UNCORE_EVENT_DESC(QHL_REQUEST.LOCAL_WRITES, 0x2020),
+	{ /* end: all zeroes */ },
+};
+
+static struct intel_uncore_ops nhm_uncore_msr_ops = {
+	.init		= nhm_uncore_msr_box_init,
+	.disable_all	= nhm_uncore_msr_disable_all,
+	.enable_all	= nhm_uncore_msr_enable_all,
+	.disable_event	= snb_uncore_msr_disable_event,
+	.enable_event	= nhm_uncore_msr_enable_event,
+	.read_counter	= snb_uncore_msr_read_counter,
+};
+
+static struct intel_uncore_type nhm_uncore = {
+	.name		= "nhm",
+	.num_counters   = 8,
+	.num_boxes	= 1,
+	.perf_ctr_bits	= 48,
+	.fixed_ctr_bits	= 48,
+	.event_ctl	= NHM_UNC_PERFEVTSEL0,
+	.perf_ctr	= NHM_UNC_UNCORE_PMC0,
+	.fixed_ctr	= NHM_UNC_FIXED_CTR,
+	.fixed_ctl	= NHM_UNC_FIXED_CTR_CTRL,
+	.event_mask	= NHM_UNC_RAW_EVENT_MASK,
+	.event_descs	= nhm_uncore_events,
+	.ops		= &nhm_uncore_msr_ops,
+	.format_group	= &nhm_uncore_format_group,
+};
+
+static struct intel_uncore_type *nhm_msr_uncores[] = {
+	&nhm_uncore,
+	NULL,
+};
+/* end of Nehalem uncore support */
+
 static void uncore_assign_hw_event(struct intel_uncore_box *box,
 				   struct perf_event *event, int idx)
 {
@@ -754,6 +985,15 @@ static int __init uncore_cpu_init(void)
 	int ret, cpu;
 
 	switch (boot_cpu_data.x86_model) {
+	case 26: /* Nehalem */
+	case 30:
+	case 31:
+	case 37: /* Westmere */
+		msr_uncores = nhm_msr_uncores;
+		break;
+	case 42: /* Sandy Bridge */
+		msr_uncores = snb_msr_uncores;
+		break;
 	default:
 		return 0;
 	}
diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
index b5d7124..389e996 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -15,6 +15,56 @@
 
 #define UNCORE_EVENT_CONSTRAINT(c, n) EVENT_CONSTRAINT(c, n, 0xff)
 
+/* SNB event control */
+#define SNB_UNC_CTL_EV_SEL_MASK			0x000000ff
+#define SNB_UNC_CTL_UMASK_MASK			0x0000ff00
+#define SNB_UNC_CTL_EDGE_DET			(1 << 18)
+#define SNB_UNC_CTL_EN				(1 << 22)
+#define SNB_UNC_CTL_INVERT			(1 << 23)
+#define SNB_UNC_CTL_CMASK_MASK			0x1f000000
+#define NHM_UNC_CTL_CMASK_MASK			0xff000000
+#define NHM_UNC_FIXED_CTR_CTL_EN		(1 << 0)
+
+#define SNB_UNC_RAW_EVENT_MASK			(SNB_UNC_CTL_EV_SEL_MASK | \
+						 SNB_UNC_CTL_UMASK_MASK | \
+						 SNB_UNC_CTL_EDGE_DET | \
+						 SNB_UNC_CTL_INVERT | \
+						 SNB_UNC_CTL_CMASK_MASK)
+
+#define NHM_UNC_RAW_EVENT_MASK			(SNB_UNC_CTL_EV_SEL_MASK | \
+						 SNB_UNC_CTL_UMASK_MASK | \
+						 SNB_UNC_CTL_EDGE_DET | \
+						 SNB_UNC_CTL_INVERT | \
+						 NHM_UNC_CTL_CMASK_MASK)
+
+/* NHM & SNB global control register */
+#define SNB_UNC_PERF_GLOBAL_CTL                 0x391
+#define SNB_UNC_FIXED_CTR_CTRL                  0x394
+#define SNB_UNC_FIXED_CTR                       0x395
+
+/* SNB uncore global control */
+#define SNB_UNC_GLOBAL_CTL_CORE_ALL             ((1 << 4) - 1)
+#define SNB_UNC_GLOBAL_CTL_EN                   (1 << 29)
+
+/* SNB Cbo register */
+#define SNB_UNC_CBO_0_PERFEVTSEL0               0x700
+#define SNB_UNC_CBO_0_PER_CTR0                  0x706
+#define SNB_UNC_CBO_MSR_OFFSET                  0x10
+
+/* NHM global control register */
+#define NHM_UNC_PERF_GLOBAL_CTL                 0x391
+#define NHM_UNC_FIXED_CTR                       0x394
+#define NHM_UNC_FIXED_CTR_CTRL                  0x395
+
+/* NHM uncore global control */
+#define NHM_UNC_GLOBAL_CTL_EN_PC_ALL            ((1ULL << 8) - 1)
+#define NHM_UNC_GLOBAL_CTL_EN_FC                (1ULL << 32)
+
+/* NHM uncore register */
+#define NHM_UNC_PERFEVTSEL0                     0x3c0
+#define NHM_UNC_UNCORE_PMC0                     0x3b0
+
+
 struct intel_uncore_ops;
 struct intel_uncore_pmu;
 struct intel_uncore_box;
-- 
1.7.7.6



* [PATCH 4/5] perf: Generic pci uncore device support
  2012-03-28  6:43 [RFC PATCH 0/5] perf: Intel uncore pmu counting support Yan, Zheng
                   ` (2 preceding siblings ...)
  2012-03-28  6:43 ` [PATCH 3/5] perf: Add Nehalem and Sandy Bridge " Yan, Zheng
@ 2012-03-28  6:43 ` Yan, Zheng
  2012-03-28  6:43 ` [PATCH 5/5] perf: Add Sandy Bridge-EP uncore support Yan, Zheng
  2012-03-28  6:49 ` [RFC PATCH 0/5] perf: Intel uncore pmu counting support Ingo Molnar
  5 siblings, 0 replies; 28+ messages in thread
From: Yan, Zheng @ 2012-03-28  6:43 UTC (permalink / raw)
  To: a.p.zijlstra, mingo, andi, eranian; +Cc: linux-kernel, ming.m.lin

From: "Yan, Zheng" <zheng.z.yan@intel.com>

This patch adds generic support for uncore PMUs presented as PCI
devices.
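
The glue a cpu model is expected to supply, as an illustrative sketch
(the device id and type below are made up; patch 5 adds the real Sandy
Bridge-EP tables): a pci_device_id table whose driver_data points at
the matching intel_uncore_type, plus a pci_driver that
uncore_pci_init() completes with .probe/.remove and registers.

static struct intel_uncore_type example_uncore_imc = {
	.name		= "example_imc",
	.num_counters	= 4,
	.num_boxes	= 1,
	.perf_ctr_bits	= 48,
	/* pci config space offsets and ops would be filled in here */
};

static DEFINE_PCI_DEVICE_TABLE(example_uncore_pci_ids) = {
	{ /* hypothetical memory controller pmon device */
		PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x1234),
		.driver_data = (unsigned long)&example_uncore_imc,
	},
	{ /* end: all zeroes */ }
};

static struct pci_driver example_uncore_pci_driver = {
	.name		= "example_uncore",
	.id_table	= example_uncore_pci_ids,
};

A model case in uncore_pci_init() would then point uncore_pci_driver
and pci_uncores at these, and uncore_pci_probe() recovers the type
from id->driver_data.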

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |  148 ++++++++++++++++++++++++-
 arch/x86/kernel/cpu/perf_event_intel_uncore.h |   29 +++++
 2 files changed, 173 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index 39a1c0e..7e19996 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -2,6 +2,12 @@
 
 static struct intel_uncore_type *empty_uncore[] = { NULL, };
 static struct intel_uncore_type **msr_uncores = empty_uncore;
+static struct intel_uncore_type **pci_uncores = empty_uncore;
+static struct pci_driver *uncore_pci_driver;
+static bool pcidrv_registered;
+
+/* pci bus to socket mapping */
+static int pcibus_to_phyid[256] = { [0 ... 255] = -1, };
 
 /* constraint for box with 2 counters */
 static struct event_constraint unconstrained_2 =
@@ -261,13 +267,23 @@ static void uncore_assign_hw_event(struct intel_uncore_box *box,
 	hwc->last_tag = ++box->tags[idx];
 
 	if (hwc->idx == UNCORE_PMC_IDX_FIXED) {
-		hwc->event_base = uncore_msr_fixed_ctr(box);
-		hwc->config_base = uncore_msr_fixed_ctl(box);
+		if (box->pci_dev) {
+			hwc->event_base = uncore_pci_fixed_ctr(box);
+			hwc->config_base = uncore_pci_fixed_ctl(box);
+		} else {
+			hwc->event_base = uncore_msr_fixed_ctr(box);
+			hwc->config_base = uncore_msr_fixed_ctl(box);
+		}
 		return;
 	}
 
-	hwc->config_base = uncore_msr_event_ctl(box, hwc->idx);
-	hwc->event_base =  uncore_msr_perf_ctr(box, hwc->idx);
+	if (box->pci_dev) {
+		hwc->config_base = uncore_pci_event_ctl(box, hwc->idx);
+		hwc->event_base =  uncore_pci_perf_ctr(box, hwc->idx);
+	} else {
+		hwc->config_base = uncore_msr_event_ctl(box, hwc->idx);
+		hwc->event_base =  uncore_msr_perf_ctr(box, hwc->idx);
+	}
 }
 
 static void __uncore_perf_event_update(struct intel_uncore_box *box,
@@ -856,6 +872,118 @@ static int __init uncore_types_init(struct intel_uncore_type **types)
 	return ret;
 }
 
+/*
+ * add a pci uncore device
+ */
+static int __devinit uncore_pci_add(struct intel_uncore_type *type,
+				struct pci_dev *pdev)
+{
+	struct intel_uncore_pmu *pmu;
+	struct intel_uncore_box *box;
+	int phyid, i, ret = 0;
+
+	phyid = pcibus_to_phyid[pdev->bus->number];
+	if (phyid < 0)
+		return -ENODEV;
+
+	box = alloc_uncore_box(0);
+	if (!box)
+		return -ENOMEM;
+
+	spin_lock(&uncore_box_lock);
+	/*
+	 * for a performance monitoring unit with multiple boxes,
+	 * each box has a different function id.
+	 */
+	for (i = 0; i < type->num_boxes; i++) {
+		pmu = &type->pmus[i];
+		if (pmu->func_id == pdev->devfn)
+			break;
+		if (pmu->func_id < 0) {
+			pmu->func_id = pdev->devfn;
+			break;
+		}
+		pmu = NULL;
+	}
+
+	if (pmu) {
+		box->phy_id = phyid;
+		box->pci_dev = pdev;
+		box->pmu = pmu;
+		uncore_box_init(box);
+		pci_set_drvdata(pdev, box);
+		uncore_pmu_add_box(pmu, box);
+	} else {
+		ret = -EINVAL;
+	}
+	spin_unlock(&uncore_box_lock);
+	if (ret)
+		kfree(box);
+	return ret;
+}
+
+static void __devexit uncore_pci_remove(struct pci_dev *pdev)
+{
+	struct intel_uncore_box *box = pci_get_drvdata(pdev);
+	int phyid = pcibus_to_phyid[pdev->bus->number];
+	int free_it = 0;
+
+	if (WARN_ON_ONCE(phyid != box->phy_id))
+		return;
+
+	box->pci_dev = NULL;
+	spin_lock(&uncore_box_lock);
+	if (--box->refcnt == 0) {
+		hlist_del_rcu(&box->hlist);
+		free_it = 1;
+	}
+	spin_unlock(&uncore_box_lock);
+	if (free_it)
+		kfree_rcu(box, rcu_head);
+}
+
+static int __devinit uncore_pci_probe(struct pci_dev *pdev,
+				const struct pci_device_id *id)
+{
+	struct intel_uncore_type *type;
+
+	type = (struct intel_uncore_type *)id->driver_data;
+	return uncore_pci_add(type, pdev);
+}
+
+static int __init uncore_pci_init(void)
+{
+	int ret;
+
+	switch (boot_cpu_data.x86_model) {
+	default:
+		return 0;
+	}
+
+	ret =  uncore_types_init(pci_uncores);
+	if (ret)
+		return ret;
+
+	uncore_pci_driver->probe = uncore_pci_probe;
+	uncore_pci_driver->remove = uncore_pci_remove;
+
+	ret = pci_register_driver(uncore_pci_driver);
+	if (ret == 0)
+		pcidrv_registered = true;
+	else
+		uncore_types_exit(pci_uncores);
+
+	return ret;
+}
+
+static void __init uncore_pci_exit(void)
+{
+	if (pcidrv_registered) {
+		pci_unregister_driver(uncore_pci_driver);
+		pcidrv_registered = false;
+	}
+}
+
 static void uncore_cpu_dying(int cpu)
 {
 	struct intel_uncore_type *type;
@@ -1031,6 +1159,14 @@ static int __init uncore_pmus_register(void)
 		}
 	}
 
+	for (i = 0; pci_uncores[i]; i++) {
+		type = pci_uncores[i];
+		for (j = 0; j < type->num_boxes; j++) {
+			pmu = &type->pmus[j];
+			uncore_pmu_register(pmu);
+		}
+	}
+
 	return 0;
 }
 
@@ -1041,8 +1177,12 @@ static int __init intel_uncore_init(void)
 	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
 		return -ENODEV;
 
+	ret = uncore_pci_init();
+	if (ret)
+		goto fail;
 	ret = uncore_cpu_init();
 	if (ret) {
+		uncore_pci_exit();
 		goto fail;
 	}
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
index 389e996..39df0ec 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -1,5 +1,6 @@
 #include <linux/module.h>
 #include <linux/slab.h>
+#include <linux/pci.h>
 #include <linux/perf_event.h>
 #include "perf_event.h"
 
@@ -122,6 +123,7 @@ struct intel_uncore_box {
 	struct perf_event *event_list[UNCORE_PMC_IDX_MAX];
 	unsigned long active_mask[BITS_TO_LONGS(UNCORE_PMC_IDX_MAX)];
 	u64 tags[UNCORE_PMC_IDX_MAX];
+	struct pci_dev *pci_dev;
 	struct intel_uncore_pmu *pmu;
 	struct hrtimer hrtimer;
 	struct rcu_head rcu_head;
@@ -161,6 +163,33 @@ static ssize_t uncore_event_show(struct kobject *kobj,
 	return sprintf(buf, "0x%llx\n", event->config);
 }
 
+static inline unsigned uncore_pci_box_ctl(struct intel_uncore_box *box)
+{
+	return box->pmu->type->box_ctl;
+}
+
+static inline unsigned uncore_pci_fixed_ctl(struct intel_uncore_box *box)
+{
+	return box->pmu->type->fixed_ctl;
+}
+
+static inline unsigned uncore_pci_fixed_ctr(struct intel_uncore_box *box)
+{
+	return box->pmu->type->fixed_ctr;
+}
+
+static inline
+unsigned uncore_pci_event_ctl(struct intel_uncore_box *box, int idx)
+{
+	return idx * 4 + box->pmu->type->event_ctl;
+}
+
+static inline
+unsigned uncore_pci_perf_ctr(struct intel_uncore_box *box, int idx)
+{
+	return idx * 8 + box->pmu->type->perf_ctr;
+}
+
 static inline
 unsigned uncore_msr_box_ctl(struct intel_uncore_box *box)
 {
-- 
1.7.7.6



* [PATCH 5/5] perf: Add Sandy Bridge-EP uncore support
  2012-03-28  6:43 [RFC PATCH 0/5] perf: Intel uncore pmu counting support Yan, Zheng
                   ` (3 preceding siblings ...)
  2012-03-28  6:43 ` [PATCH 4/5] perf: Generic pci uncore device support Yan, Zheng
@ 2012-03-28  6:43 ` Yan, Zheng
  2012-03-28  6:49 ` [RFC PATCH 0/5] perf: Intel uncore pmu counting support Ingo Molnar
  5 siblings, 0 replies; 28+ messages in thread
From: Yan, Zheng @ 2012-03-28  6:43 UTC (permalink / raw)
  To: a.p.zijlstra, mingo, andi, eranian; +Cc: linux-kernel, ming.m.lin

From: "Yan, Zheng" <zheng.z.yan@intel.com>

Add Intel Sandy Bridge-EP uncore PMU support. The uncore subsystem
in Sandy Bridge-EP consists of 8 components (UBox, Caching Agent,
Home Agent, Memory Controller, Power Control, QPI Link Layer,
R2PCIe, R3QPI).
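
For the MSR-side boxes (e.g. the eight Cbos), each box addresses its
register block through the type's msr_offset, so the control MSR for
counter idx of Cbo box_idx works out as in this small sketch (the
helper name is illustrative; it mirrors uncore_msr_event_ctl() from
patch 2):

static unsigned snbep_cbo_event_ctl_msr(int box_idx, int idx)
{
	/* SNBEP_C0_MSR_PMON_CTL0 is the first Cbo's first control MSR */
	return SNBEP_C0_MSR_PMON_CTL0 +
		box_idx * SNBEP_CBO_MSR_OFFSET + idx;
}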

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |  507 +++++++++++++++++++++++++
 arch/x86/kernel/cpu/perf_event_intel_uncore.h |   86 +++++
 include/linux/pci_ids.h                       |   11 +
 3 files changed, 604 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index 7e19996..6656708 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -33,6 +33,505 @@ DEFINE_UNCORE_FORMAT_ATTR(edge, edge, "config:18");
 DEFINE_UNCORE_FORMAT_ATTR(inv, inv, "config:23");
 DEFINE_UNCORE_FORMAT_ATTR(cmask5, cmask, "config:24-28");
 DEFINE_UNCORE_FORMAT_ATTR(cmask8, cmask, "config:24-31");
+DEFINE_UNCORE_FORMAT_ATTR(thresh8, thresh, "config:24-31");
+DEFINE_UNCORE_FORMAT_ATTR(thresh5, thresh, "config:24-28");
+DEFINE_UNCORE_FORMAT_ATTR(occ_sel, occ_sel, "config:14-15");
+DEFINE_UNCORE_FORMAT_ATTR(occ_invert, occ_invert, "config:30");
+DEFINE_UNCORE_FORMAT_ATTR(occ_edge, occ_edge, "config:14-51");
+
+/* Sandy Bridge-EP uncore support */
+static void snbep_uncore_pci_disable_all(struct intel_uncore_box *box)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	int box_ctl = uncore_pci_box_ctl(box);
+	u32 config;
+
+	pci_read_config_dword(pdev, box_ctl, &config);
+	config |= SNBEP_PMON_BOX_CTL_FRZ;
+	pci_write_config_dword(pdev, box_ctl, config);
+}
+
+static void snbep_uncore_pci_enable_all(struct intel_uncore_box *box)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	int box_ctl = uncore_pci_box_ctl(box);
+	u32 config;
+
+	pci_read_config_dword(pdev, box_ctl, &config);
+	config &= ~SNBEP_PMON_BOX_CTL_FRZ;
+	pci_write_config_dword(pdev, box_ctl, config);
+}
+
+static void snbep_uncore_pci_enable_event(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	struct hw_perf_event *hwc = &event->hw;
+
+	pci_write_config_dword(pdev, hwc->config_base, hwc->config |
+				SNBEP_PMON_CTL_EN);
+}
+
+static void snbep_uncore_pci_disable_event(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	struct hw_perf_event *hwc = &event->hw;
+
+	pci_write_config_dword(pdev, hwc->config_base, hwc->config);
+}
+
+static u64 snbep_uncore_pci_read_counter(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	struct hw_perf_event *hwc = &event->hw;
+	u64 count;
+
+	pci_read_config_dword(pdev, hwc->event_base, (u32 *)&count);
+	pci_read_config_dword(pdev, hwc->event_base + 4, (u32 *)&count + 1);
+	return count;
+}
+
+static void snbep_uncore_pci_box_init(struct intel_uncore_box *box)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	pci_write_config_dword(pdev, SNBEP_PCI_PMON_BOX_CTL,
+				SNBEP_PMON_BOX_CTL_INT);
+}
+
+static void snbep_uncore_msr_disable_all(struct intel_uncore_box *box)
+{
+	u64 config;
+	unsigned msr;
+	int i;
+
+	msr = uncore_msr_box_ctl(box);
+	if (msr) {
+		rdmsrl(msr, config);
+		config |= SNBEP_PMON_BOX_CTL_FRZ;
+		wrmsrl(msr, config);
+		return;
+	}
+
+	/* ubox has no box level control */
+	for (i = 0; i < uncore_num_counters(box); i++) {
+		if (!test_bit(i, box->active_mask))
+			continue;
+
+		msr = uncore_msr_event_ctl(box, i);
+		rdmsrl(msr, config);
+		config &= ~SNBEP_PMON_CTL_EN;
+		wrmsrl(msr, config);
+	}
+}
+
+static void snbep_uncore_msr_enable_all(struct intel_uncore_box *box)
+{
+	u64 config;
+	unsigned msr;
+	int i;
+
+	msr = uncore_msr_box_ctl(box);
+	if (msr) {
+		rdmsrl(msr, config);
+		config &= ~SNBEP_PMON_BOX_CTL_FRZ;
+		wrmsrl(msr, config);
+		return;
+	}
+
+	/* ubox has no box level control */
+	for (i = 0; i < uncore_num_counters(box); i++) {
+		if (!test_bit(i, box->active_mask))
+			continue;
+
+		msr = uncore_msr_event_ctl(box, i);
+		rdmsrl(msr, config);
+		config |= SNBEP_PMON_CTL_EN;
+		wrmsrl(msr, config);
+	}
+}
+
+static void snbep_uncore_msr_enable_event(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	wrmsrl(hwc->config_base, hwc->config | SNBEP_PMON_CTL_EN);
+}
+
+static void snbep_uncore_msr_disable_event(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	wrmsrl(hwc->config_base, hwc->config);
+}
+
+static u64 snbep_uncore_msr_read_counter(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	u64 count;
+
+	rdmsrl(hwc->event_base, count);
+	return count;
+}
+
+static void snbep_uncore_msr_box_init(struct intel_uncore_box *box)
+{
+	if (uncore_msr_box_ctl(box))
+		wrmsrl(uncore_msr_box_ctl(box), SNBEP_PMON_BOX_CTL_INT);
+}
+
+static struct attribute *snbep_uncore_formats_attr[] = {
+	&format_attr_event.attr,
+	&format_attr_umask.attr,
+	&format_attr_edge.attr,
+	&format_attr_inv.attr,
+	&format_attr_thresh8.attr,
+	NULL,
+};
+
+static struct attribute *snbep_uncore_ubox_formats_attr[] = {
+	&format_attr_event.attr,
+	&format_attr_umask.attr,
+	&format_attr_edge.attr,
+	&format_attr_inv.attr,
+	&format_attr_thresh5.attr,
+	NULL,
+};
+
+static struct attribute *snbep_uncore_pcu_formats_attr[] = {
+	&format_attr_event.attr,
+	&format_attr_occ_sel.attr,
+	&format_attr_edge.attr,
+	&format_attr_inv.attr,
+	&format_attr_thresh5.attr,
+	&format_attr_occ_invert.attr,
+	&format_attr_occ_edge.attr,
+	NULL,
+};
+
+static struct uncore_event_desc snbep_uncore_imc_events[] = {
+	/* DRAM Activate Count */
+	INTEL_UNCORE_EVENT_DESC(ACT_COUNT, 0x1),
+	/* DRAM Read CAS commands */
+	INTEL_UNCORE_EVENT_DESC(CAS_COUNT.RD, 0x304),
+	/* DRAM Write CAS commands */
+	INTEL_UNCORE_EVENT_DESC(CAS_COUNT.WR, 0xc04),
+	INTEL_UNCORE_EVENT_DESC(CAS_COUNT.ALL, 0xf04),
+	{ /* end: all zeroes */ },
+};
+
+static struct uncore_event_desc snbep_uncore_r2pcie_events[] = {
+	INTEL_UNCORE_EVENT_DESC(CLOCKTICKS, 0x1),
+	/* data ring cycles used in the up direction */
+	INTEL_UNCORE_EVENT_DESC(RING_BL_USED.UP, 0x309),
+	/* data ring cycles used in the down direction */
+	INTEL_UNCORE_EVENT_DESC(RING_BL_USED.DOWN, 0xc09),
+	{ /* end: all zeroes */ },
+};
+
+static struct attribute_group snbep_uncore_format_group = {
+	.name = "format",
+	.attrs = snbep_uncore_formats_attr,
+};
+
+static struct attribute_group snbep_uncore_ubox_format_group = {
+	.name = "format",
+	.attrs = snbep_uncore_ubox_formats_attr,
+};
+
+static struct attribute_group snbep_uncore_pcu_format_group = {
+	.name = "format",
+	.attrs = snbep_uncore_pcu_formats_attr,
+};
+
+static struct intel_uncore_ops snbep_uncore_msr_ops = {
+	.init		= snbep_uncore_msr_box_init,
+	.disable_all	= snbep_uncore_msr_disable_all,
+	.enable_all	= snbep_uncore_msr_enable_all,
+	.disable_event	= snbep_uncore_msr_disable_event,
+	.enable_event	= snbep_uncore_msr_enable_event,
+	.read_counter	= snbep_uncore_msr_read_counter,
+};
+
+static struct intel_uncore_ops snbep_uncore_pci_ops = {
+	.init		= snbep_uncore_pci_box_init,
+	.disable_all	= snbep_uncore_pci_disable_all,
+	.enable_all	= snbep_uncore_pci_enable_all,
+	.disable_event	= snbep_uncore_pci_disable_event,
+	.enable_event	= snbep_uncore_pci_enable_event,
+	.read_counter	= snbep_uncore_pci_read_counter,
+};
+
+static struct event_constraint snbep_uncore_cbo_constraints[] = {
+	UNCORE_EVENT_CONSTRAINT(0x01, 0x1),
+	UNCORE_EVENT_CONSTRAINT(0x02, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x04, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x05, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x07, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x11, 0x1),
+	UNCORE_EVENT_CONSTRAINT(0x12, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x13, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x1b, 0xc),
+	UNCORE_EVENT_CONSTRAINT(0x1c, 0xc),
+	UNCORE_EVENT_CONSTRAINT(0x1d, 0xc),
+	UNCORE_EVENT_CONSTRAINT(0x1e, 0xc),
+	UNCORE_EVENT_CONSTRAINT(0x1f, 0xe),
+	UNCORE_EVENT_CONSTRAINT(0x21, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x23, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x31, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x32, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x33, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x34, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x35, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x36, 0x1),
+	UNCORE_EVENT_CONSTRAINT(0x37, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x38, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x39, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x3b, 0x1),
+	EVENT_CONSTRAINT_END
+};
+
+static struct event_constraint snbep_uncore_r2pcie_constraints[] = {
+	UNCORE_EVENT_CONSTRAINT(0x10, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x11, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x12, 0x1),
+	UNCORE_EVENT_CONSTRAINT(0x23, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x24, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x25, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x26, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x32, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x33, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x34, 0x3),
+	EVENT_CONSTRAINT_END
+};
+
+static struct event_constraint snbep_uncore_r3qpi_constraints[] = {
+	UNCORE_EVENT_CONSTRAINT(0x10, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x11, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x12, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x13, 0x1),
+	UNCORE_EVENT_CONSTRAINT(0x20, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x21, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x22, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x23, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x24, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x25, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x26, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x30, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x31, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x32, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x33, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x34, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x36, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x37, 0x3),
+	EVENT_CONSTRAINT_END
+};
+
+static struct intel_uncore_type snbep_uncore_ubox = {
+	.name		= "ubox",
+	.num_counters   = 2,
+	.num_boxes	= 1,
+	.perf_ctr_bits	= 44,
+	.fixed_ctr_bits	= 48,
+	.perf_ctr	= SNBEP_U_MSR_PMON_CTR0,
+	.event_ctl	= SNBEP_U_MSR_PMON_CTL0,
+	.event_mask	= SNBEP_U_MSR_PMON_RAW_EVENT_MASK,
+	.fixed_ctr	= SNBEP_U_MSR_PMON_UCLK_FIXED_CTR,
+	.fixed_ctl	= SNBEP_U_MSR_PMON_UCLK_FIXED_CTL,
+	.ops		= &snbep_uncore_msr_ops,
+	.format_group	= &snbep_uncore_ubox_format_group,
+};
+
+static struct intel_uncore_type snbep_uncore_cbo = {
+	.name		= "cbo",
+	.num_counters   = 4,
+	.num_boxes	= 8,
+	.perf_ctr_bits	= 44,
+	.event_ctl	= SNBEP_C0_MSR_PMON_CTL0,
+	.perf_ctr	= SNBEP_C0_MSR_PMON_CTR0,
+	.event_mask	= SNBEP_PMON_RAW_EVENT_MASK,
+	.box_ctl	= SNBEP_C0_MSR_PMON_BOX_CTL,
+	.msr_offset	= SNBEP_CBO_MSR_OFFSET,
+	.constraints	= snbep_uncore_cbo_constraints,
+	.ops		= &snbep_uncore_msr_ops,
+	.format_group	= &snbep_uncore_format_group,
+};
+
+static struct intel_uncore_type snbep_uncore_pcu = {
+	.name		= "pcu",
+	.num_counters   = 4,
+	.num_boxes	= 1,
+	.perf_ctr_bits	= 48,
+	.perf_ctr	= SNBEP_PCU_MSR_PMON_CTR0,
+	.event_ctl	= SNBEP_PCU_MSR_PMON_CTL0,
+	.event_mask	= SNBEP_PCU_MSR_PMON_RAW_EVENT_MASK,
+	.box_ctl	= SNBEP_PCU_MSR_PMON_BOX_CTL,
+	.ops		= &snbep_uncore_msr_ops,
+	.format_group	= &snbep_uncore_pcu_format_group,
+};
+
+static struct intel_uncore_type *snbep_msr_uncores[] = {
+	&snbep_uncore_ubox,
+	&snbep_uncore_cbo,
+	&snbep_uncore_pcu,
+	NULL,
+};
+
+#define SNBEP_UNCORE_PCI_COMMON_INIT()				\
+	.perf_ctr	= SNBEP_PCI_PMON_CTR0,			\
+	.event_ctl	= SNBEP_PCI_PMON_CTL0,			\
+	.event_mask	= SNBEP_PMON_RAW_EVENT_MASK,		\
+	.box_ctl	= SNBEP_PCI_PMON_BOX_CTL,		\
+	.ops		= &snbep_uncore_pci_ops,		\
+	.format_group	= &snbep_uncore_format_group
+
+static struct intel_uncore_type snbep_uncore_ha = {
+	.name		= "ha",
+	.num_counters   = 4,
+	.num_boxes	= 1,
+	.perf_ctr_bits	= 48,
+	SNBEP_UNCORE_PCI_COMMON_INIT(),
+};
+
+static struct intel_uncore_type snbep_uncore_imc = {
+	.name		= "imc",
+	.num_counters   = 4,
+	.num_boxes	= 4,
+	.perf_ctr_bits	= 48,
+	.fixed_ctr_bits	= 48,
+	.fixed_ctr	= SNBEP_MC_CHy_PCI_PMON_FIXED_CTR,
+	.fixed_ctl	= SNBEP_MC_CHy_PCI_PMON_FIXED_CTL,
+	.event_descs	= snbep_uncore_imc_events,
+	SNBEP_UNCORE_PCI_COMMON_INIT(),
+};
+
+static struct intel_uncore_type snbep_uncore_qpi = {
+	.name		= "qpi",
+	.num_counters   = 4,
+	.num_boxes	= 2,
+	.perf_ctr_bits	= 48,
+	SNBEP_UNCORE_PCI_COMMON_INIT(),
+};
+
+
+static struct intel_uncore_type snbep_uncore_r2pcie = {
+	.name		= "r2pcie",
+	.num_counters   = 4,
+	.num_boxes	= 1,
+	.perf_ctr_bits	= 44,
+	.constraints	= snbep_uncore_r2pcie_constraints,
+	.event_descs	= snbep_uncore_r2pcie_events,
+	SNBEP_UNCORE_PCI_COMMON_INIT(),
+};
+
+static struct intel_uncore_type snbep_uncore_r3qpi = {
+	.name		= "r3qpi",
+	.num_counters   = 3,
+	.num_boxes	= 2,
+	.perf_ctr_bits	= 44,
+	.constraints	= snbep_uncore_r3qpi_constraints,
+	SNBEP_UNCORE_PCI_COMMON_INIT(),
+};
+
+static struct intel_uncore_type *snbep_pci_uncores[] = {
+	&snbep_uncore_ha,
+	&snbep_uncore_imc,
+	&snbep_uncore_qpi,
+	&snbep_uncore_r2pcie,
+	&snbep_uncore_r3qpi,
+	NULL,
+};
+
+static DEFINE_PCI_DEVICE_TABLE(snbep_uncore_pci_ids) = {
+	{ /* Home Agent */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_UNC_HA),
+		.driver_data = (unsigned long)&snbep_uncore_ha,
+	},
+	{ /* MC Channel 0 */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_UNC_IMC0),
+		.driver_data = (unsigned long)&snbep_uncore_imc,
+	},
+	{ /* MC Channel 1 */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_UNC_IMC1),
+		.driver_data = (unsigned long)&snbep_uncore_imc,
+	},
+	{ /* MC Channel 2 */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_UNC_IMC2),
+		.driver_data = (unsigned long)&snbep_uncore_imc,
+	},
+	{ /* MC Channel 3 */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_UNC_IMC3),
+		.driver_data = (unsigned long)&snbep_uncore_imc,
+	},
+	{ /* QPI Port 0 */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_UNC_QPI0),
+		.driver_data = (unsigned long)&snbep_uncore_qpi,
+	},
+	{ /* QPI Port 1 */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_UNC_QPI1),
+		.driver_data = (unsigned long)&snbep_uncore_qpi,
+	},
	{ /* R2PCIe */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_UNC_R2PCIE),
+		.driver_data = (unsigned long)&snbep_uncore_r2pcie,
+	},
+	{ /* R3QPI Link 0 */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_UNC_R3QPI0),
+		.driver_data = (unsigned long)&snbep_uncore_r3qpi,
+	},
+	{ /* R3QPI Link 1 */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_UNC_R3QPI1),
+		.driver_data = (unsigned long)&snbep_uncore_r3qpi,
+	},
+	{ /* end: all zeroes */ }
+};
+
+static struct pci_driver snbep_uncore_pci_driver = {
+	.name		= "snbep_uncore",
+	.id_table	= snbep_uncore_pci_ids,
+};
+
+/*
+ * build pci bus to socket mapping
+ */
+static void snbep_pci2phy_map_init(void)
+{
+	struct pci_dev *ubox_dev = NULL;
+	int i, bus, nodeid;
+	u32 config;
+
+	while (1) {
+		/* find the UBOX device */
+		ubox_dev = pci_get_device(PCI_VENDOR_ID_INTEL,
+					PCI_DEVICE_ID_INTEL_JAKETOWN_UBOX,
+					ubox_dev);
+		if (!ubox_dev)
+			break;
+		bus = ubox_dev->bus->number;
+		/* get the Node ID of the local register */
+		pci_read_config_dword(ubox_dev, 0x40, &config);
+		nodeid = config;
+		/* get the Node ID mapping */
+		pci_read_config_dword(ubox_dev, 0x54, &config);
+		/*
+		 * every three bits in the Node ID mapping register maps
+		 * to a particular node.
+		 */
+		for (i = 0; i < 8; i++) {
+			if (nodeid == ((config >> (3 * i)) & 0x7)) {
+				pcibus_to_phyid[bus] = i;
+				break;
+			}
+		}
+	};
+	return;
+}
+/* end of Sandy Bridge-EP uncore support */
+
 
 /* Sandy Bridge uncore support */
 static void snb_uncore_msr_disable_all(struct intel_uncore_box *box)
@@ -956,6 +1455,11 @@ static int __init uncore_pci_init(void)
 	int ret;
 
 	switch (boot_cpu_data.x86_model) {
+	case 45: /* Sandy Bridge-EP */
+		pci_uncores = snbep_pci_uncores;
+		uncore_pci_driver = &snbep_uncore_pci_driver;
+		snbep_pci2phy_map_init();
+		break;
 	default:
 		return 0;
 	}
@@ -1122,6 +1626,9 @@ static int __init uncore_cpu_init(void)
 	case 42: /* Sandy Bridge */
 		msr_uncores = snb_msr_uncores;
 		break;
+	case 45: /* Sandy Bridge-EP */
+		msr_uncores = snbep_msr_uncores;
+		break;
 	default:
 		return 0;
 	}
diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
index 39df0ec..c07c9d3 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -65,6 +65,92 @@
 #define NHM_UNC_PERFEVTSEL0                     0x3c0
 #define NHM_UNC_UNCORE_PMC0                     0x3b0
 
+/* SNB-EP Box level control */
+#define SNBEP_PMON_BOX_CTL_RST_CTRL	(1 << 0)
+#define SNBEP_PMON_BOX_CTL_RST_CTRS	(1 << 1)
+#define SNBEP_PMON_BOX_CTL_FRZ		(1 << 8)
+#define SNBEP_PMON_BOX_CTL_FRZ_EN	(1 << 16)
+#define SNBEP_PMON_BOX_CTL_INT		(SNBEP_PMON_BOX_CTL_RST_CTRL | \
+					 SNBEP_PMON_BOX_CTL_RST_CTRS | \
+					 SNBEP_PMON_BOX_CTL_FRZ_EN)
+/* SNB-EP event control */
+#define SNBEP_PMON_CTL_EV_SEL_MASK	0x000000ff
+#define SNBEP_PMON_CTL_UMASK_MASK	0x0000ff00
+#define SNBEP_PMON_CTL_RST		(1 << 17)
+#define SNBEP_PMON_CTL_EDGE_DET		(1 << 18)
+#define SNBEP_PMON_CTL_EV_SEL_EXT	(1 << 21)	/* only for QPI */
+#define SNBEP_PMON_CTL_EN		(1 << 22)
+#define SNBEP_PMON_CTL_INVERT		(1 << 23)
+#define SNBEP_PMON_CTL_TRESH_MASK	0xff000000
+#define SNBEP_PMON_RAW_EVENT_MASK	(SNBEP_PMON_CTL_EV_SEL_MASK | \
+					 SNBEP_PMON_CTL_UMASK_MASK | \
+					 SNBEP_PMON_CTL_EDGE_DET | \
+					 SNBEP_PMON_CTL_INVERT | \
+					 SNBEP_PMON_CTL_TRESH_MASK)
+
+/* SNB-EP Ubox event control */
+#define SNBEP_U_MSR_PMON_CTL_TRESH_MASK		0x1f000000
+#define SNBEP_U_MSR_PMON_RAW_EVENT_MASK		\
+				(SNBEP_PMON_CTL_EV_SEL_MASK | \
+				 SNBEP_PMON_CTL_UMASK_MASK | \
+				 SNBEP_PMON_CTL_EDGE_DET | \
+				 SNBEP_PMON_CTL_INVERT | \
+				 SNBEP_U_MSR_PMON_CTL_TRESH_MASK)
+
+/* SNB-EP PCU event control */
+#define SNBEP_PCU_MSR_PMON_CTL_OCC_SEL_MASK	0x0000c000
+#define SNBEP_PCU_MSR_PMON_CTL_TRESH_MASK	0x1f000000
+#define SNBEP_PCU_MSR_PMON_CTL_OCC_INVERT	(1 << 30)
+#define SNBEP_PCU_MSR_PMON_CTL_OCC_EDGE_DET	(1 << 31)
+#define SNBEP_PCU_MSR_PMON_RAW_EVENT_MASK	\
+				(SNBEP_PMON_CTL_EV_SEL_MASK | \
+				 SNBEP_PCU_MSR_PMON_CTL_OCC_SEL_MASK | \
+				 SNBEP_PMON_CTL_EDGE_DET | \
+				 SNBEP_PMON_CTL_INVERT | \
+				 SNBEP_PCU_MSR_PMON_CTL_TRESH_MASK | \
+				 SNBEP_PCU_MSR_PMON_CTL_OCC_INVERT | \
+				 SNBEP_PCU_MSR_PMON_CTL_OCC_EDGE_DET)
+
+/* SNB-EP pci control register */
+#define SNBEP_PCI_PMON_BOX_CTL			0xf4
+#define SNBEP_PCI_PMON_CTL0			0xd8
+/* SNB-EP pci counter register */
+#define SNBEP_PCI_PMON_CTR0			0xa0
+
+/* SNB-EP home agent register */
+#define SNBEP_HA_PCI_PMON_BOX_ADDRMATCH0	0x40
+#define SNBEP_HA_PCI_PMON_BOX_ADDRMATCH1	0x44
+#define SNBEP_HA_PCI_PMON_BOX_OPCODEMATCH	0x48
+/* SNB-EP memory controller register */
+#define SNBEP_MC_CHy_PCI_PMON_FIXED_CTL		0xf0
+#define SNBEP_MC_CHy_PCI_PMON_FIXED_CTR		0xd0
+/* SNB-EP QPI register */
+#define SNBEP_Q_Py_PCI_PMON_PKT_MATCH0		0x228
+#define SNBEP_Q_Py_PCI_PMON_PKT_MATCH1		0x22c
+#define SNBEP_Q_Py_PCI_PMON_PKT_MASK0		0x238
+#define SNBEP_Q_Py_PCI_PMON_PKT_MASK1		0x23c
+
+/* SNB-EP Ubox register */
+#define SNBEP_U_MSR_PMON_CTR0			0xc16
+#define SNBEP_U_MSR_PMON_CTL0			0xc10
+
+#define SNBEP_U_MSR_PMON_UCLK_FIXED_CTL		0xc08
+#define SNBEP_U_MSR_PMON_UCLK_FIXED_CTR		0xc09
+
+/* SNB-EP Cbo register */
+#define SNBEP_C0_MSR_PMON_CTR0			0xd16
+#define SNBEP_C0_MSR_PMON_CTL0			0xd10
+#define SNBEP_C0_MSR_PMON_BOX_FILTER		0xd14
+#define SNBEP_C0_MSR_PMON_BOX_CTL		0xd04
+#define SNBEP_CBO_MSR_OFFSET			0x20
+
+/* SNB-EP PCU register */
+#define SNBEP_PCU_MSR_PMON_CTR0			0xc36
+#define SNBEP_PCU_MSR_PMON_CTL0			0xc30
+#define SNBEP_PCU_MSR_PMON_BOX_FILTER		0xc34
+#define SNBEP_PCU_MSR_PMON_BOX_CTL		0xc24
+#define SNBEP_PCU_MSR_CORE_C3_CTR		0x3fc
+#define SNBEP_PCU_MSR_CORE_C6_CTR		0x3fd
 
 struct intel_uncore_ops;
 struct intel_uncore_pmu;
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 3329965..9870b8d 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2754,6 +2754,17 @@
 #define PCI_DEVICE_ID_INTEL_IOAT_SNB7	0x3c27
 #define PCI_DEVICE_ID_INTEL_IOAT_SNB8	0x3c2e
 #define PCI_DEVICE_ID_INTEL_IOAT_SNB9	0x3c2f
+#define PCI_DEVICE_ID_INTEL_UNC_HA	0x3c46
+#define PCI_DEVICE_ID_INTEL_UNC_IMC0	0x3cb0
+#define PCI_DEVICE_ID_INTEL_UNC_IMC1	0x3cb1
+#define PCI_DEVICE_ID_INTEL_UNC_IMC2	0x3cb4
+#define PCI_DEVICE_ID_INTEL_UNC_IMC3	0x3cb5
+#define PCI_DEVICE_ID_INTEL_UNC_QPI0	0x3c41
+#define PCI_DEVICE_ID_INTEL_UNC_QPI1	0x3c42
+#define PCI_DEVICE_ID_INTEL_UNC_R2PCIE	0x3c43
+#define PCI_DEVICE_ID_INTEL_UNC_R3QPI0	0x3c44
+#define PCI_DEVICE_ID_INTEL_UNC_R3QPI1	0x3c45
+#define PCI_DEVICE_ID_INTEL_JAKETOWN_UBOX	0x3ce0
 #define PCI_DEVICE_ID_INTEL_IOAT_SNB	0x402f
 #define PCI_DEVICE_ID_INTEL_5100_16	0x65f0
 #define PCI_DEVICE_ID_INTEL_5100_21	0x65f5
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/5] perf: Intel uncore pmu counting support
  2012-03-28  6:43 [RFC PATCH 0/5] perf: Intel uncore pmu counting support Yan, Zheng
                   ` (4 preceding siblings ...)
  2012-03-28  6:43 ` [PATCH 5/5] perf: Add Sandy Bridge-EP uncore support Yan, Zheng
@ 2012-03-28  6:49 ` Ingo Molnar
  2012-03-28  8:49   ` Peter Zijlstra
  2012-03-28  8:57   ` Andi Kleen
  5 siblings, 2 replies; 28+ messages in thread
From: Ingo Molnar @ 2012-03-28  6:49 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: a.p.zijlstra, mingo, andi, eranian, linux-kernel, ming.m.lin


* Yan, Zheng <zheng.z.yan@intel.com> wrote:

> Hi, all
> 
> Here is the RFC patches to add uncore counting support for Nehalem,
> Sandy Bridge and Sandy Bridge-EP, applied on top of current tip.
> The code is based on Lin Ming's old patches.
> 
> You can use 'perf stat' to access to the uncore pmu. For example:
> perf stat -a -C 0 -e 'uncore_nhm/config=0xffff/' sleep 1

My main complaint is that that's not user friendly *AT ALL*.

You need to make this useful to mere mortals: go through the 
SDM, categorize interesting looking events, look at how it can 
be expressed via tooling, add a generic event where appropriate, 
provide examples, actually *USE* it to improve the kernel or an 
app and see the workflow as it happens and improve the tooling, 
etc.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/5] perf: Intel uncore pmu counting support
  2012-03-28  6:49 ` [RFC PATCH 0/5] perf: Intel uncore pmu counting support Ingo Molnar
@ 2012-03-28  8:49   ` Peter Zijlstra
  2012-03-28  9:02     ` Yan, Zheng
  2012-03-28  8:57   ` Andi Kleen
  1 sibling, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2012-03-28  8:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Yan, Zheng, mingo, andi, eranian, linux-kernel, ming.m.lin,
	Jiri Olsa, Arnaldo Carvalho de Melo

On Wed, 2012-03-28 at 08:49 +0200, Ingo Molnar wrote:
> * Yan, Zheng <zheng.z.yan@intel.com> wrote:
> 
> > Hi, all
> > 
> > Here is the RFC patches to add uncore counting support for Nehalem,
> > Sandy Bridge and Sandy Bridge-EP, applied on top of current tip.
> > The code is based on Lin Ming's old patches.
> > 
> > You can use 'perf stat' to access to the uncore pmu. For example:
> > perf stat -a -C 0 -e 'uncore_nhm/config=0xffff/' sleep 1
> 
> My main complaint is that that's not user friendly *AT ALL*.
> 
> You need to make this useful to mere mortals: go through the 
> SDM, categorize interesting looking events, look at how it can 
> be expressed via tooling, add a generic event where appropriate, 
> provide examples, actually *USE* it to improve the kernel or an 
> app and see the workflow as it happens and improve the tooling, 
> etc.

Easiest way out here is to add a /sys/bus/event_source/devices/*/events/
directory which contains files whose name we can use as events and whose
contents are of the form we would use given the format/ stuff.

Example, suppose a westmere,

$ cat /sys/bus/event_source/devices/cpu/events/frontend_stalled_cycles
event=0x0e,umask=0x01,inv,cmask=1
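
One way such a file could be backed, as a rough sketch only (the
struct, macro and function names below are made up for illustration;
they are not an existing interface):

struct perf_event_desc {
	struct device_attribute attr;
	const char *config;
};

static ssize_t perf_event_desc_show(struct device *dev,
				    struct device_attribute *attr, char *buf)
{
	struct perf_event_desc *desc =
		container_of(attr, struct perf_event_desc, attr);

	/* the file body is just the config string */
	return sprintf(buf, "%s\n", desc->config);
}

#define PERF_EVENT_DESC(_name, _config)					\
static struct perf_event_desc desc_##_name = {				\
	.attr	= __ATTR(_name, 0444, perf_event_desc_show, NULL),	\
	.config	= _config,						\
}

PERF_EVENT_DESC(frontend_stalled_cycles, "event=0x0e,umask=0x01,inv,cmask=1");

The desc_*.attr.attr pointers would then go into an attribute group
named "events" hung off the pmu's attr_groups, and the tool side can
resolve cpu/frontend_stalled_cycles/ with the same parser it already
uses for the format/ syntax.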

I'll review the uncore patches later this week, but I suspect the whole
cpu->node mapping stuff is still not done properly.

Also, quick question, did Intel fix the SNB uncore PMI?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/5] perf: Intel uncore pmu counting support
  2012-03-28  6:49 ` [RFC PATCH 0/5] perf: Intel uncore pmu counting support Ingo Molnar
  2012-03-28  8:49   ` Peter Zijlstra
@ 2012-03-28  8:57   ` Andi Kleen
  2012-03-28  9:30     ` Ingo Molnar
  2012-03-28 10:58     ` Peter Zijlstra
  1 sibling, 2 replies; 28+ messages in thread
From: Andi Kleen @ 2012-03-28  8:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Yan, Zheng, a.p.zijlstra, mingo, andi, eranian, linux-kernel, ming.m.lin

On Wed, Mar 28, 2012 at 08:49:28AM +0200, Ingo Molnar wrote:
> 
> * Yan, Zheng <zheng.z.yan@intel.com> wrote:
> 
> > Hi, all
> > 
> > Here is the RFC patches to add uncore counting support for Nehalem,
> > Sandy Bridge and Sandy Bridge-EP, applied on top of current tip.
> > The code is based on Lin Ming's old patches.
> > 
> > You can use 'perf stat' to access to the uncore pmu. For example:
> > perf stat -a -C 0 -e 'uncore_nhm/config=0xffff/' sleep 1
> 
> My main complaint is that that's not user friendly *AT ALL*.
> 
> You need to make this useful to mere mortals: go through the 
> SDM, categorize interesting looking events, look at how it can 

The main problem is that we don't know currently what events are
really useful. People need to play around with the raw events
first to find that out. But to do that they really already need some
in tree support because out of tree is too painful to use: it's 
a chicken and egg problem.

> be expressed via tooling, add a generic event where appropriate, 
> provide examples, actually *USE* it to improve the kernel or an 

Yes it needs to be used, exactly.

To do that need the base line driver be available first, so that
this experience can be gained.

The only way to satisfy your request now would be to put in
some unproven events as generic events, probably not a good idea.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/5] perf: Intel uncore pmu counting support
  2012-03-28  8:49   ` Peter Zijlstra
@ 2012-03-28  9:02     ` Yan, Zheng
  0 siblings, 0 replies; 28+ messages in thread
From: Yan, Zheng @ 2012-03-28  9:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, mingo, andi, eranian, linux-kernel, ming.m.lin,
	Jiri Olsa, Arnaldo Carvalho de Melo

On 03/28/2012 04:49 PM, Peter Zijlstra wrote:
> On Wed, 2012-03-28 at 08:49 +0200, Ingo Molnar wrote:
>> * Yan, Zheng <zheng.z.yan@intel.com> wrote:
>>
>>> Hi, all
>>>
>>> Here is the RFC patches to add uncore counting support for Nehalem,
>>> Sandy Bridge and Sandy Bridge-EP, applied on top of current tip.
>>> The code is based on Lin Ming's old patches.
>>>
>>> You can use 'perf stat' to access to the uncore pmu. For example:
>>> perf stat -a -C 0 -e 'uncore_nhm/config=0xffff/' sleep 1
>>
>> My main complaint is that that's not user friendly *AT ALL*.
>>
>> You need to make this useful to mere mortals: go through the 
>> SDM, categorize interesting looking events, look at how it can 
>> be expressed via tooling, add a generic event where appropriate, 
>> provide examples, actually *USE* it to improve the kernel or an 
>> app and see the workflow as it happens and improve the tooling, 
>> etc.
> 
> Easiest way out here is to add a /sys/bus/event_source/devices/*/events/
> directory which contains files whose name we can use as events and whose
> contents are of the form we would use given the format/ stuff.
> 
> Example, suppose a westmere,
> 
> $ cat /sys/bus/event_source/devices/cpu/events/frontend_stalled_cycles
> event=0x0e,umask=0x01,inv,cmask=1
> 
> I'll review the uncore patches later this week, but I suspect the whole
> cpu->node mapping stuff is still not done properly.
> 
> Also, quick question, did Intel fix the SNB uncore PMI?
No. Furthermore, there is no uncore PMI at all in Sandy Bridge-EP.

Thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] perf: generic intel uncore support
  2012-03-28  6:43 ` [PATCH 2/5] perf: generic intel uncore support Yan, Zheng
@ 2012-03-28  9:24   ` Andi Kleen
  2012-03-28  9:38     ` Peter Zijlstra
  2012-03-28 11:24     ` Yan, Zheng
  2012-03-31  3:18   ` Peter Zijlstra
  1 sibling, 2 replies; 28+ messages in thread
From: Andi Kleen @ 2012-03-28  9:24 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: a.p.zijlstra, mingo, andi, eranian, linux-kernel, ming.m.lin

Overall the driver looks rather good. Thanks.

On Wed, Mar 28, 2012 at 02:43:15PM +0800, Yan, Zheng wrote:
> +static void uncore_perf_event_update(struct intel_uncore_box *box,
> +				      struct perf_event *event)
> +{
> +	raw_spin_lock(&box->lock);

I think a raw lock would be only needed if the uncore was called
from the scheduler context switch, which it should not be.

So you can use a normal lock instead of a raw lock.


> +static void uncore_pmu_start_hrtimer(struct intel_uncore_box *box)
> +{
> +	__hrtimer_start_range_ns(&box->hrtimer,
> +				ns_to_ktime(UNCORE_PMU_HRTIMER_INTERVAL), 0,
> +				HRTIMER_MODE_REL_PINNED, 0);
> +}

Can probably do some slack to be more friendly for power.
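
E.g. something like this, as a sketch: hrtimer_start_range_ns() is the
existing range variant of the API, UNCORE_PMU_HRTIMER_INTERVAL is the
interval the patch already defines, and the 10% slack figure is
arbitrary, just to illustrate:

static void uncore_pmu_start_hrtimer(struct intel_uncore_box *box)
{
	/* let the timer fire anywhere in a ~10% window so the core
	 * can coalesce the wakeup with others
	 */
	hrtimer_start_range_ns(&box->hrtimer,
			ns_to_ktime(UNCORE_PMU_HRTIMER_INTERVAL),
			UNCORE_PMU_HRTIMER_INTERVAL / 10,
			HRTIMER_MODE_REL_PINNED);
}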

> +static struct intel_uncore_box *
> +uncore_pmu_find_box(struct intel_uncore_pmu *pmu, int phyid)
> +{
> +	struct intel_uncore_box *box;
> +
> +	rcu_read_lock();

I'm not sure RCU is really needed here, are any of those paths
time critical? But ok shouldn't hurt either.

> +static int __init uncore_cpu_init(void)
> +{
> +	int ret, cpu;
> +
> +	switch (boot_cpu_data.x86_model) {
> +	default:
> +		return 0;
> +	}

Needs a case? Always returns?

> +
> +	ret = uncore_types_init(msr_uncores);
> +	if (ret)
> +		return ret;
> +
> +	get_online_cpus();
> +	for_each_online_cpu(cpu)
> +		uncore_cpu_prepare(cpu);
> +
> +	preempt_disable();
> +	smp_call_function(uncore_cpu_setup, NULL, 1);
> +	uncore_cpu_setup(NULL);
> +	preempt_enable();

That's on_each_cpu()
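
I.e. the preempt_disable()/smp_call_function() sequence above collapses
to a single call (sketch):

	/* runs uncore_cpu_setup() on every online cpu, the calling one
	 * included, with preemption handled internally
	 */
	on_each_cpu(uncore_cpu_setup, NULL, 1);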


-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/5] perf: Intel uncore pmu counting support
  2012-03-28  8:57   ` Andi Kleen
@ 2012-03-28  9:30     ` Ingo Molnar
  2012-03-28 10:58     ` Peter Zijlstra
  1 sibling, 0 replies; 28+ messages in thread
From: Ingo Molnar @ 2012-03-28  9:30 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Yan, Zheng, a.p.zijlstra, mingo, eranian, linux-kernel, ming.m.lin


* Andi Kleen <andi@firstfloor.org> wrote:

> On Wed, Mar 28, 2012 at 08:49:28AM +0200, Ingo Molnar wrote:
> > 
> > * Yan, Zheng <zheng.z.yan@intel.com> wrote:
> > 
> > > Hi, all
> > > 
> > > Here is the RFC patches to add uncore counting support for Nehalem,
> > > Sandy Bridge and Sandy Bridge-EP, applied on top of current tip.
> > > The code is based on Lin Ming's old patches.
> > > 
> > > You can use 'perf stat' to access to the uncore pmu. For example:
> > > perf stat -a -C 0 -e 'uncore_nhm/config=0xffff/' sleep 1
> > 
> > My main complaint is that that's not user friendly *AT ALL*.
> > 
> > You need to make this useful to mere mortals: go through the 
> > SDM, categorize interesting looking events, look at how it can 
> 
> The main problem is that we don't know currently what events 
> are really useful. People need to play around with the raw 
> events first to find that out. But to do that really already 
> need some in tree support because out of tree is too painful 
> to use: it's a chicken and egg problem.

Nonsense. User-space hacks can be used for experimenting around, 
raw access to /dev/msr has in fact been used to implement uncore 
support in the past ... It has not helped in the least to bring 
useful tooling forward; I'd in fact argue the opposite: access 
to raw events helped *hide* real usecases.

So guys, please give me some tangible reason to merge it and 
some real way for developers to use it. The kernel is already 
complex enough as-is, "it might be useful in the future" or
"I don't want to tell my sekrit usecases" does not fly.

If you can't be bothered to make it useful to everyone then 
frankly we can't be bothered to maintain it upstream, it's 
really that simple.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] perf: generic intel uncore support
  2012-03-28  9:24   ` Andi Kleen
@ 2012-03-28  9:38     ` Peter Zijlstra
  2012-03-28 11:24     ` Yan, Zheng
  1 sibling, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2012-03-28  9:38 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Yan, Zheng, mingo, eranian, linux-kernel, ming.m.lin

On Wed, 2012-03-28 at 11:24 +0200, Andi Kleen wrote:
> On Wed, Mar 28, 2012 at 02:43:15PM +0800, Yan, Zheng wrote:
> > +static void uncore_perf_event_update(struct intel_uncore_box *box,
> > +                                   struct perf_event *event)
> > +{
> > +     raw_spin_lock(&box->lock);
> 
> I think a raw lock would be only needed if the uncore was called
> from the scheduler context switch, which it should not be.
> 
> So you can use a normal lock instead of a raw lock.

Please ignore any and all feedback from wrongbot Andi, as said I'll
review the patches later this week.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 0/5] perf: Intel uncore pmu counting support
  2012-03-28  8:57   ` Andi Kleen
  2012-03-28  9:30     ` Ingo Molnar
@ 2012-03-28 10:58     ` Peter Zijlstra
  1 sibling, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2012-03-28 10:58 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Yan, Zheng, mingo, eranian, linux-kernel, ming.m.lin

On Wed, 2012-03-28 at 10:57 +0200, Andi Kleen wrote:
> The main problem is that we don't know currently what events are
> really useful. People need to play around with the raw events
> first to find that out. But to do that they really already need some
> in tree support because out of tree is too painful to use: it's 
> a chicken and egg problem. 

So you're saying Intel just made the hardware without any idea if it was
useful or not?

Somehow that seems rather unlikely. It seems to me they had at least one
but very likely several strong use-cases before they designed the
hardware.

What Ingo is asking is to implement these, so that everybody can enjoy
them instead of being a conduit for Intel proprietary tools.

A minimal driver with no users except $$ tools is not acceptable.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] perf: generic intel uncore support
  2012-03-28  9:24   ` Andi Kleen
  2012-03-28  9:38     ` Peter Zijlstra
@ 2012-03-28 11:24     ` Yan, Zheng
  1 sibling, 0 replies; 28+ messages in thread
From: Yan, Zheng @ 2012-03-28 11:24 UTC (permalink / raw)
  To: Andi Kleen; +Cc: a.p.zijlstra, mingo, eranian, linux-kernel, ming.m.lin

On 03/28/2012 05:24 PM, Andi Kleen wrote:
> Overall the driver looks rather good. Thanks.
> 
> On Wed, Mar 28, 2012 at 02:43:15PM +0800, Yan, Zheng wrote:
>> +static void uncore_perf_event_update(struct intel_uncore_box *box,
>> +				      struct perf_event *event)
>> +{
>> +	raw_spin_lock(&box->lock);
> 
> I think a raw lock would be only needed if the uncore was called
> from the scheduler context switch, which it should not be.
> 
> So you can use a normal lock instead of a raw lock.
>
> 
>> +static void uncore_pmu_start_hrtimer(struct intel_uncore_box *box)
>> +{
>> +	__hrtimer_start_range_ns(&box->hrtimer,
>> +				ns_to_ktime(UNCORE_PMU_HRTIMER_INTERVAL), 0,
>> +				HRTIMER_MODE_REL_PINNED, 0);
>> +}
> 
> Can probably do some slack to be more friendly for power.
> 
>> +static struct intel_uncore_box *
>> +uncore_pmu_find_box(struct intel_uncore_pmu *pmu, int phyid)
>> +{
>> +	struct intel_uncore_box *box;
>> +
>> +	rcu_read_lock();
> 
> I'm not sure RCU is really needed here, are any of those paths
> time critical? But ok shouldn't hurt either.
>
It's not time critical, but using RCU here is as simple as
using a lock, so I decided to use RCU.

 
>> +static int __init uncore_cpu_init(void)
>> +{
>> +	int ret, cpu;
>> +
>> +	switch (boot_cpu_data.x86_model) {
>> +	default:
>> +		return 0;
>> +	}
> 
> Needs a case? Always returns?

Later patches add cases here.

> 
>> +
>> +	ret = uncore_types_init(msr_uncores);
>> +	if (ret)
>> +		return ret;
>> +
>> +	get_online_cpus();
>> +	for_each_online_cpu(cpu)
>> +		uncore_cpu_prepare(cpu);
>> +
>> +	preempt_disable();
>> +	smp_call_function(uncore_cpu_setup, NULL, 1);
>> +	uncore_cpu_setup(NULL);
>> +	preempt_enable();
> 
> That's on_each_cpu()
> 
Will switch to on_each_cpu() in a later version of the patches.

Thanks
Yan, Zheng
> 
> -Andi


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] perf: generic intel uncore support
  2012-03-28  6:43 ` [PATCH 2/5] perf: generic intel uncore support Yan, Zheng
  2012-03-28  9:24   ` Andi Kleen
@ 2012-03-31  3:18   ` Peter Zijlstra
  2012-04-01  3:11     ` Yan, Zheng
  1 sibling, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2012-03-31  3:18 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: mingo, andi, eranian, linux-kernel, ming.m.lin

On Wed, 2012-03-28 at 14:43 +0800, Yan, Zheng wrote:
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> new file mode 100644
> index 0000000..d159e3e
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> @@ -0,0 +1,814 @@
> +#include "perf_event_intel_uncore.h"
> +
> +static struct intel_uncore_type *empty_uncore[] = { NULL, };
> +static struct intel_uncore_type **msr_uncores = empty_uncore;
> +
> +/* constraint for box with 2 counters */
> +static struct event_constraint unconstrained_2 =
> +       EVENT_CONSTRAINT(0, 0x3, 0);
> +/* constraint for box with 3 counters */
> +static struct event_constraint unconstrained_3 =
> +       EVENT_CONSTRAINT(0, 0x7, 0);
> +/* constraint for box with 4 counters */
> +static struct event_constraint unconstrained_4 =
> +       EVENT_CONSTRAINT(0, 0xf, 0);
> +/* constraint for box with 8 counters */
> +static struct event_constraint unconstrained_8 =
> +       EVENT_CONSTRAINT(0, 0xff, 0);
> +/* constraint for the fixed counters */
> +static struct event_constraint constraint_fixed =
> +       EVENT_CONSTRAINT((u64)-1, 1 << UNCORE_PMC_IDX_FIXED, (u64)-1);

Since they're all different, why not have an struct event_constraint
unconstrained member in your struct intel_uncore_pmu and fill it out
whenever you create that.
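
Something like this, as a sketch; it assumes struct intel_uncore_pmu
grows a 'struct event_constraint unconstrained' member, and the helper
name is made up:

static void uncore_pmu_init_unconstrained(struct intel_uncore_pmu *pmu)
{
	struct event_constraint *c = &pmu->unconstrained;
	int ncntr = pmu->type->num_counters;

	memset(c, 0, sizeof(*c));
	c->idxmsk64 = (1ULL << ncntr) - 1;	/* any general counter */
	c->weight   = ncntr;
	/* code/cmask stay zero, so this matches any event */
}

uncore_event_constraint() then simply falls back to &pmu->unconstrained
instead of picking one of the four globals.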

> +static DEFINE_SPINLOCK(uncore_box_lock);

> +/*
> + * The overflow interrupt is unavailable for SandyBridge-EP, is broken
> + * for SandyBridge. So we use hrtimer to periodically poll the counter
> + */

To avoid overflow and accumulate into the software u64, right? Not to
actually sample anything.

Might also want to say is broken for anything else, since afaik uncore
PMI has been broken for everything with an uncore.


> +static struct intel_uncore_box *
> +__uncore_pmu_find_box(struct intel_uncore_pmu *pmu, int phyid)
> +{
> +       struct intel_uncore_box *box;
> +       struct hlist_head *head;
> +       struct hlist_node *node;
> +
> +       head = &pmu->box_hash[phyid % UNCORE_BOX_HASH_SIZE];
> +
> +       hlist_for_each_entry_rcu(box, node, head, hlist) {
> +               if (box->phy_id == phyid)
> +                       return box;
> +       }
> +
> +       return NULL;
> +}
> +
> +static struct intel_uncore_box *
> +uncore_pmu_find_box(struct intel_uncore_pmu *pmu, int phyid)
> +{
> +       struct intel_uncore_box *box;
> +
> +       rcu_read_lock();
> +       box = __uncore_pmu_find_box(pmu, phyid);
> +       rcu_read_unlock();
> +
> +       return box;
> +}
> +
> +/* caller should hold the uncore_box_lock */
> +static void uncore_pmu_add_box(struct intel_uncore_pmu *pmu,
> +                               struct intel_uncore_box *box)
> +{
> +       struct hlist_head *head;
> +
> +       head = &pmu->box_hash[box->phy_id % UNCORE_BOX_HASH_SIZE];
> +       hlist_add_head_rcu(&box->hlist, head);
> +}
> +
> +static struct intel_uncore_pmu *uncore_event_to_pmu(struct perf_event *event)
> +{
> +       return container_of(event->pmu, struct intel_uncore_pmu, pmu);
> +}
> +
> +static struct intel_uncore_box *uncore_event_to_box(struct perf_event *event)
> +{
> +       int phyid = topology_physical_package_id(smp_processor_id());

Who says that this event has anything to do with the current cpu?

> +       return uncore_pmu_find_box(uncore_event_to_pmu(event), phyid);
> +}

So why not simply use a per-cpu allocation and have something like:

struct intel_uncore_pmu {
	...
	struct intel_uncore_box * __percpu box;
};

static inline
struct intel_uncore_box *uncore_event_to_box(struct perf_event *event)
{
	return per_cpu_ptr(event->pmu->box, event->cpu);
}

And be done with it?
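
Spelled out a little, since event->pmu points at the embedded struct
pmu rather than the uncore pmu itself; still only a sketch on top of
the __percpu member suggested above:

static inline
struct intel_uncore_box *uncore_event_to_box(struct perf_event *event)
{
	struct intel_uncore_pmu *pmu =
		container_of(event->pmu, struct intel_uncore_pmu, pmu);

	return *per_cpu_ptr(pmu->box, event->cpu);
}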

> +static int uncore_collect_events(struct intel_uncore_box *box,
> +                         struct perf_event *leader, bool dogrp)
> +{
> +       struct perf_event *event;
> +       int n, max_count;
> +
> +       max_count = box->pmu->type->num_counters;
> +       if (box->pmu->type->fixed_ctl)
> +               max_count++;
> +
> +       if (box->n_events >= max_count)
> +               return -EINVAL;
> +
> +       /*
> +        * adding the same events twice to the uncore PMU may cause
> +        * general protection fault
> +        */

Is that an errata or a 'feature' of some specific box types, or what?

> +       for (n = 0; n < box->n_events; n++) {
> +               event = box->event_list[n];
> +               if (event->hw.config == leader->hw.config)
> +                       return -EINVAL;
> +       }
> +
> +       n = box->n_events;
> +       box->event_list[n] = leader;
> +       n++;
> +       if (!dogrp)
> +               return n;
> +
> +       list_for_each_entry(event, &leader->sibling_list, group_entry) {
> +               if (event->state <= PERF_EVENT_STATE_OFF)
> +                       continue;
> +
> +               if (n >= max_count)
> +                       return -EINVAL;
> +
> +               box->event_list[n] = event;
> +               n++;
> +       }
> +       return n;
> +}
> +
> +static struct event_constraint *
> +uncore_event_constraint(struct intel_uncore_type *type,
> +                       struct perf_event *event)
> +{
> +       struct event_constraint *c;
> +
> +       if (event->hw.config == (u64)-1)
> +               return &constraint_fixed;
> +
> +       if (type->constraints) {
> +               for_each_event_constraint(c, type->constraints) {
> +                       if ((event->hw.config & c->cmask) == c->code)
> +                               return c;
> +               }
> +       }
> +
> +       if (type->num_counters == 2)
> +               return &unconstrained_2;
> +       if (type->num_counters == 3)
> +               return &unconstrained_3;
> +       if (type->num_counters == 4)
> +               return &unconstrained_4;
> +       if (type->num_counters == 8)
> +               return &unconstrained_8;
> +
> +       WARN_ON_ONCE(1);
> +       return &unconstrained_2;

	return event->pmu->unconstrained;

seems much saner to me..

> +}
> +
> +static int uncore_assign_events(struct intel_uncore_box *box,
> +                               int assign[], int n)
> +{
> +       struct event_constraint *c, *constraints[UNCORE_PMC_IDX_MAX];
> +       int i, ret, wmin, wmax;
> +
> +       for (i = 0, wmin = X86_PMC_IDX_MAX, wmax = 0; i < n; i++) {
> +               c = uncore_event_constraint(box->pmu->type,
> +                                       box->event_list[i]);
> +               constraints[i] = c;
> +               wmin = min(wmin, c->weight);
> +               wmax = max(wmax, c->weight);
> +       }

No fast path then?
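
Something like the core x86 fastpath could sit in front of the
perf_assign_events() call; a sketch, meant to live inside
uncore_assign_events() and reusing the variables already in scope
there (constraints[], assign[], n, i):

	unsigned long used_mask[BITS_TO_LONGS(UNCORE_PMC_IDX_MAX)];
	struct hw_perf_event *hwc;

	bitmap_zero(used_mask, UNCORE_PMC_IDX_MAX);

	for (i = 0; i < n; i++) {
		hwc = &box->event_list[i]->hw;

		/* never scheduled before, constraint no longer met, or
		 * the old counter already taken by an earlier event
		 */
		if (hwc->idx < 0 ||
		    !test_bit(hwc->idx, constraints[i]->idxmsk) ||
		    __test_and_set_bit(hwc->idx, used_mask))
			break;

		assign[i] = hwc->idx;
	}
	if (i == n)
		return 0;	/* everybody kept its previous counter */

	/* otherwise fall through to perf_assign_events() */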

> +       ret = perf_assign_events(constraints, n, wmin, wmax, assign);
> +       return ret ? -EINVAL : 0;
> +}


> +static void uncore_pmu_event_start(struct perf_event *event, int flags)
> +{
> +       struct intel_uncore_box *box = uncore_event_to_box(event);
> +
> +       raw_spin_lock(&box->lock);
> +       __uncore_pmu_event_start(box, event, flags);
> +       raw_spin_unlock(&box->lock);
> +}

> +static void uncore_pmu_event_stop(struct perf_event *event, int flags)
> +{
> +       struct intel_uncore_box *box = uncore_event_to_box(event);
> +
> +       raw_spin_lock(&box->lock);
> +       __uncore_pmu_event_stop(box, event, flags);
> +       raw_spin_unlock(&box->lock);
> +}

> +static int uncore_pmu_event_add(struct perf_event *event, int flags)
> +{
> +       struct intel_uncore_box *box = uncore_event_to_box(event);
> +       struct hw_perf_event *hwc = &event->hw;
> +       int assign[UNCORE_PMC_IDX_MAX];
> +       int i, n, ret;
> +
> +       if (!box)
> +               return -ENODEV;
> +
> +       raw_spin_lock(&box->lock);

> +       raw_spin_unlock(&box->lock);
> +       return ret;
> +}
> +
> +static void uncore_pmu_event_del(struct perf_event *event, int flags)
> +{
> +       struct intel_uncore_box *box = uncore_event_to_box(event);
> +       int i;
> +
> +       raw_spin_lock(&box->lock);

> +       raw_spin_unlock(&box->lock);
> +}

So what's up with all this box->lock business.. why does that lock
exist?

> +static int __init uncore_pmu_register(struct intel_uncore_pmu *pmu)
> +{
> +       int ret;
> +
> +       pmu->pmu.attr_groups    = pmu->type->attr_groups;
> +       pmu->pmu.task_ctx_nr    = perf_invalid_context;
> +       pmu->pmu.event_init     = uncore_pmu_event_init;
> +       pmu->pmu.add            = uncore_pmu_event_add;
> +       pmu->pmu.del            = uncore_pmu_event_del;
> +       pmu->pmu.start          = uncore_pmu_event_start;
> +       pmu->pmu.stop           = uncore_pmu_event_stop;
> +       pmu->pmu.read           = uncore_pmu_event_read;

Won't this look better as a C99 struct init? Something like:

	pmu->pmu = (struct pmu){
		.attr_groups	= pmu->type->attr_groups,
		.task_ctx_nr	= perf_invalid_context,
		.event_init	= uncore_pmu_event_init,
		...
	};

> +       if (pmu->type->num_boxes == 1)
> +               sprintf(pmu->name, "uncore_%s", pmu->type->name);
> +       else
> +               sprintf(pmu->name, "uncore_%s%d", pmu->type->name,
> +                       pmu->pmu_idx);
> +
> +       ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
> +       return ret;
> +}


> +static int __init uncore_type_init(struct intel_uncore_type *type)
> +{
> +       struct intel_uncore_pmu *pmus;
> +       struct attribute_group *events_group;
> +       struct attribute **attrs;
> +       int i, j;
> +
> +       pmus = kzalloc(sizeof(*pmus) * type->num_boxes, GFP_KERNEL);
> +       if (!pmus)
> +               return -ENOMEM;

Hmm, but if you have a pmu per number of boxes, then what do you need
that  pmu->box reference for?

> +
> +       for (i = 0; i < type->num_boxes; i++) {
> +               pmus[i].func_id = -1;
> +               pmus[i].pmu_idx = i;
> +               pmus[i].type = type;
> +
> +               for (j = 0; j < ARRAY_SIZE(pmus[0].box_hash); j++)
> +                       INIT_HLIST_HEAD(&pmus[i].box_hash[j]);
> +       }
> +
> +       if (type->event_descs) {
> +               for (i = 0; ; i++) {
> +                       if (!type->event_descs[i].attr.attr.name)
> +                               break;
> +               }
> +
> +               events_group = kzalloc(sizeof(struct attribute *) * (i + 1) +
> +                               sizeof(*events_group), GFP_KERNEL);
> +               if (!events_group)
> +                       goto fail;
> +
> +               attrs = (struct attribute **)(events_group + 1);
> +               events_group->name = "events";
> +               events_group->attrs = attrs;
> +
> +               for (j = 0; j < i; j++)
> +                       attrs[j] = &type->event_descs[j].attr.attr;
> +
> +               type->attr_groups[1] = events_group;
> +       }
> +       type->pmus = pmus;
> +       return 0;
> +fail:
> +       uncore_type_exit(type);
> +       return -ENOMEM;
> +}
> +


Aside from all this, there's still the problem that you don't place all
events for a particular phys_id onto a single cpu. It doesn't matter
which cpu in that package it is, but all events should go to the same.

This means that on unplug of that cpu, you have to migrate all these
events etc..
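
Roughly the shape of the unplug side; a sketch only. The cpumask and
uncore_migrate_events() below are hypothetical, the latter being
exactly the perf core support that is missing today:

static cpumask_t uncore_cpu_mask;	/* one collecting cpu per package */

/* called from a CPU_DOWN_PREPARE notifier */
static void uncore_event_exit_cpu(int cpu)
{
	int i, target = -1;
	int phyid = topology_physical_package_id(cpu);

	/* nothing to do unless this cpu collects events for its package */
	if (!cpumask_test_and_clear_cpu(cpu, &uncore_cpu_mask))
		return;

	/* pick any other online cpu in the same package */
	for_each_online_cpu(i) {
		if (i != cpu && topology_physical_package_id(i) == phyid) {
			target = i;
			break;
		}
	}
	if (target < 0)
		return;		/* last cpu of the package going away */

	cpumask_set_cpu(target, &uncore_cpu_mask);

	/* hypothetical: re-bind every uncore event from cpu to target */
	uncore_migrate_events(cpu, target);
}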

I suspect doing this will also allow you to get rid of that box->lock
thing.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] perf: generic intel uncore support
  2012-03-31  3:18   ` Peter Zijlstra
@ 2012-04-01  3:11     ` Yan, Zheng
  2012-04-02 22:10       ` Peter Zijlstra
                         ` (4 more replies)
  0 siblings, 5 replies; 28+ messages in thread
From: Yan, Zheng @ 2012-04-01  3:11 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, andi, eranian, linux-kernel, ming.m.lin

On 03/31/2012 11:18 AM, Peter Zijlstra wrote:
> On Wed, 2012-03-28 at 14:43 +0800, Yan, Zheng wrote:
>> diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
>> new file mode 100644
>> index 0000000..d159e3e
>> --- /dev/null
>> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
>> @@ -0,0 +1,814 @@
>> +#include "perf_event_intel_uncore.h"
>> +
>> +static struct intel_uncore_type *empty_uncore[] = { NULL, };
>> +static struct intel_uncore_type **msr_uncores = empty_uncore;
>> +
>> +/* constraint for box with 2 counters */
>> +static struct event_constraint unconstrained_2 =
>> +       EVENT_CONSTRAINT(0, 0x3, 0);
>> +/* constraint for box with 3 counters */
>> +static struct event_constraint unconstrained_3 =
>> +       EVENT_CONSTRAINT(0, 0x7, 0);
>> +/* constraint for box with 4 counters */
>> +static struct event_constraint unconstrained_4 =
>> +       EVENT_CONSTRAINT(0, 0xf, 0);
>> +/* constraint for box with 8 counters */
>> +static struct event_constraint unconstrained_8 =
>> +       EVENT_CONSTRAINT(0, 0xff, 0);
>> +/* constraint for the fixed counters */
>> +static struct event_constraint constraint_fixed =
>> +       EVENT_CONSTRAINT((u64)-1, 1 << UNCORE_PMC_IDX_FIXED, (u64)-1);
> 
> Since they're all different, why not have an struct event_constraint
> unconstrained member in your struct intel_uncore_pmu and fill it out
> whenever you create that.
> 
>> +static DEFINE_SPINLOCK(uncore_box_lock);
> 
>> +/*
>> + * The overflow interrupt is unavailable for SandyBridge-EP, is broken
>> + * for SandyBridge. So we use hrtimer to periodically poll the counter
>> + */
> 
> To avoid overflow and accumulate into the software u64, right? Not to
> actually sample anything.
Yes

> 
> Might also want to say is broken for anything else, since afaik uncore
> PMI has been broken for everything with an uncore.
> 
> 
>> +static struct intel_uncore_box *
>> +__uncore_pmu_find_box(struct intel_uncore_pmu *pmu, int phyid)
>> +{
>> +       struct intel_uncore_box *box;
>> +       struct hlist_head *head;
>> +       struct hlist_node *node;
>> +
>> +       head = &pmu->box_hash[phyid % UNCORE_BOX_HASH_SIZE];
>> +
>> +       hlist_for_each_entry_rcu(box, node, head, hlist) {
>> +               if (box->phy_id == phyid)
>> +                       return box;
>> +       }
>> +
>> +       return NULL;
>> +}
>> +
>> +static struct intel_uncore_box *
>> +uncore_pmu_find_box(struct intel_uncore_pmu *pmu, int phyid)
>> +{
>> +       struct intel_uncore_box *box;
>> +
>> +       rcu_read_lock();
>> +       box = __uncore_pmu_find_box(pmu, phyid);
>> +       rcu_read_unlock();
>> +
>> +       return box;
>> +}
>> +
>> +/* caller should hold the uncore_box_lock */
>> +static void uncore_pmu_add_box(struct intel_uncore_pmu *pmu,
>> +                               struct intel_uncore_box *box)
>> +{
>> +       struct hlist_head *head;
>> +
>> +       head = &pmu->box_hash[box->phy_id % UNCORE_BOX_HASH_SIZE];
>> +       hlist_add_head_rcu(&box->hlist, head);
>> +}
>> +
>> +static struct intel_uncore_pmu *uncore_event_to_pmu(struct perf_event *event)
>> +{
>> +       return container_of(event->pmu, struct intel_uncore_pmu, pmu);
>> +}
>> +
>> +static struct intel_uncore_box *uncore_event_to_box(struct perf_event *event)
>> +{
>> +       int phyid = topology_physical_package_id(smp_processor_id());
> 
> Who says that this event has anything to do with the current cpu?
> 
Because the perf core schedules events on a per-cpu basis.

>> +       return uncore_pmu_find_box(uncore_event_to_pmu(event), phyid);
>> +}
> 
> So why not simply use a per-cpu allocation and have something like:
> 
> struct intel_uncore_pmu {
> 	...
> 	struct intel_uncore_box * __percpu box;
> };
> 
> static inline
> struct intel_uncore_box *uncore_event_to_box(struct perf_event *event)
> {
> 	return per_cpu_ptr(event->pmu->box, event->cpu);
> }
> 
> And be done with it?

Because using per-cpu allocation is inconvenient for PCI uncore devices.

> 
>> +static int uncore_collect_events(struct intel_uncore_box *box,
>> +                         struct perf_event *leader, bool dogrp)
>> +{
>> +       struct perf_event *event;
>> +       int n, max_count;
>> +
>> +       max_count = box->pmu->type->num_counters;
>> +       if (box->pmu->type->fixed_ctl)
>> +               max_count++;
>> +
>> +       if (box->n_events >= max_count)
>> +               return -EINVAL;
>> +
>> +       /*
>> +        * adding the same events twice to the uncore PMU may cause
>> +        * general protection fault
>> +        */
> 
> Is that an errata or a 'feature' of some specific box types, or what?
> 
>> +       for (n = 0; n < box->n_events; n++) {
>> +               event = box->event_list[n];
>> +               if (event->hw.config == leader->hw.config)
>> +                       return -EINVAL;
>> +       }
>> +
>> +       n = box->n_events;
>> +       box->event_list[n] = leader;
>> +       n++;
>> +       if (!dogrp)
>> +               return n;
>> +
>> +       list_for_each_entry(event, &leader->sibling_list, group_entry) {
>> +               if (event->state <= PERF_EVENT_STATE_OFF)
>> +                       continue;
>> +
>> +               if (n >= max_count)
>> +                       return -EINVAL;
>> +
>> +               box->event_list[n] = event;
>> +               n++;
>> +       }
>> +       return n;
>> +}
>> +
>> +static struct event_constraint *
>> +uncore_event_constraint(struct intel_uncore_type *type,
>> +                       struct perf_event *event)
>> +{
>> +       struct event_constraint *c;
>> +
>> +       if (event->hw.config == (u64)-1)
>> +               return &constraint_fixed;
>> +
>> +       if (type->constraints) {
>> +               for_each_event_constraint(c, type->constraints) {
>> +                       if ((event->hw.config & c->cmask) == c->code)
>> +                               return c;
>> +               }
>> +       }
>> +
>> +       if (type->num_counters == 2)
>> +               return &unconstrained_2;
>> +       if (type->num_counters == 3)
>> +               return &unconstrained_3;
>> +       if (type->num_counters == 4)
>> +               return &unconstrained_4;
>> +       if (type->num_counters == 8)
>> +               return &unconstrained_8;
>> +
>> +       WARN_ON_ONCE(1);
>> +       return &unconstrained_2;
> 
> 	return event->pmu->unconstrained;
> 
> seems much saner to me..

will change the code
> 
>> +}
>> +
>> +static int uncore_assign_events(struct intel_uncore_box *box,
>> +                               int assign[], int n)
>> +{
>> +       struct event_constraint *c, *constraints[UNCORE_PMC_IDX_MAX];
>> +       int i, ret, wmin, wmax;
>> +
>> +       for (i = 0, wmin = X86_PMC_IDX_MAX, wmax = 0; i < n; i++) {
>> +               c = uncore_event_constraint(box->pmu->type,
>> +                                       box->event_list[i]);
>> +               constraints[i] = c;
>> +               wmin = min(wmin, c->weight);
>> +               wmax = max(wmax, c->weight);
>> +       }
> 
> No fast path then?

will add the fast path
> 
>> +       ret = perf_assign_events(constraints, n, wmin, wmax, assign);
>> +       return ret ? -EINVAL : 0;
>> +}
> 
> 
>> +static void uncore_pmu_event_start(struct perf_event *event, int flags)
>> +{
>> +       struct intel_uncore_box *box = uncore_event_to_box(event);
>> +
>> +       raw_spin_lock(&box->lock);
>> +       __uncore_pmu_event_start(box, event, flags);
>> +       raw_spin_unlock(&box->lock);
>> +}
> 
>> +static void uncore_pmu_event_stop(struct perf_event *event, int flags)
>> +{
>> +       struct intel_uncore_box *box = uncore_event_to_box(event);
>> +
>> +       raw_spin_lock(&box->lock);
>> +       __uncore_pmu_event_stop(box, event, flags);
>> +       raw_spin_unlock(&box->lock);
>> +}
> 
>> +static int uncore_pmu_event_add(struct perf_event *event, int flags)
>> +{
>> +       struct intel_uncore_box *box = uncore_event_to_box(event);
>> +       struct hw_perf_event *hwc = &event->hw;
>> +       int assign[UNCORE_PMC_IDX_MAX];
>> +       int i, n, ret;
>> +
>> +       if (!box)
>> +               return -ENODEV;
>> +
>> +       raw_spin_lock(&box->lock);
> 
>> +       raw_spin_unlock(&box->lock);
>> +       return ret;
>> +}
>> +
>> +static void uncore_pmu_event_del(struct perf_event *event, int flags)
>> +{
>> +       struct intel_uncore_box *box = uncore_event_to_box(event);
>> +       int i;
>> +
>> +       raw_spin_lock(&box->lock);
> 
>> +       raw_spin_unlock(&box->lock);
>> +}
> 
> So what's up with all this box->lock business.. why does that lock
> exist?

If the user doesn't provide the "-C x" option to the perf tool, multiple cpus
will try adding/deleting events at the same time.

> 
>> +static int __init uncore_pmu_register(struct intel_uncore_pmu *pmu)
>> +{
>> +       int ret;
>> +
>> +       pmu->pmu.attr_groups    = pmu->type->attr_groups;
>> +       pmu->pmu.task_ctx_nr    = perf_invalid_context;
>> +       pmu->pmu.event_init     = uncore_pmu_event_init;
>> +       pmu->pmu.add            = uncore_pmu_event_add;
>> +       pmu->pmu.del            = uncore_pmu_event_del;
>> +       pmu->pmu.start          = uncore_pmu_event_start;
>> +       pmu->pmu.stop           = uncore_pmu_event_stop;
>> +       pmu->pmu.read           = uncore_pmu_event_read;
> 
> Won't this look better as a C99 struct init? Something like:
> 
> 	pmu->pmu = (struct pmu){
> 		.attr_groups	= pmu->type->attr_groups,
> 		.task_ctx_nr	= perf_invalid_context,
> 		.event_init	= uncore_pmu_event_init,
> 		...
> 	};
> 
>> +       if (pmu->type->num_boxes == 1)
>> +               sprintf(pmu->name, "uncore_%s", pmu->type->name);
>> +       else
>> +               sprintf(pmu->name, "uncore_%s%d", pmu->type->name,
>> +                       pmu->pmu_idx);
>> +
>> +       ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
>> +       return ret;
>> +}
> 
> 
>> +static int __init uncore_type_init(struct intel_uncore_type *type)
>> +{
>> +       struct intel_uncore_pmu *pmus;
>> +       struct attribute_group *events_group;
>> +       struct attribute **attrs;
>> +       int i, j;
>> +
>> +       pmus = kzalloc(sizeof(*pmus) * type->num_boxes, GFP_KERNEL);
>> +       if (!pmus)
>> +               return -ENOMEM;
> 
> Hmm, but if you have a pmu per number of boxes, then what do you need
> that  pmu->box reference for?

Type->num_boxes is the number of boxes within one physical cpu. pmu->box_hash
is needed because there can be several physical cpus in a system.  

> 
>> +
>> +       for (i = 0; i < type->num_boxes; i++) {
>> +               pmus[i].func_id = -1;
>> +               pmus[i].pmu_idx = i;
>> +               pmus[i].type = type;
>> +
>> +               for (j = 0; j < ARRAY_SIZE(pmus[0].box_hash); j++)
>> +                       INIT_HLIST_HEAD(&pmus[i].box_hash[j]);
>> +       }
>> +
>> +       if (type->event_descs) {
>> +               for (i = 0; ; i++) {
>> +                       if (!type->event_descs[i].attr.attr.name)
>> +                               break;
>> +               }
>> +
>> +               events_group = kzalloc(sizeof(struct attribute *) * (i + 1) +
>> +                               sizeof(*events_group), GFP_KERNEL);
>> +               if (!events_group)
>> +                       goto fail;
>> +
>> +               attrs = (struct attribute **)(events_group + 1);
>> +               events_group->name = "events";
>> +               events_group->attrs = attrs;
>> +
>> +               for (j = 0; j < i; j++)
>> +                       attrs[j] = &type->event_descs[j].attr.attr;
>> +
>> +               type->attr_groups[1] = events_group;
>> +       }
>> +       type->pmus = pmus;
>> +       return 0;
>> +fail:
>> +       uncore_type_exit(type);
>> +       return -ENOMEM;
>> +}
>> +
> 
> 
> Aside from all this, there's still the problem that you don't place all
> events for a particular phys_id onto a single cpu. It doesn't matter
> which cpu in that package it is, but all events should go to the same.
> 
> This means that on unplug of that cpu, you have to migrate all these
> events etc..

Any hints on how to do this? I'm afraid it requires big changes to the perf core.

Thank you very much
Yan, Zheng

> 
> I suspect doing this will also allow you to get rid of that box->lock
> thing.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] perf: generic intel uncore support
  2012-04-01  3:11     ` Yan, Zheng
@ 2012-04-02 22:10       ` Peter Zijlstra
  2012-04-02 22:11       ` Peter Zijlstra
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2012-04-02 22:10 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: mingo, andi, eranian, linux-kernel, ming.m.lin

On Sun, 2012-04-01 at 11:11 +0800, Yan, Zheng wrote:
> >> +static struct intel_uncore_box *uncore_event_to_box(struct perf_event *event)
> >> +{
> >> +       int phyid = topology_physical_package_id(smp_processor_id());
> > 
> > Who says that this event has anything to do with the current cpu?
> > 
> Because the perf core schedules events on a per-cpu basis.

This appears true for all the current callchains; however, there's no
comment in there making clear you thought about this, nor anything to
remind future users to respect this constraint.




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] perf: generic intel uncore support
  2012-04-01  3:11     ` Yan, Zheng
  2012-04-02 22:10       ` Peter Zijlstra
@ 2012-04-02 22:11       ` Peter Zijlstra
  2012-04-03  8:28         ` Yan, Zheng
  2012-04-02 22:16       ` Peter Zijlstra
                         ` (2 subsequent siblings)
  4 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2012-04-02 22:11 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: mingo, andi, eranian, linux-kernel, ming.m.lin

On Sun, 2012-04-01 at 11:11 +0800, Yan, Zheng wrote:
> 
> Because using per-cpu allocation is inconvenient for PCI uncore devices.

What are those, where does one read about them and why?


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] perf: generic intel uncore support
  2012-04-01  3:11     ` Yan, Zheng
  2012-04-02 22:10       ` Peter Zijlstra
  2012-04-02 22:11       ` Peter Zijlstra
@ 2012-04-02 22:16       ` Peter Zijlstra
  2012-04-02 22:24       ` Peter Zijlstra
  2012-04-16 12:07       ` Peter Zijlstra
  4 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2012-04-02 22:16 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: mingo, andi, eranian, linux-kernel, ming.m.lin

On Sun, 2012-04-01 at 11:11 +0800, Yan, Zheng wrote:
> > So what's up with all this box->lock business.. why does that lock
> > exist?
> 
> If the user doesn't provide the "-C x" option to the perf tool, multiple cpus
> will try adding/deleting events at the same time. 

Right, however:

> > Aside from all this, there's still the problem that you don't place all
> > events for a particular phys_id onto a single cpu. It doesn't matter
> > which cpu in that package it is, but all events should go to the same.
> > 
> > This means that on unplug of that cpu, you have to migrate all these
> > events etc..
> 
> Any hints on how to do this? I'm afraid it requires big changes to the perf core.

Yes, it'll need some core changes, but I don't think it'll be too bad.
I'll try and write up something soonish.. 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] perf: generic intel uncore support
  2012-04-01  3:11     ` Yan, Zheng
                         ` (2 preceding siblings ...)
  2012-04-02 22:16       ` Peter Zijlstra
@ 2012-04-02 22:24       ` Peter Zijlstra
  2012-04-16 12:07       ` Peter Zijlstra
  4 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2012-04-02 22:24 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: mingo, andi, eranian, linux-kernel, ming.m.lin

On Sun, 2012-04-01 at 11:11 +0800, Yan, Zheng wrote:
> 
> >> +static int __init uncore_type_init(struct intel_uncore_type *type)
> >> +{
> >> +       struct intel_uncore_pmu *pmus;
> >> +       struct attribute_group *events_group;
> >> +       struct attribute **attrs;
> >> +       int i, j;
> >> +
> >> +       pmus = kzalloc(sizeof(*pmus) * type->num_boxes, GFP_KERNEL);
> >> +       if (!pmus)
> >> +               return -ENOMEM;
> > 
> > Hmm, but if you have a pmu per number of boxes, then what do you need
> > that  pmu->box reference for?
> 
> Type->num_boxes is number of boxes within one physical cpu. pmu->box_hash
> is needed because there can be several physical cpus in a system.  

Ah, indeed. 
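
Spelled out, the lookup amounts to something like the following (sketch
only; the patch keys a pmu->box_hash by physical package id, a plain list
and guessed field names are used here for brevity):

static struct intel_uncore_box *
uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int phyid)
{
	struct intel_uncore_box *box;

	/* one pmu per box type within a package, one box per package */
	list_for_each_entry(box, &pmu->box_list, list) {
		if (box->phys_id == phyid)
			return box;
	}
	return NULL;
}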


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] perf: generic intel uncore support
  2012-04-02 22:11       ` Peter Zijlstra
@ 2012-04-03  8:28         ` Yan, Zheng
  2012-04-03 14:29           ` Peter Zijlstra
  0 siblings, 1 reply; 28+ messages in thread
From: Yan, Zheng @ 2012-04-03  8:28 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, andi, eranian, linux-kernel, ming.m.lin

On 04/03/2012 06:11 AM, Peter Zijlstra wrote:
> On Sun, 2012-04-01 at 11:11 +0800, Yan, Zheng wrote:
>>
>> Because using per-cpu allocation is inconvenience for PCI uncore device.
> 
> What are those, where does one read about them and why?
> 
PCI uncore device support was added by patch 4. If we used a per-cpu pointer,
we would have to set it up after the PCI driver's probe function recognizes
the uncore device. That means we would first have to add the PCI uncore
device to a list, then call a function on all cpus to scan the list, find
the right uncore device and set up the per-cpu pointer.
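
For illustration, the per-cpu scheme would need roughly the following
machinery just to publish the box (sketch only; every name below is
invented and locking is omitted):

static DEFINE_PER_CPU(struct intel_uncore_box *, uncore_pci_box);
static LIST_HEAD(uncore_pending_boxes);

/* runs on every cpu, each one fishes its package's box out of the list */
static void uncore_setup_percpu_box(void *info)
{
	int phyid = topology_physical_package_id(smp_processor_id());
	struct intel_uncore_box *box;

	list_for_each_entry(box, &uncore_pending_boxes, list) {
		if (box->phys_id == phyid)
			__this_cpu_write(uncore_pci_box, box);
	}
}

static int uncore_pci_probe(struct pci_dev *pdev,
			    const struct pci_device_id *id)
{
	struct intel_uncore_box *box = uncore_alloc_box(pdev);	/* invented */

	if (!box)
		return -ENOMEM;

	list_add(&box->list, &uncore_pending_boxes);
	/* every online cpu has to rescan the list to find its box */
	on_each_cpu(uncore_setup_percpu_box, NULL, 1);
	return 0;
}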

Regards
Yan, Zheng

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] perf: generic intel uncore support
  2012-04-03  8:28         ` Yan, Zheng
@ 2012-04-03 14:29           ` Peter Zijlstra
  2012-04-04  1:47             ` Yan, Zheng
  2012-04-10  0:48             ` Yan, Zheng
  0 siblings, 2 replies; 28+ messages in thread
From: Peter Zijlstra @ 2012-04-03 14:29 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: mingo, andi, eranian, linux-kernel, ming.m.lin

On Tue, 2012-04-03 at 16:28 +0800, Yan, Zheng wrote:
> On 04/03/2012 06:11 AM, Peter Zijlstra wrote:
> > On Sun, 2012-04-01 at 11:11 +0800, Yan, Zheng wrote:
> >>
> >> Because using per-cpu allocation is inconvenience for PCI uncore device.
> > 
> > What are those, where does one read about them and why?
> > 
> PCI uncore device support was added by patch 4. If using per-cpu pointer,
> we have to setup the per-cpu pointer after the PCI driver's probe function
> recognizes the uncore device. It means we have to first add the PCI uncore
> device to a list, then call a function on all cpus to scan the list, find
> the right uncore device and setup the per-cpu pointer.

Yeah, I saw that, but it was all completely lacking in where one can
find actual detail on that stuff. What parts of the SDM do I read? Is
there anything other than the SDM, etc..



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] perf: generic intel uncore support
  2012-04-03 14:29           ` Peter Zijlstra
@ 2012-04-04  1:47             ` Yan, Zheng
  2012-04-10  0:48             ` Yan, Zheng
  1 sibling, 0 replies; 28+ messages in thread
From: Yan, Zheng @ 2012-04-04  1:47 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, andi, eranian, linux-kernel, ming.m.lin

On 04/03/2012 10:29 PM, Peter Zijlstra wrote:
> On Tue, 2012-04-03 at 16:28 +0800, Yan, Zheng wrote:
>> On 04/03/2012 06:11 AM, Peter Zijlstra wrote:
>>> On Sun, 2012-04-01 at 11:11 +0800, Yan, Zheng wrote:
>>>>
>>>> Because using per-cpu allocation is inconvenience for PCI uncore device.
>>>
>>> What are those, where does one read about them and why?
>>>
>> PCI uncore device support was added by patch 4. If using per-cpu pointer,
>> we have to setup the per-cpu pointer after the PCI driver's probe function
>> recognizes the uncore device. It means we have to first add the PCI uncore
>> device to a list, then call a function on all cpus to scan the list, find
>> the right uncore device and setup the per-cpu pointer.
> 
> Yeah, I saw that, but it was all completely lacking in where one can
> find actual detail on that stuff. What parts of the SDM do I read? Is
> there anything other than the SDM, etc..
> 

I'm sorry, the SDM for the Sandy Bridge-EP uncore has not been released to the public so far.

Regards
Yan, Zheng



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] perf: generic intel uncore support
  2012-04-03 14:29           ` Peter Zijlstra
  2012-04-04  1:47             ` Yan, Zheng
@ 2012-04-10  0:48             ` Yan, Zheng
  2012-04-16 12:11               ` Peter Zijlstra
  1 sibling, 1 reply; 28+ messages in thread
From: Yan, Zheng @ 2012-04-10  0:48 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, andi, eranian, linux-kernel, ming.m.lin

On 04/03/2012 10:29 PM, Peter Zijlstra wrote:
> On Tue, 2012-04-03 at 16:28 +0800, Yan, Zheng wrote:
>> On 04/03/2012 06:11 AM, Peter Zijlstra wrote:
>>> On Sun, 2012-04-01 at 11:11 +0800, Yan, Zheng wrote:
>>>>
>>>> Because using per-cpu allocation is inconvenience for PCI uncore device.
>>>
>>> What are those, where does one read about them and why?
>>>
>> PCI uncore device support was added by patch 4. If using per-cpu pointer,
>> we have to setup the per-cpu pointer after the PCI driver's probe function
>> recognizes the uncore device. It means we have to first add the PCI uncore
>> device to a list, then call a function on all cpus to scan the list, find
>> the right uncore device and setup the per-cpu pointer.
> 
> Yeah, I saw that, but it was all completely lacking in where one can
> find actual detail on that stuff. What parts of the SDM do I read? Is
> there anything other than the SDM, etc..
> 
The uncore performance monitoring guide:

http://www.intel.com/content/dam/www/public/us/en/documents/design-guides/xeon-e5-2600-uncore-guide.pdf

Regards
Yan, Zheng

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] perf: generic intel uncore support
  2012-04-01  3:11     ` Yan, Zheng
                         ` (3 preceding siblings ...)
  2012-04-02 22:24       ` Peter Zijlstra
@ 2012-04-16 12:07       ` Peter Zijlstra
  2012-04-17  6:56         ` Yan, Zheng
  4 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2012-04-16 12:07 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: mingo, andi, eranian, linux-kernel, ming.m.lin

On Sun, 2012-04-01 at 11:11 +0800, Yan, Zheng wrote:
> Any hints how to do this. I'm afraid it requires big changes to perf core.


Sorry for taking so long..


I think something like the (completely untested) below should suffice..

In your driver, have hotplug notifiers keep track of which cpu is the
active cpu for your node; if that needs to change because the cpu is going
offline, pick a new one and use the below function to migrate the events.

The only missing piece is not doing the normal
perf_event_exit_cpu_context() thing for these PMUs, except of course
once there are no cpus left in your node.

Doing that might want an extra struct pmu method, which if not set
defaults to perf_event_exit_cpu_context, and otherwise does your custom
migrate/exit.
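
On the driver side, the notifier could look something like this (equally
untested; the active-cpu bookkeeping, the uncore pmu list and the helper
that picks a replacement cpu are invented names, the function used is the
perf_pmu_migrate_context() added below):

static int uncore_active_cpu[MAX_PACKAGES];	/* invented bookkeeping */

static int uncore_cpu_notifier(struct notifier_block *self,
			       unsigned long action, void *hcpu)
{
	unsigned int cpu = (unsigned long)hcpu;
	int phyid = topology_physical_package_id(cpu);
	struct intel_uncore_pmu *pmu;
	int target;

	if ((action & ~CPU_TASKS_FROZEN) != CPU_DOWN_PREPARE)
		return NOTIFY_OK;
	if (uncore_active_cpu[phyid] != cpu)
		return NOTIFY_OK;

	/* pick any other online cpu in the same package, if one exists */
	target = uncore_pick_new_cpu(phyid, cpu);	/* invented helper */
	uncore_active_cpu[phyid] = target;
	if (target < 0)
		return NOTIFY_OK;	/* last cpu in the package: normal exit path */

	list_for_each_entry(pmu, &uncore_pmus, list)	/* invented list */
		perf_pmu_migrate_context(&pmu->pmu, cpu, target);

	return NOTIFY_OK;
}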

---
 kernel/events/core.c |   33 +++++++++++++++++++++++++++++++++
 1 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index a6a9ec4..824becf 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1641,6 +1641,8 @@ perf_install_in_context(struct perf_event_context *ctx,
 	lockdep_assert_held(&ctx->mutex);
 
 	event->ctx = ctx;
+	if (event->cpu != -1)
+		event->cpu = cpu;
 
 	if (!task) {
 		/*
@@ -6375,6 +6377,8 @@ SYSCALL_DEFINE5(perf_event_open,
 	mutex_lock(&ctx->mutex);
 
 	if (move_group) {
+		synchronize_rcu();
+
 		perf_install_in_context(ctx, group_leader, cpu);
 		get_ctx(ctx);
 		list_for_each_entry(sibling, &group_leader->sibling_list,
@@ -6477,6 +6481,35 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
 }
 EXPORT_SYMBOL_GPL(perf_event_create_kernel_counter);
 
+void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu)
+{
+	struct perf_cpu_context *src_cpuctx = per_cpu(pmu->pmu_cpu_context, src_cpu);
+	struct perf_cpu_context *dst_cpuctx = per_cpu(pmu->pmu_cpu_context, dst_cpu);
+	struct perf_event_context *src_ctx = &src_cpuctx->ctx;
+	struct perf_event_context *dst_ctx = &dst_cpuctx->ctx;
+	struct perf_event *event, *tmp;
+	LIST_HEAD(events);
+
+	mutex_lock(&src_ctx->mutex);
+	list_for_each_entry_safe(event, tmp, &src_ctx->event_list, event_entry) {
+		perf_remove_from_context(event);
+		put_ctx(src_ctx);
+		list_add(&event->event_entry, &events);
+	}
+	mutex_unlock(&src_ctx->mutex);
+
+	synchronize_rcu();
+
+	mutex_lock(&dst_ctx->mutex);
+	list_for_each_entry_safe(event, tmp, &events, event_entry) {
+		list_del(&event->event_entry);
+		perf_install_in_context(dst_ctx, event, dst_cpu);
+		get_ctx(dst_ctx);
+	}
+	mutex_unlock(&dst_ctx->mutex);
+}
+EXPORT_SYMBOL_GPL(perf_pmu_migrate_context);
+
 static void sync_child_event(struct perf_event *child_event,
 			       struct task_struct *child)
 {


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] perf: generic intel uncore support
  2012-04-10  0:48             ` Yan, Zheng
@ 2012-04-16 12:11               ` Peter Zijlstra
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2012-04-16 12:11 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: mingo, andi, eranian, linux-kernel, ming.m.lin

On Tue, 2012-04-10 at 08:48 +0800, Yan, Zheng wrote:
> On 04/03/2012 10:29 PM, Peter Zijlstra wrote:
> > On Tue, 2012-04-03 at 16:28 +0800, Yan, Zheng wrote:
> >> On 04/03/2012 06:11 AM, Peter Zijlstra wrote:
> >>> On Sun, 2012-04-01 at 11:11 +0800, Yan, Zheng wrote:
> >>>>
> >>>> Because using per-cpu allocation is inconvenience for PCI uncore device.
> >>>
> >>> What are those, where does one read about them and why?
> >>>
> >> PCI uncore device support was added by patch 4. If using per-cpu pointer,
> >> we have to setup the per-cpu pointer after the PCI driver's probe function
> >> recognizes the uncore device. It means we have to first add the PCI uncore
> >> device to a list, then call a function on all cpus to scan the list, find
> >> the right uncore device and setup the per-cpu pointer.
> > 
> > Yeah, I saw that, but it was all completely lacking in where one can
> > find actual detail on that stuff. What parts of the SDM do I read? Is
> > there anything other than the SDM, etc..
> > 
> The uncore performance monitoring guide:
> 
> http://www.intel.com/content/dam/www/public/us/en/documents/design-guides/xeon-e5-2600-uncore-guide.pdf

Thanks... OK, so I don't see what the problem for PCI is. They're similar
to the MSR-based ones, except that the access is memory I/O instead of MSRs.
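
I.e. only how a counter is read differs; roughly something like this
(sketch only; whether the PCI read ends up being config space or MMIO, and
the box->pci_dev field and register layout, are assumptions, not taken
from the patches):

static u64 uncore_msr_read_counter(struct intel_uncore_box *box,
				   struct perf_event *event)
{
	u64 count;

	rdmsrl(event->hw.event_base, count);
	return count;
}

static u64 uncore_pci_read_counter(struct intel_uncore_box *box,
				   struct perf_event *event)
{
	u32 lo, hi;

	pci_read_config_dword(box->pci_dev, event->hw.event_base, &lo);
	pci_read_config_dword(box->pci_dev, event->hw.event_base + 4, &hi);
	return ((u64)hi << 32) | lo;
}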

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] perf: generic intel uncore support
  2012-04-16 12:07       ` Peter Zijlstra
@ 2012-04-17  6:56         ` Yan, Zheng
  0 siblings, 0 replies; 28+ messages in thread
From: Yan, Zheng @ 2012-04-17  6:56 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, andi, eranian, linux-kernel, ming.m.lin

On 04/16/2012 08:07 PM, Peter Zijlstra wrote:
> On Sun, 2012-04-01 at 11:11 +0800, Yan, Zheng wrote:
>> Any hints how to do this. I'm afraid it requires big changes to perf core.
> 
> 
> Sorry for taking so long..
> 
> 
> I think something like the (completely untested) below should suffice..
> 
> In your driver, have hotplug notifiers keep track of what cpu is the
> active cpu for your node, if it needs to change due to it going offline,
> pick a new one and use the below function to migrate the events.
> 
> The only missing piece is not doing the normal
> perf_event_exit_cpu_context() thing for these PMUs, except of course
> once there's no cpus left in your node.
> 
> Doing that might want an extra struct pmu method, which if not set
> defaults to perf_event_exit_cpu_context, and otherwise does your custom
> migrate/exit.
> 
> ---
>  kernel/events/core.c |   33 +++++++++++++++++++++++++++++++++
>  1 files changed, 33 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index a6a9ec4..824becf 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -1641,6 +1641,8 @@ perf_install_in_context(struct perf_event_context *ctx,
>  	lockdep_assert_held(&ctx->mutex);
>  
>  	event->ctx = ctx;
> +	if (event->cpu != -1)
> +		event->cpu = cpu;
>  
>  	if (!task) {
>  		/*
> @@ -6375,6 +6377,8 @@ SYSCALL_DEFINE5(perf_event_open,
>  	mutex_lock(&ctx->mutex);
>  
>  	if (move_group) {
> +		synchronize_rcu();
> +
>  		perf_install_in_context(ctx, group_leader, cpu);
>  		get_ctx(ctx);
>  		list_for_each_entry(sibling, &group_leader->sibling_list,
> @@ -6477,6 +6481,35 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
>  }
>  EXPORT_SYMBOL_GPL(perf_event_create_kernel_counter);
>  
> +void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu)
> +{
> +	struct perf_cpu_context *src_cpuctx = per_cpu(pmu->pmu_cpu_context, src_cpu);
> +	struct perf_cpu_context *dst_cpuctx = per_cpu(pmu->pmu_cpu_context, dst_cpu);
> +	struct perf_event_context *src_ctx = &src_cpuctx->ctx;
> +	struct perf_event_context *dst_ctx = &dst_cpuctx->ctx;
> +	struct perf_event *event, *tmp;
> +	LIST_HEAD(events);
> +
> +	mutex_lock(&src_ctx->mutex);
> +	list_for_each_entry_safe(event, tmp, &src_ctx->event_list, event_entry) {
> +		perf_remove_from_context(event);
> +		put_ctx(src_ctx);
> +		list_add(&event->event_entry, &events);
> +	}
> +	mutex_unlock(&src_ctx->mutex);
> +
> +	synchronize_rcu();
> +
> +	mutex_lock(&dst_ctx->mutex);
> +	list_for_each_entry_safe(event, tmp, &events, event_entry) {
> +		list_del(&event->event_entry);
> +		perf_install_in_context(dst_ctx, event, dst_cpu);
> +		get_ctx(dst_ctx);
> +	}
> +	mutex_unlock(&dst_ctx->mutex);
> +}
> +EXPORT_SYMBOL_GPL(perf_pmu_migrate_context);
> +
>  static void sync_child_event(struct perf_event *child_event,
>  			       struct task_struct *child)
>  {
> 

Thank you very much.

How about interpreting the 'cpu' parameter of perf_event_open() as the target
socket instead of the target cpu, so that we can get rid of the raw_spin_lock
in the uncore_box? The event_init() pmu callback can do this job easily.
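
A rough sketch of that idea (invented names; the socket-to-cpu mapping would
have to come from the uncore driver's own bookkeeping):

static int uncore_pmu_event_init(struct perf_event *event)
{
	int socket = event->cpu;	/* 'cpu' reinterpreted as socket id */

	if (event->attr.type != event->pmu->type)
		return -ENOENT;
	if (socket < 0 || socket >= max_uncore_sockets)	/* invented limit */
		return -EINVAL;

	/* redirect to the cpu that owns this socket's uncore boxes */
	event->cpu = uncore_active_cpu[socket];		/* invented mapping */
	if (event->cpu < 0)
		return -EINVAL;

	return 0;
}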

Regards
Yan, Zheng

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2012-04-17  6:58 UTC | newest]

Thread overview: 28+ messages
2012-03-28  6:43 [RFC PATCH 0/5] perf: Intel uncore pmu counting support Yan, Zheng
2012-03-28  6:43 ` [PATCH 1/5] perf: Export perf_assign_events Yan, Zheng
2012-03-28  6:43 ` [PATCH 2/5] perf: generic intel uncore support Yan, Zheng
2012-03-28  9:24   ` Andi Kleen
2012-03-28  9:38     ` Peter Zijlstra
2012-03-28 11:24     ` Yan, Zheng
2012-03-31  3:18   ` Peter Zijlstra
2012-04-01  3:11     ` Yan, Zheng
2012-04-02 22:10       ` Peter Zijlstra
2012-04-02 22:11       ` Peter Zijlstra
2012-04-03  8:28         ` Yan, Zheng
2012-04-03 14:29           ` Peter Zijlstra
2012-04-04  1:47             ` Yan, Zheng
2012-04-10  0:48             ` Yan, Zheng
2012-04-16 12:11               ` Peter Zijlstra
2012-04-02 22:16       ` Peter Zijlstra
2012-04-02 22:24       ` Peter Zijlstra
2012-04-16 12:07       ` Peter Zijlstra
2012-04-17  6:56         ` Yan, Zheng
2012-03-28  6:43 ` [PATCH 3/5] perf: Add Nehalem and Sandy Bridge " Yan, Zheng
2012-03-28  6:43 ` [PATCH 4/5] perf: Generic pci uncore device support Yan, Zheng
2012-03-28  6:43 ` [PATCH 5/5] perf: Add Sandy Bridge-EP uncore support Yan, Zheng
2012-03-28  6:49 ` [RFC PATCH 0/5] perf: Intel uncore pmu counting support Ingo Molnar
2012-03-28  8:49   ` Peter Zijlstra
2012-03-28  9:02     ` Yan, Zheng
2012-03-28  8:57   ` Andi Kleen
2012-03-28  9:30     ` Ingo Molnar
2012-03-28 10:58     ` Peter Zijlstra
