linux-kernel.vger.kernel.org archive mirror
* [PATCH V3 0/9] perf: Intel uncore pmu counting support
@ 2012-05-02  2:07 Yan, Zheng
  2012-05-02  2:07 ` [PATCH 1/9] perf: Export perf_assign_events Yan, Zheng
                   ` (8 more replies)
  0 siblings, 9 replies; 38+ messages in thread
From: Yan, Zheng @ 2012-05-02  2:07 UTC (permalink / raw)
  To: a.p.zijlstra, mingo, andi, eranian, jolsa, ming.m.lin; +Cc: linux-kernel

Hi, all

Here are the V3 patches to add uncore counting support for Nehalem,
Sandy Bridge and Sandy Bridge-EP, applied on top of the current tip.
The code is based on Lin Ming's earlier patches.

For Nehalem and Sandy Bridge-EP, a few general events are exported
under the directory:
  /sys/bus/event_source/devices/${uncore_dev}/events/

Each file in the events directory defines an event. Its content takes
the form:
  config=1,config1=2
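
For example, the Nehalem support added in patch 5 exports a file named
QMC_WRITES_FULL_ANY whose content is:
  event=0x2f,umask=0xf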

You can use 'perf stat' to access the uncore pmu. For example:
  perf stat -a -C 0 -e 'uncore_imc0/CAS_COUNT_RD/' sleep 1

Any comments are appreciated.
Thank you
---
Changes since v1:
 - Modify perf tool to parse events from sysfs
 - A few minor code cleanups

Changes since v2:
 - Place all events for a particular socket onto a single cpu
 - Make the event parser in the perf tool reentrant
 - A few code cleanups



* [PATCH 1/9] perf: Export perf_assign_events
  2012-05-02  2:07 [PATCH V3 0/9] perf: Intel uncore pmu counting support Yan, Zheng
@ 2012-05-02  2:07 ` Yan, Zheng
  2012-05-02  2:07 ` [PATCH 2/9] perf: Allow pmu to choose cpu on which to install event Yan, Zheng
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 38+ messages in thread
From: Yan, Zheng @ 2012-05-02  2:07 UTC (permalink / raw)
  To: a.p.zijlstra, mingo, andi, eranian, jolsa, ming.m.lin; +Cc: linux-kernel

From: "Yan, Zheng" <zheng.z.yan@intel.com>

Export perf_assign_events so the uncore code can use it to
schedule events.
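
As a rough sketch of the intended calling pattern (mirroring what
uncore_assign_events() in patch 4 does; lookup_constraint() is a
caller-specific placeholder, not part of this patch):

	static int my_schedule_events(struct perf_event **events,
				      struct event_constraint **constraints,
				      int n, int *assign)
	{
		int i, wmin = INT_MAX, wmax = 0;

		for (i = 0; i < n; i++) {
			constraints[i] = lookup_constraint(events[i]);
			wmin = min(wmin, constraints[i]->weight);
			wmax = max(wmax, constraints[i]->weight);
		}
		/* nonzero return means no valid counter assignment exists */
		if (perf_assign_events(constraints, n, wmin, wmax, assign))
			return -EINVAL;
		/* assign[i] now holds the counter index for events[i] */
		return 0;
	}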

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
---
 arch/x86/kernel/cpu/perf_event.c |    6 +++---
 arch/x86/kernel/cpu/perf_event.h |    2 ++
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index e33e9cf..eec3d09 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -637,7 +637,7 @@ static bool __perf_sched_find_counter(struct perf_sched *sched)
 	c = sched->constraints[sched->state.event];
 
 	/* Prefer fixed purpose counters */
-	if (x86_pmu.num_counters_fixed) {
+	if (c->idxmsk64 & ((u64)-1 << X86_PMC_IDX_FIXED)) {
 		idx = X86_PMC_IDX_FIXED;
 		for_each_set_bit_from(idx, c->idxmsk, X86_PMC_IDX_MAX) {
 			if (!__test_and_set_bit(idx, sched->state.used))
@@ -704,8 +704,8 @@ static bool perf_sched_next_event(struct perf_sched *sched)
 /*
  * Assign a counter for each event.
  */
-static int perf_assign_events(struct event_constraint **constraints, int n,
-			      int wmin, int wmax, int *assign)
+int perf_assign_events(struct event_constraint **constraints, int n,
+			int wmin, int wmax, int *assign)
 {
 	struct perf_sched sched;
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 6638aaf..e6dfc00 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -466,6 +466,8 @@ static inline void __x86_pmu_enable_event(struct hw_perf_event *hwc,
 
 void x86_pmu_enable_all(int added);
 
+int perf_assign_events(struct event_constraint **constraints, int n,
+			int wmin, int wmax, int *assign);
 int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign);
 
 void x86_pmu_stop(struct perf_event *event, int flags);
-- 
1.7.7.6



* [PATCH 2/9] perf: Allow pmu to choose cpu on which to install event
  2012-05-02  2:07 [PATCH V3 0/9] perf: Intel uncore pmu counting support Yan, Zheng
  2012-05-02  2:07 ` [PATCH 1/9] perf: Export perf_assign_events Yan, Zheng
@ 2012-05-02  2:07 ` Yan, Zheng
  2012-05-09  6:38   ` Anshuman Khandual
  2012-05-02  2:07 ` [PATCH 3/9] perf: Introduce perf_pmu_migrate_context Yan, Zheng
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Yan, Zheng @ 2012-05-02  2:07 UTC (permalink / raw)
  To: a.p.zijlstra, mingo, andi, eranian, jolsa, ming.m.lin; +Cc: linux-kernel

From: "Yan, Zheng" <zheng.z.yan@intel.com>

Allow the pmu->event_init callback to change event->cpu, so the pmu
can choose the cpu on which to install the event. Hold
get_online_cpus() across the installation so the chosen cpu cannot
go offline underneath us.
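
For illustration, a minimal sketch of an event_init that retargets an
event (uncore_pmu_event_init() in patch 4 is the real user;
my_pmu_collector_cpu() below is a hypothetical helper):

	static int my_pmu_event_init(struct perf_event *event)
	{
		int collector;

		if (event->attr.type != event->pmu->type)
			return -ENOENT;
		if (event->cpu < 0)
			return -EINVAL;	/* per-task events unsupported */

		/* hypothetical: cpu that collects for event->cpu's package */
		collector = my_pmu_collector_cpu(event->cpu);
		if (collector < 0)
			return -EINVAL;

		event->cpu = collector;	/* now honored by the syscall */
		return 0;
	}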

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
---
 kernel/events/core.c |   13 +++++++++----
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 32cfc76..84911de 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6250,6 +6250,8 @@ SYSCALL_DEFINE5(perf_event_open,
 		}
 	}
 
+	get_online_cpus();
+
 	event = perf_event_alloc(&attr, cpu, task, group_leader, NULL,
 				 NULL, NULL);
 	if (IS_ERR(event)) {
@@ -6302,7 +6304,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	/*
 	 * Get the target context (task or percpu):
 	 */
-	ctx = find_get_context(pmu, task, cpu);
+	ctx = find_get_context(pmu, task, event->cpu);
 	if (IS_ERR(ctx)) {
 		err = PTR_ERR(ctx);
 		goto err_alloc;
@@ -6375,20 +6377,22 @@ SYSCALL_DEFINE5(perf_event_open,
 	mutex_lock(&ctx->mutex);
 
 	if (move_group) {
-		perf_install_in_context(ctx, group_leader, cpu);
+		perf_install_in_context(ctx, group_leader, event->cpu);
 		get_ctx(ctx);
 		list_for_each_entry(sibling, &group_leader->sibling_list,
 				    group_entry) {
-			perf_install_in_context(ctx, sibling, cpu);
+			perf_install_in_context(ctx, sibling, event->cpu);
 			get_ctx(ctx);
 		}
 	}
 
-	perf_install_in_context(ctx, event, cpu);
+	perf_install_in_context(ctx, event, event->cpu);
 	++ctx->generation;
 	perf_unpin_context(ctx);
 	mutex_unlock(&ctx->mutex);
 
+	put_online_cpus();
+
 	event->owner = current;
 
 	mutex_lock(&current->perf_event_mutex);
@@ -6417,6 +6421,7 @@ SYSCALL_DEFINE5(perf_event_open,
 err_alloc:
 	free_event(event);
 err_task:
+	put_online_cpus();
 	if (task)
 		put_task_struct(task);
 err_group_fd:
-- 
1.7.7.6



* [PATCH 3/9] perf: Introduce perf_pmu_migrate_context
  2012-05-02  2:07 [PATCH V3 0/9] perf: Intel uncore pmu counting support Yan, Zheng
  2012-05-02  2:07 ` [PATCH 1/9] perf: Export perf_assign_events Yan, Zheng
  2012-05-02  2:07 ` [PATCH 2/9] perf: Allow pmu to choose cpu on which to install event Yan, Zheng
@ 2012-05-02  2:07 ` Yan, Zheng
  2012-05-02  2:07 ` [PATCH 4/9] perf: Generic intel uncore support Yan, Zheng
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 38+ messages in thread
From: Yan, Zheng @ 2012-05-02  2:07 UTC (permalink / raw)
  To: a.p.zijlstra, mingo, andi, eranian, jolsa, ming.m.lin; +Cc: linux-kernel

From: "Yan, Zheng" <zheng.z.yan@intel.com>

Originally from Peter Zijlstra. The helper migrates perf events from
one cpu to another: it detaches all events from the source cpu's
context, waits for an RCU grace period, then installs them in the
destination cpu's context.
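
A minimal sketch of the intended use, modeled on the cpu-hotplug path
in patch 4 ('my_pmu', 'dying_cpu' and 'target' come from the caller):

	/* move all events from the cpu going offline to another
	 * online cpu in the same physical package */
	if (target >= 0)
		perf_pmu_migrate_context(my_pmu, dying_cpu, target);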

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
---
 include/linux/perf_event.h |    2 ++
 kernel/events/core.c       |   36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ddbb6a9..13b7b2d 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1106,6 +1106,8 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr,
 				struct task_struct *task,
 				perf_overflow_handler_t callback,
 				void *context);
+extern void perf_pmu_migrate_context(struct pmu *pmu,
+				int src_cpu, int dst_cpu);
 extern u64 perf_event_read_value(struct perf_event *event,
 				 u64 *enabled, u64 *running);
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 84911de..1fc7000 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1641,6 +1641,8 @@ perf_install_in_context(struct perf_event_context *ctx,
 	lockdep_assert_held(&ctx->mutex);
 
 	event->ctx = ctx;
+	if (event->cpu != -1)
+		event->cpu = cpu;
 
 	if (!task) {
 		/*
@@ -6377,6 +6379,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	mutex_lock(&ctx->mutex);
 
 	if (move_group) {
+		synchronize_rcu();
 		perf_install_in_context(ctx, group_leader, event->cpu);
 		get_ctx(ctx);
 		list_for_each_entry(sibling, &group_leader->sibling_list,
@@ -6482,6 +6485,39 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
 }
 EXPORT_SYMBOL_GPL(perf_event_create_kernel_counter);
 
+void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu)
+{
+	struct perf_event_context *src_ctx;
+	struct perf_event_context *dst_ctx;
+	struct perf_event *event, *tmp;
+	LIST_HEAD(events);
+
+	src_ctx = &per_cpu_ptr(pmu->pmu_cpu_context, src_cpu)->ctx;
+	dst_ctx = &per_cpu_ptr(pmu->pmu_cpu_context, dst_cpu)->ctx;
+
+	mutex_lock(&src_ctx->mutex);
+	list_for_each_entry_safe(event, tmp, &src_ctx->event_list,
+				 event_entry) {
+		perf_remove_from_context(event);
+		put_ctx(src_ctx);
+		list_add(&event->event_entry, &events);
+	}
+	mutex_unlock(&src_ctx->mutex);
+
+	synchronize_rcu();
+
+	mutex_lock(&dst_ctx->mutex);
+	list_for_each_entry_safe(event, tmp, &events, event_entry) {
+		list_del(&event->event_entry);
+		if (event->state >= PERF_EVENT_STATE_OFF)
+			event->state = PERF_EVENT_STATE_INACTIVE;
+		perf_install_in_context(dst_ctx, event, dst_cpu);
+		get_ctx(dst_ctx);
+	}
+	mutex_unlock(&dst_ctx->mutex);
+}
+EXPORT_SYMBOL_GPL(perf_pmu_migrate_context);
+
 static void sync_child_event(struct perf_event *child_event,
 			       struct task_struct *child)
 {
-- 
1.7.7.6



* [PATCH 4/9] perf: Generic intel uncore support
  2012-05-02  2:07 [PATCH V3 0/9] perf: Intel uncore pmu counting support Yan, Zheng
                   ` (2 preceding siblings ...)
  2012-05-02  2:07 ` [PATCH 3/9] perf: Introduce perf_pmu_migrate_context Yan, Zheng
@ 2012-05-02  2:07 ` Yan, Zheng
  2012-05-03 17:12   ` Peter Zijlstra
                     ` (2 more replies)
  2012-05-02  2:07 ` [PATCH 5/9] perf: Add Nehalem and Sandy Bridge " Yan, Zheng
                   ` (4 subsequent siblings)
  8 siblings, 3 replies; 38+ messages in thread
From: Yan, Zheng @ 2012-05-02  2:07 UTC (permalink / raw)
  To: a.p.zijlstra, mingo, andi, eranian, jolsa, ming.m.lin; +Cc: linux-kernel

From: "Yan, Zheng" <zheng.z.yan@intel.com>

This patch adds generic Intel uncore pmu support, including helper
functions that add/delete uncore events, an hrtimer that periodically
polls the counters to avoid overflow, and code that places all events
for a particular socket onto a single cpu. The design is based on
the structure of Sandy Bridge-EP's uncore subsystem, which consists
of a variety of components, each containing one or more boxes.
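
Two details are worth spelling out. A 44-bit counter incrementing at
~3 GHz wraps in roughly 2^44 / 3e9 ~= 1.6 hours, so a 60-second poll
interval leaves ample margin. And the update path truncates deltas to
the hardware counter width with a shift trick; a standalone sketch of
the computation done in uncore_perf_event_update():

	static u64 counter_delta(u64 prev, u64 now, int width)
	{
		int shift = 64 - width;

		/*
		 * Shift both raw values up so the counter's top bit
		 * becomes bit 63; the subtraction then wraps correctly
		 * even when 'now' has rolled over 'prev', and shifting
		 * back yields a delta truncated to 'width' bits.
		 */
		return ((now << shift) - (prev << shift)) >> shift;
	}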

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
---
 arch/x86/kernel/cpu/Makefile                  |    2 +-
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |  880 +++++++++++++++++++++++++
 arch/x86/kernel/cpu/perf_event_intel_uncore.h |  205 ++++++
 3 files changed, 1086 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/perf_event_intel_uncore.c
 create mode 100644 arch/x86/kernel/cpu/perf_event_intel_uncore.h

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 6ab6aa2..9dfa9e9 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -32,7 +32,7 @@ obj-$(CONFIG_PERF_EVENTS)		+= perf_event.o
 
 ifdef CONFIG_PERF_EVENTS
 obj-$(CONFIG_CPU_SUP_AMD)		+= perf_event_amd.o
-obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_p6.o perf_event_p4.o perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o
+obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_p6.o perf_event_p4.o perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o perf_event_intel_uncore.o
 endif
 
 obj-$(CONFIG_X86_MCE)			+= mcheck/
diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
new file mode 100644
index 0000000..0dda34e
--- /dev/null
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -0,0 +1,880 @@
+#include "perf_event_intel_uncore.h"
+
+static struct intel_uncore_type *empty_uncore[] = { NULL, };
+static struct intel_uncore_type **msr_uncores = empty_uncore;
+
+/* mask of cpus that collect uncore events */
+static cpumask_t uncore_cpu_mask;
+
+/* constraint for the fixed counter */
+static struct event_constraint constraint_fixed =
+	EVENT_CONSTRAINT((u64)-1, 1 << UNCORE_PMC_IDX_FIXED, (u64)-1);
+
+static void uncore_assign_hw_event(struct intel_uncore_box *box,
+				struct perf_event *event, int idx)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	hwc->idx = idx;
+	hwc->last_tag = ++box->tags[idx];
+
+	if (hwc->idx == UNCORE_PMC_IDX_FIXED) {
+		hwc->event_base = uncore_msr_fixed_ctr(box);
+		hwc->config_base = uncore_msr_fixed_ctl(box);
+		return;
+	}
+
+	hwc->config_base = uncore_msr_event_ctl(box, hwc->idx);
+	hwc->event_base =  uncore_msr_perf_ctr(box, hwc->idx);
+}
+
+static void uncore_perf_event_update(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	u64 prev_count, new_count, delta;
+	int shift;
+
+	if (event->hw.idx >= UNCORE_PMC_IDX_FIXED)
+		shift = 64 - uncore_fixed_ctr_bits(box);
+	else
+		shift = 64 - uncore_perf_ctr_bits(box);
+
+	/* the hrtimer might modify the previous event value */
+again:
+	prev_count = local64_read(&event->hw.prev_count);
+	new_count = uncore_read_counter(box, event);
+	if (local64_xchg(&event->hw.prev_count, new_count) != prev_count)
+		goto again;
+
+	delta = (new_count << shift) - (prev_count << shift);
+	delta >>= shift;
+
+	local64_add(delta, &event->count);
+}
+
+/*
+ * The overflow interrupt is unavailable for Sandy Bridge-EP and broken
+ * on Sandy Bridge, so we use an hrtimer to periodically poll the
+ * counters to avoid overflow.
+ */
+static enum hrtimer_restart uncore_pmu_hrtimer(struct hrtimer *hrtimer)
+{
+	struct intel_uncore_box *box;
+	unsigned long flags;
+	int bit;
+
+	box = container_of(hrtimer, struct intel_uncore_box, hrtimer);
+	if (!box->n_active || box->cpu != smp_processor_id())
+		return HRTIMER_NORESTART;
+	/*
+	 * disable local interrupts to prevent uncore_pmu_event_start/stop
+	 * from interrupting the update process
+	 */
+	local_irq_save(flags);
+
+	for_each_set_bit(bit, box->active_mask, UNCORE_PMC_IDX_MAX)
+		uncore_perf_event_update(box, box->events[bit]);
+
+	local_irq_restore(flags);
+
+	hrtimer_forward_now(hrtimer, ns_to_ktime(UNCORE_PMU_HRTIMER_INTERVAL));
+	return HRTIMER_RESTART;
+}
+
+static void uncore_pmu_start_hrtimer(struct intel_uncore_box *box)
+{
+	__hrtimer_start_range_ns(&box->hrtimer,
+			ns_to_ktime(UNCORE_PMU_HRTIMER_INTERVAL), 0,
+			HRTIMER_MODE_REL_PINNED, 0);
+}
+
+static void uncore_pmu_cancel_hrtimer(struct intel_uncore_box *box)
+{
+	hrtimer_cancel(&box->hrtimer);
+}
+
+static void uncore_pmu_init_hrtimer(struct intel_uncore_box *box)
+{
+	hrtimer_init(&box->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	box->hrtimer.function = uncore_pmu_hrtimer;
+}
+
+struct intel_uncore_box *uncore_alloc_box(int cpu)
+{
+	struct intel_uncore_box *box;
+
+	box = kmalloc_node(sizeof(*box), GFP_KERNEL | __GFP_ZERO,
+			cpu_to_node(cpu));
+	if (!box)
+		return NULL;
+
+	uncore_pmu_init_hrtimer(box);
+	box->cpu = -1;
+	box->refcnt = 1;
+
+	return box;
+}
+
+static struct intel_uncore_box *
+__uncore_pmu_find_box(struct intel_uncore_pmu *pmu, int phyid)
+{
+	struct intel_uncore_box *box;
+	struct hlist_head *head;
+	struct hlist_node *node;
+
+	head = &pmu->box_hash[phyid % UNCORE_BOX_HASH_SIZE];
+	hlist_for_each_entry_rcu(box, node, head, hlist) {
+		if (box->phy_id == phyid)
+			return box;
+	}
+
+	return NULL;
+}
+
+static struct intel_uncore_box *
+uncore_pmu_find_box(struct intel_uncore_pmu *pmu, int phyid)
+{
+	struct intel_uncore_box *box;
+
+	rcu_read_lock();
+	box = __uncore_pmu_find_box(pmu, phyid);
+	rcu_read_unlock();
+
+	return box;
+}
+
+static void uncore_pmu_add_box(struct intel_uncore_pmu *pmu,
+				struct intel_uncore_box *box)
+{
+	struct hlist_head *head;
+
+	head = &pmu->box_hash[box->phy_id % UNCORE_BOX_HASH_SIZE];
+	hlist_add_head_rcu(&box->hlist, head);
+}
+
+static struct intel_uncore_pmu *uncore_event_to_pmu(struct perf_event *event)
+{
+	return container_of(event->pmu, struct intel_uncore_pmu, pmu);
+}
+
+static struct intel_uncore_box *uncore_event_to_box(struct perf_event *event)
+{
+	/*
+	 * The perf core schedules events on a per-cpu basis; uncore events
+	 * are collected by one of the cpus inside a physical package.
+	 */
+	int phyid = topology_physical_package_id(smp_processor_id());
+	return uncore_pmu_find_box(uncore_event_to_pmu(event), phyid);
+}
+
+static int uncore_collect_events(struct intel_uncore_box *box,
+				struct perf_event *leader, bool dogrp)
+{
+	struct perf_event *event;
+	int n, max_count;
+
+	max_count = box->pmu->type->num_counters;
+	if (box->pmu->type->fixed_ctl)
+		max_count++;
+
+	if (box->n_events >= max_count)
+		return -EINVAL;
+
+	n = box->n_events;
+	box->event_list[n] = leader;
+	n++;
+	if (!dogrp)
+		return n;
+
+	list_for_each_entry(event, &leader->sibling_list, group_entry) {
+		if (event->state <= PERF_EVENT_STATE_OFF)
+			continue;
+
+		if (n >= max_count)
+			return -EINVAL;
+
+		box->event_list[n] = event;
+		n++;
+	}
+	return n;
+}
+
+static struct event_constraint *
+uncore_event_constraint(struct intel_uncore_type *type,
+			struct perf_event *event)
+{
+	struct event_constraint *c;
+
+	if (event->hw.config == (u64)-1)
+		return &constraint_fixed;
+
+	if (type->constraints) {
+		for_each_event_constraint(c, type->constraints) {
+			if ((event->hw.config & c->cmask) == c->code)
+				return c;
+		}
+	}
+
+	return &type->unconstrainted;
+}
+
+static int uncore_assign_events(struct intel_uncore_box *box,
+				int assign[], int n)
+{
+	unsigned long used_mask[BITS_TO_LONGS(UNCORE_PMC_IDX_MAX)];
+	struct event_constraint *c, *constraints[UNCORE_PMC_IDX_MAX];
+	int i, ret, wmin, wmax;
+	struct hw_perf_event *hwc;
+
+	bitmap_zero(used_mask, UNCORE_PMC_IDX_MAX);
+
+	for (i = 0, wmin = UNCORE_PMC_IDX_MAX, wmax = 0; i < n; i++) {
+		c = uncore_event_constraint(box->pmu->type,
+				box->event_list[i]);
+		constraints[i] = c;
+		wmin = min(wmin, c->weight);
+		wmax = max(wmax, c->weight);
+	}
+
+	/* fastpath, try to reuse previous register */
+	for (i = 0; i < n; i++) {
+		hwc = &box->event_list[i]->hw;
+		c = constraints[i];
+
+		/* never assigned */
+		if (hwc->idx == -1)
+			break;
+
+		/* constraint still honored */
+		if (!test_bit(hwc->idx, c->idxmsk))
+			break;
+
+		/* not already used */
+		if (test_bit(hwc->idx, used_mask))
+			break;
+
+		__set_bit(hwc->idx, used_mask);
+		assign[i] = hwc->idx;
+	}
+	if (i == n)
+		return 0;
+
+	/* slow path */
+	ret = perf_assign_events(constraints, n, wmin, wmax, assign);
+	return ret ? -EINVAL : 0;
+}
+
+static void uncore_pmu_event_start(struct perf_event *event, int flags)
+{
+	struct intel_uncore_box *box = uncore_event_to_box(event);
+	int idx = event->hw.idx;
+
+	if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
+		return;
+
+	if (WARN_ON_ONCE(idx == -1 || idx >= UNCORE_PMC_IDX_MAX))
+		return;
+
+	event->hw.state = 0;
+	box->events[idx] = event;
+	box->n_active++;
+	__set_bit(idx, box->active_mask);
+
+	local64_set(&event->hw.prev_count, uncore_read_counter(box, event));
+	uncore_enable_event(box, event);
+
+	if (box->n_active == 1) {
+		uncore_enable_box(box);
+		uncore_pmu_start_hrtimer(box);
+	}
+}
+
+static void uncore_pmu_event_stop(struct perf_event *event, int flags)
+{
+	struct intel_uncore_box *box = uncore_event_to_box(event);
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (__test_and_clear_bit(hwc->idx, box->active_mask)) {
+		uncore_disable_event(box, event);
+		box->n_active--;
+		box->events[hwc->idx] = NULL;
+		WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
+		hwc->state |= PERF_HES_STOPPED;
+
+		if (box->n_active == 0) {
+			uncore_disable_box(box);
+			uncore_pmu_cancel_hrtimer(box);
+		}
+	}
+
+	if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
+		/*
+		 * Drain the remaining delta count out of an event
+		 * that we are disabling:
+		 */
+		uncore_perf_event_update(box, event);
+		hwc->state |= PERF_HES_UPTODATE;
+	}
+}
+
+static int uncore_pmu_event_add(struct perf_event *event, int flags)
+{
+	struct intel_uncore_box *box = uncore_event_to_box(event);
+	struct hw_perf_event *hwc = &event->hw;
+	int assign[UNCORE_PMC_IDX_MAX];
+	int i, n, ret;
+
+	if (!box)
+		return -ENODEV;
+
+	ret = n = uncore_collect_events(box, event, false);
+	if (ret < 0)
+		return ret;
+
+	hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
+	if (!(flags & PERF_EF_START))
+		hwc->state |= PERF_HES_ARCH;
+
+	ret = uncore_assign_events(box, assign, n);
+	if (ret)
+		return ret;
+
+	/* save events moving to new counters */
+	for (i = 0; i < box->n_events; i++) {
+		event = box->event_list[i];
+		hwc = &event->hw;
+
+		if (hwc->idx == assign[i] &&
+			hwc->last_tag == box->tags[assign[i]])
+			continue;
+		/*
+		 * Ensure we don't accidentally enable a stopped
+		 * counter simply because we rescheduled.
+		 */
+		if (hwc->state & PERF_HES_STOPPED)
+			hwc->state |= PERF_HES_ARCH;
+
+		uncore_pmu_event_stop(event, PERF_EF_UPDATE);
+	}
+
+	/* reprogram moved events into new counters */
+	for (i = 0; i < n; i++) {
+		event = box->event_list[i];
+		hwc = &event->hw;
+
+		if (hwc->idx != assign[i] ||
+			hwc->last_tag != box->tags[assign[i]])
+			uncore_assign_hw_event(box, event, assign[i]);
+		else if (i < box->n_events)
+			continue;
+
+		if (hwc->state & PERF_HES_ARCH)
+			continue;
+
+		uncore_pmu_event_start(event, 0);
+	}
+	box->n_events = n;
+
+	return 0;
+}
+
+static void uncore_pmu_event_del(struct perf_event *event, int flags)
+{
+	struct intel_uncore_box *box = uncore_event_to_box(event);
+	int i;
+
+	uncore_pmu_event_stop(event, PERF_EF_UPDATE);
+
+	for (i = 0; i < box->n_events; i++) {
+		if (event == box->event_list[i]) {
+			while (++i < box->n_events)
+				box->event_list[i - 1] = box->event_list[i];
+
+			--box->n_events;
+			break;
+		}
+	}
+
+	event->hw.idx = -1;
+	event->hw.last_tag = ~0ULL;
+}
+
+static void uncore_pmu_event_read(struct perf_event *event)
+{
+	struct intel_uncore_box *box = uncore_event_to_box(event);
+	uncore_perf_event_update(box, event);
+}
+
+/*
+ * validation ensures the group can be loaded onto the
+ * PMU if it was the only group available.
+ */
+static int uncore_validate_group(struct intel_uncore_pmu *pmu,
+				struct perf_event *event)
+{
+	struct perf_event *leader = event->group_leader;
+	struct intel_uncore_box *fake_box;
+	int assign[UNCORE_PMC_IDX_MAX];
+	int ret = -EINVAL, n;
+
+	fake_box = uncore_alloc_box(smp_processor_id());
+	if (!fake_box)
+		return -ENOMEM;
+
+	fake_box->pmu = pmu;
+	/*
+	 * the event is not yet connected with its
+	 * siblings therefore we must first collect
+	 * existing siblings, then add the new event
+	 * before we can simulate the scheduling
+	 */
+	n = uncore_collect_events(fake_box, leader, true);
+	if (n < 0)
+		goto out;
+
+	fake_box->n_events = n;
+	n = uncore_collect_events(fake_box, event, false);
+	if (n < 0)
+		goto out;
+
+	fake_box->n_events = n;
+
+	ret = uncore_assign_events(fake_box, assign, n);
+out:
+	kfree(fake_box);
+	return ret;
+}
+
+int uncore_pmu_event_init(struct perf_event *event)
+{
+	struct intel_uncore_pmu *pmu;
+	struct intel_uncore_box *box;
+	struct hw_perf_event *hwc = &event->hw;
+	int ret;
+
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	pmu = uncore_event_to_pmu(event);
+	/* no device found for this pmu */
+	if (pmu->func_id < 0)
+		return -ENOENT;
+
+	/*
+	 * The uncore PMU always measures at all privilege levels, so it
+	 * doesn't make sense to specify any exclude bits.
+	 */
+	if (event->attr.exclude_user || event->attr.exclude_kernel ||
+			event->attr.exclude_hv || event->attr.exclude_idle)
+		return -EINVAL;
+
+	/* Sampling not supported yet */
+	if (hwc->sample_period)
+		return -EINVAL;
+
+	/*
+	 * Place all uncore events for a particular physical package
+	 * onto a single cpu
+	 */
+	if (event->cpu < 0)
+		return -EINVAL;
+	box = uncore_pmu_find_box(pmu,
+			topology_physical_package_id(event->cpu));
+	if (!box || box->cpu < 0)
+		return -EINVAL;
+	event->cpu = box->cpu;
+
+	if (event->attr.config == UNCORE_FIXED_EVENT) {
+		/* no fixed counter */
+		if (!pmu->type->fixed_ctl)
+			return -EINVAL;
+		/*
+		 * if there is only one fixed counter, only the first pmu
+		 * can access the fixed counter
+		 */
+		if (pmu->type->single_fixed && pmu->pmu_idx > 0)
+			return -EINVAL;
+		hwc->config = (u64)-1;
+	} else {
+		hwc->config = event->attr.config & pmu->type->event_mask;
+	}
+
+	event->hw.idx = -1;
+	event->hw.last_tag = ~0ULL;
+
+	if (event->group_leader != event)
+		ret = uncore_validate_group(pmu, event);
+	else
+		ret = 0;
+
+	return ret;
+}
+
+static int __init uncore_pmu_register(struct intel_uncore_pmu *pmu)
+{
+	int ret;
+
+	pmu->pmu = (struct pmu) {
+		.attr_groups	= pmu->type->attr_groups,
+		.task_ctx_nr	= perf_invalid_context,
+		.event_init	= uncore_pmu_event_init,
+		.add		= uncore_pmu_event_add,
+		.del		= uncore_pmu_event_del,
+		.start		= uncore_pmu_event_start,
+		.stop		= uncore_pmu_event_stop,
+		.read		= uncore_pmu_event_read,
+	};
+
+	if (pmu->type->num_boxes == 1) {
+		if (strlen(pmu->type->name) > 0)
+			sprintf(pmu->name, "uncore_%s", pmu->type->name);
+		else
+			sprintf(pmu->name, "uncore");
+	} else {
+		sprintf(pmu->name, "uncore_%s%d", pmu->type->name,
+			pmu->pmu_idx);
+	}
+
+	ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
+	return ret;
+}
+
+static void __init uncore_type_exit(struct intel_uncore_type *type)
+{
+	kfree(type->attr_groups[1]);
+	kfree(type->pmus);
+	type->attr_groups[1] = NULL;
+	type->pmus = NULL;
+}
+
+static int __init uncore_type_init(struct intel_uncore_type *type)
+{
+	struct intel_uncore_pmu *pmus;
+	struct attribute_group *events_group;
+	struct attribute **attrs;
+	int i, j;
+
+	pmus = kzalloc(sizeof(*pmus) * type->num_boxes, GFP_KERNEL);
+	if (!pmus)
+		return -ENOMEM;
+
+	type->unconstrainted = (struct event_constraint)
+		__EVENT_CONSTRAINT(0, (1ULL << type->num_counters) - 1,
+				0, type->num_counters, 0);
+
+	for (i = 0; i < type->num_boxes; i++) {
+		pmus[i].func_id = -1;
+		pmus[i].pmu_idx = i;
+		pmus[i].type = type;
+
+		for (j = 0; j < ARRAY_SIZE(pmus[0].box_hash); j++)
+			INIT_HLIST_HEAD(&pmus[i].box_hash[j]);
+	}
+
+	if (type->event_descs) {
+		for (i = 0; type->event_descs[i].attr.attr.name; i++);
+
+		events_group = kzalloc(sizeof(struct attribute *) * (i + 1) +
+					sizeof(*events_group), GFP_KERNEL);
+		if (!events_group)
+			goto fail;
+
+		attrs = (struct attribute **)(events_group + 1);
+		events_group->name = "events";
+		events_group->attrs = attrs;
+
+		for (j = 0; j < i; j++)
+			attrs[j] = &type->event_descs[j].attr.attr;
+
+		type->attr_groups[1] = events_group;
+	}
+
+	type->pmus = pmus;
+	return 0;
+fail:
+	uncore_type_exit(type);
+	return -ENOMEM;
+}
+
+static int __init uncore_types_init(struct intel_uncore_type **types)
+{
+	int i, ret;
+
+	for (i = 0; types[i]; i++) {
+		ret = uncore_type_init(types[i]);
+		if (ret)
+			goto fail;
+	}
+	return 0;
+fail:
+	while (--i >= 0)
+		uncore_type_exit(types[i]);
+	return ret;
+}
+
+static void __cpuinit uncore_cpu_dying(int cpu)
+{
+	struct intel_uncore_type *type;
+	struct intel_uncore_pmu *pmu;
+	struct intel_uncore_box *box;
+	int i, j, phyid;
+
+	phyid = topology_physical_package_id(cpu);
+
+	for (i = 0; msr_uncores[i]; i++) {
+		type = msr_uncores[i];
+		for (j = 0; j < type->num_boxes; j++) {
+			pmu = &type->pmus[j];
+			box = uncore_pmu_find_box(pmu, phyid);
+			if (box && --box->refcnt == 0) {
+				hlist_del_rcu(&box->hlist);
+				kfree_rcu(box, rcu_head);
+			}
+		}
+	}
+}
+
+static int __cpuinit uncore_cpu_starting(int cpu)
+{
+	struct intel_uncore_type *type;
+	struct intel_uncore_pmu *pmu;
+	struct intel_uncore_box *box;
+	int i, j, phyid;
+
+	phyid = topology_physical_package_id(cpu);
+
+	for (i = 0; msr_uncores[i]; i++) {
+		type = msr_uncores[i];
+		for (j = 0; j < type->num_boxes; j++) {
+			pmu = &type->pmus[j];
+			box = uncore_pmu_find_box(pmu, phyid);
+			if (box)
+				uncore_box_init(box);
+		}
+	}
+	return 0;
+}
+
+static int __cpuinit uncore_cpu_prepare(int cpu)
+{
+	struct intel_uncore_type *type;
+	struct intel_uncore_pmu *pmu;
+	struct intel_uncore_box *exist, *box;
+	int i, j, phyid;
+
+	phyid = topology_physical_package_id(cpu);
+
+	/* allocate the box data structure */
+	for (i = 0; msr_uncores[i]; i++) {
+		type = msr_uncores[i];
+		for (j = 0; j < type->num_boxes; j++) {
+			exist = NULL;
+			pmu = &type->pmus[j];
+
+			if (pmu->func_id < 0)
+				pmu->func_id = j;
+			exist = uncore_pmu_find_box(pmu, phyid);
+			if (exist)
+				exist->refcnt++;
+			if (exist)
+				continue;
+
+			box = uncore_alloc_box(cpu);
+			if (!box)
+				return -ENOMEM;
+
+			box->pmu = pmu;
+			box->phy_id = phyid;
+			uncore_pmu_add_box(pmu, box);
+		}
+	}
+	return 0;
+}
+
+static void __cpuinit uncore_event_exit_cpu(int cpu)
+{
+	struct intel_uncore_type *type;
+	struct intel_uncore_pmu *pmu;
+	struct intel_uncore_box *box;
+	int i, j, phyid, target;
+
+	/* if exiting cpu is used for collecting uncore events */
+	if (!cpumask_test_and_clear_cpu(cpu, &uncore_cpu_mask))
+		return;
+
+	/* find a new cpu to collect uncore events */
+	phyid = topology_physical_package_id(cpu);
+	target = -1;
+	for_each_online_cpu(i) {
+		if (i == cpu)
+			continue;
+		if (phyid == topology_physical_package_id(i)) {
+			target = i;
+			break;
+		}
+	}
+
+	/* migrate uncore events to the new cpu */
+	if (target >= 0)
+		cpumask_set_cpu(target, &uncore_cpu_mask);
+
+	for (i = 0; msr_uncores[i]; i++) {
+		type = msr_uncores[i];
+		for (j = 0; j < type->num_boxes; j++) {
+			pmu = &type->pmus[j];
+			box = uncore_pmu_find_box(pmu, phyid);
+			WARN_ON_ONCE(box->cpu != cpu);
+
+			if (target >= 0) {
+				uncore_pmu_cancel_hrtimer(box);
+				perf_pmu_migrate_context(&pmu->pmu,
+							cpu, target);
+				box->cpu = target;
+			} else {
+				box->cpu = -1;
+			}
+		}
+	}
+}
+
+static void __cpuinit uncore_event_init_cpu(int cpu)
+{
+	struct intel_uncore_type *type;
+	struct intel_uncore_pmu *pmu;
+	struct intel_uncore_box *box;
+	int i, j, phyid;
+
+	phyid = topology_physical_package_id(cpu);
+	for_each_cpu(i, &uncore_cpu_mask) {
+		if (phyid == topology_physical_package_id(i))
+			return;
+	}
+
+	cpumask_set_cpu(cpu, &uncore_cpu_mask);
+
+	for (i = 0; msr_uncores[i]; i++) {
+		type = msr_uncores[i];
+		for (j = 0; j < type->num_boxes; j++) {
+			pmu = &type->pmus[j];
+			box = uncore_pmu_find_box(pmu, phyid);
+			WARN_ON_ONCE(box->cpu != -1);
+			box->cpu = cpu;
+		}
+	}
+}
+
+static int __cpuinit uncore_cpu_notifier(struct notifier_block *self,
+					 unsigned long action, void *hcpu)
+{
+	unsigned int cpu = (long)hcpu;
+
+	/* allocate/free data structure for uncore box */
+	switch (action & ~CPU_TASKS_FROZEN) {
+	case CPU_UP_PREPARE:
+		uncore_cpu_prepare(cpu);
+		break;
+	case CPU_STARTING:
+		uncore_cpu_starting(cpu);
+		break;
+	case CPU_UP_CANCELED:
+	case CPU_DYING:
+		uncore_cpu_dying(cpu);
+		break;
+	default:
+		break;
+	}
+
+	/* select the cpu that collects uncore events */
+	switch (action & ~CPU_TASKS_FROZEN) {
+	case CPU_UP_PREPARE:
+	case CPU_DOWN_FAILED:
+		uncore_event_init_cpu(cpu);
+		break;
+
+	case CPU_UP_CANCELED:
+	case CPU_DOWN_PREPARE:
+		uncore_event_exit_cpu(cpu);
+		break;
+	default:
+		break;
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block uncore_cpu_nb __cpuinitdata = {
+	.notifier_call = uncore_cpu_notifier,
+	/*
+	 * to migrate uncore events, our notifier should be executed
+	 * before perf core's notifier.
+	 */
+	.priority = CPU_PRI_PERF + 1,
+};
+
+static void __init uncore_cpu_setup(void *dummy)
+{
+	uncore_cpu_starting(smp_processor_id());
+}
+
+static int __init uncore_cpu_init(void)
+{
+	int ret, cpu;
+
+	switch (boot_cpu_data.x86_model) {
+	default:
+		return 0;
+	}
+
+	ret = uncore_types_init(msr_uncores);
+	if (ret)
+		return ret;
+
+	get_online_cpus();
+
+	for_each_online_cpu(cpu) {
+		uncore_cpu_prepare(cpu);
+		uncore_event_init_cpu(cpu);
+	}
+	on_each_cpu(uncore_cpu_setup, NULL, 1);
+
+	register_cpu_notifier(&uncore_cpu_nb);
+
+	put_online_cpus();
+
+	return 0;
+}
+
+static int __init uncore_pmus_register(void)
+{
+	struct intel_uncore_pmu *pmu;
+	struct intel_uncore_type *type;
+	int i, j;
+
+	for (i = 0; msr_uncores[i]; i++) {
+		type = msr_uncores[i];
+		for (j = 0; j < type->num_boxes; j++) {
+			pmu = &type->pmus[j];
+			uncore_pmu_register(pmu);
+		}
+	}
+
+	return 0;
+}
+
+static int __init intel_uncore_init(void)
+{
+	int ret;
+
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		return -ENODEV;
+
+	ret = uncore_cpu_init();
+	if (ret)
+		goto fail;
+
+	uncore_pmus_register();
+	return 0;
+fail:
+	return ret;
+}
+device_initcall(intel_uncore_init);
diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
new file mode 100644
index 0000000..ab648f3
--- /dev/null
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -0,0 +1,205 @@
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/perf_event.h>
+#include "perf_event.h"
+
+#define UNCORE_PMU_NAME_LEN		32
+#define UNCORE_BOX_HASH_SIZE		8
+
+#define UNCORE_PMU_HRTIMER_INTERVAL	(60 * NSEC_PER_SEC)
+
+#define UNCORE_FIXED_EVENT		0xffff
+#define UNCORE_PMC_IDX_MAX_GENERIC	8
+#define UNCORE_PMC_IDX_FIXED		UNCORE_PMC_IDX_MAX_GENERIC
+#define UNCORE_PMC_IDX_MAX		(UNCORE_PMC_IDX_FIXED + 1)
+
+#define UNCORE_EVENT_CONSTRAINT(c, n) EVENT_CONSTRAINT(c, n, 0xff)
+
+struct intel_uncore_ops;
+struct intel_uncore_pmu;
+struct intel_uncore_box;
+struct uncore_event_desc;
+
+struct intel_uncore_type {
+	const char *name;
+	int num_counters;
+	int num_boxes;
+	int perf_ctr_bits;
+	int fixed_ctr_bits;
+	int single_fixed;
+	unsigned perf_ctr;
+	unsigned event_ctl;
+	unsigned event_mask;
+	unsigned fixed_ctr;
+	unsigned fixed_ctl;
+	unsigned box_ctl;
+	unsigned msr_offset;
+	struct event_constraint unconstrainted;
+	struct event_constraint *constraints;
+	struct intel_uncore_pmu *pmus;
+	struct intel_uncore_ops *ops;
+	struct uncore_event_desc *event_descs;
+	const struct attribute_group *attr_groups[3];
+};
+
+#define format_group attr_groups[0]
+
+struct intel_uncore_ops {
+	void (*init_box)(struct intel_uncore_box *);
+	void (*disable_box)(struct intel_uncore_box *);
+	void (*enable_box)(struct intel_uncore_box *);
+	void (*disable_event)(struct intel_uncore_box *, struct perf_event *);
+	void (*enable_event)(struct intel_uncore_box *, struct perf_event *);
+	u64 (*read_counter)(struct intel_uncore_box *, struct perf_event *);
+};
+
+struct intel_uncore_pmu {
+	struct pmu pmu;
+	char name[UNCORE_PMU_NAME_LEN];
+	int pmu_idx;
+	int func_id;
+	struct intel_uncore_type *type;
+	struct hlist_head box_hash[UNCORE_BOX_HASH_SIZE];
+};
+
+struct intel_uncore_box {
+	struct hlist_node hlist;
+	int phy_id;
+	int refcnt;
+	int n_active;	/* number of active events */
+	int n_events;
+	int cpu;	/* cpu to collect events */
+	unsigned long flags;
+	struct perf_event *events[UNCORE_PMC_IDX_MAX];
+	struct perf_event *event_list[UNCORE_PMC_IDX_MAX];
+	unsigned long active_mask[BITS_TO_LONGS(UNCORE_PMC_IDX_MAX)];
+	u64 tags[UNCORE_PMC_IDX_MAX];
+	struct intel_uncore_pmu *pmu;
+	struct hrtimer hrtimer;
+	struct rcu_head rcu_head;
+};
+
+#define UNCORE_BOX_FLAG_INITIATED	0
+
+struct uncore_event_desc {
+	struct kobj_attribute attr;
+	const char *config;
+};
+
+#define INTEL_UNCORE_EVENT_DESC(_name, _config)			\
+{								\
+	.attr	= __ATTR(_name, 0444, uncore_event_show, NULL),	\
+	.config	= _config,					\
+}
+
+#define DEFINE_UNCORE_FORMAT_ATTR(_var, _name, _format)			\
+static ssize_t __uncore_##_var##_show(struct kobject *kobj,		\
+				struct kobj_attribute *attr,		\
+				char *page)				\
+{									\
+	BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE);			\
+	return sprintf(page, _format "\n");				\
+}									\
+static struct kobj_attribute format_attr_##_var =			\
+	__ATTR(_name, 0444, __uncore_##_var##_show, NULL)
+
+
+static ssize_t uncore_event_show(struct kobject *kobj,
+				struct kobj_attribute *attr, char *buf)
+{
+	struct uncore_event_desc *event =
+		container_of(attr, struct uncore_event_desc, attr);
+	return sprintf(buf, "%s", event->config);
+}
+
+static inline
+unsigned uncore_msr_box_ctl(struct intel_uncore_box *box)
+{
+	if (!box->pmu->type->box_ctl)
+		return 0;
+	return box->pmu->type->box_ctl +
+		box->pmu->type->msr_offset * box->pmu->pmu_idx;
+}
+
+static inline
+unsigned uncore_msr_fixed_ctl(struct intel_uncore_box *box)
+{
+	if (!box->pmu->type->fixed_ctl)
+		return 0;
+	return box->pmu->type->fixed_ctl +
+		box->pmu->type->msr_offset * box->pmu->pmu_idx;
+}
+
+static inline
+unsigned uncore_msr_fixed_ctr(struct intel_uncore_box *box)
+{
+	return box->pmu->type->fixed_ctr +
+		box->pmu->type->msr_offset * box->pmu->pmu_idx;
+}
+
+static inline
+unsigned uncore_msr_event_ctl(struct intel_uncore_box *box, int idx)
+{
+	return idx + box->pmu->type->event_ctl +
+		box->pmu->type->msr_offset * box->pmu->pmu_idx;
+}
+
+static inline
+unsigned uncore_msr_perf_ctr(struct intel_uncore_box *box, int idx)
+{
+	return idx + box->pmu->type->perf_ctr +
+		box->pmu->type->msr_offset * box->pmu->pmu_idx;
+}
+
+static inline int uncore_perf_ctr_bits(struct intel_uncore_box *box)
+{
+	return box->pmu->type->perf_ctr_bits;
+}
+
+static inline int uncore_fixed_ctr_bits(struct intel_uncore_box *box)
+{
+	return box->pmu->type->fixed_ctr_bits;
+}
+
+static inline int uncore_num_counters(struct intel_uncore_box *box)
+{
+	return box->pmu->type->num_counters;
+}
+
+static inline void uncore_disable_box(struct intel_uncore_box *box)
+{
+	if (box->pmu->type->ops->disable_box)
+		box->pmu->type->ops->disable_box(box);
+}
+
+static inline void uncore_enable_box(struct intel_uncore_box *box)
+{
+	if (box->pmu->type->ops->enable_box)
+		box->pmu->type->ops->enable_box(box);
+}
+
+static inline void uncore_disable_event(struct intel_uncore_box *box,
+				struct perf_event *event)
+{
+	box->pmu->type->ops->disable_event(box, event);
+}
+
+static inline void uncore_enable_event(struct intel_uncore_box *box,
+				struct perf_event *event)
+{
+	box->pmu->type->ops->enable_event(box, event);
+}
+
+static inline u64 uncore_read_counter(struct intel_uncore_box *box,
+				struct perf_event *event)
+{
+	return box->pmu->type->ops->read_counter(box, event);
+}
+
+static inline void uncore_box_init(struct intel_uncore_box *box)
+{
+	if (!test_and_set_bit(UNCORE_BOX_FLAG_INITIATED, &box->flags)) {
+		if (box->pmu->type->ops->init_box)
+			box->pmu->type->ops->init_box(box);
+	}
+}
-- 
1.7.7.6



* [PATCH 5/9] perf: Add Nehalem and Sandy Bridge uncore support
  2012-05-02  2:07 [PATCH V3 0/9] perf: Intel uncore pmu counting support Yan, Zheng
                   ` (3 preceding siblings ...)
  2012-05-02  2:07 ` [PATCH 4/9] perf: Generic intel uncore support Yan, Zheng
@ 2012-05-02  2:07 ` Yan, Zheng
  2012-05-03 21:04   ` Peter Zijlstra
  2012-05-03 21:04   ` Peter Zijlstra
  2012-05-02  2:07 ` [PATCH 6/9] perf: Generic pci uncore device support Yan, Zheng
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 38+ messages in thread
From: Yan, Zheng @ 2012-05-02  2:07 UTC (permalink / raw)
  To: a.p.zijlstra, mingo, andi, eranian, jolsa, ming.m.lin; +Cc: linux-kernel

From: "Yan, Zheng" <zheng.z.yan@intel.com>

Add Intel Nehalem and Sandy Bridge uncore pmu support.
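
As a worked example of the encoding: the format attributes below place
the event select in config:0-7 and the unit mask in config:8-15, so an
alias such as QMC_WRITES_FULL_ANY ("event=0x2f,umask=0xf") corresponds
to the raw value (0xf << 8) | 0x2f = 0xf2f, i.e. something like:
  perf stat -a -C 0 -e 'uncore/config=0xf2f/' sleep 1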

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |  195 +++++++++++++++++++++++++
 arch/x86/kernel/cpu/perf_event_intel_uncore.h |   50 +++++++
 2 files changed, 245 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index 0dda34e..6022c8a 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -10,6 +10,192 @@ static cpumask_t uncore_cpu_mask;
 static struct event_constraint constraint_fixed =
 	EVENT_CONSTRAINT((u64)-1, 1 << UNCORE_PMC_IDX_FIXED, (u64)-1);
 
+DEFINE_UNCORE_FORMAT_ATTR(event, event, "config:0-7");
+DEFINE_UNCORE_FORMAT_ATTR(umask, umask, "config:8-15");
+DEFINE_UNCORE_FORMAT_ATTR(edge, edge, "config:18");
+DEFINE_UNCORE_FORMAT_ATTR(inv, inv, "config:23");
+DEFINE_UNCORE_FORMAT_ATTR(cmask5, cmask, "config:24-28");
+DEFINE_UNCORE_FORMAT_ATTR(cmask8, cmask, "config:24-31");
+
+/* Sandy Bridge uncore support */
+static void snb_uncore_msr_enable_event(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (hwc->idx < UNCORE_PMC_IDX_FIXED)
+		wrmsrl(hwc->config_base, hwc->config | SNB_UNC_CTL_EN);
+	else
+		wrmsrl(hwc->config_base, SNB_UNC_CTL_EN);
+}
+
+static void snb_uncore_msr_disable_event(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	wrmsrl(event->hw.config_base, 0);
+}
+
+static u64 snb_uncore_msr_read_counter(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	u64 count;
+	rdmsrl(event->hw.event_base, count);
+	return count;
+}
+
+static void snb_uncore_msr_init_box(struct intel_uncore_box *box)
+{
+	if (box->pmu->pmu_idx == 0) {
+		wrmsrl(SNB_UNC_PERF_GLOBAL_CTL,
+			SNB_UNC_GLOBAL_CTL_EN | SNB_UNC_GLOBAL_CTL_CORE_ALL);
+	}
+}
+
+static struct attribute *snb_uncore_formats_attr[] = {
+	&format_attr_event.attr,
+	&format_attr_umask.attr,
+	&format_attr_edge.attr,
+	&format_attr_inv.attr,
+	&format_attr_cmask5.attr,
+	NULL,
+};
+
+static struct attribute_group snb_uncore_format_group = {
+	.name = "format",
+	.attrs = snb_uncore_formats_attr,
+};
+
+static struct intel_uncore_ops snb_uncore_msr_ops = {
+	.init_box	= snb_uncore_msr_init_box,
+	.disable_event	= snb_uncore_msr_disable_event,
+	.enable_event	= snb_uncore_msr_enable_event,
+	.read_counter	= snb_uncore_msr_read_counter,
+};
+
+static struct event_constraint snb_uncore_cbo_constraints[] = {
+	UNCORE_EVENT_CONSTRAINT(0x80, 0x1),
+	UNCORE_EVENT_CONSTRAINT(0x83, 0x1),
+	EVENT_CONSTRAINT_END
+};
+
+static struct intel_uncore_type snb_uncore_cbo = {
+	.name		= "cbo",
+	.num_counters   = 2,
+	.num_boxes	= 4,
+	.perf_ctr_bits	= 44,
+	.fixed_ctr_bits	= 48,
+	.perf_ctr	= SNB_UNC_CBO_0_PER_CTR0,
+	.event_ctl	= SNB_UNC_CBO_0_PERFEVTSEL0,
+	.fixed_ctr	= SNB_UNC_FIXED_CTR,
+	.fixed_ctl	= SNB_UNC_FIXED_CTR_CTRL,
+	.single_fixed	= 1,
+	.event_mask	= SNB_UNC_RAW_EVENT_MASK,
+	.msr_offset	= SNB_UNC_CBO_MSR_OFFSET,
+	.constraints	= snb_uncore_cbo_constraints,
+	.ops		= &snb_uncore_msr_ops,
+	.format_group	= &snb_uncore_format_group,
+};
+
+static struct intel_uncore_type *snb_msr_uncores[] = {
+	&snb_uncore_cbo,
+	NULL,
+};
+/* end of Sandy Bridge uncore support */
+
+/* Nehalem uncore support */
+static void nhm_uncore_msr_disable_box(struct intel_uncore_box *box)
+{
+	wrmsrl(NHM_UNC_PERF_GLOBAL_CTL, 0);
+}
+
+static void nhm_uncore_msr_enable_box(struct intel_uncore_box *box)
+{
+	wrmsrl(NHM_UNC_PERF_GLOBAL_CTL,
+		NHM_UNC_GLOBAL_CTL_EN_PC_ALL | NHM_UNC_GLOBAL_CTL_EN_FC);
+}
+
+static void nhm_uncore_msr_enable_event(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (hwc->idx < UNCORE_PMC_IDX_FIXED)
+		wrmsrl(hwc->config_base, hwc->config | SNB_UNC_CTL_EN);
+	else
+		wrmsrl(hwc->config_base, NHM_UNC_FIXED_CTR_CTL_EN);
+}
+
+static struct attribute *nhm_uncore_formats_attr[] = {
+	&format_attr_event.attr,
+	&format_attr_umask.attr,
+	&format_attr_edge.attr,
+	&format_attr_inv.attr,
+	&format_attr_cmask8.attr,
+	NULL,
+};
+
+static struct attribute_group nhm_uncore_format_group = {
+	.name = "format",
+	.attrs = nhm_uncore_formats_attr,
+};
+
+static struct uncore_event_desc nhm_uncore_events[] = {
+	INTEL_UNCORE_EVENT_DESC(CLOCKTICKS, "config=0xffff"),
+	/* full cache line writes to DRAM */
+	INTEL_UNCORE_EVENT_DESC(QMC_WRITES_FULL_ANY, "event=0x2f,umask=0xf"),
+	/* Quickpath Memory Controller normal priority read requests */
+	INTEL_UNCORE_EVENT_DESC(QMC_NORMAL_READS_ANY, "event=0x2c,umask=0xf"),
+	/* Quickpath Home Logic read requests from the IOH */
+	INTEL_UNCORE_EVENT_DESC(QHL_REQUEST_IOH_READS,
+				"event=0x20,umask=0x1"),
+	/* Quickpath Home Logic write requests from the IOH */
+	INTEL_UNCORE_EVENT_DESC(QHL_REQUEST_IOH_WRITES,
+				"event=0x20,umask=0x2"),
+	/* Quickpath Home Logic read requests from a remote socket */
+	INTEL_UNCORE_EVENT_DESC(QHL_REQUEST_REMOTE_READS,
+				"event=0x20,umask=0x4"),
+	/* Quickpath Home Logic write requests from a remote socket */
+	INTEL_UNCORE_EVENT_DESC(QHL_REQUEST_REMOTE_WRITES,
+				"event=0x20,umask=0x8"),
+	/* Quickpath Home Logic read requests from the local socket */
+	INTEL_UNCORE_EVENT_DESC(QHL_REQUEST_LOCAL_READS,
+				"event=0x20,umask=0x10"),
+	/* Quickpath Home Logic write requests from the local socket */
+	INTEL_UNCORE_EVENT_DESC(QHL_REQUEST_LOCAL_WRITES,
+				"event=0x20,umask=0x20"),
+	{ /* end: all zeroes */ },
+};
+
+static struct intel_uncore_ops nhm_uncore_msr_ops = {
+	.disable_box	= nhm_uncore_msr_disable_box,
+	.enable_box	= nhm_uncore_msr_enable_box,
+	.disable_event	= snb_uncore_msr_disable_event,
+	.enable_event	= nhm_uncore_msr_enable_event,
+	.read_counter	= snb_uncore_msr_read_counter,
+};
+
+static struct intel_uncore_type nhm_uncore = {
+	.name		= "",
+	.num_counters   = 8,
+	.num_boxes	= 1,
+	.perf_ctr_bits	= 48,
+	.fixed_ctr_bits	= 48,
+	.event_ctl	= NHM_UNC_PERFEVTSEL0,
+	.perf_ctr	= NHM_UNC_UNCORE_PMC0,
+	.fixed_ctr	= NHM_UNC_FIXED_CTR,
+	.fixed_ctl	= NHM_UNC_FIXED_CTR_CTRL,
+	.event_mask	= NHM_UNC_RAW_EVENT_MASK,
+	.event_descs	= nhm_uncore_events,
+	.ops		= &nhm_uncore_msr_ops,
+	.format_group	= &nhm_uncore_format_group,
+};
+
+static struct intel_uncore_type *nhm_msr_uncores[] = {
+	&nhm_uncore,
+	NULL,
+};
+/* end of Nehalem uncore support */
+
 static void uncore_assign_hw_event(struct intel_uncore_box *box,
 				struct perf_event *event, int idx)
 {
@@ -821,6 +1007,15 @@ static int __init uncore_cpu_init(void)
 	int ret, cpu;
 
 	switch (boot_cpu_data.x86_model) {
+	case 26: /* Nehalem */
+	case 30:
+	case 31:
+	case 37: /* Westmere */
+		msr_uncores = nhm_msr_uncores;
+		break;
+	case 42: /* Sandy Bridge */
+		msr_uncores = snb_msr_uncores;
+		break;
 	default:
 		return 0;
 	}
diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
index ab648f3..1c87569 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -15,6 +15,56 @@
 
 #define UNCORE_EVENT_CONSTRAINT(c, n) EVENT_CONSTRAINT(c, n, 0xff)
 
+/* SNB event control */
+#define SNB_UNC_CTL_EV_SEL_MASK			0x000000ff
+#define SNB_UNC_CTL_UMASK_MASK			0x0000ff00
+#define SNB_UNC_CTL_EDGE_DET			(1 << 18)
+#define SNB_UNC_CTL_EN				(1 << 22)
+#define SNB_UNC_CTL_INVERT			(1 << 23)
+#define SNB_UNC_CTL_CMASK_MASK			0x1f000000
+#define NHM_UNC_CTL_CMASK_MASK			0xff000000
+#define NHM_UNC_FIXED_CTR_CTL_EN		(1 << 0)
+
+#define SNB_UNC_RAW_EVENT_MASK			(SNB_UNC_CTL_EV_SEL_MASK | \
+						 SNB_UNC_CTL_UMASK_MASK | \
+						 SNB_UNC_CTL_EDGE_DET | \
+						 SNB_UNC_CTL_INVERT | \
+						 SNB_UNC_CTL_CMASK_MASK)
+
+#define NHM_UNC_RAW_EVENT_MASK			(SNB_UNC_CTL_EV_SEL_MASK | \
+						 SNB_UNC_CTL_UMASK_MASK | \
+						 SNB_UNC_CTL_EDGE_DET | \
+						 SNB_UNC_CTL_INVERT | \
+						 NHM_UNC_CTL_CMASK_MASK)
+
+/* SNB global control register */
+#define SNB_UNC_PERF_GLOBAL_CTL                 0x391
+#define SNB_UNC_FIXED_CTR_CTRL                  0x394
+#define SNB_UNC_FIXED_CTR                       0x395
+
+/* SNB uncore global control */
+#define SNB_UNC_GLOBAL_CTL_CORE_ALL             ((1 << 4) - 1)
+#define SNB_UNC_GLOBAL_CTL_EN                   (1 << 29)
+
+/* SNB Cbo register */
+#define SNB_UNC_CBO_0_PERFEVTSEL0               0x700
+#define SNB_UNC_CBO_0_PER_CTR0                  0x706
+#define SNB_UNC_CBO_MSR_OFFSET                  0x10
+
+/* NHM global control register */
+#define NHM_UNC_PERF_GLOBAL_CTL                 0x391
+#define NHM_UNC_FIXED_CTR                       0x394
+#define NHM_UNC_FIXED_CTR_CTRL                  0x395
+
+/* NHM uncore global control */
+#define NHM_UNC_GLOBAL_CTL_EN_PC_ALL            ((1ULL << 8) - 1)
+#define NHM_UNC_GLOBAL_CTL_EN_FC                (1ULL << 32)
+
+/* NHM uncore register */
+#define NHM_UNC_PERFEVTSEL0                     0x3c0
+#define NHM_UNC_UNCORE_PMC0                     0x3b0
+
+
 struct intel_uncore_ops;
 struct intel_uncore_pmu;
 struct intel_uncore_box;
-- 
1.7.7.6



* [PATCH 6/9] perf: Generic pci uncore device support
  2012-05-02  2:07 [PATCH V3 0/9] perf: Intel uncore pmu counting support Yan, Zheng
                   ` (4 preceding siblings ...)
  2012-05-02  2:07 ` [PATCH 5/9] perf: Add Nehalem and Sandy Bridge " Yan, Zheng
@ 2012-05-02  2:07 ` Yan, Zheng
  2012-05-03 21:37   ` Peter Zijlstra
                     ` (2 more replies)
  2012-05-02  2:07 ` [PATCH 7/9] perf: Add Sandy Bridge-EP uncore support Yan, Zheng
                   ` (2 subsequent siblings)
  8 siblings, 3 replies; 38+ messages in thread
From: Yan, Zheng @ 2012-05-02  2:07 UTC (permalink / raw)
  To: a.p.zijlstra, mingo, andi, eranian, jolsa, ming.m.lin; +Cc: linux-kernel

From: "Yan, Zheng" <zheng.z.yan@intel.com>

This patch adds generic support for uncore pmus presented as pci
devices.
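
As a hedged sketch of how a platform patch is expected to hook into
this (the driver name, device id and uncore type below are
placeholders, not taken from this patch): the id table's driver_data
points at the matching intel_uncore_type, which uncore_pci_probe()
recovers; uncore_pci_init() then fills in the probe/remove callbacks
before registering the driver.

	static struct pci_device_id hypothetical_uncore_pci_ids[] = {
		{ /* placeholder id, not a real uncore device */
		  PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x1234),
		  .driver_data = (unsigned long)&hypothetical_uncore_type,
		},
		{ /* end: all zeroes */ },
	};

	static struct pci_driver hypothetical_uncore_pci_driver = {
		.name		= "hypothetical_uncore",
		.id_table	= hypothetical_uncore_pci_ids,
	};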

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |  210 ++++++++++++++++++++++---
 arch/x86/kernel/cpu/perf_event_intel_uncore.h |   29 ++++
 2 files changed, 214 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index 6022c8a..b4a15a5 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -2,6 +2,7 @@
 
 static struct intel_uncore_type *empty_uncore[] = { NULL, };
 static struct intel_uncore_type **msr_uncores = empty_uncore;
+static struct intel_uncore_type **pci_uncores = empty_uncore;
 
 /* mask of cpus that collect uncore events */
 static cpumask_t uncore_cpu_mask;
@@ -205,13 +206,23 @@ static void uncore_assign_hw_event(struct intel_uncore_box *box,
 	hwc->last_tag = ++box->tags[idx];
 
 	if (hwc->idx == UNCORE_PMC_IDX_FIXED) {
-		hwc->event_base = uncore_msr_fixed_ctr(box);
-		hwc->config_base = uncore_msr_fixed_ctl(box);
+		if (box->pci_dev) {
+			hwc->event_base = uncore_pci_fixed_ctr(box);
+			hwc->config_base = uncore_pci_fixed_ctl(box);
+		} else {
+			hwc->event_base = uncore_msr_fixed_ctr(box);
+			hwc->config_base = uncore_msr_fixed_ctl(box);
+		}
 		return;
 	}
 
-	hwc->config_base = uncore_msr_event_ctl(box, hwc->idx);
-	hwc->event_base =  uncore_msr_perf_ctr(box, hwc->idx);
+	if (box->pci_dev) {
+		hwc->config_base = uncore_pci_event_ctl(box, hwc->idx);
+		hwc->event_base =  uncore_pci_perf_ctr(box, hwc->idx);
+	} else {
+		hwc->config_base = uncore_msr_event_ctl(box, hwc->idx);
+		hwc->event_base =  uncore_msr_perf_ctr(box, hwc->idx);
+	}
 }
 
 static void uncore_perf_event_update(struct intel_uncore_box *box,
@@ -733,6 +744,13 @@ static void __init uncore_type_exit(struct intel_uncore_type *type)
 	type->pmus = NULL;
 }
 
+static void uncore_types_exit(struct intel_uncore_type **types)
+{
+	int i;
+	for (i = 0; types[i]; i++)
+		uncore_type_exit(types[i]);
+}
+
 static int __init uncore_type_init(struct intel_uncore_type *type)
 {
 	struct intel_uncore_pmu *pmus;
@@ -798,6 +816,121 @@ static int __init uncore_types_init(struct intel_uncore_type **types)
 	return ret;
 }
 
+static DEFINE_SPINLOCK(uncore_pci_lock);
+static struct pci_driver *uncore_pci_driver;
+static bool pcidrv_registered;
+/* pci bus to socket mapping */
+static int pcibus_to_phyid[256] = { [0 ... 255] = -1, };
+
+/*
+ * add a pci uncore device
+ */
+static int __devinit uncore_pci_add(struct intel_uncore_type *type,
+				struct pci_dev *pdev)
+{
+	struct intel_uncore_pmu *pmu;
+	struct intel_uncore_box *box;
+	int phyid, i, ret = 0;
+
+	phyid = pcibus_to_phyid[pdev->bus->number];
+	if (phyid < 0)
+		return -ENODEV;
+
+	box = uncore_alloc_box(0);
+	if (!box)
+		return -ENOMEM;
+
+	/*
+	 * for a performance monitoring unit with multiple boxes,
+	 * each box has a different function id.
+	 */
+	for (i = 0; i < type->num_boxes; i++) {
+		pmu = &type->pmus[i];
+		if (pmu->func_id == pdev->devfn)
+			break;
+		if (pmu->func_id < 0) {
+			pmu->func_id = pdev->devfn;
+			break;
+		}
+		pmu = NULL;
+	}
+
+	if (pmu) {
+		box->phy_id = phyid;
+		box->pci_dev = pdev;
+		box->pmu = pmu;
+		uncore_box_init(box);
+		pci_set_drvdata(pdev, box);
+		spin_lock(&uncore_pci_lock);
+		uncore_pmu_add_box(pmu, box);
+		spin_unlock(&uncore_pci_lock);
+	} else {
+		ret = -EINVAL;
+		kfree(box);
+	}
+	return ret;
+}
+
+static void __devexit uncore_pci_remove(struct pci_dev *pdev)
+{
+	struct intel_uncore_box *box = pci_get_drvdata(pdev);
+	int phyid = pcibus_to_phyid[pdev->bus->number];
+
+	if (WARN_ON_ONCE(phyid != box->phy_id))
+		return;
+
+	box->pci_dev = NULL;
+	if (--box->refcnt == 0) {
+		spin_lock(&uncore_pci_lock);
+		hlist_del_rcu(&box->hlist);
+		spin_unlock(&uncore_pci_lock);
+		kfree_rcu(box, rcu_head);
+	}
+}
+
+static int __devinit uncore_pci_probe(struct pci_dev *pdev,
+				const struct pci_device_id *id)
+{
+	struct intel_uncore_type *type;
+
+	type = (struct intel_uncore_type *)id->driver_data;
+	return uncore_pci_add(type, pdev);
+}
+
+static int __init uncore_pci_init(void)
+{
+	int ret;
+
+	switch (boot_cpu_data.x86_model) {
+	default:
+		return 0;
+	}
+
+	ret =  uncore_types_init(pci_uncores);
+	if (ret)
+		return ret;
+
+	uncore_pci_driver->probe = uncore_pci_probe;
+	uncore_pci_driver->remove = uncore_pci_remove;
+
+	ret = pci_register_driver(uncore_pci_driver);
+	if (ret == 0)
+		pcidrv_registered = true;
+	else
+		uncore_types_exit(pci_uncores);
+
+	return ret;
+}
+
+static void __init uncore_pci_exit(void)
+{
+	if (pcidrv_registered) {
+		pcidrv_registered = false;
+		pci_unregister_driver(uncore_pci_driver);
+		uncore_types_exit(pci_uncores);
+	}
+}
+
 static void __cpuinit uncore_cpu_dying(int cpu)
 {
 	struct intel_uncore_type *type;
@@ -882,6 +1015,7 @@ static void __cpuinit uncore_event_exit_cpu(int cpu)
 	struct intel_uncore_type *type;
 	struct intel_uncore_pmu *pmu;
 	struct intel_uncore_box *box;
+	struct intel_uncore_type **uncores;
 	int i, j, phyid, target;
 
 	/* if exiting cpu is used for collecting uncore events */
@@ -904,22 +1038,28 @@ static void __cpuinit uncore_event_exit_cpu(int cpu)
 	if (target >= 0)
 		cpumask_set_cpu(target, &uncore_cpu_mask);
 
-	for (i = 0; msr_uncores[i]; i++) {
-		type = msr_uncores[i];
-		for (j = 0; j < type->num_boxes; j++) {
-			pmu = &type->pmus[j];
-			box = uncore_pmu_find_box(pmu, phyid);
-			WARN_ON_ONCE(box->cpu != cpu);
-
-			if (target >= 0) {
-				uncore_pmu_cancel_hrtimer(box);
-				perf_pmu_migrate_context(&pmu->pmu,
+	uncores = msr_uncores;
+	while (1) {
+		for (i = 0; uncores[i]; i++) {
+			type = uncores[i];
+			for (j = 0; j < type->num_boxes; j++) {
+				pmu = &type->pmus[j];
+				box = uncore_pmu_find_box(pmu, phyid);
+				WARN_ON_ONCE(box->cpu != cpu);
+
+				if (target >= 0) {
+					uncore_pmu_cancel_hrtimer(box);
+					perf_pmu_migrate_context(&pmu->pmu,
 							cpu, target);
-				box->cpu = target;
-			} else {
-				box->cpu = -1;
+					box->cpu = target;
+				} else {
+					box->cpu = -1;
+				}
 			}
 		}
+		if (uncores != msr_uncores)
+			break;
+		uncores = pci_uncores;
 	}
 }
 
@@ -928,6 +1068,7 @@ static void __cpuinit uncore_event_init_cpu(int cpu)
 	struct intel_uncore_type *type;
 	struct intel_uncore_pmu *pmu;
 	struct intel_uncore_box *box;
+	struct intel_uncore_type **uncores;
 	int i, j, phyid;
 
 	phyid = topology_physical_package_id(cpu);
@@ -938,14 +1079,20 @@ static void __cpuinit uncore_event_init_cpu(int cpu)
 
 	cpumask_set_cpu(cpu, &uncore_cpu_mask);
 
-	for (i = 0; msr_uncores[i]; i++) {
-		type = msr_uncores[i];
-		for (j = 0; j < type->num_boxes; j++) {
-			pmu = &type->pmus[j];
-			box = uncore_pmu_find_box(pmu, phyid);
-			WARN_ON_ONCE(box->cpu != -1);
-			box->cpu = cpu;
+	uncores = msr_uncores;
+	while (1) {
+		for (i = 0; uncores[i]; i++) {
+			type = uncores[i];
+			for (j = 0; j < type->num_boxes; j++) {
+				pmu = &type->pmus[j];
+				box = uncore_pmu_find_box(pmu, phyid);
+				WARN_ON_ONCE(box->cpu != -1);
+				box->cpu = cpu;
+			}
 		}
+		if (uncores != msr_uncores)
+			break;
+		uncores = pci_uncores;
 	}
 }
 
@@ -1053,6 +1200,14 @@ static int __init uncore_pmus_register(void)
 		}
 	}
 
+	for (i = 0; pci_uncores[i]; i++) {
+		type = pci_uncores[i];
+		for (j = 0; j < type->num_boxes; j++) {
+			pmu = &type->pmus[j];
+			uncore_pmu_register(pmu);
+		}
+	}
+
 	return 0;
 }
 
@@ -1063,9 +1218,14 @@ static int __init intel_uncore_init(void)
 	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
 		return -ENODEV;
 
-	ret = uncore_cpu_init();
+	ret = uncore_pci_init();
 	if (ret)
 		goto fail;
+	ret = uncore_cpu_init();
+	if (ret) {
+		uncore_pci_exit();
+		goto fail;
+	}
 
 	uncore_pmus_register();
 	return 0;
diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
index 1c87569..b39e623 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -1,5 +1,6 @@
 #include <linux/module.h>
 #include <linux/slab.h>
+#include <linux/pci.h>
 #include <linux/perf_event.h>
 #include "perf_event.h"
 
@@ -124,6 +125,7 @@ struct intel_uncore_box {
 	struct perf_event *event_list[UNCORE_PMC_IDX_MAX];
 	unsigned long active_mask[BITS_TO_LONGS(UNCORE_PMC_IDX_MAX)];
 	u64 tags[UNCORE_PMC_IDX_MAX];
+	struct pci_dev *pci_dev;
 	struct intel_uncore_pmu *pmu;
 	struct hrtimer hrtimer;
 	struct rcu_head rcu_head;
@@ -162,6 +164,33 @@ static ssize_t uncore_event_show(struct kobject *kobj,
 	return sprintf(buf, "%s", event->config);
 }
 
+static inline unsigned uncore_pci_box_ctl(struct intel_uncore_box *box)
+{
+	return box->pmu->type->box_ctl;
+}
+
+static inline unsigned uncore_pci_fixed_ctl(struct intel_uncore_box *box)
+{
+	return box->pmu->type->fixed_ctl;
+}
+
+static inline unsigned uncore_pci_fixed_ctr(struct intel_uncore_box *box)
+{
+	return box->pmu->type->fixed_ctr;
+}
+
+static inline
+unsigned uncore_pci_event_ctl(struct intel_uncore_box *box, int idx)
+{
+	return idx * 4 + box->pmu->type->event_ctl;
+}
+
+static inline
+unsigned uncore_pci_perf_ctr(struct intel_uncore_box *box, int idx)
+{
+	return idx * 8 + box->pmu->type->perf_ctr;
+}
+
 static inline
 unsigned uncore_msr_box_ctl(struct intel_uncore_box *box)
 {
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 7/9] perf: Add Sandy Bridge-EP uncore support
  2012-05-02  2:07 [PATCH V3 0/9] perf: Intel uncore pmu counting support Yan, Zheng
                   ` (5 preceding siblings ...)
  2012-05-02  2:07 ` [PATCH 6/9] perf: Generic pci uncore device support Yan, Zheng
@ 2012-05-02  2:07 ` Yan, Zheng
  2012-05-03 21:12   ` Peter Zijlstra
  2012-05-02  2:07 ` [PATCH 8/9] perf tool: Make the event parser reentrantable Yan, Zheng
  2012-05-02  2:07 ` [PATCH 9/9] perf tool: Add pmu event alias support Yan, Zheng
  8 siblings, 1 reply; 38+ messages in thread
From: Yan, Zheng @ 2012-05-02  2:07 UTC (permalink / raw)
  To: a.p.zijlstra, mingo, andi, eranian, jolsa, ming.m.lin; +Cc: linux-kernel

From: "Yan, Zheng" <zheng.z.yan@intel.com>

Add Intel Sandy Bridge-EP uncore pmu support. The uncore
subsystem in Sandy Bridge-EP consists of 8 components (Ubox,
Caching Agent, Home Agent, Memory Controller, Power Control,
QPI Link Layer, R2PCIe, R3QPI).
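
All of the boxes are driven with the same freeze/program/unfreeze
discipline. A minimal sketch of a one-shot count on one of the PCI
boxes, using the register definitions this patch adds (the helper
itself is illustrative only and not part of the patch, error
handling omitted):

static u64 snbep_pmon_count_once(struct pci_dev *pdev, u32 evsel)
{
	u32 config, lo, hi;

	/* reset control/counter state and enable freeze support */
	pci_write_config_dword(pdev, SNBEP_PCI_PMON_BOX_CTL,
				SNBEP_PMON_BOX_CTL_INT);
	/* freeze the box while (re)programming it */
	pci_read_config_dword(pdev, SNBEP_PCI_PMON_BOX_CTL, &config);
	pci_write_config_dword(pdev, SNBEP_PCI_PMON_BOX_CTL,
				config | SNBEP_PMON_BOX_CTL_FRZ);
	/* select event + umask and enable counter 0 */
	pci_write_config_dword(pdev, SNBEP_PCI_PMON_CTL0,
				evsel | SNBEP_PMON_CTL_EN);
	/* unfreeze: clear the freeze bit so counting starts */
	pci_write_config_dword(pdev, SNBEP_PCI_PMON_BOX_CTL,
				config & ~SNBEP_PMON_BOX_CTL_FRZ);
	/* ... let it run, then read the 48-bit counter in two halves */
	pci_read_config_dword(pdev, SNBEP_PCI_PMON_CTR0, &lo);
	pci_read_config_dword(pdev, SNBEP_PCI_PMON_CTR0 + 4, &hi);
	return ((u64)hi << 32) | lo;
}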

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |  489 ++++++++++++++++++++++++-
 arch/x86/kernel/cpu/perf_event_intel_uncore.h |   86 +++++
 include/linux/pci_ids.h                       |   11 +
 3 files changed, 584 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index b4a15a5..c0c77d0 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -4,6 +4,9 @@ static struct intel_uncore_type *empty_uncore[] = { NULL, };
 static struct intel_uncore_type **msr_uncores = empty_uncore;
 static struct intel_uncore_type **pci_uncores = empty_uncore;
 
+/* pci bus to socket mapping */
+static int pcibus_to_phyid[256] = { [0 ... 255] = -1, };
+
 /* mask of cpus that collect uncore events */
 static cpumask_t uncore_cpu_mask;
 
@@ -17,6 +20,482 @@ DEFINE_UNCORE_FORMAT_ATTR(edge, edge, "config:18");
 DEFINE_UNCORE_FORMAT_ATTR(inv, inv, "config:23");
 DEFINE_UNCORE_FORMAT_ATTR(cmask5, cmask, "config:24-28");
 DEFINE_UNCORE_FORMAT_ATTR(cmask8, cmask, "config:24-31");
+DEFINE_UNCORE_FORMAT_ATTR(thresh8, thresh, "config:24-31");
+DEFINE_UNCORE_FORMAT_ATTR(thresh5, thresh, "config:24-28");
+DEFINE_UNCORE_FORMAT_ATTR(occ_sel, occ_sel, "config:14-15");
+DEFINE_UNCORE_FORMAT_ATTR(occ_invert, occ_invert, "config:30");
+DEFINE_UNCORE_FORMAT_ATTR(occ_edge, occ_edge, "config:14-51");
+
+/* Sandy Bridge-EP uncore support */
+static void snbep_uncore_pci_disable_box(struct intel_uncore_box *box)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	int box_ctl = uncore_pci_box_ctl(box);
+	u32 config;
+
+	pci_read_config_dword(pdev, box_ctl, &config);
+	config |= SNBEP_PMON_BOX_CTL_FRZ;
+	pci_write_config_dword(pdev, box_ctl, config);
+}
+
+static void snbep_uncore_pci_enable_box(struct intel_uncore_box *box)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	int box_ctl = uncore_pci_box_ctl(box);
+	u32 config;
+
+	pci_read_config_dword(pdev, box_ctl, &config);
+	config &= ~SNBEP_PMON_BOX_CTL_FRZ;
+	pci_write_config_dword(pdev, box_ctl, config);
+}
+
+static void snbep_uncore_pci_enable_event(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	struct hw_perf_event *hwc = &event->hw;
+
+	pci_write_config_dword(pdev, hwc->config_base, hwc->config |
+				SNBEP_PMON_CTL_EN);
+}
+
+static void snbep_uncore_pci_disable_event(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	struct hw_perf_event *hwc = &event->hw;
+
+	pci_write_config_dword(pdev, hwc->config_base, hwc->config);
+}
+
+static u64 snbep_uncore_pci_read_counter(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	struct hw_perf_event *hwc = &event->hw;
+	u64 count;
+
+	pci_read_config_dword(pdev, hwc->event_base, (u32 *)&count);
+	pci_read_config_dword(pdev, hwc->event_base + 4, (u32 *)&count + 1);
+	return count;
+}
+
+static void snbep_uncore_pci_init_box(struct intel_uncore_box *box)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	pci_write_config_dword(pdev, SNBEP_PCI_PMON_BOX_CTL,
+				SNBEP_PMON_BOX_CTL_INT);
+}
+
+static void snbep_uncore_msr_disable_box(struct intel_uncore_box *box)
+{
+	u64 config;
+	unsigned msr;
+
+	msr = uncore_msr_box_ctl(box);
+	if (msr) {
+		rdmsrl(msr, config);
+		config |= SNBEP_PMON_BOX_CTL_FRZ;
+		wrmsrl(msr, config);
+		return;
+	}
+}
+
+static void snbep_uncore_msr_enable_box(struct intel_uncore_box *box)
+{
+	u64 config;
+	unsigned msr;
+
+	msr = uncore_msr_box_ctl(box);
+	if (msr) {
+		rdmsrl(msr, config);
+		config &= ~SNBEP_PMON_BOX_CTL_FRZ;
+		wrmsrl(msr, config);
+		return;
+	}
+}
+
+static void snbep_uncore_msr_enable_event(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	wrmsrl(hwc->config_base, hwc->config | SNBEP_PMON_CTL_EN);
+}
+
+static void snbep_uncore_msr_disable_event(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	wrmsrl(hwc->config_base, hwc->config);
+}
+
+static u64 snbep_uncore_msr_read_counter(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	u64 count;
+
+	rdmsrl(hwc->event_base, count);
+	return count;
+}
+
+static void snbep_uncore_msr_init_box(struct intel_uncore_box *box)
+{
+	unsigned msr = uncore_msr_box_ctl(box);
+	if (msr)
+		wrmsrl(msr, SNBEP_PMON_BOX_CTL_INT);
+}
+
+static struct attribute *snbep_uncore_formats_attr[] = {
+	&format_attr_event.attr,
+	&format_attr_umask.attr,
+	&format_attr_edge.attr,
+	&format_attr_inv.attr,
+	&format_attr_thresh8.attr,
+	NULL,
+};
+
+static struct attribute *snbep_uncore_ubox_formats_attr[] = {
+	&format_attr_event.attr,
+	&format_attr_umask.attr,
+	&format_attr_edge.attr,
+	&format_attr_inv.attr,
+	&format_attr_thresh5.attr,
+	NULL,
+};
+
+static struct attribute *snbep_uncore_pcu_formats_attr[] = {
+	&format_attr_event.attr,
+	&format_attr_occ_sel.attr,
+	&format_attr_edge.attr,
+	&format_attr_inv.attr,
+	&format_attr_thresh5.attr,
+	&format_attr_occ_invert.attr,
+	&format_attr_occ_edge.attr,
+	NULL,
+};
+
+static struct uncore_event_desc snbep_uncore_imc_events[] = {
+	INTEL_UNCORE_EVENT_DESC(CLOCKTICKS, "config=0xffff"),
+	/* read */
+	INTEL_UNCORE_EVENT_DESC(CAS_COUNT_RD, "event=0x4,umask=0x3"),
+	/* write */
+	INTEL_UNCORE_EVENT_DESC(CAS_COUNT_WR, "event=0x4,umask=0xc"),
+	{ /* end: all zeroes */ },
+};
+
+static struct uncore_event_desc snbep_uncore_qpi_events[] = {
+	INTEL_UNCORE_EVENT_DESC(CLOCKTICKS, "event=0x14"),
+	/* outgoing data+nondata flits */
+	INTEL_UNCORE_EVENT_DESC(TxL_FLITS_ACTIVE, "event=0x0,umask=0x6"),
+	/* DRS data received */
+	INTEL_UNCORE_EVENT_DESC(DRS_DATA, "event=0x2,umask=0x8"),
+	/* NCB data received */
+	INTEL_UNCORE_EVENT_DESC(NCB_DATA, "event=0x3,umask=0x4"),
+	{ /* end: all zeroes */ },
+};
+
+static struct attribute_group snbep_uncore_format_group = {
+	.name = "format",
+	.attrs = snbep_uncore_formats_attr,
+};
+
+static struct attribute_group snbep_uncore_ubox_format_group = {
+	.name = "format",
+	.attrs = snbep_uncore_ubox_formats_attr,
+};
+
+static struct attribute_group snbep_uncore_pcu_format_group = {
+	.name = "format",
+	.attrs = snbep_uncore_pcu_formats_attr,
+};
+
+static struct intel_uncore_ops snbep_uncore_msr_ops = {
+	.init_box	= snbep_uncore_msr_init_box,
+	.disable_box	= snbep_uncore_msr_disable_box,
+	.enable_box	= snbep_uncore_msr_enable_box,
+	.disable_event	= snbep_uncore_msr_disable_event,
+	.enable_event	= snbep_uncore_msr_enable_event,
+	.read_counter	= snbep_uncore_msr_read_counter,
+};
+
+static struct intel_uncore_ops snbep_uncore_pci_ops = {
+	.init_box	= snbep_uncore_pci_init_box,
+	.disable_box	= snbep_uncore_pci_disable_box,
+	.enable_box	= snbep_uncore_pci_enable_box,
+	.disable_event	= snbep_uncore_pci_disable_event,
+	.enable_event	= snbep_uncore_pci_enable_event,
+	.read_counter	= snbep_uncore_pci_read_counter,
+};
+
+static struct event_constraint snbep_uncore_cbo_constraints[] = {
+	UNCORE_EVENT_CONSTRAINT(0x01, 0x1),
+	UNCORE_EVENT_CONSTRAINT(0x02, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x04, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x05, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x07, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x11, 0x1),
+	UNCORE_EVENT_CONSTRAINT(0x12, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x13, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x1b, 0xc),
+	UNCORE_EVENT_CONSTRAINT(0x1c, 0xc),
+	UNCORE_EVENT_CONSTRAINT(0x1d, 0xc),
+	UNCORE_EVENT_CONSTRAINT(0x1e, 0xc),
+	UNCORE_EVENT_CONSTRAINT(0x1f, 0xe),
+	UNCORE_EVENT_CONSTRAINT(0x21, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x23, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x31, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x32, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x33, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x34, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x35, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x36, 0x1),
+	UNCORE_EVENT_CONSTRAINT(0x37, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x38, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x39, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x3b, 0x1),
+	EVENT_CONSTRAINT_END
+};
+
+static struct event_constraint snbep_uncore_r2pcie_constraints[] = {
+	UNCORE_EVENT_CONSTRAINT(0x10, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x11, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x12, 0x1),
+	UNCORE_EVENT_CONSTRAINT(0x23, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x24, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x25, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x26, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x32, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x33, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x34, 0x3),
+	EVENT_CONSTRAINT_END
+};
+
+static struct event_constraint snbep_uncore_r3qpi_constraints[] = {
+	UNCORE_EVENT_CONSTRAINT(0x10, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x11, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x12, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x13, 0x1),
+	UNCORE_EVENT_CONSTRAINT(0x20, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x21, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x22, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x23, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x24, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x25, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x26, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x30, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x31, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x32, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x33, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x34, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x36, 0x3),
+	UNCORE_EVENT_CONSTRAINT(0x37, 0x3),
+	EVENT_CONSTRAINT_END
+};
+
+static struct intel_uncore_type snbep_uncore_ubox = {
+	.name		= "ubox",
+	.num_counters   = 2,
+	.num_boxes	= 1,
+	.perf_ctr_bits	= 44,
+	.fixed_ctr_bits	= 48,
+	.perf_ctr	= SNBEP_U_MSR_PMON_CTR0,
+	.event_ctl	= SNBEP_U_MSR_PMON_CTL0,
+	.event_mask	= SNBEP_U_MSR_PMON_RAW_EVENT_MASK,
+	.fixed_ctr	= SNBEP_U_MSR_PMON_UCLK_FIXED_CTR,
+	.fixed_ctl	= SNBEP_U_MSR_PMON_UCLK_FIXED_CTL,
+	.ops		= &snbep_uncore_msr_ops,
+	.format_group	= &snbep_uncore_ubox_format_group,
+};
+
+static struct intel_uncore_type snbep_uncore_cbo = {
+	.name		= "cbo",
+	.num_counters   = 4,
+	.num_boxes	= 8,
+	.perf_ctr_bits	= 44,
+	.event_ctl	= SNBEP_C0_MSR_PMON_CTL0,
+	.perf_ctr	= SNBEP_C0_MSR_PMON_CTR0,
+	.event_mask	= SNBEP_PMON_RAW_EVENT_MASK,
+	.box_ctl	= SNBEP_C0_MSR_PMON_BOX_CTL,
+	.msr_offset	= SNBEP_CBO_MSR_OFFSET,
+	.constraints	= snbep_uncore_cbo_constraints,
+	.ops		= &snbep_uncore_msr_ops,
+	.format_group	= &snbep_uncore_format_group,
+};
+
+static struct intel_uncore_type snbep_uncore_pcu = {
+	.name		= "pcu",
+	.num_counters   = 4,
+	.num_boxes	= 1,
+	.perf_ctr_bits	= 48,
+	.perf_ctr	= SNBEP_PCU_MSR_PMON_CTR0,
+	.event_ctl	= SNBEP_PCU_MSR_PMON_CTL0,
+	.event_mask	= SNBEP_PCU_MSR_PMON_RAW_EVENT_MASK,
+	.box_ctl	= SNBEP_PCU_MSR_PMON_BOX_CTL,
+	.ops		= &snbep_uncore_msr_ops,
+	.format_group	= &snbep_uncore_pcu_format_group,
+};
+
+static struct intel_uncore_type *snbep_msr_uncores[] = {
+	&snbep_uncore_ubox,
+	&snbep_uncore_cbo,
+	&snbep_uncore_pcu,
+	NULL,
+};
+
+#define SNBEP_UNCORE_PCI_COMMON_INIT()				\
+	.perf_ctr	= SNBEP_PCI_PMON_CTR0,			\
+	.event_ctl	= SNBEP_PCI_PMON_CTL0,			\
+	.event_mask	= SNBEP_PMON_RAW_EVENT_MASK,		\
+	.box_ctl	= SNBEP_PCI_PMON_BOX_CTL,		\
+	.ops		= &snbep_uncore_pci_ops,		\
+	.format_group	= &snbep_uncore_format_group
+
+static struct intel_uncore_type snbep_uncore_ha = {
+	.name		= "ha",
+	.num_counters   = 4,
+	.num_boxes	= 1,
+	.perf_ctr_bits	= 48,
+	SNBEP_UNCORE_PCI_COMMON_INIT(),
+};
+
+static struct intel_uncore_type snbep_uncore_imc = {
+	.name		= "imc",
+	.num_counters   = 4,
+	.num_boxes	= 4,
+	.perf_ctr_bits	= 48,
+	.fixed_ctr_bits	= 48,
+	.fixed_ctr	= SNBEP_MC_CHy_PCI_PMON_FIXED_CTR,
+	.fixed_ctl	= SNBEP_MC_CHy_PCI_PMON_FIXED_CTL,
+	.event_descs	= snbep_uncore_imc_events,
+	SNBEP_UNCORE_PCI_COMMON_INIT(),
+};
+
+static struct intel_uncore_type snbep_uncore_qpi = {
+	.name		= "qpi",
+	.num_counters   = 4,
+	.num_boxes	= 2,
+	.perf_ctr_bits	= 48,
+	.event_descs	= snbep_uncore_qpi_events,
+	SNBEP_UNCORE_PCI_COMMON_INIT(),
+};
+
+
+static struct intel_uncore_type snbep_uncore_r2pcie = {
+	.name		= "r2pcie",
+	.num_counters   = 4,
+	.num_boxes	= 1,
+	.perf_ctr_bits	= 44,
+	.constraints	= snbep_uncore_r2pcie_constraints,
+	SNBEP_UNCORE_PCI_COMMON_INIT(),
+};
+
+static struct intel_uncore_type snbep_uncore_r3qpi = {
+	.name		= "r3qpi",
+	.num_counters   = 3,
+	.num_boxes	= 2,
+	.perf_ctr_bits	= 44,
+	.constraints	= snbep_uncore_r3qpi_constraints,
+	SNBEP_UNCORE_PCI_COMMON_INIT(),
+};
+
+static struct intel_uncore_type *snbep_pci_uncores[] = {
+	&snbep_uncore_ha,
+	&snbep_uncore_imc,
+	&snbep_uncore_qpi,
+	&snbep_uncore_r2pcie,
+	&snbep_uncore_r3qpi,
+	NULL,
+};
+
+static DEFINE_PCI_DEVICE_TABLE(snbep_uncore_pci_ids) = {
+	{ /* Home Agent */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_UNC_HA),
+		.driver_data = (unsigned long)&snbep_uncore_ha,
+	},
+	{ /* MC Channel 0 */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_UNC_IMC0),
+		.driver_data = (unsigned long)&snbep_uncore_imc,
+	},
+	{ /* MC Channel 1 */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_UNC_IMC1),
+		.driver_data = (unsigned long)&snbep_uncore_imc,
+	},
+	{ /* MC Channel 2 */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_UNC_IMC2),
+		.driver_data = (unsigned long)&snbep_uncore_imc,
+	},
+	{ /* MC Channel 3 */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_UNC_IMC3),
+		.driver_data = (unsigned long)&snbep_uncore_imc,
+	},
+	{ /* QPI Port 0 */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_UNC_QPI0),
+		.driver_data = (unsigned long)&snbep_uncore_qpi,
+	},
+	{ /* QPI Port 1 */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_UNC_QPI1),
+		.driver_data = (unsigned long)&snbep_uncore_qpi,
+	},
+	{ /* R2PCIe */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_UNC_R2PCIE),
+		.driver_data = (unsigned long)&snbep_uncore_r2pcie,
+	},
+	{ /* R3QPI Link 0 */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_UNC_R3QPI0),
+		.driver_data = (unsigned long)&snbep_uncore_r3qpi,
+	},
+	{ /* R3QPI Link 1 */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_UNC_R3QPI1),
+		.driver_data = (unsigned long)&snbep_uncore_r3qpi,
+	},
+	{ /* end: all zeroes */ }
+};
+
+static struct pci_driver snbep_uncore_pci_driver = {
+	.name		= "snbep_uncore",
+	.id_table	= snbep_uncore_pci_ids,
+};
+
+/*
+ * build pci bus to socket mapping
+ */
+static void snbep_pci2phy_map_init(void)
+{
+	struct pci_dev *ubox_dev = NULL;
+	int i, bus, nodeid;
+	u32 config;
+
+	while (1) {
+		/* find the UBOX device */
+		ubox_dev = pci_get_device(PCI_VENDOR_ID_INTEL,
+					PCI_DEVICE_ID_INTEL_JAKETOWN_UBOX,
+					ubox_dev);
+		if (!ubox_dev)
+			break;
+		bus = ubox_dev->bus->number;
+		/* get the Node ID of the local register */
+		pci_read_config_dword(ubox_dev, 0x40, &config);
+		nodeid = config;
+		/* get the Node ID mapping */
+		pci_read_config_dword(ubox_dev, 0x54, &config);
+		/*
+		 * each three-bit field in the Node ID mapping register
+		 * maps to a particular node.
+		 */
+		for (i = 0; i < 8; i++) {
+			if (nodeid == ((config >> (3 * i)) & 0x7)) {
+				pcibus_to_phyid[bus] = i;
+				break;
+			}
+		}
+	}
+	return;
+}
+/* end of Sandy Bridge-EP uncore support */
+
 
 /* Sandy Bridge uncore support */
 static void snb_uncore_msr_enable_event(struct intel_uncore_box *box,
@@ -819,8 +1298,6 @@ static int __init uncore_types_init(struct intel_uncore_type **types)
 static DEFINE_SPINLOCK(uncore_pci_lock);
 static struct pci_driver *uncore_pci_driver;
 static bool pcidrv_registered;
-/* pci bus to socket mapping */
-static int pcibus_to_phyid[256] = { [0 ... 255] = -1, };
 
 /*
  * add a pci uncore device
@@ -902,6 +1379,11 @@ static int __init uncore_pci_init(void)
 	int ret;
 
 	switch (boot_cpu_data.x86_model) {
+	case 45: /* Sandy Bridge-EP */
+		pci_uncores = snbep_pci_uncores;
+		uncore_pci_driver = &snbep_uncore_pci_driver;
+		snbep_pci2phy_map_init();
+		break;
 	default:
 		return 0;
 	}
@@ -1163,6 +1645,9 @@ static int __init uncore_cpu_init(void)
 	case 42: /* Sandy Bridge */
 		msr_uncores = snb_msr_uncores;
 		break;
+	case 45: /* Sandy Bridge-EP */
+		msr_uncores = snbep_msr_uncores;
+		break;
 	default:
 		return 0;
 	}
diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
index b39e623..07c4908 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -65,6 +65,92 @@
 #define NHM_UNC_PERFEVTSEL0                     0x3c0
 #define NHM_UNC_UNCORE_PMC0                     0x3b0
 
+/* SNB-EP Box level control */
+#define SNBEP_PMON_BOX_CTL_RST_CTRL	(1 << 0)
+#define SNBEP_PMON_BOX_CTL_RST_CTRS	(1 << 1)
+#define SNBEP_PMON_BOX_CTL_FRZ		(1 << 8)
+#define SNBEP_PMON_BOX_CTL_FRZ_EN	(1 << 16)
+#define SNBEP_PMON_BOX_CTL_INT		(SNBEP_PMON_BOX_CTL_RST_CTRL | \
+					 SNBEP_PMON_BOX_CTL_RST_CTRS | \
+					 SNBEP_PMON_BOX_CTL_FRZ_EN)
+/* SNB-EP event control */
+#define SNBEP_PMON_CTL_EV_SEL_MASK	0x000000ff
+#define SNBEP_PMON_CTL_UMASK_MASK	0x0000ff00
+#define SNBEP_PMON_CTL_RST		(1 << 17)
+#define SNBEP_PMON_CTL_EDGE_DET		(1 << 18)
+#define SNBEP_PMON_CTL_EV_SEL_EXT	(1 << 21)	/* only for QPI */
+#define SNBEP_PMON_CTL_EN		(1 << 22)
+#define SNBEP_PMON_CTL_INVERT		(1 << 23)
+#define SNBEP_PMON_CTL_TRESH_MASK	0xff000000
+#define SNBEP_PMON_RAW_EVENT_MASK	(SNBEP_PMON_CTL_EV_SEL_MASK | \
+					 SNBEP_PMON_CTL_UMASK_MASK | \
+					 SNBEP_PMON_CTL_EDGE_DET | \
+					 SNBEP_PMON_CTL_INVERT | \
+					 SNBEP_PMON_CTL_TRESH_MASK)
+
+/* SNB-EP Ubox event control */
+#define SNBEP_U_MSR_PMON_CTL_TRESH_MASK		0x1f000000
+#define SNBEP_U_MSR_PMON_RAW_EVENT_MASK		\
+				(SNBEP_PMON_CTL_EV_SEL_MASK | \
+				 SNBEP_PMON_CTL_UMASK_MASK | \
+				 SNBEP_PMON_CTL_EDGE_DET | \
+				 SNBEP_PMON_CTL_INVERT | \
+				 SNBEP_U_MSR_PMON_CTL_TRESH_MASK)
+
+/* SNB-EP PCU event control */
+#define SNBEP_PCU_MSR_PMON_CTL_OCC_SEL_MASK	0x0000c000
+#define SNBEP_PCU_MSR_PMON_CTL_TRESH_MASK	0x1f000000
+#define SNBEP_PCU_MSR_PMON_CTL_OCC_INVERT	(1 << 30)
+#define SNBEP_PCU_MSR_PMON_CTL_OCC_EDGE_DET	(1 << 31)
+#define SNBEP_PCU_MSR_PMON_RAW_EVENT_MASK	\
+				(SNBEP_PMON_CTL_EV_SEL_MASK | \
+				 SNBEP_PCU_MSR_PMON_CTL_OCC_SEL_MASK | \
+				 SNBEP_PMON_CTL_EDGE_DET | \
+				 SNBEP_PMON_CTL_INVERT | \
+				 SNBEP_PCU_MSR_PMON_CTL_TRESH_MASK | \
+				 SNBEP_PCU_MSR_PMON_CTL_OCC_INVERT | \
+				 SNBEP_PCU_MSR_PMON_CTL_OCC_EDGE_DET)
+
+/* SNB-EP pci control register */
+#define SNBEP_PCI_PMON_BOX_CTL			0xf4
+#define SNBEP_PCI_PMON_CTL0			0xd8
+/* SNB-EP pci counter register */
+#define SNBEP_PCI_PMON_CTR0			0xa0
+
+/* SNB-EP home agent register */
+#define SNBEP_HA_PCI_PMON_BOX_ADDRMATCH0	0x40
+#define SNBEP_HA_PCI_PMON_BOX_ADDRMATCH1	0x44
+#define SNBEP_HA_PCI_PMON_BOX_OPCODEMATCH	0x48
+/* SNB-EP memory controller register */
+#define SNBEP_MC_CHy_PCI_PMON_FIXED_CTL		0xf0
+#define SNBEP_MC_CHy_PCI_PMON_FIXED_CTR		0xd0
+/* SNB-EP QPI register */
+#define SNBEP_Q_Py_PCI_PMON_PKT_MATCH0		0x228
+#define SNBEP_Q_Py_PCI_PMON_PKT_MATCH1		0x22c
+#define SNBEP_Q_Py_PCI_PMON_PKT_MASK0		0x238
+#define SNBEP_Q_Py_PCI_PMON_PKT_MASK1		0x23c
+
+/* SNB-EP Ubox register */
+#define SNBEP_U_MSR_PMON_CTR0			0xc16
+#define SNBEP_U_MSR_PMON_CTL0			0xc10
+
+#define SNBEP_U_MSR_PMON_UCLK_FIXED_CTL		0xc08
+#define SNBEP_U_MSR_PMON_UCLK_FIXED_CTR		0xc09
+
+/* SNB-EP Cbo register */
+#define SNBEP_C0_MSR_PMON_CTR0			0xd16
+#define SNBEP_C0_MSR_PMON_CTL0			0xd10
+#define SNBEP_C0_MSR_PMON_BOX_FILTER		0xd14
+#define SNBEP_C0_MSR_PMON_BOX_CTL		0xd04
+#define SNBEP_CBO_MSR_OFFSET			0x20
+
+/* SNB-EP PCU register */
+#define SNBEP_PCU_MSR_PMON_CTR0			0xc36
+#define SNBEP_PCU_MSR_PMON_CTL0			0xc30
+#define SNBEP_PCU_MSR_PMON_BOX_FILTER		0xc34
+#define SNBEP_PCU_MSR_PMON_BOX_CTL		0xc24
+#define SNBEP_PCU_MSR_CORE_C3_CTR		0x3fc
+#define SNBEP_PCU_MSR_CORE_C6_CTR		0x3fd
 
 struct intel_uncore_ops;
 struct intel_uncore_pmu;
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 3329965..9870b8d 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2754,6 +2754,17 @@
 #define PCI_DEVICE_ID_INTEL_IOAT_SNB7	0x3c27
 #define PCI_DEVICE_ID_INTEL_IOAT_SNB8	0x3c2e
 #define PCI_DEVICE_ID_INTEL_IOAT_SNB9	0x3c2f
+#define PCI_DEVICE_ID_INTEL_UNC_HA	0x3c46
+#define PCI_DEVICE_ID_INTEL_UNC_IMC0	0x3cb0
+#define PCI_DEVICE_ID_INTEL_UNC_IMC1	0x3cb1
+#define PCI_DEVICE_ID_INTEL_UNC_IMC2	0x3cb4
+#define PCI_DEVICE_ID_INTEL_UNC_IMC3	0x3cb5
+#define PCI_DEVICE_ID_INTEL_UNC_QPI0	0x3c41
+#define PCI_DEVICE_ID_INTEL_UNC_QPI1	0x3c42
+#define PCI_DEVICE_ID_INTEL_UNC_R2PCIE	0x3c43
+#define PCI_DEVICE_ID_INTEL_UNC_R3QPI0	0x3c44
+#define PCI_DEVICE_ID_INTEL_UNC_R3QPI1	0x3c45
+#define PCI_DEVICE_ID_INTEL_JAKETOWN_UBOX	0x3ce0
 #define PCI_DEVICE_ID_INTEL_IOAT_SNB	0x402f
 #define PCI_DEVICE_ID_INTEL_5100_16	0x65f0
 #define PCI_DEVICE_ID_INTEL_5100_21	0x65f5
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 8/9] perf tool: Make the event parser reentrantable
  2012-05-02  2:07 [PATCH V3 0/9] perf: Intel uncore pmu counting support Yan, Zheng
                   ` (6 preceding siblings ...)
  2012-05-02  2:07 ` [PATCH 7/9] perf: Add Sandy Bridge-EP uncore support Yan, Zheng
@ 2012-05-02  2:07 ` Yan, Zheng
  2012-05-02  2:07 ` [PATCH 9/9] perf tool: Add pmu event alias support Yan, Zheng
  8 siblings, 0 replies; 38+ messages in thread
From: Yan, Zheng @ 2012-05-02  2:07 UTC (permalink / raw)
  To: a.p.zijlstra, mingo, andi, eranian, jolsa, ming.m.lin; +Cc: linux-kernel

From: "Yan, Zheng" <zheng.z.yan@intel.com>
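
The pmu event alias support added later in this series re-enters the
event parser while expanding an alias, so the flex/bison state can no
longer live in globals. With a reentrant scanner all state hangs off
an explicitly allocated scanner object; the calling convention this
patch switches to looks roughly like (a sketch of the flex-generated
API, error handling omitted):

	void *scanner;
	YY_BUFFER_STATE buffer;
	int ret;

	parse_events_lex_init(&scanner);	/* allocate scanner state */
	buffer = parse_events__scan_string(str, scanner);
	ret = parse_events_parse(list, list_tmp, idx, scanner);
	parse_events__flush_buffer(buffer, scanner);
	parse_events__delete_buffer(buffer, scanner);
	parse_events_lex_destroy(scanner);	/* free scanner state */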

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
---
 tools/perf/Makefile            |    7 ++-
 tools/perf/util/parse-events.c |   32 ++++++++++----
 tools/perf/util/parse-events.h |    2 +-
 tools/perf/util/parse-events.l |   92 ++++++++++++++++++++-------------------
 tools/perf/util/parse-events.y |    9 +++-
 5 files changed, 83 insertions(+), 59 deletions(-)

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 7055a00..2d59ca1 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -725,14 +725,17 @@ $(OUTPUT)perf.o perf.spec \
 .SUFFIXES:
 .SUFFIXES: .o .c .S .s
 
+# Remove -Wextra to suppress 'unused parameter' warning
+ALL_CFLAGS_NO_WEXTRA = $(shell echo $(ALL_CFLAGS) | sed "s/-Wextra //g")
+
 # These two need to be here so that when O= is not used they take precedence
 # over the general rule for .o
 
 $(OUTPUT)util/%-flex.o: $(OUTPUT)util/%-flex.c $(OUTPUT)PERF-CFLAGS
-	$(QUIET_CC)$(CC) -o $@ -c $(ALL_CFLAGS) -Iutil/ -Wno-redundant-decls -Wno-switch-default -Wno-unused-function $<
+	$(QUIET_CC)$(CC) -o $@ -c $(ALL_CFLAGS_NO_WEXTRA) -Iutil/ -Wno-redundant-decls -Wno-switch-default -Wno-unused-function $<
 
 $(OUTPUT)util/%-bison.o: $(OUTPUT)util/%-bison.c $(OUTPUT)PERF-CFLAGS
-	$(QUIET_CC)$(CC) -o $@ -c $(ALL_CFLAGS) -DYYENABLE_NLS=0 -DYYLTYPE_IS_TRIVIAL=0 -Iutil/ -Wno-redundant-decls -Wno-switch-default -Wno-unused-function $<
+	$(QUIET_CC)$(CC) -o $@ -c $(ALL_CFLAGS_NO_WEXTRA) -DYYENABLE_NLS=0 -DYYLTYPE_IS_TRIVIAL=0 -Iutil/ -Wno-redundant-decls -Wno-switch-default -Wno-unused-function $<
 
 $(OUTPUT)%.o: %.c $(OUTPUT)PERF-CFLAGS
 	$(QUIET_CC)$(CC) -o $@ -c $(ALL_CFLAGS) $<
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 5b3a0ef..c587ae8 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -11,6 +11,7 @@
 #include "cache.h"
 #include "header.h"
 #include "debugfs.h"
+#include "parse-events-bison.h"
 #include "parse-events-flex.h"
 #include "pmu.h"
 
@@ -24,7 +25,8 @@ struct event_symbol {
 };
 
 int parse_events_parse(struct list_head *list, struct list_head *list_tmp,
-		       int *idx);
+		       int *idx, void *scanner);
+static int __parse_events(const char *str, int *idx, struct list_head *list);
 
 #define CHW(x) .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_##x
 #define CSW(x) .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_##x
@@ -747,20 +749,34 @@ int parse_events_modifier(struct list_head *list, char *str)
 	return 0;
 }
 
-int parse_events(struct perf_evlist *evlist, const char *str, int unset __used)
+static int __parse_events(const char *str, int *idx, struct list_head *list)
 {
-	LIST_HEAD(list);
 	LIST_HEAD(list_tmp);
 	YY_BUFFER_STATE buffer;
-	int ret, idx = evlist->nr_entries;
+	void *scanner;
+	int ret;
+
+	ret = parse_events_lex_init(&scanner);
+	if (ret)
+		return ret;
+
+	buffer = parse_events__scan_string(str, scanner);
 
-	buffer = parse_events__scan_string(str);
+	ret = parse_events_parse(list, &list_tmp, idx, scanner);
 
-	ret = parse_events_parse(&list, &list_tmp, &idx);
+	parse_events__flush_buffer(buffer, scanner);
+	parse_events__delete_buffer(buffer, scanner);
+	parse_events_lex_destroy(scanner);
 
-	parse_events__flush_buffer(buffer);
-	parse_events__delete_buffer(buffer);
+	return ret;
+}
+
+int parse_events(struct perf_evlist *evlist, const char *str, int unset __used)
+{
+	LIST_HEAD(list);
+	int ret, idx = evlist->nr_entries;
 
+	ret = __parse_events(str, &idx, &list);
 	if (!ret) {
 		int entries = idx - evlist->nr_entries;
 		perf_evlist__splice_list_tail(evlist, &list, entries);
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index ca069f8..e1ffeb7 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -80,7 +80,7 @@ void parse_events_update_lists(struct list_head *list_event,
 			       struct list_head *list_all);
 void parse_events_error(struct list_head *list_all,
 			struct list_head *list_event,
-			int *idx, char const *msg);
+			int *idx, void *scanner, char const *msg);
 
 void print_events(const char *event_glob);
 void print_events_type(u8 type);
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 1fcf1bb..eec423c 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -1,4 +1,6 @@
 
+%option reentrant
+%option bison-bridge
 %option prefix="parse_events_"
 
 %{
@@ -7,7 +9,7 @@
 #include "parse-events-bison.h"
 #include "parse-events.h"
 
-static int __value(char *str, int base, int token)
+static int __value(YYSTYPE *yylval, char *str, int base, int token)
 {
 	long num;
 
@@ -16,35 +18,35 @@ static int __value(char *str, int base, int token)
 	if (errno)
 		return PE_ERROR;
 
-	parse_events_lval.num = num;
+	yylval->num = num;
 	return token;
 }
 
-static int value(int base)
+static int value(YYSTYPE *yylval, char *text, int base)
 {
-	return __value(parse_events_text, base, PE_VALUE);
+	return __value(yylval, text, base, PE_VALUE);
 }
 
-static int raw(void)
+static int raw(YYSTYPE *yylval, char *text)
 {
-	return __value(parse_events_text + 1, 16, PE_RAW);
+	return __value(yylval, text + 1, 16, PE_RAW);
 }
 
-static int str(int token)
+static int str(YYSTYPE *yylval, char *text, int token)
 {
-	parse_events_lval.str = strdup(parse_events_text);
+	yylval->str = strdup(text);
 	return token;
 }
 
-static int sym(int type, int config)
+static int sym(YYSTYPE *yylval, int type, int config)
 {
-	parse_events_lval.num = (type << 16) + config;
+	yylval->num = (type << 16) + config;
 	return PE_VALUE_SYM;
 }
 
-static int term(int type)
+static int term(YYSTYPE *yylval, int type)
 {
-	parse_events_lval.num = type;
+	yylval->num = type;
 	return PE_TERM;
 }
 
@@ -58,25 +60,25 @@ modifier_event	[ukhpGH]{1,8}
 modifier_bp	[rwx]
 
 %%
-cpu-cycles|cycles				{ return sym(PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES); }
-stalled-cycles-frontend|idle-cycles-frontend	{ return sym(PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
-stalled-cycles-backend|idle-cycles-backend	{ return sym(PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
-instructions					{ return sym(PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS); }
-cache-references				{ return sym(PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_REFERENCES); }
-cache-misses					{ return sym(PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES); }
-branch-instructions|branches			{ return sym(PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
-branch-misses					{ return sym(PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_MISSES); }
-bus-cycles					{ return sym(PERF_TYPE_HARDWARE, PERF_COUNT_HW_BUS_CYCLES); }
-ref-cycles					{ return sym(PERF_TYPE_HARDWARE, PERF_COUNT_HW_REF_CPU_CYCLES); }
-cpu-clock					{ return sym(PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_CLOCK); }
-task-clock					{ return sym(PERF_TYPE_SOFTWARE, PERF_COUNT_SW_TASK_CLOCK); }
-page-faults|faults				{ return sym(PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS); }
-minor-faults					{ return sym(PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS_MIN); }
-major-faults					{ return sym(PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS_MAJ); }
-context-switches|cs				{ return sym(PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CONTEXT_SWITCHES); }
-cpu-migrations|migrations			{ return sym(PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_MIGRATIONS); }
-alignment-faults				{ return sym(PERF_TYPE_SOFTWARE, PERF_COUNT_SW_ALIGNMENT_FAULTS); }
-emulation-faults				{ return sym(PERF_TYPE_SOFTWARE, PERF_COUNT_SW_EMULATION_FAULTS); }
+cpu-cycles|cycles				{ return sym(yylval, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES); }
+stalled-cycles-frontend|idle-cycles-frontend	{ return sym(yylval, PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
+stalled-cycles-backend|idle-cycles-backend	{ return sym(yylval, PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
+instructions					{ return sym(yylval, PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS); }
+cache-references				{ return sym(yylval, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_REFERENCES); }
+cache-misses					{ return sym(yylval, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES); }
+branch-instructions|branches			{ return sym(yylval, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
+branch-misses					{ return sym(yylval, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_MISSES); }
+bus-cycles					{ return sym(yylval, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BUS_CYCLES); }
+ref-cycles					{ return sym(yylval, PERF_TYPE_HARDWARE, PERF_COUNT_HW_REF_CPU_CYCLES); }
+cpu-clock					{ return sym(yylval, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_CLOCK); }
+task-clock					{ return sym(yylval, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_TASK_CLOCK); }
+page-faults|faults				{ return sym(yylval, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS); }
+minor-faults					{ return sym(yylval, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS_MIN); }
+major-faults					{ return sym(yylval, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS_MAJ); }
+context-switches|cs				{ return sym(yylval, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CONTEXT_SWITCHES); }
+cpu-migrations|migrations			{ return sym(yylval, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_MIGRATIONS); }
+alignment-faults				{ return sym(yylval, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_ALIGNMENT_FAULTS); }
+emulation-faults				{ return sym(yylval, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_EMULATION_FAULTS); }
 
 L1-dcache|l1-d|l1d|L1-data		|
 L1-icache|l1-i|l1i|L1-instruction	|
@@ -84,14 +86,14 @@ LLC|L2					|
 dTLB|d-tlb|Data-TLB			|
 iTLB|i-tlb|Instruction-TLB		|
 branch|branches|bpu|btb|bpc		|
-node					{ return str(PE_NAME_CACHE_TYPE); }
+node					{ return str(yylval, yytext, PE_NAME_CACHE_TYPE); }
 
 load|loads|read				|
 store|stores|write			|
 prefetch|prefetches			|
 speculative-read|speculative-load	|
 refs|Reference|ops|access		|
-misses|miss				{ return str(PE_NAME_CACHE_OP_RESULT); }
+misses|miss				{ return str(yylval, yytext, PE_NAME_CACHE_OP_RESULT); }
 
 	/*
 	 * These are event config hardcoded term names to be specified
@@ -99,20 +101,20 @@ misses|miss				{ return str(PE_NAME_CACHE_OP_RESULT); }
 	 * so we can put them here directly. In case the we have a conflict
 	 * in future, this needs to go into '//' condition block.
 	 */
-config			{ return term(PARSE_EVENTS__TERM_TYPE_CONFIG); }
-config1			{ return term(PARSE_EVENTS__TERM_TYPE_CONFIG1); }
-config2			{ return term(PARSE_EVENTS__TERM_TYPE_CONFIG2); }
-period			{ return term(PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD); }
-branch_type		{ return term(PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE); }
+config			{ return term(yylval, PARSE_EVENTS__TERM_TYPE_CONFIG); }
+config1			{ return term(yylval, PARSE_EVENTS__TERM_TYPE_CONFIG1); }
+config2			{ return term(yylval, PARSE_EVENTS__TERM_TYPE_CONFIG2); }
+period			{ return term(yylval, PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD); }
+branch_type		{ return term(yylval, PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE); }
 
 mem:			{ return PE_PREFIX_MEM; }
-r{num_raw_hex}		{ return raw(); }
-{num_dec}		{ return value(10); }
-{num_hex}		{ return value(16); }
+r{num_raw_hex}		{ return raw(yylval, yytext); }
+{num_dec}		{ return value(yylval, yytext, 10); }
+{num_hex}		{ return value(yylval, yytext, 16); }
 
-{modifier_event}	{ return str(PE_MODIFIER_EVENT); }
-{modifier_bp}		{ return str(PE_MODIFIER_BP); }
-{name}			{ return str(PE_NAME); }
+{modifier_event}	{ return str(yylval, yytext, PE_MODIFIER_EVENT); }
+{modifier_bp}		{ return str(yylval, yytext, PE_MODIFIER_BP); }
+{name}			{ return str(yylval, yytext, PE_NAME); }
 "/"			{ return '/'; }
 -			{ return '-'; }
 ,			{ return ','; }
@@ -121,7 +123,7 @@ r{num_raw_hex}		{ return raw(); }
 
 %%
 
-int parse_events_wrap(void)
+int parse_events_wrap(void *scanner __used)
 {
 	return 1;
 }
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index d9637da..52082a7 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -1,8 +1,10 @@
-
+%pure-parser
 %name-prefix "parse_events_"
 %parse-param {struct list_head *list_all}
 %parse-param {struct list_head *list_event}
 %parse-param {int *idx}
+%parse-param {void *scanner}
+%lex-param {void* scanner}
 
 %{
 
@@ -13,8 +15,9 @@
 #include "types.h"
 #include "util.h"
 #include "parse-events.h"
+#include "parse-events-bison.h"
 
-extern int parse_events_lex (void);
+extern int parse_events_lex (YYSTYPE* lvalp, void* scanner);
 
 #define ABORT_ON(val) \
 do { \
@@ -223,7 +226,7 @@ sep_slash_dc: '/' | ':' |
 
 void parse_events_error(struct list_head *list_all __used,
 			struct list_head *list_event __used,
-			int *idx __used,
+			int *idx __used, void *scanner __used,
 			char const *msg __used)
 {
 }
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 9/9] perf tool: Add pmu event alias support
  2012-05-02  2:07 [PATCH V3 0/9] perf: Intel uncore pmu counting support Yan, Zheng
                   ` (7 preceding siblings ...)
  2012-05-02  2:07 ` [PATCH 8/9] perf tool: Make the event parser reentrantable Yan, Zheng
@ 2012-05-02  2:07 ` Yan, Zheng
  2012-05-03 10:56   ` Jiri Olsa
  8 siblings, 1 reply; 38+ messages in thread
From: Yan, Zheng @ 2012-05-02  2:07 UTC (permalink / raw)
  To: a.p.zijlstra, mingo, andi, eranian, jolsa, ming.m.lin; +Cc: linux-kernel

From: "Yan, Zheng" <zheng.z.yan@intel.com>

The definition of a pmu event alias is located at:
  ${sysfs_mount}/bus/event_source/devices/${pmu}/events/

Each file in the 'events' directory defines an event alias. Its content
looks like:
  config=1,config1=2

Using a pmu event alias, an event can now be specified like:
  uncore/CLOCKTICKS/
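
Internally the alias is expanded back into an ordinary pmu event
string and fed to the parser again. A rough sketch of that expansion
(the helper name is illustrative, error handling omitted):

/* needs <stdio.h>, <stdlib.h>, <string.h> */
/* "uncore" + "config=0xffff" -> "uncore/config=0xffff/" */
static char *expand_alias(const char *pmu_name, const char *config)
{
	/* two slashes plus the terminating NUL -> + 3 */
	char *str = malloc(strlen(pmu_name) + strlen(config) + 3);

	if (str)
		sprintf(str, "%s/%s/", pmu_name, config);
	return str;	/* reparsed like any other pmu event */
}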

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
---
 tools/perf/util/parse-events.c |   24 ++++++++-
 tools/perf/util/parse-events.y |    2 +-
 tools/perf/util/pmu.c          |  117 ++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/pmu.h          |   10 +++-
 4 files changed, 149 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index c587ae8..764b2c31 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -653,8 +653,12 @@ int parse_events_add_numeric(struct list_head *list, int *idx,
 int parse_events_add_pmu(struct list_head *list, int *idx,
 			 char *name, struct list_head *head_config)
 {
+	LIST_HEAD(event);
 	struct perf_event_attr attr;
 	struct perf_pmu *pmu;
+	const char *config;
+	char *str;
+	int ret;
 
 	pmu = perf_pmu__find(name);
 	if (!pmu)
@@ -668,10 +672,26 @@ int parse_events_add_pmu(struct list_head *list, int *idx,
 	 */
 	config_attr(&attr, head_config, 0);
 
-	if (perf_pmu__config(pmu, &attr, head_config))
+	ret = perf_pmu__config(pmu, &attr, head_config);
+	if (!ret)
+		return add_event(list, idx, &attr, (char *) "pmu");
+
+	config = perf_pmu__alias(pmu, head_config);
+	if (!config)
 		return -EINVAL;
 
-	return add_event(list, idx, &attr, (char *) "pmu");
+	str = malloc(strlen(pmu->name) + strlen(config) + 3);
+	if (!str)
+		return -ENOMEM;
+
+	sprintf(str, "%s/%s/", pmu->name, config);
+	ret = __parse_events(str, idx, &event);
+	free(str);
+	if (ret)
+		return ret;
+
+	list_splice_tail(&event, list);
+	return 0;
 }
 
 void parse_events_update_lists(struct list_head *list_event,
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 52082a7..8a26f3d 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -197,7 +197,7 @@ PE_NAME
 {
 	struct parse_events__term *term;
 
-	ABORT_ON(parse_events__new_term(&term, PARSE_EVENTS__TERM_TYPE_NUM,
+	ABORT_ON(parse_events__new_term(&term, PARSE_EVENTS__TERM_TYPE_STR,
 		 $1, NULL, 1));
 	$$ = term;
 }
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index cb08a11..13dde6c 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -80,6 +80,89 @@ static int pmu_format(char *name, struct list_head *format)
 	return 0;
 }
 
+static int perf_pmu__new_alias(struct list_head *list, char *name, FILE *file)
+{
+	struct perf_pmu__alias *alias;
+	char buf[256];
+	int ret;
+
+	ret = fread(buf, 1, sizeof(buf), file);
+	if (ret == 0)
+		return -EINVAL;
+
+	alias = zalloc(sizeof(*alias));
+	if (!alias)
+		return -ENOMEM;
+
+	alias->name = strdup(name);
+	alias->config = strndup(buf, ret);
+
+	list_add_tail(&alias->list, list);
+	return 0;
+}
+
+/*
+ * Process all the sysfs attributes located under the directory
+ * specified in the 'dir' parameter.
+ */
+static int pmu_aliases_parse(char *dir, struct list_head *head)
+{
+	struct dirent *evt_ent;
+	DIR *event_dir;
+	int ret = 0;
+
+	event_dir = opendir(dir);
+	if (!event_dir)
+		return -EINVAL;
+
+	while (!ret && (evt_ent = readdir(event_dir))) {
+		char path[PATH_MAX];
+		char *name = evt_ent->d_name;
+		FILE *file;
+
+		if (!strcmp(name, ".") || !strcmp(name, ".."))
+			continue;
+
+		snprintf(path, PATH_MAX, "%s/%s", dir, name);
+
+		ret = -EINVAL;
+		file = fopen(path, "r");
+		if (!file)
+			break;
+		ret = perf_pmu__new_alias(head, name, file);
+		fclose(file);
+	}
+
+	closedir(event_dir);
+	return ret;
+}
+
+/*
+ * Reading the pmu event alias definitions, which should be located at:
+ * /sys/bus/event_source/devices/<dev>/events as sysfs group attributes.
+ */
+static int pmu_aliases(char *name, struct list_head *aliases)
+{
+	struct stat st;
+	char path[PATH_MAX];
+	const char *sysfs;
+
+	sysfs = sysfs_find_mountpoint();
+	if (!sysfs)
+		return -1;
+
+	snprintf(path, PATH_MAX,
+		 "%s/bus/event_source/devices/%s/events", sysfs, name);
+
+	if (stat(path, &st) < 0)
+		return -1;
+
+	if (pmu_aliases_parse(path, aliases))
+		return -1;
+
+	return 0;
+}
+
 /*
  * Reading/parsing the default pmu type value, which should be
  * located at:
@@ -118,6 +201,7 @@ static struct perf_pmu *pmu_lookup(char *name)
 {
 	struct perf_pmu *pmu;
 	LIST_HEAD(format);
+	LIST_HEAD(aliases);
 	__u32 type;
 
 	/*
@@ -135,8 +219,12 @@ static struct perf_pmu *pmu_lookup(char *name)
 	if (!pmu)
 		return NULL;
 
+	pmu_aliases(name, &aliases);
+
 	INIT_LIST_HEAD(&pmu->format);
+	INIT_LIST_HEAD(&pmu->aliases);
 	list_splice(&format, &pmu->format);
+	list_splice(&aliases, &pmu->aliases);
 	pmu->name = strdup(name);
 	pmu->type = type;
 	return pmu;
@@ -262,6 +350,18 @@ static int pmu_config(struct list_head *formats, struct perf_event_attr *attr,
 	return 0;
 }
 
+static struct perf_pmu__alias *pmu_find_alias(struct list_head *events,
+					      char *name)
+{
+	struct perf_pmu__alias *alias;
+
+	list_for_each_entry(alias, events, list) {
+		if (!strcmp(alias->name, name))
+			return alias;
+	}
+	return NULL;
+}
+
 /*
  * Configures event's 'attr' parameter based on the:
  * 1) users input - specified in terms parameter
@@ -274,6 +374,23 @@ int perf_pmu__config(struct perf_pmu *pmu, struct perf_event_attr *attr,
 	return pmu_config(&pmu->format, attr, head_terms);
 }
 
+const char *perf_pmu__alias(struct perf_pmu *pmu, struct list_head *head_terms)
+{
+	struct parse_events__term *term;
+	struct perf_pmu__alias *alias;
+
+	term = list_entry(head_terms->next, struct parse_events__term, list);
+
+	if (term->type != PARSE_EVENTS__TERM_TYPE_STR || term->val.str)
+		return NULL;
+
+	alias = pmu_find_alias(&pmu->aliases, term->config);
+	if (!alias)
+		return NULL;
+
+	return alias->config;
+}
+
 int perf_pmu__new_format(struct list_head *list, char *name,
 			 int config, unsigned long *bits)
 {
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 68c0db9..7a100fe 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -19,17 +19,25 @@ struct perf_pmu__format {
 	struct list_head list;
 };
 
+struct perf_pmu__alias {
+	char *name;
+	char *config;
+	struct list_head list;
+};
+
 struct perf_pmu {
 	char *name;
 	__u32 type;
 	struct list_head format;
+	struct list_head aliases;
 	struct list_head list;
 };
 
 struct perf_pmu *perf_pmu__find(char *name);
 int perf_pmu__config(struct perf_pmu *pmu, struct perf_event_attr *attr,
 		     struct list_head *head_terms);
-
+const char *perf_pmu__alias(struct perf_pmu *pmu,
+			    struct list_head *head_terms);
 int perf_pmu_wrap(void);
 void perf_pmu_error(struct list_head *list, char *name, char const *msg);
 
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 9/9] perf tool: Add pmu event alias support
  2012-05-02  2:07 ` [PATCH 9/9] perf tool: Add pmu event alias support Yan, Zheng
@ 2012-05-03 10:56   ` Jiri Olsa
  2012-05-03 11:24     ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Jiri Olsa @ 2012-05-03 10:56 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: a.p.zijlstra, mingo, andi, eranian, ming.m.lin, linux-kernel

On Wed, May 02, 2012 at 10:07:20AM +0800, Yan, Zheng wrote:
> From: "Yan, Zheng" <zheng.z.yan@intel.com>
> 
> The definition of a pmu event alias is located at:
>   ${sysfs_mount}/bus/event_source/devices/${pmu}/events/
> 
> Each file in the 'events' directory defines an event alias. Its content
> looks like:
>   config=1,config1=2
> 
> Using a pmu event alias, an event can now be specified like:
>   uncore/CLOCKTICKS/
> 
> Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
> ---
>  tools/perf/util/parse-events.c |   24 ++++++++-
>  tools/perf/util/parse-events.y |    2 +-
>  tools/perf/util/pmu.c          |  117 ++++++++++++++++++++++++++++++++++++++++
>  tools/perf/util/pmu.h          |   10 +++-
>  4 files changed, 149 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index c587ae8..764b2c31 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -653,8 +653,12 @@ int parse_events_add_numeric(struct list_head *list, int *idx,
>  int parse_events_add_pmu(struct list_head *list, int *idx,
>  			 char *name, struct list_head *head_config)
>  {
> +	LIST_HEAD(event);
>  	struct perf_event_attr attr;
>  	struct perf_pmu *pmu;
> +	const char *config;
> +	char *str;
> +	int ret;
>  
>  	pmu = perf_pmu__find(name);
>  	if (!pmu)
> @@ -668,10 +672,26 @@ int parse_events_add_pmu(struct list_head *list, int *idx,
>  	 */
>  	config_attr(&attr, head_config, 0);
>  
> -	if (perf_pmu__config(pmu, &attr, head_config))
> +	ret = perf_pmu__config(pmu, &attr, head_config);
> +	if (!ret)
> +		return add_event(list, idx, &attr, (char *) "pmu");
> +
> +	config = perf_pmu__alias(pmu, head_config);
> +	if (!config)
>  		return -EINVAL;

hi,
could we have the interface take the string only:

	config = perf_pmu__alias(pmu, alias);

and AFAICS check that there's only a single term and it's a string,
then check for the alias


I've got an idea for another approach that would not need a reentrant
parser and might fit better with the sysfs one-value-per-file rule

- in sysfs you would have directory with aliases (now called 'events')
- each alias is sysfs dir, with file attrs:
	file name = term name, file value = term value
  eg.:
	events/
		CAS_COUNT_RD/
			# files:
			config  - value 1
			config1 - value 2
			mask	- value ...

- on init you read all aliases and load their terms,
  so each alias is defined by a list of terms
- in parse_events_add_pmu, before you run perf_pmu__config,
  you check if any term matches a defined alias
  and replace that term with all the terms defined for the alias
  (rough sketch below)
- run perf_pmu__config with the new set of terms..

this way it's also possible to add extra terms to an existing alias
on the command line if needed... might be handy
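
the replace step could look something like this (sketch only; it
assumes each alias carries a parsed term list, which is the point of
the proposal, and a real version would copy the terms instead of
splicing the shared list):

static void pmu_expand_aliases(struct perf_pmu *pmu,
			       struct list_head *head_terms)
{
	struct parse_events__term *term, *tmp;
	struct perf_pmu__alias *alias;

	list_for_each_entry_safe(term, tmp, head_terms, list) {
		alias = pmu_find_alias(&pmu->aliases, term->config);
		if (!alias)
			continue;
		/* ->terms is hypothetical: the alias pre-parsed
		 * into a term list on init */
		list_splice(&alias->terms, &term->list);
		/* drop the alias-name term itself */
		list_del(&term->list);
	}
}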

thoughts?
jirka

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 9/9] perf tool: Add pmu event alias support
  2012-05-03 10:56   ` Jiri Olsa
@ 2012-05-03 11:24     ` Peter Zijlstra
  2012-05-03 20:05       ` Jiri Olsa
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2012-05-03 11:24 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Yan, Zheng, mingo, andi, eranian, ming.m.lin, linux-kernel

On Thu, 2012-05-03 at 12:56 +0200, Jiri Olsa wrote:
> - in sysfs you would have directory with aliases (now called 'events')
> - each alias is sysfs dir, with file attrs:
>         file name = term name, file value = term value
>   eg.:
>         events/
>                 CAS_COUNT_RD/
>                         # files:
>                         config  - value 1
>                         config1 - value 2
>                         mask    - value ... 

I'd prefer the thing Yan proposed (if the sysfs folks let us),

$foo/events/QHL_REQUEST_REMOTE_READS

with contents: "event=0x20,umask=0x4"

> this way it's also possible to add extra terms to an existing alias
> on the command line if needed... might be handy
> 
That should always be possible, if you modify the parser to take things
like:

  event=0x20,umask=0x4,event=0x21

and have latter values override earlier values, so it collapses into:

  umask=0x4,event=0x21

you can simply take whatever comes out of the event file and stick extra
bits at the end.
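
Roughly, over the parsed term list (sketch only; field names as in
the perf term struct, and a NULL ->config means a hardcoded term
identified by its type alone):

static void collapse_terms(struct list_head *terms)
{
	struct parse_events__term *a, *tmp, *b;

	list_for_each_entry_safe(a, tmp, terms, list) {
		b = a;
		list_for_each_entry_continue(b, terms, list) {
			if (a->type != b->type)
				continue;
			if (a->config && b->config &&
			    strcmp(a->config, b->config))
				continue;
			/* a later term with the same name exists:
			 * drop the earlier one, the last value wins */
			list_del(&a->list);
			free(a);
			break;
		}
	}
}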

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/9] perf: Generic intel uncore support
  2012-05-02  2:07 ` [PATCH 4/9] perf: Generic intel uncore support Yan, Zheng
@ 2012-05-03 17:12   ` Peter Zijlstra
  2012-05-04  7:33     ` Yan, Zheng
  2012-05-10  7:34     ` Yan, Zheng
  2012-05-03 21:49   ` Peter Zijlstra
  2012-05-11  6:31   ` Anshuman Khandual
  2 siblings, 2 replies; 38+ messages in thread
From: Peter Zijlstra @ 2012-05-03 17:12 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: mingo, andi, eranian, jolsa, ming.m.lin, linux-kernel

On Wed, 2012-05-02 at 10:07 +0800, Yan, Zheng wrote:
> +static struct intel_uncore_box *
> +__uncore_pmu_find_box(struct intel_uncore_pmu *pmu, int phyid)
> +{
> +       struct intel_uncore_box *box;
> +       struct hlist_head *head;
> +       struct hlist_node *node;
> +
> +       head = &pmu->box_hash[phyid % UNCORE_BOX_HASH_SIZE];
> +       hlist_for_each_entry_rcu(box, node, head, hlist) {
> +               if (box->phy_id == phyid)
> +                       return box;
> +       }
> +
> +       return NULL;
> +} 

I still don't get why something like:

static struct intel_uncore_box *
pmu_to_box(struct intel_uncore_pmu *pmu, int cpu)
{
	return *per_cpu_ptr(pmu->box, cpu);
}

doesn't work.

Last time you mumbled something about PCI devices, but afaict those are
in all respects identical to MSR devices except you talk to them using
PCI-mmio instead of MSR registers.

In fact, since it's all local to the generic code there's nothing
different between pci/msr already.

So how about something like this:

---
 Makefile                  |    4 +-
 perf_event_intel_uncore.c |   92 ++++++++++++++++++----------------------------
 perf_event_intel_uncore.h |    4 +-
 3 files changed, 42 insertions(+), 58 deletions(-)

--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -32,7 +32,9 @@ obj-$(CONFIG_PERF_EVENTS)		+= perf_event
 
 ifdef CONFIG_PERF_EVENTS
 obj-$(CONFIG_CPU_SUP_AMD)		+= perf_event_amd.o
-obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_p6.o perf_event_p4.o perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o perf_event_intel_uncore.o
+obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_p6.o perf_event_p4.o 
+obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o 
+obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_uncore.o
 endif
 
 obj-$(CONFIG_X86_MCE)			+= mcheck/
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -116,40 +116,21 @@ struct intel_uncore_box *uncore_alloc_bo
 }
 
 static struct intel_uncore_box *
-__uncore_pmu_find_box(struct intel_uncore_pmu *pmu, int phyid)
+uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu)
 {
-	struct intel_uncore_box *box;
-	struct hlist_head *head;
-	struct hlist_node *node;
-
-	head = &pmu->box_hash[phyid % UNCORE_BOX_HASH_SIZE];
-	hlist_for_each_entry_rcu(box, node, head, hlist) {
-		if (box->phy_id == phyid)
-			return box;
-	}
-
-	return NULL;
-}
-
-static struct intel_uncore_box *
-uncore_pmu_find_box(struct intel_uncore_pmu *pmu, int phyid)
-{
-	struct intel_uncore_box *box;
-
-	rcu_read_lock();
-	box = __uncore_pmu_find_box(pmu, phyid);
-	rcu_read_unlock();
-
-	return box;
+	return *per_cpu_ptr(pmu->box, cpu);
 }
 
 static void uncore_pmu_add_box(struct intel_uncore_pmu *pmu,
 				struct intel_uncore_box *box)
 {
-	struct hlist_head *head;
+	int cpu;
 
-	head = &pmu->box_hash[box->phy_id % UNCORE_BOX_HASH_SIZE];
-	hlist_add_head_rcu(&box->hlist, head);
+	for_each_possible_cpu(cpu) {
+		if (box->phys_id != topology_physical_package_id(cpu))
+			continue;
+		*per_cpu_ptr(pmu->box, cpu) = box;
+	}
 }
 
 static struct intel_uncore_pmu *uncore_event_to_pmu(struct perf_event *event)
@@ -163,8 +144,7 @@ static struct intel_uncore_box *uncore_e
 	 * perf core schedules event on the basis of cpu, uncore events are
 	 * collected by one of the cpus inside a physical package.
 	 */
-	int phyid = topology_physical_package_id(smp_processor_id());
-	return uncore_pmu_find_box(uncore_event_to_pmu(event), phyid);
+	return uncore_pmu_to_box(uncore_event_to_pmu(event), smp_processor_id());
 }
 
 static int uncore_collect_events(struct intel_uncore_box *box,
@@ -478,8 +458,7 @@ int uncore_pmu_event_init(struct perf_ev
 	 */
 	if (event->cpu < 0)
 		return -EINVAL;
-	box = uncore_pmu_find_box(pmu,
-			topology_physical_package_id(event->cpu));
+	box = uncore_pmu_to_box(pmu, event->cpu);
 	if (!box || box->cpu < 0)
 		return -EINVAL;
 	event->cpu = box->cpu;
@@ -541,7 +520,11 @@ static int __init uncore_pmu_register(st
 
 static void __init uncore_type_exit(struct intel_uncore_type *type)
 {
+	int i;
+
 	kfree(type->attr_groups[1]);
+	for (i = 0; i < type->num_boxes; i++)
+		free_percpu(type->pmus[i].box);
 	kfree(type->pmus);
 	type->attr_groups[1] = NULL;
 	type->pmus = NULL;
@@ -566,9 +549,9 @@ static int __init uncore_type_init(struc
 		pmus[i].func_id = -1;
 		pmus[i].pmu_idx = i;
 		pmus[i].type = type;
-
-		for (j = 0; j < ARRAY_SIZE(pmus[0].box_hash); j++)
-			INIT_HLIST_HEAD(&pmus[i].box_hash[j]);
+		pmus[i].box = alloc_percpu(struct intel_uncore_box *);
+		if (!pmus[i].box)
+			goto fail_percpu;
 	}
 
 	if (type->event_descs) {
@@ -591,6 +574,11 @@ static int __init uncore_type_init(struc
 
 	type->pmus = pmus;
 	return 0;
+
+fail_percpu:
+	for (i = 0; i < type->num_boxes; i++)
+		free_percpu(pmus[i].box);
+
 fail:
 	uncore_type_exit(type);
 	return -ENOMEM;
@@ -617,15 +605,13 @@ static void __cpuinit uncore_cpu_dying(i
 	struct intel_uncore_type *type;
 	struct intel_uncore_pmu *pmu;
 	struct intel_uncore_box *box;
-	int i, j, phyid;
-
-	phyid = topology_physical_package_id(cpu);
+	int i, j;
 
 	for (i = 0; msr_uncores[i]; i++) {
 		type = msr_uncores[i];
 		for (j = 0; j < type->num_boxes; j++) {
 			pmu = &type->pmus[j];
-			box = uncore_pmu_find_box(pmu, phyid);
+			box = uncore_pmu_to_box(pmu, cpu);
 			if (box && --box->refcnt == 0) {
 				hlist_del_rcu(&box->hlist);
 				kfree_rcu(box, rcu_head);
@@ -639,15 +625,13 @@ static int __cpuinit uncore_cpu_starting
 	struct intel_uncore_type *type;
 	struct intel_uncore_pmu *pmu;
 	struct intel_uncore_box *box;
-	int i, j, phyid;
-
-	phyid = topology_physical_package_id(cpu);
+	int i, j;
 
 	for (i = 0; msr_uncores[i]; i++) {
 		type = msr_uncores[i];
 		for (j = 0; j < type->num_boxes; j++) {
 			pmu = &type->pmus[j];
-			box = uncore_pmu_find_box(pmu, phyid);
+			box = uncore_pmu_to_box(pmu, cpu);
 			if (box)
 				uncore_box_init(box);
 		}
@@ -660,9 +644,7 @@ static int __cpuinit uncore_cpu_prepare(
 	struct intel_uncore_type *type;
 	struct intel_uncore_pmu *pmu;
 	struct intel_uncore_box *exist, *box;
-	int i, j, phyid;
-
-	phyid = topology_physical_package_id(cpu);
+	int i, j;
 
 	/* allocate the box data structure */
 	for (i = 0; msr_uncores[i]; i++) {
@@ -673,7 +655,7 @@ static int __cpuinit uncore_cpu_prepare(
 
 			if (pmu->func_id < 0)
 				pmu->func_id = j;
-			exist = uncore_pmu_find_box(pmu, phyid);
+			exist = uncore_pmu_to_box(pmu, cpu);
 			if (exist)
 				exist->refcnt++;
 			if (exist)
@@ -684,7 +666,7 @@ static int __cpuinit uncore_cpu_prepare(
 				return -ENOMEM;
 
 			box->pmu = pmu;
-			box->phy_id = phyid;
+			box->phys_id = topology_physical_package_id(cpu);
 			uncore_pmu_add_box(pmu, box);
 		}
 	}
@@ -696,19 +678,19 @@ static void __cpuinit uncore_event_exit_
 	struct intel_uncore_type *type;
 	struct intel_uncore_pmu *pmu;
 	struct intel_uncore_box *box;
-	int i, j, phyid, target;
+	int i, j, phys_id, target;
 
 	/* if exiting cpu is used for collecting uncore events */
 	if (!cpumask_test_and_clear_cpu(cpu, &uncore_cpu_mask))
 		return;
 
 	/* find a new cpu to collect uncore events */
-	phyid = topology_physical_package_id(cpu);
+	phys_id = topology_physical_package_id(cpu);
 	target = -1;
 	for_each_online_cpu(i) {
 		if (i == cpu)
 			continue;
-		if (phyid == topology_physical_package_id(i)) {
+		if (phys_id == topology_physical_package_id(i)) {
 			target = i;
 			break;
 		}
@@ -722,7 +704,7 @@ static void __cpuinit uncore_event_exit_
 		type = msr_uncores[i];
 		for (j = 0; j < type->num_boxes; j++) {
 			pmu = &type->pmus[j];
-			box = uncore_pmu_find_box(pmu, phyid);
+			box = uncore_pmu_to_box(pmu, cpu);
 			WARN_ON_ONCE(box->cpu != cpu);
 
 			if (target >= 0) {
@@ -742,11 +724,11 @@ static void __cpuinit uncore_event_init_
 	struct intel_uncore_type *type;
 	struct intel_uncore_pmu *pmu;
 	struct intel_uncore_box *box;
-	int i, j, phyid;
+	int i, j, phys_id;
 
-	phyid = topology_physical_package_id(cpu);
+	phys_id = topology_physical_package_id(cpu);
 	for_each_cpu(i, &uncore_cpu_mask) {
-		if (phyid == topology_physical_package_id(i))
+		if (phys_id == topology_physical_package_id(i))
 			return;
 	}
 
@@ -756,7 +738,7 @@ static void __cpuinit uncore_event_init_
 		type = msr_uncores[i];
 		for (j = 0; j < type->num_boxes; j++) {
 			pmu = &type->pmus[j];
-			box = uncore_pmu_find_box(pmu, phyid);
+			box = uncore_pmu_to_box(pmu, cpu);
 			WARN_ON_ONCE(box->cpu != -1);
 			box->cpu = cpu;
 		}
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -59,12 +59,12 @@ struct intel_uncore_pmu {
 	int pmu_idx;
 	int func_id;
 	struct intel_uncore_type *type;
-	struct hlist_head box_hash[UNCORE_BOX_HASH_SIZE];
+	struct intel_uncore_box * __percpu *box;
 };
 
 struct intel_uncore_box {
 	struct hlist_node hlist;
-	int phy_id;
+	int phys_id;
 	int refcnt;
 	int n_active;	/* number of active events */
 	int n_events;


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 9/9] perf tool: Add pmu event alias support
  2012-05-03 11:24     ` Peter Zijlstra
@ 2012-05-03 20:05       ` Jiri Olsa
  2012-05-04 12:32         ` Yan, Zheng
                           ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Jiri Olsa @ 2012-05-03 20:05 UTC (permalink / raw)
  To: Peter Zijlstra, Yan, Zheng, mingo
  Cc: mingo, andi, eranian, ming.m.lin, linux-kernel

On Thu, May 03, 2012 at 01:24:21PM +0200, Peter Zijlstra wrote:
> On Thu, 2012-05-03 at 12:56 +0200, Jiri Olsa wrote:
> > - in sysfs you would have directory with aliases (now called 'events')
> > - each alias is sysfs dir, with file attrs:
> >         file name = term name, file value = term value
> >   eg.:
> >         events/
> >                 CAS_COUNT_RD/
> >                         # files:
> >                         config  - value 1
> >                         config1 - value 2
> >                         mask    - value ... 
> 
> I'd prefer the thing Yan proposed (if the sysfs folks let us),
> 
> $foo/events/QHL_REQUEST_REMOTE_READS
> 
> with contents: "event=0x20,umask=0x4"
> 
> > this way it's also possible to add extra terms to existing alias
> > in command line if needed... might be handy
> > 
> That should always be possible, if you modify the parser to take things
> like:
> 
>   event=0x20,umask=0x4,event=0x21
> 
> and have latter values override earlier values, so it collapses into:
> 
>   umask=0x4,event=0x21
> 
> you can simply take whatever comes out of the event file and stick extra
> bits at the end.
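
For illustration, that last-value-wins collapse could be a single pass over the
parsed term list. This is only a sketch: it assumes the parse_events__term
layout used elsewhere in this thread, with a 'config' name string per term.

/* sketch: keep only the last assignment to each config term */
static void collapse_terms(struct list_head *terms)
{
	struct parse_events__term *term, *tmp, *later;

	list_for_each_entry_safe(term, tmp, terms, list) {
		later = term;
		list_for_each_entry_continue(later, terms, list) {
			if (!strcmp(later->config, term->config)) {
				/* a later assignment overrides this one */
				list_del(&term->list);
				free(term);
				break;
			}
		}
	}
}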

I discussed this with Peter on irc, so I'll try to sum it up

we have the following options so far:

   given an event alias 'al' with the definition 'config=1,config1=1,config2=2'

1) inside the parse_events_add_pmu function
   once an alias term is detected as part of the event definition 'pmu/al/mod',
   we construct a new event 'pmu/config=1,config1=1,config2=2/mod' and rerun
   the event parser on it

2) inside the parse_events_add_pmu function
   once an alias term is detected as part of the event definition 'pmu/al/mod',
   we replace that term with the list of terms from the alias definition and
   run perf_pmu__config with this new term list

3) during bison/flex processing
   have option 2) embedded inside the flex/bison rules. Once an alias term
   is detected, insert the aliased terms directly into the list of terms,
   instead of replacing them after the fact as in option 2.


- option 1 is currently implemented
- options 2 and 3 require that the aliased config be loaded/parsed from the
  pmu sysfs tree in the form of a terms list
- option 3 is a little fuzzy for me right now as to how to integrate it with
  flex/bison

My interest here is to go with option 2 or 3 if possible - preferably 2 ;),
because I think it's better/cleaner to deal with terms in one place once they
are parsed - in the parse_events_add_pmu function.

I think there's no need to re-run the whole parser (option 1) when we can
have the whole thing ready by just adding the extra terms.
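
For reference, option 1 amounts to rebuilding the event string and re-entering
the parser, roughly like this (a sketch only; the helper name, signature and
buffer handling are illustrative, not the actual implementation):

/* sketch of option 1: rebuild the event string and re-parse it */
static int parse_alias_by_reparse(struct list_head *list, int *idx,
				  const char *pmu_name, const char *alias_def,
				  const char *mod)
{
	char buf[256];

	snprintf(buf, sizeof(buf), "%s/%s/%s",
		 pmu_name, alias_def, mod ? mod : "");
	return __parse_events(buf, idx, list);
}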

thoughts?

thanks,
jirka

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 5/9] perf: Add Nehalem and Sandy Bridge uncore support
  2012-05-02  2:07 ` [PATCH 5/9] perf: Add Nehalem and Sandy Bridge " Yan, Zheng
@ 2012-05-03 21:04   ` Peter Zijlstra
  2012-05-04  5:47     ` Yan, Zheng
  2012-05-03 21:04   ` Peter Zijlstra
  1 sibling, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2012-05-03 21:04 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: mingo, andi, eranian, jolsa, ming.m.lin, linux-kernel

On Wed, 2012-05-02 at 10:07 +0800, Yan, Zheng wrote:
>         switch (boot_cpu_data.x86_model) {
> +       case 26: /* Nehalem */
> +       case 30:
> +       case 31:
> +       case 37: /* Westmere */
> +               msr_uncores = nhm_msr_uncores;
> +               break;
> +       case 42: /* Sandy Bridge */
> +               msr_uncores = snb_msr_uncores;
> +               break;
>         default:
>                 return 0;
>         } 

I really hate that we're duplicating all these things all over the place, but
I don't really know what to do about it either.

Anyway, it looks like perf_event_intel.c is missing 31; that said, I cannot
seem to find model 31 on wikipedia either.

What about 44, WSM-EP ?

I understand NHM-EX (46) and WSM-EX (47) have different uncores not
implemented in this series, will those follow?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 5/9] perf: Add Nehalem and Sandy Bridge uncore support
  2012-05-02  2:07 ` [PATCH 5/9] perf: Add Nehalem and Sandy Bridge " Yan, Zheng
  2012-05-03 21:04   ` Peter Zijlstra
@ 2012-05-03 21:04   ` Peter Zijlstra
  1 sibling, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2012-05-03 21:04 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: mingo, andi, eranian, jolsa, ming.m.lin, linux-kernel

On Wed, 2012-05-02 at 10:07 +0800, Yan, Zheng wrote:
> +       .name           = "cbo",

"C-Box"

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 7/9] perf: Add Sandy Bridge-EP uncore support
  2012-05-02  2:07 ` [PATCH 7/9] perf: Add Sandy Bridge-EP uncore support Yan, Zheng
@ 2012-05-03 21:12   ` Peter Zijlstra
  0 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2012-05-03 21:12 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: mingo, andi, eranian, jolsa, ming.m.lin, linux-kernel

Let's keep the names as they're listed in the Intel doc:

On Wed, 2012-05-02 at 10:07 +0800, Yan, Zheng wrote:
> +static struct intel_uncore_type snbep_uncore_ubox = {
> +       .name           = "ubox",

"U-Box"

> +};
> +
> +static struct intel_uncore_type snbep_uncore_cbo = {
> +       .name           = "cbo",

"C-Box"

> +};
> +
> +static struct intel_uncore_type snbep_uncore_pcu = {
> +       .name           = "pcu",

"PCU"

> +};


> +static struct intel_uncore_type snbep_uncore_ha = {
> +       .name           = "ha",

"HA"

> +};
> +
> +static struct intel_uncore_type snbep_uncore_imc = {
> +       .name           = "imc",

"iMC"

> +};
> +
> +static struct intel_uncore_type snbep_uncore_qpi = {
> +       .name           = "qpi",

"QPI"

> +};
> +
> +
> +static struct intel_uncore_type snbep_uncore_r2pcie = {
> +       .name           = "r2pcie",

"R2PCIe"

> +};
> +
> +static struct intel_uncore_type snbep_uncore_r3qpi = {
> +       .name           = "r3qpi",

"R3QPI"

> +}; 

These last two are the P-Boxes? Figure 1-1 lists a third P-Box covering
the SMI channels, though Table 1-1 doesn't list it.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 6/9] perf: Generic pci uncore device support
  2012-05-02  2:07 ` [PATCH 6/9] perf: Generic pci uncore device support Yan, Zheng
@ 2012-05-03 21:37   ` Peter Zijlstra
  2012-05-03 21:39   ` Peter Zijlstra
  2012-05-03 21:46   ` Peter Zijlstra
  2 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2012-05-03 21:37 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: mingo, andi, eranian, jolsa, ming.m.lin, linux-kernel

On Wed, 2012-05-02 at 10:07 +0800, Yan, Zheng wrote:
> +       if (box->pci_dev) {
> +               hwc->config_base = uncore_pci_event_ctl(box, hwc->idx);
> +               hwc->event_base =  uncore_pci_perf_ctr(box, hwc->idx);
> +       } else {
> +               hwc->config_base = uncore_msr_event_ctl(box, hwc->idx);
> +               hwc->event_base =  uncore_msr_perf_ctr(box, hwc->idx);
> +       } 

Since box is already an argument to uncore_*_event_ct[lr]() it could
include this conditional. Is GCC smart enough to pull it out and not
evaluate the condition twice in that case?
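
For illustration, folding the branch into common helpers might look like this
(a sketch; the pci/msr helpers are the ones from the patch, the wrapper names
and return type are invented):

static inline unsigned long
uncore_event_ctl(struct intel_uncore_box *box, int idx)
{
	/* one branch here instead of one at every call site */
	if (box->pci_dev)
		return uncore_pci_event_ctl(box, idx);
	return uncore_msr_event_ctl(box, idx);
}

static inline unsigned long
uncore_perf_ctr(struct intel_uncore_box *box, int idx)
{
	if (box->pci_dev)
		return uncore_pci_perf_ctr(box, idx);
	return uncore_msr_perf_ctr(box, idx);
}

so the caller collapses to:

	hwc->config_base = uncore_event_ctl(box, hwc->idx);
	hwc->event_base  = uncore_perf_ctr(box, hwc->idx);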



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 6/9] perf: Generic pci uncore device support
  2012-05-02  2:07 ` [PATCH 6/9] perf: Generic pci uncore device support Yan, Zheng
  2012-05-03 21:37   ` Peter Zijlstra
@ 2012-05-03 21:39   ` Peter Zijlstra
  2012-05-03 21:46   ` Peter Zijlstra
  2 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2012-05-03 21:39 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: mingo, andi, eranian, jolsa, ming.m.lin, linux-kernel

On Wed, 2012-05-02 at 10:07 +0800, Yan, Zheng wrote:
> +       uncores = msr_uncores;
> +       while (1) {
> +               for (i = 0; uncores[i]; i++) {
> +                       type = uncores[i];
> +                       for (j = 0; j < type->num_boxes; j++) {
> +                               pmu = &type->pmus[j];
> +                               box = uncore_pmu_find_box(pmu, phyid);
> +                               WARN_ON_ONCE(box->cpu != -1);
> +                               box->cpu = cpu;
> +                       }
>                 }
> +               if (uncores != msr_uncores)
> +                       break;
> +               uncores = pci_uncores;
>         } 

Wouldn't it be better to pull the body out into a separate function and
do something like:

 __uncore_init_types(msr_uncores, cpu);
 __uncore_init_types(pci_uncores, cpu);

same for the other such construct..
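
Concretely, the extracted helper could look like this (a sketch; the name
__uncore_assign_cpu is invented, the body is lifted from the quoted loop):

static void __uncore_assign_cpu(struct intel_uncore_type **uncores,
				int cpu, int phyid)
{
	struct intel_uncore_type *type;
	struct intel_uncore_pmu *pmu;
	struct intel_uncore_box *box;
	int i, j;

	for (i = 0; uncores[i]; i++) {
		type = uncores[i];
		for (j = 0; j < type->num_boxes; j++) {
			pmu = &type->pmus[j];
			box = uncore_pmu_find_box(pmu, phyid);
			WARN_ON_ONCE(box->cpu != -1);
			box->cpu = cpu;
		}
	}
}

so the while (1) construct reduces to:

	__uncore_assign_cpu(msr_uncores, cpu, phyid);
	__uncore_assign_cpu(pci_uncores, cpu, phyid);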



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 6/9] perf: Generic pci uncore device support
  2012-05-02  2:07 ` [PATCH 6/9] perf: Generic pci uncore device support Yan, Zheng
  2012-05-03 21:37   ` Peter Zijlstra
  2012-05-03 21:39   ` Peter Zijlstra
@ 2012-05-03 21:46   ` Peter Zijlstra
  2012-05-04  6:07     ` Yan, Zheng
  2 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2012-05-03 21:46 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: mingo, andi, eranian, jolsa, ming.m.lin, linux-kernel

On Wed, 2012-05-02 at 10:07 +0800, Yan, Zheng wrote:
> +static void __devexit uncore_pci_remove(struct pci_dev *pdev)
> +{
> +       struct intel_uncore_box *box = pci_get_drvdata(pdev);
> +       int phyid = pcibus_to_phyid[pdev->bus->number];
> +
> +       if (WARN_ON_ONCE(phyid != box->phy_id))
> +               return;
> +
> +       box->pci_dev = NULL;
> +       if (--box->refcnt == 0) {

This appears completely unserialized (as is all the refcnt stuff in
patch 4 it seems), now I figure the only way to actually have this pci
device go away is by hotplug, which is serialized. Still looks very odd.

> +               spin_lock(&uncore_pci_lock);
> +               hlist_del_rcu(&box->hlist);
> +               spin_unlock(&uncore_pci_lock);
> +               kfree_rcu(box, rcu_head);
> +       }
> +} 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/9] perf: Generic intel uncore support
  2012-05-02  2:07 ` [PATCH 4/9] perf: Generic intel uncore support Yan, Zheng
  2012-05-03 17:12   ` Peter Zijlstra
@ 2012-05-03 21:49   ` Peter Zijlstra
  2012-05-11  6:31   ` Anshuman Khandual
  2 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2012-05-03 21:49 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: mingo, andi, eranian, jolsa, ming.m.lin, linux-kernel

On Wed, 2012-05-02 at 10:07 +0800, Yan, Zheng wrote:
> +               sprintf(pmu->name, "uncore_%s%d", pmu->type->name,
> +                       pmu->pmu_idx); 

That probably wants to be: "uncore_%s_%d" or somesuch.

"uncore_C-Box3 vs "uncore_C-Box_3"

Looking at it like that, it might do to call it "Uncore*", dunno..
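
i.e. something like this (a sketch, assuming pmu->name is a fixed-size array):

	snprintf(pmu->name, sizeof(pmu->name), "uncore_%s_%d",
		 pmu->type->name, pmu->pmu_idx);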

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 5/9] perf: Add Nehalem and Sandy Bridge uncore support
  2012-05-03 21:04   ` Peter Zijlstra
@ 2012-05-04  5:47     ` Yan, Zheng
  0 siblings, 0 replies; 38+ messages in thread
From: Yan, Zheng @ 2012-05-04  5:47 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, andi, eranian, jolsa, ming.m.lin, linux-kernel

On 05/04/2012 05:04 AM, Peter Zijlstra wrote:
> On Wed, 2012-05-02 at 10:07 +0800, Yan, Zheng wrote:
>>         switch (boot_cpu_data.x86_model) {
>> +       case 26: /* Nehalem */
>> +       case 30:
>> +       case 31:
>> +       case 37: /* Westmere */
>> +               msr_uncores = nhm_msr_uncores;
>> +               break;
>> +       case 42: /* Sandy Bridge */
>> +               msr_uncores = snb_msr_uncores;
>> +               break;
>>         default:
>>                 return 0;
>>         } 
> 
> I really hate that we're duplicating all these things all over the place, but
> I don't really know what to do about it either.
> 
> Anyway, it looks like perf_event_intel.c is missing 31; that said, I cannot
> seem to find model 31 on wikipedia either.
> 
Sorry, the number 31 was copied from Lin Ming's old uncore patch. Neither
Lin Ming nor I can find the model 31 processor now. I will remove the 31 in
future patches.

> What about 44, WSM-EP ?
> 
> I understand NHM-EX (46) and WSM-EX (47) have different uncores not
> implemented in this series, will those follow?
Yes, I have just started reading the SDM.

Regards
Yan, Zheng


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 6/9] perf: Generic pci uncore device support
  2012-05-03 21:46   ` Peter Zijlstra
@ 2012-05-04  6:07     ` Yan, Zheng
  0 siblings, 0 replies; 38+ messages in thread
From: Yan, Zheng @ 2012-05-04  6:07 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, andi, eranian, jolsa, ming.m.lin, linux-kernel

On 05/04/2012 05:46 AM, Peter Zijlstra wrote:
> On Wed, 2012-05-02 at 10:07 +0800, Yan, Zheng wrote:
>> +static void __devexit uncore_pci_remove(struct pci_dev *pdev)
>> +{
>> +       struct intel_uncore_box *box = pci_get_drvdata(pdev);
>> +       int phyid = pcibus_to_phyid[pdev->bus->number];
>> +
>> +       if (WARN_ON_ONCE(phyid != box->phy_id))
>> +               return;
>> +
>> +       box->pci_dev = NULL;
>> +       if (--box->refcnt == 0) {
> 
> This appears completely unserialized (as is all the refcnt stuff in
> patch 4 it seems), now I figure the only way to actually have this pci
> device go away is by hotplug, which is serialized. Still looks very odd.
> 
Yes, box->refcnt should always be 1 here. I will remove the 'if (--box->refcnt == 0)' check.

>> +               spin_lock(&uncore_pci_lock);
>> +               hlist_del_rcu(&box->hlist);
>> +               spin_unlock(&uncore_pci_lock);
>> +               kfree_rcu(box, rcu_head);
>> +       }
>> +} 
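
The simplified removal path would then look roughly like this (a sketch derived
from the quoted code, with the refcnt test dropped):

static void __devexit uncore_pci_remove(struct pci_dev *pdev)
{
	struct intel_uncore_box *box = pci_get_drvdata(pdev);

	if (WARN_ON_ONCE(pcibus_to_phyid[pdev->bus->number] != box->phy_id))
		return;

	box->pci_dev = NULL;

	/* hotplug serializes us; the box always dies with its pci device */
	spin_lock(&uncore_pci_lock);
	hlist_del_rcu(&box->hlist);
	spin_unlock(&uncore_pci_lock);
	kfree_rcu(box, rcu_head);
}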


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/9] perf: Generic intel uncore support
  2012-05-03 17:12   ` Peter Zijlstra
@ 2012-05-04  7:33     ` Yan, Zheng
  2012-05-04 17:57       ` Peter Zijlstra
  2012-05-10  7:34     ` Yan, Zheng
  1 sibling, 1 reply; 38+ messages in thread
From: Yan, Zheng @ 2012-05-04  7:33 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, andi, eranian, jolsa, ming.m.lin, linux-kernel

On 05/04/2012 01:12 AM, Peter Zijlstra wrote:
> On Wed, 2012-05-02 at 10:07 +0800, Yan, Zheng wrote:
>> +static struct intel_uncore_box *
>> +__uncore_pmu_find_box(struct intel_uncore_pmu *pmu, int phyid)
>> +{
>> +       struct intel_uncore_box *box;
>> +       struct hlist_head *head;
>> +       struct hlist_node *node;
>> +
>> +       head = &pmu->box_hash[phyid % UNCORE_BOX_HASH_SIZE];
>> +       hlist_for_each_entry_rcu(box, node, head, hlist) {
>> +               if (box->phy_id == phyid)
>> +                       return box;
>> +       }
>> +
>> +       return NULL;
>> +} 
> 
> I still don't get why something like:
> 
> static struct intel_uncore_box *
> pmu_to_box(struct intel_uncore_pmu *pmu, int cpu)
> {
> 	return per_cpu_ptr(pmu->box, cpu);
> }
> 
> doesn't work.
> 
> Last time you mumbled something about PCI devices, but afaict those are
> in all respects identical to MSR devices except you talk to them using
> PCI-mmio instead of MSR registers.
> 
> In fact, since its all local to the generic code there's nothing
> different between pci/msr already.
> 
> So how about something like this:
> 
> ---
>  Makefile                  |    4 +-
>  perf_event_intel_uncore.c |   92 ++++++++++++++++++----------------------------
>  perf_event_intel_uncore.h |    4 +-
>  3 files changed, 42 insertions(+), 58 deletions(-)
> 
> --- a/arch/x86/kernel/cpu/Makefile
> +++ b/arch/x86/kernel/cpu/Makefile
> @@ -32,7 +32,9 @@ obj-$(CONFIG_PERF_EVENTS)		+= perf_event
>  
>  ifdef CONFIG_PERF_EVENTS
>  obj-$(CONFIG_CPU_SUP_AMD)		+= perf_event_amd.o
> -obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_p6.o perf_event_p4.o perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o perf_event_intel_uncore.o
> +obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_p6.o perf_event_p4.o 
> +obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o 
> +obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_uncore.o
>  endif
>  
>  obj-$(CONFIG_X86_MCE)			+= mcheck/
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> @@ -116,40 +116,21 @@ struct intel_uncore_box *uncore_alloc_bo
>  }
>  
>  static struct intel_uncore_box *
> -__uncore_pmu_find_box(struct intel_uncore_pmu *pmu, int phyid)
> +uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu)
>  {
> -	struct intel_uncore_box *box;
> -	struct hlist_head *head;
> -	struct hlist_node *node;
> -
> -	head = &pmu->box_hash[phyid % UNCORE_BOX_HASH_SIZE];
> -	hlist_for_each_entry_rcu(box, node, head, hlist) {
> -		if (box->phy_id == phyid)
> -			return box;
> -	}
> -
> -	return NULL;
> -}
> -
> -static struct intel_uncore_box *
> -uncore_pmu_find_box(struct intel_uncore_pmu *pmu, int phyid)
> -{
> -	struct intel_uncore_box *box;
> -
> -	rcu_read_lock();
> -	box = __uncore_pmu_find_box(pmu, phyid);
> -	rcu_read_unlock();
> -
> -	return box;
> +	return *per_cpu_ptr(pmu->box, cpu);
>  }
>  
>  static void uncore_pmu_add_box(struct intel_uncore_pmu *pmu,
>  				struct intel_uncore_box *box)
>  {
> -	struct hlist_head *head;
> +	int cpu;
>  
> -	head = &pmu->box_hash[box->phy_id % UNCORE_BOX_HASH_SIZE];
> -	hlist_add_head_rcu(&box->hlist, head);
> +	for_each_possible_cpu(cpu) {
> +		if (box->phys_id != topology_physical_package_id(cpu))
> +			continue;
> +		*per_cpu_ptr(pmu->box, cpu) = box;
> +	}
>  }
Thank you for your suggestion. One reason I chose a hash table is that I'm
uncertain about the system behavior when hot-plugging a CPU: is the PCI uncore
device probed first, or is the CPU data initialized first? If the PCI device is
probed first, I think your code won't work, because topology_physical_package_id
may return an incorrect id.

Regards
Yan, Zheng

> 
>  static struct intel_uncore_pmu *uncore_event_to_pmu(struct perf_event *event)
> @@ -163,8 +144,7 @@ static struct intel_uncore_box *uncore_e
>  	 * perf core schedules event on the basis of cpu, uncore events are
>  	 * collected by one of the cpus inside a physical package.
>  	 */
> -	int phyid = topology_physical_package_id(smp_processor_id());
> -	return uncore_pmu_find_box(uncore_event_to_pmu(event), phyid);
> +	return uncore_pmu_to_box(uncore_event_to_pmu(event), smp_processor_id());
>  }
>  
>  static int uncore_collect_events(struct intel_uncore_box *box,
> @@ -478,8 +458,7 @@ int uncore_pmu_event_init(struct perf_ev
>  	 */
>  	if (event->cpu < 0)
>  		return -EINVAL;
> -	box = uncore_pmu_find_box(pmu,
> -			topology_physical_package_id(event->cpu));
> +	box = uncore_pmu_to_box(pmu, event->cpu);
>  	if (!box || box->cpu < 0)
>  		return -EINVAL;
>  	event->cpu = box->cpu;
> @@ -541,7 +520,11 @@ static int __init uncore_pmu_register(st
>  
>  static void __init uncore_type_exit(struct intel_uncore_type *type)
>  {
> +	int i;
> +
>  	kfree(type->attr_groups[1]);
> +	for (i = 0; i < type->num_boxes; i++)
> +		free_percpu(type->pmus[i].box);
>  	kfree(type->pmus);
>  	type->attr_groups[1] = NULL;
>  	type->pmus = NULL;
> @@ -566,9 +549,9 @@ static int __init uncore_type_init(struc
>  		pmus[i].func_id = -1;
>  		pmus[i].pmu_idx = i;
>  		pmus[i].type = type;
> -
> -		for (j = 0; j < ARRAY_SIZE(pmus[0].box_hash); j++)
> -			INIT_HLIST_HEAD(&pmus[i].box_hash[j]);
> +		pmus[i].box = alloc_percpu(struct intel_uncore_box *);
> +		if (!pmus[i].box)
> +			goto fail_percpu;
>  	}
>  
>  	if (type->event_descs) {
> @@ -591,6 +574,11 @@ static int __init uncore_type_init(struc
>  
>  	type->pmus = pmus;
>  	return 0;
> +
> +fail_percpu:
> +	for (i = 0; i < type->num_boxes; i++)
> +		free_percpu(pmus[i].box);
> +
>  fail:
>  	uncore_type_exit(type);
>  	return -ENOMEM;
> @@ -617,15 +605,13 @@ static void __cpuinit uncore_cpu_dying(i
>  	struct intel_uncore_type *type;
>  	struct intel_uncore_pmu *pmu;
>  	struct intel_uncore_box *box;
> -	int i, j, phyid;
> -
> -	phyid = topology_physical_package_id(cpu);
> +	int i, j;
>  
>  	for (i = 0; msr_uncores[i]; i++) {
>  		type = msr_uncores[i];
>  		for (j = 0; j < type->num_boxes; j++) {
>  			pmu = &type->pmus[j];
> -			box = uncore_pmu_find_box(pmu, phyid);
> +			box = uncore_pmu_to_box(pmu, cpu);
>  			if (box && --box->refcnt == 0) {
>  				hlist_del_rcu(&box->hlist);
>  				kfree_rcu(box, rcu_head);
> @@ -639,15 +625,13 @@ static int __cpuinit uncore_cpu_starting
>  	struct intel_uncore_type *type;
>  	struct intel_uncore_pmu *pmu;
>  	struct intel_uncore_box *box;
> -	int i, j, phyid;
> -
> -	phyid = topology_physical_package_id(cpu);
> +	int i, j;
>  
>  	for (i = 0; msr_uncores[i]; i++) {
>  		type = msr_uncores[i];
>  		for (j = 0; j < type->num_boxes; j++) {
>  			pmu = &type->pmus[j];
> -			box = uncore_pmu_find_box(pmu, phyid);
> +			box = uncore_pmu_to_box(pmu, cpu);
>  			if (box)
>  				uncore_box_init(box);
>  		}
> @@ -660,9 +644,7 @@ static int __cpuinit uncore_cpu_prepare(
>  	struct intel_uncore_type *type;
>  	struct intel_uncore_pmu *pmu;
>  	struct intel_uncore_box *exist, *box;
> -	int i, j, phyid;
> -
> -	phyid = topology_physical_package_id(cpu);
> +	int i, j;
>  
>  	/* allocate the box data structure */
>  	for (i = 0; msr_uncores[i]; i++) {
> @@ -673,7 +655,7 @@ static int __cpuinit uncore_cpu_prepare(
>  
>  			if (pmu->func_id < 0)
>  				pmu->func_id = j;
> -			exist = uncore_pmu_find_box(pmu, phyid);
> +			exist = uncore_pmu_to_box(pmu, cpu);
>  			if (exist)
>  				exist->refcnt++;
>  			if (exist)
> @@ -684,7 +666,7 @@ static int __cpuinit uncore_cpu_prepare(
>  				return -ENOMEM;
>  
>  			box->pmu = pmu;
> -			box->phy_id = phyid;
> +			box->phys_id = topology_physical_package_id(cpu);
>  			uncore_pmu_add_box(pmu, box);
>  		}
>  	}
> @@ -696,19 +678,19 @@ static void __cpuinit uncore_event_exit_
>  	struct intel_uncore_type *type;
>  	struct intel_uncore_pmu *pmu;
>  	struct intel_uncore_box *box;
> -	int i, j, phyid, target;
> +	int i, j, phys_id, target;
>  
>  	/* if exiting cpu is used for collecting uncore events */
>  	if (!cpumask_test_and_clear_cpu(cpu, &uncore_cpu_mask))
>  		return;
>  
>  	/* find a new cpu to collect uncore events */
> -	phyid = topology_physical_package_id(cpu);
> +	phys_id = topology_physical_package_id(cpu);
>  	target = -1;
>  	for_each_online_cpu(i) {
>  		if (i == cpu)
>  			continue;
> -		if (phyid == topology_physical_package_id(i)) {
> +		if (phys_id == topology_physical_package_id(i)) {
>  			target = i;
>  			break;
>  		}
> @@ -722,7 +704,7 @@ static void __cpuinit uncore_event_exit_
>  		type = msr_uncores[i];
>  		for (j = 0; j < type->num_boxes; j++) {
>  			pmu = &type->pmus[j];
> -			box = uncore_pmu_find_box(pmu, phyid);
> +			box = uncore_pmu_to_box(pmu, cpu);
>  			WARN_ON_ONCE(box->cpu != cpu);
>  
>  			if (target >= 0) {
> @@ -742,11 +724,11 @@ static void __cpuinit uncore_event_init_
>  	struct intel_uncore_type *type;
>  	struct intel_uncore_pmu *pmu;
>  	struct intel_uncore_box *box;
> -	int i, j, phyid;
> +	int i, j, phys_id;
>  
> -	phyid = topology_physical_package_id(cpu);
> +	phys_id = topology_physical_package_id(cpu);
>  	for_each_cpu(i, &uncore_cpu_mask) {
> -		if (phyid == topology_physical_package_id(i))
> +		if (phys_id == topology_physical_package_id(i))
>  			return;
>  	}
>  
> @@ -756,7 +738,7 @@ static void __cpuinit uncore_event_init_
>  		type = msr_uncores[i];
>  		for (j = 0; j < type->num_boxes; j++) {
>  			pmu = &type->pmus[j];
> -			box = uncore_pmu_find_box(pmu, phyid);
> +			box = uncore_pmu_to_box(pmu, cpu);
>  			WARN_ON_ONCE(box->cpu != -1);
>  			box->cpu = cpu;
>  		}
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
> @@ -59,12 +59,12 @@ struct intel_uncore_pmu {
>  	int pmu_idx;
>  	int func_id;
>  	struct intel_uncore_type *type;
> -	struct hlist_head box_hash[UNCORE_BOX_HASH_SIZE];
> +	struct intel_uncore_box * __percpu *box;
>  };
>  
>  struct intel_uncore_box {
>  	struct hlist_node hlist;
> -	int phy_id;
> +	int phys_id;
>  	int refcnt;
>  	int n_active;	/* number of active events */
>  	int n_events;
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 9/9] perf tool: Add pmu event alias support
  2012-05-03 20:05       ` Jiri Olsa
@ 2012-05-04 12:32         ` Yan, Zheng
  2012-05-07  8:34         ` Yan, Zheng
  2012-05-07 17:14         ` Peter Zijlstra
  2 siblings, 0 replies; 38+ messages in thread
From: Yan, Zheng @ 2012-05-04 12:32 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Peter Zijlstra, mingo, andi, eranian, ming.m.lin, linux-kernel

On 05/04/2012 04:05 AM, Jiri Olsa wrote:
> On Thu, May 03, 2012 at 01:24:21PM +0200, Peter Zijlstra wrote:
>> On Thu, 2012-05-03 at 12:56 +0200, Jiri Olsa wrote:
>>> - in sysfs you would have directory with aliases (now called 'events')
>>> - each alias is sysfs dir, with file attrs:
>>>         file name = term name, file value = term value
>>>   eg.:
>>>         events/
>>>                 CAS_COUNT_RD/
>>>                         # files:
>>>                         config  - value 1
>>>                         config1 - value 2
>>>                         mask    - value ... 
>>
>> I'd prefer the thing Yan proposed (if the sysfs folks let us),
>>
>> $foo/events/QHL_REQUEST_REMOTE_READS
>>
>> with contents: "event=0x20,umask=0x4"
>>
>>> this way it's also possible to add extra terms to existing alias
>>> in command line if needed... might be handy
>>>
>> That should always be possible, if you modify the parser to take things
>> like:
>>
>>   event=0x20,umask=0x4,event=0x21
>>
>> and have latter values override earlier values, so it collapses into:
>>
>>   umask=0x4,event=0x21
>>
>> you can simply take whatever comes out of the event file and stick extra
>> bits at the end.
> 
> I discussed this with Peter on irc, so I'll try to sum it up
> 
> we have the following options so far:
> 
>    given an event alias 'al' with the definition 'config=1,config1=1,config2=2'
> 
> 1) inside the parse_events_add_pmu function
>    once an alias term is detected as part of the event definition 'pmu/al/mod',
>    we construct a new event 'pmu/config=1,config1=1,config2=2/mod' and rerun
>    the event parser on it
> 
> 2) inside the parse_events_add_pmu function
>    once an alias term is detected as part of the event definition 'pmu/al/mod',
>    we replace that term with the list of terms from the alias definition and
>    run perf_pmu__config with this new term list
> 
> 3) during bison/flex processing
>    have option 2) embedded inside the flex/bison rules. Once an alias term
>    is detected, insert the aliased terms directly into the list of terms,
>    instead of replacing them after the fact as in option 2.
> 
> 
> - option 1 is currently implemented
> - options 2 and 3 require that the aliased config be loaded/parsed from the
>   pmu sysfs tree in the form of a terms list
> - option 3 is a little fuzzy for me right now as to how to integrate it with
>   flex/bison
> 
> My interest here is to go with option 2 or 3 if possible - preferably 2 ;),
> because I think it's better/cleaner to deal with terms in one place once they
> are parsed - in the parse_events_add_pmu function.
> 
> I think there's no need to re-run the whole parser (option 1) when we can
> have the whole thing ready by just adding the extra terms.
> 
> thoughts?
> 

I agree with you; option 2 is cleaner than option 1. I will try implementing it.

Thank you
Yan, Zheng 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/9] perf: Generic intel uncore support
  2012-05-04  7:33     ` Yan, Zheng
@ 2012-05-04 17:57       ` Peter Zijlstra
  0 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2012-05-04 17:57 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: mingo, andi, eranian, jolsa, ming.m.lin, linux-kernel

On Fri, 2012-05-04 at 15:33 +0800, Yan, Zheng wrote:
> Thank you for your suggestion. One reason I chose a hash table is that I'm
> uncertain about the system behavior when hot-plugging a CPU: is the PCI uncore
> device probed first, or is the CPU data initialized first? If the PCI device is
> probed first, I think your code won't work, because topology_physical_package_id
> may return an incorrect id.

It had better first initialize the cpu data before it starts probing its
devices. Doing it the other way around just doesn't make any sense.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 9/9] perf tool: Add pmu event alias support
  2012-05-03 20:05       ` Jiri Olsa
  2012-05-04 12:32         ` Yan, Zheng
@ 2012-05-07  8:34         ` Yan, Zheng
  2012-05-10  9:52           ` Jiri Olsa
  2012-05-07 17:14         ` Peter Zijlstra
  2 siblings, 1 reply; 38+ messages in thread
From: Yan, Zheng @ 2012-05-07  8:34 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Peter Zijlstra, mingo, andi, eranian, ming.m.lin, linux-kernel

On 05/04/2012 04:05 AM, Jiri Olsa wrote:
> On Thu, May 03, 2012 at 01:24:21PM +0200, Peter Zijlstra wrote:
>> On Thu, 2012-05-03 at 12:56 +0200, Jiri Olsa wrote:
>>> - in sysfs you would have directory with aliases (now called 'events')
>>> - each alias is sysfs dir, with file attrs:
>>>         file name = term name, file value = term value
>>>   eg.:
>>>         events/
>>>                 CAS_COUNT_RD/
>>>                         # files:
>>>                         config  - value 1
>>>                         config1 - value 2
>>>                         mask    - value ... 
>>
>> I'd prefer the thing Yan proposed (if the sysfs folks let us),
>>
>> $foo/events/QHL_REQUEST_REMOTE_READS
>>
>> with contents: "event=0x20,umask=0x4"
>>
>>> this way it's also possible to add extra terms to existing alias
>>> in command line if needed... might be handy
>>>
>> That should always be possible, if you modify the parser to take things
>> like:
>>
>>   event=0x20,umask=0x4,event=0x21
>>
>> and have latter values override earlier values, so it collapses into:
>>
>>   umask=0x4,event=0x21
>>
>> you can simply take whatever comes out of the event file and stick extra
>> bits at the end.
> 
> I discussed this with Peter on irc, so I'll try to sum it up
> 
> we have the following options so far:
> 
>    given an event alias 'al' with the definition 'config=1,config1=1,config2=2'
> 
> 1) inside the parse_events_add_pmu function
>    once an alias term is detected as part of the event definition 'pmu/al/mod',
>    we construct a new event 'pmu/config=1,config1=1,config2=2/mod' and rerun
>    the event parser on it
> 
> 2) inside the parse_events_add_pmu function
>    once an alias term is detected as part of the event definition 'pmu/al/mod',
>    we replace that term with the list of terms from the alias definition and
>    run perf_pmu__config with this new term list
> 
> 3) during bison/flex processing
>    have option 2) embedded inside the flex/bison rules. Once an alias term
>    is detected, insert the aliased terms directly into the list of terms,
>    instead of replacing them after the fact as in option 2.
> 
> 
> - option 1 is currently implemented
> - options 2 and 3 require that the aliased config be loaded/parsed from the
>   pmu sysfs tree in the form of a terms list
> - option 3 is a little fuzzy for me right now as to how to integrate it with
>   flex/bison
> 
> My interest here is to go with option 2 or 3 if possible - preferably 2 ;),
> because I think it's better/cleaner to deal with terms in one place once they
> are parsed - in the parse_events_add_pmu function.
> 
> I think there's no need to re-run the whole parser (option 1) when we can
> have the whole thing ready by just adding the extra terms.
> 
> thoughts?
> 

How is the patch below, it implements option 2.

Thanks
---
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index c587ae8..4ed4278 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -656,22 +656,33 @@ int parse_events_add_pmu(struct list_head *list, int *idx,
 	struct perf_event_attr attr;
 	struct perf_pmu *pmu;
 
+	/* called by parse_event_config? */
+	if (!idx) {
+		list_splice_init(head_config, list);
+		return 0;
+	}
+
 	pmu = perf_pmu__find(name);
 	if (!pmu)
 		return -EINVAL;
 
 	memset(&attr, 0, sizeof(attr));
 
-	/*
-	 * Configure hardcoded terms first, no need to check
-	 * return value when called with fail == 0 ;)
-	 */
-	config_attr(&attr, head_config, 0);
+	while (1) {
+		/*
+		 * Configure hardcoded terms first, no need to check
+		 * return value when called with fail == 0 ;)
+		 */
+		config_attr(&attr, head_config, 0);
 
-	if (perf_pmu__config(pmu, &attr, head_config))
-		return -EINVAL;
+		if (!perf_pmu__config(pmu, &attr, head_config))
+			return add_event(list, idx, &attr, (char *) "pmu");
 
-	return add_event(list, idx, &attr, (char *) "pmu");
+		head_config = perf_pmu__alias(pmu, head_config);
+		if (!head_config)
+			break;
+	}
+	return -EINVAL;
 }
 
 void parse_events_update_lists(struct list_head *list_event,
@@ -771,6 +782,26 @@ static int __parse_events(const char *str, int *idx, struct list_head *list)
 	return ret;
 }
 
+/*
+ * parse event config string, return a list of event terms.
+ */
+int parse_event_config(struct list_head *terms, const char *str)
+{
+	char *buf;
+	int ret;
+
+	buf = malloc(strlen(str) + 6);
+	if (!buf)
+		return -ENOMEM;
+
+	/* It does not matter which pmu is used here */
+	sprintf(buf, "cpu/%s/", str);
+	ret = __parse_events(buf, NULL, terms);
+
+	free(buf);
+	return ret;
+}
+
 int parse_events(struct perf_evlist *evlist, const char *str, int unset __used)
 {
 	LIST_HEAD(list);
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index e1ffeb7..5b5d698 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -30,6 +30,7 @@ extern int parse_events_option(const struct option *opt, const char *str,
 extern int parse_events(struct perf_evlist *evlist, const char *str,
 			int unset);
 extern int parse_filter(const struct option *opt, const char *str, int unset);
+extern int parse_event_config(struct list_head *terms, const char *str);
 
 #define EVENTS_HELP_MAX (128*1024)
 
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 52082a7..8a26f3d 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -197,7 +197,7 @@ PE_NAME
 {
 	struct parse_events__term *term;
 
-	ABORT_ON(parse_events__new_term(&term, PARSE_EVENTS__TERM_TYPE_NUM,
+	ABORT_ON(parse_events__new_term(&term, PARSE_EVENTS__TERM_TYPE_STR,
 		 $1, NULL, 1));
 	$$ = term;
 }
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index cb08a11..7a85779 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -80,6 +80,95 @@ static int pmu_format(char *name, struct list_head *format)
 	return 0;
 }
 
+static int perf_pmu__new_alias(struct list_head *list, char *name, FILE *file)
+{
+	struct perf_pmu__alias *alias;
+	char buf[256];
+	int ret;
+
+	ret = fread(buf, 1, sizeof(buf) - 1, file);
+	if (ret == 0)
+		return -EINVAL;
+	buf[ret] = 0;
+
+	alias = malloc(sizeof(*alias));
+	if (!alias)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&alias->terms);
+	ret = parse_event_config(&alias->terms, buf);
+	if (ret) {
+		free(alias);
+		return ret;
+	}
+
+	alias->name = strdup(name);
+	list_add_tail(&alias->list, list);
+	return 0;
+}
+
+/*
+ * Process all the sysfs attributes located under the directory
+ * specified in 'dir' parameter.
+ */
+static int pmu_aliases_parse(char *dir, struct list_head *head)
+{
+	struct dirent *evt_ent;
+	DIR *event_dir;
+	int ret = 0;
+
+	event_dir = opendir(dir);
+	if (!event_dir)
+		return -EINVAL;
+
+	while (!ret && (evt_ent = readdir(event_dir))) {
+		char path[PATH_MAX];
+		char *name = evt_ent->d_name;
+		FILE *file;
+
+		if (!strcmp(name, ".") || !strcmp(name, ".."))
+			continue;
+
+		snprintf(path, PATH_MAX, "%s/%s", dir, name);
+
+		ret = -EINVAL;
+		file = fopen(path, "r");
+		if (!file)
+			break;
+		ret = perf_pmu__new_alias(head, name, file);
+		fclose(file);
+	}
+
+	closedir(event_dir);
+	return ret;
+}
+
+/*
+ * Reading the pmu event aliases definition, which should be located at:
+ * /sys/bus/event_source/devices/<dev>/events as sysfs group attributes.
+ */
+static int pmu_aliases(char *name, struct list_head *head)
+{
+	struct stat st;
+	char path[PATH_MAX];
+	const char *sysfs;
+
+	sysfs = sysfs_find_mountpoint();
+	if (!sysfs)
+		return -1;
+
+	snprintf(path, PATH_MAX,
+		 "%s/bus/event_source/devices/%s/events", sysfs, name);
+
+	if (stat(path, &st) < 0)
+		return -1;
+
+	if (pmu_aliases_parse(path, head))
+		return -1;
+
+	return 0;
+}
+
 /*
  * Reading/parsing the default pmu type value, which should be
  * located at:
@@ -118,6 +207,7 @@ static struct perf_pmu *pmu_lookup(char *name)
 {
 	struct perf_pmu *pmu;
 	LIST_HEAD(format);
+	LIST_HEAD(aliases);
 	__u32 type;
 
 	/*
@@ -135,8 +225,12 @@ static struct perf_pmu *pmu_lookup(char *name)
 	if (!pmu)
 		return NULL;
 
+	pmu_aliases(name, &aliases);
+
 	INIT_LIST_HEAD(&pmu->format);
+	INIT_LIST_HEAD(&pmu->aliases);
 	list_splice(&format, &pmu->format);
+	list_splice(&aliases, &pmu->aliases);
 	pmu->name = strdup(name);
 	pmu->type = type;
 	return pmu;
@@ -262,6 +356,35 @@ static int pmu_config(struct list_head *formats, struct perf_event_attr *attr,
 	return 0;
 }
 
+static struct perf_pmu__alias *pmu_find_alias(struct list_head *events,
+					      char *name)
+{
+	struct perf_pmu__alias *alias;
+
+	list_for_each_entry(alias, events, list) {
+		if (!strcmp(alias->name, name))
+			return alias;
+	}
+	return NULL;
+}
+
+struct list_head *perf_pmu__alias(struct perf_pmu *pmu,
+				  struct list_head *head_terms)
+{
+	struct parse_events__term *term;
+	struct perf_pmu__alias *alias;
+
+	if (!list_is_singular(head_terms))
+		return NULL;
+
+	term = list_entry(head_terms->next, struct parse_events__term, list);
+	if (term->type != PARSE_EVENTS__TERM_TYPE_STR || term->val.str)
+		return NULL;
+
+	alias = pmu_find_alias(&pmu->aliases, term->config);
+	return alias ? &alias->terms : NULL;
+}
+
 /*
  * Configures event's 'attr' parameter based on the:
  * 1) users input - specified in terms parameter
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 68c0db9..8fad317 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -19,17 +19,25 @@ struct perf_pmu__format {
 	struct list_head list;
 };
 
+struct perf_pmu__alias {
+	char *name;
+	struct list_head terms;
+	struct list_head list;
+};
+
 struct perf_pmu {
 	char *name;
 	__u32 type;
 	struct list_head format;
+	struct list_head aliases;
 	struct list_head list;
 };
 
 struct perf_pmu *perf_pmu__find(char *name);
 int perf_pmu__config(struct perf_pmu *pmu, struct perf_event_attr *attr,
 		     struct list_head *head_terms);
-
+struct list_head *perf_pmu__alias(struct perf_pmu *pmu,
+				struct list_head *head_terms);
 int perf_pmu_wrap(void);
 void perf_pmu_error(struct list_head *list, char *name, char const *msg);
 

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 9/9] perf tool: Add pmu event alias support
  2012-05-03 20:05       ` Jiri Olsa
  2012-05-04 12:32         ` Yan, Zheng
  2012-05-07  8:34         ` Yan, Zheng
@ 2012-05-07 17:14         ` Peter Zijlstra
  2 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2012-05-07 17:14 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Yan, Zheng, mingo, andi, eranian, ming.m.lin, linux-kernel

On Thu, 2012-05-03 at 22:05 +0200, Jiri Olsa wrote:
> thoughts?

The currently proposed syntax for aliases is 'pmu/alias,more-terms/',
right?

Should we also allow something like 'pmu/event=alias,more-terms/'? Or
possibly even do only that?

The reason is that it would be much easier for the external events
Stephane wants with that JSON file. I think the sysfs alias and the
external JSON alias should use the same mechanism and syntax.
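
For example (hypothetical pmu and event names, just to show the two spellings):

  # alias as a bare term
  perf stat -e 'uncore_imc_0/CAS_COUNT_RD,umask=0x1/' -a sleep 1

  # alias as the value of the event term
  perf stat -e 'uncore_imc_0/event=CAS_COUNT_RD,umask=0x1/' -a sleep 1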

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/9] perf: Allow pmu to choose cpu on which to install event
  2012-05-02  2:07 ` [PATCH 2/9] perf: Allow pmu to choose cpu on which to install event Yan, Zheng
@ 2012-05-09  6:38   ` Anshuman Khandual
  2012-05-10  1:09     ` Yan, Zheng
  0 siblings, 1 reply; 38+ messages in thread
From: Anshuman Khandual @ 2012-05-09  6:38 UTC (permalink / raw)
  To: Yan, Zheng
  Cc: a.p.zijlstra, mingo, andi, eranian, jolsa, ming.m.lin, linux-kernel

On Wednesday 02 May 2012 07:37 AM, Yan, Zheng wrote:

> From: "Yan, Zheng" <zheng.z.yan@intel.com>
> 
> Allow the pmu->event_init callback to change event->cpu, so pmu can
> choose cpu on which to install event.
> 
> Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
> ---
>  kernel/events/core.c |   13 +++++++++----
>  1 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 32cfc76..84911de 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6250,6 +6250,8 @@ SYSCALL_DEFINE5(perf_event_open,
>  		}
>  	}
> 
> +	get_online_cpus();

Why is this protection against the cpu hotplug operation needed? Is it because the PMU
can now change event->cpu during event initialization (specific to uncore PMU events),
or has this protection always been required for normal on-cpu HW PMU events as well
and is only being added now?


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/9] perf: Allow pmu to choose cpu on which to install event
  2012-05-09  6:38   ` Anshuman Khandual
@ 2012-05-10  1:09     ` Yan, Zheng
  2012-05-10  3:41       ` Anshuman Khandual
  0 siblings, 1 reply; 38+ messages in thread
From: Yan, Zheng @ 2012-05-10  1:09 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: a.p.zijlstra, mingo, andi, eranian, jolsa, ming.m.lin, linux-kernel

On 05/09/2012 02:38 PM, Anshuman Khandual wrote:
> On Wednesday 02 May 2012 07:37 AM, Yan, Zheng wrote:
> 
>> From: "Yan, Zheng" <zheng.z.yan@intel.com>
>>
>> Allow the pmu->event_init callback to change event->cpu, so pmu can
>> choose cpu on which to install event.
>>
>> Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
>> ---
>>  kernel/events/core.c |   13 +++++++++----
>>  1 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 32cfc76..84911de 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -6250,6 +6250,8 @@ SYSCALL_DEFINE5(perf_event_open,
>>  		}
>>  	}
>>
>> +	get_online_cpus();
> 
> Why is this protection against the cpu hotplug operation needed? Is it because the PMU
> can now change event->cpu during event initialization (specific to uncore PMU events),
> or has this protection always been required for normal on-cpu HW PMU events as well
> and is only being added now?
> 
I think it has always been required, because when creating a perf event, 'cpu online'
is checked by find_get_context, but the cpu can go offline after find_get_context
returns.
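
For illustration, the window being closed looks roughly like this (a sketch of
the perf_event_open() flow; the exact call sites are simplified):

	get_online_cpus();	/* pin cpu hotplug state for the whole setup */

	event = perf_event_alloc(&attr, cpu, task, group_leader,
				 NULL, NULL, NULL);	/* event_init may move event->cpu */

	ctx = find_get_context(pmu, task, event->cpu);	/* checks cpu_online() */

	/* without the bracket, the cpu could go offline right here */

	perf_install_in_context(ctx, event, event->cpu);

	put_online_cpus();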

Regards
Yan, Zheng

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/9] perf: Allow pmu to choose cpu on which to install event
  2012-05-10  1:09     ` Yan, Zheng
@ 2012-05-10  3:41       ` Anshuman Khandual
  2012-05-10 10:56         ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Anshuman Khandual @ 2012-05-10  3:41 UTC (permalink / raw)
  To: Yan, Zheng
  Cc: a.p.zijlstra, mingo, andi, eranian, jolsa, ming.m.lin, linux-kernel

On Thursday 10 May 2012 06:39 AM, Yan, Zheng wrote:

> On 05/09/2012 02:38 PM, Anshuman Khandual wrote:
>> On Wednesday 02 May 2012 07:37 AM, Yan, Zheng wrote:
>>
>>> From: "Yan, Zheng" <zheng.z.yan@intel.com>
>>>
>>> Allow the pmu->event_init callback to change event->cpu, so pmu can
>>> choose cpu on which to install event.
>>>
>>> Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
>>> ---
>>>  kernel/events/core.c |   13 +++++++++----
>>>  1 files changed, 9 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>> index 32cfc76..84911de 100644
>>> --- a/kernel/events/core.c
>>> +++ b/kernel/events/core.c
>>> @@ -6250,6 +6250,8 @@ SYSCALL_DEFINE5(perf_event_open,
>>>  		}
>>>  	}
>>>
>>> +	get_online_cpus();
>>
>> Why is this protection against the cpu hotplug operation needed? Is it because the PMU
>> can now change event->cpu during event initialization (specific to uncore PMU events),
>> or has this protection always been required for normal on-cpu HW PMU events as well
>> and is only being added now?
>>
> I think it has always been required, because when creating a perf event, 'cpu online'
> is checked by find_get_context, but the cpu can go offline after find_get_context
> returns.

Agreed. So the get_online_cpus()/put_online_cpus() pair here solves an existing
problem. Could you please add a statement explaining this to the patch description?
Thank you.


> Regards
> Yan, Zheng
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/9] perf: Generic intel uncore support
  2012-05-03 17:12   ` Peter Zijlstra
  2012-05-04  7:33     ` Yan, Zheng
@ 2012-05-10  7:34     ` Yan, Zheng
  2012-05-10 10:05       ` Peter Zijlstra
  1 sibling, 1 reply; 38+ messages in thread
From: Yan, Zheng @ 2012-05-10  7:34 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, andi, eranian, jolsa, ming.m.lin, linux-kernel

On 05/04/2012 01:12 AM, Peter Zijlstra wrote:
> On Wed, 2012-05-02 at 10:07 +0800, Yan, Zheng wrote:
>> +static struct intel_uncore_box *
>> +__uncore_pmu_find_box(struct intel_uncore_pmu *pmu, int phyid)
>> +{
>> +       struct intel_uncore_box *box;
>> +       struct hlist_head *head;
>> +       struct hlist_node *node;
>> +
>> +       head = &pmu->box_hash[phyid % UNCORE_BOX_HASH_SIZE];
>> +       hlist_for_each_entry_rcu(box, node, head, hlist) {
>> +               if (box->phy_id == phyid)
>> +                       return box;
>> +       }
>> +
>> +       return NULL;
>> +} 
> 
> I still don't get why something like:
> 
> static struct intel_uncore_box *
> pmu_to_box(struct intel_uncore_pmu *pmu, int cpu)
> {
> 	return per_cpu_ptr(pmu->box, cpu);
> }
> 
> doesn't work.
> 
> Last time you mumbled something about PCI devices, but afaict those are
> in all respects identical to MSR devices except you talk to them using
> PCI-mmio instead of MSR registers.
> 
> In fact, since its all local to the generic code there's nothing
> different between pci/msr already.
> 
> So how about something like this:
> 
> ---
>  Makefile                  |    4 +-
>  perf_event_intel_uncore.c |   92 ++++++++++++++++++----------------------------
>  perf_event_intel_uncore.h |    4 +-
>  3 files changed, 42 insertions(+), 58 deletions(-)
> 
> --- a/arch/x86/kernel/cpu/Makefile
> +++ b/arch/x86/kernel/cpu/Makefile
> @@ -32,7 +32,9 @@ obj-$(CONFIG_PERF_EVENTS)		+= perf_event
>  
>  ifdef CONFIG_PERF_EVENTS
>  obj-$(CONFIG_CPU_SUP_AMD)		+= perf_event_amd.o
> -obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_p6.o perf_event_p4.o perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o perf_event_intel_uncore.o
> +obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_p6.o perf_event_p4.o 
> +obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o 
> +obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_uncore.o
>  endif
>  
>  obj-$(CONFIG_X86_MCE)			+= mcheck/
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> @@ -116,40 +116,21 @@ struct intel_uncore_box *uncore_alloc_bo
>  }
>  
>  static struct intel_uncore_box *
> -__uncore_pmu_find_box(struct intel_uncore_pmu *pmu, int phyid)
> +uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu)
>  {
> -	struct intel_uncore_box *box;
> -	struct hlist_head *head;
> -	struct hlist_node *node;
> -
> -	head = &pmu->box_hash[phyid % UNCORE_BOX_HASH_SIZE];
> -	hlist_for_each_entry_rcu(box, node, head, hlist) {
> -		if (box->phy_id == phyid)
> -			return box;
> -	}
> -
> -	return NULL;
> -}
> -
> -static struct intel_uncore_box *
> -uncore_pmu_find_box(struct intel_uncore_pmu *pmu, int phyid)
> -{
> -	struct intel_uncore_box *box;
> -
> -	rcu_read_lock();
> -	box = __uncore_pmu_find_box(pmu, phyid);
> -	rcu_read_unlock();
> -
> -	return box;
> +	return *per_cpu_ptr(pmu->box, cpu);
>  }
>  
>  static void uncore_pmu_add_box(struct intel_uncore_pmu *pmu,
>  				struct intel_uncore_box *box)
>  {
> -	struct hlist_head *head;
> +	int cpu;
>  
> -	head = &pmu->box_hash[box->phy_id % UNCORE_BOX_HASH_SIZE];
> -	hlist_add_head_rcu(&box->hlist, head);
> +	for_each_possible_cpu(cpu) {
> +		if (box->phys_id != topology_physical_package_id(cpu))
> +			continue;
> +		*per_cpu_ptr(pmu->box, cpu) = box;
> +	}
>  }

This code doesn't work for a PCI uncore device if there are offline CPUs,
because topology_physical_package_id() always returns 0 for offline CPUs.
So besides the per-CPU variable, we would still need another data structure
to track the uncore boxes. Do you still want to use a per-CPU variable?

Regards
Yan, Zheng


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 9/9] perf tool: Add pmu event alias support
  2012-05-07  8:34         ` Yan, Zheng
@ 2012-05-10  9:52           ` Jiri Olsa
  0 siblings, 0 replies; 38+ messages in thread
From: Jiri Olsa @ 2012-05-10  9:52 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: Peter Zijlstra, mingo, andi, eranian, ming.m.lin, linux-kernel

On Mon, May 07, 2012 at 04:34:12PM +0800, Yan, Zheng wrote:
> On 05/04/2012 04:05 AM, Jiri Olsa wrote:
> > On Thu, May 03, 2012 at 01:24:21PM +0200, Peter Zijlstra wrote:
> >> On Thu, 2012-05-03 at 12:56 +0200, Jiri Olsa wrote:
> >>> - in sysfs you would have directory with aliases (now called 'events')

SNIP

> 
> How is the patch below, it implements option 2.
hi,
sorry for the late reply and the long email ;) comments below

jirka

> 
> Thanks
> ---
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index c587ae8..4ed4278 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -656,22 +656,33 @@ int parse_events_add_pmu(struct list_head *list, int *idx,
>  	struct perf_event_attr attr;
>  	struct perf_pmu *pmu;
>  
> +	/* called by parse_event_config? */
> +	if (!idx) {
> +		list_splice_init(head_config, list);
> +		return 0;
> +	}

ok, so this is because we want to use the current event parser to parse the alias terms

options we have globally ;) AFAICS:

1) the one you have
   - in case the idx pointer is NULL, we use the event list to store
     terms
   - I guess it works ;) but it seems to me like we might have trouble
     expanding this code further in the future..

   if we want to have it this way, we'd better use some new parse_events
   function argument to define what the parser returns:
       - return events
       - return terms

2) I'm currently looking at having multiple starting points in the
   grammar. It seems that in some cases this is a working option:

   http://www.gnu.org/software/bison/manual/html_node/Multiple-start_002dsymbols.html

   I'll update you later on this one ;) (see the sketch after this list)


3) having a separate parser for terms parsing, which would be called
   from pmu initialization and during event parsing
   - this seems clean and doable but it smells of -ETOOMANYPARSERS ;)

4) having the alias definitions defined within the tree structure
   I described earlier:
        events/
                CAS_COUNT_RD/
                        # files:
                        config  - value 1
                        config1 - value 2
                        mask    - value ...

      - initially I thought the current sysfs alias format would clash
        with sysfs rules.. but after talking to Peter ;) I think it's ok,
        because it's still <one file - one value>
      - but using this, we would not need a special parser

- ad 4) now seems like avoiding the problem
- ad 1) hackish
- ad 3) seems the cleanest and most extensible in the future
- ad 2) still needs exploring

I think we should ask sysfs folks to confirm the sysfs layout.. :)
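
For reference, the multiple-start-symbol trick from the bison manual boils
down to a synthetic start token (a sketch; the token and function names are
hypothetical, not the actual perf grammar):

%token PE_START_EVENTS PE_START_TERMS

%%

start:
  PE_START_EVENTS events
| PE_START_TERMS  terms
;

with the lexer handing out one caller-chosen start token before the real input:

/* sketch: the caller sets 'start_token' before invoking the parser */
static int start_token;

int parse_events_lex(void)
{
	if (start_token) {
		int token = start_token;

		start_token = 0;
		return token;
	}
	return parse_events_lex_real();
}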

> +
>  	pmu = perf_pmu__find(name);
>  	if (!pmu)
>  		return -EINVAL;
>  
>  	memset(&attr, 0, sizeof(attr));
>  
> -	/*
> -	 * Configure hardcoded terms first, no need to check
> -	 * return value when called with fail == 0 ;)
> -	 */
> -	config_attr(&attr, head_config, 0);
> +	while (1) {
> +		/*
> +		 * Configure hardcoded terms first, no need to check
> +		 * return value when called with fail == 0 ;)
> +		 */
> +		config_attr(&attr, head_config, 0);
>  
> -	if (perf_pmu__config(pmu, &attr, head_config))
> -		return -EINVAL;
> +		if (!perf_pmu__config(pmu, &attr, head_config))
> +			return add_event(list, idx, &attr, (char *) "pmu");
>  
> -	return add_event(list, idx, &attr, (char *) "pmu");
> +		head_config = perf_pmu__alias(pmu, head_config);
> +		if (!head_config)
> +			break;
> +	}
> +	return -EINVAL;
The perf_pmu__alias function checks the first term in the list for the alias,
but what if the alias is the second term? I think we could be more general,
like:

parse_events_add_pmu {
	...

	config_attr(&attr, head_config, 0);

	config_alias(pmu, head_config);

	if (perf_pmu__config(pmu, &attr, head_config))
		return -EINVAL;

	return 0;
}

config_alias(pmu, head) {
	for each term in head {
		if (is term alias in pmu) {
			replace the alias term with its term definition (multiple terms probably)
		}
	}
}

so we just replace the alias term with its definition terms
and call perf_pmu__config with the final terms

this way:
- we could add more terms on the command line in addition to the alias
- the alias does not need to be the first term specified on the command line
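
A slightly more concrete version of that pseudocode (a sketch only; it assumes
the perf_pmu__alias list from the patch above, and clone_terms() is an invented
helper that inserts a copy of a term list in front of a given node):

static void config_alias(struct perf_pmu *pmu, struct list_head *head)
{
	struct parse_events__term *term, *tmp;
	struct perf_pmu__alias *alias;

	list_for_each_entry_safe(term, tmp, head, list) {
		alias = pmu_find_alias(&pmu->aliases, term->config);
		if (!alias)
			continue;
		/* splice a copy of the alias definition where the alias stood */
		clone_terms(&alias->terms, &term->list);
		list_del(&term->list);
		free(term);
	}
}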


>  }
>  
>  void parse_events_update_lists(struct list_head *list_event,
> @@ -771,6 +782,26 @@ static int __parse_events(const char *str, int *idx, struct list_head *list)
>  	return ret;
>  }
>  
> +/*
> + * parse event config string, return a list of event terms.
> + */
> +int parse_event_config(struct list_head *terms, const char *str)
> +{
> +	char *buf;
> +	int ret;
> +
> +	buf = malloc(strlen(str) + 6);
> +	if (!buf)
> +		return -ENOMEM;
> +
> +	/* It does not matter which pmu is used here */
> +	sprintf(buf, "cpu/%s/", str);
> +	ret = __parse_events(buf, NULL, terms);
> +
> +	free(buf);
> +	return ret;
> +}
> +
>  int parse_events(struct perf_evlist *evlist, const char *str, int unset __used)
>  {
>  	LIST_HEAD(list);
> diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
> index e1ffeb7..5b5d698 100644
> --- a/tools/perf/util/parse-events.h
> +++ b/tools/perf/util/parse-events.h
> @@ -30,6 +30,7 @@ extern int parse_events_option(const struct option *opt, const char *str,
>  extern int parse_events(struct perf_evlist *evlist, const char *str,
>  			int unset);
>  extern int parse_filter(const struct option *opt, const char *str, int unset);
> +extern int parse_event_config(struct list_head *terms, const char *str);
>  
>  #define EVENTS_HELP_MAX (128*1024)
>  
> diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
> index 52082a7..8a26f3d 100644
> --- a/tools/perf/util/parse-events.y
> +++ b/tools/perf/util/parse-events.y
> @@ -197,7 +197,7 @@ PE_NAME
>  {
>  	struct parse_events__term *term;
>  
> -	ABORT_ON(parse_events__new_term(&term, PARSE_EVENTS__TERM_TYPE_NUM,
> +	ABORT_ON(parse_events__new_term(&term, PARSE_EVENTS__TERM_TYPE_STR,
>  		 $1, NULL, 1));
>  	$$ = term;
>  }
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index cb08a11..7a85779 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -80,6 +80,95 @@ static int pmu_format(char *name, struct list_head *format)
>  	return 0;
>  }
>  
> +static int perf_pmu__new_alias(struct list_head *list, char *name, FILE *file)
> +{
> +	struct perf_pmu__alias *alias;
> +	char buf[256];
> +	int ret;
> +
> +	ret = fread(buf, 1, sizeof(buf) - 1, file);
> +	if (ret == 0)
> +		return -EINVAL;
> +	buf[ret] = 0;
> +
> +	alias = malloc(sizeof(*alias));
> +	if (!alias)
> +		return -ENOMEM;
> +
> +	INIT_LIST_HEAD(&alias->terms);
> +	ret = parse_event_config(&alias->terms, buf);
> +	if (ret) {
> +		free(alias);
> +		return ret;
> +	}
> +
> +	alias->name = strdup(name);
> +	list_add_tail(&alias->list, list);
> +	return 0;
> +}
> +
> +/*
> + * Process all the sysfs attributes located under the directory
> + * specified in the 'dir' parameter.
> + */
> +static int pmu_aliases_parse(char *dir, struct list_head *head)
> +{
> +	struct dirent *evt_ent;
> +	DIR *event_dir;
> +	int ret = 0;
> +
> +	event_dir = opendir(dir);
> +	if (!event_dir)
> +		return -EINVAL;
> +
> +	while (!ret && (evt_ent = readdir(event_dir))) {
> +		char path[PATH_MAX];
> +		char *name = evt_ent->d_name;
> +		FILE *file;
> +
> +		if (!strcmp(name, ".") || !strcmp(name, ".."))
> +			continue;
> +
> +		snprintf(path, PATH_MAX, "%s/%s", dir, name);
> +
> +		ret = -EINVAL;
> +		file = fopen(path, "r");
> +		if (!file)
> +			break;
> +		ret = perf_pmu__new_alias(head, name, file);
> +		fclose(file);
> +	}
> +
> +	closedir(event_dir);
> +	return ret;
> +}
> +
> +/*
> + * Read the pmu event alias definitions, which should be located at:
> + * /sys/bus/event_source/devices/<dev>/events as sysfs group attributes.
> + */
> +static int pmu_aliases(char *name, struct list_head *head)
> +{
> +	struct stat st;
> +	char path[PATH_MAX];
> +	const char *sysfs;
> +
> +	sysfs = sysfs_find_mountpoint();
> +	if (!sysfs)
> +		return -1;
> +
> +	snprintf(path, PATH_MAX,
> +		 "%s/bus/event_source/devices/%s/events", sysfs, name);
> +
> +	if (stat(path, &st) < 0)
> +		return -1;
> +
> +	if (pmu_aliases_parse(path, head))
> +		return -1;
> +
> +	return 0;
> +}
> +
>  /*
>   * Reading/parsing the default pmu type value, which should be
>   * located at:
> @@ -118,6 +207,7 @@ static struct perf_pmu *pmu_lookup(char *name)
>  {
>  	struct perf_pmu *pmu;
>  	LIST_HEAD(format);
> +	LIST_HEAD(aliases);
>  	__u32 type;
>  
>  	/*
> @@ -135,8 +225,12 @@ static struct perf_pmu *pmu_lookup(char *name)
>  	if (!pmu)
>  		return NULL;
>  
> +	pmu_aliases(name, &aliases);
> +
>  	INIT_LIST_HEAD(&pmu->format);
> +	INIT_LIST_HEAD(&pmu->aliases);
>  	list_splice(&format, &pmu->format);
> +	list_splice(&aliases, &pmu->aliases);
>  	pmu->name = strdup(name);
>  	pmu->type = type;
>  	return pmu;
> @@ -262,6 +356,37 @@ static int pmu_config(struct list_head *formats, struct perf_event_attr *attr,
>  	return 0;
>  }
>  
> +static struct perf_pmu__alias *pmu_find_alias(struct list_head *events,
> +					      char *name)
> +{
> +	struct perf_pmu__alias *alias;
> +
> +	list_for_each_entry(alias, events, list) {
> +		if (!strcmp(alias->name, name))
> +			return alias;
> +	}
> +	return NULL;
> +}
> +
> +struct list_head *perf_pmu__alias(struct perf_pmu *pmu,
> +				  struct list_head *head_terms)
> +{
> +	struct parse_events__term *term;
> +	struct perf_pmu__alias *alias;
> +
> +	if (!list_is_singular(head_terms))
> +		return NULL;
> +
> +	term = list_entry(head_terms->next, struct parse_events__term, list);
> +	if (term->type != PARSE_EVENTS__TERM_TYPE_STR || term->val.str)
> +		return NULL;
> +
> +	alias = pmu_find_alias(&pmu->aliases, term->config);
> +	if (!alias)
> +		return NULL;
> +	return &alias->terms;
> +}
> +
>  /*
>   * Configures event's 'attr' parameter based on the:
>   * 1) users input - specified in terms parameter
> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
> index 68c0db9..8fad317 100644
> --- a/tools/perf/util/pmu.h
> +++ b/tools/perf/util/pmu.h
> @@ -19,17 +19,25 @@ struct perf_pmu__format {
>  	struct list_head list;
>  };
>  
> +struct perf_pmu__alias {
> +	char *name;
> +	struct list_head terms;
> +	struct list_head list;
> +};
> +
>  struct perf_pmu {
>  	char *name;
>  	__u32 type;
>  	struct list_head format;
> +	struct list_head aliases;
>  	struct list_head list;
>  };
>  
>  struct perf_pmu *perf_pmu__find(char *name);
>  int perf_pmu__config(struct perf_pmu *pmu, struct perf_event_attr *attr,
>  		     struct list_head *head_terms);
> -
> +struct list_head *perf_pmu__alias(struct perf_pmu *pmu,
> +				struct list_head *head_terms);
>  int perf_pmu_wrap(void);
>  void perf_pmu_error(struct list_head *list, char *name, char const *msg);
>  

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/9] perf: Generic intel uncore support
  2012-05-10  7:34     ` Yan, Zheng
@ 2012-05-10 10:05       ` Peter Zijlstra
  2012-05-11  1:54         ` Yan, Zheng
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2012-05-10 10:05 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: mingo, andi, eranian, jolsa, ming.m.lin, linux-kernel, hpa

On Thu, 2012-05-10 at 15:34 +0800, Yan, Zheng wrote:
> 
> This code doesn't work for PCI uncore devices if there are offline CPUs,
> because topology_physical_package_id() always returns 0 for offline CPUs.

Hmm, that sounds wrong.. one would expect something like BAD_APICID, -1
or the correct number. hpa should we fix that?

Anyway,

> So besides the per-CPU variable, we still need another data structure
> to track the uncore boxes. Do you still want to use a per-CPU variable?

Well you don't really need the value for offline CPUs, until they're
online right? So all you need to do is iterate all the box muck in a
hotplug handler and set the right pointer, no? (I suspect this would
need to be done in CPU_STARTING).
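
Something like the below, perhaps (sketch only -- the per-cpu slot
layout and MAX_UNCORE_PMUS are made up for illustration; msr_uncores,
uncore_pmu_find_box and topology_physical_package_id are as used in
your patch):

static DEFINE_PER_CPU(struct intel_uncore_box *,
		      uncore_box[MAX_UNCORE_PMUS]);

static void __cpuinit uncore_cpu_starting(int cpu)
{
	struct intel_uncore_type *type;
	struct intel_uncore_box *box;
	int i, j, idx = 0;
	int phyid = topology_physical_package_id(cpu);

	for (i = 0; msr_uncores[i]; i++) {
		type = msr_uncores[i];
		for (j = 0; j < type->num_boxes; j++) {
			/* may be NULL until the box is allocated */
			box = uncore_pmu_find_box(&type->pmus[j], phyid);
			per_cpu(uncore_box, cpu)[idx++] = box;
		}
	}
}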

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/9] perf: Allow pmu to choose cpu on which to install event
  2012-05-10  3:41       ` Anshuman Khandual
@ 2012-05-10 10:56         ` Peter Zijlstra
  0 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2012-05-10 10:56 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Yan, Zheng, mingo, andi, eranian, jolsa, ming.m.lin, linux-kernel

On Thu, 2012-05-10 at 09:11 +0530, Anshuman Khandual wrote:
> 
> Agreed. So here the get_online_cpus()/put_online_cpus() pair solves an
> existing problem. Could you please add a statement explaining this to
> the patch description? Thank you.

No, make it a separate patch. If a patch does two separate things its
doing it wrong ;-)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/9] perf: Generic intel uncore support
  2012-05-10 10:05       ` Peter Zijlstra
@ 2012-05-11  1:54         ` Yan, Zheng
  0 siblings, 0 replies; 38+ messages in thread
From: Yan, Zheng @ 2012-05-11  1:54 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, andi, eranian, jolsa, ming.m.lin, linux-kernel, hpa

On 05/10/2012 06:05 PM, Peter Zijlstra wrote:
> On Thu, 2012-05-10 at 15:34 +0800, Yan, Zheng wrote:
>>
>> This code doesn't work for PCI uncore devices if there are offline CPUs,
>> because topology_physical_package_id() always returns 0 for offline CPUs.
> 
> Hmm, that sounds wrong.. one would expect something like BAD_APICID, -1
> or the correct number. hpa should we fix that?
> 
> Anyway,
> 
>> So besides the per-CPU variable, we still need another data structure
>> to track the uncore boxes. Do you still want to use a per-CPU variable?
> 
> Well you don't really need the value for offline CPUs, until they're
> online right? 
Yes

> So all you need to do is iterate all the box muck in a
> hotplug handler and set the right pointer, no? (I suspect this would
> need to be done in CPU_STARTING).
The problem is that the CPUs on a particular socket can all be offline,
so we need an extra data structure to track the uncore boxes. This
means the memory addresses of the uncore boxes are stored in two
places, and it's a little tricky to keep the two places in sync. For
example, in the CPU hotplug case, the per-CPU pointers can only be set
after the PCI uncore devices are probed. How do we handle the case
where the CPU hotplug handler runs before the PCI uncore devices are
probed? For the reasons listed above I still prefer not to use per-CPU
pointers.
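
To make it concrete, the extra data structure is basically one list of
boxes per pmu, keyed by the physical package id. The lookup can be as
simple as this sketch (the box_list/list field names are assumed here;
the actual patch code may differ):

static struct intel_uncore_box *
uncore_pmu_find_box(struct intel_uncore_pmu *pmu, int phyid)
{
	struct intel_uncore_box *box;

	list_for_each_entry(box, &pmu->box_list, list) {
		if (box->phy_id == phyid)
			return box;
	}
	return NULL;
}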

Regards
Yan, Zheng

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/9] perf: Generic intel uncore support
  2012-05-02  2:07 ` [PATCH 4/9] perf: Generic intel uncore support Yan, Zheng
  2012-05-03 17:12   ` Peter Zijlstra
  2012-05-03 21:49   ` Peter Zijlstra
@ 2012-05-11  6:31   ` Anshuman Khandual
  2012-05-11  6:41     ` Yan, Zheng
  2 siblings, 1 reply; 38+ messages in thread
From: Anshuman Khandual @ 2012-05-11  6:31 UTC (permalink / raw)
  To: Yan, Zheng
  Cc: a.p.zijlstra, mingo, andi, eranian, jolsa, ming.m.lin, linux-kernel

On Wednesday 02 May 2012 07:37 AM, Yan, Zheng wrote:

> +static int __cpuinit uncore_cpu_prepare(int cpu)
> +{
> +	struct intel_uncore_type *type;
> +	struct intel_uncore_pmu *pmu;
> +	struct intel_uncore_box *exist, *box;
> +	int i, j, phyid;
> +
> +	phyid = topology_physical_package_id(cpu);
> +
> +	/* allocate the box data structure */
> +	for (i = 0; msr_uncores[i]; i++) {
> +		type = msr_uncores[i];
> +		for (j = 0; j < type->num_boxes; j++) {
> +			exist = NULL;
> +			pmu = &type->pmus[j];
> +
> +			if (pmu->func_id < 0)
> +				pmu->func_id = j;
> +			exist = uncore_pmu_find_box(pmu, phyid);
> +			if (exist)
> +				exist->refcnt++;
> +			if (exist)
> +				continue;



Maybe a redundant condition check here ^ ?

> +
> +			box = uncore_alloc_box(cpu);
> +			if (!box)
> +				return -ENOMEM;
> +
> +			box->pmu = pmu;
> +			box->phy_id = phyid;
> +			uncore_pmu_add_box(pmu, box);
> +		}
> +	}
> +	return 0;



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/9] perf: Generic intel uncore support
  2012-05-11  6:31   ` Anshuman Khandual
@ 2012-05-11  6:41     ` Yan, Zheng
  0 siblings, 0 replies; 38+ messages in thread
From: Yan, Zheng @ 2012-05-11  6:41 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: a.p.zijlstra, mingo, andi, eranian, jolsa, ming.m.lin, linux-kernel

On 05/11/2012 02:31 PM, Anshuman Khandual wrote:
> On Wednesday 02 May 2012 07:37 AM, Yan, Zheng wrote:
> 
>> +static int __cpuinit uncore_cpu_prepare(int cpu)
>> +{
>> +	struct intel_uncore_type *type;
>> +	struct intel_uncore_pmu *pmu;
>> +	struct intel_uncore_box *exist, *box;
>> +	int i, j, phyid;
>> +
>> +	phyid = topology_physical_package_id(cpu);
>> +
>> +	/* allocate the box data structure */
>> +	for (i = 0; msr_uncores[i]; i++) {
>> +		type = msr_uncores[i];
>> +		for (j = 0; j < type->num_boxes; j++) {
>> +			exist = NULL;
>> +			pmu = &type->pmus[j];
>> +
>> +			if (pmu->func_id < 0)
>> +				pmu->func_id = j;
>> +			exist = uncore_pmu_find_box(pmu, phyid);
>> +			if (exist)
>> +				exist->refcnt++;
>> +			if (exist)
>> +				continue;
> 
> 
> 
> Maybe a redundant condition check here ^ ?

Yes, it's redundant. I will remove it in later patches.
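
i.e. something like this, collapsing the two checks into one (the
exist = NULL initialization then becomes unnecessary as well):

			exist = uncore_pmu_find_box(pmu, phyid);
			if (exist) {
				exist->refcnt++;
				continue;
			}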

Regards
Yan, Zheng
> 
>> +
>> +			box = uncore_alloc_box(cpu);
>> +			if (!box)
>> +				return -ENOMEM;
>> +
>> +			box->pmu = pmu;
>> +			box->phy_id = phyid;
>> +			uncore_pmu_add_box(pmu, box);
>> +		}
>> +	}
>> +	return 0;
> 
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2012-05-11  6:42 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-02  2:07 [PATCH V3 0/9] perf: Intel uncore pmu counting support Yan, Zheng
2012-05-02  2:07 ` [PATCH 1/9] perf: Export perf_assign_events Yan, Zheng
2012-05-02  2:07 ` [PATCH 2/9] perf: Allow pmu to choose cpu on which to install event Yan, Zheng
2012-05-09  6:38   ` Anshuman Khandual
2012-05-10  1:09     ` Yan, Zheng
2012-05-10  3:41       ` Anshuman Khandual
2012-05-10 10:56         ` Peter Zijlstra
2012-05-02  2:07 ` [PATCH 3/9] perf: Introduce perf_pmu_migrate_context Yan, Zheng
2012-05-02  2:07 ` [PATCH 4/9] perf: Generic intel uncore support Yan, Zheng
2012-05-03 17:12   ` Peter Zijlstra
2012-05-04  7:33     ` Yan, Zheng
2012-05-04 17:57       ` Peter Zijlstra
2012-05-10  7:34     ` Yan, Zheng
2012-05-10 10:05       ` Peter Zijlstra
2012-05-11  1:54         ` Yan, Zheng
2012-05-03 21:49   ` Peter Zijlstra
2012-05-11  6:31   ` Anshuman Khandual
2012-05-11  6:41     ` Yan, Zheng
2012-05-02  2:07 ` [PATCH 5/9] perf: Add Nehalem and Sandy Bridge " Yan, Zheng
2012-05-03 21:04   ` Peter Zijlstra
2012-05-04  5:47     ` Yan, Zheng
2012-05-03 21:04   ` Peter Zijlstra
2012-05-02  2:07 ` [PATCH 6/9] perf: Generic pci uncore device support Yan, Zheng
2012-05-03 21:37   ` Peter Zijlstra
2012-05-03 21:39   ` Peter Zijlstra
2012-05-03 21:46   ` Peter Zijlstra
2012-05-04  6:07     ` Yan, Zheng
2012-05-02  2:07 ` [PATCH 7/9] perf: Add Sandy Bridge-EP uncore support Yan, Zheng
2012-05-03 21:12   ` Peter Zijlstra
2012-05-02  2:07 ` [PATCH 8/9] perf tool: Make the event parser reentrantable Yan, Zheng
2012-05-02  2:07 ` [PATCH 9/9] perf tool: Add pmu event alias support Yan, Zheng
2012-05-03 10:56   ` Jiri Olsa
2012-05-03 11:24     ` Peter Zijlstra
2012-05-03 20:05       ` Jiri Olsa
2012-05-04 12:32         ` Yan, Zheng
2012-05-07  8:34         ` Yan, Zheng
2012-05-10  9:52           ` Jiri Olsa
2012-05-07 17:14         ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).