* [RFC][PATCH 0/8] perf pmu interface
@ 2010-06-16 16:00 Peter Zijlstra
  2010-06-16 16:00 ` [RFC][PATCH 1/8] perf, x86: Fix Nehalem PMU quirk Peter Zijlstra
                   ` (8 more replies)
  0 siblings, 9 replies; 32+ messages in thread
From: Peter Zijlstra @ 2010-06-16 16:00 UTC (permalink / raw)
  To: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller
  Cc: linux-kernel, Peter Zijlstra

These patches prepare the perf code for multiple PMUs (no user
interface yet, Lin Ming is working on that). They remove all the
weak functions and rework the struct pmu interface.

The latter is inspired by the work Frederic is doing to filter out
IRQ contexts.

These patches are very preliminary: they haven't seen a compiler yet,
and the last patch still needs sparc, ppc and arm converted.

But the patches seem to be in good enough shape to see what people
think.
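
For reference, a condensed sketch of what the interface looks like
after this series (perf_pmu_register() and the -ENOENT convention come
from patch 3; "example_pmu" and its callbacks are made up purely for
illustration):

#include <linux/init.h>
#include <linux/perf_event.h>

static int  example_pmu_enable(struct perf_event *event)	{ return 0; }
static void example_pmu_disable(struct perf_event *event)	{ }
static void example_pmu_read(struct perf_event *event)		{ }

static int example_pmu_event_init(struct perf_event *event)
{
	/* Claim only the event types this PMU implements ... */
	if (event->attr.type != PERF_TYPE_RAW)
		return -ENOENT;	/* ... so the core tries the next pmu. */

	/* validate event->attr, set event->destroy, etc. */
	return 0;
}

static struct pmu example_pmu = {
	.event_init	= example_pmu_event_init,
	.enable		= example_pmu_enable,
	.disable	= example_pmu_disable,
	.read		= example_pmu_read,
};

static int __init example_pmu_register(void)
{
	return perf_pmu_register(&example_pmu);
}
early_initcall(example_pmu_register);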

comments?



* [RFC][PATCH 1/8] perf, x86: Fix Nehalem PMU quirk
  2010-06-16 16:00 [RFC][PATCH 0/8] perf pmu interface Peter Zijlstra
@ 2010-06-16 16:00 ` Peter Zijlstra
  2010-06-16 16:00 ` [RFC][PATCH 2/8] perf: deconstify struct pmu Peter Zijlstra
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2010-06-16 16:00 UTC (permalink / raw)
  To: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: perf-x86-fix-nehalem-quirk.patch --]
[-- Type: text/plain, Size: 748 bytes --]

The idea was to run all three counters, which means we need to program
0x7 into the enable mask, not 0x3 (which covers only two counters).
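
To spell out the bit arithmetic (illustrative only, not part of the
patch): bit n of MSR_CORE_PERF_GLOBAL_CTRL enables general-purpose
counter n, so enabling counters 0, 1 and 2 needs:

	u64 ctrl = (1ULL << 0) | (1ULL << 1) | (1ULL << 2);	/* == 0x7 */
	/* the old value, 0x3, covers only bits 0 and 1, leaving counter 2 off */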

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
@@ -504,7 +504,7 @@ static void intel_pmu_nhm_enable_all(int
 		wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
 		wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
 
-		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
+		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
 		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
 
 		for (i = 0; i < 3; i++) {




* [RFC][PATCH 2/8] perf: deconstify struct pmu
  2010-06-16 16:00 [RFC][PATCH 0/8] perf pmu interface Peter Zijlstra
  2010-06-16 16:00 ` [RFC][PATCH 1/8] perf, x86: Fix Nehalem PMU quirk Peter Zijlstra
@ 2010-06-16 16:00 ` Peter Zijlstra
  2010-06-16 16:00 ` [RFC][PATCH 3/8] perf: register pmu implementations Peter Zijlstra
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2010-06-16 16:00 UTC (permalink / raw)
  To: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: perf-deconst-pmu.patch --]
[-- Type: text/plain, Size: 12375 bytes --]

sed -ie 's/const struct pmu\>/struct pmu/g' `git grep -l "const struct pmu\>"`

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/arm/kernel/perf_event.c             |    2 +-
 arch/powerpc/kernel/perf_event.c         |    8 ++++----
 arch/powerpc/kernel/perf_event_fsl_emb.c |    2 +-
 arch/sh/kernel/perf_event.c              |    4 ++--
 arch/sparc/kernel/perf_event.c           |   10 +++++-----
 arch/x86/kernel/cpu/perf_event.c         |   14 +++++++-------
 include/linux/perf_event.h               |   10 +++++-----
 kernel/perf_event.c                      |   26 +++++++++++++-------------
 8 files changed, 38 insertions(+), 38 deletions(-)

Index: linux-2.6/arch/arm/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/arm/kernel/perf_event.c
+++ linux-2.6/arch/arm/kernel/perf_event.c
@@ -491,7 +491,7 @@ __hw_perf_event_init(struct perf_event *
 	return err;
 }
 
-const struct pmu *
+struct pmu *
 hw_perf_event_init(struct perf_event *event)
 {
 	int err = 0;
Index: linux-2.6/arch/powerpc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/perf_event.c
+++ linux-2.6/arch/powerpc/kernel/perf_event.c
@@ -854,7 +854,7 @@ static void power_pmu_unthrottle(struct 
  * Set the flag to make pmu::enable() not perform the
  * schedulability test, it will be performed at commit time
  */
-void power_pmu_start_txn(const struct pmu *pmu)
+void power_pmu_start_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
@@ -867,7 +867,7 @@ void power_pmu_start_txn(const struct pm
  * Clear the flag and pmu::enable() will perform the
  * schedulability test.
  */
-void power_pmu_cancel_txn(const struct pmu *pmu)
+void power_pmu_cancel_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
@@ -879,7 +879,7 @@ void power_pmu_cancel_txn(const struct p
  * Perform the group schedulability test as a whole
  * Return 0 if success
  */
-int power_pmu_commit_txn(const struct pmu *pmu)
+int power_pmu_commit_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw;
 	long i, n;
@@ -1011,7 +1011,7 @@ static int hw_perf_cache_event(u64 confi
 	return 0;
 }
 
-const struct pmu *hw_perf_event_init(struct perf_event *event)
+struct pmu *hw_perf_event_init(struct perf_event *event)
 {
 	u64 ev;
 	unsigned long flags;
Index: linux-2.6/arch/powerpc/kernel/perf_event_fsl_emb.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/perf_event_fsl_emb.c
+++ linux-2.6/arch/powerpc/kernel/perf_event_fsl_emb.c
@@ -428,7 +428,7 @@ static int hw_perf_cache_event(u64 confi
 	return 0;
 }
 
-const struct pmu *hw_perf_event_init(struct perf_event *event)
+struct pmu *hw_perf_event_init(struct perf_event *event)
 {
 	u64 ev;
 	struct perf_event *events[MAX_HWEVENTS];
Index: linux-2.6/arch/sh/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/sh/kernel/perf_event.c
+++ linux-2.6/arch/sh/kernel/perf_event.c
@@ -257,13 +257,13 @@ static void sh_pmu_read(struct perf_even
 	sh_perf_event_update(event, &event->hw, event->hw.idx);
 }
 
-static const struct pmu pmu = {
+static struct pmu pmu = {
 	.enable		= sh_pmu_enable,
 	.disable	= sh_pmu_disable,
 	.read		= sh_pmu_read,
 };
 
-const struct pmu *hw_perf_event_init(struct perf_event *event)
+struct pmu *hw_perf_event_init(struct perf_event *event)
 {
 	int err = __hw_perf_event_init(event);
 	if (unlikely(err)) {
Index: linux-2.6/arch/sparc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/perf_event.c
+++ linux-2.6/arch/sparc/kernel/perf_event.c
@@ -1098,7 +1098,7 @@ static int __hw_perf_event_init(struct p
  * Set the flag to make pmu::enable() not perform the
  * schedulability test, it will be performed at commit time
  */
-static void sparc_pmu_start_txn(const struct pmu *pmu)
+static void sparc_pmu_start_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
@@ -1110,7 +1110,7 @@ static void sparc_pmu_start_txn(const st
  * Clear the flag and pmu::enable() will perform the
  * schedulability test.
  */
-static void sparc_pmu_cancel_txn(const struct pmu *pmu)
+static void sparc_pmu_cancel_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
@@ -1122,7 +1122,7 @@ static void sparc_pmu_cancel_txn(const s
  * Perform the group schedulability test as a whole
  * Return 0 if success
  */
-static int sparc_pmu_commit_txn(const struct pmu *pmu)
+static int sparc_pmu_commit_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	int n;
@@ -1141,7 +1141,7 @@ static int sparc_pmu_commit_txn(const st
 	return 0;
 }
 
-static const struct pmu pmu = {
+static struct pmu pmu = {
 	.enable		= sparc_pmu_enable,
 	.disable	= sparc_pmu_disable,
 	.read		= sparc_pmu_read,
@@ -1151,7 +1151,7 @@ static const struct pmu pmu = {
 	.commit_txn	= sparc_pmu_commit_txn,
 };
 
-const struct pmu *hw_perf_event_init(struct perf_event *event)
+struct pmu *hw_perf_event_init(struct perf_event *event)
 {
 	int err = __hw_perf_event_init(event);
 
Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -618,7 +618,7 @@ static void x86_pmu_enable_all(int added
 	}
 }
 
-static const struct pmu pmu;
+static struct pmu pmu;
 
 static inline int is_x86_event(struct perf_event *event)
 {
@@ -1394,7 +1394,7 @@ static inline void x86_pmu_read(struct p
  * Set the flag to make pmu::enable() not perform the
  * schedulability test, it will be performed at commit time
  */
-static void x86_pmu_start_txn(const struct pmu *pmu)
+static void x86_pmu_start_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 
@@ -1407,7 +1407,7 @@ static void x86_pmu_start_txn(const stru
  * Clear the flag and pmu::enable() will perform the
  * schedulability test.
  */
-static void x86_pmu_cancel_txn(const struct pmu *pmu)
+static void x86_pmu_cancel_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 
@@ -1424,7 +1424,7 @@ static void x86_pmu_cancel_txn(const str
  * Perform the group schedulability test as a whole
  * Return 0 if success
  */
-static int x86_pmu_commit_txn(const struct pmu *pmu)
+static int x86_pmu_commit_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	int assign[X86_PMC_IDX_MAX];
@@ -1450,7 +1450,7 @@ static int x86_pmu_commit_txn(const stru
 	return 0;
 }
 
-static const struct pmu pmu = {
+static struct pmu pmu = {
 	.enable		= x86_pmu_enable,
 	.disable	= x86_pmu_disable,
 	.start		= x86_pmu_start,
@@ -1536,9 +1536,9 @@ out:
 	return ret;
 }
 
-const struct pmu *hw_perf_event_init(struct perf_event *event)
+struct pmu *hw_perf_event_init(struct perf_event *event)
 {
-	const struct pmu *tmp;
+	struct pmu *tmp;
 	int err;
 
 	err = __hw_perf_event_init(event);
Index: linux-2.6/include/linux/perf_event.h
===================================================================
--- linux-2.6.orig/include/linux/perf_event.h
+++ linux-2.6/include/linux/perf_event.h
@@ -576,19 +576,19 @@ struct pmu {
 	 * Start the transaction, after this ->enable() doesn't need
 	 * to do schedulability tests.
 	 */
-	void (*start_txn)	(const struct pmu *pmu);
+	void (*start_txn)	(struct pmu *pmu);
 	/*
 	 * If ->start_txn() disabled the ->enable() schedulability test
 	 * then ->commit_txn() is required to perform one. On success
 	 * the transaction is closed. On error the transaction is kept
 	 * open until ->cancel_txn() is called.
 	 */
-	int  (*commit_txn)	(const struct pmu *pmu);
+	int  (*commit_txn)	(struct pmu *pmu);
 	/*
 	 * Will cancel the transaction, assumes ->disable() is called for
 	 * each successfull ->enable() during the transaction.
 	 */
-	void (*cancel_txn)	(const struct pmu *pmu);
+	void (*cancel_txn)	(struct pmu *pmu);
 };
 
 /**
@@ -667,7 +667,7 @@ struct perf_event {
 	int				nr_siblings;
 	int				group_flags;
 	struct perf_event		*group_leader;
-	const struct pmu		*pmu;
+	struct pmu		*pmu;
 
 	enum perf_event_active_state	state;
 	unsigned int			attach_state;
@@ -845,7 +845,7 @@ struct perf_output_handle {
  */
 extern int perf_max_events;
 
-extern const struct pmu *hw_perf_event_init(struct perf_event *event);
+extern struct pmu *hw_perf_event_init(struct perf_event *event);
 
 extern void perf_event_task_sched_in(struct task_struct *task);
 extern void perf_event_task_sched_out(struct task_struct *task, struct task_struct *next);
Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -75,7 +75,7 @@ static DEFINE_SPINLOCK(perf_resource_loc
 /*
  * Architecture provided APIs - weak aliases:
  */
-extern __weak const struct pmu *hw_perf_event_init(struct perf_event *event)
+extern __weak struct pmu *hw_perf_event_init(struct perf_event *event)
 {
 	return NULL;
 }
@@ -673,7 +673,7 @@ group_sched_in(struct perf_event *group_
 	       struct perf_event_context *ctx)
 {
 	struct perf_event *event, *partial_group = NULL;
-	const struct pmu *pmu = group_event->pmu;
+	struct pmu *pmu = group_event->pmu;
 	bool txn = false;
 
 	if (group_event->state == PERF_EVENT_STATE_OFF)
@@ -4291,7 +4291,7 @@ static int perf_swevent_int(struct perf_
 	return 0;
 }
 
-static const struct pmu perf_ops_generic = {
+static struct pmu perf_ops_generic = {
 	.enable		= perf_swevent_enable,
 	.disable	= perf_swevent_disable,
 	.start		= perf_swevent_int,
@@ -4404,7 +4404,7 @@ static void cpu_clock_perf_event_read(st
 	cpu_clock_perf_event_update(event);
 }
 
-static const struct pmu perf_ops_cpu_clock = {
+static struct pmu perf_ops_cpu_clock = {
 	.enable		= cpu_clock_perf_event_enable,
 	.disable	= cpu_clock_perf_event_disable,
 	.read		= cpu_clock_perf_event_read,
@@ -4461,7 +4461,7 @@ static void task_clock_perf_event_read(s
 	task_clock_perf_event_update(event, time);
 }
 
-static const struct pmu perf_ops_task_clock = {
+static struct pmu perf_ops_task_clock = {
 	.enable		= task_clock_perf_event_enable,
 	.disable	= task_clock_perf_event_disable,
 	.read		= task_clock_perf_event_read,
@@ -4575,7 +4575,7 @@ static int swevent_hlist_get(struct perf
 
 #ifdef CONFIG_EVENT_TRACING
 
-static const struct pmu perf_ops_tracepoint = {
+static struct pmu perf_ops_tracepoint = {
 	.enable		= perf_trace_enable,
 	.disable	= perf_trace_disable,
 	.start		= perf_swevent_int,
@@ -4639,7 +4639,7 @@ static void tp_perf_event_destroy(struct
 	perf_trace_destroy(event);
 }
 
-static const struct pmu *tp_perf_event_init(struct perf_event *event)
+static struct pmu *tp_perf_event_init(struct perf_event *event)
 {
 	int err;
 
@@ -4686,7 +4686,7 @@ static void perf_event_free_filter(struc
 
 #else
 
-static const struct pmu *tp_perf_event_init(struct perf_event *event)
+static struct pmu *tp_perf_event_init(struct perf_event *event)
 {
 	return NULL;
 }
@@ -4708,7 +4708,7 @@ static void bp_perf_event_destroy(struct
 	release_bp_slot(event);
 }
 
-static const struct pmu *bp_perf_event_init(struct perf_event *bp)
+static struct pmu *bp_perf_event_init(struct perf_event *bp)
 {
 	int err;
 
@@ -4732,7 +4732,7 @@ void perf_bp_event(struct perf_event *bp
 		perf_swevent_add(bp, 1, 1, &sample, regs);
 }
 #else
-static const struct pmu *bp_perf_event_init(struct perf_event *bp)
+static struct pmu *bp_perf_event_init(struct perf_event *bp)
 {
 	return NULL;
 }
@@ -4754,9 +4754,9 @@ static void sw_perf_event_destroy(struct
 	swevent_hlist_put(event);
 }
 
-static const struct pmu *sw_perf_event_init(struct perf_event *event)
+static struct pmu *sw_perf_event_init(struct perf_event *event)
 {
-	const struct pmu *pmu = NULL;
+	struct pmu *pmu = NULL;
 	u64 event_id = event->attr.config;
 
 	/*
@@ -4818,7 +4818,7 @@ perf_event_alloc(struct perf_event_attr 
 		   perf_overflow_handler_t overflow_handler,
 		   gfp_t gfpflags)
 {
-	const struct pmu *pmu;
+	struct pmu *pmu;
 	struct perf_event *event;
 	struct hw_perf_event *hwc;
 	long err;




* [RFC][PATCH 3/8] perf: register pmu implementations
  2010-06-16 16:00 [RFC][PATCH 0/8] perf pmu interface Peter Zijlstra
  2010-06-16 16:00 ` [RFC][PATCH 1/8] perf, x86: Fix Nehalem PMU quirk Peter Zijlstra
  2010-06-16 16:00 ` [RFC][PATCH 2/8] perf: deconstify struct pmu Peter Zijlstra
@ 2010-06-16 16:00 ` Peter Zijlstra
  2010-06-16 16:45   ` Robert Richter
                     ` (2 more replies)
  2010-06-16 16:00 ` [RFC][PATCH 4/8] perf: Unindent labels Peter Zijlstra
                   ` (5 subsequent siblings)
  8 siblings, 3 replies; 32+ messages in thread
From: Peter Zijlstra @ 2010-06-16 16:00 UTC (permalink / raw)
  To: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: perf-register-pmu.patch --]
[-- Type: text/plain, Size: 32507 bytes --]

Add a simple registration interface for struct pmu; this provides the
infrastructure for removing all the weak functions.
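
On the core side the dispatch (the kernel/perf_event.c hunk below,
with comments added here) simply walks the registered pmus until one
claims the event:

	struct pmu *perf_init_event(struct perf_event *event)
	{
		struct pmu *pmu;
		int idx;

		idx = srcu_read_lock(&pmus_srcu);
		list_for_each_entry_rcu(pmu, &pmus, entry) {
			int ret = pmu->event_init(event);
			if (!ret)		/* this pmu claimed the event */
				break;
			if (ret != -ENOENT) {	/* hard error, stop looking */
				pmu = ERR_PTR(ret);
				break;
			}
		}
		srcu_read_unlock(&pmus_srcu, idx);

		return pmu;
	}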

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/arm/kernel/perf_event.c             |   36 +
 arch/powerpc/kernel/perf_event.c         |   41 +-
 arch/powerpc/kernel/perf_event_fsl_emb.c |   31 -
 arch/sh/kernel/perf_event.c              |   19 
 arch/sparc/kernel/perf_event.c           |   13 
 arch/x86/kernel/cpu/perf_event.c         |   45 +-
 include/linux/perf_event.h               |   10 
 kernel/hw_breakpoint.c                   |   35 +
 kernel/perf_event.c                      |  597 +++++++++++++++----------------
 9 files changed, 432 insertions(+), 395 deletions(-)

Index: linux-2.6/arch/arm/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/arm/kernel/perf_event.c
+++ linux-2.6/arch/arm/kernel/perf_event.c
@@ -306,13 +306,6 @@ out:
 	return err;
 }
 
-static struct pmu pmu = {
-	.enable	    = armpmu_enable,
-	.disable    = armpmu_disable,
-	.unthrottle = armpmu_unthrottle,
-	.read	    = armpmu_read,
-};
-
 static int
 validate_event(struct cpu_hw_events *cpuc,
 	       struct perf_event *event)
@@ -491,13 +484,22 @@ __hw_perf_event_init(struct perf_event *
 	return err;
 }
 
-struct pmu *
-hw_perf_event_init(struct perf_event *event)
+static int armpmu_event_init(struct perf_event *event)
 {
 	int err = 0;
 
+	switch (event->attr.type) {
+	case PERF_TYPE_RAW:
+	case PERF_TYPE_HARDWARE:
+	case PERF_TYPE_HW_CACHE:
+		break;
+
+	default:
+		return -ENOENT;
+	}
+
 	if (!armpmu)
-		return ERR_PTR(-ENODEV);
+		return -ENODEV;
 
 	event->destroy = hw_perf_event_destroy;
 
@@ -518,15 +520,23 @@ hw_perf_event_init(struct perf_event *ev
 	}
 
 	if (err)
-		return ERR_PTR(err);
+		return err;
 
 	err = __hw_perf_event_init(event);
 	if (err)
 		hw_perf_event_destroy(event);
 
-	return err ? ERR_PTR(err) : &pmu;
+	return err;
 }
 
+static struct pmu pmu = {
+	.event_init = armpmu_event_init,
+	.enable	    = armpmu_enable,
+	.disable    = armpmu_disable,
+	.unthrottle = armpmu_unthrottle,
+	.read	    = armpmu_read,
+};
+
 void
 hw_perf_enable(void)
 {
@@ -2994,6 +3004,8 @@ init_hw_perf_events(void)
 		perf_max_events = -1;
 	}
 
+	perf_pmu_register(&pmu);
+
 	return 0;
 }
 arch_initcall(init_hw_perf_events);
Index: linux-2.6/arch/powerpc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/perf_event.c
+++ linux-2.6/arch/powerpc/kernel/perf_event.c
@@ -901,16 +901,6 @@ int power_pmu_commit_txn(struct pmu *pmu
 	return 0;
 }
 
-struct pmu power_pmu = {
-	.enable		= power_pmu_enable,
-	.disable	= power_pmu_disable,
-	.read		= power_pmu_read,
-	.unthrottle	= power_pmu_unthrottle,
-	.start_txn	= power_pmu_start_txn,
-	.cancel_txn	= power_pmu_cancel_txn,
-	.commit_txn	= power_pmu_commit_txn,
-};
-
 /*
  * Return 1 if we might be able to put event on a limited PMC,
  * or 0 if not.
@@ -1011,7 +1001,7 @@ static int hw_perf_cache_event(u64 confi
 	return 0;
 }
 
-struct pmu *hw_perf_event_init(struct perf_event *event)
+static int power_pmu_event_init(struct perf_event *event)
 {
 	u64 ev;
 	unsigned long flags;
@@ -1023,7 +1013,8 @@ struct pmu *hw_perf_event_init(struct pe
 	struct cpu_hw_events *cpuhw;
 
 	if (!ppmu)
-		return ERR_PTR(-ENXIO);
+		return -ENOENT;
+
 	switch (event->attr.type) {
 	case PERF_TYPE_HARDWARE:
 		ev = event->attr.config;
@@ -1040,8 +1031,9 @@ struct pmu *hw_perf_event_init(struct pe
 		ev = event->attr.config;
 		break;
 	default:
-		return ERR_PTR(-EINVAL);
+		return -ENOENT;
 	}
+
 	event->hw.config_base = ev;
 	event->hw.idx = 0;
 
@@ -1078,7 +1070,7 @@ struct pmu *hw_perf_event_init(struct pe
 			 */
 			ev = normal_pmc_alternative(ev, flags);
 			if (!ev)
-				return ERR_PTR(-EINVAL);
+				return -EINVAL;
 		}
 	}
 
@@ -1092,19 +1084,19 @@ struct pmu *hw_perf_event_init(struct pe
 		n = collect_events(event->group_leader, ppmu->n_counter - 1,
 				   ctrs, events, cflags);
 		if (n < 0)
-			return ERR_PTR(-EINVAL);
+			return -EINVAL;
 	}
 	events[n] = ev;
 	ctrs[n] = event;
 	cflags[n] = flags;
 	if (check_excludes(ctrs, cflags, n, 1))
-		return ERR_PTR(-EINVAL);
+		return -EINVAL;
 
 	cpuhw = &get_cpu_var(cpu_hw_events);
 	err = power_check_constraints(cpuhw, events, cflags, n + 1);
 	put_cpu_var(cpu_hw_events);
 	if (err)
-		return ERR_PTR(-EINVAL);
+		return -EINVAL;
 
 	event->hw.config = events[n];
 	event->hw.event_base = cflags[n];
@@ -1129,11 +1121,19 @@ struct pmu *hw_perf_event_init(struct pe
 	}
 	event->destroy = hw_perf_event_destroy;
 
-	if (err)
-		return ERR_PTR(err);
-	return &power_pmu;
+	return err;
 }
 
+struct pmu power_pmu = {
+	.event_init	= power_pmu_event_init,
+	.enable		= power_pmu_enable,
+	.disable	= power_pmu_disable,
+	.read		= power_pmu_read,
+	.unthrottle	= power_pmu_unthrottle,
+	.start_txn	= power_pmu_start_txn,
+	.cancel_txn	= power_pmu_cancel_txn,
+	.commit_txn	= power_pmu_commit_txn,
+};
+
 /*
  * A counter has overflowed; update its count and record
  * things if requested.  Note that interrupts are hard-disabled
@@ -1339,6 +1339,7 @@ int register_power_pmu(struct power_pmu 
 		freeze_events_kernel = MMCR0_FCHV;
 #endif /* CONFIG_PPC64 */
 
+	perf_pmu_register(&power_pmu);
 	perf_cpu_notifier(power_pmu_notifier);
 
 	return 0;
Index: linux-2.6/arch/powerpc/kernel/perf_event_fsl_emb.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/perf_event_fsl_emb.c
+++ linux-2.6/arch/powerpc/kernel/perf_event_fsl_emb.c
@@ -378,13 +378,6 @@ static void fsl_emb_pmu_unthrottle(struc
 	local_irq_restore(flags);
 }
 
-static struct pmu fsl_emb_pmu = {
-	.enable		= fsl_emb_pmu_enable,
-	.disable	= fsl_emb_pmu_disable,
-	.read		= fsl_emb_pmu_read,
-	.unthrottle	= fsl_emb_pmu_unthrottle,
-};
-
 /*
  * Release the PMU if this is the last perf_event.
  */
@@ -428,7 +421,7 @@ static int hw_perf_cache_event(u64 confi
 	return 0;
 }
 
-struct pmu *hw_perf_event_init(struct perf_event *event)
+static int fsl_emb_pmu_event_init(struct perf_event *event)
 {
 	u64 ev;
 	struct perf_event *events[MAX_HWEVENTS];
@@ -456,12 +449,12 @@ struct pmu *hw_perf_event_init(struct pe
 		break;
 
 	default:
-		return ERR_PTR(-EINVAL);
+		return -ENOENT;
 	}
 
 	event->hw.config = ppmu->xlate_event(ev);
 	if (!(event->hw.config & FSL_EMB_EVENT_VALID))
-		return ERR_PTR(-EINVAL);
+		return -EINVAL;
 
 	/*
 	 * If this is in a group, check if it can go on with all the
@@ -473,7 +466,7 @@ struct pmu *hw_perf_event_init(struct pe
 		n = collect_events(event->group_leader,
 		                   ppmu->n_counter - 1, events);
 		if (n < 0)
-			return ERR_PTR(-EINVAL);
+			return -EINVAL;
 	}
 
 	if (event->hw.config & FSL_EMB_EVENT_RESTRICTED) {
@@ -484,7 +477,7 @@ struct pmu *hw_perf_event_init(struct pe
 		}
 
 		if (num_restricted >= ppmu->n_restricted)
-			return ERR_PTR(-EINVAL);
+			return -EINVAL;
 	}
 
 	event->hw.idx = -1;
@@ -523,11 +516,17 @@ struct pmu *hw_perf_event_init(struct pe
 	}
 	event->destroy = hw_perf_event_destroy;
 
-	if (err)
-		return ERR_PTR(err);
-	return &fsl_emb_pmu;
+	return err;
 }
 
+static struct pmu fsl_emb_pmu = {
+	.event_init	= fsl_emb_pmu_event_init,
+	.enable		= fsl_emb_pmu_enable,
+	.disable	= fsl_emb_pmu_disable,
+	.read		= fsl_emb_pmu_read,
+	.unthrottle	= fsl_emb_pmu_unthrottle,
+};
+
 /*
  * A counter has overflowed; update its count and record
  * things if requested.  Note that interrupts are hard-disabled
@@ -650,5 +649,7 @@ int register_fsl_emb_pmu(struct fsl_emb_
 	pr_info("%s performance monitor hardware support registered\n",
 		pmu->name);
 
+	perf_pmu_register(&fsl_emb_pmu);
+
 	return 0;
 }
Index: linux-2.6/arch/sh/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/sh/kernel/perf_event.c
+++ linux-2.6/arch/sh/kernel/perf_event.c
@@ -257,24 +257,24 @@ static void sh_pmu_read(struct perf_even
 	sh_perf_event_update(event, &event->hw, event->hw.idx);
 }
 
-static struct pmu pmu = {
-	.enable		= sh_pmu_enable,
-	.disable	= sh_pmu_disable,
-	.read		= sh_pmu_read,
-};
-
-struct pmu *hw_perf_event_init(struct perf_event *event)
+static int sh_pmu_event_init(struct perf_event *event)
 {
 	int err = __hw_perf_event_init(event);
 	if (unlikely(err)) {
 		if (event->destroy)
 			event->destroy(event);
-		return ERR_PTR(err);
 	}
 
-	return &pmu;
+	return err;
 }
 
+static struct pmu pmu = {
+	.event_init	= sh_pmu_event_init,
+	.enable		= sh_pmu_enable,
+	.disable	= sh_pmu_disable,
+	.read		= sh_pmu_read,
+};
+
 static void sh_pmu_setup(int cpu)
 {
 	struct cpu_hw_events *cpuhw = &per_cpu(cpu_hw_events, cpu);
@@ -325,6 +325,7 @@ int __cpuinit register_sh_pmu(struct sh_
 
 	WARN_ON(pmu->num_events > MAX_HWEVENTS);
 
+	perf_pmu_register(&pmu);
 	perf_cpu_notifier(sh_pmu_notifier);
 	return 0;
 }
Index: linux-2.6/arch/sparc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/perf_event.c
+++ linux-2.6/arch/sparc/kernel/perf_event.c
@@ -1024,7 +1024,7 @@ out:
 	return ret;
 }
 
-static int __hw_perf_event_init(struct perf_event *event)
+static int sparc_pmu_event_init(struct perf_event *event)
 {
 	struct perf_event_attr *attr = &event->attr;
 	struct perf_event *evts[MAX_HWEVENTS];
@@ -1142,6 +1142,7 @@ static int sparc_pmu_commit_txn(struct p
 }
 
 static struct pmu pmu = {
+	.event_init	= sparc_pmu_event_init,
 	.enable		= sparc_pmu_enable,
 	.disable	= sparc_pmu_disable,
 	.read		= sparc_pmu_read,
@@ -1151,15 +1152,6 @@ static struct pmu pmu = {
 	.commit_txn	= sparc_pmu_commit_txn,
 };
 
-struct pmu *hw_perf_event_init(struct perf_event *event)
-{
-	int err = __hw_perf_event_init(event);
-
-	if (err)
-		return ERR_PTR(err);
-	return &pmu;
-}
-
 void perf_event_print_debug(void)
 {
 	unsigned long flags;
@@ -1279,6 +1271,7 @@ void __init init_hw_perf_events(void)
 	/* All sparc64 PMUs currently have 2 events.  */
 	perf_max_events = 2;
 
+	perf_pmu_register(&pmu);
 	register_die_notifier(&perf_event_nmi_notifier);
 }
 
Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -530,7 +530,7 @@ static int x86_pmu_hw_config(struct perf
 /*
  * Setup the hardware configuration for a given attr_type
  */
-static int __hw_perf_event_init(struct perf_event *event)
+static int __x86_pmu_event_init(struct perf_event *event)
 {
 	int err;
 
@@ -1381,6 +1381,7 @@ void __init init_hw_perf_events(void)
 	pr_info("... fixed-purpose events:   %d\n",     x86_pmu.num_counters_fixed);
 	pr_info("... event mask:             %016Lx\n", x86_pmu.intel_ctrl);
 
+	perf_pmu_register(&pmu);
 	perf_cpu_notifier(x86_pmu_notifier);
 }
 
@@ -1450,18 +1451,6 @@ static int x86_pmu_commit_txn(struct pmu
 	return 0;
 }
 
-static struct pmu pmu = {
-	.enable		= x86_pmu_enable,
-	.disable	= x86_pmu_disable,
-	.start		= x86_pmu_start,
-	.stop		= x86_pmu_stop,
-	.read		= x86_pmu_read,
-	.unthrottle	= x86_pmu_unthrottle,
-	.start_txn	= x86_pmu_start_txn,
-	.cancel_txn	= x86_pmu_cancel_txn,
-	.commit_txn	= x86_pmu_commit_txn,
-};
-
 /*
  * validate that we can schedule this event
  */
@@ -1536,12 +1525,22 @@ out:
 	return ret;
 }
 
-struct pmu *hw_perf_event_init(struct perf_event *event)
+int x86_pmu_event_init(struct perf_event *event)
 {
 	struct pmu *tmp;
 	int err;
 
-	err = __hw_perf_event_init(event);
+	switch (event->attr.type) {
+	case PERF_TYPE_RAW:
+	case PERF_TYPE_HARDWARE:
+	case PERF_TYPE_HW_CACHE:
+		break;
+
+	default:
+		return -ENOENT;
+	}
+
+	err = __x86_pmu_event_init(event);
 	if (!err) {
 		/*
 		 * we temporarily connect event to its pmu
@@ -1561,12 +1560,24 @@ struct pmu *hw_perf_event_init(struct pe
 	if (err) {
 		if (event->destroy)
 			event->destroy(event);
-		return ERR_PTR(err);
 	}
 
-	return &pmu;
+	return err;
 }
 
+static struct pmu pmu = {
+	.event_init	= x86_pmu_event_init,
+	.enable		= x86_pmu_enable,
+	.disable	= x86_pmu_disable,
+	.start		= x86_pmu_start,
+	.stop		= x86_pmu_stop,
+	.read		= x86_pmu_read,
+	.unthrottle	= x86_pmu_unthrottle,
+	.start_txn	= x86_pmu_start_txn,
+	.cancel_txn	= x86_pmu_cancel_txn,
+	.commit_txn	= x86_pmu_commit_txn,
+};
+
 /*
  * callchain support
  */
Index: linux-2.6/include/linux/perf_event.h
===================================================================
--- linux-2.6.orig/include/linux/perf_event.h
+++ linux-2.6/include/linux/perf_event.h
@@ -559,6 +559,13 @@ struct perf_event;
  * struct pmu - generic performance monitoring unit
  */
 struct pmu {
+	struct list_head		entry;
+
+	/*
+	 * Should return -ENOENT when the @event doesn't match this pmu
+	 */
+	int (*event_init)		(struct perf_event *event);
+
 	int (*enable)			(struct perf_event *event);
 	void (*disable)			(struct perf_event *event);
 	int (*start)			(struct perf_event *event);
@@ -845,7 +852,8 @@ struct perf_output_handle {
  */
 extern int perf_max_events;
 
-extern struct pmu *hw_perf_event_init(struct perf_event *event);
+extern int perf_pmu_register(struct pmu *pmu);
+extern void perf_pmu_unregister(struct pmu *pmu);
 
 extern void perf_event_task_sched_in(struct task_struct *task);
 extern void perf_event_task_sched_out(struct task_struct *task, struct task_struct *next);
Index: linux-2.6/kernel/hw_breakpoint.c
===================================================================
--- linux-2.6.orig/kernel/hw_breakpoint.c
+++ linux-2.6/kernel/hw_breakpoint.c
@@ -549,6 +549,34 @@ static struct notifier_block hw_breakpoi
 	.priority = 0x7fffffff
 };
 
+static void bp_perf_event_destroy(struct perf_event *event)
+{
+	release_bp_slot(event);
+}
+
+static int bp_perf_event_init(struct perf_event *bp)
+{
+	int err;
+
+	if (bp->attr.type != PERF_TYPE_BREAKPOINT)
+		return -ENOENT;
+
+	err = register_perf_hw_breakpoint(bp);
+	if (err)
+		return err;
+
+	bp->destroy = bp_perf_event_destroy;
+
+	return 0;
+}
+
+static struct pmu perf_breakpoint = {
+	.event_init	= bp_perf_event_init,
+	.enable		= arch_install_hw_breakpoint,
+	.disable	= arch_uninstall_hw_breakpoint,
+	.read		= hw_breakpoint_pmu_read,
+};
+
 static int __init init_hw_breakpoint(void)
 {
 	unsigned int **task_bp_pinned;
@@ -570,6 +598,8 @@ static int __init init_hw_breakpoint(voi
 
 	constraints_initialized = 1;
 
+	perf_pmu_register(&perf_breakpoint);
+
 	return register_die_notifier(&hw_breakpoint_exceptions_nb);
 
  err_alloc:
@@ -585,8 +615,3 @@ static int __init init_hw_breakpoint(voi
 core_initcall(init_hw_breakpoint);
 
 
-struct pmu perf_ops_bp = {
-	.enable		= arch_install_hw_breakpoint,
-	.disable	= arch_uninstall_hw_breakpoint,
-	.read		= hw_breakpoint_pmu_read,
-};
Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -31,7 +31,6 @@
 #include <linux/kernel_stat.h>
 #include <linux/perf_event.h>
 #include <linux/ftrace_event.h>
-#include <linux/hw_breakpoint.h>
 
 #include <asm/irq_regs.h>
 
@@ -72,14 +71,6 @@ static atomic64_t perf_event_id;
  */
 static DEFINE_SPINLOCK(perf_resource_lock);
 
-/*
- * Architecture provided APIs - weak aliases:
- */
-extern __weak struct pmu *hw_perf_event_init(struct perf_event *event)
-{
-	return NULL;
-}
-
 void __weak hw_perf_disable(void)		{ barrier(); }
 void __weak hw_perf_enable(void)		{ barrier(); }
 
@@ -4286,187 +4277,6 @@ static void perf_swevent_void(struct per
 {
 }
 
-static int perf_swevent_int(struct perf_event *event)
-{
-	return 0;
-}
-
-static struct pmu perf_ops_generic = {
-	.enable		= perf_swevent_enable,
-	.disable	= perf_swevent_disable,
-	.start		= perf_swevent_int,
-	.stop		= perf_swevent_void,
-	.read		= perf_swevent_read,
-	.unthrottle	= perf_swevent_void, /* hwc->interrupts already reset */
-};
-
-/*
- * hrtimer based swevent callback
- */
-
-static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
-{
-	enum hrtimer_restart ret = HRTIMER_RESTART;
-	struct perf_sample_data data;
-	struct pt_regs *regs;
-	struct perf_event *event;
-	u64 period;
-
-	event = container_of(hrtimer, struct perf_event, hw.hrtimer);
-	event->pmu->read(event);
-
-	perf_sample_data_init(&data, 0);
-	data.period = event->hw.last_period;
-	regs = get_irq_regs();
-
-	if (regs && !perf_exclude_event(event, regs)) {
-		if (!(event->attr.exclude_idle && current->pid == 0))
-			if (perf_event_overflow(event, 0, &data, regs))
-				ret = HRTIMER_NORESTART;
-	}
-
-	period = max_t(u64, 10000, event->hw.sample_period);
-	hrtimer_forward_now(hrtimer, ns_to_ktime(period));
-
-	return ret;
-}
-
-static void perf_swevent_start_hrtimer(struct perf_event *event)
-{
-	struct hw_perf_event *hwc = &event->hw;
-
-	hrtimer_init(&hwc->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-	hwc->hrtimer.function = perf_swevent_hrtimer;
-	if (hwc->sample_period) {
-		u64 period;
-
-		if (hwc->remaining) {
-			if (hwc->remaining < 0)
-				period = 10000;
-			else
-				period = hwc->remaining;
-			hwc->remaining = 0;
-		} else {
-			period = max_t(u64, 10000, hwc->sample_period);
-		}
-		__hrtimer_start_range_ns(&hwc->hrtimer,
-				ns_to_ktime(period), 0,
-				HRTIMER_MODE_REL, 0);
-	}
-}
-
-static void perf_swevent_cancel_hrtimer(struct perf_event *event)
-{
-	struct hw_perf_event *hwc = &event->hw;
-
-	if (hwc->sample_period) {
-		ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
-		hwc->remaining = ktime_to_ns(remaining);
-
-		hrtimer_cancel(&hwc->hrtimer);
-	}
-}
-
-/*
- * Software event: cpu wall time clock
- */
-
-static void cpu_clock_perf_event_update(struct perf_event *event)
-{
-	int cpu = raw_smp_processor_id();
-	s64 prev;
-	u64 now;
-
-	now = cpu_clock(cpu);
-	prev = local64_xchg(&event->hw.prev_count, now);
-	local64_add(now - prev, &event->count);
-}
-
-static int cpu_clock_perf_event_enable(struct perf_event *event)
-{
-	struct hw_perf_event *hwc = &event->hw;
-	int cpu = raw_smp_processor_id();
-
-	local64_set(&hwc->prev_count, cpu_clock(cpu));
-	perf_swevent_start_hrtimer(event);
-
-	return 0;
-}
-
-static void cpu_clock_perf_event_disable(struct perf_event *event)
-{
-	perf_swevent_cancel_hrtimer(event);
-	cpu_clock_perf_event_update(event);
-}
-
-static void cpu_clock_perf_event_read(struct perf_event *event)
-{
-	cpu_clock_perf_event_update(event);
-}
-
-static struct pmu perf_ops_cpu_clock = {
-	.enable		= cpu_clock_perf_event_enable,
-	.disable	= cpu_clock_perf_event_disable,
-	.read		= cpu_clock_perf_event_read,
-};
-
-/*
- * Software event: task time clock
- */
-
-static void task_clock_perf_event_update(struct perf_event *event, u64 now)
-{
-	u64 prev;
-	s64 delta;
-
-	prev = local64_xchg(&event->hw.prev_count, now);
-	delta = now - prev;
-	local64_add(delta, &event->count);
-}
-
-static int task_clock_perf_event_enable(struct perf_event *event)
-{
-	struct hw_perf_event *hwc = &event->hw;
-	u64 now;
-
-	now = event->ctx->time;
-
-	local64_set(&hwc->prev_count, now);
-
-	perf_swevent_start_hrtimer(event);
-
-	return 0;
-}
-
-static void task_clock_perf_event_disable(struct perf_event *event)
-{
-	perf_swevent_cancel_hrtimer(event);
-	task_clock_perf_event_update(event, event->ctx->time);
-
-}
-
-static void task_clock_perf_event_read(struct perf_event *event)
-{
-	u64 time;
-
-	if (!in_nmi()) {
-		update_context_time(event->ctx);
-		time = event->ctx->time;
-	} else {
-		u64 now = perf_clock();
-		u64 delta = now - event->ctx->timestamp;
-		time = event->ctx->time + delta;
-	}
-
-	task_clock_perf_event_update(event, time);
-}
-
-static struct pmu perf_ops_task_clock = {
-	.enable		= task_clock_perf_event_enable,
-	.disable	= task_clock_perf_event_disable,
-	.read		= task_clock_perf_event_read,
-};
-
 /* Deref the hlist from the update side */
 static inline struct swevent_hlist *
 swevent_hlist_deref(struct perf_cpu_context *cpuctx)
@@ -4573,17 +4383,61 @@ static int swevent_hlist_get(struct perf
 	return err;
 }
 
-#ifdef CONFIG_EVENT_TRACING
+atomic_t perf_swevent_enabled[PERF_COUNT_SW_MAX];
 
-static struct pmu perf_ops_tracepoint = {
-	.enable		= perf_trace_enable,
-	.disable	= perf_trace_disable,
+static void sw_perf_event_destroy(struct perf_event *event)
+{
+	u64 event_id = event->attr.config;
+
+	WARN_ON(event->parent);
+
+	atomic_dec(&perf_swevent_enabled[event_id]);
+	swevent_hlist_put(event);
+}
+
+static int perf_swevent_init(struct perf_event *event)
+{
+	if (event->attr.type != PERF_TYPE_SOFTWARE)
+		return -ENOENT;
+
+	switch (event->attr.config) {
+	case PERF_COUNT_SW_CPU_CLOCK:
+	case PERF_COUNT_SW_TASK_CLOCK:
+		return -ENOENT;
+
+	default:
+		break;
+	}
+
+	if (event->attr.config >= PERF_COUNT_SW_MAX)
+		return -ENOENT;
+
+	if (!event->parent) {
+		int err;
+
+		err = swevent_hlist_get(event);
+		if (err)
+			return err;
+
+		atomic_inc(&perf_swevent_enabled[event->attr.config]);
+		event->destroy = sw_perf_event_destroy;
+	}
+
+	return 0;
+}
+
+static struct pmu perf_swevent = {
+	.event_init	= perf_swevent_init,
+	.enable		= perf_swevent_enable,
+	.disable	= perf_swevent_disable,
 	.start		= perf_swevent_int,
 	.stop		= perf_swevent_void,
 	.read		= perf_swevent_read,
-	.unthrottle	= perf_swevent_void,
+	.unthrottle	= perf_swevent_void, /* hwc->interrupts already reset */
 };
 
+#ifdef CONFIG_EVENT_TRACING
+
 static int perf_tp_filter_match(struct perf_event *event,
 				struct perf_sample_data *data)
 {
@@ -4639,10 +4493,13 @@ static void tp_perf_event_destroy(struct
 	perf_trace_destroy(event);
 }
 
-static struct pmu *tp_perf_event_init(struct perf_event *event)
+static int perf_tp_event_init(struct perf_event *event)
 {
 	int err;
 
+	if (event->attr.type != PERF_TYPE_TRACEPOINT)
+		return -ENOENT;
+
 	/*
 	 * Raw tracepoint data is a severe data leak, only allow root to
 	 * have these.
@@ -4650,15 +4507,30 @@ static struct pmu *tp_perf_event_init(st
 	if ((event->attr.sample_type & PERF_SAMPLE_RAW) &&
 			perf_paranoid_tracepoint_raw() &&
 			!capable(CAP_SYS_ADMIN))
-		return ERR_PTR(-EPERM);
+		return -EPERM;
 
 	err = perf_trace_init(event);
 	if (err)
-		return NULL;
+		return err;
 
 	event->destroy = tp_perf_event_destroy;
 
-	return &perf_ops_tracepoint;
+	return 0;
+}
+
+static struct pmu perf_tracepoint = {
+	.event_init	= perf_tp_event_init,
+	.enable		= perf_trace_enable,
+	.disable	= perf_trace_disable,
+	.start		= perf_swevent_int,
+	.stop		= perf_swevent_void,
+	.read		= perf_swevent_read,
+	.unthrottle	= perf_swevent_void,
+};
+
+static inline void perf_tp_register(void)
+{
+	perf_pmu_register(&perf_tracepoint);
 }
 
 static int perf_event_set_filter(struct perf_event *event, void __user *arg)
@@ -4686,9 +4558,8 @@ static void perf_event_free_filter(struc
 
 #else
 
-static struct pmu *tp_perf_event_init(struct perf_event *event)
+static inline void perf_tp_register(void)
 {
-	return NULL;
 }
 
 static int perf_event_set_filter(struct perf_event *event, void __user *arg)
@@ -4703,105 +4574,247 @@ static void perf_event_free_filter(struc
 #endif /* CONFIG_EVENT_TRACING */
 
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
-static void bp_perf_event_destroy(struct perf_event *event)
+void perf_bp_event(struct perf_event *bp, void *data)
 {
-	release_bp_slot(event);
+	struct perf_sample_data sample;
+	struct pt_regs *regs = data;
+
+	perf_sample_data_init(&sample, bp->attr.bp_addr);
+
+	if (!perf_exclude_event(bp, regs))
+		perf_swevent_add(bp, 1, 1, &sample, regs);
 }
+#endif
 
-static struct pmu *bp_perf_event_init(struct perf_event *bp)
+/*
+ * hrtimer based swevent callback
+ */
+
+static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
 {
-	int err;
+	enum hrtimer_restart ret = HRTIMER_RESTART;
+	struct perf_sample_data data;
+	struct pt_regs *regs;
+	struct perf_event *event;
+	u64 period;
 
-	err = register_perf_hw_breakpoint(bp);
-	if (err)
-		return ERR_PTR(err);
+	event = container_of(hrtimer, struct perf_event, hw.hrtimer);
+	event->pmu->read(event);
+
+	perf_sample_data_init(&data, 0);
+	data.period = event->hw.last_period;
+	regs = get_irq_regs();
 
-	bp->destroy = bp_perf_event_destroy;
+	if (regs && !perf_exclude_event(event, regs)) {
+		if (!(event->attr.exclude_idle && current->pid == 0))
+			if (perf_event_overflow(event, 0, &data, regs))
+				ret = HRTIMER_NORESTART;
+	}
 
-	return &perf_ops_bp;
+	period = max_t(u64, 10000, event->hw.sample_period);
+	hrtimer_forward_now(hrtimer, ns_to_ktime(period));
+
+	return ret;
 }
 
-void perf_bp_event(struct perf_event *bp, void *data)
+static void perf_swevent_start_hrtimer(struct perf_event *event)
 {
-	struct perf_sample_data sample;
-	struct pt_regs *regs = data;
+	struct hw_perf_event *hwc = &event->hw;
 
-	perf_sample_data_init(&sample, bp->attr.bp_addr);
+	hrtimer_init(&hwc->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	hwc->hrtimer.function = perf_swevent_hrtimer;
+	if (hwc->sample_period) {
+		u64 period;
 
-	if (!perf_exclude_event(bp, regs))
-		perf_swevent_add(bp, 1, 1, &sample, regs);
+		if (hwc->remaining) {
+			if (hwc->remaining < 0)
+				period = 10000;
+			else
+				period = hwc->remaining;
+			hwc->remaining = 0;
+		} else {
+			period = max_t(u64, 10000, hwc->sample_period);
+		}
+		__hrtimer_start_range_ns(&hwc->hrtimer,
+				ns_to_ktime(period), 0,
+				HRTIMER_MODE_REL, 0);
+	}
 }
-#else
-static struct pmu *bp_perf_event_init(struct perf_event *bp)
+
+static void perf_swevent_cancel_hrtimer(struct perf_event *event)
 {
-	return NULL;
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (hwc->sample_period) {
+		ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
+		hwc->remaining = ktime_to_ns(remaining);
+
+		hrtimer_cancel(&hwc->hrtimer);
+	}
 }
 
-void perf_bp_event(struct perf_event *bp, void *regs)
+/*
+ * Software event: cpu wall time clock
+ */
+
+static void cpu_clock_event_update(struct perf_event *event)
 {
+	int cpu = raw_smp_processor_id();
+	s64 prev;
+	u64 now;
+
+	now = cpu_clock(cpu);
+	prev = local64_xchg(&event->hw.prev_count, now);
+	local64_add(now - prev, &event->count);
 }
-#endif
 
-atomic_t perf_swevent_enabled[PERF_COUNT_SW_MAX];
+static int cpu_clock_event_enable(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	int cpu = raw_smp_processor_id();
 
-static void sw_perf_event_destroy(struct perf_event *event)
+	local64_set(&hwc->prev_count, cpu_clock(cpu));
+	perf_swevent_start_hrtimer(event);
+
+	return 0;
+}
+
+static void cpu_clock_event_disable(struct perf_event *event)
 {
-	u64 event_id = event->attr.config;
+	perf_swevent_cancel_hrtimer(event);
+	cpu_clock_event_update(event);
+}
 
-	WARN_ON(event->parent);
+static void cpu_clock_event_read(struct perf_event *event)
+{
+	cpu_clock_event_update(event);
+}
 
-	atomic_dec(&perf_swevent_enabled[event_id]);
-	swevent_hlist_put(event);
+static int cpu_clock_event_init(struct perf_event *event)
+{
+	if (event->attr.type != PERF_TYPE_SOFTWARE)
+		return -ENOENT;
+
+	if (event->attr.config != PERF_COUNT_SW_CPU_CLOCK)
+		return -ENOENT;
+
+	return 0;
 }
 
-static struct pmu *sw_perf_event_init(struct perf_event *event)
+static struct pmu perf_cpu_clock = {
+	.event_init	= cpu_clock_event_init,
+	.enable		= cpu_clock_event_enable,
+	.disable	= cpu_clock_event_disable,
+	.read		= cpu_clock_event_read,
+};
+
+/*
+ * Software event: task time clock
+ */
+
+static void task_clock_event_update(struct perf_event *event, u64 now)
 {
-	struct pmu *pmu = NULL;
-	u64 event_id = event->attr.config;
+	u64 prev;
+	s64 delta;
 
-	/*
-	 * Software events (currently) can't in general distinguish
-	 * between user, kernel and hypervisor events.
-	 * However, context switches and cpu migrations are considered
-	 * to be kernel events, and page faults are never hypervisor
-	 * events.
-	 */
-	switch (event_id) {
-	case PERF_COUNT_SW_CPU_CLOCK:
-		pmu = &perf_ops_cpu_clock;
+	prev = local64_xchg(&event->hw.prev_count, now);
+	delta = now - prev;
+	local64_add(delta, &event->count);
+}
 
-		break;
-	case PERF_COUNT_SW_TASK_CLOCK:
-		/*
-		 * If the user instantiates this as a per-cpu event,
-		 * use the cpu_clock event instead.
-		 */
-		if (event->ctx->task)
-			pmu = &perf_ops_task_clock;
-		else
-			pmu = &perf_ops_cpu_clock;
+static int task_clock_event_enable(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	u64 now;
 
-		break;
-	case PERF_COUNT_SW_PAGE_FAULTS:
-	case PERF_COUNT_SW_PAGE_FAULTS_MIN:
-	case PERF_COUNT_SW_PAGE_FAULTS_MAJ:
-	case PERF_COUNT_SW_CONTEXT_SWITCHES:
-	case PERF_COUNT_SW_CPU_MIGRATIONS:
-	case PERF_COUNT_SW_ALIGNMENT_FAULTS:
-	case PERF_COUNT_SW_EMULATION_FAULTS:
-		if (!event->parent) {
-			int err;
-
-			err = swevent_hlist_get(event);
-			if (err)
-				return ERR_PTR(err);
+	now = event->ctx->time;
+
+	local64_set(&hwc->prev_count, now);
 
-			atomic_inc(&perf_swevent_enabled[event_id]);
-			event->destroy = sw_perf_event_destroy;
+	perf_swevent_start_hrtimer(event);
+
+	return 0;
+}
+
+static void task_clock_event_disable(struct perf_event *event)
+{
+	perf_swevent_cancel_hrtimer(event);
+	task_clock_event_update(event, event->ctx->time);
+
+}
+
+static void task_clock_event_read(struct perf_event *event)
+{
+	u64 time;
+
+	if (!in_nmi()) {
+		update_context_time(event->ctx);
+		time = event->ctx->time;
+	} else {
+		u64 now = perf_clock();
+		u64 delta = now - event->ctx->timestamp;
+		time = event->ctx->time + delta;
+	}
+
+	task_clock_event_update(event, time);
+}
+
+static int task_clock_event_init(struct perf_event *event)
+{
+	if (event->attr.type != PERF_TYPE_SOFTWARE)
+		return -ENOENT;
+
+	if (event->attr.config != PERF_COUNT_SW_TASK_CLOCK)
+		return -ENOENT;
+
+	return 0;
+}
+
+static struct pmu perf_task_clock = {
+	.event_init	= task_clock_event_init,
+	.enable		= task_clock_event_enable,
+	.disable	= task_clock_event_disable,
+	.read		= task_clock_event_read,
+};
+
+static LIST_HEAD(pmus);
+static DEFINE_SPINLOCK(pmus_lock);
+static struct srcu_struct pmus_srcu;
+
+int perf_pmu_register(struct pmu *pmu)
+{
+	spin_lock(&pmus_lock);
+	list_add_rcu(&pmu->entry, &pmus);
+	spin_unlock(&pmus_lock);
+
+	return 0;
+}
+
+void perf_pmu_unregister(struct pmu *pmu)
+{
+	spin_lock(&pmus_lock);
+	list_del_rcu(&pmu->entry);
+	spin_unlock(&pmus_lock);
+
+	synchronize_srcu(&pmus_srcu);
+}
+
+struct pmu *perf_init_event(struct perf_event *event)
+{
+	struct pmu *pmu;
+	int idx;
+
+	idx = srcu_read_lock(&pmus_srcu);
+	list_for_each_entry_rcu(pmu, &pmus, entry) {
+		int ret = pmu->event_init(event);
+		if (!ret)
+			break;
+		if (ret != -ENOENT) {
+			pmu = ERR_PTR(ret);
+			break;
 		}
-		pmu = &perf_ops_generic;
-		break;
 	}
+	srcu_read_unlock(&pmus_srcu, idx);
 
 	return pmu;
 }
@@ -4882,29 +4895,8 @@ perf_event_alloc(struct perf_event_attr 
 	if (attr->inherit && (attr->read_format & PERF_FORMAT_GROUP))
 		goto done;
 
-	switch (attr->type) {
-	case PERF_TYPE_RAW:
-	case PERF_TYPE_HARDWARE:
-	case PERF_TYPE_HW_CACHE:
-		pmu = hw_perf_event_init(event);
-		break;
-
-	case PERF_TYPE_SOFTWARE:
-		pmu = sw_perf_event_init(event);
-		break;
-
-	case PERF_TYPE_TRACEPOINT:
-		pmu = tp_perf_event_init(event);
-		break;
+	pmu = perf_init_event(event);
 
-	case PERF_TYPE_BREAKPOINT:
-		pmu = bp_perf_event_init(event);
-		break;
-
-
-	default:
-		break;
-	}
 done:
 	err = 0;
 	if (!pmu)
@@ -5743,15 +5735,15 @@ perf_cpu_notify(struct notifier_block *s
 {
 	unsigned int cpu = (long)hcpu;
 
-	switch (action) {
+	switch (action & ~CPU_TASK_FROZEN) {
 
 	case CPU_UP_PREPARE:
-	case CPU_UP_PREPARE_FROZEN:
+	case CPU_DOWN_FAILED:
 		perf_event_init_cpu(cpu);
 		break;
 
+	case CPU_UP_CANCELED:
 	case CPU_DOWN_PREPARE:
-	case CPU_DOWN_PREPARE_FROZEN:
 		perf_event_exit_cpu(cpu);
 		break;
 
@@ -5762,22 +5754,15 @@ perf_cpu_notify(struct notifier_block *s
 	return NOTIFY_OK;
 }
 
-/*
- * This has to have a higher priority than migration_notifier in sched.c.
- */
-static struct notifier_block __cpuinitdata perf_cpu_nb = {
-	.notifier_call		= perf_cpu_notify,
-	.priority		= 20,
-};
-
 void __init perf_event_init(void)
 {
 	perf_event_init_all_cpus();
-	perf_cpu_notify(&perf_cpu_nb, (unsigned long)CPU_UP_PREPARE,
-			(void *)(long)smp_processor_id());
-	perf_cpu_notify(&perf_cpu_nb, (unsigned long)CPU_ONLINE,
-			(void *)(long)smp_processor_id());
-	register_cpu_notifier(&perf_cpu_nb);
+	init_srcu_struct(&pmus_srcu);
+	perf_pmu_register(&perf_swevent);
+	perf_pmu_register(&perf_cpu_clock);
+	perf_pmu_register(&perf_task_clock);
+	perf_tp_register();
+	perf_cpu_notifier(perf_cpu_notify);
 }
 
 static ssize_t perf_show_reserve_percpu(struct sysdev_class *class,




* [RFC][PATCH 4/8] perf: Unindent labels
  2010-06-16 16:00 [RFC][PATCH 0/8] perf pmu interface Peter Zijlstra
                   ` (2 preceding siblings ...)
  2010-06-16 16:00 ` [RFC][PATCH 3/8] perf: register pmu implementations Peter Zijlstra
@ 2010-06-16 16:00 ` Peter Zijlstra
  2010-06-16 17:16   ` Frederic Weisbecker
  2010-06-16 16:00 ` [RFC][PATCH 5/8] perf: Reduce perf_disable() usage Peter Zijlstra
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2010-06-16 16:00 UTC (permalink / raw)
  To: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: perf-whitespace.patch --]
[-- Type: text/plain, Size: 4256 bytes --]

Fix up assorted annoying style bits

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/perf_event.c |   43 ++++++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 19 deletions(-)

Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -147,7 +147,7 @@ perf_lock_task_context(struct task_struc
 	struct perf_event_context *ctx;
 
 	rcu_read_lock();
- retry:
+retry:
 	ctx = rcu_dereference(task->perf_event_ctxp);
 	if (ctx) {
 		/*
@@ -601,7 +601,7 @@ void perf_event_disable(struct perf_even
 		return;
 	}
 
- retry:
+retry:
 	task_oncpu_function_call(task, __perf_event_disable, event);
 
 	raw_spin_lock_irq(&ctx->lock);
@@ -831,7 +831,7 @@ static void __perf_install_in_context(vo
 	if (!err && !ctx->task && cpuctx->max_pertask)
 		cpuctx->max_pertask--;
 
- unlock:
+unlock:
 	perf_enable();
 
 	raw_spin_unlock(&ctx->lock);
@@ -904,10 +904,12 @@ static void __perf_event_mark_enabled(st
 
 	event->state = PERF_EVENT_STATE_INACTIVE;
 	event->tstamp_enabled = ctx->time - event->total_time_enabled;
-	list_for_each_entry(sub, &event->sibling_list, group_entry)
-		if (sub->state >= PERF_EVENT_STATE_INACTIVE)
+	list_for_each_entry(sub, &event->sibling_list, group_entry) {
+		if (sub->state >= PERF_EVENT_STATE_INACTIVE) {
 			sub->tstamp_enabled =
 				ctx->time - sub->total_time_enabled;
+		}
+	}
 }
 
 /*
@@ -973,7 +975,7 @@ static void __perf_event_enable(void *in
 		}
 	}
 
- unlock:
+unlock:
 	raw_spin_unlock(&ctx->lock);
 }
 
@@ -1014,7 +1016,7 @@ void perf_event_enable(struct perf_event
 	if (event->state == PERF_EVENT_STATE_ERROR)
 		event->state = PERF_EVENT_STATE_OFF;
 
- retry:
+retry:
 	raw_spin_unlock_irq(&ctx->lock);
 	task_oncpu_function_call(task, __perf_event_enable, event);
 
@@ -1034,7 +1036,7 @@ void perf_event_enable(struct perf_event
 	if (event->state == PERF_EVENT_STATE_OFF)
 		__perf_event_mark_enabled(event, ctx);
 
- out:
+out:
 	raw_spin_unlock_irq(&ctx->lock);
 }
 
@@ -1074,17 +1076,19 @@ static void ctx_sched_out(struct perf_ev
 	if (!ctx->nr_active)
 		goto out_enable;
 
-	if (event_type & EVENT_PINNED)
+	if (event_type & EVENT_PINNED) {
 		list_for_each_entry(event, &ctx->pinned_groups, group_entry)
 			group_sched_out(event, cpuctx, ctx);
+	}
 
-	if (event_type & EVENT_FLEXIBLE)
+	if (event_type & EVENT_FLEXIBLE) {
 		list_for_each_entry(event, &ctx->flexible_groups, group_entry)
 			group_sched_out(event, cpuctx, ctx);
+	}
 
  out_enable:
 	perf_enable();
- out:
+out:
 	raw_spin_unlock(&ctx->lock);
 }
 
@@ -1323,9 +1327,10 @@ ctx_flexible_sched_in(struct perf_event_
 		if (event->cpu != -1 && event->cpu != smp_processor_id())
 			continue;
 
-		if (group_can_go_on(event, cpuctx, can_add_hw))
+		if (group_can_go_on(event, cpuctx, can_add_hw)) {
 			if (group_sched_in(event, cpuctx, ctx))
 				can_add_hw = 0;
+		}
 	}
 }
 
@@ -1355,7 +1360,7 @@ ctx_sched_in(struct perf_event_context *
 		ctx_flexible_sched_in(ctx, cpuctx);
 
 	perf_enable();
- out:
+out:
 	raw_spin_unlock(&ctx->lock);
 }
 
@@ -1696,7 +1701,7 @@ static void perf_event_enable_on_exec(st
 	raw_spin_unlock(&ctx->lock);
 
 	perf_event_task_sched_in(task);
- out:
+out:
 	local_irq_restore(flags);
 }
 
@@ -1825,7 +1830,7 @@ static struct perf_event_context *find_g
 	if (!ptrace_may_access(task, PTRACE_MODE_READ))
 		goto errout;
 
- retry:
+retry:
 	ctx = perf_lock_task_context(task, &flags);
 	if (ctx) {
 		unclone_ctx(ctx);
@@ -1853,7 +1858,7 @@ static struct perf_event_context *find_g
 	put_task_struct(task);
 	return ctx;
 
- errout:
+errout:
 	put_task_struct(task);
 	return ERR_PTR(err);
 }
@@ -3044,7 +3049,7 @@ again:
 	if (handle->wakeup != local_read(&buffer->wakeup))
 		perf_output_wakeup(handle);
 
- out:
+out:
 	preempt_enable();
 }
 
@@ -4347,7 +4352,7 @@ static int swevent_hlist_get_cpu(struct 
 		rcu_assign_pointer(cpuctx->swevent_hlist, hlist);
 	}
 	cpuctx->hlist_refcount++;
- exit:
+exit:
 	mutex_unlock(&cpuctx->hlist_mutex);
 
 	return err;
@@ -4372,7 +4377,7 @@ static int swevent_hlist_get(struct perf
 	put_online_cpus();
 
 	return 0;
- fail:
+fail:
 	for_each_possible_cpu(cpu) {
 		if (cpu == failed_cpu)
 			break;




* [RFC][PATCH 5/8] perf: Reduce perf_disable() usage
  2010-06-16 16:00 [RFC][PATCH 0/8] perf pmu interface Peter Zijlstra
                   ` (3 preceding siblings ...)
  2010-06-16 16:00 ` [RFC][PATCH 4/8] perf: Unindent labels Peter Zijlstra
@ 2010-06-16 16:00 ` Peter Zijlstra
  2010-06-16 16:52   ` Cyrill Gorcunov
  2010-06-16 16:00 ` [RFC][PATCH 6/8] perf: Per PMU disable Peter Zijlstra
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2010-06-16 16:00 UTC (permalink / raw)
  To: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: perf-less-disable-1.patch --]
[-- Type: text/plain, Size: 11310 bytes --]

Since the current perf_disable() usage is only an optimization, remove
it for now. This eases the removal of the weak hw_perf_enable
interface.

[ Some comments talk about NMI races, but I cannot find any such
  race. ]
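
The pattern being removed, condensed from the
__perf_event_remove_from_context() hunk below (the global disable only
brackets work that ctx->lock already serializes):

	raw_spin_lock(&ctx->lock);
	perf_disable();			/* dropped by this patch */

	event_sched_out(event, cpuctx, ctx);
	/* ... list and reservation bookkeeping ... */

	perf_enable();			/* dropped by this patch */
	raw_spin_unlock(&ctx->lock);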

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/powerpc/kernel/perf_event.c         |    3 ++
 arch/powerpc/kernel/perf_event_fsl_emb.c |    8 +++++-
 arch/sparc/kernel/perf_event.c           |    3 ++
 arch/x86/kernel/cpu/perf_event.c         |   22 +++++++++++-------
 include/linux/perf_event.h               |   20 ++++++++--------
 kernel/perf_event.c                      |   37 -------------------------------
 6 files changed, 37 insertions(+), 56 deletions(-)

Index: linux-2.6/include/linux/perf_event.h
===================================================================
--- linux-2.6.orig/include/linux/perf_event.h
+++ linux-2.6/include/linux/perf_event.h
@@ -562,26 +562,26 @@ struct pmu {
 	struct list_head		entry;
 
 	/*
-	 * Should return -ENOENT when the @event doesn't match this pmu
+	 * Should return -ENOENT when the @event doesn't match this PMU.
 	 */
 	int (*event_init)		(struct perf_event *event);
 
-	int (*enable)			(struct perf_event *event);
+	int  (*enable)			(struct perf_event *event);
 	void (*disable)			(struct perf_event *event);
-	int (*start)			(struct perf_event *event);
+	int  (*start)			(struct perf_event *event);
 	void (*stop)			(struct perf_event *event);
 	void (*read)			(struct perf_event *event);
 	void (*unthrottle)		(struct perf_event *event);
 
 	/*
-	 * Group events scheduling is treated as a transaction, add group
-	 * events as a whole and perform one schedulability test. If the test
-	 * fails, roll back the whole group
+	 * Group events scheduling is treated as a transaction, add
+	 * group events as a whole and perform one schedulability test.
+	 * If the test fails, roll back the whole group
 	 */
 
 	/*
-	 * Start the transaction, after this ->enable() doesn't need
-	 * to do schedulability tests.
+	 * Start the transaction, after this ->enable() doesn't need to
+	 * do schedulability tests.
 	 */
 	void (*start_txn)	(struct pmu *pmu);
 	/*
@@ -592,8 +592,8 @@ struct pmu {
 	 */
 	int  (*commit_txn)	(struct pmu *pmu);
 	/*
-	 * Will cancel the transaction, assumes ->disable() is called for
-	 * each successfull ->enable() during the transaction.
+	 * Will cancel the transaction, assumes ->disable() is called
+	 * for each successful ->enable() during the transaction.
 	 */
 	void (*cancel_txn)	(struct pmu *pmu);
 };
Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -460,11 +460,6 @@ static void __perf_event_remove_from_con
 		return;
 
 	raw_spin_lock(&ctx->lock);
-	/*
-	 * Protect the list operation against NMI by disabling the
-	 * events on a global level.
-	 */
-	perf_disable();
 
 	event_sched_out(event, cpuctx, ctx);
 
@@ -480,7 +475,6 @@ static void __perf_event_remove_from_con
 			    perf_max_events - perf_reserved_percpu);
 	}
 
-	perf_enable();
 	raw_spin_unlock(&ctx->lock);
 }
 
@@ -785,12 +779,6 @@ static void __perf_install_in_context(vo
 	ctx->is_active = 1;
 	update_context_time(ctx);
 
-	/*
-	 * Protect the list operation against NMI by disabling the
-	 * events on a global level. NOP for non NMI based events.
-	 */
-	perf_disable();
-
 	add_event_to_ctx(event, ctx);
 
 	if (event->cpu != -1 && event->cpu != smp_processor_id())
@@ -832,8 +820,6 @@ static void __perf_install_in_context(vo
 		cpuctx->max_pertask--;
 
 unlock:
-	perf_enable();
-
 	raw_spin_unlock(&ctx->lock);
 }
 
@@ -954,12 +940,10 @@ static void __perf_event_enable(void *in
 	if (!group_can_go_on(event, cpuctx, 1)) {
 		err = -EEXIST;
 	} else {
-		perf_disable();
 		if (event == leader)
 			err = group_sched_in(event, cpuctx, ctx);
 		else
 			err = event_sched_in(event, cpuctx, ctx);
-		perf_enable();
 	}
 
 	if (err) {
@@ -1072,9 +1056,8 @@ static void ctx_sched_out(struct perf_ev
 		goto out;
 	update_context_time(ctx);
 
-	perf_disable();
 	if (!ctx->nr_active)
-		goto out_enable;
+		goto out;
 
 	if (event_type & EVENT_PINNED) {
 		list_for_each_entry(event, &ctx->pinned_groups, group_entry)
@@ -1085,9 +1068,6 @@ static void ctx_sched_out(struct perf_ev
 		list_for_each_entry(event, &ctx->flexible_groups, group_entry)
 			group_sched_out(event, cpuctx, ctx);
 	}
-
- out_enable:
-	perf_enable();
 out:
 	raw_spin_unlock(&ctx->lock);
 }
@@ -1346,8 +1326,6 @@ ctx_sched_in(struct perf_event_context *
 
 	ctx->timestamp = perf_clock();
 
-	perf_disable();
-
 	/*
 	 * First go through the list and put on any pinned groups
 	 * in order to give them the best chance of going on.
@@ -1359,7 +1337,6 @@ ctx_sched_in(struct perf_event_context *
 	if (event_type & EVENT_FLEXIBLE)
 		ctx_flexible_sched_in(ctx, cpuctx);
 
-	perf_enable();
 out:
 	raw_spin_unlock(&ctx->lock);
 }
@@ -1407,8 +1384,6 @@ void perf_event_task_sched_in(struct tas
 	if (cpuctx->task_ctx == ctx)
 		return;
 
-	perf_disable();
-
 	/*
 	 * We want to keep the following priority order:
 	 * cpu pinned (that don't need to move), task pinned,
@@ -1421,8 +1396,6 @@ void perf_event_task_sched_in(struct tas
 	ctx_sched_in(ctx, cpuctx, EVENT_FLEXIBLE);
 
 	cpuctx->task_ctx = ctx;
-
-	perf_enable();
 }
 
 #define MAX_INTERRUPTS (~0ULL)
@@ -1537,11 +1510,9 @@ static void perf_adjust_period(struct pe
 	hwc->sample_period = sample_period;
 
 	if (local64_read(&hwc->period_left) > 8*sample_period) {
-		perf_disable();
 		perf_event_stop(event);
 		local64_set(&hwc->period_left, 0);
 		perf_event_start(event);
-		perf_enable();
 	}
 }
 
@@ -1570,15 +1541,12 @@ static void perf_ctx_adjust_freq(struct 
 		 */
 		if (interrupts == MAX_INTERRUPTS) {
 			perf_log_throttle(event, 1);
-			perf_disable();
 			event->pmu->unthrottle(event);
-			perf_enable();
 		}
 
 		if (!event->attr.freq || !event->attr.sample_freq)
 			continue;
 
-		perf_disable();
 		event->pmu->read(event);
 		now = local64_read(&event->count);
 		delta = now - hwc->freq_count_stamp;
@@ -1586,7 +1554,6 @@ static void perf_ctx_adjust_freq(struct 
 
 		if (delta > 0)
 			perf_adjust_period(event, TICK_NSEC, delta);
-		perf_enable();
 	}
 	raw_spin_unlock(&ctx->lock);
 }
@@ -1629,7 +1596,6 @@ void perf_event_task_tick(struct task_st
 	if (!rotate)
 		return;
 
-	perf_disable();
 	cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
 	if (ctx)
 		task_ctx_sched_out(ctx, EVENT_FLEXIBLE);
@@ -1641,7 +1607,6 @@ void perf_event_task_tick(struct task_st
 	cpu_ctx_sched_in(cpuctx, EVENT_FLEXIBLE);
 	if (ctx)
 		task_ctx_sched_in(curr, EVENT_FLEXIBLE);
-	perf_enable();
 }
 
 static int event_enable_on_exec(struct perf_event *event,
Index: linux-2.6/arch/powerpc/kernel/perf_event_fsl_emb.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/perf_event_fsl_emb.c
+++ linux-2.6/arch/powerpc/kernel/perf_event_fsl_emb.c
@@ -262,7 +262,7 @@ static int collect_events(struct perf_ev
 	return n;
 }
 
-/* perf must be disabled, context locked on entry */
+/* context locked on entry */
 static int fsl_emb_pmu_enable(struct perf_event *event)
 {
 	struct cpu_hw_events *cpuhw;
@@ -271,6 +271,7 @@ static int fsl_emb_pmu_enable(struct per
 	u64 val;
 	int i;
 
+	perf_disable();
 	cpuhw = &get_cpu_var(cpu_hw_events);
 
 	if (event->hw.config & FSL_EMB_EVENT_RESTRICTED)
@@ -310,15 +311,17 @@ static int fsl_emb_pmu_enable(struct per
 	ret = 0;
  out:
 	put_cpu_var(cpu_hw_events);
+	perf_enable();
 	return ret;
 }
 
-/* perf must be disabled, context locked on entry */
+/* context locked on entry */
 static void fsl_emb_pmu_disable(struct perf_event *event)
 {
 	struct cpu_hw_events *cpuhw;
 	int i = event->hw.idx;
 
+	perf_disable();
 	if (i < 0)
 		goto out;
 
@@ -346,6 +349,7 @@ static void fsl_emb_pmu_disable(struct p
 	cpuhw->n_events--;
 
  out:
+	perf_enable();
 	put_cpu_var(cpu_hw_events);
 }
 
Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -969,10 +969,11 @@ static int x86_pmu_enable(struct perf_ev
 
 	hwc = &event->hw;
 
+	perf_disable();
 	n0 = cpuc->n_events;
-	n = collect_events(cpuc, event, false);
-	if (n < 0)
-		return n;
+	ret = n = collect_events(cpuc, event, false);
+	if (ret < 0)
+		goto out;
 
 	/*
 	 * If group events scheduling transaction was started,
@@ -980,23 +981,26 @@ static int x86_pmu_enable(struct perf_ev
 	 * at commit time(->commit_txn) as a whole
 	 */
 	if (cpuc->group_flag & PERF_EVENT_TXN)
-		goto out;
+		goto done_collect;
 
 	ret = x86_pmu.schedule_events(cpuc, n, assign);
 	if (ret)
-		return ret;
+		goto out;
 	/*
 	 * copy new assignment, now we know it is possible
 	 * will be used by hw_perf_enable()
 	 */
 	memcpy(cpuc->assign, assign, n*sizeof(int));
 
-out:
+done_collect:
 	cpuc->n_events = n;
 	cpuc->n_added += n - n0;
 	cpuc->n_txn += n - n0;
 
-	return 0;
+	ret = 0;
+out:
+	perf_enable();
+	return ret;
 }
 
 static int x86_pmu_start(struct perf_event *event)
@@ -1399,6 +1403,7 @@ static void x86_pmu_start_txn(struct pmu
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 
+	perf_disable();
 	cpuc->group_flag |= PERF_EVENT_TXN;
 	cpuc->n_txn = 0;
 }
@@ -1418,6 +1423,7 @@ static void x86_pmu_cancel_txn(struct pm
 	 */
 	cpuc->n_added -= cpuc->n_txn;
 	cpuc->n_events -= cpuc->n_txn;
+	perf_enable();
 }
 
 /*
@@ -1447,7 +1453,7 @@ static int x86_pmu_commit_txn(struct pmu
 	memcpy(cpuc->assign, assign, n*sizeof(int));
 
 	cpuc->group_flag &= ~PERF_EVENT_TXN;
-
+	perf_enable();
 	return 0;
 }
 
Index: linux-2.6/arch/powerpc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/perf_event.c
+++ linux-2.6/arch/powerpc/kernel/perf_event.c
@@ -858,6 +858,7 @@ void power_pmu_start_txn(struct pmu *pmu
 {
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
+	perf_disable();
 	cpuhw->group_flag |= PERF_EVENT_TXN;
 	cpuhw->n_txn_start = cpuhw->n_events;
 }
@@ -872,6 +873,7 @@ void power_pmu_cancel_txn(struct pmu *pm
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
 	cpuhw->group_flag &= ~PERF_EVENT_TXN;
+	perf_enable();
 }
 
 /*
@@ -898,6 +900,7 @@ int power_pmu_commit_txn(struct pmu *pmu
 		cpuhw->event[i]->hw.config = cpuhw->events[i];
 
 	cpuhw->group_flag &= ~PERF_EVENT_TXN;
+	perf_enable();
 	return 0;
 }
 
Index: linux-2.6/arch/sparc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/perf_event.c
+++ linux-2.6/arch/sparc/kernel/perf_event.c
@@ -1102,6 +1102,7 @@ static void sparc_pmu_start_txn(struct p
 {
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
+	perf_disable();
 	cpuhw->group_flag |= PERF_EVENT_TXN;
 }
 
@@ -1115,6 +1116,7 @@ static void sparc_pmu_cancel_txn(struct 
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
 	cpuhw->group_flag &= ~PERF_EVENT_TXN;
+	perf_enable();
 }
 
 /*
@@ -1138,6 +1140,7 @@ static int sparc_pmu_commit_txn(struct p
 		return -EAGAIN;
 
 	cpuc->group_flag &= ~PERF_EVENT_TXN;
+	perf_enable();
 	return 0;
 }
 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC][PATCH 6/8] perf: Per PMU disable
  2010-06-16 16:00 [RFC][PATCH 0/8] perf pmu interface Peter Zijlstra
                   ` (4 preceding siblings ...)
  2010-06-16 16:00 ` [RFC][PATCH 5/8] perf: Reduce perf_disable() usage Peter Zijlstra
@ 2010-06-16 16:00 ` Peter Zijlstra
  2010-06-16 17:48   ` Robert Richter
  2010-06-18  2:14   ` Frederic Weisbecker
  2010-06-16 16:00 ` [RFC][PATCH 7/8] perf: Default PMU ops Peter Zijlstra
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 32+ messages in thread
From: Peter Zijlstra @ 2010-06-16 16:00 UTC (permalink / raw)
  To: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: perf-pmu-disable.patch --]
[-- Type: text/plain, Size: 15537 bytes --]

Changes perf_disable() and perf_enable() into perf_disable_pmu() and
perf_enable_pmu(), which operate on a specific struct pmu and keep the
disable nesting count per-CPU per-PMU.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/arm/kernel/perf_event.c             |   24 ++++++++++-----------
 arch/powerpc/kernel/perf_event.c         |   23 +++++++++++---------
 arch/powerpc/kernel/perf_event_fsl_emb.c |   18 +++++++++-------
 arch/sh/kernel/perf_event.c              |   34 ++++++++++++++++---------------
 arch/sparc/kernel/perf_event.c           |   20 ++++++++++--------
 arch/x86/kernel/cpu/perf_event.c         |   16 ++++++++------
 include/linux/perf_event.h               |   13 ++++++-----
 kernel/perf_event.c                      |   30 ++++++++++++++++-----------
 8 files changed, 98 insertions(+), 80 deletions(-)
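
For illustration only, not part of the patch: the weak global
hw_perf_{enable,disable}() hooks become methods of the pmu, and the
nesting count moves into per-CPU storage attached to that pmu. A driver
wires it up roughly like this, the "foo" names are made up:

	static void foo_pmu_pmu_disable(struct pmu *pmu)
	{
		/* stop all counters of this PMU */
	}

	static void foo_pmu_pmu_enable(struct pmu *pmu)
	{
		/* re-program and restart the counters */
	}

	static struct pmu foo_pmu = {
		.pmu_enable	= foo_pmu_pmu_enable,
		.pmu_disable	= foo_pmu_pmu_disable,
		/* ... */
	};

and code that used to call perf_disable()/perf_enable() now brackets
against a specific pmu:

	perf_disable_pmu(event->pmu);
	/* ... reprogram event state ... */
	perf_enable_pmu(event->pmu);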

Index: linux-2.6/arch/arm/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/arm/kernel/perf_event.c
+++ linux-2.6/arch/arm/kernel/perf_event.c
@@ -529,16 +529,7 @@ static int armpmu_event_init(struct perf
 	return err;
 }
 
-static struct pmu pmu = {
-	.event_init = armpmu_event_init,
-	.enable	    = armpmu_enable,
-	.disable    = armpmu_disable,
-	.unthrottle = armpmu_unthrottle,
-	.read	    = armpmu_read,
-};
-
-void
-hw_perf_enable(void)
+static void armpmu_pmu_enable(struct pmu *pmu)
 {
 	/* Enable all of the perf events on hardware. */
 	int idx;
@@ -559,13 +550,22 @@ hw_perf_enable(void)
 	armpmu->start();
 }
 
-void
-hw_perf_disable(void)
+static void armpmu_pmu_disable(struct pmu *pmu)
 {
 	if (armpmu)
 		armpmu->stop();
 }
 
+static struct pmu pmu = {
+	.pmu_enable = armpmu_pmu_enable,
+	.pmu_disable = armpmu_pmu_disable,
+	.event_init = armpmu_event_init,
+	.enable	    = armpmu_enable,
+	.disable    = armpmu_disable,
+	.unthrottle = armpmu_unthrottle,
+	.read	    = armpmu_read,
+};
+
 /*
  * ARMv6 Performance counter handling code.
  *
Index: linux-2.6/arch/powerpc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/perf_event.c
+++ linux-2.6/arch/powerpc/kernel/perf_event.c
@@ -517,7 +517,7 @@ static void write_mmcr0(struct cpu_hw_ev
  * Disable all events to prevent PMU interrupts and to allow
  * events to be added or removed.
  */
-void hw_perf_disable(void)
+static void power_pmu_pmu_disable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw;
 	unsigned long flags;
@@ -565,7 +565,7 @@ void hw_perf_disable(void)
  * If we were previously disabled and events were added, then
  * put the new config on the PMU.
  */
-void hw_perf_enable(void)
+static void power_pmu_pmu_enable(struct pmu *pmu)
 {
 	struct perf_event *event;
 	struct cpu_hw_events *cpuhw;
@@ -735,7 +735,7 @@ static int power_pmu_enable(struct perf_
 	int ret = -EAGAIN;
 
 	local_irq_save(flags);
-	perf_disable();
+	perf_disable_pmu(event->pmu);
 
 	/*
 	 * Add the event to the list (if there is room)
@@ -769,7 +769,7 @@ nocheck:
 
 	ret = 0;
  out:
-	perf_enable();
+	perf_enable_pmu(event->pmu);
 	local_irq_restore(flags);
 	return ret;
 }
@@ -784,7 +784,7 @@ static void power_pmu_disable(struct per
 	unsigned long flags;
 
 	local_irq_save(flags);
-	perf_disable();
+	perf_disable_pmu(event->pmu);
 
 	power_pmu_read(event);
 
@@ -818,7 +818,7 @@ static void power_pmu_disable(struct per
 		cpuhw->mmcr[0] &= ~(MMCR0_PMXE | MMCR0_FCECE);
 	}
 
-	perf_enable();
+	perf_enable_pmu(event->pmu);
 	local_irq_restore(flags);
 }
 
@@ -834,7 +834,7 @@ static void power_pmu_unthrottle(struct 
 	if (!event->hw.idx || !event->hw.sample_period)
 		return;
 	local_irq_save(flags);
-	perf_disable();
+	perf_disable_pmu(event->pmu);
 	power_pmu_read(event);
 	left = event->hw.sample_period;
 	event->hw.last_period = left;
@@ -845,7 +845,7 @@ static void power_pmu_unthrottle(struct 
 	local64_set(&event->hw.prev_count, val);
 	local64_set(&event->hw.period_left, left);
 	perf_event_update_userpage(event);
-	perf_enable();
+	perf_enable_pmu(event->pmu);
 	local_irq_restore(flags);
 }
 
@@ -858,7 +858,7 @@ void power_pmu_start_txn(struct pmu *pmu
 {
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
-	perf_disable();
+	perf_disable_pmu(pmu);
 	cpuhw->group_flag |= PERF_EVENT_TXN;
 	cpuhw->n_txn_start = cpuhw->n_events;
 }
@@ -873,7 +873,7 @@ void power_pmu_cancel_txn(struct pmu *pm
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
 	cpuhw->group_flag &= ~PERF_EVENT_TXN;
-	perf_enable();
+	perf_enable_pmu(pmu);
 }
 
 /*
@@ -1128,6 +1128,9 @@ static int power_pmu_event_init(struct p
 }
 
 struct pmu power_pmu = {
+	.pmu_enable	= power_pmu_pmu_enable,
+	.pmu_disable	= power_pmu_pmu_disable,
+	.event_init	= power_pmu_event_init,
 	.enable		= power_pmu_enable,
 	.disable	= power_pmu_disable,
 	.read		= power_pmu_read,
Index: linux-2.6/arch/powerpc/kernel/perf_event_fsl_emb.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/perf_event_fsl_emb.c
+++ linux-2.6/arch/powerpc/kernel/perf_event_fsl_emb.c
@@ -177,7 +177,7 @@ static void fsl_emb_pmu_read(struct perf
  * Disable all events to prevent PMU interrupts and to allow
  * events to be added or removed.
  */
-void hw_perf_disable(void)
+static void fsl_emb_pmu_pmu_disable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw;
 	unsigned long flags;
@@ -216,7 +216,7 @@ void hw_perf_disable(void)
  * If we were previously disabled and events were added, then
  * put the new config on the PMU.
  */
-void hw_perf_enable(void)
+static void fsl_emb_pmu_pmu_enable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw;
 	unsigned long flags;
@@ -271,7 +271,7 @@ static int fsl_emb_pmu_enable(struct per
 	u64 val;
 	int i;
 
-	perf_disable();
+	perf_disable_pmu(event->pmu);
 	cpuhw = &get_cpu_var(cpu_hw_events);
 
 	if (event->hw.config & FSL_EMB_EVENT_RESTRICTED)
@@ -311,7 +311,7 @@ static int fsl_emb_pmu_enable(struct per
 	ret = 0;
  out:
 	put_cpu_var(cpu_hw_events);
-	perf_enable();
+	perf_enable_pmu(event->pmu);
 	return ret;
 }
 
@@ -321,7 +321,7 @@ static void fsl_emb_pmu_disable(struct p
 	struct cpu_hw_events *cpuhw;
 	int i = event->hw.idx;
 
-	perf_disable();
+	perf_disable_pmu(event->pmu);
 	if (i < 0)
 		goto out;
 
@@ -349,7 +349,7 @@ static void fsl_emb_pmu_disable(struct p
 	cpuhw->n_events--;
 
  out:
-	perf_enable();
+	perf_enable_pmu(event->pmu);
 	put_cpu_var(cpu_hw_events);
 }
 
@@ -367,7 +367,7 @@ static void fsl_emb_pmu_unthrottle(struc
 	if (event->hw.idx < 0 || !event->hw.sample_period)
 		return;
 	local_irq_save(flags);
-	perf_disable();
+	perf_disable_pmu(event->pmu);
 	fsl_emb_pmu_read(event);
 	left = event->hw.sample_period;
 	event->hw.last_period = left;
@@ -378,7 +378,7 @@ static void fsl_emb_pmu_unthrottle(struc
 	atomic64_set(&event->hw.prev_count, val);
 	atomic64_set(&event->hw.period_left, left);
 	perf_event_update_userpage(event);
-	perf_enable();
+	perf_enable_pmu(event->pmu);
 	local_irq_restore(flags);
 }
 
@@ -524,6 +524,8 @@ static int fsl_emb_pmu_event_init(struct
 }
 
 static struct pmu fsl_emb_pmu = {
+	.pmu_enable	= fsl_emb_pmu_pmu_enable,
+	.pmu_disable	= fsl_emb_pmu_pmu_disable,
 	.event_init	= fsl_emb_pmu_event_init,
 	.enable		= fsl_emb_pmu_enable,
 	.disable	= fsl_emb_pmu_disable,
Index: linux-2.6/arch/sh/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/sh/kernel/perf_event.c
+++ linux-2.6/arch/sh/kernel/perf_event.c
@@ -268,7 +268,25 @@ static int sh_pmu_event_init(struct perf_
 	return err;
 }
 
+static void sh_pmu_pmu_enable(struct pmu *pmu)
+{
+	if (!sh_pmu_initialized())
+		return;
+
+	sh_pmu->enable_all();
+}
+
+static void sh_pmu_pmu_disable(struct pmu *pmu)
+{
+	if (!sh_pmu_initialized())
+		return;
+
+	sh_pmu->disable_all();
+}
+
 static struct pmu pmu = {
+	.pmu_enable	= sh_pmu_pmu_enable,
+	.pmu_disable	= sh_pmu_pmu_disable,
 	.event_init	= sh_pmu_event_init,
 	.enable		= sh_pmu_enable,
 	.disable	= sh_pmu_disable,
@@ -299,22 +317,6 @@ sh_pmu_notifier(struct notifier_block *s
 	return NOTIFY_OK;
 }
 
-void hw_perf_enable(void)
-{
-	if (!sh_pmu_initialized())
-		return;
-
-	sh_pmu->enable_all();
-}
-
-void hw_perf_disable(void)
-{
-	if (!sh_pmu_initialized())
-		return;
-
-	sh_pmu->disable_all();
-}
-
 int __cpuinit register_sh_pmu(struct sh_pmu *pmu)
 {
 	if (sh_pmu)
Index: linux-2.6/arch/sparc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/perf_event.c
+++ linux-2.6/arch/sparc/kernel/perf_event.c
@@ -663,7 +663,7 @@ out:
 	return pcr;
 }
 
-void hw_perf_enable(void)
+static void sparc_pmu_pmu_enable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	u64 pcr;
@@ -690,7 +690,7 @@ void hw_perf_enable(void)
 	pcr_ops->write(cpuc->pcr);
 }
 
-void hw_perf_disable(void)
+static void sparc_pmu_pmu_disable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	u64 val;
@@ -717,7 +717,7 @@ static void sparc_pmu_disable(struct per
 	int i;
 
 	local_irq_save(flags);
-	perf_disable();
+	perf_disable_pmu(event->pmu);
 
 	for (i = 0; i < cpuc->n_events; i++) {
 		if (event == cpuc->event[i]) {
@@ -747,7 +747,7 @@ static void sparc_pmu_disable(struct per
 		}
 	}
 
-	perf_enable();
+	perf_enable_pmu(event->pmu);
 	local_irq_restore(flags);
 }
 
@@ -990,7 +990,7 @@ static int sparc_pmu_enable(struct perf_
 	unsigned long flags;
 
 	local_irq_save(flags);
-	perf_disable();
+	perf_disable_pmu(event->pmu);
 
 	n0 = cpuc->n_events;
 	if (n0 >= perf_max_events)
@@ -1019,7 +1019,7 @@ nocheck:
 
 	ret = 0;
 out:
-	perf_enable();
+	perf_enable_pmu(event->pmu);
 	local_irq_restore(flags);
 	return ret;
 }
@@ -1102,7 +1102,7 @@ static void sparc_pmu_start_txn(struct p
 {
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
-	perf_disable();
+	perf_disable_pmu(pmu);
 	cpuhw->group_flag |= PERF_EVENT_TXN;
 }
 
@@ -1116,7 +1116,7 @@ static void sparc_pmu_cancel_txn(struct 
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
 	cpuhw->group_flag &= ~PERF_EVENT_TXN;
-	perf_enable();
+	perf_enable_pmu(pmu);
 }
 
 /*
@@ -1140,11 +1140,13 @@ static int sparc_pmu_commit_txn(struct p
 		return -EAGAIN;
 
 	cpuc->group_flag &= ~PERF_EVENT_TXN;
-	perf_enable();
+	perf_enable_pmu(pmu);
 	return 0;
 }
 
 static struct pmu pmu = {
+	.pmu_enable	= sparc_pmu_pmu_enable,
+	.pmu_disable	= sparc_pmu_pmu_disable,
 	.event_init	= sparc_pmu_event_init,
 	.enable		= sparc_pmu_enable,
 	.disable	= sparc_pmu_disable,
Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -583,7 +583,7 @@ static void x86_pmu_disable_all(void)
 	}
 }
 
-void hw_perf_disable(void)
+static void x86_pmu_pmu_disable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 
@@ -803,7 +803,7 @@ static inline int match_prev_assignment(
 static int x86_pmu_start(struct perf_event *event);
 static void x86_pmu_stop(struct perf_event *event);
 
-void hw_perf_enable(void)
+static void x86_pmu_pmu_enable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	struct perf_event *event;
@@ -969,7 +969,7 @@ static int x86_pmu_enable(struct perf_ev
 
 	hwc = &event->hw;
 
-	perf_disable();
+	perf_disable_pmu(event->pmu);
 	n0 = cpuc->n_events;
 	ret = n = collect_events(cpuc, event, false);
 	if (ret < 0)
@@ -999,7 +999,7 @@ done_collect:
 
 	ret = 0;
 out:
-	perf_enable();
+	perf_enable_pmu(event->pmu);
 	return ret;
 }
 
@@ -1403,7 +1403,7 @@ static void x86_pmu_start_txn(struct pmu
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 
-	perf_disable();
+	perf_disable_pmu(pmu);
 	cpuc->group_flag |= PERF_EVENT_TXN;
 	cpuc->n_txn = 0;
 }
@@ -1423,7 +1423,7 @@ static void x86_pmu_cancel_txn(struct pm
 	 */
 	cpuc->n_added -= cpuc->n_txn;
 	cpuc->n_events -= cpuc->n_txn;
-	perf_enable();
+	perf_enable_pmu(pmu);
 }
 
 /*
@@ -1453,7 +1453,7 @@ static int x86_pmu_commit_txn(struct pmu
 	memcpy(cpuc->assign, assign, n*sizeof(int));
 
 	cpuc->group_flag &= ~PERF_EVENT_TXN;
-	perf_enable();
+	perf_enable_pmu(pmu);
 	return 0;
 }
 
@@ -1572,6 +1572,8 @@ int x86_pmu_event_init(struct perf_event
 }
 
 static struct pmu pmu = {
+	.pmu_enable	= x86_pmu_pmu_enable,
+	.pmu_disable	= x86_pmu_pmu_disable,
 	.event_init	= x86_pmu_event_init
 	.enable		= x86_pmu_enable,
 	.disable	= x86_pmu_disable,
Index: linux-2.6/include/linux/perf_event.h
===================================================================
--- linux-2.6.orig/include/linux/perf_event.h
+++ linux-2.6/include/linux/perf_event.h
@@ -561,6 +561,11 @@ struct perf_event;
 struct pmu {
 	struct list_head		entry;
 
+	int				*pmu_disable_count;
+
+	void (*pmu_enable)		(struct pmu *pmu);
+	void (*pmu_disable)		(struct pmu *pmu);
+
 	/*
 	 * Should return -ENOENT when the @event doesn't match this PMU.
 	 */
@@ -864,10 +869,8 @@ extern void perf_event_free_task(struct 
 extern void set_perf_event_pending(void);
 extern void perf_event_do_pending(void);
 extern void perf_event_print_debug(void);
-extern void __perf_disable(void);
-extern bool __perf_enable(void);
-extern void perf_disable(void);
-extern void perf_enable(void);
+extern void perf_disable_pmu(struct pmu *pmu);
+extern void perf_enable_pmu(struct pmu *pmu);
 extern int perf_event_task_disable(void);
 extern int perf_event_task_enable(void);
 extern void perf_event_update_userpage(struct perf_event *event);
@@ -1056,8 +1059,6 @@ static inline void perf_event_exit_task(
 static inline void perf_event_free_task(struct task_struct *task)	{ }
 static inline void perf_event_do_pending(void)				{ }
 static inline void perf_event_print_debug(void)				{ }
-static inline void perf_disable(void)					{ }
-static inline void perf_enable(void)					{ }
 static inline int perf_event_task_disable(void)				{ return -EINVAL; }
 static inline int perf_event_task_enable(void)				{ return -EINVAL; }
 
Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -71,23 +71,20 @@ static atomic64_t perf_event_id;
  */
 static DEFINE_SPINLOCK(perf_resource_lock);
 
-void __weak hw_perf_disable(void)		{ barrier(); }
-void __weak hw_perf_enable(void)		{ barrier(); }
-
 void __weak perf_event_print_debug(void)	{ }
 
-static DEFINE_PER_CPU(int, perf_disable_count);
-
-void perf_disable(void)
+void perf_disable_pmu(struct pmu *pmu)
 {
-	if (!__get_cpu_var(perf_disable_count)++)
-		hw_perf_disable();
+	int *count = this_cpu_ptr(pmu->pmu_disable_count);
+	if (!(*count)++)
+		pmu->pmu_disable(pmu);
 }
 
-void perf_enable(void)
+void perf_enable_pmu(struct pmu *pmu)
 {
-	if (!--__get_cpu_var(perf_disable_count))
-		hw_perf_enable();
+	int *count = this_cpu_ptr(pmu->pmu_disable_count);
+	if (!--(*count))
+		pmu->pmu_enable(pmu);
 }
 
 static void get_ctx(struct perf_event_context *ctx)
@@ -4758,16 +4755,25 @@ static struct srcu_struct pmus_srcu;
 
 int perf_pmu_register(struct pmu *pmu)
 {
+	int ret;
+
 	spin_lock(&pmus_lock);
+	ret = -ENOMEM;
+	pmu->pmu_disable_count = alloc_percpu(int);
+	if (!pmu->pmu_disable_count)
+		goto unlock;
 	list_add_rcu(&pmu->entry, &pmus);
+	ret = 0;
+unlock:
 	spin_unlock(&pmus_lock);
 
-	return 0;
+	return ret;
 }
 
 void perf_pmu_unregister(struct pmu *pmu)
 {
 	spin_lock(&pmus_lock);
+	free_percpu(pmu->pmu_disable_count);
 	list_del_rcu(&pmu->entry);
 	spin_unlock(&pmus_lock);
 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC][PATCH 7/8] perf: Default PMU ops
  2010-06-16 16:00 [RFC][PATCH 0/8] perf pmu interface Peter Zijlstra
                   ` (5 preceding siblings ...)
  2010-06-16 16:00 ` [RFC][PATCH 6/8] perf: Per PMU disable Peter Zijlstra
@ 2010-06-16 16:00 ` Peter Zijlstra
  2010-06-16 16:00 ` [RFC][PATCH 8/8] perf: Rework the PMU methods Peter Zijlstra
  2010-06-16 18:19 ` [RFC][PATCH 0/8] perf pmu interface Peter Zijlstra
  8 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2010-06-16 16:00 UTC (permalink / raw)
  To: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: perf-default-ops.patch --]
[-- Type: text/plain, Size: 2216 bytes --]

Provide default implementations for the pmu txn methods; this allows
us to remove some conditional code.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/perf_event.c |   45 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 35 insertions(+), 10 deletions(-)
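
For illustration only, not part of the patch: with the defaults
installed by perf_pmu_register(), a simple pmu can leave the txn
methods and the pmu_{enable,disable} hooks out entirely. A sketch,
the "foo" names are made up:

	static struct pmu foo_pmu = {
		.event_init	= foo_event_init,
		.enable		= foo_enable,
		.disable	= foo_disable,
		.read		= foo_read,
		/*
		 * .pmu_enable/.pmu_disable default to nops;
		 * .start_txn/.commit_txn/.cancel_txn default to a
		 * perf_disable_pmu()/perf_enable_pmu() pair.
		 */
	};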

Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -656,17 +656,11 @@ group_sched_in(struct perf_event *group_
 {
 	struct perf_event *event, *partial_group = NULL;
 	struct pmu *pmu = group_event->pmu;
-	bool txn = false;
 
 	if (group_event->state == PERF_EVENT_STATE_OFF)
 		return 0;
 
-	/* Check if group transaction availabe */
-	if (pmu->start_txn)
-		txn = true;
-
-	if (txn)
-		pmu->start_txn(pmu);
+	pmu->start_txn(pmu);
 
 	if (event_sched_in(group_event, cpuctx, ctx)) {
 		if (txn)
@@ -684,7 +678,7 @@ group_sched_in(struct perf_event *group_
 		}
 	}
 
-	if (!txn || !pmu->commit_txn(pmu))
+	if (!pmu->commit_txn(pmu))
 		return 0;
 
 group_error:
@@ -699,8 +693,7 @@ group_error:
 	}
 	event_sched_out(group_event, cpuctx, ctx);
 
-	if (txn)
-		pmu->cancel_txn(pmu);
+	pmu->cancel_txn(pmu);
 
 	return -EAGAIN;
 }
@@ -4748,6 +4741,26 @@ static struct list_head pmus;
 static DEFINE_SPINLOCK(pmus_lock);
 static struct srcu_struct pmus_srcu;
 
+static void perf_pmu_nop(struct pmu *pmu)
+{
+}
+
+static void perf_pmu_start_txn(struct pmu *pmu)
+{
+	perf_disable_pmu(pmu);
+}
+
+static int perf_pmu_commit_txn(struct pmu *pmu)
+{
+	perf_enable_pmu(pmu);
+	return 0;
+}
+
+static void perf_pmu_cancel_txn(struct pmu *pmu)
+{
+	perf_enable_pmu(pmu);
+}
+
 int perf_pmu_register(struct pmu *pmu)
 {
 	int ret;
@@ -4757,6 +4770,18 @@ int perf_pmu_register(struct pmu *pmu)
 	pmu->pmu_disable_count = alloc_percpu(int);
 	if (!pmu->pmu_disable_count)
 		goto unlock;
+
+	if (!pmu->pmu_enable) {
+		pmu->pmu_enable  = perf_pmu_nop;
+		pmu->pmu_disable = perf_pmu_nop;
+	}
+
+	if (!pmu->start_txn) {
+		pmu->start_txn  = perf_pmu_start_txn;
+		pmu->commit_txn = perf_pmu_commit_txn;
+		pmu->cancel_txn = perf_pmu_cancel_txn;
+	}
+
 	list_add_rcu(&pmu->entry, &pmus);
 	ret = 0;
 unlock:



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC][PATCH 8/8] perf: Rework the PMU methods
  2010-06-16 16:00 [RFC][PATCH 0/8] perf pmu interface Peter Zijlstra
                   ` (6 preceding siblings ...)
  2010-06-16 16:00 ` [RFC][PATCH 7/8] perf: Default PMU ops Peter Zijlstra
@ 2010-06-16 16:00 ` Peter Zijlstra
  2010-06-18  4:21   ` Frederic Weisbecker
  2010-06-16 18:19 ` [RFC][PATCH 0/8] perf pmu interface Peter Zijlstra
  8 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2010-06-16 16:00 UTC (permalink / raw)
  To: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: perf-change-ops.patch --]
[-- Type: text/plain, Size: 22956 bytes --]

Replace pmu::{enable,disable,start,stop,unthrottle} with
pmu::{add,del,start,stop}, all of which take a flags argument.

The new interface extends the capability to stop a counter while
keeping it scheduled on the PMU. We replace the throttled state with
the generic stopped state.

This also allows us to efficiently stop/start counters over certain
code paths (like IRQ handlers).

It also allows scheduling a counter without starting it, providing a
generic frozen state (useful for rotating stopped counters).

XXX: implement: ARM, SPARC, POWERPC

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/sh/kernel/perf_event.c      |   67 +++++++++++++-----
 arch/x86/kernel/cpu/perf_event.c |   93 +++++++++++++------------
 include/linux/perf_event.h       |   40 ++++++++---
 kernel/hw_breakpoint.c           |   33 ++++++++-
 kernel/perf_event.c              |  140 ++++++++++++++++++++-------------------
 kernel/trace/trace_event_perf.c  |    6 +
 6 files changed, 236 insertions(+), 143 deletions(-)
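
For illustration only, not part of the patch: the generic code ends up
driving a counter through the new methods roughly like this (lifted
from the event_sched_in(), perf_adjust_period() and unthrottle paths
below):

	/* put the counter on the PMU and start it */
	if (event->pmu->add(event, PERF_EF_START))
		return -EAGAIN;

	/* change the period without taking it off the PMU */
	event->pmu->stop(event, PERF_EF_UPDATE);
	local64_set(&event->hw.period_left, 0);
	event->pmu->start(event, PERF_EF_RELOAD);

	/* unthrottle: restart without reloading the period */
	event->pmu->start(event, 0);

	/* and take it off the PMU again */
	event->pmu->del(event, 0);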

Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -133,6 +133,12 @@ struct cpu_hw_events {
 	struct amd_nb		*amd_nb;
 };
 
+/*
+ * hw_perf_event::state flags
+ */
+#define HES_STOPPED	0x01
+#define HES_UPTODATE	0x02
+
 #define __EVENT_CONSTRAINT(c, n, m, w) {\
 	{ .idxmsk64 = (n) },		\
 	.code = (c),			\
@@ -800,8 +806,8 @@ static inline int match_prev_assignment(
 		hwc->last_tag == cpuc->tags[i];
 }
 
-static int x86_pmu_start(struct perf_event *event);
-static void x86_pmu_stop(struct perf_event *event);
+static int x86_pmu_start(struct perf_event *event, int flags);
+static void x86_pmu_stop(struct perf_event *event, int flags);
 
 static void x86_pmu_pmu_enable(struct pmu *pmu)
 {
@@ -839,7 +845,7 @@ static void x86_pmu_pmu_enable(struct pm
 			    match_prev_assignment(hwc, cpuc, i))
 				continue;
 
-			x86_pmu_stop(event);
+			x86_pmu_stop(event, PERF_EF_UPDATE);
 		}
 
 		for (i = 0; i < cpuc->n_events; i++) {
@@ -851,7 +857,8 @@ static void x86_pmu_pmu_enable(struct pm
 			else if (i < n_running)
 				continue;
 
-			x86_pmu_start(event);
+			if (!(event->hw.state & HES_STOPPED))
+				x86_pmu_start(event, PERF_EF_RELOAD);
 		}
 		cpuc->n_added = 0;
 		perf_events_lapic_init();
@@ -952,15 +959,12 @@ static void x86_pmu_enable_event(struct 
 }
 
 /*
- * activate a single event
+ * Add a single event to the PMU.
  *
  * The event is added to the group of enabled events
  * but only if it can be scehduled with existing events.
- *
- * Called with PMU disabled. If successful and return value 1,
- * then guaranteed to call perf_enable() and hw_perf_enable()
  */
-static int x86_pmu_enable(struct perf_event *event)
+static int x86_pmu_add(struct perf_event *event, int flags)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	struct hw_perf_event *hwc;
@@ -975,10 +979,14 @@ static int x86_pmu_enable(struct perf_ev
 	if (ret < 0)
 		goto out;
 
+	hwc->state = HES_UPTODATE;
+	if (!(flags & PERF_EF_START))
+		hwc->state |= HES_STOPPED;
+
 	/*
 	 * If group events scheduling transaction was started,
 	 * skip the schedulability test here, it will be peformed
-	 * at commit time(->commit_txn) as a whole
+	 * at commit time (->commit_txn) as a whole
 	 */
 	if (cpuc->group_flag & PERF_EVENT_TXN)
 		goto done_collect;
@@ -1003,27 +1011,24 @@ out:
 	return ret;
 }
 
-static int x86_pmu_start(struct perf_event *event)
+static void x86_pmu_start(struct perf_event *event, int flags)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	int idx = event->hw.idx;
 
-	if (idx == -1)
-		return -EAGAIN;
+	if (WARN_ON_ONCE(idx == -1))
+		return;
+
+	if (flags & PERF_EF_RELOAD) {
+		WARN_ON_ONCE(!(event->hw.state & HES_UPTODATE));
+		x86_perf_event_set_period(event);
+	}
 
-	x86_perf_event_set_period(event);
 	cpuc->events[idx] = event;
 	__set_bit(idx, cpuc->active_mask);
+	event->hw.state = 0;
 	x86_pmu.enable(event);
 	perf_event_update_userpage(event);
-
-	return 0;
-}
-
-static void x86_pmu_unthrottle(struct perf_event *event)
-{
-	int ret = x86_pmu_start(event);
-	WARN_ON_ONCE(ret);
 }
 
 void perf_event_print_debug(void)
@@ -1080,27 +1085,25 @@ void perf_event_print_debug(void)
 	local_irq_restore(flags);
 }
 
-static void x86_pmu_stop(struct perf_event *event)
+static void x86_pmu_stop(struct perf_event *event, int flags)
 {
-	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-	struct hw_perf_event *hwc = &event->hw;
-	int idx = hwc->idx;
-
 	if (!__test_and_clear_bit(idx, cpuc->active_mask))
-		return;
-
-	x86_pmu.disable(event);
-
-	/*
-	 * Drain the remaining delta count out of a event
-	 * that we are disabling:
-	 */
-	x86_perf_event_update(event);
+		x86_pmu.disable(event);
+		cpuc->events[idx] = NULL;
+		event->hw.state |= HES_STOPPED;
+	}
 
-	cpuc->events[idx] = NULL;
+	if ((flags & PERF_EF_UPDATE) && !(event->hw.state & HES_UPTODATE)) {
+		/*
+		 * Drain the remaining delta count out of a event
+		 * that we are disabling:
+		 */
+		x86_perf_event_update(event);
+		event->hw.state |= HES_UPTODATE;
+	}
 }
 
-static void x86_pmu_disable(struct perf_event *event)
+static void x86_pmu_del(struct perf_event *event, int flags)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	int i;
@@ -1113,7 +1116,7 @@ static void x86_pmu_disable(struct perf_
 	if (cpuc->group_flag & PERF_EVENT_TXN)
 		return;
 
-	x86_pmu_stop(event);
+	x86_pmu_stop(event, PERF_EF_UPDATE);
 
 	for (i = 0; i < cpuc->n_events; i++) {
 		if (event == cpuc->event_list[i]) {
@@ -1165,7 +1168,7 @@ static int x86_pmu_handle_irq(struct pt_
 			continue;
 
 		if (perf_event_overflow(event, 1, &data, regs))
-			x86_pmu_stop(event);
+			x86_pmu_stop(event, 0);
 	}
 
 	if (handled)
@@ -1574,13 +1577,15 @@ int x86_pmu_event_init(struct perf_event
 static struct pmu pmu = {
 	.pmu_enable	= x86_pmu_pmu_enable,
 	.pmu_disable	= x86_pmu_pmu_disable,
-	.event_init	= x86_pmu_event_init
-	.enable		= x86_pmu_enable,
-	.disable	= x86_pmu_disable,
+
+	.event_init	= x86_pmu_event_init,
+
+	.add		= x86_pmu_add,
+	.del		= x86_pmu_del,
 	.start		= x86_pmu_start,
 	.stop		= x86_pmu_stop,
 	.read		= x86_pmu_read,
-	.unthrottle	= x86_pmu_unthrottle,
+
 	.start_txn	= x86_pmu_start_txn,
 	.cancel_txn	= x86_pmu_cancel_txn,
 	.commit_txn	= x86_pmu_commit_txn,
Index: linux-2.6/include/linux/perf_event.h
===================================================================
--- linux-2.6.orig/include/linux/perf_event.h
+++ linux-2.6/include/linux/perf_event.h
@@ -529,7 +529,6 @@ struct hw_perf_event {
 			int		last_cpu;
 		};
 		struct { /* software */
-			s64		remaining;
 			struct hrtimer	hrtimer;
 		};
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
@@ -537,6 +536,7 @@ struct hw_perf_event {
 		struct arch_hw_breakpoint	info;
 #endif
 	};
+	int				state;
 	local64_t			prev_count;
 	u64				sample_period;
 	u64				last_period;
@@ -563,28 +563,48 @@ struct pmu {
 
 	int				*pmu_disable_count;
 
+	/*
+	 * Fully disable/enable this PMU, can be used to protect from the PMI
+	 * as well as for lazy/batch writing of the MSRs.
+	 */
 	void (*pmu_enable)		(struct pmu *pmu);
 	void (*pmu_disable)		(struct pmu *pmu);
 
 	/*
+	 * Try and initialize the event for this PMU.
 	 * Should return -ENOENT when the @event doesn't match this PMU.
 	 */
 	int (*event_init)		(struct perf_event *event);
 
-	int  (*enable)			(struct perf_event *event);
-	void (*disable)			(struct perf_event *event);
-	int  (*start)			(struct perf_event *event);
-	void (*stop)			(struct perf_event *event);
+#define PERF_EF_START	0x01		/* start the counter when adding    */
+#define PERF_EF_RELOAD	0x02		/* reload the counter when starting */
+#define PERF_EF_UPDATE	0x04		/* update the counter when stopping */
+
+	/*
+	 * Adds/Removes a counter to/from the PMU, can be done inside
+	 * a transaction, see the ->*_txn() methods.
+	 */
+	int  (*add)			(struct perf_event *event, int flags);
+	void (*del)			(struct perf_event *event, int flags);
+
+	/*
+	 * Starts/Stops a counter present on the PMU. The PMI handler
+	 * should stop the counter when perf_event_overflow() returns
+	 * !0. ->start() will be used to continue.
+	 */
+	void (*start)			(struct perf_event *event, int flags);
+	void (*stop)			(struct perf_event *event, int flags);
+
+	/*
+	 * Updates the counter value of the event.
+	 */
 	void (*read)			(struct perf_event *event);
-	void (*unthrottle)		(struct perf_event *event);
 
 	/*
 	 * Group events scheduling is treated as a transaction, add
 	 * group events as a whole and perform one schedulability test.
 	 * If the test fails, roll back the whole group
-	 */
-
-	/*
+	 *
 	 * Start the transaction, after this ->enable() doesn't need to
 	 * do schedulability tests.
 	 */
@@ -679,7 +699,7 @@ struct perf_event {
 	int				nr_siblings;
 	int				group_flags;
 	struct perf_event		*group_leader;
-	struct pmu		*pmu;
+	struct pmu			*pmu;
 
 	enum perf_event_active_state	state;
 	unsigned int			attach_state;
Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -404,7 +404,7 @@ event_sched_out(struct perf_event *event
 		event->state = PERF_EVENT_STATE_OFF;
 	}
 	event->tstamp_stopped = ctx->time;
-	event->pmu->disable(event);
+	event->pmu->del(event, 0);
 	event->oncpu = -1;
 
 	if (!is_software_event(event))
@@ -631,7 +631,7 @@ event_sched_in(struct perf_event *event,
 	 */
 	smp_wmb();
 
-	if (event->pmu->enable(event)) {
+	if (event->pmu->add(event, PERF_EF_START)) {
 		event->state = PERF_EVENT_STATE_INACTIVE;
 		event->oncpu = -1;
 		return -EAGAIN;
@@ -1465,22 +1465,6 @@ do {					\
 	return div64_u64(dividend, divisor);
 }
 
-static void perf_event_stop(struct perf_event *event)
-{
-	if (!event->pmu->stop)
-		return event->pmu->disable(event);
-
-	return event->pmu->stop(event);
-}
-
-static int perf_event_start(struct perf_event *event)
-{
-	if (!event->pmu->start)
-		return event->pmu->enable(event);
-
-	return event->pmu->start(event);
-}
-
 static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count)
 {
 	struct hw_perf_event *hwc = &event->hw;
@@ -1500,9 +1484,9 @@ static void perf_adjust_period(struct pe
 	hwc->sample_period = sample_period;
 
 	if (local64_read(&hwc->period_left) > 8*sample_period) {
-		perf_event_stop(event);
+		event->pmu->stop(event, PERF_EF_UPDATE);
 		local64_set(&hwc->period_left, 0);
-		perf_event_start(event);
+		event->pmu->start(event, PERF_EF_RELOAD);
 	}
 }
 
@@ -1531,7 +1515,7 @@ static void perf_ctx_adjust_freq(struct 
 		 */
 		if (interrupts == MAX_INTERRUPTS) {
 			perf_log_throttle(event, 1);
-			event->pmu->unthrottle(event);
+			event->pmu->start(event, 0);
 		}
 
 		if (!event->attr.freq || !event->attr.sample_freq)
@@ -3900,8 +3884,6 @@ static int __perf_event_overflow(struct 
 	struct hw_perf_event *hwc = &event->hw;
 	int ret = 0;
 
-	throttle = (throttle && event->pmu->unthrottle != NULL);
-
 	if (!throttle) {
 		hwc->interrupts++;
 	} else {
@@ -4069,6 +4051,9 @@ static int perf_swevent_match(struct per
 				struct perf_sample_data *data,
 				struct pt_regs *regs)
 {
+	if (event->hw.state)
+		return 0;
+
 	if (event->attr.type != type)
 		return 0;
 
@@ -4211,7 +4196,7 @@ static void perf_swevent_read(struct per
 {
 }
 
-static int perf_swevent_enable(struct perf_event *event)
+static int perf_swevent_add(struct perf_event *event, int flags)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	struct perf_cpu_context *cpuctx;
@@ -4224,6 +4209,8 @@ static int perf_swevent_enable(struct pe
 		perf_swevent_set_period(event);
 	}
 
+	hwc->state = !(flags & PERF_EF_START);
+
 	head = find_swevent_head(cpuctx, event);
 	if (WARN_ON_ONCE(!head))
 		return -EINVAL;
@@ -4233,13 +4220,19 @@ static int perf_swevent_enable(struct pe
 	return 0;
 }
 
-static void perf_swevent_disable(struct perf_event *event)
+static void perf_swevent_del(struct perf_event *event, int flags)
 {
 	hlist_del_rcu(&event->hlist_entry);
 }
 
-static void perf_swevent_void(struct perf_event *event)
+static void perf_swevent_start(struct perf_event *event, int flags)
+{
+	event->hw.state = 0;
+}
+
+static void perf_swevent_stop(struct perf_event *event, int flags)
 {
+	event->hw.state = 1;
 }
 
 /* Deref the hlist from the update side */
@@ -4393,12 +4386,11 @@ static int perf_swevent_int(struct perf_
 
 static struct pmu perf_swevent = {
 	.event_init	= perf_swevent_init,
-	.enable		= perf_swevent_enable,
-	.disable	= perf_swevent_disable,
-	.start		= perf_swevent_int,
-	.stop		= perf_swevent_void,
+	.add		= perf_swevent_add,
+	.del		= perf_swevent_del,
+	.start		= perf_swevent_start,
+	.stop		= perf_swevent_stop,
 	.read		= perf_swevent_read,
-	.unthrottle	= perf_swevent_void, /* hwc->interrupts already reset */
 };
 
 #ifdef CONFIG_EVENT_TRACING
@@ -4485,12 +4477,11 @@ static int perf_tp_event_init(struct per
 
 static struct pmu perf_tracepoint = {
 	.event_init	= perf_tp_event_init,
-	.enable		= perf_trace_enable,
-	.disable	= perf_trace_disable,
-	.start		= perf_swevent_int,
-	.stop		= perf_swevent_void,
+	.add		= perf_trace_add,
+	.del		= perf_trace_del,
+	.start		= perf_swevent_start,
+	.stop		= perf_swevent_stop,
 	.read		= perf_swevent_read,
-	.unthrottle	= perf_swevent_void,
 };
 
 static inline void perf_tp_register(void)
@@ -4546,7 +4537,7 @@ void perf_bp_event(struct perf_event *bp
 
 	perf_sample_data_init(&sample, bp->attr.bp_addr);
 
-	if (!perf_exclude_event(bp, regs))
+	if (!bp->hw.state && !perf_exclude_event(bp, regs))
 		perf_swevent_add(bp, 1, 1, &sample, regs);
 }
 #endif
@@ -4591,12 +4582,12 @@ static void perf_swevent_start_hrtimer(s
 	if (hwc->sample_period) {
 		u64 period;
 
-		if (hwc->remaining) {
-			if (hwc->remaining < 0)
+		if (local64_read(&hwc->period_left)) {
+			if (local64_read(&hwc->period_left) < 0)
 				period = 10000;
 			else
-				period = hwc->remaining;
-			hwc->remaining = 0;
+				period = local64_read(&hwc->period_left);
+			local64_set(&hwc->period_left, 0);
 		} else {
 			period = max_t(u64, 10000, hwc->sample_period);
 		}
@@ -4611,8 +4602,8 @@ static void perf_swevent_cancel_hrtimer(
 	struct hw_perf_event *hwc = &event->hw;
 
 	if (hwc->sample_period) {
-		ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
-		hwc->remaining = ktime_to_ns(remaining);
+		ktime_t period_left = hrtimer_get_remaining(&hwc->hrtimer);
+		local64_set(&hwc->period_left, ktime_to_ns(period_left));
 
 		hrtimer_cancel(&hwc->hrtimer);
 	}
@@ -4624,32 +4615,39 @@ static void perf_swevent_cancel_hrtimer(
 
 static void cpu_clock_event_update(struct perf_event *event)
 {
-	int cpu = raw_smp_processor_id();
 	s64 prev;
 	u64 now;
 
-	now = cpu_clock(cpu);
+	now = local_clock();
 	prev = local64_xchg(&event->hw.prev_count, now);
 	local64_add(now - prev, &event->count);
 }
 
-static int cpu_clock_event_enable(struct perf_event *event)
+static void cpu_clock_event_start(struct perf_event *event, int flags)
 {
-	struct hw_perf_event *hwc = &event->hw;
-	int cpu = raw_smp_processor_id();
-
-	local64_set(&hwc->prev_count, cpu_clock(cpu));
+	local64_set(&event->hw.prev_count, local_clock());
 	perf_swevent_start_hrtimer(event);
-
-	return 0;
 }
 
-static void cpu_clock_event_disable(struct perf_event *event)
+static void cpu_clock_event_stop(struct perf_event *event, int flags)
 {
 	perf_swevent_cancel_hrtimer(event);
 	cpu_clock_event_update(event);
 }
 
+static int cpu_clock_event_add(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_START)
+		cpu_clock_event_start(event, flags);
+
+	return 0;
+}
+
+static void cpu_clock_event_del(struct perf_event *event, int flags)
+{
+	cpu_clock_event_stop(event, flags);
+}
+
 static void cpu_clock_event_read(struct perf_event *event)
 {
 	cpu_clock_event_update(event);
@@ -4668,8 +4666,10 @@ static int cpu_clock_event_init(struct p
 
 static struct pmu perf_cpu_clock = {
 	.event_init	= cpu_clock_event_init,
-	.enable		= cpu_clock_event_enable,
-	.disable	= cpu_clock_event_disable,
+	.add		= cpu_clock_event_add,
+	.del		= cpu_clock_event_del,
+	.start		= cpu_clock_event_start,
+	.stop		= cpu_clock_event_stop,
 	.read		= cpu_clock_event_read,
 };
 
@@ -4687,25 +4687,29 @@ static void task_clock_event_update(stru
 	local64_add(delta, &event->count);
 }
 
-static int task_clock_event_enable(struct perf_event *event)
+static void task_clock_event_start(struct perf_event *event, int flags)
 {
-	struct hw_perf_event *hwc = &event->hw;
-	u64 now;
-
-	now = event->ctx->time;
-
-	local64_set(&hwc->prev_count, now);
-
+	local64_set(&event->hw.prev_count, event->ctx->time);
 	perf_swevent_start_hrtimer(event);
-
-	return 0;
 }
 
-static void task_clock_event_disable(struct perf_event *event)
+static void task_clock_event_stop(struct perf_event *event, int flags)
 {
 	perf_swevent_cancel_hrtimer(event);
 	task_clock_event_update(event, event->ctx->time);
+}
 
+static int task_clock_event_add(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_START)
+		task_clock_event_start(event, flags);
+
+	return 0;
+}
+
+static void task_clock_event_del(struct perf_event *event, int flags)
+{
+	task_clock_event_stop(event, flags);
 }
 
 static void task_clock_event_read(struct perf_event *event)
@@ -4737,8 +4739,10 @@ static int task_clock_event_init(struct 
 
 static struct pmu perf_ops_clock = {
 	.event_init	= task_clock_event_init,
-	.enable		= task_clock_event_enable,
-	.disable	= task_clock_event_disable,
+	.add		= task_clock_event_add,
+	.del		= task_clock_event_del,
+	.start		= task_clock_event_start,
+	.stop		= task_clock_event_stop,
 	.read		= task_clock_event_read,
 };
 
Index: linux-2.6/kernel/trace/trace_event_perf.c
===================================================================
--- linux-2.6.orig/kernel/trace/trace_event_perf.c
+++ linux-2.6/kernel/trace/trace_event_perf.c
@@ -107,7 +107,7 @@ int perf_trace_init(struct perf_event *p
 	return ret;
 }
 
-int perf_trace_enable(struct perf_event *p_event)
+int perf_trace_add(struct perf_event *p_event, int flags)
 {
 	struct ftrace_event_call *tp_event = p_event->tp_event;
 	struct hlist_head *list;
@@ -116,13 +116,15 @@ int perf_trace_enable(struct perf_event 
 	if (WARN_ON_ONCE(!list))
 		return -EINVAL;
 
+	p_event->hw.state = !(flags & PERF_EF_START);
+
 	list = this_cpu_ptr(list);
 	hlist_add_head_rcu(&p_event->hlist_entry, list);
 
 	return 0;
 }
 
-void perf_trace_disable(struct perf_event *p_event)
+void perf_trace_del(struct perf_event *p_event, int flags)
 {
 	hlist_del_rcu(&p_event->hlist_entry);
 }
Index: linux-2.6/kernel/hw_breakpoint.c
===================================================================
--- linux-2.6.orig/kernel/hw_breakpoint.c
+++ linux-2.6/kernel/hw_breakpoint.c
@@ -570,10 +570,39 @@ static struct pmu *bp_perf_event_init(st
 	return 0;
 }
 
+static int hw_breakpoint_add(struct perf_event *event, int flags)
+{
+	int ret;
+
+	ret = arch_install_hw_breakpoint(event);
+	if (ret)
+		return ret;
+
+	event->hw.state = !(flags & PERF_EF_START);
+	return 0;
+}
+
+static void hw_breakpoint_del(struct perf_event *event, int flags)
+{
+	arch_uninstall_hw_breakpoint(event);
+}
+
+static void hw_breakpoint_start(struct perf_event *event, int flags)
+{
+	event->hw.state = 0;
+}
+
+static void hw_breakpoint_stop(struct perf_event *event, int flags)
+{
+	event->hw.state = 1;
+}
+
 static struct pmu perf_breakpoint = {
 	.event_init	= hw_breakpoint_event_init,
-	.enable		= arch_install_hw_breakpoint,
-	.disable	= arch_uninstall_hw_breakpoint,
+	.add		= hw_breakpoint_add,
+	.del		= hw_breakpoint_del,
+	.start		= hw_breakpoint_start,
+	.stop		= hw_breakpoint_stop,
 	.read		= hw_breakpoint_pmu_read,
 };
 
Index: linux-2.6/arch/sh/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/sh/kernel/perf_event.c
+++ linux-2.6/arch/sh/kernel/perf_event.c
@@ -30,9 +30,14 @@
 struct cpu_hw_events {
 	struct perf_event	*events[MAX_HWEVENTS];
 	unsigned long		used_mask[BITS_TO_LONGS(MAX_HWEVENTS)];
-	unsigned long		active_mask[BITS_TO_LONGS(MAX_HWEVENTS)];
 };
 
+/*
+ * hw_perf_event::state flags
+ */
+#define HES_STOPPED	0x01
+#define HES_UPTODATE	0x02
+
 DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
 
 static struct sh_pmu *sh_pmu __read_mostly;
@@ -206,46 +211,72 @@ again:
 	local64_add(delta, &event->count);
 }
 
-static void sh_pmu_disable(struct perf_event *event)
+static void sh_pmu_stop(struct perf_event *event, int flags)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 
-	clear_bit(idx, cpuc->active_mask);
-	sh_pmu->disable(hwc, idx);
+	if (!(event->hw.state & HES_STOPPED)) {
+		sh_pmu->disable(hwc, idx);
+		cpuc->events[idx] = NULL;
+		event->hw.state |= HES_STOPPED;
+	}
+
+	if ((flags & PERF_EF_UPDATE) && !(event->hw.state & HES_UPTODATE)) {
+		sh_perf_event_update(event, &event->hw, idx);
+		event->hw.state |= HES_UPTODATE;
+	}
+}
+
+static void sh_pmu_start(struct perf_event *event, int flags)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	if (WARN_ON_ONCE(idx == -1))
+		return;
 
-	barrier();
+	if (flags & PERF_EF_RELOAD)
+		WARN_ON_ONCE(!(event->hw.state & HES_UPTODATE));
+
+	cpuc->events[idx] = event;
+	event->hw.state = 0;
+	sh_pmu->enable(hwc, idx);
+}
 
-	sh_perf_event_update(event, &event->hw, idx);
+static void sh_pmu_del(struct perf_event *event, int flags)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 
-	cpuc->events[idx] = NULL;
-	clear_bit(idx, cpuc->used_mask);
+	sh_pmu_stop(event, PERF_EF_UPDATE);
+	__clear_bit(event->hw.idx, cpuc->used_mask);
 
 	perf_event_update_userpage(event);
 }
 
-static int sh_pmu_enable(struct perf_event *event)
+static int sh_pmu_add(struct perf_event *event, int flags)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 
-	if (test_and_set_bit(idx, cpuc->used_mask)) {
+	if (__test_and_set_bit(idx, cpuc->used_mask)) {
 		idx = find_first_zero_bit(cpuc->used_mask, sh_pmu->num_events);
 		if (idx == sh_pmu->num_events)
 			return -EAGAIN;
 
-		set_bit(idx, cpuc->used_mask);
 		hwc->idx = idx;
 	}
 
 	sh_pmu->disable(hwc, idx);
 
-	cpuc->events[idx] = event;
-	set_bit(idx, cpuc->active_mask);
-
-	sh_pmu->enable(hwc, idx);
+	event->hw.state = HES_UPTODATE;
+	if (!(flags & PERF_EF_START))
+		event->hw.state |= HES_STOPPED;
+	else
+		sh_pmu_start(event, PERF_EF_RELOAD);
 
 	perf_event_update_userpage(event);
 
@@ -288,8 +319,10 @@ static struct pmu pmu = {
 	.pmu_enable	= sh_pmu_pmu_enable,
 	.pmu_disable	= sh_pmu_pmu_disable,
 	.event_init	= sh_pmu_event_init,
-	.enable		= sh_pmu_enable,
-	.disable	= sh_pmu_disable,
+	.add		= sh_pmu_add,
+	.del		= sh_pmu_del,
+	.start		= sh_pmu_start,
+	.stop		= sh_pmu_stop,
 	.read		= sh_pmu_read,
 };
 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 3/8] perf: register pmu implementations
  2010-06-16 16:00 ` [RFC][PATCH 3/8] perf: register pmu implementations Peter Zijlstra
@ 2010-06-16 16:45   ` Robert Richter
  2010-06-16 17:03   ` Frederic Weisbecker
  2010-06-17 23:31   ` Paul Mackerras
  2 siblings, 0 replies; 32+ messages in thread
From: Robert Richter @ 2010-06-16 16:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Will Deacon, Paul Mundt,
	Frederic Weisbecker, Cyrill Gorcunov, Lin Ming, Yanmin,
	Deng-Cheng Zhu, David Miller, linux-kernel

On 16.06.10 12:00:30, Peter Zijlstra wrote:
> Simple registration interface for struct pmu, this provides the
> infrastructure for removing all the weak functions.
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  arch/arm/kernel/perf_event.c             |   36 +
>  arch/powerpc/kernel/perf_event.c         |   41 +-
>  arch/powerpc/kernel/perf_event_fsl_emb.c |   31 -
>  arch/sh/kernel/perf_event.c              |   19
>  arch/sparc/kernel/perf_event.c           |   13
>  arch/x86/kernel/cpu/perf_event.c         |   45 +-
>  include/linux/perf_event.h               |   10
>  kernel/hw_breakpoint.c                   |   35 +
>  kernel/perf_event.c                      |  597 +++++++++++++++----------------
>  9 files changed, 432 insertions(+), 395 deletions(-)

Yes, this makes the code much easier! Now it would be possible to
easily register multiple hw pmus.

Thanks Peter,

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 5/8] perf: Reduce perf_disable() usage
  2010-06-16 16:00 ` [RFC][PATCH 5/8] perf: Reduce perf_disable() usage Peter Zijlstra
@ 2010-06-16 16:52   ` Cyrill Gorcunov
  0 siblings, 0 replies; 32+ messages in thread
From: Cyrill Gorcunov @ 2010-06-16 16:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Lin Ming, Yanmin,
	Deng-Cheng Zhu, David Miller, linux-kernel

On Wed, Jun 16, 2010 at 06:00:32PM +0200, Peter Zijlstra wrote:
> Since the current perf_disable() usage is only an optimization, remove
> it for now. This eases the removal of the weak hw_perf_enable
> interface.
> 
> [ Some comments talk about NMI races, but I cannot find any such
>   things. ]
> 

The race was there and was fixed by commit 

| commit 34adc8062227f41b04ade0ff3fbd1dbe3002669e
| Author: Ingo Molnar <mingo@elte.hu>
| Date:   Wed May 20 20:13:28 2009 +0200

then you fixed it further in commit

| commit 3fb2b8ddcc6a7aa62af6bd2cb939edfd4c460506
| Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
| Date:   Mon Mar 8 13:51:01 2010 +0100

which I understand to mean we don't need perf_disable() here for sure.
Just for the record :)

	-- Cyrill

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 3/8] perf: register pmu implementations
  2010-06-16 16:00 ` [RFC][PATCH 3/8] perf: register pmu implementations Peter Zijlstra
  2010-06-16 16:45   ` Robert Richter
@ 2010-06-16 17:03   ` Frederic Weisbecker
  2010-06-16 17:48     ` Peter Zijlstra
  2010-06-17 23:31   ` Paul Mackerras
  2 siblings, 1 reply; 32+ messages in thread
From: Frederic Weisbecker @ 2010-06-16 17:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Wed, Jun 16, 2010 at 06:00:30PM +0200, Peter Zijlstra wrote:
> +static void bp_perf_event_destroy(struct perf_event *event)
> +{
> +	release_bp_slot(event);
> +}
> +
> +static struct pmu *bp_perf_event_init(struct perf_event *bp)
> +{
> +	int err;
> +
> +	if (event->attr.type != PERF_TYPE_BREAKPOINT)
> +		return -ENOENT;
> +
> +	err = register_perf_hw_breakpoint(bp);
> +	if (err)
> +		return err;
> +
> +	bp->destroy = bp_perf_event_destroy;
> +
> +	return 0;
> +}
> +
> +static struct pmu perf_breakpoint = {
> +	.event_init	= hw_breakpoint_event_init,



Should be bp_perf_event_init?



> +	.enable		= arch_install_hw_breakpoint,
> +	.disable	= arch_uninstall_hw_breakpoint,
> +	.read		= hw_breakpoint_pmu_read,
> +};
<snip>
> +static int perf_swevent_int(struct perf_event *event)
> +{
> +	if (event->attr.type != PERF_TYPE_SOFTWARE)
> +		return -ENOENT


perf_swevent_init() ?



> +int perf_pmu_register(struct pmu *pmu)
> +{
> +	spin_lock(&pmus_lock);
> +	list_add_rcu(&pmu->entry, &pmus);
> +	spin_unlock(&pmus_lock);
> +
> +	return 0;
> +}
> +
> +void perf_pmu_unregister(struct pmu *pmu)
> +{
> +	spin_lock(&pmus_lock);
> +	list_del_rcu(&pmu->entry);
> +	spin_unlock(&pmus_lock);
> +
> +	synchronize_srcu(&pmus_srcu);
> +}



Who needs this?



> +
> +struct pmu *perf_init_event(struct perf_event *event)
> +{
> +	struct pmu *pmu;
> +	int idx;
> +
> +	idx = srcu_read_lock(&pmus_srcu);
> +	list_for_each_entry_rcu(pmu, &pmus, entry) {
> +		int ret = pmu->event_init(event);
> +		if (!ret)
> +			break;
> +		if (ret != -ENOENT) {
> +			pmu = ERR_PTR(ret);
> +			break;
>  		}
> -		pmu = &perf_ops_generic;
> -		break;
>  	}
> +	srcu_read_unlock(&pmus_srcu, idx);



This could use a simple mutex instead of a spinlock + srcu_sync on
writer and srcu on reader.

That said, it doesn't matter much. What I don't understand is
why we need to synchronize the writers. Walking the list with
list_*_rcu() looks justified once we support boot events, but
until then...


For the rest of the patch, the whole idea is nice.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 4/8] perf: Unindent labels
  2010-06-16 16:00 ` [RFC][PATCH 4/8] perf: Unindent labels Peter Zijlstra
@ 2010-06-16 17:16   ` Frederic Weisbecker
  2010-06-16 17:48     ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Frederic Weisbecker @ 2010-06-16 17:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Wed, Jun 16, 2010 at 06:00:31PM +0200, Peter Zijlstra wrote:
> Fixup random annoying style bits
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  kernel/perf_event.c |   43 ++++++++++++++++++++++++-------------------
>  1 file changed, 24 insertions(+), 19 deletions(-)
> 
> Index: linux-2.6/kernel/perf_event.c
> ===================================================================
> --- linux-2.6.orig/kernel/perf_event.c
> +++ linux-2.6/kernel/perf_event.c
> @@ -147,7 +147,7 @@ perf_lock_task_context(struct task_struc
>  	struct perf_event_context *ctx;
>  
>  	rcu_read_lock();
> - retry:
> +retry:



The point of having a space before the label is that the label doesn't
appear as a function boundary in patch hunk headers.

This avoids patches that look like:

@@ -xxx +yyy @@ out:
-	blah


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 3/8] perf: register pmu implementations
  2010-06-16 17:03   ` Frederic Weisbecker
@ 2010-06-16 17:48     ` Peter Zijlstra
  2010-06-16 18:10       ` Peter Zijlstra
  2010-06-18  4:51       ` Frederic Weisbecker
  0 siblings, 2 replies; 32+ messages in thread
From: Peter Zijlstra @ 2010-06-16 17:48 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Wed, 2010-06-16 at 19:03 +0200, Frederic Weisbecker wrote:

> > +static struct pmu perf_breakpoint = {
> > +	.event_init	= hw_breakpoint_event_init,
> 
> 
> 
> Should be bp_perf_event_init?

Ah, yes, like I said, the compiler hasn't been near this yet..

> 
> > +	.enable		= arch_install_hw_breakpoint,
> > +	.disable	= arch_uninstall_hw_breakpoint,
> > +	.read		= hw_breakpoint_pmu_read,
> > +};
> <snip>
> > +static int perf_swevent_int(struct perf_event *event)
> > +{
> > +	if (event->attr.type != PERF_TYPE_SOFTWARE)
> > +		return -ENOENT
> 
> 
> perf_swevent_init() ?

copy/paste gone wild..

> > +void perf_pmu_unregister(struct pmu *pmu)
> > +{
> > +	spin_lock(&pmus_lock);
> > +	list_del_rcu(&pmu->entry);
> > +	spin_unlock(&pmus_lock);
> > +
> > +	synchronize_srcu(&pmus_srcu);
> > +}

> Who needs this?

Nobody yet.. 

> > +
> > +struct pmu *perf_init_event(struct perf_event *event)
> > +{
> > +	struct pmu *pmu;
> > +	int idx;
> > +
> > +	idx = srcu_read_lock(&pmus_srcu);
> > +	list_for_each_entry_rcu(pmu, &pmus, entry) {
> > +		int ret = pmu->event_init(event);
> > +		if (!ret)
> > +			break;
> > +		if (ret != -ENOENT) {
> > +			pmu = ERR_PTR(ret);
> > +			break;
> >  		}
> > -		pmu = &perf_ops_generic;
> > -		break;
> >  	}
> > +	srcu_read_unlock(&pmus_srcu, idx);
> 
> 
> 
> This could use a simple mutex instead of a spinlock + srcu_sync on
> writer and srcu on reader.

Right, that spinlock needs to be a mutex for sure; a later patch adds an
allocation under it.

But even with a mutex we need srcu_sync in there to sync against the
readers.

> That doesn't matter much that said. What I don't understand is
> why we need to synchronize the writers. Walking the list with
> list_*_rcu() looks justified once we support boot events, but
> until then...

Well, the typical unregister user would be a module; if you unregister
and then dealloc the struct pmu by unloading the module, a reader might
still see a reference to it if you don't synchronize_srcu() first.
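
For illustration, the modular case being discussed would pair the two
calls roughly like this (a sketch only, built on the interfaces quoted
above; my_pmu and its callbacks are made up):

static int my_pmu_event_init(struct perf_event *event)
{
	/* claim only the type this pmu handles, else let others try */
	if (event->attr.type != PERF_TYPE_RAW)
		return -ENOENT;
	return 0;
}

static struct pmu my_pmu = {
	.event_init	= my_pmu_event_init,
	/* .enable, .disable, .read, ... */
};

static int __init my_pmu_module_init(void)
{
	return perf_pmu_register(&my_pmu);
}

static void __exit my_pmu_module_exit(void)
{
	/*
	 * list_del_rcu() + synchronize_srcu() inside perf_pmu_unregister()
	 * guarantee that no perf_init_event() walker still sees my_pmu
	 * once this returns, so unloading the module is safe.
	 */
	perf_pmu_unregister(&my_pmu);
}
module_init(my_pmu_module_init);
module_exit(my_pmu_module_exit);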

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 6/8] perf: Per PMU disable
  2010-06-16 16:00 ` [RFC][PATCH 6/8] perf: Per PMU disable Peter Zijlstra
@ 2010-06-16 17:48   ` Robert Richter
  2010-06-16 17:58     ` Peter Zijlstra
  2010-06-18  2:14   ` Frederic Weisbecker
  1 sibling, 1 reply; 32+ messages in thread
From: Robert Richter @ 2010-06-16 17:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Will Deacon, Paul Mundt,
	Frederic Weisbecker, Cyrill Gorcunov, Lin Ming, Yanmin,
	Deng-Cheng Zhu, David Miller, linux-kernel

On 16.06.10 12:00:33, Peter Zijlstra wrote:
> Changes perf_disable() into perf_disable_pmu().

I would rather use perf_pmu_disable() as this uses the perf_pmu_XXX
namespace.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 4/8] perf: Unindent labels
  2010-06-16 17:16   ` Frederic Weisbecker
@ 2010-06-16 17:48     ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2010-06-16 17:48 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Wed, 2010-06-16 at 19:16 +0200, Frederic Weisbecker wrote:

> The point in having a space before the label starts is that it doesn't
> appear as a function boundary in patches.
> 
> This avoid patches that look like:
> 
> @@ -xxx +yyy @@ out:
> -	blah
> 

Yeah, fix diff already ;-)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 6/8] perf: Per PMU disable
  2010-06-16 17:48   ` Robert Richter
@ 2010-06-16 17:58     ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2010-06-16 17:58 UTC (permalink / raw)
  To: Robert Richter
  Cc: paulus, stephane eranian, Will Deacon, Paul Mundt,
	Frederic Weisbecker, Cyrill Gorcunov, Lin Ming, Yanmin,
	Deng-Cheng Zhu, David Miller, linux-kernel

On Wed, 2010-06-16 at 19:48 +0200, Robert Richter wrote:
> On 16.06.10 12:00:33, Peter Zijlstra wrote:
> > Changes perf_disable() into perf_disable_pmu().
> 
> I would rather use perf_pmu_disable() as this uses the perf_pmu_XXX
> namespace.

Sure..

sed -i -e 's/perf_disable_pmu/perf_pmu_disable/g' -e 's/perf_enable_pmu/perf_pmu_enable/g' `quilt series`



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 3/8] perf: register pmu implementations
  2010-06-16 17:48     ` Peter Zijlstra
@ 2010-06-16 18:10       ` Peter Zijlstra
  2010-06-18  4:51       ` Frederic Weisbecker
  1 sibling, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2010-06-16 18:10 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Wed, 2010-06-16 at 19:48 +0200, Peter Zijlstra wrote:
> > > +static int perf_swevent_int(struct perf_event *event)
> > > +{
> > > +   if (event->attr.type != PERF_TYPE_SOFTWARE)
> > > +           return -ENOENT
> > 
> > 
> > perf_swevent_init() ?
> 
> copy/paste gone wild.. 

/me learns how to read again,.. D'0h!

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 0/8] perf pmu interface
  2010-06-16 16:00 [RFC][PATCH 0/8] perf pmu interface Peter Zijlstra
                   ` (7 preceding siblings ...)
  2010-06-16 16:00 ` [RFC][PATCH 8/8] perf: Rework the PMU methods Peter Zijlstra
@ 2010-06-16 18:19 ` Peter Zijlstra
  2010-06-18  4:35   ` Frederic Weisbecker
  8 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2010-06-16 18:19 UTC (permalink / raw)
  To: paulus
  Cc: stephane eranian, Robert Richter, Will Deacon, Paul Mundt,
	Frederic Weisbecker, Cyrill Gorcunov, Lin Ming, Yanmin,
	Deng-Cheng Zhu, David Miller, linux-kernel, Ingo Molnar

On Wed, 2010-06-16 at 18:00 +0200, Peter Zijlstra wrote:
> These patches prepare the perf code for multiple pmus (no user
> interface yet, Lin Ming is working on that). These patches remove all
> weak functions and rework the struct pmu interface.
> 
> The latter is inspired by the work Frederic is doing to to filter out
> IRQ contexts.
> 
> These patches are very prelimenary, they haven't seen a compiler yet and
> the last patch still needs sparc,ppc and arm converted.
> 
> But they patches seem to be in a good enough shape to see what people
> think..

Another idea I was kicking about was to push find_get_context()
partially into struct pmu, so that we can have contexts per pmu.

For cpu-wide contexts its easy, for per-task contexts we need more
pointers in task_struct, so I was thinking of something like:

enum {
  perf_swevent_context = 0,
  perf_cpu_context,
#ifdef HW_BREAKPOINT
  perf_bp_context,
#endif
  perf_nr_task_context
};

struct task_struct {
  ...
  struct perf_event_context *perf_event_ctxs[perf_nr_task_context];
  ...
};

and then add for loops over the struct pmu list for the cpu-wide
contexts, and for loops over perf_nr_task_context for the task contexts.

It would add some extra code to the hot-paths, but it's the best I can
come up with.
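
To make the extra hot-path cost concrete, the task-side loop would look
something like the sketch below (illustrative only; task_ctx_sched_out()
stands in for whatever helper ends up doing the per-context work):

void perf_event_task_sched_out(struct task_struct *task,
			       struct task_struct *next)
{
	int ctxn;

	for (ctxn = 0; ctxn < perf_nr_task_context; ctxn++) {
		struct perf_event_context *ctx = task->perf_event_ctxs[ctxn];

		/* one extra branch + loop iteration per context class */
		if (ctx)
			task_ctx_sched_out(ctx);
	}
}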

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 3/8] perf: register pmu implementations
  2010-06-16 16:00 ` [RFC][PATCH 3/8] perf: register pmu implementations Peter Zijlstra
  2010-06-16 16:45   ` Robert Richter
  2010-06-16 17:03   ` Frederic Weisbecker
@ 2010-06-17 23:31   ` Paul Mackerras
  2010-06-18  8:40     ` Peter Zijlstra
  2 siblings, 1 reply; 32+ messages in thread
From: Paul Mackerras @ 2010-06-17 23:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: stephane eranian, Robert Richter, Will Deacon, Paul Mundt,
	Frederic Weisbecker, Cyrill Gorcunov, Lin Ming, Yanmin,
	Deng-Cheng Zhu, David Miller, linux-kernel

On Wed, Jun 16, 2010 at 06:00:30PM +0200, Peter Zijlstra wrote:

> Simple registration interface for struct pmu, this provides the
> infrastructure for removing all the weak functions.

Nice idea...

> @@ -1011,7 +1001,7 @@ static int hw_perf_cache_event(u64 confi
>  	return 0;
>  }
>  
> -struct pmu *hw_perf_event_init(struct perf_event *event)
> +static int power_pmu_event_init(struct perf_event *event)

How does power_pmu_event_init ever get called now?  I don't see any
other references to it in the patch.  Should struct pmu have a
reference to it?

Paul.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 6/8] perf: Per PMU disable
  2010-06-16 16:00 ` [RFC][PATCH 6/8] perf: Per PMU disable Peter Zijlstra
  2010-06-16 17:48   ` Robert Richter
@ 2010-06-18  2:14   ` Frederic Weisbecker
  2010-06-18  7:11     ` Peter Zijlstra
  1 sibling, 1 reply; 32+ messages in thread
From: Frederic Weisbecker @ 2010-06-18  2:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Wed, Jun 16, 2010 at 06:00:33PM +0200, Peter Zijlstra wrote:
> +static void armpmu_pmu_enable(struct pmu *pmu)
>  {
> +static void powerpc_pmu_pmu_disable(struct pmu *pmu)
>  {
> +static void fsl_emb_pmu_pmu_disable(struct pmu *pmu)
>  {
> +static void sh_pmu_pmu_enable(struct pmu *pmu)
> +{
> +static void sparc_pmu_pmu_enable(struct pmu *pmu)
>  {
> +static void x86_pmu_pmu_disable(struct pmu *pmu)
>  {


These namings are really bad. Why not just use pmu once
in each name? x86_pmu_enable, etc...


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 8/8] perf: Rework the PMU methods
  2010-06-16 16:00 ` [RFC][PATCH 8/8] perf: Rework the PMU methods Peter Zijlstra
@ 2010-06-18  4:21   ` Frederic Weisbecker
  2010-06-18  7:15     ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Frederic Weisbecker @ 2010-06-18  4:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Wed, Jun 16, 2010 at 06:00:35PM +0200, Peter Zijlstra wrote:
> -static void x86_pmu_stop(struct perf_event *event)
> +static void x86_pmu_stop(struct perf_event *event, int flags)
>  {
> -	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> -	struct hw_perf_event *hwc = &event->hw;
> -	int idx = hwc->idx;
> -
>  	if (!__test_and_clear_bit(idx, cpuc->active_mask))
> -		return;



Do you still need active_mask now that you have HES_STOPPED?



> @@ -4069,6 +4051,9 @@ static int perf_swevent_match(struct per
>  				struct perf_sample_data *data,
>  				struct pt_regs *regs)
>  {
> +	if (event->hw.state)
> +		return 0;
> +
>  	if (event->attr.type != type)
>  		return 0;
>  
> @@ -4211,7 +4196,7 @@ static void perf_swevent_read(struct per
>  {
>  }
>  
> -static int perf_swevent_enable(struct perf_event *event)
> +static int perf_swevent_add(struct perf_event *event, int flags)
>  {
>  	struct hw_perf_event *hwc = &event->hw;
>  	struct perf_cpu_context *cpuctx;
> @@ -4224,6 +4209,8 @@ static int perf_swevent_enable(struct pe
>  		perf_swevent_set_period(event);
>  	}
>  
> +	hwc->state = !(flags & PERF_EF_START);
> +
>  	head = find_swevent_head(cpuctx, event);
>  	if (WARN_ON_ONCE(!head))
>  		return -EINVAL;
> @@ -4233,13 +4220,19 @@ static int perf_swevent_enable(struct pe
>  	return 0;
>  }
>  
> -static void perf_swevent_disable(struct perf_event *event)
> +static void perf_swevent_del(struct perf_event *event, int flags)
>  {
>  	hlist_del_rcu(&event->hlist_entry);
>  }
>  
> -static void perf_swevent_void(struct perf_event *event)
> +static void perf_swevent_start(struct perf_event *event, int flags)
> +{
> +	event->hw.state = 0;
> +}
> +
> +static void perf_swevent_stop(struct perf_event *event, int flags)
>  {
> +	event->hw.state = 1;
>  }


So, instead of doing this and adding yet another check in the fast path,
what about just playing with the hlist insertion and deletion?

And if we have PERF_EF_RELOAD in start or PERF_EF_UPDATE in stop,
we simply do nothing, as we know it's just about updating the counter
or resetting the interrupt, things that software events just don't care
about.

And in ->add, you can also do nothing if PERF_EF_START.

It would be nice to have a PERF_EF_STOP as well in ->del, so that
each pmu doesn't need to maintain internal state.
If we assume the generic code will never imbalance add/start/stop/del,
or call start after add(PERF_EF_START), or things like this, such pmus
never need to touch event->hw.state. It's only
necessary for hardware events that call their start/stop internally.
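
For illustration, driving the swevent start/stop through the hlist as
suggested above might look roughly like this (not code from the series;
it glosses over how cpuctx and the hlist are looked up in the real add
path):

static void perf_swevent_stop(struct perf_event *event, int flags)
{
	/* unhash the event so the swevent fast path never sees it */
	hlist_del_rcu(&event->hlist_entry);
}

static void perf_swevent_start(struct perf_event *event, int flags)
{
	struct perf_cpu_context *cpuctx = &__get_cpu_var(perf_cpu_context);
	struct hlist_head *head = find_swevent_head(cpuctx, event);

	/* re-hash it; no per-event state flag to test on every event */
	hlist_add_head_rcu(&event->hlist_entry, head);
}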


>  static inline void perf_tp_register(void)
> @@ -4546,7 +4537,7 @@ void perf_bp_event(struct perf_event *bp
>  
>  	perf_sample_data_init(&sample, bp->attr.bp_addr);
>  
> -	if (!perf_exclude_event(bp, regs))
> +	if (!bp->hw.state && !perf_exclude_event(bp, regs))
>  		perf_swevent_add(bp, 1, 1, &sample, regs);



Same thing here, and same for trace events.



>  }
>  #endif
> @@ -4591,12 +4582,12 @@ static void perf_swevent_start_hrtimer(s
>  	if (hwc->sample_period) {
>  		u64 period;
>  
> -		if (hwc->remaining) {
> -			if (hwc->remaining < 0)
> +		if (hwc->period_left) {
> +			if (hwc->period_left < 0)
>  				period = 10000;
>  			else
> -				period = hwc->remaining;
> -			hwc->remaining = 0;
> +				period = hwc->period_left;
> +			hwc->period_left = 0;



If remaining can be replaced by period_left, it should probably be done
in another patch.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 0/8] perf pmu interface
  2010-06-16 18:19 ` [RFC][PATCH 0/8] perf pmu interface Peter Zijlstra
@ 2010-06-18  4:35   ` Frederic Weisbecker
  2010-06-18  7:22     ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Frederic Weisbecker @ 2010-06-18  4:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel, Ingo Molnar

On Wed, Jun 16, 2010 at 08:19:32PM +0200, Peter Zijlstra wrote:
> On Wed, 2010-06-16 at 18:00 +0200, Peter Zijlstra wrote:
> > These patches prepare the perf code for multiple pmus (no user
> > interface yet, Lin Ming is working on that). These patches remove all
> > weak functions and rework the struct pmu interface.
> > 
> > The latter is inspired by the work Frederic is doing to to filter out
> > IRQ contexts.
> > 
> > These patches are very prelimenary, they haven't seen a compiler yet and
> > the last patch still needs sparc,ppc and arm converted.
> > 
> > But they patches seem to be in a good enough shape to see what people
> > think..
> 
> Another idea I was kicking about was to push find_get_context()
> partially into struct pmu, so that we can have context's per pmu.
> 
> For cpu-wide contexts its easy, for per-task contexts we need more
> pointers in task_struct, so I was thinking of something like:
> 
> enum {
>   perf_swevent_context = 0,
>   perf_cpu_context,
> #ifdef HW_BREAKPOINT
>   perf_bp_context,
> #endif
>   perf_nr_task_context
> };
> 
> struct task_struct {
>   ...
>   struct perf_event_context *perf_event_ctxs[perf_nr_task_context];
>   ...
> };
> 
> and have add for loops over the struct pmu list for the cpu-wide
> contexts and for loops over perf_nr_task_context for the task contexts.
> 
> It would add some extra code to the hot-paths, but its the best I can
> come up with.


I'm not sure what you mean. Would that be to optimize start_txn / commit_txn?
Then that sounds like a good idea. But you'd only need two groups I think:

enum {
   perf_swevent_context = 0,
   perf_cpu_context,
   perf_nr_task_context
};


As only the cpu pmu needs the txn game, at least for now.

It seems that would drop the ability to gather hardware and software
events in the same group, though.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 3/8] perf: register pmu implementations
  2010-06-16 17:48     ` Peter Zijlstra
  2010-06-16 18:10       ` Peter Zijlstra
@ 2010-06-18  4:51       ` Frederic Weisbecker
  1 sibling, 0 replies; 32+ messages in thread
From: Frederic Weisbecker @ 2010-06-18  4:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Wed, Jun 16, 2010 at 07:48:00PM +0200, Peter Zijlstra wrote:
> > > +
> > > +struct pmu *perf_init_event(struct perf_event *event)
> > > +{
> > > +	struct pmu *pmu;
> > > +	int idx;
> > > +
> > > +	idx = srcu_read_lock(&pmus_srcu);
> > > +	list_for_each_entry_rcu(pmu, &pmus, entry) {
> > > +		int ret = pmu->event_init(event);
> > > +		if (!ret)
> > > +			break;
> > > +		if (ret != -ENOENT) {
> > > +			pmu = ERR_PTR(ret);
> > > +			break;
> > >  		}
> > > -		pmu = &perf_ops_generic;
> > > -		break;
> > >  	}
> > > +	srcu_read_unlock(&pmus_srcu, idx);
> > 
> > 
> > 
> > This could use a simple mutex instead of a spinlock + srcu_sync on
> > writer and srcu on reader.
> 
> Right, that spinlock needs to be a mutex for sure, a later patch adds an
> allocation under it.
> 
> But even with a mutex we need srcu_sync in there to sync against the
> readers.
> 
> > That doesn't matter much that said. What I don't understand is
> > why we need to synchronize the writers. Walking the list with
> > list_*_rcu() looks justified once we support boot events, but
> > until then...
> 
> Well, the typical unregister user would be a module, if you unregister
> and then dealloc the struct pmu by unloading the module a reader might
> still see a reference to it if you don't srcu_sync it.


Ok, I see what you mean.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 6/8] perf: Per PMU disable
  2010-06-18  2:14   ` Frederic Weisbecker
@ 2010-06-18  7:11     ` Peter Zijlstra
  2010-06-22 16:21       ` Frederic Weisbecker
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2010-06-18  7:11 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Fri, 2010-06-18 at 04:14 +0200, Frederic Weisbecker wrote:
> On Wed, Jun 16, 2010 at 06:00:33PM +0200, Peter Zijlstra wrote:
> > +static void armpmu_pmu_enable(struct pmu *pmu)
> >  {
> > +static void powerpc_pmu_pmu_disable(struct pmu *pmu)
> >  {
> > +static void fsl_emb_pmu_pmu_disable(struct pmu *pmu)
> >  {
> > +static void sh_pmu_pmu_enable(struct pmu *pmu)
> > +{
> > +static void sparc_pmu_pmu_enable(struct pmu *pmu)
> >  {
> > +static void x86_pmu_pmu_disable(struct pmu *pmu)
> >  {
> 
> 
> These namings are really bad. Why not just using pmu once
> in each names? x86_pmu_enable, etc...

Because some of those were already taken:

static const struct pmu pmu = {
        .enable         = x86_pmu_enable,
        .disable        = x86_pmu_disable,


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 8/8] perf: Rework the PMU methods
  2010-06-18  4:21   ` Frederic Weisbecker
@ 2010-06-18  7:15     ` Peter Zijlstra
  2010-06-22 16:26       ` Frederic Weisbecker
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2010-06-18  7:15 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Fri, 2010-06-18 at 06:21 +0200, Frederic Weisbecker wrote:
> On Wed, Jun 16, 2010 at 06:00:35PM +0200, Peter Zijlstra wrote:
> > -static void x86_pmu_stop(struct perf_event *event)
> > +static void x86_pmu_stop(struct perf_event *event, int flags)
> >  {
> > -	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> > -	struct hw_perf_event *hwc = &event->hw;
> > -	int idx = hwc->idx;
> > -
> >  	if (!__test_and_clear_bit(idx, cpuc->active_mask))
> > -		return;
> 
> 
> 
> Do you still need active_mask now that you have HES_STOPPED?

there were still some users, but yeah, we could probably clean that up,
but since the patch is large enough as is, I didn't attempt that.

> > +static void perf_swevent_start(struct perf_event *event, int flags)
> > +{
> > +	event->hw.state = 0;
> > +}
> > +
> > +static void perf_swevent_stop(struct perf_event *event, int flags)
> >  {
> > +	event->hw.state = 1;
> >  }
> 
> 
> So, instead of doing this and add yet another check in the fast path,
> what about just playing with the hlist insertion and deletion?

I wanted to avoid too much trickery: first make a simple one work, then
try something fancy.

> It would be nice to have a PERF_EF_STOP as well in ->del, so that
> each pmu don't need to maintain an internal state.

You have to track it since we can stop the thing ourselves without the
caller knowing.

> >  }
> >  #endif
> > @@ -4591,12 +4582,12 @@ static void perf_swevent_start_hrtimer(s
> >  	if (hwc->sample_period) {
> >  		u64 period;
> >  
> > -		if (hwc->remaining) {
> > -			if (hwc->remaining < 0)
> > +		if (hwc->period_left) {
> > +			if (hwc->period_left < 0)
> >  				period = 10000;
> >  			else
> > -				period = hwc->remaining;
> > -			hwc->remaining = 0;
> > +				period = hwc->period_left;
> > +			hwc->period_left = 0;
> 
> 
> 
> If remaining can be replaced by period_left, it should probably be done
> in another patch.

true.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 0/8] perf pmu interface
  2010-06-18  4:35   ` Frederic Weisbecker
@ 2010-06-18  7:22     ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2010-06-18  7:22 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel, Ingo Molnar

On Fri, 2010-06-18 at 06:35 +0200, Frederic Weisbecker wrote:

> > Another idea I was kicking about was to push find_get_context()
> > partially into struct pmu, so that we can have context's per pmu.
> > 
> > For cpu-wide contexts its easy, for per-task contexts we need more
> > pointers in task_struct, so I was thinking of something like:
> > 
> > enum {
> >   perf_swevent_context = 0,
> >   perf_cpu_context,
> > #ifdef HW_BREAKPOINT
> >   perf_bp_context,
> > #endif
> >   perf_nr_task_context
> > };
> > 
> > struct task_struct {
> >   ...
> >   struct perf_event_context *perf_event_ctxs[perf_nr_task_context];
> >   ...
> > };
> > 
> > and have add for loops over the struct pmu list for the cpu-wide
> > contexts and for loops over perf_nr_task_context for the task contexts.
> > 
> > It would add some extra code to the hot-paths, but its the best I can
> > come up with.
> 
> 
> I'm not sure what you mean. Would that be to optimize the start_txn / commit_txn ?
> Then that sounds a good idea. 

Not really, it's for pmu_disable/enable: if you know which pmu is
associated with the context, you can easily disable/enable it while
you're operating on it for doing the lazy machine writes.
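
Something like the following, assuming the context grows a pmu pointer
(ctx->pmu is not in these patches yet, just illustrating the idea):

	perf_pmu_disable(ctx->pmu);
	/* add/remove/rotate events in ctx; hardware is written lazily */
	perf_pmu_enable(ctx->pmu);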

It also solves another problem: we currently stop adding new events to a
context on the first failure, and this gives very odd effects with
software events, which are always schedulable.

If we were to add another hardware pmu the story would get even more
complex, since you could easily get into the situation where one pmu
would still be empty but the other is full, and we stop adding counters.

> But you'd only need two groups I think:
> 
> enum {
>    perf_swevent_context = 0,
>    perf_cpu_context,
>    perf_nr_task_context
> };
> 
> 
> As only the cpu pmu needs the txn game, at least for now.

Well yeah, but breakpoint really should be a hardware pmu
implementation.

> It seems that would drop the ability to gather hardware and software
> events in a same group though.

Almost, we could allow mixing software events with any hardware event in
a group, but disallow mixing hardware events from different pmus.

The only tricky bit is when the group leader is a software event, but we
could migrate to the hardware context on adding the first hardware
event.
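
Roughly, the check that rule implies (an illustrative sketch, not code
from the series; the helper name is made up, is_software_event() already
exists):

static int group_allows(struct perf_event *leader, struct perf_event *event)
{
	/* software events can go in any group */
	if (is_software_event(event) || is_software_event(leader))
		return 1;
	/* hardware events must all come from the same pmu */
	return event->pmu == leader->pmu;
}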



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 3/8] perf: register pmu implementations
  2010-06-17 23:31   ` Paul Mackerras
@ 2010-06-18  8:40     ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2010-06-18  8:40 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: stephane eranian, Robert Richter, Will Deacon, Paul Mundt,
	Frederic Weisbecker, Cyrill Gorcunov, Lin Ming, Yanmin,
	Deng-Cheng Zhu, David Miller, linux-kernel

On Fri, 2010-06-18 at 09:31 +1000, Paul Mackerras wrote:
> 
> > @@ -1011,7 +1001,7 @@ static int hw_perf_cache_event(u64 confi
> >       return 0;
> >  }
> >  
> > -struct pmu *hw_perf_event_init(struct perf_event *event)
> > +static int power_pmu_event_init(struct perf_event *event)
> 
> How does power_pmu_event_init ever get called now?  I don't see any
> other references to it in the patch.  Should struct pmu have a
> reference to it? 

Uhm yeah.. looks like that went pear-shaped. Fixed it.

---
Index: linux-2.6/arch/powerpc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/perf_event.c
+++ linux-2.6/arch/powerpc/kernel/perf_event.c
@@ -1125,6 +1125,7 @@ static int power_pmu_event_init(struct p
 }
 
 struct pmu power_pmu = {
+	.event_init	= power_pmu_event_init,
 	.enable		= power_pmu_enable,
 	.disable	= power_pmu_disable,
 	.read		= power_pmu_read,


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 6/8] perf: Per PMU disable
  2010-06-18  7:11     ` Peter Zijlstra
@ 2010-06-22 16:21       ` Frederic Weisbecker
  2010-06-22 17:09         ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Frederic Weisbecker @ 2010-06-22 16:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Fri, Jun 18, 2010 at 09:11:58AM +0200, Peter Zijlstra wrote:
> On Fri, 2010-06-18 at 04:14 +0200, Frederic Weisbecker wrote:
> > On Wed, Jun 16, 2010 at 06:00:33PM +0200, Peter Zijlstra wrote:
> > > +static void armpmu_pmu_enable(struct pmu *pmu)
> > >  {
> > > +static void powerpc_pmu_pmu_disable(struct pmu *pmu)
> > >  {
> > > +static void fsl_emb_pmu_pmu_disable(struct pmu *pmu)
> > >  {
> > > +static void sh_pmu_pmu_enable(struct pmu *pmu)
> > > +{
> > > +static void sparc_pmu_pmu_enable(struct pmu *pmu)
> > >  {
> > > +static void x86_pmu_pmu_disable(struct pmu *pmu)
> > >  {
> > 
> > 
> > These namings are really bad. Why not just using pmu once
> > in each names? x86_pmu_enable, etc...
> 
> Because some of those were already taken:
> 
> static const struct pmu pmu = {
>         .enable         = x86_pmu_enable,
>         .disable        = x86_pmu_disable,


Then those should be renamed to x86_event_enable or so. 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 8/8] perf: Rework the PMU methods
  2010-06-18  7:15     ` Peter Zijlstra
@ 2010-06-22 16:26       ` Frederic Weisbecker
  2010-06-22 17:10         ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Frederic Weisbecker @ 2010-06-22 16:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Fri, Jun 18, 2010 at 09:15:38AM +0200, Peter Zijlstra wrote:
> On Fri, 2010-06-18 at 06:21 +0200, Frederic Weisbecker wrote:
> > On Wed, Jun 16, 2010 at 06:00:35PM +0200, Peter Zijlstra wrote:
> > > -static void x86_pmu_stop(struct perf_event *event)
> > > +static void x86_pmu_stop(struct perf_event *event, int flags)
> > >  {
> > > -	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> > > -	struct hw_perf_event *hwc = &event->hw;
> > > -	int idx = hwc->idx;
> > > -
> > >  	if (!__test_and_clear_bit(idx, cpuc->active_mask))
> > > -		return;
> > 
> > 
> > 
> > Do you still need active_mask now that you have HES_STOPPED?
> 
> there still were some users, but yeah, we cuold probably clean that up,
> bit since the patch is large enough as is, I didn't attempt that.


Yeah, that can be done later.


 
> > > +static void perf_swevent_start(struct perf_event *event, int flags)
> > > +{
> > > +	event->hw.state = 0;
> > > +}
> > > +
> > > +static void perf_swevent_stop(struct perf_event *event, int flags)
> > >  {
> > > +	event->hw.state = 1;
> > >  }
> > 
> > 
> > So, instead of doing this and add yet another check in the fast path,
> > what about just playing with the hlist insertion and deletion?
> 
> I wanted to avoid too much trickery, first make a simple one work, then
> try something fancy.



Ok, as long as it's not considered a long term thing. I can
improve that from my exclusion patchset, rebased on top of yours.




> > It would be nice to have a PERF_EF_STOP as well in ->del, so that
> > each pmu don't need to maintain an internal state.
> 
> You have to track it since we can stop the thing outselves without the
> caller knowing.


From the pmu internals, yeah, that's what the x86 pmu does. But otherwise, other
pmus don't do such things.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 6/8] perf: Per PMU disable
  2010-06-22 16:21       ` Frederic Weisbecker
@ 2010-06-22 17:09         ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2010-06-22 17:09 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Tue, 2010-06-22 at 18:21 +0200, Frederic Weisbecker wrote:
> On Fri, Jun 18, 2010 at 09:11:58AM +0200, Peter Zijlstra wrote:
> > On Fri, 2010-06-18 at 04:14 +0200, Frederic Weisbecker wrote:
> > > On Wed, Jun 16, 2010 at 06:00:33PM +0200, Peter Zijlstra wrote:
> > > > +static void armpmu_pmu_enable(struct pmu *pmu)
> > > >  {
> > > > +static void powerpc_pmu_pmu_disable(struct pmu *pmu)
> > > >  {
> > > > +static void fsl_emb_pmu_pmu_disable(struct pmu *pmu)
> > > >  {
> > > > +static void sh_pmu_pmu_enable(struct pmu *pmu)
> > > > +{
> > > > +static void sparc_pmu_pmu_enable(struct pmu *pmu)
> > > >  {
> > > > +static void x86_pmu_pmu_disable(struct pmu *pmu)
> > > >  {
> > > 
> > > 
> > > These namings are really bad. Why not just using pmu once
> > > in each names? x86_pmu_enable, etc...
> > 
> > Because some of those were already taken:
> > 
> > static const struct pmu pmu = {
> >         .enable         = x86_pmu_enable,
> >         .disable        = x86_pmu_disable,
> 
> 
> Then those should be renamed into x86_event_enable or so. 
> 
well, possibly, but the patches were large enough already.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 8/8] perf: Rework the PMU methods
  2010-06-22 16:26       ` Frederic Weisbecker
@ 2010-06-22 17:10         ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2010-06-22 17:10 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Tue, 2010-06-22 at 18:26 +0200, Frederic Weisbecker wrote:
> > > It would be nice to have a PERF_EF_STOP as well in ->del, so that
> > > each pmu don't need to maintain an internal state.
> > 
> > You have to track it since we can stop the thing outselves without the
> > caller knowing.
> 
> 
> From the pmu internals yeah, that's what the x86 pmu does. But otherwise, other
> pmu don't do such things. 

Everybody who does throttling will have to.
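
The x86 case illustrates it: inside the overflow handler, per counter
that overflowed (a sketch following the existing code, adapted to the
new method signatures):

	if (perf_event_overflow(event, 1, &data, regs)) {
		/*
		 * The core asked for throttling: the pmu stops the event
		 * here, behind the caller's back, so it must record that
		 * in event->hw.state itself; later ->start()/->stop()
		 * calls from the core need to see it.
		 */
		x86_pmu_stop(event, 0);
	}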

^ permalink raw reply	[flat|nested] 32+ messages in thread
