linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH 00/11] perf pmu interface -v2
@ 2010-06-24 14:28 Peter Zijlstra
  2010-06-24 14:28 ` [PATCH 01/11] perf, x86: Fix Nehalem PMU quirk Peter Zijlstra
                   ` (13 more replies)
  0 siblings, 14 replies; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-24 14:28 UTC (permalink / raw)
  To: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller
  Cc: linux-kernel, Peter Zijlstra

These patches prepare the perf code for multiple pmus (no user
interface yet, Lin Ming is working on that). These patches remove all
weak functions and rework the struct pmu interface.

The patches are boot tested on x86_64 and compile tested on: powerpc
(!fsl, what config is that?), sparc and arm (sorry no SH compiler)

On x86 perf stat and record with both software and hardware events still worked.

I changed all (hardware) pmu implementations so there's a fair chance I messed
something up, please have a look at it.

If people agree with this, I'll continue with the per-pmu context stuff I
talked about last time, which should get us the hardware write batching back.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH 01/11] perf, x86: Fix Nehalem PMU quirk
  2010-06-24 14:28 [RFC][PATCH 00/11] perf pmu interface -v2 Peter Zijlstra
@ 2010-06-24 14:28 ` Peter Zijlstra
  2010-06-24 14:28 ` [PATCH 02/11] perf: Fix argument of perf_arch_fetch_caller_regs Peter Zijlstra
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-24 14:28 UTC (permalink / raw)
  To: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: perf-x86-fix-nehalem-quirk.patch --]
[-- Type: text/plain, Size: 748 bytes --]

The idea was to run all three counters, which means we need to
program 7 into the enable mask, not 3 (which is two bits).

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
@@ -504,7 +504,7 @@ static void intel_pmu_nhm_enable_all(int
 		wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x4300B1);
 		wrmsrl(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x4300B5);
 
-		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);
+		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x7);
 		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x0);
 
 		for (i = 0; i < 3; i++) {



^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH 02/11] perf: Fix argument of perf_arch_fetch_caller_regs
  2010-06-24 14:28 [RFC][PATCH 00/11] perf pmu interface -v2 Peter Zijlstra
  2010-06-24 14:28 ` [PATCH 01/11] perf, x86: Fix Nehalem PMU quirk Peter Zijlstra
@ 2010-06-24 14:28 ` Peter Zijlstra
  2010-06-24 14:28 ` [PATCH 03/11] perf, sparc64: Fix maybe_change_configuration() PCR setting Peter Zijlstra
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-24 14:28 UTC (permalink / raw)
  To: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller
  Cc: linux-kernel, Peter Zijlstra, Nobuhiro Iwamatsu

[-- Attachment #1: nobuhiro-fix-perf-arch-fetch-caller-regs.patch --]
[-- Type: text/plain, Size: 977 bytes --]

From: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>

"struct regs" was set to argument of perf_arch_fetch_caller_regs.
"struct pt_regs" is correct.

Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <AANLkTinKKFKEBQrZ3Hkj-XCaMwaTqulb-XnFzqEYiFRr@mail.gmail.com>
---
 include/linux/perf_event.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/include/linux/perf_event.h
===================================================================
--- linux-2.6.orig/include/linux/perf_event.h
+++ linux-2.6/include/linux/perf_event.h
@@ -971,7 +971,7 @@ extern void __perf_sw_event(u32, u64, in
 
 #ifndef perf_arch_fetch_caller_regs
 static inline void
-perf_arch_fetch_caller_regs(struct regs *regs, unsigned long ip) { }
+perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned long ip) { }
 #endif
 
 /*



^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH 03/11] perf, sparc64: Fix maybe_change_configuration() PCR setting.
  2010-06-24 14:28 [RFC][PATCH 00/11] perf pmu interface -v2 Peter Zijlstra
  2010-06-24 14:28 ` [PATCH 01/11] perf, x86: Fix Nehalem PMU quirk Peter Zijlstra
  2010-06-24 14:28 ` [PATCH 02/11] perf: Fix argument of perf_arch_fetch_caller_regs Peter Zijlstra
@ 2010-06-24 14:28 ` Peter Zijlstra
  2010-06-24 14:28 ` [RFC][PATCH 04/11] perf: deconstify struct pmu Peter Zijlstra
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-24 14:28 UTC (permalink / raw)
  To: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: davem-sparc-pmu-fixup.patch --]
[-- Type: text/plain, Size: 788 bytes --]

From: David S. Miller <davem@davemloft.net>

Need to mask out the existing event bits before OR'ing in
the new ones.

Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/sparc/kernel/perf_event.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6/arch/sparc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/perf_event.c
+++ linux-2.6/arch/sparc/kernel/perf_event.c
@@ -657,6 +657,7 @@ static u64 maybe_change_configuration(st
 		cpuc->current_idx[i] = idx;
 
 		enc = perf_event_get_enc(cpuc->events[i]);
+		pcr &= ~mask_for_index(idx);
 		pcr |= event_encoding(enc, idx);
 	}
 out:



^ permalink raw reply	[flat|nested] 56+ messages in thread

* [RFC][PATCH 04/11] perf: deconstify struct pmu
  2010-06-24 14:28 [RFC][PATCH 00/11] perf pmu interface -v2 Peter Zijlstra
                   ` (2 preceding siblings ...)
  2010-06-24 14:28 ` [PATCH 03/11] perf, sparc64: Fix maybe_change_configuration() PCR setting Peter Zijlstra
@ 2010-06-24 14:28 ` Peter Zijlstra
  2010-06-24 14:28 ` [RFC][PATCH 05/11] perf: register pmu implementations Peter Zijlstra
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-24 14:28 UTC (permalink / raw)
  To: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: perf-deconst-pmu.patch --]
[-- Type: text/plain, Size: 12375 bytes --]

sed -ie 's/const struct pmu\>/struct pmu/g' `git grep -l "const struct pmu\>"`

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/arm/kernel/perf_event.c             |    2 +-
 arch/powerpc/kernel/perf_event.c         |    8 ++++----
 arch/powerpc/kernel/perf_event_fsl_emb.c |    2 +-
 arch/sh/kernel/perf_event.c              |    4 ++--
 arch/sparc/kernel/perf_event.c           |   10 +++++-----
 arch/x86/kernel/cpu/perf_event.c         |   14 +++++++-------
 include/linux/perf_event.h               |   10 +++++-----
 kernel/perf_event.c                      |   26 +++++++++++++-------------
 8 files changed, 38 insertions(+), 38 deletions(-)

Index: linux-2.6/arch/arm/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/arm/kernel/perf_event.c
+++ linux-2.6/arch/arm/kernel/perf_event.c
@@ -491,7 +491,7 @@ __hw_perf_event_init(struct perf_event *
 	return err;
 }
 
-const struct pmu *
+struct pmu *
 hw_perf_event_init(struct perf_event *event)
 {
 	int err = 0;
Index: linux-2.6/arch/powerpc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/perf_event.c
+++ linux-2.6/arch/powerpc/kernel/perf_event.c
@@ -854,7 +854,7 @@ static void power_pmu_unthrottle(struct 
  * Set the flag to make pmu::enable() not perform the
  * schedulability test, it will be performed at commit time
  */
-void power_pmu_start_txn(const struct pmu *pmu)
+void power_pmu_start_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
@@ -867,7 +867,7 @@ void power_pmu_start_txn(const struct pm
  * Clear the flag and pmu::enable() will perform the
  * schedulability test.
  */
-void power_pmu_cancel_txn(const struct pmu *pmu)
+void power_pmu_cancel_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
@@ -879,7 +879,7 @@ void power_pmu_cancel_txn(const struct p
  * Perform the group schedulability test as a whole
  * Return 0 if success
  */
-int power_pmu_commit_txn(const struct pmu *pmu)
+int power_pmu_commit_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw;
 	long i, n;
@@ -1011,7 +1011,7 @@ static int hw_perf_cache_event(u64 confi
 	return 0;
 }
 
-const struct pmu *hw_perf_event_init(struct perf_event *event)
+struct pmu *hw_perf_event_init(struct perf_event *event)
 {
 	u64 ev;
 	unsigned long flags;
Index: linux-2.6/arch/powerpc/kernel/perf_event_fsl_emb.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/perf_event_fsl_emb.c
+++ linux-2.6/arch/powerpc/kernel/perf_event_fsl_emb.c
@@ -428,7 +428,7 @@ static int hw_perf_cache_event(u64 confi
 	return 0;
 }
 
-const struct pmu *hw_perf_event_init(struct perf_event *event)
+struct pmu *hw_perf_event_init(struct perf_event *event)
 {
 	u64 ev;
 	struct perf_event *events[MAX_HWEVENTS];
Index: linux-2.6/arch/sh/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/sh/kernel/perf_event.c
+++ linux-2.6/arch/sh/kernel/perf_event.c
@@ -257,13 +257,13 @@ static void sh_pmu_read(struct perf_even
 	sh_perf_event_update(event, &event->hw, event->hw.idx);
 }
 
-static const struct pmu pmu = {
+static struct pmu pmu = {
 	.enable		= sh_pmu_enable,
 	.disable	= sh_pmu_disable,
 	.read		= sh_pmu_read,
 };
 
-const struct pmu *hw_perf_event_init(struct perf_event *event)
+struct pmu *hw_perf_event_init(struct perf_event *event)
 {
 	int err = __hw_perf_event_init(event);
 	if (unlikely(err)) {
Index: linux-2.6/arch/sparc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/perf_event.c
+++ linux-2.6/arch/sparc/kernel/perf_event.c
@@ -1098,7 +1098,7 @@ static int __hw_perf_event_init(struct p
  * Set the flag to make pmu::enable() not perform the
  * schedulability test, it will be performed at commit time
  */
-static void sparc_pmu_start_txn(const struct pmu *pmu)
+static void sparc_pmu_start_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
@@ -1110,7 +1110,7 @@ static void sparc_pmu_start_txn(const st
  * Clear the flag and pmu::enable() will perform the
  * schedulability test.
  */
-static void sparc_pmu_cancel_txn(const struct pmu *pmu)
+static void sparc_pmu_cancel_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
@@ -1122,7 +1122,7 @@ static void sparc_pmu_cancel_txn(const s
  * Perform the group schedulability test as a whole
  * Return 0 if success
  */
-static int sparc_pmu_commit_txn(const struct pmu *pmu)
+static int sparc_pmu_commit_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	int n;
@@ -1141,7 +1141,7 @@ static int sparc_pmu_commit_txn(const st
 	return 0;
 }
 
-static const struct pmu pmu = {
+static struct pmu pmu = {
 	.enable		= sparc_pmu_enable,
 	.disable	= sparc_pmu_disable,
 	.read		= sparc_pmu_read,
@@ -1151,7 +1151,7 @@ static const struct pmu pmu = {
 	.commit_txn	= sparc_pmu_commit_txn,
 };
 
-const struct pmu *hw_perf_event_init(struct perf_event *event)
+struct pmu *hw_perf_event_init(struct perf_event *event)
 {
 	int err = __hw_perf_event_init(event);
 
Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -618,7 +618,7 @@ static void x86_pmu_enable_all(int added
 	}
 }
 
-static const struct pmu pmu;
+static struct pmu pmu;
 
 static inline int is_x86_event(struct perf_event *event)
 {
@@ -1394,7 +1394,7 @@ static inline void x86_pmu_read(struct p
  * Set the flag to make pmu::enable() not perform the
  * schedulability test, it will be performed at commit time
  */
-static void x86_pmu_start_txn(const struct pmu *pmu)
+static void x86_pmu_start_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 
@@ -1407,7 +1407,7 @@ static void x86_pmu_start_txn(const stru
  * Clear the flag and pmu::enable() will perform the
  * schedulability test.
  */
-static void x86_pmu_cancel_txn(const struct pmu *pmu)
+static void x86_pmu_cancel_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 
@@ -1424,7 +1424,7 @@ static void x86_pmu_cancel_txn(const str
  * Perform the group schedulability test as a whole
  * Return 0 if success
  */
-static int x86_pmu_commit_txn(const struct pmu *pmu)
+static int x86_pmu_commit_txn(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	int assign[X86_PMC_IDX_MAX];
@@ -1450,7 +1450,7 @@ static int x86_pmu_commit_txn(const stru
 	return 0;
 }
 
-static const struct pmu pmu = {
+static struct pmu pmu = {
 	.enable		= x86_pmu_enable,
 	.disable	= x86_pmu_disable,
 	.start		= x86_pmu_start,
@@ -1536,9 +1536,9 @@ out:
 	return ret;
 }
 
-const struct pmu *hw_perf_event_init(struct perf_event *event)
+struct pmu *hw_perf_event_init(struct perf_event *event)
 {
-	const struct pmu *tmp;
+	struct pmu *tmp;
 	int err;
 
 	err = __hw_perf_event_init(event);
Index: linux-2.6/include/linux/perf_event.h
===================================================================
--- linux-2.6.orig/include/linux/perf_event.h
+++ linux-2.6/include/linux/perf_event.h
@@ -576,19 +576,19 @@ struct pmu {
 	 * Start the transaction, after this ->enable() doesn't need
 	 * to do schedulability tests.
 	 */
-	void (*start_txn)	(const struct pmu *pmu);
+	void (*start_txn)	(struct pmu *pmu);
 	/*
 	 * If ->start_txn() disabled the ->enable() schedulability test
 	 * then ->commit_txn() is required to perform one. On success
 	 * the transaction is closed. On error the transaction is kept
 	 * open until ->cancel_txn() is called.
 	 */
-	int  (*commit_txn)	(const struct pmu *pmu);
+	int  (*commit_txn)	(struct pmu *pmu);
 	/*
 	 * Will cancel the transaction, assumes ->disable() is called for
 	 * each successfull ->enable() during the transaction.
 	 */
-	void (*cancel_txn)	(const struct pmu *pmu);
+	void (*cancel_txn)	(struct pmu *pmu);
 };
 
 /**
@@ -667,7 +667,7 @@ struct perf_event {
 	int				nr_siblings;
 	int				group_flags;
 	struct perf_event		*group_leader;
-	const struct pmu		*pmu;
+	struct pmu		*pmu;
 
 	enum perf_event_active_state	state;
 	unsigned int			attach_state;
@@ -845,7 +845,7 @@ struct perf_output_handle {
  */
 extern int perf_max_events;
 
-extern const struct pmu *hw_perf_event_init(struct perf_event *event);
+extern struct pmu *hw_perf_event_init(struct perf_event *event);
 
 extern void perf_event_task_sched_in(struct task_struct *task);
 extern void perf_event_task_sched_out(struct task_struct *task, struct task_struct *next);
Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -75,7 +75,7 @@ static DEFINE_SPINLOCK(perf_resource_loc
 /*
  * Architecture provided APIs - weak aliases:
  */
-extern __weak const struct pmu *hw_perf_event_init(struct perf_event *event)
+extern __weak struct pmu *hw_perf_event_init(struct perf_event *event)
 {
 	return NULL;
 }
@@ -673,7 +673,7 @@ group_sched_in(struct perf_event *group_
 	       struct perf_event_context *ctx)
 {
 	struct perf_event *event, *partial_group = NULL;
-	const struct pmu *pmu = group_event->pmu;
+	struct pmu *pmu = group_event->pmu;
 	bool txn = false;
 
 	if (group_event->state == PERF_EVENT_STATE_OFF)
@@ -4291,7 +4291,7 @@ static int perf_swevent_int(struct perf_
 	return 0;
 }
 
-static const struct pmu perf_ops_generic = {
+static struct pmu perf_ops_generic = {
 	.enable		= perf_swevent_enable,
 	.disable	= perf_swevent_disable,
 	.start		= perf_swevent_int,
@@ -4404,7 +4404,7 @@ static void cpu_clock_perf_event_read(st
 	cpu_clock_perf_event_update(event);
 }
 
-static const struct pmu perf_ops_cpu_clock = {
+static struct pmu perf_ops_cpu_clock = {
 	.enable		= cpu_clock_perf_event_enable,
 	.disable	= cpu_clock_perf_event_disable,
 	.read		= cpu_clock_perf_event_read,
@@ -4461,7 +4461,7 @@ static void task_clock_perf_event_read(s
 	task_clock_perf_event_update(event, time);
 }
 
-static const struct pmu perf_ops_task_clock = {
+static struct pmu perf_ops_task_clock = {
 	.enable		= task_clock_perf_event_enable,
 	.disable	= task_clock_perf_event_disable,
 	.read		= task_clock_perf_event_read,
@@ -4575,7 +4575,7 @@ static int swevent_hlist_get(struct perf
 
 #ifdef CONFIG_EVENT_TRACING
 
-static const struct pmu perf_ops_tracepoint = {
+static struct pmu perf_ops_tracepoint = {
 	.enable		= perf_trace_enable,
 	.disable	= perf_trace_disable,
 	.start		= perf_swevent_int,
@@ -4639,7 +4639,7 @@ static void tp_perf_event_destroy(struct
 	perf_trace_destroy(event);
 }
 
-static const struct pmu *tp_perf_event_init(struct perf_event *event)
+static struct pmu *tp_perf_event_init(struct perf_event *event)
 {
 	int err;
 
@@ -4686,7 +4686,7 @@ static void perf_event_free_filter(struc
 
 #else
 
-static const struct pmu *tp_perf_event_init(struct perf_event *event)
+static struct pmu *tp_perf_event_init(struct perf_event *event)
 {
 	return NULL;
 }
@@ -4708,7 +4708,7 @@ static void bp_perf_event_destroy(struct
 	release_bp_slot(event);
 }
 
-static const struct pmu *bp_perf_event_init(struct perf_event *bp)
+static struct pmu *bp_perf_event_init(struct perf_event *bp)
 {
 	int err;
 
@@ -4732,7 +4732,7 @@ void perf_bp_event(struct perf_event *bp
 		perf_swevent_add(bp, 1, 1, &sample, regs);
 }
 #else
-static const struct pmu *bp_perf_event_init(struct perf_event *bp)
+static struct pmu *bp_perf_event_init(struct perf_event *bp)
 {
 	return NULL;
 }
@@ -4754,9 +4754,9 @@ static void sw_perf_event_destroy(struct
 	swevent_hlist_put(event);
 }
 
-static const struct pmu *sw_perf_event_init(struct perf_event *event)
+static struct pmu *sw_perf_event_init(struct perf_event *event)
 {
-	const struct pmu *pmu = NULL;
+	struct pmu *pmu = NULL;
 	u64 event_id = event->attr.config;
 
 	/*
@@ -4818,7 +4818,7 @@ perf_event_alloc(struct perf_event_attr 
 		   perf_overflow_handler_t overflow_handler,
 		   gfp_t gfpflags)
 {
-	const struct pmu *pmu;
+	struct pmu *pmu;
 	struct perf_event *event;
 	struct hw_perf_event *hwc;
 	long err;



^ permalink raw reply	[flat|nested] 56+ messages in thread

* [RFC][PATCH 05/11] perf: register pmu implementations
  2010-06-24 14:28 [RFC][PATCH 00/11] perf pmu interface -v2 Peter Zijlstra
                   ` (3 preceding siblings ...)
  2010-06-24 14:28 ` [RFC][PATCH 04/11] perf: deconstify struct pmu Peter Zijlstra
@ 2010-06-24 14:28 ` Peter Zijlstra
  2010-06-28 13:21   ` Frederic Weisbecker
  2010-07-09  3:08   ` Paul Mackerras
  2010-06-24 14:28 ` [RFC][PATCH 06/11] perf: Unindent labels Peter Zijlstra
                   ` (8 subsequent siblings)
  13 siblings, 2 replies; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-24 14:28 UTC (permalink / raw)
  To: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: perf-register-pmu.patch --]
[-- Type: text/plain, Size: 33498 bytes --]

Simple registration interface for struct pmu, this provides the
infrastructure for removing all the weak functions.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/arm/kernel/perf_event.c             |   36 +
 arch/powerpc/kernel/perf_event.c         |   46 +-
 arch/powerpc/kernel/perf_event_fsl_emb.c |   31 -
 arch/sh/kernel/perf_event.c              |   19 
 arch/sparc/kernel/perf_event.c           |   29 -
 arch/x86/kernel/cpu/perf_event.c         |   45 +-
 include/linux/perf_event.h               |   10 
 kernel/hw_breakpoint.c                   |   35 +
 kernel/perf_event.c                      |  594 +++++++++++++++----------------
 9 files changed, 450 insertions(+), 395 deletions(-)

Index: linux-2.6/arch/arm/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/arm/kernel/perf_event.c
+++ linux-2.6/arch/arm/kernel/perf_event.c
@@ -306,13 +306,6 @@ out:
 	return err;
 }
 
-static struct pmu pmu = {
-	.enable	    = armpmu_enable,
-	.disable    = armpmu_disable,
-	.unthrottle = armpmu_unthrottle,
-	.read	    = armpmu_read,
-};
-
 static int
 validate_event(struct cpu_hw_events *cpuc,
 	       struct perf_event *event)
@@ -491,13 +484,22 @@ __hw_perf_event_init(struct perf_event *
 	return err;
 }
 
-struct pmu *
-hw_perf_event_init(struct perf_event *event)
+static int armpmu_event_init(struct perf_event *event)
 {
 	int err = 0;
 
+	switch (event->attr.type) {
+	case PERF_TYPE_RAW:
+	case PERF_TYPE_HARDWARE:
+	case PERF_TYPE_HW_CACHE:
+		break;
+
+	default:
+		return -ENOENT;
+	}
+
 	if (!armpmu)
-		return ERR_PTR(-ENODEV);
+		return -ENODEV;
 
 	event->destroy = hw_perf_event_destroy;
 
@@ -518,15 +520,23 @@ hw_perf_event_init(struct perf_event *ev
 	}
 
 	if (err)
-		return ERR_PTR(err);
+		return err;
 
 	err = __hw_perf_event_init(event);
 	if (err)
 		hw_perf_event_destroy(event);
 
-	return err ? ERR_PTR(err) : &pmu;
+	return err;
 }
 
+static struct pmu pmu = {
+	.event_init = armpmu_event_init,
+	.enable	    = armpmu_enable,
+	.disable    = armpmu_disable,
+	.unthrottle = armpmu_unthrottle,
+	.read	    = armpmu_read,
+};
+
 void
 hw_perf_enable(void)
 {
@@ -2994,6 +3004,8 @@ init_hw_perf_events(void)
 		perf_max_events = -1;
 	}
 
+	perf_pmu_register(&pmu)
+
 	return 0;
 }
 arch_initcall(init_hw_perf_events);
Index: linux-2.6/arch/powerpc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/perf_event.c
+++ linux-2.6/arch/powerpc/kernel/perf_event.c
@@ -901,16 +901,6 @@ int power_pmu_commit_txn(struct pmu *pmu
 	return 0;
 }
 
-struct pmu power_pmu = {
-	.enable		= power_pmu_enable,
-	.disable	= power_pmu_disable,
-	.read		= power_pmu_read,
-	.unthrottle	= power_pmu_unthrottle,
-	.start_txn	= power_pmu_start_txn,
-	.cancel_txn	= power_pmu_cancel_txn,
-	.commit_txn	= power_pmu_commit_txn,
-};
-
 /*
  * Return 1 if we might be able to put event on a limited PMC,
  * or 0 if not.
@@ -1011,7 +1001,7 @@ static int hw_perf_cache_event(u64 confi
 	return 0;
 }
 
-struct pmu *hw_perf_event_init(struct perf_event *event)
+static int power_pmu_event_init(struct perf_event *event)
 {
 	u64 ev;
 	unsigned long flags;
@@ -1023,25 +1013,27 @@ struct pmu *hw_perf_event_init(struct pe
 	struct cpu_hw_events *cpuhw;
 
 	if (!ppmu)
-		return ERR_PTR(-ENXIO);
+		return -ENOENT;
+
 	switch (event->attr.type) {
 	case PERF_TYPE_HARDWARE:
 		ev = event->attr.config;
 		if (ev >= ppmu->n_generic || ppmu->generic_events[ev] == 0)
-			return ERR_PTR(-EOPNOTSUPP);
+			return -EOPNOTSUPP;
 		ev = ppmu->generic_events[ev];
 		break;
 	case PERF_TYPE_HW_CACHE:
 		err = hw_perf_cache_event(event->attr.config, &ev);
 		if (err)
-			return ERR_PTR(err);
+			return err;
 		break;
 	case PERF_TYPE_RAW:
 		ev = event->attr.config;
 		break;
 	default:
-		return ERR_PTR(-EINVAL);
+		return -ENOENT;
 	}
+
 	event->hw.config_base = ev;
 	event->hw.idx = 0;
 
@@ -1078,7 +1070,7 @@ struct pmu *hw_perf_event_init(struct pe
 			 */
 			ev = normal_pmc_alternative(ev, flags);
 			if (!ev)
-				return ERR_PTR(-EINVAL);
+				return -EINVAL;
 		}
 	}
 
@@ -1092,19 +1084,19 @@ struct pmu *hw_perf_event_init(struct pe
 		n = collect_events(event->group_leader, ppmu->n_counter - 1,
 				   ctrs, events, cflags);
 		if (n < 0)
-			return ERR_PTR(-EINVAL);
+			return -EINVAL;
 	}
 	events[n] = ev;
 	ctrs[n] = event;
 	cflags[n] = flags;
 	if (check_excludes(ctrs, cflags, n, 1))
-		return ERR_PTR(-EINVAL);
+		return -EINVAL;
 
 	cpuhw = &get_cpu_var(cpu_hw_events);
 	err = power_check_constraints(cpuhw, events, cflags, n + 1);
 	put_cpu_var(cpu_hw_events);
 	if (err)
-		return ERR_PTR(-EINVAL);
+		return -EINVAL;
 
 	event->hw.config = events[n];
 	event->hw.event_base = cflags[n];
@@ -1129,11 +1121,20 @@ struct pmu *hw_perf_event_init(struct pe
 	}
 	event->destroy = hw_perf_event_destroy;
 
-	if (err)
-		return ERR_PTR(err);
-	return &power_pmu;
+	return err;
 }
 
+struct pmu power_pmu = {
+	.event_init	= power_pmu_event_init,
+	.enable		= power_pmu_enable,
+	.disable	= power_pmu_disable,
+	.read		= power_pmu_read,
+	.unthrottle	= power_pmu_unthrottle,
+	.start_txn	= power_pmu_start_txn,
+	.cancel_txn	= power_pmu_cancel_txn,
+	.commit_txn	= power_pmu_commit_txn,
+};
+
 /*
  * A counter has overflowed; update its count and record
  * things if requested.  Note that interrupts are hard-disabled
@@ -1339,6 +1340,7 @@ int register_power_pmu(struct power_pmu 
 		freeze_events_kernel = MMCR0_FCHV;
 #endif /* CONFIG_PPC64 */
 
+	perf_pmu_register(&power_pmu);
 	perf_cpu_notifier(power_pmu_notifier);
 
 	return 0;
Index: linux-2.6/arch/powerpc/kernel/perf_event_fsl_emb.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/perf_event_fsl_emb.c
+++ linux-2.6/arch/powerpc/kernel/perf_event_fsl_emb.c
@@ -378,13 +378,6 @@ static void fsl_emb_pmu_unthrottle(struc
 	local_irq_restore(flags);
 }
 
-static struct pmu fsl_emb_pmu = {
-	.enable		= fsl_emb_pmu_enable,
-	.disable	= fsl_emb_pmu_disable,
-	.read		= fsl_emb_pmu_read,
-	.unthrottle	= fsl_emb_pmu_unthrottle,
-};
-
 /*
  * Release the PMU if this is the last perf_event.
  */
@@ -428,7 +421,7 @@ static int hw_perf_cache_event(u64 confi
 	return 0;
 }
 
-struct pmu *hw_perf_event_init(struct perf_event *event)
+static int fsl_emb_pmu_event_init(struct perf_event *event)
 {
 	u64 ev;
 	struct perf_event *events[MAX_HWEVENTS];
@@ -456,12 +449,12 @@ struct pmu *hw_perf_event_init(struct pe
 		break;
 
 	default:
-		return ERR_PTR(-EINVAL);
+		return -ENOENT;
 	}
 
 	event->hw.config = ppmu->xlate_event(ev);
 	if (!(event->hw.config & FSL_EMB_EVENT_VALID))
-		return ERR_PTR(-EINVAL);
+		return -EINVAL;
 
 	/*
 	 * If this is in a group, check if it can go on with all the
@@ -473,7 +466,7 @@ struct pmu *hw_perf_event_init(struct pe
 		n = collect_events(event->group_leader,
 		                   ppmu->n_counter - 1, events);
 		if (n < 0)
-			return ERR_PTR(-EINVAL);
+			return -EINVAL;
 	}
 
 	if (event->hw.config & FSL_EMB_EVENT_RESTRICTED) {
@@ -484,7 +477,7 @@ struct pmu *hw_perf_event_init(struct pe
 		}
 
 		if (num_restricted >= ppmu->n_restricted)
-			return ERR_PTR(-EINVAL);
+			return -EINVAL;
 	}
 
 	event->hw.idx = -1;
@@ -523,11 +516,17 @@ struct pmu *hw_perf_event_init(struct pe
 	}
 	event->destroy = hw_perf_event_destroy;
 
-	if (err)
-		return ERR_PTR(err);
-	return &fsl_emb_pmu;
+	return err;
 }
 
+static struct pmu fsl_emb_pmu = {
+	.event_init	= fsl_emb_pmu_event_init,
+	.enable		= fsl_emb_pmu_enable,
+	.disable	= fsl_emb_pmu_disable,
+	.read		= fsl_emb_pmu_read,
+	.unthrottle	= fsl_emb_pmu_unthrottle,
+};
+
 /*
  * A counter has overflowed; update its count and record
  * things if requested.  Note that interrupts are hard-disabled
@@ -650,5 +649,7 @@ int register_fsl_emb_pmu(struct fsl_emb_
 	pr_info("%s performance monitor hardware support registered\n",
 		pmu->name);
 
+	perf_pmu_register(&fsl_emb_pmu);
+
 	return 0;
 }
Index: linux-2.6/arch/sh/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/sh/kernel/perf_event.c
+++ linux-2.6/arch/sh/kernel/perf_event.c
@@ -257,24 +257,24 @@ static void sh_pmu_read(struct perf_even
 	sh_perf_event_update(event, &event->hw, event->hw.idx);
 }
 
-static struct pmu pmu = {
-	.enable		= sh_pmu_enable,
-	.disable	= sh_pmu_disable,
-	.read		= sh_pmu_read,
-};
-
-struct pmu *hw_perf_event_init(struct perf_event *event)
+static in sh_pmu_event_init(struct perf_event *event)
 {
 	int err = __hw_perf_event_init(event);
 	if (unlikely(err)) {
 		if (event->destroy)
 			event->destroy(event);
-		return ERR_PTR(err);
 	}
 
-	return &pmu;
+	return err;
 }
 
+static struct pmu pmu = {
+	.event_init	= sh_pmu_event_init,
+	.enable		= sh_pmu_enable,
+	.disable	= sh_pmu_disable,
+	.read		= sh_pmu_read,
+};
+
 static void sh_pmu_setup(int cpu)
 {
 	struct cpu_hw_events *cpuhw = &per_cpu(cpu_hw_events, cpu);
@@ -325,6 +325,7 @@ int __cpuinit register_sh_pmu(struct sh_
 
 	WARN_ON(pmu->num_events > MAX_HWEVENTS);
 
+	perf_pmu_register(&pmu);
 	perf_cpu_notifier(sh_pmu_notifier);
 	return 0;
 }
Index: linux-2.6/arch/sparc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/perf_event.c
+++ linux-2.6/arch/sparc/kernel/perf_event.c
@@ -1024,7 +1024,7 @@ out:
 	return ret;
 }
 
-static int __hw_perf_event_init(struct perf_event *event)
+static int sparc_pmu_event_init(struct perf_event *event)
 {
 	struct perf_event_attr *attr = &event->attr;
 	struct perf_event *evts[MAX_HWEVENTS];
@@ -1037,17 +1037,27 @@ static int __hw_perf_event_init(struct p
 	if (atomic_read(&nmi_active) < 0)
 		return -ENODEV;
 
-	if (attr->type == PERF_TYPE_HARDWARE) {
+	switch (attr->type) {
+	case PERF_TYPE_HARDWARE:
 		if (attr->config >= sparc_pmu->max_events)
 			return -EINVAL;
 		pmap = sparc_pmu->event_map(attr->config);
-	} else if (attr->type == PERF_TYPE_HW_CACHE) {
+		break;
+
+	case PERF_TYPE_HW_CACHE:
 		pmap = sparc_map_cache_event(attr->config);
 		if (IS_ERR(pmap))
 			return PTR_ERR(pmap);
-	} else
+		break;
+
+	case PERF_TYPE_RAW:
 		return -EOPNOTSUPP;
 
+	default:
+		return -ENOENT;
+
+	}
+
 	/* We save the enable bits in the config_base.  */
 	hwc->config_base = sparc_pmu->irq_bit;
 	if (!attr->exclude_user)
@@ -1142,6 +1152,7 @@ static int sparc_pmu_commit_txn(struct p
 }
 
 static struct pmu pmu = {
+	.event_init	= sparc_pmu_event_init,
 	.enable		= sparc_pmu_enable,
 	.disable	= sparc_pmu_disable,
 	.read		= sparc_pmu_read,
@@ -1151,15 +1162,6 @@ static struct pmu pmu = {
 	.commit_txn	= sparc_pmu_commit_txn,
 };
 
-struct pmu *hw_perf_event_init(struct perf_event *event)
-{
-	int err = __hw_perf_event_init(event);
-
-	if (err)
-		return ERR_PTR(err);
-	return &pmu;
-}
-
 void perf_event_print_debug(void)
 {
 	unsigned long flags;
@@ -1279,6 +1281,7 @@ void __init init_hw_perf_events(void)
 	/* All sparc64 PMUs currently have 2 events.  */
 	perf_max_events = 2;
 
+	perf_pmu_register(&pmu);
 	register_die_notifier(&perf_event_nmi_notifier);
 }
 
Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -530,7 +530,7 @@ static int x86_pmu_hw_config(struct perf
 /*
  * Setup the hardware configuration for a given attr_type
  */
-static int __hw_perf_event_init(struct perf_event *event)
+static int __x86_pmu_event_init(struct perf_event *event)
 {
 	int err;
 
@@ -1381,6 +1381,7 @@ void __init init_hw_perf_events(void)
 	pr_info("... fixed-purpose events:   %d\n",     x86_pmu.num_counters_fixed);
 	pr_info("... event mask:             %016Lx\n", x86_pmu.intel_ctrl);
 
+	perf_pmu_register(&pmu);
 	perf_cpu_notifier(x86_pmu_notifier);
 }
 
@@ -1450,18 +1451,6 @@ static int x86_pmu_commit_txn(struct pmu
 	return 0;
 }
 
-static struct pmu pmu = {
-	.enable		= x86_pmu_enable,
-	.disable	= x86_pmu_disable,
-	.start		= x86_pmu_start,
-	.stop		= x86_pmu_stop,
-	.read		= x86_pmu_read,
-	.unthrottle	= x86_pmu_unthrottle,
-	.start_txn	= x86_pmu_start_txn,
-	.cancel_txn	= x86_pmu_cancel_txn,
-	.commit_txn	= x86_pmu_commit_txn,
-};
-
 /*
  * validate that we can schedule this event
  */
@@ -1536,12 +1525,22 @@ out:
 	return ret;
 }
 
-struct pmu *hw_perf_event_init(struct perf_event *event)
+int x86_pmu_event_init(struct perf_event *event)
 {
 	struct pmu *tmp;
 	int err;
 
-	err = __hw_perf_event_init(event);
+	switch (event->attr.type) {
+	case PERF_TYPE_RAW:
+	case PERF_TYPE_HARDWARE:
+	case PERF_TYPE_HW_CACHE:
+		break;
+
+	default:
+		return -ENOENT;
+	}
+
+	err = __x86_pmu_event_init(event);
 	if (!err) {
 		/*
 		 * we temporarily connect event to its pmu
@@ -1561,12 +1560,24 @@ struct pmu *hw_perf_event_init(struct pe
 	if (err) {
 		if (event->destroy)
 			event->destroy(event);
-		return ERR_PTR(err);
 	}
 
-	return &pmu;
+	return err;
 }
 
+static struct pmu pmu = {
+	.event_init	= x86_pmu_event_init,
+	.enable		= x86_pmu_enable,
+	.disable	= x86_pmu_disable,
+	.start		= x86_pmu_start,
+	.stop		= x86_pmu_stop,
+	.read		= x86_pmu_read,
+	.unthrottle	= x86_pmu_unthrottle,
+	.start_txn	= x86_pmu_start_txn,
+	.cancel_txn	= x86_pmu_cancel_txn,
+	.commit_txn	= x86_pmu_commit_txn,
+};
+
 /*
  * callchain support
  */
Index: linux-2.6/include/linux/perf_event.h
===================================================================
--- linux-2.6.orig/include/linux/perf_event.h
+++ linux-2.6/include/linux/perf_event.h
@@ -559,6 +559,13 @@ struct perf_event;
  * struct pmu - generic performance monitoring unit
  */
 struct pmu {
+	struct list_head		entry;
+
+	/*
+	 * Should return -ENOENT when the @event doesn't match this pmu
+	 */
+	int (*event_init)		(struct perf_event *event);
+
 	int (*enable)			(struct perf_event *event);
 	void (*disable)			(struct perf_event *event);
 	int (*start)			(struct perf_event *event);
@@ -845,7 +852,8 @@ struct perf_output_handle {
  */
 extern int perf_max_events;
 
-extern struct pmu *hw_perf_event_init(struct perf_event *event);
+extern int perf_pmu_register(struct pmu *pmu);
+extern void perf_pmu_unregister(struct pmu *pmu);
 
 extern void perf_event_task_sched_in(struct task_struct *task);
 extern void perf_event_task_sched_out(struct task_struct *task, struct task_struct *next);
Index: linux-2.6/kernel/hw_breakpoint.c
===================================================================
--- linux-2.6.orig/kernel/hw_breakpoint.c
+++ linux-2.6/kernel/hw_breakpoint.c
@@ -549,6 +549,34 @@ static struct notifier_block hw_breakpoi
 	.priority = 0x7fffffff
 };
 
+static void bp_perf_event_destroy(struct perf_event *event)
+{
+	release_bp_slot(event);
+}
+
+static int hw_breakpoint_event_init(struct perf_event *bp)
+{
+	int err;
+
+	if (bp->attr.type != PERF_TYPE_BREAKPOINT)
+		return -ENOENT;
+
+	err = register_perf_hw_breakpoint(bp);
+	if (err)
+		return err;
+
+	bp->destroy = bp_perf_event_destroy;
+
+	return 0;
+}
+
+static struct pmu perf_breakpoint = {
+	.event_init	= hw_breakpoint_event_init,
+	.enable		= arch_install_hw_breakpoint,
+	.disable	= arch_uninstall_hw_breakpoint,
+	.read		= hw_breakpoint_pmu_read,
+};
+
 static int __init init_hw_breakpoint(void)
 {
 	unsigned int **task_bp_pinned;
@@ -570,6 +598,8 @@ static int __init init_hw_breakpoint(voi
 
 	constraints_initialized = 1;
 
+	perf_pmu_register(&perf_breakpoint);
+
 	return register_die_notifier(&hw_breakpoint_exceptions_nb);
 
  err_alloc:
@@ -585,8 +615,3 @@ static int __init init_hw_breakpoint(voi
 core_initcall(init_hw_breakpoint);
 
 
-struct pmu perf_ops_bp = {
-	.enable		= arch_install_hw_breakpoint,
-	.disable	= arch_uninstall_hw_breakpoint,
-	.read		= hw_breakpoint_pmu_read,
-};
Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -31,7 +31,6 @@
 #include <linux/kernel_stat.h>
 #include <linux/perf_event.h>
 #include <linux/ftrace_event.h>
-#include <linux/hw_breakpoint.h>
 
 #include <asm/irq_regs.h>
 
@@ -72,14 +71,6 @@ static atomic64_t perf_event_id;
  */
 static DEFINE_SPINLOCK(perf_resource_lock);
 
-/*
- * Architecture provided APIs - weak aliases:
- */
-extern __weak struct pmu *hw_perf_event_init(struct perf_event *event)
-{
-	return NULL;
-}
-
 void __weak hw_perf_disable(void)		{ barrier(); }
 void __weak hw_perf_enable(void)		{ barrier(); }
 
@@ -4291,182 +4282,6 @@ static int perf_swevent_int(struct perf_
 	return 0;
 }
 
-static struct pmu perf_ops_generic = {
-	.enable		= perf_swevent_enable,
-	.disable	= perf_swevent_disable,
-	.start		= perf_swevent_int,
-	.stop		= perf_swevent_void,
-	.read		= perf_swevent_read,
-	.unthrottle	= perf_swevent_void, /* hwc->interrupts already reset */
-};
-
-/*
- * hrtimer based swevent callback
- */
-
-static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
-{
-	enum hrtimer_restart ret = HRTIMER_RESTART;
-	struct perf_sample_data data;
-	struct pt_regs *regs;
-	struct perf_event *event;
-	u64 period;
-
-	event = container_of(hrtimer, struct perf_event, hw.hrtimer);
-	event->pmu->read(event);
-
-	perf_sample_data_init(&data, 0);
-	data.period = event->hw.last_period;
-	regs = get_irq_regs();
-
-	if (regs && !perf_exclude_event(event, regs)) {
-		if (!(event->attr.exclude_idle && current->pid == 0))
-			if (perf_event_overflow(event, 0, &data, regs))
-				ret = HRTIMER_NORESTART;
-	}
-
-	period = max_t(u64, 10000, event->hw.sample_period);
-	hrtimer_forward_now(hrtimer, ns_to_ktime(period));
-
-	return ret;
-}
-
-static void perf_swevent_start_hrtimer(struct perf_event *event)
-{
-	struct hw_perf_event *hwc = &event->hw;
-
-	hrtimer_init(&hwc->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-	hwc->hrtimer.function = perf_swevent_hrtimer;
-	if (hwc->sample_period) {
-		u64 period;
-
-		if (hwc->remaining) {
-			if (hwc->remaining < 0)
-				period = 10000;
-			else
-				period = hwc->remaining;
-			hwc->remaining = 0;
-		} else {
-			period = max_t(u64, 10000, hwc->sample_period);
-		}
-		__hrtimer_start_range_ns(&hwc->hrtimer,
-				ns_to_ktime(period), 0,
-				HRTIMER_MODE_REL, 0);
-	}
-}
-
-static void perf_swevent_cancel_hrtimer(struct perf_event *event)
-{
-	struct hw_perf_event *hwc = &event->hw;
-
-	if (hwc->sample_period) {
-		ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
-		hwc->remaining = ktime_to_ns(remaining);
-
-		hrtimer_cancel(&hwc->hrtimer);
-	}
-}
-
-/*
- * Software event: cpu wall time clock
- */
-
-static void cpu_clock_perf_event_update(struct perf_event *event)
-{
-	int cpu = raw_smp_processor_id();
-	s64 prev;
-	u64 now;
-
-	now = cpu_clock(cpu);
-	prev = local64_xchg(&event->hw.prev_count, now);
-	local64_add(now - prev, &event->count);
-}
-
-static int cpu_clock_perf_event_enable(struct perf_event *event)
-{
-	struct hw_perf_event *hwc = &event->hw;
-	int cpu = raw_smp_processor_id();
-
-	local64_set(&hwc->prev_count, cpu_clock(cpu));
-	perf_swevent_start_hrtimer(event);
-
-	return 0;
-}
-
-static void cpu_clock_perf_event_disable(struct perf_event *event)
-{
-	perf_swevent_cancel_hrtimer(event);
-	cpu_clock_perf_event_update(event);
-}
-
-static void cpu_clock_perf_event_read(struct perf_event *event)
-{
-	cpu_clock_perf_event_update(event);
-}
-
-static struct pmu perf_ops_cpu_clock = {
-	.enable		= cpu_clock_perf_event_enable,
-	.disable	= cpu_clock_perf_event_disable,
-	.read		= cpu_clock_perf_event_read,
-};
-
-/*
- * Software event: task time clock
- */
-
-static void task_clock_perf_event_update(struct perf_event *event, u64 now)
-{
-	u64 prev;
-	s64 delta;
-
-	prev = local64_xchg(&event->hw.prev_count, now);
-	delta = now - prev;
-	local64_add(delta, &event->count);
-}
-
-static int task_clock_perf_event_enable(struct perf_event *event)
-{
-	struct hw_perf_event *hwc = &event->hw;
-	u64 now;
-
-	now = event->ctx->time;
-
-	local64_set(&hwc->prev_count, now);
-
-	perf_swevent_start_hrtimer(event);
-
-	return 0;
-}
-
-static void task_clock_perf_event_disable(struct perf_event *event)
-{
-	perf_swevent_cancel_hrtimer(event);
-	task_clock_perf_event_update(event, event->ctx->time);
-
-}
-
-static void task_clock_perf_event_read(struct perf_event *event)
-{
-	u64 time;
-
-	if (!in_nmi()) {
-		update_context_time(event->ctx);
-		time = event->ctx->time;
-	} else {
-		u64 now = perf_clock();
-		u64 delta = now - event->ctx->timestamp;
-		time = event->ctx->time + delta;
-	}
-
-	task_clock_perf_event_update(event, time);
-}
-
-static struct pmu perf_ops_task_clock = {
-	.enable		= task_clock_perf_event_enable,
-	.disable	= task_clock_perf_event_disable,
-	.read		= task_clock_perf_event_read,
-};
-
 /* Deref the hlist from the update side */
 static inline struct swevent_hlist *
 swevent_hlist_deref(struct perf_cpu_context *cpuctx)
@@ -4573,17 +4388,63 @@ static int swevent_hlist_get(struct perf
 	return err;
 }
 
-#ifdef CONFIG_EVENT_TRACING
+atomic_t perf_swevent_enabled[PERF_COUNT_SW_MAX];
 
-static struct pmu perf_ops_tracepoint = {
-	.enable		= perf_trace_enable,
-	.disable	= perf_trace_disable,
+static void sw_perf_event_destroy(struct perf_event *event)
+{
+	u64 event_id = event->attr.config;
+
+	WARN_ON(event->parent);
+
+	atomic_dec(&perf_swevent_enabled[event_id]);
+	swevent_hlist_put(event);
+}
+
+static int perf_swevent_init(struct perf_event *event)
+{
+	int event_id = event->attr.config;
+
+	if (event->attr.type != PERF_TYPE_SOFTWARE)
+		return -ENOENT;
+
+	switch (event_id) {
+	case PERF_COUNT_SW_CPU_CLOCK:
+	case PERF_COUNT_SW_TASK_CLOCK:
+		return -ENOENT;
+
+	default:
+		break;
+	}
+
+	if (event_id > PERF_COUNT_SW_MAX)
+		return -ENOENT;
+
+	if (!event->parent) {
+		int err;
+
+		err = swevent_hlist_get(event);
+		if (err)
+			return err;
+
+		atomic_inc(&perf_swevent_enabled[event_id]);
+		event->destroy = sw_perf_event_destroy;
+	}
+
+	return 0;
+}
+
+static struct pmu perf_swevent = {
+	.event_init	= perf_swevent_init,
+	.enable		= perf_swevent_enable,
+	.disable	= perf_swevent_disable,
 	.start		= perf_swevent_int,
 	.stop		= perf_swevent_void,
 	.read		= perf_swevent_read,
-	.unthrottle	= perf_swevent_void,
+	.unthrottle	= perf_swevent_void, /* hwc->interrupts already reset */
 };
 
+#ifdef CONFIG_EVENT_TRACING
+
 static int perf_tp_filter_match(struct perf_event *event,
 				struct perf_sample_data *data)
 {
@@ -4639,10 +4500,13 @@ static void tp_perf_event_destroy(struct
 	perf_trace_destroy(event);
 }
 
-static struct pmu *tp_perf_event_init(struct perf_event *event)
+static int perf_tp_event_init(struct perf_event *event)
 {
 	int err;
 
+	if (event->attr.type != PERF_TYPE_TRACEPOINT)
+		return -ENOENT;
+
 	/*
 	 * Raw tracepoint data is a severe data leak, only allow root to
 	 * have these.
@@ -4650,15 +4514,30 @@ static struct pmu *tp_perf_event_init(st
 	if ((event->attr.sample_type & PERF_SAMPLE_RAW) &&
 			perf_paranoid_tracepoint_raw() &&
 			!capable(CAP_SYS_ADMIN))
-		return ERR_PTR(-EPERM);
+		return -EPERM;
 
 	err = perf_trace_init(event);
 	if (err)
-		return NULL;
+		return err;
 
 	event->destroy = tp_perf_event_destroy;
 
-	return &perf_ops_tracepoint;
+	return 0;
+}
+
+static struct pmu perf_tracepoint = {
+	.event_init	= perf_tp_event_init,
+	.enable		= perf_trace_enable,
+	.disable	= perf_trace_disable,
+	.start		= perf_swevent_int,
+	.stop		= perf_swevent_void,
+	.read		= perf_swevent_read,
+	.unthrottle	= perf_swevent_void,
+};
+
+static inline void perf_tp_register(void)
+{
+	perf_pmu_register(&perf_tracepoint);
 }
 
 static int perf_event_set_filter(struct perf_event *event, void __user *arg)
@@ -4686,9 +4565,8 @@ static void perf_event_free_filter(struc
 
 #else
 
-static struct pmu *tp_perf_event_init(struct perf_event *event)
+static inline void perf_tp_register(void)
 {
-	return NULL;
 }
 
 static int perf_event_set_filter(struct perf_event *event, void __user *arg)
@@ -4703,105 +4581,247 @@ static void perf_event_free_filter(struc
 #endif /* CONFIG_EVENT_TRACING */
 
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
-static void bp_perf_event_destroy(struct perf_event *event)
+void perf_bp_event(struct perf_event *bp, void *data)
 {
-	release_bp_slot(event);
+	struct perf_sample_data sample;
+	struct pt_regs *regs = data;
+
+	perf_sample_data_init(&sample, bp->attr.bp_addr);
+
+	if (!perf_exclude_event(bp, regs))
+		perf_swevent_add(bp, 1, 1, &sample, regs);
 }
+#endif
 
-static struct pmu *bp_perf_event_init(struct perf_event *bp)
+/*
+ * hrtimer based swevent callback
+ */
+
+static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
 {
-	int err;
+	enum hrtimer_restart ret = HRTIMER_RESTART;
+	struct perf_sample_data data;
+	struct pt_regs *regs;
+	struct perf_event *event;
+	u64 period;
 
-	err = register_perf_hw_breakpoint(bp);
-	if (err)
-		return ERR_PTR(err);
+	event = container_of(hrtimer, struct perf_event, hw.hrtimer);
+	event->pmu->read(event);
+
+	perf_sample_data_init(&data, 0);
+	data.period = event->hw.last_period;
+	regs = get_irq_regs();
+
+	if (regs && !perf_exclude_event(event, regs)) {
+		if (!(event->attr.exclude_idle && current->pid == 0))
+			if (perf_event_overflow(event, 0, &data, regs))
+				ret = HRTIMER_NORESTART;
+	}
 
-	bp->destroy = bp_perf_event_destroy;
+	period = max_t(u64, 10000, event->hw.sample_period);
+	hrtimer_forward_now(hrtimer, ns_to_ktime(period));
 
-	return &perf_ops_bp;
+	return ret;
 }
 
-void perf_bp_event(struct perf_event *bp, void *data)
+static void perf_swevent_start_hrtimer(struct perf_event *event)
 {
-	struct perf_sample_data sample;
-	struct pt_regs *regs = data;
+	struct hw_perf_event *hwc = &event->hw;
 
-	perf_sample_data_init(&sample, bp->attr.bp_addr);
+	hrtimer_init(&hwc->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	hwc->hrtimer.function = perf_swevent_hrtimer;
+	if (hwc->sample_period) {
+		u64 period;
 
-	if (!perf_exclude_event(bp, regs))
-		perf_swevent_add(bp, 1, 1, &sample, regs);
+		if (hwc->remaining) {
+			if (hwc->remaining < 0)
+				period = 10000;
+			else
+				period = hwc->remaining;
+			hwc->remaining = 0;
+		} else {
+			period = max_t(u64, 10000, hwc->sample_period);
+		}
+		__hrtimer_start_range_ns(&hwc->hrtimer,
+				ns_to_ktime(period), 0,
+				HRTIMER_MODE_REL, 0);
+	}
 }
-#else
-static struct pmu *bp_perf_event_init(struct perf_event *bp)
+
+static void perf_swevent_cancel_hrtimer(struct perf_event *event)
 {
-	return NULL;
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (hwc->sample_period) {
+		ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
+		hwc->remaining = ktime_to_ns(remaining);
+
+		hrtimer_cancel(&hwc->hrtimer);
+	}
 }
 
-void perf_bp_event(struct perf_event *bp, void *regs)
+/*
+ * Software event: cpu wall time clock
+ */
+
+static void cpu_clock_event_update(struct perf_event *event)
 {
+	int cpu = raw_smp_processor_id();
+	s64 prev;
+	u64 now;
+
+	now = cpu_clock(cpu);
+	prev = local64_xchg(&event->hw.prev_count, now);
+	local64_add(now - prev, &event->count);
 }
-#endif
 
-atomic_t perf_swevent_enabled[PERF_COUNT_SW_MAX];
+static int cpu_clock_event_enable(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	int cpu = raw_smp_processor_id();
 
-static void sw_perf_event_destroy(struct perf_event *event)
+	local64_set(&hwc->prev_count, cpu_clock(cpu));
+	perf_swevent_start_hrtimer(event);
+
+	return 0;
+}
+
+static void cpu_clock_event_disable(struct perf_event *event)
 {
-	u64 event_id = event->attr.config;
+	perf_swevent_cancel_hrtimer(event);
+	cpu_clock_event_update(event);
+}
 
-	WARN_ON(event->parent);
+static void cpu_clock_event_read(struct perf_event *event)
+{
+	cpu_clock_event_update(event);
+}
 
-	atomic_dec(&perf_swevent_enabled[event_id]);
-	swevent_hlist_put(event);
+static int cpu_clock_event_init(struct perf_event *event)
+{
+	if (event->attr.type != PERF_TYPE_SOFTWARE)
+		return -ENOENT;
+
+	if (event->attr.config != PERF_COUNT_SW_CPU_CLOCK)
+		return -ENOENT;
+
+	return 0;
 }
 
-static struct pmu *sw_perf_event_init(struct perf_event *event)
+static struct pmu perf_cpu_clock = {
+	.event_init	= cpu_clock_event_init,
+	.enable		= cpu_clock_event_enable,
+	.disable	= cpu_clock_event_disable,
+	.read		= cpu_clock_event_read,
+};
+
+/*
+ * Software event: task time clock
+ */
+
+static void task_clock_event_update(struct perf_event *event, u64 now)
 {
-	struct pmu *pmu = NULL;
-	u64 event_id = event->attr.config;
+	u64 prev;
+	s64 delta;
 
-	/*
-	 * Software events (currently) can't in general distinguish
-	 * between user, kernel and hypervisor events.
-	 * However, context switches and cpu migrations are considered
-	 * to be kernel events, and page faults are never hypervisor
-	 * events.
-	 */
-	switch (event_id) {
-	case PERF_COUNT_SW_CPU_CLOCK:
-		pmu = &perf_ops_cpu_clock;
+	prev = local64_xchg(&event->hw.prev_count, now);
+	delta = now - prev;
+	local64_add(delta, &event->count);
+}
 
-		break;
-	case PERF_COUNT_SW_TASK_CLOCK:
-		/*
-		 * If the user instantiates this as a per-cpu event,
-		 * use the cpu_clock event instead.
-		 */
-		if (event->ctx->task)
-			pmu = &perf_ops_task_clock;
-		else
-			pmu = &perf_ops_cpu_clock;
+static int task_clock_event_enable(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	u64 now;
 
-		break;
-	case PERF_COUNT_SW_PAGE_FAULTS:
-	case PERF_COUNT_SW_PAGE_FAULTS_MIN:
-	case PERF_COUNT_SW_PAGE_FAULTS_MAJ:
-	case PERF_COUNT_SW_CONTEXT_SWITCHES:
-	case PERF_COUNT_SW_CPU_MIGRATIONS:
-	case PERF_COUNT_SW_ALIGNMENT_FAULTS:
-	case PERF_COUNT_SW_EMULATION_FAULTS:
-		if (!event->parent) {
-			int err;
-
-			err = swevent_hlist_get(event);
-			if (err)
-				return ERR_PTR(err);
+	now = event->ctx->time;
+
+	local64_set(&hwc->prev_count, now);
+
+	perf_swevent_start_hrtimer(event);
+
+	return 0;
+}
+
+static void task_clock_event_disable(struct perf_event *event)
+{
+	perf_swevent_cancel_hrtimer(event);
+	task_clock_event_update(event, event->ctx->time);
+
+}
+
+static void task_clock_event_read(struct perf_event *event)
+{
+	u64 time;
+
+	if (!in_nmi()) {
+		update_context_time(event->ctx);
+		time = event->ctx->time;
+	} else {
+		u64 now = perf_clock();
+		u64 delta = now - event->ctx->timestamp;
+		time = event->ctx->time + delta;
+	}
+
+	task_clock_event_update(event, time);
+}
+
+static int task_clock_event_init(struct perf_event *event)
+{
+	if (event->attr.type != PERF_TYPE_SOFTWARE)
+		return -ENOENT;
+
+	if (event->attr.config != PERF_COUNT_SW_TASK_CLOCK)
+		return -ENOENT;
+
+	return 0;
+}
+
+static struct pmu perf_task_clock = {
+	.event_init	= task_clock_event_init,
+	.enable		= task_clock_event_enable,
+	.disable	= task_clock_event_disable,
+	.read		= task_clock_event_read,
+};
+
+static LIST_HEAD(pmus);
+static DEFINE_MUTEX(pmus_lock);
+static struct srcu_struct pmus_srcu;
+
+int perf_pmu_register(struct pmu *pmu)
+{
+	mutex_lock(&pmus_lock);
+	list_add_rcu(&pmu->entry, &pmus);
+	mutex_unlock(&pmus_lock);
+
+	return 0;
+}
+
+void perf_pmu_unregister(struct pmu *pmu)
+{
+	mutex_lock(&pmus_lock);
+	list_del_rcu(&pmu->entry);
+	mutex_unlock(&pmus_lock);
 
-			atomic_inc(&perf_swevent_enabled[event_id]);
-			event->destroy = sw_perf_event_destroy;
+	synchronize_srcu(&pmus_srcu);
+}
+
+struct pmu *perf_init_event(struct perf_event *event)
+{
+	struct pmu *pmu = NULL;
+	int idx;
+
+	idx = srcu_read_lock(&pmus_srcu);
+	list_for_each_entry_rcu(pmu, &pmus, entry) {
+		int ret = pmu->event_init(event);
+		if (!ret)
+			break;
+		if (ret != -ENOENT) {
+			pmu = ERR_PTR(ret);
+			break;
 		}
-		pmu = &perf_ops_generic;
-		break;
 	}
+	srcu_read_unlock(&pmus_srcu, idx);
 
 	return pmu;
 }
@@ -4882,29 +4902,8 @@ perf_event_alloc(struct perf_event_attr 
 	if (attr->inherit && (attr->read_format & PERF_FORMAT_GROUP))
 		goto done;
 
-	switch (attr->type) {
-	case PERF_TYPE_RAW:
-	case PERF_TYPE_HARDWARE:
-	case PERF_TYPE_HW_CACHE:
-		pmu = hw_perf_event_init(event);
-		break;
-
-	case PERF_TYPE_SOFTWARE:
-		pmu = sw_perf_event_init(event);
-		break;
-
-	case PERF_TYPE_TRACEPOINT:
-		pmu = tp_perf_event_init(event);
-		break;
+	pmu = perf_init_event(event);
 
-	case PERF_TYPE_BREAKPOINT:
-		pmu = bp_perf_event_init(event);
-		break;
-
-
-	default:
-		break;
-	}
 done:
 	err = 0;
 	if (!pmu)
@@ -5743,15 +5742,15 @@ perf_cpu_notify(struct notifier_block *s
 {
 	unsigned int cpu = (long)hcpu;
 
-	switch (action) {
+	switch (action & ~CPU_TASKS_FROZEN) {
 
 	case CPU_UP_PREPARE:
-	case CPU_UP_PREPARE_FROZEN:
+	case CPU_DOWN_FAILED:
 		perf_event_init_cpu(cpu);
 		break;
 
+	case CPU_UP_CANCELED:
 	case CPU_DOWN_PREPARE:
-	case CPU_DOWN_PREPARE_FROZEN:
 		perf_event_exit_cpu(cpu);
 		break;
 
@@ -5762,22 +5761,15 @@ perf_cpu_notify(struct notifier_block *s
 	return NOTIFY_OK;
 }
 
-/*
- * This has to have a higher priority than migration_notifier in sched.c.
- */
-static struct notifier_block __cpuinitdata perf_cpu_nb = {
-	.notifier_call		= perf_cpu_notify,
-	.priority		= 20,
-};
-
 void __init perf_event_init(void)
 {
 	perf_event_init_all_cpus();
-	perf_cpu_notify(&perf_cpu_nb, (unsigned long)CPU_UP_PREPARE,
-			(void *)(long)smp_processor_id());
-	perf_cpu_notify(&perf_cpu_nb, (unsigned long)CPU_ONLINE,
-			(void *)(long)smp_processor_id());
-	register_cpu_notifier(&perf_cpu_nb);
+	init_srcu_struct(&pmus_srcu);
+	perf_pmu_register(&perf_swevent);
+	perf_pmu_register(&perf_cpu_clock);
+	perf_pmu_register(&perf_task_clock);
+	perf_tp_register();
+	perf_cpu_notifier(perf_cpu_notify);
 }
 
 static ssize_t perf_show_reserve_percpu(struct sysdev_class *class,



^ permalink raw reply	[flat|nested] 56+ messages in thread

* [RFC][PATCH 06/11] perf: Unindent labels
  2010-06-24 14:28 [RFC][PATCH 00/11] perf pmu interface -v2 Peter Zijlstra
                   ` (4 preceding siblings ...)
  2010-06-24 14:28 ` [RFC][PATCH 05/11] perf: register pmu implementations Peter Zijlstra
@ 2010-06-24 14:28 ` Peter Zijlstra
  2010-06-24 14:28 ` [RFC][PATCH 07/11] perf: Reduce perf_disable() usage Peter Zijlstra
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-24 14:28 UTC (permalink / raw)
  To: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: perf-whitespace.patch --]
[-- Type: text/plain, Size: 4256 bytes --]

Fixup random annoying style bits

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/perf_event.c |   43 ++++++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 19 deletions(-)

Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -147,7 +147,7 @@ perf_lock_task_context(struct task_struc
 	struct perf_event_context *ctx;
 
 	rcu_read_lock();
- retry:
+retry:
 	ctx = rcu_dereference(task->perf_event_ctxp);
 	if (ctx) {
 		/*
@@ -601,7 +601,7 @@ void perf_event_disable(struct perf_even
 		return;
 	}
 
- retry:
+retry:
 	task_oncpu_function_call(task, __perf_event_disable, event);
 
 	raw_spin_lock_irq(&ctx->lock);
@@ -831,7 +831,7 @@ static void __perf_install_in_context(vo
 	if (!err && !ctx->task && cpuctx->max_pertask)
 		cpuctx->max_pertask--;
 
- unlock:
+unlock:
 	perf_enable();
 
 	raw_spin_unlock(&ctx->lock);
@@ -904,10 +904,12 @@ static void __perf_event_mark_enabled(st
 
 	event->state = PERF_EVENT_STATE_INACTIVE;
 	event->tstamp_enabled = ctx->time - event->total_time_enabled;
-	list_for_each_entry(sub, &event->sibling_list, group_entry)
-		if (sub->state >= PERF_EVENT_STATE_INACTIVE)
+	list_for_each_entry(sub, &event->sibling_list, group_entry) {
+		if (sub->state >= PERF_EVENT_STATE_INACTIVE) {
 			sub->tstamp_enabled =
 				ctx->time - sub->total_time_enabled;
+		}
+	}
 }
 
 /*
@@ -973,7 +975,7 @@ static void __perf_event_enable(void *in
 		}
 	}
 
- unlock:
+unlock:
 	raw_spin_unlock(&ctx->lock);
 }
 
@@ -1014,7 +1016,7 @@ void perf_event_enable(struct perf_event
 	if (event->state == PERF_EVENT_STATE_ERROR)
 		event->state = PERF_EVENT_STATE_OFF;
 
- retry:
+retry:
 	raw_spin_unlock_irq(&ctx->lock);
 	task_oncpu_function_call(task, __perf_event_enable, event);
 
@@ -1034,7 +1036,7 @@ void perf_event_enable(struct perf_event
 	if (event->state == PERF_EVENT_STATE_OFF)
 		__perf_event_mark_enabled(event, ctx);
 
- out:
+out:
 	raw_spin_unlock_irq(&ctx->lock);
 }
 
@@ -1074,17 +1076,19 @@ static void ctx_sched_out(struct perf_ev
 	if (!ctx->nr_active)
 		goto out_enable;
 
-	if (event_type & EVENT_PINNED)
+	if (event_type & EVENT_PINNED) {
 		list_for_each_entry(event, &ctx->pinned_groups, group_entry)
 			group_sched_out(event, cpuctx, ctx);
+	}
 
-	if (event_type & EVENT_FLEXIBLE)
+	if (event_type & EVENT_FLEXIBLE) {
 		list_for_each_entry(event, &ctx->flexible_groups, group_entry)
 			group_sched_out(event, cpuctx, ctx);
+	}
 
  out_enable:
 	perf_enable();
- out:
+out:
 	raw_spin_unlock(&ctx->lock);
 }
 
@@ -1323,9 +1327,10 @@ ctx_flexible_sched_in(struct perf_event_
 		if (event->cpu != -1 && event->cpu != smp_processor_id())
 			continue;
 
-		if (group_can_go_on(event, cpuctx, can_add_hw))
+		if (group_can_go_on(event, cpuctx, can_add_hw)) {
 			if (group_sched_in(event, cpuctx, ctx))
 				can_add_hw = 0;
+		}
 	}
 }
 
@@ -1355,7 +1360,7 @@ ctx_sched_in(struct perf_event_context *
 		ctx_flexible_sched_in(ctx, cpuctx);
 
 	perf_enable();
- out:
+out:
 	raw_spin_unlock(&ctx->lock);
 }
 
@@ -1696,7 +1701,7 @@ static void perf_event_enable_on_exec(st
 	raw_spin_unlock(&ctx->lock);
 
 	perf_event_task_sched_in(task);
- out:
+out:
 	local_irq_restore(flags);
 }
 
@@ -1825,7 +1830,7 @@ static struct perf_event_context *find_g
 	if (!ptrace_may_access(task, PTRACE_MODE_READ))
 		goto errout;
 
- retry:
+retry:
 	ctx = perf_lock_task_context(task, &flags);
 	if (ctx) {
 		unclone_ctx(ctx);
@@ -1853,7 +1858,7 @@ static struct perf_event_context *find_g
 	put_task_struct(task);
 	return ctx;
 
- errout:
+errout:
 	put_task_struct(task);
 	return ERR_PTR(err);
 }
@@ -3044,7 +3049,7 @@ again:
 	if (handle->wakeup != local_read(&buffer->wakeup))
 		perf_output_wakeup(handle);
 
- out:
+out:
 	preempt_enable();
 }
 
@@ -4352,7 +4357,7 @@ static int swevent_hlist_get_cpu(struct 
 		rcu_assign_pointer(cpuctx->swevent_hlist, hlist);
 	}
 	cpuctx->hlist_refcount++;
- exit:
+exit:
 	mutex_unlock(&cpuctx->hlist_mutex);
 
 	return err;
@@ -4377,7 +4382,7 @@ static int swevent_hlist_get(struct perf
 	put_online_cpus();
 
 	return 0;
- fail:
+fail:
 	for_each_possible_cpu(cpu) {
 		if (cpu == failed_cpu)
 			break;



^ permalink raw reply	[flat|nested] 56+ messages in thread

* [RFC][PATCH 07/11] perf: Reduce perf_disable() usage
  2010-06-24 14:28 [RFC][PATCH 00/11] perf pmu interface -v2 Peter Zijlstra
                   ` (5 preceding siblings ...)
  2010-06-24 14:28 ` [RFC][PATCH 06/11] perf: Unindent labels Peter Zijlstra
@ 2010-06-24 14:28 ` Peter Zijlstra
  2010-06-24 14:28 ` [RFC][PATCH 08/11] perf: Per PMU disable Peter Zijlstra
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-24 14:28 UTC (permalink / raw)
  To: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: perf-less-disable-1.patch --]
[-- Type: text/plain, Size: 11232 bytes --]

Since the current perf_disable() usage is only an optimization, remove
it for now. This eases the removal of the weak hw_perf_enable
interface.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/powerpc/kernel/perf_event.c         |    3 ++
 arch/powerpc/kernel/perf_event_fsl_emb.c |    8 +++++-
 arch/sparc/kernel/perf_event.c           |    3 ++
 arch/x86/kernel/cpu/perf_event.c         |   22 +++++++++++-------
 include/linux/perf_event.h               |   20 ++++++++--------
 kernel/perf_event.c                      |   37 -------------------------------
 6 files changed, 37 insertions(+), 56 deletions(-)

Index: linux-2.6/include/linux/perf_event.h
===================================================================
--- linux-2.6.orig/include/linux/perf_event.h
+++ linux-2.6/include/linux/perf_event.h
@@ -562,26 +562,26 @@ struct pmu {
 	struct list_head		entry;
 
 	/*
-	 * Should return -ENOENT when the @event doesn't match this pmu
+	 * Should return -ENOENT when the @event doesn't match this PMU.
 	 */
 	int (*event_init)		(struct perf_event *event);
 
-	int (*enable)			(struct perf_event *event);
+	int  (*enable)			(struct perf_event *event);
 	void (*disable)			(struct perf_event *event);
-	int (*start)			(struct perf_event *event);
+	int  (*start)			(struct perf_event *event);
 	void (*stop)			(struct perf_event *event);
 	void (*read)			(struct perf_event *event);
 	void (*unthrottle)		(struct perf_event *event);
 
 	/*
-	 * Group events scheduling is treated as a transaction, add group
-	 * events as a whole and perform one schedulability test. If the test
-	 * fails, roll back the whole group
+	 * Group events scheduling is treated as a transaction, add
+	 * group events as a whole and perform one schedulability test.
+	 * If the test fails, roll back the whole group
 	 */
 
 	/*
-	 * Start the transaction, after this ->enable() doesn't need
-	 * to do schedulability tests.
+	 * Start the transaction, after this ->enable() doesn't need to
+	 * do schedulability tests.
 	 */
 	void (*start_txn)	(struct pmu *pmu);
 	/*
@@ -592,8 +592,8 @@ struct pmu {
 	 */
 	int  (*commit_txn)	(struct pmu *pmu);
 	/*
-	 * Will cancel the transaction, assumes ->disable() is called for
-	 * each successfull ->enable() during the transaction.
+	 * Will cancel the transaction, assumes ->disable() is called
+	 * for each successfull ->enable() during the transaction.
 	 */
 	void (*cancel_txn)	(struct pmu *pmu);
 };
Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -460,11 +460,6 @@ static void __perf_event_remove_from_con
 		return;
 
 	raw_spin_lock(&ctx->lock);
-	/*
-	 * Protect the list operation against NMI by disabling the
-	 * events on a global level.
-	 */
-	perf_disable();
 
 	event_sched_out(event, cpuctx, ctx);
 
@@ -480,7 +475,6 @@ static void __perf_event_remove_from_con
 			    perf_max_events - perf_reserved_percpu);
 	}
 
-	perf_enable();
 	raw_spin_unlock(&ctx->lock);
 }
 
@@ -785,12 +779,6 @@ static void __perf_install_in_context(vo
 	ctx->is_active = 1;
 	update_context_time(ctx);
 
-	/*
-	 * Protect the list operation against NMI by disabling the
-	 * events on a global level. NOP for non NMI based events.
-	 */
-	perf_disable();
-
 	add_event_to_ctx(event, ctx);
 
 	if (event->cpu != -1 && event->cpu != smp_processor_id())
@@ -832,8 +820,6 @@ static void __perf_install_in_context(vo
 		cpuctx->max_pertask--;
 
 unlock:
-	perf_enable();
-
 	raw_spin_unlock(&ctx->lock);
 }
 
@@ -954,12 +940,10 @@ static void __perf_event_enable(void *in
 	if (!group_can_go_on(event, cpuctx, 1)) {
 		err = -EEXIST;
 	} else {
-		perf_disable();
 		if (event == leader)
 			err = group_sched_in(event, cpuctx, ctx);
 		else
 			err = event_sched_in(event, cpuctx, ctx);
-		perf_enable();
 	}
 
 	if (err) {
@@ -1072,9 +1056,8 @@ static void ctx_sched_out(struct perf_ev
 		goto out;
 	update_context_time(ctx);
 
-	perf_disable();
 	if (!ctx->nr_active)
-		goto out_enable;
+		goto out;
 
 	if (event_type & EVENT_PINNED) {
 		list_for_each_entry(event, &ctx->pinned_groups, group_entry)
@@ -1085,9 +1068,6 @@ static void ctx_sched_out(struct perf_ev
 		list_for_each_entry(event, &ctx->flexible_groups, group_entry)
 			group_sched_out(event, cpuctx, ctx);
 	}
-
- out_enable:
-	perf_enable();
 out:
 	raw_spin_unlock(&ctx->lock);
 }
@@ -1346,8 +1326,6 @@ ctx_sched_in(struct perf_event_context *
 
 	ctx->timestamp = perf_clock();
 
-	perf_disable();
-
 	/*
 	 * First go through the list and put on any pinned groups
 	 * in order to give them the best chance of going on.
@@ -1359,7 +1337,6 @@ ctx_sched_in(struct perf_event_context *
 	if (event_type & EVENT_FLEXIBLE)
 		ctx_flexible_sched_in(ctx, cpuctx);
 
-	perf_enable();
 out:
 	raw_spin_unlock(&ctx->lock);
 }
@@ -1407,8 +1384,6 @@ void perf_event_task_sched_in(struct tas
 	if (cpuctx->task_ctx == ctx)
 		return;
 
-	perf_disable();
-
 	/*
 	 * We want to keep the following priority order:
 	 * cpu pinned (that don't need to move), task pinned,
@@ -1421,8 +1396,6 @@ void perf_event_task_sched_in(struct tas
 	ctx_sched_in(ctx, cpuctx, EVENT_FLEXIBLE);
 
 	cpuctx->task_ctx = ctx;
-
-	perf_enable();
 }
 
 #define MAX_INTERRUPTS (~0ULL)
@@ -1537,11 +1510,9 @@ static void perf_adjust_period(struct pe
 	hwc->sample_period = sample_period;
 
 	if (local64_read(&hwc->period_left) > 8*sample_period) {
-		perf_disable();
 		perf_event_stop(event);
 		local64_set(&hwc->period_left, 0);
 		perf_event_start(event);
-		perf_enable();
 	}
 }
 
@@ -1570,15 +1541,12 @@ static void perf_ctx_adjust_freq(struct 
 		 */
 		if (interrupts == MAX_INTERRUPTS) {
 			perf_log_throttle(event, 1);
-			perf_disable();
 			event->pmu->unthrottle(event);
-			perf_enable();
 		}
 
 		if (!event->attr.freq || !event->attr.sample_freq)
 			continue;
 
-		perf_disable();
 		event->pmu->read(event);
 		now = local64_read(&event->count);
 		delta = now - hwc->freq_count_stamp;
@@ -1586,7 +1554,6 @@ static void perf_ctx_adjust_freq(struct 
 
 		if (delta > 0)
 			perf_adjust_period(event, TICK_NSEC, delta);
-		perf_enable();
 	}
 	raw_spin_unlock(&ctx->lock);
 }
@@ -1629,7 +1596,6 @@ void perf_event_task_tick(struct task_st
 	if (!rotate)
 		return;
 
-	perf_disable();
 	cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
 	if (ctx)
 		task_ctx_sched_out(ctx, EVENT_FLEXIBLE);
@@ -1641,7 +1607,6 @@ void perf_event_task_tick(struct task_st
 	cpu_ctx_sched_in(cpuctx, EVENT_FLEXIBLE);
 	if (ctx)
 		task_ctx_sched_in(curr, EVENT_FLEXIBLE);
-	perf_enable();
 }
 
 static int event_enable_on_exec(struct perf_event *event,
Index: linux-2.6/arch/powerpc/kernel/perf_event_fsl_emb.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/perf_event_fsl_emb.c
+++ linux-2.6/arch/powerpc/kernel/perf_event_fsl_emb.c
@@ -262,7 +262,7 @@ static int collect_events(struct perf_ev
 	return n;
 }
 
-/* perf must be disabled, context locked on entry */
+/* context locked on entry */
 static int fsl_emb_pmu_enable(struct perf_event *event)
 {
 	struct cpu_hw_events *cpuhw;
@@ -271,6 +271,7 @@ static int fsl_emb_pmu_enable(struct per
 	u64 val;
 	int i;
 
+	perf_disable();
 	cpuhw = &get_cpu_var(cpu_hw_events);
 
 	if (event->hw.config & FSL_EMB_EVENT_RESTRICTED)
@@ -310,15 +311,17 @@ static int fsl_emb_pmu_enable(struct per
 	ret = 0;
  out:
 	put_cpu_var(cpu_hw_events);
+	perf_enable();
 	return ret;
 }
 
-/* perf must be disabled, context locked on entry */
+/* context locked on entry */
 static void fsl_emb_pmu_disable(struct perf_event *event)
 {
 	struct cpu_hw_events *cpuhw;
 	int i = event->hw.idx;
 
+	perf_disable();
 	if (i < 0)
 		goto out;
 
@@ -346,6 +349,7 @@ static void fsl_emb_pmu_disable(struct p
 	cpuhw->n_events--;
 
  out:
+	perf_enable();
 	put_cpu_var(cpu_hw_events);
 }
 
Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -969,10 +969,11 @@ static int x86_pmu_enable(struct perf_ev
 
 	hwc = &event->hw;
 
+	perf_disable();
 	n0 = cpuc->n_events;
-	n = collect_events(cpuc, event, false);
-	if (n < 0)
-		return n;
+	ret = n = collect_events(cpuc, event, false);
+	if (ret < 0)
+		goto out;
 
 	/*
 	 * If group events scheduling transaction was started,
@@ -980,23 +981,26 @@ static int x86_pmu_enable(struct perf_ev
 	 * at commit time(->commit_txn) as a whole
 	 */
 	if (cpuc->group_flag & PERF_EVENT_TXN)
-		goto out;
+		goto done_collect;
 
 	ret = x86_pmu.schedule_events(cpuc, n, assign);
 	if (ret)
-		return ret;
+		goto out;
 	/*
 	 * copy new assignment, now we know it is possible
 	 * will be used by hw_perf_enable()
 	 */
 	memcpy(cpuc->assign, assign, n*sizeof(int));
 
-out:
+done_collect:
 	cpuc->n_events = n;
 	cpuc->n_added += n - n0;
 	cpuc->n_txn += n - n0;
 
-	return 0;
+	ret = 0;
+out:
+	perf_enable();
+	return ret;
 }
 
 static int x86_pmu_start(struct perf_event *event)
@@ -1399,6 +1403,7 @@ static void x86_pmu_start_txn(struct pmu
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 
+	perf_disable();
 	cpuc->group_flag |= PERF_EVENT_TXN;
 	cpuc->n_txn = 0;
 }
@@ -1418,6 +1423,7 @@ static void x86_pmu_cancel_txn(struct pm
 	 */
 	cpuc->n_added -= cpuc->n_txn;
 	cpuc->n_events -= cpuc->n_txn;
+	perf_enable();
 }
 
 /*
@@ -1447,7 +1453,7 @@ static int x86_pmu_commit_txn(struct pmu
 	memcpy(cpuc->assign, assign, n*sizeof(int));
 
 	cpuc->group_flag &= ~PERF_EVENT_TXN;
-
+	perf_enable();
 	return 0;
 }
 
Index: linux-2.6/arch/powerpc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/perf_event.c
+++ linux-2.6/arch/powerpc/kernel/perf_event.c
@@ -858,6 +858,7 @@ void power_pmu_start_txn(struct pmu *pmu
 {
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
+	perf_disable();
 	cpuhw->group_flag |= PERF_EVENT_TXN;
 	cpuhw->n_txn_start = cpuhw->n_events;
 }
@@ -872,6 +873,7 @@ void power_pmu_cancel_txn(struct pmu *pm
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
 	cpuhw->group_flag &= ~PERF_EVENT_TXN;
+	perf_enable();
 }
 
 /*
@@ -898,6 +900,7 @@ int power_pmu_commit_txn(struct pmu *pmu
 		cpuhw->event[i]->hw.config = cpuhw->events[i];
 
 	cpuhw->group_flag &= ~PERF_EVENT_TXN;
+	perf_enable();
 	return 0;
 }
 
Index: linux-2.6/arch/sparc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/perf_event.c
+++ linux-2.6/arch/sparc/kernel/perf_event.c
@@ -1112,6 +1112,7 @@ static void sparc_pmu_start_txn(struct p
 {
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
+	perf_disable();
 	cpuhw->group_flag |= PERF_EVENT_TXN;
 }
 
@@ -1125,6 +1126,7 @@ static void sparc_pmu_cancel_txn(struct 
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
 	cpuhw->group_flag &= ~PERF_EVENT_TXN;
+	perf_enable();
 }
 
 /*
@@ -1148,6 +1150,7 @@ static int sparc_pmu_commit_txn(struct p
 		return -EAGAIN;
 
 	cpuc->group_flag &= ~PERF_EVENT_TXN;
+	perf_enable();
 	return 0;
 }
 



^ permalink raw reply	[flat|nested] 56+ messages in thread

* [RFC][PATCH 08/11] perf: Per PMU disable
  2010-06-24 14:28 [RFC][PATCH 00/11] perf pmu interface -v2 Peter Zijlstra
                   ` (6 preceding siblings ...)
  2010-06-24 14:28 ` [RFC][PATCH 07/11] perf: Reduce perf_disable() usage Peter Zijlstra
@ 2010-06-24 14:28 ` Peter Zijlstra
  2010-07-09  7:31   ` Paul Mackerras
  2010-06-24 14:28 ` [RFC][PATCH 09/11] perf: Default PMU ops Peter Zijlstra
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-24 14:28 UTC (permalink / raw)
  To: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: perf-pmu-disable.patch --]
[-- Type: text/plain, Size: 15794 bytes --]

Changes perf_disable() into perf_pmu_disable().

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/arm/kernel/perf_event.c             |   24 ++++++++++-----------
 arch/powerpc/kernel/perf_event.c         |   26 ++++++++++++-----------
 arch/powerpc/kernel/perf_event_fsl_emb.c |   18 +++++++++-------
 arch/sh/kernel/perf_event.c              |   34 ++++++++++++++++---------------
 arch/sparc/kernel/perf_event.c           |   20 ++++++++++--------
 arch/x86/kernel/cpu/perf_event.c         |   16 ++++++++------
 include/linux/perf_event.h               |   13 ++++++-----
 kernel/perf_event.c                      |   30 ++++++++++++++++-----------
 8 files changed, 99 insertions(+), 82 deletions(-)

Index: linux-2.6/arch/arm/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/arm/kernel/perf_event.c
+++ linux-2.6/arch/arm/kernel/perf_event.c
@@ -529,16 +529,7 @@ static int armpmu_event_init(struct perf
 	return err;
 }
 
-static struct pmu pmu = {
-	.event_init = armpmu_event_init,
-	.enable	    = armpmu_enable,
-	.disable    = armpmu_disable,
-	.unthrottle = armpmu_unthrottle,
-	.read	    = armpmu_read,
-};
-
-void
-hw_perf_enable(void)
+static void armpmu_pmu_enable(struct pmu *pmu)
 {
 	/* Enable all of the perf events on hardware. */
 	int idx;
@@ -559,13 +550,22 @@ hw_perf_enable(void)
 	armpmu->start();
 }
 
-void
-hw_perf_disable(void)
+static void armpmu_pmu_disable(struct pmu *pmu)
 {
 	if (armpmu)
 		armpmu->stop();
 }
 
+static struct pmu pmu = {
+	.pmu_enable = armpmu_pmu_enable,
+	.pmu_disable= armpmu_pmu_disable,
+	.event_init = armpmu_event_init,
+	.enable	    = armpmu_enable,
+	.disable    = armpmu_disable,
+	.unthrottle = armpmu_unthrottle,
+	.read	    = armpmu_read,
+};
+
 /*
  * ARMv6 Performance counter handling code.
  *
Index: linux-2.6/arch/powerpc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/perf_event.c
+++ linux-2.6/arch/powerpc/kernel/perf_event.c
@@ -517,7 +517,7 @@ static void write_mmcr0(struct cpu_hw_ev
  * Disable all events to prevent PMU interrupts and to allow
  * events to be added or removed.
  */
-void hw_perf_disable(void)
+static void powerpc_pmu_pmu_disable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw;
 	unsigned long flags;
@@ -565,7 +565,7 @@ void hw_perf_disable(void)
  * If we were previously disabled and events were added, then
  * put the new config on the PMU.
  */
-void hw_perf_enable(void)
+static void powerpc_pmu_pmu_enable(struct pmu *pmu)
 {
 	struct perf_event *event;
 	struct cpu_hw_events *cpuhw;
@@ -735,7 +735,7 @@ static int power_pmu_enable(struct perf_
 	int ret = -EAGAIN;
 
 	local_irq_save(flags);
-	perf_disable();
+	perf_pmu_disable(event->pmu);
 
 	/*
 	 * Add the event to the list (if there is room)
@@ -769,7 +769,7 @@ nocheck:
 
 	ret = 0;
  out:
-	perf_enable();
+	perf_pmu_enable(event->pmu);
 	local_irq_restore(flags);
 	return ret;
 }
@@ -784,7 +784,7 @@ static void power_pmu_disable(struct per
 	unsigned long flags;
 
 	local_irq_save(flags);
-	perf_disable();
+	perf_pmu_disable(event->pmu);
 
 	power_pmu_read(event);
 
@@ -818,7 +818,7 @@ static void power_pmu_disable(struct per
 		cpuhw->mmcr[0] &= ~(MMCR0_PMXE | MMCR0_FCECE);
 	}
 
-	perf_enable();
+	perf_pmu_enable(event->pmu);
 	local_irq_restore(flags);
 }
 
@@ -834,7 +834,7 @@ static void power_pmu_unthrottle(struct 
 	if (!event->hw.idx || !event->hw.sample_period)
 		return;
 	local_irq_save(flags);
-	perf_disable();
+	perf_pmu_disable(event->pmu);
 	power_pmu_read(event);
 	left = event->hw.sample_period;
 	event->hw.last_period = left;
@@ -845,7 +845,7 @@ static void power_pmu_unthrottle(struct 
 	local64_set(&event->hw.prev_count, val);
 	local64_set(&event->hw.period_left, left);
 	perf_event_update_userpage(event);
-	perf_enable();
+	perf_pmu_enable(event->pmu);
 	local_irq_restore(flags);
 }
 
@@ -858,7 +858,7 @@ void power_pmu_start_txn(struct pmu *pmu
 {
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
-	perf_disable();
+	perf_pmu_disable(pmu);
 	cpuhw->group_flag |= PERF_EVENT_TXN;
 	cpuhw->n_txn_start = cpuhw->n_events;
 }
@@ -873,7 +873,7 @@ void power_pmu_cancel_txn(struct pmu *pm
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
 	cpuhw->group_flag &= ~PERF_EVENT_TXN;
-	perf_enable();
+	perf_pmu_enable(pmu);
 }
 
 /*
@@ -900,7 +900,7 @@ int power_pmu_commit_txn(struct pmu *pmu
 		cpuhw->event[i]->hw.config = cpuhw->events[i];
 
 	cpuhw->group_flag &= ~PERF_EVENT_TXN;
-	perf_enable();
+	perf_pmu_enable(pmu);
 	return 0;
 }
 
@@ -1128,7 +1128,9 @@ static int power_pmu_event_init(struct p
 }
 
 struct pmu power_pmu = {
-	.event_init	= power_pmu_event_init,
+	.pmu_enable	= power_pmu_pmu_enable,
+	.pmu_disable	= power_pmu_pmu_disable,
+	.event_init	= pmwer_pmu_event_init,
 	.enable		= power_pmu_enable,
 	.disable	= power_pmu_disable,
 	.read		= power_pmu_read,
Index: linux-2.6/arch/powerpc/kernel/perf_event_fsl_emb.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/perf_event_fsl_emb.c
+++ linux-2.6/arch/powerpc/kernel/perf_event_fsl_emb.c
@@ -177,7 +177,7 @@ static void fsl_emb_pmu_read(struct perf
  * Disable all events to prevent PMU interrupts and to allow
  * events to be added or removed.
  */
-void hw_perf_disable(void)
+static void fsl_emb_pmu_pmu_disable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw;
 	unsigned long flags;
@@ -216,7 +216,7 @@ void hw_perf_disable(void)
  * If we were previously disabled and events were added, then
  * put the new config on the PMU.
  */
-void hw_perf_enable(void)
+static void fsl_emb_pmu_pmu_enable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw;
 	unsigned long flags;
@@ -271,7 +271,7 @@ static int fsl_emb_pmu_enable(struct per
 	u64 val;
 	int i;
 
-	perf_disable();
+	perf_pmu_disable(event->pmu);
 	cpuhw = &get_cpu_var(cpu_hw_events);
 
 	if (event->hw.config & FSL_EMB_EVENT_RESTRICTED)
@@ -311,7 +311,7 @@ static int fsl_emb_pmu_enable(struct per
 	ret = 0;
  out:
 	put_cpu_var(cpu_hw_events);
-	perf_enable();
+	perf_pmu_enable(event->pmu);
 	return ret;
 }
 
@@ -321,7 +321,7 @@ static void fsl_emb_pmu_disable(struct p
 	struct cpu_hw_events *cpuhw;
 	int i = event->hw.idx;
 
-	perf_disable();
+	perf_pmu_disable(event->pmu);
 	if (i < 0)
 		goto out;
 
@@ -349,7 +349,7 @@ static void fsl_emb_pmu_disable(struct p
 	cpuhw->n_events--;
 
  out:
-	perf_enable();
+	perf_pmu_enable(event->pmu);
 	put_cpu_var(cpu_hw_events);
 }
 
@@ -367,7 +367,7 @@ static void fsl_emb_pmu_unthrottle(struc
 	if (event->hw.idx < 0 || !event->hw.sample_period)
 		return;
 	local_irq_save(flags);
-	perf_disable();
+	perf_pmu_disable(event->pmu);
 	fsl_emb_pmu_read(event);
 	left = event->hw.sample_period;
 	event->hw.last_period = left;
@@ -378,7 +378,7 @@ static void fsl_emb_pmu_unthrottle(struc
 	atomic64_set(&event->hw.prev_count, val);
 	atomic64_set(&event->hw.period_left, left);
 	perf_event_update_userpage(event);
-	perf_enable();
+	perf_pmu_enable(event->pmu);
 	local_irq_restore(flags);
 }
 
@@ -524,6 +524,8 @@ static int fsl_emb_pmu_event_init(struct
 }
 
 static struct pmu fsl_emb_pmu = {
+	.pmu_enable	= fsl_emb_pmu_pmu_enable,
+	.pmu_disable	= fsl_emb_pmu_pmu_disable,
 	.event_init	= fsl_emb_pmu_event_init,
 	.enable		= fsl_emb_pmu_enable,
 	.disable	= fsl_emb_pmu_disable,
Index: linux-2.6/arch/sh/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/sh/kernel/perf_event.c
+++ linux-2.6/arch/sh/kernel/perf_event.c
@@ -268,7 +268,25 @@ static in sh_pmu_event_init(struct perf_
 	return err;
 }
 
+static void sh_pmu_pmu_enable(struct pmu *pmu)
+{
+	if (!sh_pmu_initialized())
+		return;
+
+	sh_pmu->enable_all();
+}
+
+static void sh_pmu_pmu_disable(struct pmu *pmu)
+{
+	if (!sh_pmu_initialized())
+		return;
+
+	sh_pmu->disable_all();
+}
+
 static struct pmu pmu = {
+	.pmu_enable	= sh_pmu_pmu_enable,
+	.pmu_disable	= sh_pmu_pmu_disable,
 	.event_init	= sh_pmu_event_init,
 	.enable		= sh_pmu_enable,
 	.disable	= sh_pmu_disable,
@@ -299,22 +317,6 @@ sh_pmu_notifier(struct notifier_block *s
 	return NOTIFY_OK;
 }
 
-void hw_perf_enable(void)
-{
-	if (!sh_pmu_initialized())
-		return;
-
-	sh_pmu->enable_all();
-}
-
-void hw_perf_disable(void)
-{
-	if (!sh_pmu_initialized())
-		return;
-
-	sh_pmu->disable_all();
-}
-
 int __cpuinit register_sh_pmu(struct sh_pmu *pmu)
 {
 	if (sh_pmu)
Index: linux-2.6/arch/sparc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/perf_event.c
+++ linux-2.6/arch/sparc/kernel/perf_event.c
@@ -663,7 +663,7 @@ out:
 	return pcr;
 }
 
-void hw_perf_enable(void)
+static void sparc_pmu_pmu_enable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	u64 pcr;
@@ -690,7 +690,7 @@ void hw_perf_enable(void)
 	pcr_ops->write(cpuc->pcr);
 }
 
-void hw_perf_disable(void)
+static void sparc_pmu_pmu_disable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	u64 val;
@@ -717,7 +717,7 @@ static void sparc_pmu_disable(struct per
 	int i;
 
 	local_irq_save(flags);
-	perf_disable();
+	perf_pmu_disable(event->pmu);
 
 	for (i = 0; i < cpuc->n_events; i++) {
 		if (event == cpuc->event[i]) {
@@ -747,7 +747,7 @@ static void sparc_pmu_disable(struct per
 		}
 	}
 
-	perf_enable();
+	perf_pmu_enable(event->pmu);
 	local_irq_restore(flags);
 }
 
@@ -990,7 +990,7 @@ static int sparc_pmu_enable(struct perf_
 	unsigned long flags;
 
 	local_irq_save(flags);
-	perf_disable();
+	perf_pmu_disable(event->pmu);
 
 	n0 = cpuc->n_events;
 	if (n0 >= perf_max_events)
@@ -1019,7 +1019,7 @@ nocheck:
 
 	ret = 0;
 out:
-	perf_enable();
+	perf_pmu_enable(event->pmu);
 	local_irq_restore(flags);
 	return ret;
 }
@@ -1112,7 +1112,7 @@ static void sparc_pmu_start_txn(struct p
 {
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
-	perf_disable();
+	perf_pmu_disable(pmu);
 	cpuhw->group_flag |= PERF_EVENT_TXN;
 }
 
@@ -1126,7 +1126,7 @@ static void sparc_pmu_cancel_txn(struct 
 	struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
 
 	cpuhw->group_flag &= ~PERF_EVENT_TXN;
-	perf_enable();
+	perf_pmu_enable(pmu);
 }
 
 /*
@@ -1150,11 +1150,13 @@ static int sparc_pmu_commit_txn(struct p
 		return -EAGAIN;
 
 	cpuc->group_flag &= ~PERF_EVENT_TXN;
-	perf_enable();
+	perf_pmu_enable(pmu);
 	return 0;
 }
 
 static struct pmu pmu = {
+	.pmu_enable	= sparc_pmu_pmu_enable,
+	.pmu_disable	= sparc_pmu_pmu_disable,
 	.event_init	= sparc_pmu_event_init,
 	.enable		= sparc_pmu_enable,
 	.disable	= sparc_pmu_disable,
Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -583,7 +583,7 @@ static void x86_pmu_disable_all(void)
 	}
 }
 
-void hw_perf_disable(void)
+static void x86_pmu_pmu_disable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 
@@ -803,7 +803,7 @@ static inline int match_prev_assignment(
 static int x86_pmu_start(struct perf_event *event);
 static void x86_pmu_stop(struct perf_event *event);
 
-void hw_perf_enable(void)
+static void x86_pmu_pmu_enable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	struct perf_event *event;
@@ -969,7 +969,7 @@ static int x86_pmu_enable(struct perf_ev
 
 	hwc = &event->hw;
 
-	perf_disable();
+	perf_pmu_disable(event->pmu);
 	n0 = cpuc->n_events;
 	ret = n = collect_events(cpuc, event, false);
 	if (ret < 0)
@@ -999,7 +999,7 @@ done_collect:
 
 	ret = 0;
 out:
-	perf_enable();
+	perf_pmu_enable(event->pmu);
 	return ret;
 }
 
@@ -1403,7 +1403,7 @@ static void x86_pmu_start_txn(struct pmu
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 
-	perf_disable();
+	perf_pmu_disable(pmu);
 	cpuc->group_flag |= PERF_EVENT_TXN;
 	cpuc->n_txn = 0;
 }
@@ -1423,7 +1423,7 @@ static void x86_pmu_cancel_txn(struct pm
 	 */
 	cpuc->n_added -= cpuc->n_txn;
 	cpuc->n_events -= cpuc->n_txn;
-	perf_enable();
+	perf_pmu_enable(pmu);
 }
 
 /*
@@ -1453,7 +1453,7 @@ static int x86_pmu_commit_txn(struct pmu
 	memcpy(cpuc->assign, assign, n*sizeof(int));
 
 	cpuc->group_flag &= ~PERF_EVENT_TXN;
-	perf_enable();
+	perf_pmu_enable(pmu);
 	return 0;
 }
 
@@ -1572,6 +1572,8 @@ int x86_pmu_event_init(struct perf_event
 }
 
 static struct pmu pmu = {
+	.pmu_enable	= x86_pmu_pmu_enable,
+	.pmu_disable	= x86_pmu_pmu_disable,
 	.event_init	= x86_pmu_event_init,
 	.enable		= x86_pmu_enable,
 	.disable	= x86_pmu_disable,
Index: linux-2.6/include/linux/perf_event.h
===================================================================
--- linux-2.6.orig/include/linux/perf_event.h
+++ linux-2.6/include/linux/perf_event.h
@@ -561,6 +561,11 @@ struct perf_event;
 struct pmu {
 	struct list_head		entry;
 
+	int				*pmu_disable_count;
+
+	void (*pmu_enable)		(struct pmu *pmu);
+	void (*pmu_disable)		(struct pmu *pmu);
+
 	/*
 	 * Should return -ENOENT when the @event doesn't match this PMU.
 	 */
@@ -864,10 +869,8 @@ extern void perf_event_free_task(struct 
 extern void set_perf_event_pending(void);
 extern void perf_event_do_pending(void);
 extern void perf_event_print_debug(void);
-extern void __perf_disable(void);
-extern bool __perf_enable(void);
-extern void perf_disable(void);
-extern void perf_enable(void);
+extern void perf_pmu_disable(struct pmu *pmu);
+extern void perf_pmu_enable(struct pmu *pmu);
 extern int perf_event_task_disable(void);
 extern int perf_event_task_enable(void);
 extern void perf_event_update_userpage(struct perf_event *event);
@@ -1038,8 +1041,6 @@ static inline void perf_event_exit_task(
 static inline void perf_event_free_task(struct task_struct *task)	{ }
 static inline void perf_event_do_pending(void)				{ }
 static inline void perf_event_print_debug(void)				{ }
-static inline void perf_disable(void)					{ }
-static inline void perf_enable(void)					{ }
 static inline int perf_event_task_disable(void)				{ return -EINVAL; }
 static inline int perf_event_task_enable(void)				{ return -EINVAL; }
 
Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -71,23 +71,20 @@ static atomic64_t perf_event_id;
  */
 static DEFINE_SPINLOCK(perf_resource_lock);
 
-void __weak hw_perf_disable(void)		{ barrier(); }
-void __weak hw_perf_enable(void)		{ barrier(); }
-
 void __weak perf_event_print_debug(void)	{ }
 
-static DEFINE_PER_CPU(int, perf_disable_count);
-
-void perf_disable(void)
+void perf_pmu_disable(struct pmu *pmu)
 {
-	if (!__get_cpu_var(perf_disable_count)++)
-		hw_perf_disable();
+	int *count = this_cpu_ptr(pmu->pmu_disable_count);
+	if (!(*count)++)
+		pmu->pmu_disable(pmu);
 }
 
-void perf_enable(void)
+void perf_pmu_enable(struct pmu *pmu)
 {
-	if (!--__get_cpu_var(perf_disable_count))
-		hw_perf_enable();
+	int *count = this_cpu_ptr(pmu->pmu_disable_count);
+	if (!--(*count))
+		pmu->pmu_enable(pmu);
 }
 
 static void get_ctx(struct perf_event_context *ctx)
@@ -4760,16 +4757,25 @@ static struct srcu_struct pmus_srcu;
 
 int perf_pmu_register(struct pmu *pmu)
 {
+	int ret;
+
 	mutex_lock(&pmus_lock);
+	ret = -ENOMEM;
+	pmu->pmu_disable_count = alloc_percpu(int);
+	if (!pmu->pmu_disable_count)
+		goto unlock;
 	list_add_rcu(&pmu->entry, &pmus);
+	ret = 0;
+unlock:
 	mutex_unlock(&pmus_lock);
 
-	return 0;
+	return ret;
 }
 
 void perf_pmu_unregister(struct pmu *pmu)
 {
 	mutex_lock(&pmus_lock);
+	free_percpu(pmu->pmu_disable_count);
 	list_del_rcu(&pmu->entry);
 	mutex_unlock(&pmus_lock);
 



^ permalink raw reply	[flat|nested] 56+ messages in thread

* [RFC][PATCH 09/11] perf: Default PMU ops
  2010-06-24 14:28 [RFC][PATCH 00/11] perf pmu interface -v2 Peter Zijlstra
                   ` (7 preceding siblings ...)
  2010-06-24 14:28 ` [RFC][PATCH 08/11] perf: Per PMU disable Peter Zijlstra
@ 2010-06-24 14:28 ` Peter Zijlstra
  2010-06-29 14:49   ` Frederic Weisbecker
  2010-06-29 14:58   ` Frederic Weisbecker
  2010-06-24 14:28 ` [RFC][PATCH 10/11] perf: Shrink hw_perf_event Peter Zijlstra
                   ` (4 subsequent siblings)
  13 siblings, 2 replies; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-24 14:28 UTC (permalink / raw)
  To: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: perf-default-ops.patch --]
[-- Type: text/plain, Size: 2292 bytes --]

Provide default implementations for the pmu txn methods, this allows
us to remove some conditional code.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/perf_event.c |   48 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 12 deletions(-)

Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -656,21 +656,14 @@ group_sched_in(struct perf_event *group_
 {
 	struct perf_event *event, *partial_group = NULL;
 	struct pmu *pmu = group_event->pmu;
-	bool txn = false;
 
 	if (group_event->state == PERF_EVENT_STATE_OFF)
 		return 0;
 
-	/* Check if group transaction availabe */
-	if (pmu->start_txn)
-		txn = true;
-
-	if (txn)
-		pmu->start_txn(pmu);
+	pmu->start_txn(pmu);
 
 	if (event_sched_in(group_event, cpuctx, ctx)) {
-		if (txn)
-			pmu->cancel_txn(pmu);
+		pmu->cancel_txn(pmu);
 		return -EAGAIN;
 	}
 
@@ -684,7 +677,7 @@ group_sched_in(struct perf_event *group_
 		}
 	}
 
-	if (!txn || !pmu->commit_txn(pmu))
+	if (!pmu->commit_txn(pmu))
 		return 0;
 
 group_error:
@@ -699,8 +692,7 @@ group_error:
 	}
 	event_sched_out(group_event, cpuctx, ctx);
 
-	if (txn)
-		pmu->cancel_txn(pmu);
+	pmu->cancel_txn(pmu);
 
 	return -EAGAIN;
 }
@@ -4755,6 +4747,26 @@ static struct list_head pmus;
 static DEFINE_MUTEX(pmus_lock);
 static struct srcu_struct pmus_srcu;
 
+static void perf_pmu_nop(struct pmu *pmu)
+{
+}
+
+static void perf_pmu_start_txn(struct pmu *pmu)
+{
+	perf_pmu_disable(pmu);
+}
+
+static int perf_pmu_commit_txn(struct pmu *pmu)
+{
+	perf_pmu_enable(pmu);
+	return 0;
+}
+
+static void perf_pmu_cancel_txn(struct pmu *pmu)
+{
+	perf_pmu_enable(pmu);
+}
+
 int perf_pmu_register(struct pmu *pmu)
 {
 	int ret;
@@ -4764,6 +4776,18 @@ int perf_pmu_register(struct pmu *pmu)
 	pmu->pmu_disable_count = alloc_percpu(int);
 	if (!pmu->pmu_disable_count)
 		goto unlock;
+
+	if (!pmu->pmu_enable) {
+		pmu->pmu_enable  = perf_pmu_nop;
+		pmu->pmu_disable = perf_pmu_nop;
+	}
+
+	if (!pmu->start_txn) {
+		pmu->start_txn  = perf_pmu_start_txn;
+		pmu->commit_txn = perf_pmu_commit_txn;
+		pmu->cancel_txn = perf_pmu_cancel_txn;
+	}
+
 	list_add_rcu(&pmu->entry, &pmus);
 	ret = 0;
 unlock:



^ permalink raw reply	[flat|nested] 56+ messages in thread

* [RFC][PATCH 10/11] perf: Shrink hw_perf_event
  2010-06-24 14:28 [RFC][PATCH 00/11] perf pmu interface -v2 Peter Zijlstra
                   ` (8 preceding siblings ...)
  2010-06-24 14:28 ` [RFC][PATCH 09/11] perf: Default PMU ops Peter Zijlstra
@ 2010-06-24 14:28 ` Peter Zijlstra
  2010-06-29 15:06   ` Frederic Weisbecker
  2010-06-24 14:28 ` [RFC][PATCH 11/11] perf: Rework the PMU methods Peter Zijlstra
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-24 14:28 UTC (permalink / raw)
  To: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: perf-swtimer-period_left.patch --]
[-- Type: text/plain, Size: 1701 bytes --]

Use hw_perf_event::period_left instead of hw_perf_event::remaning and
win back 8 bytes.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/linux/perf_event.h |    1 -
 kernel/perf_event.c        |   13 ++++++-------
 2 files changed, 6 insertions(+), 8 deletions(-)

Index: linux-2.6/include/linux/perf_event.h
===================================================================
--- linux-2.6.orig/include/linux/perf_event.h
+++ linux-2.6/include/linux/perf_event.h
@@ -529,7 +529,6 @@ struct hw_perf_event {
 			int		last_cpu;
 		};
 		struct { /* software */
-			s64		remaining;
 			struct hrtimer	hrtimer;
 		};
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -4590,14 +4590,13 @@ static void perf_swevent_start_hrtimer(s
 	hrtimer_init(&hwc->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	hwc->hrtimer.function = perf_swevent_hrtimer;
 	if (hwc->sample_period) {
-		u64 period;
+		s64 period = local64_read(&hwc->period_left);
 
-		if (hwc->remaining) {
-			if (hwc->remaining < 0)
+		if (period) {
+			if (period < 0)
 				period = 10000;
-			else
-				period = hwc->remaining;
-			hwc->remaining = 0;
+
+			local64_set(&hwc->period_left, 0);
 		} else {
 			period = max_t(u64, 10000, hwc->sample_period);
 		}
@@ -4613,7 +4612,7 @@ static void perf_swevent_cancel_hrtimer(
 
 	if (hwc->sample_period) {
 		ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
-		hwc->remaining = ktime_to_ns(remaining);
+		local64_set(&hwc->period_left, ktime_to_ns(remaining));
 
 		hrtimer_cancel(&hwc->hrtimer);
 	}



^ permalink raw reply	[flat|nested] 56+ messages in thread

* [RFC][PATCH 11/11] perf: Rework the PMU methods
  2010-06-24 14:28 [RFC][PATCH 00/11] perf pmu interface -v2 Peter Zijlstra
                   ` (9 preceding siblings ...)
  2010-06-24 14:28 ` [RFC][PATCH 10/11] perf: Shrink hw_perf_event Peter Zijlstra
@ 2010-06-24 14:28 ` Peter Zijlstra
  2010-06-29 15:37   ` Frederic Weisbecker
  2010-06-25 11:11 ` [RFC][PATCH 00/11] perf pmu interface -v2 Will Deacon
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-24 14:28 UTC (permalink / raw)
  To: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller
  Cc: linux-kernel, Peter Zijlstra

[-- Attachment #1: perf-change-ops.patch --]
[-- Type: text/plain, Size: 47388 bytes --]

Replace pmu::{enable,disable,start,stop,unthrottle} with
pmu::{add,del,start,stop}, all of which take a flags argument.

The new interface extends the capability to stop a counter while
keeping it scheduled on the PMU. We replace the throttled state with
the generic stopped state.

This also allows us to efficiently stop/start counters over certain
code paths (like IRQ handlers).

It also allows scheduling a counter without it starting, allowing for
a generic frozen state (useful for rotating stopped counters).

The stopped state is implemented in two different ways, depending on
how the architecture implemented the throttled state:

 1) We disable the counter:
    a) the pmu has per-counter enable bits, we flip that
    b) we program a NOP event, preserving the counter state

 2) We store the counter state and ignore all read/overflow events

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/arm/kernel/perf_event.c              |   64 +++++++++----
 arch/powerpc/kernel/perf_event.c          |  107 ++++++++++++++--------
 arch/powerpc/kernel/perf_event_fsl_emb.c  |  113 ++++++++++++++----------
 arch/sh/kernel/perf_event.c               |   68 ++++++++++----
 arch/sparc/kernel/perf_event.c            |  109 ++++++++++++++---------
 arch/x86/kernel/cpu/perf_event.c          |   92 ++++++++++---------
 arch/x86/kernel/cpu/perf_event_intel.c    |    2 
 arch/x86/kernel/cpu/perf_event_intel_ds.c |    2 
 include/linux/ftrace_event.h              |    4 
 include/linux/perf_event.h                |   45 +++++++--
 kernel/hw_breakpoint.c                    |   29 +++++-
 kernel/perf_event.c                       |  140 +++++++++++++++---------------
 kernel/trace/trace_event_perf.c           |    7 +
 13 files changed, 489 insertions(+), 293 deletions(-)

Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -583,7 +583,7 @@ static void x86_pmu_disable_all(void)
 	}
 }
 
-static void x86_pmu_pmu_disable(struct pmu *pmu)
+static void x86_pmu_disable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 
@@ -800,10 +800,10 @@ static inline int match_prev_assignment(
 		hwc->last_tag == cpuc->tags[i];
 }
 
-static int x86_pmu_start(struct perf_event *event);
-static void x86_pmu_stop(struct perf_event *event);
+static void x86_pmu_start(struct perf_event *event, int flags);
+static void x86_pmu_stop(struct perf_event *event, int flags);
 
-static void x86_pmu_pmu_enable(struct pmu *pmu)
+static void x86_pmu_enable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	struct perf_event *event;
@@ -839,7 +839,7 @@ static void x86_pmu_pmu_enable(struct pm
 			    match_prev_assignment(hwc, cpuc, i))
 				continue;
 
-			x86_pmu_stop(event);
+			x86_pmu_stop(event, PERF_EF_UPDATE);
 		}
 
 		for (i = 0; i < cpuc->n_events; i++) {
@@ -851,7 +851,8 @@ static void x86_pmu_pmu_enable(struct pm
 			else if (i < n_running)
 				continue;
 
-			x86_pmu_start(event);
+			if (!(event->hw.state & PERF_HES_STOPPED))
+				x86_pmu_start(event, PERF_EF_RELOAD);
 		}
 		cpuc->n_added = 0;
 		perf_events_lapic_init();
@@ -952,15 +953,12 @@ static void x86_pmu_enable_event(struct 
 }
 
 /*
- * activate a single event
+ * Add a single event to the PMU.
  *
  * The event is added to the group of enabled events
  * but only if it can be scehduled with existing events.
- *
- * Called with PMU disabled. If successful and return value 1,
- * then guaranteed to call perf_enable() and hw_perf_enable()
  */
-static int x86_pmu_enable(struct perf_event *event)
+static int x86_pmu_add(struct perf_event *event, int flags)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	struct hw_perf_event *hwc;
@@ -975,10 +973,14 @@ static int x86_pmu_enable(struct perf_ev
 	if (ret < 0)
 		goto out;
 
+	hwc->state = PERF_HES_UPTODATE;
+	if (!(flags & PERF_EF_START))
+		hwc->state |= PERF_HES_STOPPED;
+
 	/*
 	 * If group events scheduling transaction was started,
 	 * skip the schedulability test here, it will be peformed
-	 * at commit time(->commit_txn) as a whole
+	 * at commit time (->commit_txn) as a whole
 	 */
 	if (cpuc->group_flag & PERF_EVENT_TXN)
 		goto done_collect;
@@ -1003,27 +1005,24 @@ out:
 	return ret;
 }
 
-static int x86_pmu_start(struct perf_event *event)
+static void x86_pmu_start(struct perf_event *event, int flags)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	int idx = event->hw.idx;
 
-	if (idx == -1)
-		return -EAGAIN;
+	if (WARN_ON_ONCE(idx == -1))
+		return;
+
+	if (flags & PERF_EF_RELOAD) {
+		WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE));
+		x86_perf_event_set_period(event);
+	}
 
-	x86_perf_event_set_period(event);
 	cpuc->events[idx] = event;
 	__set_bit(idx, cpuc->active_mask);
+	event->hw.state = 0;
 	x86_pmu.enable(event);
 	perf_event_update_userpage(event);
-
-	return 0;
-}
-
-static void x86_pmu_unthrottle(struct perf_event *event)
-{
-	int ret = x86_pmu_start(event);
-	WARN_ON_ONCE(ret);
 }
 
 void perf_event_print_debug(void)
@@ -1080,27 +1079,28 @@ void perf_event_print_debug(void)
 	local_irq_restore(flags);
 }
 
-static void x86_pmu_stop(struct perf_event *event)
+static void x86_pmu_stop(struct perf_event *event, int flags)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	struct hw_perf_event *hwc = &event->hw;
-	int idx = hwc->idx;
-
-	if (!__test_and_clear_bit(idx, cpuc->active_mask))
-		return;
-
-	x86_pmu.disable(event);
 
-	/*
-	 * Drain the remaining delta count out of a event
-	 * that we are disabling:
-	 */
-	x86_perf_event_update(event);
+	if (!__test_and_clear_bit(hwc->idx, cpuc->active_mask)) {
+		x86_pmu.disable(event);
+		cpuc->events[hwc->idx] = NULL;
+		hwc->state |= PERF_HES_STOPPED;
+	}
 
-	cpuc->events[idx] = NULL;
+	if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
+		/*
+		 * Drain the remaining delta count out of a event
+		 * that we are disabling:
+		 */
+		x86_perf_event_update(event);
+		hwc->state |= PERF_HES_UPTODATE;
+	}
 }
 
-static void x86_pmu_disable(struct perf_event *event)
+static void x86_pmu_del(struct perf_event *event, int flags)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	int i;
@@ -1113,7 +1113,7 @@ static void x86_pmu_disable(struct perf_
 	if (cpuc->group_flag & PERF_EVENT_TXN)
 		return;
 
-	x86_pmu_stop(event);
+	x86_pmu_stop(event, PERF_EF_UPDATE);
 
 	for (i = 0; i < cpuc->n_events; i++) {
 		if (event == cpuc->event_list[i]) {
@@ -1165,7 +1165,7 @@ static int x86_pmu_handle_irq(struct pt_
 			continue;
 
 		if (perf_event_overflow(event, 1, &data, regs))
-			x86_pmu_stop(event);
+			x86_pmu_stop(event, 0);
 	}
 
 	if (handled)
@@ -1572,15 +1572,17 @@ int x86_pmu_event_init(struct perf_event
 }
 
 static struct pmu pmu = {
-	.pmu_enable	= x86_pmu_pmu_enable,
-	.pmu_disable	= x86_pmu_pmu_disable,
+	.pmu_enable	= x86_pmu_enable,
+	.pmu_disable	= x86_pmu_disable,
+
 	.event_init	= x86_pmu_event_init,
-	.enable		= x86_pmu_enable,
-	.disable	= x86_pmu_disable,
+
+	.add		= x86_pmu_add,
+	.del		= x86_pmu_del,
 	.start		= x86_pmu_start,
 	.stop		= x86_pmu_stop,
 	.read		= x86_pmu_read,
-	.unthrottle	= x86_pmu_unthrottle,
+
 	.start_txn	= x86_pmu_start_txn,
 	.cancel_txn	= x86_pmu_cancel_txn,
 	.commit_txn	= x86_pmu_commit_txn,
Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -404,7 +404,7 @@ event_sched_out(struct perf_event *event
 		event->state = PERF_EVENT_STATE_OFF;
 	}
 	event->tstamp_stopped = ctx->time;
-	event->pmu->disable(event);
+	event->pmu->del(event, 0);
 	event->oncpu = -1;
 
 	if (!is_software_event(event))
@@ -631,7 +631,7 @@ event_sched_in(struct perf_event *event,
 	 */
 	smp_wmb();
 
-	if (event->pmu->enable(event)) {
+	if (event->pmu->add(event, PERF_EF_START)) {
 		event->state = PERF_EVENT_STATE_INACTIVE;
 		event->oncpu = -1;
 		return -EAGAIN;
@@ -1464,22 +1464,6 @@ do {					\
 	return div64_u64(dividend, divisor);
 }
 
-static void perf_event_stop(struct perf_event *event)
-{
-	if (!event->pmu->stop)
-		return event->pmu->disable(event);
-
-	return event->pmu->stop(event);
-}
-
-static int perf_event_start(struct perf_event *event)
-{
-	if (!event->pmu->start)
-		return event->pmu->enable(event);
-
-	return event->pmu->start(event);
-}
-
 static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count)
 {
 	struct hw_perf_event *hwc = &event->hw;
@@ -1499,9 +1483,9 @@ static void perf_adjust_period(struct pe
 	hwc->sample_period = sample_period;
 
 	if (local64_read(&hwc->period_left) > 8*sample_period) {
-		perf_event_stop(event);
+		event->pmu->stop(event, PERF_EF_UPDATE);
 		local64_set(&hwc->period_left, 0);
-		perf_event_start(event);
+		event->pmu->start(event, PERF_EF_RELOAD);
 	}
 }
 
@@ -1530,7 +1514,7 @@ static void perf_ctx_adjust_freq(struct 
 		 */
 		if (interrupts == MAX_INTERRUPTS) {
 			perf_log_throttle(event, 1);
-			event->pmu->unthrottle(event);
+			event->pmu->start(event, 0);
 		}
 
 		if (!event->attr.freq || !event->attr.sample_freq)
@@ -2276,6 +2260,9 @@ int perf_event_task_disable(void)
 
 static int perf_event_index(struct perf_event *event)
 {
+	if (event->hw.state & PERF_HES_STOPPED)
+		return 0;
+
 	if (event->state != PERF_EVENT_STATE_ACTIVE)
 		return 0;
 
@@ -3894,8 +3881,6 @@ static int __perf_event_overflow(struct 
 	struct hw_perf_event *hwc = &event->hw;
 	int ret = 0;
 
-	throttle = (throttle && event->pmu->unthrottle != NULL);
-
 	if (!throttle) {
 		hwc->interrupts++;
 	} else {
@@ -4020,7 +4005,7 @@ static void perf_swevent_overflow(struct
 	}
 }
 
-static void perf_swevent_add(struct perf_event *event, u64 nr,
+static void perf_swevent_event(struct perf_event *event, u64 nr,
 			       int nmi, struct perf_sample_data *data,
 			       struct pt_regs *regs)
 {
@@ -4046,6 +4031,9 @@ static void perf_swevent_add(struct perf
 static int perf_exclude_event(struct perf_event *event,
 			      struct pt_regs *regs)
 {
+	if (event->hw.state & PERF_HES_STOPPED)
+		return 0;
+
 	if (regs) {
 		if (event->attr.exclude_user && user_mode(regs))
 			return 1;
@@ -4145,7 +4133,7 @@ static void do_perf_sw_event(enum perf_t
 
 	hlist_for_each_entry_rcu(event, node, head, hlist_entry) {
 		if (perf_swevent_match(event, type, event_id, data, regs))
-			perf_swevent_add(event, nr, nmi, data, regs);
+			perf_swevent_event(event, nr, nmi, data, regs);
 	}
 end:
 	rcu_read_unlock();
@@ -4205,7 +4193,7 @@ static void perf_swevent_read(struct per
 {
 }
 
-static int perf_swevent_enable(struct perf_event *event)
+static int perf_swevent_add(struct perf_event *event, int flags)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	struct perf_cpu_context *cpuctx;
@@ -4218,6 +4206,8 @@ static int perf_swevent_enable(struct pe
 		perf_swevent_set_period(event);
 	}
 
+	hwc->state = !(flags & PERF_EF_START);
+
 	head = find_swevent_head(cpuctx, event);
 	if (WARN_ON_ONCE(!head))
 		return -EINVAL;
@@ -4227,18 +4217,19 @@ static int perf_swevent_enable(struct pe
 	return 0;
 }
 
-static void perf_swevent_disable(struct perf_event *event)
+static void perf_swevent_del(struct perf_event *event, int flags)
 {
 	hlist_del_rcu(&event->hlist_entry);
 }
 
-static void perf_swevent_void(struct perf_event *event)
+static void perf_swevent_start(struct perf_event *event, int flags)
 {
+	event->hw.state = 0;
 }
 
-static int perf_swevent_int(struct perf_event *event)
+static void perf_swevent_stop(struct perf_event *event, int flags)
 {
-	return 0;
+	event->hw.state = PERF_HES_STOPPED;
 }
 
 /* Deref the hlist from the update side */
@@ -4394,12 +4385,11 @@ static int perf_swevent_init(struct perf
 
 static struct pmu perf_swevent = {
 	.event_init	= perf_swevent_init,
-	.enable		= perf_swevent_enable,
-	.disable	= perf_swevent_disable,
-	.start		= perf_swevent_int,
-	.stop		= perf_swevent_void,
+	.add		= perf_swevent_add,
+	.del		= perf_swevent_del,
+	.start		= perf_swevent_start,
+	.stop		= perf_swevent_stop,
 	.read		= perf_swevent_read,
-	.unthrottle	= perf_swevent_void, /* hwc->interrupts already reset */
 };
 
 #ifdef CONFIG_EVENT_TRACING
@@ -4447,7 +4437,7 @@ void perf_tp_event(u64 addr, u64 count, 
 
 	hlist_for_each_entry_rcu(event, node, head, hlist_entry) {
 		if (perf_tp_event_match(event, &data, regs))
-			perf_swevent_add(event, count, 1, &data, regs);
+			perf_swevent_event(event, count, 1, &data, regs);
 	}
 
 	perf_swevent_put_recursion_context(rctx);
@@ -4486,12 +4476,11 @@ static int perf_tp_event_init(struct per
 
 static struct pmu perf_tracepoint = {
 	.event_init	= perf_tp_event_init,
-	.enable		= perf_trace_enable,
-	.disable	= perf_trace_disable,
-	.start		= perf_swevent_int,
-	.stop		= perf_swevent_void,
+	.add		= perf_trace_add,
+	.del		= perf_trace_del,
+	.start		= perf_swevent_start,
+	.stop		= perf_swevent_stop,
 	.read		= perf_swevent_read,
-	.unthrottle	= perf_swevent_void,
 };
 
 static inline void perf_tp_register(void)
@@ -4547,8 +4536,8 @@ void perf_bp_event(struct perf_event *bp
 
 	perf_sample_data_init(&sample, bp->attr.bp_addr);
 
-	if (!perf_exclude_event(bp, regs))
-		perf_swevent_add(bp, 1, 1, &sample, regs);
+	if (!bp->hw.state && !perf_exclude_event(bp, regs))
+		perf_swevent_event(bp, 1, 1, &sample, regs);
 }
 #endif
 
@@ -4624,32 +4613,39 @@ static void perf_swevent_cancel_hrtimer(
 
 static void cpu_clock_event_update(struct perf_event *event)
 {
-	int cpu = raw_smp_processor_id();
 	s64 prev;
 	u64 now;
 
-	now = cpu_clock(cpu);
+	now = local_clock();
 	prev = local64_xchg(&event->hw.prev_count, now);
 	local64_add(now - prev, &event->count);
 }
 
-static int cpu_clock_event_enable(struct perf_event *event)
+static void cpu_clock_event_start(struct perf_event *event, int flags)
 {
-	struct hw_perf_event *hwc = &event->hw;
-	int cpu = raw_smp_processor_id();
-
-	local64_set(&hwc->prev_count, cpu_clock(cpu));
+	local64_set(&event->hw.prev_count, local_clock());
 	perf_swevent_start_hrtimer(event);
-
-	return 0;
 }
 
-static void cpu_clock_event_disable(struct perf_event *event)
+static void cpu_clock_event_stop(struct perf_event *event, int flags)
 {
 	perf_swevent_cancel_hrtimer(event);
 	cpu_clock_event_update(event);
 }
 
+static int cpu_clock_event_add(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_START)
+		cpu_clock_event_start(event, flags);
+
+	return 0;
+}
+
+static void cpu_clock_event_del(struct perf_event *event, int flags)
+{
+	cpu_clock_event_stop(event, flags);
+}
+
 static void cpu_clock_event_read(struct perf_event *event)
 {
 	cpu_clock_event_update(event);
@@ -4668,8 +4664,10 @@ static int cpu_clock_event_init(struct p
 
 static struct pmu perf_cpu_clock = {
 	.event_init	= cpu_clock_event_init,
-	.enable		= cpu_clock_event_enable,
-	.disable	= cpu_clock_event_disable,
+	.add		= cpu_clock_event_add,
+	.del		= cpu_clock_event_del,
+	.start		= cpu_clock_event_start,
+	.stop		= cpu_clock_event_stop,
 	.read		= cpu_clock_event_read,
 };
 
@@ -4687,25 +4685,29 @@ static void task_clock_event_update(stru
 	local64_add(delta, &event->count);
 }
 
-static int task_clock_event_enable(struct perf_event *event)
+static void task_clock_event_start(struct perf_event *event, int flags)
 {
-	struct hw_perf_event *hwc = &event->hw;
-	u64 now;
-
-	now = event->ctx->time;
-
-	local64_set(&hwc->prev_count, now);
-
+	local64_set(&event->hw.prev_count, event->ctx->time);
 	perf_swevent_start_hrtimer(event);
-
-	return 0;
 }
 
-static void task_clock_event_disable(struct perf_event *event)
+static void task_clock_event_stop(struct perf_event *event, int flags)
 {
 	perf_swevent_cancel_hrtimer(event);
 	task_clock_event_update(event, event->ctx->time);
+}
+
+static int task_clock_event_add(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_START)
+		task_clock_event_start(event, flags);
 
+	return 0;
+}
+
+static void task_clock_event_del(struct perf_event *event, int flags)
+{
+	task_clock_event_stop(event, PERF_EF_UPDATE);
 }
 
 static void task_clock_event_read(struct perf_event *event)
@@ -4737,8 +4739,10 @@ static int task_clock_event_init(struct 
 
 static struct pmu perf_task_clock = {
 	.event_init	= task_clock_event_init,
-	.enable		= task_clock_event_enable,
-	.disable	= task_clock_event_disable,
+	.add		= task_clock_event_add,
+	.del		= task_clock_event_del,
+	.start		= task_clock_event_start,
+	.stop		= task_clock_event_stop,
 	.read		= task_clock_event_read,
 };
 
Index: linux-2.6/kernel/trace/trace_event_perf.c
===================================================================
--- linux-2.6.orig/kernel/trace/trace_event_perf.c
+++ linux-2.6/kernel/trace/trace_event_perf.c
@@ -107,7 +107,7 @@ int perf_trace_init(struct perf_event *p
 	return ret;
 }
 
-int perf_trace_enable(struct perf_event *p_event)
+int perf_trace_add(struct perf_event *p_event, int flags)
 {
 	struct ftrace_event_call *tp_event = p_event->tp_event;
 	struct hlist_head *list;
@@ -116,13 +116,16 @@ int perf_trace_enable(struct perf_event 
 	if (WARN_ON_ONCE(!list))
 		return -EINVAL;
 
+	if (!(flags & PERF_EF_START))
+		p_event->hw.state = PERF_HES_STOPPED;
+
 	list = this_cpu_ptr(list);
 	hlist_add_head_rcu(&p_event->hlist_entry, list);
 
 	return 0;
 }
 
-void perf_trace_disable(struct perf_event *p_event)
+void perf_trace_del(struct perf_event *p_event, int flags)
 {
 	hlist_del_rcu(&p_event->hlist_entry);
 }
Index: linux-2.6/arch/sh/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/sh/kernel/perf_event.c
+++ linux-2.6/arch/sh/kernel/perf_event.c
@@ -206,46 +206,72 @@ again:
 	local64_add(delta, &event->count);
 }
 
-static void sh_pmu_disable(struct perf_event *event)
+static void sh_pmu_stop(struct perf_event *event, int flags)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 
-	clear_bit(idx, cpuc->active_mask);
-	sh_pmu->disable(hwc, idx);
+	if (!(event->hw.state & PERF_HES_STOPPED)) {
+		sh_pmu->disable(hwc, idx);
+		cpuc->events[idx] = NULL;
+		event->hw.state |= PERF_HES_STOPPED;
+	}
 
-	barrier();
+	if ((flags & PERF_EF_UPDATE) && !(event->hw.state & PERF_HES_UPTODATE)) {
+		sh_perf_event_update(event, &event->hw, idx);
+		event->hw.state |= PERF_HES_UPTODATE;
+	}
+}
 
-	sh_perf_event_update(event, &event->hw, idx);
+static void sh_pmu_start(struct perf_event *event, int flags)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
 
-	cpuc->events[idx] = NULL;
-	clear_bit(idx, cpuc->used_mask);
+	if (WARN_ON_ONCE(idx == -1))
+		return;
+
+	if (flags & PERF_EF_RELOAD)
+		WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE));
+
+	cpuc->events[idx] = event;
+	event->hw.state = 0;
+	sh_pmu->enable(hwc, idx);
+}
+
+static void sh_pmu_del(struct perf_event *event, int flags)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+
+	sh_pmu_stop(event, PERF_EF_UPDATE);
+	__clear_bit(event->hw.idx, cpuc->used_mask);
 
 	perf_event_update_userpage(event);
 }
 
-static int sh_pmu_enable(struct perf_event *event)
+static int sh_pmu_enable(struct perf_event *event, int flags)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 
-	if (test_and_set_bit(idx, cpuc->used_mask)) {
+	if (__test_and_set_bit(idx, cpuc->used_mask)) {
 		idx = find_first_zero_bit(cpuc->used_mask, sh_pmu->num_events);
 		if (idx == sh_pmu->num_events)
 			return -EAGAIN;
 
-		set_bit(idx, cpuc->used_mask);
 		hwc->idx = idx;
 	}
 
 	sh_pmu->disable(hwc, idx);
 
-	cpuc->events[idx] = event;
-	set_bit(idx, cpuc->active_mask);
-
-	sh_pmu->enable(hwc, idx);
+	event->hw.state = PERF_HES_UPTODATE;
+	if (!(flags & PERF_EF_START))
+		event->hw.state = PERF_HES_STOPPED;
+	else
+		sh_pmu_start(event, PERF_EF_RELOAD);
 
 	perf_event_update_userpage(event);
 
@@ -268,7 +294,7 @@ static in sh_pmu_event_init(struct perf_
 	return err;
 }
 
-static void sh_pmu_pmu_enable(struct pmu *pmu)
+static void sh_pmu_enable(struct pmu *pmu)
 {
 	if (!sh_pmu_initialized())
 		return;
@@ -276,7 +302,7 @@ static void sh_pmu_pmu_enable(struct pmu
 	sh_pmu->enable_all();
 }
 
-static void sh_pmu_pmu_disable(struct pmu *pmu)
+static void sh_pmu_disable(struct pmu *pmu)
 {
 	if (!sh_pmu_initialized())
 		return;
@@ -285,11 +311,13 @@ static void sh_pmu_pmu_disable(struct pm
 }
 
 static struct pmu pmu = {
-	.pmu_enable	= sh_pmu_pmu_enable,
-	.pmu_disable	= sh_pmu_pmu_disable,
+	.pmu_enable	= sh_pmu_enable,
+	.pmu_disable	= sh_pmu_disable,
 	.event_init	= sh_pmu_event_init,
-	.enable		= sh_pmu_enable,
-	.disable	= sh_pmu_disable,
+	.add		= sh_pmu_add,
+	.del		= sh_pmu_del,
+	.start		= sh_pmu_start,
+	.stop		= sh_pmu_stop,
 	.read		= sh_pmu_read,
 };
 
Index: linux-2.6/arch/arm/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/arm/kernel/perf_event.c
+++ linux-2.6/arch/arm/kernel/perf_event.c
@@ -221,7 +221,7 @@ again:
 }
 
 static void
-armpmu_disable(struct perf_event *event)
+armpmu_del(struct perf_event *event, int flags)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	struct hw_perf_event *hwc = &event->hw;
@@ -254,10 +254,39 @@ armpmu_read(struct perf_event *event)
 }
 
 static void
-armpmu_unthrottle(struct perf_event *event)
+armpmu_stop(struct perf_event *event, int flags)
 {
 	struct hw_perf_event *hwc = &event->hw;
 
+	if (!armpmu)
+		return;
+
+	/*
+	 * ARM pmu always has to update the counter, so ignore
+	 * PERF_EF_UPDATE, see comments in armpmu_start().
+	 */
+	if (!(hwc->state & PERF_HES_STOPPED)) {
+		armpmu->disable(hwc, hwc->idx);
+		hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
+	}
+}
+
+static void
+armpmu_start(struct perf_event *event, int flags)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (!armpmu)
+		return;
+
+	/*
+	 * ARM pmu always has to reprogram the period, so ignore
+	 * PERF_EF_RELOAD, see the comment below.
+	 */
+	if (flags & PERF_EF_RELOAD)
+		WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
+
+	hwc->state = 0;
 	/*
 	 * Set the period again. Some counters can't be stopped, so when we
 	 * were throttled we simply disabled the IRQ source and the counter
@@ -270,7 +299,7 @@ armpmu_unthrottle(struct perf_event *eve
 }
 
 static int
-armpmu_enable(struct perf_event *event)
+armpmu_add(struct perf_event *event, int flags)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	struct hw_perf_event *hwc = &event->hw;
@@ -293,11 +322,9 @@ armpmu_enable(struct perf_event *event)
 	cpuc->events[idx] = event;
 	set_bit(idx, cpuc->active_mask);
 
-	/* Set the period for the event. */
-	armpmu_event_set_period(event, hwc, idx);
-
-	/* Enable the event. */
-	armpmu->enable(hwc, idx);
+	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+	if (flags & PERF_EF_START)
+		armpmu_start(event, PERF_EF_RELOAD);
 
 	/* Propagate our changes to the userspace mapping. */
 	perf_event_update_userpage(event);
@@ -531,7 +558,7 @@ static int armpmu_event_init(struct perf
 	return err;
 }
 
-static void armpmu_pmu_enable(struct pmu *pmu)
+static void armpmu_enable(struct pmu *pmu)
 {
 	/* Enable all of the perf events on hardware. */
 	int idx;
@@ -552,20 +579,21 @@ static void armpmu_pmu_enable(struct pmu
 	armpmu->start();
 }
 
-static void armpmu_pmu_disable(struct pmu *pmu)
+static void armpmu_disable(struct pmu *pmu)
 {
 	if (armpmu)
 		armpmu->stop();
 }
 
 static struct pmu pmu = {
-	.pmu_enable = armpmu_pmu_enable,
-	.pmu_disable= armpmu_pmu_disable,
-	.event_init = armpmu_event_init,
-	.enable	    = armpmu_enable,
-	.disable    = armpmu_disable,
-	.unthrottle = armpmu_unthrottle,
-	.read	    = armpmu_read,
+	.pmu_enable	= armpmu_enable,
+	.pmu_disable	= armpmu_disable,
+	.event_init	= armpmu_event_init,
+	.add		= armpmu_add,
+	.del		= armpmu_del,
+	.start		= armpmu_start,
+	.stop		= armpmu_stop,
+	.read		= armpmu_read,
 };
 
 /*
@@ -1047,7 +1075,7 @@ armv6pmu_handle_irq(int irq_num,
 			continue;
 
 		if (perf_event_overflow(event, 0, &data, regs))
-			armpmu->disable(hwc, idx);
+			armpmu_stop(event, 0);
 	}
 
 	/*
Index: linux-2.6/arch/powerpc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/perf_event.c
+++ linux-2.6/arch/powerpc/kernel/perf_event.c
@@ -402,6 +402,9 @@ static void power_pmu_read(struct perf_e
 {
 	s64 val, delta, prev;
 
+	if (event->hw.state & PERF_HES_STOPPED)
+		return;
+
 	if (!event->hw.idx)
 		return;
 	/*
@@ -517,7 +520,7 @@ static void write_mmcr0(struct cpu_hw_ev
  * Disable all events to prevent PMU interrupts and to allow
  * events to be added or removed.
  */
-static void powerpc_pmu_pmu_disable(struct pmu *pmu)
+static void power_pmu_disable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw;
 	unsigned long flags;
@@ -565,7 +568,7 @@ static void powerpc_pmu_pmu_disable(stru
  * If we were previously disabled and events were added, then
  * put the new config on the PMU.
  */
-static void powerpc_pmu_pmu_enable(struct pmu *pmu)
+static void power_pmu_enable(struct pmu *pmu)
 {
 	struct perf_event *event;
 	struct cpu_hw_events *cpuhw;
@@ -672,6 +675,8 @@ static void powerpc_pmu_pmu_enable(struc
 		}
 		local64_set(&event->hw.prev_count, val);
 		event->hw.idx = idx;
+		if (event->hw.state & PERF_HES_STOPPED)
+			val = 0;
 		write_pmc(idx, val);
 		perf_event_update_userpage(event);
 	}
@@ -727,7 +732,7 @@ static int collect_events(struct perf_ev
  * re-enable the PMU in order to get hw_perf_enable to do the
  * actual work of reconfiguring the PMU.
  */
-static int power_pmu_enable(struct perf_event *event)
+static int power_pmu_add(struct perf_event *event, int ef_flags)
 {
 	struct cpu_hw_events *cpuhw;
 	unsigned long flags;
@@ -749,6 +754,9 @@ static int power_pmu_enable(struct perf_
 	cpuhw->events[n0] = event->hw.config;
 	cpuhw->flags[n0] = event->hw.event_base;
 
+	if (!(ef_flags & PERF_EF_START))
+		event->hw.state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+
 	/*
 	 * If group events scheduling transaction was started,
 	 * skip the schedulability test here, it will be peformed
@@ -777,7 +785,7 @@ nocheck:
 /*
  * Remove a event from the PMU.
  */
-static void power_pmu_disable(struct perf_event *event)
+static void power_pmu_del(struct perf_event *event, int ef_flags)
 {
 	struct cpu_hw_events *cpuhw;
 	long i;
@@ -823,27 +831,53 @@ static void power_pmu_disable(struct per
 }
 
 /*
- * Re-enable interrupts on a event after they were throttled
- * because they were coming too fast.
+ * POWER-PMU does not support disabling individual counters, hence
+ * program their cycle counter to their max value and ignore the interrupts.
  */
-static void power_pmu_unthrottle(struct perf_event *event)
+
+static void power_pmu_start(struct perf_event *event, int ef_flags)
 {
-	s64 val, left;
 	unsigned long flags;
+	s64 left;
 
 	if (!event->hw.idx || !event->hw.sample_period)
 		return;
+
+	if (!(event->hw.state & PERF_HES_STOPPED))
+		return;
+
+	if (ef_flags & PERF_EF_RELOAD)
+		WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE));
+
+	local_irq_save(flags);
+	perf_pmu_disable(event->pmu);
+
+	event->hw.state = 0;
+	left = local64_read(&event->hw.period_left);
+	write_pmc(event->hw.idx, left);
+
+	perf_event_update_userpage(event);
+	perf_pmu_enable(event->pmu);
+	local_irq_restore(flags);
+}
+
+static void power_pmu_stop(struct perf_event *event, int ef_flags)
+{
+	unsigned long flags;
+
+	if (!event->hw.idx || !event->hw.sample_period)
+		return;
+
+	if (event->hw.state & PERF_HES_STOPPED)
+		return;
+
 	local_irq_save(flags);
 	perf_pmu_disable(event->pmu);
+
 	power_pmu_read(event);
-	left = event->hw.sample_period;
-	event->hw.last_period = left;
-	val = 0;
-	if (left < 0x80000000L)
-		val = 0x80000000L - left;
-	write_pmc(event->hw.idx, val);
-	local64_set(&event->hw.prev_count, val);
-	local64_set(&event->hw.period_left, left);
+	event->hw.state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
+	write_pmc(event->hw.idx, 0);
+
 	perf_event_update_userpage(event);
 	perf_pmu_enable(event->pmu);
 	local_irq_restore(flags);
@@ -1128,13 +1162,14 @@ static int power_pmu_event_init(struct p
 }
 
 struct pmu power_pmu = {
-	.pmu_enable	= power_pmu_pmu_enable,
-	.pmu_disable	= power_pmu_pmu_disable,
-	.event_init	= pmwer_pmu_event_init,
-	.enable		= power_pmu_enable,
-	.disable	= power_pmu_disable,
+	.pmu_enable	= power_pmu_enable,
+	.pmu_disable	= power_pmu_disable,
+	.event_init	= power_pmu_event_init,
+	.add		= power_pmu_add,
+	.del		= power_pmu_del,
+	.start		= power_pmu_start,
+	.stop		= power_pmu_stop,
 	.read		= power_pmu_read,
-	.unthrottle	= power_pmu_unthrottle,
 	.start_txn	= power_pmu_start_txn,
 	.cancel_txn	= power_pmu_cancel_txn,
 	.commit_txn	= power_pmu_commit_txn,
@@ -1152,6 +1187,11 @@ static void record_and_restart(struct pe
 	s64 prev, delta, left;
 	int record = 0;
 
+	if (event->hw.state & PERF_HES_STOPPED) {
+		write_pmc(event->hw.idx, 0);
+		return;
+	}
+
 	/* we don't have to worry about interrupts here */
 	prev = local64_read(&event->hw.prev_count);
 	delta = (val - prev) & 0xfffffffful;
@@ -1174,6 +1214,11 @@ static void record_and_restart(struct pe
 			val = 0x80000000LL - left;
 	}
 
+	write_pmc(event->hw.idx, val);
+	local64_set(&event->hw.prev_count, val);
+	local64_set(&event->hw.period_left, left);
+	perf_event_update_userpage(event);
+
 	/*
 	 * Finally record data if requested.
 	 */
@@ -1186,23 +1231,9 @@ static void record_and_restart(struct pe
 		if (event->attr.sample_type & PERF_SAMPLE_ADDR)
 			perf_get_data_addr(regs, &data.addr);
 
-		if (perf_event_overflow(event, nmi, &data, regs)) {
-			/*
-			 * Interrupts are coming too fast - throttle them
-			 * by setting the event to 0, so it will be
-			 * at least 2^30 cycles until the next interrupt
-			 * (assuming each event counts at most 2 counts
-			 * per cycle).
-			 */
-			val = 0;
-			left = ~0ULL >> 1;
-		}
+		if (perf_event_overflow(event, nmi, &data, regs))
+			power_pmu_stop(event, 0);
 	}
-
-	write_pmc(event->hw.idx, val);
-	local64_set(&event->hw.prev_count, val);
-	local64_set(&event->hw.period_left, left);
-	perf_event_update_userpage(event);
 }
 
 /*
Index: linux-2.6/arch/powerpc/kernel/perf_event_fsl_emb.c
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/perf_event_fsl_emb.c
+++ linux-2.6/arch/powerpc/kernel/perf_event_fsl_emb.c
@@ -156,6 +156,9 @@ static void fsl_emb_pmu_read(struct perf
 {
 	s64 val, delta, prev;
 
+	if (event->hw.state & PERF_HES_STOPPED)
+		return;
+
 	/*
 	 * Performance monitor interrupts come even when interrupts
 	 * are soft-disabled, as long as interrupts are hard-enabled.
@@ -177,7 +180,7 @@ static void fsl_emb_pmu_read(struct perf
  * Disable all events to prevent PMU interrupts and to allow
  * events to be added or removed.
  */
-static void fsl_emb_pmu_pmu_disable(struct pmu *pmu)
+static void fsl_emb_pmu_disable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw;
 	unsigned long flags;
@@ -216,7 +219,7 @@ static void fsl_emb_pmu_pmu_disable(stru
  * If we were previously disabled and events were added, then
  * put the new config on the PMU.
  */
-static void fsl_emb_pmu_pmu_enable(struct pmu *pmu)
+static void fsl_emb_pmu_enable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw;
 	unsigned long flags;
@@ -263,7 +266,7 @@ static int collect_events(struct perf_ev
 }
 
 /* context locked on entry */
-static int fsl_emb_pmu_enable(struct perf_event *event)
+static int fsl_emb_pmu_add(struct perf_event *event, int flags)
 {
 	struct cpu_hw_events *cpuhw;
 	int ret = -EAGAIN;
@@ -302,6 +305,12 @@ static int fsl_emb_pmu_enable(struct per
 			val = 0x80000000L - left;
 	}
 	atomic64_set(&event->hw.prev_count, val);
+
+	if (!(flags & PERF_EF_START)) {
+		event->hw.state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+		val = 0;
+	}
+
 	write_pmc(i, val);
 	perf_event_update_userpage(event);
 
@@ -316,7 +325,7 @@ static int fsl_emb_pmu_enable(struct per
 }
 
 /* context locked on entry */
-static void fsl_emb_pmu_disable(struct perf_event *event)
+static void fsl_emb_pmu_del(struct perf_event *event, int flags)
 {
 	struct cpu_hw_events *cpuhw;
 	int i = event->hw.idx;
@@ -353,30 +362,48 @@ static void fsl_emb_pmu_disable(struct p
 	put_cpu_var(cpu_hw_events);
 }
 
-/*
- * Re-enable interrupts on a event after they were throttled
- * because they were coming too fast.
- *
- * Context is locked on entry, but perf is not disabled.
- */
-static void fsl_emb_pmu_unthrottle(struct perf_event *event)
+static void fsl_emb_pmu_start(struct perf_event *event, int flags)
 {
-	s64 val, left;
 	unsigned long flags;
 
 	if (event->hw.idx < 0 || !event->hw.sample_period)
 		return;
+
+	if (!(event->hw.state & PERF_HES_STOPPED))
+		return;
+
+	if (flags & PERF_EF_RELOAD)
+		WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE));
+
 	local_irq_save(flags);
 	perf_pmu_disable(event->pmu);
-	fsl_emb_pmu_read(event);
-	left = event->hw.sample_period;
-	event->hw.last_period = left;
-	val = 0;
-	if (left < 0x80000000L)
-		val = 0x80000000L - left;
-	write_pmc(event->hw.idx, val);
-	atomic64_set(&event->hw.prev_count, val);
-	atomic64_set(&event->hw.period_left, left);
+
+	event->hw.state = 0;
+	left = local64_read(&event->hw.period_left);
+	write_pmc(event->hw.idx, left);
+
+	perf_event_update_userpage(event);
+	perf_pmu_enable(event->pmu);
+	local_irq_restore(flags);
+}
+
+static void fsl_emb_pmu_stop(struct perf_event *event, int flags)
+{
+	unsigned long flags;
+
+	if (event->hw.idx < 0 || !event->hw.sample_period)
+		return;
+
+	if (event->hw.state & PERF_HES_STOPPED)
+		return;
+
+	local_irq_save(flags);
+	perf_pmu_disable(event->pmu);
+
+	power_pmu_read(event);
+	event->hw.state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
+	write_pmc(event->hw.idx, 0);
+
 	perf_event_update_userpage(event);
 	perf_pmu_enable(event->pmu);
 	local_irq_restore(flags);
@@ -524,13 +551,14 @@ static int fsl_emb_pmu_event_init(struct
 }
 
 static struct pmu fsl_emb_pmu = {
-	.pmu_enable	= fsl_emb_pmu_pmu_enable,
-	.pmu_disable	= fsl_emb_pmu_pmu_disable,
+	.pmu_enable	= fsl_emb_pmu_enable,
+	.pmu_disable	= fsl_emb_pmu_disable,
 	.event_init	= fsl_emb_pmu_event_init,
-	.enable		= fsl_emb_pmu_enable,
-	.disable	= fsl_emb_pmu_disable,
+	.add		= fsl_emb_pmu_add,
+	.del		= fsl_emb_pmu_del,
+	.start		= fsl_emb_pmu_start,
+	.stop		= fsl_emb_pmu_stop,
 	.read		= fsl_emb_pmu_read,
-	.unthrottle	= fsl_emb_pmu_unthrottle,
 };
 
 /*
@@ -545,6 +573,11 @@ static void record_and_restart(struct pe
 	s64 prev, delta, left;
 	int record = 0;
 
+	if (event->hw.state & PERF_HES_STOPPED) {
+		write_pmc(event->hw.idx, 0);
+		return;
+	}
+
 	/* we don't have to worry about interrupts here */
 	prev = atomic64_read(&event->hw.prev_count);
 	delta = (val - prev) & 0xfffffffful;
@@ -567,31 +600,21 @@ static void record_and_restart(struct pe
 			val = 0x80000000LL - left;
 	}
 
+	write_pmc(event->hw.idx, val);
+	atomic64_set(&event->hw.prev_count, val);
+	atomic64_set(&event->hw.period_left, left);
+	perf_event_update_userpage(event);
+
 	/*
 	 * Finally record data if requested.
 	 */
 	if (record) {
-		struct perf_sample_data data = {
-			.period	= event->hw.last_period,
-		};
+		perf_sample_data_init(&data, ~0ULL);
+		data.period = event->hw.last_period;
 
-		if (perf_event_overflow(event, nmi, &data, regs)) {
-			/*
-			 * Interrupts are coming too fast - throttle them
-			 * by setting the event to 0, so it will be
-			 * at least 2^30 cycles until the next interrupt
-			 * (assuming each event counts at most 2 counts
-			 * per cycle).
-			 */
-			val = 0;
-			left = ~0ULL >> 1;
-		}
+		if (perf_event_overflow(event, nmi, &data, regs))
+			fsl_emb_pmu_stop(event, 0);
 	}
-
-	write_pmc(event->hw.idx, val);
-	atomic64_set(&event->hw.prev_count, val);
-	atomic64_set(&event->hw.period_left, left);
-	perf_event_update_userpage(event);
 }
 
 static void perf_event_interrupt(struct pt_regs *regs)
Index: linux-2.6/arch/sparc/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/sparc/kernel/perf_event.c
+++ linux-2.6/arch/sparc/kernel/perf_event.c
@@ -658,13 +658,16 @@ static u64 maybe_change_configuration(st
 
 		enc = perf_event_get_enc(cpuc->events[i]);
 		pcr &= ~mask_for_index(idx);
-		pcr |= event_encoding(enc, idx);
+		if (hwc->state & PERF_HES_STOPPED)
+			pcr |= nop_for_index(idx);
+		else
+			pcr |= event_encoding(enc, idx);
 	}
 out:
 	return pcr;
 }
 
-static void sparc_pmu_pmu_enable(struct pmu *pmu)
+static void sparc_pmu_enable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	u64 pcr;
@@ -691,7 +694,7 @@ static void sparc_pmu_pmu_enable(struct 
 	pcr_ops->write(cpuc->pcr);
 }
 
-static void sparc_pmu_pmu_disable(struct pmu *pmu)
+static void sparc_pmu_disable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	u64 val;
@@ -710,10 +713,53 @@ static void sparc_pmu_pmu_disable(struct
 	pcr_ops->write(cpuc->pcr);
 }
 
-static void sparc_pmu_disable(struct perf_event *event)
+static int active_event_index(struct cpu_hw_events *cpuc,
+			      struct perf_event *event)
+{
+	int i;
+
+	for (i = 0; i < cpuc->n_events; i++) {
+		if (cpuc->event[i] == event)
+			break;
+	}
+	BUG_ON(i == cpuc->n_events);
+	return cpuc->current_idx[i];
+}
+
+static void sparc_pmu_start(struct perf_event *event, int flags)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	int idx = active_event_index(cpuc, event);
+
+	if (flags & PERF_EF_RELOAD) {
+		WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE));
+		sparc_perf_event_set_period(event, &event->hw, idx);
+	}
+
+	event->hw.state = 0;
+
+	sparc_pmu_enable_event(cpuc, &event->hw, idx);
+}
+
+static void sparc_pmu_stop(struct perf_event *event, int flags)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	int idx = active_event_index(cpuc, event);
+
+	if (!(event->hw.state & PERF_HES_STOPPED)) {
+		sparc_pmu_disable_event(cpuc, &event->hw, idx);
+		event->hw.state |= PERF_HES_STOPPED;
+	}
+
+	if (!(event->hw.state & PERF_HES_UPTODATE) && (flags & PERF_EF_UPDATE)) {
+		sparc_perf_event_update(event, &event->hw, idx);
+		event->hw.state |= PERF_HES_UPTODATE;
+	}
+}
+
+static void sparc_pmu_del(struct perf_event *event, int _flags)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-	struct hw_perf_event *hwc = &event->hw;
 	unsigned long flags;
 	int i;
 
@@ -722,7 +768,10 @@ static void sparc_pmu_disable(struct per
 
 	for (i = 0; i < cpuc->n_events; i++) {
 		if (event == cpuc->event[i]) {
-			int idx = cpuc->current_idx[i];
+			/* Absorb the final count and turn off the
+			 * event.
+			 */
+			sparc_pmu_stop(event, PERF_EF_UPDATE);
 
 			/* Shift remaining entries down into
 			 * the existing slot.
@@ -734,13 +783,6 @@ static void sparc_pmu_disable(struct per
 					cpuc->current_idx[i];
 			}
 
-			/* Absorb the final count and turn off the
-			 * event.
-			 */
-			sparc_pmu_disable_event(cpuc, hwc, idx);
-			barrier();
-			sparc_perf_event_update(event, hwc, idx);
-
 			perf_event_update_userpage(event);
 
 			cpuc->n_events--;
@@ -752,19 +794,6 @@ static void sparc_pmu_disable(struct per
 	local_irq_restore(flags);
 }
 
-static int active_event_index(struct cpu_hw_events *cpuc,
-			      struct perf_event *event)
-{
-	int i;
-
-	for (i = 0; i < cpuc->n_events; i++) {
-		if (cpuc->event[i] == event)
-			break;
-	}
-	BUG_ON(i == cpuc->n_events);
-	return cpuc->current_idx[i];
-}
-
 static void sparc_pmu_read(struct perf_event *event)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
@@ -774,15 +803,6 @@ static void sparc_pmu_read(struct perf_e
 	sparc_perf_event_update(event, hwc, idx);
 }
 
-static void sparc_pmu_unthrottle(struct perf_event *event)
-{
-	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-	int idx = active_event_index(cpuc, event);
-	struct hw_perf_event *hwc = &event->hw;
-
-	sparc_pmu_enable_event(cpuc, hwc, idx);
-}
-
 static atomic_t active_events = ATOMIC_INIT(0);
 static DEFINE_MUTEX(pmc_grab_mutex);
 
@@ -984,7 +1004,7 @@ static int collect_events(struct perf_ev
 	return n;
 }
 
-static int sparc_pmu_enable(struct perf_event *event)
+static int sparc_pmu_add(struct perf_event *event, int ef_flags)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	int n0, ret = -EAGAIN;
@@ -1001,6 +1021,10 @@ static int sparc_pmu_enable(struct perf_
 	cpuc->events[n0] = event->hw.event_base;
 	cpuc->current_idx[n0] = PIC_NO_INDEX;
 
+	event->hw.state = PERF_HES_UPTODATE;
+	if (!(ef_flags & PERF_EF_START))
+		event->hw.state |= PERF_HES_STOPPED;
+
 	/*
 	 * If group events scheduling transaction was started,
 	 * skip the schedulability test here, it will be peformed
@@ -1156,13 +1180,14 @@ static int sparc_pmu_commit_txn(struct p
 }
 
 static struct pmu pmu = {
-	.pmu_enable	= sparc_pmu_pmu_enable,
-	.pmu_disable	= sparc_pmu_pmu_disable,
+	.pmu_enable	= sparc_pmu_enable,
+	.pmu_disable	= sparc_pmu_disable,
 	.event_init	= sparc_pmu_event_init,
-	.enable		= sparc_pmu_enable,
-	.disable	= sparc_pmu_disable,
+	.add		= sparc_pmu_add,
+	.del		= sparc_pmu_del,
+	.start		= sparc_pmu_start,
+	.stop		= sparc_pmu_stop,
 	.read		= sparc_pmu_read,
-	.unthrottle	= sparc_pmu_unthrottle,
 	.start_txn	= sparc_pmu_start_txn,
 	.cancel_txn	= sparc_pmu_cancel_txn,
 	.commit_txn	= sparc_pmu_commit_txn,
@@ -1243,7 +1268,7 @@ static int __kprobes perf_event_nmi_hand
 			continue;
 
 		if (perf_event_overflow(event, 1, &data, regs))
-			sparc_pmu_disable_event(cpuc, hwc, idx);
+			sparc_pmu_stop(event, 0);
 	}
 
 	return NOTIFY_STOP;
Index: linux-2.6/include/linux/ftrace_event.h
===================================================================
--- linux-2.6.orig/include/linux/ftrace_event.h
+++ linux-2.6/include/linux/ftrace_event.h
@@ -245,8 +245,8 @@ DECLARE_PER_CPU(struct pt_regs, perf_tra
 
 extern int  perf_trace_init(struct perf_event *event);
 extern void perf_trace_destroy(struct perf_event *event);
-extern int  perf_trace_enable(struct perf_event *event);
-extern void perf_trace_disable(struct perf_event *event);
+extern int  perf_trace_add(struct perf_event *event, int flags);
+extern void perf_trace_del(struct perf_event *event, int flags);
 extern int  ftrace_profile_set_filter(struct perf_event *event, int event_id,
 				     char *filter_str);
 extern void ftrace_profile_free_filter(struct perf_event *event);
Index: linux-2.6/include/linux/perf_event.h
===================================================================
--- linux-2.6.orig/include/linux/perf_event.h
+++ linux-2.6/include/linux/perf_event.h
@@ -536,6 +536,7 @@ struct hw_perf_event {
 		struct arch_hw_breakpoint	info;
 #endif
 	};
+	int				state;
 	local64_t			prev_count;
 	u64				sample_period;
 	u64				last_period;
@@ -547,6 +548,12 @@ struct hw_perf_event {
 #endif
 };
 
+/*
+ * hw_perf_event::state flags
+ */
+#define PERF_HES_STOPPED	0x01 /* the counter is stopped */
+#define PERF_HES_UPTODATE	0x02 /* event->count up-to-date */
+
 struct perf_event;
 
 /*
@@ -562,28 +569,48 @@ struct pmu {
 
 	int				*pmu_disable_count;
 
+	/*
+	 * Fully disable/enable this PMU, can be used to protect from the PMI
+	 * as well as for lazy/batch writing of the MSRs.
+	 */
 	void (*pmu_enable)		(struct pmu *pmu);
 	void (*pmu_disable)		(struct pmu *pmu);
 
 	/*
+	 * Try and initialize the event for this PMU.
 	 * Should return -ENOENT when the @event doesn't match this PMU.
 	 */
 	int (*event_init)		(struct perf_event *event);
 
-	int  (*enable)			(struct perf_event *event);
-	void (*disable)			(struct perf_event *event);
-	int  (*start)			(struct perf_event *event);
-	void (*stop)			(struct perf_event *event);
+#define PERF_EF_START	0x01		/* start the counter when adding    */
+#define PERF_EF_RELOAD	0x02		/* reload the counter when starting */
+#define PERF_EF_UPDATE	0x04		/* update the counter when stopping */
+
+	/*
+	 * Adds/Removes a counter to/from the PMU, can be done inside
+	 * a transaction, see the ->*_txn() methods.
+	 */
+	int  (*add)			(struct perf_event *event, int flags);
+	void (*del)			(struct perf_event *event, int flags);
+
+	/*
+	 * Starts/Stops a counter present on the PMU. The PMI handler
+	 * should stop the counter when perf_event_overflow() returns
+	 * !0. ->start() will be used to continue.
+	 */
+	void (*start)			(struct perf_event *event, int flags);
+	void (*stop)			(struct perf_event *event, int flags);
+
+	/*
+	 * Updates the counter value of the event.
+	 */
 	void (*read)			(struct perf_event *event);
-	void (*unthrottle)		(struct perf_event *event);
 
 	/*
 	 * Group events scheduling is treated as a transaction, add
 	 * group events as a whole and perform one schedulability test.
 	 * If the test fails, roll back the whole group
-	 */
-
-	/*
+	 *
 	 * Start the transaction, after this ->enable() doesn't need to
 	 * do schedulability tests.
 	 */
@@ -678,7 +705,7 @@ struct perf_event {
 	int				nr_siblings;
 	int				group_flags;
 	struct perf_event		*group_leader;
-	struct pmu		*pmu;
+	struct pmu			*pmu;
 
 	enum perf_event_active_state	state;
 	unsigned int			attach_state;
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c
@@ -713,7 +713,7 @@ again:
 		data.period = event->hw.last_period;
 
 		if (perf_event_overflow(event, 1, &data, regs))
-			x86_pmu_stop(event);
+			x86_pmu_stop(event, 0);
 	}
 
 	intel_pmu_ack_status(ack);
Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel_ds.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -491,7 +491,7 @@ static void __intel_pmu_pebs_event(struc
 		regs.flags &= ~PERF_EFLAGS_EXACT;
 
 	if (perf_event_overflow(event, 1, &data, &regs))
-		x86_pmu_stop(event);
+		x86_pmu_stop(event, 0);
 }
 
 static void intel_pmu_drain_pebs_core(struct pt_regs *iregs)
Index: linux-2.6/kernel/hw_breakpoint.c
===================================================================
--- linux-2.6.orig/kernel/hw_breakpoint.c
+++ linux-2.6/kernel/hw_breakpoint.c
@@ -570,10 +570,35 @@ static int hw_breakpoint_event_init(stru
 	return 0;
 }
 
+static int hw_breakpoint_add(struct perf_event *bp, int flags)
+{
+	if (!(flags & PERF_EF_START))
+		bp->hw.state = PERF_HES_STOPPED;
+
+	return arch_install_hw_breakpoint(bp);
+}
+
+static void hw_breakpoint_del(struct perf_event *bp, int flags)
+{
+	arch_uninstall_hw_breakpoint(bp);
+}
+
+static void hw_breakpoint_start(struct perf_event *bp, int flags)
+{
+	bp->hw.state = 0;
+}
+
+static void hw_breakpoint_stop(struct perf_event *bp, int flags)
+{
+	bp->hw.state = PERF_HES_STOPPED;
+}
+
 static struct pmu perf_breakpoint = {
 	.event_init	= hw_breakpoint_event_init,
-	.enable		= arch_install_hw_breakpoint,
-	.disable	= arch_uninstall_hw_breakpoint,
+	.add		= hw_breakpoint_add,
+	.del		= hw_breakpoint_del,
+	.start		= hw_breakpoint_start,
+	.stop		= hw_breakpoint_stop,
 	.read		= hw_breakpoint_pmu_read,
 };
 



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/11] perf pmu interface -v2
  2010-06-24 14:28 [RFC][PATCH 00/11] perf pmu interface -v2 Peter Zijlstra
                   ` (10 preceding siblings ...)
  2010-06-24 14:28 ` [RFC][PATCH 11/11] perf: Rework the PMU methods Peter Zijlstra
@ 2010-06-25 11:11 ` Will Deacon
  2010-06-25 11:16   ` Peter Zijlstra
  2010-06-26 11:22 ` Matt Fleming
  2010-06-26 16:22 ` Corey Ashford
  13 siblings, 1 reply; 56+ messages in thread
From: Will Deacon @ 2010-06-25 11:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Paul Mundt,
	Frederic Weisbecker, Cyrill Gorcunov, Lin Ming, Yanmin,
	Deng-Cheng Zhu, David Miller, linux-kernel

Hi Peter,

On Thu, 2010-06-24 at 15:28 +0100, Peter Zijlstra wrote:
> These patches prepare the perf code for multiple pmus (no user
> interface yet, Lin Ming is working on that). These patches remove all
> weak functions and rework the struct pmu interface.
> 
> The patches are boot tested on x86_64 and compile tested on: powerpc
> (!fsl, what config is that?), sparc and arm (sorry no SH compiler)
> 

I tried to test these for ARM, but I get compilation errors. Maybe you
forgot to enable perf in the ARM .config? Anyway, the compiler log is:

  CC      arch/arm/kernel/perf_event.o
arch/arm/kernel/perf_event.c: In function 'validate_event':
arch/arm/kernel/perf_event.c:342: error: 'pmu' undeclared (first use in this function)
arch/arm/kernel/perf_event.c:342: error: (Each undeclared identifier is reported only once
arch/arm/kernel/perf_event.c:342: error: for each function it appears in.)
arch/arm/kernel/perf_event.c: In function 'armpmu_event_init':
arch/arm/kernel/perf_event.c:536: warning: return makes integer from pointer without a cast
arch/arm/kernel/perf_event.c: In function 'init_hw_perf_events':
arch/arm/kernel/perf_event.c:3037: error: expected ';' before 'return'
arch/arm/kernel/perf_event.c:3038: warning: no return statement in function returning non-void
make[1]: *** [arch/arm/kernel/perf_event.o] Error 1
make: *** [arch/arm/kernel] Error 2

I think there's a missing semicolon, an ERR_PTR that should be a PTR_ERR
and an ordering issue with the pmu struct.

Cheers,

Will


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/11] perf pmu interface -v2
  2010-06-25 11:11 ` [RFC][PATCH 00/11] perf pmu interface -v2 Will Deacon
@ 2010-06-25 11:16   ` Peter Zijlstra
  2010-06-25 14:36     ` Will Deacon
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-25 11:16 UTC (permalink / raw)
  To: Will Deacon
  Cc: paulus, stephane eranian, Robert Richter, Paul Mundt,
	Frederic Weisbecker, Cyrill Gorcunov, Lin Ming, Yanmin,
	Deng-Cheng Zhu, David Miller, linux-kernel

On Fri, 2010-06-25 at 12:11 +0100, Will Deacon wrote:
> Hi Peter,
> 
> On Thu, 2010-06-24 at 15:28 +0100, Peter Zijlstra wrote:
> > These patches prepare the perf code for multiple pmus (no user
> > interface yet, Lin Ming is working on that). These patches remove all
> > weak functions and rework the struct pmu interface.
> > 
> > The patches are boot tested on x86_64 and compile tested on: powerpc
> > (!fsl, what config is that?), sparc and arm (sorry no SH compiler)
> > 
> 
> I tried to test these for ARM, but I get compilation errors. Maybe you
> forgot to enable perf in the ARM .config? Anyway, the compiler log is:
> 
>   CC      arch/arm/kernel/perf_event.o
> arch/arm/kernel/perf_event.c: In function 'validate_event':
> arch/arm/kernel/perf_event.c:342: error: 'pmu' undeclared (first use in this function)
> arch/arm/kernel/perf_event.c:342: error: (Each undeclared identifier is reported only once
> arch/arm/kernel/perf_event.c:342: error: for each function it appears in.)
> arch/arm/kernel/perf_event.c: In function 'armpmu_event_init':
> arch/arm/kernel/perf_event.c:536: warning: return makes integer from pointer without a cast
> arch/arm/kernel/perf_event.c: In function 'init_hw_perf_events':
> arch/arm/kernel/perf_event.c:3037: error: expected ';' before 'return'
> arch/arm/kernel/perf_event.c:3038: warning: no return statement in function returning non-void
> make[1]: *** [arch/arm/kernel/perf_event.o] Error 1
> make: *** [arch/arm/kernel] Error 2
> 
> I think there's a missing semicolon, an ERR_PTR that should be a PTR_ERR
> and an ordering issue with the pmu struct.

I seem to have lost a refresh before sending the emails, please check:

git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-perf.git perf-pmu

I pushed out updated patches there.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/11] perf pmu interface -v2
  2010-06-25 11:16   ` Peter Zijlstra
@ 2010-06-25 14:36     ` Will Deacon
  2010-06-25 14:50       ` Peter Zijlstra
  0 siblings, 1 reply; 56+ messages in thread
From: Will Deacon @ 2010-06-25 14:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Paul Mundt,
	Frederic Weisbecker, Cyrill Gorcunov, Lin Ming, Yanmin,
	Deng-Cheng Zhu, David Miller, linux-kernel

Hello,

On Fri, 2010-06-25 at 12:16 +0100, Peter Zijlstra wrote:
> I seem to have lost a refresh before sending the emails, please check:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-perf.git perf-pmu
> 
> I pushed out updated patches there.

Ok, I rebuilt my Kernel and perf tools from that tree and tested it on a
quad-core ARMv7 board. Per-task counters appear to work (software and
hardware) but pinned hardware counters always return 0:


root@will-lucid-nfs:~# perf stat -a -e cs -e cycles -e instructions -- ls
linux-2.6  tmp

 Performance counter stats for 'ls':

                 33  context-switches        
                  0  cycles                  
                  0  instructions             #      0.000 IPC  

        0.028572009  seconds time elapsed


It's odd if only ARM is affected in this way. Do pinned events still work
for other people?

Will


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/11] perf pmu interface -v2
  2010-06-25 14:36     ` Will Deacon
@ 2010-06-25 14:50       ` Peter Zijlstra
  2010-07-01 14:36         ` Peter Zijlstra
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-25 14:50 UTC (permalink / raw)
  To: Will Deacon
  Cc: paulus, stephane eranian, Robert Richter, Paul Mundt,
	Frederic Weisbecker, Cyrill Gorcunov, Lin Ming, Yanmin,
	Deng-Cheng Zhu, David Miller, linux-kernel

On Fri, 2010-06-25 at 15:36 +0100, Will Deacon wrote:
> Hello,
> 
> On Fri, 2010-06-25 at 12:16 +0100, Peter Zijlstra wrote:
> > I seem to have lost a refresh before sending the emails, please check:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-perf.git perf-pmu
> > 
> > I pushed out updated patches there.
> 
> Ok, I rebuilt my Kernel and perf tools from that tree and tested it on a
> quad-core ARMv7 board. 

Ooh, neat toy ;-)

> Per-task counters appear to work (software and
> hardware) but pinned hardware counters always return 0:
> 
> 
> root@will-lucid-nfs:~# perf stat -a -e cs -e cycles -e instructions -- ls
> linux-2.6  tmp
> 
>  Performance counter stats for 'ls':
> 
>                  33  context-switches        
>                   0  cycles                  
>                   0  instructions             #      0.000 IPC  
> 
>         0.028572009  seconds time elapsed
> 
> 
> It's odd if only ARM is affected in this way. Do pinned events still work
> for other people?

/me goes build that exact tree on his x86_64.. and gets:

# perf stat -a -e cs -e cycles -e instructions -- ls > /dev/null

 Performance counter stats for 'ls':

             51  context-switches        
       24963513  cycles                  
        9225808  instructions             #      0.370 IPC  

    0.002389051  seconds time elapsed


Not exactly sure how I could have messed up the ARM architecture code to
make this happen though... will have a peek.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/11] perf pmu interface -v2
  2010-06-24 14:28 [RFC][PATCH 00/11] perf pmu interface -v2 Peter Zijlstra
                   ` (11 preceding siblings ...)
  2010-06-25 11:11 ` [RFC][PATCH 00/11] perf pmu interface -v2 Will Deacon
@ 2010-06-26 11:22 ` Matt Fleming
  2010-06-26 16:22 ` Corey Ashford
  13 siblings, 0 replies; 56+ messages in thread
From: Matt Fleming @ 2010-06-26 11:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, paulus, stephane eranian, Robert Richter,
	Will Deacon, Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov,
	Lin Ming, Yanmin, Deng-Cheng Zhu, David Miller

On Thu, 24 Jun 2010 16:28:04 +0200, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> These patches prepare the perf code for multiple pmus (no user
> interface yet, Lin Ming is working on that). These patches remove all
> weak functions and rework the struct pmu interface.
> 
> The patches are boot tested on x86_64 and compile tested on: powerpc
> (!fsl, what config is that?), sparc and arm (sorry no SH compiler)

Hi Peter,

I tried your SH changes and I needed to apply this patch to get it to
compile. I haven't run the code yet, but I'll do that later today.

---

diff --git a/arch/sh/kernel/perf_event.c b/arch/sh/kernel/perf_event.c
index 50ff852..6bfbaec 100644
--- a/arch/sh/kernel/perf_event.c
+++ b/arch/sh/kernel/perf_event.c
@@ -251,7 +251,7 @@ static void sh_pmu_del(struct perf_event *event, int flags)
 	perf_event_update_userpage(event);
 }
 
-static int sh_pmu_enable(struct perf_event *event, int flags)
+static int sh_pmu_add(struct perf_event *event, int flags)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	struct hw_perf_event *hwc = &event->hw;
@@ -283,7 +283,7 @@ static void sh_pmu_read(struct perf_event *event)
 	sh_perf_event_update(event, &event->hw, event->hw.idx);
 }
 
-static in sh_pmu_event_init(struct perf_event *event)
+static int sh_pmu_event_init(struct perf_event *event)
 {
 	int err = __hw_perf_event_init(event);
 	if (unlikely(err)) {
@@ -345,15 +345,15 @@ sh_pmu_notifier(struct notifier_block *self, unsigned long action, void *hcpu)
 	return NOTIFY_OK;
 }
 
-int __cpuinit register_sh_pmu(struct sh_pmu *pmu)
+int __cpuinit register_sh_pmu(struct sh_pmu *_pmu)
 {
 	if (sh_pmu)
 		return -EBUSY;
-	sh_pmu = pmu;
+	sh_pmu = _pmu;
 
-	pr_info("Performance Events: %s support registered\n", pmu->name);
+	pr_info("Performance Events: %s support registered\n", _pmu->name);
 
-	WARN_ON(pmu->num_events > MAX_HWEVENTS);
+	WARN_ON(_pmu->num_events > MAX_HWEVENTS);
 
 	perf_pmu_register(&pmu);
 	perf_cpu_notifier(sh_pmu_notifier);

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/11] perf pmu interface -v2
  2010-06-24 14:28 [RFC][PATCH 00/11] perf pmu interface -v2 Peter Zijlstra
                   ` (12 preceding siblings ...)
  2010-06-26 11:22 ` Matt Fleming
@ 2010-06-26 16:22 ` Corey Ashford
  2010-06-28 15:13   ` Peter Zijlstra
  13 siblings, 1 reply; 56+ messages in thread
From: Corey Ashford @ 2010-06-26 16:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller, linux-kernel

On 06/24/2010 07:28 AM, Peter Zijlstra wrote:
> These patches prepare the perf code for multiple pmus (no user
> interface yet, Lin Ming is working on that). These patches remove all
> weak functions and rework the struct pmu interface.
>
> The patches are boot tested on x86_64 and compile tested on: powerpc
> (!fsl, what config is that?), sparc and arm (sorry no SH compiler)
>
> On x86 perf stat and record with both software and hardware events still worked.
>
> I changed all (hardware) pmu implementations so there's a fair chance I messed
> something up, please have a look at it.
>
> If people agree with this, I'll continue with the per-pmu context stuff I
> talked about last time, which should get us the hardware write batching back.

Hi Peter,

These patches look like they are really taking us in the right 
direction.  Thanks for all your effort on this!

As for the "hardware write batching", can you describe a bit more about 
what you have in mind there?  I wonder if this might have something to 
do with accounting for PMU hardware which is slow to access, for 
example, via I2C via an internal bridge.

Cheers,

- Corey


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 05/11] perf: register pmu implementations
  2010-06-24 14:28 ` [RFC][PATCH 05/11] perf: register pmu implementations Peter Zijlstra
@ 2010-06-28 13:21   ` Frederic Weisbecker
  2010-06-28 15:16     ` Peter Zijlstra
  2010-07-09  3:08   ` Paul Mackerras
  1 sibling, 1 reply; 56+ messages in thread
From: Frederic Weisbecker @ 2010-06-28 13:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Thu, Jun 24, 2010 at 04:28:09PM +0200, Peter Zijlstra wrote:
> +	if (bp->attr.type != PERF_TYPE_BREAKPOINT)
> +		return -ENOENT;
> +
> +	err = register_perf_hw_breakpoint(bp);
> +	if (err)
> +		return err;
> +
> +	bp->destroy = bp_perf_event_destroy;



Seems it would make sense to also have destroy in the pmu, it's the same
along every events in the same class right?

But this can be for later.


> +static LIST_HEAD(pmus);
> +static DEFINE_MUTEX(pmus_lock);
> +static struct srcu_struct pmus_srcu;
> +
> +int perf_pmu_register(struct pmu *pmu)
> +{
> +	mutex_lock(&pmus_lock);
> +	list_add_rcu(&pmu->entry, &pmus);
> +	mutex_unlock(&pmus_lock);
> +
> +	return 0;
> +}
> +
> +void perf_pmu_unregister(struct pmu *pmu)
> +{
> +	mutex_lock(&pmus_lock);
> +	list_del_rcu(&pmu->entry);
> +	mutex_unlock(&pmus_lock);
>  
> -			atomic_inc(&perf_swevent_enabled[event_id]);
> -			event->destroy = sw_perf_event_destroy;
> +	synchronize_srcu(&pmus_srcu);
> +}
> +
> +struct pmu *perf_init_event(struct perf_event *event)
> +{
> +	struct pmu *pmu = NULL;
> +	int idx;
> +
> +	idx = srcu_read_lock(&pmus_srcu);
> +	list_for_each_entry_rcu(pmu, &pmus, entry) {
> +		int ret = pmu->event_init(event);
> +		if (!ret)
> +			break;
> +		if (ret != -ENOENT) {
> +			pmu = ERR_PTR(ret);
> +			break;
>  		}
> -		pmu = &perf_ops_generic;
> -		break;
>  	}
> +	srcu_read_unlock(&pmus_srcu, idx);
>  
>  	return pmu;
>  }



I'm still not sure why all this locking is needed. We don't even
support pmus in modules.

Is there something coming soon that will use this?
I remember something about KVM.

And who will have to use srcu? It seems the event fastpath would
be concerned, right? Will that have an impact on the performances?



> @@ -5743,15 +5742,15 @@ perf_cpu_notify(struct notifier_block *s
>  {
>  	unsigned int cpu = (long)hcpu;
>  
> -	switch (action) {
> +	switch (action & ~CPU_TASKS_FROZEN) {
>  
>  	case CPU_UP_PREPARE:
> -	case CPU_UP_PREPARE_FROZEN:
> +	case CPU_DOWN_FAILED:
>  		perf_event_init_cpu(cpu);
>  		break;
>  
> +	case CPU_UP_CANCELED:
>  	case CPU_DOWN_PREPARE:
> -	case CPU_DOWN_PREPARE_FROZEN:
>  		perf_event_exit_cpu(cpu);
>  		break;



That doesn't seem to be related to this patch initial topic.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/11] perf pmu interface -v2
  2010-06-26 16:22 ` Corey Ashford
@ 2010-06-28 15:13   ` Peter Zijlstra
  2010-06-30 17:19     ` Corey Ashford
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-28 15:13 UTC (permalink / raw)
  To: Corey Ashford
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller, linux-kernel

On Sat, 2010-06-26 at 09:22 -0700, Corey Ashford wrote:

> These patches look like they are really taking us in the right 
> direction.  Thanks for all your effort on this!

Yeah, although I seem to have managed to wreck all architectures tested
so far (I found some iffy on x86 too), I guess I need to carefully look
at things.

> As for the "hardware write batching", can you describe a bit more about 
> what you have in mind there?  I wonder if this might have something to 
> do with accounting for PMU hardware which is slow to access, for 
> example, via I2C via an internal bridge.

Right, so the write batching is basically delaying writing out the PMU
state to hardware until pmu::pmu_enable() time. It avoids having to
re-program the hardware when, due to a scheduling constraint, we have to
move counters around.

So say, we context switch a task, and remove the old events and add the
new ones under a single pmu::pmu_disable()/::pmu_enable() pair, we will
only hit the hardware twice (once to disable, once to enable), instead
of for each individual ::del()/::add().

For this to work we need to have an association between a context and a
pmu, otherwise its very hard to know what pmu to disable/enable; the
alternative is all of them which isn't very attractive.

Then again, it doesn't make sense to have task-counters on an I2C
attached PMU simply because writing to the PMU could cause context
switches.



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 05/11] perf: register pmu implementations
  2010-06-28 13:21   ` Frederic Weisbecker
@ 2010-06-28 15:16     ` Peter Zijlstra
  2010-06-28 15:29       ` Frederic Weisbecker
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-28 15:16 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Mon, 2010-06-28 at 15:21 +0200, Frederic Weisbecker wrote:
> On Thu, Jun 24, 2010 at 04:28:09PM +0200, Peter Zijlstra wrote:
> > +	if (bp->attr.type != PERF_TYPE_BREAKPOINT)
> > +		return -ENOENT;
> > +
> > +	err = register_perf_hw_breakpoint(bp);
> > +	if (err)
> > +		return err;
> > +
> > +	bp->destroy = bp_perf_event_destroy;

> Seems it would make sense to also have destroy in the pmu, it's the same
> along every events in the same class right?
> 
> But this can be for later.

Ah, indeed.

> > +static LIST_HEAD(pmus);
> > +static DEFINE_MUTEX(pmus_lock);
> > +static struct srcu_struct pmus_srcu;
> > +
> > +int perf_pmu_register(struct pmu *pmu)
> > +{
> > +	mutex_lock(&pmus_lock);
> > +	list_add_rcu(&pmu->entry, &pmus);
> > +	mutex_unlock(&pmus_lock);
> > +
> > +	return 0;
> > +}
> > +
> > +void perf_pmu_unregister(struct pmu *pmu)
> > +{
> > +	mutex_lock(&pmus_lock);
> > +	list_del_rcu(&pmu->entry);
> > +	mutex_unlock(&pmus_lock);
> >  
> > -			atomic_inc(&perf_swevent_enabled[event_id]);
> > -			event->destroy = sw_perf_event_destroy;
> > +	synchronize_srcu(&pmus_srcu);
> > +}
> > +
> > +struct pmu *perf_init_event(struct perf_event *event)
> > +{
> > +	struct pmu *pmu = NULL;
> > +	int idx;
> > +
> > +	idx = srcu_read_lock(&pmus_srcu);
> > +	list_for_each_entry_rcu(pmu, &pmus, entry) {
> > +		int ret = pmu->event_init(event);
> > +		if (!ret)
> > +			break;
> > +		if (ret != -ENOENT) {
> > +			pmu = ERR_PTR(ret);
> > +			break;
> >  		}
> > -		pmu = &perf_ops_generic;
> > -		break;
> >  	}
> > +	srcu_read_unlock(&pmus_srcu, idx);
> >  
> >  	return pmu;
> >  }
> 
> 
> 
> I'm still not sure why all this locking is needed. We don't even
> support pmus in modules.
> 
> Is there something coming soon that will use this?
> I remember something about KVM.

Possibly, not sure. We could put the unregister thing in a later patch,
but I wanted to make sure it was sanely possibly and its only a few
lines of code.

> And who will have to use srcu? It seems the event fastpath would
> be concerned, right? Will that have an impact on the performances?

Only event creation like above (perf_init_event) will have to use SRCU,
so not really a hot path.

> > @@ -5743,15 +5742,15 @@ perf_cpu_notify(struct notifier_block *s
> >  {
> >  	unsigned int cpu = (long)hcpu;
> >  
> > -	switch (action) {
> > +	switch (action & ~CPU_TASKS_FROZEN) {
> >  
> >  	case CPU_UP_PREPARE:
> > -	case CPU_UP_PREPARE_FROZEN:
> > +	case CPU_DOWN_FAILED:
> >  		perf_event_init_cpu(cpu);
> >  		break;
> >  
> > +	case CPU_UP_CANCELED:
> >  	case CPU_DOWN_PREPARE:
> > -	case CPU_DOWN_PREPARE_FROZEN:
> >  		perf_event_exit_cpu(cpu);
> >  		break;
> 
> 
> 
> That doesn't seem to be related to this patch initial topic.

Ah indeed, that needs to go live in its own patch.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 05/11] perf: register pmu implementations
  2010-06-28 15:16     ` Peter Zijlstra
@ 2010-06-28 15:29       ` Frederic Weisbecker
  0 siblings, 0 replies; 56+ messages in thread
From: Frederic Weisbecker @ 2010-06-28 15:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Mon, Jun 28, 2010 at 05:16:37PM +0200, Peter Zijlstra wrote:
> > >  	}
> > > +	srcu_read_unlock(&pmus_srcu, idx);
> > >  
> > >  	return pmu;
> > >  }
> > 
> > 
> > 
> > I'm still not sure why all this locking is needed. We don't even
> > support pmus in modules.
> > 
> > Is there something coming soon that will use this?
> > I remember something about KVM.
> 
> Possibly, not sure. We could put the unregister thing in a later patch,
> but I wanted to make sure it was sanely possibly and its only a few
> lines of code.



Ok.

 
> > And who will have to use srcu? It seems the event fastpath would
> > be concerned, right? Will that have an impact on the performances?
> 
> Only event creation like above (perf_init_event) will have to use SRCU,
> so not really a hot path.



Ah I see. The event itself is synchronized against the fast-path using rcu.
And then pmus themselves would be synchronized against events. Right
that makes sense.

But then why RCU (or SRCU, whatever)? I mean parent event creation is
quite rare. And child events won't need to be synchronized as far as the parent
keeps a reference to the pmu.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 09/11] perf: Default PMU ops
  2010-06-24 14:28 ` [RFC][PATCH 09/11] perf: Default PMU ops Peter Zijlstra
@ 2010-06-29 14:49   ` Frederic Weisbecker
  2010-06-29 14:57     ` Peter Zijlstra
  2010-06-29 14:58   ` Frederic Weisbecker
  1 sibling, 1 reply; 56+ messages in thread
From: Frederic Weisbecker @ 2010-06-29 14:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Thu, Jun 24, 2010 at 04:28:13PM +0200, Peter Zijlstra wrote:
> Provide default implementations for the pmu txn methods, this allows
> us to remove some conditional code.
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  kernel/perf_event.c |   48 ++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 36 insertions(+), 12 deletions(-)
> 
> Index: linux-2.6/kernel/perf_event.c
> ===================================================================
> --- linux-2.6.orig/kernel/perf_event.c
> +++ linux-2.6/kernel/perf_event.c
> @@ -656,21 +656,14 @@ group_sched_in(struct perf_event *group_
>  {
>  	struct perf_event *event, *partial_group = NULL;
>  	struct pmu *pmu = group_event->pmu;
> -	bool txn = false;
>  
>  	if (group_event->state == PERF_EVENT_STATE_OFF)
>  		return 0;
>  
> -	/* Check if group transaction availabe */
> -	if (pmu->start_txn)
> -		txn = true;
> -
> -	if (txn)
> -		pmu->start_txn(pmu);
> +	pmu->start_txn(pmu);
>  
>  	if (event_sched_in(group_event, cpuctx, ctx)) {
> -		if (txn)
> -			pmu->cancel_txn(pmu);
> +		pmu->cancel_txn(pmu);
>  		return -EAGAIN;
>  	}
>  
> @@ -684,7 +677,7 @@ group_sched_in(struct perf_event *group_
>  		}
>  	}
>  
> -	if (!txn || !pmu->commit_txn(pmu))
> +	if (!pmu->commit_txn(pmu))
>  		return 0;
>  
>  group_error:
> @@ -699,8 +692,7 @@ group_error:
>  	}
>  	event_sched_out(group_event, cpuctx, ctx);
>  
> -	if (txn)
> -		pmu->cancel_txn(pmu);
> +	pmu->cancel_txn(pmu);
>  
>  	return -EAGAIN;
>  }
> @@ -4755,6 +4747,26 @@ static struct list_head pmus;
>  static DEFINE_MUTEX(pmus_lock);
>  static struct srcu_struct pmus_srcu;
>  
> +static void perf_pmu_nop(struct pmu *pmu)
> +{
> +}
> +
> +static void perf_pmu_start_txn(struct pmu *pmu)
> +{
> +	perf_pmu_disable(pmu);
> +}
> +
> +static int perf_pmu_commit_txn(struct pmu *pmu)
> +{
> +	perf_pmu_enable(pmu);
> +	return 0;
> +}
> +
> +static void perf_pmu_cancel_txn(struct pmu *pmu)
> +{
> +	perf_pmu_enable(pmu);


Did you mean disable here?


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 09/11] perf: Default PMU ops
  2010-06-29 14:49   ` Frederic Weisbecker
@ 2010-06-29 14:57     ` Peter Zijlstra
  2010-06-29 15:00       ` Frederic Weisbecker
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-29 14:57 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Tue, 2010-06-29 at 16:49 +0200, Frederic Weisbecker wrote:
> > +static void perf_pmu_start_txn(struct pmu *pmu)
> > +{
> > +     perf_pmu_disable(pmu);
> > +}
> > +
> > +static int perf_pmu_commit_txn(struct pmu *pmu)
> > +{
> > +     perf_pmu_enable(pmu);
> > +     return 0;
> > +}
> > +
> > +static void perf_pmu_cancel_txn(struct pmu *pmu)
> > +{
> > +     perf_pmu_enable(pmu);
> 
> 
> Did you mean disable here? 

Nope, start_txn() will disable the pmu, both commit_txn and cancel_txn
can be used to close the transaction and should thus enable the pmu
again.



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 09/11] perf: Default PMU ops
  2010-06-24 14:28 ` [RFC][PATCH 09/11] perf: Default PMU ops Peter Zijlstra
  2010-06-29 14:49   ` Frederic Weisbecker
@ 2010-06-29 14:58   ` Frederic Weisbecker
  2010-06-29 14:59     ` Peter Zijlstra
  1 sibling, 1 reply; 56+ messages in thread
From: Frederic Weisbecker @ 2010-06-29 14:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Thu, Jun 24, 2010 at 04:28:13PM +0200, Peter Zijlstra wrote:
> Provide default implementations for the pmu txn methods, this allows
> us to remove some conditional code.
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  kernel/perf_event.c |   48 ++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 36 insertions(+), 12 deletions(-)
> 
> Index: linux-2.6/kernel/perf_event.c
> ===================================================================
> --- linux-2.6.orig/kernel/perf_event.c
> +++ linux-2.6/kernel/perf_event.c
> @@ -656,21 +656,14 @@ group_sched_in(struct perf_event *group_
>  {
>  	struct perf_event *event, *partial_group = NULL;
>  	struct pmu *pmu = group_event->pmu;
> -	bool txn = false;
>  
>  	if (group_event->state == PERF_EVENT_STATE_OFF)
>  		return 0;
>  
> -	/* Check if group transaction availabe */
> -	if (pmu->start_txn)
> -		txn = true;
> -
> -	if (txn)
> -		pmu->start_txn(pmu);
> +	pmu->start_txn(pmu);
>  
>  	if (event_sched_in(group_event, cpuctx, ctx)) {
> -		if (txn)
> -			pmu->cancel_txn(pmu);
> +		pmu->cancel_txn(pmu);
>  		return -EAGAIN;
>  	}
>  
> @@ -684,7 +677,7 @@ group_sched_in(struct perf_event *group_
>  		}
>  	}
>  
> -	if (!txn || !pmu->commit_txn(pmu))
> +	if (!pmu->commit_txn(pmu))
>  		return 0;
>  
>  group_error:
> @@ -699,8 +692,7 @@ group_error:
>  	}
>  	event_sched_out(group_event, cpuctx, ctx);
>  
> -	if (txn)
> -		pmu->cancel_txn(pmu);
> +	pmu->cancel_txn(pmu);
>  
>  	return -EAGAIN;
>  }
> @@ -4755,6 +4747,26 @@ static struct list_head pmus;
>  static DEFINE_MUTEX(pmus_lock);
>  static struct srcu_struct pmus_srcu;
>  
> +static void perf_pmu_nop(struct pmu *pmu)
> +{
> +}
> +
> +static void perf_pmu_start_txn(struct pmu *pmu)
> +{
> +	perf_pmu_disable(pmu);
> +}
> +
> +static int perf_pmu_commit_txn(struct pmu *pmu)
> +{
> +	perf_pmu_enable(pmu);
> +	return 0;
> +}
> +
> +static void perf_pmu_cancel_txn(struct pmu *pmu)
> +{
> +	perf_pmu_enable(pmu);
> +}


So why do you need perf_pmu_*able wrappers now that you brings stubs
if none is provided?

Actually, one problem is that it makes calling two indirect nops
for software events.

Should the txn things really map to the enable/disable ops is the
off-case? Probably better let pmu implementations deal with that.
If they didn't provide txn implementations, it means they don't need it,
hence it should directly map to a nop.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 09/11] perf: Default PMU ops
  2010-06-29 14:58   ` Frederic Weisbecker
@ 2010-06-29 14:59     ` Peter Zijlstra
  2010-06-29 15:03       ` Frederic Weisbecker
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-29 14:59 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Tue, 2010-06-29 at 16:58 +0200, Frederic Weisbecker wrote:
> On Thu, Jun 24, 2010 at 04:28:13PM +0200, Peter Zijlstra wrote:
> > Provide default implementations for the pmu txn methods, this allows
> > us to remove some conditional code.
> > 
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > ---
> >  kernel/perf_event.c |   48 ++++++++++++++++++++++++++++++++++++------------
> >  1 file changed, 36 insertions(+), 12 deletions(-)
> > 
> > Index: linux-2.6/kernel/perf_event.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/perf_event.c
> > +++ linux-2.6/kernel/perf_event.c
> > @@ -656,21 +656,14 @@ group_sched_in(struct perf_event *group_
> >  {
> >  	struct perf_event *event, *partial_group = NULL;
> >  	struct pmu *pmu = group_event->pmu;
> > -	bool txn = false;
> >  
> >  	if (group_event->state == PERF_EVENT_STATE_OFF)
> >  		return 0;
> >  
> > -	/* Check if group transaction availabe */
> > -	if (pmu->start_txn)
> > -		txn = true;
> > -
> > -	if (txn)
> > -		pmu->start_txn(pmu);
> > +	pmu->start_txn(pmu);
> >  
> >  	if (event_sched_in(group_event, cpuctx, ctx)) {
> > -		if (txn)
> > -			pmu->cancel_txn(pmu);
> > +		pmu->cancel_txn(pmu);
> >  		return -EAGAIN;
> >  	}
> >  
> > @@ -684,7 +677,7 @@ group_sched_in(struct perf_event *group_
> >  		}
> >  	}
> >  
> > -	if (!txn || !pmu->commit_txn(pmu))
> > +	if (!pmu->commit_txn(pmu))
> >  		return 0;
> >  
> >  group_error:
> > @@ -699,8 +692,7 @@ group_error:
> >  	}
> >  	event_sched_out(group_event, cpuctx, ctx);
> >  
> > -	if (txn)
> > -		pmu->cancel_txn(pmu);
> > +	pmu->cancel_txn(pmu);
> >  
> >  	return -EAGAIN;
> >  }
> > @@ -4755,6 +4747,26 @@ static struct list_head pmus;
> >  static DEFINE_MUTEX(pmus_lock);
> >  static struct srcu_struct pmus_srcu;
> >  
> > +static void perf_pmu_nop(struct pmu *pmu)
> > +{
> > +}
> > +
> > +static void perf_pmu_start_txn(struct pmu *pmu)
> > +{
> > +	perf_pmu_disable(pmu);
> > +}
> > +
> > +static int perf_pmu_commit_txn(struct pmu *pmu)
> > +{
> > +	perf_pmu_enable(pmu);
> > +	return 0;
> > +}
> > +
> > +static void perf_pmu_cancel_txn(struct pmu *pmu)
> > +{
> > +	perf_pmu_enable(pmu);
> > +}
> 
> 
> So why do you need perf_pmu_*able wrappers now that you brings stubs
> if none is provided?
> 
> Actually, one problem is that it makes calling two indirect nops
> for software events.
> 
> Should the txn things really map to the enable/disable ops is the
> off-case? Probably better let pmu implementations deal with that.
> If they didn't provide txn implementations, it means they don't need it,
> hence it should directly map to a nop.
> 

You mean, if (!pmu->start_txn && pmu->pmu_enable) { /* install defaults
*/ } ?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 09/11] perf: Default PMU ops
  2010-06-29 14:57     ` Peter Zijlstra
@ 2010-06-29 15:00       ` Frederic Weisbecker
  0 siblings, 0 replies; 56+ messages in thread
From: Frederic Weisbecker @ 2010-06-29 15:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Tue, Jun 29, 2010 at 04:57:53PM +0200, Peter Zijlstra wrote:
> On Tue, 2010-06-29 at 16:49 +0200, Frederic Weisbecker wrote:
> > > +static void perf_pmu_start_txn(struct pmu *pmu)
> > > +{
> > > +     perf_pmu_disable(pmu);
> > > +}
> > > +
> > > +static int perf_pmu_commit_txn(struct pmu *pmu)
> > > +{
> > > +     perf_pmu_enable(pmu);
> > > +     return 0;
> > > +}
> > > +
> > > +static void perf_pmu_cancel_txn(struct pmu *pmu)
> > > +{
> > > +     perf_pmu_enable(pmu);
> > 
> > 
> > Did you mean disable here? 
> 
> Nope, start_txn() will disable the pmu, both commit_txn and cancel_txn
> can be used to close the transaction and should thus enable the pmu
> again.


Looks like I read that while walking on my arms, sorry.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 09/11] perf: Default PMU ops
  2010-06-29 14:59     ` Peter Zijlstra
@ 2010-06-29 15:03       ` Frederic Weisbecker
  2010-06-29 16:34         ` Peter Zijlstra
  0 siblings, 1 reply; 56+ messages in thread
From: Frederic Weisbecker @ 2010-06-29 15:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Tue, Jun 29, 2010 at 04:59:51PM +0200, Peter Zijlstra wrote:
> On Tue, 2010-06-29 at 16:58 +0200, Frederic Weisbecker wrote:
> > On Thu, Jun 24, 2010 at 04:28:13PM +0200, Peter Zijlstra wrote:
> > > Provide default implementations for the pmu txn methods, this allows
> > > us to remove some conditional code.
> > > 
> > > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > ---
> > >  kernel/perf_event.c |   48 ++++++++++++++++++++++++++++++++++++------------
> > >  1 file changed, 36 insertions(+), 12 deletions(-)
> > > 
> > > Index: linux-2.6/kernel/perf_event.c
> > > ===================================================================
> > > --- linux-2.6.orig/kernel/perf_event.c
> > > +++ linux-2.6/kernel/perf_event.c
> > > @@ -656,21 +656,14 @@ group_sched_in(struct perf_event *group_
> > >  {
> > >  	struct perf_event *event, *partial_group = NULL;
> > >  	struct pmu *pmu = group_event->pmu;
> > > -	bool txn = false;
> > >  
> > >  	if (group_event->state == PERF_EVENT_STATE_OFF)
> > >  		return 0;
> > >  
> > > -	/* Check if group transaction availabe */
> > > -	if (pmu->start_txn)
> > > -		txn = true;
> > > -
> > > -	if (txn)
> > > -		pmu->start_txn(pmu);
> > > +	pmu->start_txn(pmu);
> > >  
> > >  	if (event_sched_in(group_event, cpuctx, ctx)) {
> > > -		if (txn)
> > > -			pmu->cancel_txn(pmu);
> > > +		pmu->cancel_txn(pmu);
> > >  		return -EAGAIN;
> > >  	}
> > >  
> > > @@ -684,7 +677,7 @@ group_sched_in(struct perf_event *group_
> > >  		}
> > >  	}
> > >  
> > > -	if (!txn || !pmu->commit_txn(pmu))
> > > +	if (!pmu->commit_txn(pmu))
> > >  		return 0;
> > >  
> > >  group_error:
> > > @@ -699,8 +692,7 @@ group_error:
> > >  	}
> > >  	event_sched_out(group_event, cpuctx, ctx);
> > >  
> > > -	if (txn)
> > > -		pmu->cancel_txn(pmu);
> > > +	pmu->cancel_txn(pmu);
> > >  
> > >  	return -EAGAIN;
> > >  }
> > > @@ -4755,6 +4747,26 @@ static struct list_head pmus;
> > >  static DEFINE_MUTEX(pmus_lock);
> > >  static struct srcu_struct pmus_srcu;
> > >  
> > > +static void perf_pmu_nop(struct pmu *pmu)
> > > +{
> > > +}
> > > +
> > > +static void perf_pmu_start_txn(struct pmu *pmu)
> > > +{
> > > +	perf_pmu_disable(pmu);
> > > +}
> > > +
> > > +static int perf_pmu_commit_txn(struct pmu *pmu)
> > > +{
> > > +	perf_pmu_enable(pmu);
> > > +	return 0;
> > > +}
> > > +
> > > +static void perf_pmu_cancel_txn(struct pmu *pmu)
> > > +{
> > > +	perf_pmu_enable(pmu);
> > > +}
> > 
> > 
> > So why do you need perf_pmu_*able wrappers now that you brings stubs
> > if none is provided?
> > 
> > Actually, one problem is that it makes calling two indirect nops
> > for software events.
> > 
> > Should the txn things really map to the enable/disable ops is the
> > off-case? Probably better let pmu implementations deal with that.
> > If they didn't provide txn implementations, it means they don't need it,
> > hence it should directly map to a nop.
> > 
> 
> You mean, if (!pmu->start_txn && pmu->pmu_enable) { /* install defaults
> */ } ?


Not really. pmu_*able and txn are there for different purposes.
A pmu implementation may want to provide enable/disable things
but not require any txn. Or one may just not need any of those,
like software events.

It should simply map to a nop if nothing is provided.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 10/11] perf: Shrink hw_perf_event
  2010-06-24 14:28 ` [RFC][PATCH 10/11] perf: Shrink hw_perf_event Peter Zijlstra
@ 2010-06-29 15:06   ` Frederic Weisbecker
  0 siblings, 0 replies; 56+ messages in thread
From: Frederic Weisbecker @ 2010-06-29 15:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Thu, Jun 24, 2010 at 04:28:14PM +0200, Peter Zijlstra wrote:
> Use hw_perf_event::period_left instead of hw_perf_event::remaning and
> win back 8 bytes.
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>


Acked-by: Frederic Weisbecker <fweisbec@gmail.com>



> ---
>  include/linux/perf_event.h |    1 -
>  kernel/perf_event.c        |   13 ++++++-------
>  2 files changed, 6 insertions(+), 8 deletions(-)
> 
> Index: linux-2.6/include/linux/perf_event.h
> ===================================================================
> --- linux-2.6.orig/include/linux/perf_event.h
> +++ linux-2.6/include/linux/perf_event.h
> @@ -529,7 +529,6 @@ struct hw_perf_event {
>  			int		last_cpu;
>  		};
>  		struct { /* software */
> -			s64		remaining;
>  			struct hrtimer	hrtimer;
>  		};
>  #ifdef CONFIG_HAVE_HW_BREAKPOINT
> Index: linux-2.6/kernel/perf_event.c
> ===================================================================
> --- linux-2.6.orig/kernel/perf_event.c
> +++ linux-2.6/kernel/perf_event.c
> @@ -4590,14 +4590,13 @@ static void perf_swevent_start_hrtimer(s
>  	hrtimer_init(&hwc->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>  	hwc->hrtimer.function = perf_swevent_hrtimer;
>  	if (hwc->sample_period) {
> -		u64 period;
> +		s64 period = local64_read(&hwc->period_left);
>  
> -		if (hwc->remaining) {
> -			if (hwc->remaining < 0)
> +		if (period) {
> +			if (period < 0)
>  				period = 10000;
> -			else
> -				period = hwc->remaining;
> -			hwc->remaining = 0;
> +
> +			local64_set(&hwc->period_left, 0);
>  		} else {
>  			period = max_t(u64, 10000, hwc->sample_period);
>  		}
> @@ -4613,7 +4612,7 @@ static void perf_swevent_cancel_hrtimer(
>  
>  	if (hwc->sample_period) {
>  		ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
> -		hwc->remaining = ktime_to_ns(remaining);
> +		local64_set(&hwc->period_left, ktime_to_ns(remaining));
>  
>  		hrtimer_cancel(&hwc->hrtimer);
>  	}
> 
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 11/11] perf: Rework the PMU methods
  2010-06-24 14:28 ` [RFC][PATCH 11/11] perf: Rework the PMU methods Peter Zijlstra
@ 2010-06-29 15:37   ` Frederic Weisbecker
  2010-06-29 16:40     ` Peter Zijlstra
  0 siblings, 1 reply; 56+ messages in thread
From: Frederic Weisbecker @ 2010-06-29 15:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Thu, Jun 24, 2010 at 04:28:15PM +0200, Peter Zijlstra wrote:
> Replace pmu::{enable,disable,start,stop,unthrottle} with
> pmu::{add,del,start,stop}, all of which take a flags argument.
> 
> The new interface extends the capability to stop a counter while
> keeping it scheduled on the PMU. We replace the throttled state with
> the generic stopped state.
> 
> This also allows us to efficiently stop/start counters over certain
> code paths (like IRQ handlers).
> 
> It also allows scheduling a counter without it starting, allowing for
> a generic frozen state (useful for rotating stopped counters).
> 
> The stopped state is implemented in two different ways, depending on
> how the architecture implemented the throttled state:
> 
>  1) We disable the counter:
>     a) the pmu has per-counter enable bits, we flip that
>     b) we program a NOP event, preserving the counter state
> 
>  2) We store the counter state and ignore all read/overflow events
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>


Acked-by: Frederic Weisbecker <fweisbec@gmail.com>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 09/11] perf: Default PMU ops
  2010-06-29 15:03       ` Frederic Weisbecker
@ 2010-06-29 16:34         ` Peter Zijlstra
  2010-06-29 18:07           ` Frederic Weisbecker
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-29 16:34 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Tue, 2010-06-29 at 17:03 +0200, Frederic Weisbecker wrote:
> > You mean, if (!pmu->start_txn && pmu->pmu_enable) { /* install defaults
> > */ } ?
> 
> 
> Not really. pmu_*able and txn are there for different purposes.
> A pmu implementation may want to provide enable/disable things
> but not require any txn. Or one may just not need any of those,
> like software events.
> 
> It should simply map to a nop if nothing is provided. 

Thing is, using at least the pmu_enable/disable fallback when no txn
methods are provided can save bunch of hardware writes. So this trivial
fallback makes sense.



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 11/11] perf: Rework the PMU methods
  2010-06-29 15:37   ` Frederic Weisbecker
@ 2010-06-29 16:40     ` Peter Zijlstra
  2010-06-29 18:09       ` Frederic Weisbecker
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-29 16:40 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Tue, 2010-06-29 at 17:37 +0200, Frederic Weisbecker wrote:

> Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

I know it breaks at least x86 (although subtle, takes a few perf records
to daze and confuse), arm and sh.

I need to get back to this patch-set and sort that :-)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 09/11] perf: Default PMU ops
  2010-06-29 16:34         ` Peter Zijlstra
@ 2010-06-29 18:07           ` Frederic Weisbecker
  2010-06-29 18:09             ` Peter Zijlstra
  0 siblings, 1 reply; 56+ messages in thread
From: Frederic Weisbecker @ 2010-06-29 18:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Tue, Jun 29, 2010 at 06:34:10PM +0200, Peter Zijlstra wrote:
> On Tue, 2010-06-29 at 17:03 +0200, Frederic Weisbecker wrote:
> > > You mean, if (!pmu->start_txn && pmu->pmu_enable) { /* install defaults
> > > */ } ?
> > 
> > 
> > Not really. pmu_*able and txn are there for different purposes.
> > A pmu implementation may want to provide enable/disable things
> > but not require any txn. Or one may just not need any of those,
> > like software events.
> > 
> > It should simply map to a nop if nothing is provided. 
> 
> Thing is, using at least the pmu_enable/disable fallback when no txn
> methods are provided can save bunch of hardware writes. So this trivial
> fallback makes sense.


In this case, archs should handle that by implementing their txn.

neglected pmu implementations shouldn't impact software pmus.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 09/11] perf: Default PMU ops
  2010-06-29 18:07           ` Frederic Weisbecker
@ 2010-06-29 18:09             ` Peter Zijlstra
  2010-06-29 18:11               ` Frederic Weisbecker
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-29 18:09 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Tue, 2010-06-29 at 20:07 +0200, Frederic Weisbecker wrote:
> On Tue, Jun 29, 2010 at 06:34:10PM +0200, Peter Zijlstra wrote:
> > On Tue, 2010-06-29 at 17:03 +0200, Frederic Weisbecker wrote:
> > > > You mean, if (!pmu->start_txn && pmu->pmu_enable) { /* install defaults
> > > > */ } ?

> neglected pmu implementations shouldn't impact software pmus.

With the above and not providing ->pmu_enable you get that.



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 11/11] perf: Rework the PMU methods
  2010-06-29 16:40     ` Peter Zijlstra
@ 2010-06-29 18:09       ` Frederic Weisbecker
  0 siblings, 0 replies; 56+ messages in thread
From: Frederic Weisbecker @ 2010-06-29 18:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Tue, Jun 29, 2010 at 06:40:01PM +0200, Peter Zijlstra wrote:
> On Tue, 2010-06-29 at 17:37 +0200, Frederic Weisbecker wrote:
> 
> > Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
> 
> I know it breaks at least x86 (although subtle, takes a few perf records
> to daze and confuse), arm and sh.
> 
> I need to get back to this patch-set and sort that :-)


Ok :)

But at least the whole idea looks good. Concerning the software parts
I will make a second pass to optimize from my exclusion patchset, but
other than that, it sets a nice background for it.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 09/11] perf: Default PMU ops
  2010-06-29 18:09             ` Peter Zijlstra
@ 2010-06-29 18:11               ` Frederic Weisbecker
  2010-06-29 18:19                 ` Peter Zijlstra
  0 siblings, 1 reply; 56+ messages in thread
From: Frederic Weisbecker @ 2010-06-29 18:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Tue, Jun 29, 2010 at 08:09:10PM +0200, Peter Zijlstra wrote:
> On Tue, 2010-06-29 at 20:07 +0200, Frederic Weisbecker wrote:
> > On Tue, Jun 29, 2010 at 06:34:10PM +0200, Peter Zijlstra wrote:
> > > On Tue, 2010-06-29 at 17:03 +0200, Frederic Weisbecker wrote:
> > > > > You mean, if (!pmu->start_txn && pmu->pmu_enable) { /* install defaults
> > > > > */ } ?
> 
> > neglected pmu implementations shouldn't impact software pmus.
> 
> With the above and not providing ->pmu_enable you get that.


But we don't need to provide a pmu_enable for software events, right?


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 09/11] perf: Default PMU ops
  2010-06-29 18:11               ` Frederic Weisbecker
@ 2010-06-29 18:19                 ` Peter Zijlstra
  2010-06-29 18:21                   ` Frederic Weisbecker
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-29 18:19 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Tue, 2010-06-29 at 20:11 +0200, Frederic Weisbecker wrote:
> On Tue, Jun 29, 2010 at 08:09:10PM +0200, Peter Zijlstra wrote:
> > On Tue, 2010-06-29 at 20:07 +0200, Frederic Weisbecker wrote:
> > > On Tue, Jun 29, 2010 at 06:34:10PM +0200, Peter Zijlstra wrote:
> > > > On Tue, 2010-06-29 at 17:03 +0200, Frederic Weisbecker wrote:
> > > > > > You mean, if (!pmu->start_txn && pmu->pmu_enable) { /* install defaults
> > > > > > */ } ?
> > 
> > > neglected pmu implementations shouldn't impact software pmus.
> > 
> > With the above and not providing ->pmu_enable you get that.
> 
> 
> But we don't need to provide a pmu_enable for software events, right?

My point exactly.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 09/11] perf: Default PMU ops
  2010-06-29 18:19                 ` Peter Zijlstra
@ 2010-06-29 18:21                   ` Frederic Weisbecker
  0 siblings, 0 replies; 56+ messages in thread
From: Frederic Weisbecker @ 2010-06-29 18:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Cyrill Gorcunov, Lin Ming, Yanmin, Deng-Cheng Zhu,
	David Miller, linux-kernel

On Tue, Jun 29, 2010 at 08:19:03PM +0200, Peter Zijlstra wrote:
> On Tue, 2010-06-29 at 20:11 +0200, Frederic Weisbecker wrote:
> > On Tue, Jun 29, 2010 at 08:09:10PM +0200, Peter Zijlstra wrote:
> > > On Tue, 2010-06-29 at 20:07 +0200, Frederic Weisbecker wrote:
> > > > On Tue, Jun 29, 2010 at 06:34:10PM +0200, Peter Zijlstra wrote:
> > > > > On Tue, 2010-06-29 at 17:03 +0200, Frederic Weisbecker wrote:
> > > > > > > You mean, if (!pmu->start_txn && pmu->pmu_enable) { /* install defaults
> > > > > > > */ } ?
> > > 
> > > > neglected pmu implementations shouldn't impact software pmus.
> > > 
> > > With the above and not providing ->pmu_enable you get that.
> > 
> > 
> > But we don't need to provide a pmu_enable for software events, right?
> 
> My point exactly.


Which I understand late. Agreed.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/11] perf pmu interface -v2
  2010-06-28 15:13   ` Peter Zijlstra
@ 2010-06-30 17:19     ` Corey Ashford
  2010-06-30 18:11       ` Peter Zijlstra
  0 siblings, 1 reply; 56+ messages in thread
From: Corey Ashford @ 2010-06-30 17:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller, linux-kernel

On 06/28/2010 08:13 AM, Peter Zijlstra wrote:
> On Sat, 2010-06-26 at 09:22 -0700, Corey Ashford wrote:
> As for the "hardware write batching", can you describe a bit more about
>> what you have in mind there?  I wonder if this might have something to
>> do with accounting for PMU hardware which is slow to access, for
>> example, via I2C via an internal bridge.
>
> Right, so the write batching is basically delaying writing out the PMU
> state to hardware until pmu::pmu_enable() time. It avoids having to
> re-program the hardware when, due to a scheduling constraint, we have to
> move counters around.
>
> So say, we context switch a task, and remove the old events and add the
> new ones under a single pmu::pmu_disable()/::pmu_enable() pair, we will
> only hit the hardware twice (once to disable, once to enable), instead
> of for each individual ::del()/::add().
>
> For this to work we need to have an association between a context and a
> pmu, otherwise its very hard to know what pmu to disable/enable; the
> alternative is all of them which isn't very attractive.
>
> Then again, it doesn't make sense to have task-counters on an I2C
> attached PMU simply because writing to the PMU could cause context
> switches.

Thanks for your reply.

In our case, writing to some of the nest PMUs' control registers is done 
via an internal bridge.  We write to a specific memory address and an 
internal MMIO-to-SCOM bridge (SCOM is similar to I2C) translates it to 
serial and sends it over the internal serial bus.  The address we write 
to is based upon the control register's serial bus address, plus an 
offset from the base of MMIO-to-SCOM bridge.  The same process works for 
reads.

While it does not cause a context switch because there are no IO drivers 
to go through, it will take several thousand CPU cycles to complete, 
which by the same token, still makes them inappropriate for 
task-counters (not to mention that the nest units operate asynchronously 
from the CPU).

However, there still are concerns relative to writing these control 
registers from an interrupt handler because of the latency that will be 
incurred, however slow we choose to do the event rotation.  So at least 
for the Wire-Speed processor, we may need a worker thread of some sort 
to hand off the work to.

Our current code, based on linux 2.6.31 (soon to be 2.6.32) doesn't use 
worker threads; we are just taking the latency hit for now.

- Corey

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/11] perf pmu interface -v2
  2010-06-30 17:19     ` Corey Ashford
@ 2010-06-30 18:11       ` Peter Zijlstra
  0 siblings, 0 replies; 56+ messages in thread
From: Peter Zijlstra @ 2010-06-30 18:11 UTC (permalink / raw)
  To: Corey Ashford
  Cc: paulus, stephane eranian, Robert Richter, Will Deacon,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller, linux-kernel

On Wed, 2010-06-30 at 10:19 -0700, Corey Ashford wrote:
> However, there still are concerns relative to writing these control 
> registers from an interrupt handler because of the latency that will be 
> incurred, however slow we choose to do the event rotation.  So at least 
> for the Wire-Speed processor, we may need a worker thread of some sort 
> to hand off the work to. 

Right, once we have per-pmu contexts we can look at having means to
over-ride the means of rotation, such that the pmu can optionally drive
it itself instead of through the common tick.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/11] perf pmu interface -v2
  2010-06-25 14:50       ` Peter Zijlstra
@ 2010-07-01 14:36         ` Peter Zijlstra
  2010-07-01 15:02           ` Peter Zijlstra
  2010-07-02 12:55           ` Will Deacon
  0 siblings, 2 replies; 56+ messages in thread
From: Peter Zijlstra @ 2010-07-01 14:36 UTC (permalink / raw)
  To: Will Deacon
  Cc: paulus, stephane eranian, Robert Richter, Paul Mundt,
	Frederic Weisbecker, Cyrill Gorcunov, Lin Ming, Yanmin,
	Deng-Cheng Zhu, David Miller, linux-kernel

On Fri, 2010-06-25 at 16:50 +0200, Peter Zijlstra wrote:

> Not exactly sure how I could have messed up the ARM architecture code to
> make this happen though... will have a peek. 

I did find a bug in there, not sure it could have been responsible for
this but who knows...

Pushed out a new git tree with the below delta folded in.

---
Index: linux-2.6/arch/arm/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/arch/arm/kernel/perf_event.c
+++ linux-2.6/arch/arm/kernel/perf_event.c
@@ -221,27 +221,6 @@ again:
 }
 
 static void
-armpmu_del(struct perf_event *event, int flags)
-{
-	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-	struct hw_perf_event *hwc = &event->hw;
-	int idx = hwc->idx;
-
-	WARN_ON(idx < 0);
-
-	clear_bit(idx, cpuc->active_mask);
-	armpmu->disable(hwc, idx);
-
-	barrier();
-
-	armpmu_event_update(event, hwc, idx);
-	cpuc->events[idx] = NULL;
-	clear_bit(idx, cpuc->used_mask);
-
-	perf_event_update_userpage(event);
-}
-
-static void
 armpmu_read(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
@@ -267,6 +246,8 @@ armpmu_stop(struct perf_event *event, in
 	 */
 	if (!(hwc->state & PERF_HES_STOPPED)) {
 		armpmu->disable(hwc, hwc->idx);
+		barrier(); /* why? */
+		armpmu_event_update(event, hwc, hwc->idx);
 		hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
 	}
 }
@@ -289,7 +270,7 @@ armpmu_start(struct perf_event *event, i
 	hwc->state = 0;
 	/*
 	 * Set the period again. Some counters can't be stopped, so when we
-	 * were throttled we simply disabled the IRQ source and the counter
+	 * were stopped we simply disabled the IRQ source and the counter
 	 * may have been left counting. If we don't do this step then we may
 	 * get an interrupt too soon or *way* too late if the overflow has
 	 * happened since disabling.
@@ -298,6 +279,23 @@ armpmu_start(struct perf_event *event, i
 	armpmu->enable(hwc, hwc->idx);
 }
 
+static void
+armpmu_del(struct perf_event *event, int flags)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	WARN_ON(idx < 0);
+
+	clear_bit(idx, cpuc->active_mask);
+	armpmu_stop(event, PERF_EF_UPDATE);
+	cpuc->events[idx] = NULL;
+	clear_bit(idx, cpuc->used_mask);
+
+	perf_event_update_userpage(event);
+}
+
 static int
 armpmu_add(struct perf_event *event, int flags)
 {
@@ -1075,7 +1073,7 @@ armv6pmu_handle_irq(int irq_num,
 			continue;
 
 		if (perf_event_overflow(event, 0, &data, regs))
-			armpmu_stop(event, 0);
+			armpmu->disable(hwc, idx);
 	}
 
 	/*


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/11] perf pmu interface -v2
  2010-07-01 14:36         ` Peter Zijlstra
@ 2010-07-01 15:02           ` Peter Zijlstra
  2010-07-01 15:31             ` MattFleming
  2010-07-02 12:55           ` Will Deacon
  1 sibling, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2010-07-01 15:02 UTC (permalink / raw)
  To: Will Deacon
  Cc: paulus, stephane eranian, Robert Richter, Paul Mundt,
	Frederic Weisbecker, Cyrill Gorcunov, Lin Ming, Yanmin,
	Deng-Cheng Zhu, David Miller, linux-kernel, MattFleming

On Thu, 2010-07-01 at 16:36 +0200, Peter Zijlstra wrote:
> 
> Pushed out a new git tree with the below delta folded in.
> 
Said tree also cures x86 and folds the SH build fix.

Matt, you said it broke SH completely, but did you try perf stat? perf
record is not supposed to work on SH due to the hardware not having an
overflow interrupt.

Which made me think, what on SH guarantees we update the counter often
enough not to suffer from counter wrap? Would it make sense to make the
SH code hook into their arch tick handler and update the counters from
there?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/11] perf pmu interface -v2
  2010-07-01 15:02           ` Peter Zijlstra
@ 2010-07-01 15:31             ` MattFleming
  2010-07-01 15:39               ` Peter Zijlstra
  0 siblings, 1 reply; 56+ messages in thread
From: MattFleming @ 2010-07-01 15:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, paulus, stephane eranian, Robert Richter,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller, linux-kernel

On Thu, Jul 01, 2010 at 05:02:35PM +0200, Peter Zijlstra wrote:
> 
> Matt, you said it broke SH completely, but did you try perf stat? perf
> record is not supposed to work on SH due to the hardware not having an
> overflow interrupt.

perf record does work to some degree. It definitely worked before
applying your changes but not after. I admit I haven't really read the
perf event code, but Paul will know.

> Which made me think, what on SH guarantees we update the counter often
> enough not to suffer from counter wrap? Would it make sense to make the
> SH code hook into their arch tick handler and update the counters from
> there?

This was the way that the oprofile code used to work. Paul and I were
talking about using a hrtimer to sample performance counters as
opposed to piggy-backing on the tick handler.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/11] perf pmu interface -v2
  2010-07-01 15:31             ` MattFleming
@ 2010-07-01 15:39               ` Peter Zijlstra
  2010-07-01 16:04                 ` Matt Fleming
                                   ` (2 more replies)
  0 siblings, 3 replies; 56+ messages in thread
From: Peter Zijlstra @ 2010-07-01 15:39 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Will Deacon, paulus, stephane eranian, Robert Richter,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller, linux-kernel

On Thu, 2010-07-01 at 16:31 +0100, MattFleming wrote:
> On Thu, Jul 01, 2010 at 05:02:35PM +0200, Peter Zijlstra wrote:
> > 
> > Matt, you said it broke SH completely, but did you try perf stat? perf
> > record is not supposed to work on SH due to the hardware not having an
> > overflow interrupt.
> 
> perf record does work to some degree. It definitely worked before
> applying your changes but not after. I admit I haven't really read the
> perf event code, but Paul will know.

Ok, let me look at that again.

> > Which made me think, what on SH guarantees we update the counter often
> > enough not to suffer from counter wrap? Would it make sense to make the
> > SH code hook into their arch tick handler and update the counters from
> > there?
> 
> This was the way that the oprofile code used to work. Paul and I were
> talking about using a hrtimer to sample performance counters as
> opposed to piggy-backing on the tick handler.

Ah, for sampling for sure, simply group a software perf event and a
hardware perf event together and use PERF_SAMPLE_READ.

But suppose its a non sampling counter, how do you avoid overflows of
the hardware register?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/11] perf pmu interface -v2
  2010-07-01 15:39               ` Peter Zijlstra
@ 2010-07-01 16:04                 ` Matt Fleming
  2010-07-02  2:57                 ` Paul Mundt
  2010-07-08 11:13                 ` Peter Zijlstra
  2 siblings, 0 replies; 56+ messages in thread
From: Matt Fleming @ 2010-07-01 16:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, paulus, stephane eranian, Robert Richter,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller, linux-kernel

On Thu, Jul 01, 2010 at 05:39:53PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-07-01 at 16:31 +0100, MattFleming wrote:
> > On Thu, Jul 01, 2010 at 05:02:35PM +0200, Peter Zijlstra wrote:
> > >
> > > Which made me think, what on SH guarantees we update the counter often
> > > enough not to suffer from counter wrap? Would it make sense to make the
> > > SH code hook into their arch tick handler and update the counters from
> > > there?
> > 
> > This was the way that the oprofile code used to work. Paul and I were
> > talking about using a hrtimer to sample performance counters as
> > opposed to piggy-backing on the tick handler.
> 
> Ah, for sampling for sure, simply group a software perf event and a
> hardware perf event together and use PERF_SAMPLE_READ.
> 
> But suppose its a non sampling counter, how do you avoid overflows of
> the hardware register?

Hmm.. good question! I'm not entirely sure we do. As you were saying,
without using the arch tick handler, I don't think we can guarantee
avoiding counter overflows. Currently the counters are chained such
that the counters are at least 48 bits. I guess all my tests were
short enough to not cause the counters to wrap ;-)

At some point we will want to not require chaining, giving us 32
bits. So yeah, this is issue is gonna crop up then. Interestingly, the
counters on SH don't wrap when they reach they're maximum value, they
just stop incrementing.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/11] perf pmu interface -v2
  2010-07-01 15:39               ` Peter Zijlstra
  2010-07-01 16:04                 ` Matt Fleming
@ 2010-07-02  2:57                 ` Paul Mundt
  2010-07-02  9:52                   ` Peter Zijlstra
  2010-07-08 11:13                 ` Peter Zijlstra
  2 siblings, 1 reply; 56+ messages in thread
From: Paul Mundt @ 2010-07-02  2:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Matt Fleming, Will Deacon, paulus, stephane eranian,
	Robert Richter, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller, linux-kernel

On Thu, Jul 01, 2010 at 05:39:53PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-07-01 at 16:31 +0100, MattFleming wrote:
> > On Thu, Jul 01, 2010 at 05:02:35PM +0200, Peter Zijlstra wrote:
> > > 
> > > Matt, you said it broke SH completely, but did you try perf stat? perf
> > > record is not supposed to work on SH due to the hardware not having an
> > > overflow interrupt.
> > 
> > perf record does work to some degree. It definitely worked before
> > applying your changes but not after. I admit I haven't really read the
> > perf event code, but Paul will know.
> 
> Ok, let me look at that again.
> 
Any perf record functionality observed is entirely coincidental and not
by design. It was something I planned to revisit, but most of what we
have right now is only geared at the one-shot perf stat case.

> > > Which made me think, what on SH guarantees we update the counter often
> > > enough not to suffer from counter wrap? Would it make sense to make the
> > > SH code hook into their arch tick handler and update the counters from
> > > there?
> > 
> > This was the way that the oprofile code used to work. Paul and I were
> > talking about using a hrtimer to sample performance counters as
> > opposed to piggy-backing on the tick handler.
> 
> Ah, for sampling for sure, simply group a software perf event and a
> hardware perf event together and use PERF_SAMPLE_READ.
> 
> But suppose its a non sampling counter, how do you avoid overflows of
> the hardware register?

At the moment it's not an issue since we have big enough counters that
overflows don't really happen, especially if we're primarily using them
for one-shot measuring.

SH-4A style counters behave in such a fashion that we have 2 general
purpose counters, and 2 counters for measuring bus transactions. These
bus counters can optionally be disabled and used in a chained mode to
provide the general purpose counters a 64-bit counter (the actual
validity in the upper half of the chained counter varies depending on the
CPUs, but all of them can do at least 48-bits when chained).

Each counter has overflow detection and asserts an overflow bit, but
there are no exceptions associated with this, so it's something that we
would have to tie in to the tick or defer to a bottom half handler in the
non-sampling case (or simply test on every read, and accept some degree
of accuracy loss). Any perf record functionality we implement with this
sort of scheme is only going to provide ballpark figures anyways, so it's
certainly within the parameters of acceptable loss in exchange for
increased functionality.

Different CPUs also implement their overflows differently, some will roll
and resume counting, but most simply stop until the overflow bit is
cleared.

My main plan was to build on top of the multi-pmu stuff, unchain the
counters, and expose the bus counters with their own event map as a
separate PMU instance. All of the other handling logic can pretty much be
reused directly, but it does mean that we need to be a bit smarter about
overflow detection/handling. Sampling and so on is also on the TODO list,
but is as of yet still not supported.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/11] perf pmu interface -v2
  2010-07-02  2:57                 ` Paul Mundt
@ 2010-07-02  9:52                   ` Peter Zijlstra
  2010-07-05 11:14                     ` Paul Mundt
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2010-07-02  9:52 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Matt Fleming, Will Deacon, paulus, stephane eranian,
	Robert Richter, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller, linux-kernel

On Fri, 2010-07-02 at 11:57 +0900, Paul Mundt wrote:
> At the moment it's not an issue since we have big enough counters that
> overflows don't really happen, especially if we're primarily using them
> for one-shot measuring.
> 
> SH-4A style counters behave in such a fashion that we have 2 general
> purpose counters, and 2 counters for measuring bus transactions. These
> bus counters can optionally be disabled and used in a chained mode to
> provide the general purpose counters a 64-bit counter (the actual
> validity in the upper half of the chained counter varies depending on the
> CPUs, but all of them can do at least 48-bits when chained). 

Right, so I was reading some of that code and I couldn't actually find
where you keep consistency between the hardware counter value and the
stored prev_count value.

That is, suppose I'm counting, the hardware starts at 0, hwc->prev_count
= 0 and event->count = 0.

At some point, x we context switch this task away, so we ->disable(),
which disables the counter and updates the values, so at that time
hwc->prev = x and event->count = x, right?

Now suppose we schedule the task back in, so we do ->enable(), then what
happens? sh_pmu_enable() finds an unused index, (disables it for some
reason.. it should already be cleared if its not used, but I guess a few
extra hardware writes dont hurt) and calls sh4a_pmu_enable() on it.

sh4a_pmu_enable() does 3 writes:

  PPC_PMCAT -- does this clear the counter value?
  PPC_CCBR  -- writes the ->config bits
  PPC_CCBR  (adds CCBR_DUC, couldn't this be done in the 
             previous write to this reg?)

Now assuming that enable does indeed clear the hardware counter value,
shouldn't you also set hwc->prev_count to 0 again? Otherwise the next
update will see a massive jump?

Alternatively you could write the hwc->prev_count value back to the
register.

If you eventually want to drop the chained counter support I guess it
would make sense to have sh_perf_event_update() read and clear the
counter so that you're always 0 based and then enforce an update from
the arch tick hander so you never overflow.



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/11] perf pmu interface -v2
  2010-07-01 14:36         ` Peter Zijlstra
  2010-07-01 15:02           ` Peter Zijlstra
@ 2010-07-02 12:55           ` Will Deacon
  1 sibling, 0 replies; 56+ messages in thread
From: Will Deacon @ 2010-07-02 12:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulus, stephane eranian, Robert Richter, Paul Mundt,
	Frederic Weisbecker, Cyrill Gorcunov, Lin Ming, Yanmin,
	Deng-Cheng Zhu, David Miller, linux-kernel

Hi Peter,

On Thu, 2010-07-01 at 15:36 +0100, Peter Zijlstra wrote:
> On Fri, 2010-06-25 at 16:50 +0200, Peter Zijlstra wrote:
> 
> > Not exactly sure how I could have messed up the ARM architecture code to
> > make this happen though... will have a peek.
> 
> I did find a bug in there, not sure it could have been responsible for
> this but who knows...
> 
> Pushed out a new git tree with the below delta folded in.
> 
I had a look at this yesterday and discovered a bug in the ARM
backend code, which I've posted a patch for to ALKML:

http://lists.infradead.org/pipermail/linux-arm-kernel/2010-July/019461.html

Unfortunately, with this applied and your latest changes I still
get 0 from pinned hardware counters:

# perf stat -r 5 -e cycles -e instructions -e cs -e faults -e branches -a -- git status

Performance counter stats for 'git status' (5 runs):

                  0  cycles                     ( +-     nan% )
                  0  instructions             #      0.000 IPC     ( +-     nan% )
              88447  context-switches           ( +-  12.624% )
              13647  page-faults                ( +-   0.015% )
                  0  branches                   ( +-     nan% )

The changes you've made to arch/arm/kernel/perf_event.c
look sane. If I get some time I'll try and dig deeper.

Will


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/11] perf pmu interface -v2
  2010-07-02  9:52                   ` Peter Zijlstra
@ 2010-07-05 11:14                     ` Paul Mundt
  0 siblings, 0 replies; 56+ messages in thread
From: Paul Mundt @ 2010-07-05 11:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Matt Fleming, Will Deacon, paulus, stephane eranian,
	Robert Richter, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller, linux-kernel

On Fri, Jul 02, 2010 at 11:52:03AM +0200, Peter Zijlstra wrote:
> Right, so I was reading some of that code and I couldn't actually find
> where you keep consistency between the hardware counter value and the
> stored prev_count value.
> 
> That is, suppose I'm counting, the hardware starts at 0, hwc->prev_count
> = 0 and event->count = 0.
> 
> At some point, x we context switch this task away, so we ->disable(),
> which disables the counter and updates the values, so at that time
> hwc->prev = x and event->count = x, right?
> 
> Now suppose we schedule the task back in, so we do ->enable(), then what
> happens? sh_pmu_enable() finds an unused index, (disables it for some
> reason.. it should already be cleared if its not used, but I guess a few
> extra hardware writes dont hurt) and calls sh4a_pmu_enable() on it.
> 
I don't quite remember where the ->disable() came from, I vaguely recall
copying it from one of the other architectures, but it could have just
been a remnant of something I had for debug code. In any event, you're
correct, we don't seem to need it anymore.

> sh4a_pmu_enable() does 3 writes:
> 
>   PPC_PMCAT -- does this clear the counter value?

Yes, the counters themselves are read-only, so clearing is done through
the PMCAT control register.

>   PPC_CCBR  -- writes the ->config bits
>   PPC_CCBR  (adds CCBR_DUC, couldn't this be done in the 
>              previous write to this reg?)
> 
No, the DUC bit needs to be set by itself or the write is discarded on
some CPUs. Clearing it with other bits is fine, however. This is what
starts the counter running.

> Now assuming that enable does indeed clear the hardware counter value,
> shouldn't you also set hwc->prev_count to 0 again? Otherwise the next
> update will see a massive jump?
> 
I think that's a correct observation, but I'm having difficulty verifying
it on my current board since it seems someone moved the PMCAT register,
as the counters aren't being cleared on this particular CPU. I'll test on
the board I wrote this code for initially tomorrow and see how that goes.
It did used to work fine at least.

> Alternatively you could write the hwc->prev_count value back to the
> register.
> 
That would be an option if the counters weren't read-only, yes.

> If you eventually want to drop the chained counter support I guess it
> would make sense to have sh_perf_event_update() read and clear the
> counter so that you're always 0 based and then enforce an update from
> the arch tick hander so you never overflow.
> 
Yes, I'd thought about that too. I'll give it a go once I find out where
the other half of my registers disappeared to. As it is, it seems my bat
and I have an appointment to make.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/11] perf pmu interface -v2
  2010-07-01 15:39               ` Peter Zijlstra
  2010-07-01 16:04                 ` Matt Fleming
  2010-07-02  2:57                 ` Paul Mundt
@ 2010-07-08 11:13                 ` Peter Zijlstra
  2010-07-08 11:19                   ` Ingo Molnar
  2 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2010-07-08 11:13 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Will Deacon, paulus, stephane eranian, Robert Richter,
	Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov, Lin Ming,
	Yanmin, Deng-Cheng Zhu, David Miller, linux-kernel

On Thu, 2010-07-01 at 17:39 +0200, Peter Zijlstra wrote:
> 
> Ah, for sampling for sure, simply group a software perf event and a
> hardware perf event together and use PERF_SAMPLE_READ. 

So the idea is to sample using a software event (periodic timer of
sorts, maybe randomize it) and weight its samples by the hardware event
deltas.

Suppose you have a workload consisting of two main parts:

  my_important_work()
  {
     load_my_data();
     compute_me_silly();
  }

Now, lets assume that both these functions take the same time to
complete for each part of work. In that case a periodic timer generate
samples that are about 50/50 distributed between these two functions.

Now, let us further assume that load_my_data() is so slow because its
missing all the caches and compute_me_silly() is slow because its
defeating the branch predictor.

So what we want to end up with, is that when we sample for cache-misses
we get load_my_data() as the predominant function, not a nice 50/50
relation. Idem for branch misses and compute_me_silly().

By weighting the samples by the hw counter delta we get this, if we
assume that the sampling frequency is not a harmonic of the runtime of
these functions, then statistics will dtrt.

It basically generates a massive skid on the sample, but as long as most
of the samples end up hitting the right function we're good. For a
periodic workload like: 
  while (lots) { my_important_work() }
that is even true for period > function_runtime with the exception of
that harmonic thing. For less neat workloads like:
  while (lots) { my_important_work(); other_random_things(); }
This doesn't need to work unless period < function_runtime.

Clearly we cannot attribute anything to the actual instruction hit due
to the massive skid, but we can (possibly) say something about the
function based on these statistical rules.



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/11] perf pmu interface -v2
  2010-07-08 11:13                 ` Peter Zijlstra
@ 2010-07-08 11:19                   ` Ingo Molnar
  2010-07-18 19:37                     ` Matt Fleming
  0 siblings, 1 reply; 56+ messages in thread
From: Ingo Molnar @ 2010-07-08 11:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Matt Fleming, Will Deacon, paulus, stephane eranian,
	Robert Richter, Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov,
	Lin Ming, Yanmin, Deng-Cheng Zhu, David Miller, linux-kernel


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, 2010-07-01 at 17:39 +0200, Peter Zijlstra wrote:
> > 
> > Ah, for sampling for sure, simply group a software perf event and a
> > hardware perf event together and use PERF_SAMPLE_READ. 
> 
> So the idea is to sample using a software event (periodic timer of sorts, 
> maybe randomize it) and weight its samples by the hardware event deltas.
> 
> Suppose you have a workload consisting of two main parts:
> 
>   my_important_work()
>   {
>      load_my_data();
>      compute_me_silly();
>   }
> 
> Now, lets assume that both these functions take the same time to complete 
> for each part of work. In that case a periodic timer generate samples that 
> are about 50/50 distributed between these two functions.
> 
> Now, let us further assume that load_my_data() is so slow because its 
> missing all the caches and compute_me_silly() is slow because its defeating 
> the branch predictor.
> 
> So what we want to end up with, is that when we sample for cache-misses we 
> get load_my_data() as the predominant function, not a nice 50/50 relation. 
> Idem for branch misses and compute_me_silly().
> 
> By weighting the samples by the hw counter delta we get this, if we assume 
> that the sampling frequency is not a harmonic of the runtime of these 
> functions, then statistics will dtrt.

Yes.

And if the platform code implements this then the tooling side already takes 
care of it - even if the CPU itself cannot geneate interrupts based on say 
cachemisses or branches (but can measure them via counts).

The only situation where statistics will not do the right thing is when the 
likelyhood of the sample tick significantly correlates with the likelyhood of 
the workload itself executing. Timer-dominated workloads would be an example.

Real hrtimers are sufficiently tick-less to solve most of these artifacts in 
practice.

	Ingo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 05/11] perf: register pmu implementations
  2010-06-24 14:28 ` [RFC][PATCH 05/11] perf: register pmu implementations Peter Zijlstra
  2010-06-28 13:21   ` Frederic Weisbecker
@ 2010-07-09  3:08   ` Paul Mackerras
  2010-07-09  8:14     ` Peter Zijlstra
  1 sibling, 1 reply; 56+ messages in thread
From: Paul Mackerras @ 2010-07-09  3:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: stephane eranian, Robert Richter, Will Deacon, Paul Mundt,
	Frederic Weisbecker, Cyrill Gorcunov, Lin Ming, Yanmin,
	Deng-Cheng Zhu, David Miller, linux-kernel

On Thu, Jun 24, 2010 at 04:28:09PM +0200, Peter Zijlstra wrote:

> Simple registration interface for struct pmu, this provides the
> infrastructure for removing all the weak functions.

Seems to work on powerpc, but the sh bits don't seem quite right:

> Index: linux-2.6/arch/sh/kernel/perf_event.c
> ===================================================================
> --- linux-2.6.orig/arch/sh/kernel/perf_event.c
> +++ linux-2.6/arch/sh/kernel/perf_event.c
> @@ -257,24 +257,24 @@ static void sh_pmu_read(struct perf_even
>  	sh_perf_event_update(event, &event->hw, event->hw.idx);
>  }
>  
> -static struct pmu pmu = {
> -	.enable		= sh_pmu_enable,
> -	.disable	= sh_pmu_disable,
> -	.read		= sh_pmu_read,
> -};
> -
> -struct pmu *hw_perf_event_init(struct perf_event *event)
> +static in sh_pmu_event_init(struct perf_event *event)

int?

>  {
>  	int err = __hw_perf_event_init(event);

We need a switch on event->attr.type so we return -ENOENT if it's
not PERF_TYPE_{HARDWARE,HW_CACHE,RAW}.  As it is we don't ever return
-ENOENT, which might stop software and tracepoint events from working.

Paul.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 08/11] perf: Per PMU disable
  2010-06-24 14:28 ` [RFC][PATCH 08/11] perf: Per PMU disable Peter Zijlstra
@ 2010-07-09  7:31   ` Paul Mackerras
  2010-07-09  8:36     ` Peter Zijlstra
  0 siblings, 1 reply; 56+ messages in thread
From: Paul Mackerras @ 2010-07-09  7:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: stephane eranian, Robert Richter, Will Deacon, Paul Mundt,
	Frederic Weisbecker, Cyrill Gorcunov, Lin Ming, Yanmin,
	Deng-Cheng Zhu, David Miller, linux-kernel

On Thu, Jun 24, 2010 at 04:28:12PM +0200, Peter Zijlstra wrote:

> Changes perf_disable() into perf_pmu_disable().

Needs this rolled in to compile on powerpc.

Paul.

diff --git a/arch/powerpc/kernel/perf_event.c b/arch/powerpc/kernel/perf_event.c
index 8890456..9046aea 100644
--- a/arch/powerpc/kernel/perf_event.c
+++ b/arch/powerpc/kernel/perf_event.c
@@ -517,7 +517,7 @@ static void write_mmcr0(struct cpu_hw_events *cpuhw, unsigned long mmcr0)
  * Disable all events to prevent PMU interrupts and to allow
  * events to be added or removed.
  */
-static void powerpc_pmu_pmu_disable(struct pmu *pmu)
+static void power_pmu_pmu_disable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw;
 	unsigned long flags;
@@ -565,7 +565,7 @@ static void powerpc_pmu_pmu_disable(struct pmu *pmu)
  * If we were previously disabled and events were added, then
  * put the new config on the PMU.
  */
-static void powerpc_pmu_pmu_enable(struct pmu *pmu)
+static void power_pmu_pmu_enable(struct pmu *pmu)
 {
 	struct perf_event *event;
 	struct cpu_hw_events *cpuhw;
@@ -1130,7 +1130,7 @@ static int power_pmu_event_init(struct perf_event *event)
 struct pmu power_pmu = {
 	.pmu_enable	= power_pmu_pmu_enable,
 	.pmu_disable	= power_pmu_pmu_disable,
-	.event_init	= pmwer_pmu_event_init,
+	.event_init	= power_pmu_event_init,
 	.enable		= power_pmu_enable,
 	.disable	= power_pmu_disable,
 	.read		= power_pmu_read,

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 05/11] perf: register pmu implementations
  2010-07-09  3:08   ` Paul Mackerras
@ 2010-07-09  8:14     ` Peter Zijlstra
  0 siblings, 0 replies; 56+ messages in thread
From: Peter Zijlstra @ 2010-07-09  8:14 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: stephane eranian, Robert Richter, Will Deacon, Paul Mundt,
	Frederic Weisbecker, Cyrill Gorcunov, Lin Ming, Yanmin,
	Deng-Cheng Zhu, David Miller, linux-kernel

On Fri, 2010-07-09 at 13:08 +1000, Paul Mackerras wrote:

> > -struct pmu *hw_perf_event_init(struct perf_event *event)
> > +static in sh_pmu_event_init(struct perf_event *event)
> 
> int?

Argh, I fixed that a few times, but the hunk keeps slipping into
different patches.. cured.

> >  {
> >  	int err = __hw_perf_event_init(event);
> 
> We need a switch on event->attr.type so we return -ENOENT if it's
> not PERF_TYPE_{HARDWARE,HW_CACHE,RAW}.  As it is we don't ever return
> -ENOENT, which might stop software and tracepoint events from working.

Aaah, indeed! That is why Matt's perf record broke, perf record defaults
to -e cycles which automagically falls back to a software timer, which
then doesn't work.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 08/11] perf: Per PMU disable
  2010-07-09  7:31   ` Paul Mackerras
@ 2010-07-09  8:36     ` Peter Zijlstra
  0 siblings, 0 replies; 56+ messages in thread
From: Peter Zijlstra @ 2010-07-09  8:36 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: stephane eranian, Robert Richter, Will Deacon, Paul Mundt,
	Frederic Weisbecker, Cyrill Gorcunov, Lin Ming, Yanmin,
	Deng-Cheng Zhu, David Miller, linux-kernel

On Fri, 2010-07-09 at 17:31 +1000, Paul Mackerras wrote:
> On Thu, Jun 24, 2010 at 04:28:12PM +0200, Peter Zijlstra wrote:
> 
> > Changes perf_disable() into perf_pmu_disable().
> 
> Needs this rolled in to compile on powerpc.

Done (a week or so ago, should have reposted, will do in a moment).

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/11] perf pmu interface -v2
  2010-07-08 11:19                   ` Ingo Molnar
@ 2010-07-18 19:37                     ` Matt Fleming
  0 siblings, 0 replies; 56+ messages in thread
From: Matt Fleming @ 2010-07-18 19:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Will Deacon, paulus, stephane eranian,
	Robert Richter, Paul Mundt, Frederic Weisbecker, Cyrill Gorcunov,
	Lin Ming, Yanmin, Deng-Cheng Zhu, David Miller, linux-kernel

On Thu, 8 Jul 2010 13:19:36 +0200, Ingo Molnar <mingo@elte.hu> wrote:
> 
> And if the platform code implements this then the tooling side already takes 
> care of it - even if the CPU itself cannot geneate interrupts based on say 
> cachemisses or branches (but can measure them via counts).

I finally got a bit of time to work on this. I'm confused about how the
tools will take care of this at the moment. How much support currently
exists for these sorts of weighted samples? The only thing I can find
that looks related is PERF_SAMPLE_READ, but I don't think that's exactly
what we need.

Could someone point me in the right direction?

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2010-07-18 19:37 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-24 14:28 [RFC][PATCH 00/11] perf pmu interface -v2 Peter Zijlstra
2010-06-24 14:28 ` [PATCH 01/11] perf, x86: Fix Nehalem PMU quirk Peter Zijlstra
2010-06-24 14:28 ` [PATCH 02/11] perf: Fix argument of perf_arch_fetch_caller_regs Peter Zijlstra
2010-06-24 14:28 ` [PATCH 03/11] perf, sparc64: Fix maybe_change_configuration() PCR setting Peter Zijlstra
2010-06-24 14:28 ` [RFC][PATCH 04/11] perf: deconstify struct pmu Peter Zijlstra
2010-06-24 14:28 ` [RFC][PATCH 05/11] perf: register pmu implementations Peter Zijlstra
2010-06-28 13:21   ` Frederic Weisbecker
2010-06-28 15:16     ` Peter Zijlstra
2010-06-28 15:29       ` Frederic Weisbecker
2010-07-09  3:08   ` Paul Mackerras
2010-07-09  8:14     ` Peter Zijlstra
2010-06-24 14:28 ` [RFC][PATCH 06/11] perf: Unindent labels Peter Zijlstra
2010-06-24 14:28 ` [RFC][PATCH 07/11] perf: Reduce perf_disable() usage Peter Zijlstra
2010-06-24 14:28 ` [RFC][PATCH 08/11] perf: Per PMU disable Peter Zijlstra
2010-07-09  7:31   ` Paul Mackerras
2010-07-09  8:36     ` Peter Zijlstra
2010-06-24 14:28 ` [RFC][PATCH 09/11] perf: Default PMU ops Peter Zijlstra
2010-06-29 14:49   ` Frederic Weisbecker
2010-06-29 14:57     ` Peter Zijlstra
2010-06-29 15:00       ` Frederic Weisbecker
2010-06-29 14:58   ` Frederic Weisbecker
2010-06-29 14:59     ` Peter Zijlstra
2010-06-29 15:03       ` Frederic Weisbecker
2010-06-29 16:34         ` Peter Zijlstra
2010-06-29 18:07           ` Frederic Weisbecker
2010-06-29 18:09             ` Peter Zijlstra
2010-06-29 18:11               ` Frederic Weisbecker
2010-06-29 18:19                 ` Peter Zijlstra
2010-06-29 18:21                   ` Frederic Weisbecker
2010-06-24 14:28 ` [RFC][PATCH 10/11] perf: Shrink hw_perf_event Peter Zijlstra
2010-06-29 15:06   ` Frederic Weisbecker
2010-06-24 14:28 ` [RFC][PATCH 11/11] perf: Rework the PMU methods Peter Zijlstra
2010-06-29 15:37   ` Frederic Weisbecker
2010-06-29 16:40     ` Peter Zijlstra
2010-06-29 18:09       ` Frederic Weisbecker
2010-06-25 11:11 ` [RFC][PATCH 00/11] perf pmu interface -v2 Will Deacon
2010-06-25 11:16   ` Peter Zijlstra
2010-06-25 14:36     ` Will Deacon
2010-06-25 14:50       ` Peter Zijlstra
2010-07-01 14:36         ` Peter Zijlstra
2010-07-01 15:02           ` Peter Zijlstra
2010-07-01 15:31             ` MattFleming
2010-07-01 15:39               ` Peter Zijlstra
2010-07-01 16:04                 ` Matt Fleming
2010-07-02  2:57                 ` Paul Mundt
2010-07-02  9:52                   ` Peter Zijlstra
2010-07-05 11:14                     ` Paul Mundt
2010-07-08 11:13                 ` Peter Zijlstra
2010-07-08 11:19                   ` Ingo Molnar
2010-07-18 19:37                     ` Matt Fleming
2010-07-02 12:55           ` Will Deacon
2010-06-26 11:22 ` Matt Fleming
2010-06-26 16:22 ` Corey Ashford
2010-06-28 15:13   ` Peter Zijlstra
2010-06-30 17:19     ` Corey Ashford
2010-06-30 18:11       ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).