* [PATCH v2 0/4] MIPS: perf: Add support for 64-bit MIPS hardware counters.
@ 2011-01-21 22:59 David Daney
  2011-01-21 22:59 ` [PATCH v2 1/4] MIPS: Add accessor macros for 64-bit performance counter registers David Daney
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: David Daney @ 2011-01-21 22:59 UTC (permalink / raw)
  To: linux-mips, ralf
  Cc: David Daney, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Dezhong Diao, Gabor Juhos,
	Grant Likely, Deng-Cheng Zhu

MIPS hardware performance counters may have either 32-bit or 64-bit
wide counter registers.  The current implementation only supports the
32-bit variety.

These patches aim to add support for 64-bit wide counters while
maintaining support for the 32-bit variety.
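
For context, the width difference comes down to which CP0 access
primitive is used.  A minimal sketch (using the accessor macros added
in patch 1; the counters_are_64bit flag here is a hypothetical
stand-in for the real CPU probe, and patch 4 replaces the branch with
function pointers installed at init time):

	/* Sketch only: read performance counter 0 at either width. */
	static u64 read_perf_counter0(int counters_are_64bit)
	{
		if (counters_are_64bit)
			return read_c0_perfcntr0_64(); /* 64-bit c0 read */
		return read_c0_perfcntr0();	/* 32-bit c0 read */
	}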

Changes from v1:

o Moved Octeon processor support to a separate patch set.

o Rebased against v5 of Deng-Cheng Zhu's cleanups:
      http://patchwork.linux-mips.org/patch/2011/
      http://patchwork.linux-mips.org/patch/2012/
      http://patchwork.linux-mips.org/patch/2013/
      http://patchwork.linux-mips.org/patch/2014/
      http://patchwork.linux-mips.org/patch/2015/

o Tried to fix a problem where the 32-bit counters generated far too
  many interrupts.

David Daney (4):
  MIPS: Add accessor macros for 64-bit performance counter registers.
  MIPS: perf: Cleanup formatting in arch/mips/kernel/perf_event.c
  MIPS: perf: Reorganize contents of perf support files.
  MIPS: perf: Add support for 64-bit perf counters.

 arch/mips/Kconfig                    |    2 +-
 arch/mips/include/asm/mipsregs.h     |    8 +
 arch/mips/kernel/Makefile            |    5 +-
 arch/mips/kernel/perf_event.c        |  521 +----------------
 arch/mips/kernel/perf_event_mipsxx.c | 1104 ++++++++++++++++++++++++----------
 5 files changed, 785 insertions(+), 855 deletions(-)

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Dezhong Diao <dediao@cisco.com>
Cc: Gabor Juhos <juhosg@openwrt.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
-- 
1.7.2.3


* [PATCH v2 1/4] MIPS: Add accessor macros for 64-bit performance counter registers.
  2011-01-21 22:59 [PATCH v2 0/4] MIPS: perf: Add support for 64-bit MIPS hardware counters David Daney
@ 2011-01-21 22:59 ` David Daney
  2011-01-21 22:59 ` [PATCH v2 2/4] MIPS: perf: Cleanup formatting in arch/mips/kernel/perf_event.c David Daney
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: David Daney @ 2011-01-21 22:59 UTC (permalink / raw)
  To: linux-mips, ralf; +Cc: David Daney

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
---
 arch/mips/include/asm/mipsregs.h |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/mips/include/asm/mipsregs.h b/arch/mips/include/asm/mipsregs.h
index d5c9eaa..86c7ec1 100644
--- a/arch/mips/include/asm/mipsregs.h
+++ b/arch/mips/include/asm/mipsregs.h
@@ -1019,18 +1019,26 @@ do {									\
 #define write_c0_perfctrl0(val)	__write_32bit_c0_register($25, 0, val)
 #define read_c0_perfcntr0()	__read_32bit_c0_register($25, 1)
 #define write_c0_perfcntr0(val)	__write_32bit_c0_register($25, 1, val)
+#define read_c0_perfcntr0_64()	__read_64bit_c0_register($25, 1)
+#define write_c0_perfcntr0_64(val) __write_64bit_c0_register($25, 1, val)
 #define read_c0_perfctrl1()	__read_32bit_c0_register($25, 2)
 #define write_c0_perfctrl1(val)	__write_32bit_c0_register($25, 2, val)
 #define read_c0_perfcntr1()	__read_32bit_c0_register($25, 3)
 #define write_c0_perfcntr1(val)	__write_32bit_c0_register($25, 3, val)
+#define read_c0_perfcntr1_64()	__read_64bit_c0_register($25, 3)
+#define write_c0_perfcntr1_64(val) __write_64bit_c0_register($25, 3, val)
 #define read_c0_perfctrl2()	__read_32bit_c0_register($25, 4)
 #define write_c0_perfctrl2(val)	__write_32bit_c0_register($25, 4, val)
 #define read_c0_perfcntr2()	__read_32bit_c0_register($25, 5)
 #define write_c0_perfcntr2(val)	__write_32bit_c0_register($25, 5, val)
+#define read_c0_perfcntr2_64()	__read_64bit_c0_register($25, 5)
+#define write_c0_perfcntr2_64(val) __write_64bit_c0_register($25, 5, val)
 #define read_c0_perfctrl3()	__read_32bit_c0_register($25, 6)
 #define write_c0_perfctrl3(val)	__write_32bit_c0_register($25, 6, val)
 #define read_c0_perfcntr3()	__read_32bit_c0_register($25, 7)
 #define write_c0_perfcntr3(val)	__write_32bit_c0_register($25, 7, val)
+#define read_c0_perfcntr3_64()	__read_64bit_c0_register($25, 7)
+#define write_c0_perfcntr3_64(val) __write_64bit_c0_register($25, 7, val)
 
 /* RM9000 PerfCount performance counter register */
 #define read_c0_perfcount()	__read_64bit_c0_register($25, 0)
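
A usage sketch (illustrative, not part of the patch): on a core whose
counters are 64 bits wide, counter 0 can be cleared and read back at
full width with the new macros:

	write_c0_perfcntr0_64(0);
	count = read_c0_perfcntr0_64();	/* count is a u64 */

The existing 32-bit macros only access the low 32 bits of such a
counter, which is what makes the _64 variants necessary.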
-- 
1.7.2.3


* [PATCH v2 2/4] MIPS: perf: Cleanup formatting in arch/mips/kernel/perf_event.c
  2011-01-21 22:59 [PATCH v2 0/4] MIPS: perf: Add support for 64-bit MIPS hardware counters David Daney
  2011-01-21 22:59 ` [PATCH v2 1/4] MIPS: Add accessor macros for 64-bit performance counter registers David Daney
@ 2011-01-21 22:59 ` David Daney
  2011-01-21 22:59 ` [PATCH v2 3/4] MIPS: perf: Reorganize contents of perf support files David Daney
  2011-01-21 22:59 ` [PATCH v2 4/4] MIPS: perf: Add support for 64-bit perf counters David Daney
  3 siblings, 0 replies; 15+ messages in thread
From: David Daney @ 2011-01-21 22:59 UTC (permalink / raw)
  To: linux-mips, ralf
  Cc: David Daney, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Deng-Cheng Zhu

Get rid of a number of unneeded inline declarations, and join
improperly split lines.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
---
 arch/mips/kernel/perf_event.c        |   26 +++++-------
 arch/mips/kernel/perf_event_mipsxx.c |   68 +++++++++++++---------------------
 2 files changed, 37 insertions(+), 57 deletions(-)

diff --git a/arch/mips/kernel/perf_event.c b/arch/mips/kernel/perf_event.c
index a824485..931d957 100644
--- a/arch/mips/kernel/perf_event.c
+++ b/arch/mips/kernel/perf_event.c
@@ -118,10 +118,9 @@ struct mips_pmu {
 
 static const struct mips_pmu *mipspmu;
 
-static int
-mipspmu_event_set_period(struct perf_event *event,
-			struct hw_perf_event *hwc,
-			int idx)
+static int mipspmu_event_set_period(struct perf_event *event,
+				    struct hw_perf_event *hwc,
+				    int idx)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	s64 left = local64_read(&hwc->period_left);
@@ -162,8 +161,8 @@ mipspmu_event_set_period(struct perf_event *event,
 }
 
 static void mipspmu_event_update(struct perf_event *event,
-			struct hw_perf_event *hwc,
-			int idx)
+				 struct hw_perf_event *hwc,
+				 int idx)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	unsigned long flags;
@@ -422,8 +421,7 @@ static struct pmu pmu = {
 	.read		= mipspmu_read,
 };
 
-static inline unsigned int
-mipspmu_perf_event_encode(const struct mips_perf_event *pev)
+static unsigned int mipspmu_perf_event_encode(const struct mips_perf_event *pev)
 {
 /*
  * Top 8 bits for range, next 16 bits for cntr_mask, lowest 8 bits for
@@ -439,8 +437,7 @@ mipspmu_perf_event_encode(const struct mips_perf_event *pev)
 #endif
 }
 
-static const struct mips_perf_event *
-mipspmu_map_general_event(int idx)
+static const struct mips_perf_event *mipspmu_map_general_event(int idx)
 {
 	const struct mips_perf_event *pev;
 
@@ -451,8 +448,7 @@ mipspmu_map_general_event(int idx)
 	return pev;
 }
 
-static const struct mips_perf_event *
-mipspmu_map_cache_event(u64 config)
+static const struct mips_perf_event *mipspmu_map_cache_event(u64 config)
 {
 	unsigned int cache_type, cache_op, cache_result;
 	const struct mips_perf_event *pev;
@@ -515,9 +511,9 @@ static int validate_group(struct perf_event *event)
 }
 
 /* This is needed by specific irq handlers in perf_event_*.c */
-static void
-handle_associated_event(struct cpu_hw_events *cpuc,
-	int idx, struct perf_sample_data *data, struct pt_regs *regs)
+static void handle_associated_event(struct cpu_hw_events *cpuc,
+				    int idx, struct perf_sample_data *data,
+				    struct pt_regs *regs)
 {
 	struct perf_event *event = cpuc->events[idx];
 	struct hw_perf_event *hwc = &event->hw;
diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
index d9a7db7..72cd2e1 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -49,37 +49,32 @@ static int cpu_has_mipsmt_pertccounters;
 #endif
 
 /* Copied from op_model_mipsxx.c */
-static inline unsigned int vpe_shift(void)
+static unsigned int vpe_shift(void)
 {
 	if (num_possible_cpus() > 1)
 		return 1;
 
 	return 0;
 }
-#else /* !CONFIG_MIPS_MT_SMP */
-#define vpe_id()	0
-
-static inline unsigned int vpe_shift(void)
-{
-	return 0;
-}
-#endif /* CONFIG_MIPS_MT_SMP */
 
-static inline unsigned int
-counters_total_to_per_cpu(unsigned int counters)
+static unsigned int counters_total_to_per_cpu(unsigned int counters)
 {
 	return counters >> vpe_shift();
 }
 
-static inline unsigned int
-counters_per_cpu_to_total(unsigned int counters)
+static unsigned int counters_per_cpu_to_total(unsigned int counters)
 {
 	return counters << vpe_shift();
 }
 
+#else /* !CONFIG_MIPS_MT_SMP */
+#define vpe_id()	0
+
+#endif /* CONFIG_MIPS_MT_SMP */
+
 #define __define_perf_accessors(r, n, np)				\
 									\
-static inline unsigned int r_c0_ ## r ## n(void)			\
+static unsigned int r_c0_ ## r ## n(void)				\
 {									\
 	unsigned int cpu = vpe_id();					\
 									\
@@ -94,7 +89,7 @@ static inline unsigned int r_c0_ ## r ## n(void)			\
 	return 0;							\
 }									\
 									\
-static inline void w_c0_ ## r ## n(unsigned int value)			\
+static void w_c0_ ## r ## n(unsigned int value)				\
 {									\
 	unsigned int cpu = vpe_id();					\
 									\
@@ -121,7 +116,7 @@ __define_perf_accessors(perfctrl, 1, 3)
 __define_perf_accessors(perfctrl, 2, 0)
 __define_perf_accessors(perfctrl, 3, 1)
 
-static inline int __n_counters(void)
+static int __n_counters(void)
 {
 	if (!(read_c0_config1() & M_CONFIG1_PC))
 		return 0;
@@ -135,7 +130,7 @@ static inline int __n_counters(void)
 	return 4;
 }
 
-static inline int n_counters(void)
+static int n_counters(void)
 {
 	int counters;
 
@@ -175,8 +170,7 @@ static void reset_counters(void *arg)
 	}
 }
 
-static inline u64
-mipsxx_pmu_read_counter(unsigned int idx)
+static u64 mipsxx_pmu_read_counter(unsigned int idx)
 {
 	switch (idx) {
 	case 0:
@@ -193,8 +187,7 @@ mipsxx_pmu_read_counter(unsigned int idx)
 	}
 }
 
-static inline void
-mipsxx_pmu_write_counter(unsigned int idx, u64 val)
+static void mipsxx_pmu_write_counter(unsigned int idx, u64 val)
 {
 	switch (idx) {
 	case 0:
@@ -212,8 +205,7 @@ mipsxx_pmu_write_counter(unsigned int idx, u64 val)
 	}
 }
 
-static inline unsigned int
-mipsxx_pmu_read_control(unsigned int idx)
+static unsigned int mipsxx_pmu_read_control(unsigned int idx)
 {
 	switch (idx) {
 	case 0:
@@ -230,8 +222,7 @@ mipsxx_pmu_read_control(unsigned int idx)
 	}
 }
 
-static inline void
-mipsxx_pmu_write_control(unsigned int idx, unsigned int val)
+static void mipsxx_pmu_write_control(unsigned int idx, unsigned int val)
 {
 	switch (idx) {
 	case 0:
@@ -483,9 +474,8 @@ static const struct mips_perf_event mipsxx74Kcore_cache_map
 };
 
 #ifdef CONFIG_MIPS_MT_SMP
-static void
-check_and_calc_range(struct perf_event *event,
-			const struct mips_perf_event *pev)
+static void check_and_calc_range(struct perf_event *event,
+				 const struct mips_perf_event *pev)
 {
 	struct hw_perf_event *hwc = &event->hw;
 
@@ -508,9 +498,8 @@ check_and_calc_range(struct perf_event *event,
 		hwc->config_base |= M_TC_EN_ALL;
 }
 #else
-static void
-check_and_calc_range(struct perf_event *event,
-			const struct mips_perf_event *pev)
+static void check_and_calc_range(struct perf_event *event,
+				 const struct mips_perf_event *pev)
 {
 }
 #endif
@@ -705,8 +694,7 @@ static int mipsxx_pmu_handle_shared_irq(void)
 	return handled;
 }
 
-static irqreturn_t
-mipsxx_pmu_handle_irq(int irq, void *dev)
+static irqreturn_t mipsxx_pmu_handle_irq(int irq, void *dev)
 {
 	return mipsxx_pmu_handle_shared_irq();
 }
@@ -738,9 +726,8 @@ static void mipsxx_pmu_stop(void)
 #endif
 }
 
-static int
-mipsxx_pmu_alloc_counter(struct cpu_hw_events *cpuc,
-			struct hw_perf_event *hwc)
+static int mipsxx_pmu_alloc_counter(struct cpu_hw_events *cpuc,
+				    struct hw_perf_event *hwc)
 {
 	int i;
 
@@ -769,8 +756,7 @@ mipsxx_pmu_alloc_counter(struct cpu_hw_events *cpuc,
 	return -EAGAIN;
 }
 
-static void
-mipsxx_pmu_enable_event(struct hw_perf_event *evt, int idx)
+static void mipsxx_pmu_enable_event(struct hw_perf_event *evt, int idx)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	unsigned long flags;
@@ -788,8 +774,7 @@ mipsxx_pmu_enable_event(struct hw_perf_event *evt, int idx)
 	local_irq_restore(flags);
 }
 
-static void
-mipsxx_pmu_disable_event(int idx)
+static void mipsxx_pmu_disable_event(int idx)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	unsigned long flags;
@@ -864,8 +849,7 @@ mipsxx_pmu_disable_event(int idx)
  * then 128 needs to be added to 15 as the input for the event config,
  * i.e., 143 (0x8F) to be used.
  */
-static const struct mips_perf_event *
-mipsxx_pmu_map_raw_event(u64 config)
+static const struct mips_perf_event *mipsxx_pmu_map_raw_event(u64 config)
 {
 	unsigned int raw_id = config & 0xff;
 	unsigned int base_id = raw_id & 0x7f;
-- 
1.7.2.3


* [PATCH v2 3/4] MIPS: perf: Reorganize contents of perf support files.
  2011-01-21 22:59 [PATCH v2 0/4] MIPS: perf: Add support for 64-bit MIPS hardware counters David Daney
  2011-01-21 22:59 ` [PATCH v2 1/4] MIPS: Add accessor macros for 64-bit performance counter registers David Daney
  2011-01-21 22:59 ` [PATCH v2 2/4] MIPS: perf: Cleanup formatting in arch/mips/kernel/perf_event.c David Daney
@ 2011-01-21 22:59 ` David Daney
  2011-01-21 22:59 ` [PATCH v2 4/4] MIPS: perf: Add support for 64-bit perf counters David Daney
  3 siblings, 0 replies; 15+ messages in thread
From: David Daney @ 2011-01-21 22:59 UTC (permalink / raw)
  To: linux-mips, ralf
  Cc: David Daney, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Dezhong Diao, Gabor Juhos,
	Grant Likely, Deng-Cheng Zhu

The contents of arch/mips/kernel/perf_event.c and
arch/mips/kernel/perf_event_mipsxx.c were divided in a seemingly ad
hoc manner, with the first including the second.

I moved all the hardware counter support code to perf_event_mipsxx.c
and moved the #ifdef gating out of the source files and into the
Kconfig and Makefile.

Now perf_event.c contains only the callchain support; everything else
is in perf_event_mipsxx.c.

There are no functional changes: functions are only moved from one
file to the other, or removed when empty and unneeded.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Dezhong Diao <dediao@cisco.com>
Cc: Gabor Juhos <juhosg@openwrt.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
---
 arch/mips/Kconfig                    |    2 +-
 arch/mips/kernel/Makefile            |    5 +-
 arch/mips/kernel/perf_event.c        |  517 +---------------------------------
 arch/mips/kernel/perf_event_mipsxx.c |  532 +++++++++++++++++++++++++++++++++-
 4 files changed, 534 insertions(+), 522 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 9648c66..a56a4fc 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2033,7 +2033,7 @@ config NODES_SHIFT
 
 config HW_PERF_EVENTS
 	bool "Enable hardware performance counter support for perf events"
-	depends on PERF_EVENTS && !MIPS_MT_SMTC && OPROFILE=n && CPU_MIPS32
+	depends on PERF_EVENTS && !MIPS_MT_SMTC && OPROFILE=n && (CPU_MIPS32 || CPU_MIPS64 || CPU_R10000 || CPU_SB1)
 	default y
 	help
 	  Enable hardware performance counter support for perf events. If
diff --git a/arch/mips/kernel/Makefile b/arch/mips/kernel/Makefile
index cedee2b..753b421 100644
--- a/arch/mips/kernel/Makefile
+++ b/arch/mips/kernel/Makefile
@@ -11,6 +11,8 @@ obj-y		+= cpu-probe.o branch.o entry.o genex.o irq.o process.o \
 ifdef CONFIG_FUNCTION_TRACER
 CFLAGS_REMOVE_ftrace.o = -pg
 CFLAGS_REMOVE_early_printk.o = -pg
+CFLAGS_REMOVE_perf_event.o = -pg
+CFLAGS_REMOVE_perf_event_mipsxx.o = -pg
 endif
 
 obj-$(CONFIG_CEVT_BCM1480)	+= cevt-bcm1480.o
@@ -105,7 +107,8 @@ obj-$(CONFIG_HAVE_STD_PC_SERIAL_PORT)	+= 8250-platform.o
 
 obj-$(CONFIG_MIPS_CPUFREQ)	+= cpufreq/
 
-obj-$(CONFIG_HW_PERF_EVENTS)	+= perf_event.o
+obj-$(CONFIG_PERF_EVENTS)	+= perf_event.o
+obj-$(CONFIG_HW_PERF_EVENTS)	+= perf_event_mipsxx.o
 
 obj-$(CONFIG_JUMP_LABEL)	+= jump_label.o
 
diff --git a/arch/mips/kernel/perf_event.c b/arch/mips/kernel/perf_event.c
index 931d957..c1cf9c6 100644
--- a/arch/mips/kernel/perf_event.c
+++ b/arch/mips/kernel/perf_event.c
@@ -14,531 +14,16 @@
  * published by the Free Software Foundation.
  */
 
-#include <linux/cpumask.h>
-#include <linux/interrupt.h>
-#include <linux/smp.h>
-#include <linux/kernel.h>
 #include <linux/perf_event.h>
-#include <linux/uaccess.h>
 
-#include <asm/irq.h>
-#include <asm/irq_regs.h>
 #include <asm/stacktrace.h>
-#include <asm/time.h> /* For perf_irq */
-
-/* These are for 32bit counters. For 64bit ones, define them accordingly. */
-#define MAX_PERIOD	((1ULL << 32) - 1)
-#define VALID_COUNT	0x7fffffff
-#define TOTAL_BITS	32
-#define HIGHEST_BIT	31
-
-#define MIPS_MAX_HWEVENTS 4
-
-struct cpu_hw_events {
-	/* Array of events on this cpu. */
-	struct perf_event	*events[MIPS_MAX_HWEVENTS];
-
-	/*
-	 * Set the bit (indexed by the counter number) when the counter
-	 * is used for an event.
-	 */
-	unsigned long		used_mask[BITS_TO_LONGS(MIPS_MAX_HWEVENTS)];
-
-	/*
-	 * The borrowed MSB for the performance counter. A MIPS performance
-	 * counter uses its bit 31 (for 32bit counters) or bit 63 (for 64bit
-	 * counters) as a factor of determining whether a counter overflow
-	 * should be signaled. So here we use a separate MSB for each
-	 * counter to make things easy.
-	 */
-	unsigned long		msbs[BITS_TO_LONGS(MIPS_MAX_HWEVENTS)];
-
-	/*
-	 * Software copy of the control register for each performance counter.
-	 * MIPS CPUs vary in performance counters. They use this differently,
-	 * and even may not use it.
-	 */
-	unsigned int		saved_ctrl[MIPS_MAX_HWEVENTS];
-};
-DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
-	.saved_ctrl = {0},
-};
-
-/* The description of MIPS performance events. */
-struct mips_perf_event {
-	unsigned int event_id;
-	/*
-	 * MIPS performance counters are indexed starting from 0.
-	 * CNTR_EVEN indicates the indexes of the counters to be used are
-	 * even numbers.
-	 */
-	unsigned int cntr_mask;
-	#define CNTR_EVEN	0x55555555
-	#define CNTR_ODD	0xaaaaaaaa
-#ifdef CONFIG_MIPS_MT_SMP
-	enum {
-		T  = 0,
-		V  = 1,
-		P  = 2,
-	} range;
-#else
-	#define T
-	#define V
-	#define P
-#endif
-};
-
-static struct mips_perf_event raw_event;
-static DEFINE_MUTEX(raw_event_mutex);
-
-#define UNSUPPORTED_PERF_EVENT_ID 0xffffffff
-#define C(x) PERF_COUNT_HW_CACHE_##x
-
-struct mips_pmu {
-	const char	*name;
-	int		irq;
-	irqreturn_t	(*handle_irq)(int irq, void *dev);
-	int		(*handle_shared_irq)(void);
-	void		(*start)(void);
-	void		(*stop)(void);
-	int		(*alloc_counter)(struct cpu_hw_events *cpuc,
-					struct hw_perf_event *hwc);
-	u64		(*read_counter)(unsigned int idx);
-	void		(*write_counter)(unsigned int idx, u64 val);
-	void		(*enable_event)(struct hw_perf_event *evt, int idx);
-	void		(*disable_event)(int idx);
-	const struct mips_perf_event *(*map_raw_event)(u64 config);
-	const struct mips_perf_event (*general_event_map)[PERF_COUNT_HW_MAX];
-	const struct mips_perf_event (*cache_event_map)
-				[PERF_COUNT_HW_CACHE_MAX]
-				[PERF_COUNT_HW_CACHE_OP_MAX]
-				[PERF_COUNT_HW_CACHE_RESULT_MAX];
-	unsigned int	num_counters;
-};
-
-static const struct mips_pmu *mipspmu;
-
-static int mipspmu_event_set_period(struct perf_event *event,
-				    struct hw_perf_event *hwc,
-				    int idx)
-{
-	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-	s64 left = local64_read(&hwc->period_left);
-	s64 period = hwc->sample_period;
-	int ret = 0;
-	u64 uleft;
-	unsigned long flags;
-
-	if (unlikely(left <= -period)) {
-		left = period;
-		local64_set(&hwc->period_left, left);
-		hwc->last_period = period;
-		ret = 1;
-	}
-
-	if (unlikely(left <= 0)) {
-		left += period;
-		local64_set(&hwc->period_left, left);
-		hwc->last_period = period;
-		ret = 1;
-	}
-
-	if (left > (s64)MAX_PERIOD)
-		left = MAX_PERIOD;
-
-	local64_set(&hwc->prev_count, (u64)-left);
-
-	local_irq_save(flags);
-	uleft = (u64)(-left) & MAX_PERIOD;
-	uleft > VALID_COUNT ?
-		set_bit(idx, cpuc->msbs) : clear_bit(idx, cpuc->msbs);
-	mipspmu->write_counter(idx, (u64)(-left) & VALID_COUNT);
-	local_irq_restore(flags);
-
-	perf_event_update_userpage(event);
-
-	return ret;
-}
-
-static void mipspmu_event_update(struct perf_event *event,
-				 struct hw_perf_event *hwc,
-				 int idx)
-{
-	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-	unsigned long flags;
-	int shift = 64 - TOTAL_BITS;
-	s64 prev_raw_count, new_raw_count;
-	u64 delta;
-
-again:
-	prev_raw_count = local64_read(&hwc->prev_count);
-	local_irq_save(flags);
-	/* Make the counter value be a "real" one. */
-	new_raw_count = mipspmu->read_counter(idx);
-	if (new_raw_count & (test_bit(idx, cpuc->msbs) << HIGHEST_BIT)) {
-		new_raw_count &= VALID_COUNT;
-		clear_bit(idx, cpuc->msbs);
-	} else
-		new_raw_count |= (test_bit(idx, cpuc->msbs) << HIGHEST_BIT);
-	local_irq_restore(flags);
-
-	if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
-				new_raw_count) != prev_raw_count)
-		goto again;
-
-	delta = (new_raw_count << shift) - (prev_raw_count << shift);
-	delta >>= shift;
-
-	local64_add(delta, &event->count);
-	local64_sub(delta, &hwc->period_left);
-
-	return;
-}
-
-static void mipspmu_start(struct perf_event *event, int flags)
-{
-	struct hw_perf_event *hwc = &event->hw;
-
-	if (!mipspmu)
-		return;
-
-	if (flags & PERF_EF_RELOAD)
-		WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
-
-	hwc->state = 0;
-
-	/* Set the period for the event. */
-	mipspmu_event_set_period(event, hwc, hwc->idx);
-
-	/* Enable the event. */
-	mipspmu->enable_event(hwc, hwc->idx);
-}
-
-static void mipspmu_stop(struct perf_event *event, int flags)
-{
-	struct hw_perf_event *hwc = &event->hw;
-
-	if (!mipspmu)
-		return;
-
-	if (!(hwc->state & PERF_HES_STOPPED)) {
-		/* We are working on a local event. */
-		mipspmu->disable_event(hwc->idx);
-		barrier();
-		mipspmu_event_update(event, hwc, hwc->idx);
-		hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
-	}
-}
-
-static int mipspmu_add(struct perf_event *event, int flags)
-{
-	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-	struct hw_perf_event *hwc = &event->hw;
-	int idx;
-	int err = 0;
-
-	perf_pmu_disable(event->pmu);
-
-	/* To look for a free counter for this event. */
-	idx = mipspmu->alloc_counter(cpuc, hwc);
-	if (idx < 0) {
-		err = idx;
-		goto out;
-	}
-
-	/*
-	 * If there is an event in the counter we are going to use then
-	 * make sure it is disabled.
-	 */
-	event->hw.idx = idx;
-	mipspmu->disable_event(idx);
-	cpuc->events[idx] = event;
-
-	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
-	if (flags & PERF_EF_START)
-		mipspmu_start(event, PERF_EF_RELOAD);
-
-	/* Propagate our changes to the userspace mapping. */
-	perf_event_update_userpage(event);
-
-out:
-	perf_pmu_enable(event->pmu);
-	return err;
-}
-
-static void mipspmu_del(struct perf_event *event, int flags)
-{
-	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-	struct hw_perf_event *hwc = &event->hw;
-	int idx = hwc->idx;
-
-	WARN_ON(idx < 0 || idx >= mipspmu->num_counters);
-
-	mipspmu_stop(event, PERF_EF_UPDATE);
-	cpuc->events[idx] = NULL;
-	clear_bit(idx, cpuc->used_mask);
-
-	perf_event_update_userpage(event);
-}
-
-static void mipspmu_read(struct perf_event *event)
-{
-	struct hw_perf_event *hwc = &event->hw;
-
-	/* Don't read disabled counters! */
-	if (hwc->idx < 0)
-		return;
-
-	mipspmu_event_update(event, hwc, hwc->idx);
-}
-
-static void mipspmu_enable(struct pmu *pmu)
-{
-	if (mipspmu)
-		mipspmu->start();
-}
-
-static void mipspmu_disable(struct pmu *pmu)
-{
-	if (mipspmu)
-		mipspmu->stop();
-}
-
-static atomic_t active_events = ATOMIC_INIT(0);
-static DEFINE_MUTEX(pmu_reserve_mutex);
-static int (*save_perf_irq)(void);
-
-static int mipspmu_get_irq(void)
-{
-	int err;
-
-	if (mipspmu->irq >= 0) {
-		/* Request my own irq handler. */
-		err = request_irq(mipspmu->irq, mipspmu->handle_irq,
-			IRQF_DISABLED | IRQF_NOBALANCING,
-			"mips_perf_pmu", NULL);
-		if (err) {
-			pr_warning("Unable to request IRQ%d for MIPS "
-			   "performance counters!\n", mipspmu->irq);
-		}
-	} else if (cp0_perfcount_irq < 0) {
-		/*
-		 * We are sharing the irq number with the timer interrupt.
-		 */
-		save_perf_irq = perf_irq;
-		perf_irq = mipspmu->handle_shared_irq;
-		err = 0;
-	} else {
-		pr_warning("The platform hasn't properly defined its "
-			"interrupt controller.\n");
-		err = -ENOENT;
-	}
-
-	return err;
-}
-
-static void mipspmu_free_irq(void)
-{
-	if (mipspmu->irq >= 0)
-		free_irq(mipspmu->irq, NULL);
-	else if (cp0_perfcount_irq < 0)
-		perf_irq = save_perf_irq;
-}
-
-/*
- * mipsxx/rm9000/loongson2 have different performance counters, they have
- * specific low-level init routines.
- */
-static void reset_counters(void *arg);
-static int __hw_perf_event_init(struct perf_event *event);
-
-static void hw_perf_event_destroy(struct perf_event *event)
-{
-	if (atomic_dec_and_mutex_lock(&active_events,
-				&pmu_reserve_mutex)) {
-		/*
-		 * We must not call the destroy function with interrupts
-		 * disabled.
-		 */
-		on_each_cpu(reset_counters,
-			(void *)(long)mipspmu->num_counters, 1);
-		mipspmu_free_irq();
-		mutex_unlock(&pmu_reserve_mutex);
-	}
-}
-
-static int mipspmu_event_init(struct perf_event *event)
-{
-	int err = 0;
-
-	switch (event->attr.type) {
-	case PERF_TYPE_RAW:
-	case PERF_TYPE_HARDWARE:
-	case PERF_TYPE_HW_CACHE:
-		break;
-
-	default:
-		return -ENOENT;
-	}
-
-	if (!mipspmu || event->cpu >= nr_cpumask_bits ||
-		(event->cpu >= 0 && !cpu_online(event->cpu)))
-		return -ENODEV;
-
-	if (!atomic_inc_not_zero(&active_events)) {
-		if (atomic_read(&active_events) > MIPS_MAX_HWEVENTS) {
-			atomic_dec(&active_events);
-			return -ENOSPC;
-		}
-
-		mutex_lock(&pmu_reserve_mutex);
-		if (atomic_read(&active_events) == 0)
-			err = mipspmu_get_irq();
-
-		if (!err)
-			atomic_inc(&active_events);
-		mutex_unlock(&pmu_reserve_mutex);
-	}
-
-	if (err)
-		return err;
-
-	err = __hw_perf_event_init(event);
-	if (err)
-		hw_perf_event_destroy(event);
-
-	return err;
-}
-
-static struct pmu pmu = {
-	.pmu_enable	= mipspmu_enable,
-	.pmu_disable	= mipspmu_disable,
-	.event_init	= mipspmu_event_init,
-	.add		= mipspmu_add,
-	.del		= mipspmu_del,
-	.start		= mipspmu_start,
-	.stop		= mipspmu_stop,
-	.read		= mipspmu_read,
-};
-
-static unsigned int mipspmu_perf_event_encode(const struct mips_perf_event *pev)
-{
-/*
- * Top 8 bits for range, next 16 bits for cntr_mask, lowest 8 bits for
- * event_id.
- */
-#ifdef CONFIG_MIPS_MT_SMP
-	return ((unsigned int)pev->range << 24) |
-		(pev->cntr_mask & 0xffff00) |
-		(pev->event_id & 0xff);
-#else
-	return (pev->cntr_mask & 0xffff00) |
-		(pev->event_id & 0xff);
-#endif
-}
-
-static const struct mips_perf_event *mipspmu_map_general_event(int idx)
-{
-	const struct mips_perf_event *pev;
-
-	pev = ((*mipspmu->general_event_map)[idx].event_id ==
-		UNSUPPORTED_PERF_EVENT_ID ? ERR_PTR(-EOPNOTSUPP) :
-		&(*mipspmu->general_event_map)[idx]);
-
-	return pev;
-}
-
-static const struct mips_perf_event *mipspmu_map_cache_event(u64 config)
-{
-	unsigned int cache_type, cache_op, cache_result;
-	const struct mips_perf_event *pev;
-
-	cache_type = (config >> 0) & 0xff;
-	if (cache_type >= PERF_COUNT_HW_CACHE_MAX)
-		return ERR_PTR(-EINVAL);
-
-	cache_op = (config >> 8) & 0xff;
-	if (cache_op >= PERF_COUNT_HW_CACHE_OP_MAX)
-		return ERR_PTR(-EINVAL);
-
-	cache_result = (config >> 16) & 0xff;
-	if (cache_result >= PERF_COUNT_HW_CACHE_RESULT_MAX)
-		return ERR_PTR(-EINVAL);
-
-	pev = &((*mipspmu->cache_event_map)
-					[cache_type]
-					[cache_op]
-					[cache_result]);
-
-	if (pev->event_id == UNSUPPORTED_PERF_EVENT_ID)
-		return ERR_PTR(-EOPNOTSUPP);
-
-	return pev;
-
-}
-
-static int validate_event(struct cpu_hw_events *cpuc,
-	       struct perf_event *event)
-{
-	struct hw_perf_event fake_hwc = event->hw;
-
-	/* Allow mixed event group. So return 1 to pass validation. */
-	if (event->pmu != &pmu || event->state <= PERF_EVENT_STATE_OFF)
-		return 1;
-
-	return mipspmu->alloc_counter(cpuc, &fake_hwc) >= 0;
-}
-
-static int validate_group(struct perf_event *event)
-{
-	struct perf_event *sibling, *leader = event->group_leader;
-	struct cpu_hw_events fake_cpuc;
-
-	memset(&fake_cpuc, 0, sizeof(fake_cpuc));
-
-	if (!validate_event(&fake_cpuc, leader))
-		return -ENOSPC;
-
-	list_for_each_entry(sibling, &leader->sibling_list, group_entry) {
-		if (!validate_event(&fake_cpuc, sibling))
-			return -ENOSPC;
-	}
-
-	if (!validate_event(&fake_cpuc, event))
-		return -ENOSPC;
-
-	return 0;
-}
-
-/* This is needed by specific irq handlers in perf_event_*.c */
-static void handle_associated_event(struct cpu_hw_events *cpuc,
-				    int idx, struct perf_sample_data *data,
-				    struct pt_regs *regs)
-{
-	struct perf_event *event = cpuc->events[idx];
-	struct hw_perf_event *hwc = &event->hw;
-
-	mipspmu_event_update(event, hwc, idx);
-	data->period = event->hw.last_period;
-	if (!mipspmu_event_set_period(event, hwc, idx))
-		return;
-
-	if (perf_event_overflow(event, 0, data, regs))
-		mipspmu->disable_event(idx);
-}
-
-#include "perf_event_mipsxx.c"
 
 /* Callchain handling code. */
 
 /*
  * Leave userspace callchain empty for now. When we find a way to trace
- * the user stack callchains, we add here.
+ * the user stack callchains, we will add it here.
  */
-void perf_callchain_user(struct perf_callchain_entry *entry,
-		    struct pt_regs *regs)
-{
-}
 
 static void save_raw_perf_callchain(struct perf_callchain_entry *entry,
 	unsigned long reg29)
diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
index 72cd2e1..409207d 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -1,5 +1,531 @@
-#if defined(CONFIG_CPU_MIPS32) || defined(CONFIG_CPU_MIPS64) || \
-    defined(CONFIG_CPU_R10000) || defined(CONFIG_CPU_SB1)
+/*
+ * Linux performance counter support for MIPS.
+ *
+ * Copyright (C) 2010 MIPS Technologies, Inc.
+ * Author: Deng-Cheng Zhu
+ *
+ * This code is based on the implementation for ARM, which is in turn
+ * based on the sparc64 perf event code and the x86 code. Performance
+ * counter access is based on the MIPS Oprofile code. And the callchain
+ * support references the code of MIPS stacktrace.c.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/cpumask.h>
+#include <linux/interrupt.h>
+#include <linux/smp.h>
+#include <linux/kernel.h>
+#include <linux/perf_event.h>
+#include <linux/uaccess.h>
+
+#include <asm/irq.h>
+#include <asm/irq_regs.h>
+#include <asm/stacktrace.h>
+#include <asm/time.h> /* For perf_irq */
+
+/* These are for 32bit counters. For 64bit ones, define them accordingly. */
+#define MAX_PERIOD	((1ULL << 32) - 1)
+#define VALID_COUNT	0x7fffffff
+#define TOTAL_BITS	32
+#define HIGHEST_BIT	31
+
+#define MIPS_MAX_HWEVENTS 4
+
+struct cpu_hw_events {
+	/* Array of events on this cpu. */
+	struct perf_event	*events[MIPS_MAX_HWEVENTS];
+
+	/*
+	 * Set the bit (indexed by the counter number) when the counter
+	 * is used for an event.
+	 */
+	unsigned long		used_mask[BITS_TO_LONGS(MIPS_MAX_HWEVENTS)];
+
+	/*
+	 * The borrowed MSB for the performance counter. A MIPS performance
+	 * counter uses its bit 31 (for 32bit counters) or bit 63 (for 64bit
+	 * counters) as a factor of determining whether a counter overflow
+	 * should be signaled. So here we use a separate MSB for each
+	 * counter to make things easy.
+	 */
+	unsigned long		msbs[BITS_TO_LONGS(MIPS_MAX_HWEVENTS)];
+
+	/*
+	 * Software copy of the control register for each performance counter.
+	 * MIPS CPUs vary in performance counters. They use this differently,
+	 * and even may not use it.
+	 */
+	unsigned int		saved_ctrl[MIPS_MAX_HWEVENTS];
+};
+DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
+	.saved_ctrl = {0},
+};
+
+/* The description of MIPS performance events. */
+struct mips_perf_event {
+	unsigned int event_id;
+	/*
+	 * MIPS performance counters are indexed starting from 0.
+	 * CNTR_EVEN indicates the indexes of the counters to be used are
+	 * even numbers.
+	 */
+	unsigned int cntr_mask;
+	#define CNTR_EVEN	0x55555555
+	#define CNTR_ODD	0xaaaaaaaa
+#ifdef CONFIG_MIPS_MT_SMP
+	enum {
+		T  = 0,
+		V  = 1,
+		P  = 2,
+	} range;
+#else
+	#define T
+	#define V
+	#define P
+#endif
+};
+
+static struct mips_perf_event raw_event;
+static DEFINE_MUTEX(raw_event_mutex);
+
+#define UNSUPPORTED_PERF_EVENT_ID 0xffffffff
+#define C(x) PERF_COUNT_HW_CACHE_##x
+
+struct mips_pmu {
+	const char	*name;
+	int		irq;
+	irqreturn_t	(*handle_irq)(int irq, void *dev);
+	int		(*handle_shared_irq)(void);
+	void		(*start)(void);
+	void		(*stop)(void);
+	int		(*alloc_counter)(struct cpu_hw_events *cpuc,
+					struct hw_perf_event *hwc);
+	u64		(*read_counter)(unsigned int idx);
+	void		(*write_counter)(unsigned int idx, u64 val);
+	void		(*enable_event)(struct hw_perf_event *evt, int idx);
+	void		(*disable_event)(int idx);
+	const struct mips_perf_event *(*map_raw_event)(u64 config);
+	const struct mips_perf_event (*general_event_map)[PERF_COUNT_HW_MAX];
+	const struct mips_perf_event (*cache_event_map)
+				[PERF_COUNT_HW_CACHE_MAX]
+				[PERF_COUNT_HW_CACHE_OP_MAX]
+				[PERF_COUNT_HW_CACHE_RESULT_MAX];
+	unsigned int	num_counters;
+};
+
+static const struct mips_pmu *mipspmu;
+
+static int mipspmu_event_set_period(struct perf_event *event,
+				    struct hw_perf_event *hwc,
+				    int idx)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	s64 left = local64_read(&hwc->period_left);
+	s64 period = hwc->sample_period;
+	int ret = 0;
+	u64 uleft;
+	unsigned long flags;
+
+	if (unlikely(left <= -period)) {
+		left = period;
+		local64_set(&hwc->period_left, left);
+		hwc->last_period = period;
+		ret = 1;
+	}
+
+	if (unlikely(left <= 0)) {
+		left += period;
+		local64_set(&hwc->period_left, left);
+		hwc->last_period = period;
+		ret = 1;
+	}
+
+	if (left > (s64)MAX_PERIOD)
+		left = MAX_PERIOD;
+
+	local64_set(&hwc->prev_count, (u64)-left);
+
+	local_irq_save(flags);
+	uleft = (u64)(-left) & MAX_PERIOD;
+	uleft > VALID_COUNT ?
+		set_bit(idx, cpuc->msbs) : clear_bit(idx, cpuc->msbs);
+	mipspmu->write_counter(idx, (u64)(-left) & VALID_COUNT);
+	local_irq_restore(flags);
+
+	perf_event_update_userpage(event);
+
+	return ret;
+}
+
+static void mipspmu_event_update(struct perf_event *event,
+				 struct hw_perf_event *hwc,
+				 int idx)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	unsigned long flags;
+	int shift = 64 - TOTAL_BITS;
+	s64 prev_raw_count, new_raw_count;
+	u64 delta;
+
+again:
+	prev_raw_count = local64_read(&hwc->prev_count);
+	local_irq_save(flags);
+	/* Make the counter value be a "real" one. */
+	new_raw_count = mipspmu->read_counter(idx);
+	if (new_raw_count & (test_bit(idx, cpuc->msbs) << HIGHEST_BIT)) {
+		new_raw_count &= VALID_COUNT;
+		clear_bit(idx, cpuc->msbs);
+	} else
+		new_raw_count |= (test_bit(idx, cpuc->msbs) << HIGHEST_BIT);
+	local_irq_restore(flags);
+
+	if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
+				new_raw_count) != prev_raw_count)
+		goto again;
+
+	delta = (new_raw_count << shift) - (prev_raw_count << shift);
+	delta >>= shift;
+
+	local64_add(delta, &event->count);
+	local64_sub(delta, &hwc->period_left);
+
+	return;
+}
+
+static void mipspmu_start(struct perf_event *event, int flags)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (!mipspmu)
+		return;
+
+	if (flags & PERF_EF_RELOAD)
+		WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
+
+	hwc->state = 0;
+
+	/* Set the period for the event. */
+	mipspmu_event_set_period(event, hwc, hwc->idx);
+
+	/* Enable the event. */
+	mipspmu->enable_event(hwc, hwc->idx);
+}
+
+static void mipspmu_stop(struct perf_event *event, int flags)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (!mipspmu)
+		return;
+
+	if (!(hwc->state & PERF_HES_STOPPED)) {
+		/* We are working on a local event. */
+		mipspmu->disable_event(hwc->idx);
+		barrier();
+		mipspmu_event_update(event, hwc, hwc->idx);
+		hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
+	}
+}
+
+static int mipspmu_add(struct perf_event *event, int flags)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	struct hw_perf_event *hwc = &event->hw;
+	int idx;
+	int err = 0;
+
+	perf_pmu_disable(event->pmu);
+
+	/* To look for a free counter for this event. */
+	idx = mipspmu->alloc_counter(cpuc, hwc);
+	if (idx < 0) {
+		err = idx;
+		goto out;
+	}
+
+	/*
+	 * If there is an event in the counter we are going to use then
+	 * make sure it is disabled.
+	 */
+	event->hw.idx = idx;
+	mipspmu->disable_event(idx);
+	cpuc->events[idx] = event;
+
+	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+	if (flags & PERF_EF_START)
+		mipspmu_start(event, PERF_EF_RELOAD);
+
+	/* Propagate our changes to the userspace mapping. */
+	perf_event_update_userpage(event);
+
+out:
+	perf_pmu_enable(event->pmu);
+	return err;
+}
+
+static void mipspmu_del(struct perf_event *event, int flags)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	WARN_ON(idx < 0 || idx >= mipspmu->num_counters);
+
+	mipspmu_stop(event, PERF_EF_UPDATE);
+	cpuc->events[idx] = NULL;
+	clear_bit(idx, cpuc->used_mask);
+
+	perf_event_update_userpage(event);
+}
+
+static void mipspmu_read(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	/* Don't read disabled counters! */
+	if (hwc->idx < 0)
+		return;
+
+	mipspmu_event_update(event, hwc, hwc->idx);
+}
+
+static void mipspmu_enable(struct pmu *pmu)
+{
+	if (mipspmu)
+		mipspmu->start();
+}
+
+static void mipspmu_disable(struct pmu *pmu)
+{
+	if (mipspmu)
+		mipspmu->stop();
+}
+
+static atomic_t active_events = ATOMIC_INIT(0);
+static DEFINE_MUTEX(pmu_reserve_mutex);
+static int (*save_perf_irq)(void);
+
+static int mipspmu_get_irq(void)
+{
+	int err;
+
+	if (mipspmu->irq >= 0) {
+		/* Request my own irq handler. */
+		err = request_irq(mipspmu->irq, mipspmu->handle_irq,
+			IRQF_DISABLED | IRQF_NOBALANCING,
+			"mips_perf_pmu", NULL);
+		if (err) {
+			pr_warning("Unable to request IRQ%d for MIPS "
+			   "performance counters!\n", mipspmu->irq);
+		}
+	} else if (cp0_perfcount_irq < 0) {
+		/*
+		 * We are sharing the irq number with the timer interrupt.
+		 */
+		save_perf_irq = perf_irq;
+		perf_irq = mipspmu->handle_shared_irq;
+		err = 0;
+	} else {
+		pr_warning("The platform hasn't properly defined its "
+			"interrupt controller.\n");
+		err = -ENOENT;
+	}
+
+	return err;
+}
+
+static void mipspmu_free_irq(void)
+{
+	if (mipspmu->irq >= 0)
+		free_irq(mipspmu->irq, NULL);
+	else if (cp0_perfcount_irq < 0)
+		perf_irq = save_perf_irq;
+}
+
+/*
+ * mipsxx/rm9000/loongson2 have different performance counters, they have
+ * specific low-level init routines.
+ */
+static void reset_counters(void *arg);
+static int __hw_perf_event_init(struct perf_event *event);
+
+static void hw_perf_event_destroy(struct perf_event *event)
+{
+	if (atomic_dec_and_mutex_lock(&active_events,
+				&pmu_reserve_mutex)) {
+		/*
+		 * We must not call the destroy function with interrupts
+		 * disabled.
+		 */
+		on_each_cpu(reset_counters,
+			(void *)(long)mipspmu->num_counters, 1);
+		mipspmu_free_irq();
+		mutex_unlock(&pmu_reserve_mutex);
+	}
+}
+
+static int mipspmu_event_init(struct perf_event *event)
+{
+	int err = 0;
+
+	switch (event->attr.type) {
+	case PERF_TYPE_RAW:
+	case PERF_TYPE_HARDWARE:
+	case PERF_TYPE_HW_CACHE:
+		break;
+
+	default:
+		return -ENOENT;
+	}
+
+	if (!mipspmu || event->cpu >= nr_cpumask_bits ||
+		(event->cpu >= 0 && !cpu_online(event->cpu)))
+		return -ENODEV;
+
+	if (!atomic_inc_not_zero(&active_events)) {
+		if (atomic_read(&active_events) > MIPS_MAX_HWEVENTS) {
+			atomic_dec(&active_events);
+			return -ENOSPC;
+		}
+
+		mutex_lock(&pmu_reserve_mutex);
+		if (atomic_read(&active_events) == 0)
+			err = mipspmu_get_irq();
+
+		if (!err)
+			atomic_inc(&active_events);
+		mutex_unlock(&pmu_reserve_mutex);
+	}
+
+	if (err)
+		return err;
+
+	err = __hw_perf_event_init(event);
+	if (err)
+		hw_perf_event_destroy(event);
+
+	return err;
+}
+
+static struct pmu pmu = {
+	.pmu_enable	= mipspmu_enable,
+	.pmu_disable	= mipspmu_disable,
+	.event_init	= mipspmu_event_init,
+	.add		= mipspmu_add,
+	.del		= mipspmu_del,
+	.start		= mipspmu_start,
+	.stop		= mipspmu_stop,
+	.read		= mipspmu_read,
+};
+
+static unsigned int mipspmu_perf_event_encode(const struct mips_perf_event *pev)
+{
+/*
+ * Top 8 bits for range, next 16 bits for cntr_mask, lowest 8 bits for
+ * event_id.
+ */
+#ifdef CONFIG_MIPS_MT_SMP
+	return ((unsigned int)pev->range << 24) |
+		(pev->cntr_mask & 0xffff00) |
+		(pev->event_id & 0xff);
+#else
+	return (pev->cntr_mask & 0xffff00) |
+		(pev->event_id & 0xff);
+#endif
+}
+
+static const struct mips_perf_event *mipspmu_map_general_event(int idx)
+{
+	const struct mips_perf_event *pev;
+
+	pev = ((*mipspmu->general_event_map)[idx].event_id ==
+		UNSUPPORTED_PERF_EVENT_ID ? ERR_PTR(-EOPNOTSUPP) :
+		&(*mipspmu->general_event_map)[idx]);
+
+	return pev;
+}
+
+static const struct mips_perf_event *mipspmu_map_cache_event(u64 config)
+{
+	unsigned int cache_type, cache_op, cache_result;
+	const struct mips_perf_event *pev;
+
+	cache_type = (config >> 0) & 0xff;
+	if (cache_type >= PERF_COUNT_HW_CACHE_MAX)
+		return ERR_PTR(-EINVAL);
+
+	cache_op = (config >> 8) & 0xff;
+	if (cache_op >= PERF_COUNT_HW_CACHE_OP_MAX)
+		return ERR_PTR(-EINVAL);
+
+	cache_result = (config >> 16) & 0xff;
+	if (cache_result >= PERF_COUNT_HW_CACHE_RESULT_MAX)
+		return ERR_PTR(-EINVAL);
+
+	pev = &((*mipspmu->cache_event_map)
+					[cache_type]
+					[cache_op]
+					[cache_result]);
+
+	if (pev->event_id == UNSUPPORTED_PERF_EVENT_ID)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	return pev;
+
+}
+
+static int validate_event(struct cpu_hw_events *cpuc,
+	       struct perf_event *event)
+{
+	struct hw_perf_event fake_hwc = event->hw;
+
+	/* Allow mixed event group. So return 1 to pass validation. */
+	if (event->pmu != &pmu || event->state <= PERF_EVENT_STATE_OFF)
+		return 1;
+
+	return mipspmu->alloc_counter(cpuc, &fake_hwc) >= 0;
+}
+
+static int validate_group(struct perf_event *event)
+{
+	struct perf_event *sibling, *leader = event->group_leader;
+	struct cpu_hw_events fake_cpuc;
+
+	memset(&fake_cpuc, 0, sizeof(fake_cpuc));
+
+	if (!validate_event(&fake_cpuc, leader))
+		return -ENOSPC;
+
+	list_for_each_entry(sibling, &leader->sibling_list, group_entry) {
+		if (!validate_event(&fake_cpuc, sibling))
+			return -ENOSPC;
+	}
+
+	if (!validate_event(&fake_cpuc, event))
+		return -ENOSPC;
+
+	return 0;
+}
+
+/* This is needed by specific irq handlers in perf_event_*.c */
+static void handle_associated_event(struct cpu_hw_events *cpuc,
+				    int idx, struct perf_sample_data *data,
+				    struct pt_regs *regs)
+{
+	struct perf_event *event = cpuc->events[idx];
+	struct hw_perf_event *hwc = &event->hw;
+
+	mipspmu_event_update(event, hwc, idx);
+	data->period = event->hw.last_period;
+	if (!mipspmu_event_set_period(event, hwc, idx))
+		return;
+
+	if (perf_event_overflow(event, 0, data, regs))
+		mipspmu->disable_event(idx);
+}
 
 #define M_CONFIG1_PC	(1 << 4)
 
@@ -1034,5 +1560,3 @@ init_hw_perf_events(void)
 	return 0;
 }
 early_initcall(init_hw_perf_events);
-
-#endif /* defined(CONFIG_CPU_MIPS32)... */
-- 
1.7.2.3


* [PATCH v2 4/4] MIPS: perf: Add support for 64-bit perf counters.
  2011-01-21 22:59 [PATCH v2 0/4] MIPS: perf: Add support for 64-bit MIPS hardware counters David Daney
                   ` (2 preceding siblings ...)
  2011-01-21 22:59 ` [PATCH v2 3/4] MIPS: perf: Reorganize contents of perf support files David Daney
@ 2011-01-21 22:59 ` David Daney
  2011-01-25  3:42   ` Deng-Cheng Zhu
  3 siblings, 1 reply; 15+ messages in thread
From: David Daney @ 2011-01-21 22:59 UTC (permalink / raw)
  To: linux-mips, ralf
  Cc: David Daney, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Deng-Cheng Zhu

The hard-coded constants are moved to struct mips_pmu.  All counter
register access moves to the read_counter and write_counter function
pointers, which are set to either 32-bit or 64-bit access methods at
initialization time.

Many of the function pointers in struct mips_pmu were not needed as
there was only a single implementation; these were removed.

I couldn't figure out what made struct cpu_hw_events.msbs[] at all
useful, so I removed it too.

Some functions and other declarations were reordered to reduce the
need for forward declarations.
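
As an illustrative sketch only (counters_are_64bit() is a
hypothetical stand-in for the CPU probe the real init code performs),
the initialization-time selection described above amounts to:

	if (counters_are_64bit()) {
		mipspmu.max_period = (1ULL << 63) - 1;
		mipspmu.valid_count = (1ULL << 63) - 1;
		mipspmu.overflow = 1ULL << 63;
		mipspmu.read_counter = mipsxx_pmu_read_counter_64;
		mipspmu.write_counter = mipsxx_pmu_write_counter_64;
	} else {
		mipspmu.max_period = (1ULL << 31) - 1;
		mipspmu.valid_count = (1ULL << 31) - 1;
		mipspmu.overflow = 1ULL << 31;
		mipspmu.read_counter = mipsxx_pmu_read_counter;
		mipspmu.write_counter = mipsxx_pmu_write_counter;
	}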

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
---
 arch/mips/kernel/perf_event_mipsxx.c |  844 ++++++++++++++++------------------
 1 files changed, 387 insertions(+), 457 deletions(-)

diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
index 409207d..f15bb01 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -2,6 +2,7 @@
  * Linux performance counter support for MIPS.
  *
  * Copyright (C) 2010 MIPS Technologies, Inc.
+ * Copyright (C) 2011 Cavium Networks, Inc.
  * Author: Deng-Cheng Zhu
  *
  * This code is based on the implementation for ARM, which is in turn
@@ -26,12 +27,6 @@
 #include <asm/stacktrace.h>
 #include <asm/time.h> /* For perf_irq */
 
-/* These are for 32bit counters. For 64bit ones, define them accordingly. */
-#define MAX_PERIOD	((1ULL << 32) - 1)
-#define VALID_COUNT	0x7fffffff
-#define TOTAL_BITS	32
-#define HIGHEST_BIT	31
-
 #define MIPS_MAX_HWEVENTS 4
 
 struct cpu_hw_events {
@@ -45,15 +40,6 @@ struct cpu_hw_events {
 	unsigned long		used_mask[BITS_TO_LONGS(MIPS_MAX_HWEVENTS)];
 
 	/*
-	 * The borrowed MSB for the performance counter. A MIPS performance
-	 * counter uses its bit 31 (for 32bit counters) or bit 63 (for 64bit
-	 * counters) as a factor of determining whether a counter overflow
-	 * should be signaled. So here we use a separate MSB for each
-	 * counter to make things easy.
-	 */
-	unsigned long		msbs[BITS_TO_LONGS(MIPS_MAX_HWEVENTS)];
-
-	/*
 	 * Software copy of the control register for each performance counter.
 	 * MIPS CPUs vary in performance counters. They use this differently,
 	 * and even may not use it.
@@ -75,6 +61,7 @@ struct mips_perf_event {
 	unsigned int cntr_mask;
 	#define CNTR_EVEN	0x55555555
 	#define CNTR_ODD	0xaaaaaaaa
+	#define CNTR_ALL	0xffffffff
 #ifdef CONFIG_MIPS_MT_SMP
 	enum {
 		T  = 0,
@@ -95,18 +82,13 @@ static DEFINE_MUTEX(raw_event_mutex);
 #define C(x) PERF_COUNT_HW_CACHE_##x
 
 struct mips_pmu {
+	u64		max_period;
+	u64		valid_count;
+	u64		overflow;
 	const char	*name;
 	int		irq;
-	irqreturn_t	(*handle_irq)(int irq, void *dev);
-	int		(*handle_shared_irq)(void);
-	void		(*start)(void);
-	void		(*stop)(void);
-	int		(*alloc_counter)(struct cpu_hw_events *cpuc,
-					struct hw_perf_event *hwc);
 	u64		(*read_counter)(unsigned int idx);
 	void		(*write_counter)(unsigned int idx, u64 val);
-	void		(*enable_event)(struct hw_perf_event *evt, int idx);
-	void		(*disable_event)(int idx);
 	const struct mips_perf_event *(*map_raw_event)(u64 config);
 	const struct mips_perf_event (*general_event_map)[PERF_COUNT_HW_MAX];
 	const struct mips_perf_event (*cache_event_map)
@@ -116,43 +98,302 @@ struct mips_pmu {
 	unsigned int	num_counters;
 };
 
-static const struct mips_pmu *mipspmu;
+static struct mips_pmu mipspmu;
+
+#define M_CONFIG1_PC	(1 << 4)
+
+#define M_PERFCTL_EXL			(1      <<  0)
+#define M_PERFCTL_KERNEL		(1      <<  1)
+#define M_PERFCTL_SUPERVISOR		(1      <<  2)
+#define M_PERFCTL_USER			(1      <<  3)
+#define M_PERFCTL_INTERRUPT_ENABLE	(1      <<  4)
+#define M_PERFCTL_EVENT(event)		(((event) & 0x3ff)  << 5)
+#define M_PERFCTL_VPEID(vpe)		((vpe)    << 16)
+#define M_PERFCTL_MT_EN(filter)		((filter) << 20)
+#define    M_TC_EN_ALL			M_PERFCTL_MT_EN(0)
+#define    M_TC_EN_VPE			M_PERFCTL_MT_EN(1)
+#define    M_TC_EN_TC			M_PERFCTL_MT_EN(2)
+#define M_PERFCTL_TCID(tcid)		((tcid)   << 22)
+#define M_PERFCTL_WIDE			(1      << 30)
+#define M_PERFCTL_MORE			(1      << 31)
+
+#define M_PERFCTL_COUNT_EVENT_WHENEVER	(M_PERFCTL_EXL |		\
+					M_PERFCTL_KERNEL |		\
+					M_PERFCTL_USER |		\
+					M_PERFCTL_SUPERVISOR |		\
+					M_PERFCTL_INTERRUPT_ENABLE)
+
+#ifdef CONFIG_MIPS_MT_SMP
+#define M_PERFCTL_CONFIG_MASK		0x3fff801f
+#else
+#define M_PERFCTL_CONFIG_MASK		0x1f
+#endif
+#define M_PERFCTL_EVENT_MASK		0xfe0
+
+
+#ifdef CONFIG_MIPS_MT_SMP
+static int cpu_has_mipsmt_pertccounters;
+
+static DEFINE_RWLOCK(pmuint_rwlock);
+
+/*
+ * FIXME: For VSMP, vpe_id() is redefined for Perf-events, because
+ * cpu_data[cpuid].vpe_id reports 0 for _both_ CPUs.
+ */
+#if defined(CONFIG_HW_PERF_EVENTS)
+#define vpe_id()	(cpu_has_mipsmt_pertccounters ? \
+			0 : smp_processor_id())
+#else
+#define vpe_id()	(cpu_has_mipsmt_pertccounters ? \
+			0 : cpu_data[smp_processor_id()].vpe_id)
+#endif
+
+/* Copied from op_model_mipsxx.c */
+static unsigned int vpe_shift(void)
+{
+	if (num_possible_cpus() > 1)
+		return 1;
+
+	return 0;
+}
+
+static unsigned int counters_total_to_per_cpu(unsigned int counters)
+{
+	return counters >> vpe_shift();
+}
+
+static unsigned int counters_per_cpu_to_total(unsigned int counters)
+{
+	return counters << vpe_shift();
+}
+
+#else /* !CONFIG_MIPS_MT_SMP */
+#define vpe_id()	0
+
+#endif /* CONFIG_MIPS_MT_SMP */
+
+static void resume_local_counters(void);
+static void pause_local_counters(void);
+static irqreturn_t mipsxx_pmu_handle_irq(int, void *);
+static int mipsxx_pmu_handle_shared_irq(void);
+
+static unsigned int mipsxx_pmu_swizzle_perf_idx(unsigned int idx)
+{
+	if (vpe_id() == 1)
+		idx = (idx + 2) & 3;
+	return idx;
+}
+
+static u64 mipsxx_pmu_read_counter(unsigned int idx)
+{
+	idx = mipsxx_pmu_swizzle_perf_idx(idx);
+
+	switch (idx) {
+	case 0:
+		return read_c0_perfcntr0();
+	case 1:
+		return read_c0_perfcntr1();
+	case 2:
+		return read_c0_perfcntr2();
+	case 3:
+		return read_c0_perfcntr3();
+	default:
+		WARN_ONCE(1, "Invalid performance counter number (%d)\n", idx);
+		return 0;
+	}
+}
+
+static u64 mipsxx_pmu_read_counter_64(unsigned int idx)
+{
+	idx = mipsxx_pmu_swizzle_perf_idx(idx);
+
+	switch (idx) {
+	case 0:
+		return read_c0_perfcntr0_64();
+	case 1:
+		return read_c0_perfcntr1_64();
+	case 2:
+		return read_c0_perfcntr2_64();
+	case 3:
+		return read_c0_perfcntr3_64();
+	default:
+		WARN_ONCE(1, "Invalid performance counter number (%d)\n", idx);
+		return 0;
+	}
+}
+
+static void mipsxx_pmu_write_counter(unsigned int idx, u64 val)
+{
+	idx = mipsxx_pmu_swizzle_perf_idx(idx);
+
+	switch (idx) {
+	case 0:
+		write_c0_perfcntr0(val);
+		return;
+	case 1:
+		write_c0_perfcntr1(val);
+		return;
+	case 2:
+		write_c0_perfcntr2(val);
+		return;
+	case 3:
+		write_c0_perfcntr3(val);
+		return;
+	}
+}
+
+static void mipsxx_pmu_write_counter_64(unsigned int idx, u64 val)
+{
+	idx = mipsxx_pmu_swizzle_perf_idx(idx);
+
+	switch (idx) {
+	case 0:
+		write_c0_perfcntr0_64(val);
+		return;
+	case 1:
+		write_c0_perfcntr1_64(val);
+		return;
+	case 2:
+		write_c0_perfcntr2_64(val);
+		return;
+	case 3:
+		write_c0_perfcntr3_64(val);
+		return;
+	}
+}
+
+static unsigned int mipsxx_pmu_read_control(unsigned int idx)
+{
+	idx = mipsxx_pmu_swizzle_perf_idx(idx);
+
+	switch (idx) {
+	case 0:
+		return read_c0_perfctrl0();
+	case 1:
+		return read_c0_perfctrl1();
+	case 2:
+		return read_c0_perfctrl2();
+	case 3:
+		return read_c0_perfctrl3();
+	default:
+		WARN_ONCE(1, "Invalid performance counter number (%d)\n", idx);
+		return 0;
+	}
+}
+
+static void mipsxx_pmu_write_control(unsigned int idx, unsigned int val)
+{
+	idx = mipsxx_pmu_swizzle_perf_idx(idx);
+
+	switch (idx) {
+	case 0:
+		write_c0_perfctrl0(val);
+		return;
+	case 1:
+		write_c0_perfctrl1(val);
+		return;
+	case 2:
+		write_c0_perfctrl2(val);
+		return;
+	case 3:
+		write_c0_perfctrl3(val);
+		return;
+	}
+}
+
+static int mipsxx_pmu_alloc_counter(struct cpu_hw_events *cpuc,
+				    struct hw_perf_event *hwc)
+{
+	int i;
+
+	/*
+	 * We only need to care about the counter mask. The range has
+	 * already been checked.
+	 */
+	unsigned long cntr_mask = (hwc->event_base >> 8) & 0xffff;
+
+	for (i = mipspmu.num_counters - 1; i >= 0; i--) {
+		/*
+		 * Note that some MIPS perf events can be counted by both
+		 * even and odd counters, whereas many others are only by
+		 * even _or_ odd counters. This introduces an issue that
+		 * when the former kind of event takes the counter the
+		 * latter kind of event wants to use, then the "counter
+		 * allocation" for the latter event will fail. In fact if
+		 * they can be dynamically swapped, they both feel happy.
+		 * But here we leave this issue alone for now.
+		 */
+		if (test_bit(i, &cntr_mask) &&
+			!test_and_set_bit(i, cpuc->used_mask))
+			return i;
+	}
+
+	return -EAGAIN;
+}
+
+static void mipsxx_pmu_enable_event(struct hw_perf_event *evt, int idx)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	unsigned long flags;
+
+	WARN_ON(idx < 0 || idx >= mipspmu.num_counters);
+
+	local_irq_save(flags);
+	cpuc->saved_ctrl[idx] = M_PERFCTL_EVENT(evt->event_base & 0xff) |
+		(evt->config_base & M_PERFCTL_CONFIG_MASK) |
+		/* Make sure interrupt enabled. */
+		M_PERFCTL_INTERRUPT_ENABLE;
+	/*
+	 * We do not actually let the counter run. Leave it until start().
+	 */
+	local_irq_restore(flags);
+}
+
+static void mipsxx_pmu_disable_event(int idx)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	unsigned long flags;
+
+	WARN_ON(idx < 0 || idx >= mipspmu.num_counters);
+
+	local_irq_save(flags);
+	cpuc->saved_ctrl[idx] = mipsxx_pmu_read_control(idx) &
+		~M_PERFCTL_COUNT_EVENT_WHENEVER;
+	mipsxx_pmu_write_control(idx, cpuc->saved_ctrl[idx]);
+	local_irq_restore(flags);
+}
 
 static int mipspmu_event_set_period(struct perf_event *event,
 				    struct hw_perf_event *hwc,
 				    int idx)
 {
-	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-	s64 left = local64_read(&hwc->period_left);
-	s64 period = hwc->sample_period;
+	u64 left = local64_read(&hwc->period_left);
+	u64 period = hwc->sample_period;
 	int ret = 0;
-	u64 uleft;
 	unsigned long flags;
 
-	if (unlikely(left <= -period)) {
+	if (unlikely((left + period) & (1ULL << 63))) {
 		left = period;
 		local64_set(&hwc->period_left, left);
 		hwc->last_period = period;
 		ret = 1;
 	}
 
-	if (unlikely(left <= 0)) {
+
+	if (unlikely((left + period) <= period)) {
 		left += period;
 		local64_set(&hwc->period_left, left);
 		hwc->last_period = period;
 		ret = 1;
 	}
 
-	if (left > (s64)MAX_PERIOD)
-		left = MAX_PERIOD;
+	if (left > mipspmu.max_period)
+		left = mipspmu.max_period;
 
-	local64_set(&hwc->prev_count, (u64)-left);
+	local64_set(&hwc->prev_count, mipspmu.overflow - left);
 
 	local_irq_save(flags);
-	uleft = (u64)(-left) & MAX_PERIOD;
-	uleft > VALID_COUNT ?
-		set_bit(idx, cpuc->msbs) : clear_bit(idx, cpuc->msbs);
-	mipspmu->write_counter(idx, (u64)(-left) & VALID_COUNT);
+	mipspmu.write_counter(idx, mipspmu.overflow - left);
 	local_irq_restore(flags);
 
 	perf_event_update_userpage(event);
@@ -164,30 +405,22 @@ static void mipspmu_event_update(struct perf_event *event,
 				 struct hw_perf_event *hwc,
 				 int idx)
 {
-	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	unsigned long flags;
-	int shift = 64 - TOTAL_BITS;
-	s64 prev_raw_count, new_raw_count;
+	u64 prev_raw_count, new_raw_count;
 	u64 delta;
 
 again:
 	prev_raw_count = local64_read(&hwc->prev_count);
 	local_irq_save(flags);
 	/* Make the counter value be a "real" one. */
-	new_raw_count = mipspmu->read_counter(idx);
-	if (new_raw_count & (test_bit(idx, cpuc->msbs) << HIGHEST_BIT)) {
-		new_raw_count &= VALID_COUNT;
-		clear_bit(idx, cpuc->msbs);
-	} else
-		new_raw_count |= (test_bit(idx, cpuc->msbs) << HIGHEST_BIT);
+	new_raw_count = mipspmu.read_counter(idx);
 	local_irq_restore(flags);
 
 	if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
 				new_raw_count) != prev_raw_count)
 		goto again;
 
-	delta = (new_raw_count << shift) - (prev_raw_count << shift);
-	delta >>= shift;
+	delta = new_raw_count - prev_raw_count;
 
 	local64_add(delta, &event->count);
 	local64_sub(delta, &hwc->period_left);
@@ -199,9 +432,6 @@ static void mipspmu_start(struct perf_event *event, int flags)
 {
 	struct hw_perf_event *hwc = &event->hw;
 
-	if (!mipspmu)
-		return;
-
 	if (flags & PERF_EF_RELOAD)
 		WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
 
@@ -211,19 +441,16 @@ static void mipspmu_start(struct perf_event *event, int flags)
 	mipspmu_event_set_period(event, hwc, hwc->idx);
 
 	/* Enable the event. */
-	mipspmu->enable_event(hwc, hwc->idx);
+	mipsxx_pmu_enable_event(hwc, hwc->idx);
 }
 
 static void mipspmu_stop(struct perf_event *event, int flags)
 {
 	struct hw_perf_event *hwc = &event->hw;
 
-	if (!mipspmu)
-		return;
-
 	if (!(hwc->state & PERF_HES_STOPPED)) {
 		/* We are working on a local event. */
-		mipspmu->disable_event(hwc->idx);
+		mipsxx_pmu_disable_event(hwc->idx);
 		barrier();
 		mipspmu_event_update(event, hwc, hwc->idx);
 		hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
@@ -240,7 +467,7 @@ static int mipspmu_add(struct perf_event *event, int flags)
 	perf_pmu_disable(event->pmu);
 
 	/* To look for a free counter for this event. */
-	idx = mipspmu->alloc_counter(cpuc, hwc);
+	idx = mipsxx_pmu_alloc_counter(cpuc, hwc);
 	if (idx < 0) {
 		err = idx;
 		goto out;
@@ -251,7 +478,7 @@ static int mipspmu_add(struct perf_event *event, int flags)
 	 * make sure it is disabled.
 	 */
 	event->hw.idx = idx;
-	mipspmu->disable_event(idx);
+	mipsxx_pmu_disable_event(idx);
 	cpuc->events[idx] = event;
 
 	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
@@ -272,7 +499,7 @@ static void mipspmu_del(struct perf_event *event, int flags)
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 
-	WARN_ON(idx < 0 || idx >= mipspmu->num_counters);
+	WARN_ON(idx < 0 || idx >= mipspmu.num_counters);
 
 	mipspmu_stop(event, PERF_EF_UPDATE);
 	cpuc->events[idx] = NULL;
@@ -294,14 +521,29 @@ static void mipspmu_read(struct perf_event *event)
 
 static void mipspmu_enable(struct pmu *pmu)
 {
-	if (mipspmu)
-		mipspmu->start();
+#ifdef CONFIG_MIPS_MT_SMP
+	write_unlock(&pmuint_rwlock);
+#endif
+	resume_local_counters();
 }
 
+/*
+ * MIPS performance counters can be per-TC. The control registers
+ * cannot be directly accessed across CPUs. Hence if we want to do
+ * global control, we need cross-CPU calls. on_each_cpu() could help,
+ * but we cannot be sure it is called with interrupts enabled. So
+ * here we pause the local counters, then grab a rwlock and leave the
+ * counters on other CPUs alone. If a counter interrupt is raised
+ * while we hold the write lock, the handler simply pauses the local
+ * counters on that CPU and spins until the lock is released. Also,
+ * we know we won't be migrated to another CPU after pausing the
+ * local counters and before grabbing the lock.
+ */
 static void mipspmu_disable(struct pmu *pmu)
 {
-	if (mipspmu)
-		mipspmu->stop();
+	pause_local_counters();
+#ifdef CONFIG_MIPS_MT_SMP
+	write_lock(&pmuint_rwlock);
+#endif
 }
 
 static atomic_t active_events = ATOMIC_INIT(0);
@@ -312,21 +554,21 @@ static int mipspmu_get_irq(void)
 {
 	int err;
 
-	if (mipspmu->irq >= 0) {
+	if (mipspmu.irq >= 0) {
 		/* Request my own irq handler. */
-		err = request_irq(mipspmu->irq, mipspmu->handle_irq,
-			IRQF_DISABLED | IRQF_NOBALANCING,
+		err = request_irq(mipspmu.irq, mipsxx_pmu_handle_irq,
+			IRQF_PERCPU | IRQF_NOBALANCING,
 			"mips_perf_pmu", NULL);
 		if (err) {
 			pr_warning("Unable to request IRQ%d for MIPS "
-			   "performance counters!\n", mipspmu->irq);
+			   "performance counters!\n", mipspmu.irq);
 		}
 	} else if (cp0_perfcount_irq < 0) {
 		/*
 		 * We are sharing the irq number with the timer interrupt.
 		 */
 		save_perf_irq = perf_irq;
-		perf_irq = mipspmu->handle_shared_irq;
+		perf_irq = mipsxx_pmu_handle_shared_irq;
 		err = 0;
 	} else {
 		pr_warning("The platform hasn't properly defined its "
@@ -339,8 +581,8 @@ static int mipspmu_get_irq(void)
 
 static void mipspmu_free_irq(void)
 {
-	if (mipspmu->irq >= 0)
-		free_irq(mipspmu->irq, NULL);
+	if (mipspmu.irq >= 0)
+		free_irq(mipspmu.irq, NULL);
 	else if (cp0_perfcount_irq < 0)
 		perf_irq = save_perf_irq;
 }
@@ -361,7 +603,7 @@ static void hw_perf_event_destroy(struct perf_event *event)
 		 * disabled.
 		 */
 		on_each_cpu(reset_counters,
-			(void *)(long)mipspmu->num_counters, 1);
+			(void *)(long)mipspmu.num_counters, 1);
 		mipspmu_free_irq();
 		mutex_unlock(&pmu_reserve_mutex);
 	}
@@ -381,8 +623,8 @@ static int mipspmu_event_init(struct perf_event *event)
 		return -ENOENT;
 	}
 
-	if (!mipspmu || event->cpu >= nr_cpumask_bits ||
-		(event->cpu >= 0 && !cpu_online(event->cpu)))
+	if (event->cpu >= nr_cpumask_bits ||
+	    (event->cpu >= 0 && !cpu_online(event->cpu)))
 		return -ENODEV;
 
 	if (!atomic_inc_not_zero(&active_events)) {
@@ -441,9 +683,9 @@ static const struct mips_perf_event *mipspmu_map_general_event(int idx)
 {
 	const struct mips_perf_event *pev;
 
-	pev = ((*mipspmu->general_event_map)[idx].event_id ==
+	pev = ((*mipspmu.general_event_map)[idx].event_id ==
 		UNSUPPORTED_PERF_EVENT_ID ? ERR_PTR(-EOPNOTSUPP) :
-		&(*mipspmu->general_event_map)[idx]);
+		&(*mipspmu.general_event_map)[idx]);
 
 	return pev;
 }
@@ -465,7 +707,7 @@ static const struct mips_perf_event *mipspmu_map_cache_event(u64 config)
 	if (cache_result >= PERF_COUNT_HW_CACHE_RESULT_MAX)
 		return ERR_PTR(-EINVAL);
 
-	pev = &((*mipspmu->cache_event_map)
+	pev = &((*mipspmu.cache_event_map)
 					[cache_type]
 					[cache_op]
 					[cache_result]);
@@ -486,7 +728,7 @@ static int validate_event(struct cpu_hw_events *cpuc,
 	if (event->pmu != &pmu || event->state <= PERF_EVENT_STATE_OFF)
 		return 1;
 
-	return mipspmu->alloc_counter(cpuc, &fake_hwc) >= 0;
+	return mipsxx_pmu_alloc_counter(cpuc, &fake_hwc) >= 0;
 }
 
 static int validate_group(struct perf_event *event)
@@ -524,123 +766,9 @@ static void handle_associated_event(struct cpu_hw_events *cpuc,
 		return;
 
 	if (perf_event_overflow(event, 0, data, regs))
-		mipspmu->disable_event(idx);
-}
-
-#define M_CONFIG1_PC	(1 << 4)
-
-#define M_PERFCTL_EXL			(1UL      <<  0)
-#define M_PERFCTL_KERNEL		(1UL      <<  1)
-#define M_PERFCTL_SUPERVISOR		(1UL      <<  2)
-#define M_PERFCTL_USER			(1UL      <<  3)
-#define M_PERFCTL_INTERRUPT_ENABLE	(1UL      <<  4)
-#define M_PERFCTL_EVENT(event)		(((event) & 0x3ff)  << 5)
-#define M_PERFCTL_VPEID(vpe)		((vpe)    << 16)
-#define M_PERFCTL_MT_EN(filter)		((filter) << 20)
-#define    M_TC_EN_ALL			M_PERFCTL_MT_EN(0)
-#define    M_TC_EN_VPE			M_PERFCTL_MT_EN(1)
-#define    M_TC_EN_TC			M_PERFCTL_MT_EN(2)
-#define M_PERFCTL_TCID(tcid)		((tcid)   << 22)
-#define M_PERFCTL_WIDE			(1UL      << 30)
-#define M_PERFCTL_MORE			(1UL      << 31)
-
-#define M_PERFCTL_COUNT_EVENT_WHENEVER	(M_PERFCTL_EXL |		\
-					M_PERFCTL_KERNEL |		\
-					M_PERFCTL_USER |		\
-					M_PERFCTL_SUPERVISOR |		\
-					M_PERFCTL_INTERRUPT_ENABLE)
-
-#ifdef CONFIG_MIPS_MT_SMP
-#define M_PERFCTL_CONFIG_MASK		0x3fff801f
-#else
-#define M_PERFCTL_CONFIG_MASK		0x1f
-#endif
-#define M_PERFCTL_EVENT_MASK		0xfe0
-
-#define M_COUNTER_OVERFLOW		(1UL      << 31)
-
-#ifdef CONFIG_MIPS_MT_SMP
-static int cpu_has_mipsmt_pertccounters;
-
-/*
- * FIXME: For VSMP, vpe_id() is redefined for Perf-events, because
- * cpu_data[cpuid].vpe_id reports 0 for _both_ CPUs.
- */
-#if defined(CONFIG_HW_PERF_EVENTS)
-#define vpe_id()	(cpu_has_mipsmt_pertccounters ? \
-			0 : smp_processor_id())
-#else
-#define vpe_id()	(cpu_has_mipsmt_pertccounters ? \
-			0 : cpu_data[smp_processor_id()].vpe_id)
-#endif
-
-/* Copied from op_model_mipsxx.c */
-static unsigned int vpe_shift(void)
-{
-	if (num_possible_cpus() > 1)
-		return 1;
-
-	return 0;
+		mipsxx_pmu_disable_event(idx);
 }
 
-static unsigned int counters_total_to_per_cpu(unsigned int counters)
-{
-	return counters >> vpe_shift();
-}
-
-static unsigned int counters_per_cpu_to_total(unsigned int counters)
-{
-	return counters << vpe_shift();
-}
-
-#else /* !CONFIG_MIPS_MT_SMP */
-#define vpe_id()	0
-
-#endif /* CONFIG_MIPS_MT_SMP */
-
-#define __define_perf_accessors(r, n, np)				\
-									\
-static unsigned int r_c0_ ## r ## n(void)				\
-{									\
-	unsigned int cpu = vpe_id();					\
-									\
-	switch (cpu) {							\
-	case 0:								\
-		return read_c0_ ## r ## n();				\
-	case 1:								\
-		return read_c0_ ## r ## np();				\
-	default:							\
-		BUG();							\
-	}								\
-	return 0;							\
-}									\
-									\
-static void w_c0_ ## r ## n(unsigned int value)				\
-{									\
-	unsigned int cpu = vpe_id();					\
-									\
-	switch (cpu) {							\
-	case 0:								\
-		write_c0_ ## r ## n(value);				\
-		return;							\
-	case 1:								\
-		write_c0_ ## r ## np(value);				\
-		return;							\
-	default:							\
-		BUG();							\
-	}								\
-	return;								\
-}									\
-
-__define_perf_accessors(perfcntr, 0, 2)
-__define_perf_accessors(perfcntr, 1, 3)
-__define_perf_accessors(perfcntr, 2, 0)
-__define_perf_accessors(perfcntr, 3, 1)
-
-__define_perf_accessors(perfctrl, 0, 2)
-__define_perf_accessors(perfctrl, 1, 3)
-__define_perf_accessors(perfctrl, 2, 0)
-__define_perf_accessors(perfctrl, 3, 1)
 
 static int __n_counters(void)
 {
@@ -682,94 +810,20 @@ static void reset_counters(void *arg)
 	int counters = (int)(long)arg;
 	switch (counters) {
 	case 4:
-		w_c0_perfctrl3(0);
-		w_c0_perfcntr3(0);
+		mipsxx_pmu_write_control(3, 0);
+		mipspmu.write_counter(3, 0);
 	case 3:
-		w_c0_perfctrl2(0);
-		w_c0_perfcntr2(0);
+		mipsxx_pmu_write_control(2, 0);
+		mipspmu.write_counter(2, 0);
 	case 2:
-		w_c0_perfctrl1(0);
-		w_c0_perfcntr1(0);
-	case 1:
-		w_c0_perfctrl0(0);
-		w_c0_perfcntr0(0);
-	}
-}
-
-static u64 mipsxx_pmu_read_counter(unsigned int idx)
-{
-	switch (idx) {
-	case 0:
-		return r_c0_perfcntr0();
+		mipsxx_pmu_write_control(1, 0);
+		mipspmu.write_counter(1, 0);
 	case 1:
-		return r_c0_perfcntr1();
-	case 2:
-		return r_c0_perfcntr2();
-	case 3:
-		return r_c0_perfcntr3();
-	default:
-		WARN_ONCE(1, "Invalid performance counter number (%d)\n", idx);
-		return 0;
-	}
-}
-
-static void mipsxx_pmu_write_counter(unsigned int idx, u64 val)
-{
-	switch (idx) {
-	case 0:
-		w_c0_perfcntr0(val);
-		return;
-	case 1:
-		w_c0_perfcntr1(val);
-		return;
-	case 2:
-		w_c0_perfcntr2(val);
-		return;
-	case 3:
-		w_c0_perfcntr3(val);
-		return;
-	}
-}
-
-static unsigned int mipsxx_pmu_read_control(unsigned int idx)
-{
-	switch (idx) {
-	case 0:
-		return r_c0_perfctrl0();
-	case 1:
-		return r_c0_perfctrl1();
-	case 2:
-		return r_c0_perfctrl2();
-	case 3:
-		return r_c0_perfctrl3();
-	default:
-		WARN_ONCE(1, "Invalid performance counter number (%d)\n", idx);
-		return 0;
-	}
-}
-
-static void mipsxx_pmu_write_control(unsigned int idx, unsigned int val)
-{
-	switch (idx) {
-	case 0:
-		w_c0_perfctrl0(val);
-		return;
-	case 1:
-		w_c0_perfctrl1(val);
-		return;
-	case 2:
-		w_c0_perfctrl2(val);
-		return;
-	case 3:
-		w_c0_perfctrl3(val);
-		return;
+		mipsxx_pmu_write_control(0, 0);
+		mipspmu.write_counter(0, 0);
 	}
 }
 
-#ifdef CONFIG_MIPS_MT_SMP
-static DEFINE_RWLOCK(pmuint_rwlock);
-#endif
-
 /* 24K/34K/1004K cores can share the same event map. */
 static const struct mips_perf_event mipsxxcore_event_map
 				[PERF_COUNT_HW_MAX] = {
@@ -1047,7 +1101,7 @@ static int __hw_perf_event_init(struct perf_event *event)
 	} else if (PERF_TYPE_RAW == event->attr.type) {
 		/* We are working on the global raw event. */
 		mutex_lock(&raw_event_mutex);
-		pev = mipspmu->map_raw_event(event->attr.config);
+		pev = mipspmu.map_raw_event(event->attr.config);
 	} else {
 		/* The event type is not (yet) supported. */
 		return -EOPNOTSUPP;
@@ -1092,7 +1146,7 @@ static int __hw_perf_event_init(struct perf_event *event)
 	hwc->config = 0;
 
 	if (!hwc->sample_period) {
-		hwc->sample_period  = MAX_PERIOD;
+		hwc->sample_period  = mipspmu.max_period;
 		hwc->last_period    = hwc->sample_period;
 		local64_set(&hwc->period_left, hwc->sample_period);
 	}
@@ -1105,55 +1159,38 @@ static int __hw_perf_event_init(struct perf_event *event)
 	}
 
 	event->destroy = hw_perf_event_destroy;
-
 	return err;
 }
 
 static void pause_local_counters(void)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-	int counters = mipspmu->num_counters;
+	int ctr = mipspmu.num_counters;
 	unsigned long flags;
 
 	local_irq_save(flags);
-	switch (counters) {
-	case 4:
-		cpuc->saved_ctrl[3] = r_c0_perfctrl3();
-		w_c0_perfctrl3(cpuc->saved_ctrl[3] &
-			~M_PERFCTL_COUNT_EVENT_WHENEVER);
-	case 3:
-		cpuc->saved_ctrl[2] = r_c0_perfctrl2();
-		w_c0_perfctrl2(cpuc->saved_ctrl[2] &
-			~M_PERFCTL_COUNT_EVENT_WHENEVER);
-	case 2:
-		cpuc->saved_ctrl[1] = r_c0_perfctrl1();
-		w_c0_perfctrl1(cpuc->saved_ctrl[1] &
-			~M_PERFCTL_COUNT_EVENT_WHENEVER);
-	case 1:
-		cpuc->saved_ctrl[0] = r_c0_perfctrl0();
-		w_c0_perfctrl0(cpuc->saved_ctrl[0] &
-			~M_PERFCTL_COUNT_EVENT_WHENEVER);
-	}
+	do {
+		ctr--;
+		cpuc->saved_ctrl[ctr] = mipsxx_pmu_read_control(ctr);
+		mipsxx_pmu_write_control(ctr, cpuc->saved_ctrl[ctr] &
+					 ~M_PERFCTL_COUNT_EVENT_WHENEVER);
+	} while (ctr > 0);
 	local_irq_restore(flags);
 }
 
 static void resume_local_counters(void)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-	int counters = mipspmu->num_counters;
+	int ctr = mipspmu.num_counters;
 	unsigned long flags;
 
 	local_irq_save(flags);
-	switch (counters) {
-	case 4:
-		w_c0_perfctrl3(cpuc->saved_ctrl[3]);
-	case 3:
-		w_c0_perfctrl2(cpuc->saved_ctrl[2]);
-	case 2:
-		w_c0_perfctrl1(cpuc->saved_ctrl[1]);
-	case 1:
-		w_c0_perfctrl0(cpuc->saved_ctrl[0]);
-	}
+
+	do {
+		ctr--;
+		mipsxx_pmu_write_control(ctr, cpuc->saved_ctrl[ctr]);
+	} while (ctr > 0);
+
 	local_irq_restore(flags);
 }
 
@@ -1161,14 +1198,13 @@ static int mipsxx_pmu_handle_shared_irq(void)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	struct perf_sample_data data;
-	unsigned int counters = mipspmu->num_counters;
-	unsigned int counter;
+	unsigned int counters = mipspmu.num_counters;
+	u64 counter;
 	int handled = IRQ_NONE;
 	struct pt_regs *regs;
 
 	if (cpu_has_mips_r2 && !(read_c0_cause() & (1 << 26)))
 		return handled;
-
 	/*
 	 * First we pause the local counters, so that when we are locked
 	 * here, the counters are all paused. When it gets locked due to
@@ -1189,13 +1225,9 @@ static int mipsxx_pmu_handle_shared_irq(void)
 #define HANDLE_COUNTER(n)						\
 	case n + 1:							\
 		if (test_bit(n, cpuc->used_mask)) {			\
-			counter = r_c0_perfcntr ## n();			\
-			if (counter & M_COUNTER_OVERFLOW) {		\
-				w_c0_perfcntr ## n(counter &		\
-						VALID_COUNT);		\
-				if (test_and_change_bit(n, cpuc->msbs))	\
-					handle_associated_event(cpuc,	\
-						n, &data, regs);	\
+			counter = mipspmu.read_counter(n);		\
+			if (counter & mipspmu.overflow) {		\
+				handle_associated_event(cpuc, n, &data, regs); \
 				handled = IRQ_HANDLED;			\
 			}						\
 		}
@@ -1225,95 +1257,6 @@ static irqreturn_t mipsxx_pmu_handle_irq(int irq, void *dev)
 	return mipsxx_pmu_handle_shared_irq();
 }
 
-static void mipsxx_pmu_start(void)
-{
-#ifdef CONFIG_MIPS_MT_SMP
-	write_unlock(&pmuint_rwlock);
-#endif
-	resume_local_counters();
-}
-
-/*
- * MIPS performance counters can be per-TC. The control registers can
- * not be directly accessed accross CPUs. Hence if we want to do global
- * control, we need cross CPU calls. on_each_cpu() can help us, but we
- * can not make sure this function is called with interrupts enabled. So
- * here we pause local counters and then grab a rwlock and leave the
- * counters on other CPUs alone. If any counter interrupt raises while
- * we own the write lock, simply pause local counters on that CPU and
- * spin in the handler. Also we know we won't be switched to another
- * CPU after pausing local counters and before grabbing the lock.
- */
-static void mipsxx_pmu_stop(void)
-{
-	pause_local_counters();
-#ifdef CONFIG_MIPS_MT_SMP
-	write_lock(&pmuint_rwlock);
-#endif
-}
-
-static int mipsxx_pmu_alloc_counter(struct cpu_hw_events *cpuc,
-				    struct hw_perf_event *hwc)
-{
-	int i;
-
-	/*
-	 * We only need to care the counter mask. The range has been
-	 * checked definitely.
-	 */
-	unsigned long cntr_mask = (hwc->event_base >> 8) & 0xffff;
-
-	for (i = mipspmu->num_counters - 1; i >= 0; i--) {
-		/*
-		 * Note that some MIPS perf events can be counted by both
-		 * even and odd counters, wheresas many other are only by
-		 * even _or_ odd counters. This introduces an issue that
-		 * when the former kind of event takes the counter the
-		 * latter kind of event wants to use, then the "counter
-		 * allocation" for the latter event will fail. In fact if
-		 * they can be dynamically swapped, they both feel happy.
-		 * But here we leave this issue alone for now.
-		 */
-		if (test_bit(i, &cntr_mask) &&
-			!test_and_set_bit(i, cpuc->used_mask))
-			return i;
-	}
-
-	return -EAGAIN;
-}
-
-static void mipsxx_pmu_enable_event(struct hw_perf_event *evt, int idx)
-{
-	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-	unsigned long flags;
-
-	WARN_ON(idx < 0 || idx >= mipspmu->num_counters);
-
-	local_irq_save(flags);
-	cpuc->saved_ctrl[idx] = M_PERFCTL_EVENT(evt->event_base & 0xff) |
-		(evt->config_base & M_PERFCTL_CONFIG_MASK) |
-		/* Make sure interrupt enabled. */
-		M_PERFCTL_INTERRUPT_ENABLE;
-	/*
-	 * We do not actually let the counter run. Leave it until start().
-	 */
-	local_irq_restore(flags);
-}
-
-static void mipsxx_pmu_disable_event(int idx)
-{
-	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-	unsigned long flags;
-
-	WARN_ON(idx < 0 || idx >= mipspmu->num_counters);
-
-	local_irq_save(flags);
-	cpuc->saved_ctrl[idx] = mipsxx_pmu_read_control(idx) &
-		~M_PERFCTL_COUNT_EVENT_WHENEVER;
-	mipsxx_pmu_write_control(idx, cpuc->saved_ctrl[idx]);
-	local_irq_restore(flags);
-}
-
 /* 24K */
 #define IS_UNSUPPORTED_24K_EVENT(r, b)					\
 	((b) == 12 || (r) == 151 || (r) == 152 || (b) == 26 ||		\
@@ -1452,40 +1395,11 @@ static const struct mips_perf_event *mipsxx_pmu_map_raw_event(u64 config)
 	return &raw_event;
 }
 
-static struct mips_pmu mipsxxcore_pmu = {
-	.handle_irq = mipsxx_pmu_handle_irq,
-	.handle_shared_irq = mipsxx_pmu_handle_shared_irq,
-	.start = mipsxx_pmu_start,
-	.stop = mipsxx_pmu_stop,
-	.alloc_counter = mipsxx_pmu_alloc_counter,
-	.read_counter = mipsxx_pmu_read_counter,
-	.write_counter = mipsxx_pmu_write_counter,
-	.enable_event = mipsxx_pmu_enable_event,
-	.disable_event = mipsxx_pmu_disable_event,
-	.map_raw_event = mipsxx_pmu_map_raw_event,
-	.general_event_map = &mipsxxcore_event_map,
-	.cache_event_map = &mipsxxcore_cache_map,
-};
-
-static struct mips_pmu mipsxx74Kcore_pmu = {
-	.handle_irq = mipsxx_pmu_handle_irq,
-	.handle_shared_irq = mipsxx_pmu_handle_shared_irq,
-	.start = mipsxx_pmu_start,
-	.stop = mipsxx_pmu_stop,
-	.alloc_counter = mipsxx_pmu_alloc_counter,
-	.read_counter = mipsxx_pmu_read_counter,
-	.write_counter = mipsxx_pmu_write_counter,
-	.enable_event = mipsxx_pmu_enable_event,
-	.disable_event = mipsxx_pmu_disable_event,
-	.map_raw_event = mipsxx_pmu_map_raw_event,
-	.general_event_map = &mipsxx74Kcore_event_map,
-	.cache_event_map = &mipsxx74Kcore_cache_map,
-};
-
 static int __init
 init_hw_perf_events(void)
 {
 	int counters, irq;
+	int counter_bits;
 
 	pr_info("Performance counters: ");
 
@@ -1517,32 +1431,28 @@ init_hw_perf_events(void)
 	}
 #endif
 
-	on_each_cpu(reset_counters, (void *)(long)counters, 1);
+	mipspmu.map_raw_event = mipsxx_pmu_map_raw_event;
 
 	switch (current_cpu_type()) {
 	case CPU_24K:
-		mipsxxcore_pmu.name = "mips/24K";
-		mipsxxcore_pmu.num_counters = counters;
-		mipsxxcore_pmu.irq = irq;
-		mipspmu = &mipsxxcore_pmu;
+		mipspmu.name = "mips/24K";
+		mipspmu.general_event_map = &mipsxxcore_event_map;
+		mipspmu.cache_event_map = &mipsxxcore_cache_map;
 		break;
 	case CPU_34K:
-		mipsxxcore_pmu.name = "mips/34K";
-		mipsxxcore_pmu.num_counters = counters;
-		mipsxxcore_pmu.irq = irq;
-		mipspmu = &mipsxxcore_pmu;
+		mipspmu.name = "mips/34K";
+		mipspmu.general_event_map = &mipsxxcore_event_map;
+		mipspmu.cache_event_map = &mipsxxcore_cache_map;
 		break;
 	case CPU_74K:
-		mipsxx74Kcore_pmu.name = "mips/74K";
-		mipsxx74Kcore_pmu.num_counters = counters;
-		mipsxx74Kcore_pmu.irq = irq;
-		mipspmu = &mipsxx74Kcore_pmu;
+		mipspmu.name = "mips/74K";
+		mipspmu.general_event_map = &mipsxx74Kcore_event_map;
+		mipspmu.cache_event_map = &mipsxx74Kcore_cache_map;
 		break;
 	case CPU_1004K:
-		mipsxxcore_pmu.name = "mips/1004K";
-		mipsxxcore_pmu.num_counters = counters;
-		mipsxxcore_pmu.irq = irq;
-		mipspmu = &mipsxxcore_pmu;
+		mipspmu.name = "mips/1004K";
+		mipspmu.general_event_map = &mipsxxcore_event_map;
+		mipspmu.cache_event_map = &mipsxxcore_cache_map;
 		break;
 	default:
 		pr_cont("Either hardware does not support performance "
@@ -1550,10 +1460,30 @@ init_hw_perf_events(void)
 		return -ENODEV;
 	}
 
-	if (mipspmu)
-		pr_cont("%s PMU enabled, %d counters available to each "
-			"CPU, irq %d%s\n", mipspmu->name, counters, irq,
-			irq < 0 ? " (share with timer interrupt)" : "");
+	mipspmu.num_counters = counters;
+	mipspmu.irq = irq;
+
+	if (read_c0_perfctrl0() & M_PERFCTL_WIDE) {
+		mipspmu.max_period = (1ULL << 63) - 1;
+		mipspmu.valid_count = (1ULL << 63) - 1;
+		mipspmu.overflow = 1ULL << 63;
+		mipspmu.read_counter = mipsxx_pmu_read_counter_64;
+		mipspmu.write_counter = mipsxx_pmu_write_counter_64;
+		counter_bits = 64;
+	} else {
+		mipspmu.max_period = (1ULL << 31) - 1;
+		mipspmu.valid_count = (1ULL << 31) - 1;
+		mipspmu.overflow = 1ULL << 31;
+		mipspmu.read_counter = mipsxx_pmu_read_counter;
+		mipspmu.write_counter = mipsxx_pmu_write_counter;
+		counter_bits = 32;
+	}
+
+	on_each_cpu(reset_counters, (void *)(long)counters, 1);
+
+	pr_cont("%s PMU enabled, %d %d-bit counters available to each "
+		"CPU, irq %d%s\n", mipspmu.name, counters, counter_bits, irq,
+		irq < 0 ? " (shared with timer interrupt)" : "");
 
 	perf_pmu_register(&pmu, "cpu", PERF_TYPE_RAW);
 
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/4] MIPS: perf: Add support for 64-bit perf counters.
  2011-01-21 22:59 ` [PATCH v2 4/4] MIPS: perf: Add support for 64-bit perf counters David Daney
@ 2011-01-25  3:42   ` Deng-Cheng Zhu
  2011-01-26  0:20     ` David Daney
  0 siblings, 1 reply; 15+ messages in thread
From: Deng-Cheng Zhu @ 2011-01-25  3:42 UTC (permalink / raw)
  To: David Daney
  Cc: linux-mips, ralf, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo

Hi, David


This version does fix the problem with 'perf stat'. However, when working
with 'perf record', the following happened:

-sh-4.0# perf record -f -e cycles -e instructions -e branches \
> -e branch-misses -e r12 find / -name "*sys*" >/dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data (~53 samples) ]
-sh-4.0#
-sh-4.0# perf report
#
# (For a higher level overview, try: perf report --sort comm,dso)
#

So with the series applied, essentially no hardware samples made it into
perf.data. Again, without the series applied, the results were:

-sh-4.0# perf record -f -e cycles -e instructions -e branches \
> -e branch-misses -e r12 find / -name "*sys*" >/dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.283 MB perf.data (~12368 samples) ]
-sh-4.0#
-sh-4.0# perf report
# Events: 3K cycles
#
# Overhead  Command      Shared Object  Symbol
# ........  .......  .................  ......
#
     5.99%     find  [kernel.kallsyms]  [k] sysfs_refresh_inode
     4.23%     find  find               [.] base_name
     3.96%     find  libc-2.9.so        [.] _int_malloc
[MORE DATA]


# Events: 2K instructions
#
# Overhead  Command      Shared Object  Symbol
# ........  .......  .................  ......
#
     5.74%     find  libc-2.9.so        [.] __GI_strlen
     4.27%     find  find               [.] base_name
     3.85%     find  [kernel.kallsyms]  [k] ext3fs_dirhash
[MORE DATA]


# Events: 924  branches
#
# Overhead  Command      Shared Object  Symbol
# ........  .......  .................  ......
#
    13.26%     find  find               [.] internal_fnmatch
     6.64%     find  libc-2.9.so        [.] _int_malloc
     6.17%     find  [kernel.kallsyms]  [k] fput
[MORE DATA]


# Events: 376  branch-misses
#
# Overhead  Command      Shared Object  Symbol
# ........  .......  .................  ......
#
    10.16%     find  find               [.] internal_fnmatch
     8.49%     find  libc-2.9.so        [.] __GI_memmove
     7.66%     find  libc-2.9.so        [.] __GI_strlen
[MORE DATA]


# Events: 465  raw 0x12
#
# Overhead  Command      Shared Object  Symbol
# ........  .......  .................  ......
#
     6.92%     find  libc-2.9.so        [.] __alloc_dir
     6.42%     find  [kernel.kallsyms]  [k] ext3_find_entry
     6.40%     find  [kernel.kallsyms]  [k] dcache_readdir
[MORE DATA]
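
To double-check my reading of the new counter-width handling, I put
together a minimal stand-alone sketch (plain user-space C, not kernel
code; the struct and its fields only loosely mirror the patch, and the
"hardware counter" is just a variable) of the two things the patch
parameterizes: the constants plus the read_counter/write_counter
pointers chosen at init time, and the preload arithmetic in
mipspmu_event_set_period():

#include <stdio.h>
#include <stdint.h>

static uint64_t hw_counter;	/* stands in for a c0 perf counter */

static uint64_t read_counter_32(void) { return hw_counter & 0xffffffffull; }
static void write_counter_32(uint64_t v) { hw_counter = v & 0xffffffffull; }
static uint64_t read_counter_64(void) { return hw_counter; }
static void write_counter_64(uint64_t v) { hw_counter = v; }

static struct {
	uint64_t max_period;
	uint64_t overflow;
	uint64_t (*read_counter)(void);
	void (*write_counter)(uint64_t v);
} pmu;

/* Models the M_PERFCTL_WIDE check in init_hw_perf_events(). */
static void pmu_init(int wide)
{
	int bits = wide ? 64 : 32;

	pmu.max_period = (1ull << (bits - 1)) - 1;
	pmu.overflow = 1ull << (bits - 1);
	pmu.read_counter = wide ? read_counter_64 : read_counter_32;
	pmu.write_counter = wide ? write_counter_64 : write_counter_32;
}

int main(void)
{
	uint64_t left = 1000;	/* events until the next sample is due */

	pmu_init(0);		/* 32-bit counters */
	/*
	 * Preload as in mipspmu_event_set_period(): after 'left' more
	 * events the counter reaches pmu.overflow, the bit the
	 * interrupt handler tests for.
	 */
	pmu.write_counter(pmu.overflow - left);
	hw_counter += left;	/* pretend 'left' events occurred */
	printf("overflow bit set: %d\n",
	       (pmu.read_counter() & pmu.overflow) != 0);
	return 0;
}

With this model the preload arithmetic works out for both widths, so the
'perf record' breakage above is presumably somewhere else.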


Deng-Cheng


2011/1/22 David Daney <ddaney@caviumnetworks.com>:
> The hard-coded constants are moved to struct mips_pmu.  All counter
> register accesses move to the read_counter and write_counter function
> pointers, which are set to either 32-bit or 64-bit access methods at
> initialization time.
>
> Many of the function pointers in struct mips_pmu were not needed, as
> there was only a single implementation; these were removed.
>
> I couldn't figure out what made struct cpu_hw_events.msbs[] useful at
> all, so I removed it too.
>
> Some functions and other declarations were reordered to reduce the
> need for forward declarations.
>
> Signed-off-by: David Daney <ddaney@caviumnetworks.com>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
> ---
>  arch/mips/kernel/perf_event_mipsxx.c |  844 ++++++++++++++++------------------
>  1 files changed, 387 insertions(+), 457 deletions(-)
>
> diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
> index 409207d..f15bb01 100644
> --- a/arch/mips/kernel/perf_event_mipsxx.c
> +++ b/arch/mips/kernel/perf_event_mipsxx.c
> @@ -2,6 +2,7 @@
>  * Linux performance counter support for MIPS.
>  *
>  * Copyright (C) 2010 MIPS Technologies, Inc.
> + * Copyright (C) 2011 Cavium Networks, Inc.
>  * Author: Deng-Cheng Zhu
>  *
>  * This code is based on the implementation for ARM, which is in turn
> @@ -26,12 +27,6 @@
>  #include <asm/stacktrace.h>
>  #include <asm/time.h> /* For perf_irq */
>
> -/* These are for 32bit counters. For 64bit ones, define them accordingly. */
> -#define MAX_PERIOD     ((1ULL << 32) - 1)
> -#define VALID_COUNT    0x7fffffff
> -#define TOTAL_BITS     32
> -#define HIGHEST_BIT    31
> -
>  #define MIPS_MAX_HWEVENTS 4
>
>  struct cpu_hw_events {
> @@ -45,15 +40,6 @@ struct cpu_hw_events {
>        unsigned long           used_mask[BITS_TO_LONGS(MIPS_MAX_HWEVENTS)];
>
>        /*
> -        * The borrowed MSB for the performance counter. A MIPS performance
> -        * counter uses its bit 31 (for 32bit counters) or bit 63 (for 64bit
> -        * counters) as a factor of determining whether a counter overflow
> -        * should be signaled. So here we use a separate MSB for each
> -        * counter to make things easy.
> -        */
> -       unsigned long           msbs[BITS_TO_LONGS(MIPS_MAX_HWEVENTS)];
> -
> -       /*
>         * Software copy of the control register for each performance counter.
>         * MIPS CPUs vary in performance counters. They use this differently,
>         * and even may not use it.
> @@ -75,6 +61,7 @@ struct mips_perf_event {
>        unsigned int cntr_mask;
>        #define CNTR_EVEN       0x55555555
>        #define CNTR_ODD        0xaaaaaaaa
> +       #define CNTR_ALL        0xffffffff
>  #ifdef CONFIG_MIPS_MT_SMP
>        enum {
>                T  = 0,
> @@ -95,18 +82,13 @@ static DEFINE_MUTEX(raw_event_mutex);
>  #define C(x) PERF_COUNT_HW_CACHE_##x
>
>  struct mips_pmu {
> +       u64             max_period;
> +       u64             valid_count;
> +       u64             overflow;
>        const char      *name;
>        int             irq;
> -       irqreturn_t     (*handle_irq)(int irq, void *dev);
> -       int             (*handle_shared_irq)(void);
> -       void            (*start)(void);
> -       void            (*stop)(void);
> -       int             (*alloc_counter)(struct cpu_hw_events *cpuc,
> -                                       struct hw_perf_event *hwc);
>        u64             (*read_counter)(unsigned int idx);
>        void            (*write_counter)(unsigned int idx, u64 val);
> -       void            (*enable_event)(struct hw_perf_event *evt, int idx);
> -       void            (*disable_event)(int idx);
>        const struct mips_perf_event *(*map_raw_event)(u64 config);
>        const struct mips_perf_event (*general_event_map)[PERF_COUNT_HW_MAX];
>        const struct mips_perf_event (*cache_event_map)
> @@ -116,43 +98,302 @@ struct mips_pmu {
>        unsigned int    num_counters;
>  };
>
> -static const struct mips_pmu *mipspmu;
> +static struct mips_pmu mipspmu;
> +
> +#define M_CONFIG1_PC   (1 << 4)
> +
> +#define M_PERFCTL_EXL                  (1      <<  0)
> +#define M_PERFCTL_KERNEL               (1      <<  1)
> +#define M_PERFCTL_SUPERVISOR           (1      <<  2)
> +#define M_PERFCTL_USER                 (1      <<  3)
> +#define M_PERFCTL_INTERRUPT_ENABLE     (1      <<  4)
> +#define M_PERFCTL_EVENT(event)         (((event) & 0x3ff)  << 5)
> +#define M_PERFCTL_VPEID(vpe)           ((vpe)    << 16)
> +#define M_PERFCTL_MT_EN(filter)                ((filter) << 20)
> +#define    M_TC_EN_ALL                 M_PERFCTL_MT_EN(0)
> +#define    M_TC_EN_VPE                 M_PERFCTL_MT_EN(1)
> +#define    M_TC_EN_TC                  M_PERFCTL_MT_EN(2)
> +#define M_PERFCTL_TCID(tcid)           ((tcid)   << 22)
> +#define M_PERFCTL_WIDE                 (1      << 30)
> +#define M_PERFCTL_MORE                 (1      << 31)
> +
> +#define M_PERFCTL_COUNT_EVENT_WHENEVER (M_PERFCTL_EXL |                \
> +                                       M_PERFCTL_KERNEL |              \
> +                                       M_PERFCTL_USER |                \
> +                                       M_PERFCTL_SUPERVISOR |          \
> +                                       M_PERFCTL_INTERRUPT_ENABLE)
> +
> +#ifdef CONFIG_MIPS_MT_SMP
> +#define M_PERFCTL_CONFIG_MASK          0x3fff801f
> +#else
> +#define M_PERFCTL_CONFIG_MASK          0x1f
> +#endif
> +#define M_PERFCTL_EVENT_MASK           0xfe0
> +
> +
> +#ifdef CONFIG_MIPS_MT_SMP
> +static int cpu_has_mipsmt_pertccounters;
> +
> +static DEFINE_RWLOCK(pmuint_rwlock);
> +
> +/*
> + * FIXME: For VSMP, vpe_id() is redefined for Perf-events, because
> + * cpu_data[cpuid].vpe_id reports 0 for _both_ CPUs.
> + */
> +#if defined(CONFIG_HW_PERF_EVENTS)
> +#define vpe_id()       (cpu_has_mipsmt_pertccounters ? \
> +                       0 : smp_processor_id())
> +#else
> +#define vpe_id()       (cpu_has_mipsmt_pertccounters ? \
> +                       0 : cpu_data[smp_processor_id()].vpe_id)
> +#endif
> +
> +/* Copied from op_model_mipsxx.c */
> +static unsigned int vpe_shift(void)
> +{
> +       if (num_possible_cpus() > 1)
> +               return 1;
> +
> +       return 0;
> +}
> +
> +static unsigned int counters_total_to_per_cpu(unsigned int counters)
> +{
> +       return counters >> vpe_shift();
> +}
> +
> +static unsigned int counters_per_cpu_to_total(unsigned int counters)
> +{
> +       return counters << vpe_shift();
> +}
> +
> +#else /* !CONFIG_MIPS_MT_SMP */
> +#define vpe_id()       0
> +
> +#endif /* CONFIG_MIPS_MT_SMP */
> +
> +static void resume_local_counters(void);
> +static void pause_local_counters(void);
> +static irqreturn_t mipsxx_pmu_handle_irq(int, void *);
> +static int mipsxx_pmu_handle_shared_irq(void);
> +
> +static unsigned int mipsxx_pmu_swizzle_perf_idx(unsigned int idx)
> +{
> +       if (vpe_id() == 1)
> +               idx = (idx + 2) & 3;
> +       return idx;
> +}
> +
> +static u64 mipsxx_pmu_read_counter(unsigned int idx)
> +{
> +       idx = mipsxx_pmu_swizzle_perf_idx(idx);
> +
> +       switch (idx) {
> +       case 0:
> +               return read_c0_perfcntr0();
> +       case 1:
> +               return read_c0_perfcntr1();
> +       case 2:
> +               return read_c0_perfcntr2();
> +       case 3:
> +               return read_c0_perfcntr3();
> +       default:
> +               WARN_ONCE(1, "Invalid performance counter number (%d)\n", idx);
> +               return 0;
> +       }
> +}
> +
> +static u64 mipsxx_pmu_read_counter_64(unsigned int idx)
> +{
> +       idx = mipsxx_pmu_swizzle_perf_idx(idx);
> +
> +       switch (idx) {
> +       case 0:
> +               return read_c0_perfcntr0_64();
> +       case 1:
> +               return read_c0_perfcntr1_64();
> +       case 2:
> +               return read_c0_perfcntr2_64();
> +       case 3:
> +               return read_c0_perfcntr3_64();
> +       default:
> +               WARN_ONCE(1, "Invalid performance counter number (%d)\n", idx);
> +               return 0;
> +       }
> +}
> +
> +static void mipsxx_pmu_write_counter(unsigned int idx, u64 val)
> +{
> +       idx = mipsxx_pmu_swizzle_perf_idx(idx);
> +
> +       switch (idx) {
> +       case 0:
> +               write_c0_perfcntr0(val);
> +               return;
> +       case 1:
> +               write_c0_perfcntr1(val);
> +               return;
> +       case 2:
> +               write_c0_perfcntr2(val);
> +               return;
> +       case 3:
> +               write_c0_perfcntr3(val);
> +               return;
> +       }
> +}
> +
> +static void mipsxx_pmu_write_counter_64(unsigned int idx, u64 val)
> +{
> +       idx = mipsxx_pmu_swizzle_perf_idx(idx);
> +
> +       switch (idx) {
> +       case 0:
> +               write_c0_perfcntr0_64(val);
> +               return;
> +       case 1:
> +               write_c0_perfcntr1_64(val);
> +               return;
> +       case 2:
> +               write_c0_perfcntr2_64(val);
> +               return;
> +       case 3:
> +               write_c0_perfcntr3_64(val);
> +               return;
> +       }
> +}
> +
> +static unsigned int mipsxx_pmu_read_control(unsigned int idx)
> +{
> +       idx = mipsxx_pmu_swizzle_perf_idx(idx);
> +
> +       switch (idx) {
> +       case 0:
> +               return read_c0_perfctrl0();
> +       case 1:
> +               return read_c0_perfctrl1();
> +       case 2:
> +               return read_c0_perfctrl2();
> +       case 3:
> +               return read_c0_perfctrl3();
> +       default:
> +               WARN_ONCE(1, "Invalid performance counter number (%d)\n", idx);
> +               return 0;
> +       }
> +}
> +
> +static void mipsxx_pmu_write_control(unsigned int idx, unsigned int val)
> +{
> +       idx = mipsxx_pmu_swizzle_perf_idx(idx);
> +
> +       switch (idx) {
> +       case 0:
> +               write_c0_perfctrl0(val);
> +               return;
> +       case 1:
> +               write_c0_perfctrl1(val);
> +               return;
> +       case 2:
> +               write_c0_perfctrl2(val);
> +               return;
> +       case 3:
> +               write_c0_perfctrl3(val);
> +               return;
> +       }
> +}
> +
> +static int mipsxx_pmu_alloc_counter(struct cpu_hw_events *cpuc,
> +                                   struct hw_perf_event *hwc)
> +{
> +       int i;
> +
> +       /*
> +        * We only need to care about the counter mask. The range has
> +        * already been checked.
> +        */
> +       unsigned long cntr_mask = (hwc->event_base >> 8) & 0xffff;
> +
> +       for (i = mipspmu.num_counters - 1; i >= 0; i--) {
> +               /*
> +                * Note that some MIPS perf events can be counted by both
> +                * even and odd counters, whereas many others can only be
> +                * counted by even _or_ odd counters. This introduces an
> +                * issue: if an event of the former kind occupies the
> +                * counter that an event of the latter kind needs, the
> +                * counter allocation for the latter event fails. If the
> +                * two could be swapped dynamically, both events could be
> +                * satisfied, but we leave that issue alone for now.
> +                */
> +               if (test_bit(i, &cntr_mask) &&
> +                       !test_and_set_bit(i, cpuc->used_mask))
> +                       return i;
> +       }
> +
> +       return -EAGAIN;
> +}
> +
> +static void mipsxx_pmu_enable_event(struct hw_perf_event *evt, int idx)
> +{
> +       struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> +       unsigned long flags;
> +
> +       WARN_ON(idx < 0 || idx >= mipspmu.num_counters);
> +
> +       local_irq_save(flags);
> +       cpuc->saved_ctrl[idx] = M_PERFCTL_EVENT(evt->event_base & 0xff) |
> +               (evt->config_base & M_PERFCTL_CONFIG_MASK) |
> +               /* Make sure the interrupt is enabled. */
> +               M_PERFCTL_INTERRUPT_ENABLE;
> +       /*
> +        * We do not actually let the counter run. Leave it until start().
> +        */
> +       local_irq_restore(flags);
> +}
> +
> +static void mipsxx_pmu_disable_event(int idx)
> +{
> +       struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> +       unsigned long flags;
> +
> +       WARN_ON(idx < 0 || idx >= mipspmu.num_counters);
> +
> +       local_irq_save(flags);
> +       cpuc->saved_ctrl[idx] = mipsxx_pmu_read_control(idx) &
> +               ~M_PERFCTL_COUNT_EVENT_WHENEVER;
> +       mipsxx_pmu_write_control(idx, cpuc->saved_ctrl[idx]);
> +       local_irq_restore(flags);
> +}
>
>  static int mipspmu_event_set_period(struct perf_event *event,
>                                    struct hw_perf_event *hwc,
>                                    int idx)
>  {
> -       struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> -       s64 left = local64_read(&hwc->period_left);
> -       s64 period = hwc->sample_period;
> +       u64 left = local64_read(&hwc->period_left);
> +       u64 period = hwc->sample_period;
>        int ret = 0;
> -       u64 uleft;
>        unsigned long flags;
>
> -       if (unlikely(left <= -period)) {
> +       if (unlikely((left + period) & (1ULL << 63))) {
>                left = period;
>                local64_set(&hwc->period_left, left);
>                hwc->last_period = period;
>                ret = 1;
>        }
>
> -       if (unlikely(left <= 0)) {
> +
> +       if (unlikely((left + period) <= period)) {
>                left += period;
>                local64_set(&hwc->period_left, left);
>                hwc->last_period = period;
>                ret = 1;
>        }
>
> -       if (left > (s64)MAX_PERIOD)
> -               left = MAX_PERIOD;
> +       if (left > mipspmu.max_period)
> +               left = mipspmu.max_period;
>
> -       local64_set(&hwc->prev_count, (u64)-left);
> +       local64_set(&hwc->prev_count, mipspmu.overflow - left);
>
>        local_irq_save(flags);
> -       uleft = (u64)(-left) & MAX_PERIOD;
> -       uleft > VALID_COUNT ?
> -               set_bit(idx, cpuc->msbs) : clear_bit(idx, cpuc->msbs);
> -       mipspmu->write_counter(idx, (u64)(-left) & VALID_COUNT);
> +       mipspmu.write_counter(idx, mipspmu.overflow - left);
>        local_irq_restore(flags);
>
>        perf_event_update_userpage(event);
> @@ -164,30 +405,22 @@ static void mipspmu_event_update(struct perf_event *event,
>                                 struct hw_perf_event *hwc,
>                                 int idx)
>  {
> -       struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
>        unsigned long flags;
> -       int shift = 64 - TOTAL_BITS;
> -       s64 prev_raw_count, new_raw_count;
> +       u64 prev_raw_count, new_raw_count;
>        u64 delta;
>
>  again:
>        prev_raw_count = local64_read(&hwc->prev_count);
>        local_irq_save(flags);
>        /* Make the counter value be a "real" one. */
> -       new_raw_count = mipspmu->read_counter(idx);
> -       if (new_raw_count & (test_bit(idx, cpuc->msbs) << HIGHEST_BIT)) {
> -               new_raw_count &= VALID_COUNT;
> -               clear_bit(idx, cpuc->msbs);
> -       } else
> -               new_raw_count |= (test_bit(idx, cpuc->msbs) << HIGHEST_BIT);
> +       new_raw_count = mipspmu.read_counter(idx);
>        local_irq_restore(flags);
>
>        if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
>                                new_raw_count) != prev_raw_count)
>                goto again;
>
> -       delta = (new_raw_count << shift) - (prev_raw_count << shift);
> -       delta >>= shift;
> +       delta = new_raw_count - prev_raw_count;
>
>        local64_add(delta, &event->count);
>        local64_sub(delta, &hwc->period_left);
> @@ -199,9 +432,6 @@ static void mipspmu_start(struct perf_event *event, int flags)
>  {
>        struct hw_perf_event *hwc = &event->hw;
>
> -       if (!mipspmu)
> -               return;
> -
>        if (flags & PERF_EF_RELOAD)
>                WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
>
> @@ -211,19 +441,16 @@ static void mipspmu_start(struct perf_event *event, int flags)
>        mipspmu_event_set_period(event, hwc, hwc->idx);
>
>        /* Enable the event. */
> -       mipspmu->enable_event(hwc, hwc->idx);
> +       mipsxx_pmu_enable_event(hwc, hwc->idx);
>  }
>
>  static void mipspmu_stop(struct perf_event *event, int flags)
>  {
>        struct hw_perf_event *hwc = &event->hw;
>
> -       if (!mipspmu)
> -               return;
> -
>        if (!(hwc->state & PERF_HES_STOPPED)) {
>                /* We are working on a local event. */
> -               mipspmu->disable_event(hwc->idx);
> +               mipsxx_pmu_disable_event(hwc->idx);
>                barrier();
>                mipspmu_event_update(event, hwc, hwc->idx);
>                hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
> @@ -240,7 +467,7 @@ static int mipspmu_add(struct perf_event *event, int flags)
>        perf_pmu_disable(event->pmu);
>
>        /* To look for a free counter for this event. */
> -       idx = mipspmu->alloc_counter(cpuc, hwc);
> +       idx = mipsxx_pmu_alloc_counter(cpuc, hwc);
>        if (idx < 0) {
>                err = idx;
>                goto out;
> @@ -251,7 +478,7 @@ static int mipspmu_add(struct perf_event *event, int flags)
>         * make sure it is disabled.
>         */
>        event->hw.idx = idx;
> -       mipspmu->disable_event(idx);
> +       mipsxx_pmu_disable_event(idx);
>        cpuc->events[idx] = event;
>
>        hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
> @@ -272,7 +499,7 @@ static void mipspmu_del(struct perf_event *event, int flags)
>        struct hw_perf_event *hwc = &event->hw;
>        int idx = hwc->idx;
>
> -       WARN_ON(idx < 0 || idx >= mipspmu->num_counters);
> +       WARN_ON(idx < 0 || idx >= mipspmu.num_counters);
>
>        mipspmu_stop(event, PERF_EF_UPDATE);
>        cpuc->events[idx] = NULL;
> @@ -294,14 +521,29 @@ static void mipspmu_read(struct perf_event *event)
>
>  static void mipspmu_enable(struct pmu *pmu)
>  {
> -       if (mipspmu)
> -               mipspmu->start();
> +#ifdef CONFIG_MIPS_MT_SMP
> +       write_unlock(&pmuint_rwlock);
> +#endif
> +       resume_local_counters();
>  }
>
> +/*
> + * MIPS performance counters can be per-TC. The control registers
> + * cannot be directly accessed across CPUs. Hence if we want to do
> + * global control, we need cross-CPU calls. on_each_cpu() could help,
> + * but we cannot be sure it is called with interrupts enabled. So
> + * here we pause the local counters, then grab a rwlock and leave the
> + * counters on other CPUs alone. If a counter interrupt is raised
> + * while we hold the write lock, the handler simply pauses the local
> + * counters on that CPU and spins until the lock is released. Also,
> + * we know we won't be migrated to another CPU after pausing the
> + * local counters and before grabbing the lock.
> + */
>  static void mipspmu_disable(struct pmu *pmu)
>  {
> -       if (mipspmu)
> -               mipspmu->stop();
> +       pause_local_counters();
> +#ifdef CONFIG_MIPS_MT_SMP
> +       write_lock(&pmuint_rwlock);
> +#endif
>  }
>
>  static atomic_t active_events = ATOMIC_INIT(0);
> @@ -312,21 +554,21 @@ static int mipspmu_get_irq(void)
>  {
>        int err;
>
> -       if (mipspmu->irq >= 0) {
> +       if (mipspmu.irq >= 0) {
>                /* Request my own irq handler. */
> -               err = request_irq(mipspmu->irq, mipspmu->handle_irq,
> -                       IRQF_DISABLED | IRQF_NOBALANCING,
> +               err = request_irq(mipspmu.irq, mipsxx_pmu_handle_irq,
> +                       IRQF_PERCPU | IRQF_NOBALANCING,
>                        "mips_perf_pmu", NULL);
>                if (err) {
>                        pr_warning("Unable to request IRQ%d for MIPS "
> -                          "performance counters!\n", mipspmu->irq);
> +                          "performance counters!\n", mipspmu.irq);
>                }
>        } else if (cp0_perfcount_irq < 0) {
>                /*
>                 * We are sharing the irq number with the timer interrupt.
>                 */
>                save_perf_irq = perf_irq;
> -               perf_irq = mipspmu->handle_shared_irq;
> +               perf_irq = mipsxx_pmu_handle_shared_irq;
>                err = 0;
>        } else {
>                pr_warning("The platform hasn't properly defined its "
> @@ -339,8 +581,8 @@ static int mipspmu_get_irq(void)
>
>  static void mipspmu_free_irq(void)
>  {
> -       if (mipspmu->irq >= 0)
> -               free_irq(mipspmu->irq, NULL);
> +       if (mipspmu.irq >= 0)
> +               free_irq(mipspmu.irq, NULL);
>        else if (cp0_perfcount_irq < 0)
>                perf_irq = save_perf_irq;
>  }
> @@ -361,7 +603,7 @@ static void hw_perf_event_destroy(struct perf_event *event)
>                 * disabled.
>                 */
>                on_each_cpu(reset_counters,
> -                       (void *)(long)mipspmu->num_counters, 1);
> +                       (void *)(long)mipspmu.num_counters, 1);
>                mipspmu_free_irq();
>                mutex_unlock(&pmu_reserve_mutex);
>        }
> @@ -381,8 +623,8 @@ static int mipspmu_event_init(struct perf_event *event)
>                return -ENOENT;
>        }
>
> -       if (!mipspmu || event->cpu >= nr_cpumask_bits ||
> -               (event->cpu >= 0 && !cpu_online(event->cpu)))
> +       if (event->cpu >= nr_cpumask_bits ||
> +           (event->cpu >= 0 && !cpu_online(event->cpu)))
>                return -ENODEV;
>
>        if (!atomic_inc_not_zero(&active_events)) {
> @@ -441,9 +683,9 @@ static const struct mips_perf_event *mipspmu_map_general_event(int idx)
>  {
>        const struct mips_perf_event *pev;
>
> -       pev = ((*mipspmu->general_event_map)[idx].event_id ==
> +       pev = ((*mipspmu.general_event_map)[idx].event_id ==
>                UNSUPPORTED_PERF_EVENT_ID ? ERR_PTR(-EOPNOTSUPP) :
> -               &(*mipspmu->general_event_map)[idx]);
> +               &(*mipspmu.general_event_map)[idx]);
>
>        return pev;
>  }
> @@ -465,7 +707,7 @@ static const struct mips_perf_event *mipspmu_map_cache_event(u64 config)
>        if (cache_result >= PERF_COUNT_HW_CACHE_RESULT_MAX)
>                return ERR_PTR(-EINVAL);
>
> -       pev = &((*mipspmu->cache_event_map)
> +       pev = &((*mipspmu.cache_event_map)
>                                        [cache_type]
>                                        [cache_op]
>                                        [cache_result]);
> @@ -486,7 +728,7 @@ static int validate_event(struct cpu_hw_events *cpuc,
>        if (event->pmu != &pmu || event->state <= PERF_EVENT_STATE_OFF)
>                return 1;
>
> -       return mipspmu->alloc_counter(cpuc, &fake_hwc) >= 0;
> +       return mipsxx_pmu_alloc_counter(cpuc, &fake_hwc) >= 0;
>  }
>
>  static int validate_group(struct perf_event *event)
> @@ -524,123 +766,9 @@ static void handle_associated_event(struct cpu_hw_events *cpuc,
>                return;
>
>        if (perf_event_overflow(event, 0, data, regs))
> -               mipspmu->disable_event(idx);
> -}
> -
> -#define M_CONFIG1_PC   (1 << 4)
> -
> -#define M_PERFCTL_EXL                  (1UL      <<  0)
> -#define M_PERFCTL_KERNEL               (1UL      <<  1)
> -#define M_PERFCTL_SUPERVISOR           (1UL      <<  2)
> -#define M_PERFCTL_USER                 (1UL      <<  3)
> -#define M_PERFCTL_INTERRUPT_ENABLE     (1UL      <<  4)
> -#define M_PERFCTL_EVENT(event)         (((event) & 0x3ff)  << 5)
> -#define M_PERFCTL_VPEID(vpe)           ((vpe)    << 16)
> -#define M_PERFCTL_MT_EN(filter)                ((filter) << 20)
> -#define    M_TC_EN_ALL                 M_PERFCTL_MT_EN(0)
> -#define    M_TC_EN_VPE                 M_PERFCTL_MT_EN(1)
> -#define    M_TC_EN_TC                  M_PERFCTL_MT_EN(2)
> -#define M_PERFCTL_TCID(tcid)           ((tcid)   << 22)
> -#define M_PERFCTL_WIDE                 (1UL      << 30)
> -#define M_PERFCTL_MORE                 (1UL      << 31)
> -
> -#define M_PERFCTL_COUNT_EVENT_WHENEVER (M_PERFCTL_EXL |                \
> -                                       M_PERFCTL_KERNEL |              \
> -                                       M_PERFCTL_USER |                \
> -                                       M_PERFCTL_SUPERVISOR |          \
> -                                       M_PERFCTL_INTERRUPT_ENABLE)
> -
> -#ifdef CONFIG_MIPS_MT_SMP
> -#define M_PERFCTL_CONFIG_MASK          0x3fff801f
> -#else
> -#define M_PERFCTL_CONFIG_MASK          0x1f
> -#endif
> -#define M_PERFCTL_EVENT_MASK           0xfe0
> -
> -#define M_COUNTER_OVERFLOW             (1UL      << 31)
> -
> -#ifdef CONFIG_MIPS_MT_SMP
> -static int cpu_has_mipsmt_pertccounters;
> -
> -/*
> - * FIXME: For VSMP, vpe_id() is redefined for Perf-events, because
> - * cpu_data[cpuid].vpe_id reports 0 for _both_ CPUs.
> - */
> -#if defined(CONFIG_HW_PERF_EVENTS)
> -#define vpe_id()       (cpu_has_mipsmt_pertccounters ? \
> -                       0 : smp_processor_id())
> -#else
> -#define vpe_id()       (cpu_has_mipsmt_pertccounters ? \
> -                       0 : cpu_data[smp_processor_id()].vpe_id)
> -#endif
> -
> -/* Copied from op_model_mipsxx.c */
> -static unsigned int vpe_shift(void)
> -{
> -       if (num_possible_cpus() > 1)
> -               return 1;
> -
> -       return 0;
> +               mipsxx_pmu_disable_event(idx);
>  }
>
> -static unsigned int counters_total_to_per_cpu(unsigned int counters)
> -{
> -       return counters >> vpe_shift();
> -}
> -
> -static unsigned int counters_per_cpu_to_total(unsigned int counters)
> -{
> -       return counters << vpe_shift();
> -}
> -
> -#else /* !CONFIG_MIPS_MT_SMP */
> -#define vpe_id()       0
> -
> -#endif /* CONFIG_MIPS_MT_SMP */
> -
> -#define __define_perf_accessors(r, n, np)                              \
> -                                                                       \
> -static unsigned int r_c0_ ## r ## n(void)                              \
> -{                                                                      \
> -       unsigned int cpu = vpe_id();                                    \
> -                                                                       \
> -       switch (cpu) {                                                  \
> -       case 0:                                                         \
> -               return read_c0_ ## r ## n();                            \
> -       case 1:                                                         \
> -               return read_c0_ ## r ## np();                           \
> -       default:                                                        \
> -               BUG();                                                  \
> -       }                                                               \
> -       return 0;                                                       \
> -}                                                                      \
> -                                                                       \
> -static void w_c0_ ## r ## n(unsigned int value)                                \
> -{                                                                      \
> -       unsigned int cpu = vpe_id();                                    \
> -                                                                       \
> -       switch (cpu) {                                                  \
> -       case 0:                                                         \
> -               write_c0_ ## r ## n(value);                             \
> -               return;                                                 \
> -       case 1:                                                         \
> -               write_c0_ ## r ## np(value);                            \
> -               return;                                                 \
> -       default:                                                        \
> -               BUG();                                                  \
> -       }                                                               \
> -       return;                                                         \
> -}                                                                      \
> -
> -__define_perf_accessors(perfcntr, 0, 2)
> -__define_perf_accessors(perfcntr, 1, 3)
> -__define_perf_accessors(perfcntr, 2, 0)
> -__define_perf_accessors(perfcntr, 3, 1)
> -
> -__define_perf_accessors(perfctrl, 0, 2)
> -__define_perf_accessors(perfctrl, 1, 3)
> -__define_perf_accessors(perfctrl, 2, 0)
> -__define_perf_accessors(perfctrl, 3, 1)
>
>  static int __n_counters(void)
>  {
> @@ -682,94 +810,20 @@ static void reset_counters(void *arg)
>        int counters = (int)(long)arg;
>        switch (counters) {
>        case 4:
> -               w_c0_perfctrl3(0);
> -               w_c0_perfcntr3(0);
> +               mipsxx_pmu_write_control(3, 0);
> +               mipspmu.write_counter(3, 0);
>        case 3:
> -               w_c0_perfctrl2(0);
> -               w_c0_perfcntr2(0);
> +               mipsxx_pmu_write_control(2, 0);
> +               mipspmu.write_counter(2, 0);
>        case 2:
> -               w_c0_perfctrl1(0);
> -               w_c0_perfcntr1(0);
> -       case 1:
> -               w_c0_perfctrl0(0);
> -               w_c0_perfcntr0(0);
> -       }
> -}
> -
> -static u64 mipsxx_pmu_read_counter(unsigned int idx)
> -{
> -       switch (idx) {
> -       case 0:
> -               return r_c0_perfcntr0();
> +               mipsxx_pmu_write_control(1, 0);
> +               mipspmu.write_counter(1, 0);
>        case 1:
> -               return r_c0_perfcntr1();
> -       case 2:
> -               return r_c0_perfcntr2();
> -       case 3:
> -               return r_c0_perfcntr3();
> -       default:
> -               WARN_ONCE(1, "Invalid performance counter number (%d)\n", idx);
> -               return 0;
> -       }
> -}
> -
> -static void mipsxx_pmu_write_counter(unsigned int idx, u64 val)
> -{
> -       switch (idx) {
> -       case 0:
> -               w_c0_perfcntr0(val);
> -               return;
> -       case 1:
> -               w_c0_perfcntr1(val);
> -               return;
> -       case 2:
> -               w_c0_perfcntr2(val);
> -               return;
> -       case 3:
> -               w_c0_perfcntr3(val);
> -               return;
> -       }
> -}
> -
> -static unsigned int mipsxx_pmu_read_control(unsigned int idx)
> -{
> -       switch (idx) {
> -       case 0:
> -               return r_c0_perfctrl0();
> -       case 1:
> -               return r_c0_perfctrl1();
> -       case 2:
> -               return r_c0_perfctrl2();
> -       case 3:
> -               return r_c0_perfctrl3();
> -       default:
> -               WARN_ONCE(1, "Invalid performance counter number (%d)\n", idx);
> -               return 0;
> -       }
> -}
> -
> -static void mipsxx_pmu_write_control(unsigned int idx, unsigned int val)
> -{
> -       switch (idx) {
> -       case 0:
> -               w_c0_perfctrl0(val);
> -               return;
> -       case 1:
> -               w_c0_perfctrl1(val);
> -               return;
> -       case 2:
> -               w_c0_perfctrl2(val);
> -               return;
> -       case 3:
> -               w_c0_perfctrl3(val);
> -               return;
> +               mipsxx_pmu_write_control(0, 0);
> +               mipspmu.write_counter(0, 0);
>        }
>  }
>
> -#ifdef CONFIG_MIPS_MT_SMP
> -static DEFINE_RWLOCK(pmuint_rwlock);
> -#endif
> -
>  /* 24K/34K/1004K cores can share the same event map. */
>  static const struct mips_perf_event mipsxxcore_event_map
>                                [PERF_COUNT_HW_MAX] = {
> @@ -1047,7 +1101,7 @@ static int __hw_perf_event_init(struct perf_event *event)
>        } else if (PERF_TYPE_RAW == event->attr.type) {
>                /* We are working on the global raw event. */
>                mutex_lock(&raw_event_mutex);
> -               pev = mipspmu->map_raw_event(event->attr.config);
> +               pev = mipspmu.map_raw_event(event->attr.config);
>        } else {
>                /* The event type is not (yet) supported. */
>                return -EOPNOTSUPP;
> @@ -1092,7 +1146,7 @@ static int __hw_perf_event_init(struct perf_event *event)
>        hwc->config = 0;
>
>        if (!hwc->sample_period) {
> -               hwc->sample_period  = MAX_PERIOD;
> +               hwc->sample_period  = mipspmu.max_period;
>                hwc->last_period    = hwc->sample_period;
>                local64_set(&hwc->period_left, hwc->sample_period);
>        }
> @@ -1105,55 +1159,38 @@ static int __hw_perf_event_init(struct perf_event *event)
>        }
>
>        event->destroy = hw_perf_event_destroy;
> -
>        return err;
>  }
>
>  static void pause_local_counters(void)
>  {
>        struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> -       int counters = mipspmu->num_counters;
> +       int ctr = mipspmu.num_counters;
>        unsigned long flags;
>
>        local_irq_save(flags);
> -       switch (counters) {
> -       case 4:
> -               cpuc->saved_ctrl[3] = r_c0_perfctrl3();
> -               w_c0_perfctrl3(cpuc->saved_ctrl[3] &
> -                       ~M_PERFCTL_COUNT_EVENT_WHENEVER);
> -       case 3:
> -               cpuc->saved_ctrl[2] = r_c0_perfctrl2();
> -               w_c0_perfctrl2(cpuc->saved_ctrl[2] &
> -                       ~M_PERFCTL_COUNT_EVENT_WHENEVER);
> -       case 2:
> -               cpuc->saved_ctrl[1] = r_c0_perfctrl1();
> -               w_c0_perfctrl1(cpuc->saved_ctrl[1] &
> -                       ~M_PERFCTL_COUNT_EVENT_WHENEVER);
> -       case 1:
> -               cpuc->saved_ctrl[0] = r_c0_perfctrl0();
> -               w_c0_perfctrl0(cpuc->saved_ctrl[0] &
> -                       ~M_PERFCTL_COUNT_EVENT_WHENEVER);
> -       }
> +       do {
> +               ctr--;
> +               cpuc->saved_ctrl[ctr] = mipsxx_pmu_read_control(ctr);
> +               mipsxx_pmu_write_control(ctr, cpuc->saved_ctrl[ctr] &
> +                                        ~M_PERFCTL_COUNT_EVENT_WHENEVER);
> +       } while (ctr > 0);
>        local_irq_restore(flags);
>  }
>
>  static void resume_local_counters(void)
>  {
>        struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> -       int counters = mipspmu->num_counters;
> +       int ctr = mipspmu.num_counters;
>        unsigned long flags;
>
>        local_irq_save(flags);
> -       switch (counters) {
> -       case 4:
> -               w_c0_perfctrl3(cpuc->saved_ctrl[3]);
> -       case 3:
> -               w_c0_perfctrl2(cpuc->saved_ctrl[2]);
> -       case 2:
> -               w_c0_perfctrl1(cpuc->saved_ctrl[1]);
> -       case 1:
> -               w_c0_perfctrl0(cpuc->saved_ctrl[0]);
> -       }
> +
> +       do {
> +               ctr--;
> +               mipsxx_pmu_write_control(ctr, cpuc->saved_ctrl[ctr]);
> +       } while (ctr > 0);
> +
>        local_irq_restore(flags);
>  }
>
> @@ -1161,14 +1198,13 @@ static int mipsxx_pmu_handle_shared_irq(void)
>  {
>        struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
>        struct perf_sample_data data;
> -       unsigned int counters = mipspmu->num_counters;
> -       unsigned int counter;
> +       unsigned int counters = mipspmu.num_counters;
> +       u64 counter;
>        int handled = IRQ_NONE;
>        struct pt_regs *regs;
>
>        if (cpu_has_mips_r2 && !(read_c0_cause() & (1 << 26)))
>                return handled;
> -
>        /*
>         * First we pause the local counters, so that when we are locked
>         * here, the counters are all paused. When it gets locked due to
> @@ -1189,13 +1225,9 @@ static int mipsxx_pmu_handle_shared_irq(void)
>  #define HANDLE_COUNTER(n)                                              \
>        case n + 1:                                                     \
>                if (test_bit(n, cpuc->used_mask)) {                     \
> -                       counter = r_c0_perfcntr ## n();                 \
> -                       if (counter & M_COUNTER_OVERFLOW) {             \
> -                               w_c0_perfcntr ## n(counter &            \
> -                                               VALID_COUNT);           \
> -                               if (test_and_change_bit(n, cpuc->msbs)) \
> -                                       handle_associated_event(cpuc,   \
> -                                               n, &data, regs);        \
> +                       counter = mipspmu.read_counter(n);              \
> +                       if (counter & mipspmu.overflow) {               \
> +                               handle_associated_event(cpuc, n, &data, regs); \
>                                handled = IRQ_HANDLED;                  \
>                        }                                               \
>                }
> @@ -1225,95 +1257,6 @@ static irqreturn_t mipsxx_pmu_handle_irq(int irq, void *dev)
>        return mipsxx_pmu_handle_shared_irq();
>  }
>
> -static void mipsxx_pmu_start(void)
> -{
> -#ifdef CONFIG_MIPS_MT_SMP
> -       write_unlock(&pmuint_rwlock);
> -#endif
> -       resume_local_counters();
> -}
> -
> -/*
> - * MIPS performance counters can be per-TC. The control registers can
> - * not be directly accessed accross CPUs. Hence if we want to do global
> - * control, we need cross CPU calls. on_each_cpu() can help us, but we
> - * can not make sure this function is called with interrupts enabled. So
> - * here we pause local counters and then grab a rwlock and leave the
> - * counters on other CPUs alone. If any counter interrupt raises while
> - * we own the write lock, simply pause local counters on that CPU and
> - * spin in the handler. Also we know we won't be switched to another
> - * CPU after pausing local counters and before grabbing the lock.
> - */
> -static void mipsxx_pmu_stop(void)
> -{
> -       pause_local_counters();
> -#ifdef CONFIG_MIPS_MT_SMP
> -       write_lock(&pmuint_rwlock);
> -#endif
> -}
> -
> -static int mipsxx_pmu_alloc_counter(struct cpu_hw_events *cpuc,
> -                                   struct hw_perf_event *hwc)
> -{
> -       int i;
> -
> -       /*
> -        * We only need to care the counter mask. The range has been
> -        * checked definitely.
> -        */
> -       unsigned long cntr_mask = (hwc->event_base >> 8) & 0xffff;
> -
> -       for (i = mipspmu->num_counters - 1; i >= 0; i--) {
> -               /*
> -                * Note that some MIPS perf events can be counted by both
> -                * even and odd counters, wheresas many other are only by
> -                * even _or_ odd counters. This introduces an issue that
> -                * when the former kind of event takes the counter the
> -                * latter kind of event wants to use, then the "counter
> -                * allocation" for the latter event will fail. In fact if
> -                * they can be dynamically swapped, they both feel happy.
> -                * But here we leave this issue alone for now.
> -                */
> -               if (test_bit(i, &cntr_mask) &&
> -                       !test_and_set_bit(i, cpuc->used_mask))
> -                       return i;
> -       }
> -
> -       return -EAGAIN;
> -}
> -
> -static void mipsxx_pmu_enable_event(struct hw_perf_event *evt, int idx)
> -{
> -       struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> -       unsigned long flags;
> -
> -       WARN_ON(idx < 0 || idx >= mipspmu->num_counters);
> -
> -       local_irq_save(flags);
> -       cpuc->saved_ctrl[idx] = M_PERFCTL_EVENT(evt->event_base & 0xff) |
> -               (evt->config_base & M_PERFCTL_CONFIG_MASK) |
> -               /* Make sure interrupt enabled. */
> -               M_PERFCTL_INTERRUPT_ENABLE;
> -       /*
> -        * We do not actually let the counter run. Leave it until start().
> -        */
> -       local_irq_restore(flags);
> -}
> -
> -static void mipsxx_pmu_disable_event(int idx)
> -{
> -       struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> -       unsigned long flags;
> -
> -       WARN_ON(idx < 0 || idx >= mipspmu->num_counters);
> -
> -       local_irq_save(flags);
> -       cpuc->saved_ctrl[idx] = mipsxx_pmu_read_control(idx) &
> -               ~M_PERFCTL_COUNT_EVENT_WHENEVER;
> -       mipsxx_pmu_write_control(idx, cpuc->saved_ctrl[idx]);
> -       local_irq_restore(flags);
> -}
> -
>  /* 24K */
>  #define IS_UNSUPPORTED_24K_EVENT(r, b)                                 \
>        ((b) == 12 || (r) == 151 || (r) == 152 || (b) == 26 ||          \
> @@ -1452,40 +1395,11 @@ static const struct mips_perf_event *mipsxx_pmu_map_raw_event(u64 config)
>        return &raw_event;
>  }
>
> -static struct mips_pmu mipsxxcore_pmu = {
> -       .handle_irq = mipsxx_pmu_handle_irq,
> -       .handle_shared_irq = mipsxx_pmu_handle_shared_irq,
> -       .start = mipsxx_pmu_start,
> -       .stop = mipsxx_pmu_stop,
> -       .alloc_counter = mipsxx_pmu_alloc_counter,
> -       .read_counter = mipsxx_pmu_read_counter,
> -       .write_counter = mipsxx_pmu_write_counter,
> -       .enable_event = mipsxx_pmu_enable_event,
> -       .disable_event = mipsxx_pmu_disable_event,
> -       .map_raw_event = mipsxx_pmu_map_raw_event,
> -       .general_event_map = &mipsxxcore_event_map,
> -       .cache_event_map = &mipsxxcore_cache_map,
> -};
> -
> -static struct mips_pmu mipsxx74Kcore_pmu = {
> -       .handle_irq = mipsxx_pmu_handle_irq,
> -       .handle_shared_irq = mipsxx_pmu_handle_shared_irq,
> -       .start = mipsxx_pmu_start,
> -       .stop = mipsxx_pmu_stop,
> -       .alloc_counter = mipsxx_pmu_alloc_counter,
> -       .read_counter = mipsxx_pmu_read_counter,
> -       .write_counter = mipsxx_pmu_write_counter,
> -       .enable_event = mipsxx_pmu_enable_event,
> -       .disable_event = mipsxx_pmu_disable_event,
> -       .map_raw_event = mipsxx_pmu_map_raw_event,
> -       .general_event_map = &mipsxx74Kcore_event_map,
> -       .cache_event_map = &mipsxx74Kcore_cache_map,
> -};
> -
>  static int __init
>  init_hw_perf_events(void)
>  {
>        int counters, irq;
> +       int counter_bits;
>
>        pr_info("Performance counters: ");
>
> @@ -1517,32 +1431,28 @@ init_hw_perf_events(void)
>        }
>  #endif
>
> -       on_each_cpu(reset_counters, (void *)(long)counters, 1);
> +       mipspmu.map_raw_event = mipsxx_pmu_map_raw_event;
>
>        switch (current_cpu_type()) {
>        case CPU_24K:
> -               mipsxxcore_pmu.name = "mips/24K";
> -               mipsxxcore_pmu.num_counters = counters;
> -               mipsxxcore_pmu.irq = irq;
> -               mipspmu = &mipsxxcore_pmu;
> +               mipspmu.name = "mips/24K";
> +               mipspmu.general_event_map = &mipsxxcore_event_map;
> +               mipspmu.cache_event_map = &mipsxxcore_cache_map;
>                break;
>        case CPU_34K:
> -               mipsxxcore_pmu.name = "mips/34K";
> -               mipsxxcore_pmu.num_counters = counters;
> -               mipsxxcore_pmu.irq = irq;
> -               mipspmu = &mipsxxcore_pmu;
> +               mipspmu.name = "mips/34K";
> +               mipspmu.general_event_map = &mipsxxcore_event_map;
> +               mipspmu.cache_event_map = &mipsxxcore_cache_map;
>                break;
>        case CPU_74K:
> -               mipsxx74Kcore_pmu.name = "mips/74K";
> -               mipsxx74Kcore_pmu.num_counters = counters;
> -               mipsxx74Kcore_pmu.irq = irq;
> -               mipspmu = &mipsxx74Kcore_pmu;
> +               mipspmu.name = "mips/74K";
> +               mipspmu.general_event_map = &mipsxx74Kcore_event_map;
> +               mipspmu.cache_event_map = &mipsxx74Kcore_cache_map;
>                break;
>        case CPU_1004K:
> -               mipsxxcore_pmu.name = "mips/1004K";
> -               mipsxxcore_pmu.num_counters = counters;
> -               mipsxxcore_pmu.irq = irq;
> -               mipspmu = &mipsxxcore_pmu;
> +               mipspmu.name = "mips/1004K";
> +               mipspmu.general_event_map = &mipsxxcore_event_map;
> +               mipspmu.cache_event_map = &mipsxxcore_cache_map;
>                break;
>        default:
>                pr_cont("Either hardware does not support performance "
> @@ -1550,10 +1460,30 @@ init_hw_perf_events(void)
>                return -ENODEV;
>        }
>
> -       if (mipspmu)
> -               pr_cont("%s PMU enabled, %d counters available to each "
> -                       "CPU, irq %d%s\n", mipspmu->name, counters, irq,
> -                       irq < 0 ? " (share with timer interrupt)" : "");
> +       mipspmu.num_counters = counters;
> +       mipspmu.irq = irq;
> +
> +       if (read_c0_perfctrl0() & M_PERFCTL_WIDE) {
> +               mipspmu.max_period = (1ULL << 63) - 1;
> +               mipspmu.valid_count = (1ULL << 63) - 1;
> +               mipspmu.overflow = 1ULL << 63;
> +               mipspmu.read_counter = mipsxx_pmu_read_counter_64;
> +               mipspmu.write_counter = mipsxx_pmu_write_counter_64;
> +               counter_bits = 64;
> +       } else {
> +               mipspmu.max_period = (1ULL << 31) - 1;
> +               mipspmu.valid_count = (1ULL << 31) - 1;
> +               mipspmu.overflow = 1ULL << 31;
> +               mipspmu.read_counter = mipsxx_pmu_read_counter;
> +               mipspmu.write_counter = mipsxx_pmu_write_counter;
> +               counter_bits = 32;
> +       }
> +
> +       on_each_cpu(reset_counters, (void *)(long)counters, 1);
> +
> +       pr_cont("%s PMU enabled, %d %d-bit counters available to each "
> +               "CPU, irq %d%s\n", mipspmu.name, counters, counter_bits, irq,
> +               irq < 0 ? " (share with timer interrupt)" : "");
>
>        perf_pmu_register(&pmu, "cpu", PERF_TYPE_RAW);
>
> --
> 1.7.2.3
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/4] MIPS: perf: Add support for 64-bit perf counters.
  2011-01-25  3:42   ` Deng-Cheng Zhu
@ 2011-01-26  0:20     ` David Daney
  2011-01-27  6:24       ` Deng-Cheng Zhu
  0 siblings, 1 reply; 15+ messages in thread
From: David Daney @ 2011-01-26  0:20 UTC (permalink / raw)
  To: Deng-Cheng Zhu
  Cc: linux-mips, ralf, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo

[-- Attachment #1: Type: text/plain, Size: 1557 bytes --]

On 01/24/2011 07:42 PM, Deng-Cheng Zhu wrote:
> Hi, David
>
>
> This version does fix the problem with 'perf stat'. However, when working
> with 'perf record', the following happened:
>
> -sh-4.0# perf record -f -e cycles -e instructions -e branches \
>> -e branch-misses -e r12 find / -name "*sys*">/dev/null
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.001 MB perf.data (~53 samples) ]


I get the same thing.  What happens if you supply either '-c xxx' or
'-F xxx'?

I get:

octeon:~/linux/tools/perf# ./perf record -e cycles /bin/ls -l /
total 100
drwxr-xr-x   2 root root  4096 2010-11-12 11:39 bin
[...]
drwxr-xr-x  13 root root  4096 2007-05-25 12:28 var
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (~82 samples) ]

Almost no samples, the same as you got.

But if I do:

octeon:~/linux/tools/perf# ./perf record -F 100000 -e cycles /bin/ls -l /
total 100
drwxr-xr-x   2 root root  4096 2010-11-12 11:39 bin
[...]
drwxr-xr-x  13 root root  4096 2007-05-25 12:28 var
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.404 MB perf.data (~17653 samples) ]

Look, many more samples!

The question is, what is it supposed to do?

If you can get a reasonable number of samples out when you supply -c
or -F, then I would argue that it is working, and that the default
settings for -F are simply not a good fit for your test case.
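
Roughly, the driver side just programs whatever period the perf core
hands it.  Condensed from mipspmu_event_set_period() in the attached
patch (a simplified sketch, not the literal code):

	u64 left = local64_read(&hwc->period_left);

	/* Never program more than the hardware counter can hold. */
	if (left > mipspmu.max_period) {
		left = mipspmu.max_period;
		local64_set(&hwc->period_left, left);
	}

	/*
	 * Start the counter "left" events below its overflow bit
	 * (mipspmu.overflow, i.e. bit 31 or bit 63), so the next
	 * interrupt fires after "left" more events.
	 */
	mipspmu.write_counter(idx, mipspmu.overflow - left);

So with -c the period, and hence the interrupt rate, is fixed; with -F
my understanding is that the perf core keeps re-adjusting
sample_period to hit the requested frequency, starting from a guess
that may be a poor match for a slow CPU.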

I have slightly changed the patch.  You could try the attached version 
instead and tell me the results.


David Daney



[-- Attachment #2: mips-perf-64bit.patch --]
[-- Type: text/plain, Size: 34403 bytes --]

MIPS: perf: Add support for 64-bit perf counters.

The hard-coded constants are moved to struct mips_pmu.  All counter
register accesses are moved to the read_counter and write_counter
function pointers, which are set to either 32-bit or 64-bit access
methods at initialization time.

Many of the function pointers in struct mips_pmu were not needed, as
there was only a single implementation; these were removed.

I couldn't figure out what made struct cpu_hw_events.msbs[] useful at
all, so I removed it too.

Some functions and other declarations were reordered to reduce the
need for forward declarations.
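
In outline (a condensed sketch of the scheme, not the literal code):

	struct mips_pmu {
		u64	max_period;	/* largest period we program */
		u64	valid_count;	/* mask of valid counter bits */
		u64	overflow;	/* MSB signalling an overflow */
		u64	(*read_counter)(unsigned int idx);
		void	(*write_counter)(unsigned int idx, u64 val);
		/* ... irq, num_counters, event maps ... */
	};

	static struct mips_pmu mipspmu;

	/* At init time, probe the counter width, pick the accessors. */
	if (read_c0_perfctrl0() & M_PERFCTL_WIDE) {
		mipspmu.overflow      = 1ULL << 63;
		mipspmu.read_counter  = mipsxx_pmu_read_counter_64;
		mipspmu.write_counter = mipsxx_pmu_write_counter_64;
	} else {
		mipspmu.overflow      = 1ULL << 31;
		mipspmu.read_counter  = mipsxx_pmu_read_counter;
		mipspmu.write_counter = mipsxx_pmu_write_counter;
	}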

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
---
 arch/mips/kernel/perf_event_mipsxx.c |  849 ++++++++++++++++------------------
 1 files changed, 390 insertions(+), 459 deletions(-)

diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
index 409207d..e335e2e 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -2,6 +2,7 @@
  * Linux performance counter support for MIPS.
  *
  * Copyright (C) 2010 MIPS Technologies, Inc.
+ * Copyright (C) 2011 Cavium Networks, Inc.
  * Author: Deng-Cheng Zhu
  *
  * This code is based on the implementation for ARM, which is in turn
@@ -26,12 +27,6 @@
 #include <asm/stacktrace.h>
 #include <asm/time.h> /* For perf_irq */
 
-/* These are for 32bit counters. For 64bit ones, define them accordingly. */
-#define MAX_PERIOD	((1ULL << 32) - 1)
-#define VALID_COUNT	0x7fffffff
-#define TOTAL_BITS	32
-#define HIGHEST_BIT	31
-
 #define MIPS_MAX_HWEVENTS 4
 
 struct cpu_hw_events {
@@ -45,15 +40,6 @@ struct cpu_hw_events {
 	unsigned long		used_mask[BITS_TO_LONGS(MIPS_MAX_HWEVENTS)];
 
 	/*
-	 * The borrowed MSB for the performance counter. A MIPS performance
-	 * counter uses its bit 31 (for 32bit counters) or bit 63 (for 64bit
-	 * counters) as a factor of determining whether a counter overflow
-	 * should be signaled. So here we use a separate MSB for each
-	 * counter to make things easy.
-	 */
-	unsigned long		msbs[BITS_TO_LONGS(MIPS_MAX_HWEVENTS)];
-
-	/*
 	 * Software copy of the control register for each performance counter.
 	 * MIPS CPUs vary in performance counters. They use this differently,
 	 * and even may not use it.
@@ -75,6 +61,7 @@ struct mips_perf_event {
 	unsigned int cntr_mask;
 	#define CNTR_EVEN	0x55555555
 	#define CNTR_ODD	0xaaaaaaaa
+	#define CNTR_ALL	0xffffffff
 #ifdef CONFIG_MIPS_MT_SMP
 	enum {
 		T  = 0,
@@ -95,18 +82,13 @@ static DEFINE_MUTEX(raw_event_mutex);
 #define C(x) PERF_COUNT_HW_CACHE_##x
 
 struct mips_pmu {
+	u64		max_period;
+	u64		valid_count;
+	u64		overflow;
 	const char	*name;
 	int		irq;
-	irqreturn_t	(*handle_irq)(int irq, void *dev);
-	int		(*handle_shared_irq)(void);
-	void		(*start)(void);
-	void		(*stop)(void);
-	int		(*alloc_counter)(struct cpu_hw_events *cpuc,
-					struct hw_perf_event *hwc);
 	u64		(*read_counter)(unsigned int idx);
 	void		(*write_counter)(unsigned int idx, u64 val);
-	void		(*enable_event)(struct hw_perf_event *evt, int idx);
-	void		(*disable_event)(int idx);
 	const struct mips_perf_event *(*map_raw_event)(u64 config);
 	const struct mips_perf_event (*general_event_map)[PERF_COUNT_HW_MAX];
 	const struct mips_perf_event (*cache_event_map)
@@ -116,43 +98,303 @@ struct mips_pmu {
 	unsigned int	num_counters;
 };
 
-static const struct mips_pmu *mipspmu;
+static struct mips_pmu mipspmu;
+
+#define M_CONFIG1_PC	(1 << 4)
+
+#define M_PERFCTL_EXL			(1      <<  0)
+#define M_PERFCTL_KERNEL		(1      <<  1)
+#define M_PERFCTL_SUPERVISOR		(1      <<  2)
+#define M_PERFCTL_USER			(1      <<  3)
+#define M_PERFCTL_INTERRUPT_ENABLE	(1      <<  4)
+#define M_PERFCTL_EVENT(event)		(((event) & 0x3ff)  << 5)
+#define M_PERFCTL_VPEID(vpe)		((vpe)    << 16)
+#define M_PERFCTL_MT_EN(filter)		((filter) << 20)
+#define    M_TC_EN_ALL			M_PERFCTL_MT_EN(0)
+#define    M_TC_EN_VPE			M_PERFCTL_MT_EN(1)
+#define    M_TC_EN_TC			M_PERFCTL_MT_EN(2)
+#define M_PERFCTL_TCID(tcid)		((tcid)   << 22)
+#define M_PERFCTL_WIDE			(1      << 30)
+#define M_PERFCTL_MORE			(1      << 31)
+
+#define M_PERFCTL_COUNT_EVENT_WHENEVER	(M_PERFCTL_EXL |		\
+					M_PERFCTL_KERNEL |		\
+					M_PERFCTL_USER |		\
+					M_PERFCTL_SUPERVISOR |		\
+					M_PERFCTL_INTERRUPT_ENABLE)
+
+#ifdef CONFIG_MIPS_MT_SMP
+#define M_PERFCTL_CONFIG_MASK		0x3fff801f
+#else
+#define M_PERFCTL_CONFIG_MASK		0x1f
+#endif
+#define M_PERFCTL_EVENT_MASK		0xfe0
+
+
+#ifdef CONFIG_MIPS_MT_SMP
+static int cpu_has_mipsmt_pertccounters;
+
+static DEFINE_RWLOCK(pmuint_rwlock);
+
+/*
+ * FIXME: For VSMP, vpe_id() is redefined for Perf-events, because
+ * cpu_data[cpuid].vpe_id reports 0 for _both_ CPUs.
+ */
+#if defined(CONFIG_HW_PERF_EVENTS)
+#define vpe_id()	(cpu_has_mipsmt_pertccounters ? \
+			0 : smp_processor_id())
+#else
+#define vpe_id()	(cpu_has_mipsmt_pertccounters ? \
+			0 : cpu_data[smp_processor_id()].vpe_id)
+#endif
+
+/* Copied from op_model_mipsxx.c */
+static unsigned int vpe_shift(void)
+{
+	if (num_possible_cpus() > 1)
+		return 1;
+
+	return 0;
+}
+
+static unsigned int counters_total_to_per_cpu(unsigned int counters)
+{
+	return counters >> vpe_shift();
+}
+
+static unsigned int counters_per_cpu_to_total(unsigned int counters)
+{
+	return counters << vpe_shift();
+}
+
+#else /* !CONFIG_MIPS_MT_SMP */
+#define vpe_id()	0
+
+#endif /* CONFIG_MIPS_MT_SMP */
+
+static void resume_local_counters(void);
+static void pause_local_counters(void);
+static irqreturn_t mipsxx_pmu_handle_irq(int, void *);
+static int mipsxx_pmu_handle_shared_irq(void);
+
+static unsigned int mipsxx_pmu_swizzle_perf_idx(unsigned int idx)
+{
+	if (vpe_id() == 1)
+		idx = (idx + 2) & 3;
+	return idx;
+}
+
+static u64 mipsxx_pmu_read_counter(unsigned int idx)
+{
+	idx = mipsxx_pmu_swizzle_perf_idx(idx);
+
+	switch (idx) {
+	case 0:
+		return read_c0_perfcntr0();
+	case 1:
+		return read_c0_perfcntr1();
+	case 2:
+		return read_c0_perfcntr2();
+	case 3:
+		return read_c0_perfcntr3();
+	default:
+		WARN_ONCE(1, "Invalid performance counter number (%d)\n", idx);
+		return 0;
+	}
+}
+
+static u64 mipsxx_pmu_read_counter_64(unsigned int idx)
+{
+	idx = mipsxx_pmu_swizzle_perf_idx(idx);
+
+	switch (idx) {
+	case 0:
+		return read_c0_perfcntr0_64();
+	case 1:
+		return read_c0_perfcntr1_64();
+	case 2:
+		return read_c0_perfcntr2_64();
+	case 3:
+		return read_c0_perfcntr3_64();
+	default:
+		WARN_ONCE(1, "Invalid performance counter number (%d)\n", idx);
+		return 0;
+	}
+}
+
+static void mipsxx_pmu_write_counter(unsigned int idx, u64 val)
+{
+	idx = mipsxx_pmu_swizzle_perf_idx(idx);
+
+	switch (idx) {
+	case 0:
+		write_c0_perfcntr0(val);
+		return;
+	case 1:
+		write_c0_perfcntr1(val);
+		return;
+	case 2:
+		write_c0_perfcntr2(val);
+		return;
+	case 3:
+		write_c0_perfcntr3(val);
+		return;
+	}
+}
+
+static void mipsxx_pmu_write_counter_64(unsigned int idx, u64 val)
+{
+	idx = mipsxx_pmu_swizzle_perf_idx(idx);
+
+	switch (idx) {
+	case 0:
+		write_c0_perfcntr0_64(val);
+		return;
+	case 1:
+		write_c0_perfcntr1_64(val);
+		return;
+	case 2:
+		write_c0_perfcntr2_64(val);
+		return;
+	case 3:
+		write_c0_perfcntr3_64(val);
+		return;
+	}
+}
+
+static unsigned int mipsxx_pmu_read_control(unsigned int idx)
+{
+	idx = mipsxx_pmu_swizzle_perf_idx(idx);
+
+	switch (idx) {
+	case 0:
+		return read_c0_perfctrl0();
+	case 1:
+		return read_c0_perfctrl1();
+	case 2:
+		return read_c0_perfctrl2();
+	case 3:
+		return read_c0_perfctrl3();
+	default:
+		WARN_ONCE(1, "Invalid performance counter number (%d)\n", idx);
+		return 0;
+	}
+}
+
+static void mipsxx_pmu_write_control(unsigned int idx, unsigned int val)
+{
+	idx = mipsxx_pmu_swizzle_perf_idx(idx);
+
+	switch (idx) {
+	case 0:
+		write_c0_perfctrl0(val);
+		return;
+	case 1:
+		write_c0_perfctrl1(val);
+		return;
+	case 2:
+		write_c0_perfctrl2(val);
+		return;
+	case 3:
+		write_c0_perfctrl3(val);
+		return;
+	}
+}
+
+static int mipsxx_pmu_alloc_counter(struct cpu_hw_events *cpuc,
+				    struct hw_perf_event *hwc)
+{
+	int i;
+
+	/*
+	 * We only need to care about the counter mask. The range has
+	 * already been checked.
+	 */
+	unsigned long cntr_mask = (hwc->event_base >> 8) & 0xffff;
+
+	for (i = mipspmu.num_counters - 1; i >= 0; i--) {
+		/*
+		 * Note that some MIPS perf events can be counted by both
+		 * even and odd counters, whereas many others are only by
+		 * even _or_ odd counters. This introduces an issue that
+		 * when the former kind of event takes the counter the
+		 * latter kind of event wants to use, then the "counter
+		 * allocation" for the latter event will fail. In fact if
+		 * they can be dynamically swapped, they both feel happy.
+		 * But here we leave this issue alone for now.
+		 */
+		if (test_bit(i, &cntr_mask) &&
+			!test_and_set_bit(i, cpuc->used_mask))
+			return i;
+	}
+
+	return -EAGAIN;
+}
+
+static void mipsxx_pmu_enable_event(struct hw_perf_event *evt, int idx)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	unsigned long flags;
+
+	WARN_ON(idx < 0 || idx >= mipspmu.num_counters);
+
+	local_irq_save(flags);
+	cpuc->saved_ctrl[idx] = M_PERFCTL_EVENT(evt->event_base & 0xff) |
+		(evt->config_base & M_PERFCTL_CONFIG_MASK) |
+		/* Make sure interrupt enabled. */
+		M_PERFCTL_INTERRUPT_ENABLE;
+	/*
+	 * We do not actually let the counter run. Leave it until start().
+	 */
+	local_irq_restore(flags);
+}
+
+static void mipsxx_pmu_disable_event(int idx)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	unsigned long flags;
+
+	WARN_ON(idx < 0 || idx >= mipspmu.num_counters);
+
+	local_irq_save(flags);
+	cpuc->saved_ctrl[idx] = mipsxx_pmu_read_control(idx) &
+		~M_PERFCTL_COUNT_EVENT_WHENEVER;
+	mipsxx_pmu_write_control(idx, cpuc->saved_ctrl[idx]);
+	local_irq_restore(flags);
+}
 
 static int mipspmu_event_set_period(struct perf_event *event,
 				    struct hw_perf_event *hwc,
 				    int idx)
 {
-	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-	s64 left = local64_read(&hwc->period_left);
-	s64 period = hwc->sample_period;
+	u64 left = local64_read(&hwc->period_left);
+	u64 period = hwc->sample_period;
 	int ret = 0;
-	u64 uleft;
 	unsigned long flags;
 
-	if (unlikely(left <= -period)) {
+	if (unlikely((left + period) & (1ULL << 63))) {
+		/* left underflowed by more than period. */
 		left = period;
 		local64_set(&hwc->period_left, left);
 		hwc->last_period = period;
 		ret = 1;
-	}
-
-	if (unlikely(left <= 0)) {
+	} else if (unlikely((left + period) <= period)) {
+		/* left underflowed by less than period. */
 		left += period;
 		local64_set(&hwc->period_left, left);
 		hwc->last_period = period;
 		ret = 1;
 	}
 
-	if (left > (s64)MAX_PERIOD)
-		left = MAX_PERIOD;
+	if (left > mipspmu.max_period) {
+		left = mipspmu.max_period;
+		local64_set(&hwc->period_left, left);
+	}
 
-	local64_set(&hwc->prev_count, (u64)-left);
+	local64_set(&hwc->prev_count, mipspmu.overflow - left);
 
 	local_irq_save(flags);
-	uleft = (u64)(-left) & MAX_PERIOD;
-	uleft > VALID_COUNT ?
-		set_bit(idx, cpuc->msbs) : clear_bit(idx, cpuc->msbs);
-	mipspmu->write_counter(idx, (u64)(-left) & VALID_COUNT);
+	mipspmu.write_counter(idx, mipspmu.overflow - left);
 	local_irq_restore(flags);
 
 	perf_event_update_userpage(event);
@@ -164,30 +406,22 @@ static void mipspmu_event_update(struct perf_event *event,
 				 struct hw_perf_event *hwc,
 				 int idx)
 {
-	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	unsigned long flags;
-	int shift = 64 - TOTAL_BITS;
-	s64 prev_raw_count, new_raw_count;
+	u64 prev_raw_count, new_raw_count;
 	u64 delta;
 
 again:
 	prev_raw_count = local64_read(&hwc->prev_count);
 	local_irq_save(flags);
 	/* Make the counter value be a "real" one. */
-	new_raw_count = mipspmu->read_counter(idx);
-	if (new_raw_count & (test_bit(idx, cpuc->msbs) << HIGHEST_BIT)) {
-		new_raw_count &= VALID_COUNT;
-		clear_bit(idx, cpuc->msbs);
-	} else
-		new_raw_count |= (test_bit(idx, cpuc->msbs) << HIGHEST_BIT);
+	new_raw_count = mipspmu.read_counter(idx);
 	local_irq_restore(flags);
 
 	if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
 				new_raw_count) != prev_raw_count)
 		goto again;
 
-	delta = (new_raw_count << shift) - (prev_raw_count << shift);
-	delta >>= shift;
+	delta = new_raw_count - prev_raw_count;
 
 	local64_add(delta, &event->count);
 	local64_sub(delta, &hwc->period_left);
@@ -199,9 +433,6 @@ static void mipspmu_start(struct perf_event *event, int flags)
 {
 	struct hw_perf_event *hwc = &event->hw;
 
-	if (!mipspmu)
-		return;
-
 	if (flags & PERF_EF_RELOAD)
 		WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
 
@@ -211,19 +442,16 @@ static void mipspmu_start(struct perf_event *event, int flags)
 	mipspmu_event_set_period(event, hwc, hwc->idx);
 
 	/* Enable the event. */
-	mipspmu->enable_event(hwc, hwc->idx);
+	mipsxx_pmu_enable_event(hwc, hwc->idx);
 }
 
 static void mipspmu_stop(struct perf_event *event, int flags)
 {
 	struct hw_perf_event *hwc = &event->hw;
 
-	if (!mipspmu)
-		return;
-
 	if (!(hwc->state & PERF_HES_STOPPED)) {
 		/* We are working on a local event. */
-		mipspmu->disable_event(hwc->idx);
+		mipsxx_pmu_disable_event(hwc->idx);
 		barrier();
 		mipspmu_event_update(event, hwc, hwc->idx);
 		hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
@@ -240,7 +468,7 @@ static int mipspmu_add(struct perf_event *event, int flags)
 	perf_pmu_disable(event->pmu);
 
 	/* To look for a free counter for this event. */
-	idx = mipspmu->alloc_counter(cpuc, hwc);
+	idx = mipsxx_pmu_alloc_counter(cpuc, hwc);
 	if (idx < 0) {
 		err = idx;
 		goto out;
@@ -251,7 +479,7 @@ static int mipspmu_add(struct perf_event *event, int flags)
 	 * make sure it is disabled.
 	 */
 	event->hw.idx = idx;
-	mipspmu->disable_event(idx);
+	mipsxx_pmu_disable_event(idx);
 	cpuc->events[idx] = event;
 
 	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
@@ -272,7 +500,7 @@ static void mipspmu_del(struct perf_event *event, int flags)
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 
-	WARN_ON(idx < 0 || idx >= mipspmu->num_counters);
+	WARN_ON(idx < 0 || idx >= mipspmu.num_counters);
 
 	mipspmu_stop(event, PERF_EF_UPDATE);
 	cpuc->events[idx] = NULL;
@@ -294,14 +522,29 @@ static void mipspmu_read(struct perf_event *event)
 
 static void mipspmu_enable(struct pmu *pmu)
 {
-	if (mipspmu)
-		mipspmu->start();
+#ifdef CONFIG_MIPS_MT_SMP
+	write_unlock(&pmuint_rwlock);
+#endif
+	resume_local_counters();
 }
 
+/*
+ * MIPS performance counters can be per-TC. The control registers can
+ * not be directly accessed across CPUs. Hence if we want to do global
+ * control, we need cross CPU calls. on_each_cpu() can help us, but we
+ * can not make sure this function is called with interrupts enabled. So
+ * here we pause local counters and then grab a rwlock and leave the
+ * counters on other CPUs alone. If any counter interrupt fires while
+ * we own the write lock, simply pause local counters on that CPU and
+ * spin in the handler. Also we know we won't be switched to another
+ * CPU after pausing local counters and before grabbing the lock.
+ */
 static void mipspmu_disable(struct pmu *pmu)
 {
-	if (mipspmu)
-		mipspmu->stop();
+	pause_local_counters();
+#ifdef CONFIG_MIPS_MT_SMP
+	write_lock(&pmuint_rwlock);
+#endif
 }
 
 static atomic_t active_events = ATOMIC_INIT(0);
@@ -312,21 +555,21 @@ static int mipspmu_get_irq(void)
 {
 	int err;
 
-	if (mipspmu->irq >= 0) {
+	if (mipspmu.irq >= 0) {
 		/* Request my own irq handler. */
-		err = request_irq(mipspmu->irq, mipspmu->handle_irq,
-			IRQF_DISABLED | IRQF_NOBALANCING,
+		err = request_irq(mipspmu.irq, mipsxx_pmu_handle_irq,
+			IRQF_PERCPU | IRQF_NOBALANCING,
 			"mips_perf_pmu", NULL);
 		if (err) {
 			pr_warning("Unable to request IRQ%d for MIPS "
-			   "performance counters!\n", mipspmu->irq);
+			   "performance counters!\n", mipspmu.irq);
 		}
 	} else if (cp0_perfcount_irq < 0) {
 		/*
 		 * We are sharing the irq number with the timer interrupt.
 		 */
 		save_perf_irq = perf_irq;
-		perf_irq = mipspmu->handle_shared_irq;
+		perf_irq = mipsxx_pmu_handle_shared_irq;
 		err = 0;
 	} else {
 		pr_warning("The platform hasn't properly defined its "
@@ -339,8 +582,8 @@ static int mipspmu_get_irq(void)
 
 static void mipspmu_free_irq(void)
 {
-	if (mipspmu->irq >= 0)
-		free_irq(mipspmu->irq, NULL);
+	if (mipspmu.irq >= 0)
+		free_irq(mipspmu.irq, NULL);
 	else if (cp0_perfcount_irq < 0)
 		perf_irq = save_perf_irq;
 }
@@ -361,7 +604,7 @@ static void hw_perf_event_destroy(struct perf_event *event)
 		 * disabled.
 		 */
 		on_each_cpu(reset_counters,
-			(void *)(long)mipspmu->num_counters, 1);
+			(void *)(long)mipspmu.num_counters, 1);
 		mipspmu_free_irq();
 		mutex_unlock(&pmu_reserve_mutex);
 	}
@@ -381,8 +624,8 @@ static int mipspmu_event_init(struct perf_event *event)
 		return -ENOENT;
 	}
 
-	if (!mipspmu || event->cpu >= nr_cpumask_bits ||
-		(event->cpu >= 0 && !cpu_online(event->cpu)))
+	if (event->cpu >= nr_cpumask_bits ||
+	    (event->cpu >= 0 && !cpu_online(event->cpu)))
 		return -ENODEV;
 
 	if (!atomic_inc_not_zero(&active_events)) {
@@ -441,9 +684,9 @@ static const struct mips_perf_event *mipspmu_map_general_event(int idx)
 {
 	const struct mips_perf_event *pev;
 
-	pev = ((*mipspmu->general_event_map)[idx].event_id ==
+	pev = ((*mipspmu.general_event_map)[idx].event_id ==
 		UNSUPPORTED_PERF_EVENT_ID ? ERR_PTR(-EOPNOTSUPP) :
-		&(*mipspmu->general_event_map)[idx]);
+		&(*mipspmu.general_event_map)[idx]);
 
 	return pev;
 }
@@ -465,7 +708,7 @@ static const struct mips_perf_event *mipspmu_map_cache_event(u64 config)
 	if (cache_result >= PERF_COUNT_HW_CACHE_RESULT_MAX)
 		return ERR_PTR(-EINVAL);
 
-	pev = &((*mipspmu->cache_event_map)
+	pev = &((*mipspmu.cache_event_map)
 					[cache_type]
 					[cache_op]
 					[cache_result]);
@@ -486,7 +729,7 @@ static int validate_event(struct cpu_hw_events *cpuc,
 	if (event->pmu != &pmu || event->state <= PERF_EVENT_STATE_OFF)
 		return 1;
 
-	return mipspmu->alloc_counter(cpuc, &fake_hwc) >= 0;
+	return mipsxx_pmu_alloc_counter(cpuc, &fake_hwc) >= 0;
 }
 
 static int validate_group(struct perf_event *event)
@@ -524,123 +767,9 @@ static void handle_associated_event(struct cpu_hw_events *cpuc,
 		return;
 
 	if (perf_event_overflow(event, 0, data, regs))
-		mipspmu->disable_event(idx);
-}
-
-#define M_CONFIG1_PC	(1 << 4)
-
-#define M_PERFCTL_EXL			(1UL      <<  0)
-#define M_PERFCTL_KERNEL		(1UL      <<  1)
-#define M_PERFCTL_SUPERVISOR		(1UL      <<  2)
-#define M_PERFCTL_USER			(1UL      <<  3)
-#define M_PERFCTL_INTERRUPT_ENABLE	(1UL      <<  4)
-#define M_PERFCTL_EVENT(event)		(((event) & 0x3ff)  << 5)
-#define M_PERFCTL_VPEID(vpe)		((vpe)    << 16)
-#define M_PERFCTL_MT_EN(filter)		((filter) << 20)
-#define    M_TC_EN_ALL			M_PERFCTL_MT_EN(0)
-#define    M_TC_EN_VPE			M_PERFCTL_MT_EN(1)
-#define    M_TC_EN_TC			M_PERFCTL_MT_EN(2)
-#define M_PERFCTL_TCID(tcid)		((tcid)   << 22)
-#define M_PERFCTL_WIDE			(1UL      << 30)
-#define M_PERFCTL_MORE			(1UL      << 31)
-
-#define M_PERFCTL_COUNT_EVENT_WHENEVER	(M_PERFCTL_EXL |		\
-					M_PERFCTL_KERNEL |		\
-					M_PERFCTL_USER |		\
-					M_PERFCTL_SUPERVISOR |		\
-					M_PERFCTL_INTERRUPT_ENABLE)
-
-#ifdef CONFIG_MIPS_MT_SMP
-#define M_PERFCTL_CONFIG_MASK		0x3fff801f
-#else
-#define M_PERFCTL_CONFIG_MASK		0x1f
-#endif
-#define M_PERFCTL_EVENT_MASK		0xfe0
-
-#define M_COUNTER_OVERFLOW		(1UL      << 31)
-
-#ifdef CONFIG_MIPS_MT_SMP
-static int cpu_has_mipsmt_pertccounters;
-
-/*
- * FIXME: For VSMP, vpe_id() is redefined for Perf-events, because
- * cpu_data[cpuid].vpe_id reports 0 for _both_ CPUs.
- */
-#if defined(CONFIG_HW_PERF_EVENTS)
-#define vpe_id()	(cpu_has_mipsmt_pertccounters ? \
-			0 : smp_processor_id())
-#else
-#define vpe_id()	(cpu_has_mipsmt_pertccounters ? \
-			0 : cpu_data[smp_processor_id()].vpe_id)
-#endif
-
-/* Copied from op_model_mipsxx.c */
-static unsigned int vpe_shift(void)
-{
-	if (num_possible_cpus() > 1)
-		return 1;
-
-	return 0;
-}
-
-static unsigned int counters_total_to_per_cpu(unsigned int counters)
-{
-	return counters >> vpe_shift();
+		mipsxx_pmu_disable_event(idx);
 }
 
-static unsigned int counters_per_cpu_to_total(unsigned int counters)
-{
-	return counters << vpe_shift();
-}
-
-#else /* !CONFIG_MIPS_MT_SMP */
-#define vpe_id()	0
-
-#endif /* CONFIG_MIPS_MT_SMP */
-
-#define __define_perf_accessors(r, n, np)				\
-									\
-static unsigned int r_c0_ ## r ## n(void)				\
-{									\
-	unsigned int cpu = vpe_id();					\
-									\
-	switch (cpu) {							\
-	case 0:								\
-		return read_c0_ ## r ## n();				\
-	case 1:								\
-		return read_c0_ ## r ## np();				\
-	default:							\
-		BUG();							\
-	}								\
-	return 0;							\
-}									\
-									\
-static void w_c0_ ## r ## n(unsigned int value)				\
-{									\
-	unsigned int cpu = vpe_id();					\
-									\
-	switch (cpu) {							\
-	case 0:								\
-		write_c0_ ## r ## n(value);				\
-		return;							\
-	case 1:								\
-		write_c0_ ## r ## np(value);				\
-		return;							\
-	default:							\
-		BUG();							\
-	}								\
-	return;								\
-}									\
-
-__define_perf_accessors(perfcntr, 0, 2)
-__define_perf_accessors(perfcntr, 1, 3)
-__define_perf_accessors(perfcntr, 2, 0)
-__define_perf_accessors(perfcntr, 3, 1)
-
-__define_perf_accessors(perfctrl, 0, 2)
-__define_perf_accessors(perfctrl, 1, 3)
-__define_perf_accessors(perfctrl, 2, 0)
-__define_perf_accessors(perfctrl, 3, 1)
 
 static int __n_counters(void)
 {
@@ -682,94 +811,20 @@ static void reset_counters(void *arg)
 	int counters = (int)(long)arg;
 	switch (counters) {
 	case 4:
-		w_c0_perfctrl3(0);
-		w_c0_perfcntr3(0);
-	case 3:
-		w_c0_perfctrl2(0);
-		w_c0_perfcntr2(0);
-	case 2:
-		w_c0_perfctrl1(0);
-		w_c0_perfcntr1(0);
-	case 1:
-		w_c0_perfctrl0(0);
-		w_c0_perfcntr0(0);
-	}
-}
-
-static u64 mipsxx_pmu_read_counter(unsigned int idx)
-{
-	switch (idx) {
-	case 0:
-		return r_c0_perfcntr0();
-	case 1:
-		return r_c0_perfcntr1();
-	case 2:
-		return r_c0_perfcntr2();
+		mipsxx_pmu_write_control(3, 0);
+		mipspmu.write_counter(3, 0);
 	case 3:
-		return r_c0_perfcntr3();
-	default:
-		WARN_ONCE(1, "Invalid performance counter number (%d)\n", idx);
-		return 0;
-	}
-}
-
-static void mipsxx_pmu_write_counter(unsigned int idx, u64 val)
-{
-	switch (idx) {
-	case 0:
-		w_c0_perfcntr0(val);
-		return;
-	case 1:
-		w_c0_perfcntr1(val);
-		return;
+		mipsxx_pmu_write_control(2, 0);
+		mipspmu.write_counter(2, 0);
 	case 2:
-		w_c0_perfcntr2(val);
-		return;
-	case 3:
-		w_c0_perfcntr3(val);
-		return;
-	}
-}
-
-static unsigned int mipsxx_pmu_read_control(unsigned int idx)
-{
-	switch (idx) {
-	case 0:
-		return r_c0_perfctrl0();
+		mipsxx_pmu_write_control(1, 0);
+		mipspmu.write_counter(1, 0);
 	case 1:
-		return r_c0_perfctrl1();
-	case 2:
-		return r_c0_perfctrl2();
-	case 3:
-		return r_c0_perfctrl3();
-	default:
-		WARN_ONCE(1, "Invalid performance counter number (%d)\n", idx);
-		return 0;
+		mipsxx_pmu_write_control(0, 0);
+		mipspmu.write_counter(0, 0);
 	}
 }
 
-static void mipsxx_pmu_write_control(unsigned int idx, unsigned int val)
-{
-	switch (idx) {
-	case 0:
-		w_c0_perfctrl0(val);
-		return;
-	case 1:
-		w_c0_perfctrl1(val);
-		return;
-	case 2:
-		w_c0_perfctrl2(val);
-		return;
-	case 3:
-		w_c0_perfctrl3(val);
-		return;
-	}
-}
-
-#ifdef CONFIG_MIPS_MT_SMP
-static DEFINE_RWLOCK(pmuint_rwlock);
-#endif
-
 /* 24K/34K/1004K cores can share the same event map. */
 static const struct mips_perf_event mipsxxcore_event_map
 				[PERF_COUNT_HW_MAX] = {
@@ -1047,7 +1102,7 @@ static int __hw_perf_event_init(struct perf_event *event)
 	} else if (PERF_TYPE_RAW == event->attr.type) {
 		/* We are working on the global raw event. */
 		mutex_lock(&raw_event_mutex);
-		pev = mipspmu->map_raw_event(event->attr.config);
+		pev = mipspmu.map_raw_event(event->attr.config);
 	} else {
 		/* The event type is not (yet) supported. */
 		return -EOPNOTSUPP;
@@ -1092,7 +1147,7 @@ static int __hw_perf_event_init(struct perf_event *event)
 	hwc->config = 0;
 
 	if (!hwc->sample_period) {
-		hwc->sample_period  = MAX_PERIOD;
+		hwc->sample_period  = mipspmu.max_period;
 		hwc->last_period    = hwc->sample_period;
 		local64_set(&hwc->period_left, hwc->sample_period);
 	}
@@ -1105,55 +1160,38 @@ static int __hw_perf_event_init(struct perf_event *event)
 	}
 
 	event->destroy = hw_perf_event_destroy;
-
 	return err;
 }
 
 static void pause_local_counters(void)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-	int counters = mipspmu->num_counters;
+	int ctr = mipspmu.num_counters;
 	unsigned long flags;
 
 	local_irq_save(flags);
-	switch (counters) {
-	case 4:
-		cpuc->saved_ctrl[3] = r_c0_perfctrl3();
-		w_c0_perfctrl3(cpuc->saved_ctrl[3] &
-			~M_PERFCTL_COUNT_EVENT_WHENEVER);
-	case 3:
-		cpuc->saved_ctrl[2] = r_c0_perfctrl2();
-		w_c0_perfctrl2(cpuc->saved_ctrl[2] &
-			~M_PERFCTL_COUNT_EVENT_WHENEVER);
-	case 2:
-		cpuc->saved_ctrl[1] = r_c0_perfctrl1();
-		w_c0_perfctrl1(cpuc->saved_ctrl[1] &
-			~M_PERFCTL_COUNT_EVENT_WHENEVER);
-	case 1:
-		cpuc->saved_ctrl[0] = r_c0_perfctrl0();
-		w_c0_perfctrl0(cpuc->saved_ctrl[0] &
-			~M_PERFCTL_COUNT_EVENT_WHENEVER);
-	}
+	do {
+		ctr--;
+		cpuc->saved_ctrl[ctr] = mipsxx_pmu_read_control(ctr);
+		mipsxx_pmu_write_control(ctr, cpuc->saved_ctrl[ctr] &
+					 ~M_PERFCTL_COUNT_EVENT_WHENEVER);
+	} while (ctr > 0);
 	local_irq_restore(flags);
 }
 
 static void resume_local_counters(void)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-	int counters = mipspmu->num_counters;
+	int ctr = mipspmu.num_counters;
 	unsigned long flags;
 
 	local_irq_save(flags);
-	switch (counters) {
-	case 4:
-		w_c0_perfctrl3(cpuc->saved_ctrl[3]);
-	case 3:
-		w_c0_perfctrl2(cpuc->saved_ctrl[2]);
-	case 2:
-		w_c0_perfctrl1(cpuc->saved_ctrl[1]);
-	case 1:
-		w_c0_perfctrl0(cpuc->saved_ctrl[0]);
-	}
+
+	do {
+		ctr--;
+		mipsxx_pmu_write_control(ctr, cpuc->saved_ctrl[ctr]);
+	} while (ctr > 0);
+
 	local_irq_restore(flags);
 }
 
@@ -1161,14 +1199,13 @@ static int mipsxx_pmu_handle_shared_irq(void)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	struct perf_sample_data data;
-	unsigned int counters = mipspmu->num_counters;
-	unsigned int counter;
+	unsigned int counters = mipspmu.num_counters;
+	u64 counter;
 	int handled = IRQ_NONE;
 	struct pt_regs *regs;
 
 	if (cpu_has_mips_r2 && !(read_c0_cause() & (1 << 26)))
 		return handled;
-
 	/*
 	 * First we pause the local counters, so that when we are locked
 	 * here, the counters are all paused. When it gets locked due to
@@ -1189,13 +1226,9 @@ static int mipsxx_pmu_handle_shared_irq(void)
 #define HANDLE_COUNTER(n)						\
 	case n + 1:							\
 		if (test_bit(n, cpuc->used_mask)) {			\
-			counter = r_c0_perfcntr ## n();			\
-			if (counter & M_COUNTER_OVERFLOW) {		\
-				w_c0_perfcntr ## n(counter &		\
-						VALID_COUNT);		\
-				if (test_and_change_bit(n, cpuc->msbs))	\
-					handle_associated_event(cpuc,	\
-						n, &data, regs);	\
+			counter = mipspmu.read_counter(n);		\
+			if (counter & mipspmu.overflow) {		\
+				handle_associated_event(cpuc, n, &data, regs); \
 				handled = IRQ_HANDLED;			\
 			}						\
 		}
@@ -1225,95 +1258,6 @@ static irqreturn_t mipsxx_pmu_handle_irq(int irq, void *dev)
 	return mipsxx_pmu_handle_shared_irq();
 }
 
-static void mipsxx_pmu_start(void)
-{
-#ifdef CONFIG_MIPS_MT_SMP
-	write_unlock(&pmuint_rwlock);
-#endif
-	resume_local_counters();
-}
-
-/*
- * MIPS performance counters can be per-TC. The control registers can
- * not be directly accessed accross CPUs. Hence if we want to do global
- * control, we need cross CPU calls. on_each_cpu() can help us, but we
- * can not make sure this function is called with interrupts enabled. So
- * here we pause local counters and then grab a rwlock and leave the
- * counters on other CPUs alone. If any counter interrupt raises while
- * we own the write lock, simply pause local counters on that CPU and
- * spin in the handler. Also we know we won't be switched to another
- * CPU after pausing local counters and before grabbing the lock.
- */
-static void mipsxx_pmu_stop(void)
-{
-	pause_local_counters();
-#ifdef CONFIG_MIPS_MT_SMP
-	write_lock(&pmuint_rwlock);
-#endif
-}
-
-static int mipsxx_pmu_alloc_counter(struct cpu_hw_events *cpuc,
-				    struct hw_perf_event *hwc)
-{
-	int i;
-
-	/*
-	 * We only need to care the counter mask. The range has been
-	 * checked definitely.
-	 */
-	unsigned long cntr_mask = (hwc->event_base >> 8) & 0xffff;
-
-	for (i = mipspmu->num_counters - 1; i >= 0; i--) {
-		/*
-		 * Note that some MIPS perf events can be counted by both
-		 * even and odd counters, wheresas many other are only by
-		 * even _or_ odd counters. This introduces an issue that
-		 * when the former kind of event takes the counter the
-		 * latter kind of event wants to use, then the "counter
-		 * allocation" for the latter event will fail. In fact if
-		 * they can be dynamically swapped, they both feel happy.
-		 * But here we leave this issue alone for now.
-		 */
-		if (test_bit(i, &cntr_mask) &&
-			!test_and_set_bit(i, cpuc->used_mask))
-			return i;
-	}
-
-	return -EAGAIN;
-}
-
-static void mipsxx_pmu_enable_event(struct hw_perf_event *evt, int idx)
-{
-	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-	unsigned long flags;
-
-	WARN_ON(idx < 0 || idx >= mipspmu->num_counters);
-
-	local_irq_save(flags);
-	cpuc->saved_ctrl[idx] = M_PERFCTL_EVENT(evt->event_base & 0xff) |
-		(evt->config_base & M_PERFCTL_CONFIG_MASK) |
-		/* Make sure interrupt enabled. */
-		M_PERFCTL_INTERRUPT_ENABLE;
-	/*
-	 * We do not actually let the counter run. Leave it until start().
-	 */
-	local_irq_restore(flags);
-}
-
-static void mipsxx_pmu_disable_event(int idx)
-{
-	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-	unsigned long flags;
-
-	WARN_ON(idx < 0 || idx >= mipspmu->num_counters);
-
-	local_irq_save(flags);
-	cpuc->saved_ctrl[idx] = mipsxx_pmu_read_control(idx) &
-		~M_PERFCTL_COUNT_EVENT_WHENEVER;
-	mipsxx_pmu_write_control(idx, cpuc->saved_ctrl[idx]);
-	local_irq_restore(flags);
-}
-
 /* 24K */
 #define IS_UNSUPPORTED_24K_EVENT(r, b)					\
 	((b) == 12 || (r) == 151 || (r) == 152 || (b) == 26 ||		\
@@ -1452,40 +1396,11 @@ static const struct mips_perf_event *mipsxx_pmu_map_raw_event(u64 config)
 	return &raw_event;
 }
 
-static struct mips_pmu mipsxxcore_pmu = {
-	.handle_irq = mipsxx_pmu_handle_irq,
-	.handle_shared_irq = mipsxx_pmu_handle_shared_irq,
-	.start = mipsxx_pmu_start,
-	.stop = mipsxx_pmu_stop,
-	.alloc_counter = mipsxx_pmu_alloc_counter,
-	.read_counter = mipsxx_pmu_read_counter,
-	.write_counter = mipsxx_pmu_write_counter,
-	.enable_event = mipsxx_pmu_enable_event,
-	.disable_event = mipsxx_pmu_disable_event,
-	.map_raw_event = mipsxx_pmu_map_raw_event,
-	.general_event_map = &mipsxxcore_event_map,
-	.cache_event_map = &mipsxxcore_cache_map,
-};
-
-static struct mips_pmu mipsxx74Kcore_pmu = {
-	.handle_irq = mipsxx_pmu_handle_irq,
-	.handle_shared_irq = mipsxx_pmu_handle_shared_irq,
-	.start = mipsxx_pmu_start,
-	.stop = mipsxx_pmu_stop,
-	.alloc_counter = mipsxx_pmu_alloc_counter,
-	.read_counter = mipsxx_pmu_read_counter,
-	.write_counter = mipsxx_pmu_write_counter,
-	.enable_event = mipsxx_pmu_enable_event,
-	.disable_event = mipsxx_pmu_disable_event,
-	.map_raw_event = mipsxx_pmu_map_raw_event,
-	.general_event_map = &mipsxx74Kcore_event_map,
-	.cache_event_map = &mipsxx74Kcore_cache_map,
-};
-
 static int __init
 init_hw_perf_events(void)
 {
 	int counters, irq;
+	int counter_bits;
 
 	pr_info("Performance counters: ");
 
@@ -1517,32 +1432,28 @@ init_hw_perf_events(void)
 	}
 #endif
 
-	on_each_cpu(reset_counters, (void *)(long)counters, 1);
+	mipspmu.map_raw_event = mipsxx_pmu_map_raw_event;
 
 	switch (current_cpu_type()) {
 	case CPU_24K:
-		mipsxxcore_pmu.name = "mips/24K";
-		mipsxxcore_pmu.num_counters = counters;
-		mipsxxcore_pmu.irq = irq;
-		mipspmu = &mipsxxcore_pmu;
+		mipspmu.name = "mips/24K";
+		mipspmu.general_event_map = &mipsxxcore_event_map;
+		mipspmu.cache_event_map = &mipsxxcore_cache_map;
 		break;
 	case CPU_34K:
-		mipsxxcore_pmu.name = "mips/34K";
-		mipsxxcore_pmu.num_counters = counters;
-		mipsxxcore_pmu.irq = irq;
-		mipspmu = &mipsxxcore_pmu;
+		mipspmu.name = "mips/34K";
+		mipspmu.general_event_map = &mipsxxcore_event_map;
+		mipspmu.cache_event_map = &mipsxxcore_cache_map;
 		break;
 	case CPU_74K:
-		mipsxx74Kcore_pmu.name = "mips/74K";
-		mipsxx74Kcore_pmu.num_counters = counters;
-		mipsxx74Kcore_pmu.irq = irq;
-		mipspmu = &mipsxx74Kcore_pmu;
+		mipspmu.name = "mips/74K";
+		mipspmu.general_event_map = &mipsxx74Kcore_event_map;
+		mipspmu.cache_event_map = &mipsxx74Kcore_cache_map;
 		break;
 	case CPU_1004K:
-		mipsxxcore_pmu.name = "mips/1004K";
-		mipsxxcore_pmu.num_counters = counters;
-		mipsxxcore_pmu.irq = irq;
-		mipspmu = &mipsxxcore_pmu;
+		mipspmu.name = "mips/1004K";
+		mipspmu.general_event_map = &mipsxxcore_event_map;
+		mipspmu.cache_event_map = &mipsxxcore_cache_map;
 		break;
 	default:
 		pr_cont("Either hardware does not support performance "
@@ -1550,10 +1461,30 @@ init_hw_perf_events(void)
 		return -ENODEV;
 	}
 
-	if (mipspmu)
-		pr_cont("%s PMU enabled, %d counters available to each "
-			"CPU, irq %d%s\n", mipspmu->name, counters, irq,
-			irq < 0 ? " (share with timer interrupt)" : "");
+	mipspmu.num_counters = counters;
+	mipspmu.irq = irq;
+
+	if (read_c0_perfctrl0() & M_PERFCTL_WIDE) {
+		mipspmu.max_period = (1ULL << 63) - 1;
+		mipspmu.valid_count = (1ULL << 63) - 1;
+		mipspmu.overflow = 1ULL << 63;
+		mipspmu.read_counter = mipsxx_pmu_read_counter_64;
+		mipspmu.write_counter = mipsxx_pmu_write_counter_64;
+		counter_bits = 64;
+	} else {
+		mipspmu.max_period = (1ULL << 31) - 1;
+		mipspmu.valid_count = (1ULL << 31) - 1;
+		mipspmu.overflow = 1ULL << 31;
+		mipspmu.read_counter = mipsxx_pmu_read_counter;
+		mipspmu.write_counter = mipsxx_pmu_write_counter;
+		counter_bits = 32;
+	}
+
+	on_each_cpu(reset_counters, (void *)(long)counters, 1);
+
+	pr_cont("%s PMU enabled, %d %d-bit counters available to each "
+		"CPU, irq %d%s\n", mipspmu.name, counters, counter_bits, irq,
+		irq < 0 ? " (share with timer interrupt)" : "");
 
 	perf_pmu_register(&pmu, "cpu", PERF_TYPE_RAW);
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/4] MIPS: perf: Add support for 64-bit perf counters.
  2011-01-26  0:20     ` David Daney
@ 2011-01-27  6:24       ` Deng-Cheng Zhu
  2011-01-27 18:41         ` David Daney
  0 siblings, 1 reply; 15+ messages in thread
From: Deng-Cheng Zhu @ 2011-01-27  6:24 UTC (permalink / raw)
  To: David Daney
  Cc: linux-mips, ralf, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo

Using your attached patch, I experimented with -c and -F on 'ls /'.
The numbers I used were 10, 1000 and 100000 for both -c and -F.

The number of samples I got was 24 all the way. That means neither the
sample period nor the profiling frequency affects the results on the
MIPS32 platform. When running the old code, the system gave the
following results:

-c 10: The system seemed busy dealing with interrupts, and the
       following log was printed out:
       ================================================
       hda: ide_dma_sff_timer_expiry: DMA status (0x24)
       hda: DMA interrupt recovery
       hda: lost interrupt
       ================================================
       This does need to be fixed later on.
-c 1000: ~11085 samples
-c 100000: ~48 samples ('perf report' still showed some data.)
-F 10: ~118 samples
-F 1000: ~352 samples
-F 100000: ~379 samples

I'll try to take time to look into the patch to see if anything can be
changed.


Deng-Cheng


2011/1/26 David Daney <ddaney@caviumnetworks.com>:
> On 01/24/2011 07:42 PM, Deng-Cheng Zhu wrote:
>>
>> Hi, David
>>
>>
>> This version does fix the problem with 'perf stat'. However, when working
>> with 'perf record', the following happened:
>>
>> -sh-4.0# perf record -f -e cycles -e instructions -e branches \
>>>
>>> -e branch-misses -e r12 find / -name "*sys*">/dev/null
>>
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.001 MB perf.data (~53 samples) ]
>
>
> I get the same thing.  What happens if you supply either '-c xxx' or '-f
> xxx'?
>
> I get:octeon:~/linux/tools/perf# ./perf record -e cycles /bin/ls -l /
> total 100
> drwxr-xr-x   2 root root  4096 2010-11-12 11:39 bin
> [...]
> drwxr-xr-x  13 root root  4096 2007-05-25 12:28 var
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.002 MB perf.data (~82 samples) ]
>
> Almost no samples as you got.
>
> But if I do:
>
> octeon:~/linux/tools/perf# ./perf record -F 100000 -e cycles /bin/ls -l /
> total 100
> drwxr-xr-x   2 root root  4096 2010-11-12 11:39 bin
> [...]
> drwxr-xr-x  13 root root  4096 2007-05-25 12:28 var
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.404 MB perf.data (~17653 samples) ]
>
> Look many more samples!
>
> The question is, what is it supposed to do?
>
> If you can get a reasonable number of samples out if you supply -c or
> -F, then I would argue that it is working and the default settings for
> -F are not a good fit for your test case.
>
> I have slightly changed the patch.  You could try the attached version
> instead and tell me the results.
>
>
> David Daney
>
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/4] MIPS: perf: Add support for 64-bit perf counters.
  2011-01-27  6:24       ` Deng-Cheng Zhu
@ 2011-01-27 18:41         ` David Daney
  2011-01-28  2:46           ` Deng-Cheng Zhu
  0 siblings, 1 reply; 15+ messages in thread
From: David Daney @ 2011-01-27 18:41 UTC (permalink / raw)
  To: Deng-Cheng Zhu
  Cc: linux-mips, ralf, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo

On 01/26/2011 10:24 PM, Deng-Cheng Zhu wrote:
> Using your attached patch, I experimented -c and -F by 'ls /'. The numbers
> I used are 10, 1000 and 100000 for both -c and -F.
>
> The number of samples I got was 24 all the way. That means the event period
> to sample and the profiling frequency do not affect the results on MIPS32
> platform. While working on the old code, the system had the following
> results:
>
> -c 10: The system seems busy dealing with interrupts. And the following log
>         was printed out:
>         ================================================
>         hda: ide_dma_sff_timer_expiry: DMA status (0x24)
>         hda: DMA interrupt recovery
>         hda: lost interrupt
>         ================================================
>         This does need to be fixed later on.
> -c 1000: ~11085 samples
> -c 100000: ~48 samples ('perf report' still showed some data.)
> -F 10: ~118 samples
> -F 1000: ~352 samples
> -F 100000: ~379 samples
>
> I'll try to take time to look into the patch to see if anything can be
> changed.
>

I have found it useful to enable tracing and then place trace_printk()
in mipspmu_event_set_period() to look at the values of sample_period
and period_left that are being used.

Also, you could use a trace_printk() in mipsxx_pmu_write_counter() to
check the value being written to the register.
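
Something along these lines (placement and format strings are only
illustrative):

	/* In mipspmu_event_set_period(), before the counter write: */
	trace_printk("set_period: idx=%d period=%llu left=%llu\n",
		     idx, (unsigned long long)period,
		     (unsigned long long)left);

	/* In mipsxx_pmu_write_counter(), on entry: */
	trace_printk("write_counter: idx=%u val=%llu\n",
		     idx, (unsigned long long)val);

The output shows up in the ftrace buffer
(/sys/kernel/debug/tracing/trace) once tracing is enabled.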

What hardware are you using to test this?  I wonder if there is a board 
with a 32-bit CPU that I could get access to.

David Daney



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/4] MIPS: perf: Add support for 64-bit perf counters.
  2011-01-27 18:41         ` David Daney
@ 2011-01-28  2:46           ` Deng-Cheng Zhu
  2011-02-17 10:46             ` Deng-Cheng Zhu
  0 siblings, 1 reply; 15+ messages in thread
From: Deng-Cheng Zhu @ 2011-01-28  2:46 UTC (permalink / raw)
  To: David Daney
  Cc: linux-mips, ralf, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo

OK. I'll try to use tracing when needed.

The hardware I was using for the test was a Malta-R board with a 34K bitfile
programmed into the FPGA.  The CPU frequency is 50 MHz.


Deng-Cheng


>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/4] MIPS: perf: Add support for 64-bit perf counters.
  2011-01-28  2:46           ` Deng-Cheng Zhu
@ 2011-02-17 10:46             ` Deng-Cheng Zhu
  2011-02-17 13:36               ` Ralf Baechle
                                 ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Deng-Cheng Zhu @ 2011-02-17 10:46 UTC (permalink / raw)
  To: David Daney
  Cc: linux-mips, ralf, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo

Hi, David


The reason for the perf-record failure on 32-bit platforms is that the
32-bit counter read function mipsxx_pmu_read_counter() returns wrong 64-bit
values: the counter value is sign-extended. For example, a counter value
with bit 31 set, such as 0x92345678, will be returned as
0xffffffff92345678. So in mipspmu_event_update(), the delta will be wrong.
Here's a possible fix for your reference:

--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -184,19 +184,21 @@ static unsigned int mipsxx_pmu_swizzle_perf_idx(unsigned int idx)
        return idx;
 }

+#define U32_MASK 0xffffffff
+
 static u64 mipsxx_pmu_read_counter(unsigned int idx)
 {
        idx = mipsxx_pmu_swizzle_perf_idx(idx);

        switch (idx) {
        case 0:
-               return read_c0_perfcntr0();
+               return read_c0_perfcntr0() & U32_MASK;
        case 1:
-               return read_c0_perfcntr1();
+               return read_c0_perfcntr1() & U32_MASK;
        case 2:
-               return read_c0_perfcntr2();
+               return read_c0_perfcntr2() & U32_MASK;
        case 3:
-               return read_c0_perfcntr3();
+               return read_c0_perfcntr3() & U32_MASK;
        default:
                WARN_ONCE(1, "Invalid performance counter number (%d)\n", idx);
                return 0;
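
To illustrate why the mask is needed: the 32-bit accessors return a
signed int, so a counter value with bit 31 set gets sign-extended when
converted to u64. A small illustration, assuming exactly that accessor
behavior:

	int counter = (int)0x92345678;	/* 32-bit read, bit 31 set */
	u64 bad  = counter;		/* sign-extends: 0xffffffff92345678 */
	u64 good = counter & U32_MASK;	/* zero-extends: 0x0000000092345678 */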

In addition, since you removed the use of cpuc->msbs, some code related to
this logic can be removed:

@@ -370,7 +372,6 @@ static int mipspmu_event_set_period(struct perf_event *event,
        u64 left = local64_read(&hwc->period_left);
        u64 period = hwc->sample_period;
        int ret = 0;
-       unsigned long flags;

        if (unlikely((left + period) & (1ULL << 63))) {
                /* left underflowed by more than period. */
@@ -393,9 +394,7 @@ static int mipspmu_event_set_period(struct perf_event *event,

        local64_set(&hwc->prev_count, mipspmu.overflow - left);

-       local_irq_save(flags);
        mipspmu.write_counter(idx, mipspmu.overflow - left);
-       local_irq_restore(flags);

        perf_event_update_userpage(event);

@@ -406,16 +405,12 @@ static void mipspmu_event_update(struct perf_event *event,
                                 struct hw_perf_event *hwc,
                                 int idx)
 {
-       unsigned long flags;
        u64 prev_raw_count, new_raw_count;
        u64 delta;

 again:
        prev_raw_count = local64_read(&hwc->prev_count);
-       local_irq_save(flags);
-       /* Make the counter value be a "real" one. */
        new_raw_count = mipspmu.read_counter(idx);
-       local_irq_restore(flags);

        if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
                                new_raw_count) != prev_raw_count)

And here's a general comment: you are putting the majority of the
implementation in perf_event_mipsxx.c. This will require other CPUs like
Loongson2 to replicate quite a lot of code in their corresponding files. I
personally think the original "skeleton + #include perf_event_$cpu.c" layout
is a better choice. I understand you prefer not to use code like
"#if defined(CONFIG_CPU_MIPS32)" at the top of perf_event_$cpu.c, but that
is what other architectures (x86/ARM etc.) are doing.
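
Schematically, the layout I mean is the following (illustrative only;
the config symbols and the perf_event_loongson2.c name are just
placeholders for this example):

/* arch/mips/kernel/perf_event.c: the CPU-independent skeleton */
...
/* Pull in the CPU-specific half of the implementation. */
#if defined(CONFIG_CPU_MIPS32) || defined(CONFIG_CPU_MIPS64)
#include "perf_event_mipsxx.c"
#elif defined(CONFIG_CPU_LOONGSON2)
#include "perf_event_loongson2.c"
#endif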


Deng-Cheng



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/4] MIPS: perf: Add support for 64-bit perf counters.
  2011-02-17 10:46             ` Deng-Cheng Zhu
@ 2011-02-17 13:36               ` Ralf Baechle
  2011-02-17 15:26                 ` Deng-Cheng Zhu
  2011-02-17 17:26               ` David Daney
  2011-02-17 19:23               ` David Daney
  2 siblings, 1 reply; 15+ messages in thread
From: Ralf Baechle @ 2011-02-17 13:36 UTC (permalink / raw)
  To: Deng-Cheng Zhu
  Cc: David Daney, linux-mips, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, Arnaldo Carvalho de Melo

On Thu, Feb 17, 2011 at 06:46:39PM +0800, Deng-Cheng Zhu wrote:

> The reason for the perf-record failure on 32-bit platforms is that the
> 32-bit counter read function mipsxx_pmu_read_counter() returns wrong 64-bit
> values: the counter value is sign-extended. For example, a counter value
> with bit 31 set, such as 0x92345678, will be returned as
> 0xffffffff92345678. So in mipspmu_event_update(), the delta will be wrong.
> Here's a possible fix for your reference:
> 
> --- a/arch/mips/kernel/perf_event_mipsxx.c
> +++ b/arch/mips/kernel/perf_event_mipsxx.c
> @@ -184,19 +184,21 @@ static unsigned int mipsxx_pmu_swizzle_perf_idx(unsigned int idx)
>         return idx;
>  }
> 
> +#define U32_MASK 0xffffffff
> +
>  static u64 mipsxx_pmu_read_counter(unsigned int idx)
>  {
>         idx = mipsxx_pmu_swizzle_perf_idx(idx);
> 
>         switch (idx) {
>         case 0:
> -               return read_c0_perfcntr0();
> +               return read_c0_perfcntr0() & U32_MASK;
>         case 1:
> -               return read_c0_perfcntr1();
> +               return read_c0_perfcntr1() & U32_MASK;
>         case 2:
> -               return read_c0_perfcntr2();
> +               return read_c0_perfcntr2() & U32_MASK;
>         case 3:
> -               return read_c0_perfcntr3();
> +               return read_c0_perfcntr3() & U32_MASK;

read_c0_perfcntr0 etc. are defined in mipsregs.h as 32-bit reads returning
a signed int.  That was OK on 32-bit kernels.  To support the optional
64-bit counters the code will have to be changed to something like:

static u64 mipsxx_pmu_read_counter(unsigned int idx)
{
	idx = mipsxx_pmu_swizzle_perf_idx(idx);

	switch (idx) {
	case 0:
		if (read_c0_perfctrl0() & M_PERFCTL_WIDE)
			return read_c0_64_bit_perfcntr0();
		else
			return read_c0_32_bit_perfcntr0();
	case 1:
		if (read_c0_perfctrl1() & M_PERFCTL_WIDE)
			return read_c0_64_bit_perfcntr1();
		else
			return read_c0_32_bit_perfcntr1();
...

And the read_c0_32_bit_perfcntrX accessors need to zero-extend their
return value.
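
For example, something like this would do (a hypothetical macro
following the existing mipsregs.h style; the cast through u32 is what
forces the zero extension):

#define read_c0_32_bit_perfcntr0() \
	((u64)(u32)__read_32bit_c0_register($25, 1))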

  Ralf

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 4/4] MIPS: perf: Add support for 64-bit perf counters.
  2011-02-17 13:36               ` Ralf Baechle
@ 2011-02-17 15:26                 ` Deng-Cheng Zhu
  0 siblings, 0 replies; 15+ messages in thread
From: Deng-Cheng Zhu @ 2011-02-17 15:26 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: David Daney, linux-mips, Peter Zijlstra, Paul Mackerras,
	Ingo Molnar, Arnaldo Carvalho de Melo

Since a separate function mipsxx_pmu_read_counter_64() has already been
defined, the counter-width check should not be needed here. And yes,
U32_MASK is right here: it zeroes out the upper 32 bits of the 64-bit
return value.
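
That is, the two read paths stay separate. An abbreviated sketch
following the structure of the v2 patch:

static u64 mipsxx_pmu_read_counter_64(unsigned int idx)
{
	idx = mipsxx_pmu_swizzle_perf_idx(idx);

	switch (idx) {
	case 0:
		/* Full-width read of the 64-bit counter; no mask needed. */
		return read_c0_perfcntr0_64();
	case 1:
		return read_c0_perfcntr1_64();
	...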

Deng-Cheng



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/4] MIPS: perf: Add support for 64-bit perf counters.
  2011-02-17 10:46             ` Deng-Cheng Zhu
  2011-02-17 13:36               ` Ralf Baechle
@ 2011-02-17 17:26               ` David Daney
  2011-02-17 19:23               ` David Daney
  2 siblings, 0 replies; 15+ messages in thread
From: David Daney @ 2011-02-17 17:26 UTC (permalink / raw)
  To: Deng-Cheng Zhu, ralf
  Cc: linux-mips, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo

On 02/17/2011 02:46 AM, Deng-Cheng Zhu wrote:
> Hi, David
>
>
> The reason for the perf-record failure on 32-bit platforms is that the
> 32-bit counter read function mipsxx_pmu_read_counter() returns wrong 64-bit
> values: the counter value is sign-extended. For example, a counter value
> with bit 31 set, such as 0x92345678, will be returned as
> 0xffffffff92345678. So in mipspmu_event_update(), the delta will be wrong.
[...]

Thanks for the excellent detective work.  I will generate a fixed patch 
set.  I think we are getting close to something that will work here.

David Daney




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/4] MIPS: perf: Add support for 64-bit perf counters.
  2011-02-17 10:46             ` Deng-Cheng Zhu
  2011-02-17 13:36               ` Ralf Baechle
  2011-02-17 17:26               ` David Daney
@ 2011-02-17 19:23               ` David Daney
  2 siblings, 0 replies; 15+ messages in thread
From: David Daney @ 2011-02-17 19:23 UTC (permalink / raw)
  To: Deng-Cheng Zhu
  Cc: linux-mips, ralf, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo

On 02/17/2011 02:46 AM, Deng-Cheng Zhu wrote:
> Hi, David
[...]
>
> And here's a general comment: you are putting the majority of the
> implementation in perf_event_mipsxx.c. This will require other CPUs like
> Loongson2 to replicate quite a lot of code in their corresponding files.

There is no such Loongson2 implementation.  But if someone were to create
one, I would suggest moving the common code to a separate file that would
be shared among the implementations that need it.

> I
> personally think the original "skeleton + #include perf_event_$cpu.c" layout
> is a better choice. I understand you prefer not to use code like
> "#if defined(CONFIG_CPU_MIPS32)" at the top of perf_event_$cpu.c, but that
> is what other architectures (x86/ARM etc.) are doing.
>

Existing poor practice is not a good reason to do this.

David Daney

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2011-02-17 19:24 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-21 22:59 [PATCH v2 0/4] MIPS: perf: Add support for 64-bit MIPS hardware counters David Daney
2011-01-21 22:59 ` [PATCH v2 1/4] MIPS: Add accessor macros for 64-bit performance counter registers David Daney
2011-01-21 22:59 ` [PATCH v2 2/4] MIPS: perf: Cleanup formatting in arch/mips/kernel/perf_event.c David Daney
2011-01-21 22:59 ` [PATCH v2 3/4] MIPS: perf: Reorganize contents of perf support files David Daney
2011-01-21 22:59 ` [PATCH v2 4/4] MIPS: perf: Add support for 64-bit perf counters David Daney
2011-01-25  3:42   ` Deng-Cheng Zhu
2011-01-26  0:20     ` David Daney
2011-01-27  6:24       ` Deng-Cheng Zhu
2011-01-27 18:41         ` David Daney
2011-01-28  2:46           ` Deng-Cheng Zhu
2011-02-17 10:46             ` Deng-Cheng Zhu
2011-02-17 13:36               ` Ralf Baechle
2011-02-17 15:26                 ` Deng-Cheng Zhu
2011-02-17 17:26               ` David Daney
2011-02-17 19:23               ` David Daney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.