* [PATCH 1/8] powerpc/perf: Check that events only include valid bits on Power8
@ 2013-06-24 11:28 Michael Ellerman
  2013-06-24 11:28 ` [PATCH 2/8] powerpc/perf: Rework disable logic in pmu_disable() Michael Ellerman
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Michael Ellerman @ 2013-06-24 11:28 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sukadev, Paul Mackerras, khandual

A mistake we have made in the past is that we pull out the fields we
need from the event code, but don't check that there are no unknown bits
set. This means that we can't ever assign meaning to those unknown bits
in future.

Although we have once again failed to do this at release, it is still
early days for Power8 so I think we can still slip this in and get away
with it.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
---
 arch/powerpc/perf/power8-pmu.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index f7d1c4f..84cdc6d 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -109,6 +109,16 @@
 #define EVENT_IS_MARKED		(EVENT_MARKED_MASK << EVENT_MARKED_SHIFT)
 #define EVENT_PSEL_MASK		0xff	/* PMCxSEL value */
 
+#define EVENT_VALID_MASK	\
+	((EVENT_THRESH_MASK    << EVENT_THRESH_SHIFT)		|	\
+	 (EVENT_SAMPLE_MASK    << EVENT_SAMPLE_SHIFT)		|	\
+	 (EVENT_CACHE_SEL_MASK << EVENT_CACHE_SEL_SHIFT)	|	\
+	 (EVENT_PMC_MASK       << EVENT_PMC_SHIFT)		|	\
+	 (EVENT_UNIT_MASK      << EVENT_UNIT_SHIFT)		|	\
+	 (EVENT_COMBINE_MASK   << EVENT_COMBINE_SHIFT)		|	\
+	 (EVENT_MARKED_MASK    << EVENT_MARKED_SHIFT)		|	\
+	  EVENT_PSEL_MASK)
+
 /* MMCRA IFM bits - POWER8 */
 #define	POWER8_MMCRA_IFM1		0x0000000040000000UL
 #define	POWER8_MMCRA_IFM2		0x0000000080000000UL
@@ -212,6 +222,9 @@ static int power8_get_constraint(u64 event, unsigned long *maskp, unsigned long
 
 	mask = value = 0;
 
+	if (event & ~EVENT_VALID_MASK)
+		return -1;
+
 	pmc   = (event >> EVENT_PMC_SHIFT)       & EVENT_PMC_MASK;
 	unit  = (event >> EVENT_UNIT_SHIFT)      & EVENT_UNIT_MASK;
 	cache = (event >> EVENT_CACHE_SEL_SHIFT) & EVENT_CACHE_SEL_MASK;
-- 
1.7.10.4


* [PATCH 2/8] powerpc/perf: Rework disable logic in pmu_disable()
  2013-06-24 11:28 [PATCH 1/8] powerpc/perf: Check that events only include valid bits on Power8 Michael Ellerman
@ 2013-06-24 11:28 ` Michael Ellerman
  2013-06-25 11:22   ` Anshuman Khandual
  2013-06-24 11:28 ` [PATCH 3/8] powerpc/perf: Freeze PMC5/6 if we're not using them Michael Ellerman
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Michael Ellerman @ 2013-06-24 11:28 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sukadev, Paul Mackerras, khandual

In pmu_disable() we disable the PMU by setting the FC (Freeze Counters)
bit in MMCR0. In order to do this we have to read/modify/write MMCR0.

It's possible that we read a value from MMCR0 which has PMAO (PMU Alert
Occurred) set. When we write that value back it will cause an interrupt
to occur. We will then end up in the PMU interrupt handler even though
we are supposed to have just disabled the PMU.

We can avoid this by making sure we never write PMAO back. We should not
lose interrupts because when the PMU is re-enabled the overflowed values
will cause another interrupt.

We also reorder the clearing of SAMPLE_ENABLE so that it is done after the
PMU is frozen. Otherwise there is a small window between the clearing of
SAMPLE_ENABLE and the setting of FC where we could take an interrupt and
incorrectly see SAMPLE_ENABLE not set. This would for example change the
logic in perf_read_regs().

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
---
 arch/powerpc/perf/core-book3s.c |   31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 29c6482..1ab3068 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -75,6 +75,7 @@ static unsigned int freeze_events_kernel = MMCR0_FCS;
 
 #define MMCR0_FCHV		0
 #define MMCR0_PMCjCE		MMCR0_PMCnCE
+#define MMCR0_PMAO		0
 
 #define SPRN_MMCRA		SPRN_MMCR2
 #define MMCRA_SAMPLE_ENABLE	0
@@ -852,7 +853,7 @@ static void write_mmcr0(struct cpu_hw_events *cpuhw, unsigned long mmcr0)
 static void power_pmu_disable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw;
-	unsigned long flags;
+	unsigned long flags, val;
 
 	if (!ppmu)
 		return;
@@ -860,9 +861,6 @@ static void power_pmu_disable(struct pmu *pmu)
 	cpuhw = &__get_cpu_var(cpu_hw_events);
 
 	if (!cpuhw->disabled) {
-		cpuhw->disabled = 1;
-		cpuhw->n_added = 0;
-
 		/*
 		 * Check if we ever enabled the PMU on this cpu.
 		 */
@@ -872,6 +870,21 @@ static void power_pmu_disable(struct pmu *pmu)
 		}
 
 		/*
+		 * Set the 'freeze counters' bit, clear PMAO.
+		 */
+		val  = mfspr(SPRN_MMCR0);
+		val |= MMCR0_FC;
+		val &= ~MMCR0_PMAO;
+
+		/*
+		 * The barrier is to make sure the mtspr has been
+		 * executed and the PMU has frozen the events etc.
+		 * before we return.
+		 */
+		write_mmcr0(cpuhw, val);
+		mb();
+
+		/*
 		 * Disable instruction sampling if it was enabled
 		 */
 		if (cpuhw->mmcr[2] & MMCRA_SAMPLE_ENABLE) {
@@ -880,14 +893,8 @@ static void power_pmu_disable(struct pmu *pmu)
 			mb();
 		}
 
-		/*
-		 * Set the 'freeze counters' bit.
-		 * The barrier is to make sure the mtspr has been
-		 * executed and the PMU has frozen the events
-		 * before we return.
-		 */
-		write_mmcr0(cpuhw, mfspr(SPRN_MMCR0) | MMCR0_FC);
-		mb();
+		cpuhw->disabled = 1;
+		cpuhw->n_added = 0;
 	}
 	local_irq_restore(flags);
 }
-- 
1.7.10.4


* [PATCH 3/8] powerpc/perf: Freeze PMC5/6 if we're not using them
  2013-06-24 11:28 [PATCH 1/8] powerpc/perf: Check that events only include valid bits on Power8 Michael Ellerman
  2013-06-24 11:28 ` [PATCH 2/8] powerpc/perf: Rework disable logic in pmu_disable() Michael Ellerman
@ 2013-06-24 11:28 ` Michael Ellerman
  2013-06-24 11:28 ` [PATCH 4/8] powerpc/perf: Use existing out label in power_pmu_enable() Michael Ellerman
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Michael Ellerman @ 2013-06-24 11:28 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sukadev, Paul Mackerras, khandual

On Power8 we can freeze PMC5 and 6 if we're not using them. Normally they
run all the time.

As noticed by Anshuman, we should unfreeze them when we disable the PMU
as there are legacy tools which expect them to run all the time.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
---
 arch/powerpc/include/asm/reg.h  |    1 +
 arch/powerpc/perf/core-book3s.c |    5 +++--
 arch/powerpc/perf/power8-pmu.c  |    4 ++++
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 4a9e408..362142b 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -626,6 +626,7 @@
 #define   MMCR0_TRIGGER	0x00002000UL /* TRIGGER enable */
 #define   MMCR0_PMAO	0x00000080UL /* performance monitor alert has occurred, set to 0 after handling exception */
 #define   MMCR0_SHRFC	0x00000040UL /* SHRre freeze conditions between threads */
+#define   MMCR0_FC56	0x00000010UL /* freeze counters 5 and 6 */
 #define   MMCR0_FCTI	0x00000008UL /* freeze counters in tags inactive mode */
 #define   MMCR0_FCTA	0x00000004UL /* freeze counters in tags active mode */
 #define   MMCR0_FCWAIT	0x00000002UL /* freeze counter in WAIT state */
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 1ab3068..3d566ee 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -75,6 +75,7 @@ static unsigned int freeze_events_kernel = MMCR0_FCS;
 
 #define MMCR0_FCHV		0
 #define MMCR0_PMCjCE		MMCR0_PMCnCE
+#define MMCR0_FC56		0
 #define MMCR0_PMAO		0
 
 #define SPRN_MMCRA		SPRN_MMCR2
@@ -870,11 +871,11 @@ static void power_pmu_disable(struct pmu *pmu)
 		}
 
 		/*
-		 * Set the 'freeze counters' bit, clear PMAO.
+		 * Set the 'freeze counters' bit, clear PMAO/FC56.
 		 */
 		val  = mfspr(SPRN_MMCR0);
 		val |= MMCR0_FC;
-		val &= ~MMCR0_PMAO;
+		val &= ~(MMCR0_PMAO | MMCR0_FC56);
 
 		/*
 		 * The barrier is to make sure the mtspr has been
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 84cdc6d..d59f5b2 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -391,6 +391,10 @@ static int power8_compute_mmcr(u64 event[], int n_ev,
 	if (pmc_inuse & 0x7c)
 		mmcr[0] |= MMCR0_PMCjCE;
 
+	/* If we're not using PMC 5 or 6, freeze them */
+	if (!(pmc_inuse & 0x60))
+		mmcr[0] |= MMCR0_FC56;
+
 	mmcr[1] = mmcr1;
 	mmcr[2] = mmcra;
 
-- 
1.7.10.4


* [PATCH 4/8] powerpc/perf: Use existing out label in power_pmu_enable()
  2013-06-24 11:28 [PATCH 1/8] powerpc/perf: Check that events only include valid bits on Power8 Michael Ellerman
  2013-06-24 11:28 ` [PATCH 2/8] powerpc/perf: Rework disable logic in pmu_disable() Michael Ellerman
  2013-06-24 11:28 ` [PATCH 3/8] powerpc/perf: Freeze PMC5/6 if we're not using them Michael Ellerman
@ 2013-06-24 11:28 ` Michael Ellerman
  2013-06-27 11:01   ` Anshuman Khandual
  2013-06-24 11:28 ` [PATCH 5/8] powerpc/perf: Don't enable if we have zero events Michael Ellerman
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Michael Ellerman @ 2013-06-24 11:28 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sukadev, Paul Mackerras, khandual

In power_pmu_enable() we can use the existing out label to reduce the
number of return paths.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
---
 arch/powerpc/perf/core-book3s.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 3d566ee..af4b4b1 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -919,12 +919,13 @@ static void power_pmu_enable(struct pmu *pmu)
 
 	if (!ppmu)
 		return;
+
 	local_irq_save(flags);
+
 	cpuhw = &__get_cpu_var(cpu_hw_events);
-	if (!cpuhw->disabled) {
-		local_irq_restore(flags);
-		return;
-	}
+	if (!cpuhw->disabled)
+		goto out;
+
 	cpuhw->disabled = 0;
 
 	/*
-- 
1.7.10.4


* [PATCH 5/8] powerpc/perf: Don't enable if we have zero events
  2013-06-24 11:28 [PATCH 1/8] powerpc/perf: Check that events only include valid bits on Power8 Michael Ellerman
                   ` (2 preceding siblings ...)
  2013-06-24 11:28 ` [PATCH 4/8] powerpc/perf: Use existing out label in power_pmu_enable() Michael Ellerman
@ 2013-06-24 11:28 ` Michael Ellerman
  2013-06-28  5:10   ` Anshuman Khandual
  2013-06-24 11:28 ` [PATCH 6/8] powerpc/perf: Drop MMCRA from thread_struct Michael Ellerman
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Michael Ellerman @ 2013-06-24 11:28 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sukadev, Paul Mackerras, khandual

In power_pmu_enable() we still enable the PMU even if we have zero
events. This should have no effect but doesn't make much sense. Instead
just return after telling the hypervisor that we are not using the PMCs.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
---
 arch/powerpc/perf/core-book3s.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index af4b4b1..d3ee2e5 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -926,6 +926,11 @@ static void power_pmu_enable(struct pmu *pmu)
 	if (!cpuhw->disabled)
 		goto out;
 
+	if (cpuhw->n_events == 0) {
+		ppc_set_pmu_inuse(0);
+		goto out;
+	}
+
 	cpuhw->disabled = 0;
 
 	/*
@@ -937,8 +942,6 @@ static void power_pmu_enable(struct pmu *pmu)
 	if (!cpuhw->n_added) {
 		mtspr(SPRN_MMCRA, cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE);
 		mtspr(SPRN_MMCR1, cpuhw->mmcr[1]);
-		if (cpuhw->n_events == 0)
-			ppc_set_pmu_inuse(0);
 		goto out_enable;
 	}
 
-- 
1.7.10.4


* [PATCH 6/8] powerpc/perf: Drop MMCRA from thread_struct
  2013-06-24 11:28 [PATCH 1/8] powerpc/perf: Check that events only include valid bits on Power8 Michael Ellerman
                   ` (3 preceding siblings ...)
  2013-06-24 11:28 ` [PATCH 5/8] powerpc/perf: Don't enable if we have zero events Michael Ellerman
@ 2013-06-24 11:28 ` Michael Ellerman
  2013-06-24 11:28 ` [PATCH 7/8] powerpc/perf: Core EBB support for 64-bit book3s Michael Ellerman
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Michael Ellerman @ 2013-06-24 11:28 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sukadev, Paul Mackerras, khandual

In commit 59affcd "Context switch more PMU related SPRs" I added more
PMU SPRs to thread_struct, later modified in commit b11ae95. To add
insult to injury it turns out we don't need to switch MMCRA as it's
only user readable, and the value is recomputed by the PMU code.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
---
 arch/powerpc/include/asm/processor.h |    1 -
 arch/powerpc/kernel/asm-offsets.c    |    1 -
 2 files changed, 2 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 14a6583..48af5d7 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -289,7 +289,6 @@ struct thread_struct {
 	unsigned long	sier;
 	unsigned long	mmcr0;
 	unsigned long	mmcr2;
-	unsigned long	mmcra;
 #endif
 };
 
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 6f16ffa..2f066ef 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -132,7 +132,6 @@ int main(void)
 	DEFINE(THREAD_SIER, offsetof(struct thread_struct, sier));
 	DEFINE(THREAD_MMCR0, offsetof(struct thread_struct, mmcr0));
 	DEFINE(THREAD_MMCR2, offsetof(struct thread_struct, mmcr2));
-	DEFINE(THREAD_MMCRA, offsetof(struct thread_struct, mmcra));
 #endif
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	DEFINE(PACATMSCRATCH, offsetof(struct paca_struct, tm_scratch));
-- 
1.7.10.4


* [PATCH 7/8] powerpc/perf: Core EBB support for 64-bit book3s
  2013-06-24 11:28 [PATCH 1/8] powerpc/perf: Check that events only include valid bits on Power8 Michael Ellerman
                   ` (4 preceding siblings ...)
  2013-06-24 11:28 ` [PATCH 6/8] powerpc/perf: Drop MMCRA from thread_struct Michael Ellerman
@ 2013-06-24 11:28 ` Michael Ellerman
  2013-06-26  8:38   ` Anshuman Khandual
  2013-06-24 11:28 ` [PATCH 8/8] powerpc/perf: Add power8 EBB support Michael Ellerman
  2013-06-25 10:55 ` [PATCH 1/8] powerpc/perf: Check that events only include valid bits on Power8 Anshuman Khandual
  7 siblings, 1 reply; 23+ messages in thread
From: Michael Ellerman @ 2013-06-24 11:28 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sukadev, Paul Mackerras, khandual

Add support for EBB (Event Based Branches) on 64-bit book3s. See the
included documentation for more details.

EBBs are a feature which allows the hardware to branch directly to a
specified user space address when a PMU event overflows. This can be
used by programs for self-monitoring with no kernel involvement in the
inner loop.

Most of the logic is in the generic book3s code, primarily to avoid a
proliferation of PMU callbacks.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
---
 Documentation/powerpc/00-INDEX               |    2 +
 Documentation/powerpc/pmu-ebb.txt            |  122 +++++++++++++++++++
 arch/powerpc/include/asm/perf_event_server.h |    6 +
 arch/powerpc/include/asm/processor.h         |    3 +-
 arch/powerpc/include/asm/reg.h               |    8 ++
 arch/powerpc/include/asm/switch_to.h         |   14 +++
 arch/powerpc/kernel/process.c                |    4 +
 arch/powerpc/perf/core-book3s.c              |  161 +++++++++++++++++++++++---
 8 files changed, 306 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/powerpc/pmu-ebb.txt

diff --git a/Documentation/powerpc/00-INDEX b/Documentation/powerpc/00-INDEX
index dd9e928..05026ce 100644
--- a/Documentation/powerpc/00-INDEX
+++ b/Documentation/powerpc/00-INDEX
@@ -14,6 +14,8 @@ hvcs.txt
 	- IBM "Hypervisor Virtual Console Server" Installation Guide
 mpc52xx.txt
 	- Linux 2.6.x on MPC52xx family
+pmu-ebb.txt
+	- Description of the API for using the PMU with Event Based Branches.
 qe_firmware.txt
 	- describes the layout of firmware binaries for the Freescale QUICC
 	  Engine and the code that parses and uploads the microcode therein.
diff --git a/Documentation/powerpc/pmu-ebb.txt b/Documentation/powerpc/pmu-ebb.txt
new file mode 100644
index 0000000..65b6989
--- /dev/null
+++ b/Documentation/powerpc/pmu-ebb.txt
@@ -0,0 +1,122 @@
+PMU Event Based Branches
+========================
+
+Event Based Branches (EBBs) are a feature which allows the hardware to
+branch directly to a specified user space address when certain events occur.
+
+The full specification is available in Power ISA v2.07:
+
+  https://www.power.org/documentation/power-isa-version-2-07/
+
+One type of event for which EBBs can be configured is PMU exceptions. This
+document describes the API for configuring the Power PMU to generate EBBs,
+using the Linux perf_events API.
+
+
+Terminology
+-----------
+
+Throughout this document we will refer to an "EBB event" or "EBB events". This
+just refers to a struct perf_event which has set the "EBB" flag in its
+attr.config. All events which can be configured on the hardware PMU are
+possible "EBB events".
+
+
+Background
+----------
+
+When a PMU EBB occurs it is delivered to the currently running process. As such
+EBBs can only sensibly be used by programs for self-monitoring.
+
+It is a feature of the perf_events API that events can be created on other
+processes, subject to standard permission checks. This is also true of EBB
+events, however unless the target process enables EBBs (via mtspr(BESCR)) no
+EBBs will ever be delivered.
+
+This makes it possible for a process to enable EBBs for itself, but not
+actually configure any events. At a later time another process can come along
+and attach an EBB event to the process, which will then cause EBBs to be
+delivered to the first process. It's not clear if this is actually useful.
+
+
+When the PMU is configured for EBBs, all PMU interrupts are delivered to the
+user process. This means once an EBB event is scheduled on the PMU, no non-EBB
+events can be configured. This means that EBB events can not be run
+concurrently with regular 'perf' commands.
+
+It is however safe to run 'perf' commands on a process which is using EBBs. In
+general the EBB event will take priority, though it depends on the exact
+options used on the perf_event_open() and the timing.
+
+
+Creating an EBB event
+---------------------
+
+To request that an event is counted using EBB, the event code should have bit
+63 set.
+
+EBB events must be created with a particular, and restrictive, set of
+attributes - this is so that they interoperate correctly with the rest of the
+perf_events subsystem.
+
+An EBB event must be created with the "pinned" and "exclusive" attributes set.
+Note that if you are creating a group of EBB events, only the leader can have
+these attributes set.
+
+An EBB event must NOT set any of the "inherit", "sample_period", "freq" or
+"enable_on_exec" attributes.
+
+An EBB event must be attached to a task. This is specified to perf_event_open()
+by passing a pid value, typically 0 indicating the current task.
+
+All events in a group must agree on whether they want EBB. That is all events
+must request EBB, or none may request EBB.
+
+
+Enabling an EBB event
+---------------------
+
+Once an EBB event has been successfully opened, it must be enabled with the
+perf_events API. This can be achieved either via the ioctl() interface, or the
+prctl() interface.
+
+However, due to the design of the perf_events API, enabling an event does not
+guarantee that it has been scheduled on the PMU. To ensure that the EBB event
+has been scheduled on the PMU, you must perform a read() on the event. If the
+read() returns EOF, then the event has not been scheduled and EBBs are not
+enabled.
+
+
+Reading an EBB event
+--------------------
+
+It is possible to read() from an EBB event. However the results are
+meaningless. Because interrupts are being delivered to the user process the
+kernel is not able to count the event, and so will return a junk value.
+
+
+Closing an EBB event
+--------------------
+
+When an EBB event is finished with, you can close it using close() as for any
+regular event. If this is the last EBB event the PMU will be deconfigured and
+no further PMU EBBs will be delivered.
+
+
+EBB Handler
+-----------
+
+The EBB handler is just regular userspace code, however it must be written in
+the style of an interrupt handler. When the handler is entered all registers
+are live (possibly) and so must be saved somehow before the handler can invoke
+other code.
+
+It's up to the program how to handle this. For C programs a relatively simple
+option is to create an interrupt frame on the stack and save registers there.
+
+Fork
+----
+
+EBB events are not inherited across fork. If the child process wishes to use
+EBBs it should open a new event for itself. Similarly the EBB state in
+BESCR/EBBHR/EBBRR is cleared across fork().
diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index f265049..2dd7bfc 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -60,6 +60,7 @@ struct power_pmu {
 #define PPMU_HAS_SSLOT		0x00000020 /* Has sampled slot in MMCRA */
 #define PPMU_HAS_SIER		0x00000040 /* Has SIER */
 #define PPMU_BHRB		0x00000080 /* has BHRB feature enabled */
+#define PPMU_EBB		0x00000100 /* supports event based branch */
 
 /*
  * Values for flags to get_alternatives()
@@ -68,6 +69,11 @@ struct power_pmu {
 #define PPMU_LIMITED_PMC_REQD	2	/* have to put this on a limited PMC */
 #define PPMU_ONLY_COUNT_RUN	4	/* only counting in run state */
 
+/*
+ * We use the event config bit 63 as a flag to request EBB.
+ */
+#define EVENT_CONFIG_EBB_SHIFT	63
+
 extern int register_power_pmu(struct power_pmu *);
 
 struct pt_regs;
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 48af5d7..f9a4cdc 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -287,8 +287,9 @@ struct thread_struct {
 	unsigned long	siar;
 	unsigned long	sdar;
 	unsigned long	sier;
-	unsigned long	mmcr0;
 	unsigned long	mmcr2;
+	unsigned 	mmcr0;
+	unsigned 	used_ebb;
 #endif
 };
 
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 362142b..5d7d9c2 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -621,6 +621,9 @@
 #define   MMCR0_PMXE	0x04000000UL /* performance monitor exception enable */
 #define   MMCR0_FCECE	0x02000000UL /* freeze ctrs on enabled cond or event */
 #define   MMCR0_TBEE	0x00400000UL /* time base exception enable */
+#define   MMCR0_EBE	0x00100000UL /* Event based branch enable */
+#define   MMCR0_PMCC	0x000c0000UL /* PMC control */
+#define   MMCR0_PMCC_U6	0x00080000UL /* PMC1-6 are R/W by user (PR) */
 #define   MMCR0_PMC1CE	0x00008000UL /* PMC1 count enable*/
 #define   MMCR0_PMCjCE	0x00004000UL /* PMCj count enable*/
 #define   MMCR0_TRIGGER	0x00002000UL /* TRIGGER enable */
@@ -674,6 +677,11 @@
 #define   SIER_SIAR_VALID	0x0400000	/* SIAR contents valid */
 #define   SIER_SDAR_VALID	0x0200000	/* SDAR contents valid */
 
+/* When EBB is enabled, some of MMCR0/MMCR2/SIER are user accessible */
+#define MMCR0_USER_MASK	(MMCR0_FC | MMCR0_PMXE | MMCR0_PMAO)
+#define MMCR2_USER_MASK	0x4020100804020000UL /* (FC1P|FC2P|FC3P|FC4P|FC5P|FC6P) */
+#define SIER_USER_MASK	0x7fffffUL
+
 #define SPRN_PA6T_MMCR0 795
 #define   PA6T_MMCR0_EN0	0x0000000000000001UL
 #define   PA6T_MMCR0_EN1	0x0000000000000002UL
diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index 200d763..49a13e0 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -67,4 +67,18 @@ static inline void flush_spe_to_thread(struct task_struct *t)
 }
 #endif
 
+static inline void clear_task_ebb(struct task_struct *t)
+{
+#ifdef CONFIG_PPC_BOOK3S_64
+    /* EBB perf events are not inherited, so clear all EBB state. */
+    t->thread.bescr = 0;
+    t->thread.mmcr2 = 0;
+    t->thread.mmcr0 = 0;
+    t->thread.siar = 0;
+    t->thread.sdar = 0;
+    t->thread.sier = 0;
+    t->thread.used_ebb = 0;
+#endif
+}
+
 #endif /* _ASM_POWERPC_SWITCH_TO_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 076d124..c517dbe 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -916,7 +916,11 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	flush_altivec_to_thread(src);
 	flush_vsx_to_thread(src);
 	flush_spe_to_thread(src);
+
 	*dst = *src;
+
+	clear_task_ebb(dst);
+
 	return 0;
 }
 
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index d3ee2e5..c6bbbf9 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -77,6 +77,9 @@ static unsigned int freeze_events_kernel = MMCR0_FCS;
 #define MMCR0_PMCjCE		MMCR0_PMCnCE
 #define MMCR0_FC56		0
 #define MMCR0_PMAO		0
+#define MMCR0_EBE		0
+#define MMCR0_PMCC		0
+#define MMCR0_PMCC_U6		0
 
 #define SPRN_MMCRA		SPRN_MMCR2
 #define MMCRA_SAMPLE_ENABLE	0
@@ -104,6 +107,15 @@ static inline int siar_valid(struct pt_regs *regs)
 	return 1;
 }
 
+static bool is_ebb_event(struct perf_event *event) { return false; }
+static int ebb_event_check(struct perf_event *event) { return 0; }
+static void ebb_event_add(struct perf_event *event) { }
+static void ebb_switch_out(unsigned long mmcr0) { }
+static unsigned long ebb_switch_in(bool ebb, unsigned long mmcr0)
+{
+	return mmcr0;
+}
+
 static inline void power_pmu_bhrb_enable(struct perf_event *event) {}
 static inline void power_pmu_bhrb_disable(struct perf_event *event) {}
 void power_pmu_flush_branch_stack(void) {}
@@ -464,6 +476,89 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 	return;
 }
 
+static bool is_ebb_event(struct perf_event *event)
+{
+	/*
+	 * This could be a per-PMU callback, but we'd rather avoid the cost. We
+	 * check that the PMU supports EBB, meaning those that don't can still
+	 * use bit 63 of the event code for something else if they wish.
+	 */
+	return (ppmu->flags & PPMU_EBB) &&
+	       ((event->attr.config >> EVENT_CONFIG_EBB_SHIFT) & 1);
+}
+
+static int ebb_event_check(struct perf_event *event)
+{
+	struct perf_event *leader = event->group_leader;
+
+	/* Event and group leader must agree on EBB */
+	if (is_ebb_event(leader) != is_ebb_event(event))
+		return -EINVAL;
+
+	if (is_ebb_event(event)) {
+		if (!(event->attach_state & PERF_ATTACH_TASK))
+			return -EINVAL;
+
+		if (!leader->attr.pinned || !leader->attr.exclusive)
+			return -EINVAL;
+
+		if (event->attr.inherit || event->attr.sample_period ||
+		    event->attr.enable_on_exec || event->attr.freq)
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void ebb_event_add(struct perf_event *event)
+{
+	if (!is_ebb_event(event) || current->thread.used_ebb)
+		return;
+
+	/*
+	 * IFF this is the first time we've added an EBB event, set
+	 * PMXE in the user MMCR0 so we can detect when it's cleared by
+	 * userspace. We need this so that we can context switch while
+	 * userspace is in the EBB handler (where PMXE is 0).
+	 */
+	current->thread.used_ebb = 1;
+	current->thread.mmcr0 |= MMCR0_PMXE;
+}
+
+static void ebb_switch_out(unsigned long mmcr0)
+{
+	if (!(mmcr0 & MMCR0_EBE))
+		return;
+
+	current->thread.siar  = mfspr(SPRN_SIAR);
+	current->thread.sier  = mfspr(SPRN_SIER);
+	current->thread.sdar  = mfspr(SPRN_SDAR);
+	current->thread.mmcr0 = mmcr0 & MMCR0_USER_MASK;
+	current->thread.mmcr2 = mfspr(SPRN_MMCR2) & MMCR2_USER_MASK;
+}
+
+static unsigned long ebb_switch_in(bool ebb, unsigned long mmcr0)
+{
+	if (!ebb)
+		goto out;
+
+	/* Enable EBB and read/write to all 6 PMCs for userspace */
+	mmcr0 |= MMCR0_EBE | MMCR0_PMCC_U6;
+
+	/* Add any bits from the user reg, FC or PMAO */
+	mmcr0 |= current->thread.mmcr0;
+
+	/* Be careful not to set PMXE if userspace had it cleared */
+	if (!(current->thread.mmcr0 & MMCR0_PMXE))
+		mmcr0 &= ~MMCR0_PMXE;
+
+	mtspr(SPRN_SIAR, current->thread.siar);
+	mtspr(SPRN_SIER, current->thread.sier);
+	mtspr(SPRN_SDAR, current->thread.sdar);
+	mtspr(SPRN_MMCR2, current->thread.mmcr2);
+out:
+	return mmcr0;
+}
 #endif /* CONFIG_PPC64 */
 
 static void perf_event_interrupt(struct pt_regs *regs);
@@ -734,6 +829,13 @@ static void power_pmu_read(struct perf_event *event)
 
 	if (!event->hw.idx)
 		return;
+
+	if (is_ebb_event(event)) {
+		val = read_pmc(event->hw.idx);
+		local64_set(&event->hw.prev_count, val);
+		return;
+	}
+
 	/*
 	 * Performance monitor interrupts come even when interrupts
 	 * are soft-disabled, as long as interrupts are hard-enabled.
@@ -854,7 +956,7 @@ static void write_mmcr0(struct cpu_hw_events *cpuhw, unsigned long mmcr0)
 static void power_pmu_disable(struct pmu *pmu)
 {
 	struct cpu_hw_events *cpuhw;
-	unsigned long flags, val;
+	unsigned long flags, mmcr0, val;
 
 	if (!ppmu)
 		return;
@@ -871,11 +973,11 @@ static void power_pmu_disable(struct pmu *pmu)
 		}
 
 		/*
-		 * Set the 'freeze counters' bit, clear PMAO/FC56.
+		 * Set the 'freeze counters' bit, clear EBE/PMCC/PMAO/FC56.
 		 */
-		val  = mfspr(SPRN_MMCR0);
+		val  = mmcr0 = mfspr(SPRN_MMCR0);
 		val |= MMCR0_FC;
-		val &= ~(MMCR0_PMAO | MMCR0_FC56);
+		val &= ~(MMCR0_EBE | MMCR0_PMCC | MMCR0_PMAO | MMCR0_FC56);
 
 		/*
 		 * The barrier is to make sure the mtspr has been
@@ -896,7 +998,10 @@ static void power_pmu_disable(struct pmu *pmu)
 
 		cpuhw->disabled = 1;
 		cpuhw->n_added = 0;
+
+		ebb_switch_out(mmcr0);
 	}
+
 	local_irq_restore(flags);
 }
 
@@ -911,15 +1016,15 @@ static void power_pmu_enable(struct pmu *pmu)
 	struct cpu_hw_events *cpuhw;
 	unsigned long flags;
 	long i;
-	unsigned long val;
+	unsigned long val, mmcr0;
 	s64 left;
 	unsigned int hwc_index[MAX_HWEVENTS];
 	int n_lim;
 	int idx;
+	bool ebb;
 
 	if (!ppmu)
 		return;
-
 	local_irq_save(flags);
 
 	cpuhw = &__get_cpu_var(cpu_hw_events);
@@ -934,6 +1039,13 @@ static void power_pmu_enable(struct pmu *pmu)
 	cpuhw->disabled = 0;
 
 	/*
+	 * EBB requires an exclusive group and all events must have the EBB
+	 * flag set, or not set, so we can just check a single event. Also we
+	 * know we have at least one event.
+	 */
+	ebb = is_ebb_event(cpuhw->event[0]);
+
+	/*
 	 * If we didn't change anything, or only removed events,
 	 * no need to recalculate MMCR* settings and reset the PMCs.
 	 * Just reenable the PMU with the current MMCR* settings
@@ -1008,25 +1120,34 @@ static void power_pmu_enable(struct pmu *pmu)
 			++n_lim;
 			continue;
 		}
-		val = 0;
-		if (event->hw.sample_period) {
-			left = local64_read(&event->hw.period_left);
-			if (left < 0x80000000L)
-				val = 0x80000000L - left;
+
+		if (ebb)
+			val = local64_read(&event->hw.prev_count);
+		else {
+			val = 0;
+			if (event->hw.sample_period) {
+				left = local64_read(&event->hw.period_left);
+				if (left < 0x80000000L)
+					val = 0x80000000L - left;
+			}
+			local64_set(&event->hw.prev_count, val);
 		}
-		local64_set(&event->hw.prev_count, val);
+
 		event->hw.idx = idx;
 		if (event->hw.state & PERF_HES_STOPPED)
 			val = 0;
 		write_pmc(idx, val);
+
 		perf_event_update_userpage(event);
 	}
 	cpuhw->n_limited = n_lim;
 	cpuhw->mmcr[0] |= MMCR0_PMXE | MMCR0_FCECE;
 
  out_enable:
+	mmcr0 = ebb_switch_in(ebb, cpuhw->mmcr[0]);
+
 	mb();
-	write_mmcr0(cpuhw, cpuhw->mmcr[0]);
+	write_mmcr0(cpuhw, mmcr0);
 
 	/*
 	 * Enable instruction sampling if necessary
@@ -1124,6 +1245,8 @@ static int power_pmu_add(struct perf_event *event, int ef_flags)
 	event->hw.config = cpuhw->events[n0];
 
 nocheck:
+	ebb_event_add(event);
+
 	++cpuhw->n_events;
 	++cpuhw->n_added;
 
@@ -1484,6 +1607,11 @@ static int power_pmu_event_init(struct perf_event *event)
 		}
 	}
 
+	/* Extra checks for EBB */
+	err = ebb_event_check(event);
+	if (err)
+		return err;
+
 	/*
 	 * If this is in a group, check if it can go on with all the
 	 * other hardware events in the group.  We assume the event
@@ -1523,6 +1651,13 @@ static int power_pmu_event_init(struct perf_event *event)
 	local64_set(&event->hw.period_left, event->hw.last_period);
 
 	/*
+	 * For EBB events we just context switch the PMC value, we don't do any
+	 * of the sample_period logic. We use hw.prev_count for this.
+	 */
+	if (is_ebb_event(event))
+		local64_set(&event->hw.prev_count, 0);
+
+	/*
 	 * See if we need to reserve the PMU.
 	 * If no events are currently in use, then we have to take a
 	 * mutex to ensure that we don't race with another task doing
-- 
1.7.10.4


* [PATCH 8/8] powerpc/perf: Add power8 EBB support
  2013-06-24 11:28 [PATCH 1/8] powerpc/perf: Check that events only include valid bits on Power8 Michael Ellerman
                   ` (5 preceding siblings ...)
  2013-06-24 11:28 ` [PATCH 7/8] powerpc/perf: Core EBB support for 64-bit book3s Michael Ellerman
@ 2013-06-24 11:28 ` Michael Ellerman
  2013-06-26  9:58   ` Anshuman Khandual
  2013-06-25 10:55 ` [PATCH 1/8] powerpc/perf: Check that events only include valid bits on Power8 Anshuman Khandual
  7 siblings, 1 reply; 23+ messages in thread
From: Michael Ellerman @ 2013-06-24 11:28 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sukadev, Paul Mackerras, khandual

Add logic to the power8 PMU code to support EBB. Future processors would
also be expected to implement similar constraints. At that time we could
possibly factor these out into common code.

Finally mark the power8 PMU as supporting EBB, which is the actual
enable switch which allows EBBs to be configured.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
---
 arch/powerpc/perf/power8-pmu.c |   44 +++++++++++++++++++++++++++++-----------
 1 file changed, 32 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index d59f5b2..c7f8ccc 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -31,9 +31,9 @@
  *
  *        60        56        52        48        44        40        36        32
  * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
- *                                     [      thresh_cmp     ]   [  thresh_ctl   ]
- *                                                                       |
- *                                       thresh start/stop OR FAB match -*
+ *   |                                 [      thresh_cmp     ]   [  thresh_ctl   ]
+ *   |                                                                   |
+ *   *- EBB (Linux)                      thresh start/stop OR FAB match -*
  *
  *        28        24        20        16        12         8         4         0
  * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
@@ -117,6 +117,7 @@
 	 (EVENT_UNIT_MASK      << EVENT_UNIT_SHIFT)		|	\
 	 (EVENT_COMBINE_MASK   << EVENT_COMBINE_SHIFT)		|	\
 	 (EVENT_MARKED_MASK    << EVENT_MARKED_SHIFT)		|	\
+	 (1ull		       << EVENT_CONFIG_EBB_SHIFT)	|	\
 	  EVENT_PSEL_MASK)
 
 /* MMCRA IFM bits - POWER8 */
@@ -140,10 +141,10 @@
  *
  *        28        24        20        16        12         8         4         0
  * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
- *                       [ ]   [  sample ]   [     ]   [6] [5]   [4] [3]   [2] [1]
- *                        |                     |
- *      L1 I/D qualifier -*                     |      Count of events for each PMC.
- *                                              |        p1, p2, p3, p4, p5, p6.
+ *                   |   [ ]   [  sample ]   [     ]   [6] [5]   [4] [3]   [2] [1]
+ *              EBB -*    |                     |
+ *                        |                     |      Count of events for each PMC.
+ *      L1 I/D qualifier -*                     |        p1, p2, p3, p4, p5, p6.
  *                     nc - number of counters -*
  *
  * The PMC fields P1..P6, and NC, are adder fields. As we accumulate constraints
@@ -159,6 +160,9 @@
 #define CNST_THRESH_VAL(v)	(((v) & EVENT_THRESH_MASK) << 32)
 #define CNST_THRESH_MASK	CNST_THRESH_VAL(EVENT_THRESH_MASK)
 
+#define CNST_EBB_VAL(v)		(((v) & 1) << 24)
+#define CNST_EBB_MASK		CNST_EBB_VAL(1)
+
 #define CNST_L1_QUAL_VAL(v)	(((v) & 3) << 22)
 #define CNST_L1_QUAL_MASK	CNST_L1_QUAL_VAL(3)
 
@@ -217,7 +221,7 @@ static inline bool event_is_fab_match(u64 event)
 
 static int power8_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
 {
-	unsigned int unit, pmc, cache;
+	unsigned int unit, pmc, cache, ebb;
 	unsigned long mask, value;
 
 	mask = value = 0;
@@ -225,9 +229,13 @@ static int power8_get_constraint(u64 event, unsigned long *maskp, unsigned long
 	if (event & ~EVENT_VALID_MASK)
 		return -1;
 
-	pmc   = (event >> EVENT_PMC_SHIFT)       & EVENT_PMC_MASK;
-	unit  = (event >> EVENT_UNIT_SHIFT)      & EVENT_UNIT_MASK;
-	cache = (event >> EVENT_CACHE_SEL_SHIFT) & EVENT_CACHE_SEL_MASK;
+	pmc   = (event >> EVENT_PMC_SHIFT)        & EVENT_PMC_MASK;
+	unit  = (event >> EVENT_UNIT_SHIFT)       & EVENT_UNIT_MASK;
+	cache = (event >> EVENT_CACHE_SEL_SHIFT)  & EVENT_CACHE_SEL_MASK;
+	ebb   = (event >> EVENT_CONFIG_EBB_SHIFT) & 1;
+
+	/* Clear the EBB bit in the event, so event checks work below */
+	event &= ~(1ull << EVENT_CONFIG_EBB_SHIFT);
 
 	if (pmc) {
 		if (pmc > 6)
@@ -297,6 +305,18 @@ static int power8_get_constraint(u64 event, unsigned long *maskp, unsigned long
 		value |= CNST_THRESH_VAL(event >> EVENT_THRESH_SHIFT);
 	}
 
+	if (!pmc && ebb)
+		/* EBB events must specify the PMC */
+		return -1;
+
+	/*
+	 * All events must agree on EBB, either all request it or none.
+	 * EBB events are pinned & exclusive, so this should never actually
+	 * hit, but we leave it as a fallback in case.
+	 */
+	mask  |= CNST_EBB_VAL(ebb);
+	value |= CNST_EBB_MASK;
+
 	*maskp = mask;
 	*valp = value;
 
@@ -591,7 +611,7 @@ static struct power_pmu power8_pmu = {
 	.get_constraint		= power8_get_constraint,
 	.get_alternatives	= power8_get_alternatives,
 	.disable_pmc		= power8_disable_pmc,
-	.flags			= PPMU_HAS_SSLOT | PPMU_HAS_SIER | PPMU_BHRB,
+	.flags			= PPMU_HAS_SSLOT | PPMU_HAS_SIER | PPMU_BHRB | PPMU_EBB,
 	.n_generic		= ARRAY_SIZE(power8_generic_events),
 	.generic_events		= power8_generic_events,
 	.attr_groups		= power8_pmu_attr_groups,
-- 
1.7.10.4


* Re: [PATCH 1/8] powerpc/perf: Check that events only include valid bits on Power8
  2013-06-24 11:28 [PATCH 1/8] powerpc/perf: Check that events only include valid bits on Power8 Michael Ellerman
                   ` (6 preceding siblings ...)
  2013-06-24 11:28 ` [PATCH 8/8] powerpc/perf: Add power8 EBB support Michael Ellerman
@ 2013-06-25 10:55 ` Anshuman Khandual
  7 siblings, 0 replies; 23+ messages in thread
From: Anshuman Khandual @ 2013-06-25 10:55 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, sukadev, Paul Mackerras

On 06/24/2013 04:58 PM, Michael Ellerman wrote:
> A mistake we have made in the past is that we pull out the fields we
> need from the event code, but don't check that there are no unknown bits
> set. This means that we can't ever assign meaning to those unknown bits
> in future.
> 
> Although we have once again failed to do this at release, it is still
> early days for Power8 so I think we can still slip this in and get away
> with it.
> 
> Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>


* Re: [PATCH 2/8] powerpc/perf: Rework disable logic in pmu_disable()
  2013-06-24 11:28 ` [PATCH 2/8] powerpc/perf: Rework disable logic in pmu_disable() Michael Ellerman
@ 2013-06-25 11:22   ` Anshuman Khandual
  2013-06-26  3:28     ` Michael Ellerman
  2013-07-10  0:15     ` Sukadev Bhattiprolu
  0 siblings, 2 replies; 23+ messages in thread
From: Anshuman Khandual @ 2013-06-25 11:22 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, sukadev, Paul Mackerras

On 06/24/2013 04:58 PM, Michael Ellerman wrote:
> In pmu_disable() we disable the PMU by setting the FC (Freeze Counters)
> bit in MMCR0. In order to do this we have to read/modify/write MMCR0.
> 
> It's possible that we read a value from MMCR0 which has PMAO (PMU Alert
> Occurred) set. When we write that value back it will cause an interrupt
> to occur. We will then end up in the PMU interrupt handler even though
> we are supposed to have just disabled the PMU.
> 

Is that possible? First of all, MMCR0[PMAO] cannot be written by SW.
Even if you try writing it, how is it going to generate a PMU interrupt?
HW sets MMCR0[PMAO] after a PMU interrupt has already occurred; it is not
the case that setting it generates a PMU interrupt.

> We can avoid this by making sure we never write PMAO back. We should not

Making sure that we don't write PMAO back is a good idea though.

> lose interrupts because when the PMU is re-enabled the overflowed values
> will cause another interrupt.
> 

I doubt this theory.

> We also reorder the clearing of SAMPLE_ENABLE so that is done after the
> PMU is frozen. Otherwise there is a small window between the clearing of
> SAMPLE_ENABLE and the setting of FC where we could take an interrupt and
> incorrectly see SAMPLE_ENABLE not set. This would for example change the
> logic in perf_read_regs().
> 

Agreed


* Re: [PATCH 2/8] powerpc/perf: Rework disable logic in pmu_disable()
  2013-06-25 11:22   ` Anshuman Khandual
@ 2013-06-26  3:28     ` Michael Ellerman
  2013-07-10  0:15     ` Sukadev Bhattiprolu
  1 sibling, 0 replies; 23+ messages in thread
From: Michael Ellerman @ 2013-06-26  3:28 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: linuxppc-dev, sukadev, Paul Mackerras

On Tue, Jun 25, 2013 at 04:52:39PM +0530, Anshuman Khandual wrote:
> On 06/24/2013 04:58 PM, Michael Ellerman wrote:
> > In pmu_disable() we disable the PMU by setting the FC (Freeze Counters)
> > bit in MMCR0. In order to do this we have to read/modify/write MMCR0.
> > 
> > It's possible that we read a value from MMCR0 which has PMAO (PMU Alert
> > Occurred) set. When we write that value back it will cause an interrupt
> > to occur. We will then end up in the PMU interrupt handler even though
> > we are supposed to have just disabled the PMU.
> > 
> 
> Is that possible ? First of all MMCR0[PMAO] could not be written by SW.
> Even if you try writing it, how its going to generate PMU interrupt ?
> HW sets this bit MMCR0[PMAO] after a PMU interrupt has already occurred
> not that if we set this, a PMU interrupt would be generated.

Yes it's possible.

cheers


* Re: [PATCH 7/8] powerpc/perf: Core EBB support for 64-bit book3s
  2013-06-24 11:28 ` [PATCH 7/8] powerpc/perf: Core EBB support for 64-bit book3s Michael Ellerman
@ 2013-06-26  8:38   ` Anshuman Khandual
  2013-06-27 11:52     ` Michael Ellerman
  0 siblings, 1 reply; 23+ messages in thread
From: Anshuman Khandual @ 2013-06-26  8:38 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, sukadev, Paul Mackerras

On 06/24/2013 04:58 PM, Michael Ellerman wrote:
> Add support for EBB (Event Based Branches) on 64-bit book3s. See the
> included documentation for more details.
> 
> EBBs are a feature which allows the hardware to branch directly to a
> specified user space address when a PMU event overflows. This can be
> used by programs for self-monitoring with no kernel involvement in the
> inner loop.
> 
> Most of the logic is in the generic book3s code, primarily to avoid a
> proliferation of PMU callbacks.
> 
> Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
> ---
>  Documentation/powerpc/00-INDEX               |    2 +
>  Documentation/powerpc/pmu-ebb.txt            |  122 +++++++++++++++++++
>  arch/powerpc/include/asm/perf_event_server.h |    6 +
>  arch/powerpc/include/asm/processor.h         |    3 +-
>  arch/powerpc/include/asm/reg.h               |    8 ++
>  arch/powerpc/include/asm/switch_to.h         |   14 +++
>  arch/powerpc/kernel/process.c                |    4 +
>  arch/powerpc/perf/core-book3s.c              |  161 +++++++++++++++++++++++---
>  8 files changed, 306 insertions(+), 14 deletions(-)
>  create mode 100644 Documentation/powerpc/pmu-ebb.txt
> 
> diff --git a/Documentation/powerpc/00-INDEX b/Documentation/powerpc/00-INDEX
> index dd9e928..05026ce 100644
> --- a/Documentation/powerpc/00-INDEX
> +++ b/Documentation/powerpc/00-INDEX
> @@ -14,6 +14,8 @@ hvcs.txt
>  	- IBM "Hypervisor Virtual Console Server" Installation Guide
>  mpc52xx.txt
>  	- Linux 2.6.x on MPC52xx family
> +pmu-ebb.txt
> +	- Description of the API for using the PMU with Event Based Branches.
>  qe_firmware.txt
>  	- describes the layout of firmware binaries for the Freescale QUICC
>  	  Engine and the code that parses and uploads the microcode therein.
> diff --git a/Documentation/powerpc/pmu-ebb.txt b/Documentation/powerpc/pmu-ebb.txt
> new file mode 100644
> index 0000000..65b6989
> --- /dev/null
> +++ b/Documentation/powerpc/pmu-ebb.txt
> @@ -0,0 +1,122 @@
> +PMU Event Based Branches
> +========================
> +
> +Event Based Branches (EBBs) are a feature which allows the hardware to
> +branch directly to a specified user space address when certain events occur.
> +
> +The full specification is available in Power ISA v2.07:
> +
> +  https://www.power.org/documentation/power-isa-version-2-07/
> +
> +One type of event for which EBBs can be configured is PMU exceptions. This
> +document describes the API for configuring the Power PMU to generate EBBs,
> +using the Linux perf_events API.
> +
> +
> +Terminology
> +-----------
> +
> +Throughout this document we will refer to an "EBB event" or "EBB events". This
> +just refers to a struct perf_event which has set the "EBB" flag in its
> +attr.config. All events which can be configured on the hardware PMU are
> +possible "EBB events".
> +

Then why do we have a condition like this (in the next patch), where the event
code must have the PMC value inside it?


+	if (!pmc && ebb)
+		/* EBB events must specify the PMC */
+		return -1;


> +
> +Background
> +----------
> +
> +When a PMU EBB occurs it is delivered to the currently running process. As such
> +EBBs can only sensibly be used by programs for self-monitoring.
> +
> +It is a feature of the perf_events API that events can be created on other
> +processes, subject to standard permission checks. This is also true of EBB
> +events, however unless the target process enables EBBs (via mtspr(BESCR)) no
> +EBBs will ever be delivered.
> +
> +This makes it possible for a process to enable EBBs for itself, but not
> +actually configure any events. At a later time another process can come along
> +and attach an EBB event to the process, which will then cause EBBs to be
> +delivered to the first process. It's not clear if this is actually useful.
> +

It may be useful when a "master process" wants each of its threads to collect
statistics (about the same thread) on various dynamically configured events (as
and when it wishes) and report them back to the master process. Just a thought
about a case where it could be useful.

> +When the PMU is configured for EBBs, all PMU interrupts are delivered to the
> +user process. This means once an EBB event is scheduled on the PMU, no non-EBB
> +events can be configured. This means that EBB events can not be run
> +concurrently with regular 'perf' commands.
> +
> +It is however safe to run 'perf' commands on a process which is using EBBs. In
> +general the EBB event will take priority, though it depends on the exact
> +options used on the perf_event_open() and the timing.
> +

This is confusing. Say a process A is using EBB for itself on event "p" and gets
scheduled on CPU X, and process B has started a perf session on process A for
event "q". Now the PMU of CPU X would be programmed for both events "p" and "q"
on different PMCs at the same point in time (with a check that they don't collide
on the same PMC, of course). What you are saying is that when event "p" overflows,
the PMU interrupt (on CPU X) would be delivered to process A's user space, and
when event "q" overflows, the PMU interrupt (on CPU X) would be delivered to the
perf kernel component "perf_event_interrupt()" and would be processed for the
perf session initiated by process B on process A.

But that contradicts your previous statement that when the PMU is configured for
EBB, *all* PMU interrupts are delivered to user space. Could you please clarify
this scenario?

> +
> +Creating an EBB event
> +---------------------
> +
> +To request that an event is counted using EBB, the event code should have bit
> +63 set.
> +

This macro (defined in an arch header) identifies any event as an EBB event:

+/*
+ * We use the event config bit 63 as a flag to request EBB.
+ */
+#define EVENT_CONFIG_EBB_SHIFT	63
+

So any user program would have to include the arch header to be able to set the
EBB bit. A bare numeric 63 is not a clean ABI.
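
For what it's worth, a minimal userspace sketch of the proposed flow (my own
illustration, not part of the patch; the helper names are made up) would look
something like this:

	#include <stdint.h>
	#include <string.h>
	#include <unistd.h>
	#include <sys/ioctl.h>
	#include <sys/syscall.h>
	#include <linux/perf_event.h>

	/* Open an EBB event on the current task, using the proposed bit 63 flag */
	static int ebb_event_open(uint64_t raw_code)
	{
		struct perf_event_attr attr;

		memset(&attr, 0, sizeof(attr));
		attr.type = PERF_TYPE_RAW;
		attr.size = sizeof(attr);
		attr.config = raw_code | (1ULL << 63);	/* request EBB */
		attr.pinned = 1;		/* required on the group leader */
		attr.exclusive = 1;		/* required on the group leader */
		attr.exclude_kernel = 1;
		attr.exclude_hv = 1;

		/* pid 0, cpu -1: attach to the current task on any CPU */
		return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	}

	/* Enable the event and check that it was actually scheduled on the PMU */
	static int ebb_event_enable(int fd)
	{
		uint64_t count;

		if (ioctl(fd, PERF_EVENT_IOC_ENABLE, 0))
			return -1;

		/* Per the documentation, read() returning 0 (EOF) means the
		 * event was not scheduled, so EBBs are not enabled. */
		return read(fd, &count, sizeof(count)) > 0 ? 0 : -1;
	}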

> +EBB events must be created with a particular, and restrictive, set of
> +attributes - this is so that they interoperate correctly with the rest of the
> +perf_events subsystem.
> +
> +An EBB event must be created with the "pinned" and "exclusive" attributes set.
> +Note that if you are creating a group of EBB events, only the leader can have
> +these attributes set.
> +
> +An EBB event must NOT set any of the "inherit", "sample_period", "freq" or
> +"enable_on_exec" attributes.
> +
> +An EBB event must be attached to a task. This is specified to perf_event_open()
> +by passing a pid value, typically 0 indicating the current task.
> +
> +All events in a group must agree on whether they want EBB. That is all events
> +must request EBB, or none may request EBB.
> +
> +
> +Enabling an EBB event
> +---------------------
> +
> +Once an EBB event has been successfully opened, it must be enabled with the
> +perf_events API. This can be achieved either via the ioctl() interface, or the
> +prctl() interface.
> +
> +However, due to the design of the perf_events API, enabling an event does not
> +guarantee that it has been scheduled on the PMU. To ensure that the EBB event
> +has been scheduled on the PMU, you must perform a read() on the event. If the
> +read() returns EOF, then the event has not been scheduled and EBBs are not
> +enabled.
> +
> +
> +Reading an EBB event
> +--------------------
> +
> +It is possible to read() from an EBB event. However the results are
> +meaningless. Because interrupts are being delivered to the user process the
> +kernel is not able to count the event, and so will return a junk value.
> +
> +
> +Closing an EBB event
> +--------------------
> +
> +When an EBB event is finished with, you can close it using close() as for any
> +regular event. If this is the last EBB event the PMU will be deconfigured and
> +no further PMU EBBs will be delivered.
> +
> +
> +EBB Handler
> +-----------
> +
> +The EBB handler is just regular userspace code, however it must be written in
> +the style of an interrupt handler. When the handler is entered all registers
> +are live (possibly) and so must be saved somehow before the handler can invoke
> +other code.
> +
> +It's up to the program how to handle this. For C programs a relatively simple
> +option is to create an interrupt frame on the stack and save registers there.
> +

It would be great if you could give a sample framework here for how to save and
restore registers. Moreover, we could actually put the various essential parts of
the EBB handler construct in the perf_event_server.h file, so that the user would
be able to use them directly and focus only on the core part of the event handling.
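
FWIW, here is the rough shape I had in mind, purely as a sketch of my own. It
assumes the kernel has set MMCR0 so the PMCs are user read/write (MMCR0_PMCC_U6,
as this patch does), that the user SPR number for PMC1 is 771, and it leaves out
the assembly stub entirely:

	#include <stdint.h>

	/* PMC1 is user read/write at SPR 771 when MMCR0[PMCC] allows it */
	static inline void write_pmc1(uint64_t val)
	{
		asm volatile("mtspr 771, %0" : : "r" (val));
	}

	uint64_t ebb_sample_period = 1000000;
	uint64_t ebb_overflows;

	/*
	 * C body of the handler. It must be reached via an assembly stub that
	 * creates a stack frame and saves every register the compiled code
	 * might clobber (GPRs, LR, CTR, CR, XER, and FP/VMX state if used),
	 * because the EBB can arrive at any instruction. The stub restores
	 * the registers afterwards, clears the event-occurred state in BESCR,
	 * and returns to the interrupted code with rfebb.
	 */
	void ebb_handler_body(void)
	{
		ebb_overflows++;

		/* Re-arm PMC1 so the next overflow raises another EBB */
		write_pmc1(0x80000000ull - ebb_sample_period);
	}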

> +Fork
> +----
> +
> +EBB events are not inherited across fork. If the child process wishes to use
> +EBBs it should open a new event for itself. Similarly the EBB state in
> +BESCR/EBBHR/EBBRR is cleared across fork().
> diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
> index f265049..2dd7bfc 100644
> --- a/arch/powerpc/include/asm/perf_event_server.h
> +++ b/arch/powerpc/include/asm/perf_event_server.h
> @@ -60,6 +60,7 @@ struct power_pmu {
>  #define PPMU_HAS_SSLOT		0x00000020 /* Has sampled slot in MMCRA */
>  #define PPMU_HAS_SIER		0x00000040 /* Has SIER */
>  #define PPMU_BHRB		0x00000080 /* has BHRB feature enabled */
> +#define PPMU_EBB		0x00000100 /* supports event based branch */
> 
>  /*
>   * Values for flags to get_alternatives()
> @@ -68,6 +69,11 @@ struct power_pmu {
>  #define PPMU_LIMITED_PMC_REQD	2	/* have to put this on a limited PMC */
>  #define PPMU_ONLY_COUNT_RUN	4	/* only counting in run state */
> 
> +/*
> + * We use the event config bit 63 as a flag to request EBB.
> + */
> +#define EVENT_CONFIG_EBB_SHIFT	63
> +
>  extern int register_power_pmu(struct power_pmu *);
> 
>  struct pt_regs;
> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
> index 48af5d7..f9a4cdc 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -287,8 +287,9 @@ struct thread_struct {
>  	unsigned long	siar;
>  	unsigned long	sdar;
>  	unsigned long	sier;
> -	unsigned long	mmcr0;
>  	unsigned long	mmcr2;
> +	unsigned 	mmcr0;
> +	unsigned 	used_ebb;
>  #endif
>  };
> 

Why does mmcr0 have to change position here?


> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 362142b..5d7d9c2 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -621,6 +621,9 @@
>  #define   MMCR0_PMXE	0x04000000UL /* performance monitor exception enable */
>  #define   MMCR0_FCECE	0x02000000UL /* freeze ctrs on enabled cond or event */
>  #define   MMCR0_TBEE	0x00400000UL /* time base exception enable */
> +#define   MMCR0_EBE	0x00100000UL /* Event based branch enable */
> +#define   MMCR0_PMCC	0x000c0000UL /* PMC control */
> +#define   MMCR0_PMCC_U6	0x00080000UL /* PMC1-6 are R/W by user (PR) */
>  #define   MMCR0_PMC1CE	0x00008000UL /* PMC1 count enable*/
>  #define   MMCR0_PMCjCE	0x00004000UL /* PMCj count enable*/
>  #define   MMCR0_TRIGGER	0x00002000UL /* TRIGGER enable */
> @@ -674,6 +677,11 @@
>  #define   SIER_SIAR_VALID	0x0400000	/* SIAR contents valid */
>  #define   SIER_SDAR_VALID	0x0200000	/* SDAR contents valid */
> 
> +/* When EBB is enabled, some of MMCR0/MMCR2/SIER are user accessible */
> +#define MMCR0_USER_MASK	(MMCR0_FC | MMCR0_PMXE | MMCR0_PMAO)
> +#define MMCR2_USER_MASK	0x4020100804020000UL /* (FC1P|FC2P|FC3P|FC4P|FC5P|FC6P) */
> +#define SIER_USER_MASK	0x7fffffUL
> +

Ohh, so these are the bits in the SPRs which are available in user space to read
and write as well? It would be better to have macros instead of hex codes here.
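
Something like this, perhaps (just an illustration from my side, with the macro
name made up and the bit positions derived purely from the hex mask above, where
the FCnP bits land one bit every 9, starting at bit 62):

	#define MMCR2_FCnP(n)	(1UL << (62 - 9 * ((n) - 1)))

	#define MMCR2_USER_MASK	(MMCR2_FCnP(1) | MMCR2_FCnP(2) | MMCR2_FCnP(3) | \
				 MMCR2_FCnP(4) | MMCR2_FCnP(5) | MMCR2_FCnP(6))
				/* == 0x4020100804020000UL */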

>  #define SPRN_PA6T_MMCR0 795
>  #define   PA6T_MMCR0_EN0	0x0000000000000001UL
>  #define   PA6T_MMCR0_EN1	0x0000000000000002UL
> diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
> index 200d763..49a13e0 100644
> --- a/arch/powerpc/include/asm/switch_to.h
> +++ b/arch/powerpc/include/asm/switch_to.h
> @@ -67,4 +67,18 @@ static inline void flush_spe_to_thread(struct task_struct *t)
>  }
>  #endif
> 
> +static inline void clear_task_ebb(struct task_struct *t)
> +{
> +#ifdef CONFIG_PPC_BOOK3S_64
> +    /* EBB perf events are not inherited, so clear all EBB state. */
> +    t->thread.bescr = 0;
> +    t->thread.mmcr2 = 0;
> +    t->thread.mmcr0 = 0;
> +    t->thread.siar = 0;
> +    t->thread.sdar = 0;
> +    t->thread.sier = 0;
> +    t->thread.used_ebb = 0;
> +#endif
> +}
> +
>  #endif /* _ASM_POWERPC_SWITCH_TO_H */
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 076d124..c517dbe 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -916,7 +916,11 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
>  	flush_altivec_to_thread(src);
>  	flush_vsx_to_thread(src);
>  	flush_spe_to_thread(src);
> +
>  	*dst = *src;
> +
> +	clear_task_ebb(dst);
> +
>  	return 0;
>  }

Blank lines are not necessary here.

> 
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index d3ee2e5..c6bbbf9 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -77,6 +77,9 @@ static unsigned int freeze_events_kernel = MMCR0_FCS;
>  #define MMCR0_PMCjCE		MMCR0_PMCnCE
>  #define MMCR0_FC56		0
>  #define MMCR0_PMAO		0
> +#define MMCR0_EBE		0
> +#define MMCR0_PMCC		0
> +#define MMCR0_PMCC_U6		0
> 
>  #define SPRN_MMCRA		SPRN_MMCR2
>  #define MMCRA_SAMPLE_ENABLE	0
> @@ -104,6 +107,15 @@ static inline int siar_valid(struct pt_regs *regs)
>  	return 1;
>  }
> 
> +static bool is_ebb_event(struct perf_event *event) { return false; }
> +static int ebb_event_check(struct perf_event *event) { return 0; }
> +static void ebb_event_add(struct perf_event *event) { }
> +static void ebb_switch_out(unsigned long mmcr0) { }
> +static unsigned long ebb_switch_in(bool ebb, unsigned long mmcr0)
> +{
> +	return mmcr0;
> +}
> +
>  static inline void power_pmu_bhrb_enable(struct perf_event *event) {}
>  static inline void power_pmu_bhrb_disable(struct perf_event *event) {}
>  void power_pmu_flush_branch_stack(void) {}
> @@ -464,6 +476,89 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
>  	return;
>  }
> 
> +static bool is_ebb_event(struct perf_event *event)
> +{
> +	/*
> +	 * This could be a per-PMU callback, but we'd rather avoid the cost. We
> +	 * check that the PMU supports EBB, meaning those that don't can still
> +	 * use bit 63 of the event code for something else if they wish.
> +	 */
> +	return (ppmu->flags & PPMU_EBB) &&
> +	       ((event->attr.config >> EVENT_CONFIG_EBB_SHIFT) & 1);
> +}
> +
> +static int ebb_event_check(struct perf_event *event)
> +{
> +	struct perf_event *leader = event->group_leader;
> +
> +	/* Event and group leader must agree on EBB */
> +	if (is_ebb_event(leader) != is_ebb_event(event))
> +		return -EINVAL;
> +
> +	if (is_ebb_event(event)) {
> +		if (!(event->attach_state & PERF_ATTACH_TASK))
> +			return -EINVAL;
> +
> +		if (!leader->attr.pinned || !leader->attr.exclusive)
> +			return -EINVAL;
> +
> +		if (event->attr.inherit || event->attr.sample_period ||
> +		    event->attr.enable_on_exec || event->attr.freq)
> +			return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static void ebb_event_add(struct perf_event *event)
> +{
> +	if (!is_ebb_event(event) || current->thread.used_ebb)
> +		return;
> +
> +	/*
> +	 * IFF this is the first time we've added an EBB event, set
> +	 * PMXE in the user MMCR0 so we can detect when it's cleared by
> +	 * userspace. We need this so that we can context switch while
> +	 * userspace is in the EBB handler (where PMXE is 0).
> +	 */
> +	current->thread.used_ebb = 1;
> +	current->thread.mmcr0 |= MMCR0_PMXE;
> +}
> +
> +static void ebb_switch_out(unsigned long mmcr0)
> +{
> +	if (!(mmcr0 & MMCR0_EBE))
> +		return;
> +
> +	current->thread.siar  = mfspr(SPRN_SIAR);
> +	current->thread.sier  = mfspr(SPRN_SIER);
> +	current->thread.sdar  = mfspr(SPRN_SDAR);
> +	current->thread.mmcr0 = mmcr0 & MMCR0_USER_MASK;
> +	current->thread.mmcr2 = mfspr(SPRN_MMCR2) & MMCR2_USER_MASK;
> +}
> +

We also need to filter the SIER value with SIER_USER_MASK, right?

> +static unsigned long ebb_switch_in(bool ebb, unsigned long mmcr0)
> +{
> +	if (!ebb)
> +		goto out;
> +
> +	/* Enable EBB and read/write to all 6 PMCs for userspace */
> +	mmcr0 |= MMCR0_EBE | MMCR0_PMCC_U6;
> +
> +	/* Add any bits from the user reg, FC or PMAO */
> +	mmcr0 |= current->thread.mmcr0;
> +
> +	/* Be careful not to set PMXE if userspace had it cleared */
> +	if (!(current->thread.mmcr0 & MMCR0_PMXE))
> +		mmcr0 &= ~MMCR0_PMXE;
> +
> +	mtspr(SPRN_SIAR, current->thread.siar);
> +	mtspr(SPRN_SIER, current->thread.sier);
> +	mtspr(SPRN_SDAR, current->thread.sdar);
> +	mtspr(SPRN_MMCR2, current->thread.mmcr2);
> +out:
> +	return mmcr0;
> +}
>  #endif /* CONFIG_PPC64 */
> 
>  static void perf_event_interrupt(struct pt_regs *regs);
> @@ -734,6 +829,13 @@ static void power_pmu_read(struct perf_event *event)
> 
>  	if (!event->hw.idx)
>  		return;
> +
> +	if (is_ebb_event(event)) {
> +		val = read_pmc(event->hw.idx);
> +		local64_set(&event->hw.prev_count, val);
> +		return;
> +	}
> +
>  	/*
>  	 * Performance monitor interrupts come even when interrupts
>  	 * are soft-disabled, as long as interrupts are hard-enabled.
> @@ -854,7 +956,7 @@ static void write_mmcr0(struct cpu_hw_events *cpuhw, unsigned long mmcr0)
>  static void power_pmu_disable(struct pmu *pmu)
>  {
>  	struct cpu_hw_events *cpuhw;
> -	unsigned long flags, val;
> +	unsigned long flags, mmcr0, val;
> 
>  	if (!ppmu)
>  		return;
> @@ -871,11 +973,11 @@ static void power_pmu_disable(struct pmu *pmu)
>  		}
> 
>  		/*
> -		 * Set the 'freeze counters' bit, clear PMAO/FC56.
> +		 * Set the 'freeze counters' bit, clear EBE/PMCC/PMAO/FC56.
>  		 */
> -		val  = mfspr(SPRN_MMCR0);
> +		val  = mmcr0 = mfspr(SPRN_MMCR0);
>  		val |= MMCR0_FC;
> -		val &= ~(MMCR0_PMAO | MMCR0_FC56);
> +		val &= ~(MMCR0_EBE | MMCR0_PMCC | MMCR0_PMAO | MMCR0_FC56);
> 
>  		/*
>  		 * The barrier is to make sure the mtspr has been
> @@ -896,7 +998,10 @@ static void power_pmu_disable(struct pmu *pmu)
> 
>  		cpuhw->disabled = 1;
>  		cpuhw->n_added = 0;
> +
> +		ebb_switch_out(mmcr0);
>  	}
> +
>  	local_irq_restore(flags);
>  }
> 
> @@ -911,15 +1016,15 @@ static void power_pmu_enable(struct pmu *pmu)
>  	struct cpu_hw_events *cpuhw;
>  	unsigned long flags;
>  	long i;
> -	unsigned long val;
> +	unsigned long val, mmcr0;
>  	s64 left;
>  	unsigned int hwc_index[MAX_HWEVENTS];
>  	int n_lim;
>  	int idx;
> +	bool ebb;
> 
>  	if (!ppmu)
>  		return;
> -
>  	local_irq_save(flags);
> 
>  	cpuhw = &__get_cpu_var(cpu_hw_events);
> @@ -934,6 +1039,13 @@ static void power_pmu_enable(struct pmu *pmu)
>  	cpuhw->disabled = 0;
> 
>  	/*
> +	 * EBB requires an exclusive group and all events must have the EBB
> +	 * flag set, or not set, so we can just check a single event. Also we
> +	 * know we have at least one event.
> +	 */
> +	ebb = is_ebb_event(cpuhw->event[0]);
> +
> +	/*
>  	 * If we didn't change anything, or only removed events,
>  	 * no need to recalculate MMCR* settings and reset the PMCs.
>  	 * Just reenable the PMU with the current MMCR* settings
> @@ -1008,25 +1120,34 @@ static void power_pmu_enable(struct pmu *pmu)
>  			++n_lim;
>  			continue;
>  		}
> -		val = 0;
> -		if (event->hw.sample_period) {
> -			left = local64_read(&event->hw.period_left);
> -			if (left < 0x80000000L)
> -				val = 0x80000000L - left;
> +
> +		if (ebb)
> +			val = local64_read(&event->hw.prev_count);
> +		else {
> +			val = 0;
> +			if (event->hw.sample_period) {
> +				left = local64_read(&event->hw.period_left);
> +				if (left < 0x80000000L)
> +					val = 0x80000000L - left;
> +			}
> +			local64_set(&event->hw.prev_count, val);
>  		}
> -		local64_set(&event->hw.prev_count, val);
> +
>  		event->hw.idx = idx;
>  		if (event->hw.state & PERF_HES_STOPPED)
>  			val = 0;
>  		write_pmc(idx, val);
> +
>  		perf_event_update_userpage(event);
>  	}
>  	cpuhw->n_limited = n_lim;
>  	cpuhw->mmcr[0] |= MMCR0_PMXE | MMCR0_FCECE;
> 
>   out_enable:
> +	mmcr0 = ebb_switch_in(ebb, cpuhw->mmcr[0]);
> +
>  	mb();
> -	write_mmcr0(cpuhw, cpuhw->mmcr[0]);
> +	write_mmcr0(cpuhw, mmcr0);
> 
>  	/*
>  	 * Enable instruction sampling if necessary
> @@ -1124,6 +1245,8 @@ static int power_pmu_add(struct perf_event *event, int ef_flags)
>  	event->hw.config = cpuhw->events[n0];
> 
>  nocheck:
> +	ebb_event_add(event);
> +
>  	++cpuhw->n_events;
>  	++cpuhw->n_added;
> 
> @@ -1484,6 +1607,11 @@ static int power_pmu_event_init(struct perf_event *event)
>  		}
>  	}
> 
> +	/* Extra checks for EBB */
> +	err = ebb_event_check(event);
> +	if (err)
> +		return err;
> +
>  	/*
>  	 * If this is in a group, check if it can go on with all the
>  	 * other hardware events in the group.  We assume the event
> @@ -1523,6 +1651,13 @@ static int power_pmu_event_init(struct perf_event *event)
>  	local64_set(&event->hw.period_left, event->hw.last_period);
> 
>  	/*
> +	 * For EBB events we just context switch the PMC value, we don't do any
> +	 * of the sample_period logic. We use hw.prev_count for this.
> +	 */
> +	if (is_ebb_event(event))
> +		local64_set(&event->hw.prev_count, 0);
> +
> +	/*
>  	 * See if we need to reserve the PMU.
>  	 * If no events are currently in use, then we have to take a
>  	 * mutex to ensure that we don't race with another task doing
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/8] powerpc/perf: Add power8 EBB support
  2013-06-24 11:28 ` [PATCH 8/8] powerpc/perf: Add power8 EBB support Michael Ellerman
@ 2013-06-26  9:58   ` Anshuman Khandual
  2013-06-27 11:52     ` Michael Ellerman
  0 siblings, 1 reply; 23+ messages in thread
From: Anshuman Khandual @ 2013-06-26  9:58 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, sukadev, Paul Mackerras

> @@ -117,6 +117,7 @@
>  	 (EVENT_UNIT_MASK      << EVENT_UNIT_SHIFT)		|	\
>  	 (EVENT_COMBINE_MASK   << EVENT_COMBINE_SHIFT)		|	\
>  	 (EVENT_MARKED_MASK    << EVENT_MARKED_SHIFT)		|	\
> +	 (1ull		       << EVENT_CONFIG_EBB_SHIFT)	|	\

We should define this as a macro, in the same way as EVENT_MARKED_MASK:

#define EVENT_EBB_MASK       0x1

The bare numeric "1ull" stands out as odd in this scheme.


>  	  EVENT_PSEL_MASK)
> 
> + *              EBB -*    |                     |
> + *                        |                     |      Count of events for each PMC.
> + *      L1 I/D qualifier -*                     |        p1, p2, p3, p4, p5, p6.
>   *                     nc - number of counters -*
>   *
>   * The PMC fields P1..P6, and NC, are adder fields. As we accumulate constraints
> @@ -159,6 +160,9 @@
>  #define CNST_THRESH_VAL(v)	(((v) & EVENT_THRESH_MASK) << 32)
>  #define CNST_THRESH_MASK	CNST_THRESH_VAL(EVENT_THRESH_MASK)
> 
> +#define CNST_EBB_VAL(v)		(((v) & 1) << 24)

EVENT_EBB_MASK can be used here as well.
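
One way the suggestion could look, purely as a sketch (only EVENT_CONFIG_EBB_SHIFT
is from the series, the other names and the fragment layout are illustrative):

#define EVENT_EBB_MASK		0x1ull

	/* ... in EVENT_VALID_MASK ... */
	 (EVENT_MARKED_MASK    << EVENT_MARKED_SHIFT)		|	\
	 (EVENT_EBB_MASK       << EVENT_CONFIG_EBB_SHIFT)	|	\
	  EVENT_PSEL_MASK)

#define CNST_EBB_VAL(v)		(((v) & EVENT_EBB_MASK) << 24)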

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/8] powerpc/perf: Use existing out label in power_pmu_enable()
  2013-06-24 11:28 ` [PATCH 4/8] powerpc/perf: Use existing out label in power_pmu_enable() Michael Ellerman
@ 2013-06-27 11:01   ` Anshuman Khandual
  0 siblings, 0 replies; 23+ messages in thread
From: Anshuman Khandual @ 2013-06-27 11:01 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, sukadev, Paul Mackerras

On 06/24/2013 04:58 PM, Michael Ellerman wrote:
> In power_pmu_enable() we can use the existing out label to reduce the
> number of return paths.
> 
> Signed-off-by: Michael Ellerman <michael@ellerman.id.au>

Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>

> ---
>  arch/powerpc/perf/core-book3s.c |    9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index 3d566ee..af4b4b1 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -919,12 +919,13 @@ static void power_pmu_enable(struct pmu *pmu)
> 
>  	if (!ppmu)
>  		return;
> +
>  	local_irq_save(flags);
> +
>  	cpuhw = &__get_cpu_var(cpu_hw_events);
> -	if (!cpuhw->disabled) {
> -		local_irq_restore(flags);
> -		return;
> -	}
> +	if (!cpuhw->disabled)
> +		goto out;
> +
>  	cpuhw->disabled = 0;
> 
>  	/*
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/8] powerpc/perf: Core EBB support for 64-bit book3s
  2013-06-26  8:38   ` Anshuman Khandual
@ 2013-06-27 11:52     ` Michael Ellerman
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Ellerman @ 2013-06-27 11:52 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: linuxppc-dev, sukadev, Paul Mackerras


On Wed, 2013-06-26 at 14:08 +0530, Anshuman Khandual wrote:
> On 06/24/2013 04:58 PM, Michael Ellerman wrote:
> > Add support for EBB (Event Based Branches) on 64-bit book3s. See the
> > included documentation for more details.
..
> > +
> > +
> > +Terminology
> > +-----------
> > +
> > +Throughout this document we will refer to an "EBB event" or "EBB events". This
> > +just refers to a struct perf_event which has set the "EBB" flag in its
> > +attr.config. All events which can be configured on the hardware PMU are
> > +possible "EBB events".
> > +
> 
> Then why do we have a condition like this, where the event code must have the PMC
> value inside it (in the next patch)?

Those two things are not contradictory.

> +	if (!pmc && ebb)
> +		/* EBB events must specify the PMC */
> +		return -1;

This does not exclude any events. It just means if you're using one of
the any-PMC events you must choose a PMC for the event before opening
it.

The reason we do this is because userspace needs to know which PMC each
event is on, in order to read the PMCs and count them correctly. There
is a mechanism for the kernel to communicate to userspace which event is
on which PMC, but it is vastly simpler for userspace if it just
specifies the PMC in the event to begin with.

We could relax this restriction in future if someone comes up with a
really good reason to, but I don't see one.

I will add it to the documentation though.

> > +
> > +Background
> > +----------
> > +
> > +When a PMU EBB occurs it is delivered to the currently running process. As such
> > +EBBs can only sensibly be used by programs for self-monitoring.
> > +
> > +It is a feature of the perf_events API that events can be created on other
> > +processes, subject to standard permission checks. This is also true of EBB
> > +events, however unless the target process enables EBBs (via mtspr(BESCR)) no
> > +EBBs will ever be delivered.
> > +
> > +This makes it possible for a process to enable EBBs for itself, but not
> > +actually configure any events. At a later time another process can come along
> > +and attach an EBB event to the process, which will then cause EBBs to be
> > +delivered to the first process. It's not clear if this is actually useful.
> > +
> 
> It may be useful when a "master process" wants each of its threads to collect statistics
> (about that same thread) on various dynamically configured events (as and when it
> wishes) and report them back to the master process. Just a thought about a case where
> it could be useful.

Yeah sort of.

I was thinking more of a long running process which sets up an EBB
handler and is therefore prepared to monitor itself, but you use an
external program to turn the event collection on and off.

> > +When the PMU is configured for EBBs, all PMU interrupts are delivered to the
> > +user process. This means once an EBB event is scheduled on the PMU, no non-EBB
> > +events can be configured. This means that EBB events can not be run
> > +concurrently with regular 'perf' commands.
> > +
> > +It is however safe to run 'perf' commands on a process which is using EBBs. In
> > +general the EBB event will take priority, though it depends on the exact
> > +options used on the perf_event_open() and the timing.
> > +
> 
> This is confusing. 

I don't think it is.

> Suppose a process A is using EBB for itself on event "p" and gets scheduled
> on CPU X, and process B has started a perf session on process A for event "q". Now the PMU of
> CPU X would be programmed for both events "p" and "q" on different PMCs at the same
> point in time (with a check that they don't collide on the same PMC, though).

No that's not correct.

> What you are saying is that when event "p" overflows, the PMU interrupt (on CPU X) would
> be delivered to process A's user space, and when event "q" overflows, the PMU interrupt
> (on CPU X) would be delivered to the perf kernel component "perf_event_interrupt()" and would
> be processed for the perf session initiated by the second process B on process A.

I'm not saying that, I don't know how you got that idea.

> But again this contradicts your previous statement that when the PMU is configured for EBB, *all*
> PMU interrupts would be delivered to user space. Could you please clarify this
> scenario?

To paraphrase you above:

        Process A is using EBB with event "p" on CPU x, and process B
        starts running perf on process A with event "q".
        
        The PMU of CPU x will be programmed to count event "p" _and
        nothing else_. All PMU interrupts will be delivered to
        userspace.
        
        When the perf session reads its counter it will see that event
        "q" was unable to run for some of the time it was enabled.


There is one caveat to this which is that the EBB event gets priority
because it is pinned and exclusive. If there is another event that is
also pinned then there is a race between which event is opened on the
process first. This is what I talk about below in the "Enabling an EBB
event" section.

In general perf the tool doesn't pin events, so I don't see this as
being a problem.


> > +
> > +Creating an EBB event
> > +---------------------
> > +
> > +To request that an event is counted using EBB, the event code should have bit
> > +63 set.
> > +
> 
> This macro (defined in an arch header) identifies an event as an EBB event:
> 
> +/*
> + * We use the event config bit 63 as a flag to request EBB.
> + */
> +#define EVENT_CONFIG_EBB_SHIFT	63
> +
> 
> So any user program would have to include the arch header to be able to set the EBB bit.
> A bare numeric 63 will not be a clean ABI.

I think you mean we should put that in an exported header and expose it
to userspace?

If so I'm not sure yet.

Doing so would mean we were guaranteeing that bit 63 would always mean
"EBB please" (on processors that support EBB). I /think/ that's probably
OK, but I want to confirm that with the HW folks first.

If we don't think we want to guarantee that then we would export the EBB
bit information using the existing format attrs in the perf code, and
userspace would have to look that up in sysfs.


> > +Enabling an EBB event
> > +---------------------
> > +
> > +Once an EBB event has been successfully opened, it must be enabled with the
> > +perf_events API. This can be achieved either via the ioctl() interface, or the
> > +prctl() interface.
> > +
> > +However, due to the design of the perf_events API, enabling an event does not
> > +guarantee that it has been scheduled on the PMU. To ensure that the EBB event
> > +has been scheduled on the PMU, you must perform a read() on the event. If the
> > +read() returns EOF, then the event has not been scheduled and EBBs are not
> > +enabled.
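
For what it's worth, a minimal userspace sketch of the above (not part of this
series; the raw event code below is a made-up placeholder, and the "bit 63"
request is assumed per the discussion further up):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

int main(void)
{
	struct perf_event_attr attr;
	unsigned long long count;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_RAW;
	/* Placeholder event code (must encode the PMC), plus bit 63 to request EBB */
	attr.config = 0x1001e | (1ull << 63);
	/* EBB group leaders must be pinned and exclusive */
	attr.pinned = 1;
	attr.exclusive = 1;
	attr.exclude_kernel = 1;
	attr.exclude_hv = 1;

	/* EBB events must be task events, so open it on ourselves */
	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

	/* A zero-byte read (EOF) means the event was never scheduled */
	if (read(fd, &count, sizeof(count)) != sizeof(count)) {
		fprintf(stderr, "EBB event not scheduled on the PMU\n");
		return 1;
	}

	/* ... set up the EBB handler and BESCR from here ... */
	return 0;
}
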
...
> > +EBB Handler
> > +-----------
> > +
> > +The EBB handler is just regular userspace code, however it must be written in
> > +the style of an interrupt handler. When the handler is entered all registers
> > +are live (possibly) and so must be saved somehow before the handler can invoke
> > +other code.
> > +
> > +It's up to the program how to handle this. For C programs a relatively simple
> > +option is to create an interrupt frame on the stack and save registers there.
> > +
> 
> It would be great if you could give a sample framework here showing how to save and restore
> registers. Moreover, we could actually put the various essential parts of the EBB
> handler construct in the perf_event_server.h file, so that the user would be able
> to use them directly and focus only on the core part of the event handling.

I will eventually merge some test programs and example code.

But they definitely don't belong in the kernel headers.

And as an aside perf_event_server.h isn't exported to userspace.
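
In the meantime, here is a rough sketch of the C portion such a handler might
call, once an asm entry stub has built a frame, saved every register the C code
could clobber, and will later restore them and return with "rfebb 1". The SPR
numbers and the BESCR bit value below are my own reading of ISA v2.07, not
something this series defines, so treat them as assumptions to verify:

#include <stdint.h>

/* Assumed problem-state SPR numbers; verify against the ISA */
#define SPRN_BESCRR	802	/* write 1s here to clear BESCR bits */
#define SPRN_MMCR0	779	/* problem-state MMCR0 alias */
#define SPRN_PMC1	771	/* problem-state PMC1 */

#define BESCR_PMEO	0x1ull		/* PMU event occurred (assumed) */
#define MMCR0_PMAO	0x00000080ull
#define MMCR0_PMAE	0x04000000ull	/* the bit this series calls PMXE */

#define str_(x)		#x
#define xstr(x)		str_(x)
#define mtspr(rn, v)	asm volatile("mtspr " xstr(rn) ",%0" : : "r"(v))
#define mfspr(rn)	({ uint64_t val; \
			   asm volatile("mfspr %0," xstr(rn) : "=r"(val)); val; })

static volatile uint64_t ebb_overflows;

/* Called from the asm stub; must not be entered directly by hardware */
void ebb_handler_body(void)
{
	ebb_overflows++;

	/* Re-arm the counter so the next overflow is a full period away */
	mtspr(SPRN_PMC1, 0x80000000ull - 100000);

	/* Acknowledge this event and re-enable alerts for the next one */
	mtspr(SPRN_BESCRR, BESCR_PMEO);
	mtspr(SPRN_MMCR0, (mfspr(SPRN_MMCR0) & ~MMCR0_PMAO) | MMCR0_PMAE);
}
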


> > diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
> > index 48af5d7..f9a4cdc 100644
> > --- a/arch/powerpc/include/asm/processor.h
> > +++ b/arch/powerpc/include/asm/processor.h
> > @@ -287,8 +287,9 @@ struct thread_struct {
> >  	unsigned long	siar;
> >  	unsigned long	sdar;
> >  	unsigned long	sier;
> > -	unsigned long	mmcr0;
> >  	unsigned long	mmcr2;
> > +	unsigned 	mmcr0;
> > +	unsigned 	used_ebb;
> >  #endif
> >  };
> > 
> 
> Why does mmcr0 have to change position here?

So the structure packs properly.
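
(Purely for illustration, not from the series: on 64-bit the two 32-bit members
can share a single 8-byte slot when they sit next to each other, roughly:)

struct layout_unpacked {	/* 24 bytes: hole after mmcr0, tail padding after used_ebb */
	unsigned	mmcr0;
	unsigned long	mmcr2;
	unsigned	used_ebb;
};

struct layout_packed {		/* 16 bytes: mmcr0 and used_ebb share one slot */
	unsigned long	mmcr2;
	unsigned	mmcr0;
	unsigned	used_ebb;
};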


> > diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> > index 362142b..5d7d9c2 100644
> > --- a/arch/powerpc/include/asm/reg.h
> > +++ b/arch/powerpc/include/asm/reg.h
> > @@ -621,6 +621,9 @@
> >  #define   MMCR0_PMXE	0x04000000UL /* performance monitor exception enable */
> >  #define   MMCR0_FCECE	0x02000000UL /* freeze ctrs on enabled cond or event */
> >  #define   MMCR0_TBEE	0x00400000UL /* time base exception enable */
> > +#define   MMCR0_EBE	0x00100000UL /* Event based branch enable */
> > +#define   MMCR0_PMCC	0x000c0000UL /* PMC control */
> > +#define   MMCR0_PMCC_U6	0x00080000UL /* PMC1-6 are R/W by user (PR) */
> >  #define   MMCR0_PMC1CE	0x00008000UL /* PMC1 count enable*/
> >  #define   MMCR0_PMCjCE	0x00004000UL /* PMCj count enable*/
> >  #define   MMCR0_TRIGGER	0x00002000UL /* TRIGGER enable */
> > @@ -674,6 +677,11 @@
> >  #define   SIER_SIAR_VALID	0x0400000	/* SIAR contents valid */
> >  #define   SIER_SDAR_VALID	0x0200000	/* SDAR contents valid */
> > 
> > +/* When EBB is enabled, some of MMCR0/MMCR2/SIER are user accessible */
> > +#define MMCR0_USER_MASK	(MMCR0_FC | MMCR0_PMXE | MMCR0_PMAO)
> > +#define MMCR2_USER_MASK	0x4020100804020000UL /* (FC1P|FC2P|FC3P|FC4P|FC5P|FC6P) */
> > +#define SIER_USER_MASK	0x7fffffUL
> > +
> 
> Oh, these are the bits in the SPRs which are available for user space to read and write as well?

Yes. See section 9.4.10 of PowerISA v2.07.

> Better to have macros instead of hex codes here.

We don't have macros for the other fields. If we add macros for FCxP then we can redo it then.


> >  #endif /* _ASM_POWERPC_SWITCH_TO_H */
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index 076d124..c517dbe 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -916,7 +916,11 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
> >  	flush_altivec_to_thread(src);
> >  	flush_vsx_to_thread(src);
> >  	flush_spe_to_thread(src);
> > +
> >  	*dst = *src;
> > +
> > +	clear_task_ebb(dst);
> > +
> >  	return 0;
> >  }
> 
> Blank lines are not necessary here.

Blank lines are never necessary.

But they highlight parts of the code that are important and make it more
readable. In this case I'm highlighting that we do the structure
assignment and then clear the EBB info of the destination task, vs the
other routines which operate on the src.


> > +static void ebb_switch_out(unsigned long mmcr0)
> > +{
> > +	if (!(mmcr0 & MMCR0_EBE))
> > +		return;
> > +
> > +	current->thread.siar  = mfspr(SPRN_SIAR);
> > +	current->thread.sier  = mfspr(SPRN_SIER);
> > +	current->thread.sdar  = mfspr(SPRN_SDAR);
> > +	current->thread.mmcr0 = mmcr0 & MMCR0_USER_MASK;
> > +	current->thread.mmcr2 = mfspr(SPRN_MMCR2) & MMCR2_USER_MASK;
> > +}
> > +
> 
> We also need to filter the SIER value with SIER_USER_MASK, right?

We don't need to, the hardware does it for us.

We are filtering the others because we do (in the case of MMCR0) or
might (MMCR2) use those inside the perf code, and we don't want to
confuse values we've set with values the user has set.

cheers

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/8] powerpc/perf: Add power8 EBB support
  2013-06-26  9:58   ` Anshuman Khandual
@ 2013-06-27 11:52     ` Michael Ellerman
  2013-06-28  4:15       ` Anshuman Khandual
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Ellerman @ 2013-06-27 11:52 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: linuxppc-dev, sukadev, Paul Mackerras


On Wed, 2013-06-26 at 15:28 +0530, Anshuman Khandual wrote:
> > @@ -117,6 +117,7 @@
> >  	 (EVENT_UNIT_MASK      << EVENT_UNIT_SHIFT)		|	\
> >  	 (EVENT_COMBINE_MASK   << EVENT_COMBINE_SHIFT)		|	\
> >  	 (EVENT_MARKED_MASK    << EVENT_MARKED_SHIFT)		|	\
> > +	 (1ull		       << EVENT_CONFIG_EBB_SHIFT)	|	\
> 
> We should define this as a macro, in the same way as EVENT_MARKED_MASK:
> 
> #define EVENT_EBB_MASK       0x1
> 
> The bare numeric "1ull" stands out as odd in this scheme.

Yeah I guess.

Would you like it in blue? :)

cheers

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/8] powerpc/perf: Add power8 EBB support
  2013-06-27 11:52     ` Michael Ellerman
@ 2013-06-28  4:15       ` Anshuman Khandual
  2013-07-04 18:58         ` Adhemerval Zanella
  0 siblings, 1 reply; 23+ messages in thread
From: Anshuman Khandual @ 2013-06-28  4:15 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, sukadev, Paul Mackerras

On 06/27/2013 05:22 PM, Michael Ellerman wrote:
> 
> On Wed, 2013-06-26 at 15:28 +0530, Anshuman Khandual wrote:
>>> @@ -117,6 +117,7 @@
>>>  	 (EVENT_UNIT_MASK      << EVENT_UNIT_SHIFT)		|	\
>>>  	 (EVENT_COMBINE_MASK   << EVENT_COMBINE_SHIFT)		|	\
>>>  	 (EVENT_MARKED_MASK    << EVENT_MARKED_SHIFT)		|	\
>>> +	 (1ull		       << EVENT_CONFIG_EBB_SHIFT)	|	\
>>
>> We should define this as a macro, in the same way as EVENT_MARKED_MASK:
>>
>> #define EVENT_EBB_MASK       0x1
>>
>> The bare numeric "1ull" stands out as odd in this scheme.
> 
> Yeah I guess.
> 
> Would you like it in blue? :)
> 

:) No, I meant probably a macro definition would be cool.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/8] powerpc/perf: Don't enable if we have zero events
  2013-06-24 11:28 ` [PATCH 5/8] powerpc/perf: Don't enable if we have zero events Michael Ellerman
@ 2013-06-28  5:10   ` Anshuman Khandual
  0 siblings, 0 replies; 23+ messages in thread
From: Anshuman Khandual @ 2013-06-28  5:10 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, sukadev, Paul Mackerras

On 06/24/2013 04:58 PM, Michael Ellerman wrote:
> In power_pmu_enable() we still enable the PMU even if we have zero
> events. This should have no effect but doesn't make much sense. Instead
> just return after telling the hypervisor that we are not using the PMCs.
> 
> Signed-off-by: Michael Ellerman <michael@ellerman.id.au>

Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/8] powerpc/perf: Add power8 EBB support
  2013-06-28  4:15       ` Anshuman Khandual
@ 2013-07-04 18:58         ` Adhemerval Zanella
  2013-07-05  2:54           ` Michael Ellerman
  0 siblings, 1 reply; 23+ messages in thread
From: Adhemerval Zanella @ 2013-07-04 18:58 UTC (permalink / raw)
  To: linuxppc-dev

Hi Michael,

I believe you forgot to add the cpu_user_features2 bit to announce the EBB support
for P8, patch following:

Signed-off-by: Adhemerval Zanella <azanella@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/cputable.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 2a45d0f..5f0c80a 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -105,6 +105,7 @@ extern void __restore_cpu_e6500(void);
                                 PPC_FEATURE_PSERIES_PERFMON_COMPAT)
 #define COMMON_USER2_POWER8    (PPC_FEATURE2_ARCH_2_07 | \
                                 PPC_FEATURE2_HTM_COMP | PPC_FEATURE2_DSCR | \
+                                PPC_FEATURE2_EBB | \
                                 PPC_FEATURE2_ISEL | PPC_FEATURE2_TAR)
 #define COMMON_USER_PA6T       (COMMON_USER_PPC64 | PPC_FEATURE_PA6T |\
                                 PPC_FEATURE_TRUE_LE | \


On 28-06-2013 01:15, Anshuman Khandual wrote:
> On 06/27/2013 05:22 PM, Michael Ellerman wrote:
>> On Wed, 2013-06-26 at 15:28 +0530, Anshuman Khandual wrote:
>>>> @@ -117,6 +117,7 @@
>>>>  	 (EVENT_UNIT_MASK      << EVENT_UNIT_SHIFT)		|	\
>>>>  	 (EVENT_COMBINE_MASK   << EVENT_COMBINE_SHIFT)		|	\
>>>>  	 (EVENT_MARKED_MASK    << EVENT_MARKED_SHIFT)		|	\
>>>> +	 (1ull		       << EVENT_CONFIG_EBB_SHIFT)	|	\
>>> We should define this as a macro, in the same way as EVENT_MARKED_MASK:
>>>
>>> #define EVENT_EBB_MASK       0x1
>>>
>>> The bare numeric "1ull" stands out as odd in this scheme.
>> Yeah I guess.
>>
>> Would you like it in blue? :)
>>
> :) No, I meant probably a macro definition would be cool.
>

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/8] powerpc/perf: Add power8 EBB support
  2013-07-04 18:58         ` Adhemerval Zanella
@ 2013-07-05  2:54           ` Michael Ellerman
  2013-07-05 17:57             ` Adhemerval Zanella
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Ellerman @ 2013-07-05  2:54 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: ryanarn, linuxppc-dev

On Thu, Jul 04, 2013 at 03:58:01PM -0300, Adhemerval Zanella wrote:
> Hi Michael,
> 
> I believe you forgot to add the cpu_user_features2 bit to announce the EBB support
> for P8, patch following:

Hi Adhemerval,

You're right, I haven't added it. I was wondering how best to do it.

It's possible to configure the kernel so that it doesn't have PMU
support, and in that case EBB is unsupported. It's also possible that something
goes wrong with the PMU registration (kernel bug or OOM), and again EBB is then
unsupported.

So I think it might be better if we add PPC_FEATURE2_EBB at runtime in
init_power8_pmu().

What do you think?

Something like:

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index c7f8ccc..fd9ed89 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -620,10 +682,19 @@ static struct power_pmu power8_pmu = {
 
 static int __init init_power8_pmu(void)
 {
+       int rc;
+
        if (!cur_cpu_spec->oprofile_cpu_type ||
            strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power8"))
                return -ENODEV;
 
-       return register_power_pmu(&power8_pmu);
+       rc = register_power_pmu(&power8_pmu);
+       if (rc)
+               return rc;
+
+       /* Tell userspace that EBB is supported */
+       cur_cpu_spec->cpu_user_features2 |= PPC_FEATURE2_EBB;
+
+       return 0;
 }
 early_initcall(init_power8_pmu);


cheers

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/8] powerpc/perf: Add power8 EBB support
  2013-07-05  2:54           ` Michael Ellerman
@ 2013-07-05 17:57             ` Adhemerval Zanella
  0 siblings, 0 replies; 23+ messages in thread
From: Adhemerval Zanella @ 2013-07-05 17:57 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: ryanarn, linuxppc-dev

On 04-07-2013 23:54, Michael Ellerman wrote:
> On Thu, Jul 04, 2013 at 03:58:01PM -0300, Adhemerval Zanella wrote:
>> Hi Michael,
>>
>> I believe you forgot to add the cpu_user_features2 bit to announce the EBB support
>> for P8, patch following:
> Hi Adhemerval,
>
> You're right, I haven't added it. I was wondering how best to do it.
>
> It's possible to configure the kernel so that it doesn't have PMU
> support, and in that case EBB is unsupported. It's also possible that something
> goes wrong with the PMU registration (kernel bug or OOM), and again EBB is then
> unsupported.
>
> So I think it might be better if we add PPC_FEATURE2_EBB at runtime in
> init_power8_pmu().
>
> What do you think?
Indeed your approach seems better (I wasn't aware you could configure
the kernel without the perf subsystem).

>
> Something like:
>
> diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
> index c7f8ccc..fd9ed89 100644
> --- a/arch/powerpc/perf/power8-pmu.c
> +++ b/arch/powerpc/perf/power8-pmu.c
> @@ -620,10 +682,19 @@ static struct power_pmu power8_pmu = {
>
>  static int __init init_power8_pmu(void)
>  {
> +       int rc;
> +
>         if (!cur_cpu_spec->oprofile_cpu_type ||
>             strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power8"))
>                 return -ENODEV;
>
> -       return register_power_pmu(&power8_pmu);
> +       rc = register_power_pmu(&power8_pmu);
> +       if (rc)
> +               return rc;
> +
> +       /* Tell userspace that EBB is supported */
> +       cur_cpu_spec->cpu_user_features2 |= PPC_FEATURE2_EBB;
> +
> +       return 0;
>  }
>  early_initcall(init_power8_pmu);
>
>
> cheers
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/8] powerpc/perf: Rework disable logic in pmu_disable()
  2013-06-25 11:22   ` Anshuman Khandual
  2013-06-26  3:28     ` Michael Ellerman
@ 2013-07-10  0:15     ` Sukadev Bhattiprolu
  2013-07-10  2:12       ` Michael Ellerman
  1 sibling, 1 reply; 23+ messages in thread
From: Sukadev Bhattiprolu @ 2013-07-10  0:15 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: linuxppc-dev, Paul Mackerras

Anshuman Khandual [khandual@linux.vnet.ibm.com] wrote:
| On 06/24/2013 04:58 PM, Michael Ellerman wrote:
| > In pmu_disable() we disable the PMU by setting the FC (Freeze Counters)
| > bit in MMCR0. In order to do this we have to read/modify/write MMCR0.
| > 
| > It's possible that we read a value from MMCR0 which has PMAO (PMU Alert
| > Occurred) set. When we write that value back it will cause an interrupt
| > to occur. We will then end up in the PMU interrupt handler even though
| > we are supposed to have just disabled the PMU.
| > 
| 
| Is that possible? First of all, MMCR0[PMAO] cannot be written by SW.
| Even if you try writing it, how is it going to generate a PMU interrupt?
| HW sets the MMCR0[PMAO] bit after a PMU interrupt has already occurred;
| it's not the case that setting it would generate a PMU interrupt.

It looks like writing 1 to MMCR0[PMAO] is allowed (so the alert state can be
saved and restored across partition swaps) and it does generate the interrupt.

| 
| > We can avoid this by making sure we never write PMAO back. We should not
| 
| Making sure that we don't write PMAO back is a good idea though.
| 
| > lose interrupts because when the PMU is re-enabled the overflowed values
| > will cause another interrupt.


Is it enough to set the FC and clear the PMAO - or should we also clear the
PMAE in pmu_disable() (and set it back in pmu_enable()) ?

The PMU spec says "...Alert will occur when enabled condition or event exists
and Performance Monitor Alerts are enabled through MMCR0[PMAE] field"

The condition of an overflowing counter will still exist and PMAE is still
set. So won't the PMU simply turn PMAO back on after we clear it?

Or is it that PMAO is only set when counting is enabled but the interrupt is
generated even when counting is disabled ?
| > 
| 
| I doubt this theory.
| 
| > We also reorder the clearing of SAMPLE_ENABLE so that is done after the
| > PMU is frozen. Otherwise there is a small window between the clearing of
| > SAMPLE_ENABLE and the setting of FC where we could take an interrupt and
| > incorrectly see SAMPLE_ENABLE not set. This would for example change the
| > logic in perf_read_regs().
| > 
Good point.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/8] powerpc/perf: Rework disable logic in pmu_disable()
  2013-07-10  0:15     ` Sukadev Bhattiprolu
@ 2013-07-10  2:12       ` Michael Ellerman
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Ellerman @ 2013-07-10  2:12 UTC (permalink / raw)
  To: Sukadev Bhattiprolu; +Cc: linuxppc-dev, Paul Mackerras, Anshuman Khandual

On Tue, Jul 09, 2013 at 05:15:23PM -0700, Sukadev Bhattiprolu wrote:
> Anshuman Khandual [khandual@linux.vnet.ibm.com] wrote:
> | On 06/24/2013 04:58 PM, Michael Ellerman wrote:
> | > In pmu_disable() we disable the PMU by setting the FC (Freeze Counters)
> | > bit in MMCR0. In order to do this we have to read/modify/write MMCR0.
> | > 
> | > It's possible that we read a value from MMCR0 which has PMAO (PMU Alert
> | > Occurred) set. When we write that value back it will cause an interrupt
> | > to occur. We will then end up in the PMU interrupt handler even though
> | > we are supposed to have just disabled the PMU.
> | > 
> | 
> | Is that possible? First of all, MMCR0[PMAO] cannot be written by SW.
> | Even if you try writing it, how is it going to generate a PMU interrupt?
> | HW sets the MMCR0[PMAO] bit after a PMU interrupt has already occurred;
> | it's not the case that setting it would generate a PMU interrupt.
> 
> It looks like writing 1 to MMCR0[PMAO] is allowed (so the alert state can be
> saved and restored across partition swaps) and it does generate the interrupt.

Yes it's documented in the ISA.

> | > We can avoid this by making sure we never write PMAO back. We should not
> | 
> | Making sure that we don't write PMAO back is a good idea though.
> | 
> | > lose interrupts because when the PMU is re-enabled the overflowed values
> | > will cause another interrupt.
> 
> Is it enough to set the FC and clear the PMAO - or should we also clear the
> PMAE in pmu_disable() (and set it back in pmu_enable()) ?

Yeah that's on my todo list, I just haven't got around to it. I think
clearing PMAE would be more in keeping with what the HW folks
have in mind, but it's a fairly major change so we'd need to test it
across all supported hardware.
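
For concreteness, that change would make the freeze sequence in
power_pmu_disable() look roughly like this (a sketch only, reusing the bit
names from this series, where the PMAE bit is still spelled MMCR0_PMXE):

		val  = mmcr0 = mfspr(SPRN_MMCR0);
		val |= MMCR0_FC;
		val &= ~(MMCR0_EBE | MMCR0_PMCC | MMCR0_PMAO |
			 MMCR0_FC56 | MMCR0_PMXE);
		mtspr(SPRN_MMCR0, val);

with power_pmu_enable() setting MMCR0_PMXE again when it unfreezes.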

It's not that easy to test because if you miss an interrupt rarely you
will generally not notice it. You'll just see a slightly lower count
for the event which you will just put down to variability.

> The PMU spec says "...Alert will occur when enabled condition or event exists
> and Performance Monitor Alerts are enabled through MMCR0[PMAE] field"
> 
> The condition of an overflowing counter will still exist and PMAE is still
> set. So won't the PMU simply turn PMAO back on after we clear it?

Not that I've observed. It's not clear to me that the architecture and
the hardware agree 100% on some of these corner cases, but proving it
one way or the other is tricky.
 
cheers

^ permalink raw reply	[flat|nested] 23+ messages in thread
