All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/14 v2] idle performance improvements
@ 2017-06-11 23:58 Nicholas Piggin
  2017-06-11 23:58 ` [PATCH 01/14] powerpc/64s: idle move soft interrupt mask logic into C code Nicholas Piggin
                   ` (13 more replies)
  0 siblings, 14 replies; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-11 23:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy, Shreyas B . Prabhu

I rebased this on the powerpc next tree.

A couple of things are changed since last post:

- Patch 1 now properly accounts for the fact the powernv idle
  wakeups do not re-enable interrupts until the cpuidle driver
  enables them. This was not quite right in the previous patch
  (and prep_irq_for_idle() is not quite right for that case so
  a new primitive has to be introduced).

- Patch to replace interrupts from system reset wakeup changed
  rather than replaying directly, it just marks the IRQ in the
  lazy pending bit and it will get replayed at the right time
  when irqs are re-enabled.

Thanks,
Nick

Nicholas Piggin (14):
  powerpc/64s: idle move soft interrupt mask logic into C code
  powerpc/64s: idle hotplug lazy-irq simplification
  powerpc/64s: idle provide a default idle for POWER9
  powerpc/64s: idle process interrupts from system reset wakeup
  powerpc/64s: msgclr when handling doorbell exceptions
  powerpc/64s: interrupt replay balance the return branch predictor
  powerpc/64s: idle branch to handler with virtual mode offset
  powerpc/64s: idle avoid SRR usage in idle sleep/wake paths
  powerpc/64s: idle hmi wakeup is unlikely
  powerpc/64s: cpuidle set polling before enabling irqs
  powerpc/64s: cpuidle read mostly for common globals
  powerpc/64s: cpuidle no memory barrier after break from idle
  powerpc/64: runlatch CTRL[RUN] set optimisation
  powerpc/64s: idle runlatch switch is done with MSR[EE]=0

 arch/powerpc/include/asm/dbell.h         |  13 +++
 arch/powerpc/include/asm/exception-64s.h |  17 +++-
 arch/powerpc/include/asm/hw_irq.h        |   5 ++
 arch/powerpc/include/asm/machdep.h       |   1 +
 arch/powerpc/include/asm/ppc-opcode.h    |   3 +
 arch/powerpc/include/asm/processor.h     |  10 +--
 arch/powerpc/kernel/asm-offsets.c        |   1 +
 arch/powerpc/kernel/exceptions-64s.S     |  33 ++++++--
 arch/powerpc/kernel/idle_book3s.S        | 135 +++++++++----------------------
 arch/powerpc/kernel/irq.c                |  58 ++++++++++++-
 arch/powerpc/kernel/process.c            |  12 +--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |   8 +-
 arch/powerpc/platforms/powernv/idle.c    |  90 +++++++++++++++++++--
 arch/powerpc/platforms/powernv/smp.c     |  31 ++++---
 arch/powerpc/platforms/powernv/subcore.c |   3 +-
 drivers/cpuidle/cpuidle-powernv.c        |  37 +++++----
 drivers/cpuidle/cpuidle-pseries.c        |  22 +++--
 17 files changed, 309 insertions(+), 170 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 01/14] powerpc/64s: idle move soft interrupt mask logic into C code
  2017-06-11 23:58 [PATCH 00/14 v2] idle performance improvements Nicholas Piggin
@ 2017-06-11 23:58 ` Nicholas Piggin
  2017-06-12  8:37   ` Gautham R Shenoy
  2017-06-11 23:58 ` [PATCH 02/14] powerpc/64s: idle hotplug lazy-irq simplification Nicholas Piggin
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-11 23:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy, Shreyas B . Prabhu

This simplifies the asm and fixes irq-off tracing over sleep
instructions.

Also move powersave_nap check for POWER8 into C code, and move
PSSCR register value calculation for POWER9 into C.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/hw_irq.h        |  3 ++
 arch/powerpc/include/asm/machdep.h       |  1 +
 arch/powerpc/include/asm/processor.h     | 10 ++--
 arch/powerpc/kernel/idle_book3s.S        | 82 ++++++--------------------------
 arch/powerpc/kernel/irq.c                | 33 ++++++++++++-
 arch/powerpc/platforms/powernv/idle.c    | 71 ++++++++++++++++++++++++---
 arch/powerpc/platforms/powernv/smp.c     |  2 -
 arch/powerpc/platforms/powernv/subcore.c |  3 +-
 drivers/cpuidle/cpuidle-powernv.c        | 12 ++---
 9 files changed, 128 insertions(+), 89 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index eba60416536e..f06112cf8734 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -129,6 +129,9 @@ static inline bool arch_irq_disabled_regs(struct pt_regs *regs)
 }
 
 extern bool prep_irq_for_idle(void);
+extern bool prep_irq_for_idle_irqsoff(void);
+
+#define fini_irq_for_idle_irqsoff() trace_hardirqs_off();
 
 extern void force_external_irq_replay(void);
 
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index f90b22c722e1..cd2fc1cc1cc7 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -226,6 +226,7 @@ struct machdep_calls {
 extern void e500_idle(void);
 extern void power4_idle(void);
 extern void power7_idle(void);
+extern void power9_idle(void);
 extern void ppc6xx_idle(void);
 extern void book3e_idle(void);
 
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index a2123f291ab0..c49165a7439c 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -481,11 +481,11 @@ extern unsigned long cpuidle_disable;
 enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
 
 extern int powersave_nap;	/* set if nap mode can be used in idle loop */
-extern unsigned long power7_nap(int check_irq);
-extern unsigned long power7_sleep(void);
-extern unsigned long power7_winkle(void);
-extern unsigned long power9_idle_stop(unsigned long stop_psscr_val,
-				      unsigned long stop_psscr_mask);
+extern unsigned long power7_idle_insn(unsigned long type); /* PNV_THREAD_NAP/etc*/
+extern void power7_idle_type(unsigned long type);
+extern unsigned long power9_idle_stop(unsigned long psscr_val);
+extern void power9_idle_type(unsigned long stop_psscr_val,
+			      unsigned long stop_psscr_mask);
 
 extern void flush_instruction_cache(void);
 extern void hard_reset_now(void);
diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 98a6d07ecb5c..35cf5bb7daed 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -109,13 +109,9 @@ core_idle_lock_held:
 /*
  * Pass requested state in r3:
  *	r3 - PNV_THREAD_NAP/SLEEP/WINKLE in POWER8
- *	   - Requested STOP state in POWER9
+ *	   - Requested PSSCR value in POWER9
  *
- * To check IRQ_HAPPENED in r4
- * 	0 - don't check
- * 	1 - check
- *
- * Address to 'rfid' to in r5
+ * Address of idle handler to 'rfid' to in r4
  */
 pnv_powersave_common:
 	/* Use r3 to pass state nap/sleep/winkle */
@@ -131,30 +127,7 @@ pnv_powersave_common:
 	std	r0,_LINK(r1)
 	std	r0,_NIP(r1)
 
-	/* Hard disable interrupts */
-	mfmsr	r9
-	rldicl	r9,r9,48,1
-	rotldi	r9,r9,16
-	mtmsrd	r9,1			/* hard-disable interrupts */
-
-	/* Check if something happened while soft-disabled */
-	lbz	r0,PACAIRQHAPPENED(r13)
-	andi.	r0,r0,~PACA_IRQ_HARD_DIS@l
-	beq	1f
-	cmpwi	cr0,r4,0
-	beq	1f
-	addi	r1,r1,INT_FRAME_SIZE
-	ld	r0,16(r1)
-	li	r3,0			/* Return 0 (no nap) */
-	mtlr	r0
-	blr
-
-1:	/* We mark irqs hard disabled as this is the state we'll
-	 * be in when returning and we need to tell arch_local_irq_restore()
-	 * about it
-	 */
-	li	r0,PACA_IRQ_HARD_DIS
-	stb	r0,PACAIRQHAPPENED(r13)
+	mfmsr   r9
 
 	/* We haven't lost state ... yet */
 	li	r0,0
@@ -163,8 +136,8 @@ pnv_powersave_common:
 	/* Continue saving state */
 	SAVE_GPR(2, r1)
 	SAVE_NVGPRS(r1)
-	mfcr	r4
-	std	r4,_CCR(r1)
+	mfcr	r5
+	std	r5,_CCR(r1)
 	std	r9,_MSR(r1)
 	std	r1,PACAR1(r13)
 
@@ -178,7 +151,7 @@ pnv_powersave_common:
 	li	r6, MSR_RI
 	andc	r6, r9, r6
 	mtmsrd	r6, 1		/* clear RI before setting SRR0/1 */
-	mtspr	SPRN_SRR0, r5
+	mtspr	SPRN_SRR0, r4
 	mtspr	SPRN_SRR1, r7
 	rfid
 
@@ -322,35 +295,14 @@ lwarx_loop_stop:
 
 	IDLE_STATE_ENTER_SEQ_NORET(PPC_STOP)
 
-_GLOBAL(power7_idle)
+/*
+ * Entered with MSR[EE]=0 and no soft-masked interrupts pending.
+ * r3 contains desired idle state (PNV_THREAD_NAP/SLEEP/WINKLE).
+ */
+_GLOBAL(power7_idle_insn)
 	/* Now check if user or arch enabled NAP mode */
-	LOAD_REG_ADDRBASE(r3,powersave_nap)
-	lwz	r4,ADDROFF(powersave_nap)(r3)
-	cmpwi	0,r4,0
-	beqlr
-	li	r3, 1
-	/* fall through */
-
-_GLOBAL(power7_nap)
-	mr	r4,r3
-	li	r3,PNV_THREAD_NAP
-	LOAD_REG_ADDR(r5, pnv_enter_arch207_idle_mode)
-	b	pnv_powersave_common
-	/* No return */
-
-_GLOBAL(power7_sleep)
-	li	r3,PNV_THREAD_SLEEP
-	li	r4,1
-	LOAD_REG_ADDR(r5, pnv_enter_arch207_idle_mode)
+	LOAD_REG_ADDR(r4, pnv_enter_arch207_idle_mode)
 	b	pnv_powersave_common
-	/* No return */
-
-_GLOBAL(power7_winkle)
-	li	r3,PNV_THREAD_WINKLE
-	li	r4,1
-	LOAD_REG_ADDR(r5, pnv_enter_arch207_idle_mode)
-	b	pnv_powersave_common
-	/* No return */
 
 #define CHECK_HMI_INTERRUPT						\
 	mfspr	r0,SPRN_SRR1;						\
@@ -372,17 +324,13 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);		\
 20:	nop;
 
 /*
- * r3 - The PSSCR value corresponding to the stop state.
- * r4 - The PSSCR mask corrresonding to the stop state.
+ * Entered with MSR[EE]=0 and no soft-masked interrupts pending.
+ * r3 contains desired PSSCR register value.
  */
 _GLOBAL(power9_idle_stop)
-	mfspr   r5,SPRN_PSSCR
-	andc    r5,r5,r4
-	or      r3,r3,r5
 	std	r3, PACA_REQ_PSSCR(r13)
 	mtspr 	SPRN_PSSCR,r3
-	LOAD_REG_ADDR(r5,power_enter_stop)
-	li	r4,1
+	LOAD_REG_ADDR(r4,power_enter_stop)
 	b	pnv_powersave_common
 	/* No return */
 
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index ab2ed9afd3c2..be32cec28107 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -322,7 +322,8 @@ bool prep_irq_for_idle(void)
 	 * First we need to hard disable to ensure no interrupt
 	 * occurs before we effectively enter the low power state
 	 */
-	hard_irq_disable();
+	__hard_irq_disable();
+	local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
 
 	/*
 	 * If anything happened while we were soft-disabled,
@@ -348,6 +349,36 @@ bool prep_irq_for_idle(void)
 }
 
 /*
+ * This is for idle sequences that return with IRQs off, but the
+ * idle state itself wakes on interrupt. Tell the irq tracer that
+ * IRQs are enabled for the duration of idle so it does not get long
+ * off times. Must be paired with fini_irq_for_idle_irqsoff.
+ */
+bool prep_irq_for_idle_irqsoff(void)
+{
+	WARN_ON(!irqs_disabled());
+
+	/*
+	 * First we need to hard disable to ensure no interrupt
+	 * occurs before we effectively enter the low power state
+	 */
+	__hard_irq_disable();
+	local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
+
+	/*
+	 * If anything happened while we were soft-disabled,
+	 * we return now and do not enter the low power state.
+	 */
+	if (lazy_irq_pending())
+		return false;
+
+	/* Tell lockdep we are about to re-enable */
+	trace_hardirqs_on();
+
+	return true;
+}
+
+/*
  * Force a replay of the external interrupt handler on this CPU.
  */
 void force_external_irq_replay(void)
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 46946a587004..f875879ff1eb 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -23,6 +23,7 @@
 #include <asm/cpuidle.h>
 #include <asm/code-patching.h>
 #include <asm/smp.h>
+#include <asm/runlatch.h>
 
 #include "powernv.h"
 #include "subcore.h"
@@ -283,12 +284,68 @@ static DEVICE_ATTR(fastsleep_workaround_applyonce, 0600,
 			show_fastsleep_workaround_applyonce,
 			store_fastsleep_workaround_applyonce);
 
+static unsigned long __power7_idle_type(unsigned long type)
+{
+	unsigned long srr1;
+
+	if (!prep_irq_for_idle_irqsoff())
+		return 0;
+
+	ppc64_runlatch_off();
+	srr1 = power7_idle_insn(type);
+	ppc64_runlatch_on();
+
+	fini_irq_for_idle_irqsoff();
+
+	return srr1;
+}
+
+void power7_idle_type(unsigned long type)
+{
+	__power7_idle_type(type);
+}
+
+void power7_idle(void)
+{
+	if (!powersave_nap)
+		return;
+
+	power7_idle_type(PNV_THREAD_NAP);
+}
+
+static unsigned long __power9_idle_type(unsigned long stop_psscr_val,
+				      unsigned long stop_psscr_mask)
+{
+	unsigned long psscr;
+	unsigned long srr1;
+
+	if (!prep_irq_for_idle_irqsoff())
+		return 0;
+
+	psscr = mfspr(SPRN_PSSCR);
+	psscr = (psscr & ~stop_psscr_mask) | stop_psscr_val;
+
+	ppc64_runlatch_off();
+	srr1 = power9_idle_stop(psscr);
+	ppc64_runlatch_on();
+
+	fini_irq_for_idle_irqsoff();
+
+	return srr1;
+}
+
+void power9_idle_type(unsigned long stop_psscr_val,
+				      unsigned long stop_psscr_mask)
+{
+	__power9_idle_type(stop_psscr_val, stop_psscr_mask);
+}
+
 /*
  * Used for ppc_md.power_save which needs a function with no parameters
  */
-static void power9_idle(void)
+void power9_idle(void)
 {
-	power9_idle_stop(pnv_default_stop_val, pnv_default_stop_mask);
+	power9_idle_type(pnv_default_stop_val, pnv_default_stop_mask);
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -303,16 +360,17 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
 	u32 idle_states = pnv_get_supported_cpuidle_states();
 
 	if (cpu_has_feature(CPU_FTR_ARCH_300) && deepest_stop_found) {
-		srr1 = power9_idle_stop(pnv_deepest_stop_psscr_val,
+		srr1 = __power9_idle_type(pnv_deepest_stop_psscr_val,
 					pnv_deepest_stop_psscr_mask);
 	} else if (idle_states & OPAL_PM_WINKLE_ENABLED) {
-		srr1 = power7_winkle();
+		srr1 = __power7_idle_type(PNV_THREAD_WINKLE);
 	} else if ((idle_states & OPAL_PM_SLEEP_ENABLED) ||
 		   (idle_states & OPAL_PM_SLEEP_ENABLED_ER1)) {
-		srr1 = power7_sleep();
+		srr1 = __power7_idle_type(PNV_THREAD_SLEEP);
 	} else if (idle_states & OPAL_PM_NAP_ENABLED) {
-		srr1 = power7_nap(1);
+		srr1 = __power7_idle_type(PNV_THREAD_NAP);
 	} else {
+		ppc64_runlatch_off();
 		/* This is the fallback method. We emulate snooze */
 		while (!generic_check_cpu_restart(cpu)) {
 			HMT_low();
@@ -320,6 +378,7 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
 		}
 		srr1 = 0;
 		HMT_medium();
+		ppc64_runlatch_on();
 	}
 
 	return srr1;
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 4aff754b6f2c..f8752795decf 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -182,9 +182,7 @@ static void pnv_smp_cpu_kill_self(void)
 		 */
 		kvmppc_set_host_ipi(cpu, 0);
 
-		ppc64_runlatch_off();
 		srr1 = pnv_cpu_offline(cpu);
-		ppc64_runlatch_on();
 
 		/*
 		 * If the SRR1 value indicates that we woke up due to
diff --git a/arch/powerpc/platforms/powernv/subcore.c b/arch/powerpc/platforms/powernv/subcore.c
index 0babef11136f..d975d78188a9 100644
--- a/arch/powerpc/platforms/powernv/subcore.c
+++ b/arch/powerpc/platforms/powernv/subcore.c
@@ -18,6 +18,7 @@
 #include <linux/stop_machine.h>
 
 #include <asm/cputhreads.h>
+#include <asm/cpuidle.h>
 #include <asm/kvm_ppc.h>
 #include <asm/machdep.h>
 #include <asm/opal.h>
@@ -182,7 +183,7 @@ static void unsplit_core(void)
 	cpu = smp_processor_id();
 	if (cpu_thread_in_core(cpu) != 0) {
 		while (mfspr(SPRN_HID0) & mask)
-			power7_nap(0);
+			power7_idle_insn(PNV_THREAD_NAP);
 
 		per_cpu(split_state, cpu).step = SYNC_STEP_UNSPLIT;
 		return;
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 45eaf06462ae..79152676f62b 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -73,9 +73,8 @@ static int nap_loop(struct cpuidle_device *dev,
 			struct cpuidle_driver *drv,
 			int index)
 {
-	ppc64_runlatch_off();
-	power7_idle();
-	ppc64_runlatch_on();
+	power7_idle_type(PNV_THREAD_NAP);
+
 	return index;
 }
 
@@ -98,7 +97,8 @@ static int fastsleep_loop(struct cpuidle_device *dev,
 	new_lpcr &= ~LPCR_PECE1;
 
 	mtspr(SPRN_LPCR, new_lpcr);
-	power7_sleep();
+
+	power7_idle_type(PNV_THREAD_SLEEP);
 
 	mtspr(SPRN_LPCR, old_lpcr);
 
@@ -110,10 +110,8 @@ static int stop_loop(struct cpuidle_device *dev,
 		     struct cpuidle_driver *drv,
 		     int index)
 {
-	ppc64_runlatch_off();
-	power9_idle_stop(stop_psscr_table[index].val,
+	power9_idle_type(stop_psscr_table[index].val,
 			 stop_psscr_table[index].mask);
-	ppc64_runlatch_on();
 	return index;
 }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 02/14] powerpc/64s: idle hotplug lazy-irq simplification
  2017-06-11 23:58 [PATCH 00/14 v2] idle performance improvements Nicholas Piggin
  2017-06-11 23:58 ` [PATCH 01/14] powerpc/64s: idle move soft interrupt mask logic into C code Nicholas Piggin
@ 2017-06-11 23:58 ` Nicholas Piggin
  2017-06-11 23:58 ` [PATCH 03/14] powerpc/64s: idle provide a default idle for POWER9 Nicholas Piggin
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-11 23:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy, Shreyas B . Prabhu

Rather than concern ourselves with any soft-mask logic in the CPU
hotplug handler, just hard disable interrupts. This ensures there
are no lazy-irqs pending, which means we can call directly to idle
instruction in order to sleep.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/platforms/powernv/idle.c | 23 +++++++++++++++--------
 arch/powerpc/platforms/powernv/smp.c  | 29 ++++++++++++++---------------
 2 files changed, 29 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index f875879ff1eb..f188d84d9c59 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -352,25 +352,31 @@ void power9_idle(void)
 /*
  * pnv_cpu_offline: A function that puts the CPU into the deepest
  * available platform idle state on a CPU-Offline.
+ * interrupts hard disabled and no lazy irq pending.
  */
 unsigned long pnv_cpu_offline(unsigned int cpu)
 {
 	unsigned long srr1;
-
 	u32 idle_states = pnv_get_supported_cpuidle_states();
 
+	ppc64_runlatch_off();
+
 	if (cpu_has_feature(CPU_FTR_ARCH_300) && deepest_stop_found) {
-		srr1 = __power9_idle_type(pnv_deepest_stop_psscr_val,
-					pnv_deepest_stop_psscr_mask);
+		unsigned long psscr;
+
+		psscr = mfspr(SPRN_PSSCR);
+		psscr = (psscr & ~pnv_deepest_stop_psscr_mask) |
+						pnv_deepest_stop_psscr_val;
+		srr1 = power9_idle_stop(psscr);
+
 	} else if (idle_states & OPAL_PM_WINKLE_ENABLED) {
-		srr1 = __power7_idle_type(PNV_THREAD_WINKLE);
+		srr1 = power7_idle_insn(PNV_THREAD_WINKLE);
 	} else if ((idle_states & OPAL_PM_SLEEP_ENABLED) ||
 		   (idle_states & OPAL_PM_SLEEP_ENABLED_ER1)) {
-		srr1 = __power7_idle_type(PNV_THREAD_SLEEP);
+		srr1 = power7_idle_insn(PNV_THREAD_SLEEP);
 	} else if (idle_states & OPAL_PM_NAP_ENABLED) {
-		srr1 = __power7_idle_type(PNV_THREAD_NAP);
+		srr1 = power7_idle_insn(PNV_THREAD_NAP);
 	} else {
-		ppc64_runlatch_off();
 		/* This is the fallback method. We emulate snooze */
 		while (!generic_check_cpu_restart(cpu)) {
 			HMT_low();
@@ -378,9 +384,10 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
 		}
 		srr1 = 0;
 		HMT_medium();
-		ppc64_runlatch_on();
 	}
 
+	ppc64_runlatch_on();
+
 	return srr1;
 }
 #endif
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index f8752795decf..c04c87adad94 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -144,7 +144,14 @@ static void pnv_smp_cpu_kill_self(void)
 	unsigned long srr1, wmask;
 
 	/* Standard hot unplug procedure */
-	local_irq_disable();
+	/*
+	 * This hard disables local interurpts, ensuring we have no lazy
+	 * irqs pending.
+	 */
+	WARN_ON(irqs_disabled());
+	hard_irq_disable();
+	WARN_ON(lazy_irq_pending());
+
 	idle_task_exit();
 	current->active_mm = NULL; /* for sanity */
 	cpu = smp_processor_id();
@@ -162,16 +169,6 @@ static void pnv_smp_cpu_kill_self(void)
 	 */
 	mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1);
 
-	/*
-	 * Hard-disable interrupts, and then clear irq_happened flags
-	 * that we can safely ignore while off-line, since they
-	 * are for things for which we do no processing when off-line
-	 * (or in the case of HMI, all the processing we need to do
-	 * is done in lower-level real-mode code).
-	 */
-	hard_irq_disable();
-	local_paca->irq_happened &= ~(PACA_IRQ_DEC | PACA_IRQ_HMI);
-
 	while (!generic_check_cpu_restart(cpu)) {
 		/*
 		 * Clear IPI flag, since we don't handle IPIs while
@@ -184,6 +181,8 @@ static void pnv_smp_cpu_kill_self(void)
 
 		srr1 = pnv_cpu_offline(cpu);
 
+		WARN_ON(lazy_irq_pending());
+
 		/*
 		 * If the SRR1 value indicates that we woke up due to
 		 * an external interrupt, then clear the interrupt.
@@ -196,8 +195,7 @@ static void pnv_smp_cpu_kill_self(void)
 		 * contains 0.
 		 */
 		if (((srr1 & wmask) == SRR1_WAKEEE) ||
-		    ((srr1 & wmask) == SRR1_WAKEHVI) ||
-		    (local_paca->irq_happened & PACA_IRQ_EE)) {
+		    ((srr1 & wmask) == SRR1_WAKEHVI)) {
 			if (cpu_has_feature(CPU_FTR_ARCH_300)) {
 				if (xive_enabled())
 					xive_flush_interrupt();
@@ -209,14 +207,15 @@ static void pnv_smp_cpu_kill_self(void)
 			unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
 			asm volatile(PPC_MSGCLR(%0) : : "r" (msg));
 		}
-		local_paca->irq_happened &= ~(PACA_IRQ_EE | PACA_IRQ_DBELL);
 		smp_mb();
 
 		if (cpu_core_split_required())
 			continue;
 
 		if (srr1 && !generic_check_cpu_restart(cpu))
-			DBG("CPU%d Unexpected exit while offline !\n", cpu);
+			DBG("CPU%d Unexpected exit while offline srr1=%lx!\n",
+					cpu, srr1);
+
 	}
 
 	/* Re-enable decrementer interrupts */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 03/14] powerpc/64s: idle provide a default idle for POWER9
  2017-06-11 23:58 [PATCH 00/14 v2] idle performance improvements Nicholas Piggin
  2017-06-11 23:58 ` [PATCH 01/14] powerpc/64s: idle move soft interrupt mask logic into C code Nicholas Piggin
  2017-06-11 23:58 ` [PATCH 02/14] powerpc/64s: idle hotplug lazy-irq simplification Nicholas Piggin
@ 2017-06-11 23:58 ` Nicholas Piggin
  2017-06-12  8:53   ` Gautham R Shenoy
  2017-06-11 23:58 ` [PATCH 04/14] powerpc/64s: idle process interrupts from system reset wakeup Nicholas Piggin
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-11 23:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy, Shreyas B . Prabhu

Before the cpuidle driver is enabled, provide a default idle
function similarly to POWER7/8.

This should not have much effect, because the cpuidle driver
for powernv is mandatory, but if that changes we should have
a fallback.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/platforms/powernv/idle.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index f188d84d9c59..e327e1585ddc 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -677,6 +677,8 @@ static int __init pnv_init_idle_states(void)
 
 	if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED)
 		ppc_md.power_save = power7_idle;
+	else if (supported_cpuidle_states & OPAL_PM_STOP_INST_FAST)
+		ppc_md.power_save = power9_idle;
 
 out:
 	return 0;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 04/14] powerpc/64s: idle process interrupts from system reset wakeup
  2017-06-11 23:58 [PATCH 00/14 v2] idle performance improvements Nicholas Piggin
                   ` (2 preceding siblings ...)
  2017-06-11 23:58 ` [PATCH 03/14] powerpc/64s: idle provide a default idle for POWER9 Nicholas Piggin
@ 2017-06-11 23:58 ` Nicholas Piggin
  2017-06-12  9:41   ` Gautham R Shenoy
  2017-06-11 23:58 ` [PATCH 05/14] powerpc/64s: msgclr when handling doorbell exceptions Nicholas Piggin
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-11 23:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy, Shreyas B . Prabhu

When the CPU wakes from low power state, it begins at the system reset
interrupt with the exception that caused the wakeup encoded in SRR1.

Today, powernv idle wakeup ignores the wakeup reason (except a special
case for HMI), and the regular interrupt corresponding to the
exception will fire after the idle wakeup exits.

Change this to replay the interrupt from the idle wakeup before
interrupts are hard-enabled.

Test on POWER8 of context_switch selftests benchmark with polling idle
disabled (e.g., always nap, giving cross-CPU IPIs) gives the following
results:

                                original         wakeup direct
Different threads, same core:   315k/s           264k/s
Different cores:                235k/s           242k/s

There is a slowdown for doorbell IPI (same core) case because system
reset wakeup does not clear the message and the doorbell interrupt
fires again needlessly.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/hw_irq.h     |  2 ++
 arch/powerpc/kernel/irq.c             | 25 +++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/idle.c | 10 ++++++++--
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index f06112cf8734..8366bdc69988 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -32,6 +32,7 @@
 #ifndef __ASSEMBLY__
 
 extern void __replay_interrupt(unsigned int vector);
+extern void __replay_wakeup_interrupt(unsigned long srr1);
 
 extern void timer_interrupt(struct pt_regs *);
 extern void performance_monitor_exception(struct pt_regs *regs);
@@ -130,6 +131,7 @@ static inline bool arch_irq_disabled_regs(struct pt_regs *regs)
 
 extern bool prep_irq_for_idle(void);
 extern bool prep_irq_for_idle_irqsoff(void);
+extern void irq_set_pending_from_srr1(unsigned long srr1);
 
 #define fini_irq_for_idle_irqsoff() trace_hardirqs_off();
 
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index be32cec28107..d8a85768c8ec 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -348,6 +348,7 @@ bool prep_irq_for_idle(void)
 	return true;
 }
 
+#ifdef CONFIG_PPC_BOOK3S
 /*
  * This is for idle sequences that return with IRQs off, but the
  * idle state itself wakes on interrupt. Tell the irq tracer that
@@ -379,6 +380,30 @@ bool prep_irq_for_idle_irqsoff(void)
 }
 
 /*
+ * Take the SRR1 wakeup reason, index into this table to find the
+ * appropriate irq_happened bit.
+ */
+static const u8 srr1_to_irq[0x10] = {
+	0, 0, 0,
+	PACA_IRQ_DBELL,
+	0,
+	PACA_IRQ_DBELL,
+	PACA_IRQ_DEC,
+	0,
+	PACA_IRQ_EE,
+	PACA_IRQ_EE,
+	PACA_IRQ_HMI,
+	0, 0, 0, 0, 0 };
+
+void irq_set_pending_from_srr1(unsigned long srr1)
+{
+	unsigned int idx = (srr1 >> 18) & SRR1_WAKEMASK_P8;
+
+	local_paca->irq_happened |= srr1_to_irq[idx];
+}
+#endif /* CONFIG_PPC_BOOK3S */
+
+/*
  * Force a replay of the external interrupt handler on this CPU.
  */
 void force_external_irq_replay(void)
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index e327e1585ddc..ee416e016387 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -302,7 +302,10 @@ static unsigned long __power7_idle_type(unsigned long type)
 
 void power7_idle_type(unsigned long type)
 {
-	__power7_idle_type(type);
+	unsigned long srr1;
+
+	srr1 = __power7_idle_type(type);
+	irq_set_pending_from_srr1(srr1);
 }
 
 void power7_idle(void)
@@ -337,7 +340,10 @@ static unsigned long __power9_idle_type(unsigned long stop_psscr_val,
 void power9_idle_type(unsigned long stop_psscr_val,
 				      unsigned long stop_psscr_mask)
 {
-	__power9_idle_type(stop_psscr_val, stop_psscr_mask);
+	unsigned long srr1;
+
+	srr1 = __power9_idle_type(stop_psscr_val, stop_psscr_mask);
+	irq_set_pending_from_srr1(srr1);
 }
 
 /*
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 05/14] powerpc/64s: msgclr when handling doorbell exceptions
  2017-06-11 23:58 [PATCH 00/14 v2] idle performance improvements Nicholas Piggin
                   ` (3 preceding siblings ...)
  2017-06-11 23:58 ` [PATCH 04/14] powerpc/64s: idle process interrupts from system reset wakeup Nicholas Piggin
@ 2017-06-11 23:58 ` Nicholas Piggin
  2017-06-12 14:38   ` Gautham R Shenoy
  2017-06-11 23:58 ` [PATCH 06/14] powerpc/64s: interrupt replay balance the return branch predictor Nicholas Piggin
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-11 23:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy, Shreyas B . Prabhu

msgsnd doorbell exceptions are cleared when the doorbell interrupt is
taken. However if a doorbell exception causes a system reset interrupt
wake from power saving state, the message is not cleared. Processing
the doorbell from the system reset interrupt requires msgclr to avoid
taking the exception again.

Testing this plus the previous wakup direct patch gives:

                                original         wakeup direct     msgclr
Different threads, same core:   315k/s           264k/s            345k/s
Different cores:                235k/s           242k/s            242k/s

Net speedup is +10% for same core, and +3% for different core.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/dbell.h      | 13 +++++++++++++
 arch/powerpc/include/asm/ppc-opcode.h |  3 +++
 arch/powerpc/kernel/asm-offsets.c     |  1 +
 arch/powerpc/kernel/exceptions-64s.S  | 23 +++++++++++++++++++++--
 4 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/dbell.h b/arch/powerpc/include/asm/dbell.h
index f70cbfe0ec04..9f2ae0d25e15 100644
--- a/arch/powerpc/include/asm/dbell.h
+++ b/arch/powerpc/include/asm/dbell.h
@@ -56,6 +56,19 @@ static inline void ppc_msgsync(void)
 				: : "i" (CPU_FTR_HVMODE|CPU_FTR_ARCH_300));
 }
 
+static inline void _ppc_msgclr(u32 msg)
+{
+	__asm__ __volatile__ (ASM_FTR_IFSET(PPC_MSGCLR(%1), PPC_MSGCLRP(%1), %0)
+				: : "i" (CPU_FTR_HVMODE), "r" (msg));
+}
+
+static inline void ppc_msgclr(enum ppc_dbell type)
+{
+	u32 msg = PPC_DBELL_TYPE(type);
+
+	_ppc_msgclr(msg);
+}
+
 #else /* CONFIG_PPC_BOOK3S */
 
 #define PPC_DBELL_MSGTYPE		PPC_DBELL
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 3a8d278e7421..3b29c54e51fa 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -221,6 +221,7 @@
 #define PPC_INST_MSGCLR			0x7c0001dc
 #define PPC_INST_MSGSYNC		0x7c0006ec
 #define PPC_INST_MSGSNDP		0x7c00011c
+#define PPC_INST_MSGCLRP		0x7c00015c
 #define PPC_INST_MTTMR			0x7c0003dc
 #define PPC_INST_NOP			0x60000000
 #define PPC_INST_PASTE			0x7c00070c
@@ -409,6 +410,8 @@
 					___PPC_RB(b))
 #define PPC_MSGSNDP(b)		stringify_in_c(.long PPC_INST_MSGSNDP | \
 					___PPC_RB(b))
+#define PPC_MSGCLRP(b)		stringify_in_c(.long PPC_INST_MSGCLRP | \
+					___PPC_RB(b))
 #define PPC_POPCNTB(a, s)	stringify_in_c(.long PPC_INST_POPCNTB | \
 					__PPC_RA(a) | __PPC_RS(s))
 #define PPC_POPCNTD(a, s)	stringify_in_c(.long PPC_INST_POPCNTD | \
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index e15c178ba079..9624851ca276 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -746,6 +746,7 @@ int main(void)
 #endif
 
 	DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
+	DEFINE(PPC_DBELL_MSGTYPE, PPC_DBELL_MSGTYPE);
 
 #ifdef CONFIG_PPC_8xx
 	DEFINE(VIRT_IMMR_BASE, (u64)__fix_to_virt(FIX_IMMR_BASE));
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index ae418b85c17c..a04ee0d7f88e 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1552,6 +1552,25 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 	b	1b
 
 /*
+ * When doorbell is triggered from system reset wakeup, the message is
+ * not cleared, so it would fire again when EE is enabled.
+ *
+ * When coming from local_irq_enable, there may be the same problem if
+ * we were hard disabled.
+ *
+ * Execute msgclr to clear pending exceptions before handling it.
+ */
+h_doorbell_common_msgclr:
+	LOAD_REG_IMMEDIATE(r3, PPC_DBELL_MSGTYPE << (63-36))
+	PPC_MSGCLR(3)
+	b 	h_doorbell_common
+
+doorbell_super_common_msgclr:
+	LOAD_REG_IMMEDIATE(r3, PPC_DBELL_MSGTYPE << (63-36))
+	PPC_MSGCLRP(3)
+	b 	doorbell_super_common
+
+/*
  * Called from arch_local_irq_enable when an interrupt needs
  * to be resent. r3 contains 0x500, 0x900, 0xa00 or 0xe80 to indicate
  * which kind of interrupt. MSR:EE is already off. We generate a
@@ -1576,13 +1595,13 @@ _GLOBAL(__replay_interrupt)
 	beq	hardware_interrupt_common
 BEGIN_FTR_SECTION
 	cmpwi	r3,0xe80
-	beq	h_doorbell_common
+	beq	h_doorbell_common_msgclr
 	cmpwi	r3,0xea0
 	beq	h_virt_irq_common
 	cmpwi	r3,0xe60
 	beq	hmi_exception_common
 FTR_SECTION_ELSE
 	cmpwi	r3,0xa00
-	beq	doorbell_super_common
+	beq	doorbell_super_common_msgclr
 ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
 	blr
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 06/14] powerpc/64s: interrupt replay balance the return branch predictor
  2017-06-11 23:58 [PATCH 00/14 v2] idle performance improvements Nicholas Piggin
                   ` (4 preceding siblings ...)
  2017-06-11 23:58 ` [PATCH 05/14] powerpc/64s: msgclr when handling doorbell exceptions Nicholas Piggin
@ 2017-06-11 23:58 ` Nicholas Piggin
  2017-06-12 14:41   ` Gautham R Shenoy
  2017-06-13  9:51   ` Michael Ellerman
  2017-06-11 23:58 ` [PATCH 07/14] powerpc/64s: idle branch to handler with virtual mode offset Nicholas Piggin
                   ` (7 subsequent siblings)
  13 siblings, 2 replies; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-11 23:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy, Shreyas B . Prabhu

The __replay_interrupt code is branched to with bl, but the caller is
returned to directly with rfid from the interrupt.

Instead return to a return stub that returns to the caller with blr,
which should do better with the return predictor.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index a04ee0d7f88e..d55201625ea3 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1586,7 +1586,7 @@ _GLOBAL(__replay_interrupt)
 	 * we don't give a damn about, so we don't bother storing them.
 	 */
 	mfmsr	r12
-	mflr	r11
+	LOAD_REG_ADDR(r11, __replay_interrupt_return)
 	mfcr	r9
 	ori	r12,r12,MSR_EE
 	cmpwi	r3,0x900
@@ -1604,4 +1604,5 @@ FTR_SECTION_ELSE
 	cmpwi	r3,0xa00
 	beq	doorbell_super_common_msgclr
 ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
+__replay_interrupt_return:
 	blr
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 07/14] powerpc/64s: idle branch to handler with virtual mode offset
  2017-06-11 23:58 [PATCH 00/14 v2] idle performance improvements Nicholas Piggin
                   ` (5 preceding siblings ...)
  2017-06-11 23:58 ` [PATCH 06/14] powerpc/64s: interrupt replay balance the return branch predictor Nicholas Piggin
@ 2017-06-11 23:58 ` Nicholas Piggin
  2017-06-11 23:58 ` [PATCH 08/14] powerpc/64s: idle avoid SRR usage in idle sleep/wake paths Nicholas Piggin
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-11 23:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy, Shreyas B . Prabhu

Have the system reset idle wakeup handlers branched to in real mode
with the 0xc... kernel address applied. This allows simplifications of
avoiding rfid when switching to virtual mode in the wakeup handler.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/exception-64s.h | 17 ++++++++++++++---
 arch/powerpc/kernel/exceptions-64s.S     |  6 ++++--
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 183d73b6ed99..0912e328e1d7 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -236,15 +236,26 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define kvmppc_interrupt kvmppc_interrupt_pr
 #endif
 
+/*
+ * Branch to label using its 0xC000 address. This gives the same real address
+ * when relocation is off, but allows mtmsr to set MSR[IR|DR]=1.
+ * This could set the 0xc bits for !RELOCATABLE rather than load KBASE for
+ * a slight optimisation.
+ */
+#define BRANCH_TO_C000(reg, label)					\
+	__LOAD_HANDLER(reg, label);					\
+	mtctr	reg;							\
+	bctr
+
 #ifdef CONFIG_RELOCATABLE
 #define BRANCH_TO_COMMON(reg, label)					\
 	__LOAD_HANDLER(reg, label);					\
 	mtctr	reg;							\
 	bctr
 
-#define BRANCH_LINK_TO_FAR(label)					\
-	__LOAD_FAR_HANDLER(r12, label);					\
-	mtctr	r12;							\
+#define BRANCH_LINK_TO_FAR(reg, label)					\
+	__LOAD_FAR_HANDLER(reg, label);					\
+	mtctr	reg;							\
 	bctrl
 
 /*
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index d55201625ea3..fec7c933d095 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -99,7 +99,9 @@ EXC_VIRT_NONE(0x4000, 0x100)
 #ifdef CONFIG_PPC_P7_NAP
 	/*
 	 * If running native on arch 2.06 or later, check if we are waking up
-	 * from nap/sleep/winkle, and branch to idle handler.
+	 * from nap/sleep/winkle, and branch to idle handler. The idle wakeup
+	 * handler initially runs in real mode, but we branch to the 0xc000...
+	 * address so we can turn on relocation with mtmsr.
 	 */
 #define IDLETEST(n)							\
 	BEGIN_FTR_SECTION ;						\
@@ -107,7 +109,7 @@ EXC_VIRT_NONE(0x4000, 0x100)
 	rlwinm.	r10,r10,47-31,30,31 ;					\
 	beq-	1f ;							\
 	cmpwi	cr3,r10,2 ;						\
-	BRANCH_TO_COMMON(r10, system_reset_idle_common) ;		\
+	BRANCH_TO_C000(r10, system_reset_idle_common) ;			\
 1:									\
 	END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
 #else
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 08/14] powerpc/64s: idle avoid SRR usage in idle sleep/wake paths
  2017-06-11 23:58 [PATCH 00/14 v2] idle performance improvements Nicholas Piggin
                   ` (6 preceding siblings ...)
  2017-06-11 23:58 ` [PATCH 07/14] powerpc/64s: idle branch to handler with virtual mode offset Nicholas Piggin
@ 2017-06-11 23:58 ` Nicholas Piggin
  2017-06-13 10:25   ` Gautham R Shenoy
  2017-06-11 23:58 ` [PATCH 09/14] powerpc/64s: idle hmi wakeup is unlikely Nicholas Piggin
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-11 23:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy, Shreyas B . Prabhu

Idle code now always runs at the 0xc... effective address whether
in real or virtual mode. This means rfid can be ditched, along
with a lot of SRR manipulations.

In the wakeup path, carry SRR1 around in r12. Use mtmsrd to change
MSR states as required.

This also balances the return prediction for the idle call, by
doing blr rather than rfid to return to the idle caller.

On POWER9, 2-process context switch on different cores, with snooze
disabled, increases performance by 2%.
---
 arch/powerpc/kernel/exceptions-64s.S    |  1 +
 arch/powerpc/kernel/idle_book3s.S       | 57 +++++++++++++++------------------
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  8 ++++-
 3 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index fec7c933d095..c3d0aef089a7 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -130,6 +130,7 @@ EXC_VIRT_NONE(0x4100, 0x100)
 
 #ifdef CONFIG_PPC_P7_NAP
 EXC_COMMON_BEGIN(system_reset_idle_common)
+	mfspr	r12,SPRN_SRR1
 	b	pnv_powersave_wakeup
 #endif
 
diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 35cf5bb7daed..6305d4d7a268 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -111,7 +111,7 @@ core_idle_lock_held:
  *	r3 - PNV_THREAD_NAP/SLEEP/WINKLE in POWER8
  *	   - Requested PSSCR value in POWER9
  *
- * Address of idle handler to 'rfid' to in r4
+ * Address of idle handler to branch to in realmode in r4
  */
 pnv_powersave_common:
 	/* Use r3 to pass state nap/sleep/winkle */
@@ -121,14 +121,14 @@ pnv_powersave_common:
 	 * need to save PC, some CR bits and the NV GPRs,
 	 * but for now an interrupt frame will do.
 	 */
+	mtctr	r4
+
 	mflr	r0
 	std	r0,16(r1)
 	stdu	r1,-INT_FRAME_SIZE(r1)
 	std	r0,_LINK(r1)
 	std	r0,_NIP(r1)
 
-	mfmsr   r9
-
 	/* We haven't lost state ... yet */
 	li	r0,0
 	stb	r0,PACA_NAPSTATELOST(r13)
@@ -138,7 +138,6 @@ pnv_powersave_common:
 	SAVE_NVGPRS(r1)
 	mfcr	r5
 	std	r5,_CCR(r1)
-	std	r9,_MSR(r1)
 	std	r1,PACAR1(r13)
 
 	/*
@@ -148,12 +147,8 @@ pnv_powersave_common:
 	 * the MMU context to the guest.
 	 */
 	LOAD_REG_IMMEDIATE(r7, MSR_IDLE)
-	li	r6, MSR_RI
-	andc	r6, r9, r6
-	mtmsrd	r6, 1		/* clear RI before setting SRR0/1 */
-	mtspr	SPRN_SRR0, r4
-	mtspr	SPRN_SRR1, r7
-	rfid
+	mtmsrd	r7,0
+	bctr
 
 	.globl pnv_enter_arch207_idle_mode
 pnv_enter_arch207_idle_mode:
@@ -305,11 +300,10 @@ _GLOBAL(power7_idle_insn)
 	b	pnv_powersave_common
 
 #define CHECK_HMI_INTERRUPT						\
-	mfspr	r0,SPRN_SRR1;						\
 BEGIN_FTR_SECTION_NESTED(66);						\
-	rlwinm	r0,r0,45-31,0xf;  /* extract wake reason field (P8) */	\
+	rlwinm	r0,r12,45-31,0xf;  /* extract wake reason field (P8) */	\
 FTR_SECTION_ELSE_NESTED(66);						\
-	rlwinm	r0,r0,45-31,0xe;  /* P7 wake reason field is 3 bits */	\
+	rlwinm	r0,r12,45-31,0xe;  /* P7 wake reason field is 3 bits */	\
 ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);		\
 	cmpwi	r0,0xa;			/* Hypervisor maintenance ? */	\
 	bne	20f;							\
@@ -388,17 +382,17 @@ pnv_powersave_wakeup_mce:
 
 	/*
 	 * Now put the original SRR1 with SRR1_WAKEMCE_RESVD as the wake
-	 * reason into SRR1, which allows reuse of the system reset wakeup
+	 * reason into r12, which allows reuse of the system reset wakeup
 	 * code without being mistaken for another type of wakeup.
 	 */
-	oris	r3,r3,SRR1_WAKEMCE_RESVD@h
-	mtspr	SPRN_SRR1,r3
+	oris	r12,r3,SRR1_WAKEMCE_RESVD@h
 
 	b	pnv_powersave_wakeup
 
 /*
  * Called from reset vector for powersave wakeups.
  * cr3 - set to gt if waking up with partial/complete hypervisor state loss
+ * r12 - SRR1
  */
 .global pnv_powersave_wakeup
 pnv_powersave_wakeup:
@@ -408,8 +402,10 @@ BEGIN_FTR_SECTION
 BEGIN_FTR_SECTION_NESTED(70)
 	bl	power9_dd1_recover_paca
 END_FTR_SECTION_NESTED_IFSET(CPU_FTR_POWER9_DD1, 70)
+	ld	r1,PACAR1(r13)
 	bl	pnv_restore_hyp_resource_arch300
 FTR_SECTION_ELSE
+	ld	r1,PACAR1(r13)
 	bl	pnv_restore_hyp_resource_arch207
 ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
 
@@ -429,7 +425,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
 #endif
 
 	/* Return SRR1 from power7_nap() */
-	mfspr	r3,SPRN_SRR1
+	mr	r3,r12
 	blt	cr3,pnv_wakeup_noloss
 	b	pnv_wakeup_loss
 
@@ -503,7 +499,6 @@ pnv_restore_hyp_resource_arch207:
  * r4 - PACA_THREAD_IDLE_STATE
  */
 pnv_wakeup_tb_loss:
-	ld	r1,PACAR1(r13)
 	/*
 	 * Before entering any idle state, the NVGPRs are saved in the stack.
 	 * If there was a state loss, or PACA_NAPSTATELOST was set, then the
@@ -529,9 +524,9 @@ pnv_wakeup_tb_loss:
 	 * is required to return back to reset vector after hypervisor state
 	 * restore is complete.
 	 */
+	mr	r19,r12
 	mr	r18,r4
 	mflr	r17
-	mfspr	r16,SPRN_SRR1
 BEGIN_FTR_SECTION
 	CHECK_HMI_INTERRUPT
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
@@ -781,7 +776,7 @@ BEGIN_FTR_SECTION
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 hypervisor_state_restored:
 
-	mtspr	SPRN_SRR1,r16
+	mr	r12,r19
 	mtlr	r17
 	blr		/* return to pnv_powersave_wakeup */
 
@@ -797,20 +792,19 @@ fastsleep_workaround_at_exit:
  */
 .global pnv_wakeup_loss
 pnv_wakeup_loss:
-	ld	r1,PACAR1(r13)
 BEGIN_FTR_SECTION
 	CHECK_HMI_INTERRUPT
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
 	REST_NVGPRS(r1)
 	REST_GPR(2, r1)
+	ld	r4,PACAKMSR(r13)
+	ld	r5,_LINK(r1)
 	ld	r6,_CCR(r1)
-	ld	r4,_MSR(r1)
-	ld	r5,_NIP(r1)
 	addi	r1,r1,INT_FRAME_SIZE
+	mtlr	r5
 	mtcr	r6
-	mtspr	SPRN_SRR1,r4
-	mtspr	SPRN_SRR0,r5
-	rfid
+	mtmsrd	r4
+	blr
 
 /*
  * R3 here contains the value that will be returned to the caller
@@ -823,12 +817,11 @@ pnv_wakeup_noloss:
 BEGIN_FTR_SECTION
 	CHECK_HMI_INTERRUPT
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
-	ld	r1,PACAR1(r13)
-	ld	r6,_CCR(r1)
-	ld	r4,_MSR(r1)
+	ld	r4,PACAKMSR(r13)
 	ld	r5,_NIP(r1)
+	ld	r6,_CCR(r1)
 	addi	r1,r1,INT_FRAME_SIZE
+	mtlr	r5
 	mtcr	r6
-	mtspr	SPRN_SRR1,r4
-	mtspr	SPRN_SRR0,r5
-	rfid
+	mtmsrd	r4
+	blr
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index bdb3f76ceb6b..eb5b78b6bacf 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -329,15 +329,21 @@ kvm_novcpu_exit:
  * We come in here when wakened from nap mode.
  * Relocation is off and most register values are lost.
  * r13 points to the PACA.
+ * r3 contains the SRR1 wakeup value, SRR1 is trashed.
  */
 	.globl	kvm_start_guest
 kvm_start_guest:
-
 	/* Set runlatch bit the minute you wake up from nap */
 	mfspr	r0, SPRN_CTRLF
 	ori 	r0, r0, 1
 	mtspr	SPRN_CTRLT, r0
 
+	/*
+	 * Could avoid this and pass it through in r3. For now,
+	 * code expects it to be in SRR1.
+	 */
+	mtspr	r3,SPRN_SRR1
+
 	ld	r2,PACATOC(r13)
 
 	li	r0,KVM_HWTHREAD_IN_KVM
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 09/14] powerpc/64s: idle hmi wakeup is unlikely
  2017-06-11 23:58 [PATCH 00/14 v2] idle performance improvements Nicholas Piggin
                   ` (7 preceding siblings ...)
  2017-06-11 23:58 ` [PATCH 08/14] powerpc/64s: idle avoid SRR usage in idle sleep/wake paths Nicholas Piggin
@ 2017-06-11 23:58 ` Nicholas Piggin
  2017-06-12 15:03   ` Gautham R Shenoy
  2017-06-11 23:58 ` [PATCH 10/14] powerpc/64s: cpuidle set polling before enabling irqs Nicholas Piggin
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-11 23:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy, Shreyas B . Prabhu

In a busy system, idle wakeups can be expected from IPIs and device
interrupts.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/idle_book3s.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 6305d4d7a268..32b76fb28352 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -306,7 +306,7 @@ FTR_SECTION_ELSE_NESTED(66);						\
 	rlwinm	r0,r12,45-31,0xe;  /* P7 wake reason field is 3 bits */	\
 ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);		\
 	cmpwi	r0,0xa;			/* Hypervisor maintenance ? */	\
-	bne	20f;							\
+	bne+	20f;							\
 	/* Invoke opal call to handle hmi */				\
 	ld	r2,PACATOC(r13);					\
 	ld	r1,PACAR1(r13);						\
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 10/14] powerpc/64s: cpuidle set polling before enabling irqs
  2017-06-11 23:58 [PATCH 00/14 v2] idle performance improvements Nicholas Piggin
                   ` (8 preceding siblings ...)
  2017-06-11 23:58 ` [PATCH 09/14] powerpc/64s: idle hmi wakeup is unlikely Nicholas Piggin
@ 2017-06-11 23:58 ` Nicholas Piggin
  2017-06-12 15:10   ` Gautham R Shenoy
  2017-06-11 23:58 ` [PATCH 11/14] powerpc/64s: cpuidle read mostly for common globals Nicholas Piggin
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-11 23:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy, Shreyas B . Prabhu

local_irq_enable can cause interrupts to be taken which could
take significant amount of processing time. The idle process
should set its polling flag before this, so another process that
wakes it during this time will not have to send an IPI.

Expand the TIF_POLLING_NRFLAG coverage to as large as possible.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 drivers/cpuidle/cpuidle-powernv.c | 4 +++-
 drivers/cpuidle/cpuidle-pseries.c | 3 ++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 79152676f62b..50b3c2e0306f 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -51,9 +51,10 @@ static int snooze_loop(struct cpuidle_device *dev,
 {
 	u64 snooze_exit_time;
 
-	local_irq_enable();
 	set_thread_flag(TIF_POLLING_NRFLAG);
 
+	local_irq_enable();
+
 	snooze_exit_time = get_tb() + snooze_timeout;
 	ppc64_runlatch_off();
 	HMT_very_low();
@@ -66,6 +67,7 @@ static int snooze_loop(struct cpuidle_device *dev,
 	ppc64_runlatch_on();
 	clear_thread_flag(TIF_POLLING_NRFLAG);
 	smp_mb();
+
 	return index;
 }
 
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 166ccd711ec9..7b12bb2ea70f 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -62,9 +62,10 @@ static int snooze_loop(struct cpuidle_device *dev,
 	unsigned long in_purr;
 	u64 snooze_exit_time;
 
+	set_thread_flag(TIF_POLLING_NRFLAG);
+
 	idle_loop_prolog(&in_purr);
 	local_irq_enable();
-	set_thread_flag(TIF_POLLING_NRFLAG);
 	snooze_exit_time = get_tb() + snooze_timeout;
 
 	while (!need_resched()) {
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 11/14] powerpc/64s: cpuidle read mostly for common globals
  2017-06-11 23:58 [PATCH 00/14 v2] idle performance improvements Nicholas Piggin
                   ` (9 preceding siblings ...)
  2017-06-11 23:58 ` [PATCH 10/14] powerpc/64s: cpuidle set polling before enabling irqs Nicholas Piggin
@ 2017-06-11 23:58 ` Nicholas Piggin
  2017-06-12 15:30   ` Gautham R Shenoy
  2017-06-11 23:58 ` [PATCH 12/14] powerpc/64s: cpuidle no memory barrier after break from idle Nicholas Piggin
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-11 23:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy, Shreyas B . Prabhu

Ensure these don't get put into bouncing cachelines.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 drivers/cpuidle/cpuidle-powernv.c | 10 +++++-----
 drivers/cpuidle/cpuidle-pseries.c |  8 ++++----
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 50b3c2e0306f..9d03326ac05e 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -32,18 +32,18 @@ static struct cpuidle_driver powernv_idle_driver = {
 	.owner            = THIS_MODULE,
 };
 
-static int max_idle_state;
-static struct cpuidle_state *cpuidle_state_table;
+static int max_idle_state __read_mostly;
+static struct cpuidle_state *cpuidle_state_table __read_mostly;
 
 struct stop_psscr_table {
 	u64 val;
 	u64 mask;
 };
 
-static struct stop_psscr_table stop_psscr_table[CPUIDLE_STATE_MAX];
+static struct stop_psscr_table stop_psscr_table[CPUIDLE_STATE_MAX] __read_mostly;
 
-static u64 snooze_timeout;
-static bool snooze_timeout_en;
+static u64 snooze_timeout __read_mostly;
+static bool snooze_timeout_en __read_mostly;
 
 static int snooze_loop(struct cpuidle_device *dev,
 			struct cpuidle_driver *drv,
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 7b12bb2ea70f..a404f352d284 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -25,10 +25,10 @@ struct cpuidle_driver pseries_idle_driver = {
 	.owner            = THIS_MODULE,
 };
 
-static int max_idle_state;
-static struct cpuidle_state *cpuidle_state_table;
-static u64 snooze_timeout;
-static bool snooze_timeout_en;
+static int max_idle_state __read_mostly;
+static struct cpuidle_state *cpuidle_state_table __read_mostly;
+static u64 snooze_timeout __read_mostly;
+static bool snooze_timeout_en __read_mostly;
 
 static inline void idle_loop_prolog(unsigned long *in_purr)
 {
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 12/14] powerpc/64s: cpuidle no memory barrier after break from idle
  2017-06-11 23:58 [PATCH 00/14 v2] idle performance improvements Nicholas Piggin
                   ` (10 preceding siblings ...)
  2017-06-11 23:58 ` [PATCH 11/14] powerpc/64s: cpuidle read mostly for common globals Nicholas Piggin
@ 2017-06-11 23:58 ` Nicholas Piggin
  2017-06-12 17:48   ` Vaidyanathan Srinivasan
  2017-06-11 23:58 ` [PATCH 13/14] powerpc/64: runlatch CTRL[RUN] set optimisation Nicholas Piggin
  2017-06-11 23:58 ` [PATCH 14/14] powerpc/64s: idle runlatch switch is done with MSR[EE]=0 Nicholas Piggin
  13 siblings, 1 reply; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-11 23:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy, Shreyas B . Prabhu

A memory barrier is not required after the task wakes up,
only if we clear the polling flag before waking. The case
where we have work to do is the important one, so optimise
for it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 drivers/cpuidle/cpuidle-powernv.c | 11 +++++++++--
 drivers/cpuidle/cpuidle-pseries.c | 11 +++++++++--
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 9d03326ac05e..37b0698b7193 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -59,14 +59,21 @@ static int snooze_loop(struct cpuidle_device *dev,
 	ppc64_runlatch_off();
 	HMT_very_low();
 	while (!need_resched()) {
-		if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time)
+		if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
+			/*
+			 * Task has not woken up but we are exiting the polling
+			 * loop anyway. Require a barrier after polling is
+			 * cleared to order subsequent test of need_resched().
+			 */
+			clear_thread_flag(TIF_POLLING_NRFLAG);
+			smp_mb();
 			break;
+		}
 	}
 
 	HMT_medium();
 	ppc64_runlatch_on();
 	clear_thread_flag(TIF_POLLING_NRFLAG);
-	smp_mb();
 
 	return index;
 }
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index a404f352d284..e9b3853d93ea 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -71,13 +71,20 @@ static int snooze_loop(struct cpuidle_device *dev,
 	while (!need_resched()) {
 		HMT_low();
 		HMT_very_low();
-		if (snooze_timeout_en && get_tb() > snooze_exit_time)
+		if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
+			/*
+			 * Task has not woken up but we are exiting the polling
+			 * loop anyway. Require a barrier after polling is
+			 * cleared to order subsequent test of need_resched().
+			 */
+			clear_thread_flag(TIF_POLLING_NRFLAG);
+			smp_mb();
 			break;
+		}
 	}
 
 	HMT_medium();
 	clear_thread_flag(TIF_POLLING_NRFLAG);
-	smp_mb();
 
 	idle_loop_epilog(in_purr);
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 13/14] powerpc/64: runlatch CTRL[RUN] set optimisation
  2017-06-11 23:58 [PATCH 00/14 v2] idle performance improvements Nicholas Piggin
                   ` (11 preceding siblings ...)
  2017-06-11 23:58 ` [PATCH 12/14] powerpc/64s: cpuidle no memory barrier after break from idle Nicholas Piggin
@ 2017-06-11 23:58 ` Nicholas Piggin
  2017-06-12 17:11   ` Vaidyanathan Srinivasan
  2017-06-11 23:58 ` [PATCH 14/14] powerpc/64s: idle runlatch switch is done with MSR[EE]=0 Nicholas Piggin
  13 siblings, 1 reply; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-11 23:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy, Shreyas B . Prabhu

The CTRL register is read-only except bit 63 which is the run latch
control. This means it can be updated with a mtspr rather than
mfspr/mtspr.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/process.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index baae104b16c7..a44ea034c226 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1960,12 +1960,8 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
 void notrace __ppc64_runlatch_on(void)
 {
 	struct thread_info *ti = current_thread_info();
-	unsigned long ctrl;
-
-	ctrl = mfspr(SPRN_CTRLF);
-	ctrl |= CTRL_RUNLATCH;
-	mtspr(SPRN_CTRLT, ctrl);
 
+	mtspr(SPRN_CTRLT, CTRL_RUNLATCH);
 	ti->local_flags |= _TLF_RUNLATCH;
 }
 
@@ -1973,13 +1969,9 @@ void notrace __ppc64_runlatch_on(void)
 void notrace __ppc64_runlatch_off(void)
 {
 	struct thread_info *ti = current_thread_info();
-	unsigned long ctrl;
 
 	ti->local_flags &= ~_TLF_RUNLATCH;
-
-	ctrl = mfspr(SPRN_CTRLF);
-	ctrl &= ~CTRL_RUNLATCH;
-	mtspr(SPRN_CTRLT, ctrl);
+	mtspr(SPRN_CTRLT, 0);
 }
 #endif /* CONFIG_PPC64 */
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 14/14] powerpc/64s: idle runlatch switch is done with MSR[EE]=0
  2017-06-11 23:58 [PATCH 00/14 v2] idle performance improvements Nicholas Piggin
                   ` (12 preceding siblings ...)
  2017-06-11 23:58 ` [PATCH 13/14] powerpc/64: runlatch CTRL[RUN] set optimisation Nicholas Piggin
@ 2017-06-11 23:58 ` Nicholas Piggin
  2017-06-12 17:00   ` Vaidyanathan Srinivasan
  13 siblings, 1 reply; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-11 23:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy, Shreyas B . Prabhu

2*mfmsr and 2*mtmsr can be avoided in the idle sleep/wake code
because we know the MSR[EE] is clear.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/platforms/powernv/idle.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index ee416e016387..84c55a5ef7ea 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -291,9 +291,9 @@ static unsigned long __power7_idle_type(unsigned long type)
 	if (!prep_irq_for_idle_irqsoff())
 		return 0;
 
-	ppc64_runlatch_off();
+	__ppc64_runlatch_off();
 	srr1 = power7_idle_insn(type);
-	ppc64_runlatch_on();
+	__ppc64_runlatch_on();
 
 	fini_irq_for_idle_irqsoff();
 
@@ -328,9 +328,9 @@ static unsigned long __power9_idle_type(unsigned long stop_psscr_val,
 	psscr = mfspr(SPRN_PSSCR);
 	psscr = (psscr & ~stop_psscr_mask) | stop_psscr_val;
 
-	ppc64_runlatch_off();
+	__ppc64_runlatch_off();
 	srr1 = power9_idle_stop(psscr);
-	ppc64_runlatch_on();
+	__ppc64_runlatch_on();
 
 	fini_irq_for_idle_irqsoff();
 
@@ -365,7 +365,7 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
 	unsigned long srr1;
 	u32 idle_states = pnv_get_supported_cpuidle_states();
 
-	ppc64_runlatch_off();
+	__ppc64_runlatch_off();
 
 	if (cpu_has_feature(CPU_FTR_ARCH_300) && deepest_stop_found) {
 		unsigned long psscr;
@@ -392,7 +392,7 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
 		HMT_medium();
 	}
 
-	ppc64_runlatch_on();
+	__ppc64_runlatch_on();
 
 	return srr1;
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 01/14] powerpc/64s: idle move soft interrupt mask logic into C code
  2017-06-11 23:58 ` [PATCH 01/14] powerpc/64s: idle move soft interrupt mask logic into C code Nicholas Piggin
@ 2017-06-12  8:37   ` Gautham R Shenoy
  2017-06-12 14:46     ` Nicholas Piggin
  0 siblings, 1 reply; 42+ messages in thread
From: Gautham R Shenoy @ 2017-06-12  8:37 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linuxppc-dev, Gautham R . Shenoy, Shreyas B . Prabhu, Paul Mackerras

Hi Nick,

(Added Paul Mackerass to the Cc)
On Mon, Jun 12, 2017 at 09:58:22AM +1000, Nicholas Piggin wrote:
> This simplifies the asm and fixes irq-off tracing over sleep
> instructions.
> 
> Also move powersave_nap check for POWER8 into C code, and move
> PSSCR register value calculation for POWER9 into C.
>

Thanks for doing this. Only one minor comment.

> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> index 98a6d07ecb5c..35cf5bb7daed 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S

[..snip..]

> @@ -109,13 +109,9 @@ core_idle_lock_held:
>  /*
>   * Pass requested state in r3:
>   *	r3 - PNV_THREAD_NAP/SLEEP/WINKLE in POWER8
> - *	   - Requested STOP state in POWER9
> + *	   - Requested PSSCR value in POWER9
>   *
> - * To check IRQ_HAPPENED in r4
> - * 	0 - don't check
> - * 	1 - check
> - *
> - * Address to 'rfid' to in r5
> + * Address of idle handler to 'rfid' to in r4
>   */
>  pnv_powersave_common:
>  	/* Use r3 to pass state nap/sleep/winkle */
> @@ -131,30 +127,7 @@ pnv_powersave_common:
>  	std	r0,_LINK(r1)
>  	std	r0,_NIP(r1)
> 
> -	/* Hard disable interrupts */
> -	mfmsr	r9
> -	rldicl	r9,r9,48,1
> -	rotldi	r9,r9,16
> -	mtmsrd	r9,1			/* hard-disable interrupts */
> -
> -	/* Check if something happened while soft-disabled */
> -	lbz	r0,PACAIRQHAPPENED(r13)
> -	andi.	r0,r0,~PACA_IRQ_HARD_DIS@l
> -	beq	1f
> -	cmpwi	cr0,r4,0

There were callers to power7_nap() in the dynamic-core-split/unsplit
which wanted nap to be forced irrespective of whether an irq was
pending or not. With this new patch, there won't be a way to enforce
that on POWER8.

> -	beq	1f
> -	addi	r1,r1,INT_FRAME_SIZE
> -	ld	r0,16(r1)
> -	li	r3,0			/* Return 0 (no nap) */
> -	mtlr	r0
> -	blr
> -

[..snip..]

> diff --git a/arch/powerpc/platforms/powernv/subcore.c b/arch/powerpc/platforms/powernv/subcore.c
> index 0babef11136f..d975d78188a9 100644
> --- a/arch/powerpc/platforms/powernv/subcore.c
> +++ b/arch/powerpc/platforms/powernv/subcore.c
> @@ -18,6 +18,7 @@
>  #include <linux/stop_machine.h>
> 
>  #include <asm/cputhreads.h>
> +#include <asm/cpuidle.h>
>  #include <asm/kvm_ppc.h>
>  #include <asm/machdep.h>
>  #include <asm/opal.h>
> @@ -182,7 +183,7 @@ static void unsplit_core(void)
>  	cpu = smp_processor_id();
>  	if (cpu_thread_in_core(cpu) != 0) {
>  		while (mfspr(SPRN_HID0) & mask)
> -			power7_nap(0);
> +			power7_idle_insn(PNV_THREAD_NAP);

This is the place where we are unsplitting the core after exiting the
guest and returning to the host. This required us to execute the nap
instruction. With the new patch, will just loop inside this loop
without executing nap. Paul, will this be an issue ?

> 
>  		per_cpu(split_state, cpu).step = SYNC_STEP_UNSPLIT;
>  		return;
> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> index 45eaf06462ae..79152676f62b 100644
> --- a/drivers/cpuidle/cpuidle-powernv.c
> +++ b/drivers/cpuidle/cpuidle-powernv.c
> @@ -73,9 +73,8 @@ static int nap_loop(struct cpuidle_device *dev,
>  			struct cpuidle_driver *drv,
>  			int index)
>  {
> -	ppc64_runlatch_off();
> -	power7_idle();
> -	ppc64_runlatch_on();
> +	power7_idle_type(PNV_THREAD_NAP);
> +
>  	return index;
>  }
> 
> @@ -98,7 +97,8 @@ static int fastsleep_loop(struct cpuidle_device *dev,
>  	new_lpcr &= ~LPCR_PECE1;
> 
>  	mtspr(SPRN_LPCR, new_lpcr);
> -	power7_sleep();
> +
> +	power7_idle_type(PNV_THREAD_SLEEP);
> 
>  	mtspr(SPRN_LPCR, old_lpcr);
> 
> @@ -110,10 +110,8 @@ static int stop_loop(struct cpuidle_device *dev,
>  		     struct cpuidle_driver *drv,
>  		     int index)
>  {
> -	ppc64_runlatch_off();
> -	power9_idle_stop(stop_psscr_table[index].val,
> +	power9_idle_type(stop_psscr_table[index].val,
>  			 stop_psscr_table[index].mask);
> -	ppc64_runlatch_on();
>  	return index;
>  }
> 
> -- 
> 2.11.0
> 

--
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 03/14] powerpc/64s: idle provide a default idle for POWER9
  2017-06-11 23:58 ` [PATCH 03/14] powerpc/64s: idle provide a default idle for POWER9 Nicholas Piggin
@ 2017-06-12  8:53   ` Gautham R Shenoy
  2017-06-12 14:46     ` Nicholas Piggin
  0 siblings, 1 reply; 42+ messages in thread
From: Gautham R Shenoy @ 2017-06-12  8:53 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, Gautham R . Shenoy, Shreyas B . Prabhu

Hi Nick,

On Mon, Jun 12, 2017 at 09:58:24AM +1000, Nicholas Piggin wrote:
> Before the cpuidle driver is enabled, provide a default idle
> function similarly to POWER7/8.
> 
> This should not have much effect, because the cpuidle driver
> for powernv is mandatory, but if that changes we should have
> a fallback.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/platforms/powernv/idle.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> index f188d84d9c59..e327e1585ddc 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -677,6 +677,8 @@ static int __init pnv_init_idle_states(void)
> 
>  	if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED)
>  		ppc_md.power_save = power7_idle;
> +	else if (supported_cpuidle_states & OPAL_PM_STOP_INST_FAST)
> +		ppc_md.power_save = power9_idle;

We are already initializing this in pnv_power9_idle_init() depending
on whether the device tree has exposed at least one INST_FAST idle
state. Else this should be NULL, because the firmware doesn't want us
to use a platform idle state!


> 
>  out:
>  	return 0;
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 04/14] powerpc/64s: idle process interrupts from system reset wakeup
  2017-06-11 23:58 ` [PATCH 04/14] powerpc/64s: idle process interrupts from system reset wakeup Nicholas Piggin
@ 2017-06-12  9:41   ` Gautham R Shenoy
  2017-06-12 14:51     ` Nicholas Piggin
  0 siblings, 1 reply; 42+ messages in thread
From: Gautham R Shenoy @ 2017-06-12  9:41 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, Gautham R . Shenoy, Shreyas B . Prabhu

Hi Nick,

On Mon, Jun 12, 2017 at 09:58:25AM +1000, Nicholas Piggin wrote:
> When the CPU wakes from low power state, it begins at the system reset
> interrupt with the exception that caused the wakeup encoded in SRR1.
> 
> Today, powernv idle wakeup ignores the wakeup reason (except a special
> case for HMI), and the regular interrupt corresponding to the
> exception will fire after the idle wakeup exits.
> 
> Change this to replay the interrupt from the idle wakeup before
> interrupts are hard-enabled.
> 
> Test on POWER8 of context_switch selftests benchmark with polling idle
> disabled (e.g., always nap, giving cross-CPU IPIs) gives the following
> results:
> 
>                                 original         wakeup direct
> Different threads, same core:   315k/s           264k/s
> Different cores:                235k/s           242k/s
> 
> There is a slowdown for doorbell IPI (same core) case because system
> reset wakeup does not clear the message and the doorbell interrupt
> fires again needlessly.

Should we clear the doorbell message if we are recording the fact that
the Doorbell irq has happened in paca ?

> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/include/asm/hw_irq.h     |  2 ++
>  arch/powerpc/kernel/irq.c             | 25 +++++++++++++++++++++++++
>  arch/powerpc/platforms/powernv/idle.c | 10 ++++++++--
>  3 files changed, 35 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> index f06112cf8734..8366bdc69988 100644

[..snip..]

> --- a/arch/powerpc/kernel/irq.c
> +++ b/arch/powerpc/kernel/irq.c
> @@ -348,6 +348,7 @@ bool prep_irq_for_idle(void)
>  	return true;
>  }
> 
> +#ifdef CONFIG_PPC_BOOK3S
>  /*
>   * This is for idle sequences that return with IRQs off, but the
>   * idle state itself wakes on interrupt. Tell the irq tracer that
> @@ -379,6 +380,30 @@ bool prep_irq_for_idle_irqsoff(void)
>  }
> 
>  /*
> + * Take the SRR1 wakeup reason, index into this table to find the
> + * appropriate irq_happened bit.
> + */
> +static const u8 srr1_to_irq[0x10] = {
> +	0, 0, 0,
> +	PACA_IRQ_DBELL,
> +	0,
> +	PACA_IRQ_DBELL,
> +	PACA_IRQ_DEC,
> +	0,
> +	PACA_IRQ_EE,
> +	PACA_IRQ_EE,
> +	PACA_IRQ_HMI,
> +	0, 0, 0, 0, 0 };
> +
> +void irq_set_pending_from_srr1(unsigned long srr1)
> +{
> +	unsigned int idx = (srr1 >> 18) & SRR1_WAKEMASK_P8;

Shouldn't this be
	unsigned int idx = (srr1 & SRR1_WAKEMASK_P8) >> 18;
	
	?
> +
> +	local_paca->irq_happened |= srr1_to_irq[idx];
> +}
> +#endif /* CONFIG_PPC_BOOK3S */
> +


Looks good otherwise.

--
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/14] powerpc/64s: msgclr when handling doorbell exceptions
  2017-06-11 23:58 ` [PATCH 05/14] powerpc/64s: msgclr when handling doorbell exceptions Nicholas Piggin
@ 2017-06-12 14:38   ` Gautham R Shenoy
  0 siblings, 0 replies; 42+ messages in thread
From: Gautham R Shenoy @ 2017-06-12 14:38 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, Gautham R . Shenoy, Shreyas B . Prabhu

On Mon, Jun 12, 2017 at 09:58:26AM +1000, Nicholas Piggin wrote:
> msgsnd doorbell exceptions are cleared when the doorbell interrupt is
> taken. However if a doorbell exception causes a system reset interrupt
> wake from power saving state, the message is not cleared. Processing
> the doorbell from the system reset interrupt requires msgclr to avoid
> taking the exception again.
>

So you are clearing the doorbell message in this patch.

> Testing this plus the previous wakup direct patch gives:
> 
>                                 original         wakeup direct     msgclr
> Different threads, same core:   315k/s           264k/s            345k/s
> Different cores:                235k/s           242k/s            242k/s
> 
> Net speedup is +10% for same core, and +3% for different core.

This is good speedup.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>


> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/include/asm/dbell.h      | 13 +++++++++++++
>  arch/powerpc/include/asm/ppc-opcode.h |  3 +++
>  arch/powerpc/kernel/asm-offsets.c     |  1 +
>  arch/powerpc/kernel/exceptions-64s.S  | 23 +++++++++++++++++++++--
>  4 files changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/dbell.h b/arch/powerpc/include/asm/dbell.h
> index f70cbfe0ec04..9f2ae0d25e15 100644
> --- a/arch/powerpc/include/asm/dbell.h
> +++ b/arch/powerpc/include/asm/dbell.h
> @@ -56,6 +56,19 @@ static inline void ppc_msgsync(void)
>  				: : "i" (CPU_FTR_HVMODE|CPU_FTR_ARCH_300));
>  }
> 
> +static inline void _ppc_msgclr(u32 msg)
> +{
> +	__asm__ __volatile__ (ASM_FTR_IFSET(PPC_MSGCLR(%1), PPC_MSGCLRP(%1), %0)
> +				: : "i" (CPU_FTR_HVMODE), "r" (msg));
> +}
> +
> +static inline void ppc_msgclr(enum ppc_dbell type)
> +{
> +	u32 msg = PPC_DBELL_TYPE(type);
> +
> +	_ppc_msgclr(msg);
> +}
> +
>  #else /* CONFIG_PPC_BOOK3S */
> 
>  #define PPC_DBELL_MSGTYPE		PPC_DBELL
> diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
> index 3a8d278e7421..3b29c54e51fa 100644
> --- a/arch/powerpc/include/asm/ppc-opcode.h
> +++ b/arch/powerpc/include/asm/ppc-opcode.h
> @@ -221,6 +221,7 @@
>  #define PPC_INST_MSGCLR			0x7c0001dc
>  #define PPC_INST_MSGSYNC		0x7c0006ec
>  #define PPC_INST_MSGSNDP		0x7c00011c
> +#define PPC_INST_MSGCLRP		0x7c00015c
>  #define PPC_INST_MTTMR			0x7c0003dc
>  #define PPC_INST_NOP			0x60000000
>  #define PPC_INST_PASTE			0x7c00070c
> @@ -409,6 +410,8 @@
>  					___PPC_RB(b))
>  #define PPC_MSGSNDP(b)		stringify_in_c(.long PPC_INST_MSGSNDP | \
>  					___PPC_RB(b))
> +#define PPC_MSGCLRP(b)		stringify_in_c(.long PPC_INST_MSGCLRP | \
> +					___PPC_RB(b))
>  #define PPC_POPCNTB(a, s)	stringify_in_c(.long PPC_INST_POPCNTB | \
>  					__PPC_RA(a) | __PPC_RS(s))
>  #define PPC_POPCNTD(a, s)	stringify_in_c(.long PPC_INST_POPCNTD | \
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index e15c178ba079..9624851ca276 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -746,6 +746,7 @@ int main(void)
>  #endif
> 
>  	DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
> +	DEFINE(PPC_DBELL_MSGTYPE, PPC_DBELL_MSGTYPE);
> 
>  #ifdef CONFIG_PPC_8xx
>  	DEFINE(VIRT_IMMR_BASE, (u64)__fix_to_virt(FIX_IMMR_BASE));
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index ae418b85c17c..a04ee0d7f88e 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1552,6 +1552,25 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
>  	b	1b
> 
>  /*
> + * When doorbell is triggered from system reset wakeup, the message is
> + * not cleared, so it would fire again when EE is enabled.
> + *
> + * When coming from local_irq_enable, there may be the same problem if
> + * we were hard disabled.
> + *
> + * Execute msgclr to clear pending exceptions before handling it.
> + */
> +h_doorbell_common_msgclr:
> +	LOAD_REG_IMMEDIATE(r3, PPC_DBELL_MSGTYPE << (63-36))
> +	PPC_MSGCLR(3)
> +	b 	h_doorbell_common
> +
> +doorbell_super_common_msgclr:
> +	LOAD_REG_IMMEDIATE(r3, PPC_DBELL_MSGTYPE << (63-36))
> +	PPC_MSGCLRP(3)
> +	b 	doorbell_super_common
> +
> +/*
>   * Called from arch_local_irq_enable when an interrupt needs
>   * to be resent. r3 contains 0x500, 0x900, 0xa00 or 0xe80 to indicate
>   * which kind of interrupt. MSR:EE is already off. We generate a
> @@ -1576,13 +1595,13 @@ _GLOBAL(__replay_interrupt)
>  	beq	hardware_interrupt_common
>  BEGIN_FTR_SECTION
>  	cmpwi	r3,0xe80
> -	beq	h_doorbell_common
> +	beq	h_doorbell_common_msgclr
>  	cmpwi	r3,0xea0
>  	beq	h_virt_irq_common
>  	cmpwi	r3,0xe60
>  	beq	hmi_exception_common
>  FTR_SECTION_ELSE
>  	cmpwi	r3,0xa00
> -	beq	doorbell_super_common
> +	beq	doorbell_super_common_msgclr
>  ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
>  	blr
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 06/14] powerpc/64s: interrupt replay balance the return branch predictor
  2017-06-11 23:58 ` [PATCH 06/14] powerpc/64s: interrupt replay balance the return branch predictor Nicholas Piggin
@ 2017-06-12 14:41   ` Gautham R Shenoy
  2017-06-13  9:51   ` Michael Ellerman
  1 sibling, 0 replies; 42+ messages in thread
From: Gautham R Shenoy @ 2017-06-12 14:41 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, Gautham R . Shenoy, Shreyas B . Prabhu

On Mon, Jun 12, 2017 at 09:58:27AM +1000, Nicholas Piggin wrote:
> The __replay_interrupt code is branched to with bl, but the caller is
> returned to directly with rfid from the interrupt.
> 
> Instead return to a return stub that returns to the caller with blr,
> which should do better with the return predictor.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/exceptions-64s.S | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index a04ee0d7f88e..d55201625ea3 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1586,7 +1586,7 @@ _GLOBAL(__replay_interrupt)
>  	 * we don't give a damn about, so we don't bother storing them.
>  	 */
>  	mfmsr	r12
> -	mflr	r11
> +	LOAD_REG_ADDR(r11, __replay_interrupt_return)
>  	mfcr	r9
>  	ori	r12,r12,MSR_EE
>  	cmpwi	r3,0x900
> @@ -1604,4 +1604,5 @@ FTR_SECTION_ELSE
>  	cmpwi	r3,0xa00
>  	beq	doorbell_super_common_msgclr
>  ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
> +__replay_interrupt_return:
>  	blr
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 01/14] powerpc/64s: idle move soft interrupt mask logic into C code
  2017-06-12  8:37   ` Gautham R Shenoy
@ 2017-06-12 14:46     ` Nicholas Piggin
  2017-06-13  4:28       ` Gautham R Shenoy
  0 siblings, 1 reply; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-12 14:46 UTC (permalink / raw)
  To: Gautham R Shenoy; +Cc: linuxppc-dev, Paul Mackerras

Hi Gautham,

Thanks for the reviews.

On Mon, 12 Jun 2017 14:07:27 +0530
Gautham R Shenoy <ego@linux.vnet.ibm.com> wrote:

> Hi Nick,
> 
> (Added Paul Mackerass to the Cc)
> On Mon, Jun 12, 2017 at 09:58:22AM +1000, Nicholas Piggin wrote:
> > This simplifies the asm and fixes irq-off tracing over sleep
> > instructions.
> > 
> > Also move powersave_nap check for POWER8 into C code, and move
> > PSSCR register value calculation for POWER9 into C.
> >  
> 
> Thanks for doing this. Only one minor comment.
> 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> > index 98a6d07ecb5c..35cf5bb7daed 100644
> > --- a/arch/powerpc/kernel/idle_book3s.S
> > +++ b/arch/powerpc/kernel/idle_book3s.S  
> 
> [..snip..]
> 
> > @@ -109,13 +109,9 @@ core_idle_lock_held:
> >  /*
> >   * Pass requested state in r3:
> >   *	r3 - PNV_THREAD_NAP/SLEEP/WINKLE in POWER8
> > - *	   - Requested STOP state in POWER9
> > + *	   - Requested PSSCR value in POWER9
> >   *
> > - * To check IRQ_HAPPENED in r4
> > - * 	0 - don't check
> > - * 	1 - check
> > - *
> > - * Address to 'rfid' to in r5
> > + * Address of idle handler to 'rfid' to in r4
> >   */
> >  pnv_powersave_common:
> >  	/* Use r3 to pass state nap/sleep/winkle */
> > @@ -131,30 +127,7 @@ pnv_powersave_common:
> >  	std	r0,_LINK(r1)
> >  	std	r0,_NIP(r1)
> > 
> > -	/* Hard disable interrupts */
> > -	mfmsr	r9
> > -	rldicl	r9,r9,48,1
> > -	rotldi	r9,r9,16
> > -	mtmsrd	r9,1			/* hard-disable interrupts */
> > -
> > -	/* Check if something happened while soft-disabled */
> > -	lbz	r0,PACAIRQHAPPENED(r13)
> > -	andi.	r0,r0,~PACA_IRQ_HARD_DIS@l
> > -	beq	1f
> > -	cmpwi	cr0,r4,0  
> 
> There were callers to power7_nap() in the dynamic-core-split/unsplit
> which wanted nap to be forced irrespective of whether an irq was
> pending or not. With this new patch, there won't be a way to enforce
> that on POWER8.

Actually there are two APIs to sleep now, the low level one which
does not check lazy irq or nap disable is power7_idle_insn(). The
one which sets up the lazy irq state is power7_idle_type(). The
dynamic core split calls the former, so AFAIKS it should be equivalent.

Not the best names perhaps. Open to better suggestions.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 03/14] powerpc/64s: idle provide a default idle for POWER9
  2017-06-12  8:53   ` Gautham R Shenoy
@ 2017-06-12 14:46     ` Nicholas Piggin
  0 siblings, 0 replies; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-12 14:46 UTC (permalink / raw)
  To: Gautham R Shenoy; +Cc: linuxppc-dev

On Mon, 12 Jun 2017 14:23:16 +0530
Gautham R Shenoy <ego@linux.vnet.ibm.com> wrote:

> Hi Nick,
> 
> On Mon, Jun 12, 2017 at 09:58:24AM +1000, Nicholas Piggin wrote:
> > Before the cpuidle driver is enabled, provide a default idle
> > function similarly to POWER7/8.
> > 
> > This should not have much effect, because the cpuidle driver
> > for powernv is mandatory, but if that changes we should have
> > a fallback.
> > 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >  arch/powerpc/platforms/powernv/idle.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> > index f188d84d9c59..e327e1585ddc 100644
> > --- a/arch/powerpc/platforms/powernv/idle.c
> > +++ b/arch/powerpc/platforms/powernv/idle.c
> > @@ -677,6 +677,8 @@ static int __init pnv_init_idle_states(void)
> > 
> >  	if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED)
> >  		ppc_md.power_save = power7_idle;
> > +	else if (supported_cpuidle_states & OPAL_PM_STOP_INST_FAST)
> > +		ppc_md.power_save = power9_idle;  
> 
> We are already initializing this in pnv_power9_idle_init() depending
> on whether the device tree has exposed at least one INST_FAST idle
> state. Else this should be NULL, because the firmware doesn't want us
> to use a platform idle state!

Ah I missed that, thanks. Will drop this one.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 04/14] powerpc/64s: idle process interrupts from system reset wakeup
  2017-06-12  9:41   ` Gautham R Shenoy
@ 2017-06-12 14:51     ` Nicholas Piggin
  0 siblings, 0 replies; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-12 14:51 UTC (permalink / raw)
  To: Gautham R Shenoy; +Cc: linuxppc-dev

On Mon, 12 Jun 2017 15:11:06 +0530
Gautham R Shenoy <ego@linux.vnet.ibm.com> wrote:

> Hi Nick,
> 
> On Mon, Jun 12, 2017 at 09:58:25AM +1000, Nicholas Piggin wrote:
> > When the CPU wakes from low power state, it begins at the system reset
> > interrupt with the exception that caused the wakeup encoded in SRR1.
> > 
> > Today, powernv idle wakeup ignores the wakeup reason (except a special
> > case for HMI), and the regular interrupt corresponding to the
> > exception will fire after the idle wakeup exits.
> > 
> > Change this to replay the interrupt from the idle wakeup before
> > interrupts are hard-enabled.
> > 
> > Test on POWER8 of context_switch selftests benchmark with polling idle
> > disabled (e.g., always nap, giving cross-CPU IPIs) gives the following
> > results:
> > 
> >                                 original         wakeup direct
> > Different threads, same core:   315k/s           264k/s
> > Different cores:                235k/s           242k/s
> > 
> > There is a slowdown for doorbell IPI (same core) case because system
> > reset wakeup does not clear the message and the doorbell interrupt
> > fires again needlessly.  
> 
> Should we clear the doorbell message if we are recording the fact that
> the Doorbell irq has happened in paca ?

As you saw, I clear it in the next patch.

It would be quite tidy to msgclear when moving srr1 wakeup into pending
mask. But on the other hand, I thought that deferring the clear as late
as possible and moving it closer to where IRQs will be hard enabled
again may help to avoid taking other IPIs.

> 
> > 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >  arch/powerpc/include/asm/hw_irq.h     |  2 ++
> >  arch/powerpc/kernel/irq.c             | 25 +++++++++++++++++++++++++
> >  arch/powerpc/platforms/powernv/idle.c | 10 ++++++++--
> >  3 files changed, 35 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> > index f06112cf8734..8366bdc69988 100644  
> 
> [..snip..]
> 
> > --- a/arch/powerpc/kernel/irq.c
> > +++ b/arch/powerpc/kernel/irq.c
> > @@ -348,6 +348,7 @@ bool prep_irq_for_idle(void)
> >  	return true;
> >  }
> > 
> > +#ifdef CONFIG_PPC_BOOK3S
> >  /*
> >   * This is for idle sequences that return with IRQs off, but the
> >   * idle state itself wakes on interrupt. Tell the irq tracer that
> > @@ -379,6 +380,30 @@ bool prep_irq_for_idle_irqsoff(void)
> >  }
> > 
> >  /*
> > + * Take the SRR1 wakeup reason, index into this table to find the
> > + * appropriate irq_happened bit.
> > + */
> > +static const u8 srr1_to_irq[0x10] = {
> > +	0, 0, 0,
> > +	PACA_IRQ_DBELL,
> > +	0,
> > +	PACA_IRQ_DBELL,
> > +	PACA_IRQ_DEC,
> > +	0,
> > +	PACA_IRQ_EE,
> > +	PACA_IRQ_EE,
> > +	PACA_IRQ_HMI,
> > +	0, 0, 0, 0, 0 };
> > +
> > +void irq_set_pending_from_srr1(unsigned long srr1)
> > +{
> > +	unsigned int idx = (srr1 >> 18) & SRR1_WAKEMASK_P8;  
> 
> Shouldn't this be
> 	unsigned int idx = (srr1 & SRR1_WAKEMASK_P8) >> 18;
> 	
> 	?

Yes, good catch. That will teach me not to verify after making a change.
Will fix.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 09/14] powerpc/64s: idle hmi wakeup is unlikely
  2017-06-11 23:58 ` [PATCH 09/14] powerpc/64s: idle hmi wakeup is unlikely Nicholas Piggin
@ 2017-06-12 15:03   ` Gautham R Shenoy
  0 siblings, 0 replies; 42+ messages in thread
From: Gautham R Shenoy @ 2017-06-12 15:03 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, Gautham R . Shenoy, Shreyas B . Prabhu

On Mon, Jun 12, 2017 at 09:58:30AM +1000, Nicholas Piggin wrote:
> In a busy system, idle wakeups can be expected from IPIs and device
> interrupts.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/idle_book3s.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
> index 6305d4d7a268..32b76fb28352 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -306,7 +306,7 @@ FTR_SECTION_ELSE_NESTED(66);						\
>  	rlwinm	r0,r12,45-31,0xe;  /* P7 wake reason field is 3 bits */	\
>  ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);		\
>  	cmpwi	r0,0xa;			/* Hypervisor maintenance ? */	\
> -	bne	20f;							\
> +	bne+	20f;							\
>  	/* Invoke opal call to handle hmi */				\
>  	ld	r2,PACATOC(r13);					\
>  	ld	r1,PACAR1(r13);						\
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 10/14] powerpc/64s: cpuidle set polling before enabling irqs
  2017-06-11 23:58 ` [PATCH 10/14] powerpc/64s: cpuidle set polling before enabling irqs Nicholas Piggin
@ 2017-06-12 15:10   ` Gautham R Shenoy
  2017-06-12 15:20     ` Nicholas Piggin
  0 siblings, 1 reply; 42+ messages in thread
From: Gautham R Shenoy @ 2017-06-12 15:10 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, Gautham R . Shenoy, Shreyas B . Prabhu

On Mon, Jun 12, 2017 at 09:58:31AM +1000, Nicholas Piggin wrote:
> local_irq_enable can cause interrupts to be taken which could
> take significant amount of processing time. The idle process
> should set its polling flag before this, so another process that
> wakes it during this time will not have to send an IPI.
> 
> Expand the TIF_POLLING_NRFLAG coverage to as large as possible.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Looks good. Were you able to see this make a difference in any of the
tests ?

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

> ---
>  drivers/cpuidle/cpuidle-powernv.c | 4 +++-
>  drivers/cpuidle/cpuidle-pseries.c | 3 ++-
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> index 79152676f62b..50b3c2e0306f 100644
> --- a/drivers/cpuidle/cpuidle-powernv.c
> +++ b/drivers/cpuidle/cpuidle-powernv.c
> @@ -51,9 +51,10 @@ static int snooze_loop(struct cpuidle_device *dev,
>  {
>  	u64 snooze_exit_time;
> 
> -	local_irq_enable();
>  	set_thread_flag(TIF_POLLING_NRFLAG);
> 
> +	local_irq_enable();
> +
>  	snooze_exit_time = get_tb() + snooze_timeout;
>  	ppc64_runlatch_off();
>  	HMT_very_low();
> @@ -66,6 +67,7 @@ static int snooze_loop(struct cpuidle_device *dev,
>  	ppc64_runlatch_on();
>  	clear_thread_flag(TIF_POLLING_NRFLAG);
>  	smp_mb();
> +
>  	return index;
>  }
> 
> diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
> index 166ccd711ec9..7b12bb2ea70f 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -62,9 +62,10 @@ static int snooze_loop(struct cpuidle_device *dev,
>  	unsigned long in_purr;
>  	u64 snooze_exit_time;
> 
> +	set_thread_flag(TIF_POLLING_NRFLAG);
> +
>  	idle_loop_prolog(&in_purr);
>  	local_irq_enable();
> -	set_thread_flag(TIF_POLLING_NRFLAG);
>  	snooze_exit_time = get_tb() + snooze_timeout;
> 
>  	while (!need_resched()) {
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 10/14] powerpc/64s: cpuidle set polling before enabling irqs
  2017-06-12 15:10   ` Gautham R Shenoy
@ 2017-06-12 15:20     ` Nicholas Piggin
  0 siblings, 0 replies; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-12 15:20 UTC (permalink / raw)
  To: Gautham R Shenoy; +Cc: linuxppc-dev

On Mon, 12 Jun 2017 20:40:25 +0530
Gautham R Shenoy <ego@linux.vnet.ibm.com> wrote:

> On Mon, Jun 12, 2017 at 09:58:31AM +1000, Nicholas Piggin wrote:
> > local_irq_enable can cause interrupts to be taken which could
> > take significant amount of processing time. The idle process
> > should set its polling flag before this, so another process that
> > wakes it during this time will not have to send an IPI.
> > 
> > Expand the TIF_POLLING_NRFLAG coverage to as large as possible.
> > 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>  
> 
> Looks good. Were you able to see this make a difference in any of the
> tests ?

No I didn't measure a difference or have a test case where this was
noticable. I think on a workload with some IO interrupts as well as
cross-CPU wakeups, then statistically we should see some improvement
in IPI rates with this patch.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 11/14] powerpc/64s: cpuidle read mostly for common globals
  2017-06-11 23:58 ` [PATCH 11/14] powerpc/64s: cpuidle read mostly for common globals Nicholas Piggin
@ 2017-06-12 15:30   ` Gautham R Shenoy
  2017-06-12 17:50     ` Vaidyanathan Srinivasan
  0 siblings, 1 reply; 42+ messages in thread
From: Gautham R Shenoy @ 2017-06-12 15:30 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, Gautham R . Shenoy, Shreyas B . Prabhu

On Mon, Jun 12, 2017 at 09:58:32AM +1000, Nicholas Piggin wrote:
> Ensure these don't get put into bouncing cachelines.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

> ---
>  drivers/cpuidle/cpuidle-powernv.c | 10 +++++-----
>  drivers/cpuidle/cpuidle-pseries.c |  8 ++++----
>  2 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> index 50b3c2e0306f..9d03326ac05e 100644
> --- a/drivers/cpuidle/cpuidle-powernv.c
> +++ b/drivers/cpuidle/cpuidle-powernv.c
> @@ -32,18 +32,18 @@ static struct cpuidle_driver powernv_idle_driver = {
>  	.owner            = THIS_MODULE,
>  };
> 
> -static int max_idle_state;
> -static struct cpuidle_state *cpuidle_state_table;
> +static int max_idle_state __read_mostly;
> +static struct cpuidle_state *cpuidle_state_table __read_mostly;
> 
>  struct stop_psscr_table {
>  	u64 val;
>  	u64 mask;
>  };
> 
> -static struct stop_psscr_table stop_psscr_table[CPUIDLE_STATE_MAX];
> +static struct stop_psscr_table stop_psscr_table[CPUIDLE_STATE_MAX] __read_mostly;
> 
> -static u64 snooze_timeout;
> -static bool snooze_timeout_en;
> +static u64 snooze_timeout __read_mostly;
> +static bool snooze_timeout_en __read_mostly;
> 
>  static int snooze_loop(struct cpuidle_device *dev,
>  			struct cpuidle_driver *drv,
> diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
> index 7b12bb2ea70f..a404f352d284 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -25,10 +25,10 @@ struct cpuidle_driver pseries_idle_driver = {
>  	.owner            = THIS_MODULE,
>  };
> 
> -static int max_idle_state;
> -static struct cpuidle_state *cpuidle_state_table;
> -static u64 snooze_timeout;
> -static bool snooze_timeout_en;
> +static int max_idle_state __read_mostly;
> +static struct cpuidle_state *cpuidle_state_table __read_mostly;
> +static u64 snooze_timeout __read_mostly;
> +static bool snooze_timeout_en __read_mostly;
> 
>  static inline void idle_loop_prolog(unsigned long *in_purr)
>  {
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 14/14] powerpc/64s: idle runlatch switch is done with MSR[EE]=0
  2017-06-11 23:58 ` [PATCH 14/14] powerpc/64s: idle runlatch switch is done with MSR[EE]=0 Nicholas Piggin
@ 2017-06-12 17:00   ` Vaidyanathan Srinivasan
  0 siblings, 0 replies; 42+ messages in thread
From: Vaidyanathan Srinivasan @ 2017-06-12 17:00 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, Gautham R . Shenoy, Shreyas B . Prabhu

* Nicholas Piggin <npiggin@gmail.com> [2017-06-12 09:58:35]:

> 2*mfmsr and 2*mtmsr can be avoided in the idle sleep/wake code
> because we know the MSR[EE] is clear.

Good optimization for powernv.
 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

> ---
>  arch/powerpc/platforms/powernv/idle.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> index ee416e016387..84c55a5ef7ea 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -291,9 +291,9 @@ static unsigned long __power7_idle_type(unsigned long type)
>  	if (!prep_irq_for_idle_irqsoff())
>  		return 0;
> 
> -	ppc64_runlatch_off();
> +	__ppc64_runlatch_off();
>  	srr1 = power7_idle_insn(type);
> -	ppc64_runlatch_on();
> +	__ppc64_runlatch_on();
> 
>  	fini_irq_for_idle_irqsoff();
> 
> @@ -328,9 +328,9 @@ static unsigned long __power9_idle_type(unsigned long stop_psscr_val,
>  	psscr = mfspr(SPRN_PSSCR);
>  	psscr = (psscr & ~stop_psscr_mask) | stop_psscr_val;
> 
> -	ppc64_runlatch_off();
> +	__ppc64_runlatch_off();
>  	srr1 = power9_idle_stop(psscr);
> -	ppc64_runlatch_on();
> +	__ppc64_runlatch_on();
> 
>  	fini_irq_for_idle_irqsoff();
> 
> @@ -365,7 +365,7 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
>  	unsigned long srr1;
>  	u32 idle_states = pnv_get_supported_cpuidle_states();
> 
> -	ppc64_runlatch_off();
> +	__ppc64_runlatch_off();
> 
>  	if (cpu_has_feature(CPU_FTR_ARCH_300) && deepest_stop_found) {
>  		unsigned long psscr;
> @@ -392,7 +392,7 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
>  		HMT_medium();
>  	}
> 
> -	ppc64_runlatch_on();
> +	__ppc64_runlatch_on();
> 
>  	return srr1;
>  }
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 13/14] powerpc/64: runlatch CTRL[RUN] set optimisation
  2017-06-11 23:58 ` [PATCH 13/14] powerpc/64: runlatch CTRL[RUN] set optimisation Nicholas Piggin
@ 2017-06-12 17:11   ` Vaidyanathan Srinivasan
  2017-06-13 10:04     ` Michael Ellerman
  0 siblings, 1 reply; 42+ messages in thread
From: Vaidyanathan Srinivasan @ 2017-06-12 17:11 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, Gautham R . Shenoy, Shreyas B . Prabhu

* Nicholas Piggin <npiggin@gmail.com> [2017-06-12 09:58:34]:

> The CTRL register is read-only except bit 63 which is the run latch
> control. This means it can be updated with a mtspr rather than
> mfspr/mtspr.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

> ---
>  arch/powerpc/kernel/process.c | 12 ++----------
>  1 file changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index baae104b16c7..a44ea034c226 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1960,12 +1960,8 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
>  void notrace __ppc64_runlatch_on(void)
>  {
>  	struct thread_info *ti = current_thread_info();
> -	unsigned long ctrl;
> -
> -	ctrl = mfspr(SPRN_CTRLF);
> -	ctrl |= CTRL_RUNLATCH;
> -	mtspr(SPRN_CTRLT, ctrl);
> 
> +	mtspr(SPRN_CTRLT, CTRL_RUNLATCH);
>  	ti->local_flags |= _TLF_RUNLATCH;
>  }
> 
> @@ -1973,13 +1969,9 @@ void notrace __ppc64_runlatch_on(void)
>  void notrace __ppc64_runlatch_off(void)
>  {
>  	struct thread_info *ti = current_thread_info();
> -	unsigned long ctrl;
> 
>  	ti->local_flags &= ~_TLF_RUNLATCH;
> -
> -	ctrl = mfspr(SPRN_CTRLF);
> -	ctrl &= ~CTRL_RUNLATCH;
> -	mtspr(SPRN_CTRLT, ctrl);
> +	mtspr(SPRN_CTRLT, 0);

Good idea.  Writing to CTRL register can change only the RUN field.
Was this any different in older generations?  Anton and Ben kept the
mfspr/mtspr part in earlier updates to this routine.

--Vaidy

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 12/14] powerpc/64s: cpuidle no memory barrier after break from idle
  2017-06-11 23:58 ` [PATCH 12/14] powerpc/64s: cpuidle no memory barrier after break from idle Nicholas Piggin
@ 2017-06-12 17:48   ` Vaidyanathan Srinivasan
  2017-06-13 12:47     ` Nicholas Piggin
  0 siblings, 1 reply; 42+ messages in thread
From: Vaidyanathan Srinivasan @ 2017-06-12 17:48 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, Gautham R . Shenoy, Shreyas B . Prabhu

* Nicholas Piggin <npiggin@gmail.com> [2017-06-12 09:58:33]:

> A memory barrier is not required after the task wakes up,
> only if we clear the polling flag before waking. The case
> where we have work to do is the important one, so optimise
> for it.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>


> ---
>  drivers/cpuidle/cpuidle-powernv.c | 11 +++++++++--
>  drivers/cpuidle/cpuidle-pseries.c | 11 +++++++++--
>  2 files changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> index 9d03326ac05e..37b0698b7193 100644
> --- a/drivers/cpuidle/cpuidle-powernv.c
> +++ b/drivers/cpuidle/cpuidle-powernv.c
> @@ -59,14 +59,21 @@ static int snooze_loop(struct cpuidle_device *dev,
>  	ppc64_runlatch_off();
>  	HMT_very_low();
>  	while (!need_resched()) {
> -		if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time)
> +		if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
> +			/*
> +			 * Task has not woken up but we are exiting the polling
> +			 * loop anyway. Require a barrier after polling is
> +			 * cleared to order subsequent test of need_resched().
> +			 */
> +			clear_thread_flag(TIF_POLLING_NRFLAG);
> +			smp_mb();
>  			break;
> +		}
>  	}
> 
>  	HMT_medium();
>  	ppc64_runlatch_on();
>  	clear_thread_flag(TIF_POLLING_NRFLAG);
> -	smp_mb();

If we reach here without executing if(snooze_timeout) with the
barrier+break, that means we have seen the need_resched flag and hence
we can avoid the barrier.  Clearing of the polling flag can be seen
(take affect) later as we will exit the idle loop and re-enter anyway
at this point.  The caller also will check for need_resched and exit
the idle loop.

Actually do_idle() has a __current_set_polling() and
__current_clr_polling() in the default idle loop.  Do we really need
to set/clear the TIF_POLLING_NRFLAG again in cpuidle driver?

--Vaidy

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 11/14] powerpc/64s: cpuidle read mostly for common globals
  2017-06-12 15:30   ` Gautham R Shenoy
@ 2017-06-12 17:50     ` Vaidyanathan Srinivasan
  0 siblings, 0 replies; 42+ messages in thread
From: Vaidyanathan Srinivasan @ 2017-06-12 17:50 UTC (permalink / raw)
  To: Gautham R Shenoy; +Cc: Nicholas Piggin, linuxppc-dev, Shreyas B . Prabhu

* Gautham R Shenoy <ego@linux.vnet.ibm.com> [2017-06-12 21:00:27]:

> On Mon, Jun 12, 2017 at 09:58:32AM +1000, Nicholas Piggin wrote:
> > Ensure these don't get put into bouncing cachelines.
> > 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> 
> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
 
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
 
> > ---
> >  drivers/cpuidle/cpuidle-powernv.c | 10 +++++-----
> >  drivers/cpuidle/cpuidle-pseries.c |  8 ++++----
> >  2 files changed, 9 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> > index 50b3c2e0306f..9d03326ac05e 100644
> > --- a/drivers/cpuidle/cpuidle-powernv.c
> > +++ b/drivers/cpuidle/cpuidle-powernv.c
> > @@ -32,18 +32,18 @@ static struct cpuidle_driver powernv_idle_driver = {
> >  	.owner            = THIS_MODULE,
> >  };
> > 
> > -static int max_idle_state;
> > -static struct cpuidle_state *cpuidle_state_table;
> > +static int max_idle_state __read_mostly;
> > +static struct cpuidle_state *cpuidle_state_table __read_mostly;
> > 
> >  struct stop_psscr_table {
> >  	u64 val;
> >  	u64 mask;
> >  };
> > 
> > -static struct stop_psscr_table stop_psscr_table[CPUIDLE_STATE_MAX];
> > +static struct stop_psscr_table stop_psscr_table[CPUIDLE_STATE_MAX] __read_mostly;
> > 
> > -static u64 snooze_timeout;
> > -static bool snooze_timeout_en;
> > +static u64 snooze_timeout __read_mostly;
> > +static bool snooze_timeout_en __read_mostly;
> > 
> >  static int snooze_loop(struct cpuidle_device *dev,
> >  			struct cpuidle_driver *drv,
> > diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
> > index 7b12bb2ea70f..a404f352d284 100644
> > --- a/drivers/cpuidle/cpuidle-pseries.c
> > +++ b/drivers/cpuidle/cpuidle-pseries.c
> > @@ -25,10 +25,10 @@ struct cpuidle_driver pseries_idle_driver = {
> >  	.owner            = THIS_MODULE,
> >  };
> > 
> > -static int max_idle_state;
> > -static struct cpuidle_state *cpuidle_state_table;
> > -static u64 snooze_timeout;
> > -static bool snooze_timeout_en;
> > +static int max_idle_state __read_mostly;
> > +static struct cpuidle_state *cpuidle_state_table __read_mostly;
> > +static u64 snooze_timeout __read_mostly;
> > +static bool snooze_timeout_en __read_mostly;
> > 

Simple annotation of  __read_mostly could save us a few cache line
bounces.  Good idea.

--Vaidy

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 01/14] powerpc/64s: idle move soft interrupt mask logic into C code
  2017-06-12 14:46     ` Nicholas Piggin
@ 2017-06-13  4:28       ` Gautham R Shenoy
  0 siblings, 0 replies; 42+ messages in thread
From: Gautham R Shenoy @ 2017-06-13  4:28 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Gautham R Shenoy, linuxppc-dev, Paul Mackerras

On Tue, Jun 13, 2017 at 12:46:02AM +1000, Nicholas Piggin wrote:
> Hi Gautham,
> 
> Thanks for the reviews.
> 
> On Mon, 12 Jun 2017 14:07:27 +0530
> Gautham R Shenoy <ego@linux.vnet.ibm.com> wrote:
> 
> > Hi Nick,
> > 
> > (Added Paul Mackerass to the Cc)
> > On Mon, Jun 12, 2017 at 09:58:22AM +1000, Nicholas Piggin wrote:
> > > This simplifies the asm and fixes irq-off tracing over sleep
> > > instructions.
> > > 
> > > Also move powersave_nap check for POWER8 into C code, and move
> > > PSSCR register value calculation for POWER9 into C.
> > >  
> > 
> > Thanks for doing this. Only one minor comment.
> > 
> > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > > ---
> > > index 98a6d07ecb5c..35cf5bb7daed 100644
> > > --- a/arch/powerpc/kernel/idle_book3s.S
> > > +++ b/arch/powerpc/kernel/idle_book3s.S  
> > 
> > [..snip..]
> > 
> > > @@ -109,13 +109,9 @@ core_idle_lock_held:
> > >  /*
> > >   * Pass requested state in r3:
> > >   *	r3 - PNV_THREAD_NAP/SLEEP/WINKLE in POWER8
> > > - *	   - Requested STOP state in POWER9
> > > + *	   - Requested PSSCR value in POWER9
> > >   *
> > > - * To check IRQ_HAPPENED in r4
> > > - * 	0 - don't check
> > > - * 	1 - check
> > > - *
> > > - * Address to 'rfid' to in r5
> > > + * Address of idle handler to 'rfid' to in r4
> > >   */
> > >  pnv_powersave_common:
> > >  	/* Use r3 to pass state nap/sleep/winkle */
> > > @@ -131,30 +127,7 @@ pnv_powersave_common:
> > >  	std	r0,_LINK(r1)
> > >  	std	r0,_NIP(r1)
> > > 
> > > -	/* Hard disable interrupts */
> > > -	mfmsr	r9
> > > -	rldicl	r9,r9,48,1
> > > -	rotldi	r9,r9,16
> > > -	mtmsrd	r9,1			/* hard-disable interrupts */
> > > -
> > > -	/* Check if something happened while soft-disabled */
> > > -	lbz	r0,PACAIRQHAPPENED(r13)
> > > -	andi.	r0,r0,~PACA_IRQ_HARD_DIS@l
> > > -	beq	1f
> > > -	cmpwi	cr0,r4,0  
> > 
> > There were callers to power7_nap() in the dynamic-core-split/unsplit
> > which wanted nap to be forced irrespective of whether an irq was
> > pending or not. With this new patch, there won't be a way to enforce
> > that on POWER8.
> 
> Actually there are two APIs to sleep now, the low level one which
> does not check lazy irq or nap disable is power7_idle_insn(). The
> one which sets up the lazy irq state is power7_idle_type(). The
> dynamic core split calls the former, so AFAIKS it should be
> equivalent.

Ah, I see. Yes, my bad. I didn't check which one we are calling in the
dynamic split case.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

> 
> Not the best names perhaps. Open to better suggestions.
> 
> Thanks,
> Nick
> 

--
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 06/14] powerpc/64s: interrupt replay balance the return branch predictor
  2017-06-11 23:58 ` [PATCH 06/14] powerpc/64s: interrupt replay balance the return branch predictor Nicholas Piggin
  2017-06-12 14:41   ` Gautham R Shenoy
@ 2017-06-13  9:51   ` Michael Ellerman
  2017-06-13 11:09     ` Nicholas Piggin
  1 sibling, 1 reply; 42+ messages in thread
From: Michael Ellerman @ 2017-06-13  9:51 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
  Cc: Gautham R . Shenoy, Nicholas Piggin, Shreyas B . Prabhu

Nicholas Piggin <npiggin@gmail.com> writes:

> The __replay_interrupt code is branched to with bl, but the caller is
> returned to directly with rfid from the interrupt.
>
> Instead return to a return stub that returns to the caller with blr,
> which should do better with the return predictor.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/kernel/exceptions-64s.S | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index a04ee0d7f88e..d55201625ea3 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1586,7 +1586,7 @@ _GLOBAL(__replay_interrupt)
>  	 * we don't give a damn about, so we don't bother storing them.
>  	 */
>  	mfmsr	r12
> -	mflr	r11
> +	LOAD_REG_ADDR(r11, __replay_interrupt_return)

Can you make it a local label, to make it clear nothing outside the file
returns to there, and to not clutter the symbol map?

cheers

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 13/14] powerpc/64: runlatch CTRL[RUN] set optimisation
  2017-06-12 17:11   ` Vaidyanathan Srinivasan
@ 2017-06-13 10:04     ` Michael Ellerman
  2017-06-13 11:56       ` Nicholas Piggin
  2017-06-13 13:45       ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 42+ messages in thread
From: Michael Ellerman @ 2017-06-13 10:04 UTC (permalink / raw)
  To: svaidy, Nicholas Piggin
  Cc: Gautham R . Shenoy, linuxppc-dev, Shreyas B . Prabhu

Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> writes:
> * Nicholas Piggin <npiggin@gmail.com> [2017-06-12 09:58:34]:
>
>> The CTRL register is read-only except bit 63 which is the run latch
>> control. This means it can be updated with a mtspr rather than
>> mfspr/mtspr.
>> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>
>> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
>> index baae104b16c7..a44ea034c226 100644
>> --- a/arch/powerpc/kernel/process.c
>> +++ b/arch/powerpc/kernel/process.c
>> @@ -1973,13 +1969,9 @@ void notrace __ppc64_runlatch_on(void)
>>  void notrace __ppc64_runlatch_off(void)
>>  {
>>  	struct thread_info *ti = current_thread_info();
>> -	unsigned long ctrl;
>> 
>>  	ti->local_flags &= ~_TLF_RUNLATCH;
>> -
>> -	ctrl = mfspr(SPRN_CTRLF);
>> -	ctrl &= ~CTRL_RUNLATCH;
>> -	mtspr(SPRN_CTRLT, ctrl);
>> +	mtspr(SPRN_CTRLT, 0);
>
> Good idea.  Writing to CTRL register can change only the RUN field.
> Was this any different in older generations?

No AFAICS back to 2.02.

> Anton and Ben kept the mfspr/mtspr part in earlier updates to this
> routine.

Doing the read/modify write is forward compatible vs a new writable
field, whereas writing the whole register with a known value is not.

cheers

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 08/14] powerpc/64s: idle avoid SRR usage in idle sleep/wake paths
  2017-06-11 23:58 ` [PATCH 08/14] powerpc/64s: idle avoid SRR usage in idle sleep/wake paths Nicholas Piggin
@ 2017-06-13 10:25   ` Gautham R Shenoy
  2017-06-13 10:45     ` Nicholas Piggin
  0 siblings, 1 reply; 42+ messages in thread
From: Gautham R Shenoy @ 2017-06-13 10:25 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linuxppc-dev, Gautham R . Shenoy, Shreyas B . Prabhu,
	Vaidyanathan Srinivasan

Hi Nick,

On Mon, Jun 12, 2017 at 09:58:29AM +1000, Nicholas Piggin wrote:
> Idle code now always runs at the 0xc... effective address whether
> in real or virtual mode. This means rfid can be ditched, along
> with a lot of SRR manipulations.
> 
> In the wakeup path, carry SRR1 around in r12. Use mtmsrd to change
> MSR states as required.
> 
> This also balances the return prediction for the idle call, by
> doing blr rather than rfid to return to the idle caller.
> 
> On POWER9, 2-process context switch on different cores, with snooze
> disabled, increases performance by 2%.
> ---
>  arch/powerpc/kernel/exceptions-64s.S    |  1 +
>  arch/powerpc/kernel/idle_book3s.S       | 57 +++++++++++++++------------------
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |  8 ++++-
>  3 files changed, 33 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index fec7c933d095..c3d0aef089a7 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -148,12 +147,8 @@ pnv_powersave_common:
>  	 * the MMU context to the guest.
>  	 */
>  	LOAD_REG_IMMEDIATE(r7, MSR_IDLE)
> -	li	r6, MSR_RI
> -	andc	r6, r9, r6
> -	mtmsrd	r6, 1		/* clear RI before setting SRR0/1 */
> -	mtspr	SPRN_SRR0, r4
> -	mtspr	SPRN_SRR1, r7
> -	rfid
> +	mtmsrd	r7,0
> +	bctr

So at this point we need to transition from virtual to real mode as
the comment in pnv_enter_arch207_idle_mode expects us to. Which is
being performed by mtmsrd here. Then we jump to the function via
bctr. So, in this patch we are using two instructions to modify the
MSR and the PC, while earlier the rfid would update these atomically.

Does forgoing atomicity have any risk? I am asking this because
historically we have modified IR/DR bits in the MSR via rfid
mechanism.
--
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 08/14] powerpc/64s: idle avoid SRR usage in idle sleep/wake paths
  2017-06-13 10:25   ` Gautham R Shenoy
@ 2017-06-13 10:45     ` Nicholas Piggin
  0 siblings, 0 replies; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-13 10:45 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: linuxppc-dev, Shreyas B . Prabhu, Vaidyanathan Srinivasan

On Tue, 13 Jun 2017 15:55:53 +0530
Gautham R Shenoy <ego@linux.vnet.ibm.com> wrote:

> Hi Nick,
> 
> On Mon, Jun 12, 2017 at 09:58:29AM +1000, Nicholas Piggin wrote:
> > Idle code now always runs at the 0xc... effective address whether
> > in real or virtual mode. This means rfid can be ditched, along
> > with a lot of SRR manipulations.
> > 
> > In the wakeup path, carry SRR1 around in r12. Use mtmsrd to change
> > MSR states as required.
> > 
> > This also balances the return prediction for the idle call, by
> > doing blr rather than rfid to return to the idle caller.
> > 
> > On POWER9, 2-process context switch on different cores, with snooze
> > disabled, increases performance by 2%.
> > ---
> >  arch/powerpc/kernel/exceptions-64s.S    |  1 +
> >  arch/powerpc/kernel/idle_book3s.S       | 57 +++++++++++++++------------------
> >  arch/powerpc/kvm/book3s_hv_rmhandlers.S |  8 ++++-
> >  3 files changed, 33 insertions(+), 33 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> > index fec7c933d095..c3d0aef089a7 100644
> > --- a/arch/powerpc/kernel/idle_book3s.S
> > +++ b/arch/powerpc/kernel/idle_book3s.S
> > @@ -148,12 +147,8 @@ pnv_powersave_common:
> >  	 * the MMU context to the guest.
> >  	 */
> >  	LOAD_REG_IMMEDIATE(r7, MSR_IDLE)
> > -	li	r6, MSR_RI
> > -	andc	r6, r9, r6
> > -	mtmsrd	r6, 1		/* clear RI before setting SRR0/1 */
> > -	mtspr	SPRN_SRR0, r4
> > -	mtspr	SPRN_SRR1, r7
> > -	rfid
> > +	mtmsrd	r7,0
> > +	bctr  
> 
> So at this point we need to transition from virtual to real mode as
> the comment in pnv_enter_arch207_idle_mode expects us to. Which is
> being performed by mtmsrd here. Then we jump to the function via
> bctr. So, in this patch we are using two instructions to modify the
> MSR and the PC, while earlier the rfid would update these atomically.
> 
> Does forgoing atomicity have any risk? I am asking this because
> historically we have modified IR/DR bits in the MSR via rfid
> mechanism.

I believe it's not a problem. Actually the ISA has a note about using
it in this way, p.1181 of ISA v3.0B, the programming note suggests
using mtmsrd rather than rfid to enable IR.

I've tested on POWER8 and 9 and not had any problems with it. Interesting
question though.

One thing I wonder about is that ERAT installs real mode entries, so
using mtmsrd to transition from real to virtual mode will require 2
I-ERAT entries for these nearby instruction addresses. Then if you did
a branch to a distant address, that would require another I-ERAT. On
the other hand if you do an rfid to switch MSR and branch to distant
address at the same time, it should only require 2 I-ERAT entries. So
you may see better microbenchmark performance of the first case, but
the latter may end up being slower on real work.

I've decided that probably the kernel is compact enough that we aren't
likely to cause more ERAT footprint by doing this. It would be
interesting to do a proper analysis of this, but I haven't got around
to it yet.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 06/14] powerpc/64s: interrupt replay balance the return branch predictor
  2017-06-13  9:51   ` Michael Ellerman
@ 2017-06-13 11:09     ` Nicholas Piggin
  0 siblings, 0 replies; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-13 11:09 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, Gautham R . Shenoy, Shreyas B . Prabhu

On Tue, 13 Jun 2017 19:51:19 +1000
Michael Ellerman <mpe@ellerman.id.au> wrote:

> Nicholas Piggin <npiggin@gmail.com> writes:
> 
> > The __replay_interrupt code is branched to with bl, but the caller is
> > returned to directly with rfid from the interrupt.
> >
> > Instead return to a return stub that returns to the caller with blr,
> > which should do better with the return predictor.
> >
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >  arch/powerpc/kernel/exceptions-64s.S | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> > index a04ee0d7f88e..d55201625ea3 100644
> > --- a/arch/powerpc/kernel/exceptions-64s.S
> > +++ b/arch/powerpc/kernel/exceptions-64s.S
> > @@ -1586,7 +1586,7 @@ _GLOBAL(__replay_interrupt)
> >  	 * we don't give a damn about, so we don't bother storing them.
> >  	 */
> >  	mfmsr	r12
> > -	mflr	r11
> > +	LOAD_REG_ADDR(r11, __replay_interrupt_return)  
> 
> Can you make it a local label, to make it clear nothing outside the file
> returns to there, and to not clutter the symbol map?

I can do that. Interrupt returns will now get significantly
attributed to this guy in profiles (and you can see it on some
sleep/wake workloads). __replay_interrupt is probably better
than arch_local_irq_restore, I guess.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 13/14] powerpc/64: runlatch CTRL[RUN] set optimisation
  2017-06-13 10:04     ` Michael Ellerman
@ 2017-06-13 11:56       ` Nicholas Piggin
  2017-06-13 13:45       ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-13 11:56 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: svaidy, Gautham R . Shenoy, linuxppc-dev, Shreyas B . Prabhu

On Tue, 13 Jun 2017 20:04:27 +1000
Michael Ellerman <mpe@ellerman.id.au> wrote:

> Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> writes:
> > * Nicholas Piggin <npiggin@gmail.com> [2017-06-12 09:58:34]:
> >  
> >> The CTRL register is read-only except bit 63 which is the run latch
> >> control. This means it can be updated with a mtspr rather than
> >> mfspr/mtspr.
> >> 
> >> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>  
> >  
> >> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> >> index baae104b16c7..a44ea034c226 100644
> >> --- a/arch/powerpc/kernel/process.c
> >> +++ b/arch/powerpc/kernel/process.c
> >> @@ -1973,13 +1969,9 @@ void notrace __ppc64_runlatch_on(void)
> >>  void notrace __ppc64_runlatch_off(void)
> >>  {
> >>  	struct thread_info *ti = current_thread_info();
> >> -	unsigned long ctrl;
> >> 
> >>  	ti->local_flags &= ~_TLF_RUNLATCH;
> >> -
> >> -	ctrl = mfspr(SPRN_CTRLF);
> >> -	ctrl &= ~CTRL_RUNLATCH;
> >> -	mtspr(SPRN_CTRLT, ctrl);
> >> +	mtspr(SPRN_CTRLT, 0);  
> >
> > Good idea.  Writing to CTRL register can change only the RUN field.
> > Was this any different in older generations?  
> 
> No AFAICS back to 2.02.
> 
> > Anton and Ben kept the mfspr/mtspr part in earlier updates to this
> > routine.  
> 
> Doing the read/modify write is forward compatible vs a new writable
> field, whereas writing the whole register with a known value is not.

Can we call that an incompatible arch change and not worry about
it? ISA says we may expect TS (read-only) field to expand, but I
guess they could shoehorn something else in there.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 12/14] powerpc/64s: cpuidle no memory barrier after break from idle
  2017-06-12 17:48   ` Vaidyanathan Srinivasan
@ 2017-06-13 12:47     ` Nicholas Piggin
  0 siblings, 0 replies; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-13 12:47 UTC (permalink / raw)
  To: Vaidyanathan Srinivasan
  Cc: linuxppc-dev, Gautham R . Shenoy, Shreyas B . Prabhu

On Mon, 12 Jun 2017 23:18:44 +0530
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> wrote:

> * Nicholas Piggin <npiggin@gmail.com> [2017-06-12 09:58:33]:
> 
> > A memory barrier is not required after the task wakes up,
> > only if we clear the polling flag before waking. The case
> > where we have work to do is the important one, so optimise
> > for it.
> > 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>  
> 
> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
> 
> 
> > ---
> >  drivers/cpuidle/cpuidle-powernv.c | 11 +++++++++--
> >  drivers/cpuidle/cpuidle-pseries.c | 11 +++++++++--
> >  2 files changed, 18 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> > index 9d03326ac05e..37b0698b7193 100644
> > --- a/drivers/cpuidle/cpuidle-powernv.c
> > +++ b/drivers/cpuidle/cpuidle-powernv.c
> > @@ -59,14 +59,21 @@ static int snooze_loop(struct cpuidle_device *dev,
> >  	ppc64_runlatch_off();
> >  	HMT_very_low();
> >  	while (!need_resched()) {
> > -		if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time)
> > +		if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
> > +			/*
> > +			 * Task has not woken up but we are exiting the polling
> > +			 * loop anyway. Require a barrier after polling is
> > +			 * cleared to order subsequent test of need_resched().
> > +			 */
> > +			clear_thread_flag(TIF_POLLING_NRFLAG);
> > +			smp_mb();
> >  			break;
> > +		}
> >  	}
> > 
> >  	HMT_medium();
> >  	ppc64_runlatch_on();
> >  	clear_thread_flag(TIF_POLLING_NRFLAG);
> > -	smp_mb();  
> 
> If we reach here without executing if(snooze_timeout) with the
> barrier+break, that means we have seen the need_resched flag and hence
> we can avoid the barrier.  Clearing of the polling flag can be seen
> (take affect) later as we will exit the idle loop and re-enter anyway
> at this point.  The caller also will check for need_resched and exit
> the idle loop.
> 
> Actually do_idle() has a __current_set_polling() and
> __current_clr_polling() in the default idle loop.  Do we really need
> to set/clear the TIF_POLLING_NRFLAG again in cpuidle driver?

We do because cpuidle drivers are entered with POLLING clear, don't
they?

It would be nice to have the entire scheduler idle including cpuidle
entry run with POLLING set. It would then be up to interrupt-based
cpuidle drivers to clear POLLING before sleeping. I think that should
remove... maybe 4 atomic ops from the most performance-critical idle
states (the polling ones).

That's a bigger change though and will touch many archs and core code.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 13/14] powerpc/64: runlatch CTRL[RUN] set optimisation
  2017-06-13 10:04     ` Michael Ellerman
  2017-06-13 11:56       ` Nicholas Piggin
@ 2017-06-13 13:45       ` Benjamin Herrenschmidt
  2017-06-14  3:34         ` Michael Ellerman
  1 sibling, 1 reply; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2017-06-13 13:45 UTC (permalink / raw)
  To: Michael Ellerman, svaidy, Nicholas Piggin
  Cc: Gautham R . Shenoy, linuxppc-dev, Shreyas B . Prabhu

On Tue, 2017-06-13 at 20:04 +1000, Michael Ellerman wrote:
> > Good idea.  Writing to CTRL register can change only the RUN field.
> > Was this any different in older generations?
> 
> No AFAICS back to 2.02.
> 
> > Anton and Ben kept the mfspr/mtspr part in earlier updates to this
> > routine.
> 
> Doing the read/modify write is forward compatible vs a new writable
> field, whereas writing the whole register with a known value is not.

At this stage I wouldn't worry too much about it. What we can do is
write a pre-cooked value (from reading it earlier once at boot) if we
are paranoid or just do what Nick does and put the onus on future
designs that might want to re-use it for other things to add a mode
bits to configure the new feature in.

Ben.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 13/14] powerpc/64: runlatch CTRL[RUN] set optimisation
  2017-06-13 13:45       ` Benjamin Herrenschmidt
@ 2017-06-14  3:34         ` Michael Ellerman
  0 siblings, 0 replies; 42+ messages in thread
From: Michael Ellerman @ 2017-06-14  3:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, svaidy, Nicholas Piggin
  Cc: Gautham R . Shenoy, linuxppc-dev, Shreyas B . Prabhu

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Tue, 2017-06-13 at 20:04 +1000, Michael Ellerman wrote:
>> > Good idea.=C2=A0 Writing to CTRL register can change only the RUN fiel=
d.
>> > Was this any different in older generations?
>>=20
>> No AFAICS back to 2.02.
>>=20
>> > Anton and Ben kept the mfspr/mtspr part in earlier updates to this
>> > routine.
>>=20
>> Doing the read/modify write is forward compatible vs a new writable
>> field, whereas writing the whole register with a known value is not.
>
> At this stage I wouldn't worry too much about it. What we can do is
> write a pre-cooked value (from reading it earlier once at boot) if we
> are paranoid or just do what Nick does and put the onus on future
> designs that might want to re-use it for other things to add a mode
> bits to configure the new feature in.

Fine by me.

cheers

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 10/14] powerpc/64s: cpuidle set polling before enabling irqs
  2017-06-08 15:50 [PATCH 00/14] idle performance improvements Nicholas Piggin
@ 2017-06-08 15:51 ` Nicholas Piggin
  0 siblings, 0 replies; 42+ messages in thread
From: Nicholas Piggin @ 2017-06-08 15:51 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Gautham R . Shenoy, Shreyas B . Prabhu

local_irq_enable can cause interrupts to be taken which could
take significant amount of processing time. The idle process
should set its polling flag before this, so another process that
wakes it during this time will not have to send an IPI.

Expand the TIF_POLLING_NRFLAG coverage to as large as possible.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 drivers/cpuidle/cpuidle-powernv.c | 4 +++-
 drivers/cpuidle/cpuidle-pseries.c | 3 ++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 150b971c303b..0ee4660efb5f 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -51,9 +51,10 @@ static int snooze_loop(struct cpuidle_device *dev,
 {
 	u64 snooze_exit_time;
 
-	local_irq_enable();
 	set_thread_flag(TIF_POLLING_NRFLAG);
 
+	local_irq_enable();
+
 	snooze_exit_time = get_tb() + snooze_timeout;
 	ppc64_runlatch_off();
 	HMT_very_low();
@@ -66,6 +67,7 @@ static int snooze_loop(struct cpuidle_device *dev,
 	ppc64_runlatch_on();
 	clear_thread_flag(TIF_POLLING_NRFLAG);
 	smp_mb();
+
 	return index;
 }
 
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 166ccd711ec9..7b12bb2ea70f 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -62,9 +62,10 @@ static int snooze_loop(struct cpuidle_device *dev,
 	unsigned long in_purr;
 	u64 snooze_exit_time;
 
+	set_thread_flag(TIF_POLLING_NRFLAG);
+
 	idle_loop_prolog(&in_purr);
 	local_irq_enable();
-	set_thread_flag(TIF_POLLING_NRFLAG);
 	snooze_exit_time = get_tb() + snooze_timeout;
 
 	while (!need_resched()) {
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2017-06-14  3:34 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-11 23:58 [PATCH 00/14 v2] idle performance improvements Nicholas Piggin
2017-06-11 23:58 ` [PATCH 01/14] powerpc/64s: idle move soft interrupt mask logic into C code Nicholas Piggin
2017-06-12  8:37   ` Gautham R Shenoy
2017-06-12 14:46     ` Nicholas Piggin
2017-06-13  4:28       ` Gautham R Shenoy
2017-06-11 23:58 ` [PATCH 02/14] powerpc/64s: idle hotplug lazy-irq simplification Nicholas Piggin
2017-06-11 23:58 ` [PATCH 03/14] powerpc/64s: idle provide a default idle for POWER9 Nicholas Piggin
2017-06-12  8:53   ` Gautham R Shenoy
2017-06-12 14:46     ` Nicholas Piggin
2017-06-11 23:58 ` [PATCH 04/14] powerpc/64s: idle process interrupts from system reset wakeup Nicholas Piggin
2017-06-12  9:41   ` Gautham R Shenoy
2017-06-12 14:51     ` Nicholas Piggin
2017-06-11 23:58 ` [PATCH 05/14] powerpc/64s: msgclr when handling doorbell exceptions Nicholas Piggin
2017-06-12 14:38   ` Gautham R Shenoy
2017-06-11 23:58 ` [PATCH 06/14] powerpc/64s: interrupt replay balance the return branch predictor Nicholas Piggin
2017-06-12 14:41   ` Gautham R Shenoy
2017-06-13  9:51   ` Michael Ellerman
2017-06-13 11:09     ` Nicholas Piggin
2017-06-11 23:58 ` [PATCH 07/14] powerpc/64s: idle branch to handler with virtual mode offset Nicholas Piggin
2017-06-11 23:58 ` [PATCH 08/14] powerpc/64s: idle avoid SRR usage in idle sleep/wake paths Nicholas Piggin
2017-06-13 10:25   ` Gautham R Shenoy
2017-06-13 10:45     ` Nicholas Piggin
2017-06-11 23:58 ` [PATCH 09/14] powerpc/64s: idle hmi wakeup is unlikely Nicholas Piggin
2017-06-12 15:03   ` Gautham R Shenoy
2017-06-11 23:58 ` [PATCH 10/14] powerpc/64s: cpuidle set polling before enabling irqs Nicholas Piggin
2017-06-12 15:10   ` Gautham R Shenoy
2017-06-12 15:20     ` Nicholas Piggin
2017-06-11 23:58 ` [PATCH 11/14] powerpc/64s: cpuidle read mostly for common globals Nicholas Piggin
2017-06-12 15:30   ` Gautham R Shenoy
2017-06-12 17:50     ` Vaidyanathan Srinivasan
2017-06-11 23:58 ` [PATCH 12/14] powerpc/64s: cpuidle no memory barrier after break from idle Nicholas Piggin
2017-06-12 17:48   ` Vaidyanathan Srinivasan
2017-06-13 12:47     ` Nicholas Piggin
2017-06-11 23:58 ` [PATCH 13/14] powerpc/64: runlatch CTRL[RUN] set optimisation Nicholas Piggin
2017-06-12 17:11   ` Vaidyanathan Srinivasan
2017-06-13 10:04     ` Michael Ellerman
2017-06-13 11:56       ` Nicholas Piggin
2017-06-13 13:45       ` Benjamin Herrenschmidt
2017-06-14  3:34         ` Michael Ellerman
2017-06-11 23:58 ` [PATCH 14/14] powerpc/64s: idle runlatch switch is done with MSR[EE]=0 Nicholas Piggin
2017-06-12 17:00   ` Vaidyanathan Srinivasan
  -- strict thread matches above, loose matches on Subject: below --
2017-06-08 15:50 [PATCH 00/14] idle performance improvements Nicholas Piggin
2017-06-08 15:51 ` [PATCH 10/14] powerpc/64s: cpuidle set polling before enabling irqs Nicholas Piggin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.