linux-kernel.vger.kernel.org archive mirror
* [PATCH v6 00/11]  powerpc/powernv/cpuidle: Add support for POWER ISA v3 idle states
@ 2016-06-08 16:54 Shreyas B. Prabhu
  2016-06-08 16:54 ` [PATCH v6 01/11] powerpc/powernv: Use PNV_THREAD_WINKLE macro while requesting for winkle Shreyas B. Prabhu
                   ` (10 more replies)
  0 siblings, 11 replies; 30+ messages in thread
From: Shreyas B. Prabhu @ 2016-06-08 16:54 UTC (permalink / raw)
  To: mpe
  Cc: benh, paulus, mikey, ego, maddy, linuxppc-dev, linux-kernel,
	Shreyas B. Prabhu, Rafael J. Wysocki, Daniel Lezcano, linux-pm,
	Rob Herring, Lorenzo Pieralisi

POWER ISA v3 defines a new idle processor core mechanism. In summary,
 a) A new instruction named stop is added. This instruction replaces
	instructions like nap, sleep, rvwinkle.
 b) A new per-thread SPR named PSSCR is added, which controls the
	behavior of the stop instruction.

PSSCR has the following key fields:
	Bits 0:3  - Power-Saving Level Status. This field indicates the
	lowest power-saving state the thread entered since stop
	instruction was last executed.
		
	Bit 42 - Enable State Loss                          
	0 - No state is lost irrespective of other fields  
	1 - Allows state loss
		
	Bits 44:47 - Power-Saving Level Limit      
	This limits the power-saving level that can be entered into.
		
	Bits 60:63 - Requested Level              
	Used to specify which power-saving level must be entered on
	executing stop instruction
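
The field layout above can be sketched in C as follows. This is only an
illustration, assuming the usual Power ISA convention that bit 0 is the
most significant bit of the 64-bit register. All EX_* and ex_* names are
hypothetical and are not taken from the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: IBM bit numbering, bit 0 is the most significant bit. */
#define EX_IBM_BIT(n)		(1ULL << (63 - (n)))
/* Mask covering IBM bits first..last inclusive (first <= last). */
#define EX_IBM_MASK(first, last) \
	((~0ULL >> (first)) & (~0ULL << (63 - (last))))

/* Hypothetical names, derived only from the field list above. */
#define EX_PSSCR_PLS_MASK	EX_IBM_MASK(0, 3)	/* Power-Saving Level Status */
#define EX_PSSCR_PLS_SHIFT	(63 - 3)
#define EX_PSSCR_ESL		EX_IBM_BIT(42)		/* Enable State Loss */
#define EX_PSSCR_PSLL_MASK	EX_IBM_MASK(44, 47)	/* Power-Saving Level Limit */
#define EX_PSSCR_PSLL_SHIFT	(63 - 47)
#define EX_PSSCR_RL_MASK	EX_IBM_MASK(60, 63)	/* Requested Level */

/* Compose a PSSCR value to program before executing stop. */
static uint64_t ex_psscr_request(unsigned int level, unsigned int limit,
				 int allow_state_loss)
{
	uint64_t val;

	/* Both fields are 4 bits wide. */
	assert(level <= 0xF && limit <= 0xF);

	val = (uint64_t)level;				/* bits 60:63 */
	val |= (uint64_t)limit << EX_PSSCR_PSLL_SHIFT;	/* bits 44:47 */
	if (allow_state_loss)
		val |= EX_PSSCR_ESL;			/* bit 42 */
	return val;
}

/* Extract the Power-Saving Level Status field from a PSSCR value. */
static unsigned int ex_psscr_status(uint64_t psscr)
{
	return (unsigned int)((psscr & EX_PSSCR_PLS_MASK) >> EX_PSSCR_PLS_SHIFT);
}
```

With this layout, ex_psscr_request(2, 3, 1) evaluates to 0x230002:
requested level 2, level limit 3 and state loss enabled.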
		
Stop idle states and their properties, such as name, latency, target
residency and psscr value, are exposed via the device tree.

This patch series adds support for this new mechanism.

Patches 1-7 are cleanups and code movement.
Patch 8 adds platform specific support for stop and psscr handling.
Patch 9 is a minor cleanup in cpuidle driver.
Patch 10 adds cpuidle driver support.
Patch 11 makes offlined cpu use deepest stop state.

Note: Documentation for the device tree bindings is posted here:
http://patchwork.ozlabs.org/patch/629125/


Changes in v6
=============
 - Restore new POWER ISA v3 SPRS when waking up from deep idle

Changes in v5
=============
 - Use generic cpuidle constant CPUIDLE_NAME_LEN
 - Fix return code handling for of_property_read_string_array
 - Use DT flags to determine if we are using the stop instruction, instead
   of cpu_has_feature
 - Removed unnecessary cast with names
 - &stop_loop -> stop_loop
 - Added POWERNV_THRESHOLD_LATENCY_NS to filter out idle states with high latency

Changes in v4
=============
 - Added a patch to use PNV_THREAD_WINKLE macro while requesting for winkle
 - Moved power7_powersave_common rename to more appropriate patch
 - renaming power7_enter_nap_mode to pnv_enter_arch207_idle_mode
 - Added PSSCR layout to Patch 7's commit message
 - Improved / Fixed comments
 - Fixed whitespace error in paca.h
 - Using MAX_POSSIBLE_STOP_STATE macro instead of hardcoding 0xF as
   max possible stop state

Changes in v3
=============
 - Rebased on powerpc-next
 - Dropping patch 1 since we are not adding a new file for P9 idle support
 - Improved comments in multiple places
 - Moved GET_PACA from power7_restore_hyp_resource to System Reset
 - Instead of moving few functions from idle_power7 to idle_power_common,
   renaming idle_power7.S to idle_power_common.S
 - Moved HSTATE_HWTHREAD_STATE update to power_powersave_common
 - Dropped earlier patch 5 which moved few macros from idle_power_common to
   asm/cpuidle.h. 
 - Added a patch to rename reusable power7_* idle functions to pnv_*
 - Added new patch that creates abstraction for saving SPRs before
   entering deep idle states
 - Instead of introducing new file idle_power_stop.S, P9 idle support
   is added to idle_power_common.S using CPU_FTR sections.
 - Fixed r4 reg clobbering in power_stop0

Changes in v2
=============
 - Rebased on v4.6-rc6
 - Using CPU_FTR_ARCH_300 bit instead of CPU_FTR_STOP_INST

Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linux-pm@vger.kernel.org
Cc: Benjamin Herrenschmidt <benh@au1.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Michael Neuling <mikey@neuling.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>

Shreyas B. Prabhu (11):
  powerpc/powernv: Use PNV_THREAD_WINKLE macro while requesting for
    winkle
  powerpc/kvm: make hypervisor state restore a function
  powerpc/powernv: Rename idle_power7.S to idle_power_common.S
  powerpc/powernv: Rename reusable idle functions to hardware agnostic
    names
  powerpc/powernv: Make pnv_powersave_common more generic
  powerpc/powernv: abstraction for saving SPRs before entering deep idle
    states
  powerpc/powernv: set power_save func after the idle states are
    initialized
  powerpc/powernv: Add platform support for stop instruction
  cpuidle/powernv: Use CPUIDLE_STATE_MAX instead of
    MAX_POWERNV_IDLE_STATES
  cpuidle/powernv: Add support for POWER ISA v3 idle states
  powerpc/powernv: Use deepest stop state when cpu is offlined

 arch/powerpc/include/asm/cpuidle.h        |   2 +
 arch/powerpc/include/asm/kvm_book3s_asm.h |   2 +-
 arch/powerpc/include/asm/machdep.h        |   1 +
 arch/powerpc/include/asm/opal-api.h       |  11 +-
 arch/powerpc/include/asm/paca.h           |   2 +
 arch/powerpc/include/asm/ppc-opcode.h     |   4 +
 arch/powerpc/include/asm/processor.h      |   1 +
 arch/powerpc/include/asm/reg.h            |  14 +
 arch/powerpc/kernel/Makefile              |   2 +-
 arch/powerpc/kernel/asm-offsets.c         |   2 +
 arch/powerpc/kernel/exceptions-64s.S      |  30 +-
 arch/powerpc/kernel/idle_power7.S         | 515 ----------------------
 arch/powerpc/kernel/idle_power_common.S   | 682 ++++++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |   4 +-
 arch/powerpc/platforms/powernv/idle.c     |  98 ++++-
 arch/powerpc/platforms/powernv/powernv.h  |   1 +
 arch/powerpc/platforms/powernv/setup.c    |   2 +-
 arch/powerpc/platforms/powernv/smp.c      |   4 +-
 drivers/cpuidle/cpuidle-powernv.c         |  73 +++-
 19 files changed, 887 insertions(+), 563 deletions(-)
 delete mode 100644 arch/powerpc/kernel/idle_power7.S
 create mode 100644 arch/powerpc/kernel/idle_power_common.S

-- 
2.1.4


* [PATCH v6 01/11] powerpc/powernv: Use PNV_THREAD_WINKLE macro while requesting for winkle
  2016-06-08 16:54 [PATCH v6 00/11] powerpc/powernv/cpuidle: Add support for POWER ISA v3 idle states Shreyas B. Prabhu
@ 2016-06-08 16:54 ` Shreyas B. Prabhu
  2016-06-08 16:54 ` [PATCH v6 02/11] powerpc/kvm: make hypervisor state restore a function Shreyas B. Prabhu
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Shreyas B. Prabhu @ 2016-06-08 16:54 UTC (permalink / raw)
  To: mpe
  Cc: benh, paulus, mikey, ego, maddy, linuxppc-dev, linux-kernel,
	Shreyas B. Prabhu

Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
---
-No changes since v4

Changes in v4
=============
- New in v4

 arch/powerpc/kernel/idle_power7.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index 470ceeb..705c867 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -252,7 +252,7 @@ _GLOBAL(power7_sleep)
 	/* No return */
 
 _GLOBAL(power7_winkle)
-	li	r3,3
+	li	r3,PNV_THREAD_WINKLE
 	li	r4,1
 	b	power7_powersave_common
 	/* No return */
-- 
2.1.4


* [PATCH v6 02/11] powerpc/kvm: make hypervisor state restore a function
  2016-06-08 16:54 [PATCH v6 00/11] powerpc/powernv/cpuidle: Add support for POWER ISA v3 idle states Shreyas B. Prabhu
  2016-06-08 16:54 ` [PATCH v6 01/11] powerpc/powernv: Use PNV_THREAD_WINKLE macro while requesting for winkle Shreyas B. Prabhu
@ 2016-06-08 16:54 ` Shreyas B. Prabhu
  2016-06-08 16:54 ` [PATCH v6 03/11] powerpc/powernv: Rename idle_power7.S to idle_power_common.S Shreyas B. Prabhu
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Shreyas B. Prabhu @ 2016-06-08 16:54 UTC (permalink / raw)
  To: mpe
  Cc: benh, paulus, mikey, ego, maddy, linuxppc-dev, linux-kernel,
	Shreyas B. Prabhu

In the current code, when a thread wakes up in the reset vector, some
of the state restore code and the check for whether the thread needs
to branch to kvm are duplicated. Reorder the code so that this
duplication is avoided.

At a higher level, this is what the change looks like:

Before this patch -
power7_wakeup_tb_loss:
	restore hypervisor state
	if (thread needed by kvm)
		goto kvm_start_guest
	restore nvgprs, cr, pc
	rfid to process context

power7_wakeup_loss:
	restore nvgprs, cr, pc
	rfid to process context

reset vector:
	if (waking from deep idle states)
		goto power7_wakeup_tb_loss
	else
		if (thread needed by kvm)
			goto kvm_start_guest
		goto power7_wakeup_loss

After this patch -
power7_wakeup_tb_loss:
	restore hypervisor state
	return

power7_restore_hyp_resource():
	if (waking from deep idle states)
		goto power7_wakeup_tb_loss
	return

power7_wakeup_loss:
	restore nvgprs, cr, pc
	rfid to process context

reset vector:
	power7_restore_hyp_resource()
	if (thread needed by kvm)
		goto kvm_start_guest
	goto power7_wakeup_loss
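
The "after" flow above can be modelled in plain C as a rough sketch. It
only mirrors the pseudocode in this commit message, with the hypervisor
state restore reduced to a flag. All ex_* names are hypothetical, and the
real code is assembly running in real mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the idle states, ordered shallow to deep. */
enum ex_idle_state { EX_RUNNING, EX_NAP, EX_SLEEP, EX_WINKLE };

/* Where the reset vector ends up branching to. */
enum ex_wake_path { EX_PATH_KVM_GUEST, EX_PATH_WAKEUP_LOSS };

struct ex_thread {
	enum ex_idle_state idle_state;	/* state the thread idled in */
	bool needed_by_kvm;		/* KVM has requested this thread */
	bool hyp_state_restored;	/* set once restore has run */
};

/* Stands in for power7_wakeup_tb_loss: restore state, then return. */
static void ex_wakeup_tb_loss(struct ex_thread *t)
{
	t->hyp_state_restored = true;
}

/* Stands in for power7_restore_hyp_resource(). */
static void ex_restore_hyp_resource(struct ex_thread *t)
{
	/* Anything deeper than nap counts as a deep idle state here. */
	if (t->idle_state > EX_NAP)
		ex_wakeup_tb_loss(t);
}

/* Stands in for the reset vector after this patch. */
static enum ex_wake_path ex_reset_vector(struct ex_thread *t)
{
	assert(t != NULL);

	ex_restore_hyp_resource(t);
	t->idle_state = EX_RUNNING;

	/* The KVM check now lives in exactly one place. */
	if (t->needed_by_kvm)
		return EX_PATH_KVM_GUEST;
	return EX_PATH_WAKEUP_LOSS;
}
```

In this model the KVM check and the final branch appear once, in
ex_reset_vector(), regardless of how deep the idle state was.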

Reviewed-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
---
- No changes since v3

Changes in v3:
=============
- Retaining GET_PACA(r13) in System Reset vector instead of moving it
  to power7_restore_hyp_resource
- Added comments indicating entry conditions for power7_restore_hyp_resource
- Improved comments around return statements

 arch/powerpc/kernel/exceptions-64s.S | 28 ++------------
 arch/powerpc/kernel/idle_power7.S    | 72 +++++++++++++++++++++---------------
 2 files changed, 46 insertions(+), 54 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 4c94406..4a74d6a 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -107,25 +107,9 @@ BEGIN_FTR_SECTION
 	beq	9f
 
 	cmpwi	cr3,r13,2
-
-	/*
-	 * Check if last bit of HSPGR0 is set. This indicates whether we are
-	 * waking up from winkle.
-	 */
 	GET_PACA(r13)
-	clrldi	r5,r13,63
-	clrrdi	r13,r13,1
-	cmpwi	cr4,r5,1
-	mtspr	SPRN_HSPRG0,r13
+	bl	power7_restore_hyp_resource
 
-	lbz	r0,PACA_THREAD_IDLE_STATE(r13)
-	cmpwi   cr2,r0,PNV_THREAD_NAP
-	bgt     cr2,8f				/* Either sleep or Winkle */
-
-	/* Waking up from nap should not cause hypervisor state loss */
-	bgt	cr3,.
-
-	/* Waking up from nap */
 	li	r0,PNV_THREAD_RUNNING
 	stb	r0,PACA_THREAD_IDLE_STATE(r13)	/* Clear thread state */
 
@@ -143,13 +127,9 @@ BEGIN_FTR_SECTION
 
 	/* Return SRR1 from power7_nap() */
 	mfspr	r3,SPRN_SRR1
-	beq	cr3,2f
-	b	power7_wakeup_noloss
-2:	b	power7_wakeup_loss
-
-	/* Fast Sleep wakeup on PowerNV */
-8:	GET_PACA(r13)
-	b 	power7_wakeup_tb_loss
+	blt	cr3,2f
+	b	power7_wakeup_loss
+2:	b	power7_wakeup_noloss
 
 9:
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index 705c867..d5def06 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -276,6 +276,39 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);		\
 20:	nop;
 
 
+/*
+ * Called from reset vector. Check whether we have woken up with
+ * hypervisor state loss. If yes, restore hypervisor state and return
+ * back to reset vector.
+ *
+ * r13 - Contents of HSPRG0
+ * cr3 - set to gt if waking up with partial/complete hypervisor state loss
+ */
+_GLOBAL(power7_restore_hyp_resource)
+	/*
+	 * Check if last bit of HSPGR0 is set. This indicates whether we are
+	 * waking up from winkle.
+	 */
+	clrldi	r5,r13,63
+	clrrdi	r13,r13,1
+	cmpwi	cr4,r5,1
+	mtspr	SPRN_HSPRG0,r13
+
+	lbz	r0,PACA_THREAD_IDLE_STATE(r13)
+	cmpwi   cr2,r0,PNV_THREAD_NAP
+	bgt     cr2,power7_wakeup_tb_loss	/* Either sleep or Winkle */
+
+	/*
+	 * We fall through here if PACA_THREAD_IDLE_STATE shows we are waking
+	 * up from nap. At this stage CR3 shouldn't contains 'gt' since that
+	 * indicates we are waking with hypervisor state loss from nap.
+	 */
+	bgt	cr3,.
+
+	blr	/* Return back to System Reset vector from where
+		   power7_restore_hyp_resource was invoked */
+
+
 _GLOBAL(power7_wakeup_tb_loss)
 	ld	r2,PACATOC(r13);
 	ld	r1,PACAR1(r13)
@@ -284,11 +317,13 @@ _GLOBAL(power7_wakeup_tb_loss)
 	 * and they are restored before switching to the process context. Hence
 	 * until they are restored, they are free to be used.
 	 *
-	 * Save SRR1 in a NVGPR as it might be clobbered in opal_call_realmode
-	 * (called in CHECK_HMI_INTERRUPT). SRR1 is required to determine the
-	 * wakeup reason if we branch to kvm_start_guest.
+	 * Save SRR1 and LR in NVGPRs as they might be clobbered in
+	 * opal_call_realmode (called in CHECK_HMI_INTERRUPT). SRR1 is required
+	 * to determine the wakeup reason if we branch to kvm_start_guest. LR
+	 * is required to return back to reset vector after hypervisor state
+	 * restore is complete.
 	 */
-
+	mflr	r17
 	mfspr	r16,SPRN_SRR1
 BEGIN_FTR_SECTION
 	CHECK_HMI_INTERRUPT
@@ -438,33 +473,10 @@ common_exit:
 
 hypervisor_state_restored:
 
-	li	r5,PNV_THREAD_RUNNING
-	stb     r5,PACA_THREAD_IDLE_STATE(r13)
-
 	mtspr	SPRN_SRR1,r16
-#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-	li      r0,KVM_HWTHREAD_IN_KERNEL
-	stb     r0,HSTATE_HWTHREAD_STATE(r13)
-	/* Order setting hwthread_state vs. testing hwthread_req */
-	sync
-	lbz     r0,HSTATE_HWTHREAD_REQ(r13)
-	cmpwi   r0,0
-	beq     6f
-	b       kvm_start_guest
-6:
-#endif
-
-	REST_NVGPRS(r1)
-	REST_GPR(2, r1)
-	ld	r3,_CCR(r1)
-	ld	r4,_MSR(r1)
-	ld	r5,_NIP(r1)
-	addi	r1,r1,INT_FRAME_SIZE
-	mtcr	r3
-	mfspr	r3,SPRN_SRR1		/* Return SRR1 */
-	mtspr	SPRN_SRR1,r4
-	mtspr	SPRN_SRR0,r5
-	rfid
+	mtlr	r17
+	blr	/* Return back to System Reset vector from where
+		   power7_restore_hyp_resource was invoked */
 
 fastsleep_workaround_at_exit:
 	li	r3,1
-- 
2.1.4


* [PATCH v6 03/11] powerpc/powernv: Rename idle_power7.S to idle_power_common.S
  2016-06-08 16:54 [PATCH v6 00/11] powerpc/powernv/cpuidle: Add support for POWER ISA v3 idle states Shreyas B. Prabhu
  2016-06-08 16:54 ` [PATCH v6 01/11] powerpc/powernv: Use PNV_THREAD_WINKLE macro while requesting for winkle Shreyas B. Prabhu
  2016-06-08 16:54 ` [PATCH v6 02/11] powerpc/kvm: make hypervisor state restore a function Shreyas B. Prabhu
@ 2016-06-08 16:54 ` Shreyas B. Prabhu
  2016-06-15  5:31   ` [v6, " Michael Ellerman
  2016-06-08 16:54 ` [PATCH v6 04/11] powerpc/powernv: Rename reusable idle functions to hardware agnostic names Shreyas B. Prabhu
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Shreyas B. Prabhu @ 2016-06-08 16:54 UTC (permalink / raw)
  To: mpe
  Cc: benh, paulus, mikey, ego, maddy, linuxppc-dev, linux-kernel,
	Shreyas B. Prabhu

idle_power7.S handles idle entry/exit for POWER7 and POWER8, and a
later patch in this series extends it to POWER9. Rename the file to a
non-hardware-specific name.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
---
 - No changes since v3

Changes in v3:
==============
 - Instead of moving few common functions from idle_power7.S to
   idle_power_common.S, renaming idle_power7.S to idle_power_common.S

 arch/powerpc/kernel/Makefile            |   2 +-
 arch/powerpc/kernel/idle_power7.S       | 527 --------------------------------
 arch/powerpc/kernel/idle_power_common.S | 527 ++++++++++++++++++++++++++++++++
 3 files changed, 528 insertions(+), 528 deletions(-)
 delete mode 100644 arch/powerpc/kernel/idle_power7.S
 create mode 100644 arch/powerpc/kernel/idle_power_common.S

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 2da380f..99116da 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -47,7 +47,7 @@ obj-$(CONFIG_PPC_BOOK3E_64)	+= exceptions-64e.o idle_book3e.o
 obj-$(CONFIG_PPC64)		+= vdso64/
 obj-$(CONFIG_ALTIVEC)		+= vecemu.o
 obj-$(CONFIG_PPC_970_NAP)	+= idle_power4.o
-obj-$(CONFIG_PPC_P7_NAP)	+= idle_power7.o
+obj-$(CONFIG_PPC_P7_NAP)	+= idle_power_common.o
 procfs-y			:= proc_powerpc.o
 obj-$(CONFIG_PROC_FS)		+= $(procfs-y)
 rtaspci-$(CONFIG_PPC64)-$(CONFIG_PCI)	:= rtas_pci.o
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
deleted file mode 100644
index d5def06..0000000
--- a/arch/powerpc/kernel/idle_power7.S
+++ /dev/null
@@ -1,527 +0,0 @@
-/*
- *  This file contains the power_save function for Power7 CPUs.
- *
- *  This program is free software; you can redistribute it and/or
- *  modify it under the terms of the GNU General Public License
- *  as published by the Free Software Foundation; either version
- *  2 of the License, or (at your option) any later version.
- */
-
-#include <linux/threads.h>
-#include <asm/processor.h>
-#include <asm/page.h>
-#include <asm/cputable.h>
-#include <asm/thread_info.h>
-#include <asm/ppc_asm.h>
-#include <asm/asm-offsets.h>
-#include <asm/ppc-opcode.h>
-#include <asm/hw_irq.h>
-#include <asm/kvm_book3s_asm.h>
-#include <asm/opal.h>
-#include <asm/cpuidle.h>
-#include <asm/book3s/64/mmu-hash.h>
-
-#undef DEBUG
-
-/*
- * Use unused space in the interrupt stack to save and restore
- * registers for winkle support.
- */
-#define _SDR1	GPR3
-#define _RPR	GPR4
-#define _SPURR	GPR5
-#define _PURR	GPR6
-#define _TSCR	GPR7
-#define _DSCR	GPR8
-#define _AMOR	GPR9
-#define _WORT	GPR10
-#define _WORC	GPR11
-
-/* Idle state entry routines */
-
-#define	IDLE_STATE_ENTER_SEQ(IDLE_INST)				\
-	/* Magic NAP/SLEEP/WINKLE mode enter sequence */	\
-	std	r0,0(r1);					\
-	ptesync;						\
-	ld	r0,0(r1);					\
-1:	cmp	cr0,r0,r0;					\
-	bne	1b;						\
-	IDLE_INST;						\
-	b	.
-
-	.text
-
-/*
- * Used by threads when the lock bit of core_idle_state is set.
- * Threads will spin in HMT_LOW until the lock bit is cleared.
- * r14 - pointer to core_idle_state
- * r15 - used to load contents of core_idle_state
- */
-
-core_idle_lock_held:
-	HMT_LOW
-3:	lwz	r15,0(r14)
-	andi.   r15,r15,PNV_CORE_IDLE_LOCK_BIT
-	bne	3b
-	HMT_MEDIUM
-	lwarx	r15,0,r14
-	blr
-
-/*
- * Pass requested state in r3:
- *	r3 - PNV_THREAD_NAP/SLEEP/WINKLE
- *
- * To check IRQ_HAPPENED in r4
- * 	0 - don't check
- * 	1 - check
- */
-_GLOBAL(power7_powersave_common)
-	/* Use r3 to pass state nap/sleep/winkle */
-	/* NAP is a state loss, we create a regs frame on the
-	 * stack, fill it up with the state we care about and
-	 * stick a pointer to it in PACAR1. We really only
-	 * need to save PC, some CR bits and the NV GPRs,
-	 * but for now an interrupt frame will do.
-	 */
-	mflr	r0
-	std	r0,16(r1)
-	stdu	r1,-INT_FRAME_SIZE(r1)
-	std	r0,_LINK(r1)
-	std	r0,_NIP(r1)
-
-	/* Hard disable interrupts */
-	mfmsr	r9
-	rldicl	r9,r9,48,1
-	rotldi	r9,r9,16
-	mtmsrd	r9,1			/* hard-disable interrupts */
-
-	/* Check if something happened while soft-disabled */
-	lbz	r0,PACAIRQHAPPENED(r13)
-	andi.	r0,r0,~PACA_IRQ_HARD_DIS@l
-	beq	1f
-	cmpwi	cr0,r4,0
-	beq	1f
-	addi	r1,r1,INT_FRAME_SIZE
-	ld	r0,16(r1)
-	li	r3,0			/* Return 0 (no nap) */
-	mtlr	r0
-	blr
-
-1:	/* We mark irqs hard disabled as this is the state we'll
-	 * be in when returning and we need to tell arch_local_irq_restore()
-	 * about it
-	 */
-	li	r0,PACA_IRQ_HARD_DIS
-	stb	r0,PACAIRQHAPPENED(r13)
-
-	/* We haven't lost state ... yet */
-	li	r0,0
-	stb	r0,PACA_NAPSTATELOST(r13)
-
-	/* Continue saving state */
-	SAVE_GPR(2, r1)
-	SAVE_NVGPRS(r1)
-	mfcr	r4
-	std	r4,_CCR(r1)
-	std	r9,_MSR(r1)
-	std	r1,PACAR1(r13)
-
-	/*
-	 * Go to real mode to do the nap, as required by the architecture.
-	 * Also, we need to be in real mode before setting hwthread_state,
-	 * because as soon as we do that, another thread can switch
-	 * the MMU context to the guest.
-	 */
-	LOAD_REG_IMMEDIATE(r5, MSR_IDLE)
-	li	r6, MSR_RI
-	andc	r6, r9, r6
-	LOAD_REG_ADDR(r7, power7_enter_nap_mode)
-	mtmsrd	r6, 1		/* clear RI before setting SRR0/1 */
-	mtspr	SPRN_SRR0, r7
-	mtspr	SPRN_SRR1, r5
-	rfid
-
-	.globl	power7_enter_nap_mode
-power7_enter_nap_mode:
-#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-	/* Tell KVM we're napping */
-	li	r4,KVM_HWTHREAD_IN_NAP
-	stb	r4,HSTATE_HWTHREAD_STATE(r13)
-#endif
-	stb	r3,PACA_THREAD_IDLE_STATE(r13)
-	cmpwi	cr3,r3,PNV_THREAD_SLEEP
-	bge	cr3,2f
-	IDLE_STATE_ENTER_SEQ(PPC_NAP)
-	/* No return */
-2:
-	/* Sleep or winkle */
-	lbz	r7,PACA_THREAD_MASK(r13)
-	ld	r14,PACA_CORE_IDLE_STATE_PTR(r13)
-lwarx_loop1:
-	lwarx	r15,0,r14
-
-	andi.   r9,r15,PNV_CORE_IDLE_LOCK_BIT
-	bnel	core_idle_lock_held
-
-	andc	r15,r15,r7			/* Clear thread bit */
-
-	andi.	r15,r15,PNV_CORE_IDLE_THREAD_BITS
-
-/*
- * If cr0 = 0, then current thread is the last thread of the core entering
- * sleep. Last thread needs to execute the hardware bug workaround code if
- * required by the platform.
- * Make the workaround call unconditionally here. The below branch call is
- * patched out when the idle states are discovered if the platform does not
- * require it.
- */
-.global pnv_fastsleep_workaround_at_entry
-pnv_fastsleep_workaround_at_entry:
-	beq	fastsleep_workaround_at_entry
-
-	stwcx.	r15,0,r14
-	bne-	lwarx_loop1
-	isync
-
-common_enter: /* common code for all the threads entering sleep or winkle */
-	bgt	cr3,enter_winkle
-	IDLE_STATE_ENTER_SEQ(PPC_SLEEP)
-
-fastsleep_workaround_at_entry:
-	ori	r15,r15,PNV_CORE_IDLE_LOCK_BIT
-	stwcx.	r15,0,r14
-	bne-	lwarx_loop1
-	isync
-
-	/* Fast sleep workaround */
-	li	r3,1
-	li	r4,1
-	li	r0,OPAL_CONFIG_CPU_IDLE_STATE
-	bl	opal_call_realmode
-
-	/* Clear Lock bit */
-	li	r0,0
-	lwsync
-	stw	r0,0(r14)
-	b	common_enter
-
-enter_winkle:
-	/*
-	 * Note all register i.e per-core, per-subcore or per-thread is saved
-	 * here since any thread in the core might wake up first
-	 */
-	mfspr	r3,SPRN_SDR1
-	std	r3,_SDR1(r1)
-	mfspr	r3,SPRN_RPR
-	std	r3,_RPR(r1)
-	mfspr	r3,SPRN_SPURR
-	std	r3,_SPURR(r1)
-	mfspr	r3,SPRN_PURR
-	std	r3,_PURR(r1)
-	mfspr	r3,SPRN_TSCR
-	std	r3,_TSCR(r1)
-	mfspr	r3,SPRN_DSCR
-	std	r3,_DSCR(r1)
-	mfspr	r3,SPRN_AMOR
-	std	r3,_AMOR(r1)
-	mfspr	r3,SPRN_WORT
-	std	r3,_WORT(r1)
-	mfspr	r3,SPRN_WORC
-	std	r3,_WORC(r1)
-	IDLE_STATE_ENTER_SEQ(PPC_WINKLE)
-
-_GLOBAL(power7_idle)
-	/* Now check if user or arch enabled NAP mode */
-	LOAD_REG_ADDRBASE(r3,powersave_nap)
-	lwz	r4,ADDROFF(powersave_nap)(r3)
-	cmpwi	0,r4,0
-	beqlr
-	li	r3, 1
-	/* fall through */
-
-_GLOBAL(power7_nap)
-	mr	r4,r3
-	li	r3,PNV_THREAD_NAP
-	b	power7_powersave_common
-	/* No return */
-
-_GLOBAL(power7_sleep)
-	li	r3,PNV_THREAD_SLEEP
-	li	r4,1
-	b	power7_powersave_common
-	/* No return */
-
-_GLOBAL(power7_winkle)
-	li	r3,PNV_THREAD_WINKLE
-	li	r4,1
-	b	power7_powersave_common
-	/* No return */
-
-#define CHECK_HMI_INTERRUPT						\
-	mfspr	r0,SPRN_SRR1;						\
-BEGIN_FTR_SECTION_NESTED(66);						\
-	rlwinm	r0,r0,45-31,0xf;  /* extract wake reason field (P8) */	\
-FTR_SECTION_ELSE_NESTED(66);						\
-	rlwinm	r0,r0,45-31,0xe;  /* P7 wake reason field is 3 bits */	\
-ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);		\
-	cmpwi	r0,0xa;			/* Hypervisor maintenance ? */	\
-	bne	20f;							\
-	/* Invoke opal call to handle hmi */				\
-	ld	r2,PACATOC(r13);					\
-	ld	r1,PACAR1(r13);						\
-	std	r3,ORIG_GPR3(r1);	/* Save original r3 */		\
-	li	r0,OPAL_HANDLE_HMI;	/* Pass opal token argument*/	\
-	bl	opal_call_realmode;					\
-	ld	r3,ORIG_GPR3(r1);	/* Restore original r3 */	\
-20:	nop;
-
-
-/*
- * Called from reset vector. Check whether we have woken up with
- * hypervisor state loss. If yes, restore hypervisor state and return
- * back to reset vector.
- *
- * r13 - Contents of HSPRG0
- * cr3 - set to gt if waking up with partial/complete hypervisor state loss
- */
-_GLOBAL(power7_restore_hyp_resource)
-	/*
-	 * Check if last bit of HSPGR0 is set. This indicates whether we are
-	 * waking up from winkle.
-	 */
-	clrldi	r5,r13,63
-	clrrdi	r13,r13,1
-	cmpwi	cr4,r5,1
-	mtspr	SPRN_HSPRG0,r13
-
-	lbz	r0,PACA_THREAD_IDLE_STATE(r13)
-	cmpwi   cr2,r0,PNV_THREAD_NAP
-	bgt     cr2,power7_wakeup_tb_loss	/* Either sleep or Winkle */
-
-	/*
-	 * We fall through here if PACA_THREAD_IDLE_STATE shows we are waking
-	 * up from nap. At this stage CR3 shouldn't contains 'gt' since that
-	 * indicates we are waking with hypervisor state loss from nap.
-	 */
-	bgt	cr3,.
-
-	blr	/* Return back to System Reset vector from where
-		   power7_restore_hyp_resource was invoked */
-
-
-_GLOBAL(power7_wakeup_tb_loss)
-	ld	r2,PACATOC(r13);
-	ld	r1,PACAR1(r13)
-	/*
-	 * Before entering any idle state, the NVGPRs are saved in the stack
-	 * and they are restored before switching to the process context. Hence
-	 * until they are restored, they are free to be used.
-	 *
-	 * Save SRR1 and LR in NVGPRs as they might be clobbered in
-	 * opal_call_realmode (called in CHECK_HMI_INTERRUPT). SRR1 is required
-	 * to determine the wakeup reason if we branch to kvm_start_guest. LR
-	 * is required to return back to reset vector after hypervisor state
-	 * restore is complete.
-	 */
-	mflr	r17
-	mfspr	r16,SPRN_SRR1
-BEGIN_FTR_SECTION
-	CHECK_HMI_INTERRUPT
-END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
-
-	lbz	r7,PACA_THREAD_MASK(r13)
-	ld	r14,PACA_CORE_IDLE_STATE_PTR(r13)
-lwarx_loop2:
-	lwarx	r15,0,r14
-	andi.	r9,r15,PNV_CORE_IDLE_LOCK_BIT
-	/*
-	 * Lock bit is set in one of the 2 cases-
-	 * a. In the sleep/winkle enter path, the last thread is executing
-	 * fastsleep workaround code.
-	 * b. In the wake up path, another thread is executing fastsleep
-	 * workaround undo code or resyncing timebase or restoring context
-	 * In either case loop until the lock bit is cleared.
-	 */
-	bnel	core_idle_lock_held
-
-	cmpwi	cr2,r15,0
-	lbz	r4,PACA_SUBCORE_SIBLING_MASK(r13)
-	and	r4,r4,r15
-	cmpwi	cr1,r4,0	/* Check if first in subcore */
-
-	/*
-	 * At this stage
-	 * cr1 - 0b0100 if first thread to wakeup in subcore
-	 * cr2 - 0b0100 if first thread to wakeup in core
-	 * cr3-  0b0010 if waking up from sleep or winkle
-	 * cr4 - 0b0100 if waking up from winkle
-	 */
-
-	or	r15,r15,r7		/* Set thread bit */
-
-	beq	cr1,first_thread_in_subcore
-
-	/* Not first thread in subcore to wake up */
-	stwcx.	r15,0,r14
-	bne-	lwarx_loop2
-	isync
-	b	common_exit
-
-first_thread_in_subcore:
-	/* First thread in subcore to wakeup */
-	ori	r15,r15,PNV_CORE_IDLE_LOCK_BIT
-	stwcx.	r15,0,r14
-	bne-	lwarx_loop2
-	isync
-
-	/*
-	 * If waking up from sleep, subcore state is not lost. Hence
-	 * skip subcore state restore
-	 */
-	bne	cr4,subcore_state_restored
-
-	/* Restore per-subcore state */
-	ld      r4,_SDR1(r1)
-	mtspr   SPRN_SDR1,r4
-	ld      r4,_RPR(r1)
-	mtspr   SPRN_RPR,r4
-	ld	r4,_AMOR(r1)
-	mtspr	SPRN_AMOR,r4
-
-subcore_state_restored:
-	/*
-	 * Check if the thread is also the first thread in the core. If not,
-	 * skip to clear_lock.
-	 */
-	bne	cr2,clear_lock
-
-first_thread_in_core:
-
-	/*
-	 * First thread in the core waking up from fastsleep. It needs to
-	 * call the fastsleep workaround code if the platform requires it.
-	 * Call it unconditionally here. The below branch instruction will
-	 * be patched out when the idle states are discovered if platform
-	 * does not require workaround.
-	 */
-.global pnv_fastsleep_workaround_at_exit
-pnv_fastsleep_workaround_at_exit:
-	b	fastsleep_workaround_at_exit
-
-timebase_resync:
-	/* Do timebase resync if we are waking up from sleep. Use cr3 value
-	 * set in exceptions-64s.S */
-	ble	cr3,clear_lock
-	/* Time base re-sync */
-	li	r0,OPAL_RESYNC_TIMEBASE
-	bl	opal_call_realmode;
-	/* TODO: Check r3 for failure */
-
-	/*
-	 * If waking up from sleep, per core state is not lost, skip to
-	 * clear_lock.
-	 */
-	bne	cr4,clear_lock
-
-	/* Restore per core state */
-	ld	r4,_TSCR(r1)
-	mtspr	SPRN_TSCR,r4
-	ld	r4,_WORC(r1)
-	mtspr	SPRN_WORC,r4
-
-clear_lock:
-	andi.	r15,r15,PNV_CORE_IDLE_THREAD_BITS
-	lwsync
-	stw	r15,0(r14)
-
-common_exit:
-	/*
-	 * Common to all threads.
-	 *
-	 * If waking up from sleep, hypervisor state is not lost. Hence
-	 * skip hypervisor state restore.
-	 */
-	bne	cr4,hypervisor_state_restored
-
-	/* Waking up from winkle */
-
-	/* Restore per thread state */
-	bl	__restore_cpu_power8
-
-	/* Restore SLB  from PACA */
-	ld	r8,PACA_SLBSHADOWPTR(r13)
-
-	.rept	SLB_NUM_BOLTED
-	li	r3, SLBSHADOW_SAVEAREA
-	LDX_BE	r5, r8, r3
-	addi	r3, r3, 8
-	LDX_BE	r6, r8, r3
-	andis.	r7,r5,SLB_ESID_V@h
-	beq	1f
-	slbmte	r6,r5
-1:	addi	r8,r8,16
-	.endr
-
-	ld	r4,_SPURR(r1)
-	mtspr	SPRN_SPURR,r4
-	ld	r4,_PURR(r1)
-	mtspr	SPRN_PURR,r4
-	ld	r4,_DSCR(r1)
-	mtspr	SPRN_DSCR,r4
-	ld	r4,_WORT(r1)
-	mtspr	SPRN_WORT,r4
-
-hypervisor_state_restored:
-
-	mtspr	SPRN_SRR1,r16
-	mtlr	r17
-	blr	/* Return back to System Reset vector from where
-		   power7_restore_hyp_resource was invoked */
-
-fastsleep_workaround_at_exit:
-	li	r3,1
-	li	r4,0
-	li	r0,OPAL_CONFIG_CPU_IDLE_STATE
-	bl	opal_call_realmode
-	b	timebase_resync
-
-/*
- * R3 here contains the value that will be returned to the caller
- * of power7_nap.
- */
-_GLOBAL(power7_wakeup_loss)
-	ld	r1,PACAR1(r13)
-BEGIN_FTR_SECTION
-	CHECK_HMI_INTERRUPT
-END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
-	REST_NVGPRS(r1)
-	REST_GPR(2, r1)
-	ld	r6,_CCR(r1)
-	ld	r4,_MSR(r1)
-	ld	r5,_NIP(r1)
-	addi	r1,r1,INT_FRAME_SIZE
-	mtcr	r6
-	mtspr	SPRN_SRR1,r4
-	mtspr	SPRN_SRR0,r5
-	rfid
-
-/*
- * R3 here contains the value that will be returned to the caller
- * of power7_nap.
- */
-_GLOBAL(power7_wakeup_noloss)
-	lbz	r0,PACA_NAPSTATELOST(r13)
-	cmpwi	r0,0
-	bne	power7_wakeup_loss
-BEGIN_FTR_SECTION
-	CHECK_HMI_INTERRUPT
-END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
-	ld	r1,PACAR1(r13)
-	ld	r6,_CCR(r1)
-	ld	r4,_MSR(r1)
-	ld	r5,_NIP(r1)
-	addi	r1,r1,INT_FRAME_SIZE
-	mtcr	r6
-	mtspr	SPRN_SRR1,r4
-	mtspr	SPRN_SRR0,r5
-	rfid
diff --git a/arch/powerpc/kernel/idle_power_common.S b/arch/powerpc/kernel/idle_power_common.S
new file mode 100644
index 0000000..d5def06
--- /dev/null
+++ b/arch/powerpc/kernel/idle_power_common.S
@@ -0,0 +1,527 @@
+/*
+ *  This file contains the power_save function for Power7 CPUs.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/threads.h>
+#include <asm/processor.h>
+#include <asm/page.h>
+#include <asm/cputable.h>
+#include <asm/thread_info.h>
+#include <asm/ppc_asm.h>
+#include <asm/asm-offsets.h>
+#include <asm/ppc-opcode.h>
+#include <asm/hw_irq.h>
+#include <asm/kvm_book3s_asm.h>
+#include <asm/opal.h>
+#include <asm/cpuidle.h>
+#include <asm/book3s/64/mmu-hash.h>
+
+#undef DEBUG
+
+/*
+ * Use unused space in the interrupt stack to save and restore
+ * registers for winkle support.
+ */
+#define _SDR1	GPR3
+#define _RPR	GPR4
+#define _SPURR	GPR5
+#define _PURR	GPR6
+#define _TSCR	GPR7
+#define _DSCR	GPR8
+#define _AMOR	GPR9
+#define _WORT	GPR10
+#define _WORC	GPR11
+
+/* Idle state entry routines */
+
+#define	IDLE_STATE_ENTER_SEQ(IDLE_INST)				\
+	/* Magic NAP/SLEEP/WINKLE mode enter sequence */	\
+	std	r0,0(r1);					\
+	ptesync;						\
+	ld	r0,0(r1);					\
+1:	cmp	cr0,r0,r0;					\
+	bne	1b;						\
+	IDLE_INST;						\
+	b	.
+
+	.text
+
+/*
+ * Used by threads when the lock bit of core_idle_state is set.
+ * Threads will spin in HMT_LOW until the lock bit is cleared.
+ * r14 - pointer to core_idle_state
+ * r15 - used to load contents of core_idle_state
+ */
+
+core_idle_lock_held:
+	HMT_LOW
+3:	lwz	r15,0(r14)
+	andi.   r15,r15,PNV_CORE_IDLE_LOCK_BIT
+	bne	3b
+	HMT_MEDIUM
+	lwarx	r15,0,r14
+	blr
+
+/*
+ * Pass requested state in r3:
+ *	r3 - PNV_THREAD_NAP/SLEEP/WINKLE
+ *
+ * To check IRQ_HAPPENED in r4
+ * 	0 - don't check
+ * 	1 - check
+ */
+_GLOBAL(power7_powersave_common)
+	/* Use r3 to pass state nap/sleep/winkle */
+	/* NAP is a state loss, we create a regs frame on the
+	 * stack, fill it up with the state we care about and
+	 * stick a pointer to it in PACAR1. We really only
+	 * need to save PC, some CR bits and the NV GPRs,
+	 * but for now an interrupt frame will do.
+	 */
+	mflr	r0
+	std	r0,16(r1)
+	stdu	r1,-INT_FRAME_SIZE(r1)
+	std	r0,_LINK(r1)
+	std	r0,_NIP(r1)
+
+	/* Hard disable interrupts */
+	mfmsr	r9
+	rldicl	r9,r9,48,1
+	rotldi	r9,r9,16
+	mtmsrd	r9,1			/* hard-disable interrupts */
+
+	/* Check if something happened while soft-disabled */
+	lbz	r0,PACAIRQHAPPENED(r13)
+	andi.	r0,r0,~PACA_IRQ_HARD_DIS@l
+	beq	1f
+	cmpwi	cr0,r4,0
+	beq	1f
+	addi	r1,r1,INT_FRAME_SIZE
+	ld	r0,16(r1)
+	li	r3,0			/* Return 0 (no nap) */
+	mtlr	r0
+	blr
+
+1:	/* We mark irqs hard disabled as this is the state we'll
+	 * be in when returning and we need to tell arch_local_irq_restore()
+	 * about it
+	 */
+	li	r0,PACA_IRQ_HARD_DIS
+	stb	r0,PACAIRQHAPPENED(r13)
+
+	/* We haven't lost state ... yet */
+	li	r0,0
+	stb	r0,PACA_NAPSTATELOST(r13)
+
+	/* Continue saving state */
+	SAVE_GPR(2, r1)
+	SAVE_NVGPRS(r1)
+	mfcr	r4
+	std	r4,_CCR(r1)
+	std	r9,_MSR(r1)
+	std	r1,PACAR1(r13)
+
+	/*
+	 * Go to real mode to do the nap, as required by the architecture.
+	 * Also, we need to be in real mode before setting hwthread_state,
+	 * because as soon as we do that, another thread can switch
+	 * the MMU context to the guest.
+	 */
+	LOAD_REG_IMMEDIATE(r5, MSR_IDLE)
+	li	r6, MSR_RI
+	andc	r6, r9, r6
+	LOAD_REG_ADDR(r7, power7_enter_nap_mode)
+	mtmsrd	r6, 1		/* clear RI before setting SRR0/1 */
+	mtspr	SPRN_SRR0, r7
+	mtspr	SPRN_SRR1, r5
+	rfid
+
+	.globl	power7_enter_nap_mode
+power7_enter_nap_mode:
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	/* Tell KVM we're napping */
+	li	r4,KVM_HWTHREAD_IN_NAP
+	stb	r4,HSTATE_HWTHREAD_STATE(r13)
+#endif
+	stb	r3,PACA_THREAD_IDLE_STATE(r13)
+	cmpwi	cr3,r3,PNV_THREAD_SLEEP
+	bge	cr3,2f
+	IDLE_STATE_ENTER_SEQ(PPC_NAP)
+	/* No return */
+2:
+	/* Sleep or winkle */
+	lbz	r7,PACA_THREAD_MASK(r13)
+	ld	r14,PACA_CORE_IDLE_STATE_PTR(r13)
+lwarx_loop1:
+	lwarx	r15,0,r14
+
+	andi.   r9,r15,PNV_CORE_IDLE_LOCK_BIT
+	bnel	core_idle_lock_held
+
+	andc	r15,r15,r7			/* Clear thread bit */
+
+	andi.	r15,r15,PNV_CORE_IDLE_THREAD_BITS
+
+/*
+ * If cr0 = 0, then current thread is the last thread of the core entering
+ * sleep. Last thread needs to execute the hardware bug workaround code if
+ * required by the platform.
+ * Make the workaround call unconditionally here. The below branch call is
+ * patched out when the idle states are discovered if the platform does not
+ * require it.
+ */
+.global pnv_fastsleep_workaround_at_entry
+pnv_fastsleep_workaround_at_entry:
+	beq	fastsleep_workaround_at_entry
+
+	stwcx.	r15,0,r14
+	bne-	lwarx_loop1
+	isync
+
+common_enter: /* common code for all the threads entering sleep or winkle */
+	bgt	cr3,enter_winkle
+	IDLE_STATE_ENTER_SEQ(PPC_SLEEP)
+
+fastsleep_workaround_at_entry:
+	ori	r15,r15,PNV_CORE_IDLE_LOCK_BIT
+	stwcx.	r15,0,r14
+	bne-	lwarx_loop1
+	isync
+
+	/* Fast sleep workaround */
+	li	r3,1
+	li	r4,1
+	li	r0,OPAL_CONFIG_CPU_IDLE_STATE
+	bl	opal_call_realmode
+
+	/* Clear Lock bit */
+	li	r0,0
+	lwsync
+	stw	r0,0(r14)
+	b	common_enter
+
+enter_winkle:
+	/*
+	 * Note all register i.e per-core, per-subcore or per-thread is saved
+	 * here since any thread in the core might wake up first
+	 */
+	mfspr	r3,SPRN_SDR1
+	std	r3,_SDR1(r1)
+	mfspr	r3,SPRN_RPR
+	std	r3,_RPR(r1)
+	mfspr	r3,SPRN_SPURR
+	std	r3,_SPURR(r1)
+	mfspr	r3,SPRN_PURR
+	std	r3,_PURR(r1)
+	mfspr	r3,SPRN_TSCR
+	std	r3,_TSCR(r1)
+	mfspr	r3,SPRN_DSCR
+	std	r3,_DSCR(r1)
+	mfspr	r3,SPRN_AMOR
+	std	r3,_AMOR(r1)
+	mfspr	r3,SPRN_WORT
+	std	r3,_WORT(r1)
+	mfspr	r3,SPRN_WORC
+	std	r3,_WORC(r1)
+	IDLE_STATE_ENTER_SEQ(PPC_WINKLE)
+
+_GLOBAL(power7_idle)
+	/* Now check if user or arch enabled NAP mode */
+	LOAD_REG_ADDRBASE(r3,powersave_nap)
+	lwz	r4,ADDROFF(powersave_nap)(r3)
+	cmpwi	0,r4,0
+	beqlr
+	li	r3, 1
+	/* fall through */
+
+_GLOBAL(power7_nap)
+	mr	r4,r3
+	li	r3,PNV_THREAD_NAP
+	b	power7_powersave_common
+	/* No return */
+
+_GLOBAL(power7_sleep)
+	li	r3,PNV_THREAD_SLEEP
+	li	r4,1
+	b	power7_powersave_common
+	/* No return */
+
+_GLOBAL(power7_winkle)
+	li	r3,PNV_THREAD_WINKLE
+	li	r4,1
+	b	power7_powersave_common
+	/* No return */
+
+#define CHECK_HMI_INTERRUPT						\
+	mfspr	r0,SPRN_SRR1;						\
+BEGIN_FTR_SECTION_NESTED(66);						\
+	rlwinm	r0,r0,45-31,0xf;  /* extract wake reason field (P8) */	\
+FTR_SECTION_ELSE_NESTED(66);						\
+	rlwinm	r0,r0,45-31,0xe;  /* P7 wake reason field is 3 bits */	\
+ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);		\
+	cmpwi	r0,0xa;			/* Hypervisor maintenance ? */	\
+	bne	20f;							\
+	/* Invoke opal call to handle hmi */				\
+	ld	r2,PACATOC(r13);					\
+	ld	r1,PACAR1(r13);						\
+	std	r3,ORIG_GPR3(r1);	/* Save original r3 */		\
+	li	r0,OPAL_HANDLE_HMI;	/* Pass opal token argument*/	\
+	bl	opal_call_realmode;					\
+	ld	r3,ORIG_GPR3(r1);	/* Restore original r3 */	\
+20:	nop;
+
+
+/*
+ * Called from reset vector. Check whether we have woken up with
+ * hypervisor state loss. If yes, restore hypervisor state and return
+ * back to reset vector.
+ *
+ * r13 - Contents of HSPRG0
+ * cr3 - set to gt if waking up with partial/complete hypervisor state loss
+ */
+_GLOBAL(power7_restore_hyp_resource)
+	/*
+	 * Check if last bit of HSPGR0 is set. This indicates whether we are
+	 * waking up from winkle.
+	 */
+	clrldi	r5,r13,63
+	clrrdi	r13,r13,1
+	cmpwi	cr4,r5,1
+	mtspr	SPRN_HSPRG0,r13
+
+	lbz	r0,PACA_THREAD_IDLE_STATE(r13)
+	cmpwi   cr2,r0,PNV_THREAD_NAP
+	bgt     cr2,power7_wakeup_tb_loss	/* Either sleep or Winkle */
+
+	/*
+	 * We fall through here if PACA_THREAD_IDLE_STATE shows we are waking
+	 * up from nap. At this stage CR3 shouldn't contains 'gt' since that
+	 * indicates we are waking with hypervisor state loss from nap.
+	 */
+	bgt	cr3,.
+
+	blr	/* Return back to System Reset vector from where
+		   power7_restore_hyp_resource was invoked */
+
+
+_GLOBAL(power7_wakeup_tb_loss)
+	ld	r2,PACATOC(r13);
+	ld	r1,PACAR1(r13)
+	/*
+	 * Before entering any idle state, the NVGPRs are saved in the stack
+	 * and they are restored before switching to the process context. Hence
+	 * until they are restored, they are free to be used.
+	 *
+	 * Save SRR1 and LR in NVGPRs as they might be clobbered in
+	 * opal_call_realmode (called in CHECK_HMI_INTERRUPT). SRR1 is required
+	 * to determine the wakeup reason if we branch to kvm_start_guest. LR
+	 * is required to return back to reset vector after hypervisor state
+	 * restore is complete.
+	 */
+	mflr	r17
+	mfspr	r16,SPRN_SRR1
+BEGIN_FTR_SECTION
+	CHECK_HMI_INTERRUPT
+END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
+
+	lbz	r7,PACA_THREAD_MASK(r13)
+	ld	r14,PACA_CORE_IDLE_STATE_PTR(r13)
+lwarx_loop2:
+	lwarx	r15,0,r14
+	andi.	r9,r15,PNV_CORE_IDLE_LOCK_BIT
+	/*
+	 * Lock bit is set in one of the 2 cases-
+	 * a. In the sleep/winkle enter path, the last thread is executing
+	 * fastsleep workaround code.
+	 * b. In the wake up path, another thread is executing fastsleep
+	 * workaround undo code or resyncing timebase or restoring context
+	 * In either case loop until the lock bit is cleared.
+	 */
+	bnel	core_idle_lock_held
+
+	cmpwi	cr2,r15,0
+	lbz	r4,PACA_SUBCORE_SIBLING_MASK(r13)
+	and	r4,r4,r15
+	cmpwi	cr1,r4,0	/* Check if first in subcore */
+
+	/*
+	 * At this stage
+	 * cr1 - 0b0100 if first thread to wakeup in subcore
+	 * cr2 - 0b0100 if first thread to wakeup in core
+	 * cr3-  0b0010 if waking up from sleep or winkle
+	 * cr4 - 0b0100 if waking up from winkle
+	 */
+
+	or	r15,r15,r7		/* Set thread bit */
+
+	beq	cr1,first_thread_in_subcore
+
+	/* Not first thread in subcore to wake up */
+	stwcx.	r15,0,r14
+	bne-	lwarx_loop2
+	isync
+	b	common_exit
+
+first_thread_in_subcore:
+	/* First thread in subcore to wakeup */
+	ori	r15,r15,PNV_CORE_IDLE_LOCK_BIT
+	stwcx.	r15,0,r14
+	bne-	lwarx_loop2
+	isync
+
+	/*
+	 * If waking up from sleep, subcore state is not lost. Hence
+	 * skip subcore state restore
+	 */
+	bne	cr4,subcore_state_restored
+
+	/* Restore per-subcore state */
+	ld      r4,_SDR1(r1)
+	mtspr   SPRN_SDR1,r4
+	ld      r4,_RPR(r1)
+	mtspr   SPRN_RPR,r4
+	ld	r4,_AMOR(r1)
+	mtspr	SPRN_AMOR,r4
+
+subcore_state_restored:
+	/*
+	 * Check if the thread is also the first thread in the core. If not,
+	 * skip to clear_lock.
+	 */
+	bne	cr2,clear_lock
+
+first_thread_in_core:
+
+	/*
+	 * First thread in the core waking up from fastsleep. It needs to
+	 * call the fastsleep workaround code if the platform requires it.
+	 * Call it unconditionally here. The below branch instruction will
+	 * be patched out when the idle states are discovered if platform
+	 * does not require workaround.
+	 */
+.global pnv_fastsleep_workaround_at_exit
+pnv_fastsleep_workaround_at_exit:
+	b	fastsleep_workaround_at_exit
+
+timebase_resync:
+	/* Do timebase resync if we are waking up from sleep. Use cr3 value
+	 * set in exceptions-64s.S */
+	ble	cr3,clear_lock
+	/* Time base re-sync */
+	li	r0,OPAL_RESYNC_TIMEBASE
+	bl	opal_call_realmode;
+	/* TODO: Check r3 for failure */
+
+	/*
+	 * If waking up from sleep, per core state is not lost, skip to
+	 * clear_lock.
+	 */
+	bne	cr4,clear_lock
+
+	/* Restore per core state */
+	ld	r4,_TSCR(r1)
+	mtspr	SPRN_TSCR,r4
+	ld	r4,_WORC(r1)
+	mtspr	SPRN_WORC,r4
+
+clear_lock:
+	andi.	r15,r15,PNV_CORE_IDLE_THREAD_BITS
+	lwsync
+	stw	r15,0(r14)
+
+common_exit:
+	/*
+	 * Common to all threads.
+	 *
+	 * If waking up from sleep, hypervisor state is not lost. Hence
+	 * skip hypervisor state restore.
+	 */
+	bne	cr4,hypervisor_state_restored
+
+	/* Waking up from winkle */
+
+	/* Restore per thread state */
+	bl	__restore_cpu_power8
+
+	/* Restore SLB  from PACA */
+	ld	r8,PACA_SLBSHADOWPTR(r13)
+
+	.rept	SLB_NUM_BOLTED
+	li	r3, SLBSHADOW_SAVEAREA
+	LDX_BE	r5, r8, r3
+	addi	r3, r3, 8
+	LDX_BE	r6, r8, r3
+	andis.	r7,r5,SLB_ESID_V@h
+	beq	1f
+	slbmte	r6,r5
+1:	addi	r8,r8,16
+	.endr
+
+	ld	r4,_SPURR(r1)
+	mtspr	SPRN_SPURR,r4
+	ld	r4,_PURR(r1)
+	mtspr	SPRN_PURR,r4
+	ld	r4,_DSCR(r1)
+	mtspr	SPRN_DSCR,r4
+	ld	r4,_WORT(r1)
+	mtspr	SPRN_WORT,r4
+
+hypervisor_state_restored:
+
+	mtspr	SPRN_SRR1,r16
+	mtlr	r17
+	blr	/* Return back to System Reset vector from where
+		   power7_restore_hyp_resource was invoked */
+
+fastsleep_workaround_at_exit:
+	li	r3,1
+	li	r4,0
+	li	r0,OPAL_CONFIG_CPU_IDLE_STATE
+	bl	opal_call_realmode
+	b	timebase_resync
+
+/*
+ * R3 here contains the value that will be returned to the caller
+ * of power7_nap.
+ */
+_GLOBAL(power7_wakeup_loss)
+	ld	r1,PACAR1(r13)
+BEGIN_FTR_SECTION
+	CHECK_HMI_INTERRUPT
+END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
+	REST_NVGPRS(r1)
+	REST_GPR(2, r1)
+	ld	r6,_CCR(r1)
+	ld	r4,_MSR(r1)
+	ld	r5,_NIP(r1)
+	addi	r1,r1,INT_FRAME_SIZE
+	mtcr	r6
+	mtspr	SPRN_SRR1,r4
+	mtspr	SPRN_SRR0,r5
+	rfid
+
+/*
+ * R3 here contains the value that will be returned to the caller
+ * of power7_nap.
+ */
+_GLOBAL(power7_wakeup_noloss)
+	lbz	r0,PACA_NAPSTATELOST(r13)
+	cmpwi	r0,0
+	bne	power7_wakeup_loss
+BEGIN_FTR_SECTION
+	CHECK_HMI_INTERRUPT
+END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
+	ld	r1,PACAR1(r13)
+	ld	r6,_CCR(r1)
+	ld	r4,_MSR(r1)
+	ld	r5,_NIP(r1)
+	addi	r1,r1,INT_FRAME_SIZE
+	mtcr	r6
+	mtspr	SPRN_SRR1,r4
+	mtspr	SPRN_SRR0,r5
+	rfid
-- 
2.1.4


* [PATCH v6 04/11] powerpc/powernv: Rename reusable idle functions to hardware agnostic names
  2016-06-08 16:54 [PATCH v6 00/11] powerpc/powernv/cpuidle: Add support for POWER ISA v3 idle states Shreyas B. Prabhu
                   ` (2 preceding siblings ...)
  2016-06-08 16:54 ` [PATCH v6 03/11] powerpc/powernv: Rename idle_power7.S to idle_power_common.S Shreyas B. Prabhu
@ 2016-06-08 16:54 ` Shreyas B. Prabhu
  2016-06-08 16:54 ` [PATCH v6 05/11] powerpc/powernv: Make pnv_powersave_common more generic Shreyas B. Prabhu
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Shreyas B. Prabhu @ 2016-06-08 16:54 UTC (permalink / raw)
  To: mpe
  Cc: benh, paulus, mikey, ego, maddy, linuxppc-dev, linux-kernel,
	Shreyas B. Prabhu

Functions like power7_wakeup_loss, power7_wakeup_noloss,
power7_wakeup_tb_loss are used by POWER7 and POWER8 hardware. They can
also be used by POWER9. Hence rename these functions to
hardware-agnostic names.

Suggested-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
---
 - No changes since v4

Changes in v4:
==============
 - renaming power7_powersave_common to pnv_powersave_common
 - renaming power7_enter_nap_mode to pnv_enter_arch207_idle_mode

 arch/powerpc/kernel/exceptions-64s.S    |  8 ++++----
 arch/powerpc/kernel/idle_power_common.S | 33 +++++++++++++++++----------------
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  4 ++--
 3 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 4a74d6a..2a123cd 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -108,7 +108,7 @@ BEGIN_FTR_SECTION
 
 	cmpwi	cr3,r13,2
 	GET_PACA(r13)
-	bl	power7_restore_hyp_resource
+	bl	pnv_restore_hyp_resource
 
 	li	r0,PNV_THREAD_RUNNING
 	stb	r0,PACA_THREAD_IDLE_STATE(r13)	/* Clear thread state */
@@ -128,8 +128,8 @@ BEGIN_FTR_SECTION
 	/* Return SRR1 from power7_nap() */
 	mfspr	r3,SPRN_SRR1
 	blt	cr3,2f
-	b	power7_wakeup_loss
-2:	b	power7_wakeup_noloss
+	b	pnv_wakeup_loss
+2:	b	pnv_wakeup_noloss
 
 9:
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
@@ -1269,7 +1269,7 @@ machine_check_handle_early:
 	GET_PACA(r13)
 	ld	r1,PACAR1(r13)
 	li	r3,PNV_THREAD_NAP
-	b	power7_enter_nap_mode
+	b	pnv_enter_arch207_idle_mode
 4:
 #endif
 	/*
diff --git a/arch/powerpc/kernel/idle_power_common.S b/arch/powerpc/kernel/idle_power_common.S
index d5def06..34dbfc9 100644
--- a/arch/powerpc/kernel/idle_power_common.S
+++ b/arch/powerpc/kernel/idle_power_common.S
@@ -1,5 +1,6 @@
 /*
- *  This file contains the power_save function for Power7 CPUs.
+ *  This file contains idle entry/exit functions for POWER7 and
+ *  POWER8 CPUs.
  *
  *  This program is free software; you can redistribute it and/or
  *  modify it under the terms of the GNU General Public License
@@ -75,7 +76,7 @@ core_idle_lock_held:
  * 	0 - don't check
  * 	1 - check
  */
-_GLOBAL(power7_powersave_common)
+_GLOBAL(pnv_powersave_common)
 	/* Use r3 to pass state nap/sleep/winkle */
 	/* NAP is a state loss, we create a regs frame on the
 	 * stack, fill it up with the state we care about and
@@ -135,14 +136,14 @@ _GLOBAL(power7_powersave_common)
 	LOAD_REG_IMMEDIATE(r5, MSR_IDLE)
 	li	r6, MSR_RI
 	andc	r6, r9, r6
-	LOAD_REG_ADDR(r7, power7_enter_nap_mode)
+	LOAD_REG_ADDR(r7, pnv_enter_arch207_idle_mode)
 	mtmsrd	r6, 1		/* clear RI before setting SRR0/1 */
 	mtspr	SPRN_SRR0, r7
 	mtspr	SPRN_SRR1, r5
 	rfid
 
-	.globl	power7_enter_nap_mode
-power7_enter_nap_mode:
+	.globl pnv_enter_arch207_idle_mode
+pnv_enter_arch207_idle_mode:
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 	/* Tell KVM we're napping */
 	li	r4,KVM_HWTHREAD_IN_NAP
@@ -242,19 +243,19 @@ _GLOBAL(power7_idle)
 _GLOBAL(power7_nap)
 	mr	r4,r3
 	li	r3,PNV_THREAD_NAP
-	b	power7_powersave_common
+	b	pnv_powersave_common
 	/* No return */
 
 _GLOBAL(power7_sleep)
 	li	r3,PNV_THREAD_SLEEP
 	li	r4,1
-	b	power7_powersave_common
+	b	pnv_powersave_common
 	/* No return */
 
 _GLOBAL(power7_winkle)
 	li	r3,PNV_THREAD_WINKLE
 	li	r4,1
-	b	power7_powersave_common
+	b	pnv_powersave_common
 	/* No return */
 
 #define CHECK_HMI_INTERRUPT						\
@@ -284,7 +285,7 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);		\
  * r13 - Contents of HSPRG0
  * cr3 - set to gt if waking up with partial/complete hypervisor state loss
  */
-_GLOBAL(power7_restore_hyp_resource)
+_GLOBAL(pnv_restore_hyp_resource)
 	/*
 	 * Check if last bit of HSPGR0 is set. This indicates whether we are
 	 * waking up from winkle.
@@ -296,7 +297,7 @@ _GLOBAL(power7_restore_hyp_resource)
 
 	lbz	r0,PACA_THREAD_IDLE_STATE(r13)
 	cmpwi   cr2,r0,PNV_THREAD_NAP
-	bgt     cr2,power7_wakeup_tb_loss	/* Either sleep or Winkle */
+	bgt     cr2,pnv_wakeup_tb_loss	/* Either sleep or Winkle */
 
 	/*
 	 * We fall through here if PACA_THREAD_IDLE_STATE shows we are waking
@@ -306,10 +307,10 @@ _GLOBAL(power7_restore_hyp_resource)
 	bgt	cr3,.
 
 	blr	/* Return back to System Reset vector from where
-		   power7_restore_hyp_resource was invoked */
+		   pnv_restore_hyp_resource was invoked */
 
 
-_GLOBAL(power7_wakeup_tb_loss)
+_GLOBAL(pnv_wakeup_tb_loss)
 	ld	r2,PACATOC(r13);
 	ld	r1,PACAR1(r13)
 	/*
@@ -476,7 +477,7 @@ hypervisor_state_restored:
 	mtspr	SPRN_SRR1,r16
 	mtlr	r17
 	blr	/* Return back to System Reset vector from where
-		   power7_restore_hyp_resource was invoked */
+		   pnv_restore_hyp_resource was invoked */
 
 fastsleep_workaround_at_exit:
 	li	r3,1
@@ -489,7 +490,7 @@ fastsleep_workaround_at_exit:
  * R3 here contains the value that will be returned to the caller
  * of power7_nap.
  */
-_GLOBAL(power7_wakeup_loss)
+_GLOBAL(pnv_wakeup_loss)
 	ld	r1,PACAR1(r13)
 BEGIN_FTR_SECTION
 	CHECK_HMI_INTERRUPT
@@ -509,10 +510,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
  * R3 here contains the value that will be returned to the caller
  * of power7_nap.
  */
-_GLOBAL(power7_wakeup_noloss)
+_GLOBAL(pnv_wakeup_noloss)
 	lbz	r0,PACA_NAPSTATELOST(r13)
 	cmpwi	r0,0
-	bne	power7_wakeup_loss
+	bne	pnv_wakeup_loss
 BEGIN_FTR_SECTION
 	CHECK_HMI_INTERRUPT
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index e571ad2..86f0cae 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -392,7 +392,7 @@ kvm_no_guest:
 	cmpwi	r3, 0
 	bne	54f
 /*
- * We jump to power7_wakeup_loss, which will return to the caller
+ * We jump to pnv_wakeup_loss, which will return to the caller
  * of power7_nap in the powernv cpu offline loop.  The value we
  * put in r3 becomes the return value for power7_nap.
  */
@@ -401,7 +401,7 @@ kvm_no_guest:
 	rlwimi	r4, r3, 0, LPCR_PECE0 | LPCR_PECE1
 	mtspr	SPRN_LPCR, r4
 	li	r3, 0
-	b	power7_wakeup_loss
+	b	pnv_wakeup_loss
 
 53:	HMT_LOW
 	ld	r5, HSTATE_KVM_VCORE(r13)
-- 
2.1.4


* [PATCH v6 05/11] powerpc/powernv: Make pnv_powersave_common more generic
  2016-06-08 16:54 [PATCH v6 00/11] powerpc/powernv/cpuidle: Add support for POWER ISA v3 idle states Shreyas B. Prabhu
                   ` (3 preceding siblings ...)
  2016-06-08 16:54 ` [PATCH v6 04/11] powerpc/powernv: Rename reusable idle functions to hardware agnostic names Shreyas B. Prabhu
@ 2016-06-08 16:54 ` Shreyas B. Prabhu
  2016-06-08 16:54 ` [PATCH v6 06/11] powerpc/powernv: abstraction for saving SPRs before entering deep idle states Shreyas B. Prabhu
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Shreyas B. Prabhu @ 2016-06-08 16:54 UTC (permalink / raw)
  To: mpe
  Cc: benh, paulus, mikey, ego, maddy, linuxppc-dev, linux-kernel,
	Shreyas B. Prabhu

pnv_powersave_common does common steps needed before entering idle
state and eventually changes MSR to MSR_IDLE and does rfid to
pnv_enter_arch207_idle_mode.

Move the update of HSTATE_HWTHREAD_STATE to pnv_powersave_common
from pnv_enter_arch207_idle_mode and make it more generic by passing the rfid
address as a function parameter.
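
As a rough C analogy of the change described above (the real code is
assembly and passes the rfid target address in r5), the common routine
takes the entry routine as a parameter instead of hard-coding it. All
names below are hypothetical and exist only for illustration; the flag
stands in for the HSTATE_HWTHREAD_STATE update.

```c
#include <stdbool.h>

/* Hypothetical type: the state-specific entry routine handed to the common code. */
typedef int (*sketch_idle_entry_fn)(int state);

/* Hypothetical per-CPU bookkeeping, standing in for fields in the PACA. */
struct sketch_cpu {
	bool irq_happened;	/* something arrived while soft-disabled */
	bool kvm_idle_flag;	/* stands in for HSTATE_HWTHREAD_STATE */
};

/*
 * Common steps before entering idle. The caller chooses the entry
 * routine, so the same function can serve more than one idle mechanism.
 */
static int sketch_powersave_common(struct sketch_cpu *cpu, int state,
				   bool check_irq, sketch_idle_entry_fn entry)
{
	/* Bail out with 0 (no idle) if an interrupt is pending and we were asked to check. */
	if (check_irq && cpu->irq_happened)
		return 0;

	/* The KVM-visible flag is now set here, not in the entry routine. */
	cpu->kvm_idle_flag = true;

	return entry(state);
}

/* Hypothetical entry routine, playing the role of pnv_enter_arch207_idle_mode. */
static int sketch_enter_arch207(int state)
{
	return 100 + state;
}
```

A second entry routine could be passed the same way without touching
sketch_powersave_common, which is the point of making the target a
parameter.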

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
---
 - No changes since v4

Changes in v4:
==============
 - Moved renaming of power7_powersave_common to earlier patch

Changes in v3:
==============
 - Moved HSTATE_HWTHREAD_STATE update to power_powersave_common

 arch/powerpc/kernel/idle_power_common.S | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/idle_power_common.S b/arch/powerpc/kernel/idle_power_common.S
index 34dbfc9..a8397e3 100644
--- a/arch/powerpc/kernel/idle_power_common.S
+++ b/arch/powerpc/kernel/idle_power_common.S
@@ -75,6 +75,8 @@ core_idle_lock_held:
  * To check IRQ_HAPPENED in r4
  * 	0 - don't check
  * 	1 - check
+ *
+ * Address to 'rfid' to is passed in r5
  */
 _GLOBAL(pnv_powersave_common)
 	/* Use r3 to pass state nap/sleep/winkle */
@@ -127,28 +129,28 @@ _GLOBAL(pnv_powersave_common)
 	std	r9,_MSR(r1)
 	std	r1,PACAR1(r13)
 
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	/* Tell KVM we're entering idle */
+	li	r4,KVM_HWTHREAD_IN_NAP
+	stb	r4,HSTATE_HWTHREAD_STATE(r13)
+#endif
+
 	/*
 	 * Go to real mode to do the nap, as required by the architecture.
 	 * Also, we need to be in real mode before setting hwthread_state,
 	 * because as soon as we do that, another thread can switch
 	 * the MMU context to the guest.
 	 */
-	LOAD_REG_IMMEDIATE(r5, MSR_IDLE)
+	LOAD_REG_IMMEDIATE(r7, MSR_IDLE)
 	li	r6, MSR_RI
 	andc	r6, r9, r6
-	LOAD_REG_ADDR(r7, pnv_enter_arch207_idle_mode)
 	mtmsrd	r6, 1		/* clear RI before setting SRR0/1 */
-	mtspr	SPRN_SRR0, r7
-	mtspr	SPRN_SRR1, r5
+	mtspr	SPRN_SRR0, r5
+	mtspr	SPRN_SRR1, r7
 	rfid
 
 	.globl pnv_enter_arch207_idle_mode
 pnv_enter_arch207_idle_mode:
-#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-	/* Tell KVM we're napping */
-	li	r4,KVM_HWTHREAD_IN_NAP
-	stb	r4,HSTATE_HWTHREAD_STATE(r13)
-#endif
 	stb	r3,PACA_THREAD_IDLE_STATE(r13)
 	cmpwi	cr3,r3,PNV_THREAD_SLEEP
 	bge	cr3,2f
@@ -243,18 +245,21 @@ _GLOBAL(power7_idle)
 _GLOBAL(power7_nap)
 	mr	r4,r3
 	li	r3,PNV_THREAD_NAP
+	LOAD_REG_ADDR(r5, pnv_enter_arch207_idle_mode)
 	b	pnv_powersave_common
 	/* No return */
 
 _GLOBAL(power7_sleep)
 	li	r3,PNV_THREAD_SLEEP
 	li	r4,1
+	LOAD_REG_ADDR(r5, pnv_enter_arch207_idle_mode)
 	b	pnv_powersave_common
 	/* No return */
 
 _GLOBAL(power7_winkle)
 	li	r3,PNV_THREAD_WINKLE
 	li	r4,1
+	LOAD_REG_ADDR(r5, pnv_enter_arch207_idle_mode)
 	b	pnv_powersave_common
 	/* No return */
 
-- 
2.1.4


* [PATCH v6 06/11] powerpc/powernv: abstraction for saving SPRs before entering deep idle states
  2016-06-08 16:54 [PATCH v6 00/11] powerpc/powernv/cpuidle: Add support for POWER ISA v3 idle states Shreyas B. Prabhu
                   ` (4 preceding siblings ...)
  2016-06-08 16:54 ` [PATCH v6 05/11] powerpc/powernv: Make pnv_powersave_common more generic Shreyas B. Prabhu
@ 2016-06-08 16:54 ` Shreyas B. Prabhu
  2016-06-08 16:54 ` [PATCH v6 07/11] powerpc/powernv: set power_save func after the idle states are initialized Shreyas B. Prabhu
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Shreyas B. Prabhu @ 2016-06-08 16:54 UTC (permalink / raw)
  To: mpe
  Cc: benh, paulus, mikey, ego, maddy, linuxppc-dev, linux-kernel,
	Shreyas B. Prabhu

Create a function for saving SPRs before entering deep idle states.
This function can be reused for POWER9 deep idle states.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
---
 - No changes since v3

Changes in v3:
=============
 - Newly added in v3

 arch/powerpc/kernel/idle_power_common.S | 54 +++++++++++++++++++--------------
 1 file changed, 32 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/idle_power_common.S b/arch/powerpc/kernel/idle_power_common.S
index a8397e3..2f909a1 100644
--- a/arch/powerpc/kernel/idle_power_common.S
+++ b/arch/powerpc/kernel/idle_power_common.S
@@ -53,6 +53,36 @@
 	.text
 
 /*
+ * Used by threads before entering deep idle states. Saves SPRs
+ * in interrupt stack frame
+ */
+save_sprs_to_stack:
+	/*
+	 * Note all register i.e per-core, per-subcore or per-thread is saved
+	 * here since any thread in the core might wake up first
+	 */
+	mfspr	r3,SPRN_SDR1
+	std	r3,_SDR1(r1)
+	mfspr	r3,SPRN_RPR
+	std	r3,_RPR(r1)
+	mfspr	r3,SPRN_SPURR
+	std	r3,_SPURR(r1)
+	mfspr	r3,SPRN_PURR
+	std	r3,_PURR(r1)
+	mfspr	r3,SPRN_TSCR
+	std	r3,_TSCR(r1)
+	mfspr	r3,SPRN_DSCR
+	std	r3,_DSCR(r1)
+	mfspr	r3,SPRN_AMOR
+	std	r3,_AMOR(r1)
+	mfspr	r3,SPRN_WORT
+	std	r3,_WORT(r1)
+	mfspr	r3,SPRN_WORC
+	std	r3,_WORC(r1)
+
+	blr
+
+/*
  * Used by threads when the lock bit of core_idle_state is set.
  * Threads will spin in HMT_LOW until the lock bit is cleared.
  * r14 - pointer to core_idle_state
@@ -209,28 +239,8 @@ fastsleep_workaround_at_entry:
 	b	common_enter
 
 enter_winkle:
-	/*
-	 * Note all register i.e per-core, per-subcore or per-thread is saved
-	 * here since any thread in the core might wake up first
-	 */
-	mfspr	r3,SPRN_SDR1
-	std	r3,_SDR1(r1)
-	mfspr	r3,SPRN_RPR
-	std	r3,_RPR(r1)
-	mfspr	r3,SPRN_SPURR
-	std	r3,_SPURR(r1)
-	mfspr	r3,SPRN_PURR
-	std	r3,_PURR(r1)
-	mfspr	r3,SPRN_TSCR
-	std	r3,_TSCR(r1)
-	mfspr	r3,SPRN_DSCR
-	std	r3,_DSCR(r1)
-	mfspr	r3,SPRN_AMOR
-	std	r3,_AMOR(r1)
-	mfspr	r3,SPRN_WORT
-	std	r3,_WORT(r1)
-	mfspr	r3,SPRN_WORC
-	std	r3,_WORC(r1)
+	bl	save_sprs_to_stack
+
 	IDLE_STATE_ENTER_SEQ(PPC_WINKLE)
 
 _GLOBAL(power7_idle)
-- 
2.1.4


* [PATCH v6 07/11] powerpc/powernv: set power_save func after the idle states are initialized
  2016-06-08 16:54 [PATCH v6 00/11] powerpc/powernv/cpuidle: Add support for POWER ISA v3 idle states Shreyas B. Prabhu
                   ` (5 preceding siblings ...)
  2016-06-08 16:54 ` [PATCH v6 06/11] powerpc/powernv: abstraction for saving SPRs before entering deep idle states Shreyas B. Prabhu
@ 2016-06-08 16:54 ` Shreyas B. Prabhu
  2016-06-15  5:41   ` [v6, " Michael Ellerman
                     ` (2 more replies)
  2016-06-08 16:54 ` [PATCH v6 08/11] powerpc/powernv: Add platform support for stop instruction Shreyas B. Prabhu
                   ` (3 subsequent siblings)
  10 siblings, 3 replies; 30+ messages in thread
From: Shreyas B. Prabhu @ 2016-06-08 16:54 UTC (permalink / raw)
  To: mpe
  Cc: benh, paulus, mikey, ego, maddy, linuxppc-dev, linux-kernel,
	Shreyas B. Prabhu

pnv_init_idle_states discovers supported idle states from the
device tree and does the required initialization. Set power_save
function pointer only after this initialization is done.
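
The ordering can be sketched in plain C as follows. This is a
simplified model, not the kernel code: the struct, the function names
and the call counter are hypothetical, and only the flag value is taken
from OPAL_PM_NAP_ENABLED as defined in this series.

```c
#include <stddef.h>
#include <stdint.h>

/* Same value as OPAL_PM_NAP_ENABLED in this series; the name is hypothetical. */
#define SKETCH_PM_NAP_ENABLED	0x00010000u

/* Hypothetical stand-in for the power_save hook in ppc_md. */
struct sketch_machdep {
	void (*power_save)(void);
};

/* The hook starts out NULL, as in the setup.c change. */
static struct sketch_machdep sketch_md = { .power_save = NULL };

static int sketch_nap_calls;

/* Hypothetical idle routine, playing the role of power7_idle. */
static void sketch_nap_idle(void)
{
	sketch_nap_calls++;
}

/* Install the hook only once the supported states are known. */
static void sketch_init_idle_states(uint32_t supported_states)
{
	if (supported_states & SKETCH_PM_NAP_ENABLED)
		sketch_md.power_save = sketch_nap_idle;
}

/* One pass of an idle loop: use the hook only if it has been installed. */
static void sketch_idle_once(void)
{
	if (sketch_md.power_save)
		sketch_md.power_save();
}
```

Until sketch_init_idle_states runs with the nap flag set, the idle pass
never reaches the nap routine.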

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
---
- No changes since v1

 arch/powerpc/platforms/powernv/idle.c  | 3 +++
 arch/powerpc/platforms/powernv/setup.c | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index fcc8b68..fbb09fb 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -285,6 +285,9 @@ static int __init pnv_init_idle_states(void)
 	}
 
 	pnv_alloc_idle_core_states();
+
+	if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED)
+		ppc_md.power_save = power7_idle;
 out_free:
 	kfree(flags);
 out:
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index ee6430b..8492bbb 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -315,7 +315,7 @@ define_machine(powernv) {
 	.get_proc_freq          = pnv_get_proc_freq,
 	.progress		= pnv_progress,
 	.machine_shutdown	= pnv_shutdown,
-	.power_save             = power7_idle,
+	.power_save             = NULL,
 	.calibrate_decr		= generic_calibrate_decr,
 #ifdef CONFIG_KEXEC
 	.kexec_cpu_down		= pnv_kexec_cpu_down,
-- 
2.1.4


* [PATCH v6 08/11] powerpc/powernv: Add platform support for stop instruction
  2016-06-08 16:54 [PATCH v6 00/11] powerpc/powernv/cpuidle: Add support for POWER ISA v3 idle states Shreyas B. Prabhu
                   ` (6 preceding siblings ...)
  2016-06-08 16:54 ` [PATCH v6 07/11] powerpc/powernv: set power_save func after the idle states are initialized Shreyas B. Prabhu
@ 2016-06-08 16:54 ` Shreyas B. Prabhu
  2016-06-15 11:14   ` [v6, " Michael Ellerman
  2016-06-08 16:54 ` [PATCH v6 09/11] cpuidle/powernv: Use CPUIDLE_STATE_MAX instead of MAX_POWERNV_IDLE_STATES Shreyas B. Prabhu
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Shreyas B. Prabhu @ 2016-06-08 16:54 UTC (permalink / raw)
  To: mpe
  Cc: benh, paulus, mikey, ego, maddy, linuxppc-dev, linux-kernel,
	Shreyas B. Prabhu

POWER ISA v3 defines a new idle processor core mechanism. In summary,
 a) A new instruction named stop is added. This instruction replaces
	instructions like nap, sleep, rvwinkle.
 b) A new per-thread SPR named Processor Stop Status and Control Register
	(PSSCR) is added, which controls the behavior of the stop instruction.

PSSCR layout:
----------------------------------------------------------
| PLS | /// | SD | ESL | EC | PSLL | /// | TR | MTL | RL |
----------------------------------------------------------
0      4     41   42    43   44     48    54   56    60

PSSCR key fields:
	Bits 0:3  - Power-Saving Level Status. This field indicates the lowest
	power-saving state the thread entered since the stop instruction was
	last executed.

	Bit 42 - Enable State Loss
	0 - No state is lost irrespective of other fields
	1 - Allows state loss

	Bits 44:47 - Power-Saving Level Limit
	This limits the power-saving level that can be entered into.

	Bits 60:63 - Requested Level
	Used to specify which power-saving level must be entered on executing
	the stop instruction.
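
For readers unfamiliar with the bit numbering, the key fields above can
be turned into masks as in the sketch below. Bit 0 is the most
significant bit, so a field ending at bit N is shifted left by 63 - N.
The macro and function names are hypothetical and are not the ones this
patch adds to reg.h; only the bit positions come from the layout above.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bits are numbered from the most significant end,
 * so a field whose last bit is N sits at shift 63 - N.
 */
#define SKETCH_SHIFT(last_bit)	(63 - (last_bit))

#define SKETCH_PSSCR_PLS_MASK	(0xFULL << SKETCH_SHIFT(3))	/* bits 0:3   */
#define SKETCH_PSSCR_ESL	(0x1ULL << SKETCH_SHIFT(42))	/* bit 42     */
#define SKETCH_PSSCR_PSLL_MASK	(0xFULL << SKETCH_SHIFT(47))	/* bits 44:47 */
#define SKETCH_PSSCR_RL_MASK	(0xFULL << SKETCH_SHIFT(63))	/* bits 60:63 */

/* Compose a PSSCR request from a requested level, a level limit and ESL. */
static inline uint64_t sketch_psscr_request(unsigned int rl, unsigned int psll,
					    int esl)
{
	uint64_t val = 0;

	val |= ((uint64_t)rl << SKETCH_SHIFT(63)) & SKETCH_PSSCR_RL_MASK;
	val |= ((uint64_t)psll << SKETCH_SHIFT(47)) & SKETCH_PSSCR_PSLL_MASK;
	if (esl)
		val |= SKETCH_PSSCR_ESL;
	return val;
}

/* Extract the Power-Saving Level Status field from a PSSCR value. */
static inline uint64_t sketch_psscr_pls(uint64_t psscr)
{
	return (psscr & SKETCH_PSSCR_PLS_MASK) >> SKETCH_SHIFT(3);
}
```

The EC, SD, TR and MTL fields from the layout are left out here to keep
the sketch short.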

This patch adds support for the stop instruction and PSSCR handling.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
---
Changes in v6
=============
 - Save/restore new P9 SPRs when using deep idle states

Changes in v4:
==============
 - Added PSSCR layout to commit message
 - Improved / Fixed comments
 - Fixed whitespace error in paca.h
 - Using MAX_POSSIBLE_STOP_STATE macro instead of hardcoding 0xF as 
   max possible stop state

Changes in v3:
==============
 - Instead of introducing new file idle_power_stop.S, P9 idle support
   is added to idle_power_common.S using CPU_FTR sections.
 - Fixed r4 reg clobbering in power_stop0
 - Improved comments

Changes in v2:
==============
 - Using CPU_FTR_ARCH_300 bit instead of CPU_FTR_STOP_INST

 arch/powerpc/include/asm/cpuidle.h        |   2 +
 arch/powerpc/include/asm/kvm_book3s_asm.h |   2 +-
 arch/powerpc/include/asm/machdep.h        |   1 +
 arch/powerpc/include/asm/opal-api.h       |  11 +-
 arch/powerpc/include/asm/paca.h           |   2 +
 arch/powerpc/include/asm/ppc-opcode.h     |   4 +
 arch/powerpc/include/asm/processor.h      |   1 +
 arch/powerpc/include/asm/reg.h            |  14 +++
 arch/powerpc/kernel/asm-offsets.c         |   2 +
 arch/powerpc/kernel/idle_power_common.S   | 175 +++++++++++++++++++++++++++---
 arch/powerpc/platforms/powernv/idle.c     |  84 ++++++++++++--
 11 files changed, 265 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
index d2f99ca..3d7fc06 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -13,6 +13,8 @@
 #ifndef __ASSEMBLY__
 extern u32 pnv_fastsleep_workaround_at_entry[];
 extern u32 pnv_fastsleep_workaround_at_exit[];
+
+extern u64 pnv_first_deep_stop_state;
 #endif
 
 #endif
diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 72b6225..d318d43 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -162,7 +162,7 @@ struct kvmppc_book3s_shadow_vcpu {
 
 /* Values for kvm_state */
 #define KVM_HWTHREAD_IN_KERNEL	0
-#define KVM_HWTHREAD_IN_NAP	1
+#define KVM_HWTHREAD_IN_IDLE	1
 #define KVM_HWTHREAD_IN_KVM	2
 
 #endif /* __ASM_KVM_BOOK3S_ASM_H__ */
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 6bdcd0d..ae3b155 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -262,6 +262,7 @@ struct machdep_calls {
 extern void e500_idle(void);
 extern void power4_idle(void);
 extern void power7_idle(void);
+extern void power_stop0(void);
 extern void ppc6xx_idle(void);
 extern void book3e_idle(void);
 
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 9bb8ddf..7f3f8c6 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -162,13 +162,20 @@
 
 /* Device tree flags */
 
-/* Flags set in power-mgmt nodes in device tree if
- * respective idle states are supported in the platform.
+/*
+ * Flags set in power-mgmt nodes in device tree describing
+ * idle states that are supported in the platform.
  */
+
+#define OPAL_PM_TIMEBASE_STOP		0x00000002
+#define OPAL_PM_LOSE_HYP_CONTEXT	0x00002000
+#define OPAL_PM_LOSE_FULL_CONTEXT	0x00004000
 #define OPAL_PM_NAP_ENABLED		0x00010000
 #define OPAL_PM_SLEEP_ENABLED		0x00020000
 #define OPAL_PM_WINKLE_ENABLED		0x00040000
 #define OPAL_PM_SLEEP_ENABLED_ER1	0x00080000 /* with workaround */
+#define OPAL_PM_STOP_INST_FAST		0x00100000
+#define OPAL_PM_STOP_INST_DEEP		0x00200000
 
 /*
  * OPAL_CONFIG_CPU_IDLE_STATE parameters
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 546540b..ae91b44 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -171,6 +171,8 @@ struct paca_struct {
 	/* Mask to denote subcore sibling threads */
 	u8 subcore_sibling_mask;
 #endif
+	/* Template for PSSCR with EC, ESL, TR, PSLL, MTL fields set */
+	u64 thread_psscr;
 
 #ifdef CONFIG_PPC_BOOK3S_64
 	/* Exclusive emergency stack pointer for machine check exception. */
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 1d035c1..6a8e43b 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -199,6 +199,8 @@
 #define PPC_INST_SLEEP			0x4c0003a4
 #define PPC_INST_WINKLE			0x4c0003e4
 
+#define PPC_INST_STOP			0x4c0002e4
+
 /* A2 specific instructions */
 #define PPC_INST_ERATWE			0x7c0001a6
 #define PPC_INST_ERATRE			0x7c000166
@@ -370,6 +372,8 @@
 #define PPC_SLEEP		stringify_in_c(.long PPC_INST_SLEEP)
 #define PPC_WINKLE		stringify_in_c(.long PPC_INST_WINKLE)
 
+#define PPC_STOP		stringify_in_c(.long PPC_INST_STOP)
+
 /* BHRB instructions */
 #define PPC_CLRBHRB		stringify_in_c(.long PPC_INST_CLRBHRB)
 #define PPC_MFBHRBE(r, n)	stringify_in_c(.long PPC_INST_BHRBE | \
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 009fab1..7f92fc8 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -457,6 +457,7 @@ extern int powersave_nap;	/* set if nap mode can be used in idle loop */
 extern unsigned long power7_nap(int check_irq);
 extern unsigned long power7_sleep(void);
 extern unsigned long power7_winkle(void);
+extern unsigned long power_stop(unsigned long state);
 extern void flush_instruction_cache(void);
 extern void hard_reset_now(void);
 extern void poweroff_now(void);
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index a0948f4..89a00d9 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -145,6 +145,16 @@
 #define MSR_64BIT	0
 #endif
 
+/* Power Management - PSSCR Fields */
+#define PSSCR_RL_MASK		0x0000000F
+#define PSSCR_MTL_MASK		0x000000F0
+#define PSSCR_TR_MASK		0x00000300
+#define PSSCR_PSLL_MASK		0x000F0000
+#define PSSCR_EC		0x00100000
+#define PSSCR_ESL		0x00200000
+#define PSSCR_SD		0x00400000
+
+
 /* Floating Point Status and Control Register (FPSCR) Fields */
 #define FPSCR_FX	0x80000000	/* FPU exception summary */
 #define FPSCR_FEX	0x40000000	/* FPU enabled exception summary */
@@ -288,6 +298,7 @@
 #define SPRN_PMICR	0x354   /* Power Management Idle Control Reg */
 #define SPRN_PMSR	0x355   /* Power Management Status Reg */
 #define SPRN_PMMAR	0x356	/* Power Management Memory Activity Register */
+#define SPRN_PSSCR	0x357	/* Processor Stop Status and Control Register */
 #define SPRN_PMCR	0x374	/* Power Management Control Register */
 
 /* HFSCR and FSCR bit numbers are the same */
@@ -761,6 +772,9 @@
 #define   SIER_SDAR_VALID	0x0200000	/* SDAR contents valid */
 #define SPRN_SIAR	796
 #define SPRN_SDAR	797
+#define SPRN_LMRR	813
+#define SPRN_LMSER	814
+#define SPRN_ASDR	816
 #define SPRN_TACR	888
 #define SPRN_TCSCR	889
 #define SPRN_CSIGR	890
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 9ea0955..670d2a7 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -779,6 +779,8 @@ int main(void)
 			offsetof(struct paca_struct, thread_mask));
 	DEFINE(PACA_SUBCORE_SIBLING_MASK,
 			offsetof(struct paca_struct, subcore_sibling_mask));
+	DEFINE(PACA_THREAD_PSSCR,
+			offsetof(struct paca_struct, thread_psscr));
 #endif
 
 	DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
diff --git a/arch/powerpc/kernel/idle_power_common.S b/arch/powerpc/kernel/idle_power_common.S
index 2f909a1..c6c2f66 100644
--- a/arch/powerpc/kernel/idle_power_common.S
+++ b/arch/powerpc/kernel/idle_power_common.S
@@ -1,6 +1,6 @@
 /*
- *  This file contains idle entry/exit functions for POWER7 and
- *  POWER8 CPUs.
+ *  This file contains idle entry/exit functions for POWER7,
+ *  POWER8 and POWER9 CPUs.
  *
  *  This program is free software; you can redistribute it and/or
  *  modify it under the terms of the GNU General Public License
@@ -21,6 +21,7 @@
 #include <asm/opal.h>
 #include <asm/cpuidle.h>
 #include <asm/book3s/64/mmu-hash.h>
+#include <asm/mmu.h>
 
 #undef DEBUG
 
@@ -28,6 +29,8 @@
  * Use unused space in the interrupt stack to save and restore
  * registers for winkle support.
  */
+#define _LMRR	GPR0
+#define _LMSER	GPR1
 #define _SDR1	GPR3
 #define _RPR	GPR4
 #define _SPURR	GPR5
@@ -37,6 +40,8 @@
 #define _AMOR	GPR9
 #define _WORT	GPR10
 #define _WORC	GPR11
+#define _PTCR	GPR12
+#define _ASDR	GPR13
 
 /* Idle state entry routines */
 
@@ -50,6 +55,15 @@
 	IDLE_INST;						\
 	b	.
 
+/*
+ * rA - Requested stop state
+ * rB - Spare reg that can be used
+ */
+#define PSSCR_REQUEST_STATE(rA, rB) 		\
+	ld	rB, PACA_THREAD_PSSCR(r13);	\
+	or	rB,rB,rA;			\
+	mtspr	SPRN_PSSCR, rB;
+
 	.text
 
 /*
@@ -61,8 +75,19 @@ save_sprs_to_stack:
 	 * Note all register i.e per-core, per-subcore or per-thread is saved
 	 * here since any thread in the core might wake up first
 	 */
+BEGIN_FTR_SECTION
+	mfspr	r3,SPRN_PTCR
+	std	r3,_PTCR(r1)
+	mfspr	r3,SPRN_LMRR
+	std	r3,_LMRR(r1)
+	mfspr	r3,SPRN_LMSER
+	std	r3,_LMSER(r1)
+	mfspr	r3,SPRN_ASDR
+	std	r3,_ASDR(r1)
+FTR_SECTION_ELSE
 	mfspr	r3,SPRN_SDR1
 	std	r3,_SDR1(r1)
+ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
 	mfspr	r3,SPRN_RPR
 	std	r3,_RPR(r1)
 	mfspr	r3,SPRN_SPURR
@@ -100,7 +125,8 @@ core_idle_lock_held:
 
 /*
  * Pass requested state in r3:
- *	r3 - PNV_THREAD_NAP/SLEEP/WINKLE
+ *	r3 - PNV_THREAD_NAP/SLEEP/WINKLE in POWER8
+ *	   - Requested STOP state in POWER9
  *
  * To check IRQ_HAPPENED in r4
  * 	0 - don't check
@@ -161,7 +187,7 @@ _GLOBAL(pnv_powersave_common)
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 	/* Tell KVM we're entering idle */
-	li	r4,KVM_HWTHREAD_IN_NAP
+	li	r4,KVM_HWTHREAD_IN_IDLE
 	stb	r4,HSTATE_HWTHREAD_STATE(r13)
 #endif
 
@@ -243,6 +269,41 @@ enter_winkle:
 
 	IDLE_STATE_ENTER_SEQ(PPC_WINKLE)
 
+/*
+ * r3 - requested stop state
+ */
+power_enter_stop:
+/*
+ * Check if the requested state is a deep idle state.
+ */
+	LOAD_REG_ADDRBASE(r5,pnv_first_deep_stop_state)
+	ld	r4,ADDROFF(pnv_first_deep_stop_state)(r5)
+	cmpd	r3,r4
+	bge	2f
+	IDLE_STATE_ENTER_SEQ(PPC_STOP)
+2:
+/*
+ * Entering deep idle state.
+ * Clear thread bit in PACA_CORE_IDLE_STATE, save SPRs to
+ * stack and enter stop
+ */
+	lbz     r7,PACA_THREAD_MASK(r13)
+	ld      r14,PACA_CORE_IDLE_STATE_PTR(r13)
+
+lwarx_loop_stop:
+	lwarx   r15,0,r14
+	andi.   r9,r15,PNV_CORE_IDLE_LOCK_BIT
+	bnel    core_idle_lock_held
+	andc    r15,r15,r7                      /* Clear thread bit */
+
+	stwcx.  r15,0,r14
+	bne-    lwarx_loop_stop
+	isync
+
+	bl	save_sprs_to_stack
+
+	IDLE_STATE_ENTER_SEQ(PPC_STOP)
+
 _GLOBAL(power7_idle)
 	/* Now check if user or arch enabled NAP mode */
 	LOAD_REG_ADDRBASE(r3,powersave_nap)
@@ -293,6 +354,21 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);		\
 
 
 /*
+ * Used for ppc_md.power_save which needs a function with no parameters
+ */
+_GLOBAL(power_stop0)
+	li	r3,0
+	/* Fall through to power_stop */
+/*
+ * r3 - requested stop state
+ */
+_GLOBAL(power_stop)
+	PSSCR_REQUEST_STATE(r3,r4)
+	li	r4, 1
+	LOAD_REG_ADDR(r5,power_enter_stop)
+	b	pnv_powersave_common
+	/* No return */
+/*
  * Called from reset vector. Check whether we have woken up with
  * hypervisor state loss. If yes, restore hypervisor state and return
  * back to reset vector.
@@ -301,7 +377,32 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);		\
  * cr3 - set to gt if waking up with partial/complete hypervisor state loss
  */
 _GLOBAL(pnv_restore_hyp_resource)
+BEGIN_FTR_SECTION
 	/*
+	 * POWER ISA 3. Use PSSCR to determine if we
+	 * are waking up from deep idle state
+	 */
+	LOAD_REG_ADDRBASE(r5,pnv_first_deep_stop_state)
+	ld	r4,ADDROFF(pnv_first_deep_stop_state)(r5)
+
+	mfspr	r5,SPRN_PSSCR
+	/*
+	 * 0-3 bits correspond to Power-Saving Level Status
+	 * which indicates the idle state we are waking up from
+	 */
+	rldicl  r5,r5,4,60
+	cmpd	cr4,r5,r4
+	bge	cr4,pnv_wakeup_tb_loss
+	/*
+	 * Waking up without hypervisor state loss. Return to
+	 * reset vector
+	 */
+	blr
+
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
+
+	/*
+	 * POWER ISA 2.07 or less.
 	 * Check if last bit of HSPGR0 is set. This indicates whether we are
 	 * waking up from winkle.
 	 */
@@ -324,7 +425,16 @@ _GLOBAL(pnv_restore_hyp_resource)
 	blr	/* Return back to System Reset vector from where
 		   pnv_restore_hyp_resource was invoked */
 
-
+/*
+ * Called if waking up from idle state which can cause either partial or
+ * complete hyp state loss.
+ * In POWER8, called if waking up from fastsleep or winkle
+ * In POWER9, called if waking up from stop state >= pnv_first_deep_stop_state
+ *
+ * r13 - PACA
+ * cr3 - gt if waking up with partial/complete hypervisor state loss
+ * cr4 - eq if waking up from complete hypervisor state loss.
+ */
 _GLOBAL(pnv_wakeup_tb_loss)
 	ld	r2,PACATOC(r13);
 	ld	r1,PACAR1(r13)
@@ -367,10 +477,10 @@ lwarx_loop2:
 
 	/*
 	 * At this stage
-	 * cr1 - 0b0100 if first thread to wakeup in subcore
-	 * cr2 - 0b0100 if first thread to wakeup in core
-	 * cr3-  0b0010 if waking up from sleep or winkle
-	 * cr4 - 0b0100 if waking up from winkle
+	 * cr1 - eq if first thread to wakeup in subcore
+	 * cr2 - eq if first thread to wakeup in core
+	 * cr3-  gt if waking up with partial/complete hypervisor state loss
+	 * cr4 - eq if waking up from complete hypervisor state loss.
 	 */
 
 	or	r15,r15,r7		/* Set thread bit */
@@ -397,8 +507,11 @@ first_thread_in_subcore:
 	bne	cr4,subcore_state_restored
 
 	/* Restore per-subcore state */
+BEGIN_FTR_SECTION
 	ld      r4,_SDR1(r1)
 	mtspr   SPRN_SDR1,r4
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
+
 	ld      r4,_RPR(r1)
 	mtspr   SPRN_RPR,r4
 	ld	r4,_AMOR(r1)
@@ -414,19 +527,23 @@ subcore_state_restored:
 first_thread_in_core:
 
 	/*
-	 * First thread in the core waking up from fastsleep. It needs to
+	 * First thread in the core waking up from any state which can cause
+	 * partial or complete hypervisor state loss. It needs to
 	 * call the fastsleep workaround code if the platform requires it.
 	 * Call it unconditionally here. The below branch instruction will
-	 * be patched out when the idle states are discovered if platform
-	 * does not require workaround.
+	 * be patched out if the platform does not have fastsleep or does not
+	 * require the workaround. Patching will be performed during the
+	 * discovery of idle-states.
 	 */
 .global pnv_fastsleep_workaround_at_exit
 pnv_fastsleep_workaround_at_exit:
 	b	fastsleep_workaround_at_exit
 
 timebase_resync:
-	/* Do timebase resync if we are waking up from sleep. Use cr3 value
-	 * set in exceptions-64s.S */
+	/*
+	 * Use cr3 which indicates that we are waking up with at least partial
+	 * hypervisor state loss to determine if TIMEBASE RESYNC is needed.
+	 */
 	ble	cr3,clear_lock
 	/* Time base re-sync */
 	li	r0,OPAL_RESYNC_TIMEBASE
@@ -439,7 +556,16 @@ timebase_resync:
 	 */
 	bne	cr4,clear_lock
 
-	/* Restore per core state */
+	/*
+	 * First thread in the core to wake up and it's waking up with
+	 * complete hypervisor state loss. Restore per core hypervisor
+	 * state.
+	 */
+BEGIN_FTR_SECTION
+	ld	r4,_PTCR(r1)
+	mtspr	SPRN_PTCR,r4
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
+
 	ld	r4,_TSCR(r1)
 	mtspr	SPRN_TSCR,r4
 	ld	r4,_WORC(r1)
@@ -461,9 +587,7 @@ common_exit:
 
 	/* Waking up from winkle */
 
-	/* Restore per thread state */
-	bl	__restore_cpu_power8
-
+BEGIN_MMU_FTR_SECTION
 	/* Restore SLB  from PACA */
 	ld	r8,PACA_SLBSHADOWPTR(r13)
 
@@ -477,6 +601,21 @@ common_exit:
 	slbmte	r6,r5
 1:	addi	r8,r8,16
 	.endr
+END_MMU_FTR_SECTION_IFCLR(MMU_FTR_RADIX)
+
+	/* Restore per thread state */
+BEGIN_FTR_SECTION
+	bl      __restore_cpu_power9
+
+	ld	r4,_LMRR(r1)
+	mtspr	SPRN_LMRR,r4
+	ld	r4,_LMSER(r1)
+	mtspr	SPRN_LMSER,r4
+	ld	r4,_ASDR(r1)
+	mtspr	SPRN_ASDR,r4
+FTR_SECTION_ELSE
+	bl	__restore_cpu_power8
+ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
 
 	ld	r4,_SPURR(r1)
 	mtspr	SPRN_SPURR,r4
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index fbb09fb..bfbd359 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -27,9 +27,11 @@
 #include "powernv.h"
 #include "subcore.h"
 
+#define MAX_STOP_STATE	0xF
+
 static u32 supported_cpuidle_states;
 
-int pnv_save_sprs_for_winkle(void)
+int pnv_save_sprs_for_deep_states(void)
 {
 	int cpu;
 	int rc;
@@ -50,15 +52,19 @@ int pnv_save_sprs_for_winkle(void)
 		uint64_t pir = get_hard_smp_processor_id(cpu);
 		uint64_t hsprg0_val = (uint64_t)&paca[cpu];
 
-		/*
-		 * HSPRG0 is used to store the cpu's pointer to paca. Hence last
-		 * 3 bits are guaranteed to be 0. Program slw to restore HSPRG0
-		 * with 63rd bit set, so that when a thread wakes up at 0x100 we
-		 * can use this bit to distinguish between fastsleep and
-		 * deep winkle.
-		 */
-		hsprg0_val |= 1;
-
+		if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
+			/*
+			 * HSPRG0 is used to store the cpu's pointer to paca.
+			 * Hence last 3 bits are guaranteed to be 0. Program
+			 * slw to restore HSPRG0 with 63rd bit set, so that
+			 * when a thread wakes up at 0x100 we can use this bit
+			 * to distinguish between fastsleep and deep winkle.
+			 * This is not necessary with stop/psscr since PLS
+			 * field of psscr indicates which state we are waking
+			 * up from.
+			 */
+			hsprg0_val |= 1;
+		}
 		rc = opal_slw_set_reg(pir, SPRN_HSPRG0, hsprg0_val);
 		if (rc != 0)
 			return rc;
@@ -130,8 +136,8 @@ static void pnv_alloc_idle_core_states(void)
 
 	update_subcore_sibling_mask();
 
-	if (supported_cpuidle_states & OPAL_PM_WINKLE_ENABLED)
-		pnv_save_sprs_for_winkle();
+	if (supported_cpuidle_states & OPAL_PM_LOSE_FULL_CONTEXT)
+		pnv_save_sprs_for_deep_states();
 }
 
 u32 pnv_get_supported_cpuidle_states(void)
@@ -230,11 +236,18 @@ static DEVICE_ATTR(fastsleep_workaround_applyonce, 0600,
 			show_fastsleep_workaround_applyonce,
 			store_fastsleep_workaround_applyonce);
 
+/*
+ * First deep stop state. Used to figure out when to save/restore
+ * hypervisor context.
+ */
+u64 pnv_first_deep_stop_state;
+
 static int __init pnv_init_idle_states(void)
 {
 	struct device_node *power_mgt;
 	int dt_idle_states;
 	u32 *flags;
+	u64 *psscr_val = NULL;
 	int i;
 
 	supported_cpuidle_states = 0;
@@ -264,6 +277,32 @@ static int __init pnv_init_idle_states(void)
 		goto out_free;
 	}
 
+	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+		psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val),
+					GFP_KERNEL);
+		if (!psscr_val)
+			goto out_free;
+		if (of_property_read_u64_array(power_mgt,
+			"ibm,cpu-idle-state-psscr",
+			psscr_val, dt_idle_states)) {
+			pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr in DT\n");
+			goto out_free_psscr;
+		}
+
+		/*
+		 * Set pnv_first_deep_stop_state to the first stop level
+		 * to cause hypervisor state loss
+		 */
+		pnv_first_deep_stop_state = MAX_STOP_STATE;
+		for (i = 0; i < dt_idle_states; i++) {
+			u64 psscr_rl = psscr_val[i] & PSSCR_RL_MASK;
+
+			if ((flags[i] & OPAL_PM_LOSE_FULL_CONTEXT) &&
+			     (pnv_first_deep_stop_state > psscr_rl))
+				pnv_first_deep_stop_state = psscr_rl;
+		}
+	}
+
 	for (i = 0; i < dt_idle_states; i++)
 		supported_cpuidle_states |= flags[i];
 
@@ -286,8 +325,29 @@ static int __init pnv_init_idle_states(void)
 
 	pnv_alloc_idle_core_states();
 
+	if (supported_cpuidle_states & OPAL_PM_STOP_INST_FAST)
+		for_each_possible_cpu(i) {
+
+			u64 psscr_init_val = PSSCR_ESL | PSSCR_EC |
+					PSSCR_PSLL_MASK | PSSCR_TR_MASK |
+					PSSCR_MTL_MASK;
+
+			paca[i].thread_psscr = psscr_init_val;
+			/*
+			 * Memory barrier to ensure that the writes to PACA
+			 * go through before ppc_md.power_save is updated
+			 * below.
+			 */
+			mb();
+		}
+
 	if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED)
 		ppc_md.power_save = power7_idle;
+	else if (supported_cpuidle_states & OPAL_PM_STOP_INST_FAST)
+		ppc_md.power_save = power_stop0;
+
+out_free_psscr:
+	kfree(psscr_val);
 out_free:
 	kfree(flags);
 out:
-- 
2.1.4


* [PATCH v6 09/11] cpuidle/powernv: Use CPUIDLE_STATE_MAX instead of MAX_POWERNV_IDLE_STATES
  2016-06-08 16:54 [PATCH v6 00/11] powerpc/powernv/cpuidle: Add support for POWER ISA v3 idle states Shreyas B. Prabhu
                   ` (7 preceding siblings ...)
  2016-06-08 16:54 ` [PATCH v6 08/11] powerpc/powernv: Add platform support for stop instruction Shreyas B. Prabhu
@ 2016-06-08 16:54 ` Shreyas B. Prabhu
  2016-06-13 15:01   ` Daniel Lezcano
  2016-06-08 16:54 ` [PATCH v6 10/11] cpuidle/powernv: Add support for POWER ISA v3 idle states Shreyas B. Prabhu
  2016-06-08 16:54 ` [PATCH v6 11/11] powerpc/powernv: Use deepest stop state when cpu is offlined Shreyas B. Prabhu
  10 siblings, 1 reply; 30+ messages in thread
From: Shreyas B. Prabhu @ 2016-06-08 16:54 UTC (permalink / raw)
  To: mpe
  Cc: benh, paulus, mikey, ego, maddy, linuxppc-dev, linux-kernel,
	Shreyas B. Prabhu, Rafael J. Wysocki, Daniel Lezcano, linux-pm

Use cpuidle's CPUIDLE_STATE_MAX macro instead of powernv specific
MAX_POWERNV_IDLE_STATES.

Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linux-pm@vger.kernel.org
Suggested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
---
 - No changes after v5

Changes in v5
=============
 - New in v5

 drivers/cpuidle/cpuidle-powernv.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index e12dc30..3a763a8 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -20,8 +20,6 @@
 #include <asm/opal.h>
 #include <asm/runlatch.h>
 
-#define MAX_POWERNV_IDLE_STATES	8
-
 struct cpuidle_driver powernv_idle_driver = {
 	.name             = "powernv_idle",
 	.owner            = THIS_MODULE,
@@ -96,7 +94,7 @@ static int fastsleep_loop(struct cpuidle_device *dev,
 /*
  * States for dedicated partition case.
  */
-static struct cpuidle_state powernv_states[MAX_POWERNV_IDLE_STATES] = {
+static struct cpuidle_state powernv_states[CPUIDLE_STATE_MAX] = {
 	{ /* Snooze */
 		.name = "snooze",
 		.desc = "snooze",
-- 
2.1.4


* [PATCH v6 10/11] cpuidle/powernv: Add support for POWER ISA v3 idle states
  2016-06-08 16:54 [PATCH v6 00/11] powerpc/powernv/cpuidle: Add support for POWER ISA v3 idle states Shreyas B. Prabhu
                   ` (8 preceding siblings ...)
  2016-06-08 16:54 ` [PATCH v6 09/11] cpuidle/powernv: Use CPUIDLE_STATE_MAX instead of MAX_POWERNV_IDLE_STATES Shreyas B. Prabhu
@ 2016-06-08 16:54 ` Shreyas B. Prabhu
  2016-06-13 15:34   ` Daniel Lezcano
  2016-06-13 21:48   ` Benjamin Herrenschmidt
  2016-06-08 16:54 ` [PATCH v6 11/11] powerpc/powernv: Use deepest stop state when cpu is offlined Shreyas B. Prabhu
  10 siblings, 2 replies; 30+ messages in thread
From: Shreyas B. Prabhu @ 2016-06-08 16:54 UTC (permalink / raw)
  To: mpe
  Cc: benh, paulus, mikey, ego, maddy, linuxppc-dev, linux-kernel,
	Shreyas B. Prabhu, Rafael J. Wysocki, Daniel Lezcano,
	Rob Herring, Lorenzo Pieralisi, linux-pm

POWER ISA v3 defines a new idle processor core mechanism. In summary,
 a) new instruction named stop is added.
 b) new per thread SPR named PSSCR is added which controls the behavior
	of stop instruction.

The supported idle states and the PSSCR value required to enter each of
them are exposed via ibm,cpu-idle-state-names and
ibm,cpu-idle-state-psscr respectively. To enter an idle state, the
platform-provided power_stop() needs to be invoked with the appropriate
PSSCR value.

This patch adds support for this new mechanism in cpuidle powernv driver.
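
To make the role of the PSSCR value concrete, here is a minimal userspace
sketch (not the kernel code itself) of how a request value is formed and how
the Power-Saving Level Status is read back on wakeup. The masks are copied
from the reg.h hunk of patch 08/11; the helper names psscr_template(),
psscr_request() and psscr_pls() are hypothetical and exist only for
illustration.

```c
#include <stdint.h>

/* Field masks as defined in the reg.h hunk of patch 08/11. */
#define PSSCR_RL_MASK   0x0000000FULL /* Requested Level */
#define PSSCR_MTL_MASK  0x000000F0ULL
#define PSSCR_TR_MASK   0x00000300ULL
#define PSSCR_PSLL_MASK 0x000F0000ULL /* Power-Saving Level Limit */
#define PSSCR_EC        0x00100000ULL
#define PSSCR_ESL       0x00200000ULL /* Enable State Loss */

/*
 * Hypothetical helper: the per-thread template that patch 08/11 stores in
 * paca[i].thread_psscr (EC, ESL, TR, PSLL and MTL fields all set).
 */
static uint64_t psscr_template(void)
{
	return PSSCR_ESL | PSSCR_EC | PSSCR_PSLL_MASK |
	       PSSCR_TR_MASK | PSSCR_MTL_MASK;
}

/*
 * Hypothetical helper mirroring the PSSCR_REQUEST_STATE macro: the
 * requested state is ORed into the template without any masking.
 */
static uint64_t psscr_request(uint64_t tmpl, uint64_t state)
{
	return tmpl | state;
}

/*
 * Mirrors "rldicl r5,r5,4,60": rotate left by 4 and keep the low 4 bits,
 * i.e. extract IBM bits 0:3 (the most significant nibble), which hold the
 * Power-Saving Level Status.
 */
static uint64_t psscr_pls(uint64_t psscr)
{
	return (psscr >> 60) & 0xF;
}
```

On wakeup, the assembly in patch 08/11 compares the value that psscr_pls()
models against pnv_first_deep_stop_state to decide whether hypervisor state
has to be restored.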

Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Cc: linux-pm@vger.kernel.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
---
Note: Documentation for the device tree bindings is posted here-
http://patchwork.ozlabs.org/patch/629125/

 - No changes in v6

Changes in v5
=============
 - Use generic cpuidle constant CPUIDLE_NAME_LEN
 - Fix return code handling for of_property_read_string_array
 - Use DT flags to determine if we are using the stop instruction, instead of
   cpu_has_feature
 - Removed unnecessary cast with names
 - &stop_loop -> stop_loop
 - Added POWERNV_THRESHOLD_LATENCY_NS to filter out idle states with high latency
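
As a rough illustration of the latency filter mentioned in the last item,
the sketch below skips every state whose exit latency exceeds the threshold.
It is a standalone userspace approximation; filter_by_latency() and its
"kept" output array are hypothetical names, not part of the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Threshold value taken from the cpuidle-powernv.c hunk below. */
#define POWERNV_THRESHOLD_LATENCY_NS 200000

/*
 * Hypothetical helper: record the device-tree indices of the idle states
 * that pass the latency filter and return how many were kept. A state whose
 * latency equals the threshold is kept, matching the ">" comparison used in
 * the patch.
 */
static size_t filter_by_latency(const uint32_t *latency_ns, size_t n,
				size_t *kept)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (latency_ns[i] > POWERNV_THRESHOLD_LATENCY_NS)
			continue;
		kept[count++] = i;
	}
	return count;
}
```

Because states can be skipped, the cpuidle index and the device-tree index
diverge, which is why the patch stores the PSSCR value at
stop_psscr_table[nr_idle_states] rather than at index i.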

 drivers/cpuidle/cpuidle-powernv.c | 71 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 70 insertions(+), 1 deletion(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 3a763a8..c74a020 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -20,6 +20,8 @@
 #include <asm/opal.h>
 #include <asm/runlatch.h>
 
+#define POWERNV_THRESHOLD_LATENCY_NS 200000
+
 struct cpuidle_driver powernv_idle_driver = {
 	.name             = "powernv_idle",
 	.owner            = THIS_MODULE,
@@ -27,6 +29,9 @@ struct cpuidle_driver powernv_idle_driver = {
 
 static int max_idle_state;
 static struct cpuidle_state *cpuidle_state_table;
+
+static u64 stop_psscr_table[CPUIDLE_STATE_MAX];
+
 static u64 snooze_timeout;
 static bool snooze_timeout_en;
 
@@ -91,6 +96,17 @@ static int fastsleep_loop(struct cpuidle_device *dev,
 	return index;
 }
 #endif
+
+static int stop_loop(struct cpuidle_device *dev,
+		     struct cpuidle_driver *drv,
+		     int index)
+{
+	ppc64_runlatch_off();
+	power_stop(stop_psscr_table[index]);
+	ppc64_runlatch_on();
+	return index;
+}
+
 /*
  * States for dedicated partition case.
  */
@@ -167,6 +183,8 @@ static int powernv_add_idle_states(void)
 	int nr_idle_states = 1; /* Snooze */
 	int dt_idle_states;
 	u32 *latency_ns, *residency_ns, *flags;
+	u64 *psscr_val = NULL;
+	const char *names[CPUIDLE_STATE_MAX];
 	int i, rc;
 
 	/* Currently we have snooze statically defined */
@@ -199,12 +217,41 @@ static int powernv_add_idle_states(void)
 		goto out_free_latency;
 	}
 
+	rc = of_property_read_string_array(power_mgt,
+					   "ibm,cpu-idle-state-names", names,
+					   dt_idle_states);
+	if (rc < 0) {
+		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-names in DT\n");
+		goto out_free_latency;
+	}
+
+	/*
+	 * If the idle states use stop instruction, probe for psscr values
+	 * which are necessary to specify required stop level.
+	 */
+	if (flags[0] & (OPAL_PM_STOP_INST_FAST | OPAL_PM_STOP_INST_DEEP)) {
+		psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val),
+				    GFP_KERNEL);
+		rc = of_property_read_u64_array(power_mgt,
+						"ibm,cpu-idle-state-psscr",
+						psscr_val, dt_idle_states);
+		if (rc) {
+			pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr in DT\n");
+			goto out_free_psscr;
+		}
+	}
 	residency_ns = kzalloc(sizeof(*residency_ns) * dt_idle_states, GFP_KERNEL);
 	rc = of_property_read_u32_array(power_mgt,
 		"ibm,cpu-idle-state-residency-ns", residency_ns, dt_idle_states);
 
 	for (i = 0; i < dt_idle_states; i++) {
-
+		/*
+		 * If an idle state has exit latency beyond
+		 * POWERNV_THRESHOLD_LATENCY_NS then don't use it
+		 * in cpu-idle.
+		 */
+		if (latency_ns[i] > POWERNV_THRESHOLD_LATENCY_NS)
+			continue;
 		/*
 		 * Cpuidle accepts exit_latency and target_residency in us.
 		 * Use default target_residency values if f/w does not expose it.
@@ -216,6 +263,16 @@ static int powernv_add_idle_states(void)
 			powernv_states[nr_idle_states].flags = 0;
 			powernv_states[nr_idle_states].target_residency = 100;
 			powernv_states[nr_idle_states].enter = &nap_loop;
+		} else if ((flags[i] & OPAL_PM_STOP_INST_FAST) &&
+				!(flags[i] & OPAL_PM_TIMEBASE_STOP)) {
+			strncpy(powernv_states[nr_idle_states].name,
+				names[i], CPUIDLE_NAME_LEN);
+			strncpy(powernv_states[nr_idle_states].desc,
+				names[i], CPUIDLE_NAME_LEN);
+			powernv_states[nr_idle_states].flags = 0;
+
+			powernv_states[nr_idle_states].enter = stop_loop;
+			stop_psscr_table[nr_idle_states] = psscr_val[i];
 		}
 
 		/*
@@ -231,6 +288,16 @@ static int powernv_add_idle_states(void)
 			powernv_states[nr_idle_states].flags = CPUIDLE_FLAG_TIMER_STOP;
 			powernv_states[nr_idle_states].target_residency = 300000;
 			powernv_states[nr_idle_states].enter = &fastsleep_loop;
+		} else if ((flags[i] & OPAL_PM_STOP_INST_DEEP) &&
+				(flags[i] & OPAL_PM_TIMEBASE_STOP)) {
+			strncpy(powernv_states[nr_idle_states].name,
+				names[i], CPUIDLE_NAME_LEN);
+			strncpy(powernv_states[nr_idle_states].desc,
+				names[i], CPUIDLE_NAME_LEN);
+
+			powernv_states[nr_idle_states].flags = CPUIDLE_FLAG_TIMER_STOP;
+			powernv_states[nr_idle_states].enter = stop_loop;
+			stop_psscr_table[nr_idle_states] = psscr_val[i];
 		}
 #endif
 		powernv_states[nr_idle_states].exit_latency =
@@ -245,6 +312,8 @@ static int powernv_add_idle_states(void)
 	}
 
 	kfree(residency_ns);
+out_free_psscr:
+	kfree(psscr_val);
 out_free_latency:
 	kfree(latency_ns);
 out_free_flags:
-- 
2.1.4


* [PATCH v6 11/11] powerpc/powernv: Use deepest stop state when cpu is offlined
  2016-06-08 16:54 [PATCH v6 00/11] powerpc/powernv/cpuidle: Add support for POWER ISA v3 idle states Shreyas B. Prabhu
                   ` (9 preceding siblings ...)
  2016-06-08 16:54 ` [PATCH v6 10/11] cpuidle/powernv: Add support for POWER ISA v3 idle states Shreyas B. Prabhu
@ 2016-06-08 16:54 ` Shreyas B. Prabhu
  10 siblings, 0 replies; 30+ messages in thread
From: Shreyas B. Prabhu @ 2016-06-08 16:54 UTC (permalink / raw)
  To: mpe
  Cc: benh, paulus, mikey, ego, maddy, linuxppc-dev, linux-kernel,
	Shreyas B. Prabhu

If hardware supports stop state, use the deepest stop state when
the cpu is offlined.
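
The selection logic can be sketched in plain C as follows. This is a
userspace approximation of the loop in pnv_init_idle_states(), assuming the
flag and mask values from patch 08/11; struct stop_levels and
scan_stop_levels() are hypothetical names used only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Values taken from the opal-api.h, reg.h and idle.c hunks of patch 08/11. */
#define OPAL_PM_LOSE_FULL_CONTEXT 0x00004000
#define PSSCR_RL_MASK             0x0000000FULL
#define MAX_STOP_STATE            0xF

/* Hypothetical container for the two values the patches track. */
struct stop_levels {
	uint64_t first_deep; /* lowest level that loses full context */
	uint64_t deepest;    /* highest requested level seen */
};

/*
 * Hypothetical helper: scan the per-state flags and PSSCR values read from
 * the device tree and derive both levels in a single pass.
 */
static struct stop_levels scan_stop_levels(const uint32_t *flags,
					   const uint64_t *psscr, size_t n)
{
	struct stop_levels out = { MAX_STOP_STATE, 0 };

	for (size_t i = 0; i < n; i++) {
		uint64_t rl = psscr[i] & PSSCR_RL_MASK;

		if ((flags[i] & OPAL_PM_LOSE_FULL_CONTEXT) &&
		    rl < out.first_deep)
			out.first_deep = rl;
		if (rl > out.deepest)
			out.deepest = rl;
	}
	return out;
}
```

With no state flagged as losing full context, first_deep stays at
MAX_STOP_STATE, so only a request for the maximum level would be treated as
deep.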

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
---
 - No changes since v1

 arch/powerpc/platforms/powernv/idle.c    | 15 +++++++++++++--
 arch/powerpc/platforms/powernv/powernv.h |  1 +
 arch/powerpc/platforms/powernv/smp.c     |  4 +++-
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index bfbd359..b38cb33 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -242,6 +242,11 @@ static DEVICE_ATTR(fastsleep_workaround_applyonce, 0600,
  */
 u64 pnv_first_deep_stop_state;
 
+/*
+ * Deepest stop idle state. Used when a cpu is offlined
+ */
+u64 pnv_deepest_stop_state;
+
 static int __init pnv_init_idle_states(void)
 {
 	struct device_node *power_mgt;
@@ -290,8 +295,11 @@ static int __init pnv_init_idle_states(void)
 		}
 
 		/*
-		 * Set pnv_first_deep_stop_state to the first stop level
-		 * to cause hypervisor state loss
+		 * Set pnv_first_deep_stop_state and pnv_deepest_stop_state.
+		 * pnv_first_deep_stop_state should be set to the first stop
+		 * level to cause hypervisor state loss.
+		 * pnv_deepest_stop_state should be set to the deepest stop
+		 * state.
 		 */
 		pnv_first_deep_stop_state = MAX_STOP_STATE;
 		for (i = 0; i < dt_idle_states; i++) {
@@ -300,6 +308,9 @@ static int __init pnv_init_idle_states(void)
 			if ((flags[i] & OPAL_PM_LOSE_FULL_CONTEXT) &&
 			     (pnv_first_deep_stop_state > psscr_rl))
 				pnv_first_deep_stop_state = psscr_rl;
+
+			if (pnv_deepest_stop_state < psscr_rl)
+				pnv_deepest_stop_state = psscr_rl;
 		}
 	}
 
diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h
index 6dbc0a1..da7c843 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -18,6 +18,7 @@ static inline void pnv_pci_shutdown(void) { }
 #endif
 
 extern u32 pnv_get_supported_cpuidle_states(void);
+extern u64 pnv_deepest_stop_state;
 
 extern void pnv_lpc_init(void);
 
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index ad7b1a3..f69ceb6 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -182,7 +182,9 @@ static void pnv_smp_cpu_kill_self(void)
 
 		ppc64_runlatch_off();
 
-		if (idle_states & OPAL_PM_WINKLE_ENABLED)
+		if (cpu_has_feature(CPU_FTR_ARCH_300))
+			srr1 = power_stop(pnv_deepest_stop_state);
+		else if (idle_states & OPAL_PM_WINKLE_ENABLED)
 			srr1 = power7_winkle();
 		else if ((idle_states & OPAL_PM_SLEEP_ENABLED) ||
 				(idle_states & OPAL_PM_SLEEP_ENABLED_ER1))
-- 
2.1.4


* Re: [PATCH v6 09/11] cpuidle/powernv: Use CPUIDLE_STATE_MAX instead of MAX_POWERNV_IDLE_STATES
  2016-06-08 16:54 ` [PATCH v6 09/11] cpuidle/powernv: Use CPUIDLE_STATE_MAX instead of MAX_POWERNV_IDLE_STATES Shreyas B. Prabhu
@ 2016-06-13 15:01   ` Daniel Lezcano
  2016-06-13 20:58     ` Rafael J. Wysocki
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Lezcano @ 2016-06-13 15:01 UTC (permalink / raw)
  To: Shreyas B. Prabhu
  Cc: mpe, benh, paulus, mikey, ego, maddy, linuxppc-dev, linux-kernel,
	Rafael J. Wysocki, linux-pm

On Wed, Jun 08, 2016 at 11:54:29AM -0500, Shreyas B. Prabhu wrote:
> Use cpuidle's CPUIDLE_STATE_MAX macro instead of powernv specific
> MAX_POWERNV_IDLE_STATES.
> 
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: linux-pm@vger.kernel.org
> Suggested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
> ---

Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>


* Re: [PATCH v6 10/11] cpuidle/powernv: Add support for POWER ISA v3 idle states
  2016-06-08 16:54 ` [PATCH v6 10/11] cpuidle/powernv: Add support for POWER ISA v3 idle states Shreyas B. Prabhu
@ 2016-06-13 15:34   ` Daniel Lezcano
  2016-06-14 10:47     ` Shreyas B Prabhu
  2016-06-13 21:48   ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 30+ messages in thread
From: Daniel Lezcano @ 2016-06-13 15:34 UTC (permalink / raw)
  To: Shreyas B. Prabhu
  Cc: mpe, benh, paulus, mikey, ego, maddy, linuxppc-dev, linux-kernel,
	Rafael J. Wysocki, Rob Herring, Lorenzo Pieralisi, linux-pm

On Wed, Jun 08, 2016 at 11:54:30AM -0500, Shreyas B. Prabhu wrote:
> POWER ISA v3 defines a new idle processor core mechanism. In summary,
>  a) new instruction named stop is added.
>  b) new per thread SPR named PSSCR is added which controls the behavior
> 	of stop instruction.
> 
> Supported idle states and value to be written to PSSCR register to enter
> any idle state is exposed via ibm,cpu-idle-state-names and
> ibm,cpu-idle-state-psscr respectively. To enter an idle state,
> platform provided power_stop() needs to be invoked with the appropriate
> PSSCR value.
> 
> This patch adds support for this new mechanism in cpuidle powernv driver.
> 
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Rob Herring <robh+dt@kernel.org>
> Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
> Cc: linux-pm@vger.kernel.org
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: linuxppc-dev@lists.ozlabs.org
> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
> ---

[ ... ]

> +	rc = of_property_read_string_array(power_mgt,
> +					   "ibm,cpu-idle-state-names", names,
> +					   dt_idle_states);
> +	if (rc < 0) {
> +		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-names in DT\n");
> +		goto out_free_latency;
> +	}
> +
> +	/*
> +	 * If the idle states use stop instruction, probe for psscr values
> +	 * which are necessary to specify required stop level.
> +	 */
> +	if (flags[0] & (OPAL_PM_STOP_INST_FAST | OPAL_PM_STOP_INST_DEEP)) {
> +		psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val),
> +				    GFP_KERNEL);

if (!psscr_val) check missing.

> +		rc = of_property_read_u64_array(power_mgt,
> +						"ibm,cpu-idle-state-psscr",
> +						psscr_val, dt_idle_states);
> +		if (rc) {
> +			pr_warn("cpuidle-powernv: missing ibm,cpu-idle-states-psscr in DT\n");
> +			goto out_free_psscr;
> +		}
> +	}
>  	residency_ns = kzalloc(sizeof(*residency_ns) * dt_idle_states, GFP_KERNEL);

if (!residency_ns) check missing.

I suppose the code is relying on 'of_property_read_u32_array' to check it, 
right ?


  -- Daniel
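
A minimal sketch of the checks asked for above, written as user-space C
so it compiles on its own. calloc() and free() stand in for kcalloc()
and kfree(), and alloc_idle_tables() is a hypothetical helper, not
something in the patch. Each allocation is checked, and a failure
unwinds through labels in reverse order of allocation, in the same
style as the out_free_* labels the function already has.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (user-space stand-in for the kernel code):
 * allocate both per-state tables, checking every allocation.
 * Returns 0 on success or -ENOMEM, leaving nothing allocated on failure.
 */
static int alloc_idle_tables(size_t nr_states, uint64_t **psscr_out,
			     uint32_t **residency_out)
{
	uint64_t *psscr_val;
	uint32_t *residency_ns;
	int rc = 0;

	/* calloc(0, ...) may legitimately return NULL, so require > 0. */
	assert(nr_states > 0);

	psscr_val = calloc(nr_states, sizeof(*psscr_val));
	if (!psscr_val) {
		rc = -ENOMEM;
		goto out;
	}

	residency_ns = calloc(nr_states, sizeof(*residency_ns));
	if (!residency_ns) {
		rc = -ENOMEM;
		goto out_free_psscr;
	}

	*psscr_out = psscr_val;
	*residency_out = residency_ns;
	return 0;

out_free_psscr:
	free(psscr_val);
out:
	return rc;
}
```

On success the caller owns both arrays and frees them when done; on
failure neither output pointer is written.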

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 09/11] cpuidle/powernv: Use CPUIDLE_STATE_MAX instead of MAX_POWERNV_IDLE_STATES
  2016-06-13 15:01   ` Daniel Lezcano
@ 2016-06-13 20:58     ` Rafael J. Wysocki
  0 siblings, 0 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2016-06-13 20:58 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Shreyas B. Prabhu, mpe, benh, paulus, mikey, ego, maddy,
	linuxppc-dev, linux-kernel, Rafael J. Wysocki, linux-pm

On Monday, June 13, 2016 05:01:50 PM Daniel Lezcano wrote:
> On Wed, Jun 08, 2016 at 11:54:29AM -0500, Shreyas B. Prabhu wrote:
> > Use cpuidle's CPUIDLE_STATE_MAX macro instead of powernv specific
> > MAX_POWERNV_IDLE_STATES.
> > 
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> > Cc: linux-pm@vger.kernel.org
> > Suggested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> > Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
> > ---
> 
> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>

Since this seems to depend on some other patches in the series, I'm expecting
it to go in along with the patches it depends on.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 10/11] cpuidle/powernv: Add support for POWER ISA v3 idle states
  2016-06-08 16:54 ` [PATCH v6 10/11] cpuidle/powernv: Add support for POWER ISA v3 idle states Shreyas B. Prabhu
  2016-06-13 15:34   ` Daniel Lezcano
@ 2016-06-13 21:48   ` Benjamin Herrenschmidt
  2016-06-14 11:11     ` Shreyas B Prabhu
  1 sibling, 1 reply; 30+ messages in thread
From: Benjamin Herrenschmidt @ 2016-06-13 21:48 UTC (permalink / raw)
  To: Shreyas B. Prabhu, mpe
  Cc: ego, mikey, Daniel Lezcano, linux-pm, Rafael J. Wysocki,
	linux-kernel, Rob Herring, maddy, Lorenzo Pieralisi,
	linuxppc-dev

On Wed, 2016-06-08 at 11:54 -0500, Shreyas B. Prabhu wrote:
> 
>  /*
>   * States for dedicated partition case.
>   */
> @@ -167,6 +183,8 @@ static int powernv_add_idle_states(void)
>  	int nr_idle_states = 1; /* Snooze */
>  	int dt_idle_states;
>  	u32 *latency_ns, *residency_ns, *flags;
> +	u64 *psscr_val = NULL;
> +	const char *names[CPUIDLE_STATE_MAX];
>  	int i, rc;
>  
>  	/* Currently we have snooze statically defined */
> @@ -199,12 +217,41 @@ static int powernv_add_idle_states(void)
>  		goto out_free_latency;
>  	}
>  
> +	rc = of_property_read_string_array(power_mgt,
> +					   "ibm,cpu-idle-state-names", names,
> +					   dt_idle_states);

Ok so from this I assume that dt_idle_states is the number of entries,
which has been checked properly to be < CPUIDLE_STATE_MAX correct ?

Because ...

> +	if (rc < 0) {
> +		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-names in DT\n");
> +		goto out_free_latency;
> +	}
> +
> +	/*
> +	 * If the idle states use stop instruction, probe for psscr values
> +	 * which are necessary to specify required stop level.
> +	 */
> +	if (flags[0] & (OPAL_PM_STOP_INST_FAST | OPAL_PM_STOP_INST_DEEP)) {
> +		psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val),
> +				    GFP_KERNEL);
> +		rc = of_property_read_u64_array(power_mgt,
> +						"ibm,cpu-idle-state-psscr",
> +						psscr_val, dt_idle_states);

Here, psscr val is only one u64 ... shouldn't you kmalloc sizeof(..) *
dt_idle_states ?

> +		if (rc) {
> +			pr_warn("cpuidle-powernv: missing ibm,cpu-idle-states-psscr in DT\n");
> +			goto out_free_psscr;
> +		}
> +	}
>  	residency_ns = kzalloc(sizeof(*residency_ns) * dt_idle_states, GFP_KERNEL);

Just like we do here

>  	rc = of_property_read_u32_array(power_mgt, 		"ibm,cpu-idle-state-residency-ns", residency_ns, dt_idle_states);
>  	for (i = 0; i < dt_idle_states; i++) {
> -
> +		/*
> +		 * If an idle state has exit latency beyond
> +		 * POWERNV_THRESHOLD_LATENCY_NS then don't use it
> +		 * in cpu-idle.
> +		 */
> +		if (latency_ns[i] > POWERNV_THRESHOLD_LATENCY_NS)
> +			continue;
>  		/*
>  		 * Cpuidle accepts exit_latency and target_residency in us.
>  		 * Use default target_residency values if f/w does not expose it.
> @@ -216,6 +263,16 @@ static int powernv_add_idle_states(void)
>  			powernv_states[nr_idle_states].flags = 0;
>  			powernv_states[nr_idle_states].target_residency = 100;
>  			powernv_states[nr_idle_states].enter = &nap_loop;
> +		} else if ((flags[i] & OPAL_PM_STOP_INST_FAST) &&
> +				!(flags[i] & OPAL_PM_TIMEBASE_STOP)) {
> +			strncpy(powernv_states[nr_idle_states].name,
> +				names[i], CPUIDLE_NAME_LEN);
> +			strncpy(powernv_states[nr_idle_states].desc,
> +				names[i], CPUIDLE_NAME_LEN);
> +			powernv_states[nr_idle_states].flags = 0;
> +
> +			powernv_states[nr_idle_states].enter = stop_loop;
> +			stop_psscr_table[nr_idle_states] = psscr_val[i];
>  		}
>  
>  		/*
> @@ -231,6 +288,16 @@ static int powernv_add_idle_states(void)
>  			powernv_states[nr_idle_states].flags = CPUIDLE_FLAG_TIMER_STOP;
>  			powernv_states[nr_idle_states].target_residency = 300000;
>  			powernv_states[nr_idle_states].enter = &fastsleep_loop;
> +		} else if ((flags[i] & OPAL_PM_STOP_INST_DEEP) &&
> +				(flags[i] & OPAL_PM_TIMEBASE_STOP)) {
> +			strncpy(powernv_states[nr_idle_states].name,
> +				names[i], CPUIDLE_NAME_LEN);
> +			strncpy(powernv_states[nr_idle_states].desc,
> +				names[i], CPUIDLE_NAME_LEN);
> +
> +			powernv_states[nr_idle_states].flags = CPUIDLE_FLAG_TIMER_STOP;
> +			powernv_states[nr_idle_states].enter = stop_loop;
> +			stop_psscr_table[nr_idle_states] = psscr_val[i];
>  		}
>  #endif
>  		powernv_states[nr_idle_states].exit_latency =
> @@ -245,6 +312,8 @@ static int powernv_add_idle_states(void)
>  	}
>  
>  	kfree(residency_ns);
> +out_free_psscr:
> +	kfree(psscr_val);
>  out_free_latency:
>  	kfree(latency_ns);
>  out_free_flags:

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 10/11] cpuidle/powernv: Add support for POWER ISA v3 idle states
  2016-06-13 15:34   ` Daniel Lezcano
@ 2016-06-14 10:47     ` Shreyas B Prabhu
  2016-06-14 11:29       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 30+ messages in thread
From: Shreyas B Prabhu @ 2016-06-14 10:47 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: mpe, benh, paulus, mikey, ego, maddy, linuxppc-dev, linux-kernel,
	Rafael J. Wysocki, Rob Herring, Lorenzo Pieralisi, linux-pm



On 06/13/2016 09:04 PM, Daniel Lezcano wrote:
> On Wed, Jun 08, 2016 at 11:54:30AM -0500, Shreyas B. Prabhu wrote:
>> POWER ISA v3 defines a new idle processor core mechanism. In summary,
>>  a) new instruction named stop is added.
>>  b) new per thread SPR named PSSCR is added which controls the behavior
>> 	of stop instruction.
>>
>> Supported idle states and value to be written to PSSCR register to enter
>> any idle state is exposed via ibm,cpu-idle-state-names and
>> ibm,cpu-idle-state-psscr respectively. To enter an idle state,
>> platform provided power_stop() needs to be invoked with the appropriate
>> PSSCR value.
>>
>> This patch adds support for this new mechanism in cpuidle powernv driver.
>>
>> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
>> Cc: Rob Herring <robh+dt@kernel.org>
>> Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
>> Cc: linux-pm@vger.kernel.org
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Paul Mackerras <paulus@ozlabs.org>
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
>> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
>> ---
> 
> [ ... ]
> 
>> +	rc = of_property_read_string_array(power_mgt,
>> +					   "ibm,cpu-idle-state-names", names,
>> +					   dt_idle_states);
>> +	if (rc < 0) {
>> +		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-names in DT\n");
>> +		goto out_free_latency;
>> +	}
>> +
>> +	/*
>> +	 * If the idle states use stop instruction, probe for psscr values
>> +	 * which are necessary to specify required stop level.
>> +	 */
>> +	if (flags[0] & (OPAL_PM_STOP_INST_FAST | OPAL_PM_STOP_INST_DEEP)) {
>> +		psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val),
>> +				    GFP_KERNEL);
> 
> if (!psscr_val) check missing.

I skipped this check because this code runs from an initcall and we are
unlikely to run out of memory at this stage. But I'll add the check in
the next version.
> 
>> +		rc = of_property_read_u64_array(power_mgt,
>> +						"ibm,cpu-idle-state-psscr",
>> +						psscr_val, dt_idle_states);
>> +		if (rc) {
>> +			pr_warn("cpuidle-powernv: missing ibm,cpu-idle-states-psscr in DT\n");
>> +			goto out_free_psscr;
>> +		}
>> +	}
>>  	residency_ns = kzalloc(sizeof(*residency_ns) * dt_idle_states, GFP_KERNEL);
> 
> if (!residency_ns) check missing.
> 
> I suppose the code is relying on 'of_property_read_u32_array' to check it, 
> right ?

I'll also add NULL checks for the existing kzalloc calls in the file in
the next version.

Thanks,
Shreyas

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 10/11] cpuidle/powernv: Add support for POWER ISA v3 idle states
  2016-06-13 21:48   ` Benjamin Herrenschmidt
@ 2016-06-14 11:11     ` Shreyas B Prabhu
  2016-06-14 11:31       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 30+ messages in thread
From: Shreyas B Prabhu @ 2016-06-14 11:11 UTC (permalink / raw)
  To: benh, mpe
  Cc: ego, mikey, Daniel Lezcano, linux-pm, Rafael J. Wysocki,
	linux-kernel, Rob Herring, maddy, Lorenzo Pieralisi,
	linuxppc-dev



On 06/14/2016 03:18 AM, Benjamin Herrenschmidt wrote:
> On Wed, 2016-06-08 at 11:54 -0500, Shreyas B. Prabhu wrote:
>>
>>  /*
>>   * States for dedicated partition case.
>>   */
>> @@ -167,6 +183,8 @@ static int powernv_add_idle_states(void)
>>  	int nr_idle_states = 1; /* Snooze */
>>  	int dt_idle_states;
>>  	u32 *latency_ns, *residency_ns, *flags;
>> +	u64 *psscr_val = NULL;
>> +	const char *names[CPUIDLE_STATE_MAX];
>>  	int i, rc;
>>  
>>  	/* Currently we have snooze statically defined */
>> @@ -199,12 +217,41 @@ static int powernv_add_idle_states(void)
>>  		goto out_free_latency;
>>  	}
>>  
>> +	rc = of_property_read_string_array(power_mgt,
>> +					   "ibm,cpu-idle-state-names", names,
>> +					   dt_idle_states);
> 
> Ok so from this I assume that dt_idle_states is the number of entries,
> which has been checked properly to be < CPUIDLE_STATE_MAX correct ?
> 
> Beause ...
>

While dt_idle_states should not be > CPUIDLE_STATE_MAX, if it ever were
we would end up corrupting memory while updating powernv_states[].
I'll add a WARN_ON for that case and limit the idle states added to
powernv_states accordingly. Thanks for pointing this out.
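
A rough user-space sketch of such a bound check, under stated
assumptions: clamp_dt_idle_states() is a hypothetical name, fprintf()
stands in for WARN_ON(), CPUIDLE_STATE_MAX is taken to be 10, and one
slot is assumed reserved for the statically defined snooze state
(nr_idle_states starts at 1 in the function).

```c
#include <assert.h>
#include <stdio.h>

#define CPUIDLE_STATE_MAX 10	/* assumed value, see lead-in */

/*
 * Hypothetical helper: limit the number of device tree idle states so
 * that a table of CPUIDLE_STATE_MAX entries, with entry 0 used by
 * snooze, can never be overrun.
 */
static int clamp_dt_idle_states(int dt_idle_states)
{
	int max_dt_states = CPUIDLE_STATE_MAX - 1;

	if (dt_idle_states < 0)
		return 0;

	if (dt_idle_states > max_dt_states) {
		/* Stand-in for WARN_ON() in the kernel. */
		fprintf(stderr,
			"cpuidle-powernv: %d idle states in DT, using first %d\n",
			dt_idle_states, max_dt_states);
		return max_dt_states;
	}

	return dt_idle_states;
}

/* Usage: in-range counts pass through, oversized counts are capped. */
static void clamp_dt_idle_states_example(void)
{
	assert(clamp_dt_idle_states(3) == 3);
	assert(clamp_dt_idle_states(25) == CPUIDLE_STATE_MAX - 1);
	assert(clamp_dt_idle_states(-1) == 0);
}
```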

>> +	if (rc < 0) {
>> +		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-names in DT\n");
>> +		goto out_free_latency;
>> +	}
>> +
>> +	/*
>> +	 * If the idle states use stop instruction, probe for psscr values
>> +	 * which are necessary to specify required stop level.
>> +	 */
>> +	if (flags[0] & (OPAL_PM_STOP_INST_FAST | OPAL_PM_STOP_INST_DEEP)) {
>> +		psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val),
>> +				    GFP_KERNEL);
>> +		rc = of_property_read_u64_array(power_mgt,
>> +						"ibm,cpu-idle-state-psscr",
>> +						psscr_val, dt_idle_states);
> 
> Here, psscr val is only one u64 ... shouldn't you kmalloc sizeof(..) *
> dt_idle_states ?

I'm using kcalloc here since the checkpatch script suggests kcalloc over
kzalloc for allocating memory for arrays.
I'll also include a patch to use kcalloc throughout the file for
uniformity in the next version. I was originally planning to post that
cleanup separately.

Thanks,
Shreyas
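
For context on kcalloc(n, size) versus kzalloc(n * size): both return
zeroed memory, but kcalloc also fails cleanly when n * size would
overflow. The user-space sketch below illustrates that property only.
checked_zalloc_array() is a hypothetical name and this is not the
kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: overflow-checked, zeroing array allocation.
 * Returns NULL if n * size does not fit in size_t or malloc() fails.
 */
static void *checked_zalloc_array(size_t n, size_t size)
{
	void *p;

	/* Refuse if n * size would wrap around SIZE_MAX. */
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;

	p = malloc(n * size);
	if (p)
		memset(p, 0, n * size);
	return p;
}

/* Usage: a sane request is zero-filled, an absurd one is rejected. */
static void checked_zalloc_array_example(void)
{
	uint64_t *v = checked_zalloc_array(4, sizeof(*v));

	if (v)
		assert(v[0] == 0 && v[3] == 0);
	free(v);

	assert(checked_zalloc_array(SIZE_MAX, 2) == NULL);
}
```

An open-coded sizeof(*p) * n passed to a plain allocator has no such
guard, which is why the array form is preferred.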

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 10/11] cpuidle/powernv: Add support for POWER ISA v3 idle states
  2016-06-14 10:47     ` Shreyas B Prabhu
@ 2016-06-14 11:29       ` Benjamin Herrenschmidt
  2016-06-15  4:57         ` Shreyas B Prabhu
  0 siblings, 1 reply; 30+ messages in thread
From: Benjamin Herrenschmidt @ 2016-06-14 11:29 UTC (permalink / raw)
  To: Shreyas B Prabhu, Daniel Lezcano
  Cc: ego, mikey, linux-pm, Rafael J. Wysocki, linux-kernel,
	Rob Herring, maddy, Lorenzo Pieralisi, linuxppc-dev

On Tue, 2016-06-14 at 16:17 +0530, Shreyas B Prabhu wrote:

> 
> I ignored adding this check because this is part of initcall and we are
> unlikely to run out of memory at this state. But I'll add the check in
> next version.

Why do you malloc the u64 array and not the string pointer array ?
Shouldn't you either have both on stack or both allocated ?

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 10/11] cpuidle/powernv: Add support for POWER ISA v3 idle states
  2016-06-14 11:11     ` Shreyas B Prabhu
@ 2016-06-14 11:31       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 30+ messages in thread
From: Benjamin Herrenschmidt @ 2016-06-14 11:31 UTC (permalink / raw)
  To: Shreyas B Prabhu, mpe
  Cc: ego, mikey, Rafael J. Wysocki, linux-pm, linuxppc-dev,
	Daniel Lezcano, linux-kernel, Rob Herring, Lorenzo Pieralisi,
	maddy

On Tue, 2016-06-14 at 16:41 +0530, Shreyas B Prabhu wrote:
> >> +            psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val),
> >> +                                GFP_KERNEL);
> >> +            rc = of_property_read_u64_array(power_mgt,
> >> +                                            "ibm,cpu-idle-state-psscr",
> >> +                                            psscr_val, dt_idle_states);
> > 
> > Here, psscr val is only one u64 ... shouldn't you kmalloc sizeof(..) *
> > dt_idle_states ?
> 
> I'm using kcalloc here since checkpatch script suggested kcalloc over
> kzalloc for allocating memory for arrays.
> I'll also include a patch to use kcalloc throughout the file for
> uniformity in next version. I was originally planning to post that
> cleanup separately.

Ah ok, I missed the use of kcalloc (I didn't even know of its existence),
my brain just read kmalloc :-)

Still, I find it inconsistent that you allocate here while you use the
stack for the names. Any reason for that ?

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v6 10/11] cpuidle/powernv: Add support for POWER ISA v3 idle states
  2016-06-14 11:29       ` Benjamin Herrenschmidt
@ 2016-06-15  4:57         ` Shreyas B Prabhu
  0 siblings, 0 replies; 30+ messages in thread
From: Shreyas B Prabhu @ 2016-06-15  4:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Daniel Lezcano
  Cc: ego, mikey, linux-pm, Rafael J. Wysocki, linux-kernel,
	Rob Herring, maddy, Lorenzo Pieralisi, linuxppc-dev



On 06/14/2016 04:59 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2016-06-14 at 16:17 +0530, Shreyas B Prabhu wrote:
> 
>>
>> I ignored adding this check because this is part of initcall and we are
>> unlikely to run out of memory at this state. But I'll add the check in
>> next version.
> 
> Why do you malloc the u64 array and not the string pointer array ?
> Shouldn't you either have both on stack or both allocated ?
> 

Yes. I'll make this consistent.

Thanks,
Shreyas

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [v6, 03/11] powerpc/powernv: Rename idle_power7.S to idle_power_common.S
  2016-06-08 16:54 ` [PATCH v6 03/11] powerpc/powernv: Rename idle_power7.S to idle_power_common.S Shreyas B. Prabhu
@ 2016-06-15  5:31   ` Michael Ellerman
  0 siblings, 0 replies; 30+ messages in thread
From: Michael Ellerman @ 2016-06-15  5:31 UTC (permalink / raw)
  To: Shreyas B. Prabhu
  Cc: ego, mikey, benh, Shreyas B. Prabhu, linux-kernel, maddy, linuxppc-dev

On Wed, 2016-06-08 at 16:54:23 UTC, "Shreyas B. Prabhu" wrote:
> idle_power7.S handles idle entry/exit for POWER7, POWER8 and in next
> patch for POWER9. Rename the file to a non-hardware specific
> name.

It's not common to all powerpc CPUs though, so can you call it something
other than just "common"?

Maybe idle_book3s.S, or (if it's correct) idle_206.S

cheers

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [v6, 07/11] powerpc/powernv: set power_save func after the idle states are initialized
  2016-06-08 16:54 ` [PATCH v6 07/11] powerpc/powernv: set power_save func after the idle states are initialized Shreyas B. Prabhu
@ 2016-06-15  5:41   ` Michael Ellerman
  2016-06-15  6:11     ` Shreyas B Prabhu
  2016-06-22  1:54   ` [PATCH v6 " Benjamin Herrenschmidt
  2016-06-28 12:10   ` [v6, " Michael Ellerman
  2 siblings, 1 reply; 30+ messages in thread
From: Michael Ellerman @ 2016-06-15  5:41 UTC (permalink / raw)
  To: Shreyas B. Prabhu
  Cc: ego, mikey, benh, Shreyas B. Prabhu, linux-kernel, maddy, linuxppc-dev

On Wed, 2016-06-08 at 16:54:27 UTC, "Shreyas B. Prabhu" wrote:
> pnv_init_idle_states discovers supported idle states from the
> device tree and does the required initialization. Set power_save
> function pointer only after this initialization is done

This looks like a bug fix? Or is this not a concern in practice for some reason
(and if so what is that reason)?

cheers

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [v6, 07/11] powerpc/powernv: set power_save func after the idle states are initialized
  2016-06-15  5:41   ` [v6, " Michael Ellerman
@ 2016-06-15  6:11     ` Shreyas B Prabhu
  0 siblings, 0 replies; 30+ messages in thread
From: Shreyas B Prabhu @ 2016-06-15  6:11 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: ego, mikey, benh, linux-kernel, maddy, linuxppc-dev



On 06/15/2016 11:11 AM, Michael Ellerman wrote:
> On Wed, 2016-08-06 at 16:54:27 UTC, "Shreyas B. Prabhu" wrote:
>> pnv_init_idle_states discovers supported idle states from the
>> device tree and does the required initialization. Set power_save
>> function pointer only after this initialization is done
> 
> This looks like a bug fix? Or is this not a concern in practice for some reason
> (and if so what is that reason)?
> 

This isn't a concern currently because all powernv machines so far have
supported nap, and nap does not need any initialization from the kernel
side.

Thanks,
Shreyas

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [v6, 08/11] powerpc/powernv: Add platform support for stop instruction
  2016-06-08 16:54 ` [PATCH v6 08/11] powerpc/powernv: Add platform support for stop instruction Shreyas B. Prabhu
@ 2016-06-15 11:14   ` Michael Ellerman
  2016-06-15 13:00     ` Shreyas B Prabhu
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Ellerman @ 2016-06-15 11:14 UTC (permalink / raw)
  To: Shreyas B. Prabhu
  Cc: ego, mikey, benh, Shreyas B. Prabhu, linux-kernel, maddy, linuxppc-dev

Hi Shreyas,

Comments inline ...

On Wed, 2016-06-08 at 16:54:28 UTC, "Shreyas B. Prabhu" wrote:
> POWER ISA v3 defines a new idle processor core mechanism. In summary,
>  a) new instruction named stop is added. This instruction replaces
> 	instructions like nap, sleep, rvwinkle.
>  b) new per thread SPR named Processor Stop Status and Control Register
> 	(PSSCR) is added which controls the behavior of stop instruction.
> 
> PSSCR layout:
> ----------------------------------------------------------
> | PLS | /// | SD | ESL | EC | PSLL | /// | TR | MTL | RL |
> ----------------------------------------------------------
> 0      4     41   42    43   44     48    54   56    60
> 
> PSSCR key fields:
> 	Bits 0:3  - Power-Saving Level Status. This field indicates the lowest
> 	power-saving state the thread entered since stop instruction was last
> 	executed.
> 
> 	Bit 42 - Enable State Loss
> 	0 - No state is lost irrespective of other fields
> 	1 - Allows state loss
> 
> 	Bits 44:47 - Power-Saving Level Limit
> 	This limits the power-saving level that can be entered into.
> 
> 	Bits 60:63 - Requested Level
> 	Used to specify which power-saving level must be entered on executing
> 	stop instruction

That would probably be good as a comment somewhere too, maybe idle.c

> diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
> index d2f99ca..3d7fc06 100644
> --- a/arch/powerpc/include/asm/cpuidle.h
> +++ b/arch/powerpc/include/asm/cpuidle.h
> @@ -13,6 +13,8 @@
>  #ifndef __ASSEMBLY__
>  extern u32 pnv_fastsleep_workaround_at_entry[];
>  extern u32 pnv_fastsleep_workaround_at_exit[];
> +
> +extern u64 pnv_first_deep_stop_state;

Should this have some safe initial value?

> diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
> index 6bdcd0d..ae3b155 100644
> --- a/arch/powerpc/include/asm/machdep.h
> +++ b/arch/powerpc/include/asm/machdep.h
> @@ -262,6 +262,7 @@ struct machdep_calls {
>  extern void e500_idle(void);
>  extern void power4_idle(void);
>  extern void power7_idle(void);
> +extern void power_stop0(void);

Can that have a better name please?

> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
> index 9bb8ddf..7f3f8c6 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -162,13 +162,20 @@
>  
>  /* Device tree flags */
>  
> -/* Flags set in power-mgmt nodes in device tree if
> - * respective idle states are supported in the platform.
> +/*
> + * Flags set in power-mgmt nodes in device tree describing
> + * idle states that are supported in the platform.
>   */
> +
> +#define OPAL_PM_TIMEBASE_STOP		0x00000002
> +#define OPAL_PM_LOSE_HYP_CONTEXT	0x00002000
> +#define OPAL_PM_LOSE_FULL_CONTEXT	0x00004000
>  #define OPAL_PM_NAP_ENABLED		0x00010000
>  #define OPAL_PM_SLEEP_ENABLED		0x00020000
>  #define OPAL_PM_WINKLE_ENABLED		0x00040000
>  #define OPAL_PM_SLEEP_ENABLED_ER1	0x00080000 /* with workaround */
> +#define OPAL_PM_STOP_INST_FAST		0x00100000
> +#define OPAL_PM_STOP_INST_DEEP		0x00200000

I don't see the above in skiboot yet?

> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index 546540b..ae91b44 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -171,6 +171,8 @@ struct paca_struct {
>  	/* Mask to denote subcore sibling threads */
>  	u8 subcore_sibling_mask;
>  #endif
> +	/* Template for PSSCR with EC, ESL, TR, PSLL, MTL fields set */
> +	u64 thread_psscr;

I'm not entirely clear on why that needs to be in the paca. Could it not be global?

> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
> index 009fab1..7f92fc8 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -457,6 +457,7 @@ extern int powersave_nap;	/* set if nap mode can be used in idle loop */
>  extern unsigned long power7_nap(int check_irq);
>  extern unsigned long power7_sleep(void);
>  extern unsigned long power7_winkle(void);
> +extern unsigned long power_stop(unsigned long state);

Can that also have a better name?

> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index a0948f4..89a00d9 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -145,6 +145,16 @@
>  #define MSR_64BIT	0
>  #endif
>  
> +/* Power Management - PSSCR Fields */
> +#define PSSCR_RL_MASK		0x0000000F
> +#define PSSCR_MTL_MASK		0x000000F0
> +#define PSSCR_TR_MASK		0x00000300
> +#define PSSCR_PSLL_MASK		0x000F0000
> +#define PSSCR_EC		0x00100000
> +#define PSSCR_ESL		0x00200000
> +#define PSSCR_SD		0x00400000

Can we get a comment for each #define saying what it is?

> @@ -288,6 +298,7 @@
>  #define SPRN_PMICR	0x354   /* Power Management Idle Control Reg */
>  #define SPRN_PMSR	0x355   /* Power Management Status Reg */
>  #define SPRN_PMMAR	0x356	/* Power Management Memory Activity Register */
> +#define SPRN_PSSCR	0x357	/* Processor Stop Status and Control Register */

Can you add ISA 3, eg:

#define SPRN_PSSCR	0x357	/* Processor Stop Status and Control Register (ISA 3.0) */

I know we haven't been very consistent about that in the past, but we can always try :)

> @@ -761,6 +772,9 @@
>  #define   SIER_SDAR_VALID	0x0200000	/* SDAR contents valid */
>  #define SPRN_SIAR	796
>  #define SPRN_SDAR	797
> +#define SPRN_LMRR	813
> +#define SPRN_LMSER	814
> +#define SPRN_ASDR	816

Ditto, comments with ISA 3 please.

> diff --git a/arch/powerpc/kernel/idle_power_common.S b/arch/powerpc/kernel/idle_power_common.S
> index 2f909a1..c6c2f66 100644
> --- a/arch/powerpc/kernel/idle_power_common.S
> +++ b/arch/powerpc/kernel/idle_power_common.S
> @@ -50,6 +55,15 @@
>  	IDLE_INST;						\
>  	b	.
>  
> +/*
> + * rA - Requested stop state
> + * rB - Spare reg that can be used
> + */
> +#define PSSCR_REQUEST_STATE(rA, rB) 		\
> +	ld	rB, PACA_THREAD_PSSCR(r13);	\
> +	or	rB,rB,rA;			\
> +	mtspr	SPRN_PSSCR, rB;

I only see this used once, so it can just be inline.

> +
>  	.text
>  
>  /*
> @@ -61,8 +75,19 @@ save_sprs_to_stack:
>  	 * Note all register i.e per-core, per-subcore or per-thread is saved
>  	 * here since any thread in the core might wake up first
>  	 */
> +BEGIN_FTR_SECTION
> +	mfspr	r3,SPRN_PTCR
> +	std	r3,_PTCR(r1)
> +	mfspr	r3,SPRN_LMRR
> +	std	r3,_LMRR(r1)
> +	mfspr	r3,SPRN_LMSER
> +	std	r3,_LMSER(r1)
> +	mfspr	r3,SPRN_ASDR
> +	std	r3,_ASDR(r1)
> +FTR_SECTION_ELSE

A comment here saying that SDR1 is removed in ISA 3.0 would be helpful.

>  	mfspr	r3,SPRN_SDR1
>  	std	r3,_SDR1(r1)
> +ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)

> @@ -293,6 +354,21 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);		\
>  
>  
>  /*
> + * Used for ppc_md.power_save which needs a function with no parameters
> + */
> +_GLOBAL(power_stop0)
> +	li	r3,0

Zero?

> +	/* Fall through to power_stop */

I think I'd rather you just did that as a C function.

> +/*
> + * r3 - requested stop state
> + */
> +_GLOBAL(power_stop)
> +	PSSCR_REQUEST_STATE(r3,r4)
> +	li	r4, 1
> +	LOAD_REG_ADDR(r5,power_enter_stop)
> +	b	pnv_powersave_common
> +	/* No return */
> +/*
>   * Called from reset vector. Check whether we have woken up with
>   * hypervisor state loss. If yes, restore hypervisor state and return
>   * back to reset vector.
> @@ -301,7 +377,32 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);		\
>   * cr3 - set to gt if waking up with partial/complete hypervisor state loss
>   */
>  _GLOBAL(pnv_restore_hyp_resource)
> +BEGIN_FTR_SECTION
>  	/*
> +	 * POWER ISA 3. Use PSSCR to determine if we
> +	 * are waking up from deep idle state
> +	 */
> +	LOAD_REG_ADDRBASE(r5,pnv_first_deep_stop_state)

That's an @got load using r2, but have we restored r2 yet?

> +	ld	r4,ADDROFF(pnv_first_deep_stop_state)(r5)
> +
> +	mfspr	r5,SPRN_PSSCR

> @@ -397,8 +507,11 @@ first_thread_in_subcore:
>  	bne	cr4,subcore_state_restored
>  
>  	/* Restore per-subcore state */

We don't have subcores on P9, or did I miss a memo?

A comment somewhere explaining that would help I think, it's not clear AFAICS.

> +BEGIN_FTR_SECTION
>  	ld      r4,_SDR1(r1)
>  	mtspr   SPRN_SDR1,r4
> +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
> +
>  	ld      r4,_RPR(r1)
>  	mtspr   SPRN_RPR,r4
>  	ld	r4,_AMOR(r1)

> @@ -477,6 +601,21 @@ common_exit:
>  	slbmte	r6,r5
>  1:	addi	r8,r8,16
>  	.endr
> +END_MMU_FTR_SECTION_IFCLR(MMU_FTR_RADIX)
> +
> +	/* Restore per thread state */
> +BEGIN_FTR_SECTION
> +	bl      __restore_cpu_power9
> +
> +	ld	r4,_LMRR(r1)
> +	mtspr	SPRN_LMRR,r4
> +	ld	r4,_LMSER(r1)
> +	mtspr	SPRN_LMSER,r4
> +	ld	r4,_ASDR(r1)
> +	mtspr	SPRN_ASDR,r4

Should those be in __restore_cpu_power9 ?

> +FTR_SECTION_ELSE
> +	bl	__restore_cpu_power8
> +ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)

Then we could potentially do the above by calling through cur_cpu_spec.

> diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> index fbb09fb..bfbd359 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -27,9 +27,11 @@
>  #include "powernv.h"
>  #include "subcore.h"
>  
> +#define MAX_STOP_STATE	0xF

Says who?

> @@ -130,8 +136,8 @@ static void pnv_alloc_idle_core_states(void)
>  
>  	update_subcore_sibling_mask();
>  
> -	if (supported_cpuidle_states & OPAL_PM_WINKLE_ENABLED)
> -		pnv_save_sprs_for_winkle();
> +	if (supported_cpuidle_states & OPAL_PM_LOSE_FULL_CONTEXT)
> +		pnv_save_sprs_for_deep_states();

How does that work on old skiboot that doesn't set OPAL_PM_LOSE_FULL_CONTEXT but
still sets OPAL_PM_WINKLE_ENABLED?

>  }
>  
>  u32 pnv_get_supported_cpuidle_states(void)
> @@ -230,11 +236,18 @@ static DEVICE_ATTR(fastsleep_workaround_applyonce, 0600,
>  			show_fastsleep_workaround_applyonce,
>  			store_fastsleep_workaround_applyonce);
>  
> +/*
> + * First deep stop state. Used to figure out when to save/restore
> + * hypervisor context.
> + */
> +u64 pnv_first_deep_stop_state;
> +
>  static int __init pnv_init_idle_states(void)
>  {
>  	struct device_node *power_mgt;

I prefer just "np" - it's shorter and I immediately know what it is.

>  	int dt_idle_states;
>  	u32 *flags;
> +	u64 *psscr_val = NULL;
>  	int i;
>  
>  	supported_cpuidle_states = 0;
> @@ -264,6 +277,32 @@ static int __init pnv_init_idle_states(void)
>  		goto out_free;
>  	}
>  
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {

> +		psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val),
> +					GFP_KERNEL);
> +		if (!psscr_val)
> +			goto out_free;

Newline please.

> +		if (of_property_read_u64_array(power_mgt,
> +			"ibm,cpu-idle-state-psscr",
> +			psscr_val, dt_idle_states)) {
> +			pr_warn("cpuidle-powernv: missing ibm,cpu-idle-states-psscr in DT\n");
> +			goto out_free_psscr;
> +		}
> +
> +		/*
> +		 * Set pnv_first_deep_stop_state to the first stop level
> +		 * to cause hypervisor state loss
> +		 */
> +		pnv_first_deep_stop_state = MAX_STOP_STATE;
> +		for (i = 0; i < dt_idle_states; i++) {
> +			u64 psscr_rl = psscr_val[i] & PSSCR_RL_MASK;
> +
> +			if ((flags[i] & OPAL_PM_LOSE_FULL_CONTEXT) &&
> +			     (pnv_first_deep_stop_state > psscr_rl))
> +				pnv_first_deep_stop_state = psscr_rl;
> +		}
> +	}

This function is >110 lines, which is too big for my taste.

The above would be better as a separate function I think.

>  	for (i = 0; i < dt_idle_states; i++)
>  		supported_cpuidle_states |= flags[i];
>  
> @@ -286,8 +325,29 @@ static int __init pnv_init_idle_states(void)
>  
>  	pnv_alloc_idle_core_states();
>  
> +	if (supported_cpuidle_states & OPAL_PM_STOP_INST_FAST)
> +		for_each_possible_cpu(i) {
> +
> +			u64 psscr_init_val = PSSCR_ESL | PSSCR_EC |
> +					PSSCR_PSLL_MASK | PSSCR_TR_MASK |
> +					PSSCR_MTL_MASK;
> +
> +			paca[i].thread_psscr = psscr_init_val;
> +			/*
> +			 * Memory barrier to ensure that the writes to PACA
> +			 * goes through before ppc_md.power_save is updated
> +			 * below.
> +			 */
> +			mb();
> +		}

And likewise that loop.

cheers


* Re: [v6, 08/11] powerpc/powernv: Add platform support for stop instruction
  2016-06-15 11:14   ` [v6, " Michael Ellerman
@ 2016-06-15 13:00     ` Shreyas B Prabhu
  2016-06-21  6:14       ` Michael Neuling
  0 siblings, 1 reply; 30+ messages in thread
From: Shreyas B Prabhu @ 2016-06-15 13:00 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: ego, mikey, benh, linux-kernel, maddy, linuxppc-dev

Hi Michael,

On 06/15/2016 04:44 PM, Michael Ellerman wrote:
> Hi Shreyas,
> 
> Comments inline ...
> 
> On Wed, 2016-06-08 at 16:54:28 UTC, "Shreyas B. Prabhu" wrote:
>> POWER ISA v3 defines a new idle processor core mechanism. In summary,
>>  a) new instruction named stop is added. This instruction replaces
>> 	instructions like nap, sleep, rvwinkle.
>>  b) new per thread SPR named Processor Stop Status and Control Register
>> 	(PSSCR) is added which controls the behavior of stop instruction.
>>
>> PSSCR layout:
>> ----------------------------------------------------------
>> | PLS | /// | SD | ESL | EC | PSLL | /// | TR | MTL | RL |
>> ----------------------------------------------------------
>> 0      4     41   42    43   44     48    54   56    60
>>
>> PSSCR key fields:
>> 	Bits 0:3  - Power-Saving Level Status. This field indicates the lowest
>> 	power-saving state the thread entered since stop instruction was last
>> 	executed.
>>
>> 	Bit 42 - Enable State Loss
>> 	0 - No state is lost irrespective of other fields
>> 	1 - Allows state loss
>>
>> 	Bits 44:47 - Power-Saving Level Limit
>> 	This limits the power-saving level that can be entered into.
>>
>> 	Bits 60:63 - Requested Level
>> 	Used to specify which power-saving level must be entered on executing
>> 	stop instruction
> 
> That would probably be good as a comment somewhere too, maybe idle.c
> 

Ok. I'll add it there.
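
For reference, the layout above uses IBM bit numbering, where bit 0 is the
most significant bit of the 64-bit register. A minimal sketch of how a
field's bit range maps to a conventional mask follows; the helper name
ibm_field_mask is hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): build the mask for a field
 * occupying IBM-numbered bits first..last of a 64-bit register, where
 * bit 0 is the most significant bit and bit 63 the least significant.
 */
static uint64_t ibm_field_mask(unsigned int first, unsigned int last)
{
	unsigned int width = last - first + 1;

	assert(first <= last && last <= 63);
	/* Handle the full-width case separately to avoid shifting by 64. */
	if (width == 64)
		return ~UINT64_C(0);
	return ((UINT64_C(1) << width) - 1) << (63 - last);
}
```

Under that assumption, ibm_field_mask(60, 63) yields 0xF for RL and
ibm_field_mask(44, 47) yields 0xF0000 for PSLL, which agrees with the
PSSCR_* defines in the reg.h hunk of this patch.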

>> diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
>> index d2f99ca..3d7fc06 100644
>> --- a/arch/powerpc/include/asm/cpuidle.h
>> +++ b/arch/powerpc/include/asm/cpuidle.h
>> @@ -13,6 +13,8 @@
>>  #ifndef __ASSEMBLY__
>>  extern u32 pnv_fastsleep_workaround_at_entry[];
>>  extern u32 pnv_fastsleep_workaround_at_exit[];
>> +
>> +extern u64 pnv_first_deep_stop_state;
> 
> Should this have some safe initial value?
> 
>> diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
>> index 6bdcd0d..ae3b155 100644
>> --- a/arch/powerpc/include/asm/machdep.h
>> +++ b/arch/powerpc/include/asm/machdep.h
>> @@ -262,6 +262,7 @@ struct machdep_calls {
>>  extern void e500_idle(void);
>>  extern void power4_idle(void);
>>  extern void power7_idle(void);
>> +extern void power_stop0(void);
> 
> Can that have a better name please?

What do you have in mind?
power_arch300_idle0()?
power_arch300_stop0()?
or I can use power9_idle() here. But we will still need a better
replacement for power_stop() below.

> 
>> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
>> index 9bb8ddf..7f3f8c6 100644
>> --- a/arch/powerpc/include/asm/opal-api.h
>> +++ b/arch/powerpc/include/asm/opal-api.h
>> @@ -162,13 +162,20 @@
>>  
>>  /* Device tree flags */
>>  
>> -/* Flags set in power-mgmt nodes in device tree if
>> - * respective idle states are supported in the platform.
>> +/*
>> + * Flags set in power-mgmt nodes in device tree describing
>> + * idle states that are supported in the platform.
>>   */
>> +
>> +#define OPAL_PM_TIMEBASE_STOP		0x00000002
>> +#define OPAL_PM_LOSE_HYP_CONTEXT	0x00002000
>> +#define OPAL_PM_LOSE_FULL_CONTEXT	0x00004000
>>  #define OPAL_PM_NAP_ENABLED		0x00010000
>>  #define OPAL_PM_SLEEP_ENABLED		0x00020000
>>  #define OPAL_PM_WINKLE_ENABLED		0x00040000
>>  #define OPAL_PM_SLEEP_ENABLED_ER1	0x00080000 /* with workaround */
>> +#define OPAL_PM_STOP_INST_FAST		0x00100000
>> +#define OPAL_PM_STOP_INST_DEEP		0x00200000
> 
> I don't see the above in skiboot yet?

I've posted it here -
http://patchwork.ozlabs.org/patch/617828/
> 
>> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
>> index 546540b..ae91b44 100644
>> --- a/arch/powerpc/include/asm/paca.h
>> +++ b/arch/powerpc/include/asm/paca.h
>> @@ -171,6 +171,8 @@ struct paca_struct {
>>  	/* Mask to denote subcore sibling threads */
>>  	u8 subcore_sibling_mask;
>>  #endif
>> +	/* Template for PSSCR with EC, ESL, TR, PSLL, MTL fields set */
>> +	u64 thread_psscr;
> 
> I'm not entirely clear on why that needs to be in the paca. Could it not be global?
> 

While we use the Requested Level (RL) field of PSSCR to request a stop
level, the other fields in the SPR (EC, ESL, TR, PSLL, MTL) can be
modified by individual threads, less frequently, to alter the behaviour of
stop. So the idea was to have a per-thread variable with all fields of
PSSCR except RL set appropriately. When entering idle, a thread can set
the RL field in the variable and execute the stop instruction.
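
A minimal sketch of that idea in C, assuming the PSSCR_* values from the
reg.h hunk of this patch. psscr_for_level is a hypothetical name standing
in for what the PSSCR_REQUEST_STATE macro does in assembly; unlike the
macro, it clears RL in the template before OR-ing in the requested level.

```c
#include <assert.h>
#include <stdint.h>

/* Field masks copied from the reg.h hunk of this patch. */
#define PSSCR_RL_MASK	UINT64_C(0x0000000F)
#define PSSCR_MTL_MASK	UINT64_C(0x000000F0)
#define PSSCR_TR_MASK	UINT64_C(0x00000300)
#define PSSCR_PSLL_MASK	UINT64_C(0x000F0000)
#define PSSCR_EC	UINT64_C(0x00100000)
#define PSSCR_ESL	UINT64_C(0x00200000)

/* Template with every field except RL set, as in pnv_init_idle_states(). */
static const uint64_t psscr_template = PSSCR_ESL | PSSCR_EC |
		PSSCR_PSLL_MASK | PSSCR_TR_MASK | PSSCR_MTL_MASK;

/*
 * Hypothetical helper: combine a template with the requested level.
 * The result is the value that would be written to the PSSCR before
 * executing stop.
 */
static uint64_t psscr_for_level(uint64_t tmpl, uint64_t level)
{
	assert((level & ~PSSCR_RL_MASK) == 0);	/* RL is a 4-bit field */
	return (tmpl & ~PSSCR_RL_MASK) | level;
}
```

Since the template never changes after init in this sketch, it is a plain
constant here; a per-thread copy would only be needed once threads start
modifying the non-RL fields individually.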


>> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
>> index 009fab1..7f92fc8 100644
>> --- a/arch/powerpc/include/asm/processor.h
>> +++ b/arch/powerpc/include/asm/processor.h
>> @@ -457,6 +457,7 @@ extern int powersave_nap;	/* set if nap mode can be used in idle loop */
>>  extern unsigned long power7_nap(int check_irq);
>>  extern unsigned long power7_sleep(void);
>>  extern unsigned long power7_winkle(void);
>> +extern unsigned long power_stop(unsigned long state);
> 
> Can that also have a better name?

power_arch300_idle?

> 
>> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
>> index a0948f4..89a00d9 100644
>> --- a/arch/powerpc/include/asm/reg.h
>> +++ b/arch/powerpc/include/asm/reg.h
>> @@ -145,6 +145,16 @@
>>  #define MSR_64BIT	0
>>  #endif
>>  
>> +/* Power Management - PSSCR Fields */
>> +#define PSSCR_RL_MASK		0x0000000F
>> +#define PSSCR_MTL_MASK		0x000000F0
>> +#define PSSCR_TR_MASK		0x00000300
>> +#define PSSCR_PSLL_MASK		0x000F0000
>> +#define PSSCR_EC		0x00100000
>> +#define PSSCR_ESL		0x00200000
>> +#define PSSCR_SD		0x00400000
> 
> Can we get a comment for each #define saying what it is?
> 

Yes, Sam and Madhavan suggested this as well. I'll put the comments in
the next revision.

>> @@ -288,6 +298,7 @@
>>  #define SPRN_PMICR	0x354   /* Power Management Idle Control Reg */
>>  #define SPRN_PMSR	0x355   /* Power Management Status Reg */
>>  #define SPRN_PMMAR	0x356	/* Power Management Memory Activity Register */
>> +#define SPRN_PSSCR	0x357	/* Processor Stop Status and Control Register */
> 
> Can you add ISA 3, eg:
> 
> #define SPRN_PSSCR	0x357	/* Processor Stop Status and Control Register (ISA 3.0) */
> 
> I know we haven't been very consistent about that in the past, but we can always try :)
> 

Ok.

>> @@ -761,6 +772,9 @@
>>  #define   SIER_SDAR_VALID	0x0200000	/* SDAR contents valid */
>>  #define SPRN_SIAR	796
>>  #define SPRN_SDAR	797
>> +#define SPRN_LMRR	813
>> +#define SPRN_LMSER	814
>> +#define SPRN_ASDR	816
> 
> Ditto, comments with ISA 3 please.
> 
>> diff --git a/arch/powerpc/kernel/idle_power_common.S b/arch/powerpc/kernel/idle_power_common.S
>> index 2f909a1..c6c2f66 100644
>> --- a/arch/powerpc/kernel/idle_power_common.S
>> +++ b/arch/powerpc/kernel/idle_power_common.S
>> @@ -50,6 +55,15 @@
>>  	IDLE_INST;						\
>>  	b	.
>>  
>> +/*
>> + * rA - Requested stop state
>> + * rB - Spare reg that can be used
>> + */
>> +#define PSSCR_REQUEST_STATE(rA, rB) 		\
>> +	ld	rB, PACA_THREAD_PSSCR(r13);	\
>> +	or	rB,rB,rA;			\
>> +	mtspr	SPRN_PSSCR, rB;
> 
> I only see this used once, so it can just be inline.
> 
Yes. Will do that.
>> +
>>  	.text
>>  
>>  /*
>> @@ -61,8 +75,19 @@ save_sprs_to_stack:
>>  	 * Note all register i.e per-core, per-subcore or per-thread is saved
>>  	 * here since any thread in the core might wake up first
>>  	 */
>> +BEGIN_FTR_SECTION
>> +	mfspr	r3,SPRN_PTCR
>> +	std	r3,_PTCR(r1)
>> +	mfspr	r3,SPRN_LMRR
>> +	std	r3,_LMRR(r1)
>> +	mfspr	r3,SPRN_LMSER
>> +	std	r3,_LMSER(r1)
>> +	mfspr	r3,SPRN_ASDR
>> +	std	r3,_ASDR(r1)
>> +FTR_SECTION_ELSE
> 
> A comment here saying that SDR1 is removed in ISA 3.0 would be helpful.
> 

Ok.

>>  	mfspr	r3,SPRN_SDR1
>>  	std	r3,_SDR1(r1)
>> +ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
> 
>> @@ -293,6 +354,21 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);		\
>>  
>>  
>>  /*
>> + * Used for ppc_md.power_save which needs a function with no parameters
>> + */
>> +_GLOBAL(power_stop0)
>> +	li	r3,0
> 
> Zero?
> 
Passing 0 to power_stop. This is just to have a function with no
parameters that can be used for ppc_md.power_save.

>> +	/* Fall through to power_stop */
> 
> I think I'd rather you just did that as a C function.
> 

Ok.
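
A minimal sketch of what such a C wrapper could look like. The wrapper
name power9_idle is only one of the candidates floated above, and
power_stop is stubbed here so that the example builds outside the kernel.

```c
#include <assert.h>

/* Stub standing in for the assembly power_stop(); records the request. */
static unsigned long last_requested_state = ~0UL;

static unsigned long power_stop(unsigned long state)
{
	assert(state <= 0xF);	/* the requested level is a 4-bit field */
	last_requested_state = state;
	return 0;
}

/*
 * Parameterless wrapper suitable for a void (*)(void) hook such as
 * ppc_md.power_save. The name is an assumption, not from the patch.
 */
static void power9_idle(void)
{
	power_stop(0);	/* request stop level 0 */
}
```

This keeps the "pass 0" detail in C, so the assembly entry point only has
to deal with the requested state in r3.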

>> +/*
>> + * r3 - requested stop state
>> + */
>> +_GLOBAL(power_stop)
>> +	PSSCR_REQUEST_STATE(r3,r4)
>> +	li	r4, 1
>> +	LOAD_REG_ADDR(r5,power_enter_stop)
>> +	b	pnv_powersave_common
>> +	/* No return */
>> +/*
>>   * Called from reset vector. Check whether we have woken up with
>>   * hypervisor state loss. If yes, restore hypervisor state and return
>>   * back to reset vector.
>> @@ -301,7 +377,32 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);		\
>>   * cr3 - set to gt if waking up with partial/complete hypervisor state loss
>>   */
>>  _GLOBAL(pnv_restore_hyp_resource)
>> +BEGIN_FTR_SECTION
>>  	/*
>> +	 * POWER ISA 3. Use PSSCR to determine if we
>> +	 * are waking up from deep idle state
>> +	 */
>> +	LOAD_REG_ADDRBASE(r5,pnv_first_deep_stop_state)
> 
> That's an @got load using r2, but have we restored r2 yet?

That's a bug. I'll fix it in the next revision.

> 
>> +	ld	r4,ADDROFF(pnv_first_deep_stop_state)(r5)
>> +
>> +	mfspr	r5,SPRN_PSSCR
> 
>> @@ -397,8 +507,11 @@ first_thread_in_subcore:
>>  	bne	cr4,subcore_state_restored
>>  
>>  	/* Restore per-subcore state */
> 
> We don't have subcores on P9, or did I miss a memo?
> 
> A comment somewhere explaining that would help I think, it's not clear AFAICS.
> 

Ok.

>> +BEGIN_FTR_SECTION
>>  	ld      r4,_SDR1(r1)
>>  	mtspr   SPRN_SDR1,r4
>> +END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
>> +
>>  	ld      r4,_RPR(r1)
>>  	mtspr   SPRN_RPR,r4
>>  	ld	r4,_AMOR(r1)
> 
>> @@ -477,6 +601,21 @@ common_exit:
>>  	slbmte	r6,r5
>>  1:	addi	r8,r8,16
>>  	.endr
>> +END_MMU_FTR_SECTION_IFCLR(MMU_FTR_RADIX)
>> +
>> +	/* Restore per thread state */
>> +BEGIN_FTR_SECTION
>> +	bl      __restore_cpu_power9
>> +
>> +	ld	r4,_LMRR(r1)
>> +	mtspr	SPRN_LMRR,r4
>> +	ld	r4,_LMSER(r1)
>> +	mtspr	SPRN_LMSER,r4
>> +	ld	r4,_ASDR(r1)
>> +	mtspr	SPRN_ASDR,r4
> 
> Should those be in __restore_cpu_power9 ?

I was not sure how these registers will be used, but after speaking to
Aneesh and Mikey I realized these registers will not need restoring.
LMRR and LMSER are associated with the context and ASDR will be consumed
before entering stop. So I'll be dropping this hunk in the next revision.

> 
>> +FTR_SECTION_ELSE
>> +	bl	__restore_cpu_power8
>> +ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
> 
> Then we could potentially do the above by calling through cur_cpu_spec.
>


I'll call cur_cpu_spec->cpu_restore() here.

>> diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
>> index fbb09fb..bfbd359 100644
>> --- a/arch/powerpc/platforms/powernv/idle.c
>> +++ b/arch/powerpc/platforms/powernv/idle.c
>> @@ -27,9 +27,11 @@
>>  #include "powernv.h"
>>  #include "subcore.h"
>>  
>> +#define MAX_STOP_STATE	0xF
> 
> Says who?
>

ISA. I'll add a comment.

>> @@ -130,8 +136,8 @@ static void pnv_alloc_idle_core_states(void)
>>  
>>  	update_subcore_sibling_mask();
>>  
>> -	if (supported_cpuidle_states & OPAL_PM_WINKLE_ENABLED)
>> -		pnv_save_sprs_for_winkle();
>> +	if (supported_cpuidle_states & OPAL_PM_LOSE_FULL_CONTEXT)
>> +		pnv_save_sprs_for_deep_states();
> 
> How does that work on old skiboot that doesn't set OPAL_PM_LOSE_FULL_CONTEXT but
> still sets OPAL_PM_WINKLE_ENABLED?
>

Even old skiboot has OPAL_PM_LOSE_FULL_CONTEXT flag set for winkle. So
this will be backward compatible.

>>  }
>>  
>>  u32 pnv_get_supported_cpuidle_states(void)
>> @@ -230,11 +236,18 @@ static DEVICE_ATTR(fastsleep_workaround_applyonce, 0600,
>>  			show_fastsleep_workaround_applyonce,
>>  			store_fastsleep_workaround_applyonce);
>>  
>> +/*
>> + * First deep stop state. Used to figure out when to save/restore
>> + * hypervisor context.
>> + */
>> +u64 pnv_first_deep_stop_state;
>> +
>>  static int __init pnv_init_idle_states(void)
>>  {
>>  	struct device_node *power_mgt;
> 
> I prefer just "np" - it's shorter and I immediately know what it is.
> 

Ok.
>>  	int dt_idle_states;
>>  	u32 *flags;
>> +	u64 *psscr_val = NULL;
>>  	int i;
>>  
>>  	supported_cpuidle_states = 0;
>> @@ -264,6 +277,32 @@ static int __init pnv_init_idle_states(void)
>>  		goto out_free;
>>  	}
>>  
>> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> 
>> +		psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val),
>> +					GFP_KERNEL);
>> +		if (!psscr_val)
>> +			goto out_free;
> 
> Newline please.

Ok.
> 
>> +		if (of_property_read_u64_array(power_mgt,
>> +			"ibm,cpu-idle-state-psscr",
>> +			psscr_val, dt_idle_states)) {
>> +			pr_warn("cpuidle-powernv: missing ibm,cpu-idle-states-psscr in DT\n");
>> +			goto out_free_psscr;
>> +		}
>> +
>> +		/*
>> +		 * Set pnv_first_deep_stop_state to the first stop level
>> +		 * to cause hypervisor state loss
>> +		 */
>> +		pnv_first_deep_stop_state = MAX_STOP_STATE;
>> +		for (i = 0; i < dt_idle_states; i++) {
>> +			u64 psscr_rl = psscr_val[i] & PSSCR_RL_MASK;
>> +
>> +			if ((flags[i] & OPAL_PM_LOSE_FULL_CONTEXT) &&
>> +			     (pnv_first_deep_stop_state > psscr_rl))
>> +				pnv_first_deep_stop_state = psscr_rl;
>> +		}
>> +	}
> 
> This function is >110 lines, which is too big for my taste.
> 
> The above would be better as a separate function I think.

Ok. I'll break this up.
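
One possible shape for the split-out helper, as a sketch only. The
function name and signature are hypothetical; the constants are copied
from this patch so that the example builds outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STOP_STATE			0xF
#define PSSCR_RL_MASK			0x0000000F
#define OPAL_PM_LOSE_FULL_CONTEXT	0x00004000

/*
 * Hypothetical helper: return the lowest requested level (RL) among the
 * states flagged as losing full context, or MAX_STOP_STATE if no state
 * carries that flag.
 */
static uint64_t pnv_find_first_deep_stop_state(const uint32_t *flags,
					       const uint64_t *psscr_val,
					       size_t nr_states)
{
	uint64_t first_deep = MAX_STOP_STATE;
	size_t i;

	assert(nr_states == 0 || (flags && psscr_val));
	for (i = 0; i < nr_states; i++) {
		uint64_t psscr_rl = psscr_val[i] & PSSCR_RL_MASK;

		if ((flags[i] & OPAL_PM_LOSE_FULL_CONTEXT) &&
		    first_deep > psscr_rl)
			first_deep = psscr_rl;
	}
	return first_deep;
}
```

pnv_init_idle_states() would then only allocate and read the arrays and
assign the result to pnv_first_deep_stop_state.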
> 
>>  	for (i = 0; i < dt_idle_states; i++)
>>  		supported_cpuidle_states |= flags[i];
>>  
>> @@ -286,8 +325,29 @@ static int __init pnv_init_idle_states(void)
>>  
>>  	pnv_alloc_idle_core_states();
>>  
>> +	if (supported_cpuidle_states & OPAL_PM_STOP_INST_FAST)
>> +		for_each_possible_cpu(i) {
>> +
>> +			u64 psscr_init_val = PSSCR_ESL | PSSCR_EC |
>> +					PSSCR_PSLL_MASK | PSSCR_TR_MASK |
>> +					PSSCR_MTL_MASK;
>> +
>> +			paca[i].thread_psscr = psscr_init_val;
>> +			/*
>> +			 * Memory barrier to ensure that the writes to PACA
>> +			 * goes through before ppc_md.power_save is updated
>> +			 * below.
>> +			 */
>> +			mb();
>> +		}
> 
> And likewise that loop.
> 
Ok.

Thanks for the review.
--Shreyas


* Re: [v6, 08/11] powerpc/powernv: Add platform support for stop instruction
  2016-06-15 13:00     ` Shreyas B Prabhu
@ 2016-06-21  6:14       ` Michael Neuling
  0 siblings, 0 replies; 30+ messages in thread
From: Michael Neuling @ 2016-06-21  6:14 UTC (permalink / raw)
  To: Shreyas B Prabhu, Michael Ellerman
  Cc: ego, benh, linux-kernel, maddy, linuxppc-dev, Aneesh Kk Veetil


> > > +#define OPAL_PM_TIMEBASE_STOP		0x00000002
> > > +#define OPAL_PM_LOSE_HYP_CONTEXT	0x00002000
> > > +#define OPAL_PM_LOSE_FULL_CONTEXT	0x00004000
> > >  #define OPAL_PM_NAP_ENABLED		0x00010000
> > >  #define OPAL_PM_SLEEP_ENABLED		0x00020000
> > >  #define OPAL_PM_WINKLE_ENABLED		0x00040000
> > >  #define OPAL_PM_SLEEP_ENABLED_ER1	0x00080000 /* with
> > > workaround */
> > > +#define OPAL_PM_STOP_INST_FAST		0x00100000
> > > +#define OPAL_PM_STOP_INST_DEEP		0x00200000
> > I don't see the above in skiboot yet?
> I've posted it here -
> http://patchwork.ozlabs.org/patch/617828/

FWIW, this is in now.

https://github.com/open-power/skiboot/commit/952daa69baca407383bc900911f6c40718a0e289

> > 
> > 
> > > 
> > > diff --git a/arch/powerpc/include/asm/paca.h
> > > b/arch/powerpc/include/asm/paca.h
> > > index 546540b..ae91b44 100644
> > > --- a/arch/powerpc/include/asm/paca.h
> > > +++ b/arch/powerpc/include/asm/paca.h
> > > @@ -171,6 +171,8 @@ struct paca_struct {
> > >  	/* Mask to denote subcore sibling threads */
> > >  	u8 subcore_sibling_mask;
> > >  #endif
> > > +	/* Template for PSSCR with EC, ESL, TR, PSLL, MTL fields set
> > > */
> > > +	u64 thread_psscr;
> > I'm not entirely clear on why that needs to be in the paca. Could it
> > not be global?
> > 
> While we use Requested Level (RL) field of PSSCR to request a stop
> level, other fields in the SPR like EC, ESL, TR, PSLL, MTL can be
> modified by individual threads less frequently to alter the behaviour of
> stop. So the idea was to have a per-thread variable with all (except RL)
> fields of PSSCR set appropriately. Threads at the time of entering idle,
> can modify the RL field in the variable and execute stop instruction.

But we don't do any of this currently? This is set up at init
in pnv_init_idle_states() and only the RL is changed in power_stop().

So it can still be a global.  It could just be a constant currently even.

>  	.text
> > >  
> > >  /*
> > > @@ -61,8 +75,19 @@ save_sprs_to_stack:
> > >  	 * Note all register i.e per-core, per-subcore or per-thread 
> > > is saved
> > >  	 * here since any thread in the core might wake up first
> > >  	 */
> > > +BEGIN_FTR_SECTION
> > > +	mfspr	r3,SPRN_PTCR
> > > +	std	r3,_PTCR(r1)
> > > +	mfspr	r3,SPRN_LMRR
> > > +	std	r3,_LMRR(r1)
> > > +	mfspr	r3,SPRN_LMSER
> > > +	std	r3,_LMSER(r1)
> > > +	mfspr	r3,SPRN_ASDR
> > > +	std	r3,_ASDR(r1)
> > > +FTR_SECTION_ELSE
> > A comment here saying that SDR1 is removed in ISA 3.0 would be helpful.
> > 
> Ok.

I thought we decided we didn't need LMRR and LMSER:

https://lkml.org/lkml/2016/6/8/1121

ASDR isn't actually used at all yet and is only valid for some page
faults, so we don't need it here either.

> +END_MMU_FTR_SECTION_IFCLR(MMU_FTR_RADIX)
> > > +
> > > +	/* Restore per thread state */
> > > +BEGIN_FTR_SECTION
> > > +	bl      __restore_cpu_power9
> > > +
> > > +	ld	r4,_LMRR(r1)
> > > +	mtspr	SPRN_LMRR,r4
> > > +	ld	r4,_LMSER(r1)
> > > +	mtspr	SPRN_LMSER,r4
> > > +	ld	r4,_ASDR(r1)
> > > +	mtspr	SPRN_ASDR,r4
> > Should those be in __restore_cpu_power9 ?
> I was not sure how these registers will be used, but after speaking to
> Aneesh and Mikey I realized these registers will not need restoring.
> LMRR and LMSER are associated with the context and ASDR will be consumed
> before entering stop. So I'll be dropping this hunk in the next revision.

Yep.

> 
> > >  	pnv_alloc_idle_core_states();
> > >  
> > > +	if (supported_cpuidle_states & OPAL_PM_STOP_INST_FAST)
> > > +		for_each_possible_cpu(i) {
> > > +
> > > +			u64 psscr_init_val = PSSCR_ESL | PSSCR_EC |
> > > +					PSSCR_PSLL_MASK | PSSCR_TR_MASK |
> > > +					PSSCR_MTL_MASK;
> > > +
> > > +			paca[i].thread_psscr = psscr_init_val;

This seems to be the only place you set this.  Why put it in the paca, why
not just make this a constant? 

Mikey


* Re: [PATCH v6 07/11] powerpc/powernv: set power_save func after the idle states are initialized
  2016-06-08 16:54 ` [PATCH v6 07/11] powerpc/powernv: set power_save func after the idle states are initialized Shreyas B. Prabhu
  2016-06-15  5:41   ` [v6, " Michael Ellerman
@ 2016-06-22  1:54   ` Benjamin Herrenschmidt
  2016-06-22  5:11     ` Michael Neuling
  2016-06-28 12:10   ` [v6, " Michael Ellerman
  2 siblings, 1 reply; 30+ messages in thread
From: Benjamin Herrenschmidt @ 2016-06-22  1:54 UTC (permalink / raw)
  To: Shreyas B. Prabhu, mpe; +Cc: ego, mikey, linux-kernel, maddy, linuxppc-dev

On Wed, 2016-06-08 at 11:54 -0500, Shreyas B. Prabhu wrote:
> pnv_init_idle_states discovers supported idle states from the
> device tree and does the required initialization. Set power_save
> function pointer only after this initialization is done
> 
> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Please merge this one as-is now, as otherwise POWER9 crashes at boot.
It doesn't need to wait for the rest of the series.

Cheers,
Ben.

> ---
> - No changes since v1
> 
>  arch/powerpc/platforms/powernv/idle.c  | 3 +++
>  arch/powerpc/platforms/powernv/setup.c | 2 +-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/idle.c
> b/arch/powerpc/platforms/powernv/idle.c
> index fcc8b68..fbb09fb 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -285,6 +285,9 @@ static int __init pnv_init_idle_states(void)
>  	}
>  
>  	pnv_alloc_idle_core_states();
> +
> +	if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED)
> +		ppc_md.power_save = power7_idle;
>  out_free:
>  	kfree(flags);
>  out:
> diff --git a/arch/powerpc/platforms/powernv/setup.c
> b/arch/powerpc/platforms/powernv/setup.c
> index ee6430b..8492bbb 100644
> --- a/arch/powerpc/platforms/powernv/setup.c
> +++ b/arch/powerpc/platforms/powernv/setup.c
> @@ -315,7 +315,7 @@ define_machine(powernv) {
>  	.get_proc_freq          = pnv_get_proc_freq,
>  	.progress		= pnv_progress,
>  	.machine_shutdown	= pnv_shutdown,
> -	.power_save             = power7_idle,
> +	.power_save             = NULL,
>  	.calibrate_decr		= generic_calibrate_decr,
>  #ifdef CONFIG_KEXEC
>  	.kexec_cpu_down		= pnv_kexec_cpu_down,


* Re: [PATCH v6 07/11] powerpc/powernv: set power_save func after the idle states are initialized
  2016-06-22  1:54   ` [PATCH v6 " Benjamin Herrenschmidt
@ 2016-06-22  5:11     ` Michael Neuling
  0 siblings, 0 replies; 30+ messages in thread
From: Michael Neuling @ 2016-06-22  5:11 UTC (permalink / raw)
  To: benh, Shreyas B. Prabhu, mpe; +Cc: ego, linux-kernel, maddy, linuxppc-dev

On Wed, 2016-06-22 at 11:54 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2016-06-08 at 11:54 -0500, Shreyas B. Prabhu wrote:
> > 
> > pnv_init_idle_states discovers supported idle states from the
> > device tree and does the required initialization. Set power_save
> > function pointer only after this initialization is done
> > 
> > Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> > Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> 
> Please merge this one as-is now, as otherwise POWER9 crashes at boot.
> It doesn't need to wait for the rest of the series.

Acked-by: Michael Neuling <mikey@neuling.org>

For the same reason. Without this we need powersave=off on the cmdline on
POWER9.

Mikey

> 
> Cheers,
> Ben.
> 
> > 
> > ---
> > - No changes since v1
> > 
> >  arch/powerpc/platforms/powernv/idle.c  | 3 +++
> >  arch/powerpc/platforms/powernv/setup.c | 2 +-
> >  2 files changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/platforms/powernv/idle.c
> > b/arch/powerpc/platforms/powernv/idle.c
> > index fcc8b68..fbb09fb 100644
> > --- a/arch/powerpc/platforms/powernv/idle.c
> > +++ b/arch/powerpc/platforms/powernv/idle.c
> > @@ -285,6 +285,9 @@ static int __init pnv_init_idle_states(void)
> >  	}
> >  
> >  	pnv_alloc_idle_core_states();
> > +
> > +	if (supported_cpuidle_states & OPAL_PM_NAP_ENABLED)
> > +		ppc_md.power_save = power7_idle;
> >  out_free:
> >  	kfree(flags);
> >  out:
> > diff --git a/arch/powerpc/platforms/powernv/setup.c
> > b/arch/powerpc/platforms/powernv/setup.c
> > index ee6430b..8492bbb 100644
> > --- a/arch/powerpc/platforms/powernv/setup.c
> > +++ b/arch/powerpc/platforms/powernv/setup.c
> > @@ -315,7 +315,7 @@ define_machine(powernv) {
> >  	.get_proc_freq          = pnv_get_proc_freq,
> >  	.progress		= pnv_progress,
> >  	.machine_shutdown	= pnv_shutdown,
> > -	.power_save             = power7_idle,
> > +	.power_save             = NULL,
> >  	.calibrate_decr		= generic_calibrate_decr,
> >  #ifdef CONFIG_KEXEC
> >  	.kexec_cpu_down		= pnv_kexec_cpu_down,


* Re: [v6, 07/11] powerpc/powernv: set power_save func after the idle states are initialized
  2016-06-08 16:54 ` [PATCH v6 07/11] powerpc/powernv: set power_save func after the idle states are initialized Shreyas B. Prabhu
  2016-06-15  5:41   ` [v6, " Michael Ellerman
  2016-06-22  1:54   ` [PATCH v6 " Benjamin Herrenschmidt
@ 2016-06-28 12:10   ` Michael Ellerman
  2 siblings, 0 replies; 30+ messages in thread
From: Michael Ellerman @ 2016-06-28 12:10 UTC (permalink / raw)
  To: Shreyas B. Prabhu
  Cc: ego, mikey, benh, Shreyas B. Prabhu, linux-kernel, maddy, linuxppc-dev

On Wed, 2016-06-08 at 16:54:27 UTC, "Shreyas B. Prabhu" wrote:
> pnv_init_idle_states discovers supported idle states from the
> device tree and does the required initialization. Set power_save
> function pointer only after this initialization is done
> 
> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Acked-by: Michael Neuling <mikey@neuling.org>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/5593e3032736ccba30d28bd27e

cheers


Thread overview: 30+ messages
2016-06-08 16:54 [PATCH v6 00/11] powerpc/powernv/cpuidle: Add support for POWER ISA v3 idle states Shreyas B. Prabhu
2016-06-08 16:54 ` [PATCH v6 01/11] powerpc/powernv: Use PNV_THREAD_WINKLE macro while requesting for winkle Shreyas B. Prabhu
2016-06-08 16:54 ` [PATCH v6 02/11] powerpc/kvm: make hypervisor state restore a function Shreyas B. Prabhu
2016-06-08 16:54 ` [PATCH v6 03/11] powerpc/powernv: Rename idle_power7.S to idle_power_common.S Shreyas B. Prabhu
2016-06-15  5:31   ` [v6, " Michael Ellerman
2016-06-08 16:54 ` [PATCH v6 04/11] powerpc/powernv: Rename reusable idle functions to hardware agnostic names Shreyas B. Prabhu
2016-06-08 16:54 ` [PATCH v6 05/11] powerpc/powernv: Make pnv_powersave_common more generic Shreyas B. Prabhu
2016-06-08 16:54 ` [PATCH v6 06/11] powerpc/powernv: abstraction for saving SPRs before entering deep idle states Shreyas B. Prabhu
2016-06-08 16:54 ` [PATCH v6 07/11] powerpc/powernv: set power_save func after the idle states are initialized Shreyas B. Prabhu
2016-06-15  5:41   ` [v6, " Michael Ellerman
2016-06-15  6:11     ` Shreyas B Prabhu
2016-06-22  1:54   ` [PATCH v6 " Benjamin Herrenschmidt
2016-06-22  5:11     ` Michael Neuling
2016-06-28 12:10   ` [v6, " Michael Ellerman
2016-06-08 16:54 ` [PATCH v6 08/11] powerpc/powernv: Add platform support for stop instruction Shreyas B. Prabhu
2016-06-15 11:14   ` [v6, " Michael Ellerman
2016-06-15 13:00     ` Shreyas B Prabhu
2016-06-21  6:14       ` Michael Neuling
2016-06-08 16:54 ` [PATCH v6 09/11] cpuidle/powernv: Use CPUIDLE_STATE_MAX instead of MAX_POWERNV_IDLE_STATES Shreyas B. Prabhu
2016-06-13 15:01   ` Daniel Lezcano
2016-06-13 20:58     ` Rafael J. Wysocki
2016-06-08 16:54 ` [PATCH v6 10/11] cpuidle/powernv: Add support for POWER ISA v3 idle states Shreyas B. Prabhu
2016-06-13 15:34   ` Daniel Lezcano
2016-06-14 10:47     ` Shreyas B Prabhu
2016-06-14 11:29       ` Benjamin Herrenschmidt
2016-06-15  4:57         ` Shreyas B Prabhu
2016-06-13 21:48   ` Benjamin Herrenschmidt
2016-06-14 11:11     ` Shreyas B Prabhu
2016-06-14 11:31       ` Benjamin Herrenschmidt
2016-06-08 16:54 ` [PATCH v6 11/11] powerpc/powernv: Use deepest stop state when cpu is offlined Shreyas B. Prabhu
