All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] Enable support for deep-stop states on POWER9
@ 2017-05-16  8:49 Gautham R. Shenoy
  2017-05-16  8:49 ` [PATCH 1/6] powernv:idle: Correctly initialize core_idle_state_ptr Gautham R. Shenoy
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Gautham R. Shenoy @ 2017-05-16  8:49 UTC (permalink / raw)
  To: Nicholas Piggin, Michael Ellerman, Michael Neuling,
	Vaidyanathan Srinivasan, Shilpasri G Bhat, Akshay Adiga,
	Benjamin Herrenschmidt
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

Hi,

This patch series contains some of the fixes required for enabling
support for deep stop states such as STOP4 and STOP11 via CPU-Hotplug.

These fixes mainly ensure that some of the hypervisor resources which
are lost during the deep stop state are correctly restored on a
wakeup.

There are 6 patches in the series.

Patch 1 correctly initializes the core_idle_state_ptr based on the
threads_per_core. core_idle_state_ptr is used to determine if a thread
is the last thread entering a deep stop state or a first thread waking
up from deep stop state in order to save/restore per-core resources.

Patch 2 decouples restoring timebase from restoring hypervisor
resources, as there are stop states which lose hypervisor state but
not the timebase.

Patch 3 saves the LPCR value before executing deep stop and restores
it back to the saved value on the wakeup from stop.

Patch 4 programs the restoration of some of one-time initialized SPRs
via the stop-api.

Patch 5 provides a workaround for a hardware issue on POWER9 DD1 chips
where the PLS value cannot be relied upon on a wakeup from deep stop.

Patch 6 fixes the cpuidle-powernv initialization code to allow deep
states that don't lose timebase.

These patches are based on the Linux upstream and have been tested
with the corresponding skiboot patches in
https://lists.ozlabs.org/pipermail/skiboot/2017-May/007183.html to get
STOP4 working via CPU-Hotplug.

Akshay Adiga (1):
  powernv:idle: Restore SPRs for deep idle states via stop API.

Gautham R. Shenoy (5):
  powernv:idle: Correctly initialize core_idle_state_ptr
  powernv:idle: Decouple Timebase restore & Per-core SPRs restore
  powernv:idle: Restore LPCR on wakeup from deep-stop
  powernv:idle: Use Requested Level for restoring state on P9 DD1
  cpuidle-powernv: Allow Deep stop states that don't stop time

 arch/powerpc/include/asm/paca.h       |   2 +
 arch/powerpc/kernel/asm-offsets.c     |   1 +
 arch/powerpc/kernel/idle_book3s.S     |  33 +++++++---
 arch/powerpc/platforms/powernv/idle.c | 112 +++++++++++++++++++++-------------
 drivers/cpuidle/cpuidle-powernv.c     |  16 +++--
 5 files changed, 110 insertions(+), 54 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/6] powernv:idle: Correctly initialize core_idle_state_ptr
  2017-05-16  8:49 [PATCH 0/6] Enable support for deep-stop states on POWER9 Gautham R. Shenoy
@ 2017-05-16  8:49 ` Gautham R. Shenoy
  2017-05-30  5:56   ` Nicholas Piggin
  2017-05-30  9:11   ` [1/6] " Michael Ellerman
  2017-05-16  8:49 ` [PATCH 2/6] powernv:idle: Decouple Timebase restore & Per-core SPRs restore Gautham R. Shenoy
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 19+ messages in thread
From: Gautham R. Shenoy @ 2017-05-16  8:49 UTC (permalink / raw)
  To: Nicholas Piggin, Michael Ellerman, Michael Neuling,
	Vaidyanathan Srinivasan, Shilpasri G Bhat, Akshay Adiga,
	Benjamin Herrenschmidt
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

The lower 8 bits of core_idle_state_ptr tracks the number of non-idle
threads in the core. This is supposed to be initialized to bit-map
corresponding to the threads_per_core. However, currently it is
initialized to PNV_CORE_IDLE_THREAD_BITS (0xFF). This is correct for
POWER8 which has 8 threads per core, but not for POWER9 which has 4
threads per core.

As a result, on POWER9, core_idle_state_ptr gets initialized to
0xFF. In case when all the threads of the core are idle, the bits
corresponding tracking the idle-threads are non-zero. As a result, the
idle entry/exit code fails to save/restore per-core hypervisor state
since it assumes that there are threads in the cores which are still
active.

Fix this by correctly initializing the lower bits of the
core_idle_state_ptr on the basis of threads_per_core.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/idle.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 445f30a..84eb9bc 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -96,15 +96,24 @@ static void pnv_alloc_idle_core_states(void)
 	u32 *core_idle_state;
 
 	/*
-	 * core_idle_state - First 8 bits track the idle state of each thread
-	 * of the core. The 8th bit is the lock bit. Initially all thread bits
-	 * are set. They are cleared when the thread enters deep idle state
-	 * like sleep and winkle. Initially the lock bit is cleared.
-	 * The lock bit has 2 purposes
-	 * a. While the first thread is restoring core state, it prevents
-	 * other threads in the core from switching to process context.
-	 * b. While the last thread in the core is saving the core state, it
-	 * prevents a different thread from waking up.
+	 * core_idle_state - The lower 8 bits track the idle state of
+	 * each thread of the core.
+	 *
+	 * The most significant bit is the lock bit.
+	 *
+	 * Initially all the bits corresponding to threads_per_core
+	 * are set. They are cleared when the thread enters deep idle
+	 * state like sleep and winkle/stop.
+	 *
+	 * Initially the lock bit is cleared.  The lock bit has 2
+	 * purposes:
+	 * 	a. While the first thread in the core waking up from
+	 * 	   idle is restoring core state, it prevents other
+	 * 	   threads in the core from switching to process
+	 * 	   context.
+	 * 	b. While the last thread in the core is saving the
+	 *	   core state, it prevents a different thread from
+	 *	   waking up.
 	 */
 	for (i = 0; i < nr_cores; i++) {
 		int first_cpu = i * threads_per_core;
@@ -112,7 +121,7 @@ static void pnv_alloc_idle_core_states(void)
 		size_t paca_ptr_array_size;
 
 		core_idle_state = kmalloc_node(sizeof(u32), GFP_KERNEL, node);
-		*core_idle_state = PNV_CORE_IDLE_THREAD_BITS;
+		*core_idle_state = (1 << threads_per_core) - 1;
 		paca_ptr_array_size = (threads_per_core *
 				       sizeof(struct paca_struct *));
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/6] powernv:idle: Decouple Timebase restore & Per-core SPRs restore
  2017-05-16  8:49 [PATCH 0/6] Enable support for deep-stop states on POWER9 Gautham R. Shenoy
  2017-05-16  8:49 ` [PATCH 1/6] powernv:idle: Correctly initialize core_idle_state_ptr Gautham R. Shenoy
@ 2017-05-16  8:49 ` Gautham R. Shenoy
  2017-05-30  6:12   ` Nicholas Piggin
  2017-05-16  8:49 ` [PATCH 3/6] powernv:idle: Restore LPCR on wakeup from deep-stop Gautham R. Shenoy
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Gautham R. Shenoy @ 2017-05-16  8:49 UTC (permalink / raw)
  To: Nicholas Piggin, Michael Ellerman, Michael Neuling,
	Vaidyanathan Srinivasan, Shilpasri G Bhat, Akshay Adiga,
	Benjamin Herrenschmidt
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

On POWER8, in case of
   -  nap: both timebase and hypervisor state is retained.
   -  fast-sleep: timebase is lost. But the hypervisor state is retained.
   -  winkle: timebase and hypervisor state is lost.

Hence, the current code for handling exit from a idle state assumes
that if the timebase value is retained, then so is the hypervisor
state. Thus, the current code doesn't restore per-core hypervisor
state in such cases.

But that is no longer the case on POWER9 where we do have stop states
in which timebase value is retained, but the hypervisor state is
lost. So we have to ensure that the per-core hypervisor state gets
restored in such cases.

Fix this by ensuring that even in the case when timebase is retained,
we explicitly check if we are waking up from a deep stop that loses
per-core hypervisor state (indicated by cr4 being eq or gt), and if
this is the case, we restore the per-core hypervisor state.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/idle_book3s.S | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 4898d67..afd029f 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -731,13 +731,14 @@ timebase_resync:
 	 * Use cr3 which indicates that we are waking up with atleast partial
 	 * hypervisor state loss to determine if TIMEBASE RESYNC is needed.
 	 */
-	ble	cr3,clear_lock
+	ble	cr3,.Ltb_resynced
 	/* Time base re-sync */
 	bl	opal_resync_timebase;
 	/*
-	 * If waking up from sleep, per core state is not lost, skip to
-	 * clear_lock.
+	 * If waking up from sleep (POWER8), per core state
+	 * is not lost, skip to clear_lock.
 	 */
+.Ltb_resynced:
 	blt	cr4,clear_lock
 
 	/*
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/6] powernv:idle: Restore LPCR on wakeup from deep-stop
  2017-05-16  8:49 [PATCH 0/6] Enable support for deep-stop states on POWER9 Gautham R. Shenoy
  2017-05-16  8:49 ` [PATCH 1/6] powernv:idle: Correctly initialize core_idle_state_ptr Gautham R. Shenoy
  2017-05-16  8:49 ` [PATCH 2/6] powernv:idle: Decouple Timebase restore & Per-core SPRs restore Gautham R. Shenoy
@ 2017-05-16  8:49 ` Gautham R. Shenoy
  2017-05-30  6:17   ` Nicholas Piggin
  2017-05-16  8:49 ` [PATCH 4/6] powernv:idle: Restore SPRs for deep idle states via stop API Gautham R. Shenoy
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Gautham R. Shenoy @ 2017-05-16  8:49 UTC (permalink / raw)
  To: Nicholas Piggin, Michael Ellerman, Michael Neuling,
	Vaidyanathan Srinivasan, Shilpasri G Bhat, Akshay Adiga,
	Benjamin Herrenschmidt
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

On wakeup from a deep stop state which is supposed to lose the
hypervisor state, we don't restore the LPCR to the old value but set
it to a "sane" value via cur_cpu_spec->cpu_restore().

The problem is that the "sane" value doesn't include UPRT and the HR
bits which are required to run correctly in Radix mode.

Fix this on POWER9 onwards by restoring the LPCR value whatever it was
before executing the stop instruction.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/idle_book3s.S | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index afd029f..6c9920d 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -31,6 +31,7 @@
  * registers for winkle support.
  */
 #define _SDR1	GPR3
+#define _PTCR	GPR3
 #define _RPR	GPR4
 #define _SPURR	GPR5
 #define _PURR	GPR6
@@ -39,7 +40,7 @@
 #define _AMOR	GPR9
 #define _WORT	GPR10
 #define _WORC	GPR11
-#define _PTCR	GPR12
+#define _LPCR	GPR12
 
 #define PSSCR_EC_ESL_MASK_SHIFTED          (PSSCR_EC | PSSCR_ESL) >> 16
 
@@ -55,12 +56,14 @@ save_sprs_to_stack:
 	 * here since any thread in the core might wake up first
 	 */
 BEGIN_FTR_SECTION
-	mfspr	r3,SPRN_PTCR
-	std	r3,_PTCR(r1)
 	/*
 	 * Note - SDR1 is dropped in Power ISA v3. Hence not restoring
 	 * SDR1 here
 	 */
+	mfspr	r3,SPRN_PTCR
+	std	r3,_PTCR(r1)
+	mfspr	r3,SPRN_LPCR
+	std	r3,_LPCR(r1)
 FTR_SECTION_ELSE
 	mfspr	r3,SPRN_SDR1
 	std	r3,_SDR1(r1)
@@ -813,6 +816,10 @@ no_segments:
 	mtctr	r12
 	bctrl
 
+BEGIN_FTR_SECTION
+	ld	r4,_LPCR(r1)
+	mtspr	SPRN_LPCR,r4
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 hypervisor_state_restored:
 
 	mtspr	SPRN_SRR1,r16
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/6] powernv:idle: Restore SPRs for deep idle states via stop API.
  2017-05-16  8:49 [PATCH 0/6] Enable support for deep-stop states on POWER9 Gautham R. Shenoy
                   ` (2 preceding siblings ...)
  2017-05-16  8:49 ` [PATCH 3/6] powernv:idle: Restore LPCR on wakeup from deep-stop Gautham R. Shenoy
@ 2017-05-16  8:49 ` Gautham R. Shenoy
  2017-05-16  8:49 ` [PATCH 5/6] powernv:idle: Use Requested Level for restoring state on P9 DD1 Gautham R. Shenoy
  2017-05-16  8:49 ` [PATCH 6/6] cpuidle-powernv: Allow Deep stop states that don't stop time Gautham R. Shenoy
  5 siblings, 0 replies; 19+ messages in thread
From: Gautham R. Shenoy @ 2017-05-16  8:49 UTC (permalink / raw)
  To: Nicholas Piggin, Michael Ellerman, Michael Neuling,
	Vaidyanathan Srinivasan, Shilpasri G Bhat, Akshay Adiga,
	Benjamin Herrenschmidt
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy

From: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>

Some of the SPR values (HID0, MSR, SPRG0) don't change during the run
time of a booted kernel, once they have been initialized.

The contents of these SPRs are lost when the CPUs enter deep stop
states. So instead saving and restoring SPRs from the kernel, use the
stop-api provided by the firmware by which the firmware can restore
the contents of these SPRs to their initialized values after wakeup
from a deep stop state.

Apart from these, program the PSSCR value to that of the deepest stop
state via the stop-api. This will be used to indicate to the
underlying firmware as to what stop state to put the threads that have
been woken up by a special-wakeup.

And while we are at programming SPRs via stop-api, ensure that HID1,
HID4 and HID5 registers which are only available on POWER8 are not
requested to be restored by the firware on POWER9.

Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/idle.c | 83 ++++++++++++++++++++++-------------
 1 file changed, 52 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 84eb9bc..4deac0d 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -30,8 +30,33 @@
 /* Power ISA 3.0 allows for stop states 0x0 - 0xF */
 #define MAX_STOP_STATE	0xF
 
+#define P9_STOP_SPR_MSR 2000
+#define P9_STOP_SPR_PSSCR      855
+
 static u32 supported_cpuidle_states;
 
+/*
+ * The default stop state that will be used by ppc_md.power_save
+ * function on platforms that support stop instruction.
+ */
+static u64 pnv_default_stop_val;
+static u64 pnv_default_stop_mask;
+static bool default_stop_found;
+
+/*
+ * First deep stop state. Used to figure out when to save/restore
+ * hypervisor context.
+ */
+u64 pnv_first_deep_stop_state = MAX_STOP_STATE;
+
+/*
+ * psscr value and mask of the deepest stop idle state.
+ * Used when a cpu is offlined.
+ */
+static u64 pnv_deepest_stop_psscr_val;
+static u64 pnv_deepest_stop_psscr_mask;
+static bool deepest_stop_found;
+
 static int pnv_save_sprs_for_deep_states(void)
 {
 	int cpu;
@@ -48,6 +73,8 @@ static int pnv_save_sprs_for_deep_states(void)
 	uint64_t hid4_val = mfspr(SPRN_HID4);
 	uint64_t hid5_val = mfspr(SPRN_HID5);
 	uint64_t hmeer_val = mfspr(SPRN_HMEER);
+	uint64_t msr_val = MSR_IDLE;
+	uint64_t psscr_val = pnv_deepest_stop_psscr_val;
 
 	for_each_possible_cpu(cpu) {
 		uint64_t pir = get_hard_smp_processor_id(cpu);
@@ -61,6 +88,18 @@ static int pnv_save_sprs_for_deep_states(void)
 		if (rc != 0)
 			return rc;
 
+		if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+			rc = opal_slw_set_reg(pir, P9_STOP_SPR_MSR, msr_val);
+			if (rc)
+				return rc;
+
+			rc = opal_slw_set_reg(pir,
+					      P9_STOP_SPR_PSSCR, psscr_val);
+
+			if (rc)
+				return rc;
+		}
+
 		/* HIDs are per core registers */
 		if (cpu_thread_in_core(cpu) == 0) {
 
@@ -72,17 +111,21 @@ static int pnv_save_sprs_for_deep_states(void)
 			if (rc != 0)
 				return rc;
 
-			rc = opal_slw_set_reg(pir, SPRN_HID1, hid1_val);
-			if (rc != 0)
-				return rc;
+			/* Only p8 needs to set extra HID regiters */
+			if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
 
-			rc = opal_slw_set_reg(pir, SPRN_HID4, hid4_val);
-			if (rc != 0)
-				return rc;
+				rc = opal_slw_set_reg(pir, SPRN_HID1, hid1_val);
+				if (rc != 0)
+					return rc;
 
-			rc = opal_slw_set_reg(pir, SPRN_HID5, hid5_val);
-			if (rc != 0)
-				return rc;
+				rc = opal_slw_set_reg(pir, SPRN_HID4, hid4_val);
+				if (rc != 0)
+					return rc;
+
+				rc = opal_slw_set_reg(pir, SPRN_HID5, hid5_val);
+				if (rc != 0)
+					return rc;
+			}
 		}
 	}
 
@@ -241,14 +284,6 @@ static DEVICE_ATTR(fastsleep_workaround_applyonce, 0600,
 			store_fastsleep_workaround_applyonce);
 
 /*
- * The default stop state that will be used by ppc_md.power_save
- * function on platforms that support stop instruction.
- */
-static u64 pnv_default_stop_val;
-static u64 pnv_default_stop_mask;
-static bool default_stop_found;
-
-/*
  * Used for ppc_md.power_save which needs a function with no parameters
  */
 static void power9_idle(void)
@@ -257,20 +292,6 @@ static void power9_idle(void)
 }
 
 /*
- * First deep stop state. Used to figure out when to save/restore
- * hypervisor context.
- */
-u64 pnv_first_deep_stop_state = MAX_STOP_STATE;
-
-/*
- * psscr value and mask of the deepest stop idle state.
- * Used when a cpu is offlined.
- */
-static u64 pnv_deepest_stop_psscr_val;
-static u64 pnv_deepest_stop_psscr_mask;
-static bool deepest_stop_found;
-
-/*
  * pnv_cpu_offline: A function that puts the CPU into the deepest
  * available platform idle state on a CPU-Offline.
  */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/6] powernv:idle: Use Requested Level for restoring state on P9 DD1
  2017-05-16  8:49 [PATCH 0/6] Enable support for deep-stop states on POWER9 Gautham R. Shenoy
                   ` (3 preceding siblings ...)
  2017-05-16  8:49 ` [PATCH 4/6] powernv:idle: Restore SPRs for deep idle states via stop API Gautham R. Shenoy
@ 2017-05-16  8:49 ` Gautham R. Shenoy
  2017-05-30  6:27   ` Nicholas Piggin
  2017-05-16  8:49 ` [PATCH 6/6] cpuidle-powernv: Allow Deep stop states that don't stop time Gautham R. Shenoy
  5 siblings, 1 reply; 19+ messages in thread
From: Gautham R. Shenoy @ 2017-05-16  8:49 UTC (permalink / raw)
  To: Nicholas Piggin, Michael Ellerman, Michael Neuling,
	Vaidyanathan Srinivasan, Shilpasri G Bhat, Akshay Adiga,
	Benjamin Herrenschmidt
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

On Power9 DD1 due to a hardware bug the Power-Saving Level Status
field (PLS) of the PSSCR for a thread waking up from a deep state can
under-report if some other thread in the core is in a shallow stop
state. The scenario in which this can manifest is as follows:

       1) All the threads of the core are in deep stop.
       2) One of the threads is woken up. The PLS for this thread will
          correctly reflect that it is waking up from deep stop.
       3) The thread that has woken up now executes a shallow stop.
       4) When some other thread in the core is woken, its PLS will reflect
          the shallow stop state.

Thus, the subsequent thread for which the PLS is under-reporting the
wakeup state will not restore the hypervisor resources.

Hence, on DD1 systems, use the Requested Level (RL) field as a
workaround to restore the contents of the hypervisor resources on the
wakeup from the stop state.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/paca.h   |  2 ++
 arch/powerpc/kernel/asm-offsets.c |  1 +
 arch/powerpc/kernel/idle_book3s.S | 13 ++++++++++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 1c09f8f..77f60a0 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -177,6 +177,8 @@ struct paca_struct {
 	 * to the sibling threads' paca.
 	 */
 	struct paca_struct **thread_sibling_pacas;
+	/* The PSSCR value that the kernel requested before going to stop */
+	u64 requested_psscr;
 #endif
 
 #ifdef CONFIG_PPC_STD_MMU_64
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 709e234..e15c178 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -742,6 +742,7 @@ int main(void)
 	OFFSET(PACA_THREAD_MASK, paca_struct, thread_mask);
 	OFFSET(PACA_SUBCORE_SIBLING_MASK, paca_struct, subcore_sibling_mask);
 	OFFSET(PACA_SIBLING_PACA_PTRS, paca_struct, thread_sibling_pacas);
+	OFFSET(PACA_REQ_PSSCR, paca_struct, requested_psscr);
 #endif
 
 	DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 6c9920d..98a6d07 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -379,6 +379,7 @@ _GLOBAL(power9_idle_stop)
 	mfspr   r5,SPRN_PSSCR
 	andc    r5,r5,r4
 	or      r3,r3,r5
+	std	r3, PACA_REQ_PSSCR(r13)
 	mtspr 	SPRN_PSSCR,r3
 	LOAD_REG_ADDR(r5,power_enter_stop)
 	li	r4,1
@@ -498,12 +499,22 @@ pnv_restore_hyp_resource_arch300:
 	LOAD_REG_ADDRBASE(r5,pnv_first_deep_stop_state)
 	ld	r4,ADDROFF(pnv_first_deep_stop_state)(r5)
 
-	mfspr	r5,SPRN_PSSCR
+BEGIN_FTR_SECTION_NESTED(71)
+	/*
+	 * Assume that we are waking up from the state
+	 * same as the Requested Level (RL) in the PSSCR
+	 * which are Bits 60-63
+	 */
+	ld	r5,PACA_REQ_PSSCR(r13)
+	rldicl  r5,r5,0,60
+FTR_SECTION_ELSE_NESTED(71)
 	/*
 	 * 0-3 bits correspond to Power-Saving Level Status
 	 * which indicates the idle state we are waking up from
 	 */
+	mfspr	r5, SPRN_PSSCR
 	rldicl  r5,r5,4,60
+ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_POWER9_DD1, 71)
 	cmpd	cr4,r5,r4
 	bge	cr4,pnv_wakeup_tb_loss /* returns to caller */
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/6] cpuidle-powernv: Allow Deep stop states that don't stop time
  2017-05-16  8:49 [PATCH 0/6] Enable support for deep-stop states on POWER9 Gautham R. Shenoy
                   ` (4 preceding siblings ...)
  2017-05-16  8:49 ` [PATCH 5/6] powernv:idle: Use Requested Level for restoring state on P9 DD1 Gautham R. Shenoy
@ 2017-05-16  8:49 ` Gautham R. Shenoy
  2017-05-30  7:13   ` Nicholas Piggin
  5 siblings, 1 reply; 19+ messages in thread
From: Gautham R. Shenoy @ 2017-05-16  8:49 UTC (permalink / raw)
  To: Nicholas Piggin, Michael Ellerman, Michael Neuling,
	Vaidyanathan Srinivasan, Shilpasri G Bhat, Akshay Adiga,
	Benjamin Herrenschmidt
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

The current code in the cpuidle-powernv intialization only allows deep
stop states (indicated by OPAL_PM_STOP_INST_DEEP) which lose timebase
(indicated by OPAL_PM_TIMEBASE_STOP). This assumption goes back to
POWER8 time where deep states used to lose the timebase. However, on
POWER9, we do have stop states that are deep (they lose hypervisor
state) but retain the timebase.

Fix the initialization code in the cpuidle-powernv driver to allow
such deep states.

Further, there is a bug in cpuidle-powernv driver with
CONFIG_TICK_ONESHOT=n where we end up incrementing the nr_idle_states
even if a platform idle state which loses time base was not added to
the cpuidle table.

Fix this by ensuring that the nr_idle_states variable gets incremented
only when the platform idle state was added to the cpuidle table.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 drivers/cpuidle/cpuidle-powernv.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 12409a5..45eaf06 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -354,6 +354,7 @@ static int powernv_add_idle_states(void)
 
 	for (i = 0; i < dt_idle_states; i++) {
 		unsigned int exit_latency, target_residency;
+		bool stops_timebase = false;
 		/*
 		 * If an idle state has exit latency beyond
 		 * POWERNV_THRESHOLD_LATENCY_NS then don't use it
@@ -381,6 +382,9 @@ static int powernv_add_idle_states(void)
 			}
 		}
 
+		if (flags[i] & OPAL_PM_TIMEBASE_STOP)
+			stops_timebase = true;
+
 		/*
 		 * For nap and fastsleep, use default target_residency
 		 * values if f/w does not expose it.
@@ -392,8 +396,7 @@ static int powernv_add_idle_states(void)
 			add_powernv_state(nr_idle_states, "Nap",
 					  CPUIDLE_FLAG_NONE, nap_loop,
 					  target_residency, exit_latency, 0, 0);
-		} else if ((flags[i] & OPAL_PM_STOP_INST_FAST) &&
-				!(flags[i] & OPAL_PM_TIMEBASE_STOP)) {
+		} else if (has_stop_states && !stops_timebase) {
 			add_powernv_state(nr_idle_states, names[i],
 					  CPUIDLE_FLAG_NONE, stop_loop,
 					  target_residency, exit_latency,
@@ -405,8 +408,8 @@ static int powernv_add_idle_states(void)
 		 * within this config dependency check.
 		 */
 #ifdef CONFIG_TICK_ONESHOT
-		if (flags[i] & OPAL_PM_SLEEP_ENABLED ||
-			flags[i] & OPAL_PM_SLEEP_ENABLED_ER1) {
+		else if (flags[i] & OPAL_PM_SLEEP_ENABLED ||
+			 flags[i] & OPAL_PM_SLEEP_ENABLED_ER1) {
 			if (!rc)
 				target_residency = 300000;
 			/* Add FASTSLEEP state */
@@ -414,14 +417,15 @@ static int powernv_add_idle_states(void)
 					  CPUIDLE_FLAG_TIMER_STOP,
 					  fastsleep_loop,
 					  target_residency, exit_latency, 0, 0);
-		} else if ((flags[i] & OPAL_PM_STOP_INST_DEEP) &&
-				(flags[i] & OPAL_PM_TIMEBASE_STOP)) {
+		} else if (has_stop_states && stops_timebase) {
 			add_powernv_state(nr_idle_states, names[i],
 					  CPUIDLE_FLAG_TIMER_STOP, stop_loop,
 					  target_residency, exit_latency,
 					  psscr_val[i], psscr_mask[i]);
 		}
 #endif
+		else
+			continue;
 		nr_idle_states++;
 	}
 out:
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] powernv:idle: Correctly initialize core_idle_state_ptr
  2017-05-16  8:49 ` [PATCH 1/6] powernv:idle: Correctly initialize core_idle_state_ptr Gautham R. Shenoy
@ 2017-05-30  5:56   ` Nicholas Piggin
  2017-05-30 10:23     ` Gautham R Shenoy
  2017-05-30  9:11   ` [1/6] " Michael Ellerman
  1 sibling, 1 reply; 19+ messages in thread
From: Nicholas Piggin @ 2017-05-30  5:56 UTC (permalink / raw)
  To: Gautham R. Shenoy
  Cc: Michael Ellerman, Michael Neuling, Vaidyanathan Srinivasan,
	Shilpasri G Bhat, Akshay Adiga, Benjamin Herrenschmidt,
	linuxppc-dev, linux-kernel

On Tue, 16 May 2017 14:19:43 +0530
"Gautham R. Shenoy" <ego@linux.vnet.ibm.com> wrote:

> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> 
> The lower 8 bits of core_idle_state_ptr tracks the number of non-idle
> threads in the core. This is supposed to be initialized to bit-map
> corresponding to the threads_per_core. However, currently it is
> initialized to PNV_CORE_IDLE_THREAD_BITS (0xFF). This is correct for
> POWER8 which has 8 threads per core, but not for POWER9 which has 4
> threads per core.
> 
> As a result, on POWER9, core_idle_state_ptr gets initialized to
> 0xFF. In case when all the threads of the core are idle, the bits
> corresponding tracking the idle-threads are non-zero. As a result, the
> idle entry/exit code fails to save/restore per-core hypervisor state
> since it assumes that there are threads in the cores which are still
> active.
> 
> Fix this by correctly initializing the lower bits of the
> core_idle_state_ptr on the basis of threads_per_core.
> 
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

This looks good to me.

Until this patch series, we can't enable HV state loss idle modes
on POWER9, is that correct? And after your series does it work?

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/6] powernv:idle: Decouple Timebase restore & Per-core SPRs restore
  2017-05-16  8:49 ` [PATCH 2/6] powernv:idle: Decouple Timebase restore & Per-core SPRs restore Gautham R. Shenoy
@ 2017-05-30  6:12   ` Nicholas Piggin
  2017-05-30 10:28     ` Gautham R Shenoy
  0 siblings, 1 reply; 19+ messages in thread
From: Nicholas Piggin @ 2017-05-30  6:12 UTC (permalink / raw)
  To: Gautham R. Shenoy
  Cc: Michael Ellerman, Michael Neuling, Vaidyanathan Srinivasan,
	Shilpasri G Bhat, Akshay Adiga, Benjamin Herrenschmidt,
	linuxppc-dev, linux-kernel

On Tue, 16 May 2017 14:19:44 +0530
"Gautham R. Shenoy" <ego@linux.vnet.ibm.com> wrote:

> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> 
> On POWER8, in case of
>    -  nap: both timebase and hypervisor state is retained.
>    -  fast-sleep: timebase is lost. But the hypervisor state is retained.
>    -  winkle: timebase and hypervisor state is lost.
> 
> Hence, the current code for handling exit from a idle state assumes
> that if the timebase value is retained, then so is the hypervisor
> state. Thus, the current code doesn't restore per-core hypervisor
> state in such cases.
> 
> But that is no longer the case on POWER9 where we do have stop states
> in which timebase value is retained, but the hypervisor state is
> lost. So we have to ensure that the per-core hypervisor state gets
> restored in such cases.
> 
> Fix this by ensuring that even in the case when timebase is retained,
> we explicitly check if we are waking up from a deep stop that loses
> per-core hypervisor state (indicated by cr4 being eq or gt), and if
> this is the case, we restore the per-core hypervisor state.
> 
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/idle_book3s.S | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
> index 4898d67..afd029f 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -731,13 +731,14 @@ timebase_resync:
>  	 * Use cr3 which indicates that we are waking up with atleast partial
>  	 * hypervisor state loss to determine if TIMEBASE RESYNC is needed.
>  	 */
> -	ble	cr3,clear_lock
> +	ble	cr3,.Ltb_resynced
>  	/* Time base re-sync */
>  	bl	opal_resync_timebase;
>  	/*
> -	 * If waking up from sleep, per core state is not lost, skip to
> -	 * clear_lock.
> +	 * If waking up from sleep (POWER8), per core state
> +	 * is not lost, skip to clear_lock.
>  	 */
> +.Ltb_resynced:
>  	blt	cr4,clear_lock
>  
>  	/*

It's more that timebase was not lost, rather than resynced.
Can we just do a 'bgtl cr3,opal_resync_timebase'?

I guess cr4 branch will never be true on POWER9... I think
pnv_wakeup_tb_loss is going to end up clearer being split
into two between isa 207 and 300. But that can wait until
after POWER9 is working properly.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/6] powernv:idle: Restore LPCR on wakeup from deep-stop
  2017-05-16  8:49 ` [PATCH 3/6] powernv:idle: Restore LPCR on wakeup from deep-stop Gautham R. Shenoy
@ 2017-05-30  6:17   ` Nicholas Piggin
  2017-05-30 10:35     ` Gautham R Shenoy
  0 siblings, 1 reply; 19+ messages in thread
From: Nicholas Piggin @ 2017-05-30  6:17 UTC (permalink / raw)
  To: Gautham R. Shenoy
  Cc: Michael Ellerman, Michael Neuling, Vaidyanathan Srinivasan,
	Shilpasri G Bhat, Akshay Adiga, Benjamin Herrenschmidt,
	linuxppc-dev, linux-kernel

On Tue, 16 May 2017 14:19:45 +0530
"Gautham R. Shenoy" <ego@linux.vnet.ibm.com> wrote:

> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> 
> On wakeup from a deep stop state which is supposed to lose the
> hypervisor state, we don't restore the LPCR to the old value but set
> it to a "sane" value via cur_cpu_spec->cpu_restore().
> 
> The problem is that the "sane" value doesn't include UPRT and the HR
> bits which are required to run correctly in Radix mode.
> 
> Fix this on POWER9 onwards by restoring the LPCR value whatever it was
> before executing the stop instruction.
> 
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

Yes I think we need this. I have a plan to rework some of
that cpu_restore and early CPU init stuff, but for now we
need this.

Does the OCC restore LPCR properly then we just trash it
with ->cpu_restore(), or is it always junk?

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>


> ---
>  arch/powerpc/kernel/idle_book3s.S | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
> index afd029f..6c9920d 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -31,6 +31,7 @@
>   * registers for winkle support.
>   */
>  #define _SDR1	GPR3
> +#define _PTCR	GPR3
>  #define _RPR	GPR4
>  #define _SPURR	GPR5
>  #define _PURR	GPR6
> @@ -39,7 +40,7 @@
>  #define _AMOR	GPR9
>  #define _WORT	GPR10
>  #define _WORC	GPR11
> -#define _PTCR	GPR12
> +#define _LPCR	GPR12
>  
>  #define PSSCR_EC_ESL_MASK_SHIFTED          (PSSCR_EC | PSSCR_ESL) >> 16
>  
> @@ -55,12 +56,14 @@ save_sprs_to_stack:
>  	 * here since any thread in the core might wake up first
>  	 */
>  BEGIN_FTR_SECTION
> -	mfspr	r3,SPRN_PTCR
> -	std	r3,_PTCR(r1)
>  	/*
>  	 * Note - SDR1 is dropped in Power ISA v3. Hence not restoring
>  	 * SDR1 here
>  	 */
> +	mfspr	r3,SPRN_PTCR
> +	std	r3,_PTCR(r1)
> +	mfspr	r3,SPRN_LPCR
> +	std	r3,_LPCR(r1)
>  FTR_SECTION_ELSE
>  	mfspr	r3,SPRN_SDR1
>  	std	r3,_SDR1(r1)
> @@ -813,6 +816,10 @@ no_segments:
>  	mtctr	r12
>  	bctrl
>  
> +BEGIN_FTR_SECTION
> +	ld	r4,_LPCR(r1)
> +	mtspr	SPRN_LPCR,r4
> +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
>  hypervisor_state_restored:
>  
>  	mtspr	SPRN_SRR1,r16

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/6] powernv:idle: Use Requested Level for restoring state on P9 DD1
  2017-05-16  8:49 ` [PATCH 5/6] powernv:idle: Use Requested Level for restoring state on P9 DD1 Gautham R. Shenoy
@ 2017-05-30  6:27   ` Nicholas Piggin
  0 siblings, 0 replies; 19+ messages in thread
From: Nicholas Piggin @ 2017-05-30  6:27 UTC (permalink / raw)
  To: Gautham R. Shenoy
  Cc: Michael Ellerman, Michael Neuling, Vaidyanathan Srinivasan,
	Shilpasri G Bhat, Akshay Adiga, Benjamin Herrenschmidt,
	linuxppc-dev, linux-kernel

On Tue, 16 May 2017 14:19:47 +0530
"Gautham R. Shenoy" <ego@linux.vnet.ibm.com> wrote:

> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> 
> On Power9 DD1 due to a hardware bug the Power-Saving Level Status
> field (PLS) of the PSSCR for a thread waking up from a deep state can
> under-report if some other thread in the core is in a shallow stop
> state. The scenario in which this can manifest is as follows:
> 
>        1) All the threads of the core are in deep stop.
>        2) One of the threads is woken up. The PLS for this thread will
>           correctly reflect that it is waking up from deep stop.
>        3) The thread that has woken up now executes a shallow stop.
>        4) When some other thread in the core is woken, its PLS will reflect
>           the shallow stop state.
> 
> Thus, the subsequent thread for which the PLS is under-reporting the
> wakeup state will not restore the hypervisor resources.
> 
> Hence, on DD1 systems, use the Requested Level (RL) field as a
> workaround to restore the contents of the hypervisor resources on the
> wakeup from the stop state.
> 
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>


Reviewed-by: Nicholas Piggin <npiggin@gmail.com>


> ---
>  arch/powerpc/include/asm/paca.h   |  2 ++
>  arch/powerpc/kernel/asm-offsets.c |  1 +
>  arch/powerpc/kernel/idle_book3s.S | 13 ++++++++++++-
>  3 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index 1c09f8f..77f60a0 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -177,6 +177,8 @@ struct paca_struct {
>  	 * to the sibling threads' paca.
>  	 */
>  	struct paca_struct **thread_sibling_pacas;
> +	/* The PSSCR value that the kernel requested before going to stop */
> +	u64 requested_psscr;
>  #endif
>  
>  #ifdef CONFIG_PPC_STD_MMU_64
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 709e234..e15c178 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -742,6 +742,7 @@ int main(void)
>  	OFFSET(PACA_THREAD_MASK, paca_struct, thread_mask);
>  	OFFSET(PACA_SUBCORE_SIBLING_MASK, paca_struct, subcore_sibling_mask);
>  	OFFSET(PACA_SIBLING_PACA_PTRS, paca_struct, thread_sibling_pacas);
> +	OFFSET(PACA_REQ_PSSCR, paca_struct, requested_psscr);
>  #endif
>  
>  	DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
> diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
> index 6c9920d..98a6d07 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -379,6 +379,7 @@ _GLOBAL(power9_idle_stop)
>  	mfspr   r5,SPRN_PSSCR
>  	andc    r5,r5,r4
>  	or      r3,r3,r5
> +	std	r3, PACA_REQ_PSSCR(r13)
>  	mtspr 	SPRN_PSSCR,r3
>  	LOAD_REG_ADDR(r5,power_enter_stop)
>  	li	r4,1
> @@ -498,12 +499,22 @@ pnv_restore_hyp_resource_arch300:
>  	LOAD_REG_ADDRBASE(r5,pnv_first_deep_stop_state)
>  	ld	r4,ADDROFF(pnv_first_deep_stop_state)(r5)
>  
> -	mfspr	r5,SPRN_PSSCR
> +BEGIN_FTR_SECTION_NESTED(71)
> +	/*
> +	 * Assume that we are waking up from the state
> +	 * same as the Requested Level (RL) in the PSSCR
> +	 * which are Bits 60-63
> +	 */
> +	ld	r5,PACA_REQ_PSSCR(r13)
> +	rldicl  r5,r5,0,60
> +FTR_SECTION_ELSE_NESTED(71)
>  	/*
>  	 * 0-3 bits correspond to Power-Saving Level Status
>  	 * which indicates the idle state we are waking up from
>  	 */
> +	mfspr	r5, SPRN_PSSCR
>  	rldicl  r5,r5,4,60
> +ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_POWER9_DD1, 71)
>  	cmpd	cr4,r5,r4
>  	bge	cr4,pnv_wakeup_tb_loss /* returns to caller */
>  

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] cpuidle-powernv: Allow Deep stop states that don't stop time
  2017-05-16  8:49 ` [PATCH 6/6] cpuidle-powernv: Allow Deep stop states that don't stop time Gautham R. Shenoy
@ 2017-05-30  7:13   ` Nicholas Piggin
  2017-05-30 10:50     ` Gautham R Shenoy
  0 siblings, 1 reply; 19+ messages in thread
From: Nicholas Piggin @ 2017-05-30  7:13 UTC (permalink / raw)
  To: Gautham R. Shenoy
  Cc: Michael Ellerman, Michael Neuling, Vaidyanathan Srinivasan,
	Shilpasri G Bhat, Akshay Adiga, Benjamin Herrenschmidt,
	linuxppc-dev, linux-kernel

On Tue, 16 May 2017 14:19:48 +0530
"Gautham R. Shenoy" <ego@linux.vnet.ibm.com> wrote:

> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> 
> The current code in the cpuidle-powernv intialization only allows deep
> stop states (indicated by OPAL_PM_STOP_INST_DEEP) which lose timebase
> (indicated by OPAL_PM_TIMEBASE_STOP). This assumption goes back to
> POWER8 time where deep states used to lose the timebase. However, on
> POWER9, we do have stop states that are deep (they lose hypervisor
> state) but retain the timebase.
> 
> Fix the initialization code in the cpuidle-powernv driver to allow
> such deep states.
> 
> Further, there is a bug in cpuidle-powernv driver with
> CONFIG_TICK_ONESHOT=n where we end up incrementing the nr_idle_states
> even if a platform idle state which loses time base was not added to
> the cpuidle table.
> 
> Fix this by ensuring that the nr_idle_states variable gets incremented
> only when the platform idle state was added to the cpuidle table.

Should this be a separate patch? Stable?

> 
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
>  drivers/cpuidle/cpuidle-powernv.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> index 12409a5..45eaf06 100644
> --- a/drivers/cpuidle/cpuidle-powernv.c
> +++ b/drivers/cpuidle/cpuidle-powernv.c
> @@ -354,6 +354,7 @@ static int powernv_add_idle_states(void)
>  
>  	for (i = 0; i < dt_idle_states; i++) {
>  		unsigned int exit_latency, target_residency;
> +		bool stops_timebase = false;
>  		/*
>  		 * If an idle state has exit latency beyond
>  		 * POWERNV_THRESHOLD_LATENCY_NS then don't use it
> @@ -381,6 +382,9 @@ static int powernv_add_idle_states(void)
>  			}
>  		}
>  
> +		if (flags[i] & OPAL_PM_TIMEBASE_STOP)
> +			stops_timebase = true;
> +
>  		/*
>  		 * For nap and fastsleep, use default target_residency
>  		 * values if f/w does not expose it.
> @@ -392,8 +396,7 @@ static int powernv_add_idle_states(void)
>  			add_powernv_state(nr_idle_states, "Nap",
>  					  CPUIDLE_FLAG_NONE, nap_loop,
>  					  target_residency, exit_latency, 0, 0);
> -		} else if ((flags[i] & OPAL_PM_STOP_INST_FAST) &&
> -				!(flags[i] & OPAL_PM_TIMEBASE_STOP)) {
> +		} else if (has_stop_states && !stops_timebase) {
>  			add_powernv_state(nr_idle_states, names[i],
>  					  CPUIDLE_FLAG_NONE, stop_loop,
>  					  target_residency, exit_latency,
> @@ -405,8 +408,8 @@ static int powernv_add_idle_states(void)
>  		 * within this config dependency check.
>  		 */
>  #ifdef CONFIG_TICK_ONESHOT
> -		if (flags[i] & OPAL_PM_SLEEP_ENABLED ||
> -			flags[i] & OPAL_PM_SLEEP_ENABLED_ER1) {
> +		else if (flags[i] & OPAL_PM_SLEEP_ENABLED ||
> +			 flags[i] & OPAL_PM_SLEEP_ENABLED_ER1) {

Hmm, seems okay but readability is isn't the best with the ifdef and
mixing power8 and 9 cases IMO.

Particularly with the nice regular POWER9 states, we're not doing much
logic in this loop besides checking for the timebase stop flag, right?
Would it be clearer if it was changed to something like this (untested
quick hack)?

---
 drivers/cpuidle/cpuidle-powernv.c | 76 +++++++++++++++++++--------------------
 1 file changed, 37 insertions(+), 39 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 12409a519cc5..77291389f9ac 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -353,7 +353,9 @@ static int powernv_add_idle_states(void)
 	}
 
 	for (i = 0; i < dt_idle_states; i++) {
+		unsigned int cpuidle_flags = CPUIDLE_FLAG_NONE;
 		unsigned int exit_latency, target_residency;
+
 		/*
 		 * If an idle state has exit latency beyond
 		 * POWERNV_THRESHOLD_LATENCY_NS then don't use it
@@ -371,6 +373,16 @@ static int powernv_add_idle_states(void)
 		else
 			target_residency = 0;
 
+		if (flags[i] & OPAL_PM_TIMEBASE_STOP) {
+			/*
+			 * All cpuidle states with CPUIDLE_FLAG_TIMER_STOP set
+			 * depend on CONFIG_TICK_ONESHOT.
+			 */
+			if (!IS_ENABLED(CONFIG_TICK_ONESHOT))
+				continue;
+			cpuidle_flags = CPUIDLE_FLAG_TIMER_STOP;
+		}
+
 		if (has_stop_states) {
 			int err = validate_psscr_val_mask(&psscr_val[i],
 							  &psscr_mask[i],
@@ -379,50 +391,36 @@ static int powernv_add_idle_states(void)
 				report_invalid_psscr_val(psscr_val[i], err);
 				continue;
 			}
-		}
 
-		/*
-		 * For nap and fastsleep, use default target_residency
-		 * values if f/w does not expose it.
-		 */
-		if (flags[i] & OPAL_PM_NAP_ENABLED) {
-			if (!rc)
-				target_residency = 100;
-			/* Add NAP state */
-			add_powernv_state(nr_idle_states, "Nap",
-					  CPUIDLE_FLAG_NONE, nap_loop,
-					  target_residency, exit_latency, 0, 0);
-		} else if ((flags[i] & OPAL_PM_STOP_INST_FAST) &&
-				!(flags[i] & OPAL_PM_TIMEBASE_STOP)) {
 			add_powernv_state(nr_idle_states, names[i],
-					  CPUIDLE_FLAG_NONE, stop_loop,
-					  target_residency, exit_latency,
-					  psscr_val[i], psscr_mask[i]);
-		}
-
-		/*
-		 * All cpuidle states with CPUIDLE_FLAG_TIMER_STOP set must come
-		 * within this config dependency check.
-		 */
-#ifdef CONFIG_TICK_ONESHOT
-		if (flags[i] & OPAL_PM_SLEEP_ENABLED ||
-			flags[i] & OPAL_PM_SLEEP_ENABLED_ER1) {
-			if (!rc)
-				target_residency = 300000;
-			/* Add FASTSLEEP state */
-			add_powernv_state(nr_idle_states, "FastSleep",
-					  CPUIDLE_FLAG_TIMER_STOP,
-					  fastsleep_loop,
-					  target_residency, exit_latency, 0, 0);
-		} else if ((flags[i] & OPAL_PM_STOP_INST_DEEP) &&
-				(flags[i] & OPAL_PM_TIMEBASE_STOP)) {
-			add_powernv_state(nr_idle_states, names[i],
-					  CPUIDLE_FLAG_TIMER_STOP, stop_loop,
+					  cpuidle_flags, stop_loop,
 					  target_residency, exit_latency,
 					  psscr_val[i], psscr_mask[i]);
+			nr_idle_states++;
+		} else {
+			/*
+			 * For nap and fastsleep, use default target_residency
+			 * values if f/w does not expose it.
+			 */
+			if (flags[i] & OPAL_PM_NAP_ENABLED) {
+				if (!rc)
+					target_residency = 100;
+				/* Add NAP state */
+				add_powernv_state(nr_idle_states, "Nap",
+						  cpuidle_flags, nap_loop,
+						  target_residency, exit_latency, 0, 0);
+				nr_idle_states++;
+			} else if (flags[i] & (OPAL_PM_SLEEP_ENABLED |
+						OPAL_PM_SLEEP_ENABLED_ER1)) {
+				if (!rc)
+					target_residency = 300000;
+				/* Add FASTSLEEP state */
+				add_powernv_state(nr_idle_states, "FastSleep",
+						  cpuidle_flags, fastsleep_loop,
+						  target_residency, exit_latency, 0, 0);
+				nr_idle_states++;
+			}
 		}
-#endif
-		nr_idle_states++;
 	}
 out:
 	return nr_idle_states;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [1/6] powernv:idle: Correctly initialize core_idle_state_ptr
  2017-05-16  8:49 ` [PATCH 1/6] powernv:idle: Correctly initialize core_idle_state_ptr Gautham R. Shenoy
  2017-05-30  5:56   ` Nicholas Piggin
@ 2017-05-30  9:11   ` Michael Ellerman
  1 sibling, 0 replies; 19+ messages in thread
From: Michael Ellerman @ 2017-05-30  9:11 UTC (permalink / raw)
  To: Gautham R. Shenoy, Nicholas Piggin, Michael Neuling,
	Vaidyanathan Srinivasan, Shilpasri G Bhat, Akshay Adiga,
	Benjamin Herrenschmidt
  Cc: Gautham R. Shenoy, linuxppc-dev, linux-kernel

On Tue, 2017-05-16 at 08:49:43 UTC, "Gautham R. Shenoy" wrote:
> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> 
> The lower 8 bits of core_idle_state_ptr tracks the number of non-idle
> threads in the core. This is supposed to be initialized to bit-map
> corresponding to the threads_per_core. However, currently it is
> initialized to PNV_CORE_IDLE_THREAD_BITS (0xFF). This is correct for
> POWER8 which has 8 threads per core, but not for POWER9 which has 4
> threads per core.
> 
> As a result, on POWER9, core_idle_state_ptr gets initialized to
> 0xFF. In case when all the threads of the core are idle, the bits
> corresponding tracking the idle-threads are non-zero. As a result, the
> idle entry/exit code fails to save/restore per-core hypervisor state
> since it assumes that there are threads in the cores which are still
> active.
> 
> Fix this by correctly initializing the lower bits of the
> core_idle_state_ptr on the basis of threads_per_core.
> 
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/5f221c3ca13dceaea8eefe21dbd85d

cheers

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] powernv:idle: Correctly initialize core_idle_state_ptr
  2017-05-30  5:56   ` Nicholas Piggin
@ 2017-05-30 10:23     ` Gautham R Shenoy
  0 siblings, 0 replies; 19+ messages in thread
From: Gautham R Shenoy @ 2017-05-30 10:23 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Gautham R. Shenoy, Michael Ellerman, Michael Neuling,
	Vaidyanathan Srinivasan, Shilpasri G Bhat, Akshay Adiga,
	Benjamin Herrenschmidt, linuxppc-dev, linux-kernel

Hi Nicholas,

On Tue, May 30, 2017 at 03:56:12PM +1000, Nicholas Piggin wrote:
> On Tue, 16 May 2017 14:19:43 +0530
> "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> wrote:
> 
> > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> > 
> > The lower 8 bits of core_idle_state_ptr tracks the number of non-idle
> > threads in the core. This is supposed to be initialized to bit-map
> > corresponding to the threads_per_core. However, currently it is
> > initialized to PNV_CORE_IDLE_THREAD_BITS (0xFF). This is correct for
> > POWER8 which has 8 threads per core, but not for POWER9 which has 4
> > threads per core.
> > 
> > As a result, on POWER9, core_idle_state_ptr gets initialized to
> > 0xFF. In case when all the threads of the core are idle, the bits
> > corresponding tracking the idle-threads are non-zero. As a result, the
> > idle entry/exit code fails to save/restore per-core hypervisor state
> > since it assumes that there are threads in the cores which are still
> > active.
> > 
> > Fix this by correctly initializing the lower bits of the
> > core_idle_state_ptr on the basis of threads_per_core.
> > 
> > Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> 
> This looks good to me.
> 
> Until this patch series, we can't enable HV state loss idle modes
> on POWER9, is that correct? And after your series does it work?

Yes, that is correct.

> 
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
>

Thanks for reviewing the patch!

--
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/6] powernv:idle: Decouple Timebase restore & Per-core SPRs restore
  2017-05-30  6:12   ` Nicholas Piggin
@ 2017-05-30 10:28     ` Gautham R Shenoy
  0 siblings, 0 replies; 19+ messages in thread
From: Gautham R Shenoy @ 2017-05-30 10:28 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Gautham R. Shenoy, Michael Ellerman, Michael Neuling,
	Vaidyanathan Srinivasan, Shilpasri G Bhat, Akshay Adiga,
	Benjamin Herrenschmidt, linuxppc-dev, linux-kernel

On Tue, May 30, 2017 at 04:12:38PM +1000, Nicholas Piggin wrote:
> On Tue, 16 May 2017 14:19:44 +0530
> "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> wrote:
> 
> > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> > 
> > On POWER8, in case of
> >    -  nap: both timebase and hypervisor state is retained.
> >    -  fast-sleep: timebase is lost. But the hypervisor state is retained.
> >    -  winkle: timebase and hypervisor state is lost.
> > 
> > Hence, the current code for handling exit from a idle state assumes
> > that if the timebase value is retained, then so is the hypervisor
> > state. Thus, the current code doesn't restore per-core hypervisor
> > state in such cases.
> > 
> > But that is no longer the case on POWER9 where we do have stop states
> > in which timebase value is retained, but the hypervisor state is
> > lost. So we have to ensure that the per-core hypervisor state gets
> > restored in such cases.
> > 
> > Fix this by ensuring that even in the case when timebase is retained,
> > we explicitly check if we are waking up from a deep stop that loses
> > per-core hypervisor state (indicated by cr4 being eq or gt), and if
> > this is the case, we restore the per-core hypervisor state.
> > 
> > Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/kernel/idle_book3s.S | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
> > index 4898d67..afd029f 100644
> > --- a/arch/powerpc/kernel/idle_book3s.S
> > +++ b/arch/powerpc/kernel/idle_book3s.S
> > @@ -731,13 +731,14 @@ timebase_resync:
> >  	 * Use cr3 which indicates that we are waking up with atleast partial
> >  	 * hypervisor state loss to determine if TIMEBASE RESYNC is needed.
> >  	 */
> > -	ble	cr3,clear_lock
> > +	ble	cr3,.Ltb_resynced
> >  	/* Time base re-sync */
> >  	bl	opal_resync_timebase;
> >  	/*
> > -	 * If waking up from sleep, per core state is not lost, skip to
> > -	 * clear_lock.
> > +	 * If waking up from sleep (POWER8), per core state
> > +	 * is not lost, skip to clear_lock.
> >  	 */
> > +.Ltb_resynced:
> >  	blt	cr4,clear_lock
> >  
> >  	/*
> 
> It's more that timebase was not lost, rather than resynced.
> Can we just do a 'bgtl cr3,opal_resync_timebase'?

Yes, that's looks much better. I hadn't thought of bgtl. Will update
this.


> 
> I guess cr4 branch will never be true on POWER9... I think

Yes. Inside pnv_wakeup_tb_loss, cr4 will either have gt or
eq set. This branch applies only to POWER8 where cr4 would be eq only
on winkle, and not on fastsleep (a state that loses timebase but not
hypervisor state).


> pnv_wakeup_tb_loss is going to end up clearer being split
> into two between isa 207 and 300. But that can wait until
> after POWER9 is working properly.

Sure. 

> 
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
> 

--
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/6] powernv:idle: Restore LPCR on wakeup from deep-stop
  2017-05-30  6:17   ` Nicholas Piggin
@ 2017-05-30 10:35     ` Gautham R Shenoy
  0 siblings, 0 replies; 19+ messages in thread
From: Gautham R Shenoy @ 2017-05-30 10:35 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Gautham R. Shenoy, Michael Ellerman, Michael Neuling,
	Vaidyanathan Srinivasan, Shilpasri G Bhat, Akshay Adiga,
	Benjamin Herrenschmidt, linuxppc-dev, linux-kernel

On Tue, May 30, 2017 at 04:17:31PM +1000, Nicholas Piggin wrote:
> On Tue, 16 May 2017 14:19:45 +0530
> "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> wrote:
> 
> > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> > 
> > On wakeup from a deep stop state which is supposed to lose the
> > hypervisor state, we don't restore the LPCR to the old value but set
> > it to a "sane" value via cur_cpu_spec->cpu_restore().
> > 
> > The problem is that the "sane" value doesn't include UPRT and the HR
> > bits which are required to run correctly in Radix mode.
> > 
> > Fix this on POWER9 onwards by restoring the LPCR value whatever it was
> > before executing the stop instruction.
> > 
> > Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> 
> Yes I think we need this. I have a plan to rework some of
> that cpu_restore and early CPU init stuff, but for now we
> need this.
> 
> Does the OCC restore LPCR properly then we just trash it
> with ->cpu_restore(), or is it always junk?

The microcode restores LPCR to the value that the kernel asks it to
set to in pnv_save_sprs_for_deep_states() via the opal_slw_set_reg. So
it would be some sane state.

However when we return from stop to either the cpuidle or cpuhotplug,
we want LPCR to be restored to the value it hand when it entered stop.


> 
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
> 
>

Thanks for the review.

--
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] cpuidle-powernv: Allow Deep stop states that don't stop time
  2017-05-30  7:13   ` Nicholas Piggin
@ 2017-05-30 10:50     ` Gautham R Shenoy
  2017-05-30 11:10       ` Nicholas Piggin
  0 siblings, 1 reply; 19+ messages in thread
From: Gautham R Shenoy @ 2017-05-30 10:50 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Gautham R. Shenoy, Michael Ellerman, Michael Neuling,
	Vaidyanathan Srinivasan, Shilpasri G Bhat, Akshay Adiga,
	Benjamin Herrenschmidt, linuxppc-dev, linux-kernel

On Tue, May 30, 2017 at 05:13:57PM +1000, Nicholas Piggin wrote:
> On Tue, 16 May 2017 14:19:48 +0530
> "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> wrote:
> 
> > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> > 
> > The current code in the cpuidle-powernv intialization only allows deep
> > stop states (indicated by OPAL_PM_STOP_INST_DEEP) which lose timebase
> > (indicated by OPAL_PM_TIMEBASE_STOP). This assumption goes back to
> > POWER8 time where deep states used to lose the timebase. However, on
> > POWER9, we do have stop states that are deep (they lose hypervisor
> > state) but retain the timebase.
> > 
> > Fix the initialization code in the cpuidle-powernv driver to allow
> > such deep states.
> > 
> > Further, there is a bug in cpuidle-powernv driver with
> > CONFIG_TICK_ONESHOT=n where we end up incrementing the nr_idle_states
> > even if a platform idle state which loses time base was not added to
> > the cpuidle table.
> > 
> > Fix this by ensuring that the nr_idle_states variable gets incremented
> > only when the platform idle state was added to the cpuidle table.
> 
> Should this be a separate patch? Stable?

Ok. Will send it out separately.

> 
> > 
> > Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> > ---
> >  drivers/cpuidle/cpuidle-powernv.c | 16 ++++++++++------
> >  1 file changed, 10 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> > index 12409a5..45eaf06 100644
> > --- a/drivers/cpuidle/cpuidle-powernv.c
> > +++ b/drivers/cpuidle/cpuidle-powernv.c
> > @@ -354,6 +354,7 @@ static int powernv_add_idle_states(void)
> >  
> >  	for (i = 0; i < dt_idle_states; i++) {
> >  		unsigned int exit_latency, target_residency;
> > +		bool stops_timebase = false;
> >  		/*
> >  		 * If an idle state has exit latency beyond
> >  		 * POWERNV_THRESHOLD_LATENCY_NS then don't use it
> > @@ -381,6 +382,9 @@ static int powernv_add_idle_states(void)
> >  			}
> >  		}
> >  
> > +		if (flags[i] & OPAL_PM_TIMEBASE_STOP)
> > +			stops_timebase = true;
> > +
> >  		/*
> >  		 * For nap and fastsleep, use default target_residency
> >  		 * values if f/w does not expose it.
> > @@ -392,8 +396,7 @@ static int powernv_add_idle_states(void)
> >  			add_powernv_state(nr_idle_states, "Nap",
> >  					  CPUIDLE_FLAG_NONE, nap_loop,
> >  					  target_residency, exit_latency, 0, 0);
> > -		} else if ((flags[i] & OPAL_PM_STOP_INST_FAST) &&
> > -				!(flags[i] & OPAL_PM_TIMEBASE_STOP)) {
> > +		} else if (has_stop_states && !stops_timebase) {
> >  			add_powernv_state(nr_idle_states, names[i],
> >  					  CPUIDLE_FLAG_NONE, stop_loop,
> >  					  target_residency, exit_latency,
> > @@ -405,8 +408,8 @@ static int powernv_add_idle_states(void)
> >  		 * within this config dependency check.
> >  		 */
> >  #ifdef CONFIG_TICK_ONESHOT
> > -		if (flags[i] & OPAL_PM_SLEEP_ENABLED ||
> > -			flags[i] & OPAL_PM_SLEEP_ENABLED_ER1) {
> > +		else if (flags[i] & OPAL_PM_SLEEP_ENABLED ||
> > +			 flags[i] & OPAL_PM_SLEEP_ENABLED_ER1) {
> 
> Hmm, seems okay but readability is isn't the best with the ifdef and
> mixing power8 and 9 cases IMO.
> 
> Particularly with the nice regular POWER9 states, we're not doing much
> logic in this loop besides checking for the timebase stop flag, right?
> Would it be clearer if it was changed to something like this (untested
> quick hack)?

Yes, this is very much doable. Some comments below.

> 
> ---
>  drivers/cpuidle/cpuidle-powernv.c | 76 +++++++++++++++++++--------------------
>  1 file changed, 37 insertions(+), 39 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> index 12409a519cc5..77291389f9ac 100644
> --- a/drivers/cpuidle/cpuidle-powernv.c
> +++ b/drivers/cpuidle/cpuidle-powernv.c
> @@ -353,7 +353,9 @@ static int powernv_add_idle_states(void)
>  	}
> 
>  	for (i = 0; i < dt_idle_states; i++) {
> +		unsigned int cpuidle_flags = CPUIDLE_FLAG_NONE;
>  		unsigned int exit_latency, target_residency;
> +
>  		/*
>  		 * If an idle state has exit latency beyond
>  		 * POWERNV_THRESHOLD_LATENCY_NS then don't use it
> @@ -371,6 +373,16 @@ static int powernv_add_idle_states(void)
>  		else
>  			target_residency = 0;
> 
> +		if (flags[i] & OPAL_PM_TIMEBASE_STOP) {
> +			/*
> +			 * All cpuidle states with CPUIDLE_FLAG_TIMER_STOP set
> +			 * depend on CONFIG_TICK_ONESHOT.
> +			 */
> +			if (!IS_ENABLED(CONFIG_TICK_ONESHOT))
> +				continue;
> +			cpuidle_flags = CPUIDLE_FLAG_TIMER_STOP;

Yes, this can be done. We just need to extend this condition to
(flags[i] & OPAL_PM_TIMEBASE_STOP || flags[i] &
OPAL_PM_SLEEP_ENABLED).

Even though all the recent versions of OPAL have OPAL_PM_TIMEBASE_STOP
set for states which have OPAL_PM_SLEEP_ENABLED, I am not sure if
that was always the case and which is why we have the fastsleep case
explicitly coded in the kernel.



> +		}
> +
>  		if (has_stop_states) {
>  			int err = validate_psscr_val_mask(&psscr_val[i],
>  							  &psscr_mask[i],
> @@ -379,50 +391,36 @@ static int powernv_add_idle_states(void)
>  				report_invalid_psscr_val(psscr_val[i], err);
>  				continue;
>  			}
> -		}
> 
> -		/*
> -		 * For nap and fastsleep, use default target_residency
> -		 * values if f/w does not expose it.
> -		 */
> -		if (flags[i] & OPAL_PM_NAP_ENABLED) {
> -			if (!rc)
> -				target_residency = 100;
> -			/* Add NAP state */
> -			add_powernv_state(nr_idle_states, "Nap",
> -					  CPUIDLE_FLAG_NONE, nap_loop,
> -					  target_residency, exit_latency, 0, 0);
> -		} else if ((flags[i] & OPAL_PM_STOP_INST_FAST) &&
> -				!(flags[i] & OPAL_PM_TIMEBASE_STOP)) {
>  			add_powernv_state(nr_idle_states, names[i],
> -					  CPUIDLE_FLAG_NONE, stop_loop,
> -					  target_residency, exit_latency,
> -					  psscr_val[i], psscr_mask[i]);
> -		}
> -
> -		/*
> -		 * All cpuidle states with CPUIDLE_FLAG_TIMER_STOP set must come
> -		 * within this config dependency check.
> -		 */
> -#ifdef CONFIG_TICK_ONESHOT
> -		if (flags[i] & OPAL_PM_SLEEP_ENABLED ||
> -			flags[i] & OPAL_PM_SLEEP_ENABLED_ER1) {
> -			if (!rc)
> -				target_residency = 300000;
> -			/* Add FASTSLEEP state */
> -			add_powernv_state(nr_idle_states, "FastSleep",
> -					  CPUIDLE_FLAG_TIMER_STOP,
> -					  fastsleep_loop,
> -					  target_residency, exit_latency, 0, 0);
> -		} else if ((flags[i] & OPAL_PM_STOP_INST_DEEP) &&
> -				(flags[i] & OPAL_PM_TIMEBASE_STOP)) {
> -			add_powernv_state(nr_idle_states, names[i],
> -					  CPUIDLE_FLAG_TIMER_STOP, stop_loop,
> +					  cpuidle_flags, stop_loop,
>  					  target_residency, exit_latency,
>  					  psscr_val[i], psscr_mask[i]);
> +			nr_idle_states++;
> +		} else {
> +			/*
> +			 * For nap and fastsleep, use default target_residency
> +			 * values if f/w does not expose it.
> +			 */
> +			if (flags[i] & OPAL_PM_NAP_ENABLED) {
> +				if (!rc)
> +					target_residency = 100;
> +				/* Add NAP state */
> +				add_powernv_state(nr_idle_states, "Nap",
> +						  cpuidle_flags, nap_loop,
> +						  target_residency, exit_latency, 0, 0);
> +				nr_idle_states++;
> +			} else if (flags[i] & (OPAL_PM_SLEEP_ENABLED |
> +						OPAL_PM_SLEEP_ENABLED_ER1)) {
> +				if (!rc)
> +					target_residency = 300000;
> +				/* Add FASTSLEEP state */
> +				add_powernv_state(nr_idle_states, "FastSleep",
> +						  cpuidle_flags, fastsleep_loop,
> +						  target_residency, exit_latency, 0, 0);
> +				nr_idle_states++;
> +			}
>  		}
> -#endif
> -		nr_idle_states++;
>  	}
>  out:
>  	return nr_idle_states;
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] cpuidle-powernv: Allow Deep stop states that don't stop time
  2017-05-30 10:50     ` Gautham R Shenoy
@ 2017-05-30 11:10       ` Nicholas Piggin
  2017-05-31  8:39         ` Gautham R Shenoy
  0 siblings, 1 reply; 19+ messages in thread
From: Nicholas Piggin @ 2017-05-30 11:10 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Michael Ellerman, Michael Neuling, Vaidyanathan Srinivasan,
	Shilpasri G Bhat, Akshay Adiga, Benjamin Herrenschmidt,
	linuxppc-dev, linux-kernel

On Tue, 30 May 2017 16:20:55 +0530
Gautham R Shenoy <ego@linux.vnet.ibm.com> wrote:

> On Tue, May 30, 2017 at 05:13:57PM +1000, Nicholas Piggin wrote:
> > On Tue, 16 May 2017 14:19:48 +0530
> > "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> wrote:
> >   
> > > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> > > 
> > > The current code in the cpuidle-powernv intialization only allows deep
> > > stop states (indicated by OPAL_PM_STOP_INST_DEEP) which lose timebase
> > > (indicated by OPAL_PM_TIMEBASE_STOP). This assumption goes back to
> > > POWER8 time where deep states used to lose the timebase. However, on
> > > POWER9, we do have stop states that are deep (they lose hypervisor
> > > state) but retain the timebase.
> > > 
> > > Fix the initialization code in the cpuidle-powernv driver to allow
> > > such deep states.
> > > 
> > > Further, there is a bug in cpuidle-powernv driver with
> > > CONFIG_TICK_ONESHOT=n where we end up incrementing the nr_idle_states
> > > even if a platform idle state which loses time base was not added to
> > > the cpuidle table.
> > > 
> > > Fix this by ensuring that the nr_idle_states variable gets incremented
> > > only when the platform idle state was added to the cpuidle table.  
> > 
> > Should this be a separate patch? Stable?  
> 
> Ok. Will send it out separately.

Looks like mpe has merged this in next now. I just wonder if this
particular bit would be relevant for POWER8 and therefore be a
stable candidate? All the POWER9 idle fixes may not be suitable for
stable.


> > > Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> > > ---
> > >  drivers/cpuidle/cpuidle-powernv.c | 16 ++++++++++------
> > >  1 file changed, 10 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> > > index 12409a5..45eaf06 100644
> > > --- a/drivers/cpuidle/cpuidle-powernv.c
> > > +++ b/drivers/cpuidle/cpuidle-powernv.c
> > > @@ -354,6 +354,7 @@ static int powernv_add_idle_states(void)
> > >  
> > >  	for (i = 0; i < dt_idle_states; i++) {
> > >  		unsigned int exit_latency, target_residency;
> > > +		bool stops_timebase = false;
> > >  		/*
> > >  		 * If an idle state has exit latency beyond
> > >  		 * POWERNV_THRESHOLD_LATENCY_NS then don't use it
> > > @@ -381,6 +382,9 @@ static int powernv_add_idle_states(void)
> > >  			}
> > >  		}
> > >  
> > > +		if (flags[i] & OPAL_PM_TIMEBASE_STOP)
> > > +			stops_timebase = true;
> > > +
> > >  		/*
> > >  		 * For nap and fastsleep, use default target_residency
> > >  		 * values if f/w does not expose it.
> > > @@ -392,8 +396,7 @@ static int powernv_add_idle_states(void)
> > >  			add_powernv_state(nr_idle_states, "Nap",
> > >  					  CPUIDLE_FLAG_NONE, nap_loop,
> > >  					  target_residency, exit_latency, 0, 0);
> > > -		} else if ((flags[i] & OPAL_PM_STOP_INST_FAST) &&
> > > -				!(flags[i] & OPAL_PM_TIMEBASE_STOP)) {
> > > +		} else if (has_stop_states && !stops_timebase) {
> > >  			add_powernv_state(nr_idle_states, names[i],
> > >  					  CPUIDLE_FLAG_NONE, stop_loop,
> > >  					  target_residency, exit_latency,
> > > @@ -405,8 +408,8 @@ static int powernv_add_idle_states(void)
> > >  		 * within this config dependency check.
> > >  		 */
> > >  #ifdef CONFIG_TICK_ONESHOT
> > > -		if (flags[i] & OPAL_PM_SLEEP_ENABLED ||
> > > -			flags[i] & OPAL_PM_SLEEP_ENABLED_ER1) {
> > > +		else if (flags[i] & OPAL_PM_SLEEP_ENABLED ||
> > > +			 flags[i] & OPAL_PM_SLEEP_ENABLED_ER1) {  
> > 
> > Hmm, seems okay but readability is isn't the best with the ifdef and
> > mixing power8 and 9 cases IMO.
> > 
> > Particularly with the nice regular POWER9 states, we're not doing much
> > logic in this loop besides checking for the timebase stop flag, right?
> > Would it be clearer if it was changed to something like this (untested
> > quick hack)?  
> 
> Yes, this is very much doable. Some comments below.
> 
> > 
> > ---
> >  drivers/cpuidle/cpuidle-powernv.c | 76 +++++++++++++++++++--------------------
> >  1 file changed, 37 insertions(+), 39 deletions(-)
> > 
> > diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> > index 12409a519cc5..77291389f9ac 100644
> > --- a/drivers/cpuidle/cpuidle-powernv.c
> > +++ b/drivers/cpuidle/cpuidle-powernv.c
> > @@ -353,7 +353,9 @@ static int powernv_add_idle_states(void)
> >  	}
> > 
> >  	for (i = 0; i < dt_idle_states; i++) {
> > +		unsigned int cpuidle_flags = CPUIDLE_FLAG_NONE;
> >  		unsigned int exit_latency, target_residency;
> > +
> >  		/*
> >  		 * If an idle state has exit latency beyond
> >  		 * POWERNV_THRESHOLD_LATENCY_NS then don't use it
> > @@ -371,6 +373,16 @@ static int powernv_add_idle_states(void)
> >  		else
> >  			target_residency = 0;
> > 
> > +		if (flags[i] & OPAL_PM_TIMEBASE_STOP) {
> > +			/*
> > +			 * All cpuidle states with CPUIDLE_FLAG_TIMER_STOP set
> > +			 * depend on CONFIG_TICK_ONESHOT.
> > +			 */
> > +			if (!IS_ENABLED(CONFIG_TICK_ONESHOT))
> > +				continue;
> > +			cpuidle_flags = CPUIDLE_FLAG_TIMER_STOP;  
> 
> Yes, this can be done. We just need to extend this condition to
> (flags[i] & OPAL_PM_TIMEBASE_STOP || flags[i] &
> OPAL_PM_SLEEP_ENABLED).
> 
> Even though all the recent versions of OPAL have OPAL_PM_TIMEBASE_STOP
> set for states which have OPAL_PM_SLEEP_ENABLED, I am not sure if
> that was always the case and which is why we have the fastsleep case
> explicitly coded in the kernel.

Okay that makes sense, thanks for explaining.

Since your patches are now merged, it's a matter of preference,
if you want to send any cleanup patches you can take any of my
suggestions.

Thanks for the fixes!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] cpuidle-powernv: Allow Deep stop states that don't stop time
  2017-05-30 11:10       ` Nicholas Piggin
@ 2017-05-31  8:39         ` Gautham R Shenoy
  0 siblings, 0 replies; 19+ messages in thread
From: Gautham R Shenoy @ 2017-05-31  8:39 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Gautham R Shenoy, Michael Ellerman, Michael Neuling,
	Vaidyanathan Srinivasan, Shilpasri G Bhat, Akshay Adiga,
	Benjamin Herrenschmidt, linuxppc-dev, linux-kernel

On Tue, May 30, 2017 at 09:10:06PM +1000, Nicholas Piggin wrote:
> On Tue, 30 May 2017 16:20:55 +0530
> Gautham R Shenoy <ego@linux.vnet.ibm.com> wrote:
> 
> > On Tue, May 30, 2017 at 05:13:57PM +1000, Nicholas Piggin wrote:
> > > On Tue, 16 May 2017 14:19:48 +0530
> > > "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> wrote:
> > >   
> > > > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> > > > 
> > > > The current code in the cpuidle-powernv intialization only allows deep
> > > > stop states (indicated by OPAL_PM_STOP_INST_DEEP) which lose timebase
> > > > (indicated by OPAL_PM_TIMEBASE_STOP). This assumption goes back to
> > > > POWER8 time where deep states used to lose the timebase. However, on
> > > > POWER9, we do have stop states that are deep (they lose hypervisor
> > > > state) but retain the timebase.
> > > > 
> > > > Fix the initialization code in the cpuidle-powernv driver to allow
> > > > such deep states.
> > > > 
> > > > Further, there is a bug in cpuidle-powernv driver with
> > > > CONFIG_TICK_ONESHOT=n where we end up incrementing the nr_idle_states
> > > > even if a platform idle state which loses time base was not added to
> > > > the cpuidle table.
> > > > 
> > > > Fix this by ensuring that the nr_idle_states variable gets incremented
> > > > only when the platform idle state was added to the cpuidle table.  
> > > 
> > > Should this be a separate patch? Stable?  
> > 
> > Ok. Will send it out separately.
> 
> Looks like mpe has merged this in next now. I just wonder if this
> particular bit would be relevant for POWER8 and therefore be a
> stable candidate? All the POWER9 idle fixes may not be suitable for
> stable.

I agree. The other POWER9 fixes aren't suitable for stable. I will
clean this patch alone based on your suggestion and mark it for
stable.

> 
> 
> > > > Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> > > > ---
> > > >  drivers/cpuidle/cpuidle-powernv.c | 16 ++++++++++------
> > > >  1 file changed, 10 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> > > > index 12409a5..45eaf06 100644
> > > > --- a/drivers/cpuidle/cpuidle-powernv.c
> > > > +++ b/drivers/cpuidle/cpuidle-powernv.c
> > > > @@ -354,6 +354,7 @@ static int powernv_add_idle_states(void)
> > > >  
> > > >  	for (i = 0; i < dt_idle_states; i++) {
> > > >  		unsigned int exit_latency, target_residency;
> > > > +		bool stops_timebase = false;
> > > >  		/*
> > > >  		 * If an idle state has exit latency beyond
> > > >  		 * POWERNV_THRESHOLD_LATENCY_NS then don't use it
> > > > @@ -381,6 +382,9 @@ static int powernv_add_idle_states(void)
> > > >  			}
> > > >  		}
> > > >  
> > > > +		if (flags[i] & OPAL_PM_TIMEBASE_STOP)
> > > > +			stops_timebase = true;
> > > > +
> > > >  		/*
> > > >  		 * For nap and fastsleep, use default target_residency
> > > >  		 * values if f/w does not expose it.
> > > > @@ -392,8 +396,7 @@ static int powernv_add_idle_states(void)
> > > >  			add_powernv_state(nr_idle_states, "Nap",
> > > >  					  CPUIDLE_FLAG_NONE, nap_loop,
> > > >  					  target_residency, exit_latency, 0, 0);
> > > > -		} else if ((flags[i] & OPAL_PM_STOP_INST_FAST) &&
> > > > -				!(flags[i] & OPAL_PM_TIMEBASE_STOP)) {
> > > > +		} else if (has_stop_states && !stops_timebase) {
> > > >  			add_powernv_state(nr_idle_states, names[i],
> > > >  					  CPUIDLE_FLAG_NONE, stop_loop,
> > > >  					  target_residency, exit_latency,
> > > > @@ -405,8 +408,8 @@ static int powernv_add_idle_states(void)
> > > >  		 * within this config dependency check.
> > > >  		 */
> > > >  #ifdef CONFIG_TICK_ONESHOT
> > > > -		if (flags[i] & OPAL_PM_SLEEP_ENABLED ||
> > > > -			flags[i] & OPAL_PM_SLEEP_ENABLED_ER1) {
> > > > +		else if (flags[i] & OPAL_PM_SLEEP_ENABLED ||
> > > > +			 flags[i] & OPAL_PM_SLEEP_ENABLED_ER1) {  
> > > 
> > > Hmm, seems okay but readability is isn't the best with the ifdef and
> > > mixing power8 and 9 cases IMO.
> > > 
> > > Particularly with the nice regular POWER9 states, we're not doing much
> > > logic in this loop besides checking for the timebase stop flag, right?
> > > Would it be clearer if it was changed to something like this (untested
> > > quick hack)?  
> > 
> > Yes, this is very much doable. Some comments below.
> > 
> > > 
> > > ---
> > >  drivers/cpuidle/cpuidle-powernv.c | 76 +++++++++++++++++++--------------------
> > >  1 file changed, 37 insertions(+), 39 deletions(-)
> > > 
> > > diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> > > index 12409a519cc5..77291389f9ac 100644
> > > --- a/drivers/cpuidle/cpuidle-powernv.c
> > > +++ b/drivers/cpuidle/cpuidle-powernv.c
> > > @@ -353,7 +353,9 @@ static int powernv_add_idle_states(void)
> > >  	}
> > > 
> > >  	for (i = 0; i < dt_idle_states; i++) {
> > > +		unsigned int cpuidle_flags = CPUIDLE_FLAG_NONE;
> > >  		unsigned int exit_latency, target_residency;
> > > +
> > >  		/*
> > >  		 * If an idle state has exit latency beyond
> > >  		 * POWERNV_THRESHOLD_LATENCY_NS then don't use it
> > > @@ -371,6 +373,16 @@ static int powernv_add_idle_states(void)
> > >  		else
> > >  			target_residency = 0;
> > > 
> > > +		if (flags[i] & OPAL_PM_TIMEBASE_STOP) {
> > > +			/*
> > > +			 * All cpuidle states with CPUIDLE_FLAG_TIMER_STOP set
> > > +			 * depend on CONFIG_TICK_ONESHOT.
> > > +			 */
> > > +			if (!IS_ENABLED(CONFIG_TICK_ONESHOT))
> > > +				continue;
> > > +			cpuidle_flags = CPUIDLE_FLAG_TIMER_STOP;  
> > 
> > Yes, this can be done. We just need to extend this condition to
> > (flags[i] & OPAL_PM_TIMEBASE_STOP || flags[i] &
> > OPAL_PM_SLEEP_ENABLED).
> > 
> > Even though all the recent versions of OPAL have OPAL_PM_TIMEBASE_STOP
> > set for states which have OPAL_PM_SLEEP_ENABLED, I am not sure if
> > that was always the case and which is why we have the fastsleep case
> > explicitly coded in the kernel.
> 
> Okay that makes sense, thanks for explaining.
> 
> Since your patches are now merged, it's a matter of preference,
> if you want to send any cleanup patches you can take any of my
> suggestions.
> 
> Thanks for the fixes!

I will incorporate your suggestion into a separate patch to do all the
device-tree parsing pertaining to the idle-bits in one place,
preferably arch/powerpc/platforms/powernv/idle.c and expose the parsed
output to any other subsystems such as cpuidle-powernv via an in
kernel data-structure. Today we do it at two separate place, once
during the idle-code initialization and another cpuidle-powernv driver
initialization.

--
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-05-31  8:39 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-16  8:49 [PATCH 0/6] Enable support for deep-stop states on POWER9 Gautham R. Shenoy
2017-05-16  8:49 ` [PATCH 1/6] powernv:idle: Correctly initialize core_idle_state_ptr Gautham R. Shenoy
2017-05-30  5:56   ` Nicholas Piggin
2017-05-30 10:23     ` Gautham R Shenoy
2017-05-30  9:11   ` [1/6] " Michael Ellerman
2017-05-16  8:49 ` [PATCH 2/6] powernv:idle: Decouple Timebase restore & Per-core SPRs restore Gautham R. Shenoy
2017-05-30  6:12   ` Nicholas Piggin
2017-05-30 10:28     ` Gautham R Shenoy
2017-05-16  8:49 ` [PATCH 3/6] powernv:idle: Restore LPCR on wakeup from deep-stop Gautham R. Shenoy
2017-05-30  6:17   ` Nicholas Piggin
2017-05-30 10:35     ` Gautham R Shenoy
2017-05-16  8:49 ` [PATCH 4/6] powernv:idle: Restore SPRs for deep idle states via stop API Gautham R. Shenoy
2017-05-16  8:49 ` [PATCH 5/6] powernv:idle: Use Requested Level for restoring state on P9 DD1 Gautham R. Shenoy
2017-05-30  6:27   ` Nicholas Piggin
2017-05-16  8:49 ` [PATCH 6/6] cpuidle-powernv: Allow Deep stop states that don't stop time Gautham R. Shenoy
2017-05-30  7:13   ` Nicholas Piggin
2017-05-30 10:50     ` Gautham R Shenoy
2017-05-30 11:10       ` Nicholas Piggin
2017-05-31  8:39         ` Gautham R Shenoy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.