linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes
@ 2014-10-01  7:45 Shreyas B. Prabhu
  2014-10-01  7:45 ` [PATCH v2 1/3] powerpc/powernv: Enable Offline CPUs to enter deep idle states Shreyas B. Prabhu
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Shreyas B. Prabhu @ 2014-10-01  7:45 UTC
  To: linux-kernel
  Cc: Shreyas B. Prabhu, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Rafael J. Wysocki, linux-pm, linuxppc-dev,
	Srivatsa S. Bhat, Preeti U. Murthy, Vaidyanathan Srinivasan

Fast sleep is an idle state in which the core and the L1 and L2
caches are brought down to a threshold voltage. This also means that
the communication between the L2 and L3 caches has to be fenced. However,
the current P8 chips have a bug wherein this fencing between the L2 and
L3 caches gets delayed by a cpu cycle. This can delay the L3 response to
other cpus if they request data during this time. They would then
fetch the same data from memory, which could lead to data
corruption if the L3 cache is not flushed.

This series works around the above problem in the kernel.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Srivatsa S. Bhat <srivatsa@mit.edu>
Cc: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

v2:
Rebased on 3.17-rc7
Split from 'powerpc/powernv: Support for fastsleep and winkle'

v1:
https://lkml.org/lkml/2014/8/25/446

Preeti U Murthy (1):
  powerpc/powernv/cpuidle: Add workaround to enable fastsleep

Shreyas B. Prabhu (1):
  powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from
    fast-sleep

Srivatsa S. Bhat (1):
  powerpc/powernv: Enable Offline CPUs to enter deep idle states

 arch/powerpc/include/asm/machdep.h             |   3 +
 arch/powerpc/include/asm/opal.h                |   7 ++
 arch/powerpc/include/asm/processor.h           |   4 +-
 arch/powerpc/kernel/exceptions-64s.S           |  35 ++++----
 arch/powerpc/kernel/idle.c                     |  19 ++++
 arch/powerpc/kernel/idle_power7.S              |   2 +-
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/powernv.h       |   7 ++
 arch/powerpc/platforms/powernv/setup.c         | 118 +++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/smp.c           |  11 ++-
 drivers/cpuidle/cpuidle-powernv.c              |  13 ++-
 11 files changed, 194 insertions(+), 26 deletions(-)

-- 
1.9.3



* [PATCH v2 1/3] powerpc/powernv: Enable Offline CPUs to enter deep idle states
  2014-10-01  7:45 [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes Shreyas B. Prabhu
@ 2014-10-01  7:45 ` Shreyas B. Prabhu
  2014-10-07  5:06   ` Benjamin Herrenschmidt
  2014-10-01  7:45 ` [PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep Shreyas B. Prabhu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Shreyas B. Prabhu @ 2014-10-01  7:45 UTC
  To: linux-kernel
  Cc: Srivatsa S. Bhat, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Rafael J. Wysocki, linux-pm, linuxppc-dev,
	Shreyas B. Prabhu, Preeti U. Murthy

From: "Srivatsa S. Bhat" <srivatsa@mit.edu>

Offline cpus should enter deep idle states so as to gain maximum
powersavings when the entire core is offline. To do so, the offline path
must be made aware of the deepest available idle state. Hence probe the
device tree for the possible idle states in the powernv core code and
expose the deepest idle state through flags.

Since the device tree is probed by the cpuidle driver as well, move
the parameters required to discover the idle states into a place
common to both the driver and the powernv core code.

Another point is that the fastsleep idle state may require workarounds in
the kernel to function properly. This workaround is introduced in a
subsequent patch. However, neither the cpuidle driver nor the hotplug
path needs to be concerned with this workaround; it will be taken care
of by the core powernv code.
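
As an illustration of the probing described above, here is a minimal
userspace sketch, not the kernel code itself. It folds the per-state
device tree flag words into a single mask and picks the state an offline
cpu would use. The flag values are the ones this patch defines;
fold_idle_flags() and deepest_offline_state() are hypothetical names, and
the flag words are assumed to already be in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as defined by this patch (asm/opal.h and powernv.h). */
#define IDLE_INST_NAP	0x00010000u	/* nap instruction can be used */
#define IDLE_INST_SLEEP	0x00020000u	/* sleep instruction can be used */
#define IDLE_USE_NAP	(1u << 0)
#define IDLE_USE_SLEEP	(1u << 1)

/* Hypothetical helper: fold the per-state flag words into one mask. */
static unsigned int fold_idle_flags(const uint32_t *flags, size_t n)
{
	unsigned int supported = 0;
	size_t i;

	assert(flags != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (flags[i] & IDLE_INST_NAP)
			supported |= IDLE_USE_NAP;
		if (flags[i] & IDLE_INST_SLEEP)
			supported |= IDLE_USE_SLEEP;
	}
	return supported;
}

/*
 * Hypothetical helper: an offline cpu uses sleep when it is available
 * and falls back to nap otherwise, as in the smp.c hunk of this patch.
 */
static const char *deepest_offline_state(unsigned int supported)
{
	return (supported & IDLE_USE_SLEEP) ? "sleep" : "nap";
}
```

With flag words { 0x00010000, 0x00020000 } the mask has both bits set and
the offline path would choose sleep; with only the first word it would
stay with nap.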

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Srivatsa S. Bhat <srivatsa@mit.edu>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
[ Changelog modified by preeti@linux.vnet.ibm.com ]
Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/opal.h          |  4 +++
 arch/powerpc/platforms/powernv/powernv.h |  7 +++++
 arch/powerpc/platforms/powernv/setup.c   | 51 ++++++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/smp.c     | 11 ++++++-
 drivers/cpuidle/cpuidle-powernv.c        |  7 ++---
 5 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 86055e5..28b8342 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -772,6 +772,10 @@ extern struct kobject *opal_kobj;
 /* /ibm,opal */
 extern struct device_node *opal_node;
 
+/* Flags used for idle state discovery from the device tree */
+#define IDLE_INST_NAP	0x00010000 /* nap instruction can be used */
+#define IDLE_INST_SLEEP	0x00020000 /* sleep instruction can be used */
+
 /* API functions */
 int64_t opal_invalid_call(void);
 int64_t opal_console_write(int64_t term_number, __be64 *length,
diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h
index 75501bf..31ece13 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -23,6 +23,13 @@ static inline int pnv_pci_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
 }
 #endif
 
+/* Flags to indicate which of the CPU idle states are available for use */
+
+#define IDLE_USE_NAP		(1UL << 0)
+#define IDLE_USE_SLEEP		(1UL << 1)
+
+extern unsigned int pnv_get_supported_cpuidle_states(void);
+
 extern void pnv_lpc_init(void);
 
 bool cpu_core_split_required(void);
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 5a0e2dc..2dca1d8 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -282,6 +282,57 @@ static void __init pnv_setup_machdep_rtas(void)
 }
 #endif /* CONFIG_PPC_POWERNV_RTAS */
 
+static unsigned int supported_cpuidle_states;
+
+unsigned int pnv_get_supported_cpuidle_states(void)
+{
+	return supported_cpuidle_states;
+}
+
+static int __init pnv_probe_idle_states(void)
+{
+	struct device_node *power_mgt;
+	struct property *prop;
+	int dt_idle_states;
+	u32 *flags;
+	int i;
+
+	supported_cpuidle_states = 0;
+
+	if (cpuidle_disable != IDLE_NO_OVERRIDE)
+		return 0;
+
+	if (!firmware_has_feature(FW_FEATURE_OPALv3))
+		return 0;
+
+	power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
+	if (!power_mgt) {
+		pr_warn("opal: PowerMgmt Node not found\n");
+		return 0;
+	}
+
+	prop = of_find_property(power_mgt, "ibm,cpu-idle-state-flags", NULL);
+	if (!prop) {
+		pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n");
+		return 0;
+	}
+
+	dt_idle_states = prop->length / sizeof(u32);
+	flags = (u32 *) prop->value;
+
+	for (i = 0; i < dt_idle_states; i++) {
+		if (flags[i] & IDLE_INST_NAP)
+			supported_cpuidle_states |= IDLE_USE_NAP;
+
+		if (flags[i] & IDLE_INST_SLEEP)
+			supported_cpuidle_states |= IDLE_USE_SLEEP;
+	}
+
+	return 0;
+}
+
+subsys_initcall(pnv_probe_idle_states);
+
 static int __init pnv_probe(void)
 {
 	unsigned long root = of_get_flat_dt_root();
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 5fcfcf4..3ad31d2 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -149,6 +149,7 @@ static int pnv_smp_cpu_disable(void)
 static void pnv_smp_cpu_kill_self(void)
 {
 	unsigned int cpu;
+	unsigned long idle_states;
 
 	/* Standard hot unplug procedure */
 	local_irq_disable();
@@ -159,13 +160,21 @@ static void pnv_smp_cpu_kill_self(void)
 	generic_set_cpu_dead(cpu);
 	smp_wmb();
 
+	idle_states = pnv_get_supported_cpuidle_states();
+
 	/* We don't want to take decrementer interrupts while we are offline,
 	 * so clear LPCR:PECE1. We keep PECE2 enabled.
 	 */
 	mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1);
 	while (!generic_check_cpu_restart(cpu)) {
 		ppc64_runlatch_off();
-		power7_nap(1);
+
+		/* If sleep is supported, go to sleep, instead of nap */
+		if (idle_states & IDLE_USE_SLEEP)
+			power7_sleep();
+		else
+			power7_nap(1);
+
 		ppc64_runlatch_on();
 
 		/* Reenable IRQs briefly to clear the IPI that woke us */
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index a64be57..23d2743 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -16,13 +16,12 @@
 
 #include <asm/machdep.h>
 #include <asm/firmware.h>
+#include <asm/opal.h>
 #include <asm/runlatch.h>
 
 /* Flags and constants used in PowerNV platform */
 
 #define MAX_POWERNV_IDLE_STATES	8
-#define IDLE_USE_INST_NAP	0x00010000 /* Use nap instruction */
-#define IDLE_USE_INST_SLEEP	0x00020000 /* Use sleep instruction */
 
 struct cpuidle_driver powernv_idle_driver = {
 	.name             = "powernv_idle",
@@ -185,7 +184,7 @@ static int powernv_add_idle_states(void)
 	for (i = 0; i < dt_idle_states; i++) {
 
 		flags = be32_to_cpu(idle_state_flags[i]);
-		if (flags & IDLE_USE_INST_NAP) {
+		if (flags & IDLE_INST_NAP) {
 			/* Add NAP state */
 			strcpy(powernv_states[nr_idle_states].name, "Nap");
 			strcpy(powernv_states[nr_idle_states].desc, "Nap");
@@ -196,7 +195,7 @@ static int powernv_add_idle_states(void)
 			nr_idle_states++;
 		}
 
-		if (flags & IDLE_USE_INST_SLEEP) {
+		if (flags & IDLE_INST_SLEEP) {
 			/* Add FASTSLEEP state */
 			strcpy(powernv_states[nr_idle_states].name, "FastSleep");
 			strcpy(powernv_states[nr_idle_states].desc, "FastSleep");
-- 
1.9.3



* [PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep
  2014-10-01  7:45 [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes Shreyas B. Prabhu
  2014-10-01  7:45 ` [PATCH v2 1/3] powerpc/powernv: Enable Offline CPUs to enter deep idle states Shreyas B. Prabhu
@ 2014-10-01  7:45 ` Shreyas B. Prabhu
  2014-10-02 16:39   ` Shreyas B Prabhu
  2014-10-07  5:11   ` Benjamin Herrenschmidt
  2014-10-01  7:46 ` [PATCH v2 3/3] powerpc/powernv/cpuidle: Add workaround to enable fastsleep Shreyas B. Prabhu
  2014-10-01 20:46 ` [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes Rafael J. Wysocki
  3 siblings, 2 replies; 11+ messages in thread
From: Shreyas B. Prabhu @ 2014-10-01  7:45 UTC
  To: linux-kernel
  Cc: Shreyas B. Prabhu, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, linuxppc-dev, Preeti U Murthy

When guests have to be launched, the secondary threads which are offline
are woken up to run the guests. Today these threads wake up from nap
and check if they have to run guests. Now that the offline secondary
threads can enter fastsleep or, going forward, a deeper idle state such
as winkle, add this check to the wakeup path of all the deep idle states
as well.
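
For readers unfamiliar with the wakeup check, the following userspace
sketch shows how the two-bit wake state is extracted from SRR1. It mirrors
the rlwinm in the system reset handler (rotate left by 47-31 = 16, keep
the low two bits). The enum and function names are hypothetical; the
meaning of each value follows the comments and branches in
exceptions-64s.S.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the two-bit wake state field of SRR1. */
enum wake_state {
	WAKE_NOT_POWERSAVE = 0,	/* not a powersave wakeup (label 9) */
	WAKE_NO_LOSS       = 1,	/* no state loss */
	WAKE_SUP_LOSS      = 2,	/* supervisor state loss */
	WAKE_HV_LOSS       = 3,	/* hypervisor state loss, e.g. fast sleep */
};

/* Same result as "rlwinm. r13,r13,47-31,30,31" applied to SRR1. */
static enum wake_state srr1_wake_state(uint64_t srr1)
{
	return (enum wake_state)((srr1 >> 16) & 0x3);
}

/* Label the handler branches to; NULL means no powersave wakeup. */
static const char *wake_handler(enum wake_state ws)
{
	switch (ws) {
	case WAKE_NO_LOSS:
		return "power7_wakeup_noloss";
	case WAKE_SUP_LOSS:
		return "power7_wakeup_loss";
	case WAKE_HV_LOSS:
		return "power7_wakeup_tb_loss";
	default:
		return NULL;
	}
}
```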

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Suggested-by: "Srivatsa S. Bhat" <srivatsa@mit.edu>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
[ Changelog added by <preeti@linux.vnet.ibm.com> ]
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 050f79a..c64f3cc0 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -100,25 +100,8 @@ system_reset_pSeries:
 	SET_SCRATCH0(r13)
 #ifdef CONFIG_PPC_P7_NAP
 BEGIN_FTR_SECTION
-	/* Running native on arch 2.06 or later, check if we are
-	 * waking up from nap. We only handle no state loss and
-	 * supervisor state loss. We do -not- handle hypervisor
-	 * state loss at this time.
-	 */
-	mfspr	r13,SPRN_SRR1
-	rlwinm.	r13,r13,47-31,30,31
-	beq	9f
 
-	/* waking up from powersave (nap) state */
-	cmpwi	cr1,r13,2
-	/* Total loss of HV state is fatal, we could try to use the
-	 * PIR to locate a PACA, then use an emergency stack etc...
-	 * OPAL v3 based powernv platforms have new idle states
-	 * which fall in this catagory.
-	 */
-	bgt	cr1,8f
 	GET_PACA(r13)
-
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 	li	r0,KVM_HWTHREAD_IN_KERNEL
 	stb	r0,HSTATE_HWTHREAD_STATE(r13)
@@ -131,13 +114,27 @@ BEGIN_FTR_SECTION
 1:
 #endif
 
+	/* Running native on arch 2.06 or later, check if we are
+	 * waking up from nap. We only handle no state loss and
+	 * supervisor state loss. We do -not- handle hypervisor
+	 * state loss at this time.
+	 */
+	mfspr	r13,SPRN_SRR1
+	rlwinm.	r13,r13,47-31,30,31
+	beq	9f
+
+	/* waking up from powersave (nap) state */
+	cmpwi	cr1,r13,2
+	GET_PACA(r13)
+
+	bgt	cr1,8f
+
 	beq	cr1,2f
 	b	power7_wakeup_noloss
 2:	b	power7_wakeup_loss
 
 	/* Fast Sleep wakeup on PowerNV */
-8:	GET_PACA(r13)
-	b 	power7_wakeup_tb_loss
+8:	b 	power7_wakeup_tb_loss
 
 9:
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-- 
1.9.3



* [PATCH v2 3/3] powerpc/powernv/cpuidle: Add workaround to enable fastsleep
  2014-10-01  7:45 [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes Shreyas B. Prabhu
  2014-10-01  7:45 ` [PATCH v2 1/3] powerpc/powernv: Enable Offline CPUs to enter deep idle states Shreyas B. Prabhu
  2014-10-01  7:45 ` [PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep Shreyas B. Prabhu
@ 2014-10-01  7:46 ` Shreyas B. Prabhu
  2014-10-07  5:20   ` Benjamin Herrenschmidt
  2014-10-01 20:46 ` [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes Rafael J. Wysocki
  3 siblings, 1 reply; 11+ messages in thread
From: Shreyas B. Prabhu @ 2014-10-01  7:46 UTC
  To: linux-kernel
  Cc: Preeti U Murthy, linux-pm, linuxppc-dev, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Rafael J. Wysocki,
	Shreyas B. Prabhu

From: Preeti U Murthy <preeti@linux.vnet.ibm.com>

Fast sleep is an idle state in which the core and the L1 and L2
caches are brought down to a threshold voltage. This also means that
the communication between the L2 and L3 caches has to be fenced. However,
the current P8 chips have a bug wherein this fencing between the L2 and
L3 caches gets delayed by a cpu cycle. This can delay the L3 response to
other cpus if they request data during this time. They would then
fetch the same data from memory, which could lead to data
corruption if the L3 cache is not flushed.

The cpu idle states save power at a core level and not at a thread level.
Hence powersavings are determined by the shallowest idle state that any
thread of a core is in. The above issue in fastsleep will arise only when
every thread in a core is in fastsleep or a deeper idle state. This patch
therefore implements a workaround for this bug by ensuring that, each
time a cpu goes to fastsleep, it checks if it is the last thread in the
core to enter fastsleep. If so, it needs to make an opal call to get
around the above mentioned fastsleep problem in the hardware before
issuing the sleep instruction.

Similarly, when a thread in a core comes out of fastsleep, it needs
to verify if it is the first thread in the core to come out of fastsleep
and, if so, issue the opal call to revert the changes made while entering
fastsleep.

For the same reason we need to take care of offline threads as well,
since we allow them to enter fastsleep and, with support for deep winkle
coming soon, they can enter winkle as well. We therefore ensure that
even offline threads make the above mentioned opal calls, so that as
long as all the threads in a core are in an idle state >= fastsleep, we
have the workaround in place. Whenever a thread comes out of either of
these states, it needs to verify if the opal call has been made and, if
so, revert it. For now this patch ensures that offline threads enter
fastsleep.

We need to synchronize the cpus in a core which are entering and
exiting fastsleep so as to ensure that *only* the last thread in the
core to enter fastsleep and the first to exit fastsleep issue the opal
call. To do so, we need a per-core lock and counter. The counter is
required to keep track of how many threads in a core are in an idle
state >= fastsleep; the code does this by starting the counter at
threads_per_core and decrementing it on entry. To keep the
implementation simple, we introduce a per-cpu lock and counter, and
every thread always takes the primary thread's lock and modifies the
primary thread's counter. This effectively makes them per-core entities.
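
The last-in/first-out counting described above can be sketched in
userspace as follows. This is only an illustration under assumptions:
struct core_idle_state and the core_*() helpers are hypothetical, a C11
atomic_flag stands in for the kernel spinlock, and a plain flag stands in
for the effect of the opal call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-core state; the patch keeps the equivalent in the
 * primary thread's per-cpu lock and counter. */
struct core_idle_state {
	atomic_flag lock;	/* stand-in for the kernel spinlock */
	int awake;		/* threads not in fastsleep or deeper */
	int workaround_on;	/* stand-in for the effect of the opal call */
};

static void core_idle_init(struct core_idle_state *c, int threads_per_core)
{
	atomic_flag_clear(&c->lock);
	c->awake = threads_per_core;
	c->workaround_on = 0;
}

static void core_lock(struct core_idle_state *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock,
						 memory_order_acquire))
		;	/* spin until the flag was clear */
}

static void core_unlock(struct core_idle_state *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Call before entering fastsleep; returns true for the last thread in. */
static bool core_enter_fastsleep(struct core_idle_state *c)
{
	bool last;

	core_lock(c);
	assert(c->awake > 0);
	last = (--c->awake == 0);
	if (last)
		c->workaround_on = 1;	/* patch: opal_config_idle_state(1, 1) */
	core_unlock(c);
	return last;
}

/* Call after waking up; returns true for the first thread out. */
static bool core_exit_fastsleep(struct core_idle_state *c)
{
	bool first;

	core_lock(c);
	first = (c->awake == 0);
	if (first)
		c->workaround_on = 0;	/* patch: opal_config_idle_state(1, 0) */
	c->awake++;
	core_unlock(c);
	return first;
}
```

Because both the counter update and the decision to apply or revert the
workaround happen under the same lock, exactly one thread per core-wide
transition takes each action, whichever order the threads arrive in.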

The workaround is abstracted in the powernv core code, and neither
the hotplug path nor the cpuidle driver needs to bother about it. All
they need to know is whether fastsleep, with or without the hardware
bug, is present as an idle state.

Cc: linux-pm@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/machdep.h             |   3 +
 arch/powerpc/include/asm/opal.h                |   3 +
 arch/powerpc/include/asm/processor.h           |   4 +-
 arch/powerpc/kernel/idle.c                     |  19 ++++
 arch/powerpc/kernel/idle_power7.S              |   2 +-
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/setup.c         | 139 ++++++++++++++++++-------
 drivers/cpuidle/cpuidle-powernv.c              |   8 +-
 8 files changed, 140 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index b125cea..f37014f 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -298,6 +298,9 @@ struct machdep_calls {
 #ifdef CONFIG_MEMORY_HOTREMOVE
 	int (*remove_memory)(u64, u64);
 #endif
+	/* Idle handlers */
+	void		(*setup_idle)(void);
+	unsigned long	(*power7_sleep)(void);
 };
 
 extern void e500_idle(void);
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 28b8342..166d572 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -149,6 +149,7 @@ struct opal_sg_list {
 #define OPAL_DUMP_INFO2				94
 #define OPAL_PCI_EEH_FREEZE_SET			97
 #define OPAL_HANDLE_HMI				98
+#define OPAL_CONFIG_IDLE_STATE			99
 #define OPAL_REGISTER_DUMP_REGION		101
 #define OPAL_UNREGISTER_DUMP_REGION		102
 
@@ -775,6 +776,7 @@ extern struct device_node *opal_node;
 /* Flags used for idle state discovery from the device tree */
 #define IDLE_INST_NAP	0x00010000 /* nap instruction can be used */
 #define IDLE_INST_SLEEP	0x00020000 /* sleep instruction can be used */
+#define IDLE_INST_SLEEP_ER1	0x00080000 /* Use sleep with workaround */
 
 /* API functions */
 int64_t opal_invalid_call(void);
@@ -975,6 +977,7 @@ extern int opal_handle_hmi_exception(struct pt_regs *regs);
 
 extern void opal_shutdown(void);
 extern int opal_resync_timebase(void);
+int64_t opal_config_idle_state(uint64_t state, uint64_t enter);
 
 extern void opal_lpc_init(void);
 
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index dda7ac4..41953cd 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -451,8 +451,10 @@ extern unsigned long cpuidle_disable;
 enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
 
 extern int powersave_nap;	/* set if nap mode can be used in idle loop */
+extern void arch_setup_idle(void);
 extern void power7_nap(int check_irq);
-extern void power7_sleep(void);
+extern unsigned long power7_sleep(void);
+extern unsigned long __power7_sleep(void);
 extern void flush_instruction_cache(void);
 extern void hard_reset_now(void);
 extern void poweroff_now(void);
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index d7216c9..1f268e0 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -32,6 +32,9 @@
 #include <asm/machdep.h>
 #include <asm/runlatch.h>
 #include <asm/smp.h>
+#include <asm/cputhreads.h>
+#include <asm/firmware.h>
+#include <asm/opal.h>
 
 
 unsigned long cpuidle_disable = IDLE_NO_OVERRIDE;
@@ -78,6 +81,22 @@ void arch_cpu_idle(void)
 	HMT_medium();
 	ppc64_runlatch_on();
 }
+void arch_setup_idle(void)
+{
+	if (ppc_md.setup_idle)
+		ppc_md.setup_idle();
+}
+
+unsigned long power7_sleep(void)
+{
+	unsigned long ret;
+
+	if (ppc_md.power7_sleep)
+		ret = ppc_md.power7_sleep();
+	else
+		ret = __power7_sleep();
+	return ret;
+}
 
 int powersave_nap;
 
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index be05841..c3481c9 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -129,7 +129,7 @@ _GLOBAL(power7_nap)
 	b	power7_powersave_common
 	/* No return */
 
-_GLOBAL(power7_sleep)
+_GLOBAL(__power7_sleep)
 	li	r3,1
 	li	r4,1
 	b	power7_powersave_common
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 2e6ce1b..8d1e724 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -245,5 +245,6 @@ OPAL_CALL(opal_sensor_read,			OPAL_SENSOR_READ);
 OPAL_CALL(opal_get_param,			OPAL_GET_PARAM);
 OPAL_CALL(opal_set_param,			OPAL_SET_PARAM);
 OPAL_CALL(opal_handle_hmi,			OPAL_HANDLE_HMI);
+OPAL_CALL(opal_config_idle_state,		OPAL_CONFIG_IDLE_STATE);
 OPAL_CALL(opal_register_dump_region,		OPAL_REGISTER_DUMP_REGION);
 OPAL_CALL(opal_unregister_dump_region,		OPAL_UNREGISTER_DUMP_REGION);
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 2dca1d8..9d9a898 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -36,9 +36,20 @@
 #include <asm/opal.h>
 #include <asm/kexec.h>
 #include <asm/smp.h>
+#include <asm/cputhreads.h>
 
 #include "powernv.h"
 
+/* Per-cpu structures to keep track of cpus of a core that
+ * are in idle states >= fastsleep so as to call opal for
+ * sleep setup when the entire core is ready to go to fastsleep.
+ *
+ * We need something similar to a per-core lock. For now we
+ * achieve this by taking the lock of the primary thread in the core.
+ */
+static DEFINE_PER_CPU(spinlock_t, fastsleep_override_lock);
+static DEFINE_PER_CPU(int, fastsleep_cnt);
+
 static void __init pnv_setup_arch(void)
 {
 	set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
@@ -254,35 +265,8 @@ static unsigned long pnv_memory_block_size(void)
 }
 #endif
 
-static void __init pnv_setup_machdep_opal(void)
-{
-	ppc_md.get_boot_time = opal_get_boot_time;
-	ppc_md.get_rtc_time = opal_get_rtc_time;
-	ppc_md.set_rtc_time = opal_set_rtc_time;
-	ppc_md.restart = pnv_restart;
-	ppc_md.power_off = pnv_power_off;
-	ppc_md.halt = pnv_halt;
-	ppc_md.machine_check_exception = opal_machine_check;
-	ppc_md.mce_check_early_recovery = opal_mce_check_early_recovery;
-	ppc_md.hmi_exception_early = opal_hmi_exception_early;
-	ppc_md.handle_hmi_exception = opal_handle_hmi_exception;
-}
-
-#ifdef CONFIG_PPC_POWERNV_RTAS
-static void __init pnv_setup_machdep_rtas(void)
-{
-	if (rtas_token("get-time-of-day") != RTAS_UNKNOWN_SERVICE) {
-		ppc_md.get_boot_time = rtas_get_boot_time;
-		ppc_md.get_rtc_time = rtas_get_rtc_time;
-		ppc_md.set_rtc_time = rtas_set_rtc_time;
-	}
-	ppc_md.restart = rtas_restart;
-	ppc_md.power_off = rtas_power_off;
-	ppc_md.halt = rtas_halt;
-}
-#endif /* CONFIG_PPC_POWERNV_RTAS */
-
 static unsigned int supported_cpuidle_states;
+static int need_fastsleep_workaround;
 
 unsigned int pnv_get_supported_cpuidle_states(void)
 {
@@ -292,12 +276,13 @@ unsigned int pnv_get_supported_cpuidle_states(void)
 static int __init pnv_probe_idle_states(void)
 {
 	struct device_node *power_mgt;
-	struct property *prop;
 	int dt_idle_states;
-	u32 *flags;
+	const __be32 *idle_state_flags;
+	u32 len_flags, flags;
 	int i;
 
 	supported_cpuidle_states = 0;
+	need_fastsleep_workaround = 0;
 
 	if (cpuidle_disable != IDLE_NO_OVERRIDE)
 		return 0;
@@ -311,21 +296,28 @@ static int __init pnv_probe_idle_states(void)
 		return 0;
 	}
 
-	prop = of_find_property(power_mgt, "ibm,cpu-idle-state-flags", NULL);
-	if (!prop) {
+	idle_state_flags = of_get_property(power_mgt,
+			"ibm,cpu-idle-state-flags", &len_flags);
+	if (!idle_state_flags) {
 		pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n");
 		return 0;
 	}
 
-	dt_idle_states = prop->length / sizeof(u32);
-	flags = (u32 *) prop->value;
+	dt_idle_states = len_flags / sizeof(u32);
 
 	for (i = 0; i < dt_idle_states; i++) {
-		if (flags[i] & IDLE_INST_NAP)
+
+		flags = be32_to_cpu(idle_state_flags[i]);
+		if (flags & IDLE_INST_NAP)
 			supported_cpuidle_states |= IDLE_USE_NAP;
 
-		if (flags[i] & IDLE_INST_SLEEP)
+		if (flags & IDLE_INST_SLEEP)
 			supported_cpuidle_states |= IDLE_USE_SLEEP;
+
+		if (flags & IDLE_INST_SLEEP_ER1) {
+			supported_cpuidle_states |= IDLE_USE_SLEEP;
+			need_fastsleep_workaround = 1;
+		}
 	}
 
 	return 0;
@@ -333,6 +325,81 @@ static int __init pnv_probe_idle_states(void)
 
 subsys_initcall(pnv_probe_idle_states);
 
+static void pnv_setup_idle(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		spin_lock_init(&per_cpu(fastsleep_override_lock, cpu));
+		per_cpu(fastsleep_cnt, cpu) = threads_per_core;
+	}
+}
+
+static void
+pnv_apply_fastsleep_workaround(bool enter_fastsleep, int primary_thread)
+{
+	if (enter_fastsleep) {
+		spin_lock(&per_cpu(fastsleep_override_lock, primary_thread));
+		if (--(per_cpu(fastsleep_cnt, primary_thread)) == 0)
+			opal_config_idle_state(1, 1);
+		spin_unlock(&per_cpu(fastsleep_override_lock, primary_thread));
+	} else {
+		spin_lock(&per_cpu(fastsleep_override_lock, primary_thread));
+		if ((per_cpu(fastsleep_cnt, primary_thread)) == 0)
+			opal_config_idle_state(1, 0);
+		per_cpu(fastsleep_cnt, primary_thread)++;
+		spin_unlock(&per_cpu(fastsleep_override_lock, primary_thread));
+	}
+}
+
+static unsigned long pnv_power7_sleep(void)
+{
+	int cpu, primary_thread;
+	unsigned long srr1;
+
+	cpu = smp_processor_id();
+	primary_thread = cpu_first_thread_sibling(cpu);
+
+	if (need_fastsleep_workaround) {
+		pnv_apply_fastsleep_workaround(1, primary_thread);
+		srr1 = __power7_sleep();
+		pnv_apply_fastsleep_workaround(0, primary_thread);
+	} else {
+		srr1 = __power7_sleep();
+	}
+	return srr1;
+}
+
+static void __init pnv_setup_machdep_opal(void)
+{
+	ppc_md.get_boot_time = opal_get_boot_time;
+	ppc_md.get_rtc_time = opal_get_rtc_time;
+	ppc_md.set_rtc_time = opal_set_rtc_time;
+	ppc_md.restart = pnv_restart;
+	ppc_md.power_off = pnv_power_off;
+	ppc_md.halt = pnv_halt;
+	ppc_md.machine_check_exception = opal_machine_check;
+	ppc_md.mce_check_early_recovery = opal_mce_check_early_recovery;
+	ppc_md.hmi_exception_early = opal_hmi_exception_early;
+	ppc_md.handle_hmi_exception = opal_handle_hmi_exception;
+	ppc_md.setup_idle = pnv_setup_idle;
+	ppc_md.power7_sleep = pnv_power7_sleep;
+}
+
+#ifdef CONFIG_PPC_POWERNV_RTAS
+static void __init pnv_setup_machdep_rtas(void)
+{
+	if (rtas_token("get-time-of-day") != RTAS_UNKNOWN_SERVICE) {
+		ppc_md.get_boot_time = rtas_get_boot_time;
+		ppc_md.get_rtc_time = rtas_get_rtc_time;
+		ppc_md.set_rtc_time = rtas_set_rtc_time;
+	}
+	ppc_md.restart = rtas_restart;
+	ppc_md.power_off = rtas_power_off;
+	ppc_md.halt = rtas_halt;
+}
+#endif /* CONFIG_PPC_POWERNV_RTAS */
+
 static int __init pnv_probe(void)
 {
 	unsigned long root = of_get_flat_dt_root();
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 23d2743..8ad97a9 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -18,6 +18,7 @@
 #include <asm/firmware.h>
 #include <asm/opal.h>
 #include <asm/runlatch.h>
+#include <asm/processor.h>
 
 /* Flags and constants used in PowerNV platform */
 
@@ -195,7 +196,8 @@ static int powernv_add_idle_states(void)
 			nr_idle_states++;
 		}
 
-		if (flags & IDLE_INST_SLEEP) {
+		if ((flags & IDLE_INST_SLEEP_ER1) ||
+				(flags & IDLE_INST_SLEEP)) {
 			/* Add FASTSLEEP state */
 			strcpy(powernv_states[nr_idle_states].name, "FastSleep");
 			strcpy(powernv_states[nr_idle_states].desc, "FastSleep");
@@ -247,6 +249,10 @@ static int __init powernv_processor_idle_init(void)
 
 	register_cpu_notifier(&setup_hotplug_notifier);
 	printk(KERN_DEBUG "powernv_idle_driver registered\n");
+
+	/* If any idle states require special
+	 * initializations before cpuidle kicks in */
+	arch_setup_idle();
 	return 0;
 }
 
-- 
1.9.3



* Re: [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes
  2014-10-01  7:45 [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes Shreyas B. Prabhu
                   ` (2 preceding siblings ...)
  2014-10-01  7:46 ` [PATCH v2 3/3] powerpc/powernv/cpuidle: Add workaround to enable fastsleep Shreyas B. Prabhu
@ 2014-10-01 20:46 ` Rafael J. Wysocki
  2014-10-02 16:40   ` Shreyas B Prabhu
  3 siblings, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2014-10-01 20:46 UTC
  To: Shreyas B. Prabhu
  Cc: linux-kernel, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, linux-pm, linuxppc-dev, Srivatsa S. Bhat,
	Preeti U. Murthy, Vaidyanathan Srinivasan

On Wednesday, October 01, 2014 01:15:57 PM Shreyas B. Prabhu wrote:
> Fast sleep is an idle state in which the core and the L1 and L2
> caches are brought down to a threshold voltage. This also means that
> the communication between the L2 and L3 caches has to be fenced. However,
> the current P8 chips have a bug wherein this fencing between the L2 and
> L3 caches gets delayed by a cpu cycle. This can delay the L3 response to
> other cpus if they request data during this time. They would then
> fetch the same data from memory, which could lead to data
> corruption if the L3 cache is not flushed.
> 
> This series works around the above problem in the kernel.
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Cc: linux-pm@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Srivatsa S. Bhat <srivatsa@mit.edu>
> Cc: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
> 
> v2:
> Rebased on 3.17-rc7
> Split from 'powerpc/powernv: Support for fastsleep and winkle'
> 
> v1:
> https://lkml.org/lkml/2014/8/25/446
> 
> Preeti U Murthy (1):
>   powerpc/powernv/cpuidle: Add workaround to enable fastsleep
> 
> Shreyas B. Prabhu (1):
>   powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from
>     fast-sleep
> 
> Srivatsa S. Bhat (1):
>   powerpc/powernv: Enable Offline CPUs to enter deep idle states
> 
>  arch/powerpc/include/asm/machdep.h             |   3 +
>  arch/powerpc/include/asm/opal.h                |   7 ++
>  arch/powerpc/include/asm/processor.h           |   4 +-
>  arch/powerpc/kernel/exceptions-64s.S           |  35 ++++----
>  arch/powerpc/kernel/idle.c                     |  19 ++++
>  arch/powerpc/kernel/idle_power7.S              |   2 +-
>  arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
>  arch/powerpc/platforms/powernv/powernv.h       |   7 ++
>  arch/powerpc/platforms/powernv/setup.c         | 118 +++++++++++++++++++++++++
>  arch/powerpc/platforms/powernv/smp.c           |  11 ++-
>  drivers/cpuidle/cpuidle-powernv.c              |  13 ++-
>  11 files changed, 194 insertions(+), 26 deletions(-)

[2/3] seems to be missing from the series.

Also, since that mostly modifies arch/powerpc, I think it should go through
that tree.  I'm fine with the cpuidle-powernv changes in [1/3] and [3/3].

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep
  2014-10-01  7:45 ` [PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep Shreyas B. Prabhu
@ 2014-10-02 16:39   ` Shreyas B Prabhu
  2014-10-07  5:11   ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 11+ messages in thread
From: Shreyas B Prabhu @ 2014-10-02 16:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linuxppc-dev, Preeti U Murthy, Rafael J. Wysocki, linux-pm

CCing Rafael J. Wysocki and linux-pm@vger.kernel.org

On Wednesday 01 October 2014 01:15 PM, Shreyas B. Prabhu wrote:
> When guests have to be launched, the secondary threads which are offline
> are woken up to run the guests. Today these threads wake up from nap
> and check if they have to run guests. Now that the offline secondary
> threads can go to fastsleep or going ahead a deeper idle state such as winkle,
> add this check in the wakeup from any of the deep idle states path as well.
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: linuxppc-dev@lists.ozlabs.org
> Suggested-by: "Srivatsa S. Bhat" <srivatsa@mit.edu>
> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
> [ Changelog added by <preeti@linux.vnet.ibm.com> ]
> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/exceptions-64s.S | 35 ++++++++++++++++-------------------
>  1 file changed, 16 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 050f79a..c64f3cc0 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -100,25 +100,8 @@ system_reset_pSeries:
>  	SET_SCRATCH0(r13)
>  #ifdef CONFIG_PPC_P7_NAP
>  BEGIN_FTR_SECTION
> -	/* Running native on arch 2.06 or later, check if we are
> -	 * waking up from nap. We only handle no state loss and
> -	 * supervisor state loss. We do -not- handle hypervisor
> -	 * state loss at this time.
> -	 */
> -	mfspr	r13,SPRN_SRR1
> -	rlwinm.	r13,r13,47-31,30,31
> -	beq	9f
> 
> -	/* waking up from powersave (nap) state */
> -	cmpwi	cr1,r13,2
> -	/* Total loss of HV state is fatal, we could try to use the
> -	 * PIR to locate a PACA, then use an emergency stack etc...
> -	 * OPAL v3 based powernv platforms have new idle states
> -	 * which fall in this catagory.
> -	 */
> -	bgt	cr1,8f
>  	GET_PACA(r13)
> -
>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>  	li	r0,KVM_HWTHREAD_IN_KERNEL
>  	stb	r0,HSTATE_HWTHREAD_STATE(r13)
> @@ -131,13 +114,27 @@ BEGIN_FTR_SECTION
>  1:
>  #endif
> 
> +	/* Running native on arch 2.06 or later, check if we are
> +	 * waking up from nap. We only handle no state loss and
> +	 * supervisor state loss. We do -not- handle hypervisor
> +	 * state loss at this time.
> +	 */
> +	mfspr	r13,SPRN_SRR1
> +	rlwinm.	r13,r13,47-31,30,31
> +	beq	9f
> +
> +	/* waking up from powersave (nap) state */
> +	cmpwi	cr1,r13,2
> +	GET_PACA(r13)
> +
> +	bgt	cr1,8f
> +
>  	beq	cr1,2f
>  	b	power7_wakeup_noloss
>  2:	b	power7_wakeup_loss
> 
>  	/* Fast Sleep wakeup on PowerNV */
> -8:	GET_PACA(r13)
> -	b 	power7_wakeup_tb_loss
> +8:	b 	power7_wakeup_tb_loss
> 
>  9:
>  END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes
  2014-10-01 20:46 ` [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes Rafael J. Wysocki
@ 2014-10-02 16:40   ` Shreyas B Prabhu
  0 siblings, 0 replies; 11+ messages in thread
From: Shreyas B Prabhu @ 2014-10-02 16:40 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, linux-pm, linuxppc-dev, Srivatsa S. Bhat,
	Preeti U. Murthy, Vaidyanathan Srinivasan



On Thursday 02 October 2014 02:16 AM, Rafael J. Wysocki wrote:
> On Wednesday, October 01, 2014 01:15:57 PM Shreyas B. Prabhu wrote:
>> Fast sleep is an idle state, where the core and the L1 and L2
>> caches are brought down to a threshold voltage. This also means that
>> the communication between L2 and L3 caches have to be fenced. However
>> the current P8 chips have a bug wherein this fencing between L2 and
>> L3 caches get delayed by a cpu cycle. This can delay L3 response to
>> the other cpus if they request for data during this time. Thus they
>> would fetch the same data from the memory which could lead to data
>> corruption if L3 cache is not flushed. 
>>
>> This series overcomes above problem in kernel.
>>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Paul Mackerras <paulus@samba.org>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
>> Cc: linux-pm@vger.kernel.org
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Cc: Srivatsa S. Bhat <srivatsa@mit.edu>
>> Cc: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
>> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
>>
>> v2:
>> Rebased on 3.17-rc7
>> Split from 'powerpc/powernv: Support for fastsleep and winkle'
>>
>> v1:
>> https://lkml.org/lkml/2014/8/25/446
>>
>> Preeti U Murthy (1):
>>   powerpc/powernv/cpuidle: Add workaround to enable fastsleep
>>
>> Shreyas B. Prabhu (1):
>>   powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from
>>     fast-sleep
>>
>> Srivatsa S. Bhat (1):
>>   powerpc/powernv: Enable Offline CPUs to enter deep idle states
>>
>>  arch/powerpc/include/asm/machdep.h             |   3 +
>>  arch/powerpc/include/asm/opal.h                |   7 ++
>>  arch/powerpc/include/asm/processor.h           |   4 +-
>>  arch/powerpc/kernel/exceptions-64s.S           |  35 ++++----
>>  arch/powerpc/kernel/idle.c                     |  19 ++++
>>  arch/powerpc/kernel/idle_power7.S              |   2 +-
>>  arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
>>  arch/powerpc/platforms/powernv/powernv.h       |   7 ++
>>  arch/powerpc/platforms/powernv/setup.c         | 118 +++++++++++++++++++++++++
>>  arch/powerpc/platforms/powernv/smp.c           |  11 ++-
>>  drivers/cpuidle/cpuidle-powernv.c              |  13 ++-
>>  11 files changed, 194 insertions(+), 26 deletions(-)
> 
> [2/3] seems to be missing from the series.
> 
> Also, since that mostly modifies arch/powerpc, I think it should go through
> that tree.  I'm fine with the cpuidle-powernv changes in [1/3] and [3/3].
> 
Hi Rafael, 

Thanks for looking into this. The second patch is an independent fix in the 
powerpc exception handler. To be safe I am ccing you and linux-pm list on that
patch now. 


Thanks, 
Shreyas


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/3] powerpc/powernv: Enable Offline CPUs to enter deep idle states
  2014-10-01  7:45 ` [PATCH v2 1/3] powerpc/powernv: Enable Offline CPUs to enter deep idle states Shreyas B. Prabhu
@ 2014-10-07  5:06   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2014-10-07  5:06 UTC (permalink / raw)
  To: Shreyas B. Prabhu
  Cc: linux-kernel, Srivatsa S. Bhat, Paul Mackerras, Michael Ellerman,
	Rafael J. Wysocki, linux-pm, linuxppc-dev, Preeti U. Murthy

On Wed, 2014-10-01 at 13:15 +0530, Shreyas B. Prabhu wrote:
> From: "Srivatsa S. Bhat" <srivatsa@mit.edu>
> 
> The offline cpus

Arguably "cpus" here should be "secondary threads" to make the commit
message a bit more comprehensible. A few more nits below...

> should enter deep idle states so as to gain maximum
> powersavings when the entire core is offline. To do so the offline path
> must be made aware of the available deepest idle state. Hence probe the
> device tree for the possible idle states in powernv core code and
> expose the deepest idle state through flags.
> 
> Since the  device tree is probed by the cpuidle driver as well, move
> the parameters required to discover the idle states into an appropriate
> common place to both the driver and the powernv core code.
> 
> Another point is that fastsleep idle state may require workarounds in
> the kernel to function properly. This workaround is introduced in the
> subsequent patches. However neither the cpuidle driver or the hotplug
> path need be bothered about this workaround.
> 
> They will be taken care of by the core powernv code.
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Cc: linux-pm@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Srivatsa S. Bhat <srivatsa@mit.edu>
> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
> [ Changelog modified by preeti@linux.vnet.ibm.com ]
> Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/opal.h          |  4 +++
>  arch/powerpc/platforms/powernv/powernv.h |  7 +++++
>  arch/powerpc/platforms/powernv/setup.c   | 51 ++++++++++++++++++++++++++++++++
>  arch/powerpc/platforms/powernv/smp.c     | 11 ++++++-
>  drivers/cpuidle/cpuidle-powernv.c        |  7 ++---
>  5 files changed, 75 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 86055e5..28b8342 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -772,6 +772,10 @@ extern struct kobject *opal_kobj;
>  /* /ibm,opal */
>  extern struct device_node *opal_node;
>  
> +/* Flags used for idle state discovery from the device tree */
> +#define IDLE_INST_NAP	0x00010000 /* nap instruction can be used */
> +#define IDLE_INST_SLEEP	0x00020000 /* sleep instruction can be used */

Please provide a better explanation of what this is about, maybe a
comment describing the device-tree property. Also those macros have
names too likely to collide or be confused with other uses. Use
something a bit less ambiguous such as OPAL_PM_NAP_AVAILABLE,
OPAL_PM_SLEEP_ENABLED,...

Also put that in the part of opal.h that isn't the linux internal
implementation, but instead the "API" part. This will help when we
finally split the file.

>  /* API functions */
>  int64_t opal_invalid_call(void);
>  int64_t opal_console_write(int64_t term_number, __be64 *length,
> diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h
> index 75501bf..31ece13 100644
> --- a/arch/powerpc/platforms/powernv/powernv.h
> +++ b/arch/powerpc/platforms/powernv/powernv.h
> @@ -23,6 +23,13 @@ static inline int pnv_pci_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
>  }
>  #endif
>  
> +/* Flags to indicate which of the CPU idle states are available for use */
> +
> +#define IDLE_USE_NAP		(1UL << 0)
> +#define IDLE_USE_SLEEP		(1UL << 1)

This somewhat duplicates the opal.h definitions, can't we just re-use
them ?

> +extern unsigned int pnv_get_supported_cpuidle_states(void);
> +
>  extern void pnv_lpc_init(void);
>  
>  bool cpu_core_split_required(void);
> diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
> index 5a0e2dc..2dca1d8 100644
> --- a/arch/powerpc/platforms/powernv/setup.c
> +++ b/arch/powerpc/platforms/powernv/setup.c
> @@ -282,6 +282,57 @@ static void __init pnv_setup_machdep_rtas(void)
>  }
>  #endif /* CONFIG_PPC_POWERNV_RTAS */
>  
> +static unsigned int supported_cpuidle_states;
> +
> +unsigned int pnv_get_supported_cpuidle_states(void)
> +{
> +	return supported_cpuidle_states;
> +}

Will this be used by a module ? Doesn't it need to be exported ? Also
keep the prefix pnv on the variable, I don't like globals with such a
confusing name.

> +static int __init pnv_probe_idle_states(void)
> +{
> +	struct device_node *power_mgt;
> +	struct property *prop;
> +	int dt_idle_states;
> +	u32 *flags;
> +	int i;
> +
> +	supported_cpuidle_states = 0;
> +
> +	if (cpuidle_disable != IDLE_NO_OVERRIDE)
> +		return 0;
> +
> +	if (!firmware_has_feature(FW_FEATURE_OPALv3))
> +		return 0;
> +
> +	power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
> +	if (!power_mgt) {
> +		pr_warn("opal: PowerMgmt Node not found\n");
> +		return 0;
> +	}
> +
> +	prop = of_find_property(power_mgt, "ibm,cpu-idle-state-flags", NULL);
> +	if (!prop) {
> +		pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n");
> +		return 0;
> +	}
> +
> +	dt_idle_states = prop->length / sizeof(u32);
> +	flags = (u32 *) prop->value;
> +
> +	for (i = 0; i < dt_idle_states; i++) {
> +		if (flags[i] & IDLE_INST_NAP)
> +			supported_cpuidle_states |= IDLE_USE_NAP;
> +
> +		if (flags[i] & IDLE_INST_SLEEP)
> +			supported_cpuidle_states |= IDLE_USE_SLEEP;
> +	}
> +
> +	return 0;
> +}
> +
> +subsys_initcall(pnv_probe_idle_states);
> +
>  static int __init pnv_probe(void)
>  {
>  	unsigned long root = of_get_flat_dt_root();
> diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
> index 5fcfcf4..3ad31d2 100644
> --- a/arch/powerpc/platforms/powernv/smp.c
> +++ b/arch/powerpc/platforms/powernv/smp.c
> @@ -149,6 +149,7 @@ static int pnv_smp_cpu_disable(void)
>  static void pnv_smp_cpu_kill_self(void)
>  {
>  	unsigned int cpu;
> +	unsigned long idle_states;
>  
>  	/* Standard hot unplug procedure */
>  	local_irq_disable();
> @@ -159,13 +160,21 @@ static void pnv_smp_cpu_kill_self(void)
>  	generic_set_cpu_dead(cpu);
>  	smp_wmb();
>  
> +	idle_states = pnv_get_supported_cpuidle_states();
> +
>  	/* We don't want to take decrementer interrupts while we are offline,
>  	 * so clear LPCR:PECE1. We keep PECE2 enabled.
>  	 */
>  	mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1);
>  	while (!generic_check_cpu_restart(cpu)) {
>  		ppc64_runlatch_off();
> -		power7_nap(1);
> +
> +		/* If sleep is supported, go to sleep, instead of nap */
> +		if (idle_states & IDLE_USE_SLEEP)
> +			power7_sleep();
> +		else
> +			power7_nap(1);
> +
>  		ppc64_runlatch_on();
>  
>  		/* Reenable IRQs briefly to clear the IPI that woke us */
> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> index a64be57..23d2743 100644
> --- a/drivers/cpuidle/cpuidle-powernv.c
> +++ b/drivers/cpuidle/cpuidle-powernv.c
> @@ -16,13 +16,12 @@
>  
>  #include <asm/machdep.h>
>  #include <asm/firmware.h>
> +#include <asm/opal.h>
>  #include <asm/runlatch.h>
>  
>  /* Flags and constants used in PowerNV platform */
>  
>  #define MAX_POWERNV_IDLE_STATES	8
> -#define IDLE_USE_INST_NAP	0x00010000 /* Use nap instruction */
> -#define IDLE_USE_INST_SLEEP	0x00020000 /* Use sleep instruction */
>  
>  struct cpuidle_driver powernv_idle_driver = {
>  	.name             = "powernv_idle",
> @@ -185,7 +184,7 @@ static int powernv_add_idle_states(void)
>  	for (i = 0; i < dt_idle_states; i++) {
>  
>  		flags = be32_to_cpu(idle_state_flags[i]);
> -		if (flags & IDLE_USE_INST_NAP) {
> +		if (flags & IDLE_INST_NAP) {
>  			/* Add NAP state */
>  			strcpy(powernv_states[nr_idle_states].name, "Nap");
>  			strcpy(powernv_states[nr_idle_states].desc, "Nap");
> @@ -196,7 +195,7 @@ static int powernv_add_idle_states(void)
>  			nr_idle_states++;
>  		}
>  
> -		if (flags & IDLE_USE_INST_SLEEP) {
> +		if (flags & IDLE_INST_SLEEP) {
>  			/* Add FASTSLEEP state */
>  			strcpy(powernv_states[nr_idle_states].name, "FastSleep");
>  			strcpy(powernv_states[nr_idle_states].desc, "FastSleep");



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep
  2014-10-01  7:45 ` [PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep Shreyas B. Prabhu
  2014-10-02 16:39   ` Shreyas B Prabhu
@ 2014-10-07  5:11   ` Benjamin Herrenschmidt
  2014-10-09 10:03     ` Preeti U Murthy
  1 sibling, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2014-10-07  5:11 UTC (permalink / raw)
  To: Shreyas B. Prabhu
  Cc: linux-kernel, Paul Mackerras, Michael Ellerman, linuxppc-dev,
	Preeti U Murthy

On Wed, 2014-10-01 at 13:15 +0530, Shreyas B. Prabhu wrote:
> When guests have to be launched, the secondary threads which are offline
> are woken up to run the guests. Today these threads wake up from nap
> and check if they have to run guests. Now that the offline secondary
> threads can go to fastsleep or going ahead a deeper idle state such as winkle,
> add this check in the wakeup from any of the deep idle states path as well.
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: linuxppc-dev@lists.ozlabs.org
> Suggested-by: "Srivatsa S. Bhat" <srivatsa@mit.edu>
> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
> [ Changelog added by <preeti@linux.vnet.ibm.com> ]
> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/exceptions-64s.S | 35 ++++++++++++++++-------------------
>  1 file changed, 16 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 050f79a..c64f3cc0 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -100,25 +100,8 @@ system_reset_pSeries:
>  	SET_SCRATCH0(r13)
>  #ifdef CONFIG_PPC_P7_NAP
>  BEGIN_FTR_SECTION
> -	/* Running native on arch 2.06 or later, check if we are
> -	 * waking up from nap. We only handle no state loss and
> -	 * supervisor state loss. We do -not- handle hypervisor
> -	 * state loss at this time.
> -	 */
> -	mfspr	r13,SPRN_SRR1
> -	rlwinm.	r13,r13,47-31,30,31
> -	beq	9f
>  
> -	/* waking up from powersave (nap) state */
> -	cmpwi	cr1,r13,2
> -	/* Total loss of HV state is fatal, we could try to use the
> -	 * PIR to locate a PACA, then use an emergency stack etc...
> -	 * OPAL v3 based powernv platforms have new idle states
> -	 * which fall in this catagory.
> -	 */
> -	bgt	cr1,8f
>  	GET_PACA(r13)
> -
>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>  	li	r0,KVM_HWTHREAD_IN_KERNEL
>  	stb	r0,HSTATE_HWTHREAD_STATE(r13)
> @@ -131,13 +114,27 @@ BEGIN_FTR_SECTION
>  1:
>  #endif

So you moved the state loss check to after the KVM check ? Was this
reviewed by Paul ? Is that ok ? (Does this match what we have in
PowerKVM ?). Is it possible that we end up calling kvm_start_guest
after a HV state loss or do we know for sure that this won't happen
for one reason or another ? If that's the case, then that reason needs
to be clearly documented here in a comment.
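
For anyone following the assembly, the dispatch being reordered here can
be sketched in C. This is only an illustration and not kernel code: the
enum and function names are made up for the example, and the meaning
given to each value is inferred from the comments and branch targets in
the quoted hunks.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum wake_path {
	WAKE_NOT_POWERSAVE,	/* field == 0: "beq 9f" */
	WAKE_NOLOSS,		/* field == 1: power7_wakeup_noloss */
	WAKE_LOSS,		/* field == 2: power7_wakeup_loss */
	WAKE_TB_LOSS		/* field == 3: power7_wakeup_tb_loss */
};

/*
 * Mirror of "rlwinm. r13,r13,47-31,30,31": rotate the low word of SRR1
 * left by 16 and keep the two least significant bits, i.e. SRR1 bits
 * 46:47 in IBM (MSB = 0) numbering.
 */
static unsigned int srr1_wake_state(uint64_t srr1)
{
	return (unsigned int)((srr1 >> 16) & 0x3);
}

/* Same decision order as the assembly after the patch. */
static enum wake_path classify_wakeup(uint64_t srr1)
{
	unsigned int state = srr1_wake_state(srr1);

	if (state == 0)
		return WAKE_NOT_POWERSAVE;
	if (state > 2)			/* bgt cr1,8f */
		return WAKE_TB_LOSS;
	if (state == 2)			/* beq cr1,2f */
		return WAKE_LOSS;
	return WAKE_NOLOSS;
}
```

With the patch applied, the KVM check runs before this classification
for all three powersave cases, which is why the question above matters
for the WAKE_TB_LOSS path.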
 
> +	/* Running native on arch 2.06 or later, check if we are
> +	 * waking up from nap. We only handle no state loss and
> +	 * supervisor state loss. We do -not- handle hypervisor
> +	 * state loss at this time.
> +	 */
> +	mfspr	r13,SPRN_SRR1
> +	rlwinm.	r13,r13,47-31,30,31
> +	beq	9f
> +
> +	/* waking up from powersave (nap) state */
> +	cmpwi	cr1,r13,2
> +	GET_PACA(r13)
> +
> +	bgt	cr1,8f
> +
>  	beq	cr1,2f
>  	b	power7_wakeup_noloss
>  2:	b	power7_wakeup_loss
>  
>  	/* Fast Sleep wakeup on PowerNV */
> -8:	GET_PACA(r13)
> -	b 	power7_wakeup_tb_loss
> +8:	b 	power7_wakeup_tb_loss
>  
>  9:
>  END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/3] powerpc/powernv/cpuidle: Add workaround to enable fastsleep
  2014-10-01  7:46 ` [PATCH v2 3/3] powerpc/powernv/cpuidle: Add workaround to enable fastsleep Shreyas B. Prabhu
@ 2014-10-07  5:20   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2014-10-07  5:20 UTC (permalink / raw)
  To: Shreyas B. Prabhu
  Cc: linux-kernel, Preeti U Murthy, linux-pm, linuxppc-dev,
	Paul Mackerras, Michael Ellerman, Rafael J. Wysocki

On Wed, 2014-10-01 at 13:16 +0530, Shreyas B. Prabhu wrote:
> From: Preeti U Murthy <preeti@linux.vnet.ibm.com>
> 
> Fast sleep is an idle state, where the core and the L1 and L2
> caches are brought down to a threshold voltage. This also means that
> the communication between L2 and L3 caches have to be fenced. However
> the current P8 chips have a bug wherein this fencing between L2 and
> L3 caches get delayed by a cpu cycle. This can delay L3 response to
> the other cpus if they request for data during this time. Thus they
> would fetch the same data from the memory which could lead to data
> corruption if L3 cache is not flushed.
> 
> The cpu idle states save power at a core level and not at a thread level.
> Hence powersavings is based on the shallowest idle state that a thread
> of a core is in. The above issue in fastsleep will arise only when
> all the threads in a core either enter fastsleep or some of them enter
> any deeper idle states, with only a few being in fastsleep. This patch
> therefore implements a workaround this bug  by ensuring
> that, each time a cpu goes to fastsleep, it checks if it is the last
> thread in the core to enter fastsleep. If so, it needs to make an opal
> call to get around the above mentioned fastsleep problem in the hardware
> before issuing the sleep instruction.
> 
> Similarly when a thread in a core comes out of fastsleep, it needs
> to verify if its the first thread in the core to come out of fastsleep
> and issue the opal call to revert the changes made while entering
> fastsleep.
> 
> For the same reason mentioned above we need to take care of offline threads
> as well since we allow them to enter fastsleep and with support for
> deep winkle soon coming in they can enter winkle as well.  We therefore
> ensure that even offline threads make the above mentioned opal calls
> similarly, so that as long as the threads in a core are in and
> idle state >= fastsleep, we have the workaround in place. Whenever a
> thread comes out of either of these states, it needs to verify if the
> opal call has been made and if so it will revert it. For now this patch
> ensures that offline threads enter fastsleep.
> 
> We need to be able to synchronize the cpus in a core which are entering
> and exiting fastsleep so as to ensure that the last thread in the core
> to enter fastsleep and the first to exit fastsleep *only* issue the opal
> call. To do so, we need a per-core lock and counter. The counter is
> required to keep track of the number of threads in a core which are in
> idle state >= fastsleep. To make the implementation of this simple, we
> introduce a per-cpu lock and counter and every thread always takes the
> primary thread's lock, modifies the primary thread's counter. This
> effectively makes them per-core entities.
> 
> But the workaround is abstracted in the powernv core code and neither
> the hotplug path nor the cpuidle driver need to bother about it. All
> they need to know is if fastsleep, with error or no error is present as
> an idle state.
> 
> Cc: linux-pm@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/machdep.h             |   3 +
>  arch/powerpc/include/asm/opal.h                |   3 +
>  arch/powerpc/include/asm/processor.h           |   4 +-
>  arch/powerpc/kernel/idle.c                     |  19 ++++
>  arch/powerpc/kernel/idle_power7.S              |   2 +-
>  arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
>  arch/powerpc/platforms/powernv/setup.c         | 139 ++++++++++++++++++-------
>  drivers/cpuidle/cpuidle-powernv.c              |   8 +-
>  8 files changed, 140 insertions(+), 39 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
> index b125cea..f37014f 100644
> --- a/arch/powerpc/include/asm/machdep.h
> +++ b/arch/powerpc/include/asm/machdep.h
> @@ -298,6 +298,9 @@ struct machdep_calls {
>  #ifdef CONFIG_MEMORY_HOTREMOVE
>  	int (*remove_memory)(u64, u64);
>  #endif
> +	/* Idle handlers */
> +	void		(*setup_idle)(void);
> +	unsigned long	(*power7_sleep)(void);
>  };

Do we need that ppc_md hook ? Since we are going to use idle states in
the CPU unplug loop, I would think we should do the necessary
initializations at boot time regardless of whether the cpuidle driver
is loaded or not, so we probably don't need this, or am I missing
something ?

>  extern void e500_idle(void);
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 28b8342..166d572 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -149,6 +149,7 @@ struct opal_sg_list {
>  #define OPAL_DUMP_INFO2				94
>  #define OPAL_PCI_EEH_FREEZE_SET			97
>  #define OPAL_HANDLE_HMI				98
> +#define OPAL_CONFIG_IDLE_STATE			99
>  #define OPAL_REGISTER_DUMP_REGION		101
>  #define OPAL_UNREGISTER_DUMP_REGION		102
>  
> @@ -775,6 +776,7 @@ extern struct device_node *opal_node;
>  /* Flags used for idle state discovery from the device tree */
>  #define IDLE_INST_NAP	0x00010000 /* nap instruction can be used */
>  #define IDLE_INST_SLEEP	0x00020000 /* sleep instruction can be used */
> +#define IDLE_INST_SLEEP_ER1	0x00080000 /* Use sleep with work around*/

Usual comment about names.
 
>  /* API functions */
>  int64_t opal_invalid_call(void);
> @@ -975,6 +977,7 @@ extern int opal_handle_hmi_exception(struct pt_regs *regs);
>  
>  extern void opal_shutdown(void);
>  extern int opal_resync_timebase(void);
> +int64_t opal_config_idle_state(uint64_t state, uint64_t enter);
>  
>  extern void opal_lpc_init(void);
>  
> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
> index dda7ac4..41953cd 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -451,8 +451,10 @@ extern unsigned long cpuidle_disable;
>  enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
>  
>  extern int powersave_nap;	/* set if nap mode can be used in idle loop */
> +extern void arch_setup_idle(void);
>  extern void power7_nap(int check_irq);
> -extern void power7_sleep(void);
> +extern unsigned long power7_sleep(void);
> +extern unsigned long __power7_sleep(void);
>  extern void flush_instruction_cache(void);
>  extern void hard_reset_now(void);
>  extern void poweroff_now(void);
> diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
> index d7216c9..1f268e0 100644
> --- a/arch/powerpc/kernel/idle.c
> +++ b/arch/powerpc/kernel/idle.c
> @@ -32,6 +32,9 @@
>  #include <asm/machdep.h>
>  #include <asm/runlatch.h>
>  #include <asm/smp.h>
> +#include <asm/cputhreads.h>
> +#include <asm/firmware.h>
> +#include <asm/opal.h>
>  
> 
>  unsigned long cpuidle_disable = IDLE_NO_OVERRIDE;
> @@ -78,6 +81,22 @@ void arch_cpu_idle(void)
>  	HMT_medium();
>  	ppc64_runlatch_on();
>  }
> +void arch_setup_idle(void)
> +{
> +	if (ppc_md.setup_idle)
> +		ppc_md.setup_idle();
> +}

See comment about hook

> +unsigned long power7_sleep(void)
> +{
> +	unsigned long ret;
> +
> +	if (ppc_md.power7_sleep)
> +		ret = ppc_md.power7_sleep();
> +	else
> +		ret = __power7_sleep();
> +	return ret;
> +}

This is in the wrong place. In fact, it should probably be power8 and
not power7 and it should be somewhere in powernv, not in generic code.

We don't need the ppc_md. hook, we can just check if the workaround
is needed from the ppc_md code.
 
>  int powersave_nap;
>  
> diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
> index be05841..c3481c9 100644
> --- a/arch/powerpc/kernel/idle_power7.S
> +++ b/arch/powerpc/kernel/idle_power7.S
> @@ -129,7 +129,7 @@ _GLOBAL(power7_nap)
>  	b	power7_powersave_common
>  	/* No return */
>  
> -_GLOBAL(power7_sleep)
> +_GLOBAL(__power7_sleep)
>  	li	r3,1
>  	li	r4,1
>  	b	power7_powersave_common
> diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
> index 2e6ce1b..8d1e724 100644
> --- a/arch/powerpc/platforms/powernv/opal-wrappers.S
> +++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
> @@ -245,5 +245,6 @@ OPAL_CALL(opal_sensor_read,			OPAL_SENSOR_READ);
>  OPAL_CALL(opal_get_param,			OPAL_GET_PARAM);
>  OPAL_CALL(opal_set_param,			OPAL_SET_PARAM);
>  OPAL_CALL(opal_handle_hmi,			OPAL_HANDLE_HMI);
> +OPAL_CALL(opal_config_idle_state,		OPAL_CONFIG_IDLE_STATE);
>  OPAL_CALL(opal_register_dump_region,		OPAL_REGISTER_DUMP_REGION);
>  OPAL_CALL(opal_unregister_dump_region,		OPAL_UNREGISTER_DUMP_REGION);
> diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
> index 2dca1d8..9d9a898 100644
> --- a/arch/powerpc/platforms/powernv/setup.c
> +++ b/arch/powerpc/platforms/powernv/setup.c
> @@ -36,9 +36,20 @@
>  #include <asm/opal.h>
>  #include <asm/kexec.h>
>  #include <asm/smp.h>
> +#include <asm/cputhreads.h>
>  
>  #include "powernv.h"
>  
> +/* Per-cpu structures to keep track of cpus of a core that
> + * are in idle states >= fastsleep so as to call opal for
> + * sleep setup when the entire core is ready to go to fastsleep.
> + *
> + * We need sometihng similar to a per-core lock. For now we
> + * achieve this by taking the lock of the primary thread in the core.
> + */
> +static DEFINE_PER_CPU(spinlock_t, fastsleep_override_lock);
> +static DEFINE_PER_CPU(int, fastsleep_cnt);
> +
>  static void __init pnv_setup_arch(void)
>  {
>  	set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
> @@ -254,35 +265,8 @@ static unsigned long pnv_memory_block_size(void)
>  }
>  #endif
>  
> -static void __init pnv_setup_machdep_opal(void)
> -{
> -	ppc_md.get_boot_time = opal_get_boot_time;
> -	ppc_md.get_rtc_time = opal_get_rtc_time;
> -	ppc_md.set_rtc_time = opal_set_rtc_time;
> -	ppc_md.restart = pnv_restart;
> -	ppc_md.power_off = pnv_power_off;
> -	ppc_md.halt = pnv_halt;
> -	ppc_md.machine_check_exception = opal_machine_check;
> -	ppc_md.mce_check_early_recovery = opal_mce_check_early_recovery;
> -	ppc_md.hmi_exception_early = opal_hmi_exception_early;
> -	ppc_md.handle_hmi_exception = opal_handle_hmi_exception;
> -}
> -
> -#ifdef CONFIG_PPC_POWERNV_RTAS
> -static void __init pnv_setup_machdep_rtas(void)
> -{
> -	if (rtas_token("get-time-of-day") != RTAS_UNKNOWN_SERVICE) {
> -		ppc_md.get_boot_time = rtas_get_boot_time;
> -		ppc_md.get_rtc_time = rtas_get_rtc_time;
> -		ppc_md.set_rtc_time = rtas_set_rtc_time;
> -	}
> -	ppc_md.restart = rtas_restart;
> -	ppc_md.power_off = rtas_power_off;
> -	ppc_md.halt = rtas_halt;
> -}
> -#endif /* CONFIG_PPC_POWERNV_RTAS */

That patch is ugly because of moving the above. You should instead
change the previous patch to put pnv_probe_idle_states above the above
two functions so that this patch has a lot less impact and is more
reviewable.

>  static unsigned int supported_cpuidle_states;
> +static int need_fastsleep_workaround;

bool ?
 
>  unsigned int pnv_get_supported_cpuidle_states(void)
>  {
> @@ -292,12 +276,13 @@ unsigned int pnv_get_supported_cpuidle_states(void)
>  static int __init pnv_probe_idle_states(void)
>  {
>  	struct device_node *power_mgt;
> -	struct property *prop;
>  	int dt_idle_states;
> -	u32 *flags;

Fix the previous patch to use __be32? I.e. the previous patch is
endian-broken and this one fixes it. Don't do that.
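
For reference, here is a minimal userspace sketch of the endian-safe
pattern the hunk below moves to: each 32-bit cell of the property is
decoded from big-endian before any flag is tested. It is not the
kernel code. The ex_ names and the flag values are made up for
illustration, and ex_be32_to_cpu() only stands in for the kernel's
be32_to_cpu().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only (not the kernel's values). */
#define EX_IDLE_INST_NAP	0x00010000u
#define EX_IDLE_INST_SLEEP	0x00020000u

#define EX_USE_NAP	0x1u
#define EX_USE_SLEEP	0x2u

/* Decode one big-endian 32-bit cell, independent of host endianness. */
static uint32_t ex_be32_to_cpu(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Walk a raw property of len bytes and collect the supported states. */
static unsigned int ex_probe_flags(const unsigned char *prop, size_t len)
{
	unsigned int supported = 0;
	size_t i, n = len / 4;	/* number of complete 32-bit cells */

	assert(prop != NULL);

	for (i = 0; i < n; i++) {
		uint32_t flags = ex_be32_to_cpu(prop + 4 * i);

		if (flags & EX_IDLE_INST_NAP)
			supported |= EX_USE_NAP;
		if (flags & EX_IDLE_INST_SLEEP)
			supported |= EX_USE_SLEEP;
	}
	return supported;
}
```

Casting the raw property to u32 * and testing the cells directly only
works on a big-endian host; on a little-endian host the flag bits land
in the wrong bytes.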

> +	const __be32 *idle_state_flags;
> +	u32 len_flags, flags;
>  	int i;
>  
>  	supported_cpuidle_states = 0;
> +	need_fastsleep_workaround = 0;
>  
>  	if (cpuidle_disable != IDLE_NO_OVERRIDE)
>  		return 0;
> @@ -311,21 +296,28 @@ static int __init pnv_probe_idle_states(void)
>  		return 0;
>  	}
>  
> -	prop = of_find_property(power_mgt, "ibm,cpu-idle-state-flags", NULL);
> -	if (!prop) {
> +	idle_state_flags = of_get_property(power_mgt,
> +			"ibm,cpu-idle-state-flags", &len_flags);
> +	if (!idle_state_flags) {
>  		pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n");
>  		return 0;
>  	}
>  
> -	dt_idle_states = prop->length / sizeof(u32);
> -	flags = (u32 *) prop->value;
> +	dt_idle_states = len_flags / sizeof(u32);
>  
>  	for (i = 0; i < dt_idle_states; i++) {
> -		if (flags[i] & IDLE_INST_NAP)
> +
> +		flags = be32_to_cpu(idle_state_flags[i]);
> +		if (flags & IDLE_INST_NAP)
>  			supported_cpuidle_states |= IDLE_USE_NAP;
>  
> -		if (flags[i] & IDLE_INST_SLEEP)
> +		if (flags & IDLE_INST_SLEEP)
>  			supported_cpuidle_states |= IDLE_USE_SLEEP;
> +
> +		if (flags & IDLE_INST_SLEEP_ER1) {
> +			supported_cpuidle_states |= IDLE_USE_SLEEP;
> +			need_fastsleep_workaround = 1;
> +		}
>  	}
>  
>  	return 0;
> @@ -333,6 +325,81 @@ static int __init pnv_probe_idle_states(void)
>  
>  subsys_initcall(pnv_probe_idle_states);
>  
> +static void pnv_setup_idle(void)
> +{
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu) {
> +		spin_lock_init(&per_cpu(fastsleep_override_lock, cpu));
> +		per_cpu(fastsleep_cnt, cpu) = threads_per_core;
> +	}
> +}

That can be done from pnv_probe_idle_states, no?

That locking construct and counter are really per-core, not per-CPU,
and should probably be documented somewhere. To be safe, we should
probably also initialize the secondary threads' counters to some crazy
value like -1 and BUG_ON() if we ever see that value where the counter
is used.
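
A minimal userspace sketch of that suggestion, under stated
assumptions: the counter array, the ex_ names and the sizes are
hypothetical, threads_per_core is assumed to be a power of two, and
assert() stands in for BUG_ON(). Only the primary thread's slot holds
a live counter; every secondary slot is poisoned with -1.

```c
#include <assert.h>

#define EX_NR_CPUS		16
#define EX_THREADS_PER_CORE	8	/* assumed to be a power of two */
#define EX_CNT_INVALID		(-1)

/* One slot per CPU; only the primary thread's slot is ever used. */
static int ex_fastsleep_cnt[EX_NR_CPUS];

/* Index of the first thread of the core that cpu belongs to. */
static int ex_first_thread_sibling(int cpu)
{
	return cpu & ~(EX_THREADS_PER_CORE - 1);
}

/* Give primaries a live counter and poison every secondary slot. */
static void ex_init_counters(void)
{
	int cpu;

	for (cpu = 0; cpu < EX_NR_CPUS; cpu++) {
		if (cpu == ex_first_thread_sibling(cpu))
			ex_fastsleep_cnt[cpu] = EX_THREADS_PER_CORE;
		else
			ex_fastsleep_cnt[cpu] = EX_CNT_INVALID;
	}
}

/* Return the core's counter; trap if we ever land on a poisoned slot. */
static int *ex_core_counter(int cpu)
{
	int *cnt = &ex_fastsleep_cnt[ex_first_thread_sibling(cpu)];

	assert(*cnt != EX_CNT_INVALID);	/* stands in for BUG_ON() */
	return cnt;
}
```

With the poison value in place, any path that indexes the counter by
the current CPU rather than by the primary thread fails loudly instead
of silently corrupting the count.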

> +static void
> +pnv_apply_fastsleep_workaround(bool enter_fastsleep, int primary_thread)
> +{
> +	if (enter_fastsleep) {
> +		spin_lock(&per_cpu(fastsleep_override_lock, primary_thread));
> +		if (--(per_cpu(fastsleep_cnt, primary_thread)) == 0)
> +			opal_config_idle_state(1, 1);
> +		spin_unlock(&per_cpu(fastsleep_override_lock, primary_thread));
> +	} else {
> +		spin_lock(&per_cpu(fastsleep_override_lock, primary_thread));
> +		if ((per_cpu(fastsleep_cnt, primary_thread)) == 0)
> +			opal_config_idle_state(1, 0);
> +		per_cpu(fastsleep_cnt, primary_thread)++;
> +		spin_unlock(&per_cpu(fastsleep_override_lock, primary_thread));
> +	}

Make two separate functions, one to enter and one to exit, i.e.

pnv_enter_sleep_workaround();

vs.

pnv_exit_sleep_workaround();
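
A rough userspace sketch of what that split could look like, keeping
the same counting logic as the quoted hunk. The struct, the ex_ names
and the firmware stub are hypothetical. Locking is left out here; in
the patch both paths run under the primary thread's spinlock.

```c
#include <assert.h>

/* Hypothetical per-core state; the patch keeps this in per-cpu variables. */
struct ex_core {
	int awake;		/* threads of the core not in fastsleep */
	int workaround_on;	/* last setting handed to firmware */
	int fw_calls;		/* number of firmware calls made */
};

/* Stub standing in for the firmware call that toggles the workaround. */
static void ex_config_idle_state(struct ex_core *core, int enable)
{
	core->workaround_on = enable;
	core->fw_calls++;
}

/* The last thread of the core to go down applies the workaround. */
static void ex_enter_sleep_workaround(struct ex_core *core)
{
	/* caller is assumed to hold the per-core lock */
	if (--core->awake == 0)
		ex_config_idle_state(core, 1);
}

/* The first thread of the core to come back undoes the workaround. */
static void ex_exit_sleep_workaround(struct ex_core *core)
{
	/* caller is assumed to hold the per-core lock */
	if (core->awake == 0)
		ex_config_idle_state(core, 0);
	core->awake++;
}
```

Each function then has a single job, and the call site no longer needs
a boolean argument to say which direction it means.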

> +
> +static unsigned long pnv_power7_sleep(void)
> +{
> +	int cpu, primary_thread;
> +	unsigned long srr1;
> +
> +	cpu = smp_processor_id();
> +	primary_thread = cpu_first_thread_sibling(cpu);
> +
> +	if (need_fastsleep_workaround) {
> +		pnv_apply_fastsleep_workaround(1, primary_thread);
> +		srr1 = __power7_sleep();
> +		pnv_apply_fastsleep_workaround(0, primary_thread);
> +	} else {
> +		srr1 = __power7_sleep();
> +	}
> +	return srr1;
> +}

Just call this pnv_sleep(), and then you can drop the __ prefix from
__power7_sleep.

> +static void __init pnv_setup_machdep_opal(void)
> +{
> +	ppc_md.get_boot_time = opal_get_boot_time;
> +	ppc_md.get_rtc_time = opal_get_rtc_time;
> +	ppc_md.set_rtc_time = opal_set_rtc_time;
> +	ppc_md.restart = pnv_restart;
> +	ppc_md.power_off = pnv_power_off;
> +	ppc_md.halt = pnv_halt;
> +	ppc_md.machine_check_exception = opal_machine_check;
> +	ppc_md.mce_check_early_recovery = opal_mce_check_early_recovery;
> +	ppc_md.hmi_exception_early = opal_hmi_exception_early;
> +	ppc_md.handle_hmi_exception = opal_handle_hmi_exception;
> +	ppc_md.setup_idle = pnv_setup_idle;
> +	ppc_md.power7_sleep = pnv_power7_sleep;
> +}

Just export pnv_sleep() and call it from the pnv cpuidle driver
directly.

> +#ifdef CONFIG_PPC_POWERNV_RTAS
> +static void __init pnv_setup_machdep_rtas(void)
> +{
> +	if (rtas_token("get-time-of-day") != RTAS_UNKNOWN_SERVICE) {
> +		ppc_md.get_boot_time = rtas_get_boot_time;
> +		ppc_md.get_rtc_time = rtas_get_rtc_time;
> +		ppc_md.set_rtc_time = rtas_set_rtc_time;
> +	}
> +	ppc_md.restart = rtas_restart;
> +	ppc_md.power_off = rtas_power_off;
> +	ppc_md.halt = rtas_halt;
> +}
> +#endif /* CONFIG_PPC_POWERNV_RTAS */
> +
>  static int __init pnv_probe(void)
>  {
>  	unsigned long root = of_get_flat_dt_root();
> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> index 23d2743..8ad97a9 100644
> --- a/drivers/cpuidle/cpuidle-powernv.c
> +++ b/drivers/cpuidle/cpuidle-powernv.c
> @@ -18,6 +18,7 @@
>  #include <asm/firmware.h>
>  #include <asm/opal.h>
>  #include <asm/runlatch.h>
> +#include <asm/processor.h>
>  
>  /* Flags and constants used in PowerNV platform */
>  
> @@ -195,7 +196,8 @@ static int powernv_add_idle_states(void)
>  			nr_idle_states++;
>  		}
>  
> -		if (flags & IDLE_INST_SLEEP) {
> +		if ((flags & IDLE_INST_SLEEP_ER1) ||
> +				(flags & IDLE_INST_SLEEP)) {
>  			/* Add FASTSLEEP state */
>  			strcpy(powernv_states[nr_idle_states].name, "FastSleep");
>  			strcpy(powernv_states[nr_idle_states].desc, "FastSleep");
> @@ -247,6 +249,10 @@ static int __init powernv_processor_idle_init(void)
>  
>  	register_cpu_notifier(&setup_hotplug_notifier);
>  	printk(KERN_DEBUG "powernv_idle_driver registered\n");
> +
> +	/* If any idle states require special
> +	 * initializations before cpuidle kicks in */
> +	arch_setup_idle();
>  	return 0;
>  }
>  




* Re: [PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep
  2014-10-07  5:11   ` Benjamin Herrenschmidt
@ 2014-10-09 10:03     ` Preeti U Murthy
  0 siblings, 0 replies; 11+ messages in thread
From: Preeti U Murthy @ 2014-10-09 10:03 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Shreyas B. Prabhu
  Cc: linux-kernel, Paul Mackerras, Michael Ellerman, linuxppc-dev

On 10/07/2014 10:41 AM, Benjamin Herrenschmidt wrote:
> On Wed, 2014-10-01 at 13:15 +0530, Shreyas B. Prabhu wrote:
>>
>> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
>> index 050f79a..c64f3cc0 100644
>> --- a/arch/powerpc/kernel/exceptions-64s.S
>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>> @@ -100,25 +100,8 @@ system_reset_pSeries:
>>  	SET_SCRATCH0(r13)
>>  #ifdef CONFIG_PPC_P7_NAP
>>  BEGIN_FTR_SECTION
>> -	/* Running native on arch 2.06 or later, check if we are
>> -	 * waking up from nap. We only handle no state loss and
>> -	 * supervisor state loss. We do -not- handle hypervisor
>> -	 * state loss at this time.
>> -	 */
>> -	mfspr	r13,SPRN_SRR1
>> -	rlwinm.	r13,r13,47-31,30,31
>> -	beq	9f
>>  
>> -	/* waking up from powersave (nap) state */
>> -	cmpwi	cr1,r13,2
>> -	/* Total loss of HV state is fatal, we could try to use the
>> -	 * PIR to locate a PACA, then use an emergency stack etc...
>> -	 * OPAL v3 based powernv platforms have new idle states
>> -	 * which fall in this catagory.
>> -	 */
>> -	bgt	cr1,8f
>>  	GET_PACA(r13)
>> -
>>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>>  	li	r0,KVM_HWTHREAD_IN_KERNEL
>>  	stb	r0,HSTATE_HWTHREAD_STATE(r13)
>> @@ -131,13 +114,27 @@ BEGIN_FTR_SECTION
>>  1:
>>  #endif
> 
> So you moved the state loss check to after the KVM check ? Was this
> reviewed by Paul ? Is that ok ? (Does this match what we have in
> PowerKVM ?). Is it possible that we end up calling kvm_start_guest
> after a HV state loss or do we know for sure that this won't happen
> for a reason or another ? If that's the case, then that reason needs
> to be clearly documented here in a comment.

This won't happen. The first thread in the core to come out of an idle
state with state loss does not enter KVM, because
HSTATE_HWTHREAD_STATE is not yet set. It continues on to restore the
lost state.

This thread then sets HSTATE_HWTHREAD_STATE and wakes up the remaining
threads in the core. These sibling threads enter KVM directly and do
not need to restore the lost state, since the first thread has already
restored it. So we are safe. We will certainly add a comment there.

Thanks

Regards
Preeti U Murthy



end of thread, other threads:[~2014-10-09 10:03 UTC | newest]

Thread overview: 11+ messages
2014-10-01  7:45 [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes Shreyas B. Prabhu
2014-10-01  7:45 ` [PATCH v2 1/3] powerpc/powernv: Enable Offline CPUs to enter deep idle states Shreyas B. Prabhu
2014-10-07  5:06   ` Benjamin Herrenschmidt
2014-10-01  7:45 ` [PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep Shreyas B. Prabhu
2014-10-02 16:39   ` Shreyas B Prabhu
2014-10-07  5:11   ` Benjamin Herrenschmidt
2014-10-09 10:03     ` Preeti U Murthy
2014-10-01  7:46 ` [PATCH v2 3/3] powerpc/powernv/cpuidle: Add workaround to enable fastsleep Shreyas B. Prabhu
2014-10-07  5:20   ` Benjamin Herrenschmidt
2014-10-01 20:46 ` [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes Rafael J. Wysocki
2014-10-02 16:40   ` Shreyas B Prabhu
