* [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes
@ 2014-10-01  7:45 Shreyas B. Prabhu
  2014-10-01  7:45 ` [PATCH v2 1/3] powerpc/powernv: Enable Offline CPUs to enter deep idle states Shreyas B. Prabhu
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Shreyas B. Prabhu @ 2014-10-01  7:45 UTC
To: linux-kernel
Cc: Srivatsa S. Bhat, linux-pm, Shreyas B. Prabhu, Rafael J. Wysocki, Paul Mackerras, Preeti U. Murthy, linuxppc-dev

Fast sleep is an idle state in which the core and the L1 and L2 caches are
brought down to a threshold voltage. This also means that the communication
between the L2 and L3 caches has to be fenced. However, the current P8 chips
have a bug wherein this fencing between the L2 and L3 caches gets delayed by
a cpu cycle. This can delay the L3 response to the other cpus if they request
data during this time. They would then fetch the same data from memory, which
could lead to data corruption if the L3 cache is not flushed.

This series works around the above problem in the kernel.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Srivatsa S. Bhat <srivatsa@mit.edu>
Cc: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

v2:
Rebased on 3.17-rc7
Split from 'powerpc/powernv: Support for fastsleep and winkle'

v1:
https://lkml.org/lkml/2014/8/25/446

Preeti U Murthy (1):
  powerpc/powernv/cpuidle: Add workaround to enable fastsleep

Shreyas B. Prabhu (1):
  powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep

Srivatsa S. Bhat (1):
  powerpc/powernv: Enable Offline CPUs to enter deep idle states

 arch/powerpc/include/asm/machdep.h             |   3 +
 arch/powerpc/include/asm/opal.h                |   7 ++
 arch/powerpc/include/asm/processor.h           |   4 +-
 arch/powerpc/kernel/exceptions-64s.S           |  35 ++++----
 arch/powerpc/kernel/idle.c                     |  19 ++++
 arch/powerpc/kernel/idle_power7.S              |   2 +-
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/powernv.h       |   7 ++
 arch/powerpc/platforms/powernv/setup.c         | 118 +++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/smp.c           |  11 ++-
 drivers/cpuidle/cpuidle-powernv.c              |  13 ++-
 11 files changed, 194 insertions(+), 26 deletions(-)

-- 
1.9.3
* [PATCH v2 1/3] powerpc/powernv: Enable Offline CPUs to enter deep idle states
  2014-10-01  7:45 [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes Shreyas B. Prabhu
@ 2014-10-01  7:45 ` Shreyas B. Prabhu
  2014-10-07  5:06   ` Benjamin Herrenschmidt
  2014-10-01  7:45 ` [PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep Shreyas B. Prabhu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Shreyas B. Prabhu @ 2014-10-01  7:45 UTC
To: linux-kernel
Cc: Shreyas B. Prabhu, Srivatsa S. Bhat, linux-pm, Rafael J. Wysocki, Paul Mackerras, Preeti U. Murthy, linuxppc-dev

From: "Srivatsa S. Bhat" <srivatsa@mit.edu>

The offline cpus should enter deep idle states so as to gain maximum
powersavings when the entire core is offline. To do so, the offline path
must be made aware of the deepest available idle state. Hence probe the
device tree for the possible idle states in powernv core code and
expose the deepest idle state through flags.

Since the device tree is probed by the cpuidle driver as well, move
the parameters required to discover the idle states into a place
common to both the driver and the powernv core code.

Another point is that the fastsleep idle state may require workarounds in
the kernel to function properly. This workaround is introduced in the
subsequent patches. However, neither the cpuidle driver nor the hotplug
path need be bothered about this workaround.

It will be taken care of by the core powernv code.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Srivatsa S. Bhat <srivatsa@mit.edu>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
[ Changelog modified by preeti@linux.vnet.ibm.com ]
Signed-off-by: Preeti U.
Murthy <preeti@linux.vnet.ibm.com> --- arch/powerpc/include/asm/opal.h | 4 +++ arch/powerpc/platforms/powernv/powernv.h | 7 +++++ arch/powerpc/platforms/powernv/setup.c | 51 ++++++++++++++++++++++++++++++++ arch/powerpc/platforms/powernv/smp.c | 11 ++++++- drivers/cpuidle/cpuidle-powernv.c | 7 ++--- 5 files changed, 75 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 86055e5..28b8342 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -772,6 +772,10 @@ extern struct kobject *opal_kobj; /* /ibm,opal */ extern struct device_node *opal_node; +/* Flags used for idle state discovery from the device tree */ +#define IDLE_INST_NAP 0x00010000 /* nap instruction can be used */ +#define IDLE_INST_SLEEP 0x00020000 /* sleep instruction can be used */ + /* API functions */ int64_t opal_invalid_call(void); int64_t opal_console_write(int64_t term_number, __be64 *length, diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h index 75501bf..31ece13 100644 --- a/arch/powerpc/platforms/powernv/powernv.h +++ b/arch/powerpc/platforms/powernv/powernv.h @@ -23,6 +23,13 @@ static inline int pnv_pci_dma_set_mask(struct pci_dev *pdev, u64 dma_mask) } #endif +/* Flags to indicate which of the CPU idle states are available for use */ + +#define IDLE_USE_NAP (1UL << 0) +#define IDLE_USE_SLEEP (1UL << 1) + +extern unsigned int pnv_get_supported_cpuidle_states(void); + extern void pnv_lpc_init(void); bool cpu_core_split_required(void); diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c index 5a0e2dc..2dca1d8 100644 --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -282,6 +282,57 @@ static void __init pnv_setup_machdep_rtas(void) } #endif /* CONFIG_PPC_POWERNV_RTAS */ +static unsigned int supported_cpuidle_states; + +unsigned int pnv_get_supported_cpuidle_states(void) +{ + 
return supported_cpuidle_states; +} + +static int __init pnv_probe_idle_states(void) +{ + struct device_node *power_mgt; + struct property *prop; + int dt_idle_states; + u32 *flags; + int i; + + supported_cpuidle_states = 0; + + if (cpuidle_disable != IDLE_NO_OVERRIDE) + return 0; + + if (!firmware_has_feature(FW_FEATURE_OPALv3)) + return 0; + + power_mgt = of_find_node_by_path("/ibm,opal/power-mgt"); + if (!power_mgt) { + pr_warn("opal: PowerMgmt Node not found\n"); + return 0; + } + + prop = of_find_property(power_mgt, "ibm,cpu-idle-state-flags", NULL); + if (!prop) { + pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n"); + return 0; + } + + dt_idle_states = prop->length / sizeof(u32); + flags = (u32 *) prop->value; + + for (i = 0; i < dt_idle_states; i++) { + if (flags[i] & IDLE_INST_NAP) + supported_cpuidle_states |= IDLE_USE_NAP; + + if (flags[i] & IDLE_INST_SLEEP) + supported_cpuidle_states |= IDLE_USE_SLEEP; + } + + return 0; +} + +subsys_initcall(pnv_probe_idle_states); + static int __init pnv_probe(void) { unsigned long root = of_get_flat_dt_root(); diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c index 5fcfcf4..3ad31d2 100644 --- a/arch/powerpc/platforms/powernv/smp.c +++ b/arch/powerpc/platforms/powernv/smp.c @@ -149,6 +149,7 @@ static int pnv_smp_cpu_disable(void) static void pnv_smp_cpu_kill_self(void) { unsigned int cpu; + unsigned long idle_states; /* Standard hot unplug procedure */ local_irq_disable(); @@ -159,13 +160,21 @@ static void pnv_smp_cpu_kill_self(void) generic_set_cpu_dead(cpu); smp_wmb(); + idle_states = pnv_get_supported_cpuidle_states(); + /* We don't want to take decrementer interrupts while we are offline, * so clear LPCR:PECE1. We keep PECE2 enabled. 
*/ mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1); while (!generic_check_cpu_restart(cpu)) { ppc64_runlatch_off(); - power7_nap(1); + + /* If sleep is supported, go to sleep, instead of nap */ + if (idle_states & IDLE_USE_SLEEP) + power7_sleep(); + else + power7_nap(1); + ppc64_runlatch_on(); /* Reenable IRQs briefly to clear the IPI that woke us */ diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c index a64be57..23d2743 100644 --- a/drivers/cpuidle/cpuidle-powernv.c +++ b/drivers/cpuidle/cpuidle-powernv.c @@ -16,13 +16,12 @@ #include <asm/machdep.h> #include <asm/firmware.h> +#include <asm/opal.h> #include <asm/runlatch.h> /* Flags and constants used in PowerNV platform */ #define MAX_POWERNV_IDLE_STATES 8 -#define IDLE_USE_INST_NAP 0x00010000 /* Use nap instruction */ -#define IDLE_USE_INST_SLEEP 0x00020000 /* Use sleep instruction */ struct cpuidle_driver powernv_idle_driver = { .name = "powernv_idle", @@ -185,7 +184,7 @@ static int powernv_add_idle_states(void) for (i = 0; i < dt_idle_states; i++) { flags = be32_to_cpu(idle_state_flags[i]); - if (flags & IDLE_USE_INST_NAP) { + if (flags & IDLE_INST_NAP) { /* Add NAP state */ strcpy(powernv_states[nr_idle_states].name, "Nap"); strcpy(powernv_states[nr_idle_states].desc, "Nap"); @@ -196,7 +195,7 @@ static int powernv_add_idle_states(void) nr_idle_states++; } - if (flags & IDLE_USE_INST_SLEEP) { + if (flags & IDLE_INST_SLEEP) { /* Add FASTSLEEP state */ strcpy(powernv_states[nr_idle_states].name, "FastSleep"); strcpy(powernv_states[nr_idle_states].desc, "FastSleep"); -- 1.9.3 ^ permalink raw reply related [flat|nested] 11+ messages in thread
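The probe added to setup.c in this patch boils down to folding the flag cells
of ibm,cpu-idle-state-flags into a small bitmask. Below is a minimal
user-space sketch of that fold, not the kernel code itself. It assumes the
cells are stored big-endian, as device-tree cells usually are: the cpuidle
driver hunk converts with be32_to_cpu(), whereas the setup.c hunk reads the
cells raw, which would only matter on a little-endian kernel.
probe_idle_flags() is a hypothetical helper name, and ntohl() stands in for
be32_to_cpu().

```c
#include <arpa/inet.h>	/* ntohl(): user-space stand-in for be32_to_cpu() */
#include <stddef.h>
#include <stdint.h>

/* Values copied from the patch. */
#define IDLE_INST_NAP	0x00010000u	/* nap instruction can be used */
#define IDLE_INST_SLEEP	0x00020000u	/* sleep instruction can be used */
#define IDLE_USE_NAP	(1u << 0)
#define IDLE_USE_SLEEP	(1u << 1)

/*
 * Hypothetical helper: fold n big-endian flag cells into an IDLE_USE_* mask.
 * Each cell describes one idle state advertised by firmware.
 */
static unsigned int probe_idle_flags(const uint32_t *cells, size_t n)
{
	unsigned int supported = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t flags = ntohl(cells[i]);	/* assumption: cells are big-endian */

		if (flags & IDLE_INST_NAP)
			supported |= IDLE_USE_NAP;
		if (flags & IDLE_INST_SLEEP)
			supported |= IDLE_USE_SLEEP;
	}
	return supported;
}
```

The hotplug path then only needs to test the resulting mask, for example
(mask & IDLE_USE_SLEEP), to choose between sleep and nap.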
* Re: [PATCH v2 1/3] powerpc/powernv: Enable Offline CPUs to enter deep idle states
  2014-10-01  7:45 ` [PATCH v2 1/3] powerpc/powernv: Enable Offline CPUs to enter deep idle states Shreyas B. Prabhu
@ 2014-10-07  5:06   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2014-10-07  5:06 UTC
To: Shreyas B. Prabhu
Cc: Srivatsa S. Bhat, linux-pm, Rafael J. Wysocki, linux-kernel, Paul Mackerras, Preeti U. Murthy, linuxppc-dev

On Wed, 2014-10-01 at 13:15 +0530, Shreyas B. Prabhu wrote:
> From: "Srivatsa S. Bhat" <srivatsa@mit.edu>
>
> The offline cpus

Arguably "cpus" here should be "secondary threads" to make the commit
message a bit more comprehensible. A few more nits below...

> should enter deep idle states so as to gain maximum
> powersavings when the entire core is offline. To do so the offline path
> must be made aware of the available deepest idle state. Hence probe the
> device tree for the possible idle states in powernv core code and
> expose the deepest idle state through flags.
>
> Since the device tree is probed by the cpuidle driver as well, move
> the parameters required to discover the idle states into an appropriate
> common place to both the driver and the powernv core code.
>
> Another point is that fastsleep idle state may require workarounds in
> the kernel to function properly. This workaround is introduced in the
> subsequent patches. However neither the cpuidle driver or the hotplug
> path need be bothered about this workaround.
>
> They will be taken care of by the core powernv code.
>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Cc: linux-pm@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Srivatsa S. Bhat <srivatsa@mit.edu>
> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
> [ Changelog modified by preeti@linux.vnet.ibm.com ]
> Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/opal.h          |  4 +++
>  arch/powerpc/platforms/powernv/powernv.h |  7 +++++
>  arch/powerpc/platforms/powernv/setup.c   | 51 ++++++++++++++++++++++++++++++++
>  arch/powerpc/platforms/powernv/smp.c     | 11 ++++++-
>  drivers/cpuidle/cpuidle-powernv.c        |  7 ++---
>  5 files changed, 75 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 86055e5..28b8342 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -772,6 +772,10 @@ extern struct kobject *opal_kobj;
>  /* /ibm,opal */
>  extern struct device_node *opal_node;
>
> +/* Flags used for idle state discovery from the device tree */
> +#define IDLE_INST_NAP 0x00010000 /* nap instruction can be used */
> +#define IDLE_INST_SLEEP 0x00020000 /* sleep instruction can be used */

Please provide a better explanation of what this is about, maybe a
comment describing the device-tree property.

Also those macros have names too likely to collide or be confused with
other uses. Use something a bit less ambiguous such as
OPAL_PM_NAP_AVAILABLE, OPAL_PM_SLEEP_ENABLED,...

Also put that in the part of opal.h that isn't the linux internal
implementation, but instead the "API" part. This will help when we
finally split the file.
> /* API functions */ > int64_t opal_invalid_call(void); > int64_t opal_console_write(int64_t term_number, __be64 *length, > diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h > index 75501bf..31ece13 100644 > --- a/arch/powerpc/platforms/powernv/powernv.h > +++ b/arch/powerpc/platforms/powernv/powernv.h > @@ -23,6 +23,13 @@ static inline int pnv_pci_dma_set_mask(struct pci_dev *pdev, u64 dma_mask) > } > #endif > > +/* Flags to indicate which of the CPU idle states are available for use */ > + > +#define IDLE_USE_NAP (1UL << 0) > +#define IDLE_USE_SLEEP (1UL << 1) This somewhat duplicates the opal.h definitions, can't we just re-use them ? > +extern unsigned int pnv_get_supported_cpuidle_states(void); > + > extern void pnv_lpc_init(void); > > bool cpu_core_split_required(void); > diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c > index 5a0e2dc..2dca1d8 100644 > --- a/arch/powerpc/platforms/powernv/setup.c > +++ b/arch/powerpc/platforms/powernv/setup.c > @@ -282,6 +282,57 @@ static void __init pnv_setup_machdep_rtas(void) > } > #endif /* CONFIG_PPC_POWERNV_RTAS */ > > +static unsigned int supported_cpuidle_states; > + > +unsigned int pnv_get_supported_cpuidle_states(void) > +{ > + return supported_cpuidle_states; > +} Will this be used by a module ? Doesn't it need to be exported ? Also keep the prefix pnv on the variable, I don't like globals with such a confusing name. 
> +static int __init pnv_probe_idle_states(void) > +{ > + struct device_node *power_mgt; > + struct property *prop; > + int dt_idle_states; > + u32 *flags; > + int i; > + > + supported_cpuidle_states = 0; > + > + if (cpuidle_disable != IDLE_NO_OVERRIDE) > + return 0; > + > + if (!firmware_has_feature(FW_FEATURE_OPALv3)) > + return 0; > + > + power_mgt = of_find_node_by_path("/ibm,opal/power-mgt"); > + if (!power_mgt) { > + pr_warn("opal: PowerMgmt Node not found\n"); > + return 0; > + } > + > + prop = of_find_property(power_mgt, "ibm,cpu-idle-state-flags", NULL); > + if (!prop) { > + pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n"); > + return 0; > + } > + > + dt_idle_states = prop->length / sizeof(u32); > + flags = (u32 *) prop->value; > + > + for (i = 0; i < dt_idle_states; i++) { > + if (flags[i] & IDLE_INST_NAP) > + supported_cpuidle_states |= IDLE_USE_NAP; > + > + if (flags[i] & IDLE_INST_SLEEP) > + supported_cpuidle_states |= IDLE_USE_SLEEP; > + } > + > + return 0; > +} > + > +subsys_initcall(pnv_probe_idle_states); > + > static int __init pnv_probe(void) > { > unsigned long root = of_get_flat_dt_root(); > diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c > index 5fcfcf4..3ad31d2 100644 > --- a/arch/powerpc/platforms/powernv/smp.c > +++ b/arch/powerpc/platforms/powernv/smp.c > @@ -149,6 +149,7 @@ static int pnv_smp_cpu_disable(void) > static void pnv_smp_cpu_kill_self(void) > { > unsigned int cpu; > + unsigned long idle_states; > > /* Standard hot unplug procedure */ > local_irq_disable(); > @@ -159,13 +160,21 @@ static void pnv_smp_cpu_kill_self(void) > generic_set_cpu_dead(cpu); > smp_wmb(); > > + idle_states = pnv_get_supported_cpuidle_states(); > + > /* We don't want to take decrementer interrupts while we are offline, > * so clear LPCR:PECE1. We keep PECE2 enabled. 
> */ > mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1); > while (!generic_check_cpu_restart(cpu)) { > ppc64_runlatch_off(); > - power7_nap(1); > + > + /* If sleep is supported, go to sleep, instead of nap */ > + if (idle_states & IDLE_USE_SLEEP) > + power7_sleep(); > + else > + power7_nap(1); > + > ppc64_runlatch_on(); > > /* Reenable IRQs briefly to clear the IPI that woke us */ > diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c > index a64be57..23d2743 100644 > --- a/drivers/cpuidle/cpuidle-powernv.c > +++ b/drivers/cpuidle/cpuidle-powernv.c > @@ -16,13 +16,12 @@ > > #include <asm/machdep.h> > #include <asm/firmware.h> > +#include <asm/opal.h> > #include <asm/runlatch.h> > > /* Flags and constants used in PowerNV platform */ > > #define MAX_POWERNV_IDLE_STATES 8 > -#define IDLE_USE_INST_NAP 0x00010000 /* Use nap instruction */ > -#define IDLE_USE_INST_SLEEP 0x00020000 /* Use sleep instruction */ > > struct cpuidle_driver powernv_idle_driver = { > .name = "powernv_idle", > @@ -185,7 +184,7 @@ static int powernv_add_idle_states(void) > for (i = 0; i < dt_idle_states; i++) { > > flags = be32_to_cpu(idle_state_flags[i]); > - if (flags & IDLE_USE_INST_NAP) { > + if (flags & IDLE_INST_NAP) { > /* Add NAP state */ > strcpy(powernv_states[nr_idle_states].name, "Nap"); > strcpy(powernv_states[nr_idle_states].desc, "Nap"); > @@ -196,7 +195,7 @@ static int powernv_add_idle_states(void) > nr_idle_states++; > } > > - if (flags & IDLE_USE_INST_SLEEP) { > + if (flags & IDLE_INST_SLEEP) { > /* Add FASTSLEEP state */ > strcpy(powernv_states[nr_idle_states].name, "FastSleep"); > strcpy(powernv_states[nr_idle_states].desc, "FastSleep"); ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep
  2014-10-01  7:45 [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes Shreyas B. Prabhu
  2014-10-01  7:45 ` [PATCH v2 1/3] powerpc/powernv: Enable Offline CPUs to enter deep idle states Shreyas B. Prabhu
@ 2014-10-01  7:45 ` Shreyas B. Prabhu
  2014-10-02 16:39   ` Shreyas B Prabhu
  2014-10-07  5:11   ` Benjamin Herrenschmidt
  2014-10-01  7:46 ` [PATCH v2 3/3] powerpc/powernv/cpuidle: Add workaround to enable fastsleep Shreyas B. Prabhu
  2014-10-01 20:46 ` [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes Rafael J. Wysocki
  3 siblings, 2 replies; 11+ messages in thread
From: Shreyas B. Prabhu @ 2014-10-01  7:45 UTC
To: linux-kernel
Cc: Shreyas B. Prabhu, Paul Mackerras, Preeti U Murthy, linuxppc-dev

When guests have to be launched, the secondary threads which are offline
are woken up to run the guests. Today these threads wake up from nap
and check if they have to run guests. Now that the offline secondary
threads can go to fastsleep or, going forward, a deeper idle state such
as winkle, add this check to the wakeup path from any of the deep idle
states as well.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Suggested-by: "Srivatsa S. Bhat" <srivatsa@mit.edu>
Signed-off-by: Shreyas B.
Prabhu <shreyas@linux.vnet.ibm.com> [ Changelog added by <preeti@linux.vnet.ibm.com> ] Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> --- arch/powerpc/kernel/exceptions-64s.S | 35 ++++++++++++++++------------------- 1 file changed, 16 insertions(+), 19 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 050f79a..c64f3cc0 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -100,25 +100,8 @@ system_reset_pSeries: SET_SCRATCH0(r13) #ifdef CONFIG_PPC_P7_NAP BEGIN_FTR_SECTION - /* Running native on arch 2.06 or later, check if we are - * waking up from nap. We only handle no state loss and - * supervisor state loss. We do -not- handle hypervisor - * state loss at this time. - */ - mfspr r13,SPRN_SRR1 - rlwinm. r13,r13,47-31,30,31 - beq 9f - /* waking up from powersave (nap) state */ - cmpwi cr1,r13,2 - /* Total loss of HV state is fatal, we could try to use the - * PIR to locate a PACA, then use an emergency stack etc... - * OPAL v3 based powernv platforms have new idle states - * which fall in this catagory. - */ - bgt cr1,8f GET_PACA(r13) - #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE li r0,KVM_HWTHREAD_IN_KERNEL stb r0,HSTATE_HWTHREAD_STATE(r13) @@ -131,13 +114,27 @@ BEGIN_FTR_SECTION 1: #endif + /* Running native on arch 2.06 or later, check if we are + * waking up from nap. We only handle no state loss and + * supervisor state loss. We do -not- handle hypervisor + * state loss at this time. + */ + mfspr r13,SPRN_SRR1 + rlwinm. r13,r13,47-31,30,31 + beq 9f + + /* waking up from powersave (nap) state */ + cmpwi cr1,r13,2 + GET_PACA(r13) + + bgt cr1,8f + beq cr1,2f b power7_wakeup_noloss 2: b power7_wakeup_loss /* Fast Sleep wakeup on PowerNV */ -8: GET_PACA(r13) - b power7_wakeup_tb_loss +8: b power7_wakeup_tb_loss 9: END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) -- 1.9.3 ^ permalink raw reply related [flat|nested] 11+ messages in thread
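For readers less familiar with the assembly, the wakeup classification that
this patch moves below the KVM check can be sketched in C. This is an
illustrative user-space rendering only, not code from the patch: the enum and
function names are hypothetical, while the bit extraction and branch order
follow the rlwinm./cmpwi/bgt/beq sequence in the hunk above.

```c
#include <stdint.h>

/* Hypothetical names for the four outcomes of the wakeup check. */
enum wake_path {
	WAKE_NOT_POWERSAVE,	/* field == 0: label 9, not a powersave wakeup */
	WAKE_NOLOSS,		/* field == 1: power7_wakeup_noloss */
	WAKE_LOSS,		/* field == 2: power7_wakeup_loss */
	WAKE_TB_LOSS		/* field == 3: label 8, power7_wakeup_tb_loss */
};

/*
 * C rendering of "rlwinm. r13,r13,47-31,30,31": rotate the low word left
 * by 16 and keep the two least significant bits, i.e. SRR1 bits 46:47 in
 * IBM (MSB = 0) bit numbering.
 */
static unsigned int srr1_wake_field(uint64_t srr1)
{
	return (unsigned int)(srr1 >> 16) & 0x3;
}

static enum wake_path classify_wakeup(uint64_t srr1)
{
	unsigned int field = srr1_wake_field(srr1);

	if (field == 0)		/* beq 9f */
		return WAKE_NOT_POWERSAVE;
	if (field > 2)		/* cmpwi cr1,r13,2 ; bgt cr1,8f */
		return WAKE_TB_LOSS;
	if (field == 2)		/* beq cr1,2f */
		return WAKE_LOSS;
	return WAKE_NOLOSS;	/* b power7_wakeup_noloss */
}
```

With the patch applied, the KVM hwthread-state check runs before this
classification, so it also covers the deeper wakeup paths.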
* Re: [PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep 2014-10-01 7:45 ` [PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep Shreyas B. Prabhu @ 2014-10-02 16:39 ` Shreyas B Prabhu 2014-10-07 5:11 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 11+ messages in thread From: Shreyas B Prabhu @ 2014-10-02 16:39 UTC (permalink / raw) To: linux-kernel Cc: linux-pm, Rafael J. Wysocki, Paul Mackerras, Preeti U Murthy, linuxppc-dev CCing Rafael J. Wysocki and linux-pm@vger.kernel.org On Wednesday 01 October 2014 01:15 PM, Shreyas B. Prabhu wrote: > When guests have to be launched, the secondary threads which are offline > are woken up to run the guests. Today these threads wake up from nap > and check if they have to run guests. Now that the offline secondary > threads can go to fastsleep or going ahead a deeper idle state such as winkle, > add this check in the wakeup from any of the deep idle states path as well. > > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Cc: Paul Mackerras <paulus@samba.org> > Cc: Michael Ellerman <mpe@ellerman.id.au> > Cc: linuxppc-dev@lists.ozlabs.org > Suggested-by: "Srivatsa S. Bhat" <srivatsa@mit.edu> > Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> > [ Changelog added by <preeti@linux.vnet.ibm.com> ] > Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> > --- > arch/powerpc/kernel/exceptions-64s.S | 35 ++++++++++++++++------------------- > 1 file changed, 16 insertions(+), 19 deletions(-) > > diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S > index 050f79a..c64f3cc0 100644 > --- a/arch/powerpc/kernel/exceptions-64s.S > +++ b/arch/powerpc/kernel/exceptions-64s.S > @@ -100,25 +100,8 @@ system_reset_pSeries: > SET_SCRATCH0(r13) > #ifdef CONFIG_PPC_P7_NAP > BEGIN_FTR_SECTION > - /* Running native on arch 2.06 or later, check if we are > - * waking up from nap. 
We only handle no state loss and > - * supervisor state loss. We do -not- handle hypervisor > - * state loss at this time. > - */ > - mfspr r13,SPRN_SRR1 > - rlwinm. r13,r13,47-31,30,31 > - beq 9f > > - /* waking up from powersave (nap) state */ > - cmpwi cr1,r13,2 > - /* Total loss of HV state is fatal, we could try to use the > - * PIR to locate a PACA, then use an emergency stack etc... > - * OPAL v3 based powernv platforms have new idle states > - * which fall in this catagory. > - */ > - bgt cr1,8f > GET_PACA(r13) > - > #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE > li r0,KVM_HWTHREAD_IN_KERNEL > stb r0,HSTATE_HWTHREAD_STATE(r13) > @@ -131,13 +114,27 @@ BEGIN_FTR_SECTION > 1: > #endif > > + /* Running native on arch 2.06 or later, check if we are > + * waking up from nap. We only handle no state loss and > + * supervisor state loss. We do -not- handle hypervisor > + * state loss at this time. > + */ > + mfspr r13,SPRN_SRR1 > + rlwinm. r13,r13,47-31,30,31 > + beq 9f > + > + /* waking up from powersave (nap) state */ > + cmpwi cr1,r13,2 > + GET_PACA(r13) > + > + bgt cr1,8f > + > beq cr1,2f > b power7_wakeup_noloss > 2: b power7_wakeup_loss > > /* Fast Sleep wakeup on PowerNV */ > -8: GET_PACA(r13) > - b power7_wakeup_tb_loss > +8: b power7_wakeup_tb_loss > > 9: > END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep 2014-10-01 7:45 ` [PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep Shreyas B. Prabhu 2014-10-02 16:39 ` Shreyas B Prabhu @ 2014-10-07 5:11 ` Benjamin Herrenschmidt 2014-10-09 10:03 ` Preeti U Murthy 1 sibling, 1 reply; 11+ messages in thread From: Benjamin Herrenschmidt @ 2014-10-07 5:11 UTC (permalink / raw) To: Shreyas B. Prabhu Cc: Preeti U Murthy, linuxppc-dev, Paul Mackerras, linux-kernel On Wed, 2014-10-01 at 13:15 +0530, Shreyas B. Prabhu wrote: > When guests have to be launched, the secondary threads which are offline > are woken up to run the guests. Today these threads wake up from nap > and check if they have to run guests. Now that the offline secondary > threads can go to fastsleep or going ahead a deeper idle state such as winkle, > add this check in the wakeup from any of the deep idle states path as well. > > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Cc: Paul Mackerras <paulus@samba.org> > Cc: Michael Ellerman <mpe@ellerman.id.au> > Cc: linuxppc-dev@lists.ozlabs.org > Suggested-by: "Srivatsa S. Bhat" <srivatsa@mit.edu> > Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> > [ Changelog added by <preeti@linux.vnet.ibm.com> ] > Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> > --- > arch/powerpc/kernel/exceptions-64s.S | 35 ++++++++++++++++------------------- > 1 file changed, 16 insertions(+), 19 deletions(-) > > diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S > index 050f79a..c64f3cc0 100644 > --- a/arch/powerpc/kernel/exceptions-64s.S > +++ b/arch/powerpc/kernel/exceptions-64s.S > @@ -100,25 +100,8 @@ system_reset_pSeries: > SET_SCRATCH0(r13) > #ifdef CONFIG_PPC_P7_NAP > BEGIN_FTR_SECTION > - /* Running native on arch 2.06 or later, check if we are > - * waking up from nap. 
We only handle no state loss and > - * supervisor state loss. We do -not- handle hypervisor > - * state loss at this time. > - */ > - mfspr r13,SPRN_SRR1 > - rlwinm. r13,r13,47-31,30,31 > - beq 9f > > - /* waking up from powersave (nap) state */ > - cmpwi cr1,r13,2 > - /* Total loss of HV state is fatal, we could try to use the > - * PIR to locate a PACA, then use an emergency stack etc... > - * OPAL v3 based powernv platforms have new idle states > - * which fall in this catagory. > - */ > - bgt cr1,8f > GET_PACA(r13) > - > #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE > li r0,KVM_HWTHREAD_IN_KERNEL > stb r0,HSTATE_HWTHREAD_STATE(r13) > @@ -131,13 +114,27 @@ BEGIN_FTR_SECTION > 1: > #endif So you moved the state loss check to after the KVM check ? Was this reviewed by Paul ? Is that ok ? (Does this match what we have in PowerKVM ?). Is it possible that we end up calling kvm_start_guest after a HV state loss or do we know for sure that this won't happen for a reason or another ? If that's the case, then that reason needs to be clearly documented here in a comment. > + /* Running native on arch 2.06 or later, check if we are > + * waking up from nap. We only handle no state loss and > + * supervisor state loss. We do -not- handle hypervisor > + * state loss at this time. > + */ > + mfspr r13,SPRN_SRR1 > + rlwinm. r13,r13,47-31,30,31 > + beq 9f > + > + /* waking up from powersave (nap) state */ > + cmpwi cr1,r13,2 > + GET_PACA(r13) > + > + bgt cr1,8f > + > beq cr1,2f > b power7_wakeup_noloss > 2: b power7_wakeup_loss > > /* Fast Sleep wakeup on PowerNV */ > -8: GET_PACA(r13) > - b power7_wakeup_tb_loss > +8: b power7_wakeup_tb_loss > > 9: > END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep 2014-10-07 5:11 ` Benjamin Herrenschmidt @ 2014-10-09 10:03 ` Preeti U Murthy 0 siblings, 0 replies; 11+ messages in thread From: Preeti U Murthy @ 2014-10-09 10:03 UTC (permalink / raw) To: Benjamin Herrenschmidt, Shreyas B. Prabhu Cc: linuxppc-dev, Paul Mackerras, linux-kernel On 10/07/2014 10:41 AM, Benjamin Herrenschmidt wrote: > On Wed, 2014-10-01 at 13:15 +0530, Shreyas B. Prabhu wrote: >> >> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S >> index 050f79a..c64f3cc0 100644 >> --- a/arch/powerpc/kernel/exceptions-64s.S >> +++ b/arch/powerpc/kernel/exceptions-64s.S >> @@ -100,25 +100,8 @@ system_reset_pSeries: >> SET_SCRATCH0(r13) >> #ifdef CONFIG_PPC_P7_NAP >> BEGIN_FTR_SECTION >> - /* Running native on arch 2.06 or later, check if we are >> - * waking up from nap. We only handle no state loss and >> - * supervisor state loss. We do -not- handle hypervisor >> - * state loss at this time. >> - */ >> - mfspr r13,SPRN_SRR1 >> - rlwinm. r13,r13,47-31,30,31 >> - beq 9f >> >> - /* waking up from powersave (nap) state */ >> - cmpwi cr1,r13,2 >> - /* Total loss of HV state is fatal, we could try to use the >> - * PIR to locate a PACA, then use an emergency stack etc... >> - * OPAL v3 based powernv platforms have new idle states >> - * which fall in this catagory. >> - */ >> - bgt cr1,8f >> GET_PACA(r13) >> - >> #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE >> li r0,KVM_HWTHREAD_IN_KERNEL >> stb r0,HSTATE_HWTHREAD_STATE(r13) >> @@ -131,13 +114,27 @@ BEGIN_FTR_SECTION >> 1: >> #endif > > So you moved the state loss check to after the KVM check ? Was this > reviewed by Paul ? Is that ok ? (Does this match what we have in > PowerKVM ?). Is it possible that we end up calling kvm_start_guest > after a HV state loss or do we know for sure that this won't happen > for a reason or another ? 
If that's the case, then that reason needs
> to be clearly documented here in a comment.

This won't happen, because the first thread in the core to come out of
an idle state with state loss will not enter KVM, since
HSTATE_HWTHREAD_STATE is not yet set. It continues on to restore the
lost state. This thread then sets HSTATE_HWTHREAD_STATE and wakes up the
remaining threads in the core. These sibling threads enter KVM directly,
without needing to restore the lost state, since the first thread has
already restored it. So we are safe.

We will certainly add a comment there.

Thanks

Regards
Preeti U Murthy
* [PATCH v2 3/3] powerpc/powernv/cpuidle: Add workaround to enable fastsleep 2014-10-01 7:45 [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes Shreyas B. Prabhu 2014-10-01 7:45 ` [PATCH v2 1/3] powerpc/powernv: Enable Offline CPUs to enter deep idle states Shreyas B. Prabhu 2014-10-01 7:45 ` [PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep Shreyas B. Prabhu @ 2014-10-01 7:46 ` Shreyas B. Prabhu 2014-10-07 5:20 ` Benjamin Herrenschmidt 2014-10-01 20:46 ` [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes Rafael J. Wysocki 3 siblings, 1 reply; 11+ messages in thread From: Shreyas B. Prabhu @ 2014-10-01 7:46 UTC (permalink / raw) To: linux-kernel Cc: Shreyas B. Prabhu, linux-pm, Rafael J. Wysocki, Paul Mackerras, Preeti U Murthy, linuxppc-dev From: Preeti U Murthy <preeti@linux.vnet.ibm.com> Fast sleep is an idle state, where the core and the L1 and L2 caches are brought down to a threshold voltage. This also means that the communication between L2 and L3 caches have to be fenced. However the current P8 chips have a bug wherein this fencing between L2 and L3 caches get delayed by a cpu cycle. This can delay L3 response to the other cpus if they request for data during this time. Thus they would fetch the same data from the memory which could lead to data corruption if L3 cache is not flushed. The cpu idle states save power at a core level and not at a thread level. Hence powersavings is based on the shallowest idle state that a thread of a core is in. The above issue in fastsleep will arise only when all the threads in a core either enter fastsleep or some of them enter any deeper idle states, with only a few being in fastsleep. This patch therefore implements a workaround this bug by ensuring that, each time a cpu goes to fastsleep, it checks if it is the last thread in the core to enter fastsleep. 
If so, it needs to make an opal call to get around the above-mentioned fastsleep problem in the hardware before issuing the sleep instruction.

Similarly, when a thread in a core comes out of fastsleep, it needs to verify whether it is the first thread in the core to come out of fastsleep and, if so, issue the opal call to revert the changes made while entering fastsleep.

For the same reason, we need to take care of offline threads as well, since we allow them to enter fastsleep, and with support for deep winkle coming soon they can enter winkle as well. We therefore ensure that offline threads make the above-mentioned opal calls too, so that as long as all the threads in a core are in an idle state >= fastsleep, we have the workaround in place. Whenever a thread comes out of either of these states, it needs to verify whether the opal call has been made and, if so, revert it. For now this patch ensures that offline threads enter fastsleep.

We need to be able to synchronize the cpus in a core that are entering and exiting fastsleep, so as to ensure that *only* the last thread in the core to enter fastsleep and the first to exit fastsleep issue the opal call. To do so, we need a per-core lock and counter. The counter is required to keep track of the number of threads in a core that are in an idle state >= fastsleep. To keep the implementation simple, we introduce a per-cpu lock and counter, and every thread always takes the primary thread's lock and modifies the primary thread's counter. This effectively makes them per-core entities.

But the workaround is abstracted in the powernv core code, and neither the hotplug path nor the cpuidle driver needs to bother about it. All they need to know is whether fastsleep, with or without the error, is present as an idle state.
Cc: linux-pm@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> --- arch/powerpc/include/asm/machdep.h | 3 + arch/powerpc/include/asm/opal.h | 3 + arch/powerpc/include/asm/processor.h | 4 +- arch/powerpc/kernel/idle.c | 19 ++++ arch/powerpc/kernel/idle_power7.S | 2 +- arch/powerpc/platforms/powernv/opal-wrappers.S | 1 + arch/powerpc/platforms/powernv/setup.c | 139 ++++++++++++++++++------- drivers/cpuidle/cpuidle-powernv.c | 8 +- 8 files changed, 140 insertions(+), 39 deletions(-) diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index b125cea..f37014f 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -298,6 +298,9 @@ struct machdep_calls { #ifdef CONFIG_MEMORY_HOTREMOVE int (*remove_memory)(u64, u64); #endif + /* Idle handlers */ + void (*setup_idle)(void); + unsigned long (*power7_sleep)(void); }; extern void e500_idle(void); diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 28b8342..166d572 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -149,6 +149,7 @@ struct opal_sg_list { #define OPAL_DUMP_INFO2 94 #define OPAL_PCI_EEH_FREEZE_SET 97 #define OPAL_HANDLE_HMI 98 +#define OPAL_CONFIG_IDLE_STATE 99 #define OPAL_REGISTER_DUMP_REGION 101 #define OPAL_UNREGISTER_DUMP_REGION 102 @@ -775,6 +776,7 @@ extern struct device_node *opal_node; /* Flags used for idle state discovery from the device tree */ #define IDLE_INST_NAP 0x00010000 /* nap instruction can be used */ #define IDLE_INST_SLEEP 0x00020000 /* sleep instruction can be used */ +#define IDLE_INST_SLEEP_ER1 0x00080000 /* Use sleep with work around*/ /* API functions */ 
int64_t opal_invalid_call(void); @@ -975,6 +977,7 @@ extern int opal_handle_hmi_exception(struct pt_regs *regs); extern void opal_shutdown(void); extern int opal_resync_timebase(void); +int64_t opal_config_idle_state(uint64_t state, uint64_t enter); extern void opal_lpc_init(void); diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index dda7ac4..41953cd 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -451,8 +451,10 @@ extern unsigned long cpuidle_disable; enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF}; extern int powersave_nap; /* set if nap mode can be used in idle loop */ +extern void arch_setup_idle(void); extern void power7_nap(int check_irq); -extern void power7_sleep(void); +extern unsigned long power7_sleep(void); +extern unsigned long __power7_sleep(void); extern void flush_instruction_cache(void); extern void hard_reset_now(void); extern void poweroff_now(void); diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c index d7216c9..1f268e0 100644 --- a/arch/powerpc/kernel/idle.c +++ b/arch/powerpc/kernel/idle.c @@ -32,6 +32,9 @@ #include <asm/machdep.h> #include <asm/runlatch.h> #include <asm/smp.h> +#include <asm/cputhreads.h> +#include <asm/firmware.h> +#include <asm/opal.h> unsigned long cpuidle_disable = IDLE_NO_OVERRIDE; @@ -78,6 +81,22 @@ void arch_cpu_idle(void) HMT_medium(); ppc64_runlatch_on(); } +void arch_setup_idle(void) +{ + if (ppc_md.setup_idle) + ppc_md.setup_idle(); +} + +unsigned long power7_sleep(void) +{ + unsigned long ret; + + if (ppc_md.power7_sleep) + ret = ppc_md.power7_sleep(); + else + ret = __power7_sleep(); + return ret; +} int powersave_nap; diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S index be05841..c3481c9 100644 --- a/arch/powerpc/kernel/idle_power7.S +++ b/arch/powerpc/kernel/idle_power7.S @@ -129,7 +129,7 @@ _GLOBAL(power7_nap) b power7_powersave_common /* No return 
*/ -_GLOBAL(power7_sleep) +_GLOBAL(__power7_sleep) li r3,1 li r4,1 b power7_powersave_common diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S index 2e6ce1b..8d1e724 100644 --- a/arch/powerpc/platforms/powernv/opal-wrappers.S +++ b/arch/powerpc/platforms/powernv/opal-wrappers.S @@ -245,5 +245,6 @@ OPAL_CALL(opal_sensor_read, OPAL_SENSOR_READ); OPAL_CALL(opal_get_param, OPAL_GET_PARAM); OPAL_CALL(opal_set_param, OPAL_SET_PARAM); OPAL_CALL(opal_handle_hmi, OPAL_HANDLE_HMI); +OPAL_CALL(opal_config_idle_state, OPAL_CONFIG_IDLE_STATE); OPAL_CALL(opal_register_dump_region, OPAL_REGISTER_DUMP_REGION); OPAL_CALL(opal_unregister_dump_region, OPAL_UNREGISTER_DUMP_REGION); diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c index 2dca1d8..9d9a898 100644 --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -36,9 +36,20 @@ #include <asm/opal.h> #include <asm/kexec.h> #include <asm/smp.h> +#include <asm/cputhreads.h> #include "powernv.h" +/* Per-cpu structures to keep track of cpus of a core that + * are in idle states >= fastsleep so as to call opal for + * sleep setup when the entire core is ready to go to fastsleep. + * + * We need something similar to a per-core lock. For now we + * achieve this by taking the lock of the primary thread in the core.
+ */ +static DEFINE_PER_CPU(spinlock_t, fastsleep_override_lock); +static DEFINE_PER_CPU(int, fastsleep_cnt); + static void __init pnv_setup_arch(void) { set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT); @@ -254,35 +265,8 @@ static unsigned long pnv_memory_block_size(void) } #endif -static void __init pnv_setup_machdep_opal(void) -{ - ppc_md.get_boot_time = opal_get_boot_time; - ppc_md.get_rtc_time = opal_get_rtc_time; - ppc_md.set_rtc_time = opal_set_rtc_time; - ppc_md.restart = pnv_restart; - ppc_md.power_off = pnv_power_off; - ppc_md.halt = pnv_halt; - ppc_md.machine_check_exception = opal_machine_check; - ppc_md.mce_check_early_recovery = opal_mce_check_early_recovery; - ppc_md.hmi_exception_early = opal_hmi_exception_early; - ppc_md.handle_hmi_exception = opal_handle_hmi_exception; -} - -#ifdef CONFIG_PPC_POWERNV_RTAS -static void __init pnv_setup_machdep_rtas(void) -{ - if (rtas_token("get-time-of-day") != RTAS_UNKNOWN_SERVICE) { - ppc_md.get_boot_time = rtas_get_boot_time; - ppc_md.get_rtc_time = rtas_get_rtc_time; - ppc_md.set_rtc_time = rtas_set_rtc_time; - } - ppc_md.restart = rtas_restart; - ppc_md.power_off = rtas_power_off; - ppc_md.halt = rtas_halt; -} -#endif /* CONFIG_PPC_POWERNV_RTAS */ - static unsigned int supported_cpuidle_states; +static int need_fastsleep_workaround; unsigned int pnv_get_supported_cpuidle_states(void) { @@ -292,12 +276,13 @@ unsigned int pnv_get_supported_cpuidle_states(void) static int __init pnv_probe_idle_states(void) { struct device_node *power_mgt; - struct property *prop; int dt_idle_states; - u32 *flags; + const __be32 *idle_state_flags; + u32 len_flags, flags; int i; supported_cpuidle_states = 0; + need_fastsleep_workaround = 0; if (cpuidle_disable != IDLE_NO_OVERRIDE) return 0; @@ -311,21 +296,28 @@ static int __init pnv_probe_idle_states(void) return 0; } - prop = of_find_property(power_mgt, "ibm,cpu-idle-state-flags", NULL); - if (!prop) { + idle_state_flags = of_get_property(power_mgt, + "ibm,cpu-idle-state-flags", 
&len_flags); + if (!idle_state_flags) { pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n"); return 0; } - dt_idle_states = prop->length / sizeof(u32); - flags = (u32 *) prop->value; + dt_idle_states = len_flags / sizeof(u32); for (i = 0; i < dt_idle_states; i++) { - if (flags[i] & IDLE_INST_NAP) + + flags = be32_to_cpu(idle_state_flags[i]); + if (flags & IDLE_INST_NAP) supported_cpuidle_states |= IDLE_USE_NAP; - if (flags[i] & IDLE_INST_SLEEP) + if (flags & IDLE_INST_SLEEP) supported_cpuidle_states |= IDLE_USE_SLEEP; + + if (flags & IDLE_INST_SLEEP_ER1) { + supported_cpuidle_states |= IDLE_USE_SLEEP; + need_fastsleep_workaround = 1; + } } return 0; @@ -333,6 +325,81 @@ static int __init pnv_probe_idle_states(void) subsys_initcall(pnv_probe_idle_states); +static void pnv_setup_idle(void) +{ + int cpu; + + for_each_possible_cpu(cpu) { + spin_lock_init(&per_cpu(fastsleep_override_lock, cpu)); + per_cpu(fastsleep_cnt, cpu) = threads_per_core; + } +} + +static void +pnv_apply_fastsleep_workaround(bool enter_fastsleep, int primary_thread) +{ + if (enter_fastsleep) { + spin_lock(&per_cpu(fastsleep_override_lock, primary_thread)); + if (--(per_cpu(fastsleep_cnt, primary_thread)) == 0) + opal_config_idle_state(1, 1); + spin_unlock(&per_cpu(fastsleep_override_lock, primary_thread)); + } else { + spin_lock(&per_cpu(fastsleep_override_lock, primary_thread)); + if ((per_cpu(fastsleep_cnt, primary_thread)) == 0) + opal_config_idle_state(1, 0); + per_cpu(fastsleep_cnt, primary_thread)++; + spin_unlock(&per_cpu(fastsleep_override_lock, primary_thread)); + } +} + +static unsigned long pnv_power7_sleep(void) +{ + int cpu, primary_thread; + unsigned long srr1; + + cpu = smp_processor_id(); + primary_thread = cpu_first_thread_sibling(cpu); + + if (need_fastsleep_workaround) { + pnv_apply_fastsleep_workaround(1, primary_thread); + srr1 = __power7_sleep(); + pnv_apply_fastsleep_workaround(0, primary_thread); + } else { + srr1 = __power7_sleep(); + } + return srr1; +} + +static 
void __init pnv_setup_machdep_opal(void) +{ + ppc_md.get_boot_time = opal_get_boot_time; + ppc_md.get_rtc_time = opal_get_rtc_time; + ppc_md.set_rtc_time = opal_set_rtc_time; + ppc_md.restart = pnv_restart; + ppc_md.power_off = pnv_power_off; + ppc_md.halt = pnv_halt; + ppc_md.machine_check_exception = opal_machine_check; + ppc_md.mce_check_early_recovery = opal_mce_check_early_recovery; + ppc_md.hmi_exception_early = opal_hmi_exception_early; + ppc_md.handle_hmi_exception = opal_handle_hmi_exception; + ppc_md.setup_idle = pnv_setup_idle; + ppc_md.power7_sleep = pnv_power7_sleep; +} + +#ifdef CONFIG_PPC_POWERNV_RTAS +static void __init pnv_setup_machdep_rtas(void) +{ + if (rtas_token("get-time-of-day") != RTAS_UNKNOWN_SERVICE) { + ppc_md.get_boot_time = rtas_get_boot_time; + ppc_md.get_rtc_time = rtas_get_rtc_time; + ppc_md.set_rtc_time = rtas_set_rtc_time; + } + ppc_md.restart = rtas_restart; + ppc_md.power_off = rtas_power_off; + ppc_md.halt = rtas_halt; +} +#endif /* CONFIG_PPC_POWERNV_RTAS */ + static int __init pnv_probe(void) { unsigned long root = of_get_flat_dt_root(); diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c index 23d2743..8ad97a9 100644 --- a/drivers/cpuidle/cpuidle-powernv.c +++ b/drivers/cpuidle/cpuidle-powernv.c @@ -18,6 +18,7 @@ #include <asm/firmware.h> #include <asm/opal.h> #include <asm/runlatch.h> +#include <asm/processor.h> /* Flags and constants used in PowerNV platform */ @@ -195,7 +196,8 @@ static int powernv_add_idle_states(void) nr_idle_states++; } - if (flags & IDLE_INST_SLEEP) { + if ((flags & IDLE_INST_SLEEP_ER1) || + (flags & IDLE_INST_SLEEP)) { /* Add FASTSLEEP state */ strcpy(powernv_states[nr_idle_states].name, "FastSleep"); strcpy(powernv_states[nr_idle_states].desc, "FastSleep"); @@ -247,6 +249,10 @@ static int __init powernv_processor_idle_init(void) register_cpu_notifier(&setup_hotplug_notifier); printk(KERN_DEBUG "powernv_idle_driver registered\n"); + + /* If any idle states require 
special + * initializations before cpuidle kicks in */ + arch_setup_idle(); return 0; } -- 1.9.3
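The last-in/first-out counting used by pnv_apply_fastsleep_workaround() in the patch above can be modelled in user space. What follows is only a minimal sketch under stated assumptions, not the kernel code: struct core_idle, fake_config_idle_state() and the fixed THREADS_PER_CORE value are hypothetical stand-ins, a C11 atomic_flag replaces the primary thread's spinlock, and a counter of firmware calls replaces opal_config_idle_state(). As in the patch, the counter starts at the number of threads per core and counts down as threads go to sleep.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define THREADS_PER_CORE 8      /* assumption: fixed SMT8 core for the model */

/* Hypothetical per-core state; the patch keeps this in per-cpu variables
 * and always uses the primary thread's copy. */
struct core_idle {
	atomic_flag lock;       /* stand-in for the primary thread's spinlock */
	int awake;              /* threads of this core NOT in fastsleep */
	int workaround_calls;   /* how often the stand-in firmware call ran */
	bool workaround_active; /* last value passed as "enter" */
};

static void core_idle_init(struct core_idle *c)
{
	atomic_flag_clear(&c->lock);
	c->awake = THREADS_PER_CORE;
	c->workaround_calls = 0;
	c->workaround_active = false;
}

static void core_lock(struct core_idle *c)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;
}

static void core_unlock(struct core_idle *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Hypothetical stand-in for opal_config_idle_state(1, enter). */
static void fake_config_idle_state(struct core_idle *c, bool enter)
{
	c->workaround_active = enter;
	c->workaround_calls++;
}

/* Called before the sleep instruction: only the last thread going to
 * sleep applies the workaround. */
static void fastsleep_enter(struct core_idle *c)
{
	core_lock(c);
	assert(c->awake > 0);
	if (--c->awake == 0)
		fake_config_idle_state(c, true);
	core_unlock(c);
}

/* Called after wakeup: only the first thread waking up reverts it. */
static void fastsleep_exit(struct core_idle *c)
{
	core_lock(c);
	assert(c->awake < THREADS_PER_CORE);
	if (c->awake == 0)
		fake_config_idle_state(c, false);
	c->awake++;
	core_unlock(c);
}
```

In this model the stand-in firmware call runs exactly twice per full sleep/wake cycle of a core, however many threads take part, which is the property the commit message asks for.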
* Re: [PATCH v2 3/3] powerpc/powernv/cpuidle: Add workaround to enable fastsleep 2014-10-01 7:46 ` [PATCH v2 3/3] powerpc/powernv/cpuidle: Add workaround to enable fastsleep Shreyas B. Prabhu @ 2014-10-07 5:20 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 11+ messages in thread From: Benjamin Herrenschmidt @ 2014-10-07 5:20 UTC (permalink / raw) To: Shreyas B. Prabhu Cc: linux-pm, Rafael J. Wysocki, linux-kernel, Paul Mackerras, Preeti U Murthy, linuxppc-dev On Wed, 2014-10-01 at 13:16 +0530, Shreyas B. Prabhu wrote: > From: Preeti U Murthy <preeti@linux.vnet.ibm.com> > > Fast sleep is an idle state, where the core and the L1 and L2 > caches are brought down to a threshold voltage. This also means that > the communication between L2 and L3 caches have to be fenced. However > the current P8 chips have a bug wherein this fencing between L2 and > L3 caches get delayed by a cpu cycle. This can delay L3 response to > the other cpus if they request for data during this time. Thus they > would fetch the same data from the memory which could lead to data > corruption if L3 cache is not flushed. > > The cpu idle states save power at a core level and not at a thread level. > Hence powersavings is based on the shallowest idle state that a thread > of a core is in. The above issue in fastsleep will arise only when > all the threads in a core either enter fastsleep or some of them enter > any deeper idle states, with only a few being in fastsleep. This patch > therefore implements a workaround this bug by ensuring > that, each time a cpu goes to fastsleep, it checks if it is the last > thread in the core to enter fastsleep. If so, it needs to make an opal > call to get around the above mentioned fastsleep problem in the hardware > before issuing the sleep instruction. 
> > Similarly when a thread in a core comes out of fastsleep, it needs > to verify if its the first thread in the core to come out of fastsleep > and issue the opal call to revert the changes made while entering > fastsleep. > > For the same reason mentioned above we need to take care of offline threads > as well since we allow them to enter fastsleep and with support for > deep winkle soon coming in they can enter winkle as well. We therefore > ensure that even offline threads make the above mentioned opal calls > similarly, so that as long as the threads in a core are in and > idle state >= fastsleep, we have the workaround in place. Whenever a > thread comes out of either of these states, it needs to verify if the > opal call has been made and if so it will revert it. For now this patch > ensures that offline threads enter fastsleep. > > We need to be able to synchronize the cpus in a core which are entering > and exiting fastsleep so as to ensure that the last thread in the core > to enter fastsleep and the first to exit fastsleep *only* issue the opal > call. To do so, we need a per-core lock and counter. The counter is > required to keep track of the number of threads in a core which are in > idle state >= fastsleep. To make the implementation of this simple, we > introduce a per-cpu lock and counter and every thread always takes the > primary thread's lock, modifies the primary thread's counter. This > effectively makes them per-core entities. > > But the workaround is abstracted in the powernv core code and neither > the hotplug path nor the cpuidle driver need to bother about it. All > they need to know is if fastsleep, with error or no error is present as > an idle state. > > Cc: linux-pm@vger.kernel.org > Cc: linuxppc-dev@lists.ozlabs.org > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Cc: Paul Mackerras <paulus@samba.org> > Cc: Michael Ellerman <mpe@ellerman.id.au> > Cc: Rafael J. Wysocki <rjw@rjwysocki.net> > Signed-off-by: Shreyas B. 
Prabhu <shreyas@linux.vnet.ibm.com> > Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> > --- > arch/powerpc/include/asm/machdep.h | 3 + > arch/powerpc/include/asm/opal.h | 3 + > arch/powerpc/include/asm/processor.h | 4 +- > arch/powerpc/kernel/idle.c | 19 ++++ > arch/powerpc/kernel/idle_power7.S | 2 +- > arch/powerpc/platforms/powernv/opal-wrappers.S | 1 + > arch/powerpc/platforms/powernv/setup.c | 139 ++++++++++++++++++------- > drivers/cpuidle/cpuidle-powernv.c | 8 +- > 8 files changed, 140 insertions(+), 39 deletions(-) > > diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h > index b125cea..f37014f 100644 > --- a/arch/powerpc/include/asm/machdep.h > +++ b/arch/powerpc/include/asm/machdep.h > @@ -298,6 +298,9 @@ struct machdep_calls { > #ifdef CONFIG_MEMORY_HOTREMOVE > int (*remove_memory)(u64, u64); > #endif > + /* Idle handlers */ > + void (*setup_idle)(void); > + unsigned long (*power7_sleep)(void); > }; Do we need that ppc_md hook ? Since we are going to use idle states in the CPU unplug loop, I would think we should do the necessary initializations at boot time regardless of whether the cpuidle driver is loaded or not, so we probably don't need this, or am I missing something ? 
> extern void e500_idle(void); > diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h > index 28b8342..166d572 100644 > --- a/arch/powerpc/include/asm/opal.h > +++ b/arch/powerpc/include/asm/opal.h > @@ -149,6 +149,7 @@ struct opal_sg_list { > #define OPAL_DUMP_INFO2 94 > #define OPAL_PCI_EEH_FREEZE_SET 97 > #define OPAL_HANDLE_HMI 98 > +#define OPAL_CONFIG_IDLE_STATE 99 > #define OPAL_REGISTER_DUMP_REGION 101 > #define OPAL_UNREGISTER_DUMP_REGION 102 > > @@ -775,6 +776,7 @@ extern struct device_node *opal_node; > /* Flags used for idle state discovery from the device tree */ > #define IDLE_INST_NAP 0x00010000 /* nap instruction can be used */ > #define IDLE_INST_SLEEP 0x00020000 /* sleep instruction can be used */ > +#define IDLE_INST_SLEEP_ER1 0x00080000 /* Use sleep with work around*/ Usual comment about names. > /* API functions */ > int64_t opal_invalid_call(void); > @@ -975,6 +977,7 @@ extern int opal_handle_hmi_exception(struct pt_regs *regs); > > extern void opal_shutdown(void); > extern int opal_resync_timebase(void); > +int64_t opal_config_idle_state(uint64_t state, uint64_t enter); > > extern void opal_lpc_init(void); > > diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h > index dda7ac4..41953cd 100644 > --- a/arch/powerpc/include/asm/processor.h > +++ b/arch/powerpc/include/asm/processor.h > @@ -451,8 +451,10 @@ extern unsigned long cpuidle_disable; > enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF}; > > extern int powersave_nap; /* set if nap mode can be used in idle loop */ > +extern void arch_setup_idle(void); > extern void power7_nap(int check_irq); > -extern void power7_sleep(void); > +extern unsigned long power7_sleep(void); > +extern unsigned long __power7_sleep(void); > extern void flush_instruction_cache(void); > extern void hard_reset_now(void); > extern void poweroff_now(void); > diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c > index 
d7216c9..1f268e0 100644 > --- a/arch/powerpc/kernel/idle.c > +++ b/arch/powerpc/kernel/idle.c > @@ -32,6 +32,9 @@ > #include <asm/machdep.h> > #include <asm/runlatch.h> > #include <asm/smp.h> > +#include <asm/cputhreads.h> > +#include <asm/firmware.h> > +#include <asm/opal.h> > > > unsigned long cpuidle_disable = IDLE_NO_OVERRIDE; > @@ -78,6 +81,22 @@ void arch_cpu_idle(void) > HMT_medium(); > ppc64_runlatch_on(); > } > +void arch_setup_idle(void) > +{ > + if (ppc_md.setup_idle) > + ppc_md.setup_idle(); > +} See comment about hook > +unsigned long power7_sleep(void) > +{ > + unsigned long ret; > + > + if (ppc_md.power7_sleep) > + ret = ppc_md.power7_sleep(); > + else > + ret = __power7_sleep(); > + return ret; > +} This is in the wrong place. In fact, it should probably be power8 and not power7 and it should be somewhere in powernv, not in generic code. We don't need the ppc_md. hook, we can just check if the workaround is needed from the ppc_md code. > int powersave_nap; > > diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S > index be05841..c3481c9 100644 > --- a/arch/powerpc/kernel/idle_power7.S > +++ b/arch/powerpc/kernel/idle_power7.S > @@ -129,7 +129,7 @@ _GLOBAL(power7_nap) > b power7_powersave_common > /* No return */ > > -_GLOBAL(power7_sleep) > +_GLOBAL(__power7_sleep) > li r3,1 > li r4,1 > b power7_powersave_common > diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S > index 2e6ce1b..8d1e724 100644 > --- a/arch/powerpc/platforms/powernv/opal-wrappers.S > +++ b/arch/powerpc/platforms/powernv/opal-wrappers.S > @@ -245,5 +245,6 @@ OPAL_CALL(opal_sensor_read, OPAL_SENSOR_READ); > OPAL_CALL(opal_get_param, OPAL_GET_PARAM); > OPAL_CALL(opal_set_param, OPAL_SET_PARAM); > OPAL_CALL(opal_handle_hmi, OPAL_HANDLE_HMI); > +OPAL_CALL(opal_config_idle_state, OPAL_CONFIG_IDLE_STATE); > OPAL_CALL(opal_register_dump_region, OPAL_REGISTER_DUMP_REGION); > 
OPAL_CALL(opal_unregister_dump_region, OPAL_UNREGISTER_DUMP_REGION); > diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c > index 2dca1d8..9d9a898 100644 > --- a/arch/powerpc/platforms/powernv/setup.c > +++ b/arch/powerpc/platforms/powernv/setup.c > @@ -36,9 +36,20 @@ > #include <asm/opal.h> > #include <asm/kexec.h> > #include <asm/smp.h> > +#include <asm/cputhreads.h> > > #include "powernv.h" > > +/* Per-cpu structures to keep track of cpus of a core that > + * are in idle states >= fastsleep so as to call opal for > + * sleep setup when the entire core is ready to go to fastsleep. > + * > + * We need sometihng similar to a per-core lock. For now we > + * achieve this by taking the lock of the primary thread in the core. > + */ > +static DEFINE_PER_CPU(spinlock_t, fastsleep_override_lock); > +static DEFINE_PER_CPU(int, fastsleep_cnt); > + > static void __init pnv_setup_arch(void) > { > set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT); > @@ -254,35 +265,8 @@ static unsigned long pnv_memory_block_size(void) > } > #endif > > -static void __init pnv_setup_machdep_opal(void) > -{ > - ppc_md.get_boot_time = opal_get_boot_time; > - ppc_md.get_rtc_time = opal_get_rtc_time; > - ppc_md.set_rtc_time = opal_set_rtc_time; > - ppc_md.restart = pnv_restart; > - ppc_md.power_off = pnv_power_off; > - ppc_md.halt = pnv_halt; > - ppc_md.machine_check_exception = opal_machine_check; > - ppc_md.mce_check_early_recovery = opal_mce_check_early_recovery; > - ppc_md.hmi_exception_early = opal_hmi_exception_early; > - ppc_md.handle_hmi_exception = opal_handle_hmi_exception; > -} > - > -#ifdef CONFIG_PPC_POWERNV_RTAS > -static void __init pnv_setup_machdep_rtas(void) > -{ > - if (rtas_token("get-time-of-day") != RTAS_UNKNOWN_SERVICE) { > - ppc_md.get_boot_time = rtas_get_boot_time; > - ppc_md.get_rtc_time = rtas_get_rtc_time; > - ppc_md.set_rtc_time = rtas_set_rtc_time; > - } > - ppc_md.restart = rtas_restart; > - ppc_md.power_off = rtas_power_off; > 
- ppc_md.halt = rtas_halt; > -} > -#endif /* CONFIG_PPC_POWERNV_RTAS */ That patch is ugly because of moving the above. You should instead change the previous patch to put pnv_probe_idle_states above the above two functions so that this patch has a lot less impact and is more reviewable. > static unsigned int supported_cpuidle_states; > +static int need_fastsleep_workaround; bool ? > unsigned int pnv_get_supported_cpuidle_states(void) > { > @@ -292,12 +276,13 @@ unsigned int pnv_get_supported_cpuidle_states(void) > static int __init pnv_probe_idle_states(void) > { > struct device_node *power_mgt; > - struct property *prop; > int dt_idle_states; > - u32 *flags; Fix the previous one to use __be ? IE. Previous patch is endian broken and this one fixes it. Don't do that. > + const __be32 *idle_state_flags; > + u32 len_flags, flags; > int i; > > supported_cpuidle_states = 0; > + need_fastsleep_workaround = 0; > > if (cpuidle_disable != IDLE_NO_OVERRIDE) > return 0; > @@ -311,21 +296,28 @@ static int __init pnv_probe_idle_states(void) > return 0; > } > > - prop = of_find_property(power_mgt, "ibm,cpu-idle-state-flags", NULL); > - if (!prop) { > + idle_state_flags = of_get_property(power_mgt, > + "ibm,cpu-idle-state-flags", &len_flags); > + if (!idle_state_flags) { > pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n"); > return 0; > } > > - dt_idle_states = prop->length / sizeof(u32); > - flags = (u32 *) prop->value; > + dt_idle_states = len_flags / sizeof(u32); > > for (i = 0; i < dt_idle_states; i++) { > - if (flags[i] & IDLE_INST_NAP) > + > + flags = be32_to_cpu(idle_state_flags[i]); > + if (flags & IDLE_INST_NAP) > supported_cpuidle_states |= IDLE_USE_NAP; > > - if (flags[i] & IDLE_INST_SLEEP) > + if (flags & IDLE_INST_SLEEP) > supported_cpuidle_states |= IDLE_USE_SLEEP; > + > + if (flags & IDLE_INST_SLEEP_ER1) { > + supported_cpuidle_states |= IDLE_USE_SLEEP; > + need_fastsleep_workaround = 1; > + } > } > > return 0; > @@ -333,6 +325,81 @@ static int __init 
pnv_probe_idle_states(void) > > subsys_initcall(pnv_probe_idle_states); > > +static void pnv_setup_idle(void) > +{ > + int cpu; > + > + for_each_possible_cpu(cpu) { > + spin_lock_init(&per_cpu(fastsleep_override_lock, cpu)); > + per_cpu(fastsleep_cnt, cpu) = threads_per_core; > + } > +} That can be done from probe_idle_states no ? That locking construct and counter per-core (not per-CPU really) should probably be documented somewhere and to be safe we should probably initialize the secondary threads counters to some crazy value like -1 and BUG_ON on it in the use case. > +static void > +pnv_apply_fastsleep_workaround(bool enter_fastsleep, int primary_thread) > +{ > + if (enter_fastsleep) { > + spin_lock(&per_cpu(fastsleep_override_lock, primary_thread)); > + if (--(per_cpu(fastsleep_cnt, primary_thread)) == 0) > + opal_config_idle_state(1, 1); > + spin_unlock(&per_cpu(fastsleep_override_lock, primary_thread)); > + } else { > + spin_lock(&per_cpu(fastsleep_override_lock, primary_thread)); > + if ((per_cpu(fastsleep_cnt, primary_thread)) == 0) > + opal_config_idle_state(1, 0); > + per_cpu(fastsleep_cnt, primary_thread)++; > + spin_unlock(&per_cpu(fastsleep_override_lock, primary_thread)); > + } Make two separate functions, one for enter and one to exit. IE. pnv_enter_sleep_workaround(); vs. 
pnv_exit_sleep_workaround(); > + > +static unsigned long pnv_power7_sleep(void) > +{ > + int cpu, primary_thread; > + unsigned long srr1; > + > + cpu = smp_processor_id(); > + primary_thread = cpu_first_thread_sibling(cpu); > + > + if (need_fastsleep_workaround) { > + pnv_apply_fastsleep_workaround(1, primary_thread); > + srr1 = __power7_sleep(); > + pnv_apply_fastsleep_workaround(0, primary_thread); > + } else { > + srr1 = __power7_sleep(); > + } > + return srr1; > +} Just pnv_sleep() and you can remove the __ in power7_sleep > +static void __init pnv_setup_machdep_opal(void) > +{ > + ppc_md.get_boot_time = opal_get_boot_time; > + ppc_md.get_rtc_time = opal_get_rtc_time; > + ppc_md.set_rtc_time = opal_set_rtc_time; > + ppc_md.restart = pnv_restart; > + ppc_md.power_off = pnv_power_off; > + ppc_md.halt = pnv_halt; > + ppc_md.machine_check_exception = opal_machine_check; > + ppc_md.mce_check_early_recovery = opal_mce_check_early_recovery; > + ppc_md.hmi_exception_early = opal_hmi_exception_early; > + ppc_md.handle_hmi_exception = opal_handle_hmi_exception; > + ppc_md.setup_idle = pnv_setup_idle; > + ppc_md.power7_sleep = pnv_power7_sleep; > +} Just export pnv_sleep() and call it from the pnv cpuidle driver directly. 
> +#ifdef CONFIG_PPC_POWERNV_RTAS > +static void __init pnv_setup_machdep_rtas(void) > +{ > + if (rtas_token("get-time-of-day") != RTAS_UNKNOWN_SERVICE) { > + ppc_md.get_boot_time = rtas_get_boot_time; > + ppc_md.get_rtc_time = rtas_get_rtc_time; > + ppc_md.set_rtc_time = rtas_set_rtc_time; > + } > + ppc_md.restart = rtas_restart; > + ppc_md.power_off = rtas_power_off; > + ppc_md.halt = rtas_halt; > +} > +#endif /* CONFIG_PPC_POWERNV_RTAS */ > + > static int __init pnv_probe(void) > { > unsigned long root = of_get_flat_dt_root(); > diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c > index 23d2743..8ad97a9 100644 > --- a/drivers/cpuidle/cpuidle-powernv.c > +++ b/drivers/cpuidle/cpuidle-powernv.c > @@ -18,6 +18,7 @@ > #include <asm/firmware.h> > #include <asm/opal.h> > #include <asm/runlatch.h> > +#include <asm/processor.h> > > /* Flags and constants used in PowerNV platform */ > > @@ -195,7 +196,8 @@ static int powernv_add_idle_states(void) > nr_idle_states++; > } > > - if (flags & IDLE_INST_SLEEP) { > + if ((flags & IDLE_INST_SLEEP_ER1) || > + (flags & IDLE_INST_SLEEP)) { > /* Add FASTSLEEP state */ > strcpy(powernv_states[nr_idle_states].name, "FastSleep"); > strcpy(powernv_states[nr_idle_states].desc, "FastSleep"); > @@ -247,6 +249,10 @@ static int __init powernv_processor_idle_init(void) > > register_cpu_notifier(&setup_hotplug_notifier); > printk(KERN_DEBUG "powernv_idle_driver registered\n"); > + > + /* If any idle states require special > + * initializations before cpuidle kicks in */ > + arch_setup_idle(); > return 0; > } > ^ permalink raw reply [flat|nested] 11+ messages in thread
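One possible reading of the review comments above (separate enter and exit helpers, a bool flag, secondary-thread counters poisoned with -1 and checked before use) is sketched below as a user-space model. It is an illustration under assumptions, not a proposed v3: every model_* name, NR_CPUS, THREADS_PER_CORE and firmware_calls are hypothetical, assert() stands in for BUG_ON(), first_thread_sibling() assumes a power-of-two thread count, and the per-core locking is left out to keep the model short.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS          16     /* assumption: two SMT8 cores */
#define THREADS_PER_CORE 8      /* assumption: power of two */

static int sleep_cnt[NR_CPUS];      /* only the primary thread's slot is valid */
static bool need_sleep_workaround;  /* bool, as the review suggests */
static int firmware_calls;          /* stand-in for the OPAL call */

/* Model of mapping a cpu to the first thread of its core. */
static int first_thread_sibling(int cpu)
{
	return cpu & ~(THREADS_PER_CORE - 1);
}

/* Initialisation done at probe time rather than through a separate hook.
 * Secondary slots get -1 so accidental use is caught. */
static void model_probe_idle_states(bool erratum_flag_seen)
{
	int cpu;

	need_sleep_workaround = erratum_flag_seen;
	firmware_calls = 0;
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		sleep_cnt[cpu] = (cpu == first_thread_sibling(cpu)) ?
				 THREADS_PER_CORE : -1;
}

/* Return the core's counter, refusing a poisoned slot. */
static int *core_counter(int cpu)
{
	int primary = first_thread_sibling(cpu);

	assert(sleep_cnt[primary] != -1);   /* BUG_ON() stand-in */
	return &sleep_cnt[primary];
}

static void model_enter_sleep_workaround(int cpu)
{
	int *cnt = core_counter(cpu);

	assert(*cnt > 0);
	if (--*cnt == 0)
		firmware_calls++;   /* last thread in: apply workaround */
}

static void model_exit_sleep_workaround(int cpu)
{
	int *cnt = core_counter(cpu);

	assert(*cnt < THREADS_PER_CORE);
	if (*cnt == 0)
		firmware_calls++;   /* first thread out: revert workaround */
	(*cnt)++;
}

/* Single entry point that a cpuidle driver could call directly. */
static void model_pnv_sleep(int cpu)
{
	if (need_sleep_workaround)
		model_enter_sleep_workaround(cpu);
	/* the real code would execute the sleep instruction here */
	if (need_sleep_workaround)
		model_exit_sleep_workaround(cpu);
}
```

Splitting the helper in two removes the boolean argument at the call sites, and the -1 sentinel turns an indexing mistake on a secondary thread into an immediate failure instead of a silently wrong count.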
* Re: [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes
  2014-10-01  7:45 [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes Shreyas B. Prabhu
                   ` (2 preceding siblings ...)
  2014-10-01  7:46 ` [PATCH v2 3/3] powerpc/powernv/cpuidle: Add workaround to enable fastsleep Shreyas B. Prabhu
@ 2014-10-01 20:46 ` Rafael J. Wysocki
  2014-10-02 16:40 ` Shreyas B Prabhu
  3 siblings, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2014-10-01 20:46 UTC
To: Shreyas B. Prabhu
Cc: Srivatsa S. Bhat, linux-pm, linux-kernel, Paul Mackerras, Preeti U. Murthy, linuxppc-dev

On Wednesday, October 01, 2014 01:15:57 PM Shreyas B. Prabhu wrote:
> Fast sleep is an idle state, where the core and the L1 and L2
> caches are brought down to a threshold voltage. This also means that
> the communication between L2 and L3 caches have to be fenced. However
> the current P8 chips have a bug wherein this fencing between L2 and
> L3 caches get delayed by a cpu cycle. This can delay L3 response to
> the other cpus if they request for data during this time. Thus they
> would fetch the same data from the memory which could lead to data
> corruption if L3 cache is not flushed.
>
> This series overcomes above problem in kernel.
>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Cc: linux-pm@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Srivatsa S. Bhat <srivatsa@mit.edu>
> Cc: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
>
> v2:
> Rebased on 3.17-rc7
> Split from 'powerpc/powernv: Support for fastsleep and winkle'
>
> v1:
> https://lkml.org/lkml/2014/8/25/446
>
> Preeti U Murthy (1):
>   powerpc/powernv/cpuidle: Add workaround to enable fastsleep
>
> Shreyas B. Prabhu (1):
>   powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from
>     fast-sleep
>
> Srivatsa S. Bhat (1):
>   powerpc/powernv: Enable Offline CPUs to enter deep idle states
>
>  arch/powerpc/include/asm/machdep.h             |   3 +
>  arch/powerpc/include/asm/opal.h                |   7 ++
>  arch/powerpc/include/asm/processor.h           |   4 +-
>  arch/powerpc/kernel/exceptions-64s.S           |  35 ++++----
>  arch/powerpc/kernel/idle.c                     |  19 ++++
>  arch/powerpc/kernel/idle_power7.S              |   2 +-
>  arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
>  arch/powerpc/platforms/powernv/powernv.h       |   7 ++
>  arch/powerpc/platforms/powernv/setup.c         | 118 +++++++++++++++++++++++++
>  arch/powerpc/platforms/powernv/smp.c           |  11 ++-
>  drivers/cpuidle/cpuidle-powernv.c              |  13 ++-
>  11 files changed, 194 insertions(+), 26 deletions(-)

[2/3] seems to be missing from the series.

Also, since that mostly modifies arch/powerpc, I think it should go through
that tree. I'm fine with the cpuidle-powernv changes in [1/3] and [3/3].

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
* Re: [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes
  2014-10-01 20:46 ` [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes Rafael J. Wysocki
@ 2014-10-02 16:40 ` Shreyas B Prabhu
  0 siblings, 0 replies; 11+ messages in thread
From: Shreyas B Prabhu @ 2014-10-02 16:40 UTC
To: Rafael J. Wysocki
Cc: Srivatsa S. Bhat, linux-pm, linux-kernel, Paul Mackerras, Preeti U. Murthy, linuxppc-dev

On Thursday 02 October 2014 02:16 AM, Rafael J. Wysocki wrote:
> On Wednesday, October 01, 2014 01:15:57 PM Shreyas B. Prabhu wrote:
>> Fast sleep is an idle state, where the core and the L1 and L2
>> caches are brought down to a threshold voltage. This also means that
>> the communication between L2 and L3 caches have to be fenced. However
>> the current P8 chips have a bug wherein this fencing between L2 and
>> L3 caches get delayed by a cpu cycle. This can delay L3 response to
>> the other cpus if they request for data during this time. Thus they
>> would fetch the same data from the memory which could lead to data
>> corruption if L3 cache is not flushed.
>>
>> This series overcomes above problem in kernel.
>>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Paul Mackerras <paulus@samba.org>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
>> Cc: linux-pm@vger.kernel.org
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Cc: Srivatsa S. Bhat <srivatsa@mit.edu>
>> Cc: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
>> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
>>
>> v2:
>> Rebased on 3.17-rc7
>> Split from 'powerpc/powernv: Support for fastsleep and winkle'
>>
>> v1:
>> https://lkml.org/lkml/2014/8/25/446
>>
>> Preeti U Murthy (1):
>>   powerpc/powernv/cpuidle: Add workaround to enable fastsleep
>>
>> Shreyas B. Prabhu (1):
>>   powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from
>>     fast-sleep
>>
>> Srivatsa S. Bhat (1):
>>   powerpc/powernv: Enable Offline CPUs to enter deep idle states
>>
>>  arch/powerpc/include/asm/machdep.h             |   3 +
>>  arch/powerpc/include/asm/opal.h                |   7 ++
>>  arch/powerpc/include/asm/processor.h           |   4 +-
>>  arch/powerpc/kernel/exceptions-64s.S           |  35 ++++----
>>  arch/powerpc/kernel/idle.c                     |  19 ++++
>>  arch/powerpc/kernel/idle_power7.S              |   2 +-
>>  arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
>>  arch/powerpc/platforms/powernv/powernv.h       |   7 ++
>>  arch/powerpc/platforms/powernv/setup.c         | 118 +++++++++++++++++++++++++
>>  arch/powerpc/platforms/powernv/smp.c           |  11 ++-
>>  drivers/cpuidle/cpuidle-powernv.c              |  13 ++-
>>  11 files changed, 194 insertions(+), 26 deletions(-)
>
> [2/3] seems to be missing from the series.
>
> Also, since that mostly modifies arch/powerpc, I think it should go through
> that tree. I'm fine with the cpuidle-powernv changes in [1/3] and [3/3].
>

Hi Rafael,

Thanks for looking into this. The second patch is an independent fix in
the powerpc exception handler. To be safe, I am CCing you and the linux-pm
list on that patch now.

Thanks,
Shreyas
end of thread, other threads:[~2014-10-09 10:03 UTC | newest]

Thread overview: 11+ messages
2014-10-01  7:45 [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes Shreyas B. Prabhu
2014-10-01  7:45 ` [PATCH v2 1/3] powerpc/powernv: Enable Offline CPUs to enter deep idle states Shreyas B. Prabhu
2014-10-07  5:06 ` Benjamin Herrenschmidt
2014-10-01  7:45 ` [PATCH v2 2/3] powerpc/kvm/book3s_hv: Enable CPUs to run guest after waking up from fast-sleep Shreyas B. Prabhu
2014-10-02 16:39 ` Shreyas B Prabhu
2014-10-07  5:11 ` Benjamin Herrenschmidt
2014-10-09 10:03 ` Preeti U Murthy
2014-10-01  7:46 ` [PATCH v2 3/3] powerpc/powernv/cpuidle: Add workaround to enable fastsleep Shreyas B. Prabhu
2014-10-07  5:20 ` Benjamin Herrenschmidt
2014-10-01 20:46 ` [PATCH v2 0/3] powernv/cpuidle: Fastsleep workaround and fixes Rafael J. Wysocki
2014-10-02 16:40 ` Shreyas B Prabhu