* [PATCH 0/3] coupled cpuidle state support
@ 2011-12-21  0:09 Colin Cross
  2011-12-21  0:09 ` [PATCH 1/3] cpuidle: refactor out cpuidle_enter_state Colin Cross
                   ` (5 more replies)
  0 siblings, 6 replies; 36+ messages in thread
From: Colin Cross @ 2011-12-21  0:09 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-pm
Cc: Len Brown, Kevin Hilman, Santosh Shilimkar, Amit Kucheria,
    Arjan van de Ven, Trinabh Gupta, Deepthi Dharwar, linux-omap,
    linux-tegra, Colin Cross

On some ARM SMP SoCs (OMAP4460, Tegra 2, and probably more), the cpus
cannot be independently powered down, either due to sequencing
restrictions (on Tegra 2, cpu 0 must be the last to power down), or due
to HW bugs (on OMAP4460, a cpu powering up will corrupt the gic state
unless the other cpu runs a work-around).  Each cpu has a power state
that it can enter without coordinating with the other cpu (usually Wait
For Interrupt, or WFI), and one or more "coupled" power states that
affect blocks shared between the cpus (L2 cache, interrupt controller,
and sometimes the whole SoC).  Entering a coupled power state must be
tightly controlled on both cpus.

The easiest solution to implementing coupled cpu power states is to
hotplug all but one cpu whenever possible, usually using a cpufreq
governor that looks at cpu load to determine when to enable the
secondary cpus.  This causes problems, as hotplug is an expensive
operation, so the number of hotplug transitions must be minimized,
leading to very slow response to loads, often on the order of seconds.

This patch series implements an alternative solution, where each cpu
will wait in the WFI state until all cpus are ready to enter a coupled
state, at which point the coupled state function will be called on all
cpus at approximately the same time.

Once all cpus are ready to enter idle, they are woken by an smp cross
call.  At this point, there is a chance that one of the cpus will find
work to do, and choose not to enter suspend.  A final pass is needed to
guarantee that all cpus will call the power state enter function at the
same time.  During this pass, each cpu will increment the ready counter,
and continue once the ready counter matches the number of online coupled
cpus.  If any cpu exits idle, the other cpus will decrement their
counter and retry.

To use coupled cpuidle states, a cpuidle driver must:

   Set struct cpuidle_device.coupled_cpus to the mask of all coupled
   cpus, usually the same as cpu_possible_mask if all cpus are part of
   the same cluster.  The coupled_cpus mask must be set in the struct
   cpuidle_device for each cpu.

   Set struct cpuidle_device.safe_state to a state that is not a
   coupled state.  This is usually WFI.

   Set CPUIDLE_FLAG_COUPLED in struct cpuidle_state.flags for each
   state that affects multiple cpus.

   Provide a struct cpuidle_state.enter function for each state that
   affects multiple cpus.  This function is guaranteed to be called on
   all cpus at approximately the same time.  The driver should ensure
   that the cpus all abort together if any cpu tries to abort once the
   function is called.

This series was functionally tested on v3.0, but has only been
compile-tested on v3.2 after the removal of per-cpu state fields.

This patch set has a few disadvantages over the hotplug governor, but I
think they are all fairly minor:

 * Worst-case interrupt latency can be increased.  If one cpu receives
   an interrupt while the other is spinning in the ready_count loop,
   the second cpu will be stuck with interrupts off until the first
   cpu finishes processing its interrupt and exits idle.  This will
   increase the worst-case interrupt latency by the worst-case
   interrupt processing time, but should be very rare.

 * Interrupts are processed while still inside pm_idle.  Normally,
   interrupts are only processed at the very end of pm_idle, just
   before it returns to the idle loop.
   Coupled states require processing interrupts inside
   cpuidle_enter_state_coupled in order to distinguish between the
   smp_cross_call from another cpu that is now idle and an interrupt
   that should cause idle to exit.  I don't see a way to fix this
   without either being able to read the next pending irq from the
   interrupt chip, or querying the irq core for which interrupts were
   processed.

 * Since interrupts are processed inside cpuidle, the next timer event
   could change.  The new timer event will be handled correctly, but
   the idle state decision made by the governor will be out of date,
   and will not be revisited.  The governor select function could be
   called again every time, but this could lead to a lot of work being
   done by an idle cpu if the other cpu was mostly busy.

 * The spinlock that protects requested_state and ready_count should
   probably be replaced with careful use of atomics and barriers.

None of the platforms I work with have an SMP idle implementation
upstream, so I can't easily show a patch that converts a platform from
the hotplug governor to coupled cpuidle states.  Instead, I'll give a
quick example implementation, assuming functions that handle hotplug
and single-cpu idle already exist.

static int mach_enter_idle_coupled(struct cpuidle_device *dev,
		struct cpuidle_driver *drv, int index)
{
	ktime_t enter, exit;
	s64 us;
	int i;

	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &dev->cpu);

	enter = ktime_get();
	cpu_pm_enter();

	if (dev->cpu == 0) {
		/* wait for the secondary cpus to reach their reset state */
		for_each_online_cpu(i)
			while (i != dev->cpu && !mach_cpu_is_reset(i))
				cpu_relax();

		mach_cpu_idle();

		for_each_online_cpu(i)
			if (i != dev->cpu)
				mach_cpu_online(i);
	} else {
		mach_cpu_offline();
	}

	cpu_pm_exit();
	exit = ktime_sub(ktime_get(), enter);
	us = ktime_to_us(exit);

	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &dev->cpu);

	local_irq_enable();

	dev->last_residency = us;

	return index;
}

^ permalink raw reply	[flat|nested] 36+ messages in thread
* [PATCH 1/3] cpuidle: refactor out cpuidle_enter_state
  2011-12-21  0:09 [PATCH 0/3] coupled cpuidle state support Colin Cross
@ 2011-12-21  0:09 ` Colin Cross
  2012-01-04 14:08   ` [linux-pm] " Jean Pihet
  2011-12-21  0:09 ` [PATCH 2/3] cpuidle: fix error handling in __cpuidle_register_device Colin Cross
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 36+ messages in thread
From: Colin Cross @ 2011-12-21  0:09 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-pm
Cc: Len Brown, Kevin Hilman, Santosh Shilimkar, Amit Kucheria,
    Arjan van de Ven, Trinabh Gupta, Deepthi Dharwar, linux-omap,
    linux-tegra, Colin Cross

Split the code to enter a state and update the stats into a helper
function, cpuidle_enter_state, and export it.  This function will
be called by the coupled state code to handle entering the safe
state and the final coupled state.

Signed-off-by: Colin Cross <ccross@android.com>
---
 drivers/cpuidle/cpuidle.c |   43 +++++++++++++++++++++++++++++--------------
 drivers/cpuidle/cpuidle.h |    2 ++
 2 files changed, 31 insertions(+), 14 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 06ce268..1486b3c 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -54,6 +54,34 @@ static void cpuidle_kick_cpus(void) {}
 static int __cpuidle_register_device(struct cpuidle_device *dev);
 
 /**
+ * cpuidle_enter_state
+ *
+ * enter the state and update stats
+ */
+int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
+		int next_state)
+{
+	int entered_state;
+	struct cpuidle_state *target_state;
+
+	target_state = &drv->states[next_state];
+
+	entered_state = target_state->enter(dev, drv, next_state);
+
+	if (entered_state >= 0) {
+		/* Update cpuidle counters */
+		/* This can be moved to within driver enter routine
+		 * but that results in multiple copies of same code.
+		 */
+		dev->states_usage[entered_state].time +=
+				(unsigned long long)dev->last_residency;
+		dev->states_usage[entered_state].usage++;
+	}
+
+	return entered_state;
+}
+
+/**
  * cpuidle_idle_call - the main idle loop
  *
  * NOTE: no locks or semaphores should be used here
@@ -63,7 +91,6 @@ int cpuidle_idle_call(void)
 {
 	struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
 	struct cpuidle_driver *drv = cpuidle_get_driver();
-	struct cpuidle_state *target_state;
 	int next_state, entered_state;
 
 	if (off)
@@ -92,26 +119,14 @@ int cpuidle_idle_call(void)
 		return 0;
 	}
 
-	target_state = &drv->states[next_state];
-
 	trace_power_start(POWER_CSTATE, next_state, dev->cpu);
 	trace_cpu_idle(next_state, dev->cpu);
 
-	entered_state = target_state->enter(dev, drv, next_state);
+	entered_state = cpuidle_enter_state(dev, drv, next_state);
 
 	trace_power_end(dev->cpu);
 	trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
 
-	if (entered_state >= 0) {
-		/* Update cpuidle counters */
-		/* This can be moved to within driver enter routine
-		 * but that results in multiple copies of same code.
-		 */
-		dev->states_usage[entered_state].time +=
-			(unsigned long long)dev->last_residency;
-		dev->states_usage[entered_state].usage++;
-	}
-
 	/* give the governor an opportunity to reflect on the outcome */
 	if (cpuidle_curr_governor->reflect)
 		cpuidle_curr_governor->reflect(dev, entered_state);
diff --git a/drivers/cpuidle/cpuidle.h b/drivers/cpuidle/cpuidle.h
index 38c3fd8..dd2df8f 100644
--- a/drivers/cpuidle/cpuidle.h
+++ b/drivers/cpuidle/cpuidle.h
@@ -14,6 +14,8 @@ extern struct list_head cpuidle_detected_devices;
 extern struct mutex cpuidle_lock;
 extern spinlock_t cpuidle_driver_lock;
 extern int cpuidle_disabled(void);
+int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
+		int next_state);
 
 /* idle loop */
 extern void cpuidle_install_idle_handler(void);
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread
* Re: [linux-pm] [PATCH 1/3] cpuidle: refactor out cpuidle_enter_state
  2011-12-21  0:09 ` [PATCH 1/3] cpuidle: refactor out cpuidle_enter_state Colin Cross
@ 2012-01-04 14:08   ` Jean Pihet
  0 siblings, 0 replies; 36+ messages in thread
From: Jean Pihet @ 2012-01-04 14:08 UTC (permalink / raw)
To: Colin Cross
Cc: linux-kernel, linux-arm-kernel, linux-pm, Kevin Hilman, Len Brown,
    Amit Kucheria, linux-tegra, linux-omap, Arjan van de Ven

Hi Colin,

On Wed, Dec 21, 2011 at 1:09 AM, Colin Cross <ccross@android.com> wrote:
> Split the code to enter a state and update the stats into a helper
> function, cpuidle_enter_state, and export it.  This function will
> be called by the coupled state code to handle entering the safe
> state and the final coupled state.
>
> Signed-off-by: Colin Cross <ccross@android.com>
> ---
>  drivers/cpuidle/cpuidle.c |   43 +++++++++++++++++++++++++++++--------------
>  drivers/cpuidle/cpuidle.h |    2 ++
>  2 files changed, 31 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 06ce268..1486b3c 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -54,6 +54,34 @@ static void cpuidle_kick_cpus(void) {}
>  static int __cpuidle_register_device(struct cpuidle_device *dev);
>
>  /**
> + * cpuidle_enter_state
> + *
> + * enter the state and update stats
> + */
> +int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
> +               int next_state)
> +{
> +       int entered_state;
> +       struct cpuidle_state *target_state;
> +
> +       target_state = &drv->states[next_state];
> +
> +       entered_state = target_state->enter(dev, drv, next_state);
> +
> +       if (entered_state >= 0) {
> +               /* Update cpuidle counters */
> +               /* This can be moved to within driver enter routine
> +                * but that results in multiple copies of same code.
> +                */
> +               dev->states_usage[entered_state].time +=
> +                       (unsigned long long)dev->last_residency;
> +               dev->states_usage[entered_state].usage++;
> +       }
> +
> +       return entered_state;
> +}
> +
> +/**
>  * cpuidle_idle_call - the main idle loop
>  *
>  * NOTE: no locks or semaphores should be used here
> @@ -63,7 +91,6 @@ int cpuidle_idle_call(void)
>  {
>         struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
>         struct cpuidle_driver *drv = cpuidle_get_driver();
> -       struct cpuidle_state *target_state;
>         int next_state, entered_state;
>
>         if (off)
> @@ -92,26 +119,14 @@ int cpuidle_idle_call(void)
>                 return 0;
>         }
>
> -       target_state = &drv->states[next_state];
> -
>         trace_power_start(POWER_CSTATE, next_state, dev->cpu);
>         trace_cpu_idle(next_state, dev->cpu);
>
> -       entered_state = target_state->enter(dev, drv, next_state);
> +       entered_state = cpuidle_enter_state(dev, drv, next_state);
>
>         trace_power_end(dev->cpu);
>         trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);

The cpu_idle traces are only present in this function and not in
cpuidle_enter_state.  Is that expected?  Can all the transitions from
all the cpus get traced that way?

Regards,
Jean

^ permalink raw reply	[flat|nested] 36+ messages in thread
* [PATCH 2/3] cpuidle: fix error handling in __cpuidle_register_device
  2011-12-21  0:09 [PATCH 0/3] coupled cpuidle state support Colin Cross
  2011-12-21  0:09 ` [PATCH 1/3] cpuidle: refactor out cpuidle_enter_state Colin Cross
@ 2011-12-21  0:09 ` Colin Cross
  2011-12-21  0:09 ` [PATCH 3/3] cpuidle: add support for states that affect multiple cpus Colin Cross
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 36+ messages in thread
From: Colin Cross @ 2011-12-21  0:09 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-pm
Cc: Len Brown, Kevin Hilman, Santosh Shilimkar, Amit Kucheria,
    Arjan van de Ven, Trinabh Gupta, Deepthi Dharwar, linux-omap,
    linux-tegra, Colin Cross

Fix the error handling in __cpuidle_register_device to include the
missing list_del.  Move it to a label, which will simplify the error
handling when coupled states are added.

Signed-off-by: Colin Cross <ccross@android.com>
---
 drivers/cpuidle/cpuidle.c |   13 +++++++++----
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 1486b3c..ea00a16 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -318,13 +318,18 @@ static int __cpuidle_register_device(struct cpuidle_device *dev)
 	per_cpu(cpuidle_devices, dev->cpu) = dev;
 	list_add(&dev->device_list, &cpuidle_detected_devices);
 
-	if ((ret = cpuidle_add_sysfs(sys_dev))) {
-		module_put(cpuidle_driver->owner);
-		return ret;
-	}
+	ret = cpuidle_add_sysfs(sys_dev);
+	if (ret)
+		goto err_sysfs;
 
 	dev->registered = 1;
 	return 0;
+
+err_sysfs:
+	module_put(cpuidle_driver->owner);
+	list_del(&dev->device_list);
+	per_cpu(cpuidle_devices, dev->cpu) = NULL;
+	return ret;
 }
 
 /**
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread
* [PATCH 3/3] cpuidle: add support for states that affect multiple cpus
  2011-12-21  0:09 [PATCH 0/3] coupled cpuidle state support Colin Cross
  2011-12-21  0:09 ` [PATCH 1/3] cpuidle: refactor out cpuidle_enter_state Colin Cross
  2011-12-21  0:09 ` [PATCH 2/3] cpuidle: fix error handling in __cpuidle_register_device Colin Cross
@ 2011-12-21  0:09 ` Colin Cross
  2011-12-21  9:02 ` [PATCH 0/3] coupled cpuidle state support Arjan van de Ven
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 36+ messages in thread
From: Colin Cross @ 2011-12-21  0:09 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-pm
Cc: Len Brown, Kevin Hilman, Santosh Shilimkar, Amit Kucheria,
    Arjan van de Ven, Trinabh Gupta, Deepthi Dharwar, linux-omap,
    linux-tegra, Colin Cross

On some ARM SMP SoCs (OMAP4460, Tegra 2, and probably more), the cpus
cannot be independently powered down, either due to sequencing
restrictions (on Tegra 2, cpu 0 must be the last to power down), or due
to HW bugs (on OMAP4460, a cpu powering up will corrupt the gic state
unless the other cpu runs a work-around).  Each cpu has a power state
that it can enter without coordinating with the other cpu (usually Wait
For Interrupt, or WFI), and one or more "coupled" power states that
affect blocks shared between the cpus (L2 cache, interrupt controller,
and sometimes the whole SoC).  Entering a coupled power state must be
tightly controlled on both cpus.

The easiest solution to implementing coupled cpu power states is to
hotplug all but one cpu whenever possible, usually using a cpufreq
governor that looks at cpu load to determine when to enable the
secondary cpus.  This causes problems, as hotplug is an expensive
operation, so the number of hotplug transitions must be minimized,
leading to very slow response to loads, often on the order of seconds.

This file implements an alternative solution, where each cpu will wait
in the WFI state until all cpus are ready to enter a coupled state, at
which point the coupled state function will be called on all cpus at
approximately the same time.

Once all cpus are ready to enter idle, they are woken by an smp cross
call.  At this point, there is a chance that one of the cpus will find
work to do, and choose not to enter suspend.  A final pass is needed to
guarantee that all cpus will call the power state enter function at the
same time.  During this pass, each cpu will increment the ready counter,
and continue once the ready counter matches the number of online coupled
cpus.  If any cpu exits idle, the other cpus will decrement their
counter and retry.

To use coupled cpuidle states, a cpuidle driver must:

   Set struct cpuidle_device.coupled_cpus to the mask of all coupled
   cpus, usually the same as cpu_possible_mask if all cpus are part of
   the same cluster.  The coupled_cpus mask must be set in the struct
   cpuidle_device for each cpu.

   Set struct cpuidle_device.safe_state to a state that is not a
   coupled state.  This is usually WFI.

   Set CPUIDLE_FLAG_COUPLED in struct cpuidle_state.flags for each
   state that affects multiple cpus.

   Provide a struct cpuidle_state.enter function for each state that
   affects multiple cpus.  This function is guaranteed to be called on
   all cpus at approximately the same time.  The driver should ensure
   that the cpus all abort together if any cpu tries to abort once the
   function is called.

Signed-off-by: Colin Cross <ccross@android.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Amit Kucheria <amit.kucheria@linaro.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Trinabh Gupta <g.trinabh@gmail.com>
Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
---
 drivers/cpuidle/Kconfig   |    3 +
 drivers/cpuidle/Makefile  |    1 +
 drivers/cpuidle/coupled.c |  413 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/cpuidle/cpuidle.c |   14 ++-
 drivers/cpuidle/cpuidle.h |   39 +++++
 include/linux/cpuidle.h   |    7 +
 6 files changed, 476 insertions(+), 1 deletions(-)
 create mode 100644 drivers/cpuidle/coupled.c

diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig
index 7dbc4a8..7a72e55 100644
--- a/drivers/cpuidle/Kconfig
+++ b/drivers/cpuidle/Kconfig
@@ -18,3 +18,6 @@ config CPU_IDLE_GOV_MENU
 	bool
 	depends on CPU_IDLE && NO_HZ
 	default y
+
+config ARCH_NEEDS_CPU_IDLE_COUPLED
+	def_bool n
diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
index 5634f88..38c8f69 100644
--- a/drivers/cpuidle/Makefile
+++ b/drivers/cpuidle/Makefile
@@ -3,3 +3,4 @@
 #
 
 obj-y += cpuidle.o driver.o governor.o sysfs.o governors/
+obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o
diff --git a/drivers/cpuidle/coupled.c b/drivers/cpuidle/coupled.c
new file mode 100644
index 0000000..3fb7d24
--- /dev/null
+++ b/drivers/cpuidle/coupled.c
@@ -0,0 +1,413 @@
+/*
+ * coupled.c - helper functions to enter the same idle state on multiple cpus
+ *
+ * Copyright (c) 2011 Google, Inc.
+ *
+ * Author: Colin Cross <ccross@android.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/cpu.h>
+#include <linux/cpuidle.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+#include "cpuidle.h"
+
+/*
+ * coupled cpuidle states
+ *
+ * On some ARM SMP SoCs (OMAP4460, Tegra 2, and probably more), the
+ * cpus cannot be independently powered down, either due to
+ * sequencing restrictions (on Tegra 2, cpu 0 must be the last to
+ * power down), or due to HW bugs (on OMAP4460, a cpu powering up
+ * will corrupt the gic state unless the other cpu runs a work
+ * around).  Each cpu has a power state that it can enter without
+ * coordinating with the other cpu (usually Wait For Interrupt, or
+ * WFI), and one or more "coupled" power states that affect blocks
+ * shared between the cpus (L2 cache, interrupt controller, and
+ * sometimes the whole SoC).  Entering a coupled power state must
+ * be tightly controlled on both cpus.
+ *
+ * The easiest solution to implementing coupled cpu power states is
+ * to hotplug all but one cpu whenever possible, usually using a
+ * cpufreq governor that looks at cpu load to determine when to
+ * enable the secondary cpus.  This causes problems, as hotplug is an
+ * expensive operation, so the number of hotplug transitions must be
+ * minimized, leading to very slow response to loads, often on the
+ * order of seconds.
+ *
+ * This file implements an alternative solution, where each cpu will
+ * wait in the WFI state until all cpus are ready to enter a coupled
+ * state, at which point the coupled state function will be called
+ * on all cpus at approximately the same time.
+ *
+ * Once all cpus are ready to enter idle, they are woken by an smp
+ * cross call.  At this point, there is a chance that one of the
+ * cpus will find work to do, and choose not to enter suspend.  A
+ * final pass is needed to guarantee that all cpus will call the
+ * power state enter function at the same time.  During this pass,
+ * each cpu will increment the ready counter, and continue once the
+ * ready counter matches the number of online coupled cpus.  If any
+ * cpu exits idle, the other cpus will decrement their counter and
+ * retry.
+ *
+ * To use coupled cpuidle states, a cpuidle driver must:
+ *
+ *    Set struct cpuidle_device.coupled_cpus to the mask of all
+ *    coupled cpus, usually the same as cpu_possible_mask if all cpus
+ *    are part of the same cluster.  The coupled_cpus mask must be
+ *    set in the struct cpuidle_device for each cpu.
+ *
+ *    Set struct cpuidle_device.safe_state to a state that is not a
+ *    coupled state.  This is usually WFI.
+ *
+ *    Set CPUIDLE_FLAG_COUPLED in struct cpuidle_state.flags for each
+ *    state that affects multiple cpus.
+ *
+ *    Provide a struct cpuidle_state.enter function for each state
+ *    that affects multiple cpus.  This function is guaranteed to be
+ *    called on all cpus at approximately the same time.  The driver
+ *    should ensure that the cpus all abort together if any cpu tries
+ *    to abort once the function is called.
+ *
+ */
+
+static DEFINE_MUTEX(cpuidle_coupled_lock);
+static DEFINE_PER_CPU(struct call_single_data, cpuidle_coupled_poke_cb);
+static cpumask_t cpuidle_coupled_poked_mask;
+
+/**
+ * cpuidle_state_is_coupled
+ *
+ * Returns true if the target state is coupled with cpus besides this one
+ */
+bool cpuidle_state_is_coupled(struct cpuidle_device *dev,
+	struct cpuidle_driver *drv, int state)
+{
+	return drv->states[state].flags & CPUIDLE_FLAG_COUPLED;
+}
+
+/**
+ * cpuidle_coupled_cpus_idle
+ *
+ * Returns true if all cpus coupled to this target state are idle
+ */
+static inline bool
+cpuidle_coupled_cpus_idle(struct cpuidle_coupled *coupled)
+{
+	int i;
+
+	assert_spin_locked(&coupled->lock);
+
+	smp_rmb();
+
+	for_each_cpu_mask(i, coupled->alive_coupled_cpus)
+		if (coupled->requested_state[i] < 0)
+			return false;
+
+	return true;
+}
+
+/**
+ * cpuidle_coupled_get_state
+ *
+ * Returns the deepest idle state that all coupled cpus can enter
+ */
+static inline int cpuidle_coupled_get_state(struct cpuidle_device *dev,
+	struct cpuidle_coupled *coupled)
+{
+	int i;
+	int state = INT_MAX;
+
+	assert_spin_locked(&coupled->lock);
+
+	for_each_cpu_mask(i, coupled->alive_coupled_cpus)
+		if (coupled->requested_state[i] < state)
+			state = coupled->requested_state[i];
+
+	BUG_ON(state >= dev->state_count || state < 0);
+
+	return state;
+}
+
+static void cpuidle_coupled_poked(void *info)
+{
+	int cpu = (unsigned long)info;
+	cpumask_clear_cpu(cpu, &cpuidle_coupled_poked_mask);
+}
+
+static void cpuidle_coupled_poke(int cpu)
+{
+	struct call_single_data *csd = &per_cpu(cpuidle_coupled_poke_cb, cpu);
+	if (cpu_online(cpu))
+		if (!cpumask_test_and_set_cpu(cpu, &cpuidle_coupled_poked_mask))
+			__smp_call_function_single(cpu, csd, 0);
+}
+
+/**
+ * cpuidle_coupled_update_state
+ *
+ * Updates the requested idle state for the specified cpuidle device,
+ * poking all coupled cpus out of idle to let them see the new state.
+ */
+static void cpuidle_coupled_update_state(struct cpuidle_device *dev,
+	struct cpuidle_coupled *coupled, int next_state)
+{
+	int cpu;
+
+	assert_spin_locked(&coupled->lock);
+
+	coupled->requested_state[dev->cpu] = next_state;
+	smp_wmb();
+
+	if (next_state >= 0)
+		for_each_cpu_mask(cpu, coupled->alive_coupled_cpus)
+			if (cpu != dev->cpu)
+				cpuidle_coupled_poke(cpu);
+}
+
+/**
+ * cpuidle_enter_state_coupled
+ *
+ * Coordinate with coupled cpus to enter the target state.  This is a two
+ * stage process.  In the first stage, the cpus are operating independently,
+ * and may call into cpuidle_enter_state_coupled at completely different times.
+ * To save as much power as possible, the first cpus to call this function will
+ * go to an intermediate state (the cpuidle_device's safe state), and wait for
+ * all the other cpus to call this function.  Once all coupled cpus are idle,
+ * the second stage will start.  Each coupled cpu will spin until all cpus have
+ * guaranteed that they will call the target_state.
+ */
+int cpuidle_enter_state_coupled(struct cpuidle_device *dev,
+	struct cpuidle_driver *drv, int next_state)
+{
+	int entered_state = -1;
+	struct cpuidle_coupled *coupled = dev->coupled;
+
+	spin_lock(&coupled->lock);
+	if (!cpumask_test_cpu(dev->cpu, &coupled->alive_coupled_cpus)) {
+		/*
+		 * somebody took us out of the online coupled cpus mask, we
+		 * must be on the way up or down
+		 */
+		spin_unlock(&coupled->lock);
+		return -1;
+	}
+
+	BUG_ON(coupled->ready_count);
+	cpuidle_coupled_update_state(dev, coupled, next_state);
+
+retry:
+	/*
+	 * Wait for all coupled cpus to be idle, using the deepest state
+	 * allowed for a single cpu.
+	 */
+	while (!need_resched() && !cpuidle_coupled_cpus_idle(coupled)) {
+		spin_unlock(&coupled->lock);
+
+		entered_state = cpuidle_enter_state(dev, drv,
+			dev->safe_state_index);
+
+		local_irq_enable();
+		cpu_relax();
+		local_irq_disable();
+
+		spin_lock(&coupled->lock);
+	}
+
+	/* give a chance to process any remaining pokes */
+	local_irq_enable();
+	cpu_relax();
+	local_irq_disable();
+
+	if (need_resched()) {
+		cpuidle_coupled_update_state(dev, coupled, -1);
+		goto out;
+	}
+
+	/*
+	 * All coupled cpus are probably idle.  There is a small chance that
+	 * one of the other cpus just became active.  Increment a counter when
+	 * ready, and spin until all coupled cpus have incremented the counter.
+	 * Once a cpu has incremented the counter, it cannot abort idle and must
+	 * spin until either the count has hit num_online_cpus(), or another
+	 * cpu leaves idle.
+	 */
+
+	coupled->ready_count++;
+
+	while (coupled->ready_count !=
+			cpumask_weight(&coupled->alive_coupled_cpus)) {
+		if (!cpuidle_coupled_cpus_idle(coupled)) {
+			coupled->ready_count--;
+			goto retry;
+		}
+
+		spin_unlock(&coupled->lock);
+		cpu_relax();
+		spin_lock(&coupled->lock);
+	}
+
+	/* all cpus have acked the coupled state */
+	next_state = cpuidle_coupled_get_state(dev, coupled);
+	spin_unlock(&coupled->lock);
+
+	entered_state = cpuidle_enter_state(dev, drv, next_state);
+
+	spin_lock(&coupled->lock);
+
+	cpuidle_coupled_update_state(dev, coupled, -1);
+	coupled->ready_count--;
+
+out:
+	local_irq_enable();
+
+	while (coupled->ready_count > 0) {
+		spin_unlock(&coupled->lock);
+		cpu_relax();
+		spin_lock(&coupled->lock);
+	}
+
+	spin_unlock(&coupled->lock);
+
+	return entered_state;
+}
+
+/**
+ * cpuidle_coupled_register_device
+ *
+ * Called from cpuidle_register_device to handle coupled idle init.  Finds the
+ * cpuidle_coupled struct for this set of coupled cpus, or creates one if none
+ * exists yet.
+ */
+int cpuidle_coupled_register_device(struct cpuidle_device *dev)
+{
+	int cpu;
+	struct cpuidle_device *other_dev;
+	struct call_single_data *csd;
+
+	if (cpumask_empty(&dev->coupled_cpus))
+		return 0;
+
+	for_each_cpu_mask(cpu, dev->coupled_cpus) {
+		other_dev = per_cpu(cpuidle_devices, cpu);
+		if (other_dev && other_dev->coupled) {
+			BUG_ON(!cpumask_equal(&dev->coupled_cpus,
+				&other_dev->coupled_cpus));
+			dev->coupled = other_dev->coupled;
+			goto have_coupled;
+		}
+	}
+
+	/* No existing coupled info found, create a new one */
+	dev->coupled = kzalloc(sizeof(struct cpuidle_coupled), GFP_KERNEL);
+	if (!dev->coupled)
+		return -ENOMEM;
+
+	spin_lock_init(&dev->coupled->lock);
+
+have_coupled:
+	spin_lock(&dev->coupled->lock);
+
+	dev->coupled->requested_state[dev->cpu] = -1;
+
+	if (cpu_online(dev->cpu))
+		cpumask_set_cpu(dev->cpu, &dev->coupled->alive_coupled_cpus);
+	dev->coupled->refcnt++;
+
+	csd = &per_cpu(cpuidle_coupled_poke_cb, dev->cpu);
+	csd->func = cpuidle_coupled_poked;
+	csd->info = (void *)(unsigned long)dev->cpu;
+
+	spin_unlock(&dev->coupled->lock);
+
+	return 0;
+}
+
+/**
+ * cpuidle_coupled_unregister_device
+ *
+ * Called from cpuidle_unregister_device to tear down coupled idle.  Removes the
+ * cpu from the coupled idle set, and frees the cpuidle_coupled struct if
+ * this was the last cpu in the set.
+ */
+void cpuidle_coupled_unregister_device(struct cpuidle_device *dev)
+{
+	if (cpumask_empty(&dev->coupled_cpus))
+		return;
+
+	cpumask_clear_cpu(dev->cpu, &dev->coupled->alive_coupled_cpus);
+	if (!--dev->coupled->refcnt)
+		kfree(dev->coupled);
+
+	dev->coupled = NULL;
+}
+
+static void cpuidle_coupled_cpu_set_alive(int cpu, bool online)
+{
+	struct cpuidle_device *dev;
+
+	mutex_lock(&cpuidle_lock);
+
+	dev = per_cpu(cpuidle_devices, cpu);
+	if (!dev->coupled)
+		goto out;
+
+	spin_lock(&dev->coupled->lock);
+
+	if (online)
+		cpumask_set_cpu(dev->cpu, &dev->coupled->alive_coupled_cpus);
+	else
+		cpumask_clear_cpu(dev->cpu, &dev->coupled->alive_coupled_cpus);
+
+	spin_unlock(&dev->coupled->lock);
+
+out:
+	mutex_unlock(&cpuidle_lock);
+}
+
+/**
+ * cpuidle_coupled_cpu_notify
+ *
+ * Called when a cpu is brought on or offline using hotplug.  Updates the
+ * coupled cpu set appropriately
+ */
+static int cpuidle_coupled_cpu_notify(struct notifier_block *nb,
+	unsigned long action, void *hcpu)
+{
+	int cpu = (unsigned long)hcpu;
+
+	switch (action & ~CPU_TASKS_FROZEN) {
+	case CPU_DEAD:
+	case CPU_UP_CANCELED:
+		cpuidle_coupled_cpu_set_alive(cpu, false);
+		break;
+	case CPU_UP_PREPARE:
+		cpuidle_coupled_cpu_set_alive(cpu, true);
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block cpuidle_coupled_cpu_notifier = {
+	.notifier_call = cpuidle_coupled_cpu_notify,
+};
+
+static int __init cpuidle_coupled_init(void)
+{
+	return register_cpu_notifier(&cpuidle_coupled_cpu_notifier);
+}
+core_initcall(cpuidle_coupled_init);
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index ea00a16..e3d61b2 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -122,7 +122,10 @@ int cpuidle_idle_call(void)
 	trace_power_start(POWER_CSTATE, next_state, dev->cpu);
 	trace_cpu_idle(next_state, dev->cpu);
 
-	entered_state = cpuidle_enter_state(dev, drv, next_state);
+	if (cpuidle_state_is_coupled(dev, drv, next_state))
+		entered_state =
cpuidle_enter_state_coupled(dev, drv, next_state);
+	else
+		entered_state = cpuidle_enter_state(dev, drv, next_state);
 
 	trace_power_end(dev->cpu);
 	trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
@@ -322,9 +325,16 @@ static int __cpuidle_register_device(struct cpuidle_device *dev)
 	if (ret)
 		goto err_sysfs;
 
+	ret = cpuidle_coupled_register_device(dev);
+	if (ret)
+		goto err_coupled;
+
 	dev->registered = 1;
 	return 0;
 
+err_coupled:
+	cpuidle_remove_sysfs(sys_dev);
+	wait_for_completion(&dev->kobj_unregister);
 err_sysfs:
 	module_put(cpuidle_driver->owner);
 	list_del(&dev->device_list);
@@ -379,6 +389,8 @@ void cpuidle_unregister_device(struct cpuidle_device *dev)
 	wait_for_completion(&dev->kobj_unregister);
 	per_cpu(cpuidle_devices, dev->cpu) = NULL;
 
+	cpuidle_coupled_unregister_device(dev);
+
 	cpuidle_resume_and_unlock();
 
 	module_put(cpuidle_driver->owner);
diff --git a/drivers/cpuidle/cpuidle.h b/drivers/cpuidle/cpuidle.h
index dd2df8f..55a0c6f 100644
--- a/drivers/cpuidle/cpuidle.h
+++ b/drivers/cpuidle/cpuidle.h
@@ -32,4 +32,43 @@ extern void cpuidle_remove_state_sysfs(struct cpuidle_device *device);
 extern int cpuidle_add_sysfs(struct sys_device *sysdev);
 extern void cpuidle_remove_sysfs(struct sys_device *sysdev);
 
+/* coupled states */
+struct cpuidle_coupled {
+	spinlock_t lock;
+	int requested_state[NR_CPUS];
+	int ready_count;
+	cpumask_t alive_coupled_cpus;
+	int refcnt;
+};
+
+#ifdef CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED
+bool cpuidle_state_is_coupled(struct cpuidle_device *dev,
+	struct cpuidle_driver *drv, int state);
+int cpuidle_enter_state_coupled(struct cpuidle_device *dev,
+	struct cpuidle_driver *drv, int next_state);
+int cpuidle_coupled_register_device(struct cpuidle_device *dev);
+void cpuidle_coupled_unregister_device(struct cpuidle_device *dev);
+#else
+static inline bool cpuidle_state_is_coupled(struct cpuidle_device *dev,
+	struct cpuidle_driver *drv, int state)
+{
+	return false;
+}
+
+static inline int cpuidle_enter_state_coupled(struct cpuidle_device *dev,
+	struct cpuidle_driver *drv, int next_state)
+{
+	return -1;
+}
+
+static inline int cpuidle_coupled_register_device(struct cpuidle_device *dev)
+{
+	return 0;
+}
+
+static inline void cpuidle_coupled_unregister_device(struct cpuidle_device *dev)
+{
+}
+#endif
+
 #endif /* __DRIVER_CPUIDLE_H */
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 7408af8..5438a09 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -53,6 +53,7 @@ struct cpuidle_state {
 
 /* Idle State Flags */
 #define CPUIDLE_FLAG_TIME_VALID	(0x01) /* is residency time measurable? */
+#define CPUIDLE_FLAG_COUPLED	(0x02) /* state applies to multiple cpus */
 
 #define CPUIDLE_DRIVER_FLAGS_MASK	(0xFFFF0000)
 
@@ -97,6 +98,12 @@ struct cpuidle_device {
 	struct kobject		kobj;
 	struct completion	kobj_unregister;
 	void *governor_data;
+
+#ifdef CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED
+	int safe_state_index;
+	cpumask_t coupled_cpus;
+	struct cpuidle_coupled *coupled;
+#endif
 };
 
 DECLARE_PER_CPU(struct cpuidle_device *, cpuidle_devices);
-- 
1.7.4.1

^ permalink raw reply related [flat|nested] 36+ messages in thread
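[Editor's note: taken together, the cover letter's checklist maps onto the interface added by this patch roughly as in the sketch below. This is an illustrative, untested kernel-side fragment, not a real driver: all foo_* names are made up, error handling is omitted, and a real platform would also need to select CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED and provide the enter callbacks.]

```c
/* Illustrative sketch only -- all foo_* names are hypothetical. */
static DEFINE_PER_CPU(struct cpuidle_device, foo_idle_dev);

static struct cpuidle_driver foo_idle_driver = {
	.name		= "foo_idle",
	.states		= {
		[0] = {	/* per-cpu safe state, typically WFI */
			.enter	= foo_enter_wfi,
			.flags	= CPUIDLE_FLAG_TIME_VALID,
		},
		[1] = {	/* affects shared blocks, so mark it coupled */
			.enter	= foo_enter_coupled,
			.flags	= CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED,
		},
	},
	.state_count	= 2,
};

static int __init foo_idle_init(void)
{
	int cpu;

	cpuidle_register_driver(&foo_idle_driver);
	for_each_possible_cpu(cpu) {
		struct cpuidle_device *dev = &per_cpu(foo_idle_dev, cpu);

		dev->cpu = cpu;
		/* every cpu in the cluster must idle together */
		cpumask_copy(&dev->coupled_cpus, cpu_possible_mask);
		/* index of the non-coupled state used while waiting */
		dev->safe_state_index = 0;
		cpuidle_register_device(dev);
	}
	return 0;
}
```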
* Re: [PATCH 0/3] coupled cpuidle state support 2011-12-21 0:09 [PATCH 0/3] coupled cpuidle state support Colin Cross ` (2 preceding siblings ...) 2011-12-21 0:09 ` [PATCH 3/3] cpuidle: add support for states that affect multiple cpus Colin Cross @ 2011-12-21 9:02 ` Arjan van de Ven 2011-12-21 9:40 ` Colin Cross 2012-01-04 0:41 ` Kevin Hilman 2012-01-20 8:46 ` Daniel Lezcano 5 siblings, 1 reply; 36+ messages in thread From: Arjan van de Ven @ 2011-12-21 9:02 UTC (permalink / raw) To: Colin Cross Cc: linux-kernel, linux-arm-kernel, linux-pm, Len Brown, Kevin Hilman, Santosh Shilimkar, Amit Kucheria, Trinabh Gupta, Deepthi Dharwar, linux-omap, linux-tegra On 12/21/2011 1:09 AM, Colin Cross wrote: > On some ARM SMP SoCs (OMAP4460, Tegra 2, and probably more), the > cpus cannot be independently powered down, either due to > sequencing restrictions (on Tegra 2, cpu 0 must be the last to > power down), or due to HW bugs (on OMAP4460, a cpu powering up > will corrupt the gic state unless the other cpu runs a work > around). Each cpu has a power state that it can enter without > coordinating with the other cpu (usually Wait For Interrupt, or > WFI), and one or more "coupled" power states that affect blocks > shared between the cpus (L2 cache, interrupt controller, and > sometimes the whole SoC). Entering a coupled power state must > be tightly controlled on both cpus. > > The easiest solution to implementing coupled cpu power states is > to hotplug all but one cpu whenever possible, usually using a > cpufreq governor that looks at cpu load to determine when to > enable the secondary cpus. This causes problems, as hotplug is an > expensive operation, so the number of hotplug transitions must be > minimized, leading to very slow response to loads, often on the > order of seconds. 
> > This patch series implements an alternative solution, where each > cpu will wait in the WFI state until all cpus are ready to enter > a coupled state, at which point the coupled state function will > be called on all cpus at approximately the same time. > > Once all cpus are ready to enter idle, they are woken by an smp > cross call. At this point, there is a chance that one of the > cpus will find work to do, and choose not to enter suspend. A > final pass is needed to guarantee that all cpus will call the > power state enter function at the same time. During this pass, > each cpu will increment the ready counter, and continue once the > ready counter matches the number of online coupled cpus. If any > cpu exits idle, the other cpus will decrement their counter and > retry. this smells fundamentally racey to me; you can get an interrupt one cycle after you think you're done, but before the last guy enters WFI... how do you solve that issue ? ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 0/3] coupled cpuidle state support 2011-12-21 9:02 ` [PATCH 0/3] coupled cpuidle state support Arjan van de Ven @ 2011-12-21 9:40 ` Colin Cross 2011-12-21 9:44 ` Arjan van de Ven 0 siblings, 1 reply; 36+ messages in thread From: Colin Cross @ 2011-12-21 9:40 UTC (permalink / raw) To: Arjan van de Ven Cc: linux-kernel, linux-arm-kernel, linux-pm, Len Brown, Kevin Hilman, Santosh Shilimkar, Amit Kucheria, Trinabh Gupta, Deepthi Dharwar, linux-omap, linux-tegra On Wed, Dec 21, 2011 at 1:02 AM, Arjan van de Ven <arjan@linux.intel.com> wrote: > On 12/21/2011 1:09 AM, Colin Cross wrote: >> On some ARM SMP SoCs (OMAP4460, Tegra 2, and probably more), the >> cpus cannot be independently powered down, either due to >> sequencing restrictions (on Tegra 2, cpu 0 must be the last to >> power down), or due to HW bugs (on OMAP4460, a cpu powering up >> will corrupt the gic state unless the other cpu runs a work >> around). Each cpu has a power state that it can enter without >> coordinating with the other cpu (usually Wait For Interrupt, or >> WFI), and one or more "coupled" power states that affect blocks >> shared between the cpus (L2 cache, interrupt controller, and >> sometimes the whole SoC). Entering a coupled power state must >> be tightly controlled on both cpus. >> >> The easiest solution to implementing coupled cpu power states is >> to hotplug all but one cpu whenever possible, usually using a >> cpufreq governor that looks at cpu load to determine when to >> enable the secondary cpus. This causes problems, as hotplug is an >> expensive operation, so the number of hotplug transitions must be >> minimized, leading to very slow response to loads, often on the >> order of seconds. >> >> This patch series implements an alternative solution, where each >> cpu will wait in the WFI state until all cpus are ready to enter >> a coupled state, at which point the coupled state function will >> be called on all cpus at approximately the same time. 
>> >> Once all cpus are ready to enter idle, they are woken by an smp >> cross call. At this point, there is a chance that one of the >> cpus will find work to do, and choose not to enter suspend. A >> final pass is needed to guarantee that all cpus will call the >> power state enter function at the same time. During this pass, >> each cpu will increment the ready counter, and continue once the >> ready counter matches the number of online coupled cpus. If any >> cpu exits idle, the other cpus will decrement their counter and >> retry. > > this smells fundamentally racey to me; you can get an interrupt one > cycle after you think you're done, but before the last guy enters WFI... > > how do you solve that issue ? All the cpus have interrupts off when they increment the counter, so they cannot receive an interrupt. If an interrupt is pending on one of those cpus, it will be handled later when WFI aborts due to the pending interrupt. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 0/3] coupled cpuidle state support 2011-12-21 9:40 ` Colin Cross @ 2011-12-21 9:44 ` Arjan van de Ven 2011-12-21 9:55 ` Colin Cross 0 siblings, 1 reply; 36+ messages in thread From: Arjan van de Ven @ 2011-12-21 9:44 UTC (permalink / raw) To: Colin Cross Cc: linux-kernel, linux-arm-kernel, linux-pm, Len Brown, Kevin Hilman, Santosh Shilimkar, Amit Kucheria, Trinabh Gupta, Deepthi Dharwar, linux-omap, linux-tegra On 12/21/2011 10:40 AM, Colin Cross wrote: >> this smells fundamentally racey to me; you can get an interrupt one >> cycle after you think you're done, but before the last guy enters WFI... >> >> how do you solve that issue ? > > All the cpus have interrupts off when they increment the counter, so > they cannot receive an interrupt. If an interrupt is pending on one > of those cpus, it will be handled later when WFI aborts due to the > pending interrupt. ... but this leads to cases where you're aborting before other cpus are entering..... so your "last guy in" doesn't really work, since while cpu 0 thinks it's the last guy, cpu 1 is already on the way out/out already... (heck it might already be going back to sleep if your idle code can run fast, like in the size of a cache miss) ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 0/3] coupled cpuidle state support 2011-12-21 9:44 ` Arjan van de Ven @ 2011-12-21 9:55 ` Colin Cross 2011-12-21 12:12 ` Arjan van de Ven 0 siblings, 1 reply; 36+ messages in thread From: Colin Cross @ 2011-12-21 9:55 UTC (permalink / raw) To: Arjan van de Ven Cc: linux-kernel, linux-arm-kernel, linux-pm, Len Brown, Kevin Hilman, Santosh Shilimkar, Amit Kucheria, Trinabh Gupta, Deepthi Dharwar, linux-omap, linux-tegra On Wed, Dec 21, 2011 at 1:44 AM, Arjan van de Ven <arjan@linux.intel.com> wrote: > On 12/21/2011 10:40 AM, Colin Cross wrote: > >>> this smells fundamentally racey to me; you can get an interrupt one >>> cycle after you think you're done, but before the last guy enters WFI... >>> >>> how do you solve that issue ? >> >> All the cpus have interrupts off when they increment the counter, so >> they cannot receive an interrupt. If an interrupt is pending on one >> of those cpus, it will be handled later when WFI aborts due to the >> pending interrupt. > > ... but this leads to cases where you're aborting before other cpus are > entering..... so your "last guy in" doesn't really work, since while cpu > 0 thinks it's the last guy, cpu 1 is already on the way out/out > already... (heck it might already be going back to sleep if your idle > code can run fast, like in the size of a cache miss) Once a cpu has incremented the counter, it has no way out unless either 1: another cpu (that hasn't incremented the counter yet) receives an interrupt, aborts idle, and clears its idle flag or 2: all cpus enter the ready counter, and call the cpuidle driver's enter function. 
In your example, cpu 1 has incremented the counter, so it cannot be on the way out unless cpu 0 aborts (in which case it will not increment the counter, and the counter will never be equal to the number of cpus), or unless cpu 0 turns off its interrupts and increments the counter (in which case neither cpu can return until after the cpuidle driver's enter function has been called on all cpus). ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 0/3] coupled cpuidle state support 2011-12-21 9:55 ` Colin Cross @ 2011-12-21 12:12 ` Arjan van de Ven 2011-12-21 19:05 ` Colin Cross 2012-03-14 0:39 ` Colin Cross 0 siblings, 2 replies; 36+ messages in thread From: Arjan van de Ven @ 2011-12-21 12:12 UTC (permalink / raw) To: Colin Cross Cc: linux-kernel, linux-arm-kernel, linux-pm, Len Brown, Kevin Hilman, Santosh Shilimkar, Amit Kucheria, Trinabh Gupta, Deepthi Dharwar, linux-omap, linux-tegra On 12/21/2011 10:55 AM, Colin Cross wrote: > On Wed, Dec 21, 2011 at 1:44 AM, Arjan van de Ven <arjan@linux.intel.com> wrote: >> On 12/21/2011 10:40 AM, Colin Cross wrote: >> >>>> this smells fundamentally racey to me; you can get an interrupt one >>>> cycle after you think you're done, but before the last guy enters WFI... >>>> >>>> how do you solve that issue ? >>> >>> All the cpus have interrupts off when they increment the counter, so >>> they cannot receive an interrupt. If an interrupt is pending on one >>> of those cpus, it will be handled later when WFI aborts due to the >>> pending interrupt. >> >> ... but this leads to cases where you're aborting before other cpus are >> entering..... so your "last guy in" doesn't really work, since while cpu >> 0 thinks it's the last guy, cpu 1 is already on the way out/out >> already... (heck it might already be going back to sleep if your idle >> code can run fast, like in the size of a cache miss) > > Once a cpu has incremented the counter, it has no way out unless either > 1: another cpu (that hasn't incremented the counter yet) receives an > interrupt, aborts idle, and clears its idle flag > or > 2: all cpus enter the ready counter, and call the cpuidle driver's > enter function. .. or it enters WFI, and a physical device sends it an interrupt, at which point it exits. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 0/3] coupled cpuidle state support 2011-12-21 12:12 ` Arjan van de Ven @ 2011-12-21 19:05 ` Colin Cross 2011-12-21 19:36 ` Arjan van de Ven 2012-03-14 0:39 ` Colin Cross 1 sibling, 1 reply; 36+ messages in thread From: Colin Cross @ 2011-12-21 19:05 UTC (permalink / raw) To: Arjan van de Ven Cc: linux-kernel, linux-arm-kernel, linux-pm, Len Brown, Kevin Hilman, Santosh Shilimkar, Amit Kucheria, Trinabh Gupta, Deepthi Dharwar, linux-omap, linux-tegra On Wed, Dec 21, 2011 at 4:12 AM, Arjan van de Ven <arjan@linux.intel.com> wrote: > On 12/21/2011 10:55 AM, Colin Cross wrote: >> On Wed, Dec 21, 2011 at 1:44 AM, Arjan van de Ven <arjan@linux.intel.com> wrote: >>> On 12/21/2011 10:40 AM, Colin Cross wrote: >>> >>>>> this smells fundamentally racey to me; you can get an interrupt one >>>>> cycle after you think you're done, but before the last guy enters WFI... >>>>> >>>>> how do you solve that issue ? >>>> >>>> All the cpus have interrupts off when they increment the counter, so >>>> they cannot receive an interrupt. If an interrupt is pending on one >>>> of those cpus, it will be handled later when WFI aborts due to the >>>> pending interrupt. >>> >>> ... but this leads to cases where you're aborting before other cpus are >>> entering..... so your "last guy in" doesn't really work, since while cpu >>> 0 thinks it's the last guy, cpu 1 is already on the way out/out >>> already... (heck it might already be going back to sleep if your idle >>> code can run fast, like in the size of a cache miss) >> >> Once a cpu has incremented the counter, it has no way out unless either >> 1: another cpu (that hasn't incremented the counter yet) receives an >> interrupt, aborts idle, and clears its idle flag >> or >> 2: all cpus enter the ready counter, and call the cpuidle driver's >> enter function. > > .. or it enters WFI, and a physical device sends it an interrupt, > at which point it exits. 
None of the cpus will return to the idle loop until all cpus have decremented the ready counter back to 0, so they can't wrap around again. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 0/3] coupled cpuidle state support 2011-12-21 19:05 ` Colin Cross @ 2011-12-21 19:36 ` Arjan van de Ven 2011-12-21 19:42 ` [linux-pm] " Colin Cross 0 siblings, 1 reply; 36+ messages in thread From: Arjan van de Ven @ 2011-12-21 19:36 UTC (permalink / raw) To: Colin Cross Cc: linux-kernel, linux-arm-kernel, linux-pm, Len Brown, Kevin Hilman, Santosh Shilimkar, Amit Kucheria, Trinabh Gupta, Deepthi Dharwar, linux-omap, linux-tegra >> >> .. or it enters WFI, and a physical device sends it an interrupt, >> at which point it exits. > > None of the cpus will return to the idle loop until all cpus have > decremented the ready counter back to 0, so they can't wrap around > again. yikes, so you IPI all the cpus on the first exit. that must burn power ;-( ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [linux-pm] [PATCH 0/3] coupled cpuidle state support 2011-12-21 19:36 ` Arjan van de Ven @ 2011-12-21 19:42 ` Colin Cross 2011-12-22 8:35 ` Shilimkar, Santosh 0 siblings, 1 reply; 36+ messages in thread From: Colin Cross @ 2011-12-21 19:42 UTC (permalink / raw) To: Arjan van de Ven Cc: Kevin Hilman, Len Brown, linux-kernel, Amit Kucheria, linux-tegra, linux-pm, linux-omap, linux-arm-kernel On Wed, Dec 21, 2011 at 11:36 AM, Arjan van de Ven <arjan@linux.intel.com> wrote: >>> >>> .. or it enters WFI, and a physical device sends it an interrupt, >>> at which point it exits. >> >> None of the cpus will return to the idle loop until all cpus have >> decremented the ready counter back to 0, so they can't wrap around >> again. > > > yikes, so you IPI all the cpus on the first exit. > that must burn power ;-( No, you're not understanding the point of this series. If your cpus can go in and out of idle independently, you don't use this code at all. Each cpu can call WFI and come back out without talking to the other cpu. However, if you have two cpus that share some part of the SoC that can be turned off in idle, like the L2 cache controller or the system bus, the two cpus need to go to idle together, and they will both boot together when either one receives an interrupt (although one will likely immediately go back to a shallower state that doesn't require coordination with the other cpu). There is no way around this, it's how the hardware works on some ARM platforms. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [linux-pm] [PATCH 0/3] coupled cpuidle state support 2011-12-21 19:42 ` [linux-pm] " Colin Cross @ 2011-12-22 8:35 ` Shilimkar, Santosh 2011-12-22 8:53 ` Arjan van de Ven 0 siblings, 1 reply; 36+ messages in thread From: Shilimkar, Santosh @ 2011-12-22 8:35 UTC (permalink / raw) To: Colin Cross Cc: Arjan van de Ven, Kevin Hilman, Len Brown, linux-kernel, Amit Kucheria, linux-tegra, linux-pm, linux-omap, linux-arm-kernel On Thu, Dec 22, 2011 at 1:12 AM, Colin Cross <ccross@android.com> wrote: > On Wed, Dec 21, 2011 at 11:36 AM, Arjan van de Ven > <arjan@linux.intel.com> wrote: >>>> >>>> .. or it enters WFI, and a physical device sends it an interrupt, >>>> at which point it exits. >>> >>> None of the cpus will return to the idle loop until all cpus have >>> decremented the ready counter back to 0, so they can't wrap around >>> again. >> >> >> yikes, so you IPI all the cpus on the first exit. >> that must burn power ;-( > > No, you're not understanding the point of this series. > > If your cpus can go in and out of idle independently, you don't use > this code at all. Each cpu can call WFI and come back out without > talking to the other cpu. > Indeed. The SOCs, Arch's which does support low power state independently and doesn't need any co-ordination between CPU's will continue to work same way as before with this series. > However, if you have two cpus that share some part of the SoC that can > be turned off in idle, like the L2 cache controller or the system bus, > the two cpus need to go to idle together, and they will both boot > together when either one receives an interrupt (although one will > likely immediately go back to a shallower state that doesn't require > coordination with the other cpu). There is no way around this, it's > how the hardware works on some ARM platforms. Apart from shared peripherals which Colin pointed out, OMAP also brings in the security software state which is kind of executing in parallel with linux. 
This state is lost in certain deeper low power states, and since the security software is affine only to the master CPU (because of the OMAP security architecture), the co-ordination is mandatory to achieve those low power states. So this additional CPU co-ordination logic for such C-states really helps to solve the issue in the most generic way. Regards Santosh ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [linux-pm] [PATCH 0/3] coupled cpuidle state support 2011-12-22 8:35 ` Shilimkar, Santosh @ 2011-12-22 8:53 ` Arjan van de Ven 2011-12-22 9:30 ` Shilimkar, Santosh 2011-12-22 21:20 ` Colin Cross 0 siblings, 2 replies; 36+ messages in thread From: Arjan van de Ven @ 2011-12-22 8:53 UTC (permalink / raw) To: Shilimkar, Santosh Cc: Colin Cross, Kevin Hilman, Len Brown, linux-kernel, Amit Kucheria, linux-tegra, linux-pm, linux-omap, linux-arm-kernel On 12/22/2011 9:35 AM, Shilimkar, Santosh wrote: > Indeed. The SOCs, Arch's which does support low power > state independently and doesn't need any co-ordination between CPU's > will continue to work same way as before with this series. btw I think you misunderstand; I don't object to a need for something like this, I am just very concerned that this may not be possible to be done in a race-free way. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [linux-pm] [PATCH 0/3] coupled cpuidle state support 2011-12-22 8:53 ` Arjan van de Ven @ 2011-12-22 9:30 ` Shilimkar, Santosh 2011-12-22 21:20 ` Colin Cross 1 sibling, 0 replies; 36+ messages in thread From: Shilimkar, Santosh @ 2011-12-22 9:30 UTC (permalink / raw) To: Arjan van de Ven Cc: Colin Cross, Kevin Hilman, Len Brown, linux-kernel, Amit Kucheria, linux-tegra, linux-pm, linux-omap, linux-arm-kernel On Thu, Dec 22, 2011 at 2:23 PM, Arjan van de Ven <arjan@linux.intel.com> wrote: > On 12/22/2011 9:35 AM, Shilimkar, Santosh wrote: > >> Indeed. The SOCs, Arch's which does support low power >> state independently and doesn't need any co-ordination between CPU's >> will continue to work same way as before with this series. > > btw I think you misunderstand; I don't object to a need for something > like this, I did. Thanks for clarification. > I am just very concerned that this may not be possible to be > done in a race-free way. > I agree those races exist, but maybe they are harmless. Also the safe state need not be just WFI and can be a bit deeper, where co-ordination between CPUs isn't necessary. So that should still not burn that much power. For simplicity let's assume a two CPU scenario. Ideal scenario: CPU 1 has entered idle and incremented the counter for the co-ordinated C state. CPU0 also enters and increments the counter, and now the subsystem and interconnect can go down along with the CPU cluster. A few of the race conditions could void the above scenario, and in that case the counter would reflect that, so such a C-state won't be attempted and a safe C-state would be attempted instead. That should still work fine. Somehow this hardware/security restriction is a bit stupid and likely goes against the existing CPUIDLE design, which expects that CPUIDLE operation is per-CPU and should be independent and local to that CPU. Regards Santosh ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [linux-pm] [PATCH 0/3] coupled cpuidle state support 2011-12-22 8:53 ` Arjan van de Ven 2011-12-22 9:30 ` Shilimkar, Santosh @ 2011-12-22 21:20 ` Colin Cross 1 sibling, 0 replies; 36+ messages in thread From: Colin Cross @ 2011-12-22 21:20 UTC (permalink / raw) To: Arjan van de Ven Cc: Shilimkar, Santosh, Kevin Hilman, Len Brown, linux-kernel, Amit Kucheria, linux-tegra, linux-pm, linux-omap, linux-arm-kernel On Thu, Dec 22, 2011 at 12:53 AM, Arjan van de Ven <arjan@linux.intel.com> wrote: > On 12/22/2011 9:35 AM, Shilimkar, Santosh wrote: > >> Indeed. The SOCs, Arch's which does support low power >> state independently and doesn't need any co-ordination between CPU's >> will continue to work same way as before with this series. > > btw I think you misunderstand; I don't object to a need for something > like this, I am just very concerned that this may not be possible to be > done in a race-free way. I agree that there are many potential races in this code, but I believe I've handled all of them. This patch set is a refactoring of the common parts of the SMP idle code that has shipped on Tegra and OMAP4 devices, so the basic idea has been hammered on extensively. I think I've explained the protection against the race condition you asked about earlier. As for the power impact, the power savings of getting into the deeper coupled power states far outweighs the cost of booting all coupled cpus out of idle and letting them loop back to a shallower idle state. On an OMAP4 platform, at the slowest cpu speed, the system-wide power usage with both cpus in the non-coupled state (WFI) is around 50 mA, while the deepest coupled state is 12 mA. At the fastest cpu speed, WFI is over 100 mA. On some platforms it may be possible to have only one cpu boot out of idle when an interrupt arrives, but that introduces the race condition you asked about before, where the last cpu tries to go into idle at the same time that an already-idle cpu comes back out. 
This way has fairly minimal overhead (booting the 2nd cpu and then going to WFI), and that's a path that already needs to be fast because it's part of the normal single-cpu idle path. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 0/3] coupled cpuidle state support 2011-12-21 12:12 ` Arjan van de Ven 2011-12-21 19:05 ` Colin Cross @ 2012-03-14 0:39 ` Colin Cross 1 sibling, 0 replies; 36+ messages in thread From: Colin Cross @ 2012-03-14 0:39 UTC (permalink / raw) To: Arjan van de Ven Cc: linux-kernel, linux-arm-kernel, linux-pm, Len Brown, Kevin Hilman, Santosh Shilimkar, Amit Kucheria, Trinabh Gupta, Deepthi Dharwar, linux-omap, linux-tegra On Wed, Dec 21, 2011 at 4:12 AM, Arjan van de Ven <arjan@linux.intel.com> wrote: > On 12/21/2011 10:55 AM, Colin Cross wrote: >> On Wed, Dec 21, 2011 at 1:44 AM, Arjan van de Ven <arjan@linux.intel.com> wrote: >>> On 12/21/2011 10:40 AM, Colin Cross wrote: >>> >>>>> this smells fundamentally racey to me; you can get an interrupt one >>>>> cycle after you think you're done, but before the last guy enters WFI... >>>>> >>>>> how do you solve that issue ? >>>> >>>> All the cpus have interrupts off when they increment the counter, so >>>> they cannot receive an interrupt. If an interrupt is pending on one >>>> of those cpus, it will be handled later when WFI aborts due to the >>>> pending interrupt. >>> >>> ... but this leads to cases where you're aborting before other cpus are >>> entering..... so your "last guy in" doesn't really work, since while cpu >>> 0 thinks it's the last guy, cpu 1 is already on the way out/out >>> already... (heck it might already be going back to sleep if your idle >>> code can run fast, like in the size of a cache miss) >> >> Once a cpu has incremented the counter, it has no way out unless either >> 1: another cpu (that hasn't incremented the counter yet) receives an >> interrupt, aborts idle, and clears its idle flag >> or >> 2: all cpus enter the ready counter, and call the cpuidle driver's >> enter function. > > .. or it enters WFI, and a physical device sends it an interrupt, > at which point it exits. Rereading this, I think I understand what you meant. 
I misunderstood because of your reference to WFI, which is not entered at this point. WFI generally refers to the shallowest idle state, which only affects a single cpu. For deeper states, the cpu may or may not execute the WFI instruction to trigger the transition (depending on the platform), but it's not really entering WFI, and the cpu will generally not ever execute the next instruction (again, depending on the platform). For coupled cpus, only one cpu is capable of booting first, so all interrupts should be masked on cpu1 by the time it gets to its low power state, so it should never abort. If it does abort (like the example Kevin Hilman just posted to the end of this thread), cpu1 needs to set a flag to say it is not going to get to idle, and cpu0 needs to wait for cpu1 to be in idle (by polling a power controller register), or for the flag to be set. Afterwards, the cpus need to resynchronize and reset the flag, which will be easy with the parallel barrier helper function that will be included in the next patch set. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 0/3] coupled cpuidle state support 2011-12-21 0:09 [PATCH 0/3] coupled cpuidle state support Colin Cross ` (3 preceding siblings ...) 2011-12-21 9:02 ` [PATCH 0/3] coupled cpuidle state support Arjan van de Ven @ 2012-01-04 0:41 ` Kevin Hilman 2012-01-04 17:27 ` Shilimkar, Santosh 2012-01-20 8:46 ` Daniel Lezcano 5 siblings, 1 reply; 36+ messages in thread From: Kevin Hilman @ 2012-01-04 0:41 UTC (permalink / raw) To: Colin Cross Cc: linux-kernel, linux-arm-kernel, linux-pm, Len Brown, Santosh Shilimkar, Amit Kucheria, Arjan van de Ven, Trinabh Gupta, Deepthi Dharwar, linux-omap, linux-tegra Colin Cross <ccross@android.com> writes: > This patch series implements an alternative solution, where each > cpu will wait in the WFI state until all cpus are ready to enter > a coupled state, at which point the coupled state function will > be called on all cpus at approximately the same time. This looks great, and is certainly preferred to hotplug IMO. [...] > None of the platforms I work with have an SMP idle implementation > upstream, so I can't easily show a patch that converts a platform > from hotplug governor to coupled cpuidle states. Basic OMAP4 CPUidle support is now queued for v3.3 (see the omap4 branch of Tony's tree[1].) Can you (or Santosh) send a patch that adds coupled support to that driver so it can see some broader testing on OMAP4? Thanks, Kevin [1] git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap.git ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 0/3] coupled cpuidle state support 2012-01-04 0:41 ` Kevin Hilman @ 2012-01-04 17:27 ` Shilimkar, Santosh 0 siblings, 0 replies; 36+ messages in thread From: Shilimkar, Santosh @ 2012-01-04 17:27 UTC (permalink / raw) To: Kevin Hilman Cc: Colin Cross, linux-kernel, linux-arm-kernel, linux-pm, Len Brown, Amit Kucheria, Arjan van de Ven, Trinabh Gupta, Deepthi Dharwar, linux-omap, linux-tegra On Wed, Jan 4, 2012 at 1:41 AM, Kevin Hilman <khilman@ti.com> wrote: > Colin Cross <ccross@android.com> writes: > > >> This patch series implements an alternative solution, where each >> cpu will wait in the WFI state until all cpus are ready to enter >> a coupled state, at which point the coupled state function will >> be called on all cpus at approximately the same time. > > This looks great, and is certainly preferred to hotplug IMO. > > [...] > >> None of the platforms I work with have an SMP idle implementation >> upstream, so I can't easily show a patch that converts a platform >> from hotplug governor to coupled cpuidle states. > > Basic OMAP4 CPUidle support is in now queued for v3.3 (see omap4 > branch Tony's tree[1].) > > Can you (or Santosh) send a patch that adds coupled support to that > driver so it can see some broader testing on OMAP4? > I briefly attempted OMAP4 idle with this series but hit some lock-up issues like the ones we faced during the earlier development. Then I got busy with other activities that will keep me occupied for the next few weeks. I will be happy to share the patches in case someone would like to have a look until then. Regards Santosh ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 0/3] coupled cpuidle state support 2011-12-21 0:09 [PATCH 0/3] coupled cpuidle state support Colin Cross ` (4 preceding siblings ...) 2012-01-04 0:41 ` Kevin Hilman @ 2012-01-20 8:46 ` Daniel Lezcano 2012-01-20 20:40 ` Colin Cross [not found] ` <8762e8kqi6.fsf@ti.com> 5 siblings, 2 replies; 36+ messages in thread From: Daniel Lezcano @ 2012-01-20 8:46 UTC (permalink / raw) To: Colin Cross Cc: linux-kernel, linux-arm-kernel, linux-pm, Len Brown, Kevin Hilman, Santosh Shilimkar, Amit Kucheria, Arjan van de Ven, Trinabh Gupta, Deepthi Dharwar, linux-omap, linux-tegra On 12/21/2011 01:09 AM, Colin Cross wrote: > On some ARM SMP SoCs (OMAP4460, Tegra 2, and probably more), the > cpus cannot be independently powered down, either due to > sequencing restrictions (on Tegra 2, cpu 0 must be the last to > power down), or due to HW bugs (on OMAP4460, a cpu powering up > will corrupt the gic state unless the other cpu runs a work > around). Each cpu has a power state that it can enter without > coordinating with the other cpu (usually Wait For Interrupt, or > WFI), and one or more "coupled" power states that affect blocks > shared between the cpus (L2 cache, interrupt controller, and > sometimes the whole SoC). Entering a coupled power state must > be tightly controlled on both cpus. > > The easiest solution to implementing coupled cpu power states is > to hotplug all but one cpu whenever possible, usually using a > cpufreq governor that looks at cpu load to determine when to > enable the secondary cpus. This causes problems, as hotplug is an > expensive operation, so the number of hotplug transitions must be > minimized, leading to very slow response to loads, often on the > order of seconds. > > This patch series implements an alternative solution, where each > cpu will wait in the WFI state until all cpus are ready to enter > a coupled state, at which point the coupled state function will > be called on all cpus at approximately the same time. 
> > Once all cpus are ready to enter idle, they are woken by an smp > cross call. At this point, there is a chance that one of the > cpus will find work to do, and choose not to enter suspend. A > final pass is needed to guarantee that all cpus will call the > power state enter function at the same time. During this pass, > each cpu will increment the ready counter, and continue once the > ready counter matches the number of online coupled cpus. If any > cpu exits idle, the other cpus will decrement their counter and > retry. > > To use coupled cpuidle states, a cpuidle driver must: > > Set struct cpuidle_device.coupled_cpus to the mask of all > coupled cpus, usually the same as cpu_possible_mask if all cpus > are part of the same cluster. The coupled_cpus mask must be > set in the struct cpuidle_device for each cpu. > > Set struct cpuidle_device.safe_state to a state that is not a > coupled state. This is usually WFI. > > Set CPUIDLE_FLAG_COUPLED in struct cpuidle_state.flags for each > state that affects multiple cpus. > > Provide a struct cpuidle_state.enter function for each state > that affects multiple cpus. This function is guaranteed to be > called on all cpus at approximately the same time. The driver > should ensure that the cpus all abort together if any cpu tries > to abort once the function is called. > > This series was functionally tested on v3.0, but has only been > compile-tested on v3.2 after the removal of per-cpu state fields. Hi Colin, this patchset could be interesting for resolving the cpu dependencies in a generic way. What is the status of this patchset? Did you have the opportunity to measure the power consumption with and without this patchset? Thanks -- Daniel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 0/3] coupled cpuidle state support 2012-01-20 8:46 ` Daniel Lezcano @ 2012-01-20 20:40 ` Colin Cross 2012-01-25 14:04 ` Daniel Lezcano 2012-01-27 8:54 ` [linux-pm] " Vincent Guittot [not found] ` <8762e8kqi6.fsf@ti.com> 1 sibling, 2 replies; 36+ messages in thread From: Colin Cross @ 2012-01-20 20:40 UTC (permalink / raw) To: Daniel Lezcano Cc: linux-kernel, linux-arm-kernel, linux-pm, Len Brown, Kevin Hilman, Santosh Shilimkar, Amit Kucheria, Arjan van de Ven, Trinabh Gupta, Deepthi Dharwar, linux-omap, linux-tegra On Fri, Jan 20, 2012 at 12:46 AM, Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > Hi Colin, > > this patchset could be interesting to resolve in a generic way the cpu > dependencies. > What is the status of this patchset ? I can't do much with it right now, because I don't have any devices that can do SMP idle with a v3.2 kernel. I've started working on an updated version that avoids the spinlock, but it might be a while before I can test and post it. I'm mostly looking for feedback on the approach taken in this patch, and whether it will be useful for other SoCs besides Tegra and OMAP4. > Did you have the opportunity to measure the power consumption with and > without this patchset ? Power consumption will be very dependent on the specific SoC in question. The most important factors are the power savings of the independent cpuidle state (normally WFI) vs. the hotplug state (normally 1 cpu in OFF), and the workload being tested. On a very idle system, these patches result in the same total power as hotplugging one cpu and letting the other idle normally. On a 25% busy system, you might see a slight increase in power, as the best independent cpuidle state might be WFI, vs 1 cpu in OFF mode in hotplug. On OMAP4, that difference is small, on the order of 10 mW. 
Once you hit the threshold where a hotplug governor would have hotplugged in the second cpu (let's say 40%), the savings from these patches are enormous, as you can hit the lowest power state up to 60% of the time, where the hotplug solution would never go below WFI on both cpus. On OMAP4, that can be well over 100 mW. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 0/3] coupled cpuidle state support 2012-01-20 20:40 ` Colin Cross @ 2012-01-25 14:04 ` Daniel Lezcano 2012-01-31 14:13 ` Daniel Lezcano 2012-01-27 8:54 ` [linux-pm] " Vincent Guittot 1 sibling, 1 reply; 36+ messages in thread From: Daniel Lezcano @ 2012-01-25 14:04 UTC (permalink / raw) To: Colin Cross Cc: Daniel Lezcano, linux-kernel, linux-arm-kernel, linux-pm, Len Brown, Kevin Hilman, Santosh Shilimkar, Amit Kucheria, Arjan van de Ven, Trinabh Gupta, Deepthi Dharwar, linux-omap, linux-tegra On 01/20/2012 09:40 PM, Colin Cross wrote: > On Fri, Jan 20, 2012 at 12:46 AM, Daniel Lezcano > <daniel.lezcano@linaro.org> wrote: >> Hi Colin, >> >> this patchset could be interesting to resolve in a generic way the cpu >> dependencies. >> What is the status of this patchset ? > I can't do much with it right now, because I don't have any devices > that can do SMP idle with a v3.2 kernel. I've started working on an > updated version that avoids the spinlock, but it might be a while > before I can test and post it. I'm mostly looking for feedback on the > approach taken in this patch, and whether it will be useful for other > SoCs besides Tegra and OMAP4. Hi Colin, I will be happy to test your patchset. Do you have a pointer to a more recent kernel ? >> Did you have the opportunity to measure the power consumption with and >> without this patchset ? > Power consumption will be very dependent on the specific SoC in > question. The most important factors are the power savings of the > independent cpuidle state (normally WFI) vs. the hotplug state > (normally 1 cpu in OFF), and the workload being tested. > > On a very idle system, these patches result in the same total power as > hotplugging one cpu and letting the other idle normally. On a 25% > busy system, you might see a slight increase in power, as the best > independent cpuidle state might be WFI, vs 1 cpu in OFF mode in > hotplug. On OMAP4, that difference is small, on the order of 10 mW. 
> Once you hit the threshold where a hotplug governor would have > hotplugged in the second cpu (lets say 40%), the savings from these > patches are enormous, as you can hit the lowest power state up to 60% > of the time, where the hotplug solution would never be going below WFI > on both cpus. On OMAP4, that can be well over 100 mW. Interesting. Thanks -- Daniel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 0/3] coupled cpuidle state support 2012-01-25 14:04 ` Daniel Lezcano @ 2012-01-31 14:13 ` Daniel Lezcano 0 siblings, 0 replies; 36+ messages in thread From: Daniel Lezcano @ 2012-01-31 14:13 UTC (permalink / raw) To: Colin Cross Cc: Daniel Lezcano, linux-kernel, linux-arm-kernel, linux-pm, Len Brown, Kevin Hilman, Santosh Shilimkar, Amit Kucheria, Arjan van de Ven, Trinabh Gupta, Deepthi Dharwar, linux-omap, linux-tegra On 01/25/2012 03:04 PM, Daniel Lezcano wrote: > On 01/20/2012 09:40 PM, Colin Cross wrote: >> On Fri, Jan 20, 2012 at 12:46 AM, Daniel Lezcano >> <daniel.lezcano@linaro.org> wrote: >>> Hi Colin, >>> >>> this patchset could be interesting to resolve in a generic way the cpu >>> dependencies. >>> What is the status of this patchset ? >> I can't do much with it right now, because I don't have any devices >> that can do SMP idle with a v3.2 kernel. I've started working on an >> updated version that avoids the spinlock, but it might be a while >> before I can test and post it. I'm mostly looking for feedback on the >> approach taken in this patch, and whether it will be useful for other >> SoCs besides Tegra and OMAP4. > > Hi Colin, > > I will be happy to test your patchset. Do you have a pointer to a more > recent kernel ? Hi Colin, sorry for bothering you. I am working on a cpuidle driver for an ARM SMP board, and I will be really happy to test your patchset and see how it fits with that driver. Thanks -- Daniel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [linux-pm] [PATCH 0/3] coupled cpuidle state support 2012-01-20 20:40 ` Colin Cross 2012-01-25 14:04 ` Daniel Lezcano @ 2012-01-27 8:54 ` Vincent Guittot 2012-01-27 17:32 ` Colin Cross 1 sibling, 1 reply; 36+ messages in thread From: Vincent Guittot @ 2012-01-27 8:54 UTC (permalink / raw) To: Colin Cross Cc: Daniel Lezcano, Kevin Hilman, Len Brown, linux-kernel, Amit Kucheria, linux-tegra, linux-pm, linux-omap, Arjan van de Ven, linux-arm-kernel On 20 January 2012 21:40, Colin Cross <ccross@android.com> wrote: > On Fri, Jan 20, 2012 at 12:46 AM, Daniel Lezcano > <daniel.lezcano@linaro.org> wrote: >> Hi Colin, >> >> this patchset could be interesting to resolve in a generic way the cpu >> dependencies. >> What is the status of this patchset ? > > I can't do much with it right now, because I don't have any devices > that can do SMP idle with a v3.2 kernel. I've started working on an > updated version that avoids the spinlock, but it might be a while > before I can test and post it. I'm mostly looking for feedback on the > approach taken in this patch, and whether it will be useful for other > SoCs besides Tegra and OMAP4. > Hi Colin, In your patch, you put the cpus that become idle into the safe state (WFI on most platforms), and these cpus are woken up each time another cpu of the cluster becomes idle. Then the cluster state is chosen and the cpus enter the selected C-state. On ux500, we use a different approach to synchronize the cpus: each cpu is prepared to enter the C-state chosen by the governor, and the last cpu to enter idle chooses the final cluster state (according to the cpus' C-states). The main advantage of this solution is that you don't need to wake the other cpus to enter the cluster's C-state, which can be quite worthwhile when tasks mainly run on one cpu. Did you also consider such a behavior when developing the coupled cpuidle driver? It could be interesting to add. 
Regards, Vincent >> Did you have the opportunity to measure the power consumption with and >> without this patchset ? > > Power consumption will be very dependent on the specific SoC in > question. The most important factors are the power savings of the > independent cpuidle state (normally WFI) vs. the hotplug state > (normally 1 cpu in OFF), and the workload being tested. > > On a very idle system, these patches result in the same total power as > hotplugging one cpu and letting the other idle normally. On a 25% > busy system, you might see a slight increase in power, as the best > independent cpuidle state might be WFI, vs 1 cpu in OFF mode in > hotplug. On OMAP4, that difference is small, on the order of 10 mW. > Once you hit the threshold where a hotplug governor would have > hotplugged in the second cpu (lets say 40%), the savings from these > patches are enormous, as you can hit the lowest power state up to 60% > of the time, where the hotplug solution would never be going below WFI > on both cpus. On OMAP4, that can be well over 100 mW. > _______________________________________________ > linux-pm mailing list > linux-pm@lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/linux-pm ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [linux-pm] [PATCH 0/3] coupled cpuidle state support 2012-01-27 8:54 ` [linux-pm] " Vincent Guittot @ 2012-01-27 17:32 ` Colin Cross 2012-02-01 12:13 ` Vincent Guittot 0 siblings, 1 reply; 36+ messages in thread From: Colin Cross @ 2012-01-27 17:32 UTC (permalink / raw) To: Vincent Guittot Cc: Daniel Lezcano, Kevin Hilman, Len Brown, linux-kernel, Amit Kucheria, linux-tegra, linux-pm, linux-omap, Arjan van de Ven, linux-arm-kernel On Fri, Jan 27, 2012 at 12:54 AM, Vincent Guittot <vincent.guittot@linaro.org> wrote: > On 20 January 2012 21:40, Colin Cross <ccross@android.com> wrote: >> On Fri, Jan 20, 2012 at 12:46 AM, Daniel Lezcano >> <daniel.lezcano@linaro.org> wrote: >>> Hi Colin, >>> >>> this patchset could be interesting to resolve in a generic way the cpu >>> dependencies. >>> What is the status of this patchset ? >> >> I can't do much with it right now, because I don't have any devices >> that can do SMP idle with a v3.2 kernel. I've started working on an >> updated version that avoids the spinlock, but it might be a while >> before I can test and post it. I'm mostly looking for feedback on the >> approach taken in this patch, and whether it will be useful for other >> SoCs besides Tegra and OMAP4. >> > > Hi Colin, > > In your patch, you put in safe state (WFI for most of platform) the > cpus that become idle and these cpus are woken up each time a new cpu > of the cluster becomes idle. Then, the cluster state is chosen and the > cpus enter the selected C-state. On ux500, we are using another > behavior for synchronizing the cpus. The cpus are prepared to enter > the c-state that has been chosen by the governor and the last cpu, > that enters idle, chooses the final cluster state (according to cpus' > C-state). The main advantage of this solution is that you don't need > to wake other cpus to enter the C-state of a cluster. This can be > quite worth full when tasks mainly run on one cpu. 
Have you also think > about such behavior when developing the coupled cpuidle driver ? It > could be interesting to add such behavior. Waking up the cpus that are in the safe state is not done just to choose the target state, it's done to allow the cpus to take themselves to the target low power state. On ux500, are you saying you take the cpus directly from the safe state to a lower power state without ever going back to the active state? I once implemented Tegra that way, and it required lots of nasty synchronization to prevent resetting the cpu at the same time that it was booting due to an interrupt, and I was later told that Tegra can't handle that sequence at all, although I haven't verified it yet. On platforms that can't turn the cpus off in a random order, or that can't take a cpu directly from the safe state to the target state, something like these coupled cpuidle patches are required. On platforms that can, the low power modes can be implemented without these patches, although it is very hard to do without race conditions. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [linux-pm] [PATCH 0/3] coupled cpuidle state support 2012-01-27 17:32 ` Colin Cross @ 2012-02-01 12:13 ` Vincent Guittot 2012-02-01 14:59 ` Lorenzo Pieralisi 0 siblings, 1 reply; 36+ messages in thread From: Vincent Guittot @ 2012-02-01 12:13 UTC (permalink / raw) To: Colin Cross Cc: Daniel Lezcano, Kevin Hilman, Len Brown, linux-kernel, Amit Kucheria, linux-tegra, linux-pm, linux-omap, Arjan van de Ven, linux-arm-kernel Hi Colin, Sorry for this late reply On 27 January 2012 18:32, Colin Cross <ccross@android.com> wrote: > On Fri, Jan 27, 2012 at 12:54 AM, Vincent Guittot > <vincent.guittot@linaro.org> wrote: >> On 20 January 2012 21:40, Colin Cross <ccross@android.com> wrote: >>> On Fri, Jan 20, 2012 at 12:46 AM, Daniel Lezcano >>> <daniel.lezcano@linaro.org> wrote: >>>> Hi Colin, >>>> >>>> this patchset could be interesting to resolve in a generic way the cpu >>>> dependencies. >>>> What is the status of this patchset ? >>> >>> I can't do much with it right now, because I don't have any devices >>> that can do SMP idle with a v3.2 kernel. I've started working on an >>> updated version that avoids the spinlock, but it might be a while >>> before I can test and post it. I'm mostly looking for feedback on the >>> approach taken in this patch, and whether it will be useful for other >>> SoCs besides Tegra and OMAP4. >>> >> >> Hi Colin, >> >> In your patch, you put in safe state (WFI for most of platform) the >> cpus that become idle and these cpus are woken up each time a new cpu >> of the cluster becomes idle. Then, the cluster state is chosen and the >> cpus enter the selected C-state. On ux500, we are using another >> behavior for synchronizing the cpus. The cpus are prepared to enter >> the c-state that has been chosen by the governor and the last cpu, >> that enters idle, chooses the final cluster state (according to cpus' >> C-state). The main advantage of this solution is that you don't need >> to wake other cpus to enter the C-state of a cluster. 
This can be >> quite worth full when tasks mainly run on one cpu. Have you also think >> about such behavior when developing the coupled cpuidle driver ? It >> could be interesting to add such behavior. > > Waking up the cpus that are in the safe state is not done just to > choose the target state, it's done to allow the cpus to take > themselves to the target low power state. On ux500, are you saying > you take the cpus directly from the safe state to a lower power state > without ever going back to the active state? I once implemented Tegra yes it is > that way, and it required lots of nasty synchronization to prevent > resetting the cpu at the same time that it was booting due to an > interrupt, and I was later told that Tegra can't handle that sequence > at all, although I haven't verified it yet. You have a few main things to check: - this cpu is the last one to enter an idle state, - the other cpus are prepared to enter a cluster power state, - the other cpus are in WFI. Vincent > > On platforms that can't turn the cpus off in a random order, or that > can't take a cpu directly from the safe state to the target state, > something like these coupled cpuidle patches are required. On > platforms that can, the low power modes can be implemented without > these patches, although it is very hard to do without race conditions. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [linux-pm] [PATCH 0/3] coupled cpuidle state support 2012-02-01 12:13 ` Vincent Guittot @ 2012-02-01 14:59 ` Lorenzo Pieralisi 2012-02-01 17:30 ` Colin Cross 0 siblings, 1 reply; 36+ messages in thread From: Lorenzo Pieralisi @ 2012-02-01 14:59 UTC (permalink / raw) To: Vincent Guittot Cc: Colin Cross, Daniel Lezcano, Kevin Hilman, Len Brown, linux-kernel, Amit Kucheria, linux-tegra, linux-pm, linux-omap, Arjan van de Ven, linux-arm-kernel On Wed, Feb 01, 2012 at 12:13:26PM +0000, Vincent Guittot wrote: [...] > >> In your patch, you put in safe state (WFI for most of platform) the > >> cpus that become idle and these cpus are woken up each time a new cpu > >> of the cluster becomes idle. Then, the cluster state is chosen and the > >> cpus enter the selected C-state. On ux500, we are using another > >> behavior for synchronizing the cpus. The cpus are prepared to enter > >> the c-state that has been chosen by the governor and the last cpu, > >> that enters idle, chooses the final cluster state (according to cpus' > >> C-state). The main advantage of this solution is that you don't need > >> to wake other cpus to enter the C-state of a cluster. This can be > >> quite worth full when tasks mainly run on one cpu. Have you also think > >> about such behavior when developing the coupled cpuidle driver ? It > >> could be interesting to add such behavior. > > > > Waking up the cpus that are in the safe state is not done just to > > choose the target state, it's done to allow the cpus to take > > themselves to the target low power state. On ux500, are you saying > > you take the cpus directly from the safe state to a lower power state > > without ever going back to the active state? 
I once implemented Tegra > > yes it is But if there is a single power rail for the entire cluster, when a CPU is "prepared" for shutdown this means that you have to save the context and clean L1, maybe for nothing since if other CPUs are up and running the CPU going idle can just enter a simple standby wfi (clock-gated but power on). With Colin's approach, context is saved and L1 cleaned only when it is almost certain the cluster is powered off (so the CPUs). It is a trade-off, I am not saying one approach is better than the other; we just have to make sure that preparing the CPU for "possible" shutdown is better than sending IPIs to take CPUs out of wfi and synchronize them (this happens if and only if CPUs enter coupled C-states). As usual this will depend on use cases (and silicon implementations :) ) It is definitely worth benchmarking them. Lorenzo ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [linux-pm] [PATCH 0/3] coupled cpuidle state support 2012-02-01 14:59 ` Lorenzo Pieralisi @ 2012-02-01 17:30 ` Colin Cross 2012-02-01 18:07 ` Lorenzo Pieralisi 0 siblings, 1 reply; 36+ messages in thread From: Colin Cross @ 2012-02-01 17:30 UTC (permalink / raw) To: Lorenzo Pieralisi Cc: Vincent Guittot, Daniel Lezcano, Kevin Hilman, Len Brown, linux-kernel, Amit Kucheria, linux-tegra, linux-pm, linux-omap, Arjan van de Ven, linux-arm-kernel On Wed, Feb 1, 2012 at 6:59 AM, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote: > On Wed, Feb 01, 2012 at 12:13:26PM +0000, Vincent Guittot wrote: > > [...] > >> >> In your patch, you put in safe state (WFI for most of platform) the >> >> cpus that become idle and these cpus are woken up each time a new cpu >> >> of the cluster becomes idle. Then, the cluster state is chosen and the >> >> cpus enter the selected C-state. On ux500, we are using another >> >> behavior for synchronizing the cpus. The cpus are prepared to enter >> >> the c-state that has been chosen by the governor and the last cpu, >> >> that enters idle, chooses the final cluster state (according to cpus' >> >> C-state). The main advantage of this solution is that you don't need >> >> to wake other cpus to enter the C-state of a cluster. This can be >> >> quite worth full when tasks mainly run on one cpu. Have you also think >> >> about such behavior when developing the coupled cpuidle driver ? It >> >> could be interesting to add such behavior. >> > >> > Waking up the cpus that are in the safe state is not done just to >> > choose the target state, it's done to allow the cpus to take >> > themselves to the target low power state. On ux500, are you saying >> > you take the cpus directly from the safe state to a lower power state >> > without ever going back to the active state? 
I once implemented Tegra >> >> yes it is > > But if there is a single power rail for the entire cluster, when a CPU > is "prepared" for shutdown this means that you have to save the context and > clean L1, maybe for nothing since if other CPUs are up and running the > CPU going idle can just enter a simple standby wfi (clock-gated but power on). > > With Colin's approach, context is saved and L1 cleaned only when it is > almost certain the cluster is powered off (so the CPUs). > > It is a trade-off, I am not saying one approach is better than the > other; we just have to make sure that preparing the CPU for "possible" shutdown > is better than sending IPIs to take CPUs out of wfi and synchronize > them (this happens if and only if CPUs enter coupled C-states). > > As usual this will depend on use cases (and silicon implementations :) ) > > It is definitely worth benchmarking them. > I'm less worried about performance, and more worried about race conditions. How do you deal with the following situation: CPU0 goes to WFI, and saves its state CPU1 goes idle, and selects a deep idle state that powers down CPU0 CPU1 saves its state, and is about to trigger the power down CPU0 gets an interrupt, restores its state, and modifies state (maybe takes a spinlock during boot) CPU1 cuts the power to CPU0 On OMAP4, the race is handled in hardware. When CPU1 tries to cut the power to the blocks shared by CPU0 the hardware will ignore the request if CPU0 is not in WFI. On Tegra2, there is no hardware support and I had to handle it with a spinlock implemented in scratch registers because CPU0 is out of coherency when it starts booting and ldrex/strex don't work. I'm not convinced my implementation is correct, and I'd be curious to see any other implementations. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [linux-pm] [PATCH 0/3] coupled cpuidle state support 2012-02-01 17:30 ` Colin Cross @ 2012-02-01 18:07 ` Lorenzo Pieralisi 2012-02-03 1:19 ` Colin Cross 0 siblings, 1 reply; 36+ messages in thread From: Lorenzo Pieralisi @ 2012-02-01 18:07 UTC (permalink / raw) To: Colin Cross Cc: Vincent Guittot, Daniel Lezcano, Kevin Hilman, Len Brown, linux-kernel, Amit Kucheria, linux-tegra, linux-pm, linux-omap, Arjan van de Ven, linux-arm-kernel On Wed, Feb 01, 2012 at 05:30:15PM +0000, Colin Cross wrote: > On Wed, Feb 1, 2012 at 6:59 AM, Lorenzo Pieralisi > <lorenzo.pieralisi@arm.com> wrote: > > On Wed, Feb 01, 2012 at 12:13:26PM +0000, Vincent Guittot wrote: > > > > [...] > > > >> >> In your patch, you put in safe state (WFI for most of platform) the > >> >> cpus that become idle and these cpus are woken up each time a new cpu > >> >> of the cluster becomes idle. Then, the cluster state is chosen and the > >> >> cpus enter the selected C-state. On ux500, we are using another > >> >> behavior for synchronizing the cpus. The cpus are prepared to enter > >> >> the c-state that has been chosen by the governor and the last cpu, > >> >> that enters idle, chooses the final cluster state (according to cpus' > >> >> C-state). The main advantage of this solution is that you don't need > >> >> to wake other cpus to enter the C-state of a cluster. This can be > >> >> quite worth full when tasks mainly run on one cpu. Have you also think > >> >> about such behavior when developing the coupled cpuidle driver ? It > >> >> could be interesting to add such behavior. > >> > > >> > Waking up the cpus that are in the safe state is not done just to > >> > choose the target state, it's done to allow the cpus to take > >> > themselves to the target low power state. On ux500, are you saying > >> > you take the cpus directly from the safe state to a lower power state > >> > without ever going back to the active state? 
I once implemented Tegra > >> > >> yes it is > > > > But if there is a single power rail for the entire cluster, when a CPU > > is "prepared" for shutdown this means that you have to save the context and > > clean L1, maybe for nothing since if other CPUs are up and running the > > CPU going idle can just enter a simple standby wfi (clock-gated but power on). > > > > With Colin's approach, context is saved and L1 cleaned only when it is > > almost certain the cluster is powered off (so the CPUs). > > > > It is a trade-off, I am not saying one approach is better than the > > other; we just have to make sure that preparing the CPU for "possible" shutdown > > is better than sending IPIs to take CPUs out of wfi and synchronize > > them (this happens if and only if CPUs enter coupled C-states). > > > > As usual this will depend on use cases (and silicon implementations :) ) > > > > It is definitely worth benchmarking them. > > > > I'm less worried about performance, and more worried about race > conditions. How do you deal with the following situation: > CPU0 goes to WFI, and saves its state > CPU1 goes idle, and selects a deep idle state that powers down CPU0 > CPU1 saves is state, and is about to trigger the power down > CPU0 gets an interrupt, restores its state, and modifies state (maybe > takes a spinlock during boot) > CPU1 cuts the power to CPU0 > > On OMAP4, the race is handled in hardware. When CPU1 tries to cut the > power to the blocks shared by CPU0 the hardware will ignore the > request if CPU0 is not in WFI. On Tegra2, there is no hardware > support and I had to handle it with a spinlock implemented in scratch > registers because CPU0 is out of coherency when it starts booting and > ldrex/strex don't work. I'm not convinced my implementation is > correct, and I'd be curious to see any other implementations. 
That's a problem you solved with coupled C-states (ie your example in the cover letter), where the primary waits for other CPUs to be reset before issuing the power down command, right ? At that point in time secondaries cannot wake up (?) and if wfi (ie power down) aborts you just take the secondaries out of reset and restart executing simultaneously, correct ? It mirrors the suspend behaviour, which is easier to deal with than completely random idle paths. It is true that this should be managed by the PM HW; if HW is not capable of managing these situations things get nasty as you highlighted. And it is also true ldrex/strex on cacheable memory might not be available in those early warm-boot stages. I came up with a locking algorithm on strongly ordered memory to deal with that, but I am still not sure it is something we really really need. I will test coupled C-state code ASAP, and come back with feedback. Thanks, Lorenzo ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [linux-pm] [PATCH 0/3] coupled cpuidle state support 2012-02-01 18:07 ` Lorenzo Pieralisi @ 2012-02-03 1:19 ` Colin Cross 0 siblings, 0 replies; 36+ messages in thread From: Colin Cross @ 2012-02-03 1:19 UTC (permalink / raw) To: Lorenzo Pieralisi Cc: Vincent Guittot, Daniel Lezcano, Kevin Hilman, Len Brown, linux-kernel, Amit Kucheria, linux-tegra, linux-pm, linux-omap, Arjan van de Ven, linux-arm-kernel On Wed, Feb 1, 2012 at 10:07 AM, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote: > On Wed, Feb 01, 2012 at 05:30:15PM +0000, Colin Cross wrote: >> On Wed, Feb 1, 2012 at 6:59 AM, Lorenzo Pieralisi >> <lorenzo.pieralisi@arm.com> wrote: >> > On Wed, Feb 01, 2012 at 12:13:26PM +0000, Vincent Guittot wrote: >> > >> > [...] >> > >> >> >> In your patch, you put in safe state (WFI for most of platform) the >> >> >> cpus that become idle and these cpus are woken up each time a new cpu >> >> >> of the cluster becomes idle. Then, the cluster state is chosen and the >> >> >> cpus enter the selected C-state. On ux500, we are using another >> >> >> behavior for synchronizing the cpus. The cpus are prepared to enter >> >> >> the c-state that has been chosen by the governor and the last cpu, >> >> >> that enters idle, chooses the final cluster state (according to cpus' >> >> >> C-state). The main advantage of this solution is that you don't need >> >> >> to wake other cpus to enter the C-state of a cluster. This can be >> >> >> quite worth full when tasks mainly run on one cpu. Have you also think >> >> >> about such behavior when developing the coupled cpuidle driver ? It >> >> >> could be interesting to add such behavior. >> >> > >> >> > Waking up the cpus that are in the safe state is not done just to >> >> > choose the target state, it's done to allow the cpus to take >> >> > themselves to the target low power state. 
On ux500, are you saying >> >> > you take the cpus directly from the safe state to a lower power state >> >> > without ever going back to the active state? I once implemented Tegra >> >> >> >> yes it is >> > >> > But if there is a single power rail for the entire cluster, when a CPU >> > is "prepared" for shutdown this means that you have to save the context and >> > clean L1, maybe for nothing since if other CPUs are up and running the >> > CPU going idle can just enter a simple standby wfi (clock-gated but power on). >> > >> > With Colin's approach, context is saved and L1 cleaned only when it is >> > almost certain the cluster is powered off (so the CPUs). >> > >> > It is a trade-off, I am not saying one approach is better than the >> > other; we just have to make sure that preparing the CPU for "possible" shutdown >> > is better than sending IPIs to take CPUs out of wfi and synchronize >> > them (this happens if and only if CPUs enter coupled C-states). >> > >> > As usual this will depend on use cases (and silicon implementations :) ) >> > >> > It is definitely worth benchmarking them. >> > >> >> I'm less worried about performance, and more worried about race >> conditions. How do you deal with the following situation: >> CPU0 goes to WFI, and saves its state >> CPU1 goes idle, and selects a deep idle state that powers down CPU0 >> CPU1 saves is state, and is about to trigger the power down >> CPU0 gets an interrupt, restores its state, and modifies state (maybe >> takes a spinlock during boot) >> CPU1 cuts the power to CPU0 >> >> On OMAP4, the race is handled in hardware. When CPU1 tries to cut the >> power to the blocks shared by CPU0 the hardware will ignore the >> request if CPU0 is not in WFI. On Tegra2, there is no hardware >> support and I had to handle it with a spinlock implemented in scratch >> registers because CPU0 is out of coherency when it starts booting and >> ldrex/strex don't work. 
I'm not convinced my implementation is >> correct, and I'd be curious to see any other implementations. > > That's a problem you solved with coupled C-states (ie your example in > the cover letter), where the primary waits for other CPUs to be reset > before issuing the power down command, right ? At that point in time > secondaries cannot wake up (?) and if wfi (ie power down) aborts you just > take the secondaries out of reset and restart executing simultaneously, > correct ? It mirrors the suspend behaviour, which is easier to deal with > than completely random idle paths. Yes, anything that supports hotplug and suspend should support coupled cpuidle states fairly easily. The only thing required that is not already used by hotplug/suspend is the ability to save and restore context on cpu1, but most implementations end up doing that already. > It is true that this should be managed by the PM HW; if HW is not > capable of managing these situations things get nasty as you highlighted. Yes - on some platforms, the HW is not designed to handle it. On others, it is designed to, but due to HW bugs it cannot be used. > And it is also true ldrex/strex on cacheable memory might not be available in > those early warm-boot stages. I came up with a locking algorithm on > strongly ordered memory to deal with that, but I am still not sure it is > something we really really need. I did the same, but with device memory. > I will test coupled C-state code ASAP, and come back with feedback. > > Thanks, > Lorenzo > ^ permalink raw reply [flat|nested] 36+ messages in thread
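[Editorial note: the thread mentions locking algorithms built on strongly ordered or device memory, usable when ldrex/strex are unavailable during early warm boot, but does not show one. One classic algorithm that needs only plain loads and stores is Peterson's two-CPU mutual exclusion. The sketch below is an illustrative userspace model, not the actual Tegra2 or ux500 code; C11 seq_cst atomics stand in for the strongly-ordered memory accesses the real implementations rely on.]

```c
#include <stdatomic.h>
#include <pthread.h>
#include <assert.h>
#include <stddef.h>

/* Peterson's two-CPU mutual exclusion: correct with only plain,
 * sequentially-consistent loads and stores -- no ldrex/strex needed.
 * seq_cst atomics model the strongly-ordered memory assumed here. */
static atomic_int flag[2];     /* flag[i]: CPU i wants the lock */
static atomic_int victim;      /* the CPU that must yield on contention */
static long shared_counter;    /* protected by the lock */

static void so_lock(int cpu)
{
    int other = 1 - cpu;

    atomic_store(&flag[cpu], 1);
    atomic_store(&victim, cpu);
    /* Spin while the other CPU wants the lock and we are the victim. */
    while (atomic_load(&flag[other]) && atomic_load(&victim) == cpu)
        ;
}

static void so_unlock(int cpu)
{
    atomic_store(&flag[cpu], 0);
}

static void *worker(void *arg)
{
    int cpu = (int)(long)arg;

    for (int i = 0; i < 100000; i++) {
        so_lock(cpu);
        shared_counter++;      /* critical section */
        so_unlock(cpu);
    }
    return NULL;
}
```

If both CPUs race into `so_lock`, the last one to store `victim` loses and spins, so the critical section is never entered concurrently; this is the property the scratch-register spinlock on Tegra2 has to provide by other means.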
[parent not found: <8762e8kqi6.fsf@ti.com>]
* Re: [PATCH 0/3] coupled cpuidle state support [not found] ` <8762e8kqi6.fsf@ti.com> @ 2012-03-14 0:28 ` Colin Cross 2012-03-14 0:47 ` Colin Cross 2012-03-14 2:04 ` Arjan van de Ven 1 sibling, 1 reply; 36+ messages in thread From: Colin Cross @ 2012-03-14 0:28 UTC (permalink / raw) To: Kevin Hilman Cc: Daniel Lezcano, linux-kernel, linux-arm-kernel, linux-pm, Len Brown, Santosh Shilimkar, Amit Kucheria, Arjan van de Ven, Trinabh Gupta, Deepthi Dharwar, linux-omap, linux-tegra On Tue, Mar 13, 2012 at 4:52 PM, Kevin Hilman <khilman@ti.com> wrote: > Hi Colin, > > On 12/21/2011 01:09 AM, Colin Cross wrote: > >> To use coupled cpuidle states, a cpuidle driver must: > > [...] > >> Provide a struct cpuidle_state.enter function for each state >> that affects multiple cpus. This function is guaranteed to be >> called on all cpus at approximately the same time. The driver >> should ensure that the cpus all abort together if any cpu tries >> to abort once the function is called. > > I've discovered the last sentence above is crucial, and in order to catch > all the corner cases I found it useful to have the struct > cpuidle_coupled in cpuidle.h so that the driver can check ready_count > itself (patch below, on top of $SUBJECT series.) ready_count is an internal state of core coupled code, and will change significantly in the next version of the patches. Drivers cannot depend on it. > As you know, on OMAP4, when entering the coupled state, CPU0 has to wait > for CPU1 to enter its low power state before entering itself. The first > pass at implementing this was to just spin waiting for the powerdomain > of CPU1 to hit off. That works... most of the time. > > If CPU1 wakes up immediately (or before CPU0 starts checking), or more > likely, fails to hit the low-power state because of other hardware > "conditions", CPU0 will end up stuck in the loop waiting for CPU1.
> > To solve this, in addition to checking the power state of CPU1, I also > check if (coupled->ready_count != cpumask_weight(&coupled->alive_coupled_cpus)). > If true, it means that CPU1 has already exited/aborted so CPU0 had > better abort as well. > > Checking the ready_count seemed like an easy way to do this, but did you > have any other mechanisms in mind for CPUs to communicate that they've > exited/aborted? Why not set a flag from CPU1 when it exits the low power state, and have CPU0 spin on the powerdomain register or the flag? You can then use the parallel barrier function to ensure both cpus have seen the flag and reset it to 0 before returning. ^ permalink raw reply [flat|nested] 36+ messages in thread
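[Editorial note: Colin's suggestion — CPU1 raises a flag when it aborts, and CPU0 spins on either the powerdomain status or that flag — can be sketched as a small polling loop. In the model below, `cpu1_powerdomain_off` and `cpu1_aborted` are hypothetical stand-ins for the OMAP4 powerdomain status register and the proposed abort flag; they are not real register names.]

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <assert.h>

/* Stand-ins: in a real driver, the first would be a read of the CPU1
 * powerdomain status register, the second a flag CPU1 sets on abort. */
static atomic_bool cpu1_powerdomain_off;
static atomic_bool cpu1_aborted;

/* Returns true if CPU0 may proceed with the cluster power-down,
 * false if it must abort because CPU1 woke up or failed to go down.
 * Unlike spinning on the powerdomain alone, this loop cannot get
 * stuck when CPU1 exits the low power state early. */
static bool cpu0_wait_for_cpu1(void)
{
    for (;;) {
        if (atomic_load(&cpu1_powerdomain_off))
            return true;    /* CPU1 reached its low power state */
        if (atomic_load(&cpu1_aborted))
            return false;   /* CPU1 bailed out: CPU0 aborts too */
    }
}
```

After an abort, both CPUs would then pass through the parallel barrier so the flag can be cleared safely before the next idle attempt.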
* Re: [PATCH 0/3] coupled cpuidle state support 2012-03-14 0:28 ` Colin Cross @ 2012-03-14 0:47 ` Colin Cross 2012-03-14 14:23 ` [linux-pm] " Kevin Hilman 0 siblings, 1 reply; 36+ messages in thread From: Colin Cross @ 2012-03-14 0:47 UTC (permalink / raw) To: Kevin Hilman Cc: Daniel Lezcano, linux-kernel, linux-arm-kernel, linux-pm, Len Brown, Santosh Shilimkar, Amit Kucheria, Arjan van de Ven, Trinabh Gupta, Deepthi Dharwar, linux-omap, linux-tegra On Tue, Mar 13, 2012 at 5:28 PM, Colin Cross <ccross@android.com> wrote: > On Tue, Mar 13, 2012 at 4:52 PM, Kevin Hilman <khilman@ti.com> wrote: >> Hi Colin, >> >> On 12/21/2011 01:09 AM, Colin Cross wrote: >> >>> To use coupled cpuidle states, a cpuidle driver must: >> >> [...] >> >>> Provide a struct cpuidle_state.enter function for each state >>> that affects multiple cpus. This function is guaranteed to be >>> called on all cpus at approximately the same time. The driver >>> should ensure that the cpus all abort together if any cpu tries >>> to abort once the function is called. >> >> I've discovered the last sentence above is crucial, and in order to catch >> all the corner cases I found it useful to have the struct >> cpuidle_coupled in cpuidle.h so that the driver can check ready_count >> itself (patch below, on top of $SUBJECT series.) > > ready_count is an internal state of core coupled code, and will change > significantly in the next version of the patches. Drivers cannot > depend on it. > >> As you know, on OMAP4, when entering the coupled state, CPU0 has to wait >> for CPU1 to enter its low power state before entering itself. The first >> pass at implementing this was to just spin waiting for the powerdomain >> of CPU1 to hit off. That works... most of the time. >> >> If CPU1 wakes up immediately (or before CPU0 starts checking), or more >> likely, fails to hit the low-power state because of other hardware >> "conditions", CPU0 will end up stuck in the loop waiting for CPU1.
>> >> To solve this, in addition to checking the power state of CPU1, I also >> check if (coupled->ready_count != cpumask_weight(&coupled->alive_coupled_cpus)). >> If true, it means that CPU1 has already exited/aborted so CPU0 had >> better abort as well. >> >> Checking the ready_count seemed like an easy way to do this, but did you >> have any other mechanisms in mind for CPUs to communicate that they've >> exited/aborted? > > Why not set a flag from CPU1 when it exits the low power state, and > have CPU0 spin on the powerdomain register or the flag? You can then > use the parallel barrier function to ensure both cpus have seen the > flag and reset it to 0 before returning. I realized the parallel barrier helper was not included in the patch set I posted; it will be in the next patch set. Short version: no caller of cpuidle_coupled_parallel_barrier will return before all cpus in the coupled set have called it. It allows you to resynchronize the cpus after an abort to ensure they have all seen the abort flag before clearing it and returning, leaving everything in the correct state for the next idle attempt. ^ permalink raw reply [flat|nested] 36+ messages in thread
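[Editorial note: the semantics Colin describes — no caller returns until every coupled CPU has arrived — are those of a reusable, sense-reversing barrier. The sketch below is a minimal userspace model of that idea, not the kernel's actual cpuidle_coupled_parallel_barrier implementation.]

```c
#include <stdatomic.h>
#include <pthread.h>
#include <assert.h>
#include <stddef.h>

#define NCPUS 2   /* size of the coupled set in this model */

static atomic_int barrier_count;             /* CPUs arrived so far */
static atomic_int barrier_sense;             /* flips each full round */
static _Thread_local int local_sense;        /* per-CPU expected sense */

/* No caller returns until all NCPUS CPUs have called this.  The
 * sense-reversal makes the barrier safely reusable across idle
 * attempts without re-initialization. */
static void parallel_barrier(void)
{
    local_sense = !local_sense;
    if (atomic_fetch_add(&barrier_count, 1) == NCPUS - 1) {
        /* Last CPU in: reset the count and release everyone. */
        atomic_store(&barrier_count, 0);
        atomic_store(&barrier_sense, local_sense);
    } else {
        while (atomic_load(&barrier_sense) != local_sense)
            ;   /* spin until the last CPU arrives */
    }
}

/* Demo: each CPU publishes its arrival (think: sets/observes an abort
 * flag), then crosses the barrier; afterwards both must have seen all
 * arrivals, so the flag can be cleared without racing. */
static atomic_int arrived;
static int seen[NCPUS];

static void *cpu_thread(void *arg)
{
    int cpu = (int)(long)arg;

    atomic_fetch_add(&arrived, 1);
    parallel_barrier();
    seen[cpu] = atomic_load(&arrived);
    return NULL;
}
```

This matches the abort-resynchronization use in the thread: after an abort, every CPU is guaranteed to have observed the abort flag before any CPU clears it and retries idle.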
* Re: [linux-pm] [PATCH 0/3] coupled cpuidle state support 2012-03-14 0:47 ` Colin Cross @ 2012-03-14 14:23 ` Kevin Hilman 0 siblings, 0 replies; 36+ messages in thread From: Kevin Hilman @ 2012-03-14 14:23 UTC (permalink / raw) To: Colin Cross Cc: Len Brown, linux-kernel, Amit Kucheria, linux-tegra, linux-pm, linux-omap, Arjan van de Ven, linux-arm-kernel Colin Cross <ccross@android.com> writes: > On Tue, Mar 13, 2012 at 5:28 PM, Colin Cross <ccross@android.com> wrote: >> On Tue, Mar 13, 2012 at 4:52 PM, Kevin Hilman <khilman@ti.com> wrote: [...] >>> >>> Checking the ready_count seemed like an easy way to do this, but did you >>> have any other mechanisms in mind for CPUs to communicate that they've >>> exited/aborted? >> >> Why not set a flag from CPU1 when it exits the low power state, and >> have CPU0 spin on the powerdomain register or the flag? You can then >> use the parallel barrier function to ensure both cpus have seen the >> flag and reset it to 0 before returning. > > I realized the parallel barrier helper was not included in the patch > set I posted, it will be in the next patch set. Do you have an ETA for the next patchset? I'll be glad to give it a spin on OMAP4. > Short version, no caller to cpuidle_coupled_parallel_barrier will > return before all cpus in the coupled set have called it. It allows > you to resynchronize the cpus after an abort to ensure they have all > seen the abort flag before clearing it and returning, leaving > everything in the correct state for the next idle attempt. OK, this sounds like it will work well for what I need. Kevin ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 0/3] coupled cpuidle state support [not found] ` <8762e8kqi6.fsf@ti.com> 2012-03-14 0:28 ` Colin Cross @ 2012-03-14 2:04 ` Arjan van de Ven 2012-03-14 2:21 ` Colin Cross 1 sibling, 1 reply; 36+ messages in thread From: Arjan van de Ven @ 2012-03-14 2:04 UTC (permalink / raw) To: Kevin Hilman Cc: Colin Cross, Daniel Lezcano, linux-kernel, linux-arm-kernel, linux-pm, Len Brown, Santosh Shilimkar, Amit Kucheria, Trinabh Gupta, Deepthi Dharwar, linux-omap, linux-tegra On 3/13/2012 4:52 PM, Kevin Hilman wrote: > Checking the ready_count seemed like an easy way to do this, but did you > have any other mechanisms in mind for CPUs to communicate that they've > exited/aborted? this indeed is the tricky part (which I warned about earlier); I've spent quite a lot of time (weeks) to get this provably working for an Intel system with similar requirements... and it's extremely unfunny, and needed firmware support to close some of the race conditions. I sure hope that hardware with these requirements is on the way out... it's not very OS friendly. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 0/3] coupled cpuidle state support 2012-03-14 2:04 ` Arjan van de Ven @ 2012-03-14 2:21 ` Colin Cross 0 siblings, 0 replies; 36+ messages in thread From: Colin Cross @ 2012-03-14 2:21 UTC (permalink / raw) To: Arjan van de Ven Cc: Kevin Hilman, Daniel Lezcano, linux-kernel, linux-arm-kernel, linux-pm, Len Brown, Santosh Shilimkar, Amit Kucheria, Trinabh Gupta, Deepthi Dharwar, linux-omap, linux-tegra On Tue, Mar 13, 2012 at 7:04 PM, Arjan van de Ven <arjan@linux.intel.com> wrote: > On 3/13/2012 4:52 PM, Kevin Hilman wrote: >> Checking the ready_count seemed like an easy way to do this, but did you >> have any other mechanisms in mind for CPUs to communicate that they've >> exited/aborted? > > this indeed is the tricky part (which I warned about earlier); > I've spent quite a lot of time (weeks) to get this provably working for > an Intel system with similar requirements... and it's extremely unfunny, > and needed firmware support to close some of the race conditions. As long as you can tell from cpu0 that cpu1 has successfully entered the low power state, this should be easy, and the cpuidle_coupled_parallel_barrier helper should make it even easier. This series just allows the exact same sequence of transitions used by hotplug cpufreq governors to happen from the idle thread: 1. cpu0 forces cpu1 offline. From a wakeup perspective, this is exactly like hotplug removing cpu1. Unlike hotplug, the state of cpu1 has to be saved, and the scheduler is not told that the cpu is gone. Instead of using the IPI signalling that hotplug uses, we call the same function on both cpus, and one cpu runs the equivalent of platform_cpu_kill, and the other the equivalent of platform_cpu_die. Wakeup events that are only targeted at cpu1 will need to be temporarily migrated to cpu0. 2. If the hotplug is successful, cpu0 goes to a low power state, the same way it would when the hotplug cpufreq governor had removed cpu1. 3.
cpu0 wakes up, because all wakeup events are pointed at it. 4. After restoring its own state, cpu0 brings cpu1 back online, exactly like hotplug online would, except that the boot vector has to point to a cpu_resume handler instead of secondary start. > I sure hope that hardware with these requirements is on the way out... > it's not very OS friendly. Even hardware that was designed not to have these requirements sometimes has bugs that require this kind of sequencing. OMAP4430 should theoretically support idle without sequencing, but OMAP4460 introduced a ROM code bug that requires sequencing again (turning on cpu1 corrupts the interrupt controller, so cpu0 has to be waiting with interrupts off to run a workaround whenever cpu1 turns on). Out of curiosity, what Intel hardware needs this? ^ permalink raw reply [flat|nested] 36+ messages in thread
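[Editorial note: the four-step sequence Colin outlines above can be lined up as a single ordered trace. The step names below are hypothetical labels for illustration; they are not real kernel entry points.]

```c
#include <string.h>
#include <assert.h>

/* Toy trace of one coupled-idle cycle; each step() call records a
 * label so the required ordering can be checked. */
static const char *trace[8];
static int trace_len;

static void step(const char *name)
{
    trace[trace_len++] = name;
}

static void coupled_idle_cycle(void)
{
    /* 1. cpu0 forces cpu1 offline, saving cpu1's state first */
    step("cpu1_save_state_and_offline");
    step("migrate_cpu1_wakeups_to_cpu0");
    /* 2. with cpu1 down, cpu0 enters the deep cluster state */
    step("cpu0_enter_low_power");
    /* 3. any wakeup event now lands on cpu0 */
    step("cpu0_wake_and_restore");
    /* 4. cpu0 boots cpu1 through a cpu_resume-style vector */
    step("cpu1_online_via_cpu_resume");
}
```

The invariant the sequencing protects is visible in the trace: cpu1's state is saved and its wakeups migrated before cpu0 powers down, and cpu1 only comes back after cpu0 has restored itself.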
end of thread, other threads:[~2012-03-14 14:23 UTC | newest] Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-12-21 0:09 [PATCH 0/3] coupled cpuidle state support Colin Cross 2011-12-21 0:09 ` [PATCH 1/3] cpuidle: refactor out cpuidle_enter_state Colin Cross 2012-01-04 14:08 ` [linux-pm] " Jean Pihet 2011-12-21 0:09 ` [PATCH 2/3] cpuidle: fix error handling in __cpuidle_register_device Colin Cross 2011-12-21 0:09 ` [PATCH 3/3] cpuidle: add support for states that affect multiple cpus Colin Cross 2011-12-21 9:02 ` [PATCH 0/3] coupled cpuidle state support Arjan van de Ven 2011-12-21 9:40 ` Colin Cross 2011-12-21 9:44 ` Arjan van de Ven 2011-12-21 9:55 ` Colin Cross 2011-12-21 12:12 ` Arjan van de Ven 2011-12-21 19:05 ` Colin Cross 2011-12-21 19:36 ` Arjan van de Ven 2011-12-21 19:42 ` [linux-pm] " Colin Cross 2011-12-22 8:35 ` Shilimkar, Santosh 2011-12-22 8:53 ` Arjan van de Ven 2011-12-22 9:30 ` Shilimkar, Santosh 2011-12-22 21:20 ` Colin Cross 2012-03-14 0:39 ` Colin Cross 2012-01-04 0:41 ` Kevin Hilman 2012-01-04 17:27 ` Shilimkar, Santosh 2012-01-20 8:46 ` Daniel Lezcano 2012-01-20 20:40 ` Colin Cross 2012-01-25 14:04 ` Daniel Lezcano 2012-01-31 14:13 ` Daniel Lezcano 2012-01-27 8:54 ` [linux-pm] " Vincent Guittot 2012-01-27 17:32 ` Colin Cross 2012-02-01 12:13 ` Vincent Guittot 2012-02-01 14:59 ` Lorenzo Pieralisi 2012-02-01 17:30 ` Colin Cross 2012-02-01 18:07 ` Lorenzo Pieralisi 2012-02-03 1:19 ` Colin Cross [not found] ` <8762e8kqi6.fsf@ti.com> 2012-03-14 0:28 ` Colin Cross 2012-03-14 0:47 ` Colin Cross 2012-03-14 14:23 ` [linux-pm] " Kevin Hilman 2012-03-14 2:04 ` Arjan van de Ven 2012-03-14 2:21 ` Colin Cross