* [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
@ 2014-03-28 12:29 Daniel Lezcano
  2014-03-28 12:29 ` [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure Daniel Lezcano
  ` (5 more replies)
  0 siblings, 6 replies; 47+ messages in thread
From: Daniel Lezcano @ 2014-03-28 12:29 UTC (permalink / raw)
  To: linux-kernel, mingo, peterz
  Cc: rjw, nicolas.pitre, linux-pm, alex.shi, vincent.guittot,
      morten.rasmussen

The following patchset provides an interaction between cpuidle and the
scheduler.

The first patch encapsulates the information the scheduler needs in a
separate cpuidle structure. The second one stores a pointer to this
structure when entering idle. The third one uses this information when
deciding which cpu is the idlest.

After some basic testing with hackbench, there is a small performance
improvement and the idle state durations are longer (which provides a
better power saving). The measurements were done with the 'idlestat'
tool previously posted on this mailing list.

So the benefit is twofold: performance and power saving.

select_idle_sibling() could also be improved in the same way.
======================  test with hackbench 3.14-rc8  =========================

/usr/bin/hackbench -l 10000 -s 4096
Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
Each sender will pass 10000 messages of 4096 bytes
Time: 44.433
Total trace buffer: 1846688 kB

clusterA@state  hits       total(us)     avg(us)    min(us)     max(us)
  POLL             0            0.00        0.00       0.00        0.00
  C1-IVB           0            0.00        0.00       0.00        0.00
  C1E-IVB          0            0.00        0.00       0.00        0.00
  C3-IVB           0            0.00        0.00       0.00        0.00
  C6-IVB           0            0.00        0.00       0.00        0.00
  C7-IVB           0            0.00        0.00       0.00        0.00

core0@state     hits       total(us)     avg(us)    min(us)     max(us)
  POLL             0            0.00        0.00       0.00        0.00
  C1-IVB           0            0.00        0.00       0.00        0.00
  C1E-IVB          0            0.00        0.00       0.00        0.00
  C3-IVB           0            0.00        0.00       0.00        0.00
  C6-IVB           0            0.00        0.00       0.00        0.00
  C7-IVB        1396     87932131.00    62988.63       0.00   320146.00

cpu0@state      hits       total(us)     avg(us)    min(us)     max(us)
  POLL             0            0.00        0.00       0.00        0.00
  C1-IVB           1           14.00       14.00      14.00       14.00
  C1E-IVB          0            0.00        0.00       0.00        0.00
  C3-IVB           1          262.00      262.00     262.00      262.00
  C6-IVB           0            0.00        0.00       0.00        0.00
  C7-IVB        1180     87938177.00    74523.88       1.00   320147.00
  1701             0            0.00        0.00       0.00        0.00
  1700             0            0.00        0.00       0.00        0.00
  1600             0            0.00        0.00       0.00        0.00
  1500             0            0.00        0.00       0.00        0.00
  1400             0            0.00        0.00       0.00        0.00
  1300             0            0.00        0.00       0.00        0.00
  1200             0            0.00        0.00       0.00        0.00
  1100             0            0.00        0.00       0.00        0.00
  1000             0            0.00        0.00       0.00        0.00
  900              0            0.00        0.00       0.00        0.00
  800              0            0.00        0.00       0.00        0.00
  782              0            0.00        0.00       0.00        0.00

cpu0 wakeups    name                 count
  irq009        acpi                     1

cpu1@state      hits       total(us)     avg(us)    min(us)     max(us)
  POLL             0            0.00        0.00       0.00        0.00
  C1-IVB           0            0.00        0.00       0.00        0.00
  C1E-IVB          0            0.00        0.00       0.00        0.00
  C3-IVB           0            0.00        0.00       0.00        0.00
  C6-IVB           0            0.00        0.00       0.00        0.00
  C7-IVB         475     87941356.00   185139.70     322.00  1500690.00
  1701             0            0.00        0.00       0.00        0.00
  1700             0            0.00        0.00       0.00        0.00
  1600             0            0.00        0.00       0.00        0.00
  1500             0            0.00        0.00       0.00        0.00
  1400             0            0.00        0.00       0.00        0.00
  1300             0            0.00        0.00       0.00        0.00
  1200             0            0.00        0.00       0.00        0.00
  1100             0            0.00        0.00       0.00        0.00
  1000             0            0.00        0.00       0.00        0.00
  900              0            0.00        0.00       0.00        0.00
  800              0            0.00        0.00       0.00        0.00
  782              0            0.00        0.00       0.00        0.00

cpu1 wakeups    name                 count
  irq009        acpi                     3

core1@state     hits       total(us)     avg(us)    min(us)     max(us)
  POLL             0            0.00        0.00       0.00        0.00
  C1-IVB           0            0.00        0.00       0.00        0.00
  C1E-IVB          0            0.00        0.00       0.00        0.00
  C3-IVB           0            0.00        0.00       0.00        0.00
  C6-IVB           0            0.00        0.00       0.00        0.00
  C7-IVB           0            0.00        0.00       0.00        0.00

cpu2@state      hits       total(us)     avg(us)    min(us)     max(us)
  POLL             0            0.00        0.00       0.00        0.00
  C1-IVB          11       288157.00    26196.09      16.00   200060.00
  C1E-IVB          6       221601.00    36933.50      79.00   200066.00
  C3-IVB           0            0.00        0.00       0.00        0.00
  C6-IVB           0            0.00        0.00       0.00        0.00
  C7-IVB         950     87417466.00    92018.39      19.00   200074.00
  1701             0            0.00        0.00       0.00        0.00
  1700             0            0.00        0.00       0.00        0.00
  1600             0            0.00        0.00       0.00        0.00
  1500             2           34.00       17.00      11.00       23.00
  1400             0            0.00        0.00       0.00        0.00
  1300             0            0.00        0.00       0.00        0.00
  1200             0            0.00        0.00       0.00        0.00
  1100             0            0.00        0.00       0.00        0.00
  1000             0            0.00        0.00       0.00        0.00
  900              0            0.00        0.00       0.00        0.00
  800              0            0.00        0.00       0.00        0.00
  782            745        18800.00       25.23       2.00      156.00

cpu2 wakeups    name                 count
  irq019        ahci                    50
  irq009        acpi                    17

cpu3@state      hits       total(us)     avg(us)    min(us)     max(us)
  POLL             0            0.00        0.00       0.00        0.00
  C1-IVB           0            0.00        0.00       0.00        0.00
  C1E-IVB          0            0.00        0.00       0.00        0.00
  C3-IVB           0            0.00        0.00       0.00        0.00
  C6-IVB           0            0.00        0.00       0.00        0.00
  C7-IVB           0            0.00        0.00       0.00        0.00
  1701             0            0.00        0.00       0.00        0.00
  1700             0            0.00        0.00       0.00        0.00
  1600             0            0.00        0.00       0.00        0.00
  1500             0            0.00        0.00       0.00        0.00
  1400             0            0.00        0.00       0.00        0.00
  1300             0            0.00        0.00       0.00        0.00
  1200             0            0.00        0.00       0.00        0.00
  1100             0            0.00        0.00       0.00        0.00
  1000             0            0.00        0.00       0.00        0.00
  900              0            0.00        0.00       0.00        0.00
  800              0            0.00        0.00       0.00        0.00
  782              0            0.00        0.00       0.00        0.00

cpu3 wakeups    name                 count

================  test with hackbench 3.14-rc8 + patchset  ====================

/usr/bin/hackbench -l 10000 -s 4096
Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
Each sender will pass 10000 messages of 4096 bytes
Time: 42.179
Total trace buffer: 1846688 kB

clusterA@state  hits       total(us)     avg(us)    min(us)     max(us)
  POLL             0            0.00        0.00       0.00        0.00
  C1-IVB           0            0.00        0.00       0.00        0.00
  C1E-IVB          0            0.00        0.00       0.00        0.00
  C3-IVB           0            0.00        0.00       0.00        0.00
  C6-IVB           0            0.00        0.00       0.00        0.00
  C7-IVB           0            0.00        0.00       0.00        0.00

core0@state     hits       total(us)     avg(us)    min(us)     max(us)
  POLL             0            0.00        0.00       0.00        0.00
  C1-IVB           0            0.00        0.00       0.00        0.00
  C1E-IVB          0            0.00        0.00       0.00        0.00
  C3-IVB           0            0.00        0.00       0.00        0.00
  C6-IVB           0            0.00        0.00       0.00        0.00
  C7-IVB         880     89157590.00   101315.44       0.00   400184.00

cpu0@state      hits       total(us)     avg(us)    min(us)     max(us)
  POLL             0            0.00        0.00       0.00        0.00
  C1-IVB           0            0.00        0.00       0.00        0.00
  C1E-IVB          1          233.00      233.00     233.00      233.00
  C3-IVB           1          260.00      260.00     260.00      260.00
  C6-IVB           0            0.00        0.00       0.00        0.00
  C7-IVB         700     89162006.00   127374.29     182.00   400187.00
  1701             0            0.00        0.00       0.00        0.00
  1700             0            0.00        0.00       0.00        0.00
  1600             0            0.00        0.00       0.00        0.00
  1500             0            0.00        0.00       0.00        0.00
  1400             0            0.00        0.00       0.00        0.00
  1300             0            0.00        0.00       0.00        0.00
  1200             0            0.00        0.00       0.00        0.00
  1100             0            0.00        0.00       0.00        0.00
  1000             0            0.00        0.00       0.00        0.00
  900              0            0.00        0.00       0.00        0.00
  800              0            0.00        0.00       0.00        0.00
  782              0            0.00        0.00       0.00        0.00

cpu0 wakeups    name                 count
  irq009        acpi                     2

cpu1@state      hits       total(us)     avg(us)    min(us)     max(us)
  POLL             0            0.00        0.00       0.00        0.00
  C1-IVB           0            0.00        0.00       0.00        0.00
  C1E-IVB          0            0.00        0.00       0.00        0.00
  C3-IVB           0            0.00        0.00       0.00        0.00
  C6-IVB           0            0.00        0.00       0.00        0.00
  C7-IVB         334     89164805.00   266960.49       1.00  1500677.00
  1701             0            0.00        0.00       0.00        0.00
  1700             0            0.00        0.00       0.00        0.00
  1600             0            0.00        0.00       0.00        0.00
  1500             0            0.00        0.00       0.00        0.00
  1400             0            0.00        0.00       0.00        0.00
  1300             0            0.00        0.00       0.00        0.00
  1200             0            0.00        0.00       0.00        0.00
  1100             0            0.00        0.00       0.00        0.00
  1000             0            0.00        0.00       0.00        0.00
  900              0            0.00        0.00       0.00        0.00
  800              0            0.00        0.00       0.00        0.00
  782              0            0.00        0.00       0.00        0.00

cpu1 wakeups    name                 count
  irq009        acpi                     6

core1@state     hits       total(us)     avg(us)    min(us)     max(us)
  POLL             0            0.00        0.00       0.00        0.00
  C1-IVB           0            0.00        0.00       0.00        0.00
  C1E-IVB          0            0.00        0.00       0.00        0.00
  C3-IVB           0            0.00        0.00       0.00        0.00
  C6-IVB           0            0.00        0.00       0.00        0.00
  C7-IVB           0            0.00        0.00       0.00        0.00

cpu2@state      hits       total(us)     avg(us)    min(us)     max(us)
  POLL             0            0.00        0.00       0.00        0.00
  C1-IVB          19      2169047.00   114160.37      18.00   999129.00
  C1E-IVB          0            0.00        0.00       0.00        0.00
  C3-IVB           0            0.00        0.00       0.00        0.00
  C6-IVB           0            0.00        0.00       0.00        0.00
  C7-IVB         376     86993307.00   231365.18      20.00  1500682.00
  1701             0            0.00        0.00       0.00        0.00
  1700             0            0.00        0.00       0.00        0.00
  1600             0            0.00        0.00       0.00        0.00
  1500             0            0.00        0.00       0.00        0.00
  1400             0            0.00        0.00       0.00        0.00
  1300             0            0.00        0.00       0.00        0.00
  1200             0            0.00        0.00       0.00        0.00
  1100             0            0.00        0.00       0.00        0.00
  1000             0            0.00        0.00       0.00        0.00
  900              0            0.00        0.00       0.00        0.00
  800              0            0.00        0.00       0.00        0.00
  782              0            0.00        0.00       0.00        0.00

cpu2 wakeups    name                 count
  irq009        acpi                    32
  irq019        ahci                    45

cpu3@state      hits       total(us)     avg(us)    min(us)     max(us)
  POLL             0            0.00        0.00       0.00        0.00
  C1-IVB           0            0.00        0.00       0.00        0.00
  C1E-IVB          0            0.00        0.00       0.00        0.00
  C3-IVB           0            0.00        0.00       0.00        0.00
  C6-IVB           0            0.00        0.00       0.00        0.00
  C7-IVB           0            0.00        0.00       0.00        0.00
  1701             0            0.00        0.00       0.00        0.00
  1700             0            0.00        0.00       0.00        0.00
  1600             0            0.00        0.00       0.00        0.00
  1500             0            0.00        0.00       0.00        0.00
  1400             0            0.00        0.00       0.00        0.00
  1300             0            0.00        0.00       0.00        0.00
  1200             0            0.00        0.00       0.00        0.00
  1100             0            0.00        0.00       0.00        0.00
  1000             0            0.00        0.00       0.00        0.00
  900              0            0.00        0.00       0.00        0.00
  800              0            0.00        0.00       0.00        0.00
  782              0            0.00        0.00       0.00        0.00

cpu3 wakeups    name                 count

Daniel Lezcano (3):
  cpuidle: encapsulate power info in a separate structure
  idle: store the idle state the cpu is
  sched/fair: use the idle state info to choose the idlest cpu

 arch/arm/include/asm/cpuidle.h       |    6 +-
 arch/arm/mach-exynos/cpuidle.c       |    4 +-
 drivers/acpi/processor_idle.c        |    4 +-
 drivers/base/power/domain.c          |    6 +-
 drivers/cpuidle/cpuidle-at91.c       |    4 +-
 drivers/cpuidle/cpuidle-big_little.c |    9 +--
 drivers/cpuidle/cpuidle-calxeda.c    |    6 +-
 drivers/cpuidle/cpuidle-kirkwood.c   |    4 +-
 drivers/cpuidle/cpuidle-powernv.c    |    8 +--
 drivers/cpuidle/cpuidle-pseries.c    |   12 ++--
 drivers/cpuidle/cpuidle-ux500.c      |   14 ++---
 drivers/cpuidle/cpuidle-zynq.c       |    4 +-
 drivers/cpuidle/driver.c             |    6 +-
 drivers/cpuidle/governors/ladder.c   |   14 +++--
 drivers/cpuidle/governors/menu.c     |    8 +--
 drivers/cpuidle/sysfs.c              |    2 +-
 drivers/idle/intel_idle.c            |  112 +++++++++++++++++-----------------
 include/linux/cpuidle.h              |   10 ++-
 kernel/sched/fair.c                  |   46 ++++++++++++--
 kernel/sched/idle.c                  |   17 +++++-
 kernel/sched/sched.h                 |    5 ++
 21 files changed, 180 insertions(+), 121 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 47+ messages in thread
* [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure
  2014-03-28 12:29 [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Daniel Lezcano
@ 2014-03-28 12:29 ` Daniel Lezcano
  2014-03-28 18:17   ` Nicolas Pitre
  2014-03-28 12:29 ` [RFC PATCHC 2/3] idle: store the idle state the cpu is Daniel Lezcano
  ` (4 subsequent siblings)
  5 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-03-28 12:29 UTC (permalink / raw)
  To: linux-kernel, mingo, peterz
  Cc: rjw, nicolas.pitre, linux-pm, alex.shi, vincent.guittot,
      morten.rasmussen

The scheduler needs some information from cpuidle to know the timing of
the specific idle state a cpu is in.

This patch creates a separate structure to group the cpuidle power info
in order to share it with the scheduler. It improves the encapsulation
of the code.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 arch/arm/include/asm/cpuidle.h       |    6 +-
 arch/arm/mach-exynos/cpuidle.c       |    4 +-
 drivers/acpi/processor_idle.c        |    4 +-
 drivers/base/power/domain.c          |    6 +-
 drivers/cpuidle/cpuidle-at91.c       |    4 +-
 drivers/cpuidle/cpuidle-big_little.c |    9 +--
 drivers/cpuidle/cpuidle-calxeda.c    |    6 +-
 drivers/cpuidle/cpuidle-kirkwood.c   |    4 +-
 drivers/cpuidle/cpuidle-powernv.c    |    8 +--
 drivers/cpuidle/cpuidle-pseries.c    |   12 ++--
 drivers/cpuidle/cpuidle-ux500.c      |   14 ++---
 drivers/cpuidle/cpuidle-zynq.c       |    4 +-
 drivers/cpuidle/driver.c             |    6 +-
 drivers/cpuidle/governors/ladder.c   |   14 +++--
 drivers/cpuidle/governors/menu.c     |    8 +--
 drivers/cpuidle/sysfs.c              |    2 +-
 drivers/idle/intel_idle.c            |  112 +++++++++++++++++-----------------
 include/linux/cpuidle.h              |   10 ++-
 18 files changed, 120 insertions(+), 113 deletions(-)

diff --git a/arch/arm/include/asm/cpuidle.h b/arch/arm/include/asm/cpuidle.h
index 2fca60a..987ee53 100644
--- a/arch/arm/include/asm/cpuidle.h
+++ b/arch/arm/include/asm/cpuidle.h
@@ -12,9 +12,9 @@ static inline int arm_cpuidle_simple_enter(struct cpuidle_device *dev,
 /* Common ARM WFI state */
 #define ARM_CPUIDLE_WFI_STATE_PWR(p)	{\
 	.enter = arm_cpuidle_simple_enter,\
-	.exit_latency = 1,\
-	.target_residency = 1,\
-	.power_usage = p,\
+	.power.exit_latency = 1,\
+	.power.target_residency = 1,\
+	.power.power_usage = p,\
 	.flags = CPUIDLE_FLAG_TIME_VALID,\
 	.name = "WFI",\
 	.desc = "ARM WFI",\
diff --git a/arch/arm/mach-exynos/cpuidle.c b/arch/arm/mach-exynos/cpuidle.c
index f57cb91..f6275cb 100644
--- a/arch/arm/mach-exynos/cpuidle.c
+++ b/arch/arm/mach-exynos/cpuidle.c
@@ -73,8 +73,8 @@ static struct cpuidle_driver exynos4_idle_driver = {
 	[0] = ARM_CPUIDLE_WFI_STATE,
 	[1] = {
 		.enter			= exynos4_enter_lowpower,
-		.exit_latency		= 300,
-		.target_residency	= 100000,
+		.power.exit_latency	= 300,
+		.power.target_residency	= 100000,
 		.flags			= CPUIDLE_FLAG_TIME_VALID,
 		.name			= "C1",
 		.desc			= "ARM power down",
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 3dca36d..05fa991 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -979,8 +979,8 @@ static int acpi_processor_setup_cpuidle_states(struct acpi_processor *pr)
 		state = &drv->states[count];
 		snprintf(state->name, CPUIDLE_NAME_LEN, "C%d", i);
 		strncpy(state->desc, cx->desc, CPUIDLE_DESC_LEN);
-		state->exit_latency = cx->latency;
-		state->target_residency = cx->latency * latency_factor;
+		state->power.exit_latency = cx->latency;
+		state->power.target_residency = cx->latency * latency_factor;
 		state->flags = 0;
 		switch (cx->type) {
diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index bfb8955..6bcb1e8 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -154,7 +154,7 @@ static void genpd_recalc_cpu_exit_latency(struct generic_pm_domain *genpd)
 	usecs64 = genpd->power_on_latency_ns;
 	do_div(usecs64, NSEC_PER_USEC);
 	usecs64 += genpd->cpu_data->saved_exit_latency;
-	genpd->cpu_data->idle_state->exit_latency = usecs64;
+	genpd->cpu_data->idle_state->power.exit_latency = usecs64;
 }
 
 /**
@@ -1882,7 +1882,7 @@ int pm_genpd_attach_cpuidle(struct generic_pm_domain *genpd, int state)
 		goto err;
 	}
 	cpu_data->idle_state = idle_state;
-	cpu_data->saved_exit_latency = idle_state->exit_latency;
+	cpu_data->saved_exit_latency = idle_state->power.exit_latency;
 	genpd->cpu_data = cpu_data;
 	genpd_recalc_cpu_exit_latency(genpd);
@@ -1936,7 +1936,7 @@ int pm_genpd_detach_cpuidle(struct generic_pm_domain *genpd)
 		ret = -EAGAIN;
 		goto out;
 	}
-	idle_state->exit_latency = cpu_data->saved_exit_latency;
+	idle_state->power.exit_latency = cpu_data->saved_exit_latency;
 	cpuidle_driver_unref();
 	genpd->cpu_data = NULL;
 	kfree(cpu_data);
diff --git a/drivers/cpuidle/cpuidle-at91.c b/drivers/cpuidle/cpuidle-at91.c
index a077437..48c7063 100644
--- a/drivers/cpuidle/cpuidle-at91.c
+++ b/drivers/cpuidle/cpuidle-at91.c
@@ -40,9 +40,9 @@ static struct cpuidle_driver at91_idle_driver = {
 	.owner			= THIS_MODULE,
 	.states[0]		= ARM_CPUIDLE_WFI_STATE,
 	.states[1]		= {
+		.power.exit_latency	= 10,
+		.power.target_residency	= 10000,
 		.enter			= at91_enter_idle,
-		.exit_latency		= 10,
-		.target_residency	= 10000,
 		.flags			= CPUIDLE_FLAG_TIME_VALID,
 		.name			= "RAM_SR",
 		.desc			= "WFI and DDR Self Refresh",
diff --git a/drivers/cpuidle/cpuidle-big_little.c b/drivers/cpuidle/cpuidle-big_little.c
index b45fc62..5a0af4b 100644
--- a/drivers/cpuidle/cpuidle-big_little.c
+++ b/drivers/cpuidle/cpuidle-big_little.c
@@ -62,9 +62,9 @@ static struct cpuidle_driver bl_idle_little_driver = {
 	.owner = THIS_MODULE,
 	.states[0] = ARM_CPUIDLE_WFI_STATE,
 	.states[1] = {
+		.power.exit_latency = 700,
+		.power.target_residency = 2500,
 		.enter = bl_enter_powerdown,
-		.exit_latency = 700,
-		.target_residency = 2500,
 		.flags = CPUIDLE_FLAG_TIME_VALID |
			 CPUIDLE_FLAG_TIMER_STOP,
 		.name = "C1",
@@ -78,9 +78,10 @@ static struct cpuidle_driver bl_idle_big_driver = {
 	.owner = THIS_MODULE,
 	.states[0] = ARM_CPUIDLE_WFI_STATE,
 	.states[1] = {
+
+		.power.exit_latency = 500,
+		.power.target_residency = 2000,
 		.enter = bl_enter_powerdown,
-		.exit_latency = 500,
-		.target_residency = 2000,
 		.flags = CPUIDLE_FLAG_TIME_VALID |
			 CPUIDLE_FLAG_TIMER_STOP,
 		.name = "C1",
diff --git a/drivers/cpuidle/cpuidle-calxeda.c b/drivers/cpuidle/cpuidle-calxeda.c
index 6e51114..8357a20 100644
--- a/drivers/cpuidle/cpuidle-calxeda.c
+++ b/drivers/cpuidle/cpuidle-calxeda.c
@@ -56,9 +56,9 @@ static struct cpuidle_driver calxeda_idle_driver = {
 			.name = "PG",
 			.desc = "Power Gate",
 			.flags = CPUIDLE_FLAG_TIME_VALID,
-			.exit_latency = 30,
-			.power_usage = 50,
-			.target_residency = 200,
+			.power.exit_latency = 30,
+			.power.power_usage = 50,
+			.power.target_residency = 200,
 			.enter = calxeda_pwrdown_idle,
 		},
 	},
diff --git a/drivers/cpuidle/cpuidle-kirkwood.c b/drivers/cpuidle/cpuidle-kirkwood.c
index 41ba843..0ae4138 100644
--- a/drivers/cpuidle/cpuidle-kirkwood.c
+++ b/drivers/cpuidle/cpuidle-kirkwood.c
@@ -44,9 +44,9 @@ static struct cpuidle_driver kirkwood_idle_driver = {
 	.owner			= THIS_MODULE,
 	.states[0]		= ARM_CPUIDLE_WFI_STATE,
 	.states[1]		= {
+		.power.exit_latency	= 10,
+		.power.target_residency	= 100000,
 		.enter			= kirkwood_enter_idle,
-		.exit_latency		= 10,
-		.target_residency	= 100000,
 		.flags			= CPUIDLE_FLAG_TIME_VALID,
 		.name			= "DDR SR",
 		.desc			= "WFI and DDR Self Refresh",
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index f48607c..c47cc02 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -62,15 +62,15 @@ static struct cpuidle_state powernv_states[] = {
 		.name = "snooze",
 		.desc = "snooze",
 		.flags = CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 0,
-		.target_residency = 0,
+		.power.exit_latency = 0,
+		.power.target_residency = 0,
 		.enter = &snooze_loop },
 	{ /* NAP */
 		.name = "NAP",
 		.desc = "NAP",
 		.flags = CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 10,
-		.target_residency = 100,
+		.power.exit_latency = 10,
+		.power.target_residency = 100,
 		.enter = &nap_loop },
 };
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 6f7b019..483d7e7 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -143,15 +143,15 @@ static struct cpuidle_state dedicated_states[] = {
 		.name = "snooze",
 		.desc = "snooze",
 		.flags = CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 0,
-		.target_residency = 0,
+		.power.exit_latency = 0,
+		.power.target_residency = 0,
 		.enter = &snooze_loop },
 	{ /* CEDE */
 		.name = "CEDE",
 		.desc = "CEDE",
 		.flags = CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 10,
-		.target_residency = 100,
+		.power.exit_latency = 10,
+		.power.target_residency = 100,
 		.enter = &dedicated_cede_loop },
 };
@@ -163,8 +163,8 @@ static struct cpuidle_state shared_states[] = {
 		.name = "Shared Cede",
 		.desc = "Shared Cede",
 		.flags = CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 0,
-		.target_residency = 0,
+		.power.exit_latency = 0,
+		.power.target_residency = 0,
 		.enter = &shared_cede_loop },
 };
diff --git a/drivers/cpuidle/cpuidle-ux500.c b/drivers/cpuidle/cpuidle-ux500.c
index 5e35804..3261eb2 100644
--- a/drivers/cpuidle/cpuidle-ux500.c
+++ b/drivers/cpuidle/cpuidle-ux500.c
@@ -98,13 +98,13 @@ static struct cpuidle_driver ux500_idle_driver = {
 	.states = {
 		ARM_CPUIDLE_WFI_STATE,
 		{
-			.enter		  = ux500_enter_idle,
-			.exit_latency	  = 70,
-			.target_residency = 260,
-			.flags		  = CPUIDLE_FLAG_TIME_VALID |
-					    CPUIDLE_FLAG_TIMER_STOP,
-			.name		  = "ApIdle",
-			.desc		  = "ARM Retention",
+			.power.exit_latency	  = 70,
+			.power.target_residency	  = 260,
+			.enter		  = ux500_enter_idle,
+			.flags		  = CPUIDLE_FLAG_TIME_VALID |
+					    CPUIDLE_FLAG_TIMER_STOP,
+			.name		  = "ApIdle",
+			.desc		  = "ARM Retention",
 		},
 	},
 	.safe_state_index = 0,
diff --git a/drivers/cpuidle/cpuidle-zynq.c b/drivers/cpuidle/cpuidle-zynq.c
index aded759..dddefb8 100644
--- a/drivers/cpuidle/cpuidle-zynq.c
+++ b/drivers/cpuidle/cpuidle-zynq.c
@@ -56,9 +56,9 @@ static struct cpuidle_driver zynq_idle_driver = {
 	.states = {
 		ARM_CPUIDLE_WFI_STATE,
 		{
+			.power.exit_latency = 10,
+			.power.target_residency = 10000,
 			.enter = zynq_enter_idle,
-			.exit_latency = 10,
-			.target_residency = 10000,
 			.flags = CPUIDLE_FLAG_TIME_VALID |
				 CPUIDLE_FLAG_TIMER_STOP,
 			.name = "RAM_SR",
diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index 06dbe7c..40ddd3c 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -206,9 +206,9 @@ static void poll_idle_init(struct cpuidle_driver *drv)
 	snprintf(state->name, CPUIDLE_NAME_LEN, "POLL");
 	snprintf(state->desc, CPUIDLE_DESC_LEN, "CPUIDLE CORE POLL IDLE");
-	state->exit_latency = 0;
-	state->target_residency = 0;
-	state->power_usage = -1;
+	state->power.exit_latency = 0;
+	state->power.target_residency = 0;
+	state->power.power_usage = -1;
 	state->flags = 0;
 	state->enter = poll_idle;
 	state->disabled = false;
diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
index 9f08e8c..4837880 100644
--- a/drivers/cpuidle/governors/ladder.c
+++ b/drivers/cpuidle/governors/ladder.c
@@ -81,7 +81,7 @@ static int ladder_select_state(struct cpuidle_driver *drv,
 	if (drv->states[last_idx].flags & CPUIDLE_FLAG_TIME_VALID) {
 		last_residency = cpuidle_get_last_residency(dev) - \
-					drv->states[last_idx].exit_latency;
+					drv->states[last_idx].power.exit_latency;
 	} else
 		last_residency = last_state->threshold.promotion_time + 1;
@@ -91,7 +91,7 @@ static int ladder_select_state(struct cpuidle_driver *drv,
 	    !drv->states[last_idx + 1].disabled &&
 	    !dev->states_usage[last_idx + 1].disable &&
 	    last_residency > last_state->threshold.promotion_time &&
-	    drv->states[last_idx + 1].exit_latency <= latency_req) {
+	    drv->states[last_idx + 1].power.exit_latency <= latency_req) {
 		last_state->stats.promotion_count++;
 		last_state->stats.demotion_count = 0;
 		if (last_state->stats.promotion_count >= last_state->threshold.promotion_count) {
@@ -104,11 +104,11 @@ static int ladder_select_state(struct cpuidle_driver *drv,
 	if (last_idx > CPUIDLE_DRIVER_STATE_START &&
 	    (drv->states[last_idx].disabled ||
 	    dev->states_usage[last_idx].disable ||
-	    drv->states[last_idx].exit_latency > latency_req)) {
+	    drv->states[last_idx].power.exit_latency > latency_req)) {
 		int i;
 
 		for (i = last_idx - 1; i > CPUIDLE_DRIVER_STATE_START; i--) {
-			if (drv->states[i].exit_latency <= latency_req)
+			if (drv->states[i].power.exit_latency <= latency_req)
 				break;
 		}
 		ladder_do_selection(ldev, last_idx, i);
@@ -155,9 +155,11 @@ static int ladder_enable_device(struct cpuidle_driver *drv,
 		lstate->threshold.demotion_count = DEMOTION_COUNT;
 
 		if (i < drv->state_count - 1)
-			lstate->threshold.promotion_time = state->exit_latency;
+			lstate->threshold.promotion_time =
+				state->power.exit_latency;
 		if (i > 0)
-			lstate->threshold.demotion_time = state->exit_latency;
+			lstate->threshold.demotion_time =
+				state->power.exit_latency;
 	}
 
 	return 0;
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index cf7f2f0..34bd463 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -351,15 +351,15 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 		if (s->disabled || su->disable)
 			continue;
-		if (s->target_residency > data->predicted_us)
+		if (s->power.target_residency > data->predicted_us)
 			continue;
-		if (s->exit_latency > latency_req)
+		if (s->power.exit_latency > latency_req)
 			continue;
-		if (s->exit_latency * multiplier > data->predicted_us)
+		if (s->power.exit_latency * multiplier > data->predicted_us)
 			continue;
 
 		data->last_state_idx = i;
-		data->exit_us = s->exit_latency;
+		data->exit_us = s->power.exit_latency;
 	}
 
 	return data->last_state_idx;
diff --git a/drivers/cpuidle/sysfs.c b/drivers/cpuidle/sysfs.c
index e918b6d..1a45541 100644
--- a/drivers/cpuidle/sysfs.c
+++ b/drivers/cpuidle/sysfs.c
@@ -252,7 +252,7 @@ static struct cpuidle_state_attr attr_##_name = __ATTR(_name, 0644, show, store)
 static ssize_t show_state_##_name(struct cpuidle_state *state, \
 			struct cpuidle_state_usage *state_usage, char *buf) \
 { \
-	return sprintf(buf, "%u\n", state->_name);\
+	return sprintf(buf, "%u\n", state->power._name);\
 }
 
 #define define_store_state_ull_function(_name) \
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 8e1939f..4f0533e 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -128,29 +128,29 @@ static struct cpuidle_state nehalem_cstates[] = {
 		.name = "C1-NHM",
 		.desc = "MWAIT 0x00",
 		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 3,
-		.target_residency = 6,
+		.power.exit_latency = 3,
+		.power.target_residency = 6,
 		.enter = &intel_idle },
 	{
 		.name = "C1E-NHM",
 		.desc = "MWAIT 0x01",
 		.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 10,
-		.target_residency = 20,
+		.power.exit_latency = 10,
+		.power.target_residency = 20,
 		.enter = &intel_idle },
 	{
 		.name = "C3-NHM",
 		.desc = "MWAIT 0x10",
 		.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 20,
-		.target_residency = 80,
+		.power.exit_latency = 20,
+		.power.target_residency = 80,
 		.enter = &intel_idle },
 	{
 		.name = "C6-NHM",
 		.desc = "MWAIT 0x20",
 		.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 200,
-		.target_residency = 800,
+		.power.exit_latency = 200,
+		.power.target_residency = 800,
 		.enter = &intel_idle },
 	{
 		.enter = NULL }
@@ -161,36 +161,36 @@ static struct cpuidle_state snb_cstates[] = {
 		.name = "C1-SNB",
 		.desc = "MWAIT 0x00",
 		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 2,
-		.target_residency = 2,
+		.power.exit_latency = 2,
+		.power.target_residency = 2,
 		.enter = &intel_idle },
 	{
 		.name = "C1E-SNB",
 		.desc = "MWAIT 0x01",
 		.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 10,
-		.target_residency = 20,
+		.power.exit_latency = 10,
+		.power.target_residency = 20,
 		.enter = &intel_idle },
 	{
 		.name = "C3-SNB",
 		.desc = "MWAIT 0x10",
 		.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 80,
-		.target_residency = 211,
+		.power.exit_latency = 80,
+		.power.target_residency = 211,
 		.enter = &intel_idle },
 	{
 		.name = "C6-SNB",
 		.desc = "MWAIT 0x20",
 		.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 104,
-		.target_residency = 345,
+		.power.exit_latency = 104,
+		.power.target_residency = 345,
 		.enter = &intel_idle },
 	{
 		.name = "C7-SNB",
 		.desc = "MWAIT 0x30",
 		.flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 109,
-		.target_residency = 345,
+		.power.exit_latency = 109,
+		.power.target_residency = 345,
 		.enter = &intel_idle },
 	{
 		.enter = NULL }
@@ -201,36 +201,36 @@ static struct cpuidle_state ivb_cstates[] = {
 		.name = "C1-IVB",
 		.desc = "MWAIT 0x00",
 		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 1,
-		.target_residency = 1,
+		.power.exit_latency = 1,
+		.power.target_residency = 1,
 		.enter = &intel_idle },
 	{
 		.name = "C1E-IVB",
 		.desc = "MWAIT 0x01",
 		.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 10,
-		.target_residency = 20,
+		.power.exit_latency = 10,
+		.power.target_residency = 20,
 		.enter = &intel_idle },
 	{
 		.name = "C3-IVB",
 		.desc = "MWAIT 0x10",
 		.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 59,
-		.target_residency = 156,
+		.power.exit_latency = 59,
+		.power.target_residency = 156,
 		.enter = &intel_idle },
 	{
 		.name = "C6-IVB",
 		.desc = "MWAIT 0x20",
 		.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 80,
-		.target_residency = 300,
+		.power.exit_latency = 80,
+		.power.target_residency = 300,
 		.enter = &intel_idle },
 	{
 		.name = "C7-IVB",
 		.desc = "MWAIT 0x30",
 		.flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 87,
-		.target_residency = 300,
+		.power.exit_latency = 87,
+		.power.target_residency = 300,
 		.enter = &intel_idle },
 	{
 		.enter = NULL }
@@ -241,57 +241,57 @@ static struct cpuidle_state hsw_cstates[] = {
 		.name = "C1-HSW",
 		.desc = "MWAIT 0x00",
 		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 2,
-		.target_residency = 2,
+		.power.exit_latency = 2,
+		.power.target_residency = 2,
 		.enter = &intel_idle },
 	{
 		.name = "C1E-HSW",
 		.desc = "MWAIT 0x01",
 		.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 10,
-		.target_residency = 20,
+		.power.exit_latency = 10,
+		.power.target_residency = 20,
 		.enter = &intel_idle },
 	{
 		.name = "C3-HSW",
 		.desc = "MWAIT 0x10",
 		.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 33,
-		.target_residency = 100,
+		.power.exit_latency = 33,
+		.power.target_residency = 100,
 		.enter = &intel_idle },
 	{
		.name = "C6-HSW",
 		.desc = "MWAIT 0x20",
 		.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 133,
-		.target_residency = 400,
+		.power.exit_latency = 133,
+		.power.target_residency = 400,
 		.enter = &intel_idle },
 	{
 		.name = "C7s-HSW",
 		.desc = "MWAIT 0x32",
 		.flags = MWAIT2flg(0x32) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 166,
-		.target_residency = 500,
+		.power.exit_latency = 166,
+		.power.target_residency = 500,
 		.enter = &intel_idle },
 	{
 		.name = "C8-HSW",
 		.desc = "MWAIT 0x40",
 		.flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 300,
-		.target_residency = 900,
+		.power.exit_latency = 300,
+		.power.target_residency = 900,
 		.enter = &intel_idle },
 	{
 		.name = "C9-HSW",
 		.desc = "MWAIT 0x50",
 		.flags = MWAIT2flg(0x50) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 600,
-		.target_residency = 1800,
+		.power.exit_latency = 600,
+		.power.target_residency = 1800,
 		.enter = &intel_idle },
 	{
 		.name = "C10-HSW",
 		.desc = "MWAIT 0x60",
 		.flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 2600,
-		.target_residency = 7700,
+		.power.exit_latency = 2600,
+		.power.target_residency = 7700,
 		.enter = &intel_idle },
 	{
 		.enter = NULL }
@@ -302,29 +302,29 @@ static struct cpuidle_state atom_cstates[] = {
 		.name = "C1E-ATM",
 		.desc = "MWAIT 0x00",
 		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 10,
-		.target_residency = 20,
+		.power.exit_latency = 10,
+		.power.target_residency = 20,
 		.enter = &intel_idle },
 	{
 		.name = "C2-ATM",
 		.desc = "MWAIT 0x10",
 		.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 20,
-		.target_residency = 80,
+		.power.exit_latency = 20,
+		.power.target_residency = 80,
 		.enter = &intel_idle },
 	{
 		.name = "C4-ATM",
 		.desc = "MWAIT 0x30",
 		.flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 100,
-		.target_residency = 400,
+		.power.exit_latency = 100,
+		.power.target_residency = 400,
 		.enter = &intel_idle },
 	{
 		.name = "C6-ATM",
 		.desc = "MWAIT 0x52",
 		.flags = MWAIT2flg(0x52) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 140,
-		.target_residency = 560,
+		.power.exit_latency = 140,
+		.power.target_residency = 560,
 		.enter = &intel_idle },
 	{
 		.enter = NULL }
@@ -334,15 +334,15 @@ static struct cpuidle_state avn_cstates[] = {
 		.name = "C1-AVN",
 		.desc = "MWAIT 0x00",
 		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 2,
-		.target_residency = 2,
+		.power.exit_latency = 2,
+		.power.target_residency = 2,
 		.enter = &intel_idle },
 	{
 		.name = "C6-AVN",
 		.desc = "MWAIT 0x51",
 		.flags = MWAIT2flg(0x51) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 15,
-		.target_residency = 45,
+		.power.exit_latency = 15,
+		.power.target_residency = 45,
 		.enter = &intel_idle },
 	{
 		.enter = NULL }
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index b0238cb..eb58ab3 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -35,14 +35,18 @@ struct cpuidle_state_usage {
 	unsigned long long	time; /* in US */
 };
 
+struct cpuidle_power {
+	unsigned int	exit_latency;		/* in US */
+	unsigned int	target_residency;	/* in US */
+	int		power_usage;		/* in mW */
+};
+
 struct cpuidle_state {
 	char		name[CPUIDLE_NAME_LEN];
 	char		desc[CPUIDLE_DESC_LEN];
 
 	unsigned int	flags;
-	unsigned int	exit_latency; /* in US */
-	int		power_usage; /* in mW */
-	unsigned int	target_residency; /* in US */
+	struct cpuidle_power power;
 	bool		disabled; /* disabled on all CPUs */
 
 	int (*enter)	(struct cpuidle_device *dev,
-- 
1.7.9.5
* Re: [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure
  2014-03-28 12:29 ` [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure Daniel Lezcano
@ 2014-03-28 18:17   ` Nicolas Pitre
  2014-03-28 20:42     ` Daniel Lezcano
  0 siblings, 1 reply; 47+ messages in thread
From: Nicolas Pitre @ 2014-03-28 18:17 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linux-kernel, mingo, peterz, rjw, linux-pm, alex.shi,
      vincent.guittot, morten.rasmussen

On Fri, 28 Mar 2014, Daniel Lezcano wrote:

> The scheduler needs some information from cpuidle to know the timing of
> the specific idle state a cpu is in.
>
> This patch creates a separate structure to group the cpuidle power info
> in order to share it with the scheduler. It improves the encapsulation
> of the code.

Having cpuidle_power as a structure name, or worse, 'power' as a struct
member, is a really bad choice. Amongst the fields this struct contains,
only 1 out of 3 is about power. The word "power" is already abused quite
significantly to mean too many different things.

I'd suggest something inspired by your own patch log message, i.e.
'struct cpuidle_info' instead, and use 'info' as a field name within
struct cpuidle_state. Having 'params' instead of 'info' could be a good
alternative too, although slightly longer.

And with struct rq in patch 2/3 I'd simply use:

	struct cpuidle_info *cpuidle;

This way you'll have rq->cpuidle->exit_latency to refer to from the
scheduler context, which is IMHO much more self explanatory.

> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---
>  arch/arm/include/asm/cpuidle.h       |    6 +-
>  arch/arm/mach-exynos/cpuidle.c       |    4 +-
>  drivers/acpi/processor_idle.c        |    4 +-
>  drivers/base/power/domain.c          |    6 +-
>  drivers/cpuidle/cpuidle-at91.c       |    4 +-
>  drivers/cpuidle/cpuidle-big_little.c |    9 +--
>  drivers/cpuidle/cpuidle-calxeda.c    |    6 +-
>  drivers/cpuidle/cpuidle-kirkwood.c   |    4 +-
>  drivers/cpuidle/cpuidle-powernv.c    |    8 +--
>  drivers/cpuidle/cpuidle-pseries.c    |   12 ++--
>  drivers/cpuidle/cpuidle-ux500.c      |   14 ++---
>  drivers/cpuidle/cpuidle-zynq.c       |    4 +-
>  drivers/cpuidle/driver.c             |    6 +-
>  drivers/cpuidle/governors/ladder.c   |   14 +++--
>  drivers/cpuidle/governors/menu.c     |    8 +--
>  drivers/cpuidle/sysfs.c              |    2 +-
>  drivers/idle/intel_idle.c            |  112 +++++++++++++++++-----------------
>  include/linux/cpuidle.h              |   10 ++-
>  18 files changed, 120 insertions(+), 113 deletions(-)
>
> diff --git a/arch/arm/include/asm/cpuidle.h b/arch/arm/include/asm/cpuidle.h
> index 2fca60a..987ee53 100644
> --- a/arch/arm/include/asm/cpuidle.h
> +++ b/arch/arm/include/asm/cpuidle.h
> @@ -12,9 +12,9 @@ static inline int arm_cpuidle_simple_enter(struct cpuidle_device *dev,
>  /* Common ARM WFI state */
>  #define ARM_CPUIDLE_WFI_STATE_PWR(p)	{\
>  	.enter = arm_cpuidle_simple_enter,\
> -	.exit_latency = 1,\
> -	.target_residency = 1,\
> -	.power_usage = p,\
> +	.power.exit_latency = 1,\
> +	.power.target_residency = 1,\
> +	.power.power_usage = p,\
>  	.flags = CPUIDLE_FLAG_TIME_VALID,\
>  	.name = "WFI",\
>  	.desc = "ARM WFI",\
> diff --git a/arch/arm/mach-exynos/cpuidle.c b/arch/arm/mach-exynos/cpuidle.c
> index f57cb91..f6275cb 100644
> --- a/arch/arm/mach-exynos/cpuidle.c
> +++ b/arch/arm/mach-exynos/cpuidle.c
> @@ -73,8 +73,8 @@ static struct cpuidle_driver exynos4_idle_driver = {
>  	[0] = ARM_CPUIDLE_WFI_STATE,
>  	[1] = {
>  		.enter			= exynos4_enter_lowpower,
> -		.exit_latency		= 300,
> -		.target_residency	= 100000,
> +		.power.exit_latency	= 300,
> +		.power.target_residency	= 100000,
> 		.flags = 
CPUIDLE_FLAG_TIME_VALID, > .name = "C1", > .desc = "ARM power down", > diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c > index 3dca36d..05fa991 100644 > --- a/drivers/acpi/processor_idle.c > +++ b/drivers/acpi/processor_idle.c > @@ -979,8 +979,8 @@ static int acpi_processor_setup_cpuidle_states(struct acpi_processor *pr) > state = &drv->states[count]; > snprintf(state->name, CPUIDLE_NAME_LEN, "C%d", i); > strncpy(state->desc, cx->desc, CPUIDLE_DESC_LEN); > - state->exit_latency = cx->latency; > - state->target_residency = cx->latency * latency_factor; > + state->power.exit_latency = cx->latency; > + state->power.target_residency = cx->latency * latency_factor; > > state->flags = 0; > switch (cx->type) { > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c > index bfb8955..6bcb1e8 100644 > --- a/drivers/base/power/domain.c > +++ b/drivers/base/power/domain.c > @@ -154,7 +154,7 @@ static void genpd_recalc_cpu_exit_latency(struct generic_pm_domain *genpd) > usecs64 = genpd->power_on_latency_ns; > do_div(usecs64, NSEC_PER_USEC); > usecs64 += genpd->cpu_data->saved_exit_latency; > - genpd->cpu_data->idle_state->exit_latency = usecs64; > + genpd->cpu_data->idle_state->power.exit_latency = usecs64; > } > > /** > @@ -1882,7 +1882,7 @@ int pm_genpd_attach_cpuidle(struct generic_pm_domain *genpd, int state) > goto err; > } > cpu_data->idle_state = idle_state; > - cpu_data->saved_exit_latency = idle_state->exit_latency; > + cpu_data->saved_exit_latency = idle_state->power.exit_latency; > genpd->cpu_data = cpu_data; > genpd_recalc_cpu_exit_latency(genpd); > > @@ -1936,7 +1936,7 @@ int pm_genpd_detach_cpuidle(struct generic_pm_domain *genpd) > ret = -EAGAIN; > goto out; > } > - idle_state->exit_latency = cpu_data->saved_exit_latency; > + idle_state->power.exit_latency = cpu_data->saved_exit_latency; > cpuidle_driver_unref(); > genpd->cpu_data = NULL; > kfree(cpu_data); > diff --git a/drivers/cpuidle/cpuidle-at91.c 
b/drivers/cpuidle/cpuidle-at91.c > index a077437..48c7063 100644 > --- a/drivers/cpuidle/cpuidle-at91.c > +++ b/drivers/cpuidle/cpuidle-at91.c > @@ -40,9 +40,9 @@ static struct cpuidle_driver at91_idle_driver = { > .owner = THIS_MODULE, > .states[0] = ARM_CPUIDLE_WFI_STATE, > .states[1] = { > + .power.exit_latency = 10, > + .power.target_residency = 10000, > .enter = at91_enter_idle, > - .exit_latency = 10, > - .target_residency = 10000, > .flags = CPUIDLE_FLAG_TIME_VALID, > .name = "RAM_SR", > .desc = "WFI and DDR Self Refresh", > diff --git a/drivers/cpuidle/cpuidle-big_little.c b/drivers/cpuidle/cpuidle-big_little.c > index b45fc62..5a0af4b 100644 > --- a/drivers/cpuidle/cpuidle-big_little.c > +++ b/drivers/cpuidle/cpuidle-big_little.c > @@ -62,9 +62,9 @@ static struct cpuidle_driver bl_idle_little_driver = { > .owner = THIS_MODULE, > .states[0] = ARM_CPUIDLE_WFI_STATE, > .states[1] = { > + .power.exit_latency = 700, > + .power.target_residency = 2500, > .enter = bl_enter_powerdown, > - .exit_latency = 700, > - .target_residency = 2500, > .flags = CPUIDLE_FLAG_TIME_VALID | > CPUIDLE_FLAG_TIMER_STOP, > .name = "C1", > @@ -78,9 +78,10 @@ static struct cpuidle_driver bl_idle_big_driver = { > .owner = THIS_MODULE, > .states[0] = ARM_CPUIDLE_WFI_STATE, > .states[1] = { > + > + .power.exit_latency = 500, > + .power.target_residency = 2000, > .enter = bl_enter_powerdown, > - .exit_latency = 500, > - .target_residency = 2000, > .flags = CPUIDLE_FLAG_TIME_VALID | > CPUIDLE_FLAG_TIMER_STOP, > .name = "C1", > diff --git a/drivers/cpuidle/cpuidle-calxeda.c b/drivers/cpuidle/cpuidle-calxeda.c > index 6e51114..8357a20 100644 > --- a/drivers/cpuidle/cpuidle-calxeda.c > +++ b/drivers/cpuidle/cpuidle-calxeda.c > @@ -56,9 +56,9 @@ static struct cpuidle_driver calxeda_idle_driver = { > .name = "PG", > .desc = "Power Gate", > .flags = CPUIDLE_FLAG_TIME_VALID, > - .exit_latency = 30, > - .power_usage = 50, > - .target_residency = 200, > + .power.exit_latency = 30, > + 
.power.power_usage = 50, > + .power.target_residency = 200, > .enter = calxeda_pwrdown_idle, > }, > }, > diff --git a/drivers/cpuidle/cpuidle-kirkwood.c b/drivers/cpuidle/cpuidle-kirkwood.c > index 41ba843..0ae4138 100644 > --- a/drivers/cpuidle/cpuidle-kirkwood.c > +++ b/drivers/cpuidle/cpuidle-kirkwood.c > @@ -44,9 +44,9 @@ static struct cpuidle_driver kirkwood_idle_driver = { > .owner = THIS_MODULE, > .states[0] = ARM_CPUIDLE_WFI_STATE, > .states[1] = { > + .power.exit_latency = 10, > + .power.target_residency = 100000, > .enter = kirkwood_enter_idle, > - .exit_latency = 10, > - .target_residency = 100000, > .flags = CPUIDLE_FLAG_TIME_VALID, > .name = "DDR SR", > .desc = "WFI and DDR Self Refresh", > diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c > index f48607c..c47cc02 100644 > --- a/drivers/cpuidle/cpuidle-powernv.c > +++ b/drivers/cpuidle/cpuidle-powernv.c > @@ -62,15 +62,15 @@ static struct cpuidle_state powernv_states[] = { > .name = "snooze", > .desc = "snooze", > .flags = CPUIDLE_FLAG_TIME_VALID, > - .exit_latency = 0, > - .target_residency = 0, > + .power.exit_latency = 0, > + .power.target_residency = 0, > .enter = &snooze_loop }, > { /* NAP */ > .name = "NAP", > .desc = "NAP", > .flags = CPUIDLE_FLAG_TIME_VALID, > - .exit_latency = 10, > - .target_residency = 100, > + .power.exit_latency = 10, > + .power.target_residency = 100, > .enter = &nap_loop }, > }; > > diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c > index 6f7b019..483d7e7 100644 > --- a/drivers/cpuidle/cpuidle-pseries.c > +++ b/drivers/cpuidle/cpuidle-pseries.c > @@ -143,15 +143,15 @@ static struct cpuidle_state dedicated_states[] = { > .name = "snooze", > .desc = "snooze", > .flags = CPUIDLE_FLAG_TIME_VALID, > - .exit_latency = 0, > - .target_residency = 0, > + .power.exit_latency = 0, > + .power.target_residency = 0, > .enter = &snooze_loop }, > { /* CEDE */ > .name = "CEDE", > .desc = "CEDE", > .flags = 
CPUIDLE_FLAG_TIME_VALID, > - .exit_latency = 10, > - .target_residency = 100, > + .power.exit_latency = 10, > + .power.target_residency = 100, > .enter = &dedicated_cede_loop }, > }; > > @@ -163,8 +163,8 @@ static struct cpuidle_state shared_states[] = { > .name = "Shared Cede", > .desc = "Shared Cede", > .flags = CPUIDLE_FLAG_TIME_VALID, > - .exit_latency = 0, > - .target_residency = 0, > + .power.exit_latency = 0, > + .power.target_residency = 0, > .enter = &shared_cede_loop }, > }; > > diff --git a/drivers/cpuidle/cpuidle-ux500.c b/drivers/cpuidle/cpuidle-ux500.c > index 5e35804..3261eb2 100644 > --- a/drivers/cpuidle/cpuidle-ux500.c > +++ b/drivers/cpuidle/cpuidle-ux500.c > @@ -98,13 +98,13 @@ static struct cpuidle_driver ux500_idle_driver = { > .states = { > ARM_CPUIDLE_WFI_STATE, > { > - .enter = ux500_enter_idle, > - .exit_latency = 70, > - .target_residency = 260, > - .flags = CPUIDLE_FLAG_TIME_VALID | > - CPUIDLE_FLAG_TIMER_STOP, > - .name = "ApIdle", > - .desc = "ARM Retention", > + .power.exit_latency = 70, > + .power.target_residency = 260, > + .enter = ux500_enter_idle, > + .flags = CPUIDLE_FLAG_TIME_VALID | > + CPUIDLE_FLAG_TIMER_STOP, > + .name = "ApIdle", > + .desc = "ARM Retention", > }, > }, > .safe_state_index = 0, > diff --git a/drivers/cpuidle/cpuidle-zynq.c b/drivers/cpuidle/cpuidle-zynq.c > index aded759..dddefb8 100644 > --- a/drivers/cpuidle/cpuidle-zynq.c > +++ b/drivers/cpuidle/cpuidle-zynq.c > @@ -56,9 +56,9 @@ static struct cpuidle_driver zynq_idle_driver = { > .states = { > ARM_CPUIDLE_WFI_STATE, > { > + .power.exit_latency = 10, > + .power.target_residency = 10000, > .enter = zynq_enter_idle, > - .exit_latency = 10, > - .target_residency = 10000, > .flags = CPUIDLE_FLAG_TIME_VALID | > CPUIDLE_FLAG_TIMER_STOP, > .name = "RAM_SR", > diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c > index 06dbe7c..40ddd3c 100644 > --- a/drivers/cpuidle/driver.c > +++ b/drivers/cpuidle/driver.c > @@ -206,9 +206,9 @@ static void 
poll_idle_init(struct cpuidle_driver *drv) > > snprintf(state->name, CPUIDLE_NAME_LEN, "POLL"); > snprintf(state->desc, CPUIDLE_DESC_LEN, "CPUIDLE CORE POLL IDLE"); > - state->exit_latency = 0; > - state->target_residency = 0; > - state->power_usage = -1; > + state->power.exit_latency = 0; > + state->power.target_residency = 0; > + state->power.power_usage = -1; > state->flags = 0; > state->enter = poll_idle; > state->disabled = false; > diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c > index 9f08e8c..4837880 100644 > --- a/drivers/cpuidle/governors/ladder.c > +++ b/drivers/cpuidle/governors/ladder.c > @@ -81,7 +81,7 @@ static int ladder_select_state(struct cpuidle_driver *drv, > > if (drv->states[last_idx].flags & CPUIDLE_FLAG_TIME_VALID) { > last_residency = cpuidle_get_last_residency(dev) - \ > - drv->states[last_idx].exit_latency; > + drv->states[last_idx].power.exit_latency; > } > else > last_residency = last_state->threshold.promotion_time + 1; > @@ -91,7 +91,7 @@ static int ladder_select_state(struct cpuidle_driver *drv, > !drv->states[last_idx + 1].disabled && > !dev->states_usage[last_idx + 1].disable && > last_residency > last_state->threshold.promotion_time && > - drv->states[last_idx + 1].exit_latency <= latency_req) { > + drv->states[last_idx + 1].power.exit_latency <= latency_req) { > last_state->stats.promotion_count++; > last_state->stats.demotion_count = 0; > if (last_state->stats.promotion_count >= last_state->threshold.promotion_count) { > @@ -104,11 +104,11 @@ static int ladder_select_state(struct cpuidle_driver *drv, > if (last_idx > CPUIDLE_DRIVER_STATE_START && > (drv->states[last_idx].disabled || > dev->states_usage[last_idx].disable || > - drv->states[last_idx].exit_latency > latency_req)) { > + drv->states[last_idx].power.exit_latency > latency_req)) { > int i; > > for (i = last_idx - 1; i > CPUIDLE_DRIVER_STATE_START; i--) { > - if (drv->states[i].exit_latency <= latency_req) > + if 
(drv->states[i].power.exit_latency <= latency_req) > break; > } > ladder_do_selection(ldev, last_idx, i); > @@ -155,9 +155,11 @@ static int ladder_enable_device(struct cpuidle_driver *drv, > lstate->threshold.demotion_count = DEMOTION_COUNT; > > if (i < drv->state_count - 1) > - lstate->threshold.promotion_time = state->exit_latency; > + lstate->threshold.promotion_time = > + state->power.exit_latency; > if (i > 0) > - lstate->threshold.demotion_time = state->exit_latency; > + lstate->threshold.demotion_time = > + state->power.exit_latency; > } > > return 0; > diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c > index cf7f2f0..34bd463 100644 > --- a/drivers/cpuidle/governors/menu.c > +++ b/drivers/cpuidle/governors/menu.c > @@ -351,15 +351,15 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) > > if (s->disabled || su->disable) > continue; > - if (s->target_residency > data->predicted_us) > + if (s->power.target_residency > data->predicted_us) > continue; > - if (s->exit_latency > latency_req) > + if (s->power.exit_latency > latency_req) > continue; > - if (s->exit_latency * multiplier > data->predicted_us) > + if (s->power.exit_latency * multiplier > data->predicted_us) > continue; > > data->last_state_idx = i; > - data->exit_us = s->exit_latency; > + data->exit_us = s->power.exit_latency; > } > > return data->last_state_idx; > diff --git a/drivers/cpuidle/sysfs.c b/drivers/cpuidle/sysfs.c > index e918b6d..1a45541 100644 > --- a/drivers/cpuidle/sysfs.c > +++ b/drivers/cpuidle/sysfs.c > @@ -252,7 +252,7 @@ static struct cpuidle_state_attr attr_##_name = __ATTR(_name, 0644, show, store) > static ssize_t show_state_##_name(struct cpuidle_state *state, \ > struct cpuidle_state_usage *state_usage, char *buf) \ > { \ > - return sprintf(buf, "%u\n", state->_name);\ > + return sprintf(buf, "%u\n", state->power._name);\ > } > > #define define_store_state_ull_function(_name) \ > diff --git 
a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c > index 8e1939f..4f0533e 100644 > --- a/drivers/idle/intel_idle.c > +++ b/drivers/idle/intel_idle.c > @@ -128,29 +128,29 @@ static struct cpuidle_state nehalem_cstates[] = { > .name = "C1-NHM", > .desc = "MWAIT 0x00", > .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID, > - .exit_latency = 3, > - .target_residency = 6, > + .power.exit_latency = 3, > + .power.target_residency = 6, > .enter = &intel_idle }, > { > .name = "C1E-NHM", > .desc = "MWAIT 0x01", > .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID, > - .exit_latency = 10, > - .target_residency = 20, > + .power.exit_latency = 10, > + .power.target_residency = 20, > .enter = &intel_idle }, > { > .name = "C3-NHM", > .desc = "MWAIT 0x10", > .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, > - .exit_latency = 20, > - .target_residency = 80, > + .power.exit_latency = 20, > + .power.target_residency = 80, > .enter = &intel_idle }, > { > .name = "C6-NHM", > .desc = "MWAIT 0x20", > .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, > - .exit_latency = 200, > - .target_residency = 800, > + .power.exit_latency = 200, > + .power.target_residency = 800, > .enter = &intel_idle }, > { > .enter = NULL } > @@ -161,36 +161,36 @@ static struct cpuidle_state snb_cstates[] = { > .name = "C1-SNB", > .desc = "MWAIT 0x00", > .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID, > - .exit_latency = 2, > - .target_residency = 2, > + .power.exit_latency = 2, > + .power.target_residency = 2, > .enter = &intel_idle }, > { > .name = "C1E-SNB", > .desc = "MWAIT 0x01", > .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID, > - .exit_latency = 10, > - .target_residency = 20, > + .power.exit_latency = 10, > + .power.target_residency = 20, > .enter = &intel_idle }, > { > .name = "C3-SNB", > .desc = "MWAIT 0x10", > .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, > - .exit_latency = 80, > - 
.target_residency = 211, > + .power.exit_latency = 80, > + .power.target_residency = 211, > .enter = &intel_idle }, > { > .name = "C6-SNB", > .desc = "MWAIT 0x20", > .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, > - .exit_latency = 104, > - .target_residency = 345, > + .power.exit_latency = 104, > + .power.target_residency = 345, > .enter = &intel_idle }, > { > .name = "C7-SNB", > .desc = "MWAIT 0x30", > .flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, > - .exit_latency = 109, > - .target_residency = 345, > + .power.exit_latency = 109, > + .power.target_residency = 345, > .enter = &intel_idle }, > { > .enter = NULL } > @@ -201,36 +201,36 @@ static struct cpuidle_state ivb_cstates[] = { > .name = "C1-IVB", > .desc = "MWAIT 0x00", > .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID, > - .exit_latency = 1, > - .target_residency = 1, > + .power.exit_latency = 1, > + .power.target_residency = 1, > .enter = &intel_idle }, > { > .name = "C1E-IVB", > .desc = "MWAIT 0x01", > .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID, > - .exit_latency = 10, > - .target_residency = 20, > + .power.exit_latency = 10, > + .power.target_residency = 20, > .enter = &intel_idle }, > { > .name = "C3-IVB", > .desc = "MWAIT 0x10", > .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, > - .exit_latency = 59, > - .target_residency = 156, > + .power.exit_latency = 59, > + .power.target_residency = 156, > .enter = &intel_idle }, > { > .name = "C6-IVB", > .desc = "MWAIT 0x20", > .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, > - .exit_latency = 80, > - .target_residency = 300, > + .power.exit_latency = 80, > + .power.target_residency = 300, > .enter = &intel_idle }, > { > .name = "C7-IVB", > .desc = "MWAIT 0x30", > .flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, > - .exit_latency = 87, > - .target_residency = 300, > + .power.exit_latency = 87, 
> + .power.target_residency = 300, > .enter = &intel_idle }, > { > .enter = NULL } > @@ -241,57 +241,57 @@ static struct cpuidle_state hsw_cstates[] = { > .name = "C1-HSW", > .desc = "MWAIT 0x00", > .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID, > - .exit_latency = 2, > - .target_residency = 2, > + .power.exit_latency = 2, > + .power.target_residency = 2, > .enter = &intel_idle }, > { > .name = "C1E-HSW", > .desc = "MWAIT 0x01", > .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID, > - .exit_latency = 10, > - .target_residency = 20, > + .power.exit_latency = 10, > + .power.target_residency = 20, > .enter = &intel_idle }, > { > .name = "C3-HSW", > .desc = "MWAIT 0x10", > .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, > - .exit_latency = 33, > - .target_residency = 100, > + .power.exit_latency = 33, > + .power.target_residency = 100, > .enter = &intel_idle }, > { > .name = "C6-HSW", > .desc = "MWAIT 0x20", > .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, > - .exit_latency = 133, > - .target_residency = 400, > + .power.exit_latency = 133, > + .power.target_residency = 400, > .enter = &intel_idle }, > { > .name = "C7s-HSW", > .desc = "MWAIT 0x32", > .flags = MWAIT2flg(0x32) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, > - .exit_latency = 166, > - .target_residency = 500, > + .power.exit_latency = 166, > + .power.target_residency = 500, > .enter = &intel_idle }, > { > .name = "C8-HSW", > .desc = "MWAIT 0x40", > .flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, > - .exit_latency = 300, > - .target_residency = 900, > + .power.exit_latency = 300, > + .power.target_residency = 900, > .enter = &intel_idle }, > { > .name = "C9-HSW", > .desc = "MWAIT 0x50", > .flags = MWAIT2flg(0x50) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, > - .exit_latency = 600, > - .target_residency = 1800, > + .power.exit_latency = 600, > + .power.target_residency = 1800, > .enter = 
&intel_idle }, > { > .name = "C10-HSW", > .desc = "MWAIT 0x60", > .flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, > - .exit_latency = 2600, > - .target_residency = 7700, > + .power.exit_latency = 2600, > + .power.target_residency = 7700, > .enter = &intel_idle }, > { > .enter = NULL } > @@ -302,29 +302,29 @@ static struct cpuidle_state atom_cstates[] = { > .name = "C1E-ATM", > .desc = "MWAIT 0x00", > .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID, > - .exit_latency = 10, > - .target_residency = 20, > + .power.exit_latency = 10, > + .power.target_residency = 20, > .enter = &intel_idle }, > { > .name = "C2-ATM", > .desc = "MWAIT 0x10", > .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID, > - .exit_latency = 20, > - .target_residency = 80, > + .power.exit_latency = 20, > + .power.target_residency = 80, > .enter = &intel_idle }, > { > .name = "C4-ATM", > .desc = "MWAIT 0x30", > .flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, > - .exit_latency = 100, > - .target_residency = 400, > + .power.exit_latency = 100, > + .power.target_residency = 400, > .enter = &intel_idle }, > { > .name = "C6-ATM", > .desc = "MWAIT 0x52", > .flags = MWAIT2flg(0x52) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, > - .exit_latency = 140, > - .target_residency = 560, > + .power.exit_latency = 140, > + .power.target_residency = 560, > .enter = &intel_idle }, > { > .enter = NULL } > @@ -334,15 +334,15 @@ static struct cpuidle_state avn_cstates[] = { > .name = "C1-AVN", > .desc = "MWAIT 0x00", > .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID, > - .exit_latency = 2, > - .target_residency = 2, > + .power.exit_latency = 2, > + .power.target_residency = 2, > .enter = &intel_idle }, > { > .name = "C6-AVN", > .desc = "MWAIT 0x51", > .flags = MWAIT2flg(0x51) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, > - .exit_latency = 15, > - .target_residency = 45, > + .power.exit_latency = 15, > + .power.target_residency = 45, > 
.enter = &intel_idle }, > { > .enter = NULL } > diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h > index b0238cb..eb58ab3 100644 > --- a/include/linux/cpuidle.h > +++ b/include/linux/cpuidle.h > @@ -35,14 +35,18 @@ struct cpuidle_state_usage { > unsigned long long time; /* in US */ > }; > > +struct cpuidle_power { > + unsigned int exit_latency; /* in US */ > + unsigned int target_residency; /* in US */ > + int power_usage; /* in mW */ > +}; > + > struct cpuidle_state { > char name[CPUIDLE_NAME_LEN]; > char desc[CPUIDLE_DESC_LEN]; > > unsigned int flags; > - unsigned int exit_latency; /* in US */ > - int power_usage; /* in mW */ > - unsigned int target_residency; /* in US */ > + struct cpuidle_power power; > bool disabled; /* disabled on all CPUs */ > > int (*enter) (struct cpuidle_device *dev, > -- > 1.7.9.5 > ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure 2014-03-28 18:17 ` Nicolas Pitre @ 2014-03-28 20:42 ` Daniel Lezcano 2014-03-29 0:00 ` Nicolas Pitre 0 siblings, 1 reply; 47+ messages in thread From: Daniel Lezcano @ 2014-03-28 20:42 UTC (permalink / raw) To: Nicolas Pitre Cc: LKML, mingo, Peter Zijlstra, Rafael J. Wysocki, linux-pm, Alex Shi, Vincent Guittot, Morten Rasmussen Hi Nicolas, thanks for reviewing the patchset. On 03/28/2014 07:17 PM, Nicolas Pitre wrote: > On Fri, 28 Mar 2014, Daniel Lezcano wrote: > >> The scheduler needs some information from cpuidle to know the timing for the >> specific idle state a cpu is in. >> >> This patch creates a separate structure to group the cpuidle power info in >> order to share it with the scheduler. It improves the encapsulation of the >> code. > > Having cpuidle_power as a structure name, or worse, 'power' as a struct > member, is a really bad choice. Yes, I was asking myself whether this name was a good choice or not. I assumed 'power' could have been a good name because 'target_residency' is a time conversion of the power needed to enter this state. > Amongst the fields this struct > contains, only 1 out of 3 is about power. The word "power" is already > abused quite significantly to mean too many different things. > > I'd suggest something inspired by your own patch log message, i.e. > 'struct cpuidle_info' instead, and use 'info' as a field name within > struct cpuidle_state. Having 'params' instead of 'info' could be a good > alternative too, although slightly longer. Hmm, 'info' or 'param' sound too vague. What about cpuidle_attr or cpuidle_property? > And with struct rq in patch 2/3 I'd simply use: > > struct cpuidle_info *cpuidle; > > This way you'll have rq->cpuidle->exit_latency to refer to from the > scheduler context, which is IMHO much more self-explanatory. Ok, sounds good. 
>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> >> --- >> arch/arm/include/asm/cpuidle.h | 6 +- >> arch/arm/mach-exynos/cpuidle.c | 4 +- >> drivers/acpi/processor_idle.c | 4 +- >> drivers/base/power/domain.c | 6 +- >> drivers/cpuidle/cpuidle-at91.c | 4 +- >> drivers/cpuidle/cpuidle-big_little.c | 9 +-- >> drivers/cpuidle/cpuidle-calxeda.c | 6 +- >> drivers/cpuidle/cpuidle-kirkwood.c | 4 +- >> drivers/cpuidle/cpuidle-powernv.c | 8 +-- >> drivers/cpuidle/cpuidle-pseries.c | 12 ++-- >> drivers/cpuidle/cpuidle-ux500.c | 14 ++--- >> drivers/cpuidle/cpuidle-zynq.c | 4 +- >> drivers/cpuidle/driver.c | 6 +- >> drivers/cpuidle/governors/ladder.c | 14 +++-- >> drivers/cpuidle/governors/menu.c | 8 +-- >> drivers/cpuidle/sysfs.c | 2 +- >> drivers/idle/intel_idle.c | 112 +++++++++++++++++----------------- >> include/linux/cpuidle.h | 10 ++- >> 18 files changed, 120 insertions(+), 113 deletions(-) >> >> diff --git a/arch/arm/include/asm/cpuidle.h b/arch/arm/include/asm/cpuidle.h >> index 2fca60a..987ee53 100644 >> --- a/arch/arm/include/asm/cpuidle.h >> +++ b/arch/arm/include/asm/cpuidle.h >> @@ -12,9 +12,9 @@ static inline int arm_cpuidle_simple_enter(struct cpuidle_device *dev, >> /* Common ARM WFI state */ >> #define ARM_CPUIDLE_WFI_STATE_PWR(p) {\ >> .enter = arm_cpuidle_simple_enter,\ >> - .exit_latency = 1,\ >> - .target_residency = 1,\ >> - .power_usage = p,\ >> + .power.exit_latency = 1,\ >> + .power.target_residency = 1,\ >> + .power.power_usage = p,\ >> .flags = CPUIDLE_FLAG_TIME_VALID,\ >> .name = "WFI",\ >> .desc = "ARM WFI",\ >> diff --git a/arch/arm/mach-exynos/cpuidle.c b/arch/arm/mach-exynos/cpuidle.c >> index f57cb91..f6275cb 100644 >> --- a/arch/arm/mach-exynos/cpuidle.c >> +++ b/arch/arm/mach-exynos/cpuidle.c >> @@ -73,8 +73,8 @@ static struct cpuidle_driver exynos4_idle_driver = { >> [0] = ARM_CPUIDLE_WFI_STATE, >> [1] = { >> .enter = exynos4_enter_lowpower, >> - .exit_latency = 300, >> - .target_residency = 100000, >> + .power.exit_latency = 
300, >> + .power.target_residency = 100000, >> .flags = CPUIDLE_FLAG_TIME_VALID, >> .name = "C1", >> .desc = "ARM power down", >> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c >> index 3dca36d..05fa991 100644 >> --- a/drivers/acpi/processor_idle.c >> +++ b/drivers/acpi/processor_idle.c >> @@ -979,8 +979,8 @@ static int acpi_processor_setup_cpuidle_states(struct acpi_processor *pr) >> state = &drv->states[count]; >> snprintf(state->name, CPUIDLE_NAME_LEN, "C%d", i); >> strncpy(state->desc, cx->desc, CPUIDLE_DESC_LEN); >> - state->exit_latency = cx->latency; >> - state->target_residency = cx->latency * latency_factor; >> + state->power.exit_latency = cx->latency; >> + state->power.target_residency = cx->latency * latency_factor; >> >> state->flags = 0; >> switch (cx->type) { >> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c >> index bfb8955..6bcb1e8 100644 >> --- a/drivers/base/power/domain.c >> +++ b/drivers/base/power/domain.c >> @@ -154,7 +154,7 @@ static void genpd_recalc_cpu_exit_latency(struct generic_pm_domain *genpd) >> usecs64 = genpd->power_on_latency_ns; >> do_div(usecs64, NSEC_PER_USEC); >> usecs64 += genpd->cpu_data->saved_exit_latency; >> - genpd->cpu_data->idle_state->exit_latency = usecs64; >> + genpd->cpu_data->idle_state->power.exit_latency = usecs64; >> } >> >> /** >> @@ -1882,7 +1882,7 @@ int pm_genpd_attach_cpuidle(struct generic_pm_domain *genpd, int state) >> goto err; >> } >> cpu_data->idle_state = idle_state; >> - cpu_data->saved_exit_latency = idle_state->exit_latency; >> + cpu_data->saved_exit_latency = idle_state->power.exit_latency; >> genpd->cpu_data = cpu_data; >> genpd_recalc_cpu_exit_latency(genpd); >> >> @@ -1936,7 +1936,7 @@ int pm_genpd_detach_cpuidle(struct generic_pm_domain *genpd) >> ret = -EAGAIN; >> goto out; >> } >> - idle_state->exit_latency = cpu_data->saved_exit_latency; >> + idle_state->power.exit_latency = cpu_data->saved_exit_latency; >> cpuidle_driver_unref(); >> 
genpd->cpu_data = NULL; >> kfree(cpu_data); >> diff --git a/drivers/cpuidle/cpuidle-at91.c b/drivers/cpuidle/cpuidle-at91.c >> index a077437..48c7063 100644 >> --- a/drivers/cpuidle/cpuidle-at91.c >> +++ b/drivers/cpuidle/cpuidle-at91.c >> @@ -40,9 +40,9 @@ static struct cpuidle_driver at91_idle_driver = { >> .owner = THIS_MODULE, >> .states[0] = ARM_CPUIDLE_WFI_STATE, >> .states[1] = { >> + .power.exit_latency = 10, >> + .power.target_residency = 10000, >> .enter = at91_enter_idle, >> - .exit_latency = 10, >> - .target_residency = 10000, >> .flags = CPUIDLE_FLAG_TIME_VALID, >> .name = "RAM_SR", >> .desc = "WFI and DDR Self Refresh", >> diff --git a/drivers/cpuidle/cpuidle-big_little.c b/drivers/cpuidle/cpuidle-big_little.c >> index b45fc62..5a0af4b 100644 >> --- a/drivers/cpuidle/cpuidle-big_little.c >> +++ b/drivers/cpuidle/cpuidle-big_little.c >> @@ -62,9 +62,9 @@ static struct cpuidle_driver bl_idle_little_driver = { >> .owner = THIS_MODULE, >> .states[0] = ARM_CPUIDLE_WFI_STATE, >> .states[1] = { >> + .power.exit_latency = 700, >> + .power.target_residency = 2500, >> .enter = bl_enter_powerdown, >> - .exit_latency = 700, >> - .target_residency = 2500, >> .flags = CPUIDLE_FLAG_TIME_VALID | >> CPUIDLE_FLAG_TIMER_STOP, >> .name = "C1", >> @@ -78,9 +78,10 @@ static struct cpuidle_driver bl_idle_big_driver = { >> .owner = THIS_MODULE, >> .states[0] = ARM_CPUIDLE_WFI_STATE, >> .states[1] = { >> + >> + .power.exit_latency = 500, >> + .power.target_residency = 2000, >> .enter = bl_enter_powerdown, >> - .exit_latency = 500, >> - .target_residency = 2000, >> .flags = CPUIDLE_FLAG_TIME_VALID | >> CPUIDLE_FLAG_TIMER_STOP, >> .name = "C1", >> diff --git a/drivers/cpuidle/cpuidle-calxeda.c b/drivers/cpuidle/cpuidle-calxeda.c >> index 6e51114..8357a20 100644 >> --- a/drivers/cpuidle/cpuidle-calxeda.c >> +++ b/drivers/cpuidle/cpuidle-calxeda.c >> @@ -56,9 +56,9 @@ static struct cpuidle_driver calxeda_idle_driver = { >> .name = "PG", >> .desc = "Power Gate", >> .flags = 
CPUIDLE_FLAG_TIME_VALID, >> - .exit_latency = 30, >> - .power_usage = 50, >> - .target_residency = 200, >> + .power.exit_latency = 30, >> + .power.power_usage = 50, >> + .power.target_residency = 200, >> .enter = calxeda_pwrdown_idle, >> }, >> }, >> diff --git a/drivers/cpuidle/cpuidle-kirkwood.c b/drivers/cpuidle/cpuidle-kirkwood.c >> index 41ba843..0ae4138 100644 >> --- a/drivers/cpuidle/cpuidle-kirkwood.c >> +++ b/drivers/cpuidle/cpuidle-kirkwood.c >> @@ -44,9 +44,9 @@ static struct cpuidle_driver kirkwood_idle_driver = { >> .owner = THIS_MODULE, >> .states[0] = ARM_CPUIDLE_WFI_STATE, >> .states[1] = { >> + .power.exit_latency = 10, >> + .power.target_residency = 100000, >> .enter = kirkwood_enter_idle, >> - .exit_latency = 10, >> - .target_residency = 100000, >> .flags = CPUIDLE_FLAG_TIME_VALID, >> .name = "DDR SR", >> .desc = "WFI and DDR Self Refresh", >> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c >> index f48607c..c47cc02 100644 >> --- a/drivers/cpuidle/cpuidle-powernv.c >> +++ b/drivers/cpuidle/cpuidle-powernv.c >> @@ -62,15 +62,15 @@ static struct cpuidle_state powernv_states[] = { >> .name = "snooze", >> .desc = "snooze", >> .flags = CPUIDLE_FLAG_TIME_VALID, >> - .exit_latency = 0, >> - .target_residency = 0, >> + .power.exit_latency = 0, >> + .power.target_residency = 0, >> .enter = &snooze_loop }, >> { /* NAP */ >> .name = "NAP", >> .desc = "NAP", >> .flags = CPUIDLE_FLAG_TIME_VALID, >> - .exit_latency = 10, >> - .target_residency = 100, >> + .power.exit_latency = 10, >> + .power.target_residency = 100, >> .enter = &nap_loop }, >> }; >> >> diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c >> index 6f7b019..483d7e7 100644 >> --- a/drivers/cpuidle/cpuidle-pseries.c >> +++ b/drivers/cpuidle/cpuidle-pseries.c >> @@ -143,15 +143,15 @@ static struct cpuidle_state dedicated_states[] = { >> .name = "snooze", >> .desc = "snooze", >> .flags = CPUIDLE_FLAG_TIME_VALID, >> - .exit_latency = 0, 
>> - .target_residency = 0, >> + .power.exit_latency = 0, >> + .power.target_residency = 0, >> .enter = &snooze_loop }, >> { /* CEDE */ >> .name = "CEDE", >> .desc = "CEDE", >> .flags = CPUIDLE_FLAG_TIME_VALID, >> - .exit_latency = 10, >> - .target_residency = 100, >> + .power.exit_latency = 10, >> + .power.target_residency = 100, >> .enter = &dedicated_cede_loop }, >> }; >> >> @@ -163,8 +163,8 @@ static struct cpuidle_state shared_states[] = { >> .name = "Shared Cede", >> .desc = "Shared Cede", >> .flags = CPUIDLE_FLAG_TIME_VALID, >> - .exit_latency = 0, >> - .target_residency = 0, >> + .power.exit_latency = 0, >> + .power.target_residency = 0, >> .enter = &shared_cede_loop }, >> }; >> >> diff --git a/drivers/cpuidle/cpuidle-ux500.c b/drivers/cpuidle/cpuidle-ux500.c >> index 5e35804..3261eb2 100644 >> --- a/drivers/cpuidle/cpuidle-ux500.c >> +++ b/drivers/cpuidle/cpuidle-ux500.c >> @@ -98,13 +98,13 @@ static struct cpuidle_driver ux500_idle_driver = { >> .states = { >> ARM_CPUIDLE_WFI_STATE, >> { >> - .enter = ux500_enter_idle, >> - .exit_latency = 70, >> - .target_residency = 260, >> - .flags = CPUIDLE_FLAG_TIME_VALID | >> - CPUIDLE_FLAG_TIMER_STOP, >> - .name = "ApIdle", >> - .desc = "ARM Retention", >> + .power.exit_latency = 70, >> + .power.target_residency = 260, >> + .enter = ux500_enter_idle, >> + .flags = CPUIDLE_FLAG_TIME_VALID | >> + CPUIDLE_FLAG_TIMER_STOP, >> + .name = "ApIdle", >> + .desc = "ARM Retention", >> }, >> }, >> .safe_state_index = 0, >> diff --git a/drivers/cpuidle/cpuidle-zynq.c b/drivers/cpuidle/cpuidle-zynq.c >> index aded759..dddefb8 100644 >> --- a/drivers/cpuidle/cpuidle-zynq.c >> +++ b/drivers/cpuidle/cpuidle-zynq.c >> @@ -56,9 +56,9 @@ static struct cpuidle_driver zynq_idle_driver = { >> .states = { >> ARM_CPUIDLE_WFI_STATE, >> { >> + .power.exit_latency = 10, >> + .power.target_residency = 10000, >> .enter = zynq_enter_idle, >> - .exit_latency = 10, >> - .target_residency = 10000, >> .flags = CPUIDLE_FLAG_TIME_VALID | >> 
CPUIDLE_FLAG_TIMER_STOP, >> .name = "RAM_SR", >> diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c >> index 06dbe7c..40ddd3c 100644 >> --- a/drivers/cpuidle/driver.c >> +++ b/drivers/cpuidle/driver.c >> @@ -206,9 +206,9 @@ static void poll_idle_init(struct cpuidle_driver *drv) >> >> snprintf(state->name, CPUIDLE_NAME_LEN, "POLL"); >> snprintf(state->desc, CPUIDLE_DESC_LEN, "CPUIDLE CORE POLL IDLE"); >> - state->exit_latency = 0; >> - state->target_residency = 0; >> - state->power_usage = -1; >> + state->power.exit_latency = 0; >> + state->power.target_residency = 0; >> + state->power.power_usage = -1; >> state->flags = 0; >> state->enter = poll_idle; >> state->disabled = false; >> diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c >> index 9f08e8c..4837880 100644 >> --- a/drivers/cpuidle/governors/ladder.c >> +++ b/drivers/cpuidle/governors/ladder.c >> @@ -81,7 +81,7 @@ static int ladder_select_state(struct cpuidle_driver *drv, >> >> if (drv->states[last_idx].flags & CPUIDLE_FLAG_TIME_VALID) { >> last_residency = cpuidle_get_last_residency(dev) - \ >> - drv->states[last_idx].exit_latency; >> + drv->states[last_idx].power.exit_latency; >> } >> else >> last_residency = last_state->threshold.promotion_time + 1; >> @@ -91,7 +91,7 @@ static int ladder_select_state(struct cpuidle_driver *drv, >> !drv->states[last_idx + 1].disabled && >> !dev->states_usage[last_idx + 1].disable && >> last_residency > last_state->threshold.promotion_time && >> - drv->states[last_idx + 1].exit_latency <= latency_req) { >> + drv->states[last_idx + 1].power.exit_latency <= latency_req) { >> last_state->stats.promotion_count++; >> last_state->stats.demotion_count = 0; >> if (last_state->stats.promotion_count >= last_state->threshold.promotion_count) { >> @@ -104,11 +104,11 @@ static int ladder_select_state(struct cpuidle_driver *drv, >> if (last_idx > CPUIDLE_DRIVER_STATE_START && >> (drv->states[last_idx].disabled || >> 
dev->states_usage[last_idx].disable || >> - drv->states[last_idx].exit_latency > latency_req)) { >> + drv->states[last_idx].power.exit_latency > latency_req)) { >> int i; >> >> for (i = last_idx - 1; i > CPUIDLE_DRIVER_STATE_START; i--) { >> - if (drv->states[i].exit_latency <= latency_req) >> + if (drv->states[i].power.exit_latency <= latency_req) >> break; >> } >> ladder_do_selection(ldev, last_idx, i); >> @@ -155,9 +155,11 @@ static int ladder_enable_device(struct cpuidle_driver *drv, >> lstate->threshold.demotion_count = DEMOTION_COUNT; >> >> if (i < drv->state_count - 1) >> - lstate->threshold.promotion_time = state->exit_latency; >> + lstate->threshold.promotion_time = >> + state->power.exit_latency; >> if (i > 0) >> - lstate->threshold.demotion_time = state->exit_latency; >> + lstate->threshold.demotion_time = >> + state->power.exit_latency; >> } >> >> return 0; >> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c >> index cf7f2f0..34bd463 100644 >> --- a/drivers/cpuidle/governors/menu.c >> +++ b/drivers/cpuidle/governors/menu.c >> @@ -351,15 +351,15 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) >> >> if (s->disabled || su->disable) >> continue; >> - if (s->target_residency > data->predicted_us) >> + if (s->power.target_residency > data->predicted_us) >> continue; >> - if (s->exit_latency > latency_req) >> + if (s->power.exit_latency > latency_req) >> continue; >> - if (s->exit_latency * multiplier > data->predicted_us) >> + if (s->power.exit_latency * multiplier > data->predicted_us) >> continue; >> >> data->last_state_idx = i; >> - data->exit_us = s->exit_latency; >> + data->exit_us = s->power.exit_latency; >> } >> >> return data->last_state_idx; >> diff --git a/drivers/cpuidle/sysfs.c b/drivers/cpuidle/sysfs.c >> index e918b6d..1a45541 100644 >> --- a/drivers/cpuidle/sysfs.c >> +++ b/drivers/cpuidle/sysfs.c >> @@ -252,7 +252,7 @@ static struct cpuidle_state_attr attr_##_name = 
__ATTR(_name, 0644, show, store) >> static ssize_t show_state_##_name(struct cpuidle_state *state, \ >> struct cpuidle_state_usage *state_usage, char *buf) \ >> { \ >> - return sprintf(buf, "%u\n", state->_name);\ >> + return sprintf(buf, "%u\n", state->power._name);\ >> } >> >> #define define_store_state_ull_function(_name) \ >> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c >> index 8e1939f..4f0533e 100644 >> --- a/drivers/idle/intel_idle.c >> +++ b/drivers/idle/intel_idle.c >> @@ -128,29 +128,29 @@ static struct cpuidle_state nehalem_cstates[] = { >> .name = "C1-NHM", >> .desc = "MWAIT 0x00", >> .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID, >> - .exit_latency = 3, >> - .target_residency = 6, >> + .power.exit_latency = 3, >> + .power.target_residency = 6, >> .enter = &intel_idle }, >> { >> .name = "C1E-NHM", >> .desc = "MWAIT 0x01", >> .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID, >> - .exit_latency = 10, >> - .target_residency = 20, >> + .power.exit_latency = 10, >> + .power.target_residency = 20, >> .enter = &intel_idle }, >> { >> .name = "C3-NHM", >> .desc = "MWAIT 0x10", >> .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, >> - .exit_latency = 20, >> - .target_residency = 80, >> + .power.exit_latency = 20, >> + .power.target_residency = 80, >> .enter = &intel_idle }, >> { >> .name = "C6-NHM", >> .desc = "MWAIT 0x20", >> .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, >> - .exit_latency = 200, >> - .target_residency = 800, >> + .power.exit_latency = 200, >> + .power.target_residency = 800, >> .enter = &intel_idle }, >> { >> .enter = NULL } >> @@ -161,36 +161,36 @@ static struct cpuidle_state snb_cstates[] = { >> .name = "C1-SNB", >> .desc = "MWAIT 0x00", >> .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID, >> - .exit_latency = 2, >> - .target_residency = 2, >> + .power.exit_latency = 2, >> + .power.target_residency = 2, >> .enter = &intel_idle }, >> { >> .name = 
"C1E-SNB", >> .desc = "MWAIT 0x01", >> .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID, >> - .exit_latency = 10, >> - .target_residency = 20, >> + .power.exit_latency = 10, >> + .power.target_residency = 20, >> .enter = &intel_idle }, >> { >> .name = "C3-SNB", >> .desc = "MWAIT 0x10", >> .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, >> - .exit_latency = 80, >> - .target_residency = 211, >> + .power.exit_latency = 80, >> + .power.target_residency = 211, >> .enter = &intel_idle }, >> { >> .name = "C6-SNB", >> .desc = "MWAIT 0x20", >> .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, >> - .exit_latency = 104, >> - .target_residency = 345, >> + .power.exit_latency = 104, >> + .power.target_residency = 345, >> .enter = &intel_idle }, >> { >> .name = "C7-SNB", >> .desc = "MWAIT 0x30", >> .flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, >> - .exit_latency = 109, >> - .target_residency = 345, >> + .power.exit_latency = 109, >> + .power.target_residency = 345, >> .enter = &intel_idle }, >> { >> .enter = NULL } >> @@ -201,36 +201,36 @@ static struct cpuidle_state ivb_cstates[] = { >> .name = "C1-IVB", >> .desc = "MWAIT 0x00", >> .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID, >> - .exit_latency = 1, >> - .target_residency = 1, >> + .power.exit_latency = 1, >> + .power.target_residency = 1, >> .enter = &intel_idle }, >> { >> .name = "C1E-IVB", >> .desc = "MWAIT 0x01", >> .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID, >> - .exit_latency = 10, >> - .target_residency = 20, >> + .power.exit_latency = 10, >> + .power.target_residency = 20, >> .enter = &intel_idle }, >> { >> .name = "C3-IVB", >> .desc = "MWAIT 0x10", >> .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, >> - .exit_latency = 59, >> - .target_residency = 156, >> + .power.exit_latency = 59, >> + .power.target_residency = 156, >> .enter = &intel_idle }, >> { >> .name = "C6-IVB", >> .desc = 
"MWAIT 0x20", >> .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, >> - .exit_latency = 80, >> - .target_residency = 300, >> + .power.exit_latency = 80, >> + .power.target_residency = 300, >> .enter = &intel_idle }, >> { >> .name = "C7-IVB", >> A .desc = "MWAIT 0x30", >> .flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, >> - .exit_latency = 87, >> - .target_residency = 300, >> + .power.exit_latency = 87, >> + .power.target_residency = 300, >> .enter = &intel_idle }, >> { >> .enter = NULL } >> @@ -241,57 +241,57 @@ static struct cpuidle_state hsw_cstates[] = { >> .name = "C1-HSW", >> .desc = "MWAIT 0x00", >> .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID, >> - .exit_latency = 2, >> - .target_residency = 2, >> + .power.exit_latency = 2, >> + .power.target_residency = 2, >> .enter = &intel_idle }, >> { >> .name = "C1E-HSW", >> .desc = "MWAIT 0x01", >> .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID, >> - .exit_latency = 10, >> - .target_residency = 20, >> + .power.exit_latency = 10, >> + .power.target_residency = 20, >> .enter = &intel_idle }, >> { >> .name = "C3-HSW", >> .desc = "MWAIT 0x10", >> .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, >> - .exit_latency = 33, >> - .target_residency = 100, >> + .power.exit_latency = 33, >> + .power.target_residency = 100, >> .enter = &intel_idle }, >> { >> .name = "C6-HSW", >> .desc = "MWAIT 0x20", >> .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, >> - .exit_latency = 133, >> - .target_residency = 400, >> + .power.exit_latency = 133, >> + .power.target_residency = 400, >> .enter = &intel_idle }, >> { >> .name = "C7s-HSW", >> .desc = "MWAIT 0x32", >> .flags = MWAIT2flg(0x32) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, >> - .exit_latency = 166, >> - .target_residency = 500, >> + .power.exit_latency = 166, >> + .power.target_residency = 500, >> .enter = &intel_idle }, >> { >> .name = "C8-HSW", 
>> .desc = "MWAIT 0x40", >> .flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, >> - .exit_latency = 300, >> - .target_residency = 900, >> + .power.exit_latency = 300, >> + .power.target_residency = 900, >> .enter = &intel_idle }, >> { >> .name = "C9-HSW", >> .desc = "MWAIT 0x50", >> .flags = MWAIT2flg(0x50) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, >> - .exit_latency = 600, >> - .target_residency = 1800, >> + .power.exit_latency = 600, >> + .power.target_residency = 1800, >> .enter = &intel_idle }, >> { >> .name = "C10-HSW", >> .desc = "MWAIT 0x60", >> .flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, >> - .exit_latency = 2600, >> - .target_residency = 7700, >> + .power.exit_latency = 2600, >> + .power.target_residency = 7700, >> .enter = &intel_idle }, >> { >> .enter = NULL } >> @@ -302,29 +302,29 @@ static struct cpuidle_state atom_cstates[] = { >> .name = "C1E-ATM", >> .desc = "MWAIT 0x00", >> .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID, >> - .exit_latency = 10, >> - .target_residency = 20, >> + .power.exit_latency = 10, >> + .power.target_residency = 20, >> .enter = &intel_idle }, >> { >> .name = "C2-ATM", >> .desc = "MWAIT 0x10", >> .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID, >> - .exit_latency = 20, >> - .target_residency = 80, >> + .power.exit_latency = 20, >> + .power.target_residency = 80, >> .enter = &intel_idle }, >> { >> .name = "C4-ATM", >> .desc = "MWAIT 0x30", >> .flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, >> - .exit_latency = 100, >> - .target_residency = 400, >> + .power.exit_latency = 100, >> + .power.target_residency = 400, >> .enter = &intel_idle }, >> { >> .name = "C6-ATM", >> .desc = "MWAIT 0x52", >> .flags = MWAIT2flg(0x52) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, >> - .exit_latency = 140, >> - .target_residency = 560, >> + .power.exit_latency = 140, >> + .power.target_residency = 560, >> .enter = &intel_idle }, 
>> { >> .enter = NULL } >> @@ -334,15 +334,15 @@ static struct cpuidle_state avn_cstates[] = { >> .name = "C1-AVN", >> .desc = "MWAIT 0x00", >> .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID, >> - .exit_latency = 2, >> - .target_residency = 2, >> + .power.exit_latency = 2, >> + .power.target_residency = 2, >> .enter = &intel_idle }, >> { >> .name = "C6-AVN", >> .desc = "MWAIT 0x51", >> .flags = MWAIT2flg(0x51) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED, >> - .exit_latency = 15, >> - .target_residency = 45, >> + .power.exit_latency = 15, >> + .power.target_residency = 45, >> .enter = &intel_idle }, >> { >> .enter = NULL } >> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h >> index b0238cb..eb58ab3 100644 >> --- a/include/linux/cpuidle.h >> +++ b/include/linux/cpuidle.h >> @@ -35,14 +35,18 @@ struct cpuidle_state_usage { >> unsigned long long time; /* in US */ >> }; >> >> +struct cpuidle_power { >> + unsigned int exit_latency; /* in US */ >> + unsigned int target_residency; /* in US */ >> + int power_usage; /* in mW */ >> +}; >> + >> struct cpuidle_state { >> char name[CPUIDLE_NAME_LEN]; >> char desc[CPUIDLE_DESC_LEN]; >> >> unsigned int flags; >> - unsigned int exit_latency; /* in US */ >> - int power_usage; /* in mW */ >> - unsigned int target_residency; /* in US */ >> + struct cpuidle_power power; >> bool disabled; /* disabled on all CPUs */ >> >> int (*enter) (struct cpuidle_device *dev, >> -- >> 1.7.9.5 >> -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure 2014-03-28 20:42 ` Daniel Lezcano @ 2014-03-29 0:00 ` Nicolas Pitre 0 siblings, 0 replies; 47+ messages in thread From: Nicolas Pitre @ 2014-03-29 0:00 UTC (permalink / raw) To: Daniel Lezcano Cc: LKML, mingo, Peter Zijlstra, Rafael J. Wysocki, linux-pm, Alex Shi, Vincent Guittot, Morten Rasmussen On Fri, 28 Mar 2014, Daniel Lezcano wrote: > Hi Nicolas, > > thanks for reviewing the patchset. > > On 03/28/2014 07:17 PM, Nicolas Pitre wrote: > > On Fri, 28 Mar 2014, Daniel Lezcano wrote: > > > >> The scheduler needs some information from cpuidle to know the timing for a > >> specific idle state a cpu is. > >> > >> This patch creates a separate structure to group the cpuidle power info in > >> order to share it with the scheduler. It improves the encapsulation of the > >> code. > > > > Having cpuidle_power as a structure name, or worse, 'power' as a struct > > member, is a really bad choice. > > Yes, I was asking myself if this name was a good choice or not. I > assumed 'power' could have been a good name because 'target_residency' > is a time conversion of the power needed to enter this state. Still, that's something the casual reviewer might not know. And we ought to be careful when talking about power as well. By definition, power means energy transferred per unit of time. Sometimes we tend to say 'power' when we actually mean 'energy'. With more "power aware" work going into the scheduler, it is better to disambiguate those terms. > > Amongst the fields this struct > > contains, only 1 out of 3 is about power. The word "power" is already > > abused quite significantly to mean too many different things already. > > > > I'd suggest something inspired by your own patch log message i.e. > > 'struct cpuidle_info' instead, and use 'info' as a field name within > > struct cpuidle_state. Having 'params" instead of "info" could be a good > > alternative too, although slightly longer. 
> > Hmm 'info' or 'param' sound too vague. What about: > > cpuidle_attr > or > cpuidle_property As you wish. As long as it isn't 'power'. Nicolas ^ permalink raw reply [flat|nested] 47+ messages in thread
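For illustration, here is a minimal sketch of the regrouped fields under the 'info' naming Nicolas suggests. The struct and member names below are hypothetical; the RFC itself uses 'struct cpuidle_power' with a 'power' member, and no rename had been merged at this point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rename of the grouped cpuidle fields, following the
 * review: avoid the overloaded word "power" for a struct that mostly
 * holds timing parameters.  Field names and units mirror the RFC. */
struct cpuidle_info {
	unsigned int exit_latency;     /* in us */
	unsigned int target_residency; /* in us */
	int power_usage;               /* in mW */
};

/* Example consumer that only needs the timing parameters: return the
 * state with the smaller exit latency, i.e. the "shallower" one. */
static const struct cpuidle_info *
shallower_state(const struct cpuidle_info *a, const struct cpuidle_info *b)
{
	return (a->exit_latency <= b->exit_latency) ? a : b;
}
```

Driver tables would then use designated initializers such as `.info.exit_latency = 10`, keeping the mechanical conversion in the RFC's patch 1 unchanged apart from the name.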
* [RFC PATCHC 2/3] idle: store the idle state the cpu is 2014-03-28 12:29 [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Daniel Lezcano 2014-03-28 12:29 ` [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure Daniel Lezcano @ 2014-03-28 12:29 ` Daniel Lezcano 2014-04-15 12:43 ` Peter Zijlstra 2014-03-28 12:29 ` [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu Daniel Lezcano ` (3 subsequent siblings) 5 siblings, 1 reply; 47+ messages in thread From: Daniel Lezcano @ 2014-03-28 12:29 UTC (permalink / raw) To: linux-kernel, mingo, peterz Cc: rjw, nicolas.pitre, linux-pm, alex.shi, vincent.guittot, morten.rasmussen When the cpu enters idle, it stores a pointer to the cpuidle power info in the struct rq, which in turn can be used to make the right decision when balancing a task. As soon as the cpu exits the idle state, the pointer is set back to NULL. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> --- kernel/sched/idle.c | 17 +++++++++++++++-- kernel/sched/sched.h | 5 +++++ 2 files changed, 20 insertions(+), 2 deletions(-) diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index 8f4390a..5c32c11 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -12,6 +12,8 @@ #include <trace/events/power.h> +#include "sched.h" + static int __read_mostly cpu_idle_force_poll; void cpu_idle_poll_ctrl(bool enable) @@ -69,7 +71,7 @@ void __weak arch_cpu_idle(void) * NOTE: no locks or semaphores should be used here * return non-zero on failure */ -static int cpuidle_idle_call(void) +static int cpuidle_idle_call(struct cpuidle_power **power) { struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices); struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev); @@ -143,6 +145,10 @@ static int cpuidle_idle_call(void) if (!ret) { trace_cpu_idle_rcuidle(next_state, dev->cpu); + *power = &drv->states[next_state].power; + + wmb(); + /* * Enter the idle state previously * returned by the governor @@ 
-154,6 +160,10 @@ static int cpuidle_idle_call(void) entered_state = cpuidle_enter(drv, dev, next_state); + *power = NULL; + + wmb(); + trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu); @@ -198,6 +208,9 @@ static int cpuidle_idle_call(void) */ static void cpu_idle_loop(void) { + struct rq *rq = this_rq(); + struct cpuidle_power **power = &rq->power; + while (1) { tick_nohz_idle_enter(); @@ -223,7 +236,7 @@ static void cpu_idle_loop(void) if (cpu_idle_force_poll || tick_check_broadcast_expired()) cpu_idle_poll(); else - cpuidle_idle_call(); + cpuidle_idle_call(power); arch_cpu_idle_exit(); } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 1929deb..1bcac35 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -14,6 +14,7 @@ #include "cpuacct.h" struct rq; +struct cpuidle_power; extern __read_mostly int scheduler_running; @@ -632,6 +633,10 @@ struct rq { #ifdef CONFIG_SMP struct llist_head wake_list; #endif + +#ifdef CONFIG_CPU_IDLE + struct cpuidle_power *power; +#endif }; static inline int cpu_of(struct rq *rq) -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 2/3] idle: store the idle state the cpu is 2014-03-28 12:29 ` [RFC PATCHC 2/3] idle: store the idle state the cpu is Daniel Lezcano @ 2014-04-15 12:43 ` Peter Zijlstra 2014-04-15 12:44 ` Peter Zijlstra 0 siblings, 1 reply; 47+ messages in thread From: Peter Zijlstra @ 2014-04-15 12:43 UTC (permalink / raw) To: Daniel Lezcano Cc: linux-kernel, mingo, rjw, nicolas.pitre, linux-pm, alex.shi, vincent.guittot, morten.rasmussen On Fri, Mar 28, 2014 at 01:29:55PM +0100, Daniel Lezcano wrote: > @@ -143,6 +145,10 @@ static int cpuidle_idle_call(void) > if (!ret) { > trace_cpu_idle_rcuidle(next_state, dev->cpu); > > + *power = &drv->states[next_state].power; > + > + wmb(); > + I very much suspect you meant: smp_wmb(), as I don't see the hardware reading that pointer, therefore UP wouldn't care. Also, any and all barriers should come with a comment that describes the data ordering and points to the matching barriers. ^ permalink raw reply [flat|nested] 47+ messages in thread
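For what it is worth, the commented pairing Peter asks for could look roughly like the sketch below. This is a userspace simplification with stubbed types, and smp_wmb() reduced to a GCC compiler barrier so it builds outside the kernel; in real kernel code the matching smp_rmb() would sit in the scheduler-side reader.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's smp_wmb(): a compiler barrier is enough
 * for this single-threaded sketch. */
#define smp_wmb() __asm__ __volatile__("" ::: "memory")

struct cpuidle_info { unsigned int exit_latency; };

struct rq_sketch {
	struct cpuidle_info *idle_info; /* NULL when the cpu is not idle */
};

static void enter_idle(struct rq_sketch *rq, struct cpuidle_info *state)
{
	rq->idle_info = state;
	/*
	 * Publish the pointer before the cpu can be observed as idle.
	 * Pairs with the smp_rmb() in the scheduler-side reader
	 * (e.g. a find_idlest_cpu()-like path).
	 */
	smp_wmb();
}

static void exit_idle(struct rq_sketch *rq)
{
	rq->idle_info = NULL;
	/* Same pairing as above: order the clear against later stores. */
	smp_wmb();
}
```

The point of the comments is exactly what the review asks for: each barrier names the data being ordered and where its counterpart lives.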
* Re: [RFC PATCHC 2/3] idle: store the idle state the cpu is 2014-04-15 12:43 ` Peter Zijlstra @ 2014-04-15 12:44 ` Peter Zijlstra 2014-04-15 14:17 ` Daniel Lezcano 0 siblings, 1 reply; 47+ messages in thread From: Peter Zijlstra @ 2014-04-15 12:44 UTC (permalink / raw) To: Daniel Lezcano Cc: linux-kernel, mingo, rjw, nicolas.pitre, linux-pm, alex.shi, vincent.guittot, morten.rasmussen On Tue, Apr 15, 2014 at 02:43:30PM +0200, Peter Zijlstra wrote: > On Fri, Mar 28, 2014 at 01:29:55PM +0100, Daniel Lezcano wrote: > > @@ -143,6 +145,10 @@ static int cpuidle_idle_call(void) > > if (!ret) { > > trace_cpu_idle_rcuidle(next_state, dev->cpu); > > > > + *power = &drv->states[next_state].power; > > + > > + wmb(); > > + > > I very much suspect you meant: smp_wmb(), as I don't see the hardware > reading that pointer, therefore UP wouldn't care. Also, any and all > barriers should come with a comment that describes the data ordering and > points to the matchin barriers. Furthermore, this patch fails to describe the life-time rules of the object placed there. Can the object pointed to ever disappear? ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 2/3] idle: store the idle state the cpu is 2014-04-15 12:44 ` Peter Zijlstra @ 2014-04-15 14:17 ` Daniel Lezcano 2014-04-15 14:33 ` Peter Zijlstra 0 siblings, 1 reply; 47+ messages in thread From: Daniel Lezcano @ 2014-04-15 14:17 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, mingo, rjw, nicolas.pitre, linux-pm, alex.shi, vincent.guittot, morten.rasmussen On 04/15/2014 02:44 PM, Peter Zijlstra wrote: > On Tue, Apr 15, 2014 at 02:43:30PM +0200, Peter Zijlstra wrote: >> On Fri, Mar 28, 2014 at 01:29:55PM +0100, Daniel Lezcano wrote: >>> @@ -143,6 +145,10 @@ static int cpuidle_idle_call(void) >>> if (!ret) { >>> trace_cpu_idle_rcuidle(next_state, dev->cpu); >>> >>> + *power = &drv->states[next_state].power; >>> + >>> + wmb(); >>> + >> >> I very much suspect you meant: smp_wmb(), as I don't see the hardware >> reading that pointer, therefore UP wouldn't care. Also, any and all >> barriers should come with a comment that describes the data ordering and >> points to the matchin barriers. > > Furthermore, this patch fails to describe the life-time rules of the > object placed there. Can the objected pointed to ever disappear? Hi Peter, thanks for reviewing the patches. There are a couple of situations where a cpuidle state can disappear: 1. For x86/acpi with dynamic c-states, when a laptop switches from battery to AC, which could result in removing the deepest idle state. The acpi driver triggers: 'acpi_processor_cst_has_changed' which will call 'cpuidle_pause_and_lock'. This one will call 'cpuidle_uninstall_idle_handler' which in turn calls 'kick_all_cpus_sync'. All cpus will exit their idle state and the pointer will be set to NULL again. 2. The cpuidle driver is unloaded. 
The unloading code must call 'cpuidle_unregister_device', that calls 'cpuidle_pause_and_lock' leading to 'kick_all_cpus_sync'. IIUC, the race can happen if we take the pointer and then one of these two situations occurs at the same moment. As the function 'find_idlest_cpu' is inside a rcu_read_lock, maybe an rcu_barrier in 'cpuidle_pause_and_lock' or 'cpuidle_uninstall_idle_handler' should suffice, no? Thanks -- Daniel -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 2/3] idle: store the idle state the cpu is 2014-04-15 14:17 ` Daniel Lezcano @ 2014-04-15 14:33 ` Peter Zijlstra 2014-04-15 14:39 ` Daniel Lezcano 0 siblings, 1 reply; 47+ messages in thread From: Peter Zijlstra @ 2014-04-15 14:33 UTC (permalink / raw) To: Daniel Lezcano Cc: linux-kernel, mingo, rjw, nicolas.pitre, linux-pm, alex.shi, vincent.guittot, morten.rasmussen On Tue, Apr 15, 2014 at 04:17:36PM +0200, Daniel Lezcano wrote: > On 04/15/2014 02:44 PM, Peter Zijlstra wrote: > >On Tue, Apr 15, 2014 at 02:43:30PM +0200, Peter Zijlstra wrote: > >>On Fri, Mar 28, 2014 at 01:29:55PM +0100, Daniel Lezcano wrote: > >>>@@ -143,6 +145,10 @@ static int cpuidle_idle_call(void) > >>> if (!ret) { > >>> trace_cpu_idle_rcuidle(next_state, dev->cpu); > >>> > >>>+ *power = &drv->states[next_state].power; > >>>+ > >>>+ wmb(); > >>>+ > >> > >>I very much suspect you meant: smp_wmb(), as I don't see the hardware > >>reading that pointer, therefore UP wouldn't care. Also, any and all > >>barriers should come with a comment that describes the data ordering and > >>points to the matchin barriers. > > > >Furthermore, this patch fails to describe the life-time rules of the > >object placed there. Can the objected pointed to ever disappear? > > Hi Peter, > > thanks for reviewing the patches. > > There are a couple of situations where a cpuidle state can disappear: > > 1. For x86/acpi with dynamic c-states, when a laptop switches from battery > to AC that could result on removing the deeper idle state. The acpi driver > triggers: > > 'acpi_processor_cst_has_changed' which will call 'cpuidle_pause_and_lock'. > This one will call 'cpuidle_uninstall_idle_handler' which in turn calls > 'kick_all_cpus_sync'. > > All cpus will exit their idle state and the pointed object will be set to > NULL again. > > 2. The cpuidle driver is unloaded. 
Logically that could happen but not in > practice because the drivers are always compiled in and 95% of the drivers > are not coded to unregister the driver. Anyway ... > > The unloading code must call 'cpuidle_unregister_device', that calls > 'cpuidle_pause_and_lock' leading to 'kick_all_cpus_sync'. > > IIUC, the race can happen if we take the pointer and then one of these two > situation occurs at the same moment. > > As the function 'find_idlest_cpu' is inside a rcu_read_lock may be a > rcu_barrier in 'cpuidle_pause_and_lock' or 'cpuidle_uninstall_idle_handler' > should suffice, no ? Indeed. But be sure to document this. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 2/3] idle: store the idle state the cpu is 2014-04-15 14:33 ` Peter Zijlstra @ 2014-04-15 14:39 ` Daniel Lezcano 0 siblings, 0 replies; 47+ messages in thread From: Daniel Lezcano @ 2014-04-15 14:39 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, mingo, rjw, nicolas.pitre, linux-pm, alex.shi, vincent.guittot, morten.rasmussen On 04/15/2014 04:33 PM, Peter Zijlstra wrote: > On Tue, Apr 15, 2014 at 04:17:36PM +0200, Daniel Lezcano wrote: >> On 04/15/2014 02:44 PM, Peter Zijlstra wrote: >>> On Tue, Apr 15, 2014 at 02:43:30PM +0200, Peter Zijlstra wrote: >>>> On Fri, Mar 28, 2014 at 01:29:55PM +0100, Daniel Lezcano wrote: >>>>> @@ -143,6 +145,10 @@ static int cpuidle_idle_call(void) >>>>> if (!ret) { >>>>> trace_cpu_idle_rcuidle(next_state, dev->cpu); >>>>> >>>>> + *power = &drv->states[next_state].power; >>>>> + >>>>> + wmb(); >>>>> + >>>> >>>> I very much suspect you meant: smp_wmb(), as I don't see the hardware >>>> reading that pointer, therefore UP wouldn't care. Also, any and all >>>> barriers should come with a comment that describes the data ordering and >>>> points to the matchin barriers. >>> >>> Furthermore, this patch fails to describe the life-time rules of the >>> object placed there. Can the objected pointed to ever disappear? >> >> Hi Peter, >> >> thanks for reviewing the patches. >> >> There are a couple of situations where a cpuidle state can disappear: >> >> 1. For x86/acpi with dynamic c-states, when a laptop switches from battery >> to AC that could result on removing the deeper idle state. The acpi driver >> triggers: >> >> 'acpi_processor_cst_has_changed' which will call 'cpuidle_pause_and_lock'. >> This one will call 'cpuidle_uninstall_idle_handler' which in turn calls >> 'kick_all_cpus_sync'. >> >> All cpus will exit their idle state and the pointed object will be set to >> NULL again. >> >> 2. The cpuidle driver is unloaded. 
Logically that could happen but not in >> practice because the drivers are always compiled in and 95% of the drivers >> are not coded to unregister the driver. Anyway ... >> >> The unloading code must call 'cpuidle_unregister_device', that calls >> 'cpuidle_pause_and_lock' leading to 'kick_all_cpus_sync'. >> >> IIUC, the race can happen if we take the pointer and then one of these two >> situation occurs at the same moment. >> >> As the function 'find_idlest_cpu' is inside a rcu_read_lock may be a >> rcu_barrier in 'cpuidle_pause_and_lock' or 'cpuidle_uninstall_idle_handler' >> should suffice, no ? > > Indeed. But be sure to document this. Yes, sure. Thanks for pointing this. -- Daniel -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 47+ messages in thread
* [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu 2014-03-28 12:29 [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Daniel Lezcano 2014-03-28 12:29 ` [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure Daniel Lezcano 2014-03-28 12:29 ` [RFC PATCHC 2/3] idle: store the idle state the cpu is Daniel Lezcano @ 2014-03-28 12:29 ` Daniel Lezcano 2014-04-02 3:05 ` Nicolas Pitre 2014-04-15 13:03 ` Peter Zijlstra 2014-03-31 13:52 ` [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Vincent Guittot ` (2 subsequent siblings) 5 siblings, 2 replies; 47+ messages in thread From: Daniel Lezcano @ 2014-03-28 12:29 UTC (permalink / raw) To: linux-kernel, mingo, peterz Cc: rjw, nicolas.pitre, linux-pm, alex.shi, vincent.guittot, morten.rasmussen As we know which idle state the cpu is in, we can investigate the following: 1. When did the cpu enter the idle state? The longer the cpu has been idle, the deeper the state. 2. What is the exit latency? The greater the exit latency, the deeper the state. With both pieces of information, when all cpus are idle, we can choose the idlest cpu. When one cpu is not idle, the old check against the weighted load applies. 
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> --- kernel/sched/fair.c | 46 ++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 40 insertions(+), 6 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 16042b5..068e503 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -23,6 +23,7 @@ #include <linux/latencytop.h> #include <linux/sched.h> #include <linux/cpumask.h> +#include <linux/cpuidle.h> #include <linux/slab.h> #include <linux/profile.h> #include <linux/interrupt.h> @@ -4336,20 +4337,53 @@ static int find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu) { unsigned long load, min_load = ULONG_MAX; - int idlest = -1; + unsigned int min_exit_latency = UINT_MAX; + u64 idle_stamp, min_idle_stamp = ULONG_MAX; + + struct rq *rq; + struct cpuidle_power *power; + + int cpu_idle = -1; + int cpu_busy = -1; int i; /* Traverse only the allowed CPUs */ for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) { - load = weighted_cpuload(i); - if (load < min_load || (load == min_load && i == this_cpu)) { - min_load = load; - idlest = i; + if (idle_cpu(i)) { + + rq = cpu_rq(i); + power = rq->power; + idle_stamp = rq->idle_stamp; + + /* The cpu is idle since a shorter time */ + if (idle_stamp < min_idle_stamp) { + min_idle_stamp = idle_stamp; + cpu_idle = i; + continue; + } + + /* The cpu is idle but the exit_latency is shorter */ + if (power && power->exit_latency < min_exit_latency) { + min_exit_latency = power->exit_latency; + cpu_idle = i; + continue; + } + } else { + + load = weighted_cpuload(i); + + if (load < min_load || + (load == min_load && i == this_cpu)) { + min_load = load; + cpu_busy = i; + continue; + } } } - return idlest; + /* Busy cpus are considered less idle than idle cpus ;) */ + return cpu_busy != -1 ? cpu_busy : cpu_idle; } /* -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 47+ messages in thread
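For discussion purposes, the selection logic in the hunk above can be modeled as a small standalone user-space program. This is a sketch only: `struct cpu_info` and its fields are illustrative stand-ins for the kernel's `rq`, `cpuidle_power` and `weighted_cpuload()` state, and `exit_latency == UINT_MAX` models a NULL `rq->power`.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* User-space model of the patched find_idlest_cpu(); all fields and
 * values are illustrative, not the kernel's data structures. */
struct cpu_info {
    int idle;                  /* result of idle_cpu(i) */
    unsigned long load;        /* weighted_cpuload(i) */
    unsigned int exit_latency; /* rq->power->exit_latency, UINT_MAX == no info */
    uint64_t idle_stamp;       /* rq->idle_stamp */
};

static int find_idlest_cpu(const struct cpu_info *cpus, int nr, int this_cpu)
{
    unsigned long min_load = ULONG_MAX;
    unsigned int min_exit_latency = UINT_MAX;
    uint64_t min_idle_stamp = UINT64_MAX;
    int cpu_idle = -1, cpu_busy = -1;

    for (int i = 0; i < nr; i++) {
        if (cpus[i].idle) {
            /* Same ordering as the patch: the time stamp check runs
             * first, then the exit latency check. */
            if (cpus[i].idle_stamp < min_idle_stamp) {
                min_idle_stamp = cpus[i].idle_stamp;
                cpu_idle = i;
                continue;
            }
            if (cpus[i].exit_latency < min_exit_latency) {
                min_exit_latency = cpus[i].exit_latency;
                cpu_idle = i;
            }
        } else {
            if (cpus[i].load < min_load ||
                (cpus[i].load == min_load && i == this_cpu)) {
                min_load = cpus[i].load;
                cpu_busy = i;
            }
        }
    }
    /* As in the patch: a least loaded busy cpu beats any idle cpu. */
    return cpu_busy != -1 ? cpu_busy : cpu_idle;
}
```

The model deliberately mirrors the posted ordering, including the fact that the time stamp comparison is performed before the exit latency one.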
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu 2014-03-28 12:29 ` [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu Daniel Lezcano @ 2014-04-02 3:05 ` Nicolas Pitre 2014-04-04 11:57 ` Rafael J. Wysocki 2014-04-17 13:53 ` Daniel Lezcano 2014-04-15 13:03 ` Peter Zijlstra 1 sibling, 2 replies; 47+ messages in thread From: Nicolas Pitre @ 2014-04-02 3:05 UTC (permalink / raw) To: Daniel Lezcano Cc: linux-kernel, mingo, peterz, rjw, linux-pm, alex.shi, vincent.guittot, morten.rasmussen On Fri, 28 Mar 2014, Daniel Lezcano wrote: > As we know in which idle state the cpu is, we can investigate the following: > > 1. when did the cpu entered the idle state ? the longer the cpu is idle, the > deeper it is idle > 2. what exit latency is ? the greater the exit latency is, the deeper it is > > With both information, when all cpus are idle, we can choose the idlest cpu. > > When one cpu is not idle, the old check against weighted load applies. > > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> There seems to be some problems with the implementation. > @@ -4336,20 +4337,53 @@ static int > find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu) > { > unsigned long load, min_load = ULONG_MAX; > - int idlest = -1; > + unsigned int min_exit_latency = UINT_MAX; > + u64 idle_stamp, min_idle_stamp = ULONG_MAX; I don't think you really meant to assign an u64 variable with ULONG_MAX. You probably want ULLONG_MAX here. And probably not in fact (more later). 
> + > + struct rq *rq; > + struct cpuidle_power *power; > + > + int cpu_idle = -1; > + int cpu_busy = -1; > int i; > > /* Traverse only the allowed CPUs */ > for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) { > - load = weighted_cpuload(i); > > - if (load < min_load || (load == min_load && i == this_cpu)) { > - min_load = load; > - idlest = i; > + if (idle_cpu(i)) { > + > + rq = cpu_rq(i); > + power = rq->power; > + idle_stamp = rq->idle_stamp; > + > + /* The cpu is idle since a shorter time */ > + if (idle_stamp < min_idle_stamp) { > + min_idle_stamp = idle_stamp; > + cpu_idle = i; > + continue; Don't you want the highest time stamp in order to select the most recently idled CPU? Favoring the CPU which has been idle the longest makes little sense. > + } > + > + /* The cpu is idle but the exit_latency is shorter */ > + if (power && power->exit_latency < min_exit_latency) { > + min_exit_latency = power->exit_latency; > + cpu_idle = i; > + continue; > + } I think this is wrong. This gives priority to CPUs which have been idle for a (longer... although this should have been) shorter period of time over those with a shallower idle state. I think this should rather be: if (power && power->exit_latency < min_exit_latency) { min_exit_latency = power->exit_latency; latest_idle_stamp = idle_stamp; cpu_idle = i; } else if ((!power || power->exit_latency == min_exit_latency) && idle_stamp > latest_idle_stamp) { latest_idle_stamp = idle_stamp; cpu_idle = i; } So the CPU with the shallowest idle state is selected in priority, and if many CPUs are in the same state then the time stamp is used to select the most recent one. Whenever a shallower idle state is found then the latest_idle_stamp is reset for that state even if it is further in the past. 
> + } else { > + > + load = weighted_cpuload(i); > + > + if (load < min_load || > + (load == min_load && i == this_cpu)) { > + min_load = load; > + cpu_busy = i; > + continue; > + } > } I think this is wrong to do an if-else based on idle_cpu() here. What if a CPU is heavily loaded, but for some reason it happens to be idle at this very moment? With your patch it could be selected as an idle CPU while it would be discarded as being too busy otherwise. It is important to determine both cpu_busy and cpu_idle for all CPUs. And cpu_busy is a bad name for this. Something like least_loaded would be more self explanatory. Same thing for cpu_idle which could be clearer if named shalloest_idle. > - return idlest; > + /* Busy cpus are considered less idle than idle cpus ;) */ > + return cpu_busy != -1 ? cpu_busy : cpu_idle; And finally it is a policy decision whether or not we want to return least_loaded over shallowest_idle e.g do we pack tasks on non idle CPUs first or not. That in itself needs more investigation. To keep the existing policy unchanged for now the above condition should have its variables swapped. Nicolas ^ permalink raw reply [flat|nested] 47+ messages in thread
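Nicolas's proposed ordering (shallowest state first, with the most recent idle time stamp as the tie-break) can be sketched as a standalone loop. This is a user-space model with illustrative names, not kernel code; `has_power == 0` stands for a NULL `rq->power`.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

struct idle_info {
    int has_power;             /* rq->power != NULL */
    unsigned int exit_latency;
    uint64_t idle_stamp;
};

/* Pick the idle cpu in the shallowest state; among cpus in the same
 * state, pick the most recently idled one (highest idle_stamp). */
static int pick_shallowest_idle(const struct idle_info *c, int nr)
{
    unsigned int min_exit_latency = UINT_MAX;
    uint64_t latest_idle_stamp = 0;
    int shallowest_idle = -1;

    for (int i = 0; i < nr; i++) {
        if (c[i].has_power && c[i].exit_latency < min_exit_latency) {
            /* Strictly shallower state: take it and reset the time
             * stamp baseline, even if this cpu idled longer ago. */
            min_exit_latency = c[i].exit_latency;
            latest_idle_stamp = c[i].idle_stamp;
            shallowest_idle = i;
        } else if ((!c[i].has_power ||
                    c[i].exit_latency == min_exit_latency) &&
                   c[i].idle_stamp > latest_idle_stamp) {
            /* Same (or unknown) state: prefer the most recently idled. */
            latest_idle_stamp = c[i].idle_stamp;
            shallowest_idle = i;
        }
    }
    return shallowest_idle;
}
```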
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu 2014-04-02 3:05 ` Nicolas Pitre @ 2014-04-04 11:57 ` Rafael J. Wysocki 2014-04-04 16:56 ` Nicolas Pitre 2014-04-17 13:53 ` Daniel Lezcano 1 sibling, 1 reply; 47+ messages in thread From: Rafael J. Wysocki @ 2014-04-04 11:57 UTC (permalink / raw) To: Nicolas Pitre Cc: Daniel Lezcano, linux-kernel, mingo, peterz, linux-pm, alex.shi, vincent.guittot, morten.rasmussen On Tuesday, April 01, 2014 11:05:49 PM Nicolas Pitre wrote: > On Fri, 28 Mar 2014, Daniel Lezcano wrote: > > > As we know in which idle state the cpu is, we can investigate the following: > > > > 1. when did the cpu entered the idle state ? the longer the cpu is idle, the > > deeper it is idle > > 2. what exit latency is ? the greater the exit latency is, the deeper it is > > > > With both information, when all cpus are idle, we can choose the idlest cpu. > > > > When one cpu is not idle, the old check against weighted load applies. > > > > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> > > There seems to be some problems with the implementation. > > > @@ -4336,20 +4337,53 @@ static int > > find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu) > > { > > unsigned long load, min_load = ULONG_MAX; > > - int idlest = -1; > > + unsigned int min_exit_latency = UINT_MAX; > > + u64 idle_stamp, min_idle_stamp = ULONG_MAX; > > I don't think you really meant to assign an u64 variable with ULONG_MAX. > You probably want ULLONG_MAX here. And probably not in fact (more > later). 
> > > + > > + struct rq *rq; > > + struct cpuidle_power *power; > > + > > + int cpu_idle = -1; > > + int cpu_busy = -1; > > int i; > > > > /* Traverse only the allowed CPUs */ > > for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) { > > - load = weighted_cpuload(i); > > > > - if (load < min_load || (load == min_load && i == this_cpu)) { > > - min_load = load; > > - idlest = i; > > + if (idle_cpu(i)) { > > + > > + rq = cpu_rq(i); > > + power = rq->power; > > + idle_stamp = rq->idle_stamp; > > + > > + /* The cpu is idle since a shorter time */ > > + if (idle_stamp < min_idle_stamp) { > > + min_idle_stamp = idle_stamp; > > + cpu_idle = i; > > + continue; > > Don't you want the highest time stamp in order to select the most > recently idled CPU? Favoring the CPU which has been idle the longest > makes little sense. It may make sense if the hardware can auto-promote CPUs to deeper C-states. Something like that happens with package C-states that are only entered when all cores have entered a particular core C-state already. In that case the probability of the core being in a deeper state grows with time. That said I would just drop this heuristics for the time being. If auto-promotion is disregarded, it doesn't really matter how much time the given CPU has been idle except for one case: When the target residency of its idle state hasn't been reached yet, waking up the CPU may be a mistake (depending on how deep the state actually is, but for the majority of drivers in the tree we don't have any measure of that). > > + } > > + > > + /* The cpu is idle but the exit_latency is shorter */ > > + if (power && power->exit_latency < min_exit_latency) { > > + min_exit_latency = power->exit_latency; > > + cpu_idle = i; > > + continue; > > + } > > I think this is wrong. This gives priority to CPUs which have been idle > for a (longer... although this should have been) shorter period of time > over those with a shallower idle state. 
I think this should rather be: > > if (power && power->exit_latency < min_exit_latency) { > min_exit_latency = power->exit_latency; > latest_idle_stamp = idle_stamp; > cpu_idle = i; > } else if ((!power || power->exit_latency == min_exit_latency) && > idle_stamp > latest_idle_stamp) { > latest_idle_stamp = idle_stamp; > cpu_idle = i; > } > > So the CPU with the shallowest idle state is selected in priority, and > if many CPUs are in the same state then the time stamp is used to > select the most recent one. Again, if auto-promotion is disregarded, it doesn't really matter which of them is woken up. > Whenever a shallower idle state is found then the latest_idle_stamp is reset for > that state even if it is further in the past. > > > + } else { > > + > > + load = weighted_cpuload(i); > > + > > + if (load < min_load || > > + (load == min_load && i == this_cpu)) { > > + min_load = load; > > + cpu_busy = i; > > + continue; > > + } > > } > > I think this is wrong to do an if-else based on idle_cpu() here. What > if a CPU is heavily loaded, but for some reason it happens to be idle at > this very moment? With your patch it could be selected as an idle CPU > while it would be discarded as being too busy otherwise. But see below -> > It is important to determine both cpu_busy and cpu_idle for all CPUs. > > And cpu_busy is a bad name for this. Something like least_loaded would > be more self explanatory. Same thing for cpu_idle which could be > clearer if named shalloest_idle. shallowest_idle? > > - return idlest; > > + /* Busy cpus are considered less idle than idle cpus ;) */ > > + return cpu_busy != -1 ? cpu_busy : cpu_idle; > > And finally it is a policy decision whether or not we want to return > least_loaded over shallowest_idle e.g do we pack tasks on non idle CPUs > first or not. That in itself needs more investigation. To keep the > existing policy unchanged for now the above condition should have its > variables swapped. 
Which means that once we've found the first idle CPU, it is not useful to continue computing least_loaded, because we will return the idle one anyway, right? -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. ^ permalink raw reply [flat|nested] 47+ messages in thread
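Rafael's shortcut can be sketched as follows: under the current policy, where any idle cpu beats the least loaded busy one, the load bookkeeping can stop as soon as one idle cpu has been seen. This is an illustrative user-space model, not the kernel function; the selection among idle cpus is elided.

```c
#include <assert.h>
#include <limits.h>

/* Model of the observation: once an idle cpu is known, a busy cpu can
 * no longer win, so weighted_cpuload() need not be evaluated anymore. */
static int find_cpu(const int *idle, const unsigned long *load, int nr)
{
    unsigned long min_load = ULONG_MAX;
    int idlest = -1, least_loaded = -1;

    for (int i = 0; i < nr; i++) {
        if (idle[i]) {
            idlest = i;      /* ranking among idle cpus elided */
        } else if (idlest == -1 && load[i] < min_load) {
            /* Only track the least loaded cpu while no idle cpu
             * has been found yet. */
            min_load = load[i];
            least_loaded = i;
        }
    }
    return idlest != -1 ? idlest : least_loaded;
}
```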
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu 2014-04-04 11:57 ` Rafael J. Wysocki @ 2014-04-04 16:56 ` Nicolas Pitre 2014-04-05 2:01 ` Rafael J. Wysocki 0 siblings, 1 reply; 47+ messages in thread From: Nicolas Pitre @ 2014-04-04 16:56 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Daniel Lezcano, linux-kernel, mingo, peterz, linux-pm, alex.shi, vincent.guittot, morten.rasmussen On Fri, 4 Apr 2014, Rafael J. Wysocki wrote: > On Tuesday, April 01, 2014 11:05:49 PM Nicolas Pitre wrote: > > On Fri, 28 Mar 2014, Daniel Lezcano wrote: > > > > > As we know in which idle state the cpu is, we can investigate the following: > > > > > > 1. when did the cpu entered the idle state ? the longer the cpu is idle, the > > > deeper it is idle > > > 2. what exit latency is ? the greater the exit latency is, the deeper it is > > > > > > With both information, when all cpus are idle, we can choose the idlest cpu. > > > > > > When one cpu is not idle, the old check against weighted load applies. > > > > > > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> > > > > There seems to be some problems with the implementation. > > > > > @@ -4336,20 +4337,53 @@ static int > > > find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu) > > > { > > > unsigned long load, min_load = ULONG_MAX; > > > - int idlest = -1; > > > + unsigned int min_exit_latency = UINT_MAX; > > > + u64 idle_stamp, min_idle_stamp = ULONG_MAX; > > > > I don't think you really meant to assign an u64 variable with ULONG_MAX. > > You probably want ULLONG_MAX here. And probably not in fact (more > > later). 
> > > > > + > > > + struct rq *rq; > > > + struct cpuidle_power *power; > > > + > > > + int cpu_idle = -1; > > > + int cpu_busy = -1; > > > int i; > > > > > > /* Traverse only the allowed CPUs */ > > > for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) { > > > - load = weighted_cpuload(i); > > > > > > - if (load < min_load || (load == min_load && i == this_cpu)) { > > > - min_load = load; > > > - idlest = i; > > > + if (idle_cpu(i)) { > > > + > > > + rq = cpu_rq(i); > > > + power = rq->power; > > > + idle_stamp = rq->idle_stamp; > > > + > > > + /* The cpu is idle since a shorter time */ > > > + if (idle_stamp < min_idle_stamp) { > > > + min_idle_stamp = idle_stamp; > > > + cpu_idle = i; > > > + continue; > > > > Don't you want the highest time stamp in order to select the most > > recently idled CPU? Favoring the CPU which has been idle the longest > > makes little sense. > > It may make sense if the hardware can auto-promote CPUs to deeper C-states. If so the promotion will happen over time, no? What I'm saying here is that those CPUs which have been idle longer should not be favored when it is time to select a CPU for a task to run. More recently idled CPUs are more likely to be in a shallower C-state. > Something like that happens with package C-states that are only entered when > all cores have entered a particular core C-state already. In that case the > probability of the core being in a deeper state grows with time. Exactly what I'm saying. Also here it is worth remembering that the scheduling domains should represent those packages that share common C-states at a higher level. The scheduler can then be told not to balance across domains if it doesn't need to in order to favor the conditions for those package C-states to be used. That's what the task packing patch series is about, independently of this one. > That said I would just drop this heuristics for the time being. 
If auto-promotion > is disregarded, it doesn't really matter how much time the given CPU has been idle > except for one case: When the target residency of its idle state hasn't been > reached yet, waking up the CPU may be a mistake (depending on how deep the state > actually is, but for the majority of drivers in the tree we don't have any measure > of that). There is one reason for considering the time a CPU has been idle, assuming equivalent C-state, and that is cache snooping. The longer a CPU is idle, the more likely its cache content will have been claimed and migrated by other CPUs. Of course that doesn't make much difference for deeper C-states where the cache isn't preserved, but it is probably simpler and cheaper to apply this heuristic in all cases. > > > + } > > > + > > > + /* The cpu is idle but the exit_latency is shorter */ > > > + if (power && power->exit_latency < min_exit_latency) { > > > + min_exit_latency = power->exit_latency; > > > + cpu_idle = i; > > > + continue; > > > + } > > > > I think this is wrong. This gives priority to CPUs which have been idle > > for a (longer... although this should have been) shorter period of time > > over those with a shallower idle state. I think this should rather be: > > > > if (power && power->exit_latency < min_exit_latency) { > > min_exit_latency = power->exit_latency; > > latest_idle_stamp = idle_stamp; > > cpu_idle = i; > > } else if ((!power || power->exit_latency == min_exit_latency) && > > idle_stamp > latest_idle_stamp) { > > latest_idle_stamp = idle_stamp; > > cpu_idle = i; > > } > > > > So the CPU with the shallowest idle state is selected in priority, and > > if many CPUs are in the same state then the time stamp is used to > > select the most recent one. > > Again, if auto-promotion is disregarded, it doesn't really matter which of them > is woken up. If it doesn't matter then it doesn't hurt. But in some cases it matters. 
> > Whenever a shallower idle state is found then the latest_idle_stamp is reset for > > that state even if it is further in the past. > > > > > + } else { > > > + > > > + load = weighted_cpuload(i); > > > + > > > + if (load < min_load || > > > + (load == min_load && i == this_cpu)) { > > > + min_load = load; > > > + cpu_busy = i; > > > + continue; > > > + } > > > } > > > > I think this is wrong to do an if-else based on idle_cpu() here. What > > if a CPU is heavily loaded, but for some reason it happens to be idle at > > this very moment? With your patch it could be selected as an idle CPU > > while it would be discarded as being too busy otherwise. > > But see below -> > > > It is important to determine both cpu_busy and cpu_idle for all CPUs. > > > > And cpu_busy is a bad name for this. Something like least_loaded would > > be more self explanatory. Same thing for cpu_idle which could be > > clearer if named shalloest_idle. > > shallowest_idle? Something that means the CPU with the shallowest C-state. Using "cpu_idle" for this variable doesn't cut it. > > > - return idlest; > > > + /* Busy cpus are considered less idle than idle cpus ;) */ > > > + return cpu_busy != -1 ? cpu_busy : cpu_idle; > > > > And finally it is a policy decision whether or not we want to return > > least_loaded over shallowest_idle e.g do we pack tasks on non idle CPUs > > first or not. That in itself needs more investigation. To keep the > > existing policy unchanged for now the above condition should have its > > variables swapped. > > Which means that once we've find the first idle CPU, it is not useful to > continue computing least_loaded, because we will return the idle one anyway, > right? Good point. Currently, that should be the case. Eventually we'll want to put new tasks on lightly loaded CPUs instead of waking up a fully idle CPU in order to favor deeper C-states. 
But that requires a patch series of its own just to determine how loaded a CPU is and how much work it can still accommodate before being oversubscribed, etc. Nicolas ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu 2014-04-04 16:56 ` Nicolas Pitre @ 2014-04-05 2:01 ` Rafael J. Wysocki 0 siblings, 0 replies; 47+ messages in thread From: Rafael J. Wysocki @ 2014-04-05 2:01 UTC (permalink / raw) To: Nicolas Pitre Cc: Daniel Lezcano, linux-kernel, mingo, peterz, linux-pm, alex.shi, vincent.guittot, morten.rasmussen On Friday, April 04, 2014 12:56:52 PM Nicolas Pitre wrote: > On Fri, 4 Apr 2014, Rafael J. Wysocki wrote: > > > On Tuesday, April 01, 2014 11:05:49 PM Nicolas Pitre wrote: > > > On Fri, 28 Mar 2014, Daniel Lezcano wrote: > > > > > > > As we know in which idle state the cpu is, we can investigate the following: > > > > > > > > 1. when did the cpu entered the idle state ? the longer the cpu is idle, the > > > > deeper it is idle > > > > 2. what exit latency is ? the greater the exit latency is, the deeper it is > > > > > > > > With both information, when all cpus are idle, we can choose the idlest cpu. > > > > > > > > When one cpu is not idle, the old check against weighted load applies. > > > > > > > > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> > > > > > > There seems to be some problems with the implementation. > > > > > > > @@ -4336,20 +4337,53 @@ static int > > > > find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu) > > > > { > > > > unsigned long load, min_load = ULONG_MAX; > > > > - int idlest = -1; > > > > + unsigned int min_exit_latency = UINT_MAX; > > > > + u64 idle_stamp, min_idle_stamp = ULONG_MAX; > > > > > > I don't think you really meant to assign an u64 variable with ULONG_MAX. > > > You probably want ULLONG_MAX here. And probably not in fact (more > > > later). 
> > > > > > > + > > > > + struct rq *rq; > > > > + struct cpuidle_power *power; > > > > + > > > > + int cpu_idle = -1; > > > > + int cpu_busy = -1; > > > > int i; > > > > > > > > /* Traverse only the allowed CPUs */ > > > > for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) { > > > > - load = weighted_cpuload(i); > > > > > > > > - if (load < min_load || (load == min_load && i == this_cpu)) { > > > > - min_load = load; > > > > - idlest = i; > > > > + if (idle_cpu(i)) { > > > > + > > > > + rq = cpu_rq(i); > > > > + power = rq->power; > > > > + idle_stamp = rq->idle_stamp; > > > > + > > > > + /* The cpu is idle since a shorter time */ > > > > + if (idle_stamp < min_idle_stamp) { > > > > + min_idle_stamp = idle_stamp; > > > > + cpu_idle = i; > > > > + continue; > > > > > > Don't you want the highest time stamp in order to select the most > > > recently idled CPU? Favoring the CPU which has been idle the longest > > > makes little sense. > > > > It may make sense if the hardware can auto-promote CPUs to deeper C-states. > > If so the promotion will happen over time, no? What I'm saying here is > that those CPUs which have been idle longer should not be favored when > it is time to select a CPU for a task to run. More recently idled CPUs > are more likely to be in a shallower C-state. > > > Something like that happens with package C-states that are only entered when > > all cores have entered a particular core C-state already. In that case the > > probability of the core being in a deeper state grows with time. > > Exactly what I'm saying. Right, I got that the other way around by mistake. > Also here it is worth remembering that the scheduling domains should > represent those packages that share common C-states at a higher level. > The scheduler can then be told not to balance across domains if it > doesn't need to in order to favor the conditions for those package > C-states to be used. 
That's what the task packing patch series is > about, independently of this one. > > > That said I would just drop this heuristics for the time being. If auto-promotion > > is disregarded, it doesn't really matter how much time the given CPU has been idle > > except for one case: When the target residency of its idle state hasn't been > > reached yet, waking up the CPU may be a mistake (depending on how deep the state > > actually is, but for the majority of drivers in the tree we don't have any measure > > of that). > > There is one reason for considering the time a CPU has been idle, > assuming equivalent C-state, and that is cache snooping. The longer a > CPU is idle, the more likely its cache content will have been claimed > and migrated by other CPUs. Of course that doesn't make much difference > for deeper C-states where the cache isn't preserved, but it is probably > simpler and cheaper to apply this heuristic in all cases. Yes, that sounds like it might be a reason, but I'd like to see numbers confirming that to be honest. > > > > + } > > > > + > > > > + /* The cpu is idle but the exit_latency is shorter */ > > > > + if (power && power->exit_latency < min_exit_latency) { > > > > + min_exit_latency = power->exit_latency; > > > > + cpu_idle = i; > > > > + continue; > > > > + } > > > > > > I think this is wrong. This gives priority to CPUs which have been idle > > > for a (longer... although this should have been) shorter period of time > > > over those with a shallower idle state. 
I think this should rather be: > > > > > > if (power && power->exit_latency < min_exit_latency) { > > > min_exit_latency = power->exit_latency; > > > latest_idle_stamp = idle_stamp; > > > cpu_idle = i; > > > } else if ((!power || power->exit_latency == min_exit_latency) && > > > idle_stamp > latest_idle_stamp) { > > > latest_idle_stamp = idle_stamp; > > > cpu_idle = i; > > > } > > > > > > So the CPU with the shallowest idle state is selected in priority, and > > > if many CPUs are in the same state then the time stamp is used to > > > select the most recent one. > > > > Again, if auto-promotion is disregarded, it doesn't really matter which of them > > is woken up. > > If it doesn't matter then it doesn't hurt. But in some cases it > matters. > > > > Whenever a shallower idle state is found then the latest_idle_stamp is reset for > > > that state even if it is further in the past. > > > > > > > + } else { > > > > + > > > > + load = weighted_cpuload(i); > > > > + > > > > + if (load < min_load || > > > > + (load == min_load && i == this_cpu)) { > > > > + min_load = load; > > > > + cpu_busy = i; > > > > + continue; > > > > + } > > > > } > > > > > > I think this is wrong to do an if-else based on idle_cpu() here. What > > > if a CPU is heavily loaded, but for some reason it happens to be idle at > > > this very moment? With your patch it could be selected as an idle CPU > > > while it would be discarded as being too busy otherwise. > > > > But see below -> > > > > > It is important to determine both cpu_busy and cpu_idle for all CPUs. > > > > > > And cpu_busy is a bad name for this. Something like least_loaded would > > > be more self explanatory. Same thing for cpu_idle which could be > > > clearer if named shalloest_idle. > > > > shallowest_idle? > > Something that means the CPU with the shallowest C-state. Using > "cpu_idle" for this variable doesn't cut it. Yes, that was about the typo above only. 
:-) > > > > - return idlest; > > > > + /* Busy cpus are considered less idle than idle cpus ;) */ > > > > + return cpu_busy != -1 ? cpu_busy : cpu_idle; > > > > > > And finally it is a policy decision whether or not we want to return > > > least_loaded over shallowest_idle e.g do we pack tasks on non idle CPUs > > > first or not. That in itself needs more investigation. To keep the > > > existing policy unchanged for now the above condition should have its > > > variables swapped. > > > > Which means that once we've find the first idle CPU, it is not useful to > > continue computing least_loaded, because we will return the idle one anyway, > > right? > > Good point. Currently, that should be the case. > > Eventually we'll want to put new tasks on lightly loaded CPUs instead of > waking up a fully idle CPU in order to favor deeper C-states. But that > requires a patch series of its own just to determine how loaded a CPU is > and how much work it can still accommodate before being oversubscribed, > etc. Wouldn't we need power consumption numbers for that realistically? Rafael ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu 2014-04-02 3:05 ` Nicolas Pitre 2014-04-04 11:57 ` Rafael J. Wysocki @ 2014-04-17 13:53 ` Daniel Lezcano 2014-04-17 14:47 ` Peter Zijlstra 2014-04-17 15:53 ` Nicolas Pitre 1 sibling, 2 replies; 47+ messages in thread From: Daniel Lezcano @ 2014-04-17 13:53 UTC (permalink / raw) To: Nicolas Pitre Cc: linux-kernel, mingo, peterz, rjw, linux-pm, alex.shi, vincent.guittot, morten.rasmussen On 04/02/2014 05:05 AM, Nicolas Pitre wrote: > On Fri, 28 Mar 2014, Daniel Lezcano wrote: > >> As we know in which idle state the cpu is, we can investigate the following: >> >> 1. when did the cpu entered the idle state ? the longer the cpu is idle, the >> deeper it is idle >> 2. what exit latency is ? the greater the exit latency is, the deeper it is >> >> With both information, when all cpus are idle, we can choose the idlest cpu. >> >> When one cpu is not idle, the old check against weighted load applies. >> >> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> > > There seems to be some problems with the implementation. > >> @@ -4336,20 +4337,53 @@ static int >> find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu) >> { >> unsigned long load, min_load = ULONG_MAX; >> - int idlest = -1; >> + unsigned int min_exit_latency = UINT_MAX; >> + u64 idle_stamp, min_idle_stamp = ULONG_MAX; > > I don't think you really meant to assign an u64 variable with ULONG_MAX. > You probably want ULLONG_MAX here. And probably not in fact (more > later). 
> >> + >> + struct rq *rq; >> + struct cpuidle_power *power; >> + >> + int cpu_idle = -1; >> + int cpu_busy = -1; >> int i; >> >> /* Traverse only the allowed CPUs */ >> for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) { >> - load = weighted_cpuload(i); >> >> - if (load < min_load || (load == min_load && i == this_cpu)) { >> - min_load = load; >> - idlest = i; >> + if (idle_cpu(i)) { >> + >> + rq = cpu_rq(i); >> + power = rq->power; >> + idle_stamp = rq->idle_stamp; >> + >> + /* The cpu is idle since a shorter time */ >> + if (idle_stamp < min_idle_stamp) { >> + min_idle_stamp = idle_stamp; >> + cpu_idle = i; >> + continue; > > Don't you want the highest time stamp in order to select the most > recently idled CPU? Favoring the CPU which has been idle the longest > makes little sense. > >> + } >> + >> + /* The cpu is idle but the exit_latency is shorter */ >> + if (power && power->exit_latency < min_exit_latency) { >> + min_exit_latency = power->exit_latency; >> + cpu_idle = i; >> + continue; >> + } > > I think this is wrong. This gives priority to CPUs which have been idle > for a (longer... although this should have been) shorter period of time > over those with a shallower idle state. I think this should rather be: > > if (power && power->exit_latency < min_exit_latency) { > min_exit_latency = power->exit_latency; > latest_idle_stamp = idle_stamp; > cpu_idle = i; > } else if ((!power || power->exit_latency == min_exit_latency) && > idle_stamp > latest_idle_stamp) { > latest_idle_stamp = idle_stamp; > cpu_idle = i; > } > > So the CPU with the shallowest idle state is selected in priority, and > if many CPUs are in the same state then the time stamp is used to > select the most recent one. Whenever > a shallower idle state is found then the latest_idle_stamp is reset for > that state even if it is further in the past. 
> >> + } else { >> + >> + load = weighted_cpuload(i); >> + >> + if (load < min_load || >> + (load == min_load && i == this_cpu)) { >> + min_load = load; >> + cpu_busy = i; >> + continue; >> + } >> } > > I think this is wrong to do an if-else based on idle_cpu() here. What > if a CPU is heavily loaded, but for some reason it happens to be idle at > this very moment? With your patch it could be selected as an idle CPU > while it would be discarded as being too busy otherwise. > > It is important to determine both cpu_busy and cpu_idle for all CPUs. > > And cpu_busy is a bad name for this. Something like least_loaded would > be more self explanatory. Same thing for cpu_idle which could be > clearer if named shalloest_idle. > >> - return idlest; >> + /* Busy cpus are considered less idle than idle cpus ;) */ >> + return cpu_busy != -1 ? cpu_busy : cpu_idle; > > And finally it is a policy decision whether or not we want to return > least_loaded over shallowest_idle e.g do we pack tasks on non idle CPUs > first or not. That in itself needs more investigation. To keep the > existing policy unchanged for now the above condition should have its > variables swapped. Ok, I refreshed the patchset, but before sending it out I would like to discuss the rationale of the changes and the policy, and update the patchset accordingly. What order should we choose when the cpus are idle? Let's assume all cpus are idle on a dual socket quad core. We can also reasonably assume that if the cluster is in low power mode, the cpus belonging to the same cluster are in the same idle state (leaving aside auto-promotion, over which we have no control). If the policy you mention above is 'aggressive power saving', we can follow these rules, in decreasing priority: 1. We want to avoid waking up the entire cluster => as the cpus are in the same idle state, by choosing a cpu in a shallow state we should have the guarantee we won't wake up a cluster (except if no shallow idle cpu is found). 2.
We want to prevent to wakeup a cpu which did not reach the target
residency time (will need some work to unify cpuidle idle time and idle
task run time)
   => with the target residency and, as a first step, with the idle
   stamp, we can determine if the cpu slept enough

3. We want to prevent to wakeup a cpu in deep idle state
   => by looking for the cpu in shallowest idle state

4. We want to prevent to wakeup a cpu where the exit latency is longer
than the expected run time of the task (and the time to migrate the
task ?)

Concerning the policy, I would suggest to create an entry in
/proc/sys/kernel/sched_power, where a couple of values could be
performance - power saving (0 / 1).

Does it make sense ? Any ideas ?

Thanks
  -- Daniel

-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu 2014-04-17 13:53 ` Daniel Lezcano @ 2014-04-17 14:47 ` Peter Zijlstra 2014-04-17 15:03 ` Daniel Lezcano 2014-04-17 15:53 ` Nicolas Pitre 1 sibling, 1 reply; 47+ messages in thread From: Peter Zijlstra @ 2014-04-17 14:47 UTC (permalink / raw) To: Daniel Lezcano Cc: Nicolas Pitre, linux-kernel, mingo, rjw, linux-pm, alex.shi, vincent.guittot, morten.rasmussen On Thu, Apr 17, 2014 at 03:53:32PM +0200, Daniel Lezcano wrote: > Concerning the policy, I would suggest to create an entry in > /proc/sys/kernel/sched_power, where a couple of values could be performance > - power saving (0 / 1). Ingo wanted a sched_balance_policy file with 3 values: "performance, power, auto" Where the auto thing switches between them, initially based off of having AC or not. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu 2014-04-17 14:47 ` Peter Zijlstra @ 2014-04-17 15:03 ` Daniel Lezcano 2014-04-18 8:09 ` Ingo Molnar 0 siblings, 1 reply; 47+ messages in thread From: Daniel Lezcano @ 2014-04-17 15:03 UTC (permalink / raw) To: Peter Zijlstra Cc: Nicolas Pitre, linux-kernel, mingo, rjw, linux-pm, alex.shi, vincent.guittot, morten.rasmussen On 04/17/2014 04:47 PM, Peter Zijlstra wrote: > On Thu, Apr 17, 2014 at 03:53:32PM +0200, Daniel Lezcano wrote: >> Concerning the policy, I would suggest to create an entry in >> /proc/sys/kernel/sched_power, where a couple of values could be performance >> - power saving (0 / 1). > > Ingo wanted a sched_balance_policy file with 3 values: > "performance, power, auto" > > Where the auto thing switches between them, initially based off of > having AC or not. oh, good. Thanks ! -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu 2014-04-17 15:03 ` Daniel Lezcano @ 2014-04-18 8:09 ` Ingo Molnar 2014-04-18 8:36 ` Daniel Lezcano 0 siblings, 1 reply; 47+ messages in thread From: Ingo Molnar @ 2014-04-18 8:09 UTC (permalink / raw) To: Daniel Lezcano Cc: Peter Zijlstra, Nicolas Pitre, linux-kernel, mingo, rjw, linux-pm, alex.shi, vincent.guittot, morten.rasmussen

* Daniel Lezcano <daniel.lezcano@linaro.org> wrote:

> On 04/17/2014 04:47 PM, Peter Zijlstra wrote:
> >On Thu, Apr 17, 2014 at 03:53:32PM +0200, Daniel Lezcano wrote:
> >>Concerning the policy, I would suggest to create an entry in
> >>/proc/sys/kernel/sched_power, where a couple of values could be performance
> >>- power saving (0 / 1).
> >
> >Ingo wanted a sched_balance_policy file with 3 values:
> >  "performance, power, auto"
> >
> >Where the auto thing switches between them, initially based off of
> >having AC or not.
>
> oh, good. Thanks !

Also, 'auto' should be the default, because the kernel doing TRT is
really what users want.

Userspace can still tweak it all and make it all user-space controlled,
by flipping between 'performance' and 'power'. (and those modes are
also helpful for development and debugging.)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu 2014-04-18 8:09 ` Ingo Molnar @ 2014-04-18 8:36 ` Daniel Lezcano 0 siblings, 0 replies; 47+ messages in thread From: Daniel Lezcano @ 2014-04-18 8:36 UTC (permalink / raw) To: Ingo Molnar Cc: Peter Zijlstra, Nicolas Pitre, linux-kernel, mingo, rjw, linux-pm, alex.shi, vincent.guittot, morten.rasmussen On 04/18/2014 10:09 AM, Ingo Molnar wrote: > > * Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > >> On 04/17/2014 04:47 PM, Peter Zijlstra wrote: >>> On Thu, Apr 17, 2014 at 03:53:32PM +0200, Daniel Lezcano wrote: >>>> Concerning the policy, I would suggest to create an entry in >>>> /proc/sys/kernel/sched_power, where a couple of values could be performance >>>> - power saving (0 / 1). >>> >>> Ingo wanted a sched_balance_policy file with 3 values: >>> "performance, power, auto" >>> >>> Where the auto thing switches between them, initially based off of >>> having AC or not. >> >> oh, good. Thanks ! > > Also, 'auto' should be the default, because the kernel doing TRT is > really what users want. > > Userspace can sill tweak it all and make it all user-space controlled, > by flipping between 'performance' and 'power'. (and those modes are > also helpful for development and debugging.) Copy that. Thanks ! -- Daniel -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu 2014-04-17 13:53 ` Daniel Lezcano 2014-04-17 14:47 ` Peter Zijlstra @ 2014-04-17 15:53 ` Nicolas Pitre 2014-04-17 16:05 ` Daniel Lezcano 1 sibling, 1 reply; 47+ messages in thread From: Nicolas Pitre @ 2014-04-17 15:53 UTC (permalink / raw) To: Daniel Lezcano Cc: linux-kernel, mingo, peterz, rjw, linux-pm, alex.shi, vincent.guittot, morten.rasmussen On Thu, 17 Apr 2014, Daniel Lezcano wrote: > Ok, refreshed the patchset but before sending it out I would to discuss about > the rational of the changes and the policy, and change the patchset > consequently. > > What order to choose if the cpu is idle ? > > Let's assume all cpus are idle on a dual socket quad core. > > Also, we can reasonably do the hypothesis if the cluster is in low power mode, > the cpus belonging to the same cluster are in the same idle state (putting > apart the auto-promote where we don't have control on). > > If the policy you talk above is 'aggressive power saving', we can follow the > rules with decreasing priority: > > 1. We want to prevent to wakeup the entire cluster > => as the cpus are in the same idle state, by choosing a cpu in > => shallow > state, we should have the guarantee we won't wakeup a cluster (except if no > shallowest idle cpu are found). This is unclear to me. Obviously, if an entire cluster is down, that means all the CPUs it contains have been idle for a long time. And therefore they shouldn't be subject to selection unless there is no other CPUs available. Is that what you mean? > 2. We want to prevent to wakeup a cpu which did not reach the target residency > time (will need some work to unify cpuidle idle time and idle task run time) > => with the target residency and, as a first step, with the idle > => stamp, > we can determine if the cpu slept enough Agreed. However, right now, the scheduler does not have any consideration for that. So this should be done as a separate patch. > 3. 
We want to prevent to wakeup a cpu in deep idle state > => by looking for the cpu in shallowest idle state Obvious. > 4. We want to prevent to wakeup a cpu where the exit latency is longer than > the expected run time of the task (and the time to migrate the task ?) Sure. That would be a case for using task packing even if the policy is set to performance rather than powersave whereas task packing is normally for powersave. Nicolas ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu 2014-04-17 15:53 ` Nicolas Pitre @ 2014-04-17 16:05 ` Daniel Lezcano 2014-04-17 16:21 ` Nicolas Pitre 0 siblings, 1 reply; 47+ messages in thread From: Daniel Lezcano @ 2014-04-17 16:05 UTC (permalink / raw) To: Nicolas Pitre Cc: linux-kernel, mingo, peterz, rjw, linux-pm, alex.shi, vincent.guittot, morten.rasmussen On 04/17/2014 05:53 PM, Nicolas Pitre wrote: > On Thu, 17 Apr 2014, Daniel Lezcano wrote: > >> Ok, refreshed the patchset but before sending it out I would to discuss about >> the rational of the changes and the policy, and change the patchset >> consequently. >> >> What order to choose if the cpu is idle ? >> >> Let's assume all cpus are idle on a dual socket quad core. >> >> Also, we can reasonably do the hypothesis if the cluster is in low power mode, >> the cpus belonging to the same cluster are in the same idle state (putting >> apart the auto-promote where we don't have control on). >> >> If the policy you talk above is 'aggressive power saving', we can follow the >> rules with decreasing priority: >> >> 1. We want to prevent to wakeup the entire cluster >> => as the cpus are in the same idle state, by choosing a cpu in >> => shallow >> state, we should have the guarantee we won't wakeup a cluster (except if no >> shallowest idle cpu are found). > > This is unclear to me. Obviously, if an entire cluster is down, that > means all the CPUs it contains have been idle for a long time. And > therefore they shouldn't be subject to selection unless there is no > other CPUs available. Is that what you mean? Yes, this is what I meant. But also what I meant is we can get rid for the moment of the cpu topology and the coupling idle state because if we do this described approach, as the idle state will be the same for the cpus belonging to the same cluster we won't select a cluster down (except if there is no other CPUs available). >> 2. 
We want to prevent to wakeup a cpu which did not reach the target residency >> time (will need some work to unify cpuidle idle time and idle task run time) >> => with the target residency and, as a first step, with the idle >> => stamp, >> we can determine if the cpu slept enough > > Agreed. However, right now, the scheduler does not have any > consideration for that. So this should be done as a separate patch. Yes, I thought as a very first step we can rely on the idle stamp until we unify the times with a big comment. Or I can first unify the idle times and then take into account the target residency. It is to comply with Rafael's request to have the 'big picture'. >> 3. We want to prevent to wakeup a cpu in deep idle state >> => by looking for the cpu in shallowest idle state > > Obvious. > >> 4. We want to prevent to wakeup a cpu where the exit latency is longer than >> the expected run time of the task (and the time to migrate the task ?) > > Sure. That would be a case for using task packing even if the policy is > set to performance rather than powersave whereas task packing is > normally for powersave. Yes, I agree, task packing improves also the performances and it makes really sense to prevent task migration under some circumstances for a better cache efficiency. Thanks for the comments -- Daniel -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu 2014-04-17 16:05 ` Daniel Lezcano @ 2014-04-17 16:21 ` Nicolas Pitre 2014-04-18 9:38 ` Peter Zijlstra 0 siblings, 1 reply; 47+ messages in thread From: Nicolas Pitre @ 2014-04-17 16:21 UTC (permalink / raw) To: Daniel Lezcano Cc: linux-kernel, mingo, peterz, rjw, linux-pm, alex.shi, vincent.guittot, morten.rasmussen On Thu, 17 Apr 2014, Daniel Lezcano wrote: > On 04/17/2014 05:53 PM, Nicolas Pitre wrote: > > On Thu, 17 Apr 2014, Daniel Lezcano wrote: > > > > > Ok, refreshed the patchset but before sending it out I would to discuss > > > about > > > the rational of the changes and the policy, and change the patchset > > > consequently. > > > > > > What order to choose if the cpu is idle ? > > > > > > Let's assume all cpus are idle on a dual socket quad core. > > > > > > Also, we can reasonably do the hypothesis if the cluster is in low power > > > mode, > > > the cpus belonging to the same cluster are in the same idle state (putting > > > apart the auto-promote where we don't have control on). > > > > > > If the policy you talk above is 'aggressive power saving', we can follow > > > the > > > rules with decreasing priority: > > > > > > 1. We want to prevent to wakeup the entire cluster > > > => as the cpus are in the same idle state, by choosing a cpu in > > > => shallow > > > state, we should have the guarantee we won't wakeup a cluster (except if > > > no > > > shallowest idle cpu are found). > > > > This is unclear to me. Obviously, if an entire cluster is down, that > > means all the CPUs it contains have been idle for a long time. And > > therefore they shouldn't be subject to selection unless there is no > > other CPUs available. Is that what you mean? > > Yes, this is what I meant. 
But also what I meant is we can get rid for the > moment of the cpu topology and the coupling idle state because if we do this > described approach, as the idle state will be the same for the cpus belonging > to the same cluster we won't select a cluster down (except if there is no > other CPUs available). CPU topology is needed to properly describe scheduling domains. Whether we balance across domains or pack using as few domains as possible is a separate issue. In other words, you shouldn't have to care in this patch series. And IMHO coupled C-state is a low-level mechanism that should remain private to cpuidle which the scheduler shouldn't be aware of. > > > 2. We want to prevent to wakeup a cpu which did not reach the target > > > residency > > > time (will need some work to unify cpuidle idle time and idle task run > > > time) > > > => with the target residency and, as a first step, with the idle > > > => stamp, > > > we can determine if the cpu slept enough > > > > Agreed. However, right now, the scheduler does not have any > > consideration for that. So this should be done as a separate patch. > > Yes, I thought as a very first step we can rely on the idle stamp until we > unify the times with a big comment. Or I can first unify the idle times and > then take into account the target residency. It is to comply with Rafael's > request to have the 'big picture'. I agree, but that should be done incrementally. Even without this consideration, what you proposed is already an improvement over the current state of affairs. Nicolas ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu 2014-04-17 16:21 ` Nicolas Pitre @ 2014-04-18 9:38 ` Peter Zijlstra 2014-04-18 12:13 ` Daniel Lezcano 0 siblings, 1 reply; 47+ messages in thread From: Peter Zijlstra @ 2014-04-18 9:38 UTC (permalink / raw) To: Nicolas Pitre Cc: Daniel Lezcano, linux-kernel, mingo, rjw, linux-pm, alex.shi, vincent.guittot, morten.rasmussen On Thu, Apr 17, 2014 at 12:21:28PM -0400, Nicolas Pitre wrote: > CPU topology is needed to properly describe scheduling domains. Whether > we balance across domains or pack using as few domains as possible is a > separate issue. In other words, you shouldn't have to care in this > patch series. > > And IMHO coupled C-state is a low-level mechanism that should remain > private to cpuidle which the scheduler shouldn't be aware of. I'm confused.. why wouldn't you want to expose these? ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu 2014-04-18 9:38 ` Peter Zijlstra @ 2014-04-18 12:13 ` Daniel Lezcano 2014-04-18 12:53 ` Peter Zijlstra 0 siblings, 1 reply; 47+ messages in thread From: Daniel Lezcano @ 2014-04-18 12:13 UTC (permalink / raw) To: Peter Zijlstra, Nicolas Pitre Cc: linux-kernel, mingo, rjw, linux-pm, alex.shi, vincent.guittot, morten.rasmussen On 04/18/2014 11:38 AM, Peter Zijlstra wrote: > On Thu, Apr 17, 2014 at 12:21:28PM -0400, Nicolas Pitre wrote: >> CPU topology is needed to properly describe scheduling domains. Whether >> we balance across domains or pack using as few domains as possible is a >> separate issue. In other words, you shouldn't have to care in this >> patch series. >> >> And IMHO coupled C-state is a low-level mechanism that should remain >> private to cpuidle which the scheduler shouldn't be aware of. > > I'm confused.. why wouldn't you want to expose these? The couple C-state is used as a mechanism for cpuidle to sync the cpus when entering a specific c-state. This mechanism is usually used to handle the cluster power down. It is only used for a two drivers (soon three) but it is not the only mechanism used for syncing the cpus. There are also the MCPM (tc2), the hand made sync when the hardware allows it (ux500), and an abstraction from the firmware (mwait), transparent to the kernel. Taking into account the couple c-state only does not make sense because of the other mechanisms above. This is why it should stay inside the cpuidle framework. The extension of the cpu topology will provide a generic way to describe and abstracting such dependencies. Does it answer your question ? -- Daniel -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu 2014-04-18 12:13 ` Daniel Lezcano @ 2014-04-18 12:53 ` Peter Zijlstra 2014-04-18 13:04 ` Daniel Lezcano 0 siblings, 1 reply; 47+ messages in thread From: Peter Zijlstra @ 2014-04-18 12:53 UTC (permalink / raw) To: Daniel Lezcano Cc: Nicolas Pitre, linux-kernel, mingo, rjw, linux-pm, alex.shi, vincent.guittot, morten.rasmussen On Fri, Apr 18, 2014 at 02:13:48PM +0200, Daniel Lezcano wrote: > On 04/18/2014 11:38 AM, Peter Zijlstra wrote: > >On Thu, Apr 17, 2014 at 12:21:28PM -0400, Nicolas Pitre wrote: > >>CPU topology is needed to properly describe scheduling domains. Whether > >>we balance across domains or pack using as few domains as possible is a > >>separate issue. In other words, you shouldn't have to care in this > >>patch series. > >> > >>And IMHO coupled C-state is a low-level mechanism that should remain > >>private to cpuidle which the scheduler shouldn't be aware of. > > > >I'm confused.. why wouldn't you want to expose these? > > The couple C-state is used as a mechanism for cpuidle to sync the cpus when > entering a specific c-state. This mechanism is usually used to handle the > cluster power down. It is only used for a two drivers (soon three) but it is > not the only mechanism used for syncing the cpus. There are also the MCPM > (tc2), the hand made sync when the hardware allows it (ux500), and an > abstraction from the firmware (mwait), transparent to the kernel. > > Taking into account the couple c-state only does not make sense because of > the other mechanisms above. This is why it should stay inside the cpuidle > framework. > > The extension of the cpu topology will provide a generic way to describe and > abstracting such dependencies. > > Does it answer your question ? I suppose so; its still a bit like we won't but we will :-) So we _will_ actually expose coupled C states through the topology bits, that's good. 
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu 2014-04-18 12:53 ` Peter Zijlstra @ 2014-04-18 13:04 ` Daniel Lezcano 2014-04-18 16:00 ` Nicolas Pitre 0 siblings, 1 reply; 47+ messages in thread From: Daniel Lezcano @ 2014-04-18 13:04 UTC (permalink / raw) To: Peter Zijlstra Cc: Nicolas Pitre, linux-kernel, mingo, rjw, linux-pm, alex.shi, vincent.guittot, morten.rasmussen On 04/18/2014 02:53 PM, Peter Zijlstra wrote: > On Fri, Apr 18, 2014 at 02:13:48PM +0200, Daniel Lezcano wrote: >> On 04/18/2014 11:38 AM, Peter Zijlstra wrote: >>> On Thu, Apr 17, 2014 at 12:21:28PM -0400, Nicolas Pitre wrote: >>>> CPU topology is needed to properly describe scheduling domains. Whether >>>> we balance across domains or pack using as few domains as possible is a >>>> separate issue. In other words, you shouldn't have to care in this >>>> patch series. >>>> >>>> And IMHO coupled C-state is a low-level mechanism that should remain >>>> private to cpuidle which the scheduler shouldn't be aware of. >>> >>> I'm confused.. why wouldn't you want to expose these? >> >> The couple C-state is used as a mechanism for cpuidle to sync the cpus when >> entering a specific c-state. This mechanism is usually used to handle the >> cluster power down. It is only used for a two drivers (soon three) but it is >> not the only mechanism used for syncing the cpus. There are also the MCPM >> (tc2), the hand made sync when the hardware allows it (ux500), and an >> abstraction from the firmware (mwait), transparent to the kernel. >> >> Taking into account the couple c-state only does not make sense because of >> the other mechanisms above. This is why it should stay inside the cpuidle >> framework. >> >> The extension of the cpu topology will provide a generic way to describe and >> abstracting such dependencies. >> >> Does it answer your question ? 
> > I suppose so; its still a bit like we won't but we will :-) > > So we _will_ actually expose coupled C states through the topology bits, > that's good. Ah, ok. I think I understood where the confusion is coming from. A couple of definitions for the same thing :) 1. Coupled C-states : *mechanism* implemented in the cpuidle framework: drivers/cpuidle/coupled.c 2. Coupled C-states : *constraint* to reach a cluster power down state, will be described through the topology and could be implemented by different mechanism (MCPM, handmade sync, cpuidle-coupled-c-state, firmware). We want to expose 2. not 1. to the scheduler. -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu 2014-04-18 13:04 ` Daniel Lezcano @ 2014-04-18 16:00 ` Nicolas Pitre 0 siblings, 0 replies; 47+ messages in thread From: Nicolas Pitre @ 2014-04-18 16:00 UTC (permalink / raw) To: Daniel Lezcano Cc: Peter Zijlstra, linux-kernel, mingo, rjw, linux-pm, alex.shi, vincent.guittot, morten.rasmussen On Fri, 18 Apr 2014, Daniel Lezcano wrote: > On 04/18/2014 02:53 PM, Peter Zijlstra wrote: > > I suppose so; its still a bit like we won't but we will :-) > > > > So we _will_ actually expose coupled C states through the topology bits, > > that's good. > > Ah, ok. I think I understood where the confusion is coming from. > > A couple of definitions for the same thing :) > > 1. Coupled C-states : *mechanism* implemented in the cpuidle framework: > drivers/cpuidle/coupled.c > > 2. Coupled C-states : *constraint* to reach a cluster power down state, will > be described through the topology and could be implemented by different > mechanism (MCPM, handmade sync, cpuidle-coupled-c-state, firmware). > > We want to expose 2. not 1. to the scheduler. I couldn't explain it better. Sorry for creating confusion. Nicolas ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu 2014-03-28 12:29 ` [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu Daniel Lezcano 2014-04-02 3:05 ` Nicolas Pitre @ 2014-04-15 13:03 ` Peter Zijlstra 1 sibling, 0 replies; 47+ messages in thread From: Peter Zijlstra @ 2014-04-15 13:03 UTC (permalink / raw) To: Daniel Lezcano Cc: linux-kernel, mingo, rjw, nicolas.pitre, linux-pm, alex.shi, vincent.guittot, morten.rasmussen

On Fri, Mar 28, 2014 at 01:29:56PM +0100, Daniel Lezcano wrote:
> @@ -4336,20 +4337,53 @@ static int
> find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
>  {
>  	unsigned long load, min_load = ULONG_MAX;
> -	int idlest = -1;
> +	unsigned int min_exit_latency = UINT_MAX;
> +	u64 idle_stamp, min_idle_stamp = ULONG_MAX;
> +
> +	struct rq *rq;
> +	struct cpuidle_power *power;
> +
> +	int cpu_idle = -1;
> +	int cpu_busy = -1;
>  	int i;
>  
>  	/* Traverse only the allowed CPUs */
>  	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
> -		load = weighted_cpuload(i);
>  
> -		if (load < min_load || (load == min_load && i == this_cpu)) {
> -			min_load = load;
> -			idlest = i;
> +		if (idle_cpu(i)) {
> +
> +			rq = cpu_rq(i);
> +			power = rq->power;
> +			idle_stamp = rq->idle_stamp;
> +
> +			/* The cpu is idle since a shorter time */
> +			if (idle_stamp < min_idle_stamp) {
> +				min_idle_stamp = idle_stamp;
> +				cpu_idle = i;
> +				continue;
> +			}
> +
> +			/* The cpu is idle but the exit_latency is shorter */
> +			if (power && power->exit_latency < min_exit_latency) {
> +				min_exit_latency = power->exit_latency;
> +				cpu_idle = i;
> +				continue;
> +			}

Aside from the arguments made by Nico (which I agree with), depending on
the life time rules of the power object we might need
smp_read_barrier_depends() between reading and using. If all these
objects are static and never change content we do not, if there's
dynamic objects involved we probably should.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info 2014-03-28 12:29 [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Daniel Lezcano ` (2 preceding siblings ...) 2014-03-28 12:29 ` [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu Daniel Lezcano @ 2014-03-31 13:52 ` Vincent Guittot 2014-03-31 15:55 ` Daniel Lezcano 2014-04-01 23:01 ` Rafael J. Wysocki 2014-04-04 6:29 ` Len Brown 5 siblings, 1 reply; 47+ messages in thread From: Vincent Guittot @ 2014-03-31 13:52 UTC (permalink / raw) To: Daniel Lezcano Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, rjw, Nicolas Pitre, linux-pm, Alex Shi, Morten Rasmussen

On 28 March 2014 13:29, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> The following patchset provides an interaction between cpuidle and the scheduler.
>
> The first patch encapsulate the needed information for the scheduler in a
> separate cpuidle structure. The second one stores the pointer to this structure
> when entering idle. The third one, use this information to take the decision to
> find the idlest cpu.
>
> After some basic testing with hackbench, it appears there is an improvement for
> the performances (small) and for the duration of the idle states (which provides
> a better power saving).
>
> The measurement has been done with the 'idlestat' tool previously posted in this
> mailing list.
>
> So the benefit is good for both sides performance and power saving.

Hi Daniel,

I have looked at your results and I'm a bit surprised that you have so
much time in C-state with a test that involved 400 tasks on a dual-core
HT system. You shouldn't have any CPUs in idle state when running
hackbench; the total time of core0@state in C7-IVB is 87932131.00 (us),
which is quite huge for a bench that runs 44 sec. Or am I doing
something wrong in the interpretation of the results?

Regards,
Vincent

>
> The select_idle_sibling could be also improved in the same way.
>
> ====================== test with hackbench 3.14-rc8 =========================
>
> /usr/bin/hackbench -l 10000 -s 4096
>
> Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
> Each sender will pass 10000 messages of 4096 bytes
>
> Time: 44.433
>
> Total trace buffer: 1846688 kB
> clusterA@state  hits  total(us)  avg(us)  min(us)  max(us)
> POLL     0     0.00         0.00       0.00   0.00
> C1-IVB   0     0.00         0.00       0.00   0.00
> C1E-VB   0     0.00         0.00       0.00   0.00
> C3-IVB   0     0.00         0.00       0.00   0.00
> C6-IVB   0     0.00         0.00       0.00   0.00
> C7-IVB   0     0.00         0.00       0.00   0.00
> core0@state  hits  total(us)  avg(us)  min(us)  max(us)
> POLL     0     0.00         0.00       0.00   0.00
> C1-IVB   0     0.00         0.00       0.00   0.00
> C1E-IVB  0     0.00         0.00       0.00   0.00
> C3-IVB   0     0.00         0.00       0.00   0.00
> C6-IVB   0     0.00         0.00       0.00   0.00
> C7-IVB   1396  87932131.00  62988.63   0.00   320146.00
> cpu0@state  hits  total(us)  avg(us)  min(us)  max(us)
> POLL     0     0.00         0.00       0.00   0.00
> C1-IVB   1     14.00        14.00      14.00  14.00
> C1E-VB   0     0.00         0.00       0.00   0.00
> C3-IVB   1     262.00       262.00     262.00 262.00
> C6-IVB   0     0.00         0.00       0.00   0.00
> C7-IVB   1180  87938177.00  74523.88   1.00   320147.00
> 1701     0     0.00         0.00       0.00   0.00
> 1700     0     0.00         0.00       0.00   0.00
> 1600     0     0.00         0.00       0.00   0.00
> 1500     0     0.00         0.00       0.00   0.00
> 1400     0     0.00         0.00       0.00   0.00
> 1300     0     0.00         0.00       0.00   0.00
> 1200     0     0.00         0.00       0.00   0.00
> 1100     0     0.00         0.00       0.00   0.00
> 1000     0     0.00         0.00       0.00   0.00
> 900      0     0.00         0.00       0.00   0.00
> 800      0     0.00         0.00       0.00   0.00
> 782      0     0.00         0.00       0.00   0.00
> cpu0 wakeups  name  count
> irq009  acpi  1
> cpu1@state  hits  total(us)  avg(us)  min(us)  max(us)
> POLL     0     0.00         0.00       0.00   0.00
> C1-IVB   0     0.00         0.00       0.00   0.00
> C1E-VB   0     0.00         0.00       0.00   0.00
> C3-IVB   0     0.00         0.00       0.00   0.00
> C6-IVB   0     0.00         0.00       0.00   0.00
> C7-IVB   475   87941356.00  185139.70  322.00 1500690.00
> 1701     0     0.00         0.00       0.00   0.00
> 1700     0     0.00         0.00       0.00   0.00
> 1600     0     0.00         0.00       0.00   0.00
> 1500     0     0.00         0.00       0.00   0.00
> 1400     0     0.00         0.00       0.00   0.00
> 1300     0     0.00         0.00       0.00   0.00
> 1200     0     0.00         0.00       0.00   0.00
> 1100     0     0.00         0.00       0.00   0.00
> 1000     0     0.00         0.00       0.00   0.00
> 900      0     0.00         0.00       0.00   0.00
> 800      0     0.00         0.00       0.00   0.00
> 782      0     0.00         0.00       0.00   0.00
> cpu1 wakeups  name  count
> irq009  acpi  3
> core1@state  hits  total(us)  avg(us)  min(us)  max(us)
> POLL     0     0.00         0.00       0.00   0.00
> C1-IVB   0     0.00         0.00       0.00   0.00
> C1E-IVB  0     0.00         0.00       0.00   0.00
> C3-IVB   0     0.00         0.00       0.00   0.00
> C6-IVB   0     0.00         0.00       0.00   0.00
> C7-IVB   0     0.00         0.00       0.00   0.00
> cpu2@state  hits  total(us)  avg(us)  min(us)  max(us)
> POLL     0     0.00         0.00       0.00   0.00
> C1-IVB   11    288157.00    26196.09   16.00  200060.00
> C1E-VB   6     221601.00    36933.50   79.00  200066.00
> C3-IVB   0     0.00         0.00       0.00   0.00
> C6-IVB   0     0.00         0.00       0.00   0.00
> C7-IVB   950   87417466.00  92018.39   19.00  200074.00
> 1701     0     0.00         0.00       0.00   0.00
> 1700     0     0.00         0.00       0.00   0.00
> 1600     0     0.00         0.00       0.00   0.00
> 1500     2     34.00        17.00      11.00  23.00
> 1400     0     0.00         0.00       0.00   0.00
> 1300     0     0.00         0.00       0.00   0.00
> 1200     0     0.00         0.00       0.00   0.00
> 1100     0     0.00         0.00       0.00   0.00
> 1000     0     0.00         0.00       0.00   0.00
> 900      0     0.00         0.00       0.00   0.00
> 800      0     0.00         0.00       0.00   0.00
> 782      745   18800.00     25.23      2.00   156.00
> cpu2 wakeups  name  count
> irq019  ahci  50
> irq009  acpi  17
> cpu3@state  hits  total(us)  avg(us)  min(us)  max(us)
> POLL     0     0.00         0.00       0.00   0.00
> C1-IVB   0     0.00         0.00       0.00   0.00
> C1E-VB   0     0.00         0.00       0.00   0.00
> C3-IVB   0     0.00         0.00       0.00   0.00
> C6-IVB   0     0.00         0.00       0.00   0.00
> C7-IVB   0     0.00         0.00       0.00   0.00
> 1701     0     0.00         0.00       0.00   0.00
> 1700     0     0.00         0.00       0.00   0.00
> 1600     0     0.00         0.00       0.00   0.00
> 1500     0     0.00         0.00       0.00   0.00
> 1400     0     0.00         0.00       0.00   0.00
> 1300     0     0.00         0.00       0.00   0.00
> 1200     0     0.00         0.00       0.00   0.00
> 1100     0     0.00         0.00       0.00   0.00
> 1000     0     0.00         0.00       0.00   0.00
> 900      0     0.00         0.00       0.00   0.00
> 800      0     0.00         0.00       0.00   0.00
> 782      0     0.00         0.00       0.00   0.00
> cpu3 wakeups  name  count
>
> ================ test with hackbench 3.14-rc8 + patchset ====================
>
> /usr/bin/hackbench -l 10000 -s 4096
>
> Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
> Each sender will pass 10000 messages of 4096 bytes
>
> Time: 42.179
>
> Total trace buffer: 1846688 kB
> clusterA@state  hits  total(us)  avg(us)  min(us)  max(us)
> POLL     0     0.00         0.00       0.00   0.00
> C1-IVB   0     0.00         0.00       0.00   0.00
> C1E-VB   0     0.00         0.00       0.00   0.00
> C3-IVB   0     0.00         0.00       0.00   0.00
> C6-IVB   0     0.00         0.00       0.00   0.00
> C7-IVB   0     0.00         0.00       0.00   0.00
> core0@state  hits  total(us)  avg(us)  min(us)  max(us)
> POLL     0     0.00         0.00       0.00   0.00
> C1-IVB   0     0.00         0.00       0.00   0.00
> C1E-IVB  0     0.00         0.00       0.00   0.00
> C3-IVB   0     0.00         0.00       0.00   0.00
> C6-IVB   0     0.00         0.00       0.00   0.00
> C7-IVB   880   89157590.00  101315.44  0.00   400184.00
> cpu0@state  hits  total(us)  avg(us)  min(us)  max(us)
> POLL     0     0.00         0.00       0.00   0.00
> C1-IVB   0     0.00         0.00       0.00   0.00
> C1E-VB   1     233.00       233.00     233.00 233.00
> C3-IVB   1     260.00       260.00     260.00 260.00
> C6-IVB   0     0.00         0.00       0.00   0.00
> C7-IVB   700   89162006.00  127374.29  182.00 400187.00
> 1701     0     0.00         0.00       0.00   0.00
> 1700     0     0.00         0.00       0.00   0.00
> 1600     0     0.00         0.00       0.00   0.00
> 1500     0     0.00         0.00       0.00   0.00
> 1400     0     0.00         0.00       0.00   0.00
> 1300     0     0.00         0.00       0.00   0.00
> 1200     0     0.00         0.00       0.00   0.00
> 1100     0     0.00         0.00       0.00   0.00
> 1000     0     0.00         0.00       0.00   0.00
> 900      0     0.00         0.00       0.00   0.00
> 800      0     0.00         0.00       0.00   0.00
> 782      0     0.00         0.00       0.00   0.00
> cpu0 wakeups  name  count
> irq009  acpi  2
> cpu1@state  hits  total(us)  avg(us)  min(us)  max(us)
> POLL     0     0.00         0.00       0.00   0.00
> C1-IVB   0     0.00         0.00       0.00   0.00
> C1E-VB   0     0.00         0.00       0.00   0.00
> C3-IVB   0     0.00         0.00       0.00   0.00
> C6-IVB   0     0.00         0.00       0.00   0.00
> C7-IVB   334   89164805.00  266960.49  1.00   1500677.00
> 1701     0     0.00         0.00       0.00   0.00
> 1700     0     0.00         0.00       0.00   0.00
> 1600     0     0.00         0.00       0.00   0.00
> 1500     0     0.00         0.00       0.00   0.00
> 1400     0     0.00         0.00       0.00   0.00
> 1300     0     0.00         0.00       0.00   0.00
> 1200     0     0.00         0.00       0.00   0.00
> 1100     0     0.00         0.00       0.00   0.00
> 1000     0     0.00         0.00       0.00   0.00
> 900      0     0.00         0.00       0.00   0.00
> 800      0     0.00         0.00       0.00   0.00
> 782      0     0.00         0.00       0.00   0.00
> cpu1 wakeups  name  count
> irq009  acpi  6
> core1@state  hits  total(us)  avg(us)  min(us)  max(us)
> POLL     0     0.00         0.00       0.00   0.00
> C1-IVB   0     0.00         0.00       0.00   0.00
> C1E-IVB  0     0.00         0.00       0.00   0.00
> C3-IVB   0     0.00         0.00       0.00   0.00
> C6-IVB   0     0.00         0.00       0.00   0.00
> C7-IVB   0     0.00         0.00       0.00   0.00
> cpu2@state  hits  total(us)  avg(us)  min(us)  max(us)
> POLL     0     0.00         0.00       0.00   0.00
> C1-IVB   19    2169047.00   114160.37  18.00  999129.00
> C1E-IB   0     0.00         0.00       0.00   0.00
> C3-IVB   0     0.00         0.00       0.00   0.00
> C6-IVB   0     0.00         0.00       0.00   0.00
> C7-IVB   376   86993307.00  231365.18  20.00  1500682.00
> 1701     0     0.00         0.00       0.00   0.00
> 1700     0     0.00         0.00       0.00   0.00
> 1600     0     0.00         0.00       0.00   0.00
> 1500     0     0.00         0.00       0.00   0.00
> 1400     0     0.00         0.00       0.00   0.00
> 1300     0     0.00         0.00       0.00   0.00
> 1200     0     0.00         0.00       0.00   0.00
> 1100     0     0.00         0.00       0.00   0.00
> 1000     0     0.00         0.00       0.00   0.00
> 900      0     0.00         0.00       0.00   0.00
> 800      0     0.00         0.00       0.00   0.00
> 782      0     0.00         0.00       0.00   0.00
> cpu2 wakeups  name  count
> irq009  acpi  32
> irq019  ahci  45
> cpu3@state  hits  total(us)  avg(us)  min(us)  max(us)
> POLL     0     0.00         0.00       0.00   0.00
> C1-IVB   0     0.00         0.00       0.00   0.00
> C1E-VB   0     0.00         0.00       0.00   0.00
> C3-IVB   0     0.00         0.00       0.00   0.00
> C6-IVB   0     0.00         0.00       0.00   0.00
> C7-IVB   0     0.00         0.00       0.00   0.00
> 1701     0     0.00         0.00       0.00   0.00
> 1700     0     0.00         0.00       0.00   0.00
> 1600     0     0.00         0.00       0.00   0.00
> 1500     0     0.00         0.00       0.00   0.00
> 1400     0     0.00         0.00       0.00   0.00
> 1300     0     0.00         0.00       0.00   0.00
> 1200     0     0.00         0.00       0.00   0.00
> 1100     0     0.00         0.00       0.00   0.00
> 1000     0     0.00         0.00       0.00   0.00
> 900      0     0.00         0.00       0.00   0.00
> 800      0     0.00         0.00       0.00   0.00
> 782      0     0.00         0.00       0.00   0.00
> cpu3 wakeups  name  count
>
>
> Daniel Lezcano (3):
>   cpuidle: encapsulate power info in a separate structure
>   idle: store the idle state the cpu is
>   sched/fair: use the idle state info to choose the idlest cpu
>
>  arch/arm/include/asm/cpuidle.h       | 6 +-
>  arch/arm/mach-exynos/cpuidle.c       | 4 +-
>  drivers/acpi/processor_idle.c        | 4 +-
>  drivers/base/power/domain.c          | 6 +-
>  drivers/cpuidle/cpuidle-at91.c       | 4 +-
>  drivers/cpuidle/cpuidle-big_little.c | 9 +--
>  drivers/cpuidle/cpuidle-calxeda.c    | 6 +-
>  drivers/cpuidle/cpuidle-kirkwood.c   | 4 +-
>  drivers/cpuidle/cpuidle-powernv.c    | 8 +--
>  drivers/cpuidle/cpuidle-pseries.c    | 12 ++--
>  drivers/cpuidle/cpuidle-ux500.c      | 14 ++---
>  drivers/cpuidle/cpuidle-zynq.c       | 4
+- > drivers/cpuidle/driver.c | 6 +- > drivers/cpuidle/governors/ladder.c | 14 +++-- > drivers/cpuidle/governors/menu.c | 8 +-- > drivers/cpuidle/sysfs.c | 2 +- > drivers/idle/intel_idle.c | 112 +++++++++++++++++----------------- > include/linux/cpuidle.h | 10 ++- > kernel/sched/fair.c | 46 ++++++++++++-- > kernel/sched/idle.c | 17 +++++- > kernel/sched/sched.h | 5 ++ > 21 files changed, 180 insertions(+), 121 deletions(-) > > -- > 1.7.9.5 > ^ permalink raw reply [flat|nested] 47+ messages in thread
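The heuristic of patch 3 — when several candidate CPUs are idle, prefer the one whose current C-state is cheapest to leave — can be sketched in plain C. This is an illustration with invented types (`struct cpu_info` and its fields are made up here), not the actual kernel/sched/fair.c change; the real code reads the wake-up cost from the cpuidle state the CPU entered:

```c
#include <assert.h>

/* Invented per-CPU summary, for illustration only. */
struct cpu_info {
	int idle;                     /* 1 if the CPU sits in a C-state */
	unsigned int exit_latency_us; /* wake-up cost of that C-state */
	unsigned long load;           /* runqueue load if busy */
};

/*
 * Pick the "idlest" CPU: an idle CPU always beats a busy one; among
 * idle CPUs the shallowest C-state (smallest exit latency) wins, which
 * both wakes up faster and leaves deep sleepers undisturbed; among
 * busy CPUs the least loaded wins.
 */
int find_idlest_cpu(const struct cpu_info *cpus, int n)
{
	int best = 0;

	for (int i = 1; i < n; i++) {
		const struct cpu_info *a = &cpus[i], *b = &cpus[best];

		if (a->idle != b->idle) {
			if (a->idle)
				best = i;
		} else if (a->idle) {
			if (a->exit_latency_us < b->exit_latency_us)
				best = i;
		} else {
			if (a->load < b->load)
				best = i;
		}
	}
	return best;
}
```

Leaving the deep sleepers alone is what would account for the longer C-state residencies reported above: wakeups get funneled to CPUs that are cheap to wake anyway.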
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-03-31 13:52 ` [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Vincent Guittot
@ 2014-03-31 15:55   ` Daniel Lezcano
  2014-04-01  7:16     ` Vincent Guittot
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-03-31 15:55 UTC (permalink / raw)
To: Vincent Guittot
Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, rjw, Nicolas Pitre,
	linux-pm, Alex Shi, Morten Rasmussen

On 03/31/2014 03:52 PM, Vincent Guittot wrote:
> On 28 March 2014 13:29, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>> The following patchset provides an interaction between cpuidle and the scheduler.
>>
>> The first patch encapsulate the needed information for the scheduler in a
>> separate cpuidle structure. The second one stores the pointer to this structure
>> when entering idle. The third one, use this information to take the decision to
>> find the idlest cpu.
>>
>> After some basic testing with hackbench, it appears there is an improvement for
>> the performances (small) and for the duration of the idle states (which provides
>> a better power saving).
>>
>> The measurement has been done with the 'idlestat' tool previously posted in this
>> mailing list.
>>
>> So the benefit is good for both sides performance and power saving.
>
> Hi Daniel,
>
> I have looked at your results and i'm a bit surprised that you have so
> much time in C-state with a test that involved 400 tasks on a dual
> cores HT system. You shouldn't have any CPUs in idle state when
> running hackbench; the total time of core0@state in C7-IVB is
> 87932131.00(us), which is quite huge for a bench that runs 44sec. Or
> i'm doing something wrong in the interpretation of the results ?

No, actually I mixed up the hackbench outputs from runs with and
without idlestat. The hackbench results below are the ones without
idlestat.

The idlestat results are consistent, but the tracing adds a
non-negligible overhead, which impacts the hackbench results.

So to summarize, hackbench has been run 4 times:

1, 2 : without idlestat, with and without the patchset - hackbench
       results ~42 secs

3, 4 : with idlestat, with and without the patchset - hackbench
       results ~87 secs

At first glance, the results are consistent, but I will double-check
them.

Do you have a suggestion for a benchmarking program ?

Thanks !

  -- Daniel

>> The select_idle_sibling could be also improved in the same way.
>>
>> [idlestat figures and diffstat trimmed]

--
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply	[flat|nested] 47+ messages in thread
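For reference when reading the idlestat tables in this thread, the avg(us) column is simply total(us) / hits, and the ~87.9e6 us C7 totals line up with a trace window of roughly 87 seconds — i.e. the idlestat-instrumented runs, not the 44-second bare run, which is exactly the mix-up clarified above. A throwaway helper (not part of idlestat) to sanity-check the quoted rows:

```c
#include <assert.h>
#include <math.h>

/* idlestat derives the avg(us) column as total(us) / hits. */
double cstate_avg_us(double total_us, unsigned long hits)
{
	/* a state that was never entered has no average residency */
	return hits ? total_us / (double)hits : 0.0;
}
```

For example, core0's C7-IVB row (1396 hits, 87932131.00 us total) gives 62988.63 us, matching the reported average.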
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-03-31 15:55   ` Daniel Lezcano
@ 2014-04-01  7:16     ` Vincent Guittot
  2014-04-01  7:43       ` Daniel Lezcano
  0 siblings, 1 reply; 47+ messages in thread
From: Vincent Guittot @ 2014-04-01 7:16 UTC (permalink / raw)
To: Daniel Lezcano
Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, rjw, Nicolas Pitre,
	linux-pm, Alex Shi, Morten Rasmussen

On 31 March 2014 17:55, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> On 03/31/2014 03:52 PM, Vincent Guittot wrote:
>>
>> On 28 March 2014 13:29, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>>>
>>> The following patchset provides an interaction between cpuidle and the
>>> scheduler.
>>>
>>> The first patch encapsulate the needed information for the scheduler in a
>>> separate cpuidle structure. The second one stores the pointer to this
>>> structure when entering idle. The third one, use this information to take
>>> the decision to find the idlest cpu.
>>>
>>> After some basic testing with hackbench, it appears there is an
>>> improvement for the performances (small) and for the duration of the
>>> idle states (which provides a better power saving).
>>>
>>> The measurement has been done with the 'idlestat' tool previously posted
>>> in this mailing list.
>>>
>>> So the benefit is good for both sides performance and power saving.
>>
>> Hi Daniel,
>>
>> I have looked at your results and i'm a bit surprised that you have so
>> much time in C-state with a test that involved 400 tasks on a dual
>> cores HT system. You shouldn't have any CPUs in idle state when
>> running hackbench; the total time of core0@state in C7-IVB is
>> 87932131.00(us), which is quite huge for a bench that runs 44sec. Or
>> i'm doing something wrong in the interpretation of the results ?
>
> No, actually I mixed the output of hackbench without being run with
> idlestat or with idlestat.
>
> The hackbench's results below are without idlestat.
>
> The idlestat results are consistent and effectively it adds a non
> negligeable overhead as it impacts the hackbench results.
>
> So to summarize, hackbench has been run 4 times.
>
> 1, 2 : without idlestat, with and without the patchset - hackbench results
> ~42 secs
>
> 3, 4 : with idlestat, with and without the patchset - hackbench results
> ~87 secs
>
> At the first the glance, the results are consistent but I will double check
> them.
>
> Do you have a suggestion for a benchmarking program ?

We are working on a bench that can generate a middle-load pattern with
idle CPUs, but it's not available yet. In the meantime, one bench that
plays with idle time is cyclictest; it will not give you performance
results, only scheduling latency, which might be what you are looking
for.

Vincent

> Thanks !
>
> -- Daniel
>
>>> The select_idle_sibling could be also improved in the same way.
>>>
>>> [idlestat figures and diffstat trimmed]
>
> --
>  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>
> Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-01  7:16     ` Vincent Guittot
@ 2014-04-01  7:43       ` Daniel Lezcano
  2014-04-01  9:05         ` Vincent Guittot
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-01 7:43 UTC (permalink / raw)
To: Vincent Guittot
Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, rjw, Nicolas Pitre,
	linux-pm, Alex Shi, Morten Rasmussen

On 04/01/2014 09:16 AM, Vincent Guittot wrote:
> On 31 March 2014 17:55, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>> On 03/31/2014 03:52 PM, Vincent Guittot wrote:
>>>
>>> On 28 March 2014 13:29, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>>>>
>>>> The following patchset provides an interaction between cpuidle and the
>>>> scheduler.
>>>>
>>>> The first patch encapsulate the needed information for the scheduler in a
>>>> separate cpuidle structure. The second one stores the pointer to this
>>>> structure when entering idle. The third one, use this information to take
>>>> the decision to find the idlest cpu.
>>>>
>>>> After some basic testing with hackbench, it appears there is an
>>>> improvement for the performances (small) and for the duration of the
>>>> idle states (which provides a better power saving).
>>>>
>>>> The measurement has been done with the 'idlestat' tool previously posted
>>>> in this mailing list.
>>>>
>>>> So the benefit is good for both sides performance and power saving.
>>>
>>> Hi Daniel,
>>>
>>> I have looked at your results and i'm a bit surprised that you have so
>>> much time in C-state with a test that involved 400 tasks on a dual
>>> cores HT system. You shouldn't have any CPUs in idle state when
>>> running hackbench; the total time of core0@state in C7-IVB is
>>> 87932131.00(us), which is quite huge for a bench that runs 44sec. Or
>>> i'm doing something wrong in the interpretation of the results ?
>>
>> No, actually I mixed the output of hackbench without being run with
>> idlestat or with idlestat.
>>
>> The hackbench's results below are without idlestat.
>>
>> The idlestat results are consistent and effectively it adds a non
>> negligeable overhead as it impacts the hackbench results.
>>
>> So to summarize, hackbench has been run 4 times.
>>
>> 1, 2 : without idlestat, with and without the patchset - hackbench results
>> ~42 secs
>>
>> 3, 4 : with idlestat, with and without the patchset - hackbench results
>> ~87 secs
>>
>> At the first the glance, the results are consistent but I will double check
>> them.
>>
>> Do you have a suggestion for a benchmarking program ?
>
> We are working on a bench which can generate middle load pattern with
> idle CPUs but it's not available yet. In the mean time, one bench that
> plays with idle time is cyclictest, it will not give you performance
> results but only scheduling latency which might be what you are
> looking for.

Yeah, thanks. I believe I know what is in the rt-tests package :)

What I meant is: what kind of values would you like to see with this
patchset ?

>>>> The select_idle_sibling could be also improved in the same way.
>>>>
>>>> [idlestat figures trimmed]
0.00 0.00 0.00 0.00 >>>> cpu3 wakeups name count >>>> >>>> ================ test with hackbench 3.14-rc8 + patchset >>>> ==================== >>>> >>>> /usr/bin/hackbench -l 10000 -s 4096 >>>> >>>> Running in process mode with 10 groups using 40 file descriptors each (== >>>> 400 tasks) >>>> Each sender will pass 10000 messages of 4096 bytes >>>> >>>> Time: 42.179 >>>> >>>> Total trace buffer: 1846688 kB >>>> clusterA@state hits total(us) avg(us) min(us) max(us) >>>> POLL 0 0.00 0.00 0.00 0.00 >>>> C1-IVB 0 0.00 0.00 0.00 0.00 >>>> C1E-VB 0 0.00 0.00 0.00 0.00 >>>> C3-IVB 0 0.00 0.00 0.00 0.00 >>>> C6-IVB 0 0.00 0.00 0.00 0.00 >>>> C7-IVB 0 0.00 0.00 0.00 0.00 >>>> core0@state hits total(us) avg(us) min(us) max(us) >>>> POLL 0 0.00 0.00 0.00 0.00 >>>> C1-IVB 0 0.00 0.00 0.00 0.00 >>>> C1E-IVB 0 0.00 0.00 0.00 0.00 >>>> C3-IVB 0 0.00 0.00 0.00 0.00 >>>> C6-IVB 0 0.00 0.00 0.00 0.00 >>>> C7-IVB 880 89157590.00 101315.44 0.00 >>>> 400184.00 >>>> cpu0@state hits total(us) avg(us) min(us) max(us) >>>> POLL 0 0.00 0.00 0.00 0.00 >>>> C1-IVB 0 0.00 0.00 0.00 0.00 >>>> C1E-VB 1 233.00 233.00 233.00 233.00 >>>> C3-IVB 1 260.00 260.00 260.00 260.00 >>>> C6-IVB 0 0.00 0.00 0.00 0.00 >>>> C7-IVB 700 89162006.00 127374.29 182.00 >>>> 400187.00 >>>> 1701 0 0.00 0.00 0.00 0.00 >>>> 1700 0 0.00 0.00 0.00 0.00 >>>> 1600 0 0.00 0.00 0.00 0.00 >>>> 1500 0 0.00 0.00 0.00 0.00 >>>> 1400 0 0.00 0.00 0.00 0.00 >>>> 1300 0 0.00 0.00 0.00 0.00 >>>> 1200 0 0.00 0.00 0.00 0.00 >>>> 1100 0 0.00 0.00 0.00 0.00 >>>> 1000 0 0.00 0.00 0.00 0.00 >>>> 900 0 0.00 0.00 0.00 0.00 >>>> 800 0 0.00 0.00 0.00 0.00 >>>> 782 0 0.00 0.00 0.00 0.00 >>>> cpu0 wakeups name count >>>> irq009 acpi 2 >>>> cpu1@state hits total(us) avg(us) min(us) max(us) >>>> POLL 0 0.00 0.00 0.00 0.00 >>>> C1-IVB 0 0.00 0.00 0.00 0.00 >>>> C1E-VB 0 0.00 0.00 0.00 0.00 >>>> C3-IVB 0 0.00 0.00 0.00 0.00 >>>> C6-IVB 0 0.00 0.00 0.00 0.00 >>>> C7-IVB 334 89164805.00 266960.49 1.00 >>>> 1500677.00 >>>> 1701 0 0.00 0.00 0.00 0.00 >>>> 
1700 0 0.00 0.00 0.00 0.00 >>>> 1600 0 0.00 0.00 0.00 0.00 >>>> 1500 0 0.00 0.00 0.00 0.00 >>>> 1400 0 0.00 0.00 0.00 0.00 >>>> 1300 0 0.00 0.00 0.00 0.00 >>>> 1200 0 0.00 0.00 0.00 0.00 >>>> 1100 0 0.00 0.00 0.00 0.00 >>>> 1000 0 0.00 0.00 0.00 0.00 >>>> 900 0 0.00 0.00 0.00 0.00 >>>> 800 0 0.00 0.00 0.00 0.00 >>>> 782 0 0.00 0.00 0.00 0.00 >>>> cpu1 wakeups name count >>>> irq009 acpi 6 >>>> core1@state hits total(us) avg(us) min(us) max(us) >>>> POLL 0 0.00 0.00 0.00 0.00 >>>> C1-IVB 0 0.00 0.00 0.00 0.00 >>>> C1E-IVB 0 0.00 0.00 0.00 0.00 >>>> C3-IVB 0 0.00 0.00 0.00 0.00 >>>> C6-IVB 0 0.00 0.00 0.00 0.00 >>>> C7-IVB 0 0.00 0.00 0.00 0.00 >>>> cpu2@state hits total(us) avg(us) min(us) max(us) >>>> POLL 0 0.00 0.00 0.00 0.00 >>>> C1-IVB 19 2169047.00 114160.37 18.00 >>>> 999129.00 >>>> C1E-IB 0 0.00 0.00 0.00 0.00 >>>> C3-IVB 0 0.00 0.00 0.00 0.00 >>>> C6-IVB 0 0.00 0.00 0.00 0.00 >>>> C7-IVB 376 86993307.00 231365.18 20.00 >>>> 1500682.00 >>>> 1701 0 0.00 0.00 0.00 0.00 >>>> 1700 0 0.00 0.00 0.00 0.00 >>>> 1600 0 0.00 0.00 0.00 0.00 >>>> 1500 0 0.00 0.00 0.00 0.00 >>>> 1400 0 0.00 0.00 0.00 0.00 >>>> 1300 0 0.00 0.00 0.00 0.00 >>>> 1200 0 0.00 0.00 0.00 0.00 >>>> 1100 0 0.00 0.00 0.00 0.00 >>>> 1000 0 0.00 0.00 0.00 0.00 >>>> 900 0 0.00 0.00 0.00 0.00 >>>> 800 0 0.00 0.00 0.00 0.00 >>>> 782 0 0.00 0.00 0.00 0.00 >>>> cpu2 wakeups name count >>>> irq009 acpi 32 >>>> irq019 ahci 45 >>>> cpu3@state hits total(us) avg(us) min(us) max(us) >>>> POLL 0 0.00 0.00 0.00 0.00 >>>> C1-IVB 0 0.00 0.00 0.00 0.00 >>>> C1E-VB 0 0.00 0.00 0.00 0.00 >>>> C3-IVB 0 0.00 0.00 0.00 0.00 >>>> C6-IVB 0 0.00 0.00 0.00 0.00 >>>> C7-IVB 0 0.00 0.00 0.00 0.00 >>>> 1701 0 0.00 0.00 0.00 0.00 >>>> 1700 0 0.00 0.00 0.00 0.00 >>>> 1600 0 0.00 0.00 0.00 0.00 >>>> 1500 0 0.00 0.00 0.00 0.00 >>>> 1400 0 0.00 0.00 0.00 0.00 >>>> 1300 0 0.00 0.00 0.00 0.00 >>>> 1200 0 0.00 0.00 0.00 0.00 >>>> 1100 0 0.00 0.00 0.00 0.00 >>>> 1000 0 0.00 0.00 0.00 0.00 >>>> 900 0 0.00 0.00 0.00 0.00 >>>> 800 0 0.00 
>>>>
>>>> Daniel Lezcano (3):
>>>>   cpuidle: encapsulate power info in a separate structure
>>>>   idle: store the idle state the cpu is
>>>>   sched/fair: use the idle state info to choose the idlest cpu
>>>>
>>>>  arch/arm/include/asm/cpuidle.h       |    6 +-
>>>>  arch/arm/mach-exynos/cpuidle.c       |    4 +-
>>>>  drivers/acpi/processor_idle.c        |    4 +-
>>>>  drivers/base/power/domain.c          |    6 +-
>>>>  drivers/cpuidle/cpuidle-at91.c       |    4 +-
>>>>  drivers/cpuidle/cpuidle-big_little.c |    9 +--
>>>>  drivers/cpuidle/cpuidle-calxeda.c    |    6 +-
>>>>  drivers/cpuidle/cpuidle-kirkwood.c   |    4 +-
>>>>  drivers/cpuidle/cpuidle-powernv.c    |    8 +--
>>>>  drivers/cpuidle/cpuidle-pseries.c    |   12 ++--
>>>>  drivers/cpuidle/cpuidle-ux500.c      |   14 ++---
>>>>  drivers/cpuidle/cpuidle-zynq.c       |    4 +-
>>>>  drivers/cpuidle/driver.c             |    6 +-
>>>>  drivers/cpuidle/governors/ladder.c   |   14 +++--
>>>>  drivers/cpuidle/governors/menu.c     |    8 +--
>>>>  drivers/cpuidle/sysfs.c              |    2 +-
>>>>  drivers/idle/intel_idle.c            |  112 +++++++++++++++++-----------------
>>>>  include/linux/cpuidle.h              |   10 ++-
>>>>  kernel/sched/fair.c                  |   46 ++++++++++++--
>>>>  kernel/sched/idle.c                  |   17 +++++-
>>>>  kernel/sched/sched.h                 |    5 ++
>>>>  21 files changed, 180 insertions(+), 121 deletions(-)
>>>>
>>>> --
>>>> 1.7.9.5

--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-01  7:43         ` Daniel Lezcano
@ 2014-04-01  9:05           ` Vincent Guittot
  2014-04-15 13:13             ` Peter Zijlstra
  0 siblings, 1 reply; 47+ messages in thread
From: Vincent Guittot @ 2014-04-01  9:05 UTC (permalink / raw)
To: Daniel Lezcano
Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, rjw, Nicolas Pitre,
	linux-pm, Alex Shi, Morten Rasmussen

On 1 April 2014 09:43, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> On 04/01/2014 09:16 AM, Vincent Guittot wrote:
>> On 31 March 2014 17:55, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>>> On 03/31/2014 03:52 PM, Vincent Guittot wrote:
>>>> On 28 March 2014 13:29, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>>>>>
>>>>> The following patchset provides an interaction between cpuidle and the
>>>>> scheduler.
>>>>>
>>>>> The first patch encapsulates the needed information for the scheduler
>>>>> in a separate cpuidle structure. The second one stores the pointer to
>>>>> this structure when entering idle. The third one uses this information
>>>>> to take the decision to find the idlest cpu.
>>>>>
>>>>> After some basic testing with hackbench, it appears there is an
>>>>> improvement for the performances (small) and for the duration of the
>>>>> idle states (which provides a better power saving).
>>>>>
>>>>> The measurement has been done with the 'idlestat' tool previously
>>>>> posted in this mailing list.
>>>>>
>>>>> So the benefit is good for both sides, performance and power saving.
>>>>
>>>> Hi Daniel,
>>>>
>>>> I have looked at your results and I'm a bit surprised that you have so
>>>> much time in C-state with a test that involved 400 tasks on a dual-core
>>>> HT system. You shouldn't have any CPUs in idle state when running
>>>> hackbench; the total time of core0@state in C7-IVB is 87932131.00 (us),
>>>> which is quite huge for a bench that runs 44 sec. Or am I doing
>>>> something wrong in the interpretation of the results ?
>>>
>>> No, actually I mixed the output of hackbench without being run with
>>> idlestat or with idlestat.
>>>
>>> The hackbench results below are without idlestat.
>>>
>>> The idlestat results are consistent and effectively it adds a
>>> non-negligible overhead as it impacts the hackbench results.
>>>
>>> So to summarize, hackbench has been run 4 times.
>>>
>>> 1, 2 : without idlestat, with and without the patchset - hackbench
>>> results ~42 secs
>>>
>>> 3, 4 : with idlestat, with and without the patchset - hackbench results
>>> ~87 secs
>>>
>>> At first glance, the results are consistent but I will double check
>>> them.
>>>
>>> Do you have a suggestion for a benchmarking program ?
>>
>> We are working on a bench which can generate a middle load pattern with
>> idle CPUs but it's not available yet. In the mean time, one bench that
>> plays with idle time is cyclictest; it will not give you performance
>> results but only scheduling latency, which might be what you are looking
>> for.
>
> Yeah, thanks. I believe I know what is in the rt-tests package :)
>
> What I meant is what kind of values would you like to see with this
> patchset ?

IIUC, your patch tries to improve the wake up latency of a task by
selecting the CPUs with the shallowest C-state, so this metric seems to be
a good candidate.

>>>>> The select_idle_sibling could be also improved in the same way.
>>>>>
>>>>> [idlestat residency tables for both runs (Time: 44.433 baseline,
>>>>> Time: 42.179 with the patchset) and diffstat snipped; see the cover
>>>>> letter above for the full output]

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-01  9:05           ` Vincent Guittot
@ 2014-04-15 13:13             ` Peter Zijlstra
  0 siblings, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-15 13:13 UTC (permalink / raw)
To: Vincent Guittot
Cc: Daniel Lezcano, linux-kernel, Ingo Molnar, rjw, Nicolas Pitre,
	linux-pm, Alex Shi, Morten Rasmussen

On Tue, Apr 01, 2014 at 11:05:16AM +0200, Vincent Guittot wrote:
> >> We are working on a bench which can generate a middle load pattern with
> >> idle CPUs but it's not available yet. In the mean time, one bench that
> >> plays with idle time is cyclictest; it will not give you performance
> >> results but only scheduling latency, which might be what you are
> >> looking for.
> >
> > Yeah, thanks. I believe I know what is in the rt-tests package :)
> >
> > What I meant is what kind of values would you like to see with this
> > patchset ?
>
> IIUC, your patch tries to improve the wake up latency of a task by
> selecting the CPUs with the shallowest C-state, so this metric seems
> to be a good candidate

cyclictest might be too regular to really measure anything though.

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-03-28 12:29 [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Daniel Lezcano
                   ` (3 preceding siblings ...)
  2014-03-31 13:52 ` [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Vincent Guittot
@ 2014-04-01 23:01 ` Rafael J. Wysocki
  2014-04-02  3:14   ` Nicolas Pitre
  2014-04-02  8:26   ` Daniel Lezcano
  2014-04-04  6:29 ` Len Brown
  5 siblings, 2 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2014-04-01 23:01 UTC (permalink / raw)
To: Daniel Lezcano
Cc: linux-kernel, mingo, peterz, nicolas.pitre, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Friday, March 28, 2014 01:29:53 PM Daniel Lezcano wrote:
> The following patchset provides an interaction between cpuidle and the
> scheduler.
>
> The first patch encapsulates the needed information for the scheduler in a
> separate cpuidle structure. The second one stores the pointer to this
> structure when entering idle. The third one uses this information to take
> the decision to find the idlest cpu.
>
> After some basic testing with hackbench, it appears there is an improvement
> for the performances (small) and for the duration of the idle states
> (which provides a better power saving).
>
> The measurement has been done with the 'idlestat' tool previously posted
> in this mailing list.
>
> So the benefit is good for both sides, performance and power saving.
>
> The select_idle_sibling could be also improved in the same way.

Well, quite frankly, I don't really like this series.  Not the idea itself,
but the way it has been implemented.

First off, if the scheduler is to access idle state data stored in struct
cpuidle_state, I'm not sure why we need a separate new structure for that?
Couldn't there be a pointer to a whole struct cpuidle_state from struct rq
instead?  [->exit_latency is the only field that find_idlest_cpu() in your
third patch seems to be using anyway.]
Second, is accessing the idle state information for all CPUs from
find_idlest_cpu() guaranteed to be non-racy?  I mean, what if a CPU changes
its state from idle to non-idle while another one is executing
find_idlest_cpu()?  In other words, where's the read memory barrier
corresponding to the write ones in the modified cpu_idle_call()?  And is
the memory barrier actually sufficient?  After all, you need to guarantee
that the CPU is still idle after you have evaluated idle_cpu() on it.

Finally, is the heuristics used by find_idlest_cpu() to select the "idlest"
CPU really the best one?  What about deeper vs shallower idle states, for
example?

Rafael

^ permalink raw reply	[flat|nested] 47+ messages in thread
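[Editor's note: for concreteness, the mechanism Rafael is reviewing can be modeled in userspace. This is a sketch under assumptions, not the patchset's code: the `model_*` names are hypothetical, and the comments only mark where the barriers under discussion would sit in the kernel.]

```c
#include <assert.h>
#include <stddef.h>

/* What patch 2/3 amounts to, per the discussion: the idle entry path
 * publishes a pointer to the state being entered and clears it on exit.
 * Following Rafael's suggestion, the pointer could refer to the whole
 * struct cpuidle_state rather than a new sub-structure. */
struct model_cpuidle_state {
	unsigned int exit_latency;	/* worst-case wakeup latency, in us */
	unsigned int target_residency;	/* us */
};

struct model_rq {
	struct model_cpuidle_state *idle_state;	/* NULL while the CPU runs */
};

static void model_idle_enter(struct model_rq *rq,
			     struct model_cpuidle_state *state)
{
	rq->idle_state = state;
	/* kernel: this is where the write barrier Rafael mentions would go,
	 * so a remote reader never sees a stale pointer as current */
}

static void model_idle_exit(struct model_rq *rq)
{
	rq->idle_state = NULL;
}

/* Reader side: a remote CPU samples the pointer and treats NULL as
 * "not idle".  A stale answer only degrades the placement heuristic;
 * it does not break correctness. */
static unsigned int model_read_exit_latency(struct model_rq *rq)
{
	struct model_cpuidle_state *state = rq->idle_state;

	return state ? state->exit_latency : 0;
}
```

The race Rafael raises is visible in this model: between sampling `idle_state` and acting on it, the target CPU may have run `model_idle_exit()`, which is why the selection path would still re-check `idle_cpu()` on the chosen CPU.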
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-01 23:01 ` Rafael J. Wysocki
@ 2014-04-02  3:14   ` Nicolas Pitre
  2014-04-04 11:43     ` Rafael J. Wysocki
  0 siblings, 1 reply; 47+ messages in thread
From: Nicolas Pitre @ 2014-04-02  3:14 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Daniel Lezcano, linux-kernel, mingo, peterz, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Wed, 2 Apr 2014, Rafael J. Wysocki wrote:

> On Friday, March 28, 2014 01:29:53 PM Daniel Lezcano wrote:
> > [cover letter snipped]
>
> Well, quite frankly, I don't really like this series.  Not the idea
> itself, but the way it has been implemented.
>
> First off, if the scheduler is to access idle state data stored in struct
> cpuidle_state, I'm not sure why we need a separate new structure for
> that?  Couldn't there be a pointer to a whole struct cpuidle_state from
> struct rq instead?  [->exit_latency is the only field that
> find_idlest_cpu() in your third patch seems to be using anyway.]

Future patches are likely to use the other fields.  I presume that's why
Daniel put them there.
But I admit being on the fence about this, i.e. whether or not we should
encapsulate shared fields into a separate structure.

> Second, is accessing the idle state information for all CPUs from
> find_idlest_cpu() guaranteed to be non-racy?  I mean, what if a CPU
> changes its state from idle to non-idle while another one is executing
> find_idlest_cpu()?  In other words, where's the read memory barrier
> corresponding to the write ones in the modified cpu_idle_call()?  And is
> the memory barrier actually sufficient?  After all, you need to guarantee
> that the CPU is still idle after you have evaluated idle_cpu() on it.

I don't think avoiding races is all that important here.  Right now any
idle CPU is selected regardless of its idle state depth.  What this patch
should do (considering my previous comments on it) is to favor the idle
CPU with the shallowest idle state.  If once in a while the selection is
wrong because of a race, we're not going to make it any worse than what we
have today without this patch.

That probably means the write barrier could potentially be omitted as well
if it implies a useless cost.

We need to ensure the cpuidle data structure is not going away (e.g.
cpuidle driver module removal) while another CPU looks at it though.
The timing would have to be awfully weird for this to happen but still.

> Finally, is the heuristics used by find_idlest_cpu() to select the
> "idlest" CPU really the best one?  What about deeper vs shallower idle
> states, for example?

That's what this patch series is about.  The find_idlest_cpu code should
look for the idle CPU with the shallowest idle state, or the one with the
smallest load.  In this context "find_idlest_cpu" might become a misnomer.

Nicolas

^ permalink raw reply	[flat|nested] 47+ messages in thread
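[Editor's note: the selection policy Nicolas describes can be sketched as a small userspace model. The names are hypothetical and the flat `load` field stands in for the scheduler's real load tracking; this is an illustration of the policy, not the patchset's `kernel/sched/fair.c` change.]

```c
#include <assert.h>

#define MODEL_NR_CPUS 4

struct model_cpu {
	int idle;			/* 1 if the CPU is currently idle */
	unsigned int exit_latency;	/* depth of its idle state, in us */
	unsigned int load;		/* stand-in for run queue load */
};

/* Prefer the idle CPU in the shallowest idle state (smallest exit
 * latency); if no CPU is idle, fall back to the least loaded one. */
static int model_find_best_cpu(const struct model_cpu *cpu, int n)
{
	unsigned int best_latency = ~0u, best_load = ~0u;
	int i, best_idle = -1, least_loaded = 0;

	for (i = 0; i < n; i++) {
		if (cpu[i].idle && cpu[i].exit_latency < best_latency) {
			best_latency = cpu[i].exit_latency;
			best_idle = i;
		}
		if (cpu[i].load < best_load) {
			best_load = cpu[i].load;
			least_loaded = i;
		}
	}
	return best_idle >= 0 ? best_idle : least_loaded;
}
```

With two idle CPUs, one in a shallow C1-like state and one in a deep C7-like state, the shallow one wins, which is exactly the wakeup-latency improvement the thread is discussing; with no idle CPU, the function degenerates to the existing least-loaded choice.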
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-02  3:14   ` Nicolas Pitre
@ 2014-04-04 11:43     ` Rafael J. Wysocki
  2014-04-15 13:17       ` Peter Zijlstra
  2014-04-15 13:25       ` Peter Zijlstra
  0 siblings, 2 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2014-04-04 11:43 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Daniel Lezcano, linux-kernel, mingo, peterz, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Tuesday, April 01, 2014 11:14:33 PM Nicolas Pitre wrote:
> On Wed, 2 Apr 2014, Rafael J. Wysocki wrote:
> > On Friday, March 28, 2014 01:29:53 PM Daniel Lezcano wrote:
> > > [cover letter snipped]
> >
> > Well, quite frankly, I don't really like this series.  Not the idea
> > itself, but the way it has been implemented.
> >
> > First off, if the scheduler is to access idle state data stored in
> > struct cpuidle_state, I'm not sure why we need a separate new structure
> > for that?  Couldn't there be a pointer to a whole struct cpuidle_state
> > from struct rq instead?  [->exit_latency is the only field that
> > find_idlest_cpu() in your third patch seems to be using anyway.]
> > Future patches are likely to use the other fields. I presume that's why > Daniel put them there. > > But I admit being on the fence about this i.e whether or not we should > encapsulate shared fields into a separate structure or not. Quite frankly, I don't see a point in using a separate structure here. > > Second, is accessing the idle state information for all CPUs from find_idlest_cpu() > > guaranteed to be non-racy? I mean, what if a CPU changes its state from idle to > > non-idle while another one is executing find_idlest_cpu()? In other words, > > where's the read memory barrier corresponding to the write ones in the modified > > cpu_idle_call()? And is the memory barrier actually sufficient? After all, > > you need to guarantee that the CPU is still idle after you have evaluated > > idle_cpu() on it. > > I don't think avoiding races is all that important here. Right now any > idle CPU is selected regardless of its idle state depth. What this > patch should do (considering my previous comments on it) is to favor the > idle CPU with the shalloest idle state. If once in a while the > selection is wrong because of a race we're not going to make it any > worse than what we have today without this patch. > > That probably means the write barrier could potentially be omitted as > well if it implies a useless cost. Yes, the write barriers don't seem to serve any real purpose. > We need to ensure the cpuidle data structure is not going away (e.g. > cpuidle driver module removal) while another CPU looks at it though. > The timing would have to be awfully weird for this to happen but still. Well, I'm not sure if that is a real concern. Only a couple of drivers try to implement module unloading and I guess this isn't tested too much, so perhaps we should just make it impossible to unload a cpuidle driver? > > Finally, is really the heuristics used by find_idlest_cpu() to select the "idlest" > > CPU the best one? 
What about deeper vs shallower idle states, for example? > > That's what this patch series is about. The find_idlest_cpu code should > look for the idle CPU with the shallowest idle state, or the one with > the smallest load. In this context "find_idlest_cpu" might become a > misnomer. Yes, clearly. It should be called find_best_cpu or something like that. Thanks! -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. ^ permalink raw reply [flat|nested] 47+ messages in thread
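The race-tolerance argument above (snapshot the remote CPU's idle-state pointer once and accept that it may go stale, since a stale read only degrades the placement decision) can be sketched in a few lines. This is a userspace model, not the patch itself; all names (`cpu_idle_ptr`, `model_*`) are hypothetical, and the comments note where `READ_ONCE()`/`WRITE_ONCE()` would sit in real kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the discussion above -- not kernel code.  Each CPU
 * publishes a pointer to its current idle state on idle entry and clears
 * it on exit; a remote reader snapshots the pointer once, with no
 * locking, and tolerates it becoming stale immediately afterwards. */

struct idle_state_model { int exit_latency; };

#define NR_CPUS_MODEL 4
static struct idle_state_model *cpu_idle_ptr[NR_CPUS_MODEL];

/* Writer side: WRITE_ONCE() (plus barriers, if kept) in the kernel. */
static void model_enter_idle(int cpu, struct idle_state_model *s)
{
	cpu_idle_ptr[cpu] = s;
}

static void model_exit_idle(int cpu)
{
	cpu_idle_ptr[cpu] = NULL;
}

/* Reader side: one snapshot; -1 means "not idle (or we raced)". */
static int model_exit_latency(int cpu)
{
	struct idle_state_model *s = cpu_idle_ptr[cpu]; /* READ_ONCE() in-kernel */

	return s ? s->exit_latency : -1;
}
```

The key point the model captures is that the reader dereferences its local snapshot `s`, never `cpu_idle_ptr[cpu]` a second time, so a concurrent wakeup cannot turn the check into a NULL dereference.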
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-04 11:43 ` Rafael J. Wysocki
@ 2014-04-15 13:17 ` Peter Zijlstra
  2014-04-15 13:25 ` Peter Zijlstra
  1 sibling, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-15 13:17 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Nicolas Pitre, Daniel Lezcano, linux-kernel, mingo, linux-pm,
  alex.shi, vincent.guittot, morten.rasmussen

On Fri, Apr 04, 2014 at 01:43:00PM +0200, Rafael J. Wysocki wrote:
> > We need to ensure the cpuidle data structure is not going away (e.g.
> > cpuidle driver module removal) while another CPU looks at it though.
> > The timing would have to be awfully weird for this to happen but still.
>
> Well, I'm not sure if that is a real concern. Only a couple of drivers
> try to implement module unloading and I guess this isn't tested too
> much, so perhaps we should just make it impossible to unload a cpuidle
> driver?

The 'easy' solution is to mandate the use of rcu_read_lock() around the
dereference and make all cpuidle drivers put an rcu_barrier() in their
module unload path.
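Peter's rcu_read_lock()/rcu_barrier() contract cannot be exercised outside the kernel, but the guarantee it provides — the unload path may only free the state table once no reader can still hold a reference obtained inside a read-side critical section — can be modeled with an explicit reader count. A rough userspace stand-in (all names hypothetical; real RCU read-side sections are far cheaper than an atomic counter, which is the whole point of using RCU here):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace stand-in for the RCU contract described above -- not kernel
 * code.  Readers bracket the dereference, and the driver's unload path
 * may only proceed once all readers have drained (the role that
 * rcu_barrier()/synchronize_rcu() would play in the kernel). */

static atomic_int readers;
static int *idle_table;		/* hypothetical driver-owned data */

static void model_read_lock(void)   { atomic_fetch_add(&readers, 1); }
static void model_read_unlock(void) { atomic_fetch_sub(&readers, 1); }

/* Unload side: may free driver data only when no reader is inside the
 * critical section. */
static int model_unload_safe(void)
{
	return atomic_load(&readers) == 0;
}
```

In the kernel the reader side would be `rcu_read_lock(); s = rcu_dereference(...); ...; rcu_read_unlock();`, with no per-reader atomic traffic at all.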
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-04 11:43 ` Rafael J. Wysocki
  2014-04-15 13:17 ` Peter Zijlstra
@ 2014-04-15 13:25 ` Peter Zijlstra
  2014-04-15 15:27 ` Nicolas Pitre
  2014-04-15 15:33 ` Rafael J. Wysocki
  1 sibling, 2 replies; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-15 13:25 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Nicolas Pitre, Daniel Lezcano, linux-kernel, mingo, linux-pm,
  alex.shi, vincent.guittot, morten.rasmussen

On Fri, Apr 04, 2014 at 01:43:00PM +0200, Rafael J. Wysocki wrote:
> > That's what this patch series is about. The find_idlest_cpu code
> > should look for the idle CPU with the shallowest idle state, or the
> > one with the smallest load. In this context "find_idlest_cpu" might
> > become a misnomer.
>
> Yes, clearly. It should be called find_best_cpu or something like that.

Ha!, but for what purpose? We already have find_busiest_cpu() to find
the CPU to steal work from. The converse action, currently called
find_idlest_cpu(), is finding the CPU where to put work.

'Best' is ambiguous in all regards; it conveys neither the direction nor
the quality sorted on.

So while idlest might be somewhat of a misnomer, it at least conveys the
directional thing fairly well. Also, we are still searching for the least
busy, and preferably an idle, cpu. 'Idlest' being a superlative also
conveys the meaning of order.
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-15 13:25 ` Peter Zijlstra
@ 2014-04-15 15:27 ` Nicolas Pitre
  2014-04-15 15:33 ` Rafael J. Wysocki
  1 sibling, 0 replies; 47+ messages in thread
From: Nicolas Pitre @ 2014-04-15 15:27 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Rafael J. Wysocki, Daniel Lezcano, linux-kernel, mingo, linux-pm,
  alex.shi, vincent.guittot, morten.rasmussen

On Tue, 15 Apr 2014, Peter Zijlstra wrote:
> On Fri, Apr 04, 2014 at 01:43:00PM +0200, Rafael J. Wysocki wrote:
> > > That's what this patch series is about. The find_idlest_cpu code
> > > should look for the idle CPU with the shallowest idle state, or the
> > > one with the smallest load. In this context "find_idlest_cpu" might
> > > become a misnomer.
> >
> > Yes, clearly. It should be called find_best_cpu or something like
> > that.
>
> Ha!, but for what purpose? We already have find_busiest_cpu() to find
> the CPU to steal work from. The converse action, currently called
> find_idlest_cpu(), is finding the CPU where to put work.
>
> 'Best' is ambiguous in all regards; it conveys neither the direction
> nor the quality sorted on.
>
> So while idlest might be somewhat of a misnomer, it at least conveys
> the directional thing fairly well. Also, we are still searching for the
> least busy, and preferably an idle, cpu. 'Idlest' being a superlative
> also conveys the meaning of order.

I agree that anything which is called "best" is ambiguous. Best for
what? That isn't self-explanatory.

However "idlest" is no longer the wanted attribute here. "Least busy" is
right, but not necessarily the "idlest". The "best" CPU here is somewhat
in the middle between busiest and idlest, i.e. preferably idle, but not
the "idlest" in the cpuidle sense.

Maybe we could use your definition to simply call it
find_cpu_to_put_work() or the like. Today this is based on the idleness
of CPUs, but eventually we'll want to pack tasks on already loaded CPUs
(without oversubscribing them) in order to keep as many CPUs idle as
possible when that makes sense, which would alter the selection
somewhat.

Nicolas
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-15 13:25 ` Peter Zijlstra
  2014-04-15 15:27 ` Nicolas Pitre
@ 2014-04-15 15:33 ` Rafael J. Wysocki
  1 sibling, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2014-04-15 15:33 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Nicolas Pitre, Daniel Lezcano, linux-kernel, mingo, linux-pm,
  alex.shi, vincent.guittot, morten.rasmussen

On Tuesday, April 15, 2014 03:25:10 PM Peter Zijlstra wrote:
> On Fri, Apr 04, 2014 at 01:43:00PM +0200, Rafael J. Wysocki wrote:
> > > That's what this patch series is about. The find_idlest_cpu code
> > > should look for the idle CPU with the shallowest idle state, or the
> > > one with the smallest load. In this context "find_idlest_cpu" might
> > > become a misnomer.
> >
> > Yes, clearly. It should be called find_best_cpu or something like
> > that.
>
> Ha!, but for what purpose? We already have find_busiest_cpu() to find
> the CPU to steal work from. The converse action, currently called
> find_idlest_cpu(), is finding the CPU where to put work.
>
> 'Best' is ambiguous in all regards; it conveys neither the direction
> nor the quality sorted on.
>
> So while idlest might be somewhat of a misnomer, it at least conveys
> the directional thing fairly well. Also, we are still searching for the
> least busy, and preferably an idle, cpu. 'Idlest' being a superlative
> also conveys the meaning of order.

But 'idlest' can also be understood as 'deepest idle', which clearly is
not the intent. Perhaps find_cpu_for_work() reflects what it does, but
I'm not sure if that's a good name either.

Rafael
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-01 23:01 ` Rafael J. Wysocki
  2014-04-02  3:14 ` Nicolas Pitre
@ 2014-04-02  8:26 ` Daniel Lezcano
  2014-04-04 11:23 ` Rafael J. Wysocki
  1 sibling, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-02 8:26 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: linux-kernel, mingo, peterz, nicolas.pitre, linux-pm, alex.shi,
  vincent.guittot, morten.rasmussen

On 04/02/2014 01:01 AM, Rafael J. Wysocki wrote:
> On Friday, March 28, 2014 01:29:53 PM Daniel Lezcano wrote:
>> The following patchset provides an interaction between cpuidle and the
>> scheduler.
>>
>> The first patch encapsulates the needed information for the scheduler
>> in a separate cpuidle structure. The second one stores the pointer to
>> this structure when entering idle. The third one uses this information
>> to take the decision to find the idlest cpu.
>>
>> After some basic testing with hackbench, it appears there is an
>> improvement for the performance (small) and for the duration of the
>> idle states (which provides a better power saving).
>>
>> The measurement has been done with the 'idlestat' tool previously
>> posted in this mailing list.
>>
>> So the benefit is good for both sides, performance and power saving.
>>
>> The select_idle_sibling could also be improved in the same way.
>
> Well, quite frankly, I don't really like this series. Not the idea
> itself, but the way it has been implemented.
>
> First off, if the scheduler is to access idle state data stored in
> struct cpuidle_state, I'm not sure why we need a separate new structure
> for that? Couldn't there be a pointer to a whole struct cpuidle_state
> from struct rq instead? [->exit_latency is the only field that
> find_idlest_cpu() in your third patch seems to be using anyway.]

Hi Rafael,

thank you very much for reviewing the patchset.

I created a specific structure to encapsulate the information needed by
the scheduler and to avoid exporting unneeded data. This is purely for
code design. It was also meant to separate the idle states' energy
characteristics from the cpuidle framework data (flags, name, etc.).

The exit_latency field is the only one used in this patchset, but
target_residency will be used as well (e.g. to avoid waking up a cpu
before its minimum target residency has elapsed).

The power field is ... hum ... not filled in by any board (except for
calxeda). Vendors do not like to share this information, so it is very
likely that this would be changed to a normalized value; I don't know.

I agree we can put a pointer to the struct cpuidle_state instead if that
reduces the impact of the patchset.

> Second, is accessing the idle state information for all CPUs from
> find_idlest_cpu() guaranteed to be non-racy? I mean, what if a CPU
> changes its state from idle to non-idle while another one is executing
> find_idlest_cpu()? In other words, where's the read memory barrier
> corresponding to the write ones in the modified cpu_idle_call()? And is
> the memory barrier actually sufficient? After all, you need to
> guarantee that the CPU is still idle after you have evaluated
> idle_cpu() on it.

Well, as Nicolas mentioned in another mail, we can live with races: the
scheduler will take a wrong decision but nothing worth than what we have
today. In any case we want to avoid any locking in the code.

> Finally, is the heuristic used by find_idlest_cpu() to select the
> "idlest" CPU really the best one? What about deeper vs shallower idle
> states, for example?

I believe that is what the patchset is supposed to do: 1. if the cpu is
idle, pick the shallowest; 2. if the cpu is not idle, pick the least
loaded. But maybe there is something wrong in the routine, as Nico
pointed out; I have to double check it.

Thanks!

-- Daniel

--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro:
<http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
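The two-step choice Daniel describes above — among idle CPUs prefer the shallowest idle state, otherwise prefer the least loaded — can be sketched as a standalone function. This is an illustrative model, not the code from the patchset; the `cpu_model` fields and the load metric are assumptions:

```c
#include <assert.h>
#include <limits.h>

/* Illustrative model of the two-step selection described above (not the
 * actual patch): among idle CPUs prefer the shallowest idle state, i.e.
 * the smallest exit_latency; if no CPU is idle, prefer the smallest
 * load.  Any idle CPU beats every busy one. */

struct cpu_model {
	int idle;		/* 1 if the CPU is idle */
	int exit_latency;	/* meaningful only when idle */
	int load;		/* meaningful only when busy */
};

static int model_find_idlest_cpu(const struct cpu_model *cpus, int n)
{
	int best = -1, best_latency = INT_MAX, best_load = INT_MAX;
	int found_idle = 0, i;

	for (i = 0; i < n; i++) {
		if (cpus[i].idle) {
			/* Step 1: shallowest idle state wins. */
			if (!found_idle || cpus[i].exit_latency < best_latency) {
				best = i;
				best_latency = cpus[i].exit_latency;
				found_idle = 1;
			}
		} else if (!found_idle && cpus[i].load < best_load) {
			/* Step 2: fall back to the least loaded busy CPU. */
			best = i;
			best_load = cpus[i].load;
		}
	}
	return best;
}
```

With everyone busy the function degenerates to a plain least-loaded search, which matches the behavior the series leaves unchanged for that case.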
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-02  8:26 ` Daniel Lezcano
@ 2014-04-04 11:23 ` Rafael J. Wysocki
  0 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2014-04-04 11:23 UTC (permalink / raw)
To: Daniel Lezcano
Cc: linux-kernel, mingo, peterz, nicolas.pitre, linux-pm, alex.shi,
  vincent.guittot, morten.rasmussen

On Wednesday, April 02, 2014 10:26:31 AM Daniel Lezcano wrote:
> On 04/02/2014 01:01 AM, Rafael J. Wysocki wrote:
> > On Friday, March 28, 2014 01:29:53 PM Daniel Lezcano wrote:
> >> The following patchset provides an interaction between cpuidle and
> >> the scheduler.
> >>
> >> The first patch encapsulates the needed information for the scheduler
> >> in a separate cpuidle structure. The second one stores the pointer to
> >> this structure when entering idle. The third one uses this
> >> information to take the decision to find the idlest cpu.
> >>
> >> After some basic testing with hackbench, it appears there is an
> >> improvement for the performance (small) and for the duration of the
> >> idle states (which provides a better power saving).
> >>
> >> The measurement has been done with the 'idlestat' tool previously
> >> posted in this mailing list.
> >>
> >> So the benefit is good for both sides, performance and power saving.
> >>
> >> The select_idle_sibling could also be improved in the same way.
> >
> > Well, quite frankly, I don't really like this series. Not the idea
> > itself, but the way it has been implemented.
> >
> > First off, if the scheduler is to access idle state data stored in
> > struct cpuidle_state, I'm not sure why we need a separate new
> > structure for that? Couldn't there be a pointer to a whole struct
> > cpuidle_state from struct rq instead? [->exit_latency is the only
> > field that find_idlest_cpu() in your third patch seems to be using
> > anyway.]
>
> Hi Rafael,
>
> thank you very much for reviewing the patchset.
>
> I created a specific structure to encapsulate the information needed by
> the scheduler and to avoid exporting unneeded data. This is purely for
> code design. It was also meant to separate the idle states' energy
> characteristics from the cpuidle framework data (flags, name, etc.).
>
> The exit_latency field is the only one used in this patchset, but
> target_residency will be used as well (e.g. to avoid waking up a cpu
> before its minimum target residency has elapsed).

OK

It would be good to add that heuristic upfront so that we can see the
full picture.

> The power field is ... hum ... not filled in by any board (except for
> calxeda). Vendors do not like to share this information, so it is very
> likely that this would be changed to a normalized value; I don't know.

I'm not sure if that field is ever going to be used by everyone, to be
honest.

> I agree we can put a pointer to the struct cpuidle_state instead if
> that reduces the impact of the patchset.

Yes, it will, in my opinion.

> > Second, is accessing the idle state information for all CPUs from
> > find_idlest_cpu() guaranteed to be non-racy? I mean, what if a CPU
> > changes its state from idle to non-idle while another one is
> > executing find_idlest_cpu()? In other words, where's the read memory
> > barrier corresponding to the write ones in the modified
> > cpu_idle_call()? And is the memory barrier actually sufficient?
> > After all, you need to guarantee that the CPU is still idle after you
> > have evaluated idle_cpu() on it.
>
> Well, as Nicolas mentioned in another mail, we can live with races: the
> scheduler will take a wrong decision but nothing worth than what we
> have today.

I guess you mean "worse"? I'm not sure about that.

> In any case we want to avoid any locking in the code.

Of course. :-)

> > Finally, is the heuristic used by find_idlest_cpu() to select the
> > "idlest" CPU really the best one? What about deeper vs shallower idle
> > states, for example?
>
> I believe that is what the patchset is supposed to do: 1. if the cpu is
> idle, pick the shallowest; 2. if the cpu is not idle, pick the least
> loaded. But maybe there is something wrong in the routine, as Nico
> pointed out; I have to double check it.

Yes, that routine doesn't look entirely correct then.

Thanks!

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 47+ messages in thread
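Rafael's suggestion in the exchange above — have the runqueue point at the whole struct cpuidle_state instead of copying selected fields — boils down to a single pointer set on idle entry and cleared on exit. A minimal userspace sketch of that shape (struct, field, and function names are illustrative, not the eventual kernel patch):

```c
#include <assert.h>
#include <stddef.h>

/* Shape of the "pointer to the whole state" idea, modeled in userspace.
 * The runqueue carries a pointer to the full cpuidle state rather than a
 * copy of selected fields, so future users get target_residency etc.
 * without widening a scheduler-side structure. */

struct cpuidle_state_model {
	int exit_latency;	/* us */
	int target_residency;	/* us */
};

struct rq_model {
	struct cpuidle_state_model *idle_state;	/* NULL when not idle */
};

static void model_idle_enter(struct rq_model *rq,
			     struct cpuidle_state_model *s)
{
	rq->idle_state = s;
}

static void model_idle_exit(struct rq_model *rq)
{
	rq->idle_state = NULL;
}
```

A NULL pointer doubles as the "CPU is not idle" signal, which is why no extra flag is needed in the runqueue.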
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-03-28 12:29 [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Daniel Lezcano
  ` (4 preceding siblings ...)
  2014-04-01 23:01 ` Rafael J. Wysocki
@ 2014-04-04  6:29 ` Len Brown
  2014-04-04  8:16 ` Daniel Lezcano
  5 siblings, 1 reply; 47+ messages in thread
From: Len Brown @ 2014-04-04 6:29 UTC (permalink / raw)
To: Daniel Lezcano
Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Rafael J. Wysocki,
  nicolas.pitre, Linux PM list, alex.shi, vincent.guittot,
  morten.rasmussen

Hi Daniel,

Interesting idea.

The benefit of this patch is to reduce power. Have you been able to
measure a power reduction, via power meter, or via the built-in RAPL
power meter? (turbostat will show RAPL watts, or if you have a constant
quantity of work, use turbostat -J)

thanks,
-Len
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-04  6:29 ` Len Brown
@ 2014-04-04  8:16 ` Daniel Lezcano
  0 siblings, 0 replies; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-04 8:16 UTC (permalink / raw)
To: Len Brown
Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Rafael J. Wysocki,
  nicolas.pitre, Linux PM list, alex.shi, vincent.guittot,
  morten.rasmussen

On 04/04/2014 08:29 AM, Len Brown wrote:
> Hi Daniel,
>
> Interesting idea.
>
> The benefit of this patch is to reduce power. Have you been able to
> measure a power reduction, via power meter, or via the built-in RAPL
> power meter? (turbostat will show RAPL watts, or if you have a constant
> quantity of work, use turbostat -J)

Hi Len,

thanks for looking at the patches. I will tweak and respin the patchset
and do some more measurements. I don't have a power meter, but maybe
RAPL could help for testing on x86.

Thanks

-- Daniel
end of thread, other threads:[~2014-04-18 16:00 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-28 12:29 [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Daniel Lezcano
2014-03-28 12:29 ` [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure Daniel Lezcano
2014-03-28 18:17 ` Nicolas Pitre
2014-03-28 20:42 ` Daniel Lezcano
2014-03-29  0:00 ` Nicolas Pitre
2014-03-28 12:29 ` [RFC PATCHC 2/3] idle: store the idle state the cpu is Daniel Lezcano
2014-04-15 12:43 ` Peter Zijlstra
2014-04-15 12:44 ` Peter Zijlstra
2014-04-15 14:17 ` Daniel Lezcano
2014-04-15 14:33 ` Peter Zijlstra
2014-04-15 14:39 ` Daniel Lezcano
2014-03-28 12:29 ` [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu Daniel Lezcano
2014-04-02  3:05 ` Nicolas Pitre
2014-04-04 11:57 ` Rafael J. Wysocki
2014-04-04 16:56 ` Nicolas Pitre
2014-04-05  2:01 ` Rafael J. Wysocki
2014-04-17 13:53 ` Daniel Lezcano
2014-04-17 14:47 ` Peter Zijlstra
2014-04-17 15:03 ` Daniel Lezcano
2014-04-18  8:09 ` Ingo Molnar
2014-04-18  8:36 ` Daniel Lezcano
2014-04-17 15:53 ` Nicolas Pitre
2014-04-17 16:05 ` Daniel Lezcano
2014-04-17 16:21 ` Nicolas Pitre
2014-04-18  9:38 ` Peter Zijlstra
2014-04-18 12:13 ` Daniel Lezcano
2014-04-18 12:53 ` Peter Zijlstra
2014-04-18 13:04 ` Daniel Lezcano
2014-04-18 16:00 ` Nicolas Pitre
2014-04-15 13:03 ` Peter Zijlstra
2014-03-31 13:52 ` [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Vincent Guittot
2014-03-31 15:55 ` Daniel Lezcano
2014-04-01  7:16 ` Vincent Guittot
2014-04-01  7:43 ` Daniel Lezcano
2014-04-01  9:05 ` Vincent Guittot
2014-04-15 13:13 ` Peter Zijlstra
2014-04-01 23:01 ` Rafael J. Wysocki
2014-04-02  3:14 ` Nicolas Pitre
2014-04-04 11:43 ` Rafael J. Wysocki
2014-04-15 13:17 ` Peter Zijlstra
2014-04-15 13:25 ` Peter Zijlstra
2014-04-15 15:27 ` Nicolas Pitre
2014-04-15 15:33 ` Rafael J. Wysocki
2014-04-02  8:26 ` Daniel Lezcano
2014-04-04 11:23 ` Rafael J. Wysocki
2014-04-04  6:29 ` Len Brown
2014-04-04  8:16 ` Daniel Lezcano