* [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
@ 2014-03-28 12:29 Daniel Lezcano
2014-03-28 12:29 ` [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure Daniel Lezcano
` (5 more replies)
0 siblings, 6 replies; 47+ messages in thread
From: Daniel Lezcano @ 2014-03-28 12:29 UTC (permalink / raw)
To: linux-kernel, mingo, peterz
Cc: rjw, nicolas.pitre, linux-pm, alex.shi, vincent.guittot,
morten.rasmussen
The following patchset provides an interaction between cpuidle and the scheduler.
The first patch encapsulates the information the scheduler needs in a separate
cpuidle structure. The second one stores a pointer to this structure when
entering idle. The third one uses this information when deciding which cpu is
the idlest.
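As a rough illustration of the idea only (the third patch is not reproduced
here, so the policy, the cpu_view type and pick_shallowest_idle_cpu() below are
assumptions and not the actual kernel code), a userspace sketch that prefers
the idle cpu with the smallest exit latency could look like this:

```c
#include <stddef.h>
#include <limits.h>

/* Mirrors the fields the patchset groups in struct cpuidle_power. */
struct cpuidle_power {
	unsigned int exit_latency;     /* in us */
	unsigned int target_residency; /* in us */
	int power_usage;               /* in mW */
};

/*
 * Hypothetical per-cpu view: power is NULL while the cpu is running and
 * points to the power info of its idle state while it is idle.
 */
struct cpu_view {
	const struct cpuidle_power *power;
};

/*
 * Return the index of the idle cpu that is cheapest to wake up, i.e. the
 * one with the smallest exit latency, or -1 if no cpu is idle.
 */
int pick_shallowest_idle_cpu(const struct cpu_view *cpus, int nr_cpus)
{
	unsigned int best_latency = UINT_MAX;
	int best = -1;
	int i;

	for (i = 0; i < nr_cpus; i++) {
		const struct cpuidle_power *p = cpus[i].power;

		if (!p)
			continue;	/* cpu is not idle, skip it */
		if (best < 0 || p->exit_latency < best_latency) {
			best_latency = p->exit_latency;
			best = i;
		}
	}
	return best;
}
```

With one cpu in C7-IVB (exit latency 87us), one running and one in C1-IVB
(exit latency 1us), the sketch picks the cpu in C1-IVB. A real implementation
would also have to weigh the load of each cpu.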
After some basic testing with hackbench, there appears to be a small
improvement in performance and a longer duration of the idle states (which
provides better power saving).
The measurements were taken with the 'idlestat' tool previously posted on this
mailing list.
So the patchset benefits both performance and power saving.
select_idle_sibling could also be improved in the same way.
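To put numbers on the claim, the relative change can be computed from the
figures reported in this mail. The helper below is only an illustrative sketch
(percent_change() is a name made up for this example); the report shows one
run per kernel, so the percentages are indicative only.

```c
/* Relative change from 'before' to 'after', in percent (negative = decrease). */
double percent_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

Applied to the hackbench "Time:" values (44.433 without the patchset, 42.179
with it), this gives a decrease of roughly 5%. Applied to the average C7-IVB
residency of cpu0 (74523.88us without, 127374.29us with), it gives an increase
of roughly 71%.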
====================== test with hackbench 3.14-rc8 =========================
/usr/bin/hackbench -l 10000 -s 4096
Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
Each sender will pass 10000 messages of 4096 bytes
Time: 44.433
Total trace buffer: 1846688 kB
clusterA@state hits total(us) avg(us) min(us) max(us)
POLL 0 0.00 0.00 0.00 0.00
C1-IVB 0 0.00 0.00 0.00 0.00
C1E-IVB 0 0.00 0.00 0.00 0.00
C3-IVB 0 0.00 0.00 0.00 0.00
C6-IVB 0 0.00 0.00 0.00 0.00
C7-IVB 0 0.00 0.00 0.00 0.00
core0@state hits total(us) avg(us) min(us) max(us)
POLL 0 0.00 0.00 0.00 0.00
C1-IVB 0 0.00 0.00 0.00 0.00
C1E-IVB 0 0.00 0.00 0.00 0.00
C3-IVB 0 0.00 0.00 0.00 0.00
C6-IVB 0 0.00 0.00 0.00 0.00
C7-IVB 1396 87932131.00 62988.63 0.00 320146.00
cpu0@state hits total(us) avg(us) min(us) max(us)
POLL 0 0.00 0.00 0.00 0.00
C1-IVB 1 14.00 14.00 14.00 14.00
C1E-IVB 0 0.00 0.00 0.00 0.00
C3-IVB 1 262.00 262.00 262.00 262.00
C6-IVB 0 0.00 0.00 0.00 0.00
C7-IVB 1180 87938177.00 74523.88 1.00 320147.00
1701 0 0.00 0.00 0.00 0.00
1700 0 0.00 0.00 0.00 0.00
1600 0 0.00 0.00 0.00 0.00
1500 0 0.00 0.00 0.00 0.00
1400 0 0.00 0.00 0.00 0.00
1300 0 0.00 0.00 0.00 0.00
1200 0 0.00 0.00 0.00 0.00
1100 0 0.00 0.00 0.00 0.00
1000 0 0.00 0.00 0.00 0.00
900 0 0.00 0.00 0.00 0.00
800 0 0.00 0.00 0.00 0.00
782 0 0.00 0.00 0.00 0.00
cpu0 wakeups name count
irq009 acpi 1
cpu1@state hits total(us) avg(us) min(us) max(us)
POLL 0 0.00 0.00 0.00 0.00
C1-IVB 0 0.00 0.00 0.00 0.00
C1E-IVB 0 0.00 0.00 0.00 0.00
C3-IVB 0 0.00 0.00 0.00 0.00
C6-IVB 0 0.00 0.00 0.00 0.00
C7-IVB 475 87941356.00 185139.70 322.00 1500690.00
1701 0 0.00 0.00 0.00 0.00
1700 0 0.00 0.00 0.00 0.00
1600 0 0.00 0.00 0.00 0.00
1500 0 0.00 0.00 0.00 0.00
1400 0 0.00 0.00 0.00 0.00
1300 0 0.00 0.00 0.00 0.00
1200 0 0.00 0.00 0.00 0.00
1100 0 0.00 0.00 0.00 0.00
1000 0 0.00 0.00 0.00 0.00
900 0 0.00 0.00 0.00 0.00
800 0 0.00 0.00 0.00 0.00
782 0 0.00 0.00 0.00 0.00
cpu1 wakeups name count
irq009 acpi 3
core1@state hits total(us) avg(us) min(us) max(us)
POLL 0 0.00 0.00 0.00 0.00
C1-IVB 0 0.00 0.00 0.00 0.00
C1E-IVB 0 0.00 0.00 0.00 0.00
C3-IVB 0 0.00 0.00 0.00 0.00
C6-IVB 0 0.00 0.00 0.00 0.00
C7-IVB 0 0.00 0.00 0.00 0.00
cpu2@state hits total(us) avg(us) min(us) max(us)
POLL 0 0.00 0.00 0.00 0.00
C1-IVB 11 288157.00 26196.09 16.00 200060.00
C1E-IVB 6 221601.00 36933.50 79.00 200066.00
C3-IVB 0 0.00 0.00 0.00 0.00
C6-IVB 0 0.00 0.00 0.00 0.00
C7-IVB 950 87417466.00 92018.39 19.00 200074.00
1701 0 0.00 0.00 0.00 0.00
1700 0 0.00 0.00 0.00 0.00
1600 0 0.00 0.00 0.00 0.00
1500 2 34.00 17.00 11.00 23.00
1400 0 0.00 0.00 0.00 0.00
1300 0 0.00 0.00 0.00 0.00
1200 0 0.00 0.00 0.00 0.00
1100 0 0.00 0.00 0.00 0.00
1000 0 0.00 0.00 0.00 0.00
900 0 0.00 0.00 0.00 0.00
800 0 0.00 0.00 0.00 0.00
782 745 18800.00 25.23 2.00 156.00
cpu2 wakeups name count
irq019 ahci 50
irq009 acpi 17
cpu3@state hits total(us) avg(us) min(us) max(us)
POLL 0 0.00 0.00 0.00 0.00
C1-IVB 0 0.00 0.00 0.00 0.00
C1E-IVB 0 0.00 0.00 0.00 0.00
C3-IVB 0 0.00 0.00 0.00 0.00
C6-IVB 0 0.00 0.00 0.00 0.00
C7-IVB 0 0.00 0.00 0.00 0.00
1701 0 0.00 0.00 0.00 0.00
1700 0 0.00 0.00 0.00 0.00
1600 0 0.00 0.00 0.00 0.00
1500 0 0.00 0.00 0.00 0.00
1400 0 0.00 0.00 0.00 0.00
1300 0 0.00 0.00 0.00 0.00
1200 0 0.00 0.00 0.00 0.00
1100 0 0.00 0.00 0.00 0.00
1000 0 0.00 0.00 0.00 0.00
900 0 0.00 0.00 0.00 0.00
800 0 0.00 0.00 0.00 0.00
782 0 0.00 0.00 0.00 0.00
cpu3 wakeups name count
================ test with hackbench 3.14-rc8 + patchset ====================
/usr/bin/hackbench -l 10000 -s 4096
Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
Each sender will pass 10000 messages of 4096 bytes
Time: 42.179
Total trace buffer: 1846688 kB
clusterA@state hits total(us) avg(us) min(us) max(us)
POLL 0 0.00 0.00 0.00 0.00
C1-IVB 0 0.00 0.00 0.00 0.00
C1E-IVB 0 0.00 0.00 0.00 0.00
C3-IVB 0 0.00 0.00 0.00 0.00
C6-IVB 0 0.00 0.00 0.00 0.00
C7-IVB 0 0.00 0.00 0.00 0.00
core0@state hits total(us) avg(us) min(us) max(us)
POLL 0 0.00 0.00 0.00 0.00
C1-IVB 0 0.00 0.00 0.00 0.00
C1E-IVB 0 0.00 0.00 0.00 0.00
C3-IVB 0 0.00 0.00 0.00 0.00
C6-IVB 0 0.00 0.00 0.00 0.00
C7-IVB 880 89157590.00 101315.44 0.00 400184.00
cpu0@state hits total(us) avg(us) min(us) max(us)
POLL 0 0.00 0.00 0.00 0.00
C1-IVB 0 0.00 0.00 0.00 0.00
C1E-IVB 1 233.00 233.00 233.00 233.00
C3-IVB 1 260.00 260.00 260.00 260.00
C6-IVB 0 0.00 0.00 0.00 0.00
C7-IVB 700 89162006.00 127374.29 182.00 400187.00
1701 0 0.00 0.00 0.00 0.00
1700 0 0.00 0.00 0.00 0.00
1600 0 0.00 0.00 0.00 0.00
1500 0 0.00 0.00 0.00 0.00
1400 0 0.00 0.00 0.00 0.00
1300 0 0.00 0.00 0.00 0.00
1200 0 0.00 0.00 0.00 0.00
1100 0 0.00 0.00 0.00 0.00
1000 0 0.00 0.00 0.00 0.00
900 0 0.00 0.00 0.00 0.00
800 0 0.00 0.00 0.00 0.00
782 0 0.00 0.00 0.00 0.00
cpu0 wakeups name count
irq009 acpi 2
cpu1@state hits total(us) avg(us) min(us) max(us)
POLL 0 0.00 0.00 0.00 0.00
C1-IVB 0 0.00 0.00 0.00 0.00
C1E-IVB 0 0.00 0.00 0.00 0.00
C3-IVB 0 0.00 0.00 0.00 0.00
C6-IVB 0 0.00 0.00 0.00 0.00
C7-IVB 334 89164805.00 266960.49 1.00 1500677.00
1701 0 0.00 0.00 0.00 0.00
1700 0 0.00 0.00 0.00 0.00
1600 0 0.00 0.00 0.00 0.00
1500 0 0.00 0.00 0.00 0.00
1400 0 0.00 0.00 0.00 0.00
1300 0 0.00 0.00 0.00 0.00
1200 0 0.00 0.00 0.00 0.00
1100 0 0.00 0.00 0.00 0.00
1000 0 0.00 0.00 0.00 0.00
900 0 0.00 0.00 0.00 0.00
800 0 0.00 0.00 0.00 0.00
782 0 0.00 0.00 0.00 0.00
cpu1 wakeups name count
irq009 acpi 6
core1@state hits total(us) avg(us) min(us) max(us)
POLL 0 0.00 0.00 0.00 0.00
C1-IVB 0 0.00 0.00 0.00 0.00
C1E-IVB 0 0.00 0.00 0.00 0.00
C3-IVB 0 0.00 0.00 0.00 0.00
C6-IVB 0 0.00 0.00 0.00 0.00
C7-IVB 0 0.00 0.00 0.00 0.00
cpu2@state hits total(us) avg(us) min(us) max(us)
POLL 0 0.00 0.00 0.00 0.00
C1-IVB 19 2169047.00 114160.37 18.00 999129.00
C1E-IVB 0 0.00 0.00 0.00 0.00
C3-IVB 0 0.00 0.00 0.00 0.00
C6-IVB 0 0.00 0.00 0.00 0.00
C7-IVB 376 86993307.00 231365.18 20.00 1500682.00
1701 0 0.00 0.00 0.00 0.00
1700 0 0.00 0.00 0.00 0.00
1600 0 0.00 0.00 0.00 0.00
1500 0 0.00 0.00 0.00 0.00
1400 0 0.00 0.00 0.00 0.00
1300 0 0.00 0.00 0.00 0.00
1200 0 0.00 0.00 0.00 0.00
1100 0 0.00 0.00 0.00 0.00
1000 0 0.00 0.00 0.00 0.00
900 0 0.00 0.00 0.00 0.00
800 0 0.00 0.00 0.00 0.00
782 0 0.00 0.00 0.00 0.00
cpu2 wakeups name count
irq009 acpi 32
irq019 ahci 45
cpu3@state hits total(us) avg(us) min(us) max(us)
POLL 0 0.00 0.00 0.00 0.00
C1-IVB 0 0.00 0.00 0.00 0.00
C1E-IVB 0 0.00 0.00 0.00 0.00
C3-IVB 0 0.00 0.00 0.00 0.00
C6-IVB 0 0.00 0.00 0.00 0.00
C7-IVB 0 0.00 0.00 0.00 0.00
1701 0 0.00 0.00 0.00 0.00
1700 0 0.00 0.00 0.00 0.00
1600 0 0.00 0.00 0.00 0.00
1500 0 0.00 0.00 0.00 0.00
1400 0 0.00 0.00 0.00 0.00
1300 0 0.00 0.00 0.00 0.00
1200 0 0.00 0.00 0.00 0.00
1100 0 0.00 0.00 0.00 0.00
1000 0 0.00 0.00 0.00 0.00
900 0 0.00 0.00 0.00 0.00
800 0 0.00 0.00 0.00 0.00
782 0 0.00 0.00 0.00 0.00
cpu3 wakeups name count
Daniel Lezcano (3):
cpuidle: encapsulate power info in a separate structure
idle: store the idle state the cpu is
sched/fair: use the idle state info to choose the idlest cpu
arch/arm/include/asm/cpuidle.h | 6 +-
arch/arm/mach-exynos/cpuidle.c | 4 +-
drivers/acpi/processor_idle.c | 4 +-
drivers/base/power/domain.c | 6 +-
drivers/cpuidle/cpuidle-at91.c | 4 +-
drivers/cpuidle/cpuidle-big_little.c | 9 +--
drivers/cpuidle/cpuidle-calxeda.c | 6 +-
drivers/cpuidle/cpuidle-kirkwood.c | 4 +-
drivers/cpuidle/cpuidle-powernv.c | 8 +--
drivers/cpuidle/cpuidle-pseries.c | 12 ++--
drivers/cpuidle/cpuidle-ux500.c | 14 ++---
drivers/cpuidle/cpuidle-zynq.c | 4 +-
drivers/cpuidle/driver.c | 6 +-
drivers/cpuidle/governors/ladder.c | 14 +++--
drivers/cpuidle/governors/menu.c | 8 +--
drivers/cpuidle/sysfs.c | 2 +-
drivers/idle/intel_idle.c | 112 +++++++++++++++++-----------------
include/linux/cpuidle.h | 10 ++-
kernel/sched/fair.c | 46 ++++++++++++--
kernel/sched/idle.c | 17 +++++-
kernel/sched/sched.h | 5 ++
21 files changed, 180 insertions(+), 121 deletions(-)
--
1.7.9.5
* [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure
2014-03-28 12:29 [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Daniel Lezcano
@ 2014-03-28 12:29 ` Daniel Lezcano
2014-03-28 18:17 ` Nicolas Pitre
2014-03-28 12:29 ` [RFC PATCHC 2/3] idle: store the idle state the cpu is Daniel Lezcano
` (4 subsequent siblings)
5 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-03-28 12:29 UTC (permalink / raw)
To: linux-kernel, mingo, peterz
Cc: rjw, nicolas.pitre, linux-pm, alex.shi, vincent.guittot,
morten.rasmussen
The scheduler needs some information from cpuidle to know the timings of the
specific idle state a cpu is in.
This patch creates a separate structure to group the cpuidle power info in
order to share it with the scheduler. It also improves the encapsulation of
the code.
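A minimal standalone sketch of the resulting layout, assuming a reduced
stand-in for struct cpuidle_state (idle_state_sketch and state_power() are
hypothetical names used only for this example), shows how the nested
designators are written and how a consumer can be handed the power info alone:

```c
/* Same three fields the patch moves into struct cpuidle_power. */
struct cpuidle_power {
	unsigned int exit_latency;     /* in us */
	unsigned int target_residency; /* in us */
	int power_usage;               /* in mW */
};

/* Reduced stand-in for struct cpuidle_state: only two members are kept. */
struct idle_state_sketch {
	const char *name;
	struct cpuidle_power power;
};

/* Nested designated initializers, as used by the converted drivers. */
static const struct idle_state_sketch ram_sr = {
	.name = "RAM_SR",
	.power.exit_latency = 10,
	.power.target_residency = 10000,
};

/* A consumer only needs the power sub-structure, not the whole state. */
const struct cpuidle_power *state_power(const struct idle_state_sketch *s)
{
	return &s->power;
}
```

Members that are not named in the initializer, such as power_usage here, are
zero-initialized as usual in C.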
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
arch/arm/include/asm/cpuidle.h | 6 +-
arch/arm/mach-exynos/cpuidle.c | 4 +-
drivers/acpi/processor_idle.c | 4 +-
drivers/base/power/domain.c | 6 +-
drivers/cpuidle/cpuidle-at91.c | 4 +-
drivers/cpuidle/cpuidle-big_little.c | 9 +--
drivers/cpuidle/cpuidle-calxeda.c | 6 +-
drivers/cpuidle/cpuidle-kirkwood.c | 4 +-
drivers/cpuidle/cpuidle-powernv.c | 8 +--
drivers/cpuidle/cpuidle-pseries.c | 12 ++--
drivers/cpuidle/cpuidle-ux500.c | 14 ++---
drivers/cpuidle/cpuidle-zynq.c | 4 +-
drivers/cpuidle/driver.c | 6 +-
drivers/cpuidle/governors/ladder.c | 14 +++--
 drivers/cpuidle/governors/menu.c | 8 +--
drivers/cpuidle/sysfs.c | 2 +-
drivers/idle/intel_idle.c | 112 +++++++++++++++++-----------------
include/linux/cpuidle.h | 10 ++-
18 files changed, 120 insertions(+), 113 deletions(-)
diff --git a/arch/arm/include/asm/cpuidle.h b/arch/arm/include/asm/cpuidle.h
index 2fca60a..987ee53 100644
--- a/arch/arm/include/asm/cpuidle.h
+++ b/arch/arm/include/asm/cpuidle.h
@@ -12,9 +12,9 @@ static inline int arm_cpuidle_simple_enter(struct cpuidle_device *dev,
/* Common ARM WFI state */
#define ARM_CPUIDLE_WFI_STATE_PWR(p) {\
.enter = arm_cpuidle_simple_enter,\
- .exit_latency = 1,\
- .target_residency = 1,\
- .power_usage = p,\
+ .power.exit_latency = 1,\
+ .power.target_residency = 1,\
+ .power.power_usage = p,\
.flags = CPUIDLE_FLAG_TIME_VALID,\
.name = "WFI",\
.desc = "ARM WFI",\
diff --git a/arch/arm/mach-exynos/cpuidle.c b/arch/arm/mach-exynos/cpuidle.c
index f57cb91..f6275cb 100644
--- a/arch/arm/mach-exynos/cpuidle.c
+++ b/arch/arm/mach-exynos/cpuidle.c
@@ -73,8 +73,8 @@ static struct cpuidle_driver exynos4_idle_driver = {
[0] = ARM_CPUIDLE_WFI_STATE,
[1] = {
.enter = exynos4_enter_lowpower,
- .exit_latency = 300,
- .target_residency = 100000,
+ .power.exit_latency = 300,
+ .power.target_residency = 100000,
.flags = CPUIDLE_FLAG_TIME_VALID,
.name = "C1",
.desc = "ARM power down",
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 3dca36d..05fa991 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -979,8 +979,8 @@ static int acpi_processor_setup_cpuidle_states(struct acpi_processor *pr)
state = &drv->states[count];
snprintf(state->name, CPUIDLE_NAME_LEN, "C%d", i);
strncpy(state->desc, cx->desc, CPUIDLE_DESC_LEN);
- state->exit_latency = cx->latency;
- state->target_residency = cx->latency * latency_factor;
+ state->power.exit_latency = cx->latency;
+ state->power.target_residency = cx->latency * latency_factor;
state->flags = 0;
switch (cx->type) {
diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index bfb8955..6bcb1e8 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -154,7 +154,7 @@ static void genpd_recalc_cpu_exit_latency(struct generic_pm_domain *genpd)
usecs64 = genpd->power_on_latency_ns;
do_div(usecs64, NSEC_PER_USEC);
usecs64 += genpd->cpu_data->saved_exit_latency;
- genpd->cpu_data->idle_state->exit_latency = usecs64;
+ genpd->cpu_data->idle_state->power.exit_latency = usecs64;
}
/**
@@ -1882,7 +1882,7 @@ int pm_genpd_attach_cpuidle(struct generic_pm_domain *genpd, int state)
goto err;
}
cpu_data->idle_state = idle_state;
- cpu_data->saved_exit_latency = idle_state->exit_latency;
+ cpu_data->saved_exit_latency = idle_state->power.exit_latency;
genpd->cpu_data = cpu_data;
genpd_recalc_cpu_exit_latency(genpd);
@@ -1936,7 +1936,7 @@ int pm_genpd_detach_cpuidle(struct generic_pm_domain *genpd)
ret = -EAGAIN;
goto out;
}
- idle_state->exit_latency = cpu_data->saved_exit_latency;
+ idle_state->power.exit_latency = cpu_data->saved_exit_latency;
cpuidle_driver_unref();
genpd->cpu_data = NULL;
kfree(cpu_data);
diff --git a/drivers/cpuidle/cpuidle-at91.c b/drivers/cpuidle/cpuidle-at91.c
index a077437..48c7063 100644
--- a/drivers/cpuidle/cpuidle-at91.c
+++ b/drivers/cpuidle/cpuidle-at91.c
@@ -40,9 +40,9 @@ static struct cpuidle_driver at91_idle_driver = {
.owner = THIS_MODULE,
.states[0] = ARM_CPUIDLE_WFI_STATE,
.states[1] = {
+ .power.exit_latency = 10,
+ .power.target_residency = 10000,
.enter = at91_enter_idle,
- .exit_latency = 10,
- .target_residency = 10000,
.flags = CPUIDLE_FLAG_TIME_VALID,
.name = "RAM_SR",
.desc = "WFI and DDR Self Refresh",
diff --git a/drivers/cpuidle/cpuidle-big_little.c b/drivers/cpuidle/cpuidle-big_little.c
index b45fc62..5a0af4b 100644
--- a/drivers/cpuidle/cpuidle-big_little.c
+++ b/drivers/cpuidle/cpuidle-big_little.c
@@ -62,9 +62,9 @@ static struct cpuidle_driver bl_idle_little_driver = {
.owner = THIS_MODULE,
.states[0] = ARM_CPUIDLE_WFI_STATE,
.states[1] = {
+ .power.exit_latency = 700,
+ .power.target_residency = 2500,
.enter = bl_enter_powerdown,
- .exit_latency = 700,
- .target_residency = 2500,
.flags = CPUIDLE_FLAG_TIME_VALID |
CPUIDLE_FLAG_TIMER_STOP,
.name = "C1",
@@ -78,9 +78,10 @@ static struct cpuidle_driver bl_idle_big_driver = {
.owner = THIS_MODULE,
.states[0] = ARM_CPUIDLE_WFI_STATE,
.states[1] = {
+
+ .power.exit_latency = 500,
+ .power.target_residency = 2000,
.enter = bl_enter_powerdown,
- .exit_latency = 500,
- .target_residency = 2000,
.flags = CPUIDLE_FLAG_TIME_VALID |
CPUIDLE_FLAG_TIMER_STOP,
.name = "C1",
diff --git a/drivers/cpuidle/cpuidle-calxeda.c b/drivers/cpuidle/cpuidle-calxeda.c
index 6e51114..8357a20 100644
--- a/drivers/cpuidle/cpuidle-calxeda.c
+++ b/drivers/cpuidle/cpuidle-calxeda.c
@@ -56,9 +56,9 @@ static struct cpuidle_driver calxeda_idle_driver = {
.name = "PG",
.desc = "Power Gate",
.flags = CPUIDLE_FLAG_TIME_VALID,
- .exit_latency = 30,
- .power_usage = 50,
- .target_residency = 200,
+ .power.exit_latency = 30,
+ .power.power_usage = 50,
+ .power.target_residency = 200,
.enter = calxeda_pwrdown_idle,
},
},
diff --git a/drivers/cpuidle/cpuidle-kirkwood.c b/drivers/cpuidle/cpuidle-kirkwood.c
index 41ba843..0ae4138 100644
--- a/drivers/cpuidle/cpuidle-kirkwood.c
+++ b/drivers/cpuidle/cpuidle-kirkwood.c
@@ -44,9 +44,9 @@ static struct cpuidle_driver kirkwood_idle_driver = {
.owner = THIS_MODULE,
.states[0] = ARM_CPUIDLE_WFI_STATE,
.states[1] = {
+ .power.exit_latency = 10,
+ .power.target_residency = 100000,
.enter = kirkwood_enter_idle,
- .exit_latency = 10,
- .target_residency = 100000,
.flags = CPUIDLE_FLAG_TIME_VALID,
.name = "DDR SR",
.desc = "WFI and DDR Self Refresh",
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index f48607c..c47cc02 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -62,15 +62,15 @@ static struct cpuidle_state powernv_states[] = {
.name = "snooze",
.desc = "snooze",
.flags = CPUIDLE_FLAG_TIME_VALID,
- .exit_latency = 0,
- .target_residency = 0,
+ .power.exit_latency = 0,
+ .power.target_residency = 0,
.enter = &snooze_loop },
{ /* NAP */
.name = "NAP",
.desc = "NAP",
.flags = CPUIDLE_FLAG_TIME_VALID,
- .exit_latency = 10,
- .target_residency = 100,
+ .power.exit_latency = 10,
+ .power.target_residency = 100,
.enter = &nap_loop },
};
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 6f7b019..483d7e7 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -143,15 +143,15 @@ static struct cpuidle_state dedicated_states[] = {
.name = "snooze",
.desc = "snooze",
.flags = CPUIDLE_FLAG_TIME_VALID,
- .exit_latency = 0,
- .target_residency = 0,
+ .power.exit_latency = 0,
+ .power.target_residency = 0,
.enter = &snooze_loop },
{ /* CEDE */
.name = "CEDE",
.desc = "CEDE",
.flags = CPUIDLE_FLAG_TIME_VALID,
- .exit_latency = 10,
- .target_residency = 100,
+ .power.exit_latency = 10,
+ .power.target_residency = 100,
.enter = &dedicated_cede_loop },
};
@@ -163,8 +163,8 @@ static struct cpuidle_state shared_states[] = {
.name = "Shared Cede",
.desc = "Shared Cede",
.flags = CPUIDLE_FLAG_TIME_VALID,
- .exit_latency = 0,
- .target_residency = 0,
+ .power.exit_latency = 0,
+ .power.target_residency = 0,
.enter = &shared_cede_loop },
};
diff --git a/drivers/cpuidle/cpuidle-ux500.c b/drivers/cpuidle/cpuidle-ux500.c
index 5e35804..3261eb2 100644
--- a/drivers/cpuidle/cpuidle-ux500.c
+++ b/drivers/cpuidle/cpuidle-ux500.c
@@ -98,13 +98,13 @@ static struct cpuidle_driver ux500_idle_driver = {
.states = {
ARM_CPUIDLE_WFI_STATE,
{
- .enter = ux500_enter_idle,
- .exit_latency = 70,
- .target_residency = 260,
- .flags = CPUIDLE_FLAG_TIME_VALID |
- CPUIDLE_FLAG_TIMER_STOP,
- .name = "ApIdle",
- .desc = "ARM Retention",
+ .power.exit_latency = 70,
+ .power.target_residency = 260,
+ .enter = ux500_enter_idle,
+ .flags = CPUIDLE_FLAG_TIME_VALID |
+ CPUIDLE_FLAG_TIMER_STOP,
+ .name = "ApIdle",
+ .desc = "ARM Retention",
},
},
.safe_state_index = 0,
diff --git a/drivers/cpuidle/cpuidle-zynq.c b/drivers/cpuidle/cpuidle-zynq.c
index aded759..dddefb8 100644
--- a/drivers/cpuidle/cpuidle-zynq.c
+++ b/drivers/cpuidle/cpuidle-zynq.c
@@ -56,9 +56,9 @@ static struct cpuidle_driver zynq_idle_driver = {
.states = {
ARM_CPUIDLE_WFI_STATE,
{
+ .power.exit_latency = 10,
+ .power.target_residency = 10000,
.enter = zynq_enter_idle,
- .exit_latency = 10,
- .target_residency = 10000,
.flags = CPUIDLE_FLAG_TIME_VALID |
CPUIDLE_FLAG_TIMER_STOP,
.name = "RAM_SR",
diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index 06dbe7c..40ddd3c 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -206,9 +206,9 @@ static void poll_idle_init(struct cpuidle_driver *drv)
snprintf(state->name, CPUIDLE_NAME_LEN, "POLL");
snprintf(state->desc, CPUIDLE_DESC_LEN, "CPUIDLE CORE POLL IDLE");
- state->exit_latency = 0;
- state->target_residency = 0;
- state->power_usage = -1;
+ state->power.exit_latency = 0;
+ state->power.target_residency = 0;
+ state->power.power_usage = -1;
state->flags = 0;
state->enter = poll_idle;
state->disabled = false;
diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
index 9f08e8c..4837880 100644
--- a/drivers/cpuidle/governors/ladder.c
+++ b/drivers/cpuidle/governors/ladder.c
@@ -81,7 +81,7 @@ static int ladder_select_state(struct cpuidle_driver *drv,
if (drv->states[last_idx].flags & CPUIDLE_FLAG_TIME_VALID) {
last_residency = cpuidle_get_last_residency(dev) - \
- drv->states[last_idx].exit_latency;
+ drv->states[last_idx].power.exit_latency;
}
else
last_residency = last_state->threshold.promotion_time + 1;
@@ -91,7 +91,7 @@ static int ladder_select_state(struct cpuidle_driver *drv,
!drv->states[last_idx + 1].disabled &&
!dev->states_usage[last_idx + 1].disable &&
last_residency > last_state->threshold.promotion_time &&
- drv->states[last_idx + 1].exit_latency <= latency_req) {
+ drv->states[last_idx + 1].power.exit_latency <= latency_req) {
last_state->stats.promotion_count++;
last_state->stats.demotion_count = 0;
if (last_state->stats.promotion_count >= last_state->threshold.promotion_count) {
@@ -104,11 +104,11 @@ static int ladder_select_state(struct cpuidle_driver *drv,
if (last_idx > CPUIDLE_DRIVER_STATE_START &&
(drv->states[last_idx].disabled ||
dev->states_usage[last_idx].disable ||
- drv->states[last_idx].exit_latency > latency_req)) {
+ drv->states[last_idx].power.exit_latency > latency_req)) {
int i;
for (i = last_idx - 1; i > CPUIDLE_DRIVER_STATE_START; i--) {
- if (drv->states[i].exit_latency <= latency_req)
+ if (drv->states[i].power.exit_latency <= latency_req)
break;
}
ladder_do_selection(ldev, last_idx, i);
@@ -155,9 +155,11 @@ static int ladder_enable_device(struct cpuidle_driver *drv,
lstate->threshold.demotion_count = DEMOTION_COUNT;
if (i < drv->state_count - 1)
- lstate->threshold.promotion_time = state->exit_latency;
+ lstate->threshold.promotion_time =
+ state->power.exit_latency;
if (i > 0)
- lstate->threshold.demotion_time = state->exit_latency;
+ lstate->threshold.demotion_time =
+ state->power.exit_latency;
}
return 0;
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index cf7f2f0..34bd463 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -351,15 +351,15 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
if (s->disabled || su->disable)
continue;
- if (s->target_residency > data->predicted_us)
+ if (s->power.target_residency > data->predicted_us)
continue;
- if (s->exit_latency > latency_req)
+ if (s->power.exit_latency > latency_req)
continue;
- if (s->exit_latency * multiplier > data->predicted_us)
+ if (s->power.exit_latency * multiplier > data->predicted_us)
continue;
data->last_state_idx = i;
- data->exit_us = s->exit_latency;
+ data->exit_us = s->power.exit_latency;
}
return data->last_state_idx;
diff --git a/drivers/cpuidle/sysfs.c b/drivers/cpuidle/sysfs.c
index e918b6d..1a45541 100644
--- a/drivers/cpuidle/sysfs.c
+++ b/drivers/cpuidle/sysfs.c
@@ -252,7 +252,7 @@ static struct cpuidle_state_attr attr_##_name = __ATTR(_name, 0644, show, store)
static ssize_t show_state_##_name(struct cpuidle_state *state, \
struct cpuidle_state_usage *state_usage, char *buf) \
{ \
- return sprintf(buf, "%u\n", state->_name);\
+ return sprintf(buf, "%u\n", state->power._name);\
}
#define define_store_state_ull_function(_name) \
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 8e1939f..4f0533e 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -128,29 +128,29 @@ static struct cpuidle_state nehalem_cstates[] = {
.name = "C1-NHM",
.desc = "MWAIT 0x00",
.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
- .exit_latency = 3,
- .target_residency = 6,
+ .power.exit_latency = 3,
+ .power.target_residency = 6,
.enter = &intel_idle },
{
.name = "C1E-NHM",
.desc = "MWAIT 0x01",
.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
- .exit_latency = 10,
- .target_residency = 20,
+ .power.exit_latency = 10,
+ .power.target_residency = 20,
.enter = &intel_idle },
{
.name = "C3-NHM",
.desc = "MWAIT 0x10",
.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
- .exit_latency = 20,
- .target_residency = 80,
+ .power.exit_latency = 20,
+ .power.target_residency = 80,
.enter = &intel_idle },
{
.name = "C6-NHM",
.desc = "MWAIT 0x20",
.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
- .exit_latency = 200,
- .target_residency = 800,
+ .power.exit_latency = 200,
+ .power.target_residency = 800,
.enter = &intel_idle },
{
.enter = NULL }
@@ -161,36 +161,36 @@ static struct cpuidle_state snb_cstates[] = {
.name = "C1-SNB",
.desc = "MWAIT 0x00",
.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
- .exit_latency = 2,
- .target_residency = 2,
+ .power.exit_latency = 2,
+ .power.target_residency = 2,
.enter = &intel_idle },
{
.name = "C1E-SNB",
.desc = "MWAIT 0x01",
.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
- .exit_latency = 10,
- .target_residency = 20,
+ .power.exit_latency = 10,
+ .power.target_residency = 20,
.enter = &intel_idle },
{
.name = "C3-SNB",
.desc = "MWAIT 0x10",
.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
- .exit_latency = 80,
- .target_residency = 211,
+ .power.exit_latency = 80,
+ .power.target_residency = 211,
.enter = &intel_idle },
{
.name = "C6-SNB",
.desc = "MWAIT 0x20",
.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
- .exit_latency = 104,
- .target_residency = 345,
+ .power.exit_latency = 104,
+ .power.target_residency = 345,
.enter = &intel_idle },
{
.name = "C7-SNB",
.desc = "MWAIT 0x30",
.flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
- .exit_latency = 109,
- .target_residency = 345,
+ .power.exit_latency = 109,
+ .power.target_residency = 345,
.enter = &intel_idle },
{
.enter = NULL }
@@ -201,36 +201,36 @@ static struct cpuidle_state ivb_cstates[] = {
.name = "C1-IVB",
.desc = "MWAIT 0x00",
.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
- .exit_latency = 1,
- .target_residency = 1,
+ .power.exit_latency = 1,
+ .power.target_residency = 1,
.enter = &intel_idle },
{
.name = "C1E-IVB",
.desc = "MWAIT 0x01",
.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
- .exit_latency = 10,
- .target_residency = 20,
+ .power.exit_latency = 10,
+ .power.target_residency = 20,
.enter = &intel_idle },
{
.name = "C3-IVB",
.desc = "MWAIT 0x10",
.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
- .exit_latency = 59,
- .target_residency = 156,
+ .power.exit_latency = 59,
+ .power.target_residency = 156,
.enter = &intel_idle },
{
.name = "C6-IVB",
.desc = "MWAIT 0x20",
.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
- .exit_latency = 80,
- .target_residency = 300,
+ .power.exit_latency = 80,
+ .power.target_residency = 300,
.enter = &intel_idle },
{
.name = "C7-IVB",
.desc = "MWAIT 0x30",
.flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
- .exit_latency = 87,
- .target_residency = 300,
+ .power.exit_latency = 87,
+ .power.target_residency = 300,
.enter = &intel_idle },
{
.enter = NULL }
@@ -241,57 +241,57 @@ static struct cpuidle_state hsw_cstates[] = {
.name = "C1-HSW",
.desc = "MWAIT 0x00",
.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
- .exit_latency = 2,
- .target_residency = 2,
+ .power.exit_latency = 2,
+ .power.target_residency = 2,
.enter = &intel_idle },
{
.name = "C1E-HSW",
.desc = "MWAIT 0x01",
.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
- .exit_latency = 10,
- .target_residency = 20,
+ .power.exit_latency = 10,
+ .power.target_residency = 20,
.enter = &intel_idle },
{
.name = "C3-HSW",
.desc = "MWAIT 0x10",
.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
- .exit_latency = 33,
- .target_residency = 100,
+ .power.exit_latency = 33,
+ .power.target_residency = 100,
.enter = &intel_idle },
{
.name = "C6-HSW",
.desc = "MWAIT 0x20",
.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
- .exit_latency = 133,
- .target_residency = 400,
+ .power.exit_latency = 133,
+ .power.target_residency = 400,
.enter = &intel_idle },
{
.name = "C7s-HSW",
.desc = "MWAIT 0x32",
.flags = MWAIT2flg(0x32) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
- .exit_latency = 166,
- .target_residency = 500,
+ .power.exit_latency = 166,
+ .power.target_residency = 500,
.enter = &intel_idle },
{
.name = "C8-HSW",
.desc = "MWAIT 0x40",
.flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
- .exit_latency = 300,
- .target_residency = 900,
+ .power.exit_latency = 300,
+ .power.target_residency = 900,
.enter = &intel_idle },
{
.name = "C9-HSW",
.desc = "MWAIT 0x50",
.flags = MWAIT2flg(0x50) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
- .exit_latency = 600,
- .target_residency = 1800,
+ .power.exit_latency = 600,
+ .power.target_residency = 1800,
.enter = &intel_idle },
{
.name = "C10-HSW",
.desc = "MWAIT 0x60",
.flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
- .exit_latency = 2600,
- .target_residency = 7700,
+ .power.exit_latency = 2600,
+ .power.target_residency = 7700,
.enter = &intel_idle },
{
.enter = NULL }
@@ -302,29 +302,29 @@ static struct cpuidle_state atom_cstates[] = {
.name = "C1E-ATM",
.desc = "MWAIT 0x00",
.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
- .exit_latency = 10,
- .target_residency = 20,
+ .power.exit_latency = 10,
+ .power.target_residency = 20,
.enter = &intel_idle },
{
.name = "C2-ATM",
.desc = "MWAIT 0x10",
.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID,
- .exit_latency = 20,
- .target_residency = 80,
+ .power.exit_latency = 20,
+ .power.target_residency = 80,
.enter = &intel_idle },
{
.name = "C4-ATM",
.desc = "MWAIT 0x30",
.flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
- .exit_latency = 100,
- .target_residency = 400,
+ .power.exit_latency = 100,
+ .power.target_residency = 400,
.enter = &intel_idle },
{
.name = "C6-ATM",
.desc = "MWAIT 0x52",
.flags = MWAIT2flg(0x52) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
- .exit_latency = 140,
- .target_residency = 560,
+ .power.exit_latency = 140,
+ .power.target_residency = 560,
.enter = &intel_idle },
{
.enter = NULL }
@@ -334,15 +334,15 @@ static struct cpuidle_state avn_cstates[] = {
.name = "C1-AVN",
.desc = "MWAIT 0x00",
.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
- .exit_latency = 2,
- .target_residency = 2,
+ .power.exit_latency = 2,
+ .power.target_residency = 2,
.enter = &intel_idle },
{
.name = "C6-AVN",
.desc = "MWAIT 0x51",
.flags = MWAIT2flg(0x51) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
- .exit_latency = 15,
- .target_residency = 45,
+ .power.exit_latency = 15,
+ .power.target_residency = 45,
.enter = &intel_idle },
{
.enter = NULL }
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index b0238cb..eb58ab3 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -35,14 +35,18 @@ struct cpuidle_state_usage {
unsigned long long time; /* in US */
};
+struct cpuidle_power {
+ unsigned int exit_latency; /* in US */
+ unsigned int target_residency; /* in US */
+ int power_usage; /* in mW */
+};
+
struct cpuidle_state {
char name[CPUIDLE_NAME_LEN];
char desc[CPUIDLE_DESC_LEN];
unsigned int flags;
- unsigned int exit_latency; /* in US */
- int power_usage; /* in mW */
- unsigned int target_residency; /* in US */
+ struct cpuidle_power power;
bool disabled; /* disabled on all CPUs */
int (*enter) (struct cpuidle_device *dev,
--
1.7.9.5
* [RFC PATCHC 2/3] idle: store the idle state the cpu is
2014-03-28 12:29 [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Daniel Lezcano
2014-03-28 12:29 ` [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure Daniel Lezcano
@ 2014-03-28 12:29 ` Daniel Lezcano
2014-04-15 12:43 ` Peter Zijlstra
2014-03-28 12:29 ` [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu Daniel Lezcano
` (3 subsequent siblings)
5 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-03-28 12:29 UTC (permalink / raw)
To: linux-kernel, mingo, peterz
Cc: rjw, nicolas.pitre, linux-pm, alex.shi, vincent.guittot,
morten.rasmussen
When the cpu enters idle, it stores a pointer to the cpuidle power info of
the selected state in the struct rq, which in turn can be used to make the
right decision when balancing a task.
As soon as the cpu exits the idle state, the pointer is reset to NULL.
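The publish/clear protocol described above can be modelled in userspace as
follows. This is only a sketch under stated assumptions: struct rq_model,
idle_enter(), idle_exit() and exit_latency_of() are hypothetical names, and
C11 atomics stand in for the kernel's barriers. It is not the code of this
patch.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Same fields as the struct cpuidle_power introduced in patch 1/3. */
struct cpuidle_power {
	unsigned int exit_latency;     /* in us */
	unsigned int target_residency; /* in us */
	int power_usage;               /* in mW */
};

/* Hypothetical stand-in for struct rq, reduced to the new member. */
struct rq_model {
	/* NULL while the CPU runs, else the idle state it sleeps in. */
	_Atomic(struct cpuidle_power *) power;
};

/* Publish the chosen state just before the CPU goes idle. */
static void idle_enter(struct rq_model *rq, struct cpuidle_power *state)
{
	atomic_store_explicit(&rq->power, state, memory_order_release);
}

/* Clear the pointer as soon as the CPU wakes up. */
static void idle_exit(struct rq_model *rq)
{
	atomic_store_explicit(&rq->power, NULL, memory_order_release);
}

/*
 * Remote reader: load the pointer once and test the local copy, so a
 * concurrent idle_exit() cannot turn it into NULL between test and use.
 * Returns 0 when the CPU is not in a cpuidle state.
 */
static unsigned int exit_latency_of(struct rq_model *rq)
{
	struct cpuidle_power *p =
		atomic_load_explicit(&rq->power, memory_order_acquire);

	return p ? p->exit_latency : 0;
}
```

The point of the sketch is the reader side: whoever inspects the pointer from
another cpu has to take a single snapshot of it, because the owning cpu may
wake up at any time.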
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
kernel/sched/idle.c | 17 +++++++++++++++--
kernel/sched/sched.h | 5 +++++
2 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 8f4390a..5c32c11 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -12,6 +12,8 @@
#include <trace/events/power.h>
+#include "sched.h"
+
static int __read_mostly cpu_idle_force_poll;
void cpu_idle_poll_ctrl(bool enable)
@@ -69,7 +71,7 @@ void __weak arch_cpu_idle(void)
* NOTE: no locks or semaphores should be used here
* return non-zero on failure
*/
-static int cpuidle_idle_call(void)
+static int cpuidle_idle_call(struct cpuidle_power **power)
{
struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
@@ -143,6 +145,10 @@ static int cpuidle_idle_call(void)
if (!ret) {
trace_cpu_idle_rcuidle(next_state, dev->cpu);
+ *power = &drv->states[next_state].power;
+
+ wmb();
+
/*
* Enter the idle state previously
* returned by the governor
@@ -154,6 +160,10 @@ static int cpuidle_idle_call(void)
entered_state = cpuidle_enter(drv, dev,
next_state);
+ *power = NULL;
+
+ wmb();
+
trace_cpu_idle_rcuidle(PWR_EVENT_EXIT,
dev->cpu);
@@ -198,6 +208,9 @@ static int cpuidle_idle_call(void)
*/
static void cpu_idle_loop(void)
{
+ struct rq *rq = this_rq();
+ struct cpuidle_power **power = &rq->power;
+
while (1) {
tick_nohz_idle_enter();
@@ -223,7 +236,7 @@ static void cpu_idle_loop(void)
if (cpu_idle_force_poll || tick_check_broadcast_expired())
cpu_idle_poll();
else
- cpuidle_idle_call();
+ cpuidle_idle_call(power);
arch_cpu_idle_exit();
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1929deb..1bcac35 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -14,6 +14,7 @@
#include "cpuacct.h"
struct rq;
+struct cpuidle_power;
extern __read_mostly int scheduler_running;
@@ -632,6 +633,10 @@ struct rq {
#ifdef CONFIG_SMP
struct llist_head wake_list;
#endif
+
+#ifdef CONFIG_CPU_IDLE
+ struct cpuidle_power *power;
+#endif
};
static inline int cpu_of(struct rq *rq)
--
1.7.9.5
* [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
2014-03-28 12:29 [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Daniel Lezcano
2014-03-28 12:29 ` [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure Daniel Lezcano
2014-03-28 12:29 ` [RFC PATCHC 2/3] idle: store the idle state the cpu is Daniel Lezcano
@ 2014-03-28 12:29 ` Daniel Lezcano
2014-04-02 3:05 ` Nicolas Pitre
2014-04-15 13:03 ` Peter Zijlstra
2014-03-31 13:52 ` [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Vincent Guittot
` (2 subsequent siblings)
5 siblings, 2 replies; 47+ messages in thread
From: Daniel Lezcano @ 2014-03-28 12:29 UTC (permalink / raw)
To: linux-kernel, mingo, peterz
Cc: rjw, nicolas.pitre, linux-pm, alex.shi, vincent.guittot,
morten.rasmussen
As we know which idle state the cpu is in, we can investigate the following:
1. when did the cpu enter the idle state? The longer the cpu has been idle,
the deeper its idle state
2. what is the exit latency? The greater the exit latency, the deeper the
idle state
With both pieces of information, when all cpus are idle, we can choose the
idlest cpu.
When one cpu is not idle, the old check against weighted load applies.
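One possible reading of the two criteria above, as a standalone sketch. The
assumptions are mine and not taken from the diff: idle cpus are preferred
over busy ones, the smallest exit latency wins among idle cpus, and the most
recent idle_stamp breaks ties. struct cpu_snapshot and pick_cpu() are
hypothetical names used only for illustration.

```c
#include <limits.h>
#include <stddef.h>

struct cpuidle_power {
	unsigned int exit_latency; /* in us */
};

/* Hypothetical per-CPU snapshot; the patch reads this from struct rq. */
struct cpu_snapshot {
	int idle;                          /* non-zero if the CPU is idle */
	unsigned long load;                /* weighted load, busy CPUs only */
	unsigned long long idle_stamp;     /* time the CPU went idle */
	const struct cpuidle_power *power; /* NULL if no idle state is known */
};

/*
 * Return the index of the preferred CPU, or -1 if n == 0.
 * Idle CPUs win over busy ones; among idle CPUs the smallest exit
 * latency wins and a later idle_stamp breaks ties. Busy CPUs are
 * compared on load only.
 */
static int pick_cpu(const struct cpu_snapshot *cpus, int n)
{
	unsigned long min_load = ULONG_MAX;
	unsigned int min_latency = UINT_MAX;
	unsigned long long latest_stamp = 0;
	int best_idle = -1, best_busy = -1;

	for (int i = 0; i < n; i++) {
		const struct cpu_snapshot *c = &cpus[i];

		if (c->idle) {
			/* Assumption: an unknown state costs nothing to exit. */
			unsigned int lat = c->power ? c->power->exit_latency : 0;

			if (best_idle < 0 || lat < min_latency ||
			    (lat == min_latency && c->idle_stamp > latest_stamp)) {
				min_latency = lat;
				latest_stamp = c->idle_stamp;
				best_idle = i;
			}
		} else if (best_busy < 0 || c->load < min_load) {
			min_load = c->load;
			best_busy = i;
		}
	}

	return best_idle >= 0 ? best_idle : best_busy;
}
```

Exit latency is used as the primary key here because it directly measures
the cost of waking the cpu, while the idle timestamp is only a hint about
how deep the state might be.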
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
kernel/sched/fair.c | 46 ++++++++++++++++++++++++++++++++++++++++------
1 file changed, 40 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 16042b5..068e503 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -23,6 +23,7 @@
#include <linux/latencytop.h>
#include <linux/sched.h>
#include <linux/cpumask.h>
+#include <linux/cpuidle.h>
#include <linux/slab.h>
#include <linux/profile.h>
#include <linux/interrupt.h>
@@ -4336,20 +4337,53 @@ static int
find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
{
unsigned long load, min_load = ULONG_MAX;
- int idlest = -1;
+ unsigned int min_exit_latency = UINT_MAX;
+ u64 idle_stamp, min_idle_stamp = ULONG_MAX;
+
+ struct rq *rq;
+ struct cpuidle_power *power;
+
+ int cpu_idle = -1;
+ int cpu_busy = -1;
int i;
/* Traverse only the allowed CPUs */
for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
- load = weighted_cpuload(i);
- if (load < min_load || (load == min_load && i == this_cpu)) {
- min_load = load;
- idlest = i;
+ if (idle_cpu(i)) {
+
+ rq = cpu_rq(i);
+ power = rq->power;
+ idle_stamp = rq->idle_stamp;
+
+ /* The cpu is idle since a shorter time */
+ if (idle_stamp < min_idle_stamp) {
+ min_idle_stamp = idle_stamp;
+ cpu_idle = i;
+ continue;
+ }
+
+ /* The cpu is idle but the exit_latency is shorter */
+ if (power && power->exit_latency < min_exit_latency) {
+ min_exit_latency = power->exit_latency;
+ cpu_idle = i;
+ continue;
+ }
+ } else {
+
+ load = weighted_cpuload(i);
+
+ if (load < min_load ||
+ (load == min_load && i == this_cpu)) {
+ min_load = load;
+ cpu_busy = i;
+ continue;
+ }
}
}
- return idlest;
+ /* Busy cpus are considered less idle than idle cpus ;) */
+ return cpu_busy != -1 ? cpu_busy : cpu_idle;
}
/*
--
1.7.9.5
* Re: [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure
2014-03-28 12:29 ` [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure Daniel Lezcano
@ 2014-03-28 18:17 ` Nicolas Pitre
2014-03-28 20:42 ` Daniel Lezcano
0 siblings, 1 reply; 47+ messages in thread
From: Nicolas Pitre @ 2014-03-28 18:17 UTC (permalink / raw)
To: Daniel Lezcano
Cc: linux-kernel, mingo, peterz, rjw, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On Fri, 28 Mar 2014, Daniel Lezcano wrote:
> The scheduler needs some information from cpuidle to know the timing for a
> specific idle state a cpu is.
>
> This patch creates a separate structure to group the cpuidle power info in
> order to share it with the scheduler. It improves the encapsulation of the
> code.
Having cpuidle_power as a structure name, or worse, 'power' as a struct
member, is a really bad choice. Amongst the fields this struct
contains, only 1 out of 3 is about power. The word "power" is already
abused quite significantly to mean too many different things already.
I'd suggest something inspired by your own patch log message i.e.
'struct cpuidle_info' instead, and use 'info' as a field name within
struct cpuidle_state. Having 'params' instead of 'info' could be a good
alternative too, although slightly longer.
And with struct rq in patch 2/3 I'd simply use:
struct cpuidle_info *cpuidle;
This way you'll have rq->cpuidle->exit_latency to refer to from the
scheduler context which is IMHO much more self explanatory.
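To make the naming concrete, here is a minimal sketch of how the suggested
layout could look. The reduced struct definitions, the NAME_LEN constant and
the rq_exit_latency() helper are assumptions for illustration only; the
field names come from the patch.

```c
#include <stddef.h>

#define NAME_LEN 16 /* stand-in for CPUIDLE_NAME_LEN */

/* The suggested name; fields as in the patch. */
struct cpuidle_info {
	unsigned int exit_latency;     /* in us */
	unsigned int target_residency; /* in us */
	int power_usage;               /* in mW */
};

/* Reduced struct cpuidle_state with 'info' as the member name. */
struct cpuidle_state {
	char name[NAME_LEN];
	struct cpuidle_info info;
};

/* Reduced struct rq, keeping only the member discussed here. */
struct rq {
	struct cpuidle_info *cpuidle; /* NULL when the CPU is not idle */
};

/* Hypothetical helper: exit latency seen from the scheduler side. */
static unsigned int rq_exit_latency(const struct rq *rq)
{
	return rq->cpuidle ? rq->cpuidle->exit_latency : 0;
}
```

Drivers would then initialise states with .info.exit_latency and
.info.target_residency, and the scheduler side would read
rq->cpuidle->exit_latency.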
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---
> arch/arm/include/asm/cpuidle.h | 6 +-
> arch/arm/mach-exynos/cpuidle.c | 4 +-
> drivers/acpi/processor_idle.c | 4 +-
> drivers/base/power/domain.c | 6 +-
> drivers/cpuidle/cpuidle-at91.c | 4 +-
> drivers/cpuidle/cpuidle-big_little.c | 9 +--
> drivers/cpuidle/cpuidle-calxeda.c | 6 +-
> drivers/cpuidle/cpuidle-kirkwood.c | 4 +-
> drivers/cpuidle/cpuidle-powernv.c | 8 +--
> drivers/cpuidle/cpuidle-pseries.c | 12 ++--
> drivers/cpuidle/cpuidle-ux500.c | 14 ++---
> drivers/cpuidle/cpuidle-zynq.c | 4 +-
> drivers/cpuidle/driver.c | 6 +-
> drivers/cpuidle/governors/ladder.c | 14 +++--
> drivers/cpuidle/governors/menu.c | 8 +--
> drivers/cpuidle/sysfs.c | 2 +-
> drivers/idle/intel_idle.c | 112 +++++++++++++++++-----------------
> include/linux/cpuidle.h | 10 ++-
> 18 files changed, 120 insertions(+), 113 deletions(-)
>
> diff --git a/arch/arm/include/asm/cpuidle.h b/arch/arm/include/asm/cpuidle.h
> index 2fca60a..987ee53 100644
> --- a/arch/arm/include/asm/cpuidle.h
> +++ b/arch/arm/include/asm/cpuidle.h
> @@ -12,9 +12,9 @@ static inline int arm_cpuidle_simple_enter(struct cpuidle_device *dev,
> /* Common ARM WFI state */
> #define ARM_CPUIDLE_WFI_STATE_PWR(p) {\
> .enter = arm_cpuidle_simple_enter,\
> - .exit_latency = 1,\
> - .target_residency = 1,\
> - .power_usage = p,\
> + .power.exit_latency = 1,\
> + .power.target_residency = 1,\
> + .power.power_usage = p,\
> .flags = CPUIDLE_FLAG_TIME_VALID,\
> .name = "WFI",\
> .desc = "ARM WFI",\
> diff --git a/arch/arm/mach-exynos/cpuidle.c b/arch/arm/mach-exynos/cpuidle.c
> index f57cb91..f6275cb 100644
> --- a/arch/arm/mach-exynos/cpuidle.c
> +++ b/arch/arm/mach-exynos/cpuidle.c
> @@ -73,8 +73,8 @@ static struct cpuidle_driver exynos4_idle_driver = {
> [0] = ARM_CPUIDLE_WFI_STATE,
> [1] = {
> .enter = exynos4_enter_lowpower,
> - .exit_latency = 300,
> - .target_residency = 100000,
> + .power.exit_latency = 300,
> + .power.target_residency = 100000,
> .flags = CPUIDLE_FLAG_TIME_VALID,
> .name = "C1",
> .desc = "ARM power down",
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index 3dca36d..05fa991 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -979,8 +979,8 @@ static int acpi_processor_setup_cpuidle_states(struct acpi_processor *pr)
> state = &drv->states[count];
> snprintf(state->name, CPUIDLE_NAME_LEN, "C%d", i);
> strncpy(state->desc, cx->desc, CPUIDLE_DESC_LEN);
> - state->exit_latency = cx->latency;
> - state->target_residency = cx->latency * latency_factor;
> + state->power.exit_latency = cx->latency;
> + state->power.target_residency = cx->latency * latency_factor;
>
> state->flags = 0;
> switch (cx->type) {
> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> index bfb8955..6bcb1e8 100644
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -154,7 +154,7 @@ static void genpd_recalc_cpu_exit_latency(struct generic_pm_domain *genpd)
> usecs64 = genpd->power_on_latency_ns;
> do_div(usecs64, NSEC_PER_USEC);
> usecs64 += genpd->cpu_data->saved_exit_latency;
> - genpd->cpu_data->idle_state->exit_latency = usecs64;
> + genpd->cpu_data->idle_state->power.exit_latency = usecs64;
> }
>
> /**
> @@ -1882,7 +1882,7 @@ int pm_genpd_attach_cpuidle(struct generic_pm_domain *genpd, int state)
> goto err;
> }
> cpu_data->idle_state = idle_state;
> - cpu_data->saved_exit_latency = idle_state->exit_latency;
> + cpu_data->saved_exit_latency = idle_state->power.exit_latency;
> genpd->cpu_data = cpu_data;
> genpd_recalc_cpu_exit_latency(genpd);
>
> @@ -1936,7 +1936,7 @@ int pm_genpd_detach_cpuidle(struct generic_pm_domain *genpd)
> ret = -EAGAIN;
> goto out;
> }
> - idle_state->exit_latency = cpu_data->saved_exit_latency;
> + idle_state->power.exit_latency = cpu_data->saved_exit_latency;
> cpuidle_driver_unref();
> genpd->cpu_data = NULL;
> kfree(cpu_data);
> diff --git a/drivers/cpuidle/cpuidle-at91.c b/drivers/cpuidle/cpuidle-at91.c
> index a077437..48c7063 100644
> --- a/drivers/cpuidle/cpuidle-at91.c
> +++ b/drivers/cpuidle/cpuidle-at91.c
> @@ -40,9 +40,9 @@ static struct cpuidle_driver at91_idle_driver = {
> .owner = THIS_MODULE,
> .states[0] = ARM_CPUIDLE_WFI_STATE,
> .states[1] = {
> + .power.exit_latency = 10,
> + .power.target_residency = 10000,
> .enter = at91_enter_idle,
> - .exit_latency = 10,
> - .target_residency = 10000,
> .flags = CPUIDLE_FLAG_TIME_VALID,
> .name = "RAM_SR",
> .desc = "WFI and DDR Self Refresh",
> diff --git a/drivers/cpuidle/cpuidle-big_little.c b/drivers/cpuidle/cpuidle-big_little.c
> index b45fc62..5a0af4b 100644
> --- a/drivers/cpuidle/cpuidle-big_little.c
> +++ b/drivers/cpuidle/cpuidle-big_little.c
> @@ -62,9 +62,9 @@ static struct cpuidle_driver bl_idle_little_driver = {
> .owner = THIS_MODULE,
> .states[0] = ARM_CPUIDLE_WFI_STATE,
> .states[1] = {
> + .power.exit_latency = 700,
> + .power.target_residency = 2500,
> .enter = bl_enter_powerdown,
> - .exit_latency = 700,
> - .target_residency = 2500,
> .flags = CPUIDLE_FLAG_TIME_VALID |
> CPUIDLE_FLAG_TIMER_STOP,
> .name = "C1",
> @@ -78,9 +78,10 @@ static struct cpuidle_driver bl_idle_big_driver = {
> .owner = THIS_MODULE,
> .states[0] = ARM_CPUIDLE_WFI_STATE,
> .states[1] = {
> +
> + .power.exit_latency = 500,
> + .power.target_residency = 2000,
> .enter = bl_enter_powerdown,
> - .exit_latency = 500,
> - .target_residency = 2000,
> .flags = CPUIDLE_FLAG_TIME_VALID |
> CPUIDLE_FLAG_TIMER_STOP,
> .name = "C1",
> diff --git a/drivers/cpuidle/cpuidle-calxeda.c b/drivers/cpuidle/cpuidle-calxeda.c
> index 6e51114..8357a20 100644
> --- a/drivers/cpuidle/cpuidle-calxeda.c
> +++ b/drivers/cpuidle/cpuidle-calxeda.c
> @@ -56,9 +56,9 @@ static struct cpuidle_driver calxeda_idle_driver = {
> .name = "PG",
> .desc = "Power Gate",
> .flags = CPUIDLE_FLAG_TIME_VALID,
> - .exit_latency = 30,
> - .power_usage = 50,
> - .target_residency = 200,
> + .power.exit_latency = 30,
> + .power.power_usage = 50,
> + .power.target_residency = 200,
> .enter = calxeda_pwrdown_idle,
> },
> },
> diff --git a/drivers/cpuidle/cpuidle-kirkwood.c b/drivers/cpuidle/cpuidle-kirkwood.c
> index 41ba843..0ae4138 100644
> --- a/drivers/cpuidle/cpuidle-kirkwood.c
> +++ b/drivers/cpuidle/cpuidle-kirkwood.c
> @@ -44,9 +44,9 @@ static struct cpuidle_driver kirkwood_idle_driver = {
> .owner = THIS_MODULE,
> .states[0] = ARM_CPUIDLE_WFI_STATE,
> .states[1] = {
> + .power.exit_latency = 10,
> + .power.target_residency = 100000,
> .enter = kirkwood_enter_idle,
> - .exit_latency = 10,
> - .target_residency = 100000,
> .flags = CPUIDLE_FLAG_TIME_VALID,
> .name = "DDR SR",
> .desc = "WFI and DDR Self Refresh",
> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> index f48607c..c47cc02 100644
> --- a/drivers/cpuidle/cpuidle-powernv.c
> +++ b/drivers/cpuidle/cpuidle-powernv.c
> @@ -62,15 +62,15 @@ static struct cpuidle_state powernv_states[] = {
> .name = "snooze",
> .desc = "snooze",
> .flags = CPUIDLE_FLAG_TIME_VALID,
> - .exit_latency = 0,
> - .target_residency = 0,
> + .power.exit_latency = 0,
> + .power.target_residency = 0,
> .enter = &snooze_loop },
> { /* NAP */
> .name = "NAP",
> .desc = "NAP",
> .flags = CPUIDLE_FLAG_TIME_VALID,
> - .exit_latency = 10,
> - .target_residency = 100,
> + .power.exit_latency = 10,
> + .power.target_residency = 100,
> .enter = &nap_loop },
> };
>
> diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
> index 6f7b019..483d7e7 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -143,15 +143,15 @@ static struct cpuidle_state dedicated_states[] = {
> .name = "snooze",
> .desc = "snooze",
> .flags = CPUIDLE_FLAG_TIME_VALID,
> - .exit_latency = 0,
> - .target_residency = 0,
> + .power.exit_latency = 0,
> + .power.target_residency = 0,
> .enter = &snooze_loop },
> { /* CEDE */
> .name = "CEDE",
> .desc = "CEDE",
> .flags = CPUIDLE_FLAG_TIME_VALID,
> - .exit_latency = 10,
> - .target_residency = 100,
> + .power.exit_latency = 10,
> + .power.target_residency = 100,
> .enter = &dedicated_cede_loop },
> };
>
> @@ -163,8 +163,8 @@ static struct cpuidle_state shared_states[] = {
> .name = "Shared Cede",
> .desc = "Shared Cede",
> .flags = CPUIDLE_FLAG_TIME_VALID,
> - .exit_latency = 0,
> - .target_residency = 0,
> + .power.exit_latency = 0,
> + .power.target_residency = 0,
> .enter = &shared_cede_loop },
> };
>
> diff --git a/drivers/cpuidle/cpuidle-ux500.c b/drivers/cpuidle/cpuidle-ux500.c
> index 5e35804..3261eb2 100644
> --- a/drivers/cpuidle/cpuidle-ux500.c
> +++ b/drivers/cpuidle/cpuidle-ux500.c
> @@ -98,13 +98,13 @@ static struct cpuidle_driver ux500_idle_driver = {
> .states = {
> ARM_CPUIDLE_WFI_STATE,
> {
> - .enter = ux500_enter_idle,
> - .exit_latency = 70,
> - .target_residency = 260,
> - .flags = CPUIDLE_FLAG_TIME_VALID |
> - CPUIDLE_FLAG_TIMER_STOP,
> - .name = "ApIdle",
> - .desc = "ARM Retention",
> + .power.exit_latency = 70,
> + .power.target_residency = 260,
> + .enter = ux500_enter_idle,
> + .flags = CPUIDLE_FLAG_TIME_VALID |
> + CPUIDLE_FLAG_TIMER_STOP,
> + .name = "ApIdle",
> + .desc = "ARM Retention",
> },
> },
> .safe_state_index = 0,
> diff --git a/drivers/cpuidle/cpuidle-zynq.c b/drivers/cpuidle/cpuidle-zynq.c
> index aded759..dddefb8 100644
> --- a/drivers/cpuidle/cpuidle-zynq.c
> +++ b/drivers/cpuidle/cpuidle-zynq.c
> @@ -56,9 +56,9 @@ static struct cpuidle_driver zynq_idle_driver = {
> .states = {
> ARM_CPUIDLE_WFI_STATE,
> {
> + .power.exit_latency = 10,
> + .power.target_residency = 10000,
> .enter = zynq_enter_idle,
> - .exit_latency = 10,
> - .target_residency = 10000,
> .flags = CPUIDLE_FLAG_TIME_VALID |
> CPUIDLE_FLAG_TIMER_STOP,
> .name = "RAM_SR",
> diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
> index 06dbe7c..40ddd3c 100644
> --- a/drivers/cpuidle/driver.c
> +++ b/drivers/cpuidle/driver.c
> @@ -206,9 +206,9 @@ static void poll_idle_init(struct cpuidle_driver *drv)
>
> snprintf(state->name, CPUIDLE_NAME_LEN, "POLL");
> snprintf(state->desc, CPUIDLE_DESC_LEN, "CPUIDLE CORE POLL IDLE");
> - state->exit_latency = 0;
> - state->target_residency = 0;
> - state->power_usage = -1;
> + state->power.exit_latency = 0;
> + state->power.target_residency = 0;
> + state->power.power_usage = -1;
> state->flags = 0;
> state->enter = poll_idle;
> state->disabled = false;
> diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
> index 9f08e8c..4837880 100644
> --- a/drivers/cpuidle/governors/ladder.c
> +++ b/drivers/cpuidle/governors/ladder.c
> @@ -81,7 +81,7 @@ static int ladder_select_state(struct cpuidle_driver *drv,
>
> if (drv->states[last_idx].flags & CPUIDLE_FLAG_TIME_VALID) {
> last_residency = cpuidle_get_last_residency(dev) - \
> - drv->states[last_idx].exit_latency;
> + drv->states[last_idx].power.exit_latency;
> }
> else
> last_residency = last_state->threshold.promotion_time + 1;
> @@ -91,7 +91,7 @@ static int ladder_select_state(struct cpuidle_driver *drv,
> !drv->states[last_idx + 1].disabled &&
> !dev->states_usage[last_idx + 1].disable &&
> last_residency > last_state->threshold.promotion_time &&
> - drv->states[last_idx + 1].exit_latency <= latency_req) {
> + drv->states[last_idx + 1].power.exit_latency <= latency_req) {
> last_state->stats.promotion_count++;
> last_state->stats.demotion_count = 0;
> if (last_state->stats.promotion_count >= last_state->threshold.promotion_count) {
> @@ -104,11 +104,11 @@ static int ladder_select_state(struct cpuidle_driver *drv,
> if (last_idx > CPUIDLE_DRIVER_STATE_START &&
> (drv->states[last_idx].disabled ||
> dev->states_usage[last_idx].disable ||
> - drv->states[last_idx].exit_latency > latency_req)) {
> + drv->states[last_idx].power.exit_latency > latency_req)) {
> int i;
>
> for (i = last_idx - 1; i > CPUIDLE_DRIVER_STATE_START; i--) {
> - if (drv->states[i].exit_latency <= latency_req)
> + if (drv->states[i].power.exit_latency <= latency_req)
> break;
> }
> ladder_do_selection(ldev, last_idx, i);
> @@ -155,9 +155,11 @@ static int ladder_enable_device(struct cpuidle_driver *drv,
> lstate->threshold.demotion_count = DEMOTION_COUNT;
>
> if (i < drv->state_count - 1)
> - lstate->threshold.promotion_time = state->exit_latency;
> + lstate->threshold.promotion_time =
> + state->power.exit_latency;
> if (i > 0)
> - lstate->threshold.demotion_time = state->exit_latency;
> + lstate->threshold.demotion_time =
> + state->power.exit_latency;
> }
>
> return 0;
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index cf7f2f0..34bd463 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -351,15 +351,15 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>
> if (s->disabled || su->disable)
> continue;
> - if (s->target_residency > data->predicted_us)
> + if (s->power.target_residency > data->predicted_us)
> continue;
> - if (s->exit_latency > latency_req)
> + if (s->power.exit_latency > latency_req)
> continue;
> - if (s->exit_latency * multiplier > data->predicted_us)
> + if (s->power.exit_latency * multiplier > data->predicted_us)
> continue;
>
> data->last_state_idx = i;
> - data->exit_us = s->exit_latency;
> + data->exit_us = s->power.exit_latency;
> }
>
> return data->last_state_idx;
> diff --git a/drivers/cpuidle/sysfs.c b/drivers/cpuidle/sysfs.c
> index e918b6d..1a45541 100644
> --- a/drivers/cpuidle/sysfs.c
> +++ b/drivers/cpuidle/sysfs.c
> @@ -252,7 +252,7 @@ static struct cpuidle_state_attr attr_##_name = __ATTR(_name, 0644, show, store)
> static ssize_t show_state_##_name(struct cpuidle_state *state, \
> struct cpuidle_state_usage *state_usage, char *buf) \
> { \
> - return sprintf(buf, "%u\n", state->_name);\
> + return sprintf(buf, "%u\n", state->power._name);\
> }
>
> #define define_store_state_ull_function(_name) \
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index 8e1939f..4f0533e 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -128,29 +128,29 @@ static struct cpuidle_state nehalem_cstates[] = {
> .name = "C1-NHM",
> .desc = "MWAIT 0x00",
> .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
> - .exit_latency = 3,
> - .target_residency = 6,
> + .power.exit_latency = 3,
> + .power.target_residency = 6,
> .enter = &intel_idle },
> {
> .name = "C1E-NHM",
> .desc = "MWAIT 0x01",
> .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
> - .exit_latency = 10,
> - .target_residency = 20,
> + .power.exit_latency = 10,
> + .power.target_residency = 20,
> .enter = &intel_idle },
> {
> .name = "C3-NHM",
> .desc = "MWAIT 0x10",
> .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> - .exit_latency = 20,
> - .target_residency = 80,
> + .power.exit_latency = 20,
> + .power.target_residency = 80,
> .enter = &intel_idle },
> {
> .name = "C6-NHM",
> .desc = "MWAIT 0x20",
> .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> - .exit_latency = 200,
> - .target_residency = 800,
> + .power.exit_latency = 200,
> + .power.target_residency = 800,
> .enter = &intel_idle },
> {
> .enter = NULL }
> @@ -161,36 +161,36 @@ static struct cpuidle_state snb_cstates[] = {
> .name = "C1-SNB",
> .desc = "MWAIT 0x00",
> .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
> - .exit_latency = 2,
> - .target_residency = 2,
> + .power.exit_latency = 2,
> + .power.target_residency = 2,
> .enter = &intel_idle },
> {
> .name = "C1E-SNB",
> .desc = "MWAIT 0x01",
> .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
> - .exit_latency = 10,
> - .target_residency = 20,
> + .power.exit_latency = 10,
> + .power.target_residency = 20,
> .enter = &intel_idle },
> {
> .name = "C3-SNB",
> .desc = "MWAIT 0x10",
> .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> - .exit_latency = 80,
> - .target_residency = 211,
> + .power.exit_latency = 80,
> + .power.target_residency = 211,
> .enter = &intel_idle },
> {
> .name = "C6-SNB",
> .desc = "MWAIT 0x20",
> .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> - .exit_latency = 104,
> - .target_residency = 345,
> + .power.exit_latency = 104,
> + .power.target_residency = 345,
> .enter = &intel_idle },
> {
> .name = "C7-SNB",
> .desc = "MWAIT 0x30",
> .flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> - .exit_latency = 109,
> - .target_residency = 345,
> + .power.exit_latency = 109,
> + .power.target_residency = 345,
> .enter = &intel_idle },
> {
> .enter = NULL }
> @@ -201,36 +201,36 @@ static struct cpuidle_state ivb_cstates[] = {
> .name = "C1-IVB",
> .desc = "MWAIT 0x00",
> .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
> - .exit_latency = 1,
> - .target_residency = 1,
> + .power.exit_latency = 1,
> + .power.target_residency = 1,
> .enter = &intel_idle },
> {
> .name = "C1E-IVB",
> .desc = "MWAIT 0x01",
> .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
> - .exit_latency = 10,
> - .target_residency = 20,
> + .power.exit_latency = 10,
> + .power.target_residency = 20,
> .enter = &intel_idle },
> {
> .name = "C3-IVB",
> .desc = "MWAIT 0x10",
> .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> - .exit_latency = 59,
> - .target_residency = 156,
> + .power.exit_latency = 59,
> + .power.target_residency = 156,
> .enter = &intel_idle },
> {
> .name = "C6-IVB",
> .desc = "MWAIT 0x20",
> .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> - .exit_latency = 80,
> - .target_residency = 300,
> + .power.exit_latency = 80,
> + .power.target_residency = 300,
> .enter = &intel_idle },
> {
> .name = "C7-IVB",
> .desc = "MWAIT 0x30",
> .flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> - .exit_latency = 87,
> - .target_residency = 300,
> + .power.exit_latency = 87,
> + .power.target_residency = 300,
> .enter = &intel_idle },
> {
> .enter = NULL }
> @@ -241,57 +241,57 @@ static struct cpuidle_state hsw_cstates[] = {
> .name = "C1-HSW",
> .desc = "MWAIT 0x00",
> .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
> - .exit_latency = 2,
> - .target_residency = 2,
> + .power.exit_latency = 2,
> + .power.target_residency = 2,
> .enter = &intel_idle },
> {
> .name = "C1E-HSW",
> .desc = "MWAIT 0x01",
> .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
> - .exit_latency = 10,
> - .target_residency = 20,
> + .power.exit_latency = 10,
> + .power.target_residency = 20,
> .enter = &intel_idle },
> {
> .name = "C3-HSW",
> .desc = "MWAIT 0x10",
> .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> - .exit_latency = 33,
> - .target_residency = 100,
> + .power.exit_latency = 33,
> + .power.target_residency = 100,
> .enter = &intel_idle },
> {
> .name = "C6-HSW",
> .desc = "MWAIT 0x20",
> .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> - .exit_latency = 133,
> - .target_residency = 400,
> + .power.exit_latency = 133,
> + .power.target_residency = 400,
> .enter = &intel_idle },
> {
> .name = "C7s-HSW",
> .desc = "MWAIT 0x32",
> .flags = MWAIT2flg(0x32) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> - .exit_latency = 166,
> - .target_residency = 500,
> + .power.exit_latency = 166,
> + .power.target_residency = 500,
> .enter = &intel_idle },
> {
> .name = "C8-HSW",
> .desc = "MWAIT 0x40",
> .flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> - .exit_latency = 300,
> - .target_residency = 900,
> + .power.exit_latency = 300,
> + .power.target_residency = 900,
> .enter = &intel_idle },
> {
> .name = "C9-HSW",
> .desc = "MWAIT 0x50",
> .flags = MWAIT2flg(0x50) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> - .exit_latency = 600,
> - .target_residency = 1800,
> + .power.exit_latency = 600,
> + .power.target_residency = 1800,
> .enter = &intel_idle },
> {
> .name = "C10-HSW",
> .desc = "MWAIT 0x60",
> .flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> - .exit_latency = 2600,
> - .target_residency = 7700,
> + .power.exit_latency = 2600,
> + .power.target_residency = 7700,
> .enter = &intel_idle },
> {
> .enter = NULL }
> @@ -302,29 +302,29 @@ static struct cpuidle_state atom_cstates[] = {
> .name = "C1E-ATM",
> .desc = "MWAIT 0x00",
> .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
> - .exit_latency = 10,
> - .target_residency = 20,
> + .power.exit_latency = 10,
> + .power.target_residency = 20,
> .enter = &intel_idle },
> {
> .name = "C2-ATM",
> .desc = "MWAIT 0x10",
> .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID,
> - .exit_latency = 20,
> - .target_residency = 80,
> + .power.exit_latency = 20,
> + .power.target_residency = 80,
> .enter = &intel_idle },
> {
> .name = "C4-ATM",
> .desc = "MWAIT 0x30",
> .flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> - .exit_latency = 100,
> - .target_residency = 400,
> + .power.exit_latency = 100,
> + .power.target_residency = 400,
> .enter = &intel_idle },
> {
> .name = "C6-ATM",
> .desc = "MWAIT 0x52",
> .flags = MWAIT2flg(0x52) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> - .exit_latency = 140,
> - .target_residency = 560,
> + .power.exit_latency = 140,
> + .power.target_residency = 560,
> .enter = &intel_idle },
> {
> .enter = NULL }
> @@ -334,15 +334,15 @@ static struct cpuidle_state avn_cstates[] = {
> .name = "C1-AVN",
> .desc = "MWAIT 0x00",
> .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
> - .exit_latency = 2,
> - .target_residency = 2,
> + .power.exit_latency = 2,
> + .power.target_residency = 2,
> .enter = &intel_idle },
> {
> .name = "C6-AVN",
> .desc = "MWAIT 0x51",
> .flags = MWAIT2flg(0x51) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> - .exit_latency = 15,
> - .target_residency = 45,
> + .power.exit_latency = 15,
> + .power.target_residency = 45,
> .enter = &intel_idle },
> {
> .enter = NULL }
> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> index b0238cb..eb58ab3 100644
> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -35,14 +35,18 @@ struct cpuidle_state_usage {
> unsigned long long time; /* in US */
> };
>
> +struct cpuidle_power {
> + unsigned int exit_latency; /* in US */
> + unsigned int target_residency; /* in US */
> + int power_usage; /* in mW */
> +};
> +
> struct cpuidle_state {
> char name[CPUIDLE_NAME_LEN];
> char desc[CPUIDLE_DESC_LEN];
>
> unsigned int flags;
> - unsigned int exit_latency; /* in US */
> - int power_usage; /* in mW */
> - unsigned int target_residency; /* in US */
> + struct cpuidle_power power;
> bool disabled; /* disabled on all CPUs */
>
> int (*enter) (struct cpuidle_device *dev,
> --
> 1.7.9.5
>
* Re: [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure
2014-03-28 18:17 ` Nicolas Pitre
@ 2014-03-28 20:42 ` Daniel Lezcano
2014-03-29 0:00 ` Nicolas Pitre
0 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-03-28 20:42 UTC (permalink / raw)
To: Nicolas Pitre
Cc: LKML, mingo, Peter Zijlstra, Rafael J. Wysocki, linux-pm,
Alex Shi, Vincent Guittot, Morten Rasmussen
Hi Nicolas,
thanks for reviewing the patchset.
On 03/28/2014 07:17 PM, Nicolas Pitre wrote:
> On Fri, 28 Mar 2014, Daniel Lezcano wrote:
>
>> The scheduler needs some information from cpuidle to know the timing for a
>> specific idle state a cpu is.
>>
>> This patch creates a separate structure to group the cpuidle power info in
>> order to share it with the scheduler. It improves the encapsulation of the
>> code.
>
> Having cpuidle_power as a structure name, or worse, 'power' as a struct
> member, is a really bad choice.
Yes, I was asking myself if this name was a good choice or not. I
assumed 'power' could have been a good name because 'target_residency'
is a time conversion of the power needed to enter this state.
> Amongst the fields this struct
> contains, only 1 out of 3 is about power. The word "power" is already
> abused quite significantly to mean too many different things already.
>
> I'd suggest something inspired by your own patch log message i.e.
> 'struct cpuidle_info' instead, and use 'info' as a field name within
> struct cpuidle_state. Having 'params' instead of 'info' could be a good
> alternative too, although slightly longer.
Hmm 'info' or 'param' sound too vague. What about:
cpuidle_attr
or
cpuidle_property
?
> And with struct rq in patch 2/3 I'd simply use:
>
> struct cpuidle_info *cpuidle;
>
> This way you'll have rq->cpuidle->exit_latency to refer to from the
> scheduler context which is IMHO much more self explanatory.
Ok, sounds good.
>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>> ---
>> arch/arm/include/asm/cpuidle.h | 6 +-
>> arch/arm/mach-exynos/cpuidle.c | 4 +-
>> drivers/acpi/processor_idle.c | 4 +-
>> drivers/base/power/domain.c | 6 +-
>> drivers/cpuidle/cpuidle-at91.c | 4 +-
>> drivers/cpuidle/cpuidle-big_little.c | 9 +--
>> drivers/cpuidle/cpuidle-calxeda.c | 6 +-
>> drivers/cpuidle/cpuidle-kirkwood.c | 4 +-
>> drivers/cpuidle/cpuidle-powernv.c | 8 +--
>> drivers/cpuidle/cpuidle-pseries.c | 12 ++--
>> drivers/cpuidle/cpuidle-ux500.c | 14 ++---
>> drivers/cpuidle/cpuidle-zynq.c | 4 +-
>> drivers/cpuidle/driver.c | 6 +-
>> drivers/cpuidle/governors/ladder.c | 14 +++--
>> drivers/cpuidle/governors/menu.c | 8 +--
>> drivers/cpuidle/sysfs.c | 2 +-
>> drivers/idle/intel_idle.c | 112 +++++++++++++++++-----------------
>> include/linux/cpuidle.h | 10 ++-
>> 18 files changed, 120 insertions(+), 113 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/cpuidle.h b/arch/arm/include/asm/cpuidle.h
>> index 2fca60a..987ee53 100644
>> --- a/arch/arm/include/asm/cpuidle.h
>> +++ b/arch/arm/include/asm/cpuidle.h
>> @@ -12,9 +12,9 @@ static inline int arm_cpuidle_simple_enter(struct cpuidle_device *dev,
>> /* Common ARM WFI state */
>> #define ARM_CPUIDLE_WFI_STATE_PWR(p) {\
>> .enter = arm_cpuidle_simple_enter,\
>> - .exit_latency = 1,\
>> - .target_residency = 1,\
>> - .power_usage = p,\
>> + .power.exit_latency = 1,\
>> + .power.target_residency = 1,\
>> + .power.power_usage = p,\
>> .flags = CPUIDLE_FLAG_TIME_VALID,\
>> .name = "WFI",\
>> .desc = "ARM WFI",\
>> diff --git a/arch/arm/mach-exynos/cpuidle.c b/arch/arm/mach-exynos/cpuidle.c
>> index f57cb91..f6275cb 100644
>> --- a/arch/arm/mach-exynos/cpuidle.c
>> +++ b/arch/arm/mach-exynos/cpuidle.c
>> @@ -73,8 +73,8 @@ static struct cpuidle_driver exynos4_idle_driver = {
>> [0] = ARM_CPUIDLE_WFI_STATE,
>> [1] = {
>> .enter = exynos4_enter_lowpower,
>> - .exit_latency = 300,
>> - .target_residency = 100000,
>> + .power.exit_latency = 300,
>> + .power.target_residency = 100000,
>> .flags = CPUIDLE_FLAG_TIME_VALID,
>> .name = "C1",
>> .desc = "ARM power down",
>> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
>> index 3dca36d..05fa991 100644
>> --- a/drivers/acpi/processor_idle.c
>> +++ b/drivers/acpi/processor_idle.c
>> @@ -979,8 +979,8 @@ static int acpi_processor_setup_cpuidle_states(struct acpi_processor *pr)
>> state = &drv->states[count];
>> snprintf(state->name, CPUIDLE_NAME_LEN, "C%d", i);
>> strncpy(state->desc, cx->desc, CPUIDLE_DESC_LEN);
>> - state->exit_latency = cx->latency;
>> - state->target_residency = cx->latency * latency_factor;
>> + state->power.exit_latency = cx->latency;
>> + state->power.target_residency = cx->latency * latency_factor;
>>
>> state->flags = 0;
>> switch (cx->type) {
>> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
>> index bfb8955..6bcb1e8 100644
>> --- a/drivers/base/power/domain.c
>> +++ b/drivers/base/power/domain.c
>> @@ -154,7 +154,7 @@ static void genpd_recalc_cpu_exit_latency(struct generic_pm_domain *genpd)
>> usecs64 = genpd->power_on_latency_ns;
>> do_div(usecs64, NSEC_PER_USEC);
>> usecs64 += genpd->cpu_data->saved_exit_latency;
>> - genpd->cpu_data->idle_state->exit_latency = usecs64;
>> + genpd->cpu_data->idle_state->power.exit_latency = usecs64;
>> }
>>
>> /**
>> @@ -1882,7 +1882,7 @@ int pm_genpd_attach_cpuidle(struct generic_pm_domain *genpd, int state)
>> goto err;
>> }
>> cpu_data->idle_state = idle_state;
>> - cpu_data->saved_exit_latency = idle_state->exit_latency;
>> + cpu_data->saved_exit_latency = idle_state->power.exit_latency;
>> genpd->cpu_data = cpu_data;
>> genpd_recalc_cpu_exit_latency(genpd);
>>
>> @@ -1936,7 +1936,7 @@ int pm_genpd_detach_cpuidle(struct generic_pm_domain *genpd)
>> ret = -EAGAIN;
>> goto out;
>> }
>> - idle_state->exit_latency = cpu_data->saved_exit_latency;
>> + idle_state->power.exit_latency = cpu_data->saved_exit_latency;
>> cpuidle_driver_unref();
>> genpd->cpu_data = NULL;
>> kfree(cpu_data);
>> diff --git a/drivers/cpuidle/cpuidle-at91.c b/drivers/cpuidle/cpuidle-at91.c
>> index a077437..48c7063 100644
>> --- a/drivers/cpuidle/cpuidle-at91.c
>> +++ b/drivers/cpuidle/cpuidle-at91.c
>> @@ -40,9 +40,9 @@ static struct cpuidle_driver at91_idle_driver = {
>> .owner = THIS_MODULE,
>> .states[0] = ARM_CPUIDLE_WFI_STATE,
>> .states[1] = {
>> + .power.exit_latency = 10,
>> + .power.target_residency = 10000,
>> .enter = at91_enter_idle,
>> - .exit_latency = 10,
>> - .target_residency = 10000,
>> .flags = CPUIDLE_FLAG_TIME_VALID,
>> .name = "RAM_SR",
>> .desc = "WFI and DDR Self Refresh",
>> diff --git a/drivers/cpuidle/cpuidle-big_little.c b/drivers/cpuidle/cpuidle-big_little.c
>> index b45fc62..5a0af4b 100644
>> --- a/drivers/cpuidle/cpuidle-big_little.c
>> +++ b/drivers/cpuidle/cpuidle-big_little.c
>> @@ -62,9 +62,9 @@ static struct cpuidle_driver bl_idle_little_driver = {
>> .owner = THIS_MODULE,
>> .states[0] = ARM_CPUIDLE_WFI_STATE,
>> .states[1] = {
>> + .power.exit_latency = 700,
>> + .power.target_residency = 2500,
>> .enter = bl_enter_powerdown,
>> - .exit_latency = 700,
>> - .target_residency = 2500,
>> .flags = CPUIDLE_FLAG_TIME_VALID |
>> CPUIDLE_FLAG_TIMER_STOP,
>> .name = "C1",
>> @@ -78,9 +78,10 @@ static struct cpuidle_driver bl_idle_big_driver = {
>> .owner = THIS_MODULE,
>> .states[0] = ARM_CPUIDLE_WFI_STATE,
>> .states[1] = {
>> +
>> + .power.exit_latency = 500,
>> + .power.target_residency = 2000,
>> .enter = bl_enter_powerdown,
>> - .exit_latency = 500,
>> - .target_residency = 2000,
>> .flags = CPUIDLE_FLAG_TIME_VALID |
>> CPUIDLE_FLAG_TIMER_STOP,
>> .name = "C1",
>> diff --git a/drivers/cpuidle/cpuidle-calxeda.c b/drivers/cpuidle/cpuidle-calxeda.c
>> index 6e51114..8357a20 100644
>> --- a/drivers/cpuidle/cpuidle-calxeda.c
>> +++ b/drivers/cpuidle/cpuidle-calxeda.c
>> @@ -56,9 +56,9 @@ static struct cpuidle_driver calxeda_idle_driver = {
>> .name = "PG",
>> .desc = "Power Gate",
>> .flags = CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 30,
>> - .power_usage = 50,
>> - .target_residency = 200,
>> + .power.exit_latency = 30,
>> + .power.power_usage = 50,
>> + .power.target_residency = 200,
>> .enter = calxeda_pwrdown_idle,
>> },
>> },
>> diff --git a/drivers/cpuidle/cpuidle-kirkwood.c b/drivers/cpuidle/cpuidle-kirkwood.c
>> index 41ba843..0ae4138 100644
>> --- a/drivers/cpuidle/cpuidle-kirkwood.c
>> +++ b/drivers/cpuidle/cpuidle-kirkwood.c
>> @@ -44,9 +44,9 @@ static struct cpuidle_driver kirkwood_idle_driver = {
>> .owner = THIS_MODULE,
>> .states[0] = ARM_CPUIDLE_WFI_STATE,
>> .states[1] = {
>> + .power.exit_latency = 10,
>> + .power.target_residency = 100000,
>> .enter = kirkwood_enter_idle,
>> - .exit_latency = 10,
>> - .target_residency = 100000,
>> .flags = CPUIDLE_FLAG_TIME_VALID,
>> .name = "DDR SR",
>> .desc = "WFI and DDR Self Refresh",
>> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
>> index f48607c..c47cc02 100644
>> --- a/drivers/cpuidle/cpuidle-powernv.c
>> +++ b/drivers/cpuidle/cpuidle-powernv.c
>> @@ -62,15 +62,15 @@ static struct cpuidle_state powernv_states[] = {
>> .name = "snooze",
>> .desc = "snooze",
>> .flags = CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 0,
>> - .target_residency = 0,
>> + .power.exit_latency = 0,
>> + .power.target_residency = 0,
>> .enter = &snooze_loop },
>> { /* NAP */
>> .name = "NAP",
>> .desc = "NAP",
>> .flags = CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 10,
>> - .target_residency = 100,
>> + .power.exit_latency = 10,
>> + .power.target_residency = 100,
>> .enter = &nap_loop },
>> };
>>
>> diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
>> index 6f7b019..483d7e7 100644
>> --- a/drivers/cpuidle/cpuidle-pseries.c
>> +++ b/drivers/cpuidle/cpuidle-pseries.c
>> @@ -143,15 +143,15 @@ static struct cpuidle_state dedicated_states[] = {
>> .name = "snooze",
>> .desc = "snooze",
>> .flags = CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 0,
>> - .target_residency = 0,
>> + .power.exit_latency = 0,
>> + .power.target_residency = 0,
>> .enter = &snooze_loop },
>> { /* CEDE */
>> .name = "CEDE",
>> .desc = "CEDE",
>> .flags = CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 10,
>> - .target_residency = 100,
>> + .power.exit_latency = 10,
>> + .power.target_residency = 100,
>> .enter = &dedicated_cede_loop },
>> };
>>
>> @@ -163,8 +163,8 @@ static struct cpuidle_state shared_states[] = {
>> .name = "Shared Cede",
>> .desc = "Shared Cede",
>> .flags = CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 0,
>> - .target_residency = 0,
>> + .power.exit_latency = 0,
>> + .power.target_residency = 0,
>> .enter = &shared_cede_loop },
>> };
>>
>> diff --git a/drivers/cpuidle/cpuidle-ux500.c b/drivers/cpuidle/cpuidle-ux500.c
>> index 5e35804..3261eb2 100644
>> --- a/drivers/cpuidle/cpuidle-ux500.c
>> +++ b/drivers/cpuidle/cpuidle-ux500.c
>> @@ -98,13 +98,13 @@ static struct cpuidle_driver ux500_idle_driver = {
>> .states = {
>> ARM_CPUIDLE_WFI_STATE,
>> {
>> - .enter = ux500_enter_idle,
>> - .exit_latency = 70,
>> - .target_residency = 260,
>> - .flags = CPUIDLE_FLAG_TIME_VALID |
>> - CPUIDLE_FLAG_TIMER_STOP,
>> - .name = "ApIdle",
>> - .desc = "ARM Retention",
>> + .power.exit_latency = 70,
>> + .power.target_residency = 260,
>> + .enter = ux500_enter_idle,
>> + .flags = CPUIDLE_FLAG_TIME_VALID |
>> + CPUIDLE_FLAG_TIMER_STOP,
>> + .name = "ApIdle",
>> + .desc = "ARM Retention",
>> },
>> },
>> .safe_state_index = 0,
>> diff --git a/drivers/cpuidle/cpuidle-zynq.c b/drivers/cpuidle/cpuidle-zynq.c
>> index aded759..dddefb8 100644
>> --- a/drivers/cpuidle/cpuidle-zynq.c
>> +++ b/drivers/cpuidle/cpuidle-zynq.c
>> @@ -56,9 +56,9 @@ static struct cpuidle_driver zynq_idle_driver = {
>> .states = {
>> ARM_CPUIDLE_WFI_STATE,
>> {
>> + .power.exit_latency = 10,
>> + .power.target_residency = 10000,
>> .enter = zynq_enter_idle,
>> - .exit_latency = 10,
>> - .target_residency = 10000,
>> .flags = CPUIDLE_FLAG_TIME_VALID |
>> CPUIDLE_FLAG_TIMER_STOP,
>> .name = "RAM_SR",
>> diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
>> index 06dbe7c..40ddd3c 100644
>> --- a/drivers/cpuidle/driver.c
>> +++ b/drivers/cpuidle/driver.c
>> @@ -206,9 +206,9 @@ static void poll_idle_init(struct cpuidle_driver *drv)
>>
>> snprintf(state->name, CPUIDLE_NAME_LEN, "POLL");
>> snprintf(state->desc, CPUIDLE_DESC_LEN, "CPUIDLE CORE POLL IDLE");
>> - state->exit_latency = 0;
>> - state->target_residency = 0;
>> - state->power_usage = -1;
>> + state->power.exit_latency = 0;
>> + state->power.target_residency = 0;
>> + state->power.power_usage = -1;
>> state->flags = 0;
>> state->enter = poll_idle;
>> state->disabled = false;
>> diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
>> index 9f08e8c..4837880 100644
>> --- a/drivers/cpuidle/governors/ladder.c
>> +++ b/drivers/cpuidle/governors/ladder.c
>> @@ -81,7 +81,7 @@ static int ladder_select_state(struct cpuidle_driver *drv,
>>
>> if (drv->states[last_idx].flags & CPUIDLE_FLAG_TIME_VALID) {
>> last_residency = cpuidle_get_last_residency(dev) - \
>> - drv->states[last_idx].exit_latency;
>> + drv->states[last_idx].power.exit_latency;
>> }
>> else
>> last_residency = last_state->threshold.promotion_time + 1;
>> @@ -91,7 +91,7 @@ static int ladder_select_state(struct cpuidle_driver *drv,
>> !drv->states[last_idx + 1].disabled &&
>> !dev->states_usage[last_idx + 1].disable &&
>> last_residency > last_state->threshold.promotion_time &&
>> - drv->states[last_idx + 1].exit_latency <= latency_req) {
>> + drv->states[last_idx + 1].power.exit_latency <= latency_req) {
>> last_state->stats.promotion_count++;
>> last_state->stats.demotion_count = 0;
>> if (last_state->stats.promotion_count >= last_state->threshold.promotion_count) {
>> @@ -104,11 +104,11 @@ static int ladder_select_state(struct cpuidle_driver *drv,
>> if (last_idx > CPUIDLE_DRIVER_STATE_START &&
>> (drv->states[last_idx].disabled ||
>> dev->states_usage[last_idx].disable ||
>> - drv->states[last_idx].exit_latency > latency_req)) {
>> + drv->states[last_idx].power.exit_latency > latency_req)) {
>> int i;
>>
>> for (i = last_idx - 1; i > CPUIDLE_DRIVER_STATE_START; i--) {
>> - if (drv->states[i].exit_latency <= latency_req)
>> + if (drv->states[i].power.exit_latency <= latency_req)
>> break;
>> }
>> ladder_do_selection(ldev, last_idx, i);
>> @@ -155,9 +155,11 @@ static int ladder_enable_device(struct cpuidle_driver *drv,
>> lstate->threshold.demotion_count = DEMOTION_COUNT;
>>
>> if (i < drv->state_count - 1)
>> - lstate->threshold.promotion_time = state->exit_latency;
>> + lstate->threshold.promotion_time =
>> + state->power.exit_latency;
>> if (i > 0)
>> - lstate->threshold.demotion_time = state->exit_latency;
>> + lstate->threshold.demotion_time =
>> + state->power.exit_latency;
>> }
>>
>> return 0;
>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>> index cf7f2f0..34bd463 100644
>> --- a/drivers/cpuidle/governors/menu.c
>> +++ b/drivers/cpuidle/governors/menu.c
>> @@ -351,15 +351,15 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>>
>> if (s->disabled || su->disable)
>> continue;
>> - if (s->target_residency > data->predicted_us)
>> + if (s->power.target_residency > data->predicted_us)
>> continue;
>> - if (s->exit_latency > latency_req)
>> + if (s->power.exit_latency > latency_req)
>> continue;
>> - if (s->exit_latency * multiplier > data->predicted_us)
>> + if (s->power.exit_latency * multiplier > data->predicted_us)
>> continue;
>>
>> data->last_state_idx = i;
>> - data->exit_us = s->exit_latency;
>> + data->exit_us = s->power.exit_latency;
>> }
>>
>> return data->last_state_idx;
>> diff --git a/drivers/cpuidle/sysfs.c b/drivers/cpuidle/sysfs.c
>> index e918b6d..1a45541 100644
>> --- a/drivers/cpuidle/sysfs.c
>> +++ b/drivers/cpuidle/sysfs.c
>> @@ -252,7 +252,7 @@ static struct cpuidle_state_attr attr_##_name = __ATTR(_name, 0644, show, store)
>> static ssize_t show_state_##_name(struct cpuidle_state *state, \
>> struct cpuidle_state_usage *state_usage, char *buf) \
>> { \
>> - return sprintf(buf, "%u\n", state->_name);\
>> + return sprintf(buf, "%u\n", state->power._name);\
>> }
>>
>> #define define_store_state_ull_function(_name) \
>> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
>> index 8e1939f..4f0533e 100644
>> --- a/drivers/idle/intel_idle.c
>> +++ b/drivers/idle/intel_idle.c
>> @@ -128,29 +128,29 @@ static struct cpuidle_state nehalem_cstates[] = {
>> .name = "C1-NHM",
>> .desc = "MWAIT 0x00",
>> .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 3,
>> - .target_residency = 6,
>> + .power.exit_latency = 3,
>> + .power.target_residency = 6,
>> .enter = &intel_idle },
>> {
>> .name = "C1E-NHM",
>> .desc = "MWAIT 0x01",
>> .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 10,
>> - .target_residency = 20,
>> + .power.exit_latency = 10,
>> + .power.target_residency = 20,
>> .enter = &intel_idle },
>> {
>> .name = "C3-NHM",
>> .desc = "MWAIT 0x10",
>> .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 20,
>> - .target_residency = 80,
>> + .power.exit_latency = 20,
>> + .power.target_residency = 80,
>> .enter = &intel_idle },
>> {
>> .name = "C6-NHM",
>> .desc = "MWAIT 0x20",
>> .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 200,
>> - .target_residency = 800,
>> + .power.exit_latency = 200,
>> + .power.target_residency = 800,
>> .enter = &intel_idle },
>> {
>> .enter = NULL }
>> @@ -161,36 +161,36 @@ static struct cpuidle_state snb_cstates[] = {
>> .name = "C1-SNB",
>> .desc = "MWAIT 0x00",
>> .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 2,
>> - .target_residency = 2,
>> + .power.exit_latency = 2,
>> + .power.target_residency = 2,
>> .enter = &intel_idle },
>> {
>> .name = "C1E-SNB",
>> .desc = "MWAIT 0x01",
>> .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 10,
>> - .target_residency = 20,
>> + .power.exit_latency = 10,
>> + .power.target_residency = 20,
>> .enter = &intel_idle },
>> {
>> .name = "C3-SNB",
>> .desc = "MWAIT 0x10",
>> .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 80,
>> - .target_residency = 211,
>> + .power.exit_latency = 80,
>> + .power.target_residency = 211,
>> .enter = &intel_idle },
>> {
>> .name = "C6-SNB",
>> .desc = "MWAIT 0x20",
>> .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 104,
>> - .target_residency = 345,
>> + .power.exit_latency = 104,
>> + .power.target_residency = 345,
>> .enter = &intel_idle },
>> {
>> .name = "C7-SNB",
>> .desc = "MWAIT 0x30",
>> .flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 109,
>> - .target_residency = 345,
>> + .power.exit_latency = 109,
>> + .power.target_residency = 345,
>> .enter = &intel_idle },
>> {
>> .enter = NULL }
>> @@ -201,36 +201,36 @@ static struct cpuidle_state ivb_cstates[] = {
>> .name = "C1-IVB",
>> .desc = "MWAIT 0x00",
>> .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 1,
>> - .target_residency = 1,
>> + .power.exit_latency = 1,
>> + .power.target_residency = 1,
>> .enter = &intel_idle },
>> {
>> .name = "C1E-IVB",
>> .desc = "MWAIT 0x01",
>> .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 10,
>> - .target_residency = 20,
>> + .power.exit_latency = 10,
>> + .power.target_residency = 20,
>> .enter = &intel_idle },
>> {
>> .name = "C3-IVB",
>> .desc = "MWAIT 0x10",
>> .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 59,
>> - .target_residency = 156,
>> + .power.exit_latency = 59,
>> + .power.target_residency = 156,
>> .enter = &intel_idle },
>> {
>> .name = "C6-IVB",
>> .desc = "MWAIT 0x20",
>> .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 80,
>> - .target_residency = 300,
>> + .power.exit_latency = 80,
>> + .power.target_residency = 300,
>> .enter = &intel_idle },
>> {
>> .name = "C7-IVB",
>> .desc = "MWAIT 0x30",
>> .flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 87,
>> - .target_residency = 300,
>> + .power.exit_latency = 87,
>> + .power.target_residency = 300,
>> .enter = &intel_idle },
>> {
>> .enter = NULL }
>> @@ -241,57 +241,57 @@ static struct cpuidle_state hsw_cstates[] = {
>> .name = "C1-HSW",
>> .desc = "MWAIT 0x00",
>> .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 2,
>> - .target_residency = 2,
>> + .power.exit_latency = 2,
>> + .power.target_residency = 2,
>> .enter = &intel_idle },
>> {
>> .name = "C1E-HSW",
>> .desc = "MWAIT 0x01",
>> .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 10,
>> - .target_residency = 20,
>> + .power.exit_latency = 10,
>> + .power.target_residency = 20,
>> .enter = &intel_idle },
>> {
>> .name = "C3-HSW",
>> .desc = "MWAIT 0x10",
>> .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 33,
>> - .target_residency = 100,
>> + .power.exit_latency = 33,
>> + .power.target_residency = 100,
>> .enter = &intel_idle },
>> {
>> .name = "C6-HSW",
>> .desc = "MWAIT 0x20",
>> .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 133,
>> - .target_residency = 400,
>> + .power.exit_latency = 133,
>> + .power.target_residency = 400,
>> .enter = &intel_idle },
>> {
>> .name = "C7s-HSW",
>> .desc = "MWAIT 0x32",
>> .flags = MWAIT2flg(0x32) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 166,
>> - .target_residency = 500,
>> + .power.exit_latency = 166,
>> + .power.target_residency = 500,
>> .enter = &intel_idle },
>> {
>> .name = "C8-HSW",
>> .desc = "MWAIT 0x40",
>> .flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 300,
>> - .target_residency = 900,
>> + .power.exit_latency = 300,
>> + .power.target_residency = 900,
>> .enter = &intel_idle },
>> {
>> .name = "C9-HSW",
>> .desc = "MWAIT 0x50",
>> .flags = MWAIT2flg(0x50) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 600,
>> - .target_residency = 1800,
>> + .power.exit_latency = 600,
>> + .power.target_residency = 1800,
>> .enter = &intel_idle },
>> {
>> .name = "C10-HSW",
>> .desc = "MWAIT 0x60",
>> .flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 2600,
>> - .target_residency = 7700,
>> + .power.exit_latency = 2600,
>> + .power.target_residency = 7700,
>> .enter = &intel_idle },
>> {
>> .enter = NULL }
>> @@ -302,29 +302,29 @@ static struct cpuidle_state atom_cstates[] = {
>> .name = "C1E-ATM",
>> .desc = "MWAIT 0x00",
>> .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 10,
>> - .target_residency = 20,
>> + .power.exit_latency = 10,
>> + .power.target_residency = 20,
>> .enter = &intel_idle },
>> {
>> .name = "C2-ATM",
>> .desc = "MWAIT 0x10",
>> .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 20,
>> - .target_residency = 80,
>> + .power.exit_latency = 20,
>> + .power.target_residency = 80,
>> .enter = &intel_idle },
>> {
>> .name = "C4-ATM",
>> .desc = "MWAIT 0x30",
>> .flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 100,
>> - .target_residency = 400,
>> + .power.exit_latency = 100,
>> + .power.target_residency = 400,
>> .enter = &intel_idle },
>> {
>> .name = "C6-ATM",
>> .desc = "MWAIT 0x52",
>> .flags = MWAIT2flg(0x52) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 140,
>> - .target_residency = 560,
>> + .power.exit_latency = 140,
>> + .power.target_residency = 560,
>> .enter = &intel_idle },
>> {
>> .enter = NULL }
>> @@ -334,15 +334,15 @@ static struct cpuidle_state avn_cstates[] = {
>> .name = "C1-AVN",
>> .desc = "MWAIT 0x00",
>> .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 2,
>> - .target_residency = 2,
>> + .power.exit_latency = 2,
>> + .power.target_residency = 2,
>> .enter = &intel_idle },
>> {
>> .name = "C6-AVN",
>> .desc = "MWAIT 0x51",
>> .flags = MWAIT2flg(0x51) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 15,
>> - .target_residency = 45,
>> + .power.exit_latency = 15,
>> + .power.target_residency = 45,
>> .enter = &intel_idle },
>> {
>> .enter = NULL }
>> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
>> index b0238cb..eb58ab3 100644
>> --- a/include/linux/cpuidle.h
>> +++ b/include/linux/cpuidle.h
>> @@ -35,14 +35,18 @@ struct cpuidle_state_usage {
>> unsigned long long time; /* in US */
>> };
>>
>> +struct cpuidle_power {
>> + unsigned int exit_latency; /* in US */
>> + unsigned int target_residency; /* in US */
>> + int power_usage; /* in mW */
>> +};
>> +
>> struct cpuidle_state {
>> char name[CPUIDLE_NAME_LEN];
>> char desc[CPUIDLE_DESC_LEN];
>>
>> unsigned int flags;
>> - unsigned int exit_latency; /* in US */
>> - int power_usage; /* in mW */
>> - unsigned int target_residency; /* in US */
>> + struct cpuidle_power power;
>> bool disabled; /* disabled on all CPUs */
>>
>> int (*enter) (struct cpuidle_device *dev,
>> --
>> 1.7.9.5
>>
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
* Re: [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure
2014-03-28 20:42 ` Daniel Lezcano
@ 2014-03-29 0:00 ` Nicolas Pitre
0 siblings, 0 replies; 47+ messages in thread
From: Nicolas Pitre @ 2014-03-29 0:00 UTC (permalink / raw)
To: Daniel Lezcano
Cc: LKML, mingo, Peter Zijlstra, Rafael J. Wysocki, linux-pm,
Alex Shi, Vincent Guittot, Morten Rasmussen
On Fri, 28 Mar 2014, Daniel Lezcano wrote:
> Hi Nicolas,
>
> thanks for reviewing the patchset.
>
> On 03/28/2014 07:17 PM, Nicolas Pitre wrote:
> > On Fri, 28 Mar 2014, Daniel Lezcano wrote:
> >
> >> The scheduler needs some information from cpuidle to know the timing for a
> >> specific idle state a cpu is.
> >>
> >> This patch creates a separate structure to group the cpuidle power info in
> >> order to share it with the scheduler. It improves the encapsulation of the
> >> code.
> >
> > Having cpuidle_power as a structure name, or worse, 'power' as a struct
> > member, is a really bad choice.
>
> Yes, I was asking myself if this name was a good choice or not. I
> assumed 'power' could have been a good name because 'target_residency'
> is a time conversion of the power needed to enter this state.
Still, that's something the casual reviewer might not know.
And we ought to be careful when talking about power as well. By
definition, power means energy transferred per unit of time. Sometimes
we tend to say 'power' when we actually mean 'energy'. With more "power
aware" work going into the scheduler, it is better to disambiguate those
terms.
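As a small illustration of the difference, using the units found in
the cpuidle structure (mW for power_usage, us for the timings): energy
is power multiplied by time, so mW * us gives nanojoules. The helper
below is hypothetical and only a sketch; it also assumes a valid
(non-negative) power value, which is not the case for the poll state
where power_usage is -1.

```c
#include <stdint.h>

/*
 * Energy spent while staying in an idle state.
 * power_mw is in milliwatts, time_us in microseconds.
 * 1 mW * 1 us = 1e-3 W * 1e-6 s = 1e-9 J, so the result is in nanojoules.
 */
static uint64_t idle_energy_nj(unsigned int power_mw, uint64_t time_us)
{
	return (uint64_t)power_mw * time_us;
}
```

For example, with the calxeda "PG" values (power_usage = 50,
target_residency = 200), staying in the state for one target residency
would cost 50 * 200 = 10000 nJ.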
> > Amongst the fields this struct
> > contains, only 1 out of 3 is about power. The word "power" is already
> > abused quite significantly to mean too many different things already.
> >
> > I'd suggest something inspired by your own patch log message i.e.
> > 'struct cpuidle_info' instead, and use 'info' as a field name within
> > struct cpuidle_state. Having 'params" instead of "info" could be a good
> > alternative too, although slightly longer.
>
> Hmm 'info' or 'param' sound too vague. What about:
>
> cpuidle_attr
> or
> cpuidle_property
As you wish. As long as it isn't 'power'.
Nicolas
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
2014-03-28 12:29 [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Daniel Lezcano
` (2 preceding siblings ...)
2014-03-28 12:29 ` [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu Daniel Lezcano
@ 2014-03-31 13:52 ` Vincent Guittot
2014-03-31 15:55 ` Daniel Lezcano
2014-04-01 23:01 ` Rafael J. Wysocki
2014-04-04 6:29 ` Len Brown
5 siblings, 1 reply; 47+ messages in thread
From: Vincent Guittot @ 2014-03-31 13:52 UTC (permalink / raw)
To: Daniel Lezcano
Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, rjw, Nicolas Pitre,
linux-pm, Alex Shi, Morten Rasmussen
On 28 March 2014 13:29, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> The following patchset provides an interaction between cpuidle and the scheduler.
>
> The first patch encapsulate the needed information for the scheduler in a
> separate cpuidle structure. The second one stores the pointer to this structure
> when entering idle. The third one, use this information to take the decision to
> find the idlest cpu.
>
> After some basic testing with hackbench, it appears there is an improvement for
> the performances (small) and for the duration of the idle states (which provides
> a better power saving).
>
> The measurement has been done with the 'idlestat' tool previously posted in this
> mailing list.
>
> So the benefit is good for both sides performance and power saving.
Hi Daniel,
I have looked at your results and I'm a bit surprised that you have so
much time in C-states with a test that involves 400 tasks on a
dual-core HT system. You shouldn't have any CPU in an idle state when
running hackbench; the total time of core0 in C7-IVB is
87932131.00 us, which is quite huge for a bench that runs for 44 sec.
Or am I doing something wrong in the interpretation of the results?
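For reference, the comparison can be sketched as below. The helper name
is hypothetical; it only converts the idlestat total from us to seconds
and divides it by the hackbench time.

```c
/* Ratio of accumulated idle time (in us) to the benchmark run time (in s). */
static double idle_to_runtime_ratio(double idle_us, double runtime_s)
{
	return (idle_us / 1e6) / runtime_s;
}
```

With idle_us = 87932131.00 and runtime_s = 44.433, the ratio is a bit
under 2, i.e. the reported C7-IVB time is roughly twice the duration of
the hackbench run.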
Regards,
Vincent
>
> The select_idle_sibling could be also improved in the same way.
>
> ====================== test with hackbench 3.14-rc8 =========================
>
> /usr/bin/hackbench -l 10000 -s 4096
>
> Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
> Each sender will pass 10000 messages of 4096 bytes
>
> Time: 44.433
>
> Total trace buffer: 1846688 kB
> clusterA@state hits total(us) avg(us) min(us) max(us)
> POLL 0 0.00 0.00 0.00 0.00
> C1-IVB 0 0.00 0.00 0.00 0.00
> C1E-VB 0 0.00 0.00 0.00 0.00
> C3-IVB 0 0.00 0.00 0.00 0.00
> C6-IVB 0 0.00 0.00 0.00 0.00
> C7-IVB 0 0.00 0.00 0.00 0.00
> core0@state hits total(us) avg(us) min(us) max(us)
> POLL 0 0.00 0.00 0.00 0.00
> C1-IVB 0 0.00 0.00 0.00 0.00
> C1E-IVB 0 0.00 0.00 0.00 0.00
> C3-IVB 0 0.00 0.00 0.00 0.00
> C6-IVB 0 0.00 0.00 0.00 0.00
> C7-IVB 1396 87932131.00 62988.63 0.00 320146.00
> cpu0@state hits total(us) avg(us) min(us) max(us)
> POLL 0 0.00 0.00 0.00 0.00
> C1-IVB 1 14.00 14.00 14.00 14.00
> C1E-VB 0 0.00 0.00 0.00 0.00
> C3-IVB 1 262.00 262.00 262.00 262.00
> C6-IVB 0 0.00 0.00 0.00 0.00
> C7-IVB 1180 87938177.00 74523.88 1.00 320147.00
> 1701 0 0.00 0.00 0.00 0.00
> 1700 0 0.00 0.00 0.00 0.00
> 1600 0 0.00 0.00 0.00 0.00
> 1500 0 0.00 0.00 0.00 0.00
> 1400 0 0.00 0.00 0.00 0.00
> 1300 0 0.00 0.00 0.00 0.00
> 1200 0 0.00 0.00 0.00 0.00
> 1100 0 0.00 0.00 0.00 0.00
> 1000 0 0.00 0.00 0.00 0.00
> 900 0 0.00 0.00 0.00 0.00
> 800 0 0.00 0.00 0.00 0.00
> 782 0 0.00 0.00 0.00 0.00
> cpu0 wakeups name count
> irq009 acpi 1
> cpu1@state hits total(us) avg(us) min(us) max(us)
> POLL 0 0.00 0.00 0.00 0.00
> C1-IVB 0 0.00 0.00 0.00 0.00
> C1E-VB 0 0.00 0.00 0.00 0.00
> C3-IVB 0 0.00 0.00 0.00 0.00
> C6-IVB 0 0.00 0.00 0.00 0.00
> C7-IVB 475 87941356.00 185139.70 322.00 1500690.00
> 1701 0 0.00 0.00 0.00 0.00
> 1700 0 0.00 0.00 0.00 0.00
> 1600 0 0.00 0.00 0.00 0.00
> 1500 0 0.00 0.00 0.00 0.00
> 1400 0 0.00 0.00 0.00 0.00
> 1300 0 0.00 0.00 0.00 0.00
> 1200 0 0.00 0.00 0.00 0.00
> 1100 0 0.00 0.00 0.00 0.00
> 1000 0 0.00 0.00 0.00 0.00
> 900 0 0.00 0.00 0.00 0.00
> 800 0 0.00 0.00 0.00 0.00
> 782 0 0.00 0.00 0.00 0.00
> cpu1 wakeups name count
> irq009 acpi 3
> core1@state hits total(us) avg(us) min(us) max(us)
> POLL 0 0.00 0.00 0.00 0.00
> C1-IVB 0 0.00 0.00 0.00 0.00
> C1E-IVB 0 0.00 0.00 0.00 0.00
> C3-IVB 0 0.00 0.00 0.00 0.00
> C6-IVB 0 0.00 0.00 0.00 0.00
> C7-IVB 0 0.00 0.00 0.00 0.00
> cpu2@state hits total(us) avg(us) min(us) max(us)
> POLL 0 0.00 0.00 0.00 0.00
> C1-IVB 11 288157.00 26196.09 16.00 200060.00
> C1E-VB 6 221601.00 36933.50 79.00 200066.00
> C3-IVB 0 0.00 0.00 0.00 0.00
> C6-IVB 0 0.00 0.00 0.00 0.00
> C7-IVB 950 87417466.00 92018.39 19.00 200074.00
> 1701 0 0.00 0.00 0.00 0.00
> 1700 0 0.00 0.00 0.00 0.00
> 1600 0 0.00 0.00 0.00 0.00
> 1500 2 34.00 17.00 11.00 23.00
> 1400 0 0.00 0.00 0.00 0.00
> 1300 0 0.00 0.00 0.00 0.00
> 1200 0 0.00 0.00 0.00 0.00
> 1100 0 0.00 0.00 0.00 0.00
> 1000 0 0.00 0.00 0.00 0.00
> 900 0 0.00 0.00 0.00 0.00
> 800 0 0.00 0.00 0.00 0.00
> 782 745 18800.00 25.23 2.00 156.00
> cpu2 wakeups name count
> irq019 ahci 50
> irq009 acpi 17
> cpu3@state hits total(us) avg(us) min(us) max(us)
> POLL 0 0.00 0.00 0.00 0.00
> C1-IVB 0 0.00 0.00 0.00 0.00
> C1E-VB 0 0.00 0.00 0.00 0.00
> C3-IVB 0 0.00 0.00 0.00 0.00
> C6-IVB 0 0.00 0.00 0.00 0.00
> C7-IVB 0 0.00 0.00 0.00 0.00
> 1701 0 0.00 0.00 0.00 0.00
> 1700 0 0.00 0.00 0.00 0.00
> 1600 0 0.00 0.00 0.00 0.00
> 1500 0 0.00 0.00 0.00 0.00
> 1400 0 0.00 0.00 0.00 0.00
> 1300 0 0.00 0.00 0.00 0.00
> 1200 0 0.00 0.00 0.00 0.00
> 1100 0 0.00 0.00 0.00 0.00
> 1000 0 0.00 0.00 0.00 0.00
> 900 0 0.00 0.00 0.00 0.00
> 800 0 0.00 0.00 0.00 0.00
> 782 0 0.00 0.00 0.00 0.00
> cpu3 wakeups name count
>
> ================ test with hackbench 3.14-rc8 + patchset ====================
>
> /usr/bin/hackbench -l 10000 -s 4096
>
> Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
> Each sender will pass 10000 messages of 4096 bytes
>
> Time: 42.179
>
> Total trace buffer: 1846688 kB
> clusterA@state hits total(us) avg(us) min(us) max(us)
> POLL 0 0.00 0.00 0.00 0.00
> C1-IVB 0 0.00 0.00 0.00 0.00
> C1E-VB 0 0.00 0.00 0.00 0.00
> C3-IVB 0 0.00 0.00 0.00 0.00
> C6-IVB 0 0.00 0.00 0.00 0.00
> C7-IVB 0 0.00 0.00 0.00 0.00
> core0@state hits total(us) avg(us) min(us) max(us)
> POLL 0 0.00 0.00 0.00 0.00
> C1-IVB 0 0.00 0.00 0.00 0.00
> C1E-IVB 0 0.00 0.00 0.00 0.00
> C3-IVB 0 0.00 0.00 0.00 0.00
> C6-IVB 0 0.00 0.00 0.00 0.00
> C7-IVB 880 89157590.00 101315.44 0.00 400184.00
> cpu0@state hits total(us) avg(us) min(us) max(us)
> POLL 0 0.00 0.00 0.00 0.00
> C1-IVB 0 0.00 0.00 0.00 0.00
> C1E-VB 1 233.00 233.00 233.00 233.00
> C3-IVB 1 260.00 260.00 260.00 260.00
> C6-IVB 0 0.00 0.00 0.00 0.00
> C7-IVB 700 89162006.00 127374.29 182.00 400187.00
> 1701 0 0.00 0.00 0.00 0.00
> 1700 0 0.00 0.00 0.00 0.00
> 1600 0 0.00 0.00 0.00 0.00
> 1500 0 0.00 0.00 0.00 0.00
> 1400 0 0.00 0.00 0.00 0.00
> 1300 0 0.00 0.00 0.00 0.00
> 1200 0 0.00 0.00 0.00 0.00
> 1100 0 0.00 0.00 0.00 0.00
> 1000 0 0.00 0.00 0.00 0.00
> 900 0 0.00 0.00 0.00 0.00
> 800 0 0.00 0.00 0.00 0.00
> 782 0 0.00 0.00 0.00 0.00
> cpu0 wakeups name count
> irq009 acpi 2
> cpu1@state hits total(us) avg(us) min(us) max(us)
> POLL 0 0.00 0.00 0.00 0.00
> C1-IVB 0 0.00 0.00 0.00 0.00
> C1E-VB 0 0.00 0.00 0.00 0.00
> C3-IVB 0 0.00 0.00 0.00 0.00
> C6-IVB 0 0.00 0.00 0.00 0.00
> C7-IVB 334 89164805.00 266960.49 1.00 1500677.00
> 1701 0 0.00 0.00 0.00 0.00
> 1700 0 0.00 0.00 0.00 0.00
> 1600 0 0.00 0.00 0.00 0.00
> 1500 0 0.00 0.00 0.00 0.00
> 1400 0 0.00 0.00 0.00 0.00
> 1300 0 0.00 0.00 0.00 0.00
> 1200 0 0.00 0.00 0.00 0.00
> 1100 0 0.00 0.00 0.00 0.00
> 1000 0 0.00 0.00 0.00 0.00
> 900 0 0.00 0.00 0.00 0.00
> 800 0 0.00 0.00 0.00 0.00
> 782 0 0.00 0.00 0.00 0.00
> cpu1 wakeups name count
> irq009 acpi 6
> core1@state hits total(us) avg(us) min(us) max(us)
> POLL 0 0.00 0.00 0.00 0.00
> C1-IVB 0 0.00 0.00 0.00 0.00
> C1E-IVB 0 0.00 0.00 0.00 0.00
> C3-IVB 0 0.00 0.00 0.00 0.00
> C6-IVB 0 0.00 0.00 0.00 0.00
> C7-IVB 0 0.00 0.00 0.00 0.00
> cpu2@state hits total(us) avg(us) min(us) max(us)
> POLL 0 0.00 0.00 0.00 0.00
> C1-IVB 19 2169047.00 114160.37 18.00 999129.00
> C1E-IB 0 0.00 0.00 0.00 0.00
> C3-IVB 0 0.00 0.00 0.00 0.00
> C6-IVB 0 0.00 0.00 0.00 0.00
> C7-IVB 376 86993307.00 231365.18 20.00 1500682.00
> 1701 0 0.00 0.00 0.00 0.00
> 1700 0 0.00 0.00 0.00 0.00
> 1600 0 0.00 0.00 0.00 0.00
> 1500 0 0.00 0.00 0.00 0.00
> 1400 0 0.00 0.00 0.00 0.00
> 1300 0 0.00 0.00 0.00 0.00
> 1200 0 0.00 0.00 0.00 0.00
> 1100 0 0.00 0.00 0.00 0.00
> 1000 0 0.00 0.00 0.00 0.00
> 900 0 0.00 0.00 0.00 0.00
> 800 0 0.00 0.00 0.00 0.00
> 782 0 0.00 0.00 0.00 0.00
> cpu2 wakeups name count
> irq009 acpi 32
> irq019 ahci 45
> cpu3@state hits total(us) avg(us) min(us) max(us)
> POLL 0 0.00 0.00 0.00 0.00
> C1-IVB 0 0.00 0.00 0.00 0.00
> C1E-VB 0 0.00 0.00 0.00 0.00
> C3-IVB 0 0.00 0.00 0.00 0.00
> C6-IVB 0 0.00 0.00 0.00 0.00
> C7-IVB 0 0.00 0.00 0.00 0.00
> 1701 0 0.00 0.00 0.00 0.00
> 1700 0 0.00 0.00 0.00 0.00
> 1600 0 0.00 0.00 0.00 0.00
> 1500 0 0.00 0.00 0.00 0.00
> 1400 0 0.00 0.00 0.00 0.00
> 1300 0 0.00 0.00 0.00 0.00
> 1200 0 0.00 0.00 0.00 0.00
> 1100 0 0.00 0.00 0.00 0.00
> 1000 0 0.00 0.00 0.00 0.00
> 900 0 0.00 0.00 0.00 0.00
> 800 0 0.00 0.00 0.00 0.00
> 782 0 0.00 0.00 0.00 0.00
> cpu3 wakeups name count
>
>
> Daniel Lezcano (3):
> cpuidle: encapsulate power info in a separate structure
> idle: store the idle state the cpu is
> sched/fair: use the idle state info to choose the idlest cpu
>
> arch/arm/include/asm/cpuidle.h | 6 +-
> arch/arm/mach-exynos/cpuidle.c | 4 +-
> drivers/acpi/processor_idle.c | 4 +-
> drivers/base/power/domain.c | 6 +-
> drivers/cpuidle/cpuidle-at91.c | 4 +-
> drivers/cpuidle/cpuidle-big_little.c | 9 +--
> drivers/cpuidle/cpuidle-calxeda.c | 6 +-
> drivers/cpuidle/cpuidle-kirkwood.c | 4 +-
> drivers/cpuidle/cpuidle-powernv.c | 8 +--
> drivers/cpuidle/cpuidle-pseries.c | 12 ++--
> drivers/cpuidle/cpuidle-ux500.c | 14 ++---
> drivers/cpuidle/cpuidle-zynq.c | 4 +-
> drivers/cpuidle/driver.c | 6 +-
> drivers/cpuidle/governors/ladder.c | 14 +++--
> drivers/cpuidle/governors/menu.c | 8 +--
> drivers/cpuidle/sysfs.c | 2 +-
> drivers/idle/intel_idle.c | 112 +++++++++++++++++-----------------
> include/linux/cpuidle.h | 10 ++-
> kernel/sched/fair.c | 46 ++++++++++++--
> kernel/sched/idle.c | 17 +++++-
> kernel/sched/sched.h | 5 ++
> 21 files changed, 180 insertions(+), 121 deletions(-)
>
> --
> 1.7.9.5
>
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
2014-03-31 13:52 ` [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Vincent Guittot
@ 2014-03-31 15:55 ` Daniel Lezcano
2014-04-01 7:16 ` Vincent Guittot
0 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-03-31 15:55 UTC (permalink / raw)
To: Vincent Guittot
Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, rjw, Nicolas Pitre,
linux-pm, Alex Shi, Morten Rasmussen
On 03/31/2014 03:52 PM, Vincent Guittot wrote:
> On 28 March 2014 13:29, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>> The following patchset provides an interaction between cpuidle and the scheduler.
>>
>> The first patch encapsulate the needed information for the scheduler in a
>> separate cpuidle structure. The second one stores the pointer to this structure
>> when entering idle. The third one, use this information to take the decision to
>> find the idlest cpu.
>>
>> After some basic testing with hackbench, it appears there is an improvement for
>> the performances (small) and for the duration of the idle states (which provides
>> a better power saving).
>>
>> The measurement has been done with the 'idlestat' tool previously posted in this
>> mailing list.
>>
>> So the benefit is good for both sides performance and power saving.
>
> Hi Daniel,
>
> I have looked at your results and i'm a bit surprised that you have so
> much time in C-state with a test that involved 400 tasks on a dual
> cores HT system. You shouldn't have any CPUs in idle state when
> running hackbench; the total time of core0state in C7-IVB is
> 87932131.00(us), which is quite huge for a bench that runs 44sec. Or
> i'm doing something wrong in the interpretation of the results ?
No, actually I mixed up the hackbench output from runs without idlestat
and runs with idlestat.

The hackbench results in the cover letter were obtained without idlestat.

The idlestat results are consistent, and idlestat does add a
non-negligible overhead, as it impacts the hackbench results.

So to summarize, hackbench has been run 4 times:

1, 2: without idlestat, with and without the patchset - hackbench
results ~42 secs

3, 4: with idlestat, with and without the patchset - hackbench results
~87 secs
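
The mismatch raised above can be checked with a little arithmetic. The
sketch below is illustrative only: it copies the core0 C7-IVB totals and
the hackbench times from the tables in this thread, and the helper names
(us_to_s, summarize) are made up for the example.

```python
# Figures copied from the idlestat tables in this thread (core0, C7-IVB).
C7_TOTAL_US = {"3.14-rc8": 87932131.00, "3.14-rc8 + patchset": 89157590.00}
C7_HITS = {"3.14-rc8": 1396, "3.14-rc8 + patchset": 880}
# hackbench "Time:" lines from the cover letter (runs without idlestat).
HACKBENCH_S = {"3.14-rc8": 44.433, "3.14-rc8 + patchset": 42.179}


def us_to_s(microseconds):
    """Convert microseconds to seconds."""
    return microseconds / 1e6


def summarize(kernel):
    """Return (C7 seconds, average C7 residency in ms, C7 time / hackbench time)."""
    idle_s = us_to_s(C7_TOTAL_US[kernel])
    avg_ms = C7_TOTAL_US[kernel] / C7_HITS[kernel] / 1e3
    return idle_s, avg_ms, idle_s / HACKBENCH_S[kernel]


if __name__ == "__main__":
    for kernel in C7_TOTAL_US:
        idle_s, avg_ms, ratio = summarize(kernel)
        print(f"{kernel}: C7 {idle_s:.2f} s, avg {avg_ms:.2f} ms, "
              f"{ratio:.2f}x the hackbench time")
```

A ratio above 1 means the reported idle time is longer than the hackbench
run it was printed next to, which can only happen if the two numbers come
from different runs.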

At first glance, the results are consistent, but I will double check
them.

Do you have a suggestion for a benchmarking program?

Thanks!

-- Daniel
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
2014-03-31 15:55 ` Daniel Lezcano
@ 2014-04-01 7:16 ` Vincent Guittot
2014-04-01 7:43 ` Daniel Lezcano
0 siblings, 1 reply; 47+ messages in thread
From: Vincent Guittot @ 2014-04-01 7:16 UTC (permalink / raw)
To: Daniel Lezcano
Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, rjw, Nicolas Pitre,
linux-pm, Alex Shi, Morten Rasmussen
On 31 March 2014 17:55, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> On 03/31/2014 03:52 PM, Vincent Guittot wrote:
>>
>> On 28 March 2014 13:29, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>>>
>>> The following patchset provides an interaction between cpuidle and the
>>> scheduler.
>>>
>>> The first patch encapsulate the needed information for the scheduler in a
>>> separate cpuidle structure. The second one stores the pointer to this
>>> structure when entering idle. The third one, use this information to take
>>> the decision to find the idlest cpu.
>>>
>>> After some basic testing with hackbench, it appears there is an
>>> improvement for the performances (small) and for the duration of the idle
>>> states (which provides a better power saving).
>>>
>>> The measurement has been done with the 'idlestat' tool previously posted
>>> in this mailing list.
>>>
>>> So the benefit is good for both sides performance and power saving.
>>
>>
>> Hi Daniel,
>>
>> I have looked at your results and i'm a bit surprised that you have so
>> much time in C-state with a test that involved 400 tasks on a dual
>> cores HT system. You shouldn't have any CPUs in idle state when
>> running hackbench; the total time of core0state in C7-IVB is
>> 87932131.00(us), which is quite huge for a bench that runs 44sec. Or
>> i'm doing something wrong in the interpretation of the results ?
>
>
> No, actually I mixed the output of hackbench without being run with idlestat
> or with idlestat.
>
> The hackbench's results below are without idlestat.
>
> The idlestat results are consistent and effectively it adds a non
> negligeable overhead as it impacts the hackbench results.
>
> So to summarize, hackbench has been run 4 times.
>
> 1, 2 : without idlestat, with and without the patchset - hackbench results
> ~42 secs
>
> 3, 4 : with idlestat, with and without the patchset - hackbench results ~87
> secs
>
> At the first the glance, the results are consistent but I will double check
> them.
>
> Do you have a suggestion for a benchmarking program ?

We are working on a benchmark that can generate a medium load pattern
with idle CPUs, but it is not available yet. In the meantime, one
benchmark that plays with idle time is cyclictest. It will not give you
performance results, only scheduling latency, which might be what you
are looking for.
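
As a rough illustration of the cyclictest suggestion, the sketch below
only builds a command line. The option values are arbitrary examples, not
settings anyone in this thread used, and cyclictest_cmd is a hypothetical
helper name.

```python
import shlex


def cyclictest_cmd(priority=80, interval_us=1000, loops=10000):
    """Build a cyclictest command line; the defaults are arbitrary examples."""
    return [
        "cyclictest",
        "--mlockall",                 # lock memory to avoid page-fault noise
        f"--priority={priority}",     # real-time priority of the measuring thread
        f"--interval={interval_us}",  # wake-up period in microseconds
        f"--loops={loops}",           # stop after this many wake-ups
        "--quiet",                    # print only the summary at the end
    ]


if __name__ == "__main__":
    # Print the command instead of running it: cyclictest usually needs
    # root (or suitable rtprio limits) to get a real-time priority.
    print(shlex.join(cyclictest_cmd()))
```

Running the printed command once on each kernel, with and without the
patchset, would give min/avg/max wake-up latencies to compare.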
Vincent
>
> Thanks !
>
> -- Daniel
>
>
>
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
2014-04-01 7:16 ` Vincent Guittot
@ 2014-04-01 7:43 ` Daniel Lezcano
2014-04-01 9:05 ` Vincent Guittot
0 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-01 7:43 UTC (permalink / raw)
To: Vincent Guittot
Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, rjw, Nicolas Pitre,
linux-pm, Alex Shi, Morten Rasmussen
On 04/01/2014 09:16 AM, Vincent Guittot wrote:
> On 31 March 2014 17:55, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>> On 03/31/2014 03:52 PM, Vincent Guittot wrote:
>>>
>>> On 28 March 2014 13:29, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>>>>
>>>> The following patchset provides an interaction between cpuidle and the
>>>> scheduler.
>>>>
>>>> The first patch encapsulate the needed information for the scheduler in
>>>> a separate cpuidle structure. The second one stores the pointer to this
>>>> structure when entering idle. The third one, use this information to
>>>> take the decision to find the idlest cpu.
>>>>
>>>> After some basic testing with hackbench, it appears there is an
>>>> improvement for the performances (small) and for the duration of the
>>>> idle states (which provides a better power saving).
>>>>
>>>> The measurement has been done with the 'idlestat' tool previously posted
>>>> in this mailing list.
>>>>
>>>> So the benefit is good for both sides performance and power saving.
>>>
>>>
>>> Hi Daniel,
>>>
>>> I have looked at your results and i'm a bit surprised that you have so
>>> much time in C-state with a test that involved 400 tasks on a dual
>>> cores HT system. You shouldn't have any CPUs in idle state when
>>> running hackbench; the total time of core0state in C7-IVB is
>>> 87932131.00(us), which is quite huge for a bench that runs 44sec. Or
>>> i'm doing something wrong in the interpretation of the results ?
>>
>>
>> No, actually I mixed the output of hackbench without being run with idlestat
>> or with idlestat.
>>
>> The hackbench's results below are without idlestat.
>>
>> The idlestat results are consistent and effectively it adds a non
>> negligeable overhead as it impacts the hackbench results.
>>
>> So to summarize, hackbench has been run 4 times.
>>
>> 1, 2 : without idlestat, with and without the patchset - hackbench results
>> ~42 secs
>>
>> 3, 4 : with idlestat, with and without the patchset - hackbench results ~87
>> secs
>>
>> At the first the glance, the results are consistent but I will double check
>> them.
>>
>> Do you have a suggestion for a benchmarking program ?
>
> We are working on a bench which can generate middle load pattern with
> idle CPUs but it's not available yet. In the mean time, one bench that
> plays with idle time is cyclictest, it will not give you performance
> results but only scheduling latency which might be what you are
> looking for.

Yeah, thanks. I believe I know what is in the rt-tests package :)

What I meant is: what kind of values would you like to see with this
patchset?

--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
2014-04-01 7:43 ` Daniel Lezcano
@ 2014-04-01 9:05 ` Vincent Guittot
2014-04-15 13:13 ` Peter Zijlstra
0 siblings, 1 reply; 47+ messages in thread
From: Vincent Guittot @ 2014-04-01 9:05 UTC (permalink / raw)
To: Daniel Lezcano
Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, rjw, Nicolas Pitre,
linux-pm, Alex Shi, Morten Rasmussen
On 1 April 2014 09:43, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> On 04/01/2014 09:16 AM, Vincent Guittot wrote:
>>
>> On 31 March 2014 17:55, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>>>
>>> On 03/31/2014 03:52 PM, Vincent Guittot wrote:
>>>>
>>>>
>>>> On 28 March 2014 13:29, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>>>>>
>>>>> The following patchset provides an interaction between cpuidle and the
>>>>> scheduler.
>>>>>
>>>>> The first patch encapsulates the needed information for the scheduler in
>>>>> a separate cpuidle structure. The second one stores the pointer to this
>>>>> structure when entering idle. The third one uses this information to
>>>>> find the idlest cpu.
>>>>>
>>>>> After some basic testing with hackbench, it appears there is a (small)
>>>>> improvement in performance and in the duration of the idle states
>>>>> (which provides better power saving).
>>>>>
>>>>> The measurement was done with the 'idlestat' tool previously posted on
>>>>> this mailing list.
>>>>>
>>>>> So the benefit is good for both sides: performance and power saving.
>>>>
>>>>
>>>>
>>>> Hi Daniel,
>>>>
>>>> I have looked at your results and I'm a bit surprised that you have so
>>>> much time in C-state with a test that involves 400 tasks on a dual-core
>>>> HT system. You shouldn't have any CPUs in idle state when
>>>> running hackbench; the total time of core0@state in C7-IVB is
>>>> 87932131.00 us, which is quite huge for a bench that runs 44 sec. Or
>>>> am I doing something wrong in the interpretation of the results?
>>>
>>>
>>>
>>> No, actually I mixed up the hackbench output from the runs with and
>>> without idlestat.
>>>
>>> The hackbench results below are without idlestat.
>>>
>>> The idlestat results are consistent, and idlestat does add a
>>> non-negligible overhead, as it impacts the hackbench results.
>>>
>>> So to summarize, hackbench has been run 4 times.
>>>
>>> 1, 2 : without idlestat, with and without the patchset - hackbench
>>> results ~42 secs
>>>
>>> 3, 4 : with idlestat, with and without the patchset - hackbench results
>>> ~87 secs
>>>
>>> At first glance, the results are consistent but I will double check
>>> them.
>>>
>>> Do you have a suggestion for a benchmarking program?
>>
>>
>> We are working on a bench which can generate a medium load pattern with
>> idle CPUs but it's not available yet. In the meantime, one bench that
>> plays with idle time is cyclictest; it will not give you performance
>> results but only scheduling latency, which might be what you are
>> looking for.
>
>
> Yeah, thanks. I believe I know what is in the rt-tests package :)
>
> What I meant is: what kind of values would you like to see with this
> patchset?
IIUC, your patch tries to improve the wake-up latency of a task by
selecting the CPU with the shallowest C-state, so this metric seems
to be a good candidate.
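
For readers unfamiliar with the metric, the following is a minimal userspace sketch of the kind of measurement being suggested: sleep until an absolute deadline and record how late the wake-up actually was. It is not cyclictest's code; the function name `max_wakeup_latency_ns` and its parameters are hypothetical, and it only assumes the POSIX `clock_gettime()` and `clock_nanosleep()` calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to nanoseconds. */
static int64_t ts_to_ns(const struct timespec *ts)
{
	assert(ts != NULL);
	return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Hypothetical helper: sleep until an absolute deadline 'loops' times and
 * return the worst observed wake-up latency in nanoseconds, or -1 on error.
 */
int64_t max_wakeup_latency_ns(long interval_ns, int loops)
{
	struct timespec next, now;
	int64_t worst = 0;

	if (interval_ns <= 0 || loops <= 0)
		return -1;
	if (clock_gettime(CLOCK_MONOTONIC, &next))
		return -1;

	for (int i = 0; i < loops; i++) {
		/* Advance the absolute deadline by one interval. */
		next.tv_nsec += interval_ns;
		while (next.tv_nsec >= 1000000000L) {
			next.tv_nsec -= 1000000000L;
			next.tv_sec++;
		}

		/* The CPU may go idle here; waking up pays the exit latency. */
		if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL))
			return -1;
		if (clock_gettime(CLOCK_MONOTONIC, &now))
			return -1;

		/* Latency is how far past the deadline we actually woke up. */
		int64_t lat = ts_to_ns(&now) - ts_to_ns(&next);
		if (lat > worst)
			worst = lat;
	}
	return worst;
}
```

Comparing the value returned with and without the patchset, on an otherwise lightly loaded machine, would be one way to see whether waking a shallower CPU helps.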
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
2014-03-28 12:29 [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Daniel Lezcano
` (3 preceding siblings ...)
2014-03-31 13:52 ` [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Vincent Guittot
@ 2014-04-01 23:01 ` Rafael J. Wysocki
2014-04-02 3:14 ` Nicolas Pitre
2014-04-02 8:26 ` Daniel Lezcano
2014-04-04 6:29 ` Len Brown
5 siblings, 2 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2014-04-01 23:01 UTC (permalink / raw)
To: Daniel Lezcano
Cc: linux-kernel, mingo, peterz, nicolas.pitre, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On Friday, March 28, 2014 01:29:53 PM Daniel Lezcano wrote:
> The following patchset provides an interaction between cpuidle and the scheduler.
>
> The first patch encapsulates the needed information for the scheduler in a
> separate cpuidle structure. The second one stores the pointer to this structure
> when entering idle. The third one uses this information to find the idlest cpu.
>
> After some basic testing with hackbench, it appears there is a (small)
> improvement in performance and in the duration of the idle states (which
> provides better power saving).
>
> The measurement was done with the 'idlestat' tool previously posted on this
> mailing list.
>
> So the benefit is good for both sides: performance and power saving.
>
> The select_idle_sibling could also be improved in the same way.
Well, quite frankly, I don't really like this series. Not the idea itself, but
the way it has been implemented.
First off, if the scheduler is to access idle state data stored in struct
cpuidle_state, I'm not sure why we need a separate new structure for that?
Couldn't there be a pointer to a whole struct cpuidle_state from struct rq
instead? [->exit_latency is the only field that find_idlest_cpu() in your
third patch seems to be using anyway.]
Second, is accessing the idle state information for all CPUs from find_idlest_cpu()
guaranteed to be non-racy? I mean, what if a CPU changes its state from idle to
non-idle while another one is executing find_idlest_cpu()? In other words,
where's the read memory barrier corresponding to the write ones in the modified
cpu_idle_call()? And is the memory barrier actually sufficient? After all,
you need to guarantee that the CPU is still idle after you have evaluated
idle_cpu() on it.
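
The barrier pairing being asked about can be illustrated outside the kernel with C11 atomics. This is a minimal userspace sketch, not the patch's code: `struct idle_state`, `struct runqueue` and the `rq_*` helpers are hypothetical names, and release/acquire merely stands in for whatever barriers the kernel code would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the idle state description. */
struct idle_state {
	unsigned int exit_latency;
};

/* Hypothetical stand-in for the per-CPU runqueue. */
struct runqueue {
	_Atomic(const struct idle_state *) idle_state;
};

void rq_init(struct runqueue *rq)
{
	assert(rq != NULL);
	atomic_init(&rq->idle_state, NULL);
}

/* Writer side: publish the state when the CPU enters idle. */
void rq_enter_idle(struct runqueue *rq, const struct idle_state *st)
{
	atomic_store_explicit(&rq->idle_state, st, memory_order_release);
}

/* Writer side: clear the state when the CPU leaves idle. */
void rq_exit_idle(struct runqueue *rq)
{
	atomic_store_explicit(&rq->idle_state, NULL, memory_order_release);
}

/*
 * Reader side: load the pointer exactly once, with acquire ordering that
 * pairs with the release stores above. Returns 1 and fills *latency if a
 * state was published, 0 otherwise.
 */
int rq_read_exit_latency(struct runqueue *rq, unsigned int *latency)
{
	const struct idle_state *st =
		atomic_load_explicit(&rq->idle_state, memory_order_acquire);

	if (!st)
		return 0;
	*latency = st->exit_latency;
	return 1;
}
```

The pairing only orders the published data. It does not guarantee that the CPU is still idle by the time the reader acts on the value, which is the second half of the question, and it says nothing about the lifetime of the object the pointer refers to.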
Finally, is the heuristic used by find_idlest_cpu() to select the "idlest"
CPU really the best one? What about deeper vs shallower idle states, for example?
Rafael
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
2014-03-28 12:29 ` [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu Daniel Lezcano
@ 2014-04-02 3:05 ` Nicolas Pitre
2014-04-04 11:57 ` Rafael J. Wysocki
2014-04-17 13:53 ` Daniel Lezcano
2014-04-15 13:03 ` Peter Zijlstra
1 sibling, 2 replies; 47+ messages in thread
From: Nicolas Pitre @ 2014-04-02 3:05 UTC (permalink / raw)
To: Daniel Lezcano
Cc: linux-kernel, mingo, peterz, rjw, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On Fri, 28 Mar 2014, Daniel Lezcano wrote:
> As we know in which idle state the cpu is, we can investigate the following:
>
> 1. when did the cpu enter the idle state? the longer the cpu is idle, the
> deeper it is idle
> 2. what is the exit latency? the greater the exit latency is, the deeper it is
>
> With both pieces of information, when all cpus are idle, we can choose the idlest cpu.
>
> When one cpu is not idle, the old check against weighted load applies.
>
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
There seem to be some problems with the implementation.
> @@ -4336,20 +4337,53 @@ static int
>  find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
>  {
>  	unsigned long load, min_load = ULONG_MAX;
> -	int idlest = -1;
> +	unsigned int min_exit_latency = UINT_MAX;
> +	u64 idle_stamp, min_idle_stamp = ULONG_MAX;
I don't think you really meant to initialize a u64 variable with ULONG_MAX.
You probably want ULLONG_MAX here. Or, in fact, probably neither (more on
that later).
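
To illustrate the width concern: on a 32-bit ABI `unsigned long` is typically 32 bits wide, so ULONG_MAX is not the largest value a 64-bit variable can hold. A small userspace sketch, using `uint64_t` and `UINT64_MAX` from <stdint.h> as stand-ins for the kernel's u64 (the function name `earliest_stamp` is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the smallest time stamp in the array, or UINT64_MAX if it is empty.
 * The sentinel must be the maximum of the 64-bit type: with a 32-bit
 * sentinel, stamps above 2^32 would never compare lower and would be missed.
 */
uint64_t earliest_stamp(const uint64_t *stamps, size_t n)
{
	uint64_t min_stamp = UINT64_MAX;

	assert(stamps != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (stamps[i] < min_stamp)
			min_stamp = stamps[i];
	}
	return min_stamp;
}
```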
> +
> +	struct rq *rq;
> +	struct cpuidle_power *power;
> +
> +	int cpu_idle = -1;
> +	int cpu_busy = -1;
>  	int i;
>
>  	/* Traverse only the allowed CPUs */
>  	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
> -		load = weighted_cpuload(i);
>
> -		if (load < min_load || (load == min_load && i == this_cpu)) {
> -			min_load = load;
> -			idlest = i;
> +		if (idle_cpu(i)) {
> +
> +			rq = cpu_rq(i);
> +			power = rq->power;
> +			idle_stamp = rq->idle_stamp;
> +
> +			/* The cpu is idle since a shorter time */
> +			if (idle_stamp < min_idle_stamp) {
> +				min_idle_stamp = idle_stamp;
> +				cpu_idle = i;
> +				continue;
Don't you want the highest time stamp in order to select the most
recently idled CPU? Favoring the CPU which has been idle the longest
makes little sense.
> +			}
> +
> +			/* The cpu is idle but the exit_latency is shorter */
> +			if (power && power->exit_latency < min_exit_latency) {
> +				min_exit_latency = power->exit_latency;
> +				cpu_idle = i;
> +				continue;
> +			}
I think this is wrong. This gives priority to CPUs which have been idle
for a (longer... although this should have been) shorter period of time
over those with a shallower idle state. I think this should rather be:
	if (power && power->exit_latency < min_exit_latency) {
		min_exit_latency = power->exit_latency;
		latest_idle_stamp = idle_stamp;
		cpu_idle = i;
	} else if ((!power || power->exit_latency == min_exit_latency) &&
		   idle_stamp > latest_idle_stamp) {
		latest_idle_stamp = idle_stamp;
		cpu_idle = i;
	}
So the CPU with the shallowest idle state is selected in priority, and
if many CPUs are in the same state then the time stamp is used to
select the most recent one. Whenever
a shallower idle state is found then the latest_idle_stamp is reset for
that state even if it is further in the past.
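
The selection rule described above can be exercised in isolation with a small userspace sketch. Everything here is an assumption made for illustration: `struct cpu_snapshot` and `pick_shallowest_idle` are hypothetical names, the `has_state` flag models a NULL `power` pointer, and `latest_idle_stamp` is assumed to start at 0.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU snapshot; not a kernel structure. */
struct cpu_snapshot {
	int idle;                  /* non-zero if the CPU is idle */
	int has_state;             /* zero models a NULL 'power' pointer */
	unsigned int exit_latency; /* exit latency of the current idle state */
	uint64_t idle_stamp;       /* when the CPU went idle; larger is more recent */
};

/*
 * Pick the idle CPU with the smallest exit latency; among CPUs with the
 * same exit latency, pick the one that went idle most recently.
 * Returns the CPU index, or -1 if no idle CPU was selected.
 */
int pick_shallowest_idle(const struct cpu_snapshot *cpus, size_t n)
{
	unsigned int min_exit_latency = UINT_MAX;
	uint64_t latest_idle_stamp = 0;
	int cpu_idle = -1;

	assert(cpus != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct cpu_snapshot *c = &cpus[i];

		if (!c->idle)
			continue;

		if (c->has_state && c->exit_latency < min_exit_latency) {
			/* Shallower state found: reset the stamp for that state. */
			min_exit_latency = c->exit_latency;
			latest_idle_stamp = c->idle_stamp;
			cpu_idle = (int)i;
		} else if ((!c->has_state ||
			    c->exit_latency == min_exit_latency) &&
			   c->idle_stamp > latest_idle_stamp) {
			/* Same depth: prefer the most recently idled CPU. */
			latest_idle_stamp = c->idle_stamp;
			cpu_idle = (int)i;
		}
	}
	return cpu_idle;
}
```

Note that a shallower state wins even when its stamp is older than the current best, which matches the reset behaviour described above.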
> +		} else {
> +
> +			load = weighted_cpuload(i);
> +
> +			if (load < min_load ||
> +			    (load == min_load && i == this_cpu)) {
> +				min_load = load;
> +				cpu_busy = i;
> +				continue;
> +			}
>  		}
I think this is wrong to do an if-else based on idle_cpu() here. What
if a CPU is heavily loaded, but for some reason it happens to be idle at
this very moment? With your patch it could be selected as an idle CPU
while it would be discarded as being too busy otherwise.
It is important to determine both cpu_busy and cpu_idle for all CPUs.
And cpu_busy is a bad name for this. Something like least_loaded would
be more self-explanatory. Same thing for cpu_idle, which could be
clearer if named shallowest_idle.
> -	return idlest;
> +	/* Busy cpus are considered less idle than idle cpus ;) */
> +	return cpu_busy != -1 ? cpu_busy : cpu_idle;
And finally it is a policy decision whether or not we want to return
least_loaded over shallowest_idle, i.e. do we pack tasks on non-idle CPUs
first or not. That in itself needs more investigation. To keep the
existing policy unchanged for now, the above condition should have its
variables swapped.
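
One possible reading of the two remarks above, as a userspace sketch: track the load candidate for every CPU regardless of whether it is idle right now, track the idle candidate separately, and leave the final choice to an explicit policy knob. The names `struct cpu_info`, `find_target_cpu` and `prefer_idle` are hypothetical, and the tie-break on this_cpu from the original loop is left out for brevity.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical per-CPU view; not a kernel structure. */
struct cpu_info {
	int idle;                  /* non-zero if the CPU is idle right now */
	unsigned long load;        /* weighted load, tracked even when idle */
	unsigned int exit_latency; /* only meaningful when idle */
};

/*
 * Compute both candidates over all CPUs, then apply the policy:
 * prefer_idle != 0 returns the shallowest idle CPU when there is one,
 * otherwise the least loaded CPU is returned first.
 */
int find_target_cpu(const struct cpu_info *cpus, size_t n, int prefer_idle)
{
	unsigned long min_load = ULONG_MAX;
	unsigned int min_exit_latency = UINT_MAX;
	int least_loaded = -1;
	int shallowest_idle = -1;

	assert(cpus != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct cpu_info *c = &cpus[i];

		/* Load is evaluated for every CPU, idle or not. */
		if (c->load < min_load) {
			min_load = c->load;
			least_loaded = (int)i;
		}

		/* Idle depth is evaluated only for currently idle CPUs. */
		if (c->idle && c->exit_latency < min_exit_latency) {
			min_exit_latency = c->exit_latency;
			shallowest_idle = (int)i;
		}
	}

	if (prefer_idle)
		return shallowest_idle != -1 ? shallowest_idle : least_loaded;
	return least_loaded != -1 ? least_loaded : shallowest_idle;
}
```

With `prefer_idle` set, the return expression corresponds to the swapped condition suggested above; which setting is right is exactly the open policy question.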
Nicolas
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
2014-04-01 23:01 ` Rafael J. Wysocki
@ 2014-04-02 3:14 ` Nicolas Pitre
2014-04-04 11:43 ` Rafael J. Wysocki
2014-04-02 8:26 ` Daniel Lezcano
1 sibling, 1 reply; 47+ messages in thread
From: Nicolas Pitre @ 2014-04-02 3:14 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Daniel Lezcano, linux-kernel, mingo, peterz, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On Wed, 2 Apr 2014, Rafael J. Wysocki wrote:
> On Friday, March 28, 2014 01:29:53 PM Daniel Lezcano wrote:
> > The following patchset provides an interaction between cpuidle and the scheduler.
> >
> > The first patch encapsulates the needed information for the scheduler in a
> > separate cpuidle structure. The second one stores the pointer to this structure
> > when entering idle. The third one uses this information to find the idlest cpu.
> >
> > After some basic testing with hackbench, it appears there is a (small)
> > improvement in performance and in the duration of the idle states (which
> > provides better power saving).
> >
> > The measurement was done with the 'idlestat' tool previously posted on this
> > mailing list.
> >
> > So the benefit is good for both sides: performance and power saving.
> >
> > The select_idle_sibling could also be improved in the same way.
>
> Well, quite frankly, I don't really like this series. Not the idea itself, but
> the way it has been implemented.
>
> First off, if the scheduler is to access idle state data stored in struct
> cpuidle_state, I'm not sure why we need a separate new structure for that?
> Couldn't there be a pointer to a whole struct cpuidle_state from struct rq
> instead? [->exit_latency is the only field that find_idlest_cpu() in your
> third patch seems to be using anyway.]
Future patches are likely to use the other fields. I presume that's why
Daniel put them there.
But I admit to being on the fence about this, i.e. whether or not we should
encapsulate shared fields into a separate structure.
> Second, is accessing the idle state information for all CPUs from find_idlest_cpu()
> guaranteed to be non-racy? I mean, what if a CPU changes its state from idle to
> non-idle while another one is executing find_idlest_cpu()? In other words,
> where's the read memory barrier corresponding to the write ones in the modified
> cpu_idle_call()? And is the memory barrier actually sufficient? After all,
> you need to guarantee that the CPU is still idle after you have evaluated
> idle_cpu() on it.
I don't think avoiding races is all that important here. Right now any
idle CPU is selected regardless of its idle state depth. What this
patch should do (considering my previous comments on it) is to favor the
idle CPU with the shallowest idle state. If once in a while the
selection is wrong because of a race we're not going to make it any
worse than what we have today without this patch.
That probably means the write barrier could potentially be omitted as
well if it implies a useless cost.
We need to ensure the cpuidle data structure is not going away (e.g.
cpuidle driver module removal) while another CPU looks at it though.
The timing would have to be awfully weird for this to happen but still.
> Finally, is really the heuristics used by find_idlest_cpu() to select the "idlest"
> CPU the best one? What about deeper vs shallower idle states, for example?
That's what this patch series is about. The find_idlest_cpu code should
look for the idle CPU with the shallowest idle state, or the one with
the smallest load. In this context "find_idlest_cpu" might become a
misnomer.
Nicolas
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
2014-04-01 23:01 ` Rafael J. Wysocki
2014-04-02 3:14 ` Nicolas Pitre
@ 2014-04-02 8:26 ` Daniel Lezcano
2014-04-04 11:23 ` Rafael J. Wysocki
1 sibling, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-02 8:26 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: linux-kernel, mingo, peterz, nicolas.pitre, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On 04/02/2014 01:01 AM, Rafael J. Wysocki wrote:
> On Friday, March 28, 2014 01:29:53 PM Daniel Lezcano wrote:
>> The following patchset provides an interaction between cpuidle and the scheduler.
>>
>> The first patch encapsulate the needed information for the scheduler in a
>> separate cpuidle structure. The second one stores the pointer to this structure
>> when entering idle. The third one, use this information to take the decision to
>> find the idlest cpu.
>>
>> After some basic testing with hackbench, it appears there is an improvement for
>> the performances (small) and for the duration of the idle states (which provides
>> a better power saving).
>>
>> The measurement has been done with the 'idlestat' tool previously posted in this
>> mailing list.
>>
>> So the benefit is good for both sides performance and power saving.
>>
>> The select_idle_sibling could be also improved in the same way.
>
> Well, quite frankly, I don't really like this series. Not the idea itself, but
> the way it has been implemented.
>
> First off, if the scheduler is to access idle state data stored in struct
> cpuidle_state, I'm not sure why we need a separate new structure for that?
> Couldn't there be a pointer to a whole struct cpuidle_state from struct rq
> instead? [->exit_latency is the only field that find_idlest_cpu() in your
> third patch seems to be using anyway.]
Hi Rafael,
thank you very much for reviewing the patchset.
I created a specific structure to encapsulate the information the
scheduler needs and to avoid exporting unneeded data. This is purely
for code design. It was also meant to separate the idle state's energy
characteristics from the cpuidle framework data (flags, name, etc ...).
The exit_latency field is the only one used in this patchset, but
target_residency will be used as well (e.g. to avoid waking up a cpu
before the minimum idle time, the target residency, has elapsed).
The power field is ... hum ... not filled by any board (except for
calxeda). Vendors do not like to share this information, so very likely
that would be changed to a normalized value, I don't know.
I agree we can put a pointer to the struct cpuidle_state instead if that
reduces the impact of the patchset.
> Second, is accessing the idle state information for all CPUs from find_idlest_cpu()
> guaranteed to be non-racy? I mean, what if a CPU changes its state from idle to
> non-idle while another one is executing find_idlest_cpu()? In other words,
> where's the read memory barrier corresponding to the write ones in the modified
> cpu_idle_call()? And is the memory barrier actually sufficient? After all,
> you need to guarantee that the CPU is still idle after you have evaluated
> idle_cpu() on it.
Well, as Nicolas mentioned it in another mail, we can live with races,
the scheduler will take a wrong decision but nothing worth than what we
have today. In any case we want to prevent any lock in the code.
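As a rough userspace illustration of reading the idle state without a lock, the sketch below loads the pointer once into a local and checks it before dereferencing. Everything here is an assumption made for illustration: struct idle_info, struct runqueue and peek_exit_latency() are hypothetical names, and C11 relaxed atomics stand in for whatever access primitive the kernel code would use. It does not address the lifetime question (the pointed-to data must stay valid while it is being read).

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the idle state data shared with the scheduler. */
struct idle_info {
	unsigned int exit_latency;	/* in microseconds */
};

/* Hypothetical stand-in for a run queue; NULL pointer means "not idle". */
struct runqueue {
	_Atomic(struct idle_info *) idle_state;
};

/*
 * Read the idle state pointer exactly once, without taking a lock.
 * The value may be stale by the time it is used; the caller accepts
 * that. Returns UINT_MAX when no idle state is recorded.
 */
static unsigned int peek_exit_latency(struct runqueue *rq)
{
	struct idle_info *state;

	assert(rq != NULL);

	state = atomic_load_explicit(&rq->idle_state, memory_order_relaxed);

	return state ? state->exit_latency : UINT_MAX;
}
```

Loading the pointer into a local first avoids the case where it is checked for NULL and then re-read as a different value before the dereference.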
> Finally, is really the heuristics used by find_idlest_cpu() to select the "idlest"
> CPU the best one? What about deeper vs shallower idle states, for example?
I believe that is what the patchset is supposed to do: 1. if the cpu is
idle, pick the shallowest, 2. if the cpu is not idle, pick the least
loaded. But maybe there is something wrong in the routine, as Nico
pointed out; I have to double check it.
Thanks !
-- Daniel
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
2014-03-28 12:29 [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Daniel Lezcano
` (4 preceding siblings ...)
2014-04-01 23:01 ` Rafael J. Wysocki
@ 2014-04-04 6:29 ` Len Brown
2014-04-04 8:16 ` Daniel Lezcano
5 siblings, 1 reply; 47+ messages in thread
From: Len Brown @ 2014-04-04 6:29 UTC (permalink / raw)
To: Daniel Lezcano
Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Rafael J. Wysocki,
nicolas.pitre, Linux PM list, alex.shi, vincent.guittot,
morten.rasmussen
Hi Daniel,
Interesting idea.
The benefit of this patch is to reduce power.
Have you been able to measure a power reduction, via power meter, or
via built-in RAPL power meter?
(turbostat will show RAPL watts, or if you have constant quantity of
work, use turbostat -J)
thanks,
-Len
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
2014-04-04 6:29 ` Len Brown
@ 2014-04-04 8:16 ` Daniel Lezcano
0 siblings, 0 replies; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-04 8:16 UTC (permalink / raw)
To: Len Brown
Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Rafael J. Wysocki,
nicolas.pitre, Linux PM list, alex.shi, vincent.guittot,
morten.rasmussen
On 04/04/2014 08:29 AM, Len Brown wrote:
> Hi Daniel,
>
> Interesting idea.
>
> The benefit of this patch is to reduce power.
> Have you been able to measure a power reduction, via power meter, or
> via built-in RAPL power meter?
> (turbostat will show RAPL watts, or if you have constant quantity of
> work, use turbostat -J)
Hi Len,
thanks for looking at the patches.
I will tweak, respin the patchset and do some more measurements.
I don't have a power meter, but maybe RAPL could help to test on x86.
Thanks
-- Daniel
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
2014-04-02 8:26 ` Daniel Lezcano
@ 2014-04-04 11:23 ` Rafael J. Wysocki
0 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2014-04-04 11:23 UTC (permalink / raw)
To: Daniel Lezcano
Cc: linux-kernel, mingo, peterz, nicolas.pitre, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On Wednesday, April 02, 2014 10:26:31 AM Daniel Lezcano wrote:
> On 04/02/2014 01:01 AM, Rafael J. Wysocki wrote:
> > On Friday, March 28, 2014 01:29:53 PM Daniel Lezcano wrote:
> >> The following patchset provides an interaction between cpuidle and the scheduler.
> >>
> >> The first patch encapsulate the needed information for the scheduler in a
> >> separate cpuidle structure. The second one stores the pointer to this structure
> >> when entering idle. The third one, use this information to take the decision to
> >> find the idlest cpu.
> >>
> >> After some basic testing with hackbench, it appears there is an improvement for
> >> the performances (small) and for the duration of the idle states (which provides
> >> a better power saving).
> >>
> >> The measurement has been done with the 'idlestat' tool previously posted in this
> >> mailing list.
> >>
> >> So the benefit is good for both sides performance and power saving.
> >>
> >> The select_idle_sibling could be also improved in the same way.
> >
> > Well, quite frankly, I don't really like this series. Not the idea itself, but
> > the way it has been implemented.
> >
> > First off, if the scheduler is to access idle state data stored in struct
> > cpuidle_state, I'm not sure why we need a separate new structure for that?
> > Couldn't there be a pointer to a whole struct cpuidle_state from struct rq
> > instead? [->exit_latency is the only field that find_idlest_cpu() in your
> > third patch seems to be using anyway.]
>
> Hi Rafael,
>
> thank you very much for reviewing the patchset.
>
> I created a specific structure to encapsulate the informations needed
> for the scheduler and to prevent to export unneeded data. This is purely
> for code design. Also it was to separate the idle's energy
> characteristics from the cpuidle framework data (flags, name, etc ...).
>
> The exit_latency field is only used in this patchset but the
> target_residency will be used also (eg. prevent to wakeup a cpu before
> the minimum idle time target residency).
OK
It would be good to add that heuristic upfront so that we can see the full
picture.
> The power field is ... hum ... not filled by any board (except for
> calxeda). Vendors do not like to share this information, so very likely
> that would be changed to a normalized value, I don't know.
I'm not sure if that field is ever going to be used by everyone to be honest.
> I agree we can put a pointer to the struct cpuidle_state instead if that
> reduce the impact of the patchset.
Yes, it will, in my opinion.
> > Second, is accessing the idle state information for all CPUs from find_idlest_cpu()
> > guaranteed to be non-racy? I mean, what if a CPU changes its state from idle to
> > non-idle while another one is executing find_idlest_cpu()? In other words,
> > where's the read memory barrier corresponding to the write ones in the modified
> > cpu_idle_call()? And is the memory barrier actually sufficient? After all,
> > you need to guarantee that the CPU is still idle after you have evaluated
> > idle_cpu() on it.
>
> Well, as Nicolas mentioned it in another mail, we can live with races,
> the scheduler will take a wrong decision but nothing worth than what we
I guess you mean "worse"? I'm not sure about that.
> have today. In any case we want to prevent any lock in the code.
Of course. :-)
> > Finally, is really the heuristics used by find_idlest_cpu() to select the "idlest"
> > CPU the best one? What about deeper vs shallower idle states, for example?
>
> I believe it is what is supposed to do the patchset. 1. if the cpu is
> idle, pick the shallower, 2. if the cpu is not idle pick the less
> loaded. But may be there is something wrong in the routine as pointed
> Nico, I have to double check it.
Yes, that routine doesn't look entirely correct then.
Thanks!
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
2014-04-02 3:14 ` Nicolas Pitre
@ 2014-04-04 11:43 ` Rafael J. Wysocki
2014-04-15 13:17 ` Peter Zijlstra
2014-04-15 13:25 ` Peter Zijlstra
0 siblings, 2 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2014-04-04 11:43 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Daniel Lezcano, linux-kernel, mingo, peterz, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On Tuesday, April 01, 2014 11:14:33 PM Nicolas Pitre wrote:
> On Wed, 2 Apr 2014, Rafael J. Wysocki wrote:
>
> > On Friday, March 28, 2014 01:29:53 PM Daniel Lezcano wrote:
> > > The following patchset provides an interaction between cpuidle and the scheduler.
> > >
> > > The first patch encapsulate the needed information for the scheduler in a
> > > separate cpuidle structure. The second one stores the pointer to this structure
> > > when entering idle. The third one, use this information to take the decision to
> > > find the idlest cpu.
> > >
> > > After some basic testing with hackbench, it appears there is an improvement for
> > > the performances (small) and for the duration of the idle states (which provides
> > > a better power saving).
> > >
> > > The measurement has been done with the 'idlestat' tool previously posted in this
> > > mailing list.
> > >
> > > So the benefit is good for both sides performance and power saving.
> > >
> > > The select_idle_sibling could be also improved in the same way.
> >
> > Well, quite frankly, I don't really like this series. Not the idea itself, but
> > the way it has been implemented.
> >
> > First off, if the scheduler is to access idle state data stored in struct
> > cpuidle_state, I'm not sure why we need a separate new structure for that?
> > Couldn't there be a pointer to a whole struct cpuidle_state from struct rq
> > instead? [->exit_latency is the only field that find_idlest_cpu() in your
> > third patch seems to be using anyway.]
>
> Future patches are likely to use the other fields. I presume that's why
> Daniel put them there.
>
> But I admit being on the fence about this i.e whether or not we should
> encapsulate shared fields into a separate structure or not.
Quite frankly, I don't see a point in using a separate structure here.
> > Second, is accessing the idle state information for all CPUs from find_idlest_cpu()
> > guaranteed to be non-racy? I mean, what if a CPU changes its state from idle to
> > non-idle while another one is executing find_idlest_cpu()? In other words,
> > where's the read memory barrier corresponding to the write ones in the modified
> > cpu_idle_call()? And is the memory barrier actually sufficient? After all,
> > you need to guarantee that the CPU is still idle after you have evaluated
> > idle_cpu() on it.
>
> I don't think avoiding races is all that important here. Right now any
> idle CPU is selected regardless of its idle state depth. What this
> patch should do (considering my previous comments on it) is to favor the
> idle CPU with the shalloest idle state. If once in a while the
> selection is wrong because of a race we're not going to make it any
> worse than what we have today without this patch.
>
> That probably means the write barrier could potentially be omitted as
> well if it implies a useless cost.
Yes, the write barriers don't seem to serve any real purpose.
> We need to ensure the cpuidle data structure is not going away (e.g.
> cpuidle driver module removal) while another CPU looks at it though.
> The timing would have to be awfully weird for this to happen but still.
Well, I'm not sure if that is a real concern. Only a couple of drivers try
to implement module unloading and I guess this isn't tested too much, so
perhaps we should just make it impossible to unload a cpuidle driver?
> > Finally, is really the heuristics used by find_idlest_cpu() to select the "idlest"
> > CPU the best one? What about deeper vs shallower idle states, for example?
>
> That's what this patch series is about. The find_idlest_cpu code should
> look for the idle CPU with the shallowest idle state, or the one with
> the smallest load. In this context "find_idlest_cpu" might become a
> misnomer.
Yes, clearly. It should be called find_best_cpu or something like that.
Thanks!
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
2014-04-02 3:05 ` Nicolas Pitre
@ 2014-04-04 11:57 ` Rafael J. Wysocki
2014-04-04 16:56 ` Nicolas Pitre
2014-04-17 13:53 ` Daniel Lezcano
1 sibling, 1 reply; 47+ messages in thread
From: Rafael J. Wysocki @ 2014-04-04 11:57 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Daniel Lezcano, linux-kernel, mingo, peterz, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On Tuesday, April 01, 2014 11:05:49 PM Nicolas Pitre wrote:
> On Fri, 28 Mar 2014, Daniel Lezcano wrote:
>
> > As we know in which idle state the cpu is, we can investigate the following:
> >
> > 1. when did the cpu entered the idle state ? the longer the cpu is idle, the
> > deeper it is idle
> > 2. what exit latency is ? the greater the exit latency is, the deeper it is
> >
> > With both information, when all cpus are idle, we can choose the idlest cpu.
> >
> > When one cpu is not idle, the old check against weighted load applies.
> >
> > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>
> There seems to be some problems with the implementation.
>
> > @@ -4336,20 +4337,53 @@ static int
> > find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
> > {
> > unsigned long load, min_load = ULONG_MAX;
> > - int idlest = -1;
> > + unsigned int min_exit_latency = UINT_MAX;
> > + u64 idle_stamp, min_idle_stamp = ULONG_MAX;
>
> I don't think you really meant to assign an u64 variable with ULONG_MAX.
> You probably want ULLONG_MAX here. And probably not in fact (more
> later).
>
> > +
> > + struct rq *rq;
> > + struct cpuidle_power *power;
> > +
> > + int cpu_idle = -1;
> > + int cpu_busy = -1;
> > int i;
> >
> > /* Traverse only the allowed CPUs */
> > for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
> > - load = weighted_cpuload(i);
> >
> > - if (load < min_load || (load == min_load && i == this_cpu)) {
> > - min_load = load;
> > - idlest = i;
> > + if (idle_cpu(i)) {
> > +
> > + rq = cpu_rq(i);
> > + power = rq->power;
> > + idle_stamp = rq->idle_stamp;
> > +
> > + /* The cpu is idle since a shorter time */
> > + if (idle_stamp < min_idle_stamp) {
> > + min_idle_stamp = idle_stamp;
> > + cpu_idle = i;
> > + continue;
>
> Don't you want the highest time stamp in order to select the most
> recently idled CPU? Favoring the CPU which has been idle the longest
> makes little sense.
It may make sense if the hardware can auto-promote CPUs to deeper C-states.
Something like that happens with package C-states that are only entered when
all cores have entered a particular core C-state already. In that case the
probability of the core being in a deeper state grows with time.
That said I would just drop this heuristic for the time being. If auto-promotion
is disregarded, it doesn't really matter how much time the given CPU has been idle
except for one case: When the target residency of its idle state hasn't been
reached yet, waking up the CPU may be a mistake (depending on how deep the state
actually is, but for the majority of drivers in the tree we don't have any measure
of that).
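The target residency case can be written as a simple comparison. This is only a sketch under stated assumptions: residency_reached() is a hypothetical helper, and all three values are assumed to be in the same time unit, which the real idle_stamp and target_residency fields may not be.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return non-zero if a CPU that went idle at idle_stamp has already
 * stayed idle for at least target_residency at time now. All three
 * arguments are assumed to use the same unit (e.g. microseconds).
 */
static int residency_reached(uint64_t now, uint64_t idle_stamp,
			     uint64_t target_residency)
{
	assert(now >= idle_stamp);

	return now - idle_stamp >= target_residency;
}
```

A selection routine could use such a check to avoid waking a CPU whose idle state has not yet paid for itself.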
> > + }
> > +
> > + /* The cpu is idle but the exit_latency is shorter */
> > + if (power && power->exit_latency < min_exit_latency) {
> > + min_exit_latency = power->exit_latency;
> > + cpu_idle = i;
> > + continue;
> > + }
>
> I think this is wrong. This gives priority to CPUs which have been idle
> for a (longer... although this should have been) shorter period of time
> over those with a shallower idle state. I think this should rather be:
>
> if (power && power->exit_latency < min_exit_latency) {
> min_exit_latency = power->exit_latency;
> latest_idle_stamp = idle_stamp;
> cpu_idle = i;
> } else if ((!power || power->exit_latency == min_exit_latency) &&
> idle_stamp > latest_idle_stamp) {
> latest_idle_stamp = idle_stamp;
> cpu_idle = i;
> }
>
> So the CPU with the shallowest idle state is selected in priority, and
> if many CPUs are in the same state then the time stamp is used to
> select the most recent one.
Again, if auto-promotion is disregarded, it doesn't really matter which of them
is woken up.
> Whenever a shallower idle state is found then the latest_idle_stamp is reset for
> that state even if it is further in the past.
>
> > + } else {
> > +
> > + load = weighted_cpuload(i);
> > +
> > + if (load < min_load ||
> > + (load == min_load && i == this_cpu)) {
> > + min_load = load;
> > + cpu_busy = i;
> > + continue;
> > + }
> > }
>
> I think this is wrong to do an if-else based on idle_cpu() here. What
> if a CPU is heavily loaded, but for some reason it happens to be idle at
> this very moment? With your patch it could be selected as an idle CPU
> while it would be discarded as being too busy otherwise.
But see below ->
> It is important to determine both cpu_busy and cpu_idle for all CPUs.
>
> And cpu_busy is a bad name for this. Something like least_loaded would
> be more self explanatory. Same thing for cpu_idle which could be
> clearer if named shalloest_idle.
shallowest_idle?
> > - return idlest;
> > + /* Busy cpus are considered less idle than idle cpus ;) */
> > + return cpu_busy != -1 ? cpu_busy : cpu_idle;
>
> And finally it is a policy decision whether or not we want to return
> least_loaded over shallowest_idle e.g do we pack tasks on non idle CPUs
> first or not. That in itself needs more investigation. To keep the
> existing policy unchanged for now the above condition should have its
> variables swapped.
Which means that once we've found the first idle CPU, it is not useful to
continue computing least_loaded, because we will return the idle one anyway,
right?
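A minimal sketch of that observation, assuming the "idle first" return policy and the if-else on idleness from the patch: once an idle CPU has been seen, the load of busy CPUs is no longer read. struct cpu_snapshot, pick_cpu() and the load_reads counter are hypothetical and exist only to make the behaviour visible; this is not the kernel's find_idlest_cpu().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical per-CPU snapshot, not a kernel structure. */
struct cpu_snapshot {
	int idle;			/* non-zero if the CPU is idle */
	unsigned int exit_latency;	/* of its idle state, if idle */
	unsigned long load;		/* weighted load, if busy */
};

/*
 * Return the idle CPU with the smallest exit latency, or the least
 * loaded CPU if none is idle, or -1 if n is 0. *load_reads counts how
 * many times a busy CPU's load was actually looked at.
 */
static int pick_cpu(const struct cpu_snapshot *cpus, size_t n,
		    size_t *load_reads)
{
	unsigned int min_exit_latency = UINT_MAX;
	unsigned long min_load = ULONG_MAX;
	int shallowest_idle = -1;
	int least_loaded = -1;
	size_t i;

	assert(cpus != NULL && load_reads != NULL);
	*load_reads = 0;

	for (i = 0; i < n; i++) {
		if (cpus[i].idle) {
			if (shallowest_idle == -1 ||
			    cpus[i].exit_latency < min_exit_latency) {
				min_exit_latency = cpus[i].exit_latency;
				shallowest_idle = (int)i;
			}
			continue;
		}

		/* An idle CPU will be returned anyway: skip the load. */
		if (shallowest_idle != -1)
			continue;

		(*load_reads)++;
		if (least_loaded == -1 || cpus[i].load < min_load) {
			min_load = cpus[i].load;
			least_loaded = (int)i;
		}
	}

	return shallowest_idle != -1 ? shallowest_idle : least_loaded;
}
```

If the policy were later changed to prefer lightly loaded CPUs over idle ones, the early skip would have to go, since both candidates would then be needed.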
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
2014-04-04 11:57 ` Rafael J. Wysocki
@ 2014-04-04 16:56 ` Nicolas Pitre
2014-04-05 2:01 ` Rafael J. Wysocki
0 siblings, 1 reply; 47+ messages in thread
From: Nicolas Pitre @ 2014-04-04 16:56 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Daniel Lezcano, linux-kernel, mingo, peterz, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On Fri, 4 Apr 2014, Rafael J. Wysocki wrote:
> On Tuesday, April 01, 2014 11:05:49 PM Nicolas Pitre wrote:
> > On Fri, 28 Mar 2014, Daniel Lezcano wrote:
> >
> > > As we know in which idle state the cpu is, we can investigate the following:
> > >
> > > 1. when did the cpu entered the idle state ? the longer the cpu is idle, the
> > > deeper it is idle
> > > 2. what exit latency is ? the greater the exit latency is, the deeper it is
> > >
> > > With both information, when all cpus are idle, we can choose the idlest cpu.
> > >
> > > When one cpu is not idle, the old check against weighted load applies.
> > >
> > > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> >
> > There seems to be some problems with the implementation.
> >
> > > @@ -4336,20 +4337,53 @@ static int
> > > find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
> > > {
> > > unsigned long load, min_load = ULONG_MAX;
> > > - int idlest = -1;
> > > + unsigned int min_exit_latency = UINT_MAX;
> > > + u64 idle_stamp, min_idle_stamp = ULONG_MAX;
> >
> > I don't think you really meant to assign an u64 variable with ULONG_MAX.
> > You probably want ULLONG_MAX here. And probably not in fact (more
> > later).
> >
> > > +
> > > + struct rq *rq;
> > > + struct cpuidle_power *power;
> > > +
> > > + int cpu_idle = -1;
> > > + int cpu_busy = -1;
> > > int i;
> > >
> > > /* Traverse only the allowed CPUs */
> > > for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
> > > - load = weighted_cpuload(i);
> > >
> > > - if (load < min_load || (load == min_load && i == this_cpu)) {
> > > - min_load = load;
> > > - idlest = i;
> > > + if (idle_cpu(i)) {
> > > +
> > > + rq = cpu_rq(i);
> > > + power = rq->power;
> > > + idle_stamp = rq->idle_stamp;
> > > +
> > > + /* The cpu is idle since a shorter time */
> > > + if (idle_stamp < min_idle_stamp) {
> > > + min_idle_stamp = idle_stamp;
> > > + cpu_idle = i;
> > > + continue;
> >
> > Don't you want the highest time stamp in order to select the most
> > recently idled CPU? Favoring the CPU which has been idle the longest
> > makes little sense.
>
> It may make sense if the hardware can auto-promote CPUs to deeper C-states.
If so the promotion will happen over time, no? What I'm saying here is
that those CPUs which have been idle longer should not be favored when
it is time to select a CPU for a task to run. More recently idled CPUs
are more likely to be in a shallower C-state.
> Something like that happens with package C-states that are only entered when
> all cores have entered a particular core C-state already. In that case the
> probability of the core being in a deeper state grows with time.
Exactly what I'm saying.
Also here it is worth remembering that the scheduling domains should
represent those packages that share common C-states at a higher level.
The scheduler can then be told not to balance across domains if it
doesn't need to in order to favor the conditions for those package
C-states to be used. That's what the task packing patch series is
about, independently of this one.
> That said I would just drop this heuristics for the time being. If auto-promotion
> is disregarded, it doesn't really matter how much time the given CPU has been idle
> except for one case: When the target residency of its idle state hasn't been
> reached yet, waking up the CPU may be a mistake (depending on how deep the state
> actually is, but for the majority of drivers in the tree we don't have any measure
> of that).
There is one reason for considering the time a CPU has been idle,
assuming equivalent C-state, and that is cache snooping. The longer a
CPU is idle, the more likely its cache content will have been claimed
and migrated by other CPUs. Of course that doesn't make much difference
for deeper C-states where the cache isn't preserved, but it is probably
simpler and cheaper to apply this heuristic in all cases.
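The ordering argued for here (shallowest state first, most recently idled CPU as the tie-break) can be sketched in standalone form. This is an illustration only: struct idle_cpu and pick_idle_cpu() are hypothetical, the input is assumed to contain idle CPUs only, and the names min_exit_latency and latest_idle_stamp are borrowed from the snippet quoted above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of one idle CPU, not a kernel structure. */
struct idle_cpu {
	unsigned int exit_latency;	/* exit latency of its idle state */
	uint64_t idle_stamp;		/* time at which it went idle */
};

/*
 * Return the index of the CPU in the shallowest idle state; among CPUs
 * with the same exit latency, prefer the one that went idle last.
 * Returns -1 if n is 0.
 */
static int pick_idle_cpu(const struct idle_cpu *cpus, size_t n)
{
	unsigned int min_exit_latency = UINT_MAX;
	uint64_t latest_idle_stamp = 0;
	int shallowest_idle = -1;
	size_t i;

	assert(cpus != NULL);

	for (i = 0; i < n; i++) {
		if (shallowest_idle == -1 ||
		    cpus[i].exit_latency < min_exit_latency) {
			/* Shallower state: restart the time stamp too. */
			min_exit_latency = cpus[i].exit_latency;
			latest_idle_stamp = cpus[i].idle_stamp;
			shallowest_idle = (int)i;
		} else if (cpus[i].exit_latency == min_exit_latency &&
			   cpus[i].idle_stamp > latest_idle_stamp) {
			/* Same depth, but idled more recently. */
			latest_idle_stamp = cpus[i].idle_stamp;
			shallowest_idle = (int)i;
		}
	}

	return shallowest_idle;
}
```

Resetting the stamp whenever a shallower state is found keeps the time comparison confined to CPUs at the same depth.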
> > > + }
> > > +
> > > + /* The cpu is idle but the exit_latency is shorter */
> > > + if (power && power->exit_latency < min_exit_latency) {
> > > + min_exit_latency = power->exit_latency;
> > > + cpu_idle = i;
> > > + continue;
> > > + }
> >
> > I think this is wrong. This gives priority to CPUs which have been idle
> > for a (longer... although this should have been) shorter period of time
> > over those with a shallower idle state. I think this should rather be:
> >
> > if (power && power->exit_latency < min_exit_latency) {
> > min_exit_latency = power->exit_latency;
> > latest_idle_stamp = idle_stamp;
> > cpu_idle = i;
> > } else if ((!power || power->exit_latency == min_exit_latency) &&
> > idle_stamp > latest_idle_stamp) {
> > latest_idle_stamp = idle_stamp;
> > cpu_idle = i;
> > }
> >
> > So the CPU with the shallowest idle state is selected in priority, and
> > if many CPUs are in the same state then the time stamp is used to
> > select the most recent one.
>
> Again, if auto-promotion is disregarded, it doesn't really matter which of them
> is woken up.
If it doesn't matter then it doesn't hurt. But in some cases it
matters.
> > Whenever a shallower idle state is found then the latest_idle_stamp is reset for
> > that state even if it is further in the past.
> >
> > > + } else {
> > > +
> > > + load = weighted_cpuload(i);
> > > +
> > > + if (load < min_load ||
> > > + (load == min_load && i == this_cpu)) {
> > > + min_load = load;
> > > + cpu_busy = i;
> > > + continue;
> > > + }
> > > }
> >
> > I think this is wrong to do an if-else based on idle_cpu() here. What
> > if a CPU is heavily loaded, but for some reason it happens to be idle at
> > this very moment? With your patch it could be selected as an idle CPU
> > while it would be discarded as being too busy otherwise.
>
> But see below ->
>
> > It is important to determine both cpu_busy and cpu_idle for all CPUs.
> >
> > And cpu_busy is a bad name for this. Something like least_loaded would
> > be more self explanatory. Same thing for cpu_idle which could be
> > clearer if named shalloest_idle.
>
> shallowest_idle?
Something that means the CPU with the shallowest C-state. Using
"cpu_idle" for this variable doesn't cut it.
> > > - return idlest;
> > > + /* Busy cpus are considered less idle than idle cpus ;) */
> > > + return cpu_busy != -1 ? cpu_busy : cpu_idle;
> >
> > And finally it is a policy decision whether or not we want to return
> > least_loaded over shallowest_idle e.g do we pack tasks on non idle CPUs
> > first or not. That in itself needs more investigation. To keep the
> > existing policy unchanged for now the above condition should have its
> > variables swapped.
>
> Which means that once we've find the first idle CPU, it is not useful to
> continue computing least_loaded, because we will return the idle one anyway,
> right?
Good point. Currently, that should be the case.
Eventually we'll want to put new tasks on lightly loaded CPUs instead of
waking up a fully idle CPU in order to favor deeper C-states. But that
requires a patch series of its own just to determine how loaded a CPU is
and how much work it can still accommodate before being oversubscribed,
etc.
Nicolas
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
2014-04-04 16:56 ` Nicolas Pitre
@ 2014-04-05 2:01 ` Rafael J. Wysocki
0 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2014-04-05 2:01 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Daniel Lezcano, linux-kernel, mingo, peterz, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On Friday, April 04, 2014 12:56:52 PM Nicolas Pitre wrote:
> On Fri, 4 Apr 2014, Rafael J. Wysocki wrote:
>
> > On Tuesday, April 01, 2014 11:05:49 PM Nicolas Pitre wrote:
> > > On Fri, 28 Mar 2014, Daniel Lezcano wrote:
> > >
> > > > As we know in which idle state the cpu is, we can investigate the following:
> > > >
> > > > 1. when did the cpu entered the idle state ? the longer the cpu is idle, the
> > > > deeper it is idle
> > > > 2. what exit latency is ? the greater the exit latency is, the deeper it is
> > > >
> > > > With both information, when all cpus are idle, we can choose the idlest cpu.
> > > >
> > > > When one cpu is not idle, the old check against weighted load applies.
> > > >
> > > > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> > >
> > > There seems to be some problems with the implementation.
> > >
> > > > @@ -4336,20 +4337,53 @@ static int
> > > > find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
> > > > {
> > > > unsigned long load, min_load = ULONG_MAX;
> > > > - int idlest = -1;
> > > > + unsigned int min_exit_latency = UINT_MAX;
> > > > + u64 idle_stamp, min_idle_stamp = ULONG_MAX;
> > >
> > > I don't think you really meant to assign an u64 variable with ULONG_MAX.
> > > You probably want ULLONG_MAX here. And probably not in fact (more
> > > later).
> > >
> > > > +
> > > > + struct rq *rq;
> > > > + struct cpuidle_power *power;
> > > > +
> > > > + int cpu_idle = -1;
> > > > + int cpu_busy = -1;
> > > > int i;
> > > >
> > > > /* Traverse only the allowed CPUs */
> > > > for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
> > > > - load = weighted_cpuload(i);
> > > >
> > > > - if (load < min_load || (load == min_load && i == this_cpu)) {
> > > > - min_load = load;
> > > > - idlest = i;
> > > > + if (idle_cpu(i)) {
> > > > +
> > > > + rq = cpu_rq(i);
> > > > + power = rq->power;
> > > > + idle_stamp = rq->idle_stamp;
> > > > +
> > > > + /* The cpu is idle since a shorter time */
> > > > + if (idle_stamp < min_idle_stamp) {
> > > > + min_idle_stamp = idle_stamp;
> > > > + cpu_idle = i;
> > > > + continue;
> > >
> > > Don't you want the highest time stamp in order to select the most
> > > recently idled CPU? Favoring the CPU which has been idle the longest
> > > makes little sense.
> >
> > It may make sense if the hardware can auto-promote CPUs to deeper C-states.
>
> If so the promotion will happen over time, no? What I'm saying here is
> that those CPUs which have been idle longer should not be favored when
> it is time to select a CPU for a task to run. More recently idled CPUs
> are more likely to be in a shallower C-state.
>
> > Something like that happens with package C-states that are only entered when
> > all cores have entered a particular core C-state already. In that case the
> > probability of the core being in a deeper state grows with time.
>
> Exactly what I'm saying.
Right, I got that the other way around by mistake.
> Also here it is worth remembering that the scheduling domains should
> represent those packages that share common C-states at a higher level.
> The scheduler can then be told not to balance across domains if it
> doesn't need to in order to favor the conditions for those package
> C-states to be used. That's what the task packing patch series is
> about, independently of this one.
>
> > That said I would just drop this heuristic for the time being. If auto-promotion
> > is disregarded, it doesn't really matter how much time the given CPU has been idle
> > except for one case: When the target residency of its idle state hasn't been
> > reached yet, waking up the CPU may be a mistake (depending on how deep the state
> > actually is, but for the majority of drivers in the tree we don't have any measure
> > of that).
>
> There is one reason for considering the time a CPU has been idle,
> assuming equivalent C-state, and that is cache snooping. The longer a
> CPU is idle, the more likely its cache content will have been claimed
> and migrated by other CPUs. Of course that doesn't make much difference
> for deeper C-states where the cache isn't preserved, but it is probably
> simpler and cheaper to apply this heuristic in all cases.
Yes, that sounds like it might be a reason, but I'd like to see numbers
confirming that to be honest.
> > > > + }
> > > > +
> > > > + /* The cpu is idle but the exit_latency is shorter */
> > > > + if (power && power->exit_latency < min_exit_latency) {
> > > > + min_exit_latency = power->exit_latency;
> > > > + cpu_idle = i;
> > > > + continue;
> > > > + }
> > >
> > > I think this is wrong. This gives priority to CPUs which have been idle
> > > for a (longer... although this should have been) shorter period of time
> > > over those with a shallower idle state. I think this should rather be:
> > >
> > > if (power && power->exit_latency < min_exit_latency) {
> > > min_exit_latency = power->exit_latency;
> > > latest_idle_stamp = idle_stamp;
> > > cpu_idle = i;
> > > } else if ((!power || power->exit_latency == min_exit_latency) &&
> > > idle_stamp > latest_idle_stamp) {
> > > latest_idle_stamp = idle_stamp;
> > > cpu_idle = i;
> > > }
> > >
> > > So the CPU with the shallowest idle state is selected in priority, and
> > > if many CPUs are in the same state then the time stamp is used to
> > > select the most recent one.
> >
> > Again, if auto-promotion is disregarded, it doesn't really matter which of them
> > is woken up.
>
> If it doesn't matter then it doesn't hurt. But in some cases it
> matters.
>
> > > Whenever a shallower idle state is found then the latest_idle_stamp is reset for
> > > that state even if it is further in the past.
> > >
> > > > + } else {
> > > > +
> > > > + load = weighted_cpuload(i);
> > > > +
> > > > + if (load < min_load ||
> > > > + (load == min_load && i == this_cpu)) {
> > > > + min_load = load;
> > > > + cpu_busy = i;
> > > > + continue;
> > > > + }
> > > > }
> > >
> > > I think this is wrong to do an if-else based on idle_cpu() here. What
> > > if a CPU is heavily loaded, but for some reason it happens to be idle at
> > > this very moment? With your patch it could be selected as an idle CPU
> > > while it would be discarded as being too busy otherwise.
> >
> > But see below ->
> >
> > > It is important to determine both cpu_busy and cpu_idle for all CPUs.
> > >
> > > And cpu_busy is a bad name for this. Something like least_loaded would
> > > be more self explanatory. Same thing for cpu_idle which could be
> > > clearer if named shalloest_idle.
> >
> > shallowest_idle?
>
> Something that means the CPU with the shallowest C-state. Using
> "cpu_idle" for this variable doesn't cut it.
Yes, that was about the typo above only. :-)
> > > > - return idlest;
> > > > + /* Busy cpus are considered less idle than idle cpus ;) */
> > > > + return cpu_busy != -1 ? cpu_busy : cpu_idle;
> > >
> > > And finally it is a policy decision whether or not we want to return
> > > least_loaded over shallowest_idle e.g do we pack tasks on non idle CPUs
> > > first or not. That in itself needs more investigation. To keep the
> > > existing policy unchanged for now the above condition should have its
> > > variables swapped.
> >
> > Which means that once we've found the first idle CPU, it is not useful to
> > continue computing least_loaded, because we will return the idle one anyway,
> > right?
>
> Good point. Currently, that should be the case.
>
> Eventually we'll want to put new tasks on lightly loaded CPUs instead of
> waking up a fully idle CPU in order to favor deeper C-states. But that
> requires a patch series of its own just to determine how loaded a CPU is
> and how much work it can still accommodate before being oversubscribed,
> etc.
Wouldn't we need power consumption numbers for that realistically?
Rafael
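For reference, the selection order discussed in this subthread (shallowest idle state first, most recent idle stamp as the tie-break, least loaded CPU only when nothing is idle) can be sketched in standalone C. This is an illustration only, not the kernel code: struct cpu_snapshot, its fields and pick_cpu() are made-up names, and the sketch assumes the existing policy of preferring any idle CPU over a busy one.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical per-CPU snapshot; the field names are illustrative only. */
struct cpu_snapshot {
	int is_idle;               /* nonzero if the CPU is currently idle */
	unsigned int exit_latency; /* exit latency of its idle state, in us */
	uint64_t idle_stamp;       /* time at which the CPU went idle */
	unsigned long load;        /* weighted load, used for busy CPUs */
};

/*
 * Return the index of the preferred CPU, or -1 if ncpus is 0.
 * Among idle CPUs the smallest exit latency wins; ties go to the CPU
 * that went idle most recently. A busy CPU is returned only when no
 * idle CPU exists.
 */
int pick_cpu(const struct cpu_snapshot *cpus, int ncpus)
{
	unsigned long min_load = ULONG_MAX;
	unsigned int min_exit_latency = UINT_MAX;
	uint64_t latest_idle_stamp = 0;
	int shallowest_idle = -1;
	int least_loaded = -1;
	int i;

	for (i = 0; i < ncpus; i++) {
		const struct cpu_snapshot *c = &cpus[i];

		if (c->is_idle) {
			if (shallowest_idle == -1 ||
			    c->exit_latency < min_exit_latency) {
				/* Shallower state: restart the tie-break. */
				min_exit_latency = c->exit_latency;
				latest_idle_stamp = c->idle_stamp;
				shallowest_idle = i;
			} else if (c->exit_latency == min_exit_latency &&
				   c->idle_stamp > latest_idle_stamp) {
				/* Same state, but idled more recently. */
				latest_idle_stamp = c->idle_stamp;
				shallowest_idle = i;
			}
		} else if (c->load < min_load) {
			min_load = c->load;
			least_loaded = i;
		}
	}

	return shallowest_idle != -1 ? shallowest_idle : least_loaded;
}
```

Swapping the two operands of the final return would give the "pack on non-idle CPUs first" policy, which is the part the thread leaves open.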
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 2/3] idle: store the idle state the cpu is
2014-03-28 12:29 ` [RFC PATCHC 2/3] idle: store the idle state the cpu is Daniel Lezcano
@ 2014-04-15 12:43 ` Peter Zijlstra
2014-04-15 12:44 ` Peter Zijlstra
0 siblings, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-15 12:43 UTC (permalink / raw)
To: Daniel Lezcano
Cc: linux-kernel, mingo, rjw, nicolas.pitre, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On Fri, Mar 28, 2014 at 01:29:55PM +0100, Daniel Lezcano wrote:
> @@ -143,6 +145,10 @@ static int cpuidle_idle_call(void)
> if (!ret) {
> trace_cpu_idle_rcuidle(next_state, dev->cpu);
>
> + *power = &drv->states[next_state].power;
> +
> + wmb();
> +
I very much suspect you meant: smp_wmb(), as I don't see the hardware
reading that pointer, therefore UP wouldn't care. Also, any and all
barriers should come with a comment that describes the data ordering and
points to the matching barriers.
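As an illustration of what "a comment that describes the data ordering and points to the matching barrier" could look like, here is a userspace sketch using C11 atomics. It is an analogue under assumptions, not the patch: the release store and acquire load are conservative stand-ins for the kernel's smp_wmb() and read-side dependency barrier, and struct idle_power, rq_power and the three functions are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative stand-in for the structure introduced by patch 1/3. */
struct idle_power {
	unsigned int exit_latency;
};

/* Stand-in for the per-runqueue pointer; NULL means "not idle". */
static _Atomic(struct idle_power *) rq_power;

/* Idle-entry side: make the idle state visible to other CPUs. */
void publish_idle_state(struct idle_power *state)
{
	/*
	 * Release store: everything written to *state before this point
	 * is visible once the pointer is. Pairs with the acquire load in
	 * read_exit_latency().
	 */
	atomic_store_explicit(&rq_power, state, memory_order_release);
}

/* Idle-exit side: the CPU is no longer in any idle state. */
void clear_idle_state(void)
{
	publish_idle_state(NULL);
}

/* Scheduler side: read the exit latency, or fallback if not idle. */
unsigned int read_exit_latency(unsigned int fallback)
{
	/*
	 * Acquire load: pairs with the release store in
	 * publish_idle_state(), so the fields read through p are at
	 * least as new as the pointer itself.
	 */
	struct idle_power *p =
		atomic_load_explicit(&rq_power, memory_order_acquire);

	return p ? p->exit_latency : fallback;
}
```

Note that this only covers ordering. It says nothing about whether the pointed-to object can go away while a reader still holds the pointer.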
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 2/3] idle: store the idle state the cpu is
2014-04-15 12:43 ` Peter Zijlstra
@ 2014-04-15 12:44 ` Peter Zijlstra
2014-04-15 14:17 ` Daniel Lezcano
0 siblings, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-15 12:44 UTC (permalink / raw)
To: Daniel Lezcano
Cc: linux-kernel, mingo, rjw, nicolas.pitre, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On Tue, Apr 15, 2014 at 02:43:30PM +0200, Peter Zijlstra wrote:
> On Fri, Mar 28, 2014 at 01:29:55PM +0100, Daniel Lezcano wrote:
> > @@ -143,6 +145,10 @@ static int cpuidle_idle_call(void)
> > if (!ret) {
> > trace_cpu_idle_rcuidle(next_state, dev->cpu);
> >
> > + *power = &drv->states[next_state].power;
> > +
> > + wmb();
> > +
>
> I very much suspect you meant: smp_wmb(), as I don't see the hardware
> reading that pointer, therefore UP wouldn't care. Also, any and all
> barriers should come with a comment that describes the data ordering and
> points to the matching barriers.
Furthermore, this patch fails to describe the life-time rules of the
object placed there. Can the object pointed to ever disappear?
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
2014-03-28 12:29 ` [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu Daniel Lezcano
2014-04-02 3:05 ` Nicolas Pitre
@ 2014-04-15 13:03 ` Peter Zijlstra
1 sibling, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-15 13:03 UTC (permalink / raw)
To: Daniel Lezcano
Cc: linux-kernel, mingo, rjw, nicolas.pitre, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On Fri, Mar 28, 2014 at 01:29:56PM +0100, Daniel Lezcano wrote:
> @@ -4336,20 +4337,53 @@ static int
> find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
> {
> unsigned long load, min_load = ULONG_MAX;
> - int idlest = -1;
> + unsigned int min_exit_latency = UINT_MAX;
> + u64 idle_stamp, min_idle_stamp = ULONG_MAX;
> +
> + struct rq *rq;
> + struct cpuidle_power *power;
> +
> + int cpu_idle = -1;
> + int cpu_busy = -1;
> int i;
>
> /* Traverse only the allowed CPUs */
> for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
> - load = weighted_cpuload(i);
>
> - if (load < min_load || (load == min_load && i == this_cpu)) {
> - min_load = load;
> - idlest = i;
> + if (idle_cpu(i)) {
> +
> + rq = cpu_rq(i);
> + power = rq->power;
> + idle_stamp = rq->idle_stamp;
> +
> + /* The cpu is idle since a shorter time */
> + if (idle_stamp < min_idle_stamp) {
> + min_idle_stamp = idle_stamp;
> + cpu_idle = i;
> + continue;
> + }
> +
> + /* The cpu is idle but the exit_latency is shorter */
> + if (power && power->exit_latency < min_exit_latency) {
> + min_exit_latency = power->exit_latency;
> + cpu_idle = i;
> + continue;
> + }
Aside from the arguments made by Nico (which I agree with), depending on
the life time rules of the power object we might need
smp_read_barrier_depends() between reading and using.
If all these objects are static and never change content we do not; if
there are dynamic objects involved we probably do.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
2014-04-01 9:05 ` Vincent Guittot
@ 2014-04-15 13:13 ` Peter Zijlstra
0 siblings, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-15 13:13 UTC (permalink / raw)
To: Vincent Guittot
Cc: Daniel Lezcano, linux-kernel, Ingo Molnar, rjw, Nicolas Pitre,
linux-pm, Alex Shi, Morten Rasmussen
On Tue, Apr 01, 2014 at 11:05:16AM +0200, Vincent Guittot wrote:
> >> We are working on a bench which can generate middle load pattern with
> >> idle CPUs but it's not available yet. In the mean time, one bench that
> >> plays with idle time is cyclictest, it will not give you performance
> >> results but only scheduling latency which might be what you are
> >> looking for.
> >
> >
> > Yeah, thanks. I believe I know what is in the rt-tests package :)
> >
> > What I meant is what kind of values would you like to see with this patchset
> > ?
>
> IIUC, your patch tries to improve the wake up latency of a task by
> selecting the CPUs with the shallowest C-state, so this metric seems
> to be a good candidate
cyclictest might be too regular to really measure anything though.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
2014-04-04 11:43 ` Rafael J. Wysocki
@ 2014-04-15 13:17 ` Peter Zijlstra
2014-04-15 13:25 ` Peter Zijlstra
1 sibling, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-15 13:17 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Nicolas Pitre, Daniel Lezcano, linux-kernel, mingo, linux-pm,
alex.shi, vincent.guittot, morten.rasmussen
On Fri, Apr 04, 2014 at 01:43:00PM +0200, Rafael J. Wysocki wrote:
> > We need to ensure the cpuidle data structure is not going away (e.g.
> > cpuidle driver module removal) while another CPU looks at it though.
> > The timing would have to be awfully weird for this to happen but still.
>
> Well, I'm not sure if that is a real concern. Only a couple of drivers try
> to implement module unloading and I guess this isn't tested too much, so
> perhaps we should just make it impossible to unload a cpuidle driver?
The 'easy' solution is to mandate the use of rcu_read_lock() around the
dereference and make all cpuidle drivers put an rcu_barrier() in their
module unload path.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
2014-04-04 11:43 ` Rafael J. Wysocki
2014-04-15 13:17 ` Peter Zijlstra
@ 2014-04-15 13:25 ` Peter Zijlstra
2014-04-15 15:27 ` Nicolas Pitre
2014-04-15 15:33 ` Rafael J. Wysocki
1 sibling, 2 replies; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-15 13:25 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Nicolas Pitre, Daniel Lezcano, linux-kernel, mingo, linux-pm,
alex.shi, vincent.guittot, morten.rasmussen
On Fri, Apr 04, 2014 at 01:43:00PM +0200, Rafael J. Wysocki wrote:
> > That's what this patch series is about. The find_idlest_cpu code should
> > look for the idle CPU with the shallowest idle state, or the one with
> > the smallest load. In this context "find_idlest_cpu" might become a
> > misnomer.
>
> Yes, clearly. It should be called find_best_cpu or something like that.
Ha!, but for what purpose? We already have find_busiest_cpu() to find
the CPU to steal work from. The converse action, currently called
find_idlest_cpu() is finding the CPU where to put work.
'Best' is ambiguous in all regards, it doesn't convey the direction nor
the quality sorted on.
So while idlest might be somewhat of a misnomer, it at least conveys the
directional thing fairly well. Also we are still searching for the least
busy, and preferably an idle, cpu. 'Idlest' being a superlative also
conveys the meaning of order.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 2/3] idle: store the idle state the cpu is
2014-04-15 12:44 ` Peter Zijlstra
@ 2014-04-15 14:17 ` Daniel Lezcano
2014-04-15 14:33 ` Peter Zijlstra
0 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-15 14:17 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, mingo, rjw, nicolas.pitre, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On 04/15/2014 02:44 PM, Peter Zijlstra wrote:
> On Tue, Apr 15, 2014 at 02:43:30PM +0200, Peter Zijlstra wrote:
>> On Fri, Mar 28, 2014 at 01:29:55PM +0100, Daniel Lezcano wrote:
>>> @@ -143,6 +145,10 @@ static int cpuidle_idle_call(void)
>>> if (!ret) {
>>> trace_cpu_idle_rcuidle(next_state, dev->cpu);
>>>
>>> + *power = &drv->states[next_state].power;
>>> +
>>> + wmb();
>>> +
>>
>> I very much suspect you meant: smp_wmb(), as I don't see the hardware
>> reading that pointer, therefore UP wouldn't care. Also, any and all
>> barriers should come with a comment that describes the data ordering and
>> points to the matching barriers.
>
> Furthermore, this patch fails to describe the life-time rules of the
> object placed there. Can the object pointed to ever disappear?
Hi Peter,
thanks for reviewing the patches.
There are a couple of situations where a cpuidle state can disappear:
1. For x86/acpi with dynamic c-states, when a laptop switches from
battery to AC that could result in removing the deeper idle state. The
acpi driver triggers:
'acpi_processor_cst_has_changed' which will call
'cpuidle_pause_and_lock'. This one will call
'cpuidle_uninstall_idle_handler' which in turn calls 'kick_all_cpus_sync'.
All cpus will exit their idle state and the pointed object will be set
to NULL again.
2. The cpuidle driver is unloaded. Logically that could happen but not
in practice because the drivers are always compiled in and 95% of the
drivers are not coded to unregister the driver. Anyway ...
The unloading code must call 'cpuidle_unregister_device', that calls
'cpuidle_pause_and_lock' leading to 'kick_all_cpus_sync'.
IIUC, the race can happen if we take the pointer and then one of these
two situations occurs at the same moment.
As the function 'find_idlest_cpu' runs inside an rcu_read_lock section,
maybe an rcu_barrier in 'cpuidle_pause_and_lock' or
'cpuidle_uninstall_idle_handler' would suffice, no?
Thanks
-- Daniel
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 2/3] idle: store the idle state the cpu is
2014-04-15 14:17 ` Daniel Lezcano
@ 2014-04-15 14:33 ` Peter Zijlstra
2014-04-15 14:39 ` Daniel Lezcano
0 siblings, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-15 14:33 UTC (permalink / raw)
To: Daniel Lezcano
Cc: linux-kernel, mingo, rjw, nicolas.pitre, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On Tue, Apr 15, 2014 at 04:17:36PM +0200, Daniel Lezcano wrote:
> On 04/15/2014 02:44 PM, Peter Zijlstra wrote:
> >On Tue, Apr 15, 2014 at 02:43:30PM +0200, Peter Zijlstra wrote:
> >>On Fri, Mar 28, 2014 at 01:29:55PM +0100, Daniel Lezcano wrote:
> >>>@@ -143,6 +145,10 @@ static int cpuidle_idle_call(void)
> >>> if (!ret) {
> >>> trace_cpu_idle_rcuidle(next_state, dev->cpu);
> >>>
> >>>+ *power = &drv->states[next_state].power;
> >>>+
> >>>+ wmb();
> >>>+
> >>
> >>I very much suspect you meant: smp_wmb(), as I don't see the hardware
> >>reading that pointer, therefore UP wouldn't care. Also, any and all
> >>barriers should come with a comment that describes the data ordering and
> >>points to the matching barriers.
> >
> >Furthermore, this patch fails to describe the life-time rules of the
> >object placed there. Can the object pointed to ever disappear?
>
> Hi Peter,
>
> thanks for reviewing the patches.
>
> There are a couple of situations where a cpuidle state can disappear:
>
> 1. For x86/acpi with dynamic c-states, when a laptop switches from battery
> to AC that could result in removing the deeper idle state. The acpi driver
> triggers:
>
> 'acpi_processor_cst_has_changed' which will call 'cpuidle_pause_and_lock'.
> This one will call 'cpuidle_uninstall_idle_handler' which in turn calls
> 'kick_all_cpus_sync'.
>
> All cpus will exit their idle state and the pointed object will be set to
> NULL again.
>
> 2. The cpuidle driver is unloaded. Logically that could happen but not in
> practice because the drivers are always compiled in and 95% of the drivers
> are not coded to unregister the driver. Anyway ...
>
> The unloading code must call 'cpuidle_unregister_device', that calls
> 'cpuidle_pause_and_lock' leading to 'kick_all_cpus_sync'.
>
> IIUC, the race can happen if we take the pointer and then one of these two
> situations occurs at the same moment.
>
> As the function 'find_idlest_cpu' runs inside an rcu_read_lock section, maybe an
> rcu_barrier in 'cpuidle_pause_and_lock' or 'cpuidle_uninstall_idle_handler'
> would suffice, no?
Indeed. But be sure to document this.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 2/3] idle: store the idle state the cpu is
2014-04-15 14:33 ` Peter Zijlstra
@ 2014-04-15 14:39 ` Daniel Lezcano
0 siblings, 0 replies; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-15 14:39 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, mingo, rjw, nicolas.pitre, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On 04/15/2014 04:33 PM, Peter Zijlstra wrote:
> On Tue, Apr 15, 2014 at 04:17:36PM +0200, Daniel Lezcano wrote:
>> On 04/15/2014 02:44 PM, Peter Zijlstra wrote:
>>> On Tue, Apr 15, 2014 at 02:43:30PM +0200, Peter Zijlstra wrote:
>>>> On Fri, Mar 28, 2014 at 01:29:55PM +0100, Daniel Lezcano wrote:
>>>>> @@ -143,6 +145,10 @@ static int cpuidle_idle_call(void)
>>>>> if (!ret) {
>>>>> trace_cpu_idle_rcuidle(next_state, dev->cpu);
>>>>>
>>>>> + *power = &drv->states[next_state].power;
>>>>> +
>>>>> + wmb();
>>>>> +
>>>>
>>>> I very much suspect you meant: smp_wmb(), as I don't see the hardware
>>>> reading that pointer, therefore UP wouldn't care. Also, any and all
>>>> barriers should come with a comment that describes the data ordering and
>>>> points to the matching barriers.
>>>
>>> Furthermore, this patch fails to describe the life-time rules of the
>>> object placed there. Can the object pointed to ever disappear?
>>
>> Hi Peter,
>>
>> thanks for reviewing the patches.
>>
>> There are a couple of situations where a cpuidle state can disappear:
>>
>> 1. For x86/acpi with dynamic c-states, when a laptop switches from battery
>> to AC that could result in removing the deeper idle state. The acpi driver
>> triggers:
>>
>> 'acpi_processor_cst_has_changed' which will call 'cpuidle_pause_and_lock'.
>> This one will call 'cpuidle_uninstall_idle_handler' which in turn calls
>> 'kick_all_cpus_sync'.
>>
>> All cpus will exit their idle state and the pointed object will be set to
>> NULL again.
>>
>> 2. The cpuidle driver is unloaded. Logically that could happen but not in
>> practice because the drivers are always compiled in and 95% of the drivers
>> are not coded to unregister the driver. Anyway ...
>>
>> The unloading code must call 'cpuidle_unregister_device', that calls
>> 'cpuidle_pause_and_lock' leading to 'kick_all_cpus_sync'.
>>
>> IIUC, the race can happen if we take the pointer and then one of these two
>> situations occurs at the same moment.
>>
>> As the function 'find_idlest_cpu' runs inside an rcu_read_lock section, maybe an
>> rcu_barrier in 'cpuidle_pause_and_lock' or 'cpuidle_uninstall_idle_handler'
>> would suffice, no?
>
> Indeed. But be sure to document this.
Yes, sure. Thanks for pointing this out.
-- Daniel
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
2014-04-15 13:25 ` Peter Zijlstra
@ 2014-04-15 15:27 ` Nicolas Pitre
2014-04-15 15:33 ` Rafael J. Wysocki
1 sibling, 0 replies; 47+ messages in thread
From: Nicolas Pitre @ 2014-04-15 15:27 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Rafael J. Wysocki, Daniel Lezcano, linux-kernel, mingo, linux-pm,
alex.shi, vincent.guittot, morten.rasmussen
On Tue, 15 Apr 2014, Peter Zijlstra wrote:
> On Fri, Apr 04, 2014 at 01:43:00PM +0200, Rafael J. Wysocki wrote:
> > > That's what this patch series is about. The find_idlest_cpu code should
> > > look for the idle CPU with the shallowest idle state, or the one with
> > > the smallest load. In this context "find_idlest_cpu" might become a
> > > misnomer.
> >
> > Yes, clearly. It should be called find_best_cpu or something like that.
>
> Ha!, but for what purpose? We already have find_busiest_cpu() to find
> the CPU to steal work from. The converse action, currently called
> find_idlest_cpu() is finding the CPU where to put work.
>
> 'Best' is ambiguous in all regards, it doesn't convey the direction nor
> the quality sorted on.
>
> So while idlest might be somewhat of a misnomer, it at least conveys the
> directional thing fairly well. Also we are still searching for the least
> busy, and preferably an idle, cpu. 'Idlest' being a superlative also
> conveys the meaning of order.
I agree that anything which is called "best" is ambiguous. Best for
what? That isn't self explanatory.
However "idlest" is no longer the wanted attribute here. "Least busy"
is right. But not necessarily the "idlest". The "best" CPU here is
somewhat in the middle between busiest and idlest i.e. preferably idle,
but not the "idlest" in the cpuidle sense.
Maybe we could use your definition to simply call it
find_cpu_to_put_work() or the like. Today this is based on the idleness
of CPUs, but eventually we'll want to pack tasks on already loaded CPUs
(without oversubscribing them) in order to keep as many CPUs idle as
possible when that makes sense, which would alter the selection
somewhat.
Nicolas
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
2014-04-15 13:25 ` Peter Zijlstra
2014-04-15 15:27 ` Nicolas Pitre
@ 2014-04-15 15:33 ` Rafael J. Wysocki
1 sibling, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2014-04-15 15:33 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Nicolas Pitre, Daniel Lezcano, linux-kernel, mingo, linux-pm,
alex.shi, vincent.guittot, morten.rasmussen
On Tuesday, April 15, 2014 03:25:10 PM Peter Zijlstra wrote:
> On Fri, Apr 04, 2014 at 01:43:00PM +0200, Rafael J. Wysocki wrote:
> > > That's what this patch series is about. The find_idlest_cpu code should
> > > look for the idle CPU with the shallowest idle state, or the one with
> > > the smallest load. In this context "find_idlest_cpu" might become a
> > > misnomer.
> >
> > Yes, clearly. It should be called find_best_cpu or something like that.
>
> Ha!, but for what purpose? We already have find_busiest_cpu() to find
> the CPU to steal work from. The converse action, currently called
> find_idlest_cpu() is finding the CPU where to put work.
>
> 'Best' is ambiguous in all regards, it doesn't convey the direction nor
> the quality sorted on.
>
> So while idlest might be somewhat of a misnomer, it at least conveys the
> directional thing fairly well. Also we are still searching for the least
> busy, and preferably an idle, cpu. 'Idlest' being a superlative also
> conveys the meaning of order.
But 'idlest' can also be understood as 'deepest idle', which clearly is not the
intent. Perhaps find_cpu_for_work() reflects what it does, but I'm not sure
if that's a good name either.
Rafael
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
2014-04-02 3:05 ` Nicolas Pitre
2014-04-04 11:57 ` Rafael J. Wysocki
@ 2014-04-17 13:53 ` Daniel Lezcano
2014-04-17 14:47 ` Peter Zijlstra
2014-04-17 15:53 ` Nicolas Pitre
1 sibling, 2 replies; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-17 13:53 UTC (permalink / raw)
To: Nicolas Pitre
Cc: linux-kernel, mingo, peterz, rjw, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On 04/02/2014 05:05 AM, Nicolas Pitre wrote:
> On Fri, 28 Mar 2014, Daniel Lezcano wrote:
>
>> As we know in which idle state the cpu is, we can investigate the following:
>>
>> 1. when did the cpu entered the idle state ? the longer the cpu is idle, the
>> deeper it is idle
>> 2. what exit latency is ? the greater the exit latency is, the deeper it is
>>
>> With both information, when all cpus are idle, we can choose the idlest cpu.
>>
>> When one cpu is not idle, the old check against weighted load applies.
>>
>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>
> There seems to be some problems with the implementation.
>
>> @@ -4336,20 +4337,53 @@ static int
>> find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
>> {
>> unsigned long load, min_load = ULONG_MAX;
>> - int idlest = -1;
>> + unsigned int min_exit_latency = UINT_MAX;
>> + u64 idle_stamp, min_idle_stamp = ULONG_MAX;
>
> I don't think you really meant to assign an u64 variable with ULONG_MAX.
> You probably want ULLONG_MAX here. And probably not in fact (more
> later).
>
>> +
>> + struct rq *rq;
>> + struct cpuidle_power *power;
>> +
>> + int cpu_idle = -1;
>> + int cpu_busy = -1;
>> int i;
>>
>> /* Traverse only the allowed CPUs */
>> for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
>> - load = weighted_cpuload(i);
>>
>> - if (load < min_load || (load == min_load && i == this_cpu)) {
>> - min_load = load;
>> - idlest = i;
>> + if (idle_cpu(i)) {
>> +
>> + rq = cpu_rq(i);
>> + power = rq->power;
>> + idle_stamp = rq->idle_stamp;
>> +
>> + /* The cpu is idle since a shorter time */
>> + if (idle_stamp < min_idle_stamp) {
>> + min_idle_stamp = idle_stamp;
>> + cpu_idle = i;
>> + continue;
>
> Don't you want the highest time stamp in order to select the most
> recently idled CPU? Favoring the CPU which has been idle the longest
> makes little sense.
>
>> + }
>> +
>> + /* The cpu is idle but the exit_latency is shorter */
>> + if (power && power->exit_latency < min_exit_latency) {
>> + min_exit_latency = power->exit_latency;
>> + cpu_idle = i;
>> + continue;
>> + }
>
> I think this is wrong. This gives priority to CPUs which have been idle
> for a (longer... although this should have been) shorter period of time
> over those with a shallower idle state. I think this should rather be:
>
> if (power && power->exit_latency < min_exit_latency) {
> min_exit_latency = power->exit_latency;
> latest_idle_stamp = idle_stamp;
> cpu_idle = i;
> } else if ((!power || power->exit_latency == min_exit_latency) &&
> idle_stamp > latest_idle_stamp) {
> latest_idle_stamp = idle_stamp;
> cpu_idle = i;
> }
>
> So the CPU with the shallowest idle state is selected in priority, and
> if many CPUs are in the same state then the time stamp is used to
> select the most recent one. Whenever
> a shallower idle state is found then the latest_idle_stamp is reset for
> that state even if it is further in the past.
>
>> + } else {
>> +
>> + load = weighted_cpuload(i);
>> +
>> + if (load < min_load ||
>> + (load == min_load && i == this_cpu)) {
>> + min_load = load;
>> + cpu_busy = i;
>> + continue;
>> + }
>> }
>
> I think this is wrong to do an if-else based on idle_cpu() here. What
> if a CPU is heavily loaded, but for some reason it happens to be idle at
> this very moment? With your patch it could be selected as an idle CPU
> while it would be discarded as being too busy otherwise.
>
> It is important to determine both cpu_busy and cpu_idle for all CPUs.
>
> And cpu_busy is a bad name for this. Something like least_loaded would
> be more self explanatory. Same thing for cpu_idle which could be
> clearer if named shalloest_idle.
>
>> - return idlest;
>> + /* Busy cpus are considered less idle than idle cpus ;) */
>> + return cpu_busy != -1 ? cpu_busy : cpu_idle;
>
> And finally it is a policy decision whether or not we want to return
> least_loaded over shallowest_idle e.g do we pack tasks on non idle CPUs
> first or not. That in itself needs more investigation. To keep the
> existing policy unchanged for now the above condition should have its
> variables swapped.
OK, I refreshed the patchset, but before sending it out I would like to
discuss the rationale of the changes and the policy, and change the
patchset accordingly.
What order should we choose when the cpus are idle?
Let's assume all cpus are idle on a dual socket quad core.
Also, we can reasonably assume that if the cluster is in low power
mode, the cpus belonging to the same cluster are in the same idle state
(leaving aside auto-promotion, over which we have no control).
If the policy you talk about above is 'aggressive power saving', we can
follow these rules, in decreasing priority:
1. We want to avoid waking up the entire cluster
=> as the cpus are in the same idle state, by choosing a cpu in a shallow
state, we should have the guarantee that we won't wake up a cluster (except
if no cpu in a shallow idle state is found).
2. We want to avoid waking up a cpu which has not reached its target
residency time (will need some work to unify cpuidle idle time and idle
task run time)
=> with the target residency and, as a first step, with the idle stamp,
we can determine if the cpu slept enough
3. We want to avoid waking up a cpu in a deep idle state
=> by looking for the cpu in the shallowest idle state
4. We want to avoid waking up a cpu whose exit latency is longer
than the expected run time of the task (and the time to migrate the task ?)
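Rules 2 and 4 above are the two checks that need more than the exit latency alone. A minimal sketch of them as predicates, in standalone C, could look as follows. Everything here is hypothetical: struct idle_cpu_info, its fields and both function names are invented for illustration, all times are assumed to be in microseconds on a common clock, and 'now' is assumed never to be earlier than the idle stamp.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of an idle CPU; all times in microseconds. */
struct idle_cpu_info {
	uint64_t idle_stamp;           /* when the CPU went idle */
	unsigned int exit_latency;     /* cost of leaving the idle state */
	unsigned int target_residency; /* minimum worthwhile idle time */
};

/*
 * Rule 2: has the CPU already been idle for at least the target
 * residency of its idle state? Assumes now >= c->idle_stamp.
 */
bool residency_reached(const struct idle_cpu_info *c, uint64_t now)
{
	return now - c->idle_stamp >= c->target_residency;
}

/*
 * Rule 4: is the exit latency no longer than the time the task is
 * expected to run? Migration cost is deliberately left out.
 */
bool latency_worthwhile(const struct idle_cpu_info *c,
			uint64_t expected_runtime)
{
	return c->exit_latency <= expected_runtime;
}
```

How these two checks should be ranked against the shallowest-state criterion of rules 1 and 3 is exactly the policy question being asked here, so the sketch does not combine them.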
Concerning the policy, I would suggest creating an entry,
/proc/sys/kernel/sched_power, with two possible values:
performance / power saving (0 / 1).
Does it make sense? Any ideas?
Thanks
-- Daniel
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
2014-04-17 13:53 ` Daniel Lezcano
@ 2014-04-17 14:47 ` Peter Zijlstra
2014-04-17 15:03 ` Daniel Lezcano
2014-04-17 15:53 ` Nicolas Pitre
1 sibling, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-17 14:47 UTC (permalink / raw)
To: Daniel Lezcano
Cc: Nicolas Pitre, linux-kernel, mingo, rjw, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On Thu, Apr 17, 2014 at 03:53:32PM +0200, Daniel Lezcano wrote:
> Concerning the policy, I would suggest to create an entry in
> /proc/sys/kernel/sched_power, where a couple of values could be performance
> - power saving (0 / 1).
Ingo wanted a sched_balance_policy file with 3 values:
"performance, power, auto"
Where the auto thing switches between them, initially based off of
having AC or not.
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
2014-04-17 14:47 ` Peter Zijlstra
@ 2014-04-17 15:03 ` Daniel Lezcano
2014-04-18 8:09 ` Ingo Molnar
0 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-17 15:03 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Nicolas Pitre, linux-kernel, mingo, rjw, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On 04/17/2014 04:47 PM, Peter Zijlstra wrote:
> On Thu, Apr 17, 2014 at 03:53:32PM +0200, Daniel Lezcano wrote:
>> Concerning the policy, I would suggest to create an entry in
>> /proc/sys/kernel/sched_power, where a couple of values could be performance
>> - power saving (0 / 1).
>
> Ingo wanted a sched_balance_policy file with 3 values:
> "performance, power, auto"
>
> Where the auto thing switches between them, initially based off of
> having AC or not.
oh, good. Thanks !
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
2014-04-17 13:53 ` Daniel Lezcano
2014-04-17 14:47 ` Peter Zijlstra
@ 2014-04-17 15:53 ` Nicolas Pitre
2014-04-17 16:05 ` Daniel Lezcano
1 sibling, 1 reply; 47+ messages in thread
From: Nicolas Pitre @ 2014-04-17 15:53 UTC (permalink / raw)
To: Daniel Lezcano
Cc: linux-kernel, mingo, peterz, rjw, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On Thu, 17 Apr 2014, Daniel Lezcano wrote:
> Ok, refreshed the patchset but before sending it out I would to discuss about
> the rational of the changes and the policy, and change the patchset
> consequently.
>
> What order to choose if the cpu is idle ?
>
> Let's assume all cpus are idle on a dual socket quad core.
>
> Also, we can reasonably do the hypothesis if the cluster is in low power mode,
> the cpus belonging to the same cluster are in the same idle state (putting
> apart the auto-promote where we don't have control on).
>
> If the policy you talk above is 'aggressive power saving', we can follow the
> rules with decreasing priority:
>
> 1. We want to prevent to wakeup the entire cluster
> => as the cpus are in the same idle state, by choosing a cpu in shallow
> state, we should have the guarantee we won't wakeup a cluster (except if no
> shallowest idle cpu are found).
This is unclear to me. Obviously, if an entire cluster is down, that
means all the CPUs it contains have been idle for a long time. And
therefore they shouldn't be subject to selection unless there are no
other CPUs available. Is that what you mean?
> 2. We want to prevent to wakeup a cpu which did not reach the target residency
> time (will need some work to unify cpuidle idle time and idle task run time)
> => with the target residency and, as a first step, with the idle stamp,
> we can determine if the cpu slept enough
Agreed. However, right now, the scheduler does not have any
consideration for that. So this should be done as a separate patch.
> 3. We want to prevent to wakeup a cpu in deep idle state
> => by looking for the cpu in shallowest idle state
Obvious.
> 4. We want to prevent to wakeup a cpu where the exit latency is longer than
> the expected run time of the task (and the time to migrate the task ?)
Sure. That would be a case for using task packing even if the policy is
set to performance rather than powersave, whereas task packing is
normally meant for powersave.
Nicolas
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
2014-04-17 15:53 ` Nicolas Pitre
@ 2014-04-17 16:05 ` Daniel Lezcano
2014-04-17 16:21 ` Nicolas Pitre
0 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-17 16:05 UTC (permalink / raw)
To: Nicolas Pitre
Cc: linux-kernel, mingo, peterz, rjw, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On 04/17/2014 05:53 PM, Nicolas Pitre wrote:
> On Thu, 17 Apr 2014, Daniel Lezcano wrote:
>
>> Ok, refreshed the patchset but before sending it out I would to discuss about
>> the rational of the changes and the policy, and change the patchset
>> consequently.
>>
>> What order to choose if the cpu is idle ?
>>
>> Let's assume all cpus are idle on a dual socket quad core.
>>
>> Also, we can reasonably do the hypothesis if the cluster is in low power mode,
>> the cpus belonging to the same cluster are in the same idle state (putting
>> apart the auto-promote where we don't have control on).
>>
>> If the policy you talk above is 'aggressive power saving', we can follow the
>> rules with decreasing priority:
>>
>> 1. We want to prevent to wakeup the entire cluster
>> => as the cpus are in the same idle state, by choosing a cpu in shallow
>> state, we should have the guarantee we won't wakeup a cluster (except if no
>> shallowest idle cpu are found).
>
> This is unclear to me. Obviously, if an entire cluster is down, that
> means all the CPUs it contains have been idle for a long time. And
> therefore they shouldn't be subject to selection unless there is no
> other CPUs available. Is that what you mean?
Yes, this is what I meant. What I also meant is that, for the moment, we
can leave aside the cpu topology and the coupled idle states: with the
approach described above, since the idle state will be the same for all
cpus belonging to the same cluster, we won't select a cluster that is
powered down (unless there are no other CPUs available).
>> 2. We want to prevent to wakeup a cpu which did not reach the target residency
>> time (will need some work to unify cpuidle idle time and idle task run time)
>> => with the target residency and, as a first step, with the idle stamp,
>> we can determine if the cpu slept enough
>
> Agreed. However, right now, the scheduler does not have any
> consideration for that. So this should be done as a separate patch.
Yes, I thought that, as a very first step, we could rely on the idle
stamp (with a big comment) until we unify the times. Or I can first unify
the idle times and then take the target residency into account. This is
to comply with Rafael's request to have the 'big picture'.
>> 3. We want to prevent to wakeup a cpu in deep idle state
>> => by looking for the cpu in shallowest idle state
>
> Obvious.
>
>> 4. We want to prevent to wakeup a cpu where the exit latency is longer than
>> the expected run time of the task (and the time to migrate the task ?)
>
> Sure. That would be a case for using task packing even if the policy is
> set to performance rather than powersave whereas task packing is
> normally for powersave.
Yes, I agree: task packing also improves performance, and it really makes
sense to prevent task migration under some circumstances for better
cache efficiency.
Thanks for the comments
-- Daniel
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
2014-04-17 16:05 ` Daniel Lezcano
@ 2014-04-17 16:21 ` Nicolas Pitre
2014-04-18 9:38 ` Peter Zijlstra
0 siblings, 1 reply; 47+ messages in thread
From: Nicolas Pitre @ 2014-04-17 16:21 UTC (permalink / raw)
To: Daniel Lezcano
Cc: linux-kernel, mingo, peterz, rjw, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On Thu, 17 Apr 2014, Daniel Lezcano wrote:
> On 04/17/2014 05:53 PM, Nicolas Pitre wrote:
> > On Thu, 17 Apr 2014, Daniel Lezcano wrote:
> >
> > > Ok, refreshed the patchset but before sending it out I would to discuss
> > > about
> > > the rational of the changes and the policy, and change the patchset
> > > consequently.
> > >
> > > What order to choose if the cpu is idle ?
> > >
> > > Let's assume all cpus are idle on a dual socket quad core.
> > >
> > > Also, we can reasonably do the hypothesis if the cluster is in low power
> > > mode,
> > > the cpus belonging to the same cluster are in the same idle state (putting
> > > apart the auto-promote where we don't have control on).
> > >
> > > If the policy you talk above is 'aggressive power saving', we can follow
> > > the
> > > rules with decreasing priority:
> > >
> > > 1. We want to prevent to wakeup the entire cluster
> > > => as the cpus are in the same idle state, by choosing a cpu in shallow
> > > state, we should have the guarantee we won't wakeup a cluster (except if
> > > no shallowest idle cpu are found).
> >
> > This is unclear to me. Obviously, if an entire cluster is down, that
> > means all the CPUs it contains have been idle for a long time. And
> > therefore they shouldn't be subject to selection unless there is no
> > other CPUs available. Is that what you mean?
>
> Yes, this is what I meant. But also what I meant is we can get rid for the
> moment of the cpu topology and the coupling idle state because if we do this
> described approach, as the idle state will be the same for the cpus belonging
> to the same cluster we won't select a cluster down (except if there is no
> other CPUs available).
CPU topology is needed to properly describe scheduling domains. Whether
we balance across domains or pack using as few domains as possible is a
separate issue. In other words, you shouldn't have to care in this
patch series.
And IMHO coupled C-state is a low-level mechanism that should remain
private to cpuidle, and which the scheduler shouldn't be aware of.
> > > 2. We want to prevent to wakeup a cpu which did not reach the target
> > > residency
> > > time (will need some work to unify cpuidle idle time and idle task run
> > > time)
> > > => with the target residency and, as a first step, with the idle stamp,
> > > we can determine if the cpu slept enough
> >
> > Agreed. However, right now, the scheduler does not have any
> > consideration for that. So this should be done as a separate patch.
>
> Yes, I thought as a very first step we can rely on the idle stamp until we
> unify the times with a big comment. Or I can first unify the idle times and
> then take into account the target residency. It is to comply with Rafael's
> request to have the 'big picture'.
I agree, but that should be done incrementally. Even without this
consideration, what you proposed is already an improvement over the
current state of affairs.
Nicolas
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
2014-04-17 15:03 ` Daniel Lezcano
@ 2014-04-18 8:09 ` Ingo Molnar
2014-04-18 8:36 ` Daniel Lezcano
0 siblings, 1 reply; 47+ messages in thread
From: Ingo Molnar @ 2014-04-18 8:09 UTC (permalink / raw)
To: Daniel Lezcano
Cc: Peter Zijlstra, Nicolas Pitre, linux-kernel, mingo, rjw,
linux-pm, alex.shi, vincent.guittot, morten.rasmussen
* Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> On 04/17/2014 04:47 PM, Peter Zijlstra wrote:
> >On Thu, Apr 17, 2014 at 03:53:32PM +0200, Daniel Lezcano wrote:
> >>Concerning the policy, I would suggest to create an entry in
> >>/proc/sys/kernel/sched_power, where a couple of values could be performance
> >>- power saving (0 / 1).
> >
> >Ingo wanted a sched_balance_policy file with 3 values:
> > "performance, power, auto"
> >
> >Where the auto thing switches between them, initially based off of
> >having AC or not.
>
> oh, good. Thanks !
Also, 'auto' should be the default, because the kernel doing TRT is
really what users want.
Userspace can still tweak it all and make it all user-space controlled,
by flipping between 'performance' and 'power'. (and those modes are
also helpful for development and debugging.)
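As an illustration of the three-valued policy, here is a minimal userspace sketch. Only the value names "performance", "power" and "auto", the AC-based switch and the 'auto' default come from this thread; the enum, the function names and the on_ac parameter are hypothetical, and no such kernel interface is implied to exist.

```c
#include <stdbool.h>
#include <string.h>

/* The three values mentioned for the sched_balance_policy file. */
enum balance_policy {
	POLICY_PERFORMANCE,
	POLICY_POWER,
	POLICY_AUTO,		/* suggested default */
};

/* Parse a value written by userspace; returns 0 on success, -1 otherwise. */
int balance_policy_parse(const char *s, enum balance_policy *out)
{
	static const char *const names[] = { "performance", "power", "auto" };

	for (int i = 0; i < 3; i++) {
		if (strcmp(s, names[i]) == 0) {
			*out = (enum balance_policy)i;
			return 0;
		}
	}
	return -1;
}

/* Resolve 'auto' to a concrete policy: on AC pick performance, else power. */
enum balance_policy balance_policy_effective(enum balance_policy p, bool on_ac)
{
	if (p != POLICY_AUTO)
		return p;
	return on_ac ? POLICY_PERFORMANCE : POLICY_POWER;
}
```

Explicitly writing "performance" or "power" bypasses the AC check entirely, which matches the idea of user space taking over control.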
Thanks,
Ingo
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
2014-04-18 8:09 ` Ingo Molnar
@ 2014-04-18 8:36 ` Daniel Lezcano
0 siblings, 0 replies; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-18 8:36 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Nicolas Pitre, linux-kernel, mingo, rjw,
linux-pm, alex.shi, vincent.guittot, morten.rasmussen
On 04/18/2014 10:09 AM, Ingo Molnar wrote:
>
> * Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>
>> On 04/17/2014 04:47 PM, Peter Zijlstra wrote:
>>> On Thu, Apr 17, 2014 at 03:53:32PM +0200, Daniel Lezcano wrote:
>>>> Concerning the policy, I would suggest to create an entry in
>>>> /proc/sys/kernel/sched_power, where a couple of values could be performance
>>>> - power saving (0 / 1).
>>>
>>> Ingo wanted a sched_balance_policy file with 3 values:
>>> "performance, power, auto"
>>>
>>> Where the auto thing switches between them, initially based off of
>>> having AC or not.
>>
>> oh, good. Thanks !
>
> Also, 'auto' should be the default, because the kernel doing TRT is
> really what users want.
>
> Userspace can sill tweak it all and make it all user-space controlled,
> by flipping between 'performance' and 'power'. (and those modes are
> also helpful for development and debugging.)
Copy that.
Thanks !
-- Daniel
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
2014-04-17 16:21 ` Nicolas Pitre
@ 2014-04-18 9:38 ` Peter Zijlstra
2014-04-18 12:13 ` Daniel Lezcano
0 siblings, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-18 9:38 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Daniel Lezcano, linux-kernel, mingo, rjw, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On Thu, Apr 17, 2014 at 12:21:28PM -0400, Nicolas Pitre wrote:
> CPU topology is needed to properly describe scheduling domains. Whether
> we balance across domains or pack using as few domains as possible is a
> separate issue. In other words, you shouldn't have to care in this
> patch series.
>
> And IMHO coupled C-state is a low-level mechanism that should remain
> private to cpuidle which the scheduler shouldn't be aware of.
I'm confused.. why wouldn't you want to expose these?
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
2014-04-18 9:38 ` Peter Zijlstra
@ 2014-04-18 12:13 ` Daniel Lezcano
2014-04-18 12:53 ` Peter Zijlstra
0 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-18 12:13 UTC (permalink / raw)
To: Peter Zijlstra, Nicolas Pitre
Cc: linux-kernel, mingo, rjw, linux-pm, alex.shi, vincent.guittot,
morten.rasmussen
On 04/18/2014 11:38 AM, Peter Zijlstra wrote:
> On Thu, Apr 17, 2014 at 12:21:28PM -0400, Nicolas Pitre wrote:
>> CPU topology is needed to properly describe scheduling domains. Whether
>> we balance across domains or pack using as few domains as possible is a
>> separate issue. In other words, you shouldn't have to care in this
>> patch series.
>>
>> And IMHO coupled C-state is a low-level mechanism that should remain
>> private to cpuidle which the scheduler shouldn't be aware of.
>
> I'm confused.. why wouldn't you want to expose these?
The coupled C-state is a mechanism used by cpuidle to sync the cpus when
entering a specific C-state. It is usually used to handle the cluster
power down. It is only used by two drivers (soon three), and it is not
the only mechanism used for syncing the cpus. There are also MCPM (tc2),
hand-made sync when the hardware allows it (ux500), and an abstraction
provided by the firmware (mwait), transparent to the kernel.
Taking only the coupled C-state into account does not make sense because
of the other mechanisms above. This is why it should stay inside the
cpuidle framework.
The extension of the cpu topology will provide a generic way to describe
and abstract such dependencies.
Does that answer your question?
-- Daniel
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
2014-04-18 12:13 ` Daniel Lezcano
@ 2014-04-18 12:53 ` Peter Zijlstra
2014-04-18 13:04 ` Daniel Lezcano
0 siblings, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-18 12:53 UTC (permalink / raw)
To: Daniel Lezcano
Cc: Nicolas Pitre, linux-kernel, mingo, rjw, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On Fri, Apr 18, 2014 at 02:13:48PM +0200, Daniel Lezcano wrote:
> On 04/18/2014 11:38 AM, Peter Zijlstra wrote:
> >On Thu, Apr 17, 2014 at 12:21:28PM -0400, Nicolas Pitre wrote:
> >>CPU topology is needed to properly describe scheduling domains. Whether
> >>we balance across domains or pack using as few domains as possible is a
> >>separate issue. In other words, you shouldn't have to care in this
> >>patch series.
> >>
> >>And IMHO coupled C-state is a low-level mechanism that should remain
> >>private to cpuidle which the scheduler shouldn't be aware of.
> >
> >I'm confused.. why wouldn't you want to expose these?
>
> The couple C-state is used as a mechanism for cpuidle to sync the cpus when
> entering a specific c-state. This mechanism is usually used to handle the
> cluster power down. It is only used for a two drivers (soon three) but it is
> not the only mechanism used for syncing the cpus. There are also the MCPM
> (tc2), the hand made sync when the hardware allows it (ux500), and an
> abstraction from the firmware (mwait), transparent to the kernel.
>
> Taking into account the couple c-state only does not make sense because of
> the other mechanisms above. This is why it should stay inside the cpuidle
> framework.
>
> The extension of the cpu topology will provide a generic way to describe and
> abstracting such dependencies.
>
> Does it answer your question ?
I suppose so; it's still a bit like 'we won't, but we will' :-)
So we _will_ actually expose coupled C states through the topology bits,
that's good.
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
2014-04-18 12:53 ` Peter Zijlstra
@ 2014-04-18 13:04 ` Daniel Lezcano
2014-04-18 16:00 ` Nicolas Pitre
0 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-18 13:04 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Nicolas Pitre, linux-kernel, mingo, rjw, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On 04/18/2014 02:53 PM, Peter Zijlstra wrote:
> On Fri, Apr 18, 2014 at 02:13:48PM +0200, Daniel Lezcano wrote:
>> On 04/18/2014 11:38 AM, Peter Zijlstra wrote:
>>> On Thu, Apr 17, 2014 at 12:21:28PM -0400, Nicolas Pitre wrote:
>>>> CPU topology is needed to properly describe scheduling domains. Whether
>>>> we balance across domains or pack using as few domains as possible is a
>>>> separate issue. In other words, you shouldn't have to care in this
>>>> patch series.
>>>>
>>>> And IMHO coupled C-state is a low-level mechanism that should remain
>>>> private to cpuidle which the scheduler shouldn't be aware of.
>>>
>>> I'm confused.. why wouldn't you want to expose these?
>>
>> The couple C-state is used as a mechanism for cpuidle to sync the cpus when
>> entering a specific c-state. This mechanism is usually used to handle the
>> cluster power down. It is only used for a two drivers (soon three) but it is
>> not the only mechanism used for syncing the cpus. There are also the MCPM
>> (tc2), the hand made sync when the hardware allows it (ux500), and an
>> abstraction from the firmware (mwait), transparent to the kernel.
>>
>> Taking into account the couple c-state only does not make sense because of
>> the other mechanisms above. This is why it should stay inside the cpuidle
>> framework.
>>
>> The extension of the cpu topology will provide a generic way to describe and
>> abstracting such dependencies.
>>
>> Does it answer your question ?
>
> I suppose so; its still a bit like we won't but we will :-)
>
> So we _will_ actually expose coupled C states through the topology bits,
> that's good.
Ah, OK. I think I understand where the confusion is coming from.
There are two definitions for the same term :)
1. Coupled C-states: a *mechanism* implemented in the cpuidle framework:
drivers/cpuidle/coupled.c
2. Coupled C-states: a *constraint* to reach a cluster power down state.
It will be described through the topology and could be implemented by
different mechanisms (MCPM, handmade sync, cpuidle-coupled-c-state,
firmware).
We want to expose 2., not 1., to the scheduler.
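To make the distinction concrete, the *constraint* (definition 2) can be expressed without any reference to how the cpus are synchronized. The sketch below is purely illustrative: struct cluster_desc, the bitmask layout and both function names are hypothetical, and it assumes cpu numbers smaller than the width of unsigned long.

```c
#include <stdbool.h>

/* Hypothetical topology-level description of a cluster. */
struct cluster_desc {
	unsigned long cpus;	/* bitmask of the cpus in the cluster */
};

/* The constraint: the cluster may power down only if all its cpus are idle. */
bool cluster_may_power_down(const struct cluster_desc *c,
			    unsigned long idle_cpus)
{
	return (c->cpus & idle_cpus) == c->cpus;
}

/* Would waking 'cpu' take the cluster out of its power-down condition? */
bool wakeup_breaks_power_down(const struct cluster_desc *c,
			      unsigned long idle_cpus, unsigned int cpu)
{
	unsigned long bit = 1UL << cpu;

	return (c->cpus & bit) && cluster_may_power_down(c, idle_cpus);
}
```

Nothing here depends on whether the power down is reached through MCPM, a handmade sync, the cpuidle coupled code or the firmware, which is the point of exposing only the constraint.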
* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
2014-04-18 13:04 ` Daniel Lezcano
@ 2014-04-18 16:00 ` Nicolas Pitre
0 siblings, 0 replies; 47+ messages in thread
From: Nicolas Pitre @ 2014-04-18 16:00 UTC (permalink / raw)
To: Daniel Lezcano
Cc: Peter Zijlstra, linux-kernel, mingo, rjw, linux-pm, alex.shi,
vincent.guittot, morten.rasmussen
On Fri, 18 Apr 2014, Daniel Lezcano wrote:
> On 04/18/2014 02:53 PM, Peter Zijlstra wrote:
> > I suppose so; its still a bit like we won't but we will :-)
> >
> > So we _will_ actually expose coupled C states through the topology bits,
> > that's good.
>
> Ah, ok. I think I understood where the confusion is coming from.
>
> A couple of definitions for the same thing :)
>
> 1. Coupled C-states : *mechanism* implemented in the cpuidle framework:
> drivers/cpuidle/coupled.c
>
> 2. Coupled C-states : *constraint* to reach a cluster power down state, will
> be described through the topology and could be implemented by different
> mechanism (MCPM, handmade sync, cpuidle-coupled-c-state, firmware).
>
> We want to expose 2. not 1. to the scheduler.
I couldn't explain it better.
Sorry for creating confusion.
Nicolas