* [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
@ 2014-03-28 12:29 Daniel Lezcano
From: Daniel Lezcano @ 2014-03-28 12:29 UTC
  To: linux-kernel, mingo, peterz
  Cc: rjw, nicolas.pitre, linux-pm, alex.shi, vincent.guittot,
	morten.rasmussen

The following patchset provides an interaction between cpuidle and the scheduler.

The first patch encapsulates the information the scheduler needs in a
separate cpuidle structure. The second one stores a pointer to this structure
when entering idle. The third one uses this information to decide which cpu
is the idlest.

After some basic testing with hackbench, there appears to be a small
performance improvement (the run time goes from 44.433 to 42.179) and the cpus
stay longer in their idle states, which provides a better power saving.

The measurements have been done with the 'idlestat' tool previously posted on
this mailing list.

So both sides benefit: performance and power saving.

select_idle_sibling() could also be improved in the same way.

====================== test with hackbench 3.14-rc8 =========================

/usr/bin/hackbench -l 10000 -s 4096

Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
Each sender will pass 10000 messages of 4096 bytes

Time: 44.433

Total trace buffer: 1846688 kB
clusterA@state	hits	      total(us)		avg(us)	min(us)	max(us)
         POLL	0	           0.00	           0.00	0.00	0.00
         C1-IVB	0	           0.00	           0.00	0.00	0.00
         C1E-IVB	0	           0.00	           0.00	0.00	0.00
         C3-IVB	0	           0.00	           0.00	0.00	0.00
         C6-IVB	0	           0.00	           0.00	0.00	0.00
         C7-IVB	0	           0.00	           0.00	0.00	0.00
  core0@state	hits	      total(us)		avg(us)	min(us)	max(us)
        POLL	0	           0.00	           0.00	0.00	0.00
        C1-IVB	0	           0.00	           0.00	0.00	0.00
        C1E-IVB	0	           0.00	           0.00	0.00	0.00
        C3-IVB	0	           0.00	           0.00	0.00	0.00
        C6-IVB	0	           0.00	           0.00	0.00	0.00
        C7-IVB	1396	    87932131.00	       62988.63	0.00	320146.00
    cpu0@state	hits	      total(us)		avg(us)	min(us)	max(us)
         POLL	0	           0.00	           0.00	0.00	0.00
         C1-IVB	1	          14.00	          14.00	14.00	14.00
         C1E-IVB	0	           0.00	           0.00	0.00	0.00
         C3-IVB	1	         262.00	         262.00	262.00	262.00
         C6-IVB	0	           0.00	           0.00	0.00	0.00
         C7-IVB	1180	    87938177.00	       74523.88	1.00	320147.00
         1701	0	           0.00	           0.00	0.00	0.00
         1700	0	           0.00	           0.00	0.00	0.00
         1600	0	           0.00	           0.00	0.00	0.00
         1500	0	           0.00	           0.00	0.00	0.00
         1400	0	           0.00	           0.00	0.00	0.00
         1300	0	           0.00	           0.00	0.00	0.00
         1200	0	           0.00	           0.00	0.00	0.00
         1100	0	           0.00	           0.00	0.00	0.00
         1000	0	           0.00	           0.00	0.00	0.00
         900	0	           0.00	           0.00	0.00	0.00
         800	0	           0.00	           0.00	0.00	0.00
         782	0	           0.00	           0.00	0.00	0.00
    cpu0 wakeups 	name 		count
         irq009	acpi           	1
    cpu1@state	hits	      total(us)		avg(us)	min(us)	max(us)
         POLL	0	           0.00	           0.00	0.00	0.00
         C1-IVB	0	           0.00	           0.00	0.00	0.00
         C1E-IVB	0	           0.00	           0.00	0.00	0.00
         C3-IVB	0	           0.00	           0.00	0.00	0.00
         C6-IVB	0	           0.00	           0.00	0.00	0.00
         C7-IVB	475	    87941356.00	      185139.70	322.00	1500690.00
         1701	0	           0.00	           0.00	0.00	0.00
         1700	0	           0.00	           0.00	0.00	0.00
         1600	0	           0.00	           0.00	0.00	0.00
         1500	0	           0.00	           0.00	0.00	0.00
         1400	0	           0.00	           0.00	0.00	0.00
         1300	0	           0.00	           0.00	0.00	0.00
         1200	0	           0.00	           0.00	0.00	0.00
         1100	0	           0.00	           0.00	0.00	0.00
         1000	0	           0.00	           0.00	0.00	0.00
         900	0	           0.00	           0.00	0.00	0.00
         800	0	           0.00	           0.00	0.00	0.00
         782	0	           0.00	           0.00	0.00	0.00
    cpu1 wakeups 	name 		count
         irq009	acpi           	3
  core1@state	hits	      total(us)		avg(us)	min(us)	max(us)
        POLL	0	           0.00	           0.00	0.00	0.00
        C1-IVB	0	           0.00	           0.00	0.00	0.00
        C1E-IVB	0	           0.00	           0.00	0.00	0.00
        C3-IVB	0	           0.00	           0.00	0.00	0.00
        C6-IVB	0	           0.00	           0.00	0.00	0.00
        C7-IVB	0	           0.00	           0.00	0.00	0.00
    cpu2@state	hits	      total(us)		avg(us)	min(us)	max(us)
         POLL	0	           0.00	           0.00	0.00	0.00
         C1-IVB	11	      288157.00	       26196.09	16.00	200060.00
         C1E-IVB	6	      221601.00	       36933.50	79.00	200066.00
         C3-IVB	0	           0.00	           0.00	0.00	0.00
         C6-IVB	0	           0.00	           0.00	0.00	0.00
         C7-IVB	950	    87417466.00	       92018.39	19.00	200074.00
         1701	0	           0.00	           0.00	0.00	0.00
         1700	0	           0.00	           0.00	0.00	0.00
         1600	0	           0.00	           0.00	0.00	0.00
         1500	2	          34.00	          17.00	11.00	23.00
         1400	0	           0.00	           0.00	0.00	0.00
         1300	0	           0.00	           0.00	0.00	0.00
         1200	0	           0.00	           0.00	0.00	0.00
         1100	0	           0.00	           0.00	0.00	0.00
         1000	0	           0.00	           0.00	0.00	0.00
         900	0	           0.00	           0.00	0.00	0.00
         800	0	           0.00	           0.00	0.00	0.00
         782	745	       18800.00	          25.23	2.00	156.00
    cpu2 wakeups 	name 		count
         irq019	ahci           	50
         irq009	acpi           	17
    cpu3@state	hits	      total(us)		avg(us)	min(us)	max(us)
         POLL	0	           0.00	           0.00	0.00	0.00
         C1-IVB	0	           0.00	           0.00	0.00	0.00
         C1E-IVB	0	           0.00	           0.00	0.00	0.00
         C3-IVB	0	           0.00	           0.00	0.00	0.00
         C6-IVB	0	           0.00	           0.00	0.00	0.00
         C7-IVB	0	           0.00	           0.00	0.00	0.00
         1701	0	           0.00	           0.00	0.00	0.00
         1700	0	           0.00	           0.00	0.00	0.00
         1600	0	           0.00	           0.00	0.00	0.00
         1500	0	           0.00	           0.00	0.00	0.00
         1400	0	           0.00	           0.00	0.00	0.00
         1300	0	           0.00	           0.00	0.00	0.00
         1200	0	           0.00	           0.00	0.00	0.00
         1100	0	           0.00	           0.00	0.00	0.00
         1000	0	           0.00	           0.00	0.00	0.00
         900	0	           0.00	           0.00	0.00	0.00
         800	0	           0.00	           0.00	0.00	0.00
         782	0	           0.00	           0.00	0.00	0.00
    cpu3 wakeups 	name 		count

================ test with hackbench 3.14-rc8 + patchset ====================

/usr/bin/hackbench -l 10000 -s 4096

Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
Each sender will pass 10000 messages of 4096 bytes

Time: 42.179

Total trace buffer: 1846688 kB
clusterA@state	hits	      total(us)		avg(us)	min(us)	max(us)
         POLL	0	           0.00	           0.00	0.00	0.00
         C1-IVB	0	           0.00	           0.00	0.00	0.00
         C1E-IVB	0	           0.00	           0.00	0.00	0.00
         C3-IVB	0	           0.00	           0.00	0.00	0.00
         C6-IVB	0	           0.00	           0.00	0.00	0.00
         C7-IVB	0	           0.00	           0.00	0.00	0.00
  core0@state	hits	      total(us)		avg(us)	min(us)	max(us)
        POLL	0	           0.00	           0.00	0.00	0.00
        C1-IVB	0	           0.00	           0.00	0.00	0.00
        C1E-IVB	0	           0.00	           0.00	0.00	0.00
        C3-IVB	0	           0.00	           0.00	0.00	0.00
        C6-IVB	0	           0.00	           0.00	0.00	0.00
        C7-IVB	880	    89157590.00	      101315.44	0.00	400184.00
    cpu0@state	hits	      total(us)		avg(us)	min(us)	max(us)
         POLL	0	           0.00	           0.00	0.00	0.00
         C1-IVB	0	           0.00	           0.00	0.00	0.00
         C1E-IVB	1	         233.00	         233.00	233.00	233.00
         C3-IVB	1	         260.00	         260.00	260.00	260.00
         C6-IVB	0	           0.00	           0.00	0.00	0.00
         C7-IVB	700	    89162006.00	      127374.29	182.00	400187.00
         1701	0	           0.00	           0.00	0.00	0.00
         1700	0	           0.00	           0.00	0.00	0.00
         1600	0	           0.00	           0.00	0.00	0.00
         1500	0	           0.00	           0.00	0.00	0.00
         1400	0	           0.00	           0.00	0.00	0.00
         1300	0	           0.00	           0.00	0.00	0.00
         1200	0	           0.00	           0.00	0.00	0.00
         1100	0	           0.00	           0.00	0.00	0.00
         1000	0	           0.00	           0.00	0.00	0.00
         900	0	           0.00	           0.00	0.00	0.00
         800	0	           0.00	           0.00	0.00	0.00
         782	0	           0.00	           0.00	0.00	0.00
    cpu0 wakeups 	name 		count
         irq009	acpi           	2
    cpu1@state	hits	      total(us)		avg(us)	min(us)	max(us)
         POLL	0	           0.00	           0.00	0.00	0.00
         C1-IVB	0	           0.00	           0.00	0.00	0.00
         C1E-IVB	0	           0.00	           0.00	0.00	0.00
         C3-IVB	0	           0.00	           0.00	0.00	0.00
         C6-IVB	0	           0.00	           0.00	0.00	0.00
         C7-IVB	334	    89164805.00	      266960.49	1.00	1500677.00
         1701	0	           0.00	           0.00	0.00	0.00
         1700	0	           0.00	           0.00	0.00	0.00
         1600	0	           0.00	           0.00	0.00	0.00
         1500	0	           0.00	           0.00	0.00	0.00
         1400	0	           0.00	           0.00	0.00	0.00
         1300	0	           0.00	           0.00	0.00	0.00
         1200	0	           0.00	           0.00	0.00	0.00
         1100	0	           0.00	           0.00	0.00	0.00
         1000	0	           0.00	           0.00	0.00	0.00
         900	0	           0.00	           0.00	0.00	0.00
         800	0	           0.00	           0.00	0.00	0.00
         782	0	           0.00	           0.00	0.00	0.00
    cpu1 wakeups 	name 		count
         irq009	acpi           	6
  core1@state	hits	      total(us)		avg(us)	min(us)	max(us)
        POLL	0	           0.00	           0.00	0.00	0.00
        C1-IVB	0	           0.00	           0.00	0.00	0.00
        C1E-IVB	0	           0.00	           0.00	0.00	0.00
        C3-IVB	0	           0.00	           0.00	0.00	0.00
        C6-IVB	0	           0.00	           0.00	0.00	0.00
        C7-IVB	0	           0.00	           0.00	0.00	0.00
    cpu2@state	hits	      total(us)		avg(us)	min(us)	max(us)
         POLL	0	           0.00	           0.00	0.00	0.00
         C1-IVB	19	     2169047.00	      114160.37	18.00	999129.00
         C1E-IVB	0	           0.00	           0.00	0.00	0.00
         C3-IVB	0	           0.00	           0.00	0.00	0.00
         C6-IVB	0	           0.00	           0.00	0.00	0.00
         C7-IVB	376	    86993307.00	      231365.18	20.00	1500682.00
         1701	0	           0.00	           0.00	0.00	0.00
         1700	0	           0.00	           0.00	0.00	0.00
         1600	0	           0.00	           0.00	0.00	0.00
         1500	0	           0.00	           0.00	0.00	0.00
         1400	0	           0.00	           0.00	0.00	0.00
         1300	0	           0.00	           0.00	0.00	0.00
         1200	0	           0.00	           0.00	0.00	0.00
         1100	0	           0.00	           0.00	0.00	0.00
         1000	0	           0.00	           0.00	0.00	0.00
         900	0	           0.00	           0.00	0.00	0.00
         800	0	           0.00	           0.00	0.00	0.00
         782	0	           0.00	           0.00	0.00	0.00
    cpu2 wakeups 	name 		count
         irq009	acpi           	32
         irq019	ahci           	45
    cpu3@state	hits	      total(us)		avg(us)	min(us)	max(us)
         POLL	0	           0.00	           0.00	0.00	0.00
         C1-IVB	0	           0.00	           0.00	0.00	0.00
         C1E-IVB	0	           0.00	           0.00	0.00	0.00
         C3-IVB	0	           0.00	           0.00	0.00	0.00
         C6-IVB	0	           0.00	           0.00	0.00	0.00
         C7-IVB	0	           0.00	           0.00	0.00	0.00
         1701	0	           0.00	           0.00	0.00	0.00
         1700	0	           0.00	           0.00	0.00	0.00
         1600	0	           0.00	           0.00	0.00	0.00
         1500	0	           0.00	           0.00	0.00	0.00
         1400	0	           0.00	           0.00	0.00	0.00
         1300	0	           0.00	           0.00	0.00	0.00
         1200	0	           0.00	           0.00	0.00	0.00
         1100	0	           0.00	           0.00	0.00	0.00
         1000	0	           0.00	           0.00	0.00	0.00
         900	0	           0.00	           0.00	0.00	0.00
         800	0	           0.00	           0.00	0.00	0.00
         782	0	           0.00	           0.00	0.00	0.00
    cpu3 wakeups 	name 		count


Daniel Lezcano (3):
  cpuidle: encapsulate power info in a separate structure
  idle: store the idle state the cpu is
  sched/fair: use the idle state info to choose the idlest cpu

 arch/arm/include/asm/cpuidle.h       |    6 +-
 arch/arm/mach-exynos/cpuidle.c       |    4 +-
 drivers/acpi/processor_idle.c        |    4 +-
 drivers/base/power/domain.c          |    6 +-
 drivers/cpuidle/cpuidle-at91.c       |    4 +-
 drivers/cpuidle/cpuidle-big_little.c |    9 +--
 drivers/cpuidle/cpuidle-calxeda.c    |    6 +-
 drivers/cpuidle/cpuidle-kirkwood.c   |    4 +-
 drivers/cpuidle/cpuidle-powernv.c    |    8 +--
 drivers/cpuidle/cpuidle-pseries.c    |   12 ++--
 drivers/cpuidle/cpuidle-ux500.c      |   14 ++---
 drivers/cpuidle/cpuidle-zynq.c       |    4 +-
 drivers/cpuidle/driver.c             |    6 +-
 drivers/cpuidle/governors/ladder.c   |   14 +++--
 drivers/cpuidle/governors/menu.c     |    8 +--
 drivers/cpuidle/sysfs.c              |    2 +-
 drivers/idle/intel_idle.c            |  112 +++++++++++++++++-----------------
 include/linux/cpuidle.h              |   10 ++-
 kernel/sched/fair.c                  |   46 ++++++++++++--
 kernel/sched/idle.c                  |   17 +++++-
 kernel/sched/sched.h                 |    5 ++
 21 files changed, 180 insertions(+), 121 deletions(-)

-- 
1.7.9.5



* [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure
@ 2014-03-28 12:29 ` Daniel Lezcano
From: Daniel Lezcano @ 2014-03-28 12:29 UTC
  To: linux-kernel, mingo, peterz
  Cc: rjw, nicolas.pitre, linux-pm, alex.shi, vincent.guittot,
	morten.rasmussen

The scheduler needs some information from cpuidle to know the timings of the
specific idle state a cpu is in.

This patch creates a separate structure to group the cpuidle power info in
order to share it with the scheduler. It improves the encapsulation of the
code.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 arch/arm/include/asm/cpuidle.h       |    6 +-
 arch/arm/mach-exynos/cpuidle.c       |    4 +-
 drivers/acpi/processor_idle.c        |    4 +-
 drivers/base/power/domain.c          |    6 +-
 drivers/cpuidle/cpuidle-at91.c       |    4 +-
 drivers/cpuidle/cpuidle-big_little.c |    9 +--
 drivers/cpuidle/cpuidle-calxeda.c    |    6 +-
 drivers/cpuidle/cpuidle-kirkwood.c   |    4 +-
 drivers/cpuidle/cpuidle-powernv.c    |    8 +--
 drivers/cpuidle/cpuidle-pseries.c    |   12 ++--
 drivers/cpuidle/cpuidle-ux500.c      |   14 ++---
 drivers/cpuidle/cpuidle-zynq.c       |    4 +-
 drivers/cpuidle/driver.c             |    6 +-
 drivers/cpuidle/governors/ladder.c   |   14 +++--
 drivers/cpuidle/governors/menu.c     |    8 +--
 drivers/cpuidle/sysfs.c              |    2 +-
 drivers/idle/intel_idle.c            |  112 +++++++++++++++++-----------------
 include/linux/cpuidle.h              |   10 ++-
 18 files changed, 120 insertions(+), 113 deletions(-)

diff --git a/arch/arm/include/asm/cpuidle.h b/arch/arm/include/asm/cpuidle.h
index 2fca60a..987ee53 100644
--- a/arch/arm/include/asm/cpuidle.h
+++ b/arch/arm/include/asm/cpuidle.h
@@ -12,9 +12,9 @@ static inline int arm_cpuidle_simple_enter(struct cpuidle_device *dev,
 /* Common ARM WFI state */
 #define ARM_CPUIDLE_WFI_STATE_PWR(p) {\
 	.enter                  = arm_cpuidle_simple_enter,\
-	.exit_latency           = 1,\
-	.target_residency       = 1,\
-	.power_usage		= p,\
+	.power.exit_latency     = 1,\
+	.power.target_residency = 1,\
+	.power.power_usage	= p,\
 	.flags                  = CPUIDLE_FLAG_TIME_VALID,\
 	.name                   = "WFI",\
 	.desc                   = "ARM WFI",\
diff --git a/arch/arm/mach-exynos/cpuidle.c b/arch/arm/mach-exynos/cpuidle.c
index f57cb91..f6275cb 100644
--- a/arch/arm/mach-exynos/cpuidle.c
+++ b/arch/arm/mach-exynos/cpuidle.c
@@ -73,8 +73,8 @@ static struct cpuidle_driver exynos4_idle_driver = {
 		[0] = ARM_CPUIDLE_WFI_STATE,
 		[1] = {
 			.enter			= exynos4_enter_lowpower,
-			.exit_latency		= 300,
-			.target_residency	= 100000,
+			.power.exit_latency	= 300,
+			.power.target_residency = 100000,
 			.flags			= CPUIDLE_FLAG_TIME_VALID,
 			.name			= "C1",
 			.desc			= "ARM power down",
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 3dca36d..05fa991 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -979,8 +979,8 @@ static int acpi_processor_setup_cpuidle_states(struct acpi_processor *pr)
 		state = &drv->states[count];
 		snprintf(state->name, CPUIDLE_NAME_LEN, "C%d", i);
 		strncpy(state->desc, cx->desc, CPUIDLE_DESC_LEN);
-		state->exit_latency = cx->latency;
-		state->target_residency = cx->latency * latency_factor;
+		state->power.exit_latency = cx->latency;
+		state->power.target_residency = cx->latency * latency_factor;
 
 		state->flags = 0;
 		switch (cx->type) {
diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index bfb8955..6bcb1e8 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -154,7 +154,7 @@ static void genpd_recalc_cpu_exit_latency(struct generic_pm_domain *genpd)
 	usecs64 = genpd->power_on_latency_ns;
 	do_div(usecs64, NSEC_PER_USEC);
 	usecs64 += genpd->cpu_data->saved_exit_latency;
-	genpd->cpu_data->idle_state->exit_latency = usecs64;
+	genpd->cpu_data->idle_state->power.exit_latency = usecs64;
 }
 
 /**
@@ -1882,7 +1882,7 @@ int pm_genpd_attach_cpuidle(struct generic_pm_domain *genpd, int state)
 		goto err;
 	}
 	cpu_data->idle_state = idle_state;
-	cpu_data->saved_exit_latency = idle_state->exit_latency;
+	cpu_data->saved_exit_latency = idle_state->power.exit_latency;
 	genpd->cpu_data = cpu_data;
 	genpd_recalc_cpu_exit_latency(genpd);
 
@@ -1936,7 +1936,7 @@ int pm_genpd_detach_cpuidle(struct generic_pm_domain *genpd)
 		ret = -EAGAIN;
 		goto out;
 	}
-	idle_state->exit_latency = cpu_data->saved_exit_latency;
+	idle_state->power.exit_latency = cpu_data->saved_exit_latency;
 	cpuidle_driver_unref();
 	genpd->cpu_data = NULL;
 	kfree(cpu_data);
diff --git a/drivers/cpuidle/cpuidle-at91.c b/drivers/cpuidle/cpuidle-at91.c
index a077437..48c7063 100644
--- a/drivers/cpuidle/cpuidle-at91.c
+++ b/drivers/cpuidle/cpuidle-at91.c
@@ -40,9 +40,9 @@ static struct cpuidle_driver at91_idle_driver = {
 	.owner			= THIS_MODULE,
 	.states[0]		= ARM_CPUIDLE_WFI_STATE,
 	.states[1]		= {
+		.power.exit_latency	= 10,
+		.power.target_residency = 10000,
 		.enter			= at91_enter_idle,
-		.exit_latency		= 10,
-		.target_residency	= 10000,
 		.flags			= CPUIDLE_FLAG_TIME_VALID,
 		.name			= "RAM_SR",
 		.desc			= "WFI and DDR Self Refresh",
diff --git a/drivers/cpuidle/cpuidle-big_little.c b/drivers/cpuidle/cpuidle-big_little.c
index b45fc62..5a0af4b 100644
--- a/drivers/cpuidle/cpuidle-big_little.c
+++ b/drivers/cpuidle/cpuidle-big_little.c
@@ -62,9 +62,9 @@ static struct cpuidle_driver bl_idle_little_driver = {
 	.owner = THIS_MODULE,
 	.states[0] = ARM_CPUIDLE_WFI_STATE,
 	.states[1] = {
+		.power.exit_latency	= 700,
+		.power.target_residency = 2500,
 		.enter			= bl_enter_powerdown,
-		.exit_latency		= 700,
-		.target_residency	= 2500,
 		.flags			= CPUIDLE_FLAG_TIME_VALID |
 					  CPUIDLE_FLAG_TIMER_STOP,
 		.name			= "C1",
@@ -78,9 +78,10 @@ static struct cpuidle_driver bl_idle_big_driver = {
 	.owner = THIS_MODULE,
 	.states[0] = ARM_CPUIDLE_WFI_STATE,
 	.states[1] = {
+
+		.power.exit_latency	= 500,
+		.power.target_residency = 2000,
 		.enter			= bl_enter_powerdown,
-		.exit_latency		= 500,
-		.target_residency	= 2000,
 		.flags			= CPUIDLE_FLAG_TIME_VALID |
 					  CPUIDLE_FLAG_TIMER_STOP,
 		.name			= "C1",
diff --git a/drivers/cpuidle/cpuidle-calxeda.c b/drivers/cpuidle/cpuidle-calxeda.c
index 6e51114..8357a20 100644
--- a/drivers/cpuidle/cpuidle-calxeda.c
+++ b/drivers/cpuidle/cpuidle-calxeda.c
@@ -56,9 +56,9 @@ static struct cpuidle_driver calxeda_idle_driver = {
 			.name = "PG",
 			.desc = "Power Gate",
 			.flags = CPUIDLE_FLAG_TIME_VALID,
-			.exit_latency = 30,
-			.power_usage = 50,
-			.target_residency = 200,
+			.power.exit_latency = 30,
+			.power.power_usage = 50,
+			.power.target_residency = 200,
 			.enter = calxeda_pwrdown_idle,
 		},
 	},
diff --git a/drivers/cpuidle/cpuidle-kirkwood.c b/drivers/cpuidle/cpuidle-kirkwood.c
index 41ba843..0ae4138 100644
--- a/drivers/cpuidle/cpuidle-kirkwood.c
+++ b/drivers/cpuidle/cpuidle-kirkwood.c
@@ -44,9 +44,9 @@ static struct cpuidle_driver kirkwood_idle_driver = {
 	.owner			= THIS_MODULE,
 	.states[0]		= ARM_CPUIDLE_WFI_STATE,
 	.states[1]		= {
+		.power.exit_latency	= 10,
+		.power.target_residency = 100000,
 		.enter			= kirkwood_enter_idle,
-		.exit_latency		= 10,
-		.target_residency	= 100000,
 		.flags			= CPUIDLE_FLAG_TIME_VALID,
 		.name			= "DDR SR",
 		.desc			= "WFI and DDR Self Refresh",
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index f48607c..c47cc02 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -62,15 +62,15 @@ static struct cpuidle_state powernv_states[] = {
 		.name = "snooze",
 		.desc = "snooze",
 		.flags = CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 0,
-		.target_residency = 0,
+		.power.exit_latency = 0,
+		.power.target_residency = 0,
 		.enter = &snooze_loop },
 	{ /* NAP */
 		.name = "NAP",
 		.desc = "NAP",
 		.flags = CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 10,
-		.target_residency = 100,
+		.power.exit_latency = 10,
+		.power.target_residency = 100,
 		.enter = &nap_loop },
 };
 
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 6f7b019..483d7e7 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -143,15 +143,15 @@ static struct cpuidle_state dedicated_states[] = {
 		.name = "snooze",
 		.desc = "snooze",
 		.flags = CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 0,
-		.target_residency = 0,
+		.power.exit_latency = 0,
+		.power.target_residency = 0,
 		.enter = &snooze_loop },
 	{ /* CEDE */
 		.name = "CEDE",
 		.desc = "CEDE",
 		.flags = CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 10,
-		.target_residency = 100,
+		.power.exit_latency = 10,
+		.power.target_residency = 100,
 		.enter = &dedicated_cede_loop },
 };
 
@@ -163,8 +163,8 @@ static struct cpuidle_state shared_states[] = {
 		.name = "Shared Cede",
 		.desc = "Shared Cede",
 		.flags = CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 0,
-		.target_residency = 0,
+		.power.exit_latency = 0,
+		.power.target_residency = 0,
 		.enter = &shared_cede_loop },
 };
 
diff --git a/drivers/cpuidle/cpuidle-ux500.c b/drivers/cpuidle/cpuidle-ux500.c
index 5e35804..3261eb2 100644
--- a/drivers/cpuidle/cpuidle-ux500.c
+++ b/drivers/cpuidle/cpuidle-ux500.c
@@ -98,13 +98,13 @@ static struct cpuidle_driver ux500_idle_driver = {
 	.states = {
 		ARM_CPUIDLE_WFI_STATE,
 		{
-			.enter		  = ux500_enter_idle,
-			.exit_latency	  = 70,
-			.target_residency = 260,
-			.flags		  = CPUIDLE_FLAG_TIME_VALID |
-			                    CPUIDLE_FLAG_TIMER_STOP,
-			.name		  = "ApIdle",
-			.desc		  = "ARM Retention",
+			.power.exit_latency	= 70,
+			.power.target_residency = 260,
+			.enter			= ux500_enter_idle,
+			.flags			= CPUIDLE_FLAG_TIME_VALID |
+						CPUIDLE_FLAG_TIMER_STOP,
+			.name			= "ApIdle",
+			.desc			= "ARM Retention",
 		},
 	},
 	.safe_state_index = 0,
diff --git a/drivers/cpuidle/cpuidle-zynq.c b/drivers/cpuidle/cpuidle-zynq.c
index aded759..dddefb8 100644
--- a/drivers/cpuidle/cpuidle-zynq.c
+++ b/drivers/cpuidle/cpuidle-zynq.c
@@ -56,9 +56,9 @@ static struct cpuidle_driver zynq_idle_driver = {
 	.states = {
 		ARM_CPUIDLE_WFI_STATE,
 		{
+			.power.exit_latency	= 10,
+			.power.target_residency = 10000,
 			.enter			= zynq_enter_idle,
-			.exit_latency		= 10,
-			.target_residency	= 10000,
 			.flags			= CPUIDLE_FLAG_TIME_VALID |
 						  CPUIDLE_FLAG_TIMER_STOP,
 			.name			= "RAM_SR",
diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index 06dbe7c..40ddd3c 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -206,9 +206,9 @@ static void poll_idle_init(struct cpuidle_driver *drv)
 
 	snprintf(state->name, CPUIDLE_NAME_LEN, "POLL");
 	snprintf(state->desc, CPUIDLE_DESC_LEN, "CPUIDLE CORE POLL IDLE");
-	state->exit_latency = 0;
-	state->target_residency = 0;
-	state->power_usage = -1;
+	state->power.exit_latency = 0;
+	state->power.target_residency = 0;
+	state->power.power_usage = -1;
 	state->flags = 0;
 	state->enter = poll_idle;
 	state->disabled = false;
diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
index 9f08e8c..4837880 100644
--- a/drivers/cpuidle/governors/ladder.c
+++ b/drivers/cpuidle/governors/ladder.c
@@ -81,7 +81,7 @@ static int ladder_select_state(struct cpuidle_driver *drv,
 
 	if (drv->states[last_idx].flags & CPUIDLE_FLAG_TIME_VALID) {
 		last_residency = cpuidle_get_last_residency(dev) - \
-					 drv->states[last_idx].exit_latency;
+			drv->states[last_idx].power.exit_latency;
 	}
 	else
 		last_residency = last_state->threshold.promotion_time + 1;
@@ -91,7 +91,7 @@ static int ladder_select_state(struct cpuidle_driver *drv,
 	    !drv->states[last_idx + 1].disabled &&
 	    !dev->states_usage[last_idx + 1].disable &&
 	    last_residency > last_state->threshold.promotion_time &&
-	    drv->states[last_idx + 1].exit_latency <= latency_req) {
+	    drv->states[last_idx + 1].power.exit_latency <= latency_req) {
 		last_state->stats.promotion_count++;
 		last_state->stats.demotion_count = 0;
 		if (last_state->stats.promotion_count >= last_state->threshold.promotion_count) {
@@ -104,11 +104,11 @@ static int ladder_select_state(struct cpuidle_driver *drv,
 	if (last_idx > CPUIDLE_DRIVER_STATE_START &&
 	    (drv->states[last_idx].disabled ||
 	    dev->states_usage[last_idx].disable ||
-	    drv->states[last_idx].exit_latency > latency_req)) {
+	    drv->states[last_idx].power.exit_latency > latency_req)) {
 		int i;
 
 		for (i = last_idx - 1; i > CPUIDLE_DRIVER_STATE_START; i--) {
-			if (drv->states[i].exit_latency <= latency_req)
+			if (drv->states[i].power.exit_latency <= latency_req)
 				break;
 		}
 		ladder_do_selection(ldev, last_idx, i);
@@ -155,9 +155,11 @@ static int ladder_enable_device(struct cpuidle_driver *drv,
 		lstate->threshold.demotion_count = DEMOTION_COUNT;
 
 		if (i < drv->state_count - 1)
-			lstate->threshold.promotion_time = state->exit_latency;
+			lstate->threshold.promotion_time =
+				state->power.exit_latency;
 		if (i > 0)
-			lstate->threshold.demotion_time = state->exit_latency;
+			lstate->threshold.demotion_time =
+				state->power.exit_latency;
 	}
 
 	return 0;
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index cf7f2f0..34bd463 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -351,15 +351,15 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 
 		if (s->disabled || su->disable)
 			continue;
-		if (s->target_residency > data->predicted_us)
+		if (s->power.target_residency > data->predicted_us)
 			continue;
-		if (s->exit_latency > latency_req)
+		if (s->power.exit_latency > latency_req)
 			continue;
-		if (s->exit_latency * multiplier > data->predicted_us)
+		if (s->power.exit_latency * multiplier > data->predicted_us)
 			continue;
 
 		data->last_state_idx = i;
-		data->exit_us = s->exit_latency;
+		data->exit_us = s->power.exit_latency;
 	}
 
 	return data->last_state_idx;
diff --git a/drivers/cpuidle/sysfs.c b/drivers/cpuidle/sysfs.c
index e918b6d..1a45541 100644
--- a/drivers/cpuidle/sysfs.c
+++ b/drivers/cpuidle/sysfs.c
@@ -252,7 +252,7 @@ static struct cpuidle_state_attr attr_##_name = __ATTR(_name, 0644, show, store)
 static ssize_t show_state_##_name(struct cpuidle_state *state, \
 			 struct cpuidle_state_usage *state_usage, char *buf) \
 { \
-	return sprintf(buf, "%u\n", state->_name);\
+	return sprintf(buf, "%u\n", state->power._name);\
 }
 
 #define define_store_state_ull_function(_name) \
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 8e1939f..4f0533e 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -128,29 +128,29 @@ static struct cpuidle_state nehalem_cstates[] = {
 		.name = "C1-NHM",
 		.desc = "MWAIT 0x00",
 		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 3,
-		.target_residency = 6,
+		.power.exit_latency = 3,
+		.power.target_residency = 6,
 		.enter = &intel_idle },
 	{
 		.name = "C1E-NHM",
 		.desc = "MWAIT 0x01",
 		.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 10,
-		.target_residency = 20,
+		.power.exit_latency = 10,
+		.power.target_residency = 20,
 		.enter = &intel_idle },
 	{
 		.name = "C3-NHM",
 		.desc = "MWAIT 0x10",
 		.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 20,
-		.target_residency = 80,
+		.power.exit_latency = 20,
+		.power.target_residency = 80,
 		.enter = &intel_idle },
 	{
 		.name = "C6-NHM",
 		.desc = "MWAIT 0x20",
 		.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 200,
-		.target_residency = 800,
+		.power.exit_latency = 200,
+		.power.target_residency = 800,
 		.enter = &intel_idle },
 	{
 		.enter = NULL }
@@ -161,36 +161,36 @@ static struct cpuidle_state snb_cstates[] = {
 		.name = "C1-SNB",
 		.desc = "MWAIT 0x00",
 		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 2,
-		.target_residency = 2,
+		.power.exit_latency = 2,
+		.power.target_residency = 2,
 		.enter = &intel_idle },
 	{
 		.name = "C1E-SNB",
 		.desc = "MWAIT 0x01",
 		.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 10,
-		.target_residency = 20,
+		.power.exit_latency = 10,
+		.power.target_residency = 20,
 		.enter = &intel_idle },
 	{
 		.name = "C3-SNB",
 		.desc = "MWAIT 0x10",
 		.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 80,
-		.target_residency = 211,
+		.power.exit_latency = 80,
+		.power.target_residency = 211,
 		.enter = &intel_idle },
 	{
 		.name = "C6-SNB",
 		.desc = "MWAIT 0x20",
 		.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 104,
-		.target_residency = 345,
+		.power.exit_latency = 104,
+		.power.target_residency = 345,
 		.enter = &intel_idle },
 	{
 		.name = "C7-SNB",
 		.desc = "MWAIT 0x30",
 		.flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 109,
-		.target_residency = 345,
+		.power.exit_latency = 109,
+		.power.target_residency = 345,
 		.enter = &intel_idle },
 	{
 		.enter = NULL }
@@ -201,36 +201,36 @@ static struct cpuidle_state ivb_cstates[] = {
 		.name = "C1-IVB",
 		.desc = "MWAIT 0x00",
 		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 1,
-		.target_residency = 1,
+		.power.exit_latency = 1,
+		.power.target_residency = 1,
 		.enter = &intel_idle },
 	{
 		.name = "C1E-IVB",
 		.desc = "MWAIT 0x01",
 		.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 10,
-		.target_residency = 20,
+		.power.exit_latency = 10,
+		.power.target_residency = 20,
 		.enter = &intel_idle },
 	{
 		.name = "C3-IVB",
 		.desc = "MWAIT 0x10",
 		.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 59,
-		.target_residency = 156,
+		.power.exit_latency = 59,
+		.power.target_residency = 156,
 		.enter = &intel_idle },
 	{
 		.name = "C6-IVB",
 		.desc = "MWAIT 0x20",
 		.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 80,
-		.target_residency = 300,
+		.power.exit_latency = 80,
+		.power.target_residency = 300,
 		.enter = &intel_idle },
 	{
 		.name = "C7-IVB",
 		.desc = "MWAIT 0x30",
 		.flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 87,
-		.target_residency = 300,
+		.power.exit_latency = 87,
+		.power.target_residency = 300,
 		.enter = &intel_idle },
 	{
 		.enter = NULL }
@@ -241,57 +241,57 @@ static struct cpuidle_state hsw_cstates[] = {
 		.name = "C1-HSW",
 		.desc = "MWAIT 0x00",
 		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 2,
-		.target_residency = 2,
+		.power.exit_latency = 2,
+		.power.target_residency = 2,
 		.enter = &intel_idle },
 	{
 		.name = "C1E-HSW",
 		.desc = "MWAIT 0x01",
 		.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 10,
-		.target_residency = 20,
+		.power.exit_latency = 10,
+		.power.target_residency = 20,
 		.enter = &intel_idle },
 	{
 		.name = "C3-HSW",
 		.desc = "MWAIT 0x10",
 		.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 33,
-		.target_residency = 100,
+		.power.exit_latency = 33,
+		.power.target_residency = 100,
 		.enter = &intel_idle },
 	{
 		.name = "C6-HSW",
 		.desc = "MWAIT 0x20",
 		.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 133,
-		.target_residency = 400,
+		.power.exit_latency = 133,
+		.power.target_residency = 400,
 		.enter = &intel_idle },
 	{
 		.name = "C7s-HSW",
 		.desc = "MWAIT 0x32",
 		.flags = MWAIT2flg(0x32) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 166,
-		.target_residency = 500,
+		.power.exit_latency = 166,
+		.power.target_residency = 500,
 		.enter = &intel_idle },
 	{
 		.name = "C8-HSW",
 		.desc = "MWAIT 0x40",
 		.flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 300,
-		.target_residency = 900,
+		.power.exit_latency = 300,
+		.power.target_residency = 900,
 		.enter = &intel_idle },
 	{
 		.name = "C9-HSW",
 		.desc = "MWAIT 0x50",
 		.flags = MWAIT2flg(0x50) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 600,
-		.target_residency = 1800,
+		.power.exit_latency = 600,
+		.power.target_residency = 1800,
 		.enter = &intel_idle },
 	{
 		.name = "C10-HSW",
 		.desc = "MWAIT 0x60",
 		.flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 2600,
-		.target_residency = 7700,
+		.power.exit_latency = 2600,
+		.power.target_residency = 7700,
 		.enter = &intel_idle },
 	{
 		.enter = NULL }
@@ -302,29 +302,29 @@ static struct cpuidle_state atom_cstates[] = {
 		.name = "C1E-ATM",
 		.desc = "MWAIT 0x00",
 		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 10,
-		.target_residency = 20,
+		.power.exit_latency = 10,
+		.power.target_residency = 20,
 		.enter = &intel_idle },
 	{
 		.name = "C2-ATM",
 		.desc = "MWAIT 0x10",
 		.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 20,
-		.target_residency = 80,
+		.power.exit_latency = 20,
+		.power.target_residency = 80,
 		.enter = &intel_idle },
 	{
 		.name = "C4-ATM",
 		.desc = "MWAIT 0x30",
 		.flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 100,
-		.target_residency = 400,
+		.power.exit_latency = 100,
+		.power.target_residency = 400,
 		.enter = &intel_idle },
 	{
 		.name = "C6-ATM",
 		.desc = "MWAIT 0x52",
 		.flags = MWAIT2flg(0x52) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 140,
-		.target_residency = 560,
+		.power.exit_latency = 140,
+		.power.target_residency = 560,
 		.enter = &intel_idle },
 	{
 		.enter = NULL }
@@ -334,15 +334,15 @@ static struct cpuidle_state avn_cstates[] = {
 		.name = "C1-AVN",
 		.desc = "MWAIT 0x00",
 		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
-		.exit_latency = 2,
-		.target_residency = 2,
+		.power.exit_latency = 2,
+		.power.target_residency = 2,
 		.enter = &intel_idle },
 	{
 		.name = "C6-AVN",
 		.desc = "MWAIT 0x51",
 		.flags = MWAIT2flg(0x51) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
-		.exit_latency = 15,
-		.target_residency = 45,
+		.power.exit_latency = 15,
+		.power.target_residency = 45,
 		.enter = &intel_idle },
 	{
 		.enter = NULL }
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index b0238cb..eb58ab3 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -35,14 +35,18 @@ struct cpuidle_state_usage {
 	unsigned long long	time; /* in US */
 };
 
+struct cpuidle_power {
+	unsigned int	exit_latency; /* in US */
+	unsigned int	target_residency; /* in US */
+	int		power_usage; /* in mW */
+};
+
 struct cpuidle_state {
 	char		name[CPUIDLE_NAME_LEN];
 	char		desc[CPUIDLE_DESC_LEN];
 
 	unsigned int	flags;
-	unsigned int	exit_latency; /* in US */
-	int		power_usage; /* in mW */
-	unsigned int	target_residency; /* in US */
+	struct cpuidle_power power;
 	bool		disabled; /* disabled on all CPUs */
 
 	int (*enter)	(struct cpuidle_device *dev,
-- 
1.7.9.5



* [RFC PATCHC 2/3] idle: store the idle state the cpu is
  2014-03-28 12:29 [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Daniel Lezcano
  2014-03-28 12:29 ` [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure Daniel Lezcano
@ 2014-03-28 12:29 ` Daniel Lezcano
  2014-04-15 12:43   ` Peter Zijlstra
  2014-03-28 12:29 ` [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu Daniel Lezcano
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-03-28 12:29 UTC (permalink / raw)
  To: linux-kernel, mingo, peterz
  Cc: rjw, nicolas.pitre, linux-pm, alex.shi, vincent.guittot,
	morten.rasmussen

When a cpu enters idle, it stores a pointer to the cpuidle power info of the
selected idle state in its struct rq. The scheduler can in turn use this
information to make a better decision when balancing a task.

As soon as the cpu exits the idle state, the pointer is reset to NULL.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 kernel/sched/idle.c  |   17 +++++++++++++++--
 kernel/sched/sched.h |    5 +++++
 2 files changed, 20 insertions(+), 2 deletions(-)
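
The publish/clear protocol described in the changelog can be sketched in
userspace roughly as follows. This is only an illustration under stated
assumptions, not kernel code: struct toy_rq and the toy_*() helpers are
hypothetical names, C11 release/acquire atomics stand in for the plain store
followed by wmb() used in the patch, and a cpu that is not idle is reported
here as having a zero exit latency.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Same fields as the struct cpuidle_power introduced in patch 1/3. */
struct cpuidle_power {
	unsigned int exit_latency;     /* in us */
	unsigned int target_residency; /* in us */
	int power_usage;               /* in mW */
};

/* Hypothetical stand-in for struct rq: only the new pointer is modelled. */
struct toy_rq {
	_Atomic(struct cpuidle_power *) power; /* NULL while the cpu is not idle */
};

/* Publish the idle state right before the cpu enters it. */
static void toy_idle_enter(struct toy_rq *rq, struct cpuidle_power *state)
{
	assert(rq != NULL && state != NULL);
	atomic_store_explicit(&rq->power, state, memory_order_release);
}

/* Reset the pointer as soon as the cpu leaves the idle state. */
static void toy_idle_exit(struct toy_rq *rq)
{
	assert(rq != NULL);
	atomic_store_explicit(&rq->power, NULL, memory_order_release);
}

/* Remote reader: load the pointer once, then test it before dereferencing. */
static unsigned int toy_exit_latency(struct toy_rq *rq)
{
	struct cpuidle_power *p;

	assert(rq != NULL);
	p = atomic_load_explicit(&rq->power, memory_order_acquire);
	return p ? p->exit_latency : 0; /* assumption: 0 when not idle */
}
```

The reader copies the pointer into a local variable before testing it, so a
concurrent idle exit cannot turn a successful NULL check into a NULL
dereference.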

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 8f4390a..5c32c11 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -12,6 +12,8 @@
 
 #include <trace/events/power.h>
 
+#include "sched.h"
+
 static int __read_mostly cpu_idle_force_poll;
 
 void cpu_idle_poll_ctrl(bool enable)
@@ -69,7 +71,7 @@ void __weak arch_cpu_idle(void)
  * NOTE: no locks or semaphores should be used here
  * return non-zero on failure
  */
-static int cpuidle_idle_call(void)
+static int cpuidle_idle_call(struct cpuidle_power **power)
 {
 	struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
 	struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
@@ -143,6 +145,10 @@ static int cpuidle_idle_call(void)
 			if (!ret) {
 				trace_cpu_idle_rcuidle(next_state, dev->cpu);
 
+				*power = &drv->states[next_state].power;
+
+				wmb();
+
 				/*
 				 * Enter the idle state previously
 				 * returned by the governor
@@ -154,6 +160,10 @@ static int cpuidle_idle_call(void)
 				entered_state = cpuidle_enter(drv, dev,
 							      next_state);
 
+				*power = NULL;
+
+				wmb();
+
 				trace_cpu_idle_rcuidle(PWR_EVENT_EXIT,
 						       dev->cpu);
 
@@ -198,6 +208,9 @@ static int cpuidle_idle_call(void)
  */
 static void cpu_idle_loop(void)
 {
+	struct rq *rq = this_rq();
+	struct cpuidle_power **power = &rq->power;
+
 	while (1) {
 		tick_nohz_idle_enter();
 
@@ -223,7 +236,7 @@ static void cpu_idle_loop(void)
 			if (cpu_idle_force_poll || tick_check_broadcast_expired())
 				cpu_idle_poll();
 			else
-				cpuidle_idle_call();
+				cpuidle_idle_call(power);
 
 			arch_cpu_idle_exit();
 		}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1929deb..1bcac35 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -14,6 +14,7 @@
 #include "cpuacct.h"
 
 struct rq;
+struct cpuidle_power;
 
 extern __read_mostly int scheduler_running;
 
@@ -632,6 +633,10 @@ struct rq {
 #ifdef CONFIG_SMP
 	struct llist_head wake_list;
 #endif
+
+#ifdef CONFIG_CPU_IDLE
+	struct cpuidle_power *power;
+#endif
 };
 
 static inline int cpu_of(struct rq *rq)
-- 
1.7.9.5



* [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
  2014-03-28 12:29 [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Daniel Lezcano
  2014-03-28 12:29 ` [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure Daniel Lezcano
  2014-03-28 12:29 ` [RFC PATCHC 2/3] idle: store the idle state the cpu is Daniel Lezcano
@ 2014-03-28 12:29 ` Daniel Lezcano
  2014-04-02  3:05   ` Nicolas Pitre
  2014-04-15 13:03   ` Peter Zijlstra
  2014-03-31 13:52 ` [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Vincent Guittot
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 47+ messages in thread
From: Daniel Lezcano @ 2014-03-28 12:29 UTC (permalink / raw)
  To: linux-kernel, mingo, peterz
  Cc: rjw, nicolas.pitre, linux-pm, alex.shi, vincent.guittot,
	morten.rasmussen

As we know which idle state the cpu is in, we can look at the following:

1. When did the cpu enter the idle state? The longer the cpu has been idle,
the deeper its idle state.
2. What is the exit latency? The greater the exit latency, the deeper the
idle state.

With both pieces of information, when all cpus are idle, we can choose the
idlest cpu.

When at least one cpu is not idle, the old check against the weighted load
applies.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 kernel/sched/fair.c |   46 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 40 insertions(+), 6 deletions(-)
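
For illustration, the comparison among idle cpus can be modelled in userspace
as below. This is a sketch under assumptions and not the logic of the diff
that follows: struct toy_cpu and toy_find_shallowest_idle() are hypothetical
names, the ordering of the two criteria (smallest exit latency first, most
recent idle entry as tie-breaker) is one possible choice, and busy cpus are
simply skipped rather than compared by weighted load.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cpu snapshot; the kernel reads this from struct rq. */
struct toy_cpu {
	int idle;                  /* non-zero if the cpu is idle */
	unsigned int exit_latency; /* in us, meaningful only when idle */
	uint64_t idle_stamp;       /* time of idle entry; larger = more recent */
};

/*
 * Return the index of the idle cpu that is cheapest to wake up:
 * smallest exit latency first, most recent idle entry as tie-breaker.
 * Returns -1 when no cpu is idle.
 */
static int toy_find_shallowest_idle(const struct toy_cpu *cpus, int nr_cpus)
{
	unsigned int min_exit_latency = UINT_MAX;
	uint64_t latest_idle_stamp = 0;
	int best = -1;
	int i;

	assert(cpus != NULL || nr_cpus == 0);

	for (i = 0; i < nr_cpus; i++) {
		if (!cpus[i].idle)
			continue;

		if (best == -1 ||
		    cpus[i].exit_latency < min_exit_latency ||
		    (cpus[i].exit_latency == min_exit_latency &&
		     cpus[i].idle_stamp > latest_idle_stamp)) {
			min_exit_latency = cpus[i].exit_latency;
			latest_idle_stamp = cpus[i].idle_stamp;
			best = i;
		}
	}

	return best;
}
```

A caller would fall back to the weighted-load comparison when the function
returns -1, that is, when none of the candidate cpus is idle.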

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 16042b5..068e503 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -23,6 +23,7 @@
 #include <linux/latencytop.h>
 #include <linux/sched.h>
 #include <linux/cpumask.h>
+#include <linux/cpuidle.h>
 #include <linux/slab.h>
 #include <linux/profile.h>
 #include <linux/interrupt.h>
@@ -4336,20 +4337,53 @@ static int
 find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 {
 	unsigned long load, min_load = ULONG_MAX;
-	int idlest = -1;
+	unsigned int min_exit_latency = UINT_MAX;
+	u64 idle_stamp, min_idle_stamp = ULLONG_MAX;
+
+	struct rq *rq;
+	struct cpuidle_power *power;
+
+	int cpu_idle = -1;
+	int cpu_busy = -1;
 	int i;
 
 	/* Traverse only the allowed CPUs */
 	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
-		load = weighted_cpuload(i);
 
-		if (load < min_load || (load == min_load && i == this_cpu)) {
-			min_load = load;
-			idlest = i;
+		if (idle_cpu(i)) {
+
+			rq = cpu_rq(i);
+			power = rq->power;
+			idle_stamp = rq->idle_stamp;
+
+			/* The cpu is idle since a shorter time */
+			if (idle_stamp < min_idle_stamp) {
+				min_idle_stamp = idle_stamp;
+				cpu_idle = i;
+				continue;
+			}
+
+			/* The cpu is idle but the exit_latency is shorter */
+			if (power && power->exit_latency < min_exit_latency) {
+				min_exit_latency = power->exit_latency;
+				cpu_idle = i;
+				continue;
+			}
+		} else {
+
+			load = weighted_cpuload(i);
+
+			if (load < min_load ||
+			    (load == min_load && i == this_cpu)) {
+				min_load = load;
+				cpu_busy = i;
+				continue;
+			}
 		}
 	}
 
-	return idlest;
+	/* Busy cpus are considered less idle than idle cpus ;) */
+	return cpu_busy != -1 ? cpu_busy : cpu_idle;
 }
 
 /*
-- 
1.7.9.5



* Re: [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure
  2014-03-28 12:29 ` [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure Daniel Lezcano
@ 2014-03-28 18:17   ` Nicolas Pitre
  2014-03-28 20:42     ` Daniel Lezcano
  0 siblings, 1 reply; 47+ messages in thread
From: Nicolas Pitre @ 2014-03-28 18:17 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linux-kernel, mingo, peterz, rjw, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Fri, 28 Mar 2014, Daniel Lezcano wrote:

> The scheduler needs some information from cpuidle to know the timing for a
> specific idle state a cpu is.
> 
> This patch creates a separate structure to group the cpuidle power info in
> order to share it with the scheduler. It improves the encapsulation of the
> code.

Having cpuidle_power as a structure name, or worse, 'power' as a struct
member, is a really bad choice.  Amongst the fields this struct
contains, only 1 out of 3 is about power.  The word "power" is already
abused quite significantly to mean too many different things.

I'd suggest something inspired by your own patch log message, i.e.
'struct cpuidle_info' instead, and use 'info' as a field name within
struct cpuidle_state.  Having 'params' instead of 'info' could be a good
alternative too, although slightly longer.

And with struct rq in patch 2/3 I'd simply use:

	struct cpuidle_info *cpuidle;

This way you'll have rq->cpuidle->exit_latency to refer to from the 
scheduler context which is IMHO much more self explanatory.
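
To make the naming concrete, here is a small userspace sketch of how the
suggested layout might look.  It is only an illustration: the toy_*
types are hypothetical stand-ins for struct cpuidle_state and struct rq,
the name buffer size is arbitrary, and the helper returning 0 for a
non-idle cpu is an assumption, not something the patch defines.

```c
#include <assert.h>
#include <stddef.h>

/* Suggested name for what the patch calls struct cpuidle_power. */
struct cpuidle_info {
	unsigned int exit_latency;     /* in us */
	unsigned int target_residency; /* in us */
	int power_usage;               /* in mW */
};

/* Hypothetical, trimmed-down stand-in for struct cpuidle_state. */
struct toy_cpuidle_state {
	char name[16];
	struct cpuidle_info info;      /* was: struct cpuidle_power power */
};

/* Hypothetical stand-in for struct rq with the suggested member name. */
struct toy_rq {
	struct cpuidle_info *cpuidle;  /* NULL while the cpu is not idle */
};

/* Reads as rq->cpuidle->exit_latency from the scheduler side. */
static unsigned int toy_rq_exit_latency(const struct toy_rq *rq)
{
	assert(rq != NULL);
	return rq->cpuidle ? rq->cpuidle->exit_latency : 0;
}
```

With this layout a driver table entry would use .info.exit_latency
where the patch currently has .power.exit_latency.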

> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---
>  arch/arm/include/asm/cpuidle.h       |    6 +-
>  arch/arm/mach-exynos/cpuidle.c       |    4 +-
>  drivers/acpi/processor_idle.c        |    4 +-
>  drivers/base/power/domain.c          |    6 +-
>  drivers/cpuidle/cpuidle-at91.c       |    4 +-
>  drivers/cpuidle/cpuidle-big_little.c |    9 +--
>  drivers/cpuidle/cpuidle-calxeda.c    |    6 +-
>  drivers/cpuidle/cpuidle-kirkwood.c   |    4 +-
>  drivers/cpuidle/cpuidle-powernv.c    |    8 +--
>  drivers/cpuidle/cpuidle-pseries.c    |   12 ++--
>  drivers/cpuidle/cpuidle-ux500.c      |   14 ++---
>  drivers/cpuidle/cpuidle-zynq.c       |    4 +-
>  drivers/cpuidle/driver.c             |    6 +-
>  drivers/cpuidle/governors/ladder.c   |   14 +++--
>  drivers/cpuidle/governors/menu.c     |    8 +--
>  drivers/cpuidle/sysfs.c              |    2 +-
>  drivers/idle/intel_idle.c            |  112 +++++++++++++++++-----------------
>  include/linux/cpuidle.h              |   10 ++-
>  18 files changed, 120 insertions(+), 113 deletions(-)
> 
> diff --git a/arch/arm/include/asm/cpuidle.h b/arch/arm/include/asm/cpuidle.h
> index 2fca60a..987ee53 100644
> --- a/arch/arm/include/asm/cpuidle.h
> +++ b/arch/arm/include/asm/cpuidle.h
> @@ -12,9 +12,9 @@ static inline int arm_cpuidle_simple_enter(struct cpuidle_device *dev,
>  /* Common ARM WFI state */
>  #define ARM_CPUIDLE_WFI_STATE_PWR(p) {\
>  	.enter                  = arm_cpuidle_simple_enter,\
> -	.exit_latency           = 1,\
> -	.target_residency       = 1,\
> -	.power_usage		= p,\
> +	.power.exit_latency     = 1,\
> +	.power.target_residency = 1,\
> +	.power.power_usage	= p,\
>  	.flags                  = CPUIDLE_FLAG_TIME_VALID,\
>  	.name                   = "WFI",\
>  	.desc                   = "ARM WFI",\
> diff --git a/arch/arm/mach-exynos/cpuidle.c b/arch/arm/mach-exynos/cpuidle.c
> index f57cb91..f6275cb 100644
> --- a/arch/arm/mach-exynos/cpuidle.c
> +++ b/arch/arm/mach-exynos/cpuidle.c
> @@ -73,8 +73,8 @@ static struct cpuidle_driver exynos4_idle_driver = {
>  		[0] = ARM_CPUIDLE_WFI_STATE,
>  		[1] = {
>  			.enter			= exynos4_enter_lowpower,
> -			.exit_latency		= 300,
> -			.target_residency	= 100000,
> +			.power.exit_latency	= 300,
> +			.power.target_residency = 100000,
>  			.flags			= CPUIDLE_FLAG_TIME_VALID,
>  			.name			= "C1",
>  			.desc			= "ARM power down",
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index 3dca36d..05fa991 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -979,8 +979,8 @@ static int acpi_processor_setup_cpuidle_states(struct acpi_processor *pr)
>  		state = &drv->states[count];
>  		snprintf(state->name, CPUIDLE_NAME_LEN, "C%d", i);
>  		strncpy(state->desc, cx->desc, CPUIDLE_DESC_LEN);
> -		state->exit_latency = cx->latency;
> -		state->target_residency = cx->latency * latency_factor;
> +		state->power.exit_latency = cx->latency;
> +		state->power.target_residency = cx->latency * latency_factor;
>  
>  		state->flags = 0;
>  		switch (cx->type) {
> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> index bfb8955..6bcb1e8 100644
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -154,7 +154,7 @@ static void genpd_recalc_cpu_exit_latency(struct generic_pm_domain *genpd)
>  	usecs64 = genpd->power_on_latency_ns;
>  	do_div(usecs64, NSEC_PER_USEC);
>  	usecs64 += genpd->cpu_data->saved_exit_latency;
> -	genpd->cpu_data->idle_state->exit_latency = usecs64;
> +	genpd->cpu_data->idle_state->power.exit_latency = usecs64;
>  }
>  
>  /**
> @@ -1882,7 +1882,7 @@ int pm_genpd_attach_cpuidle(struct generic_pm_domain *genpd, int state)
>  		goto err;
>  	}
>  	cpu_data->idle_state = idle_state;
> -	cpu_data->saved_exit_latency = idle_state->exit_latency;
> +	cpu_data->saved_exit_latency = idle_state->power.exit_latency;
>  	genpd->cpu_data = cpu_data;
>  	genpd_recalc_cpu_exit_latency(genpd);
>  
> @@ -1936,7 +1936,7 @@ int pm_genpd_detach_cpuidle(struct generic_pm_domain *genpd)
>  		ret = -EAGAIN;
>  		goto out;
>  	}
> -	idle_state->exit_latency = cpu_data->saved_exit_latency;
> +	idle_state->power.exit_latency = cpu_data->saved_exit_latency;
>  	cpuidle_driver_unref();
>  	genpd->cpu_data = NULL;
>  	kfree(cpu_data);
> diff --git a/drivers/cpuidle/cpuidle-at91.c b/drivers/cpuidle/cpuidle-at91.c
> index a077437..48c7063 100644
> --- a/drivers/cpuidle/cpuidle-at91.c
> +++ b/drivers/cpuidle/cpuidle-at91.c
> @@ -40,9 +40,9 @@ static struct cpuidle_driver at91_idle_driver = {
>  	.owner			= THIS_MODULE,
>  	.states[0]		= ARM_CPUIDLE_WFI_STATE,
>  	.states[1]		= {
> +		.power.exit_latency	= 10,
> +		.power.target_residency = 10000,
>  		.enter			= at91_enter_idle,
> -		.exit_latency		= 10,
> -		.target_residency	= 10000,
>  		.flags			= CPUIDLE_FLAG_TIME_VALID,
>  		.name			= "RAM_SR",
>  		.desc			= "WFI and DDR Self Refresh",
> diff --git a/drivers/cpuidle/cpuidle-big_little.c b/drivers/cpuidle/cpuidle-big_little.c
> index b45fc62..5a0af4b 100644
> --- a/drivers/cpuidle/cpuidle-big_little.c
> +++ b/drivers/cpuidle/cpuidle-big_little.c
> @@ -62,9 +62,9 @@ static struct cpuidle_driver bl_idle_little_driver = {
>  	.owner = THIS_MODULE,
>  	.states[0] = ARM_CPUIDLE_WFI_STATE,
>  	.states[1] = {
> +		.power.exit_latency	= 700,
> +		.power.target_residency = 2500,
>  		.enter			= bl_enter_powerdown,
> -		.exit_latency		= 700,
> -		.target_residency	= 2500,
>  		.flags			= CPUIDLE_FLAG_TIME_VALID |
>  					  CPUIDLE_FLAG_TIMER_STOP,
>  		.name			= "C1",
> @@ -78,9 +78,10 @@ static struct cpuidle_driver bl_idle_big_driver = {
>  	.owner = THIS_MODULE,
>  	.states[0] = ARM_CPUIDLE_WFI_STATE,
>  	.states[1] = {
> +
> +		.power.exit_latency	= 500,
> +		.power.target_residency = 2000,
>  		.enter			= bl_enter_powerdown,
> -		.exit_latency		= 500,
> -		.target_residency	= 2000,
>  		.flags			= CPUIDLE_FLAG_TIME_VALID |
>  					  CPUIDLE_FLAG_TIMER_STOP,
>  		.name			= "C1",
> diff --git a/drivers/cpuidle/cpuidle-calxeda.c b/drivers/cpuidle/cpuidle-calxeda.c
> index 6e51114..8357a20 100644
> --- a/drivers/cpuidle/cpuidle-calxeda.c
> +++ b/drivers/cpuidle/cpuidle-calxeda.c
> @@ -56,9 +56,9 @@ static struct cpuidle_driver calxeda_idle_driver = {
>  			.name = "PG",
>  			.desc = "Power Gate",
>  			.flags = CPUIDLE_FLAG_TIME_VALID,
> -			.exit_latency = 30,
> -			.power_usage = 50,
> -			.target_residency = 200,
> +			.power.exit_latency = 30,
> +			.power.power_usage = 50,
> +			.power.target_residency = 200,
>  			.enter = calxeda_pwrdown_idle,
>  		},
>  	},
> diff --git a/drivers/cpuidle/cpuidle-kirkwood.c b/drivers/cpuidle/cpuidle-kirkwood.c
> index 41ba843..0ae4138 100644
> --- a/drivers/cpuidle/cpuidle-kirkwood.c
> +++ b/drivers/cpuidle/cpuidle-kirkwood.c
> @@ -44,9 +44,9 @@ static struct cpuidle_driver kirkwood_idle_driver = {
>  	.owner			= THIS_MODULE,
>  	.states[0]		= ARM_CPUIDLE_WFI_STATE,
>  	.states[1]		= {
> +		.power.exit_latency	= 10,
> +		.power.target_residency = 100000,
>  		.enter			= kirkwood_enter_idle,
> -		.exit_latency		= 10,
> -		.target_residency	= 100000,
>  		.flags			= CPUIDLE_FLAG_TIME_VALID,
>  		.name			= "DDR SR",
>  		.desc			= "WFI and DDR Self Refresh",
> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> index f48607c..c47cc02 100644
> --- a/drivers/cpuidle/cpuidle-powernv.c
> +++ b/drivers/cpuidle/cpuidle-powernv.c
> @@ -62,15 +62,15 @@ static struct cpuidle_state powernv_states[] = {
>  		.name = "snooze",
>  		.desc = "snooze",
>  		.flags = CPUIDLE_FLAG_TIME_VALID,
> -		.exit_latency = 0,
> -		.target_residency = 0,
> +		.power.exit_latency = 0,
> +		.power.target_residency = 0,
>  		.enter = &snooze_loop },
>  	{ /* NAP */
>  		.name = "NAP",
>  		.desc = "NAP",
>  		.flags = CPUIDLE_FLAG_TIME_VALID,
> -		.exit_latency = 10,
> -		.target_residency = 100,
> +		.power.exit_latency = 10,
> +		.power.target_residency = 100,
>  		.enter = &nap_loop },
>  };
>  
> diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
> index 6f7b019..483d7e7 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -143,15 +143,15 @@ static struct cpuidle_state dedicated_states[] = {
>  		.name = "snooze",
>  		.desc = "snooze",
>  		.flags = CPUIDLE_FLAG_TIME_VALID,
> -		.exit_latency = 0,
> -		.target_residency = 0,
> +		.power.exit_latency = 0,
> +		.power.target_residency = 0,
>  		.enter = &snooze_loop },
>  	{ /* CEDE */
>  		.name = "CEDE",
>  		.desc = "CEDE",
>  		.flags = CPUIDLE_FLAG_TIME_VALID,
> -		.exit_latency = 10,
> -		.target_residency = 100,
> +		.power.exit_latency = 10,
> +		.power.target_residency = 100,
>  		.enter = &dedicated_cede_loop },
>  };
>  
> @@ -163,8 +163,8 @@ static struct cpuidle_state shared_states[] = {
>  		.name = "Shared Cede",
>  		.desc = "Shared Cede",
>  		.flags = CPUIDLE_FLAG_TIME_VALID,
> -		.exit_latency = 0,
> -		.target_residency = 0,
> +		.power.exit_latency = 0,
> +		.power.target_residency = 0,
>  		.enter = &shared_cede_loop },
>  };
>  
> diff --git a/drivers/cpuidle/cpuidle-ux500.c b/drivers/cpuidle/cpuidle-ux500.c
> index 5e35804..3261eb2 100644
> --- a/drivers/cpuidle/cpuidle-ux500.c
> +++ b/drivers/cpuidle/cpuidle-ux500.c
> @@ -98,13 +98,13 @@ static struct cpuidle_driver ux500_idle_driver = {
>  	.states = {
>  		ARM_CPUIDLE_WFI_STATE,
>  		{
> -			.enter		  = ux500_enter_idle,
> -			.exit_latency	  = 70,
> -			.target_residency = 260,
> -			.flags		  = CPUIDLE_FLAG_TIME_VALID |
> -			                    CPUIDLE_FLAG_TIMER_STOP,
> -			.name		  = "ApIdle",
> -			.desc		  = "ARM Retention",
> +			.power.exit_latency	= 70,
> +			.power.target_residency = 260,
> +			.enter			= ux500_enter_idle,
> +			.flags			= CPUIDLE_FLAG_TIME_VALID |
> +						CPUIDLE_FLAG_TIMER_STOP,
> +			.name			= "ApIdle",
> +			.desc			= "ARM Retention",
>  		},
>  	},
>  	.safe_state_index = 0,
> diff --git a/drivers/cpuidle/cpuidle-zynq.c b/drivers/cpuidle/cpuidle-zynq.c
> index aded759..dddefb8 100644
> --- a/drivers/cpuidle/cpuidle-zynq.c
> +++ b/drivers/cpuidle/cpuidle-zynq.c
> @@ -56,9 +56,9 @@ static struct cpuidle_driver zynq_idle_driver = {
>  	.states = {
>  		ARM_CPUIDLE_WFI_STATE,
>  		{
> +			.power.exit_latency	= 10,
> +			.power.target_residency = 10000,
>  			.enter			= zynq_enter_idle,
> -			.exit_latency		= 10,
> -			.target_residency	= 10000,
>  			.flags			= CPUIDLE_FLAG_TIME_VALID |
>  						  CPUIDLE_FLAG_TIMER_STOP,
>  			.name			= "RAM_SR",
> diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
> index 06dbe7c..40ddd3c 100644
> --- a/drivers/cpuidle/driver.c
> +++ b/drivers/cpuidle/driver.c
> @@ -206,9 +206,9 @@ static void poll_idle_init(struct cpuidle_driver *drv)
>  
>  	snprintf(state->name, CPUIDLE_NAME_LEN, "POLL");
>  	snprintf(state->desc, CPUIDLE_DESC_LEN, "CPUIDLE CORE POLL IDLE");
> -	state->exit_latency = 0;
> -	state->target_residency = 0;
> -	state->power_usage = -1;
> +	state->power.exit_latency = 0;
> +	state->power.target_residency = 0;
> +	state->power.power_usage = -1;
>  	state->flags = 0;
>  	state->enter = poll_idle;
>  	state->disabled = false;
> diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
> index 9f08e8c..4837880 100644
> --- a/drivers/cpuidle/governors/ladder.c
> +++ b/drivers/cpuidle/governors/ladder.c
> @@ -81,7 +81,7 @@ static int ladder_select_state(struct cpuidle_driver *drv,
>  
>  	if (drv->states[last_idx].flags & CPUIDLE_FLAG_TIME_VALID) {
>  		last_residency = cpuidle_get_last_residency(dev) - \
> -					 drv->states[last_idx].exit_latency;
> +			drv->states[last_idx].power.exit_latency;
>  	}
>  	else
>  		last_residency = last_state->threshold.promotion_time + 1;
> @@ -91,7 +91,7 @@ static int ladder_select_state(struct cpuidle_driver *drv,
>  	    !drv->states[last_idx + 1].disabled &&
>  	    !dev->states_usage[last_idx + 1].disable &&
>  	    last_residency > last_state->threshold.promotion_time &&
> -	    drv->states[last_idx + 1].exit_latency <= latency_req) {
> +	    drv->states[last_idx + 1].power.exit_latency <= latency_req) {
>  		last_state->stats.promotion_count++;
>  		last_state->stats.demotion_count = 0;
>  		if (last_state->stats.promotion_count >= last_state->threshold.promotion_count) {
> @@ -104,11 +104,11 @@ static int ladder_select_state(struct cpuidle_driver *drv,
>  	if (last_idx > CPUIDLE_DRIVER_STATE_START &&
>  	    (drv->states[last_idx].disabled ||
>  	    dev->states_usage[last_idx].disable ||
> -	    drv->states[last_idx].exit_latency > latency_req)) {
> +	    drv->states[last_idx].power.exit_latency > latency_req)) {
>  		int i;
>  
>  		for (i = last_idx - 1; i > CPUIDLE_DRIVER_STATE_START; i--) {
> -			if (drv->states[i].exit_latency <= latency_req)
> +			if (drv->states[i].power.exit_latency <= latency_req)
>  				break;
>  		}
>  		ladder_do_selection(ldev, last_idx, i);
> @@ -155,9 +155,11 @@ static int ladder_enable_device(struct cpuidle_driver *drv,
>  		lstate->threshold.demotion_count = DEMOTION_COUNT;
>  
>  		if (i < drv->state_count - 1)
> -			lstate->threshold.promotion_time = state->exit_latency;
> +			lstate->threshold.promotion_time =
> +				state->power.exit_latency;
>  		if (i > 0)
> -			lstate->threshold.demotion_time = state->exit_latency;
> +			lstate->threshold.demotion_time =
> +				state->power.exit_latency;
>  	}
>  
>  	return 0;
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index cf7f2f0..34bd463 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -351,15 +351,15 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>  
>  		if (s->disabled || su->disable)
>  			continue;
> -		if (s->target_residency > data->predicted_us)
> +		if (s->power.target_residency > data->predicted_us)
>  			continue;
> -		if (s->exit_latency > latency_req)
> +		if (s->power.exit_latency > latency_req)
>  			continue;
> -		if (s->exit_latency * multiplier > data->predicted_us)
> +		if (s->power.exit_latency * multiplier > data->predicted_us)
>  			continue;
>  
>  		data->last_state_idx = i;
> -		data->exit_us = s->exit_latency;
> +		data->exit_us = s->power.exit_latency;
>  	}
>  
>  	return data->last_state_idx;
> diff --git a/drivers/cpuidle/sysfs.c b/drivers/cpuidle/sysfs.c
> index e918b6d..1a45541 100644
> --- a/drivers/cpuidle/sysfs.c
> +++ b/drivers/cpuidle/sysfs.c
> @@ -252,7 +252,7 @@ static struct cpuidle_state_attr attr_##_name = __ATTR(_name, 0644, show, store)
>  static ssize_t show_state_##_name(struct cpuidle_state *state, \
>  			 struct cpuidle_state_usage *state_usage, char *buf) \
>  { \
> -	return sprintf(buf, "%u\n", state->_name);\
> +	return sprintf(buf, "%u\n", state->power._name);\
>  }
>  
>  #define define_store_state_ull_function(_name) \
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index 8e1939f..4f0533e 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -128,29 +128,29 @@ static struct cpuidle_state nehalem_cstates[] = {
>  		.name = "C1-NHM",
>  		.desc = "MWAIT 0x00",
>  		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
> -		.exit_latency = 3,
> -		.target_residency = 6,
> +		.power.exit_latency = 3,
> +		.power.target_residency = 6,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C1E-NHM",
>  		.desc = "MWAIT 0x01",
>  		.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
> -		.exit_latency = 10,
> -		.target_residency = 20,
> +		.power.exit_latency = 10,
> +		.power.target_residency = 20,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C3-NHM",
>  		.desc = "MWAIT 0x10",
>  		.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> -		.exit_latency = 20,
> -		.target_residency = 80,
> +		.power.exit_latency = 20,
> +		.power.target_residency = 80,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C6-NHM",
>  		.desc = "MWAIT 0x20",
>  		.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> -		.exit_latency = 200,
> -		.target_residency = 800,
> +		.power.exit_latency = 200,
> +		.power.target_residency = 800,
>  		.enter = &intel_idle },
>  	{
>  		.enter = NULL }
> @@ -161,36 +161,36 @@ static struct cpuidle_state snb_cstates[] = {
>  		.name = "C1-SNB",
>  		.desc = "MWAIT 0x00",
>  		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
> -		.exit_latency = 2,
> -		.target_residency = 2,
> +		.power.exit_latency = 2,
> +		.power.target_residency = 2,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C1E-SNB",
>  		.desc = "MWAIT 0x01",
>  		.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
> -		.exit_latency = 10,
> -		.target_residency = 20,
> +		.power.exit_latency = 10,
> +		.power.target_residency = 20,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C3-SNB",
>  		.desc = "MWAIT 0x10",
>  		.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> -		.exit_latency = 80,
> -		.target_residency = 211,
> +		.power.exit_latency = 80,
> +		.power.target_residency = 211,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C6-SNB",
>  		.desc = "MWAIT 0x20",
>  		.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> -		.exit_latency = 104,
> -		.target_residency = 345,
> +		.power.exit_latency = 104,
> +		.power.target_residency = 345,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C7-SNB",
>  		.desc = "MWAIT 0x30",
>  		.flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> -		.exit_latency = 109,
> -		.target_residency = 345,
> +		.power.exit_latency = 109,
> +		.power.target_residency = 345,
>  		.enter = &intel_idle },
>  	{
>  		.enter = NULL }
> @@ -201,36 +201,36 @@ static struct cpuidle_state ivb_cstates[] = {
>  		.name = "C1-IVB",
>  		.desc = "MWAIT 0x00",
>  		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
> -		.exit_latency = 1,
> -		.target_residency = 1,
> +		.power.exit_latency = 1,
> +		.power.target_residency = 1,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C1E-IVB",
>  		.desc = "MWAIT 0x01",
>  		.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
> -		.exit_latency = 10,
> -		.target_residency = 20,
> +		.power.exit_latency = 10,
> +		.power.target_residency = 20,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C3-IVB",
>  		.desc = "MWAIT 0x10",
>  		.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> -		.exit_latency = 59,
> -		.target_residency = 156,
> +		.power.exit_latency = 59,
> +		.power.target_residency = 156,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C6-IVB",
>  		.desc = "MWAIT 0x20",
>  		.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> -		.exit_latency = 80,
> -		.target_residency = 300,
> +		.power.exit_latency = 80,
> +		.power.target_residency = 300,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C7-IVB",
>  		.desc = "MWAIT 0x30",
>  		.flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> -		.exit_latency = 87,
> -		.target_residency = 300,
> +		.power.exit_latency = 87,
> +		.power.target_residency = 300,
>  		.enter = &intel_idle },
>  	{
>  		.enter = NULL }
> @@ -241,57 +241,57 @@ static struct cpuidle_state hsw_cstates[] = {
>  		.name = "C1-HSW",
>  		.desc = "MWAIT 0x00",
>  		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
> -		.exit_latency = 2,
> -		.target_residency = 2,
> +		.power.exit_latency = 2,
> +		.power.target_residency = 2,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C1E-HSW",
>  		.desc = "MWAIT 0x01",
>  		.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
> -		.exit_latency = 10,
> -		.target_residency = 20,
> +		.power.exit_latency = 10,
> +		.power.target_residency = 20,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C3-HSW",
>  		.desc = "MWAIT 0x10",
>  		.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> -		.exit_latency = 33,
> -		.target_residency = 100,
> +		.power.exit_latency = 33,
> +		.power.target_residency = 100,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C6-HSW",
>  		.desc = "MWAIT 0x20",
>  		.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> -		.exit_latency = 133,
> -		.target_residency = 400,
> +		.power.exit_latency = 133,
> +		.power.target_residency = 400,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C7s-HSW",
>  		.desc = "MWAIT 0x32",
>  		.flags = MWAIT2flg(0x32) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> -		.exit_latency = 166,
> -		.target_residency = 500,
> +		.power.exit_latency = 166,
> +		.power.target_residency = 500,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C8-HSW",
>  		.desc = "MWAIT 0x40",
>  		.flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> -		.exit_latency = 300,
> -		.target_residency = 900,
> +		.power.exit_latency = 300,
> +		.power.target_residency = 900,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C9-HSW",
>  		.desc = "MWAIT 0x50",
>  		.flags = MWAIT2flg(0x50) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> -		.exit_latency = 600,
> -		.target_residency = 1800,
> +		.power.exit_latency = 600,
> +		.power.target_residency = 1800,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C10-HSW",
>  		.desc = "MWAIT 0x60",
>  		.flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> -		.exit_latency = 2600,
> -		.target_residency = 7700,
> +		.power.exit_latency = 2600,
> +		.power.target_residency = 7700,
>  		.enter = &intel_idle },
>  	{
>  		.enter = NULL }
> @@ -302,29 +302,29 @@ static struct cpuidle_state atom_cstates[] = {
>  		.name = "C1E-ATM",
>  		.desc = "MWAIT 0x00",
>  		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
> -		.exit_latency = 10,
> -		.target_residency = 20,
> +		.power.exit_latency = 10,
> +		.power.target_residency = 20,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C2-ATM",
>  		.desc = "MWAIT 0x10",
>  		.flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID,
> -		.exit_latency = 20,
> -		.target_residency = 80,
> +		.power.exit_latency = 20,
> +		.power.target_residency = 80,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C4-ATM",
>  		.desc = "MWAIT 0x30",
>  		.flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> -		.exit_latency = 100,
> -		.target_residency = 400,
> +		.power.exit_latency = 100,
> +		.power.target_residency = 400,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C6-ATM",
>  		.desc = "MWAIT 0x52",
>  		.flags = MWAIT2flg(0x52) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> -		.exit_latency = 140,
> -		.target_residency = 560,
> +		.power.exit_latency = 140,
> +		.power.target_residency = 560,
>  		.enter = &intel_idle },
>  	{
>  		.enter = NULL }
> @@ -334,15 +334,15 @@ static struct cpuidle_state avn_cstates[] = {
>  		.name = "C1-AVN",
>  		.desc = "MWAIT 0x00",
>  		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
> -		.exit_latency = 2,
> -		.target_residency = 2,
> +		.power.exit_latency = 2,
> +		.power.target_residency = 2,
>  		.enter = &intel_idle },
>  	{
>  		.name = "C6-AVN",
>  		.desc = "MWAIT 0x51",
>  		.flags = MWAIT2flg(0x51) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
> -		.exit_latency = 15,
> -		.target_residency = 45,
> +		.power.exit_latency = 15,
> +		.power.target_residency = 45,
>  		.enter = &intel_idle },
>  	{
>  		.enter = NULL }
> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> index b0238cb..eb58ab3 100644
> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -35,14 +35,18 @@ struct cpuidle_state_usage {
>  	unsigned long long	time; /* in US */
>  };
>  
> +struct cpuidle_power {
> +	unsigned int	exit_latency; /* in US */
> +	unsigned int	target_residency; /* in US */
> +	int		power_usage; /* in mW */
> +};
> +
>  struct cpuidle_state {
>  	char		name[CPUIDLE_NAME_LEN];
>  	char		desc[CPUIDLE_DESC_LEN];
>  
>  	unsigned int	flags;
> -	unsigned int	exit_latency; /* in US */
> -	int		power_usage; /* in mW */
> -	unsigned int	target_residency; /* in US */
> +	struct cpuidle_power power;
>  	bool		disabled; /* disabled on all CPUs */
>  
>  	int (*enter)	(struct cpuidle_device *dev,
> -- 
> 1.7.9.5
> 


* Re: [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure
  2014-03-28 18:17   ` Nicolas Pitre
@ 2014-03-28 20:42     ` Daniel Lezcano
  2014-03-29  0:00       ` Nicolas Pitre
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-03-28 20:42 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: LKML, mingo, Peter Zijlstra, Rafael J. Wysocki, linux-pm,
	Alex Shi, Vincent Guittot, Morten Rasmussen

Hi Nicolas,

thanks for reviewing the patchset.

On 03/28/2014 07:17 PM, Nicolas Pitre wrote:
> On Fri, 28 Mar 2014, Daniel Lezcano wrote:
>
>> The scheduler needs some information from cpuidle to know the timing for a
>> specific idle state a cpu is in.
>>
>> This patch creates a separate structure to group the cpuidle power info in
>> order to share it with the scheduler. It improves the encapsulation of the
>> code.
>
> Having cpuidle_power as a structure name, or worse, 'power' as a struct
> member, is a really bad choice.

Yes, I was wondering whether this name was a good choice. I assumed
'power' would be a reasonable name because 'target_residency' is a time
conversion of the power needed to enter this state.
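
One way to read that relation is as a break-even time: a state only pays
off if the cpu stays in it long enough for the power saved to cover the
energy spent entering and leaving it. The sketch below is an illustration
under assumptions only. The helper name break_even_us() and its two
parameters are hypothetical; the drivers touched by this patch either
hard-code target_residency or, for ACPI, derive it from the latency.

```c
/*
 * Hypothetical helper: break-even residency in us.
 *
 * transition_uj: energy spent entering plus exiting the state, in uJ
 * saved_mw:      power saved while in the state compared to staying
 *                in a shallower one, in mW
 *
 * uJ / mW gives ms, hence the factor 1000 to get us.
 */
static unsigned int break_even_us(unsigned int transition_uj,
				  unsigned int saved_mw)
{
	if (!saved_mw)
		return ~0U;	/* nothing saved: never worth entering */

	return (unsigned int)(((unsigned long long)transition_uj * 1000) /
			      saved_mw);
}
```

With these assumed units, a transition costing 50 uJ into a state that
saves 100 mW breaks even after 500 us.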

> Amongst the fields this struct
> contains, only 1 out of 3 is about power.  The word "power" is already
> abused quite significantly to mean too many different things.
>
> I'd suggest something inspired by your own patch log message i.e.
> 'struct cpuidle_info' instead, and use 'info' as a field name within
> struct cpuidle_state.  Having 'params' instead of 'info' could be a good
> alternative too, although slightly longer.

Hmm, 'info' or 'params' sounds too vague. What about:

cpuidle_attr
or
cpuidle_property

?

> And with struct rq in patch 2/3 I'd simply use:
>
> struct cpuidle_info *cpuidle;
>
> This way you'll have rq->cpuidle->exit_latency to refer to from the
> scheduler context which is IMHO much more self explanatory.

Ok, sounds good.
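
To make the naming discussed here concrete, this is a minimal userspace
sketch, not the actual patch. It assumes the structure ends up being
called 'struct cpuidle_info', embedded as 'info' in struct cpuidle_state,
and that struct rq carries a 'cpuidle' pointer which is NULL while the cpu
is not idle. The reduced struct definitions and the helper
pick_shallowest_idle() are hypothetical and only show how
rq->cpuidle->exit_latency would read from the scheduler side.

```c
#include <stddef.h>

/* Timing and power parameters of one idle state (name from the review). */
struct cpuidle_info {
	unsigned int exit_latency;	/* in us */
	unsigned int target_residency;	/* in us */
	int power_usage;		/* in mW */
};

/* Reduced stand-in for the kernel's struct cpuidle_state. */
struct cpuidle_state {
	const char *name;
	struct cpuidle_info info;
};

/* Reduced stand-in for the scheduler's per-cpu runqueue. */
struct rq {
	/* Idle state the cpu is currently in, NULL when it is not idle. */
	const struct cpuidle_info *cpuidle;
};

/*
 * Hypothetical helper: among the idle runqueues, return the index of
 * the one with the smallest exit latency, or -1 if none is idle.
 */
static int pick_shallowest_idle(const struct rq *rqs, int nr)
{
	int best = -1;
	int i;

	for (i = 0; i < nr; i++) {
		if (!rqs[i].cpuidle)
			continue;
		if (best < 0 ||
		    rqs[i].cpuidle->exit_latency <
		    rqs[best].cpuidle->exit_latency)
			best = i;
	}

	return best;
}
```

A driver table would then initialize the nested fields with
'.info.exit_latency = ...' in place of '.power.exit_latency = ...'.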

>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>> ---
>>   arch/arm/include/asm/cpuidle.h       |    6 +-
>>   arch/arm/mach-exynos/cpuidle.c       |    4 +-
>>   drivers/acpi/processor_idle.c        |    4 +-
>>   drivers/base/power/domain.c          |    6 +-
>>   drivers/cpuidle/cpuidle-at91.c       |    4 +-
>>   drivers/cpuidle/cpuidle-big_little.c |    9 +--
>>   drivers/cpuidle/cpuidle-calxeda.c    |    6 +-
>>   drivers/cpuidle/cpuidle-kirkwood.c   |    4 +-
>>   drivers/cpuidle/cpuidle-powernv.c    |    8 +--
>>   drivers/cpuidle/cpuidle-pseries.c    |   12 ++--
>>   drivers/cpuidle/cpuidle-ux500.c      |   14 ++---
>>   drivers/cpuidle/cpuidle-zynq.c       |    4 +-
>>   drivers/cpuidle/driver.c             |    6 +-
>>   drivers/cpuidle/governors/ladder.c   |   14 +++--
>>   drivers/cpuidle/governors/menu.c     |    8 +--
>>   drivers/cpuidle/sysfs.c              |    2 +-
>>   drivers/idle/intel_idle.c            |  112 +++++++++++++++++-----------------
>>   include/linux/cpuidle.h              |   10 ++-
>>   18 files changed, 120 insertions(+), 113 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/cpuidle.h b/arch/arm/include/asm/cpuidle.h
>> index 2fca60a..987ee53 100644
>> --- a/arch/arm/include/asm/cpuidle.h
>> +++ b/arch/arm/include/asm/cpuidle.h
>> @@ -12,9 +12,9 @@ static inline int arm_cpuidle_simple_enter(struct cpuidle_device *dev,
>>   /* Common ARM WFI state */
>>   #define ARM_CPUIDLE_WFI_STATE_PWR(p) {\
>>   .enter                  = arm_cpuidle_simple_enter,\
>> - .exit_latency           = 1,\
>> - .target_residency       = 1,\
>> - .power_usage = p,\
>> + .power.exit_latency     = 1,\
>> + .power.target_residency = 1,\
>> + .power.power_usage = p,\
>>   .flags                  = CPUIDLE_FLAG_TIME_VALID,\
>>   .name                   = "WFI",\
>>   .desc                   = "ARM WFI",\
>> diff --git a/arch/arm/mach-exynos/cpuidle.c b/arch/arm/mach-exynos/cpuidle.c
>> index f57cb91..f6275cb 100644
>> --- a/arch/arm/mach-exynos/cpuidle.c
>> +++ b/arch/arm/mach-exynos/cpuidle.c
>> @@ -73,8 +73,8 @@ static struct cpuidle_driver exynos4_idle_driver = {
>>   [0] = ARM_CPUIDLE_WFI_STATE,
>>   [1] = {
>>   .enter = exynos4_enter_lowpower,
>> - .exit_latency = 300,
>> - .target_residency = 100000,
>> + .power.exit_latency = 300,
>> + .power.target_residency = 100000,
>>   .flags = CPUIDLE_FLAG_TIME_VALID,
>>   .name = "C1",
>>   .desc = "ARM power down",
>> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
>> index 3dca36d..05fa991 100644
>> --- a/drivers/acpi/processor_idle.c
>> +++ b/drivers/acpi/processor_idle.c
>> @@ -979,8 +979,8 @@ static int acpi_processor_setup_cpuidle_states(struct acpi_processor *pr)
>>   state = &drv->states[count];
>>   snprintf(state->name, CPUIDLE_NAME_LEN, "C%d", i);
>>   strncpy(state->desc, cx->desc, CPUIDLE_DESC_LEN);
>> - state->exit_latency = cx->latency;
>> - state->target_residency = cx->latency * latency_factor;
>> + state->power.exit_latency = cx->latency;
>> + state->power.target_residency = cx->latency * latency_factor;
>>
>>   state->flags = 0;
>>   switch (cx->type) {
>> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
>> index bfb8955..6bcb1e8 100644
>> --- a/drivers/base/power/domain.c
>> +++ b/drivers/base/power/domain.c
>> @@ -154,7 +154,7 @@ static void genpd_recalc_cpu_exit_latency(struct generic_pm_domain *genpd)
>>   usecs64 = genpd->power_on_latency_ns;
>>   do_div(usecs64, NSEC_PER_USEC);
>>   usecs64 += genpd->cpu_data->saved_exit_latency;
>> - genpd->cpu_data->idle_state->exit_latency = usecs64;
>> + genpd->cpu_data->idle_state->power.exit_latency = usecs64;
>>   }
>>
>>   /**
>> @@ -1882,7 +1882,7 @@ int pm_genpd_attach_cpuidle(struct generic_pm_domain *genpd, int state)
>>   goto err;
>>   }
>>   cpu_data->idle_state = idle_state;
>> - cpu_data->saved_exit_latency = idle_state->exit_latency;
>> + cpu_data->saved_exit_latency = idle_state->power.exit_latency;
>>   genpd->cpu_data = cpu_data;
>>   genpd_recalc_cpu_exit_latency(genpd);
>>
>> @@ -1936,7 +1936,7 @@ int pm_genpd_detach_cpuidle(struct generic_pm_domain *genpd)
>>   ret = -EAGAIN;
>>   goto out;
>>   }
>> - idle_state->exit_latency = cpu_data->saved_exit_latency;
>> + idle_state->power.exit_latency = cpu_data->saved_exit_latency;
>>   cpuidle_driver_unref();
>>   genpd->cpu_data = NULL;
>>   kfree(cpu_data);
>> diff --git a/drivers/cpuidle/cpuidle-at91.c b/drivers/cpuidle/cpuidle-at91.c
>> index a077437..48c7063 100644
>> --- a/drivers/cpuidle/cpuidle-at91.c
>> +++ b/drivers/cpuidle/cpuidle-at91.c
>> @@ -40,9 +40,9 @@ static struct cpuidle_driver at91_idle_driver = {
>>   .owner = THIS_MODULE,
>>   .states[0] = ARM_CPUIDLE_WFI_STATE,
>>   .states[1] = {
>> + .power.exit_latency = 10,
>> + .power.target_residency = 10000,
>>   .enter = at91_enter_idle,
>> - .exit_latency = 10,
>> - .target_residency = 10000,
>>   .flags = CPUIDLE_FLAG_TIME_VALID,
>>   .name = "RAM_SR",
>>   .desc = "WFI and DDR Self Refresh",
>> diff --git a/drivers/cpuidle/cpuidle-big_little.c b/drivers/cpuidle/cpuidle-big_little.c
>> index b45fc62..5a0af4b 100644
>> --- a/drivers/cpuidle/cpuidle-big_little.c
>> +++ b/drivers/cpuidle/cpuidle-big_little.c
>> @@ -62,9 +62,9 @@ static struct cpuidle_driver bl_idle_little_driver = {
>>   .owner = THIS_MODULE,
>>   .states[0] = ARM_CPUIDLE_WFI_STATE,
>>   .states[1] = {
>> + .power.exit_latency = 700,
>> + .power.target_residency = 2500,
>>   .enter = bl_enter_powerdown,
>> - .exit_latency = 700,
>> - .target_residency = 2500,
>>   .flags = CPUIDLE_FLAG_TIME_VALID |
>>    CPUIDLE_FLAG_TIMER_STOP,
>>   .name = "C1",
>> @@ -78,9 +78,10 @@ static struct cpuidle_driver bl_idle_big_driver = {
>>   .owner = THIS_MODULE,
>>   .states[0] = ARM_CPUIDLE_WFI_STATE,
>>   .states[1] = {
>> +
>> + .power.exit_latency = 500,
>> + .power.target_residency = 2000,
>>   .enter = bl_enter_powerdown,
>> - .exit_latency = 500,
>> - .target_residency = 2000,
>>   .flags = CPUIDLE_FLAG_TIME_VALID |
>>    CPUIDLE_FLAG_TIMER_STOP,
>>   .name = "C1",
>> diff --git a/drivers/cpuidle/cpuidle-calxeda.c b/drivers/cpuidle/cpuidle-calxeda.c
>> index 6e51114..8357a20 100644
>> --- a/drivers/cpuidle/cpuidle-calxeda.c
>> +++ b/drivers/cpuidle/cpuidle-calxeda.c
>> @@ -56,9 +56,9 @@ static struct cpuidle_driver calxeda_idle_driver = {
>>   .name = "PG",
>>   .desc = "Power Gate",
>>   .flags = CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 30,
>> - .power_usage = 50,
>> - .target_residency = 200,
>> + .power.exit_latency = 30,
>> + .power.power_usage = 50,
>> + .power.target_residency = 200,
>>   .enter = calxeda_pwrdown_idle,
>>   },
>>   },
>> diff --git a/drivers/cpuidle/cpuidle-kirkwood.c b/drivers/cpuidle/cpuidle-kirkwood.c
>> index 41ba843..0ae4138 100644
>> --- a/drivers/cpuidle/cpuidle-kirkwood.c
>> +++ b/drivers/cpuidle/cpuidle-kirkwood.c
>> @@ -44,9 +44,9 @@ static struct cpuidle_driver kirkwood_idle_driver = {
>>   .owner = THIS_MODULE,
>>   .states[0] = ARM_CPUIDLE_WFI_STATE,
>>   .states[1] = {
>> + .power.exit_latency = 10,
>> + .power.target_residency = 100000,
>>   .enter = kirkwood_enter_idle,
>> - .exit_latency = 10,
>> - .target_residency = 100000,
>>   .flags = CPUIDLE_FLAG_TIME_VALID,
>>   .name = "DDR SR",
>>   .desc = "WFI and DDR Self Refresh",
>> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
>> index f48607c..c47cc02 100644
>> --- a/drivers/cpuidle/cpuidle-powernv.c
>> +++ b/drivers/cpuidle/cpuidle-powernv.c
>> @@ -62,15 +62,15 @@ static struct cpuidle_state powernv_states[] = {
>>   .name = "snooze",
>>   .desc = "snooze",
>>   .flags = CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 0,
>> - .target_residency = 0,
>> + .power.exit_latency = 0,
>> + .power.target_residency = 0,
>>   .enter = &snooze_loop },
>>   { /* NAP */
>>   .name = "NAP",
>>   .desc = "NAP",
>>   .flags = CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 10,
>> - .target_residency = 100,
>> + .power.exit_latency = 10,
>> + .power.target_residency = 100,
>>   .enter = &nap_loop },
>>   };
>>
>> diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
>> index 6f7b019..483d7e7 100644
>> --- a/drivers/cpuidle/cpuidle-pseries.c
>> +++ b/drivers/cpuidle/cpuidle-pseries.c
>> @@ -143,15 +143,15 @@ static struct cpuidle_state dedicated_states[] = {
>>   .name = "snooze",
>>   .desc = "snooze",
>>   .flags = CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 0,
>> - .target_residency = 0,
>> + .power.exit_latency = 0,
>> + .power.target_residency = 0,
>>   .enter = &snooze_loop },
>>   { /* CEDE */
>>   .name = "CEDE",
>>   .desc = "CEDE",
>>   .flags = CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 10,
>> - .target_residency = 100,
>> + .power.exit_latency = 10,
>> + .power.target_residency = 100,
>>   .enter = &dedicated_cede_loop },
>>   };
>>
>> @@ -163,8 +163,8 @@ static struct cpuidle_state shared_states[] = {
>>   .name = "Shared Cede",
>>   .desc = "Shared Cede",
>>   .flags = CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 0,
>> - .target_residency = 0,
>> + .power.exit_latency = 0,
>> + .power.target_residency = 0,
>>   .enter = &shared_cede_loop },
>>   };
>>
>> diff --git a/drivers/cpuidle/cpuidle-ux500.c b/drivers/cpuidle/cpuidle-ux500.c
>> index 5e35804..3261eb2 100644
>> --- a/drivers/cpuidle/cpuidle-ux500.c
>> +++ b/drivers/cpuidle/cpuidle-ux500.c
>> @@ -98,13 +98,13 @@ static struct cpuidle_driver ux500_idle_driver = {
>>   .states = {
>>   ARM_CPUIDLE_WFI_STATE,
>>   {
>> - .enter  = ux500_enter_idle,
>> - .exit_latency  = 70,
>> - .target_residency = 260,
>> - .flags  = CPUIDLE_FLAG_TIME_VALID |
>> -                    CPUIDLE_FLAG_TIMER_STOP,
>> - .name  = "ApIdle",
>> - .desc  = "ARM Retention",
>> + .power.exit_latency = 70,
>> + .power.target_residency = 260,
>> + .enter = ux500_enter_idle,
>> + .flags = CPUIDLE_FLAG_TIME_VALID |
>> + CPUIDLE_FLAG_TIMER_STOP,
>> + .name = "ApIdle",
>> + .desc = "ARM Retention",
>>   },
>>   },
>>   .safe_state_index = 0,
>> diff --git a/drivers/cpuidle/cpuidle-zynq.c b/drivers/cpuidle/cpuidle-zynq.c
>> index aded759..dddefb8 100644
>> --- a/drivers/cpuidle/cpuidle-zynq.c
>> +++ b/drivers/cpuidle/cpuidle-zynq.c
>> @@ -56,9 +56,9 @@ static struct cpuidle_driver zynq_idle_driver = {
>>   .states = {
>>   ARM_CPUIDLE_WFI_STATE,
>>   {
>> + .power.exit_latency = 10,
>> + .power.target_residency = 10000,
>>   .enter = zynq_enter_idle,
>> - .exit_latency = 10,
>> - .target_residency = 10000,
>>   .flags = CPUIDLE_FLAG_TIME_VALID |
>>    CPUIDLE_FLAG_TIMER_STOP,
>>   .name = "RAM_SR",
>> diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
>> index 06dbe7c..40ddd3c 100644
>> --- a/drivers/cpuidle/driver.c
>> +++ b/drivers/cpuidle/driver.c
>> @@ -206,9 +206,9 @@ static void poll_idle_init(struct cpuidle_driver *drv)
>>
>>   snprintf(state->name, CPUIDLE_NAME_LEN, "POLL");
>>   snprintf(state->desc, CPUIDLE_DESC_LEN, "CPUIDLE CORE POLL IDLE");
>> - state->exit_latency = 0;
>> - state->target_residency = 0;
>> - state->power_usage = -1;
>> + state->power.exit_latency = 0;
>> + state->power.target_residency = 0;
>> + state->power.power_usage = -1;
>>   state->flags = 0;
>>   state->enter = poll_idle;
>>   state->disabled = false;
>> diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
>> index 9f08e8c..4837880 100644
>> --- a/drivers/cpuidle/governors/ladder.c
>> +++ b/drivers/cpuidle/governors/ladder.c
>> @@ -81,7 +81,7 @@ static int ladder_select_state(struct cpuidle_driver *drv,
>>
>>   if (drv->states[last_idx].flags & CPUIDLE_FLAG_TIME_VALID) {
>>   last_residency = cpuidle_get_last_residency(dev) - \
>> - drv->states[last_idx].exit_latency;
>> + drv->states[last_idx].power.exit_latency;
>>   }
>>   else
>>   last_residency = last_state->threshold.promotion_time + 1;
>> @@ -91,7 +91,7 @@ static int ladder_select_state(struct cpuidle_driver *drv,
>>      !drv->states[last_idx + 1].disabled &&
>>      !dev->states_usage[last_idx + 1].disable &&
>>      last_residency > last_state->threshold.promotion_time &&
>> -    drv->states[last_idx + 1].exit_latency <= latency_req) {
>> +    drv->states[last_idx + 1].power.exit_latency <= latency_req) {
>>   last_state->stats.promotion_count++;
>>   last_state->stats.demotion_count = 0;
>>   if (last_state->stats.promotion_count >= last_state->threshold.promotion_count) {
>> @@ -104,11 +104,11 @@ static int ladder_select_state(struct cpuidle_driver *drv,
>>   if (last_idx > CPUIDLE_DRIVER_STATE_START &&
>>      (drv->states[last_idx].disabled ||
>>      dev->states_usage[last_idx].disable ||
>> -    drv->states[last_idx].exit_latency > latency_req)) {
>> +    drv->states[last_idx].power.exit_latency > latency_req)) {
>>   int i;
>>
>>   for (i = last_idx - 1; i > CPUIDLE_DRIVER_STATE_START; i--) {
>> - if (drv->states[i].exit_latency <= latency_req)
>> + if (drv->states[i].power.exit_latency <= latency_req)
>>   break;
>>   }
>>   ladder_do_selection(ldev, last_idx, i);
>> @@ -155,9 +155,11 @@ static int ladder_enable_device(struct cpuidle_driver *drv,
>>   lstate->threshold.demotion_count = DEMOTION_COUNT;
>>
>>   if (i < drv->state_count - 1)
>> - lstate->threshold.promotion_time = state->exit_latency;
>> + lstate->threshold.promotion_time =
>> + state->power.exit_latency;
>>   if (i > 0)
>> - lstate->threshold.demotion_time = state->exit_latency;
>> + lstate->threshold.demotion_time =
>> + state->power.exit_latency;
>>   }
>>
>>   return 0;
>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>> index cf7f2f0..34bd463 100644
>> --- a/drivers/cpuidle/governors/menu.c
>> +++ b/drivers/cpuidle/governors/menu.c
>> @@ -351,15 +351,15 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>>
>>   if (s->disabled || su->disable)
>>   continue;
>> - if (s->target_residency > data->predicted_us)
>> + if (s->power.target_residency > data->predicted_us)
>>   continue;
>> - if (s->exit_latency > latency_req)
>> + if (s->power.exit_latency > latency_req)
>>   continue;
>> - if (s->exit_latency * multiplier > data->predicted_us)
>> + if (s->power.exit_latency * multiplier > data->predicted_us)
>>   continue;
>>
>>   data->last_state_idx = i;
>> - data->exit_us = s->exit_latency;
>> + data->exit_us = s->power.exit_latency;
>>   }
>>
>>   return data->last_state_idx;
>> diff --git a/drivers/cpuidle/sysfs.c b/drivers/cpuidle/sysfs.c
>> index e918b6d..1a45541 100644
>> --- a/drivers/cpuidle/sysfs.c
>> +++ b/drivers/cpuidle/sysfs.c
>> @@ -252,7 +252,7 @@ static struct cpuidle_state_attr attr_##_name = __ATTR(_name, 0644, show, store)
>>   static ssize_t show_state_##_name(struct cpuidle_state *state, \
>>   struct cpuidle_state_usage *state_usage, char *buf) \
>>   { \
>> - return sprintf(buf, "%u\n", state->_name);\
>> + return sprintf(buf, "%u\n", state->power._name);\
>>   }
>>
>>   #define define_store_state_ull_function(_name) \
>> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
>> index 8e1939f..4f0533e 100644
>> --- a/drivers/idle/intel_idle.c
>> +++ b/drivers/idle/intel_idle.c
>> @@ -128,29 +128,29 @@ static struct cpuidle_state nehalem_cstates[] = {
>>   .name = "C1-NHM",
>>   .desc = "MWAIT 0x00",
>>   .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 3,
>> - .target_residency = 6,
>> + .power.exit_latency = 3,
>> + .power.target_residency = 6,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C1E-NHM",
>>   .desc = "MWAIT 0x01",
>>   .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 10,
>> - .target_residency = 20,
>> + .power.exit_latency = 10,
>> + .power.target_residency = 20,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C3-NHM",
>>   .desc = "MWAIT 0x10",
>>   .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 20,
>> - .target_residency = 80,
>> + .power.exit_latency = 20,
>> + .power.target_residency = 80,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C6-NHM",
>>   .desc = "MWAIT 0x20",
>>   .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 200,
>> - .target_residency = 800,
>> + .power.exit_latency = 200,
>> + .power.target_residency = 800,
>>   .enter = &intel_idle },
>>   {
>>   .enter = NULL }
>> @@ -161,36 +161,36 @@ static struct cpuidle_state snb_cstates[] = {
>>   .name = "C1-SNB",
>>   .desc = "MWAIT 0x00",
>>   .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 2,
>> - .target_residency = 2,
>> + .power.exit_latency = 2,
>> + .power.target_residency = 2,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C1E-SNB",
>>   .desc = "MWAIT 0x01",
>>   .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 10,
>> - .target_residency = 20,
>> + .power.exit_latency = 10,
>> + .power.target_residency = 20,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C3-SNB",
>>   .desc = "MWAIT 0x10",
>>   .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 80,
>> - .target_residency = 211,
>> + .power.exit_latency = 80,
>> + .power.target_residency = 211,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C6-SNB",
>>   .desc = "MWAIT 0x20",
>>   .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 104,
>> - .target_residency = 345,
>> + .power.exit_latency = 104,
>> + .power.target_residency = 345,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C7-SNB",
>>   .desc = "MWAIT 0x30",
>>   .flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 109,
>> - .target_residency = 345,
>> + .power.exit_latency = 109,
>> + .power.target_residency = 345,
>>   .enter = &intel_idle },
>>   {
>>   .enter = NULL }
>> @@ -201,36 +201,36 @@ static struct cpuidle_state ivb_cstates[] = {
>>   .name = "C1-IVB",
>>   .desc = "MWAIT 0x00",
>>   .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 1,
>> - .target_residency = 1,
>> + .power.exit_latency = 1,
>> + .power.target_residency = 1,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C1E-IVB",
>>   .desc = "MWAIT 0x01",
>>   .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 10,
>> - .target_residency = 20,
>> + .power.exit_latency = 10,
>> + .power.target_residency = 20,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C3-IVB",
>>   .desc = "MWAIT 0x10",
>>   .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 59,
>> - .target_residency = 156,
>> + .power.exit_latency = 59,
>> + .power.target_residency = 156,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C6-IVB",
>>   .desc = "MWAIT 0x20",
>>   .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 80,
>> - .target_residency = 300,
>> + .power.exit_latency = 80,
>> + .power.target_residency = 300,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C7-IVB",
>>   .desc = "MWAIT 0x30",
>>   .flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 87,
>> - .target_residency = 300,
>> + .power.exit_latency = 87,
>> + .power.target_residency = 300,
>>   .enter = &intel_idle },
>>   {
>>   .enter = NULL }
>> @@ -241,57 +241,57 @@ static struct cpuidle_state hsw_cstates[] = {
>>   .name = "C1-HSW",
>>   .desc = "MWAIT 0x00",
>>   .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 2,
>> - .target_residency = 2,
>> + .power.exit_latency = 2,
>> + .power.target_residency = 2,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C1E-HSW",
>>   .desc = "MWAIT 0x01",
>>   .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 10,
>> - .target_residency = 20,
>> + .power.exit_latency = 10,
>> + .power.target_residency = 20,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C3-HSW",
>>   .desc = "MWAIT 0x10",
>>   .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 33,
>> - .target_residency = 100,
>> + .power.exit_latency = 33,
>> + .power.target_residency = 100,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C6-HSW",
>>   .desc = "MWAIT 0x20",
>>   .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 133,
>> - .target_residency = 400,
>> + .power.exit_latency = 133,
>> + .power.target_residency = 400,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C7s-HSW",
>>   .desc = "MWAIT 0x32",
>>   .flags = MWAIT2flg(0x32) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 166,
>> - .target_residency = 500,
>> + .power.exit_latency = 166,
>> + .power.target_residency = 500,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C8-HSW",
>>   .desc = "MWAIT 0x40",
>>   .flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 300,
>> - .target_residency = 900,
>> + .power.exit_latency = 300,
>> + .power.target_residency = 900,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C9-HSW",
>>   .desc = "MWAIT 0x50",
>>   .flags = MWAIT2flg(0x50) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 600,
>> - .target_residency = 1800,
>> + .power.exit_latency = 600,
>> + .power.target_residency = 1800,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C10-HSW",
>>   .desc = "MWAIT 0x60",
>>   .flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 2600,
>> - .target_residency = 7700,
>> + .power.exit_latency = 2600,
>> + .power.target_residency = 7700,
>>   .enter = &intel_idle },
>>   {
>>   .enter = NULL }
>> @@ -302,29 +302,29 @@ static struct cpuidle_state atom_cstates[] = {
>>   .name = "C1E-ATM",
>>   .desc = "MWAIT 0x00",
>>   .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 10,
>> - .target_residency = 20,
>> + .power.exit_latency = 10,
>> + .power.target_residency = 20,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C2-ATM",
>>   .desc = "MWAIT 0x10",
>>   .flags = MWAIT2flg(0x10) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 20,
>> - .target_residency = 80,
>> + .power.exit_latency = 20,
>> + .power.target_residency = 80,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C4-ATM",
>>   .desc = "MWAIT 0x30",
>>   .flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 100,
>> - .target_residency = 400,
>> + .power.exit_latency = 100,
>> + .power.target_residency = 400,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C6-ATM",
>>   .desc = "MWAIT 0x52",
>>   .flags = MWAIT2flg(0x52) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 140,
>> - .target_residency = 560,
>> + .power.exit_latency = 140,
>> + .power.target_residency = 560,
>>   .enter = &intel_idle },
>>   {
>>   .enter = NULL }
>> @@ -334,15 +334,15 @@ static struct cpuidle_state avn_cstates[] = {
>>   .name = "C1-AVN",
>>   .desc = "MWAIT 0x00",
>>   .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
>> - .exit_latency = 2,
>> - .target_residency = 2,
>> + .power.exit_latency = 2,
>> + .power.target_residency = 2,
>>   .enter = &intel_idle },
>>   {
>>   .name = "C6-AVN",
>>   .desc = "MWAIT 0x51",
>>   .flags = MWAIT2flg(0x51) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
>> - .exit_latency = 15,
>> - .target_residency = 45,
>> + .power.exit_latency = 15,
>> + .power.target_residency = 45,
>>   .enter = &intel_idle },
>>   {
>>   .enter = NULL }
>> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
>> index b0238cb..eb58ab3 100644
>> --- a/include/linux/cpuidle.h
>> +++ b/include/linux/cpuidle.h
>> @@ -35,14 +35,18 @@ struct cpuidle_state_usage {
>>   unsigned long long time; /* in US */
>>   };
>>
>> +struct cpuidle_power {
>> + unsigned int exit_latency; /* in US */
>> + unsigned int target_residency; /* in US */
>> + int power_usage; /* in mW */
>> +};
>> +
>>   struct cpuidle_state {
>>   char name[CPUIDLE_NAME_LEN];
>>   char desc[CPUIDLE_DESC_LEN];
>>
>>   unsigned int flags;
>> - unsigned int exit_latency; /* in US */
>> - int power_usage; /* in mW */
>> - unsigned int target_residency; /* in US */
>> + struct cpuidle_power power;
>>   bool disabled; /* disabled on all CPUs */
>>
>>   int (*enter) (struct cpuidle_device *dev,
>> --
>> 1.7.9.5
>>




* Re: [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure
  2014-03-28 20:42     ` Daniel Lezcano
@ 2014-03-29  0:00       ` Nicolas Pitre
  0 siblings, 0 replies; 47+ messages in thread
From: Nicolas Pitre @ 2014-03-29  0:00 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: LKML, mingo, Peter Zijlstra, Rafael J. Wysocki, linux-pm,
	Alex Shi, Vincent Guittot, Morten Rasmussen

On Fri, 28 Mar 2014, Daniel Lezcano wrote:

> Hi Nicolas,
> 
> thanks for reviewing the patchset.
> 
> On 03/28/2014 07:17 PM, Nicolas Pitre wrote:
> > On Fri, 28 Mar 2014, Daniel Lezcano wrote:
> >
> >> The scheduler needs some information from cpuidle to know the timing for a
> >> specific idle state a cpu is.
> >>
> >> This patch creates a separate structure to group the cpuidle power info in
> >> order to share it with the scheduler. It improves the encapsulation of the
> >> code.
> >
> > Having cpuidle_power as a structure name, or worse, 'power' as a struct
> > member, is a really bad choice.
> 
> Yes, I was asking myself if this name was a good choice or not. I
> assumed 'power' could have been a good name because 'target_residency'
> is a time conversion of the power needed to enter this state.

Still, that's something the casual reviewer might not know.

And we ought to be careful when talking about power as well.  By 
definition, power means energy transferred per unit of time.  Sometimes 
we tend to say 'power' when we actually mean 'energy'.  With more "power 
aware" work going into the scheduler, it is better to disambiguate those 
terms.
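
To make the distinction concrete, here is a minimal userspace sketch of the
unit arithmetic, using the same units as the struct comments in the patch
(mW and us). The helper names and the sample figures (500 mW, 900 us) are
hypothetical and only illustrate that energy is power multiplied by time.

```c
#include <assert.h>
#include <stdint.h>

/* mW * us = 1e-3 W * 1e-6 s = 1e-9 J, so the product is already in nJ. */
static uint64_t energy_nj(unsigned int power_mw, unsigned int time_us)
{
	return (uint64_t)power_mw * time_us;
}

/* Average power in mW over a non-zero interval, from an energy in nJ. */
static uint64_t avg_power_mw(uint64_t energy_nj_total, unsigned int time_us)
{
	assert(time_us > 0);
	return energy_nj_total / time_us;
}
```

With these helpers, drawing 500 mW for 900 us costs 450000 nJ, and dividing
that energy by the same 900 us gives back 500 mW. The two quantities are
related but not interchangeable.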

> > Amongst the fields this struct
> > contains, only 1 out of 3 is about power.  The word "power" is already
> > abused quite significantly to mean too many different things already.
> >
> > I'd suggest something inspired by your own patch log message i.e.
> > 'struct cpuidle_info' instead, and use 'info' as a field name within
> > struct cpuidle_state.  Having 'params' instead of 'info' could be a good
> > alternative too, although slightly longer.
> 
> Hmm 'info' or 'param' sound too vague. What about:
> 
> cpuidle_attr
> or
> cpuidle_property

As you wish.  As long as it isn't 'power'.


Nicolas
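
For illustration, this is roughly what the structure from the patch could
look like with one of the neutral names floated above. It is only a sketch:
'cpuidle_attr' and the 'attr' member are one candidate from this thread, not
a settled choice; struct cpuidle_state is cut down to a few fields with an
array size picked for the example; and state_exit_latency_us() is a
hypothetical accessor written for userspace, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Per-state attributes grouped so other code can read them via one member. */
struct cpuidle_attr {
	unsigned int exit_latency;     /* in US */
	unsigned int target_residency; /* in US */
	int power_usage;               /* in mW */
};

/* Reduced version of struct cpuidle_state, for this sketch only. */
struct cpuidle_state {
	char name[16];
	unsigned int flags;
	struct cpuidle_attr attr;
};

/* Hypothetical accessor: exit latency of a state, in microseconds. */
static unsigned int state_exit_latency_us(const struct cpuidle_state *state)
{
	assert(state != NULL);
	return state->attr.exit_latency;
}
```

A driver table entry would then read '.attr.exit_latency = 300' where the
patch currently has '.power.exit_latency = 300'.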


* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-03-28 12:29 [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Daniel Lezcano
                   ` (2 preceding siblings ...)
  2014-03-28 12:29 ` [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu Daniel Lezcano
@ 2014-03-31 13:52 ` Vincent Guittot
  2014-03-31 15:55   ` Daniel Lezcano
  2014-04-01 23:01 ` Rafael J. Wysocki
  2014-04-04  6:29 ` Len Brown
  5 siblings, 1 reply; 47+ messages in thread
From: Vincent Guittot @ 2014-03-31 13:52 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, rjw, Nicolas Pitre,
	linux-pm, Alex Shi, Morten Rasmussen

On 28 March 2014 13:29, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> The following patchset provides an interaction between cpuidle and the scheduler.
>
> The first patch encapsulate the needed information for the scheduler in a
> separate cpuidle structure. The second one stores the pointer to this structure
> when entering idle. The third one, use this information to take the decision to
> find the idlest cpu.
>
> After some basic testing with hackbench, it appears there is an improvement for
> the performances (small) and for the duration of the idle states (which provides
> a better power saving).
>
> The measurement has been done with the 'idlestat' tool previously posted in this
> mailing list.
>
> So the benefit is good for both sides performance and power saving.

Hi Daniel,

I have looked at your results and I'm a bit surprised that you have so
much time in C-states with a test that involves 400 tasks on a dual-core
HT system. You shouldn't have any CPUs in an idle state when running
hackbench; the total time of core0@state in C7-IVB is 87932131.00 us,
which is huge for a benchmark that runs for 44 sec. Or am I
misinterpreting the results?

Regards,
Vincent
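
To spell out the mismatch: 87932131 us is about 87.9 s of C7 residency,
against a reported hackbench time of 44.433 s, a ratio of roughly 1.98. A
small userspace sketch of that check follows; the helper names are
hypothetical and are not part of idlestat or hackbench.

```c
#include <assert.h>

/* Convert a duration in microseconds, as printed by idlestat, to seconds. */
static double us_to_sec(double us)
{
	return us / 1e6;
}

/*
 * Ratio of idle residency to elapsed wall-clock time. A value above 1.0
 * means the residency cannot come from a run of that length.
 */
static double residency_ratio(double residency_us, double elapsed_sec)
{
	assert(elapsed_sec > 0.0);
	return us_to_sec(residency_us) / elapsed_sec;
}
```

Any ratio above 1.0 for a single CPU or core suggests that the trace and the
benchmark time were not taken from the same run.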

>
> The select_idle_sibling could be also improved in the same way.
>
> ====================== test with hackbench 3.14-rc8 =========================
>
> /usr/bin/hackbench -l 10000 -s 4096
>
> Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
> Each sender will pass 10000 messages of 4096 bytes
>
> Time: 44.433
>
> Total trace buffer: 1846688 kB
> clusterA@state  hits          total(us)         avg(us) min(us) max(us)
>          POLL   0                  0.00            0.00 0.00    0.00
>          C1-IVB 0                  0.00            0.00 0.00    0.00
>          C1E-VB 0                  0.00            0.00 0.00    0.00
>          C3-IVB 0                  0.00            0.00 0.00    0.00
>          C6-IVB 0                  0.00            0.00 0.00    0.00
>          C7-IVB 0                  0.00            0.00 0.00    0.00
>   core0@state   hits          total(us)         avg(us) min(us) max(us)
>         POLL    0                  0.00            0.00 0.00    0.00
>         C1-IVB  0                  0.00            0.00 0.00    0.00
>         C1E-IVB 0                  0.00            0.00 0.00    0.00
>         C3-IVB  0                  0.00            0.00 0.00    0.00
>         C6-IVB  0                  0.00            0.00 0.00    0.00
>         C7-IVB  1396        87932131.00        62988.63 0.00    320146.00
>     cpu0@state  hits          total(us)         avg(us) min(us) max(us)
>          POLL   0                  0.00            0.00 0.00    0.00
>          C1-IVB 1                 14.00           14.00 14.00   14.00
>          C1E-VB 0                  0.00            0.00 0.00    0.00
>          C3-IVB 1                262.00          262.00 262.00  262.00
>          C6-IVB 0                  0.00            0.00 0.00    0.00
>          C7-IVB 1180        87938177.00        74523.88 1.00    320147.00
>          1701   0                  0.00            0.00 0.00    0.00
>          1700   0                  0.00            0.00 0.00    0.00
>          1600   0                  0.00            0.00 0.00    0.00
>          1500   0                  0.00            0.00 0.00    0.00
>          1400   0                  0.00            0.00 0.00    0.00
>          1300   0                  0.00            0.00 0.00    0.00
>          1200   0                  0.00            0.00 0.00    0.00
>          1100   0                  0.00            0.00 0.00    0.00
>          1000   0                  0.00            0.00 0.00    0.00
>          900    0                  0.00            0.00 0.00    0.00
>          800    0                  0.00            0.00 0.00    0.00
>          782    0                  0.00            0.00 0.00    0.00
>     cpu0 wakeups        name            count
>          irq009 acpi            1
>     cpu1@state  hits          total(us)         avg(us) min(us) max(us)
>          POLL   0                  0.00            0.00 0.00    0.00
>          C1-IVB 0                  0.00            0.00 0.00    0.00
>          C1E-VB 0                  0.00            0.00 0.00    0.00
>          C3-IVB 0                  0.00            0.00 0.00    0.00
>          C6-IVB 0                  0.00            0.00 0.00    0.00
>          C7-IVB 475         87941356.00       185139.70 322.00  1500690.00
>          1701   0                  0.00            0.00 0.00    0.00
>          1700   0                  0.00            0.00 0.00    0.00
>          1600   0                  0.00            0.00 0.00    0.00
>          1500   0                  0.00            0.00 0.00    0.00
>          1400   0                  0.00            0.00 0.00    0.00
>          1300   0                  0.00            0.00 0.00    0.00
>          1200   0                  0.00            0.00 0.00    0.00
>          1100   0                  0.00            0.00 0.00    0.00
>          1000   0                  0.00            0.00 0.00    0.00
>          900    0                  0.00            0.00 0.00    0.00
>          800    0                  0.00            0.00 0.00    0.00
>          782    0                  0.00            0.00 0.00    0.00
>     cpu1 wakeups        name            count
>          irq009 acpi            3
>   core1@state   hits          total(us)         avg(us) min(us) max(us)
>         POLL    0                  0.00            0.00 0.00    0.00
>         C1-IVB  0                  0.00            0.00 0.00    0.00
>         C1E-IVB 0                  0.00            0.00 0.00    0.00
>         C3-IVB  0                  0.00            0.00 0.00    0.00
>         C6-IVB  0                  0.00            0.00 0.00    0.00
>         C7-IVB  0                  0.00            0.00 0.00    0.00
>     cpu2@state  hits          total(us)         avg(us) min(us) max(us)
>          POLL   0                  0.00            0.00 0.00    0.00
>          C1-IVB 11            288157.00        26196.09 16.00   200060.00
>          C1E-VB 6             221601.00        36933.50 79.00   200066.00
>          C3-IVB 0                  0.00            0.00 0.00    0.00
>          C6-IVB 0                  0.00            0.00 0.00    0.00
>          C7-IVB 950         87417466.00        92018.39 19.00   200074.00
>          1701   0                  0.00            0.00 0.00    0.00
>          1700   0                  0.00            0.00 0.00    0.00
>          1600   0                  0.00            0.00 0.00    0.00
>          1500   2                 34.00           17.00 11.00   23.00
>          1400   0                  0.00            0.00 0.00    0.00
>          1300   0                  0.00            0.00 0.00    0.00
>          1200   0                  0.00            0.00 0.00    0.00
>          1100   0                  0.00            0.00 0.00    0.00
>          1000   0                  0.00            0.00 0.00    0.00
>          900    0                  0.00            0.00 0.00    0.00
>          800    0                  0.00            0.00 0.00    0.00
>          782    745            18800.00           25.23 2.00    156.00
>     cpu2 wakeups        name            count
>          irq019 ahci            50
>          irq009 acpi            17
>     cpu3@state  hits          total(us)         avg(us) min(us) max(us)
>          POLL   0                  0.00            0.00 0.00    0.00
>          C1-IVB 0                  0.00            0.00 0.00    0.00
>          C1E-VB 0                  0.00            0.00 0.00    0.00
>          C3-IVB 0                  0.00            0.00 0.00    0.00
>          C6-IVB 0                  0.00            0.00 0.00    0.00
>          C7-IVB 0                  0.00            0.00 0.00    0.00
>          1701   0                  0.00            0.00 0.00    0.00
>          1700   0                  0.00            0.00 0.00    0.00
>          1600   0                  0.00            0.00 0.00    0.00
>          1500   0                  0.00            0.00 0.00    0.00
>          1400   0                  0.00            0.00 0.00    0.00
>          1300   0                  0.00            0.00 0.00    0.00
>          1200   0                  0.00            0.00 0.00    0.00
>          1100   0                  0.00            0.00 0.00    0.00
>          1000   0                  0.00            0.00 0.00    0.00
>          900    0                  0.00            0.00 0.00    0.00
>          800    0                  0.00            0.00 0.00    0.00
>          782    0                  0.00            0.00 0.00    0.00
>     cpu3 wakeups        name            count
>
> ================ test with hackbench 3.14-rc8 + patchset ====================
>
> /usr/bin/hackbench -l 10000 -s 4096
>
> Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
> Each sender will pass 10000 messages of 4096 bytes
>
> Time: 42.179
>
> Total trace buffer: 1846688 kB
> clusterA@state  hits          total(us)         avg(us) min(us) max(us)
>          POLL   0                  0.00            0.00 0.00    0.00
>          C1-IVB 0                  0.00            0.00 0.00    0.00
>          C1E-VB 0                  0.00            0.00 0.00    0.00
>          C3-IVB 0                  0.00            0.00 0.00    0.00
>          C6-IVB 0                  0.00            0.00 0.00    0.00
>          C7-IVB 0                  0.00            0.00 0.00    0.00
>   core0@state   hits          total(us)         avg(us) min(us) max(us)
>         POLL    0                  0.00            0.00 0.00    0.00
>         C1-IVB  0                  0.00            0.00 0.00    0.00
>         C1E-IVB 0                  0.00            0.00 0.00    0.00
>         C3-IVB  0                  0.00            0.00 0.00    0.00
>         C6-IVB  0                  0.00            0.00 0.00    0.00
>         C7-IVB  880         89157590.00       101315.44 0.00    400184.00
>     cpu0@state  hits          total(us)         avg(us) min(us) max(us)
>          POLL   0                  0.00            0.00 0.00    0.00
>          C1-IVB 0                  0.00            0.00 0.00    0.00
>          C1E-VB 1                233.00          233.00 233.00  233.00
>          C3-IVB 1                260.00          260.00 260.00  260.00
>          C6-IVB 0                  0.00            0.00 0.00    0.00
>          C7-IVB 700         89162006.00       127374.29 182.00  400187.00
>          1701   0                  0.00            0.00 0.00    0.00
>          1700   0                  0.00            0.00 0.00    0.00
>          1600   0                  0.00            0.00 0.00    0.00
>          1500   0                  0.00            0.00 0.00    0.00
>          1400   0                  0.00            0.00 0.00    0.00
>          1300   0                  0.00            0.00 0.00    0.00
>          1200   0                  0.00            0.00 0.00    0.00
>          1100   0                  0.00            0.00 0.00    0.00
>          1000   0                  0.00            0.00 0.00    0.00
>          900    0                  0.00            0.00 0.00    0.00
>          800    0                  0.00            0.00 0.00    0.00
>          782    0                  0.00            0.00 0.00    0.00
>     cpu0 wakeups        name            count
>          irq009 acpi            2
>     cpu1@state  hits          total(us)         avg(us) min(us) max(us)
>          POLL   0                  0.00            0.00 0.00    0.00
>          C1-IVB 0                  0.00            0.00 0.00    0.00
>          C1E-VB 0                  0.00            0.00 0.00    0.00
>          C3-IVB 0                  0.00            0.00 0.00    0.00
>          C6-IVB 0                  0.00            0.00 0.00    0.00
>          C7-IVB 334         89164805.00       266960.49 1.00    1500677.00
>          1701   0                  0.00            0.00 0.00    0.00
>          1700   0                  0.00            0.00 0.00    0.00
>          1600   0                  0.00            0.00 0.00    0.00
>          1500   0                  0.00            0.00 0.00    0.00
>          1400   0                  0.00            0.00 0.00    0.00
>          1300   0                  0.00            0.00 0.00    0.00
>          1200   0                  0.00            0.00 0.00    0.00
>          1100   0                  0.00            0.00 0.00    0.00
>          1000   0                  0.00            0.00 0.00    0.00
>          900    0                  0.00            0.00 0.00    0.00
>          800    0                  0.00            0.00 0.00    0.00
>          782    0                  0.00            0.00 0.00    0.00
>     cpu1 wakeups        name            count
>          irq009 acpi            6
>   core1@state   hits          total(us)         avg(us) min(us) max(us)
>         POLL    0                  0.00            0.00 0.00    0.00
>         C1-IVB  0                  0.00            0.00 0.00    0.00
>         C1E-IVB 0                  0.00            0.00 0.00    0.00
>         C3-IVB  0                  0.00            0.00 0.00    0.00
>         C6-IVB  0                  0.00            0.00 0.00    0.00
>         C7-IVB  0                  0.00            0.00 0.00    0.00
>     cpu2@state  hits          total(us)         avg(us) min(us) max(us)
>          POLL   0                  0.00            0.00 0.00    0.00
>          C1-IVB 19           2169047.00       114160.37 18.00   999129.00
>          C1E-IB 0                  0.00            0.00 0.00    0.00
>          C3-IVB 0                  0.00            0.00 0.00    0.00
>          C6-IVB 0                  0.00            0.00 0.00    0.00
>          C7-IVB 376         86993307.00       231365.18 20.00   1500682.00
>          1701   0                  0.00            0.00 0.00    0.00
>          1700   0                  0.00            0.00 0.00    0.00
>          1600   0                  0.00            0.00 0.00    0.00
>          1500   0                  0.00            0.00 0.00    0.00
>          1400   0                  0.00            0.00 0.00    0.00
>          1300   0                  0.00            0.00 0.00    0.00
>          1200   0                  0.00            0.00 0.00    0.00
>          1100   0                  0.00            0.00 0.00    0.00
>          1000   0                  0.00            0.00 0.00    0.00
>          900    0                  0.00            0.00 0.00    0.00
>          800    0                  0.00            0.00 0.00    0.00
>          782    0                  0.00            0.00 0.00    0.00
>     cpu2 wakeups        name            count
>          irq009 acpi            32
>          irq019 ahci            45
>     cpu3@state  hits          total(us)         avg(us) min(us) max(us)
>          POLL   0                  0.00            0.00 0.00    0.00
>          C1-IVB 0                  0.00            0.00 0.00    0.00
>          C1E-VB 0                  0.00            0.00 0.00    0.00
>          C3-IVB 0                  0.00            0.00 0.00    0.00
>          C6-IVB 0                  0.00            0.00 0.00    0.00
>          C7-IVB 0                  0.00            0.00 0.00    0.00
>          1701   0                  0.00            0.00 0.00    0.00
>          1700   0                  0.00            0.00 0.00    0.00
>          1600   0                  0.00            0.00 0.00    0.00
>          1500   0                  0.00            0.00 0.00    0.00
>          1400   0                  0.00            0.00 0.00    0.00
>          1300   0                  0.00            0.00 0.00    0.00
>          1200   0                  0.00            0.00 0.00    0.00
>          1100   0                  0.00            0.00 0.00    0.00
>          1000   0                  0.00            0.00 0.00    0.00
>          900    0                  0.00            0.00 0.00    0.00
>          800    0                  0.00            0.00 0.00    0.00
>          782    0                  0.00            0.00 0.00    0.00
>     cpu3 wakeups        name            count
>
>
> Daniel Lezcano (3):
>   cpuidle: encapsulate power info in a separate structure
>   idle: store the idle state the cpu is
>   sched/fair: use the idle state info to choose the idlest cpu
>
>  arch/arm/include/asm/cpuidle.h       |    6 +-
>  arch/arm/mach-exynos/cpuidle.c       |    4 +-
>  drivers/acpi/processor_idle.c        |    4 +-
>  drivers/base/power/domain.c          |    6 +-
>  drivers/cpuidle/cpuidle-at91.c       |    4 +-
>  drivers/cpuidle/cpuidle-big_little.c |    9 +--
>  drivers/cpuidle/cpuidle-calxeda.c    |    6 +-
>  drivers/cpuidle/cpuidle-kirkwood.c   |    4 +-
>  drivers/cpuidle/cpuidle-powernv.c    |    8 +--
>  drivers/cpuidle/cpuidle-pseries.c    |   12 ++--
>  drivers/cpuidle/cpuidle-ux500.c      |   14 ++---
>  drivers/cpuidle/cpuidle-zynq.c       |    4 +-
>  drivers/cpuidle/driver.c             |    6 +-
>  drivers/cpuidle/governors/ladder.c   |   14 +++--
>  drivers/cpuidle/governors/menu.c     |    8 +--
>  drivers/cpuidle/sysfs.c              |    2 +-
>  drivers/idle/intel_idle.c            |  112 +++++++++++++++++-----------------
>  include/linux/cpuidle.h              |   10 ++-
>  kernel/sched/fair.c                  |   46 ++++++++++++--
>  kernel/sched/idle.c                  |   17 +++++-
>  kernel/sched/sched.h                 |    5 ++
>  21 files changed, 180 insertions(+), 121 deletions(-)
>
> --
> 1.7.9.5
>


* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-03-31 13:52 ` [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Vincent Guittot
@ 2014-03-31 15:55   ` Daniel Lezcano
  2014-04-01  7:16     ` Vincent Guittot
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-03-31 15:55 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, rjw, Nicolas Pitre,
	linux-pm, Alex Shi, Morten Rasmussen

On 03/31/2014 03:52 PM, Vincent Guittot wrote:
> On 28 March 2014 13:29, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>> The following patchset provides an interaction between cpuidle and the scheduler.
>>
>> The first patch encapsulate the needed information for the scheduler in a
>> separate cpuidle structure. The second one stores the pointer to this structure
>> when entering idle. The third one, use this information to take the decision to
>> find the idlest cpu.
>>
>> After some basic testing with hackbench, it appears there is an improvement for
>> the performances (small) and for the duration of the idle states (which provides
>> a better power saving).
>>
>> The measurement has been done with the 'idlestat' tool previously posted in this
>> mailing list.
>>
>> So the benefit is good for both sides performance and power saving.
>
> Hi Daniel,
>
> I have looked at your results and I'm a bit surprised that you have so
> much time in C-states with a test that involves 400 tasks on a dual-core
> HT system. You shouldn't have any CPUs in an idle state when running
> hackbench; the total time of core0@state in C7-IVB is 87932131.00 us,
> which is huge for a benchmark that runs for 44 sec. Or am I
> misinterpreting the results?

No, actually I mixed up the output of hackbench runs without idlestat
and the output of runs with idlestat.

The hackbench results below are from runs without idlestat.

The idlestat results are consistent, and idlestat does add a
non-negligible overhead, as it affects the hackbench results.

So to summarize, hackbench has been run 4 times.

1, 2: without idlestat, with and without the patchset - hackbench
results ~42 secs

3, 4: with idlestat, with and without the patchset - hackbench results
~87 secs

At first glance, the results are consistent but I will double check
them.

Do you have a suggestion for a benchmarking program?

Thanks !

   -- Daniel
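
For reference, the relative difference between the two 'Time:' values in the
results quoted below (44.433 and 42.179) is about 5%. A minimal sketch of
that calculation is given here; relative_gain() is a hypothetical helper,
and a single pair of runs says nothing about run-to-run variance.

```c
#include <assert.h>

/*
 * Fraction of the baseline duration saved by the second run.
 * Positive means the second run was faster.
 */
static double relative_gain(double before_sec, double after_sec)
{
	assert(before_sec > 0.0);
	return (before_sec - after_sec) / before_sec;
}
```

Repeating each configuration several times and comparing the averages would
give a more trustworthy figure than one pair of values.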


>> The select_idle_sibling could be also improved in the same way.
>>
>> ====================== test with hackbench 3.14-rc8 =========================
>>
>> /usr/bin/hackbench -l 10000 -s 4096
>>
>> Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
>> Each sender will pass 10000 messages of 4096 bytes
>>
>> Time: 44.433
>>
>> Total trace buffer: 1846688 kB
>> clusterA@state  hits          total(us)         avg(us) min(us) max(us)
>>           POLL   0                  0.00            0.00 0.00    0.00
>>           C1-IVB 0                  0.00            0.00 0.00    0.00
>>           C1E-VB 0                  0.00            0.00 0.00    0.00
>>           C3-IVB 0                  0.00            0.00 0.00    0.00
>>           C6-IVB 0                  0.00            0.00 0.00    0.00
>>           C7-IVB 0                  0.00            0.00 0.00    0.00
>>    core0@state   hits          total(us)         avg(us) min(us) max(us)
>>          POLL    0                  0.00            0.00 0.00    0.00
>>          C1-IVB  0                  0.00            0.00 0.00    0.00
>>          C1E-IVB 0                  0.00            0.00 0.00    0.00
>>          C3-IVB  0                  0.00            0.00 0.00    0.00
>>          C6-IVB  0                  0.00            0.00 0.00    0.00
>>          C7-IVB  1396        87932131.00        62988.63 0.00    320146.00
>>      cpu0@state  hits          total(us)         avg(us) min(us) max(us)
>>           POLL   0                  0.00            0.00 0.00    0.00
>>           C1-IVB 1                 14.00           14.00 14.00   14.00
>>           C1E-VB 0                  0.00            0.00 0.00    0.00
>>           C3-IVB 1                262.00          262.00 262.00  262.00
>>           C6-IVB 0                  0.00            0.00 0.00    0.00
>>           C7-IVB 1180        87938177.00        74523.88 1.00    320147.00
>>           1701   0                  0.00            0.00 0.00    0.00
>>           1700   0                  0.00            0.00 0.00    0.00
>>           1600   0                  0.00            0.00 0.00    0.00
>>           1500   0                  0.00            0.00 0.00    0.00
>>           1400   0                  0.00            0.00 0.00    0.00
>>           1300   0                  0.00            0.00 0.00    0.00
>>           1200   0                  0.00            0.00 0.00    0.00
>>           1100   0                  0.00            0.00 0.00    0.00
>>           1000   0                  0.00            0.00 0.00    0.00
>>           900    0                  0.00            0.00 0.00    0.00
>>           800    0                  0.00            0.00 0.00    0.00
>>           782    0                  0.00            0.00 0.00    0.00
>>      cpu0 wakeups        name            count
>>           irq009 acpi            1
>>      cpu1@state  hits          total(us)         avg(us) min(us) max(us)
>>           POLL   0                  0.00            0.00 0.00    0.00
>>           C1-IVB 0                  0.00            0.00 0.00    0.00
>>           C1E-VB 0                  0.00            0.00 0.00    0.00
>>           C3-IVB 0                  0.00            0.00 0.00    0.00
>>           C6-IVB 0                  0.00            0.00 0.00    0.00
>>           C7-IVB 475         87941356.00       185139.70 322.00  1500690.00
>>           1701   0                  0.00            0.00 0.00    0.00
>>           1700   0                  0.00            0.00 0.00    0.00
>>           1600   0                  0.00            0.00 0.00    0.00
>>           1500   0                  0.00            0.00 0.00    0.00
>>           1400   0                  0.00            0.00 0.00    0.00
>>           1300   0                  0.00            0.00 0.00    0.00
>>           1200   0                  0.00            0.00 0.00    0.00
>>           1100   0                  0.00            0.00 0.00    0.00
>>           1000   0                  0.00            0.00 0.00    0.00
>>           900    0                  0.00            0.00 0.00    0.00
>>           800    0                  0.00            0.00 0.00    0.00
>>           782    0                  0.00            0.00 0.00    0.00
>>      cpu1 wakeups        name            count
>>           irq009 acpi            3
>>    core1@state   hits          total(us)         avg(us) min(us) max(us)
>>          POLL    0                  0.00            0.00 0.00    0.00
>>          C1-IVB  0                  0.00            0.00 0.00    0.00
>>          C1E-IVB 0                  0.00            0.00 0.00    0.00
>>          C3-IVB  0                  0.00            0.00 0.00    0.00
>>          C6-IVB  0                  0.00            0.00 0.00    0.00
>>          C7-IVB  0                  0.00            0.00 0.00    0.00
>>      cpu2@state  hits          total(us)         avg(us) min(us) max(us)
>>           POLL   0                  0.00            0.00 0.00    0.00
>>           C1-IVB 11            288157.00        26196.09 16.00   200060.00
>>           C1E-VB 6             221601.00        36933.50 79.00   200066.00
>>           C3-IVB 0                  0.00            0.00 0.00    0.00
>>           C6-IVB 0                  0.00            0.00 0.00    0.00
>>           C7-IVB 950         87417466.00        92018.39 19.00   200074.00
>>           1701   0                  0.00            0.00 0.00    0.00
>>           1700   0                  0.00            0.00 0.00    0.00
>>           1600   0                  0.00            0.00 0.00    0.00
>>           1500   2                 34.00           17.00 11.00   23.00
>>           1400   0                  0.00            0.00 0.00    0.00
>>           1300   0                  0.00            0.00 0.00    0.00
>>           1200   0                  0.00            0.00 0.00    0.00
>>           1100   0                  0.00            0.00 0.00    0.00
>>           1000   0                  0.00            0.00 0.00    0.00
>>           900    0                  0.00            0.00 0.00    0.00
>>           800    0                  0.00            0.00 0.00    0.00
>>           782    745            18800.00           25.23 2.00    156.00
>>      cpu2 wakeups        name            count
>>           irq019 ahci            50
>>           irq009 acpi            17
>>      cpu3@state  hits          total(us)         avg(us) min(us) max(us)
>>           POLL   0                  0.00            0.00 0.00    0.00
>>           C1-IVB 0                  0.00            0.00 0.00    0.00
>>           C1E-VB 0                  0.00            0.00 0.00    0.00
>>           C3-IVB 0                  0.00            0.00 0.00    0.00
>>           C6-IVB 0                  0.00            0.00 0.00    0.00
>>           C7-IVB 0                  0.00            0.00 0.00    0.00
>>           1701   0                  0.00            0.00 0.00    0.00
>>           1700   0                  0.00            0.00 0.00    0.00
>>           1600   0                  0.00            0.00 0.00    0.00
>>           1500   0                  0.00            0.00 0.00    0.00
>>           1400   0                  0.00            0.00 0.00    0.00
>>           1300   0                  0.00            0.00 0.00    0.00
>>           1200   0                  0.00            0.00 0.00    0.00
>>           1100   0                  0.00            0.00 0.00    0.00
>>           1000   0                  0.00            0.00 0.00    0.00
>>           900    0                  0.00            0.00 0.00    0.00
>>           800    0                  0.00            0.00 0.00    0.00
>>           782    0                  0.00            0.00 0.00    0.00
>>      cpu3 wakeups        name            count
>>
>> ================ test with hackbench 3.14-rc8 + patchset ====================
>>
>> /usr/bin/hackbench -l 10000 -s 4096
>>
>> Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
>> Each sender will pass 10000 messages of 4096 bytes
>>
>> Time: 42.179
>>
>> Total trace buffer: 1846688 kB
>> clusterA@state  hits          total(us)         avg(us) min(us) max(us)
>>           POLL   0                  0.00            0.00 0.00    0.00
>>           C1-IVB 0                  0.00            0.00 0.00    0.00
>>           C1E-VB 0                  0.00            0.00 0.00    0.00
>>           C3-IVB 0                  0.00            0.00 0.00    0.00
>>           C6-IVB 0                  0.00            0.00 0.00    0.00
>>           C7-IVB 0                  0.00            0.00 0.00    0.00
>>    core0@state   hits          total(us)         avg(us) min(us) max(us)
>>          POLL    0                  0.00            0.00 0.00    0.00
>>          C1-IVB  0                  0.00            0.00 0.00    0.00
>>          C1E-IVB 0                  0.00            0.00 0.00    0.00
>>          C3-IVB  0                  0.00            0.00 0.00    0.00
>>          C6-IVB  0                  0.00            0.00 0.00    0.00
>>          C7-IVB  880         89157590.00       101315.44 0.00    400184.00
>>      cpu0@state  hits          total(us)         avg(us) min(us) max(us)
>>           POLL   0                  0.00            0.00 0.00    0.00
>>           C1-IVB 0                  0.00            0.00 0.00    0.00
>>           C1E-VB 1                233.00          233.00 233.00  233.00
>>           C3-IVB 1                260.00          260.00 260.00  260.00
>>           C6-IVB 0                  0.00            0.00 0.00    0.00
>>           C7-IVB 700         89162006.00       127374.29 182.00  400187.00
>>           1701   0                  0.00            0.00 0.00    0.00
>>           1700   0                  0.00            0.00 0.00    0.00
>>           1600   0                  0.00            0.00 0.00    0.00
>>           1500   0                  0.00            0.00 0.00    0.00
>>           1400   0                  0.00            0.00 0.00    0.00
>>           1300   0                  0.00            0.00 0.00    0.00
>>           1200   0                  0.00            0.00 0.00    0.00
>>           1100   0                  0.00            0.00 0.00    0.00
>>           1000   0                  0.00            0.00 0.00    0.00
>>           900    0                  0.00            0.00 0.00    0.00
>>           800    0                  0.00            0.00 0.00    0.00
>>           782    0                  0.00            0.00 0.00    0.00
>>      cpu0 wakeups        name            count
>>           irq009 acpi            2
>>      cpu1@state  hits          total(us)         avg(us) min(us) max(us)
>>           POLL   0                  0.00            0.00 0.00    0.00
>>           C1-IVB 0                  0.00            0.00 0.00    0.00
>>           C1E-VB 0                  0.00            0.00 0.00    0.00
>>           C3-IVB 0                  0.00            0.00 0.00    0.00
>>           C6-IVB 0                  0.00            0.00 0.00    0.00
>>           C7-IVB 334         89164805.00       266960.49 1.00    1500677.00
>>           1701   0                  0.00            0.00 0.00    0.00
>>           1700   0                  0.00            0.00 0.00    0.00
>>           1600   0                  0.00            0.00 0.00    0.00
>>           1500   0                  0.00            0.00 0.00    0.00
>>           1400   0                  0.00            0.00 0.00    0.00
>>           1300   0                  0.00            0.00 0.00    0.00
>>           1200   0                  0.00            0.00 0.00    0.00
>>           1100   0                  0.00            0.00 0.00    0.00
>>           1000   0                  0.00            0.00 0.00    0.00
>>           900    0                  0.00            0.00 0.00    0.00
>>           800    0                  0.00            0.00 0.00    0.00
>>           782    0                  0.00            0.00 0.00    0.00
>>      cpu1 wakeups        name            count
>>           irq009 acpi            6
>>    core1@state   hits          total(us)         avg(us) min(us) max(us)
>>          POLL    0                  0.00            0.00 0.00    0.00
>>          C1-IVB  0                  0.00            0.00 0.00    0.00
>>          C1E-IVB 0                  0.00            0.00 0.00    0.00
>>          C3-IVB  0                  0.00            0.00 0.00    0.00
>>          C6-IVB  0                  0.00            0.00 0.00    0.00
>>          C7-IVB  0                  0.00            0.00 0.00    0.00
>>      cpu2@state  hits          total(us)         avg(us) min(us) max(us)
>>           POLL   0                  0.00            0.00 0.00    0.00
>>           C1-IVB 19           2169047.00       114160.37 18.00   999129.00
>>           C1E-IB 0                  0.00            0.00 0.00    0.00
>>           C3-IVB 0                  0.00            0.00 0.00    0.00
>>           C6-IVB 0                  0.00            0.00 0.00    0.00
>>           C7-IVB 376         86993307.00       231365.18 20.00   1500682.00
>>           1701   0                  0.00            0.00 0.00    0.00
>>           1700   0                  0.00            0.00 0.00    0.00
>>           1600   0                  0.00            0.00 0.00    0.00
>>           1500   0                  0.00            0.00 0.00    0.00
>>           1400   0                  0.00            0.00 0.00    0.00
>>           1300   0                  0.00            0.00 0.00    0.00
>>           1200   0                  0.00            0.00 0.00    0.00
>>           1100   0                  0.00            0.00 0.00    0.00
>>           1000   0                  0.00            0.00 0.00    0.00
>>           900    0                  0.00            0.00 0.00    0.00
>>           800    0                  0.00            0.00 0.00    0.00
>>           782    0                  0.00            0.00 0.00    0.00
>>      cpu2 wakeups        name            count
>>           irq009 acpi            32
>>           irq019 ahci            45
>>      cpu3@state  hits          total(us)         avg(us) min(us) max(us)
>>           POLL   0                  0.00            0.00 0.00    0.00
>>           C1-IVB 0                  0.00            0.00 0.00    0.00
>>           C1E-VB 0                  0.00            0.00 0.00    0.00
>>           C3-IVB 0                  0.00            0.00 0.00    0.00
>>           C6-IVB 0                  0.00            0.00 0.00    0.00
>>           C7-IVB 0                  0.00            0.00 0.00    0.00
>>           1701   0                  0.00            0.00 0.00    0.00
>>           1700   0                  0.00            0.00 0.00    0.00
>>           1600   0                  0.00            0.00 0.00    0.00
>>           1500   0                  0.00            0.00 0.00    0.00
>>           1400   0                  0.00            0.00 0.00    0.00
>>           1300   0                  0.00            0.00 0.00    0.00
>>           1200   0                  0.00            0.00 0.00    0.00
>>           1100   0                  0.00            0.00 0.00    0.00
>>           1000   0                  0.00            0.00 0.00    0.00
>>           900    0                  0.00            0.00 0.00    0.00
>>           800    0                  0.00            0.00 0.00    0.00
>>           782    0                  0.00            0.00 0.00    0.00
>>      cpu3 wakeups        name            count
>>
>>
>> Daniel Lezcano (3):
>>    cpuidle: encapsulate power info in a separate structure
>>    idle: store the idle state the cpu is
>>    sched/fair: use the idle state info to choose the idlest cpu
>>
>>   arch/arm/include/asm/cpuidle.h       |    6 +-
>>   arch/arm/mach-exynos/cpuidle.c       |    4 +-
>>   drivers/acpi/processor_idle.c        |    4 +-
>>   drivers/base/power/domain.c          |    6 +-
>>   drivers/cpuidle/cpuidle-at91.c       |    4 +-
>>   drivers/cpuidle/cpuidle-big_little.c |    9 +--
>>   drivers/cpuidle/cpuidle-calxeda.c    |    6 +-
>>   drivers/cpuidle/cpuidle-kirkwood.c   |    4 +-
>>   drivers/cpuidle/cpuidle-powernv.c    |    8 +--
>>   drivers/cpuidle/cpuidle-pseries.c    |   12 ++--
>>   drivers/cpuidle/cpuidle-ux500.c      |   14 ++---
>>   drivers/cpuidle/cpuidle-zynq.c       |    4 +-
>>   drivers/cpuidle/driver.c             |    6 +-
>>   drivers/cpuidle/governors/ladder.c   |   14 +++--
>>   drivers/cpuidle/governors/menu.c     |    8 +--
>>   drivers/cpuidle/sysfs.c              |    2 +-
>>   drivers/idle/intel_idle.c            |  112 +++++++++++++++++-----------------
>>   include/linux/cpuidle.h              |   10 ++-
>>   kernel/sched/fair.c                  |   46 ++++++++++++--
>>   kernel/sched/idle.c                  |   17 +++++-
>>   kernel/sched/sched.h                 |    5 ++
>>   21 files changed, 180 insertions(+), 121 deletions(-)
>>
>> --
>> 1.7.9.5
>>


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs




* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-03-31 15:55   ` Daniel Lezcano
@ 2014-04-01  7:16     ` Vincent Guittot
  2014-04-01  7:43       ` Daniel Lezcano
  0 siblings, 1 reply; 47+ messages in thread
From: Vincent Guittot @ 2014-04-01  7:16 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, rjw, Nicolas Pitre,
	linux-pm, Alex Shi, Morten Rasmussen

On 31 March 2014 17:55, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> On 03/31/2014 03:52 PM, Vincent Guittot wrote:
>>
>> On 28 March 2014 13:29, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>>>
>>> The following patchset provides an interaction between cpuidle and the
>>> scheduler.
>>>
>>> The first patch encapsulates the information needed by the scheduler in
>>> a separate cpuidle structure. The second one stores the pointer to this
>>> structure when entering idle. The third one uses this information to
>>> find the idlest cpu.
>>>
>>> After some basic testing with hackbench, there appears to be a small
>>> improvement in performance and an improvement in the duration of the
>>> idle states (which provides a better power saving).
>>>
>>> The measurements were done with the 'idlestat' tool previously posted
>>> on this mailing list.
>>>
>>> So the change benefits both performance and power saving.
>>
>>
>> Hi Daniel,
>>
>> I have looked at your results and I'm a bit surprised that you have so
>> much time in C-state with a test that involves 400 tasks on a dual-core
>> HT system. You shouldn't have any CPUs in idle state when running
>> hackbench; the total time of core0@state in C7-IVB is 87932131.00 us,
>> which is quite huge for a bench that runs for 44 sec. Or am I doing
>> something wrong in the interpretation of the results?
>
>
> No, actually I mixed up the hackbench output from the runs with and
> without idlestat.
>
> The hackbench results below are from the runs without idlestat.
>
> The idlestat results are consistent, and idlestat does add a
> non-negligible overhead, as it affects the hackbench results.
>
> So to summarize, hackbench has been run 4 times.
>
> 1, 2 : without idlestat, with and without the patchset - hackbench results
> ~42 secs
>
> 3, 4 : with idlestat, with and without the patchset - hackbench results ~87
> secs
>
> At first glance, the results are consistent but I will double check
> them.
>
> Do you have a suggestion for a benchmarking program?

We are working on a benchmark that can generate a medium-load pattern
with idle CPUs, but it is not available yet. In the meantime, one
benchmark that exercises idle time is cyclictest. It will not give you
performance results, only scheduling latency, which might be what you
are looking for.
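
As a quick sanity check on the figures quoted above, the C7 residency can be compared with the reported run time. This is only a sketch: the two constants are copied from the quoted results, and the function and variable names are illustrative, not part of idlestat or hackbench.

```python
# Figures copied from the quoted results (run without the patchset).
HACKBENCH_TIME_S = 44.433        # "Time:" line reported by hackbench
CORE0_C7_TOTAL_US = 87932131.00  # core0@state C7-IVB total(us) from idlestat


def residency_ratio(idle_total_us, wall_time_s):
    """Return idle residency divided by wall-clock time, both in seconds."""
    return (idle_total_us / 1_000_000) / wall_time_s


ratio = residency_ratio(CORE0_C7_TOTAL_US, HACKBENCH_TIME_S)

# A core cannot be idle for longer than the run lasted, so a ratio above 1
# means the two figures do not come from the same run.
same_run_plausible = ratio <= 1.0

print(f"C7 residency: {CORE0_C7_TOTAL_US / 1e6:.1f} s, "
      f"ratio to run time: {ratio:.2f}, plausible: {same_run_plausible}")
```

A ratio well above 1 fits the explanation given in the thread: the hackbench time comes from a run without idlestat, while the residency comes from a run with it.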

Vincent

>
> Thanks !
>
>   -- Daniel
>
>
>
>>> The select_idle_sibling could be also improved in the same way.
>>>
>>> ====================== test with hackbench 3.14-rc8 =========================
>>>
>>> /usr/bin/hackbench -l 10000 -s 4096
>>>
>>> Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
>>> Each sender will pass 10000 messages of 4096 bytes
>>>
>>> Time: 44.433
>>>
>>> Total trace buffer: 1846688 kB
>>> clusterA@state  hits          total(us)         avg(us) min(us) max(us)
>>>           POLL   0                  0.00            0.00 0.00    0.00
>>>           C1-IVB 0                  0.00            0.00 0.00    0.00
>>>           C1E-VB 0                  0.00            0.00 0.00    0.00
>>>           C3-IVB 0                  0.00            0.00 0.00    0.00
>>>           C6-IVB 0                  0.00            0.00 0.00    0.00
>>>           C7-IVB 0                  0.00            0.00 0.00    0.00
>>>    core0@state   hits          total(us)         avg(us) min(us) max(us)
>>>          POLL    0                  0.00            0.00 0.00    0.00
>>>          C1-IVB  0                  0.00            0.00 0.00    0.00
>>>          C1E-IVB 0                  0.00            0.00 0.00    0.00
>>>          C3-IVB  0                  0.00            0.00 0.00    0.00
>>>          C6-IVB  0                  0.00            0.00 0.00    0.00
>>>          C7-IVB  1396        87932131.00        62988.63 0.00    320146.00
>>>      cpu0@state  hits          total(us)         avg(us) min(us) max(us)
>>>           POLL   0                  0.00            0.00 0.00    0.00
>>>           C1-IVB 1                 14.00           14.00 14.00   14.00
>>>           C1E-VB 0                  0.00            0.00 0.00    0.00
>>>           C3-IVB 1                262.00          262.00 262.00  262.00
>>>           C6-IVB 0                  0.00            0.00 0.00    0.00
>>>           C7-IVB 1180        87938177.00        74523.88 1.00    320147.00
>>>           1701   0                  0.00            0.00 0.00    0.00
>>>           1700   0                  0.00            0.00 0.00    0.00
>>>           1600   0                  0.00            0.00 0.00    0.00
>>>           1500   0                  0.00            0.00 0.00    0.00
>>>           1400   0                  0.00            0.00 0.00    0.00
>>>           1300   0                  0.00            0.00 0.00    0.00
>>>           1200   0                  0.00            0.00 0.00    0.00
>>>           1100   0                  0.00            0.00 0.00    0.00
>>>           1000   0                  0.00            0.00 0.00    0.00
>>>           900    0                  0.00            0.00 0.00    0.00
>>>           800    0                  0.00            0.00 0.00    0.00
>>>           782    0                  0.00            0.00 0.00    0.00
>>>      cpu0 wakeups        name            count
>>>           irq009 acpi            1
>>>      cpu1@state  hits          total(us)         avg(us) min(us) max(us)
>>>           POLL   0                  0.00            0.00 0.00    0.00
>>>           C1-IVB 0                  0.00            0.00 0.00    0.00
>>>           C1E-VB 0                  0.00            0.00 0.00    0.00
>>>           C3-IVB 0                  0.00            0.00 0.00    0.00
>>>           C6-IVB 0                  0.00            0.00 0.00    0.00
>>>           C7-IVB 475         87941356.00       185139.70 322.00  1500690.00
>>>           1701   0                  0.00            0.00 0.00    0.00
>>>           1700   0                  0.00            0.00 0.00    0.00
>>>           1600   0                  0.00            0.00 0.00    0.00
>>>           1500   0                  0.00            0.00 0.00    0.00
>>>           1400   0                  0.00            0.00 0.00    0.00
>>>           1300   0                  0.00            0.00 0.00    0.00
>>>           1200   0                  0.00            0.00 0.00    0.00
>>>           1100   0                  0.00            0.00 0.00    0.00
>>>           1000   0                  0.00            0.00 0.00    0.00
>>>           900    0                  0.00            0.00 0.00    0.00
>>>           800    0                  0.00            0.00 0.00    0.00
>>>           782    0                  0.00            0.00 0.00    0.00
>>>      cpu1 wakeups        name            count
>>>           irq009 acpi            3
>>>    core1@state   hits          total(us)         avg(us) min(us) max(us)
>>>          POLL    0                  0.00            0.00 0.00    0.00
>>>          C1-IVB  0                  0.00            0.00 0.00    0.00
>>>          C1E-IVB 0                  0.00            0.00 0.00    0.00
>>>          C3-IVB  0                  0.00            0.00 0.00    0.00
>>>          C6-IVB  0                  0.00            0.00 0.00    0.00
>>>          C7-IVB  0                  0.00            0.00 0.00    0.00
>>>      cpu2@state  hits          total(us)         avg(us) min(us) max(us)
>>>           POLL   0                  0.00            0.00 0.00    0.00
>>>           C1-IVB 11            288157.00        26196.09 16.00   200060.00
>>>           C1E-VB 6             221601.00        36933.50 79.00   200066.00
>>>           C3-IVB 0                  0.00            0.00 0.00    0.00
>>>           C6-IVB 0                  0.00            0.00 0.00    0.00
>>>           C7-IVB 950         87417466.00        92018.39 19.00   200074.00
>>>           1701   0                  0.00            0.00 0.00    0.00
>>>           1700   0                  0.00            0.00 0.00    0.00
>>>           1600   0                  0.00            0.00 0.00    0.00
>>>           1500   2                 34.00           17.00 11.00   23.00
>>>           1400   0                  0.00            0.00 0.00    0.00
>>>           1300   0                  0.00            0.00 0.00    0.00
>>>           1200   0                  0.00            0.00 0.00    0.00
>>>           1100   0                  0.00            0.00 0.00    0.00
>>>           1000   0                  0.00            0.00 0.00    0.00
>>>           900    0                  0.00            0.00 0.00    0.00
>>>           800    0                  0.00            0.00 0.00    0.00
>>>           782    745            18800.00           25.23 2.00    156.00
>>>      cpu2 wakeups        name            count
>>>           irq019 ahci            50
>>>           irq009 acpi            17
>>>      cpu3@state  hits          total(us)         avg(us) min(us) max(us)
>>>           POLL   0                  0.00            0.00 0.00    0.00
>>>           C1-IVB 0                  0.00            0.00 0.00    0.00
>>>           C1E-VB 0                  0.00            0.00 0.00    0.00
>>>           C3-IVB 0                  0.00            0.00 0.00    0.00
>>>           C6-IVB 0                  0.00            0.00 0.00    0.00
>>>           C7-IVB 0                  0.00            0.00 0.00    0.00
>>>           1701   0                  0.00            0.00 0.00    0.00
>>>           1700   0                  0.00            0.00 0.00    0.00
>>>           1600   0                  0.00            0.00 0.00    0.00
>>>           1500   0                  0.00            0.00 0.00    0.00
>>>           1400   0                  0.00            0.00 0.00    0.00
>>>           1300   0                  0.00            0.00 0.00    0.00
>>>           1200   0                  0.00            0.00 0.00    0.00
>>>           1100   0                  0.00            0.00 0.00    0.00
>>>           1000   0                  0.00            0.00 0.00    0.00
>>>           900    0                  0.00            0.00 0.00    0.00
>>>           800    0                  0.00            0.00 0.00    0.00
>>>           782    0                  0.00            0.00 0.00    0.00
>>>      cpu3 wakeups        name            count
>>>
>>> ================ test with hackbench 3.14-rc8 + patchset ====================
>>>
>>> /usr/bin/hackbench -l 10000 -s 4096
>>>
>>> Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
>>> Each sender will pass 10000 messages of 4096 bytes
>>>
>>> Time: 42.179
>>>
>>> Total trace buffer: 1846688 kB
>>> clusterA@state  hits          total(us)         avg(us) min(us) max(us)
>>>           POLL   0                  0.00            0.00 0.00    0.00
>>>           C1-IVB 0                  0.00            0.00 0.00    0.00
>>>           C1E-VB 0                  0.00            0.00 0.00    0.00
>>>           C3-IVB 0                  0.00            0.00 0.00    0.00
>>>           C6-IVB 0                  0.00            0.00 0.00    0.00
>>>           C7-IVB 0                  0.00            0.00 0.00    0.00
>>>    core0@state   hits          total(us)         avg(us) min(us) max(us)
>>>          POLL    0                  0.00            0.00 0.00    0.00
>>>          C1-IVB  0                  0.00            0.00 0.00    0.00
>>>          C1E-IVB 0                  0.00            0.00 0.00    0.00
>>>          C3-IVB  0                  0.00            0.00 0.00    0.00
>>>          C6-IVB  0                  0.00            0.00 0.00    0.00
>>>          C7-IVB  880         89157590.00       101315.44 0.00    400184.00
>>>      cpu0@state  hits          total(us)         avg(us) min(us) max(us)
>>>           POLL   0                  0.00            0.00 0.00    0.00
>>>           C1-IVB 0                  0.00            0.00 0.00    0.00
>>>           C1E-VB 1                233.00          233.00 233.00  233.00
>>>           C3-IVB 1                260.00          260.00 260.00  260.00
>>>           C6-IVB 0                  0.00            0.00 0.00    0.00
>>>           C7-IVB 700         89162006.00       127374.29 182.00  400187.00
>>>           1701   0                  0.00            0.00 0.00    0.00
>>>           1700   0                  0.00            0.00 0.00    0.00
>>>           1600   0                  0.00            0.00 0.00    0.00
>>>           1500   0                  0.00            0.00 0.00    0.00
>>>           1400   0                  0.00            0.00 0.00    0.00
>>>           1300   0                  0.00            0.00 0.00    0.00
>>>           1200   0                  0.00            0.00 0.00    0.00
>>>           1100   0                  0.00            0.00 0.00    0.00
>>>           1000   0                  0.00            0.00 0.00    0.00
>>>           900    0                  0.00            0.00 0.00    0.00
>>>           800    0                  0.00            0.00 0.00    0.00
>>>           782    0                  0.00            0.00 0.00    0.00
>>>      cpu0 wakeups        name            count
>>>           irq009 acpi            2
>>>      cpu1@state  hits          total(us)         avg(us) min(us) max(us)
>>>           POLL   0                  0.00            0.00 0.00    0.00
>>>           C1-IVB 0                  0.00            0.00 0.00    0.00
>>>           C1E-VB 0                  0.00            0.00 0.00    0.00
>>>           C3-IVB 0                  0.00            0.00 0.00    0.00
>>>           C6-IVB 0                  0.00            0.00 0.00    0.00
>>>           C7-IVB 334         89164805.00       266960.49 1.00    1500677.00
>>>           1701   0                  0.00            0.00 0.00    0.00
>>>           1700   0                  0.00            0.00 0.00    0.00
>>>           1600   0                  0.00            0.00 0.00    0.00
>>>           1500   0                  0.00            0.00 0.00    0.00
>>>           1400   0                  0.00            0.00 0.00    0.00
>>>           1300   0                  0.00            0.00 0.00    0.00
>>>           1200   0                  0.00            0.00 0.00    0.00
>>>           1100   0                  0.00            0.00 0.00    0.00
>>>           1000   0                  0.00            0.00 0.00    0.00
>>>           900    0                  0.00            0.00 0.00    0.00
>>>           800    0                  0.00            0.00 0.00    0.00
>>>           782    0                  0.00            0.00 0.00    0.00
>>>      cpu1 wakeups        name            count
>>>           irq009 acpi            6
>>>    core1@state   hits          total(us)         avg(us) min(us) max(us)
>>>          POLL    0                  0.00            0.00 0.00    0.00
>>>          C1-IVB  0                  0.00            0.00 0.00    0.00
>>>          C1E-IVB 0                  0.00            0.00 0.00    0.00
>>>          C3-IVB  0                  0.00            0.00 0.00    0.00
>>>          C6-IVB  0                  0.00            0.00 0.00    0.00
>>>          C7-IVB  0                  0.00            0.00 0.00    0.00
>>>      cpu2@state  hits          total(us)         avg(us) min(us) max(us)
>>>           POLL   0                  0.00            0.00 0.00    0.00
>>>           C1-IVB 19           2169047.00       114160.37 18.00   999129.00
>>>           C1E-IB 0                  0.00            0.00 0.00    0.00
>>>           C3-IVB 0                  0.00            0.00 0.00    0.00
>>>           C6-IVB 0                  0.00            0.00 0.00    0.00
>>>           C7-IVB 376         86993307.00       231365.18 20.00   1500682.00
>>>           1701   0                  0.00            0.00 0.00    0.00
>>>           1700   0                  0.00            0.00 0.00    0.00
>>>           1600   0                  0.00            0.00 0.00    0.00
>>>           1500   0                  0.00            0.00 0.00    0.00
>>>           1400   0                  0.00            0.00 0.00    0.00
>>>           1300   0                  0.00            0.00 0.00    0.00
>>>           1200   0                  0.00            0.00 0.00    0.00
>>>           1100   0                  0.00            0.00 0.00    0.00
>>>           1000   0                  0.00            0.00 0.00    0.00
>>>           900    0                  0.00            0.00 0.00    0.00
>>>           800    0                  0.00            0.00 0.00    0.00
>>>           782    0                  0.00            0.00 0.00    0.00
>>>      cpu2 wakeups        name            count
>>>           irq009 acpi            32
>>>           irq019 ahci            45
>>>      cpu3@state  hits          total(us)         avg(us) min(us) max(us)
>>>           POLL   0                  0.00            0.00 0.00    0.00
>>>           C1-IVB 0                  0.00            0.00 0.00    0.00
>>>           C1E-VB 0                  0.00            0.00 0.00    0.00
>>>           C3-IVB 0                  0.00            0.00 0.00    0.00
>>>           C6-IVB 0                  0.00            0.00 0.00    0.00
>>>           C7-IVB 0                  0.00            0.00 0.00    0.00
>>>           1701   0                  0.00            0.00 0.00    0.00
>>>           1700   0                  0.00            0.00 0.00    0.00
>>>           1600   0                  0.00            0.00 0.00    0.00
>>>           1500   0                  0.00            0.00 0.00    0.00
>>>           1400   0                  0.00            0.00 0.00    0.00
>>>           1300   0                  0.00            0.00 0.00    0.00
>>>           1200   0                  0.00            0.00 0.00    0.00
>>>           1100   0                  0.00            0.00 0.00    0.00
>>>           1000   0                  0.00            0.00 0.00    0.00
>>>           900    0                  0.00            0.00 0.00    0.00
>>>           800    0                  0.00            0.00 0.00    0.00
>>>           782    0                  0.00            0.00 0.00    0.00
>>>      cpu3 wakeups        name            count
>>>
>>>
>>> Daniel Lezcano (3):
>>>    cpuidle: encapsulate power info in a separate structure
>>>    idle: store the idle state the cpu is
>>>    sched/fair: use the idle state info to choose the idlest cpu
>>>
>>>   arch/arm/include/asm/cpuidle.h       |    6 +-
>>>   arch/arm/mach-exynos/cpuidle.c       |    4 +-
>>>   drivers/acpi/processor_idle.c        |    4 +-
>>>   drivers/base/power/domain.c          |    6 +-
>>>   drivers/cpuidle/cpuidle-at91.c       |    4 +-
>>>   drivers/cpuidle/cpuidle-big_little.c |    9 +--
>>>   drivers/cpuidle/cpuidle-calxeda.c    |    6 +-
>>>   drivers/cpuidle/cpuidle-kirkwood.c   |    4 +-
>>>   drivers/cpuidle/cpuidle-powernv.c    |    8 +--
>>>   drivers/cpuidle/cpuidle-pseries.c    |   12 ++--
>>>   drivers/cpuidle/cpuidle-ux500.c      |   14 ++---
>>>   drivers/cpuidle/cpuidle-zynq.c       |    4 +-
>>>   drivers/cpuidle/driver.c             |    6 +-
>>>   drivers/cpuidle/governors/ladder.c   |   14 +++--
>>>   drivers/cpuidle/governors/menu.c     |    8 +--
>>>   drivers/cpuidle/sysfs.c              |    2 +-
>>>   drivers/idle/intel_idle.c            |  112 +++++++++++++++++-----------------
>>>   include/linux/cpuidle.h              |   10 ++-
>>>   kernel/sched/fair.c                  |   46 ++++++++++++--
>>>   kernel/sched/idle.c                  |   17 +++++-
>>>   kernel/sched/sched.h                 |    5 ++
>>>   21 files changed, 180 insertions(+), 121 deletions(-)
>>>
>>> --
>>> 1.7.9.5
>>>
>
>
> --
>  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>
>


* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-01  7:16     ` Vincent Guittot
@ 2014-04-01  7:43       ` Daniel Lezcano
  2014-04-01  9:05         ` Vincent Guittot
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-01  7:43 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, rjw, Nicolas Pitre,
	linux-pm, Alex Shi, Morten Rasmussen

On 04/01/2014 09:16 AM, Vincent Guittot wrote:
> On 31 March 2014 17:55, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>> On 03/31/2014 03:52 PM, Vincent Guittot wrote:
>>>
>>> On 28 March 2014 13:29, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>>>>
>>>> The following patchset provides an interaction between cpuidle and the
>>>> scheduler.
>>>>
>>>> The first patch encapsulates the information needed by the scheduler in a
>>>> separate cpuidle structure. The second one stores the pointer to this
>>>> structure when entering idle. The third one uses this information to
>>>> decide which cpu is the idlest.
>>>>
>>>> After some basic testing with hackbench, it appears there is a (small)
>>>> improvement in performance and in the duration of the idle states (which
>>>> provides better power saving).
>>>>
>>>> The measurement has been done with the 'idlestat' tool previously posted
>>>> on this mailing list.
>>>>
>>>> So the benefit is good for both sides: performance and power saving.
>>>
>>>
>>> Hi Daniel,
>>>
>>> I have looked at your results and I'm a bit surprised that you have so
>>> much time in C-states with a test that involves 400 tasks on a dual-core
>>> HT system. You shouldn't have any CPUs in idle state when
>>> running hackbench; the total time of core0@state in C7-IVB is
>>> 87932131.00 us, which is quite huge for a bench that runs 44 sec. Or
>>> am I doing something wrong in the interpretation of the results?
>>
>>
>> No, actually I mixed up the hackbench output from the runs without idlestat
>> and the runs with idlestat.
>>
>> The hackbench results below are without idlestat.
>>
>> The idlestat results are consistent, and indeed idlestat adds a
>> non-negligible overhead, since it impacts the hackbench results.
>>
>> So to summarize, hackbench has been run 4 times.
>>
>> 1, 2 : without idlestat, with and without the patchset - hackbench results
>> ~42 secs
>>
>> 3, 4 : with idlestat, with and without the patchset - hackbench results ~87
>> secs
>>
>> At first glance, the results are consistent but I will double check
>> them.
>>
>> Do you have a suggestion for a benchmarking program ?
>
> We are working on a bench which can generate a middle-load pattern with
> idle CPUs, but it's not available yet. In the meantime, one bench that
> plays with idle time is cyclictest; it will not give you performance
> results but only scheduling latency, which might be what you are
> looking for.

Yeah, thanks. I believe I know what is in the rt-tests package :)

What I meant is: what kind of values would you like to see with this
patchset?
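
As a side note on the figures quoted above, a quick arithmetic check shows
why the idlestat totals cannot belong to the 44 s run: 87932131 us is about
87.9 s of C7 residency, roughly twice the reported hackbench time. Below is a
minimal sketch in C of that check. The helper names (avg_residency_us,
residency_exceeds_runtime) are made up for illustration and are not taken
from idlestat.

```c
#include <assert.h>

/*
 * Mean residency per idle-state entry, in microseconds.
 * Returns 0 when the state was never entered (hits == 0).
 */
static double avg_residency_us(double total_us, unsigned long hits)
{
    return hits ? total_us / (double)hits : 0.0;
}

/*
 * Non-zero if the accumulated idle residency is longer than the
 * wall-clock run time it is being compared against, which means the
 * two numbers cannot come from the same run.
 */
static int residency_exceeds_runtime(double total_us, double runtime_s)
{
    assert(runtime_s >= 0.0);
    return total_us > runtime_s * 1e6;
}
```

With the core0 C7-IVB row (1396 hits, 87932131.00 us total),
avg_residency_us() reproduces the avg(us) column, and
residency_exceeds_runtime(87932131.0, 44.433) is true.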



>>>> The select_idle_sibling could be also improved in the same way.
>>>>
>>>> ====================== test with hackbench 3.14-rc8 =========================
>>>>
>>>> /usr/bin/hackbench -l 10000 -s 4096
>>>>
>>>> Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
>>>> Each sender will pass 10000 messages of 4096 bytes
>>>>
>>>> Time: 44.433
>>>>
>>>> Total trace buffer: 1846688 kB
>>>> clusterA@state  hits          total(us)         avg(us) min(us) max(us)
>>>>            POLL   0                  0.00            0.00 0.00    0.00
>>>>            C1-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C1E-VB 0                  0.00            0.00 0.00    0.00
>>>>            C3-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C6-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C7-IVB 0                  0.00            0.00 0.00    0.00
>>>>     core0@state   hits          total(us)         avg(us) min(us) max(us)
>>>>           POLL    0                  0.00            0.00 0.00    0.00
>>>>           C1-IVB  0                  0.00            0.00 0.00    0.00
>>>>           C1E-IVB 0                  0.00            0.00 0.00    0.00
>>>>           C3-IVB  0                  0.00            0.00 0.00    0.00
>>>>           C6-IVB  0                  0.00            0.00 0.00    0.00
>>>>           C7-IVB  1396        87932131.00        62988.63 0.00    320146.00
>>>>       cpu0@state  hits          total(us)         avg(us) min(us) max(us)
>>>>            POLL   0                  0.00            0.00 0.00    0.00
>>>>            C1-IVB 1                 14.00           14.00 14.00   14.00
>>>>            C1E-VB 0                  0.00            0.00 0.00    0.00
>>>>            C3-IVB 1                262.00          262.00 262.00  262.00
>>>>            C6-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C7-IVB 1180        87938177.00        74523.88 1.00    320147.00
>>>>            1701   0                  0.00            0.00 0.00    0.00
>>>>            1700   0                  0.00            0.00 0.00    0.00
>>>>            1600   0                  0.00            0.00 0.00    0.00
>>>>            1500   0                  0.00            0.00 0.00    0.00
>>>>            1400   0                  0.00            0.00 0.00    0.00
>>>>            1300   0                  0.00            0.00 0.00    0.00
>>>>            1200   0                  0.00            0.00 0.00    0.00
>>>>            1100   0                  0.00            0.00 0.00    0.00
>>>>            1000   0                  0.00            0.00 0.00    0.00
>>>>            900    0                  0.00            0.00 0.00    0.00
>>>>            800    0                  0.00            0.00 0.00    0.00
>>>>            782    0                  0.00            0.00 0.00    0.00
>>>>       cpu0 wakeups        name            count
>>>>            irq009 acpi            1
>>>>       cpu1@state  hits          total(us)         avg(us) min(us) max(us)
>>>>            POLL   0                  0.00            0.00 0.00    0.00
>>>>            C1-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C1E-VB 0                  0.00            0.00 0.00    0.00
>>>>            C3-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C6-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C7-IVB 475         87941356.00       185139.70 322.00  1500690.00
>>>>            1701   0                  0.00            0.00 0.00    0.00
>>>>            1700   0                  0.00            0.00 0.00    0.00
>>>>            1600   0                  0.00            0.00 0.00    0.00
>>>>            1500   0                  0.00            0.00 0.00    0.00
>>>>            1400   0                  0.00            0.00 0.00    0.00
>>>>            1300   0                  0.00            0.00 0.00    0.00
>>>>            1200   0                  0.00            0.00 0.00    0.00
>>>>            1100   0                  0.00            0.00 0.00    0.00
>>>>            1000   0                  0.00            0.00 0.00    0.00
>>>>            900    0                  0.00            0.00 0.00    0.00
>>>>            800    0                  0.00            0.00 0.00    0.00
>>>>            782    0                  0.00            0.00 0.00    0.00
>>>>       cpu1 wakeups        name            count
>>>>            irq009 acpi            3
>>>>     core1@state   hits          total(us)         avg(us) min(us) max(us)
>>>>           POLL    0                  0.00            0.00 0.00    0.00
>>>>           C1-IVB  0                  0.00            0.00 0.00    0.00
>>>>           C1E-IVB 0                  0.00            0.00 0.00    0.00
>>>>           C3-IVB  0                  0.00            0.00 0.00    0.00
>>>>           C6-IVB  0                  0.00            0.00 0.00    0.00
>>>>           C7-IVB  0                  0.00            0.00 0.00    0.00
>>>>       cpu2@state  hits          total(us)         avg(us) min(us) max(us)
>>>>            POLL   0                  0.00            0.00 0.00    0.00
>>>>            C1-IVB 11            288157.00        26196.09 16.00   200060.00
>>>>            C1E-VB 6             221601.00        36933.50 79.00   200066.00
>>>>            C3-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C6-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C7-IVB 950         87417466.00        92018.39 19.00   200074.00
>>>>            1701   0                  0.00            0.00 0.00    0.00
>>>>            1700   0                  0.00            0.00 0.00    0.00
>>>>            1600   0                  0.00            0.00 0.00    0.00
>>>>            1500   2                 34.00           17.00 11.00   23.00
>>>>            1400   0                  0.00            0.00 0.00    0.00
>>>>            1300   0                  0.00            0.00 0.00    0.00
>>>>            1200   0                  0.00            0.00 0.00    0.00
>>>>            1100   0                  0.00            0.00 0.00    0.00
>>>>            1000   0                  0.00            0.00 0.00    0.00
>>>>            900    0                  0.00            0.00 0.00    0.00
>>>>            800    0                  0.00            0.00 0.00    0.00
>>>>            782    745            18800.00           25.23 2.00    156.00
>>>>       cpu2 wakeups        name            count
>>>>            irq019 ahci            50
>>>>            irq009 acpi            17
>>>>       cpu3@state  hits          total(us)         avg(us) min(us) max(us)
>>>>            POLL   0                  0.00            0.00 0.00    0.00
>>>>            C1-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C1E-VB 0                  0.00            0.00 0.00    0.00
>>>>            C3-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C6-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C7-IVB 0                  0.00            0.00 0.00    0.00
>>>>            1701   0                  0.00            0.00 0.00    0.00
>>>>            1700   0                  0.00            0.00 0.00    0.00
>>>>            1600   0                  0.00            0.00 0.00    0.00
>>>>            1500   0                  0.00            0.00 0.00    0.00
>>>>            1400   0                  0.00            0.00 0.00    0.00
>>>>            1300   0                  0.00            0.00 0.00    0.00
>>>>            1200   0                  0.00            0.00 0.00    0.00
>>>>            1100   0                  0.00            0.00 0.00    0.00
>>>>            1000   0                  0.00            0.00 0.00    0.00
>>>>            900    0                  0.00            0.00 0.00    0.00
>>>>            800    0                  0.00            0.00 0.00    0.00
>>>>            782    0                  0.00            0.00 0.00    0.00
>>>>       cpu3 wakeups        name            count
>>>>
>>>> ================ test with hackbench 3.14-rc8 + patchset ====================
>>>>
>>>> /usr/bin/hackbench -l 10000 -s 4096
>>>>
>>>> Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
>>>> Each sender will pass 10000 messages of 4096 bytes
>>>>
>>>> Time: 42.179
>>>>
>>>> Total trace buffer: 1846688 kB
>>>> clusterA@state  hits          total(us)         avg(us) min(us) max(us)
>>>>            POLL   0                  0.00            0.00 0.00    0.00
>>>>            C1-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C1E-VB 0                  0.00            0.00 0.00    0.00
>>>>            C3-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C6-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C7-IVB 0                  0.00            0.00 0.00    0.00
>>>>     core0@state   hits          total(us)         avg(us) min(us) max(us)
>>>>           POLL    0                  0.00            0.00 0.00    0.00
>>>>           C1-IVB  0                  0.00            0.00 0.00    0.00
>>>>           C1E-IVB 0                  0.00            0.00 0.00    0.00
>>>>           C3-IVB  0                  0.00            0.00 0.00    0.00
>>>>           C6-IVB  0                  0.00            0.00 0.00    0.00
>>>>           C7-IVB  880         89157590.00       101315.44 0.00    400184.00
>>>>       cpu0@state  hits          total(us)         avg(us) min(us) max(us)
>>>>            POLL   0                  0.00            0.00 0.00    0.00
>>>>            C1-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C1E-VB 1                233.00          233.00 233.00  233.00
>>>>            C3-IVB 1                260.00          260.00 260.00  260.00
>>>>            C6-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C7-IVB 700         89162006.00       127374.29 182.00  400187.00
>>>>            1701   0                  0.00            0.00 0.00    0.00
>>>>            1700   0                  0.00            0.00 0.00    0.00
>>>>            1600   0                  0.00            0.00 0.00    0.00
>>>>            1500   0                  0.00            0.00 0.00    0.00
>>>>            1400   0                  0.00            0.00 0.00    0.00
>>>>            1300   0                  0.00            0.00 0.00    0.00
>>>>            1200   0                  0.00            0.00 0.00    0.00
>>>>            1100   0                  0.00            0.00 0.00    0.00
>>>>            1000   0                  0.00            0.00 0.00    0.00
>>>>            900    0                  0.00            0.00 0.00    0.00
>>>>            800    0                  0.00            0.00 0.00    0.00
>>>>            782    0                  0.00            0.00 0.00    0.00
>>>>       cpu0 wakeups        name            count
>>>>            irq009 acpi            2
>>>>       cpu1@state  hits          total(us)         avg(us) min(us) max(us)
>>>>            POLL   0                  0.00            0.00 0.00    0.00
>>>>            C1-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C1E-VB 0                  0.00            0.00 0.00    0.00
>>>>            C3-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C6-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C7-IVB 334         89164805.00       266960.49 1.00    1500677.00
>>>>            1701   0                  0.00            0.00 0.00    0.00
>>>>            1700   0                  0.00            0.00 0.00    0.00
>>>>            1600   0                  0.00            0.00 0.00    0.00
>>>>            1500   0                  0.00            0.00 0.00    0.00
>>>>            1400   0                  0.00            0.00 0.00    0.00
>>>>            1300   0                  0.00            0.00 0.00    0.00
>>>>            1200   0                  0.00            0.00 0.00    0.00
>>>>            1100   0                  0.00            0.00 0.00    0.00
>>>>            1000   0                  0.00            0.00 0.00    0.00
>>>>            900    0                  0.00            0.00 0.00    0.00
>>>>            800    0                  0.00            0.00 0.00    0.00
>>>>            782    0                  0.00            0.00 0.00    0.00
>>>>       cpu1 wakeups        name            count
>>>>            irq009 acpi            6
>>>>     core1@state   hits          total(us)         avg(us) min(us) max(us)
>>>>           POLL    0                  0.00            0.00 0.00    0.00
>>>>           C1-IVB  0                  0.00            0.00 0.00    0.00
>>>>           C1E-IVB 0                  0.00            0.00 0.00    0.00
>>>>           C3-IVB  0                  0.00            0.00 0.00    0.00
>>>>           C6-IVB  0                  0.00            0.00 0.00    0.00
>>>>           C7-IVB  0                  0.00            0.00 0.00    0.00
>>>>       cpu2@state  hits          total(us)         avg(us) min(us) max(us)
>>>>            POLL   0                  0.00            0.00 0.00    0.00
>>>>            C1-IVB 19           2169047.00       114160.37 18.00   999129.00
>>>>            C1E-IB 0                  0.00            0.00 0.00    0.00
>>>>            C3-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C6-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C7-IVB 376         86993307.00       231365.18 20.00   1500682.00
>>>>            1701   0                  0.00            0.00 0.00    0.00
>>>>            1700   0                  0.00            0.00 0.00    0.00
>>>>            1600   0                  0.00            0.00 0.00    0.00
>>>>            1500   0                  0.00            0.00 0.00    0.00
>>>>            1400   0                  0.00            0.00 0.00    0.00
>>>>            1300   0                  0.00            0.00 0.00    0.00
>>>>            1200   0                  0.00            0.00 0.00    0.00
>>>>            1100   0                  0.00            0.00 0.00    0.00
>>>>            1000   0                  0.00            0.00 0.00    0.00
>>>>            900    0                  0.00            0.00 0.00    0.00
>>>>            800    0                  0.00            0.00 0.00    0.00
>>>>            782    0                  0.00            0.00 0.00    0.00
>>>>       cpu2 wakeups        name            count
>>>>            irq009 acpi            32
>>>>            irq019 ahci            45
>>>>       cpu3@state  hits          total(us)         avg(us) min(us) max(us)
>>>>            POLL   0                  0.00            0.00 0.00    0.00
>>>>            C1-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C1E-VB 0                  0.00            0.00 0.00    0.00
>>>>            C3-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C6-IVB 0                  0.00            0.00 0.00    0.00
>>>>            C7-IVB 0                  0.00            0.00 0.00    0.00
>>>>            1701   0                  0.00            0.00 0.00    0.00
>>>>            1700   0                  0.00            0.00 0.00    0.00
>>>>            1600   0                  0.00            0.00 0.00    0.00
>>>>            1500   0                  0.00            0.00 0.00    0.00
>>>>            1400   0                  0.00            0.00 0.00    0.00
>>>>            1300   0                  0.00            0.00 0.00    0.00
>>>>            1200   0                  0.00            0.00 0.00    0.00
>>>>            1100   0                  0.00            0.00 0.00    0.00
>>>>            1000   0                  0.00            0.00 0.00    0.00
>>>>            900    0                  0.00            0.00 0.00    0.00
>>>>            800    0                  0.00            0.00 0.00    0.00
>>>>            782    0                  0.00            0.00 0.00    0.00
>>>>       cpu3 wakeups        name            count
>>>>
>>>>
>>>> Daniel Lezcano (3):
>>>>     cpuidle: encapsulate power info in a separate structure
>>>>     idle: store the idle state the cpu is
>>>>     sched/fair: use the idle state info to choose the idlest cpu
>>>>
>>>>    arch/arm/include/asm/cpuidle.h       |    6 +-
>>>>    arch/arm/mach-exynos/cpuidle.c       |    4 +-
>>>>    drivers/acpi/processor_idle.c        |    4 +-
>>>>    drivers/base/power/domain.c          |    6 +-
>>>>    drivers/cpuidle/cpuidle-at91.c       |    4 +-
>>>>    drivers/cpuidle/cpuidle-big_little.c |    9 +--
>>>>    drivers/cpuidle/cpuidle-calxeda.c    |    6 +-
>>>>    drivers/cpuidle/cpuidle-kirkwood.c   |    4 +-
>>>>    drivers/cpuidle/cpuidle-powernv.c    |    8 +--
>>>>    drivers/cpuidle/cpuidle-pseries.c    |   12 ++--
>>>>    drivers/cpuidle/cpuidle-ux500.c      |   14 ++---
>>>>    drivers/cpuidle/cpuidle-zynq.c       |    4 +-
>>>>    drivers/cpuidle/driver.c             |    6 +-
>>>>    drivers/cpuidle/governors/ladder.c   |   14 +++--
>>>>    drivers/cpuidle/governors/menu.c     |    8 +--
>>>>    drivers/cpuidle/sysfs.c              |    2 +-
>>>>    drivers/idle/intel_idle.c            |  112 +++++++++++++++++-----------------
>>>>    include/linux/cpuidle.h              |   10 ++-
>>>>    kernel/sched/fair.c                  |   46 ++++++++++++--
>>>>    kernel/sched/idle.c                  |   17 +++++-
>>>>    kernel/sched/sched.h                 |    5 ++
>>>>    21 files changed, 180 insertions(+), 121 deletions(-)
>>>>
>>>> --
>>>> 1.7.9.5
>>>>


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog



* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-01  7:43       ` Daniel Lezcano
@ 2014-04-01  9:05         ` Vincent Guittot
  2014-04-15 13:13           ` Peter Zijlstra
  0 siblings, 1 reply; 47+ messages in thread
From: Vincent Guittot @ 2014-04-01  9:05 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, rjw, Nicolas Pitre,
	linux-pm, Alex Shi, Morten Rasmussen

On 1 April 2014 09:43, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> On 04/01/2014 09:16 AM, Vincent Guittot wrote:
>>
>> On 31 March 2014 17:55, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>>>
>>> On 03/31/2014 03:52 PM, Vincent Guittot wrote:
>>>>
>>>>
>>>> On 28 March 2014 13:29, Daniel Lezcano <daniel.lezcano@linaro.org>
>>>> wrote:
>>>>>
>>>>>
>>>>> The following patchset provides an interaction between cpuidle and the
>>>>> scheduler.
>>>>>
>>>>> The first patch encapsulates the information needed by the scheduler in
>>>>> a separate cpuidle structure. The second one stores the pointer to this
>>>>> structure when entering idle. The third one uses this information to
>>>>> decide which cpu is the idlest.
>>>>>
>>>>> After some basic testing with hackbench, it appears there is a (small)
>>>>> improvement in performance and in the duration of the idle states (which
>>>>> provides better power saving).
>>>>>
>>>>> The measurement has been done with the 'idlestat' tool previously
>>>>> posted on this mailing list.
>>>>>
>>>>> So the benefit is good for both sides: performance and power saving.
>>>>
>>>>
>>>>
>>>> Hi Daniel,
>>>>
>>>> I have looked at your results and I'm a bit surprised that you have so
>>>> much time in C-states with a test that involves 400 tasks on a dual-core
>>>> HT system. You shouldn't have any CPUs in idle state when
>>>> running hackbench; the total time of core0@state in C7-IVB is
>>>> 87932131.00 us, which is quite huge for a bench that runs 44 sec. Or
>>>> am I doing something wrong in the interpretation of the results?
>>>
>>>
>>>
>>> No, actually I mixed up the hackbench output from the runs without
>>> idlestat and the runs with idlestat.
>>>
>>> The hackbench results below are without idlestat.
>>>
>>> The idlestat results are consistent, and indeed idlestat adds a
>>> non-negligible overhead, since it impacts the hackbench results.
>>>
>>> So to summarize, hackbench has been run 4 times.
>>>
>>> 1, 2 : without idlestat, with and without the patchset - hackbench
>>> results ~42 secs
>>>
>>> 3, 4 : with idlestat, with and without the patchset - hackbench results
>>> ~87 secs
>>>
>>> At first glance, the results are consistent but I will double check
>>> them.
>>>
>>> Do you have a suggestion for a benchmarking program ?
>>
>>
>> We are working on a bench which can generate a middle-load pattern with
>> idle CPUs, but it's not available yet. In the meantime, one bench that
>> plays with idle time is cyclictest; it will not give you performance
>> results but only scheduling latency, which might be what you are
>> looking for.
>
>
> Yeah, thanks. I believe I know what is in the rt-tests package :)
>
> What I meant is: what kind of values would you like to see with this
> patchset?

IIUC, your patch tries to improve the wake-up latency of a task by
selecting the CPU with the shallowest C-state, so this metric seems
to be a good candidate.
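
To make the idea concrete, here is a minimal userspace sketch in C of
"pick the idle CPU in the shallowest C-state", assuming the exit latency of
the current idle state is the comparison key. struct cpu_snapshot and
pick_shallowest_idle_cpu() are hypothetical names for illustration only;
this is not the code from the patchset or from kernel/sched/fair.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU snapshot; not a kernel data structure. */
struct cpu_snapshot {
    int is_idle;                  /* non-zero if the CPU is currently idle */
    unsigned int exit_latency_us; /* exit latency of its current idle state */
};

/*
 * Return the index of the idle CPU whose current idle state has the
 * smallest exit latency, or -1 if no CPU is idle. On a tie, the
 * lowest-numbered CPU wins.
 */
static int pick_shallowest_idle_cpu(const struct cpu_snapshot *cpus, size_t n)
{
    int best = -1;
    size_t i;

    assert(cpus != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (!cpus[i].is_idle)
            continue; /* busy CPUs are not candidates in this sketch */
        if (best < 0 ||
            cpus[i].exit_latency_us < cpus[best].exit_latency_us)
            best = (int)i;
    }
    return best;
}
```

A wake-up placed on the CPU returned here should pay the smallest idle exit
cost, which is why a scheduling-latency measurement looks like the right
metric to validate it. The latency values used to exercise the function are
arbitrary and do not correspond to any real C-state table.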

>
>
>
>
>>>>> The select_idle_sibling could be also improved in the same way.
>>>>>
>>>>> ====================== test with hackbench 3.14-rc8 =========================
>>>>>
>>>>> /usr/bin/hackbench -l 10000 -s 4096
>>>>>
>>>>> Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
>>>>> Each sender will pass 10000 messages of 4096 bytes
>>>>>
>>>>> Time: 44.433
>>>>>
>>>>> Total trace buffer: 1846688 kB
>>>>> clusterA@state  hits          total(us)         avg(us) min(us) max(us)
>>>>>            POLL   0                  0.00            0.00 0.00    0.00
>>>>>            C1-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C1E-VB 0                  0.00            0.00 0.00    0.00
>>>>>            C3-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C6-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C7-IVB 0                  0.00            0.00 0.00    0.00
>>>>>     core0@state   hits          total(us)         avg(us) min(us) max(us)
>>>>>           POLL    0                  0.00            0.00 0.00    0.00
>>>>>           C1-IVB  0                  0.00            0.00 0.00    0.00
>>>>>           C1E-IVB 0                  0.00            0.00 0.00    0.00
>>>>>           C3-IVB  0                  0.00            0.00 0.00    0.00
>>>>>           C6-IVB  0                  0.00            0.00 0.00    0.00
>>>>>           C7-IVB  1396        87932131.00        62988.63 0.00    320146.00
>>>>>       cpu0@state  hits          total(us)         avg(us) min(us) max(us)
>>>>>            POLL   0                  0.00            0.00 0.00    0.00
>>>>>            C1-IVB 1                 14.00           14.00 14.00   14.00
>>>>>            C1E-VB 0                  0.00            0.00 0.00    0.00
>>>>>            C3-IVB 1                262.00          262.00 262.00  262.00
>>>>>            C6-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C7-IVB 1180        87938177.00        74523.88 1.00    320147.00
>>>>>            1701   0                  0.00            0.00 0.00    0.00
>>>>>            1700   0                  0.00            0.00 0.00    0.00
>>>>>            1600   0                  0.00            0.00 0.00    0.00
>>>>>            1500   0                  0.00            0.00 0.00    0.00
>>>>>            1400   0                  0.00            0.00 0.00    0.00
>>>>>            1300   0                  0.00            0.00 0.00    0.00
>>>>>            1200   0                  0.00            0.00 0.00    0.00
>>>>>            1100   0                  0.00            0.00 0.00    0.00
>>>>>            1000   0                  0.00            0.00 0.00    0.00
>>>>>            900    0                  0.00            0.00 0.00    0.00
>>>>>            800    0                  0.00            0.00 0.00    0.00
>>>>>            782    0                  0.00            0.00 0.00    0.00
>>>>>       cpu0 wakeups        name            count
>>>>>            irq009 acpi            1
>>>>>       cpu1@state  hits          total(us)         avg(us) min(us) max(us)
>>>>>            POLL   0                  0.00            0.00 0.00    0.00
>>>>>            C1-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C1E-VB 0                  0.00            0.00 0.00    0.00
>>>>>            C3-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C6-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C7-IVB 475         87941356.00       185139.70 322.00  1500690.00
>>>>>            1701   0                  0.00            0.00 0.00    0.00
>>>>>            1700   0                  0.00            0.00 0.00    0.00
>>>>>            1600   0                  0.00            0.00 0.00    0.00
>>>>>            1500   0                  0.00            0.00 0.00    0.00
>>>>>            1400   0                  0.00            0.00 0.00    0.00
>>>>>            1300   0                  0.00            0.00 0.00    0.00
>>>>>            1200   0                  0.00            0.00 0.00    0.00
>>>>>            1100   0                  0.00            0.00 0.00    0.00
>>>>>            1000   0                  0.00            0.00 0.00    0.00
>>>>>            900    0                  0.00            0.00 0.00    0.00
>>>>>            800    0                  0.00            0.00 0.00    0.00
>>>>>            782    0                  0.00            0.00 0.00    0.00
>>>>>       cpu1 wakeups        name            count
>>>>>            irq009 acpi            3
>>>>>     core1@state   hits          total(us)         avg(us) min(us) max(us)
>>>>>           POLL    0                  0.00            0.00 0.00    0.00
>>>>>           C1-IVB  0                  0.00            0.00 0.00    0.00
>>>>>           C1E-IVB 0                  0.00            0.00 0.00    0.00
>>>>>           C3-IVB  0                  0.00            0.00 0.00    0.00
>>>>>           C6-IVB  0                  0.00            0.00 0.00    0.00
>>>>>           C7-IVB  0                  0.00            0.00 0.00    0.00
>>>>>       cpu2@state  hits          total(us)         avg(us) min(us) max(us)
>>>>>            POLL   0                  0.00            0.00 0.00    0.00
>>>>>            C1-IVB 11            288157.00        26196.09 16.00   200060.00
>>>>>            C1E-VB 6             221601.00        36933.50 79.00   200066.00
>>>>>            C3-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C6-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C7-IVB 950         87417466.00        92018.39 19.00   200074.00
>>>>>            1701   0                  0.00            0.00 0.00    0.00
>>>>>            1700   0                  0.00            0.00 0.00    0.00
>>>>>            1600   0                  0.00            0.00 0.00    0.00
>>>>>            1500   2                 34.00           17.00 11.00   23.00
>>>>>            1400   0                  0.00            0.00 0.00    0.00
>>>>>            1300   0                  0.00            0.00 0.00    0.00
>>>>>            1200   0                  0.00            0.00 0.00    0.00
>>>>>            1100   0                  0.00            0.00 0.00    0.00
>>>>>            1000   0                  0.00            0.00 0.00    0.00
>>>>>            900    0                  0.00            0.00 0.00    0.00
>>>>>            800    0                  0.00            0.00 0.00    0.00
>>>>>            782    745            18800.00           25.23 2.00    156.00
>>>>>       cpu2 wakeups        name            count
>>>>>            irq019 ahci            50
>>>>>            irq009 acpi            17
>>>>>       cpu3@state  hits          total(us)         avg(us) min(us) max(us)
>>>>>            POLL   0                  0.00            0.00 0.00    0.00
>>>>>            C1-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C1E-VB 0                  0.00            0.00 0.00    0.00
>>>>>            C3-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C6-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C7-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            1701   0                  0.00            0.00 0.00    0.00
>>>>>            1700   0                  0.00            0.00 0.00    0.00
>>>>>            1600   0                  0.00            0.00 0.00    0.00
>>>>>            1500   0                  0.00            0.00 0.00    0.00
>>>>>            1400   0                  0.00            0.00 0.00    0.00
>>>>>            1300   0                  0.00            0.00 0.00    0.00
>>>>>            1200   0                  0.00            0.00 0.00    0.00
>>>>>            1100   0                  0.00            0.00 0.00    0.00
>>>>>            1000   0                  0.00            0.00 0.00    0.00
>>>>>            900    0                  0.00            0.00 0.00    0.00
>>>>>            800    0                  0.00            0.00 0.00    0.00
>>>>>            782    0                  0.00            0.00 0.00    0.00
>>>>>       cpu3 wakeups        name            count
>>>>>
>>>>> ================ test with hackbench 3.14-rc8 + patchset ====================
>>>>>
>>>>> /usr/bin/hackbench -l 10000 -s 4096
>>>>>
>>>>> Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
>>>>> Each sender will pass 10000 messages of 4096 bytes
>>>>>
>>>>> Time: 42.179
>>>>>
>>>>> Total trace buffer: 1846688 kB
>>>>> clusterA@state  hits          total(us)         avg(us) min(us) max(us)
>>>>>            POLL   0                  0.00            0.00 0.00    0.00
>>>>>            C1-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C1E-VB 0                  0.00            0.00 0.00    0.00
>>>>>            C3-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C6-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C7-IVB 0                  0.00            0.00 0.00    0.00
>>>>>     core0@state   hits          total(us)         avg(us) min(us) max(us)
>>>>>           POLL    0                  0.00            0.00 0.00    0.00
>>>>>           C1-IVB  0                  0.00            0.00 0.00    0.00
>>>>>           C1E-IVB 0                  0.00            0.00 0.00    0.00
>>>>>           C3-IVB  0                  0.00            0.00 0.00    0.00
>>>>>           C6-IVB  0                  0.00            0.00 0.00    0.00
>>>>>           C7-IVB  880         89157590.00       101315.44 0.00    400184.00
>>>>>       cpu0@state  hits          total(us)         avg(us) min(us) max(us)
>>>>>            POLL   0                  0.00            0.00 0.00    0.00
>>>>>            C1-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C1E-VB 1                233.00          233.00 233.00  233.00
>>>>>            C3-IVB 1                260.00          260.00 260.00  260.00
>>>>>            C6-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C7-IVB 700         89162006.00       127374.29 182.00  400187.00
>>>>>            1701   0                  0.00            0.00 0.00    0.00
>>>>>            1700   0                  0.00            0.00 0.00    0.00
>>>>>            1600   0                  0.00            0.00 0.00    0.00
>>>>>            1500   0                  0.00            0.00 0.00    0.00
>>>>>            1400   0                  0.00            0.00 0.00    0.00
>>>>>            1300   0                  0.00            0.00 0.00    0.00
>>>>>            1200   0                  0.00            0.00 0.00    0.00
>>>>>            1100   0                  0.00            0.00 0.00    0.00
>>>>>            1000   0                  0.00            0.00 0.00    0.00
>>>>>            900    0                  0.00            0.00 0.00    0.00
>>>>>            800    0                  0.00            0.00 0.00    0.00
>>>>>            782    0                  0.00            0.00 0.00    0.00
>>>>>       cpu0 wakeups        name            count
>>>>>            irq009 acpi            2
>>>>>       cpu1@state  hits          total(us)         avg(us) min(us) max(us)
>>>>>            POLL   0                  0.00            0.00 0.00    0.00
>>>>>            C1-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C1E-VB 0                  0.00            0.00 0.00    0.00
>>>>>            C3-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C6-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C7-IVB 334         89164805.00       266960.49 1.00    1500677.00
>>>>>            1701   0                  0.00            0.00 0.00    0.00
>>>>>            1700   0                  0.00            0.00 0.00    0.00
>>>>>            1600   0                  0.00            0.00 0.00    0.00
>>>>>            1500   0                  0.00            0.00 0.00    0.00
>>>>>            1400   0                  0.00            0.00 0.00    0.00
>>>>>            1300   0                  0.00            0.00 0.00    0.00
>>>>>            1200   0                  0.00            0.00 0.00    0.00
>>>>>            1100   0                  0.00            0.00 0.00    0.00
>>>>>            1000   0                  0.00            0.00 0.00    0.00
>>>>>            900    0                  0.00            0.00 0.00    0.00
>>>>>            800    0                  0.00            0.00 0.00    0.00
>>>>>            782    0                  0.00            0.00 0.00    0.00
>>>>>       cpu1 wakeups        name            count
>>>>>            irq009 acpi            6
>>>>>     core1@state   hits          total(us)         avg(us) min(us) max(us)
>>>>>           POLL    0                  0.00            0.00 0.00    0.00
>>>>>           C1-IVB  0                  0.00            0.00 0.00    0.00
>>>>>           C1E-IVB 0                  0.00            0.00 0.00    0.00
>>>>>           C3-IVB  0                  0.00            0.00 0.00    0.00
>>>>>           C6-IVB  0                  0.00            0.00 0.00    0.00
>>>>>           C7-IVB  0                  0.00            0.00 0.00    0.00
>>>>>       cpu2@state  hits          total(us)         avg(us) min(us) max(us)
>>>>>            POLL   0                  0.00            0.00 0.00    0.00
>>>>>            C1-IVB 19           2169047.00       114160.37 18.00   999129.00
>>>>>            C1E-IB 0                  0.00            0.00 0.00    0.00
>>>>>            C3-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C6-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C7-IVB 376         86993307.00       231365.18 20.00   1500682.00
>>>>>            1701   0                  0.00            0.00 0.00    0.00
>>>>>            1700   0                  0.00            0.00 0.00    0.00
>>>>>            1600   0                  0.00            0.00 0.00    0.00
>>>>>            1500   0                  0.00            0.00 0.00    0.00
>>>>>            1400   0                  0.00            0.00 0.00    0.00
>>>>>            1300   0                  0.00            0.00 0.00    0.00
>>>>>            1200   0                  0.00            0.00 0.00    0.00
>>>>>            1100   0                  0.00            0.00 0.00    0.00
>>>>>            1000   0                  0.00            0.00 0.00    0.00
>>>>>            900    0                  0.00            0.00 0.00    0.00
>>>>>            800    0                  0.00            0.00 0.00    0.00
>>>>>            782    0                  0.00            0.00 0.00    0.00
>>>>>       cpu2 wakeups        name            count
>>>>>            irq009 acpi            32
>>>>>            irq019 ahci            45
>>>>>       cpu3@state  hits          total(us)         avg(us) min(us) max(us)
>>>>>            POLL   0                  0.00            0.00 0.00    0.00
>>>>>            C1-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C1E-VB 0                  0.00            0.00 0.00    0.00
>>>>>            C3-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C6-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            C7-IVB 0                  0.00            0.00 0.00    0.00
>>>>>            1701   0                  0.00            0.00 0.00    0.00
>>>>>            1700   0                  0.00            0.00 0.00    0.00
>>>>>            1600   0                  0.00            0.00 0.00    0.00
>>>>>            1500   0                  0.00            0.00 0.00    0.00
>>>>>            1400   0                  0.00            0.00 0.00    0.00
>>>>>            1300   0                  0.00            0.00 0.00    0.00
>>>>>            1200   0                  0.00            0.00 0.00    0.00
>>>>>            1100   0                  0.00            0.00 0.00    0.00
>>>>>            1000   0                  0.00            0.00 0.00    0.00
>>>>>            900    0                  0.00            0.00 0.00    0.00
>>>>>            800    0                  0.00            0.00 0.00    0.00
>>>>>            782    0                  0.00            0.00 0.00    0.00
>>>>>       cpu3 wakeups        name            count
>>>>>
>>>>>
>>>>> Daniel Lezcano (3):
>>>>>     cpuidle: encapsulate power info in a separate structure
>>>>>     idle: store the idle state the cpu is
>>>>>     sched/fair: use the idle state info to choose the idlest cpu
>>>>>
>>>>>    arch/arm/include/asm/cpuidle.h       |    6 +-
>>>>>    arch/arm/mach-exynos/cpuidle.c       |    4 +-
>>>>>    drivers/acpi/processor_idle.c        |    4 +-
>>>>>    drivers/base/power/domain.c          |    6 +-
>>>>>    drivers/cpuidle/cpuidle-at91.c       |    4 +-
>>>>>    drivers/cpuidle/cpuidle-big_little.c |    9 +--
>>>>>    drivers/cpuidle/cpuidle-calxeda.c    |    6 +-
>>>>>    drivers/cpuidle/cpuidle-kirkwood.c   |    4 +-
>>>>>    drivers/cpuidle/cpuidle-powernv.c    |    8 +--
>>>>>    drivers/cpuidle/cpuidle-pseries.c    |   12 ++--
>>>>>    drivers/cpuidle/cpuidle-ux500.c      |   14 ++---
>>>>>    drivers/cpuidle/cpuidle-zynq.c       |    4 +-
>>>>>    drivers/cpuidle/driver.c             |    6 +-
>>>>>    drivers/cpuidle/governors/ladder.c   |   14 +++--
>>>>>    drivers/cpuidle/governors/menu.c     |    8 +--
>>>>>    drivers/cpuidle/sysfs.c              |    2 +-
>>>>>    drivers/idle/intel_idle.c            |  112 +++++++++++++++++-----------------
>>>>>    include/linux/cpuidle.h              |   10 ++-
>>>>>    kernel/sched/fair.c                  |   46 ++++++++++++--
>>>>>    kernel/sched/idle.c                  |   17 +++++-
>>>>>    kernel/sched/sched.h                 |    5 ++
>>>>>    21 files changed, 180 insertions(+), 121 deletions(-)
>>>>>
>>>>> --
>>>>> 1.7.9.5
>>>>>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-03-28 12:29 [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Daniel Lezcano
                   ` (3 preceding siblings ...)
  2014-03-31 13:52 ` [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Vincent Guittot
@ 2014-04-01 23:01 ` Rafael J. Wysocki
  2014-04-02  3:14   ` Nicolas Pitre
  2014-04-02  8:26   ` Daniel Lezcano
  2014-04-04  6:29 ` Len Brown
  5 siblings, 2 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2014-04-01 23:01 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linux-kernel, mingo, peterz, nicolas.pitre, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Friday, March 28, 2014 01:29:53 PM Daniel Lezcano wrote:
> The following patchset provides an interaction between cpuidle and the scheduler.
> 
> The first patch encapsulate the needed information for the scheduler in a
> separate cpuidle structure. The second one stores the pointer to this structure
> when entering idle. The third one, use this information to take the decision to
> find the idlest cpu.
> 
> After some basic testing with hackbench, it appears there is an improvement for
> the performances (small) and for the duration of the idle states (which provides
> a better power saving).
> 
> The measurement has been done with the 'idlestat' tool previously posted in this
> mailing list.
> 
> So the benefit is good for both sides performance and power saving.
> 
> The select_idle_sibling could be also improved in the same way.

Well, quite frankly, I don't really like this series.  Not the idea itself, but
the way it has been implemented.

First off, if the scheduler is to access idle state data stored in struct
cpuidle_state, I'm not sure why we need a separate new structure for that?
Couldn't there be a pointer to a whole struct cpuidle_state from struct rq
instead?  [->exit_latency is the only field that find_idlest_cpu() in your
third patch seems to be using anyway.]

Second, is accessing the idle state information for all CPUs from find_idlest_cpu()
guaranteed to be non-racy?  I mean, what if a CPU changes its state from idle to
non-idle while another one is executing find_idlest_cpu()?  In other words,
where's the read memory barrier corresponding to the write ones in the modified
cpu_idle_call()?  And is the memory barrier actually sufficient?  After all,
you need to guarantee that the CPU is still idle after you have evaluated
idle_cpu() on it.
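
The pairing asked about here can be sketched in userspace C11 atomics. This
is only an analogy, not kernel code: the kernel would use its own primitives
such as smp_wmb()/smp_rmb(), and struct idle_state_info, struct cpu_slot and
the three helpers below are hypothetical names invented for the illustration.
The release store on the writer side pairs with the acquire load on the
reader side, so a reader that sees the pointer also sees the data behind it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the idle-state data the scheduler would read. */
struct idle_state_info {
	unsigned int exit_latency;
};

/* Hypothetical per-CPU slot; a NULL pointer means "not idle". */
struct cpu_slot {
	_Atomic(const struct idle_state_info *) state;
};

/* Writer side (the CPU entering idle): publish with release ordering. */
void publish_idle_state(struct cpu_slot *slot, const struct idle_state_info *s)
{
	assert(slot != NULL);
	atomic_store_explicit(&slot->state, s, memory_order_release);
}

/* Writer side (the CPU leaving idle): clear the slot. */
void clear_idle_state(struct cpu_slot *slot)
{
	publish_idle_state(slot, NULL);
}

/*
 * Reader side (another CPU): the acquire load pairs with the release
 * store above.  Returns 1 and fills *latency if the slot was non-NULL
 * at the time of the load, 0 otherwise.
 */
int read_exit_latency(struct cpu_slot *slot, unsigned int *latency)
{
	const struct idle_state_info *s;

	assert(slot != NULL && latency != NULL);
	s = atomic_load_explicit(&slot->state, memory_order_acquire);
	if (s == NULL)
		return 0;
	*latency = s->exit_latency;
	return 1;
}
```

Note that the pairing only orders the accesses.  It does not answer the
second half of the question: the observed CPU can still leave idle right
after the load, so the reader's result is a snapshot, not a guarantee.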

Finally, is the heuristic used by find_idlest_cpu() to select the "idlest"
CPU really the best one?  What about deeper vs shallower idle states, for example?

Rafael


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
  2014-03-28 12:29 ` [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu Daniel Lezcano
@ 2014-04-02  3:05   ` Nicolas Pitre
  2014-04-04 11:57     ` Rafael J. Wysocki
  2014-04-17 13:53     ` Daniel Lezcano
  2014-04-15 13:03   ` Peter Zijlstra
  1 sibling, 2 replies; 47+ messages in thread
From: Nicolas Pitre @ 2014-04-02  3:05 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linux-kernel, mingo, peterz, rjw, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Fri, 28 Mar 2014, Daniel Lezcano wrote:

> As we know in which idle state the cpu is, we can investigate the following:
> 
> 1. when did the cpu entered the idle state ? the longer the cpu is idle, the
> deeper it is idle
> 2. what exit latency is ? the greater the exit latency is, the deeper it is
> 
> With both information, when all cpus are idle, we can choose the idlest cpu.
> 
> When one cpu is not idle, the old check against weighted load applies.
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

There seem to be some problems with the implementation.

> @@ -4336,20 +4337,53 @@ static int
>  find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
>  {
>  	unsigned long load, min_load = ULONG_MAX;
> -	int idlest = -1;
> +	unsigned int min_exit_latency = UINT_MAX;
> +	u64 idle_stamp, min_idle_stamp = ULONG_MAX;

I don't think you really meant to initialize a u64 variable with ULONG_MAX.
You probably want ULLONG_MAX here.  Actually, probably not even that (more
on this later).
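
To make the width issue concrete, here is a small userspace sketch (not
kernel code; uint64_t stands in for the kernel's u64, and both function
names are made up for the example).  unsigned long is only guaranteed to be
at least 32 bits wide, so on a 32-bit build ULONG_MAX leaves the upper half
of a 64-bit variable clear:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Returns 1 if ULONG_MAX sets every bit of a 64-bit variable, 0 otherwise. */
int ulong_max_fills_u64(void)
{
	/* Zero-extended when unsigned long is narrower than 64 bits. */
	uint64_t stamp = ULONG_MAX;

	return stamp == UINT64_MAX;
}

/* A width-independent initial value for a "minimum so far" 64-bit tracker. */
uint64_t min_tracker_init(void)
{
	uint64_t init = UINT64_MAX;

	/* The initial value must not be below anything ULONG_MAX can hold. */
	assert(init >= (uint64_t)ULONG_MAX);
	return init;
}
```

On a typical 64-bit build the first function returns 1 and the bug stays
hidden; on a 32-bit build it returns 0.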

> +
> +	struct rq *rq;
> +	struct cpuidle_power *power;
> +
> +	int cpu_idle = -1;
> +	int cpu_busy = -1;
>  	int i;
>  
>  	/* Traverse only the allowed CPUs */
>  	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
> -		load = weighted_cpuload(i);
>  
> -		if (load < min_load || (load == min_load && i == this_cpu)) {
> -			min_load = load;
> -			idlest = i;
> +		if (idle_cpu(i)) {
> +
> +			rq = cpu_rq(i);
> +			power = rq->power;
> +			idle_stamp = rq->idle_stamp;
> +
> +			/* The cpu is idle since a shorter time */
> +			if (idle_stamp < min_idle_stamp) {
> +				min_idle_stamp = idle_stamp;
> +				cpu_idle = i;
> +				continue;

Don't you want the highest time stamp in order to select the most 
recently idled CPU?  Favoring the CPU which has been idle the longest 
makes little sense.

> +			}
> +
> +			/* The cpu is idle but the exit_latency is shorter */
> +			if (power && power->exit_latency < min_exit_latency) {
> +				min_exit_latency = power->exit_latency;
> +				cpu_idle = i;
> +				continue;
> +			}

I think this is wrong.  It gives priority to CPUs which have been idle
for a longer period of time (although, as noted above, that should have
been shorter) over those in a shallower idle state.  I think this should
rather be:

	if (power && power->exit_latency < min_exit_latency) {
		min_exit_latency = power->exit_latency;
		latest_idle_stamp = idle_stamp;
		cpu_idle = i;
	} else if ((!power || power->exit_latency == min_exit_latency) &&
		   idle_stamp > latest_idle_stamp) {
		latest_idle_stamp = idle_stamp;
		cpu_idle = i;
	}

So the CPU with the shallowest idle state is selected in priority, and
if many CPUs are in the same state then the time stamp is used to
select the most recent one.  Whenever a shallower idle state is found
then the latest_idle_stamp is reset for that state even if it is further
in the past.
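
As a standalone illustration of that selection rule, here is a userspace
sketch.  It is hedged accordingly: struct cpu_idle_info and
pick_shallowest_idle() are hypothetical, has_state models the "power"
pointer being non-NULL, and starting latest_idle_stamp at 0 is an
assumption since the snippet above does not show its initial value.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU snapshot; the fields mirror what the patch reads. */
struct cpu_idle_info {
	int has_state;			/* 0 models power == NULL */
	unsigned int exit_latency;	/* valid only if has_state */
	uint64_t idle_stamp;		/* time at which the CPU went idle */
};

/*
 * Return the index of the CPU in the shallowest idle state, breaking
 * ties by the most recent idle_stamp, or -1 if no CPU was selected.
 */
int pick_shallowest_idle(const struct cpu_idle_info *cpus, size_t n)
{
	unsigned int min_exit_latency = UINT_MAX;
	uint64_t latest_idle_stamp = 0;	/* assumed initial value */
	int cpu_idle = -1;
	size_t i;

	assert(cpus != NULL || n == 0);

	for (i = 0; i < n; i++) {
		const struct cpu_idle_info *c = &cpus[i];

		if (c->has_state && c->exit_latency < min_exit_latency) {
			/* Shallower state: reset the stamp for that state. */
			min_exit_latency = c->exit_latency;
			latest_idle_stamp = c->idle_stamp;
			cpu_idle = (int)i;
		} else if ((!c->has_state ||
			    c->exit_latency == min_exit_latency) &&
			   c->idle_stamp > latest_idle_stamp) {
			/* Same depth (or unknown): prefer the most recent. */
			latest_idle_stamp = c->idle_stamp;
			cpu_idle = (int)i;
		}
	}
	return cpu_idle;
}
```

With four CPUs whose (exit_latency, idle_stamp) pairs are (100, 50),
(10, 5), (10, 9) and (200, 99), the function selects index 2: the two CPUs
with latency 10 are shallowest, and index 2 went idle more recently.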

> +		} else {
> +
> +			load = weighted_cpuload(i);
> +
> +			if (load < min_load ||
> +			    (load == min_load && i == this_cpu)) {
> +				min_load = load;
> +				cpu_busy = i;
> +				continue;
> +			}
>  		}

I think it is wrong to do an if-else based on idle_cpu() here.  What
if a CPU is heavily loaded, but for some reason it happens to be idle at
this very moment?  With your patch it could be selected as an idle CPU
while it would be discarded as being too busy otherwise.

It is important to determine both cpu_busy and cpu_idle for all CPUs.

And cpu_busy is a bad name for this.  Something like least_loaded would
be more self-explanatory.  Same thing for cpu_idle which could be
clearer if named shallowest_idle.

> -	return idlest;
> +	/* Busy cpus are considered less idle than idle cpus ;) */
> +	return cpu_busy != -1 ? cpu_busy : cpu_idle;

And finally it is a policy decision whether or not we want to return
least_loaded over shallowest_idle, e.g. do we pack tasks on non-idle CPUs
first or not.  That in itself needs more investigation.  To keep the
existing policy unchanged for now the above condition should have its
variables swapped.
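
Putting the last two points together, here is a userspace sketch that
evaluates both candidates for every CPU and applies the policy only at the
end.  struct cpu_snapshot, pick_cpu() and the prefer_idle flag are
hypothetical, introduced only for this illustration; the load field stands
in for whatever weighted_cpuload() would return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU snapshot used only for this sketch. */
struct cpu_snapshot {
	int idle;			/* result of an idle_cpu()-like check */
	unsigned int exit_latency;	/* meaningful only when idle */
	unsigned long load;		/* weighted load, known for every CPU */
};

/*
 * Track both candidates over all CPUs, then apply the policy at the end.
 * prefer_idle != 0 is the "swapped" form (an idle CPU wins if one exists);
 * prefer_idle == 0 returns the least loaded CPU first.
 */
int pick_cpu(const struct cpu_snapshot *cpus, size_t n, int prefer_idle)
{
	unsigned long min_load = 0;
	unsigned int min_exit_latency = 0;
	int least_loaded = -1;
	int shallowest_idle = -1;
	size_t i;

	assert(cpus != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* Load is evaluated for every CPU, idle or not. */
		if (least_loaded == -1 || cpus[i].load < min_load) {
			min_load = cpus[i].load;
			least_loaded = (int)i;
		}
		/* Idle depth is evaluated only for CPUs that are idle now. */
		if (cpus[i].idle &&
		    (shallowest_idle == -1 ||
		     cpus[i].exit_latency < min_exit_latency)) {
			min_exit_latency = cpus[i].exit_latency;
			shallowest_idle = (int)i;
		}
	}

	if (prefer_idle)
		return shallowest_idle != -1 ? shallowest_idle : least_loaded;
	return least_loaded != -1 ? least_loaded : shallowest_idle;
}
```

Because least_loaded is computed over all CPUs, it is only -1 when the set
is empty, so with prefer_idle == 0 the idle candidate is never returned.
That is one more reason the final policy needs a deliberate decision.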


Nicolas

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-01 23:01 ` Rafael J. Wysocki
@ 2014-04-02  3:14   ` Nicolas Pitre
  2014-04-04 11:43     ` Rafael J. Wysocki
  2014-04-02  8:26   ` Daniel Lezcano
  1 sibling, 1 reply; 47+ messages in thread
From: Nicolas Pitre @ 2014-04-02  3:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Daniel Lezcano, linux-kernel, mingo, peterz, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Wed, 2 Apr 2014, Rafael J. Wysocki wrote:

> On Friday, March 28, 2014 01:29:53 PM Daniel Lezcano wrote:
> > The following patchset provides an interaction between cpuidle and the scheduler.
> > 
> > The first patch encapsulate the needed information for the scheduler in a
> > separate cpuidle structure. The second one stores the pointer to this structure
> > when entering idle. The third one, use this information to take the decision to
> > find the idlest cpu.
> > 
> > After some basic testing with hackbench, it appears there is an improvement for
> > the performances (small) and for the duration of the idle states (which provides
> > a better power saving).
> > 
> > The measurement has been done with the 'idlestat' tool previously posted in this
> > mailing list.
> > 
> > So the benefit is good for both sides performance and power saving.
> > 
> > The select_idle_sibling could be also improved in the same way.
> 
> Well, quite frankly, I don't really like this series.  Not the idea itself, but
> the way it has been implemented.
> 
> First off, if the scheduler is to access idle state data stored in struct
> cpuidle_state, I'm not sure why we need a separate new structure for that?
> Couldn't there be a pointer to a whole struct cpuidle_state from struct rq
> instead?  [->exit_latency is the only field that find_idlest_cpu() in your
> third patch seems to be using anyway.]

Future patches are likely to use the other fields.  I presume that's why 
Daniel put them there.

But I admit being on the fence about this, i.e. whether or not we should
encapsulate shared fields into a separate structure.

> Second, is accessing the idle state information for all CPUs from find_idlest_cpu()
> guaranteed to be non-racy?  I mean, what if a CPU changes its state from idle to
> non-idle while another one is executing find_idlest_cpu()?  In other words,
> where's the read memory barrier corresponding to the write ones in the modified
> cpu_idle_call()?  And is the memory barrier actually sufficient?  After all,
> you need to guarantee that the CPU is still idle after you have evaluated
> idle_cpu() on it.

I don't think avoiding races is all that important here.  Right now any 
idle CPU is selected regardless of its idle state depth.  What this 
patch should do (considering my previous comments on it) is to favor the 
idle CPU with the shallowest idle state.  If once in a while the 
selection is wrong because of a race we're not going to make it any 
worse than what we have today without this patch.

That probably means the write barrier could potentially be omitted as 
well if it implies a useless cost.

We need to ensure the cpuidle data structure is not going away (e.g. 
cpuidle driver module removal) while another CPU looks at it though.  
The timing would have to be awfully weird for this to happen but still.

> Finally, is really the heuristics used by find_idlest_cpu() to select the "idlest"
> CPU the best one?  What about deeper vs shallower idle states, for example?

That's what this patch series is about.  The find_idlest_cpu code should 
look for the idle CPU with the shallowest idle state, or the one with 
the smallest load.  In this context "find_idlest_cpu" might become a 
misnomer.


Nicolas

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-01 23:01 ` Rafael J. Wysocki
  2014-04-02  3:14   ` Nicolas Pitre
@ 2014-04-02  8:26   ` Daniel Lezcano
  2014-04-04 11:23     ` Rafael J. Wysocki
  1 sibling, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-02  8:26 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, mingo, peterz, nicolas.pitre, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On 04/02/2014 01:01 AM, Rafael J. Wysocki wrote:
> On Friday, March 28, 2014 01:29:53 PM Daniel Lezcano wrote:
>> The following patchset provides an interaction between cpuidle and the scheduler.
>>
>> The first patch encapsulate the needed information for the scheduler in a
>> separate cpuidle structure. The second one stores the pointer to this structure
>> when entering idle. The third one, use this information to take the decision to
>> find the idlest cpu.
>>
>> After some basic testing with hackbench, it appears there is an improvement for
>> the performances (small) and for the duration of the idle states (which provides
>> a better power saving).
>>
>> The measurement has been done with the 'idlestat' tool previously posted in this
>> mailing list.
>>
>> So the benefit is good for both sides performance and power saving.
>>
>> The select_idle_sibling could be also improved in the same way.
>
> Well, quite frankly, I don't really like this series.  Not the idea itself, but
> the way it has been implemented.
>
> First off, if the scheduler is to access idle state data stored in struct
> cpuidle_state, I'm not sure why we need a separate new structure for that?
> Couldn't there be a pointer to a whole struct cpuidle_state from struct rq
> instead?  [->exit_latency is the only field that find_idlest_cpu() in your
> third patch seems to be using anyway.]

Hi Rafael,

thank you very much for reviewing the patchset.

I created a specific structure to encapsulate the information needed
by the scheduler and to avoid exporting unneeded data. This is purely
a code design choice. It was also meant to separate the idle state's
energy characteristics from the cpuidle framework data (flags, name,
etc.).

Only the exit_latency field is used in this patchset, but
target_residency will be used as well (e.g. to avoid waking up a cpu
before it has spent the minimum target residency time idle).
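
One way such a target residency check might look, as a purely illustrative
userspace sketch: no such helper is part of this patchset, and the name
residency_reached() and its parameters are hypothetical.  All three values
are assumed to be expressed in the same time unit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if a CPU that went idle at idle_stamp
 * has already stayed idle for at least target_residency at time "now",
 * 0 otherwise.
 */
int residency_reached(uint64_t now, uint64_t idle_stamp,
		      uint64_t target_residency)
{
	/* A stamp in the future would make the subtraction wrap around. */
	assert(now >= idle_stamp);
	return now - idle_stamp >= target_residency;
}
```

A caller could skip idle CPUs for which this returns 0, on the grounds that
waking them early wastes the energy spent entering the state.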

The power field is ... hum ... not filled by any board (except for 
calxeda). Vendors do not like to share this information, so very likely 
that would be changed to a normalized value, I don't know.

I agree we can put a pointer to the struct cpuidle_state instead if that
reduces the impact of the patchset.

> Second, is accessing the idle state information for all CPUs from find_idlest_cpu()
> guaranteed to be non-racy?  I mean, what if a CPU changes its state from idle to
> non-idle while another one is executing find_idlest_cpu()?  In other words,
> where's the read memory barrier corresponding to the write ones in the modified
> cpu_idle_call()?  And is the memory barrier actually sufficient?  After all,
> you need to guarantee that the CPU is still idle after you have evaluated
> idle_cpu() on it.

Well, as Nicolas mentioned it in another mail, we can live with races, 
the scheduler will take a wrong decision but nothing worth than what we 
have today. In any case we want to prevent any lock in the code.

> Finally, is really the heuristics used by find_idlest_cpu() to select the "idlest"
> CPU the best one?  What about deeper vs shallower idle states, for example?

I believe that is what the patchset is supposed to do: 1. if the cpu is
idle, pick the shallowest, 2. if the cpu is not idle, pick the least
loaded. But maybe there is something wrong in the routine, as Nico
pointed out; I have to double check it.

Thanks !

   -- Daniel




-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-03-28 12:29 [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Daniel Lezcano
                   ` (4 preceding siblings ...)
  2014-04-01 23:01 ` Rafael J. Wysocki
@ 2014-04-04  6:29 ` Len Brown
  2014-04-04  8:16   ` Daniel Lezcano
  5 siblings, 1 reply; 47+ messages in thread
From: Len Brown @ 2014-04-04  6:29 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Rafael J. Wysocki,
	nicolas.pitre, Linux PM list, alex.shi, vincent.guittot,
	morten.rasmussen

Hi Daniel,

Interesting idea.

The benefit of this patch is to reduce power.
Have you been able to measure a power reduction, via power meter, or
via built-in RAPL power meter?
(turbostat will show RAPL watts, or if you have constant quantity of
work, use turbostat -J)

thanks,
-Len

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-04  6:29 ` Len Brown
@ 2014-04-04  8:16   ` Daniel Lezcano
  0 siblings, 0 replies; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-04  8:16 UTC (permalink / raw)
  To: Len Brown
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Rafael J. Wysocki,
	nicolas.pitre, Linux PM list, alex.shi, vincent.guittot,
	morten.rasmussen

On 04/04/2014 08:29 AM, Len Brown wrote:
> Hi Daniel,
>
> Interesting idea.
>
> The benefit of this patch is to reduce power.
> Have you been able to measure a power reduction, via power meter, or
> via built-in RAPL power meter?
> (turbostat will show RAPL watts, or if you have constant quantity of
> work, use turbostat -J)

Hi Len,

thanks for looking at the patches.

I will tweak, respin the patchset and do some more measurements.

I don't have a power meter, but maybe RAPL could help to test on x86.

Thanks
   -- Daniel



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-02  8:26   ` Daniel Lezcano
@ 2014-04-04 11:23     ` Rafael J. Wysocki
  0 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2014-04-04 11:23 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linux-kernel, mingo, peterz, nicolas.pitre, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Wednesday, April 02, 2014 10:26:31 AM Daniel Lezcano wrote:
> On 04/02/2014 01:01 AM, Rafael J. Wysocki wrote:
> > On Friday, March 28, 2014 01:29:53 PM Daniel Lezcano wrote:
> >> The following patchset provides an interaction between cpuidle and the scheduler.
> >>
> >> The first patch encapsulate the needed information for the scheduler in a
> >> separate cpuidle structure. The second one stores the pointer to this structure
> >> when entering idle. The third one, use this information to take the decision to
> >> find the idlest cpu.
> >>
> >> After some basic testing with hackbench, it appears there is an improvement for
> >> the performances (small) and for the duration of the idle states (which provides
> >> a better power saving).
> >>
> >> The measurement has been done with the 'idlestat' tool previously posted in this
> >> mailing list.
> >>
> >> So the benefit is good for both sides performance and power saving.
> >>
> >> The select_idle_sibling could be also improved in the same way.
> >
> > Well, quite frankly, I don't really like this series.  Not the idea itself, but
> > the way it has been implemented.
> >
> > First off, if the scheduler is to access idle state data stored in struct
> > cpuidle_state, I'm not sure why we need a separate new structure for that?
> > Couldn't there be a pointer to a whole struct cpuidle_state from struct rq
> > instead?  [->exit_latency is the only field that find_idlest_cpu() in your
> > third patch seems to be using anyway.]
> 
> Hi Rafael,
> 
> thank you very much for reviewing the patchset.
> 
> I created a specific structure to encapsulate the informations needed 
> for the scheduler and to prevent to export unneeded data. This is purely 
> for code design. Also it was to separate the idle's energy 
> characteristics from the cpuidle framework data (flags, name, etc ...).
> 
> The exit_latency field is only used in this patchset but the 
> target_residency will be used also (eg. prevent to wakeup a cpu before 
> the minimum idle time target residency).

OK

It would be good to add that heuristic up front so that we can see the full
picture.

> The power field is ... hum ... not filled by any board (except for 
> calxeda). Vendors do not like to share this information, so very likely 
> that would be changed to a normalized value, I don't know.

I'm not sure if that field is ever going to be used by everyone to be honest.

> I agree we can put a pointer to the struct cpuidle_state instead if that 
> reduce the impact of the patchset.

Yes, it will, in my opinion.

> > Second, is accessing the idle state information for all CPUs from find_idlest_cpu()
> > guaranteed to be non-racy?  I mean, what if a CPU changes its state from idle to
> > non-idle while another one is executing find_idlest_cpu()?  In other words,
> > where's the read memory barrier corresponding to the write ones in the modified
> > cpu_idle_call()?  And is the memory barrier actually sufficient?  After all,
> > you need to guarantee that the CPU is still idle after you have evaluated
> > idle_cpu() on it.
> 
> Well, as Nicolas mentioned it in another mail, we can live with races, 
> the scheduler will take a wrong decision but nothing worth than what we 

I guess you mean "worse"?  I'm not sure about that.

> have today. In any case we want to prevent any lock in the code.

Of course. :-)

> > Finally, is really the heuristics used by find_idlest_cpu() to select the "idlest"
> > CPU the best one?  What about deeper vs shallower idle states, for example?
> 
> I believe that is what the patchset is supposed to do. 1. if the cpu is 
> idle, pick the shallowest, 2. if the cpu is not idle pick the least 
> loaded. But maybe there is something wrong in the routine as pointed out 
> by Nico, I have to double check it.

Yes, that routine doesn't look entirely correct then.

Thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-02  3:14   ` Nicolas Pitre
@ 2014-04-04 11:43     ` Rafael J. Wysocki
  2014-04-15 13:17       ` Peter Zijlstra
  2014-04-15 13:25       ` Peter Zijlstra
  0 siblings, 2 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2014-04-04 11:43 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Daniel Lezcano, linux-kernel, mingo, peterz, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Tuesday, April 01, 2014 11:14:33 PM Nicolas Pitre wrote:
> On Wed, 2 Apr 2014, Rafael J. Wysocki wrote:
> 
> > On Friday, March 28, 2014 01:29:53 PM Daniel Lezcano wrote:
> > > The following patchset provides an interaction between cpuidle and the scheduler.
> > > 
> > > The first patch encapsulate the needed information for the scheduler in a
> > > separate cpuidle structure. The second one stores the pointer to this structure
> > > when entering idle. The third one, use this information to take the decision to
> > > find the idlest cpu.
> > > 
> > > After some basic testing with hackbench, it appears there is an improvement for
> > > the performances (small) and for the duration of the idle states (which provides
> > > a better power saving).
> > > 
> > > The measurement has been done with the 'idlestat' tool previously posted in this
> > > mailing list.
> > > 
> > > So the benefit is good for both sides performance and power saving.
> > > 
> > > The select_idle_sibling could be also improved in the same way.
> > 
> > Well, quite frankly, I don't really like this series.  Not the idea itself, but
> > the way it has been implemented.
> > 
> > First off, if the scheduler is to access idle state data stored in struct
> > cpuidle_state, I'm not sure why we need a separate new structure for that?
> > Couldn't there be a pointer to a whole struct cpuidle_state from struct rq
> > instead?  [->exit_latency is the only field that find_idlest_cpu() in your
> > third patch seems to be using anyway.]
> 
> Future patches are likely to use the other fields.  I presume that's why 
> Daniel put them there.
> 
> But I admit being on the fence about this i.e whether or not we should 
> encapsulate shared fields into a separate structure or not.

Quite frankly, I don't see a point in using a separate structure here.

> > Second, is accessing the idle state information for all CPUs from find_idlest_cpu()
> > guaranteed to be non-racy?  I mean, what if a CPU changes its state from idle to
> > non-idle while another one is executing find_idlest_cpu()?  In other words,
> > where's the read memory barrier corresponding to the write ones in the modified
> > cpu_idle_call()?  And is the memory barrier actually sufficient?  After all,
> > you need to guarantee that the CPU is still idle after you have evaluated
> > idle_cpu() on it.
> 
> I don't think avoiding races is all that important here.  Right now any 
> idle CPU is selected regardless of its idle state depth.  What this 
> patch should do (considering my previous comments on it) is to favor the 
> idle CPU with the shalloest idle state.  If once in a while the 
> selection is wrong because of a race we're not going to make it any 
> worse than what we have today without this patch.
> 
> That probably means the write barrier could potentially be omitted as 
> well if it implies a useless cost.

Yes, the write barriers don't seem to serve any real purpose.

> We need to ensure the cpuidle data structure is not going away (e.g. 
> cpuidle driver module removal) while another CPU looks at it though.  
> The timing would have to be awfully weird for this to happen but still.

Well, I'm not sure if that is a real concern.  Only a couple of drivers try
to implement module unloading and I guess this isn't tested too much, so
perhaps we should just make it impossible to unload a cpuidle driver?

> > Finally, is really the heuristics used by find_idlest_cpu() to select the "idlest"
> > CPU the best one?  What about deeper vs shallower idle states, for example?
> 
> That's what this patch series is about.  The find_idlest_cpu code should 
> look for the idle CPU with the shallowest idle state, or the one with 
> the smallest load.  In this context "find_idlest_cpu" might become a 
> misnomer.

Yes, clearly.  It should be called find_best_cpu or something like that.

Thanks!


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
  2014-04-02  3:05   ` Nicolas Pitre
@ 2014-04-04 11:57     ` Rafael J. Wysocki
  2014-04-04 16:56       ` Nicolas Pitre
  2014-04-17 13:53     ` Daniel Lezcano
  1 sibling, 1 reply; 47+ messages in thread
From: Rafael J. Wysocki @ 2014-04-04 11:57 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Daniel Lezcano, linux-kernel, mingo, peterz, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Tuesday, April 01, 2014 11:05:49 PM Nicolas Pitre wrote:
> On Fri, 28 Mar 2014, Daniel Lezcano wrote:
> 
> > As we know in which idle state the cpu is, we can investigate the following:
> > 
> > 1. when did the cpu entered the idle state ? the longer the cpu is idle, the
> > deeper it is idle
> > 2. what exit latency is ? the greater the exit latency is, the deeper it is
> > 
> > With both information, when all cpus are idle, we can choose the idlest cpu.
> > 
> > When one cpu is not idle, the old check against weighted load applies.
> > 
> > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> 
> There seems to be some problems with the implementation.
> 
> > @@ -4336,20 +4337,53 @@ static int
> >  find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
> >  {
> >  	unsigned long load, min_load = ULONG_MAX;
> > -	int idlest = -1;
> > +	unsigned int min_exit_latency = UINT_MAX;
> > +	u64 idle_stamp, min_idle_stamp = ULONG_MAX;
> 
> I don't think you really meant to assign an u64 variable with ULONG_MAX.
> You probably want ULLONG_MAX here.  And probably not in fact (more 
> later).
> 
> > +
> > +	struct rq *rq;
> > +	struct cpuidle_power *power;
> > +
> > +	int cpu_idle = -1;
> > +	int cpu_busy = -1;
> >  	int i;
> >  
> >  	/* Traverse only the allowed CPUs */
> >  	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
> > -		load = weighted_cpuload(i);
> >  
> > -		if (load < min_load || (load == min_load && i == this_cpu)) {
> > -			min_load = load;
> > -			idlest = i;
> > +		if (idle_cpu(i)) {
> > +
> > +			rq = cpu_rq(i);
> > +			power = rq->power;
> > +			idle_stamp = rq->idle_stamp;
> > +
> > +			/* The cpu is idle since a shorter time */
> > +			if (idle_stamp < min_idle_stamp) {
> > +				min_idle_stamp = idle_stamp;
> > +				cpu_idle = i;
> > +				continue;
> 
> Don't you want the highest time stamp in order to select the most 
> recently idled CPU?  Favoring the CPU which has been idle the longest 
> makes little sense.

It may make sense if the hardware can auto-promote CPUs to deeper C-states.

Something like that happens with package C-states that are only entered when
all cores have entered a particular core C-state already.  In that case the
probability of the core being in a deeper state grows with time.

That said, I would just drop this heuristic for the time being.  If auto-promotion
is disregarded, it doesn't really matter how much time the given CPU has been idle
except for one case: When the target residency of its idle state hasn't been
reached yet, waking up the CPU may be a mistake (depending on how deep the state
actually is, but for the majority of drivers in the tree we don't have any measure
of that).
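
The target-residency caveat can be sketched in user space as follows.  This is
only an illustration under stated assumptions: struct idle_info and
residency_reached() are hypothetical names, not kernel API, and all times are
assumed to be in microseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU snapshot; not a kernel structure. */
struct idle_info {
	uint64_t idle_stamp_us;        /* time at which the CPU entered idle */
	uint32_t target_residency_us;  /* minimum stay that pays for the state */
};

/*
 * Return true if the CPU has already been idle for at least the target
 * residency of its current state.  Waking it earlier than that may cost
 * more energy than the idle state saved.
 */
static bool residency_reached(const struct idle_info *ci, uint64_t now_us)
{
	assert(now_us >= ci->idle_stamp_us);
	return now_us - ci->idle_stamp_us >= ci->target_residency_us;
}
```

A selection loop could use such a check to skip, or at least deprioritize, CPUs
that entered a deep state only very recently.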

> > +			}
> > +
> > +			/* The cpu is idle but the exit_latency is shorter */
> > +			if (power && power->exit_latency < min_exit_latency) {
> > +				min_exit_latency = power->exit_latency;
> > +				cpu_idle = i;
> > +				continue;
> > +			}
> 
> I think this is wrong.  This gives priority to CPUs which have been idle 
> for a (longer... although this should have been) shorter period of time 
> over those with a shallower idle state.  I think this should rather be:
> 
> 	if (power && power->exit_latency < min_exit_latency) {
> 		min_exit_latency = power->exit_latency;
> 		latest_idle_stamp = idle_stamp;
> 	       	cpu_idle = i;
> 	} else if ((!power || power->exit_latency == min_exit_latency) &&
> 		   idle_stamp > latest_idle_stamp) {
> 		latest_idle_stamp = idle_stamp;
> 		cpu_idle = i;
> 	}
> 
> So the CPU with the shallowest idle state is selected in priority, and 
> if many CPUs are in the same state then the time stamp is used to 
> select the most recent one.

Again, if auto-promotion is disregarded, it doesn't really matter which of them
is woken up.

> Whenever a shallower idle state is found then the latest_idle_stamp is reset for 
> that state even if it is further in the past.
> 
> > +		} else {
> > +
> > +			load = weighted_cpuload(i);
> > +
> > +			if (load < min_load ||
> > +			    (load == min_load && i == this_cpu)) {
> > +				min_load = load;
> > +				cpu_busy = i;
> > +				continue;
> > +			}
> >  		}
> 
> I think this is wrong to do an if-else based on idle_cpu() here.  What 
> if a CPU is heavily loaded, but for some reason it happens to be idle at 
> this very moment?  With your patch it could be selected as an idle CPU 
> while it would be discarded as being too busy otherwise.

But see below ->

> It is important to determine both cpu_busy and cpu_idle for all CPUs.
> 
> And cpu_busy is a bad name for this.  Something like least_loaded would 
> be more self explanatory.  Same thing for cpu_idle which could be 
> clearer if named shalloest_idle.

shallowest_idle?

> > -	return idlest;
> > +	/* Busy cpus are considered less idle than idle cpus ;) */
> > +	return cpu_busy != -1 ? cpu_busy : cpu_idle;
> 
> And finally it is a policy decision whether or not we want to return 
> least_loaded over shallowest_idle e.g do we pack tasks on non idle CPUs 
> first or not.  That in itself needs more investigation.  To keep the 
> existing policy unchanged for now the above condition should have its 
> variables swapped.

Which means that once we've found the first idle CPU, it is not useful to
continue computing least_loaded, because we will return the idle one anyway,
right?
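
To illustrate, here is a minimal user-space sketch of a loop that stops tracking
load as soon as one idle CPU has been seen, assuming the existing policy (prefer
an idle CPU over a busy one) is kept.  struct cpu_snap and pick_cpu() are
hypothetical names, and the this_cpu tie-break of the real function is left out
for brevity.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical snapshot of one CPU; not a kernel structure. */
struct cpu_snap {
	bool idle;
	unsigned int exit_latency;  /* meaningful only when idle */
	unsigned long load;         /* meaningful only when busy */
};

/* Return the index of the chosen CPU, or -1 if nr is 0. */
static int pick_cpu(const struct cpu_snap *cpus, int nr)
{
	unsigned int min_exit_latency = UINT_MAX;
	unsigned long min_load = ULONG_MAX;
	int shallowest_idle = -1;
	int least_loaded = -1;

	assert(nr >= 0);

	for (int i = 0; i < nr; i++) {
		if (cpus[i].idle) {
			if (shallowest_idle == -1 ||
			    cpus[i].exit_latency < min_exit_latency) {
				min_exit_latency = cpus[i].exit_latency;
				shallowest_idle = i;
			}
		} else if (shallowest_idle == -1 &&
			   (least_loaded == -1 || cpus[i].load < min_load)) {
			/* Load only matters while no idle CPU has been seen. */
			min_load = cpus[i].load;
			least_loaded = i;
		}
	}

	/* Existing policy: an idle CPU always wins over a busy one. */
	return shallowest_idle != -1 ? shallowest_idle : least_loaded;
}
```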

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
  2014-04-04 11:57     ` Rafael J. Wysocki
@ 2014-04-04 16:56       ` Nicolas Pitre
  2014-04-05  2:01         ` Rafael J. Wysocki
  0 siblings, 1 reply; 47+ messages in thread
From: Nicolas Pitre @ 2014-04-04 16:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Daniel Lezcano, linux-kernel, mingo, peterz, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Fri, 4 Apr 2014, Rafael J. Wysocki wrote:

> On Tuesday, April 01, 2014 11:05:49 PM Nicolas Pitre wrote:
> > On Fri, 28 Mar 2014, Daniel Lezcano wrote:
> > 
> > > As we know in which idle state the cpu is, we can investigate the following:
> > > 
> > > 1. when did the cpu entered the idle state ? the longer the cpu is idle, the
> > > deeper it is idle
> > > 2. what exit latency is ? the greater the exit latency is, the deeper it is
> > > 
> > > With both information, when all cpus are idle, we can choose the idlest cpu.
> > > 
> > > When one cpu is not idle, the old check against weighted load applies.
> > > 
> > > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> > 
> > There seems to be some problems with the implementation.
> > 
> > > @@ -4336,20 +4337,53 @@ static int
> > >  find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
> > >  {
> > >  	unsigned long load, min_load = ULONG_MAX;
> > > -	int idlest = -1;
> > > +	unsigned int min_exit_latency = UINT_MAX;
> > > +	u64 idle_stamp, min_idle_stamp = ULONG_MAX;
> > 
> > I don't think you really meant to assign an u64 variable with ULONG_MAX.
> > You probably want ULLONG_MAX here.  And probably not in fact (more 
> > later).
> > 
> > > +
> > > +	struct rq *rq;
> > > +	struct cpuidle_power *power;
> > > +
> > > +	int cpu_idle = -1;
> > > +	int cpu_busy = -1;
> > >  	int i;
> > >  
> > >  	/* Traverse only the allowed CPUs */
> > >  	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
> > > -		load = weighted_cpuload(i);
> > >  
> > > -		if (load < min_load || (load == min_load && i == this_cpu)) {
> > > -			min_load = load;
> > > -			idlest = i;
> > > +		if (idle_cpu(i)) {
> > > +
> > > +			rq = cpu_rq(i);
> > > +			power = rq->power;
> > > +			idle_stamp = rq->idle_stamp;
> > > +
> > > +			/* The cpu is idle since a shorter time */
> > > +			if (idle_stamp < min_idle_stamp) {
> > > +				min_idle_stamp = idle_stamp;
> > > +				cpu_idle = i;
> > > +				continue;
> > 
> > Don't you want the highest time stamp in order to select the most 
> > recently idled CPU?  Favoring the CPU which has been idle the longest 
> > makes little sense.
> 
> It may make sense if the hardware can auto-promote CPUs to deeper C-states.

If so the promotion will happen over time, no?  What I'm saying here is 
that those CPUs which have been idle longer should not be favored when 
it is time to select a CPU for a task to run. More recently idled CPUs 
are more likely to be in a shallower C-state.

> Something like that happens with package C-states that are only entered when
> all cores have entered a particular core C-state already.  In that case the
> probability of the core being in a deeper state grows with time.

Exactly what I'm saying.

Also here it is worth remembering that the scheduling domains should 
represent those packages that share common C-states at a higher level.  
The scheduler can then be told not to balance across domains if it 
doesn't need to in order to favor the conditions for those package 
C-states to be used.  That's what the task packing patch series is 
about, independently of this one.

> That said I would just drop this heuristics for the time being.  If auto-promotion
> is disregarded, it doesn't really matter how much time the given CPU has been idle
> except for one case: When the target residency of its idle state hasn't been
> reached yet, waking up the CPU may be a mistake (depending on how deep the state
> actually is, but for the majority of drivers in the tree we don't have any measure
> of that).

There is one reason for considering the time a CPU has been idle, 
assuming equivalent C-state, and that is cache snooping.  The longer a 
CPU is idle, the more likely its cache content will have been claimed 
and migrated by other CPUs.  Of course that doesn't make much difference 
for deeper C-states where the cache isn't preserved, but it is probably 
simpler and cheaper to apply this heuristic in all cases.

> > > +			}
> > > +
> > > +			/* The cpu is idle but the exit_latency is shorter */
> > > +			if (power && power->exit_latency < min_exit_latency) {
> > > +				min_exit_latency = power->exit_latency;
> > > +				cpu_idle = i;
> > > +				continue;
> > > +			}
> > 
> > I think this is wrong.  This gives priority to CPUs which have been idle 
> > for a (longer... although this should have been) shorter period of time 
> > over those with a shallower idle state.  I think this should rather be:
> > 
> > 	if (power && power->exit_latency < min_exit_latency) {
> > 		min_exit_latency = power->exit_latency;
> > 		latest_idle_stamp = idle_stamp;
> > 	       	cpu_idle = i;
> > 	} else if ((!power || power->exit_latency == min_exit_latency) &&
> > 		   idle_stamp > latest_idle_stamp) {
> > 		latest_idle_stamp = idle_stamp;
> > 		cpu_idle = i;
> > 	}
> > 
> > So the CPU with the shallowest idle state is selected in priority, and 
> > if many CPUs are in the same state then the time stamp is used to 
> > select the most recent one.
> 
> Again, if auto-promotion is disregarded, it doesn't really matter which of them
> is woken up.

If it doesn't matter then it doesn't hurt.  But in some cases it 
matters.
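
For reference, the ordering argued for here (shallowest state first, most
recently idled CPU as the tie-break) can be written as a single comparison in a
user-space sketch.  struct idle_cpu_snap and pick_shallowest_idle() are
hypothetical names, and the array is assumed to contain idle CPUs only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of an idle CPU; not a kernel structure. */
struct idle_cpu_snap {
	unsigned int exit_latency;  /* smaller means shallower state */
	uint64_t idle_stamp;        /* larger means idled more recently */
};

/* Return the index of the preferred idle CPU, or -1 if nr is 0. */
static int pick_shallowest_idle(const struct idle_cpu_snap *c, int nr)
{
	int best = -1;

	assert(nr >= 0);

	for (int i = 0; i < nr; i++) {
		if (best == -1 ||
		    c[i].exit_latency < c[best].exit_latency ||
		    (c[i].exit_latency == c[best].exit_latency &&
		     c[i].idle_stamp > c[best].idle_stamp))
			best = i;
	}
	return best;
}
```

When all candidates share the same exit latency the result depends on the time
stamp alone, which is the case where the cache argument above would apply.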

> > Whenever a shallower idle state is found then the latest_idle_stamp is reset for 
> > that state even if it is further in the past.
> > 
> > > +		} else {
> > > +
> > > +			load = weighted_cpuload(i);
> > > +
> > > +			if (load < min_load ||
> > > +			    (load == min_load && i == this_cpu)) {
> > > +				min_load = load;
> > > +				cpu_busy = i;
> > > +				continue;
> > > +			}
> > >  		}
> > 
> > I think this is wrong to do an if-else based on idle_cpu() here.  What 
> > if a CPU is heavily loaded, but for some reason it happens to be idle at 
> > this very moment?  With your patch it could be selected as an idle CPU 
> > while it would be discarded as being too busy otherwise.
> 
> But see below ->
> 
> > It is important to determine both cpu_busy and cpu_idle for all CPUs.
> > 
> > And cpu_busy is a bad name for this.  Something like least_loaded would 
> > be more self explanatory.  Same thing for cpu_idle which could be 
> > clearer if named shalloest_idle.
> 
> shallowest_idle?

Something that means the CPU with the shallowest C-state.  Using 
"cpu_idle" for this variable doesn't cut it.

> > > -	return idlest;
> > > +	/* Busy cpus are considered less idle than idle cpus ;) */
> > > +	return cpu_busy != -1 ? cpu_busy : cpu_idle;
> > 
> > And finally it is a policy decision whether or not we want to return 
> > least_loaded over shallowest_idle e.g do we pack tasks on non idle CPUs 
> > first or not.  That in itself needs more investigation.  To keep the 
> > existing policy unchanged for now the above condition should have its 
> > variables swapped.
> 
> Which means that once we've find the first idle CPU, it is not useful to
> continue computing least_loaded, because we will return the idle one anyway,
> right?

Good point.  Currently, that should be the case.

Eventually we'll want to put new tasks on lightly loaded CPUs instead of 
waking up a fully idle CPU in order to favor deeper C-states. But that 
requires a patch series of its own just to determine how loaded a CPU is 
and how much work it can still accommodate before being oversubscribed, 
etc.


Nicolas

* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
  2014-04-04 16:56       ` Nicolas Pitre
@ 2014-04-05  2:01         ` Rafael J. Wysocki
  0 siblings, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2014-04-05  2:01 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Daniel Lezcano, linux-kernel, mingo, peterz, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Friday, April 04, 2014 12:56:52 PM Nicolas Pitre wrote:
> On Fri, 4 Apr 2014, Rafael J. Wysocki wrote:
> 
> > On Tuesday, April 01, 2014 11:05:49 PM Nicolas Pitre wrote:
> > > On Fri, 28 Mar 2014, Daniel Lezcano wrote:
> > > 
> > > > As we know in which idle state the cpu is, we can investigate the following:
> > > > 
> > > > 1. when did the cpu entered the idle state ? the longer the cpu is idle, the
> > > > deeper it is idle
> > > > 2. what exit latency is ? the greater the exit latency is, the deeper it is
> > > > 
> > > > With both information, when all cpus are idle, we can choose the idlest cpu.
> > > > 
> > > > When one cpu is not idle, the old check against weighted load applies.
> > > > 
> > > > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> > > 
> > > There seems to be some problems with the implementation.
> > > 
> > > > @@ -4336,20 +4337,53 @@ static int
> > > >  find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
> > > >  {
> > > >  	unsigned long load, min_load = ULONG_MAX;
> > > > -	int idlest = -1;
> > > > +	unsigned int min_exit_latency = UINT_MAX;
> > > > +	u64 idle_stamp, min_idle_stamp = ULONG_MAX;
> > > 
> > > I don't think you really meant to assign an u64 variable with ULONG_MAX.
> > > You probably want ULLONG_MAX here.  And probably not in fact (more 
> > > later).
> > > 
> > > > +
> > > > +	struct rq *rq;
> > > > +	struct cpuidle_power *power;
> > > > +
> > > > +	int cpu_idle = -1;
> > > > +	int cpu_busy = -1;
> > > >  	int i;
> > > >  
> > > >  	/* Traverse only the allowed CPUs */
> > > >  	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
> > > > -		load = weighted_cpuload(i);
> > > >  
> > > > -		if (load < min_load || (load == min_load && i == this_cpu)) {
> > > > -			min_load = load;
> > > > -			idlest = i;
> > > > +		if (idle_cpu(i)) {
> > > > +
> > > > +			rq = cpu_rq(i);
> > > > +			power = rq->power;
> > > > +			idle_stamp = rq->idle_stamp;
> > > > +
> > > > +			/* The cpu is idle since a shorter time */
> > > > +			if (idle_stamp < min_idle_stamp) {
> > > > +				min_idle_stamp = idle_stamp;
> > > > +				cpu_idle = i;
> > > > +				continue;
> > > 
> > > Don't you want the highest time stamp in order to select the most 
> > > recently idled CPU?  Favoring the CPU which has been idle the longest 
> > > makes little sense.
> > 
> > It may make sense if the hardware can auto-promote CPUs to deeper C-states.
> 
> If so the promotion will happen over time, no?  What I'm saying here is 
> that those CPUs which have been idle longer should not be favored when 
> it is time to select a CPU for a task to run. More recently idled CPUs 
> are more likely to be in a shallower C-state.
> 
> > Something like that happens with package C-states that are only entered when
> > all cores have entered a particular core C-state already.  In that case the
> > probability of the core being in a deeper state grows with time.
> 
> Exactly what I'm saying.

Right, I got that the other way around by mistake.

> Also here it is worth remembering that the scheduling domains should 
> represent those packages that share common C-states at a higher level.  
> The scheduler can then be told not to balance across domains if it 
> doesn't need to in order to favor the conditions for those package 
> C-states to be used.  That's what the task packing patch series is 
> about, independently of this one.
> 
> > That said I would just drop this heuristics for the time being.  If auto-promotion
> > is disregarded, it doesn't really matter how much time the given CPU has been idle
> > except for one case: When the target residency of its idle state hasn't been
> > reached yet, waking up the CPU may be a mistake (depending on how deep the state
> > actually is, but for the majority of drivers in the tree we don't have any measure
> > of that).
> 
> There is one reason for considering the time a CPU has been idle, 
> assuming equivalent C-state, and that is cache snooping.  The longer a 
> CPU is idle, the more likely its cache content will have been claimed 
> and migrated by other CPUs.  Of course that doesn't make much difference 
> for deeper C-states where the cache isn't preserved, but it is probably 
> simpler and cheaper to apply this heuristic in all cases.

Yes, that sounds like it might be a reason, but I'd like to see numbers
confirming that to be honest.

> > > > +			}
> > > > +
> > > > +			/* The cpu is idle but the exit_latency is shorter */
> > > > +			if (power && power->exit_latency < min_exit_latency) {
> > > > +				min_exit_latency = power->exit_latency;
> > > > +				cpu_idle = i;
> > > > +				continue;
> > > > +			}
> > > 
> > > I think this is wrong.  This gives priority to CPUs which have been idle 
> > > for a (longer... although this should have been) shorter period of time 
> > > over those with a shallower idle state.  I think this should rather be:
> > > 
> > > 	if (power && power->exit_latency < min_exit_latency) {
> > > 		min_exit_latency = power->exit_latency;
> > > 		latest_idle_stamp = idle_stamp;
> > > 	       	cpu_idle = i;
> > > 	} else if ((!power || power->exit_latency == min_exit_latency) &&
> > > 		   idle_stamp > latest_idle_stamp) {
> > > 		latest_idle_stamp = idle_stamp;
> > > 		cpu_idle = i;
> > > 	}
> > > 
> > > So the CPU with the shallowest idle state is selected in priority, and 
> > > if many CPUs are in the same state then the time stamp is used to 
> > > select the most recent one.
> > 
> > Again, if auto-promotion is disregarded, it doesn't really matter which of them
> > is woken up.
> 
> If it doesn't matter then it doesn't hurt.  But in some cases it 
> matters.
> 
> > > Whenever a shallower idle state is found then the latest_idle_stamp is reset for 
> > > that state even if it is further in the past.
> > > 
> > > > +		} else {
> > > > +
> > > > +			load = weighted_cpuload(i);
> > > > +
> > > > +			if (load < min_load ||
> > > > +			    (load == min_load && i == this_cpu)) {
> > > > +				min_load = load;
> > > > +				cpu_busy = i;
> > > > +				continue;
> > > > +			}
> > > >  		}
> > > 
> > > I think this is wrong to do an if-else based on idle_cpu() here.  What 
> > > if a CPU is heavily loaded, but for some reason it happens to be idle at 
> > > this very moment?  With your patch it could be selected as an idle CPU 
> > > while it would be discarded as being too busy otherwise.
> > 
> > But see below ->
> > 
> > > It is important to determine both cpu_busy and cpu_idle for all CPUs.
> > > 
> > > And cpu_busy is a bad name for this.  Something like least_loaded would 
> > > be more self explanatory.  Same thing for cpu_idle which could be 
> > > clearer if named shalloest_idle.
> > 
> > shallowest_idle?
> 
> Something that means the CPU with the shallowest C-state.  Using 
> "cpu_idle" for this variable doesn't cut it.

Yes, that was about the typo above only. :-)

> > > > -	return idlest;
> > > > +	/* Busy cpus are considered less idle than idle cpus ;) */
> > > > +	return cpu_busy != -1 ? cpu_busy : cpu_idle;
> > > 
> > > And finally it is a policy decision whether or not we want to return 
> > > least_loaded over shallowest_idle e.g do we pack tasks on non idle CPUs 
> > > first or not.  That in itself needs more investigation.  To keep the 
> > > existing policy unchanged for now the above condition should have its 
> > > variables swapped.
> > 
> > Which means that once we've find the first idle CPU, it is not useful to
> > continue computing least_loaded, because we will return the idle one anyway,
> > right?
> 
> Good point.  Currently, that should be the case.
> 
> Eventually we'll want to put new tasks on lightly loaded CPUs instead of 
> waking up a fully idle CPU in order to favor deeper C-states. But that 
> requires a patch series of its own just to determine how loaded a CPU is 
> and how much work it can still accommodate before being oversubscribed, 
> etc.

Wouldn't we need power consumption numbers for that realistically?

Rafael


* Re: [RFC PATCHC 2/3] idle: store the idle state the cpu is
  2014-03-28 12:29 ` [RFC PATCHC 2/3] idle: store the idle state the cpu is Daniel Lezcano
@ 2014-04-15 12:43   ` Peter Zijlstra
  2014-04-15 12:44     ` Peter Zijlstra
  0 siblings, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-15 12:43 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linux-kernel, mingo, rjw, nicolas.pitre, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Fri, Mar 28, 2014 at 01:29:55PM +0100, Daniel Lezcano wrote:
> @@ -143,6 +145,10 @@ static int cpuidle_idle_call(void)
>  			if (!ret) {
>  				trace_cpu_idle_rcuidle(next_state, dev->cpu);
>  
> +				*power = &drv->states[next_state].power;
> +
> +				wmb();
> +

I very much suspect you meant: smp_wmb(), as I don't see the hardware
reading that pointer, therefore UP wouldn't care. Also, any and all
barriers should come with a comment that describes the data ordering and
points to the matching barriers.
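
As an illustration of the commenting convention, here is a user-space analogy
using C11 atomics rather than the kernel primitives: the release store stands in
for smp_wmb() followed by the pointer store, and the acquire load for the
matching read side.  struct idle_params, publish_state() and
read_exit_latency() are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical idle-state parameters; not a kernel structure. */
struct idle_params {
	unsigned int exit_latency;
};

/* Pointer shared between the idle path (writer) and the scheduler (reader). */
static _Atomic(const struct idle_params *) cur_state;

/*
 * Writer side.  The release store orders the initialisation of *p before
 * the pointer becomes visible.  Pairs with the acquire load in
 * read_exit_latency().
 */
static void publish_state(const struct idle_params *p)
{
	atomic_store_explicit(&cur_state, p, memory_order_release);
}

/*
 * Reader side.  Pairs with the release store in publish_state(): if a
 * non-NULL pointer is observed, the fields behind it are visible as well.
 */
static unsigned int read_exit_latency(unsigned int fallback)
{
	const struct idle_params *p =
		atomic_load_explicit(&cur_state, memory_order_acquire);

	return p ? p->exit_latency : fallback;
}
```

Each comment names both the ordering it provides and the function holding its
counterpart, so the pairing can be checked by reading either side.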

* Re: [RFC PATCHC 2/3] idle: store the idle state the cpu is
  2014-04-15 12:43   ` Peter Zijlstra
@ 2014-04-15 12:44     ` Peter Zijlstra
  2014-04-15 14:17       ` Daniel Lezcano
  0 siblings, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-15 12:44 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linux-kernel, mingo, rjw, nicolas.pitre, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Tue, Apr 15, 2014 at 02:43:30PM +0200, Peter Zijlstra wrote:
> On Fri, Mar 28, 2014 at 01:29:55PM +0100, Daniel Lezcano wrote:
> > @@ -143,6 +145,10 @@ static int cpuidle_idle_call(void)
> >  			if (!ret) {
> >  				trace_cpu_idle_rcuidle(next_state, dev->cpu);
> >  
> > +				*power = &drv->states[next_state].power;
> > +
> > +				wmb();
> > +
> 
> I very much suspect you meant: smp_wmb(), as I don't see the hardware
> reading that pointer, therefore UP wouldn't care. Also, any and all
> barriers should come with a comment that describes the data ordering and
> points to the matching barriers.

Furthermore, this patch fails to describe the life-time rules of the
object placed there. Can the object pointed to ever disappear?

* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
  2014-03-28 12:29 ` [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu Daniel Lezcano
  2014-04-02  3:05   ` Nicolas Pitre
@ 2014-04-15 13:03   ` Peter Zijlstra
  1 sibling, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-15 13:03 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linux-kernel, mingo, rjw, nicolas.pitre, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Fri, Mar 28, 2014 at 01:29:56PM +0100, Daniel Lezcano wrote:
> @@ -4336,20 +4337,53 @@ static int
>  find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
>  {
>  	unsigned long load, min_load = ULONG_MAX;
> -	int idlest = -1;
> +	unsigned int min_exit_latency = UINT_MAX;
> +	u64 idle_stamp, min_idle_stamp = ULONG_MAX;
> +
> +	struct rq *rq;
> +	struct cpuidle_power *power;
> +
> +	int cpu_idle = -1;
> +	int cpu_busy = -1;
>  	int i;
>  
>  	/* Traverse only the allowed CPUs */
>  	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
> -		load = weighted_cpuload(i);
>  
> -		if (load < min_load || (load == min_load && i == this_cpu)) {
> -			min_load = load;
> -			idlest = i;
> +		if (idle_cpu(i)) {
> +
> +			rq = cpu_rq(i);
> +			power = rq->power;
> +			idle_stamp = rq->idle_stamp;
> +
> +			/* The cpu is idle since a shorter time */
> +			if (idle_stamp < min_idle_stamp) {
> +				min_idle_stamp = idle_stamp;
> +				cpu_idle = i;
> +				continue;
> +			}
> +
> +			/* The cpu is idle but the exit_latency is shorter */
> +			if (power && power->exit_latency < min_exit_latency) {
> +				min_exit_latency = power->exit_latency;
> +				cpu_idle = i;
> +				continue;
> +			}

Aside from the arguments made by Nico (which I agree with), depending on
the life time rules of the power object we might need
smp_read_barrier_depends() between reading and using.

If all these objects are static and never change content, we do not; if
there are dynamic objects involved, we probably should.
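
For illustration only, the ordering concern above can be modelled in user
space with C11 atomics. This is a sketch under assumptions, not kernel code:
'struct idle_power', 'publish_power()' and 'read_exit_latency()' are
hypothetical names, and release/acquire stands in for the kernel's own
barrier primitives. It only orders the pointer publication against the reads
made through it; it does nothing for the life time of the pointed-to object.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-state power information. */
struct idle_power {
	unsigned int exit_latency;
};

/* Pointer published by the CPU entering idle, NULL when not idle. */
static _Atomic(struct idle_power *) current_power;

/* Writer side: the contents of *p must be visible before the pointer is. */
void publish_power(struct idle_power *p)
{
	atomic_store_explicit(&current_power, p, memory_order_release);
}

/*
 * Reader side: load the pointer once, then dereference that local copy.
 * The acquire load pairs with the release store in publish_power().
 */
unsigned int read_exit_latency(unsigned int fallback)
{
	struct idle_power *p =
		atomic_load_explicit(&current_power, memory_order_acquire);

	return p ? p->exit_latency : fallback;
}
```

Loading the pointer once into a local variable also avoids the reader seeing
two different values between the NULL check and the dereference.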

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-01  9:05         ` Vincent Guittot
@ 2014-04-15 13:13           ` Peter Zijlstra
  0 siblings, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-15 13:13 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Daniel Lezcano, linux-kernel, Ingo Molnar, rjw, Nicolas Pitre,
	linux-pm, Alex Shi, Morten Rasmussen

On Tue, Apr 01, 2014 at 11:05:16AM +0200, Vincent Guittot wrote:
> >> We are working on a bench which can generate middle load pattern with
> >> idle CPUs but it's not available yet. In the mean time, one bench that
> >> plays with idle time is cyclictest, it will not give you performance
> >> results but only scheduling latency which might be what you are
> >> looking for.
> >
> >
> > Yeah, thanks. I believe I know what is in the rt-tests package :)
> >
> > What I meant is what kind of values would you like to see with this patchset
> > ?
> 
> IIUC, you patch tries to improve the wake up latency of a task by
> selecting the CPUs with the shallowest C-state, so this metrics seems
> to be a good candidate

cyclic-test might be too regular to really measure anything though.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-04 11:43     ` Rafael J. Wysocki
@ 2014-04-15 13:17       ` Peter Zijlstra
  2014-04-15 13:25       ` Peter Zijlstra
  1 sibling, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-15 13:17 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Nicolas Pitre, Daniel Lezcano, linux-kernel, mingo, linux-pm,
	alex.shi, vincent.guittot, morten.rasmussen

On Fri, Apr 04, 2014 at 01:43:00PM +0200, Rafael J. Wysocki wrote:
> > We need to ensure the cpuidle data structure is not going away (e.g. 
> > cpuidle driver module removal) while another CPU looks at it though.  
> > The timing would have to be awfully weird for this to happen but still.
> 
> Well, I'm not sure if that is a real concern.  Only a couple of drivers try
> to implement module unloading and I guess this isn't tested too much, so
> perhaps we should just make it impossible to unload a cpuidle driver?

The 'easy' solution is to mandate the use of rcu_read_lock() around the
dereference and make all cpuidle drivers put an rcu_barrier() in their
module unload path.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-04 11:43     ` Rafael J. Wysocki
  2014-04-15 13:17       ` Peter Zijlstra
@ 2014-04-15 13:25       ` Peter Zijlstra
  2014-04-15 15:27         ` Nicolas Pitre
  2014-04-15 15:33         ` Rafael J. Wysocki
  1 sibling, 2 replies; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-15 13:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Nicolas Pitre, Daniel Lezcano, linux-kernel, mingo, linux-pm,
	alex.shi, vincent.guittot, morten.rasmussen

On Fri, Apr 04, 2014 at 01:43:00PM +0200, Rafael J. Wysocki wrote:
> > That's what this patch series is about.  The find_idlest_cpu code should 
> > look for the idle CPU with the shallowest idle state, or the one with 
> > the smallest load.  In this context "find_idlest_cpu" might become a 
> > misnomer.
> 
> Yes, clearly.  It should be called find_best_cpu or something like that.

Ha!, but for what purpose? We already have find_busiest_cpu() to find
the CPU to steal work from. The converse action, currently called
find_idlest_cpu() is finding the CPU where to put work.

'Best' is ambiguous in all regards, it doesn't convey the direction nor
the quality sorted on.

So while idlest might be somewhat of a misnomer, it at least conveys the
directional thing fairly well. Also we are still searching for the least
busy, and preferably an idle, cpu. 'Idlest' being a superlative also
conveys the meaning of order.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCHC 2/3] idle: store the idle state the cpu is
  2014-04-15 12:44     ` Peter Zijlstra
@ 2014-04-15 14:17       ` Daniel Lezcano
  2014-04-15 14:33         ` Peter Zijlstra
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-15 14:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, rjw, nicolas.pitre, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On 04/15/2014 02:44 PM, Peter Zijlstra wrote:
> On Tue, Apr 15, 2014 at 02:43:30PM +0200, Peter Zijlstra wrote:
>> On Fri, Mar 28, 2014 at 01:29:55PM +0100, Daniel Lezcano wrote:
>>> @@ -143,6 +145,10 @@ static int cpuidle_idle_call(void)
>>>   			if (!ret) {
>>>   				trace_cpu_idle_rcuidle(next_state, dev->cpu);
>>>
>>> +				*power = &drv->states[next_state].power;
>>> +
>>> +				wmb();
>>> +
>>
>> I very much suspect you meant: smp_wmb(), as I don't see the hardware
>> reading that pointer, therefore UP wouldn't care. Also, any and all
>> barriers should come with a comment that describes the data ordering and
>> points to the matching barriers.
>
> Furthermore, this patch fails to describe the life-time rules of the
> object placed there. Can the object pointed to ever disappear?

Hi Peter,

thanks for reviewing the patches.

There are a couple of situations where a cpuidle state can disappear:

1. For x86/acpi with dynamic c-states, when a laptop switches from 
battery to AC that could result in removing the deeper idle state. The 
acpi driver triggers:

'acpi_processor_cst_has_changed' which will call 
'cpuidle_pause_and_lock'. This one will call 
'cpuidle_uninstall_idle_handler' which in turn calls 'kick_all_cpus_sync'.

All cpus will exit their idle state and the pointed object will be set 
to NULL again.

2. The cpuidle driver is unloaded. Logically that could happen but not 
in practice because the drivers are always compiled in and 95% of the 
drivers are not coded to unregister the driver. Anyway ...

The unloading code must call 'cpuidle_unregister_device', that calls 
'cpuidle_pause_and_lock' leading to 'kick_all_cpus_sync'.

IIUC, the race can happen if we take the pointer and then one of these 
two situations occurs at the same moment.

As the function 'find_idlest_cpu' runs inside rcu_read_lock, maybe an 
rcu_barrier in 'cpuidle_pause_and_lock' or 
'cpuidle_uninstall_idle_handler' should suffice, no ?

Thanks

   -- Daniel


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCHC 2/3] idle: store the idle state the cpu is
  2014-04-15 14:17       ` Daniel Lezcano
@ 2014-04-15 14:33         ` Peter Zijlstra
  2014-04-15 14:39           ` Daniel Lezcano
  0 siblings, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-15 14:33 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linux-kernel, mingo, rjw, nicolas.pitre, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Tue, Apr 15, 2014 at 04:17:36PM +0200, Daniel Lezcano wrote:
> On 04/15/2014 02:44 PM, Peter Zijlstra wrote:
> >On Tue, Apr 15, 2014 at 02:43:30PM +0200, Peter Zijlstra wrote:
> >>On Fri, Mar 28, 2014 at 01:29:55PM +0100, Daniel Lezcano wrote:
> >>>@@ -143,6 +145,10 @@ static int cpuidle_idle_call(void)
> >>>  			if (!ret) {
> >>>  				trace_cpu_idle_rcuidle(next_state, dev->cpu);
> >>>
> >>>+				*power = &drv->states[next_state].power;
> >>>+
> >>>+				wmb();
> >>>+
> >>
> >>I very much suspect you meant: smp_wmb(), as I don't see the hardware
> >>reading that pointer, therefore UP wouldn't care. Also, any and all
> >>barriers should come with a comment that describes the data ordering and
> >>points to the matching barriers.
> >
> >Furthermore, this patch fails to describe the life-time rules of the
> >object placed there. Can the object pointed to ever disappear?
> 
> Hi Peter,
> 
> thanks for reviewing the patches.
> 
> There are a couple of situations where a cpuidle state can disappear:
> 
> 1. For x86/acpi with dynamic c-states, when a laptop switches from battery
> to AC that could result in removing the deeper idle state. The acpi driver
> triggers:
> 
> 'acpi_processor_cst_has_changed' which will call 'cpuidle_pause_and_lock'.
> This one will call 'cpuidle_uninstall_idle_handler' which in turn calls
> 'kick_all_cpus_sync'.
> 
> All cpus will exit their idle state and the pointed object will be set to
> NULL again.
> 
> 2. The cpuidle driver is unloaded. Logically that could happen but not in
> practice because the drivers are always compiled in and 95% of the drivers
> are not coded to unregister the driver. Anyway ...
> 
> The unloading code must call 'cpuidle_unregister_device', that calls
> 'cpuidle_pause_and_lock' leading to 'kick_all_cpus_sync'.
> 
> IIUC, the race can happen if we take the pointer and then one of these two
> situations occurs at the same moment.
> 
> As the function 'find_idlest_cpu' runs inside rcu_read_lock, maybe an
> rcu_barrier in 'cpuidle_pause_and_lock' or 'cpuidle_uninstall_idle_handler'
> should suffice, no ?

Indeed. But be sure to document this.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCHC 2/3] idle: store the idle state the cpu is
  2014-04-15 14:33         ` Peter Zijlstra
@ 2014-04-15 14:39           ` Daniel Lezcano
  0 siblings, 0 replies; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-15 14:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, rjw, nicolas.pitre, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On 04/15/2014 04:33 PM, Peter Zijlstra wrote:
> On Tue, Apr 15, 2014 at 04:17:36PM +0200, Daniel Lezcano wrote:
>> On 04/15/2014 02:44 PM, Peter Zijlstra wrote:
>>> On Tue, Apr 15, 2014 at 02:43:30PM +0200, Peter Zijlstra wrote:
>>>> On Fri, Mar 28, 2014 at 01:29:55PM +0100, Daniel Lezcano wrote:
>>>>> @@ -143,6 +145,10 @@ static int cpuidle_idle_call(void)
>>>>>   			if (!ret) {
>>>>>   				trace_cpu_idle_rcuidle(next_state, dev->cpu);
>>>>>
>>>>> +				*power = &drv->states[next_state].power;
>>>>> +
>>>>> +				wmb();
>>>>> +
>>>>
>>>> I very much suspect you meant: smp_wmb(), as I don't see the hardware
>>>> reading that pointer, therefore UP wouldn't care. Also, any and all
>>>> barriers should come with a comment that describes the data ordering and
>>>> points to the matching barriers.
>>>
>>> Furthermore, this patch fails to describe the life-time rules of the
>>> object placed there. Can the object pointed to ever disappear?
>>
>> Hi Peter,
>>
>> thanks for reviewing the patches.
>>
>> There are a couple of situations where a cpuidle state can disappear:
>>
>> 1. For x86/acpi with dynamic c-states, when a laptop switches from battery
>> to AC that could result in removing the deeper idle state. The acpi driver
>> triggers:
>>
>> 'acpi_processor_cst_has_changed' which will call 'cpuidle_pause_and_lock'.
>> This one will call 'cpuidle_uninstall_idle_handler' which in turn calls
>> 'kick_all_cpus_sync'.
>>
>> All cpus will exit their idle state and the pointed object will be set to
>> NULL again.
>>
>> 2. The cpuidle driver is unloaded. Logically that could happen but not in
>> practice because the drivers are always compiled in and 95% of the drivers
>> are not coded to unregister the driver. Anyway ...
>>
>> The unloading code must call 'cpuidle_unregister_device', that calls
>> 'cpuidle_pause_and_lock' leading to 'kick_all_cpus_sync'.
>>
>> IIUC, the race can happen if we take the pointer and then one of these two
>> situations occurs at the same moment.
>>
>> As the function 'find_idlest_cpu' runs inside rcu_read_lock, maybe an
>> rcu_barrier in 'cpuidle_pause_and_lock' or 'cpuidle_uninstall_idle_handler'
>> should suffice, no ?
>
> Indeed. But be sure to document this.

Yes, sure. Thanks for pointing this out.

   -- Daniel




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-15 13:25       ` Peter Zijlstra
@ 2014-04-15 15:27         ` Nicolas Pitre
  2014-04-15 15:33         ` Rafael J. Wysocki
  1 sibling, 0 replies; 47+ messages in thread
From: Nicolas Pitre @ 2014-04-15 15:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, Daniel Lezcano, linux-kernel, mingo, linux-pm,
	alex.shi, vincent.guittot, morten.rasmussen

On Tue, 15 Apr 2014, Peter Zijlstra wrote:

> On Fri, Apr 04, 2014 at 01:43:00PM +0200, Rafael J. Wysocki wrote:
> > > That's what this patch series is about.  The find_idlest_cpu code should 
> > > look for the idle CPU with the shallowest idle state, or the one with 
> > > the smallest load.  In this context "find_idlest_cpu" might become a 
> > > misnomer.
> > 
> > Yes, clearly.  It should be called find_best_cpu or something like that.
> 
> Ha!, but for what purpose? We already have find_busiest_cpu() to find
> the CPU to steal work from. The converse action, currently called
> find_idlest_cpu() is finding the CPU where to put work.
> 
> 'Best' is ambiguous in all regards, it doesn't convey the direction nor
> the quality sorted on.
> 
> So while idlest might be somewhat of a misnomer, it at least conveys the
> directional thing fairly well. Also we are still searching for the least
> busy, and preferably an idle, cpu. 'Idlest' being a superlative also
> conveys the meaning of order.

I agree that anything which is called "best" is ambiguous.  Best for 
what?  That isn't self explanatory.

However "idlest" is no longer the wanted attribute here.  "Least busy" 
is right.  But not necessarily the "idlest".  The "best" CPU here is 
somewhat in the middle between busiest and idlest i.e. preferably idle, 
but not the "idlest" in the cpuidle sense.

Maybe we could use your definition to simply call it 
find_cpu_to_put_work() or the like.  Today this is based on the idleness 
of CPUs, but eventually we'll want to pack tasks on already loaded CPUs 
(without oversubscribing them) in order to keep as many CPUs idle as 
possible when that makes sense, which would alter the selection 
somewhat.


Nicolas

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info
  2014-04-15 13:25       ` Peter Zijlstra
  2014-04-15 15:27         ` Nicolas Pitre
@ 2014-04-15 15:33         ` Rafael J. Wysocki
  1 sibling, 0 replies; 47+ messages in thread
From: Rafael J. Wysocki @ 2014-04-15 15:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nicolas Pitre, Daniel Lezcano, linux-kernel, mingo, linux-pm,
	alex.shi, vincent.guittot, morten.rasmussen

On Tuesday, April 15, 2014 03:25:10 PM Peter Zijlstra wrote:
> On Fri, Apr 04, 2014 at 01:43:00PM +0200, Rafael J. Wysocki wrote:
> > > That's what this patch series is about.  The find_idlest_cpu code should 
> > > look for the idle CPU with the shallowest idle state, or the one with 
> > > the smallest load.  In this context "find_idlest_cpu" might become a 
> > > misnomer.
> > 
> > Yes, clearly.  It should be called find_best_cpu or something like that.
> 
> Ha!, but for what purpose? We already have find_busiest_cpu() to find
> the CPU to steal work from. The converse action, currently called
> find_idlest_cpu() is finding the CPU where to put work.
> 
> 'Best' is ambiguous in all regards, it doesn't convey the direction nor
> the quality sorted on.
> 
> So while idlest might be somewhat of a misnomer, it at least conveys the
> directional thing fairly well. Also we are still searching for the least
> busy, and preferably an idle, cpu. 'Idlest' being a superlative also
> conveys the meaning of order.

But 'idlest' can also be understood as 'deepest idle', which clearly is not the
intent.  Perhaps find_cpu_for_work() reflects what it does, but I'm not sure
if that's a good name either.

Rafael


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
  2014-04-02  3:05   ` Nicolas Pitre
  2014-04-04 11:57     ` Rafael J. Wysocki
@ 2014-04-17 13:53     ` Daniel Lezcano
  2014-04-17 14:47       ` Peter Zijlstra
  2014-04-17 15:53       ` Nicolas Pitre
  1 sibling, 2 replies; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-17 13:53 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: linux-kernel, mingo, peterz, rjw, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On 04/02/2014 05:05 AM, Nicolas Pitre wrote:
> On Fri, 28 Mar 2014, Daniel Lezcano wrote:
>
>> As we know in which idle state the cpu is, we can investigate the following:
>>
>> 1. when did the cpu entered the idle state ? the longer the cpu is idle, the
>> deeper it is idle
>> 2. what exit latency is ? the greater the exit latency is, the deeper it is
>>
>> With both information, when all cpus are idle, we can choose the idlest cpu.
>>
>> When one cpu is not idle, the old check against weighted load applies.
>>
>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>
> There seems to be some problems with the implementation.
>
>> @@ -4336,20 +4337,53 @@ static int
>>   find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
>>   {
>>   	unsigned long load, min_load = ULONG_MAX;
>> -	int idlest = -1;
>> +	unsigned int min_exit_latency = UINT_MAX;
>> +	u64 idle_stamp, min_idle_stamp = ULONG_MAX;
>
> I don't think you really meant to assign an u64 variable with ULONG_MAX.
> You probably want ULLONG_MAX here.  And probably not in fact (more
> later).
>
>> +
>> +	struct rq *rq;
>> +	struct cpuidle_power *power;
>> +
>> +	int cpu_idle = -1;
>> +	int cpu_busy = -1;
>>   	int i;
>>
>>   	/* Traverse only the allowed CPUs */
>>   	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
>> -		load = weighted_cpuload(i);
>>
>> -		if (load < min_load || (load == min_load && i == this_cpu)) {
>> -			min_load = load;
>> -			idlest = i;
>> +		if (idle_cpu(i)) {
>> +
>> +			rq = cpu_rq(i);
>> +			power = rq->power;
>> +			idle_stamp = rq->idle_stamp;
>> +
>> +			/* The cpu is idle since a shorter time */
>> +			if (idle_stamp < min_idle_stamp) {
>> +				min_idle_stamp = idle_stamp;
>> +				cpu_idle = i;
>> +				continue;
>
> Don't you want the highest time stamp in order to select the most
> recently idled CPU?  Favoring the CPU which has been idle the longest
> makes little sense.
>
>> +			}
>> +
>> +			/* The cpu is idle but the exit_latency is shorter */
>> +			if (power && power->exit_latency < min_exit_latency) {
>> +				min_exit_latency = power->exit_latency;
>> +				cpu_idle = i;
>> +				continue;
>> +			}
>
> I think this is wrong.  This gives priority to CPUs which have been idle
> for a (longer... although this should have been) shorter period of time
> over those with a shallower idle state.  I think this should rather be:
>
> 	if (power && power->exit_latency < min_exit_latency) {
> 		min_exit_latency = power->exit_latency;
> 		latest_idle_stamp = idle_stamp;
> 	       	cpu_idle = i;
> 	} else if ((!power || power->exit_latency == min_exit_latency) &&
> 		   idle_stamp > latest_idle_stamp) {
> 		latest_idle_stamp = idle_stamp;
> 		cpu_idle = i;
> 	}
>
> So the CPU with the shallowest idle state is selected in priority, and
> if many CPUs are in the same state then the time stamp is used to
> select the most recent one. Whenever
> a shallower idle state is found then the latest_idle_stamp is reset for
> that state even if it is further in the past.
>
>> +		} else {
>> +
>> +			load = weighted_cpuload(i);
>> +
>> +			if (load < min_load ||
>> +			    (load == min_load && i == this_cpu)) {
>> +				min_load = load;
>> +				cpu_busy = i;
>> +				continue;
>> +			}
>>   		}
>
> I think this is wrong to do an if-else based on idle_cpu() here.  What
> if a CPU is heavily loaded, but for some reason it happens to be idle at
> this very moment?  With your patch it could be selected as an idle CPU
> while it would be discarded as being too busy otherwise.
>
> It is important to determine both cpu_busy and cpu_idle for all CPUs.
>
> And cpu_busy is a bad name for this.  Something like least_loaded would
> be more self explanatory.  Same thing for cpu_idle which could be
> clearer if named shallowest_idle.
>
>> -	return idlest;
>> +	/* Busy cpus are considered less idle than idle cpus ;) */
>> +	return cpu_busy != -1 ? cpu_busy : cpu_idle;
>
> And finally it is a policy decision whether or not we want to return
> least_loaded over shallowest_idle e.g do we pack tasks on non idle CPUs
> first or not.  That in itself needs more investigation.  To keep the
> existing policy unchanged for now the above condition should have its
> variables swapped.
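
The selection order suggested in the review quoted above can be sketched in
user-space C as follows. This is an illustration under assumptions, not the
refreshed patch: 'struct cpu_snapshot' and 'pick_cpu()' are hypothetical
names, the per-cpu values are plain inputs instead of being read from the
runqueue, and returning the shallowest idle cpu first is only one possible
policy (the one that keeps the existing behaviour).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of one cpu; not a kernel structure. */
struct cpu_snapshot {
	int idle;                  /* non-zero if the cpu is idle right now */
	unsigned int exit_latency; /* exit latency of its idle state */
	uint64_t idle_stamp;       /* time at which it entered idle */
	unsigned long load;        /* weighted load */
};

/*
 * Return the index of the idle cpu in the shallowest idle state; among cpus
 * in the same state, prefer the most recently idled one. The least loaded
 * cpu is tracked for every cpu and used when no cpu is idle.
 * Returns -1 when n is 0.
 */
int pick_cpu(const struct cpu_snapshot *cpus, size_t n)
{
	unsigned int min_exit_latency = UINT_MAX;
	uint64_t latest_idle_stamp = 0;
	unsigned long min_load = ULONG_MAX;
	int shallowest_idle = -1;
	int least_loaded = -1;

	assert(cpus != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct cpu_snapshot *c = &cpus[i];

		if (c->idle) {
			if (shallowest_idle == -1 ||
			    c->exit_latency < min_exit_latency) {
				/* Shallower state: restart the stamp comparison. */
				min_exit_latency = c->exit_latency;
				latest_idle_stamp = c->idle_stamp;
				shallowest_idle = (int)i;
			} else if (c->exit_latency == min_exit_latency &&
				   c->idle_stamp > latest_idle_stamp) {
				/* Same state, but idle since a shorter time. */
				latest_idle_stamp = c->idle_stamp;
				shallowest_idle = (int)i;
			}
		}

		/* The load is evaluated for all cpus, idle or not. */
		if (least_loaded == -1 || c->load < min_load) {
			min_load = c->load;
			least_loaded = (int)i;
		}
	}

	/* Policy choice: prefer an idle cpu, fall back to the least loaded. */
	return shallowest_idle != -1 ? shallowest_idle : least_loaded;
}
```

Swapping the two operands of the final return gives the packing flavour, where
a non idle cpu is preferred over an idle one.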


Ok, I refreshed the patchset, but before sending it out I would like to 
discuss the rationale of the changes and the policy, and change the 
patchset accordingly.

What order to choose if the cpu is idle ?

Let's assume all cpus are idle on a dual socket quad core.

Also, we can reasonably assume that if the cluster is in low power mode, 
the cpus belonging to that cluster are in the same idle state (leaving 
aside auto-promotion, over which we have no control).

If the policy you mention above is 'aggressive power saving', we can 
follow these rules, in decreasing priority:

1. We want to avoid waking up an entire cluster
	=> as the cpus of a cluster are in the same idle state, choosing a cpu
	in a shallow state should guarantee that we won't wake up a cluster
	(unless no cpu in a shallower idle state is found).

2. We want to avoid waking up a cpu which has not reached its target 
residency time (this will need some work to unify the cpuidle idle time and 
the idle task run time)
	=> with the target residency and, as a first step, with the idle stamp,
	we can determine whether the cpu has slept long enough

3. We want to avoid waking up a cpu in a deep idle state
	=> by looking for the cpu in the shallowest idle state

4. We want to avoid waking up a cpu whose exit latency is longer than the 
expected run time of the task (and the time to migrate the task ?)
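
Rules 2 and 4 above can be sketched as a simple filter. This is only an
illustration under assumptions: 'worth_waking()' and its parameters are
hypothetical, all times are taken in the same unit, and the expected run
time of the task is assumed to be known, which the scheduler does not
provide in this form today.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if waking this idle cpu looks acceptable:
 *  - rule 2: it has been idle for at least its target residency
 *  - rule 4: its exit latency does not exceed the expected task run time
 */
bool worth_waking(uint64_t now, uint64_t idle_stamp,
		  uint64_t target_residency, uint64_t exit_latency,
		  uint64_t expected_run_time)
{
	/* Guard against a stamp that is ahead of 'now'. */
	uint64_t slept = now >= idle_stamp ? now - idle_stamp : 0;

	if (slept < target_residency)
		return false;	/* rule 2: did not sleep long enough */

	if (exit_latency > expected_run_time)
		return false;	/* rule 4: waking costs more than the work */

	return true;
}
```

A cpu rejected by this filter would still be a candidate when no other cpu is
available, as for rule 1.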

Concerning the policy, I would suggest creating an entry, 
/proc/sys/kernel/sched_power, taking one of two values: 
performance or power saving (0 / 1).

Does it make sense ? Any ideas ?

Thanks
   -- Daniel






^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
  2014-04-17 13:53     ` Daniel Lezcano
@ 2014-04-17 14:47       ` Peter Zijlstra
  2014-04-17 15:03         ` Daniel Lezcano
  2014-04-17 15:53       ` Nicolas Pitre
  1 sibling, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-17 14:47 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Nicolas Pitre, linux-kernel, mingo, rjw, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Thu, Apr 17, 2014 at 03:53:32PM +0200, Daniel Lezcano wrote:
> Concerning the policy, I would suggest to create an entry in
> /proc/sys/kernel/sched_power, where a couple of values could be performance
> - power saving (0 / 1).

Ingo wanted a sched_balance_policy file with 3 values:
  "performance, power, auto"

Where the auto thing switches between them, initially based off of
having AC or not.
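
As a rough sketch of such a three-valued policy, assuming nothing about the
eventual interface: the enum and function names below are hypothetical, and
mapping "auto" to performance on AC and to power on battery is only a guess
at what "based off of having AC or not" would mean.

```c
#include <stddef.h>
#include <string.h>

enum balance_policy {
	BALANCE_PERFORMANCE,
	BALANCE_POWER,
	BALANCE_AUTO,
	BALANCE_INVALID,
};

/* Parse the string written to the (hypothetical) policy file. */
enum balance_policy parse_balance_policy(const char *s)
{
	if (s == NULL)
		return BALANCE_INVALID;
	if (strcmp(s, "performance") == 0)
		return BALANCE_PERFORMANCE;
	if (strcmp(s, "power") == 0)
		return BALANCE_POWER;
	if (strcmp(s, "auto") == 0)
		return BALANCE_AUTO;
	return BALANCE_INVALID;
}

/* Resolve "auto" from the power source; other values pass through. */
enum balance_policy effective_balance_policy(enum balance_policy p, int on_ac)
{
	if (p == BALANCE_AUTO)
		return on_ac ? BALANCE_PERFORMANCE : BALANCE_POWER;
	return p;
}
```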

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
  2014-04-17 14:47       ` Peter Zijlstra
@ 2014-04-17 15:03         ` Daniel Lezcano
  2014-04-18  8:09           ` Ingo Molnar
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-17 15:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nicolas Pitre, linux-kernel, mingo, rjw, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On 04/17/2014 04:47 PM, Peter Zijlstra wrote:
> On Thu, Apr 17, 2014 at 03:53:32PM +0200, Daniel Lezcano wrote:
>> Concerning the policy, I would suggest to create an entry in
>> /proc/sys/kernel/sched_power, where a couple of values could be performance
>> - power saving (0 / 1).
>
> Ingo wanted a sched_balance_policy file with 3 values:
>    "performance, power, auto"
>
> Where the auto thing switches between them, initially based off of
> having AC or not.

oh, good. Thanks !




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
  2014-04-17 13:53     ` Daniel Lezcano
  2014-04-17 14:47       ` Peter Zijlstra
@ 2014-04-17 15:53       ` Nicolas Pitre
  2014-04-17 16:05         ` Daniel Lezcano
  1 sibling, 1 reply; 47+ messages in thread
From: Nicolas Pitre @ 2014-04-17 15:53 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linux-kernel, mingo, peterz, rjw, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Thu, 17 Apr 2014, Daniel Lezcano wrote:

> Ok, refreshed the patchset but before sending it out I would to discuss about
> the rational of the changes and the policy, and change the patchset
> consequently.
> 
> What order to choose if the cpu is idle ?
> 
> Let's assume all cpus are idle on a dual socket quad core.
> 
> Also, we can reasonably do the hypothesis if the cluster is in low power mode,
> the cpus belonging to the same cluster are in the same idle state (putting
> apart the auto-promote where we don't have control on).
> 
> If the policy you talk above is 'aggressive power saving', we can follow the
> rules with decreasing priority:
> 
> 1. We want to prevent to wakeup the entire cluster
> 	=> as the cpus are in the same idle state, by choosing a cpu in
> 	shallow state, we should have the guarantee we won't wakeup a cluster
> 	(except if no shallowest idle cpu are found).

This is unclear to me.  Obviously, if an entire cluster is down, that 
means all the CPUs it contains have been idle for a long time. And 
therefore they shouldn't be subject to selection unless there are no 
other CPUs available.  Is that what you mean?

> 2. We want to prevent to wakeup a cpu which did not reach the target residency
> time (will need some work to unify cpuidle idle time and idle task run time)
> 	=> with the target residency and, as a first step, with the idle
> 	stamp, we can determine if the cpu slept enough

Agreed. However, right now, the scheduler does not have any 
consideration for that.  So this should be done as a separate patch.

> 3. We want to prevent to wakeup a cpu in deep idle state
> 	=> by looking for the cpu in shallowest idle state

Obvious.

> 4. We want to prevent to wakeup a cpu where the exit latency is longer than
> the expected run time of the task (and the time to migrate the task ?)

Sure.  That would be a case for using task packing even if the policy is 
set to performance rather than powersave whereas task packing is 
normally for powersave.


Nicolas

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
  2014-04-17 15:53       ` Nicolas Pitre
@ 2014-04-17 16:05         ` Daniel Lezcano
  2014-04-17 16:21           ` Nicolas Pitre
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-17 16:05 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: linux-kernel, mingo, peterz, rjw, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On 04/17/2014 05:53 PM, Nicolas Pitre wrote:
> On Thu, 17 Apr 2014, Daniel Lezcano wrote:
>
>> Ok, refreshed the patchset but before sending it out I would to discuss about
>> the rational of the changes and the policy, and change the patchset
>> consequently.
>>
>> What order to choose if the cpu is idle ?
>>
>> Let's assume all cpus are idle on a dual socket quad core.
>>
>> Also, we can reasonably do the hypothesis if the cluster is in low power mode,
>> the cpus belonging to the same cluster are in the same idle state (putting
>> apart the auto-promote where we don't have control on).
>>
>> If the policy you talk above is 'aggressive power saving', we can follow the
>> rules with decreasing priority:
>>
>> 1. We want to prevent to wakeup the entire cluster
>> 	=> as the cpus are in the same idle state, by choosing a cpu in
>> 	shallow state, we should have the guarantee we won't wakeup a cluster
>> 	(except if no shallowest idle cpu are found).
>
> This is unclear to me.  Obviously, if an entire cluster is down, that
> means all the CPUs it contains have been idle for a long time.  And
> therefore they shouldn't be subject to selection unless there is no
> other CPUs available.  Is that what you mean?

Yes, this is what I meant. I also meant that, for the moment, we can leave 
aside the cpu topology and the coupled idle states: with the approach 
described, as the idle state will be the same for all the cpus belonging 
to a cluster, we won't select a cluster that is down (except if there are 
no other CPUs available).

>> 2. We want to prevent to wakeup a cpu which did not reach the target residency
>> time (will need some work to unify cpuidle idle time and idle task run time)
>> 	=> with the target residency and, as a first step, with the idle
>> 	stamp, we can determine if the cpu slept enough
>
> Agreed. However, right now, the scheduler does not have any
> consideration for that.  So this should be done as a separate patch.

Yes, I thought that as a very first step we could rely on the idle stamp, 
with a big comment, until we unify the times. Or I can first unify the 
idle times and then take the target residency into account. This is to 
comply with Rafael's request to have the 'big picture'.

>> 3. We want to prevent to wakeup a cpu in deep idle state
>> 	=> by looking for the cpu in shallowest idle state
>
> Obvious.
>
>> 4. We want to prevent to wakeup a cpu where the exit latency is longer than
>> the expected run time of the task (and the time to migrate the task ?)
>
> Sure.  That would be a case for using task packing even if the policy is
> set to performance rather than powersave whereas task packing is
> normally for powersave.

Yes, I agree, task packing also improves performance, and it really makes 
sense to prevent task migration under some circumstances for better cache 
efficiency.

Thanks for the comments

   -- Daniel



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
  2014-04-17 16:05         ` Daniel Lezcano
@ 2014-04-17 16:21           ` Nicolas Pitre
  2014-04-18  9:38             ` Peter Zijlstra
  0 siblings, 1 reply; 47+ messages in thread
From: Nicolas Pitre @ 2014-04-17 16:21 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: linux-kernel, mingo, peterz, rjw, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Thu, 17 Apr 2014, Daniel Lezcano wrote:

> On 04/17/2014 05:53 PM, Nicolas Pitre wrote:
> > On Thu, 17 Apr 2014, Daniel Lezcano wrote:
> >
> > > Ok, refreshed the patchset, but before sending it out I would like to
> > > discuss the rationale of the changes and the policy, and change the
> > > patchset accordingly.
> > >
> > > What order to choose if the cpu is idle ?
> > >
> > > Let's assume all cpus are idle on a dual socket quad core.
> > >
> > > Also, we can reasonably assume that if the cluster is in low power mode,
> > > the cpus belonging to the same cluster are in the same idle state (setting
> > > aside the auto-promote, which we have no control over).
> > >
> > > If the policy you talk about above is 'aggressive power saving', we can
> > > follow these rules with decreasing priority:
> > >
> > > 1. We want to prevent waking up the entire cluster
> > > => as the cpus are in the same idle state, by choosing a cpu in a
> > > => shallow state, we should have the guarantee we won't wake up a
> > > => cluster (except if no shallowest idle cpu is found).
> >
> > This is unclear to me.  Obviously, if an entire cluster is down, that
> > means all the CPUs it contains have been idle for a long time.  And
> > therefore they shouldn't be subject to selection unless there is no
> > other CPUs available.  Is that what you mean?
> 
> Yes, this is what I meant. But also what I meant is we can get rid for the
> moment of the cpu topology and the coupling idle state because if we do this
> described approach, as the idle state will be the same for the cpus belonging
> to the same cluster we won't select a cluster down (except if there is no
> other CPUs available).

CPU topology is needed to properly describe scheduling domains.  Whether 
we balance across domains or pack using as few domains as possible is a 
separate issue.  In other words, you shouldn't have to care in this 
patch series.

And IMHO coupled C-state is a low-level mechanism that should remain 
private to cpuidle which the scheduler shouldn't be aware of.

> > > 2. We want to prevent to wakeup a cpu which did not reach the target
> > > residency time (will need some work to unify cpuidle idle time and idle
> > > task run time)
> > > => with the target residency and, as a first step, with the idle
> > > => stamp, we can determine if the cpu slept enough
> >
> > Agreed. However, right now, the scheduler does not have any
> > consideration for that.  So this should be done as a separate patch.
> 
> Yes, I thought that as a very first step we could rely on the idle stamp,
> with a big comment, until we unify the times. Or I can first unify the
> idle times and then take the target residency into account. This is to
> comply with Rafael's request to have the 'big picture'.

I agree, but that should be done incrementally.  Even without this 
consideration, what you proposed is already an improvement over the 
current state of affairs.
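
To make rule 2 concrete, the "did the cpu sleep enough" test could look
roughly like the sketch below. This is an illustration only, not code
from the series: cpu_slept_enough() is a hypothetical helper, and the
units (timestamps in nanoseconds, target residency in microseconds) are
assumptions made for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if a cpu that went idle at idle_stamp_ns
 * has already stayed idle for at least the target residency of its idle
 * state, 0 otherwise.
 *
 * Assumed units for this sketch: timestamps in nanoseconds, target
 * residency in microseconds.
 */
int cpu_slept_enough(uint64_t now_ns, uint64_t idle_stamp_ns,
		     uint32_t target_residency_us)
{
	uint64_t target_ns = (uint64_t)target_residency_us * 1000u;

	/* A stamp in the future tells us nothing; be conservative. */
	if (now_ns < idle_stamp_ns)
		return 0;

	return (now_ns - idle_stamp_ns) >= target_ns;
}
```

A cpu for which this returns 0 has not yet amortized the cost of
entering its idle state, so waking it would be the less attractive
choice under rule 2.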


Nicolas


* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
  2014-04-17 15:03         ` Daniel Lezcano
@ 2014-04-18  8:09           ` Ingo Molnar
  2014-04-18  8:36             ` Daniel Lezcano
  0 siblings, 1 reply; 47+ messages in thread
From: Ingo Molnar @ 2014-04-18  8:09 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Peter Zijlstra, Nicolas Pitre, linux-kernel, mingo, rjw,
	linux-pm, alex.shi, vincent.guittot, morten.rasmussen


* Daniel Lezcano <daniel.lezcano@linaro.org> wrote:

> On 04/17/2014 04:47 PM, Peter Zijlstra wrote:
> >On Thu, Apr 17, 2014 at 03:53:32PM +0200, Daniel Lezcano wrote:
> >>Concerning the policy, I would suggest to create an entry in
> >>/proc/sys/kernel/sched_power, where a couple of values could be performance
> >>- power saving (0 / 1).
> >
> >Ingo wanted a sched_balance_policy file with 3 values:
> >   "performance, power, auto"
> >
> >Where the auto thing switches between them, initially based off of
> >having AC or not.
> 
> oh, good. Thanks !

Also, 'auto' should be the default, because the kernel doing TRT is 
really what users want.

Userspace can still tweak it all and make it all user-space controlled, 
by flipping between 'performance' and 'power'. (and those modes are 
also helpful for development and debugging.)
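
For illustration, the three-valued knob discussed above could be modelled
as in the sketch below. Nothing here is an existing kernel interface: the
enum, parse_balance_policy() and effective_balance_policy() are
hypothetical names, and mapping 'auto' to 'performance' on AC and to
'power' on battery is an assumed reading of "based off of having AC or
not".

```c
#include <string.h>

/* Hypothetical representation of the three policy values. */
enum balance_policy {
	BALANCE_PERFORMANCE,
	BALANCE_POWER,
	BALANCE_AUTO,
};

/* Parse the string written by userspace; returns 0 on success, -1 otherwise. */
int parse_balance_policy(const char *s, enum balance_policy *out)
{
	if (strcmp(s, "performance") == 0)
		*out = BALANCE_PERFORMANCE;
	else if (strcmp(s, "power") == 0)
		*out = BALANCE_POWER;
	else if (strcmp(s, "auto") == 0)
		*out = BALANCE_AUTO;
	else
		return -1;

	return 0;
}

/*
 * Resolve 'auto' to a concrete policy. Assumption for this sketch:
 * on AC power behave as 'performance', on battery as 'power'.
 */
enum balance_policy effective_balance_policy(enum balance_policy p, int on_ac)
{
	if (p != BALANCE_AUTO)
		return p;

	return on_ac ? BALANCE_PERFORMANCE : BALANCE_POWER;
}
```

With 'auto' as the default, userspace that wants full control only has to
write 'performance' or 'power' explicitly.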

Thanks,

	Ingo


* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
  2014-04-18  8:09           ` Ingo Molnar
@ 2014-04-18  8:36             ` Daniel Lezcano
  0 siblings, 0 replies; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-18  8:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Nicolas Pitre, linux-kernel, mingo, rjw,
	linux-pm, alex.shi, vincent.guittot, morten.rasmussen

On 04/18/2014 10:09 AM, Ingo Molnar wrote:
>
> * Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>
>> On 04/17/2014 04:47 PM, Peter Zijlstra wrote:
>>> On Thu, Apr 17, 2014 at 03:53:32PM +0200, Daniel Lezcano wrote:
>>>> Concerning the policy, I would suggest to create an entry in
>>>> /proc/sys/kernel/sched_power, where a couple of values could be performance
>>>> - power saving (0 / 1).
>>>
>>> Ingo wanted a sched_balance_policy file with 3 values:
>>>    "performance, power, auto"
>>>
>>> Where the auto thing switches between them, initially based off of
>>> having AC or not.
>>
>> oh, good. Thanks !
>
> Also, 'auto' should be the default, because the kernel doing TRT is
> really what users want.
>
> Userspace can still tweak it all and make it all user-space controlled,
> by flipping between 'performance' and 'power'. (and those modes are
> also helpful for development and debugging.)

Copy that.

   Thanks !

   -- Daniel



* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
  2014-04-17 16:21           ` Nicolas Pitre
@ 2014-04-18  9:38             ` Peter Zijlstra
  2014-04-18 12:13               ` Daniel Lezcano
  0 siblings, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-18  9:38 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Daniel Lezcano, linux-kernel, mingo, rjw, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Thu, Apr 17, 2014 at 12:21:28PM -0400, Nicolas Pitre wrote:
> CPU topology is needed to properly describe scheduling domains.  Whether 
> we balance across domains or pack using as few domains as possible is a 
> separate issue.  In other words, you shouldn't have to care in this 
> patch series.
> 
> And IMHO coupled C-state is a low-level mechanism that should remain 
> private to cpuidle which the scheduler shouldn't be aware of.

I'm confused.. why wouldn't you want to expose these?


* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
  2014-04-18  9:38             ` Peter Zijlstra
@ 2014-04-18 12:13               ` Daniel Lezcano
  2014-04-18 12:53                 ` Peter Zijlstra
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-18 12:13 UTC (permalink / raw)
  To: Peter Zijlstra, Nicolas Pitre
  Cc: linux-kernel, mingo, rjw, linux-pm, alex.shi, vincent.guittot,
	morten.rasmussen

On 04/18/2014 11:38 AM, Peter Zijlstra wrote:
> On Thu, Apr 17, 2014 at 12:21:28PM -0400, Nicolas Pitre wrote:
>> CPU topology is needed to properly describe scheduling domains.  Whether
>> we balance across domains or pack using as few domains as possible is a
>> separate issue.  In other words, you shouldn't have to care in this
>> patch series.
>>
>> And IMHO coupled C-state is a low-level mechanism that should remain
>> private to cpuidle which the scheduler shouldn't be aware of.
>
> I'm confused.. why wouldn't you want to expose these?

The coupled C-state is a mechanism cpuidle uses to sync the cpus when
entering a specific C-state. This mechanism is usually used to handle
the cluster power down. It is only used by two drivers (soon three), and
it is not the only mechanism used for syncing the cpus. There are also
MCPM (tc2), the hand-made sync when the hardware allows it (ux500), and
an abstraction from the firmware (mwait), transparent to the kernel.

Taking only the coupled C-state into account does not make sense because
of the other mechanisms above. This is why it should stay inside the
cpuidle framework.

The extension of the cpu topology will provide a generic way to describe
and abstract such dependencies.

Does it answer your question ?

   -- Daniel


* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
  2014-04-18 12:13               ` Daniel Lezcano
@ 2014-04-18 12:53                 ` Peter Zijlstra
  2014-04-18 13:04                   ` Daniel Lezcano
  0 siblings, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2014-04-18 12:53 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Nicolas Pitre, linux-kernel, mingo, rjw, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Fri, Apr 18, 2014 at 02:13:48PM +0200, Daniel Lezcano wrote:
> On 04/18/2014 11:38 AM, Peter Zijlstra wrote:
> >On Thu, Apr 17, 2014 at 12:21:28PM -0400, Nicolas Pitre wrote:
> >>CPU topology is needed to properly describe scheduling domains.  Whether
> >>we balance across domains or pack using as few domains as possible is a
> >>separate issue.  In other words, you shouldn't have to care in this
> >>patch series.
> >>
> >>And IMHO coupled C-state is a low-level mechanism that should remain
> >>private to cpuidle which the scheduler shouldn't be aware of.
> >
> >I'm confused.. why wouldn't you want to expose these?
> 
> The coupled C-state is a mechanism cpuidle uses to sync the cpus when
> entering a specific C-state. This mechanism is usually used to handle
> the cluster power down. It is only used by two drivers (soon three), and
> it is not the only mechanism used for syncing the cpus. There are also
> MCPM (tc2), the hand-made sync when the hardware allows it (ux500), and
> an abstraction from the firmware (mwait), transparent to the kernel.
>
> Taking only the coupled C-state into account does not make sense because
> of the other mechanisms above. This is why it should stay inside the
> cpuidle framework.
>
> The extension of the cpu topology will provide a generic way to describe
> and abstract such dependencies.
> 
> Does it answer your question ?

I suppose so; it's still a bit like we won't but we will :-)

So we _will_ actually expose coupled C states through the topology bits,
that's good.


* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
  2014-04-18 12:53                 ` Peter Zijlstra
@ 2014-04-18 13:04                   ` Daniel Lezcano
  2014-04-18 16:00                     ` Nicolas Pitre
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Lezcano @ 2014-04-18 13:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nicolas Pitre, linux-kernel, mingo, rjw, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On 04/18/2014 02:53 PM, Peter Zijlstra wrote:
> On Fri, Apr 18, 2014 at 02:13:48PM +0200, Daniel Lezcano wrote:
>> On 04/18/2014 11:38 AM, Peter Zijlstra wrote:
>>> On Thu, Apr 17, 2014 at 12:21:28PM -0400, Nicolas Pitre wrote:
>>>> CPU topology is needed to properly describe scheduling domains.  Whether
>>>> we balance across domains or pack using as few domains as possible is a
>>>> separate issue.  In other words, you shouldn't have to care in this
>>>> patch series.
>>>>
>>>> And IMHO coupled C-state is a low-level mechanism that should remain
>>>> private to cpuidle which the scheduler shouldn't be aware of.
>>>
>>> I'm confused.. why wouldn't you want to expose these?
>>
>> The coupled C-state is a mechanism cpuidle uses to sync the cpus when
>> entering a specific C-state. This mechanism is usually used to handle
>> the cluster power down. It is only used by two drivers (soon three), and
>> it is not the only mechanism used for syncing the cpus. There are also
>> MCPM (tc2), the hand-made sync when the hardware allows it (ux500), and
>> an abstraction from the firmware (mwait), transparent to the kernel.
>>
>> Taking only the coupled C-state into account does not make sense because
>> of the other mechanisms above. This is why it should stay inside the
>> cpuidle framework.
>>
>> The extension of the cpu topology will provide a generic way to describe
>> and abstract such dependencies.
>>
>> Does it answer your question ?
>
> I suppose so; it's still a bit like we won't but we will :-)
>
> So we _will_ actually expose coupled C states through the topology bits,
> that's good.

Ah, ok. I think I understood where the confusion is coming from.

A couple of definitions for the same thing :)

1. Coupled C-states : *mechanism* implemented in the cpuidle framework: 
drivers/cpuidle/coupled.c

2. Coupled C-states : *constraint* to reach a cluster power down state,
will be described through the topology and could be implemented by
different mechanisms (MCPM, handmade sync, cpuidle-coupled-c-state,
firmware).

We want to expose 2. not 1. to the scheduler.
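
To illustrate what exposing definition 2 could mean, here is a small
sketch of the constraint only, with no reference to how the
synchronization is implemented. It is purely hypothetical: struct
cluster_topology and both functions are made-up names, and "idle" in the
mask is assumed to mean "has requested the cluster-level state".

```c
/* Hypothetical topology description of a cluster: one bit per cpu. */
struct cluster_topology {
	unsigned long cpu_mask;
};

/*
 * The constraint: the cluster can only reach its power down state when
 * every cpu belonging to it is idle. Returns 1 if that holds, else 0.
 */
int cluster_can_power_down(const struct cluster_topology *cl,
			   unsigned long idle_mask)
{
	return (cl->cpu_mask & idle_mask) == cl->cpu_mask;
}

/*
 * What the scheduler would care about: would waking 'cpu' pull a cluster
 * that currently satisfies the constraint out of its power down state?
 */
int wakeup_breaks_cluster_power_down(const struct cluster_topology *cl,
				     unsigned long idle_mask, int cpu)
{
	unsigned long bit = 1UL << cpu;

	/* The cpu is not part of this cluster: nothing to break. */
	if (!(cl->cpu_mask & bit))
		return 0;

	return cluster_can_power_down(cl, idle_mask);
}
```

Whether the power down itself is reached through MCPM, a hand-made
sync, cpuidle-coupled or the firmware does not appear anywhere in this
description, which is the point of separating the two definitions.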




* Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu
  2014-04-18 13:04                   ` Daniel Lezcano
@ 2014-04-18 16:00                     ` Nicolas Pitre
  0 siblings, 0 replies; 47+ messages in thread
From: Nicolas Pitre @ 2014-04-18 16:00 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Peter Zijlstra, linux-kernel, mingo, rjw, linux-pm, alex.shi,
	vincent.guittot, morten.rasmussen

On Fri, 18 Apr 2014, Daniel Lezcano wrote:

> On 04/18/2014 02:53 PM, Peter Zijlstra wrote:
> > I suppose so; it's still a bit like we won't but we will :-)
> >
> > So we _will_ actually expose coupled C states through the topology bits,
> > that's good.
> 
> Ah, ok. I think I understood where the confusion is coming from.
> 
> A couple of definitions for the same thing :)
> 
> 1. Coupled C-states : *mechanism* implemented in the cpuidle framework:
> drivers/cpuidle/coupled.c
> 
> 2. Coupled C-states : *constraint* to reach a cluster power down state, will
> be described through the topology and could be implemented by different
> mechanisms (MCPM, handmade sync, cpuidle-coupled-c-state, firmware).
> 
> We want to expose 2. not 1. to the scheduler.

I couldn't explain it better.

Sorry for creating confusion.


Nicolas


end of thread, other threads:[~2014-04-18 16:00 UTC | newest]

Thread overview: 47+ messages
2014-03-28 12:29 [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Daniel Lezcano
2014-03-28 12:29 ` [RFC PATCHC 1/3] cpuidle: encapsulate power info in a separate structure Daniel Lezcano
2014-03-28 18:17   ` Nicolas Pitre
2014-03-28 20:42     ` Daniel Lezcano
2014-03-29  0:00       ` Nicolas Pitre
2014-03-28 12:29 ` [RFC PATCHC 2/3] idle: store the idle state the cpu is Daniel Lezcano
2014-04-15 12:43   ` Peter Zijlstra
2014-04-15 12:44     ` Peter Zijlstra
2014-04-15 14:17       ` Daniel Lezcano
2014-04-15 14:33         ` Peter Zijlstra
2014-04-15 14:39           ` Daniel Lezcano
2014-03-28 12:29 ` [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu Daniel Lezcano
2014-04-02  3:05   ` Nicolas Pitre
2014-04-04 11:57     ` Rafael J. Wysocki
2014-04-04 16:56       ` Nicolas Pitre
2014-04-05  2:01         ` Rafael J. Wysocki
2014-04-17 13:53     ` Daniel Lezcano
2014-04-17 14:47       ` Peter Zijlstra
2014-04-17 15:03         ` Daniel Lezcano
2014-04-18  8:09           ` Ingo Molnar
2014-04-18  8:36             ` Daniel Lezcano
2014-04-17 15:53       ` Nicolas Pitre
2014-04-17 16:05         ` Daniel Lezcano
2014-04-17 16:21           ` Nicolas Pitre
2014-04-18  9:38             ` Peter Zijlstra
2014-04-18 12:13               ` Daniel Lezcano
2014-04-18 12:53                 ` Peter Zijlstra
2014-04-18 13:04                   ` Daniel Lezcano
2014-04-18 16:00                     ` Nicolas Pitre
2014-04-15 13:03   ` Peter Zijlstra
2014-03-31 13:52 ` [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info Vincent Guittot
2014-03-31 15:55   ` Daniel Lezcano
2014-04-01  7:16     ` Vincent Guittot
2014-04-01  7:43       ` Daniel Lezcano
2014-04-01  9:05         ` Vincent Guittot
2014-04-15 13:13           ` Peter Zijlstra
2014-04-01 23:01 ` Rafael J. Wysocki
2014-04-02  3:14   ` Nicolas Pitre
2014-04-04 11:43     ` Rafael J. Wysocki
2014-04-15 13:17       ` Peter Zijlstra
2014-04-15 13:25       ` Peter Zijlstra
2014-04-15 15:27         ` Nicolas Pitre
2014-04-15 15:33         ` Rafael J. Wysocki
2014-04-02  8:26   ` Daniel Lezcano
2014-04-04 11:23     ` Rafael J. Wysocki
2014-04-04  6:29 ` Len Brown
2014-04-04  8:16   ` Daniel Lezcano
