[PATCH 0/3] sched: idle: Provide the basis to integrate cpuidle

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/3] sched: idle: Provide the basis to integrate cpuidle
@ 2014-04-24 12:24 Daniel Lezcano
  2014-04-24 12:24 ` [PATCH 1/3] sched: idle: Encapsulate the code to compile it out Daniel Lezcano
                   ` (3 more replies)
  0 siblings, 4 replies; 33+ messages in thread
From: Daniel Lezcano @ 2014-04-24 12:24 UTC (permalink / raw)
  To: peterz, mingo
  Cc: linux-kernel, rjw, linux-pm, alex.shi, vincent.guittot,
	morten.rasmussen, linaro-kernel

This patchset provides three patches for the basis to integrate cpuidle with
the scheduler.

The first patch is a cleanup.
The second one adds the sched balance option as requested by Ingo.
The third one stores the idle state a cpu is and adds a rcu_barrier() to
 prevent races when using the pointed object.

This patchset is based on top of v3.15-rc2.

This patchset does not modify the behavior of the scheduler.

Taking into account the cpuidle information from the scheduler will be
posted in a separate patchset in order to keep focused on the right decisions
the scheduler should take regarding the policy vs idle parameters.

Daniel Lezcano (3):
  sched: idle: Encapsulate the code to compile it out
  sched: idle: Add sched balance option
  sched: idle: Store the idle state the cpu is

 drivers/cpuidle/cpuidle.c    |    6 ++
 include/linux/sched/sysctl.h |   14 ++++
 kernel/sched/fair.c          |   92 ++++++++++++++++++++++-
 kernel/sched/idle.c          |  169 +++++++++++++++++++++++-------------------
 kernel/sched/sched.h         |    5 ++
 kernel/sysctl.c              |   11 +++
 6 files changed, 220 insertions(+), 77 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/3] sched: idle: Encapsulate the code to compile it out
  2014-04-24 12:24 [PATCH 0/3] sched: idle: Provide the basis to integrate cpuidle Daniel Lezcano
@ 2014-04-24 12:24 ` Daniel Lezcano
  2014-04-30 17:39   ` Nicolas Pitre
  2014-04-24 12:24 ` [PATCH 2/3] sched: idle: Add sched balance option Daniel Lezcano
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 33+ messages in thread
From: Daniel Lezcano @ 2014-04-24 12:24 UTC (permalink / raw)
  To: peterz, mingo
  Cc: linux-kernel, rjw, linux-pm, alex.shi, vincent.guittot,
	morten.rasmussen, linaro-kernel

Encapsulate the large portion of cpuidle_idle_call inside another
function so when CONFIG_CPU_IDLE=n, the code will be compiled out.
Also that is benefitial for the clarity of the code as it removes
a nested indentation level.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 kernel/sched/idle.c |  161 +++++++++++++++++++++++++++------------------------
 1 file changed, 86 insertions(+), 75 deletions(-)

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 8f4390a..e877dd4 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -63,6 +63,90 @@ void __weak arch_cpu_idle(void)
 	local_irq_enable();
 }
 
+#ifdef CONFIG_CPU_IDLE
+static int __cpuidle_idle_call(void)
+{
+	struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
+	struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
+	int next_state, entered_state, ret;
+	bool broadcast;
+
+	/*
+	 * Check if the cpuidle framework is ready, otherwise fallback
+	 * to the default arch specific idle method
+	 */
+	ret = cpuidle_enabled(drv, dev);
+	if (ret)
+		return ret;
+
+	/*
+	 * Ask the governor to choose an idle state it thinks
+	 * it is convenient to go to. There is *always* a
+	 * convenient idle state
+	 */
+	next_state = cpuidle_select(drv, dev);
+
+	/*
+	 * The idle task must be scheduled, it is pointless to
+	 * go to idle, just update no idle residency and get
+	 * out of this function
+	 */
+	if (current_clr_polling_and_test()) {
+		dev->last_residency = 0;
+		entered_state = next_state;
+		local_irq_enable();
+	} else {
+		broadcast = !!(drv->states[next_state].flags &
+			       CPUIDLE_FLAG_TIMER_STOP);
+
+		if (broadcast)
+			/*
+			 * Tell the time framework to switch to a
+			 * broadcast timer because our local timer
+			 * will be shutdown. If a local timer is used
+			 * from another cpu as a broadcast timer, this
+			 * call may fail if it is not available
+			 */
+			ret = clockevents_notify(
+				CLOCK_EVT_NOTIFY_BROADCAST_ENTER,
+				&dev->cpu);
+
+		if (!ret) {
+			trace_cpu_idle_rcuidle(next_state, dev->cpu);
+
+			/*
+			 * Enter the idle state previously returned by
+			 * the governor decision. This function will
+			 * block until an interrupt occurs and will
+			 * take care of re-enabling the local
+			 * interrupts
+			 */
+			entered_state = cpuidle_enter(drv, dev, next_state);
+
+			trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
+
+			if (broadcast)
+				clockevents_notify(
+					CLOCK_EVT_NOTIFY_BROADCAST_EXIT,
+					&dev->cpu);
+
+			/*
+			 * Give the governor an opportunity to reflect
+			 * on the outcome
+			 */
+			cpuidle_reflect(dev, entered_state);
+		}
+	}
+
+	return 0;
+}
+#else
+static int inline __cpuidle_idle_call(void)
+{
+	return -ENOSYS;
+}
+#endif
+
 /**
  * cpuidle_idle_call - the main idle function
  *
@@ -71,10 +155,7 @@ void __weak arch_cpu_idle(void)
  */
 static int cpuidle_idle_call(void)
 {
-	struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
-	struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
-	int next_state, entered_state, ret;
-	bool broadcast;
+	int ret;
 
 	/*
 	 * Check if the idle task must be rescheduled. If it is the
@@ -101,80 +182,10 @@ static int cpuidle_idle_call(void)
 	rcu_idle_enter();
 
 	/*
-	 * Check if the cpuidle framework is ready, otherwise fallback
-	 * to the default arch specific idle method
-	 */
-	ret = cpuidle_enabled(drv, dev);
-
-	if (!ret) {
-		/*
-		 * Ask the governor to choose an idle state it thinks
-		 * it is convenient to go to. There is *always* a
-		 * convenient idle state
-		 */
-		next_state = cpuidle_select(drv, dev);
-
-		/*
-		 * The idle task must be scheduled, it is pointless to
-		 * go to idle, just update no idle residency and get
-		 * out of this function
-		 */
-		if (current_clr_polling_and_test()) {
-			dev->last_residency = 0;
-			entered_state = next_state;
-			local_irq_enable();
-		} else {
-			broadcast = !!(drv->states[next_state].flags &
-				       CPUIDLE_FLAG_TIMER_STOP);
-
-			if (broadcast)
-				/*
-				 * Tell the time framework to switch
-				 * to a broadcast timer because our
-				 * local timer will be shutdown. If a
-				 * local timer is used from another
-				 * cpu as a broadcast timer, this call
-				 * may fail if it is not available
-				 */
-				ret = clockevents_notify(
-					CLOCK_EVT_NOTIFY_BROADCAST_ENTER,
-					&dev->cpu);
-
-			if (!ret) {
-				trace_cpu_idle_rcuidle(next_state, dev->cpu);
-
-				/*
-				 * Enter the idle state previously
-				 * returned by the governor
-				 * decision. This function will block
-				 * until an interrupt occurs and will
-				 * take care of re-enabling the local
-				 * interrupts
-				 */
-				entered_state = cpuidle_enter(drv, dev,
-							      next_state);
-
-				trace_cpu_idle_rcuidle(PWR_EVENT_EXIT,
-						       dev->cpu);
-
-				if (broadcast)
-					clockevents_notify(
-						CLOCK_EVT_NOTIFY_BROADCAST_EXIT,
-						&dev->cpu);
-
-				/*
-				 * Give the governor an opportunity to reflect on the
-				 * outcome
-				 */
-				cpuidle_reflect(dev, entered_state);
-			}
-		}
-	}
-
-	/*
 	 * We can't use the cpuidle framework, let's use the default
 	 * idle routine
 	 */
+	ret = __cpuidle_idle_call();
 	if (ret)
 		arch_cpu_idle();
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-24 12:24 [PATCH 0/3] sched: idle: Provide the basis to integrate cpuidle Daniel Lezcano
  2014-04-24 12:24 ` [PATCH 1/3] sched: idle: Encapsulate the code to compile it out Daniel Lezcano
@ 2014-04-24 12:24 ` Daniel Lezcano
       [not found]   ` <CAP245DX9wewQFhcyGj5ZuNE7hHC4fRn90POC32HLF6ugja6nJg@mail.gmail.com>
  2014-04-24 12:24 ` [PATCH 3/3] sched: idle: Store the idle state the cpu is Daniel Lezcano
  2014-04-24 16:13 ` [PATCH 0/3] sched: idle: Provide the basis to integrate cpuidle Rafael J. Wysocki
  3 siblings, 1 reply; 33+ messages in thread
From: Daniel Lezcano @ 2014-04-24 12:24 UTC (permalink / raw)
  To: peterz, mingo
  Cc: linux-kernel, rjw, linux-pm, alex.shi, vincent.guittot,
	morten.rasmussen, linaro-kernel

This patch adds a sysctl schedule balance option to choose against:

 * auto        (0)
 * performance (1)
 * power       (2)

It relies on the recently added notifier to monitor the power supply changes.
If the scheduler balance option is set to 'auto', then when the system switches
to battery, the balance option change to 'power' and when it goes back to AC, it
switches to 'performance'.

The default value is 'auto'.

If the kernel is compiled without the CONFIG_POWER_SUPPLY option, then any call
to the 'auto' option will fail and the scheduler will use the 'performance'
option as default.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 include/linux/sched/sysctl.h |   14 +++++++
 kernel/sched/fair.c          |   92 +++++++++++++++++++++++++++++++++++++++++-
 kernel/sysctl.c              |   11 +++++
 3 files changed, 115 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index 8045a55..f8507bf 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -44,6 +44,20 @@ enum sched_tunable_scaling {
 };
 extern enum sched_tunable_scaling sysctl_sched_tunable_scaling;
 
+#ifdef CONFIG_SMP
+enum sched_balance_option {
+	SCHED_BALANCE_OPTION_PERFORMANCE,
+	SCHED_BALANCE_OPTION_POWER,
+	SCHED_BALANCE_OPTION_AUTO,
+	SCHED_BALANCE_OPTION_END,
+};
+extern enum sched_balance_option sysctl_sched_balance_option;
+
+int sched_proc_balance_option_handler(struct ctl_table *table, int write,
+		void __user *buffer, size_t *length,
+		loff_t *ppos);
+#endif
+
 extern unsigned int sysctl_numa_balancing_scan_delay;
 extern unsigned int sysctl_numa_balancing_scan_period_min;
 extern unsigned int sysctl_numa_balancing_scan_period_max;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7570dd9..7b8e93d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -29,7 +29,7 @@
 #include <linux/mempolicy.h>
 #include <linux/migrate.h>
 #include <linux/task_work.h>
-
+#include <linux/power_supply.h>
 #include <trace/events/sched.h>
 
 #include "sched.h"
@@ -61,6 +61,24 @@ unsigned int normalized_sysctl_sched_latency = 6000000ULL;
 enum sched_tunable_scaling sysctl_sched_tunable_scaling
 	= SCHED_TUNABLESCALING_LOG;
 
+#ifdef CONFIG_SMP
+/*
+ * Scheduler balancing policy:
+ *
+ * Options are:
+ * SCHED_BALANCE_OPTION_PERFORMANCE - full performance
+ * SCHED_BALANCE_OPTION_POWER - power saving aggressive
+ * SCHED_BALANCE_OPTION_AUTO - switches to 'performance' when plugged
+ *                             on or 'power' on battery
+ */
+enum sched_balance_option sysctl_sched_balance_option
+	= SCHED_BALANCE_OPTION_AUTO;
+
+static int sched_current_balance_option
+	= SCHED_BALANCE_OPTION_PERFORMANCE;
+
+#endif
+
 /*
  * Minimal preemption granularity for CPU-bound tasks:
  * (default: 0.75 msec * (1 + ilog(ncpus)), units: nanoseconds)
@@ -555,6 +573,76 @@ static struct sched_entity *__pick_next_entity(struct sched_entity *se)
 	return rb_entry(next, struct sched_entity, run_node);
 }
 
+#ifdef CONFIG_SMP
+static int sched_balance_option_update(void)
+{
+	int ret;
+
+	/*
+	 * Copy the current balance option
+	 */
+	if (sysctl_sched_balance_option != SCHED_BALANCE_OPTION_AUTO) {
+		sched_current_balance_option = sysctl_sched_balance_option;
+		return 0;
+	}
+
+	/*
+	 * This call may fail if the kernel is not compiled with
+	 * the POWER_SUPPLY option.
+	 */
+	ret = power_supply_is_system_supplied();
+	if (ret < 0) {
+		sysctl_sched_balance_option = sched_current_balance_option;
+		return ret;
+	}
+
+	/*
+	 * When in 'auto' mode, switch to 'performance if the system
+	 * is plugged on the wall, to 'power' if we are on battery
+	 */
+	sched_current_balance_option = ret ?
+		SCHED_BALANCE_OPTION_PERFORMANCE :
+		SCHED_BALANCE_OPTION_POWER;
+
+	return 0;
+}
+
+int sched_proc_balance_option_handler(struct ctl_table *table, int write,
+		void __user *buffer, size_t *length,
+		loff_t *ppos)
+{
+	int ret;
+
+	ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
+	if (ret)
+		return ret;
+
+	return sched_balance_option_update();
+}
+
+static int sched_power_supply_notifier(struct notifier_block *b,
+				       unsigned long l, void *v)
+{
+	sched_balance_option_update();
+	return NOTIFY_OK;
+}
+
+static struct notifier_block power_supply_notifier_nb = {
+	.notifier_call = sched_power_supply_notifier,
+};
+
+static int sched_balance_option_init(void)
+{
+	int ret;
+
+	ret = sched_balance_option_update();
+	if (ret)
+		return ret;
+
+	return power_supply_reg_notifier(&power_supply_notifier_nb);
+}
+#endif
+
 #ifdef CONFIG_SCHED_DEBUG
 struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq)
 {
@@ -7695,7 +7783,7 @@ __init void init_sched_fair_class(void)
 {
 #ifdef CONFIG_SMP
 	open_softirq(SCHED_SOFTIRQ, run_rebalance_domains);
-
+	sched_balance_option_init();
 #ifdef CONFIG_NO_HZ_COMMON
 	nohz.next_balance = jiffies;
 	zalloc_cpumask_var(&nohz.idle_cpus_mask, GFP_NOWAIT);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 74f5b58..e4ecc7d 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -282,6 +282,17 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+#ifdef CONFIG_SMP
+	{
+		.procname	= "sched_balance_option",
+		.data		= &sysctl_sched_balance_option,
+		.maxlen		= sizeof(enum sched_balance_option),
+		.mode		= 0644,
+		.proc_handler	= sched_proc_balance_option_handler,
+		.extra1		= &zero, /* SCHED_BALANCE_OPTION_AUTO */
+		.extra2		= &two,  /* SCHED_BALANCE_OPTION_POWER */
+	},
+#endif
 #ifdef CONFIG_SCHED_DEBUG
 	{
 		.procname	= "sched_min_granularity_ns",
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 3/3] sched: idle: Store the idle state the cpu is
  2014-04-24 12:24 [PATCH 0/3] sched: idle: Provide the basis to integrate cpuidle Daniel Lezcano
  2014-04-24 12:24 ` [PATCH 1/3] sched: idle: Encapsulate the code to compile it out Daniel Lezcano
  2014-04-24 12:24 ` [PATCH 2/3] sched: idle: Add sched balance option Daniel Lezcano
@ 2014-04-24 12:24 ` Daniel Lezcano
  2014-04-24 16:16   ` Rafael J. Wysocki
  2014-04-24 16:13 ` [PATCH 0/3] sched: idle: Provide the basis to integrate cpuidle Rafael J. Wysocki
  3 siblings, 1 reply; 33+ messages in thread
From: Daniel Lezcano @ 2014-04-24 12:24 UTC (permalink / raw)
  To: peterz, mingo
  Cc: linux-kernel, rjw, linux-pm, alex.shi, vincent.guittot,
	morten.rasmussen, linaro-kernel

When the cpu enters idle it stores the cpuidle state pointer in the struct
rq which in turn could be used to take a right decision when balancing
a task.

As soon as the cpu exits the idle state, the structure is filled back with the
NULL pointer.

There are a couple of situations where the idle state pointer could be changed
while looking at it:

1. For x86/acpi with dynamic c-states, when a laptop switches from battery
   to AC that could result on removing the deeper idle state. The acpi driver
   triggers:
	'acpi_processor_cst_has_changed'
		'cpuidle_pause_and_lock'
			'cpuidle_uninstall_idle_handler'
				'kick_all_cpus_sync'.

All cpus will exit their idle state and the pointed object will be set to NULL.

2. The cpuidle driver is unloaded. Logically that could happen but not in
practice because the drivers are always compiled in and 95% of the drivers are
not coded to unregister the driver. In any case, the unloading code must call
'cpuidle_unregister_device', that calls 'cpuidle_pause_and_lock' leading to
'kick_all_cpus_sync' as mentioned above.

The race can happen if we use the pointer and then one of these two situations
occurs at the same moment.

In order to be safe, the idle state pointer stored in the rq must be used inside
a rcu_read_lock section where we are protected with the 'rcu_barrier' in the
'cpuidle_uninstall_idle_handler' function.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 drivers/cpuidle/cpuidle.c |    6 ++++++
 kernel/sched/idle.c       |    8 ++++++++
 kernel/sched/sched.h      |    5 +++++
 3 files changed, 19 insertions(+)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 8236746..6a13f40 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -190,6 +190,12 @@ void cpuidle_install_idle_handler(void)
  */
 void cpuidle_uninstall_idle_handler(void)
 {
+	/*
+	 * Wait for the scheduler to finish to use the idle state he
+	 * may pointing to when looking for the cpu idle states
+	 */
+	rcu_barrier();
+
 	if (enabled_devices) {
 		initialized = 0;
 		kick_all_cpus_sync();
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index e877dd4..4c14ec0 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -12,6 +12,8 @@
 
 #include <trace/events/power.h>
 
+#include "sched.h"
+
 static int __read_mostly cpu_idle_force_poll;
 
 void cpu_idle_poll_ctrl(bool enable)
@@ -66,6 +68,8 @@ void __weak arch_cpu_idle(void)
 #ifdef CONFIG_CPU_IDLE
 static int __cpuidle_idle_call(void)
 {
+	struct rq *rq = this_rq();
+	struct cpuidle_state **idle_state = &rq->idle_state;
 	struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
 	struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
 	int next_state, entered_state, ret;
@@ -114,6 +118,8 @@ static int __cpuidle_idle_call(void)
 		if (!ret) {
 			trace_cpu_idle_rcuidle(next_state, dev->cpu);
 
+			*idle_state = &drv->states[next_state];
+
 			/*
 			 * Enter the idle state previously returned by
 			 * the governor decision. This function will
@@ -123,6 +129,8 @@ static int __cpuidle_idle_call(void)
 			 */
 			entered_state = cpuidle_enter(drv, dev, next_state);
 
+			*idle_state = NULL;
+
 			trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
 
 			if (broadcast)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 456e492..bace64a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -14,6 +14,7 @@
 #include "cpuacct.h"
 
 struct rq;
+struct cpuidle_state;
 
 extern __read_mostly int scheduler_running;
 
@@ -643,6 +644,10 @@ struct rq {
 #ifdef CONFIG_SMP
 	struct llist_head wake_list;
 #endif
+
+#ifdef CONFIG_CPU_IDLE
+	struct cpuidle_state *idle_state;
+#endif
 };
 
 static inline int cpu_of(struct rq *rq)
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
       [not found]   ` <CAP245DX9wewQFhcyGj5ZuNE7hHC4fRn90POC32HLF6ugja6nJg@mail.gmail.com>
@ 2014-04-24 13:30     ` Daniel Lezcano
  2014-04-25 10:54       ` Amit Kucheria
  0 siblings, 1 reply; 33+ messages in thread
From: Daniel Lezcano @ 2014-04-24 13:30 UTC (permalink / raw)
  To: Amit Kucheria
  Cc: Peter Zijlstra, Ingo Molnar, Lists linaro-kernel, Linux PM list,
	Rafael J. Wysocki, Linux Kernel Mailing List

On 04/24/2014 03:13 PM, Amit Kucheria wrote:
> On Thu, Apr 24, 2014 at 5:54 PM, Daniel Lezcano
> <daniel.lezcano@linaro.org <mailto:daniel.lezcano@linaro.org>> wrote:
>
>     This patch adds a sysctl schedule balance option to choose against:
>
>       * auto        (0)
>       * performance (1)
>       * power       (2)
>
>     It relies on the recently added notifier to monitor the power supply
>     changes.
>     If the scheduler balance option is set to 'auto', then when the
>     system switches
>     to battery, the balance option change to 'power' and when it goes
>     back to AC, it
>     switches to 'performance'.
>
>     The default value is 'auto'.
>
>     If the kernel is compiled without the CONFIG_POWER_SUPPLY option,
>     then any call
>     to the 'auto' option will fail and the scheduler will use the
>     'performance'
>     option as default.
>
>     Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org
>     <mailto:daniel.lezcano@linaro.org>>
>     ---
>       include/linux/sched/sysctl.h |   14 +++++++
>       kernel/sched/fair.c          |   92
>     +++++++++++++++++++++++++++++++++++++++++-
>       kernel/sysctl.c              |   11 +++++
>       3 files changed, 115 insertions(+), 2 deletions(-)
>
>     diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
>     index 8045a55..f8507bf 100644
>     --- a/include/linux/sched/sysctl.h
>     +++ b/include/linux/sched/sysctl.h
>     @@ -44,6 +44,20 @@ enum sched_tunable_scaling {
>       };
>       extern enum sched_tunable_scaling sysctl_sched_tunable_scaling;
>
>     +#ifdef CONFIG_SMP
>     +enum sched_balance_option {
>
>
> What do you think of s/option/bias/g ?
>
> It is essentially biasing the scheduler towards performance or power

Yes, could be more adequate.

>     +       SCHED_BALANCE_OPTION_PERFORMANCE,
>     +       SCHED_BALANCE_OPTION_POWER,
>     +       SCHED_BALANCE_OPTION_AUTO,
>     +       SCHED_BALANCE_OPTION_END,
>     +};
>
>
>     +extern enum sched_balance_option sysctl_sched_balance_option;
>     +
>     +int sched_proc_balance_option_handler(struct ctl_table *table, int
>     write,
>     +               void __user *buffer, size_t *length,
>     +               loff_t *ppos);
>     +#endif
>     +
>       extern unsigned int sysctl_numa_balancing_scan_delay;
>       extern unsigned int sysctl_numa_balancing_scan_period_min;
>       extern unsigned int sysctl_numa_balancing_scan_period_max;
>     diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>     index 7570dd9..7b8e93d 100644
>     --- a/kernel/sched/fair.c
>     +++ b/kernel/sched/fair.c
>     @@ -29,7 +29,7 @@
>       #include <linux/mempolicy.h>
>       #include <linux/migrate.h>
>       #include <linux/task_work.h>
>     -
>     +#include <linux/power_supply.h>
>       #include <trace/events/sched.h>
>
>       #include "sched.h"
>     @@ -61,6 +61,24 @@ unsigned int normalized_sysctl_sched_latency =
>     6000000ULL;
>       enum sched_tunable_scaling sysctl_sched_tunable_scaling
>              = SCHED_TUNABLESCALING_LOG;
>
>     +#ifdef CONFIG_SMP
>     +/*
>     + * Scheduler balancing policy:
>     + *
>     + * Options are:
>     + * SCHED_BALANCE_OPTION_PERFORMANCE - full performance
>     + * SCHED_BALANCE_OPTION_POWER - power saving aggressive
>     + * SCHED_BALANCE_OPTION_AUTO - switches to 'performance' when plugged
>     + *                             on or 'power' on battery
>     + */
>     +enum sched_balance_option sysctl_sched_balance_option
>     +       = SCHED_BALANCE_OPTION_AUTO;
>     +
>     +static int sched_current_balance_option
>     +       = SCHED_BALANCE_OPTION_PERFORMANCE;
>     +
>     +#endif
>     +
>       /*
>        * Minimal preemption granularity for CPU-bound tasks:
>        * (default: 0.75 msec * (1 + ilog(ncpus)), units: nanoseconds)
>     @@ -555,6 +573,76 @@ static struct sched_entity
>     *__pick_next_entity(struct sched_entity *se)
>              return rb_entry(next, struct sched_entity, run_node);
>       }
>
>     +#ifdef CONFIG_SMP
>     +static int sched_balance_option_update(void)
>     +{
>     +       int ret;
>     +
>     +       /*
>     +        * Copy the current balance option
>     +        */
>     +       if (sysctl_sched_balance_option != SCHED_BALANCE_OPTION_AUTO) {
>     +               sched_current_balance_option =
>     sysctl_sched_balance_option;
>     +               return 0;
>     +       }
>     +
>     +       /*
>     +        * This call may fail if the kernel is not compiled with
>     +        * the POWER_SUPPLY option.
>     +        */
>     +       ret = power_supply_is_system_supplied();
>     +       if (ret < 0) {
>     +               sysctl_sched_balance_option =
>     sched_current_balance_option;
>     +               return ret;
>     +       }
>     +
>     +       /*
>     +        * When in 'auto' mode, switch to 'performance if the system
>     +        * is plugged on the wall, to 'power' if we are on battery
>     +        */
>     +       sched_current_balance_option = ret ?
>     +               SCHED_BALANCE_OPTION_PERFORMANCE :
>     +               SCHED_BALANCE_OPTION_POWER;
>     +
>     +       return 0;
>     +}
>     +
>
>
> I understand that this is only meant to kick off discussions and other
> criteria besides power being plugged in to bias the scheduler
> performance could be added later. But does it make sense to hardcode the
> power supply assumption into the scheduler?
>
> Can't we instead make sched_balance_option_update() a function pointer
> (with a default implementation that you've provided) that provide
> platforms the ability to override that with their own implementation?

I agree if that really hurts, it could be placed somewhere else, for 
example in a new file:
	kernel/sched/energy.c

But concerning the callback, I don't see the point to create a specific 
platform ops for that as the current code is generic. Do you have any 
use case in mind ?

Thanks for the review

   -- Daniel

>     +int sched_proc_balance_option_handler(struct ctl_table *table, int
>     write,
>     +               void __user *buffer, size_t *length,
>     +               loff_t *ppos)
>     +{
>     +       int ret;
>     +
>     +       ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
>     +       if (ret)
>     +               return ret;
>     +
>     +       return sched_balance_option_update();
>     +}
>     +
>     +static int sched_power_supply_notifier(struct notifier_block *b,
>     +                                      unsigned long l, void *v)
>     +{
>     +       sched_balance_option_update();
>     +       return NOTIFY_OK;
>     +}
>     +
>     +static struct notifier_block power_supply_notifier_nb = {
>     +       .notifier_call = sched_power_supply_notifier,
>     +};
>     +
>     +static int sched_balance_option_init(void)
>     +{
>     +       int ret;
>     +
>     +       ret = sched_balance_option_update();
>     +       if (ret)
>     +               return ret;
>     +
>     +       return power_supply_reg_notifier(&power_supply_notifier_nb);
>     +}
>     +#endif
>     +
>       #ifdef CONFIG_SCHED_DEBUG
>       struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq)
>       {
>     @@ -7695,7 +7783,7 @@ __init void init_sched_fair_class(void)
>       {
>       #ifdef CONFIG_SMP
>              open_softirq(SCHED_SOFTIRQ, run_rebalance_domains);
>     -
>     +       sched_balance_option_init();
>       #ifdef CONFIG_NO_HZ_COMMON
>              nohz.next_balance = jiffies;
>              zalloc_cpumask_var(&nohz.idle_cpus_mask, GFP_NOWAIT);
>     diff --git a/kernel/sysctl.c b/kernel/sysctl.c
>     index 74f5b58..e4ecc7d 100644
>     --- a/kernel/sysctl.c
>     +++ b/kernel/sysctl.c
>     @@ -282,6 +282,17 @@ static struct ctl_table kern_table[] = {
>                      .mode           = 0644,
>                      .proc_handler   = proc_dointvec,
>              },
>     +#ifdef CONFIG_SMP
>     +       {
>     +               .procname       = "sched_balance_option",
>     +               .data           = &sysctl_sched_balance_option,
>     +               .maxlen         = sizeof(enum sched_balance_option),
>     +               .mode           = 0644,
>     +               .proc_handler   = sched_proc_balance_option_handler,
>     +               .extra1         = &zero, /* SCHED_BALANCE_OPTION_AUTO */
>     +               .extra2         = &two,  /*
>     SCHED_BALANCE_OPTION_POWER */
>     +       },
>     +#endif
>       #ifdef CONFIG_SCHED_DEBUG
>              {
>                      .procname       = "sched_min_granularity_ns",
>     --
>     1.7.9.5
>
>
>     _______________________________________________
>     linaro-kernel mailing list
>     linaro-kernel@lists.linaro.org <mailto:linaro-kernel@lists.linaro.org>
>     http://lists.linaro.org/mailman/listinfo/linaro-kernel
>
>


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/3] sched: idle: Provide the basis to integrate cpuidle
  2014-04-24 12:24 [PATCH 0/3] sched: idle: Provide the basis to integrate cpuidle Daniel Lezcano
                   ` (2 preceding siblings ...)
  2014-04-24 12:24 ` [PATCH 3/3] sched: idle: Store the idle state the cpu is Daniel Lezcano
@ 2014-04-24 16:13 ` Rafael J. Wysocki
  3 siblings, 0 replies; 33+ messages in thread
From: Rafael J. Wysocki @ 2014-04-24 16:13 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: peterz, mingo, linux-kernel, linux-pm, alex.shi, vincent.guittot,
	morten.rasmussen, linaro-kernel

On Thursday, April 24, 2014 02:24:48 PM Daniel Lezcano wrote:
> This patchset provides three patches for the basis to integrate cpuidle with
> the scheduler.
> 
> The first patch is a cleanup.
> The second one adds the sched balance option as requested by Ingo.
> The third one stores the idle state a cpu is and adds a rcu_barrier() to
>  prevent races when using the pointed object.
> 
> This patchset is based on top of v3.15-rc2.
> 
> This patchset does not modify the behavior of the scheduler.
> 
> Taking into account the cpuidle information from the scheduler will be
> posted in a separate patchset in order to keep focused on the right decisions
> the scheduler should take regarding the policy vs idle parameters.
> 
> Daniel Lezcano (3):
>   sched: idle: Encapsulate the code to compile it out
>   sched: idle: Add sched balance option
>   sched: idle: Store the idle state the cpu is

Please rebase patch [1/3] on top of https://patchwork.kernel.org/patch/4021831/

I agree with the Amit's comment regarding the power supply assumption in
patch [2/3].

>  drivers/cpuidle/cpuidle.c    |    6 ++
>  include/linux/sched/sysctl.h |   14 ++++
>  kernel/sched/fair.c          |   92 ++++++++++++++++++++++-
>  kernel/sched/idle.c          |  169 +++++++++++++++++++++++-------------------
>  kernel/sched/sched.h         |    5 ++
>  kernel/sysctl.c              |   11 +++
>  6 files changed, 220 insertions(+), 77 deletions(-)
> 
> 

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/3] sched: idle: Store the idle state the cpu is
  2014-04-24 12:24 ` [PATCH 3/3] sched: idle: Store the idle state the cpu is Daniel Lezcano
@ 2014-04-24 16:16   ` Rafael J. Wysocki
  0 siblings, 0 replies; 33+ messages in thread
From: Rafael J. Wysocki @ 2014-04-24 16:16 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: peterz, mingo, linux-kernel, linux-pm, alex.shi, vincent.guittot,
	morten.rasmussen, linaro-kernel

On Thursday, April 24, 2014 02:24:51 PM Daniel Lezcano wrote:
> When the cpu enters idle it stores the cpuidle state pointer in the struct
> rq which in turn could be used to take a right decision when balancing
> a task.
> 
> As soon as the cpu exits the idle state, the structure is filled back with the
> NULL pointer.
> 
> There are a couple of situations where the idle state pointer could be changed
> while looking at it:
> 
> 1. For x86/acpi with dynamic c-states, when a laptop switches from battery
>    to AC that could result on removing the deeper idle state. The acpi driver
>    triggers:
> 	'acpi_processor_cst_has_changed'
> 		'cpuidle_pause_and_lock'
> 			'cpuidle_uninstall_idle_handler'
> 				'kick_all_cpus_sync'.
> 
> All cpus will exit their idle state and the pointed object will be set to NULL.
> 
> 2. The cpuidle driver is unloaded. Logically that could happen but not in
> practice because the drivers are always compiled in and 95% of the drivers are
> not coded to unregister the driver. In any case, the unloading code must call
> 'cpuidle_unregister_device', that calls 'cpuidle_pause_and_lock' leading to
> 'kick_all_cpus_sync' as mentioned above.
> 
> The race can happen if we use the pointer and then one of these two situations
> occurs at the same moment.
> 
> In order to be safe, the idle state pointer stored in the rq must be used inside
> a rcu_read_lock section where we are protected with the 'rcu_barrier' in the
> 'cpuidle_uninstall_idle_handler' function.
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---
>  drivers/cpuidle/cpuidle.c |    6 ++++++
>  kernel/sched/idle.c       |    8 ++++++++
>  kernel/sched/sched.h      |    5 +++++
>  3 files changed, 19 insertions(+)
> 
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 8236746..6a13f40 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -190,6 +190,12 @@ void cpuidle_install_idle_handler(void)
>   */
>  void cpuidle_uninstall_idle_handler(void)
>  {
> +	/*
> +	 * Wait for the scheduler to finish to use the idle state he
> +	 * may pointing to when looking for the cpu idle states
> +	 */
> +	rcu_barrier();
> +
>  	if (enabled_devices) {
>  		initialized = 0;
>  		kick_all_cpus_sync();
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index e877dd4..4c14ec0 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -12,6 +12,8 @@
>  
>  #include <trace/events/power.h>
>  
> +#include "sched.h"
> +
>  static int __read_mostly cpu_idle_force_poll;
>  
>  void cpu_idle_poll_ctrl(bool enable)
> @@ -66,6 +68,8 @@ void __weak arch_cpu_idle(void)
>  #ifdef CONFIG_CPU_IDLE
>  static int __cpuidle_idle_call(void)
>  {
> +	struct rq *rq = this_rq();
> +	struct cpuidle_state **idle_state = &rq->idle_state;
>  	struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
>  	struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
>  	int next_state, entered_state, ret;
> @@ -114,6 +118,8 @@ static int __cpuidle_idle_call(void)
>  		if (!ret) {
>  			trace_cpu_idle_rcuidle(next_state, dev->cpu);
>  
> +			*idle_state = &drv->states[next_state];

What about using rq->idle_state directly here?

> +
>  			/*
>  			 * Enter the idle state previously returned by
>  			 * the governor decision. This function will
> @@ -123,6 +129,8 @@ static int __cpuidle_idle_call(void)
>  			 */
>  			entered_state = cpuidle_enter(drv, dev, next_state);
>  
> +			*idle_state = NULL;

And here?

That would be simpler it seems.

> +
>  			trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
>  
>  			if (broadcast)
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 456e492..bace64a 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -14,6 +14,7 @@
>  #include "cpuacct.h"
>  
>  struct rq;
> +struct cpuidle_state;
>  
>  extern __read_mostly int scheduler_running;
>  
> @@ -643,6 +644,10 @@ struct rq {
>  #ifdef CONFIG_SMP
>  	struct llist_head wake_list;
>  #endif
> +
> +#ifdef CONFIG_CPU_IDLE
> +	struct cpuidle_state *idle_state;
> +#endif
>  };
>  
>  static inline int cpu_of(struct rq *rq)
> 

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-24 13:30     ` Daniel Lezcano
@ 2014-04-25 10:54       ` Amit Kucheria
  2014-04-25 11:46         ` Rafael J. Wysocki
  2014-04-25 13:22         ` Peter Zijlstra
  0 siblings, 2 replies; 33+ messages in thread
From: Amit Kucheria @ 2014-04-25 10:54 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Peter Zijlstra, Ingo Molnar, Lists linaro-kernel, Linux PM list,
	Rafael J. Wysocki, Linux Kernel Mailing List

On Thu, Apr 24, 2014 at 7:00 PM, Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 04/24/2014 03:13 PM, Amit Kucheria wrote:
>>
>> On Thu, Apr 24, 2014 at 5:54 PM, Daniel Lezcano
>> <daniel.lezcano@linaro.org <mailto:daniel.lezcano@linaro.org>> wrote:
>>
>>     This patch adds a sysctl schedule balance option to choose against:
>>
>>       * auto        (0)
>>       * performance (1)
>>       * power       (2)
>>
>>     It relies on the recently added notifier to monitor the power supply
>>     changes.
>>     If the scheduler balance option is set to 'auto', then when the
>>     system switches
>>     to battery, the balance option change to 'power' and when it goes
>>     back to AC, it
>>     switches to 'performance'.
>>
>>     The default value is 'auto'.
>>
>>     If the kernel is compiled without the CONFIG_POWER_SUPPLY option,
>>     then any call
>>     to the 'auto' option will fail and the scheduler will use the
>>     'performance'
>>     option as default.
>>
>>     Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org
>>     <mailto:daniel.lezcano@linaro.org>>
>>
>>     ---
>>       include/linux/sched/sysctl.h |   14 +++++++
>>       kernel/sched/fair.c          |   92
>>     +++++++++++++++++++++++++++++++++++++++++-
>>       kernel/sysctl.c              |   11 +++++
>>       3 files changed, 115 insertions(+), 2 deletions(-)
>>
>>     diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
>>     index 8045a55..f8507bf 100644
>>     --- a/include/linux/sched/sysctl.h
>>     +++ b/include/linux/sched/sysctl.h
>>     @@ -44,6 +44,20 @@ enum sched_tunable_scaling {
>>       };
>>       extern enum sched_tunable_scaling sysctl_sched_tunable_scaling;
>>
>>     +#ifdef CONFIG_SMP
>>     +enum sched_balance_option {
>>
>>
>> What do you think of s/option/bias/g ?
>>
>> It is essentially biasing the scheduler towards performance or power
>
>
> Yes, could be more adequate.
>
>
>>     +       SCHED_BALANCE_OPTION_PERFORMANCE,
>>     +       SCHED_BALANCE_OPTION_POWER,
>>     +       SCHED_BALANCE_OPTION_AUTO,
>>     +       SCHED_BALANCE_OPTION_END,
>>     +};
>>
>>
>>     +extern enum sched_balance_option sysctl_sched_balance_option;
>>     +
>>     +int sched_proc_balance_option_handler(struct ctl_table *table, int
>>     write,
>>     +               void __user *buffer, size_t *length,
>>     +               loff_t *ppos);
>>     +#endif
>>     +
>>       extern unsigned int sysctl_numa_balancing_scan_delay;
>>       extern unsigned int sysctl_numa_balancing_scan_period_min;
>>       extern unsigned int sysctl_numa_balancing_scan_period_max;
>>     diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>     index 7570dd9..7b8e93d 100644
>>     --- a/kernel/sched/fair.c
>>     +++ b/kernel/sched/fair.c
>>     @@ -29,7 +29,7 @@
>>       #include <linux/mempolicy.h>
>>       #include <linux/migrate.h>
>>       #include <linux/task_work.h>
>>     -
>>     +#include <linux/power_supply.h>
>>       #include <trace/events/sched.h>
>>
>>       #include "sched.h"
>>     @@ -61,6 +61,24 @@ unsigned int normalized_sysctl_sched_latency =
>>     6000000ULL;
>>       enum sched_tunable_scaling sysctl_sched_tunable_scaling
>>              = SCHED_TUNABLESCALING_LOG;
>>
>>     +#ifdef CONFIG_SMP
>>     +/*
>>     + * Scheduler balancing policy:
>>     + *
>>     + * Options are:
>>     + * SCHED_BALANCE_OPTION_PERFORMANCE - full performance
>>     + * SCHED_BALANCE_OPTION_POWER - power saving aggressive
>>     + * SCHED_BALANCE_OPTION_AUTO - switches to 'performance' when plugged
>>     + *                             on or 'power' on battery
>>     + */
>>     +enum sched_balance_option sysctl_sched_balance_option
>>     +       = SCHED_BALANCE_OPTION_AUTO;
>>     +
>>     +static int sched_current_balance_option
>>     +       = SCHED_BALANCE_OPTION_PERFORMANCE;
>>     +
>>     +#endif
>>     +
>>       /*
>>        * Minimal preemption granularity for CPU-bound tasks:
>>        * (default: 0.75 msec * (1 + ilog(ncpus)), units: nanoseconds)
>>     @@ -555,6 +573,76 @@ static struct sched_entity
>>     *__pick_next_entity(struct sched_entity *se)
>>              return rb_entry(next, struct sched_entity, run_node);
>>       }
>>
>>     +#ifdef CONFIG_SMP
>>     +static int sched_balance_option_update(void)
>>     +{
>>     +       int ret;
>>     +
>>     +       /*
>>     +        * Copy the current balance option
>>     +        */
>>     +       if (sysctl_sched_balance_option != SCHED_BALANCE_OPTION_AUTO) {
>>     +               sched_current_balance_option =
>>     sysctl_sched_balance_option;
>>     +               return 0;
>>     +       }
>>     +
>>     +       /*
>>     +        * This call may fail if the kernel is not compiled with
>>     +        * the POWER_SUPPLY option.
>>     +        */
>>     +       ret = power_supply_is_system_supplied();
>>     +       if (ret < 0) {
>>     +               sysctl_sched_balance_option =
>>     sched_current_balance_option;
>>     +               return ret;
>>     +       }
>>     +
>>     +       /*
>>     +        * When in 'auto' mode, switch to 'performance if the system
>>     +        * is plugged on the wall, to 'power' if we are on battery
>>     +        */
>>     +       sched_current_balance_option = ret ?
>>     +               SCHED_BALANCE_OPTION_PERFORMANCE :
>>     +               SCHED_BALANCE_OPTION_POWER;
>>     +
>>     +       return 0;
>>     +}
>>     +
>>
>>
>> I understand that this is only meant to kick off discussions and other
>> criteria besides power being plugged in to bias the scheduler
>> performance could be added later. But does it make sense to hardcode the
>> power supply assumption into the scheduler?
>>
>> Can't we instead make sched_balance_option_update() a function pointer
>> (with a default implementation that you've provided) that provide
>> platforms the ability to override that with their own implementation?
>
>
> I agree if that really hurts, it could be placed somewhere else, for example in a new file:
>         kernel/sched/energy.c
>
> But concerning the callback, I don't see the point to create a specific platform ops for that as the current code is generic. Do you have any use case in mind ?
>

I had a offline conversation with Daniel about this since there are
other triggers - thermal constraints and game-like apps being examples
- that might want to override the system policy. He intended
"performance" mode to mean the existing code paths and "power" mode to
mean *additional* new heuristics for energy-efficiency. The power
supply assumption is just the first one of those heuristics.

In that case, "performance" should be the default so we don't change
existing system behavior. And perhaps we want to keep these energy
heuristics in a separate file, at least until we've gotten them right.

While we're at it, can we attempt to replace power with energy since
that is what we're trying to save in most cases?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-25 10:54       ` Amit Kucheria
@ 2014-04-25 11:46         ` Rafael J. Wysocki
  2014-04-25 12:17           ` Daniel Lezcano
  2014-04-25 13:20           ` Peter Zijlstra
  2014-04-25 13:22         ` Peter Zijlstra
  1 sibling, 2 replies; 33+ messages in thread
From: Rafael J. Wysocki @ 2014-04-25 11:46 UTC (permalink / raw)
  To: Amit Kucheria
  Cc: Daniel Lezcano, Peter Zijlstra, Ingo Molnar, Lists linaro-kernel,
	Linux PM list, Linux Kernel Mailing List

On Friday, April 25, 2014 04:24:49 PM Amit Kucheria wrote:
> On Thu, Apr 24, 2014 at 7:00 PM, Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
> >
> > On 04/24/2014 03:13 PM, Amit Kucheria wrote:
> >>
> >> On Thu, Apr 24, 2014 at 5:54 PM, Daniel Lezcano
> >> <daniel.lezcano@linaro.org <mailto:daniel.lezcano@linaro.org>> wrote:
> >>
> >>     This patch adds a sysctl schedule balance option to choose against:
> >>
> >>       * auto        (0)
> >>       * performance (1)
> >>       * power       (2)
> >>
> >>     It relies on the recently added notifier to monitor the power supply
> >>     changes.
> >>     If the scheduler balance option is set to 'auto', then when the
> >>     system switches
> >>     to battery, the balance option change to 'power' and when it goes
> >>     back to AC, it
> >>     switches to 'performance'.
> >>
> >>     The default value is 'auto'.
> >>
> >>     If the kernel is compiled without the CONFIG_POWER_SUPPLY option,
> >>     then any call
> >>     to the 'auto' option will fail and the scheduler will use the
> >>     'performance'
> >>     option as default.
> >>
> >>     Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org
> >>     <mailto:daniel.lezcano@linaro.org>>
> >>
> >>     ---
> >>       include/linux/sched/sysctl.h |   14 +++++++
> >>       kernel/sched/fair.c          |   92
> >>     +++++++++++++++++++++++++++++++++++++++++-
> >>       kernel/sysctl.c              |   11 +++++
> >>       3 files changed, 115 insertions(+), 2 deletions(-)
> >>
> >>     diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
> >>     index 8045a55..f8507bf 100644
> >>     --- a/include/linux/sched/sysctl.h
> >>     +++ b/include/linux/sched/sysctl.h
> >>     @@ -44,6 +44,20 @@ enum sched_tunable_scaling {
> >>       };
> >>       extern enum sched_tunable_scaling sysctl_sched_tunable_scaling;
> >>
> >>     +#ifdef CONFIG_SMP
> >>     +enum sched_balance_option {
> >>
> >>
> >> What do you think of s/option/bias/g ?
> >>
> >> It is essentially biasing the scheduler towards performance or power
> >
> >
> > Yes, could be more adequate.
> >
> >
> >>     +       SCHED_BALANCE_OPTION_PERFORMANCE,
> >>     +       SCHED_BALANCE_OPTION_POWER,
> >>     +       SCHED_BALANCE_OPTION_AUTO,
> >>     +       SCHED_BALANCE_OPTION_END,
> >>     +};
> >>
> >>
> >>     +extern enum sched_balance_option sysctl_sched_balance_option;
> >>     +
> >>     +int sched_proc_balance_option_handler(struct ctl_table *table, int
> >>     write,
> >>     +               void __user *buffer, size_t *length,
> >>     +               loff_t *ppos);
> >>     +#endif
> >>     +
> >>       extern unsigned int sysctl_numa_balancing_scan_delay;
> >>       extern unsigned int sysctl_numa_balancing_scan_period_min;
> >>       extern unsigned int sysctl_numa_balancing_scan_period_max;
> >>     diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>     index 7570dd9..7b8e93d 100644
> >>     --- a/kernel/sched/fair.c
> >>     +++ b/kernel/sched/fair.c
> >>     @@ -29,7 +29,7 @@
> >>       #include <linux/mempolicy.h>
> >>       #include <linux/migrate.h>
> >>       #include <linux/task_work.h>
> >>     -
> >>     +#include <linux/power_supply.h>
> >>       #include <trace/events/sched.h>
> >>
> >>       #include "sched.h"
> >>     @@ -61,6 +61,24 @@ unsigned int normalized_sysctl_sched_latency =
> >>     6000000ULL;
> >>       enum sched_tunable_scaling sysctl_sched_tunable_scaling
> >>              = SCHED_TUNABLESCALING_LOG;
> >>
> >>     +#ifdef CONFIG_SMP
> >>     +/*
> >>     + * Scheduler balancing policy:
> >>     + *
> >>     + * Options are:
> >>     + * SCHED_BALANCE_OPTION_PERFORMANCE - full performance
> >>     + * SCHED_BALANCE_OPTION_POWER - power saving aggressive
> >>     + * SCHED_BALANCE_OPTION_AUTO - switches to 'performance' when plugged
> >>     + *                             on or 'power' on battery
> >>     + */
> >>     +enum sched_balance_option sysctl_sched_balance_option
> >>     +       = SCHED_BALANCE_OPTION_AUTO;
> >>     +
> >>     +static int sched_current_balance_option
> >>     +       = SCHED_BALANCE_OPTION_PERFORMANCE;
> >>     +
> >>     +#endif
> >>     +
> >>       /*
> >>        * Minimal preemption granularity for CPU-bound tasks:
> >>        * (default: 0.75 msec * (1 + ilog(ncpus)), units: nanoseconds)
> >>     @@ -555,6 +573,76 @@ static struct sched_entity
> >>     *__pick_next_entity(struct sched_entity *se)
> >>              return rb_entry(next, struct sched_entity, run_node);
> >>       }
> >>
> >>     +#ifdef CONFIG_SMP
> >>     +static int sched_balance_option_update(void)
> >>     +{
> >>     +       int ret;
> >>     +
> >>     +       /*
> >>     +        * Copy the current balance option
> >>     +        */
> >>     +       if (sysctl_sched_balance_option != SCHED_BALANCE_OPTION_AUTO) {
> >>     +               sched_current_balance_option =
> >>     sysctl_sched_balance_option;
> >>     +               return 0;
> >>     +       }
> >>     +
> >>     +       /*
> >>     +        * This call may fail if the kernel is not compiled with
> >>     +        * the POWER_SUPPLY option.
> >>     +        */
> >>     +       ret = power_supply_is_system_supplied();
> >>     +       if (ret < 0) {
> >>     +               sysctl_sched_balance_option =
> >>     sched_current_balance_option;
> >>     +               return ret;
> >>     +       }
> >>     +
> >>     +       /*
> >>     +        * When in 'auto' mode, switch to 'performance if the system
> >>     +        * is plugged on the wall, to 'power' if we are on battery
> >>     +        */
> >>     +       sched_current_balance_option = ret ?
> >>     +               SCHED_BALANCE_OPTION_PERFORMANCE :
> >>     +               SCHED_BALANCE_OPTION_POWER;
> >>     +
> >>     +       return 0;
> >>     +}
> >>     +
> >>
> >>
> >> I understand that this is only meant to kick off discussions and other
> >> criteria besides power being plugged in to bias the scheduler
> >> performance could be added later. But does it make sense to hardcode the
> >> power supply assumption into the scheduler?
> >>
> >> Can't we instead make sched_balance_option_update() a function pointer
> >> (with a default implementation that you've provided) that provide
> >> platforms the ability to override that with their own implementation?
> >
> >
> > I agree if that really hurts, it could be placed somewhere else, for example in a new file:
> >         kernel/sched/energy.c
> >
> > But concerning the callback, I don't see the point to create a specific platform ops for that as the current code is generic. Do you have any use case in mind ?
> >
> 
> I had a offline conversation with Daniel about this since there are
> other triggers - thermal constraints and game-like apps being examples
> - that might want to override the system policy. He intended
> "performance" mode to mean the existing code paths and "power" mode to
> mean *additional* new heuristics for energy-efficiency. The power
> supply assumption is just the first one of those heuristics.

Well, so now the question is whether or not we relly want to always
go to the "power" (or "energy efficiency" if you will) mode if the system
is on battery.  That arguably may not be a good thing even for energy
efficiency depending on how exactly the modes are defined.

So in my opinion it's too early to add things like that at this point.

> In that case, "performance" should be the default so we don't change
> existing system behavior. And perhaps we want to keep these energy
> heuristics in a separate file, at least until we've gotten them right.
> 
> While we're at it, can we attempt to replace power with energy since
> that is what we're trying to save in most cases?

That would be a good thing IMO.

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-25 11:46         ` Rafael J. Wysocki
@ 2014-04-25 12:17           ` Daniel Lezcano
  2014-04-25 13:20           ` Peter Zijlstra
  1 sibling, 0 replies; 33+ messages in thread
From: Daniel Lezcano @ 2014-04-25 12:17 UTC (permalink / raw)
  To: Rafael J. Wysocki, Amit Kucheria
  Cc: Peter Zijlstra, Ingo Molnar, Lists linaro-kernel, Linux PM list,
	Linux Kernel Mailing List

On 04/25/2014 01:46 PM, Rafael J. Wysocki wrote:
> On Friday, April 25, 2014 04:24:49 PM Amit Kucheria wrote:
>> On Thu, Apr 24, 2014 at 7:00 PM, Daniel Lezcano
>> <daniel.lezcano@linaro.org> wrote:
>>>
>>> On 04/24/2014 03:13 PM, Amit Kucheria wrote:
>>>>
>>>> On Thu, Apr 24, 2014 at 5:54 PM, Daniel Lezcano
>>>> <daniel.lezcano@linaro.org <mailto:daniel.lezcano@linaro.org>> wrote:
>>>>
>>>>      This patch adds a sysctl schedule balance option to choose against:
>>>>
>>>>        * auto        (0)
>>>>        * performance (1)
>>>>        * power       (2)
>>>>
>>>>      It relies on the recently added notifier to monitor the power supply
>>>>      changes.
>>>>      If the scheduler balance option is set to 'auto', then when the
>>>>      system switches
>>>>      to battery, the balance option change to 'power' and when it goes
>>>>      back to AC, it
>>>>      switches to 'performance'.
>>>>
>>>>      The default value is 'auto'.
>>>>
>>>>      If the kernel is compiled without the CONFIG_POWER_SUPPLY option,
>>>>      then any call
>>>>      to the 'auto' option will fail and the scheduler will use the
>>>>      'performance'
>>>>      option as default.
>>>>
>>>>      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org
>>>>      <mailto:daniel.lezcano@linaro.org>>
>>>>
>>>>      ---
>>>>        include/linux/sched/sysctl.h |   14 +++++++
>>>>        kernel/sched/fair.c          |   92
>>>>      +++++++++++++++++++++++++++++++++++++++++-
>>>>        kernel/sysctl.c              |   11 +++++
>>>>        3 files changed, 115 insertions(+), 2 deletions(-)
>>>>
>>>>      diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
>>>>      index 8045a55..f8507bf 100644
>>>>      --- a/include/linux/sched/sysctl.h
>>>>      +++ b/include/linux/sched/sysctl.h
>>>>      @@ -44,6 +44,20 @@ enum sched_tunable_scaling {
>>>>        };
>>>>        extern enum sched_tunable_scaling sysctl_sched_tunable_scaling;
>>>>
>>>>      +#ifdef CONFIG_SMP
>>>>      +enum sched_balance_option {
>>>>
>>>>
>>>> What do you think of s/option/bias/g ?
>>>>
>>>> It is essentially biasing the scheduler towards performance or power
>>>
>>>
>>> Yes, could be more adequate.
>>>
>>>
>>>>      +       SCHED_BALANCE_OPTION_PERFORMANCE,
>>>>      +       SCHED_BALANCE_OPTION_POWER,
>>>>      +       SCHED_BALANCE_OPTION_AUTO,
>>>>      +       SCHED_BALANCE_OPTION_END,
>>>>      +};
>>>>
>>>>
>>>>      +extern enum sched_balance_option sysctl_sched_balance_option;
>>>>      +
>>>>      +int sched_proc_balance_option_handler(struct ctl_table *table, int
>>>>      write,
>>>>      +               void __user *buffer, size_t *length,
>>>>      +               loff_t *ppos);
>>>>      +#endif
>>>>      +
>>>>        extern unsigned int sysctl_numa_balancing_scan_delay;
>>>>        extern unsigned int sysctl_numa_balancing_scan_period_min;
>>>>        extern unsigned int sysctl_numa_balancing_scan_period_max;
>>>>      diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>      index 7570dd9..7b8e93d 100644
>>>>      --- a/kernel/sched/fair.c
>>>>      +++ b/kernel/sched/fair.c
>>>>      @@ -29,7 +29,7 @@
>>>>        #include <linux/mempolicy.h>
>>>>        #include <linux/migrate.h>
>>>>        #include <linux/task_work.h>
>>>>      -
>>>>      +#include <linux/power_supply.h>
>>>>        #include <trace/events/sched.h>
>>>>
>>>>        #include "sched.h"
>>>>      @@ -61,6 +61,24 @@ unsigned int normalized_sysctl_sched_latency =
>>>>      6000000ULL;
>>>>        enum sched_tunable_scaling sysctl_sched_tunable_scaling
>>>>               = SCHED_TUNABLESCALING_LOG;
>>>>
>>>>      +#ifdef CONFIG_SMP
>>>>      +/*
>>>>      + * Scheduler balancing policy:
>>>>      + *
>>>>      + * Options are:
>>>>      + * SCHED_BALANCE_OPTION_PERFORMANCE - full performance
>>>>      + * SCHED_BALANCE_OPTION_POWER - power saving aggressive
>>>>      + * SCHED_BALANCE_OPTION_AUTO - switches to 'performance' when plugged
>>>>      + *                             on or 'power' on battery
>>>>      + */
>>>>      +enum sched_balance_option sysctl_sched_balance_option
>>>>      +       = SCHED_BALANCE_OPTION_AUTO;
>>>>      +
>>>>      +static int sched_current_balance_option
>>>>      +       = SCHED_BALANCE_OPTION_PERFORMANCE;
>>>>      +
>>>>      +#endif
>>>>      +
>>>>        /*
>>>>         * Minimal preemption granularity for CPU-bound tasks:
>>>>         * (default: 0.75 msec * (1 + ilog(ncpus)), units: nanoseconds)
>>>>      @@ -555,6 +573,76 @@ static struct sched_entity
>>>>      *__pick_next_entity(struct sched_entity *se)
>>>>               return rb_entry(next, struct sched_entity, run_node);
>>>>        }
>>>>
>>>>      +#ifdef CONFIG_SMP
>>>>      +static int sched_balance_option_update(void)
>>>>      +{
>>>>      +       int ret;
>>>>      +
>>>>      +       /*
>>>>      +        * Copy the current balance option
>>>>      +        */
>>>>      +       if (sysctl_sched_balance_option != SCHED_BALANCE_OPTION_AUTO) {
>>>>      +               sched_current_balance_option =
>>>>      sysctl_sched_balance_option;
>>>>      +               return 0;
>>>>      +       }
>>>>      +
>>>>      +       /*
>>>>      +        * This call may fail if the kernel is not compiled with
>>>>      +        * the POWER_SUPPLY option.
>>>>      +        */
>>>>      +       ret = power_supply_is_system_supplied();
>>>>      +       if (ret < 0) {
>>>>      +               sysctl_sched_balance_option =
>>>>      sched_current_balance_option;
>>>>      +               return ret;
>>>>      +       }
>>>>      +
>>>>      +       /*
>>>>      +        * When in 'auto' mode, switch to 'performance if the system
>>>>      +        * is plugged on the wall, to 'power' if we are on battery
>>>>      +        */
>>>>      +       sched_current_balance_option = ret ?
>>>>      +               SCHED_BALANCE_OPTION_PERFORMANCE :
>>>>      +               SCHED_BALANCE_OPTION_POWER;
>>>>      +
>>>>      +       return 0;
>>>>      +}
>>>>      +
>>>>
>>>>
>>>> I understand that this is only meant to kick off discussions and other
>>>> criteria besides power being plugged in to bias the scheduler
>>>> performance could be added later. But does it make sense to hardcode the
>>>> power supply assumption into the scheduler?
>>>>
>>>> Can't we instead make sched_balance_option_update() a function pointer
>>>> (with a default implementation that you've provided) that provide
>>>> platforms the ability to override that with their own implementation?
>>>
>>>
>>> I agree if that really hurts, it could be placed somewhere else, for example in a new file:
>>>          kernel/sched/energy.c
>>>
>>> But concerning the callback, I don't see the point to create a specific platform ops for that as the current code is generic. Do you have any use case in mind ?
>>>
>>
>> I had a offline conversation with Daniel about this since there are
>> other triggers - thermal constraints and game-like apps being examples
>> - that might want to override the system policy. He intended
>> "performance" mode to mean the existing code paths and "power" mode to
>> mean *additional* new heuristics for energy-efficiency. The power
>> supply assumption is just the first one of those heuristics.
>
> Well, so now the question is whether or not we relly want to always
> go to the "power" (or "energy efficiency" if you will) mode if the system
> is on battery.  That arguably may not be a good thing even for energy
> efficiency depending on how exactly the modes are defined.
>
> So in my opinion it's too early to add things like that at this point.

Ok, the point to solve is to have the old code path (performance) and 
the new code path (power) to be selectable from userspace.

It is acceptable to keep the power/performance option for now without 
the power supply thing ? And we address the policy to switch from one 
mode to another mode with another option later ?


>> In that case, "performance" should be the default so we don't change
>> existing system behavior. And perhaps we want to keep these energy
>> heuristics in a separate file, at least until we've gotten them right.
>>
>> While we're at it, can we attempt to replace power with energy since
>> that is what we're trying to save in most cases?
>
> That would be a good thing IMO.


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-25 11:46         ` Rafael J. Wysocki
  2014-04-25 12:17           ` Daniel Lezcano
@ 2014-04-25 13:20           ` Peter Zijlstra
  2014-04-25 17:01             ` Daniel Lezcano
  2014-04-26  0:18             ` Rafael J. Wysocki
  1 sibling, 2 replies; 33+ messages in thread
From: Peter Zijlstra @ 2014-04-25 13:20 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Amit Kucheria, Daniel Lezcano, Ingo Molnar, Lists linaro-kernel,
	Linux PM list, Linux Kernel Mailing List

On Fri, Apr 25, 2014 at 01:46:53PM +0200, Rafael J. Wysocki wrote:

_trim_ emails!!! one of these days I'm going to write a bot to flame
your head of if there's excessive quoting.

> > I had a offline conversation with Daniel about this since there are
> > other triggers - thermal constraints and game-like apps being examples
> > - that might want to override the system policy. He intended
> > "performance" mode to mean the existing code paths and "power" mode to
> > mean *additional* new heuristics for energy-efficiency. The power
> > supply assumption is just the first one of those heuristics.
> 
> Well, so now the question is whether or not we relly want to always
> go to the "power" (or "energy efficiency" if you will) mode if the system
> is on battery.  That arguably may not be a good thing even for energy
> efficiency depending on how exactly the modes are defined.

Nobody is talking about always. But in general it seems a good enough
approach. Hell, many of the AC/BAT switches in todays power management
crap things are not always right.

Do I want it to dim the LCD further when I unplug the laptop -- mostly
no, but still it does. And the most annoying one is that it reduces the
screen blank time to something near 5 seconds or so.

Why would this be any different? If you know what you want you can turn
the knob.

> So in my opinion it's too early to add things like that at this point.

Meh.. 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-25 10:54       ` Amit Kucheria
  2014-04-25 11:46         ` Rafael J. Wysocki
@ 2014-04-25 13:22         ` Peter Zijlstra
  1 sibling, 0 replies; 33+ messages in thread
From: Peter Zijlstra @ 2014-04-25 13:22 UTC (permalink / raw)
  To: Amit Kucheria
  Cc: Daniel Lezcano, Ingo Molnar, Lists linaro-kernel, Linux PM list,
	Rafael J. Wysocki, Linux Kernel Mailing List


The infinite wisdom of the original PC keyboard gifted us with backspace
and delete, use them!

On Fri, Apr 25, 2014 at 04:24:49PM +0530, Amit Kucheria wrote:
> In that case, "performance" should be the default so we don't change
> existing system behavior. And perhaps we want to keep these energy
> heuristics in a separate file, at least until we've gotten them right.

No, we're not going to make a separate file. Its going to be fully
integrated in the normal load balancer, no special add on hacks and
warts.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-25 13:20           ` Peter Zijlstra
@ 2014-04-25 17:01             ` Daniel Lezcano
  2014-04-25 18:43               ` Peter Zijlstra
  2014-04-26  0:18             ` Rafael J. Wysocki
  1 sibling, 1 reply; 33+ messages in thread
From: Daniel Lezcano @ 2014-04-25 17:01 UTC (permalink / raw)
  To: Peter Zijlstra, Rafael J. Wysocki
  Cc: Amit Kucheria, Ingo Molnar, Lists linaro-kernel, Linux PM list,
	Linux Kernel Mailing List

On 04/25/2014 03:20 PM, Peter Zijlstra wrote:
> On Fri, Apr 25, 2014 at 01:46:53PM +0200, Rafael J. Wysocki wrote:
>
> _trim_ emails!!! one of these days I'm going to write a bot to flame
> your head of if there's excessive quoting.
>
>>> I had a offline conversation with Daniel about this since there are
>>> other triggers - thermal constraints and game-like apps being examples
>>> - that might want to override the system policy. He intended
>>> "performance" mode to mean the existing code paths and "power" mode to
>>> mean *additional* new heuristics for energy-efficiency. The power
>>> supply assumption is just the first one of those heuristics.
>>
>> Well, so now the question is whether or not we relly want to always
>> go to the "power" (or "energy efficiency" if you will) mode if the system
>> is on battery.  That arguably may not be a good thing even for energy
>> efficiency depending on how exactly the modes are defined.
>
> Nobody is talking about always. But in general it seems a good enough
> approach. Hell, many of the AC/BAT switches in todays power management
> crap things are not always right.
>
> Do I want it to dim the LCD further when I unplug the laptop -- mostly
> no, but still it does. And the most annoying one is that it reduces the
> screen blank time to something near 5 seconds or so.
>
> Why would this be any different? If you know what you want you can turn
> the knob.
>
>> So in my opinion it's too early to add things like that at this point.
>
> Meh..
>

Peter,

I assume the patchset is correct for you (modulo the few comments about 
code), right ?

As the sysctl is some kind of ABI, I would like to make sure we reach a 
consensus and discuss a bit about that.

I understand Rafael and Amit could be reluctant with the power supply to 
be hardcoded. As mentioned Amit, we may want to switch to 'power' if the 
thermal framework tells us we are reaching a threshold. For example, the 
system is set to 'auto', we are on battery, thus the current policy is 
'power', we plug the device, the current policy will switch to 
'performance' but the thermal framework may want to do 'power' because 
of some thermal constraint.

IMO, the power supply part could be extended to take into account the 
thermal.

The other point is, I guess, what should do the 'power' policy ? pack 
tasks or not ? Override the cpuidle governor and choose the deeper idle 
states or not ? etc ... So in other words, how can we put a cursor 
between 'performance' and 'power'. Knowing on some platform one may be 
more efficient than another platform.

IIUC, 'performance' for you means 'max performance' and 'power' means 
'max power', and a list of sysctl for each power/performance option give 
us the knobs to tune one or another policy ? (Each platform defining its 
own options)

Would this approach be ok ?

Thanks

   -- Daniel

-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-25 17:01             ` Daniel Lezcano
@ 2014-04-25 18:43               ` Peter Zijlstra
  2014-04-28 10:09                 ` Daniel Lezcano
  0 siblings, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2014-04-25 18:43 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Rafael J. Wysocki, Amit Kucheria, Ingo Molnar,
	Lists linaro-kernel, Linux PM list, Linux Kernel Mailing List

On Fri, Apr 25, 2014 at 07:01:23PM +0200, Daniel Lezcano wrote:
> As the sysctl is some kind of ABI, I would like to make sure we reach a
> consensus and discuss a bit about that.

We could make it a sysfs file, like /sys/power/state, which when read
provides the words it takes.

That is more flexible than a numeric sysctl for which we have to keep an
enumeration.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-25 13:20           ` Peter Zijlstra
  2014-04-25 17:01             ` Daniel Lezcano
@ 2014-04-26  0:18             ` Rafael J. Wysocki
  2014-04-26  6:17               ` Ingo Molnar
  2014-04-26 10:55               ` Peter Zijlstra
  1 sibling, 2 replies; 33+ messages in thread
From: Rafael J. Wysocki @ 2014-04-26  0:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Amit Kucheria, Daniel Lezcano, Ingo Molnar, Lists linaro-kernel,
	Linux PM list, Linux Kernel Mailing List

On Friday, April 25, 2014 03:20:55 PM Peter Zijlstra wrote:
> On Fri, Apr 25, 2014 at 01:46:53PM +0200, Rafael J. Wysocki wrote:
> 
> _trim_ emails!!! one of these days I'm going to write a bot to flame
> your head of if there's excessive quoting.

Words ...

> > 
> > Well, so now the question is whether or not we relly want to always
> > go to the "power" (or "energy efficiency" if you will) mode if the system
> > is on battery.  That arguably may not be a good thing even for energy
> > efficiency depending on how exactly the modes are defined.
> 
> Nobody is talking about always. But in general it seems a good enough
> approach. Hell, many of the AC/BAT switches in todays power management
> crap things are not always right.
> 
> Do I want it to dim the LCD further when I unplug the laptop -- mostly
> no, but still it does. And the most annoying one is that it reduces the
> screen blank time to something near 5 seconds or so.
> 
> Why would this be any different?

And why do we have to do things that we hate it when they are done by others?

> If you know what you want you can turn the knob.
> 
> > So in my opinion it's too early to add things like that at this point.
> 
> Meh.. 

Seriously, I'm not even sure if we really need that stuff to be honest.

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-26  0:18             ` Rafael J. Wysocki
@ 2014-04-26  6:17               ` Ingo Molnar
  2014-04-27 13:23                 ` Rafael J. Wysocki
  2014-04-26 10:55               ` Peter Zijlstra
  1 sibling, 1 reply; 33+ messages in thread
From: Ingo Molnar @ 2014-04-26  6:17 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, Amit Kucheria, Daniel Lezcano, Ingo Molnar,
	Lists linaro-kernel, Linux PM list, Linux Kernel Mailing List


* Rafael J. Wysocki <rjw@rjwysocki.net> wrote:

> > > Well, so now the question is whether or not we relly want to 
> > > always go to the "power" (or "energy efficiency" if you will) 
> > > mode if the system is on battery.  That arguably may not be a 
> > > good thing even for energy efficiency depending on how exactly 
> > > the modes are defined.
> > 
> > Nobody is talking about always. But in general it seems a good 
> > enough approach. Hell, many of the AC/BAT switches in todays power 
> > management crap things are not always right.
> > 
> > Do I want it to dim the LCD further when I unplug the laptop -- 
> > mostly no, but still it does. And the most annoying one is that it 
> > reduces the screen blank time to something near 5 seconds or so.
> > 
> > Why would this be any different?
> 
> And why do we have to do things that we hate it when they are done 
> by others?

He replied to your question of 'do we want to act on power events'.

The answer is: yes, from the scheduler point of view we want to act on 
power events by default, and if a user does not want that default 
behavior, it's not an unprecedented request and GUIs offer various 
ways to tweak screen dimming and other power saving details.

So "trying to save power" is the default everywhere, and the scheduler 
wants to do the same. The main reason we couldn't do this before was 
that the scheduler's 'power saving mode' was dysfunctional. That is 
being fixed.

So lets try this, it's high time.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-26  0:18             ` Rafael J. Wysocki
  2014-04-26  6:17               ` Ingo Molnar
@ 2014-04-26 10:55               ` Peter Zijlstra
  2014-04-27 13:54                 ` Rafael J. Wysocki
  1 sibling, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2014-04-26 10:55 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Amit Kucheria, Daniel Lezcano, Ingo Molnar, Lists linaro-kernel,
	Linux PM list, Linux Kernel Mailing List

On Sat, Apr 26, 2014 at 02:18:58AM +0200, Rafael J. Wysocki wrote:
> And why do we have to do things that we hate it when they are done by others?

Well, I'm assuming that these things that I dislike are actually useful
and liked by the majority (me not fitting that is nothing new). If
everybody hates these things, WTF are they still the default on pretty
much every OS out there, Windows and Mac OS do the same screen dimming
things when they loose AC last time I touched one of them.

Also, whilst I dislike screen dimming with a passion, I wouldn't
actually dislike the power aware balancer (assuming it would actually
work). Then I'm on BAT I tend to not actually do much more than read
email/browse web and maybe write some code. But I rarely if ever compile
when on battery.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-26  6:17               ` Ingo Molnar
@ 2014-04-27 13:23                 ` Rafael J. Wysocki
  2014-04-29 10:50                   ` Morten Rasmussen
  0 siblings, 1 reply; 33+ messages in thread
From: Rafael J. Wysocki @ 2014-04-27 13:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Amit Kucheria, Daniel Lezcano, Ingo Molnar,
	Lists linaro-kernel, Linux PM list, Linux Kernel Mailing List

On Saturday, April 26, 2014 08:17:44 AM Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> 
> > > > Well, so now the question is whether or not we relly want to 
> > > > always go to the "power" (or "energy efficiency" if you will) 
> > > > mode if the system is on battery.  That arguably may not be a 
> > > > good thing even for energy efficiency depending on how exactly 
> > > > the modes are defined.
> > > 
> > > Nobody is talking about always. But in general it seems a good 
> > > enough approach. Hell, many of the AC/BAT switches in todays power 
> > > management crap things are not always right.
> > > 
> > > Do I want it to dim the LCD further when I unplug the laptop -- 
> > > mostly no, but still it does. And the most annoying one is that it 
> > > reduces the screen blank time to something near 5 seconds or so.
> > > 
> > > Why would this be any different?
> > 
> > And why do we have to do things that we hate it when they are done 
> > by others?
> 
> He replied to your question of 'do we want to act on power events'.
> 
> The answer is: yes, from the scheduler point of view we want to act on 
> power events by default, and if a user does not want that default 
> behavior, it's not an unprecedented request and GUIs offer various 
> ways to tweak screen dimming and other power saving details.
> 
> So "trying to save power" is the default everywhere, and the scheduler 
> wants to do the same. The main reason we couldn't do this before was 
> that the scheduler's 'power saving mode' was dysfunctional. That is 
> being fixed.
> 
> So lets try this, it's high time.

Thanks for stating your perspective clearly, this is good to know.

Still, I have a rather fundamental problem with the notion that performance
and energy efficiency are essentially at odds with each other, because quite
often they aren't.  What is good for performance is often good for energy
efficiency as well (like when it may be better to do a certain amount of work
quickly and then go idle instead of slowing down and taking more time to do
the work, because that time may turn out to be so long that energy used for
doing the work will actually be greater).  So while I'm all for acting on power
events, I'm also for being rather careful with that.

Moreover, in fact performance and energy efficiency are not the only factors
that need to be taken into account, there also is the maximum power draw (ie.
the rate of using energy) that can be sustained.  Arguably, all three of them
always matter, regardless of whether or not the system is on battery.

People often mentally translate energy efficiency into battery life, because
that's probably the most obvious case, but in fact it generally translates into
the cost of doing work (in therms of money or other things).  Going to battery
means something like "energy is now more valuable to me", but there may be other
events meaning the same (for example, energy may be cheaper in the night etc).
So it looks like we need a global mechanism to represent the current relative
value of energy to the user and to update it whenever something like "we're on
battery now" happens (other subsystems would benefit from that too IMO).

The same applies to the maximum sustainable power draw limit more or less.
While it generally changes between "on battery" and "on AC power", it may also
depend on other factors like the capacity of the available power supply for a
data center.

Thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-26 10:55               ` Peter Zijlstra
@ 2014-04-27 13:54                 ` Rafael J. Wysocki
  2014-04-27 19:39                   ` Peter Zijlstra
  0 siblings, 1 reply; 33+ messages in thread
From: Rafael J. Wysocki @ 2014-04-27 13:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Amit Kucheria, Daniel Lezcano, Ingo Molnar, Lists linaro-kernel,
	Linux PM list, Linux Kernel Mailing List

On Saturday, April 26, 2014 12:55:37 PM Peter Zijlstra wrote:
> On Sat, Apr 26, 2014 at 02:18:58AM +0200, Rafael J. Wysocki wrote:
> > And why do we have to do things that we hate it when they are done by others?
> 
> Well, I'm assuming that these things that I dislike are actually useful
> and liked by the majority (me not fitting that is nothing new). If
> everybody hates these things, WTF are they still the default on pretty
> much every OS out there, Windows and Mac OS do the same screen dimming
> things when they loose AC last time I touched one of them.

Well, vendors have certain battery life targets for their products (you
know, something for the marketing people to brag about) and tend to be
quite aggressive about doing whatever they can to reach them.

We don't have such targets for the mainline kernel, though, do we? ;-)

> Also, whilst I dislike screen dimming with a passion, I wouldn't
> actually dislike the power aware balancer (assuming it would actually
> work). Then I'm on BAT I tend to not actually do much more than read
> email/browse web and maybe write some code. But I rarely if ever compile
> when on battery.

So the relative value of energy grows for you when going from AC to battery.
That may happen in other situations too (like when the cost of energy in
terms of money depends on the time of day which may be the case in some
geographies etc.), so "battery vs AC" is not really special.

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-27 13:54                 ` Rafael J. Wysocki
@ 2014-04-27 19:39                   ` Peter Zijlstra
  0 siblings, 0 replies; 33+ messages in thread
From: Peter Zijlstra @ 2014-04-27 19:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Amit Kucheria, Daniel Lezcano, Ingo Molnar, Lists linaro-kernel,
	Linux PM list, Linux Kernel Mailing List

On Sun, Apr 27, 2014 at 03:54:32PM +0200, Rafael J. Wysocki wrote:
> On Saturday, April 26, 2014 12:55:37 PM Peter Zijlstra wrote:
> > On Sat, Apr 26, 2014 at 02:18:58AM +0200, Rafael J. Wysocki wrote:
> > > And why do we have to do things that we hate it when they are done by others?
> > 
> > Well, I'm assuming that these things that I dislike are actually useful
> > and liked by the majority (me not fitting that is nothing new). If
> > everybody hates these things, WTF are they still the default on pretty
> > much every OS out there, Windows and Mac OS do the same screen dimming
> > things when they loose AC last time I touched one of them.
> 
> Well, vendors have certain battery life targets for their products (you
> know, something for the marketing people to brag about) and tend to be
> quite aggressive about doing whatever they can to reach them.

Well, I was just hoping the industry wasn't _that_ rotten and there was
actual a majority of people liking this.

> We don't have such targets for the mainline kernel, though, do we? ;-)

No, for sure :-)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-25 18:43               ` Peter Zijlstra
@ 2014-04-28 10:09                 ` Daniel Lezcano
  2014-04-28 10:28                   ` Peter Zijlstra
  0 siblings, 1 reply; 33+ messages in thread
From: Daniel Lezcano @ 2014-04-28 10:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, Amit Kucheria, Ingo Molnar,
	Lists linaro-kernel, Linux PM list, Linux Kernel Mailing List

On 04/25/2014 08:43 PM, Peter Zijlstra wrote:
> On Fri, Apr 25, 2014 at 07:01:23PM +0200, Daniel Lezcano wrote:
>> As the sysctl is some kind of ABI, I would like to make sure we reach a
>> consensus and discuss a bit about that.
>
> We could make it a sysfs file, like /sys/power/state, which when read
> provides the words it takes.
>
> That is more flexible than a numeric sysctl for which we have to keep an
> enumeration.

I agree a numerical value is not flexible. But it sounds weird to put a 
scheduler option in the sysfs and maybe more options will follow.

I am wondering if we shouldn't create a new cgroup for 'energy' and put 
everything in there. So we will have more flexibility for extension and 
we will be able to create a group of tasks for performance and a group 
of tasks for energy saving.

Does it make sense ?

-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-28 10:09                 ` Daniel Lezcano
@ 2014-04-28 10:28                   ` Peter Zijlstra
  2014-04-28 11:07                     ` Daniel Lezcano
  0 siblings, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2014-04-28 10:28 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Rafael J. Wysocki, Amit Kucheria, Ingo Molnar,
	Lists linaro-kernel, Linux PM list, Linux Kernel Mailing List

On Mon, Apr 28, 2014 at 12:09:20PM +0200, Daniel Lezcano wrote:
> I agree a numerical value is not flexible. But it sounds weird to put a
> scheduler option in the sysfs and maybe more options will follow.
> 
> I am wondering if we shouldn't create a new cgroup for 'energy' and put
> everything in there. So we will have more flexibility for extension and we
> will be able to create a group of tasks for performance and a group of tasks
> for energy saving.
> 
> Does it make sense ?

The old knobs used to live here:

-What:          /sys/devices/system/cpu/sched_mc_power_savings
-               /sys/devices/system/cpu/sched_smt_power_savings

Not entirely sure that's a fine place, but it has precedent.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-28 10:28                   ` Peter Zijlstra
@ 2014-04-28 11:07                     ` Daniel Lezcano
  2014-04-28 11:21                       ` Peter Zijlstra
                                         ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Daniel Lezcano @ 2014-04-28 11:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, Amit Kucheria, Ingo Molnar,
	Lists linaro-kernel, Linux PM list, Linux Kernel Mailing List

On 04/28/2014 12:28 PM, Peter Zijlstra wrote:
> On Mon, Apr 28, 2014 at 12:09:20PM +0200, Daniel Lezcano wrote:
>> I agree a numerical value is not flexible. But it sounds weird to put a
>> scheduler option in the sysfs and maybe more options will follow.
>>
>> I am wondering if we shouldn't create a new cgroup for 'energy' and put
>> everything in there. So we will have more flexibility for extension and we
>> will be able to create a group of tasks for performance and a group of tasks
>> for energy saving.
>>
>> Does it make sense ?
>
> The old knobs used to live here:
>
> -What:          /sys/devices/system/cpu/sched_mc_power_savings
> -               /sys/devices/system/cpu/sched_smt_power_savings

Ah right.

> Not entirely sure that's a fine place, but it has precedent.

I share your doubts about the right place.

I'm really wondering if the cgroup couldn't be a good solution:

Amit pointed the conflict about the power vs performance with some 
applications. We want to have for example a game to run fast performance 
and some other application to save power.

The cgroup will allow to:

  * eg. create a couple of cgroup one for performance and the other one 
for power and assign the different applications to one of these group

  * tweak the options just for a group of processes. Depending of the 
behavior of the task, the userspace can create and change some options 
for a specific application for optimum performance or energy saving

  * add more energy options in a place easy to extend

  * use string based options

  * alternatively use a single configuration for the entire system

  * use the event based file to trigger event about power consumption 
(threshold reached) and from there we can easily freeze the group of 
processes consuming too much energy

The cgroup will provide a highly configurable mechanism based on tasks 
and could use with some other cgroup subsys (eg. to force the tasks on a 
set of cpus).

IMHO, the cgroup is a good place.

-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-28 11:07                     ` Daniel Lezcano
@ 2014-04-28 11:21                       ` Peter Zijlstra
  2014-04-28 23:11                       ` Rafael J. Wysocki
  2014-04-29  9:26                       ` Morten Rasmussen
  2 siblings, 0 replies; 33+ messages in thread
From: Peter Zijlstra @ 2014-04-28 11:21 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Rafael J. Wysocki, Amit Kucheria, Ingo Molnar,
	Lists linaro-kernel, Linux PM list, Linux Kernel Mailing List

On Mon, Apr 28, 2014 at 01:07:31PM +0200, Daniel Lezcano wrote:
> I'm really wondering if the cgroup couldn't be a good solution:

most my kernels don't have no stinking cgroups ;-)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-28 11:07                     ` Daniel Lezcano
  2014-04-28 11:21                       ` Peter Zijlstra
@ 2014-04-28 23:11                       ` Rafael J. Wysocki
  2014-04-29 10:00                         ` Morten Rasmussen
  2014-04-29 10:25                         ` Daniel Lezcano
  2014-04-29  9:26                       ` Morten Rasmussen
  2 siblings, 2 replies; 33+ messages in thread
From: Rafael J. Wysocki @ 2014-04-28 23:11 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Peter Zijlstra, Amit Kucheria, Ingo Molnar, Lists linaro-kernel,
	Linux PM list, Linux Kernel Mailing List

On Monday, April 28, 2014 01:07:31 PM Daniel Lezcano wrote:
> On 04/28/2014 12:28 PM, Peter Zijlstra wrote:
> > On Mon, Apr 28, 2014 at 12:09:20PM +0200, Daniel Lezcano wrote:
> >> I agree a numerical value is not flexible. But it sounds weird to put a
> >> scheduler option in the sysfs and maybe more options will follow.
> >>
> >> I am wondering if we shouldn't create a new cgroup for 'energy' and put
> >> everything in there. So we will have more flexibility for extension and we
> >> will be able to create a group of tasks for performance and a group of tasks
> >> for energy saving.
> >>
> >> Does it make sense ?
> >
> > The old knobs used to live here:
> >
> > -What:          /sys/devices/system/cpu/sched_mc_power_savings
> > -               /sys/devices/system/cpu/sched_smt_power_savings
> 
> Ah right.
> 
> > Not entirely sure that's a fine place, but it has precedent.
> 
> I share your doubts about the right place.
> 
> I'm really wondering if the cgroup couldn't be a good solution:
> 
> Amit pointed the conflict about the power vs performance with some 
> applications. We want to have for example a game to run fast performance 
> and some other application to save power.

You can't save power.

Power is the energy flow *rate*.  It's like speed, so how can you save it?

If you talk about saving in this context, please always talk about energy as
well, because that's what we want to save.

This means that positioning power against performance doesn't make any sense
whatsoever.  You could try to position energy efficiency (that is, the relative
cost of doing work in terms of energy) against performance, but even that is
questionable, because, as I said in one of the previous messages, what is good
for performance is often good for energy efficiency too (think about race to
idle for example).

In other words, you want to have a knob whose both ends may happen to mean
the same thing.  Wouldn't that be a little odd?

In my opinion it would be much better to have a knob representing the current
relative value of energy to the user (which may depend on things like whether
or not the system is on battery etc) and meaning how far we need to go with
energy saving efforts.

So if that knob is 0, we'll do things that are known-good for performance.
If it is 1, we'll do some extra effort to save enery as well possibly at
a small expense of performance if that's necessary.  If it is 100, we'll do
all we can to save as much energy as possible without caring about performance
at all.

And it doesn't even have to be scheduler-specific, it very well may be global.

Thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-28 11:07                     ` Daniel Lezcano
  2014-04-28 11:21                       ` Peter Zijlstra
  2014-04-28 23:11                       ` Rafael J. Wysocki
@ 2014-04-29  9:26                       ` Morten Rasmussen
  2 siblings, 0 replies; 33+ messages in thread
From: Morten Rasmussen @ 2014-04-29  9:26 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Peter Zijlstra, Rafael J. Wysocki, Amit Kucheria, Ingo Molnar,
	Lists linaro-kernel, Linux PM list, Linux Kernel Mailing List

On Mon, Apr 28, 2014 at 12:07:31PM +0100, Daniel Lezcano wrote:
> I'm really wondering if the cgroup couldn't be a good solution:
> 
> Amit pointed the conflict about the power vs performance with some 
> applications. We want to have for example a game to run fast performance 
> and some other application to save power.
> 
> The cgroup will allow to:
> 
>   * eg. create a couple of cgroup one for performance and the other one 
> for power and assign the different applications to one of these group

If you mean using the cgroups similar to how Android has foreground and
background groups, I think cgroups is a good way to distinguish between
tasks with different performance priorities.

We can't optimize for both energy and performance at the same time, but
it would allow us to only deliver high performance when a high
performance task is runnable. It would give us more fine-grained control
than either energy or performance for all tasks. If necessary, it would
also be possible to have more than two cgroups.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-28 23:11                       ` Rafael J. Wysocki
@ 2014-04-29 10:00                         ` Morten Rasmussen
  2014-04-29 22:19                           ` Rafael J. Wysocki
  2014-04-29 10:25                         ` Daniel Lezcano
  1 sibling, 1 reply; 33+ messages in thread
From: Morten Rasmussen @ 2014-04-29 10:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Daniel Lezcano, Peter Zijlstra, Amit Kucheria, Ingo Molnar,
	Lists linaro-kernel, Linux PM list, Linux Kernel Mailing List

On Tue, Apr 29, 2014 at 12:11:47AM +0100, Rafael J. Wysocki wrote:
> On Monday, April 28, 2014 01:07:31 PM Daniel Lezcano wrote:

[...]

> > I'm really wondering if the cgroup couldn't be a good solution:
> > 
> > Amit pointed the conflict about the power vs performance with some 
> > applications. We want to have for example a game to run fast performance 
> > and some other application to save power.
> 
> You can't save power.
> 
> Power is the energy flow *rate*.  It's like speed, so how can you save it?
> 
> If you talk about saving in this context, please always talk about energy as
> well, because that's what we want to save.
> 
> This means that positioning power against performance doesn't make any sense
> whatsoever.  You could try to position energy efficiency (that is, the relative
> cost of doing work in terms of energy) against performance, but even that is
> questionable, because, as I said in one of the previous messages, what is good
> for performance is often good for energy efficiency too (think about race to
> idle for example).
> 
> In other words, you want to have a knob whose both ends may happen to mean
> the same thing.  Wouldn't that be a little odd?
> 
> In my opinion it would be much better to have a knob representing the current
> relative value of energy to the user (which may depend on things like whether
> or not the system is on battery etc) and meaning how far we need to go with
> energy saving efforts.

I agree, more or less. I think of performance and energy awareness as
optimization objectives. If we only care about performance we are
dealing with a single-objective optimization problem. We do what it
takes to get the best possible performance without considering energy at
all. Energy awareness is IMO as multi-objective optimization problem. We
want to save energy, but do so without sacrificing too much performance.
What "too much" is depends on the scenario (application).

Energy efficiency and performance are different objectives, but I agree
that the means to get better energy efficiency might be the same as
those to get better performance in some cases. IMO, energy awareness is
not a big switch to turn on specific scheduling mechanics (or other
mechnaics in the kernel) such as task packing and disabling expensive
DVFS states, it is about changing the scheduling optimization objective
to include energy in the scheduling decisions. That is, try to save
energy for the current scenario. That includes race to idle if that is
the best way for the particular application and platform/system
topology.

> 
> So if that knob is 0, we'll do things that are known-good for performance.
> If it is 1, we'll do some extra effort to save enery as well possibly at
> a small expense of performance if that's necessary.  If it is 100, we'll do
> all we can to save as much energy as possible without caring about performance
> at all.
> 
> And it doesn't even have to be scheduler-specific, it very well may be global.

I like the idea of knob representing the weight of energy awareness.
Whether it should be global, per task, or per cgroup I'm not sure about.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-28 23:11                       ` Rafael J. Wysocki
  2014-04-29 10:00                         ` Morten Rasmussen
@ 2014-04-29 10:25                         ` Daniel Lezcano
  2014-05-05  0:32                           ` Rafael J. Wysocki
  1 sibling, 1 reply; 33+ messages in thread
From: Daniel Lezcano @ 2014-04-29 10:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, Amit Kucheria, Ingo Molnar, Lists linaro-kernel,
	Linux PM list, Linux Kernel Mailing List

On 04/29/2014 01:11 AM, Rafael J. Wysocki wrote:
> On Monday, April 28, 2014 01:07:31 PM Daniel Lezcano wrote:
>> On 04/28/2014 12:28 PM, Peter Zijlstra wrote:
>>> On Mon, Apr 28, 2014 at 12:09:20PM +0200, Daniel Lezcano wrote:
>>>> I agree a numerical value is not flexible. But it sounds weird to put a
>>>> scheduler option in the sysfs and maybe more options will follow.
>>>>
>>>> I am wondering if we shouldn't create a new cgroup for 'energy' and put
>>>> everything in there. So we will have more flexibility for extension and we
>>>> will be able to create a group of tasks for performance and a group of tasks
>>>> for energy saving.
>>>>
>>>> Does it make sense ?
>>>
>>> The old knobs used to live here:
>>>
>>> -What:          /sys/devices/system/cpu/sched_mc_power_savings
>>> -               /sys/devices/system/cpu/sched_smt_power_savings
>>
>> Ah right.
>>
>>> Not entirely sure that's a fine place, but it has precedent.
>>
>> I share your doubts about the right place.
>>
>> I'm really wondering if the cgroup couldn't be a good solution:
>>
>> Amit pointed the conflict about the power vs performance with some
>> applications. We want to have for example a game to run fast performance
>> and some other application to save power.
>
> You can't save power.
>
> Power is the energy flow *rate*.  It's like speed, so how can you save it?
>
> If you talk about saving in this context, please always talk about energy as
> well, because that's what we want to save.

Hi Rafael,

yeah, I think there is an abuse when talking about 'power'. I thought I 
took care of talking about energy but 'power' comes always in my mind. I 
believe the confusion is coming from the meaning of 'power' in French 
where one translation is 'energy'.

Anyway, thanks for the clarification, I will try to use the term 
'energy' and 'power' conveniently next time.

> This means that positioning power against performance doesn't make any sense
> whatsoever.  You could try to position energy efficiency (that is, the relative
> cost of doing work in terms of energy) against performance, but even that is
> questionable, because, as I said in one of the previous messages, what is good
> for performance is often good for energy efficiency too (think about race to
> idle for example).
> In other words, you want to have a knob whose both ends may happen to mean
> the same thing.  Wouldn't that be a little odd?

Yes, I share this point of view. I believe we won't care about adding 
any knobs for this situation.

> In my opinion it would be much better to have a knob representing the current
> relative value of energy to the user (which may depend on things like whether
> or not the system is on battery etc) and meaning how far we need to go with
> energy saving efforts.
>
> So if that knob is 0, we'll do things that are known-good for performance.
> If it is 1, we'll do some extra effort to save enery as well possibly at
> a small expense of performance if that's necessary.  If it is 100, we'll do
> all we can to save as much energy as possible without caring about performance
> at all.
>
> And it doesn't even have to be scheduler-specific, it very well may be global.

That would be very nice but I don't see how we can quantify this energy 
and handle that generically from the kernel for all the hardware.

I am pretty sure we will discover for some kind of hardware a specific 
option will consume more power, argh ! energy I mean, than another 
hardware because of the architecture.

 From my personal experience, when we are facing this kind of complexity 
and heuristic, it is the sign the userspace has some work to do.

What I am proposing is not in contradiction with your approach, it is 
about exporting a lot of knobs to userspace, and the userspace decide 
how to map what is '0' <--> '100' regarding these options. Nothing 
prevent the different platform to set a default value for these options.

 From my POV, the cgroup could be a good solution for that for different 
reasons. Especially one good reason is we can stick the energy policy 
per task instead of the entire system.

Let's imagine the following scenario:

An user has a laptop running a mailer looking for the email every 5 
minutes. The system switched to 'power'. The user wants to play a video 
game but due to the 'power' policy, the game is not playable so it 
forces the policy to 'performance'. All the tasks will use the 
'performance' policy, thus consuming more energy.

If we do per task, the video game will use the 'performance' policy and 
the other tasks on the system will use the 'power' policy. The userspace 
can take the decision to freeze the application running 'performance' if 
we reach a critical battery level.

The cgroup is a good framework to do that and gives a lot of flexibility 
to userspace. I understood Peter does not like the cgroup but I did not 
give up to convince him, the cgroup could be good solution :)

Looking forward, if the energy policy is tied with the task, in the 
future we can normalize the energy consumption and stick to an 'energy 
load' per task and reuse the load tracking for energy, do per task 
energy accounting, nice per energy, etc ...

Going back to reality, concretely this sysctl patch did not reach a 
consensus. So I will resend the two other patches, hoping the discussion 
will lead to an agreement.

-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-27 13:23                 ` Rafael J. Wysocki
@ 2014-04-29 10:50                   ` Morten Rasmussen
  2014-04-29 22:02                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 33+ messages in thread
From: Morten Rasmussen @ 2014-04-29 10:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ingo Molnar, Peter Zijlstra, Amit Kucheria, Daniel Lezcano,
	Ingo Molnar, Lists linaro-kernel, Linux PM list,
	Linux Kernel Mailing List

On Sun, Apr 27, 2014 at 02:23:24PM +0100, Rafael J. Wysocki wrote:
> Still, I have a rather fundamental problem with the notion that performance
> and energy efficiency are essentially at odds with each other, because quite
> often they aren't.  What is good for performance is often good for energy
> efficiency as well (like when it may be better to do a certain amount of work
> quickly and then go idle instead of slowing down and taking more time to do
> the work, because that time may turn out to be so long that energy used for
> doing the work will actually be greater).  So while I'm all for acting on power
> events, I'm also for being rather careful with that.

As I wrote in my other reply in this thread, IMO we should distinguish
between objectives and methods to optimize for objectives. Performance
and energy efficiency are different objectives, but some methods might
be useful for optimizing for both (such as race to idle in some
scenarios on some platforms). Tuning knobs and power events shouldn't be
assumed to directly control specific scheduling decisions, for example
whether to race to idle or not. They should only mean that the scheduler
should optimize for a specific goal. For example, a specific
energy/performance trade-off.

> Moreover, in fact performance and energy efficiency are not the only factors
> that need to be taken into account, there also is the maximum power draw (ie.
> the rate of using energy) that can be sustained.  Arguably, all three of them
> always matter, regardless of whether or not the system is on battery.
> 
> People often mentally translate energy efficiency into battery life, because
> that's probably the most obvious case, but in fact it generally translates into
> the cost of doing work (in therms of money or other things).  Going to battery
> means something like "energy is now more valuable to me", but there may be other
> events meaning the same (for example, energy may be cheaper in the night etc).
> So it looks like we need a global mechanism to represent the current relative
> value of energy to the user and to update it whenever something like "we're on
> battery now" happens (other subsystems would benefit from that too IMO).
> 
> The same applies to the maximum sustainable power draw limit more or less.
> While it generally changes between "on battery" and "on AC power", it may also
> depend on other factors like the capacity of the available power supply for a
> data center.

I haven't given maximum power draw much thought, but it is a third
objective independent of energy efficiency even though methods for
improving energy efficiency (such as disabling expensive DVFS states)
might be useful for this as well.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-29 10:50                   ` Morten Rasmussen
@ 2014-04-29 22:02                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 33+ messages in thread
From: Rafael J. Wysocki @ 2014-04-29 22:02 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: Ingo Molnar, Peter Zijlstra, Amit Kucheria, Daniel Lezcano,
	Ingo Molnar, Lists linaro-kernel, Linux PM list,
	Linux Kernel Mailing List

On Tuesday, April 29, 2014 11:50:02 AM Morten Rasmussen wrote:
> On Sun, Apr 27, 2014 at 02:23:24PM +0100, Rafael J. Wysocki wrote:

[cut]

> > The same applies to the maximum sustainable power draw limit more or less.
> > While it generally changes between "on battery" and "on AC power", it may also
> > depend on other factors like the capacity of the available power supply for a
> > data center.
> 
> I haven't given maximum power draw much thought, but it is a third
> objective independent of energy efficiency even though methods for
> improving energy efficiency (such as disabling expensive DVFS states)
> might be useful for this as well.

Yes, it is an independent objective.

Thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-29 10:00                         ` Morten Rasmussen
@ 2014-04-29 22:19                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 33+ messages in thread
From: Rafael J. Wysocki @ 2014-04-29 22:19 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: Daniel Lezcano, Peter Zijlstra, Amit Kucheria, Ingo Molnar,
	Lists linaro-kernel, Linux PM list, Linux Kernel Mailing List

On Tuesday, April 29, 2014 11:00:23 AM Morten Rasmussen wrote:
> On Tue, Apr 29, 2014 at 12:11:47AM +0100, Rafael J. Wysocki wrote:
> > On Monday, April 28, 2014 01:07:31 PM Daniel Lezcano wrote:
> 
> [...]
> 
> > > I'm really wondering if the cgroup couldn't be a good solution:
> > > 
> > > Amit pointed the conflict about the power vs performance with some 
> > > applications. We want to have for example a game to run fast performance 
> > > and some other application to save power.
> > 
> > You can't save power.
> > 
> > Power is the energy flow *rate*.  It's like speed, so how can you save it?
> > 
> > If you talk about saving in this context, please always talk about energy as
> > well, because that's what we want to save.
> > 
> > This means that positioning power against performance doesn't make any sense
> > whatsoever.  You could try to position energy efficiency (that is, the relative
> > cost of doing work in terms of energy) against performance, but even that is
> > questionable, because, as I said in one of the previous messages, what is good
> > for performance is often good for energy efficiency too (think about race to
> > idle for example).
> > 
> > In other words, you want to have a knob whose both ends may happen to mean
> > the same thing.  Wouldn't that be a little odd?
> > 
> > In my opinion it would be much better to have a knob representing the current
> > relative value of energy to the user (which may depend on things like whether
> > or not the system is on battery etc) and meaning how far we need to go with
> > energy saving efforts.
> 
> I agree, more or less. I think of performance and energy awareness as
> optimization objectives. If we only care about performance we are
> dealing with a single-objective optimization problem. We do what it
> takes to get the best possible performance without considering energy at
> all. Energy awareness is IMO as multi-objective optimization problem. We
> want to save energy, but do so without sacrificing too much performance.
> What "too much" is depends on the scenario (application).
> 
> Energy efficiency and performance are different objectives, but I agree
> that the means to get better energy efficiency might be the same as
> those to get better performance in some cases. IMO, energy awareness is
> not a big switch to turn on specific scheduling mechanics (or other
> mechnaics in the kernel) such as task packing and disabling expensive
> DVFS states, it is about changing the scheduling optimization objective
> to include energy in the scheduling decisions.

Very well stated.

Also it is about to what extent those decisions should take energy into
account.

> That is, try to save energy for the current scenario. That includes race to
> idle if that is the best way for the particular application and platform/system
> topology.
> 
> > 
> > So if that knob is 0, we'll do things that are known-good for performance.
> > If it is 1, we'll do some extra effort to save enery as well possibly at
> > a small expense of performance if that's necessary.  If it is 100, we'll do
> > all we can to save as much energy as possible without caring about performance
> > at all.
> > 
> > And it doesn't even have to be scheduler-specific, it very well may be global.
> 
> I like the idea of knob representing the weight of energy awareness.
> Whether it should be global, per task, or per cgroup I'm not sure about.

A global knob might be useful to other parts of the kernel too, like device
drivers implementing runtime PM, for example.  In that context per task
or per cgroup knobs wouldn't be very useful.

Also there may be a user space component and a kernel component in it,
combined somehow, so that user space can influence its value and so that
the kernel can adjust it in response to power events automatically.

Thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] sched: idle: Encapsulate the code to compile it out
  2014-04-24 12:24 ` [PATCH 1/3] sched: idle: Encapsulate the code to compile it out Daniel Lezcano
@ 2014-04-30 17:39   ` Nicolas Pitre
  0 siblings, 0 replies; 33+ messages in thread
From: Nicolas Pitre @ 2014-04-30 17:39 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: peterz, mingo, linaro-kernel, linux-pm, rjw, linux-kernel

On Thu, 24 Apr 2014, Daniel Lezcano wrote:

> Encapsulate the large portion of cpuidle_idle_call inside another
> function so when CONFIG_CPU_IDLE=n, the code will be compiled out.
> Also that is benefitial for the clarity of the code as it removes
> a nested indentation level.

I agree with the nesting level concern.

However I dislike the proliferation of #ifdef's in the main code. Those 
added in this patch are unnecessary. We want as much compilation 
coverage as possible.

With cpuidle_enabled() already hardcoded to return -ENODEV when 
CONFIG_CPU_IDLE=n, the compiler should already be smart enough to 
optimize away all the redundant code in that case.  You may look at the 
assembly output from the compiler if you're not sure.

And if you're still not trusting the compiler then the second best 
option is to use IS_ENABLED(CONFIG_CPU_IDLE) inside some if statement.

Nicolas

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] sched: idle: Add sched balance option
  2014-04-29 10:25                         ` Daniel Lezcano
@ 2014-05-05  0:32                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 33+ messages in thread
From: Rafael J. Wysocki @ 2014-05-05  0:32 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Peter Zijlstra, Amit Kucheria, Ingo Molnar, Lists linaro-kernel,
	Linux PM list, Linux Kernel Mailing List

On Tuesday, April 29, 2014 12:25:39 PM Daniel Lezcano wrote:
> On 04/29/2014 01:11 AM, Rafael J. Wysocki wrote:
> > On Monday, April 28, 2014 01:07:31 PM Daniel Lezcano wrote:

[cut]

> > In my opinion it would be much better to have a knob representing the current
> > relative value of energy to the user (which may depend on things like whether
> > or not the system is on battery etc) and meaning how far we need to go with
> > energy saving efforts.
> >
> > So if that knob is 0, we'll do things that are known-good for performance.
> > If it is 1, we'll do some extra effort to save enery as well possibly at
> > a small expense of performance if that's necessary.  If it is 100, we'll do
> > all we can to save as much energy as possible without caring about performance
> > at all.
> >
> > And it doesn't even have to be scheduler-specific, it very well may be global.
> 
> That would be very nice but I don't see how we can quantify this energy 
> and handle that generically from the kernel for all the hardware.
> 
> I am pretty sure we will discover for some kind of hardware a specific 
> option will consume more power, argh ! energy I mean, than another 
> hardware because of the architecture.
> 
>  From my personal experience, when we are facing this kind of complexity 
> and heuristic, it is the sign the userspace has some work to do.
> 
> What I am proposing is not in contradiction with your approach, it is 
> about exporting a lot of knobs to userspace, and the userspace decide 
> how to map what is '0' <--> '100' regarding these options. Nothing 
> prevent the different platform to set a default value for these options.

Our experience so far, however, is that user space is not really likely to
change the default values of such knobs.

>  From my POV, the cgroup could be a good solution for that for different 
> reasons. Especially one good reason is we can stick the energy policy 
> per task instead of the entire system.
> 
> Let's imagine the following scenario:
> 
> An user has a laptop running a mailer looking for the email every 5 
> minutes. The system switched to 'power'. The user wants to play a video 
> game but due to the 'power' policy, the game is not playable so it 
> forces the policy to 'performance'. All the tasks will use the 
> 'performance' policy, thus consuming more energy.

This isn't all about the given task, but also about devices that are in use
while it is being run.  For example, to play a game the user would probably
like the input subsystem to be more responsive and the screen to be brighter
etc.  If that is a network game, the network adapter will probably need to
work in the "performance" mode too.

> If we do per task, the video game will use the 'performance' policy and 
> the other tasks on the system will use the 'power' policy. The userspace 
> can take the decision to freeze the application running 'performance' if 
> we reach a critical battery level.

Well, consider task packing in that context and suppose that according to the
current policy task packing should be applied to "energy efficient" tasks, but
not to "performance" tasks.  Now, suppose that there's an "energy efficient"
task to run and there's a core occupied by a "performace" task.  Should the
"energy efficient" task be run on that core or should we find another one
for it?  Who's more important in that case?

> The cgroup is a good framework to do that and gives a lot of flexibility 
> to userspace. I understood Peter does not like the cgroup but I did not 
> give up to convince him, the cgroup could be good solution :)

Will user space actually use that flexibility?

> Looking forward, if the energy policy is tied with the task, in the 
> future we can normalize the energy consumption and stick to an 'energy 
> load' per task and reuse the load tracking for energy, do per task 
> energy accounting, nice per energy, etc ...

That can be done, but some parts of the kernel that may benefit from an
"energy conservation bias" knob are not tied to any particular task.  The
block layer may be one example here.

> Going back to reality, concretely this sysctl patch did not reach a 
> consensus. So I will resend the two other patches, hoping the discussion 
> will lead to an agreement.

Well, the discussion so far has been useful to me anyway. :-)

Thanks!


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2014-05-05  0:15 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-24 12:24 [PATCH 0/3] sched: idle: Provide the basis to integrate cpuidle Daniel Lezcano
2014-04-24 12:24 ` [PATCH 1/3] sched: idle: Encapsulate the code to compile it out Daniel Lezcano
2014-04-30 17:39   ` Nicolas Pitre
2014-04-24 12:24 ` [PATCH 2/3] sched: idle: Add sched balance option Daniel Lezcano
     [not found]   ` <CAP245DX9wewQFhcyGj5ZuNE7hHC4fRn90POC32HLF6ugja6nJg@mail.gmail.com>
2014-04-24 13:30     ` Daniel Lezcano
2014-04-25 10:54       ` Amit Kucheria
2014-04-25 11:46         ` Rafael J. Wysocki
2014-04-25 12:17           ` Daniel Lezcano
2014-04-25 13:20           ` Peter Zijlstra
2014-04-25 17:01             ` Daniel Lezcano
2014-04-25 18:43               ` Peter Zijlstra
2014-04-28 10:09                 ` Daniel Lezcano
2014-04-28 10:28                   ` Peter Zijlstra
2014-04-28 11:07                     ` Daniel Lezcano
2014-04-28 11:21                       ` Peter Zijlstra
2014-04-28 23:11                       ` Rafael J. Wysocki
2014-04-29 10:00                         ` Morten Rasmussen
2014-04-29 22:19                           ` Rafael J. Wysocki
2014-04-29 10:25                         ` Daniel Lezcano
2014-05-05  0:32                           ` Rafael J. Wysocki
2014-04-29  9:26                       ` Morten Rasmussen
2014-04-26  0:18             ` Rafael J. Wysocki
2014-04-26  6:17               ` Ingo Molnar
2014-04-27 13:23                 ` Rafael J. Wysocki
2014-04-29 10:50                   ` Morten Rasmussen
2014-04-29 22:02                     ` Rafael J. Wysocki
2014-04-26 10:55               ` Peter Zijlstra
2014-04-27 13:54                 ` Rafael J. Wysocki
2014-04-27 19:39                   ` Peter Zijlstra
2014-04-25 13:22         ` Peter Zijlstra
2014-04-24 12:24 ` [PATCH 3/3] sched: idle: Store the idle state the cpu is Daniel Lezcano
2014-04-24 16:16   ` Rafael J. Wysocki
2014-04-24 16:13 ` [PATCH 0/3] sched: idle: Provide the basis to integrate cpuidle Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.