linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V3 0/6] sched: idle: cpuidle: cleanups and fixes
@ 2014-11-07 14:31 Daniel Lezcano
  2014-11-07 14:31 ` [PATCH V3 1/6] sched: idle: Add a weak arch_cpu_idle_poll function Daniel Lezcano
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Daniel Lezcano @ 2014-11-07 14:31 UTC (permalink / raw)
  To: rjw
  Cc: preeti, nicolas.pitre, linux-pm, linux-kernel, peterz,
	linaro-kernel, patches, lenb

Hi,

this patchset provides some cleanups to prepare the removal of the
CPUIDLE_DRIVER_STATE_START macro in the cpuidle code. As the code has built
on top of this macro adding more complexity and ugly hacks/tricks, it is hard
to simply remove it. Some code has to be cleanup before, this is the purpose
of this first patchset.

Changelog:

V3:
  - fixed comment for the latency_req introduction back to the right patch
  - added back the last_idx_state in the reflect function
  - change the get_typical_interval to return the value

V2:
  - fixed typo
  - removed redundant comment
  - fixed readability


Daniel Lezcano (6):
  sched: idle: Add a weak arch_cpu_idle_poll function
  sched: idle: cpuidle: Check the latency req before idle
  sched: idle: Get the next timer event and pass it the cpuidle
    framework
  cpuidle: idle: menu: Don't reflect when a state selection failed
  cpuidle: menu: Fix the get_typical_interval
  cpuidle: menu: Move the update function before its declaration

 drivers/cpuidle/cpuidle.c          |  12 +--
 drivers/cpuidle/governors/ladder.c |  13 +--
 drivers/cpuidle/governors/menu.c   | 183 ++++++++++++++++++-------------------
 include/linux/cpuidle.h            |   9 +-
 kernel/sched/idle.c                |  55 +++++++----
 5 files changed, 140 insertions(+), 132 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH V3 1/6] sched: idle: Add a weak arch_cpu_idle_poll function
  2014-11-07 14:31 [PATCH V3 0/6] sched: idle: cpuidle: cleanups and fixes Daniel Lezcano
@ 2014-11-07 14:31 ` Daniel Lezcano
  2014-11-08 10:39   ` Preeti U Murthy
  2014-11-10 12:29   ` Peter Zijlstra
  2014-11-07 14:31 ` [PATCH V3 2/6] sched: idle: cpuidle: Check the latency req before idle Daniel Lezcano
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 30+ messages in thread
From: Daniel Lezcano @ 2014-11-07 14:31 UTC (permalink / raw)
  To: rjw
  Cc: preeti, nicolas.pitre, linux-pm, linux-kernel, peterz,
	linaro-kernel, patches, lenb

The poll function is called when a timer expired or if we force to poll when
the cpu_idle_force_poll option is set.

The poll function does:

       local_irq_enable();
       while (!tif_need_resched())
               cpu_relax();

This default poll function suits for the x86 arch because its rep; nop;
hardware power optimization. But on other archs, this optimization does not
exists and we are not saving power. The arch specific bits may want to
optimize this loop by adding their own optimization.

Give an opportunity to the different platform to specify their own polling
loop by adding a weak cpu_idle_poll_loop function.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 kernel/sched/idle.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index c47fce7..ea65bbae 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -42,18 +42,6 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
 __setup("hlt", cpu_idle_nopoll_setup);
 #endif
 
-static inline int cpu_idle_poll(void)
-{
-	rcu_idle_enter();
-	trace_cpu_idle_rcuidle(0, smp_processor_id());
-	local_irq_enable();
-	while (!tif_need_resched())
-		cpu_relax();
-	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
-	rcu_idle_exit();
-	return 1;
-}
-
 /* Weak implementations for optional arch specific functions */
 void __weak arch_cpu_idle_prepare(void) { }
 void __weak arch_cpu_idle_enter(void) { }
@@ -65,6 +53,23 @@ void __weak arch_cpu_idle(void)
 	local_irq_enable();
 }
 
+void __weak arch_cpu_idle_poll(void)
+{
+	local_irq_enable();
+	while (!tif_need_resched())
+		cpu_relax();
+}
+
+static inline int cpu_idle_poll(void)
+{
+	rcu_idle_enter();
+	trace_cpu_idle_rcuidle(0, smp_processor_id());
+	arch_cpu_idle_poll();
+	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
+	rcu_idle_exit();
+	return 1;
+}
+
 /**
  * cpuidle_idle_call - the main idle function
  *
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V3 2/6] sched: idle: cpuidle: Check the latency req before idle
  2014-11-07 14:31 [PATCH V3 0/6] sched: idle: cpuidle: cleanups and fixes Daniel Lezcano
  2014-11-07 14:31 ` [PATCH V3 1/6] sched: idle: Add a weak arch_cpu_idle_poll function Daniel Lezcano
@ 2014-11-07 14:31 ` Daniel Lezcano
  2014-11-08 10:40   ` Preeti U Murthy
  2014-11-10 12:41   ` Peter Zijlstra
  2014-11-07 14:31 ` [PATCH V3 3/6] sched: idle: Get the next timer event and pass it the cpuidle framework Daniel Lezcano
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 30+ messages in thread
From: Daniel Lezcano @ 2014-11-07 14:31 UTC (permalink / raw)
  To: rjw
  Cc: preeti, nicolas.pitre, linux-pm, linux-kernel, peterz,
	linaro-kernel, patches, lenb

When the pmqos latency requirement is set to zero that means "poll in all the
cases".

That is correctly implemented on x86 but not on the other archs.

As how is written the code, if the latency request is zero, the governor will
return zero, so corresponding, for x86, to the poll function, but for the
others arch the default idle function. For example, on ARM this is wait-for-
interrupt with a latency of '1', so violating the constraint.

In order to fix that, do the latency requirement check *before* calling the
cpuidle framework in order to jump to the poll function without entering
cpuidle. That has several benefits:

 1. It clarifies and unifies the code
 2. It fixes x86 vs other archs behavior
 3. Factors out the call to the same function
 4. Prevent to enter the cpuidle framework with its expensive cost in
    calculation

As the latency_req is needed in all the cases, change the select API to take
the latency_req as parameter in case it is not equal to zero.

As a positive side effect, it introduces the latency constraint specified
externally, so one more step to the cpuidle/scheduler integration.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Len Brown <len.brown@intel.com>
---
 drivers/cpuidle/cpuidle.c          |  6 ++++--
 drivers/cpuidle/governors/ladder.c |  9 +--------
 drivers/cpuidle/governors/menu.c   |  8 ++------
 include/linux/cpuidle.h            |  7 ++++---
 kernel/sched/idle.c                | 18 ++++++++++++++----
 5 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 125150d..86f6cb8 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -155,10 +155,12 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
  *
  * @drv: the cpuidle driver
  * @dev: the cpuidle device
+ * @latency_req: the latency constraint when choosing an idle state
  *
  * Returns the index of the idle state.
  */
-int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
+int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
+		   int latency_req)
 {
 	if (off || !initialized)
 		return -ENODEV;
@@ -169,7 +171,7 @@ int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	if (unlikely(use_deepest_state))
 		return cpuidle_find_deepest_state(drv, dev);
 
-	return cpuidle_curr_governor->select(drv, dev);
+	return cpuidle_curr_governor->select(drv, dev, latency_req);
 }
 
 /**
diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
index 06b57c4..53113c2 100644
--- a/drivers/cpuidle/governors/ladder.c
+++ b/drivers/cpuidle/governors/ladder.c
@@ -64,18 +64,11 @@ static inline void ladder_do_selection(struct ladder_device *ldev,
  * @dev: the CPU
  */
 static int ladder_select_state(struct cpuidle_driver *drv,
-				struct cpuidle_device *dev)
+			       struct cpuidle_device *dev, int latency_req)
 {
 	struct ladder_device *ldev = this_cpu_ptr(&ladder_devices);
 	struct ladder_device_state *last_state;
 	int last_residency, last_idx = ldev->last_state_idx;
-	int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
-
-	/* Special case when user has set very strict latency requirement */
-	if (unlikely(latency_req == 0)) {
-		ladder_do_selection(ldev, last_idx, 0);
-		return 0;
-	}
 
 	last_state = &ldev->states[last_idx];
 
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 710a233..9b7c0b9 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -287,10 +287,10 @@ again:
  * @drv: cpuidle driver containing state data
  * @dev: the CPU
  */
-static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
+static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
+		       int latency_req)
 {
 	struct menu_device *data = this_cpu_ptr(&menu_devices);
-	int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
 	int i;
 	unsigned int interactivity_req;
 	unsigned long nr_iowaiters, cpu_load;
@@ -302,10 +302,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 
 	data->last_state_idx = CPUIDLE_DRIVER_STATE_START - 1;
 
-	/* Special case when user has set very strict latency requirement */
-	if (unlikely(latency_req == 0))
-		return 0;
-
 	/* determine the expected residency time, round up */
 	data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length());
 
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 25e0df6..fb465c1 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -122,7 +122,7 @@ struct cpuidle_driver {
 extern void disable_cpuidle(void);
 
 extern int cpuidle_select(struct cpuidle_driver *drv,
-			  struct cpuidle_device *dev);
+			  struct cpuidle_device *dev, int latency_req);
 extern int cpuidle_enter(struct cpuidle_driver *drv,
 			 struct cpuidle_device *dev, int index);
 extern void cpuidle_reflect(struct cpuidle_device *dev, int index);
@@ -150,7 +150,7 @@ extern struct cpuidle_driver *cpuidle_get_cpu_driver(struct cpuidle_device *dev)
 #else
 static inline void disable_cpuidle(void) { }
 static inline int cpuidle_select(struct cpuidle_driver *drv,
-				 struct cpuidle_device *dev)
+				 struct cpuidle_device *dev, int latency_req)
 {return -ENODEV; }
 static inline int cpuidle_enter(struct cpuidle_driver *drv,
 				struct cpuidle_device *dev, int index)
@@ -205,7 +205,8 @@ struct cpuidle_governor {
 					struct cpuidle_device *dev);
 
 	int  (*select)		(struct cpuidle_driver *drv,
-					struct cpuidle_device *dev);
+				 struct cpuidle_device *dev,
+				 int latency_req);
 	void (*reflect)		(struct cpuidle_device *dev, int index);
 
 	struct module 		*owner;
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index ea65bbae..189e80a 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -5,6 +5,7 @@
 #include <linux/cpu.h>
 #include <linux/cpuidle.h>
 #include <linux/tick.h>
+#include <linux/pm_qos.h>
 #include <linux/mm.h>
 #include <linux/stackprotector.h>
 
@@ -79,7 +80,7 @@ static inline int cpu_idle_poll(void)
  * set, and it returns with polling set.  If it ever stops polling, it
  * must clear the polling bit.
  */
-static void cpuidle_idle_call(void)
+static void cpuidle_idle_call(unsigned int latency_req)
 {
 	struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
 	struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
@@ -112,7 +113,7 @@ static void cpuidle_idle_call(void)
 	 * Ask the cpuidle framework to choose a convenient idle state.
 	 * Fall back to the default arch idle method on errors.
 	 */
-	next_state = cpuidle_select(drv, dev);
+	next_state = cpuidle_select(drv, dev, latency_req);
 	if (next_state < 0) {
 use_default:
 		/*
@@ -193,6 +194,8 @@ exit_idle:
  */
 static void cpu_idle_loop(void)
 {
+	unsigned int latency_req;
+
 	while (1) {
 		/*
 		 * If the arch has a polling bit, we maintain an invariant:
@@ -216,19 +219,26 @@ static void cpu_idle_loop(void)
 			local_irq_disable();
 			arch_cpu_idle_enter();
 
+			latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
+
 			/*
 			 * In poll mode we reenable interrupts and spin.
 			 *
+			 * If the latency req is zero, we don't want to
+			 * enter any idle state and we jump to the poll
+			 * function directly
+			 *
 			 * Also if we detected in the wakeup from idle
 			 * path that the tick broadcast device expired
 			 * for us, we don't want to go deep idle as we
 			 * know that the IPI is going to arrive right
 			 * away
 			 */
-			if (cpu_idle_force_poll || tick_check_broadcast_expired())
+			if (!latency_req || cpu_idle_force_poll ||
+			    tick_check_broadcast_expired())
 				cpu_idle_poll();
 			else
-				cpuidle_idle_call();
+				cpuidle_idle_call(latency_req);
 
 			arch_cpu_idle_exit();
 		}
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V3 3/6] sched: idle: Get the next timer event and pass it the cpuidle framework
  2014-11-07 14:31 [PATCH V3 0/6] sched: idle: cpuidle: cleanups and fixes Daniel Lezcano
  2014-11-07 14:31 ` [PATCH V3 1/6] sched: idle: Add a weak arch_cpu_idle_poll function Daniel Lezcano
  2014-11-07 14:31 ` [PATCH V3 2/6] sched: idle: cpuidle: Check the latency req before idle Daniel Lezcano
@ 2014-11-07 14:31 ` Daniel Lezcano
  2014-11-08 10:44   ` Preeti U Murthy
  2014-11-10 12:43   ` Peter Zijlstra
  2014-11-07 14:31 ` [PATCH V3 4/6] cpuidle: idle: menu: Don't reflect when a state selection failed Daniel Lezcano
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 30+ messages in thread
From: Daniel Lezcano @ 2014-11-07 14:31 UTC (permalink / raw)
  To: rjw
  Cc: preeti, nicolas.pitre, linux-pm, linux-kernel, peterz,
	linaro-kernel, patches, lenb

Following the logic of the previous patch, retrieve from the idle task the
expected timer sleep duration and pass it to the cpuidle framework.

Take the opportunity to remove the unused headers in the menu.c file.

This patch does not change the current behavior.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Len Brown <len.brown@intel.com>
---
 drivers/cpuidle/cpuidle.c          | 10 ++++------
 drivers/cpuidle/governors/ladder.c |  3 ++-
 drivers/cpuidle/governors/menu.c   |  8 ++------
 include/linux/cpuidle.h            |  8 +++++---
 kernel/sched/idle.c                | 13 +++++++++----
 5 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 86f6cb8..f3e7b73 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -8,16 +8,12 @@
  * This code is licenced under the GPL.
  */
 
-#include <linux/clockchips.h>
 #include <linux/kernel.h>
 #include <linux/mutex.h>
-#include <linux/sched.h>
 #include <linux/notifier.h>
 #include <linux/pm_qos.h>
 #include <linux/cpu.h>
 #include <linux/cpuidle.h>
-#include <linux/ktime.h>
-#include <linux/hrtimer.h>
 #include <linux/module.h>
 #include <trace/events/power.h>
 
@@ -156,11 +152,12 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
  * @drv: the cpuidle driver
  * @dev: the cpuidle device
  * @latency_req: the latency constraint when choosing an idle state
+ * @next_timer_event: the duration until the timer expires
  *
  * Returns the index of the idle state.
  */
 int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
-		   int latency_req)
+		   int latency_req, int next_timer_event)
 {
 	if (off || !initialized)
 		return -ENODEV;
@@ -171,7 +168,8 @@ int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	if (unlikely(use_deepest_state))
 		return cpuidle_find_deepest_state(drv, dev);
 
-	return cpuidle_curr_governor->select(drv, dev, latency_req);
+	return cpuidle_curr_governor->select(drv, dev, latency_req,
+					     next_timer_event);
 }
 
 /**
diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
index 53113c2..51c9ccd 100644
--- a/drivers/cpuidle/governors/ladder.c
+++ b/drivers/cpuidle/governors/ladder.c
@@ -64,7 +64,8 @@ static inline void ladder_do_selection(struct ladder_device *ldev,
  * @dev: the CPU
  */
 static int ladder_select_state(struct cpuidle_driver *drv,
-			       struct cpuidle_device *dev, int latency_req)
+			       struct cpuidle_device *dev,
+			       int latency_req, int next_timer_event)
 {
 	struct ladder_device *ldev = this_cpu_ptr(&ladder_devices);
 	struct ladder_device_state *last_state;
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 9b7c0b9..91b3000 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -13,10 +13,6 @@
 #include <linux/kernel.h>
 #include <linux/cpuidle.h>
 #include <linux/pm_qos.h>
-#include <linux/time.h>
-#include <linux/ktime.h>
-#include <linux/hrtimer.h>
-#include <linux/tick.h>
 #include <linux/sched.h>
 #include <linux/math64.h>
 #include <linux/module.h>
@@ -288,7 +284,7 @@ again:
  * @dev: the CPU
  */
 static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
-		       int latency_req)
+		       int latency_req, int next_timer_event)
 {
 	struct menu_device *data = this_cpu_ptr(&menu_devices);
 	int i;
@@ -303,7 +299,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	data->last_state_idx = CPUIDLE_DRIVER_STATE_START - 1;
 
 	/* determine the expected residency time, round up */
-	data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length());
+	data->next_timer_us = next_timer_event;
 
 	get_iowait_load(&nr_iowaiters, &cpu_load);
 	data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index fb465c1..d477746 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -122,7 +122,8 @@ struct cpuidle_driver {
 extern void disable_cpuidle(void);
 
 extern int cpuidle_select(struct cpuidle_driver *drv,
-			  struct cpuidle_device *dev, int latency_req);
+			  struct cpuidle_device *dev,
+			  int latency_req, int next_timer_event);
 extern int cpuidle_enter(struct cpuidle_driver *drv,
 			 struct cpuidle_device *dev, int index);
 extern void cpuidle_reflect(struct cpuidle_device *dev, int index);
@@ -150,7 +151,8 @@ extern struct cpuidle_driver *cpuidle_get_cpu_driver(struct cpuidle_device *dev)
 #else
 static inline void disable_cpuidle(void) { }
 static inline int cpuidle_select(struct cpuidle_driver *drv,
-				 struct cpuidle_device *dev, int latency_req)
+				 struct cpuidle_device *dev,
+				 int latency_req, int next_timer_event)
 {return -ENODEV; }
 static inline int cpuidle_enter(struct cpuidle_driver *drv,
 				struct cpuidle_device *dev, int index)
@@ -206,7 +208,7 @@ struct cpuidle_governor {
 
 	int  (*select)		(struct cpuidle_driver *drv,
 				 struct cpuidle_device *dev,
-				 int latency_req);
+				 int latency_req, int next_timer_event);
 	void (*reflect)		(struct cpuidle_device *dev, int index);
 
 	struct module 		*owner;
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 189e80a..0a7a1d1 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -80,7 +80,8 @@ static inline int cpu_idle_poll(void)
  * set, and it returns with polling set.  If it ever stops polling, it
  * must clear the polling bit.
  */
-static void cpuidle_idle_call(unsigned int latency_req)
+static void cpuidle_idle_call(unsigned int latency_req,
+			      unsigned int next_timer_event)
 {
 	struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
 	struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
@@ -113,7 +114,7 @@ static void cpuidle_idle_call(unsigned int latency_req)
 	 * Ask the cpuidle framework to choose a convenient idle state.
 	 * Fall back to the default arch idle method on errors.
 	 */
-	next_state = cpuidle_select(drv, dev, latency_req);
+	next_state = cpuidle_select(drv, dev, latency_req, next_timer_event);
 	if (next_state < 0) {
 use_default:
 		/*
@@ -194,7 +195,7 @@ exit_idle:
  */
 static void cpu_idle_loop(void)
 {
-	unsigned int latency_req;
+	unsigned int latency_req, next_timer_event;
 
 	while (1) {
 		/*
@@ -221,6 +222,9 @@ static void cpu_idle_loop(void)
 
 			latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
 
+			next_timer_event =
+				ktime_to_us(tick_nohz_get_sleep_length());
+
 			/*
 			 * In poll mode we reenable interrupts and spin.
 			 *
@@ -238,7 +242,8 @@ static void cpu_idle_loop(void)
 			    tick_check_broadcast_expired())
 				cpu_idle_poll();
 			else
-				cpuidle_idle_call(latency_req);
+				cpuidle_idle_call(latency_req,
+						  next_timer_event);
 
 			arch_cpu_idle_exit();
 		}
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V3 4/6] cpuidle: idle: menu: Don't reflect when a state selection failed
  2014-11-07 14:31 [PATCH V3 0/6] sched: idle: cpuidle: cleanups and fixes Daniel Lezcano
                   ` (2 preceding siblings ...)
  2014-11-07 14:31 ` [PATCH V3 3/6] sched: idle: Get the next timer event and pass it the cpuidle framework Daniel Lezcano
@ 2014-11-07 14:31 ` Daniel Lezcano
  2014-11-08 10:41   ` Preeti U Murthy
  2014-11-07 14:31 ` [PATCH V3 5/6] cpuidle: menu: Fix the get_typical_interval Daniel Lezcano
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Daniel Lezcano @ 2014-11-07 14:31 UTC (permalink / raw)
  To: rjw
  Cc: preeti, nicolas.pitre, linux-pm, linux-kernel, peterz,
	linaro-kernel, patches, lenb

In the current code, the check to reflect or not the outcoming state is done
against the idle state which has been chosen and its value.

Instead of doing a check in each of the reflect functions, just don't call reflect
if something went wrong in the idle path.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
---
 drivers/cpuidle/governors/ladder.c | 3 +--
 drivers/cpuidle/governors/menu.c   | 3 +--
 kernel/sched/idle.c                | 3 ++-
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
index 51c9ccd..86e4d49 100644
--- a/drivers/cpuidle/governors/ladder.c
+++ b/drivers/cpuidle/governors/ladder.c
@@ -165,8 +165,7 @@ static int ladder_enable_device(struct cpuidle_driver *drv,
 static void ladder_reflect(struct cpuidle_device *dev, int index)
 {
 	struct ladder_device *ldev = this_cpu_ptr(&ladder_devices);
-	if (index > 0)
-		ldev->last_state_idx = index;
+	ldev->last_state_idx = index;
 }
 
 static struct cpuidle_governor ladder_governor = {
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 91b3000..2e4a315 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -366,8 +366,7 @@ static void menu_reflect(struct cpuidle_device *dev, int index)
 {
 	struct menu_device *data = this_cpu_ptr(&menu_devices);
 	data->last_state_idx = index;
-	if (index >= 0)
-		data->needs_update = 1;
+	data->needs_update = 1;
 }
 
 /**
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 0a7a1d1..0ae1cc8 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -173,7 +173,8 @@ use_default:
 	/*
 	 * Give the governor an opportunity to reflect on the outcome
 	 */
-	cpuidle_reflect(dev, entered_state);
+	if (entered_state >= 0)
+		cpuidle_reflect(dev, entered_state);
 
 exit_idle:
 	__current_set_polling();
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V3 5/6] cpuidle: menu: Fix the get_typical_interval
  2014-11-07 14:31 [PATCH V3 0/6] sched: idle: cpuidle: cleanups and fixes Daniel Lezcano
                   ` (3 preceding siblings ...)
  2014-11-07 14:31 ` [PATCH V3 4/6] cpuidle: idle: menu: Don't reflect when a state selection failed Daniel Lezcano
@ 2014-11-07 14:31 ` Daniel Lezcano
  2014-11-07 14:31 ` [PATCH V3 6/6] cpuidle: menu: Move the update function before its declaration Daniel Lezcano
  2014-11-07 14:34 ` [PATCH V3 0/6] sched: idle: cpuidle: cleanups and fixes Daniel Lezcano
  6 siblings, 0 replies; 30+ messages in thread
From: Daniel Lezcano @ 2014-11-07 14:31 UTC (permalink / raw)
  To: rjw
  Cc: preeti, nicolas.pitre, linux-pm, linux-kernel, peterz,
	linaro-kernel, patches, lenb

The first time the 'get_typical_function' is called, it computes an average
of zero as no data is filled yet. That leads the 'data->predicted_us' variable
to be set to zero too.

The caller, 'menu_select' will then do:

	interactivity_req = data->predicted_us /
			performance_multiplier(nr_iowaiters, cpu_load);

That sets the interactivity_req to zero (0/performance...).

and then

	if (latency_req > interactivity_req)
		latency_req = interactivity_req;

... setting 'latency_req' to zero too.

No idle state will fulfill this constraint and we will go the C1 state as
default and leading to an update. So the next calls will compute an average
different from zero.

Even if that works with the current code but with a broken semantic, it will
just break with the next patches where we are stricter with the latencies
check: the first check will fail (latency_req is zero), then no update will
occur leading to always falling to choose an idle state.

As there are no previous values and it is pointless to compute a standard
deviation for these unexisting values.

Change the function to return the computed value and use it only if it is
different from zero and greater than the next timer expiration.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 drivers/cpuidle/governors/menu.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 2e4a315..163e63b 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -198,7 +198,7 @@ static u64 div_round64(u64 dividend, u32 divisor)
  * of points is below a threshold. If it is... then use the
  * average of these 8 points as the estimated value.
  */
-static void get_typical_interval(struct menu_device *data)
+static unsigned int get_typical_interval(struct menu_device *data)
 {
 	int i, divisor;
 	unsigned int max, thresh;
@@ -255,11 +255,8 @@ again:
 	if (likely(stddev <= ULONG_MAX)) {
 		stddev = int_sqrt(stddev);
 		if (((avg > stddev * 6) && (divisor * 4 >= INTERVALS * 3))
-							|| stddev <= 20) {
-			if (data->next_timer_us > avg)
-				data->predicted_us = avg;
-			return;
-		}
+							|| stddev <= 20)
+			return avg;
 	}
 
 	/*
@@ -272,7 +269,7 @@ again:
 	 * with sporadic activity with a bunch of short pauses.
 	 */
 	if ((divisor * 4) <= INTERVALS * 3)
-		return;
+		return 0;
 
 	thresh = max - 1;
 	goto again;
@@ -289,6 +286,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	struct menu_device *data = this_cpu_ptr(&menu_devices);
 	int i;
 	unsigned int interactivity_req;
+	unsigned int interactivity_overrride_us;
 	unsigned long nr_iowaiters, cpu_load;
 
 	if (data->needs_update) {
@@ -313,7 +311,10 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 					 data->correction_factor[data->bucket],
 					 RESOLUTION * DECAY);
 
-	get_typical_interval(data);
+	interactivity_overrride_us = get_typical_interval(data);
+	if (interactivity_overrride_us &&
+	    data->next_timer_us > interactivity_overrride_us)
+		data->predicted_us = interactivity_overrride_us;
 
 	/*
 	 * Performance multiplier defines a minimum predicted idle
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V3 6/6] cpuidle: menu: Move the update function before its declaration
  2014-11-07 14:31 [PATCH V3 0/6] sched: idle: cpuidle: cleanups and fixes Daniel Lezcano
                   ` (4 preceding siblings ...)
  2014-11-07 14:31 ` [PATCH V3 5/6] cpuidle: menu: Fix the get_typical_interval Daniel Lezcano
@ 2014-11-07 14:31 ` Daniel Lezcano
  2014-11-07 14:34 ` [PATCH V3 0/6] sched: idle: cpuidle: cleanups and fixes Daniel Lezcano
  6 siblings, 0 replies; 30+ messages in thread
From: Daniel Lezcano @ 2014-11-07 14:31 UTC (permalink / raw)
  To: rjw
  Cc: preeti, nicolas.pitre, linux-pm, linux-kernel, peterz,
	linaro-kernel, patches, lenb

In order to prevent a pointless forward declaration, just move the function
at the beginning of the file.

This patch does not change the behavior of the governor, it is just code
reordering.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Len Brown <len.brown@intel.com>
---
 drivers/cpuidle/governors/menu.c | 149 +++++++++++++++++++--------------------
 1 file changed, 74 insertions(+), 75 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 163e63b..b8017ad 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -184,7 +184,6 @@ static inline int performance_multiplier(unsigned long nr_iowaiters, unsigned lo
 
 static DEFINE_PER_CPU(struct menu_device, menu_devices);
 
-static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev);
 
 /* This implements DIV_ROUND_CLOSEST but avoids 64 bit division */
 static u64 div_round64(u64 dividend, u32 divisor)
@@ -192,6 +191,80 @@ static u64 div_round64(u64 dividend, u32 divisor)
 	return div_u64(dividend + (divisor / 2), divisor);
 }
 
+/**
+ * menu_update - attempts to guess what happened after entry
+ * @drv: cpuidle driver containing state data
+ * @dev: the CPU
+ */
+static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
+{
+	struct menu_device *data = this_cpu_ptr(&menu_devices);
+	int last_idx = data->last_state_idx;
+	struct cpuidle_state *target = &drv->states[last_idx];
+	unsigned int measured_us;
+	unsigned int new_factor;
+
+	/*
+	 * Try to figure out how much time passed between entry to low
+	 * power state and occurrence of the wakeup event.
+	 *
+	 * If the entered idle state didn't support residency measurements,
+	 * we are basically lost in the dark how much time passed.
+	 * As a compromise, assume we slept for the whole expected time.
+	 *
+	 * Any measured amount of time will include the exit latency.
+	 * Since we are interested in when the wakeup begun, not when it
+	 * was completed, we must subtract the exit latency. However, if
+	 * the measured amount of time is less than the exit latency,
+	 * assume the state was never reached and the exit latency is 0.
+	 */
+	if (unlikely(!(target->flags & CPUIDLE_FLAG_TIME_VALID))) {
+		/* Use timer value as is */
+		measured_us = data->next_timer_us;
+
+	} else {
+		/* Use measured value */
+		measured_us = cpuidle_get_last_residency(dev);
+
+		/* Deduct exit latency */
+		if (measured_us > target->exit_latency)
+			measured_us -= target->exit_latency;
+
+		/* Make sure our coefficients do not exceed unity */
+		if (measured_us > data->next_timer_us)
+			measured_us = data->next_timer_us;
+	}
+
+	/* Update our correction ratio */
+	new_factor = data->correction_factor[data->bucket];
+	new_factor -= new_factor / DECAY;
+
+	if (data->next_timer_us > 0 && measured_us < MAX_INTERESTING)
+		new_factor += RESOLUTION * measured_us / data->next_timer_us;
+	else
+		/*
+		 * we were idle so long that we count it as a perfect
+		 * prediction
+		 */
+		new_factor += RESOLUTION;
+
+	/*
+	 * We don't want 0 as factor; we always want at least
+	 * a tiny bit of estimated time. Fortunately, due to rounding,
+	 * new_factor will stay nonzero regardless of measured_us values
+	 * and the compiler can eliminate this test as long as DECAY > 1.
+	 */
+	if (DECAY == 1 && unlikely(new_factor == 0))
+		new_factor = 1;
+
+	data->correction_factor[data->bucket] = new_factor;
+
+	/* update the repeating-pattern data */
+	data->intervals[data->interval_ptr++] = measured_us;
+	if (data->interval_ptr >= INTERVALS)
+		data->interval_ptr = 0;
+}
+
 /*
  * Try detecting repeating patterns by keeping track of the last 8
  * intervals, and checking if the standard deviation of that set
@@ -371,80 +444,6 @@ static void menu_reflect(struct cpuidle_device *dev, int index)
 }
 
 /**
- * menu_update - attempts to guess what happened after entry
- * @drv: cpuidle driver containing state data
- * @dev: the CPU
- */
-static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
-{
-	struct menu_device *data = this_cpu_ptr(&menu_devices);
-	int last_idx = data->last_state_idx;
-	struct cpuidle_state *target = &drv->states[last_idx];
-	unsigned int measured_us;
-	unsigned int new_factor;
-
-	/*
-	 * Try to figure out how much time passed between entry to low
-	 * power state and occurrence of the wakeup event.
-	 *
-	 * If the entered idle state didn't support residency measurements,
-	 * we are basically lost in the dark how much time passed.
-	 * As a compromise, assume we slept for the whole expected time.
-	 *
-	 * Any measured amount of time will include the exit latency.
-	 * Since we are interested in when the wakeup begun, not when it
-	 * was completed, we must subtract the exit latency. However, if
-	 * the measured amount of time is less than the exit latency,
-	 * assume the state was never reached and the exit latency is 0.
-	 */
-	if (unlikely(!(target->flags & CPUIDLE_FLAG_TIME_VALID))) {
-		/* Use timer value as is */
-		measured_us = data->next_timer_us;
-
-	} else {
-		/* Use measured value */
-		measured_us = cpuidle_get_last_residency(dev);
-
-		/* Deduct exit latency */
-		if (measured_us > target->exit_latency)
-			measured_us -= target->exit_latency;
-
-		/* Make sure our coefficients do not exceed unity */
-		if (measured_us > data->next_timer_us)
-			measured_us = data->next_timer_us;
-	}
-
-	/* Update our correction ratio */
-	new_factor = data->correction_factor[data->bucket];
-	new_factor -= new_factor / DECAY;
-
-	if (data->next_timer_us > 0 && measured_us < MAX_INTERESTING)
-		new_factor += RESOLUTION * measured_us / data->next_timer_us;
-	else
-		/*
-		 * we were idle so long that we count it as a perfect
-		 * prediction
-		 */
-		new_factor += RESOLUTION;
-
-	/*
-	 * We don't want 0 as factor; we always want at least
-	 * a tiny bit of estimated time. Fortunately, due to rounding,
-	 * new_factor will stay nonzero regardless of measured_us values
-	 * and the compiler can eliminate this test as long as DECAY > 1.
-	 */
-	if (DECAY == 1 && unlikely(new_factor == 0))
-		new_factor = 1;
-
-	data->correction_factor[data->bucket] = new_factor;
-
-	/* update the repeating-pattern data */
-	data->intervals[data->interval_ptr++] = measured_us;
-	if (data->interval_ptr >= INTERVALS)
-		data->interval_ptr = 0;
-}
-
-/**
  * menu_enable_device - scans a CPU's states and does setup
  * @drv: cpuidle driver
  * @dev: the CPU
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 0/6] sched: idle: cpuidle: cleanups and fixes
  2014-11-07 14:31 [PATCH V3 0/6] sched: idle: cpuidle: cleanups and fixes Daniel Lezcano
                   ` (5 preceding siblings ...)
  2014-11-07 14:31 ` [PATCH V3 6/6] cpuidle: menu: Move the update function before its declaration Daniel Lezcano
@ 2014-11-07 14:34 ` Daniel Lezcano
  6 siblings, 0 replies; 30+ messages in thread
From: Daniel Lezcano @ 2014-11-07 14:34 UTC (permalink / raw)
  To: rjw
  Cc: preeti, nicolas.pitre, linux-pm, linux-kernel, peterz,
	linaro-kernel, patches, lenb

On 11/07/2014 03:31 PM, Daniel Lezcano wrote:
> Hi,
>
> this patchset provides some cleanups to prepare the removal of the
> CPUIDLE_DRIVER_STATE_START macro in the cpuidle code. As the code has built
> on top of this macro adding more complexity and ugly hacks/tricks, it is hard
> to simply remove it. Some code has to be cleanup before, this is the purpose
> of this first patchset.

By the way, I forgot to mention it is based on 3.18-rc3

Thanks

   -- Daniel

> Changelog:
>
> V3:
>    - fixed comment for the latency_req introduction back to the right patch
>    - added back the last_idx_state in the reflect function
>    - change the get_typical_interval to return the value
>
> V2:
>    - fixed typo
>    - removed redundant comment
>    - fixed readability
>
>
> Daniel Lezcano (6):
>    sched: idle: Add a weak arch_cpu_idle_poll function
>    sched: idle: cpuidle: Check the latency req before idle
>    sched: idle: Get the next timer event and pass it the cpuidle
>      framework
>    cpuidle: idle: menu: Don't reflect when a state selection failed
>    cpuidle: menu: Fix the get_typical_interval
>    cpuidle: menu: Move the update function before its declaration
>
>   drivers/cpuidle/cpuidle.c          |  12 +--
>   drivers/cpuidle/governors/ladder.c |  13 +--
>   drivers/cpuidle/governors/menu.c   | 183 ++++++++++++++++++-------------------
>   include/linux/cpuidle.h            |   9 +-
>   kernel/sched/idle.c                |  55 +++++++----
>   5 files changed, 140 insertions(+), 132 deletions(-)
>


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 1/6] sched: idle: Add a weak arch_cpu_idle_poll function
  2014-11-07 14:31 ` [PATCH V3 1/6] sched: idle: Add a weak arch_cpu_idle_poll function Daniel Lezcano
@ 2014-11-08 10:39   ` Preeti U Murthy
  2014-11-10 12:29   ` Peter Zijlstra
  1 sibling, 0 replies; 30+ messages in thread
From: Preeti U Murthy @ 2014-11-08 10:39 UTC (permalink / raw)
  To: Daniel Lezcano, rjw
  Cc: nicolas.pitre, linux-pm, linux-kernel, peterz, linaro-kernel,
	patches, lenb

On 11/07/2014 08:01 PM, Daniel Lezcano wrote:
> The poll function is called when a timer expired or if we force to poll when
> the cpu_idle_force_poll option is set.
> 
> The poll function does:
> 
>        local_irq_enable();
>        while (!tif_need_resched())
>                cpu_relax();
> 
> This default poll function suits for the x86 arch because its rep; nop;
> hardware power optimization. But on other archs, this optimization does not
> exists and we are not saving power. The arch specific bits may want to
> optimize this loop by adding their own optimization.
> 
> Give an opportunity to the different platform to specify their own polling
> loop by adding a weak cpu_idle_poll_loop function.
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---
>  kernel/sched/idle.c | 29 +++++++++++++++++------------
>  1 file changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index c47fce7..ea65bbae 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -42,18 +42,6 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
>  __setup("hlt", cpu_idle_nopoll_setup);
>  #endif
> 
> -static inline int cpu_idle_poll(void)
> -{
> -	rcu_idle_enter();
> -	trace_cpu_idle_rcuidle(0, smp_processor_id());
> -	local_irq_enable();
> -	while (!tif_need_resched())
> -		cpu_relax();
> -	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
> -	rcu_idle_exit();
> -	return 1;
> -}
> -
>  /* Weak implementations for optional arch specific functions */
>  void __weak arch_cpu_idle_prepare(void) { }
>  void __weak arch_cpu_idle_enter(void) { }
> @@ -65,6 +53,23 @@ void __weak arch_cpu_idle(void)
>  	local_irq_enable();
>  }
> 
> +void __weak arch_cpu_idle_poll(void)
> +{
> +	local_irq_enable();
> +	while (!tif_need_resched())
> +		cpu_relax();
> +}
> +
> +static inline int cpu_idle_poll(void)
> +{
> +	rcu_idle_enter();
> +	trace_cpu_idle_rcuidle(0, smp_processor_id());
> +	arch_cpu_idle_poll();
> +	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
> +	rcu_idle_exit();
> +	return 1;
> +}
> +
>  /**
>   * cpuidle_idle_call - the main idle function
>   *
> 

Thanks Daniel!

Reviewed-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 2/6] sched: idle: cpuidle: Check the latency req before idle
  2014-11-07 14:31 ` [PATCH V3 2/6] sched: idle: cpuidle: Check the latency req before idle Daniel Lezcano
@ 2014-11-08 10:40   ` Preeti U Murthy
  2014-11-10 12:41   ` Peter Zijlstra
  1 sibling, 0 replies; 30+ messages in thread
From: Preeti U Murthy @ 2014-11-08 10:40 UTC (permalink / raw)
  To: Daniel Lezcano, rjw
  Cc: nicolas.pitre, linux-pm, linux-kernel, peterz, linaro-kernel,
	patches, lenb

On 11/07/2014 08:01 PM, Daniel Lezcano wrote:
> When the pmqos latency requirement is set to zero that means "poll in all the
> cases".
> 
> That is correctly implemented on x86 but not on the other archs.
> 
> As how is written the code, if the latency request is zero, the governor will
> return zero, so corresponding, for x86, to the poll function, but for the
> others arch the default idle function. For example, on ARM this is wait-for-
> interrupt with a latency of '1', so violating the constraint.
> 
> In order to fix that, do the latency requirement check *before* calling the
> cpuidle framework in order to jump to the poll function without entering
> cpuidle. That has several benefits:
> 
>  1. It clarifies and unifies the code
>  2. It fixes x86 vs other archs behavior
>  3. Factors out the call to the same function
>  4. Prevent to enter the cpuidle framework with its expensive cost in
>     calculation
> 
> As the latency_req is needed in all the cases, change the select API to take
> the latency_req as parameter in case it is not equal to zero.
> 
> As a positive side effect, it introduces the latency constraint specified
> externally, so one more step to the cpuidle/scheduler integration.
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> Acked-by: Nicolas Pitre <nico@linaro.org>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Reviewed-by: Len Brown <len.brown@intel.com>
> ---

Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>

Regards
Preeti U Murthy


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 4/6] cpuidle: idle: menu: Don't reflect when a state selection failed
  2014-11-07 14:31 ` [PATCH V3 4/6] cpuidle: idle: menu: Don't reflect when a state selection failed Daniel Lezcano
@ 2014-11-08 10:41   ` Preeti U Murthy
  0 siblings, 0 replies; 30+ messages in thread
From: Preeti U Murthy @ 2014-11-08 10:41 UTC (permalink / raw)
  To: Daniel Lezcano, rjw
  Cc: nicolas.pitre, linux-pm, linux-kernel, peterz, linaro-kernel,
	patches, lenb

On 11/07/2014 08:01 PM, Daniel Lezcano wrote:
> In the current code, the check to reflect or not the outcoming state is done
> against the idle state which has been chosen and its value.
> 
> Instead of doing a check in each of the reflect functions, just don't call reflect
> if something went wrong in the idle path.
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> Acked-by: Nicolas Pitre <nico@linaro.org>
> ---
>  drivers/cpuidle/governors/ladder.c | 3 +--
>  drivers/cpuidle/governors/menu.c   | 3 +--
>  kernel/sched/idle.c                | 3 ++-
>  3 files changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
> index 51c9ccd..86e4d49 100644
> --- a/drivers/cpuidle/governors/ladder.c
> +++ b/drivers/cpuidle/governors/ladder.c
> @@ -165,8 +165,7 @@ static int ladder_enable_device(struct cpuidle_driver *drv,
>  static void ladder_reflect(struct cpuidle_device *dev, int index)
>  {
>  	struct ladder_device *ldev = this_cpu_ptr(&ladder_devices);
> -	if (index > 0)
> -		ldev->last_state_idx = index;
> +	ldev->last_state_idx = index;
>  }
> 
>  static struct cpuidle_governor ladder_governor = {
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 91b3000..2e4a315 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -366,8 +366,7 @@ static void menu_reflect(struct cpuidle_device *dev, int index)
>  {
>  	struct menu_device *data = this_cpu_ptr(&menu_devices);
>  	data->last_state_idx = index;
> -	if (index >= 0)
> -		data->needs_update = 1;
> +	data->needs_update = 1;
>  }
> 
>  /**
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 0a7a1d1..0ae1cc8 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -173,7 +173,8 @@ use_default:
>  	/*
>  	 * Give the governor an opportunity to reflect on the outcome
>  	 */
> -	cpuidle_reflect(dev, entered_state);
> +	if (entered_state >= 0)
> +		cpuidle_reflect(dev, entered_state);
> 
>  exit_idle:
>  	__current_set_polling();
> 
Reviewed-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 3/6] sched: idle: Get the next timer event and pass it the cpuidle framework
  2014-11-07 14:31 ` [PATCH V3 3/6] sched: idle: Get the next timer event and pass it the cpuidle framework Daniel Lezcano
@ 2014-11-08 10:44   ` Preeti U Murthy
  2014-11-10 12:43   ` Peter Zijlstra
  1 sibling, 0 replies; 30+ messages in thread
From: Preeti U Murthy @ 2014-11-08 10:44 UTC (permalink / raw)
  To: Daniel Lezcano, rjw
  Cc: nicolas.pitre, linux-pm, linux-kernel, peterz, linaro-kernel,
	patches, lenb

On 11/07/2014 08:01 PM, Daniel Lezcano wrote:
> Following the logic of the previous patch, retrieve from the idle task the
> expected timer sleep duration and pass it to the cpuidle framework.
> 
> Take the opportunity to remove the unused headers in the menu.c file.
> 
> This patch does not change the current behavior.
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> Acked-by: Nicolas Pitre <nico@linaro.org>
> Reviewed-by: Len Brown <len.brown@intel.com>
> ---

This patch looks good to me as well.

Reviewed-by: Preeti U. Murthy


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 1/6] sched: idle: Add a weak arch_cpu_idle_poll function
  2014-11-07 14:31 ` [PATCH V3 1/6] sched: idle: Add a weak arch_cpu_idle_poll function Daniel Lezcano
  2014-11-08 10:39   ` Preeti U Murthy
@ 2014-11-10 12:29   ` Peter Zijlstra
  2014-11-10 14:20     ` Preeti U Murthy
  1 sibling, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2014-11-10 12:29 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: rjw, preeti, nicolas.pitre, linux-pm, linux-kernel,
	linaro-kernel, patches, lenb

On Fri, Nov 07, 2014 at 03:31:22PM +0100, Daniel Lezcano wrote:
> The poll function is called when a timer expired or if we force to poll when
> the cpu_idle_force_poll option is set.
> 
> The poll function does:
> 
>        local_irq_enable();
>        while (!tif_need_resched())
>                cpu_relax();
> 
> This default poll function suits for the x86 arch because its rep; nop;
> hardware power optimization. But on other archs, this optimization does not
> exists and we are not saving power. The arch specific bits may want to
> optimize this loop by adding their own optimization.

This doesn't make sense to me; should an arch not either implement an
actual idle driver or implement cpu_relax() properly, why allow for a
third weird option?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 2/6] sched: idle: cpuidle: Check the latency req before idle
  2014-11-07 14:31 ` [PATCH V3 2/6] sched: idle: cpuidle: Check the latency req before idle Daniel Lezcano
  2014-11-08 10:40   ` Preeti U Murthy
@ 2014-11-10 12:41   ` Peter Zijlstra
  2014-11-10 15:12     ` Daniel Lezcano
  1 sibling, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2014-11-10 12:41 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: rjw, preeti, nicolas.pitre, linux-pm, linux-kernel,
	linaro-kernel, patches, lenb

On Fri, Nov 07, 2014 at 03:31:23PM +0100, Daniel Lezcano wrote:
> @@ -216,19 +219,26 @@ static void cpu_idle_loop(void)
>  			local_irq_disable();
>  			arch_cpu_idle_enter();
>  
> +			latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
> +
>  			/*
>  			 * In poll mode we reenable interrupts and spin.
>  			 *
> +			 * If the latency req is zero, we don't want to
> +			 * enter any idle state and we jump to the poll
> +			 * function directly
> +			 *
>  			 * Also if we detected in the wakeup from idle
>  			 * path that the tick broadcast device expired
>  			 * for us, we don't want to go deep idle as we
>  			 * know that the IPI is going to arrive right
>  			 * away
>  			 */
> -			if (cpu_idle_force_poll || tick_check_broadcast_expired())
> +			if (!latency_req || cpu_idle_force_poll ||
> +			    tick_check_broadcast_expired())
>  				cpu_idle_poll();

Is this why you wanted that weak poll function?

Should we not instead allow an arch to deal with !latency_req and only
fall back to this polling if there is no actual way for it to implement
this better?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 3/6] sched: idle: Get the next timer event and pass it the cpuidle framework
  2014-11-07 14:31 ` [PATCH V3 3/6] sched: idle: Get the next timer event and pass it the cpuidle framework Daniel Lezcano
  2014-11-08 10:44   ` Preeti U Murthy
@ 2014-11-10 12:43   ` Peter Zijlstra
  2014-11-10 15:15     ` Daniel Lezcano
  1 sibling, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2014-11-10 12:43 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: rjw, preeti, nicolas.pitre, linux-pm, linux-kernel,
	linaro-kernel, patches, lenb

On Fri, Nov 07, 2014 at 03:31:24PM +0100, Daniel Lezcano wrote:
>  static void cpu_idle_loop(void)
>  {
> -	unsigned int latency_req;
> +	unsigned int latency_req, next_timer_event;
>  
>  	while (1) {
>  		/*
> @@ -221,6 +222,9 @@ static void cpu_idle_loop(void)
>  
>  			latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
>  
> +			next_timer_event =
> +				ktime_to_us(tick_nohz_get_sleep_length());
> +
>  			/*
>  			 * In poll mode we reenable interrupts and spin.
>  			 *
> @@ -238,7 +242,8 @@ static void cpu_idle_loop(void)
>  			    tick_check_broadcast_expired())
>  				cpu_idle_poll();
>  			else
> -				cpuidle_idle_call(latency_req);
> +				cpuidle_idle_call(latency_req,
> +						  next_timer_event);
>  
>  			arch_cpu_idle_exit();
>  		}

Why do we want to query the next timer in the poll case? Afaict the
other patches don't make use of this either.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 1/6] sched: idle: Add a weak arch_cpu_idle_poll function
  2014-11-10 12:29   ` Peter Zijlstra
@ 2014-11-10 14:20     ` Preeti U Murthy
  2014-11-10 15:17       ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Preeti U Murthy @ 2014-11-10 14:20 UTC (permalink / raw)
  To: Peter Zijlstra, Daniel Lezcano
  Cc: rjw, nicolas.pitre, linux-pm, linux-kernel, linaro-kernel, patches, lenb

Hi Peter,

On 11/10/2014 05:59 PM, Peter Zijlstra wrote:
> On Fri, Nov 07, 2014 at 03:31:22PM +0100, Daniel Lezcano wrote:
>> The poll function is called when a timer expired or if we force to poll when
>> the cpu_idle_force_poll option is set.
>>
>> The poll function does:
>>
>>        local_irq_enable();
>>        while (!tif_need_resched())
>>                cpu_relax();
>>
>> This default poll function suits for the x86 arch because its rep; nop;
>> hardware power optimization. But on other archs, this optimization does not
>> exists and we are not saving power. The arch specific bits may want to
>> optimize this loop by adding their own optimization.
> 
> This doesn't make sense to me; should an arch not either implement an
> actual idle driver or implement cpu_relax() properly, why allow for a
> third weird option?
> 

The previous version of this patch simply invoked cpu_idle_loop() for
cases where latency_req was 0. This would have changed the behavior
on PowerPC wherein earlier the 0th idle index was returned which is also
a polling loop but differs from cpu_idle_loop() in two ways:

a. It polls at a relatively lower power state than cpu_relax().
b. We set certain registers to indicate that the cpu is idle.

Hence for all such cases wherein the cpu is required to poll while idle
(only for cases such as force poll, broadcast ipi to arrive soon and
latency_req = 0), we should be able to call into cpuidle_idle_loop()
only if the cpuidle driver's 0th idle state has an exit_latency > 0.
(The 0th idle state is expected to be a polling loop with
exit_latency = 0).

If otherwise, it would mean the driver has an optimized polling loop
when idle. But instead of adding in the logic of checking the
exit_latency, we thought it would be simpler to call into an arch
defined polling idle loop under the above circumstances. If that is no
better we could fall back to cpuidle_idle_loop().

Regards
Preeti U Murthy




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 2/6] sched: idle: cpuidle: Check the latency req before idle
  2014-11-10 12:41   ` Peter Zijlstra
@ 2014-11-10 15:12     ` Daniel Lezcano
  2014-11-10 15:28       ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Lezcano @ 2014-11-10 15:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, preeti, nicolas.pitre, linux-pm, linux-kernel,
	linaro-kernel, patches, lenb

On 11/10/2014 01:41 PM, Peter Zijlstra wrote:
> On Fri, Nov 07, 2014 at 03:31:23PM +0100, Daniel Lezcano wrote:
>> @@ -216,19 +219,26 @@ static void cpu_idle_loop(void)
>>   			local_irq_disable();
>>   			arch_cpu_idle_enter();
>>
>> +			latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
>> +
>>   			/*
>>   			 * In poll mode we reenable interrupts and spin.
>>   			 *
>> +			 * If the latency req is zero, we don't want to
>> +			 * enter any idle state and we jump to the poll
>> +			 * function directly
>> +			 *
>>   			 * Also if we detected in the wakeup from idle
>>   			 * path that the tick broadcast device expired
>>   			 * for us, we don't want to go deep idle as we
>>   			 * know that the IPI is going to arrive right
>>   			 * away
>>   			 */
>> -			if (cpu_idle_force_poll || tick_check_broadcast_expired())
>> +			if (!latency_req || cpu_idle_force_poll ||
>> +			    tick_check_broadcast_expired())
>>   				cpu_idle_poll();
>
> Is this why you wanted that weak poll function?

Not only there, it will be added in the next patchset in the 
cpu_idle_call function.

> Should we not instead allow an arch to deal with !latency_req and only
> fall back to this polling if there is no actual way for it to implement
> this better?

All this is to remove the poll idle state from the x86 cpuidle driver in 
order to remove the CPUIDLE_DRIVER_STATE_START (this one forces to write 
always ugly code in the cpuidle framework).

This poll state introduces the CPUIDLE_DRIVER_STATE_START macro. If you 
look at the different governors and the code, you can checkout what kind 
of tricks this macro introduces and how that makes the code ugly.

For the sake of what ? Just a small optimization in the menu governor.

I suppose that has been introduce and then evolved on a wrong basis. So 
now we have to deal with that.

This patchset provides a first round of cleanup around the poll 
function, the next patchset will move the 5us timer optimization from 
the menu governor and the last patchset will remove the 
CPUIDLE_DRIVER_STATE_START ugly macro.





-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 3/6] sched: idle: Get the next timer event and pass it the cpuidle framework
  2014-11-10 12:43   ` Peter Zijlstra
@ 2014-11-10 15:15     ` Daniel Lezcano
  0 siblings, 0 replies; 30+ messages in thread
From: Daniel Lezcano @ 2014-11-10 15:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, preeti, nicolas.pitre, linux-pm, linux-kernel,
	linaro-kernel, patches, lenb

On 11/10/2014 01:43 PM, Peter Zijlstra wrote:
> On Fri, Nov 07, 2014 at 03:31:24PM +0100, Daniel Lezcano wrote:
>>   static void cpu_idle_loop(void)
>>   {
>> -	unsigned int latency_req;
>> +	unsigned int latency_req, next_timer_event;
>>
>>   	while (1) {
>>   		/*
>> @@ -221,6 +222,9 @@ static void cpu_idle_loop(void)
>>
>>   			latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
>>
>> +			next_timer_event =
>> +				ktime_to_us(tick_nohz_get_sleep_length());
>> +
>>   			/*
>>   			 * In poll mode we reenable interrupts and spin.
>>   			 *
>> @@ -238,7 +242,8 @@ static void cpu_idle_loop(void)
>>   			    tick_check_broadcast_expired())
>>   				cpu_idle_poll();
>>   			else
>> -				cpuidle_idle_call(latency_req);
>> +				cpuidle_idle_call(latency_req,
>> +						  next_timer_event);
>>
>>   			arch_cpu_idle_exit();
>>   		}
>
> Why do we want to query the next timer in the poll case? Afaict the
> other patches don't make use of this either.

Well, the direction I am taking when writing those cleanups is to have 
something like:

"I will sleep X usec, I have Y usec latency constraints". Grouping the 
latency req and the next timer allows to stick to the next changes.


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 1/6] sched: idle: Add a weak arch_cpu_idle_poll function
  2014-11-10 14:20     ` Preeti U Murthy
@ 2014-11-10 15:17       ` Peter Zijlstra
  2014-11-11 11:00         ` Preeti U Murthy
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2014-11-10 15:17 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Daniel Lezcano, rjw, nicolas.pitre, linux-pm, linux-kernel,
	linaro-kernel, patches, lenb

On Mon, Nov 10, 2014 at 07:50:22PM +0530, Preeti U Murthy wrote:
> Hi Peter,
> 
> On 11/10/2014 05:59 PM, Peter Zijlstra wrote:
> > On Fri, Nov 07, 2014 at 03:31:22PM +0100, Daniel Lezcano wrote:
> >> The poll function is called when a timer expired or if we force to poll when
> >> the cpu_idle_force_poll option is set.
> >>
> >> The poll function does:
> >>
> >>        local_irq_enable();
> >>        while (!tif_need_resched())
> >>                cpu_relax();
> >>
> >> This default poll function suits for the x86 arch because its rep; nop;
> >> hardware power optimization. But on other archs, this optimization does not
> >> exists and we are not saving power. The arch specific bits may want to
> >> optimize this loop by adding their own optimization.
> > 
> > This doesn't make sense to me; should an arch not either implement an
> > actual idle driver or implement cpu_relax() properly, why allow for a
> > third weird option?
> > 
> 
> The previous version of this patch simply invoked cpu_idle_loop() for
> cases where latency_req was 0. This would have changed the behavior
> on PowerPC wherein earlier the 0th idle index was returned which is also
> a polling loop but differs from cpu_idle_loop() in two ways:
> 
> a. It polls at a relatively lower power state than cpu_relax().
> b. We set certain registers to indicate that the cpu is idle.

So I'm confused; the current code runs the generic cpu_relax idle poll
loop for the broadcast case. I suppose you want to retain this because
not doing your a-b above will indeed give you a lower latency.

Therefore one could argue that latency_req==0 should indeed use this,
and your a-b idle state should be latency_req==1 or higher.

Thus yes it changes behaviour, but I think it actually fixes something.
You cannot have a latency_req==0 state which has higher latency than the
actual polling loop, as you appear to have.

> Hence for all such cases wherein the cpu is required to poll while idle
> (only for cases such as force poll, broadcast ipi to arrive soon and
> latency_req = 0), we should be able to call into cpuidle_idle_loop()
> only if the cpuidle driver's 0th idle state has an exit_latency > 0.
> (The 0th idle state is expected to be a polling loop with
> exit_latency = 0).
> 
> If otherwise, it would mean the driver has an optimized polling loop
> when idle. But instead of adding in the logic of checking the
> exit_latency, we thought it would be simpler to call into an arch
> defined polling idle loop under the above circumstances. If that is no
> better we could fall back to cpuidle_idle_loop().

That still doesn't make sense to me; suppose the implementation of this
special poll state differs on different uarchs for the same arch, then
we'll end up with another registration and selection interface parallel
to cpuidle.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 2/6] sched: idle: cpuidle: Check the latency req before idle
  2014-11-10 15:12     ` Daniel Lezcano
@ 2014-11-10 15:28       ` Peter Zijlstra
  2014-11-10 15:58         ` Daniel Lezcano
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2014-11-10 15:28 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: rjw, preeti, nicolas.pitre, linux-pm, linux-kernel,
	linaro-kernel, patches, lenb

On Mon, Nov 10, 2014 at 04:12:47PM +0100, Daniel Lezcano wrote:
> All this is to remove the poll idle state from the x86 cpuidle driver in
> order to remove the CPUIDLE_DRIVER_STATE_START (this one forces to write
> always ugly code in the cpuidle framework).
> 
> This poll state introduces the CPUIDLE_DRIVER_STATE_START macro. If you look
> at the different governors and the code, you can checkout what kind of
> tricks this macro introduces and how that makes the code ugly.
> 
> For the sake of what ? Just a small optimization in the menu governor.
> 
> I suppose that has been introduce and then evolved on a wrong basis. So now
> we have to deal with that.
> 
> This patchset provides a first round of cleanup around the poll function,
> the next patchset will move the 5us timer optimization from the menu
> governor and the last patchset will remove the CPUIDLE_DRIVER_STATE_START
> ugly macro.

I don't get it, I've clearly not stared at it long enough, but why is
that STATE_START crap needed anywhere?

To me it appears 'natural' to have a latency_req==0 state, why does it
need so much special casing?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 2/6] sched: idle: cpuidle: Check the latency req before idle
  2014-11-10 15:28       ` Peter Zijlstra
@ 2014-11-10 15:58         ` Daniel Lezcano
  2014-11-10 16:15           ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Lezcano @ 2014-11-10 15:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, preeti, nicolas.pitre, linux-pm, linux-kernel,
	linaro-kernel, patches, lenb

On 11/10/2014 04:28 PM, Peter Zijlstra wrote:
> On Mon, Nov 10, 2014 at 04:12:47PM +0100, Daniel Lezcano wrote:
>> All this is to remove the poll idle state from the x86 cpuidle driver in
>> order to remove the CPUIDLE_DRIVER_STATE_START (this one forces to write
>> always ugly code in the cpuidle framework).
>>
>> This poll state introduces the CPUIDLE_DRIVER_STATE_START macro. If you look
>> at the different governors and the code, you can checkout what kind of
>> tricks this macro introduces and how that makes the code ugly.
>>
>> For the sake of what ? Just a small optimization in the menu governor.
>>
>> I suppose that has been introduce and then evolved on a wrong basis. So now
>> we have to deal with that.
>>
>> This patchset provides a first round of cleanup around the poll function,
>> the next patchset will move the 5us timer optimization from the menu
>> governor and the last patchset will remove the CPUIDLE_DRIVER_STATE_START
>> ugly macro.
>
> I don't get it, I've clearly not stared at it long enough, but why is
> that STATE_START crap needed anywhere?

Excellent question :)

On x86, the config option ARCH_HAS_CPU_RELAX is set (x86 is the only 
one). That leads to the macro CPUIDLE_DRIVER_STATE_START equal 1.

https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/tree/include/linux/cpuidle.h#n221

Then the acpi cpuidle driver and the intel_driver begin to fill the idle 
state at index == CPUIDLE_DRIVER_STATE_START, so leaving the 0th idle 
state empty.

https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/tree/drivers/idle/intel_idle.c#n848

https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/tree/drivers/acpi/processor_idle.c#n953

Then when the driver is registered and if ARCH_HAS_CPU_RELAX is set, the 
cpuidle framework insert the 0th with the poll state (reminder : only 
for x86).

https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/tree/drivers/cpuidle/driver.c#n195

If you look at the ladder governor (which I believe nobody is still 
using it), or at the menu governor, all the indexes begin at 
CPUIDLE_DRIVER_STATE_START, so all the code is preventing to use the 0th 
state ... :)

So why is needed the poll state ?

1. When the latency_req is 0 (it returns 0, so the poll state)

2. When the select's menu governor fails to find a state *and* if the 
next timer is before 5us

https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/tree/drivers/cpuidle/driver.c#n195

And when we investigate the same code but on the other archs, the 
CPUIDLE_DRIVER_STATE_START dance makes things slightly different.

So the conclusion is, we are inserting a state in the idle state array 
but we do everything to prevent to use it :)

For me it appears logical to just kill this state from the x86 idle 
drivers and move it in the idle_mainloop in case an idle state selection 
fails.



> To me it appears 'natural' to have a latency_req==0 state, why does it
> need so much special casing?



-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 2/6] sched: idle: cpuidle: Check the latency req before idle
  2014-11-10 15:58         ` Daniel Lezcano
@ 2014-11-10 16:15           ` Peter Zijlstra
  2014-11-10 17:19             ` Daniel Lezcano
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2014-11-10 16:15 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: rjw, preeti, nicolas.pitre, linux-pm, linux-kernel,
	linaro-kernel, patches, lenb

On Mon, Nov 10, 2014 at 04:58:29PM +0100, Daniel Lezcano wrote:
> >I don't get it, I've clearly not stared at it long enough, but why is
> >that STATE_START crap needed anywhere?
> 
> Excellent question :)
> 
> On x86, the config option ARCH_HAS_CPU_RELAX is set (x86 is the only one).
> That leads to the macro CPUIDLE_DRIVER_STATE_START equal 1.
> 
> https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/tree/include/linux/cpuidle.h#n221
> 
> Then the acpi cpuidle driver and the intel_driver begin to fill the idle
> state at index == CPUIDLE_DRIVER_STATE_START, so leaving the 0th idle state
> empty.
> 
> https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/tree/drivers/idle/intel_idle.c#n848
> 
> https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/tree/drivers/acpi/processor_idle.c#n953
> 
> Then when the driver is registered and if ARCH_HAS_CPU_RELAX is set, the
> cpuidle framework insert the 0th with the poll state (reminder : only for
> x86).

Appears to me part of the problem is right there, the intel_idle and
proessor_idle should register the poll state themselves. That
immediately makes this weirdness go away.

Registering states from two places is not something that's sane or
desired I think.

> https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/tree/drivers/cpuidle/driver.c#n195
> 
> If you look at the ladder governor (which I believe nobody is still using
> it), or at the menu governor, all the indexes begin at
> CPUIDLE_DRIVER_STATE_START, so all the code is preventing to use the 0th
> state ... :)
> 
> So why is needed the poll state ?
> 
> 1. When the latency_req is 0 (it returns 0, so the poll state)

Right, that makes sense.

> 2. When the select's menu governor fails to find a state *and* if the next
> timer is before 5us

That seems rather arbitrary. Why would it fail to find a state?

> https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/tree/drivers/cpuidle/driver.c#n195
> 
> And when we investigate the same code but on the other archs, the
> CPUIDLE_DRIVER_STATE_START dance makes things slightly different.
> 
> So the conclusion is, we are inserting a state in the idle state array but
> we do everything to prevent to use it :)
> 
> For me it appears logical to just kill this state from the x86 idle drivers
> and move it in the idle_mainloop in case an idle state selection fails.

But why, ppc has a latency_req==0 state too, right?

I agree that we should shoot STATE_START in the head, but I feel we
should do it by fixing the state registration.

I really don't get why the governors should know about this though, its
just another state, they should iterate all states and pick the best,
given the power usage this state should really never be eligible unless
we're QoS forced or whatnot.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 2/6] sched: idle: cpuidle: Check the latency req before idle
  2014-11-10 16:15           ` Peter Zijlstra
@ 2014-11-10 17:19             ` Daniel Lezcano
  2014-11-10 19:48               ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Lezcano @ 2014-11-10 17:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, preeti, nicolas.pitre, linux-pm, linux-kernel,
	linaro-kernel, patches, lenb

On 11/10/2014 05:15 PM, Peter Zijlstra wrote:
> On Mon, Nov 10, 2014 at 04:58:29PM +0100, Daniel Lezcano wrote:
>>> I don't get it, I've clearly not stared at it long enough, but why is
>>> that STATE_START crap needed anywhere?
>>
>> Excellent question :)
>>
>> On x86, the config option ARCH_HAS_CPU_RELAX is set (x86 is the only one).
>> That leads to the macro CPUIDLE_DRIVER_STATE_START equal 1.
>>
>> https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/tree/include/linux/cpuidle.h#n221
>>
>> Then the acpi cpuidle driver and the intel_driver begin to fill the idle
>> state at index == CPUIDLE_DRIVER_STATE_START, so leaving the 0th idle state
>> empty.
>>
>> https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/tree/drivers/idle/intel_idle.c#n848
>>
>> https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/tree/drivers/acpi/processor_idle.c#n953
>>
>> Then when the driver is registered and if ARCH_HAS_CPU_RELAX is set, the
>> cpuidle framework insert the 0th with the poll state (reminder : only for
>> x86).
>
> Appears to me part of the problem is right there, the intel_idle and
> proessor_idle should register the poll state themselves. That
> immediately makes this weirdness go away.
>
> Registering states from two places is not something that's sane or
> desired I think.

I fully agree. I did a patchset in this direction but I realized the 
poll state most of the cases was not used.

>> https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/tree/drivers/cpuidle/driver.c#n195
>>
>> If you look at the ladder governor (which I believe nobody is still using
>> it), or at the menu governor, all the indexes begin at
>> CPUIDLE_DRIVER_STATE_START, so all the code is preventing to use the 0th
>> state ... :)
>>
>> So why is needed the poll state ?
>>
>> 1. When the latency_req is 0 (it returns 0, so the poll state)
>
> Right, that makes sense.
>
>> 2. When the select's menu governor fails to find a state *and* if the next
>> timer is before 5us
>
> That seems rather arbitrary.

Yeah, and I am wondering if this very particular optimization couldn't 
be just removed. Is there still a benefit ?

> Why would it fail to find a state?

To do short: it could fail to find a state fulfilling the constraints 
(some states could be disabled or the ).

>> https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/tree/drivers/cpuidle/driver.c#n195
>>
>> And when we investigate the same code but on the other archs, the
>> CPUIDLE_DRIVER_STATE_START dance makes things slightly different.
>>
>> So the conclusion is, we are inserting a state in the idle state array but
>> we do everything to prevent to use it :)
>>
>> For me it appears logical to just kill this state from the x86 idle drivers
>> and move it in the idle_mainloop in case an idle state selection fails.
>
> But why, ppc has a latency_req==0 state too, right?

Yes, this is why the arch_cpu_poll_idle, so ppc can override it with its 
optimized pooling loop (rep(); nop(); is x86).

> I agree that we should shoot STATE_START in the head, but I feel we
> should do it by fixing the state registration.

You can fix the state registration but that won't fix the STATE_START 
usage in the governors.

> I really don't get why the governors should know about this though, its
> just another state, they should iterate all states and pick the best,
> given the power usage this state should really never be eligible unless
> we're QoS forced or whatnot.

The governors just don't use the poll state at all, except for a couple 
of cases in menu.c defined above in the previous email. What is the 
rational of adding a state in the cpuidle driver and do everything we 
can to avoid using it ? From my POV, the poll state is a special state, 
we should remove from the driver's idle states like the arch_cpu_idle() 
is a specific idle state only used in idle.c (but which may overlap with 
an idle state in different archs eg. cpu_do_idle() and the 0th idle state).


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 2/6] sched: idle: cpuidle: Check the latency req before idle
  2014-11-10 17:19             ` Daniel Lezcano
@ 2014-11-10 19:48               ` Peter Zijlstra
  2014-11-10 22:21                 ` Daniel Lezcano
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2014-11-10 19:48 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: rjw, preeti, nicolas.pitre, linux-pm, linux-kernel,
	linaro-kernel, patches, lenb

On Mon, Nov 10, 2014 at 06:19:02PM +0100, Daniel Lezcano wrote:
> >I really don't get why the governors should know about this though, its
> >just another state, they should iterate all states and pick the best,
> >given the power usage this state should really never be eligible unless
> >we're QoS forced or whatnot.
> 
> The governors just don't use the poll state at all, except for a couple of
> cases in menu.c defined above in the previous email. What is the rational of
> adding a state in the cpuidle driver and do everything we can to avoid using
> it ? From my POV, the poll state is a special state, we should remove from
> the driver's idle states like the arch_cpu_idle() is a specific idle state
> only used in idle.c (but which may overlap with an idle state in different
> archs eg. cpu_do_idle() and the 0th idle state).

So I disagree, I think poll-idle is an idle mode just like all the
others. It should be an available state to the governor and it should
treat it like any other.

I don't tihnk the whole ARCH_HAS_CPU_RELAX thing makes any kind of
sense, _every_ arch has some definition of it, the generic polling loop
is always a valid idle implementation.

What we can do is always populate the idle state table with it before
calling the regular drivers.

If the arch drivers have a 'better' latency_req==0 idle routine -- note
my argument on the ppc issue, I think its wrong -- it can replace the
existing one.

We should further remove all the special casing in the governors, its
always a valid state, but it should hardly ever be the most desirable
state.

I think the whole arch specific idle loop is a mistake, we already have
an (arch) interface into the idle routines, we don't need yet another.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 2/6] sched: idle: cpuidle: Check the latency req before idle
  2014-11-10 19:48               ` Peter Zijlstra
@ 2014-11-10 22:21                 ` Daniel Lezcano
  2014-11-11 10:20                   ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Lezcano @ 2014-11-10 22:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, preeti, nicolas.pitre, linux-pm, linux-kernel,
	linaro-kernel, patches, lenb

On 11/10/2014 08:48 PM, Peter Zijlstra wrote:
> On Mon, Nov 10, 2014 at 06:19:02PM +0100, Daniel Lezcano wrote:
>>> I really don't get why the governors should know about this though, its
>>> just another state, they should iterate all states and pick the best,
>>> given the power usage this state should really never be eligible unless
>>> we're QoS forced or whatnot.
>>
>> The governors just don't use the poll state at all, except for a couple of
>> cases in menu.c defined above in the previous email. What is the rational of
>> adding a state in the cpuidle driver and do everything we can to avoid using
>> it ? From my POV, the poll state is a special state, we should remove from
>> the driver's idle states like the arch_cpu_idle() is a specific idle state
>> only used in idle.c (but which may overlap with an idle state in different
>> archs eg. cpu_do_idle() and the 0th idle state).
>
> So I disagree, I think poll-idle is an idle mode just like all the
> others. It should be an available state to the governor and it should
> treat it like any other.

The governors are just ignoring it, except for a small timer 
optimization in menu.c (and I am still not convinced it is worth to have 
it). I don't see the point to add a state we don't want to use.

Eg. on my server it was called 2 times over 1313856 times.

> I don't tihnk the whole ARCH_HAS_CPU_RELAX thing makes any kind of
> sense, _every_ arch has some definition of it, the generic polling loop
> is always a valid idle implementation.
>
> What we can do is always populate the idle state table with it before
> calling the regular drivers.

I am not sure to understand. You want to add the poll idle loop in all 
the drivers ?

What about "safe_halt()" ? (arch_cpu_idle() for x86). It is also an idle 
state. Why not add it in the idle state table also ?

> If the arch drivers have a 'better' latency_req==0 idle routine -- note
> my argument on the ppc issue, I think its wrong -- it can replace the
> existing one.
>
> We should further remove all the special casing in the governors, its
> always a valid state, but it should hardly ever be the most desirable
> state.
>
> I think the whole arch specific idle loop is a mistake, we already have
> an (arch) interface into the idle routines, we don't need yet another.



-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 2/6] sched: idle: cpuidle: Check the latency req before idle
  2014-11-10 22:21                 ` Daniel Lezcano
@ 2014-11-11 10:20                   ` Peter Zijlstra
  2014-11-12 13:53                     ` Daniel Lezcano
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2014-11-11 10:20 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: rjw, preeti, nicolas.pitre, linux-pm, linux-kernel,
	linaro-kernel, patches, lenb

On Mon, Nov 10, 2014 at 11:21:19PM +0100, Daniel Lezcano wrote:
> On 11/10/2014 08:48 PM, Peter Zijlstra wrote:
> >On Mon, Nov 10, 2014 at 06:19:02PM +0100, Daniel Lezcano wrote:
> >>>I really don't get why the governors should know about this though, its
> >>>just another state, they should iterate all states and pick the best,
> >>>given the power usage this state should really never be eligible unless
> >>>we're QoS forced or whatnot.
> >>
> >>The governors just don't use the poll state at all, except for a couple of
> >>cases in menu.c defined above in the previous email. What is the rational of
> >>adding a state in the cpuidle driver and do everything we can to avoid using
> >>it ? From my POV, the poll state is a special state, we should remove from
> >>the driver's idle states like the arch_cpu_idle() is a specific idle state
> >>only used in idle.c (but which may overlap with an idle state in different
> >>archs eg. cpu_do_idle() and the 0th idle state).
> >
> >So I disagree, I think poll-idle is an idle mode just like all the
> >others. It should be an available state to the governor and it should
> >treat it like any other.
> 
> The governors are just ignoring it, except for a small timer optimization in
> menu.c (and I am still not convinced it is worth to have it). I don't see
> the point to add a state we don't want to use.

The ignoring it is _wrong_. Make that go away and you'll get rid of most
of the STATE_START crap.

The governors are the place where we combine the QoS constraints with
idle predictors and pick an idle state, polling is a valid state to
pick, and given QoS constraints it might be the only state to pick.

> Eg. on my server it was called 2 times over 1313856 times.
> 
> >I don't tihnk the whole ARCH_HAS_CPU_RELAX thing makes any kind of
> >sense, _every_ arch has some definition of it, the generic polling loop
> >is always a valid idle implementation.
> >
> >What we can do is always populate the idle state table with it before
> >calling the regular drivers.
> 
> I am not sure to understand. You want to add the poll idle loop in all the
> drivers ?
> 
> What about "safe_halt()" ? (arch_cpu_idle() for x86). It is also an idle
> state. Why not add it in the idle state table also ?

Because the latter is actually arch specific, whereas the idle polling
thing is not. You can _always_ poll idle, its generic, its valid, and
its guaranteed the most responsive method.

The arch drivers get to add arch specific idle states; if a x86 cpuidle
driver wants to add hlt they can.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 1/6] sched: idle: Add a weak arch_cpu_idle_poll function
  2014-11-10 15:17       ` Peter Zijlstra
@ 2014-11-11 11:00         ` Preeti U Murthy
  0 siblings, 0 replies; 30+ messages in thread
From: Preeti U Murthy @ 2014-11-11 11:00 UTC (permalink / raw)
  To: Peter Zijlstra, Daniel Lezcano
  Cc: rjw, nicolas.pitre, linux-pm, linux-kernel, linaro-kernel, patches, lenb

On 11/10/2014 08:47 PM, Peter Zijlstra wrote:
> On Mon, Nov 10, 2014 at 07:50:22PM +0530, Preeti U Murthy wrote:
>> Hi Peter,
>>
>> On 11/10/2014 05:59 PM, Peter Zijlstra wrote:
>>> On Fri, Nov 07, 2014 at 03:31:22PM +0100, Daniel Lezcano wrote:
>>>> The poll function is called when a timer expired or if we force to poll when
>>>> the cpu_idle_force_poll option is set.
>>>>
>>>> The poll function does:
>>>>
>>>>        local_irq_enable();
>>>>        while (!tif_need_resched())
>>>>                cpu_relax();
>>>>
>>>> This default poll function suits for the x86 arch because its rep; nop;
>>>> hardware power optimization. But on other archs, this optimization does not
>>>> exists and we are not saving power. The arch specific bits may want to
>>>> optimize this loop by adding their own optimization.
>>>
>>> This doesn't make sense to me; should an arch not either implement an
>>> actual idle driver or implement cpu_relax() properly, why allow for a
>>> third weird option?
>>>
>>
>> The previous version of this patch simply invoked cpu_idle_loop() for
>> cases where latency_req was 0. This would have changed the behavior
>> on PowerPC wherein earlier the 0th idle index was returned which is also
>> a polling loop but differs from cpu_idle_loop() in two ways:
>>
>> a. It polls at a relatively lower power state than cpu_relax().
>> b. We set certain registers to indicate that the cpu is idle.
> 
> So I'm confused; the current code runs the generic cpu_relax idle poll
> loop for the broadcast case. I suppose you want to retain this because
> not doing your a-b above will indeed give you a lower latency.
> 
> Therefore one could argue that latency_req==0 should indeed use this,
> and your a-b idle state should be latency_req==1 or higher.
> 
> Thus yes it changes behaviour, but I think it actually fixes something.
> You cannot have a latency_req==0 state which has higher latency than the
> actual polling loop, as you appear to have.

Yes you are right. This is fixing the current behavior. So we should be
good to call cpuidle_idle_loop() when latency_req=0.

> 
>> Hence for all such cases wherein the cpu is required to poll while idle
>> (only for cases such as force poll, broadcast ipi to arrive soon and
>> latency_req = 0), we should be able to call into cpuidle_idle_loop()
>> only if the cpuidle driver's 0th idle state has an exit_latency > 0.
>> (The 0th idle state is expected to be a polling loop with
>> exit_latency = 0).
>>
>> If otherwise, it would mean the driver has an optimized polling loop
>> when idle. But instead of adding in the logic of checking the
>> exit_latency, we thought it would be simpler to call into an arch
>> defined polling idle loop under the above circumstances. If that is no
>> better we could fall back to cpuidle_idle_loop().
> 
> That still doesn't make sense to me; suppose the implementation of this
> special poll state differs on different uarchs for the same arch, then
> we'll end up with another registration and selection interface parallel
> to cpuidle.

Yeah this will only get complicated. I was trying to see how to keep the
current behavior unchanged but looks like it is not worth it.

Thanks

Regards
Preeti U Murthy
> 
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 2/6] sched: idle: cpuidle: Check the latency req before idle
  2014-11-11 10:20                   ` Peter Zijlstra
@ 2014-11-12 13:53                     ` Daniel Lezcano
  2014-11-12 15:02                       ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Lezcano @ 2014-11-12 13:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, preeti, nicolas.pitre, linux-pm, linux-kernel,
	linaro-kernel, patches, lenb

On 11/11/2014 11:20 AM, Peter Zijlstra wrote:
> On Mon, Nov 10, 2014 at 11:21:19PM +0100, Daniel Lezcano wrote:
>> On 11/10/2014 08:48 PM, Peter Zijlstra wrote:
>>> On Mon, Nov 10, 2014 at 06:19:02PM +0100, Daniel Lezcano wrote:
>>>>> I really don't get why the governors should know about this though, its
>>>>> just another state, they should iterate all states and pick the best,
>>>>> given the power usage this state should really never be eligible unless
>>>>> we're QoS forced or whatnot.
>>>>
>>>> The governors just don't use the poll state at all, except for a couple of
>>>> cases in menu.c defined above in the previous email. What is the rational of
>>>> adding a state in the cpuidle driver and do everything we can to avoid using
>>>> it ? From my POV, the poll state is a special state, we should remove from
>>>> the driver's idle states like the arch_cpu_idle() is a specific idle state
>>>> only used in idle.c (but which may overlap with an idle state in different
>>>> archs eg. cpu_do_idle() and the 0th idle state).
>>>
>>> So I disagree, I think poll-idle is an idle mode just like all the
>>> others. It should be an available state to the governor and it should
>>> treat it like any other.
>>
>> The governors are just ignoring it, except for a small timer optimization in
>> menu.c (and I am still not convinced it is worth to have it). I don't see
>> the point to add a state we don't want to use.
>
> The ignoring it is _wrong_. Make that go away and you'll get rid of most
> of the STATE_START crap.
>
> The governors are the place where we combine the QoS constraints with
> idle predictors and pick an idle state, polling is a valid state to
> pick, and given QoS constraints it might be the only state to pick.

Well, I understand your point of view but I still disagree. IMO, the 
poll loop shouldn't be considered as a valid idle state for different 
reasons:

0. thermal issue if wrongly selected from any of the governor

1. a polling loop does not have a valid time measurements even if the 
TIME_VALID flag has been added

2. entering the idle governors is not free, especially the menu 
governor, which is contradictory with zero latency req

3. what is the meaning of having a zero latency (target + exit) idle 
state ? We know it will always succeed if the other fails

4. IIUC, you are suggesting to add the poll loop for all the cpuidle 
drivers. This is a *lot* of changes, I am not afraid about the work to 
do but there is a significant code impact and the entire behavior of the 
cpuidle framework for all the arch will be changed.

So given the points above, why not have one poll function, generic, and 
if we fail to find an idle state or if the req is zero, then fallback to 
the poll loop ?


>> Eg. on my server it was called 2 times over 1313856 times.
>>
>>> I don't tihnk the whole ARCH_HAS_CPU_RELAX thing makes any kind of
>>> sense, _every_ arch has some definition of it, the generic polling loop
>>> is always a valid idle implementation.
>>>
>>> What we can do is always populate the idle state table with it before
>>> calling the regular drivers.
>>
>> I am not sure to understand. You want to add the poll idle loop in all the
>> drivers ?
>>
>> What about "safe_halt()" ? (arch_cpu_idle() for x86). It is also an idle
>> state. Why not add it in the idle state table also ?
>
> Because the latter is actually arch specific, whereas the idle polling
> thing is not.
> You can _always_ poll idle, its generic, its valid, and
> its guaranteed the most responsive method.

I agree with this point but this kind of loop is hardware optimized for 
x86. On the other arch, where today this loop is never used, if we 
change the behavior of the cpuidle framework and introduces this loop, 
it may be selected and stay on this state for a long time (resulting 
from a bad prediction), I am afraid that can lead to some thermal issues.

> The arch drivers get to add arch specific idle states; if a x86 cpuidle
> driver wants to add hlt they can.




-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 2/6] sched: idle: cpuidle: Check the latency req before idle
  2014-11-12 13:53                     ` Daniel Lezcano
@ 2014-11-12 15:02                       ` Peter Zijlstra
  2014-11-12 17:52                         ` Daniel Lezcano
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2014-11-12 15:02 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: rjw, preeti, nicolas.pitre, linux-pm, linux-kernel,
	linaro-kernel, patches, lenb

On Wed, Nov 12, 2014 at 02:53:10PM +0100, Daniel Lezcano wrote:

> >>The governors are just ignoring it, except for a small timer optimization in
> >>menu.c (and I am still not convinced it is worth to have it). I don't see
> >>the point to add a state we don't want to use.
> >
> >The ignoring it is _wrong_. Make that go away and you'll get rid of most
> >of the STATE_START crap.
> >
> >The governors are the place where we combine the QoS constraints with
> >idle predictors and pick an idle state, polling is a valid state to
> >pick, and given QoS constraints it might be the only state to pick.
> 
> Well, I understand your point of view but I still disagree. IMO, the poll
> loop shouldn't be considered as a valid idle state for different reasons:
> 
> 0. thermal issue if wrongly selected from any of the governor

That seems like a QoS issue and should be fixed there, no?

> 1. a polling loop does not have a valid time measurements even if the
> TIME_VALID flag has been added

Ah, right you are. It does not. We _could_ fix that, not sure its worth
the hassle but see below.

> 2. entering the idle governors is not free, especially the menu governor,
> which is contradictory with zero latency req

Well, you could add this 'fast' path to the cpuidle code (before calling
into the actual governors) too. Also, since the governors actually use
this state it makes sense for it to be available.

> 3. what is the meaning of having a zero latency (target + exit) idle state ?
> We know it will always succeed if the other fails

Not quite sure I follow; you seem to have answered your own question?

> 4. IIUC, you are suggesting to add the poll loop for all the cpuidle
> drivers. This is a *lot* of changes, I am not afraid about the work to do
> but there is a significant code impact and the entire behavior of the
> cpuidle framework for all the arch will be changed.

I'm not sure it would be a lot of work, add it in the common cpuidle
code before calling the driver init?

> So given the points above, why not have one poll function, generic, and if
> we fail to find an idle state or if the req is zero, then fallback to the
> poll loop ?

I could agree but only if we keep the poll loop generic, we cannot go
add yet more arch hooks there.

> >Because the latter is actually arch specific, whereas the idle
> >polling thing is not.  You can _always_ poll idle, its generic, its
> >valid, and its guaranteed the most responsive method.
> 
> I agree with this point but this kind of loop is hardware optimized for x86.

well 'optimized' is a strong word there, the rep nop, or pause
instruction isn't really much at all and is mostly a SMT hint afaik.
ia64, sparc64 and ppc64 have similar magic and s390 and hexagon use a vm
yield like construct.

> On the other arch, where today this loop is never used, if we change the
> behavior of the cpuidle framework and introduces this loop, it may be
> selected and stay on this state for a long time (resulting from a bad
> prediction), I am afraid that can lead to some thermal issues.

Because of 1), right? Yes that's a problem.


---
 kernel/sched/idle.c | 6 +++++-
 kernel/softirq.c    | 7 +++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index c47fce75e666..9c76ae92650f 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -42,12 +42,16 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
 __setup("hlt", cpu_idle_nopoll_setup);
 #endif
 
+DEFINE_PER_CPU(unsigned int, int_seq);
+
 static inline int cpu_idle_poll(void)
 {
+	unsigned int seq = this_cpu_read(int_seq);
+
 	rcu_idle_enter();
 	trace_cpu_idle_rcuidle(0, smp_processor_id());
 	local_irq_enable();
-	while (!tif_need_resched())
+	while (!tif_need_resched() && seq == this_cpu_read(int_seq))
 		cpu_relax();
 	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
 	rcu_idle_exit();
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 0699add19164..bc8d43545964 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -370,6 +370,8 @@ static inline void tick_irq_exit(void)
 #endif
 }
 
+DECLARE_PER_CPU(unsigned int, int_seq);
+
 /*
  * Exit an interrupt context. Process softirqs if needed and possible:
  */
@@ -386,6 +388,11 @@ void irq_exit(void)
 	if (!in_interrupt() && local_softirq_pending())
 		invoke_softirq();
 
+#ifdef TIG_POLLING_NRFLAG
+	if (test_thread_flag(TIF_POLLING_NRFLAG))
+#endif
+		this_cpu_inc(int_seq);
+
 	tick_irq_exit();
 	rcu_irq_exit();
 	trace_hardirq_exit(); /* must be last! */

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH V3 2/6] sched: idle: cpuidle: Check the latency req before idle
  2014-11-12 15:02                       ` Peter Zijlstra
@ 2014-11-12 17:52                         ` Daniel Lezcano
  0 siblings, 0 replies; 30+ messages in thread
From: Daniel Lezcano @ 2014-11-12 17:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, preeti, nicolas.pitre, linux-pm, linux-kernel,
	linaro-kernel, patches, lenb

On 11/12/2014 04:02 PM, Peter Zijlstra wrote:
> On Wed, Nov 12, 2014 at 02:53:10PM +0100, Daniel Lezcano wrote:
>
>>>> The governors are just ignoring it, except for a small timer optimization in
>>>> menu.c (and I am still not convinced it is worth to have it). I don't see
>>>> the point to add a state we don't want to use.
>>>
>>> The ignoring it is _wrong_. Make that go away and you'll get rid of most
>>> of the STATE_START crap.
>>>
>>> The governors are the place where we combine the QoS constraints with
>>> idle predictors and pick an idle state, polling is a valid state to
>>> pick, and given QoS constraints it might be the only state to pick.
>>
>> Well, I understand your point of view but I still disagree. IMO, the poll
>> loop shouldn't be considered as a valid idle state for different reasons:
>>
>> 0. thermal issue if wrongly selected from any of the governor
>
> That seems like a QoS issue and should be fixed there, no?

It could be seen as a QoS issue if we stick the poll loop with the zero 
latency req. But as soon as we introduce the poll loop in the cpuidle 
framework, the issue could have multiple sources (see below at the end 
of the message).

>> 1. a polling loop does not have a valid time measurements even if the
>> TIME_VALID flag has been added
>
> Ah, right you are. It does not. We _could_ fix that, not sure its worth
> the hassle but see below.
>
>> 2. entering the idle governors is not free, especially the menu governor,
>> which is contradictory with zero latency req
>
> Well, you could add this 'fast' path to the cpuidle code (before calling
> into the actual governors) too. Also, since the governors actually use
> this state it makes sense for it to be available.

Actually the governors do not use this state or do whatever they can to 
prevent to use it. The ladder governor just ignore it, and the menu 
governor will default to C1 (not poll) if no state is found. It will use 
the poll loop only if a timer is about to expire within 5us and all this 
dance is around this micro optimization.

You are suggesting to add the fast in the cpuidle framework. IIUC we 
should have:

int cpuidle_select(drv, dev)
{
	...
	if (latency_req == 0)
		return ????;
}

The '????' suggests we duplicate everywhere an 0th idle state (more than 
17 drivers and that breaks the idle states DT binding.

>> 3. what is the meaning of having a zero latency (target + exit) idle state ?
>> We know it will always succeed if the other fails
>
> Not quite sure I follow; you seem to have answered your own question?

Yeah, right :)

>> 4. IIUC, you are suggesting to add the poll loop for all the cpuidle
>> drivers. This is a *lot* of changes, I am not afraid about the work to do
>> but there is a significant code impact and the entire behavior of the
>> cpuidle framework for all the arch will be changed.
>
> I'm not sure it would be a lot of work, add it in the common cpuidle
> code before calling the driver init?



>> So given the points above, why not have one poll function, generic, and if
>> we fail to find an idle state or if the req is zero, then fallback to the
>> poll loop ?
>
> I could agree but only if we keep the poll loop generic, we cannot go
> add yet more arch hooks there.

I understand. The arch hook just give an opportunity to specify an 
arch's specific poll loop, that does not imply it has to.

>>> Because the latter is actually arch specific, whereas the idle
>>> polling thing is not.  You can _always_ poll idle, its generic, its
>>> valid, and its guaranteed the most responsive method.
>>
>> I agree with this point but this kind of loop is hardware optimized for x86.
>
> well 'optimized' is a strong word there, the rep nop, or pause
> instruction isn't really much at all and is mostly a SMT hint afaik.
> ia64, sparc64 and ppc64 have similar magic and s390 and hexagon use a vm
> yield like construct.



>> On the other arch, where today this loop is never used, if we change the
>> behavior of the cpuidle framework and introduces this loop, it may be
>> selected and stay on this state for a long time (resulting from a bad
>> prediction), I am afraid that can lead to some thermal issues.
>
> Because of 1), right? Yes that's a problem.

Well, 1) won't help, but also for different reasons:

By removing the STATE_START macro and adding the poll as the 0th idle state:

ladder governor: it was never, ever, using the poll loop, just ignoring 
the 0th idle state by using the STATE_START macro to stop demoting. That 
won't be the case anymore, and we may fall in the poll loop

menu governor: it was almost never using the poll state. With the change 
it will select the state. If the prediction is bad and the poll idle 
loop is selected but the cpu stays idle longer (that could be several 
seconds). That can happens with a bad predictions or task migration when 
this one is doing a lot of IO.

On the ARM embedded systems where the poll loop is not used, except when 
specified in the kernel command line. That may lead to some thermal 
issues and big troubles.

I like your patch below that will help to re-evaluate the idle state but 
I am not sure it will solve the issue introduced with the generic idle 
loop as the 0th idle state.

Thanks Peter by taking the time to review this patch.

   -- Daniel

> ---
>   kernel/sched/idle.c | 6 +++++-
>   kernel/softirq.c    | 7 +++++++
>   2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index c47fce75e666..9c76ae92650f 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -42,12 +42,16 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
>   __setup("hlt", cpu_idle_nopoll_setup);
>   #endif
>
> +DEFINE_PER_CPU(unsigned int, int_seq);
> +
>   static inline int cpu_idle_poll(void)
>   {
> +	unsigned int seq = this_cpu_read(int_seq);
> +
>   	rcu_idle_enter();
>   	trace_cpu_idle_rcuidle(0, smp_processor_id());
>   	local_irq_enable();
> -	while (!tif_need_resched())
> +	while (!tif_need_resched() && seq == this_cpu_read(int_seq))
>   		cpu_relax();
>   	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
>   	rcu_idle_exit();
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 0699add19164..bc8d43545964 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -370,6 +370,8 @@ static inline void tick_irq_exit(void)
>   #endif
>   }
>
> +DECLARE_PER_CPU(unsigned int, int_seq);
> +
>   /*
>    * Exit an interrupt context. Process softirqs if needed and possible:
>    */
> @@ -386,6 +388,11 @@ void irq_exit(void)
>   	if (!in_interrupt() && local_softirq_pending())
>   		invoke_softirq();
>
> +#ifdef TIG_POLLING_NRFLAG
> +	if (test_thread_flag(TIF_POLLING_NRFLAG))
> +#endif
> +		this_cpu_inc(int_seq);
> +
>   	tick_irq_exit();
>   	rcu_irq_exit();
>   	trace_hardirq_exit(); /* must be last! */
>


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2014-11-12 17:52 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-07 14:31 [PATCH V3 0/6] sched: idle: cpuidle: cleanups and fixes Daniel Lezcano
2014-11-07 14:31 ` [PATCH V3 1/6] sched: idle: Add a weak arch_cpu_idle_poll function Daniel Lezcano
2014-11-08 10:39   ` Preeti U Murthy
2014-11-10 12:29   ` Peter Zijlstra
2014-11-10 14:20     ` Preeti U Murthy
2014-11-10 15:17       ` Peter Zijlstra
2014-11-11 11:00         ` Preeti U Murthy
2014-11-07 14:31 ` [PATCH V3 2/6] sched: idle: cpuidle: Check the latency req before idle Daniel Lezcano
2014-11-08 10:40   ` Preeti U Murthy
2014-11-10 12:41   ` Peter Zijlstra
2014-11-10 15:12     ` Daniel Lezcano
2014-11-10 15:28       ` Peter Zijlstra
2014-11-10 15:58         ` Daniel Lezcano
2014-11-10 16:15           ` Peter Zijlstra
2014-11-10 17:19             ` Daniel Lezcano
2014-11-10 19:48               ` Peter Zijlstra
2014-11-10 22:21                 ` Daniel Lezcano
2014-11-11 10:20                   ` Peter Zijlstra
2014-11-12 13:53                     ` Daniel Lezcano
2014-11-12 15:02                       ` Peter Zijlstra
2014-11-12 17:52                         ` Daniel Lezcano
2014-11-07 14:31 ` [PATCH V3 3/6] sched: idle: Get the next timer event and pass it the cpuidle framework Daniel Lezcano
2014-11-08 10:44   ` Preeti U Murthy
2014-11-10 12:43   ` Peter Zijlstra
2014-11-10 15:15     ` Daniel Lezcano
2014-11-07 14:31 ` [PATCH V3 4/6] cpuidle: idle: menu: Don't reflect when a state selection failed Daniel Lezcano
2014-11-08 10:41   ` Preeti U Murthy
2014-11-07 14:31 ` [PATCH V3 5/6] cpuidle: menu: Fix the get_typical_interval Daniel Lezcano
2014-11-07 14:31 ` [PATCH V3 6/6] cpuidle: menu: Move the update function before its declaration Daniel Lezcano
2014-11-07 14:34 ` [PATCH V3 0/6] sched: idle: cpuidle: cleanups and fixes Daniel Lezcano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).