* [PATCH v2 0/2] sched/idle : find the best idle CPU with cpuidle info
@ 2014-09-04 15:32 Nicolas Pitre
  2014-09-04 15:32 ` [PATCH v2 1/2] sched: let the scheduler see CPU idle states Nicolas Pitre
                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Nicolas Pitre @ 2014-09-04 15:32 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Daniel Lezcano, Rafael J. Wysocki, linux-pm, linux-kernel, linaro-kernel

This is a rework of the series initially posted by Daniel Lezcano here:

http://news.gmane.org/group/gmane.linux.power-management.general/thread=44161

Those patches were straightened up, commit logs are more comprehensive,
bugs were fixed, etc.


 drivers/cpuidle/cpuidle.c |  6 ++++++
 kernel/sched/fair.c       | 43 ++++++++++++++++++++++++++++++++++-------
 kernel/sched/idle.c       |  6 ++++++
 kernel/sched/sched.h      | 39 +++++++++++++++++++++++++++++++++++++
 4 files changed, 87 insertions(+), 7 deletions(-)


Nicolas



* [PATCH v2 1/2] sched: let the scheduler see CPU idle states
  2014-09-04 15:32 [PATCH v2 0/2] sched/idle : find the best idle CPU with cpuidle info Nicolas Pitre
@ 2014-09-04 15:32 ` Nicolas Pitre
  2014-09-18 17:37   ` Paul E. McKenney
  2014-09-04 15:32 ` [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu Nicolas Pitre
  2014-09-10 21:35 ` [PATCH v2 0/2] sched/idle : find the best idle CPU with cpuidle info Nicolas Pitre
  2 siblings, 1 reply; 35+ messages in thread
From: Nicolas Pitre @ 2014-09-04 15:32 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Daniel Lezcano, Rafael J. Wysocki, linux-pm, linux-kernel, linaro-kernel

From: Daniel Lezcano <daniel.lezcano@linaro.org>

When the cpu enters idle, it stores the cpuidle state pointer in its
struct rq instance which in turn could be used to make a better decision
when balancing tasks.

As soon as the cpu exits its idle state, the struct rq reference is
cleared.

There are a couple of situations where the idle state pointer could be changed
while it is being consulted:

1. For x86/acpi with dynamic c-states, a laptop switching from battery
   to AC could result in the removal of the deepest idle state. The acpi driver
   triggers:
	'acpi_processor_cst_has_changed'
		'cpuidle_pause_and_lock'
			'cpuidle_uninstall_idle_handler'
				'kick_all_cpus_sync'.

All cpus will exit their idle state and the pointed object will be set to
NULL.

2. The cpuidle driver is unloaded. Logically that could happen but not
in practice because the drivers are always compiled in and 95% of them are
not coded to unregister themselves.  In any case, the unloading code must
call 'cpuidle_unregister_device', that calls 'cpuidle_pause_and_lock'
leading to 'kick_all_cpus_sync' as mentioned above.

A race can happen if we use the pointer and then one of these two scenarios
occurs at the same moment.

In order to be safe, the idle state pointer stored in the rq must be
used inside an rcu_read_lock section, where we are protected by the
'rcu_barrier' in the 'cpuidle_uninstall_idle_handler' function. The
idle_get_state() and cpuidle_put_state() accessors should be used to that
effect.
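
For illustration, a minimal sketch of how a consumer is expected to
pair the accessors (hypothetical helper, modeled on the usage in patch
2/2; cpu_exit_latency() itself is not part of this series):

	static unsigned int cpu_exit_latency(int cpu)
	{
		struct rq *rq = cpu_rq(cpu);
		/* idle_get_state() takes the RCU read lock for us. */
		struct cpuidle_state *idle = idle_get_state(rq);
		unsigned int latency = idle ? idle->exit_latency : 0;

		/*
		 * cpuidle_put_state() drops the RCU read lock; 'idle'
		 * must not be dereferenced past this point.
		 */
		cpuidle_put_state(rq);
		return latency;
	}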

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
---
 drivers/cpuidle/cpuidle.c |  6 ++++++
 kernel/sched/idle.c       |  6 ++++++
 kernel/sched/sched.h      | 39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 51 insertions(+)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index ee9df5e3f5..530e3055a2 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -225,6 +225,12 @@ void cpuidle_uninstall_idle_handler(void)
 		initialized = 0;
 		kick_all_cpus_sync();
 	}
+
+	/*
+	 * Make sure external observers (such as the scheduler)
+	 * are done looking at pointed idle states.
+	 */
+	rcu_barrier();
 }
 
 /**
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 11e7bc434f..c47fce75e6 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -147,6 +147,9 @@ use_default:
 	    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &dev->cpu))
 		goto use_default;
 
+	/* Take note of the planned idle state. */
+	idle_set_state(this_rq(), &drv->states[next_state]);
+
 	/*
 	 * Enter the idle state previously returned by the governor decision.
 	 * This function will block until an interrupt occurs and will take
@@ -154,6 +157,9 @@ use_default:
 	 */
 	entered_state = cpuidle_enter(drv, dev, next_state);
 
+	/* The cpu is no longer idle or about to enter idle. */
+	idle_set_state(this_rq(), NULL);
+
 	if (broadcast)
 		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &dev->cpu);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 579712f4e9..aea8baa7a5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -14,6 +14,7 @@
 #include "cpuacct.h"
 
 struct rq;
+struct cpuidle_state;
 
 extern __read_mostly int scheduler_running;
 
@@ -636,6 +637,11 @@ struct rq {
 #ifdef CONFIG_SMP
 	struct llist_head wake_list;
 #endif
+
+#ifdef CONFIG_CPU_IDLE
+	/* Must be inspected within a rcu lock section */
+	struct cpuidle_state *idle_state;
+#endif
 };
 
 static inline int cpu_of(struct rq *rq)
@@ -1180,6 +1186,39 @@ static inline void idle_exit_fair(struct rq *rq) { }
 
 #endif
 
+#ifdef CONFIG_CPU_IDLE
+static inline void idle_set_state(struct rq *rq,
+				  struct cpuidle_state *idle_state)
+{
+	rq->idle_state = idle_state;
+}
+
+static inline struct cpuidle_state *idle_get_state(struct rq *rq)
+{
+	rcu_read_lock();
+	return rq->idle_state;
+}
+
+static inline void cpuidle_put_state(struct rq *rq)
+{
+	rcu_read_unlock();
+}
+#else
+static inline void idle_set_state(struct rq *rq,
+				  struct cpuidle_state *idle_state)
+{
+}
+
+static inline struct cpuidle_state *idle_get_state(struct rq *rq)
+{
+	return NULL;
+}
+
+static inline void cpuidle_put_state(struct rq *rq)
+{
+}
+#endif
+
 extern void sysrq_sched_debug_show(void);
 extern void sched_init_granularity(void);
 extern void update_max_interval(void);
-- 
1.8.4.108.g55ea5f6



* [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu
  2014-09-04 15:32 [PATCH v2 0/2] sched/idle : find the best idle CPU with cpuidle info Nicolas Pitre
  2014-09-04 15:32 ` [PATCH v2 1/2] sched: let the scheduler see CPU idle states Nicolas Pitre
@ 2014-09-04 15:32 ` Nicolas Pitre
  2014-09-05  7:52   ` Daniel Lezcano
                     ` (4 more replies)
  2014-09-10 21:35 ` [PATCH v2 0/2] sched/idle : find the best idle CPU with cpuidle info Nicolas Pitre
  2 siblings, 5 replies; 35+ messages in thread
From: Nicolas Pitre @ 2014-09-04 15:32 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Daniel Lezcano, Rafael J. Wysocki, linux-pm, linux-kernel, linaro-kernel

The code in find_idlest_cpu() looks for the CPU with the smallest load.
However, if multiple CPUs are idle, the first idle CPU is selected
irrespective of the depth of its idle state.

Among the idle CPUs we should pick the one with the shallowest idle
state, or the latest to have gone idle if all idle CPUs are in the same
state.  The latter applies even when cpuidle is configured out.

This patch doesn't cover the following issues:

- The idle exit latency of a CPU might be larger than the time needed
  to migrate the waking task to an already running CPU with sufficient
  capacity, and therefore performance would benefit from task packing
  in such a case (in most cases task packing is about power saving).

- Some idle states have a non-negligible and non-abortable entry latency
  which needs to run to completion before the exit latency can start.
  A concurrent patch series is making this info available to the cpuidle
  core.  Once available, the entry latency combined with the idle timestamp could
  determine when the exit latency may be effective.

Those issues will be handled in due course.  In the meantime, what
is implemented here should already improve things compared to the current
state of affairs.

Based on an initial patch from Daniel Lezcano.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
---
 kernel/sched/fair.c | 43 ++++++++++++++++++++++++++++++++++++-------
 1 file changed, 36 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bfa3c86d0d..416329e1a6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -23,6 +23,7 @@
 #include <linux/latencytop.h>
 #include <linux/sched.h>
 #include <linux/cpumask.h>
+#include <linux/cpuidle.h>
 #include <linux/slab.h>
 #include <linux/profile.h>
 #include <linux/interrupt.h>
@@ -4428,20 +4429,48 @@ static int
 find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 {
 	unsigned long load, min_load = ULONG_MAX;
-	int idlest = -1;
+	unsigned int min_exit_latency = UINT_MAX;
+	u64 latest_idle_timestamp = 0;
+	int least_loaded_cpu = this_cpu;
+	int shallowest_idle_cpu = -1;
 	int i;
 
 	/* Traverse only the allowed CPUs */
 	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
-		load = weighted_cpuload(i);
-
-		if (load < min_load || (load == min_load && i == this_cpu)) {
-			min_load = load;
-			idlest = i;
+		if (idle_cpu(i)) {
+			struct rq *rq = cpu_rq(i);
+			struct cpuidle_state *idle = idle_get_state(rq);
+			if (idle && idle->exit_latency < min_exit_latency) {
+				/*
+				 * We give priority to a CPU whose idle state
+				 * has the smallest exit latency irrespective
+				 * of any idle timestamp.
+				 */
+				min_exit_latency = idle->exit_latency;
+				latest_idle_timestamp = rq->idle_stamp;
+				shallowest_idle_cpu = i;
+			} else if ((!idle || idle->exit_latency == min_exit_latency) &&
+				   rq->idle_stamp > latest_idle_timestamp) {
+				/*
+				 * If equal or no active idle state, then
+				 * the most recently idled CPU might have
+				 * a warmer cache.
+				 */
+				latest_idle_timestamp = rq->idle_stamp;
+				shallowest_idle_cpu = i;
+			}
+			cpuidle_put_state(rq);
+		} else {
+			load = weighted_cpuload(i);
+			if (load < min_load ||
+			    (load == min_load && i == this_cpu)) {
+				min_load = load;
+				least_loaded_cpu = i;
+			}
 		}
 	}
 
-	return idlest;
+	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
 }
 
 /*
-- 
1.8.4.108.g55ea5f6



* Re: [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu
  2014-09-04 15:32 ` [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu Nicolas Pitre
@ 2014-09-05  7:52   ` Daniel Lezcano
  2014-09-18 23:46   ` Peter Zijlstra
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 35+ messages in thread
From: Daniel Lezcano @ 2014-09-05  7:52 UTC (permalink / raw)
  To: Nicolas Pitre, Peter Zijlstra, Ingo Molnar
  Cc: Rafael J. Wysocki, linux-pm, linux-kernel, linaro-kernel

On 09/04/2014 05:32 PM, Nicolas Pitre wrote:
> The code in find_idlest_cpu() looks for the CPU with the smallest load.
> However, if multiple CPUs are idle, the first idle CPU is selected
> irrespective of the depth of its idle state.
>
> Among the idle CPUs we should pick the one with the shallowest idle
> state, or the latest to have gone idle if all idle CPUs are in the same
> state.  The latter applies even when cpuidle is configured out.
>
> This patch doesn't cover the following issues:
>
> - The idle exit latency of a CPU might be larger than the time needed
>    to migrate the waking task to an already running CPU with sufficient
>    capacity, and therefore performance would benefit from task packing
>    in such a case (in most cases task packing is about power saving).
>
> - Some idle states have a non-negligible and non-abortable entry latency
>    which needs to run to completion before the exit latency can start.
>    A concurrent patch series is making this info available to the cpuidle
>    core.  Once available, the entry latency combined with the idle timestamp could
>    determine when the exit latency may be effective.
>
> Those issues will be handled in due course.  In the meantime, what
> is implemented here should already improve things compared to the current
> state of affairs.
>
> Based on an initial patch from Daniel Lezcano.
>
> Signed-off-by: Nicolas Pitre <nico@linaro.org>

Sounds good to me.

Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>

> ---
>   kernel/sched/fair.c | 43 ++++++++++++++++++++++++++++++++++++-------
>   1 file changed, 36 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bfa3c86d0d..416329e1a6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -23,6 +23,7 @@
>   #include <linux/latencytop.h>
>   #include <linux/sched.h>
>   #include <linux/cpumask.h>
> +#include <linux/cpuidle.h>
>   #include <linux/slab.h>
>   #include <linux/profile.h>
>   #include <linux/interrupt.h>
> @@ -4428,20 +4429,48 @@ static int
>   find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
>   {
>   	unsigned long load, min_load = ULONG_MAX;
> -	int idlest = -1;
> +	unsigned int min_exit_latency = UINT_MAX;
> +	u64 latest_idle_timestamp = 0;
> +	int least_loaded_cpu = this_cpu;
> +	int shallowest_idle_cpu = -1;
>   	int i;
>
>   	/* Traverse only the allowed CPUs */
>   	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
> -		load = weighted_cpuload(i);
> -
> -		if (load < min_load || (load == min_load && i == this_cpu)) {
> -			min_load = load;
> -			idlest = i;
> +		if (idle_cpu(i)) {
> +			struct rq *rq = cpu_rq(i);
> +			struct cpuidle_state *idle = idle_get_state(rq);
> +			if (idle && idle->exit_latency < min_exit_latency) {
> +				/*
> +				 * We give priority to a CPU whose idle state
> +				 * has the smallest exit latency irrespective
> +				 * of any idle timestamp.
> +				 */
> +				min_exit_latency = idle->exit_latency;
> +				latest_idle_timestamp = rq->idle_stamp;
> +				shallowest_idle_cpu = i;
> +			} else if ((!idle || idle->exit_latency == min_exit_latency) &&
> +				   rq->idle_stamp > latest_idle_timestamp) {
> +				/*
> +				 * If equal or no active idle state, then
> +				 * the most recently idled CPU might have
> +				 * a warmer cache.
> +				 */
> +				latest_idle_timestamp = rq->idle_stamp;
> +				shallowest_idle_cpu = i;
> +			}
> +			cpuidle_put_state(rq);
> +		} else {
> +			load = weighted_cpuload(i);
> +			if (load < min_load ||
> +			    (load == min_load && i == this_cpu)) {
> +				min_load = load;
> +				least_loaded_cpu = i;
> +			}
>   		}
>   	}
>
> -	return idlest;
> +	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
>   }
>
>   /*
>


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog



* Re: [PATCH v2 0/2] sched/idle : find the best idle CPU with cpuidle info
  2014-09-04 15:32 [PATCH v2 0/2] sched/idle : find the best idle CPU with cpuidle info Nicolas Pitre
  2014-09-04 15:32 ` [PATCH v2 1/2] sched: let the scheduler see CPU idle states Nicolas Pitre
  2014-09-04 15:32 ` [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu Nicolas Pitre
@ 2014-09-10 21:35 ` Nicolas Pitre
  2014-09-10 22:50   ` Rafael J. Wysocki
  2014-09-18  0:39   ` Nicolas Pitre
  2 siblings, 2 replies; 35+ messages in thread
From: Nicolas Pitre @ 2014-09-10 21:35 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Rafael J. Wysocki, linux-kernel, linaro-kernel, linux-pm


Ping.


On Thu, 4 Sep 2014, Nicolas Pitre wrote:

> This is a rework of the series initially posted by Daniel Lezcano here:
> 
> http://news.gmane.org/group/gmane.linux.power-management.general/thread=44161
> 
> Those patches were straightened up, commit logs are more comprehensive,
> bugs were fixed, etc.
> 
> 
>  drivers/cpuidle/cpuidle.c |  6 ++++++
>  kernel/sched/fair.c       | 43 ++++++++++++++++++++++++++++++++++-------
>  kernel/sched/idle.c       |  6 ++++++
>  kernel/sched/sched.h      | 39 +++++++++++++++++++++++++++++++++++++
>  4 files changed, 87 insertions(+), 7 deletions(-)
> 
> 
> Nicolas
> 
> 
> _______________________________________________
> linaro-kernel mailing list
> linaro-kernel@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-kernel
> 
> 


* Re: [PATCH v2 0/2] sched/idle : find the best idle CPU with cpuidle info
  2014-09-10 21:35 ` [PATCH v2 0/2] sched/idle : find the best idle CPU with cpuidle info Nicolas Pitre
@ 2014-09-10 22:50   ` Rafael J. Wysocki
  2014-09-10 23:25     ` Nicolas Pitre
  2014-09-18  0:39   ` Nicolas Pitre
  1 sibling, 1 reply; 35+ messages in thread
From: Rafael J. Wysocki @ 2014-09-10 22:50 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, linaro-kernel, linux-pm

On Wednesday, September 10, 2014 05:35:27 PM Nicolas Pitre wrote:
> 
> Ping.

Is this urgent, and if so, then why?

> On Thu, 4 Sep 2014, Nicolas Pitre wrote:
> 
> > This is a rework of the series initially posted by Daniel Lezcano here:
> > 
> > http://news.gmane.org/group/gmane.linux.power-management.general/thread=44161
> > 
> > Those patches were straightened up, commit logs are more comprehensive,
> > bugs were fixed, etc.
> > 
> > 
> >  drivers/cpuidle/cpuidle.c |  6 ++++++
> >  kernel/sched/fair.c       | 43 ++++++++++++++++++++++++++++++++++-------
> >  kernel/sched/idle.c       |  6 ++++++
> >  kernel/sched/sched.h      | 39 +++++++++++++++++++++++++++++++++++++
> >  4 files changed, 87 insertions(+), 7 deletions(-)
> > 
> > 
> > Nicolas
> > 
> > 
> > _______________________________________________
> > linaro-kernel mailing list
> > linaro-kernel@lists.linaro.org
> > http://lists.linaro.org/mailman/listinfo/linaro-kernel
> > 
> > 

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [PATCH v2 0/2] sched/idle : find the best idle CPU with cpuidle info
  2014-09-10 22:50   ` Rafael J. Wysocki
@ 2014-09-10 23:25     ` Nicolas Pitre
  2014-09-10 23:28       ` Nicolas Pitre
  2014-09-10 23:50       ` Rafael J. Wysocki
  0 siblings, 2 replies; 35+ messages in thread
From: Nicolas Pitre @ 2014-09-10 23:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, linaro-kernel, linux-pm

On Thu, 11 Sep 2014, Rafael J. Wysocki wrote:

> On Wednesday, September 10, 2014 05:35:27 PM Nicolas Pitre wrote:
> > 
> > Ping.
> 
> Is this urgent, and if so, then why?

What makes you think this could be urgent?

Almost a week after the original posting without any feedback, one 
may simply wonder if things could have accidentally fell into a crack, 
that's all.

> 
> > On Thu, 4 Sep 2014, Nicolas Pitre wrote:
> > 
> > > This is a rework of the series initially posted by Daniel Lezcano here:
> > > 
> > > http://news.gmane.org/group/gmane.linux.power-management.general/thread=44161
> > > 
> > > Those patches were straightened up, commit logs are more comprehensive,
> > > bugs were fixed, etc.
> > > 
> > > 
> > >  drivers/cpuidle/cpuidle.c |  6 ++++++
> > >  kernel/sched/fair.c       | 43 ++++++++++++++++++++++++++++++++++-------
> > >  kernel/sched/idle.c       |  6 ++++++
> > >  kernel/sched/sched.h      | 39 +++++++++++++++++++++++++++++++++++++
> > >  4 files changed, 87 insertions(+), 7 deletions(-)
> > > 
> > > 
> > > Nicolas
> > > 
> > > 
> > > _______________________________________________
> > > linaro-kernel mailing list
> > > linaro-kernel@lists.linaro.org
> > > http://lists.linaro.org/mailman/listinfo/linaro-kernel
> > > 
> > > 
> 
> -- 
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.
> 
> 


* Re: [PATCH v2 0/2] sched/idle : find the best idle CPU with cpuidle info
  2014-09-10 23:25     ` Nicolas Pitre
@ 2014-09-10 23:28       ` Nicolas Pitre
  2014-09-10 23:50       ` Rafael J. Wysocki
  1 sibling, 0 replies; 35+ messages in thread
From: Nicolas Pitre @ 2014-09-10 23:28 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, linaro-kernel, linux-pm

On Wed, 10 Sep 2014, Nicolas Pitre wrote:

> On Thu, 11 Sep 2014, Rafael J. Wysocki wrote:
> 
> > On Wednesday, September 10, 2014 05:35:27 PM Nicolas Pitre wrote:
> > > 
> > > Ping.
> > 
> > Is this urgent, and if so, then why?
> 
> What makes you think this could be urgent?
> 
> Almost a week after the original posting without any feedback, one 
> may simply wonder if things could have accidentally fell into a crack, 
> that's all.

s/fell/fallen/ of course.


Nicolas


* Re: [PATCH v2 0/2] sched/idle : find the best idle CPU with cpuidle info
  2014-09-10 23:25     ` Nicolas Pitre
  2014-09-10 23:28       ` Nicolas Pitre
@ 2014-09-10 23:50       ` Rafael J. Wysocki
  1 sibling, 0 replies; 35+ messages in thread
From: Rafael J. Wysocki @ 2014-09-10 23:50 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, linaro-kernel, linux-pm

On Wednesday, September 10, 2014 07:25:37 PM Nicolas Pitre wrote:
> On Thu, 11 Sep 2014, Rafael J. Wysocki wrote:
> 
> > On Wednesday, September 10, 2014 05:35:27 PM Nicolas Pitre wrote:
> > > 
> > > Ping.
> > 
> > Is this urgent, and if so, then why?
> 
> What makes you think this could be urgent?
> 
> Almost a week after the original posting without any feedback, one 
> may simply wonder if things could have accidentally fell into a crack, 
> that's all.

Well, they didn't, but some recipients have been traveling quite a bit lately
and are in the process of dealing with their email backlogs ...

Sorry about being less responsive than expected.

Rafael



* Re: [PATCH v2 0/2] sched/idle : find the best idle CPU with cpuidle info
  2014-09-10 21:35 ` [PATCH v2 0/2] sched/idle : find the best idle CPU with cpuidle info Nicolas Pitre
  2014-09-10 22:50   ` Rafael J. Wysocki
@ 2014-09-18  0:39   ` Nicolas Pitre
  2014-09-18 23:24     ` Peter Zijlstra
  1 sibling, 1 reply; 35+ messages in thread
From: Nicolas Pitre @ 2014-09-18  0:39 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Rafael J. Wysocki, linux-kernel, linaro-kernel, linux-pm


Ping-ping.

On Wed, 10 Sep 2014, Nicolas Pitre wrote:

> 
> Ping.
> 
> 
> On Thu, 4 Sep 2014, Nicolas Pitre wrote:
> 
> > This is a rework of the series initially posted by Daniel Lezcano here:
> > 
> > http://news.gmane.org/group/gmane.linux.power-management.general/thread=44161
> > 
> > Those patches were straightened up, commit logs are more comprehensive,
> > bugs were fixed, etc.
> > 
> > 
> >  drivers/cpuidle/cpuidle.c |  6 ++++++
> >  kernel/sched/fair.c       | 43 ++++++++++++++++++++++++++++++++++-------
> >  kernel/sched/idle.c       |  6 ++++++
> >  kernel/sched/sched.h      | 39 +++++++++++++++++++++++++++++++++++++
> >  4 files changed, 87 insertions(+), 7 deletions(-)
> > 
> > 
> > Nicolas
> > 
> > 
> > _______________________________________________
> > linaro-kernel mailing list
> > linaro-kernel@lists.linaro.org
> > http://lists.linaro.org/mailman/listinfo/linaro-kernel
> > 
> > 
> 


* Re: [PATCH v2 1/2] sched: let the scheduler see CPU idle states
  2014-09-04 15:32 ` [PATCH v2 1/2] sched: let the scheduler see CPU idle states Nicolas Pitre
@ 2014-09-18 17:37   ` Paul E. McKenney
  2014-09-18 17:39     ` Paul E. McKenney
  2014-09-18 18:32     ` Nicolas Pitre
  0 siblings, 2 replies; 35+ messages in thread
From: Paul E. McKenney @ 2014-09-18 17:37 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Peter Zijlstra, Ingo Molnar, Daniel Lezcano, Rafael J. Wysocki,
	linux-pm, linux-kernel, linaro-kernel

On Thu, Sep 04, 2014 at 11:32:09AM -0400, Nicolas Pitre wrote:
> From: Daniel Lezcano <daniel.lezcano@linaro.org>
> 
> When the cpu enters idle, it stores the cpuidle state pointer in its
> struct rq instance which in turn could be used to make a better decision
> when balancing tasks.
> 
> As soon as the cpu exits its idle state, the struct rq reference is
> cleared.
> 
> There are a couple of situations where the idle state pointer could be changed
> while it is being consulted:
> 
> 1. For x86/acpi with dynamic c-states, a laptop switching from battery
>    to AC could result in the removal of the deepest idle state. The acpi driver
>    triggers:
> 	'acpi_processor_cst_has_changed'
> 		'cpuidle_pause_and_lock'
> 			'cpuidle_uninstall_idle_handler'
> 				'kick_all_cpus_sync'.
> 
> All cpus will exit their idle state and the pointed object will be set to
> NULL.
> 
> 2. The cpuidle driver is unloaded. Logically that could happen but not
> in practice because the drivers are always compiled in and 95% of them are
> not coded to unregister themselves.  In any case, the unloading code must
> call 'cpuidle_unregister_device', that calls 'cpuidle_pause_and_lock'
> leading to 'kick_all_cpus_sync' as mentioned above.
> 
> A race can happen if we use the pointer and then one of these two scenarios
> occurs at the same moment.
> 
> In order to be safe, the idle state pointer stored in the rq must be
> used inside an rcu_read_lock section, where we are protected by the
> 'rcu_barrier' in the 'cpuidle_uninstall_idle_handler' function. The
> idle_get_state() and cpuidle_put_state() accessors should be used to that
> effect.
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> ---
>  drivers/cpuidle/cpuidle.c |  6 ++++++
>  kernel/sched/idle.c       |  6 ++++++
>  kernel/sched/sched.h      | 39 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 51 insertions(+)
> 
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index ee9df5e3f5..530e3055a2 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -225,6 +225,12 @@ void cpuidle_uninstall_idle_handler(void)
>  		initialized = 0;
>  		kick_all_cpus_sync();
>  	}
> +
> +	/*
> +	 * Make sure external observers (such as the scheduler)
> +	 * are done looking at pointed idle states.
> +	 */
> +	rcu_barrier();

Actually, all rcu_barrier() does is to make sure that all previously
queued RCU callbacks have been invoked.  And given the current
implementation, if there are no callbacks queued anywhere in the system,
rcu_barrier() is an extended no-op.  "Has CPU 0 any callbacks?" "Nope!"
"Has CPU 1 any callbacks?"  "Nope!" ... "Has CPU nr_cpu_ids-1 any
callbacks?"  "Nope!"  "OK, done!"

This is all done with the current task looking at per-CPU data structures,
with no interaction with the scheduler and with no need to actually make
those other CPUs do anything.

So what is it that you really need to do here?

A synchronize_sched() will wait for all non-idle online CPUs to pass
through the scheduler, where "idle" includes usermode execution in
CONFIG_NO_HZ_FULL=y kernels.  But it won't wait for CPUs executing
in the idle loop.
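
To make the difference concrete, here is a hypothetical updater-side
sketch ('old', 'global_obj' and free_obj_cb() are made-up names):

	/* rcu_barrier() only flushes callbacks queued via call_rcu(): */
	call_rcu(&old->rcu_head, free_obj_cb);
	rcu_barrier();		/* returns once free_obj_cb() has run;
				 * nearly a no-op if nothing is queued */

	/* synchronize_sched() instead waits out pre-existing readers: */
	rcu_assign_pointer(global_obj, NULL);
	synchronize_sched();	/* every non-idle CPU has since passed
				 * through the scheduler */
	kfree(old);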

A synchronize_rcu_tasks() will wait for all non-idle tasks that are
currently on a runqueue to do a voluntary context switch.  There has
been some discussion about extending this to idle tasks, but the current
prospective users can live without this.  But if you need it, I can push
on getting it set up.  (Current plans are that synchronize_rcu_tasks()
goes into the v3.18 merge window.)  And one caveat: There is long
latency associated with synchronize_rcu_tasks() by design.  Grace
periods are measured in seconds.
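
(A hypothetical use, for scale -- replace_trampoline() is a made-up
helper:

	old = replace_trampoline(new);	/* made-up helper: switch users over */
	synchronize_rcu_tasks();	/* all tasks have since voluntarily
					 * context switched, so none can still
					 * be executing inside 'old' */
	kfree(old);

and that synchronize_rcu_tasks() call alone can take seconds.)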

A stop_cpus() will force a context switch on all CPUs, though it is
a rather big hammer.

So again, what do you really need?

							Thanx, Paul

>  }
> 
>  /**
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 11e7bc434f..c47fce75e6 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -147,6 +147,9 @@ use_default:
>  	    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &dev->cpu))
>  		goto use_default;
> 
> +	/* Take note of the planned idle state. */
> +	idle_set_state(this_rq(), &drv->states[next_state]);
> +
>  	/*
>  	 * Enter the idle state previously returned by the governor decision.
>  	 * This function will block until an interrupt occurs and will take
> @@ -154,6 +157,9 @@ use_default:
>  	 */
>  	entered_state = cpuidle_enter(drv, dev, next_state);
> 
> +	/* The cpu is no longer idle or about to enter idle. */
> +	idle_set_state(this_rq(), NULL);
> +
>  	if (broadcast)
>  		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &dev->cpu);
> 
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 579712f4e9..aea8baa7a5 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -14,6 +14,7 @@
>  #include "cpuacct.h"
> 
>  struct rq;
> +struct cpuidle_state;
> 
>  extern __read_mostly int scheduler_running;
> 
> @@ -636,6 +637,11 @@ struct rq {
>  #ifdef CONFIG_SMP
>  	struct llist_head wake_list;
>  #endif
> +
> +#ifdef CONFIG_CPU_IDLE
> +	/* Must be inspected within a rcu lock section */
> +	struct cpuidle_state *idle_state;
> +#endif
>  };
> 
>  static inline int cpu_of(struct rq *rq)
> @@ -1180,6 +1186,39 @@ static inline void idle_exit_fair(struct rq *rq) { }
> 
>  #endif
> 
> +#ifdef CONFIG_CPU_IDLE
> +static inline void idle_set_state(struct rq *rq,
> +				  struct cpuidle_state *idle_state)
> +{
> +	rq->idle_state = idle_state;
> +}
> +
> +static inline struct cpuidle_state *idle_get_state(struct rq *rq)
> +{
> +	rcu_read_lock();
> +	return rq->idle_state;
> +}
> +
> +static inline void cpuidle_put_state(struct rq *rq)
> +{
> +	rcu_read_unlock();
> +}
> +#else
> +static inline void idle_set_state(struct rq *rq,
> +				  struct cpuidle_state *idle_state)
> +{
> +}
> +
> +static inline struct cpuidle_state *idle_get_state(struct rq *rq)
> +{
> +	return NULL;
> +}
> +
> +static inline void cpuidle_put_state(struct rq *rq)
> +{
> +}
> +#endif
> +
>  extern void sysrq_sched_debug_show(void);
>  extern void sched_init_granularity(void);
>  extern void update_max_interval(void);
> -- 
> 1.8.4.108.g55ea5f6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 



* Re: [PATCH v2 1/2] sched: let the scheduler see CPU idle states
  2014-09-18 17:37   ` Paul E. McKenney
@ 2014-09-18 17:39     ` Paul E. McKenney
  2014-09-18 23:15       ` Peter Zijlstra
  2014-09-18 18:32     ` Nicolas Pitre
  1 sibling, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2014-09-18 17:39 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Peter Zijlstra, Ingo Molnar, Daniel Lezcano, Rafael J. Wysocki,
	linux-pm, linux-kernel, linaro-kernel

On Thu, Sep 18, 2014 at 10:37:33AM -0700, Paul E. McKenney wrote:
> On Thu, Sep 04, 2014 at 11:32:09AM -0400, Nicolas Pitre wrote:
> > From: Daniel Lezcano <daniel.lezcano@linaro.org>
> > 
> > When the cpu enters idle, it stores the cpuidle state pointer in its
> > struct rq instance which in turn could be used to make a better decision
> > when balancing tasks.
> > 
> > As soon as the cpu exits its idle state, the struct rq reference is
> > cleared.
> > 
> > There are a couple of situations where the idle state pointer could be changed
> > while it is being consulted:
> > 
> > 1. For x86/acpi with dynamic c-states, a laptop switching from battery
> >    to AC could result in the removal of the deepest idle state. The acpi driver
> >    triggers:
> > 	'acpi_processor_cst_has_changed'
> > 		'cpuidle_pause_and_lock'
> > 			'cpuidle_uninstall_idle_handler'
> > 				'kick_all_cpus_sync'.
> > 
> > All cpus will exit their idle state and the pointed object will be set to
> > NULL.
> > 
> > 2. The cpuidle driver is unloaded. Logically that could happen but not
> > in practice because the drivers are always compiled in and 95% of them are
> > not coded to unregister themselves.  In any case, the unloading code must
> > call 'cpuidle_unregister_device', that calls 'cpuidle_pause_and_lock'
> > leading to 'kick_all_cpus_sync' as mentioned above.
> > 
> > A race can happen if we use the pointer and then one of these two scenarios
> > occurs at the same moment.
> > 
> > In order to be safe, the idle state pointer stored in the rq must be
> > used inside an rcu_read_lock section, where we are protected by the
> > 'rcu_barrier' in the 'cpuidle_uninstall_idle_handler' function. The
> > idle_get_state() and cpuidle_put_state() accessors should be used to that
> > effect.
> > 
> > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> > Signed-off-by: Nicolas Pitre <nico@linaro.org>
> > ---
> >  drivers/cpuidle/cpuidle.c |  6 ++++++
> >  kernel/sched/idle.c       |  6 ++++++
> >  kernel/sched/sched.h      | 39 +++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 51 insertions(+)
> > 
> > diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> > index ee9df5e3f5..530e3055a2 100644
> > --- a/drivers/cpuidle/cpuidle.c
> > +++ b/drivers/cpuidle/cpuidle.c
> > @@ -225,6 +225,12 @@ void cpuidle_uninstall_idle_handler(void)
> >  		initialized = 0;
> >  		kick_all_cpus_sync();
> >  	}
> > +
> > +	/*
> > +	 * Make sure external observers (such as the scheduler)
> > +	 * are done looking at pointed idle states.
> > +	 */
> > +	rcu_barrier();
> 
> Actually, all rcu_barrier() does is to make sure that all previously
> queued RCU callbacks have been invoked.  And given the current
> implementation, if there are no callbacks queued anywhere in the system,
> rcu_barrier() is an extended no-op.  "Has CPU 0 any callbacks?" "Nope!"
> "Has CPU 1 any callbacks?"  "Nope!" ... "Has CPU nr_cpu_ids-1 any
> callbacks?"  "Nope!"  "OK, done!"
> 
> This is all done with the current task looking at per-CPU data structures,
> with no interaction with the scheduler and with no need to actually make
> those other CPUs do anything.
> 
> So what is it that you really need to do here?
> 
> A synchronize_sched() will wait for all non-idle online CPUs to pass
> through the scheduler, where "idle" includes usermode execution in
> CONFIG_NO_HZ_FULL=y kernels.  But it won't wait for CPUs executing
> in the idle loop.
> 
> A synchronize_rcu_tasks() will wait for all non-idle tasks that are
> currently on a runqueue to do a voluntary context switch.  There has
> been some discussion about extending this to idle tasks, but the current
> prospective users can live without this.  But if you need it, I can push
> on getting it set up.  (Current plans are that synchronize_rcu_tasks()
> goes into the v3.18 merge window.)  And one caveat: There is long
> latency associated with synchronize_rcu_tasks() by design.  Grace
> periods are measured in seconds.
> 
> A stop_cpus() will force a context switch on all CPUs, though it is
> a rather big hammer.

And I was reminded by the very next email that kick_all_cpus_sync() is
another possibility -- it forces an interrupt on all online CPUs, idle
or not.

							Thanx, Paul

> So again, what do you really need?
> 
> 							Thanx, Paul
> 
> >  }
> > 
> >  /**
> > diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> > index 11e7bc434f..c47fce75e6 100644
> > --- a/kernel/sched/idle.c
> > +++ b/kernel/sched/idle.c
> > @@ -147,6 +147,9 @@ use_default:
> >  	    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &dev->cpu))
> >  		goto use_default;
> > 
> > +	/* Take note of the planned idle state. */
> > +	idle_set_state(this_rq(), &drv->states[next_state]);
> > +
> >  	/*
> >  	 * Enter the idle state previously returned by the governor decision.
> >  	 * This function will block until an interrupt occurs and will take
> > @@ -154,6 +157,9 @@ use_default:
> >  	 */
> >  	entered_state = cpuidle_enter(drv, dev, next_state);
> > 
> > +	/* The cpu is no longer idle or about to enter idle. */
> > +	idle_set_state(this_rq(), NULL);
> > +
> >  	if (broadcast)
> >  		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &dev->cpu);
> > 
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index 579712f4e9..aea8baa7a5 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -14,6 +14,7 @@
> >  #include "cpuacct.h"
> > 
> >  struct rq;
> > +struct cpuidle_state;
> > 
> >  extern __read_mostly int scheduler_running;
> > 
> > @@ -636,6 +637,11 @@ struct rq {
> >  #ifdef CONFIG_SMP
> >  	struct llist_head wake_list;
> >  #endif
> > +
> > +#ifdef CONFIG_CPU_IDLE
> > +	/* Must be inspected within a rcu lock section */
> > +	struct cpuidle_state *idle_state;
> > +#endif
> >  };
> > 
> >  static inline int cpu_of(struct rq *rq)
> > @@ -1180,6 +1186,39 @@ static inline void idle_exit_fair(struct rq *rq) { }
> > 
> >  #endif
> > 
> > +#ifdef CONFIG_CPU_IDLE
> > +static inline void idle_set_state(struct rq *rq,
> > +				  struct cpuidle_state *idle_state)
> > +{
> > +	rq->idle_state = idle_state;
> > +}
> > +
> > +static inline struct cpuidle_state *idle_get_state(struct rq *rq)
> > +{
> > +	rcu_read_lock();
> > +	return rq->idle_state;
> > +}
> > +
> > +static inline void cpuidle_put_state(struct rq *rq)
> > +{
> > +	rcu_read_unlock();
> > +}
> > +#else
> > +static inline void idle_set_state(struct rq *rq,
> > +				  struct cpuidle_state *idle_state)
> > +{
> > +}
> > +
> > +static inline struct cpuidle_state *idle_get_state(struct rq *rq)
> > +{
> > +	return NULL;
> > +}
> > +
> > +static inline void cpuidle_put_state(struct rq *rq)
> > +{
> > +}
> > +#endif
> > +
> >  extern void sysrq_sched_debug_show(void);
> >  extern void sched_init_granularity(void);
> >  extern void update_max_interval(void);
> > -- 
> > 1.8.4.108.g55ea5f6
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 



* Re: [PATCH v2 1/2] sched: let the scheduler see CPU idle states
  2014-09-18 17:37   ` Paul E. McKenney
  2014-09-18 17:39     ` Paul E. McKenney
@ 2014-09-18 18:32     ` Nicolas Pitre
  2014-09-18 23:17       ` Peter Zijlstra
  1 sibling, 1 reply; 35+ messages in thread
From: Nicolas Pitre @ 2014-09-18 18:32 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Ingo Molnar, Daniel Lezcano, Rafael J. Wysocki,
	linux-pm, linux-kernel, linaro-kernel

On Thu, 18 Sep 2014, Paul E. McKenney wrote:

> On Thu, Sep 04, 2014 at 11:32:09AM -0400, Nicolas Pitre wrote:
> > From: Daniel Lezcano <daniel.lezcano@linaro.org>
> > 
> > When the cpu enters idle, it stores the cpuidle state pointer in its
> > struct rq instance which in turn could be used to make a better decision
> > when balancing tasks.
> > 
> > As soon as the cpu exits its idle state, the struct rq reference is
> > cleared.
> > 
> > There are a couple of situations where the idle state pointer could be changed
> > while it is being consulted:
> > 
> > 1. For x86/acpi with dynamic c-states, a laptop switching from battery
> >    to AC could result in the removal of the deepest idle state. The acpi driver
> >    triggers:
> > 	'acpi_processor_cst_has_changed'
> > 		'cpuidle_pause_and_lock'
> > 			'cpuidle_uninstall_idle_handler'
> > 				'kick_all_cpus_sync'.
> > 
> > All cpus will exit their idle state and the pointed object will be set to
> > NULL.
> > 
> > 2. The cpuidle driver is unloaded. Logically that could happen but not
> > in practice because the drivers are always compiled in and 95% of them are
> > not coded to unregister themselves.  In any case, the unloading code must
> > call 'cpuidle_unregister_device', that calls 'cpuidle_pause_and_lock'
> > leading to 'kick_all_cpus_sync' as mentioned above.
> > 
> > A race can happen if we use the pointer and then one of these two scenarios
> > occurs at the same moment.
> > 
> > In order to be safe, the idle state pointer stored in the rq must be
> > used inside an rcu_read_lock section, where we are protected by the
> > 'rcu_barrier' in the 'cpuidle_uninstall_idle_handler' function. The
> > idle_get_state() and cpuidle_put_state() accessors should be used to that
> > effect.
> > 
> > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> > Signed-off-by: Nicolas Pitre <nico@linaro.org>
> > ---
> >  drivers/cpuidle/cpuidle.c |  6 ++++++
> >  kernel/sched/idle.c       |  6 ++++++
> >  kernel/sched/sched.h      | 39 +++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 51 insertions(+)
> > 
> > diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> > index ee9df5e3f5..530e3055a2 100644
> > --- a/drivers/cpuidle/cpuidle.c
> > +++ b/drivers/cpuidle/cpuidle.c
> > @@ -225,6 +225,12 @@ void cpuidle_uninstall_idle_handler(void)
> >  		initialized = 0;
> >  		kick_all_cpus_sync();
> >  	}
> > +
> > +	/*
> > +	 * Make sure external observers (such as the scheduler)
> > +	 * are done looking at pointed idle states.
> > +	 */
> > +	rcu_barrier();
> 
> Actually, all rcu_barrier() does is to make sure that all previously
> queued RCU callbacks have been invoked.  And given the current
> implementation, if there are no callbacks queued anywhere in the system,
> rcu_barrier() is an extended no-op.  "Has CPU 0 any callbacks?" "Nope!"
> "Has CPU 1 any callbacks?"  "Nope!" ... "Has CPU nr_cpu_ids-1 any
> callbacks?"  "Nope!"  "OK, done!"
> 
> This is all done with the current task looking at per-CPU data structures,
> with no interaction with the scheduler and with no need to actually make
> those other CPUs do anything.
> 
> So what is it that you really need to do here?

In short, we don't want the cpuidle data to go away (see the 2 scenarios 
above) while the scheduler is looking at it.  The scheduler uses the 
provided accessors (see patch 2/2) so we can put any protection 
mechanism we want in them.  A simple spinlock could do just as well, 
which should be good enough.
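
For instance, a hypothetical spinlock-based variant of the accessors
(sketch only, not what this series implements; 'idle_state_lock' would
be a new per-rq field):

	static inline void idle_set_state(struct rq *rq,
					  struct cpuidle_state *idle_state)
	{
		raw_spin_lock(&rq->idle_state_lock);
		rq->idle_state = idle_state;
		raw_spin_unlock(&rq->idle_state_lock);
	}

	static inline struct cpuidle_state *idle_get_state(struct rq *rq)
	{
		/* Lock stays held until cpuidle_put_state(). */
		raw_spin_lock(&rq->idle_state_lock);
		return rq->idle_state;
	}

	static inline void cpuidle_put_state(struct rq *rq)
	{
		raw_spin_unlock(&rq->idle_state_lock);
	}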


Nicolas


* Re: [PATCH v2 1/2] sched: let the scheduler see CPU idle states
  2014-09-18 17:39     ` Paul E. McKenney
@ 2014-09-18 23:15       ` Peter Zijlstra
  0 siblings, 0 replies; 35+ messages in thread
From: Peter Zijlstra @ 2014-09-18 23:15 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Nicolas Pitre, Ingo Molnar, Daniel Lezcano, Rafael J. Wysocki,
	linux-pm, linux-kernel, linaro-kernel

On Thu, Sep 18, 2014 at 10:39:25AM -0700, Paul E. McKenney wrote:
> On Thu, Sep 18, 2014 at 10:37:33AM -0700, Paul E. McKenney wrote:

> > A stop_cpus() will force a context switch on all CPUs, though it is
> > a rather big hammer.
> 
> And I was reminded by the very next email that kick_all_cpus_sync() is
> another possibility -- it forces an interrupt on all online CPUs, idle
> or not.

I actually have a patch
http://lkml.kernel.org/r/1409815075-4180-2-git-send-email-chuansheng.liu@intel.com
that changes that, because apparently there are idle loops that don't
actually exit on interrupt :-)

But yes, something like the wake_up_all_idle_cpus() should do.


* Re: [PATCH v2 1/2] sched: let the scheduler see CPU idle states
  2014-09-18 18:32     ` Nicolas Pitre
@ 2014-09-18 23:17       ` Peter Zijlstra
  2014-09-18 23:28         ` Peter Zijlstra
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2014-09-18 23:17 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Paul E. McKenney, Ingo Molnar, Daniel Lezcano, Rafael J. Wysocki,
	linux-pm, linux-kernel, linaro-kernel

On Thu, Sep 18, 2014 at 02:32:25PM -0400, Nicolas Pitre wrote:
> On Thu, 18 Sep 2014, Paul E. McKenney wrote:

> > So what is it that you really need to do here?
> 
> > In short, we don't want the cpuidle data to go away (see the 2 scenarios 
> > above) while the scheduler is looking at it.  The scheduler uses the 
> > provided accessors (see patch 2/2) so we can put any protection 
> > mechanism we want in them.  A simple spinlock could do just as well, 
> > which should be good enough.

rq->lock disables interrupts, so given that, something like
kick_all_cpus_sync() will guarantee what you need --
wake_up_all_idle_cpus() will not.
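
Sketching that reasoning (hypothetical, not actual patch code): a
reader running with interrupts disabled cannot take the IPI, so once
kick_all_cpus_sync() returns, every such pre-existing section is done:

	/* reader side, interrupts off (e.g. under rq->lock): */
	raw_spin_lock_irq(&rq->lock);
	idle = rq->idle_state;
	if (idle)
		latency = idle->exit_latency;	/* object stays valid: the
						 * uninstall below cannot
						 * complete while irqs are
						 * off here */
	raw_spin_unlock_irq(&rq->lock);

	/* updater side (cpuidle_uninstall_idle_handler): */
	kick_all_cpus_sync();	/* returns only after every CPU has taken
				 * the IPI, i.e. after all irqs-off readers
				 * above have finished */
	/* only now may the idle states be torn down */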


* Re: [PATCH v2 0/2] sched/idle : find the best idle CPU with cpuidle info
  2014-09-18  0:39   ` Nicolas Pitre
@ 2014-09-18 23:24     ` Peter Zijlstra
  2014-09-19 18:22       ` Nicolas Pitre
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2014-09-18 23:24 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Ingo Molnar, Rafael J. Wysocki, linux-kernel, linaro-kernel, linux-pm

On Wed, Sep 17, 2014 at 08:39:34PM -0400, Nicolas Pitre wrote:
> 
> Ping-ping.

Right, finally got to it. Too much travel and some time away from the
computer with the 'usual' result :/




* Re: [PATCH v2 1/2] sched: let the scheduler see CPU idle states
  2014-09-18 23:17       ` Peter Zijlstra
@ 2014-09-18 23:28         ` Peter Zijlstra
  2014-09-19 18:30           ` Nicolas Pitre
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2014-09-18 23:28 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Paul E. McKenney, Ingo Molnar, Daniel Lezcano, Rafael J. Wysocki,
	linux-pm, linux-kernel, linaro-kernel

On Fri, Sep 19, 2014 at 01:17:15AM +0200, Peter Zijlstra wrote:
> On Thu, Sep 18, 2014 at 02:32:25PM -0400, Nicolas Pitre wrote:
> > On Thu, 18 Sep 2014, Paul E. McKenney wrote:
> 
> > > So what is it that you really need to do here?
> > 
> > In short, we don't want the cpuidle data to go away (see the 2 scenarios 
> > above) while the scheduler is looking at it.  The scheduler uses the 
> > provided accessors (see patch 2/2) so we can put any protection 
> > mechanism we want in them.  A simple spinlock could do just as well, 
> > which should be good enough.
> 
> rq->lock disables interrupts, so given that, something like
> kick_all_cpus_sync() will guarantee what you need --
> wake_up_all_idle_cpus() will not.

Something like so then?

---
Subject: sched: let the scheduler see CPU idle states
From: Daniel Lezcano <daniel.lezcano@linaro.org>
Date: Thu, 04 Sep 2014 11:32:09 -0400

When the cpu enters idle, it stores the cpuidle state pointer in its
struct rq instance which in turn could be used to make a better decision
when balancing tasks.

As soon as the cpu exits its idle state, the struct rq reference is
cleared.

There are a couple of situations where the idle state pointer could be changed
while it is being consulted:

1. For x86/acpi with dynamic c-states, a laptop switching from battery
   to AC could result in the removal of the deepest idle state. The acpi driver
   triggers:
	'acpi_processor_cst_has_changed'
		'cpuidle_pause_and_lock'
			'cpuidle_uninstall_idle_handler'
				'kick_all_cpus_sync'.

All cpus will exit their idle state and the pointed object will be set to
NULL.

2. The cpuidle driver is unloaded. Logically that could happen but not
in practice because the drivers are always compiled in and 95% of them are
not coded to unregister themselves.  In any case, the unloading code must
call 'cpuidle_unregister_device', that calls 'cpuidle_pause_and_lock'
leading to 'kick_all_cpus_sync' as mentioned above.

A race can happen if we use the pointer and then one of these two scenarios
occurs at the same moment.

In order to be safe, the idle state pointer stored in the rq must be
used inside an rcu_read_lock section, where we are protected by the
'rcu_barrier' in the 'cpuidle_uninstall_idle_handler' function. The
idle_get_state() and idle_put_state() accessors should be used to that
effect.

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
---
 drivers/cpuidle/cpuidle.c |    6 ++++++
 kernel/sched/idle.c       |    6 ++++++
 kernel/sched/sched.h      |   29 +++++++++++++++++++++++++++++
 3 files changed, 41 insertions(+)

--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -225,6 +225,12 @@ void cpuidle_uninstall_idle_handler(void
 		initialized = 0;
 		wake_up_all_idle_cpus();
 	}
+
+	/*
+	 * Make sure external observers (such as the scheduler)
+	 * are done looking at pointed idle states.
+	 */
+	kick_all_cpus_sync();
 }
 
 /**
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -147,6 +147,9 @@ static void cpuidle_idle_call(void)
 	    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &dev->cpu))
 		goto use_default;
 
+	/* Take note of the planned idle state. */
+	idle_set_state(this_rq(), &drv->states[next_state]);
+
 	/*
 	 * Enter the idle state previously returned by the governor decision.
 	 * This function will block until an interrupt occurs and will take
@@ -154,6 +157,9 @@ static void cpuidle_idle_call(void)
 	 */
 	entered_state = cpuidle_enter(drv, dev, next_state);
 
+	/* The cpu is no longer idle or about to enter idle. */
+	idle_set_state(this_rq(), NULL);
+
 	if (broadcast)
 		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &dev->cpu);
 
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -14,6 +14,7 @@
 #include "cpuacct.h"
 
 struct rq;
+struct cpuidle_state;
 
 /* task_struct::on_rq states: */
 #define TASK_ON_RQ_QUEUED	1
@@ -640,6 +641,11 @@ struct rq {
 #ifdef CONFIG_SMP
 	struct llist_head wake_list;
 #endif
+
+#ifdef CONFIG_CPU_IDLE
+	/* Must be inspected within a rcu lock section */
+	struct cpuidle_state *idle_state;
+#endif
 };
 
 static inline int cpu_of(struct rq *rq)
@@ -1193,6 +1199,29 @@ static inline void idle_exit_fair(struct
 
 #endif
 
+#ifdef CONFIG_CPU_IDLE
+static inline void idle_set_state(struct rq *rq,
+				  struct cpuidle_state *idle_state)
+{
+	rq->idle_state = idle_state;
+}
+
+static inline struct cpuidle_state *idle_get_state(struct rq *rq)
+{
+	return rq->idle_state;
+}
+#else
+static inline void idle_set_state(struct rq *rq,
+				  struct cpuidle_state *idle_state)
+{
+}
+
+static inline struct cpuidle_state *idle_get_state(struct rq *rq)
+{
+	return NULL;
+}
+#endif
+
 extern void sysrq_sched_debug_show(void);
 extern void sched_init_granularity(void);
 extern void update_max_interval(void);


* Re: [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu
  2014-09-04 15:32 ` [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu Nicolas Pitre
  2014-09-05  7:52   ` Daniel Lezcano
@ 2014-09-18 23:46   ` Peter Zijlstra
  2014-09-19  0:05   ` Peter Zijlstra
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 35+ messages in thread
From: Peter Zijlstra @ 2014-09-18 23:46 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Ingo Molnar, Daniel Lezcano, Rafael J. Wysocki, linux-pm,
	linux-kernel, linaro-kernel

On Thu, Sep 04, 2014 at 11:32:10AM -0400, Nicolas Pitre wrote:
> The code in find_idlest_cpu() looks for the CPU with the smallest load.
> However, if multiple CPUs are idle, the first idle CPU is selected
> irrespective of the depth of its idle state.
> 
> Among the idle CPUs we should pick the one with the shallowest idle
> state, or the latest to have gone idle if all idle CPUs are in the same
> state.  The latter applies even when cpuidle is configured out.
> 
> This patch doesn't cover the following issues:
> 
> - The idle exit latency of a CPU might be larger than the time needed
>   to migrate the waking task to an already running CPU with sufficient
>   capacity, and therefore performance would benefit from task packing
>   in such a case (in most cases task packing is about power saving).
> 
> - Some idle states have a non-negligible and non-abortable entry latency
>   which needs to run to completion before the exit latency can start.
>   A concurrent patch series is making this info available to the cpuidle
>   core.  Once available, the entry latency combined with the idle timestamp could
>   determine when the exit latency may be effective.
> 
> Those issues will be handled in due course.  In the meantime, what
> is implemented here should already improve things compared to the current
> state of affairs.
> 
> Based on an initial patch from Daniel Lezcano.
> 
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> ---
>  kernel/sched/fair.c | 43 ++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 36 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bfa3c86d0d..416329e1a6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -23,6 +23,7 @@
>  #include <linux/latencytop.h>
>  #include <linux/sched.h>
>  #include <linux/cpumask.h>
> +#include <linux/cpuidle.h>
>  #include <linux/slab.h>
>  #include <linux/profile.h>
>  #include <linux/interrupt.h>
> @@ -4428,20 +4429,48 @@ static int
>  find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)

Ah, now I see: you use it in find_idlest_cpu(), which indeed does not
hold rq->lock, but does already hold rcu_read_lock(), so in that
regard sync_rcu() should be the right primitive.

I suppose we want the same kind of logic in select_idle_sibling() and
that too already has rcu_read_lock().

So I'll replace the kick_all_cpus_sync() with sync_rcu() and add a
WARN_ON(!rcu_read_lock_held()) to idle_get_state(), like the below.

I however do think we need a few words on why we don't need
rcu_assign_pointer() and rcu_dereference() for rq->idle_state -- and I
do indeed think we do not, because the idle state data is static.
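
For reference, the variant with the full primitives would look
something like this (hypothetical; unnecessary here precisely because
the cpuidle_state objects are statically allocated and never modified
after registration, so there is no initialization to order against):

	static inline void idle_set_state(struct rq *rq,
					  struct cpuidle_state *idle_state)
	{
		/* store-release: orders prior writes to *idle_state */
		rcu_assign_pointer(rq->idle_state, idle_state);
	}

	static inline struct cpuidle_state *idle_get_state(struct rq *rq)
	{
		/* dependency-ordered load of the published pointer */
		return rcu_dereference(rq->idle_state);
	}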

---
Subject: sched: let the scheduler see CPU idle states
From: Daniel Lezcano <daniel.lezcano@linaro.org>
Date: Thu, 04 Sep 2014 11:32:09 -0400

When the cpu enters idle, it stores the cpuidle state pointer in its
struct rq instance which in turn could be used to make a better decision
when balancing tasks.

As soon as the cpu exits its idle state, the struct rq reference is
cleared.

There are a couple of situations where the idle state pointer could be changed
while it is being consulted:

1. For x86/acpi with dynamic c-states, a laptop switching from battery
   to AC could result in the removal of the deepest idle state. The acpi driver
   triggers:
	'acpi_processor_cst_has_changed'
		'cpuidle_pause_and_lock'
			'cpuidle_uninstall_idle_handler'
				'kick_all_cpus_sync'.

All cpus will exit their idle state and the pointed object will be set to
NULL.

2. The cpuidle driver is unloaded. Logically that could happen but not
in practice because the drivers are always compiled in and 95% of them are
not coded to unregister themselves.  In any case, the unloading code must
call 'cpuidle_unregister_device', that calls 'cpuidle_pause_and_lock'
leading to 'kick_all_cpus_sync' as mentioned above.

A race can happen if we use the pointer and then one of these two scenarios
occurs at the same moment.

In order to be safe, the idle state pointer stored in the rq must be
used inside an rcu_read_lock section, where we are protected by the
'rcu_barrier' in the 'cpuidle_uninstall_idle_handler' function. The
idle_get_state() and idle_put_state() accessors should be used to that
effect.

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
---
 drivers/cpuidle/cpuidle.c |    6 ++++++
 kernel/sched/idle.c       |    6 ++++++
 kernel/sched/sched.h      |   30 ++++++++++++++++++++++++++++++
 3 files changed, 42 insertions(+)

--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -225,6 +225,12 @@ void cpuidle_uninstall_idle_handler(void
 		initialized = 0;
 		wake_up_all_idle_cpus();
 	}
+
+	/*
+	 * Make sure external observers (such as the scheduler)
+	 * are done looking at pointed idle states.
+	 */
+	synchronize_rcu();
 }
 
 /**
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -147,6 +147,9 @@ static void cpuidle_idle_call(void)
 	    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &dev->cpu))
 		goto use_default;
 
+	/* Take note of the planned idle state. */
+	idle_set_state(this_rq(), &drv->states[next_state]);
+
 	/*
 	 * Enter the idle state previously returned by the governor decision.
 	 * This function will block until an interrupt occurs and will take
@@ -154,6 +157,9 @@ static void cpuidle_idle_call(void)
 	 */
 	entered_state = cpuidle_enter(drv, dev, next_state);
 
+	/* The cpu is no longer idle or about to enter idle. */
+	idle_set_state(this_rq(), NULL);
+
 	if (broadcast)
 		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &dev->cpu);
 
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -14,6 +14,7 @@
 #include "cpuacct.h"
 
 struct rq;
+struct cpuidle_state;
 
 /* task_struct::on_rq states: */
 #define TASK_ON_RQ_QUEUED	1
@@ -640,6 +641,11 @@ struct rq {
 #ifdef CONFIG_SMP
 	struct llist_head wake_list;
 #endif
+
+#ifdef CONFIG_CPU_IDLE
+	/* Must be inspected within a rcu lock section */
+	struct cpuidle_state *idle_state;
+#endif
 };
 
 static inline int cpu_of(struct rq *rq)
@@ -1193,6 +1199,30 @@ static inline void idle_exit_fair(struct
 
 #endif
 
+#ifdef CONFIG_CPU_IDLE
+static inline void idle_set_state(struct rq *rq,
+				  struct cpuidle_state *idle_state)
+{
+	rq->idle_state = idle_state;
+}
+
+static inline struct cpuidle_state *idle_get_state(struct rq *rq)
+{
+	WARN_ON(!rcu_read_lock_held());
+	return rq->idle_state;
+}
+#else
+static inline void idle_set_state(struct rq *rq,
+				  struct cpuidle_state *idle_state)
+{
+}
+
+static inline struct cpuidle_state *idle_get_state(struct rq *rq)
+{
+	return NULL;
+}
+#endif
+
 extern void sysrq_sched_debug_show(void);
 extern void sched_init_granularity(void);
 extern void update_max_interval(void);

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu
  2014-09-04 15:32 ` [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu Nicolas Pitre
  2014-09-05  7:52   ` Daniel Lezcano
  2014-09-18 23:46   ` Peter Zijlstra
@ 2014-09-19  0:05   ` Peter Zijlstra
  2014-09-19  4:49     ` Yao Dongdong
  2014-09-30 21:58   ` Rik van Riel
  4 siblings, 0 replies; 35+ messages in thread
From: Peter Zijlstra @ 2014-09-19  0:05 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Ingo Molnar, Daniel Lezcano, Rafael J. Wysocki, linux-pm,
	linux-kernel, linaro-kernel

On Thu, Sep 04, 2014 at 11:32:10AM -0400, Nicolas Pitre wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bfa3c86d0d..416329e1a6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -23,6 +23,7 @@
>  #include <linux/latencytop.h>
>  #include <linux/sched.h>
>  #include <linux/cpumask.h>
> +#include <linux/cpuidle.h>
>  #include <linux/slab.h>
>  #include <linux/profile.h>
>  #include <linux/interrupt.h>
> @@ -4428,20 +4429,48 @@ static int
>  find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
>  {
>  	unsigned long load, min_load = ULONG_MAX;
> -	int idlest = -1;
> +	unsigned int min_exit_latency = UINT_MAX;
> +	u64 latest_idle_timestamp = 0;
> +	int least_loaded_cpu = this_cpu;
> +	int shallowest_idle_cpu = -1;
>  	int i;
>  
>  	/* Traverse only the allowed CPUs */
>  	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
> -		load = weighted_cpuload(i);
> -
> -		if (load < min_load || (load == min_load && i == this_cpu)) {
> -			min_load = load;
> -			idlest = i;
> +		if (idle_cpu(i)) {
> +			struct rq *rq = cpu_rq(i);
> +			struct cpuidle_state *idle = idle_get_state(rq);
> +			if (idle && idle->exit_latency < min_exit_latency) {
> +				/*
> +				 * We give priority to a CPU whose idle state
> +				 * has the smallest exit latency irrespective
> +				 * of any idle timestamp.
> +				 */
> +				min_exit_latency = idle->exit_latency;
> +				latest_idle_timestamp = rq->idle_stamp;
> +				shallowest_idle_cpu = i;
> +			} else if ((!idle || idle->exit_latency == min_exit_latency) &&
> +				   rq->idle_stamp > latest_idle_timestamp) {
> +				/*
> +				 * If equal or no active idle state, then
> +				 * the most recently idled CPU might have
> +				 * a warmer cache.
> +				 */
> +				latest_idle_timestamp = rq->idle_stamp;
> +				shallowest_idle_cpu = i;
> +			}
> +			cpuidle_put_state(rq);

Right, matching the other changes, I killed that line. The rest looks
ok.

> +		} else {
> +			load = weighted_cpuload(i);
> +			if (load < min_load ||
> +			    (load == min_load && i == this_cpu)) {
> +				min_load = load;
> +				least_loaded_cpu = i;
> +			}
>  		}
>  	}
>  
> -	return idlest;
> +	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
>  }


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu
  2014-09-04 15:32 ` [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu Nicolas Pitre
@ 2014-09-19  4:49     ` Yao Dongdong
  2014-09-18 23:46   ` Peter Zijlstra
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 35+ messages in thread
From: Yao Dongdong @ 2014-09-19  4:49 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Peter Zijlstra, Ingo Molnar, Daniel Lezcano, Rafael J. Wysocki,
	linux-pm, linux-kernel, linaro-kernel

On 2014/9/4 23:32, Nicolas Pitre wrote:
> The code in find_idlest_cpu() looks for the CPU with the smallest load.
> However, if multiple CPUs are idle, the first idle CPU is selected
> irrespective of the depth of its idle state.
>
> Among the idle CPUs we should pick the one with the shallowest idle
> state, or the latest to have gone idle if all idle CPUs are in the same
> state.  The latter applies even when cpuidle is configured out.
>
> This patch doesn't cover the following issues:
>
> - The idle exit latency of a CPU might be larger than the time needed
>   to migrate the waking task to an already running CPU with sufficient
>   capacity, and therefore performance would benefit from task packing
>   in such a case (in most cases task packing is about power saving).
>
> - Some idle states have a non-negligible and non-abortable entry latency
>   which needs to run to completion before the exit latency can start.
>   A concurrent patch series is making this info available to the cpuidle
>   core.  Once available, the entry latency with the idle timestamp could
>   determine when the exit latency may be effective.
>
> Those issues will be handled in due course.  In the meantime, what
> is implemented here should already improve things compared to the current
> state of affairs.
>
> Based on an initial patch from Daniel Lezcano.
>
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> ---
>  kernel/sched/fair.c | 43 ++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 36 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bfa3c86d0d..416329e1a6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -23,6 +23,7 @@
>  #include <linux/latencytop.h>
>  #include <linux/sched.h>
>  #include <linux/cpumask.h>
> +#include <linux/cpuidle.h>
>  #include <linux/slab.h>
>  #include <linux/profile.h>
>  #include <linux/interrupt.h>
> @@ -4428,20 +4429,48 @@ static int
>  find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
>  {
>  	unsigned long load, min_load = ULONG_MAX;
> -	int idlest = -1;
> +	unsigned int min_exit_latency = UINT_MAX;
> +	u64 latest_idle_timestamp = 0;
> +	int least_loaded_cpu = this_cpu;
> +	int shallowest_idle_cpu = -1;
>  	int i;
>  
>  	/* Traverse only the allowed CPUs */
>  	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
> -		load = weighted_cpuload(i);
> -
> -		if (load < min_load || (load == min_load && i == this_cpu)) {
> -			min_load = load;
> -			idlest = i;
> +		if (idle_cpu(i)) {
> +			struct rq *rq = cpu_rq(i);
> +			struct cpuidle_state *idle = idle_get_state(rq);
> +			if (idle && idle->exit_latency < min_exit_latency) {
> +				/*
> +				 * We give priority to a CPU whose idle state
> +				 * has the smallest exit latency irrespective
> +				 * of any idle timestamp.
> +				 */
> +				min_exit_latency = idle->exit_latency;
> +				latest_idle_timestamp = rq->idle_stamp;
> +				shallowest_idle_cpu = i;
> +			} else if ((!idle || idle->exit_latency == min_exit_latency) &&
> +				   rq->idle_stamp > latest_idle_timestamp) {
> +				/*
> +				 * If equal or no active idle state, then
> +				 * the most recently idled CPU might have
> +				 * a warmer cache.
> +				 */
> +				latest_idle_timestamp = rq->idle_stamp;
> +				shallowest_idle_cpu = i;
> +			}
> +			cpuidle_put_state(rq);
> +		} else {
I think we needn't test non-idle CPUs once an idle CPU has been found.
What about this?
                                     } else if (shallowest_idle_cpu == -1) {

> +			load = weighted_cpuload(i);
> +			if (load < min_load ||
> +			    (load == min_load && i == this_cpu)) {
> +				min_load = load;
> +				least_loaded_cpu = i;
> +			}
>  		}
>  	}
>  
> -	return idlest;
> +	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
>  }
>  
>  /*
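
Spelled out on top of the quoted hunk, the suggestion would give
something like this (sketch only):

		} else if (shallowest_idle_cpu == -1) {
			/* no idle CPU seen yet: fall back to load */
			load = weighted_cpuload(i);
			if (load < min_load ||
			    (load == min_load && i == this_cpu)) {
				min_load = load;
				least_loaded_cpu = i;
			}
		}

so the load computation is skipped entirely once any idle CPU has been
found.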


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/2] sched/idle : find the best idle CPU with cpuidle info
  2014-09-18 23:24     ` Peter Zijlstra
@ 2014-09-19 18:22       ` Nicolas Pitre
  0 siblings, 0 replies; 35+ messages in thread
From: Nicolas Pitre @ 2014-09-19 18:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Rafael J. Wysocki, linux-kernel, linaro-kernel, linux-pm

On Fri, 19 Sep 2014, Peter Zijlstra wrote:

> On Wed, Sep 17, 2014 at 08:39:34PM -0400, Nicolas Pitre wrote:
> > 
> > Ping-ping.
> 
> Right, finally got to it. Too much travel and some time away from the
> computer with the 'usual' result :/

No problem.  Next wednesday I'd simply have sent a ping^3.  :-)


Nicolas

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] sched: let the scheduler see CPU idle states
  2014-09-18 23:28         ` Peter Zijlstra
@ 2014-09-19 18:30           ` Nicolas Pitre
  0 siblings, 0 replies; 35+ messages in thread
From: Nicolas Pitre @ 2014-09-19 18:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Ingo Molnar, Daniel Lezcano, Rafael J. Wysocki,
	linux-pm, linux-kernel, linaro-kernel

On Fri, 19 Sep 2014, Peter Zijlstra wrote:

> On Fri, Sep 19, 2014 at 01:17:15AM +0200, Peter Zijlstra wrote:
> > On Thu, Sep 18, 2014 at 02:32:25PM -0400, Nicolas Pitre wrote:
> > > On Thu, 18 Sep 2014, Paul E. McKenney wrote:
> > 
> > > > So what is it that you really need to do here?
> > > 
> > > In short, we don't want the cpufreq data to go away (see the 2 scenarios 
> > > above) while the scheduler is looking at it.  The scheduler uses the 
> > > provided accessors (see patch 2/2) so we can put any protection 
> > > mechanism we want in them.  A simple spinlock could do just as well 
> > > which should be good enough.
> > 
> > rq->lock disables interrupts so on that something like
> > kick_all_cpus_sync() will guarantee what you need --
> > wake_up_all_idle_cpus() will not.
> 
> Something like so then?

I'll trust you on anything that relates to RCU, as its subtleties are
still escaping me.

Still, the commit log refers to idle_put_state() which is no more, and 
that should be adjusted.

> 
> ---
> Subject: sched: let the scheduler see CPU idle states
> From: Daniel Lezcano <daniel.lezcano@linaro.org>
> Date: Thu, 04 Sep 2014 11:32:09 -0400
> 
> When the cpu enters idle, it stores the cpuidle state pointer in its
> struct rq instance which in turn could be used to make a better decision
> when balancing tasks.
> 
> As soon as the cpu exits its idle state, the struct rq reference is
> cleared.
> 
> There are a couple of situations where the idle state pointer could be changed
> while it is being consulted:
> 
> 1. For x86/acpi with dynamic c-states, when a laptop switches from battery
>    to AC, that could result in removing the deeper idle state. The acpi driver
>    triggers:
> 	'acpi_processor_cst_has_changed'
> 		'cpuidle_pause_and_lock'
> 			'cpuidle_uninstall_idle_handler'
> 				'kick_all_cpus_sync'.
> 
> All cpus will exit their idle state and the pointed object will be set to
> NULL.
> 
> 2. The cpuidle driver is unloaded. Logically that could happen but not
> in practice because the drivers are always compiled in and 95% of them are
> not coded to unregister themselves.  In any case, the unloading code must
> call 'cpuidle_unregister_device', that calls 'cpuidle_pause_and_lock'
> leading to 'kick_all_cpus_sync' as mentioned above.
> 
> A race can happen if we use the pointer and then one of these two scenarios
> occurs at the same moment.
> 
> In order to be safe, the idle state pointer stored in the rq must be
> used inside a rcu_read_lock section where we are protected with the
> 'rcu_barrier' in the 'cpuidle_uninstall_idle_handler' function. The
> idle_get_state() and idle_put_state() accessors should be used to that
> effect.
> 
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Ingo Molnar <mingo@redhat.com>
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> ---
>  drivers/cpuidle/cpuidle.c |    6 ++++++
>  kernel/sched/idle.c       |    6 ++++++
>  kernel/sched/sched.h      |   29 +++++++++++++++++++++++++++++
>  3 files changed, 41 insertions(+)
> 
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -225,6 +225,12 @@ void cpuidle_uninstall_idle_handler(void
>  		initialized = 0;
>  		wake_up_all_idle_cpus();
>  	}
> +
> +	/*
> +	 * Make sure external observers (such as the scheduler)
> +	 * are done looking at pointed idle states.
> +	 */
> +	kick_all_cpus_sync();
>  }
>  
>  /**
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -147,6 +147,9 @@ static void cpuidle_idle_call(void)
>  	    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &dev->cpu))
>  		goto use_default;
>  
> +	/* Take note of the planned idle state. */
> +	idle_set_state(this_rq(), &drv->states[next_state]);
> +
>  	/*
>  	 * Enter the idle state previously returned by the governor decision.
>  	 * This function will block until an interrupt occurs and will take
> @@ -154,6 +157,9 @@ static void cpuidle_idle_call(void)
>  	 */
>  	entered_state = cpuidle_enter(drv, dev, next_state);
>  
> +	/* The cpu is no longer idle or about to enter idle. */
> +	idle_set_state(this_rq(), NULL);
> +
>  	if (broadcast)
>  		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &dev->cpu);
>  
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -14,6 +14,7 @@
>  #include "cpuacct.h"
>  
>  struct rq;
> +struct cpuidle_state;
>  
>  /* task_struct::on_rq states: */
>  #define TASK_ON_RQ_QUEUED	1
> @@ -640,6 +641,11 @@ struct rq {
>  #ifdef CONFIG_SMP
>  	struct llist_head wake_list;
>  #endif
> +
> +#ifdef CONFIG_CPU_IDLE
> +	/* Must be inspected within a rcu lock section */
> +	struct cpuidle_state *idle_state;
> +#endif
>  };
>  
>  static inline int cpu_of(struct rq *rq)
> @@ -1193,6 +1199,29 @@ static inline void idle_exit_fair(struct
>  
>  #endif
>  
> +#ifdef CONFIG_CPU_IDLE
> +static inline void idle_set_state(struct rq *rq,
> +				  struct cpuidle_state *idle_state)
> +{
> +	rq->idle_state = idle_state;
> +}
> +
> +static inline struct cpuidle_state *idle_get_state(struct rq *rq)
> +{
> +	return rq->idle_state;
> +}
> +#else
> +static inline void idle_set_state(struct rq *rq,
> +				  struct cpuidle_state *idle_state)
> +{
> +}
> +
> +static inline struct cpuidle_state *idle_get_state(struct rq *rq)
> +{
> +	return NULL;
> +}
> +#endif
> +
>  extern void sysrq_sched_debug_show(void);
>  extern void sched_init_granularity(void);
>  extern void update_max_interval(void);
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu
  2014-09-04 15:32 ` [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu Nicolas Pitre
                     ` (3 preceding siblings ...)
  2014-09-19  4:49     ` Yao Dongdong
@ 2014-09-30 21:58   ` Rik van Riel
  2014-09-30 23:15     ` Nicolas Pitre
  4 siblings, 1 reply; 35+ messages in thread
From: Rik van Riel @ 2014-09-30 21:58 UTC (permalink / raw)
  To: Nicolas Pitre, Peter Zijlstra, Ingo Molnar
  Cc: Daniel Lezcano, Rafael J. Wysocki, linux-pm, linux-kernel, linaro-kernel

On 09/04/2014 11:32 AM, Nicolas Pitre wrote:
> The code in find_idlest_cpu() looks for the CPU with the smallest
> load. However, if multiple CPUs are idle, the first idle CPU is
> selected irrespective of the depth of its idle state.
> 
> Among the idle CPUs we should pick the one with the shallowest
> idle state, or the latest to have gone idle if all idle CPUs are in
> the same state.  The latter applies even when cpuidle is configured
> out.
> 
> This patch doesn't cover the following issues:

The main thing it does not cover is already-running tasks that get
woken up again, since those go through select_idle_sibling(), which
handles everything except newly forked and newly executed tasks.

I am looking at adding similar logic to select_idle_sibling().

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu
  2014-09-30 21:58   ` Rik van Riel
@ 2014-09-30 23:15     ` Nicolas Pitre
  2014-10-02 17:15       ` [PATCH RFC] sched,idle: teach select_idle_sibling about idle states Rik van Riel
  0 siblings, 1 reply; 35+ messages in thread
From: Nicolas Pitre @ 2014-09-30 23:15 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Peter Zijlstra, Ingo Molnar, Daniel Lezcano, Rafael J. Wysocki,
	linux-pm, linux-kernel, linaro-kernel

On Tue, 30 Sep 2014, Rik van Riel wrote:

> On 09/04/2014 11:32 AM, Nicolas Pitre wrote:
> > The code in find_idlest_cpu() looks for the CPU with the smallest
> > load. However, if multiple CPUs are idle, the first idle CPU is
> > selected irrespective of the depth of its idle state.
> > 
> > Among the idle CPUs we should pick the one with the shallowest
> > idle state, or the latest to have gone idle if all idle CPUs are in
> > the same state.  The latter applies even when cpuidle is configured
> > out.
> > 
> > This patch doesn't cover the following issues:
> 
> The main thing it does not cover is already-running tasks that get
> woken up again, since those go through select_idle_sibling(), which
> handles everything except newly forked and newly executed tasks.

True. Now that you bring this up, I remember that Peter mentioned it as 
well.

> I am looking at adding similar logic to select_idle_sibling()

OK thanks.


Nicolas

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH RFC] sched,idle: teach select_idle_sibling about idle states
  2014-09-30 23:15     ` Nicolas Pitre
@ 2014-10-02 17:15       ` Rik van Riel
  2014-10-03  6:04         ` Mike Galbraith
  2014-10-03  6:23         ` Mike Galbraith
  0 siblings, 2 replies; 35+ messages in thread
From: Rik van Riel @ 2014-10-02 17:15 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Peter Zijlstra, Ingo Molnar, Daniel Lezcano, Rafael J. Wysocki,
	linux-pm, linux-kernel, linaro-kernel

On Tue, 30 Sep 2014 19:15:00 -0400 (EDT)
Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Tue, 30 Sep 2014, Rik van Riel wrote:

> > The main thing it does not cover is already-running tasks that get
> > woken up again, since those go through select_idle_sibling(), which
> > handles everything except newly forked and newly executed tasks.
> 
> True. Now that you bring this up, I remember that Peter mentioned it as 
> well.
> 
> > I am looking at adding similar logic to select_idle_sibling()
> 
> OK thanks.

This patch is ugly. I have not bothered cleaning it up, because it
causes a regression with hackbench. Apparently for hackbench (and
potentially other sync wakeups), locality is more important than
idleness.

We may need to add a third clause before the search, something
along the lines of the snippet below, to ensure target gets selected
if neither target nor i is idle and the wakeup is synchronous...

    if (sync_wakeup && cpu_rq(target)->nr_running == 1)
	return target;
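
(sync_wakeup is a placeholder here; presumably it would be derived from
the WF_SYNC bit in the wake flags passed down to select_task_rq_fair().)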

I still need to run tests with other workloads, too.

Another consideration is that search costs with this patch
are potentially much higher. I suspect we may want to simply
propagate the load on each sched_group up the tree hierarchically,
with delta accounting, propagating the info upwards only when
the delta is significant, like done in __update_tg_runnable_avg.
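
Something along these lines is what I have in mind -- a pure sketch,
where the sg_load field and the group_of() helper are made up for
illustration:

	/*
	 * Push a load delta up the domain hierarchy, but only publish
	 * it when it has drifted by more than 1/64 of the stored value,
	 * in the spirit of __update_tg_runnable_avg().
	 */
	static void propagate_sg_load(int cpu, long delta)
	{
		struct sched_domain *sd;

		for_each_domain(cpu, sd) {
			/* hypothetical: the group spanning @cpu at this level */
			struct sched_group *sg = group_of(sd, cpu);
			long old = atomic_long_read(&sg->sg_load);	/* made-up field */

			if (abs(delta) <= (abs(old) >> 6))
				break;	/* delta not significant, stop here */
			atomic_long_add(delta, &sg->sg_load);
		}
	}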

---8<---

Subject: sched,idle: teach select_idle_sibling about idle states

Change select_idle_sibling to take cpu idle exit latency into
account.  First preference is to select the cpu with the lowest
exit latency from a completely idle sched_group inside the CPU;
if that is not available, we pick the CPU with the lowest exit
latency in any sched_group.

This increases the total search time of select_idle_sibling;
we may want to look into propagating load info up the sched_group
tree in some way. That information would also be useful to prevent
the wake_affine logic from causing a load imbalance between
sched_groups.

It is not clear when locality (from staying on the old CPU) beats
a lower idle exit latency. Having information on whether the CPU
drops content from the CPU caches in certain idle states would
help with that, but with multiple CPUs bound together in the same
physical CPU core, the hardware often does not do what we tell it,
anyway...

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 kernel/sched/fair.c | 47 +++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 41 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 10a5a28..12540cd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4465,41 +4465,76 @@ static int select_idle_sibling(struct task_struct *p, int target)
 {
 	struct sched_domain *sd;
 	struct sched_group *sg;
+	unsigned int min_exit_latency_thread = UINT_MAX;
+	unsigned int min_exit_latency_core = UINT_MAX;
+	int shallowest_idle_thread = -1;
+	int shallowest_idle_core = -1;
 	int i = task_cpu(p);
 
+	/* target always has some code running and is not in an idle state */
 	if (idle_cpu(target))
 		return target;
 
 	/*
 	 * If the previous cpu is cache affine and idle, don't be stupid.
+	 * XXX: does i's exit latency exceed sysctl_sched_migration_cost?
 	 */
 	if (i != target && cpus_share_cache(i, target) && idle_cpu(i))
 		return i;
 
 	/*
 	 * Otherwise, iterate the domains and find an eligible idle cpu.
+	 * First preference is finding a totally idle core with a thread
+	 * in a shallow idle state; second preference is whatever idle
+	 * thread has the shallowest idle state anywhere.
 	 */
 	sd = rcu_dereference(per_cpu(sd_llc, target));
 	for_each_lower_domain(sd) {
 		sg = sd->groups;
 		do {
+			unsigned int min_sg_exit_latency = UINT_MAX;
+			int shallowest_sg_idle_thread = -1;
+			bool all_idle = true;
+
 			if (!cpumask_intersects(sched_group_cpus(sg),
 						tsk_cpus_allowed(p)))
 				goto next;
 
 			for_each_cpu(i, sched_group_cpus(sg)) {
-				if (i == target || !idle_cpu(i))
-					goto next;
+				struct rq *rq;
+				struct cpuidle_state *idle;
+
+				if (i == target || !idle_cpu(i)) {
+					all_idle = false;
+					continue;
+				}
+
+				rq = cpu_rq(i);
+				idle = idle_get_state(rq);
+
+				if (idle && idle->exit_latency < min_sg_exit_latency) {
+					min_sg_exit_latency = idle->exit_latency;
+					shallowest_sg_idle_thread = i;
+				}
+			}
+
+			if (all_idle && min_sg_exit_latency < min_exit_latency_core) {
+				shallowest_idle_core = shallowest_sg_idle_thread;
+				min_exit_latency_core = min_sg_exit_latency;
+			} else if (min_sg_exit_latency < min_exit_latency_thread) {
+				shallowest_idle_thread = shallowest_sg_idle_thread;
+				min_exit_latency_thread = min_sg_exit_latency;
 			}
 
-			target = cpumask_first_and(sched_group_cpus(sg),
-					tsk_cpus_allowed(p));
-			goto done;
 next:
 			sg = sg->next;
 		} while (sg != sd->groups);
 	}
-done:
+	if (shallowest_idle_core >= 0)
+		target = shallowest_idle_core;
+	else if (shallowest_idle_thread >= 0)
+		target = shallowest_idle_thread;
+
 	return target;
 }
 


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC] sched,idle: teach select_idle_sibling about idle states
  2014-10-02 17:15       ` [PATCH RFC] sched,idle: teach select_idle_sibling about idle states Rik van Riel
@ 2014-10-03  6:04         ` Mike Galbraith
  2014-10-03  6:23         ` Mike Galbraith
  1 sibling, 0 replies; 35+ messages in thread
From: Mike Galbraith @ 2014-10-03  6:04 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Nicolas Pitre, Peter Zijlstra, Ingo Molnar, Daniel Lezcano,
	Rafael J. Wysocki, linux-pm, linux-kernel, linaro-kernel

On Thu, 2014-10-02 at 13:15 -0400, Rik van Riel wrote:

> This patch is ugly. I have not bothered cleaning it up, because it
> causes a regression with hackbench. Apparently for hackbench (and
> potentially other sync wakeups), locality is more important than
> idleness.
> 
> We may need to add a third clause before the search, something
> along the lines of the snippet below, to ensure target gets selected
> if neither target nor i is idle and the wakeup is synchronous...
> 
>     if (sync_wakeup && cpu_rq(target)->nr_running == 1)
> 	return target;

I recommend you forget that trusting the sync hint ever sprang to mind;
it is often a big fat lie.

-Mike


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC] sched,idle: teach select_idle_sibling about idle states
  2014-10-02 17:15       ` [PATCH RFC] sched,idle: teach select_idle_sibling about idle states Rik van Riel
  2014-10-03  6:04         ` Mike Galbraith
@ 2014-10-03  6:23         ` Mike Galbraith
  2014-10-03  7:50           ` Peter Zijlstra
  1 sibling, 1 reply; 35+ messages in thread
From: Mike Galbraith @ 2014-10-03  6:23 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Nicolas Pitre, Peter Zijlstra, Ingo Molnar, Daniel Lezcano,
	Rafael J. Wysocki, linux-pm, linux-kernel, linaro-kernel

On Thu, 2014-10-02 at 13:15 -0400, Rik van Riel wrote:

> Subject: sched,idle: teach select_idle_sibling about idle states
> 
> Change select_idle_sibling to take cpu idle exit latency into
> account.  First preference is to select the cpu with the lowest
> exit latency from a completely idle sched_group inside the CPU;
> if that is not available, we pick the CPU with the lowest exit
> latency in any sched_group.
> 
> This increases the total search time of select_idle_sibling;
> we may want to look into propagating load info up the sched_group
> tree in some way. That information would also be useful to prevent
> the wake_affine logic from causing a load imbalance between
> sched_groups.

A generic boo hiss aimed in the general direction of all of this
let's-go-look-at-every-possibility-on-every-wakeup stuff.  Less is more.

-Mike


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC] sched,idle: teach select_idle_sibling about idle states
  2014-10-03  6:23         ` Mike Galbraith
@ 2014-10-03  7:50           ` Peter Zijlstra
  2014-10-03 13:05             ` Mike Galbraith
  2014-10-03 14:28             ` Rik van Riel
  0 siblings, 2 replies; 35+ messages in thread
From: Peter Zijlstra @ 2014-10-03  7:50 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Rik van Riel, Nicolas Pitre, Ingo Molnar, Daniel Lezcano,
	Rafael J. Wysocki, linux-pm, linux-kernel, linaro-kernel

On Fri, Oct 03, 2014 at 08:23:04AM +0200, Mike Galbraith wrote:
> On Thu, 2014-10-02 at 13:15 -0400, Rik van Riel wrote:
> 
> > Subject: sched,idle: teach select_idle_sibling about idle states
> > 
> > Change select_idle_sibling to take cpu idle exit latency into
> > account.  First preference is to select the cpu with the lowest
> > exit latency from a completely idle sched_group inside the CPU;
> > if that is not available, we pick the CPU with the lowest exit
> > latency in any sched_group.
> > 
> > This increases the total search time of select_idle_sibling;
> > we may want to look into propagating load info up the sched_group
> > tree in some way. That information would also be useful to prevent
> > the wake_affine logic from causing a load imbalance between
> > sched_groups.
> 
> A generic boo hiss aimed in the general direction of all of this
> let's-go-look-at-every-possibility-on-every-wakeup stuff.  Less is more.

I hear you, can you see actual slowdown with the patch? While the worst
case doesn't change, it does make the average case equal to the worst
case iteration -- where we previously would average out at inspecting
half the CPUs before finding an idle one, we'd now always inspect all of
them in order to compare all idle ones on their properties.

Also, with the latest generation of Haswell Xeons having 18 cores (36
threads) this is one massively painful loop for sure.

I'm just not sure what to do about it.. I suppose we can artificially
split it into smaller groups, but I bet that'll hurt some; still, if we
can show it gains more we might be able to do it. The only real
problem is actual numbers/workloads (isn't it always) :/

One thing I suppose we could try is keeping a 'busy' flag at the
llc domain which is set when all CPUs are busy (we'll clear it from
new_idle); that way we can avoid the entire iteration if we know it's
pointless.

Hmm...
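
For the record, the flag idea could look roughly like this (a sketch;
the llc_all_busy field is made up):

	/* called when a full scan of the LLC domain found no idle CPU */
	static void note_llc_busy(int cpu)
	{
		struct sched_domain *sd = rcu_dereference(per_cpu(sd_llc, cpu));

		if (sd)
			sd->llc_all_busy = 1;	/* hypothetical field */
	}

and then, at the top of select_idle_sibling(), before any iteration:

	sd = rcu_dereference(per_cpu(sd_llc, target));
	if (sd && sd->llc_all_busy)
		return target;	/* scanning would be pointless */

with the flag cleared again from the newidle path when a CPU goes idle.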

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC] sched,idle: teach select_idle_sibling about idle states
  2014-10-03  7:50           ` Peter Zijlstra
@ 2014-10-03 13:05             ` Mike Galbraith
  2014-10-03 14:28             ` Rik van Riel
  1 sibling, 0 replies; 35+ messages in thread
From: Mike Galbraith @ 2014-10-03 13:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rik van Riel, Nicolas Pitre, Ingo Molnar, Daniel Lezcano,
	Rafael J. Wysocki, linux-pm, linux-kernel, linaro-kernel

On Fri, 2014-10-03 at 09:50 +0200, Peter Zijlstra wrote: 
> On Fri, Oct 03, 2014 at 08:23:04AM +0200, Mike Galbraith wrote:

> > A generic boo hiss aimed in the general direction of all of this
> > let's-go-look-at-every-possibility-on-every-wakeup stuff.  Less is more.
> 
> I hear you, can you see actual slowdown with the patch? While the worst
> case doesn't change, it does make the average case equal to the worst
> case iteration -- where we previously would average out at inspecting
> half the CPUs before finding an idle one, we'd now always inspect all of
> them in order to compare all idle ones on their properties.
> 
> Also, with the latest generation of Haswell Xeons having 18 cores (36
> threads) this is one massively painful loop for sure.

Yeah, the things are getting too damn big.  I didn't try the patch and
measure anything; my gut instantly said "nope, not worth it".
  
> I'm just not sure what to do about it.. I suppose we can artificially
> split it into smaller groups, but I bet that'll hurt some, but if we can
> show it gains more we might still be able to do it. The only real
> problem is actual numbers/workloads (isn't it always) :/
> 
> One thing I suppose we could try is keeping a 'busy' flag at the
> llc domain which is set when all CPUs are busy (we'll clear it from
> new_idle) that way we can avoid the entire iteration if we know its
> pointless.

On one of those huge packages, heck, even on an 8 core that could save a
substantial number of busy box cycles.

-Mike


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC] sched,idle: teach select_idle_sibling about idle states
  2014-10-03  7:50           ` Peter Zijlstra
  2014-10-03 13:05             ` Mike Galbraith
@ 2014-10-03 14:28             ` Rik van Riel
  2014-10-03 14:46               ` Peter Zijlstra
  2014-10-03 18:52               ` Nicolas Pitre
  1 sibling, 2 replies; 35+ messages in thread
From: Rik van Riel @ 2014-10-03 14:28 UTC (permalink / raw)
  To: Peter Zijlstra, Mike Galbraith
  Cc: Nicolas Pitre, Ingo Molnar, Daniel Lezcano, Rafael J. Wysocki,
	linux-pm, linux-kernel, linaro-kernel

On 10/03/2014 03:50 AM, Peter Zijlstra wrote:
> On Fri, Oct 03, 2014 at 08:23:04AM +0200, Mike Galbraith wrote:
>> On Thu, 2014-10-02 at 13:15 -0400, Rik van Riel wrote:
>> 
>>> Subject: sched,idle: teach select_idle_sibling about idle
>>> states
>>> 
>>> Change select_idle_sibling to take cpu idle exit latency into 
>>> account.  First preference is to select the cpu with the
>>> lowest exit latency from a completely idle sched_group inside
>>> the CPU; if that is not available, we pick the CPU with the
>>> lowest exit latency in any sched_group.
>>> 
>>> This increases the total search time of select_idle_sibling; we
>>> may want to look into propagating load info up the sched_group 
>>> tree in some way. That information would also be useful to
>>> prevent the wake_affine logic from causing a load imbalance
>>> between sched_groups.
>> 
>> A generic boo hiss aimed in the general direction of all of this
>> let's-go-look-at-every-possibility-on-every-wakeup stuff.  Less
>> is more.
> 
> I hear you, can you see actual slowdown with the patch? While the
> worst case doesn't change, it does make the average case equal to
> the worst case iteration -- where we previously would average out
> at inspecting half the CPUs before finding an idle one, we'd now
> always inspect all of them in order to compare all idle ones on
> their properties.
> 
> Also, with the latest generation of Haswell Xeons having 18 cores
> (36 threads) this is one massively painful loop for sure.

We have 3 different goals when selecting a runqueue for a task:
1) locality: get the task running close to where it has stuff cached
2) work preserving: get the task running ASAP, and preferably on a
   fully idle core
3) idle state latency: place the task on a CPU that can start running
   it ASAP

We may also consider the interplay of the above 3 to have an impact on
4) power use: pack tasks on some CPUs so other CPUs can go into deeper
   idle states

The current implementation is a "compromise" between (1) and (2),
with a strong preference for (2), falling back to (1) if no fully
idle core is found.

My ugly hack isn't any better, trading off (1) in order to be better
at (2) and (3). Whether it even affects (4) remains to be seen.

I know my patch is probably unacceptable, but I do think it is important
that we talk about the problem, and hopefully agree on exactly what the
problem is that we want to solve.

One big question in my mind is, when is locality more important, and
when is work preserving more important?  Do we have an answer to that
question?

The current code has the potential to be quite painful on systems with
a large number of cores per chip, so we will have to change things
anyway...

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC] sched,idle: teach select_idle_sibling about idle states
  2014-10-03 14:28             ` Rik van Riel
@ 2014-10-03 14:46               ` Peter Zijlstra
  2014-10-03 15:37                 ` Rik van Riel
  2014-10-03 18:52               ` Nicolas Pitre
  1 sibling, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2014-10-03 14:46 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Mike Galbraith, Nicolas Pitre, Ingo Molnar, Daniel Lezcano,
	Rafael J. Wysocki, linux-pm, linux-kernel, linaro-kernel

On Fri, Oct 03, 2014 at 10:28:42AM -0400, Rik van Riel wrote:
> We have 3 different goals when selecting a runqueue for a task:
> 1) locality: get the task running close to where it has stuff cached
> 2) work preserving: get the task running ASAP, and preferably on a
>    fully idle core
> 3) idle state latency: place the task on a CPU that can start running
>    it ASAP

3) can also be considered part of power awareness, seeing how it will
try and let CPUs reach their deep idle potential.

> We may also consider the interplay of the above 3 to have an impact on
> 4) power use: pack tasks on some CPUs so other CPUs can go into deeper
>    idle states
> 
> The current implementation is a "compromise" between (1) and (2),
> with a strong preference for (2), falling back to (1) if no fully
> idle core is found.
> 
> My ugly hack isn't any better, trading off (1) in order to be better
> at (2) and (3). Whether it even affects (4) remains to be seen.
> 
> I know my patch is probably unacceptable, but I do think it is important
> that we talk about the problem, and hopefully agree on exactly what the
> problem is that we want to solve.

Yeah, we've been through this several times, it basically boils down to
the amount of fail vs win on 'various' workloads. The endless problem is
of course that the fail vs win ratio is entirely workload dependent and
as ever there is no comprehensive set.

The last time this came up was when Mike tried to do his cache buddy
idea, which basically reduced things to only looking at 2 cpus. That
made some things fly and some things tank.

> One big question in my mind is, when is locality more important, and
> when is work preserving more important?  Do we have an answer to that
> question?

Typically 2) is important when there are lots of short-running tasks
around; any queueing typically destroys throughput in that case.

> The current code has the potential to be quite painful on systems with
> a large number of cores per chip, so we will have to change things
> anyway...

What I said.. so far we've failed at coming up with anything sane
though; we've found that 2 cpus is too small a slice to look at
and we're fairly sure 18/36 is too large :-)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC] sched,idle: teach select_idle_sibling about idle states
  2014-10-03 14:46               ` Peter Zijlstra
@ 2014-10-03 15:37                 ` Rik van Riel
  2014-10-09 16:04                   ` Peter Zijlstra
  0 siblings, 1 reply; 35+ messages in thread
From: Rik van Riel @ 2014-10-03 15:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Nicolas Pitre, Ingo Molnar, Daniel Lezcano,
	Rafael J. Wysocki, linux-pm, linux-kernel, linaro-kernel

On 10/03/2014 10:46 AM, Peter Zijlstra wrote:
> On Fri, Oct 03, 2014 at 10:28:42AM -0400, Rik van Riel wrote:

>> The current code has the potential to be quite painful on systems
>> with a large number of cores per chip, so we will have to change
>> things anyway...
> 
> What I said.. so far we've failed at coming up with anything sane
> though; we've found that 2 cpus is too small a slice to look
> at and we're fairly sure 18/36 is too large :-)

Some more brainstorming points...

1) We should probably (lazily/batched?) propagate load information
   up the sched_group tree.  This will be useful for wake_affine,
   load_balancing, find_idlest_cpu, and select_idle_sibling

2) With both find_idlest_cpu and select_idle_sibling walking down
   the tree from the LLC level, they could probably share code

3) Counting both blocked and runnable load may give better long
   term stability of loads, resulting in a reduction in work
   preserving behaviour, but an improvement in locality - this
   could be worthwhile, but it is hard to say in advance

4) We can be pretty sure that CPU makers are not going to stop
   at a mere 18 cores. We need to subdivide things below the LLC
   level, turning select_idle_sibling and find_idlest_cpu into
   a tree walk.

   This means whatever selection criteria are used by these need
   to be propagated up the sched_group tree. This, in turn, means
   we probably need to restrict ourselves to things that do not get
   changed/updated too often.
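
To make (4) a bit more concrete, the cached per-group state could be
something along these lines (a sketch; none of these fields exist
today):

	/* lazily updated summary of a sched_group, for walking the tree */
	struct sg_idle_summary {
		unsigned int	nr_idle;		/* CPUs currently idle */
		unsigned int	min_exit_latency;	/* shallowest idle state */
		unsigned long	load;			/* aggregate runnable load */
	};

select_idle_sibling() and find_idlest_cpu() would then compare such
summaries at each level and descend only into the most promising group
instead of iterating every CPU.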

Am I overlooking anything?

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC] sched,idle: teach select_idle_sibling about idle states
  2014-10-03 14:28             ` Rik van Riel
  2014-10-03 14:46               ` Peter Zijlstra
@ 2014-10-03 18:52               ` Nicolas Pitre
  1 sibling, 0 replies; 35+ messages in thread
From: Nicolas Pitre @ 2014-10-03 18:52 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Peter Zijlstra, Mike Galbraith, Ingo Molnar, Daniel Lezcano,
	Rafael J. Wysocki, linux-pm, linux-kernel, linaro-kernel

On Fri, 3 Oct 2014, Rik van Riel wrote:

> We have 3 different goals when selecting a runqueue for a task:
> 1) locality: get the task running close to where it has stuff cached
> 2) work preserving: get the task running ASAP, and preferably on a
>    fully idle core
> 3) idle state latency: place the task on a CPU that can start running
>    it ASAP
> 
> We may also consider the interplay of the above 3 to have an impact on
> 4) power use: pack tasks on some CPUs so other CPUs can go into deeper
>    idle states

In my mind the actual choice is between (1) and (2).  Once you've decided
on (2) then obviously (3) should be implied all the time. And by having
(3), (4) should be a natural side effect of not selecting idle CPUs
randomly.

By selecting (1) you already have (4).

The deficient part right now is (3) as a consequence of (2).  Fixing 
(3) should not have to affect (1).

> The current implementation is a "compromise" between (1) and (2),
> with a strong preference for (2), falling back to (1) if no fully
> idle core is found.
> 
> My ugly hack isn't any better, trading off (1) in order to be better
> at (2) and (3). Whether it even affects (4) remains to be seen.

(4) is greatly influenced by (3) on mobile platforms, especially those 
with a cluster topology.  This might not be as significant on server 
type systems, although performance should benefit as well from the 
smaller wake-up latency.

On a mobile system losing 10% performance to save 20% on power usage 
might be an excellent compromise.  Maybe not so on a server system where 
performance is everything.


Nicolas

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC] sched,idle: teach select_idle_sibling about idle states
  2014-10-03 15:37                 ` Rik van Riel
@ 2014-10-09 16:04                   ` Peter Zijlstra
  0 siblings, 0 replies; 35+ messages in thread
From: Peter Zijlstra @ 2014-10-09 16:04 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Mike Galbraith, Nicolas Pitre, Ingo Molnar, Daniel Lezcano,
	Rafael J. Wysocki, linux-pm, linux-kernel, linaro-kernel

On Fri, Oct 03, 2014 at 11:37:31AM -0400, Rik van Riel wrote:
> Some more brainstorming points...
> 
> 1) We should probably (lazily/batched?) propagate load information
>    up the sched_group tree.  This will be useful for wake_affine,
>    load_balancing, find_idlest_cpu, and select_idle_sibling
> 
> 2) With both find_idlest_cpu and select_idle_sibling walking down
>    the tree from the LLC level, they could probably share code
> 
> 3) Counting both blocked and runnable load may give better long
>    term stability of loads, resulting in a reduction in work
>    preserving behaviour, but an improvement in locality - this
>    could be worthwhile, but it is hard to say in advance
> 
> 4) We can be pretty sure that CPU makers are not going to stop
>    at a mere 18 cores. We need to subdivide things below the LLC
>    level, turning select_idle_sibling and find_idlest_cpu into
>    a tree walk.
> 
>    This means whatever selection criteria are used by these need
>    to be propagated up the sched_group tree. This, in turn, means
>    we probably need to restrict ourselves to things that do not get
>    changed/updated too often.
> 
> Am I overlooking anything?

Well, we can certainly try something like that; but your last point
seems like a contradiction, seeing how _the_ important point for
select_idle_sibling() is the actual idle state, and that by definition
is something that can change/update often.

But yes, the only viable option is some artificial breakup of the
topology and we can indeed try and bridge the gap with some caching.

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2014-10-09 16:04 UTC | newest]

Thread overview: 35+ messages
2014-09-04 15:32 [PATCH v2 0/2] sched/idle : find the best idle CPU with cpuidle info Nicolas Pitre
2014-09-04 15:32 ` [PATCH v2 1/2] sched: let the scheduler see CPU idle states Nicolas Pitre
2014-09-18 17:37   ` Paul E. McKenney
2014-09-18 17:39     ` Paul E. McKenney
2014-09-18 23:15       ` Peter Zijlstra
2014-09-18 18:32     ` Nicolas Pitre
2014-09-18 23:17       ` Peter Zijlstra
2014-09-18 23:28         ` Peter Zijlstra
2014-09-19 18:30           ` Nicolas Pitre
2014-09-04 15:32 ` [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu Nicolas Pitre
2014-09-05  7:52   ` Daniel Lezcano
2014-09-18 23:46   ` Peter Zijlstra
2014-09-19  0:05   ` Peter Zijlstra
2014-09-19  4:49   ` Yao Dongdong
2014-09-30 21:58   ` Rik van Riel
2014-09-30 23:15     ` Nicolas Pitre
2014-10-02 17:15       ` [PATCH RFC] sched,idle: teach select_idle_sibling about idle states Rik van Riel
2014-10-03  6:04         ` Mike Galbraith
2014-10-03  6:23         ` Mike Galbraith
2014-10-03  7:50           ` Peter Zijlstra
2014-10-03 13:05             ` Mike Galbraith
2014-10-03 14:28             ` Rik van Riel
2014-10-03 14:46               ` Peter Zijlstra
2014-10-03 15:37                 ` Rik van Riel
2014-10-09 16:04                   ` Peter Zijlstra
2014-10-03 18:52               ` Nicolas Pitre
2014-09-10 21:35 ` [PATCH v2 0/2] sched/idle : find the best idle CPU with cpuidle info Nicolas Pitre
2014-09-10 22:50   ` Rafael J. Wysocki
2014-09-10 23:25     ` Nicolas Pitre
2014-09-10 23:28       ` Nicolas Pitre
2014-09-10 23:50       ` Rafael J. Wysocki
2014-09-18  0:39   ` Nicolas Pitre
2014-09-18 23:24     ` Peter Zijlstra
2014-09-19 18:22       ` Nicolas Pitre
