* [RFC PATCH] sched: find the latest idle cpu
@ 2014-01-15  4:07 Alex Shi
  2014-01-15  4:31 ` Michael wang
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Alex Shi @ 2014-01-15  4:07 UTC (permalink / raw)
  To: mingo, peterz, tglx, daniel.lezcano, vincent.guittot, morten.rasmussen
  Cc: linux-kernel, akpm, fengguang.wu, linaro-kernel, Alex Shi

Currently we only try to find the least loaded cpu. If several cpus are idle,
we just pick the first one in the cpu mask.

In fact we can pick an idle cpu that was just interrupted, or the most
recently idled cpu, and may then benefit in both latency and power.
The selected cpu may not be the best one, since other cpus may be interrupted
while we are selecting. But being more exact would cost too much.

Signed-off-by: Alex Shi <alex.shi@linaro.org>
---
 kernel/sched/fair.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7395d9..fb52d26 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4167,6 +4167,26 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 			min_load = load;
 			idlest = i;
 		}
+#ifdef CONFIG_NO_HZ_COMMON
+		/*
+		 * Coarsely to get the latest idle cpu for shorter latency and
+		 * possible power benefit.
+		 */
+		if (!min_load) {
+			struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
+
+			s64 latest_wake = 0;
+			/* idle cpu doing irq */
+			if (ts->inidle && !ts->idle_active)
+				idlest = i;
+			/* the cpu resched */
+			else if (!ts->inidle)
+				idlest = i;
+			/* find latest idle cpu */
+			else if (ktime_to_us(ts->idle_entrytime) > latest_wake)
+				idlest = i;
+		}
+#endif
 	}
 
 	return idlest;
-- 
1.8.1.2



* Re: [RFC PATCH] sched: find the latest idle cpu
  2014-01-15  4:07 [RFC PATCH] sched: find the latest idle cpu Alex Shi
@ 2014-01-15  4:31 ` Michael wang
  2014-01-15  4:48   ` Alex Shi
  2014-01-15  5:33 ` Michael wang
  2014-01-15  7:35 ` Peter Zijlstra
  2 siblings, 1 reply; 15+ messages in thread
From: Michael wang @ 2014-01-15  4:31 UTC (permalink / raw)
  To: Alex Shi, mingo, peterz, tglx, daniel.lezcano, vincent.guittot,
	morten.rasmussen
  Cc: linux-kernel, akpm, fengguang.wu, linaro-kernel

Hi, Alex

On 01/15/2014 12:07 PM, Alex Shi wrote:
[snip] 		}
> +#ifdef CONFIG_NO_HZ_COMMON
> +		/*
> +		 * Coarsely to get the latest idle cpu for shorter latency and
> +		 * possible power benefit.
> +		 */
> +		if (!min_load) {
> +			struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
> +
> +			s64 latest_wake = 0;

I guess we missed some code for latest_wake here?
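
To illustrate the point about latest_wake, here is a minimal userspace sketch (not kernel code; both function names and the plain array of idle-entry times are assumptions made for the example). When the running maximum is declared inside the loop body and never updated, every cpu with a non-zero entry time passes the comparison, so the last such cpu wins rather than the latest idle one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Buggy shape: the running maximum lives inside the loop body, so it is
 * re-initialised to 0 on every iteration and never updated. Any cpu with a
 * non-zero entry time passes the test, and the last such cpu is returned.
 */
int latest_idle_buggy(const int64_t *idle_entry_us, size_t ncpus)
{
	int idlest = -1;

	assert(idle_entry_us != NULL);
	for (size_t i = 0; i < ncpus; i++) {
		int64_t latest_wake = 0;	/* reset on each iteration */

		if (idle_entry_us[i] > latest_wake)
			idlest = (int)i;
	}
	return idlest;
}

/*
 * Fixed shape: the maximum is declared outside the loop and updated
 * whenever a later idle-entry time is seen.
 */
int latest_idle_fixed(const int64_t *idle_entry_us, size_t ncpus)
{
	int64_t latest_wake = 0;
	int idlest = -1;

	assert(idle_entry_us != NULL);
	for (size_t i = 0; i < ncpus; i++) {
		if (idle_entry_us[i] > latest_wake) {
			latest_wake = idle_entry_us[i];
			idlest = (int)i;
		}
	}
	return idlest;
}
```

With entry times {100, 400, 250, 50}, the buggy variant returns the last index while the fixed variant returns the index of the largest entry time.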

Regards,
Michael Wang

> +			/* idle cpu doing irq */
> +			if (ts->inidle && !ts->idle_active)
> +				idlest = i;
> +			/* the cpu resched */
> +			else if (!ts->inidle)
> +				idlest = i;
> +			/* find latest idle cpu */
> +			else if (ktime_to_us(ts->idle_entrytime) > latest_wake)
> +				idlest = i;
> +		}
> +#endif
>  	}
> 
>  	return idlest;
> 



* Re: [RFC PATCH] sched: find the latest idle cpu
  2014-01-15  4:31 ` Michael wang
@ 2014-01-15  4:48   ` Alex Shi
  2014-01-15  4:53     ` Alex Shi
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Shi @ 2014-01-15  4:48 UTC (permalink / raw)
  To: Michael wang, mingo, peterz, tglx, daniel.lezcano,
	vincent.guittot, morten.rasmussen
  Cc: linux-kernel, akpm, fengguang.wu, linaro-kernel

On 01/15/2014 12:31 PM, Michael wang wrote:
> Hi, Alex
> 
> On 01/15/2014 12:07 PM, Alex Shi wrote:
> [snip] 		}
>> +#ifdef CONFIG_NO_HZ_COMMON
>> +		/*
>> +		 * Coarsely to get the latest idle cpu for shorter latency and
>> +		 * possible power benefit.
>> +		 */
>> +		if (!min_load) {

This should be !load.
>> +			struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
>> +
>> +			s64 latest_wake = 0;
> 
> I guess we missed some code for latest_wake here?

Yes, thanks for the reminder!

Here is the updated patch:

====

From c3a88e73fed3da96549b5a922076e996832685f8 Mon Sep 17 00:00:00 2001
From: Alex Shi <alex.shi@linaro.org>
Date: Tue, 14 Jan 2014 23:07:42 +0800
Subject: [PATCH] sched: find the latest idle cpu

Currently we only try to find the least loaded cpu. If several cpus are idle,
we just pick the first one in the cpu mask.

In fact we can pick an idle cpu that was just interrupted, or the most
recently idled cpu, and may then benefit in both latency and power.
The selected cpu may not be the best one, since other cpus may be interrupted
while we are selecting. But being more exact would cost too much.

Signed-off-by: Alex Shi <alex.shi@linaro.org>
---
 kernel/sched/fair.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7395d9..73a2a07 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4167,6 +4167,31 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 			min_load = load;
 			idlest = i;
 		}
+#ifdef CONFIG_NO_HZ_COMMON
+		/*
+		 * Coarsely to get the latest idle cpu for shorter latency and
+		 * possible power benefit.
+		 */
+		if (!load) {
+			struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
+
+			s64 latest_wake = 0;
+			/* idle cpu doing irq */
+			if (ts->inidle && !ts->idle_active)
+				idlest = i;
+			/* the cpu resched */
+			else if (!ts->inidle)
+				idlest = i;
+			/* find latest idle cpu */
+			else {
+				s64 temp = ktime_to_us(ts->idle_entrytime);
+				if (temp > latest_wake) {
+					latest_wake = temp;
+					idlest = i;
+				}
+			}
+		}
+#endif
 	}
 
 	return idlest;
-- 
1.8.1.2

-- 
Thanks
    Alex


* Re: [RFC PATCH] sched: find the latest idle cpu
  2014-01-15  4:48   ` Alex Shi
@ 2014-01-15  4:53     ` Alex Shi
  2014-01-15  5:06       ` Alex Shi
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Shi @ 2014-01-15  4:53 UTC (permalink / raw)
  To: Michael wang, mingo, peterz, tglx, daniel.lezcano,
	vincent.guittot, morten.rasmussen
  Cc: linux-kernel, akpm, fengguang.wu, linaro-kernel

On 01/15/2014 12:48 PM, Alex Shi wrote:
> On 01/15/2014 12:31 PM, Michael wang wrote:
>> Hi, Alex
>>
>> On 01/15/2014 12:07 PM, Alex Shi wrote:
>> [snip] 		}
>>> +#ifdef CONFIG_NO_HZ_COMMON
>>> +		/*
>>> +		 * Coarsely to get the latest idle cpu for shorter latency and
>>> +		 * possible power benefit.
>>> +		 */
>>> +		if (!min_load) {
> 
> here should be !load.
>>> +			struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
>>> +
>>> +			s64 latest_wake = 0;
>>
>> I guess we missed some code for latest_wake here?
> 
> Yes, thanks for reminder!
> 
> so updated patch:
> 

Oops, still incorrect. Updated again:

===

From 5d48303b3eb3b5ca7fde54a6dfcab79cff360403 Mon Sep 17 00:00:00 2001
From: Alex Shi <alex.shi@linaro.org>
Date: Tue, 14 Jan 2014 23:07:42 +0800
Subject: [PATCH] sched: find the latest idle cpu

Currently we only try to find the least loaded cpu. If several cpus are idle,
we just pick the first one in the cpu mask.

In fact we can pick an idle cpu that was just interrupted, or the most
recently idled cpu, and may then benefit in both latency and power.
The selected cpu may not be the best one, since other cpus may be interrupted
while we are selecting. But being more exact would cost too much.

Signed-off-by: Alex Shi <alex.shi@linaro.org>
---
 kernel/sched/fair.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7395d9..e2c4cd9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4161,12 +4161,38 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 
 	/* Traverse only the allowed CPUs */
 	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
+		s64 latest_wake = 0;
+
 		load = weighted_cpuload(i);
 
 		if (load < min_load || (load == min_load && i == this_cpu)) {
 			min_load = load;
 			idlest = i;
 		}
+#ifdef CONFIG_NO_HZ_COMMON
+		/*
+		 * Coarsely to get the latest idle cpu for shorter latency and
+		 * possible power benefit.
+		 */
+		if (!load) {
+			struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
+
+			/* idle cpu doing irq */
+			if (ts->inidle && !ts->idle_active)
+				idlest = i;
+			/* the cpu resched */
+			else if (!ts->inidle)
+				idlest = i;
+			/* find latest idle cpu */
+			else {
+				s64 temp = ktime_to_us(ts->idle_entrytime);
+				if (temp > latest_wake) {
+					latest_wake = temp;
+					idlest = i;
+				}
+			}
+		}
+#endif
 	}
 
 	return idlest;
-- 
1.8.1.2

-- 
Thanks
    Alex


* Re: [RFC PATCH] sched: find the latest idle cpu
  2014-01-15  4:53     ` Alex Shi
@ 2014-01-15  5:06       ` Alex Shi
  0 siblings, 0 replies; 15+ messages in thread
From: Alex Shi @ 2014-01-15  5:06 UTC (permalink / raw)
  To: Michael wang, mingo, peterz, tglx, daniel.lezcano,
	vincent.guittot, morten.rasmussen
  Cc: linux-kernel, akpm, fengguang.wu, linaro-kernel

On 01/15/2014 12:53 PM, Alex Shi wrote:
>>> >> I guess we missed some code for latest_wake here?
>> > 
>> > Yes, thanks for reminder!
>> > 
>> > so updated patch:
>> > 
> ops, still incorrect. re-updated:

I updated the wrong file. Updating once more. :(

===

From b75e43bb77df14e2209532c1e5c48e0e03afa414 Mon Sep 17 00:00:00 2001
From: Alex Shi <alex.shi@linaro.org>
Date: Tue, 14 Jan 2014 23:07:42 +0800
Subject: [PATCH] sched: find the latest idle cpu

Currently we only try to find the least loaded cpu. If several cpus are idle,
we just pick the first one in the cpu mask.

In fact we can pick an idle cpu that was just interrupted, or the most
recently idled cpu, and may then benefit in both latency and power.
The selected cpu may not be the best one, since other cpus may be interrupted
while we are selecting. But being more exact would cost too much.

Signed-off-by: Alex Shi <alex.shi@linaro.org>
---
 kernel/sched/fair.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7395d9..f82ca3d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4159,6 +4159,10 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 	int idlest = -1;
 	int i;
 
+#ifdef CONFIG_NO_HZ_COMMON
+	s64 latest_wake = 0;
+#endif
+
 	/* Traverse only the allowed CPUs */
 	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
 		load = weighted_cpuload(i);
@@ -4167,6 +4171,30 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 			min_load = load;
 			idlest = i;
 		}
+#ifdef CONFIG_NO_HZ_COMMON
+		/*
+		 * Coarsely to get the latest idle cpu for shorter latency and
+		 * possible power benefit.
+		 */
+		if (!load) {
+			struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
+
+			/* idle cpu doing irq */
+			if (ts->inidle && !ts->idle_active)
+				idlest = i;
+			/* the cpu resched */
+			else if (!ts->inidle)
+				idlest = i;
+			/* find latest idle cpu */
+			else {
+				s64 temp = ktime_to_us(ts->idle_entrytime);
+				if (temp > latest_wake) {
+					latest_wake = temp;
+					idlest = i;
+				}
+			}
+		}
+#endif
 	}
 
 	return idlest;
-- 
1.8.1.2

-- 
Thanks
    Alex
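
For readers who want to experiment with the selection logic outside the kernel, the loop in this last version of the patch can be modelled in userspace roughly as follows. This is only a sketch: struct mock_tick_sched, struct mock_cpu and find_idlest_cpu_model() are hypothetical stand-ins for the three tick_sched fields the patch reads and for weighted_cpuload(), and the cpumask filtering is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the tick_sched fields the patch reads. */
struct mock_tick_sched {
	bool inidle;		/* cpu is in the tick idle mode */
	bool idle_active;	/* cleared while the idle cpu handles an irq */
	int64_t idle_entry_us;	/* time the cpu entered idle, in microseconds */
};

struct mock_cpu {
	unsigned long load;	/* stand-in for weighted_cpuload(i) */
	struct mock_tick_sched ts;
};

/* Mirrors the shape of the patched loop: least load first, then idle hints. */
int find_idlest_cpu_model(const struct mock_cpu *cpus, size_t ncpus, int this_cpu)
{
	unsigned long min_load = ULONG_MAX;
	int64_t latest_wake = 0;
	int idlest = -1;

	assert(cpus != NULL);
	for (size_t i = 0; i < ncpus; i++) {
		unsigned long load = cpus[i].load;
		const struct mock_tick_sched *ts = &cpus[i].ts;

		if (load < min_load || (load == min_load && (int)i == this_cpu)) {
			min_load = load;
			idlest = (int)i;
		}
		if (load)
			continue;

		if (ts->inidle && !ts->idle_active) {
			/* idle cpu currently servicing an irq */
			idlest = (int)i;
		} else if (!ts->inidle) {
			/* cpu has left idle and is about to reschedule */
			idlest = (int)i;
		} else if (ts->idle_entry_us > latest_wake) {
			/* otherwise remember the most recently idled cpu */
			latest_wake = ts->idle_entry_us;
			idlest = (int)i;
		}
	}
	return idlest;
}
```

One property the model makes easy to see: the first two branches assign idlest unconditionally, so the result depends on iteration order, and a later fully idle cpu with a newer entry time can still replace an interrupted cpu chosen earlier.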


* Re: [RFC PATCH] sched: find the latest idle cpu
  2014-01-15  4:07 [RFC PATCH] sched: find the latest idle cpu Alex Shi
  2014-01-15  4:31 ` Michael wang
@ 2014-01-15  5:33 ` Michael wang
  2014-01-15  6:45   ` Alex Shi
  2014-01-15  7:35 ` Peter Zijlstra
  2 siblings, 1 reply; 15+ messages in thread
From: Michael wang @ 2014-01-15  5:33 UTC (permalink / raw)
  To: Alex Shi, mingo, peterz, tglx, daniel.lezcano, vincent.guittot,
	morten.rasmussen
  Cc: linux-kernel, akpm, fengguang.wu, linaro-kernel

On 01/15/2014 12:07 PM, Alex Shi wrote:
> Currently we just try to find least load cpu. If some cpus idled,
> we just pick the first cpu in cpu mask.
> 
> In fact we can get the interrupted idle cpu or the latest idled cpu,
> then we may get the benefit from both latency and power.
> The selected cpu maybe not the best, since other cpu may be interrupted
> during our selecting. But be captious costs too much.

So the idea here is to choose the most recently idled cpu when there are
multiple idle cpus to choose from, correct?

And I guess that is in order to avoid choosing a tickless cpu while there
are idle cpus that are still ticking, is that right?

What confuses me is: what about a cpu that is just about to recover from
tickless, as you mentioned? That means the latest idle cpu is not necessarily
the best choice, and could even be the worst (if there are just two choices,
and the longer tickless one is about to recover while the latest one is about
to go tickless).

So what about just checking 'ts->tick_stopped' and recording one ticking idle
cpu? The cost could be lower than a time comparison, and we might reduce the
risk... (well, it is not so risky, since the logic only takes effect when the
system is relaxed with several cpus idle)
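
The tick_stopped suggestion above could look roughly like this in a userspace model. This is a sketch under assumptions: struct mock_idle_cpu and find_ticking_idle_cpu() are hypothetical names, and the single boolean stands in for ts->tick_stopped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cpu view: load plus a copy of the tick_stopped flag. */
struct mock_idle_cpu {
	unsigned long load;
	bool tick_stopped;
};

/*
 * Prefer an idle cpu whose tick is still running; fall back to the first
 * idle cpu if all of them are tickless. Returns -1 when no cpu is idle.
 */
int find_ticking_idle_cpu(const struct mock_idle_cpu *cpus, size_t ncpus)
{
	int first_idle = -1;

	assert(cpus != NULL);
	for (size_t i = 0; i < ncpus; i++) {
		if (cpus[i].load != 0)
			continue;
		if (!cpus[i].tick_stopped)
			return (int)i;	/* a single flag test, no time comparison */
		if (first_idle < 0)
			first_idle = (int)i;
	}
	return first_idle;
}
```

The loop can stop at the first ticking idle cpu, which is where the lower cost compared with tracking a maximum timestamp would come from.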

Regards,
Michael Wang

> 
> Signed-off-by: Alex Shi <alex.shi@linaro.org>
> ---
>  kernel/sched/fair.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c7395d9..fb52d26 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4167,6 +4167,26 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
>  			min_load = load;
>  			idlest = i;
>  		}
> +#ifdef CONFIG_NO_HZ_COMMON
> +		/*
> +		 * Coarsely to get the latest idle cpu for shorter latency and
> +		 * possible power benefit.
> +		 */
> +		if (!min_load) {
> +			struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
> +
> +			s64 latest_wake = 0;
> +			/* idle cpu doing irq */
> +			if (ts->inidle && !ts->idle_active)
> +				idlest = i;
> +			/* the cpu resched */
> +			else if (!ts->inidle)
> +				idlest = i;
> +			/* find latest idle cpu */
> +			else if (ktime_to_us(ts->idle_entrytime) > latest_wake)
> +				idlest = i;
> +		}
> +#endif
>  	}
> 
>  	return idlest;
> 



* Re: [RFC PATCH] sched: find the latest idle cpu
  2014-01-15  5:33 ` Michael wang
@ 2014-01-15  6:45   ` Alex Shi
  2014-01-15  8:05     ` Michael wang
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Shi @ 2014-01-15  6:45 UTC (permalink / raw)
  To: Michael wang, mingo, peterz, tglx, daniel.lezcano,
	vincent.guittot, morten.rasmussen
  Cc: linux-kernel, akpm, fengguang.wu, linaro-kernel

On 01/15/2014 01:33 PM, Michael wang wrote:
> On 01/15/2014 12:07 PM, Alex Shi wrote:
>> > Currently we just try to find least load cpu. If some cpus idled,
>> > we just pick the first cpu in cpu mask.
>> > 
>> > In fact we can get the interrupted idle cpu or the latest idled cpu,
>> > then we may get the benefit from both latency and power.
>> > The selected cpu maybe not the best, since other cpu may be interrupted
>> > during our selecting. But be captious costs too much.
> So the idea here is we want to choose the latest idle cpu if we have
> multiple idle cpu for choosing, correct?

yes.
> 
> And I guess that was in order to avoid choosing tickless cpu while there
> are un-tickless idle one, is that right?

No, the current logic chooses the least loaded cpu whether or not it is idle.
> 
> What confused me is, what about those cpu who just going to recover from
> tickless as you mentioned, which means latest idle doesn't mean the best
> choice, or even could be the worst (if just two choice, and the longer
> tickless one is just going to recover while the latest is going to
> tickless).

Yes. To handle your scenario we would need to know the next timer for each
idle cpu, but even that is not enough, since interrupts are totally
unpredictable. So I'd rather live with the coarse method for now.
> 
> So what about just check 'ts->tick_stopped' and record one ticking idle
> cpu? the cost could be lower than time comparison, we could reduce the
> risk may be...(well, not so risky since the logical only works when
> system is relaxing with several cpu idle)

First, nohz full also stops the tick. Second, tick_stopped cannot reflect
an interrupt: when an idle cpu is interrupted it has been woken, and so it
is a good candidate for running a task.

-- 
Thanks
    Alex


* Re: [RFC PATCH] sched: find the latest idle cpu
  2014-01-15  4:07 [RFC PATCH] sched: find the latest idle cpu Alex Shi
  2014-01-15  4:31 ` Michael wang
  2014-01-15  5:33 ` Michael wang
@ 2014-01-15  7:35 ` Peter Zijlstra
  2014-01-15 14:37   ` Alex Shi
  2 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2014-01-15  7:35 UTC (permalink / raw)
  To: Alex Shi
  Cc: mingo, tglx, daniel.lezcano, vincent.guittot, morten.rasmussen,
	linux-kernel, akpm, fengguang.wu, linaro-kernel

On Wed, Jan 15, 2014 at 12:07:59PM +0800, Alex Shi wrote:
> Currently we just try to find least load cpu. If some cpus idled,
> we just pick the first cpu in cpu mask.
> 
> In fact we can get the interrupted idle cpu or the latest idled cpu,
> then we may get the benefit from both latency and power.
> The selected cpu maybe not the best, since other cpu may be interrupted
> during our selecting. But be captious costs too much.

No, we should not do anything like this without first integrating
cpuidle.

At which point we have a sane view of the idle states and can make a
sane choice between them.
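
As a rough illustration of what a choice between idle states could mean, the sketch below assumes that each idle cpu exposes the exit latency of the idle state it currently sits in. The structure, its field names and pick_cheapest_idle_cpu() are all hypothetical and only model the idea in userspace.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical per-cpu view once the idle state is visible to the scheduler. */
struct mock_cpu_idle_info {
	unsigned long load;		/* 0 means the cpu is idle */
	unsigned int exit_latency_us;	/* cost of leaving the current idle state */
};

/*
 * Among idle cpus pick the one that is cheapest to wake (smallest exit
 * latency); if no cpu is idle, fall back to the least loaded one.
 */
int pick_cheapest_idle_cpu(const struct mock_cpu_idle_info *cpus, size_t ncpus)
{
	unsigned int min_exit = UINT_MAX;
	unsigned long min_load = ULONG_MAX;
	int shallowest = -1;
	int least_loaded = -1;

	assert(cpus != NULL);
	for (size_t i = 0; i < ncpus; i++) {
		if (cpus[i].load == 0) {
			if (cpus[i].exit_latency_us < min_exit) {
				min_exit = cpus[i].exit_latency_us;
				shallowest = (int)i;
			}
		} else if (cpus[i].load < min_load) {
			min_load = cpus[i].load;
			least_loaded = (int)i;
		}
	}
	return shallowest != -1 ? shallowest : least_loaded;
}
```

Compared with the idle-entry-time heuristic in the patch, this compares a property of the idle state itself instead of guessing it from timestamps.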


* Re: [RFC PATCH] sched: find the latest idle cpu
  2014-01-15  6:45   ` Alex Shi
@ 2014-01-15  8:05     ` Michael wang
  2014-01-15 14:28       ` Alex Shi
  0 siblings, 1 reply; 15+ messages in thread
From: Michael wang @ 2014-01-15  8:05 UTC (permalink / raw)
  To: Alex Shi, mingo, peterz, tglx, daniel.lezcano, vincent.guittot,
	morten.rasmussen
  Cc: linux-kernel, akpm, fengguang.wu, linaro-kernel

On 01/15/2014 02:45 PM, Alex Shi wrote:
[snip]
> 
> yes, to save your scenario, we need to know the next timer for idle cpu,
> but that is not enough, interrupt is totally unpredictable. So, I'd
> rather bear the coarse method now.
>>
>> So what about just check 'ts->tick_stopped' and record one ticking idle
>> cpu? the cost could be lower than time comparison, we could reduce the
>> risk may be...(well, not so risky since the logical only works when
>> system is relaxing with several cpu idle)
> 
> first, nohz full also stop tick. second, tick_stopped can not reflect
> the interrupt. when the idle cpu was interrupted, it's waken, then be a
> good candidate for task running.

IMHO, if we have to gamble here, we had better choose the cheaper bet,
unless we can prove that this 'coarse method' has a higher chance of
a BINGO than just checking 'tick_stopped'...

BTW, maybe the logic should be in select_idle_sibling()?

Regards,
Michael Wang

> 



* Re: [RFC PATCH] sched: find the latest idle cpu
  2014-01-15  8:05     ` Michael wang
@ 2014-01-15 14:28       ` Alex Shi
  0 siblings, 0 replies; 15+ messages in thread
From: Alex Shi @ 2014-01-15 14:28 UTC (permalink / raw)
  To: Michael wang, mingo, peterz, tglx, daniel.lezcano,
	vincent.guittot, morten.rasmussen
  Cc: linux-kernel, akpm, fengguang.wu, linaro-kernel

On 01/15/2014 04:05 PM, Michael wang wrote:
> On 01/15/2014 02:45 PM, Alex Shi wrote:
> [snip]
>>
>> yes, to save your scenario, we need to know the next timer for idle cpu,
>> but that is not enough, interrupt is totally unpredictable. So, I'd
>> rather bear the coarse method now.
>>>
>>> So what about just check 'ts->tick_stopped' and record one ticking idle
>>> cpu? the cost could be lower than time comparison, we could reduce the
>>> risk may be...(well, not so risky since the logical only works when
>>> system is relaxing with several cpu idle)
>>
>> first, nohz full also stop tick. second, tick_stopped can not reflect
>> the interrupt. when the idle cpu was interrupted, it's waken, then be a
>> good candidate for task running.
> 
> IMHO, if we have to do gamble here, we better choose the cheaper bet,
> unless we could prove this 'coarse method' have more higher chance for
> BINGO than just check 'tick_stopped'...

The tick is also stopped on a nohz full CPU, but that cpu still has a task running...
> 
> BTW, may be the logical should be in the select_idle_sibling()?

Both functions need to be considered.
> 
> Regards,
> Michael Wang
> 
>>
> 


-- 
Thanks
    Alex


* Re: [RFC PATCH] sched: find the latest idle cpu
  2014-01-15  7:35 ` Peter Zijlstra
@ 2014-01-15 14:37   ` Alex Shi
  2014-01-16 11:03     ` Daniel Lezcano
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Shi @ 2014-01-15 14:37 UTC (permalink / raw)
  To: Peter Zijlstra, daniel.lezcano
  Cc: mingo, tglx, vincent.guittot, morten.rasmussen, linux-kernel,
	akpm, fengguang.wu, linaro-kernel

On 01/15/2014 03:35 PM, Peter Zijlstra wrote:
> On Wed, Jan 15, 2014 at 12:07:59PM +0800, Alex Shi wrote:
>> Currently we just try to find least load cpu. If some cpus idled,
>> we just pick the first cpu in cpu mask.
>>
>> In fact we can get the interrupted idle cpu or the latest idled cpu,
>> then we may get the benefit from both latency and power.
>> The selected cpu maybe not the best, since other cpu may be interrupted
>> during our selecting. But be captious costs too much.
> 
> No, we should not do anything like this without first integrating
> cpuidle.
> 
> At which point we have a sane view of the idle states and can make a
> sane choice between them.
> 


Daniel,

Any comments to make it better?

-- 
Thanks
    Alex


* Re: [RFC PATCH] sched: find the latest idle cpu
  2014-01-15 14:37   ` Alex Shi
@ 2014-01-16 11:03     ` Daniel Lezcano
  2014-01-16 11:38       ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Lezcano @ 2014-01-16 11:03 UTC (permalink / raw)
  To: Alex Shi, Peter Zijlstra
  Cc: mingo, tglx, vincent.guittot, morten.rasmussen, linux-kernel,
	akpm, fengguang.wu, linaro-kernel, Michael wang

On 01/15/2014 03:37 PM, Alex Shi wrote:
> On 01/15/2014 03:35 PM, Peter Zijlstra wrote:
>> On Wed, Jan 15, 2014 at 12:07:59PM +0800, Alex Shi wrote:
>>> Currently we just try to find least load cpu. If some cpus idled,
>>> we just pick the first cpu in cpu mask.
>>>
>>> In fact we can get the interrupted idle cpu or the latest idled cpu,
>>> then we may get the benefit from both latency and power.
>>> The selected cpu maybe not the best, since other cpu may be interrupted
>>> during our selecting. But be captious costs too much.
>>
>> No, we should not do anything like this without first integrating
>> cpuidle.
>>
>> At which point we have a sane view of the idle states and can make a
>> sane choice between them.
>>
>
>
> Daniel,
>
> Any comments to make it better?

Hi Alex,

it is a nice optimization attempt but I agree with Peter we should focus 
on integrating cpuidle.

The question is "how do we integrate cpuidle ?"

IMHO, the main problem is the governors, especially the menu governor.

The menu governor tries to predict the events per cpu. This approach 
which gave us a nice benefit for the power saving may not fit well for 
the scheduler.

I think we can classify the events in three categories:

1. fully predictable (timers)
2. partially predictable (eg. MMC, SSD or network)
3. unpredictable (eg. keyboard, network ingress after quiescent period)

The menu governor mixes 2 and 3 with statistics and a performance
multiplier to reach shallow states, based on heuristics and
experimentation for a specific platform.

I was wondering if we shouldn't create a per-task io latency tracking.

Mostly based on io_schedule and io_schedule_timeout, we would track the
latency for each task for each device, keeping up to date an rb-tree
where the left-most leaf is the minimum latency for all the tasks
running on a specific cpu. That allows better tracking when moving tasks
across cpus.

With this approach, we have something consistent with the per-task load
tracking.

This io latency tracking gives us the next wake up event, which we can
inject into the cpuidle framework directly. That removes all the code
related to the menu governor statistics based on IO events and simplifies
the menu governor code a lot. So we would replace a piece of the cpuidle
code with scheduler code which I hope could be better for prediction,
leading to a partial integration.
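
The bookkeeping behind the io latency tracking idea can be sketched in userspace as below. Note the assumptions: every name here is hypothetical, and a small fixed array with a linear scan is used in place of the rb-tree mentioned above, purely to keep the example short. The minimum of the set plays the role of the left-most leaf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 16

/* Hypothetical per-cpu set of expected IO completion latencies (in us). */
struct io_latency_set {
	int64_t expected_us[MAX_TRACKED];
	size_t count;
};

bool io_latency_add(struct io_latency_set *set, int64_t expected_us)
{
	assert(set != NULL);
	if (set->count == MAX_TRACKED)
		return false;
	set->expected_us[set->count++] = expected_us;
	return true;
}

bool io_latency_remove(struct io_latency_set *set, int64_t expected_us)
{
	assert(set != NULL);
	for (size_t i = 0; i < set->count; i++) {
		if (set->expected_us[i] == expected_us) {
			/* fill the hole with the last entry */
			set->count--;
			set->expected_us[i] = set->expected_us[set->count];
			return true;
		}
	}
	return false;
}

/* Next expected wake up for this cpu, or -1 if nothing is tracked. */
int64_t io_latency_min(const struct io_latency_set *set)
{
	int64_t min = -1;

	assert(set != NULL);
	for (size_t i = 0; i < set->count; i++)
		if (min < 0 || set->expected_us[i] < min)
			min = set->expected_us[i];
	return min;
}

/* A task moving between cpus takes its expected latency with it. */
bool io_latency_migrate(struct io_latency_set *from, struct io_latency_set *to,
			int64_t expected_us)
{
	assert(from != NULL && to != NULL);
	if (to->count == MAX_TRACKED)
		return false;
	if (!io_latency_remove(from, expected_us))
		return false;
	return io_latency_add(to, expected_us);
}
```

The value returned by io_latency_min() is what such a scheme would hand to cpuidle as the predicted next wake up for that cpu.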

In order to finish integrating the cpuidle framework in the scheduler, 
there are pending questions about the impact in the current design.

Peter or Ingo, if you have time, could you have a look at the email I 
sent previously [1] ?

Thanks

   -- Daniel


[1] https://lkml.org/lkml/2013/12/17/106

-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog



* Re: [RFC PATCH] sched: find the latest idle cpu
  2014-01-16 11:03     ` Daniel Lezcano
@ 2014-01-16 11:38       ` Peter Zijlstra
  2014-01-16 12:16         ` Daniel Lezcano
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2014-01-16 11:38 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Alex Shi, mingo, tglx, vincent.guittot, morten.rasmussen,
	linux-kernel, akpm, fengguang.wu, linaro-kernel, Michael wang

On Thu, Jan 16, 2014 at 12:03:13PM +0100, Daniel Lezcano wrote:
> Hi Alex,
> 
> it is a nice optimization attempt but I agree with Peter we should focus on
> integrating cpuidle.
> 
> The question is "how do we integrate cpuidle ?"
> 
> IMHO, the main problem are the governors, especially the menu governor.

Yah.

> The menu governor tries to predict the events per cpu. This approach which
> gave us a nice benefit for the power saving may not fit well for the
> scheduler.

So the way to start all this is I think to gradually share more and
more.

Start by pulling in the actual idle state; such that we can indeed
observe what the relative cost is of waking a cpu (against another), and
maybe even the predicted wakeup time.

Then pull in the various statistics gathering bits -- without improving
them.

Then improve the statistics; try and remove duplicate statistics -- if
there's such things, try and use the extra information the scheduler has
etc..

Then worry about the governors, or what's left of them.

> In order to finish integrating the cpuidle framework in the scheduler, there
> are pending questions about the impact in the current design.
> 
> Peter or Ingo, if you have time, could you have a look at the email I sent
> previously [1] ?

I read it once, it didn't make sense at the time, I just read it again,
still doesn't make sense.

We need the idle task, since we need to DO something to go idle, the
scheduler needs to pick a task to go do that something. This is the idle
task.

You cannot get rid of that.

In fact, the 'doing' of that task is running much of the cpuidle code,
so by getting rid of it, there's nobody left to execute that code.

Also, since it's already running that cpuidle stuff, integrating it more
closely with the scheduler will not in fact change much, it will still
run it.

Could of course be I'm not reading what you meant to write, if so, do
try again ;-)


* Re: [RFC PATCH] sched: find the latest idle cpu
  2014-01-16 11:38       ` Peter Zijlstra
@ 2014-01-16 12:16         ` Daniel Lezcano
  2014-01-17  2:40           ` Nicolas Pitre
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Lezcano @ 2014-01-16 12:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alex Shi, mingo, tglx, vincent.guittot, morten.rasmussen,
	linux-kernel, akpm, fengguang.wu, linaro-kernel, Michael wang

On 01/16/2014 12:38 PM, Peter Zijlstra wrote:
> On Thu, Jan 16, 2014 at 12:03:13PM +0100, Daniel Lezcano wrote:
>> Hi Alex,
>>
>> it is a nice optimization attempt but I agree with Peter we should focus on
>> integrating cpuidle.
>>
>> The question is "how do we integrate cpuidle ?"
>>
>> IMHO, the main problem are the governors, especially the menu governor.
>
> Yah.
>
>> The menu governor tries to predict the events per cpu. This approach which
>> gave us a nice benefit for the power saving may not fit well for the
>> scheduler.
>
> So the way to start all this is I think to gradually share more and
> more.
>
> Start by pulling in the actual idle state; such that we can indeed
> observe what the relative cost is of waking a cpu (against another), and
> maybe even the predicted wakeup time.

Ok, I will send a patch for this.

> Then pull in the various statistics gathering bits -- without improving
> them.
>
> Then improve the statistics; try and remove duplicate statistics -- if
> there's such things, try and use the extra information the scheduler has
> etc..
>
> Then worry about the governors, or what's left of them.
>
>> In order to finish integrating the cpuidle framework in the scheduler, there
>> are pending questions about the impact in the current design.
>>
>> Peter or Ingo, if you have time, could you have a look at the email I sent
>> previously [1] ?
>
> I read it once, it didn't make sense at the time, I just read it again,
> still doesn't make sense.

:)

The question came up when I looked closely at how to fully integrate
cpuidle with the scheduler, in particular the idle time.
The scheduler idle time is not the same as the cpuidle idle time.
A cpu can be idle for the scheduler for 1s but be interrupted several
times during that period, so the idle time seen by cpuidle is
different. But anyway ...
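
The difference between the two notions of idle time can be made concrete with a small userspace calculation. This is a sketch only: struct irq_event and both functions are hypothetical, and the interrupts are assumed to be sorted, non-overlapping and inside the idle window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one interrupt taken while the cpu had no task. */
struct irq_event {
	int64_t start_us;
	int64_t duration_us;
};

/* Scheduler view: one continuous period with nothing runnable. */
int64_t sched_idle_us(int64_t idle_start_us, int64_t idle_end_us)
{
	return idle_end_us - idle_start_us;
}

/*
 * cpuidle view: each interrupt ends one sleep period and starts another,
 * so what matters is the length of the uninterrupted stretches. Returns
 * the longest one.
 */
int64_t cpuidle_longest_residency_us(int64_t idle_start_us, int64_t idle_end_us,
				     const struct irq_event *irqs, size_t nirqs)
{
	int64_t longest = 0;
	int64_t cursor = idle_start_us;

	assert(nirqs == 0 || irqs != NULL);
	for (size_t i = 0; i < nirqs; i++) {
		int64_t gap = irqs[i].start_us - cursor;

		if (gap > longest)
			longest = gap;
		cursor = irqs[i].start_us + irqs[i].duration_us;
	}
	if (idle_end_us - cursor > longest)
		longest = idle_end_us - cursor;
	return longest;
}
```

For a cpu that is idle for 1s from the scheduler's point of view but takes two short interrupts at 200ms and 600ms, the longest uninterrupted sleep is just under 400ms, which is the figure that matters when choosing an idle state.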

> We need the idle task, since we need to DO something to go idle, the
> scheduler needs to pick a task to go do that something. This is the idle
> task.
>
> You cannot get rid of that.
>
> In fact, the 'doing' of that task is running much of the cpuidle code,
> so by getting rid of it, there's nobody left to execute that code.
>
> Also, since its already running that cpuidle stuff, integrating it more
> closely with the scheduler will not in fact change much, it will still
> run it.
>
> Could of course be I'm not reading what you meant to write, if so, do
> try again ;-)

Well, I wanted to have a clarification of what was your feeling about 
how to integrate cpuidle in the scheduler. If removing the idle task (in 
the future) does not make sense for you, I will not insist. Let's see 
how the code evolves by integrating cpuidle and we will figure out what 
will be the impact on the idle task.

Thanks for your feedback

   -- Daniel




* Re: [RFC PATCH] sched: find the latest idle cpu
  2014-01-16 12:16         ` Daniel Lezcano
@ 2014-01-17  2:40           ` Nicolas Pitre
  0 siblings, 0 replies; 15+ messages in thread
From: Nicolas Pitre @ 2014-01-17  2:40 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Peter Zijlstra, linaro-kernel, Andrew Morton, linux-kernel,
	mingo, Michael wang, fengguang.wu

On Thu, 16 Jan 2014, Daniel Lezcano wrote:

> The question raised when I looked closely how to fully integrate cpuidle with
> the scheduler; in particular, the idle time.
> The scheduler idle time is not the same than the cpuidle idle time.
> A cpu can be idle for the scheduler 1s but it could be interrupted several
> times by an interrupt thus the idle time for cpuidle is different. But anyway
> ...

The idle task would run each time an interrupt has been serviced, either
to yield to a newly awakened task or to put the CPU back to sleep.  In the
latter case the idle task may simply do extra idleness accounting
locally.  If the former case happens most of the time then the scheduler
idle time would be most representative already.

And if threaded IRQs are used then the scheduler idle time would be
the same as cpuidle's.

> > We need the idle task, since we need to DO something to go idle, the
> > scheduler needs to pick a task to go do that something. This is the idle
> > task.
> > 
> > You cannot get rid of that.
> > 
> > In fact, the 'doing' of that task is running much of the cpuidle code,
> > so by getting rid of it, there's nobody left to execute that code.
> > 
> > Also, since its already running that cpuidle stuff, integrating it more
> > closely with the scheduler will not in fact change much, it will still
> > run it.
> > 
> > Could of course be I'm not reading what you meant to write, if so, do
> > try again ;-)
> 
> Well, I wanted to have a clarification of what was your feeling about how to
> integrate cpuidle in the scheduler. If removing the idle task (in the future)
> does not make sense for you, I will not insist. Let's see how the code evolves
> by integrating cpuidle and we will figure out what will be the impact on the
> idle task.

I think we should be able to get rid of architecture specific idle 
loops.  The idle loop could be moved close to the scheduler and 
architectures would only need to provide a default CPU halt method for 
when there is nothing else registered with the cpuidle subsystem.
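
The split described above, a generic idle loop with an architecture-provided fallback, can be sketched in userspace with a function pointer. Everything here is hypothetical (the enum, the registration helper and the mock callbacks); it only models the dispatch decision and is not taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Which path one pass of the idle loop took; used only for illustration. */
enum idle_path {
	IDLE_PATH_ARCH_HALT,
	IDLE_PATH_CPUIDLE,
};

typedef enum idle_path (*idle_enter_fn)(void);

/* NULL until something registers with the (mock) cpuidle subsystem. */
static idle_enter_fn cpuidle_enter_registered;

void mock_cpuidle_register(idle_enter_fn fn)
{
	cpuidle_enter_registered = fn;
}

/* Stand-in for an architecture's default "halt the cpu" method. */
enum idle_path mock_arch_halt(void)
{
	return IDLE_PATH_ARCH_HALT;
}

/* Stand-in for a cpuidle driver entering some idle state. */
enum idle_path mock_cpuidle_enter(void)
{
	return IDLE_PATH_CPUIDLE;
}

/*
 * One pass of a generic idle loop: use cpuidle when something is
 * registered, otherwise fall back to the architecture default.
 */
enum idle_path generic_idle_once(idle_enter_fn arch_default_halt)
{
	assert(arch_default_halt != NULL);
	if (cpuidle_enter_registered != NULL)
		return cpuidle_enter_registered();
	return arch_default_halt();
}
```

Before any registration, generic_idle_once(mock_arch_halt) takes the halt path; after mock_cpuidle_register(mock_cpuidle_enter) the same call goes through cpuidle.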


Nicolas

