linux-kernel.vger.kernel.org archive mirror
* [PATCH] sched/fair: Make sched-idle cpu selection consistent throughout
@ 2019-10-24  6:45 Viresh Kumar
  2019-10-25  6:43 ` Parth Shah
  2019-10-30 16:47 ` Mel Gorman
  0 siblings, 2 replies; 9+ messages in thread
From: Viresh Kumar @ 2019-10-24  6:45 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman
  Cc: Viresh Kumar, linux-kernel

There are instances where we keep searching for an idle CPU despite
already having a sched-idle cpu (in find_idlest_group_cpu(),
select_idle_smt() and select_idle_cpu()), and then there are places
where we don't necessarily do that and return a sched-idle cpu as soon
as we find one (in select_idle_sibling()). This looks a bit inconsistent
and it may be worth having the same policy everywhere.

On the other hand, choosing a sched-idle cpu over an idle one should be
beneficial from a performance point of view as well, as we don't need to
bring the cpu out of a deep idle state, which is quite a time-consuming
process and delays the scheduling of the newly woken task.

This patch tries to simplify code around sched-idle cpu selection and
make it consistent throughout.

FWIW, tests were done with the help of rt-app (8 SCHED_OTHER and 5
SCHED_IDLE tasks, not bound to any cpu) on ARM platform (octa-core), and
no significant difference in scheduling latency of SCHED_OTHER tasks was
found.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/sched/fair.c | 34 ++++++++++++----------------------
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a81c36472822..bb367f48c1ef 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5545,7 +5545,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 	unsigned int min_exit_latency = UINT_MAX;
 	u64 latest_idle_timestamp = 0;
 	int least_loaded_cpu = this_cpu;
-	int shallowest_idle_cpu = -1, si_cpu = -1;
+	int shallowest_idle_cpu = -1;
 	int i;
 
 	/* Check if we have any choice: */
@@ -5554,6 +5554,9 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 
 	/* Traverse only the allowed CPUs */
 	for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
+		if (sched_idle_cpu(i))
+			return i;
+
 		if (available_idle_cpu(i)) {
 			struct rq *rq = cpu_rq(i);
 			struct cpuidle_state *idle = idle_get_state(rq);
@@ -5576,12 +5579,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 				latest_idle_timestamp = rq->idle_stamp;
 				shallowest_idle_cpu = i;
 			}
-		} else if (shallowest_idle_cpu == -1 && si_cpu == -1) {
-			if (sched_idle_cpu(i)) {
-				si_cpu = i;
-				continue;
-			}
-
+		} else if (shallowest_idle_cpu == -1) {
 			load = cpu_load(cpu_rq(i));
 			if (load < min_load) {
 				min_load = load;
@@ -5590,11 +5588,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 		}
 	}
 
-	if (shallowest_idle_cpu != -1)
-		return shallowest_idle_cpu;
-	if (si_cpu != -1)
-		return si_cpu;
-	return least_loaded_cpu;
+	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
 }
 
 static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p,
@@ -5747,7 +5741,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
  */
 static int select_idle_smt(struct task_struct *p, int target)
 {
-	int cpu, si_cpu = -1;
+	int cpu;
 
 	if (!static_branch_likely(&sched_smt_present))
 		return -1;
@@ -5755,13 +5749,11 @@ static int select_idle_smt(struct task_struct *p, int target)
 	for_each_cpu(cpu, cpu_smt_mask(target)) {
 		if (!cpumask_test_cpu(cpu, p->cpus_ptr))
 			continue;
-		if (available_idle_cpu(cpu))
+		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
 			return cpu;
-		if (si_cpu == -1 && sched_idle_cpu(cpu))
-			si_cpu = cpu;
 	}
 
-	return si_cpu;
+	return -1;
 }
 
 #else /* CONFIG_SCHED_SMT */
@@ -5790,7 +5782,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	u64 time, cost;
 	s64 delta;
 	int this = smp_processor_id();
-	int cpu, nr = INT_MAX, si_cpu = -1;
+	int cpu, nr = INT_MAX;
 
 	this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
 	if (!this_sd)
@@ -5818,13 +5810,11 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 
 	for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
 		if (!--nr)
-			return si_cpu;
+			return -1;
 		if (!cpumask_test_cpu(cpu, p->cpus_ptr))
 			continue;
-		if (available_idle_cpu(cpu))
+		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
 			break;
-		if (si_cpu == -1 && sched_idle_cpu(cpu))
-			si_cpu = cpu;
 	}
 
 	time = cpu_clock(this) - time;
-- 
2.21.0.rc0.269.g1a574e7a288b
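
For illustration, the policy change in select_idle_smt() can be sketched
outside the kernel. This is a minimal model, not the kernel code: struct
cpu_state, pick_old() and pick_new() are hypothetical names, and the two
flags merely stand in for the results of available_idle_cpu() and
sched_idle_cpu().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU snapshot; the kernel derives this from struct rq. */
struct cpu_state {
	bool idle;       /* stands in for available_idle_cpu(cpu) */
	bool sched_idle; /* stands in for sched_idle_cpu(cpu) */
};

/* Old policy: prefer a fully idle CPU, fall back to the first sched-idle one. */
static int pick_old(const struct cpu_state *cpus, size_t n)
{
	int si_cpu = -1;

	assert(cpus != NULL || n == 0);

	for (size_t cpu = 0; cpu < n; cpu++) {
		if (cpus[cpu].idle)
			return (int)cpu;
		if (si_cpu == -1 && cpus[cpu].sched_idle)
			si_cpu = (int)cpu;
	}

	return si_cpu;
}

/* New policy: the first CPU that is idle or sched-idle ends the scan. */
static int pick_new(const struct cpu_state *cpus, size_t n)
{
	assert(cpus != NULL || n == 0);

	for (size_t cpu = 0; cpu < n; cpu++) {
		if (cpus[cpu].idle || cpus[cpu].sched_idle)
			return (int)cpu;
	}

	return -1;
}
```

With three CPUs in the order busy, sched-idle, idle, pick_old() returns
CPU 2 while pick_new() stops at CPU 1; when no CPU qualifies, both
return -1.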


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched/fair: Make sched-idle cpu selection consistent throughout
  2019-10-24  6:45 [PATCH] sched/fair: Make sched-idle cpu selection consistent throughout Viresh Kumar
@ 2019-10-25  6:43 ` Parth Shah
  2019-10-25  8:11   ` Viresh Kumar
  2019-10-30 16:47 ` Mel Gorman
  1 sibling, 1 reply; 9+ messages in thread
From: Parth Shah @ 2019-10-25  6:43 UTC (permalink / raw)
  To: Viresh Kumar, Ingo Molnar, Peter Zijlstra, Vincent Guittot,
	Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman
  Cc: linux-kernel

Hi Viresh,

On 10/24/19 12:15 PM, Viresh Kumar wrote:
> There are instances where we keep searching for an idle CPU despite
> having a sched-idle cpu already (in find_idlest_group_cpu(),
> select_idle_smt() and select_idle_cpu() and then there are places where
> we don't necessarily do that and return a sched-idle cpu as soon as we
> find one (in select_idle_sibling()). This looks a bit inconsistent and
> it may be worth having the same policy everywhere.
> 
> On the other hand, choosing a sched-idle cpu over a idle one shall be
> beneficial from performance point of view as well, as we don't need to
> get the cpu online from a deep idle state which is quite a time
> consuming process and delays the scheduling of the newly wakeup task.
> 
> This patch tries to simplify code around sched-idle cpu selection and
> make it consistent throughout.
> 
> FWIW, tests were done with the help of rt-app (8 SCHED_OTHER and 5
> SCHED_IDLE tasks, not bound to any cpu) on ARM platform (octa-core), and
> no significant difference in scheduling latency of SCHED_OTHER tasks was
> found.
> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---

[...]

> @@ -5755,13 +5749,11 @@ static int select_idle_smt(struct task_struct *p, int target)
>  	for_each_cpu(cpu, cpu_smt_mask(target)) {
>  		if (!cpumask_test_cpu(cpu, p->cpus_ptr))
>  			continue;
> -		if (available_idle_cpu(cpu))
> +		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
>  			return cpu;

I guess this is a correct approach, but just wondering what if we still
keep searching for a sched_idle CPU even though we have found an
available_idle CPU?

[...]


Thanks,
Parth



* Re: [PATCH] sched/fair: Make sched-idle cpu selection consistent throughout
  2019-10-25  6:43 ` Parth Shah
@ 2019-10-25  8:11   ` Viresh Kumar
  2019-10-25 12:00     ` Parth Shah
  0 siblings, 1 reply; 9+ messages in thread
From: Viresh Kumar @ 2019-10-25  8:11 UTC (permalink / raw)
  To: Parth Shah
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	linux-kernel

On 25-10-19, 12:13, Parth Shah wrote:
> Hi Viresh,
> 
> On 10/24/19 12:15 PM, Viresh Kumar wrote:
> > There are instances where we keep searching for an idle CPU despite
> > having a sched-idle cpu already (in find_idlest_group_cpu(),
> > select_idle_smt() and select_idle_cpu() and then there are places where
> > we don't necessarily do that and return a sched-idle cpu as soon as we
> > find one (in select_idle_sibling()). This looks a bit inconsistent and
> > it may be worth having the same policy everywhere.
> > 
> > On the other hand, choosing a sched-idle cpu over a idle one shall be
> > beneficial from performance point of view as well, as we don't need to
> > get the cpu online from a deep idle state which is quite a time
> > consuming process and delays the scheduling of the newly wakeup task.
> > 
> > This patch tries to simplify code around sched-idle cpu selection and
> > make it consistent throughout.
> > 
> > FWIW, tests were done with the help of rt-app (8 SCHED_OTHER and 5
> > SCHED_IDLE tasks, not bound to any cpu) on ARM platform (octa-core), and
> > no significant difference in scheduling latency of SCHED_OTHER tasks was
> > found.
> > 
> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> > ---
> 
> [...]
> 
> > @@ -5755,13 +5749,11 @@ static int select_idle_smt(struct task_struct *p, int target)
> >  	for_each_cpu(cpu, cpu_smt_mask(target)) {
> >  		if (!cpumask_test_cpu(cpu, p->cpus_ptr))
> >  			continue;
> > -		if (available_idle_cpu(cpu))
> > +		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
> >  			return cpu;
> 
> I guess this is a correct approach, but just wondering what if we still
> keep searching for a sched_idle CPU even though we have found an
> available_idle CPU?

I do believe selecting a sched-idle CPU should almost always be better
(performance wise), unless we have a strong argument against it. And
anyway, the load balancer will get triggered at a later point of time
and will pull away these newly woken tasks to idle CPUs. The
advantage we get out of it is that the tasks get serviced a bit
earlier when they first get queued.

It is really up to the maintainers to decide what kind of policy we
want to adopt here and not a choice I can make :)

-- 
viresh


* Re: [PATCH] sched/fair: Make sched-idle cpu selection consistent throughout
  2019-10-25  8:11   ` Viresh Kumar
@ 2019-10-25 12:00     ` Parth Shah
  0 siblings, 0 replies; 9+ messages in thread
From: Parth Shah @ 2019-10-25 12:00 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	linux-kernel



On 10/25/19 1:41 PM, Viresh Kumar wrote:
> On 25-10-19, 12:13, Parth Shah wrote:
>> Hi Viresh,
>>
>> On 10/24/19 12:15 PM, Viresh Kumar wrote:
>>> There are instances where we keep searching for an idle CPU despite
>>> having a sched-idle cpu already (in find_idlest_group_cpu(),
>>> select_idle_smt() and select_idle_cpu() and then there are places where
>>> we don't necessarily do that and return a sched-idle cpu as soon as we
>>> find one (in select_idle_sibling()). This looks a bit inconsistent and
>>> it may be worth having the same policy everywhere.
>>>
>>> On the other hand, choosing a sched-idle cpu over a idle one shall be
>>> beneficial from performance point of view as well, as we don't need to
>>> get the cpu online from a deep idle state which is quite a time
>>> consuming process and delays the scheduling of the newly wakeup task.
>>>
>>> This patch tries to simplify code around sched-idle cpu selection and
>>> make it consistent throughout.
>>>
>>> FWIW, tests were done with the help of rt-app (8 SCHED_OTHER and 5
>>> SCHED_IDLE tasks, not bound to any cpu) on ARM platform (octa-core), and
>>> no significant difference in scheduling latency of SCHED_OTHER tasks was
>>> found.
>>>
>>> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>>> ---
>>
>> [...]
>>
>>> @@ -5755,13 +5749,11 @@ static int select_idle_smt(struct task_struct *p, int target)
>>>  	for_each_cpu(cpu, cpu_smt_mask(target)) {
>>>  		if (!cpumask_test_cpu(cpu, p->cpus_ptr))
>>>  			continue;
>>> -		if (available_idle_cpu(cpu))
>>> +		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
>>>  			return cpu;
>>
>> I guess this is a correct approach, but just wondering what if we still
>> keep searching for a sched_idle CPU even though we have found an
>> available_idle CPU?
> 
> I do believe selecting a sched-idle CPU should almost always be better
> (performance wise), unless we have a strong argument against it. And
> anyway, the load balancer will get triggered at a later point of time
> and will pull away these newly wakeup tasks to idle CPUs. The
> advantage we get out of it is that the tasks get serviced a bit
> earlier when they first get queued.
> 
> It is really up to the maintainers to see what kind of policy do we
> want to adapt here and not a choice I can make :)
> 

Yeah, I agree. I would favor selecting sched-idle first for smaller domains
like SMT, but would leave that to the experts.
BTW, if sched-idle is given priority then maybe...
> @@ -5818,13 +5810,11 @@ static int select_idle_cpu(struct task_struct *p,
> struct sched_domain *sd, int t
>
>  	for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
>  		if (!--nr)
> -			return si_cpu;
> +			return -1;
>  		if (!cpumask_test_cpu(cpu, p->cpus_ptr))
>  			continue;
> -		if (available_idle_cpu(cpu))
> +		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
>  			break;
...here too can be optimized I guess.


Thanks,
Parth



* Re: [PATCH] sched/fair: Make sched-idle cpu selection consistent throughout
  2019-10-24  6:45 [PATCH] sched/fair: Make sched-idle cpu selection consistent throughout Viresh Kumar
  2019-10-25  6:43 ` Parth Shah
@ 2019-10-30 16:47 ` Mel Gorman
  2019-10-31  9:12   ` Viresh Kumar
  2019-11-08 11:31   ` Viresh Kumar
  1 sibling, 2 replies; 9+ messages in thread
From: Mel Gorman @ 2019-10-30 16:47 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, linux-kernel

On Thu, Oct 24, 2019 at 12:15:27PM +0530, Viresh Kumar wrote:
> There are instances where we keep searching for an idle CPU despite
> having a sched-idle cpu already (in find_idlest_group_cpu(),
> select_idle_smt() and select_idle_cpu() and then there are places where
> we don't necessarily do that and return a sched-idle cpu as soon as we
> find one (in select_idle_sibling()). This looks a bit inconsistent and
> it may be worth having the same policy everywhere.
> 

This needs supporting data. find_idlest_group_cpu is generally called from
a fork() context where it's not particularly performance critical.
select_idle_sibling and the helpers it uses run in wakeup context, where it
is often much more critical to wake quickly than to find the best CPU. The
biggest challenge of select_idle_sibling is making a "good enough" decision
quickly without disrupting cache, but a fork-intensive workload making quick
decisions can overload local domains, requiring fixing by the load balancer.

> On the other hand, choosing a sched-idle cpu over a idle one shall be
> beneficial from performance point of view as well, as we don't need to
> get the cpu online from a deep idle state which is quite a time
> consuming process and delays the scheduling of the newly wakeup task.
> 
> This patch tries to simplify code around sched-idle cpu selection and
> make it consistent throughout.
> 
> FWIW, tests were done with the help of rt-app (8 SCHED_OTHER and 5
> SCHED_IDLE tasks, not bound to any cpu) on ARM platform (octa-core), and
> no significant difference in scheduling latency of SCHED_OTHER tasks was
> found.
> 

As the patch stands, I think a fork-intensive workload where each
process is doing small amounts of work will suffer from overloading
domains and have variable performance depending on how quickly the load
balancer reacts.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH] sched/fair: Make sched-idle cpu selection consistent throughout
  2019-10-30 16:47 ` Mel Gorman
@ 2019-10-31  9:12   ` Viresh Kumar
  2019-10-31 10:19     ` Mel Gorman
  2019-11-08 11:31   ` Viresh Kumar
  1 sibling, 1 reply; 9+ messages in thread
From: Viresh Kumar @ 2019-10-31  9:12 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Linux Kernel Mailing List

On Wed, 30 Oct 2019 at 22:17, Mel Gorman <mgorman@suse.de> wrote:

> As the patch stands, I think a fork-intensive workload where each
> process is doing small amounts of work will suffer from overloading
> domains and have variable performance depending on how quickly the load
> balancer reacts.

Just wanted to clarify this slightly in case it is confusing. Once a newly
forked (non SCHED_IDLE) task gets placed on a sched-idle CPU, that CPU won't
remain sched-idle anymore and we will again start looking for a fully idle
CPU. So, we won't put everything on a small set of CPUs, but just one
SCHED_NORMAL task on a CPU unless we are out of idle CPUs.
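
The point above can be modelled with two counters. This is only a sketch
under the assumption that a CPU counts as sched-idle when it has runnable
tasks and all of them are SCHED_IDLE; struct rq_model and both helpers are
hypothetical and much simpler than the kernel's struct rq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified runqueue: just two counters. */
struct rq_model {
	unsigned int nr_running;      /* all runnable tasks */
	unsigned int idle_nr_running; /* runnable SCHED_IDLE tasks */
};

/* True when the CPU has runnable tasks and all of them are SCHED_IDLE. */
static bool model_sched_idle_cpu(const struct rq_model *rq)
{
	assert(rq != NULL);
	return rq->nr_running && rq->nr_running == rq->idle_nr_running;
}

/* Enqueue one task; is_idle_policy marks a SCHED_IDLE task. */
static void model_enqueue(struct rq_model *rq, bool is_idle_policy)
{
	assert(rq != NULL);
	rq->nr_running++;
	if (is_idle_policy)
		rq->idle_nr_running++;
}
```

An empty runqueue is not sched-idle, it becomes sched-idle after a
SCHED_IDLE task is enqueued, and it stops being sched-idle as soon as one
SCHED_NORMAL task is added, so the next placement has to look elsewhere.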

Do you have some specific test in mind which I can run to test this?

--
Viresh


* Re: [PATCH] sched/fair: Make sched-idle cpu selection consistent throughout
  2019-10-31  9:12   ` Viresh Kumar
@ 2019-10-31 10:19     ` Mel Gorman
  0 siblings, 0 replies; 9+ messages in thread
From: Mel Gorman @ 2019-10-31 10:19 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Linux Kernel Mailing List

On Thu, Oct 31, 2019 at 02:42:03PM +0530, Viresh Kumar wrote:
> On Wed, 30 Oct 2019 at 22:17, Mel Gorman <mgorman@suse.de> wrote:
> 
> > As the patch stands, I think a fork-intensive workload where each
> > process is doing small amounts of work will suffer from overloading
> > domains and have variable performance depending on how quickly the load
> > balancer reacts.
> 
> Just wanted to clarify this slightly in case it is confusing. Once a
> newly forked
> (non SCHED_IDLE) task gets placed on a sched-idle CPU, it won't remain
> sched-idle anymore and we will again start looking for a fully idle CPU. So,
> we won't put everything on a small set of CPUs, but just one SCHED_NORMAL
> task on a CPU unless we are out of idle CPUs.
> 
> Do you have some specific test in mind which I can run to test this ?
> 

Nothing in particular. git test suite for the basic fork-intensive case
(mmtests config workload-shellscripts), something fork-intensive but
relatively short-lived like a kernel build scaling the number of build
jobs (mmtests config config-workload-kerndevel), something fairly basic
that scales number of running jobs and relatively long-lived like tbench
(mmtests config config-network-tbench). The ideal of course is that you
wrote the patch based on an observed problem that you decided to fix.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH] sched/fair: Make sched-idle cpu selection consistent throughout
  2019-10-30 16:47 ` Mel Gorman
  2019-10-31  9:12   ` Viresh Kumar
@ 2019-11-08 11:31   ` Viresh Kumar
  2019-11-08 17:01     ` Vincent Guittot
  1 sibling, 1 reply; 9+ messages in thread
From: Viresh Kumar @ 2019-11-08 11:31 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, linux-kernel

On 30-10-19, 16:47, Mel Gorman wrote:
> On Thu, Oct 24, 2019 at 12:15:27PM +0530, Viresh Kumar wrote:
> > There are instances where we keep searching for an idle CPU despite
> > having a sched-idle cpu already (in find_idlest_group_cpu(),
> > select_idle_smt() and select_idle_cpu() and then there are places where
> > we don't necessarily do that and return a sched-idle cpu as soon as we
> > find one (in select_idle_sibling()). This looks a bit inconsistent and
> > it may be worth having the same policy everywhere.
> > 
> 
> This needs supporting data.

I did some more interesting tests with rt-app. It was getting
difficult to generate the correct numbers with normal use cases as
most of the time prev/target/etc CPUs were found to be completely idle
and the task was getting placed there in all the cases, and so there was
no difference with the sched-idle changes.

To prove the point I was making (that we can reduce task latency with
SCHED_IDLE), I created 3 different tests on my hikey board (octa-core,
2 clusters, 0-3 and 4-7). The cpufreq governor was set to performance
to avoid any side effects from CPU frequency.

Test 1: 1-cfs-task:

A single SCHED_NORMAL task is pinned to CPU5 and runs for 2333 us
out of every 7777 us (which gives the cluster time to enter a deep
idle state).

Test 2: 1-cfs-1-idle-task:

A single SCHED_NORMAL task is pinned on CPU5 and a single SCHED_IDLE
task is pinned on CPU6 (to make sure cluster 1 doesn't enter a deep
idle state).

Test 3: 1-cfs-8-idle-task:

A single SCHED_NORMAL task is pinned on CPU5 and eight SCHED_IDLE
tasks are created which run forever (not pinned anywhere, so they run
on all CPUs). Checked with kernelshark that as soon as the NORMAL task
sleeps, a SCHED_IDLE task starts running on CPU5.

And here are the results on mean latency (in us), using the "st" tool.

$ st 1-cfs-task/rt-app-cfs_thread-0.log 
N	min	max	sum	mean	stddev
642	90	592	197180	307.134	109.906

$ st 1-cfs-1-idle-task/rt-app-cfs_thread-0.log 
N	min	max	sum	mean	stddev
642	67	311	113850	177.336	41.4251

$ st 1-cfs-8-idle-task/rt-app-cfs_thread-0.log 
N	min	max	sum	mean	stddev
643	29	173	41364	64.3297	13.2344


The mean latency when:
- we need to wake up from deep idle state is 307 us
- we need to wake up from shallow idle state is 177 us
- we need to preempt a SCHED_IDLE task is 64 us

So the theory looks correct, we should probably prefer SCHED_IDLE CPUs
both for power and performance :)
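
As a cross-check of the figures above, the means can be recomputed from
the N and sum columns of the "st" output. This is a small sketch: the
helper names are hypothetical, and only the numbers come from the three
logs.

```c
#include <assert.h>
#include <stdio.h>

/* Mean wakeup latency in microseconds from a sample count and a sum. */
static double mean_latency_us(unsigned long n, unsigned long sum_us)
{
	assert(n > 0);
	return (double)sum_us / (double)n;
}

/* How many times larger `slow` is than `fast`. */
static double latency_ratio(double slow, double fast)
{
	assert(fast > 0.0);
	return slow / fast;
}

/* Print the three means and the ratios relative to SCHED_IDLE preemption. */
static void print_summary(void)
{
	double deep = mean_latency_us(642, 197180);    /* 1-cfs-task */
	double shallow = mean_latency_us(642, 113850); /* 1-cfs-1-idle-task */
	double preempt = mean_latency_us(643, 41364);  /* 1-cfs-8-idle-task */

	printf("deep idle:    %.3f us (%.2fx)\n", deep,
	       latency_ratio(deep, preempt));
	printf("shallow idle: %.3f us (%.2fx)\n", shallow,
	       latency_ratio(shallow, preempt));
	printf("preempt:      %.3f us\n", preempt);
}
```

On these samples, waking from deep idle takes roughly 4.8 times as long
as preempting a SCHED_IDLE task, and waking from shallow idle roughly
2.8 times as long.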

> find_idlest_group_cpu is generally from
> a fork() context where it's not particularly performance critical.
> select_idle_sibling and the helpers it uses is wakeup context where is
> is often much more critical to wake quickly than find the best CPU.

I agree. We must find the best CPU here. But won't a SCHED_IDLE cpu be
the best? After all, that is the one in the shallowest idle state and so
better for power :)

-- 
viresh


* Re: [PATCH] sched/fair: Make sched-idle cpu selection consistent throughout
  2019-11-08 11:31   ` Viresh Kumar
@ 2019-11-08 17:01     ` Vincent Guittot
  0 siblings, 0 replies; 9+ messages in thread
From: Vincent Guittot @ 2019-11-08 17:01 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Mel Gorman, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, linux-kernel

On Fri, 8 Nov 2019 at 12:32, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 30-10-19, 16:47, Mel Gorman wrote:
> > On Thu, Oct 24, 2019 at 12:15:27PM +0530, Viresh Kumar wrote:
> > > There are instances where we keep searching for an idle CPU despite
> > > having a sched-idle cpu already (in find_idlest_group_cpu(),
> > > select_idle_smt() and select_idle_cpu() and then there are places where
> > > we don't necessarily do that and return a sched-idle cpu as soon as we
> > > find one (in select_idle_sibling()). This looks a bit inconsistent and
> > > it may be worth having the same policy everywhere.
> > >
> >
> > This needs supporting data.
>
> I did some more interesting tests with rt-app. It was getting
> difficult to generate the correct numbers with normal use cases as
> most of the time prev/target/etc CPUs were found to be completely idle
> and the task was getting placed there in all the cases and so no diff
> with sched-idle changes.
>
> To prove the point I was making (that we can reduce task latency with
> SCHED_IDLE), I created 3 different tests on my hikey board (octa-core,
> 2 clusters, 0-3 and 4-7). The cpufreq governor was set to performance
> to avoid any side affects from CPU frequency.
>
> Test 1: 1-cfs-task:
>
> A single SCHED_NORMAL task is pinned to CPU5 which runs for 2333 us
> out of 7777 us (so gives time for the cluster to go in deep idle
> state).
>
> Test 2: 1-cfs-1-idle-task:
>
> A single SCHED_NORMAL task is pinned on CPU5 and single SCHED_IDLE
> task is pinned on CPU6 (to make sure cluster 1 doesn't go in deep idle
> state).
>
> Test 3: 1-cfs-8-idle-task:
>
> A single SCHED_NORMAL task is pinned on CPU5 and eight SCHED_IDLE
> tasks are created which run forever (not pinned anywhere, so they run
> on all CPUs). Checked with kernelshark that as soon as NORMAL task
> sleeps, the SCHED_IDLE task starts running on CPU5.
>
> And here are the results on mean latency (in us), using the "st" tool.
>
> $ st 1-cfs-task/rt-app-cfs_thread-0.log
> N       min     max     sum     mean    stddev
> 642     90      592     197180  307.134 109.906
>
> $ st 1-cfs-1-idle-task/rt-app-cfs_thread-0.log
> N       min     max     sum     mean    stddev
> 642     67      311     113850  177.336 41.4251
>
> $ st 1-cfs-8-idle-task/rt-app-cfs_thread-0.log
> N       min     max     sum     mean    stddev
> 643     29      173     41364   64.3297 13.2344
>
>
> The mean latency when:
> - we need to wakeup from deep idle state is 307 us
> - we need to wakeup from shallow idle state is 177 us
> - we need to preempt a SCHED_IDLE task is 64 us
>
> So the theory looks correct, we should probably prefer SCHED_IDLE CPUs
> both for power and performance :)
>
> > find_idlest_group_cpu is generally from
> > a fork() context where it's not particularly performance critical.
> > select_idle_sibling and the helpers it uses is wakeup context where is
> > is often much more critical to wake quickly than find the best CPU.
>
> I agree. We must find the best CPU here. But won't a SCHED_IDLE cpu be
> the best ? After all that is the one in shallowest idle state and so
> better for power :)

It makes sense to me to consider a CPU that runs only SCHED_IDLE tasks
as an idle CPU with the shortest latency and the most recently idled
timestamp. This seems to be confirmed by the data above.
The SCHED_IDLE tasks would be somewhat penalized because they can now
be preempted even when there is a real idle CPU, but such SCHED_IDLE
tasks don't have any other requirements than not delaying NORMAL task
wakeup.
Also this simplifies and shortens the search loop.


>
> --
> viresh

Thread overview: 9+ messages
2019-10-24  6:45 [PATCH] sched/fair: Make sched-idle cpu selection consistent throughout Viresh Kumar
2019-10-25  6:43 ` Parth Shah
2019-10-25  8:11   ` Viresh Kumar
2019-10-25 12:00     ` Parth Shah
2019-10-30 16:47 ` Mel Gorman
2019-10-31  9:12   ` Viresh Kumar
2019-10-31 10:19     ` Mel Gorman
2019-11-08 11:31   ` Viresh Kumar
2019-11-08 17:01     ` Vincent Guittot