* [PATCH] sched/fair: Sync task util before slow-path wakeup
@ 2017-08-02 13:10 Brendan Jackman
  2017-08-02 13:24 ` Peter Zijlstra
From: Brendan Jackman @ 2017-08-02 13:10 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, linux-kernel
  Cc: Joel Fernandes, Andres Oportus, Dietmar Eggemann,
	Vincent Guittot, Josef Bacik, Morten Rasmussen

We use task_util in find_idlest_group via capacity_spare_wake. This
task_util is updated in wake_cap. However wake_cap is not the only
reason for ending up in find_idlest_group - we could have been sent
there by wake_wide. So explicitly sync the task util with prev_cpu
when we are about to head to find_idlest_group.

We could simply do this at the beginning of
select_task_rq_fair (i.e. irrespective of whether we're heading to
select_idle_sibling or find_idlest_group & co), but I didn't want to
slow down the select_idle_sibling path more than necessary.

Don't do this during fork balancing, we won't need the task_util and
we'd just clobber the last_update_time, which is supposed to be 0.

Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 kernel/sched/fair.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c95880e216f6..62869ff252b4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5913,6 +5913,14 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 			new_cpu = cpu;
 	}
 
+	if (sd && !(sd_flag & SD_BALANCE_FORK))
+		/*
+		 * We're going to need the task's util for capacity_spare_wake
+		 * in select_idlest_group. Sync it up to prev_cpu's
+		 * last_update_time.
+		 */
+		sync_entity_load_avg(&p->se);
+
 	if (!sd) {
  pick_cpu:
 		if (sd_flag & SD_BALANCE_WAKE) /* XXX always ? */
-- 
2.13.0

* Re: [PATCH] sched/fair: Sync task util before slow-path wakeup
  2017-08-02 13:10 [PATCH] sched/fair: Sync task util before slow-path wakeup Brendan Jackman
@ 2017-08-02 13:24 ` Peter Zijlstra
  2017-08-02 13:27   ` Brendan Jackman
  2017-08-07 12:51   ` Morten Rasmussen
From: Peter Zijlstra @ 2017-08-02 13:24 UTC (permalink / raw)
  To: Brendan Jackman
  Cc: Ingo Molnar, linux-kernel, Joel Fernandes, Andres Oportus,
	Dietmar Eggemann, Vincent Guittot, Josef Bacik, Morten Rasmussen

On Wed, Aug 02, 2017 at 02:10:02PM +0100, Brendan Jackman wrote:
> We use task_util in find_idlest_group via capacity_spare_wake. This
> task_util is updated in wake_cap. However wake_cap is not the only
> reason for ending up in find_idlest_group - we could have been sent
> there by wake_wide. So explicitly sync the task util with prev_cpu
> when we are about to head to find_idlest_group.
> 
> We could simply do this at the beginning of
> select_task_rq_fair (i.e. irrespective of whether we're heading to
> select_idle_sibling or find_idlest_group & co), but I didn't want to
> slow down the select_idle_sibling path more than necessary.
> 
> Don't do this during fork balancing, we won't need the task_util and
> we'd just clobber the last_update_time, which is supposed to be 0.

So I remember Morten explicitly not aging util of tasks on wakeup
because the old util was higher and better representative of what the
new util would be, or something along those lines.

Morten?

> Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Josef Bacik <josef@toxicpanda.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Morten Rasmussen <morten.rasmussen@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  kernel/sched/fair.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c95880e216f6..62869ff252b4 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5913,6 +5913,14 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>  			new_cpu = cpu;
>  	}
>  
> +	if (sd && !(sd_flag & SD_BALANCE_FORK))
> +		/*
> +		 * We're going to need the task's util for capacity_spare_wake
> +		 * in select_idlest_group. Sync it up to prev_cpu's
> +		 * last_update_time.
> +		 */
> +		sync_entity_load_avg(&p->se);
> +

That has missing {}


>  	if (!sd) {
>   pick_cpu:

And if this patch lives, can you please fix up that broken label indent?

>  		if (sd_flag & SD_BALANCE_WAKE) /* XXX always ? */

* Re: [PATCH] sched/fair: Sync task util before slow-path wakeup
  2017-08-02 13:24 ` Peter Zijlstra
@ 2017-08-02 13:27   ` Brendan Jackman
  2017-08-07 12:51   ` Morten Rasmussen
From: Brendan Jackman @ 2017-08-02 13:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, Joel Fernandes, Andres Oportus,
	Dietmar Eggemann, Vincent Guittot, Josef Bacik, Morten Rasmussen


On Wed, Aug 02 2017 at 13:24, Peter Zijlstra wrote:
> On Wed, Aug 02, 2017 at 02:10:02PM +0100, Brendan Jackman wrote:
>> We use task_util in find_idlest_group via capacity_spare_wake. This
>> task_util is updated in wake_cap. However wake_cap is not the only
>> reason for ending up in find_idlest_group - we could have been sent
>> there by wake_wide. So explicitly sync the task util with prev_cpu
>> when we are about to head to find_idlest_group.
>>
>> We could simply do this at the beginning of
>> select_task_rq_fair (i.e. irrespective of whether we're heading to
>> select_idle_sibling or find_idlest_group & co), but I didn't want to
>> slow down the select_idle_sibling path more than necessary.
>>
>> Don't do this during fork balancing, we won't need the task_util and
>> we'd just clobber the last_update_time, which is supposed to be 0.
>
> So I remember Morten explicitly not aging util of tasks on wakeup
> because the old util was higher and better representative of what the
> new util would be, or something along those lines.
>
> Morten?
>
>> Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
>> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
>> Cc: Vincent Guittot <vincent.guittot@linaro.org>
>> Cc: Josef Bacik <josef@toxicpanda.com>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Morten Rasmussen <morten.rasmussen@arm.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> ---
>>  kernel/sched/fair.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index c95880e216f6..62869ff252b4 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5913,6 +5913,14 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>>  			new_cpu = cpu;
>>  	}
>>
>> +	if (sd && !(sd_flag & SD_BALANCE_FORK))
>> +		/*
>> +		 * We're going to need the task's util for capacity_spare_wake
>> +		 * in select_idlest_group. Sync it up to prev_cpu's
>> +		 * last_update_time.
>> +		 */
>> +		sync_entity_load_avg(&p->se);
>> +
>
> That has missing {}

OK. Also just noticed it refers to "select_idlest_group", will change to
"find_idlest_group".

>
>
>>  	if (!sd) {
>>   pick_cpu:
>
> And if this patch lives, can you please fix up that broken label indent?

Sure.

>>  		if (sd_flag & SD_BALANCE_WAKE) /* XXX always ? */
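
Something like this, then? Just a sketch of the fixup per the comments
above (braces plus the find_idlest_group wording), not the actual respin:

	if (sd && !(sd_flag & SD_BALANCE_FORK)) {
		/*
		 * We're going to need the task's util for
		 * capacity_spare_wake() in find_idlest_group. Sync it up to
		 * prev_cpu's last_update_time.
		 */
		sync_entity_load_avg(&p->se);
	}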

Cheers,
Brendan

* Re: [PATCH] sched/fair: Sync task util before slow-path wakeup
  2017-08-02 13:24 ` Peter Zijlstra
  2017-08-02 13:27   ` Brendan Jackman
@ 2017-08-07 12:51   ` Morten Rasmussen
From: Morten Rasmussen @ 2017-08-07 12:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Brendan Jackman, Ingo Molnar, linux-kernel, Joel Fernandes,
	Andres Oportus, Dietmar Eggemann, Vincent Guittot, Josef Bacik

On Wed, Aug 02, 2017 at 03:24:05PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 02, 2017 at 02:10:02PM +0100, Brendan Jackman wrote:
> > We use task_util in find_idlest_group via capacity_spare_wake. This
> > task_util is updated in wake_cap. However wake_cap is not the only
> > reason for ending up in find_idlest_group - we could have been sent
> > there by wake_wide. So explicitly sync the task util with prev_cpu
> > when we are about to head to find_idlest_group.
> > 
> > We could simply do this at the beginning of
> > select_task_rq_fair (i.e. irrespective of whether we're heading to
> > select_idle_sibling or find_idlest_group & co), but I didn't want to
> > slow down the select_idle_sibling path more than necessary.
> > 
> > Don't do this during fork balancing, we won't need the task_util and
> > we'd just clobber the last_update_time, which is supposed to be 0.
> 
> So I remember Morten explicitly not aging util of tasks on wakeup
> because the old util was higher and better representative of what the
> new util would be, or something along those lines.
> 
> Morten?

That was the intention, but when we discussed the wake_cap() stuff we
decided to drop that hoping that decay clamping or some other magic
would be added on top later. So this patch is in line with current
behaviour.

Using non-aged util is causing trouble when comparing prev_cpu to other
cpus. In cpu_util_wake() we compensate for the fact that the aged task
util is already included in the cpu util on the prev_cpu. For that to
work, we need to age the task util so we know how much is already
accounted for. In the original wake_cap() series I think I had a patch
that stored the non-aged version so we could calculate the potential cpu
util as:

predicted_cpu_util(prev_cpu) =
	cpu_util(prev_cpu) - task_util_aged(task) + task_util_nonaged(task)

predicted_cpu_util(other_cpu) =
	cpu_util(other_cpu) + task_util_nonaged(task)

This would be better than always under-estimating the task util by using
the aged util as we currently do:

predicted_cpu_util(prev_cpu) =
	cpu_util(prev_cpu) - task_util_aged(task) + task_util_aged(task)

predicted_cpu_util(other_cpu) =
	cpu_util(other_cpu) + task_util_aged(task)

but at least it gives us a fair comparison between prev_cpu and other
cpus.
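
As a rough sketch (illustrative only, not mainline code; task_util_nonaged()
is a hypothetical helper returning what that dropped patch would have stored),
both formulas collapse into one expression, because cpu_util_wake() only
subtracts p's aged contribution when cpu is p's prev_cpu:

	/* Illustrative sketch; task_util_nonaged() is hypothetical. */
	static unsigned long predicted_cpu_util(int cpu, struct task_struct *p)
	{
		/*
		 * cpu_util_wake(prev_cpu, p)  = cpu_util(prev_cpu) - task_util_aged(p)
		 * cpu_util_wake(other_cpu, p) = cpu_util(other_cpu)
		 */
		return cpu_util_wake(cpu, p) + task_util_nonaged(p);
	}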

The Android kernel carries additional patches that track the max (peak)
utilization and use it as the non-aged util for wake-up placement.
I'm hoping we can discuss this topic again at LPC, as last year's idea of
clamping decay didn't work very well to solve this issue.
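
(Purely illustrative sketch of that idea, with made-up names rather than the
actual Android patches: snapshot the util when the task goes to sleep and use
that, rather than the decayed value, when placing the task at wake-up.)

	/* At dequeue (task going to sleep); util_peak is a hypothetical field: */
	p->se.avg.util_peak = task_util(p);

	/* At wake-up placement, use the pre-sleep snapshot instead of the
	 * decayed value: */
	wake_util = p->se.avg.util_peak;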
