* [PATCH] sched/fair: Update rq_clock, cfs_rq before migrating for asym cpu capacity
@ 2019-07-09 11:57 Chris Redpath
  2019-07-09 13:50 ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Redpath @ 2019-07-09 11:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, morten.rasmussen, dietmar.eggemann,
	Chris Redpath

The ancient workaround that avoids the cost of updating rq clocks in the
middle of a migration causes problems on asymmetric CPU capacity systems,
where we use task utilization to determine which cpus fit a task. On
quiet systems the stale clock can inflate task util after a migration,
causing misfit to fire and force-migrate the task.

This occurs when:

(a) a task has util close to the non-overutilized capacity limit of a
    particular cpu (cpu0 here); and
(b) the prev_cpu was quiet otherwise, such that rq clock is
    sufficiently out of date (cpu1 here).

e.g.
                              _____
cpu0: ________________________|   |______________

                                  |<- misfit happens
          ______                  ___         ___
cpu1: ____|    |______________|___| |_________|

             ->|              |<- wakeup migration time
last rq clock update

When the task util is in just the right range for the system, we can end
up migrating an unlucky task back and forth many times until we are lucky
and the source rq happens to be updated close to the migration time.
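
To make the inflation concrete, here is a rough userspace sketch (not
kernel code) of the arithmetic: with the 32ms PELT half-life a sleeping
task's util should halve for every 32ms of idle time, but if the source
rq clock has not advanced, that decay is not applied before the util is
removed and carried over. The capacity and util numbers below are made
up; the 1280/1024 (~80%) fit margin mirrors the fair class capacity
margin.

#include <math.h>
#include <stdio.h>

/* PELT half-life: util halves for every 32ms of idle time. */
static double pelt_decay(double util, double idle_ms)
{
	return util * pow(0.5, idle_ms / 32.0);
}

/* Same ~80% margin idea as the fair class capacity-fit check. */
static int fits(double util, double capacity)
{
	return util * 1280 < capacity * 1024;
}

int main(void)
{
	double little_cap = 446.0;	/* example little cpu capacity     */
	double util       = 430.0;	/* task util when it went to sleep */
	double idle_ms    = 40.0;	/* time since last rq clock update */
	double decayed    = pelt_decay(util, idle_ms);

	/* stale clock: util stays at 430 and looks too big for little */
	printf("stale util   %3.0f fits little: %d\n",
	       util, fits(util, little_cap));
	/* updated clock: util decays to ~181 and fits, no misfit */
	printf("decayed util %3.0f fits little: %d\n",
	       decayed, fits(decayed, little_cap));
	return 0;
}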

In order to address this, let's update both rq_clock and cfs_rq where
this could be an issue.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
---
 kernel/sched/fair.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b798fe7ff7cd..51791db26a2a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6545,6 +6545,21 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
 		 * wakee task is less decayed, but giving the wakee more load
 		 * sounds not bad.
 		 */
+		if (static_branch_unlikely(&sched_asym_cpucapacity) &&
+			p->state == TASK_WAKING) {
+			/*
+			 * On asymmetric capacity systems task util guides
+			 * wake placement so update rq_clock and cfs_rq
+			 */
+			struct cfs_rq *cfs_rq = task_cfs_rq(p);
+			struct rq *rq = task_rq(p);
+			struct rq_flags rf;
+
+			rq_lock_irqsave(rq, &rf);
+			update_rq_clock(rq);
+			update_cfs_rq_load_avg(cfs_rq_clock_pelt(cfs_rq), cfs_rq);
+			rq_unlock_irqrestore(rq, &rf);
+		}
 		remove_entity_load_avg(&p->se);
 	}
 
-- 
2.17.1



* Re: [PATCH] sched/fair: Update rq_clock, cfs_rq before migrating for asym cpu capacity
  2019-07-09 11:57 [PATCH] sched/fair: Update rq_clock, cfs_rq before migrating for asym cpu capacity Chris Redpath
@ 2019-07-09 13:50 ` Peter Zijlstra
  2019-07-09 15:23   ` Chris Redpath
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2019-07-09 13:50 UTC (permalink / raw)
  To: Chris Redpath
  Cc: linux-kernel, Ingo Molnar, morten.rasmussen, dietmar.eggemann,
	Vincent Guittot

On Tue, Jul 09, 2019 at 12:57:59PM +0100, Chris Redpath wrote:
> The ancient workaround to avoid the cost of updating rq clocks in the
> middle of a migration causes some issues on asymmetric CPU capacity
> systems where we use task utilization to determine which cpus fit a task.
> On quiet systems we can inflate task util after a migration which
> causes misfit to fire and force-migrate the task.
> 
> This occurs when:
> 
> (a) a task has util close to the non-overutilized capacity limit of a
>     particular cpu (cpu0 here); and
> (b) the prev_cpu was quiet otherwise, such that rq clock is
>     sufficiently out of date (cpu1 here).
> 
> e.g.
>                               _____
> cpu0: ________________________|   |______________
> 
>                                   |<- misfit happens
>           ______                  ___         ___
> cpu1: ____|    |______________|___| |_________|
> 
>              ->|              |<- wakeup migration time
> last rq clock update
> 
> When the task util is in just the right range for the system, we can end
> up migrating an unlucky task back and forth many times until we are lucky
> and the source rq happens to be updated close to the migration time.
> 
> In order to address this, lets update both rq_clock and cfs_rq where
> this could be an issue.

Can you quantify how much of a problem this really is? It is really sad,
but this is already the second place where we take rq->lock on
migration. We worked so hard to avoid having to acquire it :/

> Signed-off-by: Chris Redpath <chris.redpath@arm.com>
> ---
>  kernel/sched/fair.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b798fe7ff7cd..51791db26a2a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6545,6 +6545,21 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
>  		 * wakee task is less decayed, but giving the wakee more load
>  		 * sounds not bad.
>  		 */
> +		if (static_branch_unlikely(&sched_asym_cpucapacity) &&
> +			p->state == TASK_WAKING) {

nit: indent fail.

> +			/*
> +			 * On asymmetric capacity systems task util guides
> +			 * wake placement so update rq_clock and cfs_rq
> +			 */
> +			struct cfs_rq *cfs_rq = task_cfs_rq(p);
> +			struct rq *rq = task_rq(p);
> +			struct rq_flags rf;
> +
> +			rq_lock_irqsave(rq, &rf);
> +			update_rq_clock(rq);
> +			update_cfs_rq_load_avg(cfs_rq_clock_pelt(cfs_rq), cfs_rq);
> +			rq_unlock_irqrestore(rq, &rf);
> +		}
>  		remove_entity_load_avg(&p->se);
>  	}
>  
> -- 
> 2.17.1
> 


* Re: [PATCH] sched/fair: Update rq_clock, cfs_rq before migrating for asym cpu capacity
  2019-07-09 13:50 ` Peter Zijlstra
@ 2019-07-09 15:23   ` Chris Redpath
  2019-07-09 15:36     ` Vincent Guittot
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Redpath @ 2019-07-09 15:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Morten Rasmussen, Dietmar Eggemann,
	Vincent Guittot

Hi Peter,

On 09/07/2019 14:50, Peter Zijlstra wrote:
> On Tue, Jul 09, 2019 at 12:57:59PM +0100, Chris Redpath wrote:
>> The ancient workaround to avoid the cost of updating rq clocks in the
>> middle of a migration causes some issues on asymmetric CPU capacity
>> systems where we use task utilization to determine which cpus fit a task.
>> On quiet systems we can inflate task util after a migration which
>> causes misfit to fire and force-migrate the task.
>>
>> This occurs when:
>>
>> (a) a task has util close to the non-overutilized capacity limit of a
>>      particular cpu (cpu0 here); and
>> (b) the prev_cpu was quiet otherwise, such that rq clock is
>>      sufficiently out of date (cpu1 here).
>>
>> e.g.
>>                                _____
>> cpu0: ________________________|   |______________
>>
>>                                    |<- misfit happens
>>            ______                  ___         ___
>> cpu1: ____|    |______________|___| |_________|
>>
>>               ->|              |<- wakeup migration time
>> last rq clock update
>>
>> When the task util is in just the right range for the system, we can end
>> up migrating an unlucky task back and forth many times until we are lucky
>> and the source rq happens to be updated close to the migration time.
>>
>> In order to address this, lets update both rq_clock and cfs_rq where
>> this could be an issue.
>
> Can you quantify how much of a problem this really is? It is really sad,
> but this is already the second place where we take rq->lock on
> migration. We worked so hard to avoid having to acquire it :/
>

I think you're familiar with the way we test the EAS and misfit stuff,
but some might not be, so I'll just outline it.

We have performance and placement tests for a suite of simple synthetic
scenarios selected to trigger the EAS & misfit mechanisms. The
performance tests use rt-app's slack metric, and we try to minimise
negative slack (i.e. missed deadlines).

In the placement tests we estimate the minimum energy needed to run a
particular synthetic test job, and we calculate the energy consumed by
the actual execution from a trace. The test passes if the actual figure
is less than ideal+20%.
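
As a minimal sketch of that criterion (the struct layout, power numbers
and helper names below are hypothetical, not taken from our suite; the
assumption is that the actual figure is formed by weighting per-OPP
residencies from the trace with energy model power values):

#include <stdio.h>

struct opp_residency {
	double power_mw;	/* energy model power at this OPP */
	double time_s;		/* residency taken from the trace */
};

static double trace_energy_mj(const struct opp_residency *r, int n)
{
	double e = 0.0;

	for (int i = 0; i < n; i++)
		e += r[i].power_mw * r[i].time_s;
	return e;
}

int main(void)
{
	struct opp_residency res[] = {
		{ .power_mw =  50.0, .time_s = 3.2 },	/* little, low OPP */
		{ .power_mw = 420.0, .time_s = 0.4 },	/* big, high OPP   */
	};
	double ideal_mj = 300.0;	/* estimated minimum energy */
	double actual_mj = trace_energy_mj(res, 2);

	/* pass if the run used no more than ideal + 20% */
	printf("actual %.0f mJ, pass: %s\n", actual_mj,
	       actual_mj < ideal_mj * 1.2 ? "yes" : "no");
	return 0;
}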

We enter this code quite often in our testing; most individual runs of
a test involving small tasks have at least one hit where this patch
ends up adjusting the clock.

That said - despite the relatively high number of hits only about 5% of
runs see enough additional energy consumed to trigger a test failure. We
do try to keep a quiet system as much as possible and only run for a few
seconds so the impact we see in testing is also probably higher than in
the real world.

I totally appreciate the reluctance to add this - I don't much like it
either, but I was hoping that sticking it behind the asym_cpucapacity
key might be a reasonable compromise.

At least only systems which select a CPU using task util and capacity
see this cost, and those tend to be smaller systems, so in theory the
cost is smaller too.

I'm very open to exploring alternatives :)

>> Signed-off-by: Chris Redpath <chris.redpath@arm.com>
>> ---
>>   kernel/sched/fair.c | 15 +++++++++++++++
>>   1 file changed, 15 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index b798fe7ff7cd..51791db26a2a 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6545,6 +6545,21 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
>>               * wakee task is less decayed, but giving the wakee more load
>>               * sounds not bad.
>>               */
>> +            if (static_branch_unlikely(&sched_asym_cpucapacity) &&
>> +                    p->state == TASK_WAKING) {
>
> nit: indent fail.
>

oops, will tweak it

--Chris


* Re: [PATCH] sched/fair: Update rq_clock, cfs_rq before migrating for asym cpu capacity
  2019-07-09 15:23   ` Chris Redpath
@ 2019-07-09 15:36     ` Vincent Guittot
  2019-07-09 15:42       ` Chris Redpath
  0 siblings, 1 reply; 6+ messages in thread
From: Vincent Guittot @ 2019-07-09 15:36 UTC (permalink / raw)
  To: Chris Redpath
  Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Morten Rasmussen,
	Dietmar Eggemann

Hi Chris,

On Tue, 9 Jul 2019 at 17:23, Chris Redpath <Chris.Redpath@arm.com> wrote:
>
> Hi Peter,
>
> On 09/07/2019 14:50, Peter Zijlstra wrote:
> > On Tue, Jul 09, 2019 at 12:57:59PM +0100, Chris Redpath wrote:
> >> The ancient workaround to avoid the cost of updating rq clocks in the
> >> middle of a migration causes some issues on asymmetric CPU capacity
> >> systems where we use task utilization to determine which cpus fit a task.
> >> On quiet systems we can inflate task util after a migration which
> >> causes misfit to fire and force-migrate the task.
> >>
> >> This occurs when:
> >>
> >> (a) a task has util close to the non-overutilized capacity limit of a
> >>      particular cpu (cpu0 here); and
> >> (b) the prev_cpu was quiet otherwise, such that rq clock is
> >>      sufficiently out of date (cpu1 here).
> >>
> >> e.g.
> >>                                _____
> >> cpu0: ________________________|   |______________
> >>
> >>                                    |<- misfit happens
> >>            ______                  ___         ___
> >> cpu1: ____|    |______________|___| |_________|
> >>
> >>               ->|              |<- wakeup migration time
> >> last rq clock update
> >>
> >> When the task util is in just the right range for the system, we can end
> >> up migrating an unlucky task back and forth many times until we are lucky
> >> and the source rq happens to be updated close to the migration time.
> >>
> >> In order to address this, lets update both rq_clock and cfs_rq where
> >> this could be an issue.
> >
> > Can you quantify how much of a problem this really is? It is really sad,
> > but this is already the second place where we take rq->lock on
> > migration. We worked so hard to avoid having to acquire it :/
> >
>
> I think you're familiar with the way we test the EAS and misfit stuff,
> but some might not be, so I'll just outline them.
>
> We have performance and placement tests for a suite of simple synthetic
> scenarios selected to trigger the EAS & misfit mechanisms. The
> performance tests use rt-app's slack metric, and we try to minimise
> negative slack (i.e. missed deadlines).
>
> In the placement tests we estimate the minimum energy consumed to run a
> particular synthetic test job and we calculate the energy consumed in
> the actual execution according to a trace. We pass the test if our
> estimate of actual is less than ideal+20%.
>
> We enter this code quite often in our testing, most individual runs of a
> test which has small tasks involved have at least one hit where we make
> a change to the clock with this patch in.

Do you have a rt-app file that you can share ?

>
> That said - despite the relatively high number of hits only about 5% of
> runs see enough additional energy consumed to trigger a test failure. We
> do try to keep a quiet system as much as possible and only run for a few
> seconds so the impact we see in testing is also probably higher than in
> the real world.

Yeah, I'm curious to see the impact on a real system which has a
60fps screen update, like an android phone.

>
> I totally appreciate the reluctance to add this - I don't much like it
> either, but I was hoping that sticking it behind the asym_cpucapacity
> key might be a reasonable compromise.
>
> At least only those people who select a CPU using task util and capacity
> see this cost, and we have smaller systems so in theory the cost is smaller.
>
> I'm very open to exploring alternatives :)
>
> >> Signed-off-by: Chris Redpath <chris.redpath@arm.com>
> >> ---
> >>   kernel/sched/fair.c | 15 +++++++++++++++
> >>   1 file changed, 15 insertions(+)
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index b798fe7ff7cd..51791db26a2a 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -6545,6 +6545,21 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
> >>               * wakee task is less decayed, but giving the wakee more load
> >>               * sounds not bad.
> >>               */
> >> +            if (static_branch_unlikely(&sched_asym_cpucapacity) &&
> >> +                    p->state == TASK_WAKING) {
> >
> > nit: indent fail.
> >
>
> oops, will tweak it
>
> --Chris


* Re: [PATCH] sched/fair: Update rq_clock, cfs_rq before migrating for asym cpu capacity
  2019-07-09 15:36     ` Vincent Guittot
@ 2019-07-09 15:42       ` Chris Redpath
  2019-07-09 15:46         ` Vincent Guittot
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Redpath @ 2019-07-09 15:42 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Morten Rasmussen,
	Dietmar Eggemann

On 09/07/2019 16:36, Vincent Guittot wrote:
> Hi Chris,
> 
>>
>> We enter this code quite often in our testing, most individual runs of a
>> test which has small tasks involved have at least one hit where we make
>> a change to the clock with this patch in.
> 
> Do you have a rt-app file that you can share ?
> 

The ThreeSmallTasks test which is the worst hit produces this:

{
     "tasks": {
         "small_0": {
             "policy": "SCHED_OTHER",
             "delay": 0,
             "loop": 1,
             "phases": {
                 "p000001": {
                     "loop": 62,
                     "run": 2880,
                     "timer": {
                         "ref": "small_0",
                         "period": 16000
                     }
                 }
             }
         },
         "small_1": {
             "policy": "SCHED_OTHER",
             "delay": 0,
             "loop": 1,
             "phases": {
                 "p000001": {
                     "loop": 62,
                     "run": 2880,
                     "timer": {
                         "ref": "small_1",
                         "period": 16000
                     }
                 }
             }
         },
         "small_2": {
             "policy": "SCHED_OTHER",
             "delay": 0,
             "loop": 1,
             "phases": {
                 "p000001": {
                     "loop": 62,
                     "run": 2880,
                     "timer": {
                         "ref": "small_2",
                         "period": 16000
                     }
                 }
             }
         }
     },
     "global": {
         "default_policy": "SCHED_OTHER",
         "duration": -1,
         "calibration": 264,
         "logdir": "/root/devlib-target"
     }
}

when I run it
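
For reference, each task runs 2880us out of every 16000us, an 18% duty
cycle, so at the usual 1024 scale the steady-state util should be around
184 per task; a quick sketch of that arithmetic (assumed scale, ignoring
PELT ramp-up and frequency/capacity invariance):

#include <stdio.h>

int main(void)
{
	double run_us = 2880.0, period_us = 16000.0, scale = 1024.0;

	printf("duty cycle : %.1f%%\n", 100.0 * run_us / period_us);
	printf("approx util: %.0f / 1024\n", scale * run_us / period_us);
	return 0;
}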

>>
>> That said - despite the relatively high number of hits only about 5% of
>> runs see enough additional energy consumed to trigger a test failure. We
>> do try to keep a quiet system as much as possible and only run for a few
>> seconds so the impact we see in testing is also probably higher than in
>> the real world.
> 
> Yeah, I'm curious to see the impact on a real system which have a
> 60fps screen update like an android phone
> 

I wouldn't expect much change there but I would on the idle-ish 
homescreen/day-of-use type benchmarks.

If I had a platform with any kind of representative energy use, I'd test 
it :)


* Re: [PATCH] sched/fair: Update rq_clock, cfs_rq before migrating for asym cpu capacity
  2019-07-09 15:42       ` Chris Redpath
@ 2019-07-09 15:46         ` Vincent Guittot
  0 siblings, 0 replies; 6+ messages in thread
From: Vincent Guittot @ 2019-07-09 15:46 UTC (permalink / raw)
  To: Chris Redpath
  Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Morten Rasmussen,
	Dietmar Eggemann

On Tue, 9 Jul 2019 at 17:42, Chris Redpath <chris.redpath@foss.arm.com> wrote:
>
> On 09/07/2019 16:36, Vincent Guittot wrote:
> > Hi Chris,
> >
> >>
> >> We enter this code quite often in our testing, most individual runs of a
> >> test which has small tasks involved have at least one hit where we make
> >> a change to the clock with this patch in.
> >
> > Do you have a rt-app file that you can share ?
> >
>
> The ThreeSmallTasks test which is the worst hit produces this:
>
> {
>      "tasks": {
>          "small_0": {
>              "policy": "SCHED_OTHER",
>              "delay": 0,
>              "loop": 1,
>              "phases": {
>                  "p000001": {
>                      "loop": 62,
>                      "run": 2880,
>                      "timer": {
>                          "ref": "small_0",
>                          "period": 16000
>                      }
>                  }
>              }
>          },
>          "small_1": {
>              "policy": "SCHED_OTHER",
>              "delay": 0,
>              "loop": 1,
>              "phases": {
>                  "p000001": {
>                      "loop": 62,
>                      "run": 2880,
>                      "timer": {
>                          "ref": "small_1",
>                          "period": 16000
>                      }
>                  }
>              }
>          },
>          "small_2": {
>              "policy": "SCHED_OTHER",
>              "delay": 0,
>              "loop": 1,
>              "phases": {
>                  "p000001": {
>                      "loop": 62,
>                      "run": 2880,
>                      "timer": {
>                          "ref": "small_2",
>                          "period": 16000
>                      }
>                  }
>              }
>          }
>      },
>      "global": {
>          "default_policy": "SCHED_OTHER",
>          "duration": -1,
>          "calibration": 264,
>          "logdir": "/root/devlib-target"
>      }
> }
>
> when I run it

Thanks, I will give it a try on my b.L platform

>
> >>
> >> That said - despite the relatively high number of hits only about 5% of
> >> runs see enough additional energy consumed to trigger a test failure. We
> >> do try to keep a quiet system as much as possible and only run for a few
> >> seconds so the impact we see in testing is also probably higher than in
> >> the real world.
> >
> > Yeah, I'm curious to see the impact on a real system which have a
> > 60fps screen update like an android phone
> >
>
> I wouldn't expect much change there but I would on the idle-ish
> homescreen/day-of-use type benchmarks.
>
> If I had a platform with any kind of representative energy use, I'd test
> it :)

