* [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases
@ 2019-06-20 15:05 Patrick Bellasi
  2019-06-26 11:40 ` Vincent Guittot
  0 siblings, 1 reply; 15+ messages in thread
From: Patrick Bellasi @ 2019-06-20 15:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Rafael J . Wysocki, Vincent Guittot,
	Viresh Kumar, Douglas Raillard, Quentin Perret, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli

The estimated utilization for a task is currently defined based on:
 - enqueued: the utilization value at the end of the last activation
 - ewma:     an exponential moving average whose samples are the enqueued values

According to this definition, when a task suddenly changes its bandwidth
requirements from small to big, the EWMA will need to collect multiple
samples before converging up to track the new big utilization.
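
As an illustration (a user-space sketch, not kernel code) of how slowly
the EWMA chases a 45 -> 665 step, assuming the current
UTIL_EST_WEIGHT_SHIFT=2 weighting (roughly ewma += (sample - ewma) / 4):

  #include <stdio.h>

  int main(void)
  {
      int ewma = 45, sample = 665;
      int i;

      for (i = 1; i <= 8; i++) {
          /* same 1/4 weighting as the util_est EWMA update */
          ewma += (sample - ewma) >> 2;
          printf("activation %d: ewma = %d\n", i, ewma);
      }
      /* even after 8 activations the EWMA is still ~10% below 665 */
      return 0;
  }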

Moreover, after the PELT scale invariance update [1], in the above scenario we
can see that the utilization of the task drops significantly from the first
big activation to the following one. That drop is implied by the new
"time-scaling" mechanism, which replaced the previous "delta-scaling" approach.

Unfortunately, these drops cannot be fully absorbed by the current util_est
implementation. Indeed, the low-frequency filtering introduced by the "ewma" is
entirely useless while converging up and does not help to stabilize the PELT
signal sooner.

To make util_est serve the above scenario better, change its definition to
slow down only utilization decreases: reset the "ewma" every time the last
collected sample increases.

This change also makes the default util_est implementation better aligned with
the general scheduler behavior, which is to optimize for performance.
In the future, this implementation can be further refined to consider
task specific hints.

[1] sched/fair: Update scale invariance of PELT
    Message-ID: <tip-23127296889fe84b0762b191b5d041e8ba6f2599@git.kernel.org>

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 kernel/sched/fair.c     | 14 +++++++++++++-
 kernel/sched/features.h |  1 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3c11dcdedcbc..27b33caaaaf4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3685,11 +3685,22 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
 	if (ue.enqueued & UTIL_AVG_UNCHANGED)
 		return;
 
+	/*
+	 * Reset EWMA on utilization increases, the moving average is used only
+	 * to smooth utilization decreases.
+	 */
+	ue.enqueued = (task_util(p) | UTIL_AVG_UNCHANGED);
+	if (sched_feat(UTIL_EST_FASTUP)) {
+		if (ue.ewma < ue.enqueued) {
+			ue.ewma = ue.enqueued;
+			goto done;
+		}
+	}
+
 	/*
 	 * Skip update of task's estimated utilization when its EWMA is
 	 * already ~1% close to its last activation value.
 	 */
-	ue.enqueued = (task_util(p) | UTIL_AVG_UNCHANGED);
 	last_ewma_diff = ue.enqueued - ue.ewma;
 	if (within_margin(last_ewma_diff, (SCHED_CAPACITY_SCALE / 100)))
 		return;
@@ -3722,6 +3733,7 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
 	ue.ewma <<= UTIL_EST_WEIGHT_SHIFT;
 	ue.ewma  += last_ewma_diff;
 	ue.ewma >>= UTIL_EST_WEIGHT_SHIFT;
+done:
 	WRITE_ONCE(p->se.avg.util_est, ue);
 }
 
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 2410db5e9a35..7481cd96f391 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -89,3 +89,4 @@ SCHED_FEAT(WA_BIAS, true)
  * UtilEstimation. Use estimated CPU utilization.
  */
 SCHED_FEAT(UTIL_EST, true)
+SCHED_FEAT(UTIL_EST_FASTUP, true)
-- 
2.21.0



* Re: [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases
  2019-06-20 15:05 [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases Patrick Bellasi
@ 2019-06-26 11:40 ` Vincent Guittot
  2019-06-28 10:08   ` Patrick Bellasi
  0 siblings, 1 reply; 15+ messages in thread
From: Vincent Guittot @ 2019-06-26 11:40 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Rafael J . Wysocki,
	Viresh Kumar, Douglas Raillard, Quentin Perret, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli

Hi Patrick,

On Thu, 20 Jun 2019 at 17:06, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
>
> The estimated utilization for a task is currently defined based on:
>  - enqueued: the utilization value at the end of the last activation
>  - ewma:     an exponential moving average whose samples are the enqueued values
>
> According to this definition, when a task suddenly changes its bandwidth
> requirements from small to big, the EWMA will need to collect multiple
> samples before converging up to track the new big utilization.
>
> Moreover, after the PELT scale invariance update [1], in the above scenario we
> can see that the utilization of the task drops significantly from the first
> big activation to the following one. That drop is implied by the new "time-scaling"

Could you give us more details about this? I'm not sure I understand
what changes between the 1st big activation and the following one.
The utilization implied by the new "time-scaling" should be the same as
always running at max frequency with the previous method.

> mechanism, which replaced the previous "delta-scaling" approach.
>
> Unfortunately, these drops cannot be fully absorbed by the current util_est
> implementation. Indeed, the low-frequency filtering introduced by the "ewma" is
> entirely useless while converging up and does not help to stabilize the PELT
> signal sooner.
>
> To make util_est serve the above scenario better, change its definition to
> slow down only utilization decreases: reset the "ewma" every time the last
> collected sample increases.
>
> This change also makes the default util_est implementation better aligned with
> the general scheduler behavior, which is to optimize for performance.
> In the future, this implementation can be further refined to consider
> task specific hints.
>
> [1] sched/fair: Update scale invariance of PELT
>     Message-ID: <tip-23127296889fe84b0762b191b5d041e8ba6f2599@git.kernel.org>
>
> Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  kernel/sched/fair.c     | 14 +++++++++++++-
>  kernel/sched/features.h |  1 +
>  2 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3c11dcdedcbc..27b33caaaaf4 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3685,11 +3685,22 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
>         if (ue.enqueued & UTIL_AVG_UNCHANGED)
>                 return;
>
> +       /*
> +        * Reset EWMA on utilization increases, the moving average is used only
> +        * to smooth utilization decreases.
> +        */
> +       ue.enqueued = (task_util(p) | UTIL_AVG_UNCHANGED);
> +       if (sched_feat(UTIL_EST_FASTUP)) {
> +               if (ue.ewma < ue.enqueued) {
> +                       ue.ewma = ue.enqueued;
> +                       goto done;
> +               }
> +       }
> +
>         /*
>          * Skip update of task's estimated utilization when its EWMA is
>          * already ~1% close to its last activation value.
>          */
> -       ue.enqueued = (task_util(p) | UTIL_AVG_UNCHANGED);
>         last_ewma_diff = ue.enqueued - ue.ewma;
>         if (within_margin(last_ewma_diff, (SCHED_CAPACITY_SCALE / 100)))
>                 return;
> @@ -3722,6 +3733,7 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
>         ue.ewma <<= UTIL_EST_WEIGHT_SHIFT;
>         ue.ewma  += last_ewma_diff;
>         ue.ewma >>= UTIL_EST_WEIGHT_SHIFT;
> +done:
>         WRITE_ONCE(p->se.avg.util_est, ue);
>  }
>
> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
> index 2410db5e9a35..7481cd96f391 100644
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -89,3 +89,4 @@ SCHED_FEAT(WA_BIAS, true)
>   * UtilEstimation. Use estimated CPU utilization.
>   */
>  SCHED_FEAT(UTIL_EST, true)
> +SCHED_FEAT(UTIL_EST_FASTUP, true)
> --
> 2.21.0
>


* Re: [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases
  2019-06-26 11:40 ` Vincent Guittot
@ 2019-06-28 10:08   ` Patrick Bellasi
  2019-06-28 12:38     ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Patrick Bellasi @ 2019-06-28 10:08 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Rafael J . Wysocki,
	Viresh Kumar, Douglas Raillard, Quentin Perret, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli

On 26-Jun 13:40, Vincent Guittot wrote:
> Hi Patrick,
> 
> On Thu, 20 Jun 2019 at 17:06, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> >
> > The estimated utilization for a task is currently defined based on:
> >  - enqueued: the utilization value at the end of the last activation
> >  - ewma:     an exponential moving average whose samples are the enqueued values
> >
> > According to this definition, when a task suddenly changes its bandwidth
> > requirements from small to big, the EWMA will need to collect multiple
> > samples before converging up to track the new big utilization.
> >
> > Moreover, after the PELT scale invariance update [1], in the above scenario we
> > can see that the utilization of the task drops significantly from the first
> > big activation to the following one. That drop is implied by the new "time-scaling"
> 
> Could you give us more details about this? I'm not sure I understand
> what changes between the 1st big activation and the following one.

We are after a solution for the problem Douglas Raillard discussed at
OSPM, specifically the "Task util drop after 1st idle" highlighted in
slide 6 of his presentation:

  http://retis.sssup.it/ospm-summit/Downloads/02_05-Douglas_Raillard-How_can_we_make_schedutil_even_more_effective.pdf

which shows what happens when a task switches from 5% to 75%; we get
these start/end values for each activation:

  Act     Time          __comm  __cpu  __pid    task    util_avg
  --------------------------------------------------------------
  1       2.813559  	<idle>	    4	   0	step_up	      45
          2.902624	step_up	    4	2574	step_up	     665
  --------------------------------------------------------------
  2       2.903722	<idle>	    4	   0	step_up	     289
          2.917385	step_up	    4	2574	step_up	     452
  --------------------------------------------------------------
  3       2.919725	<idle>	    4	   0	step_up	     418
          2.953764	step_up	    4	2574	step_up	     658
  --------------------------------------------------------------
  4       2.954248	<idle>	    4	   0	step_up	     537
          2.967955	step_up	    4	2574	step_up	     645
  --------------------------------------------------------------
  5       2.970248	<idle>	    4	   0	step_up	     597
          2.983914	step_up	    4	2574	step_up	     692
  --------------------------------------------------------------
  6       2.986248	<idle>	    4	   0	step_up	     640
          2.999924	step_up	    4	2574	step_up	     725
  --------------------------------------------------------------
  7       3.002248	<idle>	    4	   0	step_up	     670
          3.015872	step_up	    4	2574	step_up	     749
  --------------------------------------------------------------
  8       3.018248	<idle>	    4	   0	step_up	     694
          3.030474	step_up	    4	2574	step_up	     767
  --------------------------------------------------------------
  9       3.034247	<idle>	    4	   0	step_up	     710
          3.046454	step_up	    4	2574	step_up	     780
  --------------------------------------------------------------

Since the first activation runs at lower-than-max OPPs, we do
"time-scaling" at the end of the activation. Util_avg starts at 45
and ramps up to 665, but then drops 375 units down to 289 at the
beginning of the second activation.

The second activation has a chance to run at higher OPPs, but still
not at max. Util_avg starts at 289 and ramps up to 452, which is even
lower than the previous max value, but then it drops 34 units down to
418.

The following activations show a similar pattern, but util_avg
converges toward the final value: we run almost always at the highest
OPP and the drops are defined mainly by the expected PELT decay.

> The utilization implied by the new "time-scaling" should be the same as
> always running at max frequency with the previous method.

Right, the problem we are tackling with this patch however is to
make util_est a better signal for the ramp-up phases.

Right now util_est "fixes" only the second activation, since:

   max(util_avg, last_value, ewma) =
   max(289, 665, <289) = 665

and thus we keep running on the highest OPP we reached at the end of
the first activation.

While at the start of the third activation:

   max(util_avg, last_value, ewma) =
   max(452, 418, <452) = 452

and this time we drop the OPP quite a lot despite the signal still
ramping up.
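
In a simplified sketch (not the exact kernel helpers, and with the
ewma value made up as anything below the other two):

  #include <stdio.h>

  /* Simplified sketch: the estimate consumed at wakeup is roughly
   * max(util_avg, max(enqueued, ewma)); the real helpers differ in
   * masking details. */
  static unsigned long task_util_est_sketch(unsigned long util_avg,
                                            unsigned long enqueued,
                                            unsigned long ewma)
  {
      unsigned long est = enqueued > ewma ? enqueued : ewma;

      return util_avg > est ? util_avg : est;
  }

  int main(void)
  {
      /* activation 2: the 665 sample still dominates -> 665 */
      printf("%lu\n", task_util_est_sketch(289, 665, 250));
      /* activation 3: only the 452 sample is left -> 452 */
      printf("%lu\n", task_util_est_sketch(418, 452, 250));
      return 0;
  }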

> > mechanism, which replaced the previous "delta-scaling" approach.
> >

That happens because the EWMA takes multiple activations to converge
up, which means it's not helping much:

> > Unfortunately, these drops cannot be fully absorbed by the current util_est
> > implementation. Indeed, the low-frequency filtering introduced by the "ewma" is
> > entirely useless while converging up and does not help to stabilize the PELT
> > signal sooner.

The idea of the patch is to exploit two observations:

 1. the default scheduler behavior is to be performance oriented
 2. the longer you run a task underprovisioned, the higher the
    util_avg will be

Which turns into:

> > To make util_est serve the above scenario better, change its definition to
> > slow down only utilization decreases: reset the "ewma" every time the last
> > collected sample increases.
> >
> > This change also makes the default util_est implementation better aligned with
> > the general scheduler behavior, which is to optimize for performance.
> > In the future, this implementation can be further refined to consider
> > task specific hints.

Cheers,
Patrick

-- 
#include <best/regards.h>

Patrick Bellasi


* Re: [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases
  2019-06-28 10:08   ` Patrick Bellasi
@ 2019-06-28 12:38     ` Peter Zijlstra
  2019-06-28 13:51       ` Vincent Guittot
  2019-06-28 14:00       ` Patrick Bellasi
  0 siblings, 2 replies; 15+ messages in thread
From: Peter Zijlstra @ 2019-06-28 12:38 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Vincent Guittot, linux-kernel, Ingo Molnar, Rafael J . Wysocki,
	Viresh Kumar, Douglas Raillard, Quentin Perret, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli

On Fri, Jun 28, 2019 at 11:08:14AM +0100, Patrick Bellasi wrote:
> On 26-Jun 13:40, Vincent Guittot wrote:
> > Hi Patrick,
> > 
> > On Thu, 20 Jun 2019 at 17:06, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> > >
> > > The estimated utilization for a task is currently defined based on:
> > >  - enqueued: the utilization value at the end of the last activation
> > >  - ewma:     an exponential moving average whose samples are the enqueued values
> > >
> > > According to this definition, when a task suddenly changes its bandwidth
> > > requirements from small to big, the EWMA will need to collect multiple
> > > samples before converging up to track the new big utilization.
> > >
> > > Moreover, after the PELT scale invariance update [1], in the above scenario we
> > > can see that the utilization of the task drops significantly from the first
> > > big activation to the following one. That drop is implied by the new "time-scaling"
> > 
> > Could you give us more details about this? I'm not sure I understand
> > what changes between the 1st big activation and the following one.
> 
> We are after a solution for the problem Douglas Raillard discussed at
> OSPM, specifically the "Task util drop after 1st idle" highlighted in
> slide 6 of his presentation:
> 
>   http://retis.sssup.it/ospm-summit/Downloads/02_05-Douglas_Raillard-How_can_we_make_schedutil_even_more_effective.pdf
> 

So I see the problem, and I don't hate the patch, but I'm still
struggling to understand how exactly it relates to the time-scaling
stuff. Afaict the fundamental problem here is layering two averages. The
second (EWMA in our case) will always lag/delay the input of the first
(PELT).

The time-scaling thing might make matters worse, because that helps PELT
ramp up faster, but that is not the primary issue.

Or am I missing something?


* Re: [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases
  2019-06-28 12:38     ` Peter Zijlstra
@ 2019-06-28 13:51       ` Vincent Guittot
  2019-06-28 14:10         ` Patrick Bellasi
  2019-06-28 14:00       ` Patrick Bellasi
  1 sibling, 1 reply; 15+ messages in thread
From: Vincent Guittot @ 2019-06-28 13:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Patrick Bellasi, linux-kernel, Ingo Molnar, Rafael J . Wysocki,
	Viresh Kumar, Douglas Raillard, Quentin Perret, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli

On Fri, 28 Jun 2019 at 14:38, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Jun 28, 2019 at 11:08:14AM +0100, Patrick Bellasi wrote:
> > On 26-Jun 13:40, Vincent Guittot wrote:
> > > Hi Patrick,
> > >
> > > On Thu, 20 Jun 2019 at 17:06, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> > > >
> > > > The estimated utilization for a task is currently defined based on:
> > > >  - enqueued: the utilization value at the end of the last activation
> > > >  - ewma:     an exponential moving average whose samples are the enqueued values
> > > >
> > > > According to this definition, when a task suddenly changes its bandwidth
> > > > requirements from small to big, the EWMA will need to collect multiple
> > > > samples before converging up to track the new big utilization.
> > > >
> > > > Moreover, after the PELT scale invariance update [1], in the above scenario we
> > > > can see that the utilization of the task drops significantly from the first
> > > > big activation to the following one. That drop is implied by the new "time-scaling"
> > >
> > > Could you give us more details about this? I'm not sure I understand
> > > what changes between the 1st big activation and the following one.
> >
> > We are after a solution for the problem Douglas Raillard discussed at
> > OSPM, specifically the "Task util drop after 1st idle" highlighted in
> > slide 6 of his presentation:
> >
> >   http://retis.sssup.it/ospm-summit/Downloads/02_05-Douglas_Raillard-How_can_we_make_schedutil_even_more_effective.pdf
> >
>
> So I see the problem, and I don't hate the patch, but I'm still
> struggling to understand how exactly it relates to the time-scaling
> stuff. Afaict the fundamental problem here is layering two averages. The

AFAICT, it's not related to the time-scaling

In fact the big 1st activation happens because the task runs at a low
OPP and doesn't have enough time to finish its running phase before
the next one is due to begin. This means that the task runs several
computation phases in one go, which is no longer a 75% task. From a
PELT PoV, the task is far larger than a 75% task and so is its
utilization, because it runs far longer (even after scaling time with
frequency). Once the cpu reaches an OPP high enough to allow a sleep
phase between running phases, the task load tracking comes back to
the normal slope increase (the one that would have happened if the
task had jumped from 5% to 75% while already running at max OPP).
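
As a toy illustration (numbers made up: 12ms of work every 16ms, i.e.
a 75% task at max capacity):

  #include <stdio.h>

  int main(void)
  {
      const double period_ms = 16.0, work_ms = 12.0; /* 75% at fmax */
      const double caps[] = { 1.0, 0.8, 0.5 };
      int i;

      for (i = 0; i < 3; i++) {
          double run_ms = work_ms / caps[i];

          printf("capacity %.1f: %4.1fms run per %.0fms period -> %s\n",
                 caps[i], run_ms, period_ms,
                 run_ms < period_ms ? "idle time left"
                                    : "back-to-back, looks bigger than 75%");
      }
      return 0;
  }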

> second (EWMA in our case) will always lag/delay the input of the first
> (PELT).
>
> The time-scaling thing might make matters worse, because that helps PELT
> ramp up faster, but that is not the primary issue.
>
> Or am I missing something?


* Re: [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases
  2019-06-28 12:38     ` Peter Zijlstra
  2019-06-28 13:51       ` Vincent Guittot
@ 2019-06-28 14:00       ` Patrick Bellasi
  2019-08-02  9:47         ` Patrick Bellasi
  1 sibling, 1 reply; 15+ messages in thread
From: Patrick Bellasi @ 2019-06-28 14:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vincent Guittot, linux-kernel, Ingo Molnar, Rafael J . Wysocki,
	Viresh Kumar, Douglas Raillard, Quentin Perret, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli

On 28-Jun 14:38, Peter Zijlstra wrote:
> On Fri, Jun 28, 2019 at 11:08:14AM +0100, Patrick Bellasi wrote:
> > On 26-Jun 13:40, Vincent Guittot wrote:
> > > Hi Patrick,
> > > 
> > > On Thu, 20 Jun 2019 at 17:06, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> > > >
> > > > The estimated utilization for a task is currently defined based on:
> > > >  - enqueued: the utilization value at the end of the last activation
> > > > >  - ewma:     an exponential moving average whose samples are the enqueued values
> > > >
> > > > > According to this definition, when a task suddenly changes its bandwidth
> > > > > requirements from small to big, the EWMA will need to collect multiple
> > > > > samples before converging up to track the new big utilization.
> > > >
> > > > > Moreover, after the PELT scale invariance update [1], in the above scenario we
> > > > > can see that the utilization of the task drops significantly from the first
> > > > > big activation to the following one. That drop is implied by the new "time-scaling"
> > > 
> > > Could you give us more details about this? I'm not sure I understand
> > > what changes between the 1st big activation and the following one.
> > 
> > We are after a solution for the problem Douglas Raillard discussed at
> > OSPM, specifically the "Task util drop after 1st idle" highlighted in
> > slide 6 of his presentation:
> > 
> >   http://retis.sssup.it/ospm-summit/Downloads/02_05-Douglas_Raillard-How_can_we_make_schedutil_even_more_effective.pdf
> > 
> 
> So I see the problem, and I don't hate the patch, but I'm still
> struggling to understand how exactly it relates to the time-scaling
> stuff. Afaict the fundamental problem here is layering two averages. The
> second (EWMA in our case) will always lag/delay the input of the first
> (PELT).
> 
> The time-scaling thing might make matters worse, because that helps PELT
> ramp up faster, but that is not the primary issue.

Sure, we like the new time-scaling PELT which ramps up faster and, as
long as we have idle time, it's better at predicting what the
utilization would be if we were running at max OPP.

However, the experiment above shows that:

 - despite the task being a 75% task after a certain activation, it takes
   multiple activations for PELT to actually enter that range.

 - the first activation ends at 665, 10% short wrt the configured
   utilization

 - while the PELT signal converges toward the 75%, we have some pretty
   consistent drops at wakeup time, especially after the first big
   activation.

> Or am I missing something?

I'm not sure the above happens because of a problem in the new
time-scaling PELT; I actually think it's kind of expected given the
way we re-scale time contributions depending on the current OPPs.

It's just that a 375-unit drop in utilization with just 1.1ms of sleep
time looks to me more related to the time-scaling invariance than to
the normal/expected PELT decay.
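
As a back-of-envelope check (assuming the standard 32ms PELT
half-life), plain decay explains only a small part of that drop:

  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
      double y = pow(0.5, 1.0 / 32.0);  /* per-ms decay, y^32 = 0.5 */
      double util = 665.0;              /* end of the 1st activation */
      double sleep_ms = 1.1;

      /* prints ~649: plain decay accounts for ~16 units, not 375 */
      printf("decayed: %.0f\n", util * pow(y, sleep_ms));
      return 0;
  }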

   Could it be an out-of-sync issue between the PELT time scaling code
   and capacity scaling code?
   Perhaps due to some OPP changes/notification going wrong?

Sorry for not being more helpful on that, maybe Vincent has some
better ideas.

The only thing I've kind of convinced myself of is that an EWMA on
util_est does not make a lot of sense for tracking increasing
utilization.

Best,
Patrick

-- 
#include <best/regards.h>

Patrick Bellasi


* Re: [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases
  2019-06-28 13:51       ` Vincent Guittot
@ 2019-06-28 14:10         ` Patrick Bellasi
  2019-06-30  8:43           ` Vincent Guittot
  0 siblings, 1 reply; 15+ messages in thread
From: Patrick Bellasi @ 2019-06-28 14:10 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Rafael J . Wysocki,
	Viresh Kumar, Douglas Raillard, Quentin Perret, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli

On 28-Jun 15:51, Vincent Guittot wrote:
> On Fri, 28 Jun 2019 at 14:38, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Fri, Jun 28, 2019 at 11:08:14AM +0100, Patrick Bellasi wrote:
> > > On 26-Jun 13:40, Vincent Guittot wrote:
> > > > Hi Patrick,
> > > >
> > > > On Thu, 20 Jun 2019 at 17:06, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> > > > >
> > > > > The estimated utilization for a task is currently defined based on:
> > > > >  - enqueued: the utilization value at the end of the last activation
> > > > >  - ewma:     an exponential moving average whose samples are the enqueued values
> > > > >
> > > > > According to this definition, when a task suddenly changes its bandwidth
> > > > > requirements from small to big, the EWMA will need to collect multiple
> > > > > samples before converging up to track the new big utilization.
> > > > >
> > > > > Moreover, after the PELT scale invariance update [1], in the above scenario we
> > > > > can see that the utilization of the task drops significantly from the first
> > > > > big activation to the following one. That drop is implied by the new "time-scaling"
> > > >
> > > > Could you give us more details about this? I'm not sure I understand
> > > > what changes between the 1st big activation and the following one.
> > >
> > > We are after a solution for the problem Douglas Raillard discussed at
> > > OSPM, specifically the "Task util drop after 1st idle" highlighted in
> > > slide 6 of his presentation:
> > >
> > >   http://retis.sssup.it/ospm-summit/Downloads/02_05-Douglas_Raillard-How_can_we_make_schedutil_even_more_effective.pdf
> > >
> >
> > So I see the problem, and I don't hate the patch, but I'm still
> > struggling to understand how exactly it relates to the time-scaling
> > stuff. Afaict the fundamental problem here is layering two averages. The
> 
> AFAICT, it's not related to the time-scaling
> 
> In fact the big 1st activation happens because the task runs at a low
> OPP and doesn't have enough time to finish its running phase before
> the next one is due to begin. This means that the task runs several
> computation phases in one go, which is no longer a 75% task.

But in that case, running multiple activations back to back, should we
not expect the util_avg to exceed the 75% mark?


> From a PELT PoV, the task is far larger than a 75% task and so is
> its utilization, because it runs far longer (even after scaling time
> with frequency).

Which thus should match my expectation above, no?

> Once the cpu reaches an OPP high enough to allow a sleep phase
> between running phases, the task load tracking comes back to the
> normal slope increase (the one that would have happened if the task
> had jumped from 5% to 75% while already running at max OPP)


Indeed, I can see a change in slope in the plots. But there is also
that big drop after the first big activation: 375 units in 1.1ms.

Is that expected? I guess yes, since we fix the clock_pelt with the
lost_idle_time.


> > second (EWMA in our case) will always lag/delay the input of the first
> > (PELT).
> >
> > The time-scaling thing might make matters worse, because that helps PELT
> > ramp up faster, but that is not the primary issue.
> >
> > Or am I missing something?

-- 
#include <best/regards.h>

Patrick Bellasi


* Re: [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases
  2019-06-28 14:10         ` Patrick Bellasi
@ 2019-06-30  8:43           ` Vincent Guittot
  2019-07-01  8:53             ` Patrick Bellasi
  0 siblings, 1 reply; 15+ messages in thread
From: Vincent Guittot @ 2019-06-30  8:43 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Rafael J . Wysocki,
	Viresh Kumar, Douglas Raillard, Quentin Perret, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli

On Fri, 28 Jun 2019 at 16:10, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
>
> On 28-Jun 15:51, Vincent Guittot wrote:
> > On Fri, 28 Jun 2019 at 14:38, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Fri, Jun 28, 2019 at 11:08:14AM +0100, Patrick Bellasi wrote:
> > > > On 26-Jun 13:40, Vincent Guittot wrote:
> > > > > Hi Patrick,
> > > > >
> > > > > On Thu, 20 Jun 2019 at 17:06, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> > > > > >
> > > > > > The estimated utilization for a task is currently defined based on:
> > > > > >  - enqueued: the utilization value at the end of the last activation
> > > > > >  - ewma:     an exponential moving average whose samples are the enqueued values
> > > > > >
> > > > > > According to this definition, when a task suddenly changes its bandwidth
> > > > > > requirements from small to big, the EWMA will need to collect multiple
> > > > > > samples before converging up to track the new big utilization.
> > > > > >
> > > > > > Moreover, after the PELT scale invariance update [1], in the above scenario we
> > > > > > can see that the utilization of the task drops significantly from the first
> > > > > > big activation to the following one. That drop is implied by the new "time-scaling"
> > > > >
> > > > > Could you give us more details about this? I'm not sure I understand
> > > > > what changes between the 1st big activation and the following one.
> > > >
> > > > We are after a solution for the problem Douglas Raillard discussed at
> > > > OSPM, specifically the "Task util drop after 1st idle" highlighted in
> > > > slide 6 of his presentation:
> > > >
> > > >   http://retis.sssup.it/ospm-summit/Downloads/02_05-Douglas_Raillard-How_can_we_make_schedutil_even_more_effective.pdf
> > > >
> > >
> > > So I see the problem, and I don't hate the patch, but I'm still
> > > struggling to understand how exactly it relates to the time-scaling
> > > stuff. Afaict the fundamental problem here is layering two averages. The
> >
> > AFAICT, it's not related to the time-scaling
> >
> > In fact the big 1st activation happens because the task runs at a low
> > OPP and doesn't have enough time to finish its running phase before
> > the next one is due to begin. This means that the task runs several
> > computation phases in one go, which is no longer a 75% task.
>
> But in that case, running multiple activations back to back, should we
> not expect the util_avg to exceed the 75% mark?

But the task starts with a very low value and PELT needs time to ramp up.

>
>
> > From a PELT PoV, the task is far larger than a 75% task and so is
> > its utilization, because it runs far longer (even after scaling time
> > with frequency).
>
> Which thus should match my expectation above, no?

But utilization has to ramp up before stabilizing at its final value.
The value at the end of the 1st big activation is not what the
utilization would be if the task were always that long.

>
> > Once the cpu reaches an OPP high enough to allow a sleep phase
> > between running phases, the task load tracking comes back to the
> > normal slope increase (the one that would have happened if the task
> > had jumped from 5% to 75% while already running at max OPP)
>
>
> Indeed, I can see from the plots a change in slope. But there is also
> that big drop after the first big activation: 375 units in 1.1ms.
>
> Is that expected? I guess yes, since we fix the clock_pelt with the
> lost_idle_time.
>
>
> > > second (EWMA in our case) will always lag/delay the input of the first
> > > (PELT).
> > >
> > > The time-scaling thing might make matters worse, because that helps PELT
> > > ramp up faster, but that is not the primary issue.
> > >
> > > Or am I missing something?
>
> --
> #include <best/regards.h>
>
> Patrick Bellasi


* Re: [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases
  2019-06-30  8:43           ` Vincent Guittot
@ 2019-07-01  8:53             ` Patrick Bellasi
  0 siblings, 0 replies; 15+ messages in thread
From: Patrick Bellasi @ 2019-07-01  8:53 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Rafael J . Wysocki,
	Viresh Kumar, Douglas Raillard, Quentin Perret, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli

On 30-Jun 10:43, Vincent Guittot wrote:
> On Fri, 28 Jun 2019 at 16:10, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> > On 28-Jun 15:51, Vincent Guittot wrote:
> > > On Fri, 28 Jun 2019 at 14:38, Peter Zijlstra <peterz@infradead.org> wrote:
> > > > On Fri, Jun 28, 2019 at 11:08:14AM +0100, Patrick Bellasi wrote:
> > > > > On 26-Jun 13:40, Vincent Guittot wrote:

Hi Vincent,

[...]

> > > AFAICT, it's not related to the time-scaling
> > >
> > > In fact the big 1st activation happens because the task runs at a low
> > > OPP and doesn't have enough time to finish its running phase before
> > > the next one is due to begin. This means that the task runs several
> > > computation phases in one go, which is no longer a 75% task.
> >
> > But in that case, running multiple activations back to back, should we
> > not expect the util_avg to exceed the 75% mark?
> 
> But the task starts with a very low value and PELT needs time to ramp up.

Of course...

[...]

> > > Once the cpu reaches an OPP high enough to allow a sleep phase
> > > between running phases, the task load tracking comes back to the
> > > normal slope increase (the one that would have happened if the task
> > > had jumped from 5% to 75% while already running at max OPP)
> >
> >
> > Indeed, I can see from the plots a change in slope. But there is also
> > that big drop after the first big activation: 375 units in 1.1ms.
> >
> > Is that expected? I guess yes, since we fix the clock_pelt with the
> > lost_idle_time.

... but, I guess Peter was mainly asking about the point above: is
that "big" drop after the first activation related to time-scaling or
not?

Cheers,
Patrick

-- 
#include <best/regards.h>

Patrick Bellasi


* Re: [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases
  2019-06-28 14:00       ` Patrick Bellasi
@ 2019-08-02  9:47         ` Patrick Bellasi
  2019-10-14 14:52           ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Patrick Bellasi @ 2019-08-02  9:47 UTC (permalink / raw)
  To: Peter Zijlstra, Vincent Guittot
  Cc: linux-kernel, Ingo Molnar, Rafael J . Wysocki, Viresh Kumar,
	Douglas Raillard, Quentin Perret, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli

Hi Peter, Vincent,
is there anything different I can do on this?

Cheers,
Patrick

On 28-Jun 15:00, Patrick Bellasi wrote:
> On 28-Jun 14:38, Peter Zijlstra wrote:
> > On Fri, Jun 28, 2019 at 11:08:14AM +0100, Patrick Bellasi wrote:
> > > On 26-Jun 13:40, Vincent Guittot wrote:
> > > > Hi Patrick,
> > > > 
> > > > On Thu, 20 Jun 2019 at 17:06, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> > > > >
> > > > > The estimated utilization for a task is currently defined based on:
> > > > >  - enqueued: the utilization value at the end of the last activation
> > > > >  - ewma:     an exponential moving average whose samples are the enqueued values
> > > > >
> > > > > According to this definition, when a task suddenly changes its bandwidth
> > > > > requirements from small to big, the EWMA will need to collect multiple
> > > > > samples before converging up to track the new big utilization.
> > > > >
> > > > > Moreover, after the PELT scale invariance update [1], in the above scenario we
> > > > > can see that the utilization of the task drops significantly from the first
> > > > > big activation to the following one. That drop is implied by the new "time-scaling"
> > > > 
> > > > Could you give us more details about this? I'm not sure I understand
> > > > what changes between the 1st big activation and the following one.
> > > 
> > > We are after a solution for the problem Douglas Raillard discussed at
> > > OSPM, specifically the "Task util drop after 1st idle" highlighted in
> > > slide 6 of his presentation:
> > > 
> > >   http://retis.sssup.it/ospm-summit/Downloads/02_05-Douglas_Raillard-How_can_we_make_schedutil_even_more_effective.pdf
> > > 
> > 
> > So I see the problem, and I don't hate the patch, but I'm still
> > struggling to understand how exactly it relates to the time-scaling
> > stuff. Afaict the fundamental problem here is layering two averages. The
> > second (EWMA in our case) will always lag/delay the input of the first
> > (PELT).
> > 
> > The time-scaling thing might make matters worse, because that helps PELT
> > ramp up faster, but that is not the primary issue.
> 
> Sure, we like the new time-scaling PELT which ramps up faster and, as
> long as we have idle time, it's better at predicting what the
> utilization would be if we were running at max OPP.
> 
> However, the experiment above shows that:
> 
>  - despite the task being a 75% task after a certain activation, it takes
>    multiple activations for PELT to actually enter that range.
> 
>  - the first activation ends at 665, 10% short wrt the configured
>    utilization
> 
>  - while the PELT signal converges toward the 75%, we have some pretty
>    consistent drops at wakeup time, especially after the first big
>    activation.
> 
> > Or am I missing something?
> 
> I'm not sure the above happens because of a problem in the new
> time-scaling PELT; I actually think it's kind of expected given the
> way we re-scale time contributions depending on the current OPPs.
> 
> It's just that a 375-unit drop in utilization with just 1.1ms of sleep
> time looks to me more related to the time-scaling invariance than to
> the normal/expected PELT decay.
> 
>    Could it be an out-of-sync issue between the PELT time scaling code
>    and capacity scaling code?
>    Perhaps due to some OPP changes/notification going wrong?
> 
> Sorry for not being more helpful on that, maybe Vincent has some
> better ideas.
> 
> The only thing I've kind of convinced myself of is that an EWMA on
> util_est does not make a lot of sense for tracking increasing
> utilization.
> 
> Best,
> Patrick
> 
> -- 
> #include <best/regards.h>
> 
> Patrick Bellasi

-- 
#include <best/regards.h>

Patrick Bellasi


* Re: [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases
  2019-08-02  9:47         ` Patrick Bellasi
@ 2019-10-14 14:52           ` Peter Zijlstra
  2019-10-14 14:57             ` Vincent Guittot
                               ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Peter Zijlstra @ 2019-10-14 14:52 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Vincent Guittot, linux-kernel, Ingo Molnar, Rafael J . Wysocki,
	Viresh Kumar, Douglas Raillard, Quentin Perret, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli


The energy aware schedutil patches reminded me this was still pending.

On Fri, Aug 02, 2019 at 10:47:25AM +0100, Patrick Bellasi wrote:
> Hi Peter, Vincent,
> is there anything different I can do on this?

I think both Vincent and I are basically fine with the patch; it was
the Changelog/explanation for it that sat uneasy.

Specifically I think the 'confusion' around the PELT invariance stuff
doesn't help.

I think that if you present it simply as making util_est directly follow
upward motion and only decay on downward -- and the rationale for it --
then it should be fine.




* Re: [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases
  2019-10-14 14:52           ` Peter Zijlstra
@ 2019-10-14 14:57             ` Vincent Guittot
  2019-10-14 16:16             ` Douglas Raillard
  2019-10-21  6:19             ` Patrick Bellasi
  2 siblings, 0 replies; 15+ messages in thread
From: Vincent Guittot @ 2019-10-14 14:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Patrick Bellasi, linux-kernel, Ingo Molnar, Rafael J . Wysocki,
	Viresh Kumar, Douglas Raillard, Quentin Perret, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli

On Mon, 14 Oct 2019 at 16:52, Peter Zijlstra <peterz@infradead.org> wrote:
>
>
> The energy aware schedutil patches reminded me this was still pending.
>
> On Fri, Aug 02, 2019 at 10:47:25AM +0100, Patrick Bellasi wrote:
> > Hi Peter, Vincent,
> > is there anything different I can do on this?
>
> I think both Vincent and I are basically fine with the patch; it was
> the Changelog/explanation for it that sat uneasy.

I agree

>
> Specifically I think the 'confusion' around the PELT invariance stuff
> doesn't help.
>
> I think that if you present it simply as making util_est directly follow
> upward motion and only decay on downward -- and the rationale for it --
> then it should be fine.
>
>


* Re: [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases
  2019-10-14 14:52           ` Peter Zijlstra
  2019-10-14 14:57             ` Vincent Guittot
@ 2019-10-14 16:16             ` Douglas Raillard
  2019-10-17  8:25               ` Peter Zijlstra
  2019-10-21  6:19             ` Patrick Bellasi
  2 siblings, 1 reply; 15+ messages in thread
From: Douglas Raillard @ 2019-10-14 16:16 UTC (permalink / raw)
  To: Peter Zijlstra, Patrick Bellasi
  Cc: Vincent Guittot, linux-kernel, Ingo Molnar, Rafael J . Wysocki,
	Viresh Kumar, Quentin Perret, Dietmar Eggemann, Morten Rasmussen,
	Juri Lelli

Hi Peter,

On 10/14/19 3:52 PM, Peter Zijlstra wrote:
> 
> The energy aware schedutil patches reminded me this was still pending.
> 
> On Fri, Aug 02, 2019 at 10:47:25AM +0100, Patrick Bellasi wrote:
>> Hi Peter, Vincent,
>> is there anything different I can do on this?
> 
> I think both Vincent and I are basically fine with the patch; it was
> the Changelog/explanation for it that sat uneasy.
> 
> Specifically I think the 'confusion' around the PELT invariance stuff
> doesn't help.
> 
> I think that if you present it simply as making util_est directly follow
> upward motion and only decay on downward -- and the rationale for it --
> then it should be fine.

Random idea: since these things are much easier to understand by looking at a graph
of util over time, we could agree on some mailing-list-friendly way to convey graphs.
For example, a simple CSV with:
* before/after delimiters (line of # or =)
* graph title
* one point per signal transition, so that it can be plotted with gnuplot style "steps" or matplotlib drawstyle='steps-post'
* consistent column names:
    - time: in seconds (scientific notation for nanoseconds)
    - activation: 1 when the task is actually running, 0 otherwise
     (so it can be turned into transparent coloured bands using gnuplot filledcurves, as in [1])
    - util: util_avg of the task being talked about

The delimiters allow writing a script to render graphs directly out of an mbox file or ML archive URL.
This won't solve the issue for the commit message itself, but that may ease the ML discussions.
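
For instance, an embedded graph could look like this (util values
borrowed from the step_up trace earlier in the thread; delimiters and
exact layout purely illustrative, with activation=1 marking the task
starting to run):

  ############### step_up: util_avg over activations ###############
  time,activation,util
  2.813559,1,45
  2.902624,0,665
  2.903722,1,289
  2.917385,0,452
  2.919725,1,418
  ##################################################################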

[1] https://lisa-linux-integrated-system-analysis.readthedocs.io/en/master/trace_analysis.html#lisa.analysis.tasks.TasksAnalysis.plot_task_activation

Cheers,
Douglas


* Re: [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases
  2019-10-14 16:16             ` Douglas Raillard
@ 2019-10-17  8:25               ` Peter Zijlstra
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2019-10-17  8:25 UTC (permalink / raw)
  To: Douglas Raillard
  Cc: Patrick Bellasi, Vincent Guittot, linux-kernel, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Quentin Perret,
	Dietmar Eggemann, Morten Rasmussen, Juri Lelli

On Mon, Oct 14, 2019 at 05:16:02PM +0100, Douglas Raillard wrote:

> random idea: Since these things are much easier to understand by looking at a graph
> of util over time, we may agree on some mailing-list-friendly way to convey graphs.

I don't think that this patch warrants something like that. It is fairly
clear what it does.

For other stuff, maybe.

> For example, a simple CSV with:
> * before/after delimiters (line of # or =)
> * graph title
> * one point per signal transition, so that it can be plotted with gnuplot style "steps" or matplotlib drawstyle='steps-post'
> * consistent column names:
>    - time: in seconds (scientific notation for nanoseconds)
>    - activation: 1 when the task is actually running, 0 otherwise
>     (so it can be turned into transparent coloured bands using gnuplot filledcurves, as in [1])
>    - util: util_avg of the task being talked about
> 
> The delimiters allow writing a script to render graphs directly out of an mbox file or ML archive URL.
> This won't solve the issue for the commit message itself, but that may ease the ML discussions.

Something like that could work; mutt can easily pipe emails into
scripts. OTOH gnuplot also has ASCII output, so one can easily stick
something like that into email.
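
For example (untested sketch, against the CSV format proposed above;
'every ::1' skips the header line):

  set terminal dumb size 72,20
  set datafile separator ','
  plot 'util.csv' every ::1 using 1:3 with steps title 'util_avg'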



* Re: [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases
  2019-10-14 14:52           ` Peter Zijlstra
  2019-10-14 14:57             ` Vincent Guittot
  2019-10-14 16:16             ` Douglas Raillard
@ 2019-10-21  6:19             ` Patrick Bellasi
  2 siblings, 0 replies; 15+ messages in thread
From: Patrick Bellasi @ 2019-10-21  6:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Patrick Bellasi, Vincent Guittot, linux-kernel, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Douglas Raillard,
	Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli

Hi Peter,

On 14-Oct 16:52, Peter Zijlstra wrote:
> 
> The energy aware schedutil patches reminded me this was still pending.
> 
> On Fri, Aug 02, 2019 at 10:47:25AM +0100, Patrick Bellasi wrote:
> > Hi Peter, Vincent,
> > is there anything different I can do on this?
> 
> I think both Vincent and I are basically fine with the patch; it was
> the Changelog/explanation for it that sat uneasy.
> 
> Specifically I think the 'confusion' around the PELT invariance stuff
> doesn't help.
> 
> I think that if you present it simply as making util_est directly follow
> upward motion and only decay on downward -- and the rationale for it --
> then it should be fine.

Ok, I'll update the commit message to remove the PELT-related
ambiguity and post a new version soon.

Cheers,
Patrick

-- 
#include <best/regards.h>

Patrick Bellasi

