* [PATCH] sched/pelt: Fix task util_est update filtering
@ 2021-02-16 16:39 vincent.donnefort
  2021-02-19 10:19 ` Dietmar Eggemann
  2021-02-19 10:48 ` Vincent Guittot
  0 siblings, 2 replies; 7+ messages in thread
From: vincent.donnefort @ 2021-02-16 16:39 UTC (permalink / raw)
  To: peterz, tglx, vincent.guittot
  Cc: dietmar.eggemann, linux-kernel, patrick.bellasi,
	valentin.schneider, Vincent Donnefort

From: Vincent Donnefort <vincent.donnefort@arm.com>

Being called for each dequeue, util_est reduces the number of its updates
by filtering out the cases where the EWMA signal differs from the task
util_avg by less than 1%. This is a problem for a sudden util_avg ramp-up:
due to the decay from a previous high util_avg, the EWMA might now be
close enough to the new util_avg that no update happens, leaving
ue.enqueued with an out-of-date value.

Taking both util_est members, EWMA and enqueued, into consideration for
the filtering ensures that both values stay up to date.

This is for now an issue only for the trace probe that might return the
stale value. Functional-wise, it isn't (yet) a problem, as the value is
always accessed through max(enqueued, ewma).

This problem has been observed using LISA's UtilConvergence:test_means on
the sd845c board.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 794c2cb945f8..9008e0c42def 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3941,24 +3941,27 @@ static inline void util_est_dequeue(struct cfs_rq *cfs_rq,
 	trace_sched_util_est_cfs_tp(cfs_rq);
 }
 
+#define UTIL_EST_MARGIN (SCHED_CAPACITY_SCALE / 100)
+
 /*
- * Check if a (signed) value is within a specified (unsigned) margin,
+ * Check if a (signed) value is within the (unsigned) util_est margin,
  * based on the observation that:
  *
  *     abs(x) < y := (unsigned)(x + y - 1) < (2 * y - 1)
  *
- * NOTE: this only works when value + maring < INT_MAX.
+ * NOTE: this only works when value + UTIL_EST_MARGIN < INT_MAX.
  */
-static inline bool within_margin(int value, int margin)
+static inline bool util_est_within_margin(int value)
 {
-	return ((unsigned int)(value + margin - 1) < (2 * margin - 1));
+	return ((unsigned int)(value + UTIL_EST_MARGIN - 1) <
+		(2 * UTIL_EST_MARGIN - 1));
 }
 
 static inline void util_est_update(struct cfs_rq *cfs_rq,
 				   struct task_struct *p,
 				   bool task_sleep)
 {
-	long last_ewma_diff;
+	long last_ewma_diff, last_enqueued_diff;
 	struct util_est ue;
 
 	if (!sched_feat(UTIL_EST))
@@ -3979,6 +3982,8 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
 	if (ue.enqueued & UTIL_AVG_UNCHANGED)
 		return;
 
+	last_enqueued_diff = ue.enqueued;
+
 	/*
 	 * Reset EWMA on utilization increases, the moving average is used only
 	 * to smooth utilization decreases.
@@ -3992,12 +3997,19 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
 	}
 
 	/*
-	 * Skip update of task's estimated utilization when its EWMA is
+	 * Skip update of task's estimated utilization when its members are
 	 * already ~1% close to its last activation value.
 	 */
 	last_ewma_diff = ue.enqueued - ue.ewma;
-	if (within_margin(last_ewma_diff, (SCHED_CAPACITY_SCALE / 100)))
+	last_enqueued_diff -= ue.enqueued;
+	if (util_est_within_margin(last_ewma_diff)) {
+		if (!util_est_within_margin(last_enqueued_diff)) {
+			ue.ewma = ue.enqueued;
+			goto done;
+		}
+
 		return;
+	}
 
 	/*
 	 * To avoid overestimation of actual task utilization, skip updates if
-- 
2.25.1
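
A stand-alone user-space sketch of the margin check above, for reference:
it only illustrates that, with the UTIL_EST_MARGIN value from this patch,
the unsigned comparison behaves like abs(x) < margin. margin_trick() and
margin_naive() are made-up helper names for the illustration, not kernel
code.

/*
 * Stand-alone illustration (not kernel code) of the within_margin()
 * trick: for a positive margin y, the unsigned comparison
 * (unsigned)(x + y - 1) < (2 * y - 1) is equivalent to abs(x) < y,
 * as long as x + y does not overflow.
 */
#include <stdio.h>
#include <stdlib.h>

#define UTIL_EST_MARGIN (1024 / 100)    /* SCHED_CAPACITY_SCALE / 100 */

static int margin_trick(int value)
{
        return (unsigned int)(value + UTIL_EST_MARGIN - 1) <
               (unsigned int)(2 * UTIL_EST_MARGIN - 1);
}

static int margin_naive(int value)
{
        return abs(value) < UTIL_EST_MARGIN;
}

int main(void)
{
        int x;

        /* Check the equivalence over a range around the margin. */
        for (x = -3 * UTIL_EST_MARGIN; x <= 3 * UTIL_EST_MARGIN; x++) {
                if (margin_trick(x) != margin_naive(x)) {
                        printf("mismatch at %d\n", x);
                        return 1;
                }
        }
        printf("trick matches abs(x) < %d over the tested range\n",
               UTIL_EST_MARGIN);
        return 0;
}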



* Re: [PATCH] sched/pelt: Fix task util_est update filtering
  2021-02-16 16:39 [PATCH] sched/pelt: Fix task util_est update filtering vincent.donnefort
@ 2021-02-19 10:19 ` Dietmar Eggemann
  2021-02-22  9:30   ` Vincent Donnefort
  2021-02-19 10:48 ` Vincent Guittot
  1 sibling, 1 reply; 7+ messages in thread
From: Dietmar Eggemann @ 2021-02-19 10:19 UTC (permalink / raw)
  To: vincent.donnefort, peterz, tglx, vincent.guittot
  Cc: linux-kernel, patrick.bellasi, valentin.schneider

On 16/02/2021 17:39, vincent.donnefort@arm.com wrote:
> From: Vincent Donnefort <vincent.donnefort@arm.com>
> 
> Being called for each dequeue, util_est reduces the number of its updates
> by filtering out when the EWMA signal is different from the task util_avg
> by less than 1%. It is a problem for a sudden util_avg ramp-up. Due to the
> decay from a previous high util_avg, EWMA might now be close enough to
> the new util_avg. No update would then happen while it would leave
> ue.enqueued with an out-of-date value.

(1) enqueued[x-1] < ewma[x-1]

(2) diff(enqueued[x], ewma[x]) < 1024/100 && enqueued[x] < ewma[x] (*)

with ewma[x-1] == ewma[x]

(*) enqueued[x] must still be less than ewma[x] w/ default
UTIL_EST_FASTUP. Otherwise we would already 'goto done' (write the new
util_est) via the previous if condition.
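
To make this concrete, here is a small stand-alone sketch with assumed
numbers (300, 520 and 515 are made up to satisfy the conditions above,
not taken from a trace) showing how the old filter keeps the stale
ue.enqueued while the additional enqueued check would force a write:

#include <stdio.h>

#define UTIL_EST_MARGIN (1024 / 100)

static int within_margin(int value, int margin)
{
        return (unsigned int)(value + margin - 1) <
               (unsigned int)(2 * margin - 1);
}

int main(void)
{
        /* Assumed example values matching the conditions above. */
        int stored_enqueued = 300;  /* stale ue.enqueued from activation x-1 */
        int stored_ewma     = 520;  /* ue.ewma, unchanged: ewma[x-1] == ewma[x] */
        int new_enqueued    = 515;  /* task util_avg at dequeue x, still < ewma */

        int last_ewma_diff     = new_enqueued - stored_ewma;
        int last_enqueued_diff = stored_enqueued - new_enqueued;

        /* Old filter: only the EWMA distance is looked at. */
        if (within_margin(last_ewma_diff, UTIL_EST_MARGIN))
                printf("old filter: skip, ue.enqueued stays at %d\n",
                       stored_enqueued);

        /* New filter: the large enqueued distance still forces an update. */
        if (within_margin(last_ewma_diff, UTIL_EST_MARGIN) &&
            !within_margin(last_enqueued_diff, UTIL_EST_MARGIN))
                printf("new filter: write ue.enqueued = %d\n", new_enqueued);

        return 0;
}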

> 
> Taking into consideration the two util_est members, EWMA and enqueued for
> the filtering, ensures, for both, an up-to-date value.
> 
> This is for now an issue only for the trace probe that might return the
> stale value. Functional-wise, it isn't (yet) a problem, as the value is
> always accessed through max(enqueued, ewma).

Yeah, I remember that the ue.enqueued plots looked weird in these
sections with stale ue.enqueued values.

> This problem has been observed using LISA's UtilConvergence:test_means on
> the sd845c board.

I ran the test a couple of times on my juno board and I never hit this
path (util_est_within_margin(last_ewma_diff) &&
!util_est_within_margin(last_enqueued_diff)) for a test task.

I can't see how this issue can be board specific? Does it happen
reliably on sd845c or is it just that it happens very, very occasionally?

I saw it a couple of times but always with a (non-test) task migrating
from one CPU to another.

> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

[...]


* Re: [PATCH] sched/pelt: Fix task util_est update filtering
  2021-02-16 16:39 [PATCH] sched/pelt: Fix task util_est update filtering vincent.donnefort
  2021-02-19 10:19 ` Dietmar Eggemann
@ 2021-02-19 10:48 ` Vincent Guittot
  2021-02-22  9:24   ` Vincent Donnefort
  1 sibling, 1 reply; 7+ messages in thread
From: Vincent Guittot @ 2021-02-19 10:48 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Peter Zijlstra, Thomas Gleixner, Dietmar Eggemann, linux-kernel,
	Patrick Bellasi, Valentin Schneider

On Tue, 16 Feb 2021 at 17:39, <vincent.donnefort@arm.com> wrote:
>
> From: Vincent Donnefort <vincent.donnefort@arm.com>
>
> Being called for each dequeue, util_est reduces the number of its updates
> by filtering out when the EWMA signal is different from the task util_avg
> by less than 1%. It is a problem for a sudden util_avg ramp-up. Due to the
> decay from a previous high util_avg, EWMA might now be close enough to
> the new util_avg. No update would then happen while it would leave
> ue.enqueued with an out-of-date value.
>
> Taking into consideration the two util_est members, EWMA and enqueued for
> the filtering, ensures, for both, an up-to-date value.
>
> This is for now an issue only for the trace probe that might return the
> stale value. Functional-wise, it isn't (yet) a problem, as the value is

What do you mean by "it isn't (yet) a problem" ? How could this become
a problem ?

> always accessed through max(enqueued, ewma).
>

This adds more tests and/or updates of struct avg.util_est. It would
be good to have an idea of the perf impact, especially because this
only fixes a tracing problem.


> This problem has been observed using LISA's UtilConvergence:test_means on
> the sd845c board.
>
> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 794c2cb945f8..9008e0c42def 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3941,24 +3941,27 @@ static inline void util_est_dequeue(struct cfs_rq *cfs_rq,
>         trace_sched_util_est_cfs_tp(cfs_rq);
>  }
>
> +#define UTIL_EST_MARGIN (SCHED_CAPACITY_SCALE / 100)
> +
>  /*
> - * Check if a (signed) value is within a specified (unsigned) margin,
> + * Check if a (signed) value is within the (unsigned) util_est margin,
>   * based on the observation that:
>   *
>   *     abs(x) < y := (unsigned)(x + y - 1) < (2 * y - 1)
>   *
> - * NOTE: this only works when value + maring < INT_MAX.
> + * NOTE: this only works when value + UTIL_EST_MARGIN < INT_MAX.
>   */
> -static inline bool within_margin(int value, int margin)
> +static inline bool util_est_within_margin(int value)
>  {
> -       return ((unsigned int)(value + margin - 1) < (2 * margin - 1));
> +       return ((unsigned int)(value + UTIL_EST_MARGIN - 1) <
> +               (2 * UTIL_EST_MARGIN - 1));
>  }
>
>  static inline void util_est_update(struct cfs_rq *cfs_rq,
>                                    struct task_struct *p,
>                                    bool task_sleep)
>  {
> -       long last_ewma_diff;
> +       long last_ewma_diff, last_enqueued_diff;
>         struct util_est ue;
>
>         if (!sched_feat(UTIL_EST))
> @@ -3979,6 +3982,8 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
>         if (ue.enqueued & UTIL_AVG_UNCHANGED)
>                 return;
>
> +       last_enqueued_diff = ue.enqueued;
> +
>         /*
>          * Reset EWMA on utilization increases, the moving average is used only
>          * to smooth utilization decreases.
> @@ -3992,12 +3997,19 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
>         }
>
>         /*
> -        * Skip update of task's estimated utilization when its EWMA is
> +        * Skip update of task's estimated utilization when its members are
>          * already ~1% close to its last activation value.
>          */
>         last_ewma_diff = ue.enqueued - ue.ewma;
> -       if (within_margin(last_ewma_diff, (SCHED_CAPACITY_SCALE / 100)))
> +       last_enqueued_diff -= ue.enqueued;
> +       if (util_est_within_margin(last_ewma_diff)) {
> +               if (!util_est_within_margin(last_enqueued_diff)) {
> +                       ue.ewma = ue.enqueued;
> +                       goto done;
> +               }
> +
>                 return;
> +       }
>
>         /*
>          * To avoid overestimation of actual task utilization, skip updates if
> --
> 2.25.1
>


* Re: [PATCH] sched/pelt: Fix task util_est update filtering
  2021-02-19 10:48 ` Vincent Guittot
@ 2021-02-22  9:24   ` Vincent Donnefort
  2021-02-25 15:26     ` Vincent Guittot
  0 siblings, 1 reply; 7+ messages in thread
From: Vincent Donnefort @ 2021-02-22  9:24 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, Dietmar Eggemann, linux-kernel, Patrick Bellasi,
	Valentin Schneider

On Fri, Feb 19, 2021 at 11:48:28AM +0100, Vincent Guittot wrote:
> On Tue, 16 Feb 2021 at 17:39, <vincent.donnefort@arm.com> wrote:
> >
> > From: Vincent Donnefort <vincent.donnefort@arm.com>
> >
> > Being called for each dequeue, util_est reduces the number of its updates
> > by filtering out when the EWMA signal is different from the task util_avg
> > by less than 1%. It is a problem for a sudden util_avg ramp-up. Due to the
> > decay from a previous high util_avg, EWMA might now be close enough to
> > the new util_avg. No update would then happen while it would leave
> > ue.enqueued with an out-of-date value.
> >
> > Taking into consideration the two util_est members, EWMA and enqueued for
> > the filtering, ensures, for both, an up-to-date value.
> >
> > This is for now an issue only for the trace probe that might return the
> > stale value. Functional-wise, it isn't (yet) a problem, as the value is
> 
> What do you mean by "it isn't (yet) a problem" ? How could this become
> a problem ?

I wrote "yet" as nothing prevents anyone from using the ue.enqueued signal.

> 
> > always accessed through max(enqueued, ewma).
> >
> 
> This adds more tests and or update of  struct avg.util_est. It would
> be good to have an idea of the perf impact. Especially because this
> only fixes a tracing problem

I ran hackbench on the big cores of a SD845C board. After 100 iterations of
100-loop runs, the geometric mean of the hackbench test is 0.1% lower
with this patch applied (2.0833s vs 2.0858s). The p-value, computed with
ks_2samp [1], is 0.37, so we can't conclude that the two distributions are
different. In this scenario, the patch seems completely harmless.

Shall I include those results in the commit message?

[1] https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ks_2samp.html

> 
> 
> > This problem has been observed using LISA's UtilConvergence:test_means on
> > the sd845c board.
> >
> > Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 794c2cb945f8..9008e0c42def 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -3941,24 +3941,27 @@ static inline void util_est_dequeue(struct cfs_rq *cfs_rq,
> >         trace_sched_util_est_cfs_tp(cfs_rq);
> >  }
> >
> > +#define UTIL_EST_MARGIN (SCHED_CAPACITY_SCALE / 100)
> > +
> >  /*
> > - * Check if a (signed) value is within a specified (unsigned) margin,
> > + * Check if a (signed) value is within the (unsigned) util_est margin,
> >   * based on the observation that:
> >   *
> >   *     abs(x) < y := (unsigned)(x + y - 1) < (2 * y - 1)
> >   *
> > - * NOTE: this only works when value + maring < INT_MAX.
> > + * NOTE: this only works when value + UTIL_EST_MARGIN < INT_MAX.
> >   */
> > -static inline bool within_margin(int value, int margin)
> > +static inline bool util_est_within_margin(int value)
> >  {
> > -       return ((unsigned int)(value + margin - 1) < (2 * margin - 1));
> > +       return ((unsigned int)(value + UTIL_EST_MARGIN - 1) <
> > +               (2 * UTIL_EST_MARGIN - 1));
> >  }
> >
> >  static inline void util_est_update(struct cfs_rq *cfs_rq,
> >                                    struct task_struct *p,
> >                                    bool task_sleep)
> >  {
> > -       long last_ewma_diff;
> > +       long last_ewma_diff, last_enqueued_diff;
> >         struct util_est ue;
> >
> >         if (!sched_feat(UTIL_EST))
> > @@ -3979,6 +3982,8 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
> >         if (ue.enqueued & UTIL_AVG_UNCHANGED)
> >                 return;
> >
> > +       last_enqueued_diff = ue.enqueued;
> > +
> >         /*
> >          * Reset EWMA on utilization increases, the moving average is used only
> >          * to smooth utilization decreases.
> > @@ -3992,12 +3997,19 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
> >         }
> >
> >         /*
> > -        * Skip update of task's estimated utilization when its EWMA is
> > +        * Skip update of task's estimated utilization when its members are
> >          * already ~1% close to its last activation value.
> >          */
> >         last_ewma_diff = ue.enqueued - ue.ewma;
> > -       if (within_margin(last_ewma_diff, (SCHED_CAPACITY_SCALE / 100)))
> > +       last_enqueued_diff -= ue.enqueued;
> > +       if (util_est_within_margin(last_ewma_diff)) {
> > +               if (!util_est_within_margin(last_enqueued_diff)) {
> > +                       ue.ewma = ue.enqueued;
> > +                       goto done;
> > +               }
> > +
> >                 return;
> > +       }
> >
> >         /*
> >          * To avoid overestimation of actual task utilization, skip updates if
> > --
> > 2.25.1
> >


* Re: [PATCH] sched/pelt: Fix task util_est update filtering
  2021-02-19 10:19 ` Dietmar Eggemann
@ 2021-02-22  9:30   ` Vincent Donnefort
  0 siblings, 0 replies; 7+ messages in thread
From: Vincent Donnefort @ 2021-02-22  9:30 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: peterz, vincent.guittot, linux-kernel, patrick.bellasi,
	valentin.schneider

On Fri, Feb 19, 2021 at 11:19:05AM +0100, Dietmar Eggemann wrote:
> On 16/02/2021 17:39, vincent.donnefort@arm.com wrote:
> > From: Vincent Donnefort <vincent.donnefort@arm.com>
> > 
> > Being called for each dequeue, util_est reduces the number of its updates
> > by filtering out when the EWMA signal is different from the task util_avg
> > by less than 1%. It is a problem for a sudden util_avg ramp-up. Due to the
> > decay from a previous high util_avg, EWMA might now be close enough to
> > the new util_avg. No update would then happen while it would leave
> > ue.enqueued with an out-of-date value.
> 
> (1) enqueued[x-1] < ewma[x-1]
> 
> (2) diff(enqueued[x], ewma[x]) < 1024/100 && enqueued[x] < ewma[x] (*)
> 
> with ewma[x-1] == ewma[x]
> 
> (*) enqueued[x] must still be less than ewma[x] w/ default
> UTIL_EST_FASTUP. Otherwise we would already 'goto done' (write the new
> util_est) via the previous if condition.
> 
> > 
> > Taking into consideration the two util_est members, EWMA and enqueued for
> > the filtering, ensures, for both, an up-to-date value.
> > 
> > This is for now an issue only for the trace probe that might return the
> > stale value. Functional-wise, it isn't (yet) a problem, as the value is
> > always accessed through max(enqueued, ewma).
> 
> Yeah, I remember that the ue.enqueued plots looked weird in these
> sections with stale ue.enqueued values.
> 
> > This problem has been observed using LISA's UtilConvergence:test_means on
> > the sd845c board.
> 
> I ran the test a couple of times on my juno board and I never hit this
> path (util_est_within_margin(last_ewma_diff) &&
> !util_est_within_margin(last_enqueued_diff)) for a test task.
> 
> I can't see how this issue can be board specific? Does it happen
> reliably on sd845c or is it just that it happens very, very occasionally?

This is indeed not board specific. It just happened to be observed on that
one. And even then, it happens every once in a while.

> 
> I saw it a couple of times but always with a (non-test) tasks migrating
> from one CPU to another.
> 
> > Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> 
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

Thanks!

> 
> [...]


* Re: [PATCH] sched/pelt: Fix task util_est update filtering
  2021-02-22  9:24   ` Vincent Donnefort
@ 2021-02-25 15:26     ` Vincent Guittot
  2021-02-25 16:07       ` Vincent Donnefort
  0 siblings, 1 reply; 7+ messages in thread
From: Vincent Guittot @ 2021-02-25 15:26 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Peter Zijlstra, Dietmar Eggemann, linux-kernel, Patrick Bellasi,
	Valentin Schneider

On Mon, 22 Feb 2021 at 10:24, Vincent Donnefort
<vincent.donnefort@arm.com> wrote:
>
> On Fri, Feb 19, 2021 at 11:48:28AM +0100, Vincent Guittot wrote:
> > On Tue, 16 Feb 2021 at 17:39, <vincent.donnefort@arm.com> wrote:
> > >
> > > From: Vincent Donnefort <vincent.donnefort@arm.com>
> > >
> > > Being called for each dequeue, util_est reduces the number of its updates
> > > by filtering out when the EWMA signal is different from the task util_avg
> > > by less than 1%. It is a problem for a sudden util_avg ramp-up. Due to the
> > > decay from a previous high util_avg, EWMA might now be close enough to
> > > the new util_avg. No update would then happen while it would leave
> > > ue.enqueued with an out-of-date value.
> > >
> > > Taking into consideration the two util_est members, EWMA and enqueued for
> > > the filtering, ensures, for both, an up-to-date value.
> > >
> > > This is for now an issue only for the trace probe that might return the
> > > stale value. Functional-wise, it isn't (yet) a problem, as the value is
> >
> > What do you mean by "it isn't (yet) a problem" ? How could this become
> > a problem ?
>
> I wrote "yet" as nothing prevents anyone from using the ue.enqueued signal.

Hmm... you are not supposed to use it outside the helper functions, so
this is irrelevant IMO, which means that only the trace probe is
impacted.

>
> >
> > > always accessed through max(enqueued, ewma).
> > >
> >
> > This adds more tests and or update of  struct avg.util_est. It would
> > be good to have an idea of the perf impact. Especially because this
> > only fixes a tracing problem
>
> I ran hackbench on the big cores of a SD845C board. After 100 iterations of
> 100 loops runs, the geometric mean of the hackbench test is 0.1% lower
> with this patch applied (2.0833s vs 2.0858s). The p-value, computed with
> the ks_2samp [1] is 0.37. We can't conclude that the two distributions are
> different. This patch, in this scenario seems completely harmless.

For this kind of change, perf bench sched pipe is better at highlighting
any perf regression. I have done a quick test and I haven't seen a
noticeable difference.

>
> Shall I include those results in the commit message?
>
> [1] https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ks_2samp.html
>
> >
> >
> > > This problem has been observed using LISA's UtilConvergence:test_means on
> > > the sd845c board.
> > >
> > > Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 794c2cb945f8..9008e0c42def 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -3941,24 +3941,27 @@ static inline void util_est_dequeue(struct cfs_rq *cfs_rq,
> > >         trace_sched_util_est_cfs_tp(cfs_rq);
> > >  }
> > >
> > > +#define UTIL_EST_MARGIN (SCHED_CAPACITY_SCALE / 100)
> > > +
> > >  /*
> > > - * Check if a (signed) value is within a specified (unsigned) margin,
> > > + * Check if a (signed) value is within the (unsigned) util_est margin,
> > >   * based on the observation that:
> > >   *
> > >   *     abs(x) < y := (unsigned)(x + y - 1) < (2 * y - 1)
> > >   *
> > > - * NOTE: this only works when value + maring < INT_MAX.
> > > + * NOTE: this only works when value + UTIL_EST_MARGIN < INT_MAX.
> > >   */
> > > -static inline bool within_margin(int value, int margin)
> > > +static inline bool util_est_within_margin(int value)
> > >  {
> > > -       return ((unsigned int)(value + margin - 1) < (2 * margin - 1));
> > > +       return ((unsigned int)(value + UTIL_EST_MARGIN - 1) <
> > > +               (2 * UTIL_EST_MARGIN - 1));
> > >  }
> > >
> > >  static inline void util_est_update(struct cfs_rq *cfs_rq,
> > >                                    struct task_struct *p,
> > >                                    bool task_sleep)
> > >  {
> > > -       long last_ewma_diff;
> > > +       long last_ewma_diff, last_enqueued_diff;
> > >         struct util_est ue;
> > >
> > >         if (!sched_feat(UTIL_EST))
> > > @@ -3979,6 +3982,8 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
> > >         if (ue.enqueued & UTIL_AVG_UNCHANGED)
> > >                 return;
> > >
> > > +       last_enqueued_diff = ue.enqueued;
> > > +
> > >         /*
> > >          * Reset EWMA on utilization increases, the moving average is used only
> > >          * to smooth utilization decreases.
> > > @@ -3992,12 +3997,19 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
> > >         }
> > >
> > >         /*
> > > -        * Skip update of task's estimated utilization when its EWMA is
> > > +        * Skip update of task's estimated utilization when its members are
> > >          * already ~1% close to its last activation value.
> > >          */
> > >         last_ewma_diff = ue.enqueued - ue.ewma;
> > > -       if (within_margin(last_ewma_diff, (SCHED_CAPACITY_SCALE / 100)))
> > > +       last_enqueued_diff -= ue.enqueued;
> > > +       if (util_est_within_margin(last_ewma_diff)) {
> > > +               if (!util_est_within_margin(last_enqueued_diff)) {
> > > +                       ue.ewma = ue.enqueued;

why do you set ewma directly with latest enqueued value ?

> > > +                       goto done;
> > > +               }
> > > +
> > >                 return;
> > > +       }
> > >
> > >         /*
> > >          * To avoid overestimation of actual task utilization, skip updates if
> > > --
> > > 2.25.1
> > >


* Re: [PATCH] sched/pelt: Fix task util_est update filtering
  2021-02-25 15:26     ` Vincent Guittot
@ 2021-02-25 16:07       ` Vincent Donnefort
  0 siblings, 0 replies; 7+ messages in thread
From: Vincent Donnefort @ 2021-02-25 16:07 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, Dietmar Eggemann, linux-kernel, Patrick Bellasi,
	Valentin Schneider

On Thu, Feb 25, 2021 at 04:26:50PM +0100, Vincent Guittot wrote:
> On Mon, 22 Feb 2021 at 10:24, Vincent Donnefort
> <vincent.donnefort@arm.com> wrote:
> >
> > On Fri, Feb 19, 2021 at 11:48:28AM +0100, Vincent Guittot wrote:
> > > On Tue, 16 Feb 2021 at 17:39, <vincent.donnefort@arm.com> wrote:
> > > >
> > > > From: Vincent Donnefort <vincent.donnefort@arm.com>
> > > >
> > > > Being called for each dequeue, util_est reduces the number of its updates
> > > > by filtering out when the EWMA signal is different from the task util_avg
> > > > by less than 1%. It is a problem for a sudden util_avg ramp-up. Due to the
> > > > decay from a previous high util_avg, EWMA might now be close enough to
> > > > the new util_avg. No update would then happen while it would leave
> > > > ue.enqueued with an out-of-date value.
> > > >
> > > > Taking into consideration the two util_est members, EWMA and enqueued for
> > > > the filtering, ensures, for both, an up-to-date value.
> > > >
> > > > This is for now an issue only for the trace probe that might return the
> > > > stale value. Functional-wise, it isn't (yet) a problem, as the value is
> > >
> > > What do you mean by "it isn't (yet) a problem" ? How could this become
> > > a problem ?
> >
> > I wrote "yet" as nothing prevents anyone from using the ue.enqueued signal.
> 
> Hmm.. you are not supposed to use it outside the helper functions so
> this is irrelevant IMO which means that only the trace probe is
> impacted

I'll remove it.

> 
> >
> > >
> > > > always accessed through max(enqueued, ewma).
> > > >
> > >
> > > This adds more tests and or update of  struct avg.util_est. It would
> > > be good to have an idea of the perf impact. Especially because this
> > > only fixes a tracing problem
> >
> > I ran hackbench on the big cores of a SD845C board. After 100 iterations of
> > 100 loops runs, the geometric mean of the hackbench test is 0.1% lower
> > with this patch applied (2.0833s vs 2.0858s). The p-value, computed with
> > the ks_2samp [1] is 0.37. We can't conclude that the two distributions are
> > different. This patch, in this scenario seems completely harmless.
> 
> For such kind of change,  perf bench sched pipe is better to highlight
> any perf regression. I have done a quick test and i haven't seen
> noticeable difference

Thanks. I'll add your results to the commit message.

> 
> >
> > Shall I include those results in the commit message?
> >
> > [1] https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ks_2samp.html
> >
> > >
> > >
> > > > This problem has been observed using LISA's UtilConvergence:test_means on
> > > > the sd845c board.
> > > >
> > > > Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> > > >
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index 794c2cb945f8..9008e0c42def 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -3941,24 +3941,27 @@ static inline void util_est_dequeue(struct cfs_rq *cfs_rq,
> > > >         trace_sched_util_est_cfs_tp(cfs_rq);
> > > >  }
> > > >
> > > > +#define UTIL_EST_MARGIN (SCHED_CAPACITY_SCALE / 100)
> > > > +
> > > >  /*
> > > > - * Check if a (signed) value is within a specified (unsigned) margin,
> > > > + * Check if a (signed) value is within the (unsigned) util_est margin,
> > > >   * based on the observation that:
> > > >   *
> > > >   *     abs(x) < y := (unsigned)(x + y - 1) < (2 * y - 1)
> > > >   *
> > > > - * NOTE: this only works when value + maring < INT_MAX.
> > > > + * NOTE: this only works when value + UTIL_EST_MARGIN < INT_MAX.
> > > >   */
> > > > -static inline bool within_margin(int value, int margin)
> > > > +static inline bool util_est_within_margin(int value)
> > > >  {
> > > > -       return ((unsigned int)(value + margin - 1) < (2 * margin - 1));
> > > > +       return ((unsigned int)(value + UTIL_EST_MARGIN - 1) <
> > > > +               (2 * UTIL_EST_MARGIN - 1));
> > > >  }
> > > >
> > > >  static inline void util_est_update(struct cfs_rq *cfs_rq,
> > > >                                    struct task_struct *p,
> > > >                                    bool task_sleep)
> > > >  {
> > > > -       long last_ewma_diff;
> > > > +       long last_ewma_diff, last_enqueued_diff;
> > > >         struct util_est ue;
> > > >
> > > >         if (!sched_feat(UTIL_EST))
> > > > @@ -3979,6 +3982,8 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
> > > >         if (ue.enqueued & UTIL_AVG_UNCHANGED)
> > > >                 return;
> > > >
> > > > +       last_enqueued_diff = ue.enqueued;
> > > > +
> > > >         /*
> > > >          * Reset EWMA on utilization increases, the moving average is used only
> > > >          * to smooth utilization decreases.
> > > > @@ -3992,12 +3997,19 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
> > > >         }
> > > >
> > > >         /*
> > > > -        * Skip update of task's estimated utilization when its EWMA is
> > > > +        * Skip update of task's estimated utilization when its members are
> > > >          * already ~1% close to its last activation value.
> > > >          */
> > > >         last_ewma_diff = ue.enqueued - ue.ewma;
> > > > -       if (within_margin(last_ewma_diff, (SCHED_CAPACITY_SCALE / 100)))
> > > > +       last_enqueued_diff -= ue.enqueued;
> > > > +       if (util_est_within_margin(last_ewma_diff)) {
> > > > +               if (!util_est_within_margin(last_enqueued_diff)) {
> > > > +                       ue.ewma = ue.enqueued;
> 
> why do you set ewma directly with latest enqueued value ?

The idea was to align both ewma and enqueued, as the diff is < 1% anyway.

I'll remove that in v2.
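
For reference, that branch of util_est_update() could then look roughly
like the fragment below once the EWMA alignment is dropped. This is only
a sketch of a possible v2 (an assumption, not posted code), keeping the
extra enqueued check but leaving ue.ewma untouched:

        last_ewma_diff = ue.enqueued - ue.ewma;
        last_enqueued_diff -= ue.enqueued;
        if (util_est_within_margin(last_ewma_diff)) {
                /* Stale enqueued value: write it back without touching ewma. */
                if (!util_est_within_margin(last_enqueued_diff))
                        goto done;

                return;
        }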


> 
> > > > +                       goto done;
> > > > +               }
> > > > +
> > > >                 return;
> > > > +       }
> > > >
> > > >         /*
> > > >          * To avoid overestimation of actual task utilization, skip updates if
> > > > --
> > > > 2.25.1
> > > >
