* [PATCH 3 1/4] sched/pelt: Relax the sync of util_sum with util_avg
2022-01-11 13:46 [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg Vincent Guittot
@ 2022-01-11 13:46 ` Vincent Guittot
2022-01-12 15:26 ` Dietmar Eggemann
2022-01-18 11:18 ` [tip: sched/urgent] " tip-bot2 for Vincent Guittot
2022-01-11 13:46 ` [PATCH v3 2/4] sched/pelt: Continue to relax " Vincent Guittot
` (4 subsequent siblings)
5 siblings, 2 replies; 14+ messages in thread
From: Vincent Guittot @ 2022-01-11 13:46 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, bristot, linux-kernel, rickyiu, odin
Cc: sachinp, naresh.kamboju, Vincent Guittot
Rick reported performance regressions in bugzilla because of cpu frequency
being lower than before:
https://bugzilla.kernel.org/show_bug.cgi?id=215045
He bisected the problem to:
commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
This commit forces util_sum to be synced with the new util_avg after
removing the contribution of a task and before the next periodic sync. By
doing so util_sum is rounded to its lower bound and might lose up to
LOAD_AVG_MAX-1 of accumulated contribution which has not yet been
reflected in util_avg.
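For illustration, here is a standalone sketch, not part of the patch: LOAD_AVG_MAX
and the divider formula are taken from kernel/sched/pelt.h, and the sample values
are made up. It shows how much accumulated contribution the removed sync can drop:

  #include <stdio.h>

  #define LOAD_AVG_MAX	47742		/* max value of the PELT geometric series */

  int main(void)
  {
  	unsigned int period_contrib = 600;	/* hypothetical, always < 1024 */
  	unsigned int divider = LOAD_AVG_MAX - 1024 + period_contrib;

  	/* util_sum carries contribution not yet folded into util_avg */
  	unsigned int util_sum = 25 * divider + 33000;
  	unsigned int util_avg = util_sum / divider;	/* periodic sync: avg = sum / divider */

  	/* the sync added by 1c35b07e6d39: snap util_sum back to its lower bound */
  	unsigned int truncated = util_avg * divider;

  	/* prints 33000: anything up to one divider (~LOAD_AVG_MAX) can be lost */
  	printf("lost contribution: %u\n", util_sum - truncated);
  	return 0;
  }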
Instead of always setting util_sum to the low bound of util_avg, which can
significantly lower the utilization of root cfs_rq after propagating the
change down into the hierarchy, we revert the change of util_sum and
propagate the difference.
In addition, we also check that cfs's util_sum always stays above the
lower bound for a given util_avg as it has been observed that
sched_entity's util_sum is sometimes above cfs one.
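The lower bound itself follows from the divider's range: period_contrib is always
below 1024, so any valid divider lies between LOAD_AVG_MAX - 1024 and
LOAD_AVG_MAX - 1. A minimal sketch of the resulting clamp (MIN_DIVIDER as defined
by this patch; the post-subtraction values are hypothetical):

  #include <stdio.h>

  #define LOAD_AVG_MAX	47742
  #define MIN_DIVIDER	(LOAD_AVG_MAX - 1024)

  int main(void)
  {
  	/* hypothetical values after subtracting a removed task's contribution */
  	unsigned int util_avg = 120;
  	unsigned int util_sum = 5000000;	/* fell below the valid range */

  	/* whatever period_contrib currently is, divider >= MIN_DIVIDER, hence: */
  	if (util_sum < util_avg * MIN_DIVIDER)
  		util_sum = util_avg * MIN_DIVIDER;	/* the max_t() in the patch */

  	printf("util_sum=%u (floor=%u)\n", util_sum, util_avg * MIN_DIVIDER);
  	return 0;
  }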
Reported-by: Rick Yiu <rickyiu@google.com>
Fixes: 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/fair.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 095b0aa378df..ed35255fdb85 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3381,6 +3381,7 @@ void set_task_rq_fair(struct sched_entity *se,
se->avg.last_update_time = n_last_update_time;
}
+#define MIN_DIVIDER (LOAD_AVG_MAX - 1024)
/*
* When on migration a sched_entity joins/leaves the PELT hierarchy, we need to
@@ -3449,7 +3450,6 @@ void set_task_rq_fair(struct sched_entity *se,
* XXX: only do this for the part of runnable > running ?
*
*/
-
static inline void
update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
{
@@ -3681,7 +3681,19 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
r = removed_util;
sub_positive(&sa->util_avg, r);
- sa->util_sum = sa->util_avg * divider;
+ sub_positive(&sa->util_sum, r * divider);
+ /*
+ * Because of rounding, se->util_sum might ends up being +1 more than
+ * cfs->util_sum. Although this is not a problem by itself, detaching
+ * a lot of tasks with the rounding problem between 2 updates of
+ * util_avg (~1ms) can make cfs->util_sum becoming null whereas
+ * cfs_util_avg is not.
+ * Check that util_sum is still above its lower bound for the new
+ * util_avg. Given that period_contrib might have moved since the last
+ * sync, we are only sure that util_sum must be above or equal to
+ * util_avg * minimum possible divider
+ */
+ sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * MIN_DIVIDER);
r = removed_runnable;
sub_positive(&sa->runnable_avg, r);
--
2.17.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 3 1/4] sched/pelt: Relax the sync of util_sum with util_avg
2022-01-11 13:46 ` [PATCH 3 1/4] sched/pelt: Relax the sync of util_sum with util_avg Vincent Guittot
@ 2022-01-12 15:26 ` Dietmar Eggemann
2022-01-12 16:04 ` Vincent Guittot
2022-01-18 11:18 ` [tip: sched/urgent] " tip-bot2 for Vincent Guittot
1 sibling, 1 reply; 14+ messages in thread
From: Dietmar Eggemann @ 2022-01-12 15:26 UTC (permalink / raw)
To: Vincent Guittot, mingo, peterz, juri.lelli, rostedt, bsegall,
mgorman, bristot, linux-kernel, rickyiu, odin
Cc: sachinp, naresh.kamboju
On 11/01/2022 14:46, Vincent Guittot wrote:
> Rick reported performance regressions in bugzilla because of cpu frequency
> being lower than before:
> https://bugzilla.kernel.org/show_bug.cgi?id=215045
>
> He bisected the problem to:
> commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
>
> This commit forces util_sum to be synced with the new util_avg after
> removing the contribution of a task and before the next periodic sync. By
> doing so util_sum is rounded to its lower bound and might lose up to
> LOAD_AVG_MAX-1 of accumulated contribution which has not yet been
> reflected in util_avg.
>
> Instead of always setting util_sum to the low bound of util_avg, which can
> significantly lower the utilization of root cfs_rq after propagating the
> change down into the hierarchy, we revert the change of util_sum and
> propagate the difference.
IMHO, this paragraph does not match the changes in this patch.
> In addition, we also check that cfs's util_sum always stays above the
> lower bound for a given util_avg as it has been observed that
> sched_entity's util_sum is sometimes above cfs one.
And I guess this one also refers to the code change in 2/4, i.e. in
update_tg_cfs_util().
> Reported-by: Rick Yiu <rickyiu@google.com>
> Fixes: 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
> kernel/sched/fair.c | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 095b0aa378df..ed35255fdb85 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3381,6 +3381,7 @@ void set_task_rq_fair(struct sched_entity *se,
> se->avg.last_update_time = n_last_update_time;
> }
>
> +#define MIN_DIVIDER (LOAD_AVG_MAX - 1024)
>
> /*
> * When on migration a sched_entity joins/leaves the PELT hierarchy, we need to
> @@ -3449,7 +3450,6 @@ void set_task_rq_fair(struct sched_entity *se,
> * XXX: only do this for the part of runnable > running ?
> *
> */
> -
> static inline void
> update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
> {
> @@ -3681,7 +3681,19 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
>
> r = removed_util;
> sub_positive(&sa->util_avg, r);
> - sa->util_sum = sa->util_avg * divider;
> + sub_positive(&sa->util_sum, r * divider);
> + /*
> + * Because of rounding, se->util_sum might ends up being +1 more than
There are no se's involved in update_cfs_rq_load_avg(). This could be hard to
grasp for people only looking at this function.
> + * cfs->util_sum. Although this is not a problem by itself, detaching
> + * a lot of tasks with the rounding problem between 2 updates of
> + * util_avg (~1ms) can make cfs->util_sum becoming null whereas
> + * cfs_util_avg is not.
> + * Check that util_sum is still above its lower bound for the new
> + * util_avg. Given that period_contrib might have moved since the last
> + * sync, we are only sure that util_sum must be above or equal to
> + * util_avg * minimum possible divider
^^^
some superfluous whitespaces.
> + */
> + sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * MIN_DIVIDER);
>
> r = removed_runnable;
> sub_positive(&sa->runnable_avg, r);
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3 1/4] sched/pelt: Relax the sync of util_sum with util_avg
2022-01-12 15:26 ` Dietmar Eggemann
@ 2022-01-12 16:04 ` Vincent Guittot
0 siblings, 0 replies; 14+ messages in thread
From: Vincent Guittot @ 2022-01-12 16:04 UTC (permalink / raw)
To: Dietmar Eggemann
Cc: mingo, peterz, juri.lelli, rostedt, bsegall, mgorman, bristot,
linux-kernel, rickyiu, odin, sachinp, naresh.kamboju
On Wed, 12 Jan 2022 at 16:26, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>
> On 11/01/2022 14:46, Vincent Guittot wrote:
> > Rick reported performance regressions in bugzilla because of cpu frequency
> > being lower than before:
> > https://bugzilla.kernel.org/show_bug.cgi?id=215045
> >
> > He bisected the problem to:
> > commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> >
> > This commit forces util_sum to be synced with the new util_avg after
> > removing the contribution of a task and before the next periodic sync. By
> > doing so util_sum is rounded to its lower bound and might lose up to
> > LOAD_AVG_MAX-1 of accumulated contribution which has not yet been
> > reflected in util_avg.
> >
> > Instead of always setting util_sum to the low bound of util_avg, which can
> > significantly lower the utilization of root cfs_rq after propagating the
> > change down into the hierarchy, we revert the change of util_sum and
> > propagate the difference.
>
> IMHO, this paragraph does not match the changes in this patch.
>
> > In addition, we also check that cfs's util_sum always stays above the
> > lower bound for a given util_avg as it has been observed that
> > sched_entity's util_sum is sometimes above cfs one.
>
> And I guess this one also refers to the code change in 2/4, i.e. in
> update_tg_cfs_util().
All 3 places can have the problem, although it's probably more obvious
in detach_entity_load_avg() or update_tg_cfs_util() because the removal
is done synchronously there, unlike here.
> > Reported-by: Rick Yiu <rickyiu@google.com>
> > Fixes: 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > ---
> > kernel/sched/fair.c | 16 ++++++++++++++--
> > 1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 095b0aa378df..ed35255fdb85 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -3381,6 +3381,7 @@ void set_task_rq_fair(struct sched_entity *se,
> > se->avg.last_update_time = n_last_update_time;
> > }
> >
> > +#define MIN_DIVIDER (LOAD_AVG_MAX - 1024)
> >
> > /*
> > * When on migration a sched_entity joins/leaves the PELT hierarchy, we need to
> > @@ -3449,7 +3450,6 @@ void set_task_rq_fair(struct sched_entity *se,
> > * XXX: only do this for the part of runnable > running ?
> > *
> > */
> > -
> > static inline void
> > update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
> > {
> > @@ -3681,7 +3681,19 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
> >
> > r = removed_util;
> > sub_positive(&sa->util_avg, r);
> > - sa->util_sum = sa->util_avg * divider;
> > + sub_positive(&sa->util_sum, r * divider);
> > + /*
> > + * Because of rounding, se->util_sum might ends up being +1 more than
>
> There are no se's involved in update_cfs_rq_load_avg(). Could be hard to
> grasp for people only looking at this function.
I moved the comment here so that it stays in patch 1, but it reflects the
whole problem of rounding when removing some util_avg/util_sum from a
cfs_rq, whereas update_cfs_rq_load_avg() only has part of the problem.
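A hypothetical worked example of that whole problem (standalone, with a simplified
stand-in for sub_positive() and made-up values): a task that ran only briefly can
have util_sum > 0 while its util_avg is still 0, so detaching several such tasks
removes nothing from the cfs_rq's util_avg but keeps draining its util_sum:

  #include <stdio.h>

  #define LOAD_AVG_MAX	47742
  #define MIN_DIVIDER	(LOAD_AVG_MAX - 1024)

  /* simplified stand-in for the kernel's sub_positive() */
  static void sub_positive(unsigned long *x, unsigned long v)
  {
  	*x = (*x > v) ? *x - v : 0;
  }

  int main(void)
  {
  	unsigned long divider = MIN_DIVIDER + 512;		/* some period_contrib */
  	unsigned long cfs_util_avg = 2;				/* from tasks that stay attached */
  	unsigned long cfs_util_sum = cfs_util_avg * divider;	/* already at its low bound */

  	/* detach 3 short-lived tasks: each has util_avg == 0 but util_sum == 40000 */
  	for (int i = 0; i < 3; i++) {
  		sub_positive(&cfs_util_avg, 0);
  		sub_positive(&cfs_util_sum, 40000);
  	}

  	/* cfs_util_sum is now 0 while cfs_util_avg is still 2; the
  	 * max_t(u32, util_sum, util_avg * MIN_DIVIDER) clamp restores the floor */
  	printf("avg=%lu sum=%lu floor=%lu\n", cfs_util_avg, cfs_util_sum,
  	       cfs_util_avg * MIN_DIVIDER);
  	return 0;
  }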
>
> > + * cfs->util_sum. Although this is not a problem by itself, detaching
> > + * a lot of tasks with the rounding problem between 2 updates of
> > + * util_avg (~1ms) can make cfs->util_sum becoming null whereas
> > + * cfs_util_avg is not.
> > + * Check that util_sum is still above its lower bound for the new
> > + * util_avg. Given that period_contrib might have moved since the last
> > + * sync, we are only sure that util_sum must be above or equal to
> > + * util_avg * minimum possible divider
> ^^^
> some superfluous whitepaces.
>
> > + */
> > + sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * MIN_DIVIDER);
> >
> > r = removed_runnable;
> > sub_positive(&sa->runnable_avg, r);
> >
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* [tip: sched/urgent] sched/pelt: Relax the sync of util_sum with util_avg
2022-01-11 13:46 ` [PATCH 3 1/4] sched/pelt: Relax the sync of util_sum with util_avg Vincent Guittot
2022-01-12 15:26 ` Dietmar Eggemann
@ 2022-01-18 11:18 ` tip-bot2 for Vincent Guittot
1 sibling, 0 replies; 14+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2022-01-18 11:18 UTC (permalink / raw)
To: linux-tip-commits
Cc: Rick Yiu, Vincent Guittot, Peter Zijlstra (Intel),
Dietmar Eggemann, Sachin Sant, x86, linux-kernel
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: 98b0d890220d45418cfbc5157b3382e6da5a12ab
Gitweb: https://git.kernel.org/tip/98b0d890220d45418cfbc5157b3382e6da5a12ab
Author: Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate: Tue, 11 Jan 2022 14:46:56 +01:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 18 Jan 2022 12:09:58 +01:00
sched/pelt: Relax the sync of util_sum with util_avg
Rick reported performance regressions in bugzilla because of cpu frequency
being lower than before:
https://bugzilla.kernel.org/show_bug.cgi?id=215045
He bisected the problem to:
commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
This commit forces util_sum to be synced with the new util_avg after
removing the contribution of a task and before the next periodic sync. By
doing so util_sum is rounded to its lower bound and might lose up to
LOAD_AVG_MAX-1 of accumulated contribution which has not yet been
reflected in util_avg.
Instead of always setting util_sum to the low bound of util_avg, which can
significantly lower the utilization of root cfs_rq after propagating the
change down into the hierarchy, we revert the change of util_sum and
propagate the difference.
In addition, we also check that cfs's util_sum always stays above the
lower bound for a given util_avg as it has been observed that
sched_entity's util_sum is sometimes above cfs one.
Fixes: 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
Reported-by: Rick Yiu <rickyiu@google.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-2-vincent.guittot@linaro.org
---
kernel/sched/fair.c | 16 +++++++++++++---
kernel/sched/pelt.h | 4 +++-
2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 095b0aa..d8f068d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3381,7 +3381,6 @@ void set_task_rq_fair(struct sched_entity *se,
se->avg.last_update_time = n_last_update_time;
}
-
/*
* When on migration a sched_entity joins/leaves the PELT hierarchy, we need to
* propagate its contribution. The key to this propagation is the invariant
@@ -3449,7 +3448,6 @@ void set_task_rq_fair(struct sched_entity *se,
* XXX: only do this for the part of runnable > running ?
*
*/
-
static inline void
update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
{
@@ -3681,7 +3679,19 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
r = removed_util;
sub_positive(&sa->util_avg, r);
- sa->util_sum = sa->util_avg * divider;
+ sub_positive(&sa->util_sum, r * divider);
+ /*
+ * Because of rounding, se->util_sum might ends up being +1 more than
+ * cfs->util_sum. Although this is not a problem by itself, detaching
+ * a lot of tasks with the rounding problem between 2 updates of
+ * util_avg (~1ms) can make cfs->util_sum becoming null whereas
+ * cfs_util_avg is not.
+ * Check that util_sum is still above its lower bound for the new
+ * util_avg. Given that period_contrib might have moved since the last
+ * sync, we are only sure that util_sum must be above or equal to
+ * util_avg * minimum possible divider
+ */
+ sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * PELT_MIN_DIVIDER);
r = removed_runnable;
sub_positive(&sa->runnable_avg, r);
diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h
index e06071b..c336f5f 100644
--- a/kernel/sched/pelt.h
+++ b/kernel/sched/pelt.h
@@ -37,9 +37,11 @@ update_irq_load_avg(struct rq *rq, u64 running)
}
#endif
+#define PELT_MIN_DIVIDER (LOAD_AVG_MAX - 1024)
+
static inline u32 get_pelt_divider(struct sched_avg *avg)
{
- return LOAD_AVG_MAX - 1024 + avg->period_contrib;
+ return PELT_MIN_DIVIDER + avg->period_contrib;
}
static inline void cfs_se_util_change(struct sched_avg *avg)
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v3 2/4] sched/pelt: Continue to relax the sync of util_sum with util_avg
2022-01-11 13:46 [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg Vincent Guittot
2022-01-11 13:46 ` [PATCH 3 1/4] sched/pelt: Relax the sync of util_sum with util_avg Vincent Guittot
@ 2022-01-11 13:46 ` Vincent Guittot
2022-01-18 11:18 ` [tip: sched/urgent] " tip-bot2 for Vincent Guittot
2022-01-11 13:46 ` [PATCH v3 3/4] sched/pelt: Relax the sync of runnable_sum with runnable_avg Vincent Guittot
` (3 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Vincent Guittot @ 2022-01-11 13:46 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, bristot, linux-kernel, rickyiu, odin
Cc: sachinp, naresh.kamboju, Vincent Guittot
Rick reported performance regressions in bugzilla because of cpu frequency
being lower than before:
https://bugzilla.kernel.org/show_bug.cgi?id=215045
He bisected the problem to:
commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
This commit forces util_sum to be synced with the new util_avg after
removing the contribution of a task and before the next periodic sync. By
doing so util_sum is rounded to its lower bound and might lose up to
LOAD_AVG_MAX-1 of accumulated contribution which has not yet been
reflected in util_avg.
update_tg_cfs_util() is not the only place where we round util_sum and
lose some accumulated contributions that are not already reflected in
util_avg. Modify update_tg_cfs_util() and detach_entity_load_avg() to not
sync util_sum with the new util_avg. Instead of always setting util_sum to
the low bound of util_avg, which can significantly lower the utilization,
we propagate the difference. In addition, we also check that cfs's util_sum
always stays above the lower bound for a given util_avg as it has been
observed that sched_entity's util_sum is sometimes above cfs one.
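A condensed, standalone sketch of the "propagate the difference" approach; plain
variables stand in for the struct sched_avg fields that update_tg_cfs_util()
really touches, and all numbers are made up:

  #include <stdio.h>

  #define LOAD_AVG_MAX	47742
  #define MIN_DIVIDER	(LOAD_AVG_MAX - 1024)

  int main(void)
  {
  	long divider = MIN_DIVIDER + 300;	/* current period_contrib */

  	/* parent cfs_rq and one of its sched_entities (hypothetical values) */
  	long cfs_util_avg = 50, cfs_util_sum = 50 * divider + 12000;
  	long se_util_avg = 10, se_util_sum = 10 * divider + 7;

  	long gcfs_util_avg = 4;			/* the child group's new utilization */

  	long delta_avg = gcfs_util_avg - se_util_avg;
  	long new_sum   = gcfs_util_avg * divider;
  	long delta_sum = new_sum - se_util_sum;

  	se_util_avg = gcfs_util_avg;
  	se_util_sum = new_sum;

  	/* propagate the deltas instead of resetting the parent sum to avg * divider */
  	cfs_util_avg += delta_avg;
  	cfs_util_sum += delta_sum;
  	if (cfs_util_sum < cfs_util_avg * MIN_DIVIDER)	/* the max_t() clamp */
  		cfs_util_sum = cfs_util_avg * MIN_DIVIDER;

  	/* the ~12000 of not-yet-folded contribution survives instead of being dropped */
  	printf("cfs: avg=%ld sum=%ld\n", cfs_util_avg, cfs_util_sum);
  	return 0;
  }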
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/fair.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ed35255fdb85..3eb73ce6ef13 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3453,11 +3453,11 @@ void set_task_rq_fair(struct sched_entity *se,
static inline void
update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
{
- long delta = gcfs_rq->avg.util_avg - se->avg.util_avg;
- u32 divider;
+ long delta_sum, delta_avg = gcfs_rq->avg.util_avg - se->avg.util_avg;
+ u32 new_sum, divider;
/* Nothing to update */
- if (!delta)
+ if (!delta_avg)
return;
/*
@@ -3466,13 +3466,20 @@ update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
*/
divider = get_pelt_divider(&cfs_rq->avg);
+
/* Set new sched_entity's utilization */
se->avg.util_avg = gcfs_rq->avg.util_avg;
- se->avg.util_sum = se->avg.util_avg * divider;
+ new_sum = se->avg.util_avg * divider;
+ delta_sum = (long)new_sum - (long)se->avg.util_sum;
+ se->avg.util_sum = new_sum;
/* Update parent cfs_rq utilization */
- add_positive(&cfs_rq->avg.util_avg, delta);
- cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;
+ add_positive(&cfs_rq->avg.util_avg, delta_avg);
+ add_positive(&cfs_rq->avg.util_sum, delta_sum);
+
+ /* See update_cfs_rq_load_avg() */
+ cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum,
+ cfs_rq->avg.util_avg * MIN_DIVIDER);
}
static inline void
@@ -3792,7 +3799,11 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
dequeue_load_avg(cfs_rq, se);
sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
- cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;
+ sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
+ /* See update_cfs_rq_load_avg() */
+ cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum,
+ cfs_rq->avg.util_avg * MIN_DIVIDER);
+
sub_positive(&cfs_rq->avg.runnable_avg, se->avg.runnable_avg);
cfs_rq->avg.runnable_sum = cfs_rq->avg.runnable_avg * divider;
--
2.17.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [tip: sched/urgent] sched/pelt: Continue to relax the sync of util_sum with util_avg
2022-01-11 13:46 ` [PATCH v3 2/4] sched/pelt: Continue to relax " Vincent Guittot
@ 2022-01-18 11:18 ` tip-bot2 for Vincent Guittot
0 siblings, 0 replies; 14+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2022-01-18 11:18 UTC (permalink / raw)
To: linux-tip-commits
Cc: Vincent Guittot, Peter Zijlstra (Intel),
Dietmar Eggemann, Sachin Sant, x86, linux-kernel
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: 7ceb77103001544a43e11d7f3a8a69a2c1f422cf
Gitweb: https://git.kernel.org/tip/7ceb77103001544a43e11d7f3a8a69a2c1f422cf
Author: Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate: Tue, 11 Jan 2022 14:46:57 +01:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 18 Jan 2022 12:09:58 +01:00
sched/pelt: Continue to relax the sync of util_sum with util_avg
Rick reported performance regressions in bugzilla because of cpu frequency
being lower than before:
https://bugzilla.kernel.org/show_bug.cgi?id=215045
He bisected the problem to:
commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
This commit forces util_sum to be synced with the new util_avg after
removing the contribution of a task and before the next periodic sync. By
doing so util_sum is rounded to its lower bound and might lose up to
LOAD_AVG_MAX-1 of accumulated contribution which has not yet been
reflected in util_avg.
update_tg_cfs_util() is not the only place where we round util_sum and
lose some accumulated contributions that are not already reflected in
util_avg. Modify update_tg_cfs_util() and detach_entity_load_avg() to not
sync util_sum with the new util_avg. Instead of always setting util_sum to
the low bound of util_avg, which can significantly lower the utilization,
we propagate the difference. In addition, we also check that cfs's util_sum
always stays above the lower bound for a given util_avg as it has been
observed that sched_entity's util_sum is sometimes above cfs one.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-3-vincent.guittot@linaro.org
---
kernel/sched/fair.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d8f068d..ad2809c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3451,11 +3451,11 @@ void set_task_rq_fair(struct sched_entity *se,
static inline void
update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
{
- long delta = gcfs_rq->avg.util_avg - se->avg.util_avg;
- u32 divider;
+ long delta_sum, delta_avg = gcfs_rq->avg.util_avg - se->avg.util_avg;
+ u32 new_sum, divider;
/* Nothing to update */
- if (!delta)
+ if (!delta_avg)
return;
/*
@@ -3464,13 +3464,20 @@ update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
*/
divider = get_pelt_divider(&cfs_rq->avg);
+
/* Set new sched_entity's utilization */
se->avg.util_avg = gcfs_rq->avg.util_avg;
- se->avg.util_sum = se->avg.util_avg * divider;
+ new_sum = se->avg.util_avg * divider;
+ delta_sum = (long)new_sum - (long)se->avg.util_sum;
+ se->avg.util_sum = new_sum;
/* Update parent cfs_rq utilization */
- add_positive(&cfs_rq->avg.util_avg, delta);
- cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;
+ add_positive(&cfs_rq->avg.util_avg, delta_avg);
+ add_positive(&cfs_rq->avg.util_sum, delta_sum);
+
+ /* See update_cfs_rq_load_avg() */
+ cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum,
+ cfs_rq->avg.util_avg * PELT_MIN_DIVIDER);
}
static inline void
@@ -3790,7 +3797,11 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
dequeue_load_avg(cfs_rq, se);
sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
- cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;
+ sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
+ /* See update_cfs_rq_load_avg() */
+ cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum,
+ cfs_rq->avg.util_avg * PELT_MIN_DIVIDER);
+
sub_positive(&cfs_rq->avg.runnable_avg, se->avg.runnable_avg);
cfs_rq->avg.runnable_sum = cfs_rq->avg.runnable_avg * divider;
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v3 3/4] sched/pelt: Relax the sync of runnable_sum with runnable_avg
2022-01-11 13:46 [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg Vincent Guittot
2022-01-11 13:46 ` [PATCH 3 1/4] sched/pelt: Relax the sync of util_sum with util_avg Vincent Guittot
2022-01-11 13:46 ` [PATCH v3 2/4] sched/pelt: Continue to relax " Vincent Guittot
@ 2022-01-11 13:46 ` Vincent Guittot
2022-01-18 11:18 ` [tip: sched/urgent] " tip-bot2 for Vincent Guittot
2022-01-11 13:46 ` [PATCH v3 4/4] sched/pelt: Relax the sync of load_sum with load_avg Vincent Guittot
` (2 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Vincent Guittot @ 2022-01-11 13:46 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, bristot, linux-kernel, rickyiu, odin
Cc: sachinp, naresh.kamboju, Vincent Guittot
Similarly to util_avg and util_sum, don't sync runnable_sum with the low
bound of runnable_avg but only ensure that runnable_sum stays in the
correct range.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/fair.c | 33 +++++++++++++++++++--------------
1 file changed, 19 insertions(+), 14 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3eb73ce6ef13..e34de721a630 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3485,11 +3485,11 @@ update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
static inline void
update_tg_cfs_runnable(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
{
- long delta = gcfs_rq->avg.runnable_avg - se->avg.runnable_avg;
- u32 divider;
+ long delta_sum, delta_avg = gcfs_rq->avg.runnable_avg - se->avg.runnable_avg;
+ u32 new_sum, divider;
/* Nothing to update */
- if (!delta)
+ if (!delta_avg)
return;
/*
@@ -3500,11 +3500,16 @@ update_tg_cfs_runnable(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cf
/* Set new sched_entity's runnable */
se->avg.runnable_avg = gcfs_rq->avg.runnable_avg;
- se->avg.runnable_sum = se->avg.runnable_avg * divider;
+ new_sum = se->avg.runnable_avg * divider;
+ delta_sum = (long)new_sum - (long)se->avg.runnable_sum;
+ se->avg.runnable_sum = new_sum;
/* Update parent cfs_rq runnable */
- add_positive(&cfs_rq->avg.runnable_avg, delta);
- cfs_rq->avg.runnable_sum = cfs_rq->avg.runnable_avg * divider;
+ add_positive(&cfs_rq->avg.runnable_avg, delta_avg);
+ add_positive(&cfs_rq->avg.runnable_sum, delta_sum);
+ /* See update_cfs_rq_load_avg() */
+ cfs_rq->avg.runnable_sum = max_t(u32, cfs_rq->avg.runnable_sum,
+ cfs_rq->avg.runnable_avg * MIN_DIVIDER);
}
static inline void
@@ -3704,7 +3709,10 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
r = removed_runnable;
sub_positive(&sa->runnable_avg, r);
- sa->runnable_sum = sa->runnable_avg * divider;
+ sub_positive(&sa->runnable_sum, r * divider);
+ /* See sa->util_sum above */
+ sa->runnable_sum = max_t(u32, sa->runnable_sum,
+ sa->runnable_avg * MIN_DIVIDER);
/*
* removed_runnable is the unweighted version of removed_load so we
@@ -3791,12 +3799,6 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
*/
static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
- /*
- * cfs_rq->avg.period_contrib can be used for both cfs_rq and se.
- * See ___update_load_avg() for details.
- */
- u32 divider = get_pelt_divider(&cfs_rq->avg);
-
dequeue_load_avg(cfs_rq, se);
sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
@@ -3805,7 +3807,10 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
cfs_rq->avg.util_avg * MIN_DIVIDER);
sub_positive(&cfs_rq->avg.runnable_avg, se->avg.runnable_avg);
- cfs_rq->avg.runnable_sum = cfs_rq->avg.runnable_avg * divider;
+ sub_positive(&cfs_rq->avg.runnable_sum, se->avg.runnable_sum);
+ /* See update_cfs_rq_load_avg() */
+ cfs_rq->avg.runnable_sum = max_t(u32, cfs_rq->avg.runnable_sum,
+ cfs_rq->avg.runnable_avg * MIN_DIVIDER);
add_tg_cfs_propagate(cfs_rq, -se->avg.load_sum);
--
2.17.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [tip: sched/urgent] sched/pelt: Relax the sync of runnable_sum with runnable_avg
2022-01-11 13:46 ` [PATCH v3 3/4] sched/pelt: Relax the sync of runnable_sum with runnable_avg Vincent Guittot
@ 2022-01-18 11:18 ` tip-bot2 for Vincent Guittot
0 siblings, 0 replies; 14+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2022-01-18 11:18 UTC (permalink / raw)
To: linux-tip-commits
Cc: Vincent Guittot, Peter Zijlstra (Intel),
Dietmar Eggemann, Sachin Sant, x86, linux-kernel
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: 95246d1ec80b8d19d882cd8eb7ad094e63b41bb8
Gitweb: https://git.kernel.org/tip/95246d1ec80b8d19d882cd8eb7ad094e63b41bb8
Author: Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate: Tue, 11 Jan 2022 14:46:58 +01:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 18 Jan 2022 12:09:58 +01:00
sched/pelt: Relax the sync of runnable_sum with runnable_avg
Similarly to util_avg and util_sum, don't sync runnable_sum with the low
bound of runnable_avg but only ensure that runnable_sum stays in the
correct range.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-4-vincent.guittot@linaro.org
---
kernel/sched/fair.c | 33 +++++++++++++++++++--------------
1 file changed, 19 insertions(+), 14 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ad2809c..0e87e19 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3483,11 +3483,11 @@ update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
static inline void
update_tg_cfs_runnable(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
{
- long delta = gcfs_rq->avg.runnable_avg - se->avg.runnable_avg;
- u32 divider;
+ long delta_sum, delta_avg = gcfs_rq->avg.runnable_avg - se->avg.runnable_avg;
+ u32 new_sum, divider;
/* Nothing to update */
- if (!delta)
+ if (!delta_avg)
return;
/*
@@ -3498,11 +3498,16 @@ update_tg_cfs_runnable(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cf
/* Set new sched_entity's runnable */
se->avg.runnable_avg = gcfs_rq->avg.runnable_avg;
- se->avg.runnable_sum = se->avg.runnable_avg * divider;
+ new_sum = se->avg.runnable_avg * divider;
+ delta_sum = (long)new_sum - (long)se->avg.runnable_sum;
+ se->avg.runnable_sum = new_sum;
/* Update parent cfs_rq runnable */
- add_positive(&cfs_rq->avg.runnable_avg, delta);
- cfs_rq->avg.runnable_sum = cfs_rq->avg.runnable_avg * divider;
+ add_positive(&cfs_rq->avg.runnable_avg, delta_avg);
+ add_positive(&cfs_rq->avg.runnable_sum, delta_sum);
+ /* See update_cfs_rq_load_avg() */
+ cfs_rq->avg.runnable_sum = max_t(u32, cfs_rq->avg.runnable_sum,
+ cfs_rq->avg.runnable_avg * PELT_MIN_DIVIDER);
}
static inline void
@@ -3702,7 +3707,10 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
r = removed_runnable;
sub_positive(&sa->runnable_avg, r);
- sa->runnable_sum = sa->runnable_avg * divider;
+ sub_positive(&sa->runnable_sum, r * divider);
+ /* See sa->util_sum above */
+ sa->runnable_sum = max_t(u32, sa->runnable_sum,
+ sa->runnable_avg * PELT_MIN_DIVIDER);
/*
* removed_runnable is the unweighted version of removed_load so we
@@ -3789,12 +3797,6 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
*/
static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
- /*
- * cfs_rq->avg.period_contrib can be used for both cfs_rq and se.
- * See ___update_load_avg() for details.
- */
- u32 divider = get_pelt_divider(&cfs_rq->avg);
-
dequeue_load_avg(cfs_rq, se);
sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
@@ -3803,7 +3805,10 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
cfs_rq->avg.util_avg * PELT_MIN_DIVIDER);
sub_positive(&cfs_rq->avg.runnable_avg, se->avg.runnable_avg);
- cfs_rq->avg.runnable_sum = cfs_rq->avg.runnable_avg * divider;
+ sub_positive(&cfs_rq->avg.runnable_sum, se->avg.runnable_sum);
+ /* See update_cfs_rq_load_avg() */
+ cfs_rq->avg.runnable_sum = max_t(u32, cfs_rq->avg.runnable_sum,
+ cfs_rq->avg.runnable_avg * PELT_MIN_DIVIDER);
add_tg_cfs_propagate(cfs_rq, -se->avg.load_sum);
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v3 4/4] sched/pelt: Relax the sync of load_sum with load_avg
2022-01-11 13:46 [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg Vincent Guittot
` (2 preceding siblings ...)
2022-01-11 13:46 ` [PATCH v3 3/4] sched/pelt: Relax the sync of runnable_sum with runnable_avg Vincent Guittot
@ 2022-01-11 13:46 ` Vincent Guittot
2022-01-18 11:18 ` [tip: sched/urgent] " tip-bot2 for Vincent Guittot
2022-01-12 11:23 ` [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg Sachin Sant
2022-01-12 15:26 ` Dietmar Eggemann
5 siblings, 1 reply; 14+ messages in thread
From: Vincent Guittot @ 2022-01-11 13:46 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, bristot, linux-kernel, rickyiu, odin
Cc: sachinp, naresh.kamboju, Vincent Guittot
Similarly to util_avg and util_sum, don't sync load_sum with the low
bound of load_avg but only ensure that load_sum stays in the correct range.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/fair.c | 41 +++++++++++++++++++++++++----------------
1 file changed, 25 insertions(+), 16 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e34de721a630..12d139fed1f7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3025,12 +3025,17 @@ enqueue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
cfs_rq->avg.load_sum += se_weight(se) * se->avg.load_sum;
}
+#define MIN_DIVIDER (LOAD_AVG_MAX - 1024)
+
static inline void
dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
- u32 divider = get_pelt_divider(&se->avg);
sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg);
- cfs_rq->avg.load_sum = cfs_rq->avg.load_avg * divider;
+ sub_positive(&cfs_rq->avg.load_sum, se_weight(se) * se->avg.load_sum);
+ /* See update_cfs_rq_load_avg() */
+ cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum,
+ cfs_rq->avg.load_avg * MIN_DIVIDER);
+
}
#else
static inline void
@@ -3381,8 +3386,6 @@ void set_task_rq_fair(struct sched_entity *se,
se->avg.last_update_time = n_last_update_time;
}
-#define MIN_DIVIDER (LOAD_AVG_MAX - 1024)
-
/*
* When on migration a sched_entity joins/leaves the PELT hierarchy, we need to
* propagate its contribution. The key to this propagation is the invariant
@@ -3515,9 +3518,10 @@ update_tg_cfs_runnable(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cf
static inline void
update_tg_cfs_load(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
{
- long delta, running_sum, runnable_sum = gcfs_rq->prop_runnable_sum;
+ long delta_avg, running_sum, runnable_sum = gcfs_rq->prop_runnable_sum;
unsigned long load_avg;
u64 load_sum = 0;
+ s64 delta_sum;
u32 divider;
if (!runnable_sum)
@@ -3544,7 +3548,7 @@ update_tg_cfs_load(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
* assuming all tasks are equally runnable.
*/
if (scale_load_down(gcfs_rq->load.weight)) {
- load_sum = div_s64(gcfs_rq->avg.load_sum,
+ load_sum = div_u64(gcfs_rq->avg.load_sum,
scale_load_down(gcfs_rq->load.weight));
}
@@ -3561,19 +3565,22 @@ update_tg_cfs_load(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
running_sum = se->avg.util_sum >> SCHED_CAPACITY_SHIFT;
runnable_sum = max(runnable_sum, running_sum);
- load_sum = (s64)se_weight(se) * runnable_sum;
- load_avg = div_s64(load_sum, divider);
+ load_sum = se_weight(se) * runnable_sum;
+ load_avg = div_u64(load_sum, divider);
- se->avg.load_sum = runnable_sum;
-
- delta = load_avg - se->avg.load_avg;
- if (!delta)
+ delta_avg = load_avg - se->avg.load_avg;
+ if (!delta_avg)
return;
- se->avg.load_avg = load_avg;
+ delta_sum = load_sum - (s64)se_weight(se) * se->avg.load_sum;
- add_positive(&cfs_rq->avg.load_avg, delta);
- cfs_rq->avg.load_sum = cfs_rq->avg.load_avg * divider;
+ se->avg.load_sum = runnable_sum;
+ se->avg.load_avg = load_avg;
+ add_positive(&cfs_rq->avg.load_avg, delta_avg);
+ add_positive(&cfs_rq->avg.load_sum, delta_sum);
+ /* See update_cfs_rq_load_avg() */
+ cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum,
+ cfs_rq->avg.load_avg * MIN_DIVIDER);
}
static inline void add_tg_cfs_propagate(struct cfs_rq *cfs_rq, long runnable_sum)
@@ -3689,7 +3696,9 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
r = removed_load;
sub_positive(&sa->load_avg, r);
- sa->load_sum = sa->load_avg * divider;
+ sub_positive(&sa->load_sum, r * divider);
+ /* See sa->util_sum below */
+ sa->load_sum = max_t(u32, sa->load_sum, sa->load_avg * MIN_DIVIDER);
r = removed_util;
sub_positive(&sa->util_avg, r);
--
2.17.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [tip: sched/urgent] sched/pelt: Relax the sync of load_sum with load_avg
2022-01-11 13:46 ` [PATCH v3 4/4] sched/pelt: Relax the sync of load_sum with load_avg Vincent Guittot
@ 2022-01-18 11:18 ` tip-bot2 for Vincent Guittot
0 siblings, 0 replies; 14+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2022-01-18 11:18 UTC (permalink / raw)
To: linux-tip-commits
Cc: Vincent Guittot, Peter Zijlstra (Intel),
Dietmar Eggemann, Sachin Sant, x86, linux-kernel
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: 2d02fa8cc21a93da35cfba462bf8ab87bf2db651
Gitweb: https://git.kernel.org/tip/2d02fa8cc21a93da35cfba462bf8ab87bf2db651
Author: Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate: Tue, 11 Jan 2022 14:46:59 +01:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 18 Jan 2022 12:09:58 +01:00
sched/pelt: Relax the sync of load_sum with load_avg
Similarly to util_avg and util_sum, don't sync load_sum with the low
bound of load_avg but only ensure that load_sum stays in the correct range.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-5-vincent.guittot@linaro.org
---
kernel/sched/fair.c | 36 ++++++++++++++++++++++--------------
1 file changed, 22 insertions(+), 14 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0e87e19..f4f02c2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3028,9 +3028,11 @@ enqueue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
static inline void
dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
- u32 divider = get_pelt_divider(&se->avg);
sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg);
- cfs_rq->avg.load_sum = cfs_rq->avg.load_avg * divider;
+ sub_positive(&cfs_rq->avg.load_sum, se_weight(se) * se->avg.load_sum);
+ /* See update_cfs_rq_load_avg() */
+ cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum,
+ cfs_rq->avg.load_avg * PELT_MIN_DIVIDER);
}
#else
static inline void
@@ -3513,9 +3515,10 @@ update_tg_cfs_runnable(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cf
static inline void
update_tg_cfs_load(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
{
- long delta, running_sum, runnable_sum = gcfs_rq->prop_runnable_sum;
+ long delta_avg, running_sum, runnable_sum = gcfs_rq->prop_runnable_sum;
unsigned long load_avg;
u64 load_sum = 0;
+ s64 delta_sum;
u32 divider;
if (!runnable_sum)
@@ -3542,7 +3545,7 @@ update_tg_cfs_load(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
* assuming all tasks are equally runnable.
*/
if (scale_load_down(gcfs_rq->load.weight)) {
- load_sum = div_s64(gcfs_rq->avg.load_sum,
+ load_sum = div_u64(gcfs_rq->avg.load_sum,
scale_load_down(gcfs_rq->load.weight));
}
@@ -3559,19 +3562,22 @@ update_tg_cfs_load(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
running_sum = se->avg.util_sum >> SCHED_CAPACITY_SHIFT;
runnable_sum = max(runnable_sum, running_sum);
- load_sum = (s64)se_weight(se) * runnable_sum;
- load_avg = div_s64(load_sum, divider);
-
- se->avg.load_sum = runnable_sum;
+ load_sum = se_weight(se) * runnable_sum;
+ load_avg = div_u64(load_sum, divider);
- delta = load_avg - se->avg.load_avg;
- if (!delta)
+ delta_avg = load_avg - se->avg.load_avg;
+ if (!delta_avg)
return;
- se->avg.load_avg = load_avg;
+ delta_sum = load_sum - (s64)se_weight(se) * se->avg.load_sum;
- add_positive(&cfs_rq->avg.load_avg, delta);
- cfs_rq->avg.load_sum = cfs_rq->avg.load_avg * divider;
+ se->avg.load_sum = runnable_sum;
+ se->avg.load_avg = load_avg;
+ add_positive(&cfs_rq->avg.load_avg, delta_avg);
+ add_positive(&cfs_rq->avg.load_sum, delta_sum);
+ /* See update_cfs_rq_load_avg() */
+ cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum,
+ cfs_rq->avg.load_avg * PELT_MIN_DIVIDER);
}
static inline void add_tg_cfs_propagate(struct cfs_rq *cfs_rq, long runnable_sum)
@@ -3687,7 +3693,9 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
r = removed_load;
sub_positive(&sa->load_avg, r);
- sa->load_sum = sa->load_avg * divider;
+ sub_positive(&sa->load_sum, r * divider);
+ /* See sa->util_sum below */
+ sa->load_sum = max_t(u32, sa->load_sum, sa->load_avg * PELT_MIN_DIVIDER);
r = removed_util;
sub_positive(&sa->util_avg, r);
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg
2022-01-11 13:46 [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg Vincent Guittot
` (3 preceding siblings ...)
2022-01-11 13:46 ` [PATCH v3 4/4] sched/pelt: Relax the sync of load_sum with load_avg Vincent Guittot
@ 2022-01-12 11:23 ` Sachin Sant
2022-01-12 13:17 ` Vincent Guittot
2022-01-12 15:26 ` Dietmar Eggemann
5 siblings, 1 reply; 14+ messages in thread
From: Sachin Sant @ 2022-01-12 11:23 UTC (permalink / raw)
To: Vincent Guittot
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
rostedt, bsegall, mgorman, bristot, linux-kernel, rickyiu, odin,
naresh.kamboju
> On 11-Jan-2022, at 7:16 PM, Vincent Guittot <vincent.guittot@linaro.org> wrote:
>
> Rick reported performance regressions in bugzilla because of cpu
> frequency being lower than before:
> https://bugzilla.kernel.org/show_bug.cgi?id=215045
>
> He bisected the problem to:
> commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
>
> More details are available in commit message of patch 1.
>
> This patchset reverts the commit above and adds several checks when
> propagating the changes in the hierarchy to make sure that we still have
> coherent util_avg and util_sum.
>
> Dietmar found a simple way to reproduce the WARN fixed by
> commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> by looping on hackbench in several different sched group levels.
>
> This patchset has been run on the reproducer with success but it probably needs
> more tests by people who faced the WARN before.
>
I ran scheduler regression tests (including cfs_bandwidth) from LTP
for about 6 hours. I did not observe any (new or previously reported)
kernel WARN messages.
Based on this test result for ppc64le
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
-Sachin
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg
2022-01-12 11:23 ` [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg Sachin Sant
@ 2022-01-12 13:17 ` Vincent Guittot
0 siblings, 0 replies; 14+ messages in thread
From: Vincent Guittot @ 2022-01-12 13:17 UTC (permalink / raw)
To: Sachin Sant
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
rostedt, bsegall, mgorman, bristot, linux-kernel, rickyiu, odin,
naresh.kamboju
On Wed, 12 Jan 2022 at 12:24, Sachin Sant <sachinp@linux.vnet.ibm.com> wrote:
>
>
> > On 11-Jan-2022, at 7:16 PM, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> >
> > Rick reported performance regressions in bugzilla because of cpu
> > frequency being lower than before:
> > https://bugzilla.kernel.org/show_bug.cgi?id=215045
> >
> > He bisected the problem to:
> > commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> >
> > More details are available in commit message of patch 1.
> >
> > This patchset reverts the commit above and adds several checks when
> > propagating the changes in the hierarchy to make sure that we still have
> > coherent util_avg and util_sum.
> >
> > Dietmar found a simple way to reproduce the WARN fixed by
> > commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> > by looping on hackbench in several different sched group levels.
> >
> > This patchset has been run on the reproducer with success but it probably needs
> > more tests by people who faced the WARN before.
> >
>
> I ran scheduler regression tests(including cfg_bandwidth) from LTP
> for about 6 hours. I did not observe any (new or previously reported)
> kernel warn messages.
>
> Based on this test result for ppc64le
> Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Thanks
>
> -Sachin
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg
2022-01-11 13:46 [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg Vincent Guittot
` (4 preceding siblings ...)
2022-01-12 11:23 ` [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg Sachin Sant
@ 2022-01-12 15:26 ` Dietmar Eggemann
5 siblings, 0 replies; 14+ messages in thread
From: Dietmar Eggemann @ 2022-01-12 15:26 UTC (permalink / raw)
To: Vincent Guittot, mingo, peterz, juri.lelli, rostedt, bsegall,
mgorman, bristot, linux-kernel, rickyiu, odin
Cc: sachinp, naresh.kamboju
On 11/01/2022 14:46, Vincent Guittot wrote:
> Rick reported performance regressions in bugzilla because of cpu
> frequency being lower than before:
> https://bugzilla.kernel.org/show_bug.cgi?id=215045
>
> He bisected the problem to:
> commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
>
> More details are available in commit message of patch 1.
>
> This patchset reverts the commit above and adds several checks when
> propagating the changes in the hierarchy to make sure that we still have
> coherent util_avg and util_sum.
>
> Dietmar found a simple way to reproduce the WARN fixed by
> commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> by looping on hackbench in several different sched group levels.
>
> This patchset has been run on the reproducer with success but it probably needs
> more tests by people who faced the WARN before.
>
> The changes done on util_sum have also been applied to runnable_sum and
> load_sum, which face the same rounding problem although this has not been
> reflected in measurable performance impact.
>
> Changes for v3:
> - split patch 1 in 2 patches
> - One to fix rick's regression
> - One to apply same changes in other places
> - some typos
> - move main comment so it appears in the 1st patch
>
> Changes for v2:
> - fix wrong update of load_sum
> - move a change from patch 3 to patch 2
> - update patch 3 commit message
>
> Vincent Guittot (4):
> sched/pelt: Relax the sync of util_sum with util_avg
> sched/pelt: Continue to relax the sync of util_sum with util_avg
> sched/pelt: Relax the sync of runnable_sum with runnable_avg
> sched/pelt: Relax the sync of load_sum with load_avg
>
> kernel/sched/fair.c | 113 +++++++++++++++++++++++++++++---------------
> 1 file changed, 75 insertions(+), 38 deletions(-)
LGTM. Just a couple of questions on the patch header of 1/4.
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
^ permalink raw reply [flat|nested] 14+ messages in thread