* [PATCH v3 0/4]   sched/pelt: Relax the sync of *_sum with *_avg
@ 2022-01-11 13:46 Vincent Guittot
  2022-01-11 13:46 ` [PATCH 3 1/4] sched/pelt: Relax the sync of util_sum with util_avg Vincent Guittot
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Vincent Guittot @ 2022-01-11 13:46 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, linux-kernel, rickyiu, odin
  Cc: sachinp, naresh.kamboju, Vincent Guittot

Rick reported performance regressions in bugzilla because of cpu
frequency being lower than before:
    https://bugzilla.kernel.org/show_bug.cgi?id=215045

He bisected the problem to:
commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")

More details are available in the commit message of patch 1.

This patchset reverts the commit above and adds several checks when
propagating the changes in the hierarchy to make sure that we still have
coherent util_avg and util_sum.

Dietmar found a simple way to reproduce the WARN fixed by 
commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
by looping on hackbench in several different sched group levels.

This patchset has been run on the reproducer with success, but it probably needs
more testing by people who faced the WARN before.

The changes done on util_sum have also been applied to runnable_sum and
load_sum, which face the same rounding problem, although this has not been
reflected in a measurable performance impact.

Changes for v3:
- split patch 1 in 2 patches
  - One to fix Rick's regression
  - One to apply same changes in other places
- some typos
- move main comment so it appears in the 1st patch 

Changes for v2:
- fix wrong update of load_sum
- move a change from patch 3 to patch 2
- update patch 3 commit message

Vincent Guittot (4):
  sched/pelt: Relax the sync of util_sum with util_avg
  sched/pelt: Continue to relax the sync of util_sum with util_avg
  sched/pelt: Relax the sync of runnable_sum with runnable_avg
  sched/pelt: Relax the sync of load_sum with load_avg

 kernel/sched/fair.c | 113 +++++++++++++++++++++++++++++---------------
 1 file changed, 75 insertions(+), 38 deletions(-)

-- 
2.17.1



* [PATCH 3 1/4] sched/pelt: Relax the sync of util_sum with util_avg
  2022-01-11 13:46 [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg Vincent Guittot
@ 2022-01-11 13:46 ` Vincent Guittot
  2022-01-12 15:26   ` Dietmar Eggemann
  2022-01-18 11:18   ` [tip: sched/urgent] " tip-bot2 for Vincent Guittot
  2022-01-11 13:46 ` [PATCH v3 2/4] sched/pelt: Continue to relax " Vincent Guittot
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 14+ messages in thread
From: Vincent Guittot @ 2022-01-11 13:46 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, linux-kernel, rickyiu, odin
  Cc: sachinp, naresh.kamboju, Vincent Guittot

Rick reported performance regressions in bugzilla because of cpu frequency
being lower than before:
    https://bugzilla.kernel.org/show_bug.cgi?id=215045

He bisected the problem to:
commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")

This commit forces util_sum to be synced with the new util_avg after
removing the contribution of a task and before the next periodic sync. By
doing so, util_sum is rounded to its lower bound and might lose up to
LOAD_AVG_MAX-1 of accumulated contribution which has not yet been
reflected in util_avg.

Instead of always setting util_sum to the low bound of util_avg, which can
significantly lower the utilization of root cfs_rq after propagating the
change down into the hierarchy, we revert the change of util_sum and
propagate the difference.

In addition, we also check that cfs's util_sum always stays above the
lower bound for a given util_avg as it has been observed that
sched_entity's util_sum is sometimes above the cfs_rq's one.
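
To illustrate the loss, here is a standalone userspace sketch (for
illustration only, not part of the patch): LOAD_AVG_MAX = 47742 matches the
kernel's PELT tables, the other numbers are arbitrary but in range.

#include <stdio.h>

#define LOAD_AVG_MAX	47742

int main(void)
{
	/* divider = LOAD_AVG_MAX - 1024 + period_contrib */
	unsigned int divider  = LOAD_AVG_MAX - 1024 + 512;
	unsigned int util_avg = 10;			/* value from the last periodic sync */
	unsigned int util_sum = 10 * divider + 40000;	/* sync value + contribution accrued since */
	unsigned int removed  = 4;			/* detach a task with util_avg = 4 */

	util_avg -= removed;

	/* commit 1c35b07e6d39: sync util_sum with the lower bound of util_avg */
	unsigned int lower_bound = util_avg * divider;
	/* this patch: remove the same contribution from util_sum instead */
	unsigned int relaxed = util_sum - removed * divider;

	printf("lower bound sync: util_sum = %u\n", lower_bound);
	printf("relaxed sync    : util_sum = %u (keeps %u of pending contribution)\n",
	       relaxed, relaxed - lower_bound);
	return 0;
}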

Reported-by: Rick Yiu <rickyiu@google.com>
Fixes: 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 095b0aa378df..ed35255fdb85 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3381,6 +3381,7 @@ void set_task_rq_fair(struct sched_entity *se,
 	se->avg.last_update_time = n_last_update_time;
 }
 
+#define MIN_DIVIDER (LOAD_AVG_MAX - 1024)
 
 /*
  * When on migration a sched_entity joins/leaves the PELT hierarchy, we need to
@@ -3449,7 +3450,6 @@ void set_task_rq_fair(struct sched_entity *se,
  * XXX: only do this for the part of runnable > running ?
  *
  */
-
 static inline void
 update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
 {
@@ -3681,7 +3681,19 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 
 		r = removed_util;
 		sub_positive(&sa->util_avg, r);
-		sa->util_sum = sa->util_avg * divider;
+		sub_positive(&sa->util_sum, r * divider);
+		/*
+		 * Because of rounding, se->util_sum might ends up being +1 more than
+		 * cfs->util_sum. Although this is not a problem by itself, detaching
+		 * a lot of tasks with the rounding problem between 2 updates of
+		 * util_avg (~1ms) can make cfs->util_sum becoming null whereas
+		 * cfs_util_avg is not.
+		 * Check that util_sum is still above its lower bound for the new
+		 * util_avg. Given that period_contrib might have moved since the last
+		 * sync, we are only sure that util_sum must be above or equal to
+		 *    util_avg * minimum possible divider
+		 */
+		sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * MIN_DIVIDER);
 
 		r = removed_runnable;
 		sub_positive(&sa->runnable_avg, r);
-- 
2.17.1



* [PATCH v3 2/4] sched/pelt: Continue to relax the sync of util_sum with util_avg
  2022-01-11 13:46 [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg Vincent Guittot
  2022-01-11 13:46 ` [PATCH 3 1/4] sched/pelt: Relax the sync of util_sum with util_avg Vincent Guittot
@ 2022-01-11 13:46 ` Vincent Guittot
  2022-01-18 11:18   ` [tip: sched/urgent] " tip-bot2 for Vincent Guittot
  2022-01-11 13:46 ` [PATCH v3 3/4] sched/pelt: Relax the sync of runnable_sum with runnable_avg Vincent Guittot
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Vincent Guittot @ 2022-01-11 13:46 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, linux-kernel, rickyiu, odin
  Cc: sachinp, naresh.kamboju, Vincent Guittot

Rick reported performance regressions in bugzilla because of cpu frequency
being lower than before:
    https://bugzilla.kernel.org/show_bug.cgi?id=215045

He bisected the problem to:
commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")

This commit forces util_sum to be synced with the new util_avg after
removing the contribution of a task and before the next periodic sync. By
doing so, util_sum is rounded to its lower bound and might lose up to
LOAD_AVG_MAX-1 of accumulated contribution which has not yet been
reflected in util_avg.

update_tg_cfs_util() is not the only place where we round util_sum and
lose some accumulated contributions that are not already reflected in
util_avg. Modify update_tg_cfs_util() and detach_entity_load_avg() to not
sync util_sum with the new util_avg. Instead of always setting util_sum to
the low bound of util_avg, which can significantly lower the utilization,
we propagate the difference. In addition, we also check that cfs's util_sum
always stays above the lower bound for a given util_avg as it has been
observed that sched_entity's util_sum is sometimes above the cfs_rq's one.
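
A minimal userspace sketch of the resulting pattern (my illustration, not the
kernel code; the toy_* names are made up, MIN_DIVIDER = LOAD_AVG_MAX - 1024 as
introduced in patch 1): set the sched_entity to the group's new values, apply
only the resulting deltas to the parent, then clamp the parent's util_sum to
the lower bound for its new util_avg.

#include <stdio.h>

#define LOAD_AVG_MAX	47742
#define MIN_DIVIDER	(LOAD_AVG_MAX - 1024)

struct toy_avg {
	long util_avg;
	long util_sum;
};

static void toy_update_tg_cfs_util(struct toy_avg *cfs, struct toy_avg *se,
				   long gcfs_util_avg, long divider)
{
	long delta_avg = gcfs_util_avg - se->util_avg;
	long new_sum   = gcfs_util_avg * divider;
	long delta_sum = new_sum - se->util_sum;

	if (!delta_avg)
		return;

	/* set the new sched_entity utilization */
	se->util_avg = gcfs_util_avg;
	se->util_sum = new_sum;

	/* apply only the deltas to the parent instead of rebuilding its sum */
	cfs->util_avg += delta_avg;
	cfs->util_sum += delta_sum;

	/* keep util_sum above the lower bound for the new util_avg */
	if (cfs->util_sum < cfs->util_avg * MIN_DIVIDER)
		cfs->util_sum = cfs->util_avg * MIN_DIVIDER;
}

int main(void)
{
	long divider = MIN_DIVIDER + 300;	/* example period_contrib */
	struct toy_avg cfs = { 100, 100 * MIN_DIVIDER + 30000 };
	struct toy_avg se  = {  60,  60 * MIN_DIVIDER + 20000 };

	toy_update_tg_cfs_util(&cfs, &se, 40, divider);
	printf("cfs: util_avg=%ld util_sum=%ld\n", cfs.util_avg, cfs.util_sum);
	return 0;
}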

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ed35255fdb85..3eb73ce6ef13 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3453,11 +3453,11 @@ void set_task_rq_fair(struct sched_entity *se,
 static inline void
 update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
 {
-	long delta = gcfs_rq->avg.util_avg - se->avg.util_avg;
-	u32 divider;
+	long delta_sum, delta_avg = gcfs_rq->avg.util_avg - se->avg.util_avg;
+	u32 new_sum, divider;
 
 	/* Nothing to update */
-	if (!delta)
+	if (!delta_avg)
 		return;
 
 	/*
@@ -3466,13 +3466,20 @@ update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
 	 */
 	divider = get_pelt_divider(&cfs_rq->avg);
 
+
 	/* Set new sched_entity's utilization */
 	se->avg.util_avg = gcfs_rq->avg.util_avg;
-	se->avg.util_sum = se->avg.util_avg * divider;
+	new_sum = se->avg.util_avg * divider;
+	delta_sum = (long)new_sum - (long)se->avg.util_sum;
+	se->avg.util_sum = new_sum;
 
 	/* Update parent cfs_rq utilization */
-	add_positive(&cfs_rq->avg.util_avg, delta);
-	cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;
+	add_positive(&cfs_rq->avg.util_avg, delta_avg);
+	add_positive(&cfs_rq->avg.util_sum, delta_sum);
+
+	/* See update_cfs_rq_load_avg() */
+	cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum,
+					  cfs_rq->avg.util_avg * MIN_DIVIDER);
 }
 
 static inline void
@@ -3792,7 +3799,11 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 
 	dequeue_load_avg(cfs_rq, se);
 	sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
-	cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;
+	sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
+	/* See update_cfs_rq_load_avg() */
+	cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum,
+					  cfs_rq->avg.util_avg * MIN_DIVIDER);
+
 	sub_positive(&cfs_rq->avg.runnable_avg, se->avg.runnable_avg);
 	cfs_rq->avg.runnable_sum = cfs_rq->avg.runnable_avg * divider;
 
-- 
2.17.1



* [PATCH v3 3/4] sched/pelt: Relax the sync of runnable_sum with runnable_avg
  2022-01-11 13:46 [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg Vincent Guittot
  2022-01-11 13:46 ` [PATCH 3 1/4] sched/pelt: Relax the sync of util_sum with util_avg Vincent Guittot
  2022-01-11 13:46 ` [PATCH v3 2/4] sched/pelt: Continue to relax " Vincent Guittot
@ 2022-01-11 13:46 ` Vincent Guittot
  2022-01-18 11:18   ` [tip: sched/urgent] " tip-bot2 for Vincent Guittot
  2022-01-11 13:46 ` [PATCH v3 4/4] sched/pelt: Relax the sync of load_sum with load_avg Vincent Guittot
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Vincent Guittot @ 2022-01-11 13:46 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, linux-kernel, rickyiu, odin
  Cc: sachinp, naresh.kamboju, Vincent Guittot

Similarly to util_avg and util_sum, don't sync runnable_sum with the low
bound of runnable_avg but only ensure that runnable_sum stays in the
correct range.
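
For reference, the "correct range" follows from how the PELT averages are
built (my summary of the existing code, not something this patch changes):

    divider      = LOAD_AVG_MAX - 1024 + period_contrib,  0 <= period_contrib < 1024
    runnable_avg = runnable_sum / divider                  (at the last sync)

so for a given runnable_avg the smallest consistent runnable_sum is
runnable_avg * (LOAD_AVG_MAX - 1024), i.e. runnable_avg * MIN_DIVIDER, which
is the lower bound enforced below.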

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3eb73ce6ef13..e34de721a630 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3485,11 +3485,11 @@ update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
 static inline void
 update_tg_cfs_runnable(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
 {
-	long delta = gcfs_rq->avg.runnable_avg - se->avg.runnable_avg;
-	u32 divider;
+	long delta_sum, delta_avg = gcfs_rq->avg.runnable_avg - se->avg.runnable_avg;
+	u32 new_sum, divider;
 
 	/* Nothing to update */
-	if (!delta)
+	if (!delta_avg)
 		return;
 
 	/*
@@ -3500,11 +3500,16 @@ update_tg_cfs_runnable(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cf
 
 	/* Set new sched_entity's runnable */
 	se->avg.runnable_avg = gcfs_rq->avg.runnable_avg;
-	se->avg.runnable_sum = se->avg.runnable_avg * divider;
+	new_sum = se->avg.runnable_avg * divider;
+	delta_sum = (long)new_sum - (long)se->avg.runnable_sum;
+	se->avg.runnable_sum = new_sum;
 
 	/* Update parent cfs_rq runnable */
-	add_positive(&cfs_rq->avg.runnable_avg, delta);
-	cfs_rq->avg.runnable_sum = cfs_rq->avg.runnable_avg * divider;
+	add_positive(&cfs_rq->avg.runnable_avg, delta_avg);
+	add_positive(&cfs_rq->avg.runnable_sum, delta_sum);
+	/* See update_cfs_rq_load_avg() */
+	cfs_rq->avg.runnable_sum = max_t(u32, cfs_rq->avg.runnable_sum,
+					      cfs_rq->avg.runnable_avg * MIN_DIVIDER);
 }
 
 static inline void
@@ -3704,7 +3709,10 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 
 		r = removed_runnable;
 		sub_positive(&sa->runnable_avg, r);
-		sa->runnable_sum = sa->runnable_avg * divider;
+		sub_positive(&sa->runnable_sum, r * divider);
+		/* See sa->util_sum above */
+		sa->runnable_sum = max_t(u32, sa->runnable_sum,
+					      sa->runnable_avg * MIN_DIVIDER);
 
 		/*
 		 * removed_runnable is the unweighted version of removed_load so we
@@ -3791,12 +3799,6 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
  */
 static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	/*
-	 * cfs_rq->avg.period_contrib can be used for both cfs_rq and se.
-	 * See ___update_load_avg() for details.
-	 */
-	u32 divider = get_pelt_divider(&cfs_rq->avg);
-
 	dequeue_load_avg(cfs_rq, se);
 	sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
 	sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
@@ -3805,7 +3807,10 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 					  cfs_rq->avg.util_avg * MIN_DIVIDER);
 
 	sub_positive(&cfs_rq->avg.runnable_avg, se->avg.runnable_avg);
-	cfs_rq->avg.runnable_sum = cfs_rq->avg.runnable_avg * divider;
+	sub_positive(&cfs_rq->avg.runnable_sum, se->avg.runnable_sum);
+	/* See update_cfs_rq_load_avg() */
+	cfs_rq->avg.runnable_sum = max_t(u32, cfs_rq->avg.runnable_sum,
+					      cfs_rq->avg.runnable_avg * MIN_DIVIDER);
 
 	add_tg_cfs_propagate(cfs_rq, -se->avg.load_sum);
 
-- 
2.17.1



* [PATCH v3 4/4] sched/pelt: Relax the sync of load_sum with load_avg
  2022-01-11 13:46 [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg Vincent Guittot
                   ` (2 preceding siblings ...)
  2022-01-11 13:46 ` [PATCH v3 3/4] sched/pelt: Relax the sync of runnable_sum with runnable_avg Vincent Guittot
@ 2022-01-11 13:46 ` Vincent Guittot
  2022-01-18 11:18   ` [tip: sched/urgent] " tip-bot2 for Vincent Guittot
  2022-01-12 11:23 ` [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg Sachin Sant
  2022-01-12 15:26 ` Dietmar Eggemann
  5 siblings, 1 reply; 14+ messages in thread
From: Vincent Guittot @ 2022-01-11 13:46 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, linux-kernel, rickyiu, odin
  Cc: sachinp, naresh.kamboju, Vincent Guittot

Similarly to util_avg and util_sum, don't sync load_sum with the low
bound of load_avg but only ensure that load_sum stays in the correct range.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e34de721a630..12d139fed1f7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3025,12 +3025,17 @@ enqueue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	cfs_rq->avg.load_sum += se_weight(se) * se->avg.load_sum;
 }
 
+#define MIN_DIVIDER (LOAD_AVG_MAX - 1024)
+
 static inline void
 dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	u32 divider = get_pelt_divider(&se->avg);
 	sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg);
-	cfs_rq->avg.load_sum = cfs_rq->avg.load_avg * divider;
+	sub_positive(&cfs_rq->avg.load_sum, se_weight(se) * se->avg.load_sum);
+	/* See update_cfs_rq_load_avg() */
+	cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum,
+					  cfs_rq->avg.load_avg * MIN_DIVIDER);
+
 }
 #else
 static inline void
@@ -3381,8 +3386,6 @@ void set_task_rq_fair(struct sched_entity *se,
 	se->avg.last_update_time = n_last_update_time;
 }
 
-#define MIN_DIVIDER (LOAD_AVG_MAX - 1024)
-
 /*
  * When on migration a sched_entity joins/leaves the PELT hierarchy, we need to
  * propagate its contribution. The key to this propagation is the invariant
@@ -3515,9 +3518,10 @@ update_tg_cfs_runnable(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cf
 static inline void
 update_tg_cfs_load(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
 {
-	long delta, running_sum, runnable_sum = gcfs_rq->prop_runnable_sum;
+	long delta_avg, running_sum, runnable_sum = gcfs_rq->prop_runnable_sum;
 	unsigned long load_avg;
 	u64 load_sum = 0;
+	s64 delta_sum;
 	u32 divider;
 
 	if (!runnable_sum)
@@ -3544,7 +3548,7 @@ update_tg_cfs_load(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
 		 * assuming all tasks are equally runnable.
 		 */
 		if (scale_load_down(gcfs_rq->load.weight)) {
-			load_sum = div_s64(gcfs_rq->avg.load_sum,
+			load_sum = div_u64(gcfs_rq->avg.load_sum,
 				scale_load_down(gcfs_rq->load.weight));
 		}
 
@@ -3561,19 +3565,22 @@ update_tg_cfs_load(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
 	running_sum = se->avg.util_sum >> SCHED_CAPACITY_SHIFT;
 	runnable_sum = max(runnable_sum, running_sum);
 
-	load_sum = (s64)se_weight(se) * runnable_sum;
-	load_avg = div_s64(load_sum, divider);
+	load_sum = se_weight(se) * runnable_sum;
+	load_avg = div_u64(load_sum, divider);
 
-	se->avg.load_sum = runnable_sum;
-
-	delta = load_avg - se->avg.load_avg;
-	if (!delta)
+	delta_avg = load_avg - se->avg.load_avg;
+	if (!delta_avg)
 		return;
 
-	se->avg.load_avg = load_avg;
+	delta_sum = load_sum - (s64)se_weight(se) * se->avg.load_sum;
 
-	add_positive(&cfs_rq->avg.load_avg, delta);
-	cfs_rq->avg.load_sum = cfs_rq->avg.load_avg * divider;
+	se->avg.load_sum = runnable_sum;
+	se->avg.load_avg = load_avg;
+	add_positive(&cfs_rq->avg.load_avg, delta_avg);
+	add_positive(&cfs_rq->avg.load_sum, delta_sum);
+	/* See update_cfs_rq_load_avg() */
+	cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum,
+					  cfs_rq->avg.load_avg * MIN_DIVIDER);
 }
 
 static inline void add_tg_cfs_propagate(struct cfs_rq *cfs_rq, long runnable_sum)
@@ -3689,7 +3696,9 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 
 		r = removed_load;
 		sub_positive(&sa->load_avg, r);
-		sa->load_sum = sa->load_avg * divider;
+		sub_positive(&sa->load_sum, r * divider);
+		/* See sa->util_sum below */
+		sa->load_sum = max_t(u32, sa->load_sum, sa->load_avg * MIN_DIVIDER);
 
 		r = removed_util;
 		sub_positive(&sa->util_avg, r);
-- 
2.17.1



* Re: [PATCH v3 0/4]   sched/pelt: Relax the sync of *_sum with *_avg
  2022-01-11 13:46 [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg Vincent Guittot
                   ` (3 preceding siblings ...)
  2022-01-11 13:46 ` [PATCH v3 4/4] sched/pelt: Relax the sync of load_sum with load_avg Vincent Guittot
@ 2022-01-12 11:23 ` Sachin Sant
  2022-01-12 13:17   ` Vincent Guittot
  2022-01-12 15:26 ` Dietmar Eggemann
  5 siblings, 1 reply; 14+ messages in thread
From: Sachin Sant @ 2022-01-12 11:23 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	rostedt, bsegall, mgorman, bristot, linux-kernel, rickyiu, odin,
	naresh.kamboju


> On 11-Jan-2022, at 7:16 PM, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> 
> Rick reported performance regressions in bugzilla because of cpu
> frequency being lower than before:
>    https://bugzilla.kernel.org/show_bug.cgi?id=215045
> 
> He bisected the problem to:
> commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> 
> More details are available in the commit message of patch 1.
> 
> This patchset reverts the commit above and adds several checks when
> propagating the changes in the hierarchy to make sure that we still have
> coherent util_avg and util_sum.
> 
> Dietmar found a simple way to reproduce the WARN fixed by 
> commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> by looping on hackbench in several different sched group levels.
> 
> This patchset has been run on the reproducer with success, but it probably needs
> more testing by people who faced the WARN before.
> 

I ran scheduler regression tests (including cfs_bandwidth) from LTP
for about 6 hours. I did not observe any (new or previously reported)
kernel WARN messages.

Based on this test result for ppc64le
Tested-by: Sachin Sant <sachinp@linux.ibm.com>

-Sachin


* Re: [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg
  2022-01-12 11:23 ` [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg Sachin Sant
@ 2022-01-12 13:17   ` Vincent Guittot
  0 siblings, 0 replies; 14+ messages in thread
From: Vincent Guittot @ 2022-01-12 13:17 UTC (permalink / raw)
  To: Sachin Sant
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	rostedt, bsegall, mgorman, bristot, linux-kernel, rickyiu, odin,
	naresh.kamboju

On Wed, 12 Jan 2022 at 12:24, Sachin Sant <sachinp@linux.vnet.ibm.com> wrote:
>
>
> > On 11-Jan-2022, at 7:16 PM, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> >
> > Rick reported performance regressions in bugzilla because of cpu
> > frequency being lower than before:
> >    https://bugzilla.kernel.org/show_bug.cgi?id=215045
> >
> > He bisected the problem to:
> > commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> >
> > More details are available in the commit message of patch 1.
> >
> > This patchset reverts the commit above and adds several checks when
> > propagating the changes in the hierarchy to make sure that we still have
> > coherent util_avg and util_sum.
> >
> > Dietmar found a simple way to reproduce the WARN fixed by
> > commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> > by looping on hackbench in several different sched group levels.
> >
> > This patchset has been run on the reproducer with success, but it probably needs
> > more testing by people who faced the WARN before.
> >
>
> I ran scheduler regression tests (including cfs_bandwidth) from LTP
> for about 6 hours. I did not observe any (new or previously reported)
> kernel WARN messages.
>
> Based on this test result for ppc64le
> Tested-by: Sachin Sant <sachinp@linux.ibm.com>

Thanks

>
> -Sachin


* Re: [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg
  2022-01-11 13:46 [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg Vincent Guittot
                   ` (4 preceding siblings ...)
  2022-01-12 11:23 ` [PATCH v3 0/4] sched/pelt: Relax the sync of *_sum with *_avg Sachin Sant
@ 2022-01-12 15:26 ` Dietmar Eggemann
  5 siblings, 0 replies; 14+ messages in thread
From: Dietmar Eggemann @ 2022-01-12 15:26 UTC (permalink / raw)
  To: Vincent Guittot, mingo, peterz, juri.lelli, rostedt, bsegall,
	mgorman, bristot, linux-kernel, rickyiu, odin
  Cc: sachinp, naresh.kamboju

On 11/01/2022 14:46, Vincent Guittot wrote:
> Rick reported performance regressions in bugzilla because of cpu
> frequency being lower than before:
>     https://bugzilla.kernel.org/show_bug.cgi?id=215045
> 
> He bisected the problem to:
> commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> 
> More details are available in the commit message of patch 1.
> 
> This patchset reverts the commit above and adds several checks when
> propagating the changes in the hierarchy to make sure that we still have
> coherent util_avg and util_sum.
> 
> Dietmar found a simple way to reproduce the WARN fixed by 
> commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> by looping on hackbench in several different sched group levels.
> 
> This patchset has been run on the reproducer with success, but it probably needs
> more testing by people who faced the WARN before.
> 
> The changes done on util_sum have also been applied to runnable_sum and
> load_sum, which face the same rounding problem, although this has not been
> reflected in a measurable performance impact.
> 
> Changes for v3:
> - split patch 1 in 2 patches
>   - One to fix Rick's regression
>   - One to apply same changes in other places
> - some typos
> - move main comment so it appears in the 1st patch 
> 
> Changes for v2:
> - fix wrong update of load_sum
> - move a change from patch 3 to patch 2
> - update patch 3 commit message
> 
> Vincent Guittot (4):
>   sched/pelt: Relax the sync of util_sum with util_avg
>   sched/pelt: Continue to relax the sync of util_sum with util_avg
>   sched/pelt: Relax the sync of runnable_sum with runnable_avg
>   sched/pelt: Relax the sync of load_sum with load_avg
> 
>  kernel/sched/fair.c | 113 +++++++++++++++++++++++++++++---------------
>  1 file changed, 75 insertions(+), 38 deletions(-)

LGTM. Just a couple of questions on the patch header of 1/4.

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>


* Re: [PATCH 3 1/4] sched/pelt: Relax the sync of util_sum with util_avg
  2022-01-11 13:46 ` [PATCH 3 1/4] sched/pelt: Relax the sync of util_sum with util_avg Vincent Guittot
@ 2022-01-12 15:26   ` Dietmar Eggemann
  2022-01-12 16:04     ` Vincent Guittot
  2022-01-18 11:18   ` [tip: sched/urgent] " tip-bot2 for Vincent Guittot
  1 sibling, 1 reply; 14+ messages in thread
From: Dietmar Eggemann @ 2022-01-12 15:26 UTC (permalink / raw)
  To: Vincent Guittot, mingo, peterz, juri.lelli, rostedt, bsegall,
	mgorman, bristot, linux-kernel, rickyiu, odin
  Cc: sachinp, naresh.kamboju

On 11/01/2022 14:46, Vincent Guittot wrote:
> Rick reported performance regressions in bugzilla because of cpu frequency
> being lower than before:
>     https://bugzilla.kernel.org/show_bug.cgi?id=215045
> 
> He bisected the problem to:
> commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> 
> This commit forces util_sum to be synced with the new util_avg after
> removing the contribution of a task and before the next periodic sync. By
> doing so, util_sum is rounded to its lower bound and might lose up to
> LOAD_AVG_MAX-1 of accumulated contribution which has not yet been
> reflected in util_avg.
> 
> Instead of always setting util_sum to the low bound of util_avg, which can
> significantly lower the utilization of root cfs_rq after propagating the
> change down into the hierarchy, we revert the change of util_sum and
> propagate the difference.

IMHO, this paragraph does not match the changes in this patch.

> In addition, we also check that cfs's util_sum always stays above the
> lower bound for a given util_avg as it has been observed that
> sched_entity's util_sum is sometimes above the cfs_rq's one.

And I guess this one also refers to the code change in 2/4, i.e. in
update_tg_cfs_util().

> Reported-by: Rick Yiu <rickyiu@google.com>
> Fixes: 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/fair.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 095b0aa378df..ed35255fdb85 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3381,6 +3381,7 @@ void set_task_rq_fair(struct sched_entity *se,
>  	se->avg.last_update_time = n_last_update_time;
>  }
>  
> +#define MIN_DIVIDER (LOAD_AVG_MAX - 1024)
>  
>  /*
>   * When on migration a sched_entity joins/leaves the PELT hierarchy, we need to
> @@ -3449,7 +3450,6 @@ void set_task_rq_fair(struct sched_entity *se,
>   * XXX: only do this for the part of runnable > running ?
>   *
>   */
> -
>  static inline void
>  update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
>  {
> @@ -3681,7 +3681,19 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
>  
>  		r = removed_util;
>  		sub_positive(&sa->util_avg, r);
> -		sa->util_sum = sa->util_avg * divider;
> +		sub_positive(&sa->util_sum, r * divider);
> +		/*
> +		 * Because of rounding, se->util_sum might ends up being +1 more than

There are no se's involved in update_cfs_rq_load_avg(). Could be hard to
grasp for people only looking at this function.

> +		 * cfs->util_sum. Although this is not a problem by itself, detaching
> +		 * a lot of tasks with the rounding problem between 2 updates of
> +		 * util_avg (~1ms) can make cfs->util_sum becoming null whereas
> +		 * cfs_util_avg is not.
> +		 * Check that util_sum is still above its lower bound for the new
> +		 * util_avg. Given that period_contrib might have moved since the last
> +		 * sync, we are only sure that util_sum must be above or equal to
> +		 *    util_avg * minimum possible divider
                   ^^^
some superfluous whitespaces.

> +		 */
> +		sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * MIN_DIVIDER);
>  
>  		r = removed_runnable;
>  		sub_positive(&sa->runnable_avg, r);
> 



* Re: [PATCH 3 1/4] sched/pelt: Relax the sync of util_sum with util_avg
  2022-01-12 15:26   ` Dietmar Eggemann
@ 2022-01-12 16:04     ` Vincent Guittot
  0 siblings, 0 replies; 14+ messages in thread
From: Vincent Guittot @ 2022-01-12 16:04 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: mingo, peterz, juri.lelli, rostedt, bsegall, mgorman, bristot,
	linux-kernel, rickyiu, odin, sachinp, naresh.kamboju

On Wed, 12 Jan 2022 at 16:26, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>
> On 11/01/2022 14:46, Vincent Guittot wrote:
> > Rick reported performance regressions in bugzilla because of cpu frequency
> > being lower than before:
> >     https://bugzilla.kernel.org/show_bug.cgi?id=215045
> >
> > He bisected the problem to:
> > commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> >
> > This commit forces util_sum to be synced with the new util_avg after
> > removing the contribution of a task and before the next periodic sync. By
> > doing so, util_sum is rounded to its lower bound and might lose up to
> > LOAD_AVG_MAX-1 of accumulated contribution which has not yet been
> > reflected in util_avg.
> >
> > Instead of always setting util_sum to the low bound of util_avg, which can
> > significantly lower the utilization of root cfs_rq after propagating the
> > change down into the hierarchy, we revert the change of util_sum and
> > propagate the difference.
>
> IMHO, this paragraph does not match the changes in this patch.



>
> > In addition, we also check that cfs's util_sum always stays above the
> > lower bound for a given util_avg as it has been observed that
> > sched_entity's util_sum is sometimes above the cfs_rq's one.
>
> And I guess this one also refers to the code change in 2/4, i.e. in
> update_tg_cfs_util().

The 3 places can have the problem, although it's probably more obvious
in detach_entity_load_avg() or update_tg_cfs_util() because there it is done
synchronously, unlike here.

> > Reported-by: Rick Yiu <rickyiu@google.com>
> > Fixes: 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > ---
> >  kernel/sched/fair.c | 16 ++++++++++++++--
> >  1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 095b0aa378df..ed35255fdb85 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -3381,6 +3381,7 @@ void set_task_rq_fair(struct sched_entity *se,
> >       se->avg.last_update_time = n_last_update_time;
> >  }
> >
> > +#define MIN_DIVIDER (LOAD_AVG_MAX - 1024)
> >
> >  /*
> >   * When on migration a sched_entity joins/leaves the PELT hierarchy, we need to
> > @@ -3449,7 +3450,6 @@ void set_task_rq_fair(struct sched_entity *se,
> >   * XXX: only do this for the part of runnable > running ?
> >   *
> >   */
> > -
> >  static inline void
> >  update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
> >  {
> > @@ -3681,7 +3681,19 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
> >
> >               r = removed_util;
> >               sub_positive(&sa->util_avg, r);
> > -             sa->util_sum = sa->util_avg * divider;
> > +             sub_positive(&sa->util_sum, r * divider);
> > +             /*
> > +              * Because of rounding, se->util_sum might ends up being +1 more than
>
> There are no se's involved in update_cfs_rq_load_avg(). Could be hard to
> grasp for people only looking at this function.

I moved the comment there so that it stays in patch 1, but it reflects the whole
problem of rounding when removing some util_avg/sum from a cfs_rq, whereas
we only have part of the problem in update_cfs_rq_load_avg().

>
> > +              * cfs->util_sum. Although this is not a problem by itself, detaching
> > +              * a lot of tasks with the rounding problem between 2 updates of
> > +              * util_avg (~1ms) can make cfs->util_sum becoming null whereas
> > +              * cfs_util_avg is not.
> > +              * Check that util_sum is still above its lower bound for the new
> > +              * util_avg. Given that period_contrib might have moved since the last
> > +              * sync, we are only sure that util_sum must be above or equal to
> > +              *    util_avg * minimum possible divider
>                    ^^^
> some superfluous whitespaces.
>
> > +              */
> > +             sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * MIN_DIVIDER);
> >
> >               r = removed_runnable;
> >               sub_positive(&sa->runnable_avg, r);
> >
>


* [tip: sched/urgent] sched/pelt: Relax the sync of load_sum with load_avg
  2022-01-11 13:46 ` [PATCH v3 4/4] sched/pelt: Relax the sync of load_sum with load_avg Vincent Guittot
@ 2022-01-18 11:18   ` tip-bot2 for Vincent Guittot
  0 siblings, 0 replies; 14+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2022-01-18 11:18 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Vincent Guittot, Peter Zijlstra (Intel),
	Dietmar Eggemann, Sachin Sant, x86, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     2d02fa8cc21a93da35cfba462bf8ab87bf2db651
Gitweb:        https://git.kernel.org/tip/2d02fa8cc21a93da35cfba462bf8ab87bf2db651
Author:        Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate:    Tue, 11 Jan 2022 14:46:59 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 18 Jan 2022 12:09:58 +01:00

sched/pelt: Relax the sync of load_sum with load_avg

Similarly to util_avg and util_sum, don't sync load_sum with the low
bound of load_avg but only ensure that load_sum stays in the correct range.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-5-vincent.guittot@linaro.org
---
 kernel/sched/fair.c | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0e87e19..f4f02c2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3028,9 +3028,11 @@ enqueue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 static inline void
 dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	u32 divider = get_pelt_divider(&se->avg);
 	sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg);
-	cfs_rq->avg.load_sum = cfs_rq->avg.load_avg * divider;
+	sub_positive(&cfs_rq->avg.load_sum, se_weight(se) * se->avg.load_sum);
+	/* See update_cfs_rq_load_avg() */
+	cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum,
+					  cfs_rq->avg.load_avg * PELT_MIN_DIVIDER);
 }
 #else
 static inline void
@@ -3513,9 +3515,10 @@ update_tg_cfs_runnable(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cf
 static inline void
 update_tg_cfs_load(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
 {
-	long delta, running_sum, runnable_sum = gcfs_rq->prop_runnable_sum;
+	long delta_avg, running_sum, runnable_sum = gcfs_rq->prop_runnable_sum;
 	unsigned long load_avg;
 	u64 load_sum = 0;
+	s64 delta_sum;
 	u32 divider;
 
 	if (!runnable_sum)
@@ -3542,7 +3545,7 @@ update_tg_cfs_load(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
 		 * assuming all tasks are equally runnable.
 		 */
 		if (scale_load_down(gcfs_rq->load.weight)) {
-			load_sum = div_s64(gcfs_rq->avg.load_sum,
+			load_sum = div_u64(gcfs_rq->avg.load_sum,
 				scale_load_down(gcfs_rq->load.weight));
 		}
 
@@ -3559,19 +3562,22 @@ update_tg_cfs_load(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
 	running_sum = se->avg.util_sum >> SCHED_CAPACITY_SHIFT;
 	runnable_sum = max(runnable_sum, running_sum);
 
-	load_sum = (s64)se_weight(se) * runnable_sum;
-	load_avg = div_s64(load_sum, divider);
-
-	se->avg.load_sum = runnable_sum;
+	load_sum = se_weight(se) * runnable_sum;
+	load_avg = div_u64(load_sum, divider);
 
-	delta = load_avg - se->avg.load_avg;
-	if (!delta)
+	delta_avg = load_avg - se->avg.load_avg;
+	if (!delta_avg)
 		return;
 
-	se->avg.load_avg = load_avg;
+	delta_sum = load_sum - (s64)se_weight(se) * se->avg.load_sum;
 
-	add_positive(&cfs_rq->avg.load_avg, delta);
-	cfs_rq->avg.load_sum = cfs_rq->avg.load_avg * divider;
+	se->avg.load_sum = runnable_sum;
+	se->avg.load_avg = load_avg;
+	add_positive(&cfs_rq->avg.load_avg, delta_avg);
+	add_positive(&cfs_rq->avg.load_sum, delta_sum);
+	/* See update_cfs_rq_load_avg() */
+	cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum,
+					  cfs_rq->avg.load_avg * PELT_MIN_DIVIDER);
 }
 
 static inline void add_tg_cfs_propagate(struct cfs_rq *cfs_rq, long runnable_sum)
@@ -3687,7 +3693,9 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 
 		r = removed_load;
 		sub_positive(&sa->load_avg, r);
-		sa->load_sum = sa->load_avg * divider;
+		sub_positive(&sa->load_sum, r * divider);
+		/* See sa->util_sum below */
+		sa->load_sum = max_t(u32, sa->load_sum, sa->load_avg * PELT_MIN_DIVIDER);
 
 		r = removed_util;
 		sub_positive(&sa->util_avg, r);


* [tip: sched/urgent] sched/pelt: Relax the sync of runnable_sum with runnable_avg
  2022-01-11 13:46 ` [PATCH v3 3/4] sched/pelt: Relax the sync of runnable_sum with runnable_avg Vincent Guittot
@ 2022-01-18 11:18   ` tip-bot2 for Vincent Guittot
  0 siblings, 0 replies; 14+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2022-01-18 11:18 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Vincent Guittot, Peter Zijlstra (Intel),
	Dietmar Eggemann, Sachin Sant, x86, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     95246d1ec80b8d19d882cd8eb7ad094e63b41bb8
Gitweb:        https://git.kernel.org/tip/95246d1ec80b8d19d882cd8eb7ad094e63b41bb8
Author:        Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate:    Tue, 11 Jan 2022 14:46:58 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 18 Jan 2022 12:09:58 +01:00

sched/pelt: Relax the sync of runnable_sum with runnable_avg

Similarly to util_avg and util_sum, don't sync runnable_sum with the low
bound of runnable_avg but only ensure that runnable_sum stays in the
correct range.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-4-vincent.guittot@linaro.org
---
 kernel/sched/fair.c | 33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ad2809c..0e87e19 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3483,11 +3483,11 @@ update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
 static inline void
 update_tg_cfs_runnable(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
 {
-	long delta = gcfs_rq->avg.runnable_avg - se->avg.runnable_avg;
-	u32 divider;
+	long delta_sum, delta_avg = gcfs_rq->avg.runnable_avg - se->avg.runnable_avg;
+	u32 new_sum, divider;
 
 	/* Nothing to update */
-	if (!delta)
+	if (!delta_avg)
 		return;
 
 	/*
@@ -3498,11 +3498,16 @@ update_tg_cfs_runnable(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cf
 
 	/* Set new sched_entity's runnable */
 	se->avg.runnable_avg = gcfs_rq->avg.runnable_avg;
-	se->avg.runnable_sum = se->avg.runnable_avg * divider;
+	new_sum = se->avg.runnable_avg * divider;
+	delta_sum = (long)new_sum - (long)se->avg.runnable_sum;
+	se->avg.runnable_sum = new_sum;
 
 	/* Update parent cfs_rq runnable */
-	add_positive(&cfs_rq->avg.runnable_avg, delta);
-	cfs_rq->avg.runnable_sum = cfs_rq->avg.runnable_avg * divider;
+	add_positive(&cfs_rq->avg.runnable_avg, delta_avg);
+	add_positive(&cfs_rq->avg.runnable_sum, delta_sum);
+	/* See update_cfs_rq_load_avg() */
+	cfs_rq->avg.runnable_sum = max_t(u32, cfs_rq->avg.runnable_sum,
+					      cfs_rq->avg.runnable_avg * PELT_MIN_DIVIDER);
 }
 
 static inline void
@@ -3702,7 +3707,10 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 
 		r = removed_runnable;
 		sub_positive(&sa->runnable_avg, r);
-		sa->runnable_sum = sa->runnable_avg * divider;
+		sub_positive(&sa->runnable_sum, r * divider);
+		/* See sa->util_sum above */
+		sa->runnable_sum = max_t(u32, sa->runnable_sum,
+					      sa->runnable_avg * PELT_MIN_DIVIDER);
 
 		/*
 		 * removed_runnable is the unweighted version of removed_load so we
@@ -3789,12 +3797,6 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
  */
 static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	/*
-	 * cfs_rq->avg.period_contrib can be used for both cfs_rq and se.
-	 * See ___update_load_avg() for details.
-	 */
-	u32 divider = get_pelt_divider(&cfs_rq->avg);
-
 	dequeue_load_avg(cfs_rq, se);
 	sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
 	sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
@@ -3803,7 +3805,10 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 					  cfs_rq->avg.util_avg * PELT_MIN_DIVIDER);
 
 	sub_positive(&cfs_rq->avg.runnable_avg, se->avg.runnable_avg);
-	cfs_rq->avg.runnable_sum = cfs_rq->avg.runnable_avg * divider;
+	sub_positive(&cfs_rq->avg.runnable_sum, se->avg.runnable_sum);
+	/* See update_cfs_rq_load_avg() */
+	cfs_rq->avg.runnable_sum = max_t(u32, cfs_rq->avg.runnable_sum,
+					      cfs_rq->avg.runnable_avg * PELT_MIN_DIVIDER);
 
 	add_tg_cfs_propagate(cfs_rq, -se->avg.load_sum);
 


* [tip: sched/urgent] sched/pelt: Continue to relax the sync of util_sum with util_avg
  2022-01-11 13:46 ` [PATCH v3 2/4] sched/pelt: Continue to relax " Vincent Guittot
@ 2022-01-18 11:18   ` tip-bot2 for Vincent Guittot
  0 siblings, 0 replies; 14+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2022-01-18 11:18 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Vincent Guittot, Peter Zijlstra (Intel),
	Dietmar Eggemann, Sachin Sant, x86, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     7ceb77103001544a43e11d7f3a8a69a2c1f422cf
Gitweb:        https://git.kernel.org/tip/7ceb77103001544a43e11d7f3a8a69a2c1f422cf
Author:        Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate:    Tue, 11 Jan 2022 14:46:57 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 18 Jan 2022 12:09:58 +01:00

sched/pelt: Continue to relax the sync of util_sum with util_avg

Rick reported performance regressions in bugzilla because of cpu frequency
being lower than before:
    https://bugzilla.kernel.org/show_bug.cgi?id=215045

He bisected the problem to:
commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")

This commit forces util_sum to be synced with the new util_avg after
removing the contribution of a task and before the next periodic sync. By
doing so, util_sum is rounded to its lower bound and might lose up to
LOAD_AVG_MAX-1 of accumulated contribution which has not yet been
reflected in util_avg.

update_tg_cfs_util() is not the only place where we round util_sum and
lose some accumulated contributions that are not already reflected in
util_avg. Modify update_tg_cfs_util() and detach_entity_load_avg() to not
sync util_sum with the new util_avg. Instead of always setting util_sum to
the low bound of util_avg, which can significantly lower the utilization,
we propagate the difference. In addition, we also check that cfs's util_sum
always stays above the lower bound for a given util_avg as it has been
observed that sched_entity's util_sum is sometimes above the cfs_rq's one.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-3-vincent.guittot@linaro.org
---
 kernel/sched/fair.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d8f068d..ad2809c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3451,11 +3451,11 @@ void set_task_rq_fair(struct sched_entity *se,
 static inline void
 update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
 {
-	long delta = gcfs_rq->avg.util_avg - se->avg.util_avg;
-	u32 divider;
+	long delta_sum, delta_avg = gcfs_rq->avg.util_avg - se->avg.util_avg;
+	u32 new_sum, divider;
 
 	/* Nothing to update */
-	if (!delta)
+	if (!delta_avg)
 		return;
 
 	/*
@@ -3464,13 +3464,20 @@ update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
 	 */
 	divider = get_pelt_divider(&cfs_rq->avg);
 
+
 	/* Set new sched_entity's utilization */
 	se->avg.util_avg = gcfs_rq->avg.util_avg;
-	se->avg.util_sum = se->avg.util_avg * divider;
+	new_sum = se->avg.util_avg * divider;
+	delta_sum = (long)new_sum - (long)se->avg.util_sum;
+	se->avg.util_sum = new_sum;
 
 	/* Update parent cfs_rq utilization */
-	add_positive(&cfs_rq->avg.util_avg, delta);
-	cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;
+	add_positive(&cfs_rq->avg.util_avg, delta_avg);
+	add_positive(&cfs_rq->avg.util_sum, delta_sum);
+
+	/* See update_cfs_rq_load_avg() */
+	cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum,
+					  cfs_rq->avg.util_avg * PELT_MIN_DIVIDER);
 }
 
 static inline void
@@ -3790,7 +3797,11 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 
 	dequeue_load_avg(cfs_rq, se);
 	sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
-	cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;
+	sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
+	/* See update_cfs_rq_load_avg() */
+	cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum,
+					  cfs_rq->avg.util_avg * PELT_MIN_DIVIDER);
+
 	sub_positive(&cfs_rq->avg.runnable_avg, se->avg.runnable_avg);
 	cfs_rq->avg.runnable_sum = cfs_rq->avg.runnable_avg * divider;
 


* [tip: sched/urgent] sched/pelt: Relax the sync of util_sum with util_avg
  2022-01-11 13:46 ` [PATCH 3 1/4] sched/pelt: Relax the sync of util_sum with util_avg Vincent Guittot
  2022-01-12 15:26   ` Dietmar Eggemann
@ 2022-01-18 11:18   ` tip-bot2 for Vincent Guittot
  1 sibling, 0 replies; 14+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2022-01-18 11:18 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Rick Yiu, Vincent Guittot, Peter Zijlstra (Intel),
	Dietmar Eggemann, Sachin Sant, x86, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     98b0d890220d45418cfbc5157b3382e6da5a12ab
Gitweb:        https://git.kernel.org/tip/98b0d890220d45418cfbc5157b3382e6da5a12ab
Author:        Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate:    Tue, 11 Jan 2022 14:46:56 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 18 Jan 2022 12:09:58 +01:00

sched/pelt: Relax the sync of util_sum with util_avg

Rick reported performance regressions in bugzilla because of cpu frequency
being lower than before:
    https://bugzilla.kernel.org/show_bug.cgi?id=215045

He bisected the problem to:
commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")

This commit forces util_sum to be synced with the new util_avg after
removing the contribution of a task and before the next periodic sync. By
doing so, util_sum is rounded to its lower bound and might lose up to
LOAD_AVG_MAX-1 of accumulated contribution which has not yet been
reflected in util_avg.

Instead of always setting util_sum to the low bound of util_avg, which can
significantly lower the utilization of root cfs_rq after propagating the
change down into the hierarchy, we revert the change of util_sum and
propagate the difference.

In addition, we also check that cfs's util_sum always stays above the
lower bound for a given util_avg as it has been observed that
sched_entity's util_sum is sometimes above the cfs_rq's one.

Fixes: 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
Reported-by: Rick Yiu <rickyiu@google.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-2-vincent.guittot@linaro.org
---
 kernel/sched/fair.c | 16 +++++++++++++---
 kernel/sched/pelt.h |  4 +++-
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 095b0aa..d8f068d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3381,7 +3381,6 @@ void set_task_rq_fair(struct sched_entity *se,
 	se->avg.last_update_time = n_last_update_time;
 }
 
-
 /*
  * When on migration a sched_entity joins/leaves the PELT hierarchy, we need to
  * propagate its contribution. The key to this propagation is the invariant
@@ -3449,7 +3448,6 @@ void set_task_rq_fair(struct sched_entity *se,
  * XXX: only do this for the part of runnable > running ?
  *
  */
-
 static inline void
 update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
 {
@@ -3681,7 +3679,19 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 
 		r = removed_util;
 		sub_positive(&sa->util_avg, r);
-		sa->util_sum = sa->util_avg * divider;
+		sub_positive(&sa->util_sum, r * divider);
+		/*
+		 * Because of rounding, se->util_sum might ends up being +1 more than
+		 * cfs->util_sum. Although this is not a problem by itself, detaching
+		 * a lot of tasks with the rounding problem between 2 updates of
+		 * util_avg (~1ms) can make cfs->util_sum becoming null whereas
+		 * cfs_util_avg is not.
+		 * Check that util_sum is still above its lower bound for the new
+		 * util_avg. Given that period_contrib might have moved since the last
+		 * sync, we are only sure that util_sum must be above or equal to
+		 *    util_avg * minimum possible divider
+		 */
+		sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * PELT_MIN_DIVIDER);
 
 		r = removed_runnable;
 		sub_positive(&sa->runnable_avg, r);
diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h
index e06071b..c336f5f 100644
--- a/kernel/sched/pelt.h
+++ b/kernel/sched/pelt.h
@@ -37,9 +37,11 @@ update_irq_load_avg(struct rq *rq, u64 running)
 }
 #endif
 
+#define PELT_MIN_DIVIDER	(LOAD_AVG_MAX - 1024)
+
 static inline u32 get_pelt_divider(struct sched_avg *avg)
 {
-	return LOAD_AVG_MAX - 1024 + avg->period_contrib;
+	return PELT_MIN_DIVIDER + avg->period_contrib;
 }
 
 static inline void cfs_se_util_change(struct sched_avg *avg)

