[RFC PATCH] sched/fair: Fix impossible migrate_util scenario in load balance

* [RFC PATCH] sched/fair: Fix impossible migrate_util scenario in load balance
@ 2023-07-16  1:41 Qais Yousef
  2023-07-18 12:48 ` Vincent Guittot
  0 siblings, 1 reply; 13+ messages in thread
From: Qais Yousef @ 2023-07-16  1:41 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann
  Cc: linux-kernel, Qais Yousef

We've seen cases while running geekbench that an idle little core never
pulls a task from a bigger overloaded cluster for 100s of ms and
sometimes over a second.

It turned out that the load balance identifies this as a migrate_util
type since the local group (little cluster) has a spare capacity and
will try to pull a task. But the little cluster capacity is very small
nowadays (around 200 or less) and if two busy tasks are stuck on a mid
core which has a capacity of over 700, this means the util of each tasks
will be around 350+ range. Which is always bigger than the spare
capacity of the little group with a single idle core.

When trying to detach_tasks() we bail out then because of the comparison
of:

	if (util > env->imbalance)
		goto next;

In calculate_imbalance() we convert a migrate_util into migrate_task
type if the CPU trying to do the pull is idle. But we only do this if
env->imbalance is 0; which I can't understand. AFAICT env->imbalance
contains the local group's spare capacity. If it is 0, this means it's
fully busy.

Removing this condition fixes the problem, but since I can't fully
understand why it checks for 0, sending this as RFC. It could be a typo
and meant to check for

	env->imbalance != 0

instead?

Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a80a73909dc2..682d9d6a8691 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10288,7 +10288,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 			 * waiting task in this overloaded busiest group. Let's
 			 * try to pull it.
 			 */
-			if (env->idle != CPU_NOT_IDLE && env->imbalance == 0) {
+			if (env->idle != CPU_NOT_IDLE) {
 				env->migration_type = migrate_task;
 				env->imbalance = 1;
 			}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread