From: Mel Gorman <mgorman@techsingularity.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Aubrey Li <aubrey.li@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Mel Gorman <mgorman@techsingularity.net>
Subject: [PATCH 3/4] sched/numa: Apply imbalance limitations consistently
Date: Wed, 11 May 2022 15:30:37 +0100
Message-ID: <20220511143038.4620-4-mgorman@techsingularity.net>
In-Reply-To: <20220511143038.4620-1-mgorman@techsingularity.net>

The imbalance limitations are applied inconsistently at fork time and
at runtime. First, at fork, a new task can remain local until there
are too many running tasks, even if the degree of imbalance is larger
than NUMA_IMBALANCE_MIN, which is different to the behaviour at
runtime. Second, the imbalance figure used during load balancing is
different to the one used at NUMA placement. Load balancing uses the
number of tasks that must move to restore balance whereas NUMA
balancing uses the total imbalance.
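
As an illustration of the mismatch, below is a sketch of the two
figures that were previously compared against the same cutoff. This
is illustrative only and not part of the patch; the real fair.c code
derives the gap from sched group statistics:

/* Load balancing: number of tasks that must move to restore balance. */
static long lb_tasks_to_move(long gap)
{
	return gap > 0 ? gap >> 1 : 0;
}

/* NUMA balancing: the total imbalance between the two groups. */
static long numa_total_imbalance(long gap)
{
	return gap < 0 ? -gap : gap;
}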

In combination, these inconsistencies mean that a parallel workload
using a small number of CPUs, without any scheduler policies applied,
can show highly variable run-to-run performance.
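
Address this by using the same total imbalance figure on both paths:
find_idlest_group() feeds the absolute idle-CPU difference through
adjust_numa_imbalance(), and the load balancer converts the adjusted
figure into a number of tasks to move only at the end. A condensed
sketch of the resulting flow, illustrative only, folding
allow_numa_imbalance() and adjust_numa_imbalance() together:

#define NUMA_IMBALANCE_MIN 2

static long adjusted_tasks_to_move(long total_imbalance, int dst_running,
				   int imb_numa_nr)
{
	/* allow_numa_imbalance(): the destination is lightly loaded */
	if (dst_running <= imb_numa_nr &&
	    total_imbalance <= NUMA_IMBALANCE_MIN)
		return 0;	/* tolerate a pair of communicating tasks */

	/* number of tasks to move to restore balance */
	return total_imbalance >> 1;
}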

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c | 49 ++++++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 21 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 03b1ad79d47d..602c05b22805 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9108,6 +9108,24 @@ static inline bool allow_numa_imbalance(int running, int imb_numa_nr)
 	return running <= imb_numa_nr;
 }
 
+#define NUMA_IMBALANCE_MIN 2
+
+static inline long adjust_numa_imbalance(int imbalance,
+				int dst_running, int imb_numa_nr)
+{
+	if (!allow_numa_imbalance(dst_running, imb_numa_nr))
+		return imbalance;
+
+	/*
+	 * Allow a small imbalance based on a simple pair of communicating
+	 * tasks that remain local when the destination is lightly loaded.
+	 */
+	if (imbalance <= NUMA_IMBALANCE_MIN)
+		return 0;
+
+	return imbalance;
+}
+
 /*
  * find_idlest_group() finds and returns the least busy CPU group within the
  * domain.
@@ -9245,8 +9263,12 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
 			 * allowed. If there is a real need of migration,
 			 * periodic load balance will take care of it.
 			 */
-			if (allow_numa_imbalance(local_sgs.sum_nr_running + 1, sd->imb_numa_nr))
+			imbalance = abs(local_sgs.idle_cpus - idlest_sgs.idle_cpus);
+			if (!adjust_numa_imbalance(imbalance,
+						   local_sgs.sum_nr_running + 1,
+						   sd->imb_numa_nr)) {
 				return NULL;
+			}
 		}
 
 		/*
@@ -9334,24 +9356,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 	}
 }
 
-#define NUMA_IMBALANCE_MIN 2
-
-static inline long adjust_numa_imbalance(int imbalance,
-				int dst_running, int imb_numa_nr)
-{
-	if (!allow_numa_imbalance(dst_running, imb_numa_nr))
-		return imbalance;
-
-	/*
-	 * Allow a small imbalance based on a simple pair of communicating
-	 * tasks that remain local when the destination is lightly loaded.
-	 */
-	if (imbalance <= NUMA_IMBALANCE_MIN)
-		return 0;
-
-	return imbalance;
-}
-
 /**
  * calculate_imbalance - Calculate the amount of imbalance present within the
  *			 groups of a given sched_domain during load balance.
@@ -9436,7 +9440,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 			 */
 			env->migration_type = migrate_task;
 			lsub_positive(&nr_diff, local->sum_nr_running);
-			env->imbalance = nr_diff >> 1;
+			env->imbalance = nr_diff;
 		} else {
 
 			/*
@@ -9445,7 +9449,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 			 */
 			env->migration_type = migrate_task;
 			env->imbalance = max_t(long, 0, (local->idle_cpus -
-						 busiest->idle_cpus) >> 1);
+						 busiest->idle_cpus));
 		}
 
 		/* Consider allowing a small imbalance between NUMA groups */
@@ -9454,6 +9458,9 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 				local->sum_nr_running + 1, env->sd->imb_numa_nr);
 		}
 
+		/* Number of tasks to move to restore balance */
+		env->imbalance >>= 1;
+
 		return;
 	}
 
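A quick check of the condensed sketch above with made-up numbers: with
imb_numa_nr = 4 and a destination running 2 tasks, a total imbalance
of 2 is tolerated (adjusted_tasks_to_move() returns 0) so a
communicating pair stays local, while a total imbalance of 6 is not:
the fork path may pick the idlest group and the load balancer computes
3 tasks to move.
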
-- 
2.34.1


