* [PATCH] sched,numa: move processes with load difference
@ 2014-05-13 23:55 Rik van Riel
  2014-05-14 15:29 ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Rik van Riel @ 2014-05-13 23:55 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, mgorman

Currently the NUMA balancing code refuses to move a task from a
heavily loaded node to a much less heavily loaded node if the
difference in load between them exceeds the imbalance threshold.

If the source load is larger than the destination load after the
swap, moving the task is fine. Chances are the load balancer would
move tasks in the same direction, anyway.
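
For illustration, with made-up numbers (and taking imbalance_pct = 125
purely as an example value): src_load = 1000, dst_load = 200, the task
being considered has a load of 100, and there is no task to swap back,
so the loads after the move would be src_load = 900, dst_load = 300.

	Old test (src/dst swapped so src_load is the smaller):
		300 * 125 =  37500 < 900 * 100 = 90000	-> move refused

	New test (no swap):
		900 * 125 = 112500 < 300 * 100 = 30000	is false -> move allowed

The move shrinks the load gap between the nodes from 800 to 600, so
refusing it only preserves the larger imbalance.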

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 kernel/sched/fair.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6d9474c..98a018f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1224,10 +1224,6 @@ static void task_numa_compare(struct task_numa_env *env,
 		src_load += load;
 	}
 
-	/* make src_load the smaller */
-	if (dst_load < src_load)
-		swap(dst_load, src_load);
-
 	if (src_load * env->imbalance_pct < dst_load * 100)
 		goto unlock;
 



* Re: [PATCH] sched,numa: move processes with load difference
  2014-05-13 23:55 [PATCH] sched,numa: move processes with load difference Rik van Riel
@ 2014-05-14 15:29 ` Peter Zijlstra
  2014-05-14 17:22   ` [PATCH] sched,numa: allow task switch if load imbalance improves Rik van Riel
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2014-05-14 15:29 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel, mingo, mgorman

On Tue, May 13, 2014 at 07:55:50PM -0400, Rik van Riel wrote:
> Currently the NUMA balancing code refuses to move a task from a
> heavily loaded node to a much less heavily loaded node if the
> difference in load between them exceeds the imbalance threshold.
> 
> If the source load is larger than the destination load after the
> swap, moving the task is fine. Chances are the load balancer would
> move tasks in the same direction, anyway.

So the intent of that code was that the swap (remember, the two tasks
need not have the same weight) wouldn't push us over the imbalance
threshold.

Now we want to test that threshold both ways, dst being small and src
being large, and vice versa.

However, if we've already crossed that imbalance threshold, crossing it
isn't the right test.

In that case I think we want to limit the swap such that the imbalance
improves (i.e. gets smaller).

Something like the below perhaps (entirely untested).

Or am I missing something?
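
As a purely made-up numerical illustration of that (again taking
imbalance_pct = 125 as an example value): say the nodes start out at
src_load = 1000 and dst_load = 300, and the proposed swap moves a net
load of 40 from src to dst:

	orig_imb = 1000 * 100 - 300 * 125 = 62500	(already over the threshold)
	imb      =  960 * 100 - 340 * 125 = 53500	(still over, but smaller)

Both states are over the threshold, but the swap shrinks the imbalance,
so it should be allowed; only a swap that pushes imb above orig_imb
should be refused.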

---
 kernel/sched/fair.c | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 28ccf502c63c..87f88568ecb3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1107,7 +1107,9 @@ static void task_numa_compare(struct task_numa_env *env,
 	struct rq *src_rq = cpu_rq(env->src_cpu);
 	struct rq *dst_rq = cpu_rq(env->dst_cpu);
 	struct task_struct *cur;
+	long orig_dst_load, orig_src_load;
 	long dst_load, src_load;
+	long imb, orig_imb;
 	long load;
 	long imp = (groupimp > 0) ? groupimp : taskimp;
 
@@ -1181,8 +1183,8 @@ static void task_numa_compare(struct task_numa_env *env,
 	 * In the overloaded case, try and keep the load balanced.
 	 */
 balance:
-	dst_load = env->dst_stats.load;
-	src_load = env->src_stats.load;
+	orig_dst_load = dst_load = env->dst_stats.load;
+	orig_src_load = src_load = env->src_stats.load;
 
 	/* XXX missing power terms */
 	load = task_h_load(env->p);
@@ -1195,12 +1197,33 @@ static void task_numa_compare(struct task_numa_env *env,
 		src_load += load;
 	}
 
-	/* make src_load the smaller */
+	/*
+	 * We want to compute the magnitude ('slope') of the imbalance
+	 * between src and dst; we're not interested in its direction.
+	 *
+	 * So make src_load the smaller.
+	 */
 	if (dst_load < src_load)
 		swap(dst_load, src_load);
 
-	if (src_load * env->imbalance_pct < dst_load * 100)
-		goto unlock;
+	/*
+	 * Test if the slope is over or under the imb_pct
+	 */
+	imb = dst_load * 100 - src_load * env->imbalance_pct;
+	if (imb >= 0) {
+		/*
+		 * If the slope is over the imb_pct, check the original state;
+		 * if we started out already being imbalanced, see if this swap
+		 * improves the situation by reducing the slope, even though
+		 * its still over the threshold.
+		 */
+		if (orig_dst_load < orig_src_load)
+			swap(orig_dst_load, orig_src_load);
+
+		orig_imb = orig_dst_load * 100 - orig_src_load * env->imbalance_pct;
+		if (imb > orig_imb)
+			goto unlock;
+	}
 
 assign:
 	task_numa_assign(env, cur, imp);


* [PATCH] sched,numa: allow task switch if load imbalance improves
  2014-05-14 15:29 ` Peter Zijlstra
@ 2014-05-14 17:22   ` Rik van Riel
  2014-05-19 13:10     ` [tip:sched/core] sched,numa: Allow " tip-bot for Rik van Riel
  2014-05-22 12:29     ` [tip:sched/core] sched/numa: " tip-bot for Rik van Riel
  0 siblings, 2 replies; 5+ messages in thread
From: Rik van Riel @ 2014-05-14 17:22 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, mingo, mgorman, chegu_vinod

On Wed, 14 May 2014 17:29:43 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> In that case I think we want to limit the swap such that the imbalance
> improves (i.e. gets smaller).
> 
> Something like the below perhaps (entirely untested).

You are absolutely right.

This is what it looks like with a helper function and more comments.
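
If you want to play with the arithmetic outside the kernel, here is a
minimal user-space sketch of the helper (illustration only: the loads
and the imbalance_pct value are made up, and swap_long() stands in for
the kernel's swap() macro):

#include <stdbool.h>
#include <stdio.h>

static void swap_long(long *a, long *b)
{
	long tmp = *a;

	*a = *b;
	*b = tmp;
}

static bool load_too_imbalanced(long orig_src_load, long orig_dst_load,
				long src_load, long dst_load,
				int imbalance_pct)
{
	long imb, old_imb;

	/* We care about the slope of the imbalance, not the direction. */
	if (dst_load < src_load)
		swap_long(&dst_load, &src_load);

	/* Is the difference below the threshold? */
	imb = dst_load * 100 - src_load * imbalance_pct;
	if (imb <= 0)
		return false;

	/* Over the threshold; compare with the old imbalance. */
	if (orig_dst_load < orig_src_load)
		swap_long(&orig_dst_load, &orig_src_load);

	old_imb = orig_dst_load * 100 - orig_src_load * imbalance_pct;

	/* Refuse the move only if it makes the imbalance worse. */
	return imb > old_imb;
}

int main(void)
{
	/* Swap that shrinks an existing imbalance: allowed (prints 0). */
	printf("%d\n", load_too_imbalanced(1000, 300, 960, 340, 125));

	/* Swap that widens the imbalance: refused (prints 1). */
	printf("%d\n", load_too_imbalanced(300, 1000, 260, 1040, 125));

	return 0;
}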

---8<---

Subject: sched,numa: allow task switch if load imbalance improves

Currently the NUMA balancing code only allows moving tasks between NUMA
nodes when the load on both nodes is in balance. This breaks down when
the load is imbalanced to begin with.

Allow tasks to be moved between NUMA nodes if the imbalance is small,
or if the new imbalance is smaller than the original one.

Signed-off-by: Rik van Riel <riel@redhat.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
---
 kernel/sched/fair.c | 46 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 36 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b8c6a01..887bd64 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1124,6 +1124,34 @@ static void task_numa_assign(struct task_numa_env *env,
 	env->best_cpu = env->dst_cpu;
 }
 
+static bool load_too_imbalanced(long orig_src_load, long orig_dst_load,
+				long src_load, long dst_load,
+				struct task_numa_env *env)
+{
+	long imb, old_imb;
+
+	/* We care about the slope of the imbalance, not the direction. */
+	if (dst_load < src_load)
+		swap(dst_load, src_load);
+
+	/* Is the difference below the threshold? */
+	imb = dst_load * 100 - src_load * env->imbalance_pct;
+	if (imb <= 0)
+		return false;
+
+	/*
+	 * The imbalance is above the allowed threshold.
+	 * Compare it with the old imbalance.
+	 */
+	if (orig_dst_load < orig_src_load)
+		swap(orig_dst_load, orig_src_load);
+	
+	old_imb = orig_dst_load * 100 - orig_src_load * env->imbalance_pct;
+
+	/* Would this change make things worse? */
+	return (imb > old_imb);
+}
+
 /*
  * This checks if the overall compute and NUMA accesses of the system would
  * be improved if the source tasks was migrated to the target dst_cpu taking
@@ -1136,7 +1164,8 @@ static void task_numa_compare(struct task_numa_env *env,
 	struct rq *src_rq = cpu_rq(env->src_cpu);
 	struct rq *dst_rq = cpu_rq(env->dst_cpu);
 	struct task_struct *cur;
-	long dst_load, src_load;
+	long orig_src_load, src_load;
+	long orig_dst_load, dst_load;
 	long load;
 	long imp = (groupimp > 0) ? groupimp : taskimp;
 
@@ -1210,13 +1239,13 @@ static void task_numa_compare(struct task_numa_env *env,
 	 * In the overloaded case, try and keep the load balanced.
 	 */
 balance:
-	dst_load = env->dst_stats.load;
-	src_load = env->src_stats.load;
+	orig_dst_load = env->dst_stats.load;
+	orig_src_load = env->src_stats.load;
 
 	/* XXX missing power terms */
 	load = task_h_load(env->p);
-	dst_load += load;
-	src_load -= load;
+	dst_load = orig_dst_load + load;
+	src_load = orig_src_load - load;
 
 	if (cur) {
 		load = task_h_load(cur);
@@ -1224,11 +1253,8 @@ static void task_numa_compare(struct task_numa_env *env,
 		src_load += load;
 	}
 
-	/* make src_load the smaller */
-	if (dst_load < src_load)
-		swap(dst_load, src_load);
-
-	if (src_load * env->imbalance_pct < dst_load * 100)
+	if (load_too_imbalanced(orig_src_load, orig_dst_load,
+				src_load, dst_load, env))
 		goto unlock;
 
 assign:



* [tip:sched/core] sched,numa: Allow task switch if load imbalance improves
  2014-05-14 17:22   ` [PATCH] sched,numa: allow task switch if load imbalance improves Rik van Riel
@ 2014-05-19 13:10     ` tip-bot for Rik van Riel
  2014-05-22 12:29     ` [tip:sched/core] sched/numa: " tip-bot for Rik van Riel
  1 sibling, 0 replies; 5+ messages in thread
From: tip-bot for Rik van Riel @ 2014-05-19 13:10 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, riel, hpa, mingo, peterz, tglx

Commit-ID:  b1fda183e09d70ea75d478ea055e2b6059476eff
Gitweb:     http://git.kernel.org/tip/b1fda183e09d70ea75d478ea055e2b6059476eff
Author:     Rik van Riel <riel@redhat.com>
AuthorDate: Wed, 14 May 2014 13:22:21 -0400
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Mon, 19 May 2014 22:02:42 +0900

sched,numa: Allow task switch if load imbalance improves

Currently the NUMA balancing code only allows moving tasks between NUMA
nodes when the load on both nodes is in balance. This breaks down when
the load is imbalanced to begin with.

Allow tasks to be moved between NUMA nodes if the imbalance is small,
or if the new imbalance is smaller than the original one.

Cc: mingo@kernel.org
Cc: mgorman@suse.de
Cc: chegu_vinod@hp.com
Signed-off-by: Rik van Riel <riel@redhat.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140514132221.274b3463@annuminas.surriel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/fair.c | 46 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 36 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f7cac2b..b899613 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1095,6 +1095,34 @@ static void task_numa_assign(struct task_numa_env *env,
 	env->best_cpu = env->dst_cpu;
 }
 
+static bool load_too_imbalanced(long orig_src_load, long orig_dst_load,
+				long src_load, long dst_load,
+				struct task_numa_env *env)
+{
+	long imb, old_imb;
+
+	/* We care about the slope of the imbalance, not the direction. */
+	if (dst_load < src_load)
+		swap(dst_load, src_load);
+
+	/* Is the difference below the threshold? */
+	imb = dst_load * 100 - src_load * env->imbalance_pct;
+	if (imb <= 0)
+		return false;
+
+	/*
+	 * The imbalance is above the allowed threshold.
+	 * Compare it with the old imbalance.
+	 */
+	if (orig_dst_load < orig_src_load)
+		swap(orig_dst_load, orig_src_load);
+
+	old_imb = orig_dst_load * 100 - orig_src_load * env->imbalance_pct;
+
+	/* Would this change make things worse? */
+	return (imb > old_imb);
+}
+
 /*
  * This checks if the overall compute and NUMA accesses of the system would
  * be improved if the source tasks was migrated to the target dst_cpu taking
@@ -1107,7 +1135,8 @@ static void task_numa_compare(struct task_numa_env *env,
 	struct rq *src_rq = cpu_rq(env->src_cpu);
 	struct rq *dst_rq = cpu_rq(env->dst_cpu);
 	struct task_struct *cur;
-	long dst_load, src_load;
+	long orig_src_load, src_load;
+	long orig_dst_load, dst_load;
 	long load;
 	long imp = (groupimp > 0) ? groupimp : taskimp;
 
@@ -1181,13 +1210,13 @@ static void task_numa_compare(struct task_numa_env *env,
 	 * In the overloaded case, try and keep the load balanced.
 	 */
 balance:
-	dst_load = env->dst_stats.load;
-	src_load = env->src_stats.load;
+	orig_dst_load = env->dst_stats.load;
+	orig_src_load = env->src_stats.load;
 
 	/* XXX missing power terms */
 	load = task_h_load(env->p);
-	dst_load += load;
-	src_load -= load;
+	dst_load = orig_dst_load + load;
+	src_load = orig_src_load - load;
 
 	if (cur) {
 		load = task_h_load(cur);
@@ -1195,11 +1224,8 @@ balance:
 		src_load += load;
 	}
 
-	/* make src_load the smaller */
-	if (dst_load < src_load)
-		swap(dst_load, src_load);
-
-	if (src_load * env->imbalance_pct < dst_load * 100)
+	if (load_too_imbalanced(orig_src_load, orig_dst_load,
+				src_load, dst_load, env))
 		goto unlock;
 
 assign:


* [tip:sched/core] sched/numa: Allow task switch if load imbalance improves
  2014-05-14 17:22   ` [PATCH] sched,numa: allow task switch if load imbalance improves Rik van Riel
  2014-05-19 13:10     ` [tip:sched/core] sched,numa: Allow " tip-bot for Rik van Riel
@ 2014-05-22 12:29     ` tip-bot for Rik van Riel
  1 sibling, 0 replies; 5+ messages in thread
From: tip-bot for Rik van Riel @ 2014-05-22 12:29 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, riel, hpa, mingo, peterz, tglx

Commit-ID:  e63da03639cc9e6e83b62e7ef8ffdbb92421416a
Gitweb:     http://git.kernel.org/tip/e63da03639cc9e6e83b62e7ef8ffdbb92421416a
Author:     Rik van Riel <riel@redhat.com>
AuthorDate: Wed, 14 May 2014 13:22:21 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 22 May 2014 11:16:38 +0200

sched/numa: Allow task switch if load imbalance improves

Currently the NUMA balancing code only allows moving tasks between NUMA
nodes when the load on both nodes is in balance. This breaks down when
the load is imbalanced to begin with.

Allow tasks to be moved between NUMA nodes if the imbalance is small,
or if the new imbalance is smaller than the original one.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: chegu_vinod@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/20140514132221.274b3463@annuminas.surriel.com
---
 kernel/sched/fair.c | 46 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 36 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f7cac2b..b899613 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1095,6 +1095,34 @@ static void task_numa_assign(struct task_numa_env *env,
 	env->best_cpu = env->dst_cpu;
 }
 
+static bool load_too_imbalanced(long orig_src_load, long orig_dst_load,
+				long src_load, long dst_load,
+				struct task_numa_env *env)
+{
+	long imb, old_imb;
+
+	/* We care about the slope of the imbalance, not the direction. */
+	if (dst_load < src_load)
+		swap(dst_load, src_load);
+
+	/* Is the difference below the threshold? */
+	imb = dst_load * 100 - src_load * env->imbalance_pct;
+	if (imb <= 0)
+		return false;
+
+	/*
+	 * The imbalance is above the allowed threshold.
+	 * Compare it with the old imbalance.
+	 */
+	if (orig_dst_load < orig_src_load)
+		swap(orig_dst_load, orig_src_load);
+
+	old_imb = orig_dst_load * 100 - orig_src_load * env->imbalance_pct;
+
+	/* Would this change make things worse? */
+	return (imb > old_imb);
+}
+
 /*
  * This checks if the overall compute and NUMA accesses of the system would
  * be improved if the source tasks was migrated to the target dst_cpu taking
@@ -1107,7 +1135,8 @@ static void task_numa_compare(struct task_numa_env *env,
 	struct rq *src_rq = cpu_rq(env->src_cpu);
 	struct rq *dst_rq = cpu_rq(env->dst_cpu);
 	struct task_struct *cur;
-	long dst_load, src_load;
+	long orig_src_load, src_load;
+	long orig_dst_load, dst_load;
 	long load;
 	long imp = (groupimp > 0) ? groupimp : taskimp;
 
@@ -1181,13 +1210,13 @@ static void task_numa_compare(struct task_numa_env *env,
 	 * In the overloaded case, try and keep the load balanced.
 	 */
 balance:
-	dst_load = env->dst_stats.load;
-	src_load = env->src_stats.load;
+	orig_dst_load = env->dst_stats.load;
+	orig_src_load = env->src_stats.load;
 
 	/* XXX missing power terms */
 	load = task_h_load(env->p);
-	dst_load += load;
-	src_load -= load;
+	dst_load = orig_dst_load + load;
+	src_load = orig_src_load - load;
 
 	if (cur) {
 		load = task_h_load(cur);
@@ -1195,11 +1224,8 @@ balance:
 		src_load += load;
 	}
 
-	/* make src_load the smaller */
-	if (dst_load < src_load)
-		swap(dst_load, src_load);
-
-	if (src_load * env->imbalance_pct < dst_load * 100)
+	if (load_too_imbalanced(orig_src_load, orig_dst_load,
+				src_load, dst_load, env))
 		goto unlock;
 
 assign:
