* [PATCH] sched,numa: ensure task_numa_migrate checks the preferred node
@ 2014-06-04 20:09 Rik van Riel
  2014-06-06 15:43 ` Peter Zijlstra
  2014-06-19 12:35 ` [tip:sched/core] sched/numa: Ensure task_numa_migrate() " tip-bot for Rik van Riel
  0 siblings, 2 replies; 3+ messages in thread
From: Rik van Riel @ 2014-06-04 20:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mgorman, mingo

The first thing task_numa_migrate does is check to see if there is
CPU capacity available on the preferred node, in order to move the
task there.

However, if the preferred node is all busy, we would skip considering
that node for task swaps in the subsequent loop. This prevents NUMA
convergence of tasks on busy systems.

However, swapping locations with a task on our preferred nid, when
the preferred nid is busy, is perfectly fine.

The fix is to also look for a CPU on our preferred nid when it is
totally busy.

This changes "perf bench numa mem -p 4 -t 20 -m -0 -P 1000" from
not converging in 15 minutes on my 4 node system, to converging in
10-20 seconds.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: stable@vger.kernel.org
---
This is a safe, simple patch to fix NUMA convergence, which is why
it should go to -stable, IMHO.

 kernel/sched/fair.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e5884d8..824e241 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1425,9 +1425,8 @@ static int task_numa_migrate(struct task_struct *p)
 	groupimp = group_weight(p, env.dst_nid, false) - groupweight;
 	update_numa_stats(&env.dst_stats, env.dst_nid);
 
-	/* If the preferred nid has capacity, try to use it. */
-	if (env.dst_stats.has_capacity)
-		task_numa_find_cpu(&env, taskimp, groupimp);
+	/* Try to find a spot on the preferred nid. */
+	task_numa_find_cpu(&env, taskimp, groupimp);
 
 	/* No space available on the preferred nid. Look elsewhere. */
 	if (env.best_cpu == -1) {



* Re: [PATCH] sched,numa: ensure task_numa_migrate checks the preferred node
  2014-06-04 20:09 [PATCH] sched,numa: ensure task_numa_migrate checks the preferred node Rik van Riel
@ 2014-06-06 15:43 ` Peter Zijlstra
  2014-06-19 12:35 ` [tip:sched/core] sched/numa: Ensure task_numa_migrate() " tip-bot for Rik van Riel
  1 sibling, 0 replies; 3+ messages in thread
From: Peter Zijlstra @ 2014-06-06 15:43 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel, mgorman, mingo

On Wed, Jun 04, 2014 at 04:09:42PM -0400, Rik van Riel wrote:
> The first thing task_numa_migrate does is check to see if there is
> CPU capacity available on the preferred node, in order to move the
> task there.
> 
> However, if the preferred node is all busy, we would skip considering
> that node for task swaps in the subsequent loop. This prevents NUMA
> convergence of tasks on busy systems.
> 
> However, swapping locations with a task on our preferred nid, when
> the preferred nid is busy, is perfectly fine.
> 
> The fix is to also look for a CPU on our preferred nid when it is
> totally busy.
> 
> This changes "perf bench numa mem -p 4 -t 20 -m -0 -P 1000" from
> not converging in 15 minutes on my 4 node system, to converging in
> 10-20 seconds.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>


Indeed so. Thanks Rik!



* [tip:sched/core] sched/numa: Ensure task_numa_migrate() checks the preferred node
  2014-06-04 20:09 [PATCH] sched,numa: ensure task_numa_migrate checks the preferred node Rik van Riel
  2014-06-06 15:43 ` Peter Zijlstra
@ 2014-06-19 12:35 ` tip-bot for Rik van Riel
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot for Rik van Riel @ 2014-06-19 12:35 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, riel, hpa, mingo, torvalds, peterz, tglx

Commit-ID:  a43455a1d572daf7b730fe12eb747d1e17411365
Gitweb:     http://git.kernel.org/tip/a43455a1d572daf7b730fe12eb747d1e17411365
Author:     Rik van Riel <riel@redhat.com>
AuthorDate: Wed, 4 Jun 2014 16:09:42 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 18 Jun 2014 18:29:57 +0200

sched/numa: Ensure task_numa_migrate() checks the preferred node

The first thing task_numa_migrate() does is check to see if there is
CPU capacity available on the preferred node, in order to move the
task there.

However, if the preferred node is all busy, we would skip considering
that node for task swaps in the subsequent loop. This prevents NUMA
convergence of tasks on busy systems.

However, swapping locations with a task on our preferred nid, when
the preferred nid is busy, is perfectly fine.

The fix is to also look for a CPU on our preferred nid when it is
totally busy.

This changes "perf bench numa mem -p 4 -t 20 -m -0 -P 1000" from
not converging in 15 minutes on my 4 node system, to converging in
10-20 seconds.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140604160942.6969b101@cuia.bos.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fea7d33..8fbb011 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1302,9 +1302,8 @@ static int task_numa_migrate(struct task_struct *p)
 	groupimp = group_weight(p, env.dst_nid) - groupweight;
 	update_numa_stats(&env.dst_stats, env.dst_nid);
 
-	/* If the preferred nid has free capacity, try to use it. */
-	if (env.dst_stats.has_free_capacity)
-		task_numa_find_cpu(&env, taskimp, groupimp);
+	/* Try to find a spot on the preferred nid. */
+	task_numa_find_cpu(&env, taskimp, groupimp);
 
 	/* No space available on the preferred nid. Look elsewhere. */
 	if (env.best_cpu == -1) {
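
The effect of dropping the capacity check can be illustrated with a small
standalone model. This is only a sketch, not kernel code: struct toy_node,
toy_find_cpu(), toy_pick_node() and the "patched" flag are hypothetical
names made up for illustration, and the per-node "swap_gain" number is an
assumed stand-in for whatever task_numa_find_cpu() would conclude about
swapping with a task on that node. The one structural assumption taken from
the changelog is that the fallback loop skips the preferred nid.

```c
#include <stdbool.h>

#define TOY_NR_NODES 4

/* Hypothetical per-node summary; NOT the kernel's struct numa_stats. */
struct toy_node {
	bool has_idle_cpu;	/* node has spare CPU capacity */
	int swap_gain;		/* assumed benefit of the best swap; <= 0 means none */
};

/* A node is usable if we can move to an idle CPU or swap profitably. */
static bool toy_find_cpu(const struct toy_node *node)
{
	return node->has_idle_cpu || node->swap_gain > 0;
}

/* Returns the chosen node, or -1 if no placement was found. */
static int toy_pick_node(const struct toy_node *nodes, int src_nid,
			 int preferred_nid, bool patched)
{
	int nid, best_nid = -1, best_gain = 0;

	/*
	 * Step 1: the preferred node. Before the patch this was only
	 * tried when the node had capacity; after it, always.
	 */
	if (patched || nodes[preferred_nid].has_idle_cpu) {
		if (toy_find_cpu(&nodes[preferred_nid]))
			return preferred_nid;
	}

	/* Step 2: fallback scan, skipping the source and preferred nodes. */
	for (nid = 0; nid < TOY_NR_NODES; nid++) {
		if (nid == src_nid || nid == preferred_nid)
			continue;
		if (!toy_find_cpu(&nodes[nid]))
			continue;
		if (best_nid == -1 || nodes[nid].swap_gain > best_gain) {
			best_nid = nid;
			best_gain = nodes[nid].swap_gain;
		}
	}

	return best_nid;
}
```

With every node busy and a profitable swap available only on preferred
node 1, toy_pick_node(nodes, 0, 1, false) finds nothing and returns -1,
while toy_pick_node(nodes, 0, 1, true) returns 1. When the preferred node
has an idle CPU, both variants pick it, so in this model the change only
matters in the fully busy case described in the changelog.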
