* [PATCH 0/2] node capacity fixes for NUMA balancing
@ 2014-08-04 17:23 riel
  2014-08-04 17:23 ` [PATCH 1/2] sched,numa: fix off-by-one in capacity check riel
  2014-08-04 17:23 ` [PATCH 2/2] sched,numa: fix numa capacity computation riel
  0 siblings, 2 replies; 5+ messages in thread
From: riel @ 2014-08-04 17:23 UTC
  To: linux-kernel
  Cc: peterz, mgorman, mingo, vincent.guittot, Morten.Rasmussen,
	nicolas.pitre, efault

The NUMA balancing code has a few issues with determining the
capacity of nodes, and with using that capacity when moving a task.

First, the NUMA balancing code does not have the equivalent of
commit c61037e9 to fix the "phantom cores" phenomenon in the presence
of SMT.

Secondly, the NUMA balancing code will happily move a task from
a node that is loaded to capacity, to another node that is
loaded to capacity. This can leave the second node overloaded.



* [PATCH 1/2] sched,numa: fix off-by-one in capacity check
  2014-08-04 17:23 [PATCH 0/2] node capacity fixes for NUMA balancing riel
@ 2014-08-04 17:23 ` riel
  2014-08-12 14:53   ` [tip:sched/core] sched/numa: Fix " tip-bot for Rik van Riel
  2014-08-04 17:23 ` [PATCH 2/2] sched,numa: fix numa capacity computation riel
  1 sibling, 1 reply; 5+ messages in thread
From: riel @ 2014-08-04 17:23 UTC
  To: linux-kernel
  Cc: peterz, mgorman, mingo, vincent.guittot, Morten.Rasmussen,
	nicolas.pitre, efault

From: Rik van Riel <riel@redhat.com>

Commit a43455a1d572daf7b730fe12eb747d1e17411365 ensures that
task_numa_migrate will call task_numa_compare on the preferred
node all the time, even when the preferred node has no free capacity.

This could lead to a performance regression if nr_running == capacity
on both the source and the destination node: the move leaves the
destination overloaded. This can be avoided by also skipping the move
when nr_running == capacity on the source node, i.e. by checking
nr_running <= capacity, which is one stricter than checking
.has_free_capacity (nr_running < capacity).
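
The difference between the two checks can be sketched in isolation.
This is a minimal illustration, not the kernel code: struct node_stats,
make_stats(), skip_move_old() and skip_move_new() are hypothetical
names that only mirror the three numa_stats fields used by the
condition in task_numa_compare.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the fields of struct numa_stats used here. */
struct node_stats {
	unsigned long nr_running;
	unsigned long task_capacity;
	bool has_free_capacity;	/* nr_running < task_capacity */
};

static struct node_stats make_stats(unsigned long nr_running,
				    unsigned long task_capacity)
{
	struct node_stats ns = {
		.nr_running = nr_running,
		.task_capacity = task_capacity,
		.has_free_capacity = nr_running < task_capacity,
	};
	return ns;
}

/* Old condition: skip the move only if the source has spare room. */
static bool skip_move_old(const struct node_stats *src,
			  const struct node_stats *dst)
{
	return src->has_free_capacity && !dst->has_free_capacity;
}

/* New condition: also skip when the source is exactly at capacity. */
static bool skip_move_new(const struct node_stats *src,
			  const struct node_stats *dst)
{
	return src->nr_running <= src->task_capacity &&
	       !dst->has_free_capacity;
}
```

With nr_running == task_capacity == 4 on both nodes, skip_move_old()
returns false (the move is still considered) while skip_move_new()
returns true. With an overloaded source (5 running, capacity 4) neither
condition skips the move, so that case is unchanged.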

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bfa3c86..678ed03 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1206,7 +1206,7 @@ static void task_numa_compare(struct task_numa_env *env,
 
 	if (!cur) {
 		/* Is there capacity at our destination? */
-		if (env->src_stats.has_free_capacity &&
+		if (env->src_stats.nr_running <= env->src_stats.task_capacity &&
 		    !env->dst_stats.has_free_capacity)
 			goto unlock;
 
-- 
1.9.3



* [PATCH 2/2] sched,numa: fix numa capacity computation
  2014-08-04 17:23 [PATCH 0/2] node capacity fixes for NUMA balancing riel
  2014-08-04 17:23 ` [PATCH 1/2] sched,numa: fix off-by-one in capacity check riel
@ 2014-08-04 17:23 ` riel
  2014-08-12 14:53   ` [tip:sched/core] sched/numa: Fix " tip-bot for Rik van Riel
  1 sibling, 1 reply; 5+ messages in thread
From: riel @ 2014-08-04 17:23 UTC
  To: linux-kernel
  Cc: peterz, mgorman, mingo, vincent.guittot, Morten.Rasmussen,
	nicolas.pitre, efault

From: Rik van Riel <riel@redhat.com>

Commit c61037e9 fixes the phenomenon of 'phantom' cores due to
N*frac(smt_power) >= 1 by limiting the capacity to the actual
number of cores in the load balancing code.

This patch applies the same correction to the NUMA balancing
code.
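
As a worked illustration of the arithmetic, the sketch below compares
the old and new task_capacity computations outside the kernel. It is
not the kernel implementation: the helper names are hypothetical, the
rounding helpers are simplified unsigned versions of DIV_ROUND_UP and
DIV_ROUND_CLOSEST, and the per-thread capacity of 589 (smt_power of
1178 split across two siblings) is an assumption chosen for the example.

```c
#include <assert.h>

#define SCHED_CAPACITY_SCALE 1024UL

/* Simplified unsigned equivalents of the kernel rounding macros. */
static unsigned long div_round_up(unsigned long n, unsigned long d)
{
	return (n + d - 1) / d;
}

static unsigned long div_round_closest(unsigned long n, unsigned long d)
{
	return (n + d / 2) / d;
}

/* Old computation: round the summed capacity to whole "cores". */
static unsigned long task_capacity_old(unsigned long compute_capacity)
{
	return div_round_closest(compute_capacity, SCHED_CAPACITY_SCALE);
}

/* New computation: cap the rounded value at the real core count. */
static unsigned long task_capacity_new(unsigned long cpus,
				       unsigned long compute_capacity)
{
	unsigned long smt, cores, rounded;

	assert(cpus > 0 && compute_capacity > 0);

	/* smt := ceil(cpus / capacity), assumes: 1 < smt_power < 2 */
	smt = div_round_up(SCHED_CAPACITY_SCALE * cpus, compute_capacity);
	cores = cpus / smt;
	rounded = div_round_closest(compute_capacity, SCHED_CAPACITY_SCALE);

	return cores < rounded ? cores : rounded;
}
```

For a node with 8 cores and 2 threads per core (16 CPUs) under the
assumed per-thread capacity, compute_capacity is 16 * 589 = 9424. The
old computation rounds 9424 / 1024 to 9, one more than the 8 real
cores. The new one derives smt = 2, cores = 16 / 2 = 8, and returns
min(8, 9) = 8. Without SMT (4 CPUs at 1024 each) both return 4.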

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 kernel/sched/fair.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 678ed03..376bc07c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1038,7 +1038,8 @@ struct numa_stats {
  */
 static void update_numa_stats(struct numa_stats *ns, int nid)
 {
-	int cpu, cpus = 0;
+	int smt, cpu, cpus = 0;
+	unsigned long capacity;
 
 	memset(ns, 0, sizeof(*ns));
 	for_each_cpu(cpu, cpumask_of_node(nid)) {
@@ -1062,8 +1063,12 @@ static void update_numa_stats(struct numa_stats *ns, int nid)
 	if (!cpus)
 		return;
 
-	ns->task_capacity =
-		DIV_ROUND_CLOSEST(ns->compute_capacity, SCHED_CAPACITY_SCALE);
+	/* smt := ceil(cpus / capacity), assumes: 1 < smt_power < 2 */
+	smt = DIV_ROUND_UP(SCHED_CAPACITY_SCALE * cpus, ns->compute_capacity);
+	capacity = cpus / smt; /* cores */
+
+	ns->task_capacity = min_t(unsigned, capacity,
+		DIV_ROUND_CLOSEST(ns->compute_capacity, SCHED_CAPACITY_SCALE));
 	ns->has_free_capacity = (ns->nr_running < ns->task_capacity);
 }
 
-- 
1.9.3



* [tip:sched/core] sched/numa: Fix off-by-one in capacity check
  2014-08-04 17:23 ` [PATCH 1/2] sched,numa: fix off-by-one in capacity check riel
@ 2014-08-12 14:53   ` tip-bot for Rik van Riel
  0 siblings, 0 replies; 5+ messages in thread
From: tip-bot for Rik van Riel @ 2014-08-12 14:53 UTC
  To: linux-tip-commits; +Cc: linux-kernel, riel, hpa, mingo, torvalds, peterz, tglx

Commit-ID:  b932c03c34f3b03c7364c06aa8cae5b74609fc41
Gitweb:     http://git.kernel.org/tip/b932c03c34f3b03c7364c06aa8cae5b74609fc41
Author:     Rik van Riel <riel@redhat.com>
AuthorDate: Mon, 4 Aug 2014 13:23:27 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 12 Aug 2014 12:48:22 +0200

sched/numa: Fix off-by-one in capacity check

Commit a43455a1d572daf7b730fe12eb747d1e17411365 ensures that
task_numa_migrate will call task_numa_compare on the preferred
node all the time, even when the preferred node has no free capacity.

This could lead to a performance regression if nr_running == capacity
on both the source and the destination node: the move leaves the
destination overloaded. This can be avoided by also skipping the move
when nr_running == capacity on the source node, i.e. by checking
nr_running <= capacity, which is one stricter than checking
.has_free_capacity (nr_running < capacity).

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: vincent.guittot@linaro.org
Cc: Morten.Rasmussen@arm.com
Cc: nicolas.pitre@linaro.org
Cc: efault@gmx.de
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1407173008-9334-2-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index df1ed17..e1cf419 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1206,7 +1206,7 @@ static void task_numa_compare(struct task_numa_env *env,
 
 	if (!cur) {
 		/* Is there capacity at our destination? */
-		if (env->src_stats.has_free_capacity &&
+		if (env->src_stats.nr_running <= env->src_stats.task_capacity &&
 		    !env->dst_stats.has_free_capacity)
 			goto unlock;
 


* [tip:sched/core] sched/numa: Fix numa capacity computation
  2014-08-04 17:23 ` [PATCH 2/2] sched,numa: fix numa capacity computation riel
@ 2014-08-12 14:53   ` tip-bot for Rik van Riel
  0 siblings, 0 replies; 5+ messages in thread
From: tip-bot for Rik van Riel @ 2014-08-12 14:53 UTC
  To: linux-tip-commits; +Cc: linux-kernel, riel, hpa, mingo, torvalds, peterz, tglx

Commit-ID:  83d7f2424741c9dc76c21377c9d00d47abaf88df
Gitweb:     http://git.kernel.org/tip/83d7f2424741c9dc76c21377c9d00d47abaf88df
Author:     Rik van Riel <riel@redhat.com>
AuthorDate: Mon, 4 Aug 2014 13:23:28 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 12 Aug 2014 12:48:23 +0200

sched/numa: Fix numa capacity computation

Commit c61037e9 fixes the phenomenon of 'phantom' cores due to
N*frac(smt_power) >= 1 by limiting the capacity to the actual
number of cores in the load balancing code.

This patch applies the same correction to the NUMA balancing
code.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: vincent.guittot@linaro.org
Cc: Morten.Rasmussen@arm.com
Cc: nicolas.pitre@linaro.org
Cc: efault@gmx.de
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1407173008-9334-3-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e1cf419..1413c44 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1038,7 +1038,8 @@ struct numa_stats {
  */
 static void update_numa_stats(struct numa_stats *ns, int nid)
 {
-	int cpu, cpus = 0;
+	int smt, cpu, cpus = 0;
+	unsigned long capacity;
 
 	memset(ns, 0, sizeof(*ns));
 	for_each_cpu(cpu, cpumask_of_node(nid)) {
@@ -1062,8 +1063,12 @@ static void update_numa_stats(struct numa_stats *ns, int nid)
 	if (!cpus)
 		return;
 
-	ns->task_capacity =
-		DIV_ROUND_CLOSEST(ns->compute_capacity, SCHED_CAPACITY_SCALE);
+	/* smt := ceil(cpus / capacity), assumes: 1 < smt_power < 2 */
+	smt = DIV_ROUND_UP(SCHED_CAPACITY_SCALE * cpus, ns->compute_capacity);
+	capacity = cpus / smt; /* cores */
+
+	ns->task_capacity = min_t(unsigned, capacity,
+		DIV_ROUND_CLOSEST(ns->compute_capacity, SCHED_CAPACITY_SCALE));
 	ns->has_free_capacity = (ns->nr_running < ns->task_capacity);
 }
 

