* [PATCH 0/2] node capacity fixes for NUMA balancing
@ 2014-08-04 17:23 riel
From: riel @ 2014-08-04 17:23 UTC (permalink / raw)
To: linux-kernel
Cc: peterz, mgorman, mingo, vincent.guittot, Morten.Rasmussen, nicolas.pitre, efault

The NUMA balancing code has a few issues with determining the capacity
of nodes, and with using that capacity when doing a task move.

First, the NUMA balancing code does not have the equivalent of commit
c61037e9 to fix the "phantom cores" phenomenon in the presence of SMT.

Secondly, the NUMA balancing code will happily move a task from a node
that is loaded to capacity to another node that is also loaded to
capacity. This can leave the second node overloaded.
* [PATCH 1/2] sched,numa: fix off-by-one in capacity check
@ 2014-08-04 17:23 riel
From: riel @ 2014-08-04 17:23 UTC (permalink / raw)
To: linux-kernel
Cc: peterz, mgorman, mingo, vincent.guittot, Morten.Rasmussen, nicolas.pitre, efault

From: Rik van Riel <riel@redhat.com>

Commit a43455a1d572daf7b730fe12eb747d1e17411365 ensures that
task_numa_migrate will call task_numa_compare on the preferred node all
the time, even when the preferred node has no free capacity.

This could lead to a performance regression if nr_running == capacity
on both the source and the destination node. This can be avoided by
also checking for nr_running == capacity on the source node, which is
one stricter than checking .has_free_capacity.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bfa3c86..678ed03 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1206,7 +1206,7 @@ static void task_numa_compare(struct task_numa_env *env,

 	if (!cur) {
 		/* Is there capacity at our destination? */
-		if (env->src_stats.has_free_capacity &&
+		if (env->src_stats.nr_running <= env->src_stats.task_capacity &&
 		    !env->dst_stats.has_free_capacity)
 			goto unlock;
--
1.9.3
* [tip:sched/core] sched/numa: Fix off-by-one in capacity check
@ 2014-08-12 14:53 tip-bot for Rik van Riel
From: tip-bot for Rik van Riel @ 2014-08-12 14:53 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, riel, hpa, mingo, torvalds, peterz, tglx

Commit-ID:  b932c03c34f3b03c7364c06aa8cae5b74609fc41
Gitweb:     http://git.kernel.org/tip/b932c03c34f3b03c7364c06aa8cae5b74609fc41
Author:     Rik van Riel <riel@redhat.com>
AuthorDate: Mon, 4 Aug 2014 13:23:27 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 12 Aug 2014 12:48:22 +0200

sched/numa: Fix off-by-one in capacity check

Commit a43455a1d572daf7b730fe12eb747d1e17411365 ensures that
task_numa_migrate will call task_numa_compare on the preferred node all
the time, even when the preferred node has no free capacity.

This could lead to a performance regression if nr_running == capacity
on both the source and the destination node. This can be avoided by
also checking for nr_running == capacity on the source node, which is
one stricter than checking .has_free_capacity.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: vincent.guittot@linaro.org
Cc: Morten.Rasmussen@arm.com
Cc: nicolas.pitre@linaro.org
Cc: efault@gmx.de
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1407173008-9334-2-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index df1ed17..e1cf419 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1206,7 +1206,7 @@ static void task_numa_compare(struct task_numa_env *env,

 	if (!cur) {
 		/* Is there capacity at our destination? */
-		if (env->src_stats.has_free_capacity &&
+		if (env->src_stats.nr_running <= env->src_stats.task_capacity &&
 		    !env->dst_stats.has_free_capacity)
 			goto unlock;
* [PATCH 2/2] sched,numa: fix numa capacity computation
@ 2014-08-04 17:23 riel
From: riel @ 2014-08-04 17:23 UTC (permalink / raw)
To: linux-kernel
Cc: peterz, mgorman, mingo, vincent.guittot, Morten.Rasmussen, nicolas.pitre, efault

From: Rik van Riel <riel@redhat.com>

Commit c61037e9 fixes the phenomenon of 'phantom' cores due to
N*frac(smt_power) >= 1 by limiting the capacity to the actual number
of cores in the load balancing code.

This patch applies the same correction to the NUMA balancing code.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 kernel/sched/fair.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 678ed03..376bc07c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1038,7 +1038,8 @@ struct numa_stats {
  */
 static void update_numa_stats(struct numa_stats *ns, int nid)
 {
-	int cpu, cpus = 0;
+	int smt, cpu, cpus = 0;
+	unsigned long capacity;

 	memset(ns, 0, sizeof(*ns));
 	for_each_cpu(cpu, cpumask_of_node(nid)) {
@@ -1062,8 +1063,12 @@ static void update_numa_stats(struct numa_stats *ns, int nid)
 	if (!cpus)
 		return;

-	ns->task_capacity =
-		DIV_ROUND_CLOSEST(ns->compute_capacity, SCHED_CAPACITY_SCALE);
+	/* smt := ceil(cpus / capacity), assumes: 1 < smt_power < 2 */
+	smt = DIV_ROUND_UP(SCHED_CAPACITY_SCALE * cpus, ns->compute_capacity);
+	capacity = cpus / smt; /* cores */
+
+	ns->task_capacity = min_t(unsigned, capacity,
+		DIV_ROUND_CLOSEST(ns->compute_capacity, SCHED_CAPACITY_SCALE));
 	ns->has_free_capacity = (ns->nr_running < ns->task_capacity);
}
--
1.9.3
* [tip:sched/core] sched/numa: Fix numa capacity computation
@ 2014-08-12 14:53 tip-bot for Rik van Riel
From: tip-bot for Rik van Riel @ 2014-08-12 14:53 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, riel, hpa, mingo, torvalds, peterz, tglx

Commit-ID:  83d7f2424741c9dc76c21377c9d00d47abaf88df
Gitweb:     http://git.kernel.org/tip/83d7f2424741c9dc76c21377c9d00d47abaf88df
Author:     Rik van Riel <riel@redhat.com>
AuthorDate: Mon, 4 Aug 2014 13:23:28 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 12 Aug 2014 12:48:23 +0200

sched/numa: Fix numa capacity computation

Commit c61037e9 fixes the phenomenon of 'phantom' cores due to
N*frac(smt_power) >= 1 by limiting the capacity to the actual number
of cores in the load balancing code.

This patch applies the same correction to the NUMA balancing code.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: vincent.guittot@linaro.org
Cc: Morten.Rasmussen@arm.com
Cc: nicolas.pitre@linaro.org
Cc: efault@gmx.de
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1407173008-9334-3-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e1cf419..1413c44 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1038,7 +1038,8 @@ struct numa_stats {
  */
 static void update_numa_stats(struct numa_stats *ns, int nid)
 {
-	int cpu, cpus = 0;
+	int smt, cpu, cpus = 0;
+	unsigned long capacity;

 	memset(ns, 0, sizeof(*ns));
 	for_each_cpu(cpu, cpumask_of_node(nid)) {
@@ -1062,8 +1063,12 @@ static void update_numa_stats(struct numa_stats *ns, int nid)
 	if (!cpus)
 		return;

-	ns->task_capacity =
-		DIV_ROUND_CLOSEST(ns->compute_capacity, SCHED_CAPACITY_SCALE);
+	/* smt := ceil(cpus / capacity), assumes: 1 < smt_power < 2 */
+	smt = DIV_ROUND_UP(SCHED_CAPACITY_SCALE * cpus, ns->compute_capacity);
+	capacity = cpus / smt; /* cores */
+
+	ns->task_capacity = min_t(unsigned, capacity,
+		DIV_ROUND_CLOSEST(ns->compute_capacity, SCHED_CAPACITY_SCALE));
 	ns->has_free_capacity = (ns->nr_running < ns->task_capacity);
 }