linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH] sched/numa: do load balance between remote nodes
@ 2012-06-06  6:52 Alex Shi
  2012-06-06  9:01 ` Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Alex Shi @ 2012-06-06  6:52 UTC (permalink / raw)
  To: a.p.zijlstra
  Cc: anton, benh, cmetcalf, dhowells, davem, fenghua.yu, hpa, ink,
	linux-alpha, linux-ia64, linux-kernel, linux-mips, linuxppc-dev,
	linux-sh, mattst88, paulus, lethal, ralf, rth, sparclinux,
	tony.luck, x86, sivanich, greg.pearson, kamezawa.hiroyu,
	bob.picco, chris.mason, torvalds, akpm, mingo, pjt, tglx,
	seto.hidetoshi, ak, arjan.van.de.ven

Commit cb83b629b removed the NODE sched domain and started checking
whether the node distance in the SLIT table is farther than
REMOTE_DISTANCE; if so, the domain loses its load-balance
opportunities at the exec/fork/wake_affine points.

But even when the node distance is farther than REMOTE_DISTANCE,
modern CPUs have QPI-like interconnects that keep memory access
between nodes from being too slow. So losing this balancing on NUMA
machines causes a huge performance regression on benchmarks such as
hackbench, tbench, netperf and oltp.

This patch restores the old scheduler behavior on all my Intel
platforms (NHM EP/EX, WSM EP, SNB EP/EP4S) and so removes the
performance regressions. (All of them have just two distance values:
10 and 21.)

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/core.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 39eb601..b2ee41a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6286,7 +6286,7 @@ static int sched_domains_curr_level;
 
 static inline int sd_local_flags(int level)
 {
-	if (sched_domains_numa_distance[level] > REMOTE_DISTANCE)
+	if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE)
 		return 0;
 
 	return SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE;
-- 
1.7.5.4
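
To make the effect of the one-line change concrete, here is a minimal
user-space sketch of the comparison in sd_local_flags() for a machine
that reports only the two SLIT distances mentioned above (10 and 21).
It is an illustration, not kernel code: the threshold values 10/20/30
are assumed to be the generic defaults (architectures may override
RECLAIM_DISTANCE), the flag bit values are placeholders, and
sd_local_flags_for() and check_two_distance_machine() are hypothetical
helper names.

```c
#include <assert.h>

/* Assumed generic defaults; an architecture may define its own values. */
#define LOCAL_DISTANCE   10
#define REMOTE_DISTANCE  20
#define RECLAIM_DISTANCE 30

/* Placeholder bit values for illustration only, not the kernel's. */
#define SD_BALANCE_EXEC 0x1
#define SD_BALANCE_FORK 0x2
#define SD_WAKE_AFFINE  0x4

/*
 * Hypothetical helper: same shape as sd_local_flags(), but it takes the
 * node distance and the cut-off as arguments instead of reading
 * sched_domains_numa_distance[level].
 */
int sd_local_flags_for(int distance, int threshold)
{
	if (distance > threshold)
		return 0;	/* too far: no exec/fork/wake_affine balancing */

	return SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE;
}

/* Walk through the 10/21 case described in the changelog. */
void check_two_distance_machine(void)
{
	const int all = SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE;

	/* The local level keeps its flags under either threshold. */
	assert(sd_local_flags_for(LOCAL_DISTANCE, REMOTE_DISTANCE) == all);
	assert(sd_local_flags_for(LOCAL_DISTANCE, RECLAIM_DISTANCE) == all);

	/* Before the patch: 21 > 20, so the remote level loses all three. */
	assert(sd_local_flags_for(21, REMOTE_DISTANCE) == 0);

	/* After the patch: 21 <= 30, so the remote level keeps them. */
	assert(sd_local_flags_for(21, RECLAIM_DISTANCE) == all);
}
```

Under these assumed values, only levels whose distance exceeds 30 would
still lose the three flags after the patch.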



* Re: [RFC PATCH] sched/numa: do load balance between remote nodes
  2012-06-06  6:52 [RFC PATCH] sched/numa: do load balance between remote nodes Alex Shi
@ 2012-06-06  9:01 ` Peter Zijlstra
  2012-06-07  0:33   ` Alex Shi
  2012-06-06 10:53 ` Sergei Shtylyov
  2012-06-06 15:53 ` [tip:sched/urgent] sched/numa: Load " tip-bot for Alex Shi
  2 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2012-06-06  9:01 UTC (permalink / raw)
  To: Alex Shi
  Cc: anton, benh, cmetcalf, dhowells, davem, fenghua.yu, hpa, ink,
	linux-alpha, linux-ia64, linux-kernel, linux-mips, linuxppc-dev,
	linux-sh, mattst88, paulus, lethal, ralf, rth, sparclinux,
	tony.luck, x86, sivanich, greg.pearson, kamezawa.hiroyu,
	bob.picco, chris.mason, torvalds, akpm, mingo, pjt, tglx,
	seto.hidetoshi, ak, arjan.van.de.ven

On Wed, 2012-06-06 at 14:52 +0800, Alex Shi wrote:
> -       if (sched_domains_numa_distance[level] > REMOTE_DISTANCE)
> +       if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE) 

I actually considered this. I just felt a little uneasy re-purposing
RECLAIM_DISTANCE for this, but I guess it's all the same anyway. Both
mean expensive-away-distance.

So I've taken this.

thanks!


* Re: [RFC PATCH] sched/numa: do load balance between remote nodes
  2012-06-06  6:52 [RFC PATCH] sched/numa: do load balance between remote nodes Alex Shi
  2012-06-06  9:01 ` Peter Zijlstra
@ 2012-06-06 10:53 ` Sergei Shtylyov
  2012-06-06 15:53 ` [tip:sched/urgent] sched/numa: Load " tip-bot for Alex Shi
  2 siblings, 0 replies; 5+ messages in thread
From: Sergei Shtylyov @ 2012-06-06 10:53 UTC (permalink / raw)
  To: Alex Shi
  Cc: a.p.zijlstra, anton, benh, cmetcalf, dhowells, davem, fenghua.yu,
	hpa, ink, linux-alpha, linux-ia64, linux-kernel, linux-mips,
	linuxppc-dev, linux-sh, mattst88, paulus, lethal, ralf, rth,
	sparclinux, tony.luck, x86, sivanich, greg.pearson,
	kamezawa.hiroyu, bob.picco, chris.mason, torvalds, akpm, mingo,
	pjt, tglx, seto.hidetoshi, ak, arjan.van.de.ven

Hello.

On 06-06-2012 10:52, Alex Shi wrote:

> commit cb83b629b

    Please also specify that commit's summary in parens.

> remove the NODE sched domain and check if the node
> distance in SLIT table is farther than REMOTE_DISTANCE, if so, it will
> lose the load balance chance at exec/fork/wake_affine points.

> But actually, even the node distance is farther than REMOTE_DISTANCE,
> Modern CPUs also has QPI like connections, that make memory access is

    "Is" not needed here.

> not too slow between nodes.  So above losing on NUMA machine make a
> huge performance regression on benchmark: hackbench, tbench, netperf
> and oltp etc.

> This patch will recover the scheduler behavior to old mode on all my
> Intel platforms: NHM EP/EX, WSM EP, SNB EP/EP4S, and so remove the
> perfromance regressions. (all of them just has 2 kinds distance, 10 21)

> Signed-off-by: Alex Shi<alex.shi@intel.com>

WBR, Sergei


* [tip:sched/urgent] sched/numa: Load balance between remote nodes
  2012-06-06  6:52 [RFC PATCH] sched/numa: do load balance between remote nodes Alex Shi
  2012-06-06  9:01 ` Peter Zijlstra
  2012-06-06 10:53 ` Sergei Shtylyov
@ 2012-06-06 15:53 ` tip-bot for Alex Shi
  2 siblings, 0 replies; 5+ messages in thread
From: tip-bot for Alex Shi @ 2012-06-06 15:53 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, a.p.zijlstra, alex.shi, tglx

Commit-ID:  10717dcde10d09f9fcee53a12a4236af1a82b484
Gitweb:     http://git.kernel.org/tip/10717dcde10d09f9fcee53a12a4236af1a82b484
Author:     Alex Shi <alex.shi@intel.com>
AuthorDate: Wed, 6 Jun 2012 14:52:51 +0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 6 Jun 2012 16:52:25 +0200

sched/numa: Load balance between remote nodes

Commit cb83b629b ("sched/numa: Rewrite the CONFIG_NUMA sched
domain support") removed the NODE sched domain and started checking
whether the node distance in the SLIT table is farther than
REMOTE_DISTANCE; if so, the domain loses its load-balance
opportunities at the exec/fork/wake_affine points.

But even when the node distance is farther than REMOTE_DISTANCE,
modern CPUs have QPI-like interconnects, which ensure that memory
access between nodes is not too slow. So the above change in behavior
on NUMA machines causes a performance regression on various benchmarks:
hackbench, tbench, netperf, oltp, etc.

This patch restores the old scheduler behavior on all my
Intel platforms: NHM EP/EX, WSM EP, SNB EP/EP4S, and thus fixes the
performance regressions. (All of them have just two distance values:
10 and 21.)

Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338965571-9812-1-git-send-email-alex.shi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c46958e..6546083 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6321,7 +6321,7 @@ static int sched_domains_curr_level;
 
 static inline int sd_local_flags(int level)
 {
-	if (sched_domains_numa_distance[level] > REMOTE_DISTANCE)
+	if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE)
 		return 0;
 
 	return SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE;


* Re: [RFC PATCH] sched/numa: do load balance between remote nodes
  2012-06-06  9:01 ` Peter Zijlstra
@ 2012-06-07  0:33   ` Alex Shi
  0 siblings, 0 replies; 5+ messages in thread
From: Alex Shi @ 2012-06-07  0:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: anton, benh, cmetcalf, dhowells, davem, fenghua.yu, hpa, ink,
	linux-alpha, linux-ia64, linux-kernel, linux-mips, linuxppc-dev,
	linux-sh, mattst88, paulus, lethal, ralf, rth, sparclinux,
	tony.luck, x86, sivanich, greg.pearson, kamezawa.hiroyu,
	bob.picco, chris.mason, torvalds, akpm, mingo, pjt, tglx,
	seto.hidetoshi, ak, arjan.van.de.ven

On 06/06/2012 05:01 PM, Peter Zijlstra wrote:

> On Wed, 2012-06-06 at 14:52 +0800, Alex Shi wrote:
>> -       if (sched_domains_numa_distance[level] > REMOTE_DISTANCE)
>> +       if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE) 
> 
> I actually considered this.. I just felt a little uneasy re-purposing
> the RECLAIM_DISTANCE for this, but I guess its all the same anyway. Both
> mean expensive-away-distance.
> 


I understand. The BIOS guys are not well aligned with us on
this.

> So I've taken this.
> 
> thanks!



