LKML Archive on lore.kernel.org
* [PATCH v4 0/2] sched: Improve load balancing on AMD EPYC
@ 2019-08-08 19:52 Matt Fleming
  2019-08-08 19:53 ` [PATCH 1/2] ia64: Make NUMA select SMP Matt Fleming
  2019-08-08 19:53 ` [PATCH v4 2/2] sched/topology: Improve load balancing on AMD EPYC Matt Fleming
  0 siblings, 2 replies; 8+ messages in thread
From: Matt Fleming @ 2019-08-08 19:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Tony Luck, Rik van Riel, Suravee.Suthikulpanit,
	Borislav Petkov, Thomas.Lendacky, Mel Gorman, Matt Fleming

This is another version of the AMD EPYC load balancing patch. It
differs from the previous version in that it fixes the following ia64
build error, reported by 0day:

   mm/page_alloc.o: In function `get_page_from_freelist':
   page_alloc.c:(.text+0x7850): undefined reference to `node_reclaim_distance'
   page_alloc.c:(.text+0x7931): undefined reference to `node_reclaim_distance'

Matt Fleming (2):
  ia64: Make NUMA select SMP
  sched/topology: Improve load balancing on AMD EPYC

 arch/ia64/Kconfig         |  1 +
 arch/x86/kernel/cpu/amd.c |  5 +++++
 include/linux/topology.h  | 14 ++++++++++++++
 kernel/sched/topology.c   |  3 ++-
 mm/khugepaged.c           |  2 +-
 mm/page_alloc.c           |  2 +-
 6 files changed, 24 insertions(+), 3 deletions(-)

-- 
2.13.7



* [PATCH 1/2] ia64: Make NUMA select SMP
  2019-08-08 19:52 [PATCH v4 0/2] sched: Improve load balancing on AMD EPYC Matt Fleming
@ 2019-08-08 19:53 ` Matt Fleming
  2019-09-03  8:31   ` [tip: sched/core] arch, " tip-bot2 for Matt Fleming
  2019-08-08 19:53 ` [PATCH v4 2/2] sched/topology: Improve load balancing on AMD EPYC Matt Fleming
  1 sibling, 1 reply; 8+ messages in thread
From: Matt Fleming @ 2019-08-08 19:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Tony Luck, Rik van Riel, Suravee.Suthikulpanit,
	Borislav Petkov, Thomas.Lendacky, Mel Gorman, Matt Fleming

While it does make sense to allow CONFIG_NUMA and !CONFIG_SMP in
theory, it doesn't make much sense in practice.

Follow other architectures and make CONFIG_NUMA select CONFIG_SMP.

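The motivation for this patch is to allow a new NUMA variable,
node_reclaim_distance (added by the next patch in this series), to be
defined in kernel/sched/topology.c, which is only built when
CONFIG_SMP is enabled.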

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 arch/ia64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 7468d8e50467..997baba02b70 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -389,6 +389,7 @@ config NUMA
 	depends on !IA64_HP_SIM && !FLATMEM
 	default y if IA64_SGI_SN2
 	select ACPI_NUMA if ACPI
+	select SMP
 	help
 	  Say Y to compile the kernel to support NUMA (Non-Uniform Memory
 	  Access).  This option is for configuring high-end multiprocessor
-- 
2.13.7



* [PATCH v4 2/2] sched/topology: Improve load balancing on AMD EPYC
  2019-08-08 19:52 [PATCH v4 0/2] sched: Improve load balancing on AMD EPYC Matt Fleming
  2019-08-08 19:53 ` [PATCH 1/2] ia64: Make NUMA select SMP Matt Fleming
@ 2019-08-08 19:53 ` Matt Fleming
  2019-09-03  8:31   ` [tip: sched/core] sched/topology: Improve load balancing on AMD EPYC systems tip-bot2 for Matt Fleming
  2019-10-07 15:28   ` [PATCH v4 2/2] sched/topology: Improve load balancing on AMD EPYC Guenter Roeck
  1 sibling, 2 replies; 8+ messages in thread
From: Matt Fleming @ 2019-08-08 19:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Tony Luck, Rik van Riel, Suravee.Suthikulpanit,
	Borislav Petkov, Thomas.Lendacky, Mel Gorman, Matt Fleming

SD_BALANCE_{FORK,EXEC} and SD_WAKE_AFFINE are stripped in sd_init()
for any sched domain whose NUMA distance exceeds RECLAIM_DISTANCE
(30 by default), on the assumption that it is expensive to balance
across domains that far apart.

Unfortunately, as explained in

  commit 32e45ff43eaf ("mm: increase RECLAIM_DISTANCE to 30")

the value of RECLAIM_DISTANCE is based on node distance tables from
2011-era hardware.

Current AMD EPYC machines have the following NUMA node distances:

node distances:
node   0   1   2   3   4   5   6   7
  0:  10  16  16  16  32  32  32  32
  1:  16  10  16  16  32  32  32  32
  2:  16  16  10  16  32  32  32  32
  3:  16  16  16  10  32  32  32  32
  4:  32  32  32  32  10  16  16  16
  5:  32  32  32  32  16  10  16  16
  6:  32  32  32  32  16  16  10  16
  7:  32  32  32  32  16  16  16  10

where 2 hops is 32, which exceeds the default RECLAIM_DISTANCE of 30.

The result is that the scheduler fails to load balance properly across
NUMA nodes on different sockets, which are 2 hops apart.

For example, pinning 16 busy threads to NUMA nodes 0 (CPUs 0-7) and 4
(CPUs 32-39) like so,

  $ numactl -C 0-7,32-39 ./spinner 16

causes all threads to fork and remain on node 0 until the active
balancer kicks in after a few seconds and forcibly moves some threads
to node 4.

Override node_reclaim_distance for AMD Zen.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Suravee.Suthikulpanit@amd.com
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas.Lendacky@amd.com
---
 arch/x86/kernel/cpu/amd.c |  5 +++++
 include/linux/topology.h  | 14 ++++++++++++++
 kernel/sched/topology.c   |  3 ++-
 mm/khugepaged.c           |  2 +-
 mm/page_alloc.c           |  2 +-
 5 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 8d4e50428b68..ceeb8afc7cf3 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -8,6 +8,7 @@
 #include <linux/sched.h>
 #include <linux/sched/clock.h>
 #include <linux/random.h>
+#include <linux/topology.h>
 #include <asm/processor.h>
 #include <asm/apic.h>
 #include <asm/cacheinfo.h>
@@ -824,6 +825,10 @@ static void init_amd_zn(struct cpuinfo_x86 *c)
 {
 	set_cpu_cap(c, X86_FEATURE_ZEN);
 
+#ifdef CONFIG_NUMA
+	node_reclaim_distance = 32;
+#endif
+
 	/*
 	 * Fix erratum 1076: CPB feature bit not being set in CPUID.
 	 * Always set it, except when running under a hypervisor.
diff --git a/include/linux/topology.h b/include/linux/topology.h
index 47a3e3c08036..579522ec446c 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -59,6 +59,20 @@ int arch_update_cpu_topology(void);
  */
 #define RECLAIM_DISTANCE 30
 #endif
+
+/*
+ * The following tunable allows platforms to override the default node
+ * reclaim distance (RECLAIM_DISTANCE) if remote memory accesses are
+ * sufficiently fast that the default value actually hurts
+ * performance.
+ *
+ * AMD EPYC machines use this because even though the 2-hop distance
+ * is 32 (3.2x slower than a local memory access) performance actually
+ * *improves* if allowed to reclaim memory and load balance tasks
+ * between NUMA nodes 2-hops apart.
+ */
+extern int __read_mostly node_reclaim_distance;
+
 #ifndef PENALTY_FOR_NODE_WITH_CPUS
 #define PENALTY_FOR_NODE_WITH_CPUS	(1)
 #endif
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8f83e8e3ea9a..b5667a273bf6 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1284,6 +1284,7 @@ static int			sched_domains_curr_level;
 int				sched_max_numa_distance;
 static int			*sched_domains_numa_distance;
 static struct cpumask		***sched_domains_numa_masks;
+int __read_mostly		node_reclaim_distance = RECLAIM_DISTANCE;
 #endif
 
 /*
@@ -1402,7 +1403,7 @@ sd_init(struct sched_domain_topology_level *tl,
 
 		sd->flags &= ~SD_PREFER_SIBLING;
 		sd->flags |= SD_SERIALIZE;
-		if (sched_domains_numa_distance[tl->numa_level] > RECLAIM_DISTANCE) {
+		if (sched_domains_numa_distance[tl->numa_level] > node_reclaim_distance) {
 			sd->flags &= ~(SD_BALANCE_EXEC |
 				       SD_BALANCE_FORK |
 				       SD_WAKE_AFFINE);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index eaaa21b23215..ccede2425c3f 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -710,7 +710,7 @@ static bool khugepaged_scan_abort(int nid)
 	for (i = 0; i < MAX_NUMNODES; i++) {
 		if (!khugepaged_node_load[i])
 			continue;
-		if (node_distance(nid, i) > RECLAIM_DISTANCE)
+		if (node_distance(nid, i) > node_reclaim_distance)
 			return true;
 	}
 	return false;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 272c6de1bf4e..0d54cd2c43a4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3522,7 +3522,7 @@ bool zone_watermark_ok_safe(struct zone *z, unsigned int order,
 static bool zone_allows_reclaim(struct zone *local_zone, struct zone *zone)
 {
 	return node_distance(zone_to_nid(local_zone), zone_to_nid(zone)) <=
-				RECLAIM_DISTANCE;
+				node_reclaim_distance;
 }
 #else	/* CONFIG_NUMA */
 static bool zone_allows_reclaim(struct zone *local_zone, struct zone *zone)
-- 
2.13.7
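
The threshold comparison described in the patch can be illustrated
outside the kernel. What follows is a minimal userspace sketch, not
kernel code: the table is the one quoted in the description, while
node_distance_of(), balance_flags_stripped() and count_pairs_beyond()
are hypothetical helpers introduced only for this illustration. It
assumes the default RECLAIM_DISTANCE of 30 and the Zen override of 32.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES 8

/* Node distance table quoted in the patch description. */
static const int epyc_distance[NR_NODES][NR_NODES] = {
	{ 10, 16, 16, 16, 32, 32, 32, 32 },
	{ 16, 10, 16, 16, 32, 32, 32, 32 },
	{ 16, 16, 10, 16, 32, 32, 32, 32 },
	{ 16, 16, 16, 10, 32, 32, 32, 32 },
	{ 32, 32, 32, 32, 10, 16, 16, 16 },
	{ 32, 32, 32, 32, 16, 10, 16, 16 },
	{ 32, 32, 32, 32, 16, 16, 10, 16 },
	{ 32, 32, 32, 32, 16, 16, 16, 10 },
};

/* Bounds-checked lookup into the table above (hypothetical helper). */
static int node_distance_of(int a, int b)
{
	assert(a >= 0 && a < NR_NODES && b >= 0 && b < NR_NODES);
	return epyc_distance[a][b];
}

/*
 * Same comparison as the sd_init() hunk: the balancing flags are
 * cleared when the domain distance is strictly greater than the
 * threshold.
 */
static bool balance_flags_stripped(int domain_distance, int reclaim_distance)
{
	return domain_distance > reclaim_distance;
}

/* Count ordered node pairs (a != b) whose distance exceeds the threshold. */
static int count_pairs_beyond(int reclaim_distance)
{
	int a, b, n = 0;

	for (a = 0; a < NR_NODES; a++)
		for (b = 0; b < NR_NODES; b++)
			if (a != b &&
			    balance_flags_stripped(node_distance_of(a, b),
						   reclaim_distance))
				n++;
	return n;
}
```

With a threshold of 30, count_pairs_beyond() counts every cross-socket
pair, since each has distance 32; with a threshold of 32 it counts
none, because the comparison is strict.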



* [tip: sched/core] arch, ia64: Make NUMA select SMP
  2019-08-08 19:53 ` [PATCH 1/2] ia64: Make NUMA select SMP Matt Fleming
@ 2019-09-03  8:31   ` tip-bot2 for Matt Fleming
  0 siblings, 0 replies; 8+ messages in thread
From: tip-bot2 for Matt Fleming @ 2019-09-03  8:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Matt Fleming, Peter Zijlstra (Intel),
	Borislav Petkov, Linus Torvalds, Mel Gorman, Rik van Riel,
	Suravee.Suthikulpanit, Thomas Gleixner, Thomas.Lendacky,
	Tony Luck, Ingo Molnar, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     a2cbfd46559e809c8165773b7fe8afa058b35414
Gitweb:        https://git.kernel.org/tip/a2cbfd46559e809c8165773b7fe8afa058b35414
Author:        Matt Fleming <matt@codeblueprint.co.uk>
AuthorDate:    Thu, 08 Aug 2019 20:53:00 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 03 Sep 2019 09:17:36 +02:00

arch, ia64: Make NUMA select SMP

While it does make sense to allow CONFIG_NUMA and !CONFIG_SMP in
theory, it doesn't make much sense in practice.

Follow other architectures and make CONFIG_NUMA select CONFIG_SMP.

The motivation for this patch is to allow a new NUMA variable,
node_reclaim_distance (added by a follow-up patch), to be defined in
kernel/sched/topology.c, which is only built when CONFIG_SMP is
enabled.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Suravee.Suthikulpanit@amd.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas.Lendacky@amd.com
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190808195301.13222-2-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/ia64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 7468d8e..997baba 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -389,6 +389,7 @@ config NUMA
 	depends on !IA64_HP_SIM && !FLATMEM
 	default y if IA64_SGI_SN2
 	select ACPI_NUMA if ACPI
+	select SMP
 	help
 	  Say Y to compile the kernel to support NUMA (Non-Uniform Memory
 	  Access).  This option is for configuring high-end multiprocessor


* [tip: sched/core] sched/topology: Improve load balancing on AMD EPYC systems
  2019-08-08 19:53 ` [PATCH v4 2/2] sched/topology: Improve load balancing on AMD EPYC Matt Fleming
@ 2019-09-03  8:31   ` tip-bot2 for Matt Fleming
  2019-10-07 15:28   ` [PATCH v4 2/2] sched/topology: Improve load balancing on AMD EPYC Guenter Roeck
  1 sibling, 0 replies; 8+ messages in thread
From: tip-bot2 for Matt Fleming @ 2019-09-03  8:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Matt Fleming, Peter Zijlstra (Intel),
	Mel Gorman, Borislav Petkov, Linus Torvalds, Rik van Riel,
	Suravee.Suthikulpanit, Thomas Gleixner, Thomas.Lendacky,
	Tony Luck, Ingo Molnar, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     a55c7454a8c887b226a01d7eed088ccb5374d81e
Gitweb:        https://git.kernel.org/tip/a55c7454a8c887b226a01d7eed088ccb5374d81e
Author:        Matt Fleming <matt@codeblueprint.co.uk>
AuthorDate:    Thu, 08 Aug 2019 20:53:01 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 03 Sep 2019 09:17:37 +02:00

sched/topology: Improve load balancing on AMD EPYC systems

SD_BALANCE_{FORK,EXEC} and SD_WAKE_AFFINE are stripped in sd_init()
for any sched domain whose NUMA distance exceeds RECLAIM_DISTANCE
(30 by default), on the assumption that it is expensive to balance
across domains that far apart.

Unfortunately, as explained in:

  commit 32e45ff43eaf ("mm: increase RECLAIM_DISTANCE to 30")

the value of RECLAIM_DISTANCE is based on node distance tables from
2011-era hardware.

Current AMD EPYC machines have the following NUMA node distances:

 node distances:
 node   0   1   2   3   4   5   6   7
   0:  10  16  16  16  32  32  32  32
   1:  16  10  16  16  32  32  32  32
   2:  16  16  10  16  32  32  32  32
   3:  16  16  16  10  32  32  32  32
   4:  32  32  32  32  10  16  16  16
   5:  32  32  32  32  16  10  16  16
   6:  32  32  32  32  16  16  10  16
   7:  32  32  32  32  16  16  16  10

where 2 hops is 32, which exceeds the default RECLAIM_DISTANCE of 30.

The result is that the scheduler fails to load balance properly across
NUMA nodes on different sockets, which are 2 hops apart.

For example, pinning 16 busy threads to NUMA nodes 0 (CPUs 0-7) and 4
(CPUs 32-39) like so,

  $ numactl -C 0-7,32-39 ./spinner 16

causes all threads to fork and remain on node 0 until the active
balancer kicks in after a few seconds and forcibly moves some threads
to node 4.

Override node_reclaim_distance for AMD Zen.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Suravee.Suthikulpanit@amd.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas.Lendacky@amd.com
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190808195301.13222-3-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/amd.c |  5 +++++
 include/linux/topology.h  | 14 ++++++++++++++
 kernel/sched/topology.c   |  3 ++-
 mm/khugepaged.c           |  2 +-
 mm/page_alloc.c           |  2 +-
 5 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 8d4e504..ceeb8af 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -8,6 +8,7 @@
 #include <linux/sched.h>
 #include <linux/sched/clock.h>
 #include <linux/random.h>
+#include <linux/topology.h>
 #include <asm/processor.h>
 #include <asm/apic.h>
 #include <asm/cacheinfo.h>
@@ -824,6 +825,10 @@ static void init_amd_zn(struct cpuinfo_x86 *c)
 {
 	set_cpu_cap(c, X86_FEATURE_ZEN);
 
+#ifdef CONFIG_NUMA
+	node_reclaim_distance = 32;
+#endif
+
 	/*
 	 * Fix erratum 1076: CPB feature bit not being set in CPUID.
 	 * Always set it, except when running under a hypervisor.
diff --git a/include/linux/topology.h b/include/linux/topology.h
index 47a3e3c..579522e 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -59,6 +59,20 @@ int arch_update_cpu_topology(void);
  */
 #define RECLAIM_DISTANCE 30
 #endif
+
+/*
+ * The following tunable allows platforms to override the default node
+ * reclaim distance (RECLAIM_DISTANCE) if remote memory accesses are
+ * sufficiently fast that the default value actually hurts
+ * performance.
+ *
+ * AMD EPYC machines use this because even though the 2-hop distance
+ * is 32 (3.2x slower than a local memory access) performance actually
+ * *improves* if allowed to reclaim memory and load balance tasks
+ * between NUMA nodes 2-hops apart.
+ */
+extern int __read_mostly node_reclaim_distance;
+
 #ifndef PENALTY_FOR_NODE_WITH_CPUS
 #define PENALTY_FOR_NODE_WITH_CPUS	(1)
 #endif
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8f83e8e..b5667a2 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1284,6 +1284,7 @@ static int			sched_domains_curr_level;
 int				sched_max_numa_distance;
 static int			*sched_domains_numa_distance;
 static struct cpumask		***sched_domains_numa_masks;
+int __read_mostly		node_reclaim_distance = RECLAIM_DISTANCE;
 #endif
 
 /*
@@ -1402,7 +1403,7 @@ sd_init(struct sched_domain_topology_level *tl,
 
 		sd->flags &= ~SD_PREFER_SIBLING;
 		sd->flags |= SD_SERIALIZE;
-		if (sched_domains_numa_distance[tl->numa_level] > RECLAIM_DISTANCE) {
+		if (sched_domains_numa_distance[tl->numa_level] > node_reclaim_distance) {
 			sd->flags &= ~(SD_BALANCE_EXEC |
 				       SD_BALANCE_FORK |
 				       SD_WAKE_AFFINE);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index eaaa21b..ccede24 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -710,7 +710,7 @@ static bool khugepaged_scan_abort(int nid)
 	for (i = 0; i < MAX_NUMNODES; i++) {
 		if (!khugepaged_node_load[i])
 			continue;
-		if (node_distance(nid, i) > RECLAIM_DISTANCE)
+		if (node_distance(nid, i) > node_reclaim_distance)
 			return true;
 	}
 	return false;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 272c6de..0d54cd2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3522,7 +3522,7 @@ bool zone_watermark_ok_safe(struct zone *z, unsigned int order,
 static bool zone_allows_reclaim(struct zone *local_zone, struct zone *zone)
 {
 	return node_distance(zone_to_nid(local_zone), zone_to_nid(zone)) <=
-				RECLAIM_DISTANCE;
+				node_reclaim_distance;
 }
 #else	/* CONFIG_NUMA */
 static bool zone_allows_reclaim(struct zone *local_zone, struct zone *zone)
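
The khugepaged hunk in this commit applies the same threshold to a
different decision. The sketch below models only the loop visible in
the diff context; anything the real function does before that loop is
omitted. scan_abort_sketch() and its parameters are hypothetical names
chosen for this illustration, and the distances are passed in as a
plain array rather than read through node_distance().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model of the loop shown in the khugepaged_scan_abort() hunk.
 * dist_from_nid[i] is the distance from the candidate node to node i;
 * node_load[i] is nonzero if pages were already counted on node i.
 * Returns true if any loaded node is farther away than reclaim_distance.
 */
static bool scan_abort_sketch(const int *dist_from_nid, const int *node_load,
			      size_t nr_nodes, int reclaim_distance)
{
	size_t i;

	assert(dist_from_nid && node_load);

	for (i = 0; i < nr_nodes; i++) {
		if (!node_load[i])
			continue;
		if (dist_from_nid[i] > reclaim_distance)
			return true;
	}
	return false;
}
```

Fed the node 4 row of the distance table (32 32 32 32 10 16 16 16) and
a load recorded only on node 0, the sketch aborts with a threshold of
30 and continues with a threshold of 32.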


* Re: [PATCH v4 2/2] sched/topology: Improve load balancing on AMD EPYC
  2019-08-08 19:53 ` [PATCH v4 2/2] sched/topology: Improve load balancing on AMD EPYC Matt Fleming
  2019-09-03  8:31   ` [tip: sched/core] sched/topology: Improve load balancing on AMD EPYC systems tip-bot2 for Matt Fleming
@ 2019-10-07 15:28   ` Guenter Roeck
  2019-10-09 12:04     ` Matt Fleming
  1 sibling, 1 reply; 8+ messages in thread
From: Guenter Roeck @ 2019-10-07 15:28 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Peter Zijlstra, linux-kernel, Tony Luck, Rik van Riel,
	Suravee.Suthikulpanit, Borislav Petkov, Thomas.Lendacky,
	Mel Gorman

Hi,

On Thu, Aug 08, 2019 at 08:53:01PM +0100, Matt Fleming wrote:
> SD_BALANCE_{FORK,EXEC} and SD_WAKE_AFFINE are stripped in sd_init()
> for any sched domains with a NUMA distance greater than 2 hops
> (RECLAIM_DISTANCE). The idea being that it's expensive to balance
> across domains that far apart.
> 
> However, as is rather unfortunately explained in
> 
>   commit 32e45ff43eaf ("mm: increase RECLAIM_DISTANCE to 30")
> 
> the value for RECLAIM_DISTANCE is based on node distance tables from
> 2011-era hardware.
> 
> Current AMD EPYC machines have the following NUMA node distances:
> 
> node distances:
> node   0   1   2   3   4   5   6   7
>   0:  10  16  16  16  32  32  32  32
>   1:  16  10  16  16  32  32  32  32
>   2:  16  16  10  16  32  32  32  32
>   3:  16  16  16  10  32  32  32  32
>   4:  32  32  32  32  10  16  16  16
>   5:  32  32  32  32  16  10  16  16
>   6:  32  32  32  32  16  16  10  16
>   7:  32  32  32  32  16  16  16  10
> 
> where 2 hops is 32.
> 
> The result is that the scheduler fails to load balance properly across
> NUMA nodes on different sockets -- 2 hops apart.
> 
> For example, pinning 16 busy threads to NUMA nodes 0 (CPUs 0-7) and 4
> (CPUs 32-39) like so,
> 
>   $ numactl -C 0-7,32-39 ./spinner 16
> 
> causes all threads to fork and remain on node 0 until the active
> balancer kicks in after a few seconds and forcibly moves some threads
> to node 4.
> 
> Override node_reclaim_distance for AMD Zen.
> 
> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Acked-by: Mel Gorman <mgorman@techsingularity.net>
> Cc: Suravee.Suthikulpanit@amd.com
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Thomas.Lendacky@amd.com

This patch causes build errors on systems where NUMA does not depend on SMP,
such as MIPS and PPC. For example, building mips:ip27_defconfig with SMP
disabled results in

mips-linux-ld: mm/page_alloc.o: in function `get_page_from_freelist':
page_alloc.c:(.text+0x5018): undefined reference to `node_reclaim_distance'
mips-linux-ld: page_alloc.c:(.text+0x5020): undefined reference to `node_reclaim_distance'
mips-linux-ld: page_alloc.c:(.text+0x5028): undefined reference to `node_reclaim_distance'
mips-linux-ld: page_alloc.c:(.text+0x5040): undefined reference to `node_reclaim_distance'
Makefile:1074: recipe for target 'vmlinux' failed
make: *** [vmlinux] Error 1

I have seen a similar problem with one of my PPC test builds.

powerpc64-linux-ld: mm/page_alloc.o:(.toc+0x18): undefined reference to `node_reclaim_distance'

Guenter


* Re: [PATCH v4 2/2] sched/topology: Improve load balancing on AMD EPYC
  2019-10-07 15:28   ` [PATCH v4 2/2] sched/topology: Improve load balancing on AMD EPYC Guenter Roeck
@ 2019-10-09 12:04     ` Matt Fleming
  2019-10-09 12:39       ` Guenter Roeck
  0 siblings, 1 reply; 8+ messages in thread
From: Matt Fleming @ 2019-10-09 12:04 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Peter Zijlstra, linux-kernel, Tony Luck, Rik van Riel,
	Suravee.Suthikulpanit, Borislav Petkov, Thomas.Lendacky,
	Mel Gorman, Ralf Baechle, Paul Burton, James Hogan,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman

On Mon, 07 Oct, at 08:28:16AM, Guenter Roeck wrote:
> 
> This patch causes build errors on systems where NUMA does not depend on SMP,
> for example MIPS and PPC. For example, building mips:ip27_defconfig with SMP
> disabled results in
> 
> mips-linux-ld: mm/page_alloc.o: in function `get_page_from_freelist':
> page_alloc.c:(.text+0x5018): undefined reference to `node_reclaim_distance'
> mips-linux-ld: page_alloc.c:(.text+0x5020): undefined reference to `node_reclaim_distance'
> mips-linux-ld: page_alloc.c:(.text+0x5028): undefined reference to `node_reclaim_distance'
> mips-linux-ld: page_alloc.c:(.text+0x5040): undefined reference to `node_reclaim_distance'
> Makefile:1074: recipe for target 'vmlinux' failed
> make: *** [vmlinux] Error 1
> 
> I have seen a similar problem with one of my PPC test builds.
> 
> powerpc64-linux-ld: mm/page_alloc.o:(.toc+0x18): undefined reference to `node_reclaim_distance'

Thanks for this, Guenter.

So, the way I fixed this same issue for ia64 was to make NUMA select
SMP. Does that seem like a suitable solution for both PPC and MIPS?


* Re: [PATCH v4 2/2] sched/topology: Improve load balancing on AMD EPYC
  2019-10-09 12:04     ` Matt Fleming
@ 2019-10-09 12:39       ` Guenter Roeck
  0 siblings, 0 replies; 8+ messages in thread
From: Guenter Roeck @ 2019-10-09 12:39 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Peter Zijlstra, linux-kernel, Tony Luck, Rik van Riel,
	Suravee.Suthikulpanit, Borislav Petkov, Thomas.Lendacky,
	Mel Gorman, Ralf Baechle, Paul Burton, James Hogan,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman

On 10/9/19 5:04 AM, Matt Fleming wrote:
> On Mon, 07 Oct, at 08:28:16AM, Guenter Roeck wrote:
>>
>> This patch causes build errors on systems where NUMA does not depend on SMP,
>> for example MIPS and PPC. For example, building mips:ip27_defconfig with SMP
>> disabled results in
>>
>> mips-linux-ld: mm/page_alloc.o: in function `get_page_from_freelist':
>> page_alloc.c:(.text+0x5018): undefined reference to `node_reclaim_distance'
>> mips-linux-ld: page_alloc.c:(.text+0x5020): undefined reference to `node_reclaim_distance'
>> mips-linux-ld: page_alloc.c:(.text+0x5028): undefined reference to `node_reclaim_distance'
>> mips-linux-ld: page_alloc.c:(.text+0x5040): undefined reference to `node_reclaim_distance'
>> Makefile:1074: recipe for target 'vmlinux' failed
>> make: *** [vmlinux] Error 1
>>
>> I have seen a similar problem with one of my PPC test builds.
>>
>> powerpc64-linux-ld: mm/page_alloc.o:(.toc+0x18): undefined reference to `node_reclaim_distance'
> 
> Thanks for this, Guenter.
> 
> So, the way I fixed this same issue for ia64 was to make NUMA select
> SMP. Does that seem like a suitable solution for both PPC and MIPS?
> 

You would still have to cover all other architectures where SMP and NUMA are independent
of each other. Fortunately, it looks like this is only sh4.

sh4-linux-ld: mm/page_alloc.o: in function `get_page_from_freelist':
page_alloc.c:(.text+0x3ce0): undefined reference to `node_reclaim_distance'
Makefile:1074: recipe for target 'vmlinux' failed
make: *** [vmlinux] Error 1

arm64 and s390 happen to work because they mandate SMP support, even though NUMA
is nominally independent.

I wonder: why not define node_reclaim_distance outside the SMP dependency?

Thanks,
Guenter
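
The closing question above can be made concrete with a small model.
The sketch below is an illustration under stated assumptions, not the
fix that was applied: it emulates Kconfig symbols with preprocessor
macros in a single file, omits the kernel's __read_mostly annotation,
and uses distance_allows_reclaim() as a hypothetical stand-in for
zone_allows_reclaim(). It shows a definition guarded by CONFIG_NUMA
alone, so the symbol exists even when CONFIG_SMP is off.

```c
#include <assert.h>
#include <stdbool.h>

/* Emulated configuration for this sketch: NUMA enabled, SMP disabled. */
#define CONFIG_NUMA 1
/* CONFIG_SMP is deliberately left undefined. */

#define RECLAIM_DISTANCE 30

/* Declaration, as in include/linux/topology.h after the patch. */
extern int node_reclaim_distance;

/*
 * In the patch the definition lives in kernel/sched/topology.c, which is
 * only compiled when CONFIG_SMP is set, so NUMA && !SMP builds never see
 * it and the linker reports an undefined reference. Guarding the
 * definition with CONFIG_NUMA alone (the placement here is hypothetical)
 * keeps the symbol available in that configuration.
 */
#ifdef CONFIG_NUMA
int node_reclaim_distance = RECLAIM_DISTANCE;
#endif

/* Same comparison as zone_allows_reclaim() in the mm/page_alloc.c hunk. */
static bool distance_allows_reclaim(int distance)
{
	assert(distance > 0);	/* node distances are positive */
	return distance <= node_reclaim_distance;
}
```

Because the definition no longer depends on SMP, a platform override
such as the Zen assignment of 32 still has a variable to write to, and
callers in mm/ resolve the symbol at link time.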
