* [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration
@ 2011-02-12 17:10 Tejun Heo
  2011-02-12 17:10 ` [PATCH 01/26] x86-64, NUMA: Make dummy node initialization path similar to non-dummy ones Tejun Heo
                   ` (25 more replies)
  0 siblings, 26 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa

Hello,

Currently, x86-64 NUMA configuration is unnecessarily complicated with
srat_64, amdtopology_64, dummy and emulation all doing about the same
things in slightly different ways.  This makes the code difficult to
comprehend, maintain and extend.

The worst offender is NUMA emulation, which maps and reverse-maps
things in quite chaotic ways.  For example, CPU-to-node remapping is
done by finding the emulated node the CPU is mapped to, taking that
node's start address and looking up the containing physical node, and
then again looking for the emulated nodes which fall in that physical
node.  Another interesting example is node distance remapping when
the system is using amdtopology - it generates a pseudo ACPI PXM
mapping.

This is the first of two patch series.  This one cleans up x86-64
NUMA configuration so that the specific implementations - srat,
amdtopology and dummy - only have to supply information about the
actual configuration.  All the mangling and massaging are done inside
numa_64.c proper using the provided information.

This patchset implements all the infrastructure to make NUMA emulation
sane but doesn't actually update NUMA emulation.  It will be done by
the next patchset.  As it still retains the old code and the glue to
keep it working, LOC increases by 60 lines.  After the second
patchset, the net LOC change will be -123 lines.

Once the x86-64 update is settled, x86-32 will be moved over to share
the new infrastructure.

This patchset is on top of the current tip/x86/numa[1] and contains
the following 26 patches.

Tested on an Opteron NUMA machine which can do both ACPI and AMD
configs.  All NUMA configs, emulation, !NUMA and UP work as expected.

 0001-x86-64-NUMA-Make-dummy-node-initialization-path-simi.patch
 0002-x86-64-NUMA-Simplify-hotplug-node-handling-in-acpi_n.patch
 0003-x86-64-NUMA-Drop-start-last_pfn-from-initmem_init.patch
 0004-x86-64-NUMA-Unify-acpi-amd-_-numa_init-scan_nodes-ar.patch
 0005-x86-64-NUMA-Wrap-acpi_numa_init-so-that-failure-can-.patch
 0006-x86-64-NUMA-Move-_numa_init-invocations-into-initmem.patch
 0007-x86-64-NUMA-Restructure-initmem_init.patch
 0008-x86-64-NUMA-Use-common-cpu-mem-_nodes_parsed.patch
 0009-x86-64-NUMA-Remove-local-variable-found-from-amd_num.patch
 0010-x86-64-NUMA-Move-apicid-to-numa-mapping-initializati.patch
 0011-x86-64-NUMA-Use-common-numa_nodes.patch
 0012-x86-64-NUMA-Kill-acpi-amd-_get_nodes.patch
 0013-x86-64-NUMA-Factor-out-memblk-handling-into-numa_-ad.patch
 0014-x86-64-NUMA-Unify-use-of-memblk-in-all-init-methods.patch
 0015-x86-64-NUMA-Unify-the-rest-of-memblk-registration.patch
 0016-x86-64-NUMA-Kill-acpi-amd-dummy-_scan_nodes.patch
 0017-x86-64-NUMA-Remove-NULL-nodeids-handling-from-comput.patch
 0018-x86-64-NUMA-Introduce-struct-numa_meminfo.patch
 0019-x86-64-NUMA-Separate-out-numa_cleanup_meminfo.patch
 0020-x86-64-NUMA-make-numa_cleanup_meminfo-prettier.patch
 0021-x86-64-NUMA-consolidate-and-improve-memblk-sanity-ch.patch
 0022-x86-64-NUMA-Add-common-find_node_by_addr.patch
 0023-x86-64-NUMA-kill-numa_nodes.patch
 0024-x86-64-NUMA-Rename-cpu_nodes_parsed-to-numa_nodes_pa.patch
 0025-x86-64-NUMA-Kill-mem_nodes_parsed.patch
 0026-x86-64-NUMA-Implement-generic-node-distance-handling.patch

The patchset is also available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git x86_64-numa-unify

Diffstat follows.

 arch/x86/include/asm/acpi.h       |    6 
 arch/x86/include/asm/amd_nb.h     |    4 
 arch/x86/include/asm/numa_64.h    |   11 
 arch/x86/include/asm/page_types.h |    3 
 arch/x86/include/asm/topology.h   |    2 
 arch/x86/kernel/setup.c           |   16 -
 arch/x86/mm/amdtopology_64.c      |  117 ++------
 arch/x86/mm/init_64.c             |    5 
 arch/x86/mm/numa_64.c             |  507 ++++++++++++++++++++++++++++++++------
 arch/x86/mm/srat_64.c             |  290 +--------------------
 drivers/acpi/numa.c               |    9 
 11 files changed, 515 insertions(+), 455 deletions(-)

Thanks.

--
tejun

[1] eff9073790e1286aa12bf1c65814d3e0132b12e1 (x86: Rename incorrectly
    named parameter of numa_cpu_node())


* [PATCH 01/26] x86-64, NUMA: Make dummy node initialization path similar to non-dummy ones
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-12 17:52   ` Yinghai Lu
  2011-02-12 17:10 ` [PATCH 02/26] x86-64, NUMA: Simplify hotplug node handling in acpi_numa_memory_affinity_init() Tejun Heo
                   ` (24 subsequent siblings)
  25 siblings, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

Dummy node initialization in initmem_init() didn't initialize the
apicid-to-node mapping and set the cpu-to-node mapping directly by
calling numa_set_node(), which is different from the non-dummy init
paths.

Update it so that it behaves like the others: initialize the
apicid-to-node mapping and call numa_init_array().  The actual
cpu-to-node mapping is handled by init_cpu_to_node() later.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/mm/numa_64.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index f548fbf..ea5dd48 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -623,10 +623,11 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
 	memnodemap[0] = 0;
 	node_set_online(0);
 	node_set(0, node_possible_map);
-	for (i = 0; i < nr_cpu_ids; i++)
-		numa_set_node(i, 0);
+	for (i = 0; i < MAX_LOCAL_APIC; i++)
+		set_apicid_to_node(i, NUMA_NO_NODE);
 	memblock_x86_register_active_regions(0, start_pfn, last_pfn);
 	setup_node_bootmem(0, start_pfn << PAGE_SHIFT, last_pfn << PAGE_SHIFT);
+	numa_init_array();
 }
 
 unsigned long __init numa_free_all_bootmem(void)
-- 
1.7.1



* [PATCH 02/26] x86-64, NUMA: Simplify hotplug node handling in acpi_numa_memory_affinity_init()
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
  2011-02-12 17:10 ` [PATCH 01/26] x86-64, NUMA: Make dummy node initialization path similar to non-dummy ones Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-12 17:47   ` Yinghai Lu
  2011-02-12 17:10 ` [PATCH 03/26] x86-64, NUMA: Drop @start/last_pfn from initmem_init() Tejun Heo
                   ` (23 subsequent siblings)
  25 siblings, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

Hotplug node handling in acpi_numa_memory_affinity_init() was
unnecessarily complicated: it stored the original nodes[] entry and
restored it afterwards.  Simplify it by not modifying the nodes[]
entry for hotplug nodes in the first place.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/mm/srat_64.c |   31 +++++++++++++------------------
 1 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index 9a97261..e3e0dd3 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -251,7 +251,7 @@ update_nodes_add(int node, unsigned long start, unsigned long end)
 void __init
 acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 {
-	struct bootnode *nd, oldnode;
+	struct bootnode *nd;
 	unsigned long start, end;
 	int node, pxm;
 	int i;
@@ -289,28 +289,23 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 		bad_srat();
 		return;
 	}
-	nd = &nodes[node];
-	oldnode = *nd;
-	if (!node_test_and_set(node, nodes_parsed)) {
-		nd->start = start;
-		nd->end = end;
-	} else {
-		if (start < nd->start)
-			nd->start = start;
-		if (nd->end < end)
-			nd->end = end;
-	}
 
 	printk(KERN_INFO "SRAT: Node %u PXM %u %lx-%lx\n", node, pxm,
 	       start, end);
 
-	if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
+	if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
+		nd = &nodes[node];
+		if (!node_test_and_set(node, nodes_parsed)) {
+			nd->start = start;
+			nd->end = end;
+		} else {
+			if (start < nd->start)
+				nd->start = start;
+			if (nd->end < end)
+				nd->end = end;
+		}
+	} else
 		update_nodes_add(node, start, end);
-		/* restore nodes[node] */
-		*nd = oldnode;
-		if ((nd->start | nd->end) == 0)
-			node_clear(node, nodes_parsed);
-	}
 
 	node_memblk_range[num_node_memblks].start = start;
 	node_memblk_range[num_node_memblks].end = end;
-- 
1.7.1



* [PATCH 03/26] x86-64, NUMA: Drop @start/last_pfn from initmem_init()
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
  2011-02-12 17:10 ` [PATCH 01/26] x86-64, NUMA: Make dummy node initialization path similar to non-dummy ones Tejun Heo
  2011-02-12 17:10 ` [PATCH 02/26] x86-64, NUMA: Simplify hotplug node handling in acpi_numa_memory_affinity_init() Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-12 17:58   ` Yinghai Lu
  2011-02-14 13:50   ` [PATCH UPDATED 03/26] x86, NUMA: Drop @start/last_pfn from initmem_init() initmem_init() Tejun Heo
  2011-02-12 17:10 ` [PATCH 04/26] x86-64, NUMA: Unify {acpi|amd}_{numa_init|scan_nodes}() arguments and return values Tejun Heo
                   ` (22 subsequent siblings)
  25 siblings, 2 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

initmem_init() extensively accesses and modifies global data
structures, and its parameters aren't even honored on some of the
init paths.  Drop @start/last_pfn and let it deal with @max_pfn
directly.  This is in preparation for further NUMA init cleanups.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/page_types.h |    3 +--
 arch/x86/kernel/setup.c           |    2 +-
 arch/x86/mm/init_64.c             |    5 ++---
 arch/x86/mm/numa_64.c             |   21 ++++++++-------------
 4 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index 1df6621..95892a1 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -48,8 +48,7 @@ extern unsigned long max_pfn_mapped;
 extern unsigned long init_memory_mapping(unsigned long start,
 					 unsigned long end);
 
-extern void initmem_init(unsigned long start_pfn, unsigned long end_pfn,
-				int acpi, int k8);
+extern void initmem_init(int acpi, int k8);
 extern void free_initmem(void);
 
 #endif	/* !__ASSEMBLY__ */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1202341..c50ba3d 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -996,7 +996,7 @@ void __init setup_arch(char **cmdline_p)
 		amd = !amd_numa_init(0, max_pfn);
 #endif
 
-	initmem_init(0, max_pfn, acpi, amd);
+	initmem_init(acpi, amd);
 	memblock_find_dma_reserve();
 	dma32_reserve_bootmem();
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 71a5929..26e4e73 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -612,10 +612,9 @@ kernel_physical_mapping_init(unsigned long start,
 }
 
 #ifndef CONFIG_NUMA
-void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
-				int acpi, int k8)
+void __init initmem_init(int acpi, int k8)
 {
-	memblock_x86_register_active_regions(0, start_pfn, end_pfn);
+	memblock_x86_register_active_regions(0, 0, max_pfn);
 }
 #endif
 
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index ea5dd48..f534feb 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -578,8 +578,7 @@ static int __init numa_emulation(unsigned long start_pfn,
 }
 #endif /* CONFIG_NUMA_EMU */
 
-void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
-				int acpi, int amd)
+void __init initmem_init(int acpi, int amd)
 {
 	int i;
 
@@ -587,19 +586,16 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
 	nodes_clear(node_online_map);
 
 #ifdef CONFIG_NUMA_EMU
-	setup_physnodes(start_pfn << PAGE_SHIFT, last_pfn << PAGE_SHIFT,
-			acpi, amd);
-	if (cmdline && !numa_emulation(start_pfn, last_pfn, acpi, amd))
+	setup_physnodes(0, max_pfn << PAGE_SHIFT, acpi, amd);
+	if (cmdline && !numa_emulation(0, max_pfn, acpi, amd))
 		return;
-	setup_physnodes(start_pfn << PAGE_SHIFT, last_pfn << PAGE_SHIFT,
-			acpi, amd);
+	setup_physnodes(0, max_pfn << PAGE_SHIFT, acpi, amd);
 	nodes_clear(node_possible_map);
 	nodes_clear(node_online_map);
 #endif
 
 #ifdef CONFIG_ACPI_NUMA
-	if (!numa_off && acpi && !acpi_scan_nodes(start_pfn << PAGE_SHIFT,
-						  last_pfn << PAGE_SHIFT))
+	if (!numa_off && acpi && !acpi_scan_nodes(0, max_pfn << PAGE_SHIFT))
 		return;
 	nodes_clear(node_possible_map);
 	nodes_clear(node_online_map);
@@ -615,8 +611,7 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
 	       numa_off ? "NUMA turned off" : "No NUMA configuration found");
 
 	printk(KERN_INFO "Faking a node at %016lx-%016lx\n",
-	       start_pfn << PAGE_SHIFT,
-	       last_pfn << PAGE_SHIFT);
+	       0LU, max_pfn << PAGE_SHIFT);
 	/* setup dummy node covering all memory */
 	memnode_shift = 63;
 	memnodemap = memnode.embedded_map;
@@ -625,8 +620,8 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
 	node_set(0, node_possible_map);
 	for (i = 0; i < MAX_LOCAL_APIC; i++)
 		set_apicid_to_node(i, NUMA_NO_NODE);
-	memblock_x86_register_active_regions(0, start_pfn, last_pfn);
-	setup_node_bootmem(0, start_pfn << PAGE_SHIFT, last_pfn << PAGE_SHIFT);
+	memblock_x86_register_active_regions(0, 0, max_pfn);
+	setup_node_bootmem(0, 0, max_pfn << PAGE_SHIFT);
 	numa_init_array();
 }
 
-- 
1.7.1



* [PATCH 04/26] x86-64, NUMA: Unify {acpi|amd}_{numa_init|scan_nodes}() arguments and return values
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (2 preceding siblings ...)
  2011-02-12 17:10 ` [PATCH 03/26] x86-64, NUMA: Drop @start/last_pfn from initmem_init() Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-12 18:39   ` Yinghai Lu
  2011-02-12 17:10 ` [PATCH 05/26] x86-64, NUMA: Wrap acpi_numa_init() so that failure can be indicated by return value Tejun Heo
                   ` (21 subsequent siblings)
  25 siblings, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

The functions used during NUMA initialization - *_numa_init() and
*_scan_nodes() - have different arguments and return values.  Unify
them such that they all take no argument and return 0 on success and
-errno on failure.  This is in preparation for further NUMA init
cleanups.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/acpi.h   |    2 +-
 arch/x86/include/asm/amd_nb.h |    2 +-
 arch/x86/kernel/setup.c       |    4 ++--
 arch/x86/mm/amdtopology_64.c  |   18 +++++++++---------
 arch/x86/mm/numa_64.c         |    2 +-
 arch/x86/mm/srat_64.c         |    4 ++--
 drivers/acpi/numa.c           |    9 ++++++---
 7 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index 211ca3f..4e5dff9 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -187,7 +187,7 @@ struct bootnode;
 extern int acpi_numa;
 extern void acpi_get_nodes(struct bootnode *physnodes, unsigned long start,
 				unsigned long end);
-extern int acpi_scan_nodes(unsigned long start, unsigned long end);
+extern int acpi_scan_nodes(void);
 #define NR_NODE_MEMBLKS (MAX_NUMNODES*2)
 
 #ifdef CONFIG_NUMA_EMU
diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index 64dc82e..72abf65 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -16,7 +16,7 @@ struct bootnode;
 extern int early_is_amd_nb(u32 value);
 extern int amd_cache_northbridges(void);
 extern void amd_flush_garts(void);
-extern int amd_numa_init(unsigned long start_pfn, unsigned long end_pfn);
+extern int amd_numa_init(void);
 extern int amd_scan_nodes(void);
 
 #ifdef CONFIG_NUMA_EMU
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c50ba3d..1870a59 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -988,12 +988,12 @@ void __init setup_arch(char **cmdline_p)
 	/*
 	 * Parse SRAT to discover nodes.
 	 */
-	acpi = acpi_numa_init();
+	acpi = !acpi_numa_init();
 #endif
 
 #ifdef CONFIG_AMD_NUMA
 	if (!acpi)
-		amd = !amd_numa_init(0, max_pfn);
+		amd = !amd_numa_init();
 #endif
 
 	initmem_init(acpi, amd);
diff --git a/arch/x86/mm/amdtopology_64.c b/arch/x86/mm/amdtopology_64.c
index c7fae38..ee70257 100644
--- a/arch/x86/mm/amdtopology_64.c
+++ b/arch/x86/mm/amdtopology_64.c
@@ -51,7 +51,7 @@ static __init int find_northbridge(void)
 		return num;
 	}
 
-	return -1;
+	return -ENOENT;
 }
 
 static __init void early_get_boot_cpu_id(void)
@@ -69,17 +69,17 @@ static __init void early_get_boot_cpu_id(void)
 #endif
 }
 
-int __init amd_numa_init(unsigned long start_pfn, unsigned long end_pfn)
+int __init amd_numa_init(void)
 {
-	unsigned long start = PFN_PHYS(start_pfn);
-	unsigned long end = PFN_PHYS(end_pfn);
+	unsigned long start = PFN_PHYS(0);
+	unsigned long end = PFN_PHYS(max_pfn);
 	unsigned numnodes;
 	unsigned long prevbase;
 	int i, nb, found = 0;
 	u32 nodeid, reg;
 
 	if (!early_pci_allowed())
-		return -1;
+		return -EINVAL;
 
 	nb = find_northbridge();
 	if (nb < 0)
@@ -90,7 +90,7 @@ int __init amd_numa_init(unsigned long start_pfn, unsigned long end_pfn)
 	reg = read_pci_config(0, nb, 0, 0x60);
 	numnodes = ((reg >> 4) & 0xF) + 1;
 	if (numnodes <= 1)
-		return -1;
+		return -ENOENT;
 
 	pr_info("Number of physical nodes %d\n", numnodes);
 
@@ -121,7 +121,7 @@ int __init amd_numa_init(unsigned long start_pfn, unsigned long end_pfn)
 		if ((base >> 8) & 3 || (limit >> 8) & 3) {
 			pr_err("Node %d using interleaving mode %lx/%lx\n",
 			       nodeid, (base >> 8) & 3, (limit >> 8) & 3);
-			return -1;
+			return -EINVAL;
 		}
 		if (node_isset(nodeid, nodes_parsed)) {
 			pr_info("Node %d already present, skipping\n",
@@ -160,7 +160,7 @@ int __init amd_numa_init(unsigned long start_pfn, unsigned long end_pfn)
 		if (prevbase > base) {
 			pr_err("Node map not sorted %lx,%lx\n",
 			       prevbase, base);
-			return -1;
+			return -EINVAL;
 		}
 
 		pr_info("Node %d MemBase %016lx Limit %016lx\n",
@@ -177,7 +177,7 @@ int __init amd_numa_init(unsigned long start_pfn, unsigned long end_pfn)
 	}
 
 	if (!found)
-		return -1;
+		return -ENOENT;
 	return 0;
 }
 
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index f534feb..85561d1 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -595,7 +595,7 @@ void __init initmem_init(int acpi, int amd)
 #endif
 
 #ifdef CONFIG_ACPI_NUMA
-	if (!numa_off && acpi && !acpi_scan_nodes(0, max_pfn << PAGE_SHIFT))
+	if (!numa_off && acpi && !acpi_scan_nodes())
 		return;
 	nodes_clear(node_possible_map);
 	nodes_clear(node_online_map);
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index e3e0dd3..19652dd 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -359,7 +359,7 @@ void __init acpi_get_nodes(struct bootnode *physnodes, unsigned long start,
 #endif /* CONFIG_NUMA_EMU */
 
 /* Use the information discovered above to actually set up the nodes. */
-int __init acpi_scan_nodes(unsigned long start, unsigned long end)
+int __init acpi_scan_nodes(void)
 {
 	int i;
 
@@ -368,7 +368,7 @@ int __init acpi_scan_nodes(unsigned long start, unsigned long end)
 
 	/* First clean up the node list */
 	for (i = 0; i < MAX_NUMNODES; i++)
-		cutoff_node(i, start, end);
+		cutoff_node(i, 0, max_pfn << PAGE_SHIFT);
 
 	/*
 	 * Join together blocks on the same node, holes between
diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
index 5eb25eb..3b5c318 100644
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -274,7 +274,7 @@ acpi_table_parse_srat(enum acpi_srat_type id,
 
 int __init acpi_numa_init(void)
 {
-	int ret = 0;
+	int cnt = 0;
 
 	/*
 	 * Should not limit number with cpu num that is from NR_CPUS or nr_cpus=
@@ -288,7 +288,7 @@ int __init acpi_numa_init(void)
 				     acpi_parse_x2apic_affinity, 0);
 		acpi_table_parse_srat(ACPI_SRAT_TYPE_CPU_AFFINITY,
 				     acpi_parse_processor_affinity, 0);
-		ret = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
+		cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
 					    acpi_parse_memory_affinity,
 					    NR_NODE_MEMBLKS);
 	}
@@ -297,7 +297,10 @@ int __init acpi_numa_init(void)
 	acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
 
 	acpi_numa_arch_fixup();
-	return ret;
+
+	if (cnt <= 0)
+		return cnt ?: -ENOENT;
+	return 0;
 }
 
 int acpi_get_pxm(acpi_handle h)
-- 
1.7.1



* [PATCH 05/26] x86-64, NUMA: Wrap acpi_numa_init() so that failure can be indicated by return value
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (3 preceding siblings ...)
  2011-02-12 17:10 ` [PATCH 04/26] x86-64, NUMA: Unify {acpi|amd}_{numa_init|scan_nodes}() arguments and return values Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-12 17:10 ` [PATCH 06/26] x86-64, NUMA: Move *_numa_init() invocations into initmem_init() Tejun Heo
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

Because of the way ACPI tables are parsed, the generic
acpi_numa_init() couldn't return failure when an error was detected
by the arch hooks.  Instead, the failure state was recorded and the
later arch-dependent init hook - acpi_scan_nodes() - would fail.

Wrap acpi_numa_init() with x86_acpi_numa_init() so that failure can be
indicated as return value immediately.  This is in preparation for
further NUMA init cleanups.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/acpi.h |    1 +
 arch/x86/kernel/setup.c     |    2 +-
 arch/x86/mm/srat_64.c       |   10 ++++++++++
 3 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index 4e5dff9..06fb786 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -187,6 +187,7 @@ struct bootnode;
 extern int acpi_numa;
 extern void acpi_get_nodes(struct bootnode *physnodes, unsigned long start,
 				unsigned long end);
+extern int x86_acpi_numa_init(void);
 extern int acpi_scan_nodes(void);
 #define NR_NODE_MEMBLKS (MAX_NUMNODES*2)
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1870a59..f69d838 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -988,7 +988,7 @@ void __init setup_arch(char **cmdline_p)
 	/*
 	 * Parse SRAT to discover nodes.
 	 */
-	acpi = !acpi_numa_init();
+	acpi = !x86_acpi_numa_init();
 #endif
 
 #ifdef CONFIG_AMD_NUMA
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index 19652dd..8d145ae 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -358,6 +358,16 @@ void __init acpi_get_nodes(struct bootnode *physnodes, unsigned long start,
 }
 #endif /* CONFIG_NUMA_EMU */
 
+int __init x86_acpi_numa_init(void)
+{
+	int ret;
+
+	ret = acpi_numa_init();
+	if (ret < 0)
+		return ret;
+	return srat_disabled() ? -EINVAL : 0;
+}
+
 /* Use the information discovered above to actually set up the nodes. */
 int __init acpi_scan_nodes(void)
 {
-- 
1.7.1



* [PATCH 06/26] x86-64, NUMA: Move *_numa_init() invocations into initmem_init()
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (4 preceding siblings ...)
  2011-02-12 17:10 ` [PATCH 05/26] x86-64, NUMA: Wrap acpi_numa_init() so that failure can be indicated by return value Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-14  6:10   ` Ankita Garg
  2011-02-14 13:51   ` [PATCH UPDATED 06/26] x86, " Tejun Heo
  2011-02-12 17:10 ` [PATCH 07/26] x86-64, NUMA: Restructure initmem_init() Tejun Heo
                   ` (19 subsequent siblings)
  25 siblings, 2 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

There's no reason for these to live in setup_arch().  Move them inside
initmem_init().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/page_types.h |    2 +-
 arch/x86/kernel/setup.c           |   16 +---------------
 arch/x86/mm/init_64.c             |    2 +-
 arch/x86/mm/numa_64.c             |   16 +++++++++++++++-
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index 95892a1..c157986 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -48,7 +48,7 @@ extern unsigned long max_pfn_mapped;
 extern unsigned long init_memory_mapping(unsigned long start,
 					 unsigned long end);
 
-extern void initmem_init(int acpi, int k8);
+extern void initmem_init(void);
 extern void free_initmem(void);
 
 #endif	/* !__ASSEMBLY__ */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f69d838..9907b45 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -704,8 +704,6 @@ static u64 __init get_max_mapped(void)
 
 void __init setup_arch(char **cmdline_p)
 {
-	int acpi = 0;
-	int amd = 0;
 	unsigned long flags;
 
 #ifdef CONFIG_X86_32
@@ -984,19 +982,7 @@ void __init setup_arch(char **cmdline_p)
 
 	early_acpi_boot_init();
 
-#ifdef CONFIG_ACPI_NUMA
-	/*
-	 * Parse SRAT to discover nodes.
-	 */
-	acpi = !x86_acpi_numa_init();
-#endif
-
-#ifdef CONFIG_AMD_NUMA
-	if (!acpi)
-		amd = !amd_numa_init();
-#endif
-
-	initmem_init(acpi, amd);
+	initmem_init();
 	memblock_find_dma_reserve();
 	dma32_reserve_bootmem();
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 26e4e73..2f333d4 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -612,7 +612,7 @@ kernel_physical_mapping_init(unsigned long start,
 }
 
 #ifndef CONFIG_NUMA
-void __init initmem_init(int acpi, int k8)
+void __init initmem_init(void)
 {
 	memblock_x86_register_active_regions(0, 0, max_pfn);
 }
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 85561d1..4105728 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -13,6 +13,7 @@
 #include <linux/module.h>
 #include <linux/nodemask.h>
 #include <linux/sched.h>
+#include <linux/acpi.h>
 
 #include <asm/e820.h>
 #include <asm/proto.h>
@@ -578,10 +579,23 @@ static int __init numa_emulation(unsigned long start_pfn,
 }
 #endif /* CONFIG_NUMA_EMU */
 
-void __init initmem_init(int acpi, int amd)
+void __init initmem_init(void)
 {
+	int acpi = 0, amd = 0;
 	int i;
 
+#ifdef CONFIG_ACPI_NUMA
+	/*
+	 * Parse SRAT to discover nodes.
+	 */
+	acpi = !x86_acpi_numa_init();
+#endif
+
+#ifdef CONFIG_AMD_NUMA
+	if (!acpi)
+		amd = !amd_numa_init();
+#endif
+
 	nodes_clear(node_possible_map);
 	nodes_clear(node_online_map);
 
-- 
1.7.1



* [PATCH 07/26] x86-64, NUMA: Restructure initmem_init()
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (5 preceding siblings ...)
  2011-02-12 17:10 ` [PATCH 06/26] x86-64, NUMA: Move *_numa_init() invocations into initmem_init() Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-12 17:10 ` [PATCH 08/26] x86-64, NUMA: Use common {cpu|mem}_nodes_parsed Tejun Heo
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

Reorganize initmem_init() such that,

* Different NUMA init methods are iterated in a consistent way.

* Each iteration re-initializes all the parameters and a different
  method can be tried after a failure.

* Dummy init is handled the same as other methods.

Apart from how retries after failure are handled, this patch doesn't
change the behavior.  The call sequences are kept equivalent across
the conversion.

After the change, bad_srat() doesn't need to clear the apicid-to-node
mapping or worry about numa_off.  Simplified accordingly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/mm/numa_64.c |   94 ++++++++++++++++++++++++++----------------------
 arch/x86/mm/srat_64.c |    4 +--
 2 files changed, 52 insertions(+), 46 deletions(-)

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 4105728..ba7e5b6 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -579,64 +579,72 @@ static int __init numa_emulation(unsigned long start_pfn,
 }
 #endif /* CONFIG_NUMA_EMU */
 
-void __init initmem_init(void)
+static int dummy_numa_init(void)
 {
-	int acpi = 0, amd = 0;
-	int i;
-
-#ifdef CONFIG_ACPI_NUMA
-	/*
-	 * Parse SRAT to discover nodes.
-	 */
-	acpi = !x86_acpi_numa_init();
-#endif
-
-#ifdef CONFIG_AMD_NUMA
-	if (!acpi)
-		amd = !amd_numa_init();
-#endif
-
-	nodes_clear(node_possible_map);
-	nodes_clear(node_online_map);
-
-#ifdef CONFIG_NUMA_EMU
-	setup_physnodes(0, max_pfn << PAGE_SHIFT, acpi, amd);
-	if (cmdline && !numa_emulation(0, max_pfn, acpi, amd))
-		return;
-	setup_physnodes(0, max_pfn << PAGE_SHIFT, acpi, amd);
-	nodes_clear(node_possible_map);
-	nodes_clear(node_online_map);
-#endif
-
-#ifdef CONFIG_ACPI_NUMA
-	if (!numa_off && acpi && !acpi_scan_nodes())
-		return;
-	nodes_clear(node_possible_map);
-	nodes_clear(node_online_map);
-#endif
+	return 0;
+}
 
-#ifdef CONFIG_AMD_NUMA
-	if (!numa_off && amd && !amd_scan_nodes())
-		return;
-	nodes_clear(node_possible_map);
-	nodes_clear(node_online_map);
-#endif
+static int dummy_scan_nodes(void)
+{
 	printk(KERN_INFO "%s\n",
 	       numa_off ? "NUMA turned off" : "No NUMA configuration found");
-
 	printk(KERN_INFO "Faking a node at %016lx-%016lx\n",
 	       0LU, max_pfn << PAGE_SHIFT);
+
 	/* setup dummy node covering all memory */
 	memnode_shift = 63;
 	memnodemap = memnode.embedded_map;
 	memnodemap[0] = 0;
 	node_set_online(0);
 	node_set(0, node_possible_map);
-	for (i = 0; i < MAX_LOCAL_APIC; i++)
-		set_apicid_to_node(i, NUMA_NO_NODE);
 	memblock_x86_register_active_regions(0, 0, max_pfn);
 	setup_node_bootmem(0, 0, max_pfn << PAGE_SHIFT);
 	numa_init_array();
+
+	return 0;
+}
+
+void __init initmem_init(void)
+{
+	int (*numa_init[])(void) = { [2] = dummy_numa_init };
+	int (*scan_nodes[])(void) = { [2] = dummy_scan_nodes };
+	int i, j;
+
+	if (!numa_off) {
+#ifdef CONFIG_ACPI_NUMA
+		numa_init[0] = x86_acpi_numa_init;
+		scan_nodes[0] = acpi_scan_nodes;
+#endif
+#ifdef CONFIG_AMD_NUMA
+		numa_init[1] = amd_numa_init;
+		scan_nodes[1] = amd_scan_nodes;
+#endif
+	}
+
+	for (i = 0; i < ARRAY_SIZE(numa_init); i++) {
+		if (!numa_init[i])
+			continue;
+
+		for (j = 0; j < MAX_LOCAL_APIC; j++)
+			set_apicid_to_node(j, NUMA_NO_NODE);
+
+		nodes_clear(node_possible_map);
+		nodes_clear(node_online_map);
+
+		if (numa_init[i]() < 0)
+			continue;
+#ifdef CONFIG_NUMA_EMU
+		setup_physnodes(0, max_pfn << PAGE_SHIFT, i == 0, i == 1);
+		if (cmdline && !numa_emulation(0, max_pfn, i == 0, i == 1))
+			return;
+		setup_physnodes(0, max_pfn << PAGE_SHIFT, i == 0, i == 1);
+		nodes_clear(node_possible_map);
+		nodes_clear(node_online_map);
+#endif
+		if (!scan_nodes[i]())
+			return;
+	}
+	BUG();
 }
 
 unsigned long __init numa_free_all_bootmem(void)
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index 8d145ae..2da8b65 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -78,8 +78,6 @@ static __init void bad_srat(void)
 	int i;
 	printk(KERN_ERR "SRAT: SRAT not used.\n");
 	acpi_numa = -1;
-	for (i = 0; i < MAX_LOCAL_APIC; i++)
-		set_apicid_to_node(i, NUMA_NO_NODE);
 	for (i = 0; i < MAX_NUMNODES; i++) {
 		nodes[i].start = nodes[i].end = 0;
 		nodes_add[i].start = nodes_add[i].end = 0;
@@ -89,7 +87,7 @@ static __init void bad_srat(void)
 
 static __init inline int srat_disabled(void)
 {
-	return numa_off || acpi_numa < 0;
+	return acpi_numa < 0;
 }
 
 /* Callback for SLIT parsing */
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 08/26] x86-64, NUMA: Use common {cpu|mem}_nodes_parsed
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (6 preceding siblings ...)
  2011-02-12 17:10 ` [PATCH 07/26] x86-64, NUMA: Restructure initmem_init() Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-12 17:10 ` [PATCH 09/26] x86-64, NUMA: Remove local variable found from amd_numa_init() Tejun Heo
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

ACPI and AMD code use separate nodes_parsed masks.  Add
{cpu|mem}_nodes_parsed and use them in all NUMA init methods.
Initialization of the masks and building node_possible_map are now
handled commonly by initmem_init().

dummy_numa_init() is updated to set node 0 on both masks.  While at
it, move the info messages from scan to init.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/numa_64.h |    3 +++
 arch/x86/mm/amdtopology_64.c   |   10 ++++------
 arch/x86/mm/numa_64.c          |   25 ++++++++++++++++++-------
 arch/x86/mm/srat_64.c          |   17 ++++++-----------
 4 files changed, 31 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/numa_64.h b/arch/x86/include/asm/numa_64.h
index 2819afa..de45936 100644
--- a/arch/x86/include/asm/numa_64.h
+++ b/arch/x86/include/asm/numa_64.h
@@ -27,6 +27,9 @@ extern void setup_node_bootmem(int nodeid, unsigned long start,
  */
 #define NODE_MIN_SIZE (4*1024*1024)
 
+extern nodemask_t cpu_nodes_parsed __initdata;
+extern nodemask_t mem_nodes_parsed __initdata;
+
 extern int __cpuinit numa_cpu_node(int cpu);
 
 #ifdef CONFIG_NUMA_EMU
diff --git a/arch/x86/mm/amdtopology_64.c b/arch/x86/mm/amdtopology_64.c
index ee70257..3180d96 100644
--- a/arch/x86/mm/amdtopology_64.c
+++ b/arch/x86/mm/amdtopology_64.c
@@ -28,7 +28,6 @@
 
 static struct bootnode __initdata nodes[8];
 static unsigned char __initdata nodeids[8];
-static nodemask_t __initdata nodes_parsed = NODE_MASK_NONE;
 
 static __init int find_northbridge(void)
 {
@@ -123,7 +122,7 @@ int __init amd_numa_init(void)
 			       nodeid, (base >> 8) & 3, (limit >> 8) & 3);
 			return -EINVAL;
 		}
-		if (node_isset(nodeid, nodes_parsed)) {
+		if (node_isset(nodeid, mem_nodes_parsed)) {
 			pr_info("Node %d already present, skipping\n",
 				nodeid);
 			continue;
@@ -173,7 +172,8 @@ int __init amd_numa_init(void)
 
 		prevbase = base;
 
-		node_set(nodeid, nodes_parsed);
+		node_set(nodeid, mem_nodes_parsed);
+		node_set(nodeid, cpu_nodes_parsed);
 	}
 
 	if (!found)
@@ -190,7 +190,7 @@ void __init amd_get_nodes(struct bootnode *physnodes)
 {
 	int i;
 
-	for_each_node_mask(i, nodes_parsed) {
+	for_each_node_mask(i, mem_nodes_parsed) {
 		physnodes[i].start = nodes[i].start;
 		physnodes[i].end = nodes[i].end;
 	}
@@ -258,8 +258,6 @@ int __init amd_scan_nodes(void)
 	unsigned int apicid_base;
 	int i;
 
-	BUG_ON(nodes_empty(nodes_parsed));
-	node_possible_map = nodes_parsed;
 	memnode_shift = compute_hash_shift(nodes, 8, NULL);
 	if (memnode_shift < 0) {
 		pr_err("No NUMA node hash function found. Contact maintainer\n");
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index ba7e5b6..86be8e3 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -25,6 +25,9 @@
 struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
 EXPORT_SYMBOL(node_data);
 
+nodemask_t cpu_nodes_parsed __initdata;
+nodemask_t mem_nodes_parsed __initdata;
+
 struct memnode memnode;
 
 static unsigned long __initdata nodemap_addr;
@@ -581,22 +584,23 @@ static int __init numa_emulation(unsigned long start_pfn,
 
 static int dummy_numa_init(void)
 {
-	return 0;
-}
-
-static int dummy_scan_nodes(void)
-{
 	printk(KERN_INFO "%s\n",
 	       numa_off ? "NUMA turned off" : "No NUMA configuration found");
 	printk(KERN_INFO "Faking a node at %016lx-%016lx\n",
 	       0LU, max_pfn << PAGE_SHIFT);
 
+	node_set(0, cpu_nodes_parsed);
+	node_set(0, mem_nodes_parsed);
+
+	return 0;
+}
+
+static int dummy_scan_nodes(void)
+{
 	/* setup dummy node covering all memory */
 	memnode_shift = 63;
 	memnodemap = memnode.embedded_map;
 	memnodemap[0] = 0;
-	node_set_online(0);
-	node_set(0, node_possible_map);
 	memblock_x86_register_active_regions(0, 0, max_pfn);
 	setup_node_bootmem(0, 0, max_pfn << PAGE_SHIFT);
 	numa_init_array();
@@ -628,6 +632,8 @@ void __init initmem_init(void)
 		for (j = 0; j < MAX_LOCAL_APIC; j++)
 			set_apicid_to_node(j, NUMA_NO_NODE);
 
+		nodes_clear(cpu_nodes_parsed);
+		nodes_clear(mem_nodes_parsed);
 		nodes_clear(node_possible_map);
 		nodes_clear(node_online_map);
 
@@ -641,6 +647,11 @@ void __init initmem_init(void)
 		nodes_clear(node_possible_map);
 		nodes_clear(node_online_map);
 #endif
+		/* Account for nodes with cpus and no memory */
+		nodes_or(node_possible_map, mem_nodes_parsed, cpu_nodes_parsed);
+		if (WARN_ON(nodes_empty(node_possible_map)))
+			continue;
+
 		if (!scan_nodes[i]())
 			return;
 	}
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index 2da8b65..822bd68 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -28,8 +28,6 @@ int acpi_numa __initdata;
 
 static struct acpi_table_slit *acpi_slit;
 
-static nodemask_t nodes_parsed __initdata;
-static nodemask_t cpu_nodes_parsed __initdata;
 static struct bootnode nodes[MAX_NUMNODES] __initdata;
 static struct bootnode nodes_add[MAX_NUMNODES];
 
@@ -293,7 +291,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 
 	if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
 		nd = &nodes[node];
-		if (!node_test_and_set(node, nodes_parsed)) {
+		if (!node_test_and_set(node, mem_nodes_parsed)) {
 			nd->start = start;
 			nd->end = end;
 		} else {
@@ -319,7 +317,7 @@ static int __init nodes_cover_memory(const struct bootnode *nodes)
 	unsigned long pxmram, e820ram;
 
 	pxmram = 0;
-	for_each_node_mask(i, nodes_parsed) {
+	for_each_node_mask(i, mem_nodes_parsed) {
 		unsigned long s = nodes[i].start >> PAGE_SHIFT;
 		unsigned long e = nodes[i].end >> PAGE_SHIFT;
 		pxmram += e - s;
@@ -348,7 +346,7 @@ void __init acpi_get_nodes(struct bootnode *physnodes, unsigned long start,
 {
 	int i;
 
-	for_each_node_mask(i, nodes_parsed) {
+	for_each_node_mask(i, mem_nodes_parsed) {
 		cutoff_node(i, start, end);
 		physnodes[i].start = nodes[i].start;
 		physnodes[i].end = nodes[i].end;
@@ -447,9 +445,6 @@ int __init acpi_scan_nodes(void)
 		return -1;
 	}
 
-	/* Account for nodes with cpus and no memory */
-	nodes_or(node_possible_map, nodes_parsed, cpu_nodes_parsed);
-
 	/* Finally register nodes */
 	for_each_node_mask(i, node_possible_map)
 		setup_node_bootmem(i, nodes[i].start, nodes[i].end);
@@ -483,7 +478,7 @@ static int __init find_node_by_addr(unsigned long addr)
 	int ret = NUMA_NO_NODE;
 	int i;
 
-	for_each_node_mask(i, nodes_parsed) {
+	for_each_node_mask(i, mem_nodes_parsed) {
 		/*
 		 * Find the real node that this emulated node appears on.  For
 		 * the sake of simplicity, we only use a real node's starting
@@ -543,10 +538,10 @@ void __init acpi_fake_nodes(const struct bootnode *fake_nodes, int num_nodes)
 		__acpi_map_pxm_to_node(fake_node_to_pxm_map[i], i);
 	memcpy(__apicid_to_node, fake_apicid_to_node, sizeof(__apicid_to_node));
 
-	nodes_clear(nodes_parsed);
+	nodes_clear(mem_nodes_parsed);
 	for (i = 0; i < num_nodes; i++)
 		if (fake_nodes[i].start != fake_nodes[i].end)
-			node_set(i, nodes_parsed);
+			node_set(i, mem_nodes_parsed);
 }
 
 static int null_slit_node_compare(int a, int b)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 09/26] x86-64, NUMA: Remove local variable found from amd_numa_init()
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (7 preceding siblings ...)
  2011-02-12 17:10 ` [PATCH 08/26] x86-64, NUMA: Use common {cpu|mem}_nodes_parsed Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-12 17:10 ` [PATCH 10/26] x86-64, NUMA: Move apicid to numa mapping initialization from amd_scan_nodes() to amd_numa_init() Tejun Heo
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

Use the weight of mem_nodes_parsed instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/mm/amdtopology_64.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/amdtopology_64.c b/arch/x86/mm/amdtopology_64.c
index 3180d96..c5eddfa 100644
--- a/arch/x86/mm/amdtopology_64.c
+++ b/arch/x86/mm/amdtopology_64.c
@@ -74,7 +74,7 @@ int __init amd_numa_init(void)
 	unsigned long end = PFN_PHYS(max_pfn);
 	unsigned numnodes;
 	unsigned long prevbase;
-	int i, nb, found = 0;
+	int i, nb;
 	u32 nodeid, reg;
 
 	if (!early_pci_allowed())
@@ -165,8 +165,6 @@ int __init amd_numa_init(void)
 		pr_info("Node %d MemBase %016lx Limit %016lx\n",
 			nodeid, base, limit);
 
-		found++;
-
 		nodes[nodeid].start = base;
 		nodes[nodeid].end = limit;
 
@@ -176,7 +174,7 @@ int __init amd_numa_init(void)
 		node_set(nodeid, cpu_nodes_parsed);
 	}
 
-	if (!found)
+	if (!nodes_weight(mem_nodes_parsed))
 		return -ENOENT;
 	return 0;
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 10/26] x86-64, NUMA: Move apicid to numa mapping initialization from amd_scan_nodes() to amd_numa_init()
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (8 preceding siblings ...)
  2011-02-12 17:10 ` [PATCH 09/26] x86-64, NUMA: Remove local variable found from amd_numa_init() Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-14 22:59   ` Cyrill Gorcunov
  2011-02-12 17:10 ` [PATCH 11/26] x86-64, NUMA: Use common numa_nodes[] Tejun Heo
                   ` (15 subsequent siblings)
  25 siblings, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

This brings AMD initialization behavior closer to that of ACPI.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/mm/amdtopology_64.c |   40 ++++++++++++++++++++++------------------
 1 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/arch/x86/mm/amdtopology_64.c b/arch/x86/mm/amdtopology_64.c
index c5eddfa..4056333 100644
--- a/arch/x86/mm/amdtopology_64.c
+++ b/arch/x86/mm/amdtopology_64.c
@@ -74,8 +74,9 @@ int __init amd_numa_init(void)
 	unsigned long end = PFN_PHYS(max_pfn);
 	unsigned numnodes;
 	unsigned long prevbase;
-	int i, nb;
+	int i, j, nb;
 	u32 nodeid, reg;
+	unsigned int bits, cores, apicid_base;
 
 	if (!early_pci_allowed())
 		return -EINVAL;
@@ -176,6 +177,26 @@ int __init amd_numa_init(void)
 
 	if (!nodes_weight(mem_nodes_parsed))
 		return -ENOENT;
+
+	/*
+	 * We seem to have valid NUMA configuration.  Map apicids to nodes
+	 * using the coreid bits from early_identify_cpu.
+	 */
+	bits = boot_cpu_data.x86_coreid_bits;
+	cores = 1 << bits;
+	apicid_base = 0;
+
+	/* get the APIC ID of the BSP early for systems with apicid lifting */
+	early_get_boot_cpu_id();
+	if (boot_cpu_physical_apicid > 0) {
+		pr_info("BSP APIC ID: %02x\n", boot_cpu_physical_apicid);
+		apicid_base = boot_cpu_physical_apicid;
+	}
+
+	for_each_node_mask(i, cpu_nodes_parsed)
+		for (j = apicid_base; j < cores + apicid_base; j++)
+			set_apicid_to_node((i << bits) + j, i);
+
 	return 0;
 }
 
@@ -251,9 +272,6 @@ void __init amd_fake_nodes(const struct bootnode *nodes, int nr_nodes)
 
 int __init amd_scan_nodes(void)
 {
-	unsigned int bits;
-	unsigned int cores;
-	unsigned int apicid_base;
 	int i;
 
 	memnode_shift = compute_hash_shift(nodes, 8, NULL);
@@ -264,24 +282,10 @@ int __init amd_scan_nodes(void)
 	pr_info("Using node hash shift of %d\n", memnode_shift);
 
 	/* use the coreid bits from early_identify_cpu */
-	bits = boot_cpu_data.x86_coreid_bits;
-	cores = (1<<bits);
-	apicid_base = 0;
-	/* get the APIC ID of the BSP early for systems with apicid lifting */
-	early_get_boot_cpu_id();
-	if (boot_cpu_physical_apicid > 0) {
-		pr_info("BSP APIC ID: %02x\n", boot_cpu_physical_apicid);
-		apicid_base = boot_cpu_physical_apicid;
-	}
-
 	for_each_node_mask(i, node_possible_map) {
-		int j;
-
 		memblock_x86_register_active_regions(i,
 				nodes[i].start >> PAGE_SHIFT,
 				nodes[i].end >> PAGE_SHIFT);
-		for (j = apicid_base; j < cores + apicid_base; j++)
-			set_apicid_to_node((i << bits) + j, i);
 		setup_node_bootmem(i, nodes[i].start, nodes[i].end);
 	}
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 11/26] x86-64, NUMA: Use common numa_nodes[]
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (9 preceding siblings ...)
  2011-02-12 17:10 ` [PATCH 10/26] x86-64, NUMA: Move apicid to numa mapping initialization from amd_scan_nodes() to amd_numa_init() Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-12 17:10 ` [PATCH 12/26] x86-64, NUMA: Kill {acpi|amd}_get_nodes() Tejun Heo
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

ACPI and AMD code use separate nodes[] arrays.  Add a common
numa_nodes[] and use it in all NUMA init methods.  cutoff_node()
cleanup is moved
from srat_64.c to numa_64.c and applied in initmem_init() regardless
of init methods.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/numa_64.h |    1 +
 arch/x86/mm/amdtopology_64.c   |   19 ++++++++---------
 arch/x86/mm/numa_64.c          |   24 ++++++++++++++++++++++
 arch/x86/mm/srat_64.c          |   43 ++++++++++-----------------------------
 4 files changed, 45 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/numa_64.h b/arch/x86/include/asm/numa_64.h
index de45936..d3a4514 100644
--- a/arch/x86/include/asm/numa_64.h
+++ b/arch/x86/include/asm/numa_64.h
@@ -29,6 +29,7 @@ extern void setup_node_bootmem(int nodeid, unsigned long start,
 
 extern nodemask_t cpu_nodes_parsed __initdata;
 extern nodemask_t mem_nodes_parsed __initdata;
+extern struct bootnode numa_nodes[MAX_NUMNODES] __initdata;
 
 extern int __cpuinit numa_cpu_node(int cpu);
 
diff --git a/arch/x86/mm/amdtopology_64.c b/arch/x86/mm/amdtopology_64.c
index 4056333..06698b1 100644
--- a/arch/x86/mm/amdtopology_64.c
+++ b/arch/x86/mm/amdtopology_64.c
@@ -26,7 +26,6 @@
 #include <asm/apic.h>
 #include <asm/amd_nb.h>
 
-static struct bootnode __initdata nodes[8];
 static unsigned char __initdata nodeids[8];
 
 static __init int find_northbridge(void)
@@ -166,8 +165,8 @@ int __init amd_numa_init(void)
 		pr_info("Node %d MemBase %016lx Limit %016lx\n",
 			nodeid, base, limit);
 
-		nodes[nodeid].start = base;
-		nodes[nodeid].end = limit;
+		numa_nodes[nodeid].start = base;
+		numa_nodes[nodeid].end = limit;
 
 		prevbase = base;
 
@@ -210,8 +209,8 @@ void __init amd_get_nodes(struct bootnode *physnodes)
 	int i;
 
 	for_each_node_mask(i, mem_nodes_parsed) {
-		physnodes[i].start = nodes[i].start;
-		physnodes[i].end = nodes[i].end;
+		physnodes[i].start = numa_nodes[i].start;
+		physnodes[i].end = numa_nodes[i].end;
 	}
 }
 
@@ -221,7 +220,7 @@ static int __init find_node_by_addr(unsigned long addr)
 	int i;
 
 	for (i = 0; i < 8; i++)
-		if (addr >= nodes[i].start && addr < nodes[i].end) {
+		if (addr >= numa_nodes[i].start && addr < numa_nodes[i].end) {
 			ret = i;
 			break;
 		}
@@ -274,7 +273,7 @@ int __init amd_scan_nodes(void)
 {
 	int i;
 
-	memnode_shift = compute_hash_shift(nodes, 8, NULL);
+	memnode_shift = compute_hash_shift(numa_nodes, 8, NULL);
 	if (memnode_shift < 0) {
 		pr_err("No NUMA node hash function found. Contact maintainer\n");
 		return -1;
@@ -284,9 +283,9 @@ int __init amd_scan_nodes(void)
 	/* use the coreid bits from early_identify_cpu */
 	for_each_node_mask(i, node_possible_map) {
 		memblock_x86_register_active_regions(i,
-				nodes[i].start >> PAGE_SHIFT,
-				nodes[i].end >> PAGE_SHIFT);
-		setup_node_bootmem(i, nodes[i].start, nodes[i].end);
+				numa_nodes[i].start >> PAGE_SHIFT,
+				numa_nodes[i].end >> PAGE_SHIFT);
+		setup_node_bootmem(i, numa_nodes[i].start, numa_nodes[i].end);
 	}
 
 	numa_init_array();
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 86be8e3..a0bceaa 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -33,6 +33,8 @@ struct memnode memnode;
 static unsigned long __initdata nodemap_addr;
 static unsigned long __initdata nodemap_size;
 
+struct bootnode numa_nodes[MAX_NUMNODES] __initdata;
+
 /*
  * Given a shift value, try to populate memnodemap[]
  * Returns :
@@ -182,6 +184,22 @@ static void * __init early_node_mem(int nodeid, unsigned long start,
 	return NULL;
 }
 
+static __init void cutoff_node(int i, unsigned long start, unsigned long end)
+{
+	struct bootnode *nd = &numa_nodes[i];
+
+	if (nd->start < start) {
+		nd->start = start;
+		if (nd->end < nd->start)
+			nd->start = nd->end;
+	}
+	if (nd->end > end) {
+		nd->end = end;
+		if (nd->start > nd->end)
+			nd->start = nd->end;
+	}
+}
+
 /* Initialize bootmem allocator for a node */
 void __init
 setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
@@ -636,9 +654,15 @@ void __init initmem_init(void)
 		nodes_clear(mem_nodes_parsed);
 		nodes_clear(node_possible_map);
 		nodes_clear(node_online_map);
+		memset(numa_nodes, 0, sizeof(numa_nodes));
 
 		if (numa_init[i]() < 0)
 			continue;
+
+		/* clean up the node list */
+		for (j = 0; j < MAX_NUMNODES; j++)
+			cutoff_node(j, 0, max_pfn << PAGE_SHIFT);
+
 #ifdef CONFIG_NUMA_EMU
 		setup_physnodes(0, max_pfn << PAGE_SHIFT, i == 0, i == 1);
 		if (cmdline && !numa_emulation(0, max_pfn, i == 0, i == 1))
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index 822bd68..abb17d6 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -28,7 +28,6 @@ int acpi_numa __initdata;
 
 static struct acpi_table_slit *acpi_slit;
 
-static struct bootnode nodes[MAX_NUMNODES] __initdata;
 static struct bootnode nodes_add[MAX_NUMNODES];
 
 static int num_node_memblks __initdata;
@@ -55,29 +54,13 @@ static __init int conflicting_memblks(unsigned long start, unsigned long end)
 	return -1;
 }
 
-static __init void cutoff_node(int i, unsigned long start, unsigned long end)
-{
-	struct bootnode *nd = &nodes[i];
-
-	if (nd->start < start) {
-		nd->start = start;
-		if (nd->end < nd->start)
-			nd->start = nd->end;
-	}
-	if (nd->end > end) {
-		nd->end = end;
-		if (nd->start > nd->end)
-			nd->start = nd->end;
-	}
-}
-
 static __init void bad_srat(void)
 {
 	int i;
 	printk(KERN_ERR "SRAT: SRAT not used.\n");
 	acpi_numa = -1;
 	for (i = 0; i < MAX_NUMNODES; i++) {
-		nodes[i].start = nodes[i].end = 0;
+		numa_nodes[i].start = numa_nodes[i].end = 0;
 		nodes_add[i].start = nodes_add[i].end = 0;
 	}
 	remove_all_active_ranges();
@@ -276,12 +259,12 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 	if (i == node) {
 		printk(KERN_WARNING
 		"SRAT: Warning: PXM %d (%lx-%lx) overlaps with itself (%Lx-%Lx)\n",
-			pxm, start, end, nodes[i].start, nodes[i].end);
+		       pxm, start, end, numa_nodes[i].start, numa_nodes[i].end);
 	} else if (i >= 0) {
 		printk(KERN_ERR
 		       "SRAT: PXM %d (%lx-%lx) overlaps with PXM %d (%Lx-%Lx)\n",
 		       pxm, start, end, node_to_pxm(i),
-			nodes[i].start, nodes[i].end);
+		       numa_nodes[i].start, numa_nodes[i].end);
 		bad_srat();
 		return;
 	}
@@ -290,7 +273,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 	       start, end);
 
 	if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
-		nd = &nodes[node];
+		nd = &numa_nodes[node];
 		if (!node_test_and_set(node, mem_nodes_parsed)) {
 			nd->start = start;
 			nd->end = end;
@@ -347,9 +330,8 @@ void __init acpi_get_nodes(struct bootnode *physnodes, unsigned long start,
 	int i;
 
 	for_each_node_mask(i, mem_nodes_parsed) {
-		cutoff_node(i, start, end);
-		physnodes[i].start = nodes[i].start;
-		physnodes[i].end = nodes[i].end;
+		physnodes[i].start = numa_nodes[i].start;
+		physnodes[i].end = numa_nodes[i].end;
 	}
 }
 #endif /* CONFIG_NUMA_EMU */
@@ -372,10 +354,6 @@ int __init acpi_scan_nodes(void)
 	if (acpi_numa <= 0)
 		return -1;
 
-	/* First clean up the node list */
-	for (i = 0; i < MAX_NUMNODES; i++)
-		cutoff_node(i, 0, max_pfn << PAGE_SHIFT);
-
 	/*
 	 * Join together blocks on the same node, holes between
 	 * which don't overlap with memory on other nodes.
@@ -440,19 +418,20 @@ int __init acpi_scan_nodes(void)
 
 	/* for out of order entries in SRAT */
 	sort_node_map();
-	if (!nodes_cover_memory(nodes)) {
+	if (!nodes_cover_memory(numa_nodes)) {
 		bad_srat();
 		return -1;
 	}
 
 	/* Finally register nodes */
 	for_each_node_mask(i, node_possible_map)
-		setup_node_bootmem(i, nodes[i].start, nodes[i].end);
+		setup_node_bootmem(i, numa_nodes[i].start, numa_nodes[i].end);
 	/* Try again in case setup_node_bootmem missed one due
 	   to missing bootmem */
 	for_each_node_mask(i, node_possible_map)
 		if (!node_online(i))
-			setup_node_bootmem(i, nodes[i].start, nodes[i].end);
+			setup_node_bootmem(i, numa_nodes[i].start,
+					   numa_nodes[i].end);
 
 	for (i = 0; i < nr_cpu_ids; i++) {
 		int node = early_cpu_to_node(i);
@@ -484,7 +463,7 @@ static int __init find_node_by_addr(unsigned long addr)
 		 * the sake of simplicity, we only use a real node's starting
 		 * address to determine which emulated node it appears on.
 		 */
-		if (addr >= nodes[i].start && addr < nodes[i].end) {
+		if (addr >= numa_nodes[i].start && addr < numa_nodes[i].end) {
 			ret = i;
 			break;
 		}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 12/26] x86-64, NUMA: Kill {acpi|amd}_get_nodes()
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (10 preceding siblings ...)
  2011-02-12 17:10 ` [PATCH 11/26] x86-64, NUMA: Use common numa_nodes[] Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-12 17:10 ` [PATCH 13/26] x86-64, NUMA: Factor out memblk handling into numa_{add|register}_memblk() Tejun Heo
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

With common numa_nodes[], common code in numa_64.c can access it
directly.  Copy directly and kill {acpi|amd}_get_nodes().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/acpi.h   |    2 --
 arch/x86/include/asm/amd_nb.h |    1 -
 arch/x86/mm/amdtopology_64.c  |   10 ----------
 arch/x86/mm/numa_64.c         |   23 ++++++++++-------------
 arch/x86/mm/srat_64.c         |   13 -------------
 5 files changed, 10 insertions(+), 39 deletions(-)

diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index 06fb786..446a5b9 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -185,8 +185,6 @@ struct bootnode;
 
 #ifdef CONFIG_ACPI_NUMA
 extern int acpi_numa;
-extern void acpi_get_nodes(struct bootnode *physnodes, unsigned long start,
-				unsigned long end);
 extern int x86_acpi_numa_init(void);
 extern int acpi_scan_nodes(void);
 #define NR_NODE_MEMBLKS (MAX_NUMNODES*2)
diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index 72abf65..765966f 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -21,7 +21,6 @@ extern int amd_scan_nodes(void);
 
 #ifdef CONFIG_NUMA_EMU
 extern void amd_fake_nodes(const struct bootnode *nodes, int nr_nodes);
-extern void amd_get_nodes(struct bootnode *nodes);
 #endif
 
 struct amd_northbridge {
diff --git a/arch/x86/mm/amdtopology_64.c b/arch/x86/mm/amdtopology_64.c
index 06698b1..fe93e23 100644
--- a/arch/x86/mm/amdtopology_64.c
+++ b/arch/x86/mm/amdtopology_64.c
@@ -204,16 +204,6 @@ static s16 fake_apicid_to_node[MAX_LOCAL_APIC] __initdata = {
 	[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
 };
 
-void __init amd_get_nodes(struct bootnode *physnodes)
-{
-	int i;
-
-	for_each_node_mask(i, mem_nodes_parsed) {
-		physnodes[i].start = numa_nodes[i].start;
-		physnodes[i].end = numa_nodes[i].end;
-	}
-}
-
 static int __init find_node_by_addr(unsigned long addr)
 {
 	int ret = NUMA_NO_NODE;
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index a0bceaa..2d3ee2f 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -257,21 +257,18 @@ void __init numa_emu_cmdline(char *str)
 	cmdline = str;
 }
 
-static int __init setup_physnodes(unsigned long start, unsigned long end,
-					int acpi, int amd)
+static int __init setup_physnodes(unsigned long start, unsigned long end)
 {
 	int ret = 0;
 	int i;
 
 	memset(physnodes, 0, sizeof(physnodes));
-#ifdef CONFIG_ACPI_NUMA
-	if (acpi)
-		acpi_get_nodes(physnodes, start, end);
-#endif
-#ifdef CONFIG_AMD_NUMA
-	if (amd)
-		amd_get_nodes(physnodes);
-#endif
+
+	for_each_node_mask(i, mem_nodes_parsed) {
+		physnodes[i].start = numa_nodes[i].start;
+		physnodes[i].end = numa_nodes[i].end;
+	}
+
 	/*
 	 * Basic sanity checking on the physical node map: there may be errors
 	 * if the SRAT or AMD code incorrectly reported the topology or the mem=
@@ -593,7 +590,7 @@ static int __init numa_emulation(unsigned long start_pfn,
 						nodes[i].end >> PAGE_SHIFT);
 		setup_node_bootmem(i, nodes[i].start, nodes[i].end);
 	}
-	setup_physnodes(addr, max_addr, acpi, amd);
+	setup_physnodes(addr, max_addr);
 	fake_physnodes(acpi, amd, num_nodes);
 	numa_init_array();
 	return 0;
@@ -664,10 +661,10 @@ void __init initmem_init(void)
 			cutoff_node(j, 0, max_pfn << PAGE_SHIFT);
 
 #ifdef CONFIG_NUMA_EMU
-		setup_physnodes(0, max_pfn << PAGE_SHIFT, i == 0, i == 1);
+		setup_physnodes(0, max_pfn << PAGE_SHIFT);
 		if (cmdline && !numa_emulation(0, max_pfn, i == 0, i == 1))
 			return;
-		setup_physnodes(0, max_pfn << PAGE_SHIFT, i == 0, i == 1);
+		setup_physnodes(0, max_pfn << PAGE_SHIFT);
 		nodes_clear(node_possible_map);
 		nodes_clear(node_online_map);
 #endif
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index abb17d6..d84c983 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -323,19 +323,6 @@ static int __init nodes_cover_memory(const struct bootnode *nodes)
 
 void __init acpi_numa_arch_fixup(void) {}
 
-#ifdef CONFIG_NUMA_EMU
-void __init acpi_get_nodes(struct bootnode *physnodes, unsigned long start,
-				unsigned long end)
-{
-	int i;
-
-	for_each_node_mask(i, mem_nodes_parsed) {
-		physnodes[i].start = numa_nodes[i].start;
-		physnodes[i].end = numa_nodes[i].end;
-	}
-}
-#endif /* CONFIG_NUMA_EMU */
-
 int __init x86_acpi_numa_init(void)
 {
 	int ret;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 13/26] x86-64, NUMA: Factor out memblk handling into numa_{add|register}_memblk()
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (11 preceding siblings ...)
  2011-02-12 17:10 ` [PATCH 12/26] x86-64, NUMA: Kill {acpi|amd}_get_nodes() Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-12 17:10 ` [PATCH 14/26] x86-64, NUMA: Unify use of memblk in all init methods Tejun Heo
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

Factor out memblk handling from srat_64.c into two functions in
numa_64.c.  This patch doesn't introduce any behavior change.  The
next patch will make all init methods use these functions.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/acpi.h    |    1 -
 arch/x86/include/asm/numa_64.h |    5 ++-
 arch/x86/mm/numa_64.c          |  109 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/mm/srat_64.c          |   96 +----------------------------------
 4 files changed, 116 insertions(+), 95 deletions(-)

diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index 446a5b9..12bd1fd 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -187,7 +187,6 @@ struct bootnode;
 extern int acpi_numa;
 extern int x86_acpi_numa_init(void);
 extern int acpi_scan_nodes(void);
-#define NR_NODE_MEMBLKS (MAX_NUMNODES*2)
 
 #ifdef CONFIG_NUMA_EMU
 extern void acpi_fake_nodes(const struct bootnode *fake_nodes,
diff --git a/arch/x86/include/asm/numa_64.h b/arch/x86/include/asm/numa_64.h
index d3a4514..2b6a1c5 100644
--- a/arch/x86/include/asm/numa_64.h
+++ b/arch/x86/include/asm/numa_64.h
@@ -25,13 +25,16 @@ extern void setup_node_bootmem(int nodeid, unsigned long start,
  * result from BIOS bugs. So dont recognize nodes as standalone
  * NUMA entities that have less than this amount of RAM listed:
  */
-#define NODE_MIN_SIZE (4*1024*1024)
+#define NODE_MIN_SIZE		(4*1024*1024)
+#define NR_NODE_MEMBLKS		(MAX_NUMNODES*2)
 
 extern nodemask_t cpu_nodes_parsed __initdata;
 extern nodemask_t mem_nodes_parsed __initdata;
 extern struct bootnode numa_nodes[MAX_NUMNODES] __initdata;
 
 extern int __cpuinit numa_cpu_node(int cpu);
+extern int __init numa_add_memblk(int nodeid, u64 start, u64 end);
+extern int __init numa_register_memblks(void);
 
 #ifdef CONFIG_NUMA_EMU
 #define FAKE_NODE_MIN_SIZE	((u64)32 << 20)
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 2d3ee2f..bbc42ca 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -33,6 +33,10 @@ struct memnode memnode;
 static unsigned long __initdata nodemap_addr;
 static unsigned long __initdata nodemap_size;
 
+static int num_node_memblks __initdata;
+static struct bootnode node_memblk_range[NR_NODE_MEMBLKS] __initdata;
+static int memblk_nodeid[NR_NODE_MEMBLKS] __initdata;
+
 struct bootnode numa_nodes[MAX_NUMNODES] __initdata;
 
 /*
@@ -184,6 +188,43 @@ static void * __init early_node_mem(int nodeid, unsigned long start,
 	return NULL;
 }
 
+static __init int conflicting_memblks(unsigned long start, unsigned long end)
+{
+	int i;
+	for (i = 0; i < num_node_memblks; i++) {
+		struct bootnode *nd = &node_memblk_range[i];
+		if (nd->start == nd->end)
+			continue;
+		if (nd->end > start && nd->start < end)
+			return memblk_nodeid[i];
+		if (nd->end == end && nd->start == start)
+			return memblk_nodeid[i];
+	}
+	return -1;
+}
+
+int __init numa_add_memblk(int nid, u64 start, u64 end)
+{
+	int i;
+
+	i = conflicting_memblks(start, end);
+	if (i == nid) {
+		printk(KERN_WARNING "NUMA: Warning: node %d (%Lx-%Lx) overlaps with itself (%Lx-%Lx)\n",
+		       nid, start, end, numa_nodes[i].start, numa_nodes[i].end);
+	} else if (i >= 0) {
+		printk(KERN_ERR "NUMA: node %d (%Lx-%Lx) overlaps with node %d (%Lx-%Lx)\n",
+		       nid, start, end, i,
+		       numa_nodes[i].start, numa_nodes[i].end);
+		return -EINVAL;
+	}
+
+	node_memblk_range[num_node_memblks].start = start;
+	node_memblk_range[num_node_memblks].end = end;
+	memblk_nodeid[num_node_memblks] = nid;
+	num_node_memblks++;
+	return 0;
+}
+
 static __init void cutoff_node(int i, unsigned long start, unsigned long end)
 {
 	struct bootnode *nd = &numa_nodes[i];
@@ -246,6 +287,71 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
 	node_set_online(nodeid);
 }
 
+int __init numa_register_memblks(void)
+{
+	int i;
+
+	/*
+	 * Join together blocks on the same node, holes between
+	 * which don't overlap with memory on other nodes.
+	 */
+	for (i = 0; i < num_node_memblks; ++i) {
+		int j, k;
+
+		for (j = i + 1; j < num_node_memblks; ++j) {
+			unsigned long start, end;
+
+			if (memblk_nodeid[i] != memblk_nodeid[j])
+				continue;
+			start = min(node_memblk_range[i].end,
+			            node_memblk_range[j].end);
+			end = max(node_memblk_range[i].start,
+			          node_memblk_range[j].start);
+			for (k = 0; k < num_node_memblks; ++k) {
+				if (memblk_nodeid[i] == memblk_nodeid[k])
+					continue;
+				if (start < node_memblk_range[k].end &&
+				    end > node_memblk_range[k].start)
+					break;
+			}
+			if (k < num_node_memblks)
+				continue;
+			start = min(node_memblk_range[i].start,
+			            node_memblk_range[j].start);
+			end = max(node_memblk_range[i].end,
+			          node_memblk_range[j].end);
+			printk(KERN_INFO "NUMA: Node %d [%Lx,%Lx) + [%Lx,%Lx) -> [%lx,%lx)\n",
+			       memblk_nodeid[i],
+			       node_memblk_range[i].start,
+			       node_memblk_range[i].end,
+			       node_memblk_range[j].start,
+			       node_memblk_range[j].end,
+			       start, end);
+			node_memblk_range[i].start = start;
+			node_memblk_range[i].end = end;
+			k = --num_node_memblks - j;
+			memmove(memblk_nodeid + j, memblk_nodeid + j+1,
+				k * sizeof(*memblk_nodeid));
+			memmove(node_memblk_range + j, node_memblk_range + j+1,
+				k * sizeof(*node_memblk_range));
+			--j;
+		}
+	}
+
+	memnode_shift = compute_hash_shift(node_memblk_range, num_node_memblks,
+					   memblk_nodeid);
+	if (memnode_shift < 0) {
+		printk(KERN_ERR "NUMA: No NUMA node hash function found. Contact maintainer\n");
+		return -EINVAL;
+	}
+
+	for (i = 0; i < num_node_memblks; i++)
+		memblock_x86_register_active_regions(memblk_nodeid[i],
+				node_memblk_range[i].start >> PAGE_SHIFT,
+				node_memblk_range[i].end >> PAGE_SHIFT);
+	return 0;
+}
+
 #ifdef CONFIG_NUMA_EMU
 /* Numa emulation */
 static struct bootnode nodes[MAX_NUMNODES] __initdata;
@@ -651,6 +757,9 @@ void __init initmem_init(void)
 		nodes_clear(mem_nodes_parsed);
 		nodes_clear(node_possible_map);
 		nodes_clear(node_online_map);
+		num_node_memblks = 0;
+		memset(node_memblk_range, 0, sizeof(node_memblk_range));
+		memset(memblk_nodeid, 0, sizeof(memblk_nodeid));
 		memset(numa_nodes, 0, sizeof(numa_nodes));
 
 		if (numa_init[i]() < 0)
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index d84c983..b0f0616 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -30,30 +30,11 @@ static struct acpi_table_slit *acpi_slit;
 
 static struct bootnode nodes_add[MAX_NUMNODES];
 
-static int num_node_memblks __initdata;
-static struct bootnode node_memblk_range[NR_NODE_MEMBLKS] __initdata;
-static int memblk_nodeid[NR_NODE_MEMBLKS] __initdata;
-
 static __init int setup_node(int pxm)
 {
 	return acpi_map_pxm_to_node(pxm);
 }
 
-static __init int conflicting_memblks(unsigned long start, unsigned long end)
-{
-	int i;
-	for (i = 0; i < num_node_memblks; i++) {
-		struct bootnode *nd = &node_memblk_range[i];
-		if (nd->start == nd->end)
-			continue;
-		if (nd->end > start && nd->start < end)
-			return memblk_nodeid[i];
-		if (nd->end == end && nd->start == start)
-			return memblk_nodeid[i];
-	}
-	return -1;
-}
-
 static __init void bad_srat(void)
 {
 	int i;
@@ -233,7 +214,6 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 	struct bootnode *nd;
 	unsigned long start, end;
 	int node, pxm;
-	int i;
 
 	if (srat_disabled())
 		return;
@@ -255,16 +235,8 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 		bad_srat();
 		return;
 	}
-	i = conflicting_memblks(start, end);
-	if (i == node) {
-		printk(KERN_WARNING
-		"SRAT: Warning: PXM %d (%lx-%lx) overlaps with itself (%Lx-%Lx)\n",
-		       pxm, start, end, numa_nodes[i].start, numa_nodes[i].end);
-	} else if (i >= 0) {
-		printk(KERN_ERR
-		       "SRAT: PXM %d (%lx-%lx) overlaps with PXM %d (%Lx-%Lx)\n",
-		       pxm, start, end, node_to_pxm(i),
-		       numa_nodes[i].start, numa_nodes[i].end);
+
+	if (numa_add_memblk(node, start, end) < 0) {
 		bad_srat();
 		return;
 	}
@@ -285,11 +257,6 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 		}
 	} else
 		update_nodes_add(node, start, end);
-
-	node_memblk_range[num_node_memblks].start = start;
-	node_memblk_range[num_node_memblks].end = end;
-	memblk_nodeid[num_node_memblks] = node;
-	num_node_memblks++;
 }
 
 /* Sanity check to catch more bad SRATs (they are amazingly common).
@@ -341,68 +308,11 @@ int __init acpi_scan_nodes(void)
 	if (acpi_numa <= 0)
 		return -1;
 
-	/*
-	 * Join together blocks on the same node, holes between
-	 * which don't overlap with memory on other nodes.
-	 */
-	for (i = 0; i < num_node_memblks; ++i) {
-		int j, k;
-
-		for (j = i + 1; j < num_node_memblks; ++j) {
-			unsigned long start, end;
-
-			if (memblk_nodeid[i] != memblk_nodeid[j])
-				continue;
-			start = min(node_memblk_range[i].end,
-			            node_memblk_range[j].end);
-			end = max(node_memblk_range[i].start,
-			          node_memblk_range[j].start);
-			for (k = 0; k < num_node_memblks; ++k) {
-				if (memblk_nodeid[i] == memblk_nodeid[k])
-					continue;
-				if (start < node_memblk_range[k].end &&
-				    end > node_memblk_range[k].start)
-					break;
-			}
-			if (k < num_node_memblks)
-				continue;
-			start = min(node_memblk_range[i].start,
-			            node_memblk_range[j].start);
-			end = max(node_memblk_range[i].end,
-			          node_memblk_range[j].end);
-			printk(KERN_INFO "SRAT: Node %d "
-			       "[%Lx,%Lx) + [%Lx,%Lx) -> [%lx,%lx)\n",
-			       memblk_nodeid[i],
-			       node_memblk_range[i].start,
-			       node_memblk_range[i].end,
-			       node_memblk_range[j].start,
-			       node_memblk_range[j].end,
-			       start, end);
-			node_memblk_range[i].start = start;
-			node_memblk_range[i].end = end;
-			k = --num_node_memblks - j;
-			memmove(memblk_nodeid + j, memblk_nodeid + j+1,
-				k * sizeof(*memblk_nodeid));
-			memmove(node_memblk_range + j, node_memblk_range + j+1,
-				k * sizeof(*node_memblk_range));
-			--j;
-		}
-	}
-
-	memnode_shift = compute_hash_shift(node_memblk_range, num_node_memblks,
-					   memblk_nodeid);
-	if (memnode_shift < 0) {
-		printk(KERN_ERR
-		     "SRAT: No NUMA node hash function found. Contact maintainer\n");
+	if (numa_register_memblks() < 0) {
 		bad_srat();
 		return -1;
 	}
 
-	for (i = 0; i < num_node_memblks; i++)
-		memblock_x86_register_active_regions(memblk_nodeid[i],
-				node_memblk_range[i].start >> PAGE_SHIFT,
-				node_memblk_range[i].end >> PAGE_SHIFT);
-
 	/* for out of order entries in SRAT */
 	sort_node_map();
 	if (!nodes_cover_memory(numa_nodes)) {
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread
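The overlap detection that numa_add_memblk() in the patch above relies on
is a plain half-open interval test.  A minimal user-space sketch of the
same rule (hypothetical `struct blk` and function names, not the kernel
code) behaves like this:

```c
#include <assert.h>

/* Simplified stand-in for the kernel's struct bootnode + nodeid pair. */
struct blk { unsigned long start, end; int nid; };

/*
 * Return the nid of the first recorded block whose [start, end) range
 * overlaps the candidate [start, end), or -1 if none does.  Zero-length
 * blocks are skipped, mirroring conflicting_memblks() in the patch.
 */
static int conflicting(const struct blk *blks, int n,
		       unsigned long start, unsigned long end)
{
	int i;

	for (i = 0; i < n; i++) {
		if (blks[i].start == blks[i].end)
			continue;	/* empty block, ignore */
		if (blks[i].end > start && blks[i].start < end)
			return blks[i].nid;
	}
	return -1;
}
```

numa_add_memblk() then warns when the conflicting nid is the caller's own
node (self-overlap) and rejects the block with -EINVAL when it belongs to
a different node.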

* [PATCH 14/26] x86-64, NUMA: Unify use of memblk in all init methods
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (12 preceding siblings ...)
  2011-02-12 17:10 ` [PATCH 13/26] x86-64, NUMA: Factor out memblk handling into numa_{add|register}_memblk() Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-12 17:10 ` [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration Tejun Heo
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

Make both amd and dummy use numa_add_memblk() to describe the detected
memory blocks.  This allows initmem_init() to call
numa_register_memblks() regardless of the init method in use.  Drop the
custom memory registration code from amd and dummy.

After this change, memblk merge/cleanup in numa_register_memblks() is
applied to all init methods.

As this makes compute_hash_shift() and numa_register_memblks() used
only inside numa_64.c, make them static.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/numa_64.h |    4 ----
 arch/x86/mm/amdtopology_64.c   |   15 ++-------------
 arch/x86/mm/numa_64.c          |   15 +++++++--------
 arch/x86/mm/srat_64.c          |    5 -----
 4 files changed, 9 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/numa_64.h b/arch/x86/include/asm/numa_64.h
index 2b6a1c5..fbc9d33 100644
--- a/arch/x86/include/asm/numa_64.h
+++ b/arch/x86/include/asm/numa_64.h
@@ -8,9 +8,6 @@ struct bootnode {
 	u64 end;
 };
 
-extern int compute_hash_shift(struct bootnode *nodes, int numblks,
-			      int *nodeids);
-
 #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
 
 extern int numa_off;
@@ -34,7 +31,6 @@ extern struct bootnode numa_nodes[MAX_NUMNODES] __initdata;
 
 extern int __cpuinit numa_cpu_node(int cpu);
 extern int __init numa_add_memblk(int nodeid, u64 start, u64 end);
-extern int __init numa_register_memblks(void);
 
 #ifdef CONFIG_NUMA_EMU
 #define FAKE_NODE_MIN_SIZE	((u64)32 << 20)
diff --git a/arch/x86/mm/amdtopology_64.c b/arch/x86/mm/amdtopology_64.c
index fe93e23..48ec374 100644
--- a/arch/x86/mm/amdtopology_64.c
+++ b/arch/x86/mm/amdtopology_64.c
@@ -167,6 +167,7 @@ int __init amd_numa_init(void)
 
 		numa_nodes[nodeid].start = base;
 		numa_nodes[nodeid].end = limit;
+		numa_add_memblk(nodeid, base, limit);
 
 		prevbase = base;
 
@@ -263,20 +264,8 @@ int __init amd_scan_nodes(void)
 {
 	int i;
 
-	memnode_shift = compute_hash_shift(numa_nodes, 8, NULL);
-	if (memnode_shift < 0) {
-		pr_err("No NUMA node hash function found. Contact maintainer\n");
-		return -1;
-	}
-	pr_info("Using node hash shift of %d\n", memnode_shift);
-
-	/* use the coreid bits from early_identify_cpu */
-	for_each_node_mask(i, node_possible_map) {
-		memblock_x86_register_active_regions(i,
-				numa_nodes[i].start >> PAGE_SHIFT,
-				numa_nodes[i].end >> PAGE_SHIFT);
+	for_each_node_mask(i, node_possible_map)
 		setup_node_bootmem(i, numa_nodes[i].start, numa_nodes[i].end);
-	}
 
 	numa_init_array();
 	return 0;
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index bbc42ca..2e2ca94 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -131,8 +131,8 @@ static int __init extract_lsb_from_nodes(const struct bootnode *nodes,
 	return i;
 }
 
-int __init compute_hash_shift(struct bootnode *nodes, int numnodes,
-			      int *nodeids)
+static int __init compute_hash_shift(struct bootnode *nodes, int numnodes,
+				     int *nodeids)
 {
 	int shift;
 
@@ -287,7 +287,7 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
 	node_set_online(nodeid);
 }
 
-int __init numa_register_memblks(void)
+static int __init numa_register_memblks(void)
 {
 	int i;
 
@@ -712,17 +712,13 @@ static int dummy_numa_init(void)
 
 	node_set(0, cpu_nodes_parsed);
 	node_set(0, mem_nodes_parsed);
+	numa_add_memblk(0, 0, (u64)max_pfn << PAGE_SHIFT);
 
 	return 0;
 }
 
 static int dummy_scan_nodes(void)
 {
-	/* setup dummy node covering all memory */
-	memnode_shift = 63;
-	memnodemap = memnode.embedded_map;
-	memnodemap[0] = 0;
-	memblock_x86_register_active_regions(0, 0, max_pfn);
 	setup_node_bootmem(0, 0, max_pfn << PAGE_SHIFT);
 	numa_init_array();
 
@@ -782,6 +778,9 @@ void __init initmem_init(void)
 		if (WARN_ON(nodes_empty(node_possible_map)))
 			continue;
 
+		if (numa_register_memblks() < 0)
+			continue;
+
 		if (!scan_nodes[i]())
 			return;
 	}
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index b0f0616..755d157 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -308,11 +308,6 @@ int __init acpi_scan_nodes(void)
 	if (acpi_numa <= 0)
 		return -1;
 
-	if (numa_register_memblks() < 0) {
-		bad_srat();
-		return -1;
-	}
-
 	/* for out of order entries in SRAT */
 	sort_node_map();
 	if (!nodes_cover_memory(numa_nodes)) {
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread
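The merge/cleanup step that numa_register_memblks() now applies to every
init method joins two blocks of the same node when the hole between them
does not intersect any other node's memory.  A compact user-space model
of just that rule (hypothetical names; the kernel loop additionally
rewrites the arrays in place):

```c
#include <assert.h>

struct blk { unsigned long start, end; int nid; };

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * Can blocks i and j (assumed to have the same nid) be merged without
 * swallowing memory that belongs to another node?  The candidate hole
 * between them is [min(end_i, end_j), max(start_i, start_j)).
 */
static int can_merge(const struct blk *b, int n, int i, int j)
{
	unsigned long hole_s = min_ul(b[i].end, b[j].end);
	unsigned long hole_e = max_ul(b[i].start, b[j].start);
	int k;

	for (k = 0; k < n; k++) {
		if (b[k].nid == b[i].nid)
			continue;
		if (hole_s < b[k].end && hole_e > b[k].start)
			return 0;	/* hole overlaps a foreign block */
	}
	return 1;
}
```

When the check passes, the kernel replaces block i with the union
[min(start_i, start_j), max(end_i, end_j)) and removes block j with
memmove().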

* [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (13 preceding siblings ...)
  2011-02-12 17:10 ` [PATCH 14/26] x86-64, NUMA: Unify use of memblk in all init methods Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-13  0:45   ` Yinghai Lu
  2011-02-12 17:10 ` [PATCH 16/26] x86-64, NUMA: Kill {acpi|amd|dummy}_scan_nodes() Tejun Heo
                   ` (10 subsequent siblings)
  25 siblings, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

Move the remaining memblk registration logic from acpi_scan_nodes() to
numa_register_memblks() and initmem_init().

This applies the nodes_cover_memory() sanity check, memory node sorting
and node_online() checking, which were previously applied only to acpi,
to all init methods.

As all memblk registration is moved to common code, active range
clearing is moved to initmem_init() too and removed from bad_srat().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/mm/amdtopology_64.c |    6 ---
 arch/x86/mm/numa_64.c        |   71 +++++++++++++++++++++++++++++++++++++++---
 arch/x86/mm/srat_64.c        |   59 ----------------------------------
 3 files changed, 66 insertions(+), 70 deletions(-)

diff --git a/arch/x86/mm/amdtopology_64.c b/arch/x86/mm/amdtopology_64.c
index 48ec374..9c9f46a 100644
--- a/arch/x86/mm/amdtopology_64.c
+++ b/arch/x86/mm/amdtopology_64.c
@@ -262,11 +262,5 @@ void __init amd_fake_nodes(const struct bootnode *nodes, int nr_nodes)
 
 int __init amd_scan_nodes(void)
 {
-	int i;
-
-	for_each_node_mask(i, node_possible_map)
-		setup_node_bootmem(i, numa_nodes[i].start, numa_nodes[i].end);
-
-	numa_init_array();
 	return 0;
 }
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 2e2ca94..062649d 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -287,6 +287,37 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
 	node_set_online(nodeid);
 }
 
+/*
+ * Sanity check to catch more bad NUMA configurations (they are amazingly
+ * common).  Make sure the nodes cover all memory.
+ */
+static int __init nodes_cover_memory(const struct bootnode *nodes)
+{
+	unsigned long numaram, e820ram;
+	int i;
+
+	numaram = 0;
+	for_each_node_mask(i, mem_nodes_parsed) {
+		unsigned long s = nodes[i].start >> PAGE_SHIFT;
+		unsigned long e = nodes[i].end >> PAGE_SHIFT;
+		numaram += e - s;
+		numaram -= __absent_pages_in_range(i, s, e);
+		if ((long)numaram < 0)
+			numaram = 0;
+	}
+
+	e820ram = max_pfn -
+		(memblock_x86_hole_size(0, max_pfn<<PAGE_SHIFT) >> PAGE_SHIFT);
+	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
+	if ((long)(e820ram - numaram) >= (1<<(20 - PAGE_SHIFT))) {
+		printk(KERN_ERR "NUMA: nodes only cover %luMB of your %luMB e820 RAM. Not used.\n",
+			(numaram << PAGE_SHIFT) >> 20,
+			(e820ram << PAGE_SHIFT) >> 20);
+		return 0;
+	}
+	return 1;
+}
+
 static int __init numa_register_memblks(void)
 {
 	int i;
@@ -349,6 +380,25 @@ static int __init numa_register_memblks(void)
 		memblock_x86_register_active_regions(memblk_nodeid[i],
 				node_memblk_range[i].start >> PAGE_SHIFT,
 				node_memblk_range[i].end >> PAGE_SHIFT);
+
+	/* for out of order entries */
+	sort_node_map();
+	if (!nodes_cover_memory(numa_nodes))
+		return -EINVAL;
+
+	/* Finally register nodes. */
+	for_each_node_mask(i, node_possible_map)
+		setup_node_bootmem(i, numa_nodes[i].start, numa_nodes[i].end);
+
+	/*
+	 * Try again in case setup_node_bootmem missed one due to missing
+	 * bootmem.
+	 */
+	for_each_node_mask(i, node_possible_map)
+		if (!node_online(i))
+			setup_node_bootmem(i, numa_nodes[i].start,
+					   numa_nodes[i].end);
+
 	return 0;
 }
 
@@ -713,15 +763,14 @@ static int dummy_numa_init(void)
 	node_set(0, cpu_nodes_parsed);
 	node_set(0, mem_nodes_parsed);
 	numa_add_memblk(0, 0, (u64)max_pfn << PAGE_SHIFT);
+	numa_nodes[0].start = 0;
+	numa_nodes[0].end = (u64)max_pfn << PAGE_SHIFT;
 
 	return 0;
 }
 
 static int dummy_scan_nodes(void)
 {
-	setup_node_bootmem(0, 0, max_pfn << PAGE_SHIFT);
-	numa_init_array();
-
 	return 0;
 }
 
@@ -757,6 +806,7 @@ void __init initmem_init(void)
 		memset(node_memblk_range, 0, sizeof(node_memblk_range));
 		memset(memblk_nodeid, 0, sizeof(memblk_nodeid));
 		memset(numa_nodes, 0, sizeof(numa_nodes));
+		remove_all_active_ranges();
 
 		if (numa_init[i]() < 0)
 			continue;
@@ -781,8 +831,19 @@ void __init initmem_init(void)
 		if (numa_register_memblks() < 0)
 			continue;
 
-		if (!scan_nodes[i]())
-			return;
+		if (scan_nodes[i]() < 0)
+			continue;
+
+		for (j = 0; j < nr_cpu_ids; j++) {
+			int nid = early_cpu_to_node(j);
+
+			if (nid == NUMA_NO_NODE)
+				continue;
+			if (!node_online(nid))
+				numa_clear_node(j);
+		}
+		numa_init_array();
+		return;
 	}
 	BUG();
 }
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index 755d157..4a2c33b 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -44,7 +44,6 @@ static __init void bad_srat(void)
 		numa_nodes[i].start = numa_nodes[i].end = 0;
 		nodes_add[i].start = nodes_add[i].end = 0;
 	}
-	remove_all_active_ranges();
 }
 
 static __init inline int srat_disabled(void)
@@ -259,35 +258,6 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 		update_nodes_add(node, start, end);
 }
 
-/* Sanity check to catch more bad SRATs (they are amazingly common).
-   Make sure the PXMs cover all memory. */
-static int __init nodes_cover_memory(const struct bootnode *nodes)
-{
-	int i;
-	unsigned long pxmram, e820ram;
-
-	pxmram = 0;
-	for_each_node_mask(i, mem_nodes_parsed) {
-		unsigned long s = nodes[i].start >> PAGE_SHIFT;
-		unsigned long e = nodes[i].end >> PAGE_SHIFT;
-		pxmram += e - s;
-		pxmram -= __absent_pages_in_range(i, s, e);
-		if ((long)pxmram < 0)
-			pxmram = 0;
-	}
-
-	e820ram = max_pfn - (memblock_x86_hole_size(0, max_pfn<<PAGE_SHIFT)>>PAGE_SHIFT);
-	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
-	if ((long)(e820ram - pxmram) >= (1<<(20 - PAGE_SHIFT))) {
-		printk(KERN_ERR
-	"SRAT: PXMs only cover %luMB of your %luMB e820 RAM. Not used.\n",
-			(pxmram << PAGE_SHIFT) >> 20,
-			(e820ram << PAGE_SHIFT) >> 20);
-		return 0;
-	}
-	return 1;
-}
-
 void __init acpi_numa_arch_fixup(void) {}
 
 int __init x86_acpi_numa_init(void)
@@ -303,37 +273,8 @@ int __init x86_acpi_numa_init(void)
 /* Use the information discovered above to actually set up the nodes. */
 int __init acpi_scan_nodes(void)
 {
-	int i;
-
 	if (acpi_numa <= 0)
 		return -1;
-
-	/* for out of order entries in SRAT */
-	sort_node_map();
-	if (!nodes_cover_memory(numa_nodes)) {
-		bad_srat();
-		return -1;
-	}
-
-	/* Finally register nodes */
-	for_each_node_mask(i, node_possible_map)
-		setup_node_bootmem(i, numa_nodes[i].start, numa_nodes[i].end);
-	/* Try again in case setup_node_bootmem missed one due
-	   to missing bootmem */
-	for_each_node_mask(i, node_possible_map)
-		if (!node_online(i))
-			setup_node_bootmem(i, numa_nodes[i].start,
-					   numa_nodes[i].end);
-
-	for (i = 0; i < nr_cpu_ids; i++) {
-		int node = early_cpu_to_node(i);
-
-		if (node == NUMA_NO_NODE)
-			continue;
-		if (!node_online(node))
-			numa_clear_node(i);
-	}
-	numa_init_array();
 	return 0;
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread
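The nodes_cover_memory() check moved into numa_64.c above compares page
counts: the pages described by the NUMA nodes (minus absent pages)
against the e820 pages, allowing 1MB of slack for firmware quirks.  The
final comparison reduces to something like this user-space model
(assumed 4K pages; hypothetical function name):

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* 4K pages, as on x86-64 */

/*
 * Return 1 if the NUMA-described RAM covers the e820 RAM to within 1MB
 * of slack, mirroring the final test in nodes_cover_memory().  Both
 * arguments are page counts.
 */
static int covers(unsigned long numaram, unsigned long e820ram)
{
	/* 1MB of slack = 1 << (20 - PAGE_SHIFT) = 256 pages */
	return (long)(e820ram - numaram) < (1 << (20 - PAGE_SHIFT));
}
```

When the check fails, the kernel logs how many MB the nodes cover versus
the e820 total and rejects the configuration.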

* [PATCH 16/26] x86-64, NUMA: Kill {acpi|amd|dummy}_scan_nodes()
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (14 preceding siblings ...)
  2011-02-12 17:10 ` [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-12 17:10 ` [PATCH 17/26] x86-64, NUMA: Remove %NULL @nodeids handling from compute_hash_shift() Tejun Heo
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

They are empty now.  Kill them.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/acpi.h   |    1 -
 arch/x86/include/asm/amd_nb.h |    1 -
 arch/x86/mm/amdtopology_64.c  |    5 -----
 arch/x86/mm/numa_64.c         |   11 -----------
 arch/x86/mm/srat_64.c         |    8 --------
 5 files changed, 0 insertions(+), 26 deletions(-)

diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index 12bd1fd..cfa3d5c 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -186,7 +186,6 @@ struct bootnode;
 #ifdef CONFIG_ACPI_NUMA
 extern int acpi_numa;
 extern int x86_acpi_numa_init(void);
-extern int acpi_scan_nodes(void);
 
 #ifdef CONFIG_NUMA_EMU
 extern void acpi_fake_nodes(const struct bootnode *fake_nodes,
diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index 765966f..627aff3 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -17,7 +17,6 @@ extern int early_is_amd_nb(u32 value);
 extern int amd_cache_northbridges(void);
 extern void amd_flush_garts(void);
 extern int amd_numa_init(void);
-extern int amd_scan_nodes(void);
 
 #ifdef CONFIG_NUMA_EMU
 extern void amd_fake_nodes(const struct bootnode *nodes, int nr_nodes);
diff --git a/arch/x86/mm/amdtopology_64.c b/arch/x86/mm/amdtopology_64.c
index 9c9f46a..90cf297 100644
--- a/arch/x86/mm/amdtopology_64.c
+++ b/arch/x86/mm/amdtopology_64.c
@@ -259,8 +259,3 @@ void __init amd_fake_nodes(const struct bootnode *nodes, int nr_nodes)
 	memcpy(__apicid_to_node, fake_apicid_to_node, sizeof(__apicid_to_node));
 }
 #endif /* CONFIG_NUMA_EMU */
-
-int __init amd_scan_nodes(void)
-{
-	return 0;
-}
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 062649d..be173c4 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -769,25 +769,17 @@ static int dummy_numa_init(void)
 	return 0;
 }
 
-static int dummy_scan_nodes(void)
-{
-	return 0;
-}
-
 void __init initmem_init(void)
 {
 	int (*numa_init[])(void) = { [2] = dummy_numa_init };
-	int (*scan_nodes[])(void) = { [2] = dummy_scan_nodes };
 	int i, j;
 
 	if (!numa_off) {
 #ifdef CONFIG_ACPI_NUMA
 		numa_init[0] = x86_acpi_numa_init;
-		scan_nodes[0] = acpi_scan_nodes;
 #endif
 #ifdef CONFIG_AMD_NUMA
 		numa_init[1] = amd_numa_init;
-		scan_nodes[1] = amd_scan_nodes;
 #endif
 	}
 
@@ -831,9 +823,6 @@ void __init initmem_init(void)
 		if (numa_register_memblks() < 0)
 			continue;
 
-		if (scan_nodes[i]() < 0)
-			continue;
-
 		for (j = 0; j < nr_cpu_ids; j++) {
 			int nid = early_cpu_to_node(j);
 
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index 4a2c33b..d56eff8 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -270,14 +270,6 @@ int __init x86_acpi_numa_init(void)
 	return srat_disabled() ? -EINVAL : 0;
 }
 
-/* Use the information discovered above to actually set up the nodes. */
-int __init acpi_scan_nodes(void)
-{
-	if (acpi_numa <= 0)
-		return -1;
-	return 0;
-}
-
 #ifdef CONFIG_NUMA_EMU
 static int fake_node_to_pxm_map[MAX_NUMNODES] __initdata = {
 	[0 ... MAX_NUMNODES-1] = PXM_INVAL
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 17/26] x86-64, NUMA: Remove %NULL @nodeids handling from compute_hash_shift()
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (15 preceding siblings ...)
  2011-02-12 17:10 ` [PATCH 16/26] x86-64, NUMA: Kill {acpi|amd|dummy}_scan_nodes() Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-12 17:10 ` [PATCH 18/26] x86-64, NUMA: Introduce struct numa_meminfo Tejun Heo
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

numa_emulation() called compute_hash_shift() with a %NULL @nodeids,
which meant an identity mapping between index and nodeid.  Make
numa_emulation() build an explicit identity array and drop %NULL
@nodeids handling from populate_memnodemap() and thus from
compute_hash_shift().  This is to prepare for the transition to using
memblks instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/mm/numa_64.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index be173c4..1d79cd8 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -63,12 +63,7 @@ static int __init populate_memnodemap(const struct bootnode *nodes,
 		do {
 			if (memnodemap[addr >> shift] != NUMA_NO_NODE)
 				return -1;
-
-			if (!nodeids)
-				memnodemap[addr >> shift] = i;
-			else
-				memnodemap[addr >> shift] = nodeids[i];
-
+			memnodemap[addr >> shift] = nodeids[i];
 			addr += (1UL << shift);
 		} while (addr < end);
 		res = 1;
@@ -704,6 +699,7 @@ static int __init split_nodes_size_interleave(u64 addr, u64 max_addr, u64 size)
 static int __init numa_emulation(unsigned long start_pfn,
 			unsigned long last_pfn, int acpi, int amd)
 {
+	static int nodeid[NR_NODE_MEMBLKS] __initdata;
 	u64 addr = start_pfn << PAGE_SHIFT;
 	u64 max_addr = last_pfn << PAGE_SHIFT;
 	int num_nodes;
@@ -728,7 +724,11 @@ static int __init numa_emulation(unsigned long start_pfn,
 
 	if (num_nodes < 0)
 		return num_nodes;
-	memnode_shift = compute_hash_shift(nodes, num_nodes, NULL);
+
+	for (i = 0; i < ARRAY_SIZE(nodeid); i++)
+		nodeid[i] = i;
+
+	memnode_shift = compute_hash_shift(nodes, num_nodes, nodeid);
 	if (memnode_shift < 0) {
 		memnode_shift = 0;
 		printk(KERN_ERR "No NUMA hash function found.  NUMA emulation "
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 18/26] x86-64, NUMA: Introduce struct numa_meminfo
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (16 preceding siblings ...)
  2011-02-12 17:10 ` [PATCH 17/26] x86-64, NUMA: Remove %NULL @nodeids handling from compute_hash_shift() Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-12 17:10 ` [PATCH 19/26] x86-64, NUMA: Separate out numa_cleanup_meminfo() Tejun Heo
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

The memblk and nodeid arrays and their length lived in separate
variables, making things unnecessarily cumbersome.  Introduce struct
numa_meminfo, which contains all memory configuration info.  This
patch doesn't cause any behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/mm/numa_64.c |  145 +++++++++++++++++++++++++------------------------
 1 files changed, 75 insertions(+), 70 deletions(-)

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 1d79cd8..04ea17b 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -22,6 +22,17 @@
 #include <asm/acpi.h>
 #include <asm/amd_nb.h>
 
+struct numa_memblk {
+	u64			start;
+	u64			end;
+	int			nid;
+};
+
+struct numa_meminfo {
+	int			nr_blks;
+	struct numa_memblk	blk[NR_NODE_MEMBLKS];
+};
+
 struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
 EXPORT_SYMBOL(node_data);
 
@@ -33,9 +44,7 @@ struct memnode memnode;
 static unsigned long __initdata nodemap_addr;
 static unsigned long __initdata nodemap_size;
 
-static int num_node_memblks __initdata;
-static struct bootnode node_memblk_range[NR_NODE_MEMBLKS] __initdata;
-static int memblk_nodeid[NR_NODE_MEMBLKS] __initdata;
+static struct numa_meminfo numa_meminfo __initdata;
 
 struct bootnode numa_nodes[MAX_NUMNODES] __initdata;
 
@@ -46,16 +55,15 @@ struct bootnode numa_nodes[MAX_NUMNODES] __initdata;
  * 0 if memnodmap[] too small (of shift too small)
  * -1 if node overlap or lost ram (shift too big)
  */
-static int __init populate_memnodemap(const struct bootnode *nodes,
-				      int numnodes, int shift, int *nodeids)
+static int __init populate_memnodemap(const struct numa_meminfo *mi, int shift)
 {
 	unsigned long addr, end;
 	int i, res = -1;
 
 	memset(memnodemap, 0xff, sizeof(s16)*memnodemapsize);
-	for (i = 0; i < numnodes; i++) {
-		addr = nodes[i].start;
-		end = nodes[i].end;
+	for (i = 0; i < mi->nr_blks; i++) {
+		addr = mi->blk[i].start;
+		end = mi->blk[i].end;
 		if (addr >= end)
 			continue;
 		if ((end >> shift) >= memnodemapsize)
@@ -63,7 +71,7 @@ static int __init populate_memnodemap(const struct bootnode *nodes,
 		do {
 			if (memnodemap[addr >> shift] != NUMA_NO_NODE)
 				return -1;
-			memnodemap[addr >> shift] = nodeids[i];
+			memnodemap[addr >> shift] = mi->blk[i].nid;
 			addr += (1UL << shift);
 		} while (addr < end);
 		res = 1;
@@ -101,16 +109,15 @@ static int __init allocate_cachealigned_memnodemap(void)
  * The LSB of all start and end addresses in the node map is the value of the
  * maximum possible shift.
  */
-static int __init extract_lsb_from_nodes(const struct bootnode *nodes,
-					 int numnodes)
+static int __init extract_lsb_from_nodes(const struct numa_meminfo *mi)
 {
 	int i, nodes_used = 0;
 	unsigned long start, end;
 	unsigned long bitfield = 0, memtop = 0;
 
-	for (i = 0; i < numnodes; i++) {
-		start = nodes[i].start;
-		end = nodes[i].end;
+	for (i = 0; i < mi->nr_blks; i++) {
+		start = mi->blk[i].start;
+		end = mi->blk[i].end;
 		if (start >= end)
 			continue;
 		bitfield |= start;
@@ -126,18 +133,17 @@ static int __init extract_lsb_from_nodes(const struct bootnode *nodes,
 	return i;
 }
 
-static int __init compute_hash_shift(struct bootnode *nodes, int numnodes,
-				     int *nodeids)
+static int __init compute_hash_shift(const struct numa_meminfo *mi)
 {
 	int shift;
 
-	shift = extract_lsb_from_nodes(nodes, numnodes);
+	shift = extract_lsb_from_nodes(mi);
 	if (allocate_cachealigned_memnodemap())
 		return -1;
 	printk(KERN_DEBUG "NUMA: Using %d for the hash shift.\n",
 		shift);
 
-	if (populate_memnodemap(nodes, numnodes, shift, nodeids) != 1) {
+	if (populate_memnodemap(mi, shift) != 1) {
 		printk(KERN_INFO "Your memory is not aligned you need to "
 		       "rebuild your kernel with a bigger NODEMAPSIZE "
 		       "shift=%d\n", shift);
@@ -185,21 +191,25 @@ static void * __init early_node_mem(int nodeid, unsigned long start,
 
 static __init int conflicting_memblks(unsigned long start, unsigned long end)
 {
+	struct numa_meminfo *mi = &numa_meminfo;
 	int i;
-	for (i = 0; i < num_node_memblks; i++) {
-		struct bootnode *nd = &node_memblk_range[i];
-		if (nd->start == nd->end)
+
+	for (i = 0; i < mi->nr_blks; i++) {
+		struct numa_memblk *blk = &mi->blk[i];
+
+		if (blk->start == blk->end)
 			continue;
-		if (nd->end > start && nd->start < end)
-			return memblk_nodeid[i];
-		if (nd->end == end && nd->start == start)
-			return memblk_nodeid[i];
+		if (blk->end > start && blk->start < end)
+			return blk->nid;
+		if (blk->end == end && blk->start == start)
+			return blk->nid;
 	}
 	return -1;
 }
 
 int __init numa_add_memblk(int nid, u64 start, u64 end)
 {
+	struct numa_meminfo *mi = &numa_meminfo;
 	int i;
 
 	i = conflicting_memblks(start, end);
@@ -213,10 +223,10 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
 		return -EINVAL;
 	}
 
-	node_memblk_range[num_node_memblks].start = start;
-	node_memblk_range[num_node_memblks].end = end;
-	memblk_nodeid[num_node_memblks] = nid;
-	num_node_memblks++;
+	mi->blk[mi->nr_blks].start = start;
+	mi->blk[mi->nr_blks].end = end;
+	mi->blk[mi->nr_blks].nid = nid;
+	mi->nr_blks++;
 	return 0;
 }
 
@@ -315,66 +325,59 @@ static int __init nodes_cover_memory(const struct bootnode *nodes)
 
 static int __init numa_register_memblks(void)
 {
+	struct numa_meminfo *mi = &numa_meminfo;
 	int i;
 
 	/*
 	 * Join together blocks on the same node, holes between
 	 * which don't overlap with memory on other nodes.
 	 */
-	for (i = 0; i < num_node_memblks; ++i) {
+	for (i = 0; i < mi->nr_blks; ++i) {
+		struct numa_memblk *bi = &mi->blk[i];
 		int j, k;
 
-		for (j = i + 1; j < num_node_memblks; ++j) {
+		for (j = i + 1; j < mi->nr_blks; ++j) {
+			struct numa_memblk *bj = &mi->blk[j];
 			unsigned long start, end;
 
-			if (memblk_nodeid[i] != memblk_nodeid[j])
+			if (bi->nid != bj->nid)
 				continue;
-			start = min(node_memblk_range[i].end,
-			            node_memblk_range[j].end);
-			end = max(node_memblk_range[i].start,
-			          node_memblk_range[j].start);
-			for (k = 0; k < num_node_memblks; ++k) {
-				if (memblk_nodeid[i] == memblk_nodeid[k])
+			start = min(bi->end, bj->end);
+			end = max(bi->start, bj->start);
+			for (k = 0; k < mi->nr_blks; ++k) {
+				struct numa_memblk *bk = &mi->blk[k];
+
+				if (bi->nid == bk->nid)
 					continue;
-				if (start < node_memblk_range[k].end &&
-				    end > node_memblk_range[k].start)
+				if (start < bk->end && end > bk->start)
 					break;
 			}
-			if (k < num_node_memblks)
+			if (k < mi->nr_blks)
 				continue;
-			start = min(node_memblk_range[i].start,
-			            node_memblk_range[j].start);
-			end = max(node_memblk_range[i].end,
-			          node_memblk_range[j].end);
+			start = min(bi->start, bj->start);
+			end = max(bi->end, bj->end);
 			printk(KERN_INFO "NUMA: Node %d [%Lx,%Lx) + [%Lx,%Lx) -> [%lx,%lx)\n",
-			       memblk_nodeid[i],
-			       node_memblk_range[i].start,
-			       node_memblk_range[i].end,
-			       node_memblk_range[j].start,
-			       node_memblk_range[j].end,
+			       bi->nid, bi->start, bi->end, bj->start, bj->end,
 			       start, end);
-			node_memblk_range[i].start = start;
-			node_memblk_range[i].end = end;
-			k = --num_node_memblks - j;
-			memmove(memblk_nodeid + j, memblk_nodeid + j+1,
-				k * sizeof(*memblk_nodeid));
-			memmove(node_memblk_range + j, node_memblk_range + j+1,
-				k * sizeof(*node_memblk_range));
+			bi->start = start;
+			bi->end = end;
+			k = --mi->nr_blks - j;
+			memmove(mi->blk + j, mi->blk + j + 1,
+				k * sizeof(mi->blk[0]));
 			--j;
 		}
 	}
 
-	memnode_shift = compute_hash_shift(node_memblk_range, num_node_memblks,
-					   memblk_nodeid);
+	memnode_shift = compute_hash_shift(mi);
 	if (memnode_shift < 0) {
 		printk(KERN_ERR "NUMA: No NUMA node hash function found. Contact maintainer\n");
 		return -EINVAL;
 	}
 
-	for (i = 0; i < num_node_memblks; i++)
-		memblock_x86_register_active_regions(memblk_nodeid[i],
-				node_memblk_range[i].start >> PAGE_SHIFT,
-				node_memblk_range[i].end >> PAGE_SHIFT);
+	for (i = 0; i < mi->nr_blks; i++)
+		memblock_x86_register_active_regions(mi->blk[i].nid,
+					mi->blk[i].start >> PAGE_SHIFT,
+					mi->blk[i].end >> PAGE_SHIFT);
 
 	/* for out of order entries */
 	sort_node_map();
@@ -699,7 +702,7 @@ static int __init split_nodes_size_interleave(u64 addr, u64 max_addr, u64 size)
 static int __init numa_emulation(unsigned long start_pfn,
 			unsigned long last_pfn, int acpi, int amd)
 {
-	static int nodeid[NR_NODE_MEMBLKS] __initdata;
+	static struct numa_meminfo ei __initdata;
 	u64 addr = start_pfn << PAGE_SHIFT;
 	u64 max_addr = last_pfn << PAGE_SHIFT;
 	int num_nodes;
@@ -725,10 +728,14 @@ static int __init numa_emulation(unsigned long start_pfn,
 	if (num_nodes < 0)
 		return num_nodes;
 
-	for (i = 0; i < ARRAY_SIZE(nodeid); i++)
-		nodeid[i] = i;
+	ei.nr_blks = num_nodes;
+	for (i = 0; i < ei.nr_blks; i++) {
+		ei.blk[i].start = nodes[i].start;
+		ei.blk[i].end = nodes[i].end;
+		ei.blk[i].nid = i;
+	}
 
-	memnode_shift = compute_hash_shift(nodes, num_nodes, nodeid);
+	memnode_shift = compute_hash_shift(&ei);
 	if (memnode_shift < 0) {
 		memnode_shift = 0;
 		printk(KERN_ERR "No NUMA hash function found.  NUMA emulation "
@@ -794,9 +801,7 @@ void __init initmem_init(void)
 		nodes_clear(mem_nodes_parsed);
 		nodes_clear(node_possible_map);
 		nodes_clear(node_online_map);
-		num_node_memblks = 0;
-		memset(node_memblk_range, 0, sizeof(node_memblk_range));
-		memset(memblk_nodeid, 0, sizeof(memblk_nodeid));
+		memset(&numa_meminfo, 0, sizeof(numa_meminfo));
 		memset(numa_nodes, 0, sizeof(numa_nodes));
 		remove_all_active_ranges();
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 19/26] x86-64, NUMA: Separate out numa_cleanup_meminfo()
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (17 preceding siblings ...)
  2011-02-12 17:10 ` [PATCH 18/26] x86-64, NUMA: Introduce struct numa_meminfo Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-12 17:10 ` [PATCH 20/26] x86-64, NUMA: make numa_cleanup_meminfo() prettier Tejun Heo
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

Separate out numa_cleanup_meminfo() from numa_register_memblks().
node_possible_map initialization is moved to the top of the split
numa_register_memblks().

This patch doesn't cause behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/mm/numa_64.c |   83 +++++++++++++++++++++++++++----------------------
 1 files changed, 46 insertions(+), 37 deletions(-)

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 04ea17b..4f173a5 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -292,40 +292,8 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
 	node_set_online(nodeid);
 }
 
-/*
- * Sanity check to catch more bad NUMA configurations (they are amazingly
- * common).  Make sure the nodes cover all memory.
- */
-static int __init nodes_cover_memory(const struct bootnode *nodes)
+static int __init numa_cleanup_meminfo(struct numa_meminfo *mi)
 {
-	unsigned long numaram, e820ram;
-	int i;
-
-	numaram = 0;
-	for_each_node_mask(i, mem_nodes_parsed) {
-		unsigned long s = nodes[i].start >> PAGE_SHIFT;
-		unsigned long e = nodes[i].end >> PAGE_SHIFT;
-		numaram += e - s;
-		numaram -= __absent_pages_in_range(i, s, e);
-		if ((long)numaram < 0)
-			numaram = 0;
-	}
-
-	e820ram = max_pfn -
-		(memblock_x86_hole_size(0, max_pfn<<PAGE_SHIFT) >> PAGE_SHIFT);
-	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
-	if ((long)(e820ram - numaram) >= (1<<(20 - PAGE_SHIFT))) {
-		printk(KERN_ERR "NUMA: nodes only cover %luMB of your %luMB e820 RAM. Not used.\n",
-			(numaram << PAGE_SHIFT) >> 20,
-			(e820ram << PAGE_SHIFT) >> 20);
-		return 0;
-	}
-	return 1;
-}
-
-static int __init numa_register_memblks(void)
-{
-	struct numa_meminfo *mi = &numa_meminfo;
 	int i;
 
 	/*
@@ -368,6 +336,49 @@ static int __init numa_register_memblks(void)
 		}
 	}
 
+	return 0;
+}
+
+/*
+ * Sanity check to catch more bad NUMA configurations (they are amazingly
+ * common).  Make sure the nodes cover all memory.
+ */
+static int __init nodes_cover_memory(const struct bootnode *nodes)
+{
+	unsigned long numaram, e820ram;
+	int i;
+
+	numaram = 0;
+	for_each_node_mask(i, mem_nodes_parsed) {
+		unsigned long s = nodes[i].start >> PAGE_SHIFT;
+		unsigned long e = nodes[i].end >> PAGE_SHIFT;
+		numaram += e - s;
+		numaram -= __absent_pages_in_range(i, s, e);
+		if ((long)numaram < 0)
+			numaram = 0;
+	}
+
+	e820ram = max_pfn - (memblock_x86_hole_size(0,
+					max_pfn << PAGE_SHIFT) >> PAGE_SHIFT);
+	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
+	if ((long)(e820ram - numaram) >= (1 << (20 - PAGE_SHIFT))) {
+		printk(KERN_ERR "NUMA: nodes only cover %luMB of your %luMB e820 RAM. Not used.\n",
+		       (numaram << PAGE_SHIFT) >> 20,
+		       (e820ram << PAGE_SHIFT) >> 20);
+		return 0;
+	}
+	return 1;
+}
+
+static int __init numa_register_memblks(struct numa_meminfo *mi)
+{
+	int i;
+
+	/* Account for nodes with cpus and no memory */
+	nodes_or(node_possible_map, mem_nodes_parsed, cpu_nodes_parsed);
+	if (WARN_ON(nodes_empty(node_possible_map)))
+		return -EINVAL;
+
 	memnode_shift = compute_hash_shift(mi);
 	if (memnode_shift < 0) {
 		printk(KERN_ERR "NUMA: No NUMA node hash function found. Contact maintainer\n");
@@ -820,12 +831,10 @@ void __init initmem_init(void)
 		nodes_clear(node_possible_map);
 		nodes_clear(node_online_map);
 #endif
-		/* Account for nodes with cpus and no memory */
-		nodes_or(node_possible_map, mem_nodes_parsed, cpu_nodes_parsed);
-		if (WARN_ON(nodes_empty(node_possible_map)))
+		if (numa_cleanup_meminfo(&numa_meminfo) < 0)
 			continue;
 
-		if (numa_register_memblks() < 0)
+		if (numa_register_memblks(&numa_meminfo) < 0)
 			continue;
 
 		for (j = 0; j < nr_cpu_ids; j++) {
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 20/26] x86-64, NUMA: make numa_cleanup_meminfo() prettier
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (18 preceding siblings ...)
  2011-02-12 17:10 ` [PATCH 19/26] x86-64, NUMA: Separate out numa_cleanup_meminfo() Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-12 17:10 ` [PATCH 21/26] x86-64, NUMA: consolidate and improve memblk sanity checks Tejun Heo
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

* Factor out numa_remove_memblk_from().

* Hole detection doesn't need separate start/end.  Calculate start/end
  once.

* Relocate comment.

* Define iterators at the top and remove unnecessary prefix
  increments.

This prepares for further improvements to the function.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/mm/numa_64.c |   36 +++++++++++++++++++-----------------
 1 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 4f173a5..62ba1fd 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -230,6 +230,13 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
 	return 0;
 }
 
+static void __init numa_remove_memblk_from(int idx, struct numa_meminfo *mi)
+{
+	mi->nr_blks--;
+	memmove(&mi->blk[idx], &mi->blk[idx + 1],
+		(mi->nr_blks - idx) * sizeof(mi->blk[0]));
+}
+
 static __init void cutoff_node(int i, unsigned long start, unsigned long end)
 {
 	struct bootnode *nd = &numa_nodes[i];
@@ -294,25 +301,25 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
 
 static int __init numa_cleanup_meminfo(struct numa_meminfo *mi)
 {
-	int i;
+	int i, j, k;
 
-	/*
-	 * Join together blocks on the same node, holes between
-	 * which don't overlap with memory on other nodes.
-	 */
-	for (i = 0; i < mi->nr_blks; ++i) {
+	for (i = 0; i < mi->nr_blks; i++) {
 		struct numa_memblk *bi = &mi->blk[i];
-		int j, k;
 
-		for (j = i + 1; j < mi->nr_blks; ++j) {
+		for (j = i + 1; j < mi->nr_blks; j++) {
 			struct numa_memblk *bj = &mi->blk[j];
 			unsigned long start, end;
 
+			/*
+			 * Join together blocks on the same node, holes
+			 * between which don't overlap with memory on other
+			 * nodes.
+			 */
 			if (bi->nid != bj->nid)
 				continue;
-			start = min(bi->end, bj->end);
-			end = max(bi->start, bj->start);
-			for (k = 0; k < mi->nr_blks; ++k) {
+			start = min(bi->start, bj->start);
+			end = max(bi->end, bj->end);
+			for (k = 0; k < mi->nr_blks; k++) {
 				struct numa_memblk *bk = &mi->blk[k];
 
 				if (bi->nid == bk->nid)
@@ -322,17 +329,12 @@ static int __init numa_cleanup_meminfo(struct numa_meminfo *mi)
 			}
 			if (k < mi->nr_blks)
 				continue;
-			start = min(bi->start, bj->start);
-			end = max(bi->end, bj->end);
 			printk(KERN_INFO "NUMA: Node %d [%Lx,%Lx) + [%Lx,%Lx) -> [%lx,%lx)\n",
 			       bi->nid, bi->start, bi->end, bj->start, bj->end,
 			       start, end);
 			bi->start = start;
 			bi->end = end;
-			k = --mi->nr_blks - j;
-			memmove(mi->blk + j, mi->blk + j + 1,
-				k * sizeof(mi->blk[0]));
-			--j;
+			numa_remove_memblk_from(j--, mi);
 		}
 	}
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 21/26] x86-64, NUMA: consolidate and improve memblk sanity checks
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (19 preceding siblings ...)
  2011-02-12 17:10 ` [PATCH 20/26] x86-64, NUMA: make numa_cleanup_meminfo() prettier Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-12 17:10 ` [PATCH 22/26] x86-64, NUMA: Add common find_node_by_addr() Tejun Heo
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

The memblk sanity checks were scattered around and incomplete.
Consolidate and improve them.

* Conflict detection and cutoff_node() logic are moved to
  numa_cleanup_meminfo().

* numa_cleanup_meminfo() clears the unused memblks before returning.

* Check and warn about invalid input parameters in numa_add_memblk().

* Check that the maximum number of memblks isn't exceeded in
  numa_add_memblk().

* numa_cleanup_meminfo() is now called before numa_emulation() so that
  the emulation code also uses the cleaned up version.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/mm/numa_64.c |   99 ++++++++++++++++++++++++-------------------------
 1 files changed, 49 insertions(+), 50 deletions(-)

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 62ba1fd..1996ee7 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -189,37 +189,23 @@ static void * __init early_node_mem(int nodeid, unsigned long start,
 	return NULL;
 }
 
-static __init int conflicting_memblks(unsigned long start, unsigned long end)
+int __init numa_add_memblk(int nid, u64 start, u64 end)
 {
 	struct numa_meminfo *mi = &numa_meminfo;
-	int i;
 
-	for (i = 0; i < mi->nr_blks; i++) {
-		struct numa_memblk *blk = &mi->blk[i];
+	/* ignore zero length blks */
+	if (start == end)
+		return 0;
 
-		if (blk->start == blk->end)
-			continue;
-		if (blk->end > start && blk->start < end)
-			return blk->nid;
-		if (blk->end == end && blk->start == start)
-			return blk->nid;
+	/* whine about and ignore invalid blks */
+	if (start > end || nid < 0 || nid >= MAX_NUMNODES) {
+		pr_warning("NUMA: Warning: invalid memblk node %d (%Lx-%Lx)\n",
+			   nid, start, end);
+		return 0;
 	}
-	return -1;
-}
-
-int __init numa_add_memblk(int nid, u64 start, u64 end)
-{
-	struct numa_meminfo *mi = &numa_meminfo;
-	int i;
 
-	i = conflicting_memblks(start, end);
-	if (i == nid) {
-		printk(KERN_WARNING "NUMA: Warning: node %d (%Lx-%Lx) overlaps with itself (%Lx-%Lx)\n",
-		       nid, start, end, numa_nodes[i].start, numa_nodes[i].end);
-	} else if (i >= 0) {
-		printk(KERN_ERR "NUMA: node %d (%Lx-%Lx) overlaps with node %d (%Lx-%Lx)\n",
-		       nid, start, end, i,
-		       numa_nodes[i].start, numa_nodes[i].end);
+	if (mi->nr_blks >= NR_NODE_MEMBLKS) {
+		pr_err("NUMA: too many memblk ranges\n");
 		return -EINVAL;
 	}
 
@@ -237,22 +223,6 @@ static void __init numa_remove_memblk_from(int idx, struct numa_meminfo *mi)
 		(mi->nr_blks - idx) * sizeof(mi->blk[0]));
 }
 
-static __init void cutoff_node(int i, unsigned long start, unsigned long end)
-{
-	struct bootnode *nd = &numa_nodes[i];
-
-	if (nd->start < start) {
-		nd->start = start;
-		if (nd->end < nd->start)
-			nd->start = nd->end;
-	}
-	if (nd->end > end) {
-		nd->end = end;
-		if (nd->start > nd->end)
-			nd->start = nd->end;
-	}
-}
-
 /* Initialize bootmem allocator for a node */
 void __init
 setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
@@ -301,24 +271,53 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
 
 static int __init numa_cleanup_meminfo(struct numa_meminfo *mi)
 {
+	const u64 low = 0;
+	const u64 high = (u64)max_pfn << PAGE_SHIFT;
 	int i, j, k;
 
 	for (i = 0; i < mi->nr_blks; i++) {
 		struct numa_memblk *bi = &mi->blk[i];
 
+		/* make sure all blocks are inside the limits */
+		bi->start = max(bi->start, low);
+		bi->end = min(bi->end, high);
+
+		/* and there's no empty block */
+		if (bi->start == bi->end) {
+			numa_remove_memblk_from(i--, mi);
+			continue;
+		}
+
 		for (j = i + 1; j < mi->nr_blks; j++) {
 			struct numa_memblk *bj = &mi->blk[j];
 			unsigned long start, end;
 
 			/*
+			 * See whether there are overlapping blocks.  Whine
+			 * about but allow overlaps of the same nid.  They
+			 * will be merged below.
+			 */
+			if (bi->end > bj->start && bi->start < bj->end) {
+				if (bi->nid != bj->nid) {
+					pr_err("NUMA: node %d (%Lx-%Lx) overlaps with node %d (%Lx-%Lx)\n",
+					       bi->nid, bi->start, bi->end,
+					       bj->nid, bj->start, bj->end);
+					return -EINVAL;
+				}
+				pr_warning("NUMA: Warning: node %d (%Lx-%Lx) overlaps with itself (%Lx-%Lx)\n",
+					   bi->nid, bi->start, bi->end,
+					   bj->start, bj->end);
+			}
+
+			/*
 			 * Join together blocks on the same node, holes
 			 * between which don't overlap with memory on other
 			 * nodes.
 			 */
 			if (bi->nid != bj->nid)
 				continue;
-			start = min(bi->start, bj->start);
-			end = max(bi->end, bj->end);
+			start = max(min(bi->start, bj->start), low);
+			end = min(max(bi->end, bj->end), high);
 			for (k = 0; k < mi->nr_blks; k++) {
 				struct numa_memblk *bk = &mi->blk[k];
 
@@ -338,6 +337,11 @@ static int __init numa_cleanup_meminfo(struct numa_meminfo *mi)
 		}
 	}
 
+	for (i = mi->nr_blks; i < ARRAY_SIZE(mi->blk); i++) {
+		mi->blk[i].start = mi->blk[i].end = 0;
+		mi->blk[i].nid = NUMA_NO_NODE;
+	}
+
 	return 0;
 }
 
@@ -821,10 +825,8 @@ void __init initmem_init(void)
 		if (numa_init[i]() < 0)
 			continue;
 
-		/* clean up the node list */
-		for (j = 0; j < MAX_NUMNODES; j++)
-			cutoff_node(j, 0, max_pfn << PAGE_SHIFT);
-
+		if (numa_cleanup_meminfo(&numa_meminfo) < 0)
+			continue;
 #ifdef CONFIG_NUMA_EMU
 		setup_physnodes(0, max_pfn << PAGE_SHIFT);
 		if (cmdline && !numa_emulation(0, max_pfn, i == 0, i == 1))
@@ -833,9 +835,6 @@ void __init initmem_init(void)
 		nodes_clear(node_possible_map);
 		nodes_clear(node_online_map);
 #endif
-		if (numa_cleanup_meminfo(&numa_meminfo) < 0)
-			continue;
-
 		if (numa_register_memblks(&numa_meminfo) < 0)
 			continue;
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 22/26] x86-64, NUMA: Add common find_node_by_addr()
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (20 preceding siblings ...)
  2011-02-12 17:10 ` [PATCH 21/26] x86-64, NUMA: consolidate and improve memblk sanity checks Tejun Heo
@ 2011-02-12 17:10 ` Tejun Heo
  2011-02-12 17:11 ` [PATCH 23/26] x86-64, NUMA: kill numa_nodes[] Tejun Heo
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:10 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

srat_64.c and amdtopology_64.c had their own versions of
find_node_by_addr() which were basically the same.  Add common one in
numa_64.c and remove the duplicates.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/numa_64.h |    1 +
 arch/x86/mm/amdtopology_64.c   |   13 -------------
 arch/x86/mm/numa_64.c          |   19 +++++++++++++++++++
 arch/x86/mm/srat_64.c          |   18 ------------------
 4 files changed, 20 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/numa_64.h b/arch/x86/include/asm/numa_64.h
index fbc9d33..867d41b 100644
--- a/arch/x86/include/asm/numa_64.h
+++ b/arch/x86/include/asm/numa_64.h
@@ -36,6 +36,7 @@ extern int __init numa_add_memblk(int nodeid, u64 start, u64 end);
 #define FAKE_NODE_MIN_SIZE	((u64)32 << 20)
 #define FAKE_NODE_MIN_HASH_MASK	(~(FAKE_NODE_MIN_SIZE - 1UL))
 void numa_emu_cmdline(char *);
+int __init find_node_by_addr(unsigned long addr);
 #endif /* CONFIG_NUMA_EMU */
 #else
 static inline int numa_cpu_node(int cpu)		{ return NUMA_NO_NODE; }
diff --git a/arch/x86/mm/amdtopology_64.c b/arch/x86/mm/amdtopology_64.c
index 90cf297..8f7a5eb 100644
--- a/arch/x86/mm/amdtopology_64.c
+++ b/arch/x86/mm/amdtopology_64.c
@@ -205,19 +205,6 @@ static s16 fake_apicid_to_node[MAX_LOCAL_APIC] __initdata = {
 	[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
 };
 
-static int __init find_node_by_addr(unsigned long addr)
-{
-	int ret = NUMA_NO_NODE;
-	int i;
-
-	for (i = 0; i < 8; i++)
-		if (addr >= numa_nodes[i].start && addr < numa_nodes[i].end) {
-			ret = i;
-			break;
-		}
-	return ret;
-}
-
 /*
  * For NUMA emulation, fake proximity domain (_PXM) to node id mappings must be
  * setup to represent the physical topology but reflect the emulated
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 1996ee7..ea3fb52 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -428,6 +428,25 @@ void __init numa_emu_cmdline(char *str)
 	cmdline = str;
 }
 
+int __init find_node_by_addr(unsigned long addr)
+{
+	int ret = NUMA_NO_NODE;
+	int i;
+
+	for_each_node_mask(i, mem_nodes_parsed) {
+		/*
+		 * Find the real node that this emulated node appears on.  For
+		 * the sake of simplicity, we only use a real node's starting
+		 * address to determine which emulated node it appears on.
+		 */
+		if (addr >= numa_nodes[i].start && addr < numa_nodes[i].end) {
+			ret = i;
+			break;
+		}
+	}
+	return ret;
+}
+
 static int __init setup_physnodes(unsigned long start, unsigned long end)
 {
 	int ret = 0;
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index d56eff8..51d0733 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -277,24 +277,6 @@ static int fake_node_to_pxm_map[MAX_NUMNODES] __initdata = {
 static s16 fake_apicid_to_node[MAX_LOCAL_APIC] __initdata = {
 	[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
 };
-static int __init find_node_by_addr(unsigned long addr)
-{
-	int ret = NUMA_NO_NODE;
-	int i;
-
-	for_each_node_mask(i, mem_nodes_parsed) {
-		/*
-		 * Find the real node that this emulated node appears on.  For
-		 * the sake of simplicity, we only use a real node's starting
-		 * address to determine which emulated node it appears on.
-		 */
-		if (addr >= numa_nodes[i].start && addr < numa_nodes[i].end) {
-			ret = i;
-			break;
-		}
-	}
-	return ret;
-}
 
 /*
  * In NUMA emulation, we need to setup proximity domain (_PXM) to node ID
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 23/26] x86-64, NUMA: kill numa_nodes[]
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (21 preceding siblings ...)
  2011-02-12 17:10 ` [PATCH 22/26] x86-64, NUMA: Add common find_node_by_addr() Tejun Heo
@ 2011-02-12 17:11 ` Tejun Heo
  2011-02-12 17:11 ` [PATCH 24/26] x86-64, NUMA: Rename cpu_nodes_parsed to numa_nodes_parsed Tejun Heo
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:11 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

numa_nodes[] doesn't carry any information which isn't present in
numa_meminfo.  Each entry is simply the min/max range of all the memblks
for the node.  This is not only redundant but also inaccurate when
memblks for different nodes interleave - for example,
find_node_by_addr() can return the wrong nodeid.

Kill numa_nodes[] and always use numa_meminfo instead.

* nodes_cover_memory() is renamed to numa_meminfo_cover_memory() and
  now operates on numa_meminfo and returns bool.

* setup_node_bootmem() needs the min/max range.  Compute the range on
  the fly.  The setup_node_bootmem() invocation is restructured to use
  an outer loop instead of hardcoding the double invocations.

* find_node_by_addr() now operates on numa_meminfo.

* setup_physnodes() builds physnodes[] from memblks.  This will go
  away when emulation code is updated to use struct numa_meminfo.

This patch also makes the following misc changes.

* Clearing of nodes_add[] is converted to memset().

* numa_add_memblk() in amd_numa_init() is moved down a bit for
  consistency.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/numa_64.h |    1 -
 arch/x86/mm/amdtopology_64.c   |    6 +--
 arch/x86/mm/numa_64.c          |   82 +++++++++++++++++++++++----------------
 arch/x86/mm/srat_64.c          |   22 ++---------
 4 files changed, 53 insertions(+), 58 deletions(-)

diff --git a/arch/x86/include/asm/numa_64.h b/arch/x86/include/asm/numa_64.h
index 867d41b..da5c501 100644
--- a/arch/x86/include/asm/numa_64.h
+++ b/arch/x86/include/asm/numa_64.h
@@ -27,7 +27,6 @@ extern void setup_node_bootmem(int nodeid, unsigned long start,
 
 extern nodemask_t cpu_nodes_parsed __initdata;
 extern nodemask_t mem_nodes_parsed __initdata;
-extern struct bootnode numa_nodes[MAX_NUMNODES] __initdata;
 
 extern int __cpuinit numa_cpu_node(int cpu);
 extern int __init numa_add_memblk(int nodeid, u64 start, u64 end);
diff --git a/arch/x86/mm/amdtopology_64.c b/arch/x86/mm/amdtopology_64.c
index 8f7a5eb..0cb59e5 100644
--- a/arch/x86/mm/amdtopology_64.c
+++ b/arch/x86/mm/amdtopology_64.c
@@ -165,12 +165,8 @@ int __init amd_numa_init(void)
 		pr_info("Node %d MemBase %016lx Limit %016lx\n",
 			nodeid, base, limit);
 
-		numa_nodes[nodeid].start = base;
-		numa_nodes[nodeid].end = limit;
-		numa_add_memblk(nodeid, base, limit);
-
 		prevbase = base;
-
+		numa_add_memblk(nodeid, base, limit);
 		node_set(nodeid, mem_nodes_parsed);
 		node_set(nodeid, cpu_nodes_parsed);
 	}
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index ea3fb52..c0e45c7 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -46,8 +46,6 @@ static unsigned long __initdata nodemap_size;
 
 static struct numa_meminfo numa_meminfo __initdata;
 
-struct bootnode numa_nodes[MAX_NUMNODES] __initdata;
-
 /*
  * Given a shift value, try to populate memnodemap[]
  * Returns :
@@ -349,17 +347,17 @@ static int __init numa_cleanup_meminfo(struct numa_meminfo *mi)
  * Sanity check to catch more bad NUMA configurations (they are amazingly
  * common).  Make sure the nodes cover all memory.
  */
-static int __init nodes_cover_memory(const struct bootnode *nodes)
+static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
 {
 	unsigned long numaram, e820ram;
 	int i;
 
 	numaram = 0;
-	for_each_node_mask(i, mem_nodes_parsed) {
-		unsigned long s = nodes[i].start >> PAGE_SHIFT;
-		unsigned long e = nodes[i].end >> PAGE_SHIFT;
+	for (i = 0; i < mi->nr_blks; i++) {
+		unsigned long s = mi->blk[i].start >> PAGE_SHIFT;
+		unsigned long e = mi->blk[i].end >> PAGE_SHIFT;
 		numaram += e - s;
-		numaram -= __absent_pages_in_range(i, s, e);
+		numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
 		if ((long)numaram < 0)
 			numaram = 0;
 	}
@@ -371,14 +369,14 @@ static int __init nodes_cover_memory(const struct bootnode *nodes)
 		printk(KERN_ERR "NUMA: nodes only cover %luMB of your %luMB e820 RAM. Not used.\n",
 		       (numaram << PAGE_SHIFT) >> 20,
 		       (e820ram << PAGE_SHIFT) >> 20);
-		return 0;
+		return false;
 	}
-	return 1;
+	return true;
 }
 
 static int __init numa_register_memblks(struct numa_meminfo *mi)
 {
-	int i;
+	int i, j, nid;
 
 	/* Account for nodes with cpus and no memory */
 	nodes_or(node_possible_map, mem_nodes_parsed, cpu_nodes_parsed);
@@ -398,21 +396,32 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 
 	/* for out of order entries */
 	sort_node_map();
-	if (!nodes_cover_memory(numa_nodes))
+	if (!numa_meminfo_cover_memory(mi))
 		return -EINVAL;
 
-	/* Finally register nodes. */
-	for_each_node_mask(i, node_possible_map)
-		setup_node_bootmem(i, numa_nodes[i].start, numa_nodes[i].end);
-
 	/*
-	 * Try again in case setup_node_bootmem missed one due to missing
-	 * bootmem.
+	 * Finally register nodes.  Do it twice in case setup_node_bootmem
+	 * missed one due to missing bootmem.
 	 */
-	for_each_node_mask(i, node_possible_map)
-		if (!node_online(i))
-			setup_node_bootmem(i, numa_nodes[i].start,
-					   numa_nodes[i].end);
+	for (i = 0; i < 2; i++) {
+		for_each_node_mask(nid, node_possible_map) {
+			u64 start = (u64)max_pfn << PAGE_SHIFT;
+			u64 end = 0;
+
+			if (node_online(nid))
+				continue;
+
+			for (j = 0; j < mi->nr_blks; j++) {
+				if (nid != mi->blk[j].nid)
+					continue;
+				start = min(mi->blk[j].start, start);
+				end = max(mi->blk[j].end, end);
+			}
+
+			if (start < end)
+				setup_node_bootmem(nid, start, end);
+		}
+	}
 
 	return 0;
 }
@@ -430,33 +439,41 @@ void __init numa_emu_cmdline(char *str)
 
 int __init find_node_by_addr(unsigned long addr)
 {
-	int ret = NUMA_NO_NODE;
+	const struct numa_meminfo *mi = &numa_meminfo;
 	int i;
 
-	for_each_node_mask(i, mem_nodes_parsed) {
+	for (i = 0; i < mi->nr_blks; i++) {
 		/*
 		 * Find the real node that this emulated node appears on.  For
 		 * the sake of simplicity, we only use a real node's starting
 		 * address to determine which emulated node it appears on.
 		 */
-		if (addr >= numa_nodes[i].start && addr < numa_nodes[i].end) {
-			ret = i;
-			break;
-		}
+		if (addr >= mi->blk[i].start && addr < mi->blk[i].end)
+			return mi->blk[i].nid;
 	}
-	return ret;
+	return NUMA_NO_NODE;
 }
 
 static int __init setup_physnodes(unsigned long start, unsigned long end)
 {
+	const struct numa_meminfo *mi = &numa_meminfo;
 	int ret = 0;
 	int i;
 
 	memset(physnodes, 0, sizeof(physnodes));
 
-	for_each_node_mask(i, mem_nodes_parsed) {
-		physnodes[i].start = numa_nodes[i].start;
-		physnodes[i].end = numa_nodes[i].end;
+	for (i = 0; i < mi->nr_blks; i++) {
+		int nid = mi->blk[i].nid;
+
+		if (physnodes[nid].start == physnodes[nid].end) {
+			physnodes[nid].start = mi->blk[i].start;
+			physnodes[nid].end = mi->blk[i].end;
+		} else {
+			physnodes[nid].start = min(physnodes[nid].start,
+						   mi->blk[i].start);
+			physnodes[nid].end = max(physnodes[nid].end,
+						 mi->blk[i].end);
+		}
 	}
 
 	/*
@@ -806,8 +823,6 @@ static int dummy_numa_init(void)
 	node_set(0, cpu_nodes_parsed);
 	node_set(0, mem_nodes_parsed);
 	numa_add_memblk(0, 0, (u64)max_pfn << PAGE_SHIFT);
-	numa_nodes[0].start = 0;
-	numa_nodes[0].end = (u64)max_pfn << PAGE_SHIFT;
 
 	return 0;
 }
@@ -838,7 +853,6 @@ void __init initmem_init(void)
 		nodes_clear(node_possible_map);
 		nodes_clear(node_online_map);
 		memset(&numa_meminfo, 0, sizeof(numa_meminfo));
-		memset(numa_nodes, 0, sizeof(numa_nodes));
 		remove_all_active_ranges();
 
 		if (numa_init[i]() < 0)
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index 51d0733..e8b3b3c 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -37,13 +37,9 @@ static __init int setup_node(int pxm)
 
 static __init void bad_srat(void)
 {
-	int i;
 	printk(KERN_ERR "SRAT: SRAT not used.\n");
 	acpi_numa = -1;
-	for (i = 0; i < MAX_NUMNODES; i++) {
-		numa_nodes[i].start = numa_nodes[i].end = 0;
-		nodes_add[i].start = nodes_add[i].end = 0;
-	}
+	memset(nodes_add, 0, sizeof(nodes_add));
 }
 
 static __init inline int srat_disabled(void)
@@ -210,7 +206,6 @@ update_nodes_add(int node, unsigned long start, unsigned long end)
 void __init
 acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 {
-	struct bootnode *nd;
 	unsigned long start, end;
 	int node, pxm;
 
@@ -243,18 +238,9 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 	printk(KERN_INFO "SRAT: Node %u PXM %u %lx-%lx\n", node, pxm,
 	       start, end);
 
-	if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
-		nd = &numa_nodes[node];
-		if (!node_test_and_set(node, mem_nodes_parsed)) {
-			nd->start = start;
-			nd->end = end;
-		} else {
-			if (start < nd->start)
-				nd->start = start;
-			if (nd->end < end)
-				nd->end = end;
-		}
-	} else
+	if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE))
+		node_set(node, mem_nodes_parsed);
+	else
 		update_nodes_add(node, start, end);
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 24/26] x86-64, NUMA: Rename cpu_nodes_parsed to numa_nodes_parsed
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (22 preceding siblings ...)
  2011-02-12 17:11 ` [PATCH 23/26] x86-64, NUMA: kill numa_nodes[] Tejun Heo
@ 2011-02-12 17:11 ` Tejun Heo
  2011-02-12 17:11 ` [PATCH 25/26] x86-64, NUMA: Kill mem_nodes_parsed Tejun Heo
  2011-02-12 17:11 ` [PATCH 26/26] x86-64, NUMA: Implement generic node distance handling Tejun Heo
  25 siblings, 0 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:11 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

It's no longer necessary to keep both cpu_nodes_parsed and
mem_nodes_parsed.  In preparation for merge, rename cpu_nodes_parsed
to numa_nodes_parsed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/numa_64.h |    2 +-
 arch/x86/mm/amdtopology_64.c   |    4 ++--
 arch/x86/mm/numa_64.c          |    8 ++++----
 arch/x86/mm/srat_64.c          |    6 +++---
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/numa_64.h b/arch/x86/include/asm/numa_64.h
index da5c501..da57c70 100644
--- a/arch/x86/include/asm/numa_64.h
+++ b/arch/x86/include/asm/numa_64.h
@@ -25,7 +25,7 @@ extern void setup_node_bootmem(int nodeid, unsigned long start,
 #define NODE_MIN_SIZE		(4*1024*1024)
 #define NR_NODE_MEMBLKS		(MAX_NUMNODES*2)
 
-extern nodemask_t cpu_nodes_parsed __initdata;
+extern nodemask_t numa_nodes_parsed __initdata;
 extern nodemask_t mem_nodes_parsed __initdata;
 
 extern int __cpuinit numa_cpu_node(int cpu);
diff --git a/arch/x86/mm/amdtopology_64.c b/arch/x86/mm/amdtopology_64.c
index 0cb59e5..e76bffa 100644
--- a/arch/x86/mm/amdtopology_64.c
+++ b/arch/x86/mm/amdtopology_64.c
@@ -168,7 +168,7 @@ int __init amd_numa_init(void)
 		prevbase = base;
 		numa_add_memblk(nodeid, base, limit);
 		node_set(nodeid, mem_nodes_parsed);
-		node_set(nodeid, cpu_nodes_parsed);
+		node_set(nodeid, numa_nodes_parsed);
 	}
 
 	if (!nodes_weight(mem_nodes_parsed))
@@ -189,7 +189,7 @@ int __init amd_numa_init(void)
 		apicid_base = boot_cpu_physical_apicid;
 	}
 
-	for_each_node_mask(i, cpu_nodes_parsed)
+	for_each_node_mask(i, numa_nodes_parsed)
 		for (j = apicid_base; j < cores + apicid_base; j++)
 			set_apicid_to_node((i << bits) + j, i);
 
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index c0e45c7..1797392 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -36,7 +36,7 @@ struct numa_meminfo {
 struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
 EXPORT_SYMBOL(node_data);
 
-nodemask_t cpu_nodes_parsed __initdata;
+nodemask_t numa_nodes_parsed __initdata;
 nodemask_t mem_nodes_parsed __initdata;
 
 struct memnode memnode;
@@ -379,7 +379,7 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 	int i, j, nid;
 
 	/* Account for nodes with cpus and no memory */
-	nodes_or(node_possible_map, mem_nodes_parsed, cpu_nodes_parsed);
+	nodes_or(node_possible_map, mem_nodes_parsed, numa_nodes_parsed);
 	if (WARN_ON(nodes_empty(node_possible_map)))
 		return -EINVAL;
 
@@ -820,7 +820,7 @@ static int dummy_numa_init(void)
 	printk(KERN_INFO "Faking a node at %016lx-%016lx\n",
 	       0LU, max_pfn << PAGE_SHIFT);
 
-	node_set(0, cpu_nodes_parsed);
+	node_set(0, numa_nodes_parsed);
 	node_set(0, mem_nodes_parsed);
 	numa_add_memblk(0, 0, (u64)max_pfn << PAGE_SHIFT);
 
@@ -848,7 +848,7 @@ void __init initmem_init(void)
 		for (j = 0; j < MAX_LOCAL_APIC; j++)
 			set_apicid_to_node(j, NUMA_NO_NODE);
 
-		nodes_clear(cpu_nodes_parsed);
+		nodes_clear(numa_nodes_parsed);
 		nodes_clear(mem_nodes_parsed);
 		nodes_clear(node_possible_map);
 		nodes_clear(node_online_map);
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index e8b3b3c..8185189 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -94,7 +94,7 @@ acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)
 		return;
 	}
 	set_apicid_to_node(apic_id, node);
-	node_set(node, cpu_nodes_parsed);
+	node_set(node, numa_nodes_parsed);
 	acpi_numa = 1;
 	printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u\n",
 	       pxm, apic_id, node);
@@ -134,7 +134,7 @@ acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
 	}
 
 	set_apicid_to_node(apic_id, node);
-	node_set(node, cpu_nodes_parsed);
+	node_set(node, numa_nodes_parsed);
 	acpi_numa = 1;
 	printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u\n",
 	       pxm, apic_id, node);
@@ -196,7 +196,7 @@ update_nodes_add(int node, unsigned long start, unsigned long end)
 	}
 
 	if (changed) {
-		node_set(node, cpu_nodes_parsed);
+		node_set(node, numa_nodes_parsed);
 		printk(KERN_INFO "SRAT: hot plug zone found %Lx - %Lx\n",
 				 nd->start, nd->end);
 	}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 25/26] x86-64, NUMA: Kill mem_nodes_parsed
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (23 preceding siblings ...)
  2011-02-12 17:11 ` [PATCH 24/26] x86-64, NUMA: Rename cpu_nodes_parsed to numa_nodes_parsed Tejun Heo
@ 2011-02-12 17:11 ` Tejun Heo
  2011-02-12 17:11 ` [PATCH 26/26] x86-64, NUMA: Implement generic node distance handling Tejun Heo
  25 siblings, 0 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:11 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

With all memory configuration information now carried in numa_meminfo,
there's no need to keep mem_nodes_parsed separate.  Drop it and use
numa_nodes_parsed for CPU / memory-less nodes.

A new helper numa_nodemask_from_meminfo() is added to calculate
memnode mask on the fly which is currently used to set
node_possible_map.

This simplifies NUMA init methods a bit and removes a source of
possible inconsistencies.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/numa_64.h |    1 -
 arch/x86/mm/amdtopology_64.c   |    5 ++---
 arch/x86/mm/numa_64.c          |   20 ++++++++++++++++----
 arch/x86/mm/srat_64.c          |    7 ++-----
 4 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/numa_64.h b/arch/x86/include/asm/numa_64.h
index da57c70..04e74d8 100644
--- a/arch/x86/include/asm/numa_64.h
+++ b/arch/x86/include/asm/numa_64.h
@@ -26,7 +26,6 @@ extern void setup_node_bootmem(int nodeid, unsigned long start,
 #define NR_NODE_MEMBLKS		(MAX_NUMNODES*2)
 
 extern nodemask_t numa_nodes_parsed __initdata;
-extern nodemask_t mem_nodes_parsed __initdata;
 
 extern int __cpuinit numa_cpu_node(int cpu);
 extern int __init numa_add_memblk(int nodeid, u64 start, u64 end);
diff --git a/arch/x86/mm/amdtopology_64.c b/arch/x86/mm/amdtopology_64.c
index e76bffa..fd7b609 100644
--- a/arch/x86/mm/amdtopology_64.c
+++ b/arch/x86/mm/amdtopology_64.c
@@ -122,7 +122,7 @@ int __init amd_numa_init(void)
 			       nodeid, (base >> 8) & 3, (limit >> 8) & 3);
 			return -EINVAL;
 		}
-		if (node_isset(nodeid, mem_nodes_parsed)) {
+		if (node_isset(nodeid, numa_nodes_parsed)) {
 			pr_info("Node %d already present, skipping\n",
 				nodeid);
 			continue;
@@ -167,11 +167,10 @@ int __init amd_numa_init(void)
 
 		prevbase = base;
 		numa_add_memblk(nodeid, base, limit);
-		node_set(nodeid, mem_nodes_parsed);
 		node_set(nodeid, numa_nodes_parsed);
 	}
 
-	if (!nodes_weight(mem_nodes_parsed))
+	if (!nodes_weight(numa_nodes_parsed))
 		return -ENOENT;
 
 	/*
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 1797392..9dd4a34 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -37,7 +37,6 @@ struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
 EXPORT_SYMBOL(node_data);
 
 nodemask_t numa_nodes_parsed __initdata;
-nodemask_t mem_nodes_parsed __initdata;
 
 struct memnode memnode;
 
@@ -344,6 +343,20 @@ static int __init numa_cleanup_meminfo(struct numa_meminfo *mi)
 }
 
 /*
+ * Set nodes, which have memory in @mi, in *@nodemask.
+ */
+static void __init numa_nodemask_from_meminfo(nodemask_t *nodemask,
+					      const struct numa_meminfo *mi)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(mi->blk); i++)
+		if (mi->blk[i].start != mi->blk[i].end &&
+		    mi->blk[i].nid != NUMA_NO_NODE)
+			node_set(mi->blk[i].nid, *nodemask);
+}
+
+/*
  * Sanity check to catch more bad NUMA configurations (they are amazingly
  * common).  Make sure the nodes cover all memory.
  */
@@ -379,7 +392,8 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 	int i, j, nid;
 
 	/* Account for nodes with cpus and no memory */
-	nodes_or(node_possible_map, mem_nodes_parsed, numa_nodes_parsed);
+	node_possible_map = numa_nodes_parsed;
+	numa_nodemask_from_meminfo(&node_possible_map, mi);
 	if (WARN_ON(nodes_empty(node_possible_map)))
 		return -EINVAL;
 
@@ -821,7 +835,6 @@ static int dummy_numa_init(void)
 	       0LU, max_pfn << PAGE_SHIFT);
 
 	node_set(0, numa_nodes_parsed);
-	node_set(0, mem_nodes_parsed);
 	numa_add_memblk(0, 0, (u64)max_pfn << PAGE_SHIFT);
 
 	return 0;
@@ -849,7 +862,6 @@ void __init initmem_init(void)
 			set_apicid_to_node(j, NUMA_NO_NODE);
 
 		nodes_clear(numa_nodes_parsed);
-		nodes_clear(mem_nodes_parsed);
 		nodes_clear(node_possible_map);
 		nodes_clear(node_online_map);
 		memset(&numa_meminfo, 0, sizeof(numa_meminfo));
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index 8185189..4f8e6cd 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -238,9 +238,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 	printk(KERN_INFO "SRAT: Node %u PXM %u %lx-%lx\n", node, pxm,
 	       start, end);
 
-	if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE))
-		node_set(node, mem_nodes_parsed);
-	else
+	if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)
 		update_nodes_add(node, start, end);
 }
 
@@ -310,10 +308,9 @@ void __init acpi_fake_nodes(const struct bootnode *fake_nodes, int num_nodes)
 		__acpi_map_pxm_to_node(fake_node_to_pxm_map[i], i);
 	memcpy(__apicid_to_node, fake_apicid_to_node, sizeof(__apicid_to_node));
 
-	nodes_clear(mem_nodes_parsed);
 	for (i = 0; i < num_nodes; i++)
 		if (fake_nodes[i].start != fake_nodes[i].end)
-			node_set(i, mem_nodes_parsed);
+			node_set(i, numa_nodes_parsed);
 }
 
 static int null_slit_node_compare(int a, int b)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 26/26] x86-64, NUMA: Implement generic node distance handling
  2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
                   ` (24 preceding siblings ...)
  2011-02-12 17:11 ` [PATCH 25/26] x86-64, NUMA: Kill mem_nodes_parsed Tejun Heo
@ 2011-02-12 17:11 ` Tejun Heo
  25 siblings, 0 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:11 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa
  Cc: Tejun Heo

Node distance either used direct node comparison, ACPI PXM comparison
or ACPI SLIT table lookup.  This patch implements generic node
distance handling.  NUMA init methods can call numa_set_distance() to
set distance between nodes and the common __node_distance()
implementation will report the set distance.

Due to the way NUMA emulation is implemented, the generic node
distance handling is used only when emulation is not used.  Later
patches will update NUMA emulation to use the generic distance
mechanism.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/acpi.h     |    1 +
 arch/x86/include/asm/numa_64.h  |    1 +
 arch/x86/include/asm/topology.h |    2 +-
 arch/x86/mm/numa_64.c           |   95 +++++++++++++++++++++++++++++++++++++++
 arch/x86/mm/srat_64.c           |   27 +++++-------
 5 files changed, 109 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index cfa3d5c..9c9fe1b 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -190,6 +190,7 @@ extern int x86_acpi_numa_init(void);
 #ifdef CONFIG_NUMA_EMU
 extern void acpi_fake_nodes(const struct bootnode *fake_nodes,
 				   int num_nodes);
+extern int acpi_emu_node_distance(int a, int b);
 #endif
 #endif /* CONFIG_ACPI_NUMA */
 
diff --git a/arch/x86/include/asm/numa_64.h b/arch/x86/include/asm/numa_64.h
index 04e74d8..972af9d 100644
--- a/arch/x86/include/asm/numa_64.h
+++ b/arch/x86/include/asm/numa_64.h
@@ -29,6 +29,7 @@ extern nodemask_t numa_nodes_parsed __initdata;
 
 extern int __cpuinit numa_cpu_node(int cpu);
 extern int __init numa_add_memblk(int nodeid, u64 start, u64 end);
+extern void __init numa_set_distance(int from, int to, int distance);
 
 #ifdef CONFIG_NUMA_EMU
 #define FAKE_NODE_MIN_SIZE	((u64)32 << 20)
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index b101c17..910a708 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -138,7 +138,7 @@ extern unsigned long node_remap_size[];
 	.balance_interval	= 1,					\
 }
 
-#ifdef CONFIG_X86_64_ACPI_NUMA
+#ifdef CONFIG_X86_64
 extern int __node_distance(int, int);
 #define node_distance(a, b) __node_distance(a, b)
 #endif
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 9dd4a34..b3c1418 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -45,6 +45,13 @@ static unsigned long __initdata nodemap_size;
 
 static struct numa_meminfo numa_meminfo __initdata;
 
+static int numa_distance_cnt;
+static u8 *numa_distance;
+
+#ifdef CONFIG_NUMA_EMU
+static bool numa_emu_dist;
+#endif
+
 /*
  * Given a shift value, try to populate memnodemap[]
  * Returns :
@@ -357,6 +364,92 @@ static void __init numa_nodemask_from_meminfo(nodemask_t *nodemask,
 }
 
 /*
+ * Reset distance table.  The current table is freed.  The next
+ * numa_set_distance() call will create a new one.
+ */
+static void __init numa_reset_distance(void)
+{
+	size_t size;
+
+	size = numa_distance_cnt * sizeof(numa_distance[0]);
+	memblock_x86_free_range(__pa(numa_distance),
+				__pa(numa_distance) + size);
+	numa_distance = NULL;
+	numa_distance_cnt = 0;
+}
+
+/*
+ * Set the distance between node @from to @to to @distance.  If distance
 + * table doesn't exist, one which is large enough to accommodate all the
+ * currently known nodes will be created.
+ */
+void __init numa_set_distance(int from, int to, int distance)
+{
+	if (!numa_distance) {
+		nodemask_t nodes_parsed;
+		size_t size;
+		int i, j, cnt = 0;
+		u64 phys;
+
+		/* size the new table and allocate it */
+		nodes_parsed = numa_nodes_parsed;
+		numa_nodemask_from_meminfo(&nodes_parsed, &numa_meminfo);
+
+		for_each_node_mask(i, nodes_parsed)
+			cnt = i;
+		size = ++cnt * sizeof(numa_distance[0]);
+
+		phys = memblock_find_in_range(0,
+					      (u64)max_pfn_mapped << PAGE_SHIFT,
+					      size, PAGE_SIZE);
+		if (phys == MEMBLOCK_ERROR) {
+			pr_warning("NUMA: Warning: can't allocate distance table!\n");
+			/* don't retry until explicitly reset */
+			numa_distance = (void *)1LU;
+			return;
+		}
+		memblock_x86_reserve_range(phys, phys + size, "NUMA DIST");
+
+		numa_distance = __va(phys);
+		numa_distance_cnt = cnt;
+
+		/* fill with the default distances */
+		for (i = 0; i < cnt; i++)
+			for (j = 0; j < cnt; j++)
+				numa_distance[i * cnt + j] = i == j ?
+					LOCAL_DISTANCE : REMOTE_DISTANCE;
+		printk(KERN_DEBUG "NUMA: Initialized distance table, cnt=%d\n", cnt);
+	}
+
+	if (from >= numa_distance_cnt || to >= numa_distance_cnt) {
+		printk_once(KERN_DEBUG "NUMA: Debug: distance out of bound, from=%d to=%d distance=%d\n",
+			    from, to, distance);
+		return;
+	}
+
+	if ((u8)distance != distance ||
+	    (from == to && distance != LOCAL_DISTANCE)) {
+		pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
+			     from, to, distance);
+		return;
+	}
+
+	numa_distance[from * numa_distance_cnt + to] = distance;
+}
+
+int __node_distance(int from, int to)
+{
+#if defined(CONFIG_ACPI_NUMA) && defined(CONFIG_NUMA_EMU)
+	if (numa_emu_dist)
+		return acpi_emu_node_distance(from, to);
+#endif
+	if (from >= numa_distance_cnt || to >= numa_distance_cnt)
+		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
+	return numa_distance[from * numa_distance_cnt + to];
+}
+EXPORT_SYMBOL(__node_distance);
+
+/*
  * Sanity check to catch more bad NUMA configurations (they are amazingly
  * common).  Make sure the nodes cover all memory.
  */
@@ -823,6 +916,7 @@ static int __init numa_emulation(unsigned long start_pfn,
 	setup_physnodes(addr, max_addr);
 	fake_physnodes(acpi, amd, num_nodes);
 	numa_init_array();
+	numa_emu_dist = true;
 	return 0;
 }
 #endif /* CONFIG_NUMA_EMU */
@@ -866,6 +960,7 @@ void __init initmem_init(void)
 		nodes_clear(node_online_map);
 		memset(&numa_meminfo, 0, sizeof(numa_meminfo));
 		remove_all_active_ranges();
+		numa_reset_distance();
 
 		if (numa_init[i]() < 0)
 			continue;
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index 4f8e6cd..d2f53f3 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -50,9 +50,16 @@ static __init inline int srat_disabled(void)
 /* Callback for SLIT parsing */
 void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
 {
+	int i, j;
 	unsigned length;
 	unsigned long phys;
 
+	for (i = 0; i < slit->locality_count; i++)
+		for (j = 0; j < slit->locality_count; j++)
+			numa_set_distance(pxm_to_node(i), pxm_to_node(j),
+				slit->entry[slit->locality_count * i + j]);
+
+	/* acpi_slit is used only by emulation */
 	length = slit->header.length;
 	phys = memblock_find_in_range(0, max_pfn_mapped<<PAGE_SHIFT, length,
 		 PAGE_SIZE);
@@ -313,29 +320,17 @@ void __init acpi_fake_nodes(const struct bootnode *fake_nodes, int num_nodes)
 			node_set(i, numa_nodes_parsed);
 }
 
-static int null_slit_node_compare(int a, int b)
-{
-	return node_to_pxm(a) == node_to_pxm(b);
-}
-#else
-static int null_slit_node_compare(int a, int b)
-{
-	return a == b;
-}
-#endif /* CONFIG_NUMA_EMU */
-
-int __node_distance(int a, int b)
+int acpi_emu_node_distance(int a, int b)
 {
 	int index;
 
 	if (!acpi_slit)
-		return null_slit_node_compare(a, b) ? LOCAL_DISTANCE :
-						      REMOTE_DISTANCE;
+		return node_to_pxm(a) == node_to_pxm(b) ?
+			LOCAL_DISTANCE : REMOTE_DISTANCE;
 	index = acpi_slit->locality_count * node_to_pxm(a);
 	return acpi_slit->entry[index + node_to_pxm(b)];
 }
-
-EXPORT_SYMBOL(__node_distance);
+#endif /* CONFIG_NUMA_EMU */
 
 #if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) || defined(CONFIG_ACPI_HOTPLUG_MEMORY)
 int memory_add_physaddr_to_nid(u64 start)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH 02/26] x86-64, NUMA: Simplify hotplug node handling in acpi_numa_memory_affinity_init()
  2011-02-12 17:10 ` [PATCH 02/26] x86-64, NUMA: Simplify hotplug node handling in acpi_numa_memory_affinity_init() Tejun Heo
@ 2011-02-12 17:47   ` Yinghai Lu
  2011-02-12 17:56     ` Tejun Heo
  0 siblings, 1 reply; 77+ messages in thread
From: Yinghai Lu @ 2011-02-12 17:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On 02/12/2011 09:10 AM, Tejun Heo wrote:
> Hotplug node handling in acpi_numa_memory_affinity_init() was
> unnecessarily complicated with storing the original nodes[] entry and
> restoring it afterwards.  Simplify it by not modifying the nodes[]
> entry for hotplug nodes from the beginning.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Yinghai Lu <yinghai@kernel.org>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Shaohui Zheng <shaohui.zheng@intel.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: H. Peter Anvin <hpa@linux.intel.com>
> ---
>  arch/x86/mm/srat_64.c |   31 +++++++++++++------------------
>  1 files changed, 13 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
> index 9a97261..e3e0dd3 100644
> --- a/arch/x86/mm/srat_64.c
> +++ b/arch/x86/mm/srat_64.c
> @@ -251,7 +251,7 @@ update_nodes_add(int node, unsigned long start, unsigned long end)
>  void __init
>  acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
>  {
> -	struct bootnode *nd, oldnode;
> +	struct bootnode *nd;
>  	unsigned long start, end;
>  	int node, pxm;
>  	int i;
> @@ -289,28 +289,23 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
>  		bad_srat();
>  		return;
>  	}
> -	nd = &nodes[node];
> -	oldnode = *nd;
> -	if (!node_test_and_set(node, nodes_parsed)) {
> -		nd->start = start;
> -		nd->end = end;
> -	} else {
> -		if (start < nd->start)
> -			nd->start = start;
> -		if (nd->end < end)
> -			nd->end = end;
> -	}
>  
>  	printk(KERN_INFO "SRAT: Node %u PXM %u %lx-%lx\n", node, pxm,
>  	       start, end);
>  
> -	if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
> +	if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
> +		nd = &nodes[node];
> +		if (!node_test_and_set(node, nodes_parsed)) {
> +			nd->start = start;
> +			nd->end = end;
> +		} else {
> +			if (start < nd->start)
> +				nd->start = start;
> +			if (nd->end < end)
> +				nd->end = end;
> +		}
> +	} else
>  		update_nodes_add(node, start, end);
> -		/* restore nodes[node] */
> -		*nd = oldnode;
> -		if ((nd->start | nd->end) == 0)
> -			node_clear(node, nodes_parsed);
> -	}
>  
>  	node_memblk_range[num_node_memblks].start = start;
>  	node_memblk_range[num_node_memblks].end = end;

After the change, it looks like nodes_parsed is no longer set for
nodes that only have hotplug memory.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 01/26] x86-64, NUMA: Make dummy node initialization path similar to non-dummy ones
  2011-02-12 17:10 ` [PATCH 01/26] x86-64, NUMA: Make dummy node initialization path similar to non-dummy ones Tejun Heo
@ 2011-02-12 17:52   ` Yinghai Lu
  0 siblings, 0 replies; 77+ messages in thread
From: Yinghai Lu @ 2011-02-12 17:52 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On 02/12/2011 09:10 AM, Tejun Heo wrote:
> Dummy node initialization in initmem_init() didn't initialize apicid
> to node mapping and set cpu to node mapping directly by caling
> numa_set_node(), which is different from non-dummy init paths.
> 
> Update it such that they behave similarly.  Initialize apicid to node
> mapping and call numa_init_array().  The actual cpu to node mapping is
> handled by init_cpu_to_node() later.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Yinghai Lu <yinghai@kernel.org>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Shaohui Zheng <shaohui.zheng@intel.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: H. Peter Anvin <hpa@linux.intel.com>
> ---
>  arch/x86/mm/numa_64.c |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
> index f548fbf..ea5dd48 100644
> --- a/arch/x86/mm/numa_64.c
> +++ b/arch/x86/mm/numa_64.c
> @@ -623,10 +623,11 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
>  	memnodemap[0] = 0;
>  	node_set_online(0);
>  	node_set(0, node_possible_map);
> -	for (i = 0; i < nr_cpu_ids; i++)
> -		numa_set_node(i, 0);
> +	for (i = 0; i < MAX_LOCAL_APIC; i++)
> +		set_apicid_to_node(i, NUMA_NO_NODE);
>  	memblock_x86_register_active_regions(0, start_pfn, last_pfn);
>  	setup_node_bootmem(0, start_pfn << PAGE_SHIFT, last_pfn << PAGE_SHIFT);
> +	numa_init_array();
>  }
>  
>  unsigned long __init numa_free_all_bootmem(void)

Acked-by: Yinghai Lu <yinghai@kernel.org>


* Re: [PATCH 02/26] x86-64, NUMA: Simplify hotplug node handling in acpi_numa_memory_affinity_init()
  2011-02-12 17:47   ` Yinghai Lu
@ 2011-02-12 17:56     ` Tejun Heo
  2011-02-12 18:04       ` Yinghai Lu
  0 siblings, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 17:56 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On Sat, Feb 12, 2011 at 6:47 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> -             if ((nd->start | nd->end) == 0)
>> -                     node_clear(node, nodes_parsed);
>
> after change
> looks like nodes_parsed is not set anymore for node that only have hotplug memory.

Yeap, which matches the above node_clear().  If the node was already
occupied, the bit would already be set.  If not, the above
node_clear() would clear it, so the same result.  The code was quite
convoluted.

Thanks.

-- 
tejun


* Re: [PATCH 03/26] x86-64, NUMA: Drop @start/last_pfn from initmem_init()
  2011-02-12 17:10 ` [PATCH 03/26] x86-64, NUMA: Drop @start/last_pfn from initmem_init() Tejun Heo
@ 2011-02-12 17:58   ` Yinghai Lu
  2011-02-12 18:03     ` Tejun Heo
  2011-02-14 13:50   ` [PATCH UPDATED 03/26] x86, NUMA: Drop @start/last_pfn from initmem_init() Tejun Heo
  1 sibling, 1 reply; 77+ messages in thread
From: Yinghai Lu @ 2011-02-12 17:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On 02/12/2011 09:10 AM, Tejun Heo wrote:
> initmem_init() extensively accesses and modifies global data
> structures and the parameters aren't even followed depending on which
> path is being used.  Drop @start/last_pfn and let it deal with
> @max_pfn directly.  This is in preparation for further NUMA init
> cleanups.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Yinghai Lu <yinghai@kernel.org>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Shaohui Zheng <shaohui.zheng@intel.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: H. Peter Anvin <hpa@linux.intel.com>
> ---
>  arch/x86/include/asm/page_types.h |    3 +--
>  arch/x86/kernel/setup.c           |    2 +-
>  arch/x86/mm/init_64.c             |    5 ++---
>  arch/x86/mm/numa_64.c             |   21 ++++++++-------------
>  4 files changed, 12 insertions(+), 19 deletions(-)

It will break 32-bit builds.

Yinghai


* Re: [PATCH 03/26] x86-64, NUMA: Drop @start/last_pfn from initmem_init()
  2011-02-12 17:58   ` Yinghai Lu
@ 2011-02-12 18:03     ` Tejun Heo
  0 siblings, 0 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 18:03 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On Sat, Feb 12, 2011 at 6:58 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> it will break 32bit.

Ah, right.  Forgot that part was shared with 32bit.  Will update.  Thanks.

-- 
tejun


* Re: [PATCH 02/26] x86-64, NUMA: Simplify hotplug node handling in acpi_numa_memory_affinity_init()
  2011-02-12 17:56     ` Tejun Heo
@ 2011-02-12 18:04       ` Yinghai Lu
  2011-02-12 18:06         ` Tejun Heo
  0 siblings, 1 reply; 77+ messages in thread
From: Yinghai Lu @ 2011-02-12 18:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On 02/12/2011 09:56 AM, Tejun Heo wrote:
> On Sat, Feb 12, 2011 at 6:47 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> -             if ((nd->start | nd->end) == 0)
>>> -                     node_clear(node, nodes_parsed);
>>
>> after change
>> looks like nodes_parsed is not set anymore for node that only have hotplug memory.
> 
> Yeap, which matches the above node_clear().  If the node was already
> occupied, the bit would already be set.  If not, the above
> node_clear() would clear it, so the same result.  The code was quite
> convoluted.

No.  If the node only has hotplug memory,
then nd->start and nd->end will be set to that hotplug range,
and the old code does not clear nodes_parsed...

Yinghai


* Re: [PATCH 02/26] x86-64, NUMA: Simplify hotplug node handling in acpi_numa_memory_affinity_init()
  2011-02-12 18:04       ` Yinghai Lu
@ 2011-02-12 18:06         ` Tejun Heo
  2011-02-12 18:13           ` Yinghai Lu
  0 siblings, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-12 18:06 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On Sat, Feb 12, 2011 at 7:04 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> no. if the node only have hotplug mem.
> then nd->start and nd->end will be set to that hot plug range.
> and old code does not clear nodes_parsed...

Eh?  The oldnode thing will restore the node to its initial state, thus
fulfilling the node-empty condition.  Am I missing something?

-- 
tejun


* Re: [PATCH 02/26] x86-64, NUMA: Simplify hotplug node handling in acpi_numa_memory_affinity_init()
  2011-02-12 18:06         ` Tejun Heo
@ 2011-02-12 18:13           ` Yinghai Lu
  2011-02-14 11:25             ` Tejun Heo
  0 siblings, 1 reply; 77+ messages in thread
From: Yinghai Lu @ 2011-02-12 18:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On 02/12/2011 10:06 AM, Tejun Heo wrote:
> On Sat, Feb 12, 2011 at 7:04 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> no. if the node only have hotplug mem.
>> then nd->start and nd->end will be set to that hot plug range.
>> and old code does not clear nodes_parsed...
> 
> Eh?  The oldnode thing will restore the node to initial state thus
> fulfilling the node empty condition.  Am I missing something?
> 

Yes, nd gets restored, but it keeps nodes_parsed set for that kind of node.

Yinghai


* Re: [PATCH 04/26] x86-64, NUMA: Unify {acpi|amd}_{numa_init|scan_nodes}() arguments and return values
  2011-02-12 17:10 ` [PATCH 04/26] x86-64, NUMA: Unify {acpi|amd}_{numa_init|scan_nodes}() arguments and return values Tejun Heo
@ 2011-02-12 18:39   ` Yinghai Lu
  2011-02-14 11:29     ` Tejun Heo
  0 siblings, 1 reply; 77+ messages in thread
From: Yinghai Lu @ 2011-02-12 18:39 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On 02/12/2011 09:10 AM, Tejun Heo wrote:
> The functions used during NUMA initialization - *_numa_init() and
> *_scan_nodes() - have different arguments and return values.  Unify
> them such that they all take no argument and return 0 on success and
> -errno on failure.  This is in preparation for further NUMA init
> cleanups.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Yinghai Lu <yinghai@kernel.org>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Shaohui Zheng <shaohui.zheng@intel.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: H. Peter Anvin <hpa@linux.intel.com>
> ---
>  arch/x86/include/asm/acpi.h   |    2 +-
>  arch/x86/include/asm/amd_nb.h |    2 +-
>  arch/x86/kernel/setup.c       |    4 ++--
>  arch/x86/mm/amdtopology_64.c  |   18 +++++++++---------
>  arch/x86/mm/numa_64.c         |    2 +-
>  arch/x86/mm/srat_64.c         |    4 ++--
>  drivers/acpi/numa.c           |    9 ++++++---
>  7 files changed, 22 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
> index 211ca3f..4e5dff9 100644
> --- a/arch/x86/include/asm/acpi.h
> +++ b/arch/x86/include/asm/acpi.h
> @@ -187,7 +187,7 @@ struct bootnode;
>  extern int acpi_numa;
>  extern void acpi_get_nodes(struct bootnode *physnodes, unsigned long start,
>  				unsigned long end);
> -extern int acpi_scan_nodes(unsigned long start, unsigned long end);
> +extern int acpi_scan_nodes(void);
>  #define NR_NODE_MEMBLKS (MAX_NUMNODES*2)
>  
>  #ifdef CONFIG_NUMA_EMU
> diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
> index 64dc82e..72abf65 100644
> --- a/arch/x86/include/asm/amd_nb.h
> +++ b/arch/x86/include/asm/amd_nb.h
> @@ -16,7 +16,7 @@ struct bootnode;
>  extern int early_is_amd_nb(u32 value);
>  extern int amd_cache_northbridges(void);
>  extern void amd_flush_garts(void);
> -extern int amd_numa_init(unsigned long start_pfn, unsigned long end_pfn);
> +extern int amd_numa_init(void);
>  extern int amd_scan_nodes(void);
>  
>  #ifdef CONFIG_NUMA_EMU
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index c50ba3d..1870a59 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -988,12 +988,12 @@ void __init setup_arch(char **cmdline_p)
>  	/*
>  	 * Parse SRAT to discover nodes.
>  	 */
> -	acpi = acpi_numa_init();
> +	acpi = !acpi_numa_init();
>  #endif
>  
>  #ifdef CONFIG_AMD_NUMA
>  	if (!acpi)
> -		amd = !amd_numa_init(0, max_pfn);
> +		amd = !amd_numa_init();
>  #endif
>  
>  	initmem_init(acpi, amd);
> diff --git a/arch/x86/mm/amdtopology_64.c b/arch/x86/mm/amdtopology_64.c
> index c7fae38..ee70257 100644
> --- a/arch/x86/mm/amdtopology_64.c
> +++ b/arch/x86/mm/amdtopology_64.c
> @@ -51,7 +51,7 @@ static __init int find_northbridge(void)
>  		return num;
>  	}
>  
> -	return -1;
> +	return -ENOENT;
>  }
>  
>  static __init void early_get_boot_cpu_id(void)
> @@ -69,17 +69,17 @@ static __init void early_get_boot_cpu_id(void)
>  #endif
>  }
>  
> -int __init amd_numa_init(unsigned long start_pfn, unsigned long end_pfn)
> +int __init amd_numa_init(void)
>  {
> -	unsigned long start = PFN_PHYS(start_pfn);
> -	unsigned long end = PFN_PHYS(end_pfn);
> +	unsigned long start = PFN_PHYS(0);
> +	unsigned long end = PFN_PHYS(max_pfn);
>  	unsigned numnodes;
>  	unsigned long prevbase;
>  	int i, nb, found = 0;
>  	u32 nodeid, reg;
>  
>  	if (!early_pci_allowed())
> -		return -1;
> +		return -EINVAL;
>  
>  	nb = find_northbridge();
>  	if (nb < 0)
> @@ -90,7 +90,7 @@ int __init amd_numa_init(unsigned long start_pfn, unsigned long end_pfn)
>  	reg = read_pci_config(0, nb, 0, 0x60);
>  	numnodes = ((reg >> 4) & 0xF) + 1;
>  	if (numnodes <= 1)
> -		return -1;
> +		return -ENOENT;
>  
>  	pr_info("Number of physical nodes %d\n", numnodes);
>  
> @@ -121,7 +121,7 @@ int __init amd_numa_init(unsigned long start_pfn, unsigned long end_pfn)
>  		if ((base >> 8) & 3 || (limit >> 8) & 3) {
>  			pr_err("Node %d using interleaving mode %lx/%lx\n",
>  			       nodeid, (base >> 8) & 3, (limit >> 8) & 3);
> -			return -1;
> +			return -EINVAL;
>  		}
>  		if (node_isset(nodeid, nodes_parsed)) {
>  			pr_info("Node %d already present, skipping\n",
> @@ -160,7 +160,7 @@ int __init amd_numa_init(unsigned long start_pfn, unsigned long end_pfn)
>  		if (prevbase > base) {
>  			pr_err("Node map not sorted %lx,%lx\n",
>  			       prevbase, base);
> -			return -1;
> +			return -EINVAL;
>  		}
>  
>  		pr_info("Node %d MemBase %016lx Limit %016lx\n",
> @@ -177,7 +177,7 @@ int __init amd_numa_init(unsigned long start_pfn, unsigned long end_pfn)
>  	}
>  
>  	if (!found)
> -		return -1;
> +		return -ENOENT;
>  	return 0;
>  }
>  
> diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
> index f534feb..85561d1 100644
> --- a/arch/x86/mm/numa_64.c
> +++ b/arch/x86/mm/numa_64.c
> @@ -595,7 +595,7 @@ void __init initmem_init(int acpi, int amd)
>  #endif
>  
>  #ifdef CONFIG_ACPI_NUMA
> -	if (!numa_off && acpi && !acpi_scan_nodes(0, max_pfn << PAGE_SHIFT))
> +	if (!numa_off && acpi && !acpi_scan_nodes())
>  		return;
>  	nodes_clear(node_possible_map);
>  	nodes_clear(node_online_map);
> diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
> index e3e0dd3..19652dd 100644
> --- a/arch/x86/mm/srat_64.c
> +++ b/arch/x86/mm/srat_64.c
> @@ -359,7 +359,7 @@ void __init acpi_get_nodes(struct bootnode *physnodes, unsigned long start,
>  #endif /* CONFIG_NUMA_EMU */
>  
>  /* Use the information discovered above to actually set up the nodes. */
> -int __init acpi_scan_nodes(unsigned long start, unsigned long end)
> +int __init acpi_scan_nodes(void)
>  {
>  	int i;
>  
> @@ -368,7 +368,7 @@ int __init acpi_scan_nodes(unsigned long start, unsigned long end)
>  
>  	/* First clean up the node list */
>  	for (i = 0; i < MAX_NUMNODES; i++)
> -		cutoff_node(i, start, end);
> +		cutoff_node(i, 0, max_pfn << PAGE_SHIFT);
>  
>  	/*
>  	 * Join together blocks on the same node, holes between
> diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
> index 5eb25eb..3b5c318 100644
> --- a/drivers/acpi/numa.c
> +++ b/drivers/acpi/numa.c
> @@ -274,7 +274,7 @@ acpi_table_parse_srat(enum acpi_srat_type id,
>  
>  int __init acpi_numa_init(void)
>  {
> -	int ret = 0;
> +	int cnt = 0;
>  
>  	/*
>  	 * Should not limit number with cpu num that is from NR_CPUS or nr_cpus=
> @@ -288,7 +288,7 @@ int __init acpi_numa_init(void)
>  				     acpi_parse_x2apic_affinity, 0);
>  		acpi_table_parse_srat(ACPI_SRAT_TYPE_CPU_AFFINITY,
>  				     acpi_parse_processor_affinity, 0);
> -		ret = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
> +		cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
>  					    acpi_parse_memory_affinity,
>  					    NR_NODE_MEMBLKS);
>  	}
> @@ -297,7 +297,10 @@ int __init acpi_numa_init(void)
>  	acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
>  
>  	acpi_numa_arch_fixup();
> -	return ret;
> +
> +	if (cnt <= 0)
> +		return cnt ?: -ENOENT;
> +	return 0;
>  }
>  
>  int acpi_get_pxm(acpi_handle h)


It will break AMD systems that do not have an SRAT.

Your change will treat the no-SRAT case as if an SRAT were present.

Yinghai


* Re: [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration
  2011-02-12 17:10 ` [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration Tejun Heo
@ 2011-02-13  0:45   ` Yinghai Lu
  2011-02-14 11:32     ` Tejun Heo
  0 siblings, 1 reply; 77+ messages in thread
From: Yinghai Lu @ 2011-02-13  0:45 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On 02/12/2011 09:10 AM, Tejun Heo wrote:
> Move the remaining memblk registration logic from acpi_scan_nodes() to
> numa_register_memblks() and initmem_init().
> 
> This applies nodes_cover_memory() sanity check, memory node sorting
> and node_online() checking, which were only applied to acpi, to all
> init methods.
> 
> As all memblk registration is moved to common code, active range
> clearing is moved to initmem_init() too and removed from bad_srat().
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Yinghai Lu <yinghai@kernel.org>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Shaohui Zheng <shaohui.zheng@intel.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: H. Peter Anvin <hpa@linux.intel.com>
> ---
>  arch/x86/mm/amdtopology_64.c |    6 ---
>  arch/x86/mm/numa_64.c        |   71 +++++++++++++++++++++++++++++++++++++++---
>  arch/x86/mm/srat_64.c        |   59 ----------------------------------
>  3 files changed, 66 insertions(+), 70 deletions(-)
> 
> diff --git a/arch/x86/mm/amdtopology_64.c b/arch/x86/mm/amdtopology_64.c
> index 48ec374..9c9f46a 100644
> --- a/arch/x86/mm/amdtopology_64.c
> +++ b/arch/x86/mm/amdtopology_64.c
> @@ -262,11 +262,5 @@ void __init amd_fake_nodes(const struct bootnode *nodes, int nr_nodes)
>  
>  int __init amd_scan_nodes(void)
>  {
> -	int i;
> -
> -	for_each_node_mask(i, node_possible_map)
> -		setup_node_bootmem(i, numa_nodes[i].start, numa_nodes[i].end);
> -
> -	numa_init_array();
>  	return 0;
>  }
> diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
> index 2e2ca94..062649d 100644
> --- a/arch/x86/mm/numa_64.c
> +++ b/arch/x86/mm/numa_64.c
> @@ -287,6 +287,37 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
>  	node_set_online(nodeid);
>  }
>  
> +/*
> + * Sanity check to catch more bad NUMA configurations (they are amazingly
> + * common).  Make sure the nodes cover all memory.
> + */
> +static int __init nodes_cover_memory(const struct bootnode *nodes)
> +{
> +	unsigned long numaram, e820ram;
> +	int i;
> +
> +	numaram = 0;
> +	for_each_node_mask(i, mem_nodes_parsed) {
> +		unsigned long s = nodes[i].start >> PAGE_SHIFT;
> +		unsigned long e = nodes[i].end >> PAGE_SHIFT;
> +		numaram += e - s;
> +		numaram -= __absent_pages_in_range(i, s, e);
> +		if ((long)numaram < 0)
> +			numaram = 0;
> +	}
> +
> +	e820ram = max_pfn -
> +		(memblock_x86_hole_size(0, max_pfn<<PAGE_SHIFT) >> PAGE_SHIFT);
> +	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
> +	if ((long)(e820ram - numaram) >= (1<<(20 - PAGE_SHIFT))) {
> +		printk(KERN_ERR "NUMA: nodes only cover %luMB of your %luMB e820 RAM. Not used.\n",
> +			(numaram << PAGE_SHIFT) >> 20,
> +			(e820ram << PAGE_SHIFT) >> 20);
> +		return 0;
> +	}
> +	return 1;
> +}
> +
>  static int __init numa_register_memblks(void)
>  {
>  	int i;
> @@ -349,6 +380,25 @@ static int __init numa_register_memblks(void)
>  		memblock_x86_register_active_regions(memblk_nodeid[i],
>  				node_memblk_range[i].start >> PAGE_SHIFT,
>  				node_memblk_range[i].end >> PAGE_SHIFT);
> +
> +	/* for out of order entries */
> +	sort_node_map();
> +	if (!nodes_cover_memory(numa_nodes))
> +		return -EINVAL;
> +
> +	/* Finally register nodes. */
> +	for_each_node_mask(i, node_possible_map)
> +		setup_node_bootmem(i, numa_nodes[i].start, numa_nodes[i].end);
> +
> +	/*
> +	 * Try again in case setup_node_bootmem missed one due to missing
> +	 * bootmem.
> +	 */
> +	for_each_node_mask(i, node_possible_map)
> +		if (!node_online(i))
> +			setup_node_bootmem(i, numa_nodes[i].start,
> +					   numa_nodes[i].end);
> +
>  	return 0;
>  }

Please don't put the setup_node_bootmem() calls into
numa_register_memblks(); they are not related.

Putting the calls directly in initmem_init() is more reasonable.

>  
> @@ -713,15 +763,14 @@ static int dummy_numa_init(void)
>  	node_set(0, cpu_nodes_parsed);
>  	node_set(0, mem_nodes_parsed);
>  	numa_add_memblk(0, 0, (u64)max_pfn << PAGE_SHIFT);
> +	numa_nodes[0].start = 0;
> +	numa_nodes[0].end = (u64)max_pfn << PAGE_SHIFT;
>  
>  	return 0;
>  }
>  
>  static int dummy_scan_nodes(void)
>  {
> -	setup_node_bootmem(0, 0, max_pfn << PAGE_SHIFT);
> -	numa_init_array();
> -
>  	return 0;
>  }
>  
> @@ -757,6 +806,7 @@ void __init initmem_init(void)
>  		memset(node_memblk_range, 0, sizeof(node_memblk_range));
>  		memset(memblk_nodeid, 0, sizeof(memblk_nodeid));
>  		memset(numa_nodes, 0, sizeof(numa_nodes));
> +		remove_all_active_ranges();
>  
>  		if (numa_init[i]() < 0)
>  			continue;
> @@ -781,8 +831,19 @@ void __init initmem_init(void)
>  		if (numa_register_memblks() < 0)
>  			continue;
>  
> -		if (!scan_nodes[i]())
> -			return;
> +		if (scan_nodes[i]() < 0)
> +			continue;
> +
> +		for (j = 0; j < nr_cpu_ids; j++) {
> +			int nid = early_cpu_to_node(j);
> +
> +			if (nid == NUMA_NO_NODE)
> +				continue;
> +			if (!node_online(nid))
> +				numa_clear_node(j);
> +		}
> +		numa_init_array();
> +		return;
>  	}
>  	BUG();
>  }
> diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
> index 755d157..4a2c33b 100644
> --- a/arch/x86/mm/srat_64.c
> +++ b/arch/x86/mm/srat_64.c
> @@ -44,7 +44,6 @@ static __init void bad_srat(void)
>  		numa_nodes[i].start = numa_nodes[i].end = 0;
>  		nodes_add[i].start = nodes_add[i].end = 0;
>  	}
> -	remove_all_active_ranges();
>  }
>  
>  static __init inline int srat_disabled(void)
> @@ -259,35 +258,6 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
>  		update_nodes_add(node, start, end);
>  }
>  
> -/* Sanity check to catch more bad SRATs (they are amazingly common).
> -   Make sure the PXMs cover all memory. */
> -static int __init nodes_cover_memory(const struct bootnode *nodes)
> -{
> -	int i;
> -	unsigned long pxmram, e820ram;
> -
> -	pxmram = 0;
> -	for_each_node_mask(i, mem_nodes_parsed) {
> -		unsigned long s = nodes[i].start >> PAGE_SHIFT;
> -		unsigned long e = nodes[i].end >> PAGE_SHIFT;
> -		pxmram += e - s;
> -		pxmram -= __absent_pages_in_range(i, s, e);
> -		if ((long)pxmram < 0)
> -			pxmram = 0;
> -	}
> -
> -	e820ram = max_pfn - (memblock_x86_hole_size(0, max_pfn<<PAGE_SHIFT)>>PAGE_SHIFT);
> -	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
> -	if ((long)(e820ram - pxmram) >= (1<<(20 - PAGE_SHIFT))) {
> -		printk(KERN_ERR
> -	"SRAT: PXMs only cover %luMB of your %luMB e820 RAM. Not used.\n",
> -			(pxmram << PAGE_SHIFT) >> 20,
> -			(e820ram << PAGE_SHIFT) >> 20);
> -		return 0;
> -	}
> -	return 1;
> -}
> -
>  void __init acpi_numa_arch_fixup(void) {}
>  
>  int __init x86_acpi_numa_init(void)
> @@ -303,37 +273,8 @@ int __init x86_acpi_numa_init(void)
>  /* Use the information discovered above to actually set up the nodes. */
>  int __init acpi_scan_nodes(void)
>  {
> -	int i;
> -
>  	if (acpi_numa <= 0)
>  		return -1;
> -
> -	/* for out of order entries in SRAT */
> -	sort_node_map();
> -	if (!nodes_cover_memory(numa_nodes)) {
> -		bad_srat();
> -		return -1;
> -	}
> -
> -	/* Finally register nodes */
> -	for_each_node_mask(i, node_possible_map)
> -		setup_node_bootmem(i, numa_nodes[i].start, numa_nodes[i].end);
> -	/* Try again in case setup_node_bootmem missed one due
> -	   to missing bootmem */
> -	for_each_node_mask(i, node_possible_map)
> -		if (!node_online(i))
> -			setup_node_bootmem(i, numa_nodes[i].start,
> -					   numa_nodes[i].end);
> -
> -	for (i = 0; i < nr_cpu_ids; i++) {
> -		int node = early_cpu_to_node(i);
> -
> -		if (node == NUMA_NO_NODE)
> -			continue;
> -		if (!node_online(node))
> -			numa_clear_node(i);
> -	}
> -	numa_init_array();
>  	return 0;
>  }
>  



* Re: [PATCH 06/26] x86-64, NUMA: Move *_numa_init() invocations into initmem_init()
  2011-02-12 17:10 ` [PATCH 06/26] x86-64, NUMA: Move *_numa_init() invocations into initmem_init() Tejun Heo
@ 2011-02-14  6:10   ` Ankita Garg
  2011-02-14 11:09     ` Tejun Heo
  2011-02-14 13:51   ` [PATCH UPDATED 06/26] x86, " Tejun Heo
  1 sibling, 1 reply; 77+ messages in thread
From: Ankita Garg @ 2011-02-14  6:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa

Hi,

On Sat, Feb 12, 2011 at 06:10:43PM +0100, Tejun Heo wrote:
> There's no reason for these to live in setup_arch().  Move them inside
> initmem_init().
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Yinghai Lu <yinghai@kernel.org>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Shaohui Zheng <shaohui.zheng@intel.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: H. Peter Anvin <hpa@linux.intel.com>
> ---
>  arch/x86/include/asm/page_types.h |    2 +-
>  arch/x86/kernel/setup.c           |   16 +---------------
>  arch/x86/mm/init_64.c             |    2 +-
>  arch/x86/mm/numa_64.c             |   16 +++++++++++++++-
>  4 files changed, 18 insertions(+), 18 deletions(-)
>

This will break 32-bit builds.
 
-- 
Regards,
Ankita Garg (ankita@in.ibm.com)
Linux Technology Center
IBM India Systems & Technology Labs,
Bangalore, India


* Re: [PATCH 06/26] x86-64, NUMA: Move *_numa_init() invocations into initmem_init()
  2011-02-14  6:10   ` Ankita Garg
@ 2011-02-14 11:09     ` Tejun Heo
  0 siblings, 0 replies; 77+ messages in thread
From: Tejun Heo @ 2011-02-14 11:09 UTC (permalink / raw)
  To: Ankita Garg
  Cc: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa

On Mon, Feb 14, 2011 at 11:40:47AM +0530, Ankita Garg wrote:
> Hi,
> 
> On Sat, Feb 12, 2011 at 06:10:43PM +0100, Tejun Heo wrote:
> > There's no reason for these to live in setup_arch().  Move them inside
> > initmem_init().
> > 
> > Signed-off-by: Tejun Heo <tj@kernel.org>
> > Cc: Yinghai Lu <yinghai@kernel.org>
> > Cc: Brian Gerst <brgerst@gmail.com>
> > Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> > Cc: Shaohui Zheng <shaohui.zheng@intel.com>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > Cc: H. Peter Anvin <hpa@linux.intel.com>
> > ---
> >  arch/x86/include/asm/page_types.h |    2 +-
> >  arch/x86/kernel/setup.c           |   16 +---------------
> >  arch/x86/mm/init_64.c             |    2 +-
> >  arch/x86/mm/numa_64.c             |   16 +++++++++++++++-
> >  4 files changed, 18 insertions(+), 18 deletions(-)
> >
> 
> This will break 32bits.

Yeap, along with the previous patch.  Will update 32-bit accordingly.

Thanks.

-- 
tejun


* Re: [PATCH 02/26] x86-64, NUMA: Simplify hotplug node handling in acpi_numa_memory_affinity_init()
  2011-02-12 18:13           ` Yinghai Lu
@ 2011-02-14 11:25             ` Tejun Heo
  2011-02-14 16:12               ` Yinghai Lu
  0 siblings, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-14 11:25 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

Hello, Yinghai.

On Sat, Feb 12, 2011 at 10:13:51AM -0800, Yinghai Lu wrote:
> > Eh?  The oldnode thing will restore the node to initial state thus
> > fulfilling the node empty condition.  Am I missing something?
> > 
> 
> yes. nd get restored, but it keep node_parsed set for that kind of node.

So, this is the code snippet.  Both @nd->start and end are zero and
nodes_parsed for @node is clear.

	nd = &nodes[node];
	oldnode = *nd;

@oldnode->start, end == 0.

	if (!node_test_and_set(node, nodes_parsed)) {
		nd->start = start;
		nd->end = end;
This path is taken and @nd->start and end are set.
	} else {
		if (start < nd->start)
			nd->start = start;
		if (nd->end < end)
			nd->end = end;
	}

	printk(KERN_INFO "SRAT: Node %u PXM %u %lx-%lx\n", node, pxm,
	       start, end);

	if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
		update_nodes_add(node, start, end);
		/* restore nodes[node] */
		*nd = oldnode;
@nd->start and end are restored to zero.
		if ((nd->start | nd->end) == 0)
			node_clear(node, nodes_parsed);
and @nodes_parsed is cleared.
	}

So, what the hell am I missing?

-- 
tejun


* Re: [PATCH 04/26] x86-64, NUMA: Unify {acpi|amd}_{numa_init|scan_nodes}() arguments and return values
  2011-02-12 18:39   ` Yinghai Lu
@ 2011-02-14 11:29     ` Tejun Heo
  2011-02-14 16:14       ` Yinghai Lu
  0 siblings, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-14 11:29 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On Sat, Feb 12, 2011 at 10:39:03AM -0800, Yinghai Lu wrote:
> > @@ -297,7 +297,10 @@ int __init acpi_numa_init(void)
> >  	acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
> >  
> >  	acpi_numa_arch_fixup();
> > -	return ret;
> > +
> > +	if (cnt <= 0)
> > +		return cnt ?: -ENOENT;
> > +	return 0;
> >  }
> >  
> >  int acpi_get_pxm(acpi_handle h)
> 
> 
> it will break AMD system that does not have SRAT.
> 
> your change will treat NO_SRAT as SRAT is there.

Can you please elaborate a bit?  Yinghai, there's brevity and there's
being cryptic.  I appreciate your reviews but don't want to spend time
trying to decipher what you mean.  If it doesn't hurt your fingers too
much, please put a bit more effort into explaining.

Thank you.

-- 
tejun


* Re: [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration
  2011-02-13  0:45   ` Yinghai Lu
@ 2011-02-14 11:32     ` Tejun Heo
  2011-02-14 16:08       ` Yinghai Lu
  0 siblings, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-14 11:32 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

Hello,

On Sat, Feb 12, 2011 at 04:45:27PM -0800, Yinghai Lu wrote:
> please don't put setup_node_bootmem calling into numa_register_memblks()
> that is not related.
> 
> put the calling in initmem_init() directly is more reasonable.

No, I don't think so.  If you don't like the function name, let's
change the name.  I think it's better to put all registrations there.
Later in the series but function is changed to deal with struct
numa_meminfo anyway so maybe it's better to rename it to
numa_register_meminfo().

Thanks.

-- 
tejun


* [PATCH UPDATED 03/26] x86, NUMA: Drop @start/last_pfn from initmem_init()
  2011-02-12 17:10 ` [PATCH 03/26] x86-64, NUMA: Drop @start/last_pfn from initmem_init() Tejun Heo
  2011-02-12 17:58   ` Yinghai Lu
@ 2011-02-14 13:50   ` Tejun Heo
  2011-02-14 14:20     ` Ingo Molnar
  1 sibling, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-14 13:50 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa

initmem_init() extensively accesses and modifies global data
structures and the parameters aren't even followed depending on which
path is being used.  Drop @start/last_pfn and let it deal with
@max_pfn directly.  This is in preparation for further NUMA init
cleanups.

* The x86-32 initmem_init()s weren't updated, breaking 32-bit builds.  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
---
Updated to convert the 32-bit initmem_init()s, which didn't use the two
parameters at all.  The git tree has been updated accordingly.

Thanks.

 arch/x86/include/asm/page_types.h |    3 +--
 arch/x86/kernel/setup.c           |    2 +-
 arch/x86/mm/init_32.c             |    3 +--
 arch/x86/mm/init_64.c             |    5 ++---
 arch/x86/mm/numa_32.c             |    3 +--
 arch/x86/mm/numa_64.c             |   21 ++++++++-------------
 6 files changed, 14 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index 1df6621..95892a1 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -48,8 +48,7 @@ extern unsigned long max_pfn_mapped;
 extern unsigned long init_memory_mapping(unsigned long start,
 					 unsigned long end);
 
-extern void initmem_init(unsigned long start_pfn, unsigned long end_pfn,
-				int acpi, int k8);
+extern void initmem_init(int acpi, int k8);
 extern void free_initmem(void);
 
 #endif	/* !__ASSEMBLY__ */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1202341..c50ba3d 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -996,7 +996,7 @@ void __init setup_arch(char **cmdline_p)
 		amd = !amd_numa_init(0, max_pfn);
 #endif
 
-	initmem_init(0, max_pfn, acpi, amd);
+	initmem_init(acpi, amd);
 	memblock_find_dma_reserve();
 	dma32_reserve_bootmem();
 
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index c821074..16adb66 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -644,8 +644,7 @@ void __init find_low_pfn_range(void)
 }
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES
-void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
-				int acpi, int k8)
+void __init initmem_init(int acpi, int k8)
 {
 #ifdef CONFIG_HIGHMEM
 	highstart_pfn = highend_pfn = max_pfn;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 71a5929..26e4e73 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -612,10 +612,9 @@ kernel_physical_mapping_init(unsigned long start,
 }
 
 #ifndef CONFIG_NUMA
-void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
-				int acpi, int k8)
+void __init initmem_init(int acpi, int k8)
 {
-	memblock_x86_register_active_regions(0, start_pfn, end_pfn);
+	memblock_x86_register_active_regions(0, 0, max_pfn);
 }
 #endif
 
diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index 505bb04..3249b37 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -352,8 +352,7 @@ static void init_remap_allocator(int nid)
 		(ulong) node_remap_end_vaddr[nid]);
 }
 
-void __init initmem_init(unsigned long start_pfn, unsigned long end_pfn,
-				int acpi, int k8)
+void __init initmem_init(int acpi, int k8)
 {
 	int nid;
 	long kva_target_pfn;
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index ea5dd48..f534feb 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -578,8 +578,7 @@ static int __init numa_emulation(unsigned long start_pfn,
 }
 #endif /* CONFIG_NUMA_EMU */
 
-void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
-				int acpi, int amd)
+void __init initmem_init(int acpi, int amd)
 {
 	int i;
 
@@ -587,19 +586,16 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
 	nodes_clear(node_online_map);
 
 #ifdef CONFIG_NUMA_EMU
-	setup_physnodes(start_pfn << PAGE_SHIFT, last_pfn << PAGE_SHIFT,
-			acpi, amd);
-	if (cmdline && !numa_emulation(start_pfn, last_pfn, acpi, amd))
+	setup_physnodes(0, max_pfn << PAGE_SHIFT, acpi, amd);
+	if (cmdline && !numa_emulation(0, max_pfn, acpi, amd))
 		return;
-	setup_physnodes(start_pfn << PAGE_SHIFT, last_pfn << PAGE_SHIFT,
-			acpi, amd);
+	setup_physnodes(0, max_pfn << PAGE_SHIFT, acpi, amd);
 	nodes_clear(node_possible_map);
 	nodes_clear(node_online_map);
 #endif
 
 #ifdef CONFIG_ACPI_NUMA
-	if (!numa_off && acpi && !acpi_scan_nodes(start_pfn << PAGE_SHIFT,
-						  last_pfn << PAGE_SHIFT))
+	if (!numa_off && acpi && !acpi_scan_nodes(0, max_pfn << PAGE_SHIFT))
 		return;
 	nodes_clear(node_possible_map);
 	nodes_clear(node_online_map);
@@ -615,8 +611,7 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
 	       numa_off ? "NUMA turned off" : "No NUMA configuration found");
 
 	printk(KERN_INFO "Faking a node at %016lx-%016lx\n",
-	       start_pfn << PAGE_SHIFT,
-	       last_pfn << PAGE_SHIFT);
+	       0LU, max_pfn << PAGE_SHIFT);
 	/* setup dummy node covering all memory */
 	memnode_shift = 63;
 	memnodemap = memnode.embedded_map;
@@ -625,8 +620,8 @@ void __init initmem_init(unsigned long start_pfn, unsigned long last_pfn,
 	node_set(0, node_possible_map);
 	for (i = 0; i < MAX_LOCAL_APIC; i++)
 		set_apicid_to_node(i, NUMA_NO_NODE);
-	memblock_x86_register_active_regions(0, start_pfn, last_pfn);
-	setup_node_bootmem(0, start_pfn << PAGE_SHIFT, last_pfn << PAGE_SHIFT);
+	memblock_x86_register_active_regions(0, 0, max_pfn);
+	setup_node_bootmem(0, 0, max_pfn << PAGE_SHIFT);
 	numa_init_array();
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH UPDATED 06/26] x86, NUMA: Move *_numa_init() invocations into initmem_init()
  2011-02-12 17:10 ` [PATCH 06/26] x86-64, NUMA: Move *_numa_init() invocations into initmem_init() Tejun Heo
  2011-02-14  6:10   ` Ankita Garg
@ 2011-02-14 13:51   ` Tejun Heo
  2011-02-14 14:21     ` Ingo Molnar
  1 sibling, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-14 13:51 UTC (permalink / raw)
  To: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, mingo, hpa

There's no reason for these to live in setup_arch().  Move them inside
initmem_init().

* x86-32 initmem_init() wasn't updated, breaking 32-bit builds.  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Ankita Garg <ankita@in.ibm.com>
---
Updated to convert the 32-bit initmem_init() too.  Neither parameter was
being used in the 32-bit versions.  git tree updated accordingly.

Thanks.

 arch/x86/include/asm/page_types.h |    2 +-
 arch/x86/kernel/setup.c           |   16 +---------------
 arch/x86/mm/init_32.c             |    2 +-
 arch/x86/mm/init_64.c             |    2 +-
 arch/x86/mm/numa_32.c             |    2 +-
 arch/x86/mm/numa_64.c             |   16 +++++++++++++++-
 6 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index 95892a1..c157986 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -48,7 +48,7 @@ extern unsigned long max_pfn_mapped;
 extern unsigned long init_memory_mapping(unsigned long start,
 					 unsigned long end);
 
-extern void initmem_init(int acpi, int k8);
+extern void initmem_init(void);
 extern void free_initmem(void);
 
 #endif	/* !__ASSEMBLY__ */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f69d838..9907b45 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -704,8 +704,6 @@ static u64 __init get_max_mapped(void)
 
 void __init setup_arch(char **cmdline_p)
 {
-	int acpi = 0;
-	int amd = 0;
 	unsigned long flags;
 
 #ifdef CONFIG_X86_32
@@ -984,19 +982,7 @@ void __init setup_arch(char **cmdline_p)
 
 	early_acpi_boot_init();
 
-#ifdef CONFIG_ACPI_NUMA
-	/*
-	 * Parse SRAT to discover nodes.
-	 */
-	acpi = !x86_acpi_numa_init();
-#endif
-
-#ifdef CONFIG_AMD_NUMA
-	if (!acpi)
-		amd = !amd_numa_init();
-#endif
-
-	initmem_init(acpi, amd);
+	initmem_init();
 	memblock_find_dma_reserve();
 	dma32_reserve_bootmem();
 
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 16adb66..5d43fa5 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -644,7 +644,7 @@ void __init find_low_pfn_range(void)
 }
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES
-void __init initmem_init(int acpi, int k8)
+void __init initmem_init(void)
 {
 #ifdef CONFIG_HIGHMEM
 	highstart_pfn = highend_pfn = max_pfn;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 26e4e73..2f333d4 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -612,7 +612,7 @@ kernel_physical_mapping_init(unsigned long start,
 }
 
 #ifndef CONFIG_NUMA
-void __init initmem_init(int acpi, int k8)
+void __init initmem_init(void)
 {
 	memblock_x86_register_active_regions(0, 0, max_pfn);
 }
diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index 3249b37..bde3906 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -352,7 +352,7 @@ static void init_remap_allocator(int nid)
 		(ulong) node_remap_end_vaddr[nid]);
 }
 
-void __init initmem_init(int acpi, int k8)
+void __init initmem_init(void)
 {
 	int nid;
 	long kva_target_pfn;
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 85561d1..4105728 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -13,6 +13,7 @@
 #include <linux/module.h>
 #include <linux/nodemask.h>
 #include <linux/sched.h>
+#include <linux/acpi.h>
 
 #include <asm/e820.h>
 #include <asm/proto.h>
@@ -578,10 +579,23 @@ static int __init numa_emulation(unsigned long start_pfn,
 }
 #endif /* CONFIG_NUMA_EMU */
 
-void __init initmem_init(int acpi, int amd)
+void __init initmem_init(void)
 {
+	int acpi = 0, amd = 0;
 	int i;
 
+#ifdef CONFIG_ACPI_NUMA
+	/*
+	 * Parse SRAT to discover nodes.
+	 */
+	acpi = !x86_acpi_numa_init();
+#endif
+
+#ifdef CONFIG_AMD_NUMA
+	if (!acpi)
+		amd = !amd_numa_init();
+#endif
+
 	nodes_clear(node_possible_map);
 	nodes_clear(node_online_map);
 
-- 
1.7.1



* Re: [PATCH UPDATED 03/26] x86, NUMA: Drop @start/last_pfn from initmem_init()
  2011-02-14 13:50   ` [PATCH UPDATED 03/26] x86, NUMA: Drop @start/last_pfn from initmem_init() Tejun Heo
@ 2011-02-14 14:20     ` Ingo Molnar
  2011-02-14 14:58       ` Tejun Heo
  0 siblings, 1 reply; 77+ messages in thread
From: Ingo Molnar @ 2011-02-14 14:20 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, hpa


* Tejun Heo <tj@kernel.org> wrote:

> initmem_init() extensively accesses and modifies global data
> structures and the parameters aren't even followed depending on which
> path is being used.  Drop @start/last_pfn and let it deal with
> @max_pfn directly.  This is in preparation for further NUMA init
> cleanups.
> 
> * x86-32 initmem_init() wasn't updated, breaking 32-bit builds.  Fixed.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Yinghai Lu <yinghai@kernel.org>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Shaohui Zheng <shaohui.zheng@intel.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: H. Peter Anvin <hpa@linux.intel.com>

You forgot to add:

  Reported-by: Yinghai Lu <yinghai@kernel.org>

The kernel development process is review- and testing-limited, and we have a
clear oversupply of development power.  So we want to encourage review and
testing feedback as much as possible; adding all the Reported-by / Tested-by
tags is absolutely vital to being able to do more development.

Thanks,

	Ingo


* Re: [PATCH UPDATED 06/26] x86, NUMA: Move *_numa_init() invocations into initmem_init()
  2011-02-14 13:51   ` [PATCH UPDATED 06/26] x86, " Tejun Heo
@ 2011-02-14 14:21     ` Ingo Molnar
  0 siblings, 0 replies; 77+ messages in thread
From: Ingo Molnar @ 2011-02-14 14:21 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, hpa


* Tejun Heo <tj@kernel.org> wrote:

> There's no reason for these to live in setup_arch().  Move them inside
> initmem_init().
> 
> * x86-32 initmem_init() wasn't updated, breaking 32-bit builds.  Fixed.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Yinghai Lu <yinghai@kernel.org>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Shaohui Zheng <shaohui.zheng@intel.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: H. Peter Anvin <hpa@linux.intel.com>
> Cc: Ankita Garg <ankita@in.ibm.com>
> ---
> Updated to convert the 32-bit initmem_init() too.  Neither parameter was
> being used in the 32-bit versions.  git tree updated accordingly.

This commit is missing a:

  Reported-by: Ankita Garg <ankita@in.ibm.com>

tag.

Thanks,

	Ingo


* Re: [PATCH UPDATED 03/26] x86, NUMA: Drop @start/last_pfn from initmem_init()
  2011-02-14 14:20     ` Ingo Molnar
@ 2011-02-14 14:58       ` Tejun Heo
  2011-02-14 19:03         ` Yinghai Lu
  0 siblings, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-14 14:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, x86, yinghai, brgerst, gorcunov, shaohui.zheng,
	rientjes, hpa

On Mon, Feb 14, 2011 at 03:20:11PM +0100, Ingo Molnar wrote:
> You forgot to add:
> 
>   Reported-by: Yinghai Lu <yinghai@kernel.org>
> 
> The kernel development process is review- and testing-limited, and we have a
> clear oversupply of development power.  So we want to encourage review and
> testing feedback as much as possible; adding all the Reported-by / Tested-by
> tags is absolutely vital to being able to do more development.

git tree updated accordingly, but Reported-by?  I use that to identify
the person who found the root problem the commit is addressing.  Once
the review cycle is complete, Reviewed/Acked-by's will be added, so I'm
not sure adding Reported-by serves any purpose.

Thanks.

-- 
tejun


* Re: [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration
  2011-02-14 11:32     ` Tejun Heo
@ 2011-02-14 16:08       ` Yinghai Lu
  2011-02-14 16:12         ` Tejun Heo
  0 siblings, 1 reply; 77+ messages in thread
From: Yinghai Lu @ 2011-02-14 16:08 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On Mon, Feb 14, 2011 at 3:32 AM, Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On Sat, Feb 12, 2011 at 04:45:27PM -0800, Yinghai Lu wrote:
>> please don't put setup_node_bootmem calling into numa_register_memblks()
>> that is not related.
>>
>> put the calling in initmem_init() directly is more reasonable.
>
> No, I don't think so.  If you don't like the function name, let's
> change the name.  I think it's better to put all registrations there.
> Later in the series the function is changed to deal with struct
> numa_meminfo anyway so maybe it's better to rename it to
> numa_register_meminfo().

No, I don't like having ***_register_*** take care of calling setup_bootmem.

Yinghai


* Re: [PATCH 02/26] x86-64, NUMA: Simplify hotplug node handling in acpi_numa_memory_affinity_init()
  2011-02-14 11:25             ` Tejun Heo
@ 2011-02-14 16:12               ` Yinghai Lu
  0 siblings, 0 replies; 77+ messages in thread
From: Yinghai Lu @ 2011-02-14 16:12 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On Mon, Feb 14, 2011 at 3:25 AM, Tejun Heo <tj@kernel.org> wrote:
> Hello, Yinghai.
>
> On Sat, Feb 12, 2011 at 10:13:51AM -0800, Yinghai Lu wrote:
>> > Eh?  The oldnode thing will restore the node to initial state thus
>> > fulfilling the node empty condition.  Am I missing something?
>> >
>>
>> yes. nd get restored, but it keep node_parsed set for that kind of node.
>
> So, this is the code snippet.  Both @nd->start and end are zero and
> nodes_parsed for @node is clear.
>
>        nd = &nodes[node];
>        oldnode = *nd;
>
> @oldnode->start, end == 0.
>
>        if (!node_test_and_set(node, nodes_parsed)) {
>                nd->start = start;
>                nd->end = end;
> This path is taken and @nd->start and end are set.
>        } else {
>                if (start < nd->start)
>                        nd->start = start;
>                if (nd->end < end)
>                        nd->end = end;
>        }
>
>        printk(KERN_INFO "SRAT: Node %u PXM %u %lx-%lx\n", node, pxm,
>               start, end);
>
>        if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
>                update_nodes_add(node, start, end);
>                /* restore nodes[node] */
>                *nd = oldnode;
> @nd->start and end are restored to zero.
>                if ((nd->start | nd->end) == 0)
>                        node_clear(node, nodes_parsed);
> and @nodes_parsed is cleared.

oh, I missed that the restore happens first...

sorry.

Yinghai


* Re: [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration
  2011-02-14 16:08       ` Yinghai Lu
@ 2011-02-14 16:12         ` Tejun Heo
  2011-02-14 16:17           ` Yinghai Lu
  0 siblings, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-14 16:12 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On Mon, Feb 14, 2011 at 08:08:08AM -0800, Yinghai Lu wrote:
> > No, I don't think so.  If you don't like the function name, let's
> > change the name.  I think it's better to put all registrations there.
> > Later in the series the function is changed to deal with struct
> > numa_meminfo anyway so maybe it's better to rename it to
> > numa_register_meminfo().
> 
> No, I don't like having ***_register_*** take care of calling setup_bootmem.

Yeah, then, please go ahead and suggest the name you want.  I don't
really care about the name itself, but I don't want to put it directly
in initmem_init() because with double calling and extra loop added
later it gets nested too deep.  For now, let's move on, okay?  We can
argue about this for days but there's no clear technical
[dis]advantage one way or the other, and it falls squarely in the scope of
bikeshedding.

Thanks.

-- 
tejun


* Re: [PATCH 04/26] x86-64, NUMA: Unify {acpi|amd}_{numa_init|scan_nodes}() arguments and return values
  2011-02-14 11:29     ` Tejun Heo
@ 2011-02-14 16:14       ` Yinghai Lu
  2011-02-14 16:18         ` Tejun Heo
  0 siblings, 1 reply; 77+ messages in thread
From: Yinghai Lu @ 2011-02-14 16:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On Mon, Feb 14, 2011 at 3:29 AM, Tejun Heo <tj@kernel.org> wrote:
> On Sat, Feb 12, 2011 at 10:39:03AM -0800, Yinghai Lu wrote:
>> > @@ -297,7 +297,10 @@ int __init acpi_numa_init(void)
>> >     acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
>> >
>> >     acpi_numa_arch_fixup();
>> > -   return ret;
>> > +
>> > +   if (cnt <= 0)
>> > +           return cnt ?: -ENOENT;
>> > +   return 0;
>> >  }
>> >
>> >  int acpi_get_pxm(acpi_handle h)
>>
>>
>> it will break AMD system that does not have SRAT.
>>
>> your change will treat NO_SRAT as SRAT is there.
>
> Can you please elaborate a bit?  Yinghai, there's brevity and there's
> being cryptic.  I appreciate your reviews but don't want to spend time
> trying to decipher what you mean.  If it doesn't hurt your fingers too
> much, please put a bit more effort into explaining.

When a system has ACPI support but SRAT is NOT there, the new
acpi_numa_init() will return 0 just as if SRAT were there and correct.
So it will skip AMD node scanning.

Yinghai


* Re: [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration
  2011-02-14 16:12         ` Tejun Heo
@ 2011-02-14 16:17           ` Yinghai Lu
  2011-02-14 16:22             ` Tejun Heo
  0 siblings, 1 reply; 77+ messages in thread
From: Yinghai Lu @ 2011-02-14 16:17 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On Mon, Feb 14, 2011 at 8:12 AM, Tejun Heo <tj@kernel.org> wrote:
> On Mon, Feb 14, 2011 at 08:08:08AM -0800, Yinghai Lu wrote:
>> > No, I don't think so.  If you don't like the function name, let's
>> > change the name.  I think it's better to put all registrations there.
>> > Later in the series the function is changed to deal with struct
>> > numa_meminfo anyway so maybe it's better to rename it to
>> > numa_register_meminfo().
>>
>> No, I don't like having ***_register_*** take care of calling setup_bootmem.
>
> Yeah, then, please go ahead and suggest the name you want.  I don't
> really care about the name itself, but I don't want to put it directly
> in initmem_init() because with double calling and extra loop added
> later it gets nested too deep.  For now, let's move on, okay?  We can
> argue about this for days but there's no clear technical
> [dis]advantage one way or the other, and it falls squarely in the scope of
> bikeshedding.
>
Why not do it that way in the first place?

numa_register_meminfo() should only take care of correctly creating
struct numa_meminfo.

Yinghai


* Re: [PATCH 04/26] x86-64, NUMA: Unify {acpi|amd}_{numa_init|scan_nodes}() arguments and return values
  2011-02-14 16:14       ` Yinghai Lu
@ 2011-02-14 16:18         ` Tejun Heo
  2011-02-14 18:00           ` Yinghai Lu
  0 siblings, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-14 16:18 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On Mon, Feb 14, 2011 at 08:14:36AM -0800, Yinghai Lu wrote:
> When a system has ACPI support but SRAT is NOT there, the new
> acpi_numa_init() will return 0 just as if SRAT were there and correct.
> So it will skip AMD node scanning.

How does it return 0?

...
		cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
					    acpi_parse_memory_affinity,
					    NR_NODE_MEMBLKS);
If there's no srat, cnt is zero.
	}

	/* SLIT: System Locality Information Table */
	acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);

	acpi_numa_arch_fixup();

	if (cnt <= 0)
if cnt is zero, the if is taken
		return cnt ?: -ENOENT;
and as cnt is zero, -ENOENT is returned.
	return 0;
The function returns 0 iff cnt > 0.

-- 
tejun


* Re: [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration
  2011-02-14 16:17           ` Yinghai Lu
@ 2011-02-14 16:22             ` Tejun Heo
  2011-02-14 18:14               ` Yinghai Lu
  0 siblings, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-14 16:22 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On Mon, Feb 14, 2011 at 08:17:46AM -0800, Yinghai Lu wrote:
> Why not do it that way in the first place?
> 
> numa_register_meminfo() should only take care of correctly creating
> struct numa_meminfo.

No, register meminfo doesn't create numa_meminfo.  It sets up system
states according to the configuration information in numa_meminfo, and
I really have no idea what you're arguing for or against.  What's your
point?

-- 
tejun


* Re: [PATCH 04/26] x86-64, NUMA: Unify {acpi|amd}_{numa_init|scan_nodes}() arguments and return values
  2011-02-14 16:18         ` Tejun Heo
@ 2011-02-14 18:00           ` Yinghai Lu
  0 siblings, 0 replies; 77+ messages in thread
From: Yinghai Lu @ 2011-02-14 18:00 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On Mon, Feb 14, 2011 at 8:18 AM, Tejun Heo <tj@kernel.org> wrote:
> On Mon, Feb 14, 2011 at 08:14:36AM -0800, Yinghai Lu wrote:
>> When a system has ACPI support but SRAT is NOT there, the new
>> acpi_numa_init() will return 0 just as if SRAT were there and correct.
>> So it will skip AMD node scanning.
>
> How does it return 0?
>
> ...
>                cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
>                                            acpi_parse_memory_affinity,
>                                            NR_NODE_MEMBLKS);
> If there's no srat, cnt is zero.
>        }
>
>        /* SLIT: System Locality Information Table */
>        acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
>
>        acpi_numa_arch_fixup();
>
>        if (cnt <= 0)
> if cnt is zero, the if is taken
>                return cnt ?: -ENOENT;
> and as cnt is zero, -ENOENT is returned.
>        return 0;
> The function returns 0 iff cnt > 0.

oh, I missed it again.

Yinghai


* Re: [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration
  2011-02-14 16:22             ` Tejun Heo
@ 2011-02-14 18:14               ` Yinghai Lu
  2011-02-14 18:27                 ` Tejun Heo
  0 siblings, 1 reply; 77+ messages in thread
From: Yinghai Lu @ 2011-02-14 18:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On Mon, Feb 14, 2011 at 8:22 AM, Tejun Heo <tj@kernel.org> wrote:
> On Mon, Feb 14, 2011 at 08:17:46AM -0800, Yinghai Lu wrote:
>> Why not do it that way in the first place?
>>
>> numa_register_meminfo() should only take care of correctly creating
>> struct numa_meminfo.
>
> No, register meminfo doesn't create numa_meminfo.  It sets up system
> states according to the configuration information in numa_meminfo, and
> I really have no idea what you're arguing for or against.  What's your
> point?

I just want to separate setup_bootmem (and maybe init_memory_mapping_high...)
out of that __register__ function.

Having that __register__ function do something like memblock_register to
early_node_map[] looks reasonable.

Yinghai


* Re: [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration
  2011-02-14 18:14               ` Yinghai Lu
@ 2011-02-14 18:27                 ` Tejun Heo
  2011-02-14 19:07                   ` Yinghai Lu
  0 siblings, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-14 18:27 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

Hello,

On Mon, Feb 14, 2011 at 10:14:15AM -0800, Yinghai Lu wrote:
> I just want to separate setup_bootmem (and maybe init_memory_mapping_high...)
> out of that __register__ function.
> 
> Having that __register__ function do something like memblock_register to
> early_node_map[] looks reasonable.

Can you please provide some rationales _why_ you think setup_bootmem()
should be moved elsewhere?  Because it is not apparent to me and I
cannot read your mind.  I'm not against it given good enough reasons.
After all, _technically_ it doesn't make one iota of difference, but I
don't want to change it just because you don't like it and I don't
think you want that either, so _please_ give me some explanations
about what you want and _why_.  It's not like we're involved in a
romantic relationship and even when I'm in one I suck at that implied
communication thing.

Thanks.

-- 
tejun


* Re: [PATCH UPDATED 03/26] x86, NUMA: Drop @start/last_pfn from initmem_init()
  2011-02-14 14:58       ` Tejun Heo
@ 2011-02-14 19:03         ` Yinghai Lu
  2011-02-14 19:31           ` Tejun Heo
  0 siblings, 1 reply; 77+ messages in thread
From: Yinghai Lu @ 2011-02-14 19:03 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ingo Molnar, linux-kernel, x86, brgerst, gorcunov, shaohui.zheng,
	rientjes, hpa

On 02/14/2011 06:58 AM, Tejun Heo wrote:
> On Mon, Feb 14, 2011 at 03:20:11PM +0100, Ingo Molnar wrote:
>> You forgot to add:
>>
>>   Reported-by: Yinghai Lu <yinghai@kernel.org>
>>
>> The kernel development process is review- and testing-limited, and we have a
>> clear oversupply of development power.  So we want to encourage review and
>> testing feedback as much as possible; adding all the Reported-by / Tested-by
>> tags is absolutely vital to being able to do more development.
> 
> git tree updated accordingly, but Reported-by?  I use that to identify
> the person who found the root problem the commit is addressing.  Once
> the review cycle is complete, Reviewed/Acked-by's will be added, so I'm
> not sure adding Reported-by serves any purpose.

It could be something in the changelog for -v2:

initmem_init() extensively accesses and modifies global data
structures and the parameters aren't even followed depending on which
path is being used.  Drop @start/last_pfn and let it deal with
@max_pfn directly.  This is in preparation for further NUMA init
cleanups.

-v2: x86-32 initmem_init() wasn't updated, breaking 32-bit builds.  Fixed.
     Found by Yinghai.


* Re: [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration
  2011-02-14 18:27                 ` Tejun Heo
@ 2011-02-14 19:07                   ` Yinghai Lu
  2011-02-14 19:30                     ` Tejun Heo
  0 siblings, 1 reply; 77+ messages in thread
From: Yinghai Lu @ 2011-02-14 19:07 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On 02/14/2011 10:27 AM, Tejun Heo wrote:
> Hello,
> 
> On Mon, Feb 14, 2011 at 10:14:15AM -0800, Yinghai Lu wrote:
>> I just want to separate setup_bootmem (and maybe init_memory_mapping_high...)
>> out of that __register__ function.
>>
>> Having that __register__ function do something like memblock_register to
>> early_node_map[] looks reasonable.
> 
> Can you please provide some rationales _why_ you think setup_bootmem()
> should be moved elsewhere?  Because it is not apparent to me and I
> cannot read your mind.  I'm not against it given good enough reasons.
> After all, _technically_ it doesn't make one iota of difference, but I
> don't want to change it just because you don't like it and I don't
> think you want that either, so _please_ give me some explanations
> about what you want and _why_.  It's not like we're involved in a
> romantic relationship and even when I'm in one I suck at that implied
> communication thing.

Never mind.  I will send out a patch after your patches get merged into tip.

BTW, you may need to rebase yours on top of tip/master.

Thanks

Yinghai


* Re: [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration
  2011-02-14 19:07                   ` Yinghai Lu
@ 2011-02-14 19:30                     ` Tejun Heo
  2011-02-14 19:35                       ` Yinghai Lu
  0 siblings, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-14 19:30 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On Mon, Feb 14, 2011 at 11:07:02AM -0800, Yinghai Lu wrote:
> Never mind.  I will send out a patch after your patches get merged into tip.

Alright, fair enough.

> BTW, you may need to rebase yours on top of tip/master.

Yeah, I saw a new patch going into the numa branch, but shouldn't this
and the next series be based on x86/numa?  That was how it was done with
the previous series.

Thanks.

-- 
tejun


* Re: [PATCH UPDATED 03/26] x86, NUMA: Drop @start/last_pfn from initmem_init()
  2011-02-14 19:03         ` Yinghai Lu
@ 2011-02-14 19:31           ` Tejun Heo
  2011-02-15  2:29             ` Ingo Molnar
  0 siblings, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-14 19:31 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, linux-kernel, x86, brgerst, gorcunov, shaohui.zheng,
	rientjes, hpa

On Mon, Feb 14, 2011 at 11:03:35AM -0800, Yinghai Lu wrote:
> initmem_init() extensively accesses and modifies global data
> structures and the parameters aren't even followed depending on which
> path is being used.  Drop @start/last_pfn and let it deal with
> @max_pfn directly.  This is in preparation for further NUMA init
> cleanups.
> 
> -v2: x86-32 initmem_init() wasn't updated, breaking 32-bit builds.  Fixed.
>      Found by Yinghai.

Alright, updated accordingly in the git tree.

Thanks.

-- 
tejun


* Re: [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration
  2011-02-14 19:30                     ` Tejun Heo
@ 2011-02-14 19:35                       ` Yinghai Lu
  2011-02-15  9:11                         ` Tejun Heo
  0 siblings, 1 reply; 77+ messages in thread
From: Yinghai Lu @ 2011-02-14 19:35 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On 02/14/2011 11:30 AM, Tejun Heo wrote:
> On Mon, Feb 14, 2011 at 11:07:02AM -0800, Yinghai Lu wrote:
>> Never mind.  I will send out a patch after your patches get merged into tip.
> 
> Alright, fair enough.
> 
>> BTW, you may need to rebase yours on top of tip/master.
> 
> Yeah, I saw a new patch going into the numa branch, but shouldn't this
> and the next series be based on x86/numa?  That was how it was done with
> the previous series.

There is a patch about init_memory_mapping_high() in tip/x86/bootmem.
It will put the page tables on local nodes.

Yinghai



* Re: [PATCH 10/26] x86-64, NUMA: Move apicid to numa mapping initialization from amd_scan_nodes() to amd_numa_init()
  2011-02-12 17:10 ` [PATCH 10/26] x86-64, NUMA: Move apicid to numa mapping initialization from amd_scan_nodes() to amd_numa_init() Tejun Heo
@ 2011-02-14 22:59   ` Cyrill Gorcunov
  2011-02-15  9:36     ` Tejun Heo
  0 siblings, 1 reply; 77+ messages in thread
From: Cyrill Gorcunov @ 2011-02-14 22:59 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, x86, yinghai, brgerst, shaohui.zheng, rientjes, mingo, hpa

On 02/12/2011 08:10 PM, Tejun Heo wrote:
> This brings amd initialization behavior closer to that of acpi.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Yinghai Lu <yinghai@kernel.org>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Shaohui Zheng <shaohui.zheng@intel.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: H. Peter Anvin <hpa@linux.intel.com>
> ---
>   arch/x86/mm/amdtopology_64.c |   40 ++++++++++++++++++++++------------------
>   1 files changed, 22 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/mm/amdtopology_64.c b/arch/x86/mm/amdtopology_64.c
> index c5eddfa..4056333 100644
> --- a/arch/x86/mm/amdtopology_64.c
> +++ b/arch/x86/mm/amdtopology_64.c
> @@ -74,8 +74,9 @@ int __init amd_numa_init(void)
>   	unsigned long end = PFN_PHYS(max_pfn);
>   	unsigned numnodes;
>   	unsigned long prevbase;
> -	int i, nb;
> +	int i, j, nb;
>   	u32 nodeid, reg;
> +	unsigned int bits, cores, apicid_base;
>
>   	if (!early_pci_allowed())
>   		return -EINVAL;
> @@ -176,6 +177,26 @@ int __init amd_numa_init(void)
>
>   	if (!nodes_weight(mem_nodes_parsed))
>   		return -ENOENT;
> +
> +	/*
> +	 * We seem to have valid NUMA configuration.  Map apicids to nodes
> +	 * using the coreid bits from early_identify_cpu.
> +	 */
> +	bits = boot_cpu_data.x86_coreid_bits;
> +	cores = 1 << bits;
> +	apicid_base = 0;
> +
> +	/* get the APIC ID of the BSP early for systems with apicid lifting */
> +	early_get_boot_cpu_id();
> +	if (boot_cpu_physical_apicid > 0) {
> +		pr_info("BSP APIC ID: %02x\n", boot_cpu_physical_apicid);
> +		apicid_base = boot_cpu_physical_apicid;
		^^^
> +	}
> +
> +	for_each_node_mask(i, cpu_nodes_parsed)
> +		for (j = apicid_base; j < cores + apicid_base; j++)
> +			set_apicid_to_node((i << bits) + j, i);
> +
>   	return 0;
>   }
>

   Hi Tejun, while you're at it, it seems the apicid_base conditional assignment
is redundant here (boot_cpu_physical_apicid is unsigned int), so we might have
something like

	apicid_start	= boot_cpu_physical_apicid;
	apicid_end	= apicid_start + cores;

	for_each_node_mask(i, cpu_nodes_parsed) {
	for (j = apicid_start; j < apicid_end; j++)
		set_apicid_to_node((i << bits) + j, i);
	}

  But of course this should be considered a follow-up update, just to not
mess things up.  (We probably need to check whether we can ever reach this
point with boot_cpu_physical_apicid = -1U.)  Just a thought.
-- 
     Cyrill


* Re: [PATCH UPDATED 03/26] x86, NUMA: Drop @start/last_pfn from initmem_init()
  2011-02-14 19:31           ` Tejun Heo
@ 2011-02-15  2:29             ` Ingo Molnar
  0 siblings, 0 replies; 77+ messages in thread
From: Ingo Molnar @ 2011-02-15  2:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Yinghai Lu, linux-kernel, x86, brgerst, gorcunov, shaohui.zheng,
	rientjes, hpa



* Tejun Heo <tj@kernel.org> wrote:

> On Mon, Feb 14, 2011 at 11:03:35AM -0800, Yinghai Lu wrote:
> > initmem_init() extensively accesses and modifies global data
> > structures and the parameters aren't even followed depending on which
> > path is being used.  Drop @start/last_pfn and let it deal with
> > @max_pfn directly.  This is in preparation for further NUMA init
> > cleanups.
> > 
> > -v2: x86-32 initmem_init() wasn't updated, breaking 32bit builds.  Fixed.
> >      Found by Yinghai
> 
> Alright, updated accordingly in the git tree.

Yeah, that kind of tag is fine too, or:

 Build-bug-in-v1-found-by: ...

:-)

Thanks,

	Ingo


* Re: [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration
  2011-02-14 19:35                       ` Yinghai Lu
@ 2011-02-15  9:11                         ` Tejun Heo
  2011-02-15  9:43                           ` Ingo Molnar
  0 siblings, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-15  9:11 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: linux-kernel, x86, brgerst, gorcunov, shaohui.zheng, rientjes,
	mingo, hpa

On Mon, Feb 14, 2011 at 11:35:06AM -0800, Yinghai Lu wrote:
> On 02/14/2011 11:30 AM, Tejun Heo wrote:
> > On Mon, Feb 14, 2011 at 11:07:02AM -0800, Yinghai Lu wrote:
> >> Never mind. will send out patch after your patches get merged into tip.
> > 
> > Alright, fair enough.
> > 
> >> BTW, you may need to rebase yours on top of tip/master.
> > 
> > Yeah, I saw a new patch going into the numa branch, but shouldn't this
> > and the next series be based on x86/numa?  That was how it was done with
> > the previous series.
> 
> there is patch about init_memory_mapping_high() in tip/x86/bootmem.
> it will put pgtable on local nodes.

Ingo, hpa, how do you guys want to handle this?  Maybe you can cherry
pick or pull the branch into x86/numa?

Thanks.

-- 
tejun


* Re: [PATCH 10/26] x86-64, NUMA: Move apicid to numa mapping initialization from amd_scan_nodes() to amd_numa_init()
  2011-02-14 22:59   ` Cyrill Gorcunov
@ 2011-02-15  9:36     ` Tejun Heo
  2011-02-15 17:31       ` Cyrill Gorcunov
  0 siblings, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-15  9:36 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-kernel, x86, yinghai, brgerst, shaohui.zheng, rientjes, mingo, hpa

Hello, Cyrill.

On Tue, Feb 15, 2011 at 01:59:26AM +0300, Cyrill Gorcunov wrote:
> >+	/* get the APIC ID of the BSP early for systems with apicid lifting */
> >+	early_get_boot_cpu_id();
> >+	if (boot_cpu_physical_apicid > 0) {
> >+		pr_info("BSP APIC ID: %02x\n", boot_cpu_physical_apicid);
> >+		apicid_base = boot_cpu_physical_apicid;
> 		^^^
> >+	}
> >+
> >+	for_each_node_mask(i, cpu_nodes_parsed)
> >+		for (j = apicid_base; j < cores + apicid_base; j++)
> >+			set_apicid_to_node((i << bits) + j, i);
> >+
> >  	return 0;
> >  }
> >
> 
>   Hi Tejun, while you at it, it seems apicid_base conditional assignment is
> redundant here (boot_cpu_physical_apicid is unsigned int) so we might have
> something like
> 
> 	apicid_start	= boot_cpu_physical_apicid;
> 	apicid_end	= apicid_start + cores;
> 
> 	for_each_node_mask(i, cpu_nodes_parsed) {
> 		for (j = apicid_start; j <  apicid_end; j++)
> 			set_apicid_to_node((i <<  bits) + j, i);
> 	}

Right, I think the intention there was

	if (boot_cpu_physical_apicid == -1U)

because that's the initial value and we don't really want to index the
apicid nid table with -1U.  Care to send a patch?  I'm gonna have to
rebase anyway and can put the patch at the front.

Thanks.

-- 
tejun


* Re: [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration
  2011-02-15  9:11                         ` Tejun Heo
@ 2011-02-15  9:43                           ` Ingo Molnar
  2011-02-15 16:49                             ` Tejun Heo
  0 siblings, 1 reply; 77+ messages in thread
From: Ingo Molnar @ 2011-02-15  9:43 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Yinghai Lu, linux-kernel, x86, brgerst, gorcunov, shaohui.zheng,
	rientjes, hpa


* Tejun Heo <tj@kernel.org> wrote:

> On Mon, Feb 14, 2011 at 11:35:06AM -0800, Yinghai Lu wrote:
> > On 02/14/2011 11:30 AM, Tejun Heo wrote:
> > > On Mon, Feb 14, 2011 at 11:07:02AM -0800, Yinghai Lu wrote:
> > >> Never mind. will send out patch after your patches get merged into tip.
> > > 
> > > Alright, fair enough.
> > > 
> > >> BTW, you may need to rebase yours on top of tip/master.
> > > 
> > > Yeah, I saw a new patch going into the numa branch, but shouldn't this
> > > and the next series be based on x86/numa?  That was how it was done with
> > > the previous series.
> > 
> > there is patch about init_memory_mapping_high() in tip/x86/bootmem.
> > it will put pgtable on local nodes.
> 
> Ingo, hpa, how do you guys want to handle this?  Maybe you can cherry
> pick or pull the branch into x86/numa?

Would be nice to have a version against the latest tip:x86/numa tree, the current 
one conflicts in arch/x86/mm/numa_64.c with your tree. Also, i suspect you want to 
propagate Yinghai's Acked-by's into the commits?

Thanks,

	Ingo


* Re: [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration
  2011-02-15  9:43                           ` Ingo Molnar
@ 2011-02-15 16:49                             ` Tejun Heo
  2011-02-16  8:41                               ` Ingo Molnar
  0 siblings, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-15 16:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Yinghai Lu, linux-kernel, x86, brgerst, gorcunov, shaohui.zheng,
	rientjes, hpa

On Tue, Feb 15, 2011 at 10:43:27AM +0100, Ingo Molnar wrote:
> Would be nice to have a version against the latest tip:x86/numa
> tree, the current one conflicts in arch/x86/mm/numa_64.c with your
> tree. Also, i suspect you want to propagate Yinghai's Acked-by's
> into the commits?

Acked-by added and rebased on top of the current x86/numa.  Only one
patch needed to be updated and the updated version was just posted.
The updated git branches are available at

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git x86_64-numa-unify
 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git x86_64-numa-emu-unify

The latter one is a superset of the former.

If you want the whole series reposted, please let me know.

Thanks.

-- 
tejun


* Re: [PATCH 10/26] x86-64, NUMA: Move apicid to numa mapping initialization from amd_scan_nodes() to amd_numa_init()
  2011-02-15  9:36     ` Tejun Heo
@ 2011-02-15 17:31       ` Cyrill Gorcunov
  2011-02-15 17:54         ` Yinghai Lu
  0 siblings, 1 reply; 77+ messages in thread
From: Cyrill Gorcunov @ 2011-02-15 17:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, x86, yinghai, brgerst, shaohui.zheng, rientjes, mingo, hpa

On 02/15/2011 12:36 PM, Tejun Heo wrote:
...
>>
>>    Hi Tejun, while you at it, it seems apicid_base conditional assignment is
>> redundant here (boot_cpu_physical_apicid is unsigned int) so we might have
>> something like
>>
>> 	apicid_start	= boot_cpu_physical_apicid;
>> 	apicid_end	= apicid_start + cores;
>>
>> 	for_each_node_mask(i, cpu_nodes_parsed) {
>> 		for (j = apicid_start; j < apicid_end; j++)
>> 			set_apicid_to_node((i << bits) + j, i);
>> 	}
>
> Right, I think the intention there was
>
> 	if (boot_cpu_physical_apicid == -1U)
>
> because that's the initial value and we don't really want to index the
> apicid nid table with -1U.  Care to send a patch?  I'm gonna have to
> rebase anyway and can put the patch at the front.
>
> Thanks.
>

  Hi Tejun again :) I've looked some more and, unless I'm missing something
(Yinghai?), the code is broken in another way.  We might have an AMD system
with a corrupted MADT table, so boot_cpu_physical_apicid remains -1U, and
then we had better BUG_ON instead of a possible out-of-range access of the
__apicid_to_node array.  If the MADT is parsed successfully,
boot_cpu_physical_apicid will have a correct value.  So I think we should
add something like the patch below.  It would be better if Yinghai checked
it first, in case I'm missing the point and such a situation is impossible
at all.  (And if I'm right, we also need to check that
set_apicid_to_node(apicid, ) never exceeds MAX_LOCAL_APIC as well.)

Yinghai, am I missing something?

-- 
     Cyrill

---
x86, numa: amd -- Check for screwed MADT table

In case the MADT table is corrupted we might end up
with boot_cpu_physical_apicid = -1U and corebits > 0, and
get an out-of-bounds access of the __apicid_to_node array.
Check that boot_cpu_physical_apicid is not the default value.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
  arch/x86/mm/amdtopology_64.c |    7 ++++++-
  1 file changed, 6 insertions(+), 1 deletion(-)

Index: linux-2.6.git/arch/x86/mm/amdtopology_64.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/mm/amdtopology_64.c
+++ linux-2.6.git/arch/x86/mm/amdtopology_64.c
@@ -271,9 +271,14 @@ int __init amd_scan_nodes(void)
  	bits = boot_cpu_data.x86_coreid_bits;
  	cores = (1<<bits);
  	apicid_base = 0;
+
  	/* get the APIC ID of the BSP early for systems with apicid lifting */
  	early_get_boot_cpu_id();
-	if (boot_cpu_physical_apicid > 0) {
> +	if (boot_cpu_physical_apicid == -1U) {
> +		pr_err("BAD APIC ID: %02x, NUMA node scanning canceled\n",
+			boot_cpu_physical_apicid);
+		return -1;
+	} else if (boot_cpu_physical_apicid > 0) {
  		pr_info("BSP APIC ID: %02x\n", boot_cpu_physical_apicid);
  		apicid_base = boot_cpu_physical_apicid;
  	}


* Re: [PATCH 10/26] x86-64, NUMA: Move apicid to numa mapping initialization from amd_scan_nodes() to amd_numa_init()
  2011-02-15 17:31       ` Cyrill Gorcunov
@ 2011-02-15 17:54         ` Yinghai Lu
  2011-02-15 18:01           ` Cyrill Gorcunov
  0 siblings, 1 reply; 77+ messages in thread
From: Yinghai Lu @ 2011-02-15 17:54 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Tejun Heo, linux-kernel, x86, brgerst, shaohui.zheng, rientjes,
	mingo, hpa

On Tue, Feb 15, 2011 at 9:31 AM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> On 02/15/2011 12:36 PM, Tejun Heo wrote:
> ...
>>>
>>>   Hi Tejun, while you at it, it seems apicid_base conditional assignment
>>> is
>>> redundant here (boot_cpu_physical_apicid is unsigned int) so we might
>>> have
>>> something like
>>>
>>>        apicid_start    = boot_cpu_physical_apicid;
>>>        apicid_end      = apicid_start + cores;
>>>
>>>        for_each_node_mask(i, cpu_nodes_parsed) {
>>>                for (j = apicid_start; j<   apicid_end; j++)
>>>                        set_apicid_to_node((i<<   bits) + j, i);
>>>        }
>>
>> Right, I think the intention there was
>>
>>        if (boot_cpu_physical_apicid == -1U)
>>
>> because that's the initial value and we don't really want to index the
>> apicid nid table with -1U.  Care to send a patch?  I'm gonna have to
>> rebase anyway and can put the patch at the front.
>>
>> Thanks.
>>
>
>  Hi Tejun again :) I've looked some more and if I'm not missing something
> (Yinghai?) the code is broken in another way. We might have AMD system
> with corrupted MADT table so boot_cpu_physical_apicid remains =-1U
> and then we better BUG_ON instead of possible access of out-of-range
> __apicid_to_node array. If MADT is parsed successfully
> boot_cpu_physical_apicid
> will have correct value. So I think we rather should add something like the
> patch below. Again better Yinghai check it first so I would not _miss_ the
> point that such situation is impossible at all. (And if I'm right we need to
> check for set_apicid_to_node(apicid, ) never exceed MAX_LOCAL_APIC as well.
>
> Yinghai am I missing something?
>
> --
>    Cyrill
>
> ---
> x86, numa: amd -- Check for screwed MADT table
>
> In case if MADT table is corrupted we might end up
> with boot_cpu_physical_apicid = -1U, corebits > 0 and
> get out of __apicid_to_node array bound access. Check for
> boot_cpu_physical_apicid being not default value.
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> ---
>  arch/x86/mm/amdtopology_64.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> Index: linux-2.6.git/arch/x86/mm/amdtopology_64.c
> =====================================================================
> --- linux-2.6.git.orig/arch/x86/mm/amdtopology_64.c
> +++ linux-2.6.git/arch/x86/mm/amdtopology_64.c
> @@ -271,9 +271,14 @@ int __init amd_scan_nodes(void)
>        bits = boot_cpu_data.x86_coreid_bits;
>        cores = (1<<bits);
>        apicid_base = 0;
> +
>        /* get the APIC ID of the BSP early for systems with apicid lifting
> */
>        early_get_boot_cpu_id();
> -       if (boot_cpu_physical_apicid > 0) {
> +       if (boot_cpu_physical_apicid == -1U) {
> +               pr_err("BAD APIC ID: %02x, NUMA node scanning canceled\n",
> +                       boot_cpu_physical_apicid);
> +               return -1;
> +       } else if (boot_cpu_physical_apicid > 0) {
>                pr_info("BSP APIC ID: %02x\n", boot_cpu_physical_apicid);
>                apicid_base = boot_cpu_physical_apicid;
>        }

could just change

 -       if (boot_cpu_physical_apicid > 0) {
 +       if (boot_cpu_physical_apicid != -1U) {
                pr_info("BSP APIC ID: %02x\n", boot_cpu_physical_apicid);
                apicid_base = boot_cpu_physical_apicid;
        }

Thanks

Yinghai


* Re: [PATCH 10/26] x86-64, NUMA: Move apicid to numa mapping initialization from amd_scan_nodes() to amd_numa_init()
  2011-02-15 17:54         ` Yinghai Lu
@ 2011-02-15 18:01           ` Cyrill Gorcunov
  2011-02-15 18:27             ` Cyrill Gorcunov
  2011-02-15 19:41             ` Yinghai Lu
  0 siblings, 2 replies; 77+ messages in thread
From: Cyrill Gorcunov @ 2011-02-15 18:01 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Tejun Heo, linux-kernel, x86, brgerst, shaohui.zheng, rientjes,
	mingo, hpa

On 02/15/2011 08:54 PM, Yinghai Lu wrote:
...
>
> could just change
>
>   -       if (boot_cpu_physical_apicid > 0) {
>   +       if (boot_cpu_physical_apicid != -1U) {
>                  pr_info("BSP APIC ID: %02x\n", boot_cpu_physical_apicid);
>                  apicid_base = boot_cpu_physical_apicid;
>          }
>
> Thanks
>
> Yinghai

yup, that is exactly what Tejun suggested in the first place ;) I'll update to
this form and add your Acked-by then, ok?

-- 
     Cyrill


* Re: [PATCH 10/26] x86-64, NUMA: Move apicid to numa mapping initialization from amd_scan_nodes() to amd_numa_init()
  2011-02-15 18:01           ` Cyrill Gorcunov
@ 2011-02-15 18:27             ` Cyrill Gorcunov
  2011-02-15 19:41             ` Yinghai Lu
  1 sibling, 0 replies; 77+ messages in thread
From: Cyrill Gorcunov @ 2011-02-15 18:27 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Tejun Heo, linux-kernel, x86, brgerst, shaohui.zheng, rientjes,
	mingo, hpa

On 02/15/2011 09:01 PM, Cyrill Gorcunov wrote:
> On 02/15/2011 08:54 PM, Yinghai Lu wrote:
> ...
>>
>> could just change
>>
>> - if (boot_cpu_physical_apicid > 0) {
>> + if (boot_cpu_physical_apicid != -1U) {
>> pr_info("BSP APIC ID: %02x\n", boot_cpu_physical_apicid);
>> apicid_base = boot_cpu_physical_apicid;
>> }
>>
>> Thanks
>>
>> Yinghai
>
> yup, that is exactly what Tejun suggested in first place ;) I'll update to
> this form and add your Acked-by then, ok?
>

Tejun, I've updated it on top of your tj-numa/x86_64-numa-unify branch
and added Acks.

-- 
x86, numa: amd -- Check for screwed MADT table

In case the MADT table is corrupted we might end up
with boot_cpu_physical_apicid = -1U and corebits > 0, and
get an out-of-bounds access of the __apicid_to_node array.
Check that boot_cpu_physical_apicid is a sane value.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
---
  arch/x86/mm/amdtopology_64.c |    6 ++++--
  1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6.git/arch/x86/mm/amdtopology_64.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/mm/amdtopology_64.c
+++ linux-2.6.git/arch/x86/mm/amdtopology_64.c
@@ -212,14 +212,16 @@ void __init amd_fake_nodes(const struct
  {
  	unsigned int bits;
  	unsigned int cores;
-	unsigned int apicid_base = 0;
+	unsigned int apicid_base;
  	int i;

  	bits = boot_cpu_data.x86_coreid_bits;
  	cores = 1 << bits;
  	early_get_boot_cpu_id();
-	if (boot_cpu_physical_apicid > 0)
+	if (boot_cpu_physical_apicid != -1U)
  		apicid_base = boot_cpu_physical_apicid;
+	else
+		apicid_base = 0;

  	for (i = 0; i < nr_nodes; i++) {
  		int index;


* Re: [PATCH 10/26] x86-64, NUMA: Move apicid to numa mapping initialization from amd_scan_nodes() to amd_numa_init()
  2011-02-15 18:01           ` Cyrill Gorcunov
  2011-02-15 18:27             ` Cyrill Gorcunov
@ 2011-02-15 19:41             ` Yinghai Lu
  1 sibling, 0 replies; 77+ messages in thread
From: Yinghai Lu @ 2011-02-15 19:41 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Tejun Heo, linux-kernel, x86, brgerst, shaohui.zheng, rientjes,
	mingo, hpa

On 02/15/2011 10:01 AM, Cyrill Gorcunov wrote:
> On 02/15/2011 08:54 PM, Yinghai Lu wrote:
> ...
>>
>> could just change
>>
>>   -       if (boot_cpu_physical_apicid > 0) {
>>   +       if (boot_cpu_physical_apicid != -1U) {
>>                  pr_info("BSP APIC ID: %02x\n",
>> boot_cpu_physical_apicid);
>>                  apicid_base = boot_cpu_physical_apicid;
>>          }
>>
>> Thanks
>>
>> Yinghai
> 
> yup, that is exactly what Tejun suggested in first place ;) I'll update to
> this form and add your Acked-by then, ok?
> 

yes.

Thanks


* Re: [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration
  2011-02-15 16:49                             ` Tejun Heo
@ 2011-02-16  8:41                               ` Ingo Molnar
  2011-02-16  8:48                                 ` Ingo Molnar
  0 siblings, 1 reply; 77+ messages in thread
From: Ingo Molnar @ 2011-02-16  8:41 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Yinghai Lu, linux-kernel, x86, brgerst, gorcunov, shaohui.zheng,
	rientjes, hpa


* Tejun Heo <tj@kernel.org> wrote:

> On Tue, Feb 15, 2011 at 10:43:27AM +0100, Ingo Molnar wrote:
> > Would be nice to have a version against the latest tip:x86/numa
> > tree, the current one conflicts in arch/x86/mm/numa_64.c with your
> > tree. Also, i suspect you want to propagate Yinghai's Acked-by's
> > into the commits?
> 
> Acked-by added and rebased on top of the current x86/numa.  Only one
> patch needed to be updated and the updated version was just posted.
> The updated git branches are available at
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git x86_64-numa-unify
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git x86_64-numa-emu-unify
> 
> The latter one is a superset of the former.
> 
> If you want the whole series reposted, please let me know.

No need if the patches did not change - but a diffstat is generally nice with pull 
requests, so that mismatches between intention on your side and action on my side 
can be detected sooner. Anyway, pulled it - thanks Tejun!

	Ingo


* Re: [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration
  2011-02-16  8:41                               ` Ingo Molnar
@ 2011-02-16  8:48                                 ` Ingo Molnar
  2011-02-16  9:01                                   ` Tejun Heo
  0 siblings, 1 reply; 77+ messages in thread
From: Ingo Molnar @ 2011-02-16  8:48 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Yinghai Lu, linux-kernel, x86, brgerst, gorcunov, shaohui.zheng,
	rientjes, hpa


> > If you want the whole series reposted, please let me know.

Actually, it would be nice if you could merge your tree to the latest x86/mm tree i 
have just pushed out :-/

The problem is that your work is so wide that it conflicts with x86/mm, 
x86/bootmem and x86/amd-nb:

arch/x86/include/asm/amd_nb.h
arch/x86/include/asm/page_types.h
arch/x86/mm/amdtopology_64.c
arch/x86/mm/init_64.c
arch/x86/mm/numa_64.c
arch/x86/mm/srat_64.c

I consolidated all these topics into the generic x86/mm tree which you could use as 
a base.

I did not recognize the level of conflicts until i tried to pull your tree. You can 
generally test such interactions with other bits of the x86 tree by test-merging 
your pending work to tip:master.

Thanks,

	Ingo


* Re: [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration
  2011-02-16  8:48                                 ` Ingo Molnar
@ 2011-02-16  9:01                                   ` Tejun Heo
  2011-02-16  9:31                                     ` Ingo Molnar
  0 siblings, 1 reply; 77+ messages in thread
From: Tejun Heo @ 2011-02-16  9:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Yinghai Lu, linux-kernel, x86, brgerst, gorcunov, shaohui.zheng,
	rientjes, hpa

Hello,

On Wed, Feb 16, 2011 at 09:48:52AM +0100, Ingo Molnar wrote:
> I consolidated all these topics into the generic x86/mm tree which you could use as 
> a base.
> 
> I did not recognize the level of conflicts until i tried to pull your tree. You can 
> generally test such interactions with other bits of the x86 tree by test-merging 
> your pending work to tip:master.

Alright, I'll just rebase it on top of x86/mm and send a proper pull
request.

Thanks.

-- 
tejun


* Re: [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration
  2011-02-16  9:01                                   ` Tejun Heo
@ 2011-02-16  9:31                                     ` Ingo Molnar
  0 siblings, 0 replies; 77+ messages in thread
From: Ingo Molnar @ 2011-02-16  9:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Yinghai Lu, linux-kernel, x86, brgerst, gorcunov, shaohui.zheng,
	rientjes, hpa


* Tejun Heo <tj@kernel.org> wrote:

> Hello,
> 
> On Wed, Feb 16, 2011 at 09:48:52AM +0100, Ingo Molnar wrote:
> > I consolidated all these topics into the generic x86/mm tree which you could use as 
> > a base.
> > 
> > I did not recognize the level of conflicts until i tried to pull your tree. You can 
> > generally test such interactions with other bits of the x86 tree by test-merging 
> > your pending work to tip:master.
> 
> Alright, I'll just rebase it on top of x86/mm and send a proper pull
> request.

The tip of x86/mm is:

  275a88d3cf0e: Merge branch 'x86/amd-nb' into x86/mm

Thanks,

	Ingo


end of thread, other threads:[~2011-02-16  9:31 UTC | newest]

Thread overview: 77+ messages
2011-02-12 17:10 [PATCHSET x86/numa] x86-64, NUMA: bring sanity to NUMA configuration Tejun Heo
2011-02-12 17:10 ` [PATCH 01/26] x86-64, NUMA: Make dummy node initialization path similar to non-dummy ones Tejun Heo
2011-02-12 17:52   ` Yinghai Lu
2011-02-12 17:10 ` [PATCH 02/26] x86-64, NUMA: Simplify hotplug node handling in acpi_numa_memory_affinity_init() Tejun Heo
2011-02-12 17:47   ` Yinghai Lu
2011-02-12 17:56     ` Tejun Heo
2011-02-12 18:04       ` Yinghai Lu
2011-02-12 18:06         ` Tejun Heo
2011-02-12 18:13           ` Yinghai Lu
2011-02-14 11:25             ` Tejun Heo
2011-02-14 16:12               ` Yinghai Lu
2011-02-12 17:10 ` [PATCH 03/26] x86-64, NUMA: Drop @start/last_pfn from initmem_init() Tejun Heo
2011-02-12 17:58   ` Yinghai Lu
2011-02-12 18:03     ` Tejun Heo
2011-02-14 13:50   ` [PATCH UPDATED 03/26] x86, NUMA: Drop @start/last_pfn from initmem_init() Tejun Heo
2011-02-14 14:20     ` Ingo Molnar
2011-02-14 14:58       ` Tejun Heo
2011-02-14 19:03         ` Yinghai Lu
2011-02-14 19:31           ` Tejun Heo
2011-02-15  2:29             ` Ingo Molnar
2011-02-12 17:10 ` [PATCH 04/26] x86-64, NUMA: Unify {acpi|amd}_{numa_init|scan_nodes}() arguments and return values Tejun Heo
2011-02-12 18:39   ` Yinghai Lu
2011-02-14 11:29     ` Tejun Heo
2011-02-14 16:14       ` Yinghai Lu
2011-02-14 16:18         ` Tejun Heo
2011-02-14 18:00           ` Yinghai Lu
2011-02-12 17:10 ` [PATCH 05/26] x86-64, NUMA: Wrap acpi_numa_init() so that failure can be indicated by return value Tejun Heo
2011-02-12 17:10 ` [PATCH 06/26] x86-64, NUMA: Move *_numa_init() invocations into initmem_init() Tejun Heo
2011-02-14  6:10   ` Ankita Garg
2011-02-14 11:09     ` Tejun Heo
2011-02-14 13:51   ` [PATCH UPDATED 06/26] x86, " Tejun Heo
2011-02-14 14:21     ` Ingo Molnar
2011-02-12 17:10 ` [PATCH 07/26] x86-64, NUMA: Restructure initmem_init() Tejun Heo
2011-02-12 17:10 ` [PATCH 08/26] x86-64, NUMA: Use common {cpu|mem}_nodes_parsed Tejun Heo
2011-02-12 17:10 ` [PATCH 09/26] x86-64, NUMA: Remove local variable found from amd_numa_init() Tejun Heo
2011-02-12 17:10 ` [PATCH 10/26] x86-64, NUMA: Move apicid to numa mapping initialization from amd_scan_nodes() to amd_numa_init() Tejun Heo
2011-02-14 22:59   ` Cyrill Gorcunov
2011-02-15  9:36     ` Tejun Heo
2011-02-15 17:31       ` Cyrill Gorcunov
2011-02-15 17:54         ` Yinghai Lu
2011-02-15 18:01           ` Cyrill Gorcunov
2011-02-15 18:27             ` Cyrill Gorcunov
2011-02-15 19:41             ` Yinghai Lu
2011-02-12 17:10 ` [PATCH 11/26] x86-64, NUMA: Use common numa_nodes[] Tejun Heo
2011-02-12 17:10 ` [PATCH 12/26] x86-64, NUMA: Kill {acpi|amd}_get_nodes() Tejun Heo
2011-02-12 17:10 ` [PATCH 13/26] x86-64, NUMA: Factor out memblk handling into numa_{add|register}_memblk() Tejun Heo
2011-02-12 17:10 ` [PATCH 14/26] x86-64, NUMA: Unify use of memblk in all init methods Tejun Heo
2011-02-12 17:10 ` [PATCH 15/26] x86-64, NUMA: Unify the rest of memblk registration Tejun Heo
2011-02-13  0:45   ` Yinghai Lu
2011-02-14 11:32     ` Tejun Heo
2011-02-14 16:08       ` Yinghai Lu
2011-02-14 16:12         ` Tejun Heo
2011-02-14 16:17           ` Yinghai Lu
2011-02-14 16:22             ` Tejun Heo
2011-02-14 18:14               ` Yinghai Lu
2011-02-14 18:27                 ` Tejun Heo
2011-02-14 19:07                   ` Yinghai Lu
2011-02-14 19:30                     ` Tejun Heo
2011-02-14 19:35                       ` Yinghai Lu
2011-02-15  9:11                         ` Tejun Heo
2011-02-15  9:43                           ` Ingo Molnar
2011-02-15 16:49                             ` Tejun Heo
2011-02-16  8:41                               ` Ingo Molnar
2011-02-16  8:48                                 ` Ingo Molnar
2011-02-16  9:01                                   ` Tejun Heo
2011-02-16  9:31                                     ` Ingo Molnar
2011-02-12 17:10 ` [PATCH 16/26] x86-64, NUMA: Kill {acpi|amd|dummy}_scan_nodes() Tejun Heo
2011-02-12 17:10 ` [PATCH 17/26] x86-64, NUMA: Remove %NULL @nodeids handling from compute_hash_shift() Tejun Heo
2011-02-12 17:10 ` [PATCH 18/26] x86-64, NUMA: Introduce struct numa_meminfo Tejun Heo
2011-02-12 17:10 ` [PATCH 19/26] x86-64, NUMA: Separate out numa_cleanup_meminfo() Tejun Heo
2011-02-12 17:10 ` [PATCH 20/26] x86-64, NUMA: make numa_cleanup_meminfo() prettier Tejun Heo
2011-02-12 17:10 ` [PATCH 21/26] x86-64, NUMA: consolidate and improve memblk sanity checks Tejun Heo
2011-02-12 17:10 ` [PATCH 22/26] x86-64, NUMA: Add common find_node_by_addr() Tejun Heo
2011-02-12 17:11 ` [PATCH 23/26] x86-64, NUMA: kill numa_nodes[] Tejun Heo
2011-02-12 17:11 ` [PATCH 24/26] x86-64, NUMA: Rename cpu_nodes_parsed to numa_nodes_parsed Tejun Heo
2011-02-12 17:11 ` [PATCH 25/26] x86-64, NUMA: Kill mem_nodes_parsed Tejun Heo
2011-02-12 17:11 ` [PATCH 26/26] x86-64, NUMA: Implement generic node distance handling Tejun Heo
