linux-kernel.vger.kernel.org archive mirror
* [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization
@ 2011-04-29 15:28 Tejun Heo
  2011-04-29 15:28 ` [PATCH 01/25] x86-64, NUMA: Simplify hotadd memory handling Tejun Heo
                   ` (26 more replies)
  0 siblings, 27 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel

Hello,

This patchset, finally, unifies 32 and 64bit NUMA initialization.  It
gradually moves 64bit stuff to common code and replaces 32bit code
with it.  Once the unification is complete, amdtopology and emulation
are enabled for 32bit too (there's no reason not to).

This patchset contains the following 25 patches.

 0001-x86-64-NUMA-Simplify-hotadd-memory-handling.patch
 0002-x86-64-NUMA-trivial-cleanups-for-setup_node_bootmem.patch
 0003-x86-64-NUMA-simplify-nodedata-allocation.patch
 0004-x86-32-NUMA-Automatically-set-apicid-node-in-setup_l.patch
 0005-x86-NUMA-Unify-32-64bit-numa_cpu_node-implementation.patch
 0006-x86-32-NUMA-Make-apic-x86_32_numa_cpu_node-optional.patch
 0007-x86-32-NUMA-use-sparse_memory_present_with_active_re.patch
 0008-x86-NUMA-trivial-cleanups.patch

 0009-x86-NUMA-rename-srat_64.c-to-srat.c.patch
 0010-x86-NUMA-make-srat.c-32bit-safe.patch
 0011-x86-32-NUMA-Move-get_memcfg_numa-into-numa_32.c.patch
 0012-x86-NUMA-Move-numa_nodes_parsed-to-numa.-hc.patch
 0013-x86-32-NUMA-implement-temporary-NUMA-init-shims.patch
 0014-x86-32-NUMA-Replace-srat_32.c-with-srat.c.patch

 0015-x86-32-NUMA-Update-numaq-to-use-new-NUMA-init-protoc.patch

 0016-x86-NUMA-Move-NUMA-init-logic-from-numa_64.c-to-numa.patch
 0017-x86-NUMA-Enable-build-of-generic-NUMA-init-code-on-3.patch
 0018-x86-NUMA-Remove-long-64bit-assumption-from-numa.c.patch
 0019-x86-32-NUMA-Add-start-and-end-to-init_alloc_remap.patch
 0020-x86-NUMA-Initialize-and-use-remap-allocator-from-set.patch
 0021-x86-NUMA-Make-32bit-use-common-NUMA-init-path.patch
 0022-x86-NUMA-Make-numa_init_array-static.patch

 0023-x86-NUMA-Rename-amdtopology_64.c-to-amdtopology.c.patch
 0024-x86-NUMA-Enable-CONFIG_AMD_NUMA-on-32bit-too.patch
 0025-x86-NUMA-Enable-emulation-on-32bit-too.patch

Patches can be grouped as follows.

0001-0008 are preparation patches.  Please note that 0001 was posted
before.

0009-0014 rename srat_64.c to srat.c and replace srat_32.c with it.

0015 converts NUMAQ to follow the new init protocol.

0016-0022 move 64bit NUMA init code from numa_64.c to numa.c and
replace 32bit init code in numa_32.c with it.

0023-0025 enable amdtopology and emulation for 32bit too.

I've tested ACPI and amdtopology with and without emulation on 32 and
64bit with several different configurations.  It all looks dandy but
wider scope testing is definitely required.  Also, I couldn't test
NUMAQ at all.

Due to dependencies, the base of this patchset is a bit complex.  It
is based on:

 tip:x86/mm	c7a7b814c9dca9ee01b38e63b4a46de87156d3b6
 + tip:x86/numa	993ba1585cbb03fab012e41d1a5d24330a283b31
 + Cherry pick of 765af22da8 (x86-32, NUMA: Fix ACPI NUMA init broken
   by recent x86-64 change)

The tip:x86/numa requirement is because of the "x86-32, NUMA: Clean up
alloc_remap" patchset[2] which got committed there.  I think it would
have been better if they got routed together through x86/mm.  Well,
anyways, we can sort it out.

So, for review, please use the following git branch as base.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git review-unify-numa-base

The whole patchset on top of the above base is available in the
following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git review-unify-numa

diffstat follows.  610 lines removed and 32bit NUMA got much better! :)

 arch/x86/Kconfig                 |    4 
 arch/x86/include/asm/acpi.h      |    2 
 arch/x86/include/asm/amd_nb.h    |    1 
 arch/x86/include/asm/apic.h      |    9 
 arch/x86/include/asm/dma.h       |   12 
 arch/x86/include/asm/mmzone_32.h |   20 -
 arch/x86/include/asm/numa.h      |   32 ++
 arch/x86/include/asm/numa_32.h   |   10 
 arch/x86/include/asm/numa_64.h   |   36 --
 arch/x86/include/asm/numaq.h     |    7 
 arch/x86/include/asm/srat.h      |   39 --
 arch/x86/include/asm/topology.h  |    7 
 arch/x86/kernel/apic/apic.c      |   26 -
 arch/x86/kernel/apic/apic_noop.c |    9 
 arch/x86/kernel/apic/bigsmp_32.c |    1 
 arch/x86/kernel/apic/es7000_32.c |    7 
 arch/x86/kernel/apic/numaq_32.c  |   30 --
 arch/x86/kernel/apic/probe_32.c  |    1 
 arch/x86/kernel/apic/summit_32.c |    1 
 arch/x86/mm/Makefile             |    4 
 arch/x86/mm/amdtopology.c        |  197 ++++++++++++++
 arch/x86/mm/amdtopology_64.c     |  196 --------------
 arch/x86/mm/init_32.c            |    1 
 arch/x86/mm/init_64.c            |    8 
 arch/x86/mm/numa.c               |  545 ++++++++++++++++++++++++++++++++++++++-
 arch/x86/mm/numa_32.c            |  165 -----------
 arch/x86/mm/numa_64.c            |  519 -------------------------------------
 arch/x86/mm/numa_emulation.c     |   16 -
 arch/x86/mm/numa_internal.h      |    8 
 arch/x86/mm/srat.c               |  184 +++++++++++++
 arch/x86/mm/srat_32.c            |  287 --------------------
 arch/x86/mm/srat_64.c            |  260 ------------------
 32 files changed, 1017 insertions(+), 1627 deletions(-)

--
tejun

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 01/25] x86-64, NUMA: Simplify hotadd memory handling
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 02/25] x86-64, NUMA: trivial cleanups for setup_node_bootmem() Tejun Heo
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

The only special handling NUMA needs to do for hotadd memory is
determining the node for a given hotadd address, and nothing about it
is specific to the config method used.

srat_64.c does somewhat elaborate error checking on
ACPI_SRAT_MEM_HOT_PLUGGABLE regions, remembers them, and implements
memory_add_physaddr_to_nid(), which determines the node for a given
hotadd address.

This is almost completely redundant.  All the information is already
available to the generic NUMA code, which already performs all the
sanity checking and merging.  All that's necessary is dropping
__initdata from numa_meminfo and providing a function which uses it to
map an address to a node.

Drop the specific implementation from srat_64.c and add a generic
memory_add_physaddr_to_nid() in numa_64.c, which is enabled if
CONFIG_MEMORY_HOTPLUG is set.  Other than dropping the code, srat_64.c
doesn't need any change, as it already calls numa_add_memblk() for
hot-pluggable regions, which is enough.

While at it, change CONFIG_MEMORY_HOTPLUG_SPARSE in srat_64.c to
CONFIG_MEMORY_HOTPLUG; for NUMA on x86-64, the two are always the
same.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/mm/init_64.c |    8 -----
 arch/x86/mm/numa_64.c |   22 +++++++++++++-
 arch/x86/mm/srat_64.c |   78 +------------------------------------------------
 3 files changed, 22 insertions(+), 86 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 7942335..0404bb3 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -679,14 +679,6 @@ int arch_add_memory(int nid, u64 start, u64 size)
 }
 EXPORT_SYMBOL_GPL(arch_add_memory);
 
-#if !defined(CONFIG_ACPI_NUMA) && defined(CONFIG_NUMA)
-int memory_add_physaddr_to_nid(u64 start)
-{
-	return 0;
-}
-EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
-#endif
-
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 static struct kcore_list kcore_vsyscall;
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 13f5b06..c4eecc2 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -28,7 +28,12 @@ EXPORT_SYMBOL(node_data);
 
 nodemask_t numa_nodes_parsed __initdata;
 
-static struct numa_meminfo numa_meminfo __initdata;
+static struct numa_meminfo numa_meminfo
+#ifndef CONFIG_MEMORY_HOTPLUG
+__initdata
+#endif
+;
+
 static int numa_distance_cnt;
 static u8 *numa_distance;
 
@@ -540,3 +545,18 @@ int __cpuinit numa_cpu_node(int cpu)
 		return __apicid_to_node[apicid];
 	return NUMA_NO_NODE;
 }
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+int memory_add_physaddr_to_nid(u64 start)
+{
+	struct numa_meminfo *mi = &numa_meminfo;
+	int nid = mi->blk[0].nid;
+	int i;
+
+	for (i = 0; i < mi->nr_blks; i++)
+		if (mi->blk[i].start <= start && mi->blk[i].end > start)
+			nid = mi->blk[i].nid;
+	return nid;
+}
+EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
+#endif
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index 8e9d339..9994d2c 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -26,8 +26,6 @@
 
 int acpi_numa __initdata;
 
-static struct bootnode nodes_add[MAX_NUMNODES];
-
 static __init int setup_node(int pxm)
 {
 	return acpi_map_pxm_to_node(pxm);
@@ -37,7 +35,6 @@ static __init void bad_srat(void)
 {
 	printk(KERN_ERR "SRAT: SRAT not used.\n");
 	acpi_numa = -1;
-	memset(nodes_add, 0, sizeof(nodes_add));
 }
 
 static __init inline int srat_disabled(void)
@@ -131,67 +128,11 @@ acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
 	       pxm, apic_id, node);
 }
 
-#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
+#ifdef CONFIG_MEMORY_HOTPLUG
 static inline int save_add_info(void) {return 1;}
 #else
 static inline int save_add_info(void) {return 0;}
 #endif
-/*
- * Update nodes_add[]
- * This code supports one contiguous hot add area per node
- */
-static void __init
-update_nodes_add(int node, unsigned long start, unsigned long end)
-{
-	unsigned long s_pfn = start >> PAGE_SHIFT;
-	unsigned long e_pfn = end >> PAGE_SHIFT;
-	int changed = 0;
-	struct bootnode *nd = &nodes_add[node];
-
-	/* I had some trouble with strange memory hotadd regions breaking
-	   the boot. Be very strict here and reject anything unexpected.
-	   If you want working memory hotadd write correct SRATs.
-
-	   The node size check is a basic sanity check to guard against
-	   mistakes */
-	if ((signed long)(end - start) < NODE_MIN_SIZE) {
-		printk(KERN_ERR "SRAT: Hotplug area too small\n");
-		return;
-	}
-
-	/* This check might be a bit too strict, but I'm keeping it for now. */
-	if (absent_pages_in_range(s_pfn, e_pfn) != e_pfn - s_pfn) {
-		printk(KERN_ERR
-			"SRAT: Hotplug area %lu -> %lu has existing memory\n",
-			s_pfn, e_pfn);
-		return;
-	}
-
-	/* Looks good */
-
-	if (nd->start == nd->end) {
-		nd->start = start;
-		nd->end = end;
-		changed = 1;
-	} else {
-		if (nd->start == end) {
-			nd->start = start;
-			changed = 1;
-		}
-		if (nd->end == start) {
-			nd->end = end;
-			changed = 1;
-		}
-		if (!changed)
-			printk(KERN_ERR "SRAT: Hotplug zone not continuous. Partly ignored\n");
-	}
-
-	if (changed) {
-		node_set(node, numa_nodes_parsed);
-		printk(KERN_INFO "SRAT: hot plug zone found %Lx - %Lx\n",
-				 nd->start, nd->end);
-	}
-}
 
 /* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
 void __init
@@ -228,9 +169,6 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 
 	printk(KERN_INFO "SRAT: Node %u PXM %u %lx-%lx\n", node, pxm,
 	       start, end);
-
-	if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)
-		update_nodes_add(node, start, end);
 }
 
 void __init acpi_numa_arch_fixup(void) {}
@@ -244,17 +182,3 @@ int __init x86_acpi_numa_init(void)
 		return ret;
 	return srat_disabled() ? -EINVAL : 0;
 }
-
-#if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) || defined(CONFIG_ACPI_HOTPLUG_MEMORY)
-int memory_add_physaddr_to_nid(u64 start)
-{
-	int i, ret = 0;
-
-	for_each_node(i)
-		if (nodes_add[i].start <= start && nodes_add[i].end > start)
-			ret = i;
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
-#endif
-- 
1.7.1



* [PATCH 02/25] x86-64, NUMA: trivial cleanups for setup_node_bootmem()
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
  2011-04-29 15:28 ` [PATCH 01/25] x86-64, NUMA: Simplify hotadd memory handling Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 03/25] x86-64, NUMA: simplify nodedata allocation Tejun Heo
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

Make the following trivial changes in preparation for further updates.

* nodeid -> nid, nid -> tnid
* use nd_ prefix for nodedata related variables
* remove start/end_pfn and use start/end directly

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/mm/numa_64.c |   52 +++++++++++++++++++++---------------------------
 1 files changed, 23 insertions(+), 29 deletions(-)

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index c4eecc2..5e0dfc5 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -128,14 +128,11 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
 
 /* Initialize bootmem allocator for a node */
 void __init
-setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
+setup_node_bootmem(int nid, unsigned long start, unsigned long end)
 {
-	unsigned long start_pfn, last_pfn, nodedata_phys;
-	const int pgdat_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
-	int nid;
-
-	if (!end)
-		return;
+	const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
+	unsigned long nd_pa;
+	int tnid;
 
 	/*
 	 * Don't confuse VM with a node that doesn't have the
@@ -146,30 +143,27 @@ setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
 
 	start = roundup(start, ZONE_ALIGN);
 
-	printk(KERN_INFO "Initmem setup node %d %016lx-%016lx\n", nodeid,
-	       start, end);
-
-	start_pfn = start >> PAGE_SHIFT;
-	last_pfn = end >> PAGE_SHIFT;
+	printk(KERN_INFO "Initmem setup node %d %016lx-%016lx\n",
+	       nid, start, end);
 
-	node_data[nodeid] = early_node_mem(nodeid, start, end, pgdat_size,
-					   SMP_CACHE_BYTES);
-	if (node_data[nodeid] == NULL)
+	node_data[nid] = early_node_mem(nid, start, end, nd_size,
+					SMP_CACHE_BYTES);
+	if (node_data[nid] == NULL)
 		return;
-	nodedata_phys = __pa(node_data[nodeid]);
-	memblock_x86_reserve_range(nodedata_phys, nodedata_phys + pgdat_size, "NODE_DATA");
-	printk(KERN_INFO "  NODE_DATA [%016lx - %016lx]\n", nodedata_phys,
-		nodedata_phys + pgdat_size - 1);
-	nid = early_pfn_to_nid(nodedata_phys >> PAGE_SHIFT);
-	if (nid != nodeid)
-		printk(KERN_INFO "    NODE_DATA(%d) on node %d\n", nodeid, nid);
-
-	memset(NODE_DATA(nodeid), 0, sizeof(pg_data_t));
-	NODE_DATA(nodeid)->node_id = nodeid;
-	NODE_DATA(nodeid)->node_start_pfn = start_pfn;
-	NODE_DATA(nodeid)->node_spanned_pages = last_pfn - start_pfn;
-
-	node_set_online(nodeid);
+	nd_pa = __pa(node_data[nid]);
+	memblock_x86_reserve_range(nd_pa, nd_pa + nd_size, "NODE_DATA");
+	printk(KERN_INFO "  NODE_DATA [%016lx - %016lx]\n",
+	       nd_pa, nd_pa + nd_size - 1);
+	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
+	if (tnid != nid)
+		printk(KERN_INFO "    NODE_DATA(%d) on node %d\n", nid, tnid);
+
+	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
+	NODE_DATA(nid)->node_id = nid;
+	NODE_DATA(nid)->node_start_pfn = start >> PAGE_SHIFT;
+	NODE_DATA(nid)->node_spanned_pages = (end - start) >> PAGE_SHIFT;
+
+	node_set_online(nid);
 }
 
 /**
-- 
1.7.1



* [PATCH 03/25] x86-64, NUMA: simplify nodedata allocation
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
  2011-04-29 15:28 ` [PATCH 01/25] x86-64, NUMA: Simplify hotadd memory handling Tejun Heo
  2011-04-29 15:28 ` [PATCH 02/25] x86-64, NUMA: trivial cleanups for setup_node_bootmem() Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 17:23   ` Yinghai Lu
  2011-04-29 15:28 ` [PATCH 04/25] x86-32, NUMA: Automatically set apicid -> node in setup_local_APIC() Tejun Heo
                   ` (23 subsequent siblings)
  26 siblings, 1 reply; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

With top-down memblock allocation, the allocation range limits in
early_node_mem() can be simplified: try node-local first, then any
node, but in any case never allocate below the DMA limit.

Remove early_node_mem() and implement simplified allocation directly
in setup_node_bootmem().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/mm/numa_64.c |   53 +++++++++++++++---------------------------------
 1 files changed, 17 insertions(+), 36 deletions(-)

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 5e0dfc5..59d8a1c 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -37,38 +37,6 @@ __initdata
 static int numa_distance_cnt;
 static u8 *numa_distance;
 
-static void * __init early_node_mem(int nodeid, unsigned long start,
-				    unsigned long end, unsigned long size,
-				    unsigned long align)
-{
-	unsigned long mem;
-
-	/*
-	 * put it on high as possible
-	 * something will go with NODE_DATA
-	 */
-	if (start < (MAX_DMA_PFN<<PAGE_SHIFT))
-		start = MAX_DMA_PFN<<PAGE_SHIFT;
-	if (start < (MAX_DMA32_PFN<<PAGE_SHIFT) &&
-	    end > (MAX_DMA32_PFN<<PAGE_SHIFT))
-		start = MAX_DMA32_PFN<<PAGE_SHIFT;
-	mem = memblock_x86_find_in_range_node(nodeid, start, end, size, align);
-	if (mem != MEMBLOCK_ERROR)
-		return __va(mem);
-
-	/* extend the search scope */
-	end = max_pfn_mapped << PAGE_SHIFT;
-	start = MAX_DMA_PFN << PAGE_SHIFT;
-	mem = memblock_find_in_range(start, end, size, align);
-	if (mem != MEMBLOCK_ERROR)
-		return __va(mem);
-
-	printk(KERN_ERR "Cannot find %lu bytes in node %d\n",
-		       size, nodeid);
-
-	return NULL;
-}
-
 static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
 				     struct numa_meminfo *mi)
 {
@@ -130,6 +98,8 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
 void __init
 setup_node_bootmem(int nid, unsigned long start, unsigned long end)
 {
+	const u64 nd_low = (u64)MAX_DMA_PFN << PAGE_SHIFT;
+	const u64 nd_high = (u64)max_pfn_mapped << PAGE_SHIFT;
 	const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
 	unsigned long nd_pa;
 	int tnid;
@@ -146,18 +116,29 @@ setup_node_bootmem(int nid, unsigned long start, unsigned long end)
 	printk(KERN_INFO "Initmem setup node %d %016lx-%016lx\n",
 	       nid, start, end);
 
-	node_data[nid] = early_node_mem(nid, start, end, nd_size,
-					SMP_CACHE_BYTES);
-	if (node_data[nid] == NULL)
+	/*
+	 * Try to allocate node data on local node and then fall back to
+	 * all nodes.  Never allocate in DMA zone.
+	 */
+	nd_pa = memblock_x86_find_in_range_node(nid, nd_low, nd_high,
+						nd_size, SMP_CACHE_BYTES);
+	if (nd_pa == MEMBLOCK_ERROR)
+		nd_pa = memblock_find_in_range(nd_low, nd_high,
+					       nd_size, SMP_CACHE_BYTES);
+	if (nd_pa == MEMBLOCK_ERROR) {
+		pr_err("Cannot find %lu bytes in node %d\n", nd_size, nid);
 		return;
-	nd_pa = __pa(node_data[nid]);
+	}
 	memblock_x86_reserve_range(nd_pa, nd_pa + nd_size, "NODE_DATA");
+
+	/* report and initialize */
 	printk(KERN_INFO "  NODE_DATA [%016lx - %016lx]\n",
 	       nd_pa, nd_pa + nd_size - 1);
 	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
 	if (tnid != nid)
 		printk(KERN_INFO "    NODE_DATA(%d) on node %d\n", nid, tnid);
 
+	node_data[nid] = __va(nd_pa);
 	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
 	NODE_DATA(nid)->node_id = nid;
 	NODE_DATA(nid)->node_start_pfn = start >> PAGE_SHIFT;
-- 
1.7.1



* [PATCH 04/25] x86-32, NUMA: Automatically set apicid -> node in setup_local_APIC()
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (2 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 03/25] x86-64, NUMA: simplify nodedata allocation Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 05/25] x86, NUMA: Unify 32/64bit numa_cpu_node() implementation Tejun Heo
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

Some x86-32 NUMA implementations (NUMAQ) don't initialize the apicid
-> node mapping using set_apicid_to_node() during NUMA init but
implement a custom apic->x86_32_numa_cpu_node() instead.

This patch automatically initializes the default apicid -> node
mapping table from apic->x86_32_numa_cpu_node() in setup_local_APIC()
so that the mapping table is in sync with the actual mapping.

As the table isn't used by custom implementations, this doesn't make
any difference at this point.  This is in preparation for unifying
numa_cpu_node() between x86-32 and 64.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/kernel/apic/apic.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 2bc503b..a6cd02a 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1237,6 +1237,16 @@ void __cpuinit setup_local_APIC(void)
 	/* always use the value from LDR */
 	early_per_cpu(x86_cpu_to_logical_apicid, cpu) =
 		logical_smp_processor_id();
+
+	/*
+	 * Some NUMA implementations (NUMAQ) don't initialize apicid to
+	 * node mapping during NUMA init.  Now that logical apicid is
+	 * guaranteed to be known, give it another chance.  This is already
+	 * a bit too late - percpu allocation has already happened without
+	 * proper NUMA affinity.
+	 */
+	set_apicid_to_node(early_per_cpu(x86_cpu_to_apicid, cpu),
+			   apic->x86_32_numa_cpu_node(cpu));
 #endif
 
 	/*
-- 
1.7.1



* [PATCH 05/25] x86, NUMA: Unify 32/64bit numa_cpu_node() implementation
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (3 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 04/25] x86-32, NUMA: Automatically set apicid -> node in setup_local_APIC() Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 06/25] x86-32, NUMA: Make apic->x86_32_numa_cpu_node() optional Tejun Heo
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

Currently, the only meaningful user of apic->x86_32_numa_cpu_node() is
NUMAQ, which returns a valid mapping only after the CPU is initialized
during SMP bringup; thus, the previous patch to set apicid -> node in
setup_local_APIC() makes __apicid_to_node[] always contain the correct
mapping whether a custom apic->x86_32_numa_cpu_node() is used or not.

So, there is no reason to keep a separate 32bit implementation.  We
can always consult __apicid_to_node[].  Move the 64bit implementation
from numa_64.c to numa.c and remove the 32bit one from numa_32.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/numa.h    |   10 ++++++++++
 arch/x86/include/asm/numa_32.h |    6 ------
 arch/x86/include/asm/numa_64.h |    3 ---
 arch/x86/mm/numa.c             |    9 +++++++++
 arch/x86/mm/numa_32.c          |    5 -----
 arch/x86/mm/numa_64.c          |    9 ---------
 6 files changed, 19 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
index 3d4dab4..b2bd5e6 100644
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_NUMA_H
 #define _ASM_X86_NUMA_H
 
+#include <linux/nodemask.h>
+
 #include <asm/topology.h>
 #include <asm/apicdef.h>
 
@@ -22,10 +24,18 @@ static inline void set_apicid_to_node(int apicid, s16 node)
 {
 	__apicid_to_node[apicid] = node;
 }
+
+extern int __cpuinit numa_cpu_node(int cpu);
+
 #else	/* CONFIG_NUMA */
 static inline void set_apicid_to_node(int apicid, s16 node)
 {
 }
+
+static inline int numa_cpu_node(int cpu)
+{
+	return NUMA_NO_NODE;
+}
 #endif	/* CONFIG_NUMA */
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/include/asm/numa_32.h b/arch/x86/include/asm/numa_32.h
index c6beed1..242522f 100644
--- a/arch/x86/include/asm/numa_32.h
+++ b/arch/x86/include/asm/numa_32.h
@@ -5,12 +5,6 @@ extern int numa_off;
 
 extern int pxm_to_nid(int pxm);
 
-#ifdef CONFIG_NUMA
-extern int __cpuinit numa_cpu_node(int cpu);
-#else	/* CONFIG_NUMA */
-static inline int numa_cpu_node(int cpu)		{ return NUMA_NO_NODE; }
-#endif	/* CONFIG_NUMA */
-
 #ifdef CONFIG_HIGHMEM
 extern void set_highmem_pages_init(void);
 #else
diff --git a/arch/x86/include/asm/numa_64.h b/arch/x86/include/asm/numa_64.h
index 344eb17..12461eb 100644
--- a/arch/x86/include/asm/numa_64.h
+++ b/arch/x86/include/asm/numa_64.h
@@ -26,7 +26,6 @@ extern void setup_node_bootmem(int nodeid, unsigned long start,
 
 extern nodemask_t numa_nodes_parsed __initdata;
 
-extern int __cpuinit numa_cpu_node(int cpu);
 extern int __init numa_add_memblk(int nodeid, u64 start, u64 end);
 extern void __init numa_set_distance(int from, int to, int distance);
 
@@ -35,8 +34,6 @@ extern void __init numa_set_distance(int from, int to, int distance);
 #define FAKE_NODE_MIN_HASH_MASK	(~(FAKE_NODE_MIN_SIZE - 1UL))
 void numa_emu_cmdline(char *);
 #endif /* CONFIG_NUMA_EMU */
-#else
-static inline int numa_cpu_node(int cpu)		{ return NUMA_NO_NODE; }
 #endif
 
 #endif /* _ASM_X86_NUMA_64_H */
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 9559d36..08530e3 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -32,6 +32,15 @@ s16 __apicid_to_node[MAX_LOCAL_APIC] __cpuinitdata = {
 	[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
 };
 
+int __cpuinit numa_cpu_node(int cpu)
+{
+	int apicid = early_per_cpu(x86_cpu_to_apicid, cpu);
+
+	if (apicid != BAD_APICID)
+		return __apicid_to_node[apicid];
+	return NUMA_NO_NODE;
+}
+
 cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
 EXPORT_SYMBOL(node_to_cpumask_map);
 
diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index c757c0a..e0d9716 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -107,11 +107,6 @@ extern unsigned long highend_pfn, highstart_pfn;
 static void *node_remap_start_vaddr[MAX_NUMNODES];
 void set_pmd_pfn(unsigned long vaddr, unsigned long pfn, pgprot_t flags);
 
-int __cpuinit numa_cpu_node(int cpu)
-{
-	return apic->x86_32_numa_cpu_node(cpu);
-}
-
 /*
  * FLAT - support for basic PC memory model with discontig enabled, essentially
  *        a single node with all available processors in it with a flat
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 59d8a1c..3598fbf 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -512,15 +512,6 @@ unsigned long __init numa_free_all_bootmem(void)
 	return pages;
 }
 
-int __cpuinit numa_cpu_node(int cpu)
-{
-	int apicid = early_per_cpu(x86_cpu_to_apicid, cpu);
-
-	if (apicid != BAD_APICID)
-		return __apicid_to_node[apicid];
-	return NUMA_NO_NODE;
-}
-
 #ifdef CONFIG_MEMORY_HOTPLUG
 int memory_add_physaddr_to_nid(u64 start)
 {
-- 
1.7.1



* [PATCH 06/25] x86-32, NUMA: Make apic->x86_32_numa_cpu_node() optional
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (4 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 05/25] x86, NUMA: Unify 32/64bit numa_cpu_node() implementation Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 07/25] x86-32, NUMA: use sparse_memory_present_with_active_regions() Tejun Heo
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

NUMAQ is the only meaningful user of this callback and
setup_local_APIC() its only call site.  Stop torturing everyone else
by making the callback optional and removing all the boilerplate
implementations and assignments.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/apic.h      |    9 ++++++---
 arch/x86/kernel/apic/apic.c      |   20 +++-----------------
 arch/x86/kernel/apic/apic_noop.c |    9 ---------
 arch/x86/kernel/apic/bigsmp_32.c |    1 -
 arch/x86/kernel/apic/es7000_32.c |    7 -------
 arch/x86/kernel/apic/probe_32.c  |    1 -
 arch/x86/kernel/apic/summit_32.c |    1 -
 7 files changed, 9 insertions(+), 39 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 2b7d573..a0c46f0 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -363,7 +363,12 @@ struct apic {
 	 */
 	int (*x86_32_early_logical_apicid)(int cpu);
 
-	/* determine CPU -> NUMA node mapping */
+	/*
+	 * Optional method called from setup_local_APIC() after logical
+	 * apicid is guaranteed to be known to initialize apicid -> node
+	 * mapping if NUMA initialization hasn't done so already.  Don't
+	 * add new users.
+	 */
 	int (*x86_32_numa_cpu_node)(int cpu);
 #endif
 };
@@ -537,8 +542,6 @@ static inline int default_phys_pkg_id(int cpuid_apic, int index_msb)
 	return cpuid_apic >> index_msb;
 }
 
-extern int default_x86_32_numa_cpu_node(int cpu);
-
 #endif
 
 static inline unsigned int
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index a6cd02a..0c67b4f 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1245,8 +1245,9 @@ void __cpuinit setup_local_APIC(void)
 	 * a bit too late - percpu allocation has already happened without
 	 * proper NUMA affinity.
 	 */
-	set_apicid_to_node(early_per_cpu(x86_cpu_to_apicid, cpu),
-			   apic->x86_32_numa_cpu_node(cpu));
+	if (apic->x86_32_numa_cpu_node)
+		set_apicid_to_node(early_per_cpu(x86_cpu_to_apicid, cpu),
+				   apic->x86_32_numa_cpu_node(cpu));
 #endif
 
 	/*
@@ -2013,21 +2014,6 @@ void default_init_apic_ldr(void)
 	apic_write(APIC_LDR, val);
 }
 
-#ifdef CONFIG_X86_32
-int default_x86_32_numa_cpu_node(int cpu)
-{
-#ifdef CONFIG_NUMA
-	int apicid = early_per_cpu(x86_cpu_to_apicid, cpu);
-
-	if (apicid != BAD_APICID)
-		return __apicid_to_node[apicid];
-	return NUMA_NO_NODE;
-#else
-	return 0;
-#endif
-}
-#endif
-
 /*
  * Power management
  */
diff --git a/arch/x86/kernel/apic/apic_noop.c b/arch/x86/kernel/apic/apic_noop.c
index f1baa2d..775b82b 100644
--- a/arch/x86/kernel/apic/apic_noop.c
+++ b/arch/x86/kernel/apic/apic_noop.c
@@ -119,14 +119,6 @@ static void noop_apic_write(u32 reg, u32 v)
 	WARN_ON_ONCE(cpu_has_apic && !disable_apic);
 }
 
-#ifdef CONFIG_X86_32
-static int noop_x86_32_numa_cpu_node(int cpu)
-{
-	/* we're always on node 0 */
-	return 0;
-}
-#endif
-
 struct apic apic_noop = {
 	.name				= "noop",
 	.probe				= noop_probe,
@@ -195,6 +187,5 @@ struct apic apic_noop = {
 
 #ifdef CONFIG_X86_32
 	.x86_32_early_logical_apicid	= noop_x86_32_early_logical_apicid,
-	.x86_32_numa_cpu_node		= noop_x86_32_numa_cpu_node,
 #endif
 };
diff --git a/arch/x86/kernel/apic/bigsmp_32.c b/arch/x86/kernel/apic/bigsmp_32.c
index 541a2e4..d84ac5a 100644
--- a/arch/x86/kernel/apic/bigsmp_32.c
+++ b/arch/x86/kernel/apic/bigsmp_32.c
@@ -253,5 +253,4 @@ struct apic apic_bigsmp = {
 	.safe_wait_icr_idle		= native_safe_apic_wait_icr_idle,
 
 	.x86_32_early_logical_apicid	= bigsmp_early_logical_apicid,
-	.x86_32_numa_cpu_node		= default_x86_32_numa_cpu_node,
 };
diff --git a/arch/x86/kernel/apic/es7000_32.c b/arch/x86/kernel/apic/es7000_32.c
index 3e9de48..70533de 100644
--- a/arch/x86/kernel/apic/es7000_32.c
+++ b/arch/x86/kernel/apic/es7000_32.c
@@ -510,11 +510,6 @@ static void es7000_setup_apic_routing(void)
 		nr_ioapics, cpumask_bits(es7000_target_cpus())[0]);
 }
 
-static int es7000_numa_cpu_node(int cpu)
-{
-	return 0;
-}
-
 static int es7000_cpu_present_to_apicid(int mps_cpu)
 {
 	if (!mps_cpu)
@@ -688,7 +683,6 @@ struct apic __refdata apic_es7000_cluster = {
 	.safe_wait_icr_idle		= native_safe_apic_wait_icr_idle,
 
 	.x86_32_early_logical_apicid	= es7000_early_logical_apicid,
-	.x86_32_numa_cpu_node		= es7000_numa_cpu_node,
 };
 
 struct apic __refdata apic_es7000 = {
@@ -752,5 +746,4 @@ struct apic __refdata apic_es7000 = {
 	.safe_wait_icr_idle		= native_safe_apic_wait_icr_idle,
 
 	.x86_32_early_logical_apicid	= es7000_early_logical_apicid,
-	.x86_32_numa_cpu_node		= es7000_numa_cpu_node,
 };
diff --git a/arch/x86/kernel/apic/probe_32.c b/arch/x86/kernel/apic/probe_32.c
index fc84c7b..6541e47 100644
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -172,7 +172,6 @@ struct apic apic_default = {
 	.safe_wait_icr_idle		= native_safe_apic_wait_icr_idle,
 
 	.x86_32_early_logical_apicid	= default_x86_32_early_logical_apicid,
-	.x86_32_numa_cpu_node		= default_x86_32_numa_cpu_node,
 };
 
 extern struct apic apic_numaq;
diff --git a/arch/x86/kernel/apic/summit_32.c b/arch/x86/kernel/apic/summit_32.c
index e4b8059..35bcd7d 100644
--- a/arch/x86/kernel/apic/summit_32.c
+++ b/arch/x86/kernel/apic/summit_32.c
@@ -551,5 +551,4 @@ struct apic apic_summit = {
 	.safe_wait_icr_idle		= native_safe_apic_wait_icr_idle,
 
 	.x86_32_early_logical_apicid	= summit_early_logical_apicid,
-	.x86_32_numa_cpu_node		= default_x86_32_numa_cpu_node,
 };
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 07/25] x86-32, NUMA: use sparse_memory_present_with_active_regions()
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (5 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 06/25] x86-32, NUMA: Make apic->x86_32_numa_cpu_node() optional Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 08/25] x86, NUMA: trivial cleanups Tejun Heo
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

Instead of calling memory_present() for each region from NUMA init,
call sparse_memory_present_with_active_regions() from paging_init()
similarly to x86-64.

For flat and numaq, this results in exactly the same memory_present()
calls.  For srat, if there are multiple memory chunks for a node,
memory_present() will now be called separately for each chunk instead
of once for the whole encompassing range.  This is harmless and
actually the better behavior.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/kernel/apic/numaq_32.c |    2 --
 arch/x86/mm/init_32.c           |    1 +
 arch/x86/mm/numa_32.c           |    1 -
 arch/x86/mm/srat_32.c           |    8 +-------
 4 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/apic/numaq_32.c b/arch/x86/kernel/apic/numaq_32.c
index 0aced70..41b8b29 100644
--- a/arch/x86/kernel/apic/numaq_32.c
+++ b/arch/x86/kernel/apic/numaq_32.c
@@ -91,8 +91,6 @@ static inline void numaq_register_node(int node, struct sys_cfg_data *scd)
 
 	memblock_x86_register_active_regions(node, node_start_pfn[node],
 						node_end_pfn[node]);
-
-	memory_present(node, node_start_pfn[node], node_end_pfn[node]);
 }
 
 /*
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 80088f9..2cde0a3 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -716,6 +716,7 @@ void __init paging_init(void)
 	 * NOTE: at this point the bootmem allocator is fully available.
 	 */
 	olpc_dt_build_devicetree();
+	sparse_memory_present_with_active_regions(MAX_NUMNODES);
 	sparse_init();
 	zone_sizes_init();
 }
diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index e0d9716..f847fa1 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -119,7 +119,6 @@ int __init get_memcfg_numa_flat(void)
 	node_start_pfn[0] = 0;
 	node_end_pfn[0] = max_pfn;
 	memblock_x86_register_active_regions(0, 0, max_pfn);
-	memory_present(0, 0, max_pfn);
 
         /* Indicate there is one node available. */
 	nodes_clear(node_online_map);
diff --git a/arch/x86/mm/srat_32.c b/arch/x86/mm/srat_32.c
index ae20046..6b9bfd7 100644
--- a/arch/x86/mm/srat_32.c
+++ b/arch/x86/mm/srat_32.c
@@ -209,7 +209,7 @@ static __init int node_read_chunk(int nid, struct node_memory_chunk_s *memory_ch
 
 int __init get_memcfg_from_srat(void)
 {
-	int i, j, nid;
+	int i, j;
 
 	if (srat_disabled())
 		goto out_fail;
@@ -273,12 +273,6 @@ int __init get_memcfg_from_srat(void)
 	/* for out of order entries in SRAT */
 	sort_node_map();
 
-	for_each_online_node(nid) {
-		unsigned long start = node_start_pfn[nid];
-		unsigned long end = min(node_end_pfn[nid], max_pfn);
-
-		memory_present(nid, start, end);
-	}
 	return 1;
 out_fail:
 	printk(KERN_DEBUG "failed to get NUMA memory information from SRAT"
-- 
1.7.1



* [PATCH 08/25] x86, NUMA: trivial cleanups
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (6 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 07/25] x86-32, NUMA: use sparse_memory_present_with_active_regions() Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 17:25   ` Yinghai Lu
  2011-04-29 15:28 ` [PATCH 09/25] x86, NUMA: rename srat_64.c to srat.c Tejun Heo
                   ` (18 subsequent siblings)
  26 siblings, 1 reply; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

* Kill no longer used struct bootnode.

* Kill dangling declaration of pxm_to_nid() in numa_32.h.

* Make setup_node_bootmem() static.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/acpi.h    |    2 --
 arch/x86/include/asm/amd_nb.h  |    1 -
 arch/x86/include/asm/numa_32.h |    2 --
 arch/x86/include/asm/numa_64.h |    7 -------
 arch/x86/mm/numa_64.c          |    2 +-
 5 files changed, 1 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index 12e0e7d..416d865 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -183,8 +183,6 @@ static inline void disable_acpi(void) { }
 
 #define ARCH_HAS_POWER_INIT	1
 
-struct bootnode;
-
 #ifdef CONFIG_ACPI_NUMA
 extern int acpi_numa;
 extern int x86_acpi_numa_init(void);
diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index 3316822..67f87f2 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -11,7 +11,6 @@ struct amd_nb_bus_dev_range {
 
 extern const struct pci_device_id amd_nb_misc_ids[];
 extern const struct amd_nb_bus_dev_range amd_nb_bus_dev_ranges[];
-struct bootnode;
 
 extern bool early_is_amd_nb(u32 value);
 extern int amd_cache_northbridges(void);
diff --git a/arch/x86/include/asm/numa_32.h b/arch/x86/include/asm/numa_32.h
index 242522f..7e54b64 100644
--- a/arch/x86/include/asm/numa_32.h
+++ b/arch/x86/include/asm/numa_32.h
@@ -3,8 +3,6 @@
 
 extern int numa_off;
 
-extern int pxm_to_nid(int pxm);
-
 #ifdef CONFIG_HIGHMEM
 extern void set_highmem_pages_init(void);
 #else
diff --git a/arch/x86/include/asm/numa_64.h b/arch/x86/include/asm/numa_64.h
index 12461eb..794da6d 100644
--- a/arch/x86/include/asm/numa_64.h
+++ b/arch/x86/include/asm/numa_64.h
@@ -3,18 +3,11 @@
 
 #include <linux/nodemask.h>
 
-struct bootnode {
-	u64 start;
-	u64 end;
-};
-
 #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
 
 extern int numa_off;
 
 extern unsigned long numa_free_all_bootmem(void);
-extern void setup_node_bootmem(int nodeid, unsigned long start,
-			       unsigned long end);
 
 #ifdef CONFIG_NUMA
 /*
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 3598fbf..813a161 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -95,7 +95,7 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
 }
 
 /* Initialize bootmem allocator for a node */
-void __init
+static void __init
 setup_node_bootmem(int nid, unsigned long start, unsigned long end)
 {
 	const u64 nd_low = (u64)MAX_DMA_PFN << PAGE_SHIFT;
-- 
1.7.1



* [PATCH 09/25] x86, NUMA: rename srat_64.c to srat.c
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (7 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 08/25] x86, NUMA: trivial cleanups Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 10/25] x86, NUMA: make srat.c 32bit safe Tejun Heo
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

Rename srat_64.c to srat.c in preparation for unifying the NUMA init
paths between 32 and 64bit.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/mm/Makefile  |    5 +-
 arch/x86/mm/srat.c    |  184 +++++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/mm/srat_64.c |  184 -------------------------------------------------
 3 files changed, 188 insertions(+), 185 deletions(-)
 create mode 100644 arch/x86/mm/srat.c
 delete mode 100644 arch/x86/mm/srat_64.c

diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 3e608ed..37e7043 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -24,7 +24,10 @@ obj-$(CONFIG_MMIOTRACE_TEST)	+= testmmiotrace.o
 
 obj-$(CONFIG_NUMA)		+= numa.o numa_$(BITS).o
 obj-$(CONFIG_AMD_NUMA)		+= amdtopology_64.o
-obj-$(CONFIG_ACPI_NUMA)		+= srat_$(BITS).o
+ifeq ($(CONFIG_ACPI_NUMA),y)
+obj-$(CONFIG_X86_64)		+= srat.o
+obj-$(CONFIG_X86_32)		+= srat_32.o
+endif
 obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
 
 obj-$(CONFIG_HAVE_MEMBLOCK)		+= memblock.o
diff --git a/arch/x86/mm/srat.c b/arch/x86/mm/srat.c
new file mode 100644
index 0000000..9994d2c
--- /dev/null
+++ b/arch/x86/mm/srat.c
@@ -0,0 +1,184 @@
+/*
+ * ACPI 3.0 based NUMA setup
+ * Copyright 2004 Andi Kleen, SuSE Labs.
+ *
+ * Reads the ACPI SRAT table to figure out what memory belongs to which CPUs.
+ *
+ * Called from acpi_numa_init while reading the SRAT and SLIT tables.
+ * Assumes all memory regions belonging to a single proximity domain
+ * are in one chunk. Holes between them will be included in the node.
+ */
+
+#include <linux/kernel.h>
+#include <linux/acpi.h>
+#include <linux/mmzone.h>
+#include <linux/bitmap.h>
+#include <linux/module.h>
+#include <linux/topology.h>
+#include <linux/bootmem.h>
+#include <linux/memblock.h>
+#include <linux/mm.h>
+#include <asm/proto.h>
+#include <asm/numa.h>
+#include <asm/e820.h>
+#include <asm/apic.h>
+#include <asm/uv/uv.h>
+
+int acpi_numa __initdata;
+
+static __init int setup_node(int pxm)
+{
+	return acpi_map_pxm_to_node(pxm);
+}
+
+static __init void bad_srat(void)
+{
+	printk(KERN_ERR "SRAT: SRAT not used.\n");
+	acpi_numa = -1;
+}
+
+static __init inline int srat_disabled(void)
+{
+	return acpi_numa < 0;
+}
+
+/* Callback for SLIT parsing */
+void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
+{
+	int i, j;
+
+	for (i = 0; i < slit->locality_count; i++)
+		for (j = 0; j < slit->locality_count; j++)
+			numa_set_distance(pxm_to_node(i), pxm_to_node(j),
+				slit->entry[slit->locality_count * i + j]);
+}
+
+/* Callback for Proximity Domain -> x2APIC mapping */
+void __init
+acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)
+{
+	int pxm, node;
+	int apic_id;
+
+	if (srat_disabled())
+		return;
+	if (pa->header.length < sizeof(struct acpi_srat_x2apic_cpu_affinity)) {
+		bad_srat();
+		return;
+	}
+	if ((pa->flags & ACPI_SRAT_CPU_ENABLED) == 0)
+		return;
+	pxm = pa->proximity_domain;
+	node = setup_node(pxm);
+	if (node < 0) {
+		printk(KERN_ERR "SRAT: Too many proximity domains %x\n", pxm);
+		bad_srat();
+		return;
+	}
+
+	apic_id = pa->apic_id;
+	if (apic_id >= MAX_LOCAL_APIC) {
+		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node);
+		return;
+	}
+	set_apicid_to_node(apic_id, node);
+	node_set(node, numa_nodes_parsed);
+	acpi_numa = 1;
+	printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u\n",
+	       pxm, apic_id, node);
+}
+
+/* Callback for Proximity Domain -> LAPIC mapping */
+void __init
+acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
+{
+	int pxm, node;
+	int apic_id;
+
+	if (srat_disabled())
+		return;
+	if (pa->header.length != sizeof(struct acpi_srat_cpu_affinity)) {
+		bad_srat();
+		return;
+	}
+	if ((pa->flags & ACPI_SRAT_CPU_ENABLED) == 0)
+		return;
+	pxm = pa->proximity_domain_lo;
+	node = setup_node(pxm);
+	if (node < 0) {
+		printk(KERN_ERR "SRAT: Too many proximity domains %x\n", pxm);
+		bad_srat();
+		return;
+	}
+
+	if (get_uv_system_type() >= UV_X2APIC)
+		apic_id = (pa->apic_id << 8) | pa->local_sapic_eid;
+	else
+		apic_id = pa->apic_id;
+
+	if (apic_id >= MAX_LOCAL_APIC) {
+		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node);
+		return;
+	}
+
+	set_apicid_to_node(apic_id, node);
+	node_set(node, numa_nodes_parsed);
+	acpi_numa = 1;
+	printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u\n",
+	       pxm, apic_id, node);
+}
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static inline int save_add_info(void) {return 1;}
+#else
+static inline int save_add_info(void) {return 0;}
+#endif
+
+/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
+void __init
+acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
+{
+	unsigned long start, end;
+	int node, pxm;
+
+	if (srat_disabled())
+		return;
+	if (ma->header.length != sizeof(struct acpi_srat_mem_affinity)) {
+		bad_srat();
+		return;
+	}
+	if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
+		return;
+
+	if ((ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && !save_add_info())
+		return;
+	start = ma->base_address;
+	end = start + ma->length;
+	pxm = ma->proximity_domain;
+	node = setup_node(pxm);
+	if (node < 0) {
+		printk(KERN_ERR "SRAT: Too many proximity domains.\n");
+		bad_srat();
+		return;
+	}
+
+	if (numa_add_memblk(node, start, end) < 0) {
+		bad_srat();
+		return;
+	}
+
+	printk(KERN_INFO "SRAT: Node %u PXM %u %lx-%lx\n", node, pxm,
+	       start, end);
+}
+
+void __init acpi_numa_arch_fixup(void) {}
+
+int __init x86_acpi_numa_init(void)
+{
+	int ret;
+
+	ret = acpi_numa_init();
+	if (ret < 0)
+		return ret;
+	return srat_disabled() ? -EINVAL : 0;
+}
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
deleted file mode 100644
index 9994d2c..0000000
--- a/arch/x86/mm/srat_64.c
+++ /dev/null
@@ -1,184 +0,0 @@
-/*
- * ACPI 3.0 based NUMA setup
- * Copyright 2004 Andi Kleen, SuSE Labs.
- *
- * Reads the ACPI SRAT table to figure out what memory belongs to which CPUs.
- *
- * Called from acpi_numa_init while reading the SRAT and SLIT tables.
- * Assumes all memory regions belonging to a single proximity domain
- * are in one chunk. Holes between them will be included in the node.
- */
-
-#include <linux/kernel.h>
-#include <linux/acpi.h>
-#include <linux/mmzone.h>
-#include <linux/bitmap.h>
-#include <linux/module.h>
-#include <linux/topology.h>
-#include <linux/bootmem.h>
-#include <linux/memblock.h>
-#include <linux/mm.h>
-#include <asm/proto.h>
-#include <asm/numa.h>
-#include <asm/e820.h>
-#include <asm/apic.h>
-#include <asm/uv/uv.h>
-
-int acpi_numa __initdata;
-
-static __init int setup_node(int pxm)
-{
-	return acpi_map_pxm_to_node(pxm);
-}
-
-static __init void bad_srat(void)
-{
-	printk(KERN_ERR "SRAT: SRAT not used.\n");
-	acpi_numa = -1;
-}
-
-static __init inline int srat_disabled(void)
-{
-	return acpi_numa < 0;
-}
-
-/* Callback for SLIT parsing */
-void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
-{
-	int i, j;
-
-	for (i = 0; i < slit->locality_count; i++)
-		for (j = 0; j < slit->locality_count; j++)
-			numa_set_distance(pxm_to_node(i), pxm_to_node(j),
-				slit->entry[slit->locality_count * i + j]);
-}
-
-/* Callback for Proximity Domain -> x2APIC mapping */
-void __init
-acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)
-{
-	int pxm, node;
-	int apic_id;
-
-	if (srat_disabled())
-		return;
-	if (pa->header.length < sizeof(struct acpi_srat_x2apic_cpu_affinity)) {
-		bad_srat();
-		return;
-	}
-	if ((pa->flags & ACPI_SRAT_CPU_ENABLED) == 0)
-		return;
-	pxm = pa->proximity_domain;
-	node = setup_node(pxm);
-	if (node < 0) {
-		printk(KERN_ERR "SRAT: Too many proximity domains %x\n", pxm);
-		bad_srat();
-		return;
-	}
-
-	apic_id = pa->apic_id;
-	if (apic_id >= MAX_LOCAL_APIC) {
-		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node);
-		return;
-	}
-	set_apicid_to_node(apic_id, node);
-	node_set(node, numa_nodes_parsed);
-	acpi_numa = 1;
-	printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u\n",
-	       pxm, apic_id, node);
-}
-
-/* Callback for Proximity Domain -> LAPIC mapping */
-void __init
-acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
-{
-	int pxm, node;
-	int apic_id;
-
-	if (srat_disabled())
-		return;
-	if (pa->header.length != sizeof(struct acpi_srat_cpu_affinity)) {
-		bad_srat();
-		return;
-	}
-	if ((pa->flags & ACPI_SRAT_CPU_ENABLED) == 0)
-		return;
-	pxm = pa->proximity_domain_lo;
-	node = setup_node(pxm);
-	if (node < 0) {
-		printk(KERN_ERR "SRAT: Too many proximity domains %x\n", pxm);
-		bad_srat();
-		return;
-	}
-
-	if (get_uv_system_type() >= UV_X2APIC)
-		apic_id = (pa->apic_id << 8) | pa->local_sapic_eid;
-	else
-		apic_id = pa->apic_id;
-
-	if (apic_id >= MAX_LOCAL_APIC) {
-		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node);
-		return;
-	}
-
-	set_apicid_to_node(apic_id, node);
-	node_set(node, numa_nodes_parsed);
-	acpi_numa = 1;
-	printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u\n",
-	       pxm, apic_id, node);
-}
-
-#ifdef CONFIG_MEMORY_HOTPLUG
-static inline int save_add_info(void) {return 1;}
-#else
-static inline int save_add_info(void) {return 0;}
-#endif
-
-/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
-void __init
-acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
-{
-	unsigned long start, end;
-	int node, pxm;
-
-	if (srat_disabled())
-		return;
-	if (ma->header.length != sizeof(struct acpi_srat_mem_affinity)) {
-		bad_srat();
-		return;
-	}
-	if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
-		return;
-
-	if ((ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && !save_add_info())
-		return;
-	start = ma->base_address;
-	end = start + ma->length;
-	pxm = ma->proximity_domain;
-	node = setup_node(pxm);
-	if (node < 0) {
-		printk(KERN_ERR "SRAT: Too many proximity domains.\n");
-		bad_srat();
-		return;
-	}
-
-	if (numa_add_memblk(node, start, end) < 0) {
-		bad_srat();
-		return;
-	}
-
-	printk(KERN_INFO "SRAT: Node %u PXM %u %lx-%lx\n", node, pxm,
-	       start, end);
-}
-
-void __init acpi_numa_arch_fixup(void) {}
-
-int __init x86_acpi_numa_init(void)
-{
-	int ret;
-
-	ret = acpi_numa_init();
-	if (ret < 0)
-		return ret;
-	return srat_disabled() ? -EINVAL : 0;
-}
-- 
1.7.1



* [PATCH 10/25] x86, NUMA: make srat.c 32bit safe
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (8 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 09/25] x86, NUMA: rename srat_64.c to srat.c Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 11/25] x86-32, NUMA: Move get_memcfg_numa() into numa_32.c Tejun Heo
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

Make srat.c 32bit safe by removing the assumption that unsigned long
is 64bit.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/mm/srat.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/srat.c b/arch/x86/mm/srat.c
index 9994d2c..81dbfde 100644
--- a/arch/x86/mm/srat.c
+++ b/arch/x86/mm/srat.c
@@ -138,7 +138,7 @@ static inline int save_add_info(void) {return 0;}
 void __init
 acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 {
-	unsigned long start, end;
+	u64 start, end;
 	int node, pxm;
 
 	if (srat_disabled())
@@ -167,7 +167,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 		return;
 	}
 
-	printk(KERN_INFO "SRAT: Node %u PXM %u %lx-%lx\n", node, pxm,
+	printk(KERN_INFO "SRAT: Node %u PXM %u %Lx-%Lx\n", node, pxm,
 	       start, end);
 }
 
-- 
1.7.1



* [PATCH 11/25] x86-32, NUMA: Move get_memcfg_numa() into numa_32.c
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (9 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 10/25] x86, NUMA: make srat.c 32bit safe Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 12/25] x86, NUMA: Move numa_nodes_parsed to numa.[hc] Tejun Heo
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

There's no reason for get_memcfg_numa() to be implemented inline in
mmzone_32.h.  Move it to numa_32.c and also make
get_memcfg_numa_flat() static.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/mmzone_32.h |   18 ------------------
 arch/x86/mm/numa_32.c            |   11 ++++++++++-
 2 files changed, 10 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/mmzone_32.h b/arch/x86/include/asm/mmzone_32.h
index 91df7c5..73e5745 100644
--- a/arch/x86/include/asm/mmzone_32.h
+++ b/arch/x86/include/asm/mmzone_32.h
@@ -16,28 +16,10 @@ extern struct pglist_data *node_data[];
 /* summit or generic arch */
 #include <asm/srat.h>
 
-extern int get_memcfg_numa_flat(void);
-/*
- * This allows any one NUMA architecture to be compiled
- * for, and still fall back to the flat function if it
- * fails.
- */
-static inline void get_memcfg_numa(void)
-{
-
-	if (get_memcfg_numaq())
-		return;
-	if (get_memcfg_from_srat())
-		return;
-	get_memcfg_numa_flat();
-}
-
 extern void resume_map_numa_kva(pgd_t *pgd);
 
 #else /* !CONFIG_NUMA */
 
-#define get_memcfg_numa get_memcfg_numa_flat
-
 static inline void resume_map_numa_kva(pgd_t *pgd) {}
 
 #endif /* CONFIG_NUMA */
diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index f847fa1..abf1247 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -112,7 +112,7 @@ void set_pmd_pfn(unsigned long vaddr, unsigned long pfn, pgprot_t flags);
  *        a single node with all available processors in it with a flat
  *        memory map.
  */
-int __init get_memcfg_numa_flat(void)
+static int __init get_memcfg_numa_flat(void)
 {
 	printk(KERN_DEBUG "NUMA - single node, flat memory mode\n");
 
@@ -332,6 +332,15 @@ static __init void init_alloc_remap(int nid)
 	       nid, node_pa, node_pa + size, remap_va, remap_va + size);
 }
 
+static void get_memcfg_numa(void)
+{
+	if (get_memcfg_numaq())
+		return;
+	if (get_memcfg_from_srat())
+		return;
+	get_memcfg_numa_flat();
+}
+
 void __init initmem_init(void)
 {
 	int nid;
-- 
1.7.1



* [PATCH 12/25] x86, NUMA: Move numa_nodes_parsed to numa.[hc]
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (10 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 11/25] x86-32, NUMA: Move get_memcfg_numa() into numa_32.c Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 13/25] x86-32, NUMA: implement temporary NUMA init shims Tejun Heo
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

Move numa_nodes_parsed from numa_64.[hc] to numa.[hc] to prepare for
NUMA init path unification.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/numa.h    |    1 +
 arch/x86/include/asm/numa_64.h |    4 ----
 arch/x86/mm/numa.c             |    1 +
 arch/x86/mm/numa_64.c          |    2 --
 4 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
index b2bd5e6..6e1c74e 100644
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -19,6 +19,7 @@
  * numa_cpu_node().
  */
 extern s16 __apicid_to_node[MAX_LOCAL_APIC];
+extern nodemask_t numa_nodes_parsed __initdata;
 
 static inline void set_apicid_to_node(int apicid, s16 node)
 {
diff --git a/arch/x86/include/asm/numa_64.h b/arch/x86/include/asm/numa_64.h
index 794da6d..e84113f 100644
--- a/arch/x86/include/asm/numa_64.h
+++ b/arch/x86/include/asm/numa_64.h
@@ -1,8 +1,6 @@
 #ifndef _ASM_X86_NUMA_64_H
 #define _ASM_X86_NUMA_64_H
 
-#include <linux/nodemask.h>
-
 #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
 
 extern int numa_off;
@@ -17,8 +15,6 @@ extern unsigned long numa_free_all_bootmem(void);
  */
 #define NODE_MIN_SIZE (4*1024*1024)
 
-extern nodemask_t numa_nodes_parsed __initdata;
-
 extern int __init numa_add_memblk(int nodeid, u64 start, u64 end);
 extern void __init numa_set_distance(int from, int to, int distance);
 
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 08530e3..74d8f90 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -6,6 +6,7 @@
 #include <asm/acpi.h>
 
 int __initdata numa_off;
+nodemask_t numa_nodes_parsed __initdata;
 
 static __init int numa_setup(char *opt)
 {
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 813a161..8d84f9c 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -26,8 +26,6 @@
 struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
 EXPORT_SYMBOL(node_data);
 
-nodemask_t numa_nodes_parsed __initdata;
-
 static struct numa_meminfo numa_meminfo
 #ifndef CONFIG_MEMORY_HOTPLUG
 __initdata
-- 
1.7.1



* [PATCH 13/25] x86-32, NUMA: implement temporary NUMA init shims
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (11 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 12/25] x86, NUMA: Move numa_nodes_parsed to numa.[hc] Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 14/25] x86-32, NUMA: Replace srat_32.c with srat.c Tejun Heo
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

To help the transition to common NUMA init, implement temporary 32bit
shims for numa_add_memblk() and numa_set_distance().
numa_add_memblk() registers the memblk and adjusts
node_start/end_pfn[]; numa_set_distance() is a noop.

These shims allow the 64bit NUMA init functions to be used on 32bit
and enable a gradual transition to the common NUMA init path.

For a detailed description, please read the descriptions of the
commits which make use of the shim functions.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/numa.h    |    3 +++
 arch/x86/include/asm/numa_64.h |    3 ---
 arch/x86/mm/numa_32.c          |   34 ++++++++++++++++++++++++++++++++++
 3 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
index 6e1c74e..6959c27 100644
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -21,6 +21,9 @@
 extern s16 __apicid_to_node[MAX_LOCAL_APIC];
 extern nodemask_t numa_nodes_parsed __initdata;
 
+extern int __init numa_add_memblk(int nodeid, u64 start, u64 end);
+extern void __init numa_set_distance(int from, int to, int distance);
+
 static inline void set_apicid_to_node(int apicid, s16 node)
 {
 	__apicid_to_node[apicid] = node;
diff --git a/arch/x86/include/asm/numa_64.h b/arch/x86/include/asm/numa_64.h
index e84113f..506dd05 100644
--- a/arch/x86/include/asm/numa_64.h
+++ b/arch/x86/include/asm/numa_64.h
@@ -15,9 +15,6 @@ extern unsigned long numa_free_all_bootmem(void);
  */
 #define NODE_MIN_SIZE (4*1024*1024)
 
-extern int __init numa_add_memblk(int nodeid, u64 start, u64 end);
-extern void __init numa_set_distance(int from, int to, int distance);
-
 #ifdef CONFIG_NUMA_EMU
 #define FAKE_NODE_MIN_SIZE	((u64)32 << 20)
 #define FAKE_NODE_MIN_HASH_MASK	(~(FAKE_NODE_MIN_SIZE - 1UL))
diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index abf1247..d0369a5 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -414,3 +414,37 @@ int memory_add_physaddr_to_nid(u64 addr)
 EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
 #endif
 
+/* temporary shim, will go away soon */
+int __init numa_add_memblk(int nid, u64 start, u64 end)
+{
+	unsigned long start_pfn = start >> PAGE_SHIFT;
+	unsigned long end_pfn = end >> PAGE_SHIFT;
+
+	printk(KERN_DEBUG "nid %d start_pfn %08lx end_pfn %08lx\n",
+	       nid, start_pfn, end_pfn);
+
+	if (start >= (u64)max_pfn << PAGE_SHIFT) {
+		printk(KERN_INFO "Ignoring SRAT pfns: %08lx - %08lx\n",
+		       start_pfn, end_pfn);
+		return 0;
+	}
+
+	node_set_online(nid);
+	memblock_x86_register_active_regions(nid, start_pfn,
+					     min(end_pfn, max_pfn));
+
+	if (!node_has_online_mem(nid)) {
+		node_start_pfn[nid] = start_pfn;
+		node_end_pfn[nid] = end_pfn;
+	} else {
+		node_start_pfn[nid] = min(node_start_pfn[nid], start_pfn);
+		node_end_pfn[nid] = max(node_end_pfn[nid], end_pfn);
+	}
+	return 0;
+}
+
+/* temporary shim, will go away soon */
+void __init numa_set_distance(int from, int to, int distance)
+{
+	/* nada */
+}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 14/25] x86-32, NUMA: Replace srat_32.c with srat.c
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (12 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 13/25] x86-32, NUMA: implement temporary NUMA init shims Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 15/25] x86-32, NUMA: Update numaq to use new NUMA init protocol Tejun Heo
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

The SRAT support implementations in srat_32.c and srat.c are generally
similar; however, there are some differences.

First of all, the 64bit implementation supports more types of SRAT
entries.  64bit handles x2apic affinity, processor affinity, memory
affinity and SLIT entries, while 32bit only handles processor and
memory affinity.

Most other differences stem from different initialization protocols
employed by 64bit and 32bit NUMA init paths.

On 64bit,

* Mappings among PXM, node and apicid are directly done in each SRAT
  entry callback.

* Memory affinity information is passed to numa_add_memblk() which
  takes care of all interfacing with NUMA init.

* Doesn't directly initialize NUMA configurations.  All the
  information is recorded in numa_nodes_parsed and memblks.

On 32bit,

* Checks numa_off.

* Things go through one more level of indirection via private tables
  but eventually end up initializing the same mappings.

* node_start/end_pfn[] are initialized and
  memblock_x86_register_active_regions() is called for each memory
  chunk.

* node_set_online() is called for each online node.

* sort_node_map() is called.

There are also other minor differences in sanity checking and messages,
but taking the 64bit version should be good enough.

This patch drops the 32bit specific implementation and makes the 64bit
implementation common for both 32 and 64bit.

The init protocol differences are dealt with in two places - the
numa_add_memblk() shim added in the previous patch and new temporary
numa_32.c:get_memcfg_from_srat() which wraps invocation of
x86_acpi_numa_init().

The shim numa_add_memblk() handles the following:

* node_start/end_pfn[] initialization.

* node_set_online() for memory nodes.

* Invocation of memblock_x86_register_active_regions().

The shim get_memcfg_from_srat() handles the following:

* numa_off check.

* node_set_online() for CPU nodes.

* sort_node_map() invocation.

* Clearing of numa_nodes_parsed and active_ranges on failure.

The shims are temporary and will be removed as the generic NUMA init
path in 32bit is replaced with the 64bit one.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/mmzone_32.h |    2 -
 arch/x86/include/asm/srat.h      |   39 ------
 arch/x86/mm/Makefile             |    5 +-
 arch/x86/mm/numa_32.c            |   23 +++
 arch/x86/mm/srat_32.c            |  281 --------------------------------------
 5 files changed, 24 insertions(+), 326 deletions(-)
 delete mode 100644 arch/x86/include/asm/srat.h
 delete mode 100644 arch/x86/mm/srat_32.c

diff --git a/arch/x86/include/asm/mmzone_32.h b/arch/x86/include/asm/mmzone_32.h
index 73e5745..5e83a41 100644
--- a/arch/x86/include/asm/mmzone_32.h
+++ b/arch/x86/include/asm/mmzone_32.h
@@ -13,8 +13,6 @@ extern struct pglist_data *node_data[];
 #define NODE_DATA(nid)	(node_data[nid])
 
 #include <asm/numaq.h>
-/* summit or generic arch */
-#include <asm/srat.h>
 
 extern void resume_map_numa_kva(pgd_t *pgd);
 
diff --git a/arch/x86/include/asm/srat.h b/arch/x86/include/asm/srat.h
deleted file mode 100644
index b508d63..0000000
--- a/arch/x86/include/asm/srat.h
+++ /dev/null
@@ -1,39 +0,0 @@
-/*
- * Some of the code in this file has been gleaned from the 64 bit
- * discontigmem support code base.
- *
- * Copyright (C) 2002, IBM Corp.
- *
- * All rights reserved.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
- * NON INFRINGEMENT.  See the GNU General Public License for more
- * details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- *
- * Send feedback to Pat Gaughen <gone@us.ibm.com>
- */
-
-#ifndef _ASM_X86_SRAT_H
-#define _ASM_X86_SRAT_H
-
-#ifdef CONFIG_ACPI_NUMA
-extern int get_memcfg_from_srat(void);
-#else
-static inline int get_memcfg_from_srat(void)
-{
-	return 0;
-}
-#endif
-
-#endif /* _ASM_X86_SRAT_H */
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 37e7043..62997be 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -24,10 +24,7 @@ obj-$(CONFIG_MMIOTRACE_TEST)	+= testmmiotrace.o
 
 obj-$(CONFIG_NUMA)		+= numa.o numa_$(BITS).o
 obj-$(CONFIG_AMD_NUMA)		+= amdtopology_64.o
-ifeq ($(CONFIG_ACPI_NUMA),y)
-obj-$(CONFIG_X86_64)		+= srat.o
-obj-$(CONFIG_X86_32)		+= srat_32.o
-endif
+obj-$(CONFIG_ACPI_NUMA)		+= srat.o
 obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
 
 obj-$(CONFIG_HAVE_MEMBLOCK)		+= memblock.o
diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index d0369a5..8641239 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -332,6 +332,29 @@ static __init void init_alloc_remap(int nid)
 	       nid, node_pa, node_pa + size, remap_va, remap_va + size);
 }
 
+static int get_memcfg_from_srat(void)
+{
+#ifdef CONFIG_ACPI_NUMA
+	int nid;
+
+	if (numa_off)
+		return 0;
+
+	if (x86_acpi_numa_init() < 0) {
+		nodes_clear(numa_nodes_parsed);
+		remove_all_active_ranges();
+		return 0;
+	}
+
+	for_each_node_mask(nid, numa_nodes_parsed)
+		node_set_online(nid);
+	sort_node_map();
+	return 1;
+#else
+	return 0;
+#endif
+}
+
 static void get_memcfg_numa(void)
 {
 	if (get_memcfg_numaq())
diff --git a/arch/x86/mm/srat_32.c b/arch/x86/mm/srat_32.c
deleted file mode 100644
index 6b9bfd7..0000000
--- a/arch/x86/mm/srat_32.c
+++ /dev/null
@@ -1,281 +0,0 @@
-/*
- * Some of the code in this file has been gleaned from the 64 bit 
- * discontigmem support code base.
- *
- * Copyright (C) 2002, IBM Corp.
- *
- * All rights reserved.          
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
- * NON INFRINGEMENT.  See the GNU General Public License for more
- * details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- *
- * Send feedback to Pat Gaughen <gone@us.ibm.com>
- */
-#include <linux/mm.h>
-#include <linux/bootmem.h>
-#include <linux/memblock.h>
-#include <linux/mmzone.h>
-#include <linux/acpi.h>
-#include <linux/nodemask.h>
-#include <asm/srat.h>
-#include <asm/topology.h>
-#include <asm/smp.h>
-#include <asm/e820.h>
-
-/*
- * proximity macros and definitions
- */
-#define NODE_ARRAY_INDEX(x)	((x) / 8)	/* 8 bits/char */
-#define NODE_ARRAY_OFFSET(x)	((x) % 8)	/* 8 bits/char */
-#define BMAP_SET(bmap, bit)	((bmap)[NODE_ARRAY_INDEX(bit)] |= 1 << NODE_ARRAY_OFFSET(bit))
-#define BMAP_TEST(bmap, bit)	((bmap)[NODE_ARRAY_INDEX(bit)] & (1 << NODE_ARRAY_OFFSET(bit)))
-/* bitmap length; _PXM is at most 255 */
-#define PXM_BITMAP_LEN (MAX_PXM_DOMAINS / 8) 
-static u8 __initdata pxm_bitmap[PXM_BITMAP_LEN];	/* bitmap of proximity domains */
-
-#define MAX_CHUNKS_PER_NODE	3
-#define MAXCHUNKS		(MAX_CHUNKS_PER_NODE * MAX_NUMNODES)
-struct node_memory_chunk_s {
-	unsigned long	start_pfn;
-	unsigned long	end_pfn;
-	u8	pxm;		// proximity domain of node
-	u8	nid;		// which cnode contains this chunk?
-	u8	bank;		// which mem bank on this node
-};
-static struct node_memory_chunk_s __initdata node_memory_chunk[MAXCHUNKS];
-
-static int __initdata num_memory_chunks; /* total number of memory chunks */
-static u8 __initdata apicid_to_pxm[MAX_LOCAL_APIC];
-
-int acpi_numa __initdata;
-
-static __init void bad_srat(void)
-{
-        printk(KERN_ERR "SRAT: SRAT not used.\n");
-        acpi_numa = -1;
-	num_memory_chunks = 0;
-}
-
-static __init inline int srat_disabled(void)
-{
-	return numa_off || acpi_numa < 0;
-}
-
-/* Identify CPU proximity domains */
-void __init
-acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *cpu_affinity)
-{
-	if (srat_disabled())
-		return;
-	if (cpu_affinity->header.length !=
-	     sizeof(struct acpi_srat_cpu_affinity)) {
-		bad_srat();
-		return;
-	}
-
-	if ((cpu_affinity->flags & ACPI_SRAT_CPU_ENABLED) == 0)
-		return;		/* empty entry */
-
-	/* mark this node as "seen" in node bitmap */
-	BMAP_SET(pxm_bitmap, cpu_affinity->proximity_domain_lo);
-
-	/* don't need to check apic_id here, because it is always 8 bits */
-	apicid_to_pxm[cpu_affinity->apic_id] = cpu_affinity->proximity_domain_lo;
-
-	printk(KERN_DEBUG "CPU %02x in proximity domain %02x\n",
-		cpu_affinity->apic_id, cpu_affinity->proximity_domain_lo);
-}
-
-/*
- * Identify memory proximity domains and hot-remove capabilities.
- * Fill node memory chunk list structure.
- */
-void __init
-acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *memory_affinity)
-{
-	unsigned long long paddr, size;
-	unsigned long start_pfn, end_pfn;
-	u8 pxm;
-	struct node_memory_chunk_s *p, *q, *pend;
-
-	if (srat_disabled())
-		return;
-	if (memory_affinity->header.length !=
-	     sizeof(struct acpi_srat_mem_affinity)) {
-		bad_srat();
-		return;
-	}
-
-	if ((memory_affinity->flags & ACPI_SRAT_MEM_ENABLED) == 0)
-		return;		/* empty entry */
-
-	pxm = memory_affinity->proximity_domain & 0xff;
-
-	/* mark this node as "seen" in node bitmap */
-	BMAP_SET(pxm_bitmap, pxm);
-
-	/* calculate info for memory chunk structure */
-	paddr = memory_affinity->base_address;
-	size = memory_affinity->length;
-
-	start_pfn = paddr >> PAGE_SHIFT;
-	end_pfn = (paddr + size) >> PAGE_SHIFT;
-
-
-	if (num_memory_chunks >= MAXCHUNKS) {
-		printk(KERN_WARNING "Too many mem chunks in SRAT."
-			" Ignoring %lld MBytes at %llx\n",
-			size/(1024*1024), paddr);
-		return;
-	}
-
-	/* Insertion sort based on base address */
-	pend = &node_memory_chunk[num_memory_chunks];
-	for (p = &node_memory_chunk[0]; p < pend; p++) {
-		if (start_pfn < p->start_pfn)
-			break;
-	}
-	if (p < pend) {
-		for (q = pend; q >= p; q--)
-			*(q + 1) = *q;
-	}
-	p->start_pfn = start_pfn;
-	p->end_pfn = end_pfn;
-	p->pxm = pxm;
-
-	num_memory_chunks++;
-
-	printk(KERN_DEBUG "Memory range %08lx to %08lx"
-			  " in proximity domain %02x %s\n",
-		start_pfn, end_pfn,
-		pxm,
-		((memory_affinity->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) ?
-		 "enabled and removable" : "enabled" ) );
-}
-
-/* Callback for SLIT parsing */
-void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
-{
-}
-
-void acpi_numa_arch_fixup(void)
-{
-}
-/*
- * The SRAT table always lists ascending addresses, so can always
- * assume that the first "start" address that you see is the real
- * start of the node, and that the current "end" address is after
- * the previous one.
- */
-static __init int node_read_chunk(int nid, struct node_memory_chunk_s *memory_chunk)
-{
-	/*
-	 * Only add present memory as told by the e820.
-	 * There is no guarantee from the SRAT that the memory it
-	 * enumerates is present at boot time because it represents
-	 * *possible* memory hotplug areas the same as normal RAM.
-	 */
-	if (memory_chunk->start_pfn >= max_pfn) {
-		printk(KERN_INFO "Ignoring SRAT pfns: %08lx - %08lx\n",
-			memory_chunk->start_pfn, memory_chunk->end_pfn);
-		return -1;
-	}
-	if (memory_chunk->nid != nid)
-		return -1;
-
-	if (!node_has_online_mem(nid))
-		node_start_pfn[nid] = memory_chunk->start_pfn;
-
-	if (node_start_pfn[nid] > memory_chunk->start_pfn)
-		node_start_pfn[nid] = memory_chunk->start_pfn;
-
-	if (node_end_pfn[nid] < memory_chunk->end_pfn)
-		node_end_pfn[nid] = memory_chunk->end_pfn;
-
-	return 0;
-}
-
-int __init get_memcfg_from_srat(void)
-{
-	int i, j;
-
-	if (srat_disabled())
-		goto out_fail;
-
-	if (acpi_numa_init() < 0)
-		goto out_fail;
-
-	if (num_memory_chunks == 0) {
-		printk(KERN_DEBUG
-			 "could not find any ACPI SRAT memory areas.\n");
-		goto out_fail;
-	}
-
-	/* Calculate total number of nodes in system from PXM bitmap and create
-	 * a set of sequential node IDs starting at zero.  (ACPI doesn't seem
-	 * to specify the range of _PXM values.)
-	 */
-	/*
-	 * MCD - we no longer HAVE to number nodes sequentially.  PXM domain
-	 * numbers could go as high as 256, and MAX_NUMNODES for i386 is typically
-	 * 32, so we will continue numbering them in this manner until MAX_NUMNODES
-	 * approaches MAX_PXM_DOMAINS for i386.
-	 */
-	nodes_clear(node_online_map);
-	for (i = 0; i < MAX_PXM_DOMAINS; i++) {
-		if (BMAP_TEST(pxm_bitmap, i)) {
-			int nid = acpi_map_pxm_to_node(i);
-			node_set_online(nid);
-		}
-	}
-	BUG_ON(num_online_nodes() == 0);
-
-	/* set cnode id in memory chunk structure */
-	for (i = 0; i < num_memory_chunks; i++)
-		node_memory_chunk[i].nid = pxm_to_node(node_memory_chunk[i].pxm);
-
-	printk(KERN_DEBUG "pxm bitmap: ");
-	for (i = 0; i < sizeof(pxm_bitmap); i++) {
-		printk(KERN_CONT "%02x ", pxm_bitmap[i]);
-	}
-	printk(KERN_CONT "\n");
-	printk(KERN_DEBUG "Number of logical nodes in system = %d\n",
-			 num_online_nodes());
-	printk(KERN_DEBUG "Number of memory chunks in system = %d\n",
-			 num_memory_chunks);
-
-	for (i = 0; i < MAX_LOCAL_APIC; i++)
-		set_apicid_to_node(i, pxm_to_node(apicid_to_pxm[i]));
-
-	for (j = 0; j < num_memory_chunks; j++){
-		struct node_memory_chunk_s * chunk = &node_memory_chunk[j];
-		printk(KERN_DEBUG
-			"chunk %d nid %d start_pfn %08lx end_pfn %08lx\n",
-		       j, chunk->nid, chunk->start_pfn, chunk->end_pfn);
-		if (node_read_chunk(chunk->nid, chunk))
-			continue;
-
-		memblock_x86_register_active_regions(chunk->nid, chunk->start_pfn,
-					     min(chunk->end_pfn, max_pfn));
-	}
-	/* for out of order entries in SRAT */
-	sort_node_map();
-
-	return 1;
-out_fail:
-	printk(KERN_DEBUG "failed to get NUMA memory information from SRAT"
-			" table\n");
-	return 0;
-}
-- 
1.7.1



* [PATCH 15/25] x86-32, NUMA: Update numaq to use new NUMA init protocol
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (13 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 14/25] x86-32, NUMA: Replace srat_32.c with srat.c Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 16/25] x86, NUMA: Move NUMA init logic from numa_64.c to numa.c Tejun Heo
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

Update numaq such that it calls numa_add_memblk() and sets
numa_nodes_parsed instead of directly diddling with NUMA states.  The
original get_memcfg_numaq() is renamed to numaq_numa_init() and new
get_memcfg_numaq() is created in numa_32.c.

The shim numa_add_memblk() implementation handles node_start/end_pfn[]
and node_set_online() for nodes with memory.  The new
get_memcfg_numaq() is exactly the same as get_memcfg_from_srat() except
that it calls the numaq init function.  The things get_memcfg_numaq()
does are not strictly necessary for numaq but are added for consistency
and to help unify NUMA init handling.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/numaq.h    |    7 +------
 arch/x86/kernel/apic/numaq_32.c |   28 ++++++++++------------------
 arch/x86/mm/numa_32.c           |   23 +++++++++++++++++++++++
 3 files changed, 34 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/numaq.h b/arch/x86/include/asm/numaq.h
index 37c5165..c3b3c32 100644
--- a/arch/x86/include/asm/numaq.h
+++ b/arch/x86/include/asm/numaq.h
@@ -29,7 +29,7 @@
 #ifdef CONFIG_X86_NUMAQ
 
 extern int found_numaq;
-extern int get_memcfg_numaq(void);
+extern int numaq_numa_init(void);
 extern int pci_numaq_init(void);
 
 extern void *xquad_portio;
@@ -166,11 +166,6 @@ struct sys_cfg_data {
 
 void numaq_tsc_disable(void);
 
-#else
-static inline int get_memcfg_numaq(void)
-{
-	return 0;
-}
 #endif /* CONFIG_X86_NUMAQ */
 #endif /* _ASM_X86_NUMAQ_H */
 
diff --git a/arch/x86/kernel/apic/numaq_32.c b/arch/x86/kernel/apic/numaq_32.c
index 41b8b29..30f1331 100644
--- a/arch/x86/kernel/apic/numaq_32.c
+++ b/arch/x86/kernel/apic/numaq_32.c
@@ -48,8 +48,6 @@
 #include <asm/e820.h>
 #include <asm/ipi.h>
 
-#define	MB_TO_PAGES(addr)		((addr) << (20 - PAGE_SHIFT))
-
 int found_numaq;
 
 /*
@@ -79,25 +77,20 @@ int					quad_local_to_mp_bus_id[NR_CPUS/4][4];
 static inline void numaq_register_node(int node, struct sys_cfg_data *scd)
 {
 	struct eachquadmem *eq = scd->eq + node;
+	u64 start = (u64)(eq->hi_shrd_mem_start - eq->priv_mem_size) << 20;
+	u64 end = (u64)(eq->hi_shrd_mem_start + eq->hi_shrd_mem_size) << 20;
+	int ret;
 
-	node_set_online(node);
-
-	/* Convert to pages */
-	node_start_pfn[node] =
-		 MB_TO_PAGES(eq->hi_shrd_mem_start - eq->priv_mem_size);
-
-	node_end_pfn[node] =
-		 MB_TO_PAGES(eq->hi_shrd_mem_start + eq->hi_shrd_mem_size);
-
-	memblock_x86_register_active_regions(node, node_start_pfn[node],
-						node_end_pfn[node]);
+	node_set(node, numa_nodes_parsed);
+	ret = numa_add_memblk(node, start, end);
+	BUG_ON(ret < 0);
 }
 
 /*
  * Function: smp_dump_qct()
  *
  * Description: gets memory layout from the quad config table.  This
- * function also updates node_online_map with the nodes (quads) present.
+ * function also updates numa_nodes_parsed with the nodes (quads) present.
  */
 static void __init smp_dump_qct(void)
 {
@@ -106,7 +99,6 @@ static void __init smp_dump_qct(void)
 
 	scd = (void *)__va(SYS_CFG_DATA_PRIV_ADDR);
 
-	nodes_clear(node_online_map);
 	for_each_node(node) {
 		if (scd->quads_present31_0 & (1 << node))
 			numaq_register_node(node, scd);
@@ -276,14 +268,14 @@ static __init void early_check_numaq(void)
 	}
 }
 
-int __init get_memcfg_numaq(void)
+int __init numaq_numa_init(void)
 {
 	early_check_numaq();
 	if (!found_numaq)
-		return 0;
+		return -ENOENT;
 	smp_dump_qct();
 
-	return 1;
+	return 0;
 }
 
 #define NUMAQ_APIC_DFR_VALUE	(APIC_DFR_CLUSTER)
diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index 8641239..14135e5 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -332,6 +332,29 @@ static __init void init_alloc_remap(int nid)
 	       nid, node_pa, node_pa + size, remap_va, remap_va + size);
 }
 
+static int get_memcfg_numaq(void)
+{
+#ifdef CONFIG_X86_NUMAQ
+	int nid;
+
+	if (numa_off)
+		return 0;
+
+	if (numaq_numa_init() < 0) {
+		nodes_clear(numa_nodes_parsed);
+		remove_all_active_ranges();
+		return 0;
+	}
+
+	for_each_node_mask(nid, numa_nodes_parsed)
+		node_set_online(nid);
+	sort_node_map();
+	return 1;
+#else
+	return 0;
+#endif
+}
+
 static int get_memcfg_from_srat(void)
 {
 #ifdef CONFIG_ACPI_NUMA
-- 
1.7.1



* [PATCH 16/25] x86, NUMA: Move NUMA init logic from numa_64.c to numa.c
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (14 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 15/25] x86-32, NUMA: Update numaq to use new NUMA init protocol Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 17/25] x86, NUMA: Enable build of generic NUMA init code on 32bit Tejun Heo
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

Move the generic 64bit NUMA init machinery from numa_64.c to numa.c.

* node_data[], numa_meminfo and numa_distance
* numa_add_memblk[_to](), numa_remove_memblk[_from]()
* numa_set_distance() and friends
* numa_init() and all the numa_meminfo handling helpers called from it
* dummy_numa_init()
* memory_add_physaddr_to_nid()

A new function x86_numa_init() is added and the content of
numa_64.c::initmem_init() is moved into it.  initmem_init() now simply
calls x86_numa_init().

The constants and the numa_off declaration are moved from
numa_{32|64}.h to numa.h.

This is pure code reorganization and involves no functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/numa.h    |   16 ++
 arch/x86/include/asm/numa_32.h |    2 -
 arch/x86/include/asm/numa_64.h |   19 --
 arch/x86/mm/numa.c             |  523 +++++++++++++++++++++++++++++++++++++++-
 arch/x86/mm/numa_64.c          |  503 +--------------------------------------
 arch/x86/mm/numa_internal.h    |    2 +
 6 files changed, 539 insertions(+), 526 deletions(-)

diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
index 6959c27..c1934fc 100644
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -9,6 +9,16 @@
 #ifdef CONFIG_NUMA
 
 #define NR_NODE_MEMBLKS		(MAX_NUMNODES*2)
+#define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
+
+/*
+ * Too small node sizes may confuse the VM badly. Usually they
+ * result from BIOS bugs. So dont recognize nodes as standalone
+ * NUMA entities that have less than this amount of RAM listed:
+ */
+#define NODE_MIN_SIZE (4*1024*1024)
+
+extern int numa_off;
 
 /*
  * __apicid_to_node[] stores the raw mapping between physical apicid and
@@ -68,4 +78,10 @@ static inline void numa_remove_cpu(int cpu)		{ }
 struct cpumask __cpuinit *debug_cpumask_set_cpu(int cpu, int enable);
 #endif
 
+#ifdef CONFIG_NUMA_EMU
+#define FAKE_NODE_MIN_SIZE	((u64)32 << 20)
+#define FAKE_NODE_MIN_HASH_MASK	(~(FAKE_NODE_MIN_SIZE - 1UL))
+void numa_emu_cmdline(char *);
+#endif /* CONFIG_NUMA_EMU */
+
 #endif	/* _ASM_X86_NUMA_H */
diff --git a/arch/x86/include/asm/numa_32.h b/arch/x86/include/asm/numa_32.h
index 7e54b64..e7d6b82 100644
--- a/arch/x86/include/asm/numa_32.h
+++ b/arch/x86/include/asm/numa_32.h
@@ -1,8 +1,6 @@
 #ifndef _ASM_X86_NUMA_32_H
 #define _ASM_X86_NUMA_32_H
 
-extern int numa_off;
-
 #ifdef CONFIG_HIGHMEM
 extern void set_highmem_pages_init(void);
 #else
diff --git a/arch/x86/include/asm/numa_64.h b/arch/x86/include/asm/numa_64.h
index 506dd05..0c05f7a 100644
--- a/arch/x86/include/asm/numa_64.h
+++ b/arch/x86/include/asm/numa_64.h
@@ -1,25 +1,6 @@
 #ifndef _ASM_X86_NUMA_64_H
 #define _ASM_X86_NUMA_64_H
 
-#define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
-
-extern int numa_off;
-
 extern unsigned long numa_free_all_bootmem(void);
 
-#ifdef CONFIG_NUMA
-/*
- * Too small node sizes may confuse the VM badly. Usually they
- * result from BIOS bugs. So dont recognize nodes as standalone
- * NUMA entities that have less than this amount of RAM listed:
- */
-#define NODE_MIN_SIZE (4*1024*1024)
-
-#ifdef CONFIG_NUMA_EMU
-#define FAKE_NODE_MIN_SIZE	((u64)32 << 20)
-#define FAKE_NODE_MIN_HASH_MASK	(~(FAKE_NODE_MIN_SIZE - 1UL))
-void numa_emu_cmdline(char *);
-#endif /* CONFIG_NUMA_EMU */
-#endif
-
 #endif /* _ASM_X86_NUMA_64_H */
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 74d8f90..3b20547 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -1,13 +1,42 @@
 /* Common code for 32 and 64-bit NUMA */
-#include <linux/topology.h>
-#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/string.h>
+#include <linux/init.h>
 #include <linux/bootmem.h>
-#include <asm/numa.h>
+#include <linux/memblock.h>
+#include <linux/mmzone.h>
+#include <linux/ctype.h>
+#include <linux/module.h>
+#include <linux/nodemask.h>
+#include <linux/sched.h>
+#include <linux/topology.h>
+
+#include <asm/e820.h>
+#include <asm/proto.h>
+#include <asm/dma.h>
 #include <asm/acpi.h>
+#include <asm/amd_nb.h>
+
+#include "numa_internal.h"
 
 int __initdata numa_off;
 nodemask_t numa_nodes_parsed __initdata;
 
+#ifdef CONFIG_X86_64
+struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
+EXPORT_SYMBOL(node_data);
+
+static struct numa_meminfo numa_meminfo
+#ifndef CONFIG_MEMORY_HOTPLUG
+__initdata
+#endif
+;
+
+static int numa_distance_cnt;
+static u8 *numa_distance;
+#endif
+
 static __init int numa_setup(char *opt)
 {
 	if (!opt)
@@ -105,6 +134,392 @@ void __init setup_node_to_cpumask_map(void)
 	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
 }
 
+#ifdef CONFIG_X86_64
+static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
+				     struct numa_meminfo *mi)
+{
+	/* ignore zero length blks */
+	if (start == end)
+		return 0;
+
+	/* whine about and ignore invalid blks */
+	if (start > end || nid < 0 || nid >= MAX_NUMNODES) {
+		pr_warning("NUMA: Warning: invalid memblk node %d (%Lx-%Lx)\n",
+			   nid, start, end);
+		return 0;
+	}
+
+	if (mi->nr_blks >= NR_NODE_MEMBLKS) {
+		pr_err("NUMA: too many memblk ranges\n");
+		return -EINVAL;
+	}
+
+	mi->blk[mi->nr_blks].start = start;
+	mi->blk[mi->nr_blks].end = end;
+	mi->blk[mi->nr_blks].nid = nid;
+	mi->nr_blks++;
+	return 0;
+}
+
+/**
+ * numa_remove_memblk_from - Remove one numa_memblk from a numa_meminfo
+ * @idx: Index of memblk to remove
+ * @mi: numa_meminfo to remove memblk from
+ *
+ * Remove @idx'th numa_memblk from @mi by shifting @mi->blk[] and
+ * decrementing @mi->nr_blks.
+ */
+void __init numa_remove_memblk_from(int idx, struct numa_meminfo *mi)
+{
+	mi->nr_blks--;
+	memmove(&mi->blk[idx], &mi->blk[idx + 1],
+		(mi->nr_blks - idx) * sizeof(mi->blk[0]));
+}
+
+/**
+ * numa_add_memblk - Add one numa_memblk to numa_meminfo
+ * @nid: NUMA node ID of the new memblk
+ * @start: Start address of the new memblk
+ * @end: End address of the new memblk
+ *
+ * Add a new memblk to the default numa_meminfo.
+ *
+ * RETURNS:
+ * 0 on success, -errno on failure.
+ */
+int __init numa_add_memblk(int nid, u64 start, u64 end)
+{
+	return numa_add_memblk_to(nid, start, end, &numa_meminfo);
+}
+
+/* Initialize bootmem allocator for a node */
+static void __init
+setup_node_bootmem(int nid, unsigned long start, unsigned long end)
+{
+	const u64 nd_low = (u64)MAX_DMA_PFN << PAGE_SHIFT;
+	const u64 nd_high = (u64)max_pfn_mapped << PAGE_SHIFT;
+	const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
+	unsigned long nd_pa;
+	int tnid;
+
+	/*
+	 * Don't confuse VM with a node that doesn't have the
+	 * minimum amount of memory:
+	 */
+	if (end && (end - start) < NODE_MIN_SIZE)
+		return;
+
+	start = roundup(start, ZONE_ALIGN);
+
+	printk(KERN_INFO "Initmem setup node %d %016lx-%016lx\n",
+	       nid, start, end);
+
+	/*
+	 * Try to allocate node data on local node and then fall back to
+	 * all nodes.  Never allocate in DMA zone.
+	 */
+	nd_pa = memblock_x86_find_in_range_node(nid, nd_low, nd_high,
+						nd_size, SMP_CACHE_BYTES);
+	if (nd_pa == MEMBLOCK_ERROR)
+		nd_pa = memblock_find_in_range(nd_low, nd_high,
+					       nd_size, SMP_CACHE_BYTES);
+	if (nd_pa == MEMBLOCK_ERROR) {
+		pr_err("Cannot find %lu bytes in node %d\n", nd_size, nid);
+		return;
+	}
+	memblock_x86_reserve_range(nd_pa, nd_pa + nd_size, "NODE_DATA");
+
+	/* report and initialize */
+	printk(KERN_INFO "  NODE_DATA [%016lx - %016lx]\n",
+	       nd_pa, nd_pa + nd_size - 1);
+	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
+	if (tnid != nid)
+		printk(KERN_INFO "    NODE_DATA(%d) on node %d\n", nid, tnid);
+
+	node_data[nid] = __va(nd_pa);
+	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
+	NODE_DATA(nid)->node_id = nid;
+	NODE_DATA(nid)->node_start_pfn = start >> PAGE_SHIFT;
+	NODE_DATA(nid)->node_spanned_pages = (end - start) >> PAGE_SHIFT;
+
+	node_set_online(nid);
+}
+
+/**
+ * numa_cleanup_meminfo - Cleanup a numa_meminfo
+ * @mi: numa_meminfo to clean up
+ *
+ * Sanitize @mi by merging and removing unncessary memblks.  Also check for
+ * conflicts and clear unused memblks.
+ *
+ * RETURNS:
+ * 0 on success, -errno on failure.
+ */
+int __init numa_cleanup_meminfo(struct numa_meminfo *mi)
+{
+	const u64 low = 0;
+	const u64 high = (u64)max_pfn << PAGE_SHIFT;
+	int i, j, k;
+
+	for (i = 0; i < mi->nr_blks; i++) {
+		struct numa_memblk *bi = &mi->blk[i];
+
+		/* make sure all blocks are inside the limits */
+		bi->start = max(bi->start, low);
+		bi->end = min(bi->end, high);
+
+		/* and there's no empty block */
+		if (bi->start == bi->end) {
+			numa_remove_memblk_from(i--, mi);
+			continue;
+		}
+
+		for (j = i + 1; j < mi->nr_blks; j++) {
+			struct numa_memblk *bj = &mi->blk[j];
+			unsigned long start, end;
+
+			/*
+			 * See whether there are overlapping blocks.  Whine
+			 * about but allow overlaps of the same nid.  They
+			 * will be merged below.
+			 */
+			if (bi->end > bj->start && bi->start < bj->end) {
+				if (bi->nid != bj->nid) {
+					pr_err("NUMA: node %d (%Lx-%Lx) overlaps with node %d (%Lx-%Lx)\n",
+					       bi->nid, bi->start, bi->end,
+					       bj->nid, bj->start, bj->end);
+					return -EINVAL;
+				}
+				pr_warning("NUMA: Warning: node %d (%Lx-%Lx) overlaps with itself (%Lx-%Lx)\n",
+					   bi->nid, bi->start, bi->end,
+					   bj->start, bj->end);
+			}
+
+			/*
+			 * Join together blocks on the same node, holes
+			 * between which don't overlap with memory on other
+			 * nodes.
+			 */
+			if (bi->nid != bj->nid)
+				continue;
+			start = max(min(bi->start, bj->start), low);
+			end = min(max(bi->end, bj->end), high);
+			for (k = 0; k < mi->nr_blks; k++) {
+				struct numa_memblk *bk = &mi->blk[k];
+
+				if (bi->nid == bk->nid)
+					continue;
+				if (start < bk->end && end > bk->start)
+					break;
+			}
+			if (k < mi->nr_blks)
+				continue;
+			printk(KERN_INFO "NUMA: Node %d [%Lx,%Lx) + [%Lx,%Lx) -> [%lx,%lx)\n",
+			       bi->nid, bi->start, bi->end, bj->start, bj->end,
+			       start, end);
+			bi->start = start;
+			bi->end = end;
+			numa_remove_memblk_from(j--, mi);
+		}
+	}
+
+	for (i = mi->nr_blks; i < ARRAY_SIZE(mi->blk); i++) {
+		mi->blk[i].start = mi->blk[i].end = 0;
+		mi->blk[i].nid = NUMA_NO_NODE;
+	}
+
+	return 0;
+}
+
+/*
+ * Set nodes, which have memory in @mi, in *@nodemask.
+ */
+static void __init numa_nodemask_from_meminfo(nodemask_t *nodemask,
+					      const struct numa_meminfo *mi)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(mi->blk); i++)
+		if (mi->blk[i].start != mi->blk[i].end &&
+		    mi->blk[i].nid != NUMA_NO_NODE)
+			node_set(mi->blk[i].nid, *nodemask);
+}
+
+/**
+ * numa_reset_distance - Reset NUMA distance table
+ *
+ * The current table is freed.  The next numa_set_distance() call will
+ * create a new one.
+ */
+void __init numa_reset_distance(void)
+{
+	size_t size = numa_distance_cnt * numa_distance_cnt * sizeof(numa_distance[0]);
+
+	/* numa_distance could be 1LU marking allocation failure, test cnt */
+	if (numa_distance_cnt)
+		memblock_x86_free_range(__pa(numa_distance),
+					__pa(numa_distance) + size);
+	numa_distance_cnt = 0;
+	numa_distance = NULL;	/* enable table creation */
+}
+
+static int __init numa_alloc_distance(void)
+{
+	nodemask_t nodes_parsed;
+	size_t size;
+	int i, j, cnt = 0;
+	u64 phys;
+
+	/* size the new table and allocate it */
+	nodes_parsed = numa_nodes_parsed;
+	numa_nodemask_from_meminfo(&nodes_parsed, &numa_meminfo);
+
+	for_each_node_mask(i, nodes_parsed)
+		cnt = i;
+	cnt++;
+	size = cnt * cnt * sizeof(numa_distance[0]);
+
+	phys = memblock_find_in_range(0, (u64)max_pfn_mapped << PAGE_SHIFT,
+				      size, PAGE_SIZE);
+	if (phys == MEMBLOCK_ERROR) {
+		pr_warning("NUMA: Warning: can't allocate distance table!\n");
+		/* don't retry until explicitly reset */
+		numa_distance = (void *)1LU;
+		return -ENOMEM;
+	}
+	memblock_x86_reserve_range(phys, phys + size, "NUMA DIST");
+
+	numa_distance = __va(phys);
+	numa_distance_cnt = cnt;
+
+	/* fill with the default distances */
+	for (i = 0; i < cnt; i++)
+		for (j = 0; j < cnt; j++)
+			numa_distance[i * cnt + j] = i == j ?
+				LOCAL_DISTANCE : REMOTE_DISTANCE;
+	printk(KERN_DEBUG "NUMA: Initialized distance table, cnt=%d\n", cnt);
+
+	return 0;
+}
+
+/**
+ * numa_set_distance - Set NUMA distance from one NUMA to another
+ * @from: the 'from' node to set distance
+ * @to: the 'to'  node to set distance
+ * @distance: NUMA distance
+ *
+ * Set the distance from node @from to @to to @distance.  If distance table
+ * doesn't exist, one which is large enough to accommodate all the currently
+ * known nodes will be created.
+ *
+ * If such table cannot be allocated, a warning is printed and further
+ * calls are ignored until the distance table is reset with
+ * numa_reset_distance().
+ *
+ * If @from or @to is higher than the highest known node at the time of
+ * table creation or @distance doesn't make sense, the call is ignored.
+ * This is to allow simplification of specific NUMA config implementations.
+ */
+void __init numa_set_distance(int from, int to, int distance)
+{
+	if (!numa_distance && numa_alloc_distance() < 0)
+		return;
+
+	if (from >= numa_distance_cnt || to >= numa_distance_cnt) {
+		printk_once(KERN_DEBUG "NUMA: Debug: distance out of bound, from=%d to=%d distance=%d\n",
+			    from, to, distance);
+		return;
+	}
+
+	if ((u8)distance != distance ||
+	    (from == to && distance != LOCAL_DISTANCE)) {
+		pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
+			     from, to, distance);
+		return;
+	}
+
+	numa_distance[from * numa_distance_cnt + to] = distance;
+}
+
+int __node_distance(int from, int to)
+{
+	if (from >= numa_distance_cnt || to >= numa_distance_cnt)
+		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
+	return numa_distance[from * numa_distance_cnt + to];
+}
+EXPORT_SYMBOL(__node_distance);
+
+/*
+ * Sanity check to catch more bad NUMA configurations (they are amazingly
+ * common).  Make sure the nodes cover all memory.
+ */
+static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
+{
+	unsigned long numaram, e820ram;
+	int i;
+
+	numaram = 0;
+	for (i = 0; i < mi->nr_blks; i++) {
+		unsigned long s = mi->blk[i].start >> PAGE_SHIFT;
+		unsigned long e = mi->blk[i].end >> PAGE_SHIFT;
+		numaram += e - s;
+		numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
+		if ((long)numaram < 0)
+			numaram = 0;
+	}
+
+	e820ram = max_pfn - (memblock_x86_hole_size(0,
+					max_pfn << PAGE_SHIFT) >> PAGE_SHIFT);
+	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
+	if ((long)(e820ram - numaram) >= (1 << (20 - PAGE_SHIFT))) {
+		printk(KERN_ERR "NUMA: nodes only cover %luMB of your %luMB e820 RAM. Not used.\n",
+		       (numaram << PAGE_SHIFT) >> 20,
+		       (e820ram << PAGE_SHIFT) >> 20);
+		return false;
+	}
+	return true;
+}
+
+static int __init numa_register_memblks(struct numa_meminfo *mi)
+{
+	int i, nid;
+
+	/* Account for nodes with cpus and no memory */
+	node_possible_map = numa_nodes_parsed;
+	numa_nodemask_from_meminfo(&node_possible_map, mi);
+	if (WARN_ON(nodes_empty(node_possible_map)))
+		return -EINVAL;
+
+	for (i = 0; i < mi->nr_blks; i++)
+		memblock_x86_register_active_regions(mi->blk[i].nid,
+					mi->blk[i].start >> PAGE_SHIFT,
+					mi->blk[i].end >> PAGE_SHIFT);
+
+	/* for out of order entries */
+	sort_node_map();
+	if (!numa_meminfo_cover_memory(mi))
+		return -EINVAL;
+
+	/* Finally register nodes. */
+	for_each_node_mask(nid, node_possible_map) {
+		u64 start = (u64)max_pfn << PAGE_SHIFT;
+		u64 end = 0;
+
+		for (i = 0; i < mi->nr_blks; i++) {
+			if (nid != mi->blk[i].nid)
+				continue;
+			start = min(mi->blk[i].start, start);
+			end = max(mi->blk[i].end, end);
+		}
+
+		if (start < end)
+			setup_node_bootmem(nid, start, end);
+	}
+
+	return 0;
+}
+#endif
+
 /*
  * There are unfortunately some poorly designed mainboards around that
  * only connect memory to a single CPU. This breaks the 1:1 cpu->node
@@ -127,6 +542,93 @@ void __init numa_init_array(void)
 	}
 }
 
+#ifdef CONFIG_X86_64
+static int __init numa_init(int (*init_func)(void))
+{
+	int i;
+	int ret;
+
+	for (i = 0; i < MAX_LOCAL_APIC; i++)
+		set_apicid_to_node(i, NUMA_NO_NODE);
+
+	nodes_clear(numa_nodes_parsed);
+	nodes_clear(node_possible_map);
+	nodes_clear(node_online_map);
+	memset(&numa_meminfo, 0, sizeof(numa_meminfo));
+	remove_all_active_ranges();
+	numa_reset_distance();
+
+	ret = init_func();
+	if (ret < 0)
+		return ret;
+	ret = numa_cleanup_meminfo(&numa_meminfo);
+	if (ret < 0)
+		return ret;
+
+	numa_emulation(&numa_meminfo, numa_distance_cnt);
+
+	ret = numa_register_memblks(&numa_meminfo);
+	if (ret < 0)
+		return ret;
+
+	for (i = 0; i < nr_cpu_ids; i++) {
+		int nid = early_cpu_to_node(i);
+
+		if (nid == NUMA_NO_NODE)
+			continue;
+		if (!node_online(nid))
+			numa_clear_node(i);
+	}
+	numa_init_array();
+	return 0;
+}
+
+/**
+ * dummy_numa_init - Fallback dummy NUMA init
+ *
+ * Used if there's no underlying NUMA architecture, NUMA initialization
+ * fails, or NUMA is disabled on the command line.
+ *
+ * Must online at least one node and add memory blocks that cover all
+ * allowed memory.  This function must not fail.
+ */
+static int __init dummy_numa_init(void)
+{
+	printk(KERN_INFO "%s\n",
+	       numa_off ? "NUMA turned off" : "No NUMA configuration found");
+	printk(KERN_INFO "Faking a node at %016lx-%016lx\n",
+	       0LU, max_pfn << PAGE_SHIFT);
+
+	node_set(0, numa_nodes_parsed);
+	numa_add_memblk(0, 0, (u64)max_pfn << PAGE_SHIFT);
+
+	return 0;
+}
+
+/**
+ * x86_numa_init - Initialize NUMA
+ *
+ * Try each configured NUMA initialization method until one succeeds.  The
+ * last fallback is dummy single node config encompassing whole memory and
+ * never fails.
+ */
+void __init x86_numa_init(void)
+{
+	if (!numa_off) {
+#ifdef CONFIG_ACPI_NUMA
+		if (!numa_init(x86_acpi_numa_init))
+			return;
+#endif
+#ifdef CONFIG_AMD_NUMA
+		if (!numa_init(amd_numa_init))
+			return;
+#endif
+	}
+
+	numa_init(dummy_numa_init);
+}
+#endif
+
 static __init int find_near_online_node(int node)
 {
 	int n, val;
@@ -297,3 +799,18 @@ const struct cpumask *cpumask_of_node(int node)
 EXPORT_SYMBOL(cpumask_of_node);
 
 #endif	/* !CONFIG_DEBUG_PER_CPU_MAPS */
+
+#if defined(CONFIG_X86_64) && defined(CONFIG_MEMORY_HOTPLUG)
+int memory_add_physaddr_to_nid(u64 start)
+{
+	struct numa_meminfo *mi = &numa_meminfo;
+	int nid = mi->blk[0].nid;
+	int i;
+
+	for (i = 0; i < mi->nr_blks; i++)
+		if (mi->blk[i].start <= start && mi->blk[i].end > start)
+			nid = mi->blk[i].nid;
+	return nid;
+}
+EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
+#endif
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 8d84f9c..dd27f40 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -2,499 +2,13 @@
  * Generic VM initialization for x86-64 NUMA setups.
  * Copyright 2002,2003 Andi Kleen, SuSE Labs.
  */
-#include <linux/kernel.h>
-#include <linux/mm.h>
-#include <linux/string.h>
-#include <linux/init.h>
 #include <linux/bootmem.h>
-#include <linux/memblock.h>
-#include <linux/mmzone.h>
-#include <linux/ctype.h>
-#include <linux/module.h>
-#include <linux/nodemask.h>
-#include <linux/sched.h>
-#include <linux/acpi.h>
-
-#include <asm/e820.h>
-#include <asm/proto.h>
-#include <asm/dma.h>
-#include <asm/acpi.h>
-#include <asm/amd_nb.h>
 
 #include "numa_internal.h"
 
-struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
-EXPORT_SYMBOL(node_data);
-
-static struct numa_meminfo numa_meminfo
-#ifndef CONFIG_MEMORY_HOTPLUG
-__initdata
-#endif
-;
-
-static int numa_distance_cnt;
-static u8 *numa_distance;
-
-static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
-				     struct numa_meminfo *mi)
-{
-	/* ignore zero length blks */
-	if (start == end)
-		return 0;
-
-	/* whine about and ignore invalid blks */
-	if (start > end || nid < 0 || nid >= MAX_NUMNODES) {
-		pr_warning("NUMA: Warning: invalid memblk node %d (%Lx-%Lx)\n",
-			   nid, start, end);
-		return 0;
-	}
-
-	if (mi->nr_blks >= NR_NODE_MEMBLKS) {
-		pr_err("NUMA: too many memblk ranges\n");
-		return -EINVAL;
-	}
-
-	mi->blk[mi->nr_blks].start = start;
-	mi->blk[mi->nr_blks].end = end;
-	mi->blk[mi->nr_blks].nid = nid;
-	mi->nr_blks++;
-	return 0;
-}
-
-/**
- * numa_remove_memblk_from - Remove one numa_memblk from a numa_meminfo
- * @idx: Index of memblk to remove
- * @mi: numa_meminfo to remove memblk from
- *
- * Remove @idx'th numa_memblk from @mi by shifting @mi->blk[] and
- * decrementing @mi->nr_blks.
- */
-void __init numa_remove_memblk_from(int idx, struct numa_meminfo *mi)
-{
-	mi->nr_blks--;
-	memmove(&mi->blk[idx], &mi->blk[idx + 1],
-		(mi->nr_blks - idx) * sizeof(mi->blk[0]));
-}
-
-/**
- * numa_add_memblk - Add one numa_memblk to numa_meminfo
- * @nid: NUMA node ID of the new memblk
- * @start: Start address of the new memblk
- * @end: End address of the new memblk
- *
- * Add a new memblk to the default numa_meminfo.
- *
- * RETURNS:
- * 0 on success, -errno on failure.
- */
-int __init numa_add_memblk(int nid, u64 start, u64 end)
-{
-	return numa_add_memblk_to(nid, start, end, &numa_meminfo);
-}
-
-/* Initialize bootmem allocator for a node */
-static void __init
-setup_node_bootmem(int nid, unsigned long start, unsigned long end)
-{
-	const u64 nd_low = (u64)MAX_DMA_PFN << PAGE_SHIFT;
-	const u64 nd_high = (u64)max_pfn_mapped << PAGE_SHIFT;
-	const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
-	unsigned long nd_pa;
-	int tnid;
-
-	/*
-	 * Don't confuse VM with a node that doesn't have the
-	 * minimum amount of memory:
-	 */
-	if (end && (end - start) < NODE_MIN_SIZE)
-		return;
-
-	start = roundup(start, ZONE_ALIGN);
-
-	printk(KERN_INFO "Initmem setup node %d %016lx-%016lx\n",
-	       nid, start, end);
-
-	/*
-	 * Try to allocate node data on local node and then fall back to
-	 * all nodes.  Never allocate in DMA zone.
-	 */
-	nd_pa = memblock_x86_find_in_range_node(nid, nd_low, nd_high,
-						nd_size, SMP_CACHE_BYTES);
-	if (nd_pa == MEMBLOCK_ERROR)
-		nd_pa = memblock_find_in_range(nd_low, nd_high,
-					       nd_size, SMP_CACHE_BYTES);
-	if (nd_pa == MEMBLOCK_ERROR) {
-		pr_err("Cannot find %lu bytes in node %d\n", nd_size, nid);
-		return;
-	}
-	memblock_x86_reserve_range(nd_pa, nd_pa + nd_size, "NODE_DATA");
-
-	/* report and initialize */
-	printk(KERN_INFO "  NODE_DATA [%016lx - %016lx]\n",
-	       nd_pa, nd_pa + nd_size - 1);
-	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
-	if (tnid != nid)
-		printk(KERN_INFO "    NODE_DATA(%d) on node %d\n", nid, tnid);
-
-	node_data[nid] = __va(nd_pa);
-	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
-	NODE_DATA(nid)->node_id = nid;
-	NODE_DATA(nid)->node_start_pfn = start >> PAGE_SHIFT;
-	NODE_DATA(nid)->node_spanned_pages = (end - start) >> PAGE_SHIFT;
-
-	node_set_online(nid);
-}
-
-/**
- * numa_cleanup_meminfo - Cleanup a numa_meminfo
- * @mi: numa_meminfo to clean up
- *
- * Sanitize @mi by merging and removing unncessary memblks.  Also check for
- * conflicts and clear unused memblks.
- *
- * RETURNS:
- * 0 on success, -errno on failure.
- */
-int __init numa_cleanup_meminfo(struct numa_meminfo *mi)
-{
-	const u64 low = 0;
-	const u64 high = (u64)max_pfn << PAGE_SHIFT;
-	int i, j, k;
-
-	for (i = 0; i < mi->nr_blks; i++) {
-		struct numa_memblk *bi = &mi->blk[i];
-
-		/* make sure all blocks are inside the limits */
-		bi->start = max(bi->start, low);
-		bi->end = min(bi->end, high);
-
-		/* and there's no empty block */
-		if (bi->start == bi->end) {
-			numa_remove_memblk_from(i--, mi);
-			continue;
-		}
-
-		for (j = i + 1; j < mi->nr_blks; j++) {
-			struct numa_memblk *bj = &mi->blk[j];
-			unsigned long start, end;
-
-			/*
-			 * See whether there are overlapping blocks.  Whine
-			 * about but allow overlaps of the same nid.  They
-			 * will be merged below.
-			 */
-			if (bi->end > bj->start && bi->start < bj->end) {
-				if (bi->nid != bj->nid) {
-					pr_err("NUMA: node %d (%Lx-%Lx) overlaps with node %d (%Lx-%Lx)\n",
-					       bi->nid, bi->start, bi->end,
-					       bj->nid, bj->start, bj->end);
-					return -EINVAL;
-				}
-				pr_warning("NUMA: Warning: node %d (%Lx-%Lx) overlaps with itself (%Lx-%Lx)\n",
-					   bi->nid, bi->start, bi->end,
-					   bj->start, bj->end);
-			}
-
-			/*
-			 * Join together blocks on the same node, holes
-			 * between which don't overlap with memory on other
-			 * nodes.
-			 */
-			if (bi->nid != bj->nid)
-				continue;
-			start = max(min(bi->start, bj->start), low);
-			end = min(max(bi->end, bj->end), high);
-			for (k = 0; k < mi->nr_blks; k++) {
-				struct numa_memblk *bk = &mi->blk[k];
-
-				if (bi->nid == bk->nid)
-					continue;
-				if (start < bk->end && end > bk->start)
-					break;
-			}
-			if (k < mi->nr_blks)
-				continue;
-			printk(KERN_INFO "NUMA: Node %d [%Lx,%Lx) + [%Lx,%Lx) -> [%lx,%lx)\n",
-			       bi->nid, bi->start, bi->end, bj->start, bj->end,
-			       start, end);
-			bi->start = start;
-			bi->end = end;
-			numa_remove_memblk_from(j--, mi);
-		}
-	}
-
-	for (i = mi->nr_blks; i < ARRAY_SIZE(mi->blk); i++) {
-		mi->blk[i].start = mi->blk[i].end = 0;
-		mi->blk[i].nid = NUMA_NO_NODE;
-	}
-
-	return 0;
-}
-
-/*
- * Set nodes, which have memory in @mi, in *@nodemask.
- */
-static void __init numa_nodemask_from_meminfo(nodemask_t *nodemask,
-					      const struct numa_meminfo *mi)
-{
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(mi->blk); i++)
-		if (mi->blk[i].start != mi->blk[i].end &&
-		    mi->blk[i].nid != NUMA_NO_NODE)
-			node_set(mi->blk[i].nid, *nodemask);
-}
-
-/**
- * numa_reset_distance - Reset NUMA distance table
- *
- * The current table is freed.  The next numa_set_distance() call will
- * create a new one.
- */
-void __init numa_reset_distance(void)
-{
-	size_t size = numa_distance_cnt * numa_distance_cnt * sizeof(numa_distance[0]);
-
-	/* numa_distance could be 1LU marking allocation failure, test cnt */
-	if (numa_distance_cnt)
-		memblock_x86_free_range(__pa(numa_distance),
-					__pa(numa_distance) + size);
-	numa_distance_cnt = 0;
-	numa_distance = NULL;	/* enable table creation */
-}
-
-static int __init numa_alloc_distance(void)
-{
-	nodemask_t nodes_parsed;
-	size_t size;
-	int i, j, cnt = 0;
-	u64 phys;
-
-	/* size the new table and allocate it */
-	nodes_parsed = numa_nodes_parsed;
-	numa_nodemask_from_meminfo(&nodes_parsed, &numa_meminfo);
-
-	for_each_node_mask(i, nodes_parsed)
-		cnt = i;
-	cnt++;
-	size = cnt * cnt * sizeof(numa_distance[0]);
-
-	phys = memblock_find_in_range(0, (u64)max_pfn_mapped << PAGE_SHIFT,
-				      size, PAGE_SIZE);
-	if (phys == MEMBLOCK_ERROR) {
-		pr_warning("NUMA: Warning: can't allocate distance table!\n");
-		/* don't retry until explicitly reset */
-		numa_distance = (void *)1LU;
-		return -ENOMEM;
-	}
-	memblock_x86_reserve_range(phys, phys + size, "NUMA DIST");
-
-	numa_distance = __va(phys);
-	numa_distance_cnt = cnt;
-
-	/* fill with the default distances */
-	for (i = 0; i < cnt; i++)
-		for (j = 0; j < cnt; j++)
-			numa_distance[i * cnt + j] = i == j ?
-				LOCAL_DISTANCE : REMOTE_DISTANCE;
-	printk(KERN_DEBUG "NUMA: Initialized distance table, cnt=%d\n", cnt);
-
-	return 0;
-}
-
-/**
- * numa_set_distance - Set NUMA distance from one NUMA to another
- * @from: the 'from' node to set distance
- * @to: the 'to'  node to set distance
- * @distance: NUMA distance
- *
- * Set the distance from node @from to @to to @distance.  If distance table
- * doesn't exist, one which is large enough to accommodate all the currently
- * known nodes will be created.
- *
- * If such table cannot be allocated, a warning is printed and further
- * calls are ignored until the distance table is reset with
- * numa_reset_distance().
- *
- * If @from or @to is higher than the highest known node at the time of
- * table creation or @distance doesn't make sense, the call is ignored.
- * This is to allow simplification of specific NUMA config implementations.
- */
-void __init numa_set_distance(int from, int to, int distance)
-{
-	if (!numa_distance && numa_alloc_distance() < 0)
-		return;
-
-	if (from >= numa_distance_cnt || to >= numa_distance_cnt) {
-		printk_once(KERN_DEBUG "NUMA: Debug: distance out of bound, from=%d to=%d distance=%d\n",
-			    from, to, distance);
-		return;
-	}
-
-	if ((u8)distance != distance ||
-	    (from == to && distance != LOCAL_DISTANCE)) {
-		pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
-			     from, to, distance);
-		return;
-	}
-
-	numa_distance[from * numa_distance_cnt + to] = distance;
-}
-
-int __node_distance(int from, int to)
-{
-	if (from >= numa_distance_cnt || to >= numa_distance_cnt)
-		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
-	return numa_distance[from * numa_distance_cnt + to];
-}
-EXPORT_SYMBOL(__node_distance);
-
-/*
- * Sanity check to catch more bad NUMA configurations (they are amazingly
- * common).  Make sure the nodes cover all memory.
- */
-static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
-{
-	unsigned long numaram, e820ram;
-	int i;
-
-	numaram = 0;
-	for (i = 0; i < mi->nr_blks; i++) {
-		unsigned long s = mi->blk[i].start >> PAGE_SHIFT;
-		unsigned long e = mi->blk[i].end >> PAGE_SHIFT;
-		numaram += e - s;
-		numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
-		if ((long)numaram < 0)
-			numaram = 0;
-	}
-
-	e820ram = max_pfn - (memblock_x86_hole_size(0,
-					max_pfn << PAGE_SHIFT) >> PAGE_SHIFT);
-	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
-	if ((long)(e820ram - numaram) >= (1 << (20 - PAGE_SHIFT))) {
-		printk(KERN_ERR "NUMA: nodes only cover %luMB of your %luMB e820 RAM. Not used.\n",
-		       (numaram << PAGE_SHIFT) >> 20,
-		       (e820ram << PAGE_SHIFT) >> 20);
-		return false;
-	}
-	return true;
-}
-
-static int __init numa_register_memblks(struct numa_meminfo *mi)
-{
-	int i, nid;
-
-	/* Account for nodes with cpus and no memory */
-	node_possible_map = numa_nodes_parsed;
-	numa_nodemask_from_meminfo(&node_possible_map, mi);
-	if (WARN_ON(nodes_empty(node_possible_map)))
-		return -EINVAL;
-
-	for (i = 0; i < mi->nr_blks; i++)
-		memblock_x86_register_active_regions(mi->blk[i].nid,
-					mi->blk[i].start >> PAGE_SHIFT,
-					mi->blk[i].end >> PAGE_SHIFT);
-
-	/* for out of order entries */
-	sort_node_map();
-	if (!numa_meminfo_cover_memory(mi))
-		return -EINVAL;
-
-	/* Finally register nodes. */
-	for_each_node_mask(nid, node_possible_map) {
-		u64 start = (u64)max_pfn << PAGE_SHIFT;
-		u64 end = 0;
-
-		for (i = 0; i < mi->nr_blks; i++) {
-			if (nid != mi->blk[i].nid)
-				continue;
-			start = min(mi->blk[i].start, start);
-			end = max(mi->blk[i].end, end);
-		}
-
-		if (start < end)
-			setup_node_bootmem(nid, start, end);
-	}
-
-	return 0;
-}
-
-/**
- * dummy_numma_init - Fallback dummy NUMA init
- *
- * Used if there's no underlying NUMA architecture, NUMA initialization
- * fails, or NUMA is disabled on the command line.
- *
- * Must online at least one node and add memory blocks that cover all
- * allowed memory.  This function must not fail.
- */
-static int __init dummy_numa_init(void)
-{
-	printk(KERN_INFO "%s\n",
-	       numa_off ? "NUMA turned off" : "No NUMA configuration found");
-	printk(KERN_INFO "Faking a node at %016lx-%016lx\n",
-	       0LU, max_pfn << PAGE_SHIFT);
-
-	node_set(0, numa_nodes_parsed);
-	numa_add_memblk(0, 0, (u64)max_pfn << PAGE_SHIFT);
-
-	return 0;
-}
-
-static int __init numa_init(int (*init_func)(void))
-{
-	int i;
-	int ret;
-
-	for (i = 0; i < MAX_LOCAL_APIC; i++)
-		set_apicid_to_node(i, NUMA_NO_NODE);
-
-	nodes_clear(numa_nodes_parsed);
-	nodes_clear(node_possible_map);
-	nodes_clear(node_online_map);
-	memset(&numa_meminfo, 0, sizeof(numa_meminfo));
-	remove_all_active_ranges();
-	numa_reset_distance();
-
-	ret = init_func();
-	if (ret < 0)
-		return ret;
-	ret = numa_cleanup_meminfo(&numa_meminfo);
-	if (ret < 0)
-		return ret;
-
-	numa_emulation(&numa_meminfo, numa_distance_cnt);
-
-	ret = numa_register_memblks(&numa_meminfo);
-	if (ret < 0)
-		return ret;
-
-	for (i = 0; i < nr_cpu_ids; i++) {
-		int nid = early_cpu_to_node(i);
-
-		if (nid == NUMA_NO_NODE)
-			continue;
-		if (!node_online(nid))
-			numa_clear_node(i);
-	}
-	numa_init_array();
-	return 0;
-}
-
 void __init initmem_init(void)
 {
-	if (!numa_off) {
-#ifdef CONFIG_ACPI_NUMA
-		if (!numa_init(x86_acpi_numa_init))
-			return;
-#endif
-#ifdef CONFIG_AMD_NUMA
-		if (!numa_init(amd_numa_init))
-			return;
-#endif
-	}
-
-	numa_init(dummy_numa_init);
+	x86_numa_init();
 }
 
 unsigned long __init numa_free_all_bootmem(void)
@@ -509,18 +23,3 @@ unsigned long __init numa_free_all_bootmem(void)
 
 	return pages;
 }
-
-#ifdef CONFIG_MEMORY_HOTPLUG
-int memory_add_physaddr_to_nid(u64 start)
-{
-	struct numa_meminfo *mi = &numa_meminfo;
-	int nid = mi->blk[0].nid;
-	int i;
-
-	for (i = 0; i < mi->nr_blks; i++)
-		if (mi->blk[i].start <= start && mi->blk[i].end > start)
-			nid = mi->blk[i].nid;
-	return nid;
-}
-EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
-#endif
diff --git a/arch/x86/mm/numa_internal.h b/arch/x86/mm/numa_internal.h
index ef2d973..ad86ec9 100644
--- a/arch/x86/mm/numa_internal.h
+++ b/arch/x86/mm/numa_internal.h
@@ -19,6 +19,8 @@ void __init numa_remove_memblk_from(int idx, struct numa_meminfo *mi);
 int __init numa_cleanup_meminfo(struct numa_meminfo *mi);
 void __init numa_reset_distance(void);
 
+void __init x86_numa_init(void);
+
 #ifdef CONFIG_NUMA_EMU
 void __init numa_emulation(struct numa_meminfo *numa_meminfo,
 			   int numa_dist_cnt);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 17/25] x86, NUMA: Enable build of generic NUMA init code on 32bit
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (15 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 16/25] x86, NUMA: Move NUMA init logic from numa_64.c to numa.c Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 18/25] x86, NUMA: Remove long 64bit assumption from numa.c Tejun Heo
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

Generic NUMA init code was moved from numa_64.c to numa.c but is still
guarded by CONFIG_X86_64.  This patch removes the compile guard and
enables compiling on 32bit.

* numa_add_memblk() and numa_set_distance() clash with the shim
  implementation in numa_32.c and are left out.

* memory_add_physaddr_to_nid() clashes with the 32bit implementation
  and is left out.

* MAX_DMA_PFN definition in dma.h moved out of !CONFIG_X86_32.

* node_data definition in numa_32.c removed in favor of the one in
  numa.c.

There are places where ulong is assumed to be 64bit.  The next patch
will fix them up.  Note that although the code is compiled it isn't
used yet and this patch doesn't cause any functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/dma.h |    6 +++---
 arch/x86/mm/numa.c         |   10 ++++------
 arch/x86/mm/numa_32.c      |    3 ---
 3 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/dma.h b/arch/x86/include/asm/dma.h
index 057099e..d1a314b 100644
--- a/arch/x86/include/asm/dma.h
+++ b/arch/x86/include/asm/dma.h
@@ -69,6 +69,9 @@
 
 #define MAX_DMA_CHANNELS	8
 
+/* 16MB ISA DMA zone */
+#define MAX_DMA_PFN   ((16 * 1024 * 1024) >> PAGE_SHIFT)
+
 #ifdef CONFIG_X86_32
 
 /* The maximum address that we can perform a DMA transfer to on this platform */
@@ -76,9 +79,6 @@
 
 #else
 
-/* 16MB ISA DMA zone */
-#define MAX_DMA_PFN   ((16 * 1024 * 1024) >> PAGE_SHIFT)
-
 /* 4GB broken PCI/AGP hardware bus master zone */
 #define MAX_DMA32_PFN ((4UL * 1024 * 1024 * 1024) >> PAGE_SHIFT)
 
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 3b20547..22dc9ed 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -23,7 +23,6 @@
 int __initdata numa_off;
 nodemask_t numa_nodes_parsed __initdata;
 
-#ifdef CONFIG_X86_64
 struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
 EXPORT_SYMBOL(node_data);
 
@@ -35,7 +34,6 @@ __initdata
 
 static int numa_distance_cnt;
 static u8 *numa_distance;
-#endif
 
 static __init int numa_setup(char *opt)
 {
@@ -134,7 +132,6 @@ void __init setup_node_to_cpumask_map(void)
 	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
 }
 
-#ifdef CONFIG_X86_64
 static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
 				     struct numa_meminfo *mi)
 {
@@ -176,6 +173,7 @@ void __init numa_remove_memblk_from(int idx, struct numa_meminfo *mi)
 		(mi->nr_blks - idx) * sizeof(mi->blk[0]));
 }
 
+#ifdef CONFIG_X86_64
 /**
  * numa_add_memblk - Add one numa_memblk to numa_meminfo
  * @nid: NUMA node ID of the new memblk
@@ -191,6 +189,7 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
 {
 	return numa_add_memblk_to(nid, start, end, &numa_meminfo);
 }
+#endif
 
 /* Initialize bootmem allocator for a node */
 static void __init
@@ -402,6 +401,7 @@ static int __init numa_alloc_distance(void)
 	return 0;
 }
 
+#ifdef CONFIG_X86_64
 /**
  * numa_set_distance - Set NUMA distance from one NUMA to another
  * @from: the 'from' node to set distance
@@ -440,6 +440,7 @@ void __init numa_set_distance(int from, int to, int distance)
 
 	numa_distance[from * numa_distance_cnt + to] = distance;
 }
+#endif
 
 int __node_distance(int from, int to)
 {
@@ -518,7 +519,6 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 
 	return 0;
 }
-#endif
 
 /*
  * There are unfortunately some poorly designed mainboards around that
@@ -542,7 +542,6 @@ void __init numa_init_array(void)
 	}
 }
 
-#ifdef CONFIG_X86_64
 static int __init numa_init(int (*init_func)(void))
 {
 	int i;
@@ -627,7 +626,6 @@ void __init x86_numa_init(void)
 
 	numa_init(dummy_numa_init);
 }
-#endif
 
 static __init int find_near_online_node(int node)
 {
diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index 14135e5..975a76f 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -41,9 +41,6 @@
 #include <asm/bios_ebda.h>
 #include <asm/proto.h>
 
-struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
-EXPORT_SYMBOL(node_data);
-
 /*
  * numa interface - we expect the numa architecture specific code to have
  *                  populated the following initialisation.
-- 
1.7.1



* [PATCH 18/25] x86, NUMA: Remove long 64bit assumption from numa.c
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (16 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 17/25] x86, NUMA: Enable build of generic NUMA init code on 32bit Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 19/25] x86-32, NUMA: Add @start and @end to init_alloc_remap() Tejun Heo
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

Code moved from numa_64.c assumes in several places that long is
64bit.  This patch removes the assumption by using {s|u}64 types
explicitly, using PFN_PHYS() for page number -> address conversions,
and adjusting printk formats.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/mm/numa.c |   45 ++++++++++++++++++++++-----------------------
 1 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 22dc9ed..b2fca54 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -192,13 +192,12 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
 #endif
 
 /* Initialize bootmem allocator for a node */
-static void __init
-setup_node_bootmem(int nid, unsigned long start, unsigned long end)
+static void __init setup_node_bootmem(int nid, u64 start, u64 end)
 {
-	const u64 nd_low = (u64)MAX_DMA_PFN << PAGE_SHIFT;
-	const u64 nd_high = (u64)max_pfn_mapped << PAGE_SHIFT;
+	const u64 nd_low = PFN_PHYS(MAX_DMA_PFN);
+	const u64 nd_high = PFN_PHYS(max_pfn_mapped);
 	const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
-	unsigned long nd_pa;
+	u64 nd_pa;
 	int tnid;
 
 	/*
@@ -210,7 +209,7 @@ setup_node_bootmem(int nid, unsigned long start, unsigned long end)
 
 	start = roundup(start, ZONE_ALIGN);
 
-	printk(KERN_INFO "Initmem setup node %d %016lx-%016lx\n",
+	printk(KERN_INFO "Initmem setup node %d %016Lx-%016Lx\n",
 	       nid, start, end);
 
 	/*
@@ -223,13 +222,13 @@ setup_node_bootmem(int nid, unsigned long start, unsigned long end)
 		nd_pa = memblock_find_in_range(nd_low, nd_high,
 					       nd_size, SMP_CACHE_BYTES);
 	if (nd_pa == MEMBLOCK_ERROR) {
-		pr_err("Cannot find %lu bytes in node %d\n", nd_size, nid);
+		pr_err("Cannot find %zu bytes in node %d\n", nd_size, nid);
 		return;
 	}
 	memblock_x86_reserve_range(nd_pa, nd_pa + nd_size, "NODE_DATA");
 
 	/* report and initialize */
-	printk(KERN_INFO "  NODE_DATA [%016lx - %016lx]\n",
+	printk(KERN_INFO "  NODE_DATA [%016Lx - %016Lx]\n",
 	       nd_pa, nd_pa + nd_size - 1);
 	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
 	if (tnid != nid)
@@ -257,7 +256,7 @@ setup_node_bootmem(int nid, unsigned long start, unsigned long end)
 int __init numa_cleanup_meminfo(struct numa_meminfo *mi)
 {
 	const u64 low = 0;
-	const u64 high = (u64)max_pfn << PAGE_SHIFT;
+	const u64 high = PFN_PHYS(max_pfn);
 	int i, j, k;
 
 	for (i = 0; i < mi->nr_blks; i++) {
@@ -275,7 +274,7 @@ int __init numa_cleanup_meminfo(struct numa_meminfo *mi)
 
 		for (j = i + 1; j < mi->nr_blks; j++) {
 			struct numa_memblk *bj = &mi->blk[j];
-			unsigned long start, end;
+			u64 start, end;
 
 			/*
 			 * See whether there are overlapping blocks.  Whine
@@ -313,7 +312,7 @@ int __init numa_cleanup_meminfo(struct numa_meminfo *mi)
 			}
 			if (k < mi->nr_blks)
 				continue;
-			printk(KERN_INFO "NUMA: Node %d [%Lx,%Lx) + [%Lx,%Lx) -> [%lx,%lx)\n",
+			printk(KERN_INFO "NUMA: Node %d [%Lx,%Lx) + [%Lx,%Lx) -> [%Lx,%Lx)\n",
 			       bi->nid, bi->start, bi->end, bj->start, bj->end,
 			       start, end);
 			bi->start = start;
@@ -378,7 +377,7 @@ static int __init numa_alloc_distance(void)
 	cnt++;
 	size = cnt * cnt * sizeof(numa_distance[0]);
 
-	phys = memblock_find_in_range(0, (u64)max_pfn_mapped << PAGE_SHIFT,
+	phys = memblock_find_in_range(0, PFN_PHYS(max_pfn_mapped),
 				      size, PAGE_SIZE);
 	if (phys == MEMBLOCK_ERROR) {
 		pr_warning("NUMA: Warning: can't allocate distance table!\n");
@@ -456,24 +455,24 @@ EXPORT_SYMBOL(__node_distance);
  */
 static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
 {
-	unsigned long numaram, e820ram;
+	u64 numaram, e820ram;
 	int i;
 
 	numaram = 0;
 	for (i = 0; i < mi->nr_blks; i++) {
-		unsigned long s = mi->blk[i].start >> PAGE_SHIFT;
-		unsigned long e = mi->blk[i].end >> PAGE_SHIFT;
+		u64 s = mi->blk[i].start >> PAGE_SHIFT;
+		u64 e = mi->blk[i].end >> PAGE_SHIFT;
 		numaram += e - s;
 		numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
-		if ((long)numaram < 0)
+		if ((s64)numaram < 0)
 			numaram = 0;
 	}
 
 	e820ram = max_pfn - (memblock_x86_hole_size(0,
-					max_pfn << PAGE_SHIFT) >> PAGE_SHIFT);
+					PFN_PHYS(max_pfn)) >> PAGE_SHIFT);
 	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
-	if ((long)(e820ram - numaram) >= (1 << (20 - PAGE_SHIFT))) {
-		printk(KERN_ERR "NUMA: nodes only cover %luMB of your %luMB e820 RAM. Not used.\n",
+	if ((s64)(e820ram - numaram) >= (1 << (20 - PAGE_SHIFT))) {
+		printk(KERN_ERR "NUMA: nodes only cover %LuMB of your %LuMB e820 RAM. Not used.\n",
 		       (numaram << PAGE_SHIFT) >> 20,
 		       (e820ram << PAGE_SHIFT) >> 20);
 		return false;
@@ -503,7 +502,7 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 
 	/* Finally register nodes. */
 	for_each_node_mask(nid, node_possible_map) {
-		u64 start = (u64)max_pfn << PAGE_SHIFT;
+		u64 start = PFN_PHYS(max_pfn);
 		u64 end = 0;
 
 		for (i = 0; i < mi->nr_blks; i++) {
@@ -595,11 +594,11 @@ static int __init dummy_numa_init(void)
 {
 	printk(KERN_INFO "%s\n",
 	       numa_off ? "NUMA turned off" : "No NUMA configuration found");
-	printk(KERN_INFO "Faking a node at %016lx-%016lx\n",
-	       0LU, max_pfn << PAGE_SHIFT);
+	printk(KERN_INFO "Faking a node at %016Lx-%016Lx\n",
+	       0LLU, PFN_PHYS(max_pfn));
 
 	node_set(0, numa_nodes_parsed);
-	numa_add_memblk(0, 0, (u64)max_pfn << PAGE_SHIFT);
+	numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
 
 	return 0;
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 19/25] x86-32, NUMA: Add @start and @end to init_alloc_remap()
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (17 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 18/25] x86, NUMA: Remove long 64bit assumption from numa.c Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 20/25] x86, NUMA: Initialize and use remap allocator from setup_node_bootmem() Tejun Heo
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

Instead of dereferencing node_start/end_pfn[] directly, make
init_alloc_remap() take @start and @end, and let the caller be
responsible for making sure the range is sane.  This prepares it for
use from the unified NUMA init code.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/mm/numa_32.c |   29 ++++++++++++++---------------
 1 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index 975a76f..9008632 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -265,8 +265,10 @@ void resume_map_numa_kva(pgd_t *pgd_base)
  * opportunistically and the callers will fall back to other memory
  * allocation mechanisms on failure.
  */
-static __init void init_alloc_remap(int nid)
+static __init void init_alloc_remap(int nid, u64 start, u64 end)
 {
+	unsigned long start_pfn = start >> PAGE_SHIFT;
+	unsigned long end_pfn = end >> PAGE_SHIFT;
 	unsigned long size, pfn;
 	u64 node_pa, remap_pa;
 	void *remap_va;
@@ -276,24 +278,15 @@ static __init void init_alloc_remap(int nid)
 	 * memory could be added but not currently present.
 	 */
 	printk(KERN_DEBUG "node %d pfn: [%lx - %lx]\n",
-	       nid, node_start_pfn[nid], node_end_pfn[nid]);
-	if (node_start_pfn[nid] > max_pfn)
-		return;
-	if (!node_end_pfn[nid])
-		return;
-	if (node_end_pfn[nid] > max_pfn)
-		node_end_pfn[nid] = max_pfn;
+	       nid, start_pfn, end_pfn);
 
 	/* calculate the necessary space aligned to large page size */
-	size = node_memmap_size_bytes(nid, node_start_pfn[nid],
-				      min(node_end_pfn[nid], max_pfn));
+	size = node_memmap_size_bytes(nid, start_pfn, end_pfn);
 	size += ALIGN(sizeof(pg_data_t), PAGE_SIZE);
 	size = ALIGN(size, LARGE_PAGE_BYTES);
 
 	/* allocate node memory and the lowmem remap area */
-	node_pa = memblock_find_in_range(node_start_pfn[nid] << PAGE_SHIFT,
-					 (u64)node_end_pfn[nid] << PAGE_SHIFT,
-					 size, LARGE_PAGE_BYTES);
+	node_pa = memblock_find_in_range(start, end, size, LARGE_PAGE_BYTES);
 	if (node_pa == MEMBLOCK_ERROR) {
 		pr_warning("remap_alloc: failed to allocate %lu bytes for node %d\n",
 			   size, nid);
@@ -391,8 +384,14 @@ void __init initmem_init(void)
 	get_memcfg_numa();
 	numa_init_array();
 
-	for_each_online_node(nid)
-		init_alloc_remap(nid);
+	for_each_online_node(nid) {
+		u64 start = (u64)node_start_pfn[nid] << PAGE_SHIFT;
+		u64 end = min((u64)node_end_pfn[nid] << PAGE_SHIFT,
+			      (u64)max_pfn << PAGE_SHIFT);
+
+		if (start < end)
+			init_alloc_remap(nid, start, end);
+	}
 
 #ifdef CONFIG_HIGHMEM
 	highstart_pfn = highend_pfn = max_pfn;
-- 
1.7.1



* [PATCH 20/25] x86, NUMA: Initialize and use remap allocator from setup_node_bootmem()
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (18 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 19/25] x86-32, NUMA: Add @start and @end to init_alloc_remap() Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 21/25] x86, NUMA: Make 32bit use common NUMA init path Tejun Heo
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

setup_node_bootmem() is taken from 64bit and doesn't use the remap
allocator.  It's about to be shared with 32bit, so add support for it.
If NODE_DATA is remapped, that is noted in the debug message and the
node locality check is skipped, as the __pa() of the remapped address
doesn't reflect the actual physical address.

On 64bit, the remap allocator becomes a noop and the behavior is
unchanged.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/mm/numa.c          |   41 +++++++++++++++++++++++++++--------------
 arch/x86/mm/numa_32.c       |    2 +-
 arch/x86/mm/numa_internal.h |    6 ++++++
 3 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index b2fca54..a37b382 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -197,7 +197,9 @@ static void __init setup_node_bootmem(int nid, u64 start, u64 end)
 	const u64 nd_low = PFN_PHYS(MAX_DMA_PFN);
 	const u64 nd_high = PFN_PHYS(max_pfn_mapped);
 	const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
+	bool remapped = false;
 	u64 nd_pa;
+	void *nd;
 	int tnid;
 
 	/*
@@ -207,34 +209,45 @@ static void __init setup_node_bootmem(int nid, u64 start, u64 end)
 	if (end && (end - start) < NODE_MIN_SIZE)
 		return;
 
+	/* initialize remap allocator before aligning to ZONE_ALIGN */
+	init_alloc_remap(nid, start, end);
+
 	start = roundup(start, ZONE_ALIGN);
 
 	printk(KERN_INFO "Initmem setup node %d %016Lx-%016Lx\n",
 	       nid, start, end);
 
 	/*
-	 * Try to allocate node data on local node and then fall back to
-	 * all nodes.  Never allocate in DMA zone.
+	 * Allocate node data.  Try remap allocator first, node-local
+	 * memory and then any node.  Never allocate in DMA zone.
 	 */
-	nd_pa = memblock_x86_find_in_range_node(nid, nd_low, nd_high,
+	nd = alloc_remap(nid, nd_size);
+	if (nd) {
+		nd_pa = __pa(nd);
+		remapped = true;
+	} else {
+		nd_pa = memblock_x86_find_in_range_node(nid, nd_low, nd_high,
 						nd_size, SMP_CACHE_BYTES);
-	if (nd_pa == MEMBLOCK_ERROR)
-		nd_pa = memblock_find_in_range(nd_low, nd_high,
-					       nd_size, SMP_CACHE_BYTES);
-	if (nd_pa == MEMBLOCK_ERROR) {
-		pr_err("Cannot find %zu bytes in node %d\n", nd_size, nid);
-		return;
+		if (nd_pa == MEMBLOCK_ERROR)
+			nd_pa = memblock_find_in_range(nd_low, nd_high,
+						nd_size, SMP_CACHE_BYTES);
+		if (nd_pa == MEMBLOCK_ERROR) {
+			pr_err("Cannot find %zu bytes in node %d\n",
+			       nd_size, nid);
+			return;
+		}
+		memblock_x86_reserve_range(nd_pa, nd_pa + nd_size, "NODE_DATA");
+		nd = __va(nd_pa);
 	}
-	memblock_x86_reserve_range(nd_pa, nd_pa + nd_size, "NODE_DATA");
 
 	/* report and initialize */
-	printk(KERN_INFO "  NODE_DATA [%016Lx - %016Lx]\n",
-	       nd_pa, nd_pa + nd_size - 1);
+	printk(KERN_INFO "  NODE_DATA [%016Lx - %016Lx]%s\n",
+	       nd_pa, nd_pa + nd_size - 1, remapped ? " (remapped)" : "");
 	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
-	if (tnid != nid)
+	if (!remapped && tnid != nid)
 		printk(KERN_INFO "    NODE_DATA(%d) on node %d\n", nid, tnid);
 
-	node_data[nid] = __va(nd_pa);
+	node_data[nid] = nd;
 	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
 	NODE_DATA(nid)->node_id = nid;
 	NODE_DATA(nid)->node_start_pfn = start >> PAGE_SHIFT;
diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index 9008632..fbd558f 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -265,7 +265,7 @@ void resume_map_numa_kva(pgd_t *pgd_base)
  * opportunistically and the callers will fall back to other memory
  * allocation mechanisms on failure.
  */
-static __init void init_alloc_remap(int nid, u64 start, u64 end)
+void __init init_alloc_remap(int nid, u64 start, u64 end)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long end_pfn = end >> PAGE_SHIFT;
diff --git a/arch/x86/mm/numa_internal.h b/arch/x86/mm/numa_internal.h
index ad86ec9..7178c3a 100644
--- a/arch/x86/mm/numa_internal.h
+++ b/arch/x86/mm/numa_internal.h
@@ -21,6 +21,12 @@ void __init numa_reset_distance(void);
 
 void __init x86_numa_init(void);
 
+#ifdef CONFIG_X86_64
+static inline void init_alloc_remap(int nid, u64 start, u64 end)	{ }
+#else
+void __init init_alloc_remap(int nid, u64 start, u64 end);
+#endif
+
 #ifdef CONFIG_NUMA_EMU
 void __init numa_emulation(struct numa_meminfo *numa_meminfo,
 			   int numa_dist_cnt);
-- 
1.7.1



* [PATCH 21/25] x86, NUMA: Make 32bit use common NUMA init path
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (19 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 20/25] x86, NUMA: Initialize and use remap allocator from setup_node_bootmem() Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 22/25] x86, NUMA: Make numa_init_array() static Tejun Heo
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

With both _numa_init() methods converted and the rest of the init code
adjusted, numa_32.c can now switch from the 32bit-only init code to
the common one in numa.c.

* Shim get_memcfg_*()'s are dropped and initmem_init() calls
  x86_numa_init(), which is updated to handle NUMAQ.

* All boilerplate operations including node range limiting and pgdat
  alloc/init are handled by numa_init().  The 32bit-only implementation
  is removed.

* The 32bit numa_add_memblk(), numa_set_distance() and
  memory_add_physaddr_to_nid() are removed and the common versions in
  numa.c are enabled for 32bit.

This change causes the following behavior changes.

* NODE_DATA()->node_start_pfn/node_spanned_pages properly initialized
  for 32bit too.

* Many more sanity checks and configuration cleanups.

* Proper handling of node distances.

* The same NUMA init messages as 64bit.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/topology.h |    7 -
 arch/x86/mm/numa.c              |   10 +-
 arch/x86/mm/numa_32.c           |  232 +--------------------------------------
 3 files changed, 7 insertions(+), 242 deletions(-)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 8dba769..c006924 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -93,18 +93,11 @@ extern void setup_node_to_cpumask_map(void);
 #define pcibus_to_node(bus) __pcibus_to_node(bus)
 
 #ifdef CONFIG_X86_32
-extern unsigned long node_start_pfn[];
-extern unsigned long node_end_pfn[];
-#define node_has_online_mem(nid) (node_start_pfn[nid] != node_end_pfn[nid])
-
 # define SD_CACHE_NICE_TRIES	1
 # define SD_IDLE_IDX		1
-
 #else
-
 # define SD_CACHE_NICE_TRIES	2
 # define SD_IDLE_IDX		2
-
 #endif
 
 /* sched_domains SD_NODE_INIT for NUMA machines */
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index a37b382..e6bc804 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -173,7 +173,6 @@ void __init numa_remove_memblk_from(int idx, struct numa_meminfo *mi)
 		(mi->nr_blks - idx) * sizeof(mi->blk[0]));
 }
 
-#ifdef CONFIG_X86_64
 /**
  * numa_add_memblk - Add one numa_memblk to numa_meminfo
  * @nid: NUMA node ID of the new memblk
@@ -189,7 +188,6 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
 {
 	return numa_add_memblk_to(nid, start, end, &numa_meminfo);
 }
-#endif
 
 /* Initialize bootmem allocator for a node */
 static void __init setup_node_bootmem(int nid, u64 start, u64 end)
@@ -413,7 +411,6 @@ static int __init numa_alloc_distance(void)
 	return 0;
 }
 
-#ifdef CONFIG_X86_64
 /**
  * numa_set_distance - Set NUMA distance from one NUMA to another
  * @from: the 'from' node to set distance
@@ -452,7 +449,6 @@ void __init numa_set_distance(int from, int to, int distance)
 
 	numa_distance[from * numa_distance_cnt + to] = distance;
 }
-#endif
 
 int __node_distance(int from, int to)
 {
@@ -626,6 +622,10 @@ static int __init dummy_numa_init(void)
 void __init x86_numa_init(void)
 {
 	if (!numa_off) {
+#ifdef CONFIG_X86_NUMAQ
+		if (!numa_init(numaq_numa_init))
+			return;
+#endif
 #ifdef CONFIG_ACPI_NUMA
 		if (!numa_init(x86_acpi_numa_init))
 			return;
@@ -810,7 +810,7 @@ EXPORT_SYMBOL(cpumask_of_node);
 
 #endif	/* !CONFIG_DEBUG_PER_CPU_MAPS */
 
-#if defined(CONFIG_X86_64) && defined(CONFIG_MEMORY_HOTPLUG)
+#ifdef CONFIG_MEMORY_HOTPLUG
 int memory_add_physaddr_to_nid(u64 start)
 {
 	struct numa_meminfo *mi = &numa_meminfo;
diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index fbd558f..c930e41 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -22,36 +22,10 @@
  * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
  */
 
-#include <linux/mm.h>
 #include <linux/bootmem.h>
 #include <linux/memblock.h>
-#include <linux/mmzone.h>
-#include <linux/highmem.h>
-#include <linux/initrd.h>
-#include <linux/nodemask.h>
-#include <linux/module.h>
-#include <linux/kexec.h>
-#include <linux/pfn.h>
-#include <linux/swap.h>
-#include <linux/acpi.h>
-
-#include <asm/e820.h>
-#include <asm/setup.h>
-#include <asm/mmzone.h>
-#include <asm/bios_ebda.h>
-#include <asm/proto.h>
-
-/*
- * numa interface - we expect the numa architecture specific code to have
- *                  populated the following initialisation.
- *
- * 1) node_online_map  - the map of all nodes configured (online) in the system
- * 2) node_start_pfn   - the starting page frame number for a node
- * 3) node_end_pfn     - the ending page fram number for a node
- */
-unsigned long node_start_pfn[MAX_NUMNODES] __read_mostly;
-unsigned long node_end_pfn[MAX_NUMNODES] __read_mostly;
 
+#include "numa_internal.h"
 
 #ifdef CONFIG_DISCONTIGMEM
 /*
@@ -96,7 +70,6 @@ unsigned long node_memmap_size_bytes(int nid, unsigned long start_pfn,
 }
 #endif
 
-extern unsigned long find_max_low_pfn(void);
 extern unsigned long highend_pfn, highstart_pfn;
 
 #define LARGE_PAGE_BYTES (PTRS_PER_PTE * PAGE_SIZE)
@@ -105,68 +78,6 @@ static void *node_remap_start_vaddr[MAX_NUMNODES];
 void set_pmd_pfn(unsigned long vaddr, unsigned long pfn, pgprot_t flags);
 
 /*
- * FLAT - support for basic PC memory model with discontig enabled, essentially
- *        a single node with all available processors in it with a flat
- *        memory map.
- */
-static int __init get_memcfg_numa_flat(void)
-{
-	printk(KERN_DEBUG "NUMA - single node, flat memory mode\n");
-
-	node_start_pfn[0] = 0;
-	node_end_pfn[0] = max_pfn;
-	memblock_x86_register_active_regions(0, 0, max_pfn);
-
-        /* Indicate there is one node available. */
-	nodes_clear(node_online_map);
-	node_set_online(0);
-	return 1;
-}
-
-/*
- * Find the highest page frame number we have available for the node
- */
-static void __init propagate_e820_map_node(int nid)
-{
-	if (node_end_pfn[nid] > max_pfn)
-		node_end_pfn[nid] = max_pfn;
-	/*
-	 * if a user has given mem=XXXX, then we need to make sure 
-	 * that the node _starts_ before that, too, not just ends
-	 */
-	if (node_start_pfn[nid] > max_pfn)
-		node_start_pfn[nid] = max_pfn;
-	BUG_ON(node_start_pfn[nid] > node_end_pfn[nid]);
-}
-
-/* 
- * Allocate memory for the pg_data_t for this node via a crude pre-bootmem
- * method.  For node zero take this from the bottom of memory, for
- * subsequent nodes place them at node_remap_start_vaddr which contains
- * node local data in physically node local memory.  See setup_memory()
- * for details.
- */
-static void __init allocate_pgdat(int nid)
-{
-	char buf[16];
-
-	NODE_DATA(nid) = alloc_remap(nid, ALIGN(sizeof(pg_data_t), PAGE_SIZE));
-	if (!NODE_DATA(nid)) {
-		unsigned long pgdat_phys;
-		pgdat_phys = memblock_find_in_range(min_low_pfn<<PAGE_SHIFT,
-				 max_pfn_mapped<<PAGE_SHIFT,
-				 sizeof(pg_data_t),
-				 PAGE_SIZE);
-		NODE_DATA(nid) = (pg_data_t *)(pfn_to_kaddr(pgdat_phys>>PAGE_SHIFT));
-		memset(buf, 0, sizeof(buf));
-		sprintf(buf, "NODE_DATA %d",  nid);
-		memblock_x86_reserve_range(pgdat_phys, pgdat_phys + sizeof(pg_data_t), buf);
-	}
-	printk(KERN_DEBUG "allocate_pgdat: node %d NODE_DATA %08lx\n",
-		nid, (unsigned long)NODE_DATA(nid));
-}
-
-/*
  * Remap memory allocator
  */
 static unsigned long node_remap_start_pfn[MAX_NUMNODES];
@@ -322,76 +233,9 @@ void __init init_alloc_remap(int nid, u64 start, u64 end)
 	       nid, node_pa, node_pa + size, remap_va, remap_va + size);
 }
 
-static int get_memcfg_numaq(void)
-{
-#ifdef CONFIG_X86_NUMAQ
-	int nid;
-
-	if (numa_off)
-		return 0;
-
-	if (numaq_numa_init() < 0) {
-		nodes_clear(numa_nodes_parsed);
-		remove_all_active_ranges();
-		return 0;
-	}
-
-	for_each_node_mask(nid, numa_nodes_parsed)
-		node_set_online(nid);
-	sort_node_map();
-	return 1;
-#else
-	return 0;
-#endif
-}
-
-static int get_memcfg_from_srat(void)
-{
-#ifdef CONFIG_ACPI_NUMA
-	int nid;
-
-	if (numa_off)
-		return 0;
-
-	if (x86_acpi_numa_init() < 0) {
-		nodes_clear(numa_nodes_parsed);
-		remove_all_active_ranges();
-		return 0;
-	}
-
-	for_each_node_mask(nid, numa_nodes_parsed)
-		node_set_online(nid);
-	sort_node_map();
-	return 1;
-#else
-	return 0;
-#endif
-}
-
-static void get_memcfg_numa(void)
-{
-	if (get_memcfg_numaq())
-		return;
-	if (get_memcfg_from_srat())
-		return;
-	get_memcfg_numa_flat();
-}
-
 void __init initmem_init(void)
 {
-	int nid;
-
-	get_memcfg_numa();
-	numa_init_array();
-
-	for_each_online_node(nid) {
-		u64 start = (u64)node_start_pfn[nid] << PAGE_SHIFT;
-		u64 end = min((u64)node_end_pfn[nid] << PAGE_SHIFT,
-			      (u64)max_pfn << PAGE_SHIFT);
-
-		if (start < end)
-			init_alloc_remap(nid, start, end);
-	}
+	x86_numa_init();
 
 #ifdef CONFIG_HIGHMEM
 	highstart_pfn = highend_pfn = max_pfn;
@@ -412,81 +256,9 @@ void __init initmem_init(void)
 
 	printk(KERN_DEBUG "Low memory ends at vaddr %08lx\n",
 			(ulong) pfn_to_kaddr(max_low_pfn));
-	for_each_online_node(nid)
-		allocate_pgdat(nid);
 
 	printk(KERN_DEBUG "High memory starts at vaddr %08lx\n",
 			(ulong) pfn_to_kaddr(highstart_pfn));
-	for_each_online_node(nid)
-		propagate_e820_map_node(nid);
-
-	for_each_online_node(nid) {
-		memset(NODE_DATA(nid), 0, sizeof(struct pglist_data));
-		NODE_DATA(nid)->node_id = nid;
-	}
 
 	setup_bootmem_allocator();
 }
-
-#ifdef CONFIG_MEMORY_HOTPLUG
-static int paddr_to_nid(u64 addr)
-{
-	int nid;
-	unsigned long pfn = PFN_DOWN(addr);
-
-	for_each_node(nid)
-		if (node_start_pfn[nid] <= pfn &&
-		    pfn < node_end_pfn[nid])
-			return nid;
-
-	return -1;
-}
-
-/*
- * This function is used to ask node id BEFORE memmap and mem_section's
- * initialization (pfn_to_nid() can't be used yet).
- * If _PXM is not defined on ACPI's DSDT, node id must be found by this.
- */
-int memory_add_physaddr_to_nid(u64 addr)
-{
-	int nid = paddr_to_nid(addr);
-	return (nid >= 0) ? nid : 0;
-}
-
-EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
-#endif
-
-/* temporary shim, will go away soon */
-int __init numa_add_memblk(int nid, u64 start, u64 end)
-{
-	unsigned long start_pfn = start >> PAGE_SHIFT;
-	unsigned long end_pfn = end >> PAGE_SHIFT;
-
-	printk(KERN_DEBUG "nid %d start_pfn %08lx end_pfn %08lx\n",
-	       nid, start_pfn, end_pfn);
-
-	if (start >= (u64)max_pfn << PAGE_SHIFT) {
-		printk(KERN_INFO "Ignoring SRAT pfns: %08lx - %08lx\n",
-		       start_pfn, end_pfn);
-		return 0;
-	}
-
-	node_set_online(nid);
-	memblock_x86_register_active_regions(nid, start_pfn,
-					     min(end_pfn, max_pfn));
-
-	if (!node_has_online_mem(nid)) {
-		node_start_pfn[nid] = start_pfn;
-		node_end_pfn[nid] = end_pfn;
-	} else {
-		node_start_pfn[nid] = min(node_start_pfn[nid], start_pfn);
-		node_end_pfn[nid] = max(node_end_pfn[nid], end_pfn);
-	}
-	return 0;
-}
-
-/* temporary shim, will go away soon */
-void __init numa_set_distance(int from, int to, int distance)
-{
-	/* nada */
-}
-- 
1.7.1



* [PATCH 22/25] x86, NUMA: Make numa_init_array() static
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (20 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 21/25] x86, NUMA: Make 32bit use common NUMA init path Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 23/25] x86, NUMA: Rename amdtopology_64.c to amdtopology.c Tejun Heo
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

numa_init_array() no longer has users outside of numa.c.  Make it
static.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/numa.h |    2 --
 arch/x86/mm/numa.c          |    2 +-
 2 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
index c1934fc..a3f6d3e 100644
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -61,14 +61,12 @@ static inline int numa_cpu_node(int cpu)
 #ifdef CONFIG_NUMA
 extern void __cpuinit numa_set_node(int cpu, int node);
 extern void __cpuinit numa_clear_node(int cpu);
-extern void __init numa_init_array(void);
 extern void __init init_cpu_to_node(void);
 extern void __cpuinit numa_add_cpu(int cpu);
 extern void __cpuinit numa_remove_cpu(int cpu);
 #else	/* CONFIG_NUMA */
 static inline void numa_set_node(int cpu, int node)	{ }
 static inline void numa_clear_node(int cpu)		{ }
-static inline void numa_init_array(void)		{ }
 static inline void init_cpu_to_node(void)		{ }
 static inline void numa_add_cpu(int cpu)		{ }
 static inline void numa_remove_cpu(int cpu)		{ }
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index e6bc804..28e9aad 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -535,7 +535,7 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
  * as the number of CPUs is not known yet. We round robin the existing
  * nodes.
  */
-void __init numa_init_array(void)
+static void __init numa_init_array(void)
 {
 	int rr, i;
 
-- 
1.7.1



* [PATCH 23/25] x86, NUMA: Rename amdtopology_64.c to amdtopology.c
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (21 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 22/25] x86, NUMA: Make numa_init_array() static Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 24/25] x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit too Tejun Heo
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

amdtopology is going to be used by 32bit too, so drop the _64 suffix.
This is a pure rename.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/mm/Makefile         |    2 +-
 arch/x86/mm/amdtopology.c    |  196 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/mm/amdtopology_64.c |  196 ------------------------------------------
 3 files changed, 197 insertions(+), 197 deletions(-)
 create mode 100644 arch/x86/mm/amdtopology.c
 delete mode 100644 arch/x86/mm/amdtopology_64.c

diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 62997be..3d11327 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -23,7 +23,7 @@ mmiotrace-y			:= kmmio.o pf_in.o mmio-mod.o
 obj-$(CONFIG_MMIOTRACE_TEST)	+= testmmiotrace.o
 
 obj-$(CONFIG_NUMA)		+= numa.o numa_$(BITS).o
-obj-$(CONFIG_AMD_NUMA)		+= amdtopology_64.o
+obj-$(CONFIG_AMD_NUMA)		+= amdtopology.o
 obj-$(CONFIG_ACPI_NUMA)		+= srat.o
 obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
 
diff --git a/arch/x86/mm/amdtopology.c b/arch/x86/mm/amdtopology.c
new file mode 100644
index 0000000..0919c26
--- /dev/null
+++ b/arch/x86/mm/amdtopology.c
@@ -0,0 +1,196 @@
+/*
+ * AMD NUMA support.
+ * Discover the memory map and associated nodes.
+ *
+ * This version reads it directly from the AMD northbridge.
+ *
+ * Copyright 2002,2003 Andi Kleen, SuSE Labs.
+ */
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/string.h>
+#include <linux/module.h>
+#include <linux/nodemask.h>
+#include <linux/memblock.h>
+
+#include <asm/io.h>
+#include <linux/pci_ids.h>
+#include <linux/acpi.h>
+#include <asm/types.h>
+#include <asm/mmzone.h>
+#include <asm/proto.h>
+#include <asm/e820.h>
+#include <asm/pci-direct.h>
+#include <asm/numa.h>
+#include <asm/mpspec.h>
+#include <asm/apic.h>
+#include <asm/amd_nb.h>
+
+static unsigned char __initdata nodeids[8];
+
+static __init int find_northbridge(void)
+{
+	int num;
+
+	for (num = 0; num < 32; num++) {
+		u32 header;
+
+		header = read_pci_config(0, num, 0, 0x00);
+		if (header != (PCI_VENDOR_ID_AMD | (0x1100<<16)) &&
+			header != (PCI_VENDOR_ID_AMD | (0x1200<<16)) &&
+			header != (PCI_VENDOR_ID_AMD | (0x1300<<16)))
+			continue;
+
+		header = read_pci_config(0, num, 1, 0x00);
+		if (header != (PCI_VENDOR_ID_AMD | (0x1101<<16)) &&
+			header != (PCI_VENDOR_ID_AMD | (0x1201<<16)) &&
+			header != (PCI_VENDOR_ID_AMD | (0x1301<<16)))
+			continue;
+		return num;
+	}
+
+	return -ENOENT;
+}
+
+static __init void early_get_boot_cpu_id(void)
+{
+	/*
+	 * need to get the APIC ID of the BSP so can use that to
+	 * create apicid_to_node in amd_scan_nodes()
+	 */
+#ifdef CONFIG_X86_MPPARSE
+	/*
+	 * get boot-time SMP configuration:
+	 */
+	if (smp_found_config)
+		early_get_smp_config();
+#endif
+}
+
+int __init amd_numa_init(void)
+{
+	unsigned long start = PFN_PHYS(0);
+	unsigned long end = PFN_PHYS(max_pfn);
+	unsigned numnodes;
+	unsigned long prevbase;
+	int i, j, nb;
+	u32 nodeid, reg;
+	unsigned int bits, cores, apicid_base;
+
+	if (!early_pci_allowed())
+		return -EINVAL;
+
+	nb = find_northbridge();
+	if (nb < 0)
+		return nb;
+
+	pr_info("Scanning NUMA topology in Northbridge %d\n", nb);
+
+	reg = read_pci_config(0, nb, 0, 0x60);
+	numnodes = ((reg >> 4) & 0xF) + 1;
+	if (numnodes <= 1)
+		return -ENOENT;
+
+	pr_info("Number of physical nodes %d\n", numnodes);
+
+	prevbase = 0;
+	for (i = 0; i < 8; i++) {
+		unsigned long base, limit;
+
+		base = read_pci_config(0, nb, 1, 0x40 + i*8);
+		limit = read_pci_config(0, nb, 1, 0x44 + i*8);
+
+		nodeids[i] = nodeid = limit & 7;
+		if ((base & 3) == 0) {
+			if (i < numnodes)
+				pr_info("Skipping disabled node %d\n", i);
+			continue;
+		}
+		if (nodeid >= numnodes) {
+			pr_info("Ignoring excess node %d (%lx:%lx)\n", nodeid,
+				base, limit);
+			continue;
+		}
+
+		if (!limit) {
+			pr_info("Skipping node entry %d (base %lx)\n",
+				i, base);
+			continue;
+		}
+		if ((base >> 8) & 3 || (limit >> 8) & 3) {
+			pr_err("Node %d using interleaving mode %lx/%lx\n",
+			       nodeid, (base >> 8) & 3, (limit >> 8) & 3);
+			return -EINVAL;
+		}
+		if (node_isset(nodeid, numa_nodes_parsed)) {
+			pr_info("Node %d already present, skipping\n",
+				nodeid);
+			continue;
+		}
+
+		limit >>= 16;
+		limit <<= 24;
+		limit |= (1<<24)-1;
+		limit++;
+
+		if (limit > end)
+			limit = end;
+		if (limit <= base)
+			continue;
+
+		base >>= 16;
+		base <<= 24;
+
+		if (base < start)
+			base = start;
+		if (limit > end)
+			limit = end;
+		if (limit == base) {
+			pr_err("Empty node %d\n", nodeid);
+			continue;
+		}
+		if (limit < base) {
+			pr_err("Node %d bogus settings %lx-%lx.\n",
+			       nodeid, base, limit);
+			continue;
+		}
+
+		/* Could sort here, but pun for now. Should not happen anyroads. */
+		if (prevbase > base) {
+			pr_err("Node map not sorted %lx,%lx\n",
+			       prevbase, base);
+			return -EINVAL;
+		}
+
+		pr_info("Node %d MemBase %016lx Limit %016lx\n",
+			nodeid, base, limit);
+
+		prevbase = base;
+		numa_add_memblk(nodeid, base, limit);
+		node_set(nodeid, numa_nodes_parsed);
+	}
+
+	if (!nodes_weight(numa_nodes_parsed))
+		return -ENOENT;
+
+	/*
+	 * We seem to have valid NUMA configuration.  Map apicids to nodes
+	 * using the coreid bits from early_identify_cpu.
+	 */
+	bits = boot_cpu_data.x86_coreid_bits;
+	cores = 1 << bits;
+	apicid_base = 0;
+
+	/* get the APIC ID of the BSP early for systems with apicid lifting */
+	early_get_boot_cpu_id();
+	if (boot_cpu_physical_apicid > 0) {
+		pr_info("BSP APIC ID: %02x\n", boot_cpu_physical_apicid);
+		apicid_base = boot_cpu_physical_apicid;
+	}
+
+	for_each_node_mask(i, numa_nodes_parsed)
+		for (j = apicid_base; j < cores + apicid_base; j++)
+			set_apicid_to_node((i << bits) + j, i);
+
+	return 0;
+}
diff --git a/arch/x86/mm/amdtopology_64.c b/arch/x86/mm/amdtopology_64.c
deleted file mode 100644
index 0919c26..0000000
--- a/arch/x86/mm/amdtopology_64.c
+++ /dev/null
@@ -1,196 +0,0 @@
-/*
- * AMD NUMA support.
- * Discover the memory map and associated nodes.
- *
- * This version reads it directly from the AMD northbridge.
- *
- * Copyright 2002,2003 Andi Kleen, SuSE Labs.
- */
-#include <linux/kernel.h>
-#include <linux/init.h>
-#include <linux/string.h>
-#include <linux/module.h>
-#include <linux/nodemask.h>
-#include <linux/memblock.h>
-
-#include <asm/io.h>
-#include <linux/pci_ids.h>
-#include <linux/acpi.h>
-#include <asm/types.h>
-#include <asm/mmzone.h>
-#include <asm/proto.h>
-#include <asm/e820.h>
-#include <asm/pci-direct.h>
-#include <asm/numa.h>
-#include <asm/mpspec.h>
-#include <asm/apic.h>
-#include <asm/amd_nb.h>
-
-static unsigned char __initdata nodeids[8];
-
-static __init int find_northbridge(void)
-{
-	int num;
-
-	for (num = 0; num < 32; num++) {
-		u32 header;
-
-		header = read_pci_config(0, num, 0, 0x00);
-		if (header != (PCI_VENDOR_ID_AMD | (0x1100<<16)) &&
-			header != (PCI_VENDOR_ID_AMD | (0x1200<<16)) &&
-			header != (PCI_VENDOR_ID_AMD | (0x1300<<16)))
-			continue;
-
-		header = read_pci_config(0, num, 1, 0x00);
-		if (header != (PCI_VENDOR_ID_AMD | (0x1101<<16)) &&
-			header != (PCI_VENDOR_ID_AMD | (0x1201<<16)) &&
-			header != (PCI_VENDOR_ID_AMD | (0x1301<<16)))
-			continue;
-		return num;
-	}
-
-	return -ENOENT;
-}
-
-static __init void early_get_boot_cpu_id(void)
-{
-	/*
-	 * need to get the APIC ID of the BSP so can use that to
-	 * create apicid_to_node in amd_scan_nodes()
-	 */
-#ifdef CONFIG_X86_MPPARSE
-	/*
-	 * get boot-time SMP configuration:
-	 */
-	if (smp_found_config)
-		early_get_smp_config();
-#endif
-}
-
-int __init amd_numa_init(void)
-{
-	unsigned long start = PFN_PHYS(0);
-	unsigned long end = PFN_PHYS(max_pfn);
-	unsigned numnodes;
-	unsigned long prevbase;
-	int i, j, nb;
-	u32 nodeid, reg;
-	unsigned int bits, cores, apicid_base;
-
-	if (!early_pci_allowed())
-		return -EINVAL;
-
-	nb = find_northbridge();
-	if (nb < 0)
-		return nb;
-
-	pr_info("Scanning NUMA topology in Northbridge %d\n", nb);
-
-	reg = read_pci_config(0, nb, 0, 0x60);
-	numnodes = ((reg >> 4) & 0xF) + 1;
-	if (numnodes <= 1)
-		return -ENOENT;
-
-	pr_info("Number of physical nodes %d\n", numnodes);
-
-	prevbase = 0;
-	for (i = 0; i < 8; i++) {
-		unsigned long base, limit;
-
-		base = read_pci_config(0, nb, 1, 0x40 + i*8);
-		limit = read_pci_config(0, nb, 1, 0x44 + i*8);
-
-		nodeids[i] = nodeid = limit & 7;
-		if ((base & 3) == 0) {
-			if (i < numnodes)
-				pr_info("Skipping disabled node %d\n", i);
-			continue;
-		}
-		if (nodeid >= numnodes) {
-			pr_info("Ignoring excess node %d (%lx:%lx)\n", nodeid,
-				base, limit);
-			continue;
-		}
-
-		if (!limit) {
-			pr_info("Skipping node entry %d (base %lx)\n",
-				i, base);
-			continue;
-		}
-		if ((base >> 8) & 3 || (limit >> 8) & 3) {
-			pr_err("Node %d using interleaving mode %lx/%lx\n",
-			       nodeid, (base >> 8) & 3, (limit >> 8) & 3);
-			return -EINVAL;
-		}
-		if (node_isset(nodeid, numa_nodes_parsed)) {
-			pr_info("Node %d already present, skipping\n",
-				nodeid);
-			continue;
-		}
-
-		limit >>= 16;
-		limit <<= 24;
-		limit |= (1<<24)-1;
-		limit++;
-
-		if (limit > end)
-			limit = end;
-		if (limit <= base)
-			continue;
-
-		base >>= 16;
-		base <<= 24;
-
-		if (base < start)
-			base = start;
-		if (limit > end)
-			limit = end;
-		if (limit == base) {
-			pr_err("Empty node %d\n", nodeid);
-			continue;
-		}
-		if (limit < base) {
-			pr_err("Node %d bogus settings %lx-%lx.\n",
-			       nodeid, base, limit);
-			continue;
-		}
-
-		/* Could sort here, but pun for now. Should not happen anyroads. */
-		if (prevbase > base) {
-			pr_err("Node map not sorted %lx,%lx\n",
-			       prevbase, base);
-			return -EINVAL;
-		}
-
-		pr_info("Node %d MemBase %016lx Limit %016lx\n",
-			nodeid, base, limit);
-
-		prevbase = base;
-		numa_add_memblk(nodeid, base, limit);
-		node_set(nodeid, numa_nodes_parsed);
-	}
-
-	if (!nodes_weight(numa_nodes_parsed))
-		return -ENOENT;
-
-	/*
-	 * We seem to have valid NUMA configuration.  Map apicids to nodes
-	 * using the coreid bits from early_identify_cpu.
-	 */
-	bits = boot_cpu_data.x86_coreid_bits;
-	cores = 1 << bits;
-	apicid_base = 0;
-
-	/* get the APIC ID of the BSP early for systems with apicid lifting */
-	early_get_boot_cpu_id();
-	if (boot_cpu_physical_apicid > 0) {
-		pr_info("BSP APIC ID: %02x\n", boot_cpu_physical_apicid);
-		apicid_base = boot_cpu_physical_apicid;
-	}
-
-	for_each_node_mask(i, numa_nodes_parsed)
-		for (j = apicid_base; j < cores + apicid_base; j++)
-			set_apicid_to_node((i << bits) + j, i);
-
-	return 0;
-}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 24/25] x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit too
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (22 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 23/25] x86, NUMA: Rename amdtopology_64.c to amdtopology.c Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 15:28 ` [PATCH 25/25] x86, NUMA: Enable emulation " Tejun Heo
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

Now that the NUMA init path is unified, amdtopology can be enabled on
32bit.  Make amdtopology.c safe on 32bit by explicitly using u64 and
drop the X86_64 dependency from Kconfig.

bootmem.h is now included for the max_pfn declaration.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/Kconfig          |    2 +-
 arch/x86/mm/amdtopology.c |   21 +++++++++++----------
 2 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8db4fbf..50cb68d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1174,7 +1174,7 @@ comment "NUMA (Summit) requires SMP, 64GB highmem support, ACPI"
 config AMD_NUMA
 	def_bool y
 	prompt "Old style AMD Opteron NUMA detection"
-	depends on X86_64 && NUMA && PCI
+	depends on NUMA && PCI
 	---help---
 	  Enable AMD NUMA node topology detection.  You should say Y here if
 	  you have a multi processor AMD system. This uses an old method to
diff --git a/arch/x86/mm/amdtopology.c b/arch/x86/mm/amdtopology.c
index 0919c26..5247d01 100644
--- a/arch/x86/mm/amdtopology.c
+++ b/arch/x86/mm/amdtopology.c
@@ -12,6 +12,7 @@
 #include <linux/module.h>
 #include <linux/nodemask.h>
 #include <linux/memblock.h>
+#include <linux/bootmem.h>
 
 #include <asm/io.h>
 #include <linux/pci_ids.h>
@@ -69,10 +70,10 @@ static __init void early_get_boot_cpu_id(void)
 
 int __init amd_numa_init(void)
 {
-	unsigned long start = PFN_PHYS(0);
-	unsigned long end = PFN_PHYS(max_pfn);
+	u64 start = PFN_PHYS(0);
+	u64 end = PFN_PHYS(max_pfn);
 	unsigned numnodes;
-	unsigned long prevbase;
+	u64 prevbase;
 	int i, j, nb;
 	u32 nodeid, reg;
 	unsigned int bits, cores, apicid_base;
@@ -95,7 +96,7 @@ int __init amd_numa_init(void)
 
 	prevbase = 0;
 	for (i = 0; i < 8; i++) {
-		unsigned long base, limit;
+		u64 base, limit;
 
 		base = read_pci_config(0, nb, 1, 0x40 + i*8);
 		limit = read_pci_config(0, nb, 1, 0x44 + i*8);
@@ -107,18 +108,18 @@ int __init amd_numa_init(void)
 			continue;
 		}
 		if (nodeid >= numnodes) {
-			pr_info("Ignoring excess node %d (%lx:%lx)\n", nodeid,
+			pr_info("Ignoring excess node %d (%Lx:%Lx)\n", nodeid,
 				base, limit);
 			continue;
 		}
 
 		if (!limit) {
-			pr_info("Skipping node entry %d (base %lx)\n",
+			pr_info("Skipping node entry %d (base %Lx)\n",
 				i, base);
 			continue;
 		}
 		if ((base >> 8) & 3 || (limit >> 8) & 3) {
-			pr_err("Node %d using interleaving mode %lx/%lx\n",
+			pr_err("Node %d using interleaving mode %Lx/%Lx\n",
 			       nodeid, (base >> 8) & 3, (limit >> 8) & 3);
 			return -EINVAL;
 		}
@@ -150,19 +151,19 @@ int __init amd_numa_init(void)
 			continue;
 		}
 		if (limit < base) {
-			pr_err("Node %d bogus settings %lx-%lx.\n",
+			pr_err("Node %d bogus settings %Lx-%Lx.\n",
 			       nodeid, base, limit);
 			continue;
 		}
 
 		/* Could sort here, but pun for now. Should not happen anyroads. */
 		if (prevbase > base) {
-			pr_err("Node map not sorted %lx,%lx\n",
+			pr_err("Node map not sorted %Lx,%Lx\n",
 			       prevbase, base);
 			return -EINVAL;
 		}
 
-		pr_info("Node %d MemBase %016lx Limit %016lx\n",
+		pr_info("Node %d MemBase %016Lx Limit %016Lx\n",
 			nodeid, base, limit);
 
 		prevbase = base;
-- 
1.7.1



* [PATCH 25/25] x86, NUMA: Enable emulation on 32bit too
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (23 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 24/25] x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit too Tejun Heo
@ 2011-04-29 15:28 ` Tejun Heo
  2011-04-29 18:15 ` [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Ingo Molnar
  2011-04-29 20:14 ` Yinghai Lu
  26 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-29 15:28 UTC (permalink / raw)
  To: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel; +Cc: Tejun Heo

Now that the NUMA init path is unified, NUMA emulation can be enabled on
32bit.  Make numa_emulation.c safe on 32bit by doing the following.

* Define MAX_DMA32_PFN on 32bit too.

* Include bootmem.h for max_pfn declaration.

* Use u64 explicitly and always use PFN_PHYS() when converting page
  number to address.

* Avoid __udivdi3() generation on 32bit by doing number of pages
  calculation instead in split_nodes_interleave().

And drop the X86_64 dependency from Kconfig.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/Kconfig             |    2 +-
 arch/x86/include/asm/dma.h   |   10 +++-------
 arch/x86/mm/numa_emulation.c |   16 +++++++++++-----
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 50cb68d..648fca4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1201,7 +1201,7 @@ config NODES_SPAN_OTHER_NODES
 
 config NUMA_EMU
 	bool "NUMA emulation"
-	depends on X86_64 && NUMA
+	depends on NUMA
 	---help---
 	  Enable NUMA emulation. A flat machine will be split
 	  into virtual nodes when booted with "numa=fake=N", where N is the
diff --git a/arch/x86/include/asm/dma.h b/arch/x86/include/asm/dma.h
index d1a314b..0bdb0c5 100644
--- a/arch/x86/include/asm/dma.h
+++ b/arch/x86/include/asm/dma.h
@@ -72,19 +72,15 @@
 /* 16MB ISA DMA zone */
 #define MAX_DMA_PFN   ((16 * 1024 * 1024) >> PAGE_SHIFT)
 
-#ifdef CONFIG_X86_32
+/* 4GB broken PCI/AGP hardware bus master zone */
+#define MAX_DMA32_PFN ((4UL * 1024 * 1024 * 1024) >> PAGE_SHIFT)
 
+#ifdef CONFIG_X86_32
 /* The maximum address that we can perform a DMA transfer to on this platform */
 #define MAX_DMA_ADDRESS      (PAGE_OFFSET + 0x1000000)
-
 #else
-
-/* 4GB broken PCI/AGP hardware bus master zone */
-#define MAX_DMA32_PFN ((4UL * 1024 * 1024 * 1024) >> PAGE_SHIFT)
-
 /* Compat define for old dma zone */
 #define MAX_DMA_ADDRESS ((unsigned long)__va(MAX_DMA_PFN << PAGE_SHIFT))
-
 #endif
 
 /* 8237 DMA controllers */
diff --git a/arch/x86/mm/numa_emulation.c b/arch/x86/mm/numa_emulation.c
index ad091e4..c0040f8 100644
--- a/arch/x86/mm/numa_emulation.c
+++ b/arch/x86/mm/numa_emulation.c
@@ -5,6 +5,7 @@
 #include <linux/errno.h>
 #include <linux/topology.h>
 #include <linux/memblock.h>
+#include <linux/bootmem.h>
 #include <asm/dma.h>
 
 #include "numa_internal.h"
@@ -84,7 +85,13 @@ static int __init split_nodes_interleave(struct numa_meminfo *ei,
 		nr_nodes = MAX_NUMNODES;
 	}
 
-	size = (max_addr - addr - memblock_x86_hole_size(addr, max_addr)) / nr_nodes;
+	/*
+	 * Calculate target node size.  x86_32 freaks on __udivdi3() so do
+	 * the division in ulong number of pages and convert back.
+	 */
+	size = max_addr - addr - memblock_x86_hole_size(addr, max_addr);
+	size = PFN_PHYS((unsigned long)(size >> PAGE_SHIFT) / nr_nodes);
+
 	/*
 	 * Calculate the number of big nodes that can be allocated as a result
 	 * of consolidating the remainder.
@@ -226,7 +233,7 @@ static int __init split_nodes_size_interleave(struct numa_meminfo *ei,
 	 */
 	while (nodes_weight(physnode_mask)) {
 		for_each_node_mask(i, physnode_mask) {
-			u64 dma32_end = MAX_DMA32_PFN << PAGE_SHIFT;
+			u64 dma32_end = PFN_PHYS(MAX_DMA32_PFN);
 			u64 start, limit, end;
 			int phys_blk;
 
@@ -298,7 +305,7 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 {
 	static struct numa_meminfo ei __initdata;
 	static struct numa_meminfo pi __initdata;
-	const u64 max_addr = max_pfn << PAGE_SHIFT;
+	const u64 max_addr = PFN_PHYS(max_pfn);
 	u8 *phys_dist = NULL;
 	size_t phys_size = numa_dist_cnt * numa_dist_cnt * sizeof(phys_dist[0]);
 	int max_emu_nid, dfl_phys_nid;
@@ -342,8 +349,7 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 	if (numa_dist_cnt) {
 		u64 phys;
 
-		phys = memblock_find_in_range(0,
-					      (u64)max_pfn_mapped << PAGE_SHIFT,
+		phys = memblock_find_in_range(0, PFN_PHYS(max_pfn_mapped),
 					      phys_size, PAGE_SIZE);
 		if (phys == MEMBLOCK_ERROR) {
 			pr_warning("NUMA: Warning: can't allocate copy of distance table, disabling emulation\n");
-- 
1.7.1



* Re: [PATCH 03/25] x86-64, NUMA: simplify nodedata allocation
  2011-04-29 15:28 ` [PATCH 03/25] x86-64, NUMA: simplify nodedata allocation Tejun Heo
@ 2011-04-29 17:23   ` Yinghai Lu
  2011-04-30 12:02     ` Tejun Heo
  0 siblings, 1 reply; 43+ messages in thread
From: Yinghai Lu @ 2011-04-29 17:23 UTC (permalink / raw)
  To: Tejun Heo; +Cc: mingo, rientjes, tglx, hpa, x86, linux-kernel

On 04/29/2011 08:28 AM, Tejun Heo wrote:
> With top-down memblock allocation, the allocation range limits in
> early_node_mem() can be simplified - try node-local first, then any
> node but in any case don't allocate below DMA limit.
> 
> Remove early_node_mem() and implement simplified allocation directly
> in setup_node_bootmem().

Wouldn't it be better to keep early_node_mem()?

Yinghai

> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Yinghai Lu <yinghai@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> ---
>  arch/x86/mm/numa_64.c |   53 +++++++++++++++---------------------------------
>  1 files changed, 17 insertions(+), 36 deletions(-)
> 
> diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
> index 5e0dfc5..59d8a1c 100644
> --- a/arch/x86/mm/numa_64.c
> +++ b/arch/x86/mm/numa_64.c
> @@ -37,38 +37,6 @@ __initdata
>  static int numa_distance_cnt;
>  static u8 *numa_distance;
>  
> -static void * __init early_node_mem(int nodeid, unsigned long start,
> -				    unsigned long end, unsigned long size,
> -				    unsigned long align)
> -{
> -	unsigned long mem;
> -
> -	/*
> -	 * put it on high as possible
> -	 * something will go with NODE_DATA
> -	 */
> -	if (start < (MAX_DMA_PFN<<PAGE_SHIFT))
> -		start = MAX_DMA_PFN<<PAGE_SHIFT;
> -	if (start < (MAX_DMA32_PFN<<PAGE_SHIFT) &&
> -	    end > (MAX_DMA32_PFN<<PAGE_SHIFT))
> -		start = MAX_DMA32_PFN<<PAGE_SHIFT;
> -	mem = memblock_x86_find_in_range_node(nodeid, start, end, size, align);
> -	if (mem != MEMBLOCK_ERROR)
> -		return __va(mem);
> -
> -	/* extend the search scope */
> -	end = max_pfn_mapped << PAGE_SHIFT;
> -	start = MAX_DMA_PFN << PAGE_SHIFT;
> -	mem = memblock_find_in_range(start, end, size, align);
> -	if (mem != MEMBLOCK_ERROR)
> -		return __va(mem);
> -
> -	printk(KERN_ERR "Cannot find %lu bytes in node %d\n",
> -		       size, nodeid);
> -
> -	return NULL;
> -}
> -
>  static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
>  				     struct numa_meminfo *mi)
>  {
> @@ -130,6 +98,8 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
>  void __init
>  setup_node_bootmem(int nid, unsigned long start, unsigned long end)
>  {
> +	const u64 nd_low = (u64)MAX_DMA_PFN << PAGE_SHIFT;
> +	const u64 nd_high = (u64)max_pfn_mapped << PAGE_SHIFT;
>  	const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
>  	unsigned long nd_pa;
>  	int tnid;
> @@ -146,18 +116,29 @@ setup_node_bootmem(int nid, unsigned long start, unsigned long end)
>  	printk(KERN_INFO "Initmem setup node %d %016lx-%016lx\n",
>  	       nid, start, end);
>  
> -	node_data[nid] = early_node_mem(nid, start, end, nd_size,
> -					SMP_CACHE_BYTES);
> -	if (node_data[nid] == NULL)
> +	/*
> +	 * Try to allocate node data on local node and then fall back to
> +	 * all nodes.  Never allocate in DMA zone.
> +	 */
> +	nd_pa = memblock_x86_find_in_range_node(nid, nd_low, nd_high,
> +						nd_size, SMP_CACHE_BYTES);
> +	if (nd_pa == MEMBLOCK_ERROR)
> +		nd_pa = memblock_find_in_range(nd_low, nd_high,
> +					       nd_size, SMP_CACHE_BYTES);
> +	if (nd_pa == MEMBLOCK_ERROR) {
> +		pr_err("Cannot find %lu bytes in node %d\n", nd_size, nid);
>  		return;
> -	nd_pa = __pa(node_data[nid]);
> +	}
>  	memblock_x86_reserve_range(nd_pa, nd_pa + nd_size, "NODE_DATA");
> +
> +	/* report and initialize */
>  	printk(KERN_INFO "  NODE_DATA [%016lx - %016lx]\n",
>  	       nd_pa, nd_pa + nd_size - 1);
>  	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
>  	if (tnid != nid)
>  		printk(KERN_INFO "    NODE_DATA(%d) on node %d\n", nid, tnid);
>  
> +	node_data[nid] = __va(nd_pa);
>  	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
>  	NODE_DATA(nid)->node_id = nid;
>  	NODE_DATA(nid)->node_start_pfn = start >> PAGE_SHIFT;



* Re: [PATCH 08/25] x86, NUMA: trivial cleanups
  2011-04-29 15:28 ` [PATCH 08/25] x86, NUMA: trivial cleanups Tejun Heo
@ 2011-04-29 17:25   ` Yinghai Lu
  2011-04-30 12:03     ` Tejun Heo
  0 siblings, 1 reply; 43+ messages in thread
From: Yinghai Lu @ 2011-04-29 17:25 UTC (permalink / raw)
  To: Tejun Heo; +Cc: mingo, rientjes, tglx, hpa, x86, linux-kernel

On 04/29/2011 08:28 AM, Tejun Heo wrote:
> * Kill no longer used struct bootnode.
> 
> * Kill dangling declaration of pxm_to_nid() in numa_32.h.
> 
> * Make setup_node_bootmem() static.

The first and the third ones already appeared in patches that I posted before.

Yinghai

> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Yinghai Lu <yinghai@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> ---
>  arch/x86/include/asm/acpi.h    |    2 --
>  arch/x86/include/asm/amd_nb.h  |    1 -
>  arch/x86/include/asm/numa_32.h |    2 --
>  arch/x86/include/asm/numa_64.h |    7 -------
>  arch/x86/mm/numa_64.c          |    2 +-
>  5 files changed, 1 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
> index 12e0e7d..416d865 100644
> --- a/arch/x86/include/asm/acpi.h
> +++ b/arch/x86/include/asm/acpi.h
> @@ -183,8 +183,6 @@ static inline void disable_acpi(void) { }
>  
>  #define ARCH_HAS_POWER_INIT	1
>  
> -struct bootnode;
> -
>  #ifdef CONFIG_ACPI_NUMA
>  extern int acpi_numa;
>  extern int x86_acpi_numa_init(void);
> diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
> index 3316822..67f87f2 100644
> --- a/arch/x86/include/asm/amd_nb.h
> +++ b/arch/x86/include/asm/amd_nb.h
> @@ -11,7 +11,6 @@ struct amd_nb_bus_dev_range {
>  
>  extern const struct pci_device_id amd_nb_misc_ids[];
>  extern const struct amd_nb_bus_dev_range amd_nb_bus_dev_ranges[];
> -struct bootnode;
>  
>  extern bool early_is_amd_nb(u32 value);
>  extern int amd_cache_northbridges(void);
> diff --git a/arch/x86/include/asm/numa_32.h b/arch/x86/include/asm/numa_32.h
> index 242522f..7e54b64 100644
> --- a/arch/x86/include/asm/numa_32.h
> +++ b/arch/x86/include/asm/numa_32.h
> @@ -3,8 +3,6 @@
>  
>  extern int numa_off;
>  
> -extern int pxm_to_nid(int pxm);
> -
>  #ifdef CONFIG_HIGHMEM
>  extern void set_highmem_pages_init(void);
>  #else
> diff --git a/arch/x86/include/asm/numa_64.h b/arch/x86/include/asm/numa_64.h
> index 12461eb..794da6d 100644
> --- a/arch/x86/include/asm/numa_64.h
> +++ b/arch/x86/include/asm/numa_64.h
> @@ -3,18 +3,11 @@
>  
>  #include <linux/nodemask.h>
>  
> -struct bootnode {
> -	u64 start;
> -	u64 end;
> -};
> -
>  #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
>  
>  extern int numa_off;
>  
>  extern unsigned long numa_free_all_bootmem(void);
> -extern void setup_node_bootmem(int nodeid, unsigned long start,
> -			       unsigned long end);
>  
>  #ifdef CONFIG_NUMA
>  /*
> diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
> index 3598fbf..813a161 100644
> --- a/arch/x86/mm/numa_64.c
> +++ b/arch/x86/mm/numa_64.c
> @@ -95,7 +95,7 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
>  }
>  
>  /* Initialize bootmem allocator for a node */
> -void __init
> +static void __init
>  setup_node_bootmem(int nid, unsigned long start, unsigned long end)
>  {
>  	const u64 nd_low = (u64)MAX_DMA_PFN << PAGE_SHIFT;



* Re: [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (24 preceding siblings ...)
  2011-04-29 15:28 ` [PATCH 25/25] x86, NUMA: Enable emulation " Tejun Heo
@ 2011-04-29 18:15 ` Ingo Molnar
  2011-04-29 20:14 ` Yinghai Lu
  26 siblings, 0 replies; 43+ messages in thread
From: Ingo Molnar @ 2011-04-29 18:15 UTC (permalink / raw)
  To: Tejun Heo; +Cc: mingo, yinghai, rientjes, tglx, hpa, x86, linux-kernel


* Tejun Heo <tj@kernel.org> wrote:

> diffstat follows.  610 lines removed and 32bit NUMA got much better! :)

>  32 files changed, 1017 insertions(+), 1627 deletions(-)

impressive! :-)

Thanks,

	Ingo


* Re: [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization
  2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
                   ` (25 preceding siblings ...)
  2011-04-29 18:15 ` [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Ingo Molnar
@ 2011-04-29 20:14 ` Yinghai Lu
  2011-04-30 12:17   ` Tejun Heo
  26 siblings, 1 reply; 43+ messages in thread
From: Yinghai Lu @ 2011-04-29 20:14 UTC (permalink / raw)
  To: Tejun Heo; +Cc: mingo, rientjes, tglx, hpa, x86, linux-kernel

On 04/29/2011 08:28 AM, Tejun Heo wrote:
> Hello,
> 
> This patchset, finally, unifies 32 and 64bit NUMA initialization.  It
> gradually moves 64bit stuff to common code and replaces 32bit code
> with it.  Once the unification is complete, amdtopology and emulation
> are enabled for 32bit too (there's no reason not to).

got:

SRAT: Node 0 PXM 0 0-a0000
SRAT: Node 0 PXM 0 100000-80000000
SRAT: Node 0 PXM 0 100000000-880000000
SRAT: Node 1 PXM 1 880000000-1080000000
SRAT: Node 2 PXM 2 1080000000-1880000000
SRAT: Node 3 PXM 3 1880000000-2080000000
SRAT: Node 4 PXM 4 2080000000-2880000000
SRAT: Node 5 PXM 5 2880000000-3080000000
SRAT: Node 6 PXM 6 3080000000-3880000000
SRAT: Node 7 PXM 7 3880000000-4080000000
NUMA: Initialized distance table, cnt=8
NUMA: Node 0 [0,a0000) + [100000,80000000) -> [0,80000000)
NUMA: Node 0 [0,80000000) + [100000000,880000000) -> [0,880000000)
Adding active range (0, 0x10, 0x95) 0 entries of 3200 used
Adding active range (0, 0x100, 0x7f750) 1 entries of 3200 used
Adding active range (0, 0x100000, 0x880000) 2 entries of 3200 used
Adding active range (1, 0x880000, 0x1000000) 3 entries of 3200 used
Adding active range (2, 0x1080000, 0x1000000) 4 entries of 3200 used
------------[ cut here ]------------
WARNING: at mm/sparse.c:170 mminit_validate_memmodel_limits+0x29/0x69()
Hardware name: Sun Fire X4800 M2
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.39-rc5-tip-03938-gda7ba4e-dirty #896
Call Trace:
 [<40261c57>] warn_slowpath_common+0x65/0x7a
 [<40e30020>] ? mminit_validate_memmodel_limits+0x29/0x69
 [<40261c7b>] warn_slowpath_null+0xf/0x13
 [<40e30020>] mminit_validate_memmodel_limits+0x29/0x69
 [<40e10a2e>] add_active_range+0x34/0xdb
 [<40e0c281>] memblock_x86_register_active_regions+0x92/0xb3
 [<40e0a8c1>] numa_init+0xf1/0x5e9
 [<408cad47>] ? printk+0xf/0x11
 [<40e0af98>] x86_numa_init+0x16/0x34
 [<40e0b2dd>] initmem_init+0x8/0xbc
 [<40dfc3ce>] setup_arch+0xa27/0xad7
 [<40df85cb>] start_kernel+0x71/0x2ed
 [<40df80c4>] i386_start_kernel+0xc4/0xcb
---[ end trace 4eaa2a86a8e2da22 ]---
Adding active range (3, 0x1880000, 0x1000000) 5 entries of 3200 used
Adding active range (4, 0x2080000, 0x1000000) 6 entries of 3200 used
Adding active range (5, 0x2880000, 0x1000000) 7 entries of 3200 used
Adding active range (6, 0x3080000, 0x1000000) 8 entries of 3200 used
Adding active range (7, 0x3880000, 0x1000000) 9 entries of 3200 used
NUMA: nodes only cover 0MB of your 63478MB e820 RAM. Not used.
No NUMA configuration found
Faking a node at 0000000000000000-0000001000000000
Adding active range (0, 0x10, 0x95) 0 entries of 3200 used
Adding active range (0, 0x100, 0x7f750) 1 entries of 3200 used
Adding active range (0, 0x100000, 0x1000000) 2 entries of 3200 used
node 0 pfn: [0 - 1000000]
remap_alloc: node 0 [fffe00000-1000000000) -> [bde00000-be000000)
Initmem setup node 0 [0000000000000000-0000000fffffffff]
  NODE_DATA [0x000000007de00000 - 0x0000007de02fff] (remapped)
62606MB HIGHMEM available.
2929MB LOWMEM available.
max_low_pfn = b71fe, highstart_pfn = b71fe


need following patch.

Thanks

Yinghai

[PATCH] x86, numa: Trim numa meminfo correctly

While testing tj's 32bit NUMA unification code, I found one system with
more than 64g that fails to use NUMA.

It turns out we do not trim the numa meminfo correctly against max_pfn;
start could be bigger than 64g too.

The check also needs to be done in a separate loop.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/mm/numa.c |   13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Index: linux-2.6/arch/x86/mm/numa.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/numa.c
+++ linux-2.6/arch/x86/mm/numa.c
@@ -272,6 +272,7 @@ int __init numa_cleanup_meminfo(struct n
 	const u64 high = PFN_PHYS(max_pfn);
 	int i, j, k;
 
+	/* trim all entries at first */
 	for (i = 0; i < mi->nr_blks; i++) {
 		struct numa_memblk *bi = &mi->blk[i];
 
@@ -280,10 +281,12 @@ int __init numa_cleanup_meminfo(struct n
 		bi->end = min(bi->end, high);
 
 		/* and there's no empty block */
-		if (bi->start == bi->end) {
+		if (bi->start >= bi->end)
 			numa_remove_memblk_from(i--, mi);
-			continue;
-		}
+	}
+
+	for (i = 0; i < mi->nr_blks; i++) {
+		struct numa_memblk *bi = &mi->blk[i];
 
 		for (j = i + 1; j < mi->nr_blks; j++) {
 			struct numa_memblk *bj = &mi->blk[j];
@@ -313,8 +316,8 @@ int __init numa_cleanup_meminfo(struct n
 			 */
 			if (bi->nid != bj->nid)
 				continue;
-			start = max(min(bi->start, bj->start), low);
-			end = min(max(bi->end, bj->end), high);
+			start = min(bi->start, bj->start);
+			end = max(bi->end, bj->end);
 			for (k = 0; k < mi->nr_blks; k++) {
 				struct numa_memblk *bk = &mi->blk[k];
 



* Re: [PATCH 03/25] x86-64, NUMA: simplify nodedata allocation
  2011-04-29 17:23   ` Yinghai Lu
@ 2011-04-30 12:02     ` Tejun Heo
  0 siblings, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-30 12:02 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: mingo, rientjes, tglx, hpa, x86, linux-kernel

On Fri, Apr 29, 2011 at 10:23:27AM -0700, Yinghai Lu wrote:
> On 04/29/2011 08:28 AM, Tejun Heo wrote:
> > With top-down memblock allocation, the allocation range limits in
> > early_node_mem() can be simplified - try node-local first, then any
> > node but in any case don't allocate below DMA limit.
> > 
> > Remove early_node_mem() and implement simplified allocation directly
> > in setup_node_bootmem().
> 
> keep early_node_mem would be better?

I don't know, maybe, maybe not.  I usually find separating out linear
procedural logic into a function, which is used only once, more
distracting / obfuscating than helpful.

-- 
tejun


* Re: [PATCH 08/25] x86, NUMA: trivial cleanups
  2011-04-29 17:25   ` Yinghai Lu
@ 2011-04-30 12:03     ` Tejun Heo
  2011-04-30 16:24       ` Yinghai Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Tejun Heo @ 2011-04-30 12:03 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: mingo, rientjes, tglx, hpa, x86, linux-kernel

On Fri, Apr 29, 2011 at 10:25:44AM -0700, Yinghai Lu wrote:
> On 04/29/2011 08:28 AM, Tejun Heo wrote:
> > * Kill no longer used struct bootnode.
> > 
> > * Kill dangling declaration of pxm_to_nid() in numa_32.h.
> > 
> > * Make setup_node_bootmem() static.
> 
> first one and third one should appear in patches that I posted before.

Sorry, can't understand what you mean.  What are you trying to say?

-- 
tejun


* Re: [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization
  2011-04-29 20:14 ` Yinghai Lu
@ 2011-04-30 12:17   ` Tejun Heo
  2011-04-30 12:33     ` [PATCH] x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo() Tejun Heo
  2011-04-30 16:31     ` [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Yinghai Lu
  0 siblings, 2 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-30 12:17 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: mingo, rientjes, tglx, hpa, x86, linux-kernel

Hello, Yinghai.

Nice catch, but,

On Fri, Apr 29, 2011 at 01:14:14PM -0700, Yinghai Lu wrote:
> [PATCH] x86, numa: Trim numa meminfo correctly
> 
> While testing tj's 32bit NUMA unification code, I found one system with
> more than 64g that fails to use NUMA.
> 
> It turns out we do not trim the numa meminfo correctly against max_pfn;
> start could be bigger than 64g too.
> 
> The check also needs to be done in a separate loop.

Why?

Isn't the following all that's necessary?

---
 arch/x86/mm/numa.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: work/arch/x86/mm/numa.c
===================================================================
--- work.orig/arch/x86/mm/numa.c
+++ work/arch/x86/mm/numa.c
@@ -278,7 +278,7 @@ int __init numa_cleanup_meminfo(struct n
 		bi->end = min(bi->end, high);
 
 		/* and there's no empty block */
-		if (bi->start == bi->end) {
+		if (bi->start >= bi->end) {
 			numa_remove_memblk_from(i--, mi);
 			continue;
 		}


* [PATCH] x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo()
  2011-04-30 12:17   ` Tejun Heo
@ 2011-04-30 12:33     ` Tejun Heo
  2011-04-30 12:35       ` Tejun Heo
  2011-05-01  0:43       ` Yinghai Lu
  2011-04-30 16:31     ` [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Yinghai Lu
  1 sibling, 2 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-30 12:33 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: mingo, rientjes, tglx, hpa, x86, linux-kernel

From: Yinghai Lu <yinghai@kernel.org>

numa_cleanup_meminfo() trims each memblk between the low (0) and high
(max_pfn) limits and discards empty ones.  However, the emptiness
detection incorrectly used an equality test.  If the start of a memblk is
higher than max_pfn, it is empty but fails the equality test and
doesn't get discarded.

Fix it by using >= instead of ==.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
So, something like this.  Does this fix the problem you see?

Thanks.

 arch/x86/mm/numa_64.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: work/arch/x86/mm/numa_64.c
===================================================================
--- work.orig/arch/x86/mm/numa_64.c
+++ work/arch/x86/mm/numa_64.c
@@ -191,7 +191,7 @@ int __init numa_cleanup_meminfo(struct n
 		bi->end = min(bi->end, high);
 
 		/* and there's no empty block */
-		if (bi->start == bi->end) {
+		if (bi->start >= bi->end) {
 			numa_remove_memblk_from(i--, mi);
 			continue;
 		}

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo()
  2011-04-30 12:33     ` [PATCH] x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo() Tejun Heo
@ 2011-04-30 12:35       ` Tejun Heo
  2011-05-01  0:43       ` Yinghai Lu
  1 sibling, 0 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-30 12:35 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: mingo, rientjes, tglx, hpa, x86, linux-kernel

On Sat, Apr 30, 2011 at 02:33:30PM +0200, Tejun Heo wrote:
> From: Yinghai Lu <yinghai@kernel.org>
> 
> numa_cleanup_meminfo() trims each memblk between the low (0) and high
> (max_pfn) limits and discards empty ones.  However, the emptiness
> detection incorrectly used an equality test.  If the start of a memblk
> is higher than max_pfn, the memblk is empty but fails the equality test
> and doesn't get discarded.
> 
> Fix it by using >= instead of ==.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
> So, something like this.  Does this fix the problem you see?

Ooh, this is from before the code was moved to numa.c, so please test
the previous patch, which is against numa.c.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 08/25] x86, NUMA: trivial cleanups
  2011-04-30 12:03     ` Tejun Heo
@ 2011-04-30 16:24       ` Yinghai Lu
  2011-04-30 18:00         ` Tejun Heo
  0 siblings, 1 reply; 43+ messages in thread
From: Yinghai Lu @ 2011-04-30 16:24 UTC (permalink / raw)
  To: Tejun Heo; +Cc: mingo, rientjes, tglx, hpa, x86, linux-kernel

On 04/30/2011 05:03 AM, Tejun Heo wrote:
> On Fri, Apr 29, 2011 at 10:25:44AM -0700, Yinghai Lu wrote:
>> On 04/29/2011 08:28 AM, Tejun Heo wrote:
>>> * Kill no longer used struct bootnode.
>>>
>>> * Kill dangling declaration of pxm_to_nid() in numa_32.h.
>>>
>>> * Make setup_node_bootmem() static.
>>
>> The first and third ones already appeared in patches that I posted before.
> 
> Sorry, can't understand what you mean.  What are you trying to say?
> 

I posted two patches before for:
  Kill no longer used struct bootnode.
  Make setup_node_bootmem() static.

Also, setup_node_bootmem is now somewhat misleading; could we change it
to setup_node_data()?

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization
  2011-04-30 12:17   ` Tejun Heo
  2011-04-30 12:33     ` [PATCH] x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo() Tejun Heo
@ 2011-04-30 16:31     ` Yinghai Lu
  1 sibling, 0 replies; 43+ messages in thread
From: Yinghai Lu @ 2011-04-30 16:31 UTC (permalink / raw)
  To: Tejun Heo; +Cc: mingo, rientjes, tglx, hpa, x86, linux-kernel

On 04/30/2011 05:17 AM, Tejun Heo wrote:
> Hello, Yinghai.
> 
> Nice catch, but,
> 
> On Fri, Apr 29, 2011 at 01:14:14PM -0700, Yinghai Lu wrote:
>> [PATCH] x86, numa: Trim numa meminfo correctly
>>
>> While testing the 32bit numa unification code from tj, I found one system
>> with more than 64g failing to use numa.
>>
>> It turns out we do not trim the numa meminfo correctly with max_pfn;
>> start could be bigger than 64g too.
>>
>> We also need to make the checking a separate loop.
> 
> Why?

So that we do not need to compare them with low/high in the following inner loop.


> 
> Isn't all that necessary the following?
> 
> ---
>  arch/x86/mm/numa.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: work/arch/x86/mm/numa.c
> ===================================================================
> --- work.orig/arch/x86/mm/numa.c
> +++ work/arch/x86/mm/numa.c
> @@ -278,7 +278,7 @@ int __init numa_cleanup_meminfo(struct n
>  		bi->end = min(bi->end, high);
>  
>  		/* and there's no empty block */
> -		if (bi->start == bi->end) {
> +		if (bi->start >= bi->end) {
>  			numa_remove_memblk_from(i--, mi);
>  			continue;
>  		}


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 08/25] x86, NUMA: trivial cleanups
  2011-04-30 16:24       ` Yinghai Lu
@ 2011-04-30 18:00         ` Tejun Heo
  2011-04-30 23:10           ` Yinghai Lu
  2011-04-30 23:11           ` [PATCH] x86, numa: Rename setup_node_bootmem to setup_node_data Yinghai Lu
  0 siblings, 2 replies; 43+ messages in thread
From: Tejun Heo @ 2011-04-30 18:00 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: mingo, rientjes, tglx, hpa, x86, linux-kernel

Hello,

On Sat, Apr 30, 2011 at 6:24 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> I posted two patches before for:
>   Kill no longer used struct bootnode.
>   Make setup_node_bootmem() static.

Ah, okay.  Do you mind resending them to me?  I'll integrate them with
the patchset.

> Also, setup_node_bootmem is now somewhat misleading; could we change it to setup_node_data()?

Sure, I don't like the current name either.  Do you mind sending a
patch for that too?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 08/25] x86, NUMA: trivial cleanups
  2011-04-30 18:00         ` Tejun Heo
@ 2011-04-30 23:10           ` Yinghai Lu
  2011-04-30 23:11           ` [PATCH] x86, numa: Rename setup_node_bootmem to setup_node_data Yinghai Lu
  1 sibling, 0 replies; 43+ messages in thread
From: Yinghai Lu @ 2011-04-30 23:10 UTC (permalink / raw)
  To: Tejun Heo; +Cc: mingo, rientjes, tglx, hpa, x86, linux-kernel

On 04/30/2011 11:00 AM, Tejun Heo wrote:
> Hello,
> 
> On Sat, Apr 30, 2011 at 6:24 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> I posted two patches before for:
>>   Kill no longer used struct bootnode.
>>   Make setup_node_bootmem() static.
> 
> Ah, okay.  Do you mind resending them to me?  I'll integrate it with
> the patchset.

Never mind; I do not want to waste your time rebasing your tree.

> 
>> Also, setup_node_bootmem is now somewhat misleading; could we change it to setup_node_data()?
> 
> Sure, I don't like the current name either.  Do you mind sending a
> patch for that too?

OK, sent in another mail.

Thanks

Yinghai Lu

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH] x86, numa: Rename setup_node_bootmem to setup_node_data
  2011-04-30 18:00         ` Tejun Heo
  2011-04-30 23:10           ` Yinghai Lu
@ 2011-04-30 23:11           ` Yinghai Lu
  1 sibling, 0 replies; 43+ messages in thread
From: Yinghai Lu @ 2011-04-30 23:11 UTC (permalink / raw)
  To: Tejun Heo, mingo, rientjes, tglx, hpa; +Cc: linux-kernel


After using memblock to replace bootmem, that function only sets up
node_data now.

Change the name to reflect the real work.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/mm/numa.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6/arch/x86/mm/numa.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/numa.c
+++ linux-2.6/arch/x86/mm/numa.c
@@ -189,8 +189,8 @@ int __init numa_add_memblk(int nid, u64
 	return numa_add_memblk_to(nid, start, end, &numa_meminfo);
 }
 
-/* Initialize bootmem allocator for a node */
-static void __init setup_node_bootmem(int nid, u64 start, u64 end)
+/* Initialize NODE_DATA for a node on the local memory */
+static void __init setup_node_data(int nid, u64 start, u64 end)
 {
 	const u64 nd_low = PFN_PHYS(MAX_DMA_PFN);
 	const u64 nd_high = PFN_PHYS(max_pfn_mapped);
@@ -522,7 +522,7 @@ static int __init numa_register_memblks(
 		}
 
 		if (start < end)
-			setup_node_bootmem(nid, start, end);
+			setup_node_data(nid, start, end);
 	}
 
 	return 0;

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo()
  2011-04-30 12:33     ` [PATCH] x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo() Tejun Heo
  2011-04-30 12:35       ` Tejun Heo
@ 2011-05-01  0:43       ` Yinghai Lu
  2011-05-01 10:20         ` Tejun Heo
  1 sibling, 1 reply; 43+ messages in thread
From: Yinghai Lu @ 2011-05-01  0:43 UTC (permalink / raw)
  To: Tejun Heo; +Cc: mingo, rientjes, tglx, hpa, x86, linux-kernel

On 04/30/2011 05:33 AM, Tejun Heo wrote:
> From: Yinghai Lu <yinghai@kernel.org>
> 
> numa_cleanup_meminfo() trims each memblk between the low (0) and high
> (max_pfn) limits and discards empty ones.  However, the emptiness
> detection incorrectly used an equality test.  If the start of a memblk
> is higher than max_pfn, the memblk is empty but fails the equality test
> and doesn't get discarded.
> 
> Fix it by using >= instead of ==.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
> So, something like this.  Does this fix the problem you see?
> 
> Thanks.
> 
>  arch/x86/mm/numa_64.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: work/arch/x86/mm/numa.c
> ===================================================================
> --- work.orig/arch/x86/mm/numa.c
> +++ work/arch/x86/mm/numa.c
> @@ -191,7 +191,7 @@ int __init numa_cleanup_meminfo(struct n
>  		bi->end = min(bi->end, high);
>  
>  		/* and there's no empty block */
> -		if (bi->start == bi->end) {
> +		if (bi->start >= bi->end) {
>  			numa_remove_memblk_from(i--, mi);
>  			continue;
>  		}
This one works too, but the printout is somewhat strange.
On a 512g system I got:

SRAT: Node 0 PXM 0 0-a0000
SRAT: Node 0 PXM 0 100000-80000000
SRAT: Node 0 PXM 0 100000000-1080000000
SRAT: Node 1 PXM 1 1080000000-2080000000
SRAT: Node 2 PXM 2 2080000000-3080000000
SRAT: Node 3 PXM 3 3080000000-4080000000
SRAT: Node 4 PXM 4 4080000000-5080000000
SRAT: Node 5 PXM 5 5080000000-6080000000
SRAT: Node 6 PXM 6 6080000000-7080000000
SRAT: Node 7 PXM 7 7080000000-8080000000
NUMA: Initialized distance table, cnt=8
NUMA: Node 0 [0,a0000) + [100000,80000000) -> [0,80000000)
NUMA: Node 0 [0,80000000) + [100000000,1080000000) -> [0,1000000000)


With the first patch, the same 512g system got:
NUMA: Node 0 [0,a0000) + [100000,80000000) -> [0,80000000)
NUMA: Node 0 [0,80000000) + [100000000,1000000000) -> [0,1000000000)

I still think the first one is cleaner.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo()
  2011-05-01  0:43       ` Yinghai Lu
@ 2011-05-01 10:20         ` Tejun Heo
  2011-05-01 19:44           ` [PATCH] x86, numa: Trim numa meminfo with max_pfn in separated loop Yinghai Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Tejun Heo @ 2011-05-01 10:20 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: mingo, rientjes, tglx, hpa, x86, linux-kernel

Hello,

On Sat, Apr 30, 2011 at 05:43:22PM -0700, Yinghai Lu wrote:
> This one works too, but the printout is somewhat strange.
> On a 512g system I got:
> 
> NUMA: Initialized distance table, cnt=8
> NUMA: Node 0 [0,a0000) + [100000,80000000) -> [0,80000000)
> NUMA: Node 0 [0,80000000) + [100000000,1080000000) -> [0,1000000000)
> 
> 
> With the first patch, the same 512g system got:
> NUMA: Node 0 [0,a0000) + [100000,80000000) -> [0,80000000)
> NUMA: Node 0 [0,80000000) + [100000000,1000000000) -> [0,1000000000)
> 
> I still think the first one is cleaner.

Yeah, I don't object to the change (it's easier to understand too), but
it should be a separate patch because it's an unrelated change to the
fix itself.  If you don't mind sending the splitting part separately,
I'll put it on top of the patchset.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH] x86, numa: Trim numa meminfo with max_pfn in separated loop
  2011-05-01 10:20         ` Tejun Heo
@ 2011-05-01 19:44           ` Yinghai Lu
  0 siblings, 0 replies; 43+ messages in thread
From: Yinghai Lu @ 2011-05-01 19:44 UTC (permalink / raw)
  To: Tejun Heo, mingo, rientjes, tglx, hpa; +Cc: x86, linux-kernel


While testing the 32bit numa unification code from tj, I found one system with
more than 64g failing to use numa.
It turns out we did not trim the numa meminfo correctly with max_pfn, because
start could be bigger than 64g too.
The bug fix (correct checking) already made it to the tip tree.

This one moves the checking and trimming to a separate loop, so we don't need
to compare against low/high in the following merge loops.
It makes the code more readable.

It also gives a 512g numa system with 32bit a less strange printout.
before:
> NUMA: Node 0 [0,a0000) + [100000,80000000) -> [0,80000000)
> NUMA: Node 0 [0,80000000) + [100000000,1080000000) -> [0,1000000000)

after:
> NUMA: Node 0 [0,a0000) + [100000,80000000) -> [0,80000000)
> NUMA: Node 0 [0,80000000) + [100000000,1000000000) -> [0,1000000000)

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/mm/numa.c |   13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Index: linux-2.6/arch/x86/mm/numa.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/numa.c
+++ linux-2.6/arch/x86/mm/numa.c
@@ -272,6 +272,7 @@ int __init numa_cleanup_meminfo(struct n
 	const u64 high = PFN_PHYS(max_pfn);
 	int i, j, k;
 
+	/* Trim all entries at first */
 	for (i = 0; i < mi->nr_blks; i++) {
 		struct numa_memblk *bi = &mi->blk[i];
 
@@ -280,10 +281,12 @@ int __init numa_cleanup_meminfo(struct n
 		bi->end = min(bi->end, high);
 
 		/* and there's no empty block */
-		if (bi->start >= bi->end) {
+		if (bi->start >= bi->end)
 			numa_remove_memblk_from(i--, mi);
-			continue;
-		}
+	}
+
+	for (i = 0; i < mi->nr_blks; i++) {
+		struct numa_memblk *bi = &mi->blk[i];
 
 		for (j = i + 1; j < mi->nr_blks; j++) {
 			struct numa_memblk *bj = &mi->blk[j];
@@ -313,8 +316,8 @@ int __init numa_cleanup_meminfo(struct n
 			 */
 			if (bi->nid != bj->nid)
 				continue;
-			start = max(min(bi->start, bj->start), low);
-			end = min(max(bi->end, bj->end), high);
+			start = min(bi->start, bj->start);
+			end = max(bi->end, bj->end);
 			for (k = 0; k < mi->nr_blks; k++) {
 				struct numa_memblk *bk = &mi->blk[k];
 

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2011-05-01 19:45 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-29 15:28 [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Tejun Heo
2011-04-29 15:28 ` [PATCH 01/25] x86-64, NUMA: Simplify hotadd memory handling Tejun Heo
2011-04-29 15:28 ` [PATCH 02/25] x86-64, NUMA: trivial cleanups for setup_node_bootmem() Tejun Heo
2011-04-29 15:28 ` [PATCH 03/25] x86-64, NUMA: simplify nodedata allocation Tejun Heo
2011-04-29 17:23   ` Yinghai Lu
2011-04-30 12:02     ` Tejun Heo
2011-04-29 15:28 ` [PATCH 04/25] x86-32, NUMA: Automatically set apicid -> node in setup_local_APIC() Tejun Heo
2011-04-29 15:28 ` [PATCH 05/25] x86, NUMA: Unify 32/64bit numa_cpu_node() implementation Tejun Heo
2011-04-29 15:28 ` [PATCH 06/25] x86-32, NUMA: Make apic->x86_32_numa_cpu_node() optional Tejun Heo
2011-04-29 15:28 ` [PATCH 07/25] x86-32, NUMA: use sparse_memory_present_with_active_regions() Tejun Heo
2011-04-29 15:28 ` [PATCH 08/25] x86, NUMA: trivial cleanups Tejun Heo
2011-04-29 17:25   ` Yinghai Lu
2011-04-30 12:03     ` Tejun Heo
2011-04-30 16:24       ` Yinghai Lu
2011-04-30 18:00         ` Tejun Heo
2011-04-30 23:10           ` Yinghai Lu
2011-04-30 23:11           ` [PATCH] x86, numa: Rename setup_node_bootmem to setup_node_data Yinghai Lu
2011-04-29 15:28 ` [PATCH 09/25] x86, NUMA: rename srat_64.c to srat.c Tejun Heo
2011-04-29 15:28 ` [PATCH 10/25] x86, NUMA: make srat.c 32bit safe Tejun Heo
2011-04-29 15:28 ` [PATCH 11/25] x86-32, NUMA: Move get_memcfg_numa() into numa_32.c Tejun Heo
2011-04-29 15:28 ` [PATCH 12/25] x86, NUMA: Move numa_nodes_parsed to numa.[hc] Tejun Heo
2011-04-29 15:28 ` [PATCH 13/25] x86-32, NUMA: implement temporary NUMA init shims Tejun Heo
2011-04-29 15:28 ` [PATCH 14/25] x86-32, NUMA: Replace srat_32.c with srat.c Tejun Heo
2011-04-29 15:28 ` [PATCH 15/25] x86-32, NUMA: Update numaq to use new NUMA init protocol Tejun Heo
2011-04-29 15:28 ` [PATCH 16/25] x86, NUMA: Move NUMA init logic from numa_64.c to numa.c Tejun Heo
2011-04-29 15:28 ` [PATCH 17/25] x86, NUMA: Enable build of generic NUMA init code on 32bit Tejun Heo
2011-04-29 15:28 ` [PATCH 18/25] x86, NUMA: Remove long 64bit assumption from numa.c Tejun Heo
2011-04-29 15:28 ` [PATCH 19/25] x86-32, NUMA: Add @start and @end to init_alloc_remap() Tejun Heo
2011-04-29 15:28 ` [PATCH 20/25] x86, NUMA: Initialize and use remap allocator from setup_node_bootmem() Tejun Heo
2011-04-29 15:28 ` [PATCH 21/25] x86, NUMA: Make 32bit use common NUMA init path Tejun Heo
2011-04-29 15:28 ` [PATCH 22/25] x86, NUMA: Make numa_init_array() static Tejun Heo
2011-04-29 15:28 ` [PATCH 23/25] x86, NUMA: Rename amdtopology_64.c to amdtopology.c Tejun Heo
2011-04-29 15:28 ` [PATCH 24/25] x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit too Tejun Heo
2011-04-29 15:28 ` [PATCH 25/25] x86, NUMA: Enable emulation " Tejun Heo
2011-04-29 18:15 ` [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Ingo Molnar
2011-04-29 20:14 ` Yinghai Lu
2011-04-30 12:17   ` Tejun Heo
2011-04-30 12:33     ` [PATCH] x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo() Tejun Heo
2011-04-30 12:35       ` Tejun Heo
2011-05-01  0:43       ` Yinghai Lu
2011-05-01 10:20         ` Tejun Heo
2011-05-01 19:44           ` [PATCH] x86, numa: Trim numa meminfo with max_pfn in separated loop Yinghai Lu
2011-04-30 16:31     ` [PATCHSET tip] x86, NUMA: Unify 32 and 64bit NUMA initialization Yinghai Lu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).