* [PATCH 1/4] x86, numa: Do not scan two times for setup_node_bootmem()
       [not found] <4D5EC05A.60103@kernel.org>
@ 2011-02-18 18:59 ` Yinghai Lu
  2011-02-20  4:03   ` David Rientjes
  2011-02-18 18:59 ` [PATCH 2/4] x86, numa: Do not adjust start/end for early_node_mem() Yinghai Lu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Yinghai Lu @ 2011-02-18 18:59 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tejun Heo
  Cc: linux-kernel, David Rientjes, Cyrill Gorcunov


We don't need to scan twice for setup_node_bootmem()
because we are already using mapped memblock to find node_data.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/mm/numa_64.c |   31 +++++++++++++------------------
 1 file changed, 13 insertions(+), 18 deletions(-)

Index: linux-2.6/arch/x86/mm/numa_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/numa_64.c
+++ linux-2.6/arch/x86/mm/numa_64.c
@@ -480,7 +480,7 @@ static bool __init numa_meminfo_cover_me
 
 static int __init numa_register_memblks(struct numa_meminfo *mi)
 {
-	int i, j, nid;
+	int i, nid;
 
 	/* Account for nodes with cpus and no memory */
 	node_possible_map = numa_nodes_parsed;
@@ -507,27 +507,22 @@ static int __init numa_register_memblks(
 	init_memory_mapping_high();
 
 	/*
-	 * Finally register nodes.  Do it twice in case setup_node_bootmem
-	 * missed one due to missing bootmem.
+	 * Do not do that twice, not needed!
+	 *   We are using mapped memblock directly for node data
 	 */
-	for (i = 0; i < 2; i++) {
-		for_each_node_mask(nid, node_possible_map) {
-			u64 start = (u64)max_pfn << PAGE_SHIFT;
-			u64 end = 0;
+	for_each_node_mask(nid, node_possible_map) {
+		u64 start = (u64)max_pfn << PAGE_SHIFT;
+		u64 end = 0;
 
-			if (node_online(nid))
+		for (i = 0; i < mi->nr_blks; i++) {
+			if (nid != mi->blk[i].nid)
 				continue;
-
-			for (j = 0; j < mi->nr_blks; j++) {
-				if (nid != mi->blk[j].nid)
-					continue;
-				start = min(mi->blk[j].start, start);
-				end = max(mi->blk[j].end, end);
-			}
-
-			if (start < end)
-				setup_node_bootmem(nid, start, end);
+			start = min(mi->blk[i].start, start);
+			end = max(mi->blk[i].end, end);
 		}
+
+		if (start < end)
+			setup_node_bootmem(nid, start, end);
 	}
 
 	return 0;

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 2/4] x86, numa: Do not adjust start/end for early_node_mem()
       [not found] <4D5EC05A.60103@kernel.org>
  2011-02-18 18:59 ` [PATCH 1/4] x86, numa: Do not scan two times for setup_node_bootmem() Yinghai Lu
@ 2011-02-18 18:59 ` Yinghai Lu
  2011-02-20  4:03   ` David Rientjes
  2011-02-18 18:59 ` [PATCH 3/4] x86, numa: cleanup x86_acpi_numa_init() Yinghai Lu
  2011-02-18 18:59 ` [PATCH 4/4] x86, numa: seperate alloc_numa_distance from numa_reset_distance Yinghai Lu
  3 siblings, 1 reply; 23+ messages in thread
From: Yinghai Lu @ 2011-02-18 18:59 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tejun Heo
  Cc: linux-kernel, David Rientjes, Cyrill Gorcunov


We already have top-down allocation, so do not bother to adjust
start/end to make sure we get a high address on the local node.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/mm/numa_64.c |    9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

Index: linux-2.6/arch/x86/mm/numa_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/numa_64.c
+++ linux-2.6/arch/x86/mm/numa_64.c
@@ -164,14 +164,9 @@ static void * __init early_node_mem(int
 	unsigned long mem;
 
 	/*
-	 * put it on high as possible
-	 * something will go with NODE_DATA
+	 * memblock find will follow top-down, so don't need to adjust
+	 *   start anymore
 	 */
-	if (start < (MAX_DMA_PFN<<PAGE_SHIFT))
-		start = MAX_DMA_PFN<<PAGE_SHIFT;
-	if (start < (MAX_DMA32_PFN<<PAGE_SHIFT) &&
-	    end > (MAX_DMA32_PFN<<PAGE_SHIFT))
-		start = MAX_DMA32_PFN<<PAGE_SHIFT;
 	mem = memblock_x86_find_in_range_node(nodeid, start, end, size, align);
 	if (mem != MEMBLOCK_ERROR)
 		return __va(mem);


* [PATCH 3/4] x86, numa: cleanup x86_acpi_numa_init()
       [not found] <4D5EC05A.60103@kernel.org>
  2011-02-18 18:59 ` [PATCH 1/4] x86, numa: Do not scan two times for setup_node_bootmem() Yinghai Lu
  2011-02-18 18:59 ` [PATCH 2/4] x86, numa: Do not adjust start/end for early_node_mem() Yinghai Lu
@ 2011-02-18 18:59 ` Yinghai Lu
  2011-02-20  4:03   ` David Rientjes
  2011-02-18 18:59 ` [PATCH 4/4] x86, numa: seperate alloc_numa_distance from numa_reset_distance Yinghai Lu
  3 siblings, 1 reply; 23+ messages in thread
From: Yinghai Lu @ 2011-02-18 18:59 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tejun Heo
  Cc: linux-kernel, David Rientjes, Cyrill Gorcunov


Make it more readable by putting the validity checks together.

Also restore the old acpi_numa_init(); we don't need to touch it
because we already have an x86-specific wrapper.

This limits the change to x86 code only.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/mm/srat_64.c |   13 ++++++++++++-
 drivers/acpi/numa.c   |    8 +++-----
 2 files changed, 15 insertions(+), 6 deletions(-)

Index: linux-2.6/arch/x86/mm/srat_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/srat_64.c
+++ linux-2.6/arch/x86/mm/srat_64.c
@@ -240,9 +240,20 @@ int __init x86_acpi_numa_init(void)
 	int ret;
 
 	ret = acpi_numa_init();
+
+	/* no srat */
+	if (!ret)
+		return -ENOENT;
+
+	/* parse failed */
 	if (ret < 0)
 		return ret;
-	return srat_disabled() ? -EINVAL : 0;
+
+	/* bad_srat() */
+	if (srat_disabled())
+		return -EINVAL;
+
+	return 0;
 }
 
 #if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) || defined(CONFIG_ACPI_HOTPLUG_MEMORY)
Index: linux-2.6/drivers/acpi/numa.c
===================================================================
--- linux-2.6.orig/drivers/acpi/numa.c
+++ linux-2.6/drivers/acpi/numa.c
@@ -274,7 +274,7 @@ acpi_table_parse_srat(enum acpi_srat_typ
 
 int __init acpi_numa_init(void)
 {
-	int cnt = 0;
+	int ret = 0;
 
 	/*
 	 * Should not limit number with cpu num that is from NR_CPUS or nr_cpus=
@@ -288,7 +288,7 @@ int __init acpi_numa_init(void)
 				     acpi_parse_x2apic_affinity, 0);
 		acpi_table_parse_srat(ACPI_SRAT_TYPE_CPU_AFFINITY,
 				     acpi_parse_processor_affinity, 0);
-		cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
+		ret = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
 					    acpi_parse_memory_affinity,
 					    NR_NODE_MEMBLKS);
 	}
@@ -298,9 +298,7 @@ int __init acpi_numa_init(void)
 
 	acpi_numa_arch_fixup();
 
-	if (cnt <= 0)
-		return cnt ?: -ENOENT;
-	return 0;
+	return ret;
 }
 
 int acpi_get_pxm(acpi_handle h)


* [PATCH 4/4] x86, numa: seperate alloc_numa_distance from numa_reset_distance.
       [not found] <4D5EC05A.60103@kernel.org>
                   ` (2 preceding siblings ...)
  2011-02-18 18:59 ` [PATCH 3/4] x86, numa: cleanup x86_acpi_numa_init() Yinghai Lu
@ 2011-02-18 18:59 ` Yinghai Lu
  2011-02-20  4:03   ` David Rientjes
  2011-02-21 10:48   ` Tejun Heo
  3 siblings, 2 replies; 23+ messages in thread
From: Yinghai Lu @ 2011-02-18 18:59 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tejun Heo
  Cc: linux-kernel, David Rientjes, Cyrill Gorcunov


The allocation code is much bigger than the setting itself.

Separate it out to make the code more readable.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/mm/numa_64.c |   77 +++++++++++++++++++++++++++-----------------------
 1 file changed, 42 insertions(+), 35 deletions(-)

Index: linux-2.6/arch/x86/mm/numa_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/numa_64.c
+++ linux-2.6/arch/x86/mm/numa_64.c
@@ -375,6 +375,43 @@ static void __init numa_reset_distance(v
 	numa_distance = NULL;
 }
 
+static void __init alloc_numa_distance(void)
+{
+	nodemask_t nodes_parsed;
+	size_t size;
+	int i, j, cnt = 0;
+	u64 phys;
+
+	/* size the new table and allocate it */
+	nodes_parsed = numa_nodes_parsed;
+	numa_nodemask_from_meminfo(&nodes_parsed, &numa_meminfo);
+
+	for_each_node_mask(i, nodes_parsed)
+		cnt = i;
+	size = ++cnt * sizeof(numa_distance[0]);
+
+	phys = memblock_find_in_range(0,
+				      (u64)max_pfn_mapped << PAGE_SHIFT,
+				      size, PAGE_SIZE);
+	if (phys == MEMBLOCK_ERROR) {
+		pr_warning("NUMA: Warning: can't allocate distance table!\n");
+		/* don't retry until explicitly reset */
+		numa_distance = (void *)1LU;
+		return;
+	}
+	memblock_x86_reserve_range(phys, phys + size, "NUMA DIST");
+
+	numa_distance = __va(phys);
+	numa_distance_cnt = cnt;
+
+	/* fill with the default distances */
+	for (i = 0; i < cnt; i++)
+		for (j = 0; j < cnt; j++)
+			numa_distance[i * cnt + j] = i == j ?
+				LOCAL_DISTANCE : REMOTE_DISTANCE;
+	printk(KERN_DEBUG "NUMA: Initialized distance table, cnt=%d\n", cnt);
+}
+
 /*
  * Set the distance between node @from to @to to @distance.  If distance
  * table doesn't exist, one which is large enough to accomodate all the
@@ -382,41 +419,11 @@ static void __init numa_reset_distance(v
  */
 void __init numa_set_distance(int from, int to, int distance)
 {
-	if (!numa_distance) {
-		nodemask_t nodes_parsed;
-		size_t size;
-		int i, j, cnt = 0;
-		u64 phys;
-
-		/* size the new table and allocate it */
-		nodes_parsed = numa_nodes_parsed;
-		numa_nodemask_from_meminfo(&nodes_parsed, &numa_meminfo);
-
-		for_each_node_mask(i, nodes_parsed)
-			cnt = i;
-		size = ++cnt * sizeof(numa_distance[0]);
-
-		phys = memblock_find_in_range(0,
-					      (u64)max_pfn_mapped << PAGE_SHIFT,
-					      size, PAGE_SIZE);
-		if (phys == MEMBLOCK_ERROR) {
-			pr_warning("NUMA: Warning: can't allocate distance table!\n");
-			/* don't retry until explicitly reset */
-			numa_distance = (void *)1LU;
-			return;
-		}
-		memblock_x86_reserve_range(phys, phys + size, "NUMA DIST");
-
-		numa_distance = __va(phys);
-		numa_distance_cnt = cnt;
-
-		/* fill with the default distances */
-		for (i = 0; i < cnt; i++)
-			for (j = 0; j < cnt; j++)
-				numa_distance[i * cnt + j] = i == j ?
-					LOCAL_DISTANCE : REMOTE_DISTANCE;
-		printk(KERN_DEBUG "NUMA: Initialized distance table, cnt=%d\n", cnt);
-	}
+	if (!numa_distance)
+		alloc_numa_distance();
+
+	if (!numa_distance_cnt)
+		return;
 
 	if (from >= numa_distance_cnt || to >= numa_distance_cnt) {
 		printk_once(KERN_DEBUG "NUMA: Debug: distance out of bound, from=%d to=%d distance=%d\n",


* Re: [PATCH 4/4] x86, numa: seperate alloc_numa_distance from numa_reset_distance.
  2011-02-18 18:59 ` [PATCH 4/4] x86, numa: seperate alloc_numa_distance from numa_reset_distance Yinghai Lu
@ 2011-02-20  4:03   ` David Rientjes
  2011-02-21 10:48   ` Tejun Heo
  1 sibling, 0 replies; 23+ messages in thread
From: David Rientjes @ 2011-02-20  4:03 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tejun Heo,
	linux-kernel, Cyrill Gorcunov

On Fri, 18 Feb 2011, Yinghai Lu wrote:

> 
> The allocation code is much bigger than the setting itself.
> 
> Separate it out to make the code more readable.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>

Acked-by: David Rientjes <rientjes@google.com>


* Re: [PATCH 1/4] x86, numa: Do not scan two times for setup_node_bootmem()
  2011-02-18 18:59 ` [PATCH 1/4] x86, numa: Do not scan two times for setup_node_bootmem() Yinghai Lu
@ 2011-02-20  4:03   ` David Rientjes
  2011-02-21 10:20     ` [PATCH] x86-64, NUMA: " Tejun Heo
  0 siblings, 1 reply; 23+ messages in thread
From: David Rientjes @ 2011-02-20  4:03 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tejun Heo,
	linux-kernel, Cyrill Gorcunov

On Fri, 18 Feb 2011, Yinghai Lu wrote:

> 
> We don't need to scan twice for setup_node_bootmem()
> because we are already using mapped memblock to find node_data.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> 
> ---
>  arch/x86/mm/numa_64.c |   31 +++++++++++++------------------
>  1 file changed, 13 insertions(+), 18 deletions(-)
> 
> Index: linux-2.6/arch/x86/mm/numa_64.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/mm/numa_64.c
> +++ linux-2.6/arch/x86/mm/numa_64.c
> @@ -480,7 +480,7 @@ static bool __init numa_meminfo_cover_me
>  
>  static int __init numa_register_memblks(struct numa_meminfo *mi)
>  {
> -	int i, j, nid;
> +	int i, nid;
>  
>  	/* Account for nodes with cpus and no memory */
>  	node_possible_map = numa_nodes_parsed;
> @@ -507,27 +507,22 @@ static int __init numa_register_memblks(
>  	init_memory_mapping_high();
>  
>  	/*
> -	 * Finally register nodes.  Do it twice in case setup_node_bootmem
> -	 * missed one due to missing bootmem.
> +	 * Do not do that twice, not needed!
> +	 *   We are using mapped memblock directly for node data

I don't think this comment is actually needed; it seems to refer to
why this patch is changing the code and has no real relevance once it's merged.

Other than that:

Acked-by: David Rientjes <rientjes@google.com>

>  	 */
> -	for (i = 0; i < 2; i++) {
> -		for_each_node_mask(nid, node_possible_map) {
> -			u64 start = (u64)max_pfn << PAGE_SHIFT;
> -			u64 end = 0;
> +	for_each_node_mask(nid, node_possible_map) {
> +		u64 start = (u64)max_pfn << PAGE_SHIFT;
> +		u64 end = 0;
>  
> -			if (node_online(nid))
> +		for (i = 0; i < mi->nr_blks; i++) {
> +			if (nid != mi->blk[i].nid)
>  				continue;
> -
> -			for (j = 0; j < mi->nr_blks; j++) {
> -				if (nid != mi->blk[j].nid)
> -					continue;
> -				start = min(mi->blk[j].start, start);
> -				end = max(mi->blk[j].end, end);
> -			}
> -
> -			if (start < end)
> -				setup_node_bootmem(nid, start, end);
> +			start = min(mi->blk[i].start, start);
> +			end = max(mi->blk[i].end, end);
>  		}
> +
> +		if (start < end)
> +			setup_node_bootmem(nid, start, end);
>  	}
>  
>  	return 0;
> 


* Re: [PATCH 2/4] x86, numa: Do not adjust start/end for early_node_mem()
  2011-02-18 18:59 ` [PATCH 2/4] x86, numa: Do not adjust start/end for early_node_mem() Yinghai Lu
@ 2011-02-20  4:03   ` David Rientjes
  2011-02-20  4:17     ` Yinghai Lu
  0 siblings, 1 reply; 23+ messages in thread
From: David Rientjes @ 2011-02-20  4:03 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tejun Heo,
	linux-kernel, Cyrill Gorcunov

On Fri, 18 Feb 2011, Yinghai Lu wrote:

> Index: linux-2.6/arch/x86/mm/numa_64.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/mm/numa_64.c
> +++ linux-2.6/arch/x86/mm/numa_64.c
> @@ -164,14 +164,9 @@ static void * __init early_node_mem(int
>  	unsigned long mem;
>  
>  	/*
> -	 * put it on high as possible
> -	 * something will go with NODE_DATA
> +	 * memblock find will follow top-down, so don't need to adjust
> +	 *   start anymore
>  	 */
> -	if (start < (MAX_DMA_PFN<<PAGE_SHIFT))
> -		start = MAX_DMA_PFN<<PAGE_SHIFT;
> -	if (start < (MAX_DMA32_PFN<<PAGE_SHIFT) &&
> -	    end > (MAX_DMA32_PFN<<PAGE_SHIFT))
> -		start = MAX_DMA32_PFN<<PAGE_SHIFT;
>  	mem = memblock_x86_find_in_range_node(nodeid, start, end, size, align);
>  	if (mem != MEMBLOCK_ERROR)
>  		return __va(mem);
> 

The old code guarantees that the range is from a single zone, and even 
though memblock may be top -> down, it seems like there would be 
configurations where this would still be an issue (perhaps simulating it 
with numa=fake for testing?) if it crosses the boundary.


* Re: [PATCH 3/4] x86, numa: cleanup x86_acpi_numa_init()
  2011-02-18 18:59 ` [PATCH 3/4] x86, numa: cleanup x86_acpi_numa_init() Yinghai Lu
@ 2011-02-20  4:03   ` David Rientjes
  2011-02-21  9:46     ` Tejun Heo
  0 siblings, 1 reply; 23+ messages in thread
From: David Rientjes @ 2011-02-20  4:03 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tejun Heo,
	linux-kernel, Cyrill Gorcunov

On Fri, 18 Feb 2011, Yinghai Lu wrote:

> 
> Make it more readable by putting the validity checks together.
> 
> Also restore the old acpi_numa_init(); we don't need to touch it
> because we already have an x86-specific wrapper.
> 
> This limits the change to x86 code only.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>

Acked-by: David Rientjes <rientjes@google.com>


* Re: [PATCH 2/4] x86, numa: Do not adjust start/end for early_node_mem()
  2011-02-20  4:03   ` David Rientjes
@ 2011-02-20  4:17     ` Yinghai Lu
  2011-02-21  9:43       ` Tejun Heo
  0 siblings, 1 reply; 23+ messages in thread
From: Yinghai Lu @ 2011-02-20  4:17 UTC (permalink / raw)
  To: David Rientjes
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tejun Heo,
	linux-kernel, Cyrill Gorcunov

On Sat, Feb 19, 2011 at 8:03 PM, David Rientjes <rientjes@google.com> wrote:
> On Fri, 18 Feb 2011, Yinghai Lu wrote:
>
>> Index: linux-2.6/arch/x86/mm/numa_64.c
>> ===================================================================
>> --- linux-2.6.orig/arch/x86/mm/numa_64.c
>> +++ linux-2.6/arch/x86/mm/numa_64.c
>> @@ -164,14 +164,9 @@ static void * __init early_node_mem(int
>>       unsigned long mem;
>>
>>       /*
>> -      * put it on high as possible
>> -      * something will go with NODE_DATA
>> +      * memblock find will follow top-down, so don't need to adjust
>> +      *   start anymore
>>        */
>> -     if (start < (MAX_DMA_PFN<<PAGE_SHIFT))
>> -             start = MAX_DMA_PFN<<PAGE_SHIFT;
>> -     if (start < (MAX_DMA32_PFN<<PAGE_SHIFT) &&
>> -         end > (MAX_DMA32_PFN<<PAGE_SHIFT))
>> -             start = MAX_DMA32_PFN<<PAGE_SHIFT;
>>       mem = memblock_x86_find_in_range_node(nodeid, start, end, size, align);
>>       if (mem != MEMBLOCK_ERROR)
>>               return __va(mem);
>>
>
> The old code guarantees that the range is from a single zone, and even
> though memblock may be top -> down, it seems like there would be
> configurations where this would still be an issue (perhaps simulating it
> with numa=fake for testing?) if it crosses the boundary.

memblock_x86_find_in_range_node() walks early_node_map[], so it will
always get the allocation right on the requested node.

Thanks

Yinghai


* Re: [PATCH 2/4] x86, numa: Do not adjust start/end for early_node_mem()
  2011-02-20  4:17     ` Yinghai Lu
@ 2011-02-21  9:43       ` Tejun Heo
  2011-02-21 10:44         ` Tejun Heo
  2011-02-21 17:21         ` Yinghai Lu
  0 siblings, 2 replies; 23+ messages in thread
From: Tejun Heo @ 2011-02-21  9:43 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: David Rientjes, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	linux-kernel, Cyrill Gorcunov

Hello,

On Sat, Feb 19, 2011 at 08:17:11PM -0800, Yinghai Lu wrote:
> > The old code guarantees that the range is from a single zone, and even
> > though memblock may be top -> down, it seems like there would be
> > configurations where this would still be an issue (perhaps simulating it
> > with numa=fake for testing?) if it crosses the boundary.
> 
> > memblock_x86_find_in_range_node() walks early_node_map[], so it
> > will always get the allocation right on the requested node.

I think always doing top-down allocation should be enough as long as
there's no highmem, which we don't have on 64bit.  That said, the
patch description should note the behavior difference.  Yinghai, care
to add a bit more detail to the patch description?

Thanks.

-- 
tejun


* Re: [PATCH 3/4] x86, numa: cleanup x86_acpi_numa_init()
  2011-02-20  4:03   ` David Rientjes
@ 2011-02-21  9:46     ` Tejun Heo
  0 siblings, 0 replies; 23+ messages in thread
From: Tejun Heo @ 2011-02-21  9:46 UTC (permalink / raw)
  To: David Rientjes
  Cc: Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	linux-kernel, Cyrill Gorcunov

On Sat, Feb 19, 2011 at 08:03:48PM -0800, David Rientjes wrote:
> On Fri, 18 Feb 2011, Yinghai Lu wrote:
> > Make it more readable by putting the validity checks together.
> > 
> > Also restore the old acpi_numa_init(); we don't need to touch it
> > because we already have an x86-specific wrapper.
> > 
> > This limits the change to x86 code only.
> > 
> > Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> 
> Acked-by: David Rientjes <rientjes@google.com>

I'm not taking this one.  It's pointless and to me it seems worse.
Why does the function return the count?  What difference do the counts
of 5 and 10 make to the caller?  All it can indicate is failure or
success and 0 and -errno are much better return values for that.

Thanks.

-- 
tejun


* [PATCH] x86-64, NUMA: Do not scan two times for setup_node_bootmem()
  2011-02-20  4:03   ` David Rientjes
@ 2011-02-21 10:20     ` Tejun Heo
  0 siblings, 0 replies; 23+ messages in thread
From: Tejun Heo @ 2011-02-21 10:20 UTC (permalink / raw)
  To: David Rientjes
  Cc: Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	linux-kernel, Cyrill Gorcunov

From 2d389b34c02cbc36c561a309735f69568aa9c3ea Mon Sep 17 00:00:00 2001
From: Yinghai Lu <yinghai@kernel.org>
Date: Mon, 21 Feb 2011 10:58:13 +0100

By the time setup_node_bootmem() is called, all the memblocks are
already registered.  As node_data is allocated from these memblocks,
calling it more than once doesn't make any difference.  Drop the loop.

tj: Rephrased description.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
Applied to x86-numa.  I dropped the comment David pointed out and
redid the description.  If you are unhappy with me redoing the
descriptions, please scream.

Thanks.

 arch/x86/mm/numa_64.c |   32 ++++++++++++--------------------
 1 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index f6d85e3..6e4ee96 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -480,7 +480,7 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
 
 static int __init numa_register_memblks(struct numa_meminfo *mi)
 {
-	int i, j, nid;
+	int i, nid;
 
 	/* Account for nodes with cpus and no memory */
 	node_possible_map = numa_nodes_parsed;
@@ -506,28 +506,20 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 
 	init_memory_mapping_high();
 
-	/*
-	 * Finally register nodes.  Do it twice in case setup_node_bootmem
-	 * missed one due to missing bootmem.
-	 */
-	for (i = 0; i < 2; i++) {
-		for_each_node_mask(nid, node_possible_map) {
-			u64 start = (u64)max_pfn << PAGE_SHIFT;
-			u64 end = 0;
+	/* Finally register nodes. */
+	for_each_node_mask(nid, node_possible_map) {
+		u64 start = (u64)max_pfn << PAGE_SHIFT;
+		u64 end = 0;
 
-			if (node_online(nid))
+		for (i = 0; i < mi->nr_blks; i++) {
+			if (nid != mi->blk[i].nid)
 				continue;
-
-			for (j = 0; j < mi->nr_blks; j++) {
-				if (nid != mi->blk[j].nid)
-					continue;
-				start = min(mi->blk[j].start, start);
-				end = max(mi->blk[j].end, end);
-			}
-
-			if (start < end)
-				setup_node_bootmem(nid, start, end);
+			start = min(mi->blk[i].start, start);
+			end = max(mi->blk[i].end, end);
 		}
+
+		if (start < end)
+			setup_node_bootmem(nid, start, end);
 	}
 
 	return 0;
-- 
1.7.1



* Re: [PATCH 2/4] x86, numa: Do not adjust start/end for early_node_mem()
  2011-02-21  9:43       ` Tejun Heo
@ 2011-02-21 10:44         ` Tejun Heo
  2011-02-21 20:28           ` Yinghai Lu
  2011-02-21 17:21         ` Yinghai Lu
  1 sibling, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2011-02-21 10:44 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: David Rientjes, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	linux-kernel, Cyrill Gorcunov

On Mon, Feb 21, 2011 at 10:43:18AM +0100, Tejun Heo wrote:
> Hello,
> 
> On Sat, Feb 19, 2011 at 08:17:11PM -0800, Yinghai Lu wrote:
> > > The old code guarantees that the range is from a single zone, and even
> > > though memblock may be top -> down, it seems like there would be
> > > configurations where this would still be an issue (perhaps simulating it
> > > with numa=fake for testing?) if it crosses the boundary.
> > 
> > memblock_x86_find_in_range_node() walks early_node_map[], so it
> > will always get the allocation right on the requested node.
> 
> I think always doing top-down allocation should be enough as long as
> there's no highmem, which we don't have on 64bit.  That said, the
> patch description should note the behavior difference.  Yinghai, care
> to add a bit more detail to the patch description?

Hmmm... thinking more about it, there actually is a difference.
Depending on configuration, the new code allows node_data[] to be
allocated below DMA boundary.  I think we need to keep the first if().
Areas crossing the boundaries are okay; in fact, the original code
already allowed that when the NUMA affine allocation failed; however,
node_data[] was never allowed below the DMA boundary and I think it
shouldn't be.

Thanks.

-- 
tejun


* Re: [PATCH 4/4] x86, numa: seperate alloc_numa_distance from numa_reset_distance.
  2011-02-18 18:59 ` [PATCH 4/4] x86, numa: seperate alloc_numa_distance from numa_reset_distance Yinghai Lu
  2011-02-20  4:03   ` David Rientjes
@ 2011-02-21 10:48   ` Tejun Heo
  2011-02-21 17:36     ` Yinghai Lu
  1 sibling, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2011-02-21 10:48 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel,
	David Rientjes, Cyrill Gorcunov

Hello,

On Fri, Feb 18, 2011 at 10:59:38AM -0800, Yinghai Lu wrote:
> +	if (!numa_distance)
> +		alloc_numa_distance();
> +
> +	if (!numa_distance_cnt)
> +		return;

Can you at least make alloc_numa_distance() return -ENOMEM on failure
instead of checking the numa_distance_cnt implicitly in the caller?

Thanks.

-- 
tejun


* Re: [PATCH 2/4] x86, numa: Do not adjust start/end for early_node_mem()
  2011-02-21  9:43       ` Tejun Heo
  2011-02-21 10:44         ` Tejun Heo
@ 2011-02-21 17:21         ` Yinghai Lu
  2011-02-22  0:27           ` David Rientjes
  1 sibling, 1 reply; 23+ messages in thread
From: Yinghai Lu @ 2011-02-21 17:21 UTC (permalink / raw)
  To: Tejun Heo
  Cc: David Rientjes, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	linux-kernel, Cyrill Gorcunov

On 02/21/2011 01:43 AM, Tejun Heo wrote:
> Hello,
> 
> On Sat, Feb 19, 2011 at 08:17:11PM -0800, Yinghai Lu wrote:
>>> The old code guarantees that the range is from a single zone, and even
>>> though memblock may be top -> down, it seems like there would be
>>> configurations where this would still be an issue (perhaps simulating it
>>> with numa=fake for testing?) if it crosses the boundary.
>>
>> memblock_x86_find_in_range_node() walks early_node_map[], so it
>> will always get the allocation right on the requested node.
> 
> I think always doing top-down allocation should be enough as long as
> there's no highmem, which we don't have on 64bit.  That said, the
> patch description should note the behavior difference.  Yinghai, care
> to add a bit more detail to the patch description?

please check

[PATCH] x86, numa, 64bit: Do not adjust start/end at first for early_node_mem()

We now have top-down allocation via memblock, so we do not need to
adjust start/end to keep them above the DMA region; memblock
allocation always gets us a high address.

Just remove those lines.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/mm/numa_64.c |    9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

Index: linux-2.6/arch/x86/mm/numa_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/numa_64.c
+++ linux-2.6/arch/x86/mm/numa_64.c
@@ -164,14 +164,9 @@ static void * __init early_node_mem(int
 	unsigned long mem;
 
 	/*
-	 * put it on high as possible
-	 * something will go with NODE_DATA
+	 * memblock find will follow top-down. we will get addr above DMA region
+	 *  if possible, so don't need to adjust start anymore
 	 */
-	if (start < (MAX_DMA_PFN<<PAGE_SHIFT))
-		start = MAX_DMA_PFN<<PAGE_SHIFT;
-	if (start < (MAX_DMA32_PFN<<PAGE_SHIFT) &&
-	    end > (MAX_DMA32_PFN<<PAGE_SHIFT))
-		start = MAX_DMA32_PFN<<PAGE_SHIFT;
 	mem = memblock_x86_find_in_range_node(nodeid, start, end, size, align);
 	if (mem != MEMBLOCK_ERROR)
 		return __va(mem);



* Re: [PATCH 4/4] x86, numa: seperate alloc_numa_distance from numa_reset_distance.
  2011-02-21 10:48   ` Tejun Heo
@ 2011-02-21 17:36     ` Yinghai Lu
  2011-02-22 10:23       ` [PATCH] x86-64, NUMA: Seperate out numa_alloc_distance() from numa_set_distance() Tejun Heo
  0 siblings, 1 reply; 23+ messages in thread
From: Yinghai Lu @ 2011-02-21 17:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel,
	David Rientjes, Cyrill Gorcunov

On 02/21/2011 02:48 AM, Tejun Heo wrote:
> Hello,
> 
> On Fri, Feb 18, 2011 at 10:59:38AM -0800, Yinghai Lu wrote:
>> +	if (!numa_distance)
>> +		alloc_numa_distance();
>> +
>> +	if (!numa_distance_cnt)
>> +		return;
> 
> Can you at least make alloc_numa_distance() return -ENOMEM on failure
> instead of checking the numa_distance_cnt implicitly in the caller?

please check

[PATCH -v2] x86, numa: seperate alloc_numa_distance from numa_reset_distance.

The allocation code is much bigger than the setting itself.

Separate it out to make the code more readable.

-v2: make alloc_numa_distance() return -ENOMEM on the failure path, as
requested by tj.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>

---
 arch/x86/mm/numa_64.c |   76 ++++++++++++++++++++++++++------------------------
 1 file changed, 41 insertions(+), 35 deletions(-)

Index: linux-2.6/arch/x86/mm/numa_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/numa_64.c
+++ linux-2.6/arch/x86/mm/numa_64.c
@@ -375,6 +375,45 @@ static void __init numa_reset_distance(v
 	numa_distance = NULL;
 }
 
+static int __init alloc_numa_distance(void)
+{
+	nodemask_t nodes_parsed;
+	size_t size;
+	int i, j, cnt = 0;
+	u64 phys;
+
+	/* size the new table and allocate it */
+	nodes_parsed = numa_nodes_parsed;
+	numa_nodemask_from_meminfo(&nodes_parsed, &numa_meminfo);
+
+	for_each_node_mask(i, nodes_parsed)
+		cnt = i;
+	size = ++cnt * sizeof(numa_distance[0]);
+
+	phys = memblock_find_in_range(0,
+				      (u64)max_pfn_mapped << PAGE_SHIFT,
+				      size, PAGE_SIZE);
+	if (phys == MEMBLOCK_ERROR) {
+		pr_warning("NUMA: Warning: can't allocate distance table!\n");
+		/* don't retry until explicitly reset */
+		numa_distance = (void *)1LU;
+		return -ENOMEM;
+	}
+	memblock_x86_reserve_range(phys, phys + size, "NUMA DIST");
+
+	numa_distance = __va(phys);
+	numa_distance_cnt = cnt;
+
+	/* fill with the default distances */
+	for (i = 0; i < cnt; i++)
+		for (j = 0; j < cnt; j++)
+			numa_distance[i * cnt + j] = i == j ?
+				LOCAL_DISTANCE : REMOTE_DISTANCE;
+	printk(KERN_DEBUG "NUMA: Initialized distance table, cnt=%d\n", cnt);
+
+	return 0;
+}
+
 /*
  * Set the distance between node @from to @to to @distance.  If distance
  * table doesn't exist, one which is large enough to accomodate all the
@@ -382,41 +421,8 @@ static void __init numa_reset_distance(v
  */
 void __init numa_set_distance(int from, int to, int distance)
 {
-	if (!numa_distance) {
-		nodemask_t nodes_parsed;
-		size_t size;
-		int i, j, cnt = 0;
-		u64 phys;
-
-		/* size the new table and allocate it */
-		nodes_parsed = numa_nodes_parsed;
-		numa_nodemask_from_meminfo(&nodes_parsed, &numa_meminfo);
-
-		for_each_node_mask(i, nodes_parsed)
-			cnt = i;
-		size = ++cnt * sizeof(numa_distance[0]);
-
-		phys = memblock_find_in_range(0,
-					      (u64)max_pfn_mapped << PAGE_SHIFT,
-					      size, PAGE_SIZE);
-		if (phys == MEMBLOCK_ERROR) {
-			pr_warning("NUMA: Warning: can't allocate distance table!\n");
-			/* don't retry until explicitly reset */
-			numa_distance = (void *)1LU;
-			return;
-		}
-		memblock_x86_reserve_range(phys, phys + size, "NUMA DIST");
-
-		numa_distance = __va(phys);
-		numa_distance_cnt = cnt;
-
-		/* fill with the default distances */
-		for (i = 0; i < cnt; i++)
-			for (j = 0; j < cnt; j++)
-				numa_distance[i * cnt + j] = i == j ?
-					LOCAL_DISTANCE : REMOTE_DISTANCE;
-		printk(KERN_DEBUG "NUMA: Initialized distance table, cnt=%d\n", cnt);
-	}
+	if (!numa_distance && alloc_numa_distance())
+		return;
 
 	if (from >= numa_distance_cnt || to >= numa_distance_cnt) {
 		printk_once(KERN_DEBUG "NUMA: Debug: distance out of bound, from=%d to=%d distance=%d\n",

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/4] x86, numa: Do not adjust start/end for early_node_mem()
  2011-02-21 10:44         ` Tejun Heo
@ 2011-02-21 20:28           ` Yinghai Lu
  2011-02-22 10:36             ` Tejun Heo
  0 siblings, 1 reply; 23+ messages in thread
From: Yinghai Lu @ 2011-02-21 20:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: David Rientjes, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	linux-kernel, Cyrill Gorcunov

On 02/21/2011 02:44 AM, Tejun Heo wrote:
> On Mon, Feb 21, 2011 at 10:43:18AM +0100, Tejun Heo wrote:
>> Hello,
>>
>> On Sat, Feb 19, 2011 at 08:17:11PM -0800, Yinghai Lu wrote:
>>>> The old code guarantees that the range is from a single zone, and even
>>>> though memblock may be top -> down, it seems like there would be
>>>> configurations where this would still be an issue (perhaps simulating it
>>>> with numa=fake for testing?) if it crosses the boundary.
>>>
>>> memblock_x86_find_in_range_node() walks early_node_map[],
>>> so it can always get the node allocation right.
>>
>> I think always doing top-down allocation should be enough as long as
>> there's no highmem, which we don't have on 64bit.  That said, the
>> patch description should note the behavior difference.  Yinghai, care
>> to add a bit more detail to the patch description?
> 
> Hmmm... thinking more about it, there actually is a difference.
> Depending on configuration, the new code allows node_data[] to be
> allocated below DMA boundary.  I think we need to keep the first if().
> Areas crossing the boundaries are okay; in fact, the original code
> already allowed that when the NUMA affine allocation failed; however,
> node_data[] was never allowed below the DMA boundary and I think it
> shouldn't be.

No.  When that code was first added, it was bottom-up allocation from e820.

Now, with the new memblock allocation, it always tries top-down,
so it will normally have no chance to end up under the DMA boundary,
unless your first node has less than 16M.

Thanks

Yinghai Lu


* Re: [PATCH 2/4] x86, numa: Do not adjust start/end for early_node_mem()
  2011-02-21 17:21         ` Yinghai Lu
@ 2011-02-22  0:27           ` David Rientjes
  0 siblings, 0 replies; 23+ messages in thread
From: David Rientjes @ 2011-02-22  0:27 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Tejun Heo, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	linux-kernel, Cyrill Gorcunov

On Mon, 21 Feb 2011, Yinghai Lu wrote:

> [PATCH] x86, numa, 64bit: Do not adjust start/end at first for early_node_mem()
> 
> We have top-down allocation with memblock now,
> so we do not need to adjust start/end to keep them above the DMA region;
> memblock allocation always returns a high address for us.
> 
> Just remove those lines.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>

Acked-by: David Rientjes <rientjes@google.com>


* [PATCH] x86-64, NUMA: Seperate out numa_alloc_distance() from numa_set_distance()
  2011-02-21 17:36     ` Yinghai Lu
@ 2011-02-22 10:23       ` Tejun Heo
  2011-02-22 10:30         ` Ingo Molnar
  0 siblings, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2011-02-22 10:23 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel,
	David Rientjes, Cyrill Gorcunov

The allocation code is much bigger than the distance-setting code.  Separate
it out into numa_alloc_distance() for readability.

-v2: Let alloc_numa_distance() return -ENOMEM on the failure path, as
     requested by tj.

-tj: Description update.  Minor tweaks including function name,
     location and return value check.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
Applied to x86-numa with minor tweaks.  Thanks.

 arch/x86/mm/numa_64.c |   75 ++++++++++++++++++++++++++-----------------------
 1 files changed, 40 insertions(+), 35 deletions(-)

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 848381b..cccc01d 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -401,6 +401,44 @@ void __init numa_reset_distance(void)
 	numa_distance = NULL;
 }
 
+static int __init numa_alloc_distance(void)
+{
+	nodemask_t nodes_parsed;
+	size_t size;
+	int i, j, cnt = 0;
+	u64 phys;
+
+	/* size the new table and allocate it */
+	nodes_parsed = numa_nodes_parsed;
+	numa_nodemask_from_meminfo(&nodes_parsed, &numa_meminfo);
+
+	for_each_node_mask(i, nodes_parsed)
+		cnt = i;
+	size = ++cnt * sizeof(numa_distance[0]);
+
+	phys = memblock_find_in_range(0, (u64)max_pfn_mapped << PAGE_SHIFT,
+				      size, PAGE_SIZE);
+	if (phys == MEMBLOCK_ERROR) {
+		pr_warning("NUMA: Warning: can't allocate distance table!\n");
+		/* don't retry until explicitly reset */
+		numa_distance = (void *)1LU;
+		return -ENOMEM;
+	}
+	memblock_x86_reserve_range(phys, phys + size, "NUMA DIST");
+
+	numa_distance = __va(phys);
+	numa_distance_cnt = cnt;
+
+	/* fill with the default distances */
+	for (i = 0; i < cnt; i++)
+		for (j = 0; j < cnt; j++)
+			numa_distance[i * cnt + j] = i == j ?
+				LOCAL_DISTANCE : REMOTE_DISTANCE;
+	printk(KERN_DEBUG "NUMA: Initialized distance table, cnt=%d\n", cnt);
+
+	return 0;
+}
+
 /**
  * numa_set_distance - Set NUMA distance from one NUMA to another
  * @from: the 'from' node to set distance
@@ -413,41 +451,8 @@ void __init numa_reset_distance(void)
  */
 void __init numa_set_distance(int from, int to, int distance)
 {
-	if (!numa_distance) {
-		nodemask_t nodes_parsed;
-		size_t size;
-		int i, j, cnt = 0;
-		u64 phys;
-
-		/* size the new table and allocate it */
-		nodes_parsed = numa_nodes_parsed;
-		numa_nodemask_from_meminfo(&nodes_parsed, &numa_meminfo);
-
-		for_each_node_mask(i, nodes_parsed)
-			cnt = i;
-		size = ++cnt * sizeof(numa_distance[0]);
-
-		phys = memblock_find_in_range(0,
-					      (u64)max_pfn_mapped << PAGE_SHIFT,
-					      size, PAGE_SIZE);
-		if (phys == MEMBLOCK_ERROR) {
-			pr_warning("NUMA: Warning: can't allocate distance table!\n");
-			/* don't retry until explicitly reset */
-			numa_distance = (void *)1LU;
-			return;
-		}
-		memblock_x86_reserve_range(phys, phys + size, "NUMA DIST");
-
-		numa_distance = __va(phys);
-		numa_distance_cnt = cnt;
-
-		/* fill with the default distances */
-		for (i = 0; i < cnt; i++)
-			for (j = 0; j < cnt; j++)
-				numa_distance[i * cnt + j] = i == j ?
-					LOCAL_DISTANCE : REMOTE_DISTANCE;
-		printk(KERN_DEBUG "NUMA: Initialized distance table, cnt=%d\n", cnt);
-	}
+	if (!numa_distance && numa_alloc_distance() < 0)
+		return;
 
 	if (from >= numa_distance_cnt || to >= numa_distance_cnt) {
 		printk_once(KERN_DEBUG "NUMA: Debug: distance out of bound, from=%d to=%d distance=%d\n",
-- 
1.7.1



* Re: [PATCH] x86-64, NUMA: Seperate out numa_alloc_distance() from numa_set_distance()
  2011-02-22 10:23       ` [PATCH] x86-64, NUMA: Seperate out numa_alloc_distance() from numa_set_distance() Tejun Heo
@ 2011-02-22 10:30         ` Ingo Molnar
  2011-02-22 10:32           ` Tejun Heo
  0 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2011-02-22 10:30 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Yinghai Lu, Thomas Gleixner, H. Peter Anvin, linux-kernel,
	David Rientjes, Cyrill Gorcunov


* Tejun Heo <tj@kernel.org> wrote:

> The allocation code is much bigger than the distance-setting code.  Separate
> it out into numa_alloc_distance() for readability.

FYI, the From: Yinghai line seems to be missing. If this patch is committed as-is 
then you will be the author.

Thanks,

	Ingo


* Re: [PATCH] x86-64, NUMA: Seperate out numa_alloc_distance() from numa_set_distance()
  2011-02-22 10:30         ` Ingo Molnar
@ 2011-02-22 10:32           ` Tejun Heo
  0 siblings, 0 replies; 23+ messages in thread
From: Tejun Heo @ 2011-02-22 10:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Yinghai Lu, Thomas Gleixner, H. Peter Anvin, linux-kernel,
	David Rientjes, Cyrill Gorcunov

On Tue, Feb 22, 2011 at 11:30 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Tejun Heo <tj@kernel.org> wrote:
>
>> The allocation code is much bigger than the distance-setting code.  Separate
>> it out into numa_alloc_distance() for readability.
>
> FYI, the From: Yinghai line seems to be missing. If this patch is committed as-is
> then you will be the author.

Oh, don't worry.  It was committed with the proper --author option.  I
just forgot to retain the From header while posting the git generated
patch.

Thanks.

-- 
tejun


* Re: [PATCH 2/4] x86, numa: Do not adjust start/end for early_node_mem()
  2011-02-21 20:28           ` Yinghai Lu
@ 2011-02-22 10:36             ` Tejun Heo
  2011-02-22 10:39               ` Tejun Heo
  0 siblings, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2011-02-22 10:36 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: David Rientjes, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	linux-kernel, Cyrill Gorcunov

On Mon, Feb 21, 2011 at 12:28:20PM -0800, Yinghai Lu wrote:
> > Hmmm... thinking more about it, there actually is a difference.
> > Depending on configuration, the new code allows node_data[] to be
> > allocated below DMA boundary.  I think we need to keep the first if().
> > Areas crossing the boundaries are okay; in fact, the original code
> > already allowed that when the NUMA affine allocation failed; however,
> > node_data[] was never allowed below the DMA boundary and I think it
> > shouldn't be.
> 
> No.  When that code was first added, it was bottom-up allocation from e820.
> Now, with the new memblock allocation, it always tries top-down,
> so it will normally have no chance to end up under the DMA boundary,
> unless your first node has less than 16M.

NODE_MIN_SIZE is 4M.  A crazy SRAT isn't unheard of.  Even on a sane
configuration, given how the physical memory is laid out, an emulated
NUMA node can easily end up with only <= 16M of memory.

Also, your patch makes the code inconsistent: the NUMA-aware
allocation doesn't impose any restriction, but if it fails we fall
back to a generic allocation with the DMA boundary limit.

So, don't remove the DMA boundary limit.  Even if the current code
wouldn't result in that (and the current code CAN), it is a good
sanity check to have, because you don't want the memory map under
16MiB no matter what.

Thanks.

-- 
tejun


* Re: [PATCH 2/4] x86, numa: Do not adjust start/end for early_node_mem()
  2011-02-22 10:36             ` Tejun Heo
@ 2011-02-22 10:39               ` Tejun Heo
  0 siblings, 0 replies; 23+ messages in thread
From: Tejun Heo @ 2011-02-22 10:39 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: David Rientjes, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	linux-kernel, Cyrill Gorcunov

On Tue, Feb 22, 2011 at 11:36:53AM +0100, Tejun Heo wrote:
> So, don't remove the DMA boundary limit.  Even if the current code
> wouldn't result in that (and the current code CAN), it is a good
> sanity check to have, because you don't want the memory map under
> 16MiB no matter what.

Oh, if you're genuinely concerned about machines that wouldn't be
able to allocate the memory map because they don't have enough memory
above 16MiB, the correct thing to do is to remove the boundary limit
from the fallback generic allocation, NOT from the NUMA-aware
allocation.

-- 
tejun


end of thread, other threads:[~2011-02-22 10:39 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <4D5EC05A.60103@kernel.org>
2011-02-18 18:59 ` [PATCH 1/4] x86, numa: Do not scan two times for setup_node_bootmem() Yinghai Lu
2011-02-20  4:03   ` David Rientjes
2011-02-21 10:20     ` [PATCH] x86-64, NUMA: " Tejun Heo
2011-02-18 18:59 ` [PATCH 2/4] x86, numa: Do not adjust start/end for early_node_mem() Yinghai Lu
2011-02-20  4:03   ` David Rientjes
2011-02-20  4:17     ` Yinghai Lu
2011-02-21  9:43       ` Tejun Heo
2011-02-21 10:44         ` Tejun Heo
2011-02-21 20:28           ` Yinghai Lu
2011-02-22 10:36             ` Tejun Heo
2011-02-22 10:39               ` Tejun Heo
2011-02-21 17:21         ` Yinghai Lu
2011-02-22  0:27           ` David Rientjes
2011-02-18 18:59 ` [PATCH 3/4] x86, numa: cleanup x86_acpi_numa_init() Yinghai Lu
2011-02-20  4:03   ` David Rientjes
2011-02-21  9:46     ` Tejun Heo
2011-02-18 18:59 ` [PATCH 4/4] x86, numa: seperate alloc_numa_distance from numa_reset_distance Yinghai Lu
2011-02-20  4:03   ` David Rientjes
2011-02-21 10:48   ` Tejun Heo
2011-02-21 17:36     ` Yinghai Lu
2011-02-22 10:23       ` [PATCH] x86-64, NUMA: Seperate out numa_alloc_distance() from numa_set_distance() Tejun Heo
2011-02-22 10:30         ` Ingo Molnar
2011-02-22 10:32           ` Tejun Heo
