* early_node_mem()'s memory allocation policy
@ 2010-10-26 22:18 Jeremy Fitzhardinge
  2010-10-27  5:49 ` Yinghai Lu
  0 siblings, 1 reply; 6+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-26 22:18 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Linux Kernel Mailing List, Konrad Rzeszutek Wilk

We're seeing problems under Xen where large portions of the memory
could be reserved (because they're not yet physically present, even
though they appear in E820), and the 'start' and 'end' that early_node_mem()
chooses lie entirely within that reserved range.

Also, the code seems dubious because it adjusts start and end without
regard to how much space it is trying to allocate:

	/* extend the search scope */
	end = max_pfn_mapped << PAGE_SHIFT;
	if (end > (MAX_DMA32_PFN<<PAGE_SHIFT))
		start = MAX_DMA32_PFN<<PAGE_SHIFT;
	else
		start = MAX_DMA_PFN<<PAGE_SHIFT;

What if max_pfn_mapped is only a few pages larger than MAX_DMA32_PFN,
and that gap is smaller than the size it is trying to allocate?
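
To make the concern above concrete, here is a minimal user-space sketch of
the window arithmetic. It assumes x86-64 with 4 KiB pages (a PAGE_SHIFT of
12, a 16 MiB boundary for MAX_DMA_PFN and a 4 GiB boundary for
MAX_DMA32_PFN). search_window_bytes() and window_can_hold() are hypothetical
helpers written for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for this sketch: x86-64 with 4 KiB pages. */
#define PAGE_SHIFT     12
#define MAX_DMA_PFN    ((16ULL << 20) >> PAGE_SHIFT)	/* 16 MiB boundary */
#define MAX_DMA32_PFN  ((4ULL << 30) >> PAGE_SHIFT)	/* 4 GiB boundary */

/*
 * Hypothetical helper: size in bytes of the [start, end) window that the
 * quoted start/end adjustment would pick for a given max_pfn_mapped.
 */
static uint64_t search_window_bytes(uint64_t max_pfn_mapped)
{
	uint64_t start, end;

	/* Guard against overflow in the shift below. */
	assert(max_pfn_mapped <= (UINT64_MAX >> PAGE_SHIFT));

	end = max_pfn_mapped << PAGE_SHIFT;
	if (end > (MAX_DMA32_PFN << PAGE_SHIFT))
		start = MAX_DMA32_PFN << PAGE_SHIFT;
	else
		start = MAX_DMA_PFN << PAGE_SHIFT;

	return end > start ? end - start : 0;
}

/* Hypothetical helper: could a request of 'size' bytes fit in that window? */
static int window_can_hold(uint64_t max_pfn_mapped, uint64_t size)
{
	return search_window_bytes(max_pfn_mapped) >= size;
}
```

With max_pfn_mapped only four pages above MAX_DMA32_PFN, the window is
16 KiB, so a 1 MiB request cannot be satisfied from it even if plenty of
memory is free below 4 GiB.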

I tried just removing the start and end adjustments in early_node_mem()
and the kernel booted fine under Xen, but it seemed to allocate at a
very low address.  Should the for_each_active_range_index_in_nid() loop
in find_memory_core_early() be iterating from high to low addresses?  If
the allocation could be relied on to be top-down, then you wouldn't need
to adjust start at all, and it would return the highest available memory
in a natural way.
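
The difference between the two iteration orders can be sketched in user
space as follows. This is an illustration under assumptions, not the
kernel's implementation: free memory is modeled as an array of
non-overlapping ranges sorted by address, and struct range, fit_at_top(),
find_low_to_high(), find_high_to_low() and NOT_FOUND are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NOT_FOUND UINT64_MAX	/* stand-in for MEMBLOCK_ERROR in this sketch */

struct range {
	uint64_t start, end;	/* free memory, [start, end) */
};

/* Highest 'align'-aligned address in r where 'size' bytes fit, or NOT_FOUND. */
static uint64_t fit_at_top(struct range r, uint64_t size, uint64_t align)
{
	uint64_t addr;

	assert(align != 0 && (align & (align - 1)) == 0);	/* power of two */
	if (r.end < r.start || r.end - r.start < size)
		return NOT_FOUND;
	addr = (r.end - size) & ~(align - 1);	/* round down to alignment */
	return addr >= r.start ? addr : NOT_FOUND;
}

/* Walk the ranges from low to high: the lowest range that fits wins. */
static uint64_t find_low_to_high(const struct range *r, size_t n,
				 uint64_t size, uint64_t align)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t addr = fit_at_top(r[i], size, align);

		if (addr != NOT_FOUND)
			return addr;
	}
	return NOT_FOUND;
}

/* Walk the ranges from high to low: the highest range that fits wins. */
static uint64_t find_high_to_low(const struct range *r, size_t n,
				 uint64_t size, uint64_t align)
{
	for (size_t i = n; i-- > 0; ) {
		uint64_t addr = fit_at_top(r[i], size, align);

		if (addr != NOT_FOUND)
			return addr;
	}
	return NOT_FOUND;
}
```

Given free ranges [1 MiB, 128 MiB) and [4 GiB, 5 GiB) and a 2 MiB request,
the low-to-high walk returns an address just under 128 MiB even though it
allocates from the top of each range, while the high-to-low walk returns an
address just under 5 GiB without needing any adjustment of start.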

Thanks,
    J


* Re: early_node_mem()'s memory allocation policy
  2010-10-26 22:18 early_node_mem()'s memory allocation policy Jeremy Fitzhardinge
@ 2010-10-27  5:49 ` Yinghai Lu
  2010-10-27 14:28   ` Konrad Rzeszutek Wilk
  2010-10-27 20:21   ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 6+ messages in thread
From: Yinghai Lu @ 2010-10-27  5:49 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: H. Peter Anvin, Linux Kernel Mailing List, Konrad Rzeszutek Wilk

On 10/26/2010 03:18 PM, Jeremy Fitzhardinge wrote:
> We're seeing problems under Xen where large portions of the memory
> could be reserved (because they're not yet physically present, even
> though they appear in E820), and the 'start' and 'end' that early_node_mem()
> chooses lie entirely within that reserved range.
> 
> Also, the code seems dubious because it adjusts start and end without
> regard to how much space it is trying to allocate:
> 
> 	/* extend the search scope */
> 	end = max_pfn_mapped << PAGE_SHIFT;
> 	if (end > (MAX_DMA32_PFN<<PAGE_SHIFT))
> 		start = MAX_DMA32_PFN<<PAGE_SHIFT;
> 	else
> 		start = MAX_DMA_PFN<<PAGE_SHIFT;
> 
> What if max_pfn_mapped is only a few pages larger than MAX_DMA32_PFN,
> and that gap is smaller than the size it is trying to allocate?
> 
> I tried just removing the start and end adjustments in early_node_mem()
> and the kernel booted fine under Xen, but it seemed to allocate at a
> very low address.  Should the for_each_active_range_index_in_nid() loop
> in find_memory_core_early() be iterating from high to low addresses?  If
> the allocation could be relied on to be top-down, then you wouldn't need
> to adjust start at all, and it would return the highest available memory
> in a natural way.

please check

[PATCH] x86, memblock: Fix early_node_mem with big reserved region.

Jeremy said Xen could reserve a huge amount of memory that still shows up as RAM in e820.

early_node_mem() could not find a range because of the start/end adjustment.

Let's use memblock_find_in_range() instead of memblock_x86_find_in_range_node(), so that we get a real top-down search in the fallback path.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 60f4985..7ffc9b7 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -178,11 +178,8 @@ static void * __init early_node_mem(int nodeid, unsigned long start,
 
 	/* extend the search scope */
 	end = max_pfn_mapped << PAGE_SHIFT;
-	if (end > (MAX_DMA32_PFN<<PAGE_SHIFT))
-		start = MAX_DMA32_PFN<<PAGE_SHIFT;
-	else
-		start = MAX_DMA_PFN<<PAGE_SHIFT;
-	mem = memblock_x86_find_in_range_node(nodeid, start, end, size, align);
+	start = MAX_DMA_PFN << PAGE_SHIFT;
+	mem = memblock_find_in_range(start, end, size, align);
 	if (mem != MEMBLOCK_ERROR)
 		return __va(mem);
 


* Re: early_node_mem()'s memory allocation policy
  2010-10-27  5:49 ` Yinghai Lu
@ 2010-10-27 14:28   ` Konrad Rzeszutek Wilk
  2010-10-27 20:21   ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 6+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-10-27 14:28 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Jeremy Fitzhardinge, H. Peter Anvin, Linux Kernel Mailing List

On Tue, Oct 26, 2010 at 10:49:33PM -0700, Yinghai Lu wrote:
> On 10/26/2010 03:18 PM, Jeremy Fitzhardinge wrote:
> > We're seeing problems under Xen where large portions of the memory
> > could be reserved (because they're not yet physically present, even
> > though they appear in E820), and the 'start' and 'end' that early_node_mem()
> > chooses lie entirely within that reserved range.
> > 
> > Also, the code seems dubious because it adjusts start and end without
> > regard to how much space it is trying to allocate:
> > 
> > 	/* extend the search scope */
> > 	end = max_pfn_mapped << PAGE_SHIFT;
> > 	if (end > (MAX_DMA32_PFN<<PAGE_SHIFT))
> > 		start = MAX_DMA32_PFN<<PAGE_SHIFT;
> > 	else
> > 		start = MAX_DMA_PFN<<PAGE_SHIFT;
> > 
> > What if max_pfn_mapped is only a few pages larger than MAX_DMA32_PFN,
> > and that gap is smaller than the size it is trying to allocate?
> > 
> > I tried just removing the start and end adjustments in early_node_mem()
> > and the kernel booted fine under Xen, but it seemed to allocate at a
> > very low address.  Should the for_each_active_range_index_in_nid() loop
> > in find_memory_core_early() be iterating from high to low addresses?  If
> > the allocation could be relied on to be top-down, then you wouldn't need
> > to adjust start at all, and it would return the highest available memory
> > in a natural way.
> 
> please check

It definitely gets us across that hump. Thanks.
> 
> [PATCH] x86, memblock: Fix early_node_mem with big reserved region.
> 
> Jeremy said Xen could reserve a huge amount of memory that still shows up as RAM in e820.
> 
> early_node_mem() could not find a range because of the start/end adjustment.
> 
> Let's use memblock_find_in_range() instead of memblock_x86_find_in_range_node(), so that we get a real top-down search in the fallback path.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>

Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

> 
> diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
> index 60f4985..7ffc9b7 100644
> --- a/arch/x86/mm/numa_64.c
> +++ b/arch/x86/mm/numa_64.c
> @@ -178,11 +178,8 @@ static void * __init early_node_mem(int nodeid, unsigned long start,
>  
>  	/* extend the search scope */
>  	end = max_pfn_mapped << PAGE_SHIFT;
> -	if (end > (MAX_DMA32_PFN<<PAGE_SHIFT))
> -		start = MAX_DMA32_PFN<<PAGE_SHIFT;
> -	else
> -		start = MAX_DMA_PFN<<PAGE_SHIFT;
> -	mem = memblock_x86_find_in_range_node(nodeid, start, end, size, align);
> +	start = MAX_DMA_PFN << PAGE_SHIFT;
> +	mem = memblock_find_in_range(start, end, size, align);
>  	if (mem != MEMBLOCK_ERROR)
>  		return __va(mem);
>  


* Re: early_node_mem()'s memory allocation policy
  2010-10-27  5:49 ` Yinghai Lu
  2010-10-27 14:28   ` Konrad Rzeszutek Wilk
@ 2010-10-27 20:21   ` Jeremy Fitzhardinge
  2010-10-28 16:50     ` [PATCH] x86, memblock: Fix early_node_mem with big reserved region Yinghai Lu
  1 sibling, 1 reply; 6+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-27 20:21 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Linux Kernel Mailing List, Konrad Rzeszutek Wilk

On 10/26/2010 10:49 PM, Yinghai Lu wrote:
> please check
>
> [PATCH] x86, memblock: Fix early_node_mem with big reserved region.
>
> Jeremy said Xen could reserve a huge amount of memory that still shows up as RAM in e820.
>
> early_node_mem() could not find a range because of the start/end adjustment.
>
> Let's use memblock_find_in_range() instead of memblock_x86_find_in_range_node(), so that we get a real top-down search in the fallback path.
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>

Yes, that works.  Could you queue it for upstream?  Without this we see
Xen crashes with certain memory configurations.

Thanks,
    J

> diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
> index 60f4985..7ffc9b7 100644
> --- a/arch/x86/mm/numa_64.c
> +++ b/arch/x86/mm/numa_64.c
> @@ -178,11 +178,8 @@ static void * __init early_node_mem(int nodeid, unsigned long start,
>  
>  	/* extend the search scope */
>  	end = max_pfn_mapped << PAGE_SHIFT;
> -	if (end > (MAX_DMA32_PFN<<PAGE_SHIFT))
> -		start = MAX_DMA32_PFN<<PAGE_SHIFT;
> -	else
> -		start = MAX_DMA_PFN<<PAGE_SHIFT;
> -	mem = memblock_x86_find_in_range_node(nodeid, start, end, size, align);
> +	start = MAX_DMA_PFN << PAGE_SHIFT;
> +	mem = memblock_find_in_range(start, end, size, align);
>  	if (mem != MEMBLOCK_ERROR)
>  		return __va(mem);
>  
>



* [PATCH] x86, memblock: Fix early_node_mem with big reserved region.
  2010-10-27 20:21   ` Jeremy Fitzhardinge
@ 2010-10-28 16:50     ` Yinghai Lu
  2010-10-28 23:40       ` [tip:x86/urgent] " tip-bot for Yinghai Lu
  0 siblings, 1 reply; 6+ messages in thread
From: Yinghai Lu @ 2010-10-28 16:50 UTC (permalink / raw)
  To: H. Peter Anvin, Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar
  Cc: Jeremy Fitzhardinge, Linux Kernel Mailing List, Konrad Rzeszutek Wilk


Jeremy said Xen could reserve a huge amount of memory that still shows up
as RAM in e820. early_node_mem() could not find a range because of the
start/end adjustment, so it goes through the fallback path. But the
fallback still uses memblock_x86_find_in_range_node(), which is only
partially top-down because it goes through the active_range entries from
low to high.

Let's use memblock_find_in_range() instead of
memblock_x86_find_in_range_node(), so that we get a real top-down search in
the fallback path.

This is for 2.6.37.

We may still need to make memblock_x86_find_in_range_node() do an overall
top-down search.

Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>
Tested-by: Jeremy Fitzhardinge <jeremy@goop.org>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/mm/numa_64.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

Index: linux-2.6/arch/x86/mm/numa_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/numa_64.c
+++ linux-2.6/arch/x86/mm/numa_64.c
@@ -178,11 +178,8 @@ static void * __init early_node_mem(int
 
 	/* extend the search scope */
 	end = max_pfn_mapped << PAGE_SHIFT;
-	if (end > (MAX_DMA32_PFN<<PAGE_SHIFT))
-		start = MAX_DMA32_PFN<<PAGE_SHIFT;
-	else
-		start = MAX_DMA_PFN<<PAGE_SHIFT;
-	mem = memblock_x86_find_in_range_node(nodeid, start, end, size, align);
+	start = MAX_DMA_PFN << PAGE_SHIFT;
+	mem = memblock_find_in_range(start, end, size, align);
 	if (mem != MEMBLOCK_ERROR)
 		return __va(mem);
 


* [tip:x86/urgent] x86, memblock: Fix early_node_mem with big reserved region.
  2010-10-28 16:50     ` [PATCH] x86, memblock: Fix early_node_mem with big reserved region Yinghai Lu
@ 2010-10-28 23:40       ` tip-bot for Yinghai Lu
  0 siblings, 0 replies; 6+ messages in thread
From: tip-bot for Yinghai Lu @ 2010-10-28 23:40 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, jeremy, hpa, mingo, konrad.wilk, yinghai, tglx, hpa

Commit-ID:  419db274bed4269f475a8e78cbe9c917192cfe8b
Gitweb:     http://git.kernel.org/tip/419db274bed4269f475a8e78cbe9c917192cfe8b
Author:     Yinghai Lu <yinghai@kernel.org>
AuthorDate: Thu, 28 Oct 2010 09:50:17 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Thu, 28 Oct 2010 15:52:36 -0700

x86, memblock: Fix early_node_mem with big reserved region.

Xen can reserve huge amounts of memory for pre-ballooning, but that
still shows as RAM in the e820 memory map.  early_node_mem() could not
find a range because of the start/end adjustment, so it goes through the
fallback path.  However, the fallback path still uses
memblock_x86_find_in_range_node(), which is only partially top-down
because it goes through the active_range entries from low to high.

Let's use memblock_find_in_range() instead of
memblock_x86_find_in_range_node(), so that we get a real top-down search
in the fallback path.

We may still need to make memblock_x86_find_in_range_node() do an
overall top-down search.

Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>
Tested-by: Jeremy Fitzhardinge <jeremy@goop.org>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CC9A9C9.8020700@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/mm/numa_64.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 60f4985..7ffc9b7 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -178,11 +178,8 @@ static void * __init early_node_mem(int nodeid, unsigned long start,
 
 	/* extend the search scope */
 	end = max_pfn_mapped << PAGE_SHIFT;
-	if (end > (MAX_DMA32_PFN<<PAGE_SHIFT))
-		start = MAX_DMA32_PFN<<PAGE_SHIFT;
-	else
-		start = MAX_DMA_PFN<<PAGE_SHIFT;
-	mem = memblock_x86_find_in_range_node(nodeid, start, end, size, align);
+	start = MAX_DMA_PFN << PAGE_SHIFT;
+	mem = memblock_find_in_range(start, end, size, align);
 	if (mem != MEMBLOCK_ERROR)
 		return __va(mem);
 
