linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* Re: 2.6.28-rc9 panics with crashkernel=256M while booting
       [not found] <200812241325.49404.chandru@in.ibm.com>
@ 2008-12-25  7:35 ` Andrew Morton
  2008-12-25  8:07   ` Andrew Morton
  2008-12-26  0:59   ` Paul Mackerras
  0 siblings, 2 replies; 19+ messages in thread
From: Andrew Morton @ 2008-12-25  7:35 UTC (permalink / raw)
  To: Chandru; +Cc: Benjamin, linuxppc-dev, Paul Mackerras, linux-kernel

(cc's added)

On Wed, 24 Dec 2008 13:25:49 +0530 Chandru <chandru@in.ibm.com> wrote:

> On a ppc machine booting linux-2.6.28-rc9 with crashkernel=256M@32M boot 
> parameter causes the kernel to panic while booting. __Following are the console 
> messages...

- Please put [patch] in the Subject: line of patches

- Please choose a suitable title, as per
  Documentation/SubmittingPatches, section 15.

- Please cc suitable mailing lists and maintainers on bug reports and
  on patches.



From: Chandru <chandru@in.ibm.com>

When booted with crashkernel=224M@32M or any memory size less than this,
the system boots properly.  The following was the observation..  The
system comes up with two nodes (0-256M and 256M-4GB).  _The crashkernel
memory reservation spans across these two nodes.  _The
mark_reserved_regions_for_nid() in arch/powerpc/mm/numa.c resizes the
reserved part of the memory within it as:

_ _ _ _ _ _ if (end_pfn > node_ar.end_pfn)
_ _ _ _ _ _ _ _ reserve_size = (node_ar.end_pfn << PAGE_SHIFT)
_ _ _ _ _ _ _ _ _ _ - (start_pfn << PAGE_SHIFT);


but the reserve_bootmem_node() in mm/bootmem.c raises the pfn value of end 

_ _ end = PFN_UP(physaddr + size);

This causes end to get a value past the last page in the 0-256M node. 
_Again when reserve_bootmem_node() returns,
_mark_reserved_regions_for_nid() loops around to set the rest of the
crashkernel memory in the next node as reserved.  _ It references
NODE_DATA(node_ar.nid) and this causes another 'Oops: kernel access of bad
area' problem.  The following changes made the system to boot with any
amount of crashkernel memory size.

Signed-off-by: Chandru S <chandru@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/powerpc/mm/numa.c |    7 ++++---
 mm/bootmem.c           |    4 ++++
 2 files changed, 8 insertions(+), 3 deletions(-)

diff -puN arch/powerpc/mm/numa.c~2628-rc9-panics-with-crashkernel=256m-while-booting arch/powerpc/mm/numa.c
--- a/arch/powerpc/mm/numa.c~2628-rc9-panics-with-crashkernel=256m-while-booting
+++ a/arch/powerpc/mm/numa.c
@@ -995,10 +995,11 @@ void __init do_init_bootmem(void)
 				  start_pfn, end_pfn);
 
 		free_bootmem_with_active_regions(nid, end_pfn);
+	}
+
+	for_each_online_node(nid) {
 		/*
-		 * Be very careful about moving this around.  Future
-		 * calls to careful_allocation() depend on this getting
-		 * done correctly.
+		 * Be very careful about moving this around.
 		 */
 		mark_reserved_regions_for_nid(nid);
 		sparse_memory_present_with_active_regions(nid);
diff -puN mm/bootmem.c~2628-rc9-panics-with-crashkernel=256m-while-booting mm/bootmem.c
--- a/mm/bootmem.c~2628-rc9-panics-with-crashkernel=256m-while-booting
+++ a/mm/bootmem.c
@@ -375,10 +375,14 @@ int __init reserve_bootmem_node(pg_data_
 				 unsigned long size, int flags)
 {
 	unsigned long start, end;
+	bootmem_data_t *bdata = pgdat->bdata;
 
 	start = PFN_DOWN(physaddr);
 	end = PFN_UP(physaddr + size);
 
+	if (end > bdata->node_low_pfn)
+		end = bdata->node_low_pfn;
+
 	return mark_bootmem_node(pgdat->bdata, start, end, 1, flags);
 }
 
_

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 2.6.28-rc9 panics with crashkernel=256M while booting
  2008-12-25  7:35 ` 2.6.28-rc9 panics with crashkernel=256M while booting Andrew Morton
@ 2008-12-25  8:07   ` Andrew Morton
  2008-12-26  0:59   ` Paul Mackerras
  1 sibling, 0 replies; 19+ messages in thread
From: Andrew Morton @ 2008-12-25  8:07 UTC (permalink / raw)
  To: Chandru, linux-kernel, linuxppc-dev, Benjamin Herrenschmidt,
	Paul Mackerras

On Wed, 24 Dec 2008 23:35:36 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:

> - Please put [patch] in the Subject: line of patches
> 
> - Please choose a suitable title, as per
>   Documentation/SubmittingPatches, section 15.
> 
> - Please cc suitable mailing lists and maintainers on bug reports and
>   on patches.

Also the patch was wordwrapped and the changelog was filled with weird
UTF8 characters.

I think I have it all cleaned up now.


From: Chandru <chandru@in.ibm.com>

When booted with crashkernel=224M@32M or any memory size less than this,
the system boots properly.  The following was the observation..  The
system comes up with two nodes (0-256M and 256M-4GB).  The crashkernel
memory reservation spans across these two nodes.  The
mark_reserved_regions_for_nid() in arch/powerpc/mm/numa.c resizes the
reserved part of the memory within it as:

	if (end_pfn > node_ar.end_pfn)
		reserve_size = (node_ar.end_pfn << PAGE_SHIFT)
				- (start_pfn << PAGE_SHIFT);


but the reserve_bootmem_node() in mm/bootmem.c raises the pfn value of end 

	end = PFN_UP(physaddr + size);

This causes end to get a value past the last page in the 0-256M node. 
Again when reserve_bootmem_node() returns, mark_reserved_regions_for_nid()
loops around to set the rest of the crashkernel memory in the next node as
reserved.  It references NODE_DATA(node_ar.nid) and this causes another
'Oops: kernel access of bad area' problem.  The following changes made the
system to boot with any amount of crashkernel memory size.

Signed-off-by: Chandru S <chandru@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/powerpc/mm/numa.c |    7 ++++---
 mm/bootmem.c           |    4 ++++
 2 files changed, 8 insertions(+), 3 deletions(-)

diff -puN arch/powerpc/mm/numa.c~powerpc-fix-code-for-reserved-memory-spanning-across-nodes arch/powerpc/mm/numa.c
--- a/arch/powerpc/mm/numa.c~powerpc-fix-code-for-reserved-memory-spanning-across-nodes
+++ a/arch/powerpc/mm/numa.c
@@ -995,10 +995,11 @@ void __init do_init_bootmem(void)
 				  start_pfn, end_pfn);
 
 		free_bootmem_with_active_regions(nid, end_pfn);
+	}
+
+	for_each_online_node(nid) {
 		/*
-		 * Be very careful about moving this around.  Future
-		 * calls to careful_allocation() depend on this getting
-		 * done correctly.
+		 * Be very careful about moving this around.
 		 */
 		mark_reserved_regions_for_nid(nid);
 		sparse_memory_present_with_active_regions(nid);
diff -puN mm/bootmem.c~powerpc-fix-code-for-reserved-memory-spanning-across-nodes mm/bootmem.c
--- a/mm/bootmem.c~powerpc-fix-code-for-reserved-memory-spanning-across-nodes
+++ a/mm/bootmem.c
@@ -375,10 +375,14 @@ int __init reserve_bootmem_node(pg_data_
 				 unsigned long size, int flags)
 {
 	unsigned long start, end;
+	bootmem_data_t *bdata = pgdat->bdata;
 
 	start = PFN_DOWN(physaddr);
 	end = PFN_UP(physaddr + size);
 
+	if (end > bdata->node_low_pfn)
+		end = bdata->node_low_pfn;
+
 	return mark_bootmem_node(pgdat->bdata, start, end, 1, flags);
 }
 
_

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 2.6.28-rc9 panics with crashkernel=256M while booting
  2008-12-25  7:35 ` 2.6.28-rc9 panics with crashkernel=256M while booting Andrew Morton
  2008-12-25  8:07   ` Andrew Morton
@ 2008-12-26  0:59   ` Paul Mackerras
  2008-12-29 21:36     ` Dave Hansen
  1 sibling, 1 reply; 19+ messages in thread
From: Paul Mackerras @ 2008-12-26  0:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Dave Hansen, linuxppc-dev, linux-kernel

Andrew Morton writes:

> On Wed, 24 Dec 2008 13:25:49 +0530 Chandru <chandru@in.ibm.com> wrote:
> 
> > On a ppc machine booting linux-2.6.28-rc9 with crashkernel=256M@32M boot 
> > parameter causes the kernel to panic while booting. __Following are the console 
> > messages...
> 
> - Please put [patch] in the Subject: line of patches
> 
> - Please choose a suitable title, as per
>   Documentation/SubmittingPatches, section 15.
> 
> - Please cc suitable mailing lists and maintainers on bug reports and
>   on patches.

Dave Hansen was working on this code recently, and this looks a bit
similar to some changes he was making.  Dave, what's your opinion on
this patch?

Paul.

> From: Chandru <chandru@in.ibm.com>
> 
> When booted with crashkernel=224M@32M or any memory size less than this,
> the system boots properly.  The following was the observation..  The
> system comes up with two nodes (0-256M and 256M-4GB).  _The crashkernel
> memory reservation spans across these two nodes.  _The
> mark_reserved_regions_for_nid() in arch/powerpc/mm/numa.c resizes the
> reserved part of the memory within it as:
> 
> _ _ _ _ _ _ if (end_pfn > node_ar.end_pfn)
> _ _ _ _ _ _ _ _ reserve_size = (node_ar.end_pfn << PAGE_SHIFT)
> _ _ _ _ _ _ _ _ _ _ - (start_pfn << PAGE_SHIFT);
> 
> 
> but the reserve_bootmem_node() in mm/bootmem.c raises the pfn value of end 
> 
> _ _ end = PFN_UP(physaddr + size);
> 
> This causes end to get a value past the last page in the 0-256M node. 
> _Again when reserve_bootmem_node() returns,
> _mark_reserved_regions_for_nid() loops around to set the rest of the
> crashkernel memory in the next node as reserved.  _ It references
> NODE_DATA(node_ar.nid) and this causes another 'Oops: kernel access of bad
> area' problem.  The following changes made the system to boot with any
> amount of crashkernel memory size.
> 
> Signed-off-by: Chandru S <chandru@linux.vnet.ibm.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  arch/powerpc/mm/numa.c |    7 ++++---
>  mm/bootmem.c           |    4 ++++
>  2 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff -puN arch/powerpc/mm/numa.c~2628-rc9-panics-with-crashkernel=256m-while-booting arch/powerpc/mm/numa.c
> --- a/arch/powerpc/mm/numa.c~2628-rc9-panics-with-crashkernel=256m-while-booting
> +++ a/arch/powerpc/mm/numa.c
> @@ -995,10 +995,11 @@ void __init do_init_bootmem(void)
>  				  start_pfn, end_pfn);
>  
>  		free_bootmem_with_active_regions(nid, end_pfn);
> +	}
> +
> +	for_each_online_node(nid) {
>  		/*
> -		 * Be very careful about moving this around.  Future
> -		 * calls to careful_allocation() depend on this getting
> -		 * done correctly.
> +		 * Be very careful about moving this around.
>  		 */
>  		mark_reserved_regions_for_nid(nid);
>  		sparse_memory_present_with_active_regions(nid);
> diff -puN mm/bootmem.c~2628-rc9-panics-with-crashkernel=256m-while-booting mm/bootmem.c
> --- a/mm/bootmem.c~2628-rc9-panics-with-crashkernel=256m-while-booting
> +++ a/mm/bootmem.c
> @@ -375,10 +375,14 @@ int __init reserve_bootmem_node(pg_data_
>  				 unsigned long size, int flags)
>  {
>  	unsigned long start, end;
> +	bootmem_data_t *bdata = pgdat->bdata;
>  
>  	start = PFN_DOWN(physaddr);
>  	end = PFN_UP(physaddr + size);
>  
> +	if (end > bdata->node_low_pfn)
> +		end = bdata->node_low_pfn;
> +
>  	return mark_bootmem_node(pgdat->bdata, start, end, 1, flags);
>  }
>  
> _

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 2.6.28-rc9 panics with crashkernel=256M while booting
  2008-12-26  0:59   ` Paul Mackerras
@ 2008-12-29 21:36     ` Dave Hansen
  2009-01-05 13:49       ` Chandru
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Hansen @ 2008-12-29 21:36 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, Andrew Morton, linux-kernel

On Fri, 2008-12-26 at 11:59 +1100, Paul Mackerras wrote:
> 
> > diff -puN arch/powerpc/mm/numa.c~2628-rc9-panics-with-crashkernel=256m-while-booting arch/powerpc/mm/numa.c
> > --- a/arch/powerpc/mm/numa.c~2628-rc9-panics-with-crashkernel=256m-while-booting
> > +++ a/arch/powerpc/mm/numa.c
> > @@ -995,10 +995,11 @@ void __init do_init_bootmem(void)
> >                                 start_pfn, end_pfn);
> >  
> >               free_bootmem_with_active_regions(nid, end_pfn);
> > +     }
> > +
> > +     for_each_online_node(nid) {
> >               /*
> > -              * Be very careful about moving this around.  Future
> > -              * calls to careful_allocation() depend on this getting
> > -              * done correctly.
> > +              * Be very careful about moving this around.
> >                */
> >               mark_reserved_regions_for_nid(nid);
> >               sparse_memory_present_with_active_regions(nid);

I think this reintroduces one of the bugs that I squashed.  You *have*
to call mark_reserved_regions_for_nid() right after you do
free_bootmem_with_active_regions().  Otherwise, someone else can
bootmem_alloc() a reserved region from that node.

Perhaps I need to make that comment a bit more forceful. :)

/*
 * Don't break this loop out.  Period.  Never.  Ever.
 * No, seriously.  Don't do it.  I mean it.  Really!
 */

-- Dave

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 2.6.28-rc9 panics with crashkernel=256M while booting
  2008-12-29 21:36     ` Dave Hansen
@ 2009-01-05 13:49       ` Chandru
  2009-01-05 16:30         ` Dave Hansen
  0 siblings, 1 reply; 19+ messages in thread
From: Chandru @ 2009-01-05 13:49 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linuxppc-dev, Andrew Morton, Paul Mackerras, linux-kernel

On Tuesday 30 December 2008 03:06:07 Dave Hansen wrote:
> On Fri, 2008-12-26 at 11:59 +1100, Paul Mackerras wrote:
> > > +     }
> > > +
> > > +     for_each_online_node(nid) {
> > >               /*
> > > -              * Be very careful about moving this around.  Future
> > > -              * calls to careful_allocation() depend on this getting
> > > -              * done correctly.
> > > +              * Be very careful about moving this around.
> > >                */
> > >               mark_reserved_regions_for_nid(nid);
> > >               sparse_memory_present_with_active_regions(nid);
>
> I think this reintroduces one of the bugs that I squashed.  You *have*
> to call mark_reserved_regions_for_nid() right after you do
> free_bootmem_with_active_regions().  Otherwise, someone else can
> bootmem_alloc() a reserved region from that node.

Thanks for the review comments Dave. With the commit:a4c74ddd5ea3db53fc73d29c222b22656a7d05be, I see this has been taken care in mark_reserved_regions_for_nid(). In that case we may only need the change made in reserve_bootmem_node(). 

Hello Andrew, 
Could you please consider the following patch instead of the original patch in this thread. 
Thanks, 


When booted with crashkernel=224M@32M or any memory size less than this, the system boots properly. The system comes up with two nodes (0-256M and 256M-4GB). The crashkernel memory reservation spans across these two nodes. The mark_reserved_regions_for_nid() in arch/powerpc/numa.c resizes the reserved part of the memory within it as... 

	 if (end_pfn > node_ar.end_pfn)
		reserve_size = (node_ar.end_pfn << PAGE_SHIFT)
				- (start_pfn << PAGE_SHIFT);

but the reserve_bootmem_node() in mm/bootmem.c raises the pfn value of end

	end = PFN_UP(physaddr + size);

This causes end to get a value past the last page in the 0-256M node. The following change restricts the value of end if it exceeds the last pfn in a given node.

Signed-off-by: Chandru S <chandru@linux.vnet.ibm.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
---

 mm/bootmem.c |    4 ++++
 1 file changed, 4 insertions(+)


--- linux-2.6.28/mm/bootmem.c.orig	2009-01-05 20:42:12.000000000 +0530
+++ linux-2.6.28/mm/bootmem.c	2009-01-05 20:43:53.000000000 +0530
@@ -375,10 +375,14 @@ int __init reserve_bootmem_node(pg_data_
 				 unsigned long size, int flags)
 {
 	unsigned long start, end;
+	bootmem_data_t *bdata = pgdat->bdata;
 
 	start = PFN_DOWN(physaddr);
 	end = PFN_UP(physaddr + size);
 
+	if (end > bdata->node_low_pfn)
+		end = bdata->node_low_pfn;
+
 	return mark_bootmem_node(pgdat->bdata, start, end, 1, flags);
 }
 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 2.6.28-rc9 panics with crashkernel=256M while booting
  2009-01-05 13:49       ` Chandru
@ 2009-01-05 16:30         ` Dave Hansen
  2009-01-07 12:58           ` Chandru
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Hansen @ 2009-01-05 16:30 UTC (permalink / raw)
  To: Chandru; +Cc: linuxppc-dev, Andrew Morton, Paul Mackerras, linux-kernel

On Mon, 2009-01-05 at 19:19 +0530, Chandru wrote:
> On Tuesday 30 December 2008 03:06:07 Dave Hansen wrote:
> When booted with crashkernel=224M@32M or any memory size less than
> this, the system boots properly. The system comes up with two nodes
> (0-256M and 256M-4GB). The crashkernel memory reservation spans across
> these two nodes. The mark_reserved_regions_for_nid() in
> arch/powerpc/numa.c resizes the reserved part of the memory within it
> as... 
> 
> 	 if (end_pfn > node_ar.end_pfn)
> 		reserve_size = (node_ar.end_pfn << PAGE_SHIFT)
> 				- (start_pfn << PAGE_SHIFT);
> 
> but the reserve_bootmem_node() in mm/bootmem.c raises the pfn value of
> end
> 
> 	end = PFN_UP(physaddr + size);
> 
> This causes end to get a value past the last page in the 0-256M node.
> The following change restricts the value of end if it exceeds the last
> pfn in a given node.

OK, I had to think about this for a good, long time.  That's bad. :)

There are two things that we're dealing with here: "active regions" and
the NODE_DATA's.  The if() you've pasted above resizes the reservation
so that it fits into the current active region.  However, as you noted,
we haven't resized it so that it fits into the NODE_DATA() that we're
looking at.  We call into the bootmem code, and BUG_ON().

The thing I don't like about this is that it might hide bugs in other
callers.  This really is a ppc-specific thing and, although what you
wrote will fix the bug on ppc, it will probably cause someone in the
future to call reserve_bootmem_node() with too large a reservation and
get a silent failure (not reserving the requested size) back.  

We really do need to go take a hard look at the whole interaction
between lmb's, node active regions, and the NUMA code some day.  It has
kinda grown to be a bit ungainly.

How about we just consult the NODE_DATA() in
mark_reserved_regions_for_nid() instead of reserve_bootmem_node()?

> --- linux-2.6.28/mm/bootmem.c.orig	2009-01-05 20:42:12.000000000 +0530
> +++ linux-2.6.28/mm/bootmem.c	2009-01-05 20:43:53.000000000 +0530
> @@ -375,10 +375,14 @@ int __init reserve_bootmem_node(pg_data_
>  				 unsigned long size, int flags)
>  {
>  	unsigned long start, end;
> +	bootmem_data_t *bdata = pgdat->bdata;
> 
>  	start = PFN_DOWN(physaddr);
>  	end = PFN_UP(physaddr + size);
> 
> +	if (end > bdata->node_low_pfn)
> +		end = bdata->node_low_pfn;
> +
>  	return mark_bootmem_node(pgdat->bdata, start, end, 1, flags);
>  }
-- Dave

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 2.6.28-rc9 panics with crashkernel=256M while booting
  2009-01-05 16:30         ` Dave Hansen
@ 2009-01-07 12:58           ` Chandru
  2009-01-07 17:25             ` Dave Hansen
  0 siblings, 1 reply; 19+ messages in thread
From: Chandru @ 2009-01-07 12:58 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linuxppc-dev, Andrew Morton, Paul Mackerras, linux-kernel

On Monday 05 January 2009 22:00:33 Dave Hansen wrote:
> OK, I had to think about this for a good, long time.  That's bad. :)
>
> There are two things that we're dealing with here: "active regions" and
> the NODE_DATA's.  The if() you've pasted above resizes the reservation
> so that it fits into the current active region.  However, as you noted,
> we haven't resized it so that it fits into the NODE_DATA() that we're
> looking at.  We call into the bootmem code, and BUG_ON().
>
> The thing I don't like about this is that it might hide bugs in other
> callers.  This really is a ppc-specific thing and, although what you
> wrote will fix the bug on ppc, it will probably cause someone in the
> future to call reserve_bootmem_node() with too large a reservation and
> get a silent failure (not reserving the requested size) back.
>
> We really do need to go take a hard look at the whole interaction
> between lmb's, node active regions, and the NUMA code some day.  It has
> kinda grown to be a bit ungainly.
>
> How about we just consult the NODE_DATA() in
> mark_reserved_regions_for_nid() instead of reserve_bootmem_node()?

I don't know how you wanted NODE_DATA() to be consulted here. i.e before 
calling reserve_bootmem_node() should we have a condition 

	if (PFN_UP(physbase+reserve_size) > node_end_pfn) 
	then
		resize reserve_size again so that PFN_UP() will equate to node_end_pfn ??
	end 

Also I was wondering if in reserve_bootmem_node()
	end = PFN_DOWN() ; will do.. 

With the recent changes from you that went into 2.6.28 stable 
(commit:a4c74ddd5ea3db53fc73d29c222b22656a7d05be), it worked on the system 
with PFN_DOWN(). 

Thanks,
Chandru

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 2.6.28-rc9 panics with crashkernel=256M while booting
  2009-01-07 12:58           ` Chandru
@ 2009-01-07 17:25             ` Dave Hansen
  2009-01-08 10:29               ` Chandru
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Hansen @ 2009-01-07 17:25 UTC (permalink / raw)
  To: Chandru; +Cc: linuxppc-dev, Andrew Morton, Paul Mackerras, linux-kernel

On Wed, 2009-01-07 at 18:28 +0530, Chandru wrote:
> I don't know how you wanted NODE_DATA() to be consulted here. i.e before 
> calling reserve_bootmem_node() should we have a condition 
> 
>         if (PFN_UP(physbase+reserve_size) > node_end_pfn) 
>         then
>                 resize reserve_size again so that PFN_UP() will equate to node_end_pfn ??
>         end 

I'm just suggesting making your fix in the ppc code instead of in
mm/bootmem.c.

-- Dave

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 2.6.28-rc9 panics with crashkernel=256M while booting
  2009-01-07 17:25             ` Dave Hansen
@ 2009-01-08 10:29               ` Chandru
  2009-01-08 20:03                 ` Dave Hansen
  0 siblings, 1 reply; 19+ messages in thread
From: Chandru @ 2009-01-08 10:29 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linuxppc-dev, Andrew Morton, Paul Mackerras, linux-kernel

On Wednesday 07 January 2009 22:55:25 Dave Hansen wrote:
> 
> I'm just suggesting making your fix in the ppc code instead of in
> mm/bootmem.c.
> 

Here are the changes that helped to boot the kernel. Please review it.
Thanks,

Signed-off-by: Chandru S <chandru@linux.vnet.ibm.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
---

 arch/powerpc/mm/numa.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- linux-2.6.28/arch/powerpc/mm/numa.c.orig	2009-01-08 03:20:41.000000000 -0600
+++ linux-2.6.28/arch/powerpc/mm/numa.c	2009-01-08 03:50:41.000000000 -0600
@@ -16,6 +16,7 @@
 #include <linux/module.h>
 #include <linux/nodemask.h>
 #include <linux/cpu.h>
+#include <linux/pfn.h>
 #include <linux/notifier.h>
 #include <linux/lmb.h>
 #include <linux/of.h>
@@ -898,9 +899,17 @@ static void mark_reserved_regions_for_ni
 			 * if reserved region extends past active region
 			 * then trim size to active region
 			 */
-			if (end_pfn > node_ar.end_pfn)
+			if (end_pfn > node_ar.end_pfn) {
 				reserve_size = (node_ar.end_pfn << PAGE_SHIFT)
 					- (start_pfn << PAGE_SHIFT);
+				/*
+				 * resize it further if the reservation could
+				 * cross the last page in this node
+				 */
+				if (PFN_UP(physbase+reserve_size) >
+						 node_end_pfn)
+					reserve_size -= PAGE_SIZE;
+			}
 			/*
 			 * Only worry about *this* node, others may not
 			 * yet have valid NODE_DATA().

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 2.6.28-rc9 panics with crashkernel=256M while booting
  2009-01-08 10:29               ` Chandru
@ 2009-01-08 20:03                 ` Dave Hansen
  2009-01-09 11:07                   ` Chandru
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Hansen @ 2009-01-08 20:03 UTC (permalink / raw)
  To: Chandru; +Cc: linuxppc-dev, Andrew Morton, Paul Mackerras, linux-kernel

On Thu, 2009-01-08 at 15:59 +0530, Chandru wrote:
> @@ -898,9 +899,17 @@ static void mark_reserved_regions_for_ni
>                          * if reserved region extends past active region
>                          * then trim size to active region
>                          */
> -                       if (end_pfn > node_ar.end_pfn)
> +                       if (end_pfn > node_ar.end_pfn) {
>                                 reserve_size = (node_ar.end_pfn << PAGE_SHIFT)
>                                         - (start_pfn << PAGE_SHIFT);
> +                               /*
> +                                * resize it further if the reservation could
> +                                * cross the last page in this node
> +                                */
> +                               if (PFN_UP(physbase+reserve_size) >
> +                                                node_end_pfn)
> +                                       reserve_size -= PAGE_SIZE;
> +                       }
>                         /*

Now I'm even more confused.  Could you please send a fully changelogged
patch that describes the problem, and how this fixes it?  This just
seems like an off-by-one error, which isn't what I thought we had before
at all.

I'm also horribly confused why PFN_UP is needed here.  Is 'physbase' not
page aligned?  reserve_size looks like it *has* to be.  'end_pfn' is
always (as far as I have ever seen in the kernel) the pfn of the page
after the area we are interested in and we treat it as such in that
function.  In the case of an unaligned physbase, that wouldn't be true.

Think of the case where we have a 1-byte reservation.  start_pfn will
equal end_pfn and we won't go into that while loop at *all* and won't
reserve anything.

Does 'end_pfn' need fixing?

-- Dave

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 2.6.28-rc9 panics with crashkernel=256M while booting
  2009-01-08 20:03                 ` Dave Hansen
@ 2009-01-09 11:07                   ` Chandru
  2009-01-15  8:05                     ` Chandru
  0 siblings, 1 reply; 19+ messages in thread
From: Chandru @ 2009-01-09 11:07 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linuxppc-dev, Andrew Morton, Paul Mackerras, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1030 bytes --]

On Friday 09 January 2009 01:33:12 Dave Hansen wrote:
> Now I'm even more confused.  Could you please send a fully changelogged
> patch that describes the problem, and how this fixes it?  This just
> seems like an off-by-one error, which isn't what I thought we had before
> at all.
> 
> I'm also horribly confused why PFN_UP is needed here.  Is 'physbase' not
> page aligned?  reserve_size looks like it *has* to be.  'end_pfn' is
> always (as far as I have ever seen in the kernel) the pfn of the page
> after the area we are interested in and we treat it as such in that
> function.  In the case of an unaligned physbase, that wouldn't be true.
> 
> Think of the case where we have a 1-byte reservation.  start_pfn will
> equal end_pfn and we won't go into that while loop at *all* and won't
> reserve anything.
> 
> Does 'end_pfn' need fixing?
> 

Attached is the console log with debug command line parameters enabled and 
with couple of more debug statements added to the code.  Please take a look at it. 

thanks,
Chandru


[-- Attachment #2: console_log.txt --]
[-- Type: text/plain, Size: 5272 bytes --]

Using 0078209e bytes for initrd buffer
Please wait, loading kernel...
Allocated 00d00000 bytes for kernel @ 02d00000
   Elf64 kernel loaded...
Loading ramdisk...
ramdisk loaded 0078209e @ 00d00000
OF stdout device is: /vdevice/vty@30000000
Hypertas detected, assuming LPAR !
command line: root=/dev/disk/by-id/scsi-35000cca0030c7be2-part3  xmon=on crashkernel=256M@32M debug bootmem_debug numa=debug
memory layout at init:
  alloc_bottom : 0000000003880000
  alloc_top    : 0000000010000000
  alloc_top_hi : 0000000100000000
  rmo_top      : 0000000010000000
  ram_top      : 0000000100000000
Looking for displays
instantiating rtas at 0x000000000f4e0000 ... done
boot cpu hw idx 0000000000000000
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000003a90000 -> 0x0000000003a917bc
Device tree struct  0x0000000003aa0000 -> 0x0000000003ac0000
Calling quiesce ...
returning from prom_init
Reserving 256MB of memory at 32MB for crashkernel (System RAM: 4096MB)
Phyp-dump disabled at boot time
Using pSeries machine description
Page orders: linear mapping = 24, virtual = 16, io = 12
Using 1TB segments
Found initrd at 0xc000000000d00000:0xc00000000148209e
console [udbg0] enabled
Partition configured for 4 cpus.
CPU maps initialized for 2 threads per core
 (thread shift is 1)
Starting Linux PPC64 #7 SMP Fri Jan 9 04:50:05 CST 2009
-----------------------------------------------------
ppc64_pft_size                = 0x1a
physicalMemorySize            = 0x100000000
htab_hash_mask                = 0x7ffff
-----------------------------------------------------
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.28-1-ppc64 (root@rulerlp10) (gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #7 SMP Fri Jan 9 04:50:05 CST 2009
[boot]0012 Setup Arch
NUMA associativity depth for CPU/Memory: 3
Node 0 Memory:
Node 5 Memory: 0x0-0x10000000
Node 7 Memory: 0x10000000-0x100000000
adding cpu 0 to node 0
node 0
NODE_DATA() = c0000000fffeda80
node 5
NODE_DATA() = c000000001f78800
start_paddr = 0
end_paddr = 10000000
bootmap_paddr = 1f60000
bootmem::init_bootmem_core nid=5 start=0 map=1f6 end=1000 mapsize=200
bootmem::mark_bootmem_node nid=5 start=0 end=1000 reserve=0 flags=0 bdata->node_min_pfn=0x0 bdata->node_low_pfn=0x1000
bootmem::__free nid=5 start=0 end=1000
reserve_bootmem :
        physbase :0x0  size:0xb70000 reserve_size=0xb70000 nid=5
        start_pfn:0x0  end_pfn:0xb7
        node_ar.start_pfn:0x0 node_ar.end_pfn:0x1000
        node->node_start_pfn:0x0 node->node_spanned_pages:4096
bootmem::mark_bootmem_node nid=5 start=0 end=b7 reserve=1 flags=0 bdata->node_min_pfn=0x0 bdata->node_low_pfn=0x1000
bootmem::__reserve nid=5 start=0 end=b7 flags=0
reserve_bootmem :
        physbase :0xd00000  size:0x78209e reserve_size=0x78209e nid=5
        start_pfn:0xd0  end_pfn:0x148
        node_ar.start_pfn:0x0 node_ar.end_pfn:0x1000
        node->node_start_pfn:0x0 node->node_spanned_pages:4096
bootmem::mark_bootmem_node nid=5 start=d0 end=149 reserve=1 flags=0 bdata->node_min_pfn=0x0 bdata->node_low_pfn=0x1000
bootmem::__reserve nid=5 start=d0 end=149 flags=0
reserve_bootmem :
        physbase :0xd00000  size:0x790000 reserve_size=0x790000 nid=5
        start_pfn:0xd0  end_pfn:0x149
        node_ar.start_pfn:0x0 node_ar.end_pfn:0x1000
        node->node_start_pfn:0x0 node->node_spanned_pages:4096
bootmem::mark_bootmem_node nid=5 start=d0 end=149 reserve=1 flags=0 bdata->node_min_pfn=0x0 bdata->node_low_pfn=0x1000
bootmem::__reserve nid=5 start=d0 end=149 flags=0
bootmem::__reserve silent double reserve of PFN d0
bootmem::__reserve silent double reserve of PFN d1
...
...
...
bootmem::__reserve silent double reserve of PFN 147
bootmem::__reserve silent double reserve of PFN 148
reserve_bootmem :
        physbase :0x1f60000  size:0x10000 reserve_size=0x10000 nid=5
        start_pfn:0x1f6  end_pfn:0x1f7
        node_ar.start_pfn:0x0 node_ar.end_pfn:0x1000
        node->node_start_pfn:0x0 node->node_spanned_pages:4096
bootmem::mark_bootmem_node nid=5 start=1f6 end=1f7 reserve=1 flags=0 bdata->node_min_pfn=0x0 bdata->node_low_pfn=0x1000
bootmem::__reserve nid=5 start=1f6 end=1f7 flags=0
reserve_bootmem :
        physbase :0x1f78800  size:0x10087800 reserve_size=0xe090000 nid=5
        start_pfn:0x1f7  end_pfn:0x1200
        node_ar.start_pfn:0x0 node_ar.end_pfn:0x1000
        node->node_start_pfn:0x0 node->node_spanned_pages:4096
bootmem::mark_bootmem_node nid=5 start=1f7 end=1001 reserve=1 flags=0 bdata->node_min_pfn=0x0 bdata->node_low_pfn=0x1000
------------[ cut here ]------------
kernel BUG at mm/bootmem.c:279!
cpu 0x0: Vector: 700 (Program Check) at [c0000000008a39d0]
    pc: c000000000676ec0: .mark_bootmem_node+0xb4/0x12c
    lr: c000000000676e98: .mark_bootmem_node+0x8c/0x12c
    sp: c0000000008a3c50
   msr: 8000000000021032
  current = 0xc0000000007fa820
  paca    = 0xc000000000962c00
    pid   = 0, comm = swapper
kernel BUG at mm/bootmem.c:279!
enter ? for help
[c0000000008a3d00] c000000000661d3c .do_init_bootmem+0xc3c/0xd78
[c0000000008a3e40] c0000000006567f0 .setup_arch+0x1a4/0x21c
[c0000000008a3ee0] c000000000650868 .start_kernel+0xdc/0x56c
[c0000000008a3f90] c0000000000083b8 .start_here_common+0x1c/0x64
0:mon>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 2.6.28-rc9 panics with crashkernel=256M while booting
  2009-01-09 11:07                   ` Chandru
@ 2009-01-15  8:05                     ` Chandru
  2009-01-16 12:16                       ` Chandru
  0 siblings, 1 reply; 19+ messages in thread
From: Chandru @ 2009-01-15  8:05 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linuxppc-dev, Andrew Morton, Paul Mackerras, linux-kernel

On Friday 09 January 2009 16:37:24 Chandru wrote:
> On Friday 09 January 2009 01:33:12 Dave Hansen wrote:
> > Now I'm even more confused.  Could you please send a fully changelogged
> > patch that describes the problem, and how this fixes it?  This just
> > seems like an off-by-one error, which isn't what I thought we had before
> > at all.
> > 
> > I'm also horribly confused why PFN_UP is needed here.  Is 'physbase' not
> > page aligned?  reserve_size looks like it *has* to be.  'end_pfn' is
> > always (as far as I have ever seen in the kernel) the pfn of the page
> > after the area we are interested in and we treat it as such in that
> > function.  In the case of an unaligned physbase, that wouldn't be true.
> > 
> > Think of the case where we have a 1-byte reservation.  start_pfn will
> > equal end_pfn and we won't go into that while loop at *all* and won't
> > reserve anything.
> > 
> > Does 'end_pfn' need fixing?
> > 
> 
> Attached is the console log with debug command line parameters enabled and 
> with couple of more debug statements added to the code.  Please take a look at it. 
> 
> thanks,
> Chandru
> 

Hello Dave, From the debug console output, if there is anything you can add here,
pls let me know.

thanks 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 2.6.28-rc9 panics with crashkernel=256M while booting
  2009-01-15  8:05                     ` Chandru
@ 2009-01-16 12:16                       ` Chandru
  2009-01-16 17:52                         ` Dave Hansen
  0 siblings, 1 reply; 19+ messages in thread
From: Chandru @ 2009-01-16 12:16 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linuxppc-dev, Andrew Morton, Paul Mackerras, linux-kernel

On Thursday 15 January 2009 13:35:27 Chandru wrote:
> Hello Dave, From the debug console output, if there is anything you can add
> here, pls let me know.

As we can see from the console output here,  physbase isn't page aligned when 
the panic occurs.  So we could as well send (start_pfn << PAGE_SHIFT) to 
reserve_bootmem_node() instead of physbase. your thoughts ?.

Also end_pfn in mark_reserved_region_for_nid() is defined as 

unsigned long end_pfn = ((physbase + size) >> PAGE_SHIFT);

Does this refer to the pfn after the area that we are interested in ?.  We have 
atleast two fixes here,  
1. Limit start and end to bdata->node_min_pfn  and bdata->node_low_pfn in 
reserve_bootmem_node() and add comments out in there that the caller of the 
funtion should be aware of how much are they reserving. 
2. send (start_pfn << PAGE_SHIFT) to reserve_bootmem_node() instead of 
physbase. 

Chandru

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 2.6.28-rc9 panics with crashkernel=256M while booting
  2009-01-16 12:16                       ` Chandru
@ 2009-01-16 17:52                         ` Dave Hansen
  2009-01-19  8:11                           ` Chandru
  2009-01-19 11:30                           ` Chandru
  0 siblings, 2 replies; 19+ messages in thread
From: Dave Hansen @ 2009-01-16 17:52 UTC (permalink / raw)
  To: Chandru; +Cc: linuxppc-dev, Andrew Morton, Paul Mackerras, linux-kernel

On Fri, 2009-01-16 at 17:46 +0530, Chandru wrote:
> On Thursday 15 January 2009 13:35:27 Chandru wrote:
> > Hello Dave, From the debug console output, if there is anything you can add
> > here, pls let me know.
> 
> As we can see from the console output here,  physbase isn't page aligned when 
> the panic occurs.  So we could as well send (start_pfn << PAGE_SHIFT) to 
> reserve_bootmem_node() instead of physbase. your thoughts ?.
> 
> Also end_pfn in mark_reserved_region_for_nid() is defined as 
> 
> unsigned long end_pfn = ((physbase + size) >> PAGE_SHIFT);
> 
> Does this refer to the pfn after the area that we are interested in ?.  We have 
> atleast two fixes here,  
> 1. Limit start and end to bdata->node_min_pfn  and bdata->node_low_pfn in 
> reserve_bootmem_node() and add comments out in there that the caller of the 
> funtion should be aware of how much are they reserving. 
> 2. send (start_pfn << PAGE_SHIFT) to reserve_bootmem_node() instead of 
> physbase. 

Just looking at it, that calculation is OK.  But, there was one in your
dmesg that looked a page too long, like page 0x1001 instead of 0x1000.
I'd find out how that happened.

-- Dave

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 2.6.28-rc9 panics with crashkernel=256M while booting
  2009-01-16 17:52                         ` Dave Hansen
@ 2009-01-19  8:11                           ` Chandru
  2009-01-19 11:30                           ` Chandru
  1 sibling, 0 replies; 19+ messages in thread
From: Chandru @ 2009-01-19  8:11 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linuxppc-dev, Andrew Morton, Paul Mackerras, linux-kernel

On Friday 16 January 2009 23:22:57 Dave Hansen wrote:
> Just looking at it, that calculation is OK.  But, there was one in your
> dmesg that looked a page too long, like page 0x1001 instead of 0x1000.
> I'd find out how that happened.

That is a result of PFN_UP() in reserve_bootmem_node() for which we hit the 
BUG_ON() eventually.   Prior to calling reserve_bootmem_node() we have...

node_ar.end_pfn = node->node_end_pfn = PFN_DOWN(physbase+reserve_size). 

Hence a PFN_UP() will raise the value of 'end'. The kernel has 
CONFIG_PPC_64K_PAGES enabled in the config. 

Chandru

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 2.6.28-rc9 panics with crashkernel=256M while booting
  2009-01-16 17:52                         ` Dave Hansen
  2009-01-19  8:11                           ` Chandru
@ 2009-01-19 11:30                           ` Chandru
  2009-01-20  8:13                             ` Chandru
  2009-01-22  0:29                             ` Dave Hansen
  1 sibling, 2 replies; 19+ messages in thread
From: Chandru @ 2009-01-19 11:30 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linuxppc-dev, Andrew Morton, Paul Mackerras, linux-kernel

In case either physbase or reserve_size are not page aligned and in addition
if the following condition is also true

node_ar.end_pfn = node->node_end_pfn = PFN_DOWN(physbase+reserve_size).

we may hit the BUG_ON(end > bdata->node_low_pfn) in mark_bootmem_node() in
mm/bootmem.c Hence pass the pfn that the physbase is part of and align
reserve_size before calling reserve_bootmem_node().

Signed-off-by: Chandru S <chandru@linux.vnet.ibm.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- linux-2.6.29-rc2/arch/powerpc/mm/numa.c.orig	2009-01-19 16:14:49.000000000 +0530
+++ linux-2.6.29-rc2/arch/powerpc/mm/numa.c	2009-01-19 16:36:38.000000000 +0530
@@ -901,7 +901,8 @@ static void mark_reserved_regions_for_ni
 		get_node_active_region(start_pfn, &node_ar);
 		while (start_pfn < end_pfn &&
 			node_ar.start_pfn < node_ar.end_pfn) {
-			unsigned long reserve_size = size;
+			unsigned long reserve_size = (size >> PAGE_SHIFT) <<
+								PAGE_SHIFT;
 			/*
 			 * if reserved region extends past active region
 			 * then trim size to active region
@@ -917,7 +918,8 @@ static void mark_reserved_regions_for_ni
 				dbg("reserve_bootmem %lx %lx nid=%d\n",
 					physbase, reserve_size, node_ar.nid);
 				reserve_bootmem_node(NODE_DATA(node_ar.nid),
-						physbase, reserve_size,
+						(start_pfn << PAGE_SHIFT),
+						reserve_size,
 						BOOTMEM_DEFAULT);
 			}
 			/*

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 2.6.28-rc9 panics with crashkernel=256M while booting
  2009-01-19 11:30                           ` Chandru
@ 2009-01-20  8:13                             ` Chandru
  2009-01-22  0:29                             ` Dave Hansen
  1 sibling, 0 replies; 19+ messages in thread
From: Chandru @ 2009-01-20  8:13 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linuxppc-dev, Andrew Morton, Paul Mackerras, linux-kernel

Chandru wrote:
> In case either physbase or reserve_size are not page aligned and in addition
> if the following condition is also true
>
> node_ar.end_pfn = node->node_end_pfn = PFN_DOWN(physbase+reserve_size).
>
> we may hit the BUG_ON(end > bdata->node_low_pfn) in mark_bootmem_node() in
> mm/bootmem.c Hence pass the pfn that the physbase is part of and align
> reserve_size before calling reserve_bootmem_node().
>
> Signed-off-by: Chandru S <chandru@linux.vnet.ibm.com>
> Cc: Dave Hansen <dave@linux.vnet.ibm.com>
> ---
>  arch/powerpc/mm/numa.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> --- linux-2.6.29-rc2/arch/powerpc/mm/numa.c.orig	2009-01-19 16:14:49.000000000 +0530
> +++ linux-2.6.29-rc2/arch/powerpc/mm/numa.c	2009-01-19 16:36:38.000000000 +0530
> @@ -901,7 +901,8 @@ static void mark_reserved_regions_for_ni
>  		get_node_active_region(start_pfn, &node_ar);
>  		while (start_pfn < end_pfn &&
>  			node_ar.start_pfn < node_ar.end_pfn) {
> -			unsigned long reserve_size = size;
> +			unsigned long reserve_size = (size >> PAGE_SHIFT) <<
> +								PAGE_SHIFT;
>  			/*
>  			 * if reserved region extends past active region
>  			 * then trim size to active region
> @@ -917,7 +918,8 @@ static void mark_reserved_regions_for_ni
>  				dbg("reserve_bootmem %lx %lx nid=%d\n",
>  					physbase, reserve_size, node_ar.nid);
>  				reserve_bootmem_node(NODE_DATA(node_ar.nid),
> -						physbase, reserve_size,
> +						(start_pfn << PAGE_SHIFT),
> +						reserve_size,
>  						BOOTMEM_DEFAULT);
>  			}
>  			/*
>   
does this patch look good ?, do you concur with it ?

thanks,
Chandru

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 2.6.28-rc9 panics with crashkernel=256M while booting
  2009-01-19 11:30                           ` Chandru
  2009-01-20  8:13                             ` Chandru
@ 2009-01-22  0:29                             ` Dave Hansen
  2009-01-22  8:20                               ` Chandru
  1 sibling, 1 reply; 19+ messages in thread
From: Dave Hansen @ 2009-01-22  0:29 UTC (permalink / raw)
  To: Chandru; +Cc: linuxppc-dev, Andrew Morton, Paul Mackerras, linux-kernel

On Mon, 2009-01-19 at 17:00 +0530, Chandru wrote:
> --- linux-2.6.29-rc2/arch/powerpc/mm/numa.c.orig	2009-01-19 16:14:49.000000000 +0530
> +++ linux-2.6.29-rc2/arch/powerpc/mm/numa.c	2009-01-19 16:36:38.000000000 +0530
> @@ -901,7 +901,8 @@ static void mark_reserved_regions_for_ni
>  		get_node_active_region(start_pfn, &node_ar);
>  		while (start_pfn < end_pfn &&
>  			node_ar.start_pfn < node_ar.end_pfn) {
> -			unsigned long reserve_size = size;
> +			unsigned long reserve_size = (size >> PAGE_SHIFT) <<
> +								PAGE_SHIFT;
>  			/*
>  			 * if reserved region extends past active region
>  			 * then trim size to active region
> @@ -917,7 +918,8 @@ static void mark_reserved_regions_for_ni
>  				dbg("reserve_bootmem %lx %lx nid=%d\n",
>  					physbase, reserve_size, node_ar.nid);
>  				reserve_bootmem_node(NODE_DATA(node_ar.nid),
> -						physbase, reserve_size,
> +						(start_pfn << PAGE_SHIFT),
> +						reserve_size,
>  						BOOTMEM_DEFAULT);
>  			}
>  			/*

Chandru, I don't mean to keep ragging on your patches, but I really
don't think this is right, yet.

Let's take, for instance, a 1-byte reservation.  With this code, you've
suddenly turned that into a 0-byte reservation, and that *can't* be
right.  The same thing happens if you have a reservation that spans two
pages.  If you unconditionally round it down, then you might miss the
part that spans a portion of the second page.

It needs to be rounded down like you are suggesting here, but only in
the case where we've gone over the *CURRENT* node's boundary.  This is
kinda what that "if (end_pfn > node_ar.end_pfn)" check is doing.  But,
it evidently screws it up if the overlap isn't by an entire page or
something.

Please also, for pete's sake, use masks (a la PAGE_MASK) or macros if
you're going to page-align something.  Don't shift down and up like
that.  

-- Dave

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: 2.6.28-rc9 panics with crashkernel=256M while booting
  2009-01-22  0:29                             ` Dave Hansen
@ 2009-01-22  8:20                               ` Chandru
  0 siblings, 0 replies; 19+ messages in thread
From: Chandru @ 2009-01-22  8:20 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linuxppc-dev, Andrew Morton, Paul Mackerras, linux-kernel

On Thursday 22 January 2009 05:59:39 Dave Hansen wrote:
> Let's take, for instance, a 1-byte reservation.  With this code, you've
> suddenly turned that into a 0-byte reservation, and that *can't* be
> right.  The same thing happens if you have a reservation that spans two
> pages.  If you unconditionally round it down, then you might miss the
> part that spans a portion of the second page.
> 
> It needs to be rounded down like you are suggesting here, but only in
> the case where we've gone over the *CURRENT* node's boundary.  This is
> kinda what that "if (end_pfn > node_ar.end_pfn)" check is doing.  But,
> it evidently screws it up if the overlap isn't by an entire page or
> something.

I assumed the condition 'while (start_pfn < end_pfn  && .. )' asks for atleast
a PAGE_SIZE difference between them and hence went ahead with that patch.
My guess was a 1-byte , 2-byte or a (PAGE_SIZE -1)-byte reservations may not even
go into that loop.  However we just need a fix for this problem. So if there is a 
better fix that you have please post it to lkml. 

Thanks,
Chandru

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2009-01-22  8:20 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <200812241325.49404.chandru@in.ibm.com>
2008-12-25  7:35 ` 2.6.28-rc9 panics with crashkernel=256M while booting Andrew Morton
2008-12-25  8:07   ` Andrew Morton
2008-12-26  0:59   ` Paul Mackerras
2008-12-29 21:36     ` Dave Hansen
2009-01-05 13:49       ` Chandru
2009-01-05 16:30         ` Dave Hansen
2009-01-07 12:58           ` Chandru
2009-01-07 17:25             ` Dave Hansen
2009-01-08 10:29               ` Chandru
2009-01-08 20:03                 ` Dave Hansen
2009-01-09 11:07                   ` Chandru
2009-01-15  8:05                     ` Chandru
2009-01-16 12:16                       ` Chandru
2009-01-16 17:52                         ` Dave Hansen
2009-01-19  8:11                           ` Chandru
2009-01-19 11:30                           ` Chandru
2009-01-20  8:13                             ` Chandru
2009-01-22  0:29                             ` Dave Hansen
2009-01-22  8:20                               ` Chandru

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).